Nitrogen fixing potential in extreme environments

Reut Sorek Abramovich

A thesis in fulfilment of the requirements for the degree of Doctor of Philosophy

School of Biotechnology and Biomolecular Sciences The University of New South Wales Sydney, Australia

March 2013

ORIGINALITY STATEMENT

‘I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgement is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project's design and conception or in style, presentation and linguistic expression is acknowledged.’

Signed ……………………………………………......

Date ……………………………………………......


COPYRIGHT STATEMENT

‘I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or here after known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstract International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.'

Signed ……………………………………………......

Date ……………………………………………......

AUTHENTICITY STATEMENT

‘I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.’

Signed ……………………………………………......

Date ……………………………………………......


Abstract

Biological nitrogen fixation is a key process in providing accessible nitrogen to Earth’s biosphere. This process has been studied in various habitats, yet extreme environments remain relatively unexplored. The nifH gene encodes the Fe protein component of nitrogenase, the enzyme that catalyses nitrogen fixation.

Our aims in this study were to assess diazotrophic diversity, richness and community structure in three unique environments and to analyse potential adaptations in the composition and structure of the Fe protein. Our methods included terminal restriction fragment length polymorphism (T-RFLP) analysis of 16S rDNA, PCR amplification of the nifH gene, statistical t-test analysis of amino acid compositions, a novel evolutionary analysis and 3D modelling with the I-TASSER web server.

Boulder Clay and Amorphous Glacier are two ice-free areas in Terra Nova Bay, Antarctica, which differ in their geological origins and physico-chemical properties. DNA yields from ice-core samples ranged from 0.29 ng µL⁻¹ in Amorphous Glacier to 88 ng µL⁻¹ in Boulder Clay. Bray-Curtis cluster analysis suggested that the Boulder Clay bacterial profiles were similar to each other but clustered separately from those of Amorphous Glacier.

The hypersaline (>70 ppt) bays of Shark Bay, Western Australia, are home to stromatolite microbial mats. The diversity of diazotrophs in samples from two different years, 1996 and 2004, was investigated. Our analysis indicated that columnar stromatolites included a common, persisting cyanobacterial diazotroph, either a Cyanothece or a Xenococcus. Both samples contained novel nifH gene sequences of low similarity to uncultured nifH clones from saline to hypersaline environments, and their inferred NifH amino acid sequences were highly similar to unicellular, non-heterocystous and γ- and δ-Proteobacteria sequences.

Paralana’s hot radon springs (PHS, 57 °C) are situated in South Australia. Phylogenetic analysis indicated a rich and diverse group of NifH amino acid sequences from α-, γ- and δ-Proteobacteria, Chloroflexi and Cyanobacteria. These results suggested that aerobic and anaerobic diazotrophs with a conventional Mo nitrogenase might be involved in nitrogen fixation.

Our bioinformatic analysis suggested that halophilic adaptations, with an increase in salt bridges, acidic residues and a decrease in bulkier hydrophobic amino acids, did occur in stromatolite diazotrophs and that partial thermophilic adaptations, mainly an increase in salt bridges, Pro and charged residues, did occur in the PHS diazotrophs. These studies provide new insight on the ongoing evolution of nitrogen fixation in extreme environments.


Acknowledgments

I would like to thank my supervisors - Prof. Brett A. Neilan, Dr. Michelle Gehringer and Dr. Brendan P. Burns, for their support and advice during my PhD studies. I have benefited from their advice, and followed their wise counsel. I would like to thank Dr. Sohail Siddiqui, Prof. Aharon Oren and Prof. Nir Ben Tal, for their support and invaluable suggestions.

The Australian Centre for Astrobiology was a creative hub for me and other students, a place to exchange ideas, thoughts and avenues of exploration into the biggest mysteries of life. I would like in particular to thank the director, Prof. Malcolm Walter, for his ongoing support of my efforts, and thank Carol, Jessica, Maria, Tamsyn, David and Ivan for creative conversations during my research career at the centre.

I would also like to thank my friends and colleagues at the Blue Green Groove Machine lab, for their patience, help and suggestions. I could not have come this far without their knowledge. My special thanks go to: Anne D.J., Michelle A, Kristin, Falicia, Alex, Hannah, Ivan, Jasper, Troco, Shane, Stefan, Jae, Frank, Tim, Maria, Sarah, Tamsyn, Angie, Rati, Julia, Shauna, Leanne, Will and Alper.

The Mars Society of Australia (MSA) is a group of intelligent and devoted people. My 2009 field trip to the Paralana Hot Springs in South Australia, with NASA’s Spaceward Bound program, was very special thanks to their efforts and hard work. I salute you: David Cooper, David Wilson, Jon Clarke, Guy Murphy, Mark Gargano, Eriita Jones, Marcia Tanner and Shaun Strong. I am also indebted to Dr. Chris McKay and Prof. Penelope Boston for enlightened conversations and field trip advice & help.

Thank you my coffee break friends: Rhea, Shahar, Nitzan, Eldad and Mikayla.

To my ever loving husband, Aviv - Thank You, my No. 1. To my parents & brother, Aryeh & Channa & Shachar - Thank you for inspirational stories. To my first born daughter, Eleanor - You were the best surprise I have ever received. May your life be interesting and filled with joy.

One last statement if I may -

“The time has come for humanity to journey to Mars.” (The Mars Society founding declaration, University of Colorado, Boulder, Colorado, United States, 1998)


List of Publications

Abramovich, R.S., Pomati, F., Jungblut, A.D., Guglielmin, M., and Neilan, B.A. (2012) T-RFLP Fingerprinting Analysis of Bacterial Communities in Debris Cones, Northern Victoria Land, Antarctica. Permafrost and Periglacial Processes 23: 244-248.

Contributions to academic conferences

Abramovich, R.S., Burns, B.P., and Neilan, B.A. Temporal Biodiversity of Potential Diazotrophs in Stromatolites, Shark Bay, Western Australia. Australian Mars Exploration Conference. July, 17-19th 2009, Adelaide, South Australia.

Abramovich, R.S., Burns, B.P., and Neilan, B.A. Nitrogen fixation potential in stromatolites, Shark Bay, Western Australia. The 9th Australian Space Science Conference. 28 - 30th, September 2009, Sydney, Australia.

Abramovich, R.S., Gehringer, M.M., and Neilan, B.A. Biodiversity of Potential Diazotrophs in Microbial Communities of Stromatolites at Shark Bay, Western Australia. Sydney Astronomy and Astrophysics Student Symposium. 18th, June 2010, Sydney, Australia.

Abramovich, R.S., Gehringer, M.M., and Neilan, B.A. Biodiversity of Potential Diazotrophs in Microbial Communities of a Radon Hot Spring in the Flinders Ranges and Stromatolites at Shark Bay. The 8th International Congress on . 12-14th, September 2010, Azores, Portugal.

Abramovich, R.S., Gehringer, M.M., and Neilan, B.A. Biological nitrogen fixation potential in stromatolites, Shark Bay, Western Australia. The 16th SUNFix Symposium. 25th of June 2010, Sydney, Australia.

Abramovich, R.S., Gehringer, M.M., Burns, B.P., and Neilan, B.A. Biodiversity of Potential Diazotrophs in Stromatolites of Shark Bay and a Radon Hot Spring. The Australian Society for Microbiology, Annual Scientific Meeting. 4-8th, July 2010, Sydney, Australia.


List of Acronyms and Abbreviations

ARA       Acetylene reduction assay
ATCC      American Type Culture Collection
ATP       Adenosine triphosphate
BLAST     Basic Local Alignment Search Tool
bp        Base pairs
BSA       Bovine serum albumin
cDNA      Complementary deoxyribonucleic acid
Chla      Chlorophyll a
DMSO      Dimethyl sulfoxide
DNA       Deoxyribonucleic acid
dNTP      Deoxyribonucleotide triphosphate
DTT       Dithiothreitol
EDTA      Ethylenediaminetetraacetic acid
EPS       Exopolysaccharide
FISH      Fluorescence in situ hybridisation
g         Gram
µg        Microgram
GC-MS     Gas chromatography-mass spectrometry
GTP       Guanosine-5'-triphosphate
h         Hour
IPTG      Isopropyl-β-D-thiogalactoside
kb        Kilobase
kDa       Kilodalton
km        Kilometre
km²       Square kilometre
L         Litre
µL        Microlitre
LB        Luria-Bertani
m         Metre
m.b.s.l.  Metres below surface level
µM        Micromolar
min       Minute
ml        Millilitre
mm        Millimetre
MQ        Milli-Q
mRNA      Messenger RNA
NCBI      National Center for Biotechnology Information
nd        Not detected
°C        Degrees Celsius
ORF       Open reading frame
OTU       Operational taxonomic unit
PCC       Pasteur Culture Collection (France)
PCR       Polymerase chain reaction
PDB       Protein Data Bank
pmol      Picomole
rDNA      Ribosomal deoxyribonucleic acid
RDP       Ribosomal Database Project
RFLP      Restriction fragment length polymorphism
RNA       Ribonucleic acid
rpm       Revolutions per minute
rRNA      Ribosomal ribonucleic acid
RT        Room temperature
RT-PCR    Reverse transcriptase PCR
s         Second
SD        Standard deviation
SDS       Sodium dodecyl sulphate
SRB       Sulphate-reducing bacteria
SSU       Small subunit
TAP       T-RFLP Analysis Program
T-RFLP    Terminal restriction fragment length polymorphism
UTCC      University of Toronto Culture Collection
UTEX      University of Texas Culture Collection
UV        Ultraviolet light


TABLE OF CONTENTS

Chapter 1 Introduction
1.1 The extremophiles
1.2 Nitrogen significance and source
1.3 Nitrogenase structure and function
    1.3.1 Fe protein structure and function
    1.3.2 MoFe protein structure and function
    1.3.3 Nitrogenase modus operandi
1.4 Diazotroph phylogeny
    1.4.1 Cyanobacteria
    1.4.2 Other prokaryotic diazotrophs
1.5 Psychrophilic diazotrophs
1.6 Halophilic diazotrophs
1.7 Thermophilic diazotrophs
1.8 Analyzing nitrogen fixation
1.9 Research aims

Chapter 2 T-RFLP analysis of potential diazotrophs in glacial and permafrost formations in Northern Victoria Land, Antarctica
2.1 Introduction
2.2 Materials and methods
    2.2.1 Study sites
    2.2.2 Ice core collection
    2.2.3 Sample preparation
    2.2.4 DNA extraction and amplification
    2.2.5 Terminal Restriction Fragment Length Polymorphism (T-RFLP)
    2.2.6 T-RFLP profiles
    2.2.7 T-RFLP Analysis Program (TAP)
    2.2.8 RDP 9, TAP and T-RFLP databases
    2.2.9 PCR amplification of nifH genes
2.3 Results and discussion
    2.3.1 Amorphous Glacier and Boulder Clay T-RFLP Profiles
    2.3.2 In silico database composition
    2.3.3 Amorphous Glacier and Boulder Clay cryospheric bacteria
2.4 Concluding remarks

Chapter 3 Diazotrophic diversity in columnar stromatolites of Shark Bay, Western Australia
3.1 Introduction
3.2 Materials and methods
    3.2.1 Sample collection and sample sites
    3.2.2 DNA isolation and PCR amplification of nifH genes
    3.2.3 Clone libraries and Restriction Fragment Length Polymorphism (RFLP)
    3.2.4 DNA sequencing
    3.2.5 Phylogenetic sequence analysis
    3.2.6 Diversity, richness and coverage estimators
    3.2.7 Accession numbers
3.3 Results and discussion
    3.3.1 General methodology consideration
    3.3.2 2004 clone library BLAST & BLASTX analysis
    3.3.3 1996 clone library BLAST & BLASTX analysis
    3.3.4 BLAST and BLASTX comparative analysis
    3.3.5 Phylogenetic analysis
    3.3.6 Coverage, diversity and community structure
    3.3.7 Nitrogen fixation potential in Shark Bay
3.4 Concluding remarks

Chapter 4 The bacterial diazotrophic community in a radon hot spring, South Australia
4.1 Introduction
4.2 Materials and methods
    4.2.1 Sample collection
    4.2.2 DNA isolation and PCR amplification of nifH genes
    4.2.3 Clone library and Restriction Fragment Length Polymorphism (RFLP)
    4.2.4 DNA sequencing
    4.2.5 Phylogenetic analysis
    4.2.6 Diversity, richness and coverage analysis
    4.2.7 Accession numbers
4.3 Results and discussion
    4.3.1 BLAST & BLASTX comparative analysis
    4.3.2 Phylogenetic analysis
    4.3.3 Coverage, diversity and community richness
    4.3.4 Nitrogen fixation in Paralana Hot Springs
4.4 Concluding remarks

Chapter 5 Structural and evolutionary adaptations in the Fe protein component of the nitrogenase
5.1 Introduction
5.2 Material and methods
    5.2.1 Evolutionary conservation
    5.2.2 Residue composition
    5.2.3 Statistical analysis
    5.2.4 Structural characteristics
5.3 Results
    5.3.1 Evolution, composition and structure of the Cluster III Fe protein
    5.3.2 Evolution, composition and structure of the Cluster I Fe protein
    5.3.3 Comparative analysis of cluster I and cluster III Fe proteins
5.4 Discussion
    5.4.1 Methodology
    5.4.2 Evolution, composition and structure in cluster I & III
5.5 Concluding remarks

Chapter 6 Halophilic and thermophilic adaptations in the Fe protein
6.1 Introduction
6.2 Material and methods
    6.2.1 Evolutionary conservation
    6.2.2 Residue composition
    6.2.3 Statistical analysis
    6.2.4 Distance matrices
    6.2.5 Structural characteristics
6.3 Results
    6.3.1 Potential halophilic adaptations in the Fe protein
    6.3.2 Potential thermophilic adaptations in the Fe protein
6.4 Discussion
    6.4.1 Halophilic adaptations
    6.4.2 Thermophilic adaptations
6.5 Concluding remarks

Chapter 7 Conclusions & future work

References

Appendix A

Chapter 1 Introduction

1.1 The extremophiles

Today, it is clear that micro-organisms are among Earth’s most extraordinary life forms, employing complex strategies to withstand harsh conditions that we, as a species, cannot endure (Schleifer, 2004). Records from the beginning of the 19th century describe bacteria capable of withstanding acidic conditions, thriving in hot geysers or enduring 0°C (Pikuta et al., 2007, and references within). Since then it has been found that extreme geochemical and physical conditions, such as acidity, high salinity, intense radiation and extremes of temperature or pressure, do not prevent microbial life from thriving (Rothschild and Mancinelli, 2001). Bacteria living in such conditions have been termed extremophiles, and have been shown to be useful industrial agents (Herbert and Sharp, 1992; Pennisi, 1997; van den Burg, 2003), as well as model organisms in astrobiology research (Imshenetsky et al., 1967; Friedmann, 1993; Cavicchioli, 2002).

Since the 1950s, more than 40 successful robotic missions have been led by NASA and other space agencies, providing new knowledge of atmospheric and geological processes on other bodies in the solar system (NASA, 2012). These missions revealed a wide range of environmental conditions, from very high to very low temperatures (730 K on Venus, 110 K on Jupiter) to high pressure (90 atm on Venus) and intense radiation (a galactic cosmic ray dose of 0.3-0.4 Sv yr⁻¹ on Mars), to name a few variables (Zeitlin et al., 2004; Moses et al., 2005; Pätzold et al., 2007; Pierrehumbert, 2011).

While Earth-like life forms have not yet been detected anywhere else in the solar system (nor elsewhere in the universe), there are extreme environments on Earth which are teeming with microbial life. A hydrothermal vent, 3550 m deep on the Mid-Atlantic Ridge, provided us with Thermococcus barophilus, a barophilic, hyperthermophilic archaeon that grows optimally at 358 K and 396 atmospheres (Marteinsson et al., 1999). Salinibacter, an obligate halophile which grows optimally with 200-300 g l⁻¹ salt, was isolated from a saltern crystallizer pond (Bardavid et al., 2007). Active endolithic Cyanobacteria (Chroococcidiopsis) and heterotrophic bacteria live in halite crusts in the hyperarid Atacama Desert (≤ 2 mm y⁻¹), under conditions of extreme dryness and radiation (Davila et al., 2008; Ríos et al., 2010). Hygroscopic salts (such as sodium chloride and magnesium chloride) were found on Mars, and are considered a potential niche that could support endolithic microbial communities, even under Martian conditions (Davila et al., 2010). Earth’s extreme environments, analogous to environments on distant planets, are worthy of intense research, because investigating their ecological systems expands our knowledge and our chances of finding Earth-like life on other bodies in the solar system.

Consequently, the environmental limits within which life, especially microbial life, can thrive are constantly being redefined. Among the known biochemical pathways, nitrogen fixation is one of the most important, as it is the fundamental process by which an abiotic element is acquired and integrated into complex biological and ecological systems.

1.2 Nitrogen significance and source

Biological Nitrogen Fixation (BNF) is an important process for all life on Earth. Nitrogen is present in amino acids, purines, pyrimidines and other important biological molecules (Postgate, 1982). BNF is coupled to other important biochemical pathways and represents a direct input of one of Earth’s most abundant atmospheric elements, N, into living organisms and the biosphere in general (Postgate, 1987). Although N₂ is the most abundant gas in Earth’s atmosphere (78%), most of the biosphere cannot access this atmospheric source directly: N₂ is extremely unreactive because the triple bond between the N atoms has a high bond energy of 225 kcal mol⁻¹ (Howard and Rees, 1996). Most organisms require nitrogen to be reduced to ammonia before they can integrate this element into various biosynthetic pathways (Berg et al., 2002; Berman-Frank et al., 2003).
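For readers more used to SI units, the bond energy quoted above can be converted from kcal mol⁻¹ to kJ mol⁻¹. The short sketch below is only an illustrative unit conversion; the function name is hypothetical and the conversion factor assumes the thermochemical calorie (1 kcal = 4.184 kJ).

```python
# Illustrative unit conversion for the N2 triple-bond energy quoted in the text.
# Assumption: thermochemical calorie, 1 kcal = 4.184 kJ.
KCAL_TO_KJ = 4.184


def kcal_to_kj(value_kcal_per_mol: float) -> float:
    """Convert an energy in kcal/mol to kJ/mol."""
    return value_kcal_per_mol * KCAL_TO_KJ


# 225 kcal/mol is the value cited from Howard and Rees (1996).
bond_energy_kj = kcal_to_kj(225.0)
print(f"{bond_energy_kj:.1f} kJ/mol")
```

Under this assumption, the cited 225 kcal mol⁻¹ corresponds to roughly 941 kJ mol⁻¹.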

Non-biological nitrogen fixation occurs via atmospheric ionization caused by lightning and UV radiation, and in an industrial process devised by F. Haber in 1910 and later developed for commercial purposes by C. Bosch (Kim and Rees, 1994). Lightning and UV radiation discharge enough electrons and energy to break the triple bond and form nitrogen oxides, while the industrial process uses an iron catalyst (at 200-500 atmospheres and 330-800°C), followed by the addition of hydrogen, to form ammonia (Mishustin and Shilnikova, 1971; Postgate, 1982; Postgate, 1987; Howard and Rees, 1996; Berg et al., 2002).

BNF capabilities are found in micro-organisms from two kingdoms – the Archaea and Bacteria. The major nitrogen-fixing phylogenetic groups in the Eubacteria are the green bacteria, Firmicutes, Cyanobacteria and Proteobacteria. In the Archaea, several genera have been found to fix nitrogen, including Methanobacterium, Methanococcus, Methanolobus, Methanoplanus, Methanosarcina and Methanothermus (Gary Stacey, 1992; Dixon and Kahn, 2004). Additional processes are involved in the nitrogen cycle on Earth and provide oxidized and reduced forms of nitrogen. Aerobic nitrification converts ammonia into oxidized forms via the ammonia and nitrite oxidation pathways (NH₄⁺/NH₃ → NO₂⁻ → NO₃⁻). Denitrification converts oxidized forms to dinitrogen (NO₃⁻ → NO₂⁻ → NO → N₂O → N₂), as does anaerobic ammonium oxidation (ANAMMOX) by members of the Planctomycetes phylum (Francis et al., 2007).

1.3 Nitrogenase structure and function

All diazotrophic micro-organisms have one enzyme in common, nitrogenase, which comprises about 10% of total cellular protein (Burns et al., 1972). Nitrogenase is an ATP-hydrolyzing complex of two proteins: dinitrogenase, an α₂β₂ heterotetramer in which α is encoded by the nifD gene and β by the nifK gene, and dinitrogenase reductase, a γ₂ homodimer encoded by the nifH gene (Georgiadis et al., 1992; Dilworth et al., 1993). These components are sometimes referred to as the MoFe protein and the Fe protein, respectively.

Furthermore, during the last two decades crystallographic structures of nitrogenase have emerged, leading to new 3D structural models and new insights into its mechanism. Currently there are 36 3D structures of nitrogenase in the Protein Data Bank (Research Collaboratory for Structural Bioinformatics; H.M. Berman, 2003). The first were crystallographic structures of nitrogenase reductase from Azotobacter vinelandii and Clostridium pasteurianum at 2.9 and 3.0 Å resolution, respectively (Georgiadis et al., 1992; Kim et al., 1993). Since then, 34 structures of nitrogenase have been resolved from A. vinelandii, C. pasteurianum, Klebsiella pneumoniae and Azospirillum brasilense at 1.16 to 3.2 Å resolution (H.M. Berman, 2000; H.M. Berman, 2003). The following sections briefly describe the structure and function of the individual components of nitrogenase.

1.3.1 Fe protein structure and function

Research based on crystallographic structures and on genetic and molecular methodologies has revealed that the Fe protein, a ~60 kDa protein, has several functions: it binds MgATP/MgADP (each monomer contains an ATP-binding site in a single domain), it is required for the initial biosynthesis of the FeMo cofactor and its insertion into the MoFe protein (Burgess and Lowe, 1996), and it transfers electrons from a suitable donor (such as reduced ferredoxin or flavodoxin) to the dinitrogenase. The homodimer is composed of two polypeptide chains linked by a single redox-active Fe₄S₄ cluster that can reach three oxidation states (Howard and Rees, 1996; see figure 1). The nucleotides are essential for electron transfer because they induce conformational changes which result in receptive iron atoms in the clusters. The Fe protein reflects these multiple functions in its complex structure and motifs: eight parallel beta-sheets flanked by nine alpha-helices, a nucleotide binding fold (Walker et al., 1982) and two switch regions, designated Switch I and Switch II by Schlessman et al. (1998), which interact with the gamma-phosphate group of the bound MgATP and facilitate the conformational changes (Jang et al., 2000; Jang et al., 2004).

Figure 1. General view of the Fe protein. The two polypeptide chains are linked by a single redox-active Fe4S4 cluster - chains F (red) and E (blue). Secondary structure depicted as determined by Tezcan et al. (2005). A1 - Fe4S4 cluster centred view, B1 - view centres on the cleft between the two chains. A2, B2 - same viewing angles, only the PCR amplified region of NifH in each chain is coloured. From Azotobacter vinelandii (PDB ID: 2AFH).

1.3.2 MoFe protein structure and function

This ~250 kDa component is encoded by the nifD and nifK genes and contains two types of clusters: P clusters and FeMo cofactors (Kim and Rees, 1994). The α subunit contains a FeMo cofactor, typically a MoFe₇S₉ metal cluster (see figure 2). Some organisms contain nitrogenases in which molybdenum is replaced by either iron or vanadium. Homocitrate and two residues, His and Cys, coordinate the FeMo cofactor in the protein (Burgess and Lowe, 1996). Each P cluster contains eight iron atoms and seven sulphides, linked to the protein by six Cys residues. The clusters serve as a conduit for electron transfer from the Fe protein to the FeMo cofactor, to which N₂ has been hypothesized to bind (Howard and Rees, 1996).

Figure 2. General overview of the α₂β₂ heterotetramer MoFe protein from Klebsiella pneumoniae (PDB ID: 1H1L). The FeMo cofactor (with the homocitrate molecule close by), the cation binding site and the P cluster are marked in the image (Hawkes et al., 1984).

1.3.3 Nitrogenase modus operandi

Three events of electron transfer are involved in the nitrogenase modus operandi: (1) reduction of Fe protein through an electron transfer from a suitable donor – ferredoxin or flavodoxin, (2) transfer of the electron to MoFe protein, (3) electron transfer from the active site within MoFe protein (presumably FeMo cofactor) to the substrate. For each 1 mol dinitrogen, 2 mol of ammonia and 1 mol of H2 form. A total of 8 electrons are thus consumed (Burgess and Lowe, 1996). For every electron utilized in this fashion, 2 mol of MgATP are hydrolyzed to MgADP. Reaction formula:

N₂ + 8 H⁺ + 8 e⁻ + 16 MgATP → 2 NH₃ + H₂ + 16 MgADP + 16 Pᵢ
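As a quick consistency check on the reaction formula above, the sketch below verifies that nitrogen atoms, hydrogen atoms and net charge balance on both sides, and recovers the ratio of 2 MgATP per electron. It is an illustrative bookkeeping exercise only: the species table and function name are hypothetical, and MgATP, MgADP and Pᵢ are left out of the atom count because the formula only tracks how many of them are consumed or released.

```python
from collections import Counter

# Element counts and net charge for each species in the reaction formula.
# Assumption: MgATP/MgADP/Pi are omitted from the atom balance and only
# counted separately below.
SPECIES = {
    "N2": (Counter({"N": 2}), 0),
    "H+": (Counter({"H": 1}), +1),
    "e-": (Counter(), -1),
    "NH3": (Counter({"N": 1, "H": 3}), 0),
    "H2": (Counter({"H": 2}), 0),
}

REACTANTS = {"N2": 1, "H+": 8, "e-": 8}
PRODUCTS = {"NH3": 2, "H2": 1}


def totals(side):
    """Return (atom counts, net charge) for one side of the reaction."""
    atoms, charge = Counter(), 0
    for name, coeff in side.items():
        elements, species_charge = SPECIES[name]
        for element, n in elements.items():
            atoms[element] += coeff * n
        charge += coeff * species_charge
    return atoms, charge


left_atoms, left_charge = totals(REACTANTS)
right_atoms, right_charge = totals(PRODUCTS)
balanced = left_atoms == right_atoms and left_charge == right_charge

# 16 MgATP hydrolysed for 8 electrons transferred.
atp_per_electron = 16 / 8

print(balanced, dict(left_atoms), left_charge, atp_per_electron)
```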

The first step in nitrogenase operation is the formation of a complex between the two proteins: reduced dinitrogenase reductase (with MgATP bound) and dinitrogenase. One electron is then transferred to the P cluster, and 2 MgATP are hydrolyzed to 2 MgADP and 2 Pᵢ. The next step is a slow dissociation of dinitrogenase reductase from dinitrogenase. This is usually the rate-limiting step, responsible for the slow turnover rate, 1.25 sec (Howard and Rees, 1996). Dinitrogenase reductase, now bound to MgADP and free from the complex, is first reduced again; the 2 MgADP and 2 Pᵢ are then released and 2 MgATP bind again (quite rapidly). These steps are repeated until 8 electrons have been transferred to dinitrogenase in order to reduce N₂ to 2 NH₃ (and form H₂). Electrons and protons are transferred within the dinitrogenase (in a way not entirely known) to the active site (presumably the FeMo cofactor) to form ammonia and hydrogen as mentioned above (Postgate, 1987; Raymond et al., 2004b). In addition, oxygen inhibits synthesis of nitrogenase in many diazotrophs and exerts different effects on the individual nitrogenase components. Both the P and Fe₄S₄ clusters are inhibited by oxygen, and the Fe₄S₄ cluster is irreversibly damaged in vitro. Inhibition has also been associated with the presence of reactive oxygen species (ROS) (Postgate, 1987; Berman-Frank et al., 2003).
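The repeated association and dissociation cycle described above can be tallied with a small counting sketch. This is a bookkeeping illustration under the stoichiometry stated in the text (one electron and 2 MgATP per cycle, 8 electrons per N₂), not a kinetic model; the class and function names are hypothetical.

```python
from dataclasses import dataclass

ELECTRONS_PER_N2 = 8   # electrons needed to reduce one N2 (and form one H2)
ATP_PER_ELECTRON = 2   # MgATP hydrolysed per electron transferred


@dataclass
class Tally:
    cycles: int = 0
    electrons: int = 0
    atp_hydrolysed: int = 0


def reduce_one_n2() -> Tally:
    """Count the Fe protein cycles needed to reduce a single N2 molecule."""
    tally = Tally()
    while tally.electrons < ELECTRONS_PER_N2:
        # Complex formation, then one electron passes to the MoFe protein
        # while 2 MgATP are hydrolysed to 2 MgADP + 2 Pi.
        tally.electrons += 1
        tally.atp_hydrolysed += ATP_PER_ELECTRON
        # Dissociation, re-reduction and nucleotide exchange complete the cycle.
        tally.cycles += 1
    return tally


result = reduce_one_n2()
print(result)
```

The tally reproduces the 8 cycles and 16 MgATP implied by the reaction formula.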

Nitrogenase is also a non exclusive enzyme and is capable of reducing other molecules besides dinitrogen. Some of these are listed in table 1 (Burns et al., 1972; Rasche and Seefeldt, 1997).

Table 1. List of molecules reduced by nitrogenase.

Acetylene                    C₂H₂ → C₂H₄
Nitrous oxide                N₂O → N₂ + H₂O
Azide                        N₃⁻ → N₂ + NH₃
Cyanide                      CN⁻ → CH₄ + NH₃ + CH₃NH₂ + traces of C₂H₄ and C₂H₆
Methyl isocyanide            CH₃NC → CH₄ + C₂H₆ + C₂H₄ + C₃H₆ + C₃H₈ + CH₃NH₂
1-Propyne, 1-Butyne (C₄H₆)   reduced to corresponding alkenes

Generally, proteins from extremophiles must adapt in order to retain their functionality under extremes of temperature, pH, salinity and other conditions (Siddiqui and Thomas, 2008). Additive changes to the primary structure (changes in amino acid composition, for instance) or changes at higher structural levels provide structural stability under such conditions. While an extremophilic nitrogenase, or one of its individual components, is yet to be isolated and characterized in depth, proteins from other extremophiles (Eisenberg et al., 1992; Madern et al., 2000; Feller and Gerday, 2003; Georlette et al., 2003) have been assessed and provide a starting point for examining the Fe protein and its possible adaptations to extreme conditions. The huge advances in computing power (Schaller, 1997) mean that structural analysis based on bioinformatics and molecular results provides an increasing number of plausible models to work with (Polański and Kimmel, 2007; Edwards et al., 2009; Ramsden, 2009).

1.4 Diazotroph phylogeny

Genetic research on nitrogen fixation genes originally focused on the nif genes of Klebsiella pneumoniae (Postgate, 1987; Glenn and Dilworth, 1991; Gary Stacey, 1992). Comparing the nif gene structure of K. pneumoniae to that of other diazotrophs, such as Azotobacter vinelandii, Clostridium pasteurianum or Anabaena spp., revealed a high level of nucleotide conservation between the nifH genes of the different diazotrophs. Originally, 20 nif genes were identified, arranged in eight transcriptional units: nifJ, nifHDKTY, nifENX, nifUSVWZ, nifM, nifF, nifLA and nifBQ (Renato et al., 2000). Transcription of the nif operons is prevented in the presence of oxygen or sources of combined nitrogen, and also when the temperature rises above a threshold that differs between organisms (Fay, 1992; Klopprogge et al., 2002; Steunou et al., 2006). The system is regulated by the nifL and nifA products and by additional genes, ntrA, ntrB, ntrC, glnB and glnD (Bohme, 1998).
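The compact operon notation used above (for example nifHDKTY for nifH, nifD, nifK, nifT and nifY) can be expanded programmatically to confirm that the eight transcriptional units account for 20 genes. The sketch below assumes only that each unit name is the prefix "nif" followed by one letter per gene; the helper name is hypothetical.

```python
# The eight transcriptional units listed in the text.
UNITS = ["nifJ", "nifHDKTY", "nifENX", "nifUSVWZ", "nifM", "nifF", "nifLA", "nifBQ"]


def expand(unit: str) -> list[str]:
    """Expand compact operon notation, e.g. 'nifLA' -> ['nifL', 'nifA']."""
    prefix, letters = unit[:3], unit[3:]
    return [prefix + letter for letter in letters]


genes = [gene for unit in UNITS for gene in expand(unit)]
print(len(UNITS), "units,", len(genes), "genes")
print(genes)
```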

1.4.1 Cyanobacteria

Cyanobacteria have been a special focus of research into the genetic basis of nitrogen fixation, as they are known to exhibit high fixation rates and to contribute substantially to the global biological nitrogen fixation budget (Gallon, 2001; Berman-Frank et al., 2003). Their ability to fix dinitrogen has been studied extensively, and some cyanobacterial groups synthesize nitrogenase exclusively in specialized heterocyst cells (Fleming and Haselkorn, 1973; see table 2). Some heterocystous Cyanobacteria are not exclusive: in Anabaena variabilis ATCC 29413, for example, the nif genes are organized in two clusters, nif1, which is expressed only in heterocysts, and nif2, which is expressed in vegetative cells only under anaerobic conditions and is also expressed in heterocysts (Fleming and Haselkorn, 1973; Bohme, 1998; Adams, 2000). Also, nifH and nifD are contiguous and separated from nifK by 11 kb. During heterocyst differentiation, excision of the 11 kb fragment (by the xisA gene product) restores the nifHDK operon, and synthesis of the nitrogenase subunits begins (Bohme, 1998). However, it seems that the nifHDK genes are contiguous in non-heterocystous strains (Berman-Frank et al., 2003). Most non-heterocystous Cyanobacteria can fix nitrogen only under micro-oxic or anoxic conditions, which occur, for instance, when photosynthesis is not active and oxygen is therefore not produced, i.e. during dark periods (Bergman et al., 1997).
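The rearrangement described above, in which excision of the 11 kb element restores a contiguous nifHDK operon, can be pictured with a toy model that treats the locus as an ordered list of features. This is a deliberately simplified sketch of the arrangement as stated in the text; the feature label "11kb_element" and both function names are hypothetical, and no sequence-level detail is implied.

```python
def excise(locus: list[str], element: str) -> list[str]:
    """Return the locus with the named element removed (toy model of excision)."""
    return [feature for feature in locus if feature != element]


def is_contiguous(locus: list[str], genes: list[str]) -> bool:
    """True if the genes occur in order with nothing between them."""
    width = len(genes)
    return any(locus[i:i + width] == genes for i in range(len(locus) - width + 1))


# Vegetative-cell arrangement as described in the text: nifH and nifD are
# contiguous and separated from nifK by an 11 kb element.
vegetative = ["nifH", "nifD", "11kb_element", "nifK"]
heterocyst = excise(vegetative, "11kb_element")

operon = ["nifH", "nifD", "nifK"]
print(is_contiguous(vegetative, operon), is_contiguous(heterocyst, operon))
```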


Table 2. Nitrogen fixing Cyanobacteria genera, heterocystous and non heterocystous.

N₂-fixing heterocystous Cyanobacteria
  Nostocales: Anabaena, Aphanizomenon, Calothrix, Cylindrospermum, Nodularia, Nostoc, Scytonema
  Stigonematales: Chlorogloeopsis, Fischerella, Geitleria, Stigonema

N₂-fixing non-heterocystous Cyanobacteria
  Aerobic: Synechocystis group, Gloeothece, Cyanothece group, Gloeocapsa group, Synechococcus group, Trichodesmium, Oscillatoria
  Micro-oxic or anoxic conditions: Pseudanabaena, Lyngbya, Phormidium, Plectonema, Oscillatoria
  Anoxic: Chroococcidiopsis, Dermocarpa, Myxosarcina, Xenococcus, Pleurocapsa group

1.4.2 Other prokaryotic diazotrophs

BNF is present in several bacterial phyla and is not restricted to the Cyanobacteria. Phylogenetic analysis based on nif genes and their homologs depicts the diazotrophic community as five distinct groups (I-V; see figure 3, Raymond et al., 2004a). Groups I and II are diazotrophs with a molybdenum-dependent nitrogenase, an active nitrogenase with molybdenum in the dinitrogenase component, operative under aerobic and anaerobic conditions. These groups include members of the Cyanobacteria, Proteobacteria, Firmicutes (Clostridia), Actinobacteria (Frankia) and Archaea. Group III includes molybdenum-independent nitrogenases, active alternative nitrogenases which can use iron or vanadium as a cofactor in the metal clusters. This group includes strictly anaerobic Proteobacteria, Spirochetes, Chlorobia and Archaea members. Group IV comprises organisms, mostly Archaea, which have the genes but do not fix dinitrogen, and group V is a diverse group of organisms which do not fix dinitrogen yet possess homologs of nifH and nifD that encode protochlorophyllide reductase and chlorophyllide reductase. These enzymes are analogues of nitrogenase and are involved in pigment biosynthesis. Analyses of nifH gene phylogeny support the above grouping in general (Chien and Zinder, 1996; Zehr et al., 2003a; Moisander et al., 2006; Zhang et al., 2007a).

Phylogenetic studies based on nifH, D, K, E, N gene sequences yielded tree topologies that were fairly similar to 16S rDNA phylogeny (Zani et al., 2000; Jenkins et al., 2004). Additionally, nifH was found to be highly conserved across diverse taxa in general, as well as being part of a conserved nifHDK transcriptional operon (Omoregie et al., 2004b). These recurring results, from genetic and evolutionary studies, support vertical gene transfer from a common Archaean ancestor, followed by loss of gene activity due to environmental adaptations (Fani et al., 2000). The inconsistencies with 16S rDNA phylogeny are usually explained as lateral gene transfer and loss of genes due to loss of function (Hartmann and Barnum, 2010). In general, the evolutionary history of the nif genes is a complicated matter that is not yet entirely resolved.

Figure 3. General overview of an unrooted nifH gene tree topology modified from Zehr et al. (2003a) with four major clusters I-IV.

It is of interest to review what is known of nitrogen fixation in extreme environments. The following paragraphs provide background on nitrogen fixation in relation to cryospheric, hypersaline and high temperature environments, from a microbiology point of view.

1.5 Psychrophilic diazotrophs

Nitrogen fixation has been studied in Antarctica for several decades now. Early studies in the 1960s detected nitrogen fixation by Cyanobacteria, mainly by the genera Anabaena, Calothrix and Nostoc, and to a lesser extent by Stigonema and Tolypothrix (Smith and Russell, 1982). Nitrogen fixation was usually detected between 4-10°C, during midday, and was rarely detected during winter or below 0°C (Stewart, 1970b; Davey and Marchant, 1983). More recently, N2 fixation was found to represent between 6.3% and 33% of the total N incorporated by the microbial component in ponds or soils in Antarctica (Fernandez-Valiente et al., 2001), with the higher end of contributed N reported from microbial mat studies, mostly from surface layers and during daytime (Fernandez-Valiente et al., 2007), supporting heterocystous Cyanobacteria as the substantial providers of reduced nitrogen in the Antarctic ecosystem (Vincent et al., 1993). Recent studies have also reported unicellular (Gloeocapsa, Synechococcus) and filamentous non-heterocystous (Oscillatoria, Phormidium) Cyanobacteria as active nitrogen fixers, usually under dark conditions and at substantially lower optimal temperatures than tropical or temperate strains (Pandey et al., 2004). These Cyanobacteria were not considered true psychrophiles, since their nitrogen fixation optima were in the range of 15-25°C, and they were not able to grow at 0°C or at subzero temperatures (Pikuta et al., 2007).

While Cyanobacteria were the dominant active nitrogen fixers reported in most studies, other potential diazotrophs have been reported from the Proteobacteria, Verrucomicrobia, Firmicutes, Spirochaetes and Bacteroidetes in Antarctica and other cryospheric environments. Representatives of these major phyla were found in ice shelves, sub-glacial lakes and streams, as well as fjords and deep sea basalt flows (Priscu et al., 1998; Carpenter et al., 2000; Bowman et al., 2003; Gaidos et al., 2004; Liu et al., 2006; Perreault et al., 2007; Jungblut and Neilan, 2010).

The bacterial diversity in polar permafrost is considered high, as nearly 40 genera have been isolated or cloned from Arctic and Antarctic permafrost so far (Gilichinsky et al., 2007; Gilichinsky et al., 2008), some of which are diazotrophs. The various genera identified in these regions include: Acinetobacter, Bradyrhizobium, Comamonas, Lysobacter, Methylobacterium, Pseudomonas and Sphingomonas of the Proteobacteria; Bacillus, Clostridium, Paenibacillus, Planococcus and Sporosarcina from the Firmicutes; Flavobacterium and Pedobacter from the Bacteroidetes; and Arthrobacter, Brevibacterium, Corynebacterium, Kocuria, Rhodococcus and Streptomyces from the phylogenetic group Actinobacteria (Soina et al., 1995; Shi et al., 1997; Zhou et al., 1997; Kochkina et al., 2001; Steven et al., 2006; Vishnivetskaya et al., 2006; Steven et al., 2007; Mindlin et al., 2008; Niederberger et al., 2008).

The permafrost environment itself is characterized by temperatures below or equal to 0°C for at least two consecutive years (Muller, 1947) and severe environmental conditions such as extreme cold, high salt concentrations and low nutrient supply (Friedmann et al., 1993; Aislabie et al., 2006; Barrett et al., 2006). Permafrost covers more than 25% of the Earth’s landmass, yet its microbiology remains largely unexplored. Relatively little is known of Antarctic permafrost (Gilichinsky et al., 2007; Cannone et al., 2008; Niederberger et al., 2008) and most current data originate mainly from Siberian permafrost studies (Shi et al., 1997; Bakermans et al., 2003; Vishnivetskaya et al., 2006).

A characteristic of cryospheric environments is that they usually have low bacterial content and that their bacteria are not easy to culture (Christner et al., 2005; Miteva, 2008). They are therefore well suited to molecular based techniques for exploring their bacterial communities, diversity and richness. Terminal Restriction Fragment Length Polymorphism (T-RFLP) is a DNA fingerprinting method that enables one to produce bacterial community profiles and match bacterial genera to specific terminal restriction fragments (T-RFs) after digestion of fluorescently labelled 16S rRNA amplicons with specific restriction enzymes (Liu et al., 1997; Marsh et al., 2000; Derakshani et al., 2001).

T-RFLP has been used widely in microbial ecology studies of temperate zones and in diverse environments such as marine and lake sediments, soils, plant roots and more (Clement et al., 1998; Marsh, 1999; Liesack and Dunfield, 2004). However, to date it has rarely been used for community analysis of permafrost or glacial environments. Bhatia et al. (2006) employed this method to explore the relationship between supra-, sub-, and pro-glacial bacterial communities of the John Evans Glacier (Canada) but did not identify any bacteria. T-RFLP is a quick and sensitive molecular technique for exploring possible bacterial genotypes in a given environmental sample, enabling future studies to target specific groups or genes.

1.6 Halophilic diazotrophs

It is of interest to look into nitrogen fixation and halophilicity from two aspects - halophilic diazotrophs, and nitrogen fixation in hypersaline environments in general. Halophilic micro-organisms require salt in the medium for optimal growth and can be divided into slightly, moderately or extremely halophilic (2-5%, 5-20% and a minimum of 20-30% NaCl in the medium, respectively). Halophiles can be found in the Archaea, Bacteria and Eukaryota domains (DasSarma and Arora, 2006; Ma et al., 2010). Hypersaline environments are generally defined as containing salt in higher concentration than sea water (3.5% total dissolved salts, or 35 PSU). Halophilic micro-organisms have been detected in and isolated from solar saltern ponds, the Great Salt Lake (USA), the Dead Sea (Israel), African soda lakes, Hamelin Pool (Western Australia), deep-sea brines, and many other localities worldwide (Oren, 2002; DasSarma and Arora, 2006; Ma et al., 2010; Goh et al., 2011).
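
The salinity classes above can be expressed as a simple lookup. The following is a minimal sketch and not a method used in this thesis; the function name, the "non-halophilic" label for values below 2%, and the assignment of the shared boundary values (5% and 20%) to the higher class are assumptions made for illustration.

```python
def classify_halophile(nacl_percent: float) -> str:
    """Classify an organism by the NaCl concentration (%) it needs for optimal growth.

    Class limits follow the text: slight 2-5%, moderate 5-20%, extreme 20-30%.
    Assumption: a value on a shared boundary is placed in the higher class.
    """
    if nacl_percent < 0:
        raise ValueError("NaCl concentration cannot be negative")
    if nacl_percent < 2:
        return "non-halophilic"  # assumed label, not defined in the text
    if nacl_percent < 5:
        return "slightly halophilic"
    if nacl_percent < 20:
        return "moderately halophilic"
    return "extremely halophilic"


# Sea water (3.5% total dissolved salts) falls in the slightly halophilic range.
print(classify_halophile(3.5))
```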

Moderate diazotrophic halophiles exist amongst the Cyanobacteria and other prokaryotes (see table 3). Very few extreme halophilic bacteria possess nif genes, and none have been studied extensively in terms of their nitrogen fixation capabilities. The nif genes of Halorhodospira halophila (γ-Proteobacteria) have been mapped, and its nitrogenase was shown to be active and to mediate hydrogen production (Tsuihiji et al., 2006). nifH genes were also reported from H. abdelmalekii and H. halochloris (Tourova et al., 2007) of the family Ectothiorhodospiraceae (γ-Proteobacteria; Chromatiales), which also includes additional halophilic diazotrophic genera - Ectothiorhodospira and Thiorhodospira. These species are slightly to moderately halophilic, and their nitrogen fixation capacity remains largely unexplored (Hirschler-Réa et al., 2003; Imhoff, 2006). None of the known nitrogen fixing Archaea, members of the Methanococcales, Methanomicrobiales and Methanobacteriales, are halophiles (Leigh, 2000).

Nitrogen fixation studies in the Dead Sea (347 g l-1 salinity) have not been conducted to date, though several halophilic micro-organisms with potential for diazotrophy have been isolated. The halophile Rhodospirillum sodomense has been isolated from the Dead Sea, but it lacked the nitrogenase activity usually found in the family Rhodospirillaceae (Madigan et al., 1984; Mack et al., 1993). Another moderate halophile from the Dead Sea, Ectothiorhodospira marismortui, grew very poorly or not at all on N2 (Oren et al., 1989). The Dead Sea represents the most saline environment known to date, and thus the upper salinity limits for nitrogen fixation.

However, even though extremely halophilic diazotrophs seem to be rare, studies of other hypersaline environments clearly indicate that nitrogen fixation occurs under stressful conditions. A few investigations in such environments revealed different dynamics of nitrogen fixation (Pinckney et al., 1995; Paerl et al., 2003; Yannarell et al., 2007).

A microbial mat in a tropical hypersaline lagoon (74‰ salinity) exhibited higher nitrogen fixation rates once introduced to lower salinity levels, from 74 to 37‰ (Pinckney et al., 1995). Additional experiments have reported similar results (Paerl et al., 2003; Yannarell et al., 2007), with the interesting addition that non-cyanobacterial diazotrophs were more sensitive to salinity changes than cyanobacterial diazotrophs (Yannarell et al., 2006). Nitrogen fixation rates were rather similar during dark and light periods until oxygenic photosynthesis was blocked, which caused a large spike in nitrogen fixation rates under light conditions (Pinckney and Paerl, 1997). These results suggested that halophilic anaerobic phototrophic diazotrophs were as important to nitrogen fixation as Cyanobacteria, yet they are more sensitive to changes in salinity, and hence their composition may vary. In another study, hypersaline (78-90‰) microbial mats dominated by Microcoleus chthonoplastes showed high nitrogen fixation rates during night time and low fixation rates during the day (Omoregie et al., 2004b).

Table 3. Representatives of moderately halophilic diazotrophs.

Cyanobacteria: Halothece, Microcoleus chthonoplastes, O. limnetica, O. salina, Oscillatoria neglecta, Phormidium ambiguum, Synechococcus

Chloroflexi: Chloroflexus aurantiacus

Bacteroidetes/Chlorobi group: Chlorobium limicola, C. phaeobacteroides

Proteobacteria: Alkalilimnicola halodurans, Desulfovibrio halophilus, Ectothiorhodospira, Halomonas maura, Marichromatium purpuratum, Rhodospirillum salexigens, Thiocapsa roseopersicina, Thiorhodococcus minor, Thiorhodospira sibirica

References: (Madigan et al., 1984; Yakimov et al., 2001; Oren, 2002; Argandoña et al., 2005; DasSarma and Arora, 2006; Imhoff, 2006; Tsuihiji et al., 2006; Tourova et al., 2007).

This established that the active nitrogen fixers were non-heterocystous Cyanobacteria (Plectonema boryanum, Halothece, Phormidium spp.) and a halophilic anaerobic sulphate reducer similar to Desulfovibrio spp. (Omoregie et al., 2004b). This suggested that the lack of oxygen enabled more diazotrophs to actively fix nitrogen.

Halophilic Bacteria and Archaea adapt to saline conditions mainly via ‘salt in’ or ‘salt out’ strategies, and via cell membrane and proteomic modifications (Pikuta et al., 2007). With the first strategy, a halophile tends to accumulate salt ions (K+, Cl-, Na+) in high concentrations within the cytoplasm - thus creating an internal osmotic pressure to counterbalance the environmental stress (Oren, 1986, 1999). Due to the high concentration of salt ions, the intracellular electrostatic charges of the enzymes change significantly and require further adaptations in enzyme structure and composition to maintain activity and bind water molecules and ions efficiently (Rengpipat et al., 1988; Madern et al., 2000). Oren (1999) states that the ‘salt in’ strategy has been found to date only in the orders Halobacteriales (Archaea) and Haloanaerobiales (Bacteria). In the second strategy, ‘salt out’, a halophile synthesises and accumulates organic compatible solutes such as betaines, ectoines, N-acetylated diamino acids and N-derivatized carboxamides of glutamine in order to maintain an osmotic balance (Galinski and Trüper, 1994). It is suggested that these low molecular weight osmolytes interact with water molecules via their hydrophilic and hydrophobic regions and counteract the ionic imbalance, yet the exact mechanism of their interaction with proteins is still under investigation (Galinski, 1993; Oren, 1999).

In halophilic Archaea, membrane modifications may include specific transport systems to accommodate the import or export of salt ions into the cytoplasm, bacteriorhodopsin (a light driven proton pump, to expel salt ions) and a high content of glycerol isopranoid ether lipids to maintain membrane integrity under high salt concentrations (Yamauchi et al., 1992; Gambacorta et al., 1995; van de Vossenberg et al., 1998; van de Vossenberg et al., 1999; Gliozzi et al., 2002).

Theoretically, proteins in micro-organisms which employ several of these strategies would not require specific adaptations in order to compete with the salt ions for water molecules. Yet, genetic analysis of several halophilic bacterial genomes, known to employ compatible solutes for stress management, has clearly indicated changes in the genetic code and in protein residue composition in comparison to non-halophilic bacterial proteins (Severin et al., 1992; Galinski and Trüper, 1994; Oren, 1999; Paul et al., 2008; Rhodes et al., 2010), suggesting there are specific genetic variations for proteins coping with salt induced stress conditions. The main finding from metagenomic studies of halophilic micro-organisms indicated that halophilic proteins possess more acidic residues (Asp, Glu) on the protein exterior than in their interior or in the active site (Lanyi, 1974; Rao and Argos, 1981; Madern et al., 1995; Madern et al., 2000; Fukuchi et al., 2003).
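
As a rough illustration of the composition-based comparisons described above, the sketch below computes the fraction of acidic residues (Asp, Glu) in a protein sequence given in one-letter code. It is a minimal sketch under stated assumptions: the function name and the toy sequences are hypothetical, and a whole-sequence fraction says nothing about whether the residues lie on the protein exterior or in its interior, which would require structural data.

```python
ACIDIC = frozenset("DE")  # Asp (D) and Glu (E) in one-letter code


def acidic_fraction(protein: str) -> float:
    """Return the fraction of residues in the sequence that are Asp or Glu."""
    seq = protein.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    return sum(1 for aa in seq if aa in ACIDIC) / len(seq)


# Hypothetical toy sequences, for illustration only (not real Fe protein data).
halophile_like = "MDEDEAKL"
non_halophile_like = "MKKLSAKL"
print(acidic_fraction(halophile_like), acidic_fraction(non_halophile_like))
```

Comparing this fraction between homologous proteins from halophilic and non-halophilic organisms gives a first, coarse indication of an acidic shift in composition.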

1.7 Thermophilic diazotrophs

The hot geysers of California were the first terrestrial environment in which a thermophilic Chlamydobacteriales was discovered, in 1866 (Brewer, 1866; Edwards, 1868). Since then, the known temperature boundaries for life have expanded. High temperatures can degrade chlorophyll (>75°C), proteins and nucleic acids (>70°C) and increase the fluidity of membranes, and yet thermophilic Archaea and Bacteria can survive and grow at high temperatures. They can furthermore be divided into moderately thermophilic, which have a growth optimum at 50°-60°C, thermophilic micro-organisms, with an optimum higher than 70°C, and hyperthermophilic, with an optimum higher than 80°C (Rothschild and Mancinelli, 2001; Pikuta et al., 2007).

Microbial mats in hot environments, mainly hot springs, have been studied in regards to their diazotrophic capabilities. A few decades of research into microbial mats from Yellowstone National Park have portrayed the nitrogen fixation dynamics and participants within a wide temperature range (16°-82°C) in this unique environment (Stewart, 1970a; Miyamoto et al., 1979). Within the mats, nitrogen fixation occurs in various layers, during daytime and night. During daytime, the heterocystous Cyanobacteria Mastigocladus laminosus and members of the genus Calothrix were established as the active nitrogen fixers, at 55° and 40°C respectively, in mid layers of the mats (Stewart, 1970a; Miyamoto et al., 1979). Under dark conditions, 14 morphologically diverse sulphate reducing anaerobic diazotrophs were fixing nitrogen at temperatures of 30°-60°C (Wickstrom, 1984). Unicellular Synechococcus spp. have also been identified as active nitrogen fixers at 60°C, while nifHDK gene transcripts were high during sunset and nil when light levels were high and the mat oxic (Steunou et al., 2006). Accordingly, nitrogenase activity (via acetylene reduction) was highest during night time, when the mat was anoxic. It would appear, then, that in the hot springs of Yellowstone National Park, the unicellular Cyanobacteria Synechococcus in the mats’ upper layers, as well as the heterocystous Mastigocladus in mid layers, fix atmospheric nitrogen with temporal differences. Heterotrophic bacteria fix nitrogen during night time, when oxygen levels are low (Steunou et al., 2006; Steunou et al., 2008; Hamilton et al., 2011b).

Roseiflexus spp. have been identified as potential diazotrophs in this system (Klatt et al., 2011) and, recently, a diverse array of nifH phylotypes has been reported from 57 springs, including springs at 89°C, in Yellowstone National Park (Hamilton et al., 2011a). The most frequently recurring phylotypes were identified as Mastigocladus laminosus strain CCMEE 5201 and Synechococcus sp. JA-3-3Ab (Cyanobacteria), and Burkholderia tropica, B. xenovorans LB400 and Dechloromonas sp. SIUL (β-Proteobacteria). Diazotrophic representatives of the Aquificae, α-, γ- and δ-Proteobacteria and Verrucomicrobia were less frequent (Hamilton et al., 2011a). The maximum rates of nitrogen fixation were recorded at 82°C and pH 2.5 by an isolated anaerobic single nifH phylotype related to Leptospirillum ferrooxidans (Hamilton et al., 2011b). This is the highest recorded temperature for nitrogen fixation by a bacterial species. The bacteria Hydrogenobacter thermophilus strain TK-6 and Thermocrinis albus DSM 14484 (Aquificales) possess a nifH gene copy in their respective genomes (NC_013799, CP001931), yet taxonomic studies of these species and others in the order Aquificales have not indicated that they actively fix atmospheric nitrogen (Kawasumi et al., 1984; Huber et al., 1998; Eder and Huber, 2002). In the Archaea, the highest temperature for nitrogen fixation was recorded at 92°C, by a Methanocaldococcus jannaschii-like isolate (Mehta and Baross, 2006) with a nifH gene copy most similar to that of Methanothermococcus thermolithotrophicus, the only other thermophilic archaeon known to fix nitrogen at high temperatures (Belay et al., 1984).

Thermophiles accumulate compounds such as amino acids and sugars (and their derivatives), as well as mannosylglycerate and glucosylglycerate, in response to stress conditions (Borges et al., 2002). At high temperatures these compounds were found to protect enzymes from denaturing or aggregating, thus demonstrating their multipurpose function under heat stress as well as in osmotic adjustment under saline stress (Empadinhas and da Costa, 2010). In addition, proteins from thermophiles have several characteristics which enable them to function at normally damaging temperatures; extremely thermostable enzymes can remain active above 85°C (Pikuta et al., 2007). These features include changes in the primary, secondary and tertiary structural hierarchies, which produce a compact thermophilic protein that is highly complex, relatively short in length and more hydrophobic in nature in comparison to mesophilic or non-thermophilic homologs (Jaenicke and Böhm, 1998; Haney et al., 1999). A higher percentage of charged amino acids (Glu, Lys, Arg), accompanied by fewer uncharged polar residues (Ser, Thr, Asn and Gln) and more salt bridges, provides a network of ionic bonds and hence stability to the tertiary structure (Daniel et al., 2008; Somero, 2003). Additional features reported include shortening of the N- and C-termini, increased amounts of Pro, decreased Gly content, fewer and smaller internal cavities and higher degrees of oligomerisation. Thermostable enzymes are thus more rigid, and need higher temperatures to denature and become inactive (Jaenicke and Böhm, 1998; Somero, 2003; Greaves and Warwicker, 2009).
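
The compositional trend described above (more Glu, Lys and Arg; fewer Ser, Thr, Asn and Gln) can be summarised as a single number per sequence. The sketch below is an illustration only, assuming a one-letter amino acid sequence as input; the function name, the percentage-difference index and the example sequences are assumptions and are not taken from the cited studies.

```python
CHARGED = frozenset("EKR")           # Glu, Lys, Arg
POLAR_UNCHARGED = frozenset("STNQ")  # Ser, Thr, Asn, Gln


def charged_minus_polar(protein: str) -> float:
    """Return (% charged residues) minus (% uncharged polar residues).

    Higher values are in the direction the text associates with
    thermophilic proteins.
    """
    seq = protein.strip().upper()
    if not seq:
        raise ValueError("sequence must not be empty")
    charged = sum(1 for aa in seq if aa in CHARGED)
    polar = sum(1 for aa in seq if aa in POLAR_UNCHARGED)
    return 100.0 * (charged - polar) / len(seq)


# Hypothetical toy sequences, for illustration only.
print(charged_minus_polar("EKRS"))
print(charged_minus_polar("STNQ"))
```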

1.8 Analyzing nitrogen fixation

There are several molecular and chemical methods available to analyze nitrogen fixation and collect relevant data. Dinitrogen fixation rates are usually measured by two techniques - 15N2 uptake and the Acetylene Reduction Assay (ARA) (Stewart, 1967; Stewart, 1973). Potential and active nitrogen fixers are usually determined by extraction of DNA and RNA from an environmental sample or bacteria of choice, followed by Polymerase Chain Reaction (PCR) amplification and analysis (Muyzer et al., 1993).

The molecular approach of analysing nitrogen fixation via DNA or RNA extractions is quite robust and reliable, with few known disadvantages. In general, even though DNA-based methods are considered better for exploring natural microbial diversity than classic culturing techniques (Amann et al., 1995; Head et al., 1998), there are several possible biases generated by DNA extraction methods and PCR kinetics which might affect the objective representation of an uncultured environmental microbial community. Adsorption of DNA to soil particles or to mucilaginous polysaccharides produced by many micro-organisms can inhibit DNA extraction (Frostegard et al., 1999; Tillett and Neilan, 2000). The PCR process may be biased at the selection stage, e.g. higher binding efficiencies to GC rich templates, or at the drift (amplification) stage, resulting in a 1:1 product ratio bias, due to quick amplification of an initially higher concentrated template. This would then result in a biased view of the original sample DNA content and composition (Suzuki and Giovannoni, 1996; Polz and Cavanaugh, 1998). Additional problems in the PCR process (mostly relating to 16S rDNA amplification) include, for instance, PCR chimeras, bias due to PCR cycling conditions, limitations involving primer design and more (Wilson and Blitchington, 1996; Marchesi et al., 1998; Qiu et al., 2001).
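
To make the effect of selection bias concrete, the sketch below uses a deliberately simple exponential model, N = N0 x (1 + efficiency)^cycles, to show how a small difference in per-cycle amplification efficiency between two templates skews the final product ratio. The model, the function name and the efficiency values are assumptions for illustration; the sketch ignores the plateau phase and therefore does not reproduce the drift towards a 1:1 product ratio described above.

```python
def product_ratio(start_ratio: float, eff_a: float, eff_b: float, cycles: int) -> float:
    """Ratio of template A to template B products after a number of PCR cycles.

    Assumes each template amplifies exponentially with a constant per-cycle
    efficiency between 0 (no amplification) and 1 (perfect doubling).
    """
    for eff in (eff_a, eff_b):
        if not 0.0 <= eff <= 1.0:
            raise ValueError("efficiency must be between 0 and 1")
    if cycles < 0:
        raise ValueError("cycles must be non-negative")
    return start_ratio * ((1.0 + eff_a) / (1.0 + eff_b)) ** cycles


# Two templates starting at equal abundance; A binds primers slightly better.
print(product_ratio(1.0, 0.95, 0.90, 30))
```

With these illustrative values, a five percentage point difference in efficiency leaves template A roughly twofold over-represented after 30 cycles, even though both templates started at equal abundance.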

These problems can, however, be circumvented, and molecular techniques to identify diazotrophs in environmental samples, via amplification of the nifH gene specifically, have been successfully implemented and reviewed by the scientific community for at least two decades (Zehr and McReynolds, 1989). Specifically, the nested PCR approach, targeting nifH gene, has been successfully tried and implemented in environmental studies of aqueous origins (marine, fresh water, ice, snow, salt pans, etc) and terrestrial origins (soil, rhizosphere, rocks, etc) under a wide range of physical and chemical conditions (Zehr et al., 1995; Affourtit et al., 2001; Brown et al., 2003; Mehta et al., 2003; Short and Zehr, 2005; Izquierdo and Nüsslein, 2006; Jungblut and Neilan, 2010; Singh et al., 2010).

There were over 38,000 matches for the nifH gene in the NCBI GenBank database as of December 2011, making it a favourable reference gene for use in phylogenetic and genetic studies. The amplified portion of the nifH gene encodes part of the nitrogenase Fe protein and provides insights into the function and structure of this important protein.

1.9 Research aims

It is thus evident that a wide variety of diazotrophs in microbial mats participate in diel cycles of nitrogen fixation, under stressful conditions. While the general dynamics remain similar, the diazotrophic participants are different per extreme environment, and most probably represent an optimal adaptation to the respective environment.

I chose to identify potential diazotrophs from three different environments: Antarctic permafrost, halophilic microbial mats from Western Australia and a thermophilic microbial population from a hot and slightly radioactive spring in South Australia. I have also assessed their adaptation to environmental conditions via changes to the Fe protein, as manifested in the nifH gene.

I aimed to assess the diversity of, and potential for, diazotrophs in Boulder Clay and Amorphous Glacier, two ice-free areas in Terra Nova Bay, Antarctica. I employed molecular and computational methods which included environmental DNA extraction, amplification of the bacterial 16S rDNA and Terminal Restriction Fragment Length Polymorphism (T-RFLP) analysis, followed by an in-depth analysis with the T-RFLP Analysis Program (TAP). This allowed for a diversity and structure analysis, with preliminary results as to which diazotrophs are present in these unique sites.

The question of nitrogen fixation in the Shark Bay environment has never been addressed before. I chose to employ a molecular approach which included environmental DNA extraction and PCR amplification of the nifH gene, followed by clone libraries, restriction fragment length polymorphism, DNA sequencing and phylogenetic sequence analysis (Zehr et al., 1998; Omoregie et al., 2004c). I was able to characterise diazotrophs in samples obtained in two different years, and to assess the diversity of and structural changes to the bacterial community, as well as potential halophilic adaptations in the Fe protein of the stromatolite diazotrophs.

Paralana Hot Springs (55.6°C), a hot spring in South Australia, was investigated before for its bacterial community (Anitori et al., 2002), but nothing is known in regards to its diazotrophic diversity. I used the same research procedure as described for the Shark Bay environment. I was able to compare the diazotrophic community characteristics to other thermal microbial systems and assess potential thermal adaptations in the Fe protein of the springs’ diazotrophs.

Specific research aims were -

1. Estimation of bacterial diversity and identification of potential nitrogen fixers using T-RFLP community analysis and PCR amplification of the nifH gene in glacial and permafrost formations in Northern Victoria Land, Antarctica (chapter 2).

2. Assessment of diazotrophic diversity, richness and community structure in stromatolites in Shark Bay, Western Australia from two different years (1996 & 2004, chapter 3).

3. Assessment of diazotrophic diversity, richness and community structure in Paralana Hot Springs, South Australia (chapter 4).

4. Analysis of molecular data from aims 2 and 3, and investigation of potential adaptations in the Fe protein composition and structure (chapter 5).

Overall, extreme environments harbour novel solutions for biotechnology, as well as analogous conditions to environments on other worlds. The overall objective of this thesis was to contribute information in regards to diazotrophs in extreme environments and how they adapt to their environment.

Chapter 2 T-RFLP analysis of potential diazotrophs in glacial and permafrost formations in Northern Victoria Land, Antarctica.

______

2.1 Introduction

Antarctica has been the focus of microbial research for some time now, due to its extreme climate and pristine conditions. Until a few decades ago, glacial formations and permafrost areas on the Antarctic continent were seen as abiotic systems. However, new data are emerging that indicate microorganisms live within cryospheric geological features. Diverse bacterial compositions have been described from recent and ancient permafrost (Rivkina et al., 2004, and references within). Bacteria were found in ice cores from Lake Vostok, Mizuho Base in the Enderby Land Mountains, and the Yamato Mountains in Dronning Maud (Christner et al., 2001; Segawa et al., 2010). Nitrospira isolates, for instance, were detected in Luther Vale soil samples in Northern Victoria Land, and also in sediment cores at 761 m.b.s.l. from the Mertz Glacier Polynya (MGP), Antarctica (Bowman and McCuaig, 2003; Aislabie et al., 2009). Antarctic microbial populations have changed our view of the continent as abiotic, and substantial research has identified mainly Proteobacteria members, as well as Firmicutes, Cytophaga-Flavobacteria-Bacteroidetes (CFB group), Actinobacteria and Deinococcus members, as functioning successfully under the stressful conditions of cold and desiccation (see also chapter 1, section 1.5). Our study focused on two localities in the Terra Nova Bay area, Northern Victoria Land (see figure 1). Past microbial studies in this area analysed various ecological niches, such as soil and seawater from coastal and terrestrial stations (Nicolaus et al., 1991; Nicolaus et al., 1996; Bargagli et al., 2004; Pepi et al., 2005). Some 140 bacterial isolates were identified and characterised using molecular tools, such as 16S rDNA amplification, fluorescence in situ hybridization (FISH) and clone libraries, as well as culture-dependent methods (Michaud et al., 2004; Yakimov et al., 2004; Lo Giudice et al., 2007).
Spore-forming Bacilli species were identified from a seawater sample in Rod Bay, and Alicyclobacillus has been isolated from geothermal soils on Mount Melbourne (Nicolaus et al., 1998; Pepi et al., 2005). Burkholderia, a cold-tolerant, hydrocarbon-degrading soil bacterium, was also found in sea water samples from Rod Bay (Yakimov et al., 2004). In addition, clones affiliated with Burkholderiales were found in soil samples from a Northern Victoria Land locality (Niederberger et al., 2008). Pseudomonas, a Gram-negative, aerobic bacterium known to inhabit cold marine ecosystems, was detected in sea water samples from Santa Maria Novella and Rod Bay (Yakimov et al., 2004; Lo Giudice et al., 2007). These studies revealed that diverse communities exist in this area, composed principally of the Proteobacteria, Bacteroidetes, Firmicutes and Actinobacteria bacterial groups. It is unknown whether diazotrophic communities exist in the Terra Nova Bay area, and only a few genera from these studies are known to have the nifH gene (see table 1). Interestingly, no representative of the Cyanobacteria phylum has been reported from the Terra Nova Bay studies so far.

Table 1. Potential nitrogen fixers in Terra Nova Bay area, see references in text.

Phylum: Genus

α-Proteobacteria: Loktanella, Sulfitobacter, Methylobacterium, Paracoccus, Sphingomonas

β-Proteobacteria: Burkholderia

γ-Proteobacteria: Stenotrophomonas, Halomonas, Pseudomonas

Firmicutes: Bacillus, Paenibacillus

Actinobacteria: Micrococcus, Arthrobacter, Microbacterium

In general, relatively few micro-organisms are culturable (Amann et al., 1995) and due to the low bacterial content in polar ice core samples and difficulties in culturing them (Christner et al., 2005; Miteva, 2008), investigating bacterial content in ice cores requires the use of highly sensitive techniques. Terminal Restriction Fragment Length Polymorphism (T-RFLP) is a sensitive, affordable and applicable method used mainly for estimating the diversity of bacterial communities.

Briefly, this method amplifies 16S rDNA templates of a target community using PCR (Clement et al., 1998), with one primer carrying a fluorescent label. Digestion of the amplicons by restriction endonucleases produces a population of fluorescently labelled terminal fragments (‘T-RFs’, length in base pairs). The fluorescent fragments are detected using sequencing electrophoresis technologies and are visualized as peaks - each peak represents a fragment, post-digestion (Marsh, 1999; Blackwood et al., 2003). The general assumption in this method is that the height of a peak represents the abundance of a fragment: the more of fragment X that is present, the stronger the signal detected and the higher the peak (Marsh, 2005). It should be noted that the method provides a quantitative and detailed view of the PCR product pool derived from a community, and does not accurately reflect the native community structure (Moeseneder et al., 1999).
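
The expected T-RF length for a known sequence can be predicted in silico, which is how T-RFs are matched to candidate genera. The sketch below is a minimal illustration, assuming the fluorescent label is on the 5' end of the sequence as given and that digestion is complete; the function name and the toy sequence are hypothetical. HaeIII recognises GGCC and cuts between GG and CC, hence the cut offset of 2.

```python
def terminal_fragment_length(amplicon: str, site: str, cut_offset: int) -> int:
    """Length in bp of the 5'-labelled terminal restriction fragment.

    amplicon   -- sequence read 5'->3' from the labelled primer
    site       -- recognition sequence of the enzyme (e.g. 'GGCC' for HaeIII)
    cut_offset -- number of bases of the site retained on the 5' fragment
    If the site is absent, the whole amplicon is the terminal fragment.
    """
    position = amplicon.upper().find(site.upper())
    if position == -1:
        return len(amplicon)
    return position + cut_offset


# Toy sequence (not a real 16S rDNA amplicon); only the first GGCC matters.
toy_amplicon = "ATATGGCCTTAAGGCCAT"
print(terminal_fragment_length(toy_amplicon, "GGCC", 2))
```

Only the site nearest the labelled end determines the detected fragment, which is why different taxa sharing that first site produce the same T-RF.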

This method has been employed successfully in cryospheric environments. A T-RFLP analysis of John Evans Glacier reported 142 T-RFs from 141 DNA preparations with HaeIII digestion, suggesting that a relatively low number of T-RFs was recovered from each sample preparation from the glacier (Bhatia et al., 2006). An ice core taken from Lake Vostok, at 3589 m depth, produced 12 fragments after a universal bacterial 16S rDNA amplification and digestion (Priscu et al., 1999). T-RFLP was also used in an extensive study of the microbial diversity in lithic niches (sandstones, quartz, soil, etc.) in the McKelvey Valley, McMurdo Dry Valleys, and revealed a complex community of bacteria and eukaryota (Pointing et al., 2009).

Biases involved in environmental DNA extraction and in primer annealing to different templates during PCR mean that certain DNA sequences (or T-RFs) are preferentially retrieved from a sample (Liesack and Dunfield, 2004). Therefore a particular T-RF cannot be compared to a different T-RF in a single profile. However, it is possible to compare a T-RF to itself over different samples, and likewise a matched pair of T-RFs can be compared to itself over different samples (Osborn et al., 2000). In the end, the list of T-RFs in a sample is a profile of the bacterial community present in the environmental sample.

The study area is located in Northern Victoria Land, Antarctica, close to Mario Zucchelli station (74° 41′ 36.96″ S, 164° 6′ 42.12″ E), previously known as Terra Nova Bay station. In general, the climate is cold with a mean annual temperature of -14°C (Frezzotti et al., 2001) and mean monthly air temperature ranging between -26°C and 0°C. Average precipitation is 270 mm/year water equivalent in snow (Piccardi et al., 1994). Two sites in the study area were the focus of research - “Amorphous Glacier” (74°41′25″ S, 164°00′ E) and “Boulder Clay” (74°44′45″ S, 164°01′17″ E), which are two small ice-free areas characterised by debris cones (Guglielmin et al., 2002; Guglielmin and French, 2004).

Although in close proximity, Amorphous Glacier is above the Pleistocene grounding line and Holocene in age, whereas Boulder Clay is below the grounding line, with sediments likely of a glacial-marine origin and dated to the Late Pleistocene (Orombelli et al., 1991). These novel sites have been extensively studied for their isotopic composition, mechanisms of ice distribution and formations (Guglielmin et al., 1997; Gragnani R et al., 1998; French and Guglielmin, 1999a; Guglielmin and French, 2004) yet to date, their microbial and diazotrophic aspects remain unknown. We therefore proceeded to test if bacterial DNA could be obtained from ice and permafrost cores of the Amorphous Glacier and Boulder Clay areas, and whether bacterial community profiles differ between these two distinct sites by way of terminal-restriction fragment length polymorphism (T-RFLP) analysis. Knowledge of microbial life existing in ice not only improves our understanding of the taxonomic diversity, richness and biogeography of cold-adapted microorganisms, but also assists in evaluating the metabolic requirements for survival and proliferation of life in the cryosphere, and in defining the actual limits of life.

Figure 1. Antarctic study sites. Left: location of the two study sites, Amorphous Glacier and Boulder Clay. Upper right pane: View of the perennially frozen lake and debris cone in Amorphous Glacier (Guglielmin et al., 2002). Lower right pane: Frost mound at Boulder Clay (Guglielmin and French, 2004). Reproduced with permission from author.


2.2 Materials and methods

2.2.1 Study sites

Amorphous Glacier is located west of Mario Zucchelli Station (MZS) between 250 and 290 m above sea level (see figure 1). The summit of the cone is partially collapsed and its debris cover consists of 70-80% of light grey granitic gravel, with some granite boulders being more than 1 m in diameter. Ice within it represents congelation ice derived from ground waters formed under different thermodynamic conditions (Guglielmin et al., 2002). The age of the cone is relatively recent within the Holocene. The ice core stratigraphy has revealed several layers, based on crystallographic characteristics (C-axes, bubble density, crystal size) and chemical analyses (Guglielmin et al., 2002). These layers are summarised in table 2.

The Boulder Clay site is located south of Mario Zucchelli Station (MZS) in an ice-free area, 205 m above sea level (Guglielmin and French, 2004). The mean annual air temperature is -13.8°C, the mean annual ground temperature at the surface (2 cm depth) is -16.1°C and at the permafrost table (30 cm depth) it is -16.5°C. The mean annual temperature of the deepest monitored layer (3.6 m, within the ice) is -17°C (Guglielmin and Cannone, 2012).

In the Boulder Clay area, an ablation till of late-glacial age overlies a body of buried glacier ice (Guglielmin et al., 1997; Gragnani R et al., 1998; Guglielmin and French, 2004), and surface features include perennially ice-covered ponds with icing blisters and frost mounds (French and Guglielmin, 2000), frost-fissure polygons and debris islands (French and Guglielmin, 1999b). The frost blister is younger than 1020 ± 70 BP, while the till that generally covers the surface of the Boulder Clay area is attributed to the Late Pleistocene and in particular to the Ross Sea I glaciations (Orombelli et al., 1991). The analysed frost mound formed during the late Holocene, in the middle of a perennially ice-covered lake, which is located on the sublimation till overlying the buried Pleistocene relict glacier ice (Guglielmin et al., 2009).

2.2.2 Ice core collection

Two ice cores were obtained during the austral summer in 1996 (Guglielmin et al., 2002) with slow rotary drilling equipment, without any chemical solutions, antifreeze liquid or drilling fluid, in order to minimize possible contamination. A 237 cm long ice core was extracted from the debris cone of Amorphous Glacier (AM), placed in polyethylene bags and stored at MZS at -25°C (Guglielmin et al., 2002). The Boulder Clay (BC) core was 375 cm long and sampled from a shallow perennially frozen pond through the underlying sediment into the moraine-covered glacial ice. Cores were transported to Milan-Bicocca University, Italy, and stored at -40°C for further processing. Both cores contained several distinct layers (table 2). Amorphous Glacier was previously characterised chemically and isotopically (Guglielmin et al., 2002).

2.2.3 Sample preparation

Samples were aseptically cut from the ice cores in a -40°C room and stored there on dry ice by a former member of the lab. Internal parts of the cores were cut with an electric saw (repeatedly washed with ethanol) and stored in sterile Falcon tubes after the surface was washed with 70% ethanol. BC samples contained a mixture of ice, stones and shells due to their glacial-marine origin. These samples were crushed with an ethanol-washed hammer. Two duplicates from each sample were taken and stored in sterile Falcon tubes for further amplification and T-RFLP analysis.

2.2.4 DNA extraction and amplification

Samples were thawed overnight at 4°C and always kept in the dark. AM samples were then filtered through a sterile 0.22 μm membrane (Millipore). The flow-through was collected in sterile Falcon tubes, lyophilised and resuspended in 1 mL sterile buffer. Filters were washed with 1 mL TNE buffer to recover bacterial cells. DNA was extracted from the filter and flow-through fractions using a previously described protocol (Burns et al., 2004) with a modified incubation step with proteinase K (10 mg ml-1) and SDS (10%) to give a final concentration of 100 μg ml-1 proteinase K in 0.5% SDS, for 1 h at 37°C, and finally resuspended in 50 μL sterile water.

BC samples (400 μL) were added to 500 μL TNE buffer, and DNA was extracted as described above. All BC and AM samples were resuspended in a final volume of 50 μL sterile Milli-Q water. The DNA concentration was measured using a NanoDrop ND-1000 Spectrophotometer. The presence of bacterial DNA, as well as the quality of the extracted DNA and the presence of PCR inhibitors, was tested by universal bacterial 16S rDNA PCR using unlabelled 27F and 1494R primers. To amplify the 16S rDNA fragment for the T-RFLP analysis, PCR was performed with a labelled universal forward primer 27F (6-FAM, carboxyfluorescein, 5’ AGAGTTTGATCCTGGCTCAG) and universal reverse primer 1494R (5’ TACGGCTACCTTGTTACGAC) in a 50 μL reaction (1X reaction buffer, 0.2 mM each dNTP, 0.25 mM MgCl2, 0.2 μM primers, 0.8 U Taq polymerase). After an initial denaturing step at 92ºC for 2 min, 30 cycles of amplification followed (92ºC for 20 sec, 50ºC for 30 sec, 72ºC for 1 min), concluding with an extension step at 72ºC for 7 min. DNA extracted from a microbial mat sample from Brack Pond (McMurdo Ice Shelf, Antarctica) was used as a positive control, while filter-sterilized water was used as a negative control. Positive results from the PCR were verified by 2% agarose gel electrophoresis and ethidium bromide staining prior to UV transillumination. Samples were kept in the dark at all times.

Table 2. Ice core sections and layers description from Amorphous Glacier and Boulder Clay samples. n.d. - no data.

Sample   Ice core section (cm, depth)   El. Cond. (μS cm-1, 20°C)   pH (20°C)   Cl- (μeq L-1)   SO4 2- (μeq L-1)   Layer description (Guglielmin et al., 2002)

Amorphous Glacier
AM-18    0-22       124     6.48   983.14   125.26   Active layer composed of loose sandy gravel with fine material increasing with depth.
AM-3     75-79      19.5    5.73   10.77    155.45   Massive ice with high bubble density, elongated and big crystals; chemicals maximum concentration peaked in sinusoidal cycles every 60 cm in depth.
AM-21    265-272    59.95   6.71   347.4    7.62     Massive ice with an intermediate bubble density, less elongated and smaller crystals; sinusoidal chemicals cycles were not present.

Boulder Clay
BC-1     0-15       n.d.    n.d.   n.d.     n.d.     Dynamic active layer in a small debris cone (0-30 cm depth changes).
BC-T     325-330    n.d.    n.d.   n.d.     n.d.     Massive ice and brine pockets within the ice (from a frost mound in perennial frozen pond).
BC-B     370-375    n.d.    n.d.   n.d.     n.d.     Massive ice and brine pockets within the ice (from a frost mound in perennial frozen pond).

Data taken from (Abramovich et al., 2012).


2.2.5 Terminal Restriction Fragment Length Polymorphism (T-RFLP)

Quadruplicates of the AM-3, BC-1, BC-T and BC-B samples were analysed, as well as triplicates of the Brack Pond mat sample. Approximately 150 ng of each FAM-labelled PCR product was digested with 6 U of the restriction endonuclease MspI or 3 U of ScrFI (New England Biolabs). Digestions were carried out in a total volume of 10 μL overnight at 37ºC following the manufacturer’s instructions.

The size of each Terminal Restriction Fragment (“T-RF”) was determined according to the GeneScan™ 1200 LIZ® size standard on an ABI 3730 Capillary sequencer (Applied Biosystems Inc.) with an acceptable error of ± 0.5 bp and also analysed using Peak Scanner™ Software Version 1.0 (Applied Biosystems Inc). T-RFs were visualized as peaks in GeneScan™, which are characterised by width (base pairs) and height (arbitrary fluorescence units as a linear representation of the abundance of a specific T-RF in the PCR pool). Height is therefore a qualified estimation of the original amount of a specific DNA fragment in a sample, prior to the PCR process (Marsh, 2005). Here, the absolute peak height was not used as a measure of bacterial abundance, since PCR fragment levels could have originated from process biases (Suzuki and Giovannoni, 1996).

Little background noise was evident in the electropherograms, affording an unambiguous selection of valid T-RFs with a minimum height of 35 fluorescent units (Liesack and Dunfield, 2004). T-RFs over 35 fluorescent units in intensity and present in at least two replicates were selected for further analysis. For comparative analysis, T-RFs within an electropherogram were normalized to the total height of that sample (Dunbar et al., 2000) and T-RFs with a relative height of less than 1% of the total height were excluded from further analysis. T-RFs with peak heights determined to be off-scale by GeneScan™ were also excluded from further analysis, unless present in other replicates at lower heights, in which case these T-RFs were adjusted to the lower height value (Dunbar et al., 2001).

Identical T-RFs in replicates were aligned and grouped after manually inspecting electropherograms. The size assigned to a group of similar T-RFs was based on averaging their sizes. Similarly, the relative height assigned to a group of similar T-RFs was based on averaged normalized relative height values. A few T-RFs were separated by only 1 base pair but were shown to be identical peaks after manual inspection of electropherograms. These included T-RFs 80, 81 and 82, and T-RFs 145 and 146, which were collectively assigned sizes 81 and 145, respectively.
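The peak selection and normalisation steps described above can be illustrated with a minimal Python sketch. It is not the procedure used in this study, where electropherograms were inspected manually; the function name, the data layout (one dictionary of T-RF size to peak height per replicate) and the choice to normalise to the summed height of the peaks that pass the 35-unit cut-off are assumptions made for illustration only.

```python
def mean_relative_heights(replicates, min_height=35, min_replicates=2,
                          min_relative=0.01):
    """Return {T-RF size: mean relative height} for one sample.

    replicates: list of dicts mapping T-RF size (bp) to peak height
    (fluorescence units), one dict per replicate electropherogram.
    """
    # Step 1: drop peaks below the minimum height in every replicate.
    kept = [{size: h for size, h in rep.items() if h >= min_height}
            for rep in replicates]

    # Step 2: keep only T-RFs seen in at least `min_replicates` replicates.
    counts = {}
    for rep in kept:
        for size in rep:
            counts[size] = counts.get(size, 0) + 1
    reproducible = [s for s, n in counts.items() if n >= min_replicates]

    # Step 3: normalise each peak to the total height of its replicate
    # (assumption: total of the peaks retained in step 1), then average
    # over the replicates in which the peak occurs.
    profile = {}
    for size in sorted(reproducible):
        values = [rep[size] / sum(rep.values()) for rep in kept if size in rep]
        mean_rel = sum(values) / len(values)
        # Step 4: discard T-RFs below the relative height threshold (1%).
        if mean_rel >= min_relative:
            profile[size] = mean_rel
    return profile


# Hypothetical peak heights, for illustration only.
example = [{81: 600, 145: 300, 553: 100, 20: 30}, {81: 500, 145: 500}]
print(mean_relative_heights(example))
```

In this hypothetical input, the 20 bp peak is removed by the height cut-off and the 553 bp peak is removed because it occurs in a single replicate.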


2.2.6 T-RFLP profiles

The presence of similar T-RFs in each profile was the basis for the community comparison between AM-3, BC-1, BC-T and BC-B (Dunbar et al., 2000). The T-RF list of each sample was considered a community profile and the similarity between Boulder Clay and Amorphous Glacier samples was assessed. A binary data set was created based on presence or absence of T-RFs from all samples. Bray-Curtis analysis was performed on presence-absence data using the PAleontological STatistics program (PAST) (Hammer et al., 2001). The Venn diagram was calculated with the online program Venny (Oliveros, 2007).
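For presence-absence data the Bray-Curtis similarity reduces to twice the number of shared T-RFs divided by the summed number of T-RFs in the two profiles. The sketch below only illustrates this index; the analysis itself was run in PAST, and the function name and the T-RF sizes in the example are hypothetical.

```python
def bray_curtis_similarity(profile_a, profile_b):
    """Bray-Curtis similarity for presence-absence data.

    Each profile is a collection of T-RF sizes present in a sample.
    For binary data the index equals 2 * shared / (n_a + n_b).
    """
    a, b = set(profile_a), set(profile_b)
    if not a and not b:
        return 1.0  # two empty profiles are treated as identical
    return 2 * len(a & b) / (len(a) + len(b))


# Hypothetical T-RF sizes (bp), for illustration only.
profile_x = {50, 100, 150, 200}
profile_y = {100, 150, 300}
print(round(bray_curtis_similarity(profile_x, profile_y), 3))
```

Two of the T-RFs are shared here, so the similarity is 2 x 2 / (4 + 3), or about 0.571.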

2.2.7 T-RFLP Analysis Program (TAP)

T-RFs from all profiles were matched to the in silico digestions performed by the TAP software on 16S rRNA genes present in the Ribosomal Database Project release 9, update 57 (Marsh et al., 2000; Cole et al., 2003). The software produced terminal restriction fragments after taking into account the PCR primer binding sites and the restriction enzyme excision sites (MspI and ScrFI), producing a database which contained a list of T-RFs and bacteria divided into phyla, genera and species, which were then manually matched to the T-RFs observed in each Antarctic sample.
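The principle of an in silico digestion can be sketched in a few lines of Python. This is not the TAP implementation; it is a simplified illustration that assumes the input sequence starts at the 5' end of the labelled forward primer and reports the distance to the first cut site. The recognition sites used are C^CGG for MspI and CC^NGG for ScrFI; the function name and the toy sequence are hypothetical.

```python
import re

# Recognition pattern and number of bases of the site that remain on the
# 5' (labelled) side of the cut: MspI cuts C^CGG, ScrFI cuts CC^NGG.
ENZYMES = {
    "MspI": (re.compile("CCGG"), 1),
    "ScrFI": (re.compile("CC[ACGT]GG"), 2),
}


def terminal_fragment_length(amplicon, enzyme):
    """Length (bp) of the 5' terminal fragment after digestion.

    amplicon: sequence read 5'->3' from the labelled primer end.
    Returns the full amplicon length if the enzyme has no site.
    """
    pattern, offset = ENZYMES[enzyme]
    match = pattern.search(amplicon.upper())
    if match is None:
        return len(amplicon)
    return match.start() + offset


# Toy sequence, for illustration only (not a real 16S rDNA amplicon).
toy = "ATATATCCGGATATCCAGGAT"
print(terminal_fragment_length(toy, "MspI"))   # first CCGG starts at index 6
print(terminal_fragment_length(toy, "ScrFI"))  # first CCNGG starts at index 14
```

A predicted fragment length obtained this way for every database sequence gives the kind of lookup table against which observed T-RF sizes can be matched.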

2.2.8 RDP 9, TAP and T-RFLP databases

T-RFs from all profiles were used for putative bacterial identification, based on the list the TAP software produced (section 2.2.7). We compared three 16S rDNA databases in terms of their phylogenetic composition: RDP (release 9, update 61), RDP 9 after TAP performed an in silico digestion with MspI and ScrFI, and a third database which was based on the samples profiles from the T-RFLP analysis. The first two databases provided a reference point for the third database in terms of taxa distribution.

2.2.9 PCR amplification of nifH genes

PCR amplification of nifH genes could not be carried out, mainly due to lack of source material after optimising our methodology as described in sections 2.2.4 - 2.2.5.


2.3 Results and discussion

Molecular fingerprinting analysis based on the bacterial 16S rDNA allows us to determine the presence of bacteria in environmental samples and their community profiles (Marsh et al., 2000). We obtained DNA with concentrations ranging from 0.29 to 88.02 ng μL-1, with the highest concentration from sample BC-1 (table 3). DNA yields and bacterial cell counts would, however, be required to determine if the different DNA concentrations are due to changes in the distribution of bacteria in the ice cores. DNA did not degrade under the specified storage conditions and the partial 16S rDNA was successfully amplified from samples BC-1, BC-T, BC-B and AM-3. Table 3 summarises DNA yields and results of 16S rDNA amplification from the study sites, and figure 2 presents representative electropherograms from each sample. The failure of amplification from samples AM-18 and AM-21 could be due to a combination of low DNA concentration and degraded DNA (Rivkina et al., 2004), as we did not detect PCR inhibition in the extracted nucleic acids.

Table 3. DNA yields and results of 16S rDNA amplification from Amorphous Glacier and Boulder Clay samples. +, successful amplification; -, no amplification.

Sample   Ice core section (cm, depth)   DNA (ng μL-1)   Amplification 16S rDNA

Amorphous Glacier
AM-18    0-22                           0.29            -
AM-3     75-79                          2.44            +
AM-21    265-272                        2.47            -

Boulder Clay
BC-1     0-15                           88.02           +
BC-T     325-330                        3.48            +
BC-B     370-375                        9.29            +

In general, the number of T-RFs from the ice core samples was lower in comparison to T-RFLP studies from non-cryospheric environments. For example, a bioreactor study reported 69 T-RFs (McGuinness et al., 2006) and a cucumber roots study reported about 32 T-RFs using MspI digestion (Tiquia et al., 2002).

However, the number of T-RFs from this study is within the range of results from other cryospheric environments (Priscu et al., 1999; Bhatia et al., 2006). It would seem T-RFLP studies of icy environments tend to produce relatively low numbers of T-RFs, suggesting restricted bacterial diversity in these environments.


Peaks that accounted for about a third of the total fluorescence (peak height) in any profile were usually small T-RFs (38 base pairs or lower), while longer fragments were generally one fifth or less of the total fluorescence in any profile. Small fragments were not marked as off-scale, and were also present in replicates. They may be the result of primer dimers (Liesack and Dunfield, 2004; Marsh, 2005) or may have originated from unidentified bacteria. In this study, they were excluded from downstream analysis because they were considered most probably primer dimers that did not reflect true bacterial community diversity.

2.3.1 Amorphous Glacier and Boulder Clay T-RFLP Profiles

T-RFLP analysis identified 18 T-RFs from MspI and ScrFI digestions (table 4). Four T-RFs were found in all sites. There were 11 unique T-RFs and the majority were detected in the AM sample (figure 3A). The BC-B and BC-1 ice-core samples had one and three unique T-RFs, respectively, but no unique T-RFs were identified in BC-T, which also only had five T-RFs in total. BC-T and BC-B bacterial community profiles were most similar to one another and both were more similar to BC-1 than to Amorphous (table 4, figure 3).

The relative peak height of T-RFs can indicate their relative abundance within the bacterial communities. In the bacterial profiles analysed here, 88 per cent of T-RFs from the MspI digestion had a relative peak abundance of less than 10 per cent (table 4). The greatest peak abundance was T-RF size 553 from the AM-3 sample, with 39.1 per cent. In the ScrFI digestion, T-RF size 81 was the most abundant fragment, with a relative abundance of 32.5 per cent (BC-B and BC-1 samples), and 76 per cent of T-RFs had a relative peak abundance below 10 per cent.

This could suggest that some taxa within the bacterial communities may dominate the overall abundance of the community profiles. Bray-Curtis similarity cluster analysis (figure 3B) suggested that Boulder Clay T-RF profiles were similar to each other but clustered separately from the AM-3 ice-core sample. Two possible explanations for these results are the brine pockets in Boulder Clay, with high salt concentrations created due to partially melted ice with hypersaline water intrusions, and the penetration of bacteria from the top to lower layers via liquid water or micro-channels in the ice.

Figure 2. T-RFLP electropherograms after MspI digestion. A - AM-3, B - BC-T, C - BC-B, D - Brack Pond, E - BC-1.

Table 4. Number and relative peak abundances (%) of T-RFs > 40 bp, following MspI and ScrFI digestions of ice core samples and a 1% relative height threshold. The most abundant T-RF in a sample is marked bold.

T-RFs (bp)   Relative peak abundance (%) in AM-3, BC-T, BC-B, BC-1

MspI digestion
43      +(1.3)
48      +(1.3)
73      +(1.6) +(2.0)
81      +(5.6) +(6.7) +(10.1) +(7.8)
145     +(1.6) +(1.2)
147     +(1.2)
148     +(1.7)
149     +(1.3)
279     +(2.4)
538     +(1.3)
553     +(39.1)
1205    +(1.7)
Sum     6 2 4 5

ScrFI digestion
43      +(3.1) +(1.3) +(1.6)
76      +(1.2) +(1.3) +(1.2) +(1.6)
81      +(15.6) +(16.2) +(32.5) +(32.4)
116     +(1.3)
145     +(2.7) +(2.5) +(2.1) +(3.1)
796     +(3.6)
Sum     6 3 4 4

Total   12 5 8 9

We proceeded to evaluate diversity without implementing a 1% relative height threshold, in order to gain more data points, and we assessed each digestion separately (table 5). According to the ScrFI digestion results and similarity comparison, the majority of T-RFs detected in Boulder Clay were shared between its profiles - BC-1, BC-T and BC-B. Eighty-two and 88% of BC-1 T-RFs were shared with BC-T and BC-B, respectively (ScrFI digestion). In addition, 93% and 100% of BC-T T-RFs were shared with BC-1 and BC-B, respectively. Most T-RFs (88-100%) from all Boulder Clay profiles were detected in the AM-3 profile, which had about 3 times more T-RFs (76) in total than the other profiles.

The shared T-RFs between BC-1, BC-B, BC-T and AM-3 amounted to a fifth (20%, 22%) of the AM-3 profile; therefore 80% of AM-3 DNA fragments were different from the Boulder Clay ice core contents, according to ScrFI digestion results. MspI digestion produced more T-RFs for each profile and therefore fewer T-RFs were shared between profiles. Twenty-one and 34% of BC-1 T-RFs were shared with BC-T and BC-B, respectively. BC-T shared 65% and 62% of its T-RFs with BC-1 and BC-B, respectively, yet 73% of BC-T T-RFs were shared with AM-3. Additionally, BC-1 shared 35% of T-RFs with AM-3 while BC-B shared 42% of T-RFs with AM-3, altogether suggesting that Boulder Clay DNA fragments were also present in AM-3.

Figure 3. (A) Venn diagram illustrating the number of T-RFs per bacterial profile. (B) Bray-Curtis cluster analysis of 16S rDNA T-RF profiles from Amorphous Glacier (AM) and Boulder Clay (BC) obtained from 1000 bootstraps.


Table 5. Cross-profile analysis based on shared T-RFs between AM-3, BC-1, BC-T and BC-B.

ScrFI digestion
Ice core profiles   BC-1 (17)(a)   BC-T (15)   BC-B (19)   AM-3 (76)
BC-1                -              93          79          20
BC-T                82(b)          -           79          20
BC-B                88             100         -           22
AM-3                88             100         89          -

MspI digestion
Ice core profiles   BC-1 (82)      BC-T (26)   BC-B (57)   AM-3 (94)
BC-1                -              65          49          31
BC-T                21             -           28          20
BC-B                34             62          -           26
AM-3                35             73          42          -

a The total number of T-RFs counted for a specific profile is displayed in brackets. The T-RF count was done prior to implementing the 1% relative height threshold to produce as many data points as possible for the analysis.
b Total number of T-RFs varied between profiles. The result in each cell is the percentage of shared T-RFs between two profiles, with respect to the column profile.
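Because each percentage in table 5 is expressed relative to the column profile, the same pair of profiles yields two different values. A minimal Python sketch of this calculation follows; the function name and the T-RF identifiers are placeholders. The profile totals of 17 and 15 are those given for BC-1 and BC-T under ScrFI digestion, while the shared count of 14 is an assumption chosen because it reproduces the tabulated 82% and 93%.

```python
def percent_shared(row_profile, column_profile):
    """Percentage of the column profile's T-RFs also found in the row profile."""
    column = set(column_profile)
    if not column:
        raise ValueError("column profile contains no T-RFs")
    return 100 * len(set(row_profile) & column) / len(column)


# Placeholder T-RF identifiers: 17 T-RFs in one profile, 15 in the other,
# 14 of them in common (the shared count is an illustrative assumption).
bc1_like = set(range(17))
bct_like = set(range(14)) | {100}

print(round(percent_shared(bct_like, bc1_like)))  # relative to the 17-T-RF profile
print(round(percent_shared(bc1_like, bct_like)))  # relative to the 15-T-RF profile
```

The asymmetry explains why the small Boulder Clay profiles can be almost entirely contained in AM-3 while accounting for only about a fifth of the AM-3 profile.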

Additionally, as observed in the ScrFI digestion analysis, the shared T-RFs between BC-1, BC-B, BC-T and AM-3 still amounted to a relatively small portion of the AM-3 profile (31%, 20% and 26%), even though the number of T-RFs in Boulder Clay profiles was considerably higher (82, 26, 57) in comparison to the ScrFI digestion.

In summary, with or without a 1% relative height threshold, AM-3 was the most diverse sample and differed from Boulder Clay samples, and few DNA fragments were shared between all sites. Additionally, Boulder Clay samples were similar to one another, and clustered separately from the AM-3 ice-core sample. Amorphous Glacier and Boulder Clay differ lithologically and in their geological ages (Holocene vs. Late Pleistocene), and therefore most probably support different microbial populations.

2.3.2 In silico database composition

The in silico process of the TAP program is based on the RDP 16S rRNA sequences database (Marsh et al., 2000; Cole et al., 2007). It was of interest to compare the outcome of the in silico digestion to the composition and size of the original RDP database. If the in silico digestion produced a seriously skewed bacterial taxa representation of the original RDP database bacterial composition, the T-RFLP profiling would be biased as well and would be only partially representative of the bacteria in the samples.


The RDP 9 (release 61) database contained a total of 180,642 bacterial sequences (figure 4, A). Of these, 33.5% were affiliated with Proteobacteria, 31.9% with Firmicutes, 12.7% with Bacteroidetes (CFB group) and 8.8% with Actinobacteria (Cole et al., 2007; Cole et al., 2009). Another 31 phyla were present in the database, 26 of which each accounted for less than 1% of the sequences in the database, while , Cyanobacteria, Spirochaetes, Verrucomicrobia and unclassified bacteria were each present at slightly more than 1%.

The second database was produced in silico by the TAP program and contained 30,781 sequences (figure 4, B). Major groups included Proteobacteria (34.8%), Firmicutes (32.7%), Bacteroidetes (14%, CFB group) and Actinobacteria (8.4%), similar to the RDP 9 database distribution. An additional 28 phyla were present in this database. Cyanobacteria and unclassified bacteria were >1%, while 26 other phyla had lower proportions.

In terms of size, the TAP database was only 17% of the RDP 9 database, yet its phylogenetic composition was similar to that of the RDP 9 database. Therefore, the TAP program produced a representative database for the downstream process.

Figure 4. Databases phylogenetic compositions. (A) Bacterial phylogenetic composition of RDP 9 16S rRNA gene sequence database; (B) Bacterial phylogenetic composition based on TAP in-silico digestion with ScrFI and MspI; (C) Bacterial phylogenetic composition from all digested samples, after T-RFs were assigned bacterial identification.


The third database, based on the T-RFLP analysis (figure 4, C), contained fewer bacterial sequences than the TAP or RDP 9 databases and varied in the composition of the phylogenetic groups. It contained potential cryospheric bacteria from the analysed samples, and consisted of 650 sequences in total. Four major phylogenetic groups were represented: Proteobacteria (39.1%), Firmicutes (22.2%), Actinobacteria (13.2%) and Bacteroidetes (CFB group) (12.5%). Acidobacteria, Cyanobacteria, Planctomycetes, Spirochaetes and unclassified bacteria were also present with > 1% sequence abundance in the database and eight additional phyla were present with < 1%.

Thus the T-RFLP database, generated in this study, contained seventeen phyla vs. the 32 and 35 phyla identified in the TAP and RDP 9 databases, respectively, with proportional shifts within the four major phylogenetic groups - an increase in the Proteobacteria and Actinobacteria sequences, a decrease in the Firmicutes, and no substantial changes within the Bacteroidetes (CFB group).

We then continued to further analyse the T-RFLP profiles of each sample in order to gain an overview of putative phylotypes (Pointing et al., 2009). Because the TAP database contained all the possible T-RFs emerging specifically from using the MspI and ScrFI restriction enzymes, we normalized the individual profile of each sample to the TAP database (table 6). An average ratio value above 1 indicated a higher portion of a specific phylum relative to the original TAP database. Across all samples, for instance, there was a higher portion of Proteobacteria, Firmicutes, Actinobacteria and Nitrospira (average ratios of 1.12, 1.03, 1.04 and 13.43, respectively) in the samples than in the original TAP database. Conversely, Bacteroidetes (CFB group), Cyanobacteria and unclassified bacteria had an average ratio < 1 (0.47, 0.44 and 0.78, respectively), indicating a lower portion of these phyla in each profile compared to the TAP database.
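The average ratio can be reproduced from the percentages listed in table 6 for the four major phyla: each profile percentage is divided by the corresponding TAP percentage and the resulting ratios are averaged. The short Python sketch below is only an illustration of that arithmetic; the variable and function names are hypothetical, and because it uses the rounded percentages printed in the table, ratios for phyla with very small TAP percentages may not be reproduced exactly.

```python
# Percentages of sequences per phylum, as listed in table 6.
TAP = {"Proteobacteria": 34.8, "Firmicutes": 32.7,
       "Bacteroidetes": 14.0, "Actinobacteria": 8.4}

PROFILES = {
    "AM-3": {"Proteobacteria": 42.1, "Firmicutes": 17.5,
             "Bacteroidetes": 13.9, "Actinobacteria": 16.3},
    "BC-T": {"Proteobacteria": 51.2, "Firmicutes": 32.6,
             "Bacteroidetes": 5.8, "Actinobacteria": 7.0},
    "BC-B": {"Proteobacteria": 33.6, "Firmicutes": 25.6,
             "Bacteroidetes": 4.8, "Actinobacteria": 8.0},
    "BC-1": {"Proteobacteria": 28.6, "Firmicutes": 58.9,
             "Bacteroidetes": 1.8, "Actinobacteria": 3.6},
}


def average_ratio(phylum):
    """Mean of (profile % / TAP %) over the profiles containing the phylum."""
    ratios = [profile[phylum] / TAP[phylum]
              for profile in PROFILES.values() if phylum in profile]
    return sum(ratios) / len(ratios)


for name in TAP:
    print(name, round(average_ratio(name), 2))
```

For these four phyla the rounded results are 1.12, 1.03, 0.47 and 1.04, matching the average ratios reported in the text.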

Generally, the Amorphous Glacier (AM-3) sample retained 16 phyla (TAP originally projected 32 phyla) and it had the largest number of phylogenetic groups compared to BC-T, BC-1 and BC-B (table 6). Cross-profile analysis also suggested that Amorphous Glacier shared common T-RFs with Boulder Clay (table 5), and the TAP in silico projection established that these common T-RFs were probably related to Proteobacteria, Firmicutes, Bacteroidetes (CFB group), Actinobacteria, Cyanobacteria and Nitrospira.

Except for OP10 and the groups, all other phyla associated with AM-3 have previously been found in Antarctic lakes and microbial mats and in other cryospheric environments - alpine permafrost and Siberian permafrost-affected soils (Franzmann and Dobson, 1992; Brambilla et al., 2001; Bowman et al., 2003; Sheridan et al., 2003; Miteva et al., 2004; Bai et al., 2006; Wagner et al., 2009).

Table 6. Distribution of potential phyla groups within each profile based on TAP database of 16S rDNA sequences.

Columns: Phylum (a); TAP database sequences (%) (b); profiles' individual T-RFLP databases: AM-3 (%) (c), BC-T (%), BC-B (%), BC-1 (%); Average ratio (d)

Proteobacteria          34.8   42.1   51.2   33.6   28.6   1.12
Firmicutes              32.7   17.5   32.6   25.6   58.9   1.03
Bacteroidetes           14.0   13.9   5.8    4.8    1.8    0.47
Actinobacteria          8.4    16.3   7.0    8      3.6    1.04
Cyanobacteria           3.6    1.6    1.6    0.44
Unclassified bacteria   2.0    1.4    1.8    0.78
Acidobacteria           0.9    2.0    (2.24)(e)
Chloroflexi             0.6    0.6    (0.99)
Verrucomicrobia         0.6    0.6    (1.06)
Spirochaetes            0.5    24     (45.32)
Planctomycetes          0.5    1.2    (2.48)
Deinococcus-Thermus     0.2    1.0    (4.30)
Nitrospira              0.2    0.8    3.5    2.4    5.4    13.43
Fusobacteria            0.2    0.4    (2.40)
Thermotogae             0.1    0.4    (3.39)
Chlorobi                0.05   0.2    (4.07)
OP10                    0.05   0.2    (4.07)

a A partial list of phyla present in the TAP database post in-silico MspI and ScrFI digestion.
b The ratio of each phylum in the TAP database.
c The ratio of each phylum in the T-RFLP profiles: AM, BC-T, BC-B and BC-1.
d The ratio of each phylum was divided by the ratio of its corresponding phylum in the TAP database. The resulting ratios were then averaged.
e “(X)” not an average, value based on one profile only.

2.3.3 Amorphous Glacier and Boulder Clay cryospheric bacteria

Phyla level analysis provided a broad overview and we proceeded to review the data at the genus level. This would further correlate our results to current microbial cryospheric data, and in particular, to findings from the Terra Nova Bay area. A genus was denoted ‘cryospheric’ based on published reports of sequence data with 95% or higher 16S rDNA sequence similarity to a specific genus isolated from cold environments (Everett et al., 1999). This was deemed necessary in order to narrow down the possible diazotrophic candidates, and after applying the 95% sequence similarity criteria, 65 genera were excluded from the analysis (data not shown) while 38 genera passed the criteria.

The mean number of T-RFs following MspI digestion was 10.4 ± 1.7 for an individual profile and a total of 26 different 16S rDNA T-RFs were observed from AM-3, BC-1, BC-T and BC-B combined. Of these 26 T-RFs, 16 were singular peaks (appeared in one profile only) with relative heights between 1.2% and 39.1% of the total fluorescence (T-RF 553 in AM-3 profile, table 4).

Five T-RFs were observed after MspI digestion in most profiles (T-RFs 30, 31, 35, 38 and 81). However, the first four of these T-RFs were considered primer dimers and discarded from further analysis. T-RF 81 had a relative average height of 8.8% and had an in-silico identification (TAP Projected T-RFs, TPTs) of Bacilli spp. (Firmicutes) and the Nitrospira class. Clones of both taxa were reported from cryospheric environments (Bakermans et al., 2003; Gilichinsky et al., 2007; Steven et al., 2007), and more specifically from Rod Bay (Terra Nova Bay area) and Northern Victoria Land (Yakimov et al., 2004; Aislabie et al., 2009). Two T-RFs following MspI digestion (73 and 145, table 4) were observed in 60% of the profiles with relative heights below 4%, indicating that these fragments were not abundant after the PCR amplification process. Only T-RF 145 had an in-silico bacterial match to various phylogenetic groups - , Firmicutes and Actinobacteria.

The mean number of T-RFs following ScrFI digestion was 8.6 ± 2 per profile and a total of 13 different 16S rDNA T-RFs were observed across all profiles. Three out of the 13 T-RFs were singular peaks. Four T-RFs following ScrFI digestion (43, 76, 81 and 145) were observed in more than 80% of profiles and were associated in-silico with Proteobacteria, Actinobacteria, Firmicutes, Bacteroidetes (CFB group) and green sulphur bacteria. T-RF 81 was the most abundant fragment with an average relative height of 30.4 ± 16.2%. Its TAP Projected T-RF (TPT) was associated in-silico with nine different genera. Alicyclobacillus has been isolated from Mount Melbourne, Northern Victoria Land (Nicolaus et al., 1998; Pepi et al., 2005) and Bacillus is a common find in polar environments, as mentioned previously. Burkholderia and Pseudomonas were reported previously from Rod Bay and Santa Maria Novella (Yakimov et al., 2004; Lo Giudice et al., 2007). In addition, clones affiliated with Burkholderiales were recently found in soil samples in Northern Victoria Land (Niederberger et al., 2008). Of the remaining five genera associated with T-RF 81, three (Rikenella, Terrimonas and Sporolactobacillus) have not been detected in cryospheric environments, yet Leptospirillum and Rhodanobacter were reported previously in the cryospheric literature (Spirina et al., 2003; Vishnivetskaya et al., 2006).

Three unique T-RFs were observed only in the AM-3 profile after ScrFI digestion (18, 116 and 796), with respective relative heights of 1.3%, 1.3% and 3.6%. T-RF 116 did not have an in-silico bacterial match. T-RF 796 was associated with the genera Sporolactobacillus and Streptococcus from the Bacilli class. Sporolactobacillus has not been detected in cryospheric environments to date, while Streptococcus has been discussed above. In-silico matches to all T-RFs within samples are listed in table 7.

A total of 20 putative diazotrophic genera were identified based on the analysis (table 7), 8 in Boulder Clay samples and 17 in Amorphous Glacier samples, none relating to Cyanobacteria. Of these diazotrophic bacteria, some have been previously reported from the Terra Nova Bay area (table 1): Arcobacter, Burkholderia, Halomonas, Methylobacterium and Pseudomonas (Proteobacteria), Bacillus and Paenibacillus (Firmicutes), and Arthrobacter (Actinobacteria).

Table 7. Genera and diazotrophic cryospheric bacteria in AM-3, BC-T, BC-B and BC-1. Shaded rows indicate genera previously found in Terra Nova Bay area.

Potential diazotroph N/Y (a) | Genera | AM-3 BC-T BC-B BC-1 | Cryospheric references (b)

Proteobacteria
N | Acidovorax | + | (Priscu et al., 1999; Liu et al., 2006; Lo Giudice et al., 2007)
Y | Aeromonas | + | (Gilichinsky et al., 2007)
Y | Afipia | + | (Priscu et al., 1999)
N | Alteromonas | + + | (Feller et al., 1992; Gauthier et al., 1995) Now Pseudoalteromonas haloplanktis
Y | Arcobacter | + | (Bowman and McCuaig, 2003; Yakimov et al., 2004)
Y | Bradyrhizobium | + | (Zhou et al., 1997; Sheridan et al., 2003; Xiao et al., 2007)
Y | Burkholderia | + + | (Christner et al., 2000; Yakimov et al., 2004)
N | Coxiella | + | (Bowman and McCuaig, 2003)
Y | Delftia | + | (Gaidos et al., 2004; Skidmore et al., 2005; Xiao et al., 2007)
Y | Desulfobacterium | + | (Ravenschlag et al., 1999; Bowman and McCuaig, 2003)
N | Erythrobacter | + | (Yakimov et al., 2004)
N | Gallionella | + + | (Skidmore et al., 2005)
Y | Halomonas | + | (Bowman et al., 1997; Xiang et al., 2004; Yakimov et al., 2004)
N | Hydrogenophaga | + | (Liu et al., 2006; Amato et al., 2007)
N | Lysobacter | + | (Vishnivetskaya et al., 2006; Steven et al., 2007)
N | Marinobacter | + + + + | (Brinkmeyer et al., 2003; Lysnes et al., 2004; Yakimov et al., 2004)
Y | Mesorhizobium | + | (Bowman and McCuaig, 2003)
Y | Methylobacterium | + | (Yakimov et al., 2004; Miteva and Brenchley, 2005; Zhang et al., 2007b)
Y | Pelobacter | + | (Brambilla et al., 2001; Sjöling and Cowan, 2003)
Y | Pseudomonas | + + + | (Michaud et al., 2004; Yakimov et al., 2004; Lo Giudice et al., 2007)
N | Shewanella | + + + | (Bowman et al., 2003; Yakimov et al., 2004; Lo Giudice et al., 2007)
N | Sphingobium | + | (Xiang et al., 2005)
Total for each sample | 20 3 6 3

Firmicutes
Y | Bacillus | + + + + | (Nicolaus et al., 1996; Yakimov et al., 2004; Steven et al., 2007)
Y | Clostridium | + | (Brambilla et al., 2001; Gilichinsky et al., 2005; Vishnivetskaya et al., 2006)
N | Lactobacillus | + | (Segawa et al., 2005; Sundset et al., 2007)
Y | Paenibacillus | + | (Bargagli et al., 2004; Pepi et al., 2005; Mindlin et al., 2008)
N | Streptococcus | + | (Gaidos et al., 2004; Segawa et al., 2005)
N | Syntrophococcus | + | (Sundset et al., 2007)
Total for each sample | 6 1 1 1

Bacteroidetes
N | Algoriphagus | | (Van Trappen et al., 2004)
N | Bacteroides | + + | (Sheridan et al., 2003)
N | Flavobacterium | + | (Bowman et al., 1997; Gilichinsky et al., 2007)
N | Hymenobacter | + | (Hirsch et al., 1998)
Total for each sample | 2 1 1 0

Actinobacteria
Y | Arthrobacter | + | (Bargagli et al., 2004; Michaud et al., 2004; Lo Giudice et al., 2007)
N | Corynebacterium | + | (Gilichinsky et al., 2007; Lo Giudice et al., 2007)
Y | Curtobacterium | + | (Miteva and Brenchley, 2005)
Y | Frankia | + + | (Christner et al., 2000)
N | Janibacter | + | (Michaud et al., 2004; Miteva et al., 2004; Lo Giudice et al., 2007)
Y | Micrococcus | + | (Petrova et al., 2002; Xiang et al., 2005; Steven et al., 2007)
N | Mycobacterium | + | (Miteva et al., 2004)
N | Nocardiopsis | + | (Abyzov et al., 1983)
N | Rhodococcus | + | (Michaud et al., 2004; Yakimov et al., 2004; Lo Giudice et al., 2007)
Y | Streptomyces | + + | (Kochkina et al., 2001; Zhang et al., 2002; Mannisto and Haggblom, 2006)
Total for each sample | 8 0 4 0

Verrucomicrobia
N | Prosthecobacter | + | (Bowman and McCuaig, 2003)
Total for each sample | 1 0 0 0

Deinococcus-Thermus
N | Thermus | + | (Sheridan et al., 2003)
Total for each sample | 1 0 0 0

(a) Y denotes a genus with species which possess a copy of the nifH gene, based on NCBI genomic databases.
(b) An abbreviated list of cryospheric references. Where possible, only references which presented clones with 95% and higher 16S rDNA sequence similarity to a specific genus were included.

2.4 Concluding remarks

Boulder Clay and Amorphous Glacier are two ice-free areas in Terra Nova Bay, Antarctica, which differ in their geological origins and physico-chemical properties. Here they were assessed, for the first time, for their microbial content and biodiversity. To gather first evidence of the bacterial communities in these glacial zones, we carried out terminal-restriction fragment length polymorphism (T-RFLP) analysis of 16S rDNA, using a universal bacterial amplification protocol, on two permafrost cores.

Microbial diversity differed between Boulder Clay and Amorphous Glacier and between the different layers of Boulder Clay. Bray-Curtis cluster analysis suggested that the Boulder Clay bacterial profiles were similar to each other, but clustered separately from the Amorphous Glacier bacterial profile. With our current data it is not possible to ascertain definitively whether the difference in geological age, or other properties that distinguish one site from the other, is responsible for this result.
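As an illustration of the dissimilarity measure underlying the cluster analysis mentioned above, the following minimal Python sketch computes the Bray-Curtis dissimilarity between two T-RFLP profiles. The profile names and relative peak heights are invented for illustration and are not data from this study; the clustering software actually used may also differ in how it pre-processes peaks.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two profiles.

    Each profile is a dict mapping T-RF size (bp) to relative peak height.
    Returns 0.0 for identical profiles and 1.0 for profiles sharing no T-RF.
    """
    trfs = set(profile_a) | set(profile_b)
    # Sum of the lesser height for every T-RF (0 when a T-RF is absent).
    shared = sum(min(profile_a.get(t, 0.0), profile_b.get(t, 0.0)) for t in trfs)
    total = sum(profile_a.values()) + sum(profile_b.values())
    if total == 0:
        raise ValueError("both profiles are empty")
    return 1.0 - 2.0 * shared / total


# Hypothetical relative peak heights (%), for illustration only.
profile_x = {43: 20.0, 76: 30.0, 81: 50.0}
profile_y = {76: 30.0, 81: 40.0, 145: 30.0}

d = bray_curtis(profile_x, profile_y)
print(f"Bray-Curtis dissimilarity: {d:.2f}")
```

Profiles that cluster together have low pairwise dissimilarities, whereas a profile that clusters separately has consistently higher values against the others.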

Our analysis suggested that the microbial population of the Boulder Clay active layer was less diverse than those of the other layers at this site. This may be due to two factors: (A) vertical water movement, which permitted micro-organisms to penetrate into deeper permafrost layers rather than remain at the surface, and (B) hypersaline brine pockets that remained liquid at very low temperatures, providing basic conditions for the survival of the microbial community.

Another finding of this analysis was that the Amorphous Glacier sample potentially included 38 cryospheric genera. Boulder Clay and Amorphous Glacier possibly shared the following genera: Gallionella, Burkholderia (Betaproteobacteria), Alteromonas, Marinobacter, Pseudomonas, Shewanella (Gammaproteobacteria), Bacillus (Gram-positive), Bacteroides (CFB group), Frankia and Streptomyces (high G+C Gram-positive). Each of these phylotypes includes species that are psychrotolerant to psychrophilic and microaerophilic. These phylotypes were either detected in marine environments or proven to be tolerant to NaCl salt stress, which is not surprising considering the connection between salt tolerance and cold resistance in bacterial survival mechanisms (Dawson and Gibson, 1987; Deming, 2002; Yakimov et al., 2004; Lo Giudice et al., 2007; Pumbwe et al., 2007).

The Amorphous Glacier sample also potentially included 20 nitrogen-fixing genera, based solely on the known presence of a nifH gene copy in their genomes. Unfortunately, we were unable to characterise this community further, owing to a lack of source material after a lengthy optimisation of the T-RFLP analysis.

This preliminary work suggested the presence of a common group of cold- and desiccation-resistant bacteria, some of which might be nitrogen fixers. In general, our molecular analysis provided relatively few data points, and the bacterial identifications are by no means definitive; further sampling and analysis would be required. Such research would help confirm the community composition and correlate it with the geological and habitat characteristics of Amorphous Glacier and Boulder Clay.


Chapter 3 Diazotrophic diversity in columnar stromatolites of Shark Bay, Western Australia

3.1 Introduction

Biologically accessible nitrogen is essential for a thriving microbial community. Potential nitrogen fixers (diazotrophs) have not been investigated to date in the columnar stromatolites of Shark Bay, Western Australia. Nothing is known of the nitrogen cycle in Shark Bay’s modern stromatolite community, and it is of interest to compare the characteristics of Shark Bay’s diazotrophic community with those of other comparable hypersaline microbial systems. The aim of this study was to ascertain and characterise the diazotrophic community in columnar stromatolites from Hamelin Pool in Shark Bay (figure 1).

Shark Bay is a 14,000 km² World Heritage site off the central coast of Western Australia (24°-26° S, 113°-114° E). It is a semi-enclosed embayment comprising two long, narrow reaches: Freycinet Reach (105 km long, 20–35 km wide) and Hopeless Reach (35 km long, 40 km wide). The reaches are separated by the Peron Peninsula, and the mean water depth is 10 m (Smith and Atkinson, 1983). Shallow, carbonate-rich banks exist in both reaches, effectively creating bays 1-2 m deep with relatively low oceanic water influx, which are regulated mainly by diurnal to semi-diurnal tidal processes (Burling et al., 2003).

Faure Sill, a major seagrass-covered sand bank, divides Hopeless Reach into two unequally sized bays - L'Haridon Bight and Hamelin Pool. The seawater temperature within the bay varies between 17°C in August and a maximum of 27°C in February (Bureau of Meteorology, 2011). The low oceanic circulation within the bay, the low intermittent rainfall (<200 mm y⁻¹) and the high evaporation rates (>2000 mm y⁻¹) form a NW-SE salinity gradient from oceanic to hypersaline levels: 35-40 ppt, then 40-56 ppt (metahaline) and up to 56-70 ppt, almost twice that of sea water (O'Leary et al., 2008).

Figure 1 Shark Bay, Western Australia. Image Google Earth. Salinity values are in parts per thousand (ppt), modified from O’Leary (2008).

Within Shark Bay’s shallow, hypersaline pools exist vast banks of benthic microbial deposits known as stromatolites (Riding, 1999; Jahnert and Collins, 2011). The word stromatolite is derived from the Latin word stroma, meaning bed covering, and the Greek word strōmat, meaning spreading out, as well as lithos, meaning stone. Geologists have identified fossilized stromatolites in the rock record for more than 200 years (Walter, 1976). The oldest fossilized stromatolite was identified in the Dresser Formation of the Pilbara subgroup, Western Australia, dated at 3.496 Ga, from the Archaean (Schopf, 2006). Very few stromatolite structures have been found in Archaean rocks (which are less well preserved in the rock record), while most structures to date have been found in rocks from the Proterozoic and Phanerozoic eons, 2.5 Ga to the present (Bertrand-Sarfati and Walter, 1976; Krylov et al., 1976; Serebryakov and Walter, 1976a, b; Schopf, 2006).

The ancient origins of stromatolitic deposits have led to Shark Bay’s modern stromatolites being referred to as “living fossils”, and have marked them as important to our understanding of the origin of life on Earth. Shark Bay entered UNESCO’s World Heritage List in 1991. As stated in the nomination, the foremost justification for the inclusion was that the Hamelin Pool stromatolites represented an ancient life form still in existence, and that Hamelin Pool would be the classic site for the study of these “living fossils” (UNESCO, 1991).

There are five main stromatolite morphologies known to exist in Shark Bay - pustular, smooth, cerebroid, microbial pavement, and columnar or colloform microbial deposits (Hoffman, 1976; Jahnert and Collins, 2011). Pustular mats, smooth mats and columnar morphotypes are the three best known and studied (Logan, 1961; Logan et al., 1974; Hoffman and Walter, 1976; Playford et al., 1976; Burns et al., 2004; Allen et al., 2009; Burns et al., 2009). Pustular mats are irregular, coarsely fenestrated, non-laminated mats, usually found in the upper intertidal zone (figure 2, C). Smooth mats, in contrast, are finely laminated, with distinct, well-defined layers, and are usually found in the lower intertidal zone. Columnar stromatolites are usually found in the intertidal to subtidal zones and exist down to a depth of 1-2 m below the sea surface. Club shaped with a coarsely laminated internal texture, they are highly calcified and are up to 1.5 m in height with spherical tops (Hoffman, 1976; Playford et al., 1976; Jahnert and Collins, 2011).

Figure 2 Three stromatolite morphotypes from Shark Bay (A) columnar (B) smooth (C) pustular

The various stromatolite morphologies represent an interplay between environmental and microbial processes. Environmental factors have been suggested to control the external morphological development of a stromatolite structure. These factors include, for instance, wave energy, tide amplitudes, water levels and turbulence, sand waves, sediment grain size and hard/soft substrates (Logan, 1961; Logan et al., 1964; Logan and Cebulski, 1970; Logan et al., 1970; Playford et al., 1976; Dupraz et al., 2009). The active microbial component, in turn, produces repetitive cemented grain layers and internal laminae, mainly by precipitating aragonite micro crystals and repeatedly trapping and binding sediment particles (Andres and Reid, 2006). Biotic and abiotic factors thus come together to promote the outgrowth of stromatolites on both a micro- and macro-scale.

Microbiologists took a keen interest in the microbial component of the stromatolites and investigated them using microscopic, culturing techniques, and more recently, molecular methodologies. Table 1 provides a summarised view of the microbial agents identified in stromatolites from Hamelin Pool and Guerrero Negro, Baja California Sur, Mexico, a highly similar saline environment, discussed later in this chapter.

Combining microscopic and molecular tools has provided researchers with a qualitative and quantitative view of the microbial communities residing in stromatolite mats. In the past, microscopic observations of the stromatolite mats identified mainly cyanobacterial species - Microcoleus chthonoplastes, Entophysalis deusta, Schizothrix fuscescens and Leptolyngbya spp. (Logan, 1961; Golubic and Walter, 1976), and there is no record of their nitrogen fixation potential or actual rates under local conditions. PCR-based research has increased the number of identified microorganisms several fold, with members of the Archaea, Bacteria and Eukaryota detected in stromatolite samples, as can be seen in table 1. Classification of the functional groups in stromatolites has identified Archaea as involved in fermentation (mainly methanogenesis), cyanobacteria as oxygenic photosynthetic producers and nitrogen fixers, and diatoms as oxygenic photosynthesisers (Paerl et al., 2000; Des Marais, 2003; Dupraz and Visscher, 2005). Aerobic heterotrophs, anoxygenic phototrophs, sulphate reducers and sulphide oxidizers from the Proteobacteria, Actinobacteria, Firmicutes and Bacteroidetes groups are apparently involved in several overlapping processes: fermentation, denitrification, nitrogen fixation and sulphur reduction/oxidation, all of which are tightly bound to the light levels and oxygen/sulphur profiles within the mats (Paerl et al., 2000; Des Marais, 2003; Dupraz and Visscher, 2005; Goh et al., 2008; Allen et al., 2009; Burns et al., 2009).


Table 1. Stromatolite related microorganisms (genus level) from published studies. Potential diazotrophs which contain nifH gene are highlighted in bold.

Environment | Hamelin Pool, Shark Bay (a) | Guerrero Negro, Baja California Sur, Mexico (b) | Hamelin Pool, Shark Bay (c)

Bacteria Allochromatium Aphanothece Acidobacterium* Entophysalis Chlorobium Alcanivorax Microcoleus Chloroflexus Alteromonas Leptolyngbya Chlorothrix Arthrospira* Schizothrix Chromatium Bacillus Chroococcidiopsis Cellulomonas Cyanothece Chroococcidiopsis Desulfobacter Chroococcus Desulfobacterium Cyanothece* Desulfococcus Cytophaga Desulfovibrio Dermocarpella Euhalothece Desulfatibacillum Gloeocapsa Euhalothece Halospirulina Gloeocapsa* Halothece Gloeothece* Leptolyngya Halobacillus Lyngbya Halomicronema Microcoleus Halomonadaceae Oscillatoria Halomonas Phormidium Halothece Pseudanabaena Idiomarina Spirulina Leptolyngbya Synechocystis Flexistipes* Lyngbya Marinobacter Marinomonas Microcoleus Myxosarcina* Nitrococcus* Oscillatoria Phormidium Planococcus Planctomyces Pleurocapsa* Pontibacillus Porphyrobacter Prochloron Pseudoalteromonas Pseudomonas Rhodomicrobium Rhodopseudomonas Rhodospirillum* Rhodovibrio Rhodobacter Roseobacter Salinimonas Spirulina * Stanieria*


Symploca Synechococcus* Synechocystis Vibrio Virgibacillus Xenococcus Archaea Halococcus Haloferax Halobacterium* Methanosarcina* - - Halogeometricum Nitrosopumilus Cenarchaeum Fusobacterium*

* 16S rDNA sequence similarity was 90% - 95% to a designated genus. Their identification should therefore be cautiously accepted (Everett et al., 1999).
(a) Data collected with microscopic methods only (Logan, 1961; Bauld et al., 1986).
(b) Data collected with microscopic and molecular methods (Javor and Castenholz, 1981; Risatti et al., 1994; López-Cortés et al., 2001; Ley et al., 2006).
(c) Data collected by microscopic and molecular methods (Bauld et al., 1986; Burns et al., 2004; Papineau et al., 2005; Goh et al., 2006; Allen et al., 2008; Allen et al., 2009).

Diazotrophs have not been investigated to date in columnar stromatolites. In general, the identification of nitrogen fixers has advanced considerably with the introduction of molecular techniques. These techniques are based on DNA and RNA extractions, as well as on the polymerase chain reaction (PCR; Mullis and Erlich, 1988), and are considered better suited to exploring natural microbial diversity (Amann et al., 1995; Head et al., 1998). The possible biases generated by DNA extraction methods and PCR kinetics were briefly discussed in the introduction chapter, section 1.8.2, and have been addressed in this study (see the methods section).

The aim of this study was to use molecular methodology to ascertain and characterise diazotrophs in columnar stromatolites. Important nitrogen fixers in stromatolites from other environments were usually cyanobacterial representatives from the Nostocales, Chroococcales and Oscillatoriales, as well as anaerobic, sulphate reducing G-Proteobacteria representatives (Steppe et al., 2001; Fourçans et al., 2004; Jenkins et al., 2004; Omoregie et al., 2004a; Omoregie et al., 2004b; Jungblut et al., 2005; Yannarell et al., 2006; Desnues et al., 2007; Leuko et al., 2007).

Nothing is known of the nitrogen cycle in Shark Bay’s stromatolitic community, and it is of interest to compare Shark Bay’s diazotrophic community characteristics to other comparable hypersaline microbial systems in order to broaden our understanding of the nitrogen fixation processes occurring within extant stromatolites and by extrapolation, processes which might have occurred in extinct stromatolites.


3.2 Materials and methods

3.2.1 Sample collection and sample sites

Sampling was conducted by former lab students in the intertidal region of Hamelin Pool at Telegraph Station at low tide in Shark Bay, Western Australia in 1996 and May 2004 (26°24’03” S, 114°09’36.1” E, figure 3).

Intertidal columnar stromatolite pieces were collected by cracking the stromatolite with a geological hammer about 2 cm from the top of the stromatolite, or by collecting a whole small stromatolite. All samples were placed in sterile specimen bags and kept at 4°C during transport back to the laboratory, where they were stored in the dark at 4°C until further processing. All samples were collected and handled with sterile instruments throughout the course of the study. No other environmental data were collected.

Figure 3 Left: Map of Shark Bay region. Inset image shows Shark Bay’s location on the west coast of Australia. Image copyright GeoScience Australia. Right: Low tide at Hamelin Pool, Telegraph station, showing columnar stromatolites. Image copyright Torben Rübke, 2006.

3.2.2 DNA isolation and PCR amplification of nifH genes

Samples were processed and DNA extracted within two weeks of storage at 4°C, in order to avoid potential DNA degradation. A rock hammer washed with 70% ethanol and flamed was used to break small chunks out of the stromatolite specimens from 2004 and 1996. Fragments of approximately 1 cm³ were ground to a fine paste using a sterile mortar and pestle. Genomic DNA was extracted by the method of Neilan (1995). Approximately 100 mg of fine paste was transferred to a 1.5 ml Eppendorf tube and suspended in 567 μL TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0), to which 30 μL 10% SDS and 3 μL Proteinase K (10 mg ml⁻¹) were added. The samples were incubated at 37°C for 3 h with intermittent shaking. An additional step of 5 cycles of freezing at -40°C and thawing was added to ensure complete cell lysis. Next, 100 μL of 5 M NaCl was added to the lysate and mixed thoroughly before the addition of 80 μL CTAB solution (10% w/v cetyltrimethylammonium bromide in 0.7 M NaCl), and the mixture was incubated at room temperature overnight (Wilson, 2001).

An equal volume of phenol:chloroform:isoamyl alcohol (25:24:1) was added to the supernatant and mixed thoroughly before centrifugation at 14,000 g for 5 min at room temperature (RT). The top layer of the supernatant was transferred to a fresh tube and the DNA precipitated in 50% isopropanol and 0.4 M potassium acetate. The samples were incubated at RT for 30 min or at 4°C overnight, and then centrifuged at 14,000 g for 5 min to pellet the DNA. The supernatant was discarded and the pellet washed with 70% ethanol, air dried and resuspended in 30 μL of sterile MilliQ water.

Two replicates of genomic DNA extracted from the 1996 or 2004 samples (5 ng μL⁻¹) were used in a nested PCR to amplify the nitrogenase gene nifH (Omoregie et al., 2004c). The first PCR in the nested approach was performed using 0.3 units of Taq polymerase (Sigma-Aldrich, St. Louis, MO) in a 20 μL reaction mix containing 2.5 mM MgCl₂, 1x Taq-Polymerase reaction buffer, 0.2 mM dNTPs (Fisher Biotec, WA, Australia), 2 pM of each of the primers NifH3 (5' ATR TTR TTN GCN GCR TA 3') and NifH4 (5' TTY TAY GGN AAR GGN GG 3') (Zani et al., 2000), 1 μL of genomic DNA (5 ng μL⁻¹) and sterile MilliQ water to a total volume of 20 μL. Thermal cycling was performed in a GeneAmp PCR System 2400 Thermocycler (Perkin Elmer, Norwalk, CT). Thermal cycling conditions for the amplification of bacterial nifH genes were as follows: an initial denaturation step at 94°C for 4 min was followed by 30 cycles of DNA denaturation at 94°C for 1 min, primer annealing at 55°C for 1 min and strand extension at 72°C for 1 min, with a final extension step at 72°C for 7 min. Two microliters of the first PCR reaction were used for a second amplification round using primers NifH1 (5'-TGY GAY CCN AAR GCN GA-3') and NifH2 (5'-ADN GCC ATC ATY TCN CC-3') (Zehr and McReynolds, 1989). The reaction mix and amplification protocol were as described above, except that the annealing temperature was increased to 57°C. All PCR experiments included a negative control reaction without DNA template, and a positive control using DNA from the reference strain Nostoc PCC 7120.
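The four nifH primers above contain IUPAC ambiguity codes, so each primer is in practice a pool of distinct oligonucleotides. The short Python sketch below counts the size of that pool (the degeneracy) for each primer. It is offered only as an illustration of how the degenerate bases expand; the function and variable names are hypothetical and the calculation was not part of the original methods.

```python
from math import prod

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def degeneracy(primer):
    """Number of distinct sequences represented by a degenerate primer."""
    bases = primer.replace(" ", "").replace("-", "").upper()
    return prod(len(IUPAC[base]) for base in bases)


# Primer sequences as given in the text (5' to 3').
PRIMERS = {
    "NifH1": "TGY GAY CCN AAR GCN GA",
    "NifH2": "ADN GCC ATC ATY TCN CC",
    "NifH3": "ATR TTR TTN GCN GCR TA",
    "NifH4": "TTY TAY GGN AAR GGN GG",
}

results = {name: degeneracy(seq) for name, seq in PRIMERS.items()}
for name, value in results.items():
    print(f"{name}: {value}-fold degenerate")
```

Because only a fraction of the pool matches any one template exactly, high degeneracy lowers the effective concentration of each individual oligonucleotide in the reaction.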

PCR products were visualised on 1% and 2% agarose gels (molecular biology grade, Progen Pharmaceuticals, QLD, Australia) with 1x TAE buffer and stained with ethidium bromide (1 μg ml⁻¹) for 10-15 min. Nucleic acids were visualised via UV transillumination (Gel Doc 2000, BioRad, Hercules, CA) using QuantityOne 4.1R software (BioRad, Hercules, CA) and raw images were exported in jpeg format for later visualisation.

3.2.3 Clone libraries and Restriction Fragment Length Polymorphism (RFLP)

Fresh PCR products (containing an A-overhang at the 3’ end) of the nifH gene amplification were ligated into the pGEM-T Easy vector (Promega, Madison, WI) according to the manufacturer’s instructions. From each clone library, at least 50 clones containing inserts were selected and amplified using the vector-specific primers MpF and MpR. PCR products of the correct size were precipitated by transferring the remaining reaction mixture to a 1.5 ml Eppendorf tube, adding a double volume of ice-cold 100% ethanol and incubating on ice for 15 min. The samples were centrifuged at 14,000 g for 15 min, the supernatant discarded and the pellet washed once with 200 μL freshly made 70% ethanol. The resultant pellets were dried using a SpeedVac vacuum centrifuge (Thermo Fisher Scientific Inc., Waltham, MA) or left with open caps under aluminium foil at room temperature, after which they were resuspended in 10-15 μL sterile MilliQ water. To verify that the PCR product had not been lost during the ethanol precipitation, the cleaned PCR products were visualised on 1% agarose gels (molecular biology grade, Progen Pharmaceuticals, QLD, Australia) with 1x TAE buffer, stained with ethidium bromide (1 μg ml⁻¹) for 10-15 min and visualised via UV transillumination (Gel Doc 2000, BioRad) using QuantityOne 4.1R software (BioRad).

Each clone was subjected to duplicate Restriction Fragment Length Polymorphisms (RFLP) analysis using restriction enzymes ScrFI and MspI (New England Biolabs, Ipswich, MA) separately. Each digest reaction contained 3 μL PCR product, 1 μL of the corresponding enzyme buffer, 2 units of restriction enzyme and sterile MilliQ water to a total volume of 10 μL. The digests were incubated at 37°C overnight. Clones’ RFLP patterns were analysed manually after electrophoresis on 2% agarose gels as previously described. At least one clone from each unique RFLP pattern was sequenced.
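To make the logic of the RFLP screening concrete, the following Python sketch performs an in-silico digest and returns the fragment lengths that a given insert would yield with each enzyme. It assumes the commonly documented recognition sites, MspI (C^CGG) and ScrFI (CC^NGG), searches one strand only (both sites are palindromic), and uses an invented 20 bp test sequence rather than a real clone. Clones were compared manually on gels in this study, so this is an illustration, not the procedure used.

```python
import re

# Recognition pattern and cut offset from the start of the site.
# Assumed sites: MspI cuts C^CGG, ScrFI cuts CC^NGG.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "ScrFI": ("CC[ACGT]GG", 2),
}


def fragment_lengths(sequence, enzyme):
    """Lengths of the fragments produced by a complete digest, 5' to 3'."""
    pattern, offset = ENZYMES[enzyme]
    sequence = sequence.upper()
    # A lookahead is used so that overlapping sites are all found.
    cuts = [m.start() + offset for m in re.finditer(f"(?=({pattern}))", sequence)]
    edges = [0] + cuts + [len(sequence)]
    return [end - start for start, end in zip(edges, edges[1:])]


# Hypothetical insert, for illustration only.
insert = "AAAACCGGTTTTTCCAGGAA"

for name in ENZYMES:
    print(name, fragment_lengths(insert, name))
```

Two clones with the same set of fragment lengths for both enzymes would show the same banding pattern and be treated as one RFLP type, which is why only one representative of each pattern needed to be sequenced.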

3.2.4 DNA sequencing

Sequencing of selected clones was carried out using the PRISM Big Dye cycle sequencing system with MPF or MPR primers and 3-49 ng of the precipitated product. The sequencing reaction products were transferred to a 1.5 ml tube and precipitated by the addition of 16 μL sterile MilliQ water and 64 μL 95% ethanol and mixed thoroughly. After incubation at RT for 15 minutes, the samples were centrifuged at 16,000 g for 20 min and all the supernatant was removed. The pellet was washed and dried as above. The sample was submitted for automated sequencing at The Ramaciotti Centre for Gene Function Analysis, UNSW, using the Applied Biosystems 3730 DNA Analyser (Foster City, CA) and analysed with Applied Biosystems Sequencing Analysis 5.1.1 software provided by Applied Biosystems.

3.2.5 Phylogenetic sequence analysis

Sequence chromatograms were manually checked for signal quality with “ABI and SCF Trace Viewer” embedded in “BioEdit Sequence Alignment Editor” software version 7.0.5.3 (Hall, 1999). Sequences with high background signal noise were discarded from further analysis. The 2004 and 1996 stromatolite clone library sequences were initially batch edited with “BioEdit” and the text editor “Crimson Editor” version 3.70 (freeware, Copyright © Ingyu Kang). Any remaining nucleotides from the cloning vector were removed from the 5’ and 3’ ends of the sequences, which were then temporarily realigned in a default fashion in “BioEdit”.

Sequence homologies were obtained using a nucleotide query (Altschul et al., 1990) with “BLASTN” version 2.2.18 and a translated nucleotide query with “BLASTX” version 2.2.24 (Altschul et al., 1997) from the National Center for Biotechnology Information (NCBI) website. BLAST results were screened against 42 sequences known to arise mainly from contamination of PCR reagents - AY225105–AY225107, AY333089–AY333101, AB198366–AB198391 (Zehr et al., 2003b; Goto et al., 2005). The reference NifH amino acid sequences for the alignment and phylogenetic analyses were imported from The Universal Protein Resource – UniProt (The UniProt, 2008; Apweiler et al., 2010); see Appendix A, table A-5.

Multiple alignments of the nifH gene nucleotide and amino acid sequences were carried out separately, initially using three different software packages: ClustalX 2.0.11 (Larkin et al., 2007), Muscle 3.8.31 (MUltiple Sequence Comparison by Log-Expectation; Edgar, 2004) as implemented on the EMBL-EBI website, and MAFFT 6 (Multiple sequence Alignment based on Fast Fourier Transform; Katoh et al., 2002; Katoh and Toh, 2008). Muscle was chosen as the best alignment tool for nifH gene multiple sequences, based on a visual check of the resulting alignments as well as the bootstrap values of the NJ phylogenetic tree branches. As reflected in a benchmark test of these multiple alignment tools (Nuin et al., 2006), MAFFT and Muscle produced outputs of similar quality and both performed better than ClustalX.


The appropriate amino acid substitution model for phylogenetic inference of nifH genes was obtained using “ProtTest” version 2.4 (Abascal et al., 2005). “ProtTest” proposed the best protein evolutionary model based on the smallest Akaike Information Criterion (AIC) score (Akaike, 2002). Phylogenetic trees were then created by the maximum likelihood approach (Felsenstein, 1981), using the LG substitution matrix (Le and Gascuel, 2008) and the approximate Likelihood-Ratio Test for branch support (aLRT-SH-like; Posada et al., 2006), with “PhyML” version 3.0 (Guindon and Gascuel, 2003; Guindon et al., 2010) as implemented on the http://www.atgc-montpellier.fr/phyml web site. “TreeDyn” version 198.3 (Chevenet et al., 2006), “MEGA 4” version 4.0.2 (Tamura et al., 2007), “TreeGraph 2” version 2.0.45 (Stöver and Müller, 2010) and “Adobe Photoshop Elements” version 8 (Copyright © 1990-2009 Adobe Systems Incorporated) were used for phylogenetic tree visualisation and final modifications.
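Model selection by AIC, as described above, amounts to computing AIC = 2k - 2 ln L for each candidate model (k free parameters, maximised log-likelihood ln L) and choosing the smallest score. The Python sketch below illustrates that rule only. The model labels are common substitution-model names, but the log-likelihoods and parameter counts are invented for illustration and are not output from “ProtTest” or results of this study.

```python
def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical (log-likelihood, number of free parameters) per model.
candidates = {
    "JTT": (-5262.3, 75),
    "WAG": (-5251.9, 75),
    "LG": (-5230.4, 75),
    "LG+G": (-5105.7, 76),
}

scores = {name: aic(lnl, k) for name, (lnl, k) in candidates.items()}
best = min(scores, key=scores.get)

# Print models from best (lowest AIC) to worst.
for name, score in sorted(scores.items(), key=lambda item: item[1]):
    print(f"{name:6s} AIC = {score:.1f}")
print("Selected model:", best)
```

The 2k term penalises extra parameters, so a more complex model is preferred only when its gain in likelihood outweighs the added parameters.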

3.2.6 Diversity, richness and coverage estimators

The inferred NifH sequences were aligned using “Muscle” version 3.8.31 and the molecular distances were calculated with the Probability Matrix from Blocks model (PMB; Veerassamy et al., 2003) implemented in “PHYLIP Protdist” version 3.67 (Felsenstein, 2007), available at the Mobyle web portal - http://mobyle.pasteur.fr/cgi-bin/portal.py#welcome (Néron et al., 2005; Néron et al., 2009). Distance matrices generated by the above procedure were used in “Mothur” version 1.15.0 to group sequences into Operational Taxonomic Units (OTUs) at phylotype cutoff thresholds of 88 % - 100 %, using the furthest-neighbour algorithm (Schloss et al., 2009). OTUs were then used to calculate collector’s curves and various estimators of clone library sampling coverage, diversity and richness, as well as shared estimators between clone libraries and structural similarities between the 1996 and 2004 communities. Coverage of the clone libraries was calculated by “DOTUR” (Schloss and Handelsman, 2005) and the method of Good (1953). Richness was calculated by the method of Chao (Chao and Yang, 1993). The diversity index (H) was calculated by the method of Shannon–Wiener (Krebs, 1989). The programs “∫-LIBSHUFF”, “TreeClimber” and “UniFrac” were used for cross-community comparisons (Singleton et al., 2001; Schloss et al., 2004; Lozupone and Knight, 2005; Schloss and Handelsman, 2006a; Schloss and Handelsman, 2006b).
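The three estimators named above can be written compactly in terms of the number of clones assigned to each OTU. The Python sketch below is a minimal illustration, assuming the standard textbook forms: Good's coverage C = 1 - n1/N, the classic Chao1 richness estimate S_obs + n1²/(2·n2) (with the bias-corrected form when there are no doubletons), and the Shannon index H = -Σ p·ln p. The exact variants implemented in “DOTUR” and “Mothur” may differ in detail, and the OTU counts are hypothetical.

```python
import math


def goods_coverage(counts):
    """Good's coverage: 1 - (singleton OTUs / total clones)."""
    total = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / total


def chao1(counts):
    """Chao1 richness estimate from OTU abundance counts."""
    observed = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    if f2 == 0:
        # Bias-corrected form avoids division by zero.
        return observed + f1 * (f1 - 1) / 2.0
    return observed + f1 * f1 / (2.0 * f2)


def shannon(counts):
    """Shannon diversity index H, using natural logarithms."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical number of clones per OTU in one library.
otu_counts = [20, 10, 5, 2, 2, 1, 1, 1]

print(f"Good's coverage: {goods_coverage(otu_counts):.3f}")
print(f"Chao1 richness:  {chao1(otu_counts):.2f}")
print(f"Shannon H:       {shannon(otu_counts):.3f}")
```

A coverage value close to 1 indicates that few OTUs were seen only once, that is, that further sampling of the library is unlikely to reveal many new OTUs at the chosen cutoff.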


3.2.7 Accession numbers

Sequences of the nifH clones are available under GenBank accession numbers JF826460-JF826496.


3.3 Results and discussion

3.3.1 General methodology consideration

In order to minimize potential biases, this study used a DNA extraction method known to produce high-quality DNA without skewing the original composition of the microbial community in the sample (Leuko et al., 2007; Goh et al., 2008). PCR was limited to 30 cycles in order to avoid introducing amplification biases, and the nifH primer sets used in this study have been shown not to cause major bias in the PCR amplification process (Diallo et al., 2008). To ensure that no false-positive results arose from contaminated PCR reagents, all reactions and gel visualizations included negative controls, and BLAST results were screened against sequences known to arise from such contamination (Zehr et al., 2003b; Goto et al., 2005).

BLAST and BLASTX analyses are useful tools for taxonomic identification and evolutionary interpretation, within certain known limitations (States and Botstein, 1991). Taxonomic affiliation based on 16S rDNA sequences generally assumes that a 1 to 1.5% sequence difference is appropriate for defining strains within the same species, and a 2 to 5% difference for species within the same genus (Clarridge, 2004). Translated NifH sequences are more informative, as they relate to the protein itself, a 3-D entity composed of primary, secondary and higher levels of structure and subject to biochemical influences (Stormo, 2002). The selective pressure to adapt to a micro-environment while retaining a specific functionality can produce nucleotide sequences which are distantly related, while the amino acid sequences still contain homologous regions that reflect structural and functional similarities (Sander and Schneider, 1991). A 2 to 8% difference between NifH amino acid sequences represents sequence variation that relates to structural changes. Thus, a 2 to 8% difference was considered appropriate for a positive identification of a Fe protein, based on past studies analysing structural homologies and sequence similarities (Sander and Schneider, 1991; Hobohm et al., 1992).
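As an illustration of how such a percentage difference can be obtained from two aligned amino acid sequences, the sketch below counts mismatches over ungapped alignment columns. The function name, the gap-handling rule and the toy sequences are assumptions made for this example; they are not taken from the BLASTX output or the alignment software used in this study.

```python
def percent_difference(seq_a, seq_b, gap="-"):
    """Percentage of mismatched residues between two aligned sequences.

    Columns in which either sequence carries a gap are ignored
    (an assumption of this sketch; other conventions exist).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    mismatches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # skip gapped columns
        compared += 1
        if a != b:
            mismatches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * mismatches / compared


# Toy aligned fragments (hypothetical, 10 residues, one substitution).
diff = percent_difference("CDPKADSTRL", "CDPKADSTRM")
```

Under this convention, a difference of 2 to 8% over a 120-residue NifH fragment corresponds to roughly 2 to 10 substituted residues.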

In this study, BLAST and BLASTX results passed significant statistical thresholds: the BLAST expected (E) values ranged from e-88 to e-168 and the BLASTX E-values ranged from e-27 to e-60 (Ladunga, 2002a, b; McGinnis and Madden, 2004). In addition, NifH translated sequences were longer than 100 residues and the remaining hits for each sequence with lower E-values were also identified as Fe protein (NifH). We therefore assumed our translated sequences were homologs of the nitrogenase Fe protein component (Rychlewski et al., 2000) and attributed sequence differences to biological adaptations to ecological constraints, or inter-species differentiation.
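For context, the E-values quoted above follow the standard Karlin–Altschul statistics used by BLAST. The notation below is chosen for this illustration and does not appear in the original analysis: S is the raw alignment score, m and n are the effective lengths of the query and the database, and K and λ are parameters that depend on the scoring system.

```latex
E = K \, m \, n \, e^{-\lambda S},
\qquad
S' = \frac{\lambda S - \ln K}{\ln 2},
\qquad
E = m \, n \, 2^{-S'}
```

Here S' is the bit score. An E-value such as e-27 therefore means that a match scoring at least this well would be expected by chance far less than once in a database of that size.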

3.3.2 2004 clone library BLAST & BLASTX analysis

NifH genes were present and amplified from the total DNA extracted from the 2004 stromatolite samples (figure 4). In total, 38 clones containing the correctly sized insert (350bp) were obtained and analysed (figure 5). RFLP analysis was performed on 30 random positive clones from the 2004 library, which grouped into five patterns (see figure 6 and table 2).

Figure 4 Products obtained during the amplification of nifH from stromatolite DNA extractions using nested primers. Left pane: amplification products after the first step of nifH PCR amplification; right pane: amplification products after the second step. Lane 1: 2004 stromatolite; 2: 1996 stromatolite; 3, 4: positive control Nostoc PCC 7120; 5: negative control sterile MilliQ H2O; M: 0.5 μg μL-1 GeneRuler™ DNA Ladder Mix (Fermentas, Ontario).

Figure 5 Example of a 1% agarose gel showing the PCR amplification of 13 clones with the nifH insert before (left) and after (right) PCR product clean up procedure. Expected band size of a pGEM vector without nifH gene insert – 236 bp. Expected band size with nifH insert present – 586 bp. Lanes 1 - 13: 2004 stromatolite clones (white colonies); Lane B: blue colony product (negative control); M: 0.5μg μL-1 GeneRuler™ DNA ladder Mix (Fermentas, Ontario).

Figure 6 2% agarose gel showing RFLP patterns using ScrFI (bottom) and MspI (top) restriction enzymes on 12 clones from 2004 stromatolite library. M: 0.5μg/μL GeneRuler™ DNA ladder Mix (Fermentas).

Table 2 Modified representations of the 2004 stromatolite clones RFLP digestion patterns. Gel lanes 1 2 3 5 6 7 8 9 10 12 13 350* 350 350 MspI 200 200 200 200 200 200 200 200 200 200 600 600 550 500 ScrFI 350 350 300 300 300 300 300 300 300 300 200* 200 200 200 200 200 200 200 200 200 200 * Band size in basepairs.

A minor portion of the 2004 NifH clone library (table 3) was identified in the BLAST analysis as an uncultured cyanobacterial nifH clone from a benthic hypersaline microbial mat in San Salvador Island, Bahamas, with 98% and 100% sequence similarity (accession ID DQ140596; Yannarell et al., 2006). Half of the clone library was identified as uncultured nifH clone sequences obtained from the microbial mats of a natural marsh in Guerrero Negro (GN), Mexico (Moisander et al., 2006), with sequence similarity of 87%. The vast majority of the clone library had less than 90% sequence similarity to uncultured cyanobacterial clones from saline to hypersaline environments in the BLAST analysis, indicating potential novel nifH genes.

Translated NifH sequences from the 2004 clone library were identified by the BLASTX analysis as cyanobacterial NifH amino acid sequences affiliated with Chroococcales and Oscillatoriales, at 90%-94% sequence similarity (Table 3). The vast majority of the clones were identified as NifH sequences of Cyanothece spp. (strains CCY0110, ATCC 51142 and PCC 7425), with an average sequence similarity of 93%, and the remainder of the library were affiliated with uncultivated Cyanobacterium UCYN-A and Oscillatoria PCC 6506 with 91% and 94% sequence similarity, respectively.

This finding suggests that nitrogen fixation might occur at night in columnar stromatolites, as transcriptional nitrogenase studies of Cyanothece sp. strain ATCC 51142 revealed that this strain fixed nitrogen diurnally, usually under dark, aerobic conditions, and that the nitrogenase was degraded during light periods (Colon-Lopez et al., 1997). Filamentous Oscillatoria spp. also exhibited a diurnal rhythm in N2 fixation, with aerobic nitrogen fixation detected only during dark periods (Stal and Krumbein, 1987). Therefore, the group of unicellular, filamentous and non-heterocystous Cyanobacteria that includes Oscillatoria sp. PCC 6506 and Cyanothece spp. would probably fix nitrogen aerobically during dark periods in stromatolites, and thus avoid potential oxygen damage to the nitrogenase complex (Reddy et al., 1993; Schneegurt et al., 1994; Berman-Frank et al., 2003). In addition, unicellular Cyanothece spp. and Cyanobacterium UCYN-A were shown to have similar nifH sequences (Zehr et al., 2008). The match to the oceanic unicellular Cyanobacterium UCYN-A is of interest, as this organism can also fix N2 during daylight hours without producing oxygen and causing redox damage to nitrogenase (Bothe et al., 2010).

Its genome contains only photosystem I, with no trace of photosystem II, associated pigments or carbon fixation genes (Zehr et al., 2008). In addition, it was recently found that UCYN-A lacks many metabolic pathways and relies on other bacteria for the provision of essential amino acids and other important compounds (Tripp et al., 2010). The presence of UCYN-A within the stromatolite community is therefore reasonable, as it is an anoxygenic photosynthetic heterotrophic microorganism living within a complex community and in contact with seawater, as is the case for the intertidal columnar stromatolites. However, the identification of this strain was not absolute in this study, as sequence similarity to the Cyanobacterium UCYN-A NifH amino acid sequence was only 91%. It remains to be seen whether the newly added UCYN-A genome will be identified not only in oceanic environments, but also in other microbial mat studies.

Table 3 BLAST and BLASTX analysis of 2004 stromatolite nifH clone library. Total of 38 clones were analysed.

BLAST Analysis

Accession ID | No. of clones (%) | Average sequence similarity (%) | Source(a)
DQ338040 DQ338103 | 50.0 | 87 | Natural marsh microbial mats, Mexico: Guerrero Negro, Baja California (Moisander et al., 2006)
EU594141 EU594212 | 28.95 | 86 | Marine sponges bacterial symbionts (Mohamed et al., 2008a)
EF174812 EF174826 | 10.53 | 90 | Marine water, Heron Reef Lagoon, Great Barrier Reef (Hewson et al., 2007)
DQ140596 | 5.26 | 99 | Benthic, hypersaline microbial mat, San Salvador Island, Bahamas (Yannarell et al., 2006)
AY450628 | 2.63 | 87 | Lyngbya mats of an intertidal flat, Mexico: Guerrero Negro, Baja California (Omoregie et al., 2004a)
U73133 | 2.63 | 89 | Myxosarcina PCC 7312 (Zehr et al., 1997)

BLASTX Analysis

Accession ID | No. of clones (%) | Average sequence similarity (%) | Phylum | Closest NifH-deduced amino acid sequence (bacteria)
ZP_01727765 | 52.63 | 93 | Cyanobacteria | Cyanothece sp. CCY0110
YP_001801976 | 26.32 | 93 | Cyanobacteria | Cyanothece strain ATCC 51142
YP_002483083 | 13.16 | 93 | Cyanobacteria | Cyanothece PCC 7425
YP_003421696 | 5.26 | 91 | Cyanobacteria | Cyanobacterium UCYN-A
ZP_07112556 | 2.63 | 94 | Cyanobacteria | Oscillatoria PCC 6506

(a) Unless otherwise specified, all matches were to uncultured bacterial nifH clones.
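The percentages in Table 3 can be converted back to whole clone counts as a consistency check. The sketch below is illustrative only: the function name is hypothetical and the rounding rule (nearest integer) is an assumption, although it does return counts that sum to the 38 clones stated in the table caption.

```python
def clones_from_percent(percentages, total_clones):
    """Convert 'No. of clones (%)' values back to whole clone counts."""
    return [round(p * total_clones / 100) for p in percentages]


# 'No. of clones (%)' columns of Table 3 (2004 library, 38 clones).
blast_2004_percent = [50.0, 28.95, 10.53, 5.26, 2.63, 2.63]
blastx_2004_percent = [52.63, 26.32, 13.16, 5.26, 2.63]

blast_2004_counts = clones_from_percent(blast_2004_percent, 38)
blastx_2004_counts = clones_from_percent(blastx_2004_percent, 38)
```

Under this assumption, the Guerrero Negro matches correspond to 19 clones and the Cyanothece sp. CCY0110 matches to 20 clones.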

3.3.3 1996 clone library BLAST & BLASTX analysis

In total, 100 clones were screened and 37 positive clones, containing the correctly sized insert (350bp), were obtained and analysed from the 1996 stromatolite clone library. Clones were sequenced directly without performing RFLP analysis until sufficient coverage (>50%) of the clone library was attained.

About a third of the clone library matched uncultured nifH clones from microbial mats of Guerrero Negro (GN), Mexico, with an average sequence similarity of 86% (Table 4), based on the BLAST analysis. A similar number of clones were identified as uncultured nifH clones from natural seawater sediments contaminated with crude oil from the Gulf of Mexico, at an average sequence similarity of 86%. Less than a fifth of the 1996 stromatolite clone library had 96%-98% similarity to uncultured nifH clones from a benthic hypersaline microbial mat of a salt pond in Salins-de-Giraud, Camargue, France, and to uncultured clones from a natural marsh in GN, Mexico (AM286438 and DQ821946). Overall, the vast majority of the 1996 stromatolite clone library had less than 90% sequence similarity to known sequences in the databases. Most of the BLAST matches were to uncultured nifH clones from saline to hypersaline environments, similar to the BLAST analysis of the 2004 clone library.

Translated NifH sequences of the 1996 clone library clustered into three different phyla and seven different diazotrophic genera based on BLASTX analysis (table 4). The majority of sequences in this clone library were affiliated with the δ-Proteobacteria group, followed by γ-Proteobacteria and Cyanobacteria representatives. Almost a third of the clone library matched Pelobacter carbinolicus DSM 2380 with an average 96% sequence similarity; another 24.32% of the clone library matched Desulfatibacillum alkenivorans AK-01 (93% similarity), followed by matches to Desulfonatronospira thiodismutans ASO3-1 with 89% average sequence similarity. Desulfovibrio magneticus RS-1 matches constituted a minute portion of the clone library, yet had a relatively high sequence similarity of 97%.

Table 4 BLAST and BLASTX analysis of 1996 stromatolite nifH clone library. Total of 37 clones were analysed.

BLAST Analysis

Accession ID | No. of clones (%) | Average sequence similarity (%) | Source(a)
DQ338014 DQ338071 | 35.14 | 86 | Natural marsh microbial mats, Mexico: Guerrero Negro, Baja California (Moisander et al., 2006)
DQ078021 DQ078042 | 29.73 | 88 | Oil contaminated marine sediments (Musat et al., 2006)
HM750443 HM750759 | 13.51 | 89 | Rhizosphere of a salt marsh (unpubl.)
DQ821946 | 8.11 | 98 | Natural marsh microbial mats, Mexico: Guerrero Negro, Baja California (Moisander et al., 2007)
AM286438 | 5.41 | 96 | Saline pond benthic microbial mat, France: Salins-de-Giraud, Camargue (Bonin and Michotey, 2006)
GU193021 | 5.41 | 84 | Intertidal microbial mat (unpubl.)
AP010904 | 2.70 | 85 | Desulfovibrio magneticus RS-1 (Nakazawa et al., 2009)

BLASTX Analysis

Accession ID | No. of clones (%) | Average sequence similarity (%) | Phylum | Closest NifH-deduced amino acid sequence
YP_357508 | 29.73 | 96 | δ-Proteobacteria | Pelobacter carbinolicus DSM 2380
YP_002430688 | 24.32 | 93 | δ-Proteobacteria | Desulfatibacillum alkenivorans AK-01
ZP_07015343 | 21.62 | 89 | δ-Proteobacteria | Desulfonatronospira thiodismutans ASO3-1
ZP_01727765 | 8.11 | 95 | Cyanobacteria | Cyanothece sp. CCY0110
YP_001001870 | 5.41 | 93 | γ-Proteobacteria | Halorhodospira halophila SL1
YP_003073074 | 5.41 | 92 | γ-Proteobacteria | Teredinibacter turnerae T7901
YP_002953433 | 2.70 | 97 | δ-Proteobacteria | Desulfovibrio magneticus RS-1

(a) Unless otherwise specified, all matches were to uncultured bacterium nifH clones.

All these species are strict anaerobes, sulphide or sulphate reducers, isolated from marine and freshwater sediments, and are considered mesophilic (Schink, 1992; Sakaguchi et al., 2002; Cravo-Laureau et al., 2004; Sorokin et al., 2008). Nitrogenase activity has yet to be characterised in these genera, though nifH DNA fragments have been amplified, mostly from marine sediments (Zadorina et al., 2009; Bertics et al., 2010; Quaiser et al., 2010). Several individual NifH clones had 97% sequence similarity to Halorhodospira halophila SL1 and Teredinibacter turnerae T7901. H. halophila is an anaerobic halophilic phototroph with nitrogenase activity under light conditions, and its nifH DNA fragments have been amplified from various environments, usually at relatively low sequence similarities (Tsuihiji et al., 2006; Falcón et al., 2007; Zadorina et al., 2009; Ma et al., 2010). T. turnerae is a mesophilic endosymbiotic γ-Proteobacterium isolated from molluscs (Bivalvia: Teredinidae) which is able to fix nitrogen under microaerobic conditions (Distel et al., 2002). Cyanothece sp. CCY0110 (ZP_01727765) was the only cyanobacterial match in the 1996 clone library, with 95% average sequence similarity.

3.3.4 BLAST and BLASTX comparative analysis

A common match in both clone libraries, based on BLAST analysis, was to uncultured nifH sequences from microbial mats in Guerrero Negro (GN), Mexico, dominated by a Lyngbya sp. (Moisander et al., 2006). Common nifH sequences suggest, to a certain extent, that the same diazotrophs were present in the 1996 and 2004 stromatolite communities, but do not imply that they employed similar nitrogen fixation patterns.

The GN microbial mats were collected from a tidal flat, which underwent alternating desiccation/wetting periods pending tidal flooding; they were therefore subjected to alternating levels of salinity (sea water to hypersalinity), in a similar fashion to the intertidal columnar stromatolites in Hamelin Pool. Additional cyanobacterial species and purple and colourless sulphur bacteria were also identified in those mats (Omoregie et al., 2004b). It was not surprising that the clones were only 86%-87% similar to the nifH nucleotide sequences of the GN mat samples (Moisander et al., 2006), considering the varying environmental salinity levels and the methodological differences between our study and the GN mat studies. The 1996 and 2004 stromatolite nifH clones may represent novel sequences due to local adaptations to their own environment and its specific characteristics, such as salinity levels and nutrient dynamics.

Based on the BLASTX results, Cyanothece sp. CCY0110 (ZP_01727765, Cyanobacteria) emerged as the common match for both clone libraries, at 93% and 95% average sequence similarity. This was a major component of the 2004 clone library but was less prevalent in the 1996 clone library (only 8.11% of the clones). Cyanothece sp. CCY0110 is frequently found in, and cultured from, various hypersaline and marine environments (Garcia-Pichel et al., 1998; López-Cortés et al., 2001) and, as mentioned previously, is known to fix nitrogen under dark, aerobic conditions.

3.3.5 Phylogenetic analysis

Most of the reference NifH sequences were obtained from the Swiss-Prot database (Boeckmann et al., 2003), in which they were manually annotated, reviewed and verified, thus providing a reliable genetic framework into which we integrated the clone sequences and additional BLASTX hits (see appendix A, table A-5). The LG model (Le-Gascuel) is an improved model over WAG and JTT for estimating amino acid substitution rates and in general provides better tree topologies and likelihood values (Le and Gascuel, 2008; Guindon et al., 2010). The LG model takes into consideration not only variations in amino acid substitutions per site but also whether a site is slow or fast to change due to evolutionary constraints. Using deduced amino acid sequences instead of nucleotides might cause the loss of some information regarding synonymous vs. non-synonymous substitutions, which might otherwise provide a different evolutionary picture of the nifH gene. Yet, because nifH codes for a protein, it is logical to view the sequence at the amino acid level, where it is subjected to far more selective pressure arising from physical and chemical conditions within the cell. The resulting branch support values in this study were satisfactory and provided a reliable representation of the possible evolution of nifH genes amongst Archaea and Eubacteria (Posada et al., 2009).

For this analysis, a total of 232 NifH amino acid sequences, with an average length of 120 residues, were subjected to a maximum likelihood analysis. This produced a phylogenetic tree with four major clusters, corresponding to NifH designated clusters I-IV (Chien and Zinder, 1996; Zehr et al., 2003a; Raymond et al., 2004a), plus two smaller clusters, one affiliated with Desulfuromonadales representatives from the δ-Proteobacteria and another cluster of Roseiflexus spp. NifH amino acid sequences (93 and 96 branch support values, respectively, figure 7).

Figure 7 A phylogenetic tree based on Maximum-likelihood analysis of partial NifH amino acid sequences. Sequences determined in this study were given an alphanumeric prefix RSAYYYY and are marked bold; number of clones is in parenthesis; the scale bar represents the number of substitutions per 100 bases.

Briefly, cluster I contained the conventional Mo-containing NifH sequences, most of them affiliated with Proteobacteria, Cyanobacteria and Firmicutes (figure 8). Cluster I contained a total of 46 stromatolite NifH clones and had a branch support value of 88 within the entire NifH tree. Cluster II included phylotypes with an alternative nitrogenase containing Fe instead of Mo or V. These included archaeal methane producers, and alternative nifH genes (nifH2, nifH3) from Firmicutes, α- and γ-Proteobacteria and Spirochaetes. Cluster III included NifH sequences of anaerobic diazotrophs with conventional nitrogenase (mostly Mo), mainly from the δ-Proteobacteria, Spirochaetes, Chlorobi group, Firmicutes and Archaea (figure 9). Cluster III contained a total of 18 stromatolite NifH clones and had a support value of 75 within the entire NifH tree. NifH Cluster IV was very divergent and included mostly strictly anaerobic archaeal genera, some with alternative nifH genes: Methanosarcina, Methanobrevibacter, Methanothermobacter (nifH2), Methanobacterium, Methanocaldococcus and Methanococcus. The only exception to the Archaea was an alternative nifH gene copy of Rhodobacter capsulatus (nifH2), a phototrophic purple non-sulphur α-Proteobacterium.

Eleven NifH clones from the 1996 stromatolite library were affiliated with the closest out-group to cluster I, Pelobacter carbinolicus DSM 2380 (δ-Proteobacteria, figure 8). Three sub-clusters in cluster I, 1-Cyan-A/B/C, were treated as one sub-cluster designated “1B” by Zehr et al. (2003a) and included only cyanobacterial NifH sequences. The sub-cluster 1-Cyan-B had a branch support value of 89 and included unicellular Cyanobacteria: Cyanothece, Gloeothece and a very divergent Cyanobacterium UCYN-A NifH sequence. It is unclear why they clustered separately from the Cyanothece and Gloeothece NifH sequences in 1-Cyan-A and 1-Cyan-C. The entire 2004 stromatolite clone library, 38 clones in total, clustered closely to a NifH sequence of Xenococcus PCC 7305 in the cyanobacterial cluster 1-Cyan-B (accession ID O08262, 108 amino acids in length). Three 1996 stromatolite clones clustered separately from the 2004 sequences, but in the same sub-cluster. Xenococcus sp. NifH fragments were identified in marine sponges, in coral reef lagoon seawater samples from Heron Island, Australia, in microbial mats from Guerrero Negro (GN) salt ponds in Baja California, and additionally in core samples of marine stromatolites from Highborne Cay, Bahamas (Steppe et al., 2001; Omoregie et al., 2004c; Hewson et al., 2007; Mohamed et al., 2008b). In these studies nitrogenase activity was highest during the dark period, but the activity was not attributed to a specific bacterial group. The specific strain Xenococcus PCC 7305 is known to fix nitrogen anaerobically (Fay, 1992; CRBIP, 2007), though its sequence clustered with the aerobic diazotrophic Cyanothece spp.

Two other sub-clusters, 1-Prot-γ-B and 1-Prot-γ-C, included five 1996 stromatolite clones, with branch support values >74. Two clones were closely affiliated with Marichromatium purpuratum, also known as Chromatium purpuratum, a halophilic, anaerobic, phototrophic purple sulphur γ-Proteobacterium with high G+C content (68.9%) and an optimal growth temperature of 25-35 °C, usually found in anoxic marine sediments, marine sponges and other marine invertebrates (Proctor, 1997; Imhoff et al., 1998). Accordingly, its NifH fragments have been found in water surface samples from a river estuary, Hawaiian corals and in a tropical intertidal lagoon (Affourtit et al., 2001; Bauer et al., 2008; Olson et al., 2009). These clones were originally matched in BLASTX to the halophilic γ-Proteobacterium Halorhodospira halophila SL1 at 93% sequence similarity (table 4). H. halophila and M. purpuratum NifH sequences share a high level of sequence similarity (91%), and another published analysis of the H. halophila NifH sequence positioned it within the same cluster as M. purpuratum (Imhoff et al., 1998; Tsuihiji et al., 2006; Bertics et al., 2010). This ‘mismatch’ between the BLASTX match and the phylogenetic affiliation was due to the different basic assumptions employed in the BLAST and BLASTX algorithms vs. the phylogenetic modelling. BLAST and BLASTX are statistical methods designed to ‘fish out’ significant matches from huge databases, without any evolutionary framework or assumptions (Altschul et al., 1997; Ladunga, 2002b). Hence, because these two clones had almost the same sequence length as H. halophila SL1 (120 vs. 121 residues), while the M. purpuratum sequence had only 109 residues and lacked two known conserved motifs, “CDPKAD” at the beginning of the NifH partial sequence and “GEMMAL/M” further along the sequence, BLASTX analysis chose H. halophila SL1 as the best ‘correct’ match for these clones.

Phylogenetic models, on the other hand, incorporate evolutionary assumptions into their algorithms, such as time reversibility, amino acid substitution matrices, base frequencies, proportion of invariable sites and more (Sullivan and Joyce, 2005). Therefore, the few non-conserved residues between the clone sequences and those of H. halophila SL1 and M. purpuratum resulted in these two clones clustering with M. purpuratum instead of H. halophila SL1, regardless of the length issue.

Figure 8 Cluster I phylogenetic tree, based on Maximum-likelihood analysis of NifH partial amino acid sequences. Sequences obtained in this study were given an alphanumeric prefix RSAYYYY and are marked bold; branch support values (approximate likelihood-ratio test, aLRT) are shown for key branches; only values > 50 were considered significant. Text boxes contain the designation of clusters and, in parentheses, the closest sub-cluster nomination as per Zehr et al. (2003a). ‘1’ = cluster I, Prot=Proteobacteria, Cyan=Cyanobacteria, Firm=Firmicutes. The scale bar represents the number of substitutions per 100 bases. The out-group was Desulfuromonadales (δ-Proteobacteria) NifH sequences from the Geobacter and Pelobacter genera.

Cluster labels in the figure: 1-Prot-αβ (1J, 1K); 1-Cyan-A (1B); 1-Cyan-B (1B); 1-Cyan-C (1B); 1-Firm-A (1D); 1-Prot-γ-A (1P); 1-Prot-γ-B (1M); 1-Prot-γ-C (1H, 1T, 1l, 1U).


Additionally, three 1996 clones clustered with Teredinibacter turnerae T7901 in a Vibrio spp. cluster (92% amino acid sequence similarity, table 4). T. turnerae is an endosymbiotic γ-Proteobacterium isolated from molluscs (Bivalvia: Teredinidae) that can fix nitrogen under microaerobic conditions, at seawater salinity level (Fiore et al., 2010). This genus clusters with Pseudomonas spp. based on its 16S rDNA sequence, yet its NifH amino acid sequence clustered with Vibrio spp. rather than Pseudomonas (Distel et al., 2002). There are a few bivalve species living in Hamelin Pool (and Shark Bay in general), hence it is reasonable to assume that T. turnerae integrated structurally within the columnar stromatolite. The abundant bivalves Fragum hamelini Iredale and Fragum erugatum, and the small bivalve Irus irus (Linné), which is found at the sides of many subtidal stromatolites, have been reported in this area (Hoffman and Walter, 1976; Playford et al., 1976; Flint and Abeysinghe, 2000/07), yet this is the first report of a T. turnerae NifH fragment in a stromatolite microbial mat.

As mentioned earlier, cluster III contained 18 stromatolite NifH clones, entirely from the 1996 stromatolite clone library, which grouped in two sub-clusters, 3-Prot-δ-A and 3-Prot-δ-B, each with branch support values >89 (figure 9). Nine clones clustered with Desulfovibrio gigas (P71156) and Desulfonatronospira thiodismutans ASO3-1 (D6SLD2), both δ-Proteobacteria, sulphate reducers and strict anaerobes. D. thiodismutans ASO3-1 is an obligately alkaliphilic (optimum pH 10) bacterium with moderate salinity tolerance and a maximum growth temperature of 43 °C (Sorokin et al., 2008). It has not been detected in Shark Bay or in other marine microbial mats to date, perhaps because the sequence is relatively new in the databases (first entry 10 August 2010), and therefore additional confirmation may follow. A single clone was closely affiliated with D. gigas, whose nifH DNA fragments were found in plant rhizospheres, marine sediment samples and in a few cyanobacterial mats (Zehr et al., 1995; Moisander et al., 2007). The same clone had 97% BLASTX sequence similarity to D. magneticus RS-1 (Table 4), which the phylogenetic analysis had assigned to a different sub-cluster, 3-Prot-δ-GS. Ten residues were not conserved between D. magneticus and the 1996 stromatolite clone and were sufficient for the phylogenetic model to place the clone sequence with D. gigas instead of D. magneticus. Desulfovibrio spp. seem to fix nitrogen within marine sediment microcosms regardless of light conditions (Postgate et al., 1988; Kent et al., 1989; Musat et al., 2006). Some studies suggested that the genus fixed nitrogen mainly during dark periods in marine intertidal microbial mats (Zehr et al., 1995; Steppe and Paerl, 2002).

Additionally, nine 1996 stromatolite clones clustered with an alkene-degrading, sulphate-reducing bacterium, Desulfatibacillum alkenivorans strain AK-01 (B8FAC4), which was first isolated from oil-polluted sediments of a sewage plant (Cravo-Laureau et al., 2004). Related nifH sequences were reported in low abundance from a low temperature, acidic peat bog and in a ghost shrimp benthic burrow within intertidal lagoon waters (Zadorina et al., 2009; Bertics et al., 2010).

Briefly summarising, our phylogenetic analysis indicated that 100% of the 2004 and 22% of the 1996 stromatolite clone libraries were affiliated with cluster I. Almost 30% of the 1996 stromatolite clone library sequences were associated with an out-group to cluster I, which was composed of Desulfuromonadales representatives from the δ-Proteobacteria, and an additional 48% were affiliated with cluster III. Neither clone library had representatives in cluster II or cluster IV as designated by Zehr et al. (2003a). Combined to a unified representation of potential diazotrophs in columnar stromatolites, cluster I clones would represent 61% and cluster III clones would represent 39% of the diazotrophic community composition.
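The proportions above can be retraced from the clone counts reported in this section: 38 clones from 2004, all in cluster I; 37 clones from 1996, of which 11 fell in the out-group and 18 in cluster III; and 46 stromatolite clones in cluster I overall. The sketch below is an illustrative recalculation with hypothetical variable names. The 1996 cluster I count (8) is derived as 46 minus 38 rather than stated in the text. The combined 39% is reproduced only when the 11 out-group clones are counted together with the 18 cluster III clones, which is an inference from the arithmetic and not a grouping stated in the text.

```python
# Clone counts taken from the text of this section.
total_2004 = 38            # all in cluster I
total_1996 = 37
cluster_I_all = 46         # cluster I clones from both libraries
outgroup_1996 = 11         # Desulfuromonadales out-group
cluster_III_1996 = 18

# Derived (not stated directly): 1996 clones in cluster I.
cluster_I_1996 = cluster_I_all - total_2004


def percent(part, whole):
    """Share of 'part' in 'whole', as a percentage."""
    return 100.0 * part / whole


share_1996 = {
    "cluster I": percent(cluster_I_1996, total_1996),
    "out-group": percent(outgroup_1996, total_1996),
    "cluster III": percent(cluster_III_1996, total_1996),
}

combined_total = total_2004 + total_1996
combined_cluster_I = percent(cluster_I_all, combined_total)
# Assumption: out-group clones pooled with cluster III clones.
combined_non_cluster_I = percent(outgroup_1996 + cluster_III_1996, combined_total)

for name, value in share_1996.items():
    print(f"1996 {name}: {value:.1f}%")
print(f"combined cluster I: {combined_cluster_I:.1f}%")
print(f"combined cluster III + out-group: {combined_non_cluster_I:.1f}%")
```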

Additionally, a NifH sequence of Xenococcus PCC 7305, in the cyanobacterial sub-cluster 1-Cyan-B, was a common phylogenetic affiliation for both clone libraries. This may indicate, as with the previous BLAST and BLASTX analyses, that this was the common diazotrophic species in columnar stromatolites. The few inconsistencies between the BLAST or BLASTX results and the phylogenetic assignments emphasize the importance of applying at least two different methods to the same batch of nucleotide or amino acid sequences, in order to gain an unbiased view of the possible outcomes from the original sequences.

Figure 9 Cluster III phylogenetic tree based on Maximum-likelihood analysis of partial NifH amino acid sequences. Sequences determined in this study were given a prefix RSA and are marked bold; branch support values (approximate likelihood-ratio test, aLRT) are shown for key branches; only values > 50 were considered significant. Spir=Spirochaetes, Arch=Archaea. The scale bar represents the number of substitutions per 100 bases.

Cluster labels in the figure: 3-Prot-δ-A (3P); 3-Spiro-A (3L); 3-Prot-δ-B (3B, 3E, 3L); 3-Prot-δ-GS (3L, 3T); 3-Firm-Arch (3C, 3D, 3A).

3.3.6 Coverage, diversity and community structure

Before analyzing richness, diversity and structure, it was necessary to ascertain whether the clone library coverage was sufficient to provide a sound assessment of these factors. The program “Mothur” employs molecular distance matrices in order to calculate various ecological parameters and coverage estimates, and has been successfully used in microbial ecological studies (Schloss et al., 2009). The Mothur software version used in this study did not provide a sub-program to calculate distances between amino acid sequences, so after aligning sequences with “Muscle” and confirming alignment quality against known NifH reference sequences, the Probability Matrix from Blocks (PMB, Veerassamy et al., 2003) as implemented in “PHYLIP Protdist”, version 3.67 (Felsenstein, 2007), was used for that purpose. PMB is derived from the popular BLOSUM matrices for amino acid substitutions and from the Blocks database (Henikoff et al., 1999; Henikoff et al., 2000). This matrix takes into consideration aligned ungapped conserved regions and adjusts amino acid substitution scores based on evolutionary assumptions (e.g. evolutionary distances are additive in a linear fashion). The resulting model is strongly grounded in empirical data, as it included the NifH/BchL/ChlL family, and was suitable for use with NifH sequences, which have several conserved blocks in the sequence.

Statistical analyses of the clone libraries from 1996 and 2004 stromatolites are presented in tables 5-7, as well as collector curves for coverage of all libraries (figure 10). These curves represent the frequency data for each distance level (0.01-0.12, figure 10) plotted against the number of unique sequences or species observed. In other words, the data is based on the number of observed OTUs as a function of distance between sequences and the number of sequences sampled. Therefore, when a curve reaches an asymptote, it means no more unique sequences were observed for a specific distance level, full species coverage attained, and no need to sample the clone library any further (Schloss et al., 2004).
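The idea behind a collector's curve can be sketched in a few lines: sequences are taken one at a time and the running number of distinct OTUs is recorded. The function name and the toy OTU labels below are hypothetical, and the sampling order is simply the input order, which is an assumption of this sketch rather than a description of how Mothur builds its curves.

```python
def collectors_curve(otu_labels):
    """Return the cumulative number of distinct OTUs after each sampled sequence."""
    seen = set()
    curve = []
    for label in otu_labels:
        seen.add(label)           # register the OTU of the newly sampled sequence
        curve.append(len(seen))   # OTUs observed so far
    return curve


# Toy example: eight sequences assigned to three OTUs (made-up labels).
toy_curve = collectors_curve(["A", "A", "B", "A", "C", "B", "B", "C"])
```

In the toy example the curve stops rising after the fifth sequence, which is the plateau behaviour interpreted above as sufficient sampling at that distance level.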

The estimated clone library coverage was 73% at the 99% phylotype cutoff and up to 97% at the 87% phylotype cutoff, regardless of the sampling year (table 5). A phylotype cutoff of 99% meant that sequences within an OTU were at a maximum distance of 1% from one another. Therefore, the coverage of potential diazotrophs was comprehensive and the clone libraries were representative of the diazotrophic diversity in our samples. At the 100% phylotype cutoff (unique sequences), 34 OTUs were identified and the number of species estimated by the Chao1 non-parametric estimator for richness was 121.75 (63.88-291.73, 95% CI), indicating that when sampled to completion there would be between 29 and 257 additional NifH species. The Shannon-Wiener index of diversity (H’) was 2.72 (2.37-3.07, 95% CI).

Between the 100% and 87% phylotype cutoffs, 8.82% to 55.55% of the OTUs, respectively (table 5), were shared between the clone libraries, indicating common OTUs of NifH sequences in the 2004 and 1996 clone libraries. At the 87% phylotype cutoff, Yue and Clayton’s non-parametric estimator for similarity (θ) was 0.85, indicating a high proportional similarity between the clone libraries. At the 100% phylotype cutoff, similarity (θ) between the libraries was lower, estimated at 0.3.
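For readers unfamiliar with θ, the sketch below computes the Yue and Clayton similarity from OTU counts in two libraries, using relative abundances a_i and b_i: θ = Σ a_i b_i / (Σ a_i² + Σ b_i² − Σ a_i b_i). It is an illustration under stated assumptions, not the Mothur implementation, and the count vectors are hypothetical.

```python
def yue_clayton_theta(counts_a, counts_b):
    """Yue and Clayton similarity between two communities.

    counts_a, counts_b: OTU counts in the same OTU order; an OTU absent
    from one library has count 0 there.
    """
    if len(counts_a) != len(counts_b):
        raise ValueError("both libraries must list the same OTUs")
    total_a = sum(counts_a)
    total_b = sum(counts_b)
    p = [c / total_a for c in counts_a]  # relative abundances, library A
    q = [c / total_b for c in counts_b]  # relative abundances, library B
    cross = sum(pi * qi for pi, qi in zip(p, q))
    denom = sum(pi * pi for pi in p) + sum(qi * qi for qi in q) - cross
    return cross / denom


if __name__ == "__main__":
    # Hypothetical libraries sharing one dominant OTU.
    print(round(yue_clayton_theta([20, 3, 2, 0], [30, 0, 1, 4]), 3))
```

θ equals 1 when both libraries have identical OTU proportions and 0 when they share no OTUs, so a dominant shared OTU can yield a high θ even when few OTUs are shared.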

Table 5 Shared coverage, observed richness, diversity & similarity estimators, based on NifH translated amino acid sequences from both clone libraries.

Phylotype   OTUs   Shared     Coverage   Community          Richness index           Diversity index
cutoff (%)         OTUs (%)   (%) (a)    similarity index   Chao1 (95% CI) (b)       Shannon–Wiener (95% CI) (c)
                   (d)                   (θ) (e)
100         34     8.82       64.00      0.30               121.75 (63.88-291.73)    2.72 (2.37-3.07)
99          28     17.85      73.33      0.46               75.50 (42.89-179.48)     2.51 (2.18-2.84)
98          23     30.43      82.67      0.58               38.60 (27.23-80.53)      2.38 (2.07-2.68)
96          19     36.84      85.33      0.69               37.33 (23.48-94.08)      2.16 (1.87-2.45)
95          15     46.66      92.00      0.75               18.75 (15.64-37.02)      2.00 (1.73-2.27)
90          10     50.00      96.00      0.82               10.75 (10.07-18.45)      1.52 (1.25-1.78)
87          9      55.55      97.33      0.85               9.33 (9.02-14.96)        1.49 (1.24-1.74)

Abbreviations: CI, confidence interval; OTUs, operational taxonomic units. (a) The coverage index was calculated by the method of Good (1953). (b) The richness index was calculated by the method of Chao et al. (1993). (c) The diversity index was calculated by the method of Shannon–Wiener (Krebs, 1989). (d) Number of shared OTUs between libraries. (e) Yue and Clayton’s (2005) community overlap measure based on shared OTU proportions.

Statistical analysis by Libshuff (Singleton et al., 2001) confirmed that the clone libraries were not significantly different from one another (significance >0.025, table 6). The marginal significance (0.01-0.05) given by the parsimony method (P-test) and the weighted UniFrac test (Martin, 2002; Lozupone and Knight, 2005) indicated that the structural similarity between the communities might not occur by chance, which can be interpreted to mean that the communities were not significantly different from one another (table 6).

Table 6 Community structure comparisons based on NifH translated amino acid sequences.

Test            Comparison             dCXY score    Significance
Libshuff (a)    Strom2004-Strom1996    0.00059148    0.2839
                Strom1996-Strom2004    0.00106589    0.0912

Test            Comparison             Corrected P-value    Significance
Parsimony (b)   Strom1996-Strom2004    0.0300               Marginally significant*
UniFrac (c)     Strom1996-Strom2004    0.0200               Marginally significant

(a) Libshuff analysis calculated using the Cramer-von Mises test statistic with 10,000 randomisations by the method of Schloss et al. (2004). (b) Parsimony statistical test (P-test) with 100 permutations by the method of Martin (2002), corrected for multiple comparisons using the Bonferroni correction. (c) UniFrac statistical test (P-test) with 100 permutations by the method of Lozupone and Knight (2005), corrected for multiple comparisons using the Bonferroni correction. * Marginal significance 0.01-0.05 as calculated by UniFrac (Lozupone et al., 2006).

According to the collector’s curve analysis of OTUs, based on the furthest-neighbour algorithm and a distance precision of 0.01 (Schloss and Handelsman, 2005), the number of unique NifH amino acid sequences began to stabilize and reach an asymptote at the 99% phylotype cutoff in each clone library (table 7, figure 10). This meant full species coverage was attained at that cutoff. The 99% phylotype cutoff meant that all sequences were at a maximum distance of 1% from one another. At the 99% phylotype cutoff, the 2004 clone library sequences were grouped into only 6 OTUs, and they grouped into one OTU at the 93% phylotype cutoff (0.07 distance), indicating the relatively low diversity of the 2004 clone library. The 1996 stromatolite sequences grouped into 20 OTUs at the 99% phylotype cutoff, and even at the 88% cutoff (0.12 distance) were still not grouped into one collective OTU, as occurred with the 2004 stromatolite sequences. This indicated a higher diversity and potential richness of the NifH sequences from the 1996 clone library compared with the 2004 clone library.
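The furthest-neighbour (complete-linkage) grouping of sequences into OTUs can be illustrated with the sketch below: two clusters are merged only if every pair of their members lies within the distance cutoff. This is a simplified illustration, not the algorithm as implemented in the software used here (which, for instance, rounds distances to the stated precision); the distance matrix is hypothetical.

```python
def furthest_neighbour_otus(dist, cutoff):
    """Group sequences into OTUs by complete-linkage clustering.

    dist: symmetric square matrix (list of lists) of pairwise distances.
    cutoff: maximum distance allowed between any two members of an OTU.
    Returns a list of clusters, each a list of sequence indices.
    """
    clusters = [[i] for i in range(len(dist))]

    def linkage(c1, c2):
        # Complete linkage: distance between the two furthest members.
        return max(dist[i][j] for i in c1 for j in c2)

    while len(clusters) > 1:
        best = None
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = linkage(clusters[x], clusters[y])
                if best is None or d < best[0]:
                    best = (d, x, y)
        d, x, y = best
        if d > cutoff:
            break  # no remaining pair of clusters fits within the cutoff
        clusters[x] = clusters[x] + clusters[y]
        del clusters[y]
    return clusters


# Hypothetical distances between four sequences.
toy_dist = [
    [0.000, 0.005, 0.050, 0.060],
    [0.005, 0.000, 0.050, 0.060],
    [0.050, 0.050, 0.000, 0.020],
    [0.060, 0.060, 0.020, 0.000],
]

if __name__ == "__main__":
    for cutoff in (0.01, 0.03, 0.07):
        print(cutoff, len(furthest_neighbour_otus(toy_dist, cutoff)))
```

Relaxing the cutoff can only merge OTUs, which is why the number of OTUs in tables 5 and 7 decreases from the 100% to the 87% phylotype cutoff.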

Figure 10 Collector’s curves for taxa (defined here as OTUs), with phylotype cut-offs of 99% (0.01) - 88% (0.12), based on NifH translated amino acid sequences. (A) 1996 stromatolite clone library (B) 2004 stromatolite clone library.

Table 7 Coverage, observed phylotype richness and diversity indices for each clone library, based on NifH translated amino acid sequences.

Library         Sequence   Phylotype    Coverage   OTUs   Richness index           Diversity index
                length     cutoff (%)   Good (%)          Chao1 (95% CI) (b)       Shannon–Wiener (95% CI) (c)
                analysed                (a)
1996            105 AA     100          43.24      25     130 (56.46-375.48)       2.97 (2.65-3.28)
Stromatolites              99           51.35      22     98.5 (43.99-288.10)      2.76 (2.42-3.09)
                           98           67.57      18     40 (23.58-104.73)        2.55 (2.23-2.86)
                           96           72.97      16     31 (19.50-80.24)         2.40 (2.09-2.72)
                           95           83.78      13     16.75 (13.64-35.02)      2.21 (1.91-2.50)
                           90           91.89      10     10.75 (10.07-18.45)      1.96 (1.69-2.23)
                           87           94.59      9      9.33 (9.02-14.96)        1.91 (1.66-2.16)

2004            104 AA     100          84.21      9      14.00 (9.86-37.91)       1.11 (0.67-1.55)
Stromatolites              99           94.74      6      6.33 (6.02-11.96)        0.91 (0.53-1.29)
                           98           97.37      5      5.00 (5-5.00)            0.85 (0.50-1.19)
                           96           97.37      3      3.00 (3-3.00)            0.55 (0.30-0.81)
                           95           100.00     2      2.00 (2-2.00)            0.44 (0.24-0.63)

Abbreviations: CI, confidence interval; OTUs, operational taxonomic units. (a) The coverage index was calculated by the method of Good (1953). (b) The richness index was calculated by the method of Chao et al. (1993). (c) The diversity index was calculated by the method of Shannon–Wiener (Krebs, 1989).

The 1996 columnar stromatolite diazotrophic community grouped into 18 OTUs at the 98% phylotype cutoff, and the number of species estimated by the Chao1 non-parametric richness estimator was 40 (23.58-104.73, 95% CI, table 7), indicating that when sampled to completion there would be between 5 and 86 more NifH species obtained. The 2004 clone library included 5 OTUs at the 98% phylotype cutoff, and the Chao1 richness estimate was also 5 (95% CI 5-5), which indicated that at this specific cutoff all NifH species were sampled to completion. The Shannon-Wiener index of diversity (H’) was 2.55 and 0.85 for the 1996 and 2004 clone libraries, respectively. With an estimated coverage of >67% for both libraries at the 98% phylotype cutoff, it was clear that the 2004 clone library was far less diverse and less rich in NifH species than the 1996 clone library.

A possible explanation for the differences in diversity and richness estimators between libraries might originate from different environmental conditions at the time of sampling. Mean rainfall (mm) from 1990 to 2010 in Hamelin Pool was 199.7 mm y-1, and in the month of May alone was 29.8 mm (Bureau of Meteorology, 2011). In 2004 and 1996 there was no substantial deviation from this mean in May, yet 1996 had much higher rainfall occurring throughout the year. Hamelin Pool experienced far more rainfall in February, June, July, August and October 1996, culminating in a total of 299.2 mm rainfall (50% increase).

This would have changed the local water budget, usually dominated by evaporation of freshwater and influx of saline oceanic waters (Smith and Atkinson, 1983). An increase in fresh water would probably have lowered salinity levels, washed additional nutrients into the bay and changed Hamelin Pool’s water chemistry. These conditions would further influence microbial community composition, allowing proliferation of new phylogenetic groups to participate in new biochemical processes and niches, as has been evident in other hypersaline microbial mats under similar conditions (Yannarell et al., 2006). As expected from arid and dry conditions, 2004 library NifH sequences were affiliated with Cyanobacteria, a resilient group of microorganisms which flourish (sometimes exclusively) under various stressful conditions (Paerl et al., 2000; Pandey et al., 2004; Yannarell et al., 2007). The 1996 library included far more non-cyanobacterial nitrogenase sequences, which was also the case for a sample, taken during the wet season, of a hypersaline microbial mat from Salt Pond, San Salvador Island, Bahamas (Yannarell et al., 2006). Currently we do not have additional environmental data to further support the above suggestion or offer an alternative explanation.

3.3.7 Nitrogen fixation potential in Shark Bay

Past studies of the stromatolite bacterial communities in Hamelin Pool, Shark Bay, have suggested the presence of several possible diazotrophs based on 16S rDNA molecular analyses and culturing efforts (table 1). Bacterial matches between those studies and this study included uncultured clones of the sulphate reducer Desulfatibacillum alkenivorans and clones with less than 90% sequence similarity to Desulfovibrio africanus and P. carbinolicus DSM 2380, sampled from smooth and pustular mats in the same locality (Allen, 2006; Allen et al., 2009). In addition, cyanobacterial matches to this study included Xenococcus, Oscillatoria and Cyanothece isolates at 92% - 93% sequence similarity, below the acceptable threshold of 95% sequence similarity for a positive genus identification (Everett et al., 1999; Clarridge, 2004). Xenococcus spp. were isolated from pustular and smooth mats, and Cyanothece and Oscillatoria spp. were isolated from columnar stromatolites in the past (Burns et al., 2004; Goh et al., 2008). Since a few stromatolite NifH clones were affiliated in the phylogenetic analysis with Marichromatium purpuratum, it is worth mentioning that an obligate halophilic diazotrophic strain of Chromatium vinosum (also known as Allochromatium vinosum, from the same family - Chromatiaceae) was isolated from surface deposits of columnar stromatolites in the intertidal zone of Hamelin Pool (Bauld et al., 1986).

While taking into consideration all the possible diazotrophic genera in Hamelin Pool stromatolites (table 1, underlined names), this study has confirmed that δ-Proteobacteria and Cyanobacteria representatives were present in columnar stromatolites. More specifically, Desulfatibacillum and Chroococcales, Oscillatoriales and Pleurocapsales members - Cyanothece, Xenococcus and Oscillatoria were identified. Additional potential diazotrophs were a novel discovery and were not identified before: Cyanobacterium UCYN-A and γ-Proteobacteria members Teredinibacter and Halorhodospira in cluster I, δ-Proteobacteria representatives Desulfovibrio and Desulfonatronospira in cluster III and Pelobacter in the out-group to cluster I.

Because the nifH gene is present in a relatively limited number of Eubacteria genomes, in comparison to 16S rDNA, we would definitely not expect diversity based on nifH to exceed diversity estimates based on 16S rDNA analysis. Diversity estimates would be lower also because they were based on amino acid sequences, not nucleotides. A 4% distance within a group of amino acid sequences might underestimate a more diverse population of nucleotide sequences. However, for OTU based analysis purposes, we can go forward, bearing this assumption in mind while discussing molecular diversity based on NifH translated sequences vs. 16S rDNA.
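The point that amino acid distances can mask nucleotide diversity follows from synonymous codons, and can be shown with a toy example. In the sketch below, two hypothetical nucleotide sequences differ at 3 of 12 positions yet translate to the same peptide; the codon table is deliberately partial (standard genetic code, only the codons used here) and the sequences are not from this study.

```python
# Partial standard codon table: only the codons needed for the toy example.
CODON_TABLE = {
    "ATG": "M",
    "GGT": "G", "GGC": "G",
    "GCT": "A", "GCC": "A",
    "AAA": "K", "AAG": "K",
}


def translate(nucleotides):
    """Translate an in-frame nucleotide string using the partial table."""
    if len(nucleotides) % 3 != 0:
        raise ValueError("sequence length must be a multiple of 3")
    return "".join(
        CODON_TABLE[nucleotides[i:i + 3]]
        for i in range(0, len(nucleotides), 3)
    )


def p_distance(seq_a, seq_b):
    """Fraction of positions that differ between equal-length sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have equal length")
    return sum(a != b for a, b in zip(seq_a, seq_b)) / len(seq_a)


# Hypothetical sequences differing only at third codon positions.
seq_1 = "ATGGGTGCTAAA"
seq_2 = "ATGGGCGCCAAG"

if __name__ == "__main__":
    print(p_distance(seq_1, seq_2))                        # nucleotide level
    print(p_distance(translate(seq_1), translate(seq_2)))  # amino acid level
```

Here the nucleotide distance is 0.25 while the amino acid distance is 0, so both sequences would fall into a single OTU at any amino acid cutoff.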

Previous molecular analyses of 16S rDNA, from smooth and pustular stromatolite mats, generated bacterial clone libraries that were fairly similar to one another in terms of their richness and diversity (Allen et al., 2009), yet columnar intertidal stromatolite clone libraries had lower estimates of diversity and OTUs richness (Allen et al., 2009). At the 98% phylotype cutoff, bacterial smooth mat sequences were grouped into 111 OTUs, with a Chao1 richness estimator of 6216, and 4.71 for Shannon-Wiener index of diversity (H’). At the same cutoff (98%), pustular mat sequences were grouped into 110 OTUs, Chao1 = 3053, H’ = 4.7. Columnar intertidal stromatolite sequences, on the other hand, grouped into 34 OTUs, Chao1 = 45.2 and H’ = 2.89 at the same cutoff level, which indicated a substantial drop in richness and species diversity. Additional 16S rDNA-based studies confirmed relatively low richness and diversity estimators for the bacteria within columnar stromatolites from Shark Bay (Papineau et al., 2005; Goh et al., 2008). This consistent finding can be attributed to the fact that columnar stromatolites contain lower biomass in general and higher net carbon precipitation, and therefore undergo lithification, producing less space and volume in which microorganisms can live (Dupraz and Visscher, 2005).

The shared estimators for richness and diversity from both NifH clone libraries were slightly lower compared to the above mentioned 16S rDNA-based analysis of bacterial communities (Burns et al., 2004; Allen et al., 2009). At a 98% phylotype threshold, NifH sequences were grouped into 23 OTUs, with a Chao1 richness estimator of 38.6 and H’ = 2.38 (table 5). Because our analysis was based on amino acid sequences, it underestimated, to a certain degree, the true diazotrophic diversity within stromatolites. However, our NifH estimates were on the same scale as 16S rDNA-based richness and diversity estimates in columnar stromatolites, and it is possible that the nifH DNA fragments were as diverse and abundant as the 16S rDNA fragments. A cautious conclusion, based solely on diversity and richness calculations, would be that the bacterial community in columnar stromatolites specifically is composed mostly of diazotrophic species, and may exhibit spatial and temporal differentiation with regard to nitrogen fixation.

Uncultured nifH clones from Guerrero Negro (GN) salt ponds were a common finding in our BLAST analysis (see tables 3 & 4). Microbial mats from the Guerrero Negro in Baja California, Mexico, provide a well-studied system which is similar, in certain characteristics, to the Shark Bay system. Furthermore, in order to provide a likely depiction of the active and potential nitrogen fixers in columnar stromatolites, we reviewed findings from our study, the GN studies and former 16S rDNA-based analyses of the Hamelin Pool stromatolites (table 8).

The GN study site is set in a hyperarid climate (sporadic rainfall of 35 mm yr-1), with mean monthly maximum high temperature of 29°C (Summers et al.) and high evaporation rates (1500 mm yr-1) (Jørgensen and Des Marais, 1990). A gentle tide of 0.5 – 1 m floods onto narrow, shallow trenches and creates a natural large marsh land with shallow pools and hypersaline evaporitic ponds (80‰ - 108‰ salinity), in which cyanobacterial mats prosper (Fryberger et al., 1990; Jørgensen and Des Marais, 1990). While the environmental characteristics are similar in general to those of Shark Bay’s Hamelin Pool, the mat morphologies differ, as columnar stromatolites (also known as ‘stromatolite heads’) are not present in the Guerrero Negro study site (Javor and Castenholz, 1981; Hoehler et al., 2001).

Generally, bacterial communities were found to be similar between Hamelin Pool (HP) and Guerrero Negro (GN) based on 16S rDNA analysis, but they were not identical. Some of the most abundant bacterial divisions in HP mats were also abundant in GN mats – mainly α-Proteobacteria, Bacteroidetes, Planctomycetes and δ-Proteobacteria (Ley et al., 2006; Goh et al., 2008; Allen et al., 2009).

Table 8 Potential diazotrophs in Hamelin Pool (HP) and Guerrero Negro (GN), based on 16S rDNA or nifH genes molecular analysis.

Common potential diazotrophs in GN and HP
  Based on 16S rDNA (a,b): Chroococcidiopsis, Cyanothece*, Gloeocapsa*, Halothece, Leptolyngbya, Lyngbya, Microcoleus, Oscillatoria, Phormidium*, Synechocystis
  Based on nifH gene (c,d): Cyanothece*, Desulfovibrio*, Myxosarcina*

Unique potential diazotrophs in GN or HP
  Based on 16S rDNA, HP (a): Bacillus, Gloeothece*, Halomonas, Methanosarcina*, Myxosarcina*, Pleurocapsa*, Pseudoalteromonas, Pseudomonas, Rhodobacter, Rhodopseudomonas, Rhodospirillum*, Pseudanabaena, Stanieria*, Symploca, Synechococcus*, Vibrio, Xenococcus*
  Based on 16S rDNA, GN (b): Desulfatibacillum, Chlorobium, Desulfobacter, Desulfobacterium, Desulfococcus, Desulfovibrio
  Based on nifH gene, HP (c): Cyanobacterium UCYN-A*, Desulfatibacillum*, Desulfonatronospira, Halorhodospira, Marichromatium, Oscillatoria*, Pelobacter, Teredinibacter, Xenococcus
  Based on nifH gene, GN (d): Anabaena, Azotobacter*, Burkholderia*, Clostridium*, Dermocarpa, Desulfonema*, Halothece, Klebsiella*, Plectonema, Synechocystis*

* 16S rDNA and nifH genes sequence similarity was less than 95% to a designated genus. (a) Data collected from the following references: (Burns et al., 2004; Papineau et al., 2005; Goh et al., 2006; Allen et al., 2008; Allen et al., 2009) (b) Data collected from the following references: Risatti et al., 1994; López-Cortés et al., 2001; Ley et al., 2006. (c) Data from this study. (d) Data collected from the following references: Omoregie et al., 2004a; Omoregie et al., 2004c. It does not include results from green house experiments.

The majority of potential diazotrophs in GN mats were affiliated with cluster III representatives - δ-Proteobacteria and Firmicutes; yet included also cluster I representatives such as Cyanobacteria and β,γ-Proteobacteria (table 8 and references within). There were no representatives of Pelobacter spp. or associations with cluster II or cluster IV. Common potential diazotrophs in HP and GN, based on 16S rDNA, included 10 cyanobacterial representatives from cluster I - Chroococcales, Oscillatoriales and Pleurocapsales groups, while unique GN potential diazotrophs included six genera mainly from δ-Proteobacteria, cluster III.

There were fewer common diazotrophs in HP and GN, based on nifH gene studies. These included Cyanothece, Myxosarcina (cluster I) and Desulfovibrio genera (cluster III). The GN site had 10 unique potential diazotrophs, and our study has identified 9 unique potential diazotrophs in HP, all of which were affiliated with cluster I or III.

Following reverse transcriptase PCR analysis in the GN mats, it was concluded that actual nitrogen fixers during night time were Halothece sp. strain MPI96P605, Myxosarcina strain ATCC 29377, Synechocystis sp. strain WH8501, Plectonema boryanum, Phormidium sp. strain ATCC 29409 and NifH2 of Anabaena variabilis ATCC 29413 from cluster I. Only one genus from cluster III was identified as an active nitrogen fixer - Desulfovibrio (Omoregie et al., 2004a). Halothece, Synechocystis, and Phormidium were detected in HP based on past 16S rDNA analysis, and Myxosarcina and Desulfovibrio were detected in HP columnar stromatolites based on this study using nifH gene analysis. This would point to a potentially similar pattern of nitrogen fixation.

In regards to community diversity and richness, the GN system, based on 16S rDNA, was estimated to harbour almost twice the number of bacterial species - 10,000 vs. 6216 in HP smooth or pustular mats (Ley et al., 2006; Goh et al., 2008; Allen et al., 2009). However, diazotroph-related estimators of richness and diversity, based on the nifH gene, were not available for the GN mats and we therefore cannot compare this specific aspect. Though nitrogenase activity was not measured in columnar stromatolites, in GN mats nitrogenase activity was restricted mostly to the upper 5 mm and peaked during night time (9-37 μmol C2H4 m-2 h-1, 0:00-6:00), with almost no activity during the day time (Omoregie et al., 2004b).

In summary, based on the available data, Hamelin Pool columnar stromatolites and GN mats harbour similar diazotrophic species. These include δ-Proteobacteria and Cyanobacteria representatives from cluster I and cluster III of the nifH phylogeny tree. It is plausible that the nitrogenase activity in columnar stromatolites in HP would peak during night time in the upper layers of the mat, and that actual nitrogen fixers would be Desulfovibrio, Myxosarcina, Xenococcus spp. and also perhaps Halothece, Synechocystis, and Phormidium, as they were previously identified in Hamelin Pool (table 8) and were active nitrogen fixers in GN mats. It remains to be seen if future samples from columnar stromatolites, under different environmental conditions, would reveal additional diazotrophs and their activity pattern.

3.4 Concluding remarks

Columnar stromatolites are one of five well known morphologies of modern stromatolites in Shark Bay, usually found in shallow hypersaline waters. In order to assess this complex microbial mat community, this study used DNA-based, culture independent, molecular techniques and provided a novel view of the microbial diazotrophic communities within columnar stromatolites.

Sequence analysis has provided statistically significant taxonomical identification and an evolutionary representation of the nifH genes in this community. Our analysis indicated columnar stromatolites, sampled from different years, included a common persisting cyanobacterial diazotroph, of the genus Cyanothece or Xenococcus (tables 3 & 4, figure 8). The diazotrophic community structure did not vary significantly between the temporal samples according to our statistical tests (table 6). Diversity and richness did vary between the samples, probably due to environmental shifts which affected seawater salinity levels and allowed for diverse microbial groups to proliferate in 1996 (table 7). Both samples contained novel nifH gene nucleotide sequences with low similarity scores to uncultured nifH clones from saline to hypersaline environments, and translated NifH sequences with high similarity to unicellular, non-heterocystous Cyanobacteria and γ,δ-Proteobacteria NifH sequences.

NifH clone sequences were mainly affiliated with cluster I and, to a lesser extent, with cluster III, suggesting aerobic and anaerobic bacteria with conventional Mo nitrogenase might be involved in the nitrogen fixation process. Not a single clone was affiliated with cluster II or cluster IV, while several clones were affiliated with a δ-Proteobacteria out-group to cluster I, represented by P. carbinolicus DSM 2380. Taking into consideration past studies done on this community and similar microbial mats in hypersaline environments such as those present in the Guerrero Negro (GN) salt ponds, we suggest columnar intertidal stromatolites are less diverse and rich in microbial species relative to other mat morphologies, and most of these species will retain nitrogen fixation capabilities. Additionally, it would seem that marine-based diazotrophic bacteria are capable of enduring hypersaline conditions, and it remains to be seen what their adaptive mechanisms are.

In conclusion, Shark Bay, a UNESCO World Heritage site, continuously provides researchers with fascinating endemic microbiological subjects that bridge our current era with Archaean fossil records of early organic life on planet Earth. This furthers our understanding of how life began, evolved and survived dynamic environmental conditions, on a geological scale.

Chapter 4 The bacterial diazotrophic community in a radon hot spring, South Australia.

4.1 Introduction

Paralana Hot Springs (PHS) are situated in Mt. Painter, near the town of Arkaroola, on the north eastern side of the Flinders Ranges, South Australia (30°10’35”S, 139°26’26”E, figure 1). The climate is arid, with an average annual rainfall of 20.3 mm, with an extreme of 1270 mm in 1974 (Sprigg, 1984). The maximum temperatures at Arkaroola can exceed 30°C during the summer months and minimum temperatures can fall below 10°C during May to September (Bureau of Meteorology, 2011). There are several water sources in the Paralana fault area; PHS is the only radioactive spring in the area, and includes two connected oval shaped pools and a draining creek. Pool 1 is the hot source pool and, depending on the time of year and flooding events, tends to vary in size, depth and temperature (2 - 9 m2, 30 cm – 80 cm deep, 48°C – 63°C; Mawson, 1927; Grant, 1938; Long et al., 2001; Anitori et al., 2002). Pool 2 is the larger of the two, deeper and cooler (50-80 m2, 1 to 4.5 m deep, 40.2°C - 48°C) with neutral pH (7). Both pools include microbial components, which manifest as floating microbial mats of emerald-green colour, as well as dark benthic mats, mainly in pool 2 (Anitori et al., 2002).

The PHS system, though unique in its characteristics, is not an isolated or a closed ecosystem. It is subject to external inputs from its surrounding fauna and flora due to floods and long-standing human interest in the springs for cultural and medicinal values (Sprigg, 1984). Underground water circulates through the hot, radioactive rocks underlying the Mt. Painter Domain, and then flows near a localized radiogenic source, relatively close to the surface (Brugger et al., 2005). Hence, water is discharged at the hot source pool, at relatively high temperatures (56°C - 63°C), and very high radon levels (29,000 Bq/L, in the gas bubbles), with traces of radiogenic helium.

Figure 1: Arkaroola and Paralana Hot Springs locations in South Australia. Main satellite image by Google Earth, inset map source: Australian Bureau of Meteorology.

In general, most of the PHS studies have focused on their geology, mineralization processes, hydrothermal activity and geochemistry characteristics, while rarely analyzing the springs’ unique biological and ecological attributes (Mawson, 1927; Grant, 1938; Blight, 1977; Smith, 1992; Long et al., 2001; Anitori et al., 2002; Thomas and Walter, 2002; Brugger et al., 2005). PHS water chemistry has not changed significantly over the past 81 years, therefore its rather stable conditions can support a steady, endemic microbial community within the pools (Mawson, 1927; Grant, 1938; Long et al., 2001). The PHS waters contain fluorine, cesium, rubidium, tungsten and molybdenum at ppm concentration levels, and uranium at ppb concentration levels (Brugger et al., 2005). It has been suggested that the concentration differences between the hot source pool and pool 2 of several trace elements such as Al, Cu, Pb, Y, and Zn, as well as Mn and Fe, were caused by the microbial community present in the pools, though the exact mechanisms were not suggested (Brugger et al., 2005).

Furthermore, there seems to be an active microbial uptake of N2 and CO2, based on the gas bubble analysis. The dry gas concentration of N2(g) ranged between 79 and 80%, and that of CO2(g) between 3.8 and 5.2% in the hot source pool, while their atmospheric composition is usually 78% and 0.03%, respectively (Brugger et al., 2005).

Identifying the functional microbial groups in the hot source pool would be of great interest, especially in light of its current temperature and radiation regime.

A single microbial study has been published to date about PHS (Anitori et al., 2002). The 16S rDNA molecular analysis detected a rich and diverse bacterial community with representatives from nine taxonomical groups - the β-Proteobacteria, δ-Proteobacteria, Cyanobacteria, Firmicutes, Bacteroidetes/Chlorobi group, Nitrospira, Chloroflexi and two candidate divisions – OP8 and OP12 (Anitori et al., 2002). The 16S rDNA clone library was composed mainly of cyanobacterial and β-Proteobacteria sequences, and a few δ-Proteobacteria representatives. Thermophilic bacteria were mainly affiliated with the Cyanobacteria group - a thermophilic Oscillatoria amphigranulata was a dominant sequence and additional Oscillatoriales, Chroococcales, and Nostocales at low sequence similarities were found in the temperature range of 48°C to 60°C. Mastigocladus laminosus was the prevalent sequence in a sample taken at 53°C in the hot source pool. Mesophilic heterotrophic bacteria were affiliated with the β-Proteobacteria and Nitrospira groups. The high bacterial diversity of PHS was demonstrated by the 180 different RFLP patterns that were detected during 16S rDNA molecular analysis of the PHS hot source pool (Anitori et al., 2002).

It would be of interest to see whether the diversity of the diazotrophic community would complement the 16S rDNA study, and whether our findings would suggest nitrogen fixation dynamics occur in PHS in a similar fashion to Yellowstone National Park hot springs (see introduction chapter, section 1.7). Based on the 16S rDNA study, and on past studies of hot springs such as those in Yellowstone National Park, one would expect to find evidence of diazotrophic cyanobacterial representatives (heterocystous or unicellular), as well as sulphate reducers and heterotrophic diazotrophs from the Nitrospira and β-Proteobacteria groups.

Thermal radioactive springs are unique habitats for exploring microbiological agents with unique adaptations to this environment, as well as potential novel solutions for DNA stabilization and repair mechanisms, chaperone proteins and internal modifications to important proteins, and other molecular mechanisms. The main aim of this study was to explore the diazotrophic community in PHS hot source pool and compare the results to the former microbiological study of this unique site, as well as to other findings relating to thermophilic diazotrophs. Nothing is known of the potential and active nitrogen fixers in the PHS bacterial community, and it is of interest to compare this diazotrophic community characteristic to other thermal microbial systems.

4.2 Materials and methods

4.2.1 Sample collection

Water levels were relatively low, and white evaporitic lines were evident on the rocks surrounding the pools at the time of sampling, 11th of July, 2009. The hot source pool (pool 1) was ~30 cm deep, and pool 2 was 1-2 m deep (figures 2 and 3). The hot source pool measured ~ 1 × 2 m in a roughly oval shape, with water and gas bubbles (radon) emerging through the sediment and gravel. Two samples, each a 50 ml Falcon tube, were collected from several localities in PHS: the hot source pool (pool 1), the subsequent cooler pool (pool 2) and a point ~43 m downstream in the Hot Spring Creek. Samples taken from sites other than pool 1 were not analysed, and are therefore not discussed here.

At the time of sampling, 15:00-18:00, the hot source pool temperature was 55.6°C, with a neutral pH of 7. The temperature at the sampling sites was measured with a digital pocket thermometer while the pH was recorded using pH test strips (range 0–14; Sigma, Australia). Fifty ml sterile Falcon tubes were used to bore into 2 different sections of the source pool separated ~ 0.5 m from one another. Site one was right above the emergence of gas bubbles and the sediment was composed of finely grained, soft, grey particles, entangled with brown-beige coloured organic matter. The second site was further away from the gas source with no gas bubbles present and included a layered mat, spongy in structure and texture, coloured yellow, brown and dark green, with fewer sediment particles than at site 1. The source pond was blocked from two sides by big boulders and its water trickled around them, via the sediment, into the cooler pond (pool 2). After collection all samples were placed in sterile specimen bags and kept in the dark, at 4°C, during transport back to Arkaroola lodge for further processing. Five hours after sampling, roughly 2 ml of the original samples were transferred, under flame, to an equal volume of RNALater buffer (Ambion, Austin, TX) and kept in the dark, at 4°C, until further processing.

Figure 2 PHS hot source pool on the 11th of July 2009. Water and gas bubbles (Radon) emerge through the sediment just behind the right boulder.

Figure 3 PHS pool 2 on the 11th of July 2009, with floating microbial mats.

4.2.2 DNA isolation and PCR amplification of nifH genes

Two different genomic DNA extraction methods were employed, each with three replicates from the hot source pool samples. The total genomic DNA from three replicates was extracted using the PowerPlant™ DNA Isolation Kit (MO BIO Laboratories, Inc., Carlsbad, CA) following the manufacturer’s instructions.


Three replicates were from the low organic content locality with gas bubbles present, and three replicates were from the distant locality, which was rich in organic content and had no gas bubbles. Total genomic DNA was also extracted using the XS DNA extraction method (Neilan, 1995; Tillett and Neilan, 2000). Approximately 100 mg of homogenised environmental sample was transferred to 1.5 ml Eppendorf tubes containing 500 μL of XS extraction buffer (1% potassium-methylxanthogenate; 800 mM ammonium acetate; 20 mM EDTA; 1% SDS; 100 mM Tris-HCl, pH 7.4). Then 0.2 g of silicate beads (0.1 mm) was added to each tube, and lysis was performed by bead beating (BIO101/Savant FastPrep FP120, Qbiogene, Inc.) three times at the highest speed (45 seconds at 6.5 m sec-1), followed by 2 hours of incubation at 65°C with intermittent vortexing. Once properly lysed, the sample was placed on ice for 30 min or left overnight at -20°C, after which it was centrifuged at 14000 g for 20 min.

An equal volume of phenol: chloroform: isoamyl alcohol (25:24:1) was added to the supernatant, which was then centrifuged at 14000 g for 5 min at 4°C. The top layer was again transferred to a fresh tube, and the DNA was precipitated using 1 volume of isopropanol (or two volumes of ice-cold absolute ethanol) and 0.1 volume of 4 M potassium acetate. The samples were incubated at -20°C for 2 hours, or overnight, and then centrifuged at 14000 g, at 4°C, for 20 min to pellet the DNA. The supernatant was discarded and the pellet washed with ice-cold 70% ethanol, followed by centrifugation at 14000 g. The DNA pellet was dried and resuspended in 30 μL of sterile TE buffer (10 mM Tris-HCl, 1 mM EDTA, pH 8.0). The DNA concentration was measured using a NanoDrop® ND-1000 spectrophotometer (Thermo Fisher Scientific, Wilmington, DE).

The pooled extracted genomic DNA from each method (1 μL at 5 ng μL-1) was applied separately in a nested PCR to amplify the nitrogenase gene nifH (Omoregie et al., 2004c), as described previously in chapter 3, section 3.2.2. All PCR experiments included a negative control reaction without DNA template and a positive control using DNA extracted from the cyanobacterial reference strain Nostoc PCC 7120.
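Bringing a measured extract to the 5 ng μL-1 working concentration is a C1·V1 = C2·V2 calculation. The sketch below is illustrative only: the function name, the 20 μL final volume and the example stock concentration are assumptions, not values from this study.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=5.0, final_volume_ul=20.0):
    """Return (stock_ul, diluent_ul) satisfying C1 * V1 = C2 * V2.

    stock_ng_per_ul: concentration measured on the spectrophotometer.
    target_ng_per_ul: working concentration (5 ng/uL for the PCR template).
    final_volume_ul: hypothetical volume of working stock to prepare.
    """
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_volume_ul / stock_ng_per_ul
    return stock_ul, final_volume_ul - stock_ul


# Hypothetical extract measured at 50 ng/uL: mix 2 uL extract with 18 uL buffer.
print(dilution_volumes(50.0))
```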

4.2.3 Clone library and Restriction Fragment Length Polymorphism (RFLP)

Freshly amplified PCR products of the nifH gene, containing an A-overhang at the 3’ end, were ligated into the pCR2.1 vector of the TOPO TA Cloning kit (Invitrogen Corporation, Carlsbad, CA) and transformed according to the manufacturer’s instructions. From each clone library, at least 50 positive (white) clones with the correct insert size (350 bp) were selected and their inserts amplified using the vector-specific primers MpF and MpR. PCR products of the correct size from positive clones were cleaned and visualized as described previously, in section 4.2.2.

Each clone was subjected to Restriction Fragment Length Polymorphism (RFLP) analysis and was screened twice, using the restriction enzymes ScrFI and MspI (New England Biolabs, Ipswich, MA) separately. Each digest reaction contained 1.5 μL of PCR product, 2 μL of the appropriate enzyme buffer, 1 U of restriction enzyme and sterile MilliQ water to a total volume of 20 μL, and was incubated at 37°C overnight. The RFLP patterns were analysed manually after electrophoresis on 2% and 3% agarose gels (molecular biology grade, Progen Pharmaceuticals, QLD, Australia) with 1× TAE buffer, stained with ethidium bromide (1 μg ml-1) for 10-15 min and visualized as described previously in section 4.2.2.
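The patterns in this study were scored manually on gels. As a complement, expected fragment sizes can be predicted from a known insert sequence. The following is a minimal sketch, assuming the commonly listed recognition sites C^CGG for MspI and CC^NGG for ScrFI (these should be checked against the supplier's specifications); the function name and the toy sequences are hypothetical.

```python
import re

# Recognition pattern and cut offset (bases from the start of the site to the
# top-strand cut). Assumed values: MspI = C^CGG, ScrFI = CC^NGG.
ENZYMES = {
    "MspI": ("CCGG", 1),
    "ScrFI": ("CC[ACGT]GG", 2),
}


def fragment_lengths(seq, enzyme):
    """Return fragment lengths (longest first) from a complete linear digest."""
    pattern, offset = ENZYMES[enzyme]
    seq = seq.upper()
    # A lookahead is used so that overlapping recognition sites are all found.
    cuts = [m.start() + offset for m in re.finditer(f"(?={pattern})", seq)]
    bounds = [0] + cuts + [len(seq)]
    return sorted((b - a for a, b in zip(bounds, bounds[1:])), reverse=True)


# Toy 14-base sequence with a single MspI site.
print(fragment_lengths("AAAACCGGTTTTTT", "MspI"))
```

Two clones with identical predicted fragment lists for both enzymes would be expected to fall into the same RFLP group, although fragments that differ by only a few bases may not be resolved on a 2-3% agarose gel.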

4.2.4 DNA sequencing

Sequencing of selected clones was carried out using the PRISM Big Dye cycle sequencing system with MPF or MPR primers (3.2 μM) and 3-60 ng of cleaned PCR product. After the sequencing reactions had been performed, the products were cleaned up and analysed as described previously in chapter 3, section 3.2.4.

4.2.5 Phylogenetic analysis

Phylogenetic analysis was carried out as described previously in chapter 3, section 3.2.5.

4.2.6 Diversity, richness and coverage analysis

NifH translated nucleotide sequences of 137 bp average lengths were aligned using the computer package “Muscle” version 3.8.31 and the clone library sampling coverage, diversity and richness were calculated as described previously in chapter 3, section 3.2.6.

4.2.7 Accession numbers

Sequences of the nifH clones are available under GenBank accession numbers KC295666-KC295692.


4.3 Results and discussion

4.3.1 BLAST & BLASTX comparative analysis

NifH genes were present and were amplified from Paralana hot source pool samples (pool 1, figure 4).

Figure 4 Products obtained after the second step of the PCR amplification of nifH from PHS hot source pool DNA extractions using nested primers. Lane 1: PHS hot source pool; 2: negative control, sterile MilliQ H2O; 3: positive control Nostoc PCC 7120; M: 0.5 μg μL-1 GeneRuler™ DNA Ladder Mix (Fermentas, Ontario). Marker bands labelled in the image: 1000 bp, 500 bp and 300 bp.

Seventy-six clones containing the correctly sized insert (350 bp) were obtained and analysed. RFLP analysis was performed on 64 randomly selected positive clones carrying the nifH insert, which sorted them into 7 groups (figure 5). Initially, three representatives of each RFLP pattern were selected for sequencing; because of the high sequence variation and diversity, additional positive clones were then sequenced directly, without further restriction enzyme treatment, until clone library coverage was deemed sufficient for diversity and richness analysis.


Figure 5. 3% agarose gel showing RFLP patterns using ScrFI restriction enzyme on 9 positive clones from PHS library. M: 0.5μg/μL GeneRuler™ DNA ladder Mix (Fermentas, Ontario).


BLAST and BLASTX results passed significant statistical thresholds for all clones: BLAST expect (E) values ranged from e-33 to e-180, and BLASTX E values ranged from e-47 to e-64.
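Selecting the single best match per clone under an E-value threshold can be sketched as below. This is an illustration only: it assumes hits exported in the default 12-column BLAST+ tabular format (-outfmt 6), which is not stated to be the workflow used here, and the 1e-30 cutoff, function name and example rows are hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(handle, max_evalue=1e-30):
    """Keep, for each query, the highest-bitscore hit with E <= max_evalue."""
    best = {}
    for row in csv.DictReader(handle, fieldnames=FIELDS, delimiter="\t"):
        if float(row["evalue"]) > max_evalue:
            continue  # hit does not pass the significance threshold
        current = best.get(row["qseqid"])
        if current is None or float(row["bitscore"]) > float(current["bitscore"]):
            best[row["qseqid"]] = row
    return best


# Invented rows for illustration; not data from this study.
example = (
    "clone_A\tsubj_1\t99.0\t350\t3\t0\t1\t350\t10\t359\t1e-180\t630\n"
    "clone_A\tsubj_2\t88.0\t350\t42\t0\t1\t350\t10\t359\t1e-100\t380\n"
    "clone_B\tsubj_3\t70.0\t100\t30\t0\t1\t100\t1\t100\t1e-5\t50\n"
)
for query, hit in best_hits(io.StringIO(example)).items():
    print(query, hit["sseqid"], hit["pident"], hit["evalue"])
```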

Two representative PHS nifH clones were related to Geobacter lovleyi strain SZ, with high sequence similarity in the BLAST results (accession ID CP001089, 99% sequence similarity, table 2). At 98% sequence similarity, a few clones were related to Mastigocladus laminosus CCMEE 5198 (EF570547) and to uncultured nifH clones (EF568492, EF568489) from the Mediterranean Sea (Man-Aharonovich et al., 2007). These matches represented less than a fifth of the clone library composition. The largest share of the hot source pool clone library (43%, left pane, figure 8) was matched in BLAST with nifH genes remotely related to G. lovleyi strain SZ (88%), Desulfovibrio magneticus RS-1 (AP010904, 85%) and uncultured nifH clones from thermal, cold and saline environments, covering sediments, soil and coastal samples (Zhang et al., 2008; Farnelid et al., 2009; Brown et al., 2010; Severin and Stal, 2010; Singh et al., 2010).

Inferred NifH amino acid sequences clustered into eight different phyla and 15 different diazotrophic genera based on the BLASTX analysis. The majority of sequences in our clone library were affiliated with the δ-Proteobacteria (33%, figure 9) and Cyanobacteria (27%), followed by β-Proteobacteria (13%), Bacteroidetes (Cytophaga-Flexibacter-Bacteroides group) and Nitrospirae (9% each). The γ-Proteobacteria, α-Proteobacteria and Firmicutes (low G+C Gram-positive bacteria) were the smallest represented groups, at 4%, 2% and 2%, respectively.

The highest amino acid sequence similarity matches (98%-100%, table 2) included Azotobacter vinelandii DJ (YP_002797378, 98%), Dechloromonas aromatica RCB (YP_284634, 100%), Burkholderia sp. Ch1-1 (ZP_06839018, 100%) and Geobacter lovleyi SZ (YP_001951460, 100%). These sequences constituted slightly more than a fifth of the clone library composition. The largest section in the clone library, 37%, included translated NifH sequences at 85%-89% similarity levels to Desulfobulbus propionicus DSM 2033, Desulfatibacillum alkenivorans AK-01, Desulfovibrio fructosovorans JJ and D. magneticus RS-1 (δ-Proteobacteria), Paludibacter propionicigenes WB4 (Bacteroidetes) and Thermodesulfovibrio yellowstonii DSM 11347 (Nitrospirae; see figure 8, right pane). Cyanobacterial matches included members of the Nostocales (Nostoc sp. PCC 7120, Anabaena variabilis ATCC 29413), Chroococcales (Cyanothece sp. CCY0110) and Oscillatoriales (Oscillatoria sp. PCC 6506) at an average sequence similarity of 95%.


Figure 8: BLASTN (left) and BLASTX (right) sequence similarity levels within the PHS hot source pool clone library.

Figure 9 Phyla distribution within PHS hot source pool NifH clone library, based on the BLASTX analysis.


Table 2 Paralana hot source pool BLAST and BLASTX results, presenting only the highest match for each sequence. A total of 46 clones were sequenced and blasted. Highest BLAST and BLASTX sequence similarities (98-100%) were marked bold. A ditto mark (") repeats the entry directly above it.

| Sequence file ID | Clone ID | BLAST match: nearest relative in GenBank | BLAST accession ID | BLAST sequence similarity (%) | BLASTX accession ID | BLASTX match: nearest bacterial Fe protein match in GenBank | BLASTX sequence similarity (%) | Phylum |
|---|---|---|---|---|---|---|---|---|
| RSA158 | HS1 | Uncultured nifH clone | EU916413 | 97 | YP_004196476 | Desulfobulbus propionicus DSM 2033 | 88 | δ-Proteobacteria |
| RSA159 | HS2 | " | EF178501 | 88 | YP_002797378 | Azotobacter vinelandii DJ | 98 | γ-Proteobacteria |
| RSA160 | HS3 | " | AY763451 | 78 | YP_002249508 | Thermodesulfovibrio yellowstonii DSM 11347 | 88 | Nitrospirae |
| RSA162 | HS5 | " | AY763455 | 84 | " | " | 83 | Nitrospirae |
| RSA163 | HS6 | Mastigocladus laminosus CCMEE 5198 | EF570547 | 98 | NP_485497 | Nostoc sp. PCC 7120 | 97 | Cyanobacteria |
| RSA164 | HS7 | Uncultured nifH clone | AY763451 | 85 | YP_002249508 | Thermodesulfovibrio yellowstonii DSM 11347 | 85 | Nitrospirae |
| RSA165 | HS8 | " | EU916413 | 89 | YP_002430688 | Desulfatibacillum alkenivorans AK-01 | 89 | δ-Proteobacteria |
| RSA166 | HS9 | " | " | 89 | " | " | 89 | δ-Proteobacteria |
| RSA167 | HS10 | " | " | 86 | " | " | 86 | δ-Proteobacteria |
| RSA168 | HS11 | " | " | 87 | YP_004196476 | Desulfobulbus propionicus DSM 2033 | 87 | δ-Proteobacteria |
| RSA169 | HS12 | " | " | 86 | " | " | 86 | δ-Proteobacteria |
| RSA170 | HS13 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 95 | YP_324741 | Anabaena variabilis ATCC 29413 | 95 | Cyanobacteria |
| RSA171 | HS14 | " | " | 95 | " | " | 96 | Cyanobacteria |
| RSA172 | HS15 | " | " | 97 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 95 | Cyanobacteria |
| RSA173 | HS16 | Uncultured nifH clone | EF178501 | 89 | YP_002797378 | Azotobacter vinelandii DJ | 98 | γ-Proteobacteria |
| RSA174 | HS17 | " | AY763451 | 86 | YP_002249508 | Thermodesulfovibrio yellowstonii DSM 11347 | 86 | Nitrospirae |
| RSA191 | HS10 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 97 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 94 | Cyanobacteria |
| RSA192 | HS18 | Uncultured nifH clone | AY196418 | 89 | YP_284634 | Dechloromonas aromatica RCB | 95 | β-Proteobacteria |
| RSA193 | HS30 | " | EF568492 | 89 | YP_003447953 | Azospirillum sp. B510 | 92 | α-Proteobacteria |
| RSA194 | HS31 | " | GU117600 | 94 | YP_284634 | Dechloromonas aromatica RCB | 100 | β-Proteobacteria |
| RSA195 | HS34 | " | EU622627 | 86 | YP_004042554 | Paludibacter propionicigenes WB4 | 87 | Bacteroidetes |
| RSA198 | HS15 | " | EF568492 | 98 | ZP_06839018 | Burkholderia sp. Ch1-1 | 100 | β-Proteobacteria |
| RSA203 | HS32 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 96 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 95 | Cyanobacteria |
| RSA204 | HS41 | " | " | 94 | " | " | 95 | Cyanobacteria |
| RSA205 | HS20 | Uncultured nifH clone | EF568489 | 95 | YP_004042554 | Paludibacter propionicigenes WB4 | 79 | Bacteroidetes |
| RSA206 | HS22 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 94 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 95 | Cyanobacteria |
| RSA207 | HS35 | Uncultured nifH clone | EU915063 | 86 | ZP_07333883 | Desulfovibrio fructosovorans JJ | 86 | δ-Proteobacteria |
| RSA208 | HS42 | " | DQ398449 | 91 | YP_284634 | Dechloromonas aromatica RCB | 95 | β-Proteobacteria |
| RSA209 | HS44 | " | EF568492 | 98 | ZP_06839018 | Burkholderia sp. Ch1-1 | 100 | β-Proteobacteria |
| RSA212 | HS29 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 94 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 94 | Cyanobacteria |
| RSA213 | HS46 | Uncultured nifH clone | EF568489 | 98 | ZP_06839018 | Burkholderia sp. Ch1-1 | 100 | β-Proteobacteria |
| RSA214 | HS48 | Geobacter lovleyi SZ | CP001089 | 88 | YP_001951460 | Geobacter lovleyi SZ | 95 | δ-Proteobacteria |
| RSA215 | HS49 | Uncultured nifH clone | EU915063 | 85 | ZP_07333883 | Desulfovibrio fructosovorans JJ | 87 | δ-Proteobacteria |
| RSA217 | HS8 | Geobacter lovleyi SZ | CP001089 | 99 | YP_001951460 | Geobacter lovleyi SZ | 100 | δ-Proteobacteria |
| RSA218 | HS11 | " | " | 99 | " | " | 100 | δ-Proteobacteria |
| RSA219 | HS2.1 | Desulfovibrio magneticus RS-1 | AP010904 | 85 | YP_002953433 | Desulfovibrio magneticus RS-1 | 88 | δ-Proteobacteria |
| RSA220 | HS2.4 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 94 | ZP_01727765 | Cyanothece sp. CCY0110 | 94 | Cyanobacteria |
| RSA221 | HS2.19 | " | " | 96 | ZP_07112556 | Oscillatoria sp. PCC 6506 | 95 | Cyanobacteria |
| RSA222 | HS2.23 | Geobacter lovleyi SZ | CP001089 | 99 | YP_001951460 | Geobacter lovleyi SZ | 100 | δ-Proteobacteria |
| RSA223 | HS2.43 | " | " | 99 | YP_001950896 | " | 100 | δ-Proteobacteria |
| RSA224 | HS2.16 | Uncultured nifH clone | GU193472 | 88 | YP_004042554 | Paludibacter propionicigenes WB4 | 88 | Bacteroidetes |
| RSA225 | HS2.27 | Oscillatoriales cyanobacterium JSC-1 nifH | FJ797416 | 94 | ZP_01727765 | Cyanothece sp. CCY0110 | 95 | Cyanobacteria |
| RSA226 | HS2.34 | Uncultured nifH clone | AY224041 | 87 | YP_002953433 | Desulfovibrio magneticus RS-1 | 89 | δ-Proteobacteria |
| RSA227 | HS2.46 | " | GU193822 | 77 | YP_003639458 | Thermincola sp. JR | 95 | Firmicutes |
| RSA228 | HS2.47 | " | GU193472 | 89 | ZP_07333883 | Desulfovibrio fructosovorans JJ | 88 | δ-Proteobacteria |
| RSA229 | HS2.56 | " | " | 89 | YP_004042554 | Paludibacter propionicigenes WB4 | 88 | Bacteroidetes |


4.3.2 Phylogenetic analysis

A total of 256 NifH amino acid sequences (137 AA length) were subjected to a maximum likelihood analysis and produced a phylogenetic tree with four major clusters, corresponding to NifH designated clusters I-IV, previously described in chapter 3, section 3.3.5 (Chien and Zinder, 1996; Zehr et al., 2003a; Raymond et al., 2004a).

Cluster I, known as the conventional Mo-containing NifH sequences, contained eight sub clusters and had a support value of 90 within the entire NifH tree (figure 10). All its sub clusters had support values above 74, thus the tree topology had high likelihood probabilities. Two distinct clusters were out grouped to cluster I, at branch support values of 59 (Nitrospirae) and 98 (Desulfuromonadales, δ-Proteobacteria). PHS clones were not affiliated with cluster II or IV. NifH cluster III had a support value of 79 within the entire NifH tree and included anaerobic diazotrophs within five sub clusters, all with branch support values above 72.

A total of 20 NifH PHS clones were affiliated with cluster I, with the Cyanobacteria and the α-, β- and γ-Proteobacteria (figure 11). Ten clones formed their own tight group related to cyanobacterial genera, namely: Oscillatoria sp. PCC 6506 and Cyanothece sp. CCY0110 (sub cluster 1-Cyan-B). One clone clustered closely to the Nostocales and Mastigocladus laminosus (Q47917 accession ID), in sub cluster 1-Cyan-C. Three clones were closely related to the Burkholderia spp. (α-Proteobacteria, 1-Prot-αβ sub cluster). A single clone, RSA193-HSP09, nestled individually between 1-Prot-αβ sub cluster and Paenibacillus azotofixans (Firmicutes, Q9AKT8). The BLASTX analysis indicated this clone was related to Azospirillum sp. B510, α-Proteobacteria, at 92% sequence similarity (table 2). In the sub cluster 1-Prot-βγ-A, two clones were affiliated with Azoarcus communis, and a single clone with Dechloromonas aromatica strain RCB (β-Proteobacteria, Q79AX4 and Q47G67, respectively). In the sub cluster 1-Prot-γ-C, two clones were closely related to the Azotobacter spp.


Figure 10: Cluster I and Cluster III positions within the three main clusters of the nifH phylogenetic tree. Topology was based on Maximum-likelihood analysis of nifH amino acid sequences. Cluster I was outgrouped by Nitrospirae, and Cluster III was outgrouped by Roseiflexus spp. (Chloroflexi).

During the BLASTX analysis, several PHS clones had 85-100% sequence similarity to unverified NifH sequences (table 2). These included, for instance, Thermodesulfovibrio yellowstonii DSM 11347 and NifH sequences from δ-Proteobacteria – Pelobacter and Geobacter spp. Though these sequences were not manually annotated or verified in the Swiss-Prot database, they were nevertheless integrated into the phylogenetic analysis to provide an unbiased view (figure 11). Four NifH clones clustered with T. yellowstonii, a thermophilic sulphate-reducing organism isolated from a thermal vent in Yellowstone Lake in Wyoming, USA (Henry et al., 1994). An additional five NifH clones clustered with the Desulfuromonadales order as these clones were matched to Geobacter lovleyi strain SZ NifH sequences, at 99-100% sequence similarity (CP001089, YP_001951460, YP_001950896, table 2). T. yellowstonii, Geobacter spp. NifH sequences and affiliated clones clustered separately from one another, forming distinct groups outside of cluster I (figure 11).

NifH cluster III, which included the anaerobic diazotrophs (figure 12), contained five sub clusters with support values >72. A total of 16 NifH PHS clones were affiliated to δ-Proteobacteria, Spirochaetes, Firmicutes and Bacteroidetes. Six clones formed a tight group within sub cluster 3-Prot-δ-B (δ-Proteobacteria) remotely related to NifH sequences from Desulfobulbus propionicus DSM 2033 and Desulfatibacillum alkenivorans AK-01 (88% BLASTX sequence similarity, table 2). An additional three clones in this sub cluster were closely related to Desulfovibrio gigas (P71156), while in sub cluster 3-Spir-A, five clones were affiliated with Treponema and Spirochaeta spp. (Spirochaetes). A single clone (RSA205-HSP09) nestled individually between two sub clusters, 3-Firm-Arch and 3-Prot-δ-B, which was suggested by the BLASTX analysis to be remotely related to Paludibacter propionicigenes WB4, Bacteroidetes, with only 79% NifH sequence similarity. An additional singular clone was affiliated with Thermincola sp. JR (YP_003639458) and the family Peptococcaceae (Firmicutes) in the sub cluster 3-Firm-Arch.

Overall, Cyanobacteria and δ-Proteobacteria contributed the main NifH sequences to the clone library (figure 13). The Spirochaetes affiliated clones were detected only during the phylogenetic analysis, while in the BLASTX analysis those sequences were matched to δ-Proteobacteria and Bacteroidetes representatives, at 88% sequence similarity. Other shifts occurred within the assignments to Firmicutes and α- and β-Proteobacteria, as would be expected, since the phylogenetic analysis employs different assumptions and algorithms in its calculation, in comparison to the BLASTX analysis (see chapter 3, section 3.3.5, for further discussion regarding this point).

Figure 11: Phylogenetic distribution of cluster I based on Maximum-likelihood analysis of partial NifH amino acid sequences. Sequences determined in this study were given an alphanumeric prefix RSAX-HSP09 and are marked bold; number of clones for each sequence is in parenthesis. Branch support values are shown for key branches; only values > 50 were considered significant. Text boxes contain designation of clusters and in parenthesis is the closest sub cluster nomination as per Zehr et al. (2003a). Prot=Proteobacteria, Cyan=Cyanobacteria, Firm=Firmicutes. The scale bar represents the number of substitutions per 100 bases. Outgroup was Desulfuromonadales (δ-Proteobacteria) nifH sequences from Geobacter and Pelobacter genera.

Cluster labels in the figure: 1-Prot-αβ (1J, 1K); 1-Prot-βγ-A (1P); 1-Prot-γ-C (1H, 1T, 1L, 1U, 1M); 1-Firm-A (1D); 1-Cyan-A (1B); 1-Cyan-B & C (1B).

Figure 12: Phylogenetic distribution of PHS hot source clones in cluster III based on Maximum-likelihood analysis of NifH partial amino acid sequences. Sequences from this study (alphanumeric prefix RSAX-HSP09) are marked bold; branch support values are shown for key branches; only values > 50 were considered significant. Spir=Spirochaetes, Arch=Archaea. The scale bar represents the number of substitutions per 100 bases.

Cluster labels in the figure: 3-Prot-δ-GS (3L, 3T); 3-Spir-A (3P, 3L); 3-Prot-δ-A (3P, …); 3-Prot-δ-B (3B, 3E, 3L); 3-Firm-Arch (3C, 3D, 3A).

Figure 13: Phyla percentile representation from PHS hot source pool clone library between NifH cluster I (clear slices) and cluster III (shaded slices). Pane A) Phyla distribution based on the BLASTX results. Pane B) Phyla distribution based on the phylogenetic analysis. In bold – The two main phyla per cluster.

4.3.3 Coverage, diversity and community richness

The “Mothur” program was employed to calculate the various ecological parameters and coverage estimators, as detailed in the previous chapter. As evident from the collector's curve (figure 14), the number of unique NifH amino acid sequences did not reach a plateau at the 99% phylotype cutoff (based on the furthest neighbour algorithm and a distance precision of 0.01; Schloss and Handelsman, 2005), but did so at the 93% phylotype cutoff. The estimated coverage by the method of Good (1953) was above 75% at the 98% phylotype cutoff. Therefore, the coverage of potential diazotrophs in the PHS hot source pool was sufficient but not complete, and the clone library was mostly representative of the diazotrophic diversity. Coverage and collector's curves suggested high diazotrophic diversity and potential richness of the NifH species present in the hot source pool.
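The furthest neighbour (complete linkage) grouping behind the phylotype cutoffs can be illustrated with a small sketch. It is not Mothur's implementation: the function name, the input format (a dictionary of pairwise distances) and the toy distances are assumptions. A distance cutoff of 0.02 corresponds to the 98% phylotype cutoff.

```python
from itertools import combinations


def furthest_neighbour_otus(names, dist, cutoff):
    """Group sequences into OTUs by complete-linkage clustering.

    dist maps frozenset({a, b}) to the pairwise distance between a and b.
    Two clusters are merged only while the LARGEST distance between any of
    their members is <= cutoff.
    """
    clusters = [frozenset([n]) for n in names]

    def linkage(c1, c2):
        return max(dist[frozenset((a, b))] for a in c1 for b in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest furthest-member distance.
        (c1, c2), d = min(
            (((x, y), linkage(x, y)) for x, y in combinations(clusters, 2)),
            key=lambda item: item[1],
        )
        if d > cutoff:
            break
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return clusters


# Toy distances (hypothetical): a-b are close, c is within 0.02 of b only.
toy = {
    frozenset(("a", "b")): 0.01,
    frozenset(("a", "c")): 0.05,
    frozenset(("b", "c")): 0.02,
}
print(furthest_neighbour_otus(["a", "b", "c"], toy, cutoff=0.02))
```

In the toy case, c stays in its own OTU at the 0.02 cutoff because its distance to a exceeds the cutoff, which is the conservative behaviour that distinguishes furthest neighbour from nearest neighbour clustering.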


Figure 14: Collector’s curves for taxa (OTUs) with minimum thresholds of 99(0.01), 98(0.02), 97(0.03), 96(0.04), 95% phylotype cutoff and lower, based on NifH partial amino acid sequences.
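A collector's curve records how many distinct OTUs have been observed after each additional sequence is sampled. The sketch below shows the idea under simple assumptions: the function name and the example labels are hypothetical, and Mothur additionally handles sampling order and the different cutoffs.

```python
def collectors_curve(otu_labels):
    """Cumulative number of distinct OTUs after each sequence, in sampling order."""
    seen = set()
    curve = []
    for label in otu_labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


# Hypothetical OTU assignments for five sequenced clones.
print(collectors_curve(["A", "B", "A", "C", "C"]))  # [1, 2, 2, 3, 3]
```

A curve that keeps rising with the last sequences sampled, as at the 99% cutoff here, indicates that further sequencing would be expected to reveal additional OTUs.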

The PHS hot source pool diazotrophic community included 20 OTUs (Operational Taxonomic Units) at a 98% phylotype threshold, and the species richness estimated by the Chao1 non-parametric estimator was 33.75 (23.40-75.55, 95% CI), indicating that, if sampled to completion, roughly 14 (and, at the upper confidence bound, up to 56) additional NifH species would be obtained (table 3, at 98% phylotype cutoff). The Shannon-Wiener index of diversity (D) ranged from 2.35 to 3.12 between the 91% and 100% phylotype cutoffs.

Table 3: Coverage, observed phylotype richness and diversity indices for PHS hot source pool clone libraries, based on NifH partial amino acid sequences

| Sample | Sequence length analysed | Phylotype cutoff (%) | Coverage, Good (%) (a) | OTUs | Richness index, Chao1 (b) (95% CI) | Diversity index, Shannon–Wiener (c) (95% CI) |
|---|---|---|---|---|---|---|
| PHS 2009 | 123AA | 100 | 53.33 | 28 | 133.00 (59.46-378.48) | 3.12 (2.86-3.38) |
| | | 99 | 64.44 | 24 | 54.00 (32.72-127.19) | 2.88 (2.60-3.16) |
| | | 98 | 75.56 | 20 | 33.75 (23.40-75.55) | 2.68 (2.41-2.96) |
| | | 95 | 82.22 | 17 | 24.00 (18.45-50.75) | 2.53 (2.27-2.78) |
| | | 93 | 86.67 | 15 | 20.00 (15.86-43.91) | 2.42 (2.18-2.66) |
| | | 91 | 88.89 | 14 | 17.33 (14.50-36.07) | 2.35 (2.11-2.59) |

Abbreviations: CI, confidence interval; OTUs, operational taxonomic units. (a) The coverage index was calculated by the method of Good (1953). (b) The richness index was calculated by the method of Chao et al. (1993). (c) The diversity index by the method of Shannon–Wiener (Krebs, 1989).
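The three estimators named in the footnotes can be sketched from a list of per-OTU clone counts. This is a minimal illustration, not the Mothur code: the function names are hypothetical, Chao1 is written in its bias-corrected form (an assumption about the variant used), and the abundance vector below is invented. It was chosen only so that it has 20 OTUs, 45 sequences, 11 singletons and 3 doubletons, which returns the coverage and Chao1 point estimates of the 98% row; the real OTU abundances are not given in this chapter.

```python
import math


def goods_coverage(counts):
    """Good's coverage: 1 - (number of singleton OTUs / number of sequences)."""
    singletons = sum(1 for c in counts if c == 1)
    return 1.0 - singletons / sum(counts)


def chao1_bias_corrected(counts):
    """Chao1 richness: S_obs + f1 * (f1 - 1) / (2 * (f2 + 1))."""
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    return len(counts) + f1 * (f1 - 1) / (2 * (f2 + 1))


def shannon(counts):
    """Shannon-Wiener index: -sum(p_i * ln p_i) over OTU proportions."""
    n = sum(counts)
    return -sum((c / n) * math.log(c / n) for c in counts)


# Hypothetical abundances: 6 larger OTUs, 3 doubletons, 11 singletons.
counts = [10, 5, 4, 3, 3, 3] + [2] * 3 + [1] * 11

print(round(100 * goods_coverage(counts), 2))  # 75.56
print(chao1_bias_corrected(counts))            # 33.75
print(round(shannon(counts), 2))
```

The Shannon value printed for this invented vector should not be compared with Table 3, since it depends on the full abundance distribution rather than on the singleton and doubleton counts alone.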

4.3.4 Nitrogen fixation in Paralana Hot Springs

The diversity analysis demonstrated high diazotrophic diversity and richness in the PHS hot source pool. The number of NifH clones analysed and sequenced in this study (76) represents the highest number of NifH clones from a singular hot spring to be analysed to date (Hamilton et al., 2011a). There is a strong potential for active nitrogen fixers, yet our attempts to identify actively transcribing species were unsuccessful, owing mainly to limited material availability (see section 4.2.1).

The DNA extraction and nifH gene amplification were successful. In summary, the best matches in the BLAST analysis were to the G. lovleyi strain SZ nifH gene, M. laminosus CCMEE 5198 and a few uncultured nifH clones from a sea water sample from the Mediterranean Sea (Man-Aharonovich et al., 2007). The heterocystous cyanobacterium M. laminosus CCMEE 5198 is a moderately thermophilic bacterium that is found in many hot springs worldwide (Miller et al., 2007) and was previously reported from PHS (Anitori et al., 2002). However, the BLASTX analysis recognised this specific clone as the heterocystous Nostoc sp. PCC 7120, at 97% sequence similarity. Even at lower similarities in the BLASTX results, the M. laminosus NifH sequence was not suggested as a possible match (data not shown), and we concluded that this clone was most probably a NifH sequence from Nostoc sp. PCC 7120. Additionally, a common match in the BLAST and BLASTX analyses was the mesophilic, strictly anaerobic G. lovleyi, with high sequence similarity scores. G. lovleyi is a known metal reducer and dechlorinating agent that has been studied extensively for its capabilities in bioremediation of pollutants (Sung et al., 2006). This finding in the hot source pool of PHS is of interest, especially if future work can verify that it is an active nitrogen fixer. Cyanobacteria and δ-Proteobacteria were the main diazotrophic taxa present in the PHS hot source pool, according to the BLASTX analysis.

Phylogenetic analysis provided additional interesting results as well, mainly in relation to the cluster I vs. cluster III affiliations. The overall tree topology included high likelihood branches, which were the result of choosing verified reference NifH sequences to work with, as well as highly optimised amino acid substitution matrix and phylogeny algorithms. The tree topology in general was similar to previously reported NifH phylogeny trees (Zehr et al., 1997; Zehr et al., 2003a), and PHS hot source pool sequences were divided amongst two main clusters, I and III, and additional out groups.

The PHS α-Proteobacteria NifH representatives were closely related to Burkholderia and Azospirillum in cluster I. These genera are considered mesophilic and are routinely found to fix nitrogen in rhizosphere and soil environments (Okon, 1985; Garrity et al., 2005). Finding such traces is not surprising in the hot source pool, as it is an open pool, subjected to various interventions from nearby soil areas. The DNA fragments may represent adjacent bacteria, which landed in the sampling area, but do not actively fix nitrogen. The β-Proteobacteria representatives were related to Azoarcus and Dechloromonas from cluster I, and BLAST analysis indicated their sequences were originally isolated from an estuary, an Antarctic microbial mat (Moisander et al., 2007; Jungblut and Neilan, 2010) and a Yellowstone National Park hot spring (Hall et al., 2008). This could point to potential active nitrogen fixers, which have the capability to adapt to various temperatures. The γ-Proteobacteria NifH clones were affiliated with the Mo-dependent Azotobacter, a well studied mesophilic soil bacterium that fixes nitrogen under microaerobic conditions, with an optimal diazotrophic growth pH of 7.0-7.5 (Dixon and Kahn, 2004; Garrity et al., 2005), again pointing to the possible introduction of this genus from nearby soil or rhizosphere areas. Half of the hot source pool cluster I NifH sequences were affiliated with the Cyanobacteria, a well known resilient group of microorganisms reported from virtually every extreme environment on Earth, which are the best candidates to be the active nitrogen fixers in this unique ecosystem (Whitton and Potts, 2000; Pandey et al., 2004; Thomas, 2005; Kaštovský and Johansen, 2008).

Sulphate reducers were another prominent finding in the hot source pool and were affiliated with cluster III. The PHS δ-Proteobacteria NifH representatives were closely related to the anaerobic Desulfobulbus, Desulfatibacillum and Desulfovibrio genera. Desulfobulbus spp. are found in diverse environments, including deep sea methane vents and arsenic-rich, ferruginous shallow marine hydrothermal sediments (Pernthaler et al., 2008; Handley et al., 2010). Desulfatibacillum spp. were recently found in oil deposits and wellheads from high-temperature oil wells (74 °C), and nifH fragments were also found in acidic, low temperature peat bogs (Zadorina et al., 2009; Yamane et al., 2011). D. gigas has rarely been reported from thermal environments, yet it is known to fix nitrogen (Gall, 1963; Riederer-Henderson and Wilson, 1970; Steppe and Paerl, 2002).

PHS NifH clones were also affiliated with bacteria from the Spirochaetes group. Treponema and Spirochaeta spp., which are obligate anaerobes, are commonly found in hot and thermal environments, with an optimum growth range of up to 60°C in certain species (Patel et al., 1985; Paster et al., 1991; Weller et al., 1992). They are known contributors to the global nitrogen cycle with high N2 fixation rates of up to 5 ng of N2 per hour (Lilburn et al., 2001). A singular clone was affiliated with an anaerobic Thermincola member of the Firmicutes phylum. The closest phylogenetic relatives of this genus, Desulfosporosinus and Desulfotomaculum, were found to fix nitrogen in soil and termite guts (Postgate, 1982; Roesch et al., 2010). It is of interest to note that this thermophilic alkali-tolerant genus was isolated from a hot spring in the Baikal Lake region (Sokolova et al., 2005), and its nitrogen fixation capabilities under various temperature conditions are currently unknown.

Two out groups to cluster I were T. yellowstonii and Geobacter spp. NifH sequences (figure 11). Certain strains of Geobacter were shown to fix atmospheric nitrogen under anaerobic conditions (Bazylinski et al., 2000; Methé et al., 2005); however, this is the first report of Geobacter nifH gene fragments from a hot environment. Potential nifH sequences were identified in a few other Geobacter spp. genomes, and they are under different stages of verification in the databases, hence most were not included in the tree (these were: G. sulfurreducens, G. bemidjiensis, G. metallireducens, G. sp. M21, G. sp. FRC-32, G. uraniireducens, G. sp. M18 and G. daltonii; NCBI nucleotide database, 2012). Furthermore, a thermophilic isolate of Geothermobacter ehrlichii, of the same family (Geobacteraceae), was isolated from hydrothermal vents and grew between 35°C and 65°C, with an optimum growth temperature of 55°C, suggesting thermophilic adaptations are quite possible within members of the Geobacteraceae (Kashefi et al., 2003). The above data suggest a thermophilic nitrogen fixer of the Geobacter genus might be active in the PHS hot source pool.

The thermophilic T. yellowstonii DSM 11347 NifH sequence was obtained from a complete genome sequence project, directly submitted to NCBI databases (Genbank ID CP001147.1, bioproject ID PRJNA30733) and is unverified by any other source. To our knowledge, this is the first report of NifH from this species, from a hot environment. Nothing is known of its true nitrogen fixation capabilities. It is interesting as well that this thermophilic group of sulphate reducers (Geobacter, Thermodesulfovibrio) does not cluster within cluster III, with other sulphate reducers. We estimate that as additional genome sequencing projects are completed, a thermophilic NifH cluster would further establish itself separately from other NifH clusters. This is mainly because the temperature regime would impose changes onto the nitrogenase characteristics, in order for it to remain functional under high temperatures. These changes will probably be reflected in the amino acid sequence and the nifH genetic code, effectively producing a new cluster in the tree topology.

In the past, the PHS system was found to harbour high bacterial diversity based on a 16S rDNA molecular analysis (Anitori et al., 2002). In the same study, 180 different RFLP patterns were detected across all samples, and the Shannon-Wiener diversity estimator ranged from 0.57-3.85, with most samples showing values higher than 2.5. Our study echoed that diversity with a high Shannon-Wiener diversity range of 2.35 - 3.12. Only one study has provided this specific estimator for thermophilic diazotrophs, and at the moment, these are the highest values reported from a thermophilic environment. Hydrothermal vents, at a temperature range of 20°C to 78°C, reported diversity estimators of 1.8 to 2.2 (Mehta et al., 2003), while non-thermophilic studies produced diversity estimators as high as 2.92 and as low as 1.02 in comparison (Izquierdo and Nüsslein, 2006; Roesch et al., 2010). Diversity studies in the geothermal springs of Yellowstone National Park (Hamilton et al., 2011a) have not provided this specific diversity estimator for the NifH clones, yet 13 hot springs were found to harbour 2-12 unique phylotypes at a sequence identity threshold of 99%. In this study, at the same sequence identity threshold, the unique OTU number was 24 (table 3), pointing to a potentially higher diazotrophic diversity.

A substantial increase in δ-Proteobacteria sequences was evident in our study, in comparison to the published 16S rDNA analysis (Anitori et al., 2002). Also, we have identified representatives of the Spirochaetes for the first time, and did not find any NifH Chloroflexi related clones, though that group of bacteria was reported previously (Anitori et al., 2002). There were no exact taxonomical matches in the α- and γ-Proteobacteria groups between the studies. However, there were several 16S rDNA sequences affiliated, at 95% similarity, with T. islandicus, originally isolated from Icelandic hot springs (Sonne-Hansen and Ahring, 1999), and of the same genus as T. yellowstonii strain DSM 11347, which was detected in our study (Anitori et al., 2002). In a similar fashion, a few 16S rDNA sequences were also remotely related to Pelobacter carbinolicus DSM 2380 and to Desulfuromonas spp. (87%, 84%, respectively), from the Desulfuromonadales order. Our study has reported NifH clones affiliated with P. carbinolicus DSM 2380 and Geobacter spp. from the same order.

We did not measure nitrogen fixing rates in this study, and we were unable to confirm active nitrogen fixers. However, the literature points to common species that are repeatedly detected in thermal springs around the world, some of which are known to actively fix nitrogen. For instance, our analysis and the previous 16S rDNA analysis (Anitori et al., 2002) suggested that heterocystous diazotrophic Nostocales, specifically Nostoc PCC 7120 and Anabaena variabilis ATCC 29413, were present in the hot source pool. Both species are aerobic nitrogen fixers, usually during light periods (Stewart, 1973), and they were also detected in hot springs in Japan, at 70°C, though it was not mentioned if they actively fixed nitrogen (Watanabe and Yamamoto, 1971). These facts make them likely candidates to be active nitrogen fixers in the PHS system. In a similar fashion, unicellular, filamentous and non-heterocystous Cyanobacteria found in our study, such as Oscillatoria sp. PCC 6506 and Cyanothece sp. CCY0110, tend to fix nitrogen aerobically during dark periods in order to avoid potential oxygen damage to the nitrogenase complex (Stal and Krumbein, 1987; Reddy et al., 1993; Schneegurt et al., 1994; Berman-Frank et al., 2003). A large thermophilic Oscillatoriales group was present at the Zerka Ma’in hot springs at 59°C - 63°C (Ionescu et al., 2010). An interesting finding in a sulphide-rich hot spring microbial mat (54°C) included a thermophilic Oscillatoria terebriformis, which was found to move vertically along the sulphide gradients in the mat, from oxic to anoxic conditions (Richardson and Castenholz, 1987). In addition, some strains of Oscillatoria exhibited reduced nitrogenase activity during light periods (in vivo) when grown heterotrophically, yet when grown anaerobically, they were able to fix nitrogen during the light period as well (Stal and Heyer, 1987; Gallon et al., 1991).
The nitrogen fixing and motility capabilities of the Oscillatoria group, and its repeated presence in hot environments, suggest that the thermophilic Oscillatoria spp. present in the PHS hot source pool are likely candidates to be active nitrogen fixers. We would also suggest that anaerobic sulphate reducing δ-Proteobacteria, Desulfovibrio spp., would potentially be active nitrogen fixers in the hot source pool. Evidence for anaerobic nitrogen fixation has been reported from various hot sources - a 63°C hot spring in Jordan (Steppe and Paerl, 2002), and 50°C to 60°C alkaline springs in Yellowstone National Park (Wickstrom, 1984; Oren et al., 2009).

Though few α-, β- and γ-Proteobacteria have been detected in other thermophilic environments (Ward et al., 1998; Ferris et al., 2001; Miller et al., 2009), PHS Proteobacteria NifH clones were mainly associated with the plant rhizosphere, and the extent of their active nitrogen contribution to the PHS system remains to be seen.

4.4 Concluding remarks

In summary, PHS hot source pool NifH clones partially matched a past study based on 16S rDNA (Anitori et al., 2002). NifH clones were affiliated with the Oscillatoriales, Chroococcales, Nostocales (Cyanobacteria), as well as with P. carbinolicus and Thermodesulfovibrio spp. (δ-Proteobacteria and Nitrospirae, respectively). Cluster III NifH clones were related to members of the δ-Proteobacteria (mostly SRB), Spirochaetes, Bacteroidetes and Firmicutes, none of which were identified in the original 16S rDNA bacterial community study (Anitori et al., 2002).

BLAST and BLASTX identified diazotrophs that might be active in nitrogen fixation in this system. We would suggest that thermophilic Oscillatoria spp. and anaerobic sulphate reducing δ-Proteobacteria from the Desulfovibrio spp. would potentially be the active nitrogen fixers in the hot source pool.

As with other culture independent studies, we assumed that not all bacteria which have the nifH genes actually express them and fix N2. Nevertheless, we would like to suggest nitrogen fixation does occur in the hot source pool, mainly because N2 levels in the spring waters were higher, compared to the local atmospheric composition (Brugger et al., 2005).

In summary, the hot source pool in Paralana Hot Springs supports a diverse and rich diazotrophic community. Our study has not only identified potential nitrogen fixers, it has also expanded our basic knowledge of the microbial community composition and its potential nitrogen fixation dynamics.


Chapter 5 Structural and evolutionary adaptations in the Fe protein component of the nitrogenase

5.1 Introduction

The background question propelling our efforts throughout this chapter was whether there were changes to the inferred NifH sequences obtained from hypersaline and thermal environments (chapters 3 & 4) which reflect adaptations of the Fe protein to these environments.

In order to remain active and functional under various physical conditions, it is essential for any protein to adapt to its immediate surroundings (Jaenicke and Böhm, 1998; Somero, 2003; Bolhuis et al., 2008). There are several possible pathways for adaptation; a protein may be protected from inactivation by “external” factors, such as being enclosed within a cell or organelle (a heterocyst for example). Micro-conditions surrounding the protein can also be controlled, either with heat/cold shock proteins, or by organic compatible solutes or by a heterotrophic existence, thus preventing exposure to unfavourable conditions and inactivation (Des Marais, 1995; Fields, 2001; Pikuta et al., 2007). The amino acid composition within a protein was found to change under stressful conditions such as high salinity, pressure, extreme temperatures and pH (Madern et al., 1995; Jaenicke, 1996; Groudieva et al., 2004; Siddiqui and Thomas, 2008; Greaves and Warwicker, 2009). It is therefore of interest to look into the potential adaptations in the Fe protein in response to stressful environmental conditions, and gain better understanding of the mechanistic solutions originating from genetic code permutations.

The Fe protein, encoded by the nifH gene, has been phylogenetically classified within the family of the Mrp/MinD proteins, as part of the SIMIBI class within the GTPase super class group of proteins, which include translation factors, signal recognition particle (SRP) GTPases, and several families of ATPases (Leipe et al., 2002). GTPase proteins include several conserved elements - a repetitive α/β secondary structure, and an N-terminal Walker A motif, also known as a P-loop, which structurally forms a loop and binds the γ-phosphate of a nucleotide to facilitate hydrolysis (Walker et al., 1982). In addition, GTPases also include the Walker B motif, which binds via a water molecule to the MgATP, and includes conserved Asp and Gly residues, preceded by four hydrophobic residues (Peters et al., 1995). Two switch regions, known as Switch I and Switch II, were named by analogy to the homologous regions in ras P21 proteins (Lanzilotta et al., 1996; Jang et al., 2000; Jang et al., 2004) and are vital to the conformational change upon nucleotide binding. The Fe protein is a dimer, structurally composed of eight beta sheets and alpha helices (Schlessman et al., 1998; Tezcan et al., 2005), with a 4Fe:4S metallo cluster nestled in between (see also introduction chapter, section 1.3.1, for further details).

The Fe protein structure has been studied quite extensively due to its role in dinitrogen fixation (Howard and Rees, 1996; Peters and Szilagyi, 2006). Molecular phylogenetic studies utilising the nifH gene primers (Zehr and McReynolds, 1989; Omoregie et al., 2004b) amplify only part of the gene, corresponding to residues 37-155 (residue numbering according to P00456 Swiss-Prot ID sequence, see figure 1). This part contains information on switches I and II, the Walker B motif and residues which coordinate the metallo cluster and interact with the second component of the nitrogenase, the MoFe protein (see figure 2). The amplified section does not cover the nucleotide binding fold, the Walker A motif (Walker et al., 1982). Within the amplified region, there are known loops which can undergo conformational variations, plus several conserved residues which form multiple hydrogen bonds via interaction with conserved water molecules, and also NH-S bonds between the amide groups and sulfur atoms, specifically around the 4Fe:4S cluster (Georgiadis et al., 1992; Schlessman et al., 1998; Chiu et al., 2001).

Figure 1. Amplified regions of NifH in the Fe protein (highlighted in blue and red). MoFe chains A - D are shown with minimal backbone atom display, the Fe protein chains E and F are shown in grey ribbons, except for the amplified NifH regions, residues 37-155. Space filled atoms are displayed for the Calcium ions, Fe7MoNS9 and Fe8S7 clusters in the MoFe protein, and the Fe4S4 cluster in the Fe protein. Image based on 2AFH PDB file (Tezcan et al., 2005).

Figure 2. Known functional regions in the amplified regions of NifH in the Fe protein. Switch I region is highlighted in orange, switch II in forest green, Walker B motif in blue, and residues which interact with the MoFe protein are coloured red. Q54, part of the Q-loop motif (see main text in section 5.4.2) is in pink. For visualization purposes, MoFe chains A and B are presented in minimal wire, and the image was cropped. Fe protein chain E was omitted from the image (2AFH PDB file (Tezcan et al., 2005)).

In order to elucidate structural deviations relating to potential environmental adaptation, it was imperative to obtain a known Fe protein structure which would represent each of cluster I and III individually. Since 1992 the crystallographic structures of the Fe protein have provided new insights into its mechanism and structure (Georgiadis et al., 1992; Kim et al., 1993; Peters and Szilagyi, 2006). Twenty Fe proteins have been resolved in the range of 2.1 - 3.2 Å from Azotobacter vinelandii, phylogenetically affiliated with cluster I (P00459 Swiss-Prot ID, H.M. Berman, 2003). The best refined model, 2AFH at 2.1 Å (P00459 Swiss-Prot ID), was chosen as the reference structure for clones affiliated with cluster I (Tezcan et al., 2005). However, only two resolved structures have emerged from bacteria affiliated with cluster III - Clostridium pasteurianum, and these structures were determined at 1.93 and 3.00 Å resolution (Kim et al., 1993; Schlessman et al., 1998). The more refined structure, designated 1CP2 (P00456 Swiss-Prot ID), was chosen as the reference structure for this study, for clones affiliated with cluster III. These two Fe protein models, 2AFH and 1CP2, are from mesophilic bacteria and share 69% overall amino acid sequence identity and 73.5% sequence identity in the amplified region of nifH specifically (Burgess et al., 1980; Zehr and McReynolds, 1989; Schlessman et al., 1998; Omoregie et al., 2004a).
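For illustration, a percent identity figure such as those quoted for 2AFH and 1CP2 can be derived from a pairwise alignment roughly as sketched below. This is not the procedure used to obtain the 69% and 73.5% values; the function name is hypothetical, and excluding gapped columns from the denominator is an assumption (other conventions divide by the alignment length or by the shorter sequence).

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Percent of identical residues over the ungapped columns of a pairwise alignment."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # assumption: gapped columns are left out of the denominator
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# Hypothetical aligned fragments, not real NifH data.
print(percent_identity("ACDE-F", "ACDQGF"))
```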

In order to detect amino acid substitutions in a sequence and changes in the Fe protein structure, two different bioinformatic tools were employed with 1CP2, 2AFH and NifH clones from this study and existing databases. ConSurf is a bioinformatic tool which identifies functional regions in proteins, by taking into consideration their phylogenetic background and similarities between amino acids (Glaser et al., 2003; Landau et al., 2005). After estimating the level of conservation of each amino acid in a set of sequences, a representative colour scheme is projected onto a protein 3D visualized structure, thus helping researchers to identify areas highly conserved and functionally important, but also areas of medium to high variability (Pupko et al., 2002; Goldenberg et al., 2008). ConSurf is currently ranked as one of the best bioinformatic tools available today for identifying important functional sections in proteins (Chung et al., 2005; Ashkenazy et al., 2010; Mooney et al., 2011). ConSurf has been employed in the past in the analysis of various proteins which included an iron-sulfur cluster, or supervised the biogenesis of such clusters, for instance - the cytosolic iron-sulfur assembly protein (Cia1, Srinivasan et al., 2007), the nitrogenase molybdenum-iron protein (Chung et al., 2006), the Iron–Sulfur Cluster Assembly proteins (IscU, IscS, Ramelot et al., 2004; Shi et al., 2010), reverse-acting Dissimilatory sulphite reductase (DsrAB, Grimm et al., 2010), and an ATPase component in the biosynthesis of Fe–S clusters (SufC, SufE, Goldsmith-Fischman et al., 2004).

In most cases, these studies used ConSurf in addition to an analysis of the resolved protein structure, to highlight regions of strict conservation and point out or confirm their specific functionality (Ramelot et al., 2004; Li et al., 2009). At times, ConSurf has been used without any accompanying biochemical analysis, as a prediction tool, to help researchers find, among other things, protein-protein interaction sites and ligand binding sites, provide data for future mutational or structural studies, and assign domain functions to the ever increasing number of hypothetical proteins (Bell and Ben-Tal, 2003; Chung et al., 2005; Ashkenazy et al., 2010). Thus, ConSurf analysis can be used to distinguish and illuminate conserved important functional zones in families of proteins, and help in deducing lineage specific adaptations (Glaser et al., 2005), even when a known 3D crystallographic structure is unavailable (Razia et al., 2010; Kumar et al., 2012).

In our analysis, multiple alignments of each NifH cluster were compiled from reviewed NifH sequences obtained from the Swiss-Prot database (Boeckmann et al., 2003), 58 and 32 reference sequences, of cluster I and cluster III, respectively. Using reference sequences provided less background noise to the data, as there are many NifH sequences available in the databases, isolated from various sources under various conditions. Furthermore, it was assumed that genetic changes would manifest mainly in the non-conserved regions of the protein, and therefore each multiple alignment was further split into two - a set of multiple sequences with conserved residues only, and another set with variable residues only. The distinction between variable and conserved residues was based on the ConSurf analysis, detailed in section 5.2.1. The dichotomy between conserved and non-conserved enabled us to analyse shifts in the amino acid composition for each segment.

Currently, the ConSurf web server requires a 3D structure of a protein, written as a PDB file, for visualising the end result (Glaser et al., 2003). This, and our aim to detect structural shifts in the clones, directed us to use predicted structures that were modelled by the iterative threading assembly refinement (I-TASSER) server, which predicts 3D protein models (Zhang, 2008, 2009). Briefly, the server first assesses the possible secondary structure of a given sequence against a representative PDB template library, using a 70% cutoff criterion for the pair-wise comparison, and a combination of alignment programs, such as Needleman-Wunsch (Needleman and Wunsch, 1970), Smith-Waterman (Pearson, 1991), etc., to propose a potential secondary structure for the sequence (Zhang, 2008, 2009). The potential structure is then divided into continuous segments of good quality structural alignments, and unaligned fragmented sections, usually loop regions, which require a different method for structural refinement. After additional spatial characteristics are calculated and averaged across a cluster of potential structures, the modelling process is repeated to produce the best structural candidate. In the second round, additional algorithms are used, such as TM-align (Zhang and Skolnick, 2005) for structural alignment, and other software to add backbone atoms and side chain rotamers, eventually producing a PDB file for downstream applications (Roy et al., 2010).

The I-TASSER server provides different scores to evaluate the quality of its models. The ‘C-score’ is a confidence score for estimating the quality of predicted models by the I-TASSER server. The C-score is calculated based on the significance of threading template alignments and the convergence parameters of the structure assembly simulations. It is typically in the range of [-5, 2], where a higher C-score signifies a model with higher confidence and vice versa (Zhang and Skolnick, 2007; Roy et al., 2010). A template modelling score (TM-score) is a scale for measuring the topological similarity between two structures (Zhang and Skolnick, 2004); a TM-score >0.5 would indicate that a model had correct topology and a TM-score below 0.17 would indicate random similarity. Root mean square deviation (RMSD) is another score provided by the server, which is a well known standard for measuring the accuracy of structure modelling, when the native structure is known (Kabsch, 1976, 1978; Carugo, 2003). The lower the RMSD score, the better the match between structures (i.e., smaller deviations).
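To make the two structural scores concrete, the sketch below computes RMSD and a TM-score for two lists of matched Cα coordinates. It assumes the structures are already residue-aligned and superposed (no Kabsch superposition or alignment search is performed, so the TM-score here is for one fixed superposition instead of the optimum), and it uses the published length-dependent scale d0 = 1.24 (L - 15)^(1/3) - 1.8. The function names and test coordinates are hypothetical; this is not the I-TASSER implementation.

```python
import math


def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z)."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(math.dist(p, q) ** 2 for p, q in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def tm_score(coords_model, coords_native):
    """TM-score for one fixed superposition, normalised by the native length."""
    length = len(coords_native)
    if len(coords_model) != length:
        raise ValueError("model and native must list the same matched residues")
    if length <= 15:
        raise ValueError("d0 is undefined for chains of 15 residues or fewer")
    d0 = 1.24 * (length - 15) ** (1.0 / 3.0) - 1.8
    if d0 <= 0:
        raise ValueError("chain too short for the d0 length correction")
    total = sum(
        1.0 / (1.0 + (math.dist(p, q) / d0) ** 2)
        for p, q in zip(coords_model, coords_native)
    )
    return total / length


# Hypothetical 30-residue trace and a copy displaced by 3 Angstrom along y.
native = [(float(i), 0.0, 0.0) for i in range(30)]
shifted = [(x, y + 3.0, z) for x, y, z in native]
print(rmsd(native, shifted), tm_score(native, shifted))
```

Because each residue's contribution to the TM-score is bounded, a few badly placed loop residues lower it far less than they inflate the RMSD, which is one reason the two scores are reported together.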

I-TASSER consistently ranks as the best method in the Critical Assessment of Structure Prediction (CASP) experiments for predicting protein structures (Zhang, 2007, 2009; Roy et al., 2010).

Either accompanied with a functional or biochemical analysis, or without, these two bioinformatics tools can provide powerful insight and novel information on protein structure and conservation. In one study on the membrane associated thioredoxins of the Arabidopsis thaliana plant, ConSurf was used to highlight two conserved amino acids, Gly and Cys, in the N-terminal extension of the protein. These were then mutated to Ala, for further functional and structural analysis (Meng et al., 2010). Subsequently, I-TASSER was used to predict the protein and the mutant variants’ 3D structures, which enabled the researchers to show structural modifications due to the changes in those specific amino acids. Their mutational and biochemical study supported the ConSurf and I-TASSER results. In another study, a large scale ConSurf analysis of the NS1 and NS2 amino acid sequences, from influenza A virus, was projected onto I-TASSER models of NS1 and NS2, to highlight novel potential binding sites for drugs (Darapaneni et al., 2009). There are few additional studies which employed both tools, and we expect more studies will emerge using ConSurf and I-TASSER (Jimenez-Lopez et al., 2010; Meng and Feldman, 2010; Aluri and Terli, 2012; Bhat et al., 2012).

To our knowledge, this is the first time these tools have been used in the analysis of the Fe protein component of the nitrogenase protein. This required us first and foremost to analyse the novel methodology, followed by the later analysis of our data with the established protocol.

Specific aims:

1. The two main clusters in the NifH phylogeny tree are cluster I and cluster III. Most of our previous findings (detailed in chapters 3 and 4) were affiliated with these clusters. Therefore our aim in this chapter was firstly to characterise conservation patterns and amino acid distribution in NifH sequences from cluster I and cluster III.

2. Evaluate novel methodology in regards to known structural and functional regions of the Fe protein.


5.2 Material and methods

5.2.1 Evolutionary conservation

Evolutionarily conserved and non-conserved residues for cluster I & III and affiliated clones were calculated by the “ConSurf” program (Pupko et al., 2002; Glaser et al., 2003; Landau et al., 2005; Goldenberg et al., 2008). Pre-compiled multiple alignments were built using MUSCLE (Edgar, 2004), manually checked, and submitted to the ConSurf online web server for analysis (http://ConSurf.tau.ac.il/). Specific parameters were chosen - homologues were collected from Swiss-Prot (Boeckmann et al., 2003), PSI-BLAST E-value: 0.0001, no. of PSI-BLAST iterations: 1, maximal % ID between sequences: 100, minimal % ID for homologs: 72. A phylogenetic tree was constructed with the method of Neighbour Joining and ML distance, and the method of maximum likelihood and the LG protein substitution model were chosen for the conservation scores (Le and Gascuel, 2008; Posada et al., 2009). After the initial collection of homologues, only cluster I or cluster III specific sequences from known organisms were chosen (58 and 32 sequences, respectively) and submitted for further analysis with ConSeq (Berezin et al., 2004).

5.2.2 Residue composition

The aligned NifH sequences from known organisms affiliated with cluster I or cluster III and the inferred NifH sequences from clones were subjected to residue composition calculation. The average ratio of 20 amino acids in the partial NifH sequences was calculated for each set of multiple alignment using MEGA 5 software (Tamura et al., 2007; Kumar et al., 2008; Tamura et al., 2011). In each multiple alignment, the average ratio of the amino acids was calculated separately for the conserved and variable sections of the NifH sequence (conserved residues denoted ‘9’ by ConSurf evolutionary scoring matrix, variable residues - ‘0-8’).
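The calculation in this study was performed with MEGA 5; purely as an illustration of the split described above, the sketch below computes the percent composition of one aligned sequence separately over conserved columns (ConSurf grade 9) and all remaining columns. The function name is hypothetical, and treating every grade other than 9 as variable is an assumption made for the sketch. The example fragment and grades are the first 23 positions of the 1CP2 partial sequence listed in Table 2.

```python
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition_by_conservation(sequence, grades, conserved_grade=9):
    """Percent composition of one aligned sequence over conserved and variable columns."""
    if len(sequence) != len(grades):
        raise ValueError("one conservation grade is needed per alignment column")
    pools = {"conserved": Counter(), "variable": Counter()}
    for residue, grade in zip(sequence, grades):
        if residue not in AMINO_ACIDS:
            continue  # skip gaps and ambiguous symbols
        key = "conserved" if grade == conserved_grade else "variable"
        pools[key][residue] += 1
    result = {}
    for key, counts in pools.items():
        total = sum(counts.values())
        result[key] = {
            aa: (100.0 * counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS
        }
    return result


# First 23 residues and ConSurf grades of the 1CP2 partial sequence (Table 2).
fragment = "CDPKADSTRLLLGGLAQKSVLDT"
fragment_grades = [int(c) for c in "99999999984919719688699"]
composition = composition_by_conservation(fragment, fragment_grades)
print(round(composition["variable"]["L"], 2))
```

Averaging these per-sequence percentages over all sequences in an alignment would give a mean composition per amino acid for each of the two segments.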

5.2.3 Statistical analysis

All statistical analyses were calculated using GraphPad Prism version 5.04 for Windows (GraphPad Software, San Diego, California, USA). D'Agostino & Pearson “omnibus K2” normality tests (D'Agostino, 1986) were performed on each amino acid in the partial NifH sequences of cluster I and III. Amino acids were then subjected to frequency distribution analysis, followed by a non-linear regression analysis using the Gaussian equation and “robust fit” as fitting method, with the ‘Q’ parameter set to 1% in order to exclude possible outliers in the data set (Motulsky and Brown, 2006). Two-tailed unpaired t-tests with Welch’s correction, allowing for different variances, were performed (Welch, 1947) and the mean composition was denoted significantly different when P < 0.05.
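As a rough sketch of what Welch's correction changes, the statistic and its approximate degrees of freedom (Welch-Satterthwaite) can be computed as below. The analyses in this study were run in GraphPad Prism; the function name and sample values here are hypothetical, and the two-tailed P value, which requires the t distribution, is deliberately left to a statistics package.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t, df) for an unpaired t-test that does not assume equal variances."""
    na, nb = len(sample_a), len(sample_b)
    if na < 2 or nb < 2:
        raise ValueError("each sample needs at least two observations")
    va = variance(sample_a) / na  # squared standard error of mean A
    vb = variance(sample_b) / nb  # squared standard error of mean B
    if va + vb == 0:
        raise ValueError("both samples have zero variance")
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df


# Hypothetical percent compositions of one amino acid in two sequence sets.
t_stat, dof = welch_t([1, 2, 3, 4, 5], [2, 4, 6, 8, 10])
print(round(t_stat, 3), round(dof, 2))
```

The degrees of freedom are generally non-integer and shrink towards the size of the more variable sample, which makes the test more conservative when the two variances differ.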

5.2.4 Structural characteristics

3D crystallographic representatives of the Fe protein from mesophilic Azotobacter vinelandii, PDB file ID 2AFH (Burgess et al., 1980; Tezcan et al., 2005), and Clostridium pasteurianum, PDB file ID 1CP2 (Schlessman et al., 1998), were chosen in order to assess potential structural changes in relation to cluster I and cluster III, respectively. Secondary structures based on 3D coordinates were analysed by DSSP (Define Secondary Structure of Proteins) as implemented in The Protein Data Bank (Kabsch and Sander, 1983; H.M. Berman, 2003) and in the WHAT IF program (Vriend, 1990), and were also predicted by the I-TASSER online server (Zhang, 2008; Roy et al., 2010). Solvent accessibility was calculated by the ConSeq program (Sridharan et al., 1992; Pollastri et al., 2002; Berezin et al., 2004), the WHAT IF program and the I-TASSER online server. Images were created using the UCSF Chimera program, version 1.6.2 (Pettersen et al., 2004). Salt bridges were defined by the WHAT IF web server (Vriend, 1990), version 10.1a (http://swift.cmbi.ru.nl/servers/html/index.html). Salt bridges were restricted to an interatomic distance of less than 4.0 Å between a negative atom, at the side chain oxygen atoms of an Asp or Glu residue, and a positive atom at the side chain nitrogen of an Arg, Lys or His residue (Rodriguez et al., 1998).
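The salt bridge criterion above can be sketched as a simple distance filter. This is an illustration of the stated definition only, not the WHAT IF implementation; it assumes standard PDB side chain atom names (OD1/OD2, OE1/OE2, NE/NH1/NH2, NZ, ND1/NE2), and the function name, input tuple layout and coordinates are hypothetical.

```python
import math

# Side chain atoms taken as charged, using standard PDB atom names (assumption).
ACIDIC_OXYGENS = {"ASP": {"OD1", "OD2"}, "GLU": {"OE1", "OE2"}}
BASIC_NITROGENS = {"ARG": {"NE", "NH1", "NH2"}, "LYS": {"NZ"}, "HIS": {"ND1", "NE2"}}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return sorted residue pairs with an acidic O and a basic N closer than cutoff.

    atoms: iterable of (res_name, res_id, atom_name, (x, y, z)) tuples.
    """
    atoms = list(atoms)
    negatives = [a for a in atoms if a[2] in ACIDIC_OXYGENS.get(a[0], ())]
    positives = [a for a in atoms if a[2] in BASIC_NITROGENS.get(a[0], ())]
    bridges = set()
    for neg in negatives:
        for pos in positives:
            if math.dist(neg[3], pos[3]) < cutoff:
                bridges.add(((neg[0], neg[1]), (pos[0], pos[1])))
    return sorted(bridges)


# Hypothetical atoms: one Asp-Lys pair within 4.0 Angstrom, one Glu-Arg pair too far apart.
example_atoms = [
    ("ASP", 10, "OD1", (0.0, 0.0, 0.0)),
    ("LYS", 25, "NZ", (3.0, 0.0, 0.0)),
    ("GLU", 40, "OE2", (20.0, 0.0, 0.0)),
    ("ARG", 55, "NH1", (26.0, 0.0, 0.0)),
    ("ASP", 10, "CA", (1.0, 1.0, 0.0)),  # backbone atom, ignored
]
print(find_salt_bridges(example_atoms))
```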


5.3 Results

5.3.1 Evolution, composition and structure of the Cluster III Fe protein

The evolutionary analysis was based on the alignment of 32 complete NifH reference sequences from known organisms affiliated with cluster III (figure 3). The assigned function for the residues presented here was taken from Schlessman et al., 1998, unless otherwise specified. The completely conserved residues were scored 9 by ConSurf and coloured in maroon, and included residues with important function. The residues coordinating the 4Fe:4S cluster and creating NH-S bonds between the amide groups and sulphur atoms were at positions 91, 93 to 97, 127 and 129 to 132 (numbering as presented in figure 3, score 9). Many positions involved in the binding of the Fe protein to the MoFe protein were completely conserved - 59-62, 90-103, 133-141, 171, though several were not. Positions 66-67 and 142 included an Asp or Glu, while position 106 usually included Asn (score 6) and sometimes Gly or Asp. Position 110 almost always included a polar uncharged amino acid (Q/N/S, score 2) and positions 172-174 were an interplay between a hydrophobic Y/F followed by a highly conserved Gly or Ala (173, score 8) and a charged amino acid (E/D/K at position 174, score 1).

All the residues in the two switch regions, known as Switch I and Switch II, were completely conserved (38-43 and 126-136, respectively). The Walker A motif, the phosphate binding loop region, at residues 7-17, was completely conserved except for position 13 (Gly, score 8), in which Ala replaced the Gly in one sequence. Most of the residues implicated in the chains interaction within the Fe protein were completely conserved, but several were not. The completely conserved positions were - 9, 41, 43, 46, 91-92, 94, 97, 128, 130-133, 136-137, 155, 157, 160-161, 164-165, 167, 171, 188, 190 and 266. There were 16 less conserved positions, according to the ConSurf analysis. Positions 52 and 156 included an exchange between two hydrophobic amino acids (L/M, score 7), position 172 was an exchange between large hydrophobic amino acids (Y/F, score 1), position 189 was highly variable (score 1), and position 191 was mostly an exchange between A/D (score 7).

Position 214 was mostly Arg (score 8) and position 216 was mostly an exchange between P/N (score 6), while positions 219-220 were an interplay between polar uncharged amino acids T/Q (score 8) followed by a positively charged amino acid, K/R (score 5). Positions 222-225 scored 6-8, as position 222 was mainly a negatively charged Glu, while position 225 was always a positively charged amino acid, R/K. Positions 223-224 were uncharged amino acids, mainly Ile or Asn (scores 6/7, respectively). Position 262 was highly variable (score 1), while positions 269-270 always contained a pair of hydrophobic amino acids, I/L/M/V (scores 4-1).

In addition, hydrogen bonding partners, with water molecules, were highly conserved as well. Completely conserved residues were at positions 11-17, 39, 44, 86, 109, 128, 130-132, 144, 187 and 205. Several other hydrogen bonding positions were not conserved. These positions were - 3 (exchanges between Q/K, score 8), 13 (mainly Gly, score 8), 55 (mainly Lys, score 6), 113 (mainly Ala, 8), 114 (Tyr, 8), 125 (Tyr, 8), 178 (Val, 8), 201 (Ala, 8), 253 (L/M/K, score 7) and 261 (highly variable, score 1).

Figure 3 Multiple alignment of NifH complete sequences (N=32) from known organisms affiliated with cluster III coloured by ConSurf. Scale bar colours represent scores 1-9, variable residues in turquoise and completely conserved residues in maroon. The first line shows the residue number, the second line shows consensus, and third line shows the evolutionary score. The first reference sequence, input-pdb-seqres_A is NifH chain A from 1CP2 pdb file, the rest are NifH sequences affiliated with cluster III, see section 5.2.1. Thin line marks the amplified region. Figure continues in the next pages.


Because the nifH gene primers, used throughout this study (Zehr and McReynolds, 1989; Omoregie et al., 2004b), amplify only part of the gene, the rest of our analyses referred only to the amplified section in order to obtain specifics to compare against the clones’ inferred sequences.

In regards to the average residue composition of cluster III alignment, it was important to clarify whether the amino acids population followed a Gaussian or normal distribution in order to perform a comparative analysis, such as a t-test analysis or ANOVA, on their residue compositions (Smith, 1966; D'Agostino, 1986; Motulsky and Christopoulos, 2004). Table 1 summarises our analysis of amino acid composition in cluster III sequences.

In cluster III, 12 amino acids were found to pass the normality test, and eight amino acids did not pass. Trp rarely appeared and did not pass the normality test because there was no distribution to observe. Asp, Glu and Gly did converge to a bell shape curve yet the bell shape curve was not the best fit (figure 4). The amino acids Phe, Val, and Ser did not pass the normality test mainly because the distribution revolved around a few discrete values and did not exhibit normal distribution in our data set. Gly was the most common amino acid in cluster III sequences (mean value 11.58), while Glu composition varied the most, and had the highest standard deviation value within the group - 1.22.

Table 1. Amino acids composition in the amplified region of NifH, cluster III sequences from known organisms (N=32).

Amino Acid                               Ala    Cys    Asp    Glu    Phe    Gly    His    Ile    Lys    Leu    Met
Mean (%)                                 7.59   2.37   6.01   8.41   2.40   11.58  0.88   7.48   6.34   8.07   3.96
Std. Deviation                           0.67   0.31   0.92   1.22   0.56   0.57   0.41   0.63   0.58   1.05   0.53
Passed normality test (alpha=0.05)? (a)  Yes    Yes    No     No     No     No     No     Yes    Yes    Yes    Yes
P-value Summary (b)                                    ****   *      **     *      **

Amino Acid                               Asn    Pro    Gln    Arg    Ser    Thr    Val    Trp    Tyr
Mean (%)                                 3.92   3.19   3.23   4.98   3.51   4.93   7.50   0.14   3.50
Std. Deviation                           0.75   0.21   0.49   0.84   0.87   0.40   0.69   0.21   0.52
Passed normality test (alpha=0.05)? (a)  Yes    Yes    Yes    Yes    No     Yes    No     No     Yes
P-value Summary (b)                                                  **            **     *

(a) D'Agostino & Pearson omnibus normality test (D'Agostino, 1986).
(b) **** P< 0.0001 extremely significant, *** 0.0001


Figure 4 Upper pane: The distribution shape of three amino acids based on their composition pattern in the NifH sequence: Asp, Glu and Gly. Goodness of fit (Robust sum of squares): Asp - 6.22, Glu - 6.04, Gly - 8.03. Lower pane: The distribution shape of three amino acids based on their composition pattern in the NifH sequence: Ser, Val and Phe. Goodness of fit (Robust sum of squares): Ser - 6.7, Val - 6.83, Phe - curve did not converge. Y axis represents the relative frequency of the X axis values. X axis represents the range of the composition values (%) of each amino acid in the NifH amplified region.

Completely conserved Gly and Ala, in the amplified section of NifH, had interesting structural characteristics (Table 2). Their secondary structure, based on 1CP2 resolved structure, was characterised by high curvature sections (bends, ‘S’), H-bonded turns (‘T’, just before or after a helix, usually) and 3-helix turns (‘G’ - three residues per turn), and few were present in unidentified or coil regions (Table 2, G or A score 9). Conserved Gly or Ala residues were adjacent to important functional domains, such as the nucleotide or MoFe binding sites, the Fe protein inter-subunits interaction region and the switch I & II regions. Solvent accessibility analysis suggested that some of the Gly and Ala were accessible to the solvent at positions - 41, 50, 64, 76, 86-87, 91, 110 and 139. However, other conserved Gly or Ala residues were buried, usually within conserved structural motifs.


Table 2 1CP2 Fe protein partial NifH sequence, conservation scores, secondary structure and solvent accessibility. Conserved Ala (A) and Gly (G) residues are highlighted.

1CP2 position markers: 40 60 80 | | | (^)
Partial NifH Sequence (a): CDP KADSTRLLLGGLAQKSVLDT LREEGEDVELDSILKEGYGG IRCVESGGPEPGVGCAGRGI
Conservation Score (b): 999 99999984919719688699 99189757161131419111 41889999999999999999
Secondary Structure (e) *: E-T TS-SSHHHHTS-----HHHH HHHHGGG--HHHH-EE-GGG -EEEE------TTSS-HHHH
Secondary Structure (e) ***: ------HHHHHH------HHHH HHHH-----HHHHHHH---- EEEEE------H
Solvent Accessibility (c) †: bee eeebbebbbeebeeeebbee beeeeeebebeebbeeeeee beeeeeeeeeeeebbbbebb
Solvent Accessibility (c) ††: --e --e-e---e-ee-e--ee-- eeee-ee--eeee--ee--e -e-e----e-e--ee--e--
Solvent Accessibility (c) †††: b-b ---bb--b------bb-- b-----b-b--bb------b-bbbb---bbbbb----bb
Binding to MoFe Protein (d): ------LD- LR---ED------P---V-C--R--
Nucleotide Binding Site (d): -D- K-DS------
Chains Interface (d): --- K-D--R-----L------EP-V--A----
Structural Motifs (d): SWITCH 1 ------

(a) Corresponding partial NifH amino acid sequence, P00456 accession ID, chain A.
(b) Residue conservation scores, calculated by ConSurf with Maximum likelihood and LG protein substitution model (Pupko et al., 2002; Glaser et al., 2003; Landau et al., 2005; Goldenberg et al., 2008).
(e) Secondary structure: ‘H’ - helix; ‘T’ - hydrogen bonded turn; ‘S’ - bend; ‘E’ - extended beta sheet; ‘G’ - 3-helix (three residues per turn); ‘-’ unknown/random coil. * Based on 3D coordinates calculated by DSSP as implemented in The Protein Data Bank (Kabsch and Sander, 1983; H.M. Berman, 2003). *** Predicted secondary structure by I-TASSER online server, based on P00459 NifH sequence (Zhang, 2008; Roy et al., 2010). ‘H’ - helix, ‘E’ - extended beta sheet, ‘-’ unknown.
(c) Solvent accessibility: † Buried (b) or exposed (e) residue; calculated by ConSeq online server (Sridharan et al., 1992; Pollastri et al., 2002; Berezin et al., 2004). †† Solvent accessibility calculated by WHAT IF program, ‘-’ unknown, (e) exposed - a residue that is clearly solvent accessible, more exposed than 102 Angstrom, or more than 33% of its accessibility in the unfolded state (Vriend, 1990). ††† Predicted solvent accessibility calculated by I-TASSER server. ‘b’ - buried residue (0); ‘e’ - highly exposed residue (7-9); ‘-’ varying degrees of exposure (Chen and Zhou, 2005; Wu and Zhang, 2008).
(d) Residues interacting with structural components in nitrogenase (Schlessman et al., 1998).
(^) Cysteine residues which coordinate the metallo cluster are marked C.


Table 2 1CP2 sequence continued. Conserved Ala (A) and Gly (G) residues are highlighted.

1CP2 position markers: 100 120 140 154 | | (^) | |
Partial NifH Sequence: ITSINMLEQLGAYTDDLDYV FYDVLGDVVCGGFAMPIREG KAQEIYIVASGEMMAL
Conservation Score: 99886379279861119967 88999999999999999949 9919799959797983
Secondary Structure *: HHHHHHHHTT----TT-SEE EEEEE-SS-STTTTHHHHTT S--EEEEEE-SSHHHH
Secondary Structure ***: HHHHHHHHHHHH------EE EEE-----EEE---EE------EEEEEE--HHHHH
Solvent Accessibility †: bbbbbbbeebeebeeebebb bbbbbbbbbbbbbbbebeee ebeebbbbbbbebbbb
Solvent Accessibility ††: -e-ee-eeee-eeee-e------ee- e-e------eee--
Solvent Accessibility †††: bbb--b-e------e-b-bb bbbb-b--bbbb-bb--b-- -b--bbbbb----bbb
Binding to MoFe protein: IT--N---Q------C----M--RE------
Nucleotide Binding Site: ------D---D------E-MA-
Chains Interface: ------L-DVVC--FA------EMM--
Structural Motifs: ------ SWITCH 2 ------


The I-TASSER analysis of the 1CP2 chain A sequence produced one 3D model, with an estimated accuracy of RMSD 1.9±1.5 Å, a C-score of 2.13 and a TM-score of 0.99±0.04. The PDB templates identified by the various server software modules in the threading stage were 1CP2 chain A and 2AFH chains E & A, the latter with lower sequence identity percentages. The top ranking predicted EC number was 1.18.6.1 (nitrogenase), with a TM-score of 0.9782, an RMSD of 0.66 Å, 100% sequence identity to the query sequence, an EC-score of 4.5881 and a PDB hit to 1CP2 chain B (the second chain of the Fe protein dimer). However, the protein identified as most structurally similar to the I-TASSER model was 2AFH chain E, with a TM-score of 0.9929, an RMSD of 0.55 Å and 69% sequence identity.

Figure 5 Superimposition of the I-TASSER model of the amplified NifH sequence based on P00456, and the crystallographic structure of 1CP2. Sections with RMSD >1 Å are highlighted with colour.

Superimposing the I-TASSER model on the known X-ray crystallographic structure of 1CP2 (see section 5.2.3 for specific parameters) highlighted which amino acids were positioned imprecisely by the I-TASSER server. The overall RMSD between the predicted and the known structure was 0.529 Å. Four sections had RMSD values higher than 1 Å (figure 5): Gly51-Leu52 (2.982 Å), Glu91 (1.811 Å), Gly93-Val94 (2.214 Å) and Thr115-Asp116 (1.374 Å).
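The comparison above reduces to distances between matched Cα atoms once two structures are superimposed. The following is a minimal sketch of that calculation, assuming the coordinates have already been superimposed (the fitting step itself, performed here in Chimera, is not reproduced). The function names and the coordinates are hypothetical and serve only as an illustration.

```python
import math


def per_residue_distance(model, reference):
    """Distance (in Å) between matched Cα atoms of two pre-superimposed structures."""
    if len(model) != len(reference):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(model, reference)]


def overall_rmsd(distances):
    """Root mean square deviation over all matched atoms."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def deviating_positions(distances, first_position, cutoff=1.0):
    """Residue positions whose Cα deviation exceeds the cutoff (1 Å by default)."""
    return [first_position + i for i, d in enumerate(distances) if d > cutoff]


# Hypothetical Cα coordinates (Å) for four consecutive residues; not taken from 1CP2.
reference = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (11.4, 0.0, 0.0)]
model = [(0.1, 0.0, 0.0), (3.8, 0.2, 0.0), (7.6, 2.0, 0.0), (11.4, 0.0, 0.3)]

distances = per_residue_distance(model, reference)
print(f"overall RMSD: {overall_rmsd(distances):.3f} Å")
print("positions above 1 Å:", deviating_positions(distances, first_position=50))
```

A single large local deviation can dominate a short stretch while the overall RMSD stays low, which is why the sections above 1 Å are reported separately from the overall value.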

5.3.2 Evolution, composition and structure of the Cluster I Fe protein

The evolutionary analysis was based on 58 complete reference NifH sequences from known organisms affiliated with cluster I. The ConSurf evolutionary analysis results for cluster I (figure 6) were similar to the results of the cluster III analysis. All the residues coordinating the 4Fe:4S cluster in the Fe protein were completely conserved (positions 96-100, 130, 132-135, numbering based on figure 6). All the residues in the Walker A motif and in the regions known as Switch I and Switch II were completely conserved as well (positions 7-17, 38-43 and 125-135). Most of the residues involved in the binding of the Fe protein to the MoFe protein (61-62, 67-69, 91, 95, 97, 100, 103-104, 137, 132, 140, 170-171) were also completely conserved, though some were not. Position 58 was a hydrophobic Met or Leu (score 7), position 59 was highly variable (score 3), position 66 included exchanges between T/S/A (score 5), position 107 was mainly Asn (score 8), position 141 included an exchange between E/Q (score 6), and positions 173-174 were highly variable (scores 1-4).

Aside from one position, all the residues involved in nucleotide binding were completely conserved; this single position included either Ile or Val (score 7). Not all the residues involved in the intersubunit interaction were completely conserved. The completely conserved positions were 9, 41, 43, 46, 52, 92-93, 95, 98, 127, 129-132, 135-136, 154-156, 163-164, 170-171, 187, 213, 223, 262 and 266. The less conserved positions were 159 (mainly Y, score 8), 166 (an exchange between positively charged amino acids, K/R, score 6), 215 (mainly N, score 8) and 219 (another exchange, R/H, score 6). Additionally, positions 187-190 included a motif which started with a positively charged amino acid and ended with a negatively charged amino acid, with hydrophobic or uncharged residues in between (R, 9; N/Q/K, 1; T/V, 6; D, 8).

A similar motif was present in positions 221-224, starting with a negatively charged amino acid, followed by a hydrophobic residue and ending with a positively charged amino acid (E, 9; L/I, 8; R, 9; R/K, 7). In addition, 26 residues that participated as hydrogen bonding partners with water molecules (Schlessman et al., 1998), were completely conserved, with only two residues scoring 8 - position 143 which was mainly Lys and position 169 that had an exchange between Val and Leu.


Figure 6 Multiple alignment of 58 NifH complete sequences (N=58) from known organisms affiliated with cluster I, coloured by ConSurf. Colours represent scores 1-9, variable residues in turquoise, average conservation in white and completely conserved residues in maroon. The first line shows the residue number, the second line shows consensus, and third line shows the evolutionary score by ConSurf. The first sequence in the alignment, Input_pdb_ATOM_E, is the complete NifH sequence of chain E from 2AFH PDB file. The rest are NifH sequences affiliated with cluster I, see section 5.2.1. Thin black line marks the amplified region by the nifH gene PCR primers. Figure continues in the next pages.


As in the residue composition analysis of the cluster III alignment, table 3 summarises the amino acid composition and distribution in cluster I sequences. In cluster I, 13 amino acids passed the “omnibus K2” normality test (alpha=0.05) and seven did not. These seven failed for three different reasons: (a) the amino acid appeared very rarely in a sequence, hence there was no distribution to observe (Trp); (b) the distribution revolved around a few discrete values and was not normal (Cys, Asp, Phe, Asn, Pro); and (c) two distributions were observed in the data set instead of one (Arg). Figure 7 shows examples of Gaussian non-linear regression analysis for points (b) and (c), for Arg, Cys and Phe. Gly was the most common amino acid in cluster I sequences (mean 10.05%), while Leu composition varied the most, with the highest standard deviation in the group (1.12).
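The per-sequence composition values that feed such a summary can be illustrated with a short sketch. It is not the software used in this analysis: the function names are hypothetical, and the two input strings are only the first segments of the partial NifH sequences listed in Tables 2 and 4, used as stand-ins for a full set of aligned cluster sequences.

```python
from collections import Counter
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(seq):
    """Percentage of each of the 20 standard amino acids in one sequence."""
    seq = seq.upper()
    counts = Counter(seq)
    return {aa: 100.0 * counts[aa] / len(seq) for aa in AMINO_ACIDS}


def summarise(seqs):
    """Mean and sample standard deviation of each amino acid's composition."""
    comps = [composition(s) for s in seqs]
    return {
        aa: (mean(c[aa] for c in comps), stdev(c[aa] for c in comps))
        for aa in AMINO_ACIDS
    }


# First segments of the partial NifH sequences in Tables 2 (1CP2) and 4 (2AFH).
SEQ_1CP2 = "CDPKADSTRLLLGGLAQKSVLDTLREEGEDVELDSILKEGYGGIRCVESGGPEPGVGCAGRGI"
SEQ_2AFH = "CDPKADSTRLILHSKAQNTIMEMAAEAGTVEDLELEDVLKAGYGGVKCVESGGPEPGVGCA"

summary = summarise([SEQ_1CP2, SEQ_2AFH])
for aa in ("G", "L", "W"):
    m, sd = summary[aa]
    print(f"{aa}: mean {m:.2f}%, SD {sd:.2f}")
```

A residue such as Trp, which is absent from most sequences, yields a column of zeros with an occasional non-zero value, which is the situation described in point (a).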

Table 3. Amino acid composition in the amplified region of NifH sequences from known organisms affiliated with cluster I (N=58).

Amino acid   Mean (%)   Std. deviation   Passed normality test (alpha=0.05)? (a)   P-value summary (b)
Ala          9.56       0.78             Yes
Cys          2.09       0.49             No                                        ****
Asp          5.33       0.63             No                                        **
Glu          9.13       0.86             Yes
Phe          1.99       0.36             No                                        ***
Gly          10.05      0.47             Yes
His          1.51       0.52             Yes
Ile          7.94       0.62             Yes
Lys          5.13       0.84             Yes
Leu          8.59       1.12             Yes
Met          4.00       0.75             Yes
Asn          4.25       0.55             No                                        ***
Pro          2.89       0.15             No                                        ***
Gln          3.62       0.76             Yes
Arg          4.77       0.73             No                                        ***
Ser          4.07       0.72             Yes
Thr          4.76       0.67             Yes
Val          6.99       0.90             Yes
Trp          0.04       0.11             No                                        ****
Tyr          3.32       0.29             Yes

(a) D'Agostino & Pearson omnibus normality test (D'Agostino, 1986).
(b) **** P< 0.0001 extremely significant, *** 0.0001


Figure 7 The distribution shape of three amino acids, Arg, Cys and Phe, based on their composition pattern in the NifH sequence. Goodness of fit (robust sum of squares): Arg 7.21, Cys 6.01, Phe 5.54. The Y axis represents the relative frequency of the X axis values. The X axis represents the range of the composition values (%) of each amino acid in the NifH amplified region.

As in the 1CP2 structural analysis, the conserved Gly and Ala residues in the amplified region of NifH in 2AFH appeared in alpha helices (‘H’), high curvature regions (bends, ‘S’), H-bonded turns (‘T’, usually just before or after a helix) and 3-helix turns (‘G’, three residues per turn), with only a few present in unidentified or coil regions (G or A with conservation score 9, table 4). Conserved Gly or Ala residues in 2AFH were adjacent to important functional domains, similar to the 1CP2 findings. Solvent accessibility analysis suggested that some of the Gly or Ala residues were accessible to the solvent (positions 42, 65, 89-90 and 114), while others remained buried.

The I-TASSER server provided five potential models for the 2AFH chain E sequence, with C-scores ranging from -5 to 2.12. ‘Model1’ had the best C-score (2.12), with an RMSD of 2.0±1.6 Å and a TM-score of 0.99±0.04. The templates identified in the threading stage were 2AFH chains E and A, 1CP2 chain A, 1DE0 chain A and 2NIP chain A (additional Fe proteins from A. vinelandii). Only chains E and A from 2AFH had 100% sequence identity, while the rest had varying degrees of sequence identity. The top ranking predicted EC number was 1.18.6.1 (nitrogenase), with a TM-score of 0.8922, an RMSD of 1.7 Å, 98% sequence identity to the query sequence, an EC-score of 4.0401 and a PDB hit to 1N2C chain E (a nitrogenase complex from A. vinelandii). The protein most structurally similar to the first I-TASSER model was 2AFH chain E, according to its TM-score of 0.9897 (RMSD 0.54 Å, 100% sequence identity).


Superimposing the I-TASSER model on the known X-ray crystallographic coordinates of 2AFH (see section 5.2.3 for specific parameters) highlighted only two residues that were not precisely positioned (figure 8). These were G96 and E116 (numbering according to the P00459 sequence), with RMSD values of 1.311 and 1.168 Å, respectively, while the overall RMSD was 0.349 Å.

Figure 8 Superimposition of the I-TASSER model based on P00459 sequence, amplified section, and the crystallographic structures of 2AFH chain E. Sections with RMSD >1 Å are highlighted with colour.


Table 4 2AFH Fe protein partial NifH section, conservation scores, secondary structure and solvent accessibility. Conserved Ala (A) and Gly (G) residues are highlighted.

2AFH position markers: 40 60 80 | | | (^)
Partial NifH Sequence (a): CDPKADSTRLILHSKAQNTIM EMAAEAGTVEDLELEDVLKA GYGGVKCVESGGPEPGVGCA
Conservation Score (b): 999999999979619793947 38991195999582139111 91115197989999979999
Secondary Structure (e) *: E-S-SSSSHHHH--SS--HHH HHHHTTSSGGG--HHHH-EE -GGG-EEEE-----TTT--H
Secondary Structure (e) ***: ---EEEEEE------HHHH HHHHHHHH---EEEEE------HHHHHHH------HHHHH
Solvent Accessibility (c) †: beeeeebbebbbebebeebbb ebbbeeeebeebebeebbee beeebebeeeeeeeeeebbb
Solvent Accessibility (c) ††: --ee----e---e-ee-e--- e--ee----ee-eeee--ee -eee-e------e--e---
Solvent Accessibility (c) †††: b-----bb-bbb------bb ----e------b-b--bb------b-bb------bbb
Binding to MoFe Protein (d): ------M E-AA---TVED------P---V-C-
Nucleotide Binding Site (d): -D-K-DS------
Chains Interface (d): ---K-D--R-----K------EP-V--A
Structural Motifs (d): SWITCH 1 ------

(a) Partial NifH amino acid sequence from the 2AFH crystallographic 3D structure and P00459 sequence accession ID, chain E.
(b) Residue conservation scores, calculated by ConSurf with Maximum likelihood and LG protein substitution model (Pupko et al., 2002; Glaser et al., 2003; Landau et al., 2005; Goldenberg et al., 2008).
(e) Secondary structure: ‘H’ - helix; ‘T’ - hydrogen bonded turn; ‘S’ - bend; ‘E’ - extended beta sheet; ‘G’ - 3-helix (three residues per turn); ‘-’ unknown/random coil. * Based on 3D coordinates calculated by DSSP as implemented in The Protein Data Bank (Kabsch and Sander, 1983; H.M. Berman, 2003). *** Predicted secondary structure by I-TASSER online server, based on P00459 NifH sequence (Zhang, 2008; Roy et al., 2010). ‘H’ - helix, ‘E’ - extended beta sheet, ‘-’ unknown.
(c) Solvent accessibility: † Buried (b) or exposed (e) residue; calculated by ConSeq online server (Sridharan et al., 1992; Pollastri et al., 2002; Berezin et al., 2004). †† Solvent accessibility calculated by WHAT IF program, ‘-’ unknown, (e) exposed - a residue that is clearly solvent accessible, more exposed than 102 Angstrom, or more than 33% of its accessibility in the unfolded state (Vriend, 1990). ††† Predicted solvent accessibility calculated by I-TASSER server. ‘b’ - buried residue (0); ‘e’ - highly exposed residue (7-9); ‘-’ varying degrees of exposure (Chen and Zhou, 2005; Wu and Zhang, 2008).
(d) Residues interacting with structural components in nitrogenase (Schlessman et al., 1998).
(^) Cysteine residues which coordinate the metallo cluster are marked C.


Table 4 2AFH sequence continued. Conserved Ala (A) and Gly (G) residues are highlighted.

2AFH position markers: 100 120 140 158 | | (^) | |
Partial NifH Sequence: GRGVITAINFLEEEGAYEDD LDFVFYDVLGDVVCGGFAMP IRENKAQEIYIVCSGEMMAM
Conservation Score: 99989969889992799113 18897999999999999999 99668999999939999999
Secondary Structure *: HHHHHHHHHHHHHTT-SSTT -SEEEEEEE-SS--TTTTHH HHTT---EEEEEE-SSHHHH
Secondary Structure ***: HHH------HHHHHHH-- --EEEEE------HHHHHHHHHH------
Solvent Accessibility †: bebebbbbbbbeeeeeeeee bebbbbbbbbbbbbbbbbbe beeeebeebbbbbbbebbbb
Solvent Accessibility ††: ------e----eee -e------eee--e------ee--
Solvent Accessibility †††: --bbbbbb-bb------b-bbbbbb-bbbbbb-bb-- b----b--bbbbbb---bbb
Binding to MoFe protein: -R--IT--N---E------C----M- -RE------
Nucleotide Binding Site: ------D---D------E-MA-
Chains Interface: ------L-DVVC--FA------EMM--
Structural Motifs: ------ SWITCH 2 ------


5.3.3 Comparative analysis of cluster I and cluster III Fe proteins

We projected our results from the evolutionary analysis onto the relevant Fe protein structures, 2AFH chain E for cluster I and 1CP2 chain A for cluster III (figure 9). Big blocks of completely conserved residues were evident in the interior of the Fe protein, in the vicinity of the metallo cluster, as expected. Completely conserved residues were found throughout the structure, also toward its exterior, in a more fragmented fashion, alongside less conserved residues.

The secondary structure was similar between 2AFH and 1CP2 (figure 10). The overall RMSD when superimposing the 2AFH and 1CP2 crystallographic structures was 0.67 Å (excluding the 13 residues at the C-terminus of 2AFH), which meant that most Cα atoms were positioned similarly in both proteins. However, there were regions where the RMSD was higher than 1 Å, as can be seen in figure 11. Six sections with RMSD values higher than 1 Å were present in the amplified region (Table 5).

Table 5 2AFH & 1CP2 regions of RMSD >1 Å. In bold, residues in the amplified region of NifH.

Residue      2AFH            Conservation    1CP2          Conservation   Main secondary   RMSD
positions*   sequence (a)    score (b)       sequence      score          structure (c)    (Å)
26-28        AEM             841             HAM           111            helix&turn       1.513
50-53        SKAQ            1979            GLAQ          9719           coil             2.143
58-68        EMAAEAGTVEDLE   3899119599958   DTLREEGEDVE   99991897571    helix&turn       2.341
87-90        GPEP            9999            GPEP          9999           coil             1.295
93           G               9               G             9              bend             1.998
96-97        GR              99              GR            99             helix            1.001
108-116      EEGAYEDDL       927991131       QLGAYTDDL     279861119      coil&turn        1.379
184-188      RNTDR           91684           RKVAN         91971          coil&bend        3.473
198          N               1               K             1              helix            1.073
262-269      EELLMEFG        91695149        EEILMQYG      91641119       helix&turn       3.083

* Position number according to 1CP2, chain A sequence P00456.
(a) The amino acid in each position in the respective Fe protein, 2AFH or 1CP2.
(b) ConSurf conservation scores, 1-9, non conserved to completely conserved, respectively.
(c) Secondary structure calculated by DSSP as implemented in The Protein Data Bank.

These six regions included coil, turns and parts of alpha helices as their secondary structure (tables 2 & 4). They included residues involved in intersubunit interactions (51, 89-90), binding to the MoFe protein (58-68, 88, 97, 108-116) and coordination of the metallo cluster (87-90, 93, 96-97).


Figure 9 Conservation pattern of the Fe protein. Top image: Superimposed 1CP2 and 2AFH Fe proteins at opposite angles, composed from completely conserved residues only (score 9). The metallo cluster is represented with space filled atoms, yellow for sulphur atoms and orange for Fe atoms. Bottom image: Conservation scores projected onto individual Fe protein structures. The left Fe protein is 2AFH chain E, and the right Fe protein is 1CP2 chain A. Coloured ribbons represent less conserved residues in the protein (scores 1-8, turquoise - pink), while the wire is composed only from completely conserved residues (score 9, maroon) in each cluster.

For the amino acid composition analysis, we performed two-tailed unpaired t-tests to assess whether the amino acid compositions differed significantly between clusters I and III, in the variable and conserved sections of the NifH sequences. Most amino acids in both clusters passed the normality tests (sections 5.3.1 and 5.3.2), which made them suitable for t-tests (Heeren and D'Agostino, 1987).

The composition analysis of the conserved vs. variable sections in the partial NifH sequence revealed some interesting similarities and differences between cluster I (C1) and cluster III (C3) sequences (figure 12). In the conserved sections, Cys, Leu, Pro and Thr compositions were similar in both clusters, while His, Asn and Trp were absent (Table 6). However, Ala and Gly differed substantially in the conserved region, as evident from their relatively high SD values (3.5 and 4.9, respectively, Table 6). In the variable sections, the composition of six amino acids (Ala, Asp, Glu, Ile, Arg and Val) showed no statistically significant differences between the clusters (Table 7). In addition, Pro was absent in cluster I and rarely present in cluster III, while Leu was the most common amino acid in both clusters (composition mean 11 and 15 for C1 and C3, respectively, Table 7), followed by Glu (11, 12). Phe, Gly, His, Lys, Asn and Ser content decreased significantly in cluster III (Table 7), while Cys, Leu, Met, Gln, Thr and Tyr content increased significantly.

Table 6 Amino acid mean composition in the conserved sections of partial NifH sequences from known organisms affiliated with cluster I (C1, N=58) and cluster III (C3, N=32). Shaded cells denote highest standard deviation (SD) values.

Amino acid   C1-Conserved (%)   C3-Conserved (%)   Mean (SD)*
Ala          11                 6.1                8.6(3.5)
Cys          5.3                4.6                5(0.49)
Asp          6.6                9.1                7.9(1.8)
Glu          9.2                7.6                8.4(1.1)
Phe          1.3                1.5                1.4(0.14)
Gly          14                 21                 18(4.9)
His          0.0                0.0                0(0)
Ile          6.6                5.9                6.3(0.49)
Lys          2.7                3.0                2.9(0.21)
Leu          5.6                6.1                5.9(0.35)
Met          4.9                3.1                4(1.3)
Asn          0.0                0.0                0(0)
Pro          5.3                6.1                5.7(0.57)
Gln          2.6                1.5                2.1(0.78)
Arg          3.9                6.1                5(1.6)
Ser          2.7                4.5                3.6(1.3)
Thr          3.9                4.5                4.2(0.42)
Val          11                 7.7                9.4(2.3)
Trp          0.0                0.0                0(0)
Tyr          3.9                1.5                2.7(1.7)

* Mean and standard deviation of the C1 and C3 conserved values.

Table 7 Amino acid mean composition in variable sections of partial NifH sequences from known organisms affiliated with cluster I (C1, N=58) and cluster III (C3, N=32).

Amino acid   C1-Variable (a)   C3-Variable   t-test P value summary (b)
Ala          6.9(1.8)          6.3(2.3)      N
Cys          0.81(1.5)         2.5(0.86)     ****
Asp          8.5(2.7)          7.7(2.2)      N
Glu          11(3)             12(3.5)       N
Phe          4.6(2.3)          3(1.2)        ****
Gly          7.5(1.8)          5.7(2)        ****
His          2.8(1.7)          0.29(0.68)    ****
Ile          5.8(2.2)          6.3(2.1)      N
Lys          5.7(2.6)          4.6(1.5)      **
Leu          11(2.3)           15(3)         ****
Met          3.3(1.6)          4.2(1.5)      *
Asn          6.7(2.3)          3.2(1.6)      ****
Pro          0(0)              0.35(0.73)    n/a
Gln          1.4(1.8)          2.9(1.3)      ****
Arg          2.2(2.5)          2(1.3)        N
Ser          7.7(3.3)          3.6(1.9)      ****
Thr          3.2(2.9)          4.9(2)        **
Val          8(2.3)            8.5(2.1)      N
Trp          0.16(0.58)        0.57(0.86)    *
Tyr          2.6(1.9)          6.2(1.4)      ****

(a) The mean composition of each amino acid and its standard deviation.
(b) Indicates if the means were significantly different (P<0.05) according to unpaired t-tests with Welch’s correction for unequal variances. **** P< 0.0001 extremely significant, *** 0.0001
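The Welch-corrected comparison noted under Table 7 can be sketched from summary statistics alone. The following is a minimal illustration rather than the software used in this analysis: the function name `welch_t` is hypothetical, and it assumes that N=58 and N=32 are the sample sizes behind each mean and SD. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; converting these into a P value requires a t-distribution, which is left out here.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's unequal-variance t statistic and degrees of freedom."""
    v1 = sd1 ** 2 / n1  # squared standard error of group 1
    v2 = sd2 ** 2 / n2  # squared standard error of group 2
    t = (mean1 - mean2) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Cys in the variable sections (Table 7): C1 0.81(1.5), C3 2.5(0.86).
# Assumption: the group sizes are N=58 (C1) and N=32 (C3).
t, df = welch_t(0.81, 1.5, 58, 2.5, 0.86, 32)
print(f"t = {t:.2f}, df = {df:.1f}")
```

A negative t here simply reflects the order of the arguments (C1 minus C3), i.e. a higher Cys content in the cluster III variable sections.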


Figure 10 Superimposed structures of 1CP2 chain A and 2AFH chain E crystallographic structures, highlighting similarities in their secondary structure in the NifH amplified region. Overall RMSD 0.67 Å. Helices in blue, coil in light grey and beta sheets in red. The metallo cluster position is based on the 2AFH PDB file atom coordinates (orange for Fe, and yellow for S atoms).

Figure 11 Superimposed structures of 1CP2 chain A and 2AFH chain E crystallographic structures, highlighting specific sections with RMSD values >1 Å. Left: Conserved and non-conserved regions in which RMSD values are >1 Å, highlighted with the ConSurf colour scheme: completely conserved regions are in maroon (score ‘9’), and non-conserved regions are coloured turquoise to pink (scores ‘1-8’). Right: The same regions in which RMSD is >1 Å, highlighted per protein structure: 1CP2 is in dark blue and 2AFH is in dark green.

Figure 12 The mean amino acid composition in the partially amplified NifH sequence, from known organisms affiliated with cluster I (C1) and cluster III (C3), divided into variable vs. conserved regions. Error bars are SD.

Salt bridges analysis, based on the crystallographic structures of 1CP2 and 2AFH, revealed that most of the bridges included highly conserved residues mainly in coil regions, with few residues in helices or beta sheets (Table 8).

While the residues within the beta sheets were buried, almost all the other residues were exposed to the solvent according to our previous analysis (tables 2 & 4). Four common bridges were completely conserved in both structures, yet two unique salt bridges in 2AFH and three in 1CP2 had low conservation scores, suggesting that these specific bridges were not present in all the sequences from cluster I or cluster III. Three unique salt bridges were highly conserved and connected the subunits of 2AFH, yet similar intersubunit bridges were not found in 1CP2, even when our distance criterion was extended to allow a distance of 7 Å between participating atoms.
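The distance criterion behind such an analysis can be sketched as follows. This is a simplified illustration and not the WHAT IF implementation: the atom selections (side chain oxygens of Asp/Glu against side chain nitrogens of Lys/Arg/His), the function name and the coordinates are all assumptions made for the example.

```python
import math

# Assumed charged side chain atoms; the selection used by WHAT IF may differ.
ACIDIC_ATOMS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASIC_ATOMS = {"LYS": ("NZ",), "ARG": ("NE", "NH1", "NH2"), "HIS": ("ND1", "NE2")}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return (acid, position, base, position, distance) for pairs within the cutoff.

    `atoms` is a list of (residue_name, position, atom_name, (x, y, z)) tuples.
    Only the shortest atom-atom distance is kept for each residue pair.
    """
    acids = [a for a in atoms if a[2] in ACIDIC_ATOMS.get(a[0], ())]
    bases = [a for a in atoms if a[2] in BASIC_ATOMS.get(a[0], ())]
    best = {}
    for acid in acids:
        for base in bases:
            d = math.dist(acid[3], base[3])
            if d <= cutoff:
                key = (acid[0], acid[1], base[0], base[1])
                best[key] = min(d, best.get(key, d))
    return sorted(key + (round(d, 2),) for key, d in best.items())


# Hypothetical coordinates (Å); residue labels are for illustration only.
atoms = [
    ("ASP", 38, "OD1", (0.0, 0.0, 0.0)),
    ("ASP", 38, "OD2", (2.2, 0.0, 0.0)),
    ("LYS", 14, "NZ", (0.0, 3.0, 0.0)),
    ("ARG", 81, "NH1", (10.0, 10.0, 10.0)),
]
print(find_salt_bridges(atoms))
```

Raising `cutoff` to 7.0 corresponds to the relaxed criterion used when searching for intersubunit bridges in 1CP2.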


Table 8 Potential salt bridges, with a maximum interatomic distance of 4 Å, in the amplified NifH region of the Fe protein 1CP2 or 2AFH. Shaded rows represent common salt bridges.

Structure   Subunits   Residue   Position (b)   Residue   Position   Distance (Å)   Conservation scores (c)
1CP2 (a)               ASP       38             LYS       14         3.07           9,9
                       GLU       62*            LYS       54         2.77           1,6
                       GLU       75             ARG       81         3.84           1,1
                       GLU       107            LYS       140        3.96           9,9
                       ASP       115            ARG       81         3.22           1,1
                       ASP       122*           LYS       14         2.95           9,9
                       GLU       143            ARG       2          3.58           9,9
2AFH (d)               ASP       39             LYS       15         3.09           9,9
            E↔F        GLU       92             LYS       170        2.8            9,9
                       GLU       110            LYS       143        3.24           9,8
                       ASP       118            LYS       32         3.38           3,7
                       ASP       125            LYS       15         3.14           9,9
                       ASP       129            LYS       41         2.74           9,9
                       GLU       141            ARG       140        2.92           6,9
                       GLU       146            ARG       3          2.88           9,9
                       GLU       154            LYS       10         3.97           9,9
                       GLU       229            HIS       50         2.53           2,6
            E→F        GLU       265            LYS       52         3.68           9,9
            F→E        GLU       277            LYS       52         2.86           -,9

(a) 1CP2 analysis by WHAT IF; salt bridges were not detected between the Fe protein subunits A & B.
(b) Positioning was manually corrected for minor shifts per alignment.
(c) Conservation score was based on the individual analysis of ConSurf on cluster III, Stromatolite affiliated with cluster III (S3), Cluster I and its affiliated stromatolite clones (S1). Scores ranged from 1 to 9, non-conserved to completely conserved, respectively. “-” score was not calculated.
(d) 2AFH analysis by WHAT IF; salt bridges were detected between subunits E & F, and are designated where relevant.
* Yellow background denotes a residue in an α-helix structure, and green denotes a residue within a β-sheet. No background colour means random coil or unknown structure.


5.4 Discussion

Cluster I and cluster III NifH sequences from known organisms were subjected to analyses of their conservation patterns, amino acids composition and structural shifts, in representative Fe proteins. Two new bioinformatic tools were evaluated, ConSurf and the I-TASSER web server for structural prediction.

5.4.1 Methodology

Our methods included a statistical t-test analysis of the amino acid composition, a ConSurf evolutionary analysis, and using I-TASSER for modelling the Fe proteins. In general, these methods performed well, with the following limitations and restrictions.

Amino acid composition analysis is commonly used to identify unique patterns and to characterise a designated group of sequences. The number and length of the sequences may vary, from a few dozen sequences to hundreds, and a range of statistical tools can be employed, such as Chi square tests, the Significance (‘R’) formula based on standard deviation, cluster analysis and more (Bohm and Jaenicke, 1994; Fukuchi and Nishikawa, 2001; Fukuchi et al., 2003; Paul et al., 2008). In our analysis, amino acid composition was compared between defined NifH clusters derived from known reference NifH sequences and our specific NifH clone sets obtained from the Shark Bay stromatolites and the Paralana Hot Springs. While most amino acids passed the normality tests, Trp, Phe and Asp did not, in either cluster I or cluster III (tables 1 & 3).

However, we continued the statistical analysis with all the amino acids, because the t-test is robust to violations of normality (Heeren and D'Agostino, 1987). The distribution shape of amino acid composition in proteins is a complex matter. A normal distribution has been suggested in the past (Smith, 1966; Nishikawa et al., 1983; Gerstein, 1998), but the question remains unresolved. Nevertheless, we would have liked to see more amino acids pass the normality test. Once the amplified region of the nifH gene is extended, longer sequences will provide more occurrences of each amino acid, and the distribution shape should become clearer. However, because some of these amino acids appear mostly in the conserved regions of the sequence, they will always attain the same values regardless of the data set size, and a different statistical approach should perhaps be used in those cases.

The limitation of the ConSurf analysis was tightly related to the quality of the multiple alignment, more so than to its size. The quality of the multiple alignment is at the core of the ConSurf analysis (Glaser et al., 2005). In our study we used the Muscle alignment software and checked the alignments visually. Independent benchmark testing of multiple alignment tools has shown that MAFFT and Muscle produce outputs of similar quality, and that both perform better than ClustalX (Nuin et al., 2006). Large blocks of conserved motifs throughout our alignments were always aligned correctly; however, whenever there was a single insertion, Muscle tended to position it somewhat arbitrarily. Thus an insertion near two identical residues could create three different forms (-xx, x-x, xx-) and affect how ConSurf computes conservation for these specific positions. If the gap is placed inconsistently next to two identical residues, these highly conserved residues ‘lose’ their specific position within the multiple alignment and are marked as variable, although they are not. In a highly conserved region of the Fe protein, any minute modification to the sequence could represent an adaptation. On a large scale, these small modifications might be lost or overlooked, yet in our alignments they were observed and corrected.
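The effect of gap placement on per-column conservation can be illustrated with a toy alignment. The sketch below uses a simple majority-identity measure per column, which is an assumption made for clarity and is not the rate4site calculation used by ConSurf; the sequences and the function name are hypothetical.

```python
from collections import Counter


def column_identity(alignment):
    """Fraction of the most common residue in each column, ignoring gaps."""
    width = len(alignment[0])
    if any(len(row) != width for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        residues = [c for c in column if c != "-"]
        if not residues:
            scores.append(0.0)  # a column made of gaps only
            continue
        top_count = Counter(residues).most_common(1)[0][1]
        scores.append(top_count / len(residues))
    return scores


# The third sequence carries a single insertion (A) between two identical residues.
consistent = ["CG-GT", "CG-GT", "CGAGT"]    # gap placed in the same column
inconsistent = ["CG-GT", "CGG-T", "CGAGT"]  # gap placed in different columns

print(column_identity(consistent))
print(column_identity(inconsistent))
```

With the inconsistent placement, the third column mixes G and A and appears variable even though the flanking Gly residues are identical in every sequence.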

In addition, positions occupied by functionally similar amino acids, e.g. Glu or Asp, exhibit higher rates of change than positions that require both the function and the structure of the amino acid to be exactly the same for the protein to function at all. Hence, positions that tolerate functionally similar but structurally non-identical amino acids will alternate between those amino acids. As ConSurf uses the “rate4site” algorithm, which scores positions based on their mutational rates, such alternating positions are scored as relatively variable and receive lower scores. Across the NifH sequences from known organisms in cluster III, positions with alternating Glu or Asp received a range of scores: 1, 4, 5 and 7 (figure 2, table 3). Determining exactly how a specific score was assigned to these positions requires an in-depth inspection of the algorithm, taking into account the maximum likelihood method, the Le and Gascuel substitution matrix and the effect of the total number of sequences in the alignment on the calculation.

The limitation of the I-TASSER modelling method was identified after we employed the RMSD calculation method (Kabsch and Sander, 1983), as implemented in the UCSF Chimera software (Meng et al., 2006). We performed an RMSD analysis on the resolved structures of 1CP2 chain A and 2AFH chain E, which gave an independent measure of their structural differences (Table 5). This base analysis was later compared with our analysis of each resolved structure against its model predicted by the I-TASSER server (figures 5 & 8). The base analysis indicated where authentic structural changes occur between 1CP2 and 2AFH (Table 5). Six of those sections were in the amplified region of NifH and were relatively exposed to the solvent. They are known to undergo conformational changes upon nucleotide binding, or upon forming a docking complex with the MoFe protein (Georgiadis et al., 1992; Tezcan et al., 2005).

The RMSD analysis of the 1CP2 and 2AFH structures against their predicted I-TASSER models allowed us to isolate any differences introduced by the I-TASSER process. The four sections in 1CP2 that were positioned imprecisely corresponded entirely to regions identified in our base analysis between the two Fe proteins. We therefore assumed that the misplacement was caused by the TM-align procedure on the 1CP2/P00456 sequence, which identified 2AFH chain E as the best structure to model after, even though its sequence identity to P00456 was only 69%. Hence, in the resulting predicted model for 1CP2/P00456, the Cα atoms of seven amino acids were placed at a distance from the resolved structure, and were mainly based on the 2AFH chain E structural alignment. The only two residues that were slightly off in the 2AFH chain E model reflect the ab initio modelling procedure I-TASSER employs for loop regions, which has a lower success rate than comparative modelling against a known sequence and structural template (Zhang et al., 2003; Moult, 2005).

5.4.2 Evolution, composition and structure in cluster I & III

The ConSurf analysis of cluster I and cluster III provided some interesting points. The results for both clusters confirmed that most residues with important functional roles were completely or highly conserved. The scoring system itself, based on maximum likelihood and the LG amino acid substitution matrix, produced a colour scheme for the multiple alignment of each cluster (figures 3 & 6).

In general, score 8 was assigned by ConSurf to positions in which only one sequence in the entire multiple alignment included a change of residue, while scores 6-7 were usually assigned to exchanges between a positively charged Arg/Lys and a negatively charged Asp/Glu, or to an exchange between Gln/Thr residues with polar uncharged side chains. The latter type of exchange points to preservation of function over structure. Lower scores were given to exchanges of hydrophobic residues, mostly of the bulkier type (Leu/Met/Tyr/Phe). Overall, 60% of cluster I residues scored 8-9 and ~11% scored 6-7, suggesting that both function and structure were highly conserved throughout cluster I. Cluster III residues scored differently: 54% scored 8-9 and 14.5% scored 6-7, suggesting that cluster III, while still quite conserved in function and structure, is more prone to changes in its amino acid composition than cluster I.
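
The percentages quoted above amount to binning per-position ConSurf grades. A minimal sketch of that tally follows, assuming the grades are available as a string with one digit (1-9) per alignment position; the function name, the bin labels and the example string are hypothetical and are not part of the ConSurf output format.

```python
from collections import Counter

def score_bin_percentages(scores):
    """Percentage of alignment positions falling in each ConSurf grade bin."""
    counts = Counter(int(c) for c in scores)
    total = len(scores)
    bins = {"8-9": (8, 9), "6-7": (6, 7), "1-5": (1, 2, 3, 4, 5)}
    return {
        name: 100.0 * sum(counts[g] for g in grades) / total
        for name, grades in bins.items()
    }

# Hypothetical 20-position grade string, for illustration only.
example_scores = "99988776654321199999"
percentages = score_bin_percentages(example_scores)
```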

Completely conserved residues in both clusters were found in the switch I & II regions, the 4Fe:4S metalo cluster, and the Walker A motif (GKGGIGGKST), as expected. In addition, regions that were involved in at least two functional roles scored 9. For example, positions 86-99, which were involved in MoFe binding and in the intersubunit interface, included completely conserved blocks of residues (residue numbering according to 1CP2; Schlessman et al., 1998). Another conserved feature, in both clusters, was the complete conservation of Q54, as part of a Q-loop motif (Yang et al., 2011). This motif is usually found within the ATP-binding cassette (ABC) transporters, which also include the multidrug resistance protein MRP. The Q-loop motif is integral to the binding of the nucleotide via its metal cofactor (Yang et al., 2011), and its presence is not surprising considering that NifH has been affiliated phylogenetically with the Mrp/MinD protein family, within the SIMIBI class of the p-loop GTPases (Leipe et al., 2002). Other motifs were not completely conserved in both clusters. The Walker B hhhhDxxG motif was partially conserved (‘h’ denotes a hydrophobic residue; residues 122-129 in 1CP2). The DxxG motif was DVLG in both clusters; however, in cluster I, the preceding ‘hhhh’ included Ser/Cys/Phe (position 123), while the remaining positions were hydrophobic residues, as expected. In cluster III, the ‘hhhh’ motif was conserved and the residue exchanges were solely hydrophobic in nature (Val/Phe/Tyr/Ile). These structural and functional motifs are involved in nucleotide binding as well.

The Fe protein, in general, has retained the Asn residue which is part of an Nxxx motif, a variation on the original NKXD motif in p-loop GTPases (Leipe et al., 2002). This Asn is thought to stabilize the guanine nucleotide binding site, and to confer specificity for GTP binding (Bourne et al., 1991). It was not completely conserved in cluster III, in which position N106 scored only 6, as some sequences also included an Asp or Gly at this position. In cluster I, N107 scored 8, as almost the entire alignment maintained this specific residue at this position.

The salt bridges in the amplified NifH region were mainly composed of exposed and completely conserved residues, located chiefly in the coil regions. 2AFH included a larger number of completely conserved bridges than the 1CP2 structure. The unique Glu92-Lys170 bridge in 2AFH was observed previously (Schlessman et al., 1998); however, our analysis indicated that two additional bridges might be present, Glu265E-Lys52F from chain E to chain F and Glu277F-Lys52E from chain F to chain E, both of which were highly conserved (Table 8). Our analysis did not detect Asp129-Lys41 as a bridge between the two subunits in 2AFH, but as a bridge within subunit E. It was, however, reported as an intersubunit bridge in another Fe protein structure, 1NIP (Schlessman et al., 1998); hence this pair of residues may be multifunctional, which would explain their complete conservation. Some studies have indicated that the highly conserved Arg100 (cluster I numbering) is part of a salt bridge with Glu120 in the alpha chain of the MoFe protein (Georgiadis et al., 1992; Burgess and Lowe, 1996), and replacing this residue produced salt sensitivity and partial functionality of the Fe protein (Peters et al., 1995).

Our analysis indicated this residue may also interact with Glu156 of the beta chain in the MoFe protein. In both bridges, the distance calculated by the WHAT IF program for the participating atoms was larger than 4.1 Å, suggesting that 4.0 Å as the maximum interatomic distance criterion for an established salt bridge was not ideal. Glu110, Arg140 and Lys143 have also been implicated in mutational studies as important for the protein to function under saline conditions, as replacing them caused salt sensitivity and various degrees of uncoupling of MgATP hydrolysis (Peters et al., 1995). The exact role of these residues is not yet determined, though some have suggested they play a crucial role during docking with the MoFe protein (Tezcan et al., 2005). The highly conserved salt bridge between Asp125 and Lys15 (2AFH numbering, Table 8) has been confirmed; it is known to connect the Walker A motif (Lys15) and the switch II region, and is crucial for the conformational changes the Fe protein undergoes once a nucleotide is bound (Georgiadis et al., 1992; Lanzilotta et al., 1995). However, Asp39 and Lys15 represent a potential additional important salt bridge, between the Walker A motif and the switch I region, which requires further study. The other salt bridges suggested in our analysis of 1CP2 and 2AFH should be further characterised, particularly those involving highly conserved residues such as Asp129-Lys41, Glu146-Arg3, Glu154-Lys10 and the intersubunit salt bridges (Table 8). Because most salt bridges were exposed to the solvent, it is possible they switch between partners within the Fe protein and partners in the MoFe protein upon docking.

The ConSurf analysis not only confirmed the functional and structural elements in cluster I & III, it also provided an additional layer of information - mainly in regards to residues which maintained functionality but not structure. This was complemented by the amino acid composition analysis.

The compositions of most amino acids in the cluster I and cluster III sequences passed robust normality tests (tables 1 and 3). They were therefore considered to follow a Gaussian distribution, and to be suitable for parametric statistical tests. A two-tailed unpaired t-test analysis of the amplified region of NifH, divided into conserved vs. variable segments, found shifts in both segments (tables 7 & 8). In the conserved region, only Ala and Gly showed any considerable shift in composition between the two clusters, while other amino acids remained very similar in composition, as would be expected from conserved regions.
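
As an illustration of the unpaired comparison described above, the following Python sketch computes the t statistic and the Welch-Satterthwaite degrees of freedom for two groups with unequal variances. It is a sketch under stated assumptions, not the software used in this study: the function name and the per-sequence composition values are hypothetical, and obtaining a P value would additionally require the t distribution (for example `scipy.stats.ttest_ind` with `equal_var=False`).

```python
import math
from statistics import mean, variance

def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two independent samples."""
    na, nb = len(sample_a), len(sample_b)
    # Squared standard errors of each group mean.
    se_a = variance(sample_a) / na
    se_b = variance(sample_b) / nb
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(se_a + se_b)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (se_a + se_b) ** 2 / (se_a ** 2 / (na - 1) + se_b ** 2 / (nb - 1))
    return t, df

# Hypothetical per-sequence Ala composition (%) in two clusters.
cluster_i = [11.0, 12.0, 10.0, 11.5, 10.5]
cluster_iii = [6.0, 7.0, 5.5, 6.5, 5.0]

t_stat, dof = welch_t(cluster_i, cluster_iii)
```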

The variation between Ala and Gly residues in the conserved region might be an indication of the relative effect of Ala versus Gly on helix stability within the Fe protein. Other studies indicated these amino acids stabilised helices, but at different locations along the helix: Gly at the N- and C-terminals, and Ala at internal positions (Chakrabartty et al., 1991; Serrano et al., 1992b; Serrano et al., 1992a). It is thought this specific exchange of Gly and Ala impacts solvent accessibility, and influences the exposure of hydrophobic surfaces to the solvent. Conserved Gly or Ala residues in 2AFH or 1CP2 were positioned adjacent to important functional domains, and most were accessible to the solvent, making them suitable to provide minor adjustments to the solvent accessibility of the functional residues (tables 2 & 4).

Our composition analysis of the variable sections (scores 1-8) indicated that cluster III increased its hydrophobic content and thus reduced the overall solvent-accessible surface area (Moret and Zebende, 2007), producing a more compact Fe protein in general (Table 7). In addition, although our analysis suggested that the variable sections differ substantially in their amino acid compositions between cluster I and III, there were underlying common trends. These included Leu as the most frequent amino acid, and charged amino acids, such as Asp, Glu and Arg, that did not differ significantly in their composition. This was also true for several hydrophobic amino acids, mainly Ala, Val and Ile. The fact that these groups showed no significant change in composition, although located in the variable section of the NifH sequence, points to their involvement in a functional role via their side chains, perhaps in a similar fashion to Arg100, Arg140 and Lys143 (2AFH sequence numbering). Mutational studies revealed that these conserved amino acids provided essential ionic support during complex formation with the MoFe protein (Peters et al., 1995); Asp, Glu and Arg in the variable segments may provide similar support.
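
The split into conserved and variable sections can be sketched as follows. The sketch assumes that "conserved" means a ConSurf grade of 9 and "variable" means grades 1-8, as in the text, and that a sequence and its per-position grades are available as equal-length strings; the function name and the toy sequence are hypothetical.

```python
from collections import Counter

def composition_by_segment(sequence, scores, conserved_grade="9"):
    """Percent amino acid composition of conserved vs. variable positions."""
    if len(sequence) != len(scores):
        raise ValueError("sequence and scores must have the same length")
    groups = {"conserved": [], "variable": []}
    for residue, grade in zip(sequence, scores):
        key = "conserved" if grade == conserved_grade else "variable"
        groups[key].append(residue)
    composition = {}
    for name, residues in groups.items():
        counts = Counter(residues)
        composition[name] = {
            aa: 100.0 * n / len(residues) for aa, n in counts.items()
        }
    return composition

# Hypothetical eight-residue fragment with its ConSurf grades.
comp = composition_by_segment("AGAGLDEL", "99991518")
```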

Prior to analysing a clone-based Fe protein structure, it was of interest to check how the I-TASSER server would perform on the 1CP2 and 2AFH sequences vs. their known X-ray crystallographic structures, in order to independently gauge the server performance. Overall, the I-TASSER models were of good quality and in good agreement with the crystallographic structures of the 2AFH and 1CP2 Fe proteins (figures 5 & 8). The performance of the I-TASSER server was rather accurate, taking into consideration known server limitations, such as an average error of 0.08 for the TM-score and 2 Å for RMSD (Zhang, 2008). All models used in this study had TM-scores higher than 0.5, C-scores in the range of 1.34-2.14 and RMSD values of 1.7-2.3 Å.

The lower RMSD values for the 2AFH than for the 1CP2 I-TASSER model were most probably due to the fact that there are more resolved Fe protein structures from A. vinelandii (20 in total) than from C. pasteurianum (2). Therefore the resulting 2AFH model was more accurate than the 1CP2 I-TASSER model, with an RMSD value of 0.689 Å between 1CP2 and its I-TASSER model, and 0.529 Å between 2AFH and its I-TASSER model. The TM-align software (Zhang and Skolnick, 2005) ranked 2AFH chain E as the best structural match to the sequence of 1CP2 chain A, even though the sequence identity was only 0.69, and we suspected this introduced bias in the predicted models of the cluster III clones.

In total there were seven mismatched positions in the 1CP2 model, which meant that in the predicted model, the Cα atoms of seven amino acids were placed at a distance from the resolved structure. According to our previous analyses (Table 2) these residues were exposed to the solvent, participated in two coil sections, one hydrogen bond turn and one beta sheet, respectively, and were completely conserved, except for Leu52 and Thr115-Asp116 (ConSurf score 7, and 1-1, respectively). As expected, there were only two minor mismatched positions in the predicted model for 2AFH chain E. According to our previous analyses, G96 is completely conserved in cluster I, present at the end of an α-helix, buried and close to a loop region (Table 4), and Glu116 is a highly variable position (score 1; E/D/V/S, sometimes absent), exposed to the solvent, and part of a bend near the end of an α-helix. Coil and loop regions are notoriously hard to model accurately (Moult, 2005), so these results were not surprising.

When comparing 1CP2 and 2AFH actual crystallographic structures and superimposing them, the mismatched residues, in the amplified region of NifH, included most of the positions we reported as a mismatch between 1CP2 and its I-TASSER model (Table 5, figure 5). This suggested again that the lack of additional structures affiliated with cluster III, in the PDB template library, has most probably caused a slight bias in the I-TASSER process. However - as our analysis clarified what those positions were, we would be able to inspect them carefully in future analyses.

Projecting the ConSurf evolutionary scheme onto the resolved structures of 1CP2 chain A and 2AFH chain E, has demonstrated that in the amplified section of the NifH sequence, the most conserved regions were switch I and II and residues coordinating the 4Fe:4S metalo cluster (see figure 9). Combining the RMSD analysis and the ConSurf evolutionary scheme, suggested that structural shifts occur within specific regions, which included highly conserved and also non conserved residues (figure 11). The region of 113-118 was in particular prone to insertions and structural shifts, and this region is chiefly involved in the docking procedure between the Fe protein and the MoFe protein of the nitrogenase (Peters et al., 1995; Tezcan et al., 2005). In general, our RMSD analysis was in agreement with the RMSD analysis as presented by Schlessman et al., 1998, on 1CP2 Fe protein structure, and A. vinelandii Fe-protein denoted Av2, at 2.13 Å resolution.

5.5 Concluding remarks

NifH sequences from known organisms affiliated with cluster I or cluster III were analysed in terms of their conservation patterns, amino acid composition and existing and potential structural attributes. Our methods included a statistical t-test analysis of the amino acid composition, a novel ConSurf evolutionary analysis, and the use of the I-TASSER web server. These methods performed well in general, with few limitations, and provided interesting results.

The analyses results suggested cluster III was slightly less conserved than cluster I, and contained more hydrophobic residues. A possible role for the Ala and Gly residues as interchangeable stabilisers of the alpha helices in the Fe protein was suggested as well.

The main known difference between cluster I and cluster III is that the latter includes strictly anaerobic species, while cluster I includes both aerobic and anaerobic species (see section 1.4, chapter 1). Our analysis highlights the underlying changes which facilitate this specialisation in cluster III diazotrophs.

Chapter 6 Halophilic and thermophilic adaptations in the Fe protein

6.1 Introduction

Clones obtained from columnar stromatolites (chapter 3) and Paralana Hot Springs (chapter 4), were phylogenetically affiliated mainly with cluster I and cluster III, of the nifH phylogenetic tree (Zehr et al., 2003a; Raymond et al., 2004a). We expected that halophilic adaptations would manifest themselves to some extent in the nifH genes from columnar stromatolites of Shark Bay, because representatives of halophilic Halobacteriales have been previously detected in stromatolites (Goh et al., 2006; Allen et al., 2008; Allen et al., 2009) as well as Haloanaerobiales in Guerrero Negro microbial mats (Ley et al., 2006). The archaeon Halococcus hamelinensis, isolated from Shark Bay stromatolite mats, has been found to employ mainly glycine betaine as an osmolyte (Goh et al., 2011), while 18 Cyanobacteria isolates from the Oscillatoriales, Chroococcales and Pleurocapsales orders, have been found to accumulate predominantly various saccharides, glycine betaine, and trimethylamine-N-oxide (Goh et al., 2010). While halophilic Archaean diazotrophs have not been detected in our analysis (chapter 3), we have detected Cyanobacteria representatives. Thus, we have potential nitrogen fixers with known halophilic adaptive strategies in Shark Bay.

Halophilic adaptations may include an increase in acidic residues (Asp, Glu), a decrease in large hydrophobic residues and their replacement with small hydrophobic residues such as Ala, Gly and Val, and a lower Lys content, alongside an increase in salt bridges, within monomers and between subunits (Lanyi, 1974; Rao and Argos, 1981; Madern et al., 1995; Madern et al., 2000; Fukuchi et al., 2003). The main ‘threat’ to a protein under saline conditions is the excess of salt ions in the solvent, which prevents proper bonding with the water molecules and promotes aggregation (Bolhuis et al., 2008). The increase in negative charges in a protein, through the increase in acidic residues, acts as a charged screen against the salt ions and attracts water molecules to the protein (Bolhuis et al., 2008). Other studies suggested that the salt bridges were stabilized at times by the solvent salt ions, thus harnessing the solvent to preserve the protein structure and function (Eisenberg, 1995; Madern et al., 2000). The change in hydrophobicity helps the protein to remain flexible under saline conditions and prevents aggregation (Jaenicke and Böhm, 1998; Madern et al., 2000). These changes provide different mechanisms which enable a protein to function under extreme saline conditions, such as those surrounding Shark Bay stromatolites.
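
The compositional signatures listed above can be tallied for any sequence with a short Python sketch. The residue groupings follow the text where it names them (Asp, Glu, Lys, Ala, Gly, Val); the set of "large hydrophobic" residues (Leu, Ile, Met, Phe) is an assumption made for illustration, as are the function name and the toy sequence.

```python
def halophilic_indicators(sequence):
    """Percentage of residues in groups associated with halophilic adaptation."""
    length = len(sequence)

    def percent(residues):
        return 100.0 * sum(sequence.count(r) for r in residues) / length

    return {
        "acidic (Asp+Glu)": percent("DE"),
        "Lys": percent("K"),
        "small hydrophobic (Ala+Gly+Val)": percent("AGV"),
        # Assumed grouping; the text does not list the large hydrophobic residues.
        "large hydrophobic (Leu+Ile+Met+Phe)": percent("LIMF"),
    }

# Hypothetical ten-residue sequence, for illustration only.
indicators = halophilic_indicators("DEKAGVLIMF")
```

Comparing such indicators between a clone and its reference cluster gives only a coarse signal; the statistical comparison used in this chapter is the one described in section 5.2.3.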

Similar information about known thermophilic diazotrophs in Paralana Hot Springs (PHS) is scarce. However, reports of active diazotrophs in hot springs and hydrothermal vents (Mehta and Baross, 2006; Hamilton et al., 2011b) and recent analyses of thermophilic proteins (Siddiqui and Thomas, 2008) suggest that a thermophilic diazotroph might acquire unique adaptations, and reside in PHS. Thermophilic adaptations usually include an increase in charged amino acids and some hydrophobic amino acids (Ile, Met, Val, Tyr), as well as an increase in Pro and a decrease in Gly content (Kumar and Nussinov, 2001; Somero, 2003). A decrease in uncharged polar amino acids such as Ser, Thr, Asn and Gln was also observed in various thermophilic proteins (Georlette et al., 2003; Daniel et al., 2008). Structural adaptations may involve an increase in salt bridges within monomers and between subunits, and a decrease in the protein size, usually by removing loop regions and sections in the N- and C-terminals (Fields, 2001; Daniel et al., 2008). The increase in charged residues and salt bridges increases ionic networks which stabilize the protein at higher temperatures and prevent unfolding. Removal of Asn and Gln stabilizes the protein in general, as these amino acids tend to deamidate at higher temperatures (Kumar and Nussinov, 2001). The increase in hydrophobic residues, specifically at the core of the protein, enhances hydrophobic interactions and increases its overall stability. In general, thermophilic proteins increase their hydrophobic, electrostatic, Van der Waals and hydrogen bond interactions to prevent unfolding at higher temperatures and in the process become compact and rigid, relative to mesophilic and psychrophilic proteins (Siddiqui and Cavicchioli, 2006; Daniel et al., 2008).

Our aim was to assess halophilic and thermophilic adaptations in the inferred NifH sequences from columnar stromatolites of Shark Bay and from the microbial communities at Paralana Hot Springs, respectively.

6.2 Material and methods

6.2.1 Evolutionary conservation

Analysed as described in section 5.2.1, chapter 5.

6.2.2 Residue composition

Analysed as described in section 5.2.2, chapter 5.

6.2.3 Statistical analysis

Analysed as described in section 5.2.3, chapter 5.

6.2.4 Distance matrices

Subsets of the individual multiple alignments were converted to phylip format using Readseq (Gilbert, 2003) on the EMBL-EBI web server (EMBL-European Bioinformatics Institute) and submitted to “PHYLIP Protdist” version 3.67 (Felsenstein, 2007), available via the Mobyle web portal (http://mobyle.pasteur.fr/cgi-bin/portal.py#forms::protdist), to create distance matrices as described previously (chapter 2, section 2.2.6). 1CP2 and 2AFH NifH sequences were extracted from their respective PDB files, by the WHAT IF web server version 8.0 (Vriend, 1990) and trimmed to include only one copy of NifH for this calculation.
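
To illustrate what a pairwise distance between aligned protein sequences expresses, the sketch below computes the uncorrected p-distance (the fraction of differing sites) and selects the sequences within a given distance of a reference. This is a deliberately simplified stand-in: PHYLIP Protdist applies an amino acid substitution model, so its distances will generally differ from these values. The function names and the toy sequences are hypothetical.

```python
def p_distance(seq_a, seq_b, gap="-"):
    """Fraction of aligned, ungapped sites at which two sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    if not pairs:
        raise ValueError("no ungapped sites to compare")
    return sum(a != b for a, b in pairs) / len(pairs)

def within_distance(reference, clones, max_distance):
    """Clones whose p-distance to the reference does not exceed max_distance."""
    selected = {}
    for name, sequence in clones.items():
        distance = p_distance(reference, sequence)
        if distance <= max_distance:
            selected[name] = distance
    return selected

# Hypothetical aligned fragments, for illustration only.
reference = "DPKADSTRLM"
clones = {
    "clone_a": "DPKADSTRLM",
    "clone_b": "DPRADSTKLM",
    "clone_c": "EPRAESTKIM",
}
close_clones = within_distance(reference, clones, max_distance=0.23)
```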

6.2.5 Structural characteristics

3D crystallographic representatives of the Fe protein from mesophilic Azotobacter vinelandii, PDB file ID 2AFH (Burgess et al., 1980; Tezcan et al., 2005), and Clostridium pasteurianum, PDB file ID 1CP2 (Schlessman et al., 1998), were chosen in order to assess potential structural changes in the clone libraries, in relation to cluster I and cluster III, respectively. Structural characteristics were calculated as described in section 5.2.4, chapter 5. Protein images were created using the UCSF Chimera program, version 1.6.2 (Pettersen et al., 2004). The I-TASSER online server provided 3D models for chosen clone sequences, which were superimposed on 1CP2 or 2AFH using the “MatchMaker” option in the UCSF Chimera software (Pettersen et al., 2004; Meng et al., 2006) with the Smith-Waterman (Pearson, 1991) alignment algorithm and other options left at their defaults (BLOSUM 62 substitution matrix and 30% weighting of the secondary structure term). Salt bridges were defined by the WHAT IF web server (Vriend, 1990), version 10.1a (http://swift.cmbi.ru.nl/servers/html/index.html), and restricted to an interatomic distance of less than 4.0 Å, as described in section 5.2.4, chapter 5.

6.3 Results

6.3.1 Potential halophilic adaptations in the Fe protein

The evolutionary conservation analysis of the stromatolite clones revealed that 50% of the amplified region of NifH scored 8 & 9, while 9% scored 6 & 7. Important functional areas such as switch I & II, the nucleotide binding site, the intersubunit interface within the Fe protein, as well as the MoFe binding and the metalo cluster coordinating residues, were completely conserved in the stromatolites, in a similar fashion to cluster I & III (figure 1). Four positions had unique attributes in comparison to cluster I & III (figure 1, highlighted residues in bold). Position 70 (residue number according to figure 1) in the stromatolite alignment was a completely conserved Leu, though in cluster I several other hydrophobic amino acids were also present, such as Ile/Val, and in cluster III Ala/Thr were present. Position 79 was highly variable, but included an Asn residue in many of the stromatolites; Asn was absent from the cluster I and III alignments at this position. The residue variants in positions 118-119 also included Glu and Leu in the stromatolites, while in the clusters position 118 included mainly Asp, and position 119 mainly Phe/Tyr. No other unique variants were found.

We continued with an amino acid compositional analysis, as previously done for cluster I and III. Residue shifts in the amplified section of the NifH sequences, in the affiliated stromatolites of cluster III, are depicted in figure 2. 15 amino acids did not change in their composition in the conserved segments, and had similar composition values (red dots in figure 2, table 1). Nevertheless, Asp, Gly and Arg decreased in stromatolites, while Leu and Tyr increased in their respective composition (table 1). Leu had the highest standard deviation value, suggesting variation in its composition within the stromatolites (7.7 SD). In the variable segments for the clones affiliated with cluster III, there were significant ratio changes with 14 amino acids (table 2). A significant increase was observed in the composition of Asp, Glu, Phe, Gly, Lys, Pro, Gln, Arg, Val and a significant decrease was observed with Cys, Ile, Leu, Met, and Tyr. Ala, His, Asn, Ser and Thr composition did not change significantly.
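
The SD values reported in table 1 appear to correspond to the sample standard deviation of the two group means being compared (reference cluster vs. affiliated clones). This is an inference from the tabulated numbers rather than a stated method, and the short Python check below is offered only under that assumption; the variable names are illustrative, and the values are the conserved-segment means for C3 and S3 copied from table 1.

```python
from statistics import stdev

# Conserved-segment mean composition (%) from table 1, as (C3, S3) pairs.
table1_c3_s3 = {
    "Asp": (9.1, 5.5),
    "Gly": (21, 16),
    "Arg": (6.1, 2.8),
    "Leu": (6.1, 17),
    "Tyr": (1.5, 5.5),
}

# Sample standard deviation of each pair, rounded as in the table.
sd_between_groups = {
    aa: round(stdev(pair), 1) for aa, pair in table1_c3_s3.items()
}
```

Under this reading, the high SD for Leu reflects the gap between the C3 and S3 means rather than the spread among individual clones.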

 37(*)  DPKADSTRLM LHAKAQNTIM EMAAEAGTVE DLELDEVLKVG YNDVKCVES GGPEPGVGCA GRGVITAINF LEEEGAYDDD  116
        5799999694 9115191976 1256521485 66691171114 211119199 8889889899 8687977975 9611898112
117     LDFVFYDVLG DVVCGGFAMP IRENKAQEIY IVVSGEMMA  156
        2224381994 6499889678 9624996975 731985987

[Figure rows for Consensus (a), Cluster I (b) and Cluster III were not recoverable as text.]

Figure 1 Conservation of the partial region of NifH in stromatolites (N=61). (*) ConSurf conservation scores for the stromatolite alignment. A representative clone sequence was chosen, with no gaps. Residues in bold and grey background colour are unique variants, see text for details. (a) The consensus line of the stromatolites, red = completely conserved residue, purple = highly conserved residues (80% or greater), non conserved residues are shown in black. (b) Cluster I sequence and conservation, based on P00459 sequence, and cluster III sequence and conservation based on P00456 sequence.

Figure 2 The amino acids mean composition in the partial NifH sequence of cluster III (C3) and affiliated stromatolite clones (S3). Divided into variable vs. conserved regions of NifH. Error bars are SD.

Figure 3 Amino acids mean composition in the partial region of NifH from cluster I (C1) and affiliated stromatolites clones (S1), divided into variable vs. conserved regions of NifH. Error bars are SD.

Table 1 The amino acid mean composition in the conserved sections in cluster I (C1, N=58), cluster III (C3, N=32) and affiliated clones (S1=44, S3=18). Shaded cells denote high Standard Deviation values (SD).

Amino Acid    Ala    Cys    Asp    Glu    Phe    Gly    His    Ile    Lys    Leu
C1-Conserved  11     5.3    6.6    9.2    1.3    14     0.0    6.6    2.7    5.6
S1-Conserved  12     3.6    5.9    9.5    2.4    14     1.2    4.8    3.6    8.4
SD            0.71   1.2    0.49   0.21   0.78   0.0    0.85   1.3    0.64   2.0
C3-Conserved  6.1    4.6    9.1    7.6    1.5    21     0.0    5.9    3.0    6.1
S3-Conserved  4.1    4.2    5.5    8.0    1.4    16     0.0    6.9    1.3    17
SD            1.4    0.28   2.5    0.28   0.071  3.5    0.0    0.71   1.2    7.7

Amino Acid    Met    Asn    Pro    Gln    Arg    Ser    Thr    Val    Trp    Tyr
C1-Conserved  4.9    0.0    5.3    2.6    3.9    2.7    3.9    11     0.0    3.9
S1-Conserved  3.50   0.0    4.8    2.4    2.4    3.6    2.4    11     0.0    4.8
SD            0.99   0.0    0.35   0.14   1.1    0.64   1.1    0.0    0.0    0.64
C3-Conserved  3.1    0.0    6.1    1.5    6.1    4.5    4.5    7.7    0.0    1.5
S3-Conserved  2.4    1.4    5.5    1.4    2.8    5.3    4.2    7.0    0.0    5.5
SD            0.49   0.99   0.42   0.071  2.3    0.57   0.21   0.49   0.0    2.8

Table 2 Amino acid compositions in cluster I (C1, N=58), cluster III (C3, N=32) and affiliated clones (S1=44, S3=18) in the variable portions of the amplified section in NifH.

Amino Acid  Ala          Cys          Asp          Glu          Phe          Gly          His          Ile          Lys          Leu
C3(a)       6.3(2.3)     2.5(0.86)    7.7(2.2)     12(3.5)      3(1.2)       5.7(2)       0.29(0.68)   6.3(2.1)     4.6(1.5)     15(3)
S3          5.5(1.6)     0.14(0.59)   11(1.9)      15(1.6)      4.4(0.77)    12(1.2)      0.41(0.91)   2.2(1)       6.2(2)       3.5(1.7)
C1          6.9(1.8)     0.81(1.5)    8.5(2.7)     11(3)        4.6(2.3)     7.5(1.8)     2.8(1.7)     5.8(2.2)     5.7(2.6)     11(2.3)
S1          0.71(2.1)    0.26(0.83)   6.4(3.1)     17(1.4)      0.71(1.7)    5.4(0.94)    0.72(1.3)    9.2(1.9)     1.1(2.2)     15(3.4)
C3:C1(b)    N            ****         N            N            ****         ****         ****         N            **           ****
S3:C3       N            ****         ****         **           ****         ****         N            ****         **           ****
S1:C1       ****         *            ***          ****         ****         ****         ****         ****         ****         ****

Amino Acid  Met          Asn          Pro          Gln          Arg          Ser          Thr          Val          Trp          Tyr
C3          4.2(1.5)     3.2(1.6)     0.35(0.73)   2.9(1.3)     2(1.3)       3.6(1.9)     4.9(2)       8.5(2.1)     0.57(0.86)   6.2(1.4)
S3          1.5(1.4)     2.9(3.1)     1.4(1.5)     4.7(1.2)     7.9(3.9)     4.2(1.6)     5.9(1.7)     11(1.9)      0(0)         0.14(0.56)
C1          3.3(1.6)     6.7(2.3)     0(0)         1.4(1.8)     2.2(2.5)     7.7(3.3)     3.2(2.9)     8(2.3)       0.16(0.58)   2.6(1.9)
S1          3.1(1.2)     8.4(1.5)     0(0)         0.065(0.43)  8.1(1.1)     7.6(2.3)     13(2.9)      3.6(2.1)     0(0)         0.067(0.44)
C3:C1       *            ****         n/a          ****         N            ****         **           N            *            ****
S3:C3       ****         N            *            ****         ****         N            N            ***          n/a          ****
S1:C1       N            ****         n/a          ****         ****         N            ****         ****         n/a          ****

(a) Mean ratios and standard deviation in parenthesis. C1-Cluster I, C3-Cluster III, S1-Stromatolites affiliated with cluster I and S3-Stromatolites clones affiliated with cluster III. (b) Two tailed t-test P value summary. Means were significantly different (P<0.05) according to unpaired t-tests with Welch’s correction for unequal variances. **** P< 0.0001 extremely significant, *** 0.0001

An increase in the Leu composition within the conserved segments was also evident in the affiliated stromatolite clones of cluster I (figure 3, table 1). All other amino acids in the conserved segments had low standard deviation values and did not vary in their composition, relative to the reference cluster. In the variable segments, significant ratio changes were found with 16 amino acids. A significant increase of Glu, Ile, Leu, Asn, Arg and Thr (table 2) and a significant decrease of Ala, Cys, Asp, Phe, Gly, His, Lys, Gln, Val and Tyr were observed. Pro and Trp were absent, and the Met and Ser compositions did not vary significantly in the clones affiliated with cluster I.

Potential structural changes to the Fe protein were assessed using 20 stromatolite NifH clones that were submitted to the I-TASSER server for model prediction. The clones formed two groups of ten, each at a maximum distance of 0.23 from the 1CP2 or 2AFH sequence (see table 3). The RMSD values and evolutionary conservation of the I-TASSER models were analysed relative to the 1CP2 or 2AFH structures.

The 10 stromatolite clones affiliated with cluster III were at a distance of 0.14-0.23 from the 1CP2 sequence and their model RMSD values, according to I-TASSER, ranged from 0.56 Å (RSA13796) to 1.05 Å (table 3). There were seven sections in the clones’ Fe protein models that showed an average RMSD value higher than 1 Å, and positions 50-51 and 113-115 presented average RMSD values higher than 2 Å, indicating a larger shift in the structural alignment compared to the 1CP2 chain A structure (table 4, figure 5). These two sections were composed of conserved and non conserved residues, were exposed to the solvent, and their predicted secondary structures included coils, bends and turns. The 10 stromatolite clones affiliated with cluster I were at a distance of 0.08-0.22 from the 2AFH sequence (table 3), and their RMSD values ranged from 0.36 Å (RSA13396) to 0.61 Å. There were three sections which showed RMSD values above 1 Å, and none had values above 2 Å, in contrast to the previous analysis of cluster III affiliated clones (figure 6). According to the analysis of 2AFH and its affiliated clones, two Gly residues showed minor structural shifts, one Gly being buried and the other exposed to the solvent, while the third section, 115-118, was mostly exposed (table 5). The secondary structure of these sections was characterised as coil, turns and bends, and they included conserved and non conserved residues, according to the ConSurf analysis.

Table 3 Stromatolite clones chosen for structural analysis.

Sequence ID  Distance(*)  C-score(a)  TM(b)   RMSD (Å)(c)  IDEN(d)
1CP2         0            2.2         0.972   0.84         1
RSA9396      0.14         1.93        0.962   0.92         0.941
RSA9696      0.14         1.93        0.963   0.87         0.952
RSA9096      0.15         1.81        0.966   0.9          0.915
RSA13796     0.16         1.93        0.974   0.56         0.929
RSA9196      0.16         1.34        0.9279  1.05         0.7
RSA98963     0.16         1.93        0.964   0.85         0.926
RSA10796     0.18         2.12        0.9804  0.73         0.68
RSA14196     0.19         1.89        0.964   0.85         0.926
RSA11996     0.21         1.91        0.963   0.89         0.918
RSA15296     0.23         1.93        0.962   0.9          0.914

2AFH         0            2.12        0.9897  0.54         1
RSA13596     0.08         2.14        0.9926  0.56         0.96
RSA13396     0.09         1.82        0.996   0.36         0.949
RSA10296     0.12         1.81        0.994   0.41         0.935
RSA11596     0.13         2.15        0.9953  0.43         0.95
RSA11496     0.17         1.79        0.989   0.61         0.916
RSA10196     0.18         1.76        0.989   0.44         0.913
RSA7904      0.19         1.85        0.99    0.43         0.916
RSA6104      0.2          1.57        0.986   0.42         0.867
RSA6704      0.2          1.85        0.989   0.48         0.912
RSA6904      0.22         1.76        0.987   0.5          0.889

(*) Distances as calculated by “PHYLIP Protdist” version 3.67 (Felsenstein, 2007).
(a) Confidence score for estimating the quality of the predicted top model by I-TASSER (Roy et al., 2010).
(b) TM-score of the structural alignment between the query structure and known structures in the PDB library (Zhang and Skolnick, 2005).
(c) The overall RMSD between residues that were structurally aligned by TM-align (Zhang and Skolnick, 2005).
(d) The percentage sequence identity in the structurally aligned region (Roy et al., 2010).

Table 4 Residue characteristics of stromatolite clones affiliated with cluster III with regions of RMSD >1 Å.

Residue positions*           41-42 50-51 62-63 65-68 89 91-93 113-115
Amino acid(a)                AD GL EE EDVE E GVG TDD
Conservation score(b)        99 97 18 7571 9 999 111
Main secondary structure(c)  bend/coil bend/coil helix 3-helix turn coil turn/bend coil/turn
Solvent accessibility(d)     ee ee ee ee-- e --- eee
Average RMSD (Å)(e)          1.245 2.252 1.316 1.142 1.053 1.412 2.328

* Position number according to 1CP2, chain A sequence, P00456 accession ID.
(a) The amino acid in each position in the respective 1CP2 sequence.
(b) ConSurf conservation scores for cluster III, 1-9, non-conserved to completely conserved, respectively.
(c) The secondary structure based on the crystallographic structure of 1CP2.
(d) Solvent accessibility - Buried (b) or exposed (e) residue, (-) varying degrees of exposure. See Table 2 for full details.
(e) Average RMSD values for specific positions, in which the clones presented RMSD values >1 Å, relative to the 1CP2 structure.


Figure 5 Two opposite angles of 1CP2 Fe protein superimposed with ten stromatolite clones I-TASSER models. Magenta highlights areas where RMSD >1 Å. The largest structural shifts, where RMSD >2 Å, and the site of the two residue insertion are in red; see table 12 for further details.

Figure 6 Two different angles of 2AFH Fe protein superimposed with its closest ten stromatolite clones I-TASSER models. Magenta highlights areas where RMSD >1 Å, as per table 13.

Table 5 Residue characteristics of stromatolite clones affiliated with cluster I with regions of RMSD >1 Å.

Residue positions*             94      96      115-118
Amino acid(a)                  G       G       YEDD
Conservation score(b)          7       9       9113
Main secondary structure(c)    turn    coil    bend & turn
Solvent accessibility(d)       e       b       eeee
Average RMSD (Å)(e)            1.379   1.028   1.016

* Position number according to 2AFH, P00459 sequence accession ID, chain E.
(a) The amino acid in each position in the respective sequence.
(b) ConSurf conservation scores for cluster I, 1-9, non-conserved to completely conserved, respectively.
(c) The secondary structure based on the crystallographic structure of 2AFH.
(d) Solvent accessibility: buried (b) or exposed (e) residue, (-) varying degrees of exposure. See Table 4 for full details.
(e) Average RMSD values for specific positions in which the clones presented RMSD values >1 Å relative to the 2AFH structure.

According to the analysis we performed on six stromatolite clones (three for each cluster), there were 11 salt bridges common to the stromatolite clones and 1CP2 or 2AFH (table 14), and 15 unique salt bridges that were not detected in 1CP2 or 2AFH (table 6). Salt bridges were defined by an enforced interatomic distance limit of 4 Å between the side chain oxygen atoms of Asp or Glu and the side chain nitrogen atoms of Arg, Lys or His. Two unique salt bridges were highly conserved in S3 and S1, at positions Asp42-Arg45 and Asp128-Lys9 (residue numbering according to S3, underlined in table 6). The additional negative residues that sometimes appear in the region of 113-115 in stromatolites (tables 4 & 5) seem to strengthen ionic bonds, mainly but not only with Lys32 and Lys84. Salt bridges in S3 corresponding to this region were detected in our analysis, but their interatomic distances ranged from 4.5 to 6.99 Å, and they were therefore not listed in table 6. These salt bridges included Asp residues which interacted mainly with Lys30 and Met1, residues which scored 1 and 9 for conservation, respectively, in cluster III. They also included Lys residues in this region that interacted with Glu113, but at a distance of 6.52 Å, and these were therefore also not listed in table 6.
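The distance criterion described above can be illustrated with a minimal Python sketch. It is not the WHAT IF procedure used in this work; it only applies the stated rule (side chain oxygen of Asp or Glu within 4 Å of a side chain nitrogen of Arg, Lys or His). The atom names follow standard PDB naming, while the function name, the input tuple layout and the toy coordinates are assumptions introduced for illustration.

```python
import math

# Side chain atoms considered for each residue type (standard PDB atom names).
ACID_OXYGENS = {"ASP": ("OD1", "OD2"), "GLU": ("OE1", "OE2")}
BASE_NITROGENS = {"ARG": ("NE", "NH1", "NH2"), "LYS": ("NZ",), "HIS": ("ND1", "NE2")}


def find_salt_bridges(atoms, cutoff=4.0):
    """Return (acid, acid_pos, base, base_pos, distance) for each residue pair
    whose closest side chain O...N distance is within the cutoff (in Å).

    `atoms` is a list of (residue_name, residue_number, atom_name, (x, y, z)).
    """
    acids = [a for a in atoms if a[2] in ACID_OXYGENS.get(a[0], ())]
    bases = [a for a in atoms if a[2] in BASE_NITROGENS.get(a[0], ())]
    closest = {}
    for acid_name, acid_pos, _, acid_xyz in acids:
        for base_name, base_pos, _, base_xyz in bases:
            distance = math.dist(acid_xyz, base_xyz)
            if distance <= cutoff:
                pair = (acid_name, acid_pos, base_name, base_pos)
                # Keep only the shortest O...N distance per residue pair.
                closest[pair] = min(distance, closest.get(pair, distance))
    return sorted(pair + (round(d, 2),) for pair, d in closest.items())


# Hypothetical toy coordinates, not taken from 1CP2 or 2AFH.
toy_atoms = [
    ("ASP", 38, "OD1", (0.0, 0.0, 0.0)),
    ("ASP", 38, "OD2", (1.0, 0.0, 0.0)),
    ("LYS", 14, "NZ", (3.0, 0.0, 0.0)),
    ("GLU", 75, "OE1", (10.0, 0.0, 0.0)),
    ("ARG", 81, "NH1", (15.0, 0.0, 0.0)),
]
bridges = find_salt_bridges(toy_atoms)
```

Raising `cutoff` above 4 Å would also report the longer-range pairs that were detected in the analysis but excluded from table 6.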


Table 6 Potential salt bridges, with maximum interatomic distance of 4 Å, in the amplified NifH region of the Fe protein. Shaded rows represent common salt bridges present in the representative clones and the selected structure, 1CP2 or 2AFH.

            Residue   Position(b)   Residue   Position   Distance (Å)   Conservation scores(c)
1CP2(a)
            ASP       38            LYS       14         3.07           9,9
            GLU       62*           LYS       54         2.77           1,6
            GLU       75            ARG       81         3.84           1,1
            GLU       107           LYS       140        3.96           9,9
            ASP       115           ARG       81         3.22           1,1
            ASP       122*          LYS       14         2.95           9,9
            GLU       143           ARG       2          3.58           9,9

S3(d)
            ASP       42            ARG       45         3.66           9,8
            GLU       62            ARG       54         3.61           9,1
            ASP       71            ARG       54         3.89           7,1
            GLU       89            ARG       61         3.65           9,9
            GLU       107           LYS       142        2.71           8,8
            ASP       128           LYS       9          3.97           8,-
            GLU       145           ARG       2          3.32           8,-

2AFH(e)
            ASP       39            LYS       15         3.09           9,9
E↔F         GLU       92            LYS       170        2.8            9,9
            GLU       110           LYS       143        3.24           9,8
            ASP       118           LYS       32         3.38           3,7
            ASP       125           LYS       15         3.14           9,9
            ASP       129           LYS       41         2.74           9,9
            GLU       141           ARG       140        2.92           6,9
            GLU       146           ARG       3          2.88           9,9
            GLU       154           LYS       10         3.97           9,9
            GLU       229           HIS       50         2.53           2,6
E→F         GLU       265           LYS       52         3.68           9,9
F→E         GLU       277           LYS       52         2.86           -,9

S1(f)
            GLU       28            ARG       81         2.7            -,1
            ASP       39            LYS       15         2.73           9,-
            ASP       44            ARG       47         2.71           9,9
            ASP       70            ARG       65         2.66           8,1
            GLU       74            LYS       77         2.68           7,1
            GLU       92            ARG       100        2.71           9,8
            GLU       110           LYS       143        2.73           9,9
            ASP       116           LYS       32         2.74           7,-
            ASP       116           LYS       84         2.62           1,7
            ASP       118           LYS       32         2.68           7,-
            ASP       120           LYS       31         2.73           7,-
            ASP       125           LYS       15         3.77           8,-
            ASP       129           LYS       10         2.73           9,-
            ASP       129           LYS       41         2.71           9,9
            GLU       141           ARG       140        2.7            8,9
            GLU       146           ARG       3          2.65           9,-
            GLU       154           ARG       187        2.74           8,-
            GLU       221           ARG       46         2.69           -,9
            GLU       229           HIS       50         2.75           -,9

(a) 1CP2 analysis by WHAT IF; salt bridges were not detected between the Fe protein subunits A & B.
(b) Positioning was manually corrected for minor shifts per alignment.
(c) Conservation score was based on the individual analysis of ConSurf on cluster I, affiliated stromatolite clones (S1), cluster III and stromatolite clones affiliated with cluster III (S3). Scores ranged from 1 to 9, non-conserved to completely conserved, respectively. A "-" indicates that the score was not calculated.
(d) Based on the WHAT IF analysis of the I-TASSER PDB files of cluster III stromatolite clones: RSA14196, RSA11996 and RSA98963.
(e) 2AFH analysis by WHAT IF; salt bridges were detected between subunits E & F, and are designated where relevant.
(f) Based on the WHAT IF analysis of the I-TASSER PDB files of cluster I stromatolite clones: RSA13596, RSA7904 and RSA6904.
* A yellow background colour denotes a residue in an α-helix structure, and a green colour denotes a residue within a β-sheet. No background colour means random coil or unknown structure.

6.3.2 Potential thermophilic adaptations in the Fe protein

The conservation analysis of the Paralana Hot Springs (PHS) clones revealed that 54% of the amplified region sequence scored 8 & 9, while 13% scored 6 & 7. Several sections were completely conserved: the nucleotide binding site, the intersubunit interface within the Fe protein, the MoFe binding residues, the metallocluster and the two switch regions. Positions 80 and 116 had additional variants in comparison to clusters I & III (figure 7, highlighted residues in bold). These positions were highly variable in both PHS and the original clusters; however, several PHS sequences included a Cys at position 80 and several included a Lys at position 116, and neither variant was present at these positions in the original clusters. Position N106 was completely conserved throughout the PHS alignment, though not in the original clusters.

A statistical analysis of the amino acid composition revealed significant shifts in the variable segments of the NifH sequences of PHS clones affiliated with cluster III. There were significant ratio changes in 10 amino acids: a significant increase in Asp, Phe, Pro, Arg and Val, and a significant decrease in Ala, Glu, Leu, Ser and Tyr (figure 8, table 7). The composition of Cys, Gly, His, Ile, Lys, Met, Asn, Gln and Thr did not change significantly. In the conserved segment, Gly, Leu and Tyr content varied, with SD values of 2.1, 2.8 and 2.3, respectively (table 8), while SD values for the other amino acids ranged from 0.0 to 1.2. PHS clones affiliated with cluster I showed a significant increase in the content of Cys, Ile, Leu and Arg in the variable section (figure 9, table 7, P1:C1). Ala, Glu, Gly, Lys, Asn and Gln content decreased significantly, while eight other amino acids did not change significantly. In the conserved segment, Glu content increased in P1 clones (SD = 2.7, table 8) compared to cluster I, while SD for the other amino acids remained low and ranged from 0 to 1.4.
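As an illustration of how per-group amino acid compositions such as those in table 7 can be derived, the following Python sketch computes the percentage of each amino acid per sequence and then the mean and SD across a group. It is a sketch under stated assumptions rather than the procedure used in this work: the function names and the toy sequences are hypothetical, gap characters are simply skipped, and the split into variable and conserved segments is assumed to have been done beforehand.

```python
from collections import Counter
from statistics import mean, stdev

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def composition(sequence):
    """Percentage of each of the 20 amino acids in one sequence.

    Gap characters and any other symbols are ignored.
    """
    residues = [r for r in sequence.upper() if r in AMINO_ACIDS]
    if not residues:
        raise ValueError("sequence contains no amino acid residues")
    counts = Counter(residues)
    return {aa: 100.0 * counts[aa] / len(residues) for aa in AMINO_ACIDS}


def group_summary(sequences):
    """Mean and sample SD of each amino acid percentage across a group.

    Requires at least two sequences, since a sample SD is computed.
    """
    per_sequence = [composition(s) for s in sequences]
    return {
        aa: (mean(c[aa] for c in per_sequence), stdev(c[aa] for c in per_sequence))
        for aa in AMINO_ACIDS
    }


# Hypothetical toy segments, not taken from the PHS alignment.
toy_group = ["DDDD", "DDEE"]
asp_mean, asp_sd = group_summary(toy_group)["D"]
```

Each group (for example C3 and P3) would be summarised separately, and the resulting means and SDs compared between groups.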


Positions 38-116
Score(*)   9469999994 9136691936 1466339199 94663331119 111119599 9989899999 9993795991 991199911-
Sequence   DPKADSTRLI LHSKAQNTIM EMAAEAGTVE DLELEDVLKVG YGGIKCVES GGPEPGVGCA GRGVITAINF LEEEGAYED-

Positions 117-157
Score(*)   -11915196 9699979999 9989611991 9799719999999
Sequence   -DLDFVFYD VLGDVVCGGF AMPIRENKAQ EIYIVCSGEMMAL

[Rows not recovered from the original figure: Consensus (a); Cluster I (b); Cluster III]

Figure 7 Conservation of the amplified region of NifH in PHS clones (N=36). (*) ConSurf conservation score for the PHS alignment. A representative clone sequence was chosen, with no gaps. (a) The consensus line of the stromatolites, red = completely conserved residue, purple = highly conserved residues (80% or greater), non conserved residues are shown in black. (b) Cluster I sequence and conservation, based on P00459 sequence, and cluster III sequence and conservation based on P00456 sequence.


Figure 8 The amino acids mean composition in the amplified NifH amino sequence of cluster III (C3) and affiliated PHS clones (P3). Divided into variable vs. conserved regions of NifH. Error bars are SD.

Figure 9 The amino acids mean composition in the amplified NifH amino sequence of cluster I (C1) and affiliated PHS clones (P1). Divided into variable vs. conserved regions of NifH. Error bars are SD.


Table 7 Amino acid compositions in cluster I (C1, N=58), cluster III (C3, N=32), and affiliated clones (P1, N=20; P3, N=16) in the variable sections of NifH amino acid sequences.

Amino Acid   Ala          Cys          Asp           Glu          Phe          Gly          His          Ile           Lys          Leu
C3(a)        6.3(2.3)     2.5(0.86)    7.7(2.2)      12(3.5)      3(1.2)       5.7(2)       0.29(0.68)   6.3(2.1)      4.6(1.5)     15(3)
P3           4.91(1.66)   1.74(1.40)   10.62(2.70)   8.44(3.68)   4.51(1.31)   7.30(4.20)   0.17(0.69)   6.60(1.92)    6.38(3.80)   12.74(2.28)
C1           6.9(1.8)     0.81(1.5)    8.5(2.7)      11(3)        4.6(2.3)     7.5(1.8)     2.8(1.7)     5.8(2.2)      5.7(2.6)     11(2.3)
P1           5.88(1.03)   2.72(1.11)   8.45(3.58)    6.16(3.52)   3.71(2.56)   5.62(1.53)   2.17(1.29)   10.93(2.72)   4.30(2.12)   13.42(4.31)
C3:C1(b)     N            ****         N             N            ****         ****         ****         N             **           ****
P3:C3        *            N            **            **           ***          N            N            N             N            **
P1:C1        **           ****         N             ****         N            ****         N            ****          *            *

Amino Acid   Met          Asn          Pro           Gln          Arg          Ser          Thr          Val           Trp          Tyr
C3           4.2(1.5)     3.2(1.6)     0.35(0.73)    2.9(1.3)     2(1.3)       3.6(1.9)     4.9(2)       8.5(2.1)      0.57(0.86)   6.2(1.4)
P3           4.73(1.40)   2.21(3.40)   3.95(2.20)    3.25(2.17)   3.40(2.44)   1.52(2.21)   4.50(1.60)   11.94(3.62)   0(0)         1.09(1.46)
C1           3.3(1.6)     6.7(2.3)     0(0)          1.4(1.8)     2.2(2.5)     7.7(3.3)     3.2(2.9)     8(2.3)        0.16(0.58)   2.6(1.9)
P1           4.02(1.41)   3.90(1.45)   0.14(0.62)    0.15(0.68)   6.95(3.19)   6.96(3.19)   4.03(2.39)   7.74(3.42)    0(0)         2.75(1.75)
C3:C1        *            ****         n/a           ****         N            ****         **           N             *            ****
P3:C3        N            N            ****          N            *            **           N            **            n/a          ****
P1:C1        N            ****         n/a           ****         ****         N            N            N             n/a          N

(a) Mean ratios and standard deviation in parenthesis. C1-Cluster I, C3-Cluster III, P1-PHS clones affiliated with cluster I, P3-PHS clones affiliated with cluster III.
(b) Two tailed t-test P value summary. Means are significantly different (P<0.05) according to unpaired t-tests with Welch's correction for unequal variances. **** P< 0.0001 extremely significant, *** 0.0001
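The unpaired t-test with Welch's correction named in the table footnote can be sketched from the summary statistics alone. The Python example below computes the Welch t statistic and the Welch-Satterthwaite degrees of freedom; the function name is an assumption, and the worked input is the Asp row of table 7 (P3 versus C3). Converting t and the degrees of freedom into a P value requires the t distribution from a statistics package and is not shown.

```python
import math


def welch_t(mean1, sd1, n1, mean2, sd2, n2):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom,
    computed from group means, sample SDs and group sizes."""
    var_term1 = sd1 ** 2 / n1
    var_term2 = sd2 ** 2 / n2
    t = (mean1 - mean2) / math.sqrt(var_term1 + var_term2)
    df = (var_term1 + var_term2) ** 2 / (
        var_term1 ** 2 / (n1 - 1) + var_term2 ** 2 / (n2 - 1)
    )
    return t, df


# Asp in the variable section: P3 = 10.62 (SD 2.70, N=16), C3 = 7.7 (SD 2.2, N=32).
t_asp, df_asp = welch_t(10.62, 2.70, 16, 7.7, 2.2, 32)
```

Because the two groups differ in both size and variance, the degrees of freedom fall below the pooled value of N1 + N2 - 2, which is the reason Welch's correction is preferred over the pooled-variance test here.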

Table 8 The amino acid mean composition in the conserved sections of cluster I (C1, N=58), cluster III (C3, N=32) and affiliated clones (P1, N=20; P3, N=16). Shaded cells denote high Standard Deviation (SD) values.

Amino Acid     Ala    Cys    Asp    Glu    Phe    Gly    His    Ile    Lys    Leu
C1-Conserved   11     5.3    6.6    9.2    1.3    14     0.0    6.6    2.7    5.6
P1-Conserved   11     3.6    5.9    13     2.4    16     1.2    4.7    2.4    6.4
SD             0.0    1.2    0.49   2.7    0.78   1.4    0.85   1.3    0.21   0.57
C3-Conserved   6.1    4.6    9.1    7.6    1.5    21     0.0    5.9    3.0    6.1
P3-Conserved   7.2    4.9    8.2    8.5    1.3    18     0.0    4.7    3.6    10
SD             0.78   0.21   0.64   0.64   0.14   2.1    0.0    0.85   0.42   2.8

Amino Acid     Met    Asn    Pro    Gln    Arg    Ser     Thr    Val    Trp    Tyr
C1-Conserved   4.9    0.0    5.3    2.6    3.9    2.7     3.9    11     0.0    3.9
P1-Conserved   3.2    1.2    4.7    2.4    2.4    3.4     3.6    9.7    0.0    3.5
SD             1.2    0.85   0.42   0.14   1.1    0.49    0.21   0.92   0.0    0.28
C3-Conserved   3.1    0.0    6.1    1.5    6.1    4.5     4.5    7.7    0.0    1.5
P3-Conserved   2.4    1.3    3.8    1.3    4.7    4.6     4.8    6.0    0.0    4.8
SD             0.49   0.92   1.6    0.14   0.99   0.071   0.21   1.2    0.0    2.3


To find out how changes in amino acid content might have influenced the Fe protein structure (if at all), 18 PHS clones, nine from each cluster, were submitted to the I-TASSER process to create structural models (table 9). The distance of the nine PHS clones affiliated with cluster III from the 1CP2 sequence ranged from 0.13 (RSA207) to 0.16, and the RMSD values of their models ranged from 0.45 (RSA227) to 0.89 Å, according to the I-TASSER results.

There were seven sections in the Fe proteins of the P3 clones with an average RMSD value higher than 1 Å, and positions 50-51 and 112-115 presented average RMSD values higher than 2 Å (figure 10, table 10). These two sections were exposed to the solvent and composed of conserved and non conserved residues. Their predicted secondary structures included coils, bends and turns. The distance of the nine PHS clones affiliated with cluster I from the 2AFH sequence ranged from 0.02 (RSA173) to 0.21, and the RMSD values of their I-TASSER models ranged from 0.34 (RSA159) to 0.56 Å (table 9). The only section which varied structurally, 108-113, included conserved and non conserved residues, most of which were exposed to the solvent (figure 11, table 11). The predicted structure included parts of a helix and a turn.
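The per-position comparison described above can be sketched as follows. The Python example assumes that the model and the reference structure have already been superimposed and structurally aligned (for instance by TM-align), and that one representative coordinate per aligned residue is available; it performs no superposition itself. The function names, the choice of one coordinate per residue and the toy coordinates are assumptions made for illustration.

```python
import math


def per_residue_deviation(reference, model):
    """Distance (Å) between each pair of aligned residue coordinates."""
    if len(reference) != len(model):
        raise ValueError("coordinate lists must have the same length")
    return [math.dist(a, b) for a, b in zip(reference, model)]


def rmsd(reference, model):
    """Overall RMSD (Å) over all aligned residues."""
    deviations = per_residue_deviation(reference, model)
    return math.sqrt(sum(d * d for d in deviations) / len(deviations))


def shifted_positions(reference, model, first_position=1, threshold=1.0):
    """Residue positions whose deviation exceeds the threshold (Å)."""
    deviations = per_residue_deviation(reference, model)
    return [first_position + i for i, d in enumerate(deviations) if d > threshold]


# Hypothetical toy coordinates for three aligned residues starting at position 50.
toy_reference = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
toy_model = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 2.0, 0.0)]
overall = rmsd(toy_reference, toy_model)
shifted = shifted_positions(toy_reference, toy_model, first_position=50)
```

Averaging the per-residue deviations of a given position across several clone models would give position-wise averages of the kind reported in tables 10 and 11.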

Table 9 PHS clones chosen for structural analysis.

Sequence ID   Distance(e)   C-score(a)   TM(b)    RMSD (Å)(c)   IDEN(d)
1CP2          0             1.91         0.972    0.84          1
RSA207Par09   0.13          1.85         0.983    0.51          0.938
RSA165Par09   0.14          1.84         0.969    0.88          0.938
RSA215Par09   0.14          1.85         0.968    0.9           0.933
RSA228Par09   0.14          2.14         0.9815   0.7           0.67
RSA195Par09   0.15          1.85         0.967    0.9           0.929
RSA158Par09   0.15          1.84         0.976    0.89          0.933
RSA226Par09   0.15          2.13         0.9833   0.63          0.69
RSA219Par09   0.16          1.85         0.985    0.46          0.924
RSA227Par09   0.16          1.82         0.989    0.45          0.915

2AFH          0             2.12         0.9897   0.54          1
RSA173Par0    0.02          1.84         0.994    0.43          0.841
RSA159Par0    0.03          1.86         0.992    0.34          0.987
RSA194Par0    0.09          1.86         0.989    0.48          0.954
RSA208Par0    0.11          1.87         0.946    0.41          0.996
RSA192Par0    0.12          1.87         0.99     0.44          0.946
RSA203Par0    0.17          1.83         0.988    0.52          0.916
RSA221Par0    0.17          1.85         0.989    0.48          0.924
RSA213Par0    0.18          1.85         0.992    0.38          0.916
RSA163Par0    0.21          1.85         0.987    0.56          0.895

(a) Confidence score for estimating the quality of the predicted top model by I-TASSER (Roy et al., 2010).
(b) TM-score of the structural alignment between the query structure and known structures in the PDB library (Zhang and Skolnick, 2005).
(c) The overall RMSD between residues that were structurally aligned by TM-align (Zhang and Skolnick, 2005).
(d) The percentage sequence identity in the structurally aligned region (Roy et al., 2010).
(e) Distances as calculated by PHYLIP Protdist version 3.67 (Felsenstein, 2007).

Table 10 Residue characteristics of PHS clones affiliated with cluster III with regions of RMSD >1 Å.

Residue positions*             50-51       62-63   65-67               89     91-93       112-115     152-153
Amino acid(a)                  GL          EE      EDV                 E      GVG         YTDD        MM
Conservation score(b)          97          18      757                 9      999         6111        79
Main secondary structure(c)    bend/coil   helix   3-helix-turn/coil   coil   turn/bend   coil/turn   helix
Solvent accessibility(d)       ee          ee      ee-                 e      ---         eeee        eb
Average RMSD (Å)(e)            2.32        1.23    1.28                1.24   1.37        2.3         1.58

* Position number according to 1CP2, chain A sequence, P00456 accession ID.
(a) The amino acid in each position in the respective 1CP2 sequence.
(b) ConSurf conservation scores for cluster III, 1-9, non-conserved to completely conserved, respectively.
(c) The secondary structure based on the crystallographic structure of 1CP2.
(d) Solvent accessibility: buried (b) or exposed (e) residue, (-) varying degrees of exposure. See Table 2 for full details.
(e) Average RMSD values for specific positions in which the clones presented RMSD values >1 Å relative to the 1CP2 structure.

Table 11 Residue characteristics of PHS clones affiliated with cluster I with regions of RMSD >1 Å.

Residue positions*             108-113
Amino acid(a)                  FLEEEG
Conservation score(b)          899927
Main secondary structure(c)    helix & turn
Solvent accessibility(d)       bbeeee
Average RMSD (Å)(e)            1.7

* Position number according to 2AFH, P00459 sequence accession ID, chain E.
(a) The amino acid in each position in the respective sequence.
(b) ConSurf conservation scores for cluster I, 1-9, non-conserved to completely conserved, respectively.
(c) The secondary structure based on the crystallographic structure of 2AFH.
(d) Solvent accessibility: buried (b) or exposed (e) residue, (-) varying degrees of exposure. See Table 4 for full details.
(e) Average RMSD values for specific positions in which the clones presented RMSD values >1 Å relative to the 2AFH structure.


Figure 10 Two opposite angles of 1CP2 Fe protein superimposed with PHS clones I-TASSER models. Magenta highlights areas where RMSD >1 Å. The largest structural shifts, where RMSD >2 Å and the site of the two residue insertion are in red, see table 18.

Figure 11 2AFH Fe protein superimposed with its closest PHS clones I-TASSER models. Magenta highlights areas where RMSD >1 Å, as per table 19.

According to the analysis we performed on six PHS clones (three for each cluster), there were seven salt bridges common to the PHS clones and 1CP2 or 2AFH, and 12 unique salt bridges which were not detected in 1CP2 or 2AFH (Table 12). Two unique salt bridges were highly conserved in P3 and P1, at positions Asp42-Arg45 and Glu151-Arg184 (P3 residue numbering, underlined in Table 12). The Asp or Glu residues and most of their positive partners were highly conserved in the unique salt bridges. The additional negative residues in the region of 112-115 (cluster III, table 10) seem to strengthen ionic bonds, mainly but not only with Lys33 and Met1. Salt bridges in P3 corresponding to this region were detected in our analysis, but their interatomic distances ranged from 4.1 to 6.88 Å and they were therefore not listed in Table 12. The negative residues interacted mainly with Arg81, Lys113 and His26, all of which scored 1 for conservation in cluster III.


Table 12 Potential salt bridges, with maximum interatomic distance of 4 Å, in the amplified NifH region of the Fe protein. Shaded rows represent common salt bridges present in the representative clones and the selected structure, 1CP2 or 2AFH.

            Residue   Position(b)   Residue   Position   Distance (Å)   Conservation scores(c)
1CP2(a)
            ASP       38            LYS       14         3.07           9,9
            GLU       62*           LYS       54         2.77           1,6
            GLU       75            ARG       81         3.84           1,1
            GLU       107           LYS       140        3.96           9,9
            ASP       115           ARG       81         3.22           1,1
            ASP       122*          LYS       14         2.95           9,9
            GLU       143           ARG       2          3.58           9,9

P3(d)
            ASP       42            ARG       45         2.65           9,9
            ASP       58            LYS       54         3.3            9,9
            ASP       58            ARG       61         3.25           9,9
            ASP       62            LYS       54         3.61           7,9
            GLU       89            ARG       61         3.19           9,9
            GLU       107           LYS       140        2.73           9,9
            ASP       126           LYS       40         2.84           9,9
            GLU       145           ARG       2          3.22           9,-
            GLU       151           ARG       184        3.04           9,-

2AFH(e)
            ASP       39            LYS       15         3.09           9,9
E↔F         GLU       92            LYS       170        2.8            9,9
            GLU       110           LYS       143        3.24           9,8
            ASP       118           LYS       32         3.38           3,7
            ASP       125           LYS       15         3.14           9,9
            ASP       129           LYS       41         2.74           9,9
            GLU       141           ARG       140        2.92           6,9
            GLU       146           ARG       3          2.88           9,9
            GLU       154           LYS       10         3.97           9,9
            GLU       229           HIS       50         2.53           2,6
E→F         GLU       265           LYS       52         3.68           9,9
F→E         GLU       277           LYS       52         2.86           -,9

P1(f)
            GLU       29            ARG       82         3.7            -,1
            ASP       44            ARG       47         2.84           9,9
            ASP       70            ARG       65         2.65           9,2
            GLU       111           LYS       144        2.82           9,9
            ASP       117           LYS       33         3.97           7,-
            GLU       118           MET       1          2.77           7,-
            ASP       119           LYS       33         3.25           7,-
            ASP       126           LYS       16         3.82           8,-
            GLU       147           ARG       4          2.66           9,-
            GLU       155           ARG       188        2.96           9,-

(a) 1CP2 analysis by WHAT IF; salt bridges were not detected between the Fe protein subunits A & B.
(b) Positioning was manually corrected for minor shifts per alignment.
(c) Conservation score was based on the individual analysis of ConSurf on cluster I and its affiliated clones (P1), cluster III and PHS clones affiliated with cluster III (P3). Scores ranged from 1 to 9, non-conserved to completely conserved, respectively. A "-" indicates that the score was not calculated.
(d) Based on the WHAT IF analysis of the I-TASSER PDB files of cluster III PHS clones: RSA227Par09, RSA158Par09 and RSA207Par09.
(e) 2AFH analysis by WHAT IF; salt bridges were detected between subunits E & F, and are designated where relevant.
(f) Based on the WHAT IF analysis of the I-TASSER PDB files of cluster I PHS clones: RSA163Par09, RSA208Par09 and RSA194Par09.
* Yellow background colour denotes a residue in an α-helix structure, and green colour denotes a residue within a β-sheet. No background colour means random coil or unknown structure.


6.4 Discussion

6.4.1 Halophilic adaptations

The stromatolite inferred NifH sequences were subjected to analyses of conservation patterns, amino acids composition and structural shifts, in comparison to cluster I and cluster III.

According to our analyses, most of the amplified region was conserved in the stromatolite clones, although a smaller portion of the alignment was completely conserved than in the original clusters: 70% and 66% of cluster I and cluster III positions in the same region, respectively, scored 8 & 9, compared with 50% in the stromatolite alignment. Highly variable positions, with scores lower than 6, were also more prevalent in the sequence alignment of the columnar stromatolites (41%) than in clusters I and III (21% each). This was expected, as the stromatolite multiple alignment combined clones affiliated with both clusters and therefore included more variable residues per position.
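The percentages quoted above summarise how the per-position ConSurf scores are distributed across score classes. A minimal Python sketch of that bookkeeping is given below; the function name and the toy score string are assumptions, and positions without a calculated score (written as "-") are skipped.

```python
def score_fractions(scores):
    """Percentage of positions in each ConSurf score class.

    `scores` is a string of per-position scores (1-9); any non-digit
    character, such as "-" for positions without a score, is ignored.
    """
    values = [int(ch) for ch in scores if ch.isdigit()]
    if not values:
        raise ValueError("no ConSurf scores found")
    total = len(values)
    high = sum(1 for v in values if v >= 8)        # scores 8 & 9
    medium = sum(1 for v in values if 6 <= v <= 7)  # scores 6 & 7
    low = sum(1 for v in values if v < 6)           # highly variable positions
    return {
        "8-9": 100.0 * high / total,
        "6-7": 100.0 * medium / total,
        "<6": 100.0 * low / total,
    }


# Hypothetical toy score string, not a full alignment.
toy_fractions = score_fractions("9469999994-")
```

Applying the same function to the score strings of two alignments allows their conserved and variable fractions to be compared directly.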

Two characteristics were checked in order to find whether the stromatolite sequences, as a group, had a pattern regardless of their cluster affiliation: firstly, whether the completely conserved sections matched the corresponding segments in cluster I or III, and secondly, whether the variability of residues per position was within the known variants we had previously found for clusters I & III. Completely conserved sections did match the corresponding segments in cluster I and cluster III and included the important functional regions; in this regard they did not provide any new information. With regard to the second point, 16 positions out of the 119 residues (on average) of the partial gene region met our second criterion and included residue variants in the stromatolite clones that differed from those present in cluster I or III. However, 13 variants were present in only one sequence (RSA152) out of the whole set, and were therefore discarded from further analysis and discussion. Four positions were found to include patterns unique to the stromatolites as a group. They suggested a bias towards Leu and Asn, and conservation of function over structure.

The amino acid composition analysis enabled us to detect shifts relative to cluster I or cluster III. In the clones affiliated with cluster III, the slight decrease in the charged amino acids Arg and Asp in the conserved section could be interpreted as an adaptive strategy to minimise interference from salt ions within the core of the protein, and the addition of hydrophobic elements such as Tyr and Leu would minimise the surface area accessible to the solvent within the important functional buried sites of the protein (Moret and Zebende, 2007). Additionally, in the variable section there was an increase in positively and negatively charged amino acids (Asp, Glu, Arg, Lys), an increase in small amino acids (Gly, Pro) and a small hydrophobic amino acid (Val), as well as a decrease in bulkier hydrophobic amino acids (Ile, Leu, Met, Tyr, Cys). Additional findings included an increase of Leu in the conserved section of stromatolites regardless of cluster affiliation (table 1). The common finding in the variable sections was that the composition of five amino acids varied significantly in the stromatolites, but not in the reference clusters (table 2). Interestingly, Glu and Arg content increased in stromatolite clones affiliated with cluster I (S1) and cluster III (S3), yet Asp, Ile and Val did not share a joint trend for S1 and S3, and displayed different shifts (table 2): Asp and Val decreased in S1 and increased in S3, while Ile increased in S1 and decreased in S3. Tyr and Cys decreased in S1 and S3, and the most prevalent amino acid was Glu (17/15 for S1/S3, respectively, table 2). The interplay previously observed in clusters I & III between Ala and Gly was not present in the stromatolites.

Metagenomic studies of halophilic bacteria, Archaea and also of total bacterial DNA from saline and hypersaline environmental samples, revealed several re-occurring genomic themes, though not all of them were absolute for all protein families across all halophiles (Fukuchi et al., 2003; Paul et al., 2008; Rhodes et al., 2010).

The shift in hydrophobic residues has been partially attributed to the GC-rich DNA that halophiles possess, which affects codon usage (Paul et al., 2008). Rao and Argos (1981) reported that in chloroplast-type 2Fe-2S ferredoxins from two halophiles, large hydrophobic and aliphatic residues such as Ile, Leu, Phe and Met were replaced by smaller residues (Ala, Gly and Val) to reduce overall protein bulkiness and promote a tight configuration which is less accessible to the solvent. The overall hydrophobicity remained relatively unchanged in comparison to the non halophilic proteins (Rao and Argos, 1981). The increase in charged amino acid frequency has been reported as a halophilic mechanism, for instance to produce an excess of negative charges which act as a charged screen against salt ions, attract water molecules and enable the protein to remain active in saline conditions up to 4 M NaCl. According to Lanyi (1974), there is also an excess of small amino acids with short side chains (Gly, Ala). Fukuchi et al. (2003) performed a statistical analysis on 126 proteins from Halobacterium sp. NRC-1 and three other halophiles and found an abundance of acidic residues on the external surface of halophilic proteins vs. non halophiles, while the internal composition did not change significantly.

Paul et al. (2008) presented data from which it was concluded that halophilic proteins, in general, are less hydrophobic than non halophilic proteins. Similar findings were reported from a statistical review of 26 halophilic enzymes by Madern et al. (1995), with the additional finding of lower Lys content (a feature which was also mentioned by Eisenberg (1995)). However, it should be noted that some analyses have shown that the composition of Arg and Lys is dictated solely by the G+C content of the DNA, and has nothing to do with their charge or other biochemical properties (Cambillau and Claverie, 2000).

In order to achieve a reliable structural analysis, we chose the 10 stromatolite clone sequences that were relatively close, distance-wise, to the 1CP2/P00456 or 2AFH/P00459 sequences. The minimum distance of a stromatolite clone from 1CP2 was 0.14 in our dataset (table 3). The maximum distance was 0.23, and we therefore expected that some structural changes would be evident. The highest RMSD value for a clone model with cluster I was 0.61 Å (RSA114), while the highest value for a clone model of cluster III was 1.05 Å (RSA9196), indicative of the uncertainties in modelling a cluster III Fe protein when I-TASSER does not have a robust number of resolved cluster III structures to rely on. Our analysis suggested that two sections participated in structural shifts in the cluster III affiliated clones. These two sections included a residue involved in the Fe protein dimer interface (Leu51) and a hydrogen bonding partner of water molecules (Thr113; Schlessman et al., 1998).

The region of 112-115 in the S3 clones was always elongated by two charged residues. The two main forms in this region were AESEE or EEDKK in S3 clones, which formed unfavourable salt bridges (interatomic distance >4 Å, table 6). A quick glance at the cluster III alignment (figure 2, section 5.3.1, chapter 5) revealed that an insertion of two residues tends to occur in these positions, and the specific KK or EE type of insertion was also present in the NifH sequences of Desulfovibrio magneticus strain ATCC 700980 (NIFH_DESMR_1_271 sequence ID), Desulfovibrio gigas (NIFH_DESGI_1_271) and Desulfatibacillum alkenivorans strain AK-01 (NIFH_DESAA_1_271). Therefore, while the insertion of the charged residues was not unique or endemic to the stromatolite group, it was definitely a stabilising adaptation for saline conditions, as these specific species are known to withstand saline conditions (Cravo-Laureau et al., 2004; Garrity et al., 2005). An increase in salt bridges was reported within monomers and at the intersubunit interfaces of halophilic proteins, as a stabilising mechanism (Eisenberg, 1995; Madern et al., 2000). These studies suggested that the salt bridges were formed either by an Arg residue which interacted with the acidic residues, or by solvent ions such as chloride and sodium to which the salt bridges were bound. Our analysis of potential salt bridges revealed that while the Asp or Glu residues were always highly conserved in S3 or S1 in general, their positive partners were sometimes highly variable. The mechanism described by Madern et al. (2000) would therefore fit the conservation we see of acidic residues in S3 or S1, and would allow for flexible interactions with positive ions from the solvent.

Our analysis suggested that three sections were involved in structural shifts in the affiliated clones of cluster I. The Gly residues in positions 94 and 96 support a functionally important Val residue in between (V95, table 4, section 5.3.2, chapter 5), which participates both in binding to the MoFe protein and in the dimer interaction within the protein. In addition, C97, immediately after G96, coordinates the metallocluster through hydrogen bonding between the main chain amide, the sulfur atoms in the cluster and the thiol group of the Cys (NH-S bond). This loop region has previously been found to exhibit variation in conformation, and though one residue was denoted buried and the other exposed to the solvent, the entire cluster area is considered accessible to the solvent in general (Schlessman et al., 1998). The region of YEDD (115-118) is mostly non-conserved and exposed to the solvent, and position 117 sometimes included Asp or Asn in the S1 clones. Asn is a rather unusual choice for some of the stromatolites, since an insertion of D/E/S/V was also evident in 13 sequences of the cluster I alignment (figure 5, section 5.3.2, chapter 5), but Asn was never present. These 13 sequences were of NifH from Magnetococcus strain MC-1, Dechloromonas aromatica, Pectobacterium atrosepticum, Klebsiella pneumoniae, Teredinibacter turnerae T7901, Alcaligenes faecalis, Pseudomonas stutzeri, Azotobacter chroococcum and A. vinelandii. Most of the sequences in cluster I did not have an additional residue in position 117 (figure 5, section 5.3.2, chapter 5).

According to our previous analysis with the resolved crystallographic structures of 1CP2 and 2AFH, the residues corresponding to positions 51-52, 89, 93 and 113-114 also presented high RMSD values (table 5, section 5.3.3, chapter 5). Hence, we believe that a plausible explanation for the structural shifts in positions 51-52, 89, 91-93 and 113-114 in cluster III clones is the methodology used by I-TASSER. The process relies on available protein structures, which at the moment are mostly Fe proteins from A. vinelandii, and may therefore not authentically reflect potential shifts in the structure of Fe proteins from the clones. In a similar fashion, two of the three sections observed when analysing 2AFH and related stromatolite clones (table 5, figure 6) were also detected previously; they can be attributed to the I-TASSER process and may not represent authentic structural shifts. On the other hand, positions 41-42, 62-63, 65-68 and 115 (with the insertion of two additional residues) may authentically represent structural shifts in stromatolite clones affiliated with cluster III. Altogether, these findings suggest that structural shifts occur in the stromatolite Fe proteins, in addition to the possible bias introduced by the I-TASSER procedure.

In summary, based on amino acid composition and structural analysis, the overall results suggest halophilic adaptations were present in the inferred NifH sequences of the stromatolites.


6.4.2 Thermophilic adaptations

The Paralana Hot Springs NifH clones (PHS) were subjected to analyses of conservation patterns, amino acids composition and structural shifts, in comparison to cluster I and cluster III. Our analysis demonstrated that most of the amplified region was conserved in the PHS clones, yet relative to cluster I and cluster III, they were less conserved. Seventy percent and 66% of cluster I and cluster III positions in the same region, respectively, scored 8 & 9. In addition, the sequence alignment of PHS clones included more highly variable positions with scores below 6. This was expected as the PHS multiple alignment was combined from clones affiliated with both clusters; hence their multiple alignment included more variable residues per position. We also looked for unique residue variants within the PHS multiple alignment that differed from the variants of the original clusters.

In the variable region of the PHS alignment, 13 positions included amino acid variants that were not present in cluster I or III. However, except for positions 80 and 116, each variant was present in only one sequence of the whole set, and these variants were therefore excluded from further analysis and discussion. N106 was completely conserved in PHS, in contrast to the original clusters (figure 7). It is currently unknown how these changes would affect the function of the Fe protein in the PHS clones.

Hyperthermophilic proteins usually display an increase in charged (Arg, Lys, Glu, Asp) and some hydrophobic amino acids (Ile, Met, Val, Tyr), accompanied by a decrease in uncharged polar residues such as Ser, Thr, Asn and Gln, with no significant variation for His, Pro, Gly or Cys (Cambillau and Claverie, 2000; Daniel et al., 2008). Other studies reported slightly different results: an increase in Glu, Ile, Val, Tyr, accompanied by decreases in Ala, His, Gln and Thr (Fukuchi and Nishikawa, 2001; Singer and Hickey, 2003).

In general, the increase in charged amino acids results in chains of ion pairs, which enhance stability at high temperatures. Asn and Gln are sensitive to high temperatures because their rate of deamidation increases with temperature; hence, reducing their presence promotes overall stability at high temperatures. Hydrophobic interactions also affect protein stability: increasing the core hydrophobicity produces a small and compact core, which stabilises the protein at higher temperatures (Siddiqui and Cavicchioli, 2006; Siddiqui and Thomas, 2008).
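
The composition trends described above can be illustrated with a minimal Python sketch. It is not the procedure used in this thesis: the residue classes follow the groupings quoted from the literature in the preceding paragraphs, while the function names and the example sequences are hypothetical and serve only to show how a shift in a residue class relative to a reference might be computed.

```python
from collections import Counter

# Residue classes as quoted in the text (one-letter codes).
# The sequences used with these functions are hypothetical examples,
# not NifH clones from this study.
CHARGED = set("RKED")          # Arg, Lys, Glu, Asp
UNCHARGED_POLAR = set("STNQ")  # Ser, Thr, Asn, Gln


def composition(seq):
    """Return the fraction of each residue in seq, ignoring gap characters."""
    residues = [aa for aa in seq.upper() if aa.isalpha()]
    total = len(residues)
    if total == 0:
        return {}
    return {aa: n / total for aa, n in Counter(residues).items()}


def class_fraction(seq, residue_class):
    """Return the summed fraction of all residues belonging to residue_class."""
    return sum(f for aa, f in composition(seq).items() if aa in residue_class)


def class_shift(query, reference, residue_class):
    """Return the change in class fraction of query relative to reference."""
    return class_fraction(query, residue_class) - class_fraction(reference, residue_class)
```

A positive value of `class_shift(query, reference, CHARGED)` would indicate that the query sequence is richer in charged residues than the reference, which is the direction reported for hyperthermophilic proteins.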

The amino acid composition analysis enabled us to detect composition shifts relative to cluster I or cluster III. Clones affiliated with cluster III (P3) had an increase in positively and negatively charged amino acids (Asp, Arg) and in small or hydrophobic amino acids (Phe, Val, Pro), but also a decrease in hydrophobic residues such as Tyr, Leu and Ala, as well as in the negatively charged Glu (table 7, figure 8).

The fluctuations in the conserved sections, relative to cluster III, point to an increase in Leu and Tyr content and a decrease in Gly content. There might therefore be an interplay between the external, variable sections and the conserved interior. In the interior, a slight increase in large hydrophobic residues would help to minimise the solvent-accessible surface area within the functionally important buried sites of the protein (Jaenicke and Böhm, 1998; Haney et al., 1999). In common with the P3 clones, the variable section of the P1 clones included an increase in Arg and a decrease in Ala and Glu (table 7, figure 9). P1 clones also decreased in other charged amino acids and uncharged polar residues. Glu increased in the conserved sections of P1, but this was not observed in the P3 clones (table 8). The interplay previously observed in clusters I and III between Ala and Gly in the conserved regions was not detected in the PHS clones.

To achieve a reliable structural analysis, we chose 18 PHS clones at a maximum distance of 0.21 from the 1CP2/P00456 or 2AFH/P00459 sequences (table 9). The highest RMSD value for a clone model with cluster I was 0.56 Å (RSA163), while the highest value for a clone model of cluster III was 0.9 Å (RSA215, RSA195), indicative of the uncertainties in modelling a cluster III Fe protein given the low number of resolved Fe protein structures currently available from this cluster. According to our previous analysis of the resolved 1CP2 and 2AFH crystallographic structures (table 5, section 5.3.3, chapter 5), the residues corresponding to positions 50-51, 62-63, 65-67, 89, 91-93 and 112-115 presented high RMSD values; some of the reported shifts in P3 are therefore a result of the I-TASSER process and may not represent authentic shifts.
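
For readers less familiar with the RMSD measure, the following minimal Python sketch shows how a global RMSD and per-residue deviations can be computed from two matched sets of coordinates. It assumes that the two structures have already been superimposed and that the coordinate lists are residue-matched (for example, Cα atoms); the function names and the input format are illustrative assumptions and do not reproduce the software used in this study.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two matched coordinate lists.

    Assumes both lists hold (x, y, z) tuples for the same residues, in the
    same order, and that the structures are already superimposed.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and equally long")
    squared = sum(math.dist(a, b) ** 2 for a, b in zip(coords_a, coords_b))
    return math.sqrt(squared / len(coords_a))


def per_residue_deviation(coords_a, coords_b):
    """Distance between each matched pair of atoms, in input order."""
    return [math.dist(a, b) for a, b in zip(coords_a, coords_b)]
```

Positions with a large per-residue deviation are the kind of positions flagged in the text as showing high RMSD values; a large value by itself does not distinguish an authentic shift from a modelling artefact.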

These results were similar to our findings with the partial NifH sequences of the stromatolite clones. In the P3 clones, the region 112-115 was sometimes elongated by two residues, and our analysis suggested that a salt bridge might be established at times (table 12, figure 10). Three main alternatives for this section were observed in the clone sequences: KMD/EESQE/DADKK. For some of the PHS clones affiliated with cluster I, no insertion was evident at all; one of the Asp residues changed to Gly, while another negative residue was omitted at times (table 11, figure 11). P1 and P3 did not share the exact same modification, but the modification occurred in the same region in both.

In summary, based on amino acid composition and structural analysis, the results suggested that thermophilic adaptations were only partially present in the inferred NifH sequences of the PHS clones.

6.5 Concluding remarks

NifH sequences from the Shark Bay hypersaline environment and Paralana hot springs were analysed in terms of their conservation patterns, amino acid composition, and existing and potential structural attributes. Our methods included a statistical t-test analysis of the amino acid composition, a novel ConSurf evolutionary analysis, and the use of the I-TASSER web server for 3D modelling of the amplified region of NifH. Our results were interpreted in light of the methodological limitations discussed previously in section 5.4.1, chapter 5.
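
As an illustration of how a t-test can be applied to amino acid composition data, the sketch below computes Welch's t statistic and its approximate degrees of freedom for two groups of per-sequence residue fractions. The choice of Welch's unequal-variance variant is an assumption made here for illustration; the text does not state which variant was used, and the function name and input values are hypothetical.

```python
import math
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return Welch's t statistic and approximate degrees of freedom.

    Each sample is a list of per-sequence values, for example the fraction
    of Glu in every clone of one group. Each sample needs at least two
    values, and at least one sample must have non-zero variance.
    """
    na, nb = len(sample_a), len(sample_b)
    va = variance(sample_a) / na  # squared standard error of group A
    vb = variance(sample_b) / nb  # squared standard error of group B
    t = (mean(sample_a) - mean(sample_b)) / math.sqrt(va + vb)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df
```

The resulting statistic would still need to be compared against a t distribution with `df` degrees of freedom to obtain a p-value, which is left out of this sketch.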

The results suggested that, to a certain degree, halophilic adaptations did occur, with an increase in salt bridges and charged residues and a decrease in bulkier hydrophobic amino acids. The changes were less apparent in the clones affiliated with cluster I than in the clones affiliated with cluster III, which may indicate some measure of protection of the protein from the environment in the cluster I affiliated clones (see table 13).

The NifH protein sequences from Paralana Hot Springs were subjected to a similar analysis. The results suggested that, to a limited degree, some of the known thermophilic adaptations (an increase in salt bridges, charged residues and Pro) were present in the sequences; however, other known features were not detected, including an increase in several hydrophobic amino acids and a decrease in uncharged polar residues. These conflicting results may be indicative of a changing temperature regime in the hot spring, as different temperatures were reported in the past (Mawson, 1927; Grant, 1938; Long et al., 2001; Anitori et al., 2002), or of additional environmental factors, such as salinity, coming into play (see table 13). These factors require further confirmation.

Some of our findings can only be confirmed once an Fe protein structure has been determined for representative microorganisms from the investigated environments.

Table 13 Summarising halophilic and thermophilic findings from this study.

Halophilic adaptations*

Clones | More Asp or Glu, Ala, Gly or Val (a) | Less Lys, Ile or Leu or Phe or Met (a) | More salt bridges (b)
Stromatolites NifH clones | S3: Glu, Asp, Gly, Val; S1: Glu | S3: Ile, Leu, Phe, Met; S1: Lys, Phe | +

Thermophilic adaptations**

Clones | More Ile or Tyr, Arg or Glu, Pro or Lys (a) | Less Gly or Met or Gln or Thr or Asn or Ser (a) | High Arg/Lys ratio (>1) (a) | More salt bridges (b)
PHS NifH clones | P3: Arg, Pro; P1: Ile, Arg, Pro | P3: Ser; P1: Gly, Gln, Asn, Ser | P3: 0.53; P1: 1.61 | +

* Specific halophilic adaptations (Eisenberg, 1995; Madern et al., 2000; Bolhuis et al., 2008).
** Specific thermophilic adaptations (Haney et al., 1999; Daniel et al., 2008).
(a) Specific changes in the amino acid composition and whether they appeared in the variable sections of the NifH sequence of the stromatolite clones (S1, S3) or the PHS clones (P1, P3). Changes were in comparison to cluster I (S1, P1 vs. C1) or cluster III (S3, P3 vs. C3) values.
(b) See tables 14 & 20; salt bridges calculated by WHAT IF, version 10.1a (Rodriguez et al., 1998).
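
The Arg/Lys ratio listed in table 13 can be illustrated with a minimal Python sketch. It assumes that the ratio is taken over the pooled Arg and Lys counts of a group of sequences; whether the values in the table were pooled or averaged per sequence is not stated here, so this is one plausible reading only, and the function name and example sequences are hypothetical.

```python
def arg_lys_ratio(sequences):
    """Return the pooled Arg/Lys ratio for a group of protein sequences.

    Assumption: counts are pooled over all sequences before dividing.
    Raises ValueError when the group contains no Lys at all.
    """
    arg = sum(seq.upper().count("R") for seq in sequences)
    lys = sum(seq.upper().count("K") for seq in sequences)
    if lys == 0:
        raise ValueError("no Lys residues found; ratio is undefined")
    return arg / lys
```

Under the criterion in table 13, a value above 1 would count as consistent with the thermophilic trend and a value below 1 would not.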

Chapter 7 Conclusions & future work

“Nothing in biology makes sense except in the light of evolution” is a statement that has held true through the decades (Dobzhansky, 1973). Genetic studies have revealed that the nifH gene is present in numerous Bacteria and Archaea and is relatively common across a vast number of genomes (Gary Stacey, 1992; Berman-Frank et al., 2003; Raymond et al., 2004a). This in turn suggests that the gene has been present in genomes for a long time, perhaps even since the Last Universal Common Ancestor (LUCA) (Fani et al., 2000; Leipe et al., 2002; Latysheva et al., 2012).

As stated at the beginning of this thesis, nitrogen fixation is one of the most important biochemical processes. Our main aim in this work was to study microbial communities involved in this process, which reside in unique, sometimes extreme, environments. We then analysed the modifications in the NifH sequences we obtained from the molecular work, and assessed whether unique adaptations of the Fe protein were evident. Our non-molecular methods included a statistical t-test analysis of amino acid compositions, and a novel combination of an evolutionary analysis and protein 3D models.

It would appear, then, that from the early beginning of life on Earth the nifH gene has been translated into a functional protein under various environmental conditions (Leigh, 2000). According to some recent studies, phylogenetic trees based on functional genes, such as the nifH gene, seem to represent microbial communities better than taxonomy-based phylogenetic trees, as they reflect the immediate environment in which the micro-organisms live (Burke et al., 2011; Hamilton et al., 2011a). It is reasonable to assume that proteins would be optimised to ensure survival in a specific environmental setting, and that micro-evolution would match specific ecological niches (Taroncher-Oldenburg et al., 2003). Findings of this nature suggest that the different clusters in the phylogenetic tree represent past adaptations to environmental changes, regardless of taxonomical relations (Burke et al., 2011; Hamilton et al., 2011a), as well as conditions currently influencing the composition of the genetic code.

In other words, phylogenetic affiliations would correlate best with specific physical and chemical influences during a specific time frame, and not necessarily with taxonomical groups; in addition, functional genes such as nifH would not be identical within the same species if its members reside in different environments. Altogether, this suggests that a linear story for the evolution of the nifH gene (or any other functional gene) is highly unlikely. Published phylogenetic analyses of nifH, and of related nif operon genes, seem to support this line of thought (Gary Stacey, 1992; Fani et al., 2000; Leipe et al., 2002; Berman-Frank et al., 2003; Raymond et al., 2004a; Latysheva et al., 2012).

A possible interpretation of the currently known topology of the nifH tree (four clusters, with clusters I and III as the main clades; see chapters 1-3) is that the main clusters most probably represent an adaptation to the presence, or lack, of oxygen. In turn, this would set the clusters’ time of branching at around 2.22-2.45 billion years ago, at the great oxidation event (Brocks et al., 1999; Anbar et al., 2007). The current tree topology may thus represent not only a specific and dramatic change that happened at some point in the past, but also an ongoing global setting that still affects genomes across a wide range of geographical locations. We would argue that any functional gene phylogenetic tree should be searched for a similar topology and that, if it is found, one could assume that the ‘great divide’ was set around the time of the great oxidation event.

In this study we assessed, for the first time, bacterial profiles from two Antarctic sites in the Terra Nova Bay area (Abramovich et al., 2012). To gather evidence for the bacterial communities in these glacial zones, we carried out a terminal-restriction fragment length polymorphism (T-RFLP) analysis on 16S rDNA, using a universal bacterial amplification protocol, on two permafrost cores (Marsh, 1999). Bray-Curtis cluster analysis suggested that the Boulder Clay bacterial profiles were similar to each other but clustered separately from the Amorphous Glacier bacterial profile (Hammer et al., 2001). Amorphous Glacier was potentially rich in microbial species, and the two sites differed in their microbial diversity. Permafrost and icy environments are difficult to work with (Miteva, 2008), but they are present on Mars and other objects in the solar system (McKay et al., 1991; Friedmann, 1993; Ostroumov and Siegert, 1996). Icy environments on Earth are therefore important analogue sites for astrobiological research, if we aim to learn and adapt technologies to find life elsewhere in the universe (Soina et al., 1995).
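
The Bray-Curtis measure underlying this cluster analysis can be sketched in a few lines of Python. The sketch assumes that each T-RFLP profile is represented as a mapping from terminal fragment size to peak abundance; this data layout, the function name and the example values are illustrative assumptions and do not describe the software used for the published analysis.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity between two abundance profiles.

    Each profile maps a fragment identifier (for example, fragment size in
    base pairs) to a non-negative abundance. Returns 0.0 for identical
    profiles and 1.0 for profiles that share no fragments.
    """
    keys = set(profile_a) | set(profile_b)
    difference = sum(abs(profile_a.get(k, 0) - profile_b.get(k, 0)) for k in keys)
    total = sum(profile_a.get(k, 0) + profile_b.get(k, 0) for k in keys)
    if total == 0:
        raise ValueError("both profiles are empty")
    return difference / total
```

For example, `bray_curtis({100: 2, 150: 3}, {100: 2, 200: 3})` gives 0.6, since the two hypothetical profiles share only one of their fragments. A matrix of such pairwise values is what a cluster analysis would then group.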

Our study is the first to confirm the presence of nifH genes in columnar stromatolites from Shark Bay, Western Australia (chapter 3). Shark Bay, a UNESCO World Heritage site, provides researchers with fascinating endemic microbiological subjects, which bridge our current era with Archaean fossil records from the beginning of life on Earth. These “living fossils” are important to our understanding of the origin of life on Earth, as their remnants are consistently being found in the Earth’s geological records, the oldest to date in 3.49 Ga Archaean rocks (Walter, 1976; Schopf, 2006).

Our findings partially matched former taxonomical findings on the stromatolites, which were based on studies that utilised mainly 16S rDNA and culturing analyses. Common potential diazotrophs included cyanobacterial species and Desulfatibacillum of the δ-Proteobacteria (Goh et al., 2008; Allen et al., 2009; Burns et al., 2009). The two stromatolite samples, from different years, differed in their species diversity and richness, and we suggested that this was related to the environmental events that occurred at the time of sampling. Our results indicated that columnar stromatolites and the salt ponds of Guerrero Negro, Mexico, harbour similar diazotrophic species, mainly from the δ-Proteobacteria and Cyanobacteria groups. However, the stromatolites included unique species, such as non-heterocystous Cyanobacteria and γ- and δ-Proteobacteria NifH sequences, which were not present in the Guerrero Negro salt ponds. A new clade formed an out group to cluster I and centred on the δ-proteobacterium Pelobacter carbinolicus DSM 2380 and affiliated NifH clones.

In a different part of Australia, the diazotrophic community of a hot and slightly radioactive spring was investigated for the first time (chapter 4). Our findings included diazotrophs from the Cyanobacteria, Nitrospirae, Spirochaetes, Bacteroidetes, Firmicutes and δ-Proteobacteria groups, few of which were reported by a former taxonomical study utilising a 16S rDNA analysis (Anitori et al., 2002). These diazotrophs were mainly affiliated with cluster I and cluster III of the NifH phylogenetic tree; however, two new clades were established as out groups to cluster I. These clades included NifH clones closely related to Thermodesulfovibrio yellowstonii DSM 11347 (Nitrospirae), several Geobacter spp. and P. carbinolicus DSM 2380 (δ-Proteobacteria).

The number of NifH clones analysed and sequenced in this study (76) is the highest number of NifH clones from a single hot spring analysed to date (Hamilton et al., 2011). According to our richness and diversity analysis, the diazotrophic community was more diverse and included more NifH species than the Shark Bay columnar stromatolites, and should be further investigated and sampled. Hydrothermal systems in general produce habitable microenvironments (Jannasch and Wirsen, 1981; Sogin et al., 2006), and there is evidence to suggest their existence on Mars, Europa, Enceladus and other solar system bodies (McCollom, 1999; Vance et al., 2007; Glein et al., 2008; Skok et al., 2010), making Paralana’s active amagmatic hydrothermal system an interesting analogue site for astrobiology research.
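
To make the notions of richness and diversity concrete, the following Python sketch computes two commonly used estimators from a list of clone counts per NifH sequence type: the Shannon diversity index and the bias-corrected Chao1 richness estimate. These particular estimators are assumed here for illustration and may differ from the indices actually applied in chapters 3 and 4; the function names and example counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity index (natural log) from per-type clone counts."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)


def chao1(counts):
    """Bias-corrected Chao1 richness estimate from per-type clone counts.

    Uses the number of types seen exactly once (singletons) and exactly
    twice (doubletons) to estimate how many types remain unsampled.
    """
    observed = sum(1 for n in counts if n > 0)
    singletons = sum(1 for n in counts if n == 1)
    doubletons = sum(1 for n in counts if n == 2)
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))
```

A Chao1 estimate well above the observed number of types would suggest that further sampling is likely to recover additional NifH sequence types.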

Our bioinformatics approach paved the way for future research to use the nifH gene as a reference point for the analysis of genomic and protein modifications (chapters 5 & 6). While our data sets were small, they allowed for an in-depth analysis of our methodology and its limitations. The results were limited by the nature of our data sets, yet showed great promise, as specific adaptations were detected in the NifH sequences from Shark Bay and Paralana Hot Springs, supporting the notion of dynamic evolution in their respective environments.

Future work

Molecular and bioinformatics tools were our main methodologies in this study. Future researchers may want to focus not only on potential diazotrophs but also on identifying the actual nitrogen fixers in these unique environments. The new out groups of the NifH phylogenetic tree reported in this study, represent adaptation to high temperatures and high salinity, but it is unclear if they are active agents in fixing atmospheric nitrogen. Assessment of actual nitrogenase activity can be achieved with reverse transcriptase PCR and quantitative reverse transcriptase PCR, and with acetylene reduction assays. These methodologies would shed light on the key players in the N2 fixation cycle.

Whole genome amplification could also be used to increase the DNA concentrations recovered from the environment for downstream PCR analysis. Such research would confirm the presence and viability of psychrophilic, thermophilic and halophilic bacterial phyla, and correlate the community composition with the geological and habitat characteristics. Proteomics studies would link nitrogen fixation key enzymes and genes to other biochemical processes, such as photosynthesis (oxygenic and anoxygenic) or sulphate reduction and oxidation, and would provide comparable data with other microbial systems. Measurements of 15N uptake on a micron scale within, for example, the upper layers of the stromatolite mats (down to 5-8 mm depth) would provide a reliable portrait of the nitrogen budget within the layered microbial mats and within the different types of stromatolite mats.

Additionally, as most of what is currently known about the nitrogenase activity is derived from studies based on Cyanobacteria, nitrogenase activity should be explored and characterised in diazotrophic sulphate reducing bacteria (SRB) and other anaerobic bacteria.

Future work may also include analysing the new phylogenetic out groups presented in this study (chapters 3 and 4). Our methodology can be applied to these sequences, which can then be compared to our current body of work and to distinct thermophilic or halophilic NifH sequences, or perhaps to GTPases from thermophilic and halophilic genomes. This, in turn, would clarify not only the evolutionary steps that bring forth thermophilic or halophilic adaptations across taxonomical groups and across protein families, but also whether taxonomy trumps functionality for this type of gene (see the opening paragraphs of this chapter). In addition, comparing these out group sequences to cluster I and cluster III affiliated clones might reveal a gradient of adaptations in protein composition and structure, thus illuminating the entire range of adaptations available to diazotrophs in a specific environment.

Extending the amplified region of the nifH gene in the PCR process would be very beneficial to our analysis and would enable researchers to confirm or reject our current findings, mainly with regard to the amino acid composition of the conserved vs. non-conserved regions of the Fe protein. Additional characteristics that can be assessed with regard to potential adaptations include (briefly): aromatic interactions, hydrogen bonds, disulfide bridges, surface accessibility of certain amino acids, electrostatic interactions in the core vs. the protein surface, and thermodynamic and protein activity properties.

In summary, this study has enhanced our knowledge of micro-organisms that survive successfully in extreme environments. These environments are worthy of our attention, as they provide analogue sites for research aimed at finding evidence for life elsewhere in the solar system. Given enough time to adapt, these successful micro-organisms could survive rigorous conditions outside of Earth’s protective shell, promoting an optimistic view of finding micro-organisms elsewhere in the solar system.

References

Abascal, F., Zardoya, R., and Posada, D. (2005) ProtTest: selection of best-fit models of protein evolution. Bioinformatics 21: 2104-2105. Abramovich, R.S., Pomati, F., Jungblut, A.D., Guglielmin, M., and Neilan, B.A. (2012) T-RFLP Fingerprinting Analysis of Bacterial Communities in Debris Cones, Northern Victoria Land, Antarctica. Permafrost Periglac 23: 244-248. Abyzov, S.S., Filippova, S.N., and Kuznetsov, V.D. (1983) Nocardiopsis antarcticus- A new species of actinomyces isolated from the ice sheet of the Central Antarctica glacier. Izv Akad Nauk Ser Biol 4: 559-568. Adams, D.G. (2000) Heterocyst formation in cyanobacteria. Curr Opin Microbiol 3: 618-624. Affourtit, J., Zehr, J., and Paerl, H. (2001) Distribution of nitrogen-fixing microorganisms along the Neuse River Estuary, North Carolina. Microb Ecol 41: 114- 123. Aislabie, J., Jordan, S., Ayton, J., Klassen, J.L., Barker, G.M., and Turner, S. (2009) Bacterial diversity associated with ornithogenic soil of the Ross Sea region, Antarctica. Can J Microbiol 55: 21-36. Aislabie, J.M., Chhour, K.L., Saul, D.J., Miyauchi, S., Ayton, J., Paetzold, R.F., and Balks, M.R. (2006) Dominant bacteria in soils of Marble Point and Wright Valley, Victoria Land, Antarctica. Soil Biol Biochem 38: 3041-3056. Akaike, H. (2002) A new look at the statistical model identification. Automatic Control, IEEE Transactions on 19: 716-723. Allen, M., Goh, F., Burns, B., and Neilan, B. (2009) Bacterial, archaeal and eukaryotic diversity of smooth and pustular microbial mat communities in the hypersaline lagoon of Shark Bay. Geobiology 7: 82-96. Allen, M.A. (2006) An Astrobiology-Focused Analysis of Microbial Mat Communities from Hamelin Pool, Shark Bay, Western Australia. In School of Biotechnology and Biomolecular Sciences. Sydney: The University of New South Wales, p. 243. Allen, M.A., Goh, F., Leuko, S., Igo, A.E., Mizuki, T., Usami, R. et al. 
(2008) Haloferax elongans sp nov and Haloferax mucosum sp nov., isolated from microbial mats from Hamelin Pool, Shark Bay, Australia. Int J Syst Evol Microbiol 58: 798-802. Altschul, S., Madden, T., Schaffer, A., Zhang, J., Zhang, Z., Miller, W., and Lipman, D. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res 25: 3389. Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic local alignment search tool. J. Mol. Biol 215: 403-410.

Aluri, S., and Terli, R. (2012) Three dimensional modelling of beta endorphin and its interaction with three opioid receptors. Journal of Computational Biology and Bioinformatics Research 4: 51-57. Amann, R.I., Ludwig, W., and Schleifer, K.H. (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59: 143-169. Amato, P., Hennebelle, R., Magand, O., Sancelme, M., Delort, A.M., Barbante, C. et al. (2007) Bacterial characterization of the snow cover at Spitzberg, Svalbard. FEMS Microbiol Ecol 59: 255-264. Anbar, A.D., Duan, Y., Lyons, T.W., Arnold, G.L., Kendall, B., Creaser, R.A. et al. (2007) A whiff of oxygen before the great oxidation event? Science 317: 1903-1906. Andres, M.S., and Pamela Reid, R. (2006) Growth morphologies of modern marine stromatolites: A case study from Highborne Cay, Bahamas. Sediment Geol 185: 319-328. Anisimova, M., and Gascuel, O. (2006) Approximate likelihood-ratio test for branches: A fast, accurate, and powerful alternative. Syst Biol 55: 539. Anitori, R.P., Trott, C., Saul, D.J., Bergquist, P.L., and Walter, M.R. (2002) A culture-independent survey of the bacterial community in a radon hot spring. Astrobiology 2: 255-270. Apweiler, R., Martin, M., O’Donovan, C., Magrane, M., Alam-Faruque, Y., Antunes, R. et al. (2010) The universal protein resource (UniProt) in 2010. Nucleic Acids Res 38: D142-D148. Argandoña, M., Fernández-Carazo, R., Llamas, I., Martínez-Checa, F., Caba, J.M., Quesada, E., and Moral, A. (2005) The moderately halophilic bacterium Halomonas maura is a free-living diazotroph. FEMS Microbiol Lett 244: 69-74. Ashkenazy, H., Erez, E., Martz, E., Pupko, T., and Ben-Tal, N. (2010) ConSurf 2010: calculating evolutionary conservation in sequence and structure of proteins and nucleic acids. Nucleic Acids Res 38: W529-W533. Bai, Y., Yang, D., Wang, J., Xu, S., Wang, X., and An, L. 
(2006) Phylogenetic diversity of culturable bacteria from alpine permafrost in the Tianshan Mountains, northwestern China. Res Microbiol 157: 741-751. Bakermans, C., Tsapin, A.I., Souza-Egipsy, V., Gilichinsky, D.A., and Nealson, K.H. (2003) Reproduction and metabolism at -10°C of bacteria isolated from Siberian permafrost. Environ Microbiol 5: 321-326. Bardavid, R., Ionescu, D., Oren, A., Rainey, F., Hollen, B., Bagaley, D. et al. (2007) Selective enrichment, isolation and molecular detection of Salinibacter and related extremely halophilic Bacteria from hypersaline environments. Hydrobiologia 576: 3-13. Bargagli, R., Skotnicki, M.L., Marri, L., Pepi, M., Mackenzie, A., and Agnorelli, C. (2004) New record of moss and thermophilic bacteria species and physico-chemical properties of geothermal soils on the northwest slope of Mt. Melbourne (Antarctica). Polar Biol 27: 423-431.

Barrett, J.E., Virginia, R.A., Wall, D.H., Cary, S.C., Adams, B.J., Hacker, A.L., and Aislabie, J.M. (2006) Co-variation in soil biodiversity and biogeochemistry in northern and southern Victoria Land, Antarctica. Antarct Sci 18: 535-548. Bauer, K., Díez, B., Lugomela, C., Seppälä, S., Borg, A., and Bergman, B. (2008) Variability in benthic diazotrophy and cyanobacterial diversity in a tropical intertidal lagoon. FEMS Microbiol Ecol 63: 205-221. Bauld, J., Favinger, J.L., Madigan, M.T., and Gest, H. (1986) Obligately halophilic Chromatium vinosum from Hamelin Pool, Shark Bay, Australia. Curr Microbiol 14: 335-339. Bazylinski, D.A., Dean, A.J., Schüler, D., Phillips, E.J.P., and Lovley, D.R. (2000) N2 dependent growth and nitrogenase activity in the metal metabolizing bacteria, Geobacter and Magnetospirillum species. Environ Microbiol 2: 266-273. Belay, N., Sparling, R., and Daniels, L. (1984) Dinitrogen fixation by a thermophilic methanogenic bacterium. Bell, R.E., and Ben-Tal, N. (2003) In silico identification of functional protein interfaces. Comp Funct Genomics 4: 420-423. Berezin, C., Glaser, F., Rosenberg, J., Paz, I., Pupko, T., Fariselli, P. et al. (2004) ConSeq: the identification of functionally and structurally important residues in protein sequences. Bioinformatics 20: 1322. Berg, J.M., Tymoczko, J.L., and Stryer, L. (2002) Biochemistry. New York: W. H. Freeman and Co. Bergman, B., Gallon, J.R., Rai, A.N., and Stal, L.J. (1997) N2 Fixation by non-heterocystous cyanobacteria. In, pp. 139-185. Berman-Frank, I., Lundgren, P., and Falkowski, P. (2003) Nitrogen fixation and photosynthetic oxygen evolution in cyanobacteria. Res Microbiol 154: 157-164. Bertics, V., Sohm, J., Treude, T., Chow, C., Capone, D., Fuhrman, J., and Ziebis, W. (2010) Burrowing deeper into benthic nitrogen cycling: the impact of bioturbation on nitrogen fixation coupled to sulfate reduction. Mar Ecol Prog Ser 409: 1-15. Bertrand-Sarfati, J., and Walter, M.R. 
(1976) Chapter 5.2 An Attempt to Classify Late Precambrian Stromatolite Microstructures. In Developments in Sedimentology: Elsevier, pp. 251-259. Bhat, W.W., Lattoo, S.K., Razdan, S., Dhar, N., Rana, S., Dhar, R.S. et al. (2012) Molecular cloning, bacterial expression and promoter analysis of squalene synthase from Withania somnifera (L.) Dunal. Gene. Bhatia, M., Sharp, M., and Foght, J. (2006) Distinct Bacterial Communities Exist beneath a High Arctic Polythermal Glacier. Appl Environ Microbiol 72: 5838-5845. Blackwood, C.B., Marsh, T., Kim, S.-H., and Paul, E.A. (2003) Terminal Restriction Fragment Length Polymorphism Data Analysis for Quantitative Comparison of Microbial Communities. Appl Environ Microbiol 69: 926-932. Blight, P.G. (1977) Uraniferous Metamorphics and "younger" Granites of the Paralana Area, Mount Painter Province, South Australia: A Petrographical and Geochemical Study: Department of Geology, University of Adelaide.

Boeckmann, B., Bairoch, A., Apweiler, R., Blatter, M.C., Estreicher, A., Gasteiger, E. et al. (2003) The SWISS-PROT protein knowledgebase and its supplement TrEMBL in 2003. Nucleic Acids Res 31: 365-370. Bohm, G., and Jaenicke, R. (1994) Relevance of sequence statistics for the properties of extremophilic proteins. International Journal of Peptide and Protein Research 43: 97-106. Bohme, H. (1998) Regulation of nitrogen fixation in heterocyst-forming cyanobacteria. Trends Plant Sci 3: 346-351. Bolhuis, A., Kwan, D., and Thomas, J. (2008) Halophilic Adaptations of Proteins. In Protein adaptation in extremophiles. Siddiqui, K.S., and Thomas, T. (eds): Nova Science Publishers, Inc., pp. 71-104. Bonin, P., and Michotey, V. (2006) Nitrogen budget in a microbial mat in the Camargue (southern France). MARINE ECOLOGY-PROGRESS SERIES- 322: 75. Bothe, H., Tripp, H., and Zehr, J. (2010) Unicellular cyanobacteria with a new mode of life: the lack of photosynthetic oxygen evolution allows nitrogen fixation to proceed. Arch Microbiol: 1-8. Bourne, H.R., Sanders, D.A., and McCormick, F. (1991) The GTPase superfamily: conserved structure and molecular mechanism. Bowman, J.P., and McCuaig, R.D. (2003) Biodiversity, Community Structural Shifts, and Biogeography of Prokaryotes within Antarctic Continental Shelf Sediment. Appl Environ Microbiol 69: 2463-2483. Bowman, J.P., McCammon, S.A., Brown, M.V., Nichols, D.S., and McMeekin, T.A. (1997) Diversity and association of psychrophilic bacteria in Antarctic sea ice. Appl Environ Microbiol 63: 3068-3078. Bowman, J.P., McCammon, S.A., Gibson, J.A.E., Robertson, L., and Nichols, P.D. (2003) Prokaryotic Metabolic Activity and Community Structure in Antarctic Continental Shelf Sediments. Appl Environ Microbiol 69: 2448-2462. Brambilla, E., Hippe, H., Hagelstein, A., Tindall, B.J., and Stackebrandt, E. (2001) 16S rDNA diversity of cultured and uncultured prokaryotes of a mat sample from Lake Fryxell, McMurdo Dry Valleys, Antarctica. 
Extremophiles 5: 23-33. Brewer, W. (1866) Note on the organisms of the geysers of California. Am. J. Sci 92: 429. Brinkmeyer, R., Knittel, K., Jurgens, J., Weyland, H., Amann, R., and Helmke, E. (2003) Diversity and Structure of Bacterial Communities in Arctic versus Antarctic Pack Ice. Appl Environ Microbiol 69: 6610-6619. Brocks, J., Logan, G., Buick, R., and Summons, R. (1999) Archean molecular fossils and the early rise of eukaryotes. Science 285: 1033. Brown, I.I., Bryant, D.A., Casamatta, D., Thomas-Keprta, K.L., Sarkisova, S.A., Shen, G. et al. (2010) Polyphasic Characterization of a Thermotolerant Siderophilic Filamentous Cyanobacterium That Produces Intracellular Iron Deposits. Appl Environ Microbiol 76: 6664.

Brown, M., Friez, M., and Lovell, C. (2003) Expression of nifH genes by diazotrophic bacteria in the rhizosphere of short form Spartina alterniflora. FEMS Microbiol Ecol 43: 411-417. Brugger, J., Long, N., McPhail, D.C., and Plimer, I. (2005) An active amagmatic hydrothermal system: The Paralana hot springs, Northern Flinders Ranges, South Australia. Chemical Geology 222: 35-64. Bureau of Meteorology, C.o.A. (2011). Climate Data Online [WWW document]. URL http://www.bom.gov.au/climate/data/. Burgess, B.K., and Lowe, D.J. (1996) Mechanism of Molybdenum Nitrogenase. Chem Rev 96: 2983-3012. Burgess, B.K., Jacobs, D.B., and Stiefel, E.I. (1980) Large-scale purification of high activity Azotobacter vinelandii nitrogenase. Biochimica et Biophysica Acta (BBA)-Enzymology 614: 196-209. Burke, C., Steinberg, P., Rusch, D., Kjelleberg, S., and Thomas, T. (2011) Bacterial community assembly based on functional genes rather than species. Proceedings of the National Academy of Sciences 108: 14288-14293. Burling, M., Pattiaratchi, C., and Ivey, G. (2003) The tidal regime of Shark Bay, Western Australia. Estuarine, Coastal and Shelf Science 57: 725-735. Burns, B., Goh, F., Allen, M., and Neilan, B. (2004) Microbial diversity of extant stromatolites in the hypersaline marine environment of Shark Bay, Australia. Environ Microbiol 6: 1096-1101. Burns, B., Anitori, R., Butterworth, P., Henneberger, R., Goh, F., Allen, M. et al. (2009) Modern analogues and the early history of microbial life. Precambrian Res 173: 10-18. Burns, R.C., Hardy, R.W.F., and Anthony San, P. (1972) Purification of nitrogenase and crystallization of its Mo-Fe protein. In Methods in Enzymology: Academic Press, pp. 480-496. Cambillau, C., and Claverie, J.-M. (2000) Structural and genomic correlates of hyperthermostability. J Biol Chem 275: 32383-32386. Cannone, N., Wagner, D., Hubberten, H., and Guglielmin, M. 
(2008) Biotic and abiotic factors influencing soil properties across a latitudinal gradient in Victoria Land, Antarctica. Geoderma 144: 50-65. Carpenter, E.J., Lin, S., and Capone, D.G. (2000) Bacterial Activity in South Pole Snow. Appl Environ Microbiol 66: 4514-4517. Carugo, O. (2003) How root-mean-square distance (rmsd) values depend on the resolution of protein structures that are compared. Journal of applied crystallography 36: 125-128. Caspi, R., and Karp, P.D. (2002) Using the MetaCyc Pathway Database and the BioCyc Database Collection: John Wiley & Sons, Inc. Cavicchioli, R. (2002) Extremophiles and the search for extraterrestrial life. Astrobiology 2: 281-292. Chakrabartty, A., Schellman, J.A., and Baldwin, R.L. (1991) Large differences in the helix propensities of alanine and glycine.

Chao, A., and Yang, M.C.K. (1993) Stopping Rules and Estimation for Recapture Debugging with Unequal Failure Rates. In: Biometrika Trust, pp. 193-201. Chen, H., and Zhou, H.X. (2005) Prediction of solvent accessibility and sites of deleterious mutations from protein sequence. Nucleic Acids Res 33: 3193. Chevenet, F., Brun, C., Bañuls, A., Jacq, B., and Christen, R. (2006) TreeDyn: towards dynamic graphics and annotations for analyses of trees. BMC Bioinformatics 7: 439. Chien, Y., and Zinder, S. (1996) Cloning, functional organization, transcript studies, and phylogenetic analysis of the complete nitrogenase structural genes (nifHDK2) and associated genes in the archaeon Methanosarcina barkeri 227. J Bacteriol 178: 143. Chiu, H.J., Peters, J.W., Lanzilotta, W.N., Ryle, M.J., Seefeldt, L.C., Howard, J.B., and Rees, D.C. (2001) MgATP-bound and nucleotide-free structures of a nitrogenase protein complex between the Leu 127 -Fe-protein and the MoFe-protein. Biochemistry 40: 641-650. Christner, B.C., Mosley-Thompson, E., Thompson, L.G., and Reeve, J.N. (2005) Classification of Bacteria from Polar and Nonpolar Glacial Ice. In Life in Ancient Ice. Castello, J.D., and Rogers, S.O. (eds). Princeton, New Jersey: Princeton University Press, pp. 227-239. Christner, B.C., Mosley-Thompson, E., Thompson, L.G., Zagorodnov, V., Sandman, K., and Reeve, J.N. (2000) Recovery and Identification of Viable Bacteria Immured in Glacial Ice. Icarus 144: 479-485. Chung, J., Wang, W., and Bourne, P. (2006) Exploiting sequence and structure homologs to identify protein-protein binding sites. PROTEINS-NEW YORK- 62: 630. Chung, J.L., Wang, W., and Bourne, P.E. (2005) Exploiting sequence and structure homologs to identify protein–protein binding sites. Proteins: Structure, Function, and Bioinformatics 62: 630-640. Clarridge, J.E., III (2004) Impact of 16S rRNA Gene Sequence Analysis for Identification of Bacteria on Clinical Microbiology and Infectious Diseases. Clin. Microbiol. Rev. 17: 840-862. 
Clement, B.G., Kehl, L.E., DeBord, K.L., and Kitts, C.L. (1998) Terminal restriction fragment patterns (TRFPs), a rapid, PCR-based method for the comparison of complex bacterial communities. J Microbiol Methods 31: 135-142. Cole, J., Wang, Q., Cardenas, E., Fish, J., Chai, B., Farris, R. et al. (2009) The Ribosomal Database Project: improved alignments and new tools for rRNA analysis. Nucleic Acids Res 37: D141. Cole, J.R., Chai, B., Farris, R.J., Wang, Q., McGarrell, D.M., Bandela, A.M. et al. (2007) The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res 35: D169. Cole, J.R., Chai, B., Marsh, T.L., Farris, R.J., Wang, Q., Kulam, S.A. et al. (2003) The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res 31: 442-443. Colon-Lopez, M., Sherman, D., and Sherman, L. (1997) Transcriptional and translational regulation of nitrogenase in light-dark-and continuous-light-grown cultures

of the unicellular cyanobacterium Cyanothece sp. strain ATCC 51142. J Bacteriol 179: 4319. Costello, E., Halloy, S., Reed, S., Sowell, P., and Schmidt, S. (2009) Fumarole-Supported Islands of Biodiversity within a Hyperarid, High-Elevation Landscape on Socompa Volcano, Puna de Atacama, Andes. Appl Environ Microbiol 75: 735. Cravo-Laureau, C., Matheron, R., Joulian, C., Cayol, J.-L., and Hirschler-Rea, A. (2004) Desulfatibacillum alkenivorans sp. nov., a novel n-alkene-degrading, sulfate-reducing bacterium, and emended description of the genus Desulfatibacillum. Int J Syst Evol Microbiol 54: 1639-1642. CRBIP, T. (2007) Centre de Ressources Biologiques de l'Institut Pasteur. In: Institut Pasteur. D'Agostino, R.B. (1986) Tests for Normal Distribution. In Goodness-of-fit techniques. D'Agostino, R.B., and Stephens, M.A. (eds). New York, NY, USA: Marcel Dekker, Inc. Daniel, R.M., Danson, M.J., Hough, D.W., Lee, C.K., Peterson, M.E., and Cowan, D.A. (2008) Enzyme stability and activity at high temperatures. In Protein Adaptation in Extremophiles. Siddiqui, K.S., and Thomas, T. (eds). New York, NY: Nova Science Publishers, Inc, pp. 1-34. Darapaneni, V., Prabhaker, V.K., and Kukol, A. (2009) Large-scale analysis of influenza A virus sequences reveals potential drug target sites of non-structural proteins. J Gen Virol 90: 2124-2133. DasSarma, S., and Arora, P. (2006) Halophiles. eLS. Davey, A., and Marchant, H.J. (1983) Seasonal Variation in Nitrogen Fixation by Nostoc-Commune Vaucher at the Vestfold Hills Antarctica. Phycologia 22: 377-386. Davila, A.F., Gómez-Silva, B., De los Rios, A., Ascaso, C., Olivares, H., McKay, C.P., and Wierzchos, J. (2008) Facilitation of endolithic microbial survival in the hyperarid core of the Atacama Desert by mineral deliquescence. Journal of Geophysical Research 113: G01028. Davila, A.F., Duport, L.G., Melchiorri, R., Jänchen, J., Valea, S., de los Rios, A. et al. (2010) Hygroscopic Salts and the Potential for Life on Mars.
Astrobiology 10: 617-628. Deming, J.W. (2002) Psychrophiles and polar regions. Curr Opin Microbiol 5: 301-309. Derakshani, M., Lukow, T., and Liesack, W. (2001) Novel bacterial lineages at the (sub) division level as detected by signature nucleotide-targeted recovery of 16S rRNA genes from bulk soil and rice roots of flooded rice microcosms. Appl Environ Microbiol 67: 623-631. Des Marais, D. (1995) The biogeochemistry of hypersaline microbial mats. Adv Microb Ecol 14: 251. Des Marais, D.J. (2003) Biogeochemistry of Hypersaline Microbial Mats Illustrates the Dynamics of Modern Microbial Ecosystems and the Early Evolution of the Biosphere. Biol Bull 204: 160-167. Desnues, C., Michotey, V., Wieland, A., Zhizang, C., Fourçans, A., Duran, R., and Bonin, P. (2007) Seasonal and diel distributions of denitrifying and bacterial

communities in a hypersaline microbial mat (Camargue, France). Water Res 41: 3407-3419. Diallo, M.D., Reinhold-Hurek, B., and Hurek, T. (2008) Evaluation of PCR primers for universal nifH gene targeting and for assessment of transcribed nifH pools in roots of Oryza longistaminata with and without low nitrogen input. FEMS Microbiol Ecol 65: 220-228. Dilworth, M.J., Eldridge, M.E., and Eady, R.R. (1993) The molybdenum and vanadium nitrogenases of Azotobacter chroococcum: effect of elevated temperature on N2 reduction. Biochem J 289: 395. Distel, D.L., Morrill, W., MacLaren-Toussaint, N., Franks, D., and Waterbury, J. (2002) Teredinibacter turnerae gen. nov., sp. nov., a dinitrogen-fixing, cellulolytic, endosymbiotic gamma-proteobacterium isolated from the gills of wood-boring molluscs (Bivalvia: Teredinidae). Int J Syst Evol Microbiol 52: 2261-2269. Dixon, R., and Kahn, D. (2004) Genetic regulation of biological nitrogen fixation. Nature Reviews Microbiology 2: 621-631. Dobzhansky, T. (1973) Nothing in biology makes sense except in the light of evolution. American Biology Teacher 35: 125-129. Dunbar, J., Ticknor, L.O., and Kuske, C.R. (2000) Assessment of Microbial Diversity in Four Southwestern United States Soils by 16S rRNA Gene Terminal Restriction Fragment Analysis. Appl Environ Microbiol 66: 2943-2950. Dunbar, J., Ticknor, L.O., and Kuske, C.R. (2001) Phylogenetic Specificity and Reproducibility and New Method for Analysis of Terminal Restriction Fragment Profiles of 16S rRNA Genes from Bacterial Communities. Appl Environ Microbiol 67: 190-197. Dupraz, C., and Visscher, P. (2005) Microbial lithification in marine stromatolites and hypersaline mats. Trends Microbiol 13: 429-438. Dupraz, C., Reid, R., Braissant, O., Decho, A., Norman, R., and Visscher, P. (2009) Processes of carbonate precipitation in modern microbial mats. Earth-Sci Rev 96: 141-162. Eder, W., and Huber, R.
(2002) New isolates and physiological properties of the Aquificales and description of Thermocrinis albus sp. nov. Extremophiles 6: 309-318. Edgar, R.C. (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucl. Acids Res. 32: 1792-1797. Edwards, A.M. (1868) Original Communications: On the Occurrence of Living Forms in the Hot Waters of California. Quarterly Journal of Microscopical Science 2: 247-250. Edwards, D., Stajich, J.E., and Hansen, D. (2009) Bioinformatics: Tools and Applications: Springer. Eisenberg, H. (1995) Life in unusual environments: progress in understanding the structure and function of enzymes from extreme halophilic bacteria. Archives of Biochemistry and Biophysics 318: 1-5. Eisenberg, H., Mevarech, M., and Zaccai, G. (1992) Biochemical, structural, and molecular genetic aspects of halophilism. Advances in protein chemistry 43: 1-62.

Empadinhas, N., and da Costa, M.S. (2010) Diversity and biosynthesis of compatible solutes in hyper/thermophiles. Int Microbiol 9: 199-206. Everett, K.D.E., Bush, R.M., and Andersen, A.A. (1999) Emended description of the order Chlamydiales, proposal of Parachlamydiaceae fam. nov. and Simkaniaceae fam. nov., each containing one monotypic genus, revised taxonomy of the family Chlamydiaceae, including a new genus and five new species, and standards for the identification of organisms. Int J Syst Bacteriol 49: 415-440. Falcón, L., Cerritos, R., Eguiarte, L., and Souza, V. (2007) Nitrogen fixation in microbial mat and stromatolite communities from Cuatro Cienegas, Mexico. Microb Ecol 54: 363-373. Fani, R., Gallo, R., and Liò, P. (2000) Molecular Evolution of Nitrogen Fixation: The Evolutionary History of the nifD, nifK, nifE, and nifN Genes. J Mol Evol 51: 1-11. Farnelid, H., Öberg, T., and Riemann, L. (2009) Identity and dynamics of putative N2 fixing picoplankton in the Baltic Sea proper suggest complex patterns of regulation. Environmental Microbiology Reports 1: 145-154. Fay, P. (1992) Oxygen relations of nitrogen fixation in cyanobacteria. Microbiology and Molecular Biology Reviews 56: 340. Feller, G., and Gerday, C. (2003) Psychrophilic enzymes: hot topics in cold adaptation. Nature Reviews Microbiology 1: 200-208. Feller, G., Lonhienne, T., Deroanne, C., Libioulle, C., Van Beeumen, J., and Gerday, C. (1992) Purification, characterization, and nucleotide sequence of the thermolabile alpha-amylase from the antarctic psychrotroph Alteromonas haloplanctis A23. J Biol Chem 267: 5217-5221. Felsenstein, J. (1981) Evolutionary trees from DNA sequences: A maximum likelihood approach. J Mol Evol 17: 368-376. Felsenstein, J. (2007) PHYLIP (phylogeny inference package) version 3.67. Distributed by the author. Department of Genome Sciences, University of Washington, Seattle, USA. Fernandez-Valiente, E., Quesada, A., Howard-Williams, C., and Hawes, I. 
(2001) N2-Fixation in Cyanobacterial Mats from Ponds on the McMurdo Ice Shelf, Antarctica. Microb Ecol 42: 338-349. Fernandez-Valiente, E., Camacho, A., Rochera, C., Rico, E., Vincent, W.F., and Quesada, A. (2007) Community structure and physiological characterization of microbial mats in Byers Peninsula, Livingston Island (South Shetland Islands, Antarctica). In, pp. 377-385. Ferris, M., Nold, S., Santegoeds, C., and Ward, D. (2001) Examining bacterial population diversity within the Octopus Spring microbial mat community. Thermophiles: Biodiversity, Ecology and Evolution: 51–64. Fields, P.A. (2001) Review: Protein function at thermal extremes: balancing stability and flexibility. Comparative Biochemistry and Physiology-Part A: Molecular & Integrative Physiology 129: 417-431. Fiore, C.L., Jarett, J.K., Olson, N.D., and Lesser, M.P. (2010) Nitrogen fixation and nitrogen transformations in marine symbioses. Trends Microbiol 18: 455-463.

Fleming, H., and Haselkorn, R. (1973) Differentiation in Nostoc muscorum: Nitrogenase is Synthesized in Heterocysts. In, pp. 2727-2731. Flint, D.J., and Abeysinghe, P.B. (2000/07) Geology and mineral resources of the Gascoyne Region: Western Australia Geological Survey. In. Perth: Western Australia Geological Survey, p. 29. Fourçans, A., Oteyza, T., Wieland, A., Solé, A., Diestra, E., Bleijswijk, J. et al. (2004) Characterization of functional bacterial groups in a hypersaline microbial mat community (Salins de Giraud, Camargue, France). FEMS Microbiol Ecol 51: 55-70. Francis, C.A., Beman, J.M., and Kuypers, M.M.M. (2007) New processes and players in the nitrogen cycle: the microbial ecology of anaerobic and archaeal ammonia oxidation. The ISME Journal 1: 19-27. Franzmann, P.D., and Dobson, S.J. (1992) Cell wall-less, free-living spirochetes in Antarctica. FEMS Microbiol Lett 97: 289-292. French, H., and Guglielmin, M. (1999a) Observations on the ice-marginal, periglacial geomorphology of Terra Nova Bay, northern Victoria Land, Antarctica. Permafrost Periglac 10: 331-347. French, H.M., and Guglielmin, M. (1999b) Observations on the Ice-Marginal, Periglacial Geomorphology of Terra Nova Bay, Northern Victoria Land, Antarctica. Permafrost Periglac 10: 331-347. French, H.M., and Guglielmin, M. (2000) Frozen Ground Phenomena in the Vicinity of Terra Nova Bay, Northern Victoria land, Antarctica: A Preliminary Report. Geografiska Annaler: Series A, Physical Geography 82: 513-526. Frezzotti, M., Salvatore, M., Vittuari, L., Grigioni, P., and De Silvestri, L. (2001) Satellite Image Map - Northern Foothills and Inexpressible Island Area (Victoria Land, Antarctica). Ter Ant Rep 6: 1-8. Friedmann, E.I. (1993) Extreme environments and exobiology. G Bot Ital 127: 369- 376. Friedmann, E.I., Kappen, L., Meyer, M.A., and Nienow, J.A. (1993) Long-term productivity in the cryptoendolithic microbial community of the Ross Desert, Antarctica. Microb Ecol 25: 51-69. 
Frostegard, A., Courtois, S., Ramisse, V., Clerc, S., Bernillon, D., Le Gall, F. et al. (1999) Quantification of bias related to the extraction of DNA directly from soils. Appl Environ Microbiol 65: 5409. Fryberger, S., Krystinik, L., and Schenk, C. (1990) Tidally flooded back-barrier dunefield, Guerrero Negro area, Baja California, Mexico. Sedimentology 37: 23-43. Fukuchi, S., and Nishikawa, K. (2001) Protein surface amino acid compositions distinctively differ between thermophilic and mesophilic bacteria. J Mol Biol 309: 835-843. Fukuchi, S., Yoshimune, K., Wakayama, M., Moriguchi, M., and Nishikawa, K. (2003) Unique amino acid composition of proteins in halophilic bacteria. J Mol Biol 327: 347-357.

Gaidos, E., Lanoil, B., Thorsteinsson, T., Graham, A., Skidmore, M., Han, S.K. et al. (2004) A Viable Microbial Community in a Subglacial Volcanic Crater Lake, Iceland. Astrobiology 4: 327-344. Galinski, E.A. (1993) Compatible solutes of halophilic eubacteria: molecular principles, water-solute interaction, stress protection. Cell Mol Life Sci 49: 487-496. Galinski, E.A., and Trüper, H.G. (1994) Microbial behaviour in salt-stressed ecosystems. FEMS Microbiol Rev 15: 95-108. Gall, J.L. (1963) A new species of Desulfovibrio. J Bacteriol 86: 1120. Gallon, J.R. (2001) N-2 fixation in phototrophs: adaptation to a specialized way of life. Plant and Soil 230: 39-48. Gallon, J.R., Hashem, M.A., and Chaplin, A.E. (1991) Nitrogen fixation by Oscillatoria spp. under autotrophic and photoheterotrophic conditions. Microbiology 137: 31. Gambacorta, A., Gliozzi, A., and Rosa, M. (1995) Archaeal lipids and their biotechnological applications. World Journal of Microbiology and Biotechnology 11: 115-131. Garcia-Pichel, F., Nübel, U., and Muyzer, G. (1998) The phylogeny of unicellular, extremely halotolerant cyanobacteria. Arch Microbiol 169: 469-482. Garrity, G.M., Brenner, D.J., Krieg, N.R., and Staley, J.R. (2005) Bergey's Manual of Systematic Bacteriology, Volume Two: The Proteobacteria, Parts A - C: Springer - Verlag. Gary Stacey, R.H.B., Harold J. Evans (1992) Biological nitrogen fixation New York Chapman & Hall. Gauthier, G., Gauthier, M., and Christen, R. (1995) Phylogenetic analysis of the genera Alteromonas, Shewanella, and Moritella using genes coding for small-subunit rRNA sequences and division of the genus Alteromonas into two genera, Alteromonas (emended) and Pseudoalteromonas gen. nov., and proposal of twelve new species combinations. Int J Syst Bacteriol 45: 755-761. Georgiadis, M., Komiya, H., Chakrabarti, P., Woo, D., Kornuc, J., and Rees, D. (1992) Crystallographic structure of the nitrogenase iron protein from Azotobacter vinelandii. Science 257: 1653. 
Georlette, D., Damien, B., Blaise, V., Depiereux, E., Uversky, V.N., Gerday, C., and Feller, G. (2003) Structural and Functional Adaptations to Extreme Temperatures in Psychrophilic, Mesophilic, and Thermophilic DNA Ligases. J Biol Chem 278: 37015-37023. Gerstein, M. (1998) How representative are the known structures of the proteins in a complete genome? A comprehensive structural census. Folding and Design 3: 497-512. Gilbert, D. (2003) Sequence File Format Conversion with Command Line Readseq. Current Protocols in Bioinformatics. Gilichinsky, D., Vishnivetskaya, T., Petrova, M., Spirina, E., Mamykin, V., and Rivkina, E. (2008) Bacteria in Permafrost. In Psychrophiles: from Biodiversity to Biotechnology. Margesin, R., Schinner, F., Marx, J.-C., and Gerday, C. (eds). Berlin, Germany: Springer, pp. 83-102.

Gilichinsky, D., Rivkina, E., Bakermans, C., Shcherbakova, V., Petrovskaya, L., Ozerskaya, S. et al. (2005) Biodiversity of cryopegs in permafrost. FEMS Microbiol Ecol 53: 117-128. Gilichinsky, D.A., Wilson, G.S., Friedmann, E.I., McKay, C.P., Sletten, R.S., Rivkina, E.M. et al. (2007) Microbial Populations in Antarctic Permafrost: Biodiversity, State, Age, and Implication for Astrobiology. Astrobiology 7: 275-311. Glaser, F., Rosenberg, Y., Kessel, A., Pupko, T., and Ben-Tal, N. (2005) The ConSurf-HSSP database: the mapping of evolutionary conservation among homologs onto PDB structures. Proteins 58: 610–617. Glaser, F., Pupko, T., Paz, I., Bell, R.E., Bechor-Shental, D., Martz, E., and Ben- Tal, N. (2003) ConSurf: identification of functional regions in proteins by surface- mapping of phylogenetic information. Bioinformatics 19: 163. Gliozzi, A., Relini, A., and Chong, P.L.G. (2002) Structure and permeability properties of biomimetic membranes of bolaform archaeal tetraether lipids. J Membr Sci 206: 131-147. Goh, F., Barrow, K.D., Burns, B.P., and Neilan, B.A. (2010) Identification and regulation of novel compatible solutes from hypersaline stromatolite-associated cyanobacteria. Arch Microbiol: 1-8. Goh, F., Jeon, Y.J., Barrow, K., Neilan, B.A., and Burns, B.P. (2011) Osmoadaptive Strategies of the Archaeon Halococcus hamelinensis Isolated from a Hypersaline Stromatolite Environment. Astrobiology 11: 529-536. Goh, F., Leuko, S., Allen, M., Bowman, J., Kamekura, M., Neilan, B., and Burns, B. (2006) Halococcus hamelinensis sp. nov., a novel halophilic archaeon isolated from stromatolites in Shark Bay, Australia. Int J Syst Evol Microbiol 56: 1323. Goh, F., Allen, M., Leuko, S., Kawaguchi, T., Decho, A., Burns, B., and Neilan, B. (2008) Determining the specific microbial populations and their spatial distribution within the stromatolite ecosystem of Shark Bay. The ISME Journal 3: 383-396. Goldenberg, O., Erez, E., Nimrod, G., and Ben-Tal, N. 
(2008) The ConSurf-DB: pre-calculated evolutionary conservation profiles of protein structures. Nucleic Acids Res. Goldsmith-Fischman, S., Kuzin, A., Edstrom, W.C., Benach, J., Shastry, R., Xiao, R. et al. (2004) The SufE Sulfur-acceptor Protein Contains a Conserved Core Structure that Mediates Interdomain Interactions in a Variety of Redox Protein Complexes. J Mol Biol 344: 549-565. Golubic, S., and Walter, M.R. (1976) Chapter 4.1 Organisms that Build Stromatolites. In Developments in Sedimentology: Elsevier, pp. 113-126. Good, I.J. (1953) THE POPULATION FREQUENCIES OF SPECIES AND THE ESTIMATION OF POPULATION PARAMETERS. In, pp. 237-264. Goto, M., Ando, S., Hachisuka, Y., and Yoneyama, T. (2005) Contamination of diverse nifH and nifH-like DNA into commercial PCR primers. FEMS Microbiol Lett 246: 33-38. Gragnani R, Guglielmin M, Stenni B, Longinelli A, Smiraglia C, and L, C. (1998) Origins of the ground ice in the ice-free lands of the Northern Foothills (Northern

Victoria Land, Antarctica). Lewkowicz, A.G., and Allard, M. (eds). Yellowknife, Canada: Collection Nordicana, pp. 335-340. Grant, K. (1938) The Radio-activity and Composition of the Water and Gases of the Paralana Hot Spring. Trans. Roy. Soc. SA 62: 2. Greaves, R.B., and Warwicker, J. (2009) Stability and solubility of proteins from extremophiles. Biochem Biophys Res Commun 380: 581-585. Grimm, F., Cort, J.R., and Dahl, C. (2010) DsrR, a novel IscA-like protein lacking iron-and Fe-S-binding functions, involved in the regulation of sulfur oxidation in Allochromatium vinosum. J Bacteriol 192: 1652-1661. Groudieva, T., Kambourova, M., Yusef, H., Royter, M., Grote, R., Trinks, H., and Antranikian, G. (2004) Diversity and cold-active hydrolytic enzymes of culturable bacteria associated with Arctic sea ice, Spitzbergen. Extremophiles 8: 475-488. Guglielmin, M., and French, H.M. (2004) Ground ice in the Northern Foothills, northern Victoria Land, Antarctica. Ann Glaciol 39: 495-500. Guglielmin, M., and Cannone, N. (2012) A permafrost warming in a cooling Antarctica? Clim Change 111: 177-195. Guglielmin, M., Biasini, A., and Smiraglia, C. (1997) The Contribution of Geoelectrical Investigations in the Analysis of Periglacial and Glacial Landforms in Ice Free Areas of the Northern Foothills (Northern Victoria Land, Antarctica). Geogr Ann Ser A PhyGeogr: 17-24. Guglielmin, M., Camusso, M., Polesello, S., Valsecchi, S., and Teruzzi, M. (2002) A Note on the Ice Crystallography and Geochemistry of a Debris Cone, Northern Foothills, Antarctica. Permafrost Periglac 13: 77-82. Guindon, S., and Gascuel, O. (2003) A Simple, Fast, and Accurate Algorithm to Estimate Large Phylogenies by Maximum Likelihood. Syst Biol 52: 696-704. Guindon, S., Dufayard, J., Lefort, V., Anisimova, M., Hordijk, W., and Gascuel, O. (2010) New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Syst Biol 59: 307. H.M. Berman, K.H., H. 
Nakamura (2003) Announcing the worldwide Protein Data Bank. Nature Structural Biology 10: 980. H.M.Berman, J.W., Z.Feng, G.Gilliland, T.N.Bhat, H.Weissig, I.N.Shindyalov, P.E.Bourne (2000) The Protein Data Bank. Nucleic Acids Res 28: 235-242. Hall, J.R., Mitchell, K.R., Jackson-Weaver, O., Kooser, A.S., Cron, B.R., Crossey, L.J., and Takacs-Vesbach, C.D. (2008) Molecular Characterization of the Diversity and Distribution of a Thermal Spring Microbial Community using rRNA and Metabolic Genes. Appl Environ Microbiol. Hall, T.A. (1999) BioEdit: a user-friendly biological sequence alignment editor and analysis program for Windows 95/98/NT. Nucl Acid Symp Ser 41: 95-98. Hamilton, T.L., Boyd, E.S., and Peters, J.W. (2011a) Environmental constraints underpin the distribution and phylogenetic diversity of nifH in the Yellowstone geothermal complex. Microb Ecol 61: 860-870.

Hamilton, T.L., Lange, R.K., Boyd, E.S., and Peters, J.W. (2011b) Biological nitrogen fixation in acidic high-temperature geothermal springs in Yellowstone National Park, Wyoming. Environ Microbiol 13: 2204-2215. Hammer, Ø., Harper, D.A.T., and Ryan, P.D. (2001) PAST: paleontological statistics software package for education and data analysis. Palaeontologia electronica 4: 9. Handley, K.M., Boothman, C., Mills, R.A., Pancost, R.D., and Lloyd, J.R. (2010) Functional diversity of bacteria in a ferruginous hydrothermal sediment. The ISME Journal. Haney, P.J., Badger, J.H., Buldak, G.L., Reich, C.I., Woese, C.R., and Olsen, G.J. (1999) Thermal adaptation analyzed by comparison of protein sequences from mesophilic and extremely thermophilic Methanococcus species. Proceedings of the National Academy of Sciences 96: 3578. Hartmann, L.S., and Barnum, S.R. (2010) Inferring the evolutionary history of Mo-dependent nitrogen fixation from phylogenetic studies of nifK and nifDK. J Mol Evol: 1-16. Hawkes, T.R., McLean, P.A., and Smith, B.E. (1984) Nitrogenase from nifV mutants of Klebsiella pneumoniae contains an altered form of the iron-molybdenum cofactor. Biochem J 217: 317. Head, I., Saunders, J., and Pickup, R. (1998) Microbial evolution, diversity, and ecology: a decade of ribosomal RNA analysis of uncultivated microorganisms. Microb Ecol 35: 1-21. Heeren, T., and D'Agostino, R. (1987) Robustness of the two independent samples t-test when applied to ordinal scaled data. Stat Med 6: 79-90. Henikoff, J.G., Greene, E.A., Pietrokovski, S., and Henikoff, S. (2000) Increased coverage of protein families with the blocks database servers. Nucleic Acids Res 28: 228. Henikoff, S., Henikoff, J.G., and Pietrokovski, S. (1999) Blocks+: a non-redundant database of protein alignment blocks derived from multiple compilations. Bioinformatics 15: 471. Henry, E., Devereux, R., Maki, J., Gilmour, C., Woese, C., Mandelco, L. et al.
(1994) Characterization of a new thermophilic sulfate-reducing bacterium. Arch Microbiol 161: 62-69. Herbert, R.A., and Sharp, R. (1992) Molecular biology and biotechnology of extremophiles: Blackie and Son Ltd. Hewson, I., Moisander, P.H., Morrison, A.E., and Zehr, J.P. (2007) Diazotrophic bacterioplankton in a coral reef lagoon: phylogeny, diel nitrogenase expression and response to phosphate enrichment. ISME J 1: 78-91. Hirsch, P., Ludwig, W., Hethke, C., Sittig, M., Hoffmann, B., and Gallikowski, C.A. (1998) Hymenobacter roseosalivarius gen. nov., sp. nov. from continental Antartica soils and sandstone: bacteria of the Cytophaga/Flavobacterium/Bacteroides line of phylogenetic descent. Syst Appl Microbiol 21: 374-383. Hirschler-Réa, A., Matheron, R., Riffaud, C., Mouné, S., Eatock, C., Herbert, R.A. et al. (2003) Isolation and characterization of spirilloid purple phototrophic bacteria

forming red layers in microbial mats of Mediterranean salterns: description of Halorhodospira neutriphila sp. nov. and emendation of the genus Halorhodospira. Int J Syst Evol Microbiol 53: 153-163. Hobohm, U., Scharf, M., Schneider, R., and Sander, C. (1992) Selection of representative protein data sets. Protein Sci 1: 409-417. Hoehler, T.M., Bebout, B.M., and Des Marais, D.J. (2001) The role of microbial mats in the production of reduced gases on the early Earth. Nature 412: 324-327. Hoffman, P. (1976) Stromatolite Morphogenesis in Shark Bay, Western Australia. Developments in Sedimentology 20: 261-271. Hoffman, P., and Walter, M.R. (1976) Chapter 6.1 Stromatolite Morphogenesis in Shark Bay, Western Australia. In Developments in Sedimentology: Elsevier, pp. 261-271. Howard, J.B., and Rees, D.C. (1996) Structural Basis of Biological Nitrogen Fixation. Chem Rev 96: 2965-2982. Huber, R., Eder, W., Heldwein, S., Wanner, G., Huber, H., Rachel, R., and Stetter, K.O. (1998) Thermocrinis ruber gen. nov., sp. nov., a pink-filament-forming hyperthermophilic bacterium isolated from Yellowstone National Park. Appl Environ Microbiol 64: 3576-3583. Imhoff, J., Suling, J., and Petri, R. (1998) Phylogenetic relationships among the Chromatiaceae, their taxonomic reclassification and description of the new genera Allochromatium, Halochromatium, Isochromatium, Marichromatium, Thiococcus, Thiohalocapsa and Thermochromatium. Int J Syst Evol Microbiol 48: 1129. Imhoff, J.F. (2006) The family Ectothiorhodospiraceae. The Prokaryotes: 874-886. Imshenetsky, A.A., Abyzov, S.S., Voronov, G.T., Kuzjurina, L.A., Lysenko, S.V., Sotnikov, G.G., and Fedorova, R.I. (1967) Exobiology and the effect of physical factors on micro-organisms. Life Sci Space Res 5: 250-260. Ionescu, D., Hindiyeh, M., Malkawi, H., and Oren, A. (2010) Biogeography of thermophilic cyanobacteria: insights from the Zerka Ma'in hot springs (Jordan). FEMS Microbiol Ecol 72: 103-113.
Israel, G., Cabane, M., Coll, P., Coscia, D., Raulin, F., and Niemann, H. (1999) The Cassini-Huygens ACP experiment and exobiological implications. Adv Space Res 23: 319-331. Izquierdo, J.A., and Nüsslein, K. (2006) Distribution of extensive nifH gene diversity across physical soil microenvironments. Microb Ecol 51: 441-452. Jaenicke, R. (1996) How Do Proteins Acquire Their Three-Dimensional Structure and Stability? Naturwissenschaften 83: 544-554. Jaenicke, R., and Böhm, G. (1998) The stability of proteins in extreme environments. Curr Opin Struct Biol 8: 738-748. Jahnert, R.J., and Collins, L.B. (2011) Significance of subtidal microbial deposits in Shark Bay, Australia. Mar Geol. Jang, S.B., Seefeldt, L.C., and Peters, J.W. (2000) Insights into nucleotide signal transduction in nitrogenase: structure of an iron protein with MgADP bound. Biochemistry 39: 14745-14752.

Jang, S.B., Jeong, M.S., Seefeldt, L.C., and Peters, J.W. (2004) Structural and biochemical implications of single amino acid substitutions in the nucleotide-dependent switch regions of the nitrogenase Fe protein from Azotobacter vinelandii. Journal of Biological Inorganic Chemistry 9: 1028-1033. Jannasch, H.W., and Wirsen, C.O. (1981) Morphological survey of microbial mats near deep-sea thermal vents. Appl Environ Microbiol 41: 528-538. Javor, B.J., and Castenholz, R.W. (1981) Laminated microbial mats, laguna Guerrero Negro, Mexico. Geomicrobiol J 2: 237 - 273. Jeffrey O. Dawson, and Gibson, A.H. (1987) Sensitivity of selected Frankia isolates from Casuarina, Allocasuarina and North American host plants to sodium chloride. Physiol Plant 70: 272-278. Jenkins, B.D., Steward, G.F., Short, S.M., Ward, B.B., and Zehr, J.P. (2004) Fingerprinting diazotroph communities in the Chesapeake Bay by using a DNA macroarray. Appl Environ Microbiol 70: 1767-1776. Jimenez-Lopez, J.C., Gachomo, E.W., Seufferheld, M.J., and Kotchoni, S.O. (2010) The maize ALDH protein superfamily: linking structural features to functional specificities. BMC Struct Biol 10: 43. Jørgensen, B., and Des Marais, D. (1990) The diffusive boundary layer of sediments: Oxygen microgradients over a microbial mat. Limnol Oceanogr 35: 1343-1355. Jungblut, A.D., and Neilan, B.A. (2010) NifH gene diversity and expression in a microbial mat community on the McMurdo Ice Shelf, Antarctica. Antarct Sci 22: 117- 122. Jungblut, A.D., Hawes, I., Mountfort, D., Hitzfeld, B., Dietrich, D.R., Burns, B.P., and Neilan, B.A. (2005) Diversity within cyanobacterial mat communities in variable salinity meltwater ponds of McMurdo Ice Shelf, Antarctica. Environ Microbiol 7: 519- 529. Kabsch, W. (1976) A solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 32: 922-923. Kabsch, W. 
(1978) A discussion of the solution for the best rotation to relate two sets of vectors. Acta Crystallographica Section A: Crystal Physics, Diffraction, Theoretical and General Crystallography 34: 827-828. Kabsch, W., and Sander, C. (1983) Dictionary of protein secondary structure: pattern recognition of hydrogen bonded and geometrical features. Biopolymers 22: 2577-2637. Kashefi, K., Holmes, D.E., Baross, J.A., and Lovley, D.R. (2003) Thermophily in the Geobacteraceae: Geothermobacter ehrlichii gen. nov., sp. nov., a Novel Thermophilic Member of the Geobacteraceae from the "Bag City" Hydrothermal Vent. Appl Environ Microbiol 69: 2985. Kaštovský, J., and Johansen, J.R. (2008) Mastigocladus laminosus (Stigonematales, Cyanobacteria): phylogenetic relationship of strains from thermal springs to soil-inhabiting genera of the order and taxonomic implications for the genus. Phycologia 47: 307-320.

Katoh, K., and Toh, H. (2008) Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics 9: 286-298. Katoh, K., Misawa, K., Kuma, K., and Miyata, T. (2002) MAFFT: a novel method for rapid multiple sequence alignment based on fast Fourier transform. Nucleic Acids Res 30: 3059. Kawasumi, T., Igarashi, Y., Kodama, T., and Minoda, Y. (1984) Hydrogenobacter thermophilus gen. nov., sp. nov., an Extremely Thermophilic, Aerobic, Hydrogen-Oxidizing Bacterium. Int J Syst Bacteriol 34: 5-10. Kent, H., Buck, M., and Evans, D. (1989) Cloning and sequencing of the nifH gene of Desulfovibrio gigas. FEMS Microbiol Lett 61: 73-78. Kim, J., and Rees, D.C. (1994) Nitrogenase and biological nitrogen fixation. Biochemistry 33: 389-397. Kim, J., Woo, D., and Rees, D. (1993) X-ray crystal structure of the nitrogenase molybdenum-iron protein from Clostridium pasteurianum at 3.0-Å resolution. Biochemistry 32: 7104-7115. Klatt, C.G., Wood, J.M., Rusch, D.B., Bateson, M.M., Hamamura, N., Heidelberg, J.F. et al. (2011) Community ecology of hot spring cyanobacterial mats: predominant populations and their functional potential. The ISME Journal 5: 1262-1278. Klopprogge, K., Grabbe, R., Hoppert, M., and Schmitz, R.A. (2002) Membrane association of Klebsiella pneumoniae NifL is affected by molecular oxygen and combined nitrogen. Arch Microbiol 177: 223-234. Kochkina, G.A., Ivanushkina, N.E., Karasev, S.G., Gavrish, E.Y., Gurina, L.V., Evtushenko, L.I. et al. (2001) Survival of Micromycetes and Actinobacteria under Conditions of Long-Term Natural Cryopreservation. Microbiology 70: 356-364. Krebs, C. (1989) Ecological methodology: Harper & Row New York. Krylov, I.N., Semikhatov, M.A., and Walter, M.R. (1976) Appendix II Table of Time-Ranges of the Principal Groups of Precambrian Stromatolites. In Developments in Sedimentology: Elsevier, pp. 693-694. Kumar, M., Ahmad, S., Ahmad, E., Saifi, M.A., and Khan, R.H. (2012) In Silico Prediction and Analysis of Caenorhabditis EF-hand Containing Proteins. PLoS ONE 7: e36770. Kumar, S., and Nussinov, R. (2001) How do thermophilic proteins deal with heat? Cell Mol Life Sci 58: 1216-1233. Kumar, S., Nei, M., Dudley, J., and Tamura, K. (2008) MEGA: a biologist-centric software for evolutionary analysis of DNA and protein sequences. Briefings in Bioinformatics 9: 299. Ladunga, I. (2002a) Finding Similar Nucleotide Sequences Using Network BLAST Searches: John Wiley & Sons, Inc. Ladunga, I. (2002b) Finding Homologs in Amino Acid Sequences Using Network BLAST Searches: John Wiley & Sons, Inc. Landau, M., Mayrose, I., Rosenberg, Y., Glaser, F., Martz, E., Pupko, T., and Ben-Tal, N. (2005) ConSurf 2005: the projection of evolutionary conservation scores of residues on protein structures. Nucleic Acids Res 33: W299.

Lanyi, J. (1974) Salt-dependent properties of proteins from extremely halophilic bacteria. Microbiology and Molecular Biology Reviews 38: 272. Lanzilotta, W.N., Ryle, M.J., and Seefeldt, L.C. (1995) Nucleotide Hydrolysis and Protein Conformational Changes in Azotobacter vinelandii Nitrogenase Iron Protein: Defining the Function of Aspartate 129. Biochemistry 34: 10713-10723. Lanzilotta, W.N., Fisher, K., and Seefeldt, L.C. (1996) Evidence for electron transfer from the nitrogenase iron protein to the molybdenum-iron protein without MgATP hydrolysis: characterization of a tight protein-protein complex. Biochemistry 35: 7188-7196. Larkin, M.A., Blackshields, G., Brown, N.P., Chenna, R., McGettigan, P.A., McWilliam, H. et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947. Latysheva, N., Junker, V.L., Palmer, W.J., Codd, G.A., and Barker, D. (2012) The evolution of nitrogen fixation in cyanobacteria. Bioinformatics 28: 603-606. Le, S.Q., and Gascuel, O. (2008) An Improved General Amino Acid Replacement Matrix. Mol Biol Evol 25: 1307-1320. Leigh, J.A. (2000) Nitrogen fixation in methanogens: the archaeal perspective. Curr Issues Mol Biol 2: 125-131. Leipe, D.D., Wolf, Y.I., Koonin, E.V., and Aravind, L. (2002) Classification and evolution of P-loop GTPases and related ATPases. J Mol Biol 317: 41-72. Leuko, S., Goh, F., Allen, M., Burns, B., Walter, M., and Neilan, B. (2007) Analysis of intergenic spacer region length polymorphisms to investigate the halophilic archaeal diversity of stromatolites and microbial mats. Extremophiles 11: 203-210. Ley, R., Harris, J., Wilcox, J., Spear, J., Miller, S., Bebout, B. et al. (2006) Unexpected diversity and complexity of the Guerrero Negro hypersaline microbial mat. Appl Environ Microbiol 72: 3685. Li, X.-D., Huergo, L.F., Gasperina, A., Pedrosa, F.O., Merrick, M., and Winkler, F.K. 
(2009) Crystal Structure of Dinitrogenase Reductase-activating Glycohydrolase (DRAG) Reveals Conservation in the ADP-Ribosylhydrolase Fold and Specific Features in the ADP-Ribose-binding Pocket. J Mol Biol 390: 737-746. Liesack, W., and Dunfield, P.F. (2004) T-RFLP Analysis: A Rapid Fingerprinting Method for Studying Diversity, Structure, and Dynamics of Microbial Communities. In Environmental Microbiology: Methods and Protocols. Spencer, J.F.T., and Ragout de Spencer, A.L. (eds). Totowa, New Jersey: Springer, pp. 23-38. Lilburn, T., Kim, K., Ostrom, N., Byzek, K., Leadbetter, J., and Breznak, J. (2001) Nitrogen fixation by symbiotic and free-living spirochetes. Science 292: 2495. Liu, W.T., Marsh, T.L., Cheng, H., and Forney, L.J. (1997) Characterization of microbial diversity by determining terminal restriction fragment length polymorphisms of genes encoding 16S rRNA. Appl Environ Microbiol 63: 4516-4522. Liu, Y., Yao, T., Jiao, N., Kang, S., Zeng, Y., and Huang, S. (2006) Microbial community structure in moraine lakes and glacial meltwaters, Mount Everest. FEMS Microbiol Lett 265: 98-105.

Lo Giudice, A., Brilli, M., Bruni, V., De Domenico, M., Fani, R., and Michaud, L. (2007) Bacterium-bacterium inhibitory interactions among psychrotrophic bacteria isolated from Antarctic seawater (Terra Nova Bay, Ross Sea). FEMS Microbiol Ecol 60: 383-396. Logan, B. (1961) Cryptozoon and associate stromatolites from the recent, Shark Bay, Western Australia. The Journal of Geology 69: 517-533. Logan, B., and Cebulski, D. (1970) Sedimentary environments of Shark Bay, Western Australia. Am. Assoc. Pet. Geol. Mem 13: 1-37. Logan, B., Rezak, R., and Ginsburg, R. (1964) Classification and environmental significance of algal stromatolites. The Journal of Geology 72: 68-83. Logan, B., Hoffman, P., and Gebelein, C. (1974) Algal mats, cryptalgal fabrics and structures. Hamelin Pool, Western Australia: American Association of Petroleum Geologists Memoir 22: 140-194. Logan, B., Davies, G., Read, J., and Cebulski, D. (1970) Carbonate sedimentation and environments, Shark Bay, Western Australia: AAPG. Long, N., McPhail, D., Brugger, J., and Plimer, I. (2001) Geochemical and thermal characterisation of the Paralana Hot Springs, northern Flinders Ranges, South Australia: Geological Society of Australia; 1999, pp. 35-35. López-Cortés, A., García-Pichel, F., Nübel, U., and Vázquez-Juárez, R. (2001) Cyanobacterial diversity in extreme environments in Baja California, Mexico: a polyphasic study. Int Microbiol 4: 227-236. Lozupone, C., and Knight, R. (2005) UniFrac: a new phylogenetic method for comparing microbial communities. Appl Environ Microbiol 71: 8228. Lozupone, C., Hamady, M., and Knight, R. (2006) UniFrac – An online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics 7: 371. Lysnes, K., Thorseth, I.H., Steinsbu, B.O., Ovreas, L., Torsvik, T., and Pedersen, R.B. (2004) Microbial community diversity in seafloor basalt from the Arctic spreading ridges. FEMS Microbiol Ecol 50: 213-230. 
Ma, Y., Galinski, E.A., Grant, W.D., Oren, A., and Ventosa, A. (2010) Halophiles 2010: Life in Saline Environments. Appl Environ Microbiol 76: 6971. Mack, E.E., Mandelco, L., Woese, C.R., and Madigan, M.T. (1993) Rhodospirillum sodomense, sp. nov., a Dead Sea Rhodospirillum species. Arch Microbiol 160: 363-371. Madern, D., Pfister, C., and Zaccai, G. (1995) Mutation at a single acidic amino acid enhances the halophilic behaviour of malate dehydrogenase from Haloarcula marismortui in physiological salts. Eur J Biochem 230: 1088-1095. Madern, D., Ebel, C., and Zaccai, G. (2000) Halophilic adaptation of enzymes. Extremophiles 4: 91-98. Madigan, M., Cox, S.S., and Stegeman, R.A. (1984) Nitrogen fixation and nitrogenase activities in members of the family Rhodospirillaceae. J Bacteriol 157: 73-78.

Man-Aharonovich, D., Kress, N., Zeev, E.B., Berman-Frank, I., and Beja, O. (2007) Molecular ecology of nifH genes and transcripts in the eastern Mediterranean Sea. In, pp. 2354-2363. Mannisto, M.K., and Haggblom, M.M. (2006) Characterization of psychrotolerant heterotrophic bacteria from Finnish Lapland. Syst Appl Microbiol 29: 229-243. Marchesi, J.R., Sato, T., Weightman, A.J., Martin, T.A., Fry, J.C., Hiom, S.J., and Wade, W.G. (1998) Design and evaluation of useful bacterium-specific PCR primers that amplify genes coding for bacterial 16S rRNA. Appl Environ Microbiol 64: 795. Marsh, T.L. (1999) Terminal restriction fragment length polymorphism (T-RFLP): An emerging method for characterizing diversity among homologous populations of amplification products. Curr Opin Microbiol 2: 323-327. Marsh, T.L. (2005) Culture-independent microbial community analysis with terminal restriction fragment length polymorphism. Methods Enzymol 397: 308-329. Marsh, T.L., Saxman, P., Cole, J., and Tiedje, J. (2000) Terminal Restriction Fragment Length Polymorphism Analysis Program, a Web-Based Research Tool for Microbial Community Analysis. Appl Environ Microbiol 66: 3616-3620. Marteinsson, V.T., Birrien, J.-L., Reysenbach, A.-L., Vernet, M., Marie, D., Gambacorta, A. et al. (1999) Thermococcus barophilus sp. nov., a new barophilic and hyperthermophilic archaeon isolated under high hydrostatic pressure from a deep-sea hydrothermal vent. Int J Syst Bacteriol 49: 351-359. Martin, A.P. (2002) Phylogenetic approaches for describing and comparing the diversity of microbial communities. Appl Environ Microbiol 68: 3673. Mawson, D. (1927) The Paralana hot spring. Trans R Soc S Aust 20: 391–397. McGinnis, S., and Madden, T.L. (2004) BLAST: at the core of a powerful and diverse set of sequence analysis tools. Nucleic Acids Res 32: W20. McGuinness, L.M., Salganik, M., Vega, L., Pickering, K.D., and Kerkhof, L.J. 
(2006) Replicability of Bacterial Communities in Denitrifying Bioreactors as Measured by PCR/T-RFLP Analysis. Environ Sci Technol 40: 509-515. McKay, C.P., Friedmann, E.I., and Meyer, M.A. (1991) From Siberia to Mars. Planet Rep Mar-Apr: 8-11. Mehta, M.P., and Baross, J.A. (2006) Nitrogen fixation at 92°C by a hydrothermal vent archaeon. Science 314: 1783-1786. Mehta, M.P., Butterfield, D.A., and Baross, J.A. (2003) Phylogenetic diversity of nitrogenase (nifH) genes in deep-sea and hydrothermal vent environments of the Juan de Fuca Ridge. Appl Environ Microbiol 69: 960. Meng, E., Pettersen, E., Couch, G., Huang, C., and Ferrin, T. (2006) Tools for integrated sequence-structure analysis with UCSF Chimera. BMC Bioinformatics 7: 339. Meng, L., and Feldman, L.J. (2010) CLE14/CLE20 peptides may interact with CLAVATA2/CORYNE receptor-like kinases to irreversibly inhibit cell division in the root meristem of Arabidopsis. Planta 232: 1061-1074. Meng, L., Wong, J.H., Feldman, L.J., Lemaux, P.G., and Buchanan, B.B. (2010) A membrane-associated thioredoxin required for plant growth moves from cell to cell, suggestive of a role in intercellular communication. Proceedings of the National Academy of Sciences 107: 3900-3905. Methé, B.A., Webster, J., Nevin, K., Butler, J., and Lovley, D.R. (2005) DNA microarray analysis of nitrogen fixation and Fe(III) reduction in Geobacter sulfurreducens. Appl Environ Microbiol 71: 2530. Michaud, L., Cello, F., Brilli, M., Fani, R., Giudice, A., and Bruni, V. (2004) Biodiversity of cultivable psychrotrophic marine bacteria isolated from Terra Nova Bay (Ross Sea, Antarctica). FEMS Microbiol Lett 230: 63-71. Miller, S.R., Castenholz, R.W., and Pedersen, D. (2007) Phylogeography of the thermophilic cyanobacterium Mastigocladus laminosus. Appl Environ Microbiol 73: 4751. Miller, S.R., Strong, A.L., Jones, K.L., and Ungerer, M.C. (2009) Bar-coded pyrosequencing reveals shared bacterial community properties along the temperature gradients of two alkaline hot springs in Yellowstone National Park. Appl Environ Microbiol 75: 4565. Mindlin, S., Soina, V., Petrova, M., and Gorlenko, Z. (2008) Isolation of antibiotic resistance bacterial strains from Eastern Siberia permafrost sediments. Russ J Genet 44: 27-34. Mishustin, E.N., and Shilnikova, V.K. (1971) Biological fixation of atmospheric nitrogen. London: Macmillan, 420 pp. Miteva, V. (2008) Bacteria in Snow and Glacier Ice. In Psychrophiles: from Biodiversity to Biotechnology. Margesin, R., Schinner, F., Marx, J.-C., and Gerday, C. (eds). Berlin, Germany: Springer, pp. 31-50. Miteva, V.I., and Brenchley, J.E. (2005) Detection and Isolation of Ultrasmall Microorganisms from a 120,000-Year-Old Greenland Glacier Ice Core. Appl Environ Microbiol 71: 7806-7818. Miteva, V.I., Sheridan, P.P., and Brenchley, J.E. (2004) Phylogenetic and Physiological Diversity of Microorganisms Isolated from a Deep Greenland Glacier Ice Core. Appl Environ Microbiol 70: 202-213. Miyamoto, K., Hallenbeck, P.C., and Benemann, J.R. 
(1979) Nitrogen fixation by thermophilic blue-green algae (cyanobacteria): temperature characteristics and potential use in biophotolysis. Appl Environ Microbiol 37: 454. Moeseneder, M.M., Arrieta, J.M., Muyzer, G., Winter, C., and Herndl, G.J. (1999) Optimization of Terminal-Restriction Fragment Length Polymorphism Analysis for Complex Marine Bacterioplankton Communities and Comparison with Denaturing Gradient Gel Electrophoresis. In, pp. 3518-3525. Mohamed, N., Colman, A., Tal, Y., and Hill, R. (2008a) Diversity and expression of nitrogen fixation genes in bacterial symbionts of marine sponges. Environ Microbiol 10: 2910-2921. Mohamed, N.M., Colman, A.S., Tal, Y., and Hill, R.T. (2008b) Diversity and expression of nitrogen fixation genes in bacterial symbionts of marine sponges. Environ Microbiol 10: 2910-2921.

Moisander, P.H., Morrison, A.E., Ward, B.B., Jenkins, B.D., and Zehr, J.P. (2007) Spatial-temporal variability in diazotroph assemblages in Chesapeake Bay using an oligonucleotide nifH microarray. Environ Microbiol 9: 1823-1835. Moisander, P.H., Shiue, L., Steward, G.F., Jenkins, B.D., Bebout, B.M., and Zehr, J.P. (2006) Application of a nifH oligonucleotide microarray for profiling diversity of N2-fixing microorganisms in marine microbial mats. Environ Microbiol 8: 1721-1735. Mooney, C., Davey, N., Martin, A., Walsh, I., Shields, D.C., and Pollastri, G. (2011) In silico protein motif discovery and structural analysis. In Methods in molecular biology. Yu, B., and Hinchcliffe, M. (eds). Clifton, NJ: Springer Science+Business Media, pp. 341-353. Moret, M., and Zebende, G. (2007) Amino acid hydrophobicity and accessible surface area. Physical Review E 75: 011920. Moses, J., Fouchet, T., Bézard, B., Gladstone, G., Lellouch, E., and Feuchtgruber, H. (2005) Photochemistry and diffusion in Jupiter's stratosphere: Constraints from ISO observations and comparisons with other giant planets. J. Geophys. Res 110: E08001. Motulsky, H., and Christopoulos, A. (2004) Fitting models to biological data using linear and nonlinear regression: a practical guide to curve fitting: Oxford University Press, USA. Motulsky, H.J., and Brown, R.E. (2006) Detecting outliers when fitting data with nonlinear regression–a new method based on robust nonlinear regression and the false discovery rate. BMC Bioinformatics 7: 123. Moult, J. (2005) A decade of CASP: progress, bottlenecks and prognosis in protein structure prediction. Curr Opin Struct Biol 15: 285-289. Muller, S.W. (1947) Permafrost or, Permanently frozen ground and related engineering problems (Strategic engineering study). Ann Arbor, Michigan: Edwards Brothers, 231 pp. Mullis, K., and Erlich, H. (1988) Primer-directed enzymatic amplification of DNA with a thermostable DNA polymerase. Science 239: 487–491. 
Musat, F., Harder, J., and Widdel, F. (2006) Study of nitrogen fixation in microbial communities of oil-contaminated marine sediment microcosms. Environ Microbiol 8: 1834-1843. Muyzer, G., de Waal, E.C., and Uitterlinden, A.G. (1993) Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol 59: 695-700. Nakazawa, H., Arakaki, A., Narita-Yamada, S., Yashiro, I., Jinno, K., Aoki, N. et al. (2009) Whole genome sequence of Desulfovibrio magneticus strain RS-1 revealed common gene clusters in magnetotactic bacteria. Genome Res 19: 1801. NASA (2012). Missions. URL http://science.nasa.gov/earth-science/missions/ Needleman, S.B., and Wunsch, C.D. (1970) A general method applicable to the search for similarities in the amino acid sequence of two proteins. J Mol Biol 48: 443-453.

Neilan, B.A. (1995) Identification and Phylogenetic Analysis of Toxigenic Cyanobacteria by Multiplex Randomly Amplified Polymorphic DNA PCR. In, pp. 2286-2291. Néron, B., Ménager, H., Maufrais, C., Joly, N., Maupetit, J., Letort, S. et al. (2009) Mobyle: a new full web bioinformatics framework. Bioinformatics 25: 3005. Nicolaus, B., Lama, L., Esposito, E., Manca, M., Gambacorta, A., and Prisco, G. (1996) “Bacillus thermoantarcticus” sp. nov., from Mount Melbourne, Antarctica: a novel thermophilic species. Polar Biol 16: 101-104. Nicolaus, B., Improta, R., Manca, M.C., Lama, L., Esposito, E., and Gambacorta, A. (1998) Alicyclobacilli from an unexplored geothermal soil in Antarctica: Mount Rittmann. Polar Biol 19: 133-141. Nicolaus, B., Marsiglia, F., Esposito, E., Trincone, A., Lama, L., Sharp, R. et al. (1991) Isolation of five strains of thermophilic eubacteria in Antarctica. Polar Biol 11: 425-429. Niederberger, T.D., McDonald, I.R., Hacker, A.L., Soo, R.M., Barrett, J.E., Wall, D.H., and Cary, S.C. (2008) Microbial community composition in soils of Northern Victoria Land, Antarctica. Environ Microbiol 10: 1713-1724. Nishikawa, K., Kubota, Y., and Tatsuo, O. (1983) Classification of proteins into groups based on amino acid composition and other characters. I. Angular distribution. Journal of biochemistry 94: 981-995. Nuin, P.A.S., Wang, Z., and Tillier, E.R.M. (2006) The accuracy of several multiple sequence alignment programs for proteins. BMC Bioinformatics 7: 471. Néron, B., Tufféry, P., and Letondal, C. (2005) Mobyle: a Web portal framework for bioinformatics analyses. Network Tools and Applications in Biology (poster), Naples, Italy. O'Leary, M.J., Hearty, P.J., and McCulloch, M.T. (2008) U-series evidence for widespread reef development in Shark Bay during the last interglacial. Palaeogeography, Palaeoclimatology, Palaeoecology 259: 424-435. Okon, Y. (1985) Azospirillum as a potential inoculant for agriculture. Trends Biotechnol 3: 223-228. Oliveros, J. 
(2007) VENNY. An interactive tool for comparing lists with Venn Diagrams. In: BioinfoGP, CNB-CSIC. URL http://bioinfogp.cnb.csic.es/tools/venny/index.html [accessed on 30 April 2009]. Olson, N., Ainsworth, T., Gates, R., and Takabayashi, M. (2009) Diazotrophic bacteria associated with Hawaiian Montipora corals: diversity and abundance in correlation with symbiotic dinoflagellates. J Exp Mar Biol Ecol 371: 140-146. Omoregie, E., Crumbliss, L., Bebout, B., and Zehr, J. (2004a) Determination of nitrogen-fixing phylotypes in Lyngbya sp. and Microcoleus chthonoplastes cyanobacterial mats from Guerrero Negro, Baja California, Mexico. Appl Environ Microbiol 70: 2119. Omoregie, E.O., Crumbliss, L.L., Bebout, B.M., and Zehr, J.P. (2004b) Comparison of diazotroph community structure in Lyngbya sp. and Microcoleus chthonoplastes dominated microbial mats from Guerrero Negro, Baja, Mexico. FEMS Microbiol Ecol 47: 305-318. Omoregie, E.O., Crumbliss, L.L., Bebout, B.M., and Zehr, J.P. (2004c) Determination of nitrogen-fixing phylotypes in Lyngbya sp. and Microcoleus chthonoplastes cyanobacterial mats from Guerrero Negro, Baja California, Mexico. Appl Environ Microbiol 70: 2119-2128. Oren, A. (1986) Intracellular salt concentrations of the anaerobic halophilic eubacteria Haloanaerobium praevalens and Halobacteroides halobius. Can J Microbiol 32: 4-9. Oren, A. (1999) Bioenergetic aspects of halophilism. Microbiology and Molecular Biology Reviews 63: 334. Oren, A. (2002) Diversity of halophilic microorganisms: environments, phylogeny, physiology, and applications. J Ind Microbiol Biotechnol 28: 56-63. Oren, A., Kessel, M., and Stackebrandt, E. (1989) Ectothiorhodospira marismortui sp. nov., an obligately anaerobic, moderately halophilic purple sulfur bacterium from a hypersaline sulfur spring on the shore of the Dead Sea. Arch Microbiol 151: 524-529. Oren, A., Ionescu, D., Hindiyeh, M., and Malkawi, H. (2009) Morphological, phylogenetic and physiological diversity of cyanobacteria in the hot springs of Zerka Ma. BioRisk 3: 69. Orombelli, G., Baroni, C., and Denton, G. (1991) Late Cenozoic glacial history of the Terra Nova Bay region, northern Victoria Land, Antarctica. Geogr Fis Din Quat 13: 139-163. Osborn, A.M., Moore, E.R.B., and Timmis, K.N. (2000) An evaluation of terminal restriction fragment length polymorphism (T-RFLP) analysis for the study of microbial community structure and dynamics. Environ Microbiol 2: 39-50. Ostroumov, V., and Siegert, C. (1996) Exobiological aspects of mass transfer in microzones of permafrost deposits. Adv Space Res 18: 79-86. Paerl, H.W., Pinckney, J.L., and Steppe, T.F. (2000) Cyanobacterial-bacterial mat consortia: examining the functional unit of microbial survival and growth in extreme environments. Environ Microbiol 2: 11-26. 
Paerl, H.W., Steppe, T.F., Buchan, K.C., and Potts, M. (2003) Hypersaline cyanobacterial mats as indicators of elevated tropical hurricane activity and associated climate change. AMBIO: A Journal of the Human Environment 32: 87-90. Pandey, K.D., Shukla, S.P., Shukla, P.N., Giri, D.D., Singh, J.S., Singh, P., and Kashyap, A.K. (2004) Cyanobacteria in Antarctica: ecology, physiology and cold adaptation. Cell Mol Biol (Noisy-le-grand) 50: 575-584. Papineau, D., Walker, J., Mojzsis, S., and Pace, N. (2005) Composition and structure of microbial communities from stromatolites of Hamelin Pool in Shark Bay, Western Australia. Appl Environ Microbiol 71: 4822. Paster, B., Dewhirst, F., Weisburg, W., Tordoff, L., Fraser, G., Hespell, R. et al. (1991) Phylogenetic analysis of the spirochetes. J Bacteriol 173: 6101. Patel, B., Morgan, H., and Daniel, R. (1985) Thermophilic anaerobic spirochetes in New Zealand hot springs. FEMS Microbiol Lett 26: 101-106.

Pätzold, M., Häusler, B., Bird, M., Tellmann, S., Mattei, R., Asmar, S. et al. (2007) The structure of Venus’ middle atmosphere and ionosphere. Nature 450: 657-660. Paul, S., Bag, S.K., Das, S., Harvill, E.T., and Dutta, C. (2008) Molecular signature of hypersaline adaptation: insights from genome and proteome composition of halophilic prokaryotes. Genome Biol 9: R70. Pearson, W.R. (1991) Searching protein sequence libraries: comparison of the sensitivity and selectivity of the Smith-Waterman and FASTA algorithms. Genomics 11: 635-650. Pennisi, E. (1997) Biotechnology: in industry, extremophiles begin to make their mark. Science 276: 705. Pepi, M., Agnorelli, C., and Bargagli, R. (2005) Iron Demand by Thermophilic and Mesophilic Bacteria Isolated from an Antarctic Geothermal Soil. BioMetals 18: 529-536. Pernthaler, A., Dekas, A.E., Brown, C.T., Goffredi, S.K., Embaye, T., and Orphan, V.J. (2008) Diverse syntrophic partnerships from deep-sea methane vents revealed by direct cell capture and metagenomics. Proceedings of the National Academy of Sciences 105: 7052. Perreault, N.N., Andersen, D.T., Pollard, W.H., Greer, C.W., and Whyte, L.G. (2007) Characterization of the Prokaryotic Diversity in Cold Saline Perennial Springs of the Canadian High Arctic. Appl Environ Microbiol 73: 1532-1543. Peters, J., Fisher, K., and Dean, D. (1995) Nitrogenase structure and function: a biochemical-genetic perspective. Annual Reviews in Microbiology 49: 335-366. Peters, J.W., and Szilagyi, R.K. (2006) Exploring new frontiers of nitrogenase structure and mechanism. Curr Opin Chem Biol 10: 101-108. Petrova, M.A., Mindlin, S.Z., Gorlenko, Z.M., Kalyaeva, E.S., Soina, V.S., and Bogdanova, E.S. (2002) Mercury-Resistant Bacteria from Permafrost Sediments and Prospects for their Use in Comparative Studies of Mercury Resistance Determinants. Russ J Genet 38: 1330-1334. Pettersen, E.F., Goddard, T.D., Huang, C.C., Couch, G.S., Greenblatt, D.M., Meng, E.C., and Ferrin, T.E. (2004) UCSF Chimera - A Visualization System for Exploratory Research and Analysis. J Comput Chem 25: 1605-1612. Piccardi, G., Udisti, R., and Casella, F. (1994) Seasonal trends and chemical composition of snow at Terra Nova Bay (Antarctica). Int J Environ Anal Chem 55: 219-234. Pierrehumbert, R.T. (2011) Infrared radiation and planetary temperature. Physics Today 64: 33. Pikuta, E.V., Hoover, R.B., and Tang, J. (2007) Microbial extremophiles at the limits of life. Crit Rev Microbiol 33: 183-209. Pinckney, J., Paerl, H.W., and Bebout, B.M. (1995) Salinity control of benthic microbial mat community production in a Bahamian hypersaline lagoon. J Exp Mar Biol Ecol 187: 223-237.

Pinckney, J.L., and Paerl, H.W. (1997) Anoxygenic Photosynthesis and Nitrogen Fixation by a Microbial Mat Community in a Bahamian Hypersaline Lagoon. Appl Environ Microbiol 63: 420-426. Playford, P.E., Cockbain, A.E., and Walter, M.R. (1976) Chapter 8.2 Modern Algal Stromatolites at Hamelin Pool, A Hypersaline Barred Basin in Shark Bay, Western Australia. In Developments in Sedimentology: Elsevier, pp. 389-411. Pointing, S.B., Chan, Y., Lacap, D.C., Lau, M.C.Y., Jurgens, J.A., and Farrell, R.L. (2009) Highly specialized microbial diversity in hyper-arid polar desert. Proceedings of the National Academy of Sciences 106: 19964-19969. Polański, A., and Kimmel, M. (2007) Bioinformatics: Springer. Pollastri, G., Baldi, P., Fariselli, P., and Casadio, R. (2002) Prediction of coordination number and relative solvent accessibility in proteins. Proteins: Structure, Function, and Bioinformatics 47: 142-153. Polz, M.F., and Cavanaugh, C.M. (1998) Bias in template-to-product ratios in multitemplate PCR. Appl Environ Microbiol 64: 3724-3730. Posada, D., Guindon, S., Delsuc, F., Dufayard, J.-F., and Gascuel, O. (2009) Estimating Maximum Likelihood Phylogenies with PhyML. In Bioinformatics for DNA Sequence Analysis. Posada, D. (ed): Humana Press, pp. 113-137. Postgate, J., Kent, H., and Robson, R. (1988) Nitrogen fixation by Desulfovibrio. The Nitrogen and Sulphur Cycles: 457–471. Postgate, J.R. (1982) The fundamentals of nitrogen fixation: Cambridge Univ Pr. Postgate, J.R. (1987) Nitrogen Fixation: Cambridge University Press. Priscu, J.C., Fritsen, C.H., Adams, E.E., Giovannoni, S.J., Paerl, H.W., McKay, C.P. et al. (1998) Perennial Antarctic lake ice: an oasis for life in a polar desert. Science 280: 2095-2098. Priscu, J.C., Adams, E.E., Lyons, W.B., Voytek, M.A., Mogk, D.W., Brown, R.L. et al. (1999) Geomicrobiology of Subglacial Ice Above Lake Vostok, Antarctica. Science 286: 2141. Proctor, L.M. 
(1997) Nitrogen-fixing, photosynthetic, anaerobic bacteria associated with pelagic copepods. Aquat Microb Ecol 12: 105-113. Pumbwe, L., Skilbeck, C.A., and Wexler, H.M. (2007) Impact of Anatomic Site on Growth, Efflux-Pump Expression, Cell Structure, and Stress Responsiveness of Bacteroides fragilis. Curr Microbiol 55: 362-365. Pupko, T., Bell, R.E., Mayrose, I., Glaser, F., and Ben-Tal, N. (2002) Rate4Site: an algorithmic tool for the identification of functional regions in proteins by surface mapping of evolutionary determinants within their homologues. Bioinformatics 18: S71-77. Qiu, X., Wu, L., Huang, H., McDonel, P.E., Palumbo, A.V., Tiedje, J.M., and Zhou, J. (2001) Evaluation of PCR-Generated Chimeras, Mutations, and Heteroduplexes with 16S rRNA Gene-Based Cloning. Appl Environ Microbiol 67: 880-887.

Quaiser, A., Zivanovic, Y., Moreira, D., and López-García, P. (2010) Comparative metagenomics of bathypelagic plankton and bottom sediment from the Sea of Marmara. The ISME Journal. Ramelot, T.A., Cort, J.R., Goldsmith-Fischman, S., Kornhaber, G.J., Xiao, R., Shastry, R. et al. (2004) Solution NMR structure of the iron–sulfur cluster assembly protein U (IscU) with zinc bound at the active site. J Mol Biol 344: 567-583. Ramsden, J. (2009) Bioinformatics: an introduction: Springer. Rao, J., and Argos, P. (1981) Structural stability of halophilic proteins. Biochemistry 20: 6536-6543. Rasche, M.E., and Seefeldt, L.C. (1997) Reduction of Thiocyanate, Cyanate, and Carbon Disulfide by Nitrogenase: Kinetic Characterization and EPR Spectroscopic Analysis. Biochemistry 36: 8574-8585. Ravenschlag, K., Sahm, K., Pernthaler, J., and Amann, R. (1999) High Bacterial Diversity in Permanently Cold Marine Sediments. Appl Environ Microbiol 65: 3982-3989. Raymond, J., Siefert, J.L., Staples, C.R., and Blankenship, R.E. (2004a) The Natural History of Nitrogen Fixation. Mol Biol Evol 21: 541-554. Raymond, J., Siefert, J.L., Staples, C.R., and Blankenship, R.E. (2004b) The Natural History of Nitrogen Fixation. In, pp. 541-554. Razia, M., Raja, K., Padmanaban, K., Sivaramakrishnan, S., and Chellapandi, P. (2010) A Phylogenetic Approach for Assigning Function of Hypothetical Proteins in Photorhabdus luminescens Subsp. laumondii TT01 Genome. J Comput Sci Syst Biol 3: 21-29. Reddy, K., Haskell, J., Sherman, D., and Sherman, L. (1993) Unicellular, aerobic nitrogen-fixing cyanobacteria of the genus Cyanothece. J Bacteriol 175: 1284. Rengpipat, S., Lowe, S., and Zeikus, J. (1988) Effect of extreme salt concentrations on the physiology and biochemistry of Halobacteroides acetoethylicus. J Bacteriol 170: 3065. Rhodes, M.E., Fitz-Gibbon, S.T., Oren, A., and House, C.H. (2010) Amino acid signatures of salinity on an environmental scale with a focus on the Dead Sea. Environ Microbiol 12: 2613-2623. 
Richardson, L.L., and Castenholz, R.W. (1987) Diel vertical movements of the cyanobacterium Oscillatoria terebriformis in a sulfide-rich hot spring microbial mat. Appl Environ Microbiol 53: 2142. Riding, R. (1999) The term stromatolite: towards an essential definition. Lethaia 32: 321-330. Riederer-Henderson, M.A., and Wilson, P. (1970) Nitrogen fixation by sulphate-reducing bacteria. Microbiology 61: 27. Ríos, A., Valera, S., Ascaso, C., Davila, A., Kastovsky, J., McKay, C.P. et al. (2010) Comparative analysis of the microbial communities inhabiting halite evaporites of the Atacama Desert. International microbiology: official journal of the Spanish Society for Microbiology 13: 79-89.

Risatti, J., Capman, W., and Stahl, D. (1994) Community structure of a microbial mat: the phylogenetic dimension. Proceedings of the National Academy of Sciences of the United States of America 91: 10173. Rodriguez, R., Chinea, G., Lopez, N., Pons, T., and Vriend, G. (1998) Homology modeling, model and software evaluation: three related resources. Bioinformatics 14: 523-528. Roesch, L.F.W., Fulthorpe, R.R., Jacques, R.J.S., Bento, F.M., and de Oliveira Camargo, F.A. (2010) Biogeography of diazotrophic bacteria in soils. World Journal of Microbiology and Biotechnology: 1-6. Rothschild, L.J., and Mancinelli, R.L. (2001) Life in extreme environments. Nature 409: 1092-1101. Roy, A., Kucukural, A., and Zhang, Y. (2010) I-TASSER: a unified platform for automated protein structure and function prediction. Nature protocols 5: 725-738. Rychlewski, L., Li, W., Jaroszewski, L., and Godzik, A. (2000) Comparison of sequence profiles. Strategies for structural predictions using sequence information. Protein Sci 9: 232-241. Sakaguchi, T., Arakaki, A., and Matsunaga, T. (2002) Desulfovibrio magneticus sp. nov., a novel sulfate-reducing bacterium that produces intracellular single-domain-sized magnetite particles. Int J Syst Evol Microbiol 52: 215. Sander, C., and Schneider, R. (1991) Database of homology derived protein structures and the structural meaning of sequence alignment. Proteins: Structure, Function, and Bioinformatics 9: 56-68. Schaller, R.R. (1997) Moore's law: past, present and future. Spectrum, IEEE 34: 52-59. Schink, B. (1992) The genus Pelobacter. The Prokaryotes: 3393–3399. Schleifer, K.-H. (2004) Microbial Diversity: Facts, Problems and Prospects. Syst Appl Microbiol 27: 3-9. Schlessman, J.L., Woo, D., Joshua-Tor, L., Howard, J.B., and Rees, D.C. (1998) Conformational variability in structures of the nitrogenase iron proteins from Azotobacter vinelandii and Clostridium pasteurianum. J Mol Biol 280: 669-685. Schloss, P., and Handelsman, J. 
(2006a) Introducing TreeClimber, a test to compare microbial community structures. Appl Environ Microbiol 72: 2379. Schloss, P.D., and Handelsman, J. (2005) Introducing DOTUR, a Computer Program for Defining Operational Taxonomic Units and Estimating Species Richness. Appl Environ Microbiol 71: 1501-1506. Schloss, P.D., and Handelsman, J. (2006b) Introducing SONS, a Tool for Operational Taxonomic Unit-Based Comparisons of Microbial Community Memberships and Structures. Appl Environ Microbiol 72: 6773-6779. Schloss, P.D., Larget, B.R., and Handelsman, J. (2004) Integration of Microbial Ecology and Statistics: a Test To Compare Gene Libraries. Appl Environ Microbiol 70: 5485-5492. Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B. et al. (2009) Introducing mothur: Open-Source, Platform-Independent, Community-

216

Supported Software for Describing and Comparing Microbial Communities. Appl Environ Microbiol 75: 7537-7541. Schneegurt, M.A., Sherman, D.M., Nayar, S., and Sherman, L.A. (1994) Oscillating behavior of carbohydrate granule formation and dinitrogen fixation in the cyanobacterium Cyanothece sp. strain ATCC 51142. J Bacteriol 176: 1586. Schopf, J.W. (2006) Fossil evidence of Archaean life. Philosophical Transactions of the Royal Society B: Biological Sciences 361: 869-885. Segawa, T., Miyamoto, K., Ushida, K., Agata, K., Okada, N., and Kohshima, S. (2005) Seasonal Change in Bacterial Flora and Biomass in Mountain Snow from the Tateyama Mountains, Japan, Analyzed by 16S rRNA Gene Sequencing and Real-Time PCR. Appl Environ Microbiol 71: 123-130. Serebryakov, S.N., and Walter, M.R. (1976a) Chapter 10.8 Distribution of Stromatolites in Riphean Deposits of the Uchur-Maya Region of Siberia. In Developments in Sedimentology: Elsevier, pp. 613-614, 615-620, 621-633. Serebryakov, S.N., and Walter, M.R. (1976b) Chapter 6.4 Biotic and Abiotic Factors Controlling the Morphology of Riphean Stromatolites. In Developments in Sedimentology: Elsevier, pp. 321-336. Serrano, L., Sancho, J., Hirshberg, M., and Fersht, A.R. (1992a) [alpha]-Helix stability in proteins:: I. Empirical correlations concerning substitution of side-chains at the N and C-caps and the replacement of alanine by glycine or serine at solvent-exposed surfaces. J Mol Biol 227: 544-559. Serrano, L., Neira, J.L., Sancho, J., and Fersht, A.R. (1992b) Effect of alanine versus glycine in α-helices on protein stability. Severin, I., and Stal, L.J. (2010) NifH expression by five groups of phototrophs compared with nitrogenase activity in coastal microbial mats. FEMS Microbiol Ecol 73: 55-67. Severin, J., Wohlfarth, A., and Galinski, E.A. (1992) The predominant role of recently discovered tetrahydropyrimidines for the osmoadaptation of halophilic eubacteria. Journal of general microbiology 138: 1629. 
Sheridan, P.P., Miteva, V.I., and Brenchley, J.E. (2003) Phylogenetic Analysis of Anaerobic Psychrophilic Enrichment Cultures Obtained from a Greenland Glacier Ice Core. Appl Environ Microbiol 69: 2153-2160. Shi, R., Proteau, A., Villarroya, M., Moukadiri, I., Zhang, L., Trempe, J.F. et al. (2010) Structural basis for Fe–S cluster assembly and tRNA thiolation mediated by IscS protein–protein interactions. PLoS Biol 8: e1000354. Shi, T., Reeves, R.H., Gilichinsky, D.A., and Friedmann, E.I. (1997) Characterization of Viable Bacteria from Siberian Permafrost by 16S rDNA Sequencing. Microb Ecol 33: 169-179. Short, S.M., and Zehr, J.P. (2005) Quantitative analysis of nifH genes and transcripts from aquatic environments. Methods Enzymol 397: 380-394. Siddiqui, K.S., and Cavicchioli, R. (2006) Cold-adapted enzymes. Annu. Rev. Biochem. 75: 403-433.

217

Siddiqui, K.S., and Thomas, T. (2008) Protein adaptation in extremophiles: Nova Biomedical. Singer, G., and Hickey, D.A. (2003) Thermophilic prokaryotes have characteristic patterns of codon usage, amino acid composition and nucleotide content. Gene 317: 39. Singh, C., Soni, R., Jain, S., Roy, S., and Goel, R. (2010) Diversification of nitrogen fixing bacterial community using nifH gene as a biomarker in different geographical soils of Western Indian Himalayas. J Environ Biol. Singleton, D.R., Furlong, M.A., Rathbun, S.L., and Whitman, W.B. (2001) Quantitative Comparisons of 16S rRNA Gene Sequence Libraries from Environmental Samples. Appl Environ Microbiol 67: 4374-4376. Sjöling, S., and Cowan, D.A. (2003) High 16S rDNA bacterial diversity in glacial meltwater lake sediment, Bratina Island, Antarctica. Extremophiles 7: 275-282. Skidmore, M., Anderson, S.P., Sharp, M., Foght, J., and Lanoil, B.D. (2005) Comparison of Microbial Community Compositions of Two Subglacial Environments Reveals a Possible Role for Microbes in Chemical Weathering Processes. Appl Environ Microbiol 71: 6986-6997. Smith, A.B. (1992) Geology of the Yudnamutana Gorge, Paralana Hot Springs Area and Genesis of Mineralization at the Hodgkinson Prospect, Mount Painter Province, South Australia: University of Adelaide, Dept. of Geology and Geophysics. Smith, M.H. (1966) The amino acid composition of proteins. J Theor Biol 13: 261-282. Smith, S., and Atkinson, M. (1983) Mass balance of carbon and phosphorus in Shark Bay, Western Australia. Limnol Oceanogr 28: 625-639. Smith, V.R., and Russell, S. (1982) Acetylene reduction by bryophyte-cyanobacteria associations on a Subantarctic island. Polar Biol V1: 153-157. Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.M., Neal, P.R. et al. (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proceedings of the National Academy of Sciences 103: 12115-12120. 
Soina, V.S., Vorobiova, E.A., Zvyagintsev, D.G., and Gilichinsky, D.A. (1995) Preservation of cell structures in permafrost: A model for exobiology. Adv Space Res 15: 237-242. Sokolova, T.G., Kostrikina, N.A., Chernyh, N.A., Kolganova, T.V., Tourova, T.P., and Bonch-Osmolovskaya, E.A. (2005) Thermincola carboxydiphila gen. nov., sp. nov., a novel anaerobic, carboxydotrophic, hydrogenogenic bacterium from a hot spring of the Lake Baikal area. Int J Syst Evol Microbiol 55: 2069. Somero, G. (2003) Protein adaptations to temperature and pressure: complementary roles of adaptive changes in amino acid sequence and internal milieu* 1. Comparative Biochemistry and Physiology Part B: Biochemistry and Molecular Biology 136: 577- 591. Sonne-Hansen, J., and Ahring, B. (1999) Thermodesulfobacterium hveragerdense sp. nov., and Thermodesulfovibrio islandicus sp. nov., two thermophilic sulfate reducing bacteria isolated from a Icelandic hot spring. Syst Appl Microbiol 22: 559-564. Sorokin, D.Y., Tourova, T.P., Henstra, A.M., Stams, A.J.M., Galinski, E.A., and Muyzer, G. (2008) Sulfidogenesis under extremely haloalkaline conditions by

218

Desulfonatronospira thiodismutans gen. nov., sp. nov., and Desulfonatronospira delicata sp. nov. - a novel lineage of Deltaproteobacteria from hypersaline soda lakes. Microbiology 154: 1444-1453. Spirina, E., Cole, J., Chai, B., Gilichinksy, D., and Tiedje, J. (2003) New high throughput approach to study ancient microbial phylogenetic diversity in permafrost. In Geophysical Research Abstracts. Nice, France: Copernicus Publications. Sprigg, R.C. (1984) Arkaroola-Mount Painter in the northern Flinders Ranges, SA: the last billion years: Arkaroola. Sridharan, S., Nicholls, A., and Honig, B. (1992) A new vertex algorithm to calculate solvent accessible surface areas. Biophys. J 61: A174. Srinivasan, V., Netz, D.J.A., Webert, H., Mascarenhas, J., Pierik, A.J., Michel, H., and Lill, R. (2007) Structure of the yeast WD40 domain protein Cia1, a component acting late in iron-sulfur protein biogenesis. Structure 15: 1246-1257. Stal, L., and Krumbein, W. (1987) Temporal separation of nitrogen fixation and photosynthesis in the filamentous, non-heterocystous cyanobacterium Oscillatoria sp. Arch Microbiol 149: 76-80. Stal, L.J., and Heyer, H. (1987) Dark anaerobic nitrogen fixation (acetylene reduction) in the cyanobacterium Oscillatoria sp. FEMS Microbiol Lett 45: 227-232. States, D.J., and Botstein, D. (1991) Molecular Sequence Accuracy and the Analysis of Protein Coding Regions. Proceedings of the National Academy of Sciences of the United States of America 88: 5518-5522. Steppe, T., and Paerl, H. (2002) Potential N2 fixation by sulfate-reducing bacteria in a marine intertidal microbial mat. Aquat Microb Ecol 28: 1-12. Steppe, T.F., Pinckney, J.L., Dyble, J., and Paerl, H.W. (2001) Diazotrophy in Modern Marine Bahamian Stromatolites. Microb Ecol 41: 36-44. Steunou, A.S., Bhaya, D., Bateson, M.M., Melendrez, M.C., Ward, D.M., Brecht, E. et al. 
(2006) In situ analysis of nitrogen fixation and metabolic switching in unicellular thermophilic cyanobacteria inhabiting hot spring microbial mats. Proceedings of the National Academy of Sciences of the United States of America 103: 2398-2403. Steunou, A.S., Jensen, S.I., Brecht, E., Becraft, E.D., Bateson, M.M., Kilian, O. et al. (2008) Regulation of nif gene expression and the energetics of N2 fixation over the diel cycle in a hot spring microbial mat. The ISME Journal 2: 364-378. Steven, B., Briggs, G., McKay, C.P., Pollard, W.H., Greer, C.W., and Whyte, L.G. (2007) Characterization of the microbial diversity in a permafrost sample from the Canadian high Arctic using culture-dependent and culture-independent methods. FEMS Microbiol Ecol 59: 513-523. Stewart, W. (1970a) Nitrogen fixation by blue-green algae in Yellowstone thermal areas. Stewart, W. (1973) Nitrogen fixation by photosynthetic microorganisms. Annual Reviews in Microbiology 27: 283-316. Stewart, W.D.P. (1967) Nitrogen Turnover in Marine and Brackish Habitats II. Use of 15N in Measuring Nitrogen Fixation in the Field. In, pp. 385-407.

219

Stewart, W.D.P. (1970b) Algal fixation of atmospheric nitrogen. Plant and Soil 32: 555-588. Stormo, G.D. (2002) An Introduction to Sequence Similarity (“Homology”) Searching: John Wiley & Sons, Inc. Stöver, B., and Müller, K. (2010) TreeGraph 2: Combining and visualizing evidence from different phylogenetic analyses. BMC Bioinformatics 11: 7. Sullivan, J., and Joyce, P. (2005) Model selection in phylogenetics. Annual Review of Ecology, Evolution, and Systematics 36: 445. Summers, M.L., Wallis, J.G., Campbell, E.L., and Meeks, J.C. (1995) Genetic evidence of a major role for glucose-6-phosphate dehydrogenase in nitrogen fixation and dark growth of the cyanobacterium Nostoc sp. strain ATCC 29133. J Bacteriol 177: 6184. Sundset, M., Præsteng, K., Cann, I., Mathiesen, S., and Mackie, R. (2007) Novel Rumen Bacterial Diversity in Two Geographically Separated Sub-Species of . Microb Ecol 54: 424-438. Sung, Y., Fletcher, K.E., Ritalahti, K.M., Apkarian, R.P., Ramos-Hernández, N., Sanford, R.A. et al. (2006) Geobacter lovleyi sp. nov. strain SZ, a novel metal- reducing and tetrachloroethene-dechlorinating bacterium. Appl Environ Microbiol 72: 2775. Suzuki, M.T., and Giovannoni, S.J. (1996) Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl Environ Microbiol 62: 625- 630. Tamura, K., Dudley, J., Nei, M., and Kumar, S. (2007) MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Mol Biol Evol 24: 1596. Tamura, K., Peterson, D., Peterson, N., Stecher, G., Nei, M., and Kumar, S. (2011) MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol. Taroncher-Oldenburg, G., Griner, E.M., Francis, C.A., and Ward, B.B. (2003) Oligonucleotide microarray for the study of functional gene diversity in the nitrogen cycle in the environment. Appl Environ Microbiol 69: 1159-1171. 
Tezcan, F.A., Kaiser, J.T., Mustafi, D., Walton, M.Y., Howard, J.B., and Rees, D.C. (2005) Nitrogenase complexes: multiple docking sites for a nucleotide switch protein. Science 309: 1377-1380. The UniProt, C. (2008) The Universal Protein Resource (UniProt). Nucl. Acids Res. 36: D190-195. Thomas, D.N. (2005) Photosynthetic microbes in freezing deserts. Trends Microbiol 13: 87-88. Thomas, M., and Walter, M.R. (2002) Application of hyperspectral infrared analysis of hydrothermal alteration on Earth and Mars. Astrobiology 2: 335-351. Tillett, D., and Neilan, B.A. (2000) Xanthogenate nucleic acid isolation from cultured and environmental cyanobacteria. Journal of Phycology 36: 251-258. Tiquia, S.M., Lloyd, J., Herms, D.A., Hoitink, H.A.J., and Michel, F.C. (2002) Effects of mulching and fertilization on soil nutrients, microbial activity and

220 rhizosphere bacterial community structure determined by analysis of TRFLPs of PCR- amplified 16S rRNA genes. Appl Soil Ecol 21: 31-48. Tourova, T.P., Spiridonova, E.M., Berg, I.A., Slobodova, N.V., Boulygina, E.S., and Sorokin, D.Y. (2007) Phylogeny and evolution of the family Ectothiorhodospiraceae based on comparison of 16S rRNA, cbbL and nifH gene sequences. Int J Syst Evol Microbiol 57: 2387. Tripp, H., Bench, S., Turk, K., Foster, R., Desany, B., Niazi, F. et al. (2010) Metabolic streamlining in an open-ocean nitrogen-fixing cyanobacterium. Nature 464: 90-94. Tsuihiji, H., Yamazaki, Y., Kamikubo, H., Imamoto, Y., and Kataoka, M. (2006) Cloning and characterization of nif structural and regulatory genes in the purple sulfur bacterium, Halorhodospira halophila. J Biosci Bioeng 101: 263-270. UNESCO (1991) World heritage nomination - IUCN summary, 578: Shark Bay (Australia). In. van de Vossenberg, J.L.C.M., Driessen, A.J.M., and Konings, W.N. (1998) The essence of being extremophilic: the role of the unique archaeal membrane lipids. Extremophiles 2: 163-170. van de Vossenberg, J.L.C.M., Driessen, A.J.M., Grant, D., and Konings, W.N. (1999) Lipid membranes from halophilic and alkali-halophilic Archaea have a low H+ and Na+ permeability at high salt concentration. Extremophiles 3: 253-257. van den Burg, B. (2003) Extremophiles as a source for novel enzymes. Curr Opin Microbiol 6: 213-218. Van Trappen, S., Vandecandelaere, I., Mergaert, J., and Swings, J. (2004) Algoriphagus antarcticus sp. nov., a novel from microbial mats in Antarctic lakes. Int J Syst Evol Microbiol 54: 1969-1973. Veerassamy, S., Smith, A., and Tillier, E. (2003) A transition probability model for amino acid substitutions from blocks. J Comput Biol 10: 997-1010. Vincent, W., Castenholz, R., Downes, M., and H-Williams, C. (1993) Antarctic cyanobacteria: Light, nutrients, and photosynthesis in the microbial mat environment. Journal of Phycology 29: 745-755. 
Vishnivetskaya, T.A., Petrova, M.A., Urbance, J., Ponder, M., Moyer, C.L., Gilichinsky, D.A., and Tiedje, J.M. (2006) Bacterial Community in Ancient Siberian Permafrost as Characterized by Culture and Culture-Independent Methods. Astrobiology 6: 400-414. Vriend, G. (1990) WHAT IF: a molecular modeling and drug design program. J Mol Graphics 8: 52-56. Wagner, D., Kobabe, S., and Liebner, S. (2009) Bacterial community structure and carbon turnover in permafrost-affected soils of the Lena Delta, northeastern Siberia. Can J Microbiol 55: 73-83. Walker, J.E., Saraste, M., Runswick, M.J., and Gay, N.J. (1982) Distantly related sequences in the alpha-and beta-subunits of ATP synthase, myosin, kinases and other ATP-requiring enzymes and a common nucleotide binding fold. The EMBO journal 1: 945.

221

Walter, M. (1976) Stromatolites. New York: Elsevier.20.1-790. Ward, D.M., Ferris, M.J., Nold, S.C., and Bateson, M.M. (1998) A natural view of microbial biodiversity within hot spring cyanobacterial mat communities. Microbiology and Molecular Biology Reviews 62: 1353. Watanabe, A., and Yamamoto, Y. (1971) Algal nitrogen fixation in the tropics. Plant and Soil 35: 403-413. Welch, B.L. (1947) The generalization ofstudent's' problem when several different population variances are involved. Biometrika 34: 28-35. Weller, R., Bateson, M.M., Heimbuch, B.K., Kopczynski, E.D., and Ward, D.M. (1992) Uncultivated cyanobacteria, Chloroflexus-like inhabitants, and spirochete-like inhabitants of a hot spring microbial mat. Appl Environ Microbiol 58: 3964. Whitton, B.A., and Potts, M. (2000) The Ecology of Cyanobacteria Their Diversity in Time and Space: Kluwer Academic Publishers. Wickstrom, C.E. (1984) Discovery and evidence of nitrogen fixation by thermophilic heterotrophs in hot springs. Curr Microbiol 10: 275-280. Wilson, K. (2001) Preparation of genomic DNA from bacteria. In Current Protocols in Molecular Biology. F. M. Ausubel, R.B., R. E. Kingston, D. D. Moore, J.G. Seidman, J. A. Smith, K. Struhl (ed). New York: John Wiley & Sons Inc, p. Unit 2.4. Wilson, K.H., and Blitchington, R.B. (1996) Human colonic biota studied by ribosomal DNA sequence analysis. Appl Environ Microbiol 62: 2273. Wu, S., and Zhang, Y. (2008) MUSTER: improving protein sequence profile–profile alignments by using multiple sources of structure information. Proteins: Structure, Function, and Bioinformatics 72: 547-556. Xiang, S., Yao, T., An, L., Xu, B., and Wang, J. (2005) 16S rRNA Sequences and Differences in Bacteria Isolated from the Muztag Ata Glacier at Increasing Depths. Appl Environ Microbiol 71: 4619-4627. Xiang, S.R., Yao, T.D., An, L.Z., Xu, B.Q., Li, Z., Wu, G.J. et al. (2004) Bacterial diversity in Malan ice core from the Tibetan Plateau. Folia Microbiol 49: 269-275. 
Xiao, X., Li, M., You, Z., and Wang, F. (2007) Bacterial communities inside and in the vicinity of the Chinese Great Wall Station, King George Island, Antarctica. Antarct Sci 19: 11-16. Yakimov, M.M., Giuliano, L., Chernikova, T.N., Gentile, G., Abraham, W.R., Timmis, K., and Golyshin, P. (2001) Alcalilimnicola halodurans gen. nov., sp. nov., an alkaliphilic, moderately halophilic and extremely halotolerant bacterium, isolated from sediments of soda-depositing Lake Natron, East Africa Rift Valley. Int J Syst Evol Microbiol 51: 2133-2143. Yakimov, M.M., Gentile, G., Bruni, V., Cappello, S., D'Auria, G., Golyshin, P.N., and Giuliano, L. (2004) Crude oil-induced structural shift of coastal bacterial communities of rod bay (Terra Nova Bay, Ross Sea, Antarctica) and characterization of cultured cold-adapted hydrocarbonoclastic bacteria. FEMS Microbiol Ecol 49: 419-432. Yamane, K., Hattori, Y., Ohtagaki, H., and Fujiwara, K. (2011) Microbial diversity with dominance of 16S rRNA gene sequences with high GC contents at 74 and 98° C subsurface crude oil deposits in Japan. FEMS Microbiol Ecol.

222

Yamauchi, K., Doi, K., Kinoshita, M., Kii, F., and Fukuda, H. (1992) Archaebacterial lipid models: highly salt-tolerant membranes from 1, 2- diphytanylglycero-3-phosphocholine. Biochimica et Biophysica Acta (BBA)- Biomembranes 1110: 171-177. Yang, R., Hou, Y., Campbell, C.A., Palaniyandi, K., Zhao, Q., Bordner, A.J., and Chang, X. (2011) Glutamine residues in Q-loops of multidrug resistance protein MRP1 contribute to ATP binding via interaction with metal cofactor. Biochimica et Biophysica Acta (BBA)-Biomembranes 1808: 1790-1796. Yannarell, A.C., Steppe, T.F., and Paerl, H.W. (2006) Genetic Variance in the Composition of Two Functional Groups (Diazotrophs and Cyanobacteria) from a Hypersaline Microbial Mat. Appl Environ Microbiol 72: 1207-1217. Yannarell, A.C., Steppe, T.F., and Paerl, H.W. (2007) Disturbance and recovery of microbial community structure and function following Hurricane Frances. Environ Microbiol 9: 576-583. Yue, J., and Clayton, M. (2005) A similarity measure based on species proportions. Communications in Statistics-Theory and Methods 34: 2123-2131. Zadorina, E., Slobodova, N., Boulygina, E., Kolganova, T., Kravchenko, I., and Kuznetsov, B. (2009) Analysis of the diversity of diazotrophic bacteria in peat soil by cloning of the nifH gene. Microbiology 78: 218-226. Zani, S., Mellon, M.T., Collier, J.L., and Zehr, J.P. (2000) Expression of nifH Genes in Natural Microbial Assemblages in Lake George, New York, Detected by Reverse Transcriptase PCR. Appl Environ Microbiol 66: 3119-3124. Zehr, J., Bench, S., Carter, B., Hewson, I., Niazi, F., Shi, T. et al. (2008) Globally distributed uncultivated oceanic N2-fixing cyanobacteria lack oxygenic photosystem II. Science 322: 1110. Zehr, J.P., and McReynolds, L.A. (1989) Use of degenerate oligonucleotides for amplification of the nifH gene from the marine cyanobacterium Trichodesmium thiebautii. Appl Environ Microbiol 55: 2522-2526. Zehr, J.P., Mellon, M.T., and Hiorns, W.D. 
(1997) Phylogeny of cyanobacterial nifH genes: evolutionary implications and potential applications to natural assemblages. Microbiology 143: 1443-1450. Zehr, J.P., Mellon, M.T., and Zani, S. (1998) New Nitrogen-Fixing Microorganisms Detected in Oligotrophic Oceans by Amplification of Nitrogenase (nifH) Genes. Appl Environ Microbiol 64: 5067. Zehr, J.P., Jenkins, B.D., Short, S.M., and Steward, G.F. (2003a) Nitrogenase gene diversity and microbial community structure: a cross-system comparison. Environ Microbiol 5: 539-554. Zehr, J.P., Crumbliss, L.L., Church, M.J., Omoregie, E.O., and Jenkins, B.D. (2003b) Nitrogenase genes in PCR and RT-PCR reagents: implications for studies of diversity of functional genes. BioTechniques 35: 996-1002, 1004-1005. Zehr, J.P., Mellon, M., Braun, S., Litaker, W., Steppe, T., and Paerl, H.W. (1995) Diversity of Heterotrophic Nitrogen Fixation Genes in a Marine Cyanobacterial Mat. Appl Environ Microbiol 61: 2527-2532.

223

Zeitlin, C., Cleghorn, T., Cucinotta, F., Saganti, P., Andersen, V., Lee, K. et al. (2004) Overview of the Martian radiation environment experiment. Adv Space Res 33: 2204-2210. Zhang, L., Hurek, T., and Reinhold-Hurek, B. (2007a) A nifH-based oligonucleotide microarray for functional diagnostics of nitrogen-fixing microorganisms. Microb Ecol 53: 456-470. Zhang, S., Hou, S., Ma, X., Qin, D., and Chen, T. (2007b) Culturable bacteria in Himalayan ice in response to atmospheric circulation. Biogeosci Disc 3: 765-778. Zhang, X., Yao, T., Ma, X., and Wang, N. (2002) Microorganisms in a high altitude Glacier Ice in Tibet. Folia Microbiol 47: 241-245. Zhang, Y. (2007) Template based modeling and free modeling by I TASSER in CASP7. Proteins: Structure, Function, and Bioinformatics 69: 108-117. Zhang, Y. (2008) I-TASSER server for protein 3 D structure prediction. BMC Bioinformatics 9: 40. Zhang, Y. (2009) I-TASSER: fully automated protein structure prediction in CASP8. Proteins: Structure, Function, and Bioinformatics 77: 100-113. Zhang, Y., and Skolnick, J. (2004) Scoring function for automated assessment of protein structure template quality. Proteins: Structure, Function, and Bioinformatics 57: 702-710. Zhang, Y., and Skolnick, J. (2005) TM-align: a protein structure alignment algorithm based on the TM-score. Nucleic Acids Res 33: 2302-2309. Zhang, Y., and Skolnick, J. (2007) Scoring function for automated assessment of protein structure template quality. Proteins 68: 1020. Zhang, Y., Kolinski, A., and Skolnick, J. (2003) TOUCHSTONE II: a new approach to ab initio protein structure prediction. Biophys J 85: 1145. Zhang, Y., Dong, J., Yang, Z., Zhang, S., and Wang, Y. (2008) Phylogenetic diversity of nitrogen-fixing bacteria in mangrove sediments assessed by PCR– denaturing gradient gel electrophoresis. Arch Microbiol 190: 19-28. Zhou, J., Davey, M.E., Figueras, J.B., Rivkina, E., Gilichinsky, D., and Tiedje, J.M. 
(1997) Phylogenetic diversity of a bacterial community determined from Siberian tundra soil DNA. Microbiology 143: 3913-3919.

224

Appendix A

1996 and 2004 Stromatolite nifH BLAST and BLASTX matches

Table A-1: Stromatolite 2004 clones BLAST results, presenting the highest sequence similarity match for each clone. 38 clones of 2004 stromatolites were analysed.

| Sequence file ID | Clone ID | Nearest relative in GenBank* | BLAST match accession ID | Sequence similarity (%) |
|---|---|---|---|---|
| RSA46_04 | C1 | Uncultured bacterium nifH clone | DQ140596 | 98 |
| RSA47_04 | C2 | “ | EU594145 | 86 |
| RSA48_04 | C5 | “ | DQ338103 | 87 |
| RSA49_04 | C7 | “ | EU594212 | 88 |
| RSA50_04 | C10 | “ | AY450628 | 87 |
| RSA51_04 | C12 | “ | DQ338103 | 87 |
| RSA52_04 | C13 | “ | DQ338103 | 87 |
| RSA53_04 | C14 | “ | DQ338103 | 87 |
| RSA54_04 | C16 | “ | EU594141 | 87 |
| RSA55_04 | C17 | “ | EU594141 | 87 |
| RSA56_04 | C18 | “ | DQ338103 | 86 |
| RSA59_04 | C20 | “ | DQ338103 | 87 |
| RSA61_04 | C21 | “ | EF174812 | 88 |
| RSA62_04 | C22 | “ | EU594146 | 87 |
| RSA63_04 | C23 | “ | DQ338103 | 87 |
| RSA64_04 | C24 | “ | DQ338103 | 87 |
| RSA65_04 | C25 | “ | DQ338040 | 85 |
| RSA66_04 | C26 | “ | EU594145 | 86 |
| RSA67_04 | C27 | “ | DQ338103 | 86 |
| RSA68_04 | C28 | “ | DQ338103 | 87 |
| RSA69_04 | C29 | “ | EU594145 | 85 |
| RSA70_04 | C30 | “ | EU594145 | 87 |
| RSA71_04 | C31 | “ | DQ338103 | 87 |
| RSA74_04 | C34 | Myxosarcina sp. dinitrogenase reductase (nifH) gene | U73133 | 89 |
| RSA75_04 | C35 | Uncultured bacterium nifH clone | DQ338103 | 86 |
| RSA76_04 | C36 | “ | DQ140596 | 100 |
| RSA77_04 | C37 | “ | DQ338103 | 87 |
| RSA78_04 | C39 | “ | EU594188 | 85 |
| RSA79_04 | C40 | “ | DQ338103 | 87 |
| RSA80_04 | C41 | “ | EF174826 | 93 |
| RSA81_04 | C42 | “ | EF174812 | 89 |
| RSA82_04 | C43 | “ | DQ338103 | 87 |
| RSA83_04 | C44 | “ | DQ338103 | 87 |
| RSA84_04 | C45 | “ | EU594188 | 84 |
| RSA85_04 | C47 | “ | EF174812 | 89 |
| RSA86_04 | C48 | “ | DQ338103 | 86 |
| RSA87_04 | C49 | “ | EU594145 | 87 |
| RSA88_04 | C50 | “ | DQ338103 | 87 |

*Only a single match is shown. There could be two or more identical high scores for each record here.


Table A-2: Stromatolite 1996 clones BLAST results, presenting only the highest sequence similarity match for each clone. 37 clones of 1996 stromatolites were analysed.

| Sequence file ID | Clone ID | Nearest relative in GenBank* | BLAST match accession ID | Sequence similarity (%) |
|---|---|---|---|---|
| RSA90_96 | GC15 | Uncultured bacterium nifH clone | DQ338027 | 85 |
| RSA91_96 | GC16 | “ | DQ338027 | 87 |
| RSA93_96 | GC20 | “ | DQ078021 | 85 |
| RSA94_96 | GC21 | “ | DQ338071 | 85 |
| RSA95_96 | GC22 | “ | DQ338014 | 86 |
| RSA96_96 | GC23 | “ | AM286438 | 95 |
| RSA97_96 | GC25 | “ | DQ338071 | 85 |
| RSA98_96 | GC26 | “ | DQ338027 | 87 |
| RSA99_96 | GC27 | “ | DQ078042 | 89 |
| RSA101_96 | GC29 | “ | DQ821946 | 98 |
| RSA102_96 | GC30 | “ | HM750588 | 92 |
| RSA107_96 | GC36 | “ | DQ338071 | 85 |
| RSA112_96 | GC41 | “ | GU193054 | 82 |
| RSA114_96 | GC43 | “ | DQ821946 | 98 |
| RSA115_96 | GC45 | “ | HM750588 | 92 |
| RSA119_96 | GC17 | “ | HM750603 | 83 |
| RSA121_96 | GC31 | “ | DQ338014 | 87 |
| RSA122_96 | GC32 | “ | DQ078042 | 89 |
| RSA124_96 | GC36 | “ | DQ338071 | 85 |
| RSA126_96 | GC38 | “ | HM750759 | 89 |
| RSA127_96 | GC39 | “ | DQ078042 | 88 |
| RSA128_96 | GC35 | “ | DQ078042 | 87 |
| RSA129_96 | GC40 | “ | AM286438 | 96 |
| RSA130_96 | GC42 | “ | DQ078042 | 89 |
| RSA132_96 | GC47 | “ | DQ078042 | 88 |
| RSA133_96 | GC48 | “ | HM750443 | 87 |
| RSA134_96 | GC49 | “ | DQ821946 | 97 |
| RSA135_96 | GC50 | “ | GU193021 | 86 |
| RSA137_96 | GC52 | Desulfovibrio magneticus RS-1 DNA | AP010904 | 85 |
| RSA138_96 | GC53 | Uncultured bacterium nifH clone | DQ338014 | 87 |
| RSA139_96 | GC54 | “ | DQ078042 | 89 |
| RSA141_96 | GC56 | “ | DQ338071 | 85 |
| RSA143_96 | GC58 | “ | DQ078042 | 89 |
| RSA147_96 | GC57 | “ | DQ338014 | 88 |
| RSA148_96 | GC1 | “ | DQ078042 | 89 |
| RSA150_96 | GC3 | “ | DQ078042 | 89 |
| RSA152_96 | GC6 | “ | DQ338014 | 86 |

*Only a single match is shown. There could be two or more identical high scores for each record here.


Table A-3: Stromatolite 2004 clones BLASTX results, presenting only the highest sequence similarity match for each clone.

| Sequence file ID | Clone ID | Nearest bacterial nitrogenase iron protein match in GenBank | Phylum | BLASTX match accession ID | Sequence similarity (%) |
|---|---|---|---|---|---|
| RSA46_04 | C1 | Oscillatoria PCC 6506 | Cyanobacteria | ZP_07112556 | 94 |
| RSA47_04 | C2 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 94 |
| RSA48_04 | C5 | “ | Cyanobacteria | ZP_01727765 | 94 |
| RSA49_04 | C7 | “ | Cyanobacteria | ZP_01727765 | 94 |
| RSA50_04 | C10 | “ | Cyanobacteria | ZP_01727765 | 92 |
| RSA51_04 | C12 | “ | Cyanobacteria | ZP_01727765 | 94 |
| RSA52_04 | C13 | “ | Cyanobacteria | ZP_01727765 | 94 |
| RSA53_04 | C14 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA54_04 | C16 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA55_04 | C17 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA56_04 | C18 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA59_04 | C20 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA61_04 | C21 | Cyanobacterium UCYN-A | Cyanobacteria | YP_003421696 | 90 |
| RSA62_04 | C22 | Cyanothece ATCC 51142 | Cyanobacteria | YP_001801976 | 93 |
| RSA63_04 | C23 | “ | Cyanobacteria | YP_001801976 | 93 |
| RSA64_04 | C24 | “ | Cyanobacteria | YP_001801976 | 94 |
| RSA65_04 | C25 | “ | Cyanobacteria | YP_001801976 | 91 |
| RSA66_04 | C26 | “ | Cyanobacteria | YP_001801976 | 93 |
| RSA67_04 | C27 | Cyanothece PCC 7425 | Cyanobacteria | YP_002483083 | 93 |
| RSA68_04 | C28 | Cyanothece ATCC 51142 | Cyanobacteria | YP_001801976 | 93 |
| RSA69_04 | C29 | Cyanothece PCC 7425 | Cyanobacteria | YP_002483083 | 91 |
| RSA70_04 | C30 | Cyanothece ATCC 51142 | Cyanobacteria | YP_001801976 | 93 |
| RSA71_04 | C31 | “ | Cyanobacteria | YP_001801976 | 93 |
| RSA74_04 | C34 | “ | Cyanobacteria | YP_001801976 | 93 |
| RSA75_04 | C35 | “ | Cyanobacteria | YP_001801976 | 93 |
| RSA76_04 | C36 | Cyanothece PCC 7425 | Cyanobacteria | YP_002483083 | 95 |
| RSA77_04 | C37 | Cyanothece CCY0110 | Cyanobacteria | ZP_01727765 | 93 |
| RSA78_04 | C39 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA79_04 | C40 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA80_04 | C41 | Cyanothece PCC 7425 | Cyanobacteria | YP_002483083 | 93 |
| RSA81_04 | C42 | Cyanobacterium UCYN-A | Cyanobacteria | YP_003421696 | 91 |
| RSA82_04 | C43 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 93 |
| RSA83_04 | C44 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA84_04 | C45 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA85_04 | C47 | Cyanothece PCC 7425 | Cyanobacteria | YP_002483083 | 92 |
| RSA86_04 | C48 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 93 |
| RSA87_04 | C49 | “ | Cyanobacteria | ZP_01727765 | 93 |
| RSA88_04 | C50 | “ | Cyanobacteria | ZP_01727765 | 93 |


Table A-4: 1996 stromatolites clones BLASTX results, presenting only the highest sequence similarity match for each clone.

| Sequence file ID | Clone ID | Nearest bacterial nitrogenase iron protein match in GenBank | Phylum | BLASTX match accession ID | Sequence similarity (%) |
|---|---|---|---|---|---|
| RSA90_96 | GC15 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 89 |
| RSA91_96 | GC16 | “ | δ-Proteobacteria | ZP_07015343 | 97 |
| RSA93_96 | GC20 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 95 |
| RSA94_96 | GC21 | “ | δ-Proteobacteria | YP_002430688 | 93 |
| RSA95_96 | GC22 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 94 |
| RSA96_96 | GC23 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 84 |
| RSA97_96 | GC25 | “ | δ-Proteobacteria | YP_002430688 | 95 |
| RSA98_96 | GC26 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 93 |
| RSA99_96 | GC27 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 90 |
| RSA101_96 | GC29 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 91 |
| RSA102_96 | GC30 | Halorhodospira halophila SL1 | γ-Proteobacteria | YP_001001870 | 97 |
| RSA107_96 | GC36 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 94 |
| RSA112_96 | GC41 | Teredinibacter turnerae T7901 | γ-Proteobacteria | YP_003073074 | 96 |
| RSA114_96 | GC43 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 96 |
| RSA115_96 | GC45 | Halorhodospira halophila SL1 | γ-Proteobacteria | YP_001001870 | 94 |
| RSA119_96 | GC17 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 89 |
| RSA121_96 | GC31 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 97 |
| RSA122_96 | GC32 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 94 |
| RSA124_96 | GC36 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 95 |
| RSA126_96 | GC38 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 95 |
| RSA127_96 | GC39 | “ | δ-Proteobacteria | YP_357508 | 96 |
| RSA128_96 | GC35 | “ | δ-Proteobacteria | YP_357508 | 92 |
| RSA129_96 | GC40 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 91 |
| RSA130_96 | GC42 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 97 |
| RSA132_96 | GC47 | “ | δ-Proteobacteria | YP_357508 | 93 |
| RSA133_96 | GC48 | Teredinibacter turnerae T7901 | γ-Proteobacteria | YP_003073074 | 97 |
| RSA134_96 | GC49 | Cyanothece sp. CCY0110 | Cyanobacteria | ZP_01727765 | 89 |
| RSA135_96 | GC50 | Teredinibacter turnerae T7901 | γ-Proteobacteria | YP_003073074 | 97 |
| RSA137_96 | GC52 | Desulfovibrio magneticus RS-1 | δ-Proteobacteria | YP_002953433 | 97 |
| RSA138_96 | GC53 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 85 |
| RSA139_96 | GC54 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 89 |
| RSA141_96 | GC56 | Desulfatibacillum alkenivorans AK-01 | δ-Proteobacteria | YP_002430688 | 97 |
| RSA143_96 | GC58 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 95 |
| RSA147_96 | GC57 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 93 |
| RSA148_96 | GC1 | Pelobacter carbinolicus DSM 2380 | δ-Proteobacteria | YP_357508 | 94 |
| RSA150_96 | GC3 | “ | δ-Proteobacteria | YP_357508 | 84 |
| RSA152_96 | GC6 | Desulfonatronospira thiodismutans ASO3-1 | δ-Proteobacteria | ZP_07015343 | 95 |

NifH phylogeny reference sequences

Table A-5: 186 imported nifH amino acid sequences from The Universal Protein Resource (UniProtKB) database.

Organism | Accession ID | Entry name | Length (AA) | Status(a)
Acidithiobacillus ferrooxidans strain ATCC 23270 | B7JA99 | NIFH_ACIF2 | 296 | reviewed
Acidithiobacillus ferrooxidans strain ATCC 53993 | B5ER76 | NIFH_ACIF5 | 296 | reviewed
Alcaligenes faecalis | Q44044 | NIFH_ALCFA | 296 | reviewed
Alkaliphilus metalliredigens | A6TTY3 | NIFH_ALKMQ | 272 | reviewed
Anabaena azollae | P0A3S2 | NIFH_ANAAZ | 295 | reviewed
Anabaena sp. strain L31 | P33178 | NIFH_ANASL | 294 | reviewed
Anabaena variabilis strain ATCC 29413 | P0A3S1 | NIFH1_ANAVT | 295 | reviewed
Anabaena variabilis strain ATCC 29413 | Q44484 | NIFH2_ANAVT | 296 | reviewed
Arcobacter nitrofigilis | Q6XJ51 | Q6XJ51_9PROT | 121 | unreviewed
Arcobacter nitrofigilis | Q7WUN7 | Q7WUN7_9PROT | 121 | unreviewed
Arcobacter nitrofigilis DSM 7299 | D5V3K8 | D5V3K8_ARCNC | 304 | unreviewed
Azoarcus communis | Q79AX4 | Q79AX4_9RHOO | 137 | unreviewed
Azoarcus sp. | O31255 | O31255_AZOSP | 113 | unreviewed
Azoarcus sp. strain BH72 | Q9F0V9 | Q9F0V9_AZOSB | 297 | unreviewed
Azorhizobium caulinodans strain ATCC 43989 | P26251 | NIFH1_AZOC5 | 296 | reviewed
Azorhizobium caulinodans strain ATCC 43989 | P26252 | NIFH2_AZOC5 | 296 | reviewed
Azospirillum brasilense | P17303 | NIFH_AZOBR | 293 | reviewed
Azotobacter chroococcum strain mcd 1 | P06118 | NIFH2_AZOCH | 290 | reviewed
Azotobacter chroococcum strain mcd 1 | P26248 | NIFH1_AZOCH | 291 | reviewed
Azotobacter vinelandii | P00459 | NIFH1_AZOVI | 290 | reviewed
Azotobacter vinelandii | P15335 | NIFH2_AZOVI | 290 | reviewed
Azotobacter vinelandii | P16269 | NIFH3_AZOVI | 275 | reviewed
Beijerinckia indica subsp. indica strain ATCC 9039 | B2IET2 | B2IET2_BEII9 | 290 | unreviewed
Bradyrhizobium japonicum | P06117 | NIFH_BRAJA | 294 | reviewed
Bradyrhizobium strain ANU 289 | P00463 | NIFH_BRASP | 294 | reviewed
Burkholderia sp. WSM3938 | A9YL99 | A9YL99_9BURK | 204 | unreviewed
Burkholderia tropica | A6Y960 | A6Y960_9BURK | 284 | unreviewed
Burkholderia vietnamiensis | A4JRN7 | A4JRN7_BURVG | 293 | unreviewed
Candidatus Azobacteroides pseudotrichonymphae genomovar. CFP2 | B6YRI6 | NIFH_AZOPC | 274 | reviewed
Chlorobaculum parvum strain NCIB 8327 | B3QQ12 | NIFH_CHLP8 | 274 | reviewed
Chlorobium chlorochromatii strain CaD3 | Q3AR70 | NIFH_CHLCH | 274 | reviewed
Chlorobium limicola strain DSM 245 | B3EH88 | NIFH_CHLL2 | 274 | reviewed
Chlorobium phaeobacteroides strain BS1 | B3EL81 | NIFH_CHLPB | 274 | reviewed
Chlorobium phaeobacteroides strain DSM 266 | A1BEH0 | NIFH_CHLPD | 274 | reviewed
Chlorobium tepidum | Q8KC92 | NIFH_CHLTE | 274 | reviewed
Chlorogloeopsis sp. | P94673 | P94673_9CYAN | 108 | unreviewed
Clostridium acetobutylicum | Q97ME5 | NIFH_CLOAB | 272 | reviewed
Clostridium cellobioparum | Q59270 | NIFH_CLOCB | 271 | reviewed
Clostridium pasteurianum | P00456 | NIFH1_CLOPA | 273 | reviewed
Clostridium pasteurianum | P09552 | NIFH2_CLOPA | 272 | reviewed
Clostridium pasteurianum | P09553 | NIFH3_CLOPA | 275 | reviewed
Clostridium pasteurianum | P09554 | NIFH5_CLOPA | 273 | reviewed
Clostridium pasteurianum | P09555 | NIFH6_CLOPA | 272 | reviewed
Clostridium pasteurianum | P22548 | NIFH4_CLOPA | 273 | reviewed
Cupriavidus sp. SWF66167 | C1JID2 | C1JID2_9BURK | 268 | unreviewed
Cyanobacterium UCYN-A | B7TB36 | UCYN_06140 | 287 | unreviewed
Cyanothece strain ATCC 51142 | O07641 | NIFH_CYAA5 | 327 | reviewed
Cyanothece sp. CCY0110 | A3IL28 | A3IL28_9CHRO | 290 | unreviewed
Cyanothece sp. strain PCC 7424 | B7KG76 | NIFH_CYAP7 | 299 | reviewed
Cyanothece sp. strain PCC 7425 | B8HWE3 | NIFH_CYAP4 | 298 | reviewed
Cyanothece sp. strain PCC 8801 | Q55028 | NIFH_CYAP8 | 296 | reviewed
Dechloromonas aromatica strain RCB | Q47G67 | NIFH_DECAR | 296 | reviewed
Dehalococcoides ethenogenes strain 195 | Q3Z7C7 | NIFH_DEHE1 | 274 | reviewed
Desulfatibacillum alkenivorans strain AK-01 | B8FAC4 | NIFH_DESAA | 274 | reviewed
Desulfobacter curvatus | Q9X3A2 | Q9X3A2_9DELT | 109 | unreviewed
Desulfobacter latus | Q7WUN9 | Q7WUN9_9DELT | 109 | unreviewed
Desulfobacterium autotrophicum strain ATCC 43914 | C0QKK8 | NIFH_DESAH | 273 | reviewed
Desulfobacterium autotrophicum strain ATCC 43914 | C0QMP6 | C0QMP6_DESAH | 99 | unreviewed
Desulfomicrobium baculatum strain DSM 4028 | C7LPA9 | C7LPA9_DESBD | 276 | unreviewed
Desulfonatronospira thiodismutans ASO3-1 | D6SLD2 | D6SLD2_9DELT | 276 | unreviewed
Desulfonema limicola | Q9X3A1 | Q9X3A1_9DELT | 107 | unreviewed
Desulforudis audaxviator MP104C | B1I0Y8 | NIFH_DESAP | 280 | reviewed
Desulfosporosinus orientis | Q9F8Z7 | Q9F8Z7_9FIRM | 107 | unreviewed
Desulfotomaculum reducens MI-1 | A4J8C2 | NIFH_DESRM | 272 | reviewed
Desulfovibrio baculatus | Q8VPI5 | Q8VPI5_DESBA | 109 | unreviewed
Desulfovibrio gigas | P71156 | NIFH_DESGI | 274 | reviewed
Desulfovibrio magneticus strain ATCC 700980 | C4XRK2 | NIFH_DESMR | 274 | reviewed
Desulfovibrio salexigens strain ATCC 14822 | C6BX34 | NIFH_DESAD | 275 | reviewed
Frankia alni | P08925 | NIFH_FRAAL | 287 | reviewed
Frankia sp. strain CcI3 | Q2J4F8 | NIFH_FRASC | 287 | reviewed
Frankia strain EAN1pec | A8L2C4 | NIFH_FRASN | 289 | reviewed
Frankia sp. strain EuIK1 | Q47922 | NIFH_FRASE | 287 | reviewed
Frankia sp. strain FaC1 | P46034 | NIFH_FRASP | 287 | reviewed
Gloeothece sp. KO11DG | A4PC92 | A4PC92_9CHRO | 119 | unreviewed
Gloeothece sp. KO68DGA | A2V899 | A2V899_9CHRO | 290 | unreviewed
Gluconacetobacter diazotrophicus strain ATCC 49037 | Q9ZIE4 | NIFH_GLUDA | 298 | reviewed
Halorhodospira halophila SL1 | A1WTQ6 | A1WTQ6_HALHL | 291 | unreviewed
Herbaspirillum seropedicae | P77873 | NIFH_HERSE | 292 | reviewed
Klebsiella pneumoniae | P00458 | NIFH_KLEPN | 293 | reviewed
Klebsiella pneumoniae strain 342 | B5XPH2 | NIFH_KLEP3 | 293 | reviewed
Magnetococcus strain MC-1 | A0L6X0 | NIFH_MAGSM | 296 | reviewed
Mastigocladus laminosus | Q47917 | NIFH1_MASLA | 295 | reviewed
Mastigocladus laminosus | Q47921 | NIFH2_MASLA | 307 | reviewed
Methanobacterium ivanovii | P08624 | NIFH2_METIV | 263 | reviewed
Methanobacterium ivanovii | P51602 | NIFH1_METIV | 275 | reviewed
Methanobacterium thermoautotrophicum | O26739 | NIFH2_METTH | 265 | reviewed
Methanobacterium thermoautotrophicum | O27602 | NIFH1_METTH | 275 | reviewed
Methanobacterium thermoautotrophicum strain DSM 2133 | Q50785 | NIFH1_METTM | 275 | reviewed
Methanobrevibacter arboriphilus | Q48890 | Q48890_9EURY | 109 | unreviewed
Methanobrevibacter ruminantium | O93628 | O93628_9EURY | 139 | unreviewed
Methanobrevibacter smithii | O93629 | O93629_METSM | 139 | unreviewed
Methanococcus jannaschii | Q58289 | NIFH_METJA | 279 | reviewed
Methanococcus maripaludis | Q50218 | NIFH_METMP | 275 | reviewed
Methanococcus thermolithotrophicus | P08625 | NIFH2_METTL | 292 | reviewed
Methanococcus thermolithotrophicus | P25767 | NIFH1_METTL | 284 | reviewed
Methanococcus voltae | P06119 | NIFH_METVO | 278 | reviewed
Methanoculleus marisnigri strain ATCC 35101 | A3CWW3 | NIFH_METMJ | 272 | reviewed
Methanopyrus kandleri | Q8TVH3 | Q8TVH3_METKA | 266 | unreviewed
Methanosarcina acetivorans | Q8TJ93 | Q8TJ93_METAC | 273 | unreviewed
Methanosarcina acetivorans | Q8TJZ9 | Q8TJZ9_METAC | 265 | unreviewed
Methanosarcina barkeri | O93630 | O93630_METBA | 142 | unreviewed
Methanosarcina barkeri | P54799 | NIFH1_METBA | 275 | reviewed
Methanosarcina barkeri | P54800 | NIFH2_METBA | 273 | reviewed
Methanosarcina lacustris | Q977F4 | Q977F4_9EURY | 134 | unreviewed
Methanosarcina lacustris | Q977P9 | Q977P9_9EURY | 134 | unreviewed
Methanosarcina mazei | Q8PYY0 | NIFH_METMA | 273 | reviewed
Methanosarcina mazei | Q8PZH9 | Q8PZH9_METMA | 265 | unreviewed
Methanospirillum hungatei strain DSM 864 | Q2FUB7 | NIFH_METHJ | 280 | reviewed
Methylobacter luteus | Q6KEW2 | Q6KEW2_9GAMM | 148 | unreviewed
Methylobacter marinus | Q93DU4 | Q93DU4_METMR | 120 | unreviewed
Methylobacterium nodulans strain ORS 2060 | B8ITG7 | NIFH_METNO | 299 | reviewed
Methylobacterium strain 4-46 | B0UAK2 | NIFH_METS4 | 299 | reviewed
Methylocella silvestris strain DSM 15510 | B8EJ25 | B8EJ25_METSB | 293 | unreviewed
Methylocella tundrae | Q6KEX4 | Q6KEX4_METTU | 147 | unreviewed
Methylomonas methanica | Q93DU1 | Q93DU1_9GAMM | 121 | unreviewed
Methylomonas rubra | Q83TP5 | Q83TP5_METRU | 150 | unreviewed
Methylosinus trichosporium OB3b | D5QKI5 | D5QKI5_METTR | 295 | unreviewed
Nostoc commune | P26250 | NIFH_NOSCO | 297 | reviewed
Nostoc muscorum | Q09158 | NIFH_NOSMU | 108 | reviewed
Nostoc sp. strain PCC 6720 | Q51296 | NIFH_NOSS6 | 295 | reviewed
Nostoc sp. strain PCC 7120 | O30577 | NIFH2_NOSS1 | 297 | reviewed
Nostoc sp. strain PCC 7120 | P00457 | NIFH1_NOSS1 | 295 | reviewed
Oscillatoria sp. PCC 6506 | D8G542 | D8G542_9CYAN | 300 | unreviewed
Paenibacillus azotofixans | Q9AKT4 | NIFH2_PAEAZ | 292 | reviewed
Paenibacillus azotofixans | Q9AKT8 | NIFH1_PAEAZ | 292 | reviewed
Pectobacterium atrosepticum | Q6D2Y8 | NIFH_ERWCT | 293 | reviewed
Pelobacter carbinolicus DSM 2380 | Q3A2R9 | Q3A2R9_PELCD | 292 | unreviewed
Pelodictyon luteolum DSM 273 | Q3B2P6 | NIFH_PELLD | 274 | reviewed
Pelodictyon phaeoclathratiforme DSM 5477 | B4SC59 | NIFH_PELPB | 274 | reviewed
Phormidium sp. AD1 | Q9F8Z5 | Q9F8Z5_9CYAN | 108 | unreviewed
Plectonema boryanum | Q00240 | NIFH_PLEBO | 296 | reviewed
Prosthecochloris aestuarii strain DSM 271 | B4S9H5 | NIFH_PROA2 | 274 | reviewed
Prosthecochloris vibrioformis strain DSM 265 | A4SFU6 | NIFH_PROVI | 274 | reviewed
Pseudomonas stutzeri strain A1501 | A4VJ70 | NIFH_PSEU5 | 293 | reviewed
Rhizobium etli | P00462 | NIFH_RHIET | 297 | reviewed
Rhizobium leguminosarum bv. trifolii | P00461 | NIFH_RHILT | 297 | reviewed
Mesorhizobium loti | Q98AP7 | NIFH_RHILO | 297 | reviewed
Rhizobium meliloti | P00460 | NIFH_RHIME | 297 | reviewed
Sinorhizobium fredii | P19068 | NIFH_RHISN | 296 | reviewed
Rhodobacter azotoformans | Q8L0U5 | Q8L0U5_9RHOB | 154 | unreviewed
Rhodobacter capsulatus | P08718 | NIFH1_RHOCA | 295 | reviewed
Rhodobacter capsulatus | Q07942 | NIFH2_RHOCA | 275 | reviewed
Rhodobacter capsulatus strain ATCC BAA-309 | D5AKX6 | D5AKX6_RHOCB | 290 | unreviewed
Rhodobacter capsulatus strain ATCC BAA-309 | D5ANI3 | D5ANI3_RHOCB | 295 | unreviewed
Rhodobacter sp. AP-10 | Q8L0U4 | Q8L0U4_9RHOB | 153 | unreviewed
Rhodobacter sp. SW2 | C8S0T4 | C8S0T4_9RHOB | 295 | unreviewed
Rhodobacter sphaeroides | O31183 | NIFH_RHOSH | 291 | reviewed
Rhodobacter sphaeroides strain ATCC 17023 | Q3J0H1 | NIFH_RHOS4 | 291 | reviewed
Rhodobacter sphaeroides strain ATCC 17029 | A3PLS9 | NIFH_RHOS1 | 291 | reviewed
Rhodopseudomonas blastica | Q8L0T4 | Q8L0T4_RHOBL | 154 | unreviewed
Rhodopseudomonas palustris strain BisB5 | Q13C78 | NIFH_RHOPS | 299 | reviewed
Rhodopseudomonas palustris strain HaA2 | Q2J1I1 | NIFH_RHOP2 | 299 | reviewed
Rhodospirillum rubrum | P22921 | NIFH_RHORU | 295 | reviewed
Rhodovulum sulfidophilum | Q8L0T6 | Q8L0T6_RHOSU | 153 | unreviewed
Roseiflexus castenholzii strain DSM 13941 | A7NR80 | NIFH_ROSCS | 273 | reviewed
Roseiflexus strain RS-1 | A5USK5 | NIFH_ROSS1 | 273 | reviewed
Scytonema sp. NCC-4B | Q19AR3 | Q19AR3_9CYAN | 103 | unreviewed
Sinorhizobium medicae WSM419 | A6UME9 | A6UME9_SINMW | 297 | unreviewed
Spirochaeta aurantia | Q9AMD9 | Q9AMD9_SPIAU | 109 | unreviewed
Spirochaeta aurantia | Q9AME0 | Q9AME0_SPIAU | 109 | unreviewed
Spirochaeta stenostrepta | Q9AMD7 | Q9AMD7_9SPIO | 134 | unreviewed
Spirochaeta stenostrepta | Q9AMD8 | Q9AMD8_9SPIO | 143 | unreviewed
Spirochaeta zuelzerae | Q9AMD6 | Q9AMD6_9SPIO | 143 | unreviewed
Symploca atlantica PCC 8002 | Q7WUP5 | Q7WUP5_9CYAN | 108 | unreviewed
Synechococcus strain JA-2-3B | Q2JP78 | NIFH_SYNJB | 292 | reviewed
Synechococcus strain JA-3-3Ab | Q2JTL7 | NIFH_SYNJA | 292 | reviewed
Synechocystis sp. WH 002 | Q7WUP2 | Q7WUP2_9SYNC | 108 | unreviewed
Syntrophobacter fumaroxidans strain DSM 10017 | A0LH11 | NIFH_SYNFM | 274 | reviewed
Teredinibacter turnerae T7901 | C5BTB0 | NIFH_TERTT | 292 | reviewed
Acidithiobacillus ferrooxidans | P06661 | NIFH_THIFE | 296 | reviewed
Tolumonas auensis DSM 9187 | C4LAS5 | NIFH_TOLAT | 295 | reviewed
Tolypothrix sp. PCC 7101 | Q7WUP4 | Q7WUP4_9CYAN | 108 | unreviewed
Tolypothrix sp. PCC 7601 | Q3L168 | Q3L168_9CYAN | 136 | unreviewed
Treponema azotonutricium | Q9AMC8 | Q9AMC8_9SPIO | 144 | unreviewed
Treponema denticola | Q9AMD4 | Q9AMD4_TREDE | 143 | unreviewed
Treponema primitia | Q9AMD0 | Q9AMD0_9SPIO | 143 | unreviewed
Trichodesmium erythraeum IMS101 | O34106 | NIFH_TRIEI | 296 | reviewed
Trichodesmium thiebautii | P26254 | NIFH_TRITH | 294 | reviewed
Uncultured methanogenic archaeon RC-I | Q0W443 | NIFH_UNCMA | 274 | reviewed
Vibrio cincinnatii | Q9LAI5 | Q9LAI5_VIBCI | 229 | unreviewed
Vibrio natriegens | Q9RQR0 | Q9RQR0_VIBNA | 198 | unreviewed
Vibrio parahaemolyticus | A7X7L2 | A7X7L2_VIBPA | 139 | unreviewed
Wolinella succinogenes | Q7M8U8 | NIFH_WOLSU | 303 | reviewed
Xanthobacter autotrophicus | Q93G65 | Q93G65_XANAU | 134 | unreviewed
Xenococcus sp. | O08262 | O08262_9CYAN | 108 | unreviewed
Zymomonas mobilis | Q5NLG3 | NIFH_ZYMMO | 295 | reviewed

(a) ‘Reviewed’ status indicates sequences that were manually annotated and reviewed in the Swiss-Prot database. These sequences are considered reliable because they were inferred from homology studies and their existence is supported by evidence at the transcript and protein levels. ‘Unreviewed’ status indicates sequences that were automatically annotated and have not been manually reviewed (TrEMBL database). They are mostly derived from prediction studies and have far less verification at the protein and transcript levels.
