
AN ABSTRACT OF THE DISSERTATION OF

Amy Renee Smith for the degree of Doctor of Philosophy in Ocean, Earth, and Atmospheric Sciences presented on November 22, 2017.

Title: Impact of Igneous Mineralogy on the Composition and Metabolic Function of Microbial Biofilms in a Thermal Suboceanic Crustal Aquifer.

Abstract approved: Frederick S. Colwell, Martin R. Fisk

Igneous oceanic crust encompasses ~60% of Earth’s surface and is composed of basalt glass and mafic, ultramafic, and felsic minerals. A vast marine aquifer lies within the crust, exchanging geochemically altered fluids with seawater from the overlying ocean at ridge crests, flanks, seamounts, and outcrops where permeable crust is exposed. Correlation studies of crustal surface rocks have shown that mineralogy is linked to microbiology; however, the influence of individual mineral phases or their compositions on microbial communities has yet to be empirically demonstrated. In addition, the habitable zone of oceanic crust can extend to depths of several kilometers and communities deeper within this zone may be more representative of the whole suboceanic aquifer ecosystem than those communities found just at the surface where the environment is influenced by infiltration of cold, oxic seawater. The focus of this work was to explore how deep subsurface biofilm communities in the suboceanic aquifer of the Juan de Fuca Ridge (JdFR) are influenced by igneous mineral phases and their compositions. We expect that crustal mineralogy will affect the microbial community structure and of these aquifer communities. Exploring the of these communities will also lead to a greater understanding of the functioning of the suboceanic aquifer ecosystem and its role in the global carbon cycle.

Microbial communities that colonized a variety of in situ-incubated igneous minerals and glasses were investigated. We used Integrated Ocean Drilling Program (IODP) borehole 1301A as a subseafloor observatory to incubate these mineral substrates for a four-year period. After retrieval, we found that taxa related to thermophilic and hyperthermophilic chemolithotrophs and heterotrophs were present. Archaeal taxa included three genera of , including members of the sulfate-reducing . Bacterial taxa were overwhelmingly dominated by and other deep-branching . Most taxa were not closely related to known organisms, so their metabolic capabilities could not be predicted by taxonomic association. Microbial communities were also influenced by the mineralogical properties of attachment surfaces, particularly with respect to iron-rich phases. Communities attached to rocks, minerals, and glasses in this environment were more similar to each other than they were to aquifer fluid communities, bottom seawater, and other marine or deep crustal communities, and thus represented a distinct mineral-colonizing “attached” community. Metabolic reconstruction of metagenome-derived genomes from olivine also showed that sulfate reduction, carbon fixation, and hydrogenotrophic pathways dominated that community. These results suggest that there is a potential for hydrogen-based chemolithoautotrophy in the deep oceanic crust, that these microbial communities do not fully rely on photosynthetically-derived organic carbon for energy and carbon, and that the suboceanic aquifer biosphere may play a role in global carbon cycling and productivity.

©Copyright by Amy Renee Smith
November 22, 2017
All Rights Reserved

Impact of Igneous Mineralogy on the Composition and Metabolic Function of Microbial Biofilms in a Thermal Suboceanic Crustal Aquifer

by Amy Renee Smith

A DISSERTATION

submitted to

Oregon State University

in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

Presented November 22, 2017
Commencement June 2018

Doctor of Philosophy dissertation of Amy Renee Smith presented on November 22, 2017.

APPROVED:

Co-Major Professor, representing Ocean, Earth, and Atmospheric Sciences

Co-Major Professor, representing Ocean, Earth, and Atmospheric Sciences

Dean of the College of Earth, Ocean, and Atmospheric Sciences

Dean of the Graduate School

I understand that my dissertation will become part of the permanent collection of Oregon State University libraries. My signature below authorizes release of my dissertation to any reader upon request.

Amy Renee Smith, Author

ACKNOWLEDGEMENTS

I wish to thank my committee members for their patience and support through the years. Drs. Frederick S. Colwell and Martin R. Fisk have been instrumental in helping me amalgamate my work into cohesive, influential manuscripts, and provided advice for funding applications, exams, classwork, presentations, and more. I am forever in your debt for helping me become a better scientist. Thank you to Drs. Staci Simonich, Adam Schultz, and Ryan Mueller for agreeing to serve on my committee.

A special thank you goes to Dr. Mueller for providing significant support in bioinformatics, which was crucial to the success of this dissertation, and for our deep discussions about metabolism. Finally, the Geomicrobiology Group has provided insight and advice for presentations, support during my time here, and invaluable feedback in a warm and supportive environment.

I would also like to acknowledge collaborators who have contributed to my work and offered insight when needed. For their contribution to my ocean crust research, I would like to thank Dr. Mark Nielsen, Dr. C. Geoffrey Wheat, Dr. Andrew Fisher, Dr. Hans Jannasch, Dr. Keir Becker, and Dr. Stefan Sievert. Thank you also to the crews of the submersible Alvin and the RV Atlantis and JOIDES Resolution.

Many people provided instruction on writing and scientific analyses throughout my time at OSU. Dr. Andrew Thurber provided instruction and guidance on community analysis with PRIMER, Teresa Sawyer trained me in the use of the Environmental Scanning Electron Microscope, and Dr. Olivia Mason introduced me to MG-RAST and STAMP.

Funding for the flow cell construction was provided by a small grant from the Ocean Drilling Program. The idea for the flow cells used in this project was developed at a workshop in Bergen, Norway in 2002 and was made possible by support from the University of Bergen and the Ocean Drilling Program. Unimin Co. provided the Fo90 olivine.

This work was also funded by grants from the National Aeronautics and Space Administration (NASA), the National Science Foundation’s Center for Dark Energy Biosphere Investigations (C-DEBI), and the Census of Deep Life, which is funded by the Sloan Foundation as part of the Deep Carbon Observatory.

I would like to give a very special thanks to the folks involved with C-DEBI and the Deep Carbon Observatory. In particular, I would like to thank Rosalynn Sylvan and Katie Pratt for providing incredible support and positivity. Those involved with the sequencing of Census of Deep Life projects at the Marine Biological Laboratory provided stellar pyrotag and metagenome sequencing results and other related support that provided the basis for all the work in this dissertation. I can’t thank you all enough!

Finally, I would like to acknowledge my family and friends who provided so much love and support throughout graduate school. My husband Shane and daughters Stella and Zephyr have loaded me up with enough hugs and smiles to get me through the moments when I have felt discouraged. Stella and Zephyr, you are my inspiration and I do this as much for you as I do for myself! I hope you are proud of your Mommy and know that you can do anything you set your mind to. My Mom Cynthia has been incredibly helpful by flying to conferences with me to babysit the girls while I present my work and being there at the other end of the phone line when I needed encouragement. I certainly could not have done this without you, Mom, and I appreciate everything you have done for me! I love you.

CONTRIBUTION OF AUTHORS

For Chapters 1 and 5, Drs. Martin R. Fisk and Frederick S. Colwell provided edits, suggested revisions, and direction.

For Chapter 2, Drs. Martin R. Fisk and Frederick S. Colwell provided writing and conceptual advice, Dr. Andrew Thurber and Gilberto E. Flores helped guide the analyses of pyrotag data, and Dr. Olivia U. Mason provided technical and writing advice. Radu Popa provided support during the writing phase and was instrumental in procuring funding for pyrotag sequencing from C-DEBI and the Census of Deep Life.

For Chapter 3, Ryan Mueller provided instruction and guidance on bioinformatics principles and lent his technical expertise to help me solve problems during the metagenome analysis. Olivia U. Mason aided in project design, funding, and initial analyses. Radu Popa provided support during the writing phase. Brandon Kieft analyzed genome bin purity and completeness using CheckM and produced Supplemental Figure 3.1. Frederick S. Colwell and Martin R. Fisk provided guidance and support throughout the data collection, analysis, and writing phases.

For Chapter 4, Ryan Mueller again provided instruction and guidance on bioinformatics and metagenome analysis. Frederick S. Colwell and Martin R. Fisk provided guidance and support throughout the data collection, analysis, and writing phases.

TABLE OF CONTENTS

1. INTRODUCTION
    Oceanic Crust
        Structure
        The Suboceanic Aquifer
        The Habitable Zone
    Aquifer Sampling
        Circulation Obviation Retrofit Kits (CORKs)
        Microbiological Sampling
        Planktonic vs. Sessile Communities
    The Suboceanic Aquifer Biosphere (SAB)
        A Significant Reservoir of Biomass
        Common Energy Sources
    Potential for H2-based Life in the SAB
    The SAB of the Juan de Fuca Ridge
        Predicted Metabolisms
        Dominant Taxonomic Groups
    Carbon Fixation in the SAB
        Carbon Fixation Pathways
        The Ancient Wood-Ljungdahl Pathway
        Enzymology of the Wood-Ljungdahl Pathway
        Evidence for Carbon Fixation in the JdFR Aquifer
    Research Questions and Hypotheses
    Approach: A Genomic Investigation of Colonizing Mineral Biofilms in a Subseafloor Microbial Observatory
    Objective and Expected Outcomes
    References

2. DEEP CRUSTAL COMMUNITIES OF THE JUAN DE FUCA RIDGE ARE GOVERNED BY MINERALOGY
    Abstract
    Introduction
    Materials and Methods
        Overview of Experimental Design
        DNA Extraction and Sequencing
        Data Processing
        Removal of Laboratory Contaminants from Sequence Data
        Mineral Community Analysis
        Environmental Scanning Electron Microscopy
    Results
        Mineralogy Influenced Archaeal Community Structure
        Mineral Chemistry Influenced Bacterial Community Structure
        Community Richness
        Comparison to Other Local JdFR Communities
        Attached vs. Planktonic Aquifer Communities
        Biofilm Morphologies
    Discussion
    Conclusion and Future Directions
    Acknowledgements
    References

3. CARBON AND ENERGY PATHWAYS INDICATE DOMINANT BACTERIA COLONIZING OLIVINE IN YOUNG THERMAL OCEANIC CRUST ARE ACETOGENS
    Abstract
    Introduction
    Materials and Methods
        Overview of Experimental Design
        DNA Extraction
        Metagenome Sequencing
        Pre-Processing of Sequence Read Files for Assembly
        Metagenome Assembly
        Pathway Distribution and Data Availability
        Binning Genomes from the Olivine Metagenome
        Bin Taxonomic Assignments
        Bin Quality Check
        Metabolic Reconstruction
        Carbon Fixation Target Genes
        Translating and Assigning Function
    Results
        Metagenome Assembly and Genome Binning
        Genome …
        Carbon Fixation Pathways
        Energy Metabolisms
        Hydrogenases and Electron Transfer Agents
    Discussion
        Phylogenetic Lineages of the Olivine Community and the Deep JdFR Aquifer
        Carbon Fixation Genes
        The Wood-Ljungdahl Pathway
        The Acetyl-CoA Pathway
        CAM Metabolism
        and Metabolism
        Hydrogenases and Other ETAs
        Olivine Community Model
    Conclusion
    Acknowledgements
    References

4. DOMINANT BACTERIUM FROM A THERMAL SUBOCEANIC AQUIFER OLIVINE BIOFILM IS A NOVEL ACETOGEN
    Abstract
    Introduction
    Materials and Methods
        In-situ Colonization of Olivine in the JdFR Aquifer
        Genomic DNA Extraction and Metagenome Sequencing
        Sequence Quality Check, Assembly, and Binning
        Verification of Bin Purity
        Phylogenetic Tree Construction
        Genome Annotation and Reconstruction
        Visualization of Reconstructed Metabolism
    Results
        Evolutionary Relatedness
        Genome Attributes
        Wood-Ljungdahl Pathway and the acs Gene Cluster
        Incomplete TCA Cycle
        Oxidative Phosphorylation
        Incomplete Sulfate Reduction Pathway
        Transporters
        Central Metabolism
    Discussion
        Genome Attributes
        Evolutionary Relationship to the Acetogen Clade
        An Acetogen Dominates an Olivine Community
        acs Gene Cluster Comparison
        Incomplete TCA Cycle
        Oxidative Phosphorylation
        Lack of Sulfate Reduction and Carbohydrate/Lipid Degradative Pathways
        Transport and Degradation of Amino Acids and Oligopeptides
        Central …
        Other Carbon Metabolisms
    Conclusion
    References

5. SYNTHESIS AND CONCLUSION
    References

6. APPENDICES
    Appendix A: Chapter 2 Supplementary Material
    Appendix B: Chapter 3 Supplementary Material
    Appendix C: Chapter 4 Supplementary Material

7. BIBLIOGRAPHY

LIST OF FIGURES

1. Figure 1.1. Layers of oceanic crust
2. Figure 1.2. General overview of the suboceanic aquifer
3. Figure 1.3. Global carbon biomass by reservoir
4. Figure 2.1. Location, experimental timeline, and temperature data for this study
5. Figure 2.2. Community structure of (A) archaeal and (B) bacterial communities as represented in an nMDS
6. Figure 2.3. Archaeal community composition from this and other subseafloor studies at IODP Holes 1301A and 1026B
7. Figure 2.4. Bacterial community composition from this and other subseafloor studies at IODP Holes 1301A and 1026B
8. Figure 2.5. Scanning electron micrographs of mineral and glass biofilms
9. Figure 3.1. Summary of subseafloor olivine colonization study
10. Figure 3.2. Genomic binning of olivine metagenome using VizBin
11. Figure 3.3. Summary of carbon fixation, methane, nitrogen, and sulfur cycling pathways in olivine genomic bins as determined from the KEGG Pathway Module
12. Figure 3.4. Wood-Ljungdahl pathway summary for olivine genomes (A), presence of potential alternatives to formate dehydrogenase (B), and relative abundance of respiratory and biomolecular metabolisms in olivine genomes (C)
13. Figure 3.5. Hydrogenases and Electron Transfer Agents (ETAs) present in KEGG-annotated olivine genome bins
14. Figure 3.6. Olivine community model depicting nutrient cycling and proposed routes of metabolites and substrates
15. Figure 4.1. 16S rRNA gene phylogeny of Ca. Acetocimmeria pyornia and closest taxonomic relatives
16. Figure 4.2. The Wood-Ljungdahl pathway for carbon fixation
17. Figure 4.3. Arrangement of acs gene cluster of the Wood-Ljungdahl pathway on contig-65_203 of Ca. Acetocimmeria pyornia as compared to closely related acetogens
18. Figure 4.4. Tricarboxylic Acid Cycle and related pathways in Ca. A. pyornia
19. Figure 4.5. Metabolic pathways of Ca. A. pyornia

LIST OF TABLES

1.1 Igneous minerals and glasses incubated in IODP Hole 1301A, their mineral classes, and compositions
2.1 Incubated igneous phase composition and community data
3.1 CheckM marker gene summary and taxonomic assignment for each genomic bin from the olivine metagenome in this study
4.1 KEGG annotations of contig-65_203 containing the acs gene cluster of the Wood-Ljungdahl pathway for carbon fixation
4.2 Complete pathways of carbon fixation and energy metabolism in Ca. A. pyornia (Ca. Apy) and three closely related acetogens as determined by the KEGG pathway module
4.3 Complete pathways of carbohydrate and lipid metabolism in Ca. A. pyornia (Ca. Apy) and two closely related acetogens as determined by the KEGG pathway module
4.4 Complete pathways of nucleotide and metabolism in Ca. A. pyornia (Ca. Apy) and two closely related acetogens as determined by the KEGG pathway module

LIST OF APPENDIX FIGURES

1. Supplemental Figure 2.1. Cluster-based community similarity for …
2. Supplemental Figure 2.2. Cluster-based community similarity for Bacteria with contaminants removed
3. Supplemental Figure 2.3. (A) Archaeal and (B) Bacterial community compositions for the eight minerals and glasses incubated in the subseafloor prior to removing suspected laboratory contaminants
4. Supplemental Figure 2.4. nMDS plots for (A) Archaeal and (B) Bacterial microbial communities prior to removal of suspected contaminants
5. Supplemental Figure 2.5. EDAX spectrum for Fo90 olivine biofilm in Figure 2.5A
6. Supplemental Figure 2.6. EDAX spectrum for Fo90 olivine secondary surface mineral in Figure 2.5A
7. Supplemental Figure 3.1. Quality checking and taxonomic assignment of genome bins from Figure 2
8. Supplemental Figure 4.1. Ffh gene phylogeny of Ca. Acetocimmeria pyornia and closest relatives

LIST OF APPENDIX TABLES

2.1 Pyrotag read counts from domain-specific primer amplification of the v6v4 region of the 16S rRNA gene for the incubated minerals and glasses in this study
3.1 Statistics table of olivine metagenome assembly using IDBA-UD. Bolded kmer size 65 was used for analysis based on optimal results of largest n50, maximum contig size, and total assembly size
3.2 Results of genomic bin quality and completeness using CheckM
3.3 List of olivine metagenome genes identified from ‘COG0243, BisC, anaerobic dehydrogenase, typically selenocysteine-containing’
3.4 Description of hydrogenase and related genes from Figure 3.5A
3.5 Description of ferredoxins and related genes from Figure 3.5B
3.6 Description of cytochromes and other electron transport agent or related genes from Figure 3.5C
4.1 Information for sequences from Figure 4.1
4.2 Complete KEGG pathways for secondary metabolism, energy metabolism and genetic information processing
4.3 Complete KEGG pathways for secondary metabolism, transport systems
4.4 Complete KEGG pathways for other energy metabolisms and regulatory systems
4.5 List of all and associated KEGG-annotated genes for Ca. A. pyornia


Impact of Igneous Mineralogy on the Composition and Metabolic Function of Microbial Biofilms in a Thermal Suboceanic Crustal Aquifer

1. INTRODUCTION

Oceanic Crust

Structure

Oceanic crust spans ~60% of Earth’s surface and is formed at mid-ocean ridges and submarine hotspot volcanoes. Crust formed at slow (1–5 cm y⁻¹) and ultra-slow (<1 cm y⁻¹) spreading ridges, such as the Southwest Indian Ridge, is largely composed of iron- and magnesium-rich mantle peridotites (Dick, 1989; Dick et al., 2008); however, at intermediate (5–10 cm y⁻¹) or fast (10–20 cm y⁻¹) spreading ridges, oceanic crust has a distinct layered structure (Bratt & Purdy, 1984; Dick et al., 2003, 2008; Kennett, 1982; Kearey et al., 2009). These layers consist of extrusive and intrusive igneous rocks overlain by pelagic sediments. The pelagic sediments comprise Layer 1 and accumulate away from ridge crests over time; hence, Layer 1 is missing from most young ridge crest environments. One exception to this rule is the eastern flank of the Juan de Fuca Ridge (JdFR), which lies proximal to the western edge of the North American continent and receives terrigenous input. The uppermost igneous layer is designated Layer 2 and consists largely of iron (Fe)-bearing amorphous basalt glass, olivine ([Mg,Fe]₂SiO₄), pyroxene ([Mg,Fe]CaSi₂O₆), and Fe-poor feldspars ([Na,Ca][Al,Si]₂Si₂O₈; White & Klein, 2013; Fisher et al., 2005). Layer 2 can be divided into two sublayers: Layer 2A consists of pillow basalt and Layer 2B consists of sheeted dikes (Figure 1.1). Layer 3 consists of intrusive coarse-grained gabbros, which are similar to basalt in composition, and Layer 4 is predicted to be olivine- and pyroxene-rich peridotite, or crystallized mantle (Bratt & Purdy, 1984; White & Klein, 2013). Since olivine and pyroxene are common Fe-bearing minerals in deep basaltic crust and in crust that forms slowly, these minerals may exert a wide-ranging influence over the microbial communities in these settings.
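The spreading-rate classes quoted above can be expressed as a short lookup. The following Python sketch is illustrative only: the function names are hypothetical, and the assignment of the shared boundary values (1, 5, and 10 cm y⁻¹) to the faster class is an assumption, since the ranges in the text share their endpoints.

```python
def classify_spreading_rate(rate_cm_per_yr: float) -> str:
    """Return the spreading-rate class for a rate given in cm per year.

    Ranges follow the text: ultra-slow (<1), slow (1-5),
    intermediate (5-10), fast (10-20). Boundary values are assigned
    to the faster class, which is an assumption of this sketch.
    """
    if rate_cm_per_yr < 0:
        raise ValueError("spreading rate must be non-negative")
    if rate_cm_per_yr < 1:
        return "ultra-slow"
    if rate_cm_per_yr < 5:
        return "slow"
    if rate_cm_per_yr < 10:
        return "intermediate"
    if rate_cm_per_yr <= 20:
        return "fast"
    return "outside quoted ranges"


def expected_crustal_structure(rate_cm_per_yr: float) -> str:
    """Map a spreading rate to the crustal structure described in the text."""
    rate_class = classify_spreading_rate(rate_cm_per_yr)
    if rate_class in ("ultra-slow", "slow"):
        return "largely mantle peridotite"
    if rate_class in ("intermediate", "fast"):
        return "layered crust"
    return "not described"


if __name__ == "__main__":
    for rate in (0.5, 3.0, 7.0, 15.0):
        print(rate, classify_spreading_rate(rate), expected_crustal_structure(rate))
```

Under this scheme, an intermediate-spreading setting such as the study site described in the Figure 1.1 caption maps to layered crust.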

Figure 1.1. Layers of oceanic crust. a) Layers of igneous oceanic crust in fast and intermediate spreading zones. b) Model of ultra-slow spreading crust composed mainly of olivine-rich peridotite. Layer 2A, the porous, fractured uppermost layer of igneous crust in an intermediate spreading zone, is the location of study in this dissertation and where the greatest microbial biomass is predicted to reside (Heberling et al., 2010; modified from Alt, 1995 after Bratt and Purdy [1984] with additional data from Kennett [1982], Dick et al. [2003, 2007] and Kearey et al. [2009]).


The Suboceanic Aquifer

Volcanic igneous crust is exposed (with little to no sediment cover) at ridge crests, submarine volcanoes, and outcrops along the seafloor, which allows seawater to circulate through the porous, fractured upper basaltic Layer 2 (Wheat, 2004; Bratt & Purdy, 1984; Fisher et al., 2005). Cool, oxic seawater enters the crust and becomes heated and more chemically reduced as it circulates, eventually venting back to the ocean at hydrothermal vents, seamounts, nearby outcrops, or through diffuse crustal flow (Figure 1.2; Edwards et al., 2011; Kelley et al., 2001; Emerson & Moyer, 2002; Alt, 1995). Aquifer fluid is warmer and more reduced near ridge crests, deeper within the crust, and where sediments prevent seawater exchange (Edwards et al., 2012a, b; Edwards et al., 2005). In these settings, concentrations of sulfate, nitrate, and oxygen are lower than in bottom seawater, and concentrations of reduced iron, ammonium, methane, and molecular hydrogen are higher (Lin et al., 2012).

The thermodynamic disequilibrium between circulating seawater and reduced iron in crustal minerals can be used as a source of energy for microorganisms in the crust, potentially supporting metabolisms such as Fe oxidation (Orcutt et al., 2011b; Edwards et al., 2012a; Bach & Edwards, 2003; Edwards et al., 2003) and hydrogenotrophy (Edwards et al., 2005; McCollom & Bach, 2009; Miller et al., 2017; Jungbluth et al., 2017). In the suboceanic aquifer, there is little remaining allochthonous organic matter originating from photosynthetic processes in the surface ocean, and even though this photosynthetically-derived organic matter may be consumed in the crust over time (McCarthy et al., 2010), the suboceanic aquifer biosphere (SAB) ecosystem as a whole may be more dependent on chemosynthesis supported by water-rock reactions. Thus, studying this vast chemosynthetic ecosystem may provide valuable insights into early life on Earth or other worlds where chemosynthesis is expected to be dominant.

Figure 1.2. General overview of the suboceanic aquifer. Cool, oxygen-rich bottom seawater containing photosynthetically-derived organic carbon, nitrate, and sulfate enters the crust at exposed areas. This circulating fluid becomes increasingly altered and reduced, accumulating reduced iron, methane, ammonium, and molecular hydrogen, while the oxygen, sulfate, organic carbon, and nitrate are consumed. Microorganisms can use the thermodynamic disequilibrium between reduced minerals and seawater to support chemosynthesis. Hydrothermally-altered fluid exits the crust, returning nutrients to the deep ocean.

The Habitable Zone

The oceanic crust contains a vast microbial habitat that can theoretically extend to the depth at which life’s upper temperature limit of ~120 °C is reached. This ‘habitable zone’ occurs mainly in the upper basaltic layer of most crustal environments, but can extend up to 4 kilometers deep in older or slow-spreading crust (Heberling et al., 2010; Ildefonse et al., 2010). At this depth, the ‘habitable zone’ can extend down into the gabbroic layer or perhaps even as deep as the peridotite layer (Figure 1.1; Heberling et al., 2010; Ildefonse et al., 2010). Since the potentially habitable crust extends kilometers deep and spans over half of Earth’s surface, the sheer volume of the suboceanic aquifer ecosystem suggests that the deep crustal biosphere has the potential to significantly impact global biogeochemical cycles and ocean chemistry and productivity (Orcutt et al., 2011b; Edwards et al., 2012a; Mason et al., 2009; McCarthy et al., 2010). Further exploration of the phylogenetic and metabolic properties of the microbes that live in this environment, as well as culturing to determine their growth rates and activities, is crucial to determining their full impact on ocean productivity and global biogeochemical cycling.

Aquifer Sampling

Circulation Obviation Retrofit Kits (CORKs)

One of the most important tools developed for the study of the suboceanic aquifer is the CORK. The CORK system allows microbiologists, geochemists, and seafloor hydrologists to obtain data from long-term experiments, which they use to understand the aquifer and its inhabitants. The CORK consists of coated steel, fiberglass, or high-density plastic casings (Edwards et al., 2010) equipped with packers that create an artificial seal between layers of rock or sediment in the crust, simulating natural flow conditions and limiting fluid mixing in the borehole. CORKs offer three advantages: 1) they allow the aquifer to return to its natural state of temperature, pressure, and fluid flow, while also giving the microbial communities a chance to recover from the drilling process; 2) they enable continuous long-term study of aquifer geochemistry while monitoring changes in environmental parameters; and 3) they allow the incubation of in-situ microbial colonization devices (Fisher et al., 2005).

Microbiological Sampling

There are two options for microbiological sampling using a CORK. The first option is to sample aquifer fluids at the well head without disturbing the aquifer (this was usually done during yearly JdFR cruises); the second option is to use colonization substrates that are incubated long-term in flow cells or other devices suspended from an instrument string within the CORK (Smith et al., 2011; Orcutt et al., 2011a). These colonization substrates are usually rocks or minerals similar in composition to the surrounding crust and are used to investigate the attached microbial community in the aquifer. Much of the sampling of the deep crustal biosphere is done through fluid sampling, either at the CORK or by sampling the efflux of aquifer fluids at sites such as Baby Bare near Hole 1301A (Huber et al., 2006; Lin et al., 2012; Jungbluth et al., 2013, 2016; Cowen et al., 2003).

Microbiologists using fluids for microbiological or geochemical analyses have the advantage of being able to obtain aquifer samples frequently and without disturbing the microbial observatory, and they can collect larger samples than with downhole collection methods. Mineral colonization studies are limited in several ways: the minerals cannot be sampled until they are recovered, there may be founder effects in which one mineral in a sequence affects the communities on subsequent minerals in the sequence, sample sizes are limited by the diameter of the borehole and CORK, and there is not enough space in the CORK for replicates. However, borehole colonization studies are necessary, as there is no other practical alternative for studying the attached community in deep crustal environments.

Planktonic vs. Sessile Communities

Many of the microbiological studies of igneous crust sample communities in venting aquifer fluids (Jungbluth et al., 2013, 2014; Cowen et al., 2003; Huber et al., 2006), but studies of continental aquifer communities indicate that sessile (attached) microbes better represent the whole aquifer community (Lehman, 2007). Thus, understanding whole microbial communities in the suboceanic aquifer requires the study of the sessile community as well. Previous studies of sessile communities in bulk surface rock have shown that mineralogy strongly influences community structure in igneous crust (Sylvan et al., 2013; Flores et al., 2011; Toner et al., 2013), yet questions remain as to which individual mineral classes or compositions support microbial life in the crust and what metabolic strategies are most likely on these different mineral types.

The Suboceanic Aquifer Biosphere (SAB)

A Significant Reservoir of Biomass

The majority of life on Earth is predicted to occur in subsurface environments (Figure 1.3; Whitman et al., 1998), with the greatest biomass occurring in aquifer and marine sediments. After considering new revelations that sediments beneath oligotrophic ocean gyres contain significantly lower biomass than originally predicted (Kallmeyer et al., 2012), terrestrial aquifer sediments are now predicted to hold the most biomass (Heberling et al., 2010). Neither of these studies included estimates for igneous oceanic crust, which contains a globally-distributed suboceanic aquifer with the potential to hold significant biomass. Since investigations of the deep aquifer biosphere are met with unique challenges in sampling and expense, estimates of biomass have focused on using models that incorporate the nature and extent of the ocean crust within the ‘habitable zone’. Heberling et al. (2010) calculated the potential biomass within the habitable zone of the igneous oceanic crust and found that most of it resides in the basaltic layer, where porous and fractured crust promotes seawater circulation and water-rock reactions that can support an active subsurface biosphere. More importantly, Heberling and colleagues estimated that the potential biomass in this ‘habitable zone’ amounts to 200 Pg of carbon. This biomass rivals or even surpasses that of continental aquifer sediment (22–215 Pg C), the largest known biological reservoir of carbon (Figure 1.3; Whitman et al., 1998; Kallmeyer et al., 2012; Heberling et al., 2010). This biomass estimate also far surpasses those for aquatic habitats (2.2 Pg C), soil (26 Pg C), and marine sediments (4.1 Pg C) combined (Heberling et al., 2010; Whitman et al., 1998; Kallmeyer et al., 2012). The suboceanic aquifer therefore represents a global subsurface ecosystem (Whitman et al., 1998; Kallmeyer et al., 2012; Nielsen & Fisk, 2010) whose microbial activities may significantly impact biogeochemical cycles, ocean chemistry, and ocean productivity (Edwards et al., 2011; McCarthy et al., 2010; Mason et al., 2010).
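As a rough check on this comparison, the reservoir estimates quoted in this section can be tabulated and summed. The short Python sketch below uses only the values cited in the text (Whitman et al., 1998; Kallmeyer et al., 2012; Heberling et al., 2010); the variable names are illustrative and are not taken from any published analysis.

```python
# Biomass estimates in Pg C, as quoted in the text above.
other_reservoirs_pg_c = {
    "aquatic": 2.2,
    "soil": 26.0,
    "marine_sediment": 4.1,
}
continental_aquifer_range_pg_c = (22.0, 215.0)  # low and high estimates
suboceanic_aquifer_pg_c = 200.0                 # potential biomass, habitable zone

# Sum of the aquatic, soil, and marine sediment reservoirs.
combined_pg_c = sum(other_reservoirs_pg_c.values())

# How many times larger the suboceanic estimate is than that sum.
ratio_to_combined = suboceanic_aquifer_pg_c / combined_pg_c

# Does the suboceanic estimate fall inside the continental aquifer range?
low, high = continental_aquifer_range_pg_c
within_continental_range = low <= suboceanic_aquifer_pg_c <= high

print(f"Combined aquatic + soil + marine sediment: {combined_pg_c:.1f} Pg C")
print(f"Suboceanic aquifer / combined: {ratio_to_combined:.1f}x")
print(f"Within continental aquifer range: {within_continental_range}")
```

With these inputs, the aquatic, soil, and marine sediment reservoirs sum to about 32 Pg C, so the 200 Pg C suboceanic estimate is roughly six times their combined total and sits near the upper end of the range given for continental aquifer sediment.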

Microbiological studies of SAB microbes will aid in our understanding of the function of this ecosystem and help elucidate the metabolic strategies these microbes employ to survive in low-energy environments with no sunlight.

Figure 1.3. Global carbon biomass by reservoir. Aquatic biomass includes all marine, freshwater, and saline habitats. Marine sediment includes seafloor Layer 1 biomass estimates and measurements. Soil includes terrestrial plants and soil microorganisms. Terrestrial aquifers include measured and predicted numbers for aquifer sediments and groundwater, which vary by up to an order of magnitude. Suboceanic aquifer predictions were based on the extent of the habitable zone and crustal properties such as porosity. 1 Pg C = 10^15 grams of carbon. Data taken from (1) Whitman et al., 1998, (2) Kallmeyer et al., 2012, and (3) Heberling et al., 2010.

Common Energy Sources

Microbial studies of the igneous oceanic crust have shown that microbial metabolism often reflects fluid chemistry, available energy, and mineralogy of the crust (Zhang et al., 2016; Boettger et al., 2013; Jungbluth et al., 2013; Lin et al., 2012; Cowen et al., 2003; Robador et al., 2015). Where oxic seawater and photosynthetically-derived organic matter infiltrate the cool, fractured basaltic crust, Fe oxidation and heterotrophy are common metabolic strategies (Zhang et al., 2016). The thermodynamic disequilibrium between Fe in crustal minerals and O2 from seawater can fuel microbial activity (Edwards et al., 2011; Edwards et al., 2003; Orcutt et al., 2011b; Emerson et al., 2010), and heterotrophs can gain energy from oxidizing the organic matter in seawater. In thermally- or hydrothermally-influenced regions of the crust, the altered fluid chemistry, reducing environment, lower organic matter, and mineral weathering contribute to an environment more conducive to chemolithotrophy and autotrophy than to heterotrophy. These metabolisms are often based on inorganic energy sources such as molecular hydrogen (H2), originating from Fe-bearing rock-water reactions like serpentinization, and seawater-derived sulfate (SO4^2-) (Ver Eecke et al., 2012; Nealson et al., 2005; Takai et al., 2004; Jungbluth et al., 2017; Stetter, 2006; Robador et al., 2015; Lever, 2012; Nakagawa et al., 2006; Brazelton et al., 2006).

Potential for H2-based life in the SAB

Molecular hydrogen (H2) produced in subsurface igneous habitats could potentially support a variety of microbial processes that use hydrogen as a reductant, such as acetogenesis, sulfate and iron reduction, and methanogenesis (Kashefi et al., 2002; Chivian et al., 2008; Grabarse et al., 2001; Ragsdale & Pierce, 2008). The reduction of the associated compounds (i.e., sulfate, CO, CO2, Fe^3+) can be used for biosynthesis, energy production, or both (as in methanogenesis and acetogenesis). Hydrogenotrophic methanogens and acetogens could provide the basis for a subsurface chemosynthetic community that could exist completely independent of photosynthetically-derived energy sources, as was first posited by the Subsurface Lithoautotrophic Microbial Ecosystem (SLiME) (Stevens & McKinley, 1995) and HyperSLiME hypotheses (Takai et al., 2004; Nealson et al., 2005).

H2-based SLiMEs are predicted to be the earliest form of microbial life on Earth and may have arisen in thermally- or hydrothermally-influenced oceanic crust like the JdFR aquifer (Nitschke & Russell, 2013); however, the existence of “true SLiMEs” remains difficult to prove, since subsurface communities often have photosynthetically-derived carbon input and may therefore not be fully autotrophic (Nealson et al., 2005). Conditions in the deep anoxic aquifer remain similar today to those found in early Earth’s crust (warm, anoxic, reducing), and the microbial communities present could be using metabolic pathways that are the same as or similar to those present in early crustal communities.

Molecular hydrogen in igneous aquifers could be mantle-derived or it may be produced as a result of the serpentinization of Fe-bearing minerals such as olivine. Serpentinization is a process whereby hot, iron-bearing rocks or minerals interact with circulating seawater in the crust to produce hydrogen (Reaction 1) and potentially low molecular-weight hydrocarbons and lipids through a Fischer-Tropsch-type synthesis (Fruh-Green, 2004; McCollom & Bach, 2009; Sleep et al., 2004).

Fe^2+ + H2O → H2 + FeO        (Reaction 1)

Hydrogen was shown to evolve even at relatively low temperatures like those in the JdFR aquifer at Hole 1301A (~64 °C; Mayhew et al., 2013), which translates to a high potential for hydrogen-driven microbial communities in the vast igneous oceanic crustal aquifer.
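Reaction 1 is written schematically. To give a sense of scale, the sketch below assumes a commonly cited balanced form for the Fe end-member of olivine (fayalite), 3 Fe2SiO4 + 2 H2O → 2 Fe3O4 + 3 SiO2 + 2 H2, and computes the stoichiometric upper bound on H2 produced per kilogram of fayalite. The balanced reaction, the standard atomic masses, and the function and parameter names (including `extent`) are assumptions introduced here for illustration; they do not come from the studies cited above, and actual yields at ~64 °C depend on reaction extent and kinetics and would likely be much lower.

```python
# Standard atomic masses in g/mol (assumed values for this illustration).
ATOMIC_MASS = {"Fe": 55.845, "Si": 28.086, "O": 15.999}

# Molar mass of fayalite, Fe2SiO4.
FAYALITE_G_PER_MOL = (
    2 * ATOMIC_MASS["Fe"] + ATOMIC_MASS["Si"] + 4 * ATOMIC_MASS["O"]
)

# Assumed balanced reaction: 3 Fe2SiO4 + 2 H2O -> 2 Fe3O4 + 3 SiO2 + 2 H2
H2_PER_FAYALITE = 2.0 / 3.0  # mol H2 produced per mol fayalite reacted


def h2_mol_per_kg_fayalite(extent: float = 1.0) -> float:
    """Moles of H2 per kg of fayalite for a given reaction extent (0 to 1)."""
    if not 0.0 <= extent <= 1.0:
        raise ValueError("extent must be between 0 and 1")
    mol_fayalite = 1000.0 / FAYALITE_G_PER_MOL
    return mol_fayalite * H2_PER_FAYALITE * extent


upper_bound = h2_mol_per_kg_fayalite()           # complete reaction
one_percent = h2_mol_per_kg_fayalite(extent=0.01)  # hypothetical 1% extent

print(f"Fayalite molar mass: {FAYALITE_G_PER_MOL:.3f} g/mol")
print(f"Upper bound: {upper_bound:.2f} mol H2 per kg fayalite")
print(f"At 1% extent: {one_percent:.4f} mol H2 per kg fayalite")
```

The `extent` parameter is a deliberately simple stand-in for everything that limits H2 production in a real aquifer; it is included only so the upper bound is not mistaken for a prediction.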

The concentration of H2 in the JdFR aquifer may be underestimated by fluid chemistry measurements and may not directly determine the potential for hydrogenotrophy. The concentration of hydrogen in aquifer fluids at Hole 1301A is just sufficient to support H2-based metabolisms (Lin et al., 2014); however, H2 may be rapidly consumed by the aquifer community, causing the concentration to decrease. There may also be pockets of hydrogen produced on Fe-bearing mineral surfaces that biofilm communities could use for energy and that may not be accounted for in fluid chemistry measurements (Mayhew et al., 2013; McCollom & Bach, 2009; Miller et al., 2017). These could support H2-based microbial communities that are localized to Fe-bearing mineral surfaces. Assessing the genomic potential for hydrogenotrophy in the crustal aquifer could help answer the question of whether H2 plays a central role in the JdFR aquifer community.

The SAB of the Juan de Fuca Ridge

Predicted Metabolisms

The microbiology of the JdFR has been well studied, especially with regard to communities from sampled aquifer fluids (Jungbluth et al., 2013, 2014; Cowen et al., 2003; Jungbluth et al., 2016; Orcutt et al., 2011a; Smith et al., 2016; Fisk et al., 2000; Jungbluth et al., 2017; Mason et al., 2009; Huber et al., 2006; Nakagawa et al., 2006; Robador et al., 2015; Smith et al., 2011). Although fluid community structure has been shown to vary spatially and temporally (Jungbluth et al., 2013, 2014), there is some consistency with regard to the predominant metabolisms found in microbial observatories of the JdFR. Sulfate reduction, ammonia oxidation, and methane cycling are among the most common metabolisms reported here, mainly based on genomic data (Lin et al., 2014; Cowen et al., 2003; Jungbluth et al., 2013, 2016). Microbial communities from whole rock, incubation substrates, and sulfides have been investigated and may be cycling methane, carbon, and sulfur (Robador et al., 2015; Lever et al., 2013; McCarthy et al., 2010; Orcutt et al., 2015). The metabolisms of most of these organisms have not been well studied since they have not been cultured; however, genomic studies suggest the presence of carbon fixation pathways such as the Wood-Ljungdahl pathway (Orcutt et al., 2015; Lever et al., 2013; Jungbluth et al., 2017) and have uncovered target genes for energy metabolisms like sulfate reduction and methane cycling (Robador et al., 2015; Lever et al., 2013).

Dominant Taxonomic Groups

Taxonomic groups commonly identified in the local JdFR community include thermophilic and hyperthermophilic Firmicutes, members of the Archaeoglobaceae, and bacterial candidate phyla such as OP1 and OP8 (Hugenholtz et al., 1998). Ca. Desulforudis are common in deep terrestrial and marine subsurface habitats (Jungbluth et al., 2013; Chivian et al., 2008), including the JdFR aquifer (Orcutt et al., 2011b; Jungbluth et al., 2013; Smith et al., 2016; Jungbluth et al., 2017). Genomic evidence suggests these organisms may play a role in nitrogen fixation and sulfate reduction in deep subsurface environments (Chivian et al., 2008; Jungbluth et al., 2017). Ca. Desulforudis audaxviator and a close relative from the JdFR also possess genes involved in the Wood-Ljungdahl pathway for carbon fixation and are genetically capable of heterotrophy (Jungbluth et al., 2017; Chivian et al., 2008). Deep-branching groups from Firmicutes such as Clostridia are also quite common in the JdFR aquifer and have an unknown function, since they are not closely related to cultured representatives (Smith et al., 2016; Orcutt et al., 2011b). Archaeoglobaceae inhabiting the aquifer include the sulfate-reducing Archaeoglobus (Klenk et al., 1998; Nakagawa et al., 2006) and iron-reducing Geoglobus (Kashefi et al., 2002), indicating that energy metabolisms based on iron and sulfur cycling may be present.

Related archaeal taxa also contain the acetyl-CoA pathway for carbon fixation and are capable of using hydrogen as an electron donor for energy metabolism (Kashefi et al., 2002; Klenk et al., 1998). The presence of these taxonomic groups suggests that hydrogenotrophy, sulfate and iron reduction, and the Wood-Ljungdahl pathway may be common in the SAB.

Carbon Fixation in the SAB

Carbon Fixation Pathways

There are currently six known pathways for carbon fixation: (1) the Calvin cycle, (2) the reductive TCA cycle (rTCA), (3) the Wood-Ljungdahl or reductive acetyl-CoA pathway, and (4–6) the related 3-hydroxypropionate, 3-hydroxypropionate/4-hydroxybutyrate, and dicarboxylate/4-hydroxybutyrate cycles (Braakman & Smith, 2012). The rTCA and Wood-Ljungdahl pathways are ancient forms of carbon fixation and function in anaerobic conditions that mirror those in deep oceanic crust, on early Earth, and on other planets and moons (Braakman & Smith, 2012; Russell & Martin, 2004; Nitschke & Russell, 2013). The most abundant enzyme associated with carbon fixation in surface basalts of oceanic crust was found to be ribulose-1,5-bisphosphate carboxylase/oxygenase (RuBisCo; Orcutt et al., 2015), the enzyme responsible for carbon fixation in the first phase of the Calvin cycle during oxygenic photosynthesis:

CO2 + ribulose-1,5-bisphosphate + H2O → 2 × 3-phosphoglycerate (catalyzed by RuBisCo)        (Reaction 2)

This pathway functions in aerobic conditions, but most of the oceanic crustal aquifer is anoxic and reducing; thus, it is not expected to play a large role in most of the crust. The Wood-Ljungdahl pathway was not tested in that study, so its importance in oceanic crust remains unknown. The rTCA, or Arnon-Buchanan cycle, is used to fix carbon dioxide into acetate using the enzymes involved in the tricarboxylic acid (TCA) cycle (also known as the Krebs cycle or the citric acid cycle). Although this pathway is ancient and functions under anaerobic conditions (Braakman & Smith, 2012), it has not been found to be a prominent carbon fixation pathway in oceanic crust. The related 3-hydroxypropionate, 3-hydroxypropionate/4-hydroxybutyrate, and dicarboxylate/4-hydroxybutyrate cycles are not widespread and have been detected only in green nonsulfur bacteria and some thermophilic archaea. These pathways are not expected to play a role in carbon cycling in oceanic crust.

The Ancient Wood-Ljungdahl Pathway

The Wood-Ljungdahl pathway is a common carbon fixation and energy pathway in deep, anoxic thermal aquifers (Magnabosco et al., 2015; Takami et al., 2012; Nealson et al., 2005; Lever, 2012; Chivian et al., 2008) and has either been predicted to occur or has been found in some organisms in the JdFR aquifer (Lever et al., 2013; Orcutt et al., 2015; Jungbluth et al., 2017). This pathway is an ancient carbon fixation and biosynthetic pathway used by acetogens, methanogens, and sulfate reducers to produce acetate or acetyl-CoA from simple inorganic carbon sources like CO2 (Nitschke & Russell, 2013; Ragsdale & Pierce, 2008; Russell & Martin, 2004):

2 CO2 + 4 H2 → CH3COOH + 2 H2O        (Reaction 3)

The bifunctional enzyme responsible for fixing carbon in this pathway is carbon monoxide dehydrogenase/acetyl-CoA synthase (CODH/ACS). Although archaeal sulfate reducers and methanogens as well as bacterial acetogens use this enzyme to fix carbon, the bacteria and archaea each have a unique set of enzymes to harness energy that is coupled to the production of acetate. Methanogens and sulfate reducers have a more complex pathway involving more enzymes and proteins; thus, it is believed that the bacterial acetogenesis pathway evolved prior to the methanogenesis pathway (Nitschke & Russell, 2013; Grabarse et al., 2001). Several lines of evidence point to the Wood-Ljungdahl pathway as the most ancient carbon fixation pathway and the first biosynthetic pathway, and suggest that it may have first appeared in an environment much like the one in the JdFR crustal aquifer (Nitschke & Russell, 2013), which is warm, dark, anoxic, and low in organic matter.
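The net stoichiometry of Reaction 3 can be checked by counting atoms on each side. The following Python sketch is a minimal element-balance check; the helper names (`count_atoms`, `side_total`) and the simple formula parser are illustrative assumptions, and the parser handles only formulas without parentheses or charges, which is sufficient for the four species in Reaction 3.

```python
import re
from collections import Counter


def count_atoms(formula: str) -> Counter:
    """Count atoms in a simple formula such as 'CH3COOH' (no parentheses)."""
    atoms = Counter()
    for element, digits in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        atoms[element] += int(digits) if digits else 1
    return atoms


def side_total(side):
    """Sum atoms over one side of a reaction given (coefficient, formula) pairs."""
    total = Counter()
    for coefficient, formula in side:
        for element, n in count_atoms(formula).items():
            total[element] += coefficient * n
    return total


# Reaction 3: 2 CO2 + 4 H2 -> CH3COOH + 2 H2O
reactants = [(2, "CO2"), (4, "H2")]
products = [(1, "CH3COOH"), (2, "H2O")]

left = side_total(reactants)
right = side_total(products)
is_balanced = left == right

print("Reactants:", dict(left))
print("Products: ", dict(right))
print("Balanced: ", is_balanced)
```

Both sides contain two carbon, four oxygen, and eight hydrogen atoms, so the reaction as written is balanced: four H2 supply the eight reducing equivalents needed to convert two CO2 into one acetate.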

Enzymology of the Wood-Ljungdahl Pathway

The Wood-Ljungdahl pathway consists of two “branches”: the carbonyl branch, whereby CO2 is reduced to CO by the bifunctional enzyme CODH/ACS, and the methyl branch, which reduces another CO2 to a methyl group via the formation of formate (Ragsdale, 2008; Ragsdale & Pierce, 2008). The CODH/ACS enzyme binds the CO from the carbonyl branch to the methyl group from the methyl branch as well as a CoA group to finally form acetyl-CoA. The carbonyl branch, also known as the ‘acetyl-CoA pathway’, is shared between acetogens, sulfate reducers, and methanogens (Pierce et al., 2008; Ragsdale, 2008b), and may be used for carbon fixation without the methyl branch.

Evidence for Carbon Fixation in the JdFR Aquifer

Carbon fixation in the JdFR aquifer near IODP Hole 1301A has been previously reported from isotopic measurements, genomic reconstructions, and laboratory rate measurements (McCarthy et al., 2010; Orcutt et al., 2015; Jungbluth et al., 2017). Although the net reactions of microbes in the JdFR suboceanic aquifer may be heterotrophic, since organic carbon measurements are lower than those for bottom seawater, there is evidence of fresh organic carbon being vented to the seafloor near Hole 1301A (McCarthy et al., 2010; Lin et al., 2012). Carbon fixation rate measurements for cored basalts recovered from nearby Holes U1382A and U1383C indicate an active autotrophic community in the JdFR aquifer (Orcutt et al., 2015).

Functional gene assays for seafloor-exposed basalts were previously performed for RuBisCo form II, ATP citrate lyase, and methyl CoM reductase to detect the presence of the Calvin-Benson-Bassham cycle (also known as the reductive pentose phosphate or C3 cycle), the reverse tricarboxylic acid (rTCA) cycle, and methanogenesis, respectively (Mason et al., 2009; Orcutt et al., 2015). Orcutt et al. (2015) determined that the Calvin cycle was the predominant carbon fixation pathway in oceanic crust, potentially with additional input from other pathways such as the Wood-Ljungdahl; however, this assessment was based solely on the taxonomic groups present and not on the presence of a target gene for the Wood-Ljungdahl pathway (i.e., CODH/ACS) or on activity measurements. One dominant member of the JdFR aquifer fluid community was also shown to possess genes for the Wood-Ljungdahl pathway, suggesting that this pathway may be a key carbon fixation pathway in the JdFR aquifer (Jungbluth et al., 2017); however, more studies are needed, particularly with respect to the sessile communities, to determine the prevalence of carbon fixation pathways in the JdFR aquifer biosphere.

Research Questions and Hypotheses

There are a number of questions that remain regarding the function of the SAB and its potential effect on ocean productivity and chemistry. The goal of this dissertation work is to address some of these questions by studying microbial communities that colonize minerals in the thermal JdFR aquifer. The questions that this dissertation specifically addresses, the hypotheses, and expected outcomes are:

1. Does mineralogy influence microbial community structure? We hypothesize that the class of minerals or their composition will dictate the types of organisms that will be present. Fe-bearing minerals are expected to have a higher abundance of organisms that are related to known microbes that can use Fe for energy or its alteration product, H2. The olivine class of minerals weathers more quickly and contains Fe, so its mineralogy and its composition may influence its community structure. This hypothesis will be tested in Chapter 2.

2. What are the common carbon and energy metabolisms found in the genomes of organisms from JdFR mineral-colonizing biofilms? Based on previous taxonomic surveys and geochemistry of the system, we hypothesize that genomic evidence for sulfate reduction, hydrogenotrophy, and carbon fixation will be present. This hypothesis will be tested in Chapter 3.

3. Is there genomic evidence that these same biofilm communities (from #2) possess the functional capability to use the Wood-Ljungdahl pathway? We hypothesize that some of these biofilm organisms possess the complete Wood-Ljungdahl pathway. If they do contain the complete pathway, they may be functionally capable of using H2 and CO2 to fuel chemosynthesis in the JdFR. Some of these organisms may be novel and their metabolisms may not have been described before. Thus, further exploration of such novel organisms and their metabolic strategies is a major goal of this dissertation in that it will lead to a deeper understanding of the potential function of these aquifer communities. This hypothesis will be tested in Chapter 3, and a novel JdFR biofilm community organism’s genome is described in Chapter 4.

Approach: A Genomic Investigation of Colonizing Mineral Biofilms in a Subseafloor Microbial Observatory

The questions proposed in this dissertation were explored using incubated mineral sands emplaced in a subseafloor borehole equipped with CORK technology on the eastern flank of the Juan de Fuca Ridge (47° 45.210′ N, 127° 45.833′ W; Smith et al., 2011). This subseafloor microbial observatory, located at Integrated Ocean Drilling Program (IODP) Hole 1301A, was emplaced into 3.5 Ma oceanic crust at 2,667 meters below sea level (Fisher et al., 2005). This hole penetrates through the overlying sediment and into the basaltic basement (Layer 2A; Figure 1.1), where twelve igneous minerals and glasses (Table 1.1) were suspended for four years. We then used a genomic approach (i.e., 454 pyrotag sequencing of the 16S rRNA gene) to investigate the microbial community structure of each mineral biofilm and to produce a metagenome from Fe-bearing olivine, a major mineral class found in igneous oceanic crust.

Table 1.1. Igneous minerals and glasses incubated in IODP Hole 1301A, their mineral classes, and compositions.

Mineral Class     Name                  Formula
Olivines          forsterite (Fo100)    Mg2SiO4
Olivines          olivine (Fo90)        Mg1.8Fe0.2SiO4
Olivines          fayalite (Fo0)        Fe2SiO4
Amphibole         hornblende            Ca2(Mg,Fe)4Al(Si7Al)O22(OH)2
Glasses           basalt glass          Variable composition
Glasses           obsidian              Variable composition
Pyroxenes         augite                (Mg,Fe)CaSiO6
Pyroxenes         diopside              MgCaSiO6
Feldspars         anorthite             CaAl2Si2O8
Feldspars         bytownite             Na0.2Ca0.8Al1.8Si2.2O8
Feldspars         orthoclase            KAlSi3O8
PO4^3- mineral    apatite               Ca5(PO4)3OH
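The Fo values attached to the olivines in Table 1.1 follow directly from the Mg and Fe coefficients in each formula: Fo is the molar percentage of the forsterite (Mg) end-member, 100 × Mg / (Mg + Fe). The Python sketch below reproduces those labels from the tabulated formulas. The function names are illustrative, and the analogous anorthite number for plagioclase, 100 × Ca / (Ca + Na + K), is included as an added illustration; Table 1.1 itself does not report An values.

```python
def forsterite_number(mg: float, fe: float) -> float:
    """Molar percent forsterite (Fo) from olivine Mg and Fe coefficients."""
    return 100.0 * mg / (mg + fe)


def anorthite_number(ca: float, na: float, k: float = 0.0) -> float:
    """Molar percent anorthite (An) from feldspar Ca, Na, and K coefficients."""
    return 100.0 * ca / (ca + na + k)


# Olivines from Table 1.1, given as (Mg, Fe) coefficients per formula unit.
olivines = {
    "forsterite": (2.0, 0.0),  # Mg2SiO4
    "olivine": (1.8, 0.2),     # Mg1.8Fe0.2SiO4
    "fayalite": (0.0, 2.0),    # Fe2SiO4
}
fo_values = {name: forsterite_number(mg, fe) for name, (mg, fe) in olivines.items()}

# Bytownite from Table 1.1: Na0.2Ca0.8Al1.8Si2.2O8
bytownite_an = anorthite_number(ca=0.8, na=0.2)

for name, fo in fo_values.items():
    print(f"{name}: Fo{fo:.0f}")
print(f"bytownite: An{bytownite_an:.0f}")
```

Expressing composition this way makes the Fe gradient across the three olivines explicit, from no Fe in forsterite to Fe as the only divalent cation in fayalite, which is the variable of interest for the Fe- and H2-related hypotheses above.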

The mineral samples used for the investigations in this dissertation were placed in situ in the basaltic aquifer for a long-term incubation, which allowed us to empirically test the role of crustal mineralogy in shaping microbial communities and their potential function.

Some drawbacks to using this technique include sampling from a disturbed site, which may not fully reflect the native community; contamination with seawater and sediment communities as mixing occurs during and after drilling (Jungbluth et al., 2016); and risk of sample loss due to changing crustal conditions (this occurred with our duplicate samples in a nearby borehole following an earthquake that shifted the casing and prevented retrieval of samples). The microbial communities obtained from mineral colonization substrates will not necessarily be representative of a community from a whole rock that contains those minerals, but using individual mineral substrates can be beneficial, as they can be used to determine the driving forces of microbial function and distribution with respect to mineralogy. Fundamentally, despite the limitations of the in situ incubations, there is no other way that these studies can be conducted. Thus, experiments made using these incubations can be used to make more accurate predictions about microbiology across all crustal habitats.

Objective and Expected Outcomes

The main objective of this dissertation is to define the genomic potential of mineral-colonizing microbial communities in the suboceanic aquifer of the Juan de Fuca Ridge and to determine how mineralogy influences microbial community structure in this local environment. These studies will inform future investigations that promote cultivation of novel microbes from the suboceanic aquifer whose growth or carbon fixation rates can be studied in the laboratory. Cultivation can be achieved with more success when armed with knowledge of the microbial metabolisms that are present, such as that acquired by a metagenome analysis. Information about how mineralogy may be influencing the structure and composition of the attached communities will also allow us to make inferences about the SAB in other regions of the crust with distinct mineralogies (e.g., in peridotite-hosted slow and ultra-slow spreading regions). Ultimately, these studies will increase our understanding of the function of the suboceanic aquifer ecosystem, especially with respect to how important chemosynthesis may be and how activities of the SAB may influence ocean chemistry and productivity.

References

Alt JC. (1995). Subseafloor processes in Mid-Ocean ridge hydrothermal systems. Geophys Monogr 91:85–114.

Bach W, Edwards KJ. (2003). Iron and sulfide oxidation within the basaltic ocean crust: implications for chemolithoautotrophic microbial biomass production. Geochim Cosmochim Acta 67:3871–3887.

Boettger J, Lin H-T, Cowen JP, Hentscher M, Amend JP. (2013). Energy yields from chemolithotrophic metabolisms in igneous basement of the Juan de Fuca ridge flank system. Chem Geol 337–338:11–19.

Braakman R, Smith E. (2012). The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol 8:e1002455.

Bratt SR, Purdy GM. (1984). Structure and variability of oceanic crust on the flanks of the East Pacific Rise between 11° and 13°N. J Geophys Res Solid Earth 89:6111–6125.

Brazelton WJ, Schrenk MO, Kelley DS, Baross J. (2006). Methane- and sulfur-metabolizing microbial communities dominate the Lost City hydrothermal field ecosystem. Appl Environ Microbiol 72:6257–70.

Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, DeSantis TZ, et al. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322:275–278.

Cowen JP, Giovannoni SJ, Kenig F, Johnson HP, Butterfield D, Rappé MS, et al. (2003). Fluids from aging ocean crust that support microbial life. Science 299:120–3.

Dick HJB. (1989). Abyssal peridotites, very slow spreading ridges and ocean ridge magmatism. Geol Soc London, Spec Publ 42:71–105.

Dick HJB, Lin J, Schouten H. (2003). An ultraslow-spreading class of ocean ridge. Nature 426:405–12.

Dick HJB, Tivey MA, Tucholke BE. (2008). Plutonic foundation of a slow-spreading ridge segment: Oceanic core complex at Kane Megamullion, 23°30′N, 45°20′W. Geochemistry, Geophys Geosystems 9. doi:10.1029/2007GC001645.

Edwards KJ, Bach W, McCollom TM. (2005). Geomicrobiology in oceanography: microbe-mineral interactions at and below the seafloor. Trends Microbiol 13:449–56.

Edwards KJ, Bach W, Rogers DR. (2003). Geomicrobiology of the ocean crust: a role for chemoautotrophic Fe-bacteria. Biol Bull 204:180–5.

Edwards KJ, Becker K, Colwell F. (2012a). The Deep, Dark Energy Biosphere: Intraterrestrial Life on Earth. Annu Rev Earth Planet Sci 40:551–568.

Edwards KJ, Fisher AT, Wheat CG. (2012b). The deep subsurface biosphere in igneous ocean crust: frontier habitats for microbiological exploration. Front Microbiol 3:8.

Edwards KJ, Glazer BT, Rouxel OJ, Bach W, Emerson D, Davis RE, et al. (2011a). Ultra-diffuse hydrothermal venting supports Fe-oxidizing bacteria and massive umber deposition at 5000 m off Hawaii. ISME J 5:1748–1758.

Edwards KJ, Wheat CG, Sylvan JB. (2011b). Under the sea: microbial life in volcanic oceanic crust. Nat Rev Microbiol 9:703–712.

Edwards KJ, Bach W, Klaus A. (2010). Integrated Ocean Drilling Program Prospectus, Expedition 336. College Station: IODP.

Emerson D, Fleming EJ, McBeth JM. (2010). Iron-oxidizing bacteria: an environmental and genomic perspective. Annu Rev Microbiol 64:561–83.

Emerson D, Moyer CL. (2002). Neutrophilic Fe-Oxidizing Bacteria Are Abundant at the Loihi Seamount Hydrothermal Vents and Play a Major Role in Fe Oxide Deposition. Appl Env Micro 68:3085–3093.

Fisher AT, Wheat CG, Becker K, Davis EE, Jannasch H, Schroeder D, et al. (2005). Scientific and technical design and deployment of long-term subseafloor observatories for hydrogeologic and related experiments, IODP Expedition 301, eastern flank of Juan de Fuca Ridge, and general design. Proc Integr Ocean Drill Progr 301. doi:10.2204/iodp.proc.301.103.2005.

Fisk MR, Thorseth IH, Urbach E, Giovannoni SJ. (2000). Investigation of microorganisms and DNA from subsurface thermal water and rock from the east flank of Juan de Fuca Ridge. Proc Ocean Drill Program, Sci Results 168:167–174.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI, et al. (2011). Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environ Microbiol 13:2158–71.

Fruh-Green G. (2004). Serpentinization of Oceanic Peridotites. In: The Subseafloor Biosphere at Mid-Ocean Ridges. Geophys Monogr Ser 144. doi:10.1029/144GM08.

Grabarse W, Mahlert F, Duin EC, Goubeaud M, Shima S, Thauer RK, et al. (2001). On the mechanism of biological methane formation: structural evidence for conformational changes in methyl-coenzyme M reductase upon binding. J Mol Biol 309:315–330.

Heberling C, Lowell RP, Liu L, Fisk MR. (2010). Extent of the microbial biosphere in the oceanic crust. Geochemistry Geophys Geosystems 11:1–15.

Huber J, Johnson HP, Butterfield D, Baross J. (2006). Microbial life in ridge flank crustal fluids. Environ Microbiol 8:88–99.

Hugenholtz P, Pitulle C, Hershberger KL, Pace NR. (1998). Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180:366–76.

Ildefonse B, Abe N, Blackman D, Canales J, Isozaki Y, Kodaira S, et al. (2010). MoHole: A Crustal Journey and Mantle Quest, Workshop in Kanazawa, Japan, 3 - 5 June, 2010. Sci Drill 10:56–62.

Jungbluth SP, Bowers RM, Lin H, Cowen JP, Rappé MS. (2016). Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt. ISME J 10:1–15.

Jungbluth SP, Grote J, Lin H-T, Cowen JP, Rappé MS. (2013). Microbial diversity within basement fluids of the sediment-buried Juan de Fuca Ridge flank. ISME J 7:161–172.

Jungbluth SP, Lin H-T, Cowen JP, Glazer BT, Rappé MS. (2014). Phylogenetic diversity of microorganisms in subseafloor crustal fluids from Holes 1025C and 1026B along the Juan de Fuca Ridge flank. Front Microbiol 5:119.

Jungbluth SP, del Rio TG, Tringe SG, Stepanauskas R, Rappé MS. (2017). Genomic comparisons of a bacterial lineage that inhabits both marine and terrestrial deep subsurface systems. PeerJ 1–22.

Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, D’Hondt S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci U S A 109:16213–6.

Kashefi K, Tor JM, Holmes DE, Gaw Van Praagh C V, Reysenbach A-L, Lovley DR. (2002). Geoglobus ahangari gen. nov., sp. nov., a novel hyperthermophilic archaeon capable of oxidizing organic acids and growing autotrophically on hydrogen with Fe(III) serving as the sole electron acceptor. Int J Syst Evol Microbiol 52:719–728.

Kearey P, Klepeis K, Vine FJ. (2009). Global tectonics. doi:10.1038/236261b0.

Kelley DS, Karson JA, Blackman DK, Fruh-Green GL, Butterfield DA, Lilley MD, Olsen EJ, Schrenk MO, Roe KK, Lebon GT, Rivizzigno P. (2001). An off-axis hydrothermal vent field near the Mid-Atlantic Ridge at 30° N. Nature 412:145–149.

Kennett JP. (1982). Marine Geology. Prentice-Hall https://books.google.com/books?id=gVASAQAAIAAJ.

Klenk H, Clayton RA, Tomb J, Dodson RJ, Gwinn M, Hickey EK, et al. (1998). The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 394:6342–6349.

Lehman RM. (2007). Understanding of aquifer microbiology is tightly linked to sampling approaches. Geomicrobiol J 24:331–341.

Lever MA. (2012). Acetogenesis in the energy-starved deep biosphere - a paradox? Front Microbiol 2. doi:10.3389/fmicb.2011.00284.

Lever M, Rouxel O, Alt J, Shimizu N. (2013). Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Cowen JP, Olson EJ, Amend JP, Lilley MD. (2012). Inorganic chemistry, gas compositions and dissolved organic carbon in fluids from sedimented young basaltic crust on the Juan de Fuca Ridge flanks. Geochim Cosmochim Acta 85:213–227.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. (2014). Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Magnabosco C, Ryan K, Lau MCY, Kuloyo O, Lollar BS, Kieft TL, et al. (2015). A metagenomic window into carbon metabolism at 3 km depth in Precambrian continental crust. ISME J 10:730–741.

Mason OU, Di Meo-Savoie C, Van Nostrand JD, Zhou J, Fisk MR, Giovannoni SJ. (2009). Prokaryotic diversity, distribution, and insights into their role in biogeochemical cycling in marine basalts. ISME J 3:231–42.

Mason OU, Nakagawa T, Rosner M, Van Nostrand JD, Zhou J, Maruyama A, et al. (2010). First investigation of the microbiology of the deepest layer of ocean crust. PLoS One 5:e15399.

Mayhew LE, Ellison ET, McCollom TM, Trainor TP, Templeton AS. (2013). Hydrogen generation from low-temperature water–rock reactions. Nat Geo 6. doi:10.1038/NGEO1825.

McCarthy MD, Beaupré SR, Walker BD, Voparil I, Guilderson TP, Druffel ERM. (2010). Chemosynthetic origin of 14C-depleted dissolved organic matter in a ridge-flank hydrothermal system. Nat Geosci 4:32–36.

McCollom TM, Bach W. (2009). Thermodynamic constraints on hydrogen generation during serpentinization of ultramafic rocks. Geochim Cosmochim Acta 73:856–875.

Miller HM, Mayhew LE, Ellison ET, Kelemen P, Kubo M, Templeton AS. (2017). Low temperature hydrogen production during experimental hydration of partially-serpentinized dunite. Geochim Cosmochim Acta 209:161–183.

Nakagawa S, Inagaki F, Suzuki Y, Steinsbu BO, Lever MA, Takai K, et al. (2006). Microbial community in black rust exposed to hot ridge flank crustal fluids. Appl Environ Microbiol 72:6789–99.

Nealson KH, Inagaki F, Takai K. (2005). Hydrogen-driven subsurface lithoautotrophic microbial ecosystems (SLiMEs): do they exist and why should we care? Trends Microbiol 13:405–10.

Nielsen ME, Fisk MR. (2010). Surface area measurements of marine basalts: Implications for the subseafloor microbial biomass. Geophys Res Lett 37. doi:10.1029/2010GL044074.

Nitschke W, Russell MJ. (2013). Beating the acetyl coenzyme A-pathway to the origin of life. Philos Trans R Soc Lond B Biol Sci 368:20120258.

Orcutt BN, Bach W, Becker K, Fisher AT, Hentscher M, Toner BM, et al. (2011a). Colonization of subsurface microbial observatories deployed in young ocean crust. ISME J 5:692–703.

Orcutt BN, Sylvan JB, Knab NJ, Edwards KJ. (2011b). Microbial ecology of the dark ocean above, at, and below the seafloor. Microbiol Mol Biol Rev 75:361–422.

Orcutt BN, Sylvan JB, Rogers DR, Delaney J, Lee RW, Girguis PR. (2015). Carbon fixation by basalt-hosted microbial communities. Front Microbiol 6:1–14.

Pierce E, Xie G, Barabote R, Saunders E, Han C, Detter J, et al. (2008). The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol 10:2550–73.

Ragsdale SW. (2008). Enzymology of the Wood-Ljungdahl Pathway of Acetogenesis. Ann N Y Acad Sci 1125:129–136.

Ragsdale SW, Pierce E. (2008). Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation. Biochim Biophys Acta - Proteins Proteomics 1784:1873–1898.

Robador A, Jungbluth SP, LaRowe DE, Bowers RM, Rappé MS, Amend JP, et al. (2015). Activity and phylogenetic diversity of sulfate-reducing microorganisms in low-temperature subsurface fluids within the upper oceanic crust. Front Microbiol 6:1–13.

Russell MJ, Martin W. (2004). The rocky roots of the acetyl-CoA pathway. Trends Biochem Sci 29:358–363.

Sleep NH, Meibom A, Fridriksson T, Coleman RG, Bird DK. (2004). H2-rich fluids from serpentinization: geochemical and biotic implications. Proc Natl Acad Sci U S A 101:12818–23.

Smith A, Popa R, Fisk M, Nielsen M, Wheat CG, Jannasch HW, et al. (2011). In situ enrichment of ocean crust microbes on igneous minerals and glasses using an osmotic flow-through device. Geochemistry Geophys Geosystems 12:1–19.

Smith AR, Fisk MR, Thurber AR, Flores GE, Mason OU, Popa R, et al. (2016). Deep Crustal Communities of the Juan de Fuca Ridge Are Governed by Mineralogy. Geomicrobiol J 451.

Stetter KO. (2006). Hyperthermophiles in the history of life. Philos Trans R Soc Lond B Biol Sci 361:1837-42–3.

Stevens TO, McKinley JP. (1995). Lithoautotrophic Microbial Ecosystems in Deep Basalt Aquifers. Science 270:450–454.

Sylvan JB, Sia TY, Haddad AG, Briscoe LJ, Toner BM, Girguis PR, et al. (2013). Low temperature geomicrobiology follows host rock composition along a geochemical gradient in Lau Basin. Front Microbiol 4. doi:10.3389/fmicb.2013.00061.

Takai K, Gamo T, Tsunogai U, Nakayama N, Hirayama H, Nealson KH, et al. (2004). Geochemical and microbiological evidence for a hydrogen-based, hyperthermophilic subsurface lithoautotrophic microbial ecosystem (HyperSLiME) beneath an active deep-sea hydrothermal field. Extremophiles 8:269–82.

Takami H, Noguchi H, Takaki Y, Uchiyama I, Toyoda A, Nishi S, et al. (2012). A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem. PLoS One 7:e30559.

Toner BM, Lesniewski RA, Marlow JJ, Briscoe LJ, Santelli CM, Bach W, et al. (2013). Mineralogy drives bacterial biogeography of hydrothermally inactive seafloor sulfide deposits. Geomicrobiol J 30:313–326.

Ver Eecke HC, Butterfield DA, Huber JA, Lilley MD, Olson EJ, Roe KK, et al. (2012). Hydrogen-limited growth of hyperthermophilic methanogens at deep-sea hydrothermal vents. Proc Natl Acad Sci U S A 109:13674–9.

Wheat CG. (2004). Heat flow through a basaltic outcrop on a sedimented young ridge flank. Geochemistry Geophys Geosystems 5. doi:10.1029/2004GC000700.

White WM, Klein EM. (2013). Composition of the Oceanic Crust. Treatise on Geochemistry: Second Edition 4:457–496.

Whitman WB, Coleman DC, Wiebe WJ. (1998). Prokaryotes: the unseen majority. Proc Natl Acad Sci 95:6578.

Zhang X, Feng X, Wang F. (2016). Diversity and metabolic potentials of subsurface crustal microorganisms from the western flank of the mid-Atlantic ridge. Front Microbiol 7:1–16.

2. DEEP CRUSTAL COMMUNITIES OF THE JUAN DE FUCA RIDGE ARE GOVERNED BY MINERALOGY

A.R. Smith, M.R. Fisk, A.R. Thurber, G.E. Flores, O.U. Mason, R. Popa, F.S. Colwell

Geomicrobiology Journal 5 Howick Place London, UK SW1P 1WG Volume 34(2), 147 – 156 (2016)

Abstract

Volcanic ocean crust contains a global chemosynthetic microbial ecosystem that impacts ocean productivity, seawater chemistry, and geochemical cycling. We examined the mineralogical effect on community structure in the aquifer ecosystem by using a four-year in situ colonization experiment with igneous minerals and glasses in IODP Hole 1301A on the Juan de Fuca Ridge. Microbial community analysis and scanning electron microscopy revealed that olivine phases and iron-bearing minerals bore communities that were distinct from iron-poor phases. Communities were dominated by Archaeoglobaceae, Clostridia, Thermosipho, Desulforudis, and OP1 lineages. Our results suggest that mineralogy determines microbial composition in the subseafloor aquifer ecosystem.

Introduction

Igneous oceanic crust contains the largest aquifer on Earth (Johnson & Pruis, 2003). The basaltic layer contains ~2,300 m² kg⁻¹ of surface area that supports an extensive subsurface microbial ecosystem (Nielsen & Fisk, 2010; Heberling et al., 2010; Santelli et al., 2008) whose biological activity impacts global carbon cycling and productivity in the ocean (Edwards et al., 2011; McCarthy et al., 2010). Planktonic and mineral-attached microbial communities in aquifer habitats are distinct from one another, yet both are important contributors to aquifer ecology (Lehman, 2007). Previous observational studies have suggested that mineralogy dictates microbial community structure in the ocean crust (Toner et al., 2013; Sylvan et al., 2013; Flores et al., 2011). However, these studies focused on communities at the interface between the overlying seawater and the crustal surface, whereas the habitable zone extends up to kilometers below the surface (Heberling et al., 2010). The primary drivers of microbial community structure in the deep oceanic crustal aquifer are largely unknown.

Igneous oceanic crust is a heterogeneous mix of minerals and glasses, and some of these can provide energy to microbial communities as they react with circulating seawater (Edwards et al., 2005). The majority of habitable oceanic crust is basaltic, composed primarily of plagioclase (~47%), pyroxene (~35%), and olivine (~4%) minerals, and fine-grained material including glass (~14%; bolded in Table 2.1). In contrast, habitable ultramafic crust is composed of olivine and pyroxene and occurs along ~30% of the length of the ocean ridges (Dick et al., 2003). Plagioclase feldspars (NaAlSi3O8 – CaAl2Si2O8) are the most common minerals in ocean crust, but they contain only small amounts of redox-active elements and are less important in the overall biological activity of the crust. Calcic pyroxene [(Mg,Fe)CaSi2O6], olivine [(Mg,Fe)2SiO4], and basalt glass (amorphous and without mineral structure) all contain significant amounts of Fe²⁺ and minor Mn²⁺, and pyroxene and basalt glass contain Fe³⁺, all of which could support microbial metabolism (e.g., Fe oxidation, Fe reduction, and Mn oxidation). Mafic minerals can produce enough molecular hydrogen to fuel microbial metabolism even at low temperatures (55 °C; Mayhew et al., 2013), and this is predicted to occur in aquifers of mafic oceanic crust (Lin et al., 2014). The olivine group of minerals also weathers quickly, potentially supporting more robust microbial growth and increasing the rate at which energy is transferred between the geosphere and biosphere (McCollom, 2007; Mayhew et al., 2013). We hypothesized that differences in ocean crust mineralogy and mineral chemistry influence the structure and distribution of microbial communities in the deep crustal aquifer. Furthermore, we hypothesized that minerals that serve as rich energy sources (e.g., Fe-bearing minerals and glasses) or are highly reactive (e.g., olivine) would promote richer microbial communities that are distinct from those on minerals and glasses that do not contain sources of energy or are less reactive.

Table 2.1. Incubated igneous phase composition and community data. Most common phases of incubated minerals and glasses in ocean crust are bolded. ¹Total taxa include all groups of Archaeal and Bacterial sequences that were assigned taxonomy. ²Unique OTUs (combined Archaea and Bacteria) are found only on one mineral or glass. ³Cell density from DAPI nucleic acid staining and microscopic cell counting (from Smith et al., 2011).

Mineral Class    Mineral Name         Formula                          Total Taxa¹   Unique   CHAO Diversity       Cell Density³
(Mineralogy)                          (Composition)                    (Richness)    Taxa²    Archaea   Bacteria   (10⁶ cells g mineral⁻¹)
Olivine          Forsterite (Fo100)   Mg2SiO4                          88            14       9         115        140 ± 21
Olivine          Olivine (Fo90)       Mg1.8Fe0.2SiO4                   113           21       9         132        390 ± 19
Olivine          Fayalite (Fo0)       Fe2SiO4                          94            14       7         111        280 ± 48
Amphibole        Hornblende           Ca2(Mg,Fe)4Al(Si7Al)O22(OH)2     94            9        10        112        150 ± 2
Glasses          Basalt glass         Si, Fe, other variable elements  98            9        11        120        88 ± 7
Glasses          Obsidian             Si, Fe, other variable elements  108           16       10        139        92 ± 7
Pyroxene         Augite               (Mg,Fe)CaSi2O6                   81            12       12        108        91 ± 2
Pyroxene         Diopside             MgCaSi2O6                        82            10       18        99         46 ± 11
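
The olivine labels in Table 2.1 (Fo100, Fo90, Fo0) follow the standard forsterite-number convention: the molar percentage of the Mg end-member in the (Mg,Fe)2SiO4 solid solution. As a small illustrative sketch (the function name `fo_number` is hypothetical and not part of the study), the label can be recovered from the formula coefficients:

```python
def fo_number(mg: float, fe: float) -> float:
    """Return the forsterite number, 100 * Mg / (Mg + Fe), from molar coefficients."""
    total = mg + fe
    if total <= 0:
        raise ValueError("Mg + Fe must be positive")
    return 100.0 * mg / total


# Mg and Fe coefficients read from the olivine formulas in Table 2.1.
olivines = {
    "Forsterite": (2.0, 0.0),    # Mg2SiO4
    "Fo90 olivine": (1.8, 0.2),  # Mg1.8Fe0.2SiO4
    "Fayalite": (0.0, 2.0),      # Fe2SiO4
}

for name, (mg, fe) in olivines.items():
    print(f"{name}: Fo{fo_number(mg, fe):.0f}")
```

The three incubated olivines therefore span the full Mg-Fe range of the solid solution, which is what allows iron content to be separated from crystal structure in the comparisons that follow.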

We empirically tested these hypotheses using Integrated Ocean Drilling Program (IODP) Hole 1301A on the eastern flank of the Juan de Fuca Ridge (JdFR) as a subseafloor microbial observatory equipped with Circulation Obviation Retrofit Kit (CORK) technology (Smith et al., 2011; Fisher et al., 2005; Figure 2.1A). We incubated sand-sized grains of eight igneous minerals and glasses (Table 2.1) for four years (2004 – 2008) in microbial flow cells suspended ~280 meters below seafloor (mbsf) in ~3.5-million-year-old basalt basement rock (Fisher et al., 2005). Seawater entrainment occurred in Hole 1301A during the first three years of incubation (Figure 2.1B; Wheat et al., 2010), likely affecting the evolution of colonizing microbial communities. In contrast, during the last year of incubation the minerals were exposed to a consistent flow of natural aquifer fluids, which were ~64 °C, reduced, and rich in sulfate (17.6 mmol kg⁻¹), iron (4 µmol kg⁻¹), and ammonium (840 µmol kg⁻¹) (Figure 2.1B; Wheat et al., 2010). We expect that this full year of natural basement aquifer conditions was sufficient for native thermophilic aquifer communities to become established as mature biofilms on the minerals and glasses.

Figure 2.1. Location, experimental timeline, and temperature data for this study. (A) Location of IODP Hole 1301A on the eastern flank of the JdFR. (B) Hole 1301A temperature logger data and flow cell incubation timeline for the incubation period 2004 – 2008 (modified from Wheat et al., 2010).

Bacteria and Archaea living in the JdFR flank crustal aquifer can potentially use nitrate, sulfate, iron, molecular hydrogen, and organic molecules such as methane as energy sources for microbial metabolism. Nitrate reduction coupled to hydrogen or methane oxidation was predicted to be the most energetically favorable chemolithotrophic metabolism in anoxic aquifer fluid at this location of the crust; however, periods of seawater entrainment are more thermodynamically favorable for ammonium oxidation, which yields the most energy per kg of basement fluid (Boettger et al., 2013). Fluid chemistry data and laboratory incubation experiments indicate that the molecular hydrogen and methane concentrations in circulating fluids of Hole 1301A support hydrogen- and methane-based microbial metabolism (Lin et al., 2012, 2014). Dominant microbial clades in bulk fluid and rock samples are involved in iron, sulfur, and methane cycling, nitrate reduction, fermentation, and carbon fixation (Cowen et al., 2003; Nakagawa et al., 2006; Jungbluth et al., 2013; Orcutt et al., 2011; McCarthy et al., 2010; Huber et al., 2006; Lever et al., 2013). The microbiology of fluids and attached communities from sulfide and basalt chip colonization experiments from the same time and location as this study has been reported, and both appear to be dominated by thermophilic Bacteria that are chiefly Firmicutes (Orcutt et al., 2011; Jungbluth et al., 2013). These studies revealed basement communities distinct from surface and near-surface basalts, hydrothermal fluids, and sulfides (Huber et al., 2006; Santelli et al., 2008; Toner et al., 2013; Takai et al., 2008), which are typically dominated by Proteobacteria. By comparing our mineral-incubated communities to attached and planktonic communities from IODP Hole 1301A, nearby 1026B, and deep seawater (Cowen et al., 2003; Orcutt et al., 2011; Nakagawa et al., 2006; Jungbluth et al., 2013), we highlight the unique affiliation of microbes with specific minerals, and we show that mineral-attached microbial communities of the deep ocean crust are distinct from those in aquifer fluid and that a unique “deep biosphere” community inhabits the warm igneous crust of the JdFR.

Materials and Methods

Overview of Experimental Design

We used eight igneous minerals and glasses that are common in oceanic crust for this study. Details of incubation and retrieval were previously reported (Smith et al., 2011) and are briefly summarized here. The minerals and glasses from this experiment were incubated in 3.5-million-year-old basaltic crust in two flow cells named “1” and “3” (Smith et al., 2011). Each flow cell was connected to an osmotic pump that ensured continuous fluid flow (~30 mL per year) for the duration of the experiment (4 years total). Each flow cell contained four mineral chambers arranged in sequence through which fluid flowed. Each chamber contained only one mineral. In flow cell “1”, fluid flowed through chambers containing forsterite, Fo90 olivine, fayalite, and then hornblende. The sequence of minerals in flow cell “3” was basalt glass, obsidian, augite, and then diopside (Smith et al., 2011).

Each flow cell-pump assembly was placed into IODP Hole 1301A (47° 45.210′ N, 127° 45.833′ W) between 275 and 287 meters below sea floor (mbsf; Figure 2.1; Smith et al., 2011). Hole 1301A was emplaced in oceanic crust at 2667 meters below sea level and has a CORK (Fisher et al., 2005) designed to seal the observatory system at the seafloor and allow the aquifer to return to native conditions after drilling and CORK insertion. During the first three years of the incubation, bottom seawater leaked into the observatory and mixed with aquifer fluids, providing a cooler, more oxidant-rich environment for aquifer communities (Wheat et al., 2010). In the fourth year of incubation, seawater entrainment became undetectable and the mineral samples were exposed to fluids characteristic of the natural basement aquifer (Figure 2.1B; Wheat et al., 2010). Minerals were retrieved in August 2008 and frozen at −40 °C until extraction.

DNA extraction and sequencing

Genomic DNA was extracted from minerals using the FastDNA Spin Kit for Soil (MP Biomedicals Catalog #116560200) as recommended for deep-sea basalts (Wang & Edwards, 2009) with the following modifications: to enhance yield of encrusted, Gram-positive, and Archaeal cells, 1 mL of sterile seawater medium was added to a 2 mL tube containing ~500 mg frozen minerals, which was then vigorously vortexed at medium-high speed for 10 min to dislodge cells. Mineral-medium slurries were transferred to FastDNA Lysing Matrix E tubes and centrifuged for 10 min at 14,000 rpm with an Eppendorf bench-top microcentrifuge to pellet cells. The excess medium was removed with a pipette, and then FastDNA lysis buffer components were added to the lysis tubes and the remaining DNA extraction steps were followed according to the FastDNA protocol. However, the lysis-bead beating step was extended to 10 min, with periodic checks of lysis tube temperature to ensure that it did not exceed 40 °C. DNA extracts were quantified with the Quant-iT dsDNA HS Reagent (Molecular Probes, Inc.) on a Qubit Fluorometer or PCR-amplified directly using 2 µL as template. Approximately 100 ng of genomic DNA was recovered from 500 mg of each of the minerals. Domain-specific primer amplification and sequencing of the v6v4 region of the 16S rRNA genes for Bacteria and Archaea were performed at the Marine Biological Laboratory (MBL, Woods Hole, MA) on a 454 GS-FLX sequencer with Titanium chemistry using ~1 ng/µL DNA obtained from the attached mineral communities. This project was part of the Census of Deep Life, and additional details of sequence generation and taxonomic assignment of tags for these projects that are not provided here were described previously (Thór Marteinsson et al., 2012).

Data Processing

Sequence files for this study and other related JdFR studies (Jungbluth et al., 2013; Nakagawa et al., 2006; Cowen et al., 2003; Orcutt et al., 2011) were processed and analyzed using Visualization and Analysis of Microbial Population Structures (VAMPS; Huse et al., 2014; Thór Marteinsson et al., 2012). In VAMPS, domain-specific primer amplicons from this study and clones from other studies (Orcutt et al., 2011; Jungbluth et al., 2013; Cowen et al., 2003; Nakagawa et al., 2006) were assigned taxonomy with Global Alignment for Sequence Taxonomy (GAST; Huse et al., 2008). Clone sequences were aligned with 454 amplicons using MEGA5 (Tamura et al., 2011) and trimmed to match the v4v6 region of the 16S rRNA gene before uploading to VAMPS. To confirm that Archaeoglobaceae taxonomy differences were not an artifact of the taxonomic assignment methodology, the most abundant representative sequences of each of the dominant Archaeoglobaceae taxa were aligned to the SILVA database (http://www.arb-silva.de/; Pruesse et al., 2007). All sequences and metadata are publicly available on the VAMPS website under project name DCO_PPA_Av6v4 for Archaea and DCO_PPA_Bv6v4 for Bacteria and can also be found under GenBank accession #SRP039455.

We compared the structure of our communities with that of local, regional, and temporally concurrent communities. These communities included local deep seawater (Jungbluth et al., 2013), aquifer fluids from Hole 1301A or from nearby Hole 1026B (1 km distant; Cowen et al., 2003; Jungbluth et al., 2013), those attached to other rock or mineral substrates in Hole 1301A (Orcutt et al., 2011), and contaminant communities arising from CORK materials (Nakagawa et al., 2006). These data are published and publicly available. In these comparisons we were unable to rigorously test community structure, as sequencing approaches varied widely among the studies. These ranged from a minimum of 3 sequences per sample when cloning approaches were undertaken, up to our high throughput analysis with a mean of 15,541 ± 5,182 SD sequences per sample. We therefore treat these comparisons qualitatively, as quantitative comparisons would be inappropriate.

Removal of Laboratory Contaminants from Sequence Data

DNA extracts from low biomass environmental samples are commonly contaminated with DNA originating from DNA extraction kits (such as the FastDNA Spin Kit for Soil we used) and laboratory supplies used in the extraction process (Salter et al., 2014). Therefore, we identified and removed sequences belonging to likely laboratory contaminants (the Bacterial groups Burkholderia and Enterobacteriaceae and the genus Ralstonia) from our non-normalized taxonomy tables prior to our community-based analyses. None of these putative contaminant taxa are from groups of known thermophiles or hyperthermophiles, i.e., the groups expected to occur in the native communities (Wheat et al., 2010; Orcutt et al., 2011; Cowen et al., 2003; Jungbluth et al., 2013). Although the Burkholderia and Enterobacteriaceae groups of sequences were a minor (< 5%) component of the total sequences, in some cases Ralstonia sequences comprised a major portion (up to 50%) of the bacterial community (Supplemental Figure 2.2). We present a treatment of our data without these taxa removed in supplemental material (Supplemental Figure 2.1). Clostridia are not reported as a contaminant from the FastDNA Spin Kit for Soil (Salter et al., 2014) and have not been found in our laboratory supplies after multiple high throughput sequencing efforts from low biomass environments, and so these taxa were retained in our community analysis.
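
The filtering step described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the procedure actually run on the VAMPS tables: the semicolon-delimited lineage strings, the function names, and the example counts are all hypothetical. Only the three contaminant names come from the text.

```python
# Contaminant names as listed in the text; the table layout is an assumption.
CONTAMINANTS = frozenset({"Burkholderia", "Enterobacteriaceae", "Ralstonia"})


def is_contaminant(lineage: str) -> bool:
    """True if any rank in a ';'-delimited lineage matches a contaminant name."""
    return any(rank.strip() in CONTAMINANTS for rank in lineage.split(";"))


def remove_contaminants(counts: dict[str, int]) -> dict[str, int]:
    """Drop contaminant taxa from a raw (non-normalized) count table."""
    return {taxon: n for taxon, n in counts.items() if not is_contaminant(taxon)}


def relative_abundance(counts: dict[str, int]) -> dict[str, float]:
    """Normalize counts to proportions; an empty or all-zero table gives zeros."""
    total = sum(counts.values())
    if total == 0:
        return {taxon: 0.0 for taxon in counts}
    return {taxon: n / total for taxon, n in counts.items()}


# Made-up counts for illustration only; these are not data from the study.
raw = {
    "Bacteria;Firmicutes;Clostridia": 300,
    "Bacteria;Thermotogae;Thermosipho": 100,
    "Bacteria;Proteobacteria;Ralstonia": 400,
}
cleaned = remove_contaminants(raw)
print(relative_abundance(cleaned))
```

Removing contaminants before normalization matters: if relative abundances were computed first, a sample with many Ralstonia reads would have the proportions of its remaining taxa deflated.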

Mineral Community Analysis

After removal of contaminant sequences, the taxonomy tables downloaded from VAMPS were normalized by relative abundance and imported into PRIMERv6. In PRIMERv6, data were square-root transformed, and resemblance matrices were constructed using Bray-Curtis similarity. Non-metric Multi-Dimensional Scaling (nMDS) analysis was used to visualize the community similarity of our samples, and statistically different communities were identified using the Similarity Profile (SIMPROF) routine. SIMPROF identifies groups of samples whose within-group variability is lower than the variability between groups, presenting a robust approach to identify significant among-sample and between-sample community structure without using an arbitrary community similarity cut-off or a priori categories. Similarity Percentages (SIMPER) was used to identify those taxa that defined the statistically significant groups.
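
To make the resemblance measure concrete, the following is a minimal sketch of Bray-Curtis similarity computed on square-root-transformed relative abundances. It is an illustration of the standard formula, not the PRIMERv6 implementation; the function names are hypothetical, and it assumes both samples list the same taxa in the same order.

```python
import math


def sqrt_relative(counts: list[float]) -> list[float]:
    """Convert counts to relative abundance, then square-root transform."""
    total = sum(counts)
    if total <= 0:
        raise ValueError("sample has no sequences")
    return [math.sqrt(c / total) for c in counts]


def bray_curtis_similarity(a: list[float], b: list[float]) -> float:
    """Bray-Curtis similarity (percent) between two samples over the same taxa."""
    if len(a) != len(b):
        raise ValueError("samples must list the same taxa in the same order")
    x, y = sqrt_relative(a), sqrt_relative(b)
    numerator = sum(abs(xi - yi) for xi, yi in zip(x, y))
    denominator = sum(xi + yi for xi, yi in zip(x, y))
    return 100.0 * (1.0 - numerator / denominator)


# Made-up counts for three taxa in two samples (illustration only).
print(bray_curtis_similarity([120, 30, 0], [90, 40, 20]))
```

The square-root transform down-weights the most abundant taxa, so that less abundant groups also contribute to the similarity between samples.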

Environmental Scanning Electron Microscopy

We investigated the morphological features of microbial biofilms and exposed mineral surfaces to determine if colonizing microbes were producing weathering features on mineral surfaces and if community biofilms visually differed according to mineralogy. We imaged the surfaces of at least 10 sand grains for each mineral or glass using an FEI QUANTA 600F ESEM. Energy-dispersive X-ray spectroscopy (EDX) was used in tandem with ESEM to characterize mineral features and determine the composition of biofilm materials on incubated mineral surfaces.

Results

Mineralogy Influenced Archaeal Community Structure

The olivine group of minerals (forsterite, Fo90 olivine, fayalite) contained Archaeal communities that were more similar to each other than to those present on the other minerals and glasses (average similarity 51%; SIMPROF p < 0.05; Figure 2.2A; Supplemental Figure 2.1). Similarity of olivine communities is driven by the common occurrence and abundance of an Archaeoglobaceae taxon that did not fit within any of the known genera of this family (66% contribution to similarity). Communities on other Fe-bearing minerals and glasses grouped together mainly due to the consistent occurrence of Archaeoglobus, a genus within Archaeoglobaceae (57 – 58% contribution) and diverged into two groups based on differential abundances of Geoglobus (yet another genus of Archaeoglobaceae) and Miscellaneous Crenarchaeotic Group (MCG). The Archaeal community present on diopside, an Fe-poor mineral, was distinct from other Fe-bearing or olivine mineral-associated Archaeal communities (Figure 2.2A). Nearly all (99.7%) of the Archaeal sequences (19,352 sequences; Supplemental Figure 2.1) from this mineral were identified as Archaeoglobus. Other Archaeal taxa identified from this study include Marine Group 1, Deep Sea Hydrothermal Vent Group 6 (DSHVG6), and methane-cycling archaea (Figure 2.3).

Figure 2.2. Community structure of (A) archaeal and (B) bacterial communities as represented in an nMDS. Each dot and each set of encircled data points represents a significantly different group as identified using a SIMPROF test.

Figure 2.3. Archaeal community composition from this and other subseafloor studies at IODP Holes 1301A and 1026B. (A) Archaeal community structure of eight minerals and glasses incubated in the subseafloor of Hole 1301A (this study; Table 2.1). These data are from 454 pyrotag sequencing of the v6v4 region of the archaeal 16S rRNA gene. (B) Cloning-based community studies of fluid or attached microbes from Holes 1301A or 1026B. References for panel (B) are: a. (Jungbluth et al., 2013); b. (Orcutt et al., 2011); c. (Cowen et al., 2003); d. (Nakagawa et al., 2006). Sequence numbers represented in each sample are in parentheses. Each non-genus level (not italicized) taxonomic group contains closely related sequences that could not be resolved to genus level and currently have no known matches in sequence databases. These could not be identified beyond the level or group indicated.

Mineral Chemistry Influenced Bacterial Community Structure

Bacterial communities formed three statistically significant groups with SIMPROF and nMDS analysis that were driven by mineral chemistry according to the presence or absence of iron (average similarity 56 – 58%; Figure 2.2B; Supplemental Figure 2.2). Fayalite, the mineral with the highest iron content and energy density, had a unique Bacterial community that did not group with the other mineral communities (average dissimilarity 54 – 64%). It had a low abundance of the Clostridia taxonomic group and more variable distributions of the remaining taxa found on the other minerals (Figure 2.4). The Clostridia taxonomic group and Clostridiaceae, a family of Clostridia, were the dominant groups contributing to similarity of both the Fe-bearing and Fe-poor mineral communities (30% and 45% combined taxonomic contribution to similarity, respectively); however, differential abundances of less abundant taxa such as Thermosipho, Desulforudis, and OP1 caused these groups to diverge (average dissimilarity 51%). Generally, OP1 and Thermosipho were more abundant in the Fe-bearing communities, whereas Desulforudis was more abundant in Fe-poor communities.

Community Richness

Archaeal communities had low richness in comparison to Bacterial communities (Table 2.1), and consisted of up to three dominant taxonomic groups, Geoglobus, Archaeoglobus, and another taxon of Archaeoglobaceae (Figure 2.3). Archaeal sequence recovery from Fo0 olivine (fayalite) was low (575 reads; Supplemental Table 2.1), but its community most resembled that of Fo90 olivine (Figure 2.3). The Fo90 olivine community had the greatest Bacterial diversity (Table 2.1) of all mineral and glass phases and contained the highest number of total taxa and the highest number of taxa that were only found on a single mineral.
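
Table 2.1 reports CHAO richness for each phase. The text does not say which form of the estimator was computed, so the sketch below assumes the widely used bias-corrected Chao1, which extrapolates total richness from the number of taxa seen once (singletons) and twice (doubletons). The function name and the example counts are hypothetical.

```python
def chao1(counts: list[int]) -> float:
    """Bias-corrected Chao1 richness estimate from per-taxon sequence counts."""
    observed = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    # S_chao1 = S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))


# Made-up counts: five observed taxa, two singletons, one doubleton.
print(chao1([10, 5, 1, 1, 2]))
```

Because the estimate depends on rare taxa, it is sensitive to sequencing depth, which is worth keeping in mind when reading the low-recovery fayalite Archaeal sample against the others.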

Figure 2.4. Bacterial community composition from this and other subseafloor studies at IODP Holes 1301A and 1026B. (A) Bacterial community structure of eight minerals and glasses incubated in the subseafloor of Hole 1301A (this study; Table 2.1). These data are from 454 pyrotag sequencing of the v6v4 region of the bacterial 16S rRNA gene. (B) Cloning-based community studies of fluid or attached microbes from Holes 1301A or 1026B. References for panel (B) are: a. (Jungbluth et al., 2013); b. (Orcutt et al., 2011); c. (Cowen et al., 2003); d. (Nakagawa et al., 2006). Sequence numbers represented in each sample are in parentheses. OP1 and OP8 are candidate Bacterial phyla. Each non-genus level (not italicized) taxonomic group contains closely related sequences that could not be resolved to genus level and currently have no known matches in sequence databases. These could not be identified beyond the level or group indicated.

Comparison to Other Local JdFR Communities

The mineral-associated microbial communities from this study were most similar to the rock chip-associated microbial communities incubated in the same hole (1301A) during the same time period (Orcutt et al., 2011). We compared the resulting microbial communities from our experiments to other JdFR subseafloor communities, including borehole fluid, here named “1301A fluid 08”, bottom seawater (“1301A seawater 08”), and other rock and mineral-attached communities (“1301A rock 08”) sampled from Hole 1301A in 2008 when we retrieved our samples (Figures 2.3 and 2.4; Jungbluth et al., 2013; Orcutt et al., 2011). Additionally, we included sequences from earlier sampling of the 1026B borehole fluid community (“1026B fluid 03”) and black rust community (“1026B rust 06”), which was collected from the steel casing of the 1026B well head above the seafloor (Figures 2.3 and 2.4; Cowen et al., 2003; Nakagawa et al., 2006). The “1301A rock 08” mineral and rock chip samples were most similar to our attached communities, sharing the three abundant Archaeal taxa that were Archaeoglobus, the aforementioned Archaeoglobaceae, and a group of MCG (Figure 2.3). However, the rock chip microbial communities also contained Methanosarcina, which was not present on the minerals and glasses from this study, and they did not contain other organisms present in low to moderate abundance on the minerals from this study (Geoglobus, Marine Group 1, and Deep Sea Hydrothermal Vent Group 6). The rock chip Archaeal communities appeared most similar to the olivine group in this study, largely in response to the Archaeoglobaceae taxa present. Bacterial communities that we observed were also similar to the microbial communities detected on rock chips (Orcutt et al., 2011), with the major difference being a lack of Thermosipho and Halobacteroidaceae sequences on the rock chips (Figure 2.4). Based on community composition, our communities were least similar to the “1026B rust 06” casing contaminant community (Nakagawa et al., 2006) and “1301A seawater 08” (Jungbluth et al., 2013), which were sampled above the seafloor.

Attached vs. Planktonic Aquifer Communities

Attached subseafloor communities sampled from colonized rock chips, minerals, and glasses incubated in Hole 1301A are more similar to each other than to the planktonic fluid communities from Holes 1301A and 1026B. The fluid communities were characterized by an abundant Desulforudis (Figure 2.4) and an MCG Archaeon (Figure 2.3) in both samples (comprising 40% of the bacterial and 78% of the archaeal communities, respectively). These taxa were also present on the mineral and glass communities from this study and microbial communities associated with rock chips (Orcutt et al., 2011), albeit in lower abundance (an average of 5% Desulforudis in the bacterial community and 4% of MCG in the archaeal community).

Biofilm Morphologies

We used ESEM to visualize biofilm morphologies, microbial weathering features, and secondary mineral formation on the incubated minerals and found that each mineralogical group had unique biofilm morphologies (Figure 2.5). EDX spectroscopy revealed that Fo90 olivine contained thick, carbon-rich biofilms (Supplemental Figure 2.5). These contained embedded cells of uniform size and shape and often contained secondary minerals whose structure could not be clearly identified (Figure 2.5A; Supplemental Figure 2.6). Large (2 – 5 µm) globe-shaped cells connected to each other via an outer sheath or extracellular polymeric substance were also found on mineral particles from Fo90 olivine. Glass surfaces contained layers of mineral-encrusted and mineral-free spherical and rod-shaped cells (Figure 2.5B). Tubular structures, cell-shaped pits, and small mounds with holes in the center were also common. We did not observe twisted stalks indicative of low temperature neutrophilic iron oxidizers like those observed previously as possible remnants of the seawater entrainment period (Orcutt et al., 2011); however, we occasionally observed diatom frustules on glass surfaces that originated from seawater. Compared to other minerals, the surfaces of the pyroxenes augite and diopside had few cells and these were mainly solitary rod-shaped or coccoid cells (Figure 2.5C).

Figure 2.5. Scanning electron micrographs of mineral and glass biofilms. Images from Fo90 olivine (A), from basalt glass (B), and from augite (C). (A) Surface biofilm with embedded cells (1), cells on the mineral surface after a thick carbon-rich film has pulled away (2), and secondary minerals forming on the surface in association with the biofilm (3). Locations where EDAX spectroscopy (secondary minerals and biofilm) was performed are noted by open circles (Supplemental Figures 2.5 and 2.6). (B) Basalt glass biofilms with a diversity of encrusted tubes, cellular crusts and pits, and rod and spherical-shaped cells. (C) Single cells on the surface of augite. Bars are 10 microns.

Discussion

The two dominant Archaeal taxa provided insight into the importance of mineralogy on community structure. Based on other known members of the genus Archaeoglobus, the dominant Archaeon on all non-olivine phases is a sulfate reducer (Klenk et al., 1998), which means it requires sulfate and reductants (possibly organics) from basement fluids to fuel its metabolism. This organism was present on all mineral and rock chip incubations from Hole 1301A and in aquifer fluids sampled from 1026B (Figure 2.3) regardless of mineralogy, indicating that its distribution is not tied to the basement mineralogy and is likely dependent on fluid chemistry. In contrast, the Archaeoglobaceae taxon was most abundant on the highly reactive olivine minerals (this study) and Fe-bearing rock chips (including pyrite; Orcutt et al., 2011). This species distribution is likely due to the energy density (i.e., a high concentration of redox-active elements on a rapidly weathering surface) of Fe-bearing mineral groups such as olivine. Iron and molecular hydrogen concentrations in borehole fluid were high enough to support microbial metabolism (~1 μM and ~2 μM, respectively; Lin et al., 2012), but if these were the most important sources of energy in the attached communities we would not have observed the clear difference among mineral types that we did (Figures 2.4 and 2.5). Our results indicate that the inherent mineralogical properties of these reduced Fe-bearing minerals are driving the differences in attached archaeal communities.

Bacterial communities also appeared to be structured as a result of mineral chemistry, specifically with respect to Fe-bearing or Fe-poor minerals. Clostridia are common to olivine-dominated systems (Brazelton et al., 2012); however, the Clostridia and taxa we found dominating the mineral-attached communities were least abundant on Fe-rich fayalite. The Clostridia taxon was abundant in the two other attached communities from Holes 1301A and 1026B, which may indicate that these microbes prefer living in biofilm communities such as those we observed. These Clostridia lineages may be more abundant in the attached communities from the JdFR if their distribution is driven by H2, which members of Clostridia are known to use for acetogenesis (Nagarajan et al., 2013; Ragsdale & Pierce, 2008; Braakman & Smith, 2012). Fe-bearing phases can provide molecular hydrogen as an energy source as they react with seawater (Mayhew et al., 2013).

The Archaea (Figure 2.3) and Bacteria (Figure 2.4) that colonized the mineral and glass phases in the crust are related to thermophiles and hyperthermophiles, as expected from the basement temperature over the last year (~64 °C; Figure 2.1B). The most abundant Archaeal and Bacterial taxa were Archaeoglobaceae, Thermosipho, and Firmicutes (including Clostridia and Candidatus Desulforudis; Chivian et al., 2008), all of which belong to thermophilic and hyperthermophilic groups. All known genera of Archaeoglobaceae are thermophiles from hydrothermal or deep subsurface environments that utilize iron- and sulfur-based metabolisms to gain energy (Klenk et al., 1998; Kashefi et al., 2002; Anderson et al., 2011). Sequences from the Clostridia and Clostridiaceae groups most closely match those previously obtained from rock chips incubated in Hole 1301A (Orcutt et al., 2011). The closest cultured representatives to the Clostridia that was abundant in this study are Pelotomaculum thermopropionicum (91% similarity), a thermophilic bacterium that forms a syntrophic relationship with hydrogenotrophic methanogenic archaea (Imachi & Sekiguchi, 2002), and Moorella humiferrea (91% similarity) and Thermolithobacter ferrireducens (91% similarity), both of which are thermophilic iron reducers (Nepomnyashchaya et al., 2011; Sokolova et al., 2007). The closest cultured representatives to the Clostridiaceae sequences we obtained are T. ferrireducens and T. carboxydivorans (both 88% similarity), which are hydrogenotrophic and hydrogenogenic, respectively, and are also thermophilic chemolithoautotrophic iron reducers isolated from Calcite Spring in Yellowstone National Park (Sokolova et al., 2007).

Our findings indicate a strong presence of chemolithoautotrophic lineages mixed with putative heterotrophs and mixotrophs. This is consistent with what has been previously reported in studies of JdFR microbiology (Orcutt et al., 2011; Jungbluth et al., 2013; Cowen et al., 2003). Although the crustal aquifer is most likely a carbon sink, since its venting dissolved organic carbon (DOC) values are lower than those of its deep oceanic source water (Lang et al., 2006), the basement community in this region of the crust is also fixing and supplying fresh organic carbon to the deep ocean (McCarthy et al., 2010). We observed dominant Bacterial and Archaeal community members related to known hydrogen oxidizers (Archaeoglobaceae; Kashefi et al., 2002; Klenk et al., 1998), H2-producing fermenters (Clostridia; Jiang et al., 2014), acetogens (Clostridia, OP1; Takami et al., 2012; Nagarajan et al., 2013), and iron- and sulfur-metabolizing microbes (Geoglobus, Archaeoglobus; Kashefi et al., 2002; Klenk et al., 1998), and these taxa may be responsible for supplying the fresh organic carbon previously observed venting out of the ocean floor (McCarthy et al., 2010). Methane cycling activity has been previously reported from 351–583 mbsf basalt cores from a nearby borehole (Hole 1301B; Lever et al., 2013) and from JdFR aquifer fluids (Lin et al., 2014), although little evidence of taxa involved in methane cycling was found in the mineral-associated communities that we studied.

The fact that the attached communities from these locations are more similar to each other than to the aquifer fluids from the same region suggests that there is a common deep attached community in the Earth's crust. Further, this pattern assuages one of our main concerns about our experimental design, namely that the early seawater entrainment period might have led to the observed community patterns rather than the mineralogy. The clear separation between aquifer fluid and deep seawater community structure and our results indicate that this scenario is unlikely. In addition, the distinction between the mineral-associated community structures that we observed and those reported from borehole rust indicates that the steel CORK casing in Hole 1301A is unlikely to have impacted our community patterns. Although temporal variability in Hole 1301A fluid communities has been demonstrated previously as a community shift from Firmicute-dominated to Proteobacteria-dominated over the years 2008, 2009, and 2010 (Jungbluth et al., 2013), we only used 2008 fluid community data in our comparison. These samples were taken in parallel with this study and with the rock chip incubations and most accurately reflect the whole (fluid plus attached) aquifer community at the time of sample retrieval (Orcutt et al., 2011; Jungbluth et al., 2013). Furthermore, the fluid communities sampled in subsequent years were more dissimilar to mineral communities from this study than the fluid community from 2008 that we used for comparison, strengthening our finding that the planktonic and attached aquifer communities are distinct from one another (Jungbluth et al., 2013).

Pelagibacter was found in 1301A fluid and bottom seawater (Jungbluth et al., 2013), but not in fluid from 1026B or any mineral or rock sample. The planktonic heterotroph Pelagibacter dominates pelagic seawater communities (Rappé et al., 2002), and its presence in 1301A fluid samples has been shown to indicate seawater intrusion from improper sealing of the CORK or leaky fluid delivery lines that were used for sampling (Jungbluth et al., 2013, 2014). The absence of Pelagibacter further supports the conclusion that the communities we report on are not water column communities from the initial seawater intrusion.

Removal of sequences from potential microbial contaminants did not change the outcome of our analyses or our conclusions about the mineralogy and composition of the crust influencing the structure of aquifer microbial communities. By excluding the potential contaminants, we found that fayalite no longer grouped with the other Fe-bearing minerals and instead was unique (Figure 2.3; Supplementary Figures 2.3 and 2.4). Prior to removal, Ralstonia sequences were low in abundance on Fe-poor minerals and high in abundance on Fe-bearing minerals. Although Ralstonia has been reported from deep crustal environments and is potentially a H2-oxidizer (Brazelton et al., 2012; Mason et al., 2010), it is (1) an aerobic mesophile that is not reported to grow under the warm, anoxic JdFR aquifer conditions; (2) not reported in any other community data from Hole 1301A or 1026B; and (3) the most common contaminant listed for the DNA extraction kit we used (Salter et al., 2014). We therefore decided these sequences were most likely those of laboratory contaminants and not part of the natural community colonizing these minerals and elected to remove them from the analysis.

Conclusion and Future Directions

We found that the structure, composition, and morphological characteristics of surface-attached communities differ with respect to the mineralogy and phase chemistry of the ocean crust. This study extends our knowledge of the mineralogical impact on microbial communities from surficial seafloor basalts and sulfide communities to those deeper within the habitable zone of the igneous crust. We also found that the attached and planktonic components of the suboceanic aquifer have different microbial community structures, which is consistent with other studies (e.g., Lehman, 2007), yet together they form a distinct deep crustal aquifer community that differs markedly from other local seafloor communities. These results suggest there is a structural, and potentially functional, segregation of microbial communities within the aquifer that is driven by the properties of igneous minerals. We predict that the aquifer ecosystem’s structural and functional diversity is heterogeneous since the Earth’s crust is mineralogically heterogeneous. Future studies aimed at assessing functional traits of communities (e.g., metagenomics), direct activity measurements, and the use of CORKs emplaced into different crustal types will allow us to determine if the differences in community structure that we report here translate into true functional differences in the oceanic crust. This study provides insight into the factors that drive micro-scale complexity of aquifer ecosystems and has important implications for resolving ecosystem function, elemental cycling, and microbial distributions in igneous crust, the largest deep habitat on Earth.

Acknowledgments: This is Center for Dark Energy Biosphere Investigations (C-DEBI) contribution #320. Pyrotag sequencing was made possible by the Deep Carbon Observatory’s Census of Deep Life supported by the Alfred P. Sloan Foundation and was performed at the Marine Biological Laboratory (Woods Hole, MA, USA). We are grateful for the assistance of Mitch Sogin, Susan Huse, Joseph Vineis, Andrew Voorhis, Sharon Grim, and Hilary Morrison at MBL. Andrew Fisher, C. Geoffrey Wheat, Hans Jannasch, Stefan Sievert, Keir Becker, Mark Nielsen, and the crews of submersible DSRV Alvin and the RV Atlantis and JOIDES Resolution assisted with flow cell development, deployment, and retrieval. Thanks to Teresa Sawyer in OSU’s EM facility for her help with training and imaging. William Rugh contributed to the design of the flow cells.

References

Anderson I, Risso C, Holmes D, Lucas S, Copeland A, Lapidus A, et al. 2011. Complete genome sequence of Ferroglobus placidus AEDII12DO. Stand Genomic Sci 5:50–60.

Boettger J, Lin H-T, Cowen JP, Hentscher M, Amend JP. 2013. Energy yields from chemolithotrophic metabolisms in igneous basement of the Juan de Fuca ridge flank system. Chem Geol 337-338:11–19.

Braakman R, Smith E. 2012. The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol 8:e1002455.

Brazelton WJ, Nelson B, Schrenk MO. 2012. Metagenomic evidence for H2 oxidation and H2 production by serpentinite-hosted subsurface microbial communities. Front Microbiol 2:268.

Cowen JP, Giovannoni SJ, Kenig F, Johnson HP, Butterfield D, Rappé MS, et al. 2003. Fluids from aging ocean crust that support microbial life. Science 299:120–3.

Dick HJB, Lin J, Schouten H. 2003. An ultraslow-spreading class of ocean ridge. Nature 426:405–12.

Edwards KJ, Bach W, McCollom TM. 2005. Geomicrobiology in oceanography: microbe-mineral interactions at and below the seafloor. Trends Microbiol 13:449–56.

Edwards KJ, Wheat CG, Sylvan JB. 2011. Under the sea: microbial life in volcanic oceanic crust. Nat Rev Microbiol 9:703–712.

Fisher AT, Wheat CG, Becker K, Davis EE, Jannasch H, Schroeder D, et al. 2005. Scientific and technical design and deployment of long-term subseafloor observatories for hydrogeologic and related experiments, IODP Expedition 301, eastern flank of Juan de Fuca Ridge. Proc Integr Ocean Drill Progr 301. doi:10.2204/iodp.proc.301.103.2005.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI, et al. 2011. Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environ Microbiol 13:2158–71.

Heberling C, Lowell RP, Liu L, Fisk MR. 2010. Extent of the microbial biosphere in the oceanic crust. Geochemistry Geophys Geosystems 11:1–15.

Huber J, Johnson HP, Butterfield D, Baross J. 2006. Microbial life in ridge flank crustal fluids. Environ Microbiol 8:88–99.

Huse SM, Dethlefsen L, Huber JA, Mark Welch D, Welch DM, Relman DA, et al. 2008. Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. Eisen, JA (ed). PLoS Genet 4:e1000255.

Huse SM, Welch DBM, Voorhis A, Shipunova A, Morrison HG, Eren AM, et al. 2014. VAMPS: a website for visualization and analysis of microbial population structures. BMC Bioinformatics 15.

Imachi H, Sekiguchi Y. 2002. Pelotomaculum thermopropionicum gen. nov., sp. nov., an anaerobic, thermophilic, syntrophic propionate-oxidizing bacterium. Int J Syst Evol Microbiol 1729–1735.

Jiang L, Long C, Wu X, Xu H, Shao Z, Long M. 2014. Optimization of thermophilic fermentative hydrogen production by the newly isolated Caloranaerobacter azorensis H53214 from deep-sea hydrothermal vent environment. Int J Hydrogen Energy 39:14154–14160.

Johnson HP, Pruis MJ. 2003. Fluxes of fluid and heat from the oceanic crustal reservoir. Earth Planet Sci Lett 216:565–574.

Jungbluth SP, Grote J, Lin H-T, Cowen JP, Rappé MS. 2013. Microbial diversity within basement fluids of the sediment-buried Juan de Fuca Ridge flank. ISME J 7:161–172.

Jungbluth SP, Lin H-T, Cowen JP, Glazer BT, Rappé MS. 2014. Phylogenetic diversity of microorganisms in subseafloor crustal fluids from Holes 1025C and 1026B along the Juan de Fuca Ridge flank. Front Microbiol 5:119.

Kashefi K, Tor JM, Holmes DE, Gaw Van Praagh C V, Reysenbach A-L, Lovley DR. 2002. Geoglobus ahangari gen. nov., sp. nov., a novel hyperthermophilic archaeon capable of oxidizing organic acids and growing autotrophically on hydrogen with Fe(III) serving as the sole electron acceptor. Int J Syst Evol Microbiol 52:719–728.

Klenk H, Clayton RA, Tomb J, Dodson RJ, Gwinn M, Hickey EK, et al. 1998. The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon Archaeoglobus fulgidus. Nature 394:6342–6349.

Lang SQ, Butterfield DA, Lilley MD, Paul Johnson H, Hedges JI. 2006. Dissolved organic carbon in ridge-axis and ridge-flank hydrothermal systems. Geochim Cosmochim Acta 70:3830–3842.

Lehman RM. 2007. Understanding of aquifer microbiology is tightly linked to sampling approaches. Geomicrobiol J 24:331–341.

Lever M, Rouxel O, Alt J, Shimizu N. 2013. Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Cowen JP, Olson EJ, Amend JP, Lilley MD. 2012. Inorganic chemistry, gas compositions and dissolved organic carbon in fluids from sedimented young basaltic crust on the Juan de Fuca Ridge flanks. Geochim Cosmochim Acta 85:213–227.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. 2014. Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Mason OU, Nakagawa T, Rosner M, Van Nostrand JD, Zhou J, Maruyama A, et al. 2010. First investigation of the microbiology of the deepest layer of ocean crust. PLoS One 5:e15399.

Mayhew LE, Ellison ET, McCollom TM, Trainor TP, Templeton AS. 2013. Hydrogen generation from low-temperature water–rock reactions. Nat Geosci 6. doi:10.1038/NGEO1825.

McCarthy MD, Beaupré SR, Walker BD, Voparil I, Guilderson TP, Druffel ERM. 2010. Chemosynthetic origin of 14C-depleted dissolved organic matter in a ridge-flank hydrothermal system. Nat Geosci 4:32–36.

McCollom TM. 2007. Geochemical constraints on sources of metabolic energy for chemolithoautotrophy in ultramafic-hosted deep-sea hydrothermal systems. Astrobiology 7:933–50.

Nagarajan H, Sahin M, Nogales J, Latif H, Lovley DR, Ebrahim A. 2013. Characterizing acetogenic metabolism using a genome-scale metabolic reconstruction of Clostridium ljungdahlii. Microb Cell Fact 12:1–13.

Nakagawa S, Inagaki F, Suzuki Y, Steinsbu BO, Lever MA, Takai K, et al. 2006. Microbial community in black rust exposed to hot ridge flank crustal fluids. Appl Environ Microbiol 72:6789–99.

Nepomnyashchaya YN, Slobodkina GB, Baslerov RV, Chernyh N, Bonch-Osmolovskaya E, Netrusov I, et al. 2011. Moorella humiferrea sp. nov., a thermophilic, anaerobic bacterium capable of growth via electron shuttling between humic acid and Fe(III). Int J Syst Evol Microbiol 62:613–617.

Nielsen ME, Fisk MR. 2010. Surface area measurements of marine basalts: Implications for the subseafloor microbial biomass. Geophys Res Lett 37. doi:10.1029/2010GL044074.

Orcutt BN, Bach W, Becker K, Fisher AT, Hentscher M, Toner BM, et al. 2011. Colonization of subsurface microbial observatories deployed in young ocean crust. ISME J 5:692–703.

Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, et al. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–96.

Ragsdale SW, Pierce E. 2008. Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation. Biochim Biophys Acta 1784:1873–98.

Rappé M, Connon S, Vergin K, Giovannoni S. 2002. Cultivation of the ubiquitous SAR 11 marine bacterioplankton clade. Nature 418:630–3.

Robador A, Jungbluth SP, LaRowe DE, et al. 2014. Activity and phylogenetic diversity of sulfate-reducing microorganisms in low-temperature subsurface fluids within the upper oceanic crust. Frontiers in Microbiology 5:748. doi:10.3389/fmicb.2014.00748.

Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, et al. 2014. Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol 12:87.

Santelli CM, Orcutt BN, Banning E, Bach W, Moyer CL, Sogin ML, et al. 2008. Abundance and diversity of microbial life in ocean crust. Nature 453:653–6.

Smith A, Popa R, Fisk M, Nielsen M, Wheat CG, Jannasch HW, et al. 2011. In situ enrichment of ocean crust microbes on igneous minerals and glasses using an osmotic flow-through device. Geochemistry Geophys Geosystems 12:1–19.

Sokolova T, Hanel J, Onyenwoke RU, Reysenbach A-L, Banta A, Geyer R, et al. 2007. Novel chemolithotrophic, thermophilic, anaerobic bacteria Thermolithobacter ferrireducens gen. nov., sp. nov. and Thermolithobacter carboxydivorans sp. nov. Extremophiles 11:145–57.

Sylvan JB, Sia TY, Haddad AG, Briscoe LJ, Toner BM, Girguis PR, et al. 2013. Low temperature geomicrobiology follows host rock composition along a geochemical gradient in Lau Basin. Front Microbiol 4. doi:10.3389/fmicb.2013.00061.

Takai K, Nunoura T, Ishibashi J, Lupton J, Suzuki R, Hamasaki H, et al. 2008. Variability in the microbial communities and hydrothermal fluid chemistry at the newly discovered Mariner hydrothermal field, southern Lau Basin. J Geophys Res 113:G02031.

Takami H, Noguchi H, Takaki Y, Uchiyama I, Toyoda A, Nishi S, et al. 2012. A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem. PLoS One 7:e30559.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. 2011. MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–9.

Thór Marteinsson V, Rúnarsson A, Stefánsson A, Thorsteinsson T, Jóhannesson T, Magnússon SH, et al. 2012. Microbial communities in the subglacial waters of the Vatnajökull ice cap, Iceland. ISME J 7:427–437.

Toner BM, Lesniewski RA, Marlow JJ, Briscoe LJ, Santelli CM, Bach W, et al. 2013. Mineralogy drives bacterial biogeography of hydrothermally inactive seafloor sulfide deposits. Geomicrobiol J 30:313–326.

Wang H, Edwards KJ. 2009. Bacterial and Archaeal DNA extracted from inoculated experiments: implication for the optimization of DNA extraction from deep-sea basalts. Geomicrobiol J 26:463–469.

Wheat CG, Jannasch HW, Fisher AT, Becker K, Sharkey J, Hulme S. 2010. Subseafloor seawater-basalt-microbe reactions: Continuous sampling of borehole fluids in a ridge flank environment. Geochemistry Geophys Geosystems 11:1–18.

3. CARBON AND ENERGY PATHWAYS INDICATE DOMINANT BACTERIA COLONIZING OLIVINE IN YOUNG THERMAL OCEANIC CRUST ARE ACETOGENS

Amy R. Smith, Ryan Mueller, Martin R. Fisk, Olivia U. Mason, Radu Popa, Brandon Kieft, and Frederick S. Colwell

Abstract

Earth’s largest aquifer ecosystem resides in igneous oceanic crust, where chemosynthesis and water-rock reactions provide the carbon and energy to support an active deep biosphere. Although the Calvin cycle was determined to be the dominant carbon fixation pathway in cool, oxic oceanic crust, less is known about the microbial communities in dark, low-organic matter environments like deep thermal aquifers and the carbon fixation pathways they may use. Here, molecular hydrogen (H2) may be used as an electron donor rather than organic matter, and it along with CO2 may be fed into carbon fixation pathways such as the Wood-Ljungdahl pathway, as has been reported in deep continental aquifers. Using a metagenomic approach, we examined the carbon and energy pathways of a microbial community colonizing olivine that was incubated for four years in a deep thermal oceanic crustal aquifer. Eleven high-quality genomes related to thermophilic and hyperthermophilic bacteria and archaea were obtained from the olivine biofilm. We found that the dominant carbon fixation pathway was the hydrogenotrophic Wood-Ljungdahl, or reductive acetyl-CoA, pathway for acetogenesis. This pathway was complete or near-complete in nearly all high-quality bacterial genomes, representing seven novel thermophilic acetogens. The acetyl-CoA pathway, which is shared between acetogens, methanogens, and sulfate reducers, and is part of the Wood-Ljungdahl pathway, was complete in all three archaeal genomes. Sulfate reduction was a common energy metabolism among bacteria and archaea, and one bacterial genome contained a complete nitrogen fixation pathway. These findings give strong evidence for H2-based chemolithoautotrophy in the ocean’s thermal basaltic aquifer, which may represent a previously unrecognized hotspot for acetogenesis.

Introduction

The Wood-Ljungdahl, or reductive acetyl-CoA, pathway for acetogenesis is an ancient carbon fixation and biosynthetic pathway employed by acetogenic bacteria, sulfate reducers, and methanogenic archaea (Nitschke & Russell, 2013; Grabarse et al., 2001; Fuchs, 2011). Molecular hydrogen (H2) and inorganic carbon (CO2 or HCO3−), along with a suite of enzymes, hydrogenases, and ferredoxins, are used to synthesize acetyl-CoA (Braakman & Smith, 2012; Nitschke & Russell, 2013). Acetyl-CoA is a key biomolecule from which other biomolecules (e.g., lipids) are made, and many biosynthetic pathways depend upon it as a precursor molecule. This pathway often dominates in deep continental aquifers and mines where H2, derived from either radiolytic processes or water-rock reactions, is present (Chivian et al., 2008; Brazelton et al., 2012, 2013; Stevens & McKinley, 1995; Magnabosco et al., 2015). Its presence has also been noted in warm marine aquifers and hot springs (Jungbluth et al., 2017; Takami et al., 2012); however, it is not known if the Wood-Ljungdahl pathway is a common metabolism in the deep oceanic crustal aquifer. Since iron-bearing crustal minerals such as olivine can support hydrogenotrophy by producing H2 as they react with warm aquifer fluid (Mayhew et al., 2013), we propose that iron-bearing minerals (e.g., olivine) in thermal oceanic crustal aquifers support microbial communities using the Wood-Ljungdahl pathway.
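For orientation, the net stoichiometry of hydrogenotrophic acetogenesis via the Wood-Ljungdahl pathway is commonly summarized as shown here. This is the standard textbook summary reaction, given as context rather than as a result of this study; the individual enzymatic steps and the acetyl-CoA intermediate are omitted.

```latex
\[
4\,\mathrm{H_2} + 2\,\mathrm{CO_2} \;\longrightarrow\; \mathrm{CH_3COO^-} + \mathrm{H^+} + 2\,\mathrm{H_2O}
\]
```

The reaction is balanced for carbon (2), hydrogen (8), oxygen (4), and charge (0).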

The Juan de Fuca Ridge (JdFR) crustal aquifer is a deep, thermal (~64 °C) basaltic aquifer contained within young (~3.5 mya) oceanic crust with thick sediment deposits that limit thermal exchange with the overlying ocean and create a warm, reducing habitat (Fisher et al., 2005a, 2005b; Wheat, 2004). The thermodynamic disequilibrium between aquifer fluid and reduced, Fe-bearing minerals in the crust can be exploited by microbes for energy and growth, and H2 production at the surface of these minerals (e.g., olivine) could support a H2-based microbial community. From studies of the JdFR aquifer community (Jungbluth et al., 2013, 2017; Orcutt et al., 2011; Nakagawa et al., 2006; Smith et al., 2016; Lever et al., 2013), there is evidence that the community attached to minerals is distinct from the aquifer fluid community (Smith et al., 2016; Lever et al., 2013; Jungbluth et al., 2013; Cowen et al., 2003; Orcutt et al., 2011), and that the lineages inhabiting mineral surfaces contain members using the Wood-Ljungdahl pathway.

To determine if the Wood-Ljungdahl pathway plays a prominent role in the carbon and energy metabolism of microbes that attach to olivine in the deep JdFR suboceanic aquifer, we produced twelve metagenome-derived genomes from a microbial biofilm community that colonized the mineral during a four-year in situ incubation (Figure 3.1; Smith et al., 2016). We then used metabolic pathway reconstruction to determine the carbon fixation pathways and energy metabolisms present in these genomes. The phylogenetic lineages from this community were used for comparison (Figure 3.1; Smith et al., 2016, Chapter 2 of this dissertation). We found that this community’s metabolism is dominated by hydrogenotrophy, acetogenesis via the ancient Wood-Ljungdahl pathway, and sulfate reduction.

Figure 3.1. Summary of subseafloor olivine colonization study. A) Simplified schematic of IODP Borehole 1301A on the eastern flank of the Juan de Fuca Ridge (modified from Fisher et al., 2005a; Fisher et al., 2005b; Smith et al., 2011). The borehole depth was 367 meters below seafloor, penetrating the sediment layer and basaltic basement rock. The microbial flow cell containing olivine was suspended in the basaltic basement and was exposed to lateral aquifer fluid flow from the basalt. B) Location of Hole 1301A on the eastern flank of the Juan de Fuca Ridge at a depth of 2,667 m. C) Open flow cell with chambers containing colonized mineral substrates. Sponges that held minerals in place are visible and are discolored, presumably from water-rock reactions. D) Environmental SEM of microbial cells from the olivine biofilm captured on a filter (Smith et al., 2016). E) Summary of olivine community structure after combining the Fo100 and Fo90 olivine communities from Smith et al. (2016). Representative taxa are named according to the best resolution of the taxonomic group. For example, Archaeoglobus and Geoglobus are genera of Archaeoglobaceae; however, another unknown genus from the family Archaeoglobaceae is also represented and is thus denoted simply Archaeoglobaceae.

Materials and Methods

Overview of Experimental Design

Details of oceanic crust mineral incubation and retrieval were previously reported (Smith et al., 2016, 2011). Briefly, olivine and other minerals were incubated in 3.5-my old basaltic crust on the eastern flank of the JdFR (Figure 3.1).

Olivine was incubated for four years in a flow cell connected to an osmotic pump that ensured continuous fluid flow that depended on the temperature of the fluid and increased to 200 mL/year for the final year of the experiment (Wheat et al., 2010).

The flow cell contained four mineral chambers connected in sequence, each containing one mineral. Aquifer fluid was pumped through each chamber in sequence. The first two chambers contained the ~ 2 mm grain size olivine used in this study. They were forsterite 100 (Fo100) olivine (Mg2SiO4) and forsterite 90 (Fo90) olivine [(Mg0.9Fe0.1)2SiO4].
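The two olivine compositions differ only in how much Fe substitutes for Mg. As a minimal sketch, assuming the general formula (Mg_x Fe_(1-x))2SiO4 with x equal to the Fo number divided by 100, the atoms per formula unit and the Fe mass fraction can be computed as follows. The function names are hypothetical and the atomic masses are standard rounded values, not data from this study.

```python
# Standard atomic masses in g/mol (rounded).
ATOMIC_MASS = {"Mg": 24.305, "Fe": 55.845, "Si": 28.086, "O": 15.999}


def olivine_formula(fo_number):
    """Atoms per formula unit of (Mg_x Fe_(1-x))2 SiO4, where x = Fo number / 100."""
    if not 0 <= fo_number <= 100:
        raise ValueError("Fo number must be between 0 and 100")
    x = fo_number / 100
    return {"Mg": 2 * x, "Fe": 2 * (1 - x), "Si": 1.0, "O": 4.0}


def molar_mass(formula):
    """Molar mass in g/mol of a formula given as {element: atoms per formula unit}."""
    return sum(ATOMIC_MASS[element] * count for element, count in formula.items())


def iron_mass_fraction(fo_number):
    """Fraction of the olivine mass contributed by Fe."""
    formula = olivine_formula(fo_number)
    return ATOMIC_MASS["Fe"] * formula["Fe"] / molar_mass(formula)


for fo in (100, 90):
    print(f"Fo{fo}: {molar_mass(olivine_formula(fo)):.1f} g/mol, "
          f"Fe mass fraction {iron_mass_fraction(fo):.3f}")
```

Under these assumptions Fo100 contains no Fe, while Fo90 is roughly 7.6% Fe by mass, which is the compositional difference the two chambers were designed to contrast.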

The olivine-bearing flow cell assembly was incubated in IODP Hole 1301A (47° 45.210′ N, 127° 45.833′ W; 2,667 m water depth; Fisher et al., 2005) from summer 2004 to summer 2008 between 275 and 287 meters below seafloor (mbsf; Smith et al., 2011). Seawater entrainment occurred in the borehole during the first three years of incubation (Wheat et al., 2010); however, in the fourth year the aquifer fluid chemistry and temperature changed to conditions more representative of the natural thermal aquifer: ~64 °C aquifer fluid rich in sulfate (17.6 mmol kg−1), iron (4 μmol kg−1), and ammonium (840 μmol kg−1), with some nitrate (0.8 μmol kg−1) and a source of hydrogen that could support hydrogenotrophy (~2 μM H2; Wheat et al., 2010; Lin et al., 2012, 2014). Organic carbon concentrations in the aquifer are lower than in seawater (McCarthy et al., 2010). Upon recovery of the assembly aboard the ship, the olivine was frozen at −40 °C, where it remained until extraction.

DNA Extraction

Genomic DNA was extracted from ~500 mg of thawed olivine sand using a modified protocol for the FastDNA Spin Kit for Soil (MP Biomedicals Catalog #116560200) as described previously (Smith et al., 2016; Wang & Edwards, 2009). DNA extracts were quantified with the Quant-iT dsDNA HS Reagent (Molecular Probes, Inc.) on a Qubit Fluorometer or PCR-amplified directly using 2 µL as template and AmpliTaq Gold LD. Bacterial DNA was amplified using the primers 27F (5’-AGAGTTTGATCCTGGCTCAG-3’) and 926R (5’-CCGTCAATTCCTTTRAGTTT-3’), and archaeal DNA was amplified using 181F (5’-TAGGATGGATCTGCGGCCA-3’) and 1392R (5’-CCCCTGCGAACCTAGATT-3’). PCR conditions were: 95 °C for 5 minutes; 34 cycles of 95 °C for 40 seconds, 50 °C for 40 seconds, and 72 °C for 1 minute; and a final step of 72 °C for 10 minutes.
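The cycling program above can be written down as data to check the total programmed hold time. This is a minimal sketch: the variable and function names are illustrative, and ramp times between temperatures (which depend on the instrument) are not included.

```python
# Thermocycler program from the text; all durations in seconds.
INITIAL_DENATURATION_S = 5 * 60          # 95 °C for 5 minutes
CYCLE_STEPS_S = {
    "denature_95C": 40,
    "anneal_50C": 40,
    "extend_72C": 60,
}
N_CYCLES = 34
FINAL_EXTENSION_S = 10 * 60              # 72 °C for 10 minutes


def total_hold_time_s():
    """Sum of programmed hold times, excluding temperature ramps."""
    per_cycle = sum(CYCLE_STEPS_S.values())
    return INITIAL_DENATURATION_S + N_CYCLES * per_cycle + FINAL_EXTENSION_S


total = total_hold_time_s()
print(f"{total} s = {total // 60} min {total % 60} s")  # 5660 s = 94 min 20 s
```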

Metagenome Sequencing

A portion of genomic DNA from the two most common olivine mineral phases (forsterite and Fo90 olivine) was used previously for 454 pyrotag sequencing (Smith et al., 2016). The remaining genomic DNA from these two olivine phases was pooled to produce this metagenome (a total of 50 – 70 ng of DNA). The pooled olivine DNA extract was sequenced at the Marine Biological Laboratory in Woods Hole, MA. Genomic DNA was amplified using whole genome amplification, and then 2x101bp paired-end sequencing with dedicated-read indexing was performed using an Illumina HiSeq 1000 instrument. Sequence reads were de-multiplexed using Consensus Assessment of Sequence and Variation (CASAVA) 1.8.2 (Illumina).

Pre-Processing of Sequence Read Files for Assembly

Raw sequence files were concatenated into read 1 (R1) and read 2 (R2) files for this metagenome and then sent through the String Graph Assembler (SGA) preprocessing pipeline (Simpson & Durbin, 2012). Reads with quality scores less than 10 or lengths less than 50 base pairs were eliminated. Ambiguous bases were called and the final read set was output into an interleaved, quality-controlled mate pair file.
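The two filtering thresholds can be illustrated with a short sketch. It does not reproduce SGA's preprocessing; it assumes that "quality score" means the mean Phred score of a read and that qualities use the Phred+33 encoding, and all function names are illustrative.

```python
MIN_MEAN_QUALITY = 10   # threshold from the text
MIN_LENGTH = 50         # threshold from the text (bases)
PHRED_OFFSET = 33       # assumption: Phred+33 quality encoding


def mean_phred(quality_string, offset=PHRED_OFFSET):
    """Mean Phred score of one read's quality string."""
    scores = [ord(ch) - offset for ch in quality_string]
    return sum(scores) / len(scores) if scores else 0.0


def keep_read(sequence, quality_string):
    """True if the read passes both the length and the quality threshold."""
    if len(sequence) < MIN_LENGTH:
        return False
    return mean_phred(quality_string) >= MIN_MEAN_QUALITY


def filter_fastq(lines):
    """Yield (header, sequence, quality) for 4-line FASTQ records that pass."""
    it = iter(lines)
    for header in it:
        seq = next(it).strip()
        next(it)                      # '+' separator line
        qual = next(it).strip()
        if keep_read(seq, qual):
            yield header.strip(), seq, qual
```

For paired-end data, a real pipeline would also need to decide what to do with a read whose mate is discarded; that choice is left out of this sketch.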

Metagenome assembly

Preprocessed reads were assembled into contiguous sequences (contigs) using the Iterative De Bruijn Graph Assembler – Uneven Depth (IDBA-UD) program (Peng et al., 2012). The olivine preprocessed read fastq file was converted to a fasta file using fq2fa, and then run through IDBA-UD with the following specifications: minimum contig length of 450 bases and iterative kmer values from 45 – 69 in steps of 4 (Supplemental Table 3.1). The assembly with kmer length of 65 was chosen as the optimal assembly, mainly because it had the largest n50 value and maximum contig size, and was therefore used in subsequent steps.
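The kmer series and the n50 statistic used to compare assemblies can be sketched as follows. The function names are illustrative, and the n50 here follows the common definition (the contig length at which contigs of that length or longer hold at least half of the assembled bases), which is assumed to match the value reported by the assembler.

```python
def kmer_series(k_min=45, k_max=69, step=4):
    """kmer values iterated over by the assembler, as given in the text."""
    return list(range(k_min, k_max + 1, step))


def n50(contig_lengths):
    """Length L such that contigs of length >= L hold at least half of all bases."""
    lengths = sorted(contig_lengths, reverse=True)
    half = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half:
            return length
    return 0


print(kmer_series())           # [45, 49, 53, 57, 61, 65, 69]
print(n50([5, 4, 3, 2, 1]))    # 4
```

The chosen kmer length of 65 is one of the seven values in this series.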

Pathway Distribution and Data Availability

Metagenome reads were uploaded to the Metagenomics Analysis Server (MG-RAST) as raw, unprocessed, un-joined reads ~108 bases long with de-replication (project: DCO_SMI_1301; files: mgs 129730 – mgs 129793), as assembled contigs (project: DCO_SMI_1301; file: mgs 182665), and as genome bins (project: DCO_SMI_1301_bins). The annotated genome bin files from MG-RAST were used to determine the distribution of biomolecular pathways among members of the olivine community. We compared the relative abundances of genes relating to carbohydrate, lipid, protein, and amino acid metabolisms as well as genes involved in respiration. Nucleic acid metabolism genes were in very low abundance for all genomes and were therefore omitted from analysis. All files are also publicly available under the BioProject Number PRJNA264811 on the National Center for Biotechnology Information website at https://www.ncbi.nlm.nih.gov.

Binning Genomes from the Olivine Metagenome

We used VizBin (Laczny et al., 2014), a Java-based genomic binning program, to separate assembled genomic DNA contigs into genome bins for export and analysis. VizBin uses nonlinear dimension reduction of genomic signatures (nucleotide frequency) to assign contigs to taxonomic bins (Laczny et al., 2014), with each bin representing all the contigs from a unique individual genome. These “bins” of taxonomically similar contigs (genomes) were exported for subsequent analysis in genome reconstruction.
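A genomic signature of the kind described above can be illustrated as a vector of k-mer frequencies per contig. This is a minimal sketch, not VizBin's implementation: the k-mer size of 4 is an assumption chosen for illustration, and VizBin's own normalization and embedding steps are not reproduced.

```python
from itertools import product


def kmer_signature(sequence, k=4):
    """Relative frequencies of all 4**k k-mers over A, C, G, T, in lexicographic order."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    seq = sequence.upper()
    for i in range(len(seq) - k + 1):
        word = seq[i:i + k]
        if word in counts:            # skips windows containing N or other symbols
            counts[word] += 1
    total = sum(counts.values())
    return [counts[w] / total if total else 0.0 for w in kmers]


signature = kmer_signature("ACGTACGTNNACGT", k=2)
print(len(signature))   # 16
```

Contigs from the same genome tend to have similar signature vectors, which is what allows a dimension-reduction step to place them close together on a two-dimensional map.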

Bin Taxonomic Assignments

We used PhyloPythia (Patil et al., 2012) to bin contigs into taxonomic groups

(Figure 3.2). These different taxonomic groups are represented by different colors on the VizBin output graph. Taxonomy was assigned at the class level.

Bin Quality Check

Bin completeness and contamination were assessed using the standard lineage-based workflow in CheckM (Supplemental Figure 3.1; Parks et al., 2015). Briefly, Prodigal (Hyatt et al., 2010) and HMMER (http://hmmer.janelia.org/) were used to locate 43 phylogenetically informative marker genes within each bin. These genes were then used to place each bin on a reference tree created from a concatenation of the markers. Based on this placement, a lineage-specific set of marker genes was chosen to estimate completeness and contamination of each individual bin.
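The idea behind marker-based completeness and contamination estimates can be sketched with simple counts. This is a simplified illustration, not CheckM's algorithm: CheckM groups markers into collocated sets and weights them accordingly, which is ignored here, and the function and marker names are hypothetical.

```python
def marker_summary(marker_copies, n_markers):
    """Return (completeness %, contamination %) from per-marker copy counts.

    marker_copies maps a marker identifier to the number of copies found in a bin.
    n_markers is the size of the marker set expected for the bin's lineage.
    """
    found = sum(1 for copies in marker_copies.values() if copies >= 1)
    extra = sum(copies - 1 for copies in marker_copies.values() if copies > 1)
    completeness = 100.0 * found / n_markers
    contamination = 100.0 * extra / n_markers
    return completeness, contamination


# Toy example with four hypothetical markers (not data from the study).
print(marker_summary({"m1": 1, "m2": 2, "m3": 0, "m4": 1}, n_markers=4))  # (75.0, 25.0)
```

Markers present exactly once support completeness, while extra copies of a single-copy marker suggest that contigs from more than one genome ended up in the same bin.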

Metabolic Reconstruction

Genome bins captured and exported from VizBin were uploaded to the Kyoto Encyclopedia of Genes and Genomes (KEGG) using BlastKOALA (Kanehisa et al., 2016). The BlastKOALA output was used to determine the completeness of metabolic pathways in each genome bin (metabolic reconstruction). The reconstructed metabolism for each genome bin was used to assess the potential for carbon fixation, possible energy sources, and the potential for hydrogenotrophy in the olivine community. Genome bins were searched for hydrogenases, ferredoxins, and cytochromes for comparison across genomes.
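Scoring a pathway as complete or as missing one or two enzymes can be sketched as a set comparison between the steps a pathway requires and the genes annotated in a bin. The gene names below are placeholders, and the step definitions are hypothetical; actual KEGG module definitions are more detailed than this.

```python
def classify_pathway(required_steps, genes_present):
    """Classify a pathway by the number of steps with no annotated gene.

    required_steps is a list of sets; each set holds the interchangeable genes
    that can carry out one step. genes_present is the set of genes annotated
    in a genome bin.
    """
    missing = [step for step in required_steps if not (step & genes_present)]
    if not missing:
        label = "complete"
    elif len(missing) == 1:
        label = "missing 1 enzyme"
    elif len(missing) == 2:
        label = "missing 2 enzymes"
    else:
        label = "incomplete"
    return label, missing


# Hypothetical three-step pathway; step two can use either of two genes.
steps = [{"geneA"}, {"geneB1", "geneB2"}, {"geneC"}]
print(classify_pathway(steps, {"geneA", "geneB2"}))  # ('missing 1 enzyme', [{'geneC'}])
```

Returning the missing steps along with the label makes it straightforward to follow up on a single absent enzyme, as is done later for formate dehydrogenase.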

Carbon Fixation Target Genes

The presence of a particular carbon fixation pathway in a community is often indicated by the presence of a key gene that is required for completion of one step of the carbon fixation pathway. This gene is critical to the inorganic carbon reduction step and is often referred to as a ‘target’ gene since it represents the crucial step in a carbon fixation pathway. Although a variety of pathways are ‘incomplete’ in this study and others, these pathways may or may not be used for carbon fixation, depending on the genes present. In addition to pathway reconstruction, we checked for the presence of ‘target’ genes for key carbon fixation pathways.


Translating and Assigning Function

Prodigal (Hyatt et al., 2010) was used to determine open reading frames in the kmer 65 assembly of the olivine metagenome obtained using IDBA-UD. Prodigal output was fed into DIAMOND using blastp against the RefSeq database (Buchfink et al., 2015) to assign function to proteins. Clusters of Orthologous Groups of proteins (COGs) were produced from the protein file using Reverse PSI-BLAST (rpsblast; NCBI) and were used to search for gene candidates not identified by KEGG (e.g., formate dehydrogenase).

Results

Metagenome Assembly and Genome Binning

We produced a metagenome from the olivine minerals incubated in IODP Hole 1301A on the JdFR, and from this we obtained twelve genome bins (Figure 3.2). We produced 84 million sequences with an average length of 108 bases. Assembly of sequences with IDBA-UD resulted in an n50 of 19,191 bases (n50 = the contig length at which contigs of that length or longer contain at least 50 percent of the assembled bases; Supplemental Table 3.1), a largest contig of 286,347 bases, and a total assembly size of 30,811,948 bases. Of the twelve genome bins, eleven represented high-quality complete (100 %) or near-complete (> 81.19 %) genomes, and one was incomplete when compared to other genomes from the same lineage (Figure 3.3; Supplemental Figure 3.1).

To identify the taxonomic lineage to which each genome belonged, the genes in each contig from a given genome bin were compared to between 59 and 218 genomes that were representative of each corresponding lineage based on the PhyloPythia contig lineage results (Table 3.1). Minimal contamination or heterogeneity was observed in each genomic bin (Supplemental Figure 3.1), indicating relatively pure genomic bins (each bin contained DNA from the same unique organism). This is also reflected in the VizBin output, where clear separation of contigs into the twelve discrete taxonomic bins was observed (Figure 3.2).


Figure 3.2. Genomic binning of olivine metagenome using VizBin. Individual genomes were binned based on sequence similarity (nucleotide frequency patterns). Twelve genomic bins were produced from the olivine metagenome. Bins 1 – 9 are Bacteria (blue ellipses are Firmicutes, purple are Bacteria, and orange is Proteobacteria) and bins 10 – 12 (pink) are Archaea (all members of Archaeoglobaceae) (see Supplemental Table 3.2 for marker gene summary and Table 3.1 for bin phylogeny summary). Each dot represents a contiguous DNA sequence (contig) produced during metagenome assembly. Contigs with similar nucleotide frequencies (i.e., sequence similarity) cluster together to represent all sequences that belong to a unique genome. Bins that are more spread out contain contigs that have less sequence similarity; however, separation between individual bins is more important for separating individual genomes. Colors of dots correspond to class-level taxonomic assignments for each contig using PhyloPythia, which may vary depending on relatedness to sequenced genomes in the database. The completeness and purity of each bin were verified using CheckM (Figure 3.3; Supplemental Figure 3.1). Ellipses represent only the location of unique genomes within the VizBin map and are not fully representative of contigs contained within final bin files after all analyses were completed.


Genome Taxonomy

Taxonomic assignments of these bins aligned well with the expected taxonomic groups from previous pyrotag sequencing of the 16S rRNA genes in this community (Table 3.1; Figure 3.1; Smith et al., 2016). Three Archaeoglobaceae bins correspond with taxa previously identified as belonging to Archaeoglobus, Geoglobus, and a related but unknown member of Archaeoglobaceae (Bins 10 – 12; Table 3.1). Bacterial genomes corresponding to the Firmicutes, all belonging to the lineage of Clostridia, were also present (Bins 2, 3, and 6 – 9; Table 3.1). Other deep-branching bacteria were successfully binned, but taxonomic identification beyond domain Bacteria was unclear (Bins 4 and 5; Table 3.1). The incomplete genome (Bin 1) was previously identified as belonging to the genus Ralstonia, a laboratory contaminant originating from the DNA extraction kit used in this study that appears in low biomass samples; it was present only in samples whose DNA was extracted using this kit in our laboratory and was not found using other methods (Table 3.1; Smith et al., 2016; Salter et al., 2014). We removed this sequence from the olivine 16S rRNA gene pyrotag sequencing, and this organism has not been described from the JdFR community in multiple previous studies (Orcutt et al., 2011; Jungbluth et al., 2013; Robador et al., 2015; Jungbluth et al., 2016; Cowen et al., 2003; Nakagawa et al., 2006; Huber et al., 2006); however, we analyzed the DNA of this bin to verify its phylogeny and to contrast its metabolic capabilities with those of organisms native to the JdFR aquifer.


Table 3.1. CheckM marker gene summary and taxonomic assignment for each genomic bin from the olivine metagenome in this study. Unique phylogenetic markers (43) were used to assign genomes to taxonomic groups. Taxonomic abbreviations: k = kingdom/domain; p = phylum; c = class; o = order; f = family; s = genus and species name.

Bin ID  # Unique Markers (of 43)  # Multi-copy  CheckM Taxonomy
1       16                        0             k__Bacteria;p__Proteobacteria;c__Betaproteobacteria;o__Burkholderiales;f__Burkholderiaceae;s__Ralstonia_solanacearum
2       39                        2             k__Bacteria;p__Firmicutes;c__Clostridia_3;o__Clostridiales_3
3       43                        0             k__Bacteria;p__Firmicutes;c__Clostridia_3;o__Clostridiales_3;f__Clostridiales_Family_XVII._Incertae_Sedis
4       41                        0             k__Bacteria
5       41                        0             k__Bacteria
6       43                        0             k__Bacteria;p__Firmicutes;c__Clostridia_3;o__Clostridiales_3
7       41                        0             k__Bacteria;p__Firmicutes;c__Clostridia_3;o__Clostridiales_3
8       43                        0             k__Bacteria;p__Firmicutes;c__Clostridia
9       43                        0             k__Bacteria;p__Firmicutes;c__Clostridia;o__Clostridiales;f__Peptococcaceae
10      43                        0             k__Archaea;p__Euryarchaeota;c__Archaeoglobi;o__Archaeoglobales;f__Archaeoglobaceae;g__Archaeoglobus
11      43                        0             k__Archaea;p__Euryarchaeota;c__Archaeoglobi;o__Archaeoglobales;f__Archaeoglobaceae;g__Archaeoglobus
12      43                        0             k__Archaea;p__Euryarchaeota;c__Archaeoglobi;o__Archaeoglobales;f__Archaeoglobaceae;g__Archaeoglobus

Carbon Fixation Pathways

We found that the majority of organisms from this deep thermal olivine community possessed complete pathways for carbon fixation (Figure 3.3). At least two Clostridia (Bins 3 and 6) from the olivine biofilm community possess both branches of the Wood-Ljungdahl pathway for carbon fixation. Another five bacterial genomes (Bins 2, 4, 5, 7, and 9) contain near-complete Wood-Ljungdahl pathways, lacking only one step in the methyl branch (Figures 3.3 and 3.4A). Genes belonging to the reductive pentose phosphate cycle, or Calvin-Benson Cycle, were found in all quality genomes, but only one Archaeal genome (Bin 10) possessed the complete pathway from ribulose-5 phosphate, and this genome contained the genes to synthesize the carbon fixation enzyme ribulose 1,5-bisphosphate carboxylase/oxygenase (RuBisCo). The other two Archaeal genomes (Bins 11 and 12) also have genes found in this pathway, but they are missing this critical carbon fixation step, so these genes may not be used to fix carbon. All Archaeal genomes (Bins 10 – 12) also contained the complete pathway for carbon fixation via the acetyl-CoA pathway (Figure 3.3), which allows for the conversion of CO2 into acetyl-CoA and is analogous to the carbonyl branch of the methanogenic Wood-Ljungdahl pathway. Archaeal genomes also have the complete pathway for Coenzyme F-420 biosynthesis, an accessory pathway for methanogenesis; however, this pathway does not directly result in carbon fixation. Other carbon fixation pathways are also present (i.e., with no more than two genes missing in a pathway) in lesser abundance than the Wood-Ljungdahl and acetyl-CoA pathways (Figure 3.3); these include the incomplete reductive citrate cycle, dark cycle Crassulacean acid metabolism (CAM) in which four genomes contain the carbon fixation gene PEP carboxylase (ppc; Bins 4, 5, 11, and 12), and the phosphate acetyltransferase-acetate kinase pathway whereby acetyl-CoA is converted to acetate in two Bacterial genomes (Bins 6 and 8; Figure 3.3).

Although nearly all of the bacterial genomes contained the Wood-Ljungdahl pathway, most of these were missing genes that code for one of two enzymes (Figure 3.4A). Bins 4 and 5 (‘Bacteria’) are missing the enzyme methylene-tetrahydrofolate reductase (metVF), and Bins 2, 7, and 9 are missing formate dehydrogenase (fdhAB). The latter genomes do, however, contain potential alternative routes to either produce formate outside of the Wood-Ljungdahl pathway or to use analogous subunits from other formate dehydrogenases as substitutes for those that are missing (Figure 3.4B).


[Figure 3.3 graphic. Columns: Bins 1 – 12, grouped as Bacteria and Archaea. Rows, grouped by cycle: N (nitrogen fixation; dissimilatory nitrate reduction; assimilatory nitrate reduction; complete nitrification); S (assimilatory sulfate reduction; dissimilatory sulfate reduction); CH4 and related pathways (2-oxocarboxylic acid chain extension; methanogenesis from CO2, from acetate, and from methanol; Coenzyme M biosynthesis; formaldehyde assimilation; F420 biosynthesis; acetyl-CoA pathway); C fixation and related pathways (reductive pentose phosphate cycle from ribulose-5P and from glyceraldehyde-3P; CAM (dark); CAM (light); Wood-Ljungdahl pathway; C4-dicarboxylic acid cycle, NADP-malic enzyme type and PEP carboxykinase type; reductive TCA cycle; phosphate acetyltransferase-acetate kinase pathway; incomplete reductive TCA cycle to α-ketoglutarate). Legend for each group: complete, missing 1 enzyme, missing 2 enzymes; asterisk = target gene for carbon fixation present. Bottom panels: percent complete (>0, >20, >40, >60, >80, >90, 100), genome size (500K+, 1500K+, 2000K+, 3000K+), and G + C content (>30, >40, >50, >60).]

Figure 3.3. Summary of carbon fixation, methane, nitrogen, and sulfur cycling pathways in olivine genomic bins as determined from the KEGG Pathway Module. Colored boxes indicate presence of genes for each listed pathway. Any pathways not listed were absent from genome bins. Carbon and energy metabolisms and related pathways are separated by color and each metabolism listed on the right. Bins containing ‘target’ genes for carbon fixation are indicated with an asterisk. Completeness of each genome, genome size, and G + C content are listed at the bottom of the figure.


Figure 3.4. Wood-Ljungdahl pathway enzyme summary for olivine genomes (A), presence of potential alternatives to formate dehydrogenase (B), and relative abundance of respiratory and biomolecular metabolisms in olivine genomes (C). M = methyl branch of the Wood-Ljungdahl pathway; C = carbonyl branch of the Wood-Ljungdahl pathway; WL = Wood-Ljungdahl pathway; PAAK = phosphate acetyltransferase-acetate kinase pathway. EC numbers in parentheses are Enzyme Commission numbers. Formate C-acetyltransferase is also known as pyruvate formate lyase.

Energy Metabolisms

Differences in the relative abundance of genes that fit into broad biomolecular energy metabolism categories were observed for organisms in this subseafloor olivine community (Figure 3.4C). For most of the organisms (nine of the twelve genomes), carbohydrate metabolism genes were most abundant. For the other genomes, amino acid (Bin 6) and protein (Bins 4 and 9) metabolism genes were most abundant. This may indicate a comparatively stronger reliance on protein-related biomolecules for these organisms. Lipid metabolism genes were least abundant in the energy metabolism of most genomes; however, respiration genes were least abundant in Bins 1 and 10.
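The comparison of gene categories within a genome can be sketched as a relative-abundance calculation followed by a lookup of the largest and smallest shares. The counts below are hypothetical placeholders for illustration only and are not data from this study; the function names are likewise illustrative.

```python
def relative_abundance(gene_counts):
    """Convert per-category gene counts into fractions of the total."""
    total = sum(gene_counts.values())
    if total == 0:
        return {category: 0.0 for category in gene_counts}
    return {category: count / total for category, count in gene_counts.items()}


def most_and_least(gene_counts):
    """Return the categories with the largest and smallest share."""
    shares = relative_abundance(gene_counts)
    return max(shares, key=shares.get), min(shares, key=shares.get)


# Hypothetical counts for a single bin (not data from the study).
example_bin = {
    "carbohydrate": 120,
    "amino acid": 90,
    "protein": 80,
    "lipid": 30,
    "respiration": 60,
}
print(most_and_least(example_bin))  # ('carbohydrate', 'lipid')
```

Working with fractions rather than raw counts keeps genomes of different sizes comparable.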

Nitrogen and sulfur cycling by the olivine community may also occur in this basaltic aquifer habitat. The Bin 9 genome from this study, identified as a Firmicute from the Peptococcaceae family, has complete pathways for both nitrogen fixation and dissimilatory sulfate reduction (Figure 3.3). Three other genome bins contained the dissimilatory sulfate reduction (dsr) pathway: Bins 10 and 12, which represent Archaeoglobaceae, and Bin 7, another Firmicute (Table 3.1; Figure 3.3). This brings the total number of bins that contained this pathway to four of the eleven high-quality genomes produced in this study.

Hydrogenases and Electron Transfer Agents

The presence and abundance of different hydrogenases and other electron transfer agents (ETAs) can be an indicator of modes of energy metabolism, such as hydrogenotrophy. All high-quality genome bins (Bins 2 – 12) contained hydrogenase genes (Figure 3.5A; Supplemental Table 3.4); however, the complement of these genes differed with taxonomy. Most genomes contained the hypABCDEF and hyaABD genes that encode [NiFe] hydrogenases. Bins 4 and 5 (both Bacteria) had identical hydrogenase genes. Two Clostridia (Bins 6 and 9) were similar in the presence and abundance of hydrogenase genes, but Bin 9 had more gene copies of the hya hydrogenase, and Bin 6 contained a unique gene that was not present in any other genome, hyaC, the [NiFe] hydrogenase 1 B-type cytochrome subunit. These two bins are also the only two that contained an ech, or ‘energy conserving’, hydrogenase gene, again showing similarity between these two genomes in terms of hydrogenase genes. Genomes that belong to different phylogenetic groups, such as Bin 8 (class_Clostridia), Bin 10 (Archaeoglobaceae), and Bin 3 (Clostridiaceae), all had unique complements of hydrogenases. Archaeal genomes also had different hydrogenase genes than bacterial genomes, except for hypABCDEF and hyaABD. Archaeal genomes were the only ones that contained the F420-reducing hydrogenase hnnCD genes.

Ferredoxin genes were variable across the metagenome (Figure 3.5B; Supplemental Table 3.5); however, most genomes had several ferredoxin genes in common, with some ferredoxins unique to either individual genomes or particular lineages. All genomes contained the aor and korAB ferredoxins, and most genomes contained porABDG, the pyruvate:ferredoxin oxidoreductase, which catalyzes the reversible conversion of pyruvate to acetyl-CoA. Only high-quality bacterial genomes (Bins 2 – 9) contained the fpr ferredoxin; no archaeal genome possessed this ferredoxin gene. Bin 6 contained the greatest number of copies of ferredoxin genes, mainly due to extra copies of the por ferredoxin. Bins 1, 2, 6, and 12, which all belong to different lineages, did not contain the fer gene for ferredoxin.

Cytochromes and other ETAs (Figure 3.5C; Supplemental Table 6) were highly variable within genomes, with five genomes containing all succinate dehydrogenase genes (sdhABCD), and most containing at least sdhA. The Bin 7 genome contained a unique ETA, nrfHA, which reduces nitrite to ammonium during respiration (Clarke et al., 2008), and Bin 1 (Betaproteobacteria) contained a completely unique set of ETAs. Bin 8 contained no cytochromes or other ETAs, and Bins 4, 5, and 9 contained only one cytochrome gene, sdhA.


Figure 3.5. Hydrogenases and Electron Transfer Agents (ETAs) present in KEGG-annotated olivine genome bins. The number of copies of each gene in a genome is indicated on the x-axis. Bin numbers are listed on the y-axis. A. Hydrogenases and related genes that play key roles in hydrogenase function. B. Ferredoxin-related genes found in olivine genomes. C. Cytochromes and other ETAs found in each olivine genome. See Supplemental Tables 3.3 – 3.5 for gene descriptions.


Discussion

Phylogenetic Lineages of the Olivine Community and the Deep JdFR Aquifer

The taxonomic assignments of the olivine metagenome-derived genomes corresponded with the 16S rRNA gene pyrotag sequencing from the same DNA sample (Figure 3.1; Smith et al., 2016). We obtained eleven complete to near-complete genomes that belong to the bacterial and archaeal lineages of Bacteria, Clostridia, and Archaeoglobaceae (Table 3.1). These taxa have also been identified as part of the unique attached community in the deep suboceanic aquifer of the JdFR, associated with the olivine from this study (Smith et al., 2016), basalt glass and plagioclase from companion flow cells (Smith et al., 2016), other igneous rock and mineral substrates (Orcutt et al., 2011), and a biofilm on the surface of a well head exposed to aquifer fluid (Nakagawa et al., 2006; Steinsbu et al., 2010).

The bacterial community of the eastern flank aquifer of the JdFR is dominated by Firmicutes (Smith et al., 2016; Orcutt et al., 2011; Jungbluth et al., 2016, 2013) and other deep-branching bacteria with no cultured representatives or close relatives, including undescribed members of Class Clostridia and its order Clostridiales that have been described primarily from attached communities of colonized surfaces (Smith et al., 2016; Orcutt et al., 2011; Nakagawa et al., 2006), and members of the candidate phyla Acetothermia (formerly OP1) and Aminicenantes (formerly OP8) (Hugenholtz et al., 1998; Smith et al., 2016; Jungbluth et al., 2013; Rinke et al., 2013). Firmicutes and Acetothermia have members that are capable of using the Wood-Ljungdahl pathway (Pierce et al., 2008; Chivian et al., 2008; Takami et al., 2012). Some Clostridia from this community belong to the Family Peptococcaceae, namely relatives of Candidatus Desulforudis audaxviator such as Ca. Desulfopertinax cowenii (Jungbluth et al., 2017). These organisms are likely common deep subsurface inhabitants of both marine and continental aquifers (Jungbluth et al., 2014; Smith et al., 2016; Jungbluth et al., 2013; Chivian et al., 2008; Jungbluth et al., 2017) and possess a suite of energy and carbon metabolisms that are predicted to enable them to adapt to low-nutrient, subsurface environments (Chivian et al., 2008), including carbon fixation via the Wood-Ljungdahl pathway (Chivian et al., 2008; Jungbluth et al., 2017).

Archaea from lineages that contain carbon fixation pathways have also been detected in the JdFR flank aquifer fluids and attached communities. The most abundant archaeal lineage in the attached community appears to be Archaeoglobaceae (Orcutt et al., 2011; Cowen et al., 2003; Smith et al., 2016; Jungbluth et al., 2016, 2013), and members of this lineage can use the acetyl-CoA pathway (Braakman & Smith, 2012), which is analogous to the carbonyl branch of the Wood-Ljungdahl pathway. Methanogenic archaea, which use a modified, more enzymatically intensive Wood-Ljungdahl pathway, have also been detected in the JdFR aquifer through activity measurements and 16S rRNA gene sequencing (Lever et al., 2013; Lin et al., 2014; Orcutt et al., 2011; Nakagawa et al., 2006), but do not appear to be dominant members of this olivine biofilm community.

The co-dominance of lineages of Clostridia and Archaeoglobaceae has been described in terrestrial deep subsurface environments such as continental mines and fractures as well as marine aquifers (Magnabosco et al., 2015; Chivian et al., 2008; Jungbluth et al., 2017; Smith et al., 2016; Orcutt et al., 2011; Cowen et al., 2003), and we are just beginning to understand their role in harnessing carbon and energy in the deep crust. Their presence in this JdFR community indicates there may be a shared functional role for these microorganisms in deep thermal aquifers, and that the Wood-Ljungdahl pathway is more common in these environments than previously known. Deep thermal aquifers are somewhat isolated from the surface where photosynthetic productivity occurs, and primary production in these systems may depend on chemosynthesis using the available energy sources from water-rock reactions. This energy-limited environment can promote energetic efficiency in growth and maintenance of the cell; thus the bifunctional Wood-Ljungdahl pathway may be energetically favored in these types of systems where oxygen and organic matter are low and hydrogen is present (Nitschke & Russell, 2013).

Carbon Fixation Genes

The presence of a particular carbon fixation pathway in a community is often indicated by the presence of a key functional gene that is required for completion of a critical step of the carbon fixation pathway. This gene is involved in the inorganic carbon reduction step and is often referred to as a ‘target’ gene since its presence is indicative of carbon fixation ability. Although a variety of pathways are ‘incomplete’ in this study and others, it must be stressed that these pathways may or may not be used for carbon fixation, depending on which genes are present or even utilized under certain conditions. Therefore, in addition to pathway reconstruction, we investigated whether ‘target’ genes for key carbon fixation pathways were present for comparative purposes, and have included these genes in our discussion of key carbon fixation pathways below.


The Wood-Ljungdahl Pathway

The Wood-Ljungdahl pathway for carbon fixation and energy generation featured prominently in the metagenome-derived genomes from the young, thermal basaltic aquifer, and was present in nearly all of the bacterial genomes (Figure 3.3).

Three of the five bacterial genomes lacking one step to complete this pathway belong to the ‘order Clostridiales’ (Bins 2, 7, and 9), and the other two are classified as unknown ‘Bacteria’ (Bins 4 and 5). The ‘Clostridiales’ genomes are missing the carbon-reducing first step in the methyl branch of the Wood-Ljungdahl pathway, which results in the production of formate and is carried out by a formate dehydrogenase enzyme (Ragsdale, 2008b). Bins 4 and 5 are also missing a step in this pathway, the one that requires methylene-tetrahydrofolate (methylene-THF) reductase; however, this does not preclude these organisms from completing this pathway, as this enzyme was previously determined to be ‘dispensable’ in an analysis of core and pan genomes of acetogens (Shin et al., 2016). Even though these genomes do not contain an enzymatically complete Wood-Ljungdahl pathway, it is predicted that Bins 4 and 5 have a ‘functionally complete’ Wood-Ljungdahl pathway (Figure 3.4A) and retain the ability to fix carbon (presumably originating from seawater bicarbonate). The organisms missing the formate dehydrogenase gene have alternative mechanisms available that may allow them to forego this step and still complete this pathway, as discussed below.

Although we were unable to identify an fdhA gene, the gene for a key subunit of formate dehydrogenase, in three genomes (Bins 2, 7, and 9), these organisms may remain able to utilize this pathway (Figure 3.4A). Each of these bacteria is capable of producing formate without this enzyme, since they contain at least one copy of the pyruvate formate lyase (PFL) gene (Figure 3.4B). This gene codes for the PFL enzyme that produces formate and acetyl-CoA from pyruvate and CoA. This may be energetically advantageous since the acetyl-CoA produced here could be used to make more ATP if it is transformed into acetate. Some organisms may also be able to use one of the other eight types of formate dehydrogenase to produce formate in this environment (Maia et al., 2015). For example, the three genomes missing fdhA also contain genes coding for another formate dehydrogenase, formate dehydrogenase O. The fdoG gene codes for the major subunit of a different formate dehydrogenase and is functionally analogous to fdhA (Maia et al., 2015). Yet another possibility is that the fdhA gene may have been mis-annotated (or marked as unknown), as was the case in the early genome annotation for Moorella thermoacetica, a common terrestrial acetogen (Pierce et al., 2008). The selenocysteine codon in a selenocysteine-containing FDH can be mistaken for a stop codon (Pierce et al., 2008), and when using the KEGG annotation method we used here, M. thermoacetica’s fdhA gene was also not identified.

To determine if each of these three ‘Clostridiales’ genomes contained potential fdhA genes that were mis-annotated or marked as ‘unknown’ using KEGG analysis, we determined the COG group to which the fdhA genes from the olivine community were assigned and identified all genes from the community assigned to this COG category (Supplemental Table 3.3). Any ‘unknown’ or unidentified genes belonging to this COG category (COG0243, BisC, anaerobic dehydrogenase, typically selenocysteine-containing) were compared to those also identified as ‘unknown’ using KEGG module and clustered with the same genes in the genomes. We found that the fdhA genes were adjacent to the nuoF gene, which codes for a subunit of NADH-quinone oxidoreductase that shuttles electrons for energy, and when coupled with the identification of a predicted general function for each ‘unknown’ gene, we found a potential fdhA candidate for each of the three genomes missing fdhA (Supplemental Table 3.3). Thus, there is a potential for these organisms to produce formate dehydrogenase through an unknown fdhA gene. We predict that all ‘Clostridia’ from this study could have ‘functionally complete’ Wood-Ljungdahl pathways for acetogenesis, as several options exist for these organisms to complete this pathway (Figure 3.4C).
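The candidate search described above combines two criteria: a gene left unannotated by KEGG that falls in the same COG category as known fdhA genes, and a position next to nuoF. A minimal sketch of that logic follows. The record layout, field names, and function name are hypothetical and do not correspond to any tool's output format; "adjacent" is assumed here to mean the immediately neighboring gene on the same contig.

```python
from dataclasses import dataclass


@dataclass
class Gene:
    gene_id: str
    contig: str
    index: int        # order of the gene along its contig
    kegg_name: str    # empty string when KEGG left the gene unannotated
    cog: str


def fdha_candidates(genes, cog="COG0243", neighbor="nuoF"):
    """IDs of KEGG-unannotated genes in the given COG that sit next to the neighbor gene."""
    neighbor_sites = {(g.contig, g.index) for g in genes if g.kegg_name == neighbor}
    hits = []
    for g in genes:
        if g.cog != cog or g.kegg_name:
            continue                  # wrong COG, or already annotated by KEGG
        before = (g.contig, g.index - 1)
        after = (g.contig, g.index + 1)
        if before in neighbor_sites or after in neighbor_sites:
            hits.append(g.gene_id)
    return hits


# Toy records for illustration only (not data from the study).
toy = [
    Gene("g1", "contig1", 1, "nuoF", "other"),
    Gene("g2", "contig1", 2, "", "COG0243"),
    Gene("g3", "contig2", 5, "", "COG0243"),
]
print(fdha_candidates(toy))  # ['g2']
```

Requiring both criteria narrows the search, since a COG category alone can include several unrelated dehydrogenases.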

The Acetyl-CoA Pathway

The acetyl-CoA pathway, which is analogous to the carbonyl branch of the Wood-Ljungdahl pathway and is found in methanogens and sulfate reducers, was complete in all Archaeoglobaceae genomes (Figure 3.3). All other quality genomes except the ‘Class Clostridia’ Bin 8 were missing two steps, which consist of protein subunits coded by three separate genes (cdhA, cdhB, and cdhC). These genes code for the Archaeal-type carbon-fixing CODH/acetyl-CoA enzymes used in the carbonyl branch of the methanogenic Wood-Ljungdahl pathway (Maupin-Furlow & Ferry, 1996), which are analogous to the bacterial CODH/acs system of the acetogenic Wood-Ljungdahl pathway present in seven bacterial genomes (the eighth bacterial genome does not contain genes for the Wood-Ljungdahl pathway and does not appear to fix carbon). This suggests that at least ten of the eleven high-quality genomes obtained from this study have complete pathways for carbon fixation using enzymes of the Wood-Ljungdahl pathway.

CAM Metabolism

Some of the genes involved in the CAM metabolism pathway are present in the olivine genomes, but these may be used in other metabolic pathways since CAM metabolism is incomplete in all genomes. In the case of CAM (light) metabolism (Figure 3.3), Bins 2 – 9 contain the gene for pyruvate, orthophosphate dikinase (ppdK), which allows for the conversion of pyruvate to phosphoenolpyruvate (Hatch & Slack, 1968). Having the ppdK gene is a trait that is useful in low-nutrient environments such as the JdFR aquifer since it allows an organism to link the steps of the citric acid cycle, Wood-Ljungdahl, and gluconeogenesis pathways together for optimal energy and biosynthesis. This enzyme allows for the production of pyruvate from acetyl-CoA, a product of the Wood-Ljungdahl pathway that is also used in the citric acid cycle. The pyruvate can then be used for gluconeogenesis. The other gene involved in light-phase CAM metabolism is malate dehydrogenase (oxaloacetate-decarboxylating, NADP+; maeB), which was not found in any of the bins from the olivine metagenome. This enzyme converts malate to pyruvate and releases CO2 in CAM organisms (Harary et al., 1953), and its absence is the primary reason this pathway is incomplete in Bins 2 – 9.

The dark phase of CAM metabolism is where CO2 from the light phase, or the environment, is fixed back into organic carbon. Here, two Archaeal organisms represented by Bins 11 and 12 have both genes to complete this part of the CAM pathway (Figure 3.3). Bins 4 and 5 (‘Bacteria’) also contain the key gene of the carbon fixation step, ppc, which encodes phosphoenolpyruvate carboxylase; this enzyme converts phosphoenolpyruvate (PEP) to oxaloacetate while fixing CO2. Bins 9 – 12 all contain the malate dehydrogenase (mdh) gene that completes this CAM cycle and produces malate from oxaloacetate. It is assumed that none of the organisms whose genomes are represented from the olivine metagenome are using CAM metabolism since no genome has the complete set of four enzymes to complete this cycle; however, those that have the carbon fixation target gene for this pathway may retain the ability to use it in some capacity.
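The completeness argument above can be restated as a small presence/absence check. The sketch below encodes only the gene distribution stated in this section (ppdK in Bins 2 – 9, maeB in no bin, ppc in Bins 4, 5, 11, and 12, mdh in Bins 9 – 12); the data structure and the function names `genes_in_bin` and `complete_bins` are illustrative assumptions, not part of the original analysis.

```python
# Bins 2-12 correspond to the eleven high-quality genomes.
ALL_BINS = range(2, 13)

# Gene presence per bin, transcribed from the text of this section.
PRESENCE = {
    "ppdK": set(range(2, 10)),   # Bins 2-9
    "maeB": set(),               # not found in any bin
    "ppc": {4, 5, 11, 12},
    "mdh": set(range(9, 13)),    # Bins 9-12
}

# The two phases of the CAM cycle and the genes each one requires.
PHASES = {"light": ("ppdK", "maeB"), "dark": ("ppc", "mdh")}


def genes_in_bin(bin_id):
    """Return the set of CAM genes present in one bin."""
    return {gene for gene, bins in PRESENCE.items() if bin_id in bins}


def complete_bins(required):
    """Return the bins that carry every gene in `required`."""
    return [b for b in ALL_BINS if set(required) <= genes_in_bin(b)]


dark_complete = complete_bins(PHASES["dark"])
light_complete = complete_bins(PHASES["light"])
full_cycle = complete_bins(PHASES["light"] + PHASES["dark"])

print(dark_complete)   # [11, 12]
print(light_complete)  # []
print(full_cycle)      # []
```

Only Bins 11 and 12 satisfy the dark phase, and no bin satisfies the light phase or the full cycle, which matches the conclusion drawn in the text.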

Sulfur and Nitrogen Metabolism

In addition to the Wood-Ljungdahl pathway, sulfate reduction may play a prominent role in energy metabolism in this community. At least four of the eleven high-quality genomes we produced from the olivine biofilm had complete pathways for dissimilatory sulfate reduction (Figure 3.3). This is consistent with the presence of sulfate in the anoxic aquifer fluid at the time of recovery (17.6 mmol kg⁻¹; Wheat et al., 2010; Lin et al., 2012) and with the prediction that sulfate reduction is thermodynamically favorable in the JdFR aquifer (Boettger et al., 2013). Sulfate-reducing lineages have previously been reported from multiple subseafloor microbial observatory studies on the eastern flank of the JdFR, including the location of this study, IODP Hole 1301A (Smith et al., 2016; Robador et al., 2015; Orcutt et al., 2011; Jungbluth et al., 2013; Lever et al., 2013). Species of Archaeoglobus are known to reduce sulfate using either organic carbon or H2 as reductants (Steinsbu et al., 2010; Klenk et al., 1998). Two of the three archaeal genomes identified from the olivine biofilm in this study (an Archaeoglobus member and another related Archaeoglobaceae with similarity to the JdFR sulfate-reducing isolate A. sulfaticallidus) also contain this pathway (Steinsbu et al., 2010). Thus, these organisms may rely on sulfate reduction as a main source of energy in this environment. Ca. Desulfopertinax cowenii, a member of the Clostridial Peptococcaceae lineage that is closely related to Ca. Desulforudis audaxviator (Chivian et al., 2008), was also shown to possess the dissimilatory sulfate reduction pathway (Jungbluth et al., 2017). The Peptococcaceae genome identified in this study (Bin 9) also contains this pathway. Bin 6 contains a near-complete dissimilatory sulfate reduction pathway that is missing only one step. That step, however, requires dissimilatory sulfite reductase (dsrAB), the key enzyme in sulfate reduction, so this organism may not use this pathway for energy. Nevertheless, sulfate reduction is a common attribute of acetogens since it provides additional energy to supplement acetogenesis (Pierce et al., 2008).

Only one genome bin (Bin 9) contained a complete nitrogen metabolism pathway (Figure 3.3). Nitrogen is not a limiting nutrient in this environment (Lin et al., 2012), and this community may be able to rely on organic nitrogen sources like amino acids and small peptides. Bin 9, whose genome is similar to the Ca. Desulforudis and Ca. Desulfopertinax lineages, contains the pathway for nitrogen fixation. Ca. Desulforudis audaxviator also contains the nitrogen fixation pathway, but Ca. Desulfopertinax cowenii does not (Chivian et al., 2008; Jungbluth et al., 2017). It was proposed that Ca. Desulforudis audaxviator has a highly flexible metabolism, as it has the Wood-Ljungdahl pathway for acetogenesis, the sulfate reduction pathway, and nitrogen fixation capability, and is able to use multiple carbon and energy sources to survive in a nutrient-limited environment (Chivian et al., 2008). The Peptococcaceae genome from this study appears to have similar properties, further reinforcing the idea that metabolic versatility may be an important strategy for surviving in nutrient-constrained deep subsurface environments.

Hydrogenases and Other ETAs

Hydrogenases have been linked to early metabolisms on Earth (Boyd et al., 2014) and may have been a precursor to respiratory Complex I and chemiosmosis (Peters et al., 2015). They play a key role in hydrogenotrophic metabolisms like the Wood-Ljungdahl pathway (Vignais & Billoud, 2007) and can facilitate the uptake and oxidation of H2 to 2 H⁺ using oxidants such as O2, CO2, NO3⁻, SO4²⁻, and small organic molecules like fumarate (Vignais & Billoud, 2007). The Wood-Ljungdahl pathway utilizes hydrogenases to provide reducing equivalents for CO2 reduction in both branches, and includes Electron Transfer Agents (ETAs) such as ferredoxins and cytochromes to couple this pathway with energy-generating processes. Hydrogenases are also more abundant in H2-rich environments such as serpentinizing systems (Brazelton et al., 2012), and the types of hydrogenases present may be linked to particular metabolic functions.

The distribution of hydrogenases and related genes was variable in the olivine-hosted genomes (Figure 3.5), which may indicate that these organisms are using hydrogenases in different pathways. There were three clear sets of similar genomes: Bins 6 and 9 had similar complements of hydrogenase genes, and there was evidence of lineage-specific hydrogenases within the two Bacterial (Bins 4 and 5) and two Archaeal (Bins 11 and 12) genomes. The hnnCD hydrogenase genes found in the Archaeal genomes code for the F420-reducing hydrogenase (Supplemental Table 3.4). This hydrogenase is used by methanogens, which are close relatives of the Archaeoglobaceae, in the methanogenic Wood-Ljungdahl pathway (Vignais & Billoud, 2007; Peters et al., 2015). The two ‘Bacteria’ had identical hydrogenase, ferredoxin, and cytochrome genes, and they contained a hydrogenase encoded by the hoxHYU genes that was not found in the other genomes. This hydrogenase is unique in that it is a ‘hydrogen sensor’ that regulates the transcription of other hydrogenases. Bins 6 and 9 are both from class Clostridia and had similar hydrogenases, but other Clostridia did not share the same hydrogenases with these two genomes. This may indicate a shared functional role for these two organisms.

However, these two organisms did not contain similar ferredoxin or cytochrome genes, which may be explained either by their distinct phylogenies or by the presence of different metabolic pathways. The Bin 6 genome was unusual in that it had the greatest number of ferredoxin genes and its cytochromes were most similar to the Archaeal gene set. These groups shared both the ccm and the sdh genes, which encode respiratory membrane proteins responsible for cytochrome c maturation (Sanders et al., 2010) and the succinate dehydrogenase respiratory complex, respectively. Cytochrome c and succinate dehydrogenase are active in generating energy for cells using electron transport. Only six genomes contained the complete succinate dehydrogenase gene set needed to make all four subunits, and only three of those genomes have the genes to produce a fully functional cytochrome c.

Olivine Community Model

The olivine community described here appears to have the ability to use energy and carbon substrates originating from the aquifer fluid, from the biofilm and its members, and from water-rock reactions occurring at the surface of olivine. Phosphate, inorganic carbon (bicarbonate), dinitrogen, and sulfate from the aquifer fluid can all be utilized by the community (Figure 3.6). Bin 8, which is likely the most abundant organism and a heterotroph with the ability to utilize a wide variety of substrates, is able to produce formate (Figure 3.4B). This organism has the largest genome and possesses eleven copies of the PFL gene that would allow it to quickly produce large amounts of formate, making it a veritable ‘formate factory’. Wood-Ljungdahl organisms that lack formate dehydrogenase may use this formate if they cannot produce formate themselves, or to reduce the energy spent on producing enzymes. This would be especially advantageous to organisms in an energy-limited environment such as the JdFR aquifer (Lever, 2012). Bin 6, which is likely the second most abundant organism in this community, is an acetogen that appears to rely on H2 from water-rock reactions and may also rely on amino acids or small peptides for carbon and energy (Figure 3.4C). This organism also contains the PAAK pathway to make acetate, which could supplement heterotrophic members of the community such as sulfate reducers (or potentially methanogens not noted here) that can use acetate.

The ‘Bacteria’ (Bins 4 and 5) can import simple sugars and also use the Wood-Ljungdahl pathway.

All organisms except for the ‘true’ heterotroph, Bin 8, have pathways or target genes that allow them to potentially fix carbon and use molecular hydrogen for energy. Although some H2 is present in aquifer fluid (~2 µM; Lin et al., 2014), much of the H2 fueling this community may be coming from the oxidation of Fe²⁺ in olivine by water (Figure 3.6). This olivine community appears to benefit from a steady source of H2 and a mix of metabolic strategies that fuel syntrophic relationships, providing the basis for a strong, sustainable aquifer community.
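As an illustrative sketch of the water-rock reaction invoked above, H2 production from the oxidation of ferrous iron by water can be written in general form and for the iron end-member of olivine (fayalite). The magnetite-forming version is a common textbook idealization and is an assumption here; the text states only that iron oxides form, and the olivine used in this study is Mg-rich, so actual yields would be lower.

```latex
\[
2\,\mathrm{Fe^{2+}} + 2\,\mathrm{H_2O} \rightarrow 2\,\mathrm{Fe^{3+}} + 2\,\mathrm{OH^-} + \mathrm{H_2}
\]
\[
3\,\mathrm{Fe_2SiO_4} + 2\,\mathrm{H_2O} \rightarrow 2\,\mathrm{Fe_3O_4} + 3\,\mathrm{SiO_2} + 2\,\mathrm{H_2}
\]
```

Both equations balance in atoms and charge: the second consumes six Fe, three Si, fourteen O, and four H on each side.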

Figure 3.6. Olivine community model depicting nutrient cycling and proposed routes of metabolites and substrates. Each oval represents a bacterial (2 – 9) or archaeal (10 – 12) representative from the olivine biofilm (olivine = large green structure; biofilm = yellow structure surrounding olivine and containing microbes). Water reacting with reduced iron in olivine produces molecular hydrogen and iron oxides, which can be deposited on the surface of olivine as secondary minerals (orange structure on olivine). The molecular hydrogen from this reaction is presumed to help fuel the microbial biofilm through the Wood-Ljungdahl pathway. The aquifer fluid containing nutrients such as bicarbonate or CO2, phosphate, sulfate, nitrogen, and fluid-derived organic carbon is represented by the blue background. Dashed lines are possible routes of production and consumption of nutrients. Solid lines indicate genomic potential for import or metabolism of particular substrates.

Conclusion

This study provides genomic evidence of a marine subsurface chemosynthetic community with a dependence on H2 and sulfate. The Wood-Ljungdahl pathway appears to be the predominant metabolic strategy in this community, and this finding could represent a paradigm shift away from the understanding that the Calvin Cycle is the dominant carbon fixation pathway in oceanic crust. Seven novel acetogens composed the bulk of the thermal suboceanic aquifer olivine community, an unusual finding for this environment. These results demonstrate the genetic potential to utilize H2 from the aquifer fluid or from water-rock reactions for energy generation and biosynthesis of organic matter in the JdFR basaltic aquifer. The study of microbial communities that employ ancient metabolisms to survive in environments analogous to those on early Earth or other planets or moons, much like the community described here, may yield clues as to the metabolic strategies used to overcome carbon and energy limitations. This is particularly important for the study of life on other planets since many of them lack sunlight strong enough to support photosynthetic communities.

In the future, understanding of this suboceanic ecosystem will be improved by the study of more rock and mineral communities from the aquifer (both in-situ incubation studies and cores from a wide variety of locations) and by the application of culture-dependent methods, which will help assess the full metabolic potential of these organisms. Members of this microbial community have no cultured representatives, and genomic studies such as this one can provide hints of the metabolic repertoire possessed by these unique organisms and give us a better chance of success in culturing these deep inhabitants. Using H2 or Fe-bearing rocks as growth substrates and considering co-cultures may be a good strategy for growing these organisms now that we are armed with knowledge of their metabolic capabilities.

Acknowledgements

Metagenome sequencing was made possible by the Deep Carbon Observatory’s Census of Deep Life supported by the Alfred P. Sloan Foundation and was performed at the Marine Biological Laboratory (Woods Hole, MA, USA). We are grateful for the assistance of Mitch Sogin, Susan Huse, Joseph Vineis, Andrew Voorhis, Sharon Grim, and Hilary Morrison at MBL. Andrew Fisher, C. Geoffrey Wheat, Hans Jannasch, Stefan Sievert, Keir Becker, Mark Nielsen, and the crews of submersible DSRV Alvin and the RV Atlantis and JOIDES Resolution assisted with flow cell development, deployment, and retrieval. Thanks to Teresa Sawyer in OSU’s EM facility for her help with training and imaging. William Rugh contributed to the design of the flow cells. This is a Center for Dark Energy Biosphere Investigations (C-DEBI) contribution.

References

Boettger J, Lin H-T, Cowen JP, Hentscher M, Amend JP. (2013). Energy yields from chemolithotrophic metabolisms in igneous basement of the Juan de Fuca ridge flank system. Chem Geol 337–338:11–19.

Boyd E, Schut G, Adams M, Peters J. (2014). Hydrogen metabolism and the evolution of biological respiration. Microbe 9:361–367.

Braakman R, Smith E. (2012). The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol 8:e1002455.

Brazelton WJ, Morrill PL, Szponar N, Schrenk MO. (2013). Bacterial communities associated with subsurface geochemical processes in continental serpentinite springs. Appl Environ Microbiol 79:3906–16.

Brazelton WJ, Nelson B, Schrenk MO. (2012). Metagenomic evidence for H2 oxidation and H2 production by serpentinite-hosted subsurface microbial communities. Front Microbiol 2:268.

Buchfink B, Xie C, Huson DH. (2015). Fast and sensitive protein alignment using DIAMOND. Nat Meth 12:59–60.

Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, DeSantis TZ, et al. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322:275–278.

Clarke TA, Mills PC, Poock SR, Butt JN, Cheesman MR, Cole JA, et al. (2008). Escherichia coli Cytochrome c Nitrite Reductase NrfA. Methods Enzymol 437:63–77.

Cowen JP, Giovannoni SJ, Kenig F, Johnson HP, Butterfield D, Rappé MS, et al. (2003). Fluids from aging ocean crust that support microbial life. Science 299:120–3.

Fisher AT, Urabe T, Klaus A. (2005a). IODP expedition 301 installs three borehole crustal observatories, prepares for three-dimensional, cross-hole experiments in the Northeastern Pacific Ocean. Sci Drill 1. http://www.iodp.org/iodp_journals/SD001_04_SRexp301.pdf.

Fisher AT, Wheat CG, Becker K, Davis EE, Jannasch H, Schroeder D, et al. (2005b). Scientific and technical design and deployment of long-term subseafloor observatories for hydrogeologic and related experiments, IODP Expedition 301, eastern flank of Juan de Fuca Ridge. Proc Integr Ocean Drill Progr 301. doi:10.2204/iodp.proc.301.103.2005.

Fuchs G. (2011). Alternative Pathways of Carbon Dioxide Fixation: Insights into the Early Evolution of Life? Annu Rev Microbiol 65:635–58.

Grabarse W, Mahlert F, Duin EC, Goubeaud M, Shima S, Thauer RK, et al. (2001). On the mechanism of biological methane formation: structural evidence for conformational changes in methyl-coenzyme M reductase upon substrate binding. J Mol Biol 309:315–330.

Harary I, Korey SR, Severo O. (1953). Biosynthesis of dicarboxylic acids by carbon dioxide fixation. J Biol Chem 203:595–604.

Hatch MD, Slack CR. (1968). A new enzyme for the interconversion of pyruvate and phosphopyruvate and its role in the C4 dicarboxylic acid pathway of photosynthesis. Biochem J 106:141–146.

Huber J, Johnson HP, Butterfield D, Baross J. (2006). Microbial life in ridge flank crustal fluids. Environ Microbiol 8:88–99.

Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.

Jungbluth SP, Bowers RM, Lin H, Cowen JP, Rappé MS. (2016). Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt. Nat Publ Gr 10:1–15.

Jungbluth SP, Grote J, Lin H-T, Cowen JP, Rappé MS. (2013). Microbial diversity within basement fluids of the sediment-buried Juan de Fuca Ridge flank. ISME J 7:161–172.

Jungbluth SP, del Rio TG, Tringe SG, Stepanauskas R, Rappé MS. (2017). Genomic comparisons of a bacterial lineage that inhabits both marine and terrestrial deep subsurface systems. PeerJ 1–22.

Kanehisa M, Sato Y, Morishima K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol 428:726–731.

Klenk H, Clayton RA, Tomb J, Dodson RJ, Gwinn M, Hickey EK, et al. (1998). The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon. Nature 394:6342–6349.

Laczny CC, Pinel N, Vlassis N, Wilmes P. (2014). Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Sci Rep 4:4516.

Lever MA. (2012). Acetogenesis in the energy-starved deep biosphere-a paradox? Front Microbiol 2. doi:10.3389/fmicb.2011.00284.

Lever M, Rouxel O, Alt J, Shimizu N. (2013). Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Cowen JP, Olson EJ, Amend JP, Lilley MD. (2012). Inorganic chemistry, gas compositions and dissolved organic carbon in fluids from sedimented young basaltic crust on the Juan de Fuca Ridge flanks. Geochim Cosmochim Acta 85:213–227.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. (2014). Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Magnabosco C, Ryan K, Lau MCY, Kuloyo O, Lollar BS, Kieft TL, et al. (2015). A metagenomic window into carbon metabolism at 3 km depth in Precambrian continental crust. ISME J 10:730–741.

Maia LB, Moura JJG, Moura I. (2015). Molybdenum and tungsten-dependent formate dehydrogenases. J Biol Inorg Chem 20:287–309.

Maupin-Furlow JA, Ferry JG. (1996). Analysis of the CO dehydrogenase/acetyl-coenzyme A synthase operon of Methanosarcina thermophila. J Bacteriol 178:6849–6856.

Mayhew LE, Ellison ET, McCollom TM, Trainor TP, Templeton AS. (2013). Hydrogen generation from low-temperature water–rock reactions. Nat Geosci 6. doi:10.1038/NGEO1825.

McCarthy MD, Beaupré SR, Walker BD, Voparil I, Guilderson TP, Druffel ERM. (2010). Chemosynthetic origin of 14C-depleted dissolved organic matter in a ridge-flank hydrothermal system. Nat Geosci 4:32–36.

Nakagawa S, Inagaki F, Suzuki Y, Steinsbu BO, Lever MA, Takai K, et al. (2006). Microbial community in black rust exposed to hot ridge flank crustal fluids. Appl Environ Microbiol 72:6789–99.

Nitschke W, Russell MJ. (2013). Beating the acetyl coenzyme A-pathway to the origin of life. Philos Trans R Soc Lond B Biol Sci 368:20120258.

Orcutt BN, Bach W, Becker K, Fisher AT, Hentscher M, Toner BM, et al. (2011). Colonization of subsurface microbial observatories deployed in young ocean crust. ISME J 5:692–703.

Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–55.

Patil KR, Roune L, McHardy AC. (2012). The PhyloPythia Web Server for Taxonomic Assignment of Metagenome Sequences. PLoS One 7:e38581.

Peng Y, Leung HCM, Yiu SM, Chin FYL. (2012). IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.

Peters JW, Schut GJ, Boyd ES, Mulder DW, Shepard EM, Broderick JB, et al. (2015). [FeFe]- and [NiFe]-hydrogenase diversity, mechanism, and maturation. BBA - Mol Cell Res 1853:1350–1369.

Pierce E, Xie G, Barabote R, Saunders E, Han C, Detter J, et al. (2008). The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol 10:2550–73.

Ragsdale SW. (2008). Enzymology of the Wood-Ljungdahl Pathway of Acetogenesis. Ann N Y Acad Sci 1125:129–136.

Robador A, Jungbluth SP, LaRowe DE, Bowers RM, Rappé MS, Amend JP, et al. (2015). Activity and phylogenetic diversity of sulfate-reducing microorganisms in low-temperature subsurface fluids within the upper oceanic crust. Front Microbiol 6:1–13.

Sanders C, Turkarslan S, Lee DW, Daldal F. (2010). Cytochrome c biogenesis: The Ccm system. Trends Microbiol 18:266–274.

Shin J, Song Y, Jeong Y, Cho BK. (2016). Analysis of the core genome and pan-genome of autotrophic acetogenic bacteria. Front Microbiol 7. doi:10.3389/fmicb.2016.01531.

Simpson JT, Durbin R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22:549–556.

Smith A, Popa R, Fisk M, Nielsen M, Wheat CG, Jannasch HW, et al. (2011). In situ enrichment of ocean crust microbes on igneous minerals and glasses using an osmotic flow-through device. Geochemistry Geophys Geosystems 12:1–19.

Smith AR, Fisk MR, Thurber AR, Flores GE, Mason OU, Popa R, et al. (2016). Deep Crustal Communities of the Juan de Fuca Ridge Are Governed by Mineralogy. Geomicrobiol J 451:0.

Steinsbu BO, Thorseth IH, Nakagawa S, Inagaki F, Lever MA, Engelen B, et al. (2010). Archaeoglobus sulfaticallidus sp. nov., a thermophilic and facultatively lithoautotrophic sulfate-reducer isolated from black rust exposed to hot ridge flank crustal fluids. Int J Syst Evol Microbiol 60:2745–2752.

Stevens TO, McKinley JP. (1995). Lithoautotrophic Microbial Ecosystems in Deep Basalt Aquifers. Science 270:450–454.

Takami H, Noguchi H, Takaki Y, Uchiyama I, Toyoda A, Nishi S, et al. (2012). A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem. PLoS One 7:e30559.

Vignais PM, Billoud B. (2007). Occurrence, Classification, and Biological Function of Hydrogenases: An Overview. Chem Rev 107:4206–4272.

Wang H, Edwards KJ. (2009). Bacterial and Archaeal DNA extracted from inoculated experiments: implication for the optimization of DNA extraction from deep-sea basalts. Geomicrobiol J 26:463–469.

Wheat CG. (2004). Heat flow through a basaltic outcrop on a sedimented young ridge flank. Geochemistry Geophys Geosystems 5. doi:10.1029/2004GC000700.

Wheat CG, Jannasch HW, Fisher AT, Becker K, Sharkey J, Hulme S. (2010). Subseafloor seawater-basalt-microbe reactions: Continuous sampling of borehole fluids in a ridge flank environment. Geochemistry Geophys Geosystems 11:1–18.

4. DOMINANT BACTERIUM FROM A THERMAL SUBOCEANIC AQUIFER OLIVINE BIOFILM IS A NOVEL ACETOGEN

Abstract

The Wood-Ljungdahl pathway for acetogenesis is an ancient carbon fixation and energy-generating biosynthetic pathway, forming an essential component of the metabolism of microbes residing in deep terrestrial aquifers. Although this pathway has not been commonly reported in the suboceanic aquifer biosphere, there is evidence that this pathway first arose in oceanic crust. Here we describe the genome of a novel acetogen from a thermal suboceanic aquifer olivine biofilm in the basaltic crust of the Juan de Fuca Ridge (JdFR). This organism possesses the genes for the Wood-Ljungdahl pathway and is phylogenetically distinct from a nearby clade of other acetogens from the order Clostridiales; thus this bacterium is proposed as Candidatus Acetocimmeria pyornia. Unlike the closely-related acetogen clade and other bacteria in the deep igneous crust, Ca. A. pyornia appears to be incapable of using sulfate reduction for energy generation or using carbohydrates and lipids as carbon sources or electron donors. Instead, Ca. A. pyornia has genes that may allow it to use peptides and amino acids to supplement its carbon and energy metabolism, which may reflect its preferred local environment attached to minerals in an energy-limited setting. There is evidence that this organism has a tendency towards a biofilm lifestyle on Fe-bearing minerals and crusts. This evidence includes genes involved in surface adhesion, genes for the use of metallic cations found in Fe-bearing minerals, hydrogenases that utilize molecular hydrogen (an alteration product formed as water reacts with Fe-bearing minerals), and the organism’s rarity in aquifer fluid. Based on these findings, we predict that this organism maintains an ancient chemosynthetic lifestyle and that it is uniquely adapted to life in Fe-bearing mineral biofilms.

Introduction

The extrusive layer of igneous oceanic crust is one of Earth’s largest microbial habitats and is predicted to contain up to seventy-five percent of the total carbon biomass on Earth (~ 200 Pg of carbon; Pg = 10¹⁵ g; Heberling et al., 2010; Whitman et al., 1998; Kallmeyer et al., 2012). Water-rock reactions deep in the crust are known to support active subsurface chemosynthetic microbial communities (Edwards et al., 2011; Orcutt et al., 2015; Lever et al., 2013), and the production of fresh (labile) organic carbon that vents to the seafloor (McCarthy et al., 2010; Mason et al., 2009a) may significantly increase secondary productivity in the deep ocean and affect global carbon cycling. Little is known about the specific metabolisms of organisms in this habitat or the rates of chemosynthesis that occur, often because of the logistical difficulties of deep biosphere sampling and of applying traditional cultivation techniques to organisms with unknown metabolic needs. We present a novel subseafloor acetogen capable of using the Wood-Ljungdahl pathway for chemosynthesis and energy generation in order to understand its role in the carbon cycle and to guide future cultivation efforts.

Recently, it was reported that acetogenesis may be occurring in the oceanic crustal aquifer (Chapter 3; Jungbluth et al., 2017). In deep oceanic crust like that of the JdFR, dissolved organic carbon is lower than that of deep seawater (Lin et al., 2012), and photosynthesis cannot occur as sunlight does not penetrate the crust. The acetogenesis pathway may be advantageous to subseafloor microbes by allowing them to fix carbon and conserve energy. Acetogenesis is also thought to have appeared early in oceanic crust and to have remained the dominant form of primary production until the appearance of photosynthesis (Ragsdale & Pierce, 2008; Nitschke & Russell, 2013); thus, acetogenesis may remain a key metabolic strategy in deep crustal aquifers.

The Wood-Ljungdahl pathway for acetogenesis can be used for biosynthesis (if the main product is the building block acetyl-CoA) or for energy conservation (if the pathway proceeds all the way to acetate). Acetogens may use either mode of metabolism, switching between them depending on the needs of the organism (Pierce et al., 2008; Lever, 2012). Additional energy could be gained from this pathway by utilizing sodium or proton pumps coupled to enzymatic steps, which would allow for ATP generation using ATP synthase (Schuchmann & Müller, 2014). This pathway has two branches, the carbonyl and the methyl branch, which both use molecular hydrogen (H2) in the reduction and condensation of two molecules of CO2 to form acetyl-CoA. Acetyl-CoA is a key biomolecule used in multiple biosynthetic pathways such as lipid synthesis (Ragsdale, 2008a; Pierce et al., 2008). Molecular hydrogen (a product of water reacting with iron-bearing minerals) and CO2, or bicarbonate (HCO3⁻), are abundant in oceanic crust (Lin et al., 2014; Ver Eecke et al., 2012; Takai et al., 2004; Brazelton et al., 2006). Although acetogenesis is less thermodynamically favorable than methanogenesis, which also uses H2 and CO2, acetogens have the advantage of requiring a smaller set of enzymes for biosynthesis and energy generation via the Wood-Ljungdahl pathway. This lowers the overall energetic cost of metabolism and may allow acetogens to compete more effectively for resources with methanogens (Ragsdale & Pierce, 2008; Nitschke & Russell, 2013; Lever, 2012).
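For reference, the comparison between acetogenesis and methanogenesis from H2 and CO2 can be made concrete with the overall reactions. These are standard textbook stoichiometries rather than equations given in this chapter. Commonly quoted standard free energy changes are roughly −95 kJ per mole of acetate and roughly −131 kJ per mole of methane; those figures are approximate, come from the general literature rather than from this study, and in situ values depend on local H2 and CO2 concentrations and temperature.

```latex
\begin{align*}
\text{Acetogenesis:}\quad & 4\,\mathrm{H_2} + 2\,\mathrm{CO_2} \rightarrow \mathrm{CH_3COO^-} + \mathrm{H^+} + 2\,\mathrm{H_2O} \\
\text{Methanogenesis:}\quad & 4\,\mathrm{H_2} + \mathrm{CO_2} \rightarrow \mathrm{CH_4} + 2\,\mathrm{H_2O}
\end{align*}
```

Both reactions consume four H2, so the smaller energy yield of acetogenesis per H2 is the thermodynamic disadvantage that the paragraph above weighs against the lower enzymatic cost of the pathway.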

Since the production of acetate yields little energy, acetogens often utilize additional metabolisms such as sulfate reduction and heterotrophy to supplement their energy needs (Lever, 2011; Ragsdale & Pierce, 2008; Drake et al., 1997; Schuchmann & Müller, 2014; Thór Marteinsson et al., 2012b). Moorella thermoacetica (Pierce et al., 2008; Fontaine et al., 1942), Candidatus Desulforudis audaxviator (Chivian et al., 2008), and Desulfitobacterium hafniense (Nonaka et al., 2006) are three terrestrial acetogens that have had their complete genomes sequenced, and each has additional energy-generating pathways that are uniquely suited to its needs. M. thermoacetica is a common acetogen that uses carbohydrates and a wide variety of other substrates for growth (Pierce et al., 2008; Fontaine et al., 1942). Ca. Desulforudis audaxviator is a deep subsurface organism that can use H2 that is derived from radiolytic processes and is able to reduce sulfate and fix nitrogen (Chivian et al., 2008). Desulfitobacterium hafniense is able to use dehalogenation as an alternative energy acquisition strategy to acetogenesis in addition to using a multitude of electron acceptors and donors (Nonaka et al., 2006). Each of these organisms may take advantage of unique environmental conditions that help shape their metabolisms. In the marine subsurface, inorganic electron donors and acceptors are expected to be the primary source of energy, much like the deep terrestrial subsurface where Ca. D. audaxviator is known to live, and peptides and amino acids have been reported both in aquifer fluid and adsorbed to mineral surfaces (Lin et al., 2015). We compare the genome characteristics of Ca. A. pyornia to the genomes of these three known acetogens to determine how similar its genome is to those of closely related acetogens.

We found novel genomic evidence that a dominant member from a mineral biofilm collected from a deep subsurface oceanic crust sample is a thermophilic acetogen from the bacterial order Clostridiales. This organism has the genetic potential to act as a primary producer in this simplified microbial community. Based on its genome, this organism appears capable of chemosynthesis, using molecular hydrogen as a reductant to fix carbon in the form of bicarbonate from seawater and to gain energy. It may be supplementing its energy by utilizing amino acids or peptides commonly found in the environment, but it does not appear to use sulfate reduction or import carbohydrates. We propose this novel subseafloor acetogen as Candidatus Acetocimmeria pyornia (aceto = acetogen; Cimmeria = dweller of dark land beyond the ocean; pyornia = short for ‘rock biter’).

Materials and Methods

In-situ Colonization of Olivine in the JdFR Aquifer

Olivine and other igneous minerals and glasses were incubated in 3.5-million-year-old oceanic crust for four years to investigate the properties of surface-colonizing microbial communities from the Juan de Fuca Ridge aquifer. These ~2 mm mineral grains were emplaced in flow cells at 275 – 287 meters below seafloor (mbsf) in International Ocean Drilling Program Hole 1301A (47° 45.210′ N, 127° 45.833′ W; 2667 m below sea level) on the eastern flank of the Juan de Fuca Ridge (Smith et al., 2011, 2016; Fisher et al., 2005). Details of incubation and sample retrieval have been reported previously and are only briefly summarized here (Smith et al., 2011, 2016). After the four-year incubation in Hole 1301A, recovered minerals were frozen at –40 °C.

Genomic DNA Extraction and Metagenome Sequencing

The extraction and amplification methods were published previously (Smith et al., 2011, 2016) and are summarized again here. Genomic DNA was extracted from ~500 mg olivine using a modified protocol for the FastDNA Spin Kit for Soil (MP Biomedicals Catalog #116560200) as described previously (Smith et al., 2016; Wang & Edwards, 2009).

Genomic DNA extracted from the two most common olivine mineral phases (forsterite and Fo90 olivine) was pooled to produce the olivine metagenome (a total of 50 – 70 ng of DNA). The olivine genomic DNA extract was sent to the Census of Deep Life sequencing facility at the Marine Biological Laboratory in Woods Hole, MA. At this facility, genomic DNA was amplified using whole genome amplification and sequenced using 2 × 101 bp paired-end sequencing with dedicated-read indexing on an Illumina HiSeq 1000 instrument. Sequence reads were then de-multiplexed using the Consensus Assessment of Sequence and Variation (CASAVA) 1.8.2 software (Illumina). A total of 84 million sequences with an average length of 108 bases were produced for the olivine metagenome, which totaled 9.8 gigabases of sequence data.

Sequence Quality Check, Assembly, and Binning

Raw sequence files were concatenated into the read files produced using the forward primer and the reverse primer for this metagenome and then passed through the pre-processing pipeline of the String Graph Assembler (SGA; Simpson & Durbin, 2012) for quality checking, as described previously (Chapter 3).

The high-quality mate pair sequence read file was assembled into contigs using the Iterative De Bruijn Graph Assembler – Uneven Depth (IDBA-UD) program (Peng et al., 2012) with the following specifications: minimum contig length of 450 bases and iterative k-mer values from 45 to 69 in steps of 4. The assembly with a k-mer length of 65 was chosen as the optimal assembly, mainly because it had the largest N50 value (the contig length at or above which contigs contain half of the assembled bases) and the largest maximum contig size, and was therefore used in subsequent steps.
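As an illustration of the assembly selection criterion, the following minimal sketch computes N50 for a set of contig lengths and lists the k-mer sizes implied by the 45 to 69 range with a step of 4. N50 is a length-weighted statistic: it is the contig length at which the running total of the longest contigs first reaches half of all assembled bases. The function name and the example contig lengths are hypothetical and are not taken from this study or from IDBA-UD itself.

```python
def n50(contig_lengths):
    """Return N50: the length of the contig at which the cumulative sum of
    contigs, sorted longest first, first reaches half of all assembled bases."""
    lengths = sorted(contig_lengths, reverse=True)
    if not lengths:
        raise ValueError("no contigs supplied")
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length


# k-mer sizes iterated during assembly: 45 to 69 in steps of 4
kmer_values = list(range(45, 70, 4))

# Hypothetical contig lengths in bases, for illustration only
example_lengths = [450, 500, 800, 1200, 3000, 5000]
example_n50 = n50(example_lengths)
```

Comparing this value across the assemblies produced at each k-mer size, together with the maximum contig length, mirrors the selection described above.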

VizBin (Laczny et al., 2014), a Java-based genomic binning program that uses nonlinear dimension reduction of genomic signatures (i.e., nucleotide frequency) to assign contigs to taxonomic bins, was used to separate individual genomes out of the assembled metagenome based on sequence similarity (see Chapter 3). DNA sequences with similar nucleotide frequencies were placed into “bins” that represent unique taxa. A total of 12 bins were formed from the olivine metagenome, each of which was exported for genome reconstruction, but only one is described here. The genome of Ca. A. pyornia and all other genomes from the olivine community were uploaded to the Metagenomics Analysis Server (MG-RAST) for analysis and comparison of genome features. This genome will be made publicly available at the time of publication.
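The idea of a nucleotide-frequency signature can be sketched as follows. This is a simplified illustration only: the function name and the choice of k = 4 are assumptions made here, and VizBin's own k-mer size, normalization, and nonlinear dimension reduction step are not reproduced.

```python
from collections import Counter
from itertools import product


def kmer_signature(sequence, k=4):
    """Return the relative frequency of every possible A/C/G/T k-mer in a
    contig, as a list ordered alphabetically by k-mer.
    k = 4 is an illustrative assumption, not VizBin's documented setting."""
    seq = sequence.upper()
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    # Only k-mers made of unambiguous bases contribute to the total
    total = sum(counts[kmer] for kmer in all_kmers)
    if total == 0:
        raise ValueError("sequence contains no valid k-mers")
    return [counts[kmer] / total for kmer in all_kmers]


# Hypothetical contig fragment, for illustration only
signature = kmer_signature("ACGTACGTAC", k=2)
```

Contigs whose signature vectors lie close together would be expected to fall into the same bin once the vectors are projected into two dimensions.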

Verification of Bin Purity

Bin purity checking methods have been described previously (Chapter 3) and are briefly summarized again here. CheckM (Parks et al., 2015) was used to verify taxonomic purity and completeness of bins using a total of 43 marker genes expected in all genomes. The contamination index was calculated by adding up all non-single-copy marker genes and dividing by the total number of marker gene sets. Contaminating marker genes had less than 90% amino acid identity. The strain heterogeneity index was calculated as the fraction of non-single-copy marker genes above 90% amino acid identity.
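The two indices described above can be sketched in a few lines. This is a simplified reading of the description in this section, not CheckM's implementation; the function names, the input structures, and the example values are hypothetical.

```python
def contamination_index(marker_copies, n_marker_sets):
    """Percentage of marker gene sets accounted for by markers found in more
    than one copy. marker_copies maps marker name -> copies found in the bin."""
    multi_copy = sum(1 for copies in marker_copies.values() if copies > 1)
    return 100.0 * multi_copy / n_marker_sets


def strain_heterogeneity(multicopy_identities, cutoff=90.0):
    """Percentage of multi-copy marker comparisons whose amino acid identity
    (in percent) is above the cutoff, suggesting closely related strains."""
    if not multicopy_identities:
        return 0.0
    above = sum(1 for identity in multicopy_identities if identity > cutoff)
    return 100.0 * above / len(multicopy_identities)


# Hypothetical bin: four markers, one of them present twice
example_copies = {"marker_a": 1, "marker_b": 2, "marker_c": 1, "marker_d": 0}
example_contamination = contamination_index(example_copies, n_marker_sets=4)
example_heterogeneity = strain_heterogeneity([95.0, 80.0])
```

Under this reading, a duplicated marker below the 90% identity cutoff counts as contamination from a different taxon, whereas one above it counts toward strain heterogeneity.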

Phylogenetic Tree Construction

There were no 16S rRNA genes available from the Bin 6 genome; therefore, we used several methods to determine the phylogenetic relatedness of this genome’s sequence to other closely related sequences. First, we used phylogenetic results from PhyloPythia, CheckM, BlastKOALA, MG-RAST, KEGG, and the pyrotag sequencing of olivine community DNA to determine that Bin 6 from the olivine metagenome described in Chapter 3 was the best match for the ‘Clostridiales’ taxonomic group previously identified (Figure 4.1; Chapter 1; Smith et al., 2016). We therefore used the ‘Clostridiales’ 16S rRNA gene sequence from the pyrotag sequencing results to construct the 16S rRNA gene phylogeny, assuming it would match the Bin 6 genome.

The 16S rRNA gene phylogeny of three related organisms recovered from the surface of olivine (Chapter 3), including the Clostridiales organism described here, was constructed using MEGA5 (Tamura et al., 2011). The 16S rRNA gene sequences for each of these phylotypes, here designated 1301A_f_Peptococcaceae_Bin9, 1301A_c_Clostridia_Bin8, and Ca. Acetocimmeria pyornia (1301A_o_Clostridiales #3_1_Bin6), and the most closely related clone or isolate sequences to each that were identified by NCBI’s Basic Local Alignment Search Tool (BLAST), were aligned and trimmed in MEGA5 (Tamura et al., 2011).

There were a total of 423 positions in the final 16S rRNA gene dataset. The evolutionary history of each organism was inferred using the Neighbor-Joining method (Saitou & Nei, 1987), and evolutionary distances were computed using the Jukes-Cantor method (Jukes et al., 1969). The bootstrap test was applied to tree generation with 500 replicates, and branches that clustered together in more than 50 percent of replicates have their percentages shown on the tree (Felsenstein, 1985). A maximum likelihood tree was also produced, but was not as robust.
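For readers unfamiliar with the Jukes-Cantor correction, the sketch below shows the standard formula, d = -(3/4) ln(1 - (4/3)p), where p is the proportion of aligned sites that differ. The helper names and the example sequences are hypothetical; MEGA5 performs this calculation internally, and its handling of gaps and ambiguous bases may differ from the simple rule used here.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.
    Positions where either sequence has a gap or ambiguous base are skipped
    (a simplifying assumption for this sketch)."""
    pairs = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if a in "ACGT" and b in "ACGT"
    ]
    if not pairs:
        raise ValueError("no comparable positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def jukes_cantor(p):
    """Jukes-Cantor distance: expected substitutions per site given the
    observed proportion of differing sites p (valid for 0 <= p < 0.75)."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments, for illustration only
example_p = p_distance("ACGT-A", "ACGA-A")
example_d = jukes_cantor(example_p)
```

The corrected distance is always at least as large as the observed proportion of differences, because it accounts for multiple substitutions at the same site.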

To compare with results from the 16S rRNA gene phylogeny, we chose a highly conserved, single-copy phylogenetic marker gene that is not horizontally transferred to construct a second phylogenetic tree (Sunagawa et al., 2013). We used the signal recognition particle subunit SRP54 (ffh) gene sequence to construct a Maximum Likelihood tree based on the Tamura-Nei model in MEGA5 (Tamura et al., 2011). The initial trees were obtained through the Neighbor-Joining and BioNJ algorithms. These algorithms were applied to a matrix of pairwise distances that were estimated by Maximum Composite Likelihood. Positions that represented gaps or missing data were eliminated from the analysis. A total of 264 positions were used to construct the phylogenetic tree.

Genome Annotation and Metabolic Pathway Reconstruction

Genome bins captured and exported from VizBin were uploaded to the Kyoto Encyclopedia of Genes and Genomes (KEGG) using BlastKOALA (Kanehisa et al., 2016) to reconstruct the genomes of organisms recovered from the surface of olivine, including the Clostridiales bacterium described here. The reconstructed metabolism for this organism was used to assess the potential for carbon fixation (Table 4.1), determine likely energy sources, and explore mineral or surface-based metabolic properties that may yield clues as to how this organism survives in JdFR aquifer biofilms.

To determine how the metabolic capabilities of Ca. A. pyornia compared to those of other closely related acetogens, KEGG genome annotations of Ca. Desulforudis audaxviator (Chivian et al., 2008), Moorella thermoacetica (Pierce et al., 2008), and Desulfitobacterium hafniense (Nonaka et al., 2006) were compared to KEGG BlastKOALA results of metabolic pathway reconstruction for Ca. Acetocimmeria pyornia (Tables 4.2 – 4.4; Supplementary Tables 4.2 – 4.5).

For comparison of acetyl-CoA synthase (acs) gene clusters in related acetogens, KEGG Genome was used to find each acs gene cluster in the genomes of M. thermoacetica, Ca. Desulforudis audaxviator, and D. hafniense. The acs gene clusters from these genomes were compared to the acs gene cluster from Ca. A. pyornia using reconstructed KEGG genome maps and BlastKOALA annotations from this study.

Genes not covered by pathway analysis were searched using the KEGG-annotated genome data obtained through BlastKOALA for all contigs (Supplementary Table 4.4). Genes necessary for completion of near-complete pathways or those that may be involved in alternative pathways were searched and identified. Information regarding other genes relating to biofilm formation, chemotaxis and flagella, pilus formation, metal transport or utilization, and other functions was gathered from the annotated genome (Supplementary Table 4.4).

Visualization of Reconstructed Metabolism

Relevant energy, carbon, and biosynthesis pathways were included in a whole cell model. The Wood-Ljungdahl pathway, incomplete TCA cycle, pathway, all transporters, and other complete pathways were included to visualize the reconstructed metabolism of Ca. A. pyornia.

Results

Evolutionary Relatedness

The evolutionary relationships of Ca. A. pyornia indicate this organism represents a novel acetogen within the bacterial Class Clostridia (Figure 4.1). Based on 16S rRNA gene and ffh gene phylogenies (Figure 4.1 and Supplementary Figure 4.1), this related ‘acetogen’ cluster includes M. thermoacetica (Pierce et al., 2008), Ca. D. audaxviator (Chivian et al., 2008), and D. hafniense (Nonaka et al., 2006).

The closest 16S rRNA gene relative (based on the V4V6 hypervariable region) was a sequence obtained from JdFR flank aquifer fluids (Jungbluth et al., 2016), although these sequences had only 91% similarity. This level of divergence exceeds the generally accepted threshold for a new species (< 97% 16S rRNA gene identity, or more recently < 98.65%; Kim et al., 2014) and for a new genus, which is ~95% or less. The next closest sequences to Ca. A. pyornia, which include Ca. Syntrophonatronum acetioxidans, Thermolithobacter carboxydivorans, T. ferrireducens, and Tepidanaerobacter acetatoxydans, all had only 88% similarity. For the phylogeny based on the marker gene ffh, the acetogens Ca. D. audaxviator, M. thermoacetica, and D. hafniense also cluster nearby; however, these genes have only 78%, 75%, and 74% identity to Ca. A. pyornia’s ffh gene, respectively.
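The threshold reasoning above can be expressed as a small helper. The cutoffs are the rule-of-thumb values quoted in the text (97% or 98.65% for species, ~95% for genus); the function name and return labels are hypothetical, and such cutoffs are guidelines rather than formal taxonomic criteria.

```python
def novelty_level(identity_pct, species_cutoff=97.0, genus_cutoff=95.0):
    """Compare a 16S rRNA gene identity (%) against rule-of-thumb cutoffs."""
    if identity_pct <= genus_cutoff:
        return "below genus-level cutoff"
    if identity_pct < species_cutoff:
        return "below species-level cutoff"
    return "within species-level cutoff"


# Identities reported in this section
closest_relative = novelty_level(91.0)
next_closest = novelty_level(88.0)

# Same helper with the stricter species cutoff (98.65%) and a hypothetical identity
stricter = novelty_level(98.0, species_cutoff=98.65)
```

Both the 91% and 88% identities fall below the genus-level cutoff under either species threshold.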

Figure 4.1. 16S rRNA gene phylogeny of Ca. Acetocimmeria pyornia and closest taxonomic relatives. The evolutionary history of Ca. Acetocimmeria pyornia was inferred using the Neighbor-Joining method (Saitou & Nei, 1987) in MEGA5 (Tamura et al., 2011), with the optimal tree shown. The percentage of 500 replicate trees in which the associated taxa clustered together (bootstrap test) is shown next to the branches if greater than 50% (Felsenstein, 1985). Organisms whose taxonomic identification is in red are genomes originating from this 1301A olivine metagenome (Chapter 3). Organisms in blue are closely related uncultured bacterial clones from the Juan de Fuca Ridge borehole system and include clones from IODP Holes 1301A, 1026B, and 1362A; see Supplementary Table 4.1 for more information. Organisms in green have the Wood-Ljungdahl pathway.


Genome Attributes

The genome of Ca. A. pyornia is 91.5% complete, with only 3.96% contamination detected (Chapter 3). Of 2725 predicted coding sequences, 1446 genes (53.1%) were annotated to an orthologous functional group (KO number) within the KEGG database. In general, Ca. A. pyornia’s genome features were more closely aligned with those of the closely related acetogens Ca. D. audaxviator (Chivian et al., 2008) and M. thermoacetica (Pierce et al., 2008) than with those of D. hafniense, especially with respect to genome size and the number of protein-encoding genes. D. hafniense’s 16S rRNA gene was not closely related to that of Ca. A. pyornia (Figure 4.1), but its ffh gene phylogeny was more closely aligned. D. hafniense’s genome is roughly twice the size, with double the number of protein-encoding genes (Nonaka et al., 2006). The G + C content, which can reflect environmental factors, was similar among all acetogens in this study (~50%) with the exception of Ca. D. audaxviator, whose G + C content is much higher (60.9%; Chivian et al., 2008). When Ca. A. pyornia’s genome attributes were compared to the seven other high-quality bacterial genomes from the same community, Ca. A. pyornia had the highest percentage of genes for amino acid metabolism and respiration and the lowest percentage of carbohydrate metabolism genes.

There were at least 21 genes and 15 peptidase genes identified in the Ca. A. pyornia genome, more than any other member of this community.

Wood-Ljungdahl Pathway and the acs Gene Cluster

Ca. A. pyornia contains the complete Wood-Ljungdahl pathway for acetogenesis (Figure 4.2). This includes genes that code for enzymes and electron transfer agents needed to complete both branches of the pathway. This organism’s genome also contains the complete phosphate acetyltransferase-acetate kinase pathway to transform acetyl-CoA to acetate and recover the energy expended earlier in the pathway (Figure 4.2). Genes for the carbonyl branch of the Wood-Ljungdahl pathway are contained within an acs gene cluster that is commonly found in acetogens (Figure 4.3; Table 4.1). The acs gene cluster of Ca. A. pyornia most closely resembles that of Ca. D. audaxviator, with only slight differences in the annotations of a 4Fe-4S ferredoxin gene, which is specifically denoted as heterodisulfide reductase A (hdrA) in Ca. A. pyornia. The acs gene clusters of M. thermoacetica and D. hafniense were less similar to the acs gene cluster of Ca. A. pyornia, mainly due to the addition or loss of genes within the cluster.

Incomplete TCA Cycle

The genome of Ca. A. pyornia encodes an incomplete TCA cycle. The genes coding for the enzymes malate dehydrogenase and citrate synthase are missing (Figure 4.4). Malate dehydrogenase transforms malate into oxaloacetate, and citrate synthase uses oxaloacetate and acetyl-CoA to produce citrate and begin the cycle again. However, if portions of the TCA cycle are run in reverse, key biosynthetic intermediates can still be produced for metabolic needs. This organism does contain the oxaloacetate-decarboxylating malate dehydrogenase, which is situated in a TCA cluster on the same contig immediately following the acs cluster from the Wood-Ljungdahl pathway (Table 4.1). This portion of the contig contains genes coding for fumarate hydratase, the aforementioned malate dehydrogenase (oxaloacetate-decarboxylating), all four succinate dehydrogenase/fumarate reductase subunits, and succinyl-CoA synthetase; however, the oxaloacetate-decarboxylating malate dehydrogenase is not known to produce oxaloacetate, only to decarboxylate it to malate (the reverse step in the TCA cycle). At least one other pathway that results in oxaloacetate formation was identified. The enzyme aspartate aminotransferase can produce oxaloacetate and L-glutamate from L-aspartate and α-ketoglutarate (Figure 4.4). Oxaloacetate can also be converted to pyruvate using oxaloacetate decarboxylase. The gene for the citrate lyase beta subunit was also present; therefore, it is possible that other genes not identified could be present in the genome, allowing Ca. A. pyornia to produce oxaloacetate from citrate in the reverse direction (Ragsdale & Pierce, 2008). Ca. A. pyornia contains at least two complete pathways for the production of acetate from acetyl-CoA (Figure 4.4), and can convert pyruvate to phosphoenolpyruvate (PEP), and vice versa.

Oxidative Phosphorylation

Ca. A. pyornia contains all but one gene needed to construct the complete anaerobic electron transport chain (ETC) for oxidative phosphorylation. We found the near-complete pathway for Complex I, the complete pathway for Complex II of the ETC, and the complete pathway for the prokaryotic F-type ATPase (Supplementary Table 4.2). Complex I, or NADH:quinone oxidoreductase (NADH dehydrogenase), is a multi-subunit complex consisting of fourteen gene products, thirteen of which are present in the Ca. A. pyornia genome (subunits A – F and H – N, but not G; Supplementary Table 4.2). We also found the complete pathway for succinate dehydrogenase, which consists of four genes that code for the four subunits of this enzyme, sdhABCD (Supplementary Table 4.2). M. thermoacetica, Ca. D. audaxviator, and D. hafniense all lack the complete gene set to make succinate dehydrogenase (Supplementary Table 4.2), but each has at least one subunit in its genome. All four acetogens discussed in this manuscript have the F-type ATPase.

Figure 4.2. The Wood-Ljungdahl pathway for carbon fixation. The acetogenic Wood-Ljungdahl (or acetyl-CoA) pathway for carbon fixation uses molecular hydrogen and carbon dioxide to produce acetate, generating ATP. If needed, acetyl-CoA can be rerouted for biomass production. The enzymes in this pathway are as follows: a: formate dehydrogenase; b: formyl-H4folate synthase; c: formyl-H4folate cyclohydrolase; d: methylene-H4folate dehydrogenase; e: methylene-H4folate reductase; f: methyltransferase; g: carbon monoxide dehydrogenase; h: acetyl-CoA synthase; i: phosphotransacetylase; j: acetate kinase. H4folate = tetrahydrofolate (THF). The genome of Ca. A. pyornia also contains the complete phosphate acetyltransferase-acetate kinase pathway.


Figure 4.3. Arrangement of acs gene cluster of the Wood-Ljungdahl pathway on contig-65_203 of Ca. Acetocimmeria pyornia as compared to closely related acetogens. Descriptions of abbreviated gene IDs and relevant Enzyme Commission (EC) numbers (in brackets) are: acsA/cooS: acetyl-CoA synthase alpha subunit/carbon-monoxide dehydrogenase catalytic subunit [EC:1.2.7.4]; acsB: acetyl-CoA synthase beta subunit [EC:2.3.1.169]; acsC/cdhE: acetyl-CoA decarbonylase/synthase complex subunit gamma [EC:2.1.1.245]; ferredoxin (RefSeq): no KEGG orthology (KO) assigned; acsF/cooC: CO dehydrogenase maturation factor; cas: cobyrinic acid a,c-diamide synthase (RefSeq), no KO assigned; acsD/cdhD: acetyl-CoA decarbonylase/synthase complex subunit delta [EC:2.1.1.245]; acsE: 5-methyltetrahydrofolate corrinoid/iron sulfur protein methyltransferase [EC:2.1.1.258]; hdrC: heterodisulfide reductase subunit C; hp: hypothetical protein; hdrA: heterodisulfide reductase subunit A [EC:1.8.98.1]; 4Fe-4S ferredoxin (RefSeq): no KO assigned; d: methyl-viologen-reducing hydrogenase subunit delta (RefSeq), no KO assigned; zfp: zinc-finger protein (RefSeq), no KO assigned; metF: methylenetetrahydrofolate reductase (NADPH) [EC:1.5.1.20].


Table 4.1. KEGG annotations of contig-65_203 containing the acs gene cluster of the Wood-Ljungdahl pathway for carbon fixation. Contig-65_203 contains 25 genes, here numbered contig-65_203_1 – contig-65_203_25. Some genes do not have matching orthologous proteins in the KEGG database and the gene or function is therefore unassigned. Some of these matches do have “second best” matches, which may be used to infer the closest function. KO=KEGG Orthology.

Gene ID for Contig-65_203 | Gene Definition | KO Best Match | Second Best Match
contig-65_203_1 | Unassigned | None |
contig-65_203_2 | Unassigned | None |
contig-65_203_3 | cooS; carbon-monoxide dehydrogenase catalytic subunit [EC:1.2.7.4] | K00198 |
contig-65_203_4 | acsB; acetyl-CoA synthase [EC:2.3.1.169] | K14138 |
contig-65_203_5 | cdhE; acetyl-CoA decarbonylase/synthase complex subunit gamma [EC:2.1.1.245] | K00197 |
contig-65_203_6 | Unassigned | None |
contig-65_203_7 | cooC; CO dehydrogenase maturation factor | K07321 |
contig-65_203_8 | cdhD; acetyl-CoA decarbonylase/synthase complex subunit delta [EC:2.1.1.245] | K00194 |
contig-65_203_9 | acsE; 5-methyltetrahydrofolate corrinoid/iron sulfur protein methyltransferase [EC:2.1.1.258] | K15023 |
contig-65_203_10 | hdrA; heterodisulfide reductase subunit A [EC:1.8.98.1] | K03388 |
contig-65_203_11 | Unassigned | None | K14127
contig-65_203_12 | Unassigned | None |
contig-65_203_13 | metF; methylenetetrahydrofolate reductase (NADPH) [EC:1.5.1.20] | K00297 |
contig-65_203_14 | Unassigned | None | K17836
contig-65_203_15 | E4.2.1.2AA; fumarate hydratase subunit alpha [EC:4.2.1.2] | K01677 |
contig-65_203_16 | E4.2.1.2AB; fumarate hydratase subunit beta [EC:4.2.1.2] | K01678 |
contig-65_203_17 | ME2; malate dehydrogenase (oxaloacetate-decarboxylating) [EC:1.1.1.38] | K00027 |
contig-65_203_18 | sdhC; succinate dehydrogenase / fumarate reductase, cytochrome b subunit | K00241 |
contig-65_203_19 | sdhD; succinate dehydrogenase / fumarate reductase, membrane anchor subunit | K00242 | K00241
contig-65_203_20 | sdhA; succinate dehydrogenase / fumarate reductase, flavoprotein subunit [EC:1.3.5.1 1.3.5.4] | K00239 |
contig-65_203_21 | sdhB; succinate dehydrogenase / fumarate reductase, iron-sulfur subunit [EC:1.3.5.1 1.3.5.4] | K00240 |
contig-65_203_22 | sucC; succinyl-CoA synthetase beta subunit [EC:6.2.1.5] | K01903 |
contig-65_203_23 | sucD; succinyl-CoA synthetase alpha subunit [EC:6.2.1.5] | K01902 |
contig-65_203_24 | Unassigned | None | K15876
contig-65_203_25 | Unassigned | None | K01181

Figure 4.4. Tricarboxylic Acid Cycle and related pathways in Ca. A. pyornia. Ca. A. pyornia has an incomplete TCA cycle, and is missing genes that code for two enzymes: citrate synthase and malate dehydrogenase (red arrows o and g). Oxaloacetate can still be produced from L-aspartate and 2-oxoglutarate, and oxaloacetate can be decarboxylated to pyruvate. Acetate can be produced through multiple pathways, including the phosphate acetyltransferase-acetate kinase pathway. The enzymes labeled a – t are as follows: a: pyruvate, orthophosphate dikinase, b: , c: , d: acetyl-CoA synthase, e: putative phosphotransacetylase, f: acetate kinase, g: citrate synthase, h & i: aconitase, j: isocitrate dehydrogenase, k: α-ketoglutarate dehydrogenase, l: succinyl-CoA synthetase, m: succinate dehydrogenase, n: fumarase, o: malate dehydrogenase, p: aspartate aminotransferase, q: oxaloacetate decarboxylase, r: pyruvate synthase, s: ATP citrate lyase, t: malate dehydrogenase (oxaloacetate-decarboxylating)/malic enzyme. PEP = phosphoenolpyruvate.


Incomplete Sulfate Reduction Pathway

The Ca. A. pyornia genome does not contain genes that allow it to perform dissimilatory sulfate reduction (Table 4.2). Dissimilatory sulfate reductase genes (dsrA and dsrB) were not present in the genome, although the sat and aprAB genes, whose enzyme products catalyze the reversible transformation of sulfate to sulfite, were detected (Table 4.2).

Transporters

The only pathways for importers present in the Ca. A. pyornia genome are those for proteinaceous compounds, minerals, and metals (Supplementary Tables 4.3 & 4.4; Figure 4.5). There are complete pathways for the production of oligopeptide, peptide/nickel, and branched-chain amino acid transporters. The branched-chain amino acid transport system and its associated aminotransferase allow for the utilization of leucine, isoleucine, and valine. Ca. A. pyornia contains both the transporter and the branched-chain amino acid aminotransferase, and has one of three enzymes that make up the branched-chain α-keto acid dehydrogenase complex, which allows for the degradation of these amino acids to produce energy. There were also complete pathways for iron, nickel, molybdate, and tungstate transporters, some of which are required for Wood-Ljungdahl pathway enzymes or respiratory activity.

Ca. A. pyornia appears unable to import lipids or carbohydrate molecules from the environment, since the pathways that are used to make these transporters were incomplete or missing from its genome. These pathways include the saccharide, polyol, and lipid transport system and the system (Supplementary Table 4.3).


Table 4.2. Complete pathways of carbon fixation and energy metabolism in Ca. A. pyornia (Ca. Apy) and three closely related acetogens as determined by the KEGG pathway module. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG module | Discrete pathway | Organism (Mta, Dau, Dsy, Ca. Apy)
Energy metabolism
Carbon fixation
M00377 | Reductive acetyl-CoA pathway (Wood-Ljungdahl pathway) | (+) + + +
M00579 | Phosphate acetyltransferase-acetate kinase pathway, acetyl-CoA => acetate | + + +
Nitrogen metabolism
M00175 | Nitrogen fixation, nitrogen => ammonia | + +
M00530 | Dissimilatory nitrate reduction, nitrate => ammonia | +
Methane metabolism
M00422 | Acetyl-CoA pathway, CO2 => acetyl-CoA | +
Sulfur metabolism
M00596 | Dissimilatory sulfate reduction, sulfate => H2S | + (-)


Figure 4.5. Metabolic pathways of Ca. A. pyornia. This organism contains the complete Wood-Ljungdahl pathway for carbon fixation, hydrogenases, and no carbohydrate or lipid importers. It is predicted that this organism is able to take advantage of hydrogen production on Fe2+-bearing minerals like olivine as they react with seawater. Hydrogen and bicarbonate can be funneled into the Wood-Ljungdahl pathway to produce acetyl-CoA for biosynthesis and energy. This organism is missing many amino acid synthesis pathways and likely needs to import these for biosynthesis or energy. Individual transport or regulatory systems pathways (denoted by lower case letters here) are listed in Tables 4.2 – 4.4 and Supplementary Tables 4.2 – 4.4 under the headings that match the labels in this figure.


The genes involved in export of lipooligosaccharides, heme (used in membrane-bound succinate dehydrogenase), and other components related to the cell membrane or extracellular structures were also present (Supplementary Table 4.3).

The Sec secretion system, which is involved in the synthesis and secretion of , toxins, and pilus proteins like adhesin, was present in all acetogens in this study. Pathways for export of some lipooligosaccharides, which are likely membrane components, and heme, which is an integral part of the membrane-bound succinate dehydrogenase complex, were also present in the genome.

Central Carbohydrate Metabolism

The glycolysis pathway is incomplete in Ca. A. pyornia, and although the typical gluconeogenesis pathway is also incomplete, Ca. A. pyornia may still be able to use this pathway with an alternative enzyme that performs the same function. The glycolysis pathway is missing the step performed by bisphosphate aldolase (Table 4.3), which converts β-D-fructose 1,6-bisphosphate to D-glyceraldehyde 3-phosphate; however, the reverse of this step is not reported missing in the gluconeogenesis pathway, as Ca. A. pyornia contains the fructose 1,6-bisphosphate aldolase, . The gluconeogenesis pathway was deemed incomplete since the conversion of oxaloacetate to phosphoenolpyruvate via phosphoenolpyruvate (PEP) carboxykinase is missing, but this organism encodes the oxaloacetate decarboxylase and pyruvate orthophosphate-dikinase enzymes, which provide an alternative route to gluconeogenesis (Figure 4.4). These enzymes enable the conversion of oxaloacetate to pyruvate and then to PEP, from which gluconeogenesis can then proceed. There are complete pathways for the glycolysis 3-C compound core module and the non-oxidative pentose phosphate pathway. There is also no PTS system for , so exogenous glucose cannot be imported, phosphorylated, and subsequently used as a substrate. The pentose phosphate pathway allows this organism to produce 5-carbon compounds for biosynthesis, and the gluconeogenesis pathway allows it to produce 6-carbon compounds. M. thermoacetica, Ca. D. audaxviator, and D. hafniense all have complete pathways for glycolysis and the pentose phosphate pathway (Table 4.3).


Table 4.3. Complete pathways of carbohydrate and lipid metabolism in Ca. A. pyornia (Ca. Apy) and three closely related acetogens as determined by the KEGG pathway module. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG module | Discrete pathway | Organism (Mta, Dau, Dsy, Ca. Apy)
Energy metabolism
Carbohydrate and lipid metabolism
Central carbohydrate metabolism
M00001 | Glycolysis (Embden-Meyerhof pathway), glucose => pyruvate | + + + (-)
M00002 | Glycolysis, core module involving three-carbon compounds | + + + +
M00003 | Gluconeogenesis, oxaloacetate => fructose-6P | + + + (-)
M00010 | Citrate cycle, first carbon oxidation, oxaloacetate => 2-oxoglutarate | + (-)
M00307 | Pyruvate oxidation, pyruvate => acetyl-CoA | +
M00007 | Pentose phosphate pathway, non-oxidative phase, fructose 6P => ribose 5P | + + + +
M00308 | Semi-phosphorylative Entner-Doudoroff pathway, gluconate => glycerate-3P | (-)
M00005 | PRPP biosynthesis, ribose 5P => PRPP | + + + +
Other carbohydrate metabolism
M00741 | Propanoyl-CoA metabolism, propanoyl-CoA => succinyl-CoA | (-)
M00549 | Nucleotide sugar biosynthesis, glucose => UDP-glucose | + (-)
Fatty acid metabolism
M00082 | Fatty acid biosynthesis, initiation | (-)
M00083 | Fatty acid biosynthesis, elongation | + + + +
M00086 | beta-Oxidation, acyl-CoA synthesis | +
Lipid metabolism
M00093 | Phosphatidylethanolamine (PE) biosynthesis, PA => PS => PE | +
Terpenoid backbone biosynthesis
M00096 | C5 isoprenoid biosynthesis, non-mevalonate pathway | + +
M00364 | C10-C20 isoprenoid biosynthesis, bacteria | +

Table 4.4. Complete pathways of nucleotide and amino acid metabolism in Ca. A. pyornia (Ca. Apy) and three closely related acetogens as determined by the KEGG pathway module. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG module | Discrete pathway | Organism (Mta, Dau, Dsy, Ca. Apy)
Energy metabolism
Nucleotide and amino acid metabolism
M00048 | Inosine monophosphate biosynthesis, PRPP + glutamine => IMP | + (-)
M00049 | Adenine ribonucleotide biosynthesis, IMP => ADP,ATP | + + + +
M00050 | Guanine ribonucleotide biosynthesis IMP => GDP,GTP | + + + +
Pyrimidine metabolism
M00051 | Uridine monophosphate biosynthesis, glutamine (+PRPP) => UMP | (-)
M00052 | Pyrimidine ribonucleotide biosynthesis, UMP => UDP/UTP,CDP/CTP | + + + +
Serine and threonine metabolism
M00018 | Threonine biosynthesis, aspartate => homoserine => threonine | + + + +
Cysteine and methionine metabolism
M00021 | Cysteine biosynthesis, serine => cysteine | + + + +
Branched-chain amino acid metabolism
M00019 | Valine/isoleucine biosynthesis, pyruvate => valine / 2-oxobutanoate => isoleucine | + + + +
M00570 | Isoleucine biosynthesis, threonine => 2-oxobutanoate => isoleucine | + + +
M00432 | Leucine biosynthesis, 2-oxoisovalerate => 2-oxoisocaproate | + + + +
Lysine metabolism
M00526 | Lysine biosynthesis, DAP dehydrogenase pathway, aspartate => lysine | + (-)
M00527 | Lysine biosynthesis, DAP aminotransferase pathway, aspartate => lysine | + + + +
Arginine and proline metabolism
M00015 | Proline biosynthesis, glutamate => proline | + + + +
M00028 | Ornithine biosynthesis, glutamate => ornithine | + + + (-)
Histidine metabolism
M00045 | Histidine degradation, histidine => N-formiminoglutamate => glutamate | (-)
M00026 | Histidine biosynthesis, PRPP => histidine | + + + +
Aromatic amino acid metabolism
M00022 | Shikimate pathway, phosphoenolpyruvate + erythrose-4P => chorismate | + + + +
M00023 | Tryptophan biosynthesis, chorismate => tryptophan | + + +
M00024 | Phenylalanine biosynthesis, chorismate => phenylalanine | (-)
M00025 | Tyrosine biosynthesis, chorismate => tyrosine | (-)
Cofactor and vitamin biosynthesis
M00127 | Thiamine biosynthesis, AIR => thiamine-P/thiamine-2P | + +
M00125 | Riboflavin biosynthesis, GTP => riboflavin/FMN/FAD | +
M00115 | NAD biosynthesis, aspartate => NAD | + +
M00119 | Pantothenate biosynthesis, valine/L-aspartate => pantothenate | + (-)
M00120 | Coenzyme A biosynthesis, pantothenate => CoA | + + + +
M00123 | Biotin biosynthesis, pimeloyl-ACP/CoA => biotin | +
M00122 | Cobalamin biosynthesis, cobinamide => cobalamin | (-)
M00140 | C1-unit interconversion, prokaryotes | + + +
M00141 | C1-unit interconversion, eukaryotes | +
Polyamine biosynthesis
M00133 | Polyamine biosynthesis, arginine => agmatine => putrescine => spermidine | + + +

Discussion

Genome Attributes

The genome of Ca. A. pyornia is most similar to those of M. thermoacetica and Ca. D. audaxviator; however, the large number of unannotated predicted coding genes highlights the paucity of genomic information obtained from the deep marine subsurface and our limited understanding of the dominant processes in the subseafloor. Just over half of the genes were annotated using the KEGG database, even with the fully sequenced genomes of related acetogens publicly available. Only 53.1% of the genes in Ca. A. pyornia’s genome were annotated, compared with the 70.13% reported for M. thermoacetica, a difference of about 17 percentage points that illustrates the stark contrast between cultured and uncultured members of the microbial majority.
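The comparison above can be checked with a few lines of arithmetic. The sketch below assumes that both reported values (53.1% and 70.13%) are the annotated fraction of predicted genes; the variable names are illustrative only and do not come from the analysis pipeline.

```python
# Annotated fractions of predicted genes, taken from the text above.
# Assumption: both values are the percentage of genes that received an annotation.
annotated_pct = {
    "Ca. A. pyornia": 53.1,
    "M. thermoacetica": 70.13,
}

# The unannotated fraction is the complement of the annotated fraction.
unannotated_pct = {
    name: round(100.0 - pct, 2) for name, pct in annotated_pct.items()
}

# Difference between the two genomes, in percentage points.
difference_pp = round(
    unannotated_pct["Ca. A. pyornia"] - unannotated_pct["M. thermoacetica"], 2
)

print(unannotated_pct)
print(f"difference: {difference_pp} percentage points")
```

Expressed this way, the gap is a difference in percentage points rather than a relative increase.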

Evolutionary Relationship to the Acetogen Firmicutes Clade

The phylogenetic relationships of Ca. A. pyornia to other acetogens suggest that this is a novel organism within a group of related acetogenic Clostridia (Figure 4.1 & Supplementary Figure 4.1). This group now contains both marine and terrestrial organisms; however, the marine acetogens from oceanic crust have only recently been described (Jungbluth et al., 2017; Chapter 3). The terrestrial acetogen Ca. D. audaxviator was first described from deep continental crust (Chivian et al., 2008), D. hafniense from contaminated soil (Nonaka et al., 2006), and M. thermoacetica from horse manure (Fontaine et al., 1942). A new marine acetogen related to Ca. D. audaxviator, Ca. Desulfopertinax cowenii, was recently described from the JdFR flank aquifer fluid community (Jungbluth et al., 2017), and seven other putative acetogens were discovered from the JdFR olivine community, one of which is the Ca. A. pyornia described here. Considering the 16S rRNA and ffh gene phylogenies along with the acs gene cluster arrangement, Ca. D. audaxviator and M. thermoacetica appear to be the closest acetogenic relatives to Ca. A. pyornia.

Ca. Desulfopertinax cowenii (Jungbluth et al., 2017) and the 16S rRNA gene of Ca. A. pyornia matches the closely related JdFR clade containing 1301A_f_Peptococcaceae_Bin9 from this olivine community (Chapter 3; Smith et al., 2016), an uncultured bacterium clone from a nearby borehole (1026B3), and another clone from the fluid of Hole 1301A (1301A08 104; Supplementary Table 4.1). Ca. D. cowenii also possesses the Wood-Ljungdahl pathway; however, the genome is missing the formate dehydrogenase gene. The same enzyme was also missing from 1301A_f_Peptococcaceae_Bin9, suggesting that this Peptococcaceae clade of organisms from the JdFR aquifer, which comprised the largest portion of the fluid community at the time of retrieval but not of the attached community, may not be able to use both branches of the Wood-Ljungdahl pathway as Ca. A. pyornia can.

Ca. A. pyornia is evolutionarily distant from acetogens in the Clostridia clade and other closely related environmental sequences. It should be noted that for the genes analyzed here (ffh and 16S rRNA), the maximum similarity between those of Ca. A. pyornia and these closely related acetogens does not exceed 87%. The 16S rRNA gene similarity between M. thermoacetica and Ca. D. audaxviator is also 87%. The maximum 16S rRNA gene sequence similarity score for Ca. A. pyornia was 91% compared to a sequence obtained from an environmental sample in the JdFR flank fluids (Jungbluth et al., 2016), and this is a likely reason that the lowest classification level it has ever received is ‘Order Clostridiales’. Ca. A. pyornia represents at the very least a new genus within the Clostridiales since the cutoff value for genera is generally accepted to be ~95%.
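The genus-level argument above reduces to a simple threshold comparison. The following is a minimal sketch, assuming the ~95% 16S rRNA gene identity cutoff cited in the text; the function and variable names are hypothetical and chosen only for illustration.

```python
# Approximate 16S rRNA gene identity cutoff for genus, as cited in the text (~95%).
GENUS_CUTOFF_PCT = 95.0


def below_genus_cutoff(identity_pct: float, cutoff_pct: float = GENUS_CUTOFF_PCT) -> bool:
    """Return True when the best-hit identity falls below the genus-level cutoff."""
    return identity_pct < cutoff_pct


# Best-hit identities reported in the text for Ca. A. pyornia.
best_hits = {
    "closest environmental sequence (JdFR flank fluids)": 91.0,
    "closest related acetogen": 87.0,
}

for label, identity in best_hits.items():
    print(f"{label}: {identity}% -> below genus cutoff: {below_genus_cutoff(identity)}")
```

Both reported identities fall below the cutoff, which is consistent with placing the organism in at least a new genus.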

An Acetogen Dominates an Olivine Community

The presence of this acetogen may have a great impact on carbon cycling and the health and function of the suboceanic aquifer ecosystem and the deep ocean (McCarthy et al., 2010; Mason et al., 2009a). Ca. A. pyornia may be a prominent member of the attached community in Hole 1301A (and perhaps 1026B) as evidenced by its ubiquity in colonized rock and mineral biofilms (Smith et al., 2016; Orcutt et al., 2011) and because it encodes the full Wood-Ljungdahl carbon fixation pathway (Figure 4.2). This organism’s genome also encodes reductases and a ferredoxin that may be used to generate ATP via oxidative phosphorylation (Pierce et al., 2008).

Since this organism has only been described from mineral biofilms and does not appear to be a dominant member of the aquifer fluid community, this organism may benefit from a mineral biofilm lifestyle. A mineral biofilm can provide a wider range of substrates for metabolism, obtained either from the mineral as it reacts with water or from other members of the community. Since H2 can be produced by mineral surfaces as they react with water in the thermal aquifer (Mayhew et al., 2013), this organism may be using H2 to power the Wood-Ljungdahl pathway. Charged mineral surfaces can also attract other charged molecules like certain amino acids from aquifer fluid (Lin et al., 2015), effectively concentrating these nutrients near the mineral surface. There is also some indication that this organism may participate in a syntrophic relationship in the biofilm community, especially since Ca. A. pyornia is missing some complete biosynthetic pathways, such as those involved in amino acid biosynthesis, and is able to import small peptides (Table 4.4; Supplementary Table 4.3).

acs Gene Cluster Comparison

The co-localization of acs genes within a gene cluster is a common attribute of acetogens, and may allow this pathway to be horizontally transferred (Pierce et al., 2008). Previously, it was found that acs gene clusters come in several varieties (Pierce et al., 2008), and the gene arrangement of Ca. A. pyornia, along with the gene cluster that most closely resembles it (that of Ca. D. audaxviator), may form a new class of acs gene cluster that is unlike those previously described.

The gene clusters of Ca. A. pyornia and Ca. D. audaxviator are near-identical, with the only difference being the annotation for the 4Fe-4S ferredoxin (green arrow, Figure 4.3). This gene was annotated as hdrA, also a 4Fe-4S ferredoxin, but these genes may represent isoforms of the same heterodisulfide reductase.

The acs gene clusters of M. thermoacetica and D. hafniense are similar to the Ca. A. pyornia and Ca. D. audaxviator gene clusters in that they share the same acs genes in the same arrangement (Figure 4.3); however, there are some marked differences in each genome. M. thermoacetica’s genome contains another heterodisulfide reductase gene (hdrC) and a hypothetical protein (hp) after the acsE gene, and D. hafniense has a simplified gene cluster with no other associated genes after acsE. Both M. thermoacetica and D. hafniense also contain an acsF gene at the beginning of the acs cluster (Figure 4.3). M. thermoacetica also contains an unannotated isoform of the acsF gene within its cluster, the cobyrinic acid a,c-diamide synthase gene. Due to the similarity of Ca. A. pyornia’s acs gene cluster to that of Ca. D. audaxviator, it is likely that these organisms have a shared history relating to the Wood-Ljungdahl pathway, and these gene clusters may form a new cluster arrangement apart from the one M. thermoacetica or D. hafniense occupies.

Incomplete TCA Cycle

Ca. A. pyornia contains an incomplete TCA cycle and is missing citrate synthase and malate dehydrogenase. Although incomplete TCA cycles are common in Clostridia, such as the closely-related acetogens discussed here (Chivian et al., 2008; Pierce et al., 2008; Nonaka et al., 2006), the missing enzymes vary. All of the bacteria from the olivine community to which Ca. A. pyornia belongs (8 microbes: 2 ‘Bacteria’, 6 Clostridia) were also missing citrate synthase from their genomes, and the acetogens Ca. D. audaxviator and D. hafniense are also missing this enzyme. It is not known if Ca. A. pyornia can make citrate using alternative pathways. Ca. A. pyornia is also missing malate dehydrogenase, as is M. thermoacetica (Pierce et al., 2008); however, Ca. A. pyornia should still be able to make oxaloacetate through the aspartate aminotransferase reaction (Figure 4.4).

The other acetogens closely related to Ca. A. pyornia are also missing some TCA cycle genes; however, these varied among genomes. As noted above, among those that shared the same missing enzymes as Ca. A. pyornia, Ca. D. audaxviator and D. hafniense are both missing citrate synthase (Chivian et al., 2008; Nonaka et al., 2006), and M. thermoacetica is missing malate dehydrogenase (Pierce et al., 2008). Ca. D. audaxviator is missing the additional enzymes aconitase and α-ketoglutarate dehydrogenase, M. thermoacetica is also missing succinyl-CoA synthetase and fumarase, and D. hafniense is missing isocitrate dehydrogenase and succinate dehydrogenase. Ca. D. audaxviator and D. hafniense do contain fumarate reductase that can act similarly to succinate dehydrogenase; thus, these organisms can still complete the reversible enzymatic step that forms fumarate from succinate, depending on the conditions present (Chivian et al., 2008; Nonaka et al., 2006). However, Ca. A. pyornia is the only one of this group that contains all four succinate dehydrogenase genes that code for different subunits (Chapter 3).
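The pattern of missing TCA cycle enzymes described above can be tabulated with set operations. This is a minimal sketch in which the enzyme lists are transcribed from the text and are assumed to cover only the enzymes named there; the dictionary layout and names are illustrative and are not part of the original analysis.

```python
# Missing TCA cycle enzymes per genome, transcribed from the discussion above.
# Assumption: only enzymes explicitly named in the text are listed.
missing_tca = {
    "Ca. A. pyornia": {"citrate synthase", "malate dehydrogenase"},
    "Ca. D. audaxviator": {
        "citrate synthase",
        "aconitase",
        "alpha-ketoglutarate dehydrogenase",
    },
    "M. thermoacetica": {
        "malate dehydrogenase",
        "succinyl-CoA synthetase",
        "fumarase",
    },
    "D. hafniense": {
        "citrate synthase",
        "isocitrate dehydrogenase",
        "succinate dehydrogenase",
    },
}

reference = "Ca. A. pyornia"

# Enzymes that each related acetogen is missing in common with the reference genome.
shared_with_reference = {
    organism: sorted(missing_tca[reference] & enzymes)
    for organism, enzymes in missing_tca.items()
    if organism != reference
}

for organism, shared in shared_with_reference.items():
    print(f"{organism}: {', '.join(shared) or 'none'}")
```

Each related acetogen shares exactly one missing enzyme with Ca. A. pyornia, and none shares both.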

Oxidative Phosphorylation

Ca. A. pyornia may use oxidative phosphorylation during anaerobic respiration with formate as a terminal electron acceptor. Ca. A. pyornia contains the genes for succinate dehydrogenase (sdhABCD) and heme export (Complex II), has all but one gene coding for NADH dehydrogenase (Complex I), and has the complete gene set for production of the F-type ATPase (Supplementary Table 4.2). This genome also contains abundant hydrogenases, ferredoxins, and cytochromes that may be used to shuttle protons out of the cell to create a proton-motive force and drive oxidative phosphorylation (Chapter 3; Vignais & Billoud, 2007). These electron transport proteins may be coupled to the Wood-Ljungdahl pathway to provide additional energy for the cell, as has been proposed in M. thermoacetica (Pierce et al., 2008; Schuchmann & Müller, 2014). All genomes from the olivine community from which Ca. A. pyornia originated lacked at least one gene to complete NADH dehydrogenase, and most of the eleven high-quality genomes lacked more than one gene. Ca. A. pyornia and two other genomes lacked only one gene, and one of these was missing the gene that codes for the same subunit as Ca. A. pyornia, nuoG. This organism is not an acetogen and is unrelated to Ca. A. pyornia, as it belongs to the Archaea. Although these nuo genes were typically found in clusters of two or more throughout the olivine community genomes and were often associated with Wood-Ljungdahl pathway genes, there did not appear to be a conserved operon with potential unannotated or mis-annotated nuo genes. There is evidence that formate dehydrogenase and some NAD(P)+-dependent hydrogenases contain the nuoEFG module of NADH dehydrogenase (Friedrich & Scheide, 2000), so perhaps Ca. A. pyornia is able to couple these enzymes together for respiration or use an alternative module to complete the NADH dehydrogenase complex. The NADH dehydrogenase complex has also been shown to be coupled with hydrogenases such as ech, and Ca. A. pyornia has this and a myriad of other hydrogenases that may be candidates for such coupling (Chapter 3). Thus, it remains unknown if Ca. A. pyornia is able to synthesize a fully functional NADH dehydrogenase, but there may be a mechanism to allow it to complete anaerobic respiration. Among the other acetogens discussed here, the deep subsurface Ca. D. audaxviator only has the nuoEFG complex, whereas M. thermoacetica and D. hafniense contain the complete gene set (nuoA to nuoN) to synthesize this complex.
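The nuo gene complements described above can be compared against the full nuoA to nuoN set. The sketch below assumes the gene inventories exactly as stated in the text; the data structure and names are hypothetical and serve only to illustrate the comparison.

```python
import string

# The complete NADH dehydrogenase gene set runs from nuoA to nuoN (14 genes).
ALL_NUO = {f"nuo{letter}" for letter in string.ascii_uppercase[:14]}

# nuo genes present in each genome, as described in the text.
# Assumption: Ca. A. pyornia lacks only nuoG; Ca. D. audaxviator has only nuoEFG.
present_nuo = {
    "Ca. A. pyornia": ALL_NUO - {"nuoG"},
    "Ca. D. audaxviator": {"nuoE", "nuoF", "nuoG"},
    "M. thermoacetica": set(ALL_NUO),
    "D. hafniense": set(ALL_NUO),
}

# Genes each genome would need to complete the complex.
missing_nuo = {
    organism: sorted(ALL_NUO - genes) for organism, genes in present_nuo.items()
}

for organism, genes in missing_nuo.items():
    print(f"{organism}: missing {len(genes)} gene(s) {genes}")
```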

Lack of Sulfate Reduction and Carbohydrate/Lipid Degradative Pathways

Unlike typical acetogens, Ca. A. pyornia does not appear to use sulfate reduction, carbohydrates, or lipids for energy, although key genes may have been either unannotated or mis-annotated in our analysis. This may indicate a loss of function due to competition with others for sulfate or a possible syntrophic relationship whereby less energy can be devoted to producing substrates that can be obtained from others nearby. Although the Ca. A. pyornia genome is high-quality and is nearly complete, we are missing 8.5% of its genome that may contain key metabolism genes. The closely-related Ca. D. audaxviator is a sulfate reducer that uses nitrogen fixation and carbohydrate degradation to supplement its energy (Chivian et al., 2008). M. thermoacetica also uses carbohydrates (Pierce et al., 2008) and, among other metabolic strategies, D. hafniense is known to dechlorinate compounds for energy (Nonaka et al., 2006). Since Ca. A. pyornia appears unable to use these substrates and energy sources for supplemental metabolism, it may be using other substrates that are not typical of the other acetogens.

Transport and Degradation of Amino Acids and Oligopeptides

Although all acetogens considered in this study can import branched-chain amino acids and oligopeptides and contain a peptide/nickel transporter, Ca. A. pyornia may rely on these compounds as a source of energy. Ca. A. pyornia had more genes related to respiration and amino acid metabolism in its genome than any other organism in its community (4.86% and 9.17%, respectively), as well as a larger number of peptidases and proteases. This suggests that Ca. A. pyornia is either more dependent on amino acids and peptides as an additional source of energy than other members of its community or that it imports these compounds to spend less energy in producing them.

Dissolved amino acids are available in the JdFR aquifer, although their concentrations are lower than those typical of submarine hydrothermal fluid and are instead roughly equivalent to concentrations in deep seawater (1–13 nM for dissolved free amino acids and 43–89 nM for dissolved hydrolyzable amino acids; Lin et al., 2015).

The lower amino acid concentrations in the JdFR basement fluids relative to seawater could indicate interaction with adsorptive mineral surfaces as has been suggested (Lin et al., 2015), or it could be an indication of consumption by the microbial community.

Ca. A. pyornia could be one such organism contributing to this drawdown, and since it was detected here in mineral biofilms, it may be utilizing those amino acids adsorbing to the mineral surfaces as an energy or nitrogen source. For Ca. A. pyornia, it would be beneficial to use amino acid fermentation in this environment, especially since this organism does not appear to use or import other exogenous substrates, and using an uncommon metabolic strategy (Fonknechten et al., 2010) may mean less competition.

Although the pathways to import and begin the degradation of branched-chain amino acids are present in the Ca. A. pyornia genome, the degradation pathway is incomplete. It is possible that the fate of these amino acids lies in Stickland fermentation; however, we were unable to identify enzymes that could be used to ferment amino acids using this strategy. Stickland fermentation couples the oxidation and deamination of one amino acid to the reduction and deamination of another (Fonknechten et al., 2010; Nisman, 1954), producing ATP during a subsequent enzymatic reaction involving acetylphosphate. Ca. A. pyornia contains the enzymes glycine dehydrogenase and alanine dehydrogenase (Supplementary Table 4.5) to complete the deamination reaction; however, we could not identify an amino acid reductase that these could be coupled with. Glycine dehydrogenase is coupled with glycine hydroxymethyltransferase in the interconversion of glycine and serine during the production of a Wood-Ljungdahl pathway intermediate, 5,10-methylene THF (Fonknechten et al., 2010). Alanine dehydrogenase will deaminate alanine and convert it to pyruvate, while producing NADH and H+. The NADH and H+ are usually shuttled to an amino acid reductase to deaminate the other amino acid to acetylphosphate and then acetate. ATP is produced from each amino acid in the Stickland reaction when acetylphosphate is converted to acetate; therefore, it could be used as an alternative energy source for acetogens in environments like the suboceanic aquifer where carbohydrates and lipids are scarce.

Central Carbohydrate Metabolism

Ca. A. pyornia may be able to build 5- and 6-carbon sugars for biosynthesis of saccharides, nucleotides, and energy and enzymatic cofactors via a modified gluconeogenesis pathway originating from pyruvate. The enzyme pyruvate, orthophosphate dikinase allows for the conversion of pyruvate to phosphoenolpyruvate, which is one of the steps of the Crassulacean acid metabolism (CAM) carbon fixation pathway (Table 4.2; Supplementary Table 4.5; Figure 4.4). Ca. A. pyornia is unable to complete glycolysis, which is unusual among the closely-related acetogens (Table 4.3); however, the lack of sugar importers along with the missing genes for the glycolysis pathway suggests that Ca. A. pyornia is not utilizing exogenous carbohydrates and must therefore synthesize these biomolecules from pyruvate using the modified gluconeogenesis pathway.

Other Carbon Metabolisms

Ca. A. pyornia is able to synthesize nucleotides and some amino acids by using the non-oxidative pentose phosphate pathway (Table 4.3 and Figure 4.6); however, the pathway for lipid synthesis is incomplete. Ca. A. pyornia is missing the gene necessary to make acetyl-CoA carboxylase (ACC), which forms malonyl-CoA from acetyl-CoA; however, this initiation of fatty acid synthesis is also incomplete in the other closely-related acetogens (Table 4.3). All of these acetogens are able to elongate fatty acids, so they may have an unknown pathway for malonyl-CoA synthesis.

Conclusion

Ca. A. pyornia represents a novel deep-sea thermophilic acetogen that may be utilizing amino acids instead of carbohydrates or sulfate reduction to supplement its carbon and energy metabolism. Its dominance in this subseafloor-incubated olivine community is unusual in that its acetogenic metabolism is not predicted under thermodynamic considerations. This organism’s products may be a source of carbon for others in the community, particularly since it is potentially capable of chemoautotrophy and produces acetate as a waste product. Ca. A. pyornia’s genome indicates that this organism is uniquely suited to life on Fe-bearing minerals in the subseafloor, potentially utilizing molecular hydrogen and amino acids adsorbed to mineral surfaces. Its metabolic pathways include the complete Wood-Ljungdahl pathway for acetogenesis and it utilizes enzymes from other partial carbon fixation pathways including the reverse TCA cycle, the CAM light reaction in photosynthesis, and the reductive pentose phosphate cycle. The unusual metabolisms of Ca. A. pyornia were previously undetected in the suboceanic crustal aquifer, and may shed new light on the function of this vast ecosystem.

References

Brazelton WJ, Schrenk MO, Kelley DS, Baross JA. (2006). Methane- and sulfur-metabolizing microbial communities dominate the Lost City hydrothermal field ecosystem. Appl Environ Microbiol 72:6257–70.

Chivian D, Brodie E, Alm E, Culley D. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322:275–278.

Drake HL, Daniel SL, Küsel K, Matthies C, Kuhner C, Braus-Stromeyer S. (1997). Acetogenic bacteria: what are the in situ consequences of their diverse metabolic versatilities? BioFactors 6:13–24.

Edwards KJ, Wheat CG, Sylvan JB. (2011). Under the sea: microbial life in volcanic oceanic crust. Nat Rev Microbiol 9:703–712.

Felsenstein J. (1985). Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution (N Y) 39:783–791.

Fisher AT, Wheat CG, Becker K, Davis EE, Jannasch H, Schroeder D, et al. (2005). Scientific and technical design and deployment of long-term subseafloor observatories for hydrogeologic and related experiments, IODP Expedition 301, eastern flank of Juan de Fuca Ridge 1 and general design. Proc Integr Ocean Drill Progr 301. doi:10.2204/iodp.proc.301.103.2005.

Fonknechten N, Chaussonnerie S, Tricot S, Lajus A, Andreesen JR, Perchat N, et al. (2010). Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence. BMC Genomics 11:555.

Fontaine FE, Peterson WH, McCoy E, Johnson MJ, Ritter GJ. (1942). A New Type of Glucose Fermentation by Clostridium thermoaceticum. J Bacteriol 43:701–715.

Friedrich T, Scheide D. (2000). The respiratory complex I of bacteria, archaea and eukarya and its module. Febs Lett 479:1.

Heberling C, Lowell RP, Liu L, Fisk MR. (2010). Extent of the microbial biosphere in the oceanic crust. Geochemistry Geophys Geosystems 11:1–15.

Jukes TH, Cantor CR, Munro HN. (1969). Evolution of protein molecules. Mamm protein Metab 3:132.

Jungbluth SP, Bowers RM, Lin H, Cowen JP, Rappé MS. (2016). Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt. ISME J 10:1–15.

Jungbluth SP, del Rio TG, Tringe SG, Stepanauskas R, Rappé MS. (2017). Genomic comparisons of a bacterial lineage that inhabits both marine and terrestrial deep subsurface systems. PeerJ 1–22.

Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, D’Hondt S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci U S A 109:16213–6.

Kanehisa M, Sato Y, Morishima K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol 428:726–731.

Kim M, Oh HS, Park SC, Chun J. (2014). Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol 64:346–351.

Laczny CC, Pinel N, Vlassis N, Wilmes P. (2014). Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Sci Rep 4:4516.

Lever MA. (2012). Acetogenesis in the energy-starved deep biosphere-a paradox? Front Microbiol 2. doi:10.3389/fmicb.2011.00284.

Lever M, Rouxel O, Alt J, Shimizu N. (2013). Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Amend J, LaRowe DE, Bingham J-P, Cowen JP. (2015). Dissolved amino acids in oceanic basaltic basement fluids. Geochim et Cosmo Acta 356:155–159. doi:10.1016/j.gca.2015.04.044.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. (2014). Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Mason OU, Di Meo-Savoie CA, Van Nostrand JD, Zhou J, Fisk MR, Giovannoni SJ. (2009). Prokaryotic diversity, distribution, and insights into their role in biogeochemical cycling in marine basalts. ISME J 3:231–42.

Mayhew LE, Ellison ET, McCollom TM, Trainor TP, Templeton AS. (2013). Hydrogen generation from low-temperature water–rock reactions. Nat Geo 6. doi:10.1038/NGEO1825.

McCarthy MD, Beaupré SR, Walker BD, Voparil I, Guilderson TP, Druffel ERM. (2010). Chemosynthetic origin of 14C-depleted dissolved organic matter in a ridge-flank hydrothermal system. Nat Geosci 4:32–36.

Nisman B. (1954). THE STICKLAND REACTION. Bacteriol Rev 18:16–42.

Nitschke W, Russell MJ. (2013). Beating the acetyl coenzyme A-pathway to the origin of life. Philos Trans R Soc Lond B Biol Sci 368:20120258.

Nonaka H, Keresztes G, Shinoda Y, Ikenaga Y, Abe M, Naito K, et al. (2006). Complete genome sequence of the dehalorespiring bacterium Desulfitobacterium hafniense Y51 and comparison with Dehalococcoides ethenogenes 195. J Bacteriol 188:2262–2274.

Orcutt BN, Bach W, Becker K, Fisher AT, Hentscher M, Toner BM, et al. (2011). Colonization of subsurface microbial observatories deployed in young ocean crust. ISME J 5:692–703.

Orcutt BN, Sylvan JB, Rogers DR, Delaney J, Lee RW, Girguis PR. (2015). Carbon fixation by basalt-hosted microbial communities. Front Microbiol 6:1–14.

Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–55.

Peng Y, Leung HCM, Yiu SM, Chin FYL. (2012). IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.

Pierce E, Xie G, Barabote R, Saunders E, Han C, Detter J, et al. (2008). The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol 10:2550–73.

Ragsdale SW. (2008). Enzymology of the Wood-Ljungdahl pathway of acetogenesis. Ann NY Acad Sci 1125:129–136.

Ragsdale SW, Pierce E. (2008). Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation. Biochim Biophys Acta - Proteins Proteomics 1784:1873–1898.

Saitou N, Nei M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.

Schuchmann K, Müller V. (2014). Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat Rev Microbiol 12:809–821.

Simpson JT, Durbin R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22:549–556.

Smith A, Popa R, Fisk M, Nielsen M, Wheat CG, Jannasch HW, et al. (2011). In situ enrichment of ocean crust microbes on igneous minerals and glasses using an osmotic flow-through device. Geochemistry Geophys Geosystems 12:1–19.

Smith AR, Fisk MR, Thurber AR, Flores GE, Mason OU, Popa R, et al. (2016). Deep Crustal Communities of the Juan de Fuca Ridge Are Governed by Mineralogy. Geomicrobiol J 451:0.

Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, et al. (2013). Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods 10:1196–1199.

Takai K, Gamo T, Tsunogai U, Nakayama N, Hirayama H, Nealson KH, et al. (2004). Geochemical and microbiological evidence for a hydrogen-based, hyperthermophilic subsurface lithoautotrophic microbial ecosystem (HyperSLiME) beneath an active deep-sea hydrothermal field. Extremophiles 8:269–82.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–9.

Thór Marteinsson V, Rúnarsson A, Stefánsson A, Thorsteinsson T, Jóhannesson T, Magnússon SH, et al. (2012). Microbial communities in the subglacial waters of the Vatnajökull ice cap, Iceland. ISME J 1–11.

Ver Eecke HC, Butterfield DA, Huber JA, Lilley MD, Olson EJ, Roe KK, et al. (2012). Hydrogen-limited growth of hyperthermophilic methanogens at deep-sea hydrothermal vents. Proc Natl Acad Sci U S A 109:13674–9.

Vignais PM, Billoud B. (2007). Occurrence, Classification, and Biological Function of Hydrogenases: An Overview. Chem Rev 107:4206–4272.

Wang H, Edwards KJ. (2009). Bacterial and Archaeal DNA extracted from inoculated experiments: implication for the optimization of DNA extraction from deep-sea basalts. Geomicrobiol J 26:463–469.

Whitman WB, Coleman DC, Wiebe WJ. (1998). Prokaryotes: the unseen majority. Proc Natl Acad Sci 95:6578.

5. SYNTHESIS AND CONCLUSION

The goals of this dissertation were to address a number of key questions about the attached microbial communities of the Juan de Fuca Ridge (JdFR) aquifer biosphere. These questions were:

1. Does mineralogy influence microbial community structure?

2. What carbon and energy metabolisms are present in the genomes of organisms from the JdFR mineral-colonizing communities?

3. Is there genomic evidence that these mineral biofilm communities possess the functional capability to use the Wood-Ljungdahl pathway?

To address these questions, a subseafloor microbial observatory was used at IODP Hole 1301A on the eastern flank of the JdFR. Igneous minerals and glasses common in oceanic crust (e.g., basalt glass, olivine, plagioclase, pyroxene, and others) were incubated in the basaltic basement at Hole 1301A for four years. The minerals and glasses were placed in flow cells attached to an osmotic pump that pulled aquifer fluid through the chambers at a slow rate (< 200 mL yr-1). Upon recovery of the flow cells in 2008, microbial DNA obtained from surface biofilms was analyzed using genomic techniques. We used 454 Pyrotag sequencing to determine community structure on individual mineral phases, and metagenome-derived genomes were produced to identify carbon and energy pathways of community members.

Mineralogy and composition were expected to determine community structure since Fe-bearing minerals can potentially support more diverse metabolisms owing to their redox-active nature. Fe-bearing minerals and glasses contain either reduced (Fe2+) or oxidized (Fe3+) iron, and Fe2+ can react with seawater to produce H2 (Miller et al., 2017; Sleep et al., 2004), all of which can be used to generate energy for metabolism.

The olivine class of minerals also weathers more quickly, so it can provide a steady source of energy to support microbial communities until it becomes too oxidized or pore space is filled, preventing seawater circulation, as can happen in older crust. The bacterial communities on Fe-bearing minerals in this study were distinct in composition from those on Fe-poor minerals, even when incubated in the same borehole and exposed to the same fluid. The olivine class of minerals had a unique archaeal community, and this may be reflected in other regions of the crust where olivine is more common. Olivine and pyroxene-rich ultramafic crust comprises ~1/3 of the total length of oceanic ridges and may have distinct hydrogenotrophic communities influenced by the abundance of these Fe-bearing minerals. In the future, it would be beneficial to repeat this study using multiple boreholes, mineralogical environments, and locations to determine if olivine and other Fe-bearing minerals are influencing microbial communities throughout the crustal aquifer. This has been shown for surface crustal environments (Toner et al., 2013; Sylvan et al., 2013; Flores et al., 2011), and this work suggests it may continue deeper into the crust.

With respect to the metabolisms of these microbial communities, we expected to find genomic evidence of sulfate reduction, hydrogenotrophy, and carbon fixation.

In this oceanic crustal habitat, evidence of methanogenesis and sulfate reduction has been observed (Lever et al., 2013; Cowen et al., 2003; Robador et al., 2015; Lin et al., 2014), and the Calvin cycle was predicted to be the dominant carbon fixation pathway in all oceanic crust (Orcutt et al., 2015). However, based on the potential for Fe mineral-supported hydrogenotrophy and the presence of the Wood-Ljungdahl pathway in oceanic crust early in Earth’s history (Nitschke & Russell, 2013), we predicted that the Wood-Ljungdahl pathway was occurring on mineral surfaces in the deep oceanic aquifer. We determined that the incubated olivine biofilm, which contained similar organisms to the glass and plagioclase biofilms albeit with a unique community structure, was dominated by bacteria and archaea capable of using the Wood-Ljungdahl pathway. These microbes, being obtained from the surface of minerals, may be more common in the deep ocean crust than previously known since 1) most studies have traditionally focused on fluid sampling, and 2) the attached communities are predicted to be more representative of the whole aquifer community (Lehman, 2007). Although this study suggests that the Wood-Ljungdahl pathway may be an important carbon fixation pathway in the deep ocean crust, perhaps even more so in olivine-dominated crust where it could dominate, further studies of either incubated substrates or whole rock native communities in both basaltic and olivine-rich crust can help determine if these organisms or this pathway is significantly impacting the carbon cycle in the suboceanic aquifer or influencing productivity in the deep ocean.

We found seven novel thermophilic marine acetogens in these communities that were not closely related to cultured organisms and in some cases could not be classified beyond the class or order level. This adds significantly to the number of sequenced acetogen genomes, which currently stands at only ~10, or around 10% of all known acetogens (Shin et al., 2016). Although acetogens have been studied in culture for decades (Pierce et al., 2008; Fontaine et al., 1942), the mechanisms of energy conservation remain elusive. Even for the most intensively studied acetogens, there are many unanswered questions about which hydrogenases are involved, and many proteins of unknown function have yet to be identified. The marine acetogens from this study have a large number of unknown genes that could be the focus of future studies and allow us to better understand the energetics of acetogenesis.

The dominant acetogen in the olivine community, Ca. A. pyornia, was identified as a keystone species in its community and will serve as a model organism for the colonized minerals in the JdFR aquifer. It remains unclear how this organism obtains energy, but since its metabolic repertoire has now been elucidated, future expeditions to the JdFR can focus on culturing this organism for the purpose of determining rates of growth and carbon fixation in the laboratory as well as identifying the substrates it uses for energy. Growth and carbon fixation rates can be fed into biogeochemical models whose outcome will be a greater understanding of the role of such communities in ocean productivity and global carbon cycles.

This dissertation expands upon previous work in the JdFR aquifer by presenting evidence for previously unrecognized metabolic strategies for energy and carbon fixation in the subseafloor of the JdFR (i.e. acetogenesis via the Wood-Ljungdahl pathway) and provides support for the hypothesis that the basaltic aquifer contains H2-based lithoautotrophic microbial communities that are likely heterogeneously distributed in oceanic crust based on mineralogy, a key driver of microbial community structure.

Finally, an important feature of igneous environments is that they are found on other planets and moons in our solar system. Understanding how the suboceanic aquifer ecosystem functions is a key first step in the search for life on these other worlds, and focusing on how microbes and minerals interact can tell us what types of biosignatures are likely to be found there. Evidence of past oceans and subsurface water on Mars has been detected (Villanueva et al., 2015), and it is possible that subsurface microorganisms would be sheltered from the oxidizing atmosphere, temperature extremes, and harmful radiation that would hinder life at the surface. Ocean worlds like Europa and Enceladus would also be likely candidates for life. On Enceladus, sunlight is limited and liquid water is believed to flow through igneous crust. There, plumes of H2-bearing water are ejected through cracks in the Saturnian moon’s icy surface, indicating potential H2-generating hydrothermal processes in the igneous oceanic crust below (Waite et al., 2017). This molecular hydrogen could support microbial life, which may be analogous to the SLiMEs described on Earth. The findings presented in this dissertation expand our knowledge of life in Earth’s crust, and this information will be valuable in future efforts to understand life in the solar system: how it began and whether it still persists today on worlds beyond Earth.

References

Cowen JP, Giovannoni SJ, Kenig F, Johnson HP, Butterfield D, Rappé MS, et al. (2003). Fluids from aging ocean crust that support microbial life. Science 299:120–3.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI, et al. (2011). Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environ Microbiol 13:2158–71.

Fontaine FE, Peterson WH, McCoy E, Johnson MJ, Ritter GJ. (1942). A New Type of Glucose Fermentation by Clostridium thermoaceticum. J Bacteriol 43:701–715.

Lehman RM. (2007). Understanding of aquifer microbiology is tightly linked to sampling approaches. Geomicrobiol J 24:331–341.

Lever M, Rouxel O, Alt J, Shimizu N. (2013). Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. (2014). Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Miller HM, Mayhew LE, Ellison ET, Kelemen P, Kubo M, Templeton AS. (2017). Low temperature hydrogen production during experimental hydration of partially-serpentinized dunite. Geochim Cosmochim Acta 209:161–183.

Nitschke W, Russell MJ. (2013). Beating the acetyl coenzyme A-pathway to the origin of life. Philos Trans R Soc Lond B Biol Sci 368:20120258.

Orcutt BN, Sylvan JB, Rogers DR, Delaney J, Lee RW, Girguis PR. (2015). Carbon fixation by basalt-hosted microbial communities. Front Microbiol 6:1–14.

Pierce E, Xie G, Barabote R, Saunders E, Han C, Detter J, et al. (2008). The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol 10:2550–73.

Robador A, Jungbluth SP, LaRowe DE, Bowers RM, Rappé MS, Amend JP, et al. (2015). Activity and phylogenetic diversity of sulfate-reducing microorganisms in low-temperature subsurface fluids within the upper oceanic crust. Front Microbiol 6:1–13.

Shin J, Song Y, Jeong Y, Cho BK. (2016). Analysis of the core genome and pan-genome of autotrophic acetogenic bacteria. Front Microbiol 7. doi:10.3389/fmicb.2016.01531.

Sleep NH, Meibom A, Fridriksson T, Coleman RG, Bird DK. (2004). H2-rich fluids from serpentinization: geochemical and biotic implications. Proc Natl Acad Sci U S A 101:12818–23.

Sylvan JB, Sia TY, Haddad AG, Briscoe LJ, Toner BM, Girguis PR, et al. (2013). Low temperature geomicrobiology follows host rock composition along a geochemical gradient in Lau basin. Front Microbiol 4. doi:10.3389/fmicb.2013.00061.

Toner BM, Lesniewski RA, Marlow JJ, Briscoe LJ, Santelli CM, Bach W, et al. (2013). Mineralogy drives bacterial biogeography of hydrothermally inactive seafloor sulfide deposits. Geomicrobiol J 30:313–326.

Villanueva GL, Mumma MJ, Novak RE, Käufl HU, Hartogh P, Encrenaz T, et al. (2015). Strong water isotopic anomalies in the Martian atmosphere: Probing current and ancient reservoirs. Science 348:218–221.

Waite JH, Glein CR, Perryman RS, Teolis BD, Magee BA, Miller G, et al. (2017). Cassini finds molecular hydrogen in the Enceladus plume: Evidence for hydrothermal processes. Science 356:155–159.

6. APPENDICES

Appendix A: Chapter 2 Supplementary Material

Supplemental Table 2.1 Pyrotag read counts from domain-specific primer amplification of the v6v4 region of the 16S rRNA gene for the incubated minerals and glasses in this study.

Mineral Name          Archaeal Reads   Bacterial Reads
Forsterite (Fo100)    15002            16520
Olivine (Fo90)        19539            14099
Fayalite (Fo0)        575              14345
Hornblende            19495            18160
Basalt glass          19889            9490
Obsidian              22092            17065
Augite                14537            11416
Diopside              19352            17095
Totals                130,481          118,190

Supplemental Figure 2.1. Cluster-based community similarity for Archaea. Dotted lines indicate statistically significant groups determined by SIMPROF. These groups are represented in the Archaeal nMDS plot in Figure 2.22.

Supplemental Figure 2.2. Cluster-based community similarity for Bacteria with contaminants removed. Dotted lines indicate statistically significant groups determined by SIMPROF. These groups are represented in the Bacterial nMDS plot in Figure 2.2.

Supplemental Figure 2.3. (A) Archaeal and (B) Bacterial community compositions for the eight minerals and glasses incubated in the subseafloor prior to removing suspected laboratory contaminants. These compositional data were used to produce the nMDS plots in Supplemental Figure 2.1. (A) DSHVG6 is Deep Sea Hydrothermal Vent Group 6, a Halobacterium in . (B) Ralstonia and Burkholderia sequences constituted a significant portion of each mineral community and were removed in the analyses in the main text since they are common contaminants of laboratory reagents and the DNA extraction kit we used. MCG is Miscellaneous Crenarchaeotic Group. OP1 and OP8 are candidate Bacterial phyla.

Supplemental Figure 2.4. nMDS plots for A) Archaeal and B) Bacterial microbial communities prior to removal of suspected contaminants. Each dot color represents a unique group by SIMPROF test (data not shown for Bacteria; Archaea are unchanged, see Figure S2 for SIMPROF). (A) Archaeal communities grouped according to mineralogy, with an olivine group, two non-olivine Fe-bearing mineral groups, and diopside. (B) Bacterial mineral and glass communities formed an Fe-bearing mineral group and the Fe-poor minerals forsterite and diopside grouped separately.

Supplemental Figure 2.5. EDAX spectrum for Fo90 olivine biofilm in Figure 2.5A. An olivine (Mg,Fe)2SiO4 signature can be seen along with a high abundance of carbon and other detectable trace elements.

Supplemental Figure 2.6. EDAX spectrum for Fo90 olivine secondary surface mineral in Figure 2.5A. An olivine (Mg,Fe)2SiO4 signature can be seen along with a high abundance of carbon, calcium, and chloride.

Appendix B: Chapter 3 Supplementary Materials

Supplemental Table 3.1. Statistics table of olivine metagenome assembly using IDBA-UD. Bolded kmer size 65 was used for analysis based on optimal results of largest n50, maximum contig size, and total assembly size.

kmer size   n50 (bp)   max contig (bp)   total assembly size (bp)
45          7058       157500            39062281
49          13696      157508            32149784
53          15052      157516            31670483
57          16919      206649            31266346
61          18161      286339            31048643
65          19191      286347            30811948
69          17771      286355            31532255
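For readers unfamiliar with the statistics reported in this table, the following is a minimal sketch of how n50, maximum contig size, and total assembly size can be computed from a list of contig lengths. It is not the IDBA-UD code or the script used in this study; the function names `n50` and `assembly_stats` and the toy contig lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L of the shortest contig in the smallest set of
    longest contigs that together cover at least half of the assembly.
    """
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    raise ValueError("contig_lengths must not be empty")


def assembly_stats(contig_lengths):
    """Summarize an assembly with the three statistics used in the table."""
    return {
        "n50": n50(contig_lengths),
        "max_contig": max(contig_lengths),
        "total_size": sum(contig_lengths),
    }


# Hypothetical toy assembly (lengths in bp), not data from this study.
toy_contigs = [100, 80, 60, 40, 20]
stats = assembly_stats(toy_contigs)
```

In this toy case the total size is 300 bp, and the two longest contigs (100 + 80 bp) are the first to reach half of that, so the n50 is 80 bp.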

Supplemental Figure 3.1. Quality checking and taxonomic assignment of genome bins from Figure 2. CheckM was used to check purity and completeness of genomes. Each green box indicates a single copy of a marker gene. Gray boxes represent missing marker genes that should be present in a complete genome. Heterogeneity, indicated by blue boxes (darker blue = more heterogeneity) is calculated as the fraction of non-single-copy marker genes that have > 90% amino acid identity. Contamination, indicated by shades of orange boxes (darker orange = more contamination) equals the fraction of non-single-copy marker genes that have < 90% amino acid identity to the taxonomic marker gene set for each bin.
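The heterogeneity and contamination fractions described in this caption can be illustrated with a short sketch. It follows only the wording of the caption (a 90% amino acid identity cutoff applied to marker genes found in more than one copy) and is not CheckM's implementation. The function name `split_multicopy_markers`, the input format, and the choice to count a value of exactly 90% as heterogeneity are assumptions made for illustration.

```python
def split_multicopy_markers(identities, threshold=90.0):
    """Split multi-copy marker genes into two fractions.

    identities: amino acid identity (%) for each marker gene that was
        found in more than one copy (hypothetical input format).
    Returns (heterogeneity_fraction, contamination_fraction), where the
    first counts copies at or above the threshold and the second counts
    copies below it.
    """
    if not identities:
        # No multi-copy markers: nothing to attribute to either class.
        return 0.0, 0.0
    n = len(identities)
    similar = sum(1 for value in identities if value >= threshold)
    return similar / n, (n - similar) / n


# Hypothetical identities for four multi-copy markers in one bin.
heterogeneity, contamination = split_multicopy_markers([95.0, 99.0, 50.0, 91.0])
```

With these made-up values, three of the four multi-copy markers exceed the cutoff, giving a heterogeneity fraction of 0.75 and a contamination fraction of 0.25.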

Supplemental Table 3.2. Results of genomic bin quality and completeness using CheckM. Each bin was assigned a marker lineage from which to check for contamination with DNA from other lineages. The number of genomes from each marker lineage used in the analysis is provided, as well as the number of genome markers used from each lineage which was used to determine completeness of each genomic bin. The number of markers found in each bin as compared to the number of markers expected is expressed as % completeness of each genome. For marker lineages, f = family; p = phylum; and k = domain/kingdom.

Bin   Marker Lineage        Genomes Used   # Markers Used   # Markers Found   % Complete
1     f__Burkholderiaceae   218            596              97                13.77
2     p__Firmicutes         101            175              145               81.19
3     p__Firmicutes         101            173              155               84.32
4     k__Bacteria           59             105              99                91.53
5     k__Bacteria           59             105              100               91.53
6     p__Firmicutes         101            175              162               91.58
7     p__Firmicutes         101            175              163               94.06
8     p__Firmicutes         104            179              177               98.08
9     p__Firmicutes         157            293              289               98.41
10    p__Euryarchaeota      153            228              226               98.69
11    p__Euryarchaeota      153            228              227               100
12    p__Euryarchaeota      153            228              227               100

Supplemental Table 3.3. List of olivine metagenome genes identified from ‘COG0243, BisC, anaerobic dehydrogenase, typically selenocysteine-containing’. This COG is classified as pertaining to ‘Energy Production and Conversion’ and has a Conserved Domain Database (CDD) ID of 223321. Some of the listed genes belonging to COG0243 were successfully identified using BLASTKoala or GHOSTKoala with KEGG. Formate dehydrogenase A (fdhA) belongs to COG0243 as well as other proteins with similar function. FdhA genes identified by KEGG were found in bins 3, 4, 5, and 6 (yellow blocks with red letters), and possible fdhA genes from those bins missing this gene are highlighted in blue. Genes identified from this COG that were not binned with VizBin and not part of the genome analyses are listed at the bottom of the table. FdoG = (aerobic) formate dehydrogenase O major subunit analogous to fdhA (formate to CO2); fwdBDG = formyl-methanofuran dehydrogenase in the methanogenic WL pathway, analogous to formate dehydrogenase in Bacteria; nuoF = NADH-quinone oxidoreductase subunit F, often adjacent to fdhA in the genome. Other genes identified, some of which appear misannotated or result from a frame shift, are: rep = ATP-dependent DNA Rep; nasA = assimilatory nitrate reductase catalytic subunit; tadC = tight adherence protein C; AACS = acetoacetyl- CoA synthetase; pstS = phosphate transport system substrate-binding protein; narG = nitrate reductase / nitrite oxidoreductase, alpha subunit; and tatD = TatD DNase family protein.

Domain Bin KOALA ID contig and gene ID fdoG contig-65_1969_1 1 rep contig-65_2317_1 nasA contig-65_2809_1 fdoG contig-65_1331_1 2 possible fdhA contig-65_2833_1 fdhA contig-65_45_53 3 tadC (102?) contig-65_6_103 fdhA contig-65_1154_1 4 unknown/next to nuoF contig-65_540_3 5 fdhA contig-65_165_3 fdhA contig-65_1066_3 unknown contig-65_1351_3 AACS contig-65_167_6 Bacteria 6 unknown contig-65_191_1 fwdB contig-65_212_19 fwdD contig-65_212_20 unknown contig-65_721_8 nuoF contig-65_15_57 possible fdhA contig-65_201_23 7 fdoG/pstS contig-65_245_16 possible fdhA contig-65_802_2 8 unknown contig-65_247_13 fdoG contig-65_1150_2 fdoG contig-65_1150_3 9 fdoG contig-65_1150_3 possible fdhA contig-65_32_17 unknown contig-65_1754_4 fwdD contig-65_197_38 fwdB contig-65_197_39 unknown contig-65_2195_1 Archaea 10 unknown contig-65_22_22 fwdG contig-65_22_23 fwdG contig-65_22_23 fwdG contig-65_22_23 unknown contig-65_2308_2

unknown contig-65_61_1

unknown contig-65_7_135 unknown contig-65_205_16 narG contig-65_26_55 narG contig-65_26_55 narG contig-65_26_55 narG contig-65_26_55 tatD contig-65_27_103 11 tatD contig-65_27_103 fdoG contig-65_36_42 fdoG contig-65_36_42 fwdB contig-65_93_21 fwdB contig-65_93_43 fwdD contig-65_93_44 fwdB contig-65_21_12 12 fwdD contig-65_75_24 fwdB contig-65_75_25 contig-65_3193_1 contig-65_3717_1 contig-65_3755_1 contig-65_3876_1 contig-65_4064_1 contig-65_4112_1 contig-65_4156_1 contig-65_4258_2 contig-65_4325_1 not binned none N/A contig-65_4553_2 contig-65_4579_1 contig-65_4615_1 contig-65_4949_1 contig-65_5428_1 contig-65_5683_2 contig-65_6271_1 contig-65_6426_1 contig-65_6504_1 contig-65_7115_1

contig-65_7533_1 contig-65_7882_1 contig-65_7954_1 not binned none N/A contig-65_7990_1 contig-65_8083_1 contig-65_9691_1

Supplemental Table 3.4. Description of hydrogenase and related genes from Figure 3.5A.

Hydrogenase Genes   Description
hypA                hydrogenase nickel incorporation protein HypA/HybF
hypB                hydrogenase nickel incorporation protein HypB
hypC                hydrogenase expression/formation protein HypC
hypD                hydrogenase expression/formation protein HypD
hypE                hydrogenase expression/formation protein HypE
hypF                hydrogenase maturation protein HypF
hyaA                hydrogenase small subunit [EC:1.12.99.6]
hyaB                hydrogenase large subunit [EC:1.12.99.6]
hyaC                Ni/Fe-hydrogenase 1 B-type cytochrome subunit
hyaD                hydrogenase maturation protease [EC:3.4.23.-]
mvhA                F420-non-reducing hydrogenase large subunit [EC:1.12.99.-]
mvhG                F420-non-reducing hydrogenase small subunit [EC:1.12.99.-]
echA                ech hydrogenase subunit A
echB                ech hydrogenase subunit B
echC                ech hydrogenase subunit C
hndB                NADP-reducing hydrogenase subunit HndB [EC:1.12.1.3]
hndC                NADP-reducing hydrogenase subunit HndC [EC:1.12.1.3]
hndD                NADP-reducing hydrogenase subunit HndD [EC:1.12.1.3]
hyfB                hydrogenase-4 component B
hycI                hydrogenase 3 maturation protease [EC:3.4.23.51]
hoxH                NAD-reducing hydrogenase large subunit [EC:1.12.1.2]
hoxY                NAD-reducing hydrogenase small subunit [EC:1.12.1.2]
hoxU                bidirectional [NiFe] hydrogenase diaphorase subunit [EC:1.6.5.3]
frhB                coenzyme F420 hydrogenase subunit beta [EC:1.12.98.1]
ehbQ                energy-converting hydrogenase B subunit Q

Supplemental Table 3.5. Description of ferredoxins and related genes from Figure 3.5B.

Ferredoxin Genes   Description
fixX               ferredoxin like protein
fpr                ferredoxin/flavodoxin---NADP+ reductase [EC:1.18.1.2 1.19.1.1]
porA               pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1]
porB               pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1]
porD               pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1]
porG               pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1]
aor                aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5]
korA               2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11]
korB               2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11]
korC               2-oxoglutarate ferredoxin oxidoreductase subunit gamma [EC:1.2.7.3]
korD               2-oxoglutarate ferredoxin oxidoreductase subunit delta [EC:1.2.7.3]
fer                ferredoxin
iorA               indolepyruvate ferredoxin oxidoreductase, alpha subunit [EC:1.2.7.8]
iorB               indolepyruvate ferredoxin oxidoreductase, beta subunit [EC:1.2.7.8]
por                pyruvate-ferredoxin/flavodoxin oxidoreductase [EC:1.2.7.1 1.2.7.-]
napH               ferredoxin-type protein NapH
fdxA               ferredoxin
nahAa              naphthalene 1,2-dioxygenase ferredoxin reductase component [EC:1.18.1.7]
fwdF               4Fe-4S ferredoxin
fwdG               4Fe-4S ferredoxin
nirA               ferredoxin-nitrite reductase [EC:1.7.7.1]

Supplemental Table 3.6. Description of cytochromes and other electron transport agent or related genes from Figure 3.5C.

Cytochrome & other ETC Genes   Description
ccmE                           cytochrome c-type biogenesis protein CcmE
ccmF                           cytochrome c-type biogenesis protein CcmF
ccmH                           cytochrome c-type biogenesis protein CcmH
ccdA                           cytochrome c-type biogenesis protein
resB                           cytochrome c biogenesis protein
nrfH                           cytochrome c nitrite reductase small subunit
nrfA                           nitrite reductase (cytochrome c-552) [EC:1.7.2.2]
LDHD                           D-lactate dehydrogenase (cytochrome) [EC:1.1.2.4]
coxC                           cytochrome c oxidase subunit III [EC:1.9.3.1]
exaA                           alcohol dehydrogenase (cytochrome c) [EC:1.1.2.8]
cybB                           cytochrome b561
sdhA                           succinate dehydrogenase / fumarate reductase, flavoprotein subunit [EC:1.3.5.1 1.3.5.4]
sdhB                           succinate dehydrogenase / fumarate reductase, iron-sulfur subunit [EC:1.3.5.1 1.3.5.4]
sdhC                           succinate dehydrogenase / fumarate reductase, cytochrome b subunit
sdhD                           succinate dehydrogenase / fumarate reductase, membrane anchor subunit

Appendix C: Chapter 4 Supplementary Materials

Supplementary Table 4.1. Information for sequences from Figure 4.1.

Reference                ID                                                     GenBank Accession #
Cowen et al., 2003       Uncultured bacterium clone 1026B3                      AY181047
Jungbluth et al., 2013   Uncultured bacterium clone 1301A08 104                 JX194453
Cowen et al., 2003       Uncultured bacterium clone 1026B104                    AY181043
Orcutt et al., 2011      Uncultured bacterium clone 1301APY A05                 GU189002
Orcutt et al., 2011      Uncultured bacterium clone 1301AWY F02                 GU189037
Orcutt et al., 2011      Uncultured bacterium clone 1301APX F10                 GU188999
Jungbluth et al., 2017   Uncultured bacterium clone SSF21-22 1362A11 026        KR072788

Supplemental Figure 4.1. Ffh gene phylogeny of Ca. Acetocimmeria pyornia and closest relatives. The evolutionary history of Ca. Acetocimmeria pyornia was inferred using the Maximum Likelihood method based on the Tamura-Nei model in MEGA5 (Tamura et al., 2011), with the optimal tree for the ffh gene shown. The percentage of 500 replicate trees in which the associated taxa clustered together (bootstrap test) is shown next to the branches if greater than 50% (Felsenstein, 1985). Branch lengths are measured in the number of substitutions per site. There were 264 positions in the final dataset. Organisms in green are known acetogens.

Supplementary Table 4.2. Complete KEGG pathways for secondary metabolism, energy metabolism and genetic information processing. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG Gene   Discrete pathway   Organism (Mta, Dau, Dsy, Ca. Apy)

Secondary metabolism – structural complex

Energy metabolism

ATP synthesis
M00144   NADH:quinone oxidoreductase, prokaryotes   + + (-)
M00149   Succinate dehydrogenase, prokaryotes   +
M00153   Cytochrome d ubiquinol oxidase   + +
M00157   F-type ATPase, prokaryotes and chloroplasts   + + + +

Genetic information processing

DNA polymerase
M00260   DNA polymerase III complex, bacteria   + + + (-)

RNA polymerase
M00183   RNA polymerase, bacteria   + + + +

Ribosome
M00178   Ribosome, bacteria   + + + (-)

Supplementary Table 4.3. Complete KEGG pathways for secondary metabolism, transport systems. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG Gene   Discrete pathway   Organism (Mta, Dau, Dsy, Ca. Apy)

Secondary metabolism – environmental information processing

Mineral and organic ion transport system
M00185   Sulfate transport system   +
M00189   Molybdate transport system   + + + +
M00186   Tungstate transport system   + + +
M00188   NitT/TauT family transport system   + + +
M00436   Sulfonate transport system   +
M00299   Spermidine/putrescine transport system   +
M00208   Glycine betaine/proline transport system   +
M00209   Osmoprotectant transport system   + +

Saccharide, polyol, and lipid transport system
M00201   alpha-Glucoside transport system   +
M00207   Putative multiple sugar transport system   +
M00212   Ribose transport system   +
M00221   Putative simple sugar transport system   +
M00211   Putative ABC transport system   +

Phosphate and amino acid transport system
M00222   Phosphate transport system   + + + (-)
M00589   Putative lysine transport system   +
M00237   Branched-chain amino acid transport system   + + + +
M00238   D-Methionine transport system   +
M00228   Putative glutamine transport system   +
M00236   Putative polar amino acid transport system   + + +

Peptide and nickel transport system
M00439   Oligopeptide transport system   + +
M00239   Peptides/nickel transport system   + + +

Metallic cation, iron-siderophore and vitamin B12 transport system
M00240   Iron complex transport system   + + + +
M00242   Zinc transport system   + +
M00319   Manganese/zinc/iron transport system   +
M00245   Cobalt/nickel transport system   + + + +
M00243   Manganese/iron transport system   +
M00246   Nickel transport system   + + + +
M00247   Putative ABC transport system   +
M00582   Energy-coupling factor transport system   + + +

ABC-2 type and other transport systems
M00298   Multidrug/hemolysin transport system   +
M00813   Lantibiotic transport system   +
M00762   Copper-processing system   +
M00224   Fluoroquinolone transport system   +
M00252   Lipooligosaccharide transport system   + +
M00256   Cell division transport system   + + + +
M00259   Heme transport system   +
M00258   Putative ABC transport system   + + + +
M00254   ABC-2 type transport system   + + + +

Drug efflux transporter/pump
M00707   Multidrug resistance, SmdAB/MdlAB transporter   +
M00712   Multidrug resistance, efflux pump YkkCD   +

Phosphotransferase system (PTS)
M00273   PTS system, fructose-specific II component   +

Bacterial secretion system
M00335   Sec (secretion) system   + + + +
M00336   Twin-arginine translocation (Tat) system   + + + +

Supplementary Table 4.4. Complete KEGG pathways for other energy metabolisms and regulatory systems. Closely-related known acetogens: Mta = Moorella thermoacetica, Dau = Ca. Desulforudis audaxviator, and Dsy = Desulfitobacterium hafniense. + = complete pathway, (+) = known complete pathways not identified through the KEGG module. (-) = near-complete, with at least one gene.

KEGG Gene   Discrete pathway   Organism (Mta, Dau, Dsy, Ca. Apy)

Energy metabolism

Functional set

Aminoacyl tRNA
M00360   Aminoacyl-tRNA biosynthesis, prokaryotes   + + + +

Nucleotide sugar
M00362   Nucleotide sugar biosynthesis, prokaryotes   +

Environmental information processing

Two-component regulatory system
M00454   KdpD-KdpE (potassium transport) two-component regulatory system   +
M00434   PhoR-PhoB (phosphate starvation response) two-component regulatory system   +
M00458   ResE-ResD (aerobic and anaerobic respiration) two-component regulatory system   + +
M00459   VicK-VicR (cell wall metabolism) two-component regulatory system   +
M00478   DegS-DegU (multicellular behavior control) two-component regulatory system   + (-)
M00483   NreB-NreC (dissimilatory nitrate/nitrite reduction) two-component regulatory system   +
M00489   DctS-DctR (C4-dicarboxylate transport) two-component regulatory system   +
M00490   MalK-MalR (malate transport) two-component regulatory system   +
M00492   LytS-LytR two-component regulatory system   +
M00518   GlnK-GlnL (glutamine utilization) two-component regulatory system   +
M00519   YesM-YesN two-component regulatory system   +
M00657   VanS-VanR (VanE type vancomycin resistance) two-component regulatory system   +

Drug efflux transporter/pump
M00765   Multidrug resistance, efflux pump Bmr   +

Supplementary Table 4.5. List of all enzymes and associated KEGG-annotated genes for Ca. A. pyornia.

1. Oxidoreductases
1.1 Acting on the CH-OH group of donors
1.1.1 With NAD+ or NADP+ as acceptor
1.1.1.1 alcohol dehydrogenase
    contig-65_1009_180_2 K04072 adhE; acetaldehyde dehydrogenase / alcohol dehydrogenase [EC:1.2.1.10 1.1.1.1]
1.1.1.3 homoserine dehydrogenase
    contig-65_807_137_5 K00003 E1.1.1.3; homoserine dehydrogenase [EC:1.1.1.3]
    contig-65_943_167_5 K00003 E1.1.1.3; homoserine dehydrogenase [EC:1.1.1.3]
1.1.1.4 (R,R)-butanediol dehydrogenase
    contig-65_806_136_6 K00004 BDH, butB; (R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase [EC:1.1.1.4 1.1.1.- 1.1.1.303]
1.1.1.6 glycerol dehydrogenase
    contig-65_435_48_1 K00005 gldA; glycerol dehydrogenase [EC:1.1.1.6]
1.1.1.14 L-iditol 2-dehydrogenase
    contig-65_795_132_6 K00008 SORD, gutB; L-iditol 2-dehydrogenase [EC:1.1.1.14]
1.1.1.22 UDP-glucose 6-dehydrogenase
    contig-65_166_6_14 K00012 UGDH, ugd; UDPglucose 6-dehydrogenase [EC:1.1.1.22]
    contig-65_495_62_1 K00012 UGDH, ugd; UDPglucose 6-dehydrogenase [EC:1.1.1.22]
1.1.1.23 histidinol dehydrogenase
    contig-65_595_81_10 K00013 hisD; histidinol dehydrogenase [EC:1.1.1.23]
1.1.1.25 shikimate dehydrogenase
    contig-65_600_82_7 K00014 aroE; shikimate dehydrogenase [EC:1.1.1.25]
1.1.1.38 malate dehydrogenase (oxaloacetate-decarboxylating)
    contig-65_203_12_17 K00027 ME2, sfcA, maeA; malate dehydrogenase (oxaloacetate-decarboxylating) [EC:1.1.1.38]
1.1.1.41 isocitrate dehydrogenase (NAD+)
    contig-65_592_80_4 K00030 IDH3; isocitrate dehydrogenase (NAD+) [EC:1.1.1.41]
1.1.1.42 isocitrate dehydrogenase (NADP+)
    contig-65_314_24_3 K00031 IDH1, IDH2, icd; isocitrate dehydrogenase [EC:1.1.1.42]
1.1.1.60 2-hydroxy-3-oxopropionate reductase
    contig-65_595_81_13 K00042 garR, glxR; 2-hydroxy-3-oxopropionate reductase [EC:1.1.1.60]
1.1.1.85 3-isopropylmalate dehydrogenase
    contig-65_352_33_7 K00052 leuB, IMDH; 3-isopropylmalate dehydrogenase [EC:1.1.1.85]
1.1.1.86 ketol-acid reductoisomerase (NADP+)
    contig-65_352_33_4 K00053 ilvC; ketol-acid reductoisomerase [EC:1.1.1.86]
1.1.1.94 glycerol-3-phosphate dehydrogenase [NAD(P)+]
    contig-65_684_101_1 K00057 gpsA; glycerol-3-phosphate dehydrogenase (NAD(P)+) [EC:1.1.1.94]
1.1.1.95 phosphoglycerate dehydrogenase
    contig-65_95_1_8 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
    contig-65_166_6_25 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
    contig-65_457_57_5 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
1.1.1.100 3-oxoacyl-[acyl-carrier-protein] reductase
    contig-65_628_91_7 K00059 fabG; 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100]
    contig-65_681_100_10 K00059 fabG; 3-oxoacyl-[acyl-carrier protein] reductase [EC:1.1.1.100]
1.1.1.125 2-deoxy-D-gluconate 3-dehydrogenase
    contig-65_675_99_5 K00065 kduD; 2-deoxy-D-gluconate 3-dehydrogenase [EC:1.1.1.125]

1.1.1.136 UDP-N-acetylglucosamine 6-dehydrogenase
    contig-65_166_6_11 K13015 wbpA; UDP-N-acetyl-D-glucosamine dehydrogenase [EC:1.1.1.136]
    contig-65_627_90_5 K13015 wbpA; UDP-N-acetyl-D-glucosamine dehydrogenase [EC:1.1.1.136]
1.1.1.157 3-hydroxybutyryl-CoA dehydrogenase
    contig-65_167_7_8 K00074 paaH, hbd, fadB, mmgB; 3-hydroxybutyryl-CoA dehydrogenase [EC:1.1.1.157]
    contig-65_752_118_7 K00074 paaH, hbd, fadB, mmgB; 3-hydroxybutyryl-CoA dehydrogenase [EC:1.1.1.157]
    contig-65_1212_208_2 K00074 paaH, hbd, fadB, mmgB; 3-hydroxybutyryl-CoA dehydrogenase [EC:1.1.1.157]
1.1.1.193 5-amino-6-(5-phosphoribosylamino)uracil reductase
    contig-65_193_11_21 K11752 ribD; diaminohydroxyphosphoribosylaminopyrimidine deaminase / 5-amino-6-(5-phosphoribosylamino)uracil reductase [EC:3.5.4.26 1.1.1.193]
1.1.1.205 IMP dehydrogenase
    contig-65_885_159_1 K00088 IMPDH, guaB; IMP dehydrogenase [EC:1.1.1.205]
1.1.1.262 4-hydroxythreonine-4-phosphate dehydrogenase
    contig-65_709_106_6 K00097 pdxA; 4-hydroxythreonine-4-phosphate dehydrogenase [EC:1.1.1.262]
1.1.1.267 1-deoxy-D-xylulose-5-phosphate reductoisomerase
    contig-65_297_21_16 K00099 dxr; 1-deoxy-D-xylulose-5-phosphate reductoisomerase [EC:1.1.1.267]
1.1.1.271 GDP-L-fucose synthase
    contig-65_166_6_20 K02377 TSTA3, fcl; GDP-L-fucose synthase [EC:1.1.1.271]
1.1.1.281 GDP-4-dehydro-6-deoxy-D-mannose reductase
    contig-65_633_93_4 K15856 rmd; GDP-4-dehydro-6-deoxy-D-mannose reductase [EC:1.1.1.281]
1.1.1.303 diacetyl reductase [(R)-acetoin forming]
    contig-65_806_136_6 K00004 BDH, butB; (R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase [EC:1.1.1.4 1.1.1.- 1.1.1.303]
1.1.1.399 2-oxoglutarate reductase
    contig-65_95_1_8 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
    contig-65_166_6_25 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
    contig-65_457_57_5 K00058 serA, PHGDH; D-3-phosphoglycerate dehydrogenase / 2-oxoglutarate reductase [EC:1.1.1.95 1.1.1.399]
1.1.1.-
    contig-65_806_136_6 K00004 BDH, butB; (R,R)-butanediol dehydrogenase / meso-butanediol dehydrogenase / diacetyl reductase [EC:1.1.1.4 1.1.1.- 1.1.1.303]
    contig-65_167_7_18 K21416 acoA; acetoin:2,6-dichlorophenolindophenol oxidoreductase subunit alpha [EC:1.1.1.-]
    contig-65_167_7_19 K21417 acoB; acetoin:2,6-dichlorophenolindophenol oxidoreductase subunit beta [EC:1.1.1.-]
1.1.3 With oxygen as acceptor
1.1.3.15 (S)-2-hydroxy-acid oxidase
    contig-65_167_7_25 K00104 glcD; glycolate oxidase [EC:1.1.3.15]
    contig-65_380_41_10 K00104 glcD; glycolate oxidase [EC:1.1.3.15]
    contig-65_1495_247_3 K00104 glcD; glycolate oxidase [EC:1.1.3.15]
1.2 Acting on the aldehyde or oxo group of donors
1.2.1 With NAD+ or NADP+ as acceptor

Supplementary Table 4.5 (Continued)

1.2.1.10 acetaldehyde dehydrogenase (acetylating) contig-65_1009_180_2 K04072 adhE; acetaldehyde dehydrogenase / alcohol dehydrogenase [EC:1.2.1.10 1.1.1.1] 1.2.1.11 aspartate-semialdehyde dehydrogenase contig-65_377_39_11 K00133 asd; aspartate-semialdehyde dehydrogenase [EC:1.2.1.11] 1.2.1.12 glyceraldehyde-3-phosphate dehydrogenase (phosphorylating) contig-65_583_78_7 K00134 GAPDH, gapA; glyceraldehyde 3-phosphate dehydrogenase [EC:1.2.1.12] 1.2.1.38 N-acetyl-gamma-glutamyl-phosphate reductase contig-65_566_74_1 K00145 argC; N-acetyl-gamma-glutamyl-phosphate reductase [EC:1.2.1.38] 1.2.1.41 glutamate-5-semialdehyde dehydrogenase contig-65_870_156_4 K00147 proA; glutamate-5-semialdehyde dehydrogenase [EC:1.2.1.41] 1.2.1.43 formate dehydrogenase (NADP+) contig-65_1066_187_3 K05299 fdhA; formate dehydrogenase alpha subunit [EC:1.2.1.43] 1.2.1.70 glutamyl-tRNA reductase contig-65_737_114_3 K02492 hemA; glutamyl-tRNA reductase [EC:1.2.1.70] 1.2.7 With an iron-sulfur protein as acceptor 1.2.7.1 pyruvate synthase contig-65_189_9_24 K00169 porA; pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1] contig-65_250_16_12 K00169 porA; pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1] contig-65_378_40_7 K00169 porA; pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1] contig-65_380_41_8 K00169 porA; pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1] contig-65_585_79_9 K00169 porA; pyruvate ferredoxin oxidoreductase alpha subunit [EC:1.2.7.1] contig-65_189_9_23 K00170 porB; pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1] contig-65_250_16_11 K00170 porB; pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1] contig-65_378_40_6 K00170 porB; pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1] contig-65_380_41_9 K00170 porB; pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1] contig-65_585_79_10 K00170 porB; pyruvate ferredoxin oxidoreductase beta subunit [EC:1.2.7.1] contig-65_189_9_25 K00171 porD; 
pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1] contig-65_250_16_13 K00171 porD; pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1] contig-65_378_40_8 K00171 porD; pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1] contig-65_380_41_7 K00171 porD; pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1] contig-65_585_79_8 K00171 porD; pyruvate ferredoxin oxidoreductase delta subunit [EC:1.2.7.1] contig-65_189_9_26 K00172 porG; pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1]

contig-65_250_16_10 K00172 porG; pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1] contig-65_378_40_9 K00172 porG; pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1] contig-65_380_41_6 K00172 porG; pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1] contig-65_585_79_7 K00172 porG; pyruvate ferredoxin oxidoreductase gamma subunit [EC:1.2.7.1] 1.2.7.3 2-oxoglutarate synthase contig-65_659_97_7 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_1138_198_2 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_1264_215_4 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_659_97_8 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11] contig-65_1253_212_1 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11] contig-65_1264_215_3 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11] contig-65_659_97_6 K00176 korD, oorD; 2-oxoglutarate ferredoxin oxidoreductase subunit delta [EC:1.2.7.3] contig-65_1138_198_3 K00176 korD, oorD; 2-oxoglutarate ferredoxin oxidoreductase subunit delta [EC:1.2.7.3] contig-65_659_97_9 K00177 korC, oorC; 2-oxoglutarate ferredoxin oxidoreductase subunit gamma [EC:1.2.7.3] contig-65_1253_212_2 K00177 korC, oorC; 2-oxoglutarate ferredoxin oxidoreductase subunit gamma [EC:1.2.7.3] contig-65_1264_215_2 K00177 korC, oorC; 2-oxoglutarate ferredoxin oxidoreductase subunit gamma [EC:1.2.7.3] 1.2.7.4 anaerobic carbon-monoxide dehydrogenase contig-65_203_12_3 K00198 cooS, acsA; anaerobic carbon-monoxide dehydrogenase catalytic subunit [EC:1.2.7.4] contig-65_860_152_4 K00198 cooS, acsA; anaerobic carbon-monoxide dehydrogenase catalytic subunit 
[EC:1.2.7.4] 1.2.7.5 aldehyde ferredoxin oxidoreductase contig-65_342_30_17 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] contig-65_435_48_13 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] contig-65_796_133_7 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] contig-65_818_140_9 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] contig-65_963_172_1 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] contig-65_1667_260_1 K03738 aor; aldehyde:ferredoxin oxidoreductase [EC:1.2.7.5] 1.2.7.11 2-oxoacid oxidoreductase (ferredoxin) contig-65_659_97_7 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_1138_198_2 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_1264_215_4 K00174 korA, oorA, oforA; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit alpha [EC:1.2.7.3 1.2.7.11] contig-65_659_97_8 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11]

contig-65_1253_212_1 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11] contig-65_1264_215_3 K00175 korB, oorB, oforB; 2-oxoglutarate/2-oxoacid ferredoxin oxidoreductase subunit beta [EC:1.2.7.3 1.2.7.11] 1.2.99 With unknown physiological acceptors 1.2.99.5 formylmethanofuran dehydrogenase contig-65_212_13_18 K00200 fwdA, fmdA; formylmethanofuran dehydrogenase subunit A [EC:1.2.99.5] contig-65_212_13_19 K00201 fwdB, fmdB; formylmethanofuran dehydrogenase subunit B [EC:1.2.99.5] contig-65_212_13_16 K00202 fwdC, fmdC; formylmethanofuran dehydrogenase subunit C [EC:1.2.99.5] contig-65_212_13_20 K00203 fwdD, fmdD; formylmethanofuran dehydrogenase subunit D [EC:1.2.99.5] 1.3 Acting on the CH-CH group of donors 1.3.1 With NAD+ or NADP+ as acceptor 1.3.1.12 prephenate dehydrogenase contig-65_458_58_10 K04517 tyrA2; prephenate dehydrogenase [EC:1.3.1.12] 1.3.1.14 dihydroorotate dehydrogenase (NAD+) contig-65_191_10_8 K17828 pyrDI; dihydroorotate dehydrogenase (NAD+) catalytic subunit [EC:1.3.1.14] 1.3.1.76 precorrin-2 dehydrogenase contig-65_737_114_5 K02304 MET8; precorrin-2 dehydrogenase / sirohydrochlorin ferrochelatase [EC:1.3.1.76 4.99.1.4] 1.3.1.98 UDP-N-acetylmuramate dehydrogenase contig-65_342_30_15 K00075 murB; UDP-N-acetylmuramate dehydrogenase [EC:1.3.1.98] 1.3.5 With a quinone or related compound as acceptor 1.3.5.1 succinate dehydrogenase contig-65_203_12_20 K00239 sdhA, frdA; succinate dehydrogenase / fumarate reductase, flavoprotein subunit [EC:1.3.5.1 1.3.5.4] contig-65_203_12_21 K00240 sdhB, frdB; succinate dehydrogenase / fumarate reductase, iron-sulfur subunit [EC:1.3.5.1 1.3.5.4] 1.3.5.4 fumarate reductase (quinol) contig-65_203_12_20 K00239 sdhA, frdA; succinate dehydrogenase / fumarate reductase, flavoprotein subunit [EC:1.3.5.1 1.3.5.4] contig-65_203_12_21 K00240 sdhB, frdB; succinate dehydrogenase / fumarate reductase, iron-sulfur subunit [EC:1.3.5.1 1.3.5.4] 1.4 Acting on the CH-NH2
group of donors 1.4.1 With NAD+ or NADP+ as acceptor 1.4.1.1 alanine dehydrogenase contig-65_359_34_4 K00259 ald; alanine dehydrogenase [EC:1.4.1.1] 1.4.1.2 glutamate dehydrogenase contig-65_659_97_10 K00260 gudB, rocG; glutamate dehydrogenase [EC:1.4.1.2] 1.4.1.3 glutamate dehydrogenase [NAD(P)+] contig-65_367_37_2 K00261 GLUD1_2, gdhA; glutamate dehydrogenase (NAD(P)+) [EC:1.4.1.3] 1.4.1.9 leucine dehydrogenase contig-65_1138_198_6 K00263 E1.4.1.9; leucine dehydrogenase [EC:1.4.1.9] 1.4.1.12 2,4-diaminopentanoate dehydrogenase contig-65_285_18_14 K21672 ord; 2,4-diaminopentanoate dehydrogenase [EC:1.4.1.12 1.4.1.26] 1.4.1.13 glutamate synthase (NADPH)

contig-65_174_8_13 K00266 gltD; glutamate synthase (NADPH/NADH) small chain [EC:1.4.1.13 1.4.1.14] contig-65_445_53_10 K00266 gltD; glutamate synthase (NADPH/NADH) small chain [EC:1.4.1.13 1.4.1.14] 1.4.1.14 glutamate synthase (NADH) contig-65_174_8_13 K00266 gltD; glutamate synthase (NADPH/NADH) small chain [EC:1.4.1.13 1.4.1.14] contig-65_445_53_10 K00266 gltD; glutamate synthase (NADPH/NADH) small chain [EC:1.4.1.13 1.4.1.14] 1.4.1.18 lysine 6-dehydrogenase contig-65_1297_222_2 K19064 lysDH; lysine 6-dehydrogenase [EC:1.4.1.18] 1.4.1.26 2,4-diaminopentanoate dehydrogenase (NAD+) contig-65_285_18_14 K21672 ord; 2,4-diaminopentanoate dehydrogenase [EC:1.4.1.12 1.4.1.26] 1.4.4 With a disulfide as acceptor 1.4.4.2 glycine dehydrogenase (aminomethyl-transferring) contig-65_314_24_19 K00282 gcvPA; glycine dehydrogenase subunit 1 [EC:1.4.4.2] contig-65_314_24_20 K00283 gcvPB; glycine dehydrogenase subunit 2 [EC:1.4.4.2] 1.5 Acting on the CH-NH group of donors 1.5.1 With NAD+ or NADP+ as acceptor 1.5.1.2 pyrroline-5-carboxylate reductase contig-65_675_99_4 K00286 proC; pyrroline-5-carboxylate reductase [EC:1.5.1.2] 1.5.1.5 methylenetetrahydrofolate dehydrogenase (NADP+) contig-65_174_8_12 K01491 folD; methylenetetrahydrofolate dehydrogenase (NADP+) / methenyltetrahydrofolate cyclohydrolase [EC:1.5.1.5 3.5.4.9] 1.5.1.20 methylenetetrahydrofolate reductase [NAD(P)H] contig-65_203_12_13 K00297 metF, MTHFR; methylenetetrahydrofolate reductase (NADPH) [EC:1.5.1.20] 1.5.1.28 opine dehydrogenase contig-65_191_10_20 K04940 odh; opine dehydrogenase [EC:1.5.1.28] 1.5.5 With a quinone or similar compound as acceptor 1.5.5.- contig-65_123_4_16 K00313 fixC; electron transfer flavoprotein-quinone oxidoreductase [EC:1.5.5.-] 1.6 Acting on NADH or NADPH 1.6.5 With a quinone or similar compound as acceptor 1.6.5.3 NADH:ubiquinone reductase (H+-translocating) contig-65_440_51_1 K00330 nuoA; NADH-quinone oxidoreductase subunit A [EC:1.6.5.3] contig-65_618_87_6 K00330 nuoA; NADH-quinone 
oxidoreductase subunit A [EC:1.6.5.3] contig-65_440_51_2 K00331 nuoB; NADH-quinone oxidoreductase subunit B [EC:1.6.5.3] contig-65_618_87_5 K00331 nuoB; NADH-quinone oxidoreductase subunit B [EC:1.6.5.3] contig-65_440_51_3 K00332 nuoC; NADH-quinone oxidoreductase subunit C [EC:1.6.5.3] contig-65_618_87_4 K00332 nuoC; NADH-quinone oxidoreductase subunit C [EC:1.6.5.3] contig-65_440_51_10 K00333 nuoD; NADH-quinone oxidoreductase subunit D [EC:1.6.5.3] contig-65_618_87_3 K00333 nuoD; NADH-quinone oxidoreductase subunit D [EC:1.6.5.3] contig-65_1212_208_4 K00334 nuoE; NADH-quinone oxidoreductase subunit E [EC:1.6.5.3] contig-65_1066_187_2 K00335 nuoF; NADH-quinone oxidoreductase subunit F [EC:1.6.5.3] contig-65_440_51_11 K00337 nuoH; NADH-quinone oxidoreductase subunit H [EC:1.6.5.3] contig-65_618_87_2 K00337 nuoH; NADH-quinone oxidoreductase subunit H [EC:1.6.5.3] contig-65_440_51_12 K00338 nuoI; NADH-quinone oxidoreductase subunit I [EC:1.6.5.3] contig-65_1220_209_4 K00339 nuoJ; NADH-quinone oxidoreductase subunit J [EC:1.6.5.3] contig-65_1220_209_3 K00340 nuoK; NADH-quinone oxidoreductase subunit K [EC:1.6.5.3]

contig-65_1220_209_2 K00341 nuoL; NADH-quinone oxidoreductase subunit L [EC:1.6.5.3] contig-65_336_29_15 K00342 nuoM; NADH-quinone oxidoreductase subunit M [EC:1.6.5.3] contig-65_1220_209_1 K00342 nuoM; NADH-quinone oxidoreductase subunit M [EC:1.6.5.3] contig-65_336_29_14 K00343 nuoN; NADH-quinone oxidoreductase subunit N [EC:1.6.5.3] 1.7 Acting on other nitrogenous compounds as donors 1.7.99 With unknown physiological acceptors 1.7.99.1 hydroxylamine reductase contig-65_1175_203_2 K05601 hcp; hydroxylamine reductase [EC:1.7.99.1] contig-65_1203_207_4 K05601 hcp; hydroxylamine reductase [EC:1.7.99.1] 1.8 Acting on a sulfur group of donors 1.8.1 With NAD+ or NADP+ as acceptor 1.8.1.4 dihydrolipoyl dehydrogenase contig-65_174_8_33 K00382 DLD, lpd, pdhD; dihydrolipoamide dehydrogenase [EC:1.8.1.4] 1.8.1.9 thioredoxin-disulfide reductase contig-65_1006_179_2 K00384 trxB; thioredoxin reductase (NADPH) [EC:1.8.1.9] 1.8.98 With other, known, physiological acceptors 1.8.98.1 CoB---CoM heterodisulfide reductase contig-65_123_4_26 K03388 hdrA; heterodisulfide reductase subunit A [EC:1.8.98.1] contig-65_203_12_10 K03388 hdrA; heterodisulfide reductase subunit A [EC:1.8.98.1] contig-65_669_98_5 K03388 hdrA; heterodisulfide reductase subunit A [EC:1.8.98.1] contig-65_123_4_27 K03389 hdrB; heterodisulfide reductase subunit B [EC:1.8.98.1] contig-65_333_28_11 K03389 hdrB; heterodisulfide reductase subunit B [EC:1.8.98.1] contig-65_669_98_6 K03389 hdrB; heterodisulfide reductase subunit B [EC:1.8.98.1] contig-65_1296_221_4 K03389 hdrB; heterodisulfide reductase subunit B [EC:1.8.98.1] contig-65_2048_299_2 K03389 hdrB; heterodisulfide reductase subunit B [EC:1.8.98.1] contig-65_123_4_28 K03390 hdrC; heterodisulfide reductase subunit C [EC:1.8.98.1] contig-65_333_28_12 K03390 hdrC; heterodisulfide reductase subunit C [EC:1.8.98.1] contig-65_669_98_7 K03390 hdrC; heterodisulfide reductase subunit C [EC:1.8.98.1] contig-65_1296_221_5 K03390 hdrC; heterodisulfide reductase subunit C 
[EC:1.8.98.1] contig-65_1296_221_6 K03390 hdrC; heterodisulfide reductase subunit C [EC:1.8.98.1] 1.8.99 With unknown physiological acceptors 1.8.99.2 adenylyl-sulfate reductase contig-65_1697_263_2 K00394 aprA; adenylylsulfate reductase, subunit A [EC:1.8.99.2] contig-65_796_133_5 K00395 aprB; adenylylsulfate reductase, subunit B [EC:1.8.99.2] 1.12 Acting on hydrogen as donor 1.12.99 With unknown physiological acceptors 1.12.99.6 hydrogenase (acceptor) contig-65_189_9_18 K06281 hyaB, hybC; hydrogenase large subunit [EC:1.12.99.6] contig-65_189_9_19 K06282 hyaA, hybO; hydrogenase small subunit [EC:1.12.99.6] 1.12.99.- contig-65_333_28_7 K14126 mvhA, vhuA, vhcA; F420-non-reducing hydrogenase large subunit [EC:1.12.99.-] contig-65_669_98_2 K14126 mvhA, vhuA, vhcA; F420-non-reducing hydrogenase large subunit [EC:1.12.99.-] contig-65_333_28_8 K14128 mvhG, vhuG, vhcG; F420-non-reducing hydrogenase small subunit [EC:1.12.99.-] contig-65_669_98_3 K14128 mvhG, vhuG, vhcG; F420-non-reducing hydrogenase small subunit [EC:1.12.99.-] 1.15 Acting on superoxide as acceptor 1.15.1 Acting on superoxide as acceptor (only sub-subclass identified to date) 1.15.1.2 superoxide reductase contig-65_627_90_11 K05919 dfx; superoxide reductase [EC:1.15.1.2]

1.17 Acting on CH or CH2 groups 1.17.1 With NAD+ or NADP+ as acceptor 1.17.1.8 4-hydroxy-tetrahydrodipicolinate reductase contig-65_377_39_8 K00215 dapB; 4-hydroxy-tetrahydrodipicolinate reductase [EC:1.17.1.8] 1.17.4 With a disulfide as acceptor 1.17.4.1 ribonucleoside-diphosphate reductase contig-65_407_44_2 K00525 E1.17.4.1A, nrdA, nrdE; ribonucleoside-diphosphate reductase alpha chain [EC:1.17.4.1] 1.17.7 With an iron-sulfur protein as acceptor 1.17.7.1 (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (ferredoxin) contig-65_297_21_14 K03526 gcpE, ispG; (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase [EC:1.17.7.1 1.17.7.3] 1.17.7.3 (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase (flavodoxin) contig-65_297_21_14 K03526 gcpE, ispG; (E)-4-hydroxy-3-methylbut-2-enyl-diphosphate synthase [EC:1.17.7.1 1.17.7.3] 1.17.7.4 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase contig-65_620_88_2 K03527 ispH, lytB; 4-hydroxy-3-methylbut-2-en-1-yl diphosphate reductase [EC:1.17.7.4] 1.17.99 With unknown physiological acceptors 1.17.99.6 epoxyqueuosine reductase contig-65_307_23_22 K09765 queH; epoxyqueuosine reductase [EC:1.17.99.6] contig-65_1038_185_1 K09765 queH; epoxyqueuosine reductase [EC:1.17.99.6] contig-65_167_7_12 K18979 queG; epoxyqueuosine reductase [EC:1.17.99.6] 1.18 Acting on iron-sulfur proteins as donors 1.18.1 With NAD+ or NADP+ as acceptor 1.18.1.2 ferredoxin---NADP+ reductase contig-65_174_8_14 K00528 fpr; ferredoxin/flavodoxin---NADP+ reductase [EC:1.18.1.2 1.19.1.1] 1.19 Acting on reduced flavodoxin as donor 1.19.1 With NAD+ or NADP+ as acceptor 1.19.1.1 flavodoxin---NADP+ reductase contig-65_174_8_14 K00528 fpr; ferredoxin/flavodoxin---NADP+ reductase [EC:1.18.1.2 1.19.1.1] 1.21 Catalysing the reaction X-H + Y-H = X-Y 1.21.98 With other, known, physiological acceptors 1.21.98.1 cyclic dehypoxanthinyl futalosine synthase contig-65_292_20_13 K11784 mqnC; cyclic dehypoxanthinyl futalosine synthase [EC:1.21.98.1] 1.97 Other 
oxidoreductases 1.97.1 Sole sub-subclass for oxidoreductases that do not belong in the other subclasses 1.97.1.4 [formate-C-acetyltransferase]-activating enzyme contig-65_511_65_1 K04069 pflA, pflC, pflE; pyruvate formate lyase activating enzyme [EC:1.97.1.4] 2. Transferases 2.1 Transferring one-carbon groups 2.1.1 Methyltransferases 2.1.1.13 methionine synthase contig-65_342_30_14 K00548 metH, MTR; 5-methyltetrahydrofolate--homocysteine methyltransferase [EC:2.1.1.13] contig-65_380_41_15 K00548 metH, MTR; 5-methyltetrahydrofolate--homocysteine methyltransferase [EC:2.1.1.13]

contig-65_884_158_5 K00548 metH, MTR; 5-methyltetrahydrofolate--homocysteine methyltransferase [EC:2.1.1.13] 2.1.1.63 methylated-DNA---[protein]-cysteine S-methyltransferase contig-65_424_46_3 K00567 ogt, MGMT; methylated-DNA-[protein]-cysteine S-methyltransferase [EC:2.1.1.63] 2.1.1.72 site-specific DNA-methyltransferase (adenine-specific) contig-65_1418_239_4 K06223 dam; DNA adenine methylase [EC:2.1.1.72] 2.1.1.74 methylenetetrahydrofolate---tRNA-(uracil54-C5)-methyltransferase (FADH2-oxidizing) contig-65_1279_219_2 K04094 trmFO, gid; methylenetetrahydrofolate--tRNA-(uracil-5-)-methyltransferase [EC:2.1.1.74] 2.1.1.79 cyclopropane-fatty-acyl-phospholipid synthase contig-65_367_37_17 K00574 cfa; cyclopropane-fatty-acyl-phospholipid synthase [EC:2.1.1.79] contig-65_1186_205_1 K00574 cfa; cyclopropane-fatty-acyl-phospholipid synthase [EC:2.1.1.79] 2.1.1.80 protein-glutamate O-methyltransferase contig-65_109_2_27 K00575 cheR; chemotaxis protein methyltransferase CheR [EC:2.1.1.80] 2.1.1.107 uroporphyrinogen-III C-methyltransferase contig-65_737_114_7 K13542 cobA-hemD; uroporphyrinogen III methyltransferase / synthase [EC:2.1.1.107 4.2.1.75] 2.1.1.130 precorrin-2 C20-methyltransferase contig-65_632_92_8 K03394 cobI-cbiL; precorrin-2/cobalt-factor-2 C20-methyltransferase [EC:2.1.1.130 2.1.1.151] 2.1.1.131 precorrin-3B C17-methyltransferase contig-65_443_52_13 K05934 E2.1.1.131, cobJ, cbiH; precorrin-3B C17-methyltransferase [EC:2.1.1.131] 2.1.1.133 precorrin-4 C11-methyltransferase contig-65_632_92_9 K05936 cobM, cbiF; precorrin-4/cobalt-precorrin-4 C11-methyltransferase [EC:2.1.1.133 2.1.1.271] 2.1.1.148 thymidylate synthase (FAD) contig-65_360_35_16 K03465 thyX, thy1; thymidylate synthase (FAD) [EC:2.1.1.148] 2.1.1.151 cobalt-factor II C20-methyltransferase contig-65_632_92_8 K03394 cobI-cbiL; precorrin-2/cobalt-factor-2 C20-methyltransferase [EC:2.1.1.130 2.1.1.151] 2.1.1.163 demethylmenaquinone methyltransferase contig-65_737_114_1 K03183 ubiE;
demethylmenaquinone methyltransferase / 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase [EC:2.1.1.163 2.1.1.201] contig-65_1099_193_1 K03183 ubiE; demethylmenaquinone methyltransferase / 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase [EC:2.1.1.163 2.1.1.201] 2.1.1.170 16S rRNA (guanine527-N7)-methyltransferase contig-65_462_59_15 K03501 gidB, rsmG; 16S rRNA (guanine527-N7)-methyltransferase [EC:2.1.1.170] 2.1.1.176 16S rRNA (cytosine967-C5)-methyltransferase contig-65_843_149_4 K03500 rsmB, sun; 16S rRNA (cytosine967-C5)-methyltransferase [EC:2.1.1.176] 2.1.1.177 23S rRNA (pseudouridine1915-N3)-methyltransferase contig-65_109_2_9 K00783 rlmH; 23S rRNA (pseudouridine1915-N3)-methyltransferase [EC:2.1.1.177] 2.1.1.182 16S rRNA (adenine1518-N6/adenine1519-N6)-dimethyltransferase contig-65_531_68_8 K02528 ksgA; 16S rRNA (adenine1518-N6/adenine1519-N6)-dimethyltransferase [EC:2.1.1.182] 2.1.1.185 23S rRNA (guanosine2251-2'-O)-methyltransferase

contig-65_360_35_17 K03218 rlmB; 23S rRNA (guanosine2251-2'-O)-methyltransferase [EC:2.1.1.185] 2.1.1.190 23S rRNA (uracil1939-C5)-methyltransferase contig-65_1350_230_4 K03215 rumA; 23S rRNA (uracil1939-C5)-methyltransferase [EC:2.1.1.190] 2.1.1.192 23S rRNA (adenine2503-C2)-methyltransferase contig-65_843_149_3 K06941 rlmN; 23S rRNA (adenine2503-C2)-methyltransferase [EC:2.1.1.192] 2.1.1.193 16S rRNA (uracil1498-N3)-methyltransferase contig-65_819_141_2 K09761 rsmE; 16S rRNA (uracil1498-N3)-methyltransferase [EC:2.1.1.193] 2.1.1.195 cobalt-precorrin-5B (C1)-methyltransferase contig-65_632_92_6 K02188 cbiD; cobalt-precorrin-5B (C1)-methyltransferase [EC:2.1.1.195] 2.1.1.196 cobalt-precorrin-6B (C15)-methyltransferase [decarboxylating] contig-65_632_92_7 K02191 cbiT; cobalt-precorrin-6B (C15)-methyltransferase [EC:2.1.1.196] 2.1.1.198 16S rRNA (cytidine1402-2'-O)-methyltransferase contig-65_1072_189_3 K07056 rsmI; 16S rRNA (cytidine1402-2'-O)-methyltransferase [EC:2.1.1.198] 2.1.1.199 16S rRNA (cytosine1402-N4)-methyltransferase contig-65_323_26_2 K03438 mraW, rsmH; 16S rRNA (cytosine1402-N4)-methyltransferase [EC:2.1.1.199] 2.1.1.201 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase contig-65_737_114_1 K03183 ubiE; demethylmenaquinone methyltransferase / 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase [EC:2.1.1.163 2.1.1.201] contig-65_1099_193_1 K03183 ubiE; demethylmenaquinone methyltransferase / 2-methoxy-6-polyprenyl-1,4-benzoquinol methylase [EC:2.1.1.163 2.1.1.201] 2.1.1.217 tRNA (adenine22-N1)-methyltransferase contig-65_447_54_14 K06967 trmK; tRNA (adenine22-N1)-methyltransferase [EC:2.1.1.217] 2.1.1.226 23S rRNA (cytidine1920-2'-O)-methyltransferase contig-65_174_8_6 K06442 tlyA; 23S rRNA (cytidine1920-2'-O)/16S rRNA (cytidine1409-2'-O)-methyltransferase [EC:2.1.1.226 2.1.1.227] 2.1.1.227 16S rRNA (cytidine1409-2'-O)-methyltransferase contig-65_174_8_6 K06442 tlyA; 23S rRNA (cytidine1920-2'-O)/16S rRNA (cytidine1409-2'-O)-methyltransferase
[EC:2.1.1.226 2.1.1.227] 2.1.1.228 tRNA (guanine37-N1)-methyltransferase contig-65_562_73_7 K00554 trmD; tRNA (guanine37-N1)-methyltransferase [EC:2.1.1.228] 2.1.1.245 5-methyltetrahydrosarcinapterin:corrinoid/iron-sulfur protein Co-methyltransferase contig-65_203_12_8 K00194 cdhD, acsD; acetyl-CoA decarbonylase/synthase complex subunit delta [EC:2.1.1.245] contig-65_203_12_5 K00197 cdhE, acsC; acetyl-CoA decarbonylase/synthase complex subunit gamma [EC:2.1.1.245] 2.1.1.246 [methyl-Co(III) methanol-specific corrinoid protein]:coenzyme M methyltransferase contig-65_380_41_3 K14080 mtaA; [methyl-Co(III) methanol-specific corrinoid protein]:coenzyme M methyltransferase [EC:2.1.1.246] 2.1.1.249 dimethylamine---corrinoid protein Co-methyltransferase contig-65_572_76_8 K16178 mtbB; dimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.249] contig-65_572_76_9 K16178 mtbB; dimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.249] 2.1.1.250 trimethylamine---corrinoid protein Co-methyltransferase contig-65_572_76_4 K14083 mttB; trimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.250]

contig-65_572_76_6 K14083 mttB; trimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.250] contig-65_572_76_7 K14083 mttB; trimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.250] contig-65_669_98_10 K14083 mttB; trimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.250] contig-65_710_107_1 K14083 mttB; trimethylamine---corrinoid protein Co-methyltransferase [EC:2.1.1.250] 2.1.1.258 5-methyltetrahydrofolate:corrinoid/iron-sulfur protein Co-methyltransferase contig-65_203_12_9 K15023 acsE; 5-methyltetrahydrofolate corrinoid/iron sulfur protein methyltransferase [EC:2.1.1.258] 2.1.1.271 cobalt-precorrin-4 methyltransferase contig-65_632_92_9 K05936 cobM, cbiF; precorrin-4/cobalt-precorrin-4 C11-methyltransferase [EC:2.1.1.133 2.1.1.271] 2.1.1.297 peptide chain release factor N5-glutamine methyltransferase contig-65_376_38_17 K02493 hemK, prmC, HEMK; release factor glutamine methyltransferase [EC:2.1.1.297] 2.1.1.- contig-65_714_108_2 K02654 pilD, pppA; leader peptidase (prepilin peptidase) / N-methyltransferase [EC:3.4.23.43 2.1.1.-] 2.1.2 Hydroxymethyl-, formyl- and related transferases 2.1.2.1 glycine hydroxymethyltransferase contig-65_174_8_32 K00600 glyA, SHMT; glycine hydroxymethyltransferase [EC:2.1.2.1] 2.1.2.2 phosphoribosylglycinamide formyltransferase contig-65_548_71_1 K11175 purN; phosphoribosylglycinamide formyltransferase 1 [EC:2.1.2.2] 2.1.2.3 phosphoribosylaminoimidazolecarboxamide formyltransferase contig-65_1404_238_2 K00602 purH; phosphoribosylaminoimidazolecarboxamide formyltransferase / IMP cyclohydrolase [EC:2.1.2.3 3.5.4.10] 2.1.2.5 glutamate formimidoyltransferase contig-65_600_82_2 K00603 fctD; glutamate formiminotransferase [EC:2.1.2.5] 2.1.2.9 methionyl-tRNA formyltransferase contig-65_380_41_5 K00604 MTFMT, fmt; methionyl-tRNA formyltransferase [EC:2.1.2.9] contig-65_751_117_8 K00604 MTFMT, fmt; methionyl-tRNA formyltransferase [EC:2.1.2.9] contig-65_843_149_7 K00604 MTFMT, fmt; methionyl-tRNA
formyltransferase [EC:2.1.2.9] 2.1.2.10 aminomethyltransferase contig-65_314_24_17 K00605 gcvT, AMT; aminomethyltransferase [EC:2.1.2.10] 2.1.2.11 3-methyl-2-oxobutanoate hydroxymethyltransferase contig-65_551_72_5 K00606 panB; 3-methyl-2-oxobutanoate hydroxymethyltransferase [EC:2.1.2.11] contig-65_725_111_7 K00606 panB; 3-methyl-2-oxobutanoate hydroxymethyltransferase [EC:2.1.2.11] 2.1.3 Carboxy- and carbamoyltransferases 2.1.3.2 aspartate carbamoyltransferase contig-65_191_10_5 K00609 pyrB, PYR2; aspartate carbamoyltransferase catalytic subunit [EC:2.1.3.2] 2.1.3.3 ornithine carbamoyltransferase contig-65_511_65_3 K00611 OTC, argF, argI; ornithine carbamoyltransferase [EC:2.1.3.3] 2.2 Transferring aldehyde or ketonic groups 2.2.1 Transketolases and transaldolases 2.2.1.1 transketolase contig-65_1266_216_1 K00615 E2.2.1.1, tktA, tktB; transketolase [EC:2.2.1.1]

contig-65_1266_216_2 K00615 E2.2.1.1, tktA, tktB; transketolase [EC:2.2.1.1] 2.2.1.2 transaldolase contig-65_248_15_15 K00616 E2.2.1.2, talA, talB; transaldolase [EC:2.2.1.2] 2.2.1.6 acetolactate synthase contig-65_250_16_23 K01652 E2.2.1.6L, ilvB, ilvG, ilvI; acetolactate synthase I/II/III large subunit [EC:2.2.1.6] contig-65_352_33_3 K01652 E2.2.1.6L, ilvB, ilvG, ilvI; acetolactate synthase I/II/III large subunit [EC:2.2.1.6] contig-65_352_33_6 K01652 E2.2.1.6L, ilvB, ilvG, ilvI; acetolactate synthase I/II/III large subunit [EC:2.2.1.6] contig-65_352_33_5 K01653 E2.2.1.6S, ilvH, ilvN; acetolactate synthase I/III small subunit [EC:2.2.1.6] 2.2.1.7 1-deoxy-D-xylulose-5-phosphate synthase contig-65_174_8_7 K01662 dxs; 1-deoxy-D-xylulose-5-phosphate synthase [EC:2.2.1.7] 2.3 Acyltransferases 2.3.1 Transferring groups other than aminoacyl groups 2.3.1.1 amino-acid N-acetyltransferase contig-65_566_74_2 K00620 argJ; glutamate N-acetyltransferase / amino-acid N-acetyltransferase [EC:2.3.1.35 2.3.1.1] 2.3.1.8 phosphate acetyltransferase contig-65_151_5_20 K15024 K15024; putative phosphotransacetylase [EC:2.3.1.8] contig-65_604_85_5 K15024 K15024; putative phosphotransacetylase [EC:2.3.1.8] 2.3.1.9 acetyl-CoA C-acetyltransferase contig-65_123_4_38 K00626 E2.3.1.9, atoB; acetyl-CoA C-acetyltransferase [EC:2.3.1.9] contig-65_189_9_21 K00626 E2.3.1.9, atoB; acetyl-CoA C-acetyltransferase [EC:2.3.1.9] 2.3.1.12 dihydrolipoyllysine-residue acetyltransferase contig-65_167_7_21 K00627 DLAT, aceF, pdhC; pyruvate dehydrogenase E2 component (dihydrolipoamide acetyltransferase) [EC:2.3.1.12] 2.3.1.15 glycerol-3-phosphate 1-O-acyltransferase contig-65_628_91_11 K03621 plsX; glycerol-3-phosphate acyltransferase PlsX [EC:2.3.1.15] contig-65_1275_218_1 K03621 plsX; glycerol-3-phosphate acyltransferase PlsX [EC:2.3.1.15] contig-65_684_101_2 K08591 plsY; glycerol-3-phosphate acyltransferase PlsY [EC:2.3.1.15] contig-65_705_105_5 K08591 plsY; glycerol-3-phosphate acyltransferase PlsY
[EC:2.3.1.15] contig-65_1085_191_1 K08591 plsY; glycerol-3-phosphate acyltransferase PlsY [EC:2.3.1.15] 2.3.1.19 phosphate butyryltransferase contig-65_645_95_3 K00634 ptb; phosphate butyryltransferase [EC:2.3.1.19] 2.3.1.30 serine O-acetyltransferase contig-65_360_35_12 K00640 cysE; serine O-acetyltransferase [EC:2.3.1.30] 2.3.1.31 homoserine O-acetyltransferase contig-65_118_3_13 K00641 metX; homoserine O-acetyltransferase [EC:2.3.1.31] 2.3.1.35 glutamate N-acetyltransferase contig-65_566_74_2 K00620 argJ; glutamate N-acetyltransferase / amino-acid N-acetyltransferase [EC:2.3.1.35 2.3.1.1] 2.3.1.39 [acyl-carrier-protein] S-malonyltransferase contig-65_628_91_8 K00645 fabD; [acyl-carrier-protein] S-malonyltransferase [EC:2.3.1.39] 2.3.1.51 1-acylglycerol-3-phosphate O-acyltransferase contig-65_620_88_3 K00655 plsC; 1-acyl-sn-glycerol-3-phosphate acyltransferase [EC:2.3.1.51] 2.3.1.101 formylmethanofuran---tetrahydromethanopterin N-formyltransferase contig-65_212_13_17 K00672 ftr; formylmethanofuran--tetrahydromethanopterin N-formyltransferase [EC:2.3.1.101] 2.3.1.128 ribosomal-protein-alanine N-acetyltransferase

contig-65_675_99_8 K03789 rimI; ribosomal-protein-alanine N-acetyltransferase [EC:2.3.1.128] contig-65_885_159_6 K03789 rimI; ribosomal-protein-alanine N-acetyltransferase [EC:2.3.1.128] 2.3.1.157 glucosamine-1-phosphate N-acetyltransferase contig-65_336_29_3 K04042 glmU; bifunctional UDP-N-acetylglucosamine pyrophosphorylase / Glucosamine-1-phosphate N-acetyltransferase [EC:2.7.7.23 2.3.1.157] 2.3.1.169 CO-methylating acetyl-CoA synthase contig-65_203_12_4 K14138 acsB; acetyl-CoA synthase [EC:2.3.1.169] 2.3.1.179 beta-ketoacyl-[acyl-carrier-protein] synthase II contig-65_628_91_5 K09458 fabF; 3-oxoacyl-[acyl-carrier-protein] synthase II [EC:2.3.1.179] 2.3.1.180 beta-ketoacyl-[acyl-carrier-protein] synthase III contig-65_628_91_10 K00648 fabH; 3-oxoacyl-[acyl-carrier-protein] synthase III [EC:2.3.1.180] 2.3.1.181 lipoyl(octanoyl) transferase contig-65_167_7_22 K03801 lipB; lipoyl(octanoyl) transferase [EC:2.3.1.181] 2.3.1.201 UDP-2-acetamido-3-amino-2,3-dideoxy-glucuronate N-acetyltransferase contig-65_627_90_3 K13018 wbpD, wlbB; UDP-2-acetamido-3-amino-2,3-dideoxy-glucuronate N-acetyltransferase [EC:2.3.1.201] 2.3.1.234 N6-L-threonylcarbamoyladenine synthase contig-65_885_159_5 K01409 KAE1, tsaD, QRI7; N6-L-threonylcarbamoyladenine synthase [EC:2.3.1.234] 2.3.1.263 2-amino-4-oxopentanoate thiolase contig-65_285_18_12 K21399 ortA; 2-amino-4-ketopentanoate thiolase alpha subunit [EC:2.3.1.263] contig-65_285_18_13 K21400 ortB; 2-amino-4-ketopentanoate thiolase beta subunit [EC:2.3.1.263] 2.3.2 Aminoacyltransferases 2.3.2.2 gamma-glutamyltransferase contig-65_439_50_14 K00681 ggt; gamma-glutamyltranspeptidase / glutathione hydrolase [EC:2.3.2.2 3.4.19.13] 2.3.3 Acyl groups converted into alkyl groups on transfer 2.3.3.13 2-isopropylmalate synthase contig-65_352_33_10 K01649 leuA, IMS; 2-isopropylmalate synthase [EC:2.3.3.13] 2.3.3.14 homocitrate synthase contig-65_592_80_7 K02594 nifV; homocitrate synthase NifV [EC:2.3.3.14] 2.4 Glycosyltransferases 2.4.1 Hexosyltransferases 2.4.1.15
alpha,alpha-trehalose-phosphate synthase (UDP-forming) contig-65_1302_223_2 K00697 otsA; trehalose 6-phosphate synthase [EC:2.4.1.15] 2.4.1.129 peptidoglycan glycosyltransferase contig-65_424_46_1 K05366 mrcA; penicillin-binding protein 1A [EC:2.4.1.129 3.4.16.4] 2.4.1.187 N-acetylglucosaminyldiphosphoundecaprenol N-acetyl-beta-D-mannosaminyltransferase contig-65_1266_216_3 K05946 tagA, tarA; N-acetylglucosaminyldiphosphoundecaprenol N-acetyl-beta-D-mannosaminyltransferase [EC:2.4.1.187] 2.4.1.227 undecaprenyldiphospho-muramoylpentapeptide beta-N-acetylglucosaminyltransferase contig-65_323_26_9 K02563 murG; UDP-N-acetylglucosamine--N-acetylmuramyl-(pentapeptide) pyrophosphoryl-undecaprenol N-acetylglucosamine transferase [EC:2.4.1.227] 2.4.1.245 alpha,alpha-trehalose synthase contig-65_1004_178_3 K13057 treT; trehalose synthase [EC:2.4.1.245] 2.4.2 Pentosyltransferases

Supplementary Table 4.5 (Continued)

2.4.2.2 pyrimidine-nucleoside phosphorylase
contig-65_359_34_8 K00756 pdp; pyrimidine-nucleoside phosphorylase [EC:2.4.2.2]
2.4.2.7 adenine phosphoribosyltransferase
contig-65_320_25_6 K00759 APRT, apt; adenine phosphoribosyltransferase [EC:2.4.2.7]
2.4.2.8 hypoxanthine phosphoribosyltransferase
contig-65_1436_241_2 K00760 hprT, hpt, HPRT1; hypoxanthine phosphoribosyltransferase [EC:2.4.2.8]
2.4.2.9 uracil phosphoribosyltransferase
contig-65_191_10_4 K02825 pyrR; pyrimidine operon attenuation protein / uracil phosphoribosyltransferase [EC:2.4.2.9]
2.4.2.10 orotate phosphoribosyltransferase
contig-65_191_10_10 K00762 pyrE; orotate phosphoribosyltransferase [EC:2.4.2.10]
2.4.2.14 amidophosphoribosyltransferase
contig-65_548_71_3 K00764 purF, PPAT; amidophosphoribosyltransferase [EC:2.4.2.14]
2.4.2.17 ATP phosphoribosyltransferase
contig-65_595_81_11 K00765 hisG; ATP phosphoribosyltransferase [EC:2.4.2.17]
2.4.2.18 anthranilate phosphoribosyltransferase
contig-65_212_13_1 K00766 trpD; anthranilate phosphoribosyltransferase [EC:2.4.2.18]
2.4.2.21 nicotinate-nucleotide---dimethylbenzimidazole phosphoribosyltransferase
contig-65_118_3_29 K00768 E2.4.2.21, cobU, cobT; nicotinate-nucleotide--dimethylbenzimidazole phosphoribosyltransferase [EC:2.4.2.21]
2.4.2.28 S-methyl-5'-thioadenosine phosphorylase
contig-65_95_1_40 K00772 mtaP, MTAP; 5'-methylthioadenosine phosphorylase [EC:2.4.2.28]
contig-65_292_20_14 K00772 mtaP, MTAP; 5'-methylthioadenosine phosphorylase [EC:2.4.2.28]
contig-65_457_57_6 K00772 mtaP, MTAP; 5'-methylthioadenosine phosphorylase [EC:2.4.2.28]
2.4.2.29 tRNA-guanosine34 transglycosylase
contig-65_659_97_3 K00773 tgt, QTRT1; queuine tRNA-ribosyltransferase [EC:2.4.2.29]
2.4.2.52 triphosphoribosyl-dephospho-CoA synthase
contig-65_212_13_13 K05966 citG; triphosphoribosyl-dephospho-CoA synthase [EC:2.4.2.52]
2.4.2.-
contig-65_595_81_8 K02501 hisH; glutamine amidotransferase [EC:2.4.2.-]
2.4.99 Transferring other glycosyl groups
2.4.99.17 S-adenosylmethionine:tRNA ribosyltransferase-isomerase
contig-65_659_97_2 K07568 queA; S-adenosylmethionine:tRNA ribosyltransferase-isomerase [EC:2.4.99.17]
2.4.- Glycosyltransferases
2.4.-.-
contig-65_302_22_16 K02849 waaQ, rfaQ; heptosyltransferase III [EC:2.4.-.-]
contig-65_2824_325_2 K20534 gtrB, csbB; polyisoprenyl-phosphate glycosyltransferase [EC:2.4.-.-]
2.5 Transferring alkyl or aryl groups, other than methyl groups
2.5.1 Transferring alkyl or aryl groups, other than methyl groups (only sub-subclass identified to date)
2.5.1.1 dimethylallyltranstransferase
contig-65_174_8_9 K13789 GGPS; geranylgeranyl diphosphate synthase, type II [EC:2.5.1.1 2.5.1.10 2.5.1.29]
2.5.1.3 thiamine phosphate synthase
contig-65_302_22_15 K00788 thiE; thiamine-phosphate pyrophosphorylase [EC:2.5.1.3]
contig-65_604_85_1 K00788 thiE; thiamine-phosphate pyrophosphorylase [EC:2.5.1.3]
contig-65_725_111_10 K00788 thiE; thiamine-phosphate pyrophosphorylase [EC:2.5.1.3]

2.5.1.6 methionine adenosyltransferase
contig-65_696_104_7 K00789 metK; S-adenosylmethionine synthetase [EC:2.5.1.6]
2.5.1.7 UDP-N-acetylglucosamine 1-carboxyvinyltransferase
contig-65_109_2_7 K00790 murA; UDP-N-acetylglucosamine 1-carboxyvinyltransferase [EC:2.5.1.7]
contig-65_323_26_11 K00790 murA; UDP-N-acetylglucosamine 1-carboxyvinyltransferase [EC:2.5.1.7]
contig-65_779_126_7 K00790 murA; UDP-N-acetylglucosamine 1-carboxyvinyltransferase [EC:2.5.1.7]
2.5.1.9 riboflavin synthase
contig-65_193_11_20 K00793 ribE, RIB5; riboflavin synthase [EC:2.5.1.9]
2.5.1.10 (2E,6E)-farnesyl diphosphate synthase
contig-65_174_8_9 K13789 GGPS; geranylgeranyl diphosphate synthase, type II [EC:2.5.1.1 2.5.1.10 2.5.1.29]
2.5.1.15 dihydropteroate synthase
contig-65_438_49_10 K00796 folP; dihydropteroate synthase [EC:2.5.1.15]
2.5.1.16 spermidine synthase
contig-65_109_2_18 K00797 speE, SRM; spermidine synthase [EC:2.5.1.16]
2.5.1.17 cob(I)yrinic acid a,c-diamide adenosyltransferase
contig-65_342_30_19 K19221 cobA, btuR; cob(I)alamin adenosyltransferase [EC:2.5.1.17]
2.5.1.19 3-phosphoshikimate 1-carboxyvinyltransferase
contig-65_458_58_9 K00800 aroA; 3-phosphoshikimate 1-carboxyvinyltransferase [EC:2.5.1.19]
2.5.1.29 geranylgeranyl diphosphate synthase
contig-65_174_8_9 K13789 GGPS; geranylgeranyl diphosphate synthase, type II [EC:2.5.1.1 2.5.1.10 2.5.1.29]
2.5.1.30 heptaprenyl diphosphate synthase
contig-65_123_4_21 K00805 hepST; heptaprenyl diphosphate synthase [EC:2.5.1.30]
2.5.1.31 ditrans,polycis-undecaprenyl-diphosphate synthase [(2E,6E)-farnesyl-diphosphate specific]
contig-65_297_21_19 K00806 uppS; undecaprenyl diphosphate synthase [EC:2.5.1.31]
2.5.1.39 4-hydroxybenzoate polyprenyltransferase
contig-65_1099_193_5 K03179 ubiA; 4-hydroxybenzoate polyprenyltransferase [EC:2.5.1.39]
2.5.1.47 cysteine synthase
contig-65_307_23_10 K01738 cysK; cysteine synthase A [EC:2.5.1.47]
2.5.1.49 O-acetylhomoserine aminocarboxypropyltransferase
contig-65_118_3_14 K01740 metY; O-acetylhomoserine (thiol)-lyase [EC:2.5.1.49]
2.5.1.54 3-deoxy-7-phosphoheptulonate synthase
contig-65_458_58_11 K03856 AROA2, aroA; 3-deoxy-7-phosphoheptulonate synthase [EC:2.5.1.54]
2.5.1.61 hydroxymethylbilane synthase
contig-65_737_114_6 K01749 hemC, HMBS; hydroxymethylbilane synthase [EC:2.5.1.61]
2.5.1.75 tRNA dimethylallyltransferase
contig-65_220_14_3 K00791 miaA, TRIT1; tRNA dimethylallyltransferase [EC:2.5.1.75]
2.5.1.78 6,7-dimethyl-8-ribityllumazine synthase
contig-65_193_11_18 K00794 ribH, RIB4; 6,7-dimethyl-8-ribityllumazine synthase [EC:2.5.1.78]
2.5.1.120 aminodeoxyfutalosine synthase
contig-65_292_20_11 K18285 mqnE; aminodeoxyfutalosine synthase [EC:2.5.1.120]
2.5.1.129 flavin prenyltransferase
contig-65_250_16_21 K03186 ubiX, bsdB, PAD1; flavin prenyltransferase [EC:2.5.1.129]

2.5.1.131 (4-{4-[2-(gamma-L-glutamylamino)ethyl]phenoxymethyl}furan-2-yl)methanamine synthase
contig-65_212_13_10 K07072 mfnF; (4-(4-[2-(gamma-L-glutamylamino)ethyl]phenoxymethyl)furan-2-yl)methanamine synthase [EC:2.5.1.131]
2.6 Transferring nitrogenous groups
2.6.1 Transaminases
2.6.1.1 aspartate transaminase
contig-65_406_43_3 K00812 aspB; aspartate aminotransferase [EC:2.6.1.1]
2.6.1.9 histidinol-phosphate transaminase
contig-65_378_40_5 K00817 hisC; histidinol-phosphate aminotransferase [EC:2.6.1.9]
contig-65_535_70_10 K00817 hisC; histidinol-phosphate aminotransferase [EC:2.6.1.9]
2.6.1.11 acetylornithine transaminase
contig-65_566_74_4 K00821 argD; acetylornithine/N-succinyldiaminopimelate aminotransferase [EC:2.6.1.11 2.6.1.17]
2.6.1.16 glutamine---fructose-6-phosphate transaminase (isomerizing)
contig-65_123_4_3 K00820 glmS, GFPT; glucosamine--fructose-6-phosphate aminotransferase (isomerizing) [EC:2.6.1.16]
2.6.1.17 succinyldiaminopimelate transaminase
contig-65_566_74_4 K00821 argD; acetylornithine/N-succinyldiaminopimelate aminotransferase [EC:2.6.1.11 2.6.1.17]
2.6.1.19 4-aminobutyrate---2-oxoglutarate transaminase
contig-65_572_76_5 K00823 puuE; 4-aminobutyrate aminotransferase [EC:2.6.1.19]
2.6.1.21 D-amino-acid transaminase
contig-65_562_73_1 K00824 dat; D-alanine transaminase [EC:2.6.1.21]
2.6.1.42 branched-chain-amino-acid transaminase
contig-65_443_52_1 K00826 E2.6.1.42, ilvE; branched-chain amino acid aminotransferase [EC:2.6.1.42]
contig-65_566_74_7 K00826 E2.6.1.42, ilvE; branched-chain amino acid aminotransferase [EC:2.6.1.42]
contig-65_566_74_8 K00826 E2.6.1.42, ilvE; branched-chain amino acid aminotransferase [EC:2.6.1.42]
2.6.1.82 putrescine aminotransferase
contig-65_302_22_11 K09251 patA; putrescine aminotransferase [EC:2.6.1.82]
2.6.1.83 LL-diaminopimelate aminotransferase
contig-65_118_3_8 K10206 E2.6.1.83; LL-diaminopimelate aminotransferase [EC:2.6.1.83]
contig-65_191_10_27 K10206 E2.6.1.83; LL-diaminopimelate aminotransferase [EC:2.6.1.83]
2.6.1.85 aminodeoxychorismate synthase
contig-65_443_52_2 K01665 pabB; para-aminobenzoate synthetase component I [EC:2.6.1.85]
2.6.1.-
contig-65_501_64_2 K05825 LYSN; 2-aminoadipate transaminase [EC:2.6.1.-]
contig-65_109_2_13 K08969 mtnE, mtnV; aminotransferase [EC:2.6.1.-]
2.7 Transferring phosphorus-containing groups
2.7.1 Phosphotransferases with an alcohol group as acceptor
2.7.1.2 glucokinase
contig-65_1004_178_6 K00845 glk; glucokinase [EC:2.7.1.2]
contig-65_1355_232_1 K00845 glk; glucokinase [EC:2.7.1.2]
2.7.1.11 6-phosphofructokinase
contig-65_406_43_10 K00850 pfkA, PFK; 6-phosphofructokinase [EC:2.7.1.11]
2.7.1.23 NAD+ kinase
contig-65_174_8_4 K00858 ppnK, NADK; NAD+ kinase [EC:2.7.1.23]
2.7.1.24 dephospho-CoA kinase
contig-65_1071_188_5 K00859 coaE; dephospho-CoA kinase [EC:2.7.1.24]

2.7.1.25 adenylyl-sulfate kinase
contig-65_1162_201_1 K00860 cysC; adenylylsulfate kinase [EC:2.7.1.25]
contig-65_1697_263_1 K00860 cysC; adenylylsulfate kinase [EC:2.7.1.25]
2.7.1.26 riboflavin kinase
contig-65_297_21_1 K11753 ribF; riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
2.7.1.33 pantothenate kinase
contig-65_709_106_5 K03525 coaX; type III pantothenate kinase [EC:2.7.1.33]
2.7.1.39 homoserine kinase
contig-65_1142_199_1 K00872 thrB1; homoserine kinase [EC:2.7.1.39]
2.7.1.40 pyruvate kinase
contig-65_406_43_11 K00873 PK, pyk; pyruvate kinase [EC:2.7.1.40]
2.7.1.45 2-dehydro-3-deoxygluconokinase
contig-65_123_4_7 K00874 kdgK; 2-dehydro-3-deoxygluconokinase [EC:2.7.1.45]
2.7.1.49 hydroxymethylpyrimidine kinase
contig-65_604_85_3 K00941 thiD; hydroxymethylpyrimidine/phosphomethylpyrimidine kinase [EC:2.7.1.49 2.7.4.7]
2.7.1.50 hydroxyethylthiazole kinase
contig-65_604_85_2 K00878 thiM; hydroxyethylthiazole kinase [EC:2.7.1.50]
2.7.1.71 shikimate kinase
contig-65_714_108_10 K00891 E2.7.1.71, aroK, aroL; shikimate kinase [EC:2.7.1.71]
2.7.1.107 diacylglycerol kinase (ATP)
contig-65_723_110_1 K00901 dgkA, DGK; diacylglycerol kinase (ATP) [EC:2.7.1.107]
2.7.1.148 4-(cytidine 5'-diphospho)-2-C-methyl-D-erythritol kinase
contig-65_531_68_2 K00919 ispE; 4-diphosphocytidyl-2-C-methyl-D-erythritol kinase [EC:2.7.1.148]
2.7.1.156 adenosylcobinamide kinase
contig-65_769_123_5 K02231 cobP, cobU; adenosylcobinamide kinase / adenosylcobinamide-phosphate guanylyltransferase [EC:2.7.1.156 2.7.7.62]
2.7.1.165 glycerate 2-kinase
contig-65_1761_269_2 K00865 glxK, garK; glycerate 2-kinase [EC:2.7.1.165]
contig-65_447_54_8 K11529 gck, gckA, GLYCTK; glycerate 2-kinase [EC:2.7.1.165]
2.7.1.177 L-threonine kinase
contig-65_118_3_23 K16651 pduX; L-threonine kinase [EC:2.7.1.177]
2.7.2 Phosphotransferases with a carboxy group as acceptor
2.7.2.1 acetate kinase
contig-65_633_93_5 K00925 ackA; acetate kinase [EC:2.7.2.1]
2.7.2.2 carbamate kinase
contig-65_478_61_3 K00926 arcC; carbamate kinase [EC:2.7.2.2]
contig-65_511_65_2 K00926 arcC; carbamate kinase [EC:2.7.2.2]
2.7.2.3 phosphoglycerate kinase
contig-65_583_78_8 K00927 PGK, pgk; phosphoglycerate kinase [EC:2.7.2.3]
2.7.2.4 aspartate kinase
contig-65_377_39_12 K00928 lysC; aspartate kinase [EC:2.7.2.4]
2.7.2.7 butyrate kinase
contig-65_645_95_2 K00929 buk; butyrate kinase [EC:2.7.2.7]
2.7.2.8 acetylglutamate kinase
contig-65_566_74_3 K00930 argB; acetylglutamate kinase [EC:2.7.2.8]
2.7.2.11 glutamate 5-kinase
contig-65_870_156_3 K00931 proB; glutamate 5-kinase [EC:2.7.2.11]
2.7.4 Phosphotransferases with a phosphate group as acceptor
2.7.4.3 adenylate kinase

contig-65_346_31_16 K00939 adk, AK; adenylate kinase [EC:2.7.4.3]
2.7.4.6 nucleoside-diphosphate kinase
contig-65_95_1_43 K00940 ndk, NME; nucleoside-diphosphate kinase [EC:2.7.4.6]
2.7.4.7 phosphooxymethylpyrimidine kinase
contig-65_604_85_3 K00941 thiD; hydroxymethylpyrimidine/phosphomethylpyrimidine kinase [EC:2.7.1.49 2.7.4.7]
2.7.4.8 guanylate kinase
contig-65_751_117_3 K00942 E2.7.4.8, gmk; guanylate kinase [EC:2.7.4.8]
2.7.4.9 dTMP kinase
contig-65_725_111_3 K00943 tmk, DTYMK; dTMP kinase [EC:2.7.4.9]
2.7.4.16 thiamine-phosphate kinase
contig-65_725_111_9 K00946 thiL; thiamine-monophosphate kinase [EC:2.7.4.16]
2.7.4.22 UMP kinase
contig-65_297_21_22 K09903 pyrH; uridylate kinase [EC:2.7.4.22]
2.7.4.25 (d)CMP kinase
contig-65_620_88_4 K00945 cmk; CMP/dCMP kinase [EC:2.7.4.25]
2.7.4.31 [5-(aminomethyl)furan-3-yl]methyl phosphate kinase
contig-65_212_13_11 K07144 mfnE; 5-(aminomethyl)-3-furanmethanol phosphate kinase [EC:2.7.4.31]
2.7.6 Diphosphotransferases
2.7.6.1 ribose-phosphate diphosphokinase
contig-65_336_29_4 K00948 PRPS, prsA; ribose-phosphate pyrophosphokinase [EC:2.7.6.1]
2.7.6.3 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase
contig-65_438_49_12 K00950 folK; 2-amino-4-hydroxy-6-hydroxymethyldihydropteridine diphosphokinase [EC:2.7.6.3]
2.7.6.5 GTP diphosphokinase
contig-65_320_25_7 K00951 relA; GTP pyrophosphokinase [EC:2.7.6.5]
2.7.7 Nucleotidyltransferases
2.7.7.2 FAD synthetase
contig-65_297_21_1 K11753 ribF; riboflavin kinase / FMN adenylyltransferase [EC:2.7.1.26 2.7.7.2]
2.7.7.3 pantetheine-phosphate adenylyltransferase
contig-65_458_58_2 K00954 E2.7.7.3A, coaD, kdtB; pantetheine-phosphate adenylyltransferase [EC:2.7.7.3]
2.7.7.4 sulfate adenylyltransferase
contig-65_796_133_4 K00958 sat, met3; sulfate adenylyltransferase [EC:2.7.7.4]
2.7.7.6 DNA-directed RNA polymerase
contig-65_346_31_23 K03040 rpoA; DNA-directed RNA polymerase subunit alpha [EC:2.7.7.6]
contig-65_691_102_7 K03043 rpoB; DNA-directed RNA polymerase subunit beta [EC:2.7.7.6]
contig-65_898_161_1 K03043 rpoB; DNA-directed RNA polymerase subunit beta [EC:2.7.7.6]
contig-65_691_102_6 K03046 rpoC; DNA-directed RNA polymerase subunit beta' [EC:2.7.7.6]
contig-65_751_117_4 K03060 rpoZ; DNA-directed RNA polymerase subunit omega [EC:2.7.7.6]
2.7.7.7 DNA-directed DNA polymerase
contig-65_285_18_8 K02334 dpo; DNA polymerase bacteriophage-type [EC:2.7.7.7]
contig-65_286_19_12 K02335 DPO1, polA; DNA polymerase I [EC:2.7.7.7]
contig-65_406_43_8 K02337 DPO3A1, dnaE; DNA polymerase III subunit alpha [EC:2.7.7.7]
contig-65_462_59_7 K02338 DPO3B, dnaN; DNA polymerase III subunit beta [EC:2.7.7.7]
contig-65_512_66_2 K02340 DPO3D1, holA; DNA polymerase III subunit delta [EC:2.7.7.7]
contig-65_1453_243_1 K02341 DPO3D2, holB; DNA polymerase III subunit delta' [EC:2.7.7.7]

contig-65_585_79_2 K02343 DPO3G, dnaX; DNA polymerase III subunit gamma/tau [EC:2.7.7.7]
contig-65_302_22_1 K02346 DPO4, dinB; DNA polymerase IV [EC:2.7.7.7]
contig-65_95_1_50 K14162 dnaE2; error-prone DNA polymerase [EC:2.7.7.7]
2.7.7.8 polyribonucleotide nucleotidyltransferase
contig-65_377_39_2 K00962 pnp, PNPT1; polyribonucleotide nucleotidyltransferase [EC:2.7.7.8]
2.7.7.9 UTP---glucose-1-phosphate uridylyltransferase
contig-65_1004_178_5 K00963 UGP2, galU, galF; UTP--glucose-1-phosphate uridylyltransferase [EC:2.7.7.9]
2.7.7.12 UDP-glucose---hexose-1-phosphate uridylyltransferase
contig-65_574_77_1 K00965 galT, GALT; UDPglucose--hexose-1-phosphate uridylyltransferase [EC:2.7.7.12]
2.7.7.13 mannose-1-phosphate guanylyltransferase
contig-65_633_93_3 K16881 K16881; mannose-1-phosphate guanylyltransferase / phosphomannomutase [EC:2.7.7.13 5.4.2.8]
2.7.7.18 nicotinate-nucleotide adenylyltransferase
contig-65_775_125_2 K00969 nadD; nicotinate-nucleotide adenylyltransferase [EC:2.7.7.18]
2.7.7.23 UDP-N-acetylglucosamine diphosphorylase
contig-65_336_29_3 K04042 glmU; bifunctional UDP-N-acetylglucosamine pyrophosphorylase / Glucosamine-1-phosphate N-acetyltransferase [EC:2.7.7.23 2.3.1.157]
2.7.7.41 phosphatidate cytidylyltransferase
contig-65_297_21_18 K00981 E2.7.7.41, CDS1, CDS2, cdsA; phosphatidate cytidylyltransferase [EC:2.7.7.41]
2.7.7.56 tRNA nucleotidyltransferase
contig-65_497_63_10 K00989 rph; ribonuclease PH [EC:2.7.7.56]
2.7.7.60 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase
contig-65_360_35_10 K12506 ispDF; 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase / 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase [EC:2.7.7.60 4.6.1.12]
2.7.7.62 adenosylcobinamide-phosphate guanylyltransferase
contig-65_769_123_5 K02231 cobP, cobU; adenosylcobinamide kinase / adenosylcobinamide-phosphate guanylyltransferase [EC:2.7.1.156 2.7.7.62]
2.7.7.72 CCA tRNA nucleotidyltransferase
contig-65_516_67_12 K00974 cca; tRNA nucleotidyltransferase (CCA-adding enzyme) [EC:2.7.7.72 3.1.3.- 3.1.4.-]
2.7.7.74 1L-myo-inositol 1-phosphate cytidylyltransferase
contig-65_95_1_11 K07281 ipct; 1L-myo-inositol 1-phosphate cytidylyltransferase [EC:2.7.7.74]
2.7.7.77 molybdenum cofactor guanylyltransferase
contig-65_961_171_3 K03752 mobA; molybdenum cofactor guanylyltransferase [EC:2.7.7.77]
2.7.7.80 molybdopterin-synthase adenylyltransferase
contig-65_307_23_3 K21029 moeB; molybdopterin-synthase adenylyltransferase [EC:2.7.7.80]
2.7.7.85 diadenylate cyclase
contig-65_360_35_6 K07067 disA; diadenylate cyclase [EC:2.7.7.85]
contig-65_123_4_6 K18672 dacA; diadenylate cyclase [EC:2.7.7.85]
2.7.7.87 L-threonylcarbamoyladenylate synthase
contig-65_376_38_16 K07566 tsaC, rimN, SUA5; L-threonylcarbamoyladenylate synthase [EC:2.7.7.87]
2.7.7.-
contig-65_834_145_3 K02316 dnaG; DNA primase [EC:2.7.7.-]
2.7.8 Transferases for other substituted phosphate groups
2.7.8.7 holo-[acyl-carrier-protein] synthase

contig-65_443_52_9 K00997 acpS; holo-[acyl-carrier protein] synthase [EC:2.7.8.7]
2.7.8.13 phospho-N-acetylmuramoyl-pentapeptide-transferase
contig-65_323_26_7 K01000 mraY; phospho-N-acetylmuramoyl-pentapeptide-transferase [EC:2.7.8.13]
2.7.8.15 UDP-N-acetylglucosamine---dolichyl-phosphate N-acetylglucosaminephosphotransferase
contig-65_495_62_12 K01001 ALG7; UDP-N-acetylglucosamine--dolichyl-phosphate N-acetylglucosaminephosphotransferase [EC:2.7.8.15]
2.7.8.26 adenosylcobinamide-GDP ribazoletransferase
contig-65_118_3_24 K02233 E2.7.8.26, cobS, cobV; adenosylcobinamide-GDP ribazoletransferase [EC:2.7.8.26]
2.7.8.33 UDP-N-acetylglucosamine---undecaprenyl-phosphate N-acetylglucosaminephosphotransferase
contig-65_376_38_11 K02851 wecA, tagO, rfe; UDP-GlcNAc:undecaprenyl-phosphate/decaprenyl-phosphate GlcNAc-1-phosphate transferase [EC:2.7.8.33 2.7.8.35]
2.7.8.34 CDP-L-myo-inositol myo-inositolphosphotransferase
contig-65_95_1_11 K07291 dipps; CDP-L-myo-inositol myo-inositolphosphotransferase [EC:2.7.8.34]
2.7.8.35 UDP-N-acetylglucosamine---decaprenyl-phosphate N-acetylglucosaminephosphotransferase
contig-65_376_38_11 K02851 wecA, tagO, rfe; UDP-GlcNAc:undecaprenyl-phosphate/decaprenyl-phosphate GlcNAc-1-phosphate transferase [EC:2.7.8.33 2.7.8.35]
2.7.9 Phosphotransferases with paired acceptors
2.7.9.1 pyruvate, phosphate dikinase
contig-65_834_145_1 K01006 ppdK; pyruvate, orthophosphate dikinase [EC:2.7.9.1]
2.7.9.3 selenide, water dikinase
contig-65_333_28_15 K01008 selD, SEPHS; selenide, water dikinase [EC:2.7.9.3]
2.7.11 Protein-serine/threonine kinases
2.7.11.1 non-specific serine/threonine protein kinase
contig-65_359_34_11 K06379 spoIIAB; stage II sporulation protein AB (anti-sigma F factor) [EC:2.7.11.1]
contig-65_843_149_1 K12132 prkC, stkP; eukaryotic-like serine/threonine-protein kinase [EC:2.7.11.1]
2.7.13 Protein-histidine kinases
2.7.13.3 histidine kinase
contig-65_109_2_23 K03407 cheA; two-component system, chemotaxis family, sensor kinase CheA [EC:2.7.13.3]
contig-65_109_2_4 K07652 vicK; two-component system, OmpR family, sensor histidine kinase VicK [EC:2.7.13.3]
contig-65_758_120_6 K07704 lytS; two-component system, LytTR family, sensor histidine kinase LytS [EC:2.7.13.3]
contig-65_167_7_1 K07717 ycbA, glnK; two-component system, sensor histidine kinase YcbA [EC:2.7.13.3]
contig-65_571_75_8 K07777 degS; two-component system, NarL family, sensor histidine kinase DegS [EC:2.7.13.3]
contig-65_902_164_6 K07777 degS; two-component system, NarL family, sensor histidine kinase DegS [EC:2.7.13.3]
2.7.14 Protein-arginine kinases
2.7.14.1 protein arginine kinase
contig-65_360_35_3 K19405 mcsB; protein arginine kinase [EC:2.7.14.1]
2.7.- Transferring phosphorus-containing groups

2.7.-.-
contig-65_765_122_8 K07588 argK; LAO/AO transport system kinase [EC:2.7.-.-]
2.8 Transferring sulfur-containing groups
2.8.1 Sulfurtransferases
2.8.1.6 biotin synthase
contig-65_174_8_35 K01012 bioB; biotin synthase [EC:2.8.1.6]
2.8.1.7 cysteine desulfurase
contig-65_1034_184_5 K04487 iscS, NFS1; cysteine desulfurase [EC:2.8.1.7]
contig-65_1074_190_2 K04487 iscS, NFS1; cysteine desulfurase [EC:2.8.1.7]
2.8.1.8 lipoyl synthase
contig-65_167_7_23 K03644 lipA; lipoyl synthase [EC:2.8.1.8]
2.8.1.10 thiazole synthase
contig-65_302_22_13 K03149 thiG; thiazole synthase [EC:2.8.1.10]
2.8.1.13 tRNA-uridine 2-sulfurtransferase
contig-65_1074_190_4 K00566 mnmA, trmU; tRNA-uridine 2-sulfurtransferase [EC:2.8.1.13]
2.8.3 CoA-transferases
2.8.3.12 glutaconate CoA-transferase
contig-65_1142_199_2 K01039 gctA; glutaconate CoA-transferase, subunit A [EC:2.8.3.12]
contig-65_1142_199_3 K01040 gctB; glutaconate CoA-transferase, subunit B [EC:2.8.3.12]
2.8.3.16 formyl-CoA transferase
contig-65_804_135_8 K07749 frc; formyl-CoA transferase [EC:2.8.3.16]
2.8.3.19 CoA:oxalate CoA-transferase
contig-65_109_2_31 K18702 uctC; CoA:oxalate CoA-transferase [EC:2.8.3.19]
2.8.4 Transferring alkylthio groups
2.8.4.3 tRNA-2-methylthio-N6-dimethylallyladenosine synthase
contig-65_151_5_30 K06168 miaB; tRNA-2-methylthio-N6-dimethylallyladenosine synthase [EC:2.8.4.3]
2.8.4.4 [ribosomal protein S12] (aspartate89-C3)-methylthiotransferase
contig-65_151_5_4 K14441 rimO; ribosomal protein S12 methylthiotransferase [EC:2.8.4.4]
2.8.4.5 tRNA (N6-L-threonylcarbamoyladenosine37-C2)-methylthiotransferase
contig-65_819_141_1 K18707 mtaB; threonylcarbamoyladenosine tRNA methylthiotransferase MtaB [EC:2.8.4.5]
2.9 Transferring selenium-containing groups
2.9.1 Selenotransferases
2.9.1.1 L-seryl-tRNASec selenium transferase
contig-65_333_28_14 K01042 selA; L-seryl-tRNA(Ser) seleniumtransferase [EC:2.9.1.1]
2.10 Transferring molybdenum- or tungsten-containing groups
2.10.1 Molybdenumtransferases or tungstentransferases with sulfide groups as acceptors
2.10.1.1 molybdopterin molybdotransferase
contig-65_961_171_1 K03750 moeA; molybdopterin molybdotransferase [EC:2.10.1.1]
contig-65_974_173_3 K03750 moeA; molybdopterin molybdotransferase [EC:2.10.1.1]
contig-65_974_173_4 K03750 moeA; molybdopterin molybdotransferase [EC:2.10.1.1]
2.-
2.-.-
2.-.-.-
contig-65_406_43_15 K13292 lgt, umpA; phosphatidylglycerol:prolipoprotein diacylglycerol transferase [EC:2.-.-.-]
contig-65_455_56_15 K13292 lgt, umpA; phosphatidylglycerol:prolipoprotein diacylglycerol transferase [EC:2.-.-.-]
contig-65_624_89_4 K13292 lgt, umpA; phosphatidylglycerol:prolipoprotein diacylglycerol transferase [EC:2.-.-.-]
3. Hydrolases

3.1 Acting on ester bonds
3.1.1 Carboxylic-ester hydrolases
3.1.1.29 aminoacyl-tRNA hydrolase
contig-65_336_29_6 K01056 PTH1, pth, spoVC; peptidyl-tRNA hydrolase, PTH1 family [EC:3.1.1.29]
3.1.1.61 protein-glutamate methylesterase
contig-65_109_2_26 K03412 cheB; two-component system, chemotaxis family, response regulator CheB [EC:3.1.1.61]
3.1.2 Thioester hydrolases
3.1.2.-
contig-65_406_43_12 K07107 ybgC; acyl-CoA thioester hydrolase [EC:3.1.2.-]
3.1.3 Phosphoric-monoester hydrolases
3.1.3.5 5'-nucleotidase
contig-65_95_1_32 K03787 surE; 5'-nucleotidase [EC:3.1.3.5]
3.1.3.7 3'(2'),5'-bisphosphate nucleotidase
contig-65_297_21_3 K06881 nrnA; bifunctional oligoribonuclease and PAP phosphatase NrnA [EC:3.1.3.7 3.1.13.3]
3.1.3.11 fructose-bisphosphatase
contig-65_725_111_4 K01622 K01622; fructose 1,6-bisphosphate aldolase/phosphatase [EC:4.1.2.13 3.1.3.11]
3.1.3.12 trehalose-phosphatase
contig-65_1004_178_2 K01087 otsB; trehalose 6-phosphate phosphatase [EC:3.1.3.12]
3.1.3.15 histidinol-phosphatase
contig-65_95_1_13 K04486 E3.1.3.15B; histidinol-phosphatase (PHP family) [EC:3.1.3.15]
contig-65_118_3_4 K04486 E3.1.3.15B; histidinol-phosphatase (PHP family) [EC:3.1.3.15]
3.1.3.16 protein-serine/threonine phosphatase
contig-65_438_49_1 K06382 spoIIE; stage II sporulation protein E [EC:3.1.3.16]
contig-65_1172_202_3 K06382 spoIIE; stage II sporulation protein E [EC:3.1.3.16]
contig-65_843_149_2 K20074 prpC, phpP; PPM family protein phosphatase [EC:3.1.3.16]
3.1.3.71 2-phosphosulfolactate phosphatase
contig-65_95_1_36 K05979 comB; 2-phosphosulfolactate phosphatase [EC:3.1.3.71]
3.1.3.97 3',5'-nucleoside bisphosphate phosphatase
contig-65_769_123_9 K07053 E3.1.3.97; 3',5'-nucleoside bisphosphate phosphatase [EC:3.1.3.97]
3.1.3.100 thiamine phosphate phosphatase
contig-65_1506_249_2 K06949 rsgA, engC; ribosome biogenesis GTPase / thiamine phosphate phosphatase [EC:3.6.1.- 3.1.3.100]
3.1.3.-
contig-65_516_67_12 K00974 cca; tRNA nucleotidyltransferase (CCA-adding enzyme) [EC:2.7.7.72 3.1.3.- 3.1.4.-]
3.1.4 Phosphoric-diester hydrolases
3.1.4.-
contig-65_516_67_12 K00974 cca; tRNA nucleotidyltransferase (CCA-adding enzyme) [EC:2.7.7.72 3.1.3.- 3.1.4.-]
3.1.5 Triphosphoric-monoester hydrolases
3.1.5.1 dGTPase
contig-65_834_145_2 K01129 dgt; dGTPase [EC:3.1.5.1]
3.1.11 Exodeoxyribonucleases producing 5'-phosphomonoesters
3.1.11.5 exodeoxyribonuclease V
contig-65_696_104_5 K03581 recD; exodeoxyribonuclease V alpha subunit [EC:3.1.11.5]
3.1.11.6 exodeoxyribonuclease VII
contig-65_174_8_11 K03601 xseA; exodeoxyribonuclease VII large subunit [EC:3.1.11.6]

contig-65_174_8_10 K03602 xseB; exodeoxyribonuclease VII small subunit [EC:3.1.11.6]
3.1.12 Exodeoxyribonucleases producing 3'-phosphomonoesters
3.1.12.1 5' to 3' exodeoxyribonuclease (nucleoside 3'-phosphate-forming)
contig-65_862_154_4 K07464 cas4; CRISPR-associated exonuclease Cas4 [EC:3.1.12.1]
3.1.13 Exoribonucleases producing 5'-phosphomonoesters
3.1.13.3 oligonucleotidase
contig-65_297_21_3 K06881 nrnA; bifunctional oligoribonuclease and PAP phosphatase NrnA [EC:3.1.3.7 3.1.13.3]
3.1.21 Endodeoxyribonucleases producing 5'-phosphomonoesters
3.1.21.2 deoxyribonuclease IV
contig-65_604_85_6 K01151 nfo; deoxyribonuclease IV [EC:3.1.21.2]
3.1.21.3 type I site-specific deoxyribonuclease
contig-65_658_96_5 K01153 hsdR; type I restriction enzyme, R subunit [EC:3.1.21.3]
3.1.21.5 type III site-specific deoxyribonuclease
contig-65_775_125_7 K01156 res; type III restriction enzyme [EC:3.1.21.5]
3.1.21.-
contig-65_531_68_9 K03424 tatD; TatD DNase family protein [EC:3.1.21.-]
3.1.22 Endodeoxyribonucleases producing 3'-phosphomonoesters
3.1.22.4 crossover junction endodeoxyribonuclease
contig-65_307_23_19 K01159 ruvC; crossover junction endodeoxyribonuclease RuvC [EC:3.1.22.4]
3.1.26 Endoribonucleases producing 5'-phosphomonoesters
3.1.26.3 ribonuclease III
contig-65_628_91_4 K03685 rnc, DROSHA, RNT1; ribonuclease III [EC:3.1.26.3]
3.1.26.4 ribonuclease H
contig-65_447_54_12 K03469 rnhA, RNASEH1; ribonuclease HI [EC:3.1.26.4]
contig-65_562_73_3 K03470 rnhB; ribonuclease HII [EC:3.1.26.4]
3.1.26.5 ribonuclease P
contig-65_462_59_10 K03536 rnpA; ribonuclease P protein component [EC:3.1.26.5]
3.1.26.-
contig-65_445_53_5 K08301 rng, cafA; ribonuclease G [EC:3.1.26.-]
3.1.31 Endoribonucleases that are active with either ribo- or deoxyribonucleic acids and produce 3'-phosphomonoesters
3.1.31.1 micrococcal nuclease
contig-65_123_4_10 K01174 nuc; micrococcal nuclease [EC:3.1.31.1]
3.1.- Acting on ester bonds
3.1.-.-
contig-65_342_30_12 K03698 cbf, cbf1; 3'-5' exoribonuclease [EC:3.1.-.-]
contig-65_614_86_9 K07012 cas3; CRISPR-associated endonuclease/helicase Cas3 [EC:3.1.-.- 3.6.4.-]
contig-65_845_150_5 K07012 cas3; CRISPR-associated endonuclease/helicase Cas3 [EC:3.1.-.- 3.6.4.-]
contig-65_443_52_5 K07171 mazF, ndoA, chpA; mRNA interferase MazF [EC:3.1.-.-]
contig-65_535_70_5 K07447 ruvX; putative holliday junction resolvase [EC:3.1.-.-]
contig-65_320_25_5 K07462 recJ; single-stranded-DNA-specific exonuclease [EC:3.1.-.-]
contig-65_320_25_8 K07560 dtd, DTD1; D-tyrosyl-tRNA(Tyr) deacylase [EC:3.1.-.-]
contig-65_292_20_10 K12574 rnj; ribonuclease J [EC:3.1.-.-]
contig-65_377_39_14 K12574 rnj; ribonuclease J [EC:3.1.-.-]
contig-65_817_139_3 K16898 addA; ATP-dependent helicase/nuclease subunit A [EC:3.1.-.- 3.6.4.12]
contig-65_378_40_16 K16899 addB; ATP-dependent helicase/nuclease subunit B [EC:3.1.-.- 3.6.4.12]

contig-65_817_139_4 K16899 addB; ATP-dependent helicase/nuclease subunit B [EC:3.1.-.- 3.6.4.12]
contig-65_151_5_14 K18682 rny; ribonuclease Y [EC:3.1.-.-]
contig-65_845_150_1 K19091 cas6; CRISPR-associated endoribonuclease Cas6 [EC:3.1.-.-]
3.2 Glycosylases
3.2.2 Hydrolysing N-glycosyl compounds
3.2.2.23 DNA-formamidopyrimidine glycosylase
contig-65_286_19_13 K10563 mutM, fpg; formamidopyrimidine-DNA glycosylase [EC:3.2.2.23 4.2.99.18]
3.3 Acting on ether bonds
3.3.1 Thioether and trialkylsulfonium hydrolases
3.3.1.1 adenosylhomocysteinase
contig-65_725_111_1 K01251 E3.3.1.1, ahcY; adenosylhomocysteinase [EC:3.3.1.1]
contig-65_1438_242_1 K01251 E3.3.1.1, ahcY; adenosylhomocysteinase [EC:3.3.1.1]
3.4 Acting on peptide bonds (peptidases)
3.4.11 Aminopeptidases
3.4.11.4 tripeptide aminopeptidase
contig-65_535_70_8 K01258 pepT; tripeptide aminopeptidase [EC:3.4.11.4]
3.4.11.9 Xaa-Pro aminopeptidase
contig-65_174_8_29 K01262 pepP; Xaa-Pro aminopeptidase [EC:3.4.11.9]
3.4.11.18 methionyl aminopeptidase
contig-65_346_31_17 K01265 map; methionyl aminopeptidase [EC:3.4.11.18]
3.4.13 Dipeptidases
3.4.13.19 membrane dipeptidase
contig-65_151_5_18 K01273 DPEP; membrane dipeptidase [EC:3.4.13.19]
3.4.15 Peptidyl-dipeptidases
3.4.15.6 cyanophycinase
contig-65_531_68_4 K13282 cphB; cyanophycinase [EC:3.4.15.6]
3.4.16 Serine-type carboxypeptidases
3.4.16.4 serine-type D-Ala-D-Ala carboxypeptidase
contig-65_424_46_1 K05366 mrcA; penicillin-binding protein 1A [EC:2.4.1.129 3.4.16.4]
contig-65_292_20_17 K05515 mrdA; penicillin-binding protein 2 [EC:3.4.16.4]
contig-65_328_27_6 K05515 mrdA; penicillin-binding protein 2 [EC:3.4.16.4]
contig-65_1438_242_3 K05515 mrdA; penicillin-binding protein 2 [EC:3.4.16.4]
contig-65_342_30_2 K07258 dacC, dacA, dacD; serine-type D-Ala-D-Ala carboxypeptidase (penicillin-binding protein 5/6) [EC:3.4.16.4]
contig-65_359_34_9 K07258 dacC, dacA, dacD; serine-type D-Ala-D-Ala carboxypeptidase (penicillin-binding protein 5/6) [EC:3.4.16.4]
contig-65_501_64_11 K07258 dacC, dacA, dacD; serine-type D-Ala-D-Ala carboxypeptidase (penicillin-binding protein 5/6) [EC:3.4.16.4]
3.4.17 Metallocarboxypeptidases
3.4.17.11 glutamate carboxypeptidase
contig-65_909_165_5 K01295 cpg; glutamate carboxypeptidase [EC:3.4.17.11]
3.4.17.-
contig-65_307_23_4 K21140 mec; sulfur-carrier protein carboxypeptidase [EC:3.4.17.-]
3.4.19 Omega peptidases
3.4.19.13 glutathione hydrolase
contig-65_439_50_14 K00681 ggt; gamma-glutamyltranspeptidase / glutathione hydrolase [EC:2.3.2.2 3.4.19.13]
3.4.19.-
contig-65_774_124_4 K01305 iadA; beta-aspartyl-dipeptidase (metallo-type) [EC:3.4.19.-]
3.4.21 Serine endopeptidases

3.4.21.53 endopeptidase La
contig-65_193_11_13 K01338 lon; ATP-dependent Lon protease [EC:3.4.21.53]
contig-65_465_60_12 K01338 lon; ATP-dependent Lon protease [EC:3.4.21.53]
3.4.21.88 repressor LexA
contig-65_95_1_18 K01356 lexA; repressor LexA [EC:3.4.21.88]
3.4.21.89 signal peptidase I
contig-65_562_73_5 K03100 lepB, TPP; signal peptidase I [EC:3.4.21.89]
3.4.21.92 endopeptidase Clp
contig-65_465_60_2 K01358 clpP, CLPP; ATP-dependent Clp protease, protease subunit [EC:3.4.21.92]
3.4.21.102 C-terminal processing peptidase
contig-65_248_15_7 K03797 E3.4.21.102, prc, ctpA; carboxyl-terminal processing protease [EC:3.4.21.102]
3.4.21.107 peptidase Do
contig-65_109_2_8 K04771 degP, htrA; serine protease Do [EC:3.4.21.107]
3.4.21.116 SpoIVB peptidase
contig-65_1618_258_1 K06399 spoIVB; stage IV sporulation protein B [EC:3.4.21.116]
3.4.21.-
contig-65_465_60_11 K04076 lonB; Lon-like ATP-dependent protease [EC:3.4.21.-]
contig-65_109_2_34 K04773 sppA; protease IV [EC:3.4.21.-]
3.4.23 Aspartic endopeptidases
3.4.23.36 signal peptidase II
contig-65_1827_279_2 K03101 lspA; signal peptidase II [EC:3.4.23.36]
3.4.23.43 prepilin peptidase
contig-65_714_108_2 K02654 pilD, pppA; leader peptidase (prepilin peptidase) / N-methyltransferase [EC:3.4.23.43 2.1.1.-]
3.4.23.-
contig-65_189_9_16 K03605 hyaD, hybD; hydrogenase maturation protease [EC:3.4.23.-]
contig-65_333_28_6 K03605 hyaD, hybD; hydrogenase maturation protease [EC:3.4.23.-]
contig-65_836_147_7 K03605 hyaD, hybD; hydrogenase maturation protease [EC:3.4.23.-]
contig-65_348_32_3 K06383 spoIIGA; stage II sporulation protein GA (sporulation sigma-E factor processing peptidase) [EC:3.4.23.-]
3.4.24 Metalloendopeptidases
3.4.24.78 gpr endopeptidase
contig-65_458_58_5 K06012 gpr; spore protease [EC:3.4.24.78]
3.4.24.-
contig-65_438_49_3 K03798 ftsH, hflB; cell division protease FtsH [EC:3.4.24.-]
contig-65_445_53_8 K06402 spoIVFB; stage IV sporulation protein FB [EC:3.4.24.-]
contig-65_297_21_15 K11749 rseP; regulator of sigma E protease [EC:3.4.24.-]
3.4.25 Threonine endopeptidases
3.4.25.2 HslU---HslV peptidase
contig-65_1279_219_4 K01419 hslV, clpQ; ATP-dependent HslUV protease, peptidase subunit HslV [EC:3.4.25.2]
3.4.- Acting on peptide bonds (peptidases)
3.4.-.-
contig-65_535_70_9 K08303 K08303; putative protease [EC:3.4.-.-]
contig-65_835_146_3 K21471 cwlO; peptidoglycan DL-endopeptidase CwlO [EC:3.4.-.-]
3.5 Acting on carbon-nitrogen bonds, other than peptide bonds
3.5.1 In linear amides
3.5.1.18 succinyl-diaminopimelate desuccinylase
contig-65_1038_185_5 K01439 dapE; succinyl-diaminopimelate desuccinylase [EC:3.5.1.18]
3.5.1.28 N-acetylmuramoyl-L-alanine amidase


contig-65_118_3_20 K01448 E3.5.1.28B, amiA, amiB, amiC; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] contig-65_191_10_28 K01448 E3.5.1.28B, amiA, amiB, amiC; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] contig-65_497_63_7 K01448 E3.5.1.28B, amiA, amiB, amiC; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] contig-65_592_80_3 K01448 E3.5.1.28B, amiA, amiB, amiC; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] contig-65_669_98_8 K01448 E3.5.1.28B, amiA, amiB, amiC; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] contig-65_118_3_9 K01449 E3.5.1.28C, cwlJ, sleB; N-acetylmuramoyl-L-alanine amidase [EC:3.5.1.28] 3.5.1.42 nicotinamide-nucleotide amidase contig-65_151_5_9 K03742 pncC; nicotinamide-nucleotide amidase [EC:3.5.1.42] 3.5.1.44 protein-glutamine glutaminase contig-65_109_2_24 K03411 cheD; chemotaxis protein CheD [EC:3.5.1.44] 3.5.1.81 N-acyl-D-amino-acid deacylase contig-65_118_3_11 K06015 E3.5.1.81; N-acyl-D-amino-acid deacylase [EC:3.5.1.81] contig-65_786_130_2 K06015 E3.5.1.81; N-acyl-D-amino-acid deacylase [EC:3.5.1.81] 3.5.1.88 peptide deformylase contig-65_751_117_7 K01462 PDF, def; peptide deformylase [EC:3.5.1.88] 3.5.1.124 protein deglycase contig-65_1022_182_3 K05520 pfpI; protease I [EC:3.5.1.124] 3.5.1.- contig-65_342_30_6 K12410 npdA; NAD-dependent deacetylase [EC:3.5.1.-] 3.5.2 In cyclic amides 3.5.2.3 dihydroorotase contig-65_191_10_6 K01465 URA4, pyrC; dihydroorotase [EC:3.5.2.3] contig-65_818_140_1 K01465 URA4, pyrC; dihydroorotase [EC:3.5.2.3] contig-65_1350_230_1 K01465 URA4, pyrC; dihydroorotase [EC:3.5.2.3] 3.5.2.7 imidazolonepropionase contig-65_264_17_24 K01468 hutI, AMDHD1; imidazolonepropionase [EC:3.5.2.7] contig-65_600_82_1 K01468 hutI, AMDHD1; imidazolonepropionase [EC:3.5.2.7] 3.5.2.10 creatininase contig-65_285_18_19 K01470 E3.5.2.10; creatinine amidohydrolase [EC:3.5.2.10] 3.5.3 In linear amidines 3.5.3.11 agmatinase contig-65_109_2_19 K01480 speB; agmatinase [EC:3.5.3.11] 3.5.4 In cyclic amidines 3.5.4.5 cytidine 
deaminase contig-65_723_110_4 K01489 cdd, CDA; cytidine deaminase [EC:3.5.4.5] 3.5.4.9 methenyltetrahydrofolate cyclohydrolase contig-65_174_8_12 K01491 folD; methylenetetrahydrofolate dehydrogenase (NADP+) / methenyltetrahydrofolate cyclohydrolase [EC:1.5.1.5 3.5.4.9] 3.5.4.10 IMP cyclohydrolase contig-65_1404_238_2 K00602 purH; phosphoribosylaminoimidazolecarboxamide formyltransferase / IMP cyclohydrolase [EC:2.1.2.3 3.5.4.10] 3.5.4.12 dCMP deaminase contig-65_376_38_13 K01493 comEB; dCMP deaminase [EC:3.5.4.12] 3.5.4.16 GTP cyclohydrolase I contig-65_438_49_9 K01495 GCH1, folE; GTP cyclohydrolase I [EC:3.5.4.16] 3.5.4.19 phosphoribosyl-AMP cyclohydrolase


contig-65_595_81_5 K11755 hisIE; phosphoribosyl-ATP pyrophosphohydrolase / phosphoribosyl-AMP cyclohydrolase [EC:3.6.1.31 3.5.4.19] 3.5.4.25 GTP cyclohydrolase II contig-65_193_11_19 K14652 ribBA; 3,4-dihydroxy 2-butanone 4-phosphate synthase / GTP cyclohydrolase II [EC:4.1.99.12 3.5.4.25] 3.5.4.26 diaminohydroxyphosphoribosylaminopyrimidine deaminase contig-65_193_11_21 K11752 ribD; diaminohydroxyphosphoribosylaminopyrimidine deaminase / 5-amino-6-(5-phosphoribosylamino)uracil reductase [EC:3.5.4.26 1.1.1.193] 3.5.4.27 methenyltetrahydromethanopterin cyclohydrolase contig-65_212_13_15 K01499 mch; methenyltetrahydromethanopterin cyclohydrolase [EC:3.5.4.27] 3.5.4.33 tRNA(adenine34) deaminase contig-65_585_79_1 K11991 tadA; tRNA(adenine34) deaminase [EC:3.5.4.33] contig-65_1009_180_7 K11991 tadA; tRNA(adenine34) deaminase [EC:3.5.4.33] 3.5.4.39 GTP cyclohydrolase IV contig-65_212_13_5 K17488 mptA; GTP cyclohydrolase IV [EC:3.5.4.39] 3.5.5 In nitriles 3.5.5.1 nitrilase contig-65_109_2_30 K01501 E3.5.5.1; nitrilase [EC:3.5.5.1] 3.6 Acting on acid anhydrides 3.6.1 In phosphorus-containing anhydrides 3.6.1.1 inorganic diphosphatase contig-65_497_63_3 K15987 hppA; K(+)-stimulated pyrophosphate-energized sodium pump [EC:3.6.1.1] 3.6.1.13 ADP-ribose diphosphatase contig-65_359_34_1 K01515 nudF; ADP-ribose pyrophosphatase [EC:3.6.1.13] 3.6.1.23 dUTP diphosphatase contig-65_377_39_4 K01520 dut, DUT; dUTP pyrophosphatase [EC:3.6.1.23] 3.6.1.27 undecaprenyl-diphosphate phosphatase contig-65_603_84_2 K06153 bacA; undecaprenyl-diphosphatase [EC:3.6.1.27] 3.6.1.31 phosphoribosyl-ATP diphosphatase contig-65_595_81_5 K11755 hisIE; phosphoribosyl-ATP pyrophosphohydrolase / phosphoribosyl-AMP cyclohydrolase [EC:3.6.1.31 3.5.4.19] 3.6.1.66 XTP/dITP diphosphatase contig-65_497_63_11 K02428 rdgB; XTP/dITP diphosphohydrolase [EC:3.6.1.66] 3.6.1.- contig-65_1506_249_2 K06949 rsgA, engC; ribosome biogenesis GTPase / thiamine phosphate phosphatase [EC:3.6.1.- 3.1.3.100] 3.6.3 Acting on acid anhydrides
to catalyse transmembrane movement of substances 3.6.3.3 Cd2+-exporting ATPase contig-65_809_138_2 K01534 zntA; Cd2+/Zn2+-exporting ATPase [EC:3.6.3.3 3.6.3.5] 3.6.3.5 Zn2+-exporting ATPase contig-65_809_138_2 K01534 zntA; Cd2+/Zn2+-exporting ATPase [EC:3.6.3.3 3.6.3.5] 3.6.3.8 Ca2+-transporting ATPase contig-65_191_10_25 K01537 E3.6.3.8; Ca2+-transporting ATPase [EC:3.6.3.8] 3.6.3.14 H+-transporting two-sector ATPase contig-65_376_38_3 K02111 ATPF1A, atpA; F-type H+-transporting ATPase subunit alpha [EC:3.6.3.14] contig-65_376_38_1 K02112 ATPF1B, atpD; F-type H+-transporting ATPase subunit beta [EC:3.6.3.14] contig-65_1919_286_3 K02112 ATPF1B, atpD; F-type H+-transporting ATPase subunit beta [EC:3.6.3.14]


contig-65_399_42_17 K02412 fliI; flagellum-specific ATP synthase [EC:3.6.3.14] 3.6.3.29 molybdate-transporting ATPase contig-65_961_171_4 K02017 modC; molybdate transport system ATP-binding protein [EC:3.6.3.29] 3.6.3.34 iron-chelate-transporting ATPase contig-65_755_119_3 K02013 ABC.FEV.A; iron complex transport system ATP-binding protein [EC:3.6.3.34] 3.6.3.41 heme-transporting ATPase contig-65_455_56_10 K02193 ccmA; heme exporter protein A [EC:3.6.3.41] 3.6.3.54 Cu+-exporting ATPase contig-65_455_56_2 K17686 copA, ATP7; Cu+-exporting ATPase [EC:3.6.3.54] contig-65_750_116_7 K17686 copA, ATP7; Cu+-exporting ATPase [EC:3.6.3.54] contig-65_1085_191_2 K17686 copA, ATP7; Cu+-exporting ATPase [EC:3.6.3.54] 3.6.3.55 tungstate-importing ATPase contig-65_109_2_39 K06857 tupC, vupC; tungstate transport system ATP-binding protein [EC:3.6.3.55] 3.6.3.- contig-65_346_31_25 K16786 ecfA1; energy-coupling factor transport system ATP-binding protein [EC:3.6.3.-] contig-65_346_31_26 K16787 ecfA2; energy-coupling factor transport system ATP-binding protein [EC:3.6.3.-] contig-65_798_134_2 K16907 K16907; fluoroquinolone transport system ATP-binding protein [EC:3.6.3.-] 3.6.4 Acting on acid anhydrides to facilitate cellular and subcellular movement 3.6.4.12 DNA helicase contig-65_193_11_12 K02314 dnaB; replicative DNA helicase [EC:3.6.4.12] contig-65_1022_182_4 K02314 dnaB; replicative DNA helicase [EC:3.6.4.12] contig-65_307_23_20 K03550 ruvA; holliday junction DNA helicase RuvA [EC:3.6.4.12] contig-65_307_23_21 K03551 ruvB; holliday junction DNA helicase RuvB [EC:3.6.4.12] contig-65_939_166_2 K03657 uvrD, pcrA; DNA helicase II / ATP-dependent DNA helicase PcrA [EC:3.6.4.12] contig-65_95_1_33 K03722 dinG; ATP-dependent DNA helicase DinG [EC:3.6.4.12] contig-65_817_139_3 K16898 addA; ATP-dependent helicase/nuclease subunit A [EC:3.1.-.- 3.6.4.12] contig-65_378_40_16 K16899 addB; ATP-dependent helicase/nuclease subunit B [EC:3.1.-.- 3.6.4.12] contig-65_817_139_4 K16899 addB; 
ATP-dependent helicase/nuclease subunit B [EC:3.1.-.- 3.6.4.12] 3.6.4.- contig-65_336_29_7 K03723 mfd; transcription-repair coupling factor (superfamily II helicase) [EC:3.6.4.-] contig-65_751_117_6 K04066 priA; primosomal protein N' (replication factor Y) (superfamily II helicase) [EC:3.6.4.-] contig-65_614_86_9 K07012 cas3; CRISPR-associated endonuclease/helicase Cas3 [EC:3.1.-.- 3.6.4.-] contig-65_845_150_5 K07012 cas3; CRISPR-associated endonuclease/helicase Cas3 [EC:3.1.-.- 3.6.4.-] 3.6.5 Acting on GTP to facilitate cellular and subcellular movement 3.6.5.4 signal-recognition-particle GTPase contig-65_562_73_11 K03106 SRP54, ffh; signal recognition particle subunit SRP54 [EC:3.6.5.4] 3.6.5.-


contig-65_775_125_1 K03979 obgE, cgtA; GTPase [EC:3.6.5.-] 3.6.- Acting on acid anhydrides 3.6.-.- contig-65_462_59_13 K03650 mnmE, trmE, MSS1; tRNA modification GTPase [EC:3.6.-.-] 3.7 Acting on carbon-carbon bonds 3.7.1 In ketonic substances 3.7.1.12 cobalt-precorrin 5A hydrolase contig-65_443_52_14 K02189 cbiG; cobalt-precorrin 5A hydrolase [EC:3.7.1.12] 4. Lyases 4.1 Carbon-carbon lyases 4.1.1 Carboxy-lyases 4.1.1.3 oxaloacetate decarboxylase contig-65_109_2_36 K01571 oadA; oxaloacetate decarboxylase, alpha subunit [EC:4.1.1.3] contig-65_681_100_3 K01572 oadB; oxaloacetate decarboxylase, beta subunit [EC:4.1.1.3] 4.1.1.7 benzoylformate decarboxylase contig-65_1667_260_2 K01576 mdlC; benzoylformate decarboxylase [EC:4.1.1.7] 4.1.1.11 aspartate 1-decarboxylase contig-65_551_72_3 K01579 panD; aspartate 1-decarboxylase [EC:4.1.1.11] 4.1.1.19 arginine decarboxylase contig-65_109_2_17 K02626 pdaD; arginine decarboxylase [EC:4.1.1.19] 4.1.1.20 diaminopimelate decarboxylase contig-65_695_103_1 K01586 lysA; diaminopimelate decarboxylase [EC:4.1.1.20] 4.1.1.23 orotidine-5'-phosphate decarboxylase contig-65_191_10_9 K01591 pyrF; orotidine-5'-phosphate decarboxylase [EC:4.1.1.23] 4.1.1.36 phosphopantothenoylcysteine decarboxylase contig-65_751_117_5 K13038 coaBC, dfp; phosphopantothenoylcysteine decarboxylase / phosphopantothenate--cysteine ligase [EC:4.1.1.36 6.3.2.5] 4.1.1.37 uroporphyrinogen decarboxylase contig-65_809_138_4 K01599 hemE, UROD; uroporphyrinogen decarboxylase [EC:4.1.1.37] contig-65_949_169_2 K01599 hemE, UROD; uroporphyrinogen decarboxylase [EC:4.1.1.37] 4.1.1.48 indole-3-glycerol-phosphate synthase contig-65_769_123_1 K01609 trpC; indole-3-glycerol phosphate synthase [EC:4.1.1.48] 4.1.1.50 adenosylmethionine decarboxylase contig-65_95_1_45 K01611 speD, AMD1; S-adenosylmethionine decarboxylase [EC:4.1.1.50] 4.1.1.81 threonine-phosphate decarboxylase contig-65_118_3_22 K04720 cobD; threonine-phosphate decarboxylase [EC:4.1.1.81] 4.1.1.98 4-hydroxy-3-polyprenylbenzoate
decarboxylase contig-65_380_41_17 K03182 ubiD; 4-hydroxy-3-polyprenylbenzoate decarboxylase [EC:4.1.1.98] 4.1.2 Aldehyde-lyases 4.1.2.13 fructose-bisphosphate aldolase contig-65_725_111_4 K01622 K01622; fructose 1,6-bisphosphate aldolase/phosphatase [EC:4.1.2.13 3.1.3.11] 4.1.2.14 2-dehydro-3-deoxy-phosphogluconate aldolase contig-65_123_4_8 K01625 eda; 2-dehydro-3-deoxyphosphogluconate aldolase / (4S)-4-hydroxy-2-oxoglutarate aldolase [EC:4.1.2.14 4.1.3.42] 4.1.2.17 L-fuculose-phosphate aldolase contig-65_285_18_20 K01628 fucA; L-fuculose-phosphate aldolase [EC:4.1.2.17] 4.1.2.20 2-dehydro-3-deoxyglucarate aldolase contig-65_166_6_28 K01630 garL; 2-dehydro-3-deoxyglucarate aldolase [EC:4.1.2.20] 4.1.2.25 dihydroneopterin aldolase


contig-65_438_49_11 K01633 folB; dihydroneopterin aldolase / 7,8-dihydroneopterin epimerase [EC:4.1.2.25 5.1.99.8] 4.1.2.48 low-specificity L-threonine aldolase contig-65_320_25_13 K01620 ltaE; threonine aldolase [EC:4.1.2.48] 4.1.2.52 4-hydroxy-2-oxoheptanedioate aldolase contig-65_118_3_34 K02510 hpaI, hpcH; 4-hydroxy-2-oxoheptanedioate aldolase [EC:4.1.2.52] contig-65_817_139_2 K02510 hpaI, hpcH; 4-hydroxy-2-oxoheptanedioate aldolase [EC:4.1.2.52] 4.1.3 Oxo-acid-lyases 4.1.3.4 hydroxymethylglutaryl-CoA lyase contig-65_804_135_7 K01640 E4.1.3.4, HMGCL, hmgL; hydroxymethylglutaryl-CoA lyase [EC:4.1.3.4] 4.1.3.27 anthranilate synthase contig-65_212_13_3 K01657 trpE; anthranilate synthase component I [EC:4.1.3.27] contig-65_212_13_2 K01658 trpG; anthranilate synthase component II [EC:4.1.3.27] contig-65_443_52_3 K01658 trpG; anthranilate synthase component II [EC:4.1.3.27] 4.1.3.34 citryl-CoA lyase contig-65_118_3_16 K01644 citE; citrate lyase subunit beta / citryl-CoA lyase [EC:4.1.3.34] 4.1.3.42 (4S)-4-hydroxy-2-oxoglutarate aldolase contig-65_123_4_8 K01625 eda; 2-dehydro-3-deoxyphosphogluconate aldolase / (4S)-4-hydroxy-2-oxoglutarate aldolase [EC:4.1.2.14 4.1.3.42] 4.1.3.- contig-65_595_81_6 K02500 hisF; cyclase [EC:4.1.3.-] 4.1.99 Other carbon-carbon lyases 4.1.99.12 3,4-dihydroxy-2-butanone-4-phosphate synthase contig-65_193_11_19 K14652 ribBA; 3,4-dihydroxy 2-butanone 4-phosphate synthase / GTP cyclohydrolase II [EC:4.1.99.12 3.5.4.25] 4.1.99.17 phosphomethylpyrimidine synthase contig-65_193_11_17 K03147 thiC; phosphomethylpyrimidine synthase [EC:4.1.99.17] 4.1.99.19 2-iminoacetate synthase contig-65_302_22_12 K03150 thiH; 2-iminoacetate synthase [EC:4.1.99.19] 4.1.99.22 GTP 3',8-cyclase contig-65_974_173_2 K03639 moaA, CNX2; GTP 3',8-cyclase [EC:4.1.99.22] 4.2 Carbon-oxygen lyases 4.2.1 Hydro-lyases 4.2.1.2 fumarate hydratase contig-65_203_12_15 K01677 E4.2.1.2AA, fumA; fumarate hydratase subunit alpha [EC:4.2.1.2] contig-65_203_12_16 K01678 E4.2.1.2AB,
fumB; fumarate hydratase subunit beta [EC:4.2.1.2] 4.2.1.3 aconitate hydratase contig-65_601_83_9 K01681 ACO, acnA; aconitate hydratase [EC:4.2.1.3] 4.2.1.7 altronate dehydratase contig-65_118_3_2 K16849 uxaA1; altronate dehydratase small subunit [EC:4.2.1.7] contig-65_193_11_3 K16849 uxaA1; altronate dehydratase small subunit [EC:4.2.1.7] contig-65_193_11_2 K16850 uxaA2; altronate dehydratase large subunit [EC:4.2.1.7] 4.2.1.9 dihydroxy-acid dehydratase contig-65_307_23_16 K01687 ilvD; dihydroxy-acid dehydratase [EC:4.2.1.9] contig-65_445_53_16 K01687 ilvD; dihydroxy-acid dehydratase [EC:4.2.1.9] contig-65_551_72_8 K01687 ilvD; dihydroxy-acid dehydratase [EC:4.2.1.9] contig-65_725_111_8 K01687 ilvD; dihydroxy-acid dehydratase [EC:4.2.1.9] contig-65_1355_232_5 K01687 ilvD; dihydroxy-acid dehydratase [EC:4.2.1.9]


4.2.1.10 3-dehydroquinate dehydratase contig-65_174_8_30 K03786 aroQ, qutE; 3-dehydroquinate dehydratase II [EC:4.2.1.10] 4.2.1.11 phosphopyruvate hydratase contig-65_447_54_7 K01689 ENO, eno; enolase [EC:4.2.1.11] contig-65_497_63_1 K01689 ENO, eno; enolase [EC:4.2.1.11] 4.2.1.17 enoyl-CoA hydratase contig-65_1212_208_1 K01715 crt; enoyl-CoA hydratase [EC:4.2.1.17] 4.2.1.19 imidazoleglycerol-phosphate dehydratase contig-65_595_81_9 K01693 hisB; imidazoleglycerol-phosphate dehydratase [EC:4.2.1.19] 4.2.1.20 tryptophan synthase contig-65_769_123_4 K01695 trpA; tryptophan synthase alpha chain [EC:4.2.1.20] contig-65_769_123_3 K01696 trpB; tryptophan synthase beta chain [EC:4.2.1.20] contig-65_592_80_8 K06001 trpB; tryptophan synthase beta chain [EC:4.2.1.20] 4.2.1.24 porphobilinogen synthase contig-65_765_122_3 K01698 hemB, ALAD; porphobilinogen synthase [EC:4.2.1.24] 4.2.1.33 3-isopropylmalate dehydratase contig-65_250_16_9 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_352_33_9 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_592_80_6 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_782_127_2 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_1144_200_2 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_352_33_8 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] contig-65_592_80_5 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] contig-65_782_127_3 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] contig-65_1144_200_1 K01704 leuD, IPMI-S;
3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] 4.2.1.35 (R)-2-methylmalate dehydratase contig-65_250_16_9 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_352_33_9 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_592_80_6 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_782_127_2 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_1144_200_2 K01703 leuC, IPMI-L; 3-isopropylmalate/(R)-2-methylmalate dehydratase large subunit [EC:4.2.1.33 4.2.1.35] contig-65_352_33_8 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] contig-65_592_80_5 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] contig-65_782_127_3 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35]


contig-65_1144_200_1 K01704 leuD, IPMI-S; 3-isopropylmalate/(R)-2-methylmalate dehydratase small subunit [EC:4.2.1.33 4.2.1.35] 4.2.1.46 dTDP-glucose 4,6-dehydratase contig-65_285_18_3 K01710 E4.2.1.46, rfbB, rffG; dTDP-glucose 4,6-dehydratase [EC:4.2.1.46] 4.2.1.47 GDP-mannose 4,6-dehydratase contig-65_166_6_21 K01711 gmd, GMDS; GDPmannose 4,6-dehydratase [EC:4.2.1.47] 4.2.1.49 urocanate hydratase contig-65_264_17_17 K01712 hutU, UROC1; urocanate hydratase [EC:4.2.1.49] contig-65_264_17_19 K01712 hutU, UROC1; urocanate hydratase [EC:4.2.1.49] contig-65_264_17_23 K01712 hutU, UROC1; urocanate hydratase [EC:4.2.1.49] 4.2.1.51 prephenate dehydratase contig-65_191_10_24 K04518 pheA2; prephenate dehydratase [EC:4.2.1.51] 4.2.1.55 3-hydroxybutyryl-CoA dehydratase contig-65_123_4_40 K17865 croR; 3-hydroxybutyryl-CoA dehydratase [EC:4.2.1.55] 4.2.1.59 3-hydroxyacyl-[acyl-carrier-protein] dehydratase contig-65_696_104_8 K02372 fabZ; 3-hydroxyacyl-[acyl-carrier-protein] dehydratase [EC:4.2.1.59] 4.2.1.75 uroporphyrinogen-III synthase contig-65_737_114_7 K13542 cobA-hemD; uroporphyrinogen III methyltransferase / synthase [EC:2.1.1.107 4.2.1.75] 4.2.1.120 4-hydroxybutanoyl-CoA dehydratase contig-65_123_4_39 K14534 abfD; 4-hydroxybutyryl-CoA dehydratase / vinylacetyl-CoA-Delta-isomerase [EC:4.2.1.120 5.3.3.3] contig-65_189_9_22 K14534 abfD; 4-hydroxybutyryl-CoA dehydratase / vinylacetyl-CoA-Delta-isomerase [EC:4.2.1.120 5.3.3.3] 4.2.1.136 ADP-dependent NAD(P)H-hydrate dehydratase contig-65_443_52_8 K17758 nnr; ADP-dependent NAD(P)H-hydrate dehydratase [EC:4.2.1.136] 4.2.1.151 chorismate dehydratase contig-65_292_20_12 K11782 mqnA; chorismate dehydratase [EC:4.2.1.151] 4.2.2 Acting on polysaccharides 4.2.2.- contig-65_1071_188_6 K08309 slt; soluble lytic murein transglycosylase [EC:4.2.2.-] 4.2.3 Acting on phosphates 4.2.3.1 threonine synthase contig-65_167_7_2 K01733 thrC; threonine synthase [EC:4.2.3.1] contig-65_943_167_6 K01733 thrC; threonine synthase [EC:4.2.3.1]
contig-65_1134_196_2 K01733 thrC; threonine synthase [EC:4.2.3.1] 4.2.3.4 3-dehydroquinate synthase contig-65_571_75_10 K01735 aroB; 3-dehydroquinate synthase [EC:4.2.3.4] 4.2.3.5 chorismate synthase contig-65_714_108_9 K01736 aroC; chorismate synthase [EC:4.2.3.5] 4.2.3.153 (5-formylfuran-3-yl)methyl phosphate synthase contig-65_212_13_12 K09733 mfnB; (5-formylfuran-3-yl)methyl phosphate synthase [EC:4.2.3.153] 4.2.99 Other carbon-oxygen lyases 4.2.99.18 DNA-(apurinic or apyrimidinic site) lyase contig-65_286_19_13 K10563 mutM, fpg; formamidopyrimidine-DNA glycosylase [EC:3.2.2.23 4.2.99.18]


4.3 Carbon-nitrogen lyases 4.3.1 Ammonia-lyases 4.3.1.3 histidine ammonia-lyase contig-65_264_17_21 K01745 hutH, HAL; histidine ammonia-lyase [EC:4.3.1.3] 4.3.1.14 3-aminobutyryl-CoA ammonia-lyase contig-65_1142_199_4 K18014 kal; 3-aminobutyryl-CoA ammonia-lyase [EC:4.3.1.14] 4.3.1.19 threonine ammonia-lyase contig-65_336_29_2 K01754 E4.3.1.19, ilvA, tdcB; threonine dehydratase [EC:4.3.1.19] 4.3.2 Amidine-lyases 4.3.2.1 argininosuccinate lyase contig-65_566_74_6 K01755 argH, ASL; argininosuccinate lyase [EC:4.3.2.1] 4.3.2.2 adenylosuccinate lyase contig-65_548_71_8 K01756 purB, ADSL; adenylosuccinate lyase [EC:4.3.2.2] 4.3.3 Amine-lyases 4.3.3.6 pyridoxal 5'-phosphate synthase (glutamine hydrolysing) contig-65_457_57_2 K06215 pdxS, pdx1; pyridoxal 5'-phosphate synthase pdxS subunit [EC:4.3.3.6] contig-65_457_57_3 K08681 pdxT, pdx2; 5'-phosphate synthase pdxT subunit [EC:4.3.3.6] 4.3.3.7 4-hydroxy-tetrahydrodipicolinate synthase contig-65_377_39_13 K01714 dapA; 4-hydroxy-tetrahydrodipicolinate synthase [EC:4.3.3.7] contig-65_963_172_2 K01714 dapA; 4-hydroxy-tetrahydrodipicolinate synthase [EC:4.3.3.7] 4.4 Carbon-sulfur lyases 4.4.1 Carbon-sulfur lyases (only sub-subclass identified to date) 4.4.1.19 phosphosulfolactate synthase contig-65_458_58_8 K08097 comA; phosphosulfolactate synthase [EC:4.4.1.19] 4.6 Phosphorus-oxygen lyases 4.6.1 Phosphorus-oxygen lyases (only sub-subclass identified to date) 4.6.1.12 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase contig-65_360_35_10 K12506 ispDF; 2-C-methyl-D-erythritol 4-phosphate cytidylyltransferase / 2-C-methyl-D-erythritol 2,4-cyclodiphosphate synthase [EC:2.7.7.60 4.6.1.12] 4.6.1.17 cyclic pyranopterin monophosphate synthase contig-65_974_173_1 K03637 moaC, CNX3; cyclic pyranopterin monophosphate synthase [EC:4.6.1.17] 4.99 Other lyases 4.99.1 Sole sub-subclass for lyases that do not belong in the other subclasses 4.99.1.4 sirohydrochlorin ferrochelatase contig-65_737_114_5 K02304 MET8; precorrin-2 dehydrogenase 
/ sirohydrochlorin ferrochelatase [EC:1.3.1.76 4.99.1.4] 5. Isomerases 5.1 Racemases and epimerases 5.1.1 Acting on amino acids and derivatives 5.1.1.1 alanine racemase contig-65_443_52_7 K01775 alr; alanine racemase [EC:5.1.1.1] 5.1.1.3 glutamate racemase contig-65_497_63_8 K01776 murI; glutamate racemase [EC:5.1.1.3] 5.1.1.7 diaminopimelate epimerase contig-65_191_10_26 K01778 dapF; diaminopimelate epimerase [EC:5.1.1.7] 5.1.1.13 aspartate racemase contig-65_367_37_3 K01779 racD; aspartate racemase [EC:5.1.1.13] 5.1.3 Acting on carbohydrates and derivatives 5.1.3.1 ribulose-phosphate 3-epimerase contig-65_1506_249_3 K01783 rpe, RPE; ribulose-phosphate 3-epimerase [EC:5.1.3.1]


5.1.3.2 UDP-glucose 4-epimerase contig-65_166_6_24 K01784 galE, GALE; UDP-glucose 4-epimerase [EC:5.1.3.2] 5.1.3.14 UDP-N-acetylglucosamine 2-epimerase (non-hydrolysing) contig-65_376_38_10 K01791 wecB; UDP-N-acetylglucosamine 2-epimerase (non-hydrolysing) [EC:5.1.3.14] 5.1.99 Acting on other compounds 5.1.99.1 methylmalonyl-CoA epimerase contig-65_681_100_7 K05606 MCEE, epi; methylmalonyl-CoA/ethylmalonyl-CoA epimerase [EC:5.1.99.1] 5.1.99.3 allantoin racemase contig-65_264_17_6 K16841 hpxA; allantoin racemase [EC:5.1.99.3] 5.1.99.6 NAD(P)H-hydrate epimerase contig-65_443_52_8 K17759 AIBP, nnrE; NAD(P)H-hydrate epimerase [EC:5.1.99.6] 5.1.99.8 7,8-dihydroneopterin epimerase contig-65_438_49_11 K01633 folB; dihydroneopterin aldolase / 7,8-dihydroneopterin epimerase [EC:4.1.2.25 5.1.99.8] 5.3 Intramolecular oxidoreductases 5.3.1 Interconverting aldoses and ketoses, and related compounds 5.3.1.1 triose-phosphate isomerase contig-65_583_78_9 K01803 TPI, tpiA; triosephosphate isomerase (TIM) [EC:5.3.1.1] 5.3.1.6 ribose-5-phosphate isomerase contig-65_376_38_15 K01808 rpiB; ribose 5-phosphate isomerase B [EC:5.3.1.6] 5.3.1.8 mannose-6-phosphate isomerase contig-65_633_93_2 K15916 pgi-pmi; glucose/mannose-6-phosphate isomerase [EC:5.3.1.9 5.3.1.8] 5.3.1.9 glucose-6-phosphate isomerase contig-65_1355_232_2 K06859 pgi1; glucose-6-phosphate isomerase, archaeal [EC:5.3.1.9] contig-65_633_93_2 K15916 pgi-pmi; glucose/mannose-6-phosphate isomerase [EC:5.3.1.9 5.3.1.8] 5.3.1.16 1-(5-phosphoribosyl)-5-[(5-phosphoribosylamino)methylideneamino]imidazole-4-carboxamide isomerase contig-65_212_13_4 K01814 hisA; phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase [EC:5.3.1.16] contig-65_595_81_7 K01814 hisA; phosphoribosylformimino-5-aminoimidazole carboxamide ribotide isomerase [EC:5.3.1.16] 5.3.1.23 S-methyl-5-thioribose-1-phosphate isomerase contig-65_457_57_7 K08963 mtnA; methylthioribose-1-phosphate isomerase [EC:5.3.1.23] 5.3.1.24 phosphoribosylanthranilate
isomerase contig-65_769_123_2 K01817 trpF; phosphoribosylanthranilate isomerase [EC:5.3.1.24] 5.3.2 Interconverting keto- and enol-groups 5.3.2.6 2-hydroxymuconate tautomerase contig-65_367_37_6 K01821 praC, xylH; 4-oxalocrotonate tautomerase [EC:5.3.2.6] 5.3.3 Transposing C=C bonds 5.3.3.2 isopentenyl-diphosphate Delta-isomerase contig-65_620_88_1 K01823 idi, IDI; isopentenyl-diphosphate Delta-isomerase [EC:5.3.3.2] 5.3.3.3 vinylacetyl-CoA Delta-isomerase contig-65_123_4_39 K14534 abfD; 4-hydroxybutyryl-CoA dehydratase / vinylacetyl-CoA-Delta-isomerase [EC:4.2.1.120 5.3.3.3] contig-65_189_9_22 K14534 abfD; 4-hydroxybutyryl-CoA dehydratase / vinylacetyl-CoA-Delta-isomerase [EC:4.2.1.120 5.3.3.3] 5.4 Intramolecular transferases 5.4.2 Phosphotransferases (phosphomutases)


5.4.2.7 phosphopentomutase contig-65_359_34_7 K01839 deoB; phosphopentomutase [EC:5.4.2.7] 5.4.2.8 phosphomannomutase contig-65_765_122_7 K01840 manB; phosphomannomutase [EC:5.4.2.8] contig-65_633_93_3 K16881 K16881; mannose-1-phosphate guanylyltransferase / phosphomannomutase [EC:2.7.7.13 5.4.2.8] 5.4.2.10 phosphoglucosamine mutase contig-65_123_4_4 K03431 glmM; phosphoglucosamine mutase [EC:5.4.2.10] 5.4.2.12 phosphoglycerate mutase (2,3-diphosphoglycerate-independent) contig-65_583_78_10 K15633 gpmI; 2,3-bisphosphoglycerate-independent phosphoglycerate mutase [EC:5.4.2.12] 5.4.3 Transferring amino groups 5.4.3.5 D-ornithine 4,5-aminomutase contig-65_285_18_16 K17898 oraE; D-ornithine 4,5-aminomutase subunit beta [EC:5.4.3.5] contig-65_285_18_15 K17899 oraS; D-ornithine 4,5-aminomutase subunit alpha [EC:5.4.3.5] 5.4.3.8 glutamate-1-semialdehyde 2,1-aminomutase contig-65_583_78_2 K01845 hemL; glutamate-1-semialdehyde 2,1-aminomutase [EC:5.4.3.8] 5.4.3.9 glutamate 2,3-aminomutase contig-65_1655_259_1 K19814 eam; glutamate 2,3-aminomutase [EC:5.4.3.9] 5.4.99 Transferring other groups 5.4.99.2 methylmalonyl-CoA mutase contig-65_681_100_9 K01848 E5.4.99.2A, mcmA1; methylmalonyl-CoA mutase, N-terminal domain [EC:5.4.99.2] contig-65_681_100_8 K01849 E5.4.99.2B, mcmA2; methylmalonyl-CoA mutase, C-terminal domain [EC:5.4.99.2] 5.4.99.5 chorismate mutase contig-65_620_88_5 K06208 aroH; chorismate mutase [EC:5.4.99.5] contig-65_943_167_4 K06209 pheB; chorismate mutase [EC:5.4.99.5] 5.4.99.12 tRNA pseudouridine38-40 synthase contig-65_346_31_28 K06173 truA, PUS1; tRNA pseudouridine38-40 synthase [EC:5.4.99.12] 5.4.99.18 5-(carboxyamino)imidazole ribonucleotide mutase contig-65_548_71_10 K01588 purE; 5-(carboxyamino)imidazole ribonucleotide mutase [EC:5.4.99.18] 5.4.99.22 23S rRNA pseudouridine2605 synthase contig-65_620_88_12 K06178 rluB; 23S rRNA pseudouridine2605 synthase [EC:5.4.99.22] 5.4.99.23 23S rRNA pseudouridine1911/1915/1917 synthase contig-65_220_14_9 K06180 rluD; 23S rRNA
pseudouridine1911/1915/1917 synthase [EC:5.4.99.23] contig-65_1827_279_1 K06180 rluD; 23S rRNA pseudouridine1911/1915/1917 synthase [EC:5.4.99.23] 5.4.99.25 tRNA pseudouridine55 synthase contig-65_297_21_2 K03177 truB, PUS4, TRUB1; tRNA pseudouridine55 synthase [EC:5.4.99.25] 5.4.99.58 methylornithine synthase contig-65_710_107_7 K16180 pylB; methylornithine synthase [EC:5.4.99.58] 5.4.99.60 cobalt-precorrin-8 methylmutase contig-65_443_52_12 K06042 cobH-cbiC; precorrin-8X/cobalt-precorrin-8 methylmutase [EC:5.4.99.61 5.4.99.60] 5.4.99.61 precorrin-8X methylmutase contig-65_443_52_12 K06042 cobH-cbiC; precorrin-8X/cobalt-precorrin-8 methylmutase [EC:5.4.99.61 5.4.99.60] 5.5 Intramolecular lyases


5.5.1 Intramolecular lyases (only sub-subclass identified to date) 5.5.1.4 inositol-3-phosphate synthase contig-65_95_1_12 K01858 INO1, ISYNA1; myo-inositol-1-phosphate synthase [EC:5.5.1.4] 5.99 Other isomerases 5.99.1 Sole sub-subclass for isomerases that do not belong in the other subclasses 5.99.1.2 DNA topoisomerase contig-65_1279_219_1 K03168 topA; DNA topoisomerase I [EC:5.99.1.2] contig-65_1434_240_2 K03168 topA; DNA topoisomerase I [EC:5.99.1.2] 5.99.1.3 DNA topoisomerase (ATP-hydrolysing) contig-65_1841_281_1 K02469 gyrA; DNA gyrase subunit A [EC:5.99.1.3] contig-65_1971_290_2 K02469 gyrA; DNA gyrase subunit A [EC:5.99.1.3] contig-65_462_59_2 K02470 gyrB; DNA gyrase subunit B [EC:5.99.1.3] 6. Ligases 6.1 Forming carbon-oxygen bonds 6.1.1 Ligases forming aminoacyl-tRNA and related compounds 6.1.1.1 tyrosine---tRNA ligase contig-65_1814_276_2 K01866 YARS, tyrS; tyrosyl-tRNA synthetase [EC:6.1.1.1] 6.1.1.2 tryptophan---tRNA ligase contig-65_516_67_10 K01867 WARS, trpS; tryptophanyl-tRNA synthetase [EC:6.1.1.2] 6.1.1.3 threonine---tRNA ligase contig-65_875_157_3 K01868 TARS, thrS; threonyl-tRNA synthetase [EC:6.1.1.3] 6.1.1.4 leucine---tRNA ligase contig-65_378_40_3 K01869 LARS, leuS; leucyl-tRNA synthetase [EC:6.1.1.4] 6.1.1.5 isoleucine---tRNA ligase contig-65_348_32_14 K01870 IARS, ileS; isoleucyl-tRNA synthetase [EC:6.1.1.5] 6.1.1.6 lysine---tRNA ligase contig-65_645_95_10 K04567 KARS, lysS; lysyl-tRNA synthetase, class II [EC:6.1.1.6] 6.1.1.7 alanine---tRNA ligase contig-65_535_70_2 K01872 AARS, alaS; alanyl-tRNA synthetase [EC:6.1.1.7] 6.1.1.9 valine---tRNA ligase contig-65_328_27_17 K01873 VARS, valS; valyl-tRNA synthetase [EC:6.1.1.9] 6.1.1.10 methionine---tRNA ligase contig-65_531_68_10 K01874 MARS, metG; methionyl-tRNA synthetase [EC:6.1.1.10] 6.1.1.11 serine---tRNA ligase contig-65_457_57_8 K01875 SARS, serS; seryl-tRNA synthetase [EC:6.1.1.11] 6.1.1.12 aspartate---tRNA ligase contig-65_320_25_12 K01876 DARS, aspS; aspartyl-tRNA synthetase [EC:6.1.1.12]
6.1.1.14 glycine---tRNA ligase contig-65_723_110_8 K01878 glyQ; glycyl-tRNA synthetase alpha chain [EC:6.1.1.14] contig-65_723_110_9 K01879 glyS; glycyl-tRNA synthetase beta chain [EC:6.1.1.14] 6.1.1.15 proline---tRNA ligase contig-65_297_21_13 K01881 PARS, proS; prolyl-tRNA synthetase [EC:6.1.1.15] 6.1.1.16 cysteine---tRNA ligase contig-65_360_35_13 K01883 CARS, cysS; cysteinyl-tRNA synthetase [EC:6.1.1.16] 6.1.1.17 glutamate---tRNA ligase contig-65_585_79_11 K01885 EARS, gltX; glutamyl-tRNA synthetase [EC:6.1.1.17] 6.1.1.19 arginine---tRNA ligase contig-65_109_2_20 K01887 RARS, argS; arginyl-tRNA synthetase [EC:6.1.1.19] 6.1.1.20 phenylalanine---tRNA ligase contig-65_829_144_4 K01889 FARSA, pheS; phenylalanyl-tRNA synthetase alpha chain [EC:6.1.1.20]


  contig-65_829_144_3 K01890 FARSB, pheT; phenylalanyl-tRNA synthetase beta chain [EC:6.1.1.20]
6.1.1.21 histidine---tRNA ligase
  contig-65_320_25_11 K01892 HARS, hisS; histidyl-tRNA synthetase [EC:6.1.1.21]
6.2 Forming carbon-sulfur bonds
6.2.1 Acid-thiol ligases
6.2.1.1 acetate---CoA ligase
  contig-65_95_1_21 K01895 ACSS, acs; acetyl-CoA synthetase [EC:6.2.1.1]
6.2.1.5 succinate---CoA ligase (ADP-forming)
  contig-65_203_12_23 K01902 sucD; succinyl-CoA synthetase alpha subunit [EC:6.2.1.5]
  contig-65_203_12_22 K01903 sucC; succinyl-CoA synthetase beta subunit [EC:6.2.1.5]
  contig-65_1264_215_1 K01903 sucC; succinyl-CoA synthetase beta subunit [EC:6.2.1.5]
6.2.1.16 acetoacetate---CoA ligase
  contig-65_167_7_6 K01907 AACS, acsA; acetoacetyl-CoA synthetase [EC:6.2.1.16]
6.2.1.30 phenylacetate---CoA ligase
  contig-65_1203_207_2 K01912 paaK; phenylacetate-CoA ligase [EC:6.2.1.30]
6.2.1.31 2-furoate---CoA ligase
  contig-65_982_174_3 K16876 hmfD; 2-furoate---CoA ligase [EC:6.2.1.31]
6.3 Forming carbon-nitrogen bonds
6.3.1 Acid-D-ammonia (or amine) ligases (amide synthases)
6.3.1.2 glutamine synthetase
  contig-65_95_1_17 K01915 glnA, GLUL; glutamine synthetase [EC:6.3.1.2]
  contig-65_123_4_12 K01915 glnA, GLUL; glutamine synthetase [EC:6.3.1.2]
6.3.1.5 NAD+ synthase
  contig-65_307_23_12 K01916 nadE; NAD+ synthase [EC:6.3.1.5]
6.3.1.10 adenosylcobinamide-phosphate synthase
  contig-65_118_3_25 K02227 cbiB, cobD; adenosylcobinamide-phosphate synthase [EC:6.3.1.10]
6.3.1.20 lipoate---protein ligase
  contig-65_167_7_17 K03800 lplA, lplJ; lipoate---protein ligase [EC:6.3.1.20]
  contig-65_174_8_34 K03800 lplA, lplJ; lipoate---protein ligase [EC:6.3.1.20]
6.3.2 Acid-D-amino-acid ligases (peptide synthases)
6.3.2.1 pantoate---beta-alanine ligase (AMP-forming)
  contig-65_551_72_4 K01918 panC; pantoate--beta-alanine ligase [EC:6.3.2.1]
6.3.2.4 D-alanine---D-alanine ligase
  contig-65_250_16_19 K01921 ddl; D-alanine-D-alanine ligase [EC:6.3.2.4]
  contig-65_307_23_2 K01921 ddl; D-alanine-D-alanine ligase [EC:6.3.2.4]
6.3.2.5 phosphopantothenate---cysteine ligase
  contig-65_751_117_5 K13038 coaBC, dfp; phosphopantothenoylcysteine decarboxylase / phosphopantothenate--cysteine ligase [EC:4.1.1.36 6.3.2.5]
6.3.2.6 phosphoribosylaminoimidazolesuccinocarboxamide synthase
  contig-65_548_71_7 K01923 purC; phosphoribosylaminoimidazole-succinocarboxamide synthase [EC:6.3.2.6]
6.3.2.8 UDP-N-acetylmuramate---L-alanine ligase
  contig-65_323_26_10 K01924 murC; UDP-N-acetylmuramate--alanine ligase [EC:6.3.2.8]
  contig-65_885_159_3 K01924 murC; UDP-N-acetylmuramate--alanine ligase [EC:6.3.2.8]
6.3.2.9 UDP-N-acetylmuramoyl-L-alanine---D-glutamate ligase
  contig-65_406_43_4 K01925 murD; UDP-N-acetylmuramoylalanine--D-glutamate ligase [EC:6.3.2.9]
6.3.2.10 UDP-N-acetylmuramoyl-tripeptide---D-alanyl-D-alanine ligase
  contig-65_323_26_6 K01929 murF; UDP-N-acetylmuramoyl-tripeptide--D-alanyl-D-alanine ligase [EC:6.3.2.10]

Supplementary Table 4.5 (Continued)

6.3.2.12 dihydrofolate synthase
  contig-65_328_27_16 K11754 folC; dihydrofolate synthase / folylpolyglutamate synthase [EC:6.3.2.12 6.3.2.17]
6.3.2.13 UDP-N-acetylmuramoyl-L-alanyl-D-glutamate---2,6-diaminopimelate ligase
  contig-65_323_26_5 K01928 murE; UDP-N-acetylmuramoyl-L-alanyl-D-glutamate--2,6-diaminopimelate ligase [EC:6.3.2.13]
  contig-65_982_174_1 K01928 murE; UDP-N-acetylmuramoyl-L-alanyl-D-glutamate--2,6-diaminopimelate ligase [EC:6.3.2.13]
  contig-65_1072_189_4 K01928 murE; UDP-N-acetylmuramoyl-L-alanyl-D-glutamate--2,6-diaminopimelate ligase [EC:6.3.2.13]
6.3.2.17 tetrahydrofolate synthase
  contig-65_328_27_16 K11754 folC; dihydrofolate synthase / folylpolyglutamate synthase [EC:6.3.2.12 6.3.2.17]
6.3.2.29 cyanophycin synthase (L-aspartate-adding)
  contig-65_531_68_3 K03802 cphA; cyanophycin synthetase [EC:6.3.2.29 6.3.2.30]
6.3.2.30 cyanophycin synthase (L-arginine-adding)
  contig-65_531_68_3 K03802 cphA; cyanophycin synthetase [EC:6.3.2.29 6.3.2.30]
6.3.2.-
  contig-65_212_13_14 K05844 rimK; ribosomal protein S6--L-glutamate ligase [EC:6.3.2.-]
6.3.3 Cyclo-ligases
6.3.3.1 phosphoribosylformylglycinamidine cyclo-ligase
  contig-65_548_71_2 K01933 purM; phosphoribosylformylglycinamidine cyclo-ligase [EC:6.3.3.1]
6.3.3.2 5-formyltetrahydrofolate cyclo-ligase
  contig-65_681_100_1 K01934 MTHFS; 5-formyltetrahydrofolate cyclo-ligase [EC:6.3.3.2]
6.3.4 Other carbon-nitrogen ligases
6.3.4.2 CTP synthase (glutamine hydrolysing)
  contig-65_109_2_32 K01937 pyrG, CTPS; CTP synthase [EC:6.3.4.2]
6.3.4.3 formate---tetrahydrofolate ligase
  contig-65_438_49_8 K01938 fhs; formate--tetrahydrofolate ligase [EC:6.3.4.3]
6.3.4.4 adenylosuccinate synthase
  contig-65_193_11_10 K01939 purA, ADSS; adenylosuccinate synthase [EC:6.3.4.4]
6.3.4.5 argininosuccinate synthase
  contig-65_566_74_5 K01940 argG, ASS1; argininosuccinate synthase [EC:6.3.4.5]
6.3.4.13 phosphoribosylamine---glycine ligase
  contig-65_1404_238_1 K01945 purD; phosphoribosylamine--glycine ligase [EC:6.3.4.13]
6.3.4.15 biotin---[acetyl-CoA-carboxylase] ligase
  contig-65_709_106_2 K03524 birA; BirA family transcriptional regulator, biotin operon repressor / biotin-[acetyl-CoA-carboxylase] ligase [EC:6.3.4.15]
6.3.4.19 tRNA(Ile)-lysidine synthase
  contig-65_438_49_2 K04075 tilS, mesJ; tRNA(Ile)-lysidine synthase [EC:6.3.4.19]
6.3.4.21 nicotinate phosphoribosyltransferase
  contig-65_620_88_10 K00763 pncB, NAPRT1; nicotinate phosphoribosyltransferase [EC:6.3.4.21]
  contig-65_2429_313_1 K00763 pncB, NAPRT1; nicotinate phosphoribosyltransferase [EC:6.3.4.21]
6.3.5 Carbon-nitrogen ligases with glutamine as amido-N-donor
6.3.5.2 GMP synthase (glutamine-hydrolysing)
  contig-65_1436_241_3 K01951 guaA, GMPS; GMP synthase (glutamine-hydrolysing) [EC:6.3.5.2]
6.3.5.3 phosphoribosylformylglycinamidine synthase

  contig-65_548_71_4 K01952 purL, PFAS; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
  contig-65_548_71_5 K01952 purL, PFAS; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
  contig-65_548_71_6 K01952 purL, PFAS; phosphoribosylformylglycinamidine synthase [EC:6.3.5.3]
6.3.5.5 carbamoyl-phosphate synthase (glutamine-hydrolysing)
  contig-65_1087_192_3 K01955 carB, CPA2; carbamoyl-phosphate synthase large subunit [EC:6.3.5.5]
  contig-65_1133_195_1 K01955 carB, CPA2; carbamoyl-phosphate synthase large subunit [EC:6.3.5.5]
  contig-65_1133_195_2 K01956 carA, CPA1; carbamoyl-phosphate synthase small subunit [EC:6.3.5.5]
6.3.5.6 asparaginyl-tRNA synthase (glutamine-hydrolysing)
  contig-65_758_120_3 K02433 gatA, QRSL1; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit A [EC:6.3.5.6 6.3.5.7]
  contig-65_758_120_4 K02434 gatB, PET112; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit B [EC:6.3.5.6 6.3.5.7]
  contig-65_758_120_2 K02435 gatC, GATC; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit C [EC:6.3.5.6 6.3.5.7]
6.3.5.7 glutaminyl-tRNA synthase (glutamine-hydrolysing)
  contig-65_758_120_3 K02433 gatA, QRSL1; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit A [EC:6.3.5.6 6.3.5.7]
  contig-65_758_120_4 K02434 gatB, PET112; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit B [EC:6.3.5.6 6.3.5.7]
  contig-65_758_120_2 K02435 gatC, GATC; aspartyl-tRNA(Asn)/glutamyl-tRNA(Gln) amidotransferase subunit C [EC:6.3.5.6 6.3.5.7]
6.3.5.9 hydrogenobyrinic acid a,c-diamide synthase (glutamine-hydrolysing)
  contig-65_443_52_11 K02224 cobB-cbiA; cobyrinic acid a,c-diamide synthase [EC:6.3.5.9 6.3.5.11]
6.3.5.10 adenosylcobyric acid synthase (glutamine-hydrolysing)
  contig-65_118_3_26 K02232 cobQ, cbiP; adenosylcobyric acid synthase [EC:6.3.5.10]
6.3.5.11 cobyrinate a,c-diamide synthase
  contig-65_443_52_11 K02224 cobB-cbiA; cobyrinic acid a,c-diamide synthase [EC:6.3.5.9 6.3.5.11]
6.5 Forming phosphoric-ester bonds
6.5.1 Ligases that form phosphoric-ester bonds (only sub-subclass identified to date)
6.5.1.1 DNA ligase (ATP)
  contig-65_516_67_4 K01971 ligD; bifunctional non-homologous end joining protein LigD [EC:6.5.1.1]
  contig-65_516_67_5 K01971 ligD; bifunctional non-homologous end joining protein LigD [EC:6.5.1.1]
6.5.1.2 DNA ligase (NAD+)
  contig-65_758_120_1 K01972 E6.5.1.2, ligA, ligB; DNA ligase (NAD+) [EC:6.5.1.2]
  contig-65_939_166_1 K01972 E6.5.1.2, ligA, ligB; DNA ligase (NAD+) [EC:6.5.1.2]
6.5.1.3 RNA ligase (ATP)
  contig-65_457_57_1 K14415 RTCB, rtcB; tRNA-splicing ligase RtcB [EC:6.5.1.3]
  contig-65_1971_290_1 K14415 RTCB, rtcB; tRNA-splicing ligase RtcB [EC:6.5.1.3]
6.5.1.-
  contig-65_151_5_11 K01975 ligT; 2'-5' RNA ligase [EC:6.5.1.-]
6.6 Forming nitrogen-D-metal bonds
6.6.1 Forming coordination complexes

6.6.1.1 magnesium chelatase
  contig-65_320_25_2 K03404 chlD, bchD; magnesium chelatase subunit D [EC:6.6.1.1]
  contig-65_320_25_1 K03405 chlI, bchI; magnesium chelatase subunit I [EC:6.6.1.1]

7. BIBLIOGRAPHY

Alt JC. (1995). Subseafloor processes in Mid-Ocean ridge hydrothermal systems. Geophys Monogr 91:85–114.

Anderson I, Risso C, Holmes D, Lucas S, Copeland A, Lapidus A, et al. (2011). Complete genome sequence of Ferroglobus placidus AEDII12DO. Stand Genomic Sci 5:50–60.

Bach W, Edwards KJ. (2003). Iron and sulfide oxidation within the basaltic ocean crust: implications for chemolithoautotrophic microbial biomass production. Geochim Cosmochim Acta 67:3871–3887.

Boettger J, Lin H-T, Cowen JP, Hentscher M, Amend JP. (2013). Energy yields from chemolithotrophic metabolisms in igneous basement of the Juan de Fuca ridge flank system. Chem Geol 337–338:11–19.

Boyd E, Schut G, Adams M, Peters J. (2014). Hydrogen metabolism and the evolution of biological respiration. Microbe 9:361–367.

Braakman R, Smith E. (2012). The emergence and early evolution of biological carbon-fixation. PLoS Comput Biol 8:e1002455.

Bratt SR, Purdy GM. (1984). Structure and variability of oceanic crust on the flanks of the East Pacific Rise between 11° and 13°N. J Geophys Res Solid Earth 89:6111–6125.

Brazelton WJ, Morrill PL, Szponar N, Schrenk MO. (2013). Bacterial communities associated with subsurface geochemical processes in continental serpentinite springs. Appl Environ Microbiol 79:3906–16.

Brazelton WJ, Nelson B, Schrenk MO. (2012). Metagenomic evidence for H2 oxidation and H2 production by serpentinite-hosted subsurface microbial communities. Front Microbiol 2:268.

Brazelton WJ, Schrenk MO, Kelley DS, Baross JA. (2006). Methane- and sulfur-metabolizing microbial communities dominate the Lost City hydrothermal field ecosystem. Appl Environ Microbiol 72:6257–70.

Buchfink B, Xie C, Huson DH. (2015). Fast and sensitive protein alignment using DIAMOND. Nat Methods 12:59–60.

Chivian D, Brodie EL, Alm EJ, Culley DE, Dehal PS, DeSantis TZ, et al. (2008). Environmental genomics reveals a single-species ecosystem deep within Earth. Science 322:275–278.

Clarke TA, Mills PC, Poock SR, Butt JN, Cheesman MR, Cole JA, et al. (2008). Escherichia coli Cytochrome c Nitrite Reductase NrfA. Methods Enzymol 437:63–77.

Cowen JP, Giovannoni SJ, Kenig F, Johnson HP, Butterfield D, Rappé MS, et al. (2003). Fluids from aging ocean crust that support microbial life. Science 299:120–3.

Dick HJB. (1989). Abyssal peridotites, very slow spreading ridges and ocean ridge magmatism. Geol Soc London, Spec Publ 42:71–105.

Dick HJB, Lin J, Schouten H. (2003). An ultraslow-spreading class of ocean ridge. Nature 426:405–12.

Dick HJB, Tivey MA, Tucholke BE. (2008). Plutonic foundation of a slow-spreading ridge segment: Oceanic core complex at Kane Megamullion, 23°30′N, 45°20′W. Geochemistry, Geophys Geosystems 9. doi:10.1029/2007GC001645.

Drake HL, Daniel SL, Küsel K, Matthies C, Kuhner C, Braus-Stromeyer S. (1997). Acetogenic bacteria: what are the in situ consequences of their diverse metabolic versatilities? BioFactors 6:13–24.

Edwards KJ, Bach W, McCollom TM. (2005). Geomicrobiology in oceanography: microbe-mineral interactions at and below the seafloor. Trends Microbiol 13:449–56.

Edwards KJ, Bach W, Rogers DR. (2003). Geomicrobiology of the ocean crust: a role for chemoautotrophic Fe-bacteria. Biol Bull 204:180–5.

Edwards KJ, Becker K, Colwell F. (2012). The Deep, Dark Energy Biosphere: Intraterrestrial Life on Earth. Annu Rev Earth Planet Sci 40:551–568.

Edwards KJ, Fisher AT, Wheat CG. (2012). The deep subsurface biosphere in igneous ocean crust: frontier habitats for microbiological exploration. Front Microbiol 3:8.

Edwards KJ, Glazer BT, Rouxel OJ, Bach W, Emerson D, Davis RE, et al. (2011). Ultra-diffuse hydrothermal venting supports Fe-oxidizing bacteria and massive umber deposition at 5000 m off Hawaii. ISME J 5:1748–1758.

Edwards KJ, Rogers DR, Wirsen CO, McCollom TM. (2003). Isolation and Characterization of Novel Psychrophilic, Neutrophilic, Fe-Oxidizing, Chemolithoautotrophic α- and γ-Proteobacteria from the Deep Sea. Appl Environ Microbiol 69:2906–2913.

Edwards KJ, Wheat CG, Sylvan JB. (2011). Under the sea: microbial life in volcanic oceanic crust. Nat Rev Microbiol 9:703–712.

Edwards KJ, Bach W, Klaus A. (2010). Integrated Ocean Drilling Program Prospectus, Expedition 336. College Station: IODP.

Emerson D, Fleming EJ, McBeth JM. (2010). Iron-oxidizing bacteria: an environmental and genomic perspective. Annu Rev Microbiol 64:561–83.

Emerson D, Moyer CL. (2002). Neutrophilic Fe-Oxidizing Bacteria Are Abundant at the Loihi Seamount Hydrothermal Vents and Play a Major Role in Fe Oxide Deposition. Appl Environ Microbiol 68:3085–3093.

Felsenstein J. (1985). Confidence Limits on Phylogenies: An Approach Using the Bootstrap. Evolution (N Y) 39:783–791.

Fisher AT, Urabe T, Klaus A. (2005). IODP expedition 301 installs three borehole crustal observatories, prepares for three-dimensional, cross-hole experiments in the Northeastern Pacific Ocean. Sci Drill 1. http://www.iodp.org/iodp_journals/SD001_04_SRexp301.pdf.

Fisher AT, Wheat CG, Becker K, Davis EE, Jannasch H, Schroeder D, et al. (2005). Scientific and technical design and deployment of long-term subseafloor observatories for hydrogeologic and related experiments, IODP Expedition 301, eastern flank of Juan de Fuca Ridge 1 and general design. Proc Integr Ocean Drill Progr 301. doi:10.2204/iodp.proc.301.103.2005.

Fisk MR, Thorseth IH, Urbach E, Giovannoni SJ. (2000). Investigation of microorganisms and DNA from subsurface thermal water and rock from the east flank of Juan de Fuca Ridge. Proc Ocean Drill Program, Sci Results 168:167–174.

Flores GE, Campbell JH, Kirshtein JD, Meneghin J, Podar M, Steinberg JI, Seewald JS, Tivey MK, Voytek MA, et al. (2011). Microbial community structure of hydrothermal deposits from geochemically different vent fields along the Mid-Atlantic Ridge. Environ Microbiol 13:2158–71.

Fonknechten N, Chaussonnerie S, Tricot S, Lajus A, Andreesen JR, Perchat N, et al. (2010). Clostridium sticklandii, a specialist in amino acid degradation: revisiting its metabolism through its genome sequence. BMC Genomics 11:555.

Fontaine FE, Peterson WH, McCoy E, Johnson MJ, Ritter GJ. (1942). A New Type of Glucose Fermentation by Clostridium thermoaceticum. J Bacteriol 43:701–715.

Friedrich T, Scheide D. (2000). The respiratory complex I of bacteria, archaea and eukarya and its module. FEBS Lett 479:1.

Fruh-Green G. (2004). Serpentinization of Oceanic Peridotites. Geophys Mono Ser 144. doi:10.1029/144GM08.

Fuchs G. (2011). Alternative Pathways of Carbon Dioxide Fixation: Insights into the Early Evolution of Life? Annu Rev Microbiol 65:635–58.

Grabarse W, Mahlert F, Duin EC, Goubeaud M, Shima S, Thauer RK, et al. (2001). On the mechanism of biological methane formation: structural evidence for conformational changes in methyl-coenzyme M reductase upon substrate binding. J Mol Biol 309:315–330.

Harary I, Korey SR, Severo O. (1953). Biosynthesis of dicarboxylic acids by carbon dioxide fixation. J Biol Chem 203:595–604.

Hatch MD, Slack CR. (1968). A new enzyme for the interconversion of pyruvate and phosphopyruvate and its role in the C4 dicarboxylic acid pathway of photosynthesis. Biochem J 106:141–146.

Heberling C, Lowell RP, Liu L, Fisk MR. (2010). Extent of the microbial biosphere in the oceanic crust. Geochemistry Geophys Geosystems 11:1–15.

Huber J, Johnson HP, Butterfield D, Baross J. (2006). Microbial life in ridge flank crustal fluids. Environ Microbiol 8:88–99.

Hugenholtz P, Pitulle C, Hershberger KL, Pace NR. (1998). Novel division level bacterial diversity in a Yellowstone hot spring. J Bacteriol 180:366–76.

Huse SM, Dethlefsen L, Huber JA, Mark Welch D, Relman DA, et al. (2008). Exploring microbial diversity and taxonomy using SSU rRNA hypervariable tag sequencing. Eisen JA (ed). PLoS Genet 4:e1000255.

Huse SM, Welch DBM, Voorhis A, Shipunova A, Morrison HG, Eren AM, et al. (2014). VAMPS: a website for visualization and analysis of microbial population structures. BMC Bioinformatics 15.

Hyatt D, Chen G-L, Locascio PF, Land ML, Larimer FW, Hauser LJ. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11:119.

Ildefonse B, Abe N, Blackman D, Canales J, Isozaki Y, Kodaira S, et al. (2010). MoHole: A Crustal Journey and Mantle Quest, Workshop in Kanazawa, Japan, 3 - 5 June, 2010. Sci Drill 10:56–62.

Imachi H, Sekiguchi Y. (2002). Pelotomaculum thermopropionicum gen. nov., sp. nov., an anaerobic, thermophilic, syntrophic propionate-oxidizing bacterium. Int J Syst Evol Microbiol 1729–1735.

Jiang L, Long C, Wu X, Xu H, Shao Z, Long M. (2014). Optimization of thermophilic fermentative hydrogen production by the newly isolated Caloranaerobacter azorensis H53214 from deep-sea hydrothermal vent environment. Int J Hydrogen Energy 39:14154–14160.

Johnson HP, Pruis MJ. (2003). Fluxes of fluid and heat from the oceanic crustal reservoir. Earth Planet Sci Lett 216:565–574.

Jukes TH, Cantor CR, Munro HN. (1969). Evolution of protein molecules. Mamm protein Metab 3:132.

Jungbluth SP, Bowers RM, Lin H, Cowen JP, Rappé MS. (2016). Novel microbial assemblages inhabiting crustal fluids within mid-ocean ridge flank subsurface basalt. Nat Publ Gr 10:1–15.

Jungbluth SP, Grote J, Lin H-T, Cowen JP, Rappé MS. (2013). Microbial diversity within basement fluids of the sediment-buried Juan de Fuca Ridge flank. ISME J 7:161–172.

Jungbluth SP, Lin H-T, Cowen JP, Glazer BT, Rappé MS. (2014). Phylogenetic diversity of microorganisms in subseafloor crustal fluids from Holes 1025C and 1026B along the Juan de Fuca Ridge flank. Front Microbiol 5:119.

Jungbluth SP, del Rio TG, Tringe SG, Stepanauskas R, Rappé MS. (2017). Genomic comparisons of a bacterial lineage that inhabits both marine and terrestrial deep subsurface systems. PeerJ 1–22.

Kallmeyer J, Pockalny R, Adhikari RR, Smith DC, D’Hondt S. (2012). Global distribution of microbial abundance and biomass in subseafloor sediment. Proc Natl Acad Sci U S A 109:16213–6.

Kanehisa M, Sato Y, Morishima K. (2016). BlastKOALA and GhostKOALA: KEGG Tools for Functional Characterization of Genome and Metagenome Sequences. J Mol Biol 428:726–731.

Kashefi K, Tor J. (2002). Geoglobus ahangari gen. nov., sp. nov., a novel hyperthermophilic archaeon capable of oxidizing organic acids and growing autotrophically on hydrogen with Fe(III). Int J Syst Evol Microbiol 52:719–728.

Kashefi K, Tor JM, Holmes DE, Gaw Van Praagh C V, Reysenbach A-L, Lovley DR. (2002). Geoglobus ahangari gen. nov., sp. nov., a novel hyperthermophilic archaeon capable of oxidizing organic acids and growing autotrophically on hydrogen with Fe(III) serving as the sole electron acceptor. Int J Syst Evol Microbiol 52:719–728.

Kearey P, Klepeis K, Vine FJ. (2009). Global tectonics. doi:10.1038/236261b0.

Kelley DS, Karson JA, Blackman DK, Fruh-Green GL, Butterfield DA, Lilley MD, Olsen EJ, Schrenk MO, Roe KK, Lebon GT, Rivizzigno P. (2001). An off-axis hydrothermal vent field near the Mid-Atlantic Ridge at 30° N. Nature 412:145–149.

Kennett JP. (1982). Marine Geology. Prentice-Hall https://books.google.com/books?id=gVASAQAAIAAJ.

Kim M, Oh HS, Park SC, Chun J. (2014). Towards a taxonomic coherence between average nucleotide identity and 16S rRNA gene sequence similarity for species demarcation of prokaryotes. Int J Syst Evol Microbiol 64:346–351.

Klenk H, Clayton RA, Tomb J, Dodson RJ, Gwinn M, Hickey EK, et al. (1998). The complete genome sequence of the hyperthermophilic, sulphate-reducing archaeon. Nature 394:6342–6349.

Laczny CC, Pinel N, Vlassis N, Wilmes P. (2014). Alignment-free visualization of metagenomic data by nonlinear dimension reduction. Sci Rep 4:4516.

Lang SQ, Butterfield DA, Lilley MD, Johnson HP, Hedges JI. (2006). Dissolved organic carbon in ridge-axis and ridge-flank hydrothermal systems. Geochim Cosmochim Acta 70:3830–3842.

Lehman RM. (2007). Understanding of aquifer microbiology is tightly linked to sampling approaches. Geomicrobiol J 24:331–341.

Lever MA. (2012). Acetogenesis in the energy-starved deep biosphere - a paradox? Front Microbiol 2. doi:10.3389/fmicb.2011.00284.

Lever M, Rouxel O, Alt J, Shimizu N. (2013). Evidence for microbial carbon and sulfur cycling in deeply buried ridge flank basalt. Science 339:1305–1308.

Lin H-T, Amend JP, LaRowe DE, Bingham J-P, Cowen JP. (2015). Dissolved amino acids in oceanic basaltic basement fluids. Geochim Cosmochim Acta 164. doi:10.1016/j.gca.2015.04.044.

Lin H-T, Cowen JP, Olson EJ, Amend JP, Lilley MD. (2012). Inorganic chemistry, gas compositions and dissolved organic carbon in fluids from sedimented young basaltic crust on the Juan de Fuca Ridge flanks. Geochim Cosmochim Acta 85:213–227.

Lin H-T, Cowen JP, Olson EJ, Lilley MD, Jungbluth SP, Wilson ST, et al. (2014). Dissolved hydrogen and methane in the oceanic basaltic biosphere. Earth Planet Sci Lett 405:62–73.

Magnabosco C, Ryan K, Lau MCY, Kuloyo O, Lollar BS, Kieft TL, et al. (2015). A metagenomic window into carbon metabolism at 3 km depth in Precambrian continental crust. ISME J 10:730–741.

Maia LB, Moura JJG, Moura I. (2015). Molybdenum and tungsten-dependent formate dehydrogenases. J Biol Inorg Chem 20:287–309.

Mason OU, Di Meo-Savoie CA, Van Nostrand JD, Zhou J, Fisk MR, Giovannoni SJ. (2009). Prokaryotic diversity, distribution, and insights into their role in biogeochemical cycling in marine basalts. ISME J 3:231–42.

Mason OU, Nakagawa T, Rosner M, Van Nostrand JD, Zhou J, Maruyama A, et al. (2010). First investigation of the microbiology of the deepest layer of ocean crust. PLoS One 5:e15399.

Maupin-Furlow JA, Ferry JG. (1996). Analysis of the CO dehydrogenase/acetyl-coenzyme A synthase operon of Methanosarcina thermophila. J Bacteriol 178:6849–6856.

Mayhew LE, Ellison ET, McCollom TM, Trainor TP, Templeton AS. (2013). Hydrogen generation from low-temperature water–rock reactions. Nat Geosci 6. doi:10.1038/NGEO1825.

McCarthy MD, Beaupré SR, Walker BD, Voparil I, Guilderson TP, Druffel ERM. (2010). Chemosynthetic origin of 14C-depleted dissolved organic matter in a ridge-flank hydrothermal system. Nat Geosci 4:32–36.

McCollom TM. (2007). Geochemical constraints on sources of metabolic energy for chemolithoautotrophy in ultramafic-hosted deep-sea hydrothermal systems. Astrobiology 7:933–50.

McCollom TM, Bach W. (2009). Thermodynamic constraints on hydrogen generation during serpentinization of ultramafic rocks. Geochim Cosmochim Acta 73:856–875.

Miller HM, Mayhew LE, Ellison ET, Kelemen P, Kubo M, Templeton AS. (2017). Low temperature hydrogen production during experimental hydration of partially-serpentinized dunite. Geochim Cosmochim Acta 209:161–183.

Nagarajan H, Sahin M, Nogales J, Latif H, Lovley DR, Ebrahim A. (2013). Characterizing acetogenic metabolism using a genome-scale metabolic reconstruction of Clostridium ljungdahlii. Microb Cell Fact 12:1–13.

Nakagawa S, Inagaki F, Suzuki Y, Steinsbu BO, Lever MA, Takai K, et al. (2006). Microbial community in black rust exposed to hot ridge flank crustal fluids. Appl Environ Microbiol 72:6789–99.

Nealson KH, Inagaki F, Takai K. (2005). Hydrogen-driven subsurface lithoautotrophic microbial ecosystems (SLiMEs): do they exist and why should we care? Trends Microbiol 13:405–10.

Nepomnyashchaya YN, Slobodkina GB, Baslerov RV, Chernyh NA, Bonch-Osmolovskaya EA, Netrusov AI, et al. (2011). Moorella humiferrea sp. nov., a thermophilic, anaerobic bacterium capable of growth via electron shuttling between humic acid and Fe(III). Int J Syst Evol Microbiol 62:613–617.

Nielsen ME, Fisk MR. (2010). Surface area measurements of marine basalts: Implications for the subseafloor microbial biomass. Geophys Res Lett 37. doi:10.1029/2010GL044074.

Nisman B. (1954). The Stickland reaction. Bacteriol Rev 18:16–42.

Nitschke W, Russell MJ. (2013). Beating the acetyl coenzyme A-pathway to the origin of life. Philos Trans R Soc Lond B Biol Sci 368:20120258.

Nonaka H, Keresztes G, Shinoda Y, Ikenaga Y, Abe M, Naito K, et al. (2006). Complete genome sequence of the dehalorespiring bacterium Desulfitobacterium hafniense Y51 and comparison with Dehalococcoides ethenogenes 195. J Bacteriol 188:2262–2274.

Orcutt BN, Bach W, Becker K, Fisher AT, Hentscher M, Toner BM, et al. (2011). Colonization of subsurface microbial observatories deployed in young ocean crust. ISME J 5:692–703.

Orcutt BN, Sylvan JB, Knab NJ, Edwards KJ. (2011). Microbial ecology of the dark ocean above, at, and below the seafloor. Microbiol Mol Biol Rev 75:361–422.

Orcutt BN, Sylvan JB, Rogers DR, Delaney J, Lee RW, Girguis PR. (2015). Carbon fixation by basalt-hosted microbial communities. Front Microbiol 6:1–14.

Parks DH, Imelfort M, Skennerton CT, Hugenholtz P, Tyson GW. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Res 25:1043–55.

Patil KR, Roune L, McHardy AC. (2012). The PhyloPythiaS Web Server for Taxonomic Assignment of Metagenome Sequences. PLoS One 7:e38581.

Peng Y, Leung HCM, Yiu SM, Chin FYL. (2012). IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28:1420–1428.

Peters JW, Schut GJ, Boyd ES, Mulder DW, Shepard EM, Broderick JB, et al. (2015). [FeFe]- and [NiFe]-hydrogenase diversity, mechanism, and maturation. BBA - Mol Cell Res 1853:1350–1369.

Pierce E, Xie G, Barabote R, Saunders E, Han C, Detter J, et al. (2008). The complete genome sequence of Moorella thermoacetica (f. Clostridium thermoaceticum). Environ Microbiol 10:2550–73.

Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, Peplies J, et al. (2007). SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35:7188–96.

Ragsdale SW. (2008). Enzymology of the Wood-Ljungdahl pathway of acetogenesis. Ann NY Acad Sci 1125:129–136.

Ragsdale SW, Pierce E. (2008). Acetogenesis and the Wood-Ljungdahl pathway of CO2 fixation. Biochim Biophys Acta - Proteins Proteomics 1784:1873–1898.

Rappé M, Connon S, Vergin K, Giovannoni S. (2002). Cultivation of the ubiquitous SAR11 marine bacterioplankton clade. Nature 418:630–3.

Robador A, Jungbluth SP, LaRowe DE, Bowers RM, Rappé MS, Amend JP, et al. (2015). Activity and phylogenetic diversity of sulfate-reducing microorganisms in low-temperature subsurface fluids within the upper oceanic crust. Front Microbiol 6:1–13.

Russell MJ, Martin W. (2004). The rocky roots of the acetyl-CoA pathway. Trends Biochem Sci 29:358–363.

Saitou N, Nei M. (1987). The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425.

Salter SJ, Cox MJ, Turek EM, Calus ST, Cookson WO, Moffatt MF, et al. (2014). Reagent and laboratory contamination can critically impact sequence-based microbiome analyses. BMC Biol 12:87.

Sanders C, Turkarslan S, Lee DW, Daldal F. (2010). Cytochrome c biogenesis: The Ccm system. Trends Microbiol 18:266–274.

Santelli CM, Orcutt BN, Banning E, Bach W, Moyer CL, Sogin ML, et al. (2008). Abundance and diversity of microbial life in ocean crust. Nature 453:653–6.

Schuchmann K, Müller V. (2014). Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nat Rev Microbiol 12:809–821.

Shin J, Song Y, Jeong Y, Cho BK. (2016). Analysis of the core genome and pan-genome of autotrophic acetogenic bacteria. Front Microbiol 7. doi:10.3389/fmicb.2016.01531.

Simpson JT, Durbin R. (2012). Efficient de novo assembly of large genomes using compressed data structures. Genome Res 22:549–556.

Sleep NH, Meibom A, Fridriksson T, Coleman RG, Bird DK. (2004). H2-rich fluids from serpentinization: geochemical and biotic implications. Proc Natl Acad Sci U S A 101:12818–23.

Smith A, Popa R, Fisk M, Nielsen M, Wheat CG, Jannasch HW, et al. (2011). In situ enrichment of ocean crust microbes on igneous minerals and glasses using an osmotic flow-through device. Geochemistry Geophys Geosystems 12:1–19.

Smith AR, Fisk MR, Thurber AR, Flores GE, Mason OU, Popa R, et al. (2016). Deep Crustal Communities of the Juan de Fuca Ridge Are Governed by Mineralogy. Geomicrobiol J 451:0.

Sokolova T, Hanel J, Onyenwoke RU, Reysenbach A-L, Banta A, Geyer R, et al. (2007). Novel chemolithotrophic, thermophilic, anaerobic bacteria Thermolithobacter ferrireducens gen. nov., sp. nov. and Thermolithobacter carboxydivorans sp. nov. Extremophiles 11:145–57.

Steinsbu BO, Thorseth IH, Nakagawa S, Inagaki F, Lever MA, Engelen B, et al. (2010). Archaeoglobus sulfaticallidus sp. nov., a thermophilic and facultatively lithoautotrophic sulfate-reducer isolated from black rust exposed to hot ridge flank crustal fluids. Int J Syst Evol Microbiol 60:2745–2752.

Stetter KO. (2006). Hyperthermophiles in the history of life. Philos Trans R Soc Lond B Biol Sci 361:1837–1843.

Stevens TO, McKinley JP. (1995). Lithoautotrophic Microbial Ecosystems in Deep Basalt Aquifers. Science 270:450–454.

Sunagawa S, Mende DR, Zeller G, Izquierdo-Carrasco F, Berger SA, Kultima JR, et al. (2013). Metagenomic species profiling using universal phylogenetic marker genes. Nat Methods 10:1196–1199.

Sylvan JB, Sia TY, Haddad AG, Briscoe LJ, Toner BM, Girguis PR, et al. (2013). Low temperature geomicrobiology follows host rock composition along a geochemical gradient in Lau Basin. Front Microbiol 4. doi:10.3389/fmicb.2013.00061.

Takai K, Gamo T, Tsunogai U, Nakayama N, Hirayama H, Nealson KH, et al. (2004). Geochemical and microbiological evidence for a hydrogen-based, hyperthermophilic subsurface lithoautotrophic microbial ecosystem (HyperSLiME) beneath an active deep-sea hydrothermal field. Extremophiles 8:269–82.

Takai K, Nunoura T, Ishibashi J, Lupton J, Suzuki R, Hamasaki H, et al. (2008). Variability in the microbial communities and hydrothermal fluid chemistry at the newly discovered Mariner hydrothermal field, southern Lau Basin. J Geophys Res 113:G02031.

Takami H, Noguchi H, Takaki Y, Uchiyama I, Toyoda A, Nishi S, et al. (2012). A deeply branching thermophilic bacterium with an ancient acetyl-CoA pathway dominates a subsurface ecosystem. PLoS One 7:e30559.

Tamura K, Peterson D, Peterson N, Stecher G, Nei M, Kumar S. (2011). MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. Mol Biol Evol 28:2731–9.

Thór Marteinsson V, Rúnarsson A, Stefánsson A, Thorsteinsson T, Jóhannesson T, Magnússon SH, et al. (2012). Microbial communities in the subglacial waters of the Vatnajökull ice cap, Iceland. ISME J 7:427–437.

Toner BM, Lesniewski RA, Marlow JJ, Briscoe LJ, Santelli CM, Bach W, et al. (2013). Mineralogy drives bacterial biogeography of hydrothermally inactive seafloor sulfide deposits. Geomicrobiol J 30:313–326.

Ver Eecke HC, Butterfield DA, Huber JA, Lilley MD, Olson EJ, Roe KK, et al. (2012). Hydrogen-limited growth of hyperthermophilic methanogens at deep-sea hydrothermal vents. Proc Natl Acad Sci U S A 109:13674–9.

Vignais PM, Billoud B. (2007). Occurrence, Classification, and Biological Function of Hydrogenases: An Overview. Chem Rev 107:4206–4272.

Villanueva GL, Mumma MJ, Novak RE, Käufl HU, Hartogh P, Encrenaz T, Tokunaga A, Khayat A, Smith MD. (2015). Strong water isotopic anomalies in the martian atmosphere: Probing current and ancient reservoirs. Science 348:218–221.

Waite JH, Glein CR, Perryman RS, Teolis BD, Magee BA, Miller G, et al. (2017). Cassini finds molecular hydrogen in the Enceladus plume: Evidence for hydrothermal processes. Science 356:155–159.

Wang H, Edwards KJ. (2009). Bacterial and Archaeal DNA extracted from inoculated experiments: implication for the optimization of DNA extraction from deep-sea basalts. Geomicrobiol J 26:463–469.

Wheat CG. (2004). Heat flow through a basaltic outcrop on a sedimented young ridge flank. Geochemistry Geophys Geosystems 5. doi:10.1029/2004GC000700.

Wheat CG, Jannasch HW, Fisher AT, Becker K, Sharkey J, Hulme S. (2010). Subseafloor seawater-basalt-microbe reactions: Continuous sampling of borehole fluids in a ridge flank environment. Geochemistry Geophys Geosystems 11:1–18.

White WM, Klein EM. (2013). Composition of the Oceanic Crust. Treatise on Geochemistry: Second Edition 4:457–496.

Whitman WB, Coleman DC, Wiebe WJ. (1998). Prokaryotes: the unseen majority. Proc Natl Acad Sci 95:6578.

Zhang X, Feng X, Wang F. (2016). Diversity and metabolic potentials of subsurface crustal microorganisms from the western flank of the Mid-Atlantic Ridge. Front Microbiol 7:1–16.
