Copyright by Marguerite Viola Langwig 2019

The Thesis Committee for Marguerite Viola Langwig Certifies that this is the approved version of the following Thesis:

Expansion of Deltaproteobacteria diversity from marine sediment reveals unique metabolic features

APPROVED BY SUPERVISING COMMITTEE:

Brett J. Baker, Supervisor

Deana Erdner

Brandi Kiel Reese

Expansion of Deltaproteobacteria diversity from marine sediment reveals unique metabolic features

by

Marguerite Viola Langwig

Thesis

Presented to the Faculty of the Graduate School of The University of Texas at Austin in Partial Fulfillment of the Requirements for the Degree of

Master of Science in Marine Science

The University of Texas at Austin

December 2019

Dedication

This work is dedicated to my mom, Ann Langwig. You are in my heart forever.

Acknowledgements

I would like to thank Dr. Brett Baker for guiding me through this project and allowing me to grow immensely as a scientist. You have been an amazing and supportive mentor. Thank you to Valerie De Anda, your help was crucial for this work and without you I would be lost. Your questions and thinking inspire me every day. Thank you to the brohort for keeping me sane and being the most supportive and loving group of friends I could have imagined. Thank you to Lucas for a lifetime of love, laughter, and friendship. Thank you dad, Gen, and Celia, your love and phone calls helped me get through to the finish. Thank you Nina and Kiley, your training and guidance helped me build the foundation I needed to complete this project. And finally, thank you to my mom, I am here because of you and I will keep pushing to make you proud.

Abstract

Expansion of Deltaproteobacteria diversity from marine sediment reveals unique metabolic features

Marguerite Viola Langwig, M.S. Marine Sci

The University of Texas at Austin, 2019

Supervisor: Brett J. Baker

Deltaproteobacteria are a ubiquitous class of Proteobacteria that play a substantial role in carbon and nutrient cycling. However, our understanding of Deltaproteobacteria is biased towards cultured exemplars. To better understand the biodiversity and ecology of the Deltaproteobacteria, we obtained hundreds of unique, uncultured metagenome-assembled genomes (MAGs) from a variety of coastal and deep-sea sediments. These 402 Deltaproteobacteria MAGs represent a 28% increase in Deltaproteobacteria genomes. Phylogenomic analyses revealed 12 novel lineages which consist entirely of uncultured representatives. Among these are two lineages that appear to represent a new order, which are capable of denitrification and dissimilatory nitrate reduction to ammonia (DNRA). Metabolic inference of Deltaproteobacteria MAGs reveals extensive versatility, central carbon metabolism, and a broad distribution of the Wood-Ljungdahl pathway for CO2 fixation. Fifty-four percent of Deltaproteobacteria MAGs encode dissimilatory sulfite reductases (DsrAB), and for those with the ability to reduce sulfate, several dsr genes are related to those from thermophiles. This study expands the genetic catalog of Deltaproteobacteria and provides a better ecological context for Deltaproteobacteria worldwide. The description of these new lineages highlights that there is much to be learned about this globally distributed class.

Table of Contents

List of Tables
List of Figures
Introduction
Results and Discussion
Deltaproteobacteria phylogeny
Multigenomic entropy-based scores
Central metabolism and metabolic clustering
Group 0
Group 1
Group 2
Group 3
Dissimilatory sulfite reductases
Conclusions
Methods
Sampling
Metagenomic sequencing and assembly
Genome binning
Phylogenetic analyses
Clustering Analysis
Dissimilatory sulfite reductases
Hydrogenases
Functional characterization of genomes
Appendix
References

List of Tables

Table 1: Geochemical data taken at Guaymas Basin
Table 2: Oxygen profile for Mesquite Bay, Texas
Table 3: KEGG modules, pathways, and KO numbers used to determine completeness of pathways shown in Figure 2

List of Figures

Figure 1: Updated phylogeny of Deltaproteobacteria based on a concatenation of 37 single-copy marker genes
Figure 2: Percent completeness of KEGG modules for the 402 reconstructed Deltaproteobacteria genomes analyzed in this study
Figure 3: Phylogeny of nearly 300 new NiFe Deltaproteobacteria hydrogenases
Figure 4: Dissimilatory sulfite reductase (DsrA) tree
Figure 5: 16S rRNA maximum likelihood tree of 55 MAGs from this analysis and 94 references obtained from the ARB database
Figure 6: Principal component analysis of MEBS entropy scores for 1,778 Deltaproteobacteria genomes
Figure 7: Dissimilatory sulfite reductase (DsrB) tree
Figure 8: Phylogeny of FeFe hydrogenases compared to characterized hydrogenase groups
Figure 9: Low-dimension projection and clustering of the entropy scores for 1,778 Deltaproteobacteria genomes
Figure 10: Principal component projection of a k means clustering analysis of entropy scores for 1,778 Deltaproteobacteria genomes

Introduction

Deltaproteobacteria are a functionally and phylogenetically diverse class of Proteobacteria with several cultured representatives (DeLong et al., 2014). These cultured organisms are capable of dissimilatory sulfate and sulfur reduction, elemental sulfur disproportionation, aromatic hydrocarbon degradation, nitrogen fixation, dissimilatory iron reduction, and predation (Burnham, Collart, & Highison, 1981; Schnell et al., 1989; Slobodkin et al., 2013). More recently, omics methods such as single cell genomics, de novo metagenomic assembly, and metatranscriptomics have been employed to understand the ecophysiology of uncultured Deltaproteobacteria (Jochum et al., 2018; Sheik, Jain, & Dick, 2014). These methods have been especially crucial for understanding Deltaproteobacteria in extreme environments, such as hydrothermal vents, where conditions are difficult to recreate in a laboratory setting (Anantharaman, Breier, & Dick, 2016).

Hydrothermal vents are an ideal environment in which Deltaproteobacteria, and microbial physiology in general, can be studied; these systems contain a diversity of electron donors, and thus are able to support the array of metabolic pathways Deltaproteobacteria are known to utilize (Kato & Yamagishi, 2016; Martin et al., 2008). Novel Deltaproteobacteria have been identified from vent environments, including thermophilic members of the iron (III)-reducing Geobacteraceae, of thermoacidophilic , and thermophilic sulfur-disproportionating Dissulfuribacter (Flores et al., 2012; Kashefi et al., 2003; Slobodkin et al., 2013). Additionally, metagenomic data show that Deltaproteobacteria in Guaymas Basin carry out sulfate reduction and butane and propane oxidation (Dombrowski, Teske, & Baker, 2018).

Despite significant advances in our understanding of the metabolic versatility of Deltaproteobacteria, no studies to date have used this growing body of literature to understand the phylogeny and role that this group plays in diverse biogeochemical pathways. In addition, many studies have focused on a subset of specific lineages. Here, we collected 1,376 Deltaproteobacteria genomes from public datasets and obtained 402 new metagenome-assembled genomes (MAGs) from marine sediments to examine their biodiversity and metabolic capabilities on a global scale.

Results and Discussion

DELTAPROTEOBACTERIA PHYLOGENY

Newly obtained Deltaproteobacteria MAGs were reconstructed from hydrothermal sediments in Guaymas Basin (GB, Gulf of California) and coastal sediments in Mesquite Bay (MB, Gulf of Mexico). Fifty-six of the GB MAGs were recently described in Dombrowski et al. (2018), and 314 GB MAGs were reconstructed as part of a recent large-scale sequencing effort that has yielded close to 3,000 draft genomes. These MAGs were derived from 16 sediment samples, spanning 0-33 cm and 3.1-28.32ºC. An additional 32 MAGs were recovered from a 3-15 cm sediment core from MB. These 402 MAGs have an estimated completeness >50% and contamination <10% (274 MAGs >70% complete and 101 MAGs >90% complete). To resolve the biodiversity of Deltaproteobacteria MAGs, we constructed a phylogenetic tree using 37 single-copy marker genes for all 1,778 genomes (Figure 1). This analysis revealed the existence of 18 cultured (designated C1-C18) and 12 uncultured (U1-U12) clades. A lineage is considered cultured, “C,” if it contains an organism that has been isolated in a laboratory. In contrast, uncultured lineages (designated with a “U”) do not have a cultured representative and are only known from environmental sequences.

One hundred and seventy-three (43%) of the newly obtained MAGs fall within U lineages. U1 and U2 are the most divergent uncultured Deltaproteobacteria of those reconstructed here, forming a distinct branch from all known orders. The phylogenetic placement of U1 and U2 is further supported by their unique amino acid content. U1 MAGs have less than 49% amino acid identity (AAI) compared to all the other 402 MAGs analyzed. U2 MAGs are less distinct, showing AAI as high as 79.3% to an uncultured reference genome from the deep sea (Mid-Atlantic Ridge, ASM325167v1). However, the majority of U2 MAGs have AAI values in the range of 50-60% when compared with closely branching relatives. Taken together, this suggests these lineages comprise a new order within the Deltaproteobacteria.
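AAI comparisons of this kind are commonly computed as the mean percent identity over reciprocal best-hit protein pairs between two genomes. The text does not state which tool or parameters were used here, so the following is only a minimal sketch under that assumption; the function names and the toy hit tables are hypothetical and the identities are invented for illustration.

```python
def reciprocal_best_hits(a_to_b, b_to_a):
    """Return (protein_a, protein_b, mean identity) for reciprocal best hits.

    a_to_b maps each protein of genome A to (best hit in B, percent identity);
    b_to_a is the same table in the opposite direction.
    """
    pairs = []
    for prot_a, (prot_b, identity) in a_to_b.items():
        back = b_to_a.get(prot_b)
        # Keep the pair only if B's best hit points back to the same A protein.
        if back is not None and back[0] == prot_a:
            pairs.append((prot_a, prot_b, (identity + back[1]) / 2))
    return pairs


def average_amino_acid_identity(a_to_b, b_to_a):
    """Mean identity over reciprocal best hits, or None if there are none."""
    pairs = reciprocal_best_hits(a_to_b, b_to_a)
    if not pairs:
        return None
    return sum(identity for _, _, identity in pairs) / len(pairs)


# Invented best-hit tables for two hypothetical genomes.
hits_a_to_b = {"a1": ("b1", 52.0), "a2": ("b2", 46.0), "a3": ("b1", 30.0)}
hits_b_to_a = {"b1": ("a1", 50.0), "b2": ("a2", 44.0)}

aai = average_amino_acid_identity(hits_a_to_b, hits_b_to_a)
print(f"AAI = {aai:.1f}%")
```

In this toy case the protein a3 is dropped because its best hit, b1, pairs with a1 instead, so only two protein pairs contribute to the average.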

MULTIGENOMIC ENTROPY-BASED SCORES

To examine the metabolic capabilities encoded by these genomes, we compared their predicted proteins to a variety of functional databases (see Methods). Multigenomic Entropy Based Score (MEBS) was employed to evaluate the overall metabolic machinery involved in nitrogen, iron, oxygen, C1 (single carbon compounds), and sulfur cycles (De Anda et al., 2017). A high entropy score indicates that a given genome can perform the metabolic pathways involved in these cycles. For the iron cycle, genomes are analyzed for the presence of Pfam domains involved in iron (II) oxidation, iron reduction, and iron absorption. Some of the major pathways assessed for the nitrogen cycle include ammonia assimilation and oxidation, denitrification, assimilatory and dissimilatory nitrate reduction, and nitrogen fixation. Oxygen scores evaluate the capacity for oxygen utilization and anoxygenic photosynthesis. C1 scoring searches for the presence of single carbon compound pathways, including methane oxidation, methanogenesis, and methylamine degradation. Finally, the sulfur entropy score is based on the presence of metabolic pathways involved in the mobilization of inorganic-organic sulfur compounds through microbial-catalyzed reactions (De Anda et al., 2017). The phylogenetic distribution of these cycles is shown in Figure 1. Several patterns are made apparent by this analysis; nitrogen cycling scores are highest for the Myxococcales and appear high for some . Iron cycling appears prevalent throughout the Desulfuromonadales, which includes the family Geobacteraceae, and this is supported by an abundance of iron (III)-reducing bacteria in this family (Bond et al., 2002; Holmes et al., 2004). MAGs from lineages U1 and U2, which appear to form a novel order, seem to be involved in nitrogen and iron cycles. and Syntrophobacterales lineages display low oxygen scores, which is consistent with cultures from these orders that are anaerobic (Harmsen et al., 1998; Sorokin & Chernyh, 2016). Additionally, sulfur scores are highest for the Desulfovibrionales, and are generally low for Myxococcales. This is consistent with cultured representatives of Desulfovibrionales such as Desulfovibrio vulgaris, a model sulfate-reducing bacterium (Heidelberg et al., 2004).

CENTRAL METABOLISM AND METABOLIC CLUSTERING

There are several metabolic characteristics shared among Deltaproteobacteria. Generally, many Deltaproteobacteria appear to be anaerobic. According to the entropy score analysis, Deltaproteobacteria display an average oxygen score of 3.23, in contrast to photosynthetic organisms (e.g., Cyanobacteria), which have an average oxygen score of 7.5. Deltaproteobacteria across all taxonomic lineages display nearly complete pathways for central carbohydrate metabolism, including glycolysis, gluconeogenesis, the pentose phosphate pathway, and the TCA cycle (Figure 2). Over half of Deltaproteobacteria genomes encode PorA, which catalyzes the oxidation of pyruvate to acetyl-CoA, and thus links glycolysis and the TCA cycle (Furdui & Ragsdale, 2000). Many Deltaproteobacteria can metabolize acetate, with many genomes encoding acs, ack, and acdA. Most GB and MB MAGs encode extracellular carbohydrate-active enzymes (CAZymes) for starch and glycogen degradation (GH13) and cell wall synthesis (GT51) (Karpinets, 2010). Many Deltaproteobacteria MAGs also have portions of the Wood-Ljungdahl pathway for carbon fixation (268 genomes >57% complete and 196 genomes >71% complete). The carbonyl branch of the Wood-Ljungdahl pathway is widespread, with 243 genomes encoding both components of the enzyme required for this pathway, the CODH/ACS complex (Schuchmann & Müller, 2014). Finally, sulfur dioxygenase (sdo) is present in nearly all Deltaproteobacteria analyzed, and this gene is thought to be involved in sulfide detoxification, or potentially energy production (Liu et al., 2014).
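Pathway completeness values such as those quoted above can be read as the percentage of pathway steps for which a genome carries at least one matching KO. The sketch below illustrates that reading only; the seven-step layout and the KO placeholders are hypothetical and are not the module definitions listed in Table 3.

```python
def module_completeness(steps, genome_kos):
    """Percent of pathway steps covered by a genome.

    steps is a list of tuples; each tuple holds the alternative KO identifiers
    that can satisfy one step. genome_kos is the set of KOs annotated in a genome.
    """
    if not steps:
        raise ValueError("a module needs at least one step")
    present = sum(1 for alternatives in steps if genome_kos & set(alternatives))
    return 100.0 * present / len(steps)


# Hypothetical seven-step module; identifiers are placeholders, not real KOs.
EXAMPLE_MODULE = [
    ("KO_STEP1",),
    ("KO_STEP2A", "KO_STEP2B"),  # either KO satisfies step 2
    ("KO_STEP3",),
    ("KO_STEP4",),
    ("KO_STEP5",),
    ("KO_STEP6",),
    ("KO_STEP7",),
]

# Invented annotation for one genome: steps 1, 2, 3 and 5 are covered.
example_genome = {"KO_STEP1", "KO_STEP2B", "KO_STEP3", "KO_STEP5"}

print(round(module_completeness(EXAMPLE_MODULE, example_genome), 1))
```

With seven hypothetical steps, four covered steps give about 57% and five give about 71%; whether the cut-offs reported in the text arise this way is an assumption, not something the text states.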

GROUP 0

To examine overall metabolic patterns throughout the Deltaproteobacteria, we compared 402 MAGs obtained in this study with 1,376 publicly available Deltaproteobacteria genomes. These genomes were clustered using entropy-based scores for C1, N, O, S, and Fe capabilities. This revealed that Deltaproteobacteria form four distinct groups (group 0-3) (Figure 6). The first group (0) consists of 436 genomes which are primarily cultured species. Half of these genomes have a significantly high sulfur score (False Discovery Rate = 0.01), similar to those previously described in known cultured representatives involved in sulfur cycling (e.g., Desulfovibrio, Desulfonatronospira) (De Anda et al., 2017). In addition, genomes in this group had the highest average iron score. This is consistent with the large number of Geobacter genomes within this group, a genus known for its ability to couple the oxidation of organic compounds with the reduction of iron and manganese (Lovley et al., 2011). Fifty-six MAGs obtained in this study fall within group 0 (24 U MAGs, and 32 C MAGs), and most of the uncultured MAGs belong to U2. U2 genomes have pathways for denitrification (6 MAGs ≥75% complete denitrification pathway, 6 MAGs 100% complete), as they contain napAB or narGHI, nirK or nirS, norBC, and nosZ (Zehr & Ward, 2002). Five group 0, U2 MAGs also encode nrfA, and thus are potentially involved in dissimilatory nitrate reduction to ammonia. U2 MAGs are also involved in iron cycling, as they encode ferric reductases for iron transport, as well as mtrC and mtrB, a multi heme cytochrome and outer membrane protein involved in manganese and iron reduction (Beliaev et al., 2001; Schuchmann & Müller, 2014). Thirty-three group 0 MAGs encode hydrogenases, and these include 1 FeFe hydrogenase (group C3) and 7 types of NiFe hydrogenases (Figures 3, 8). Most NiFe hydrogenases are group 1b and 1c, which can function in hydrogenotrophic respiration using sulfate, nitrate, and metal as terminal electron acceptors (Søndergaard, Pedersen, & Greening, 2016).
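The grouping of genomes by their five entropy scores can be illustrated with a small k-means example (the caption of Figure 10 mentions a k means analysis). This is a sketch only: the exact procedure, initialization, and parameters behind the four groups are not given in this text, and the score vectors below are invented toy values, not MEBS output.

```python
import math


def kmeans(points, initial_centroids, max_iterations=100):
    """Plain Lloyd's algorithm; returns (labels, centroids)."""
    centroids = [tuple(c) for c in initial_centroids]
    labels = []
    for _ in range(max_iterations):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(len(centroids)), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(col) / len(members) for col in zip(*members))
                )
            else:
                new_centroids.append(old)  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


# Invented (C1, N, O, S, Fe) score vectors for eight hypothetical genomes.
toy_scores = [
    (1, 2, 3, 9, 8), (1, 2, 3, 8, 9),   # high sulfur and iron
    (1, 1, 1, 1, 1), (1, 1, 2, 1, 1),   # low scores throughout
    (7, 2, 2, 9, 2), (8, 2, 2, 9, 3),   # high sulfur and C1
    (1, 8, 8, 1, 7), (2, 9, 8, 1, 7),   # high nitrogen, oxygen and iron
]
# Fixed starting centroids keep the toy run deterministic.
start = [toy_scores[0], toy_scores[2], toy_scores[4], toy_scores[6]]

toy_labels, toy_centroids = kmeans(toy_scores, start)
print(toy_labels)
```

Fixed starting centroids are used here purely so the example is reproducible; a real analysis would normally use repeated random initializations and check how stable the four-group solution is.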

GROUP 1

The next group (1) is primarily composed of Desulfobacterales and Syntrophobacterales derived from environmental sequences (307 genomes). These genomes have low entropy scores for all pathways, suggesting key genes for vital metabolic processes may be absent. Thus, these organisms may rely on hosts and partners for survival and growth (Moran, 2002). There are several cultured syntrophic (e.g., Syntrophus aciditrophicus SB) and pathogenic (e.g., Lawsonia intracellularis) organisms in group 1, providing further evidence that this group can be characterized by symbiotic lifestyles (Kroll et al., 2005; McInerney et al., 2007). Genomes within this group are also expected to be anaerobic, due to their general low oxygen scores and the presence of strict anaerobic reference genomes (e.g., Desulfurella amilsii) (Florentino et al., 2016). In addition, group 1 lacks cytochromes involved in oxidative phosphorylation, suggesting these organisms are anaerobic (Figure 2). Also, group 1 MAGs encode NiFe Group 3c hydrogenases, which are known to occur in several Syntrophobacterales species and are membrane-bound, which is characteristic of syntrophic organisms (Figure 3) (Søndergaard et al., 2016). In addition to hydrogen-based metabolism, group 1 MAGs have very few genes for nitrogen cycling. Genes present for sulfur cycling include sdo (24 MAGs) for sulfide detoxification and sat (13 MAGs) to activate sulfate into adenosine 5’-phosphosulfate (APS) (Wasmund, Mußmann, & Loy, 2017). The absence of additional genes involved in dissimilatory sulfite reduction indicates sdo may function in sulfur assimilation in these organisms.

GROUP 2

Group 2 consists of 626 genomes and most MAGs analyzed in this study (249 MAGs). Reference genomes are mostly derived from environmental sequences, though there are also several cultured, model organisms in this group involved in sulfate reduction, such as Desulfovibrio piger and Desulfococcus multivorans. Group 2 organisms have the highest number of proteins involved in sulfur cycling compared to other groups. Specifically, these genomes have protein domains involved in sulfate reduction, tetrathionate and thiosulfate reduction, and elemental sulfur reduction. Group 2 genomes also have high C1 scores relative to other Deltaproteobacteria, which reflects genes involved in methylamine degradation (mtmB1, di- and trimethylamine methyltransferases). In addition, Group 2 genomes have the catalytic domains of tetrahydromethanopterin S-methyltransferase (mtaA and mtaH), the complex which catalyzes the energy-conserving, sodium-ion translocating step in methanogenesis (Sauer, 1986). These C1 proteins are mostly unique to group 2 genomes, where 38 genomes of all those analyzed have mtaA, and 30 of these are from group 2. Components of this complex have previously been identified in the bacterium Moorella thermoacetica and are thought to function in methanol metabolism and acetate biosynthesis in this organism (Das et al., 2007). In addition, these domains are found in other enzymes and thus may have different functions other than those known to occur in methanogens.

Low oxygen scores in group 2 are due to a lack of cytochromes involved in aerobic respiration, especially cytochrome c oxidase E.C. 1.9.3.1, indicating these organisms lead anaerobic lifestyles (Figure 2). A total of 249 MAGs belong to group 2, and 99 are from uncultured (U4-5, U7, U9, U11-12) lineages while 150 are from cultured (C15-C18) lineages. Fourteen of these MAGs have the complete carbonyl and methyl branches of the Wood-Ljungdahl pathway, which are thought to rarely coexist in bacteria and have only been identified in one Deltaproteobacteria organism (Adam, Borrel, & Gribaldo, 2019). Forty-nine MAGs possess formate dehydrogenase (fdoG), indicating they can oxidize formate to CO2 (Ferry, 1990). This and the limited number of genes for oxygen utilization indicate they are anaerobic organisms able to metabolize formate. Also, 21 GB group 2 MAGs encode lactate dehydrogenase (Ldh), which is rare among the Deltaproteobacteria (only present in 8.5% of the genomes). Thus, lactate may be important for fermentative growth in these organisms. Most group 2 MAGs encode hydrogenases, with 199 MAGs containing 1 FeFe and 10 NiFe types (Figures 3, 8). The most common hydrogenases are group 1b (84 MAGs), 3c (31 MAGs), and 1a (29 MAGs). Meg22_1214_Bin_188 has a 1e-type hydrogenase. Hydrogenases of this type are not known in Deltaproteobacteria, function in hydrogenotrophic respiration using sulfur as a terminal electron acceptor, and are present in bacteria capable of sulfur oxidation and reduction (Søndergaard et al., 2016). Sulfur oxidation genes are absent from this MAG, though sulfur reduction genes are present (dsrAB, dsrKMJOP, qmoABC), suggesting the NiFe 1e hydrogenase may function in this organism in respiratory sulfur reduction (Figure 3).

Both U1 genomes (Meg19_46_Bin_87, Meg22_24_Bin_159) were classified in group 2 and have several genes involved in nitrogen cycling. These genomes encode napAB and narGH, which allow for dissimilatory nitrate reduction, and nirBD, which catalyzes nitrite reduction to nitric oxide (NO) (Zehr & Ward, 2002). Sulfur cycling genes are less abundant, and U1 genomes lack dsrAB and thus are not likely involved in sulfate reduction. However, one or both U1 genomes possess genes for other sulfur cycling processes including thiosulfate reduction (phsABC), tetrathionate reduction (ttrABC), thiosulfate transferase (glpE), and sulfoacetaldehyde degradation (isfD) (Wasmund et al., 2017). U1 MAGs are also potentially involved in selenate cycling, as they contain a putative selenate reductase, and one of the genomes also encodes a putative selenate reductase binding subunit (Kirsch, Méjean, & Verméglio, 2002). Finally, U1 contains bcrABCD genes, and thus is likely able to anaerobically reduce the aromatic compound benzoyl-CoA (Löffler et al., 2011). Interestingly, selenate reductases can utilize benzoate as an electron donor, and thus the presence of benzoyl-CoA and selenate reductases may be linked in these organisms (Macy et al., 1993).

GROUP 3

Group 3 is unique from other groups in that it has low entropy scores for sulfur and C1, and high scores for oxygen, iron, and nitrogen pathways. A total of 408 genomes cluster within group 3, and 49 of these are MAGs from this study. Nitrogen cycling is evident in group 3 due to the presence of several genes, including nrfA, which is present in 138 genomes and catalyzes the reduction of nitrite to ammonia in dissimilatory nitrate reduction (Zehr & Ward, 2002). Ninety-eight genomes have norB, indicating these organisms can reduce nitric oxide to nitrous oxide. Nitrogen cycling is also evident in MAGs reconstructed in this study. Most (37 of 49) group 3 MAGs are from lineages C5 and U2, which branch with Myxococcales and form a novel order, respectively. Four U2 MAGs possess the complete denitrification pathway, and thus likely act as an important nitrogen removal mechanism in GB sediments. Eight C5 genomes contain norB and 15 contain nosZ, indicating these organisms can reduce nitric oxide and nitrous oxide. Because nitrite reductases produce NO, a precursor to NO2, these organisms may be of interest in future studies as contributors to the production of this major air pollutant. Furthermore, Myxococcales identified in estuarine sediments have been shown to have denitrification genes (Baker et al., 2015). High oxygen scores in group 3 are due to the presence of cbb3-type cytochrome c oxidases involved in aerobic respiration (Figure 2). Also, all 49 MAGs were reconstructed from surface sediments between 0-6 cm, where bacteria most frequently encounter oxygen. Interestingly, these oxidases have been shown to play a role in denitrification under anoxic conditions, and thus could function in both aerobic metabolism and denitrification in these organisms (Hamada et al., 2014). High iron scores in group 3 are due to the presence of protein domains involved in iron (II) oxidation. In addition, 47 genomes encode the multi heme cytochrome mtrC and 24 have the outer membrane protein mtrB (Beliaev et al., 2001). Group 3 genomes are unique in that they encode methylamine dehydrogenase (MauAB), which oxidizes methylamine to formaldehyde (Chistoserdov et al., 1994; Gak, Tsygankov, & Chistoserdov, 1997). MauAB may be rare in Deltaproteobacteria as a whole, as we only identified it in 12 reference genomes in this study. C5 and U2 MAGs also uniquely possess glutathione dehydrogenase (frmA), suggesting these organisms can oxidize formaldehyde (Barber, Rott, & Donohue, 1996; Macy et al., 1993). In addition to these metabolic pathways, group 3 MAGs also play a role in hydrogen cycling. Thirty-four MAGs have FeFe group C3 hydrogenase, as well as NiFe group 1b, 1f, 3c, and 3d (Figures 3, 8). Most MAG hydrogenases are type 1f, which have been minimally studied and have an unresolved role (Søndergaard et al., 2016).

DISSIMILATORY SULFITE REDUCTASES

More than half of sediment MAGs contain dissimilatory sulfite reductase genes dsrA (217 genomes) and dsrB (216 genomes), which are key markers for sulfate reduction, and sulfide and sulfur oxidation (Thorup & Schramm, 2017; Wasmund et al., 2017). The phylogenetic distributions of DsrA and DsrB proteins identified in this study are shown in Figures 4 and 7, and these analyses show that most dsr sequences branch with other uncultured Deltaproteobacteria. Two MAGs (belonging to lineages C12 and C13, Desulfobacterales) have Dsr proteins that branch with partial reverse-type dsr sequences of SAR324 Deltaproteobacteria identified in subtropical gyres (Swan et al., 2011). Thus, these Desulfobacterales likely catalyze the oxidation of sulfide and/or sulfur. Nineteen MAG Dsr sequences branch closely to those from thermophilic sulfate reducers, including Desulfacinum hydrothermale, Dissulfuribacter thermophilus, Desulfacinum infernum, Desulfosoma caldarium, and Thermodesulforhabdus norvegica (Figure 4) (Baena et al., 2011; Beeder, Torsvik, & Lien, 1995; Rees et al., 1995; Sievert & Kuever, 2000; Slobodkin et al., 2013). This is consistent with their phylogenetic placement according to 37 marker genes, where they are related to D. thermophilus and T. norvegica (Figure 1), as well as the hot sediment temperatures from which they were recovered (Table 1). Eleven of 19 genomes were identified in sediment with a temperature measured above 42ºC and are thus expected to be thermophilic (Bausum & Matney, 1965). However, 6 of the MAGs with Dsr sequences related to thermophiles were identified in sediments at lower temperatures, from 3.1ºC-38ºC. MAGs found in the upper portion of this temperature range (29ºC-38ºC) could also be thermophiles given the dynamic nature of these sediments (Teske, Callaghan, & LaRowe, 2014). MAGs identified from sediment between 3.1ºC-12.65ºC are more distantly related to T. norvegica compared to other MAGs found in higher temperatures, forming a distinct branch with 3 uncultured Deltaproteobacteria identified from perchlorate-reducing sediment enrichments (Figure 1) (Barnum et al., 2018). This group may represent a sister clade to T. norvegica whose members encode Dsr proteins with high sequence similarity to this group but are not truly thermophilic sulfate reducers.

CONCLUSIONS

We reconstructed an array of uncultured Deltaproteobacteria genomes from coastal and deep-sea sediments. Phylogenomic analyses of these and Deltaproteobacteria obtained from public databases revealed a dramatic expansion of Deltaproteobacteria biodiversity, including 12 distinct and entirely uncultured lineages. We leveraged entropy-based scores to examine the versatility of physiological processes encoded by these bacteria. This revealed four distinct groups based on metabolic pathways for C1, oxygen, and nutrient cycling among all known Deltaproteobacteria. We characterized a novel order that has considerable genetic versatility for iron cycling, denitrification, and dissimilatory nitrate reduction. This study highlights that there is considerable phylogenetic and physiological diversity to be explored among the Deltaproteobacteria, even among those with many cultured representatives.

Methods

SAMPLING

GB samples were collected from sediments in the Gulf of California, Mexico, (27°N 0.388’, 111°W 24.560’). Samples were collected during six Alvin dives in 2008 and 2009 (dives 4569, 4567, 4571, 4573, 4486, 4488) from a depth of approximately 2,000 m. Sediment samples were collected during Alvin dives using polycarbonate cores (45-60 cm in length, 6.25 cm interior diameter), subsampled into cm layers under N2 gas in the ship’s laboratory and immediately frozen at -80°C. Twenty-seven sediment subsamples from different depth profiles yielded sufficient genomic DNA for metagenomic sequencing. Details on geochemical characteristics are provided in Table 1.

MB samples were collected in July 2016, in Mesquite Bay, Mission-Aransas National Estuarine Research Reserve, Texas, (28°N 0.147’, -96°W 0.8455’) using a PVC sediment core. The sediment sample was stored on ice and then immediately subsampled and stored at -80ºC until processing. The core was subsampled into four (1D-4D) 3 cm sections spanning 3-15 cm. Oxygen profiles were taken at the site and are shown in Table 2.

METAGENOMIC SEQUENCING AND ASSEMBLY

GB Dives 4569, 4567, 4571, and 4488: Total DNA from ≥10 g of sediment from each of the eleven samples was extracted using the MoBio PowerMax soil kit following the manufacturer’s instructions and adjusted to a final concentration of 10 ng/µl for each sample (total amount 100 ng). Libraries for Illumina sequencing were prepared and sequenced by the Joint Genome Institute (JGI). Paired-end sequencing was performed on an Illumina HiSeq 2500 machine generating 2x125 bp reads providing ~280 gigabases of sequencing data. Quality control and sequence assembly were performed by JGI. Briefly, sequences were trimmed and quality controlled using bbtools, and assembled using megahit v1.0.6 (Li et al., 2015).

MB: Total DNA from ≥10 g of sediment was extracted at four depths (3-6 cm, 6-9 cm, 9-12 cm, and 12-15 cm) using a DNeasy PowerSoil kit (Qiagen) following the manufacturer’s instructions. DNA concentrations were quantified using a QUBIT 2.0 fluorometer (Thermo-Fisher) and metagenomic sequencing was performed at the Michigan State University RTSF Genomics Core. Libraries were prepared using the Illumina TruSeq Nano DNA Library Preparation kit on a Perkin Elmer Sciclone NGS robot following manufacturer’s protocols. Completed libraries were quality controlled and quantified using a combination of Qubit dsDNA HS and Caliper LabChipGX HS DNA assays. All libraries were pooled in equimolar quantities and the final pool was quantified using the Kapa Biosystems Illumina Library Quantification qPCR Assay kit. The pool was loaded onto 2 lanes of an Illumina HiSeq 4000 flow cell and sequencing was performed in a 2x150 bp paired-end format using HiSeq 4000 SBS reagents. Base calling was completed by Illumina Real Time Analysis (RTA) v2.7.6 and output of RTA was demultiplexed and converted to FASTQ format with Illumina Bcl2fastq v2.18.0. Adapters were trimmed from FASTQ reads with CutAdapt (Martin, 2011) for TruSeq adapters and quality controlled using Sickle (Joshi & Fass, 2011). Assembly was performed with MegaHit v1.1.1 (--12 --k-list 21,33,55,77,99,121 --min-count 2 --verbose -t 4 --memory 500000000000) (Li et al., 2015). Read mapping was performed with the BWA aligner v.0.7.12 BWA-MEM algorithm and Samtools v.0.1.19 (Li et al., 2009).

GB Dives 4486 and 4573: Total DNA from ≥10 g of sediment from each of the sixteen samples was extracted using the DNeasy PowerSoil kit (Qiagen) following the manufacturer’s instructions. DNA concentrations were quantified using a QUBIT 2.0 fluorometer (Thermo-Fisher) and metagenomic sequencing was performed at the Michigan State University RTSF Genomics Core. Libraries were prepared using the Illumina TruSeq Nano DNA Library Preparation Kit on a Perkin Elmer Sciclone G3 robot following the manufacturer’s recommendations. Completed libraries were quality controlled and quantified using a combination of Qubit dsDNA HS and Advanced Analytical Fragment Analyzer High Sensitivity DNA assays. The libraries were divided into 4 pools, each with 4 libraries combined in equimolar amounts. Pools were quantified using the Kapa Biosystems Illumina Library Quantification qPCR kit. Each pool was loaded onto 2 lanes of a HiSeq 4000 flow cell (8 lanes total) and sequencing was performed in a 2x150bp paired end format using HiSeq 4000 SBS reagents. Base calling was completed by Illumina Real Time Analysis (RTA) v2.7.7 and output of RTA was demultiplexed and converted to FASTQ format with Illumina Bcl2fastq v2.19.1. An additional round of sequencing was completed on these library pools for improved resolution during genomic reconstruction. Sequences were trimmed and quality controlled using Sickle v1.33 and assembly was performed using IDBA-UD v1.0.9 (Joshi & Fass, 2011; Peng et al., 2012). These methods produced close to 3,000 reconstructed genomes and 3.7 TB of genomic data.

GENOME BINNING

Individual assemblies (only scaffolds ≥ 2000 bp) from dives 4569, 4567, 4571, and 4488 were binned using tetra-ESOM, Anvi’o (v2.2.2), and Metabat (v1) (Dick et al., 2009; Eren et al., 2015; Kang et al., 2015). For ESOM, bins were extracted using getClassFasta.pl (using -loyal 51). For Anvi’o and Metabat, coverage was determined using BWA-MEM in paired-end mode (bwa-0.7.12-r1034). Anvi’o was used with default settings and Metabat was run with the following parameters: --minProb 75 --minContig 2000 --minContigByCorr 2000. Results from the three binning tools were combined using DAS Tool v1.0 with default settings (Sieber et al., 2018). The accuracy of the binning approach was evaluated by calculating the percentage of completeness and contamination using CheckM lineage_wf v1.0.5 (Parks et al., 2015). Genomes were only analyzed further if they were more than 50% complete and showed a contamination below 10%, yielding 552 genomic bins. Of these, 56 bins were identified as Deltaproteobacteria and thus included in this analysis.

Binning of MB assembled fragments was performed with MetaBat v2.12.1, Maxbin v2.2.4, and Anvi’o v3 using contigs >= 2500 bp (Eren et al., 2015; Kang et al., 2015; Wu et al., 2014). Consensus bins were determined with DASTool v1.0 (--search_engine diamond) (Sieber et al., 2018). Bin completeness and contamination were determined with CheckM v1.0.11 (Parks et al., 2015). For subsequent analyses, only bins with completeness >=50% and contamination <10% were used. Of these, 32 bins were identified as Deltaproteobacteria and included in this analysis.
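
The contig length cutoff applied before binning can be illustrated with a short sketch. This is not the code used in this work; it is a minimal standard-library example, and the function names (read_fasta, filter_by_length) and the toy contigs are hypothetical. It assumes a plain FASTA input and keeps sequences at or above the cutoff (2500 bp here, matching the MB binning threshold).

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from a FASTA-formatted text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

def filter_by_length(records, min_length):
    """Keep records whose sequence is at least min_length bases long."""
    return [(name, seq) for name, seq in records if len(seq) >= min_length]

# Toy assembly with hypothetical contig names; contig_3 is split over two lines.
toy_lines = [
    ">contig_1", "A" * 3000,
    ">contig_2", "C" * 2499,
    ">contig_3", "G" * 1250, "T" * 1250,
]
toy_fasta = io.StringIO("\n".join(toy_lines))

kept = filter_by_length(read_fasta(toy_fasta), min_length=2500)
kept_names = [name for name, _ in kept]
print(kept_names)
```

The same function covers the 2000 bp cutoff used for the GB assemblies by changing min_length.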

Binning of individual GB assemblies (scaffolds >2000 bp) was performed from dives 4486 and 4573 using CONCOCT v0.4.0 and Metabat v2.12.1 (Alneberg et al., 2014; Kang et al., 2015). CONCOCT was used with default settings and Metabat was run with the following parameters: --minCVSum 0 --saveCls -d -v --minCV 0.1 -m 2000. Results from these two binning tools were combined using DAS Tool v1.0 with default settings. CheckM v1.0.11 was used to determine bin completeness and contamination (Parks et al., 2015; Sieber et al., 2018). Genomes were only analyzed further if they were more than 50% complete and showed a contamination below 10%, yielding 2,998 genomic bins. Of these, 314 bins were identified as Deltaproteobacteria and included in this analysis. Together, the three binning analyses produced 402 Deltaproteobacteria bins.
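
The quality thresholds and the bin totals reported above can be checked with a small sketch. This is an illustrative example only, not the pipeline used here: the BinQuality class, the passes_quality function, and the toy bins are hypothetical, and the completeness and contamination values stand in for the percentages that CheckM reports. The strict comparison follows the "more than 50% complete" wording; the MB analysis states >=50%.

```python
from dataclasses import dataclass

@dataclass
class BinQuality:
    name: str
    completeness: float   # percent
    contamination: float  # percent

def passes_quality(b, min_completeness=50.0, max_contamination=10.0):
    """Return True for bins more than 50% complete with under 10% contamination."""
    return b.completeness > min_completeness and b.contamination < max_contamination

# Hypothetical bins, used only to exercise the thresholds.
toy_bins = [
    BinQuality("bin_001", 92.3, 1.8),   # kept
    BinQuality("bin_002", 48.7, 0.5),   # too incomplete
    BinQuality("bin_003", 75.0, 12.4),  # too contaminated
]
retained = [b.name for b in toy_bins if passes_quality(b)]

# Deltaproteobacteria bins reported for each binning analysis in the text.
delta_bins = {
    "GB dives 4569, 4567, 4571, 4488": 56,
    "MB": 32,
    "GB dives 4486, 4573": 314,
}
total_delta_bins = sum(delta_bins.values())
print(retained, total_delta_bins)
```

The three per-analysis counts sum to the 402 Deltaproteobacteria bins analyzed in this study.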

PHYLOGENETIC ANALYSES

Phylosift was used to extract 37 single-copy, protein-coding marker genes for the phylogenetic placement of the assembled metagenomic bins (Darling et al., 2014). A set of 1,376 reference genomes was downloaded from NCBI in February of 2019 for comparison with GB and MB genomes (Benson et al., 2013; Leary et al., 2016). The 402 reconstructed genomic bins were used as input for Phylosift v1.0.1 using the ‘phylosift search’ followed by the ‘phylosift align’ mode (Darling et al., 2014). The concatenated protein alignments of 37 elite marker genes (concat.updated.1.fasta) were combined for all genomes of interest and trimmed using TrimAl v3 with the automated1 setting (Capella-Gutiérrez, Silla-Martínez, & Gabaldón, 2009). A phylogenetic tree was generated with a maximum likelihood-based approach using RAxML v8.2.10, called as: raxmlHPC-PTHREADS-AVX -f a -m PROTGAMMAAUTO -N autoMRE (Stamatakis, 2006).

16S rRNA gene sequences were extracted from reconstructed genomes using barrnap v.3 with the following parameters: --lencutoff 0.2 --reject 0.3 --evalue 1e-05 (Seeman, 2013). For several bins, 16S sequences were identified on two or three different scaffolds, and for these bins barrnap was rerun without specifying any parameters. This produced a single 16S hit for each of these genomes, and that scaffold was used for further analysis. Sequences were uploaded to ARB and aligned with the reference arb database SILVA_132_SSURef_NR99_13_12_17_opt.arb.gz (Ludwig et al., 2004). The alignment was checked and manually refined, exported from ARB, and trimmed using BMGE v.1.12 (Criscuolo & Gribaldo, 2010). This analysis produced an alignment of 149 sequences: 55 16S sequences from reconstructed genomes and 94 from the reference database. Using this alignment, a maximum likelihood tree was created with RAxML v.8.2.4 as: raxmlHPC-PTHREADS-AVX -f a -m GTRGAMMA -N autoMRE (Stamatakis, 2006).

CLUSTERING ANALYSIS

MEBS entropy scores for the 1,778 Deltaproteobacteria genomes were analyzed using the MEBS script F_MEBS_cluster.py (De Anda et al., 2017). A combination of three projection approaches (tsne, pca and isomap) was used to visualize four clustering methods (ward, kmeans, meanshift, spectral) using the option -all with default parameters. A 4-group kmeans clustering algorithm was chosen after observing the general behavior of the clustering methods and determining that 4 groups formed consistent metabolic divisions. This was verified by manual inspection of the entropy scores for each group.
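
To make the 4-group k-means step concrete, the following is a minimal standard-library sketch of k-means on toy entropy-score vectors. It is not the F_MEBS_cluster.py implementation: the function names, the deterministic farthest-point initialization, and the eight toy genomes (five scores each, one per cycle) are assumptions chosen so that the example is reproducible.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def farthest_point_init(points, k):
    """Pick k starting centroids: the first point, then repeatedly the point
    farthest from all centroids chosen so far (deterministic, an assumption)."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(squared_distance(p, c) for c in centroids))
        centroids.append(nxt)
    return centroids

def kmeans(points, k, max_iterations=100):
    """Standard Lloyd iterations: assign to nearest centroid, then recompute means."""
    centroids = farthest_point_init(points, k)
    labels = None
    for _ in range(max_iterations):
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Hypothetical normalized entropy scores (S, C, O, Fe, N) for eight genomes.
toy_scores = [
    (0.10, 0.10, 0.10, 0.10, 0.10), (0.15, 0.10, 0.12, 0.10, 0.08),
    (0.90, 0.20, 0.10, 0.10, 0.10), (0.85, 0.25, 0.10, 0.12, 0.10),
    (0.10, 0.90, 0.80, 0.10, 0.20), (0.12, 0.88, 0.82, 0.10, 0.20),
    (0.20, 0.20, 0.10, 0.90, 0.90), (0.22, 0.18, 0.10, 0.92, 0.88),
]
labels, centroids = kmeans(toy_scores, k=4)
print(labels)
```

On well-separated toy data such as this, each pair of similar genomes ends up in its own group; real entropy scores overlap more, which is why the group assignments were also inspected manually.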

DISSIMILATORY SULFITE REDUCTASES

DsrAB proteins were identified in GB and MB genomes using KAAS and Hmmer (Finn, Clements, & Eddy, 2011; Moriya et al., 2007). One thousand ninety-two reference DsrA sequences were obtained from NCBI using the query: (dsrA) AND "d-proteobacteria"[porgn:__txid28221] NOT "PRJNA362212" on October 10th, 2019. This was repeated for DsrB for a total of 1,307 DsrB reference sequences. DsrA and DsrB reference fasta files were manually inspected, and non-dsr sequences were removed. PRJNA362212 was excluded from the search query because this analysis uses Deltaproteobacteria bins from publications under that BioProject, and those sequences should not also appear as references. Archaeal Dsr sequences were obtained from NCBI using the query “dsrA AND pyrobaculum” and then limited to RefSeq-only results. The query “dsrA AND vulcanisaeta” was also used, and these searches were repeated for DsrB. To ensure Dsr sequences closely related to GB and MB were included, GB and MB dsrA and dsrB sequences were BLAST searched against the NCBI non-redundant protein database (Johnson et al., 2008). Reference sequences with >90% sequence identity were included if not already identified from the NCBI search. DsrA and DsrB sequences were aligned separately using MAFFT v7.271 (default parameters) (Katoh & Toh, 2008). Alignments were masked in Geneious and manually refined (Kearse et al., 2012). Trees were constructed using IQ-TREE v1.3.11.1 with the ultrafast bootstrapping option -bb 1000 (Nguyen, Schmidt, Haeseler, & Minh, 2014).

HYDROGENASES

To identify hydrogenases in GB and MB genomes, a BLAST search was completed with all genomes queried against a DIAMOND (v0.9.24.125) hydrogenase database (Buchfink, Xie, & Huson, 2015). This database was created from a custom set of 3,261 manually curated, high quality reference hydrogenase sequences using the command “diamond makedb.” Results from this search were filtered to retain hits with an alignment length greater than 40 amino acid residues and a sequence identity greater than 50%. The sequences identified in this search were then uploaded to the HydDB web server, and 13 sequences identified as non-hydrogenases were removed from the analysis (Søndergaard et al., 2016). Remaining GB and MB hydrogenase sequences were concatenated with hydrogenase references for a total of 4,005 hydrogenase sequences. These sequences were aligned using MAFFT v7.271 with default parameters and the --anysymbol option to allow for “U” as selenocysteine in reference sequences (Katoh & Toh, 2008). The alignment was trimmed using TrimAl v1.4.22 with the -automated1 option and manually refined in Geneious (Capella-Gutiérrez et al., 2009; Kearse et al., 2012). The final alignment was used to construct a phylogenetic tree using IQ-TREE v1.3.11.1 with the ultrafast bootstrapping option -bb 1000 (Nguyen et al., 2014).
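
The hit-filtering step described for the hydrogenase search can be sketched as follows. This is an illustrative example under stated assumptions, not the script used in this work: it assumes the search results are in the 12-column BLAST tabular layout (query, subject, percent identity, alignment length, and so on), and the function name filter_hits and the three toy hits are hypothetical.

```python
import csv
import io

def filter_hits(handle, min_length=40, min_identity=50.0):
    """Keep hits with alignment length > min_length and identity > min_identity.

    Assumes tab-separated rows in BLAST tabular order:
    qseqid, sseqid, pident, length, mismatch, gapopen,
    qstart, qend, sstart, send, evalue, bitscore.
    """
    kept = []
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        percent_identity = float(row[2])
        alignment_length = int(row[3])
        if alignment_length > min_length and percent_identity > min_identity:
            kept.append((row[0], row[1], percent_identity, alignment_length))
    return kept

# Hypothetical hits: one passes, one fails on identity, one fails on length.
toy_table = io.StringIO(
    "gene_A\tref_1\t62.5\t120\t45\t0\t1\t120\t5\t124\t1e-40\t150.0\n"
    "gene_B\tref_2\t48.0\t200\t104\t0\t1\t200\t1\t200\t1e-30\t120.0\n"
    "gene_C\tref_3\t80.0\t35\t7\t0\t1\t35\t10\t44\t1e-08\t45.0\n"
)
hits = filter_hits(toy_table)
print(hits)
```

Both cutoffs are strict, following the "greater than" wording in the text.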

FUNCTIONAL CHARACTERIZATION OF GENOMES

Gene prediction for individual genomes was performed using prodigal (v2.6.2, default settings) (Hyatt et al., 2010). Predicted genes of individual genomes were further characterized using KAAS (KEGG Automatic Annotation Server), the HydDB web server, dbCAN, as well as custom hmmer and blast databases (Kanehisa, 2000; Potter et al., 2018; Søndergaard et al., 2016; Yin et al., 2012). For KAAS, protein sequences of each individual genome were uploaded to the KAAS webserver using the ‘Complete or Draft Genome’ setting (parameters: GHOSTX, custom genome dataset, BBH assignment method). Key metabolic genes were identified using custom blastp and hmmer databases. Genomes were also annotated using the KEGG-based annotation program METABOLIC (Zhou et al., 2019). Finally, genomes were annotated using MEBS to identify informative pfam domains for different nutrient cycles (De Anda et al., 2017). Genes encoding carbohydrate degradation enzymes described in the Carbohydrate-Active enZYmes (CAZYmes) database were identified with the dbCAN meta server using the HMMER, DIAMOND, and Hotpep tools (Terrapon et al., 2017). Protein localization was determined for CAZYmes and peptidases using the command-line version of Psort (v3.0) (Peabody et al., 2016).

Figure 1: Updated phylogeny of Deltaproteobacteria based on a concatenation of 37 single-copy marker genes. The tree includes newly obtained Deltaproteobacteria MAGs (designated with black dots) reconstructed from GB and MB, as well as 1,376 reference genomes. The tree is surrounded by a heatmap representing the metabolic potential to perform nitrogen (N), iron (Fe), oxygen (O), carbon (C), and sulfur (S) cycles across Deltaproteobacteria. Normalized entropy scores for every cycle range from 0-1, with values closer to 1 (red) being the most informative and meaning that conserved proteins involved in a biogeochemical cycle are present. A low entropy score (blue) is not informative, indicating proteins for a given pathway are not present. U lineages (teal) lack cultured representatives and C lineages (gold) have a cultured representative. Grey circles signify bootstrap support ≥70. This tree was generated using RAxML (v8.2.10) under the GAMMA model of rate heterogeneity and a maximum of 1,000 bootstrap replicates (option -N autoMRE).

Figure 2: Percent completeness of KEGG modules for the 402 reconstructed Deltaproteobacteria genomes analyzed in this study. Genomes are shown in the top bar labeled “Group,” and are separated according to the cluster group they fall within; red = group 3, green = group 2, orange = group 1, blue = group 0. KEGG modules and corresponding KOs used to calculate completeness are included in Table 3. The legend shows percent completeness, with dark blue representing 100% complete, and white representing 0% complete. PPP = pentose phosphate pathway, EM = Embden-Meyerhof, EDP = Entner-Doudoroff pathway.
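
One plausible way to compute a percent completeness value from the module steps listed in Table 3 is sketched below. The exact calculation used for Figure 2 is not spelled out in the text, so this is an assumption: a step counts as present when the genome carries at least one of the alternative KOs listed for it, and completeness is the percentage of steps present. The function name and the toy genome KO set are hypothetical; the step definitions are the Wood-Ljungdahl rows of Table 3.

```python
def module_completeness(steps, genome_kos):
    """Percent of module steps for which the genome has at least one listed KO."""
    present = sum(1 for alternatives in steps if genome_kos & alternatives)
    return 100.0 * present / len(steps)

# Wood-Ljungdahl pathway steps (M00377, steps 01-07) as listed in Table 3.
wood_ljungdahl = [
    {"K00198"},
    {"K05299", "K15022"},
    {"K01938"},
    {"K01491"},
    {"K00297"},
    {"K15023"},
    {"K14138", "K00197", "K00194"},
]

# Hypothetical KO annotations for one genome (K00844 belongs to another module).
toy_genome_kos = {"K00198", "K15022", "K01938", "K01491", "K00297", "K00844"}

completeness = module_completeness(wood_ljungdahl, toy_genome_kos)
print(round(completeness, 1))
```

Here five of the seven steps are covered, so the toy genome scores about 71% for this module.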

Figure 3: Phylogeny of nearly 300 new NiFe Deltaproteobacteria hydrogenases. NiFe hydrogenases are compared to characterized hydrogenase groups. This phylogeny was generated using IQ-TREE (v.1.6.11) with the ultrafast bootstrapping option -bb 1000. Hydrogenase sequences identified in this study are highlighted in red and indicated with an asterisk (*) in the hydrogenase group name.

[Figure 4 tree labels: Desulfobacterales; thermophilic Deltaproteobacteria; uncultured Deltaproteobacteria; Desulfarculaceae; Desulfatiglans; Desulfobacca; Syntrophorhabdus; reverse-type dsr; Desulfobacteraceae; Desulfovibrio. Tree scale: 0.1]

Figure 4: Dissimilatory sulfite reductase (DsrA) tree. This tree was inferred using IQ-TREE (v.1.6.11) with the ultrafast bootstrapping option -bb 1000. Dsr protein sequences identified from genomes in this study are highlighted in red.

Appendix

Figure 5: 16S rRNA maximum likelihood tree of 55 MAGs from this analysis and 94 references obtained from the ARB database.

Figure 6: Principal component analysis of MEBS entropy scores for 1,778 Deltaproteobacteria genomes (402 genomes from this study, and 1,376 reference genomes). Different clusters represent groups with similar entropy scores considering sulfur, carbon, oxygen, iron, and nitrogen cycling.

Figure 7: Dissimilatory sulfite reductase (DsrB) tree. This tree was inferred using IQ-TREE v.1.6.11. Dsr sequences identified from genomes in this study are highlighted in red. Grey circles signify bootstrap support ≥70.

Figure 8: Phylogeny of FeFe hydrogenases compared to characterized hydrogenase groups. Phylogenies were generated using IQ-TREE (v.1.6.11) with the ultrafast bootstrapping option -bb 1000. Hydrogenase sequences identified in this study are highlighted in red and indicated with an asterisk (*) in the hydrogenase group name in groups C3 and A3. Grey circles signify bootstrap support ≥70.

Figure 9: Low-dimension projection and clustering of the entropy scores for 1,778 Deltaproteobacteria genomes (402 MAGs described in this analysis and 1,376 references). A combination of three projection approaches (tsne, pca and isomap) was used to visualize the four clustering methods (ward, kmeans, meanshift, spectral) by using the option -all with default parameters. This option was used to visualize the general behavior of the clustering methods. F_MEBS_cluster.py (a script within the MEBS framework) was used to generate these figures.

Figure 10: Principal component projection of a k-means clustering analysis of entropy scores for 1,778 Deltaproteobacteria genomes (402 genomes from this study, and 1,376 reference genomes). Colored clusters represent groups with similar entropy scores considering sulfur, carbon, oxygen, iron, and nitrogen cycling.

Dive_Core   Depth (cm)   Temperature (ºC)   SO4 (mM)   CH4 (mM)   δ13C-CH4
4573_23     0-3          3.8                11.94      4.8        -47.7
4573_23     12-15        12.65              0.09       6.5        -45.5
4573_23     30-33        22.2               0.36       6.4        -45
4573_23     6-9          8                  0.93       4.2        -47
4569_9      0-3          21.3               22.67      0.9        -27.41
4567_28     0-3          3.1                24.66      near 0     -68.70
4567_28     21-24        3.4                23.61      near 0     -70.00
4569_2      21-24        41.5               19.01      2.32       -23.48
4569_2      12-15        28.3               21.40      2.33       -27.62
4571_4      12-15        33.6               19.17      3.30       -34.72
4569_9      9-12         48.5               23.66      1.93       -27.35
4569_4      0-3          3.5                24.25      0.01       -47.60
4569_2      0-3          6.1                21.75      2.50       -38.61
4571_4      0-3          10.4               21.85      2.60       -34.32
4486_19     0-2          25.7               27.39      0.18       -39.45
4486_19     10-12        38                 25.01      0.52       -35.82
4486_19     4-6          28.8               28.09      0.44       -38.31
4486_24     0-2          18.7               28.32      0.14       -26.42
4486_24     10-12        42.1               24.32      1.41       -38.58
4486_24     12-14        46.2               24.99      1.46       -38.39
4486_24     14-16        50.3               N/D        1.19       -38.18
4486_24     2-4          25.2               27.47      0.38       -27.21
4486_24     4-6          29.5               25.55      0.8        -34.09
4486_24     8-10         38                 24.93      1.47       -38.59

Table 1: Geochemical data taken at Guaymas Basin. N/D signifies no data taken.

Table 2: Oxygen profile for Mesquite Bay, Texas (28°N 0.147’, -96°W 0.8455’).

Depth (µm)   Concentration (µmol/l)   Signal (mV)
0            202.802994               164.722137
250          199.976425               162.470551
500          198.618118               161.38855
750          198.470627               161.271057
1000         198.231949               161.080933
1250         199.22879                161.875
1500         200.71257                163.056946
1750         199.387207               162.00119
2000         186.821869               151.991882
2250         184.136856               149.853058
2500         144.147171               117.998047
2750         107.402779               88.7281799
3000         72.0408249               60.5595398
3250         44.1328049               38.3285522
3500         24.6088943               22.7761841
3750         10.8792877               11.839447
4000         1.40983462               4.29626465
4250         -0.5279218               2.75268555
4500         -0.5731285               2.7166748
4750         -0.6309776               2.67059326
5000         -0.6035855               2.69241333
5250         -0.6279128               2.67303467
5500         -0.7813472               2.55081177
5750         -0.750124                2.57568359
6000         -0.7225403               2.59765625

0            202.536163               164.509583
250          202.200562               164.242249
500          200.135605               162.597351
750          201.672256               163.821411
1000         199.503296               162.093658
1250         202.2155                 164.25415
1500         202.186768               164.231262
1750         201.428406               163.627167
2000         170.102875               138.673859
2250         131.816086               108.175354
2500         90.6960449               75.4199219
2750         64.7311401               54.7367859
3000         42.2086487               36.7958069
3250         23.8698807               22.1875
3500         8.33258247               9.81079102
3750         0.95546949               3.93432617
4000         -0.5091496               2.76763916
4250         -0.6685221               2.64068604
4500         -0.6794407               2.63198853
4750         -0.7849867               2.5479126
5000         -0.7725357               2.55783081
5250         -0.8229144               2.5177002
5500         -0.8545207               2.49252319

Table 3: KEGG modules, pathways, and KO numbers used to determine completeness of pathways shown in Figure 2.

Module      Pathway                                KO
M00001+01   Glycolysis (Embden-Meyerhof pathway)   K00844,K12407,K00845,K00886,K08074,K00918
M00001+02   Glycolysis (Embden-Meyerhof pathway)   K01810,K06859,K13810,K15916
M00001+03   Glycolysis (Embden-Meyerhof pathway)   K00850,K16370,K00918
M00001+04   Glycolysis (Embden-Meyerhof pathway)   K01623,K01624,K11645,K16305,K16306
M00001+05   Glycolysis (Embden-Meyerhof pathway)   K01803
M00001+06   Glycolysis (Embden-Meyerhof pathway)   K00134,K00150
M00001+07   Glycolysis (Embden-Meyerhof pathway)   K00927,K11389
M00001+08   Glycolysis (Embden-Meyerhof pathway)   K01834,K15633,K15634,K15635
M00001+09   Glycolysis (Embden-Meyerhof pathway)   K01689
M00001+10   Glycolysis (Embden-Meyerhof pathway)   K00873,K12406
M00003+01   Gluconeogenesis                        K01596,K01610
M00003+02   Gluconeogenesis                        K01689
M00003+03   Gluconeogenesis                        K01834,K15633,K15634,K15635
M00003+04   Gluconeogenesis                        K00927,K00134,K00150
M00003+05   Gluconeogenesis                        K01803
M00003+06   Gluconeogenesis                        K01623,K01624,K11645
M00003+07   Gluconeogenesis                        K03841,K02446,K11532,K01086,K04041,K01622
M00004+01   Pentose phosphate pathway (PPP)        K13937,K00036,K19243
M00004+02   Pentose phosphate pathway (PPP)        K01057,K07404
M00004+03   Pentose phosphate pathway (PPP)        K00033
M00004+0    Pentose phosphate pathway (PPP)        K01783
M00004+05   Pentose phosphate pathway (PPP)        K01807,K01808
M00004+06   Pentose phosphate pathway (PPP)        K00615
M00004+07   Pentose phosphate pathway (PPP)        K00616
M00004+08   Pentose phosphate pathway (PPP)        K01810,K06859,K13810,K15916
M00008+01   Entner-Doudoroff pathway (EDP)         K00036
M00008+02   Entner-Doudoroff pathway (EDP)         K01057,K07404
M00008+03   Entner-Doudoroff pathway (EDP)         K01690
M00008+04   Entner-Doudoroff pathway (EDP)         K01625
M00009+01   TCA cycle                              K01647
M00009+02   TCA cycle                              K01681,K01682
M00009+03   TCA cycle                              K00031,K00030
M00009+04   TCA cycle                              K00164,K00658,K00382,K00174,K00175,K00177,K00176
M00009+05   TCA cycle                              K01902,K01903,K01899,K01900,K18118
M00009+06   TCA cycle                              K00234,K00235,K00236,K00237,K00239,K00240,K00241,K00242,K18859,K18860,K00244,K00245,K00246,K00247
M00009+07   TCA cycle                              K01676,K01679,K01677,K01678
M00009+08   TCA cycle                              K00026,K00025,K00024,K00116
M00173+01   Reductive TCA cycle                    K00169,K00170,K00171,K00172,K03737
M00173+02   Reductive TCA cycle                    K01007,K01006
M00173+03   Reductive TCA cycle                    K01595,K01959,K01960,K01958
M00173+04   Reductive TCA cycle                    K00024
M00173+05   Reductive TCA cycle                    K01676,K01679,K01677,K01678
M00173+06   Reductive TCA cycle                    K00239,K00240,K00241,K00242,K00244,K00245,K00246,K00247,K18556,K18557,K18558,K18559,K18560
M00173+07   Reductive TCA cycle                    K01902,K01903
M00173+08   Reductive TCA cycle                    K00174,K00175,K00177,K00176,
M00173+09   Reductive TCA cycle                    K00031
M00173+10   Reductive TCA cycle                    K01681,K01682
M00173+11   Reductive TCA cycle                    K15230,K15231,K15232,K15233
M00173+12   Reductive TCA cycle                    K15234
M00377+01   Wood-Ljungdahl pathway                 K00198
M00377+02   Wood-Ljungdahl pathway                 K05299,K15022
M00377+03   Wood-Ljungdahl pathway                 K01938
M00377+04   Wood-Ljungdahl pathway                 K01491
M00377+05   Wood-Ljungdahl pathway                 K00297
M00377+06   Wood-Ljungdahl pathway                 K15023
M00377+07   Wood-Ljungdahl pathway                 K14138,K00197,K00194
M00087+01   FA biosynthesis and degradation        K00232,K00249,K00255,K06445,K09479
M00087+02   FA biosynthesis and degradation        K01692,K07511,K13767
M00087+03   FA biosynthesis and degradation        K00022,K07516,K01825,K01782,K07514,K07515,K10527
M00087+04   FA biosynthesis and degradation        K00632,K07508,K07509,K07513
M00741+01   Propanoyl-CoA metabolism               K01965,K01966,K11263,K18472,K19312,K01964,K15036,K15037
M00741+02   Propanoyl-CoA metabolism               K05606
M00741+03   Propanoyl-CoA metabolism               K01847,K01848,K01849
M00537+01   Xylene degradation                     K15757,K15758
M00537+02   Xylene degradation                     K00055
M00537+03   Xylene degradation                     K00141
M00541+01   Benzoyl-CoA degradation                K04112,K04113,K04114,K04115,K19515,K19516
M00541+02   Benzoyl-CoA degradation                K07537
M00541+03   Benzoyl-CoA degradation                K07538
M00541+04   Benzoyl-CoA degradation                K07539
M00419+01   Cymene degradation                     K10616,K18293
M00419+02   Cymene degradation                     K10617
M00419+03   Cymene degradation                     K10618
M00418+01   Toluene degradation                    K07540
M00418+02   Toluene degradation                    K07543,K07544
M00418+03   Toluene degradation                    K07545
M00418+04   Toluene degradation                    K07546
M00418+05   Toluene degradation                    K07547,K07548
M00418+06   Toluene degradation                    K07549,K07550
M00569+01   Catechol meta-cleavage                 K00446,K07104
M00569+02   Catechol meta-cleavage                 K10217
M00569+03   Catechol meta-cleavage                 K01821
M00569+04   Catechol meta-cleavage                 K01617,K10216
M00569+05   Catechol meta-cleavage                 K18364,K02554
M00569+06   Catechol meta-cleavage                 K18365,K01666
M00569+07   Catechol meta-cleavage                 K18366,K04073
M00155+01   Cytochrome c oxidase                   K02275,K02274,K02276,K15408,K02277
M00156+01   Cytochrome c oxidase, cbb3-type        K00404,K00405,K15862,K00407,K00406
M00153+01   Cytochrome d ubiquinol oxidase         K00425,K00426
M00529+01   Denitrification                        K00370,K00371,K00374,K02567,K02568
M00529+02   Denitrification                        K00368,K15864
M00529+03   Denitrification                        K04561,K02305,K15877
M00529+04   Denitrification                        K00376
M00530+01   Dissimilatory nitrate reduction        K00370,K00371,K00374,K02567,K02568
M00530+02   Dissimilatory nitrate reduction        K00362,K00363,K03385,K15876
M00596+01   Dissimilatory sulfate reduction        K00956,K00957,K00958
M00596+02   Dissimilatory sulfate reduction        K00394,K00395
M00596+03   Dissimilatory sulfate reduction        K11180,K11181
M00595+01   SOX complex                            K17222,K17223,K17224,K17225,K08738,K17226,K17227
M00240+01   Iron complex transport system          K02016,K02015,K02013

References

Adam, P. S., Borrel, G., & Gribaldo, S. (2019). An archaeal origin of the Wood–Ljungdahl H4MPT branch and the emergence of bacterial methylotrophy. Nature Microbiology. https://doi.org/10.1038/s41564-019-0534-2
Alneberg, J., Bjarnason, B. S., Bruijn, I. De, Schirmer, M., Quick, J., Ijaz, U. Z., … Quince, C. (2014). Binning metagenomic contigs by coverage and composition. Nature Methods, 11(11). https://doi.org/10.1038/nmeth.3103
Anantharaman, K., Breier, J. A., & Dick, G. J. (2016). Metagenomic resolution of microbial functions in deep-sea hydrothermal plumes across the Eastern Lau Spreading Center. The ISME Journal, 10, 225–239. https://doi.org/10.1038/ismej.2015.81
Baena, S., Perdomo, N., Carvajal, C., Diaz, C., & Patel, B. K. C. (2011). Desulfosoma caldarium gen. nov., sp. nov., a thermophilic sulfate-reducing bacterium from a terrestrial hot spring. International Journal of Systematic and Evolutionary Microbiology, 732–736. https://doi.org/10.1099/ijs.0.020586-0
Baker, B. J., Lazar, C. S., Teske, A. P., & Dick, G. J. (2015). Genomic resolution of linkages in carbon, nitrogen, and sulfur cycling among widespread estuary sediment bacteria. Microbiome, 1–12. https://doi.org/10.1186/s40168-015-0077-6
Barber, R. D., Rott, M. A., & Donohue, T. J. (1996). Characterization of a Glutathione-Dependent Formaldehyde Dehydrogenase from Rhodobacter sphaeroides, 178(5), 1386–1393.
Barnum, T. P., Figueroa, I. A., Carlström, C. I., Lucas, L. N., Engelbrektson, A. L., & Coates, J. D. (2018). Genome-resolved metagenomics identifies genetic mobility, metabolic interactions, and unexpected diversity in perchlorate-reducing communities. The ISME Journal, 1568–1581. https://doi.org/10.1038/s41396-018-0081-5
Bausum, H. T., & Matney, T. S. (1965). Boundary Between Bacterial Mesophilism and Thermophilism. Journal of Bacteriology, 90(1), 50–53. Retrieved from http://www.ncbi.nlm.nih.gov/pubmed/16562042%0Ahttp://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=PMC315593
Beeder, J., Torsvik, T., & Lien, T. (1995). Thermodesulforhabdus norvegicus gen. nov., sp. nov., a novel thermophilic sulfate-reducing bacterium from oil field water. Archives of Microbiology, 164(5), 331–336. https://doi.org/10.1007/BF02529979
Beliaev, A. S., Saffarini, D. A., Mclaughlin, J. L., & Hunnicutt, D. (2001). MtrC, an outer membrane decahaem c cytochrome required for metal reduction in Shewanella putrefaciens. Molecular Microbiology, 39, 722–730.
Benson, D. A., Cavanaugh, M., Clark, K., Karsch-Mizrachi, I., Lipman, D. J., Ostell, J., & Sayers, E. W. (2013). GenBank. Nucleic Acids Research, 41(D1), 36–42. https://doi.org/10.1093/nar/gks1195
Bond, D. R., Holmes, D. E., Tender, L. M., & Lovley, D. R. (2002). Electrode-Reducing Microorganisms That Harvest Energy from Marine Sediments, 295(January), 483–486.
Buchfink, B., Xie, C., & Huson, D. H. (2015). Fast and sensitive protein alignment using DIAMOND. Nature Methods, 12(1). https://doi.org/10.1038/nmeth.3176
Burnham, J. C., Collart, S. A., & Highison, B. W. (1981). Entrapment and Lysis of the Cyanobacterium Phormidium luridum by Aqueous Colonies of Myxococcus xanthus PCO2. Archives of Microbiology, 285–294.
Capella-Gutiérrez, S., Silla-Martínez, J. M., & Gabaldón, T. (2009). trimAl: a tool for automated alignment trimming in large-scale phylogenetic analyses. Bioinformatics, 25(15), 1972–1973. https://doi.org/10.1093/bioinformatics/btp348
Chistoserdov, A. Y., Chistoserdova, L. V, Mcintire, W. S., & Lidstromi, M. E. (1994). Genetic Organization of the mau Gene Cluster in Methylobacterium extorquens AM1: Complete Nucleotide Sequence and Generation and Characteristics of mau Mutants. Journal of Bacteriology, 176(13), 4052–4065.
Criscuolo, A., & Gribaldo, S. (2010). BMGE (Block Mapping and Gathering with Entropy): a new software for selection of phylogenetic informative regions from multiple sequence alignments. BMC Evolutionary Biology.
Darling, A. E., Jospin, G., Lowe, E., Matsen, F. A., Bik, H. M., & Eisen, J. A. (2014). PhyloSift: phylogenetic analysis of genomes and metagenomes. PeerJ, 2, e243. https://doi.org/10.7717/peerj.243
Das, A., Fu, Z., Tempel, W., Liu, Z., Change, J., Chen, L., … Wang, B. (2007). Characterization of a Corrinoid Protein Involved in the C1 Metabolism of Strict Anaerobic Bacterium Moorella thermoacetica. Wiley InterScience, 70(October 2006), 659–664. https://doi.org/10.1002/prot
De Anda, V., Zapata-Peñasco, I., Poot-Hernandez, A. C., Eguiarte, L. E., Contreras-Moreira, B., & Souza, V. (2017). MEBS, a software platform to evaluate large (meta)genomic collections according to their metabolic machinery: Unraveling the sulfur cycle. GigaScience, 6(11), 1–17. https://doi.org/10.1093/gigascience/gix096
DeLong, Edward, Lory, S., Stackebrandt, E., Thompson, F. (2014). The Prokaryotes: Deltaproteobacteria and Epsilonproteobacteria. (E. Rosenberg, Ed.) (4th ed.). Springer.
Dick, G. J., Andersson, A. F., Baker, B. J., Simmons, S. L., Thomas, B. C., Yelton, A. P., & Banfield, J. F. (2009). Community-wide analysis of microbial genome sequence signatures. Genome Biology, 10(8). https://doi.org/10.1186/gb-2009-10-8-r85
Dombrowski, N., Teske, A. P., & Baker, B. J. (2018). Expansive microbial metabolic versatility and hydrothermal sediments. Nature Communications, 1–13. https://doi.org/10.1038/s41467-018-07418-0
Eren, A. M., Esen, O. C., Quince, C., Vineis, J. H., Morrison, H. G., Sogin, M. L., & Delmont, T. O. (2015). Anvi’o: An advanced analysis and visualization platform for ’omics data. PeerJ, 2015(10), 1–29. https://doi.org/10.7717/peerj.1319
Ferry, J. G. (1990). Formate dehydrogenase. FEMS Microbiology Letters, 87(3–4), 377–382. https://doi.org/10.1016/0378-1097(90)90482-6
Finn, R. D., Clements, J., & Eddy, S. R. (2011). HMMER web server: interactive sequence similarity searching. Nucleic Acids Research, 39(May), 29–37. https://doi.org/10.1093/nar/gkr367
Florentino, A. P., Brienza, C., Stams, A. J. M., & Sánchez-Andrea, I. (2016). Desulfurella amilsii sp. nov., a novel acidotolerant sulfur-respiring bacterium isolated from acidic river sediments. International Journal of Systematic and Evolutionary Microbiology, 66(3), 1249–1253. https://doi.org/10.1099/ijsem.0.000866
Flores, G. E., Hunter, R. C., Liu, Y., Mets, A., Schouten, S., & Reysenbach, A. (2012). Hippea jasoniae sp. nov. and Hippea alviniae sp. nov., thermoacidophilic members of the class Deltaproteobacteria isolated from deep-sea hydrothermal vent deposits. International Journal of Systematic and Evolutionary Microbiology, (62), 1252–1258. https://doi.org/10.1099/ijs.0.033001-0
Furdui, C., & Ragsdale, S. W. (2000). The Role of Pyruvate Ferredoxin Oxidoreductase in Pyruvate Synthesis during Autotrophic Growth by the Wood-Ljungdahl Pathway. The Journal of Biological Chemistry, 275(37), 28494–28499. https://doi.org/10.1074/jbc.M003291200
Gak, E. R., Tsygankov, Y. D., & Chistoserdov, A. Y. (1997). Organization of methylamine utilization genes (mau) in “Methylobacillus flagellatum” KT and analysis of mau mutants. Microbiology, 4025(1997), 1827–1835.
Hamada, M., Toyofuku, M., Miyano, T., & Nomura, N. (2014). cbb3-Type Cytochrome c Oxidases, Aerobic Respiratory Enzymes, Impact the Anaerobic Life of Pseudomonas aeruginosa PAO1. Journal of Bacteriology, 196(22), 3881–3889. https://doi.org/10.1128/JB.01978-14
Harmsen, H. J. M., Kuijk, B. L. M. Van, Plugge, C. M., Akkermans, A. D. L., Vos, W. M. De, & Stams, A. J. M. (1998). Syntrophobacter furnaroxidans sp. nov., a syntrophic propionate-degrading sulfate-reducing bacterium. International Journal of Systematic and Evolutionary Microbiology, 1383–1388.
Heidelberg, J. F., Seshadri, R., Haveman, S. A., Hemme, C. L., Paulsen, I. T., Kolonay, J. F., … Fraser, C. M. (2004). The genome sequence of the anaerobic, sulfate-reducing bacterium Desulfovibrio vulgaris Hildenborough. Nature Biotechnology, 22(5), 554–559. https://doi.org/10.1038/nbt959
Holmes, D. E., Bond, D. R., Neil, R. A. O., Reimers, C. E., Tender, L. R., & Lovley, D. R. (2004). Microbial Communities Associated with Electrodes Harvesting Electricity from a Variety of Aquatic Sediments. Microbial Ecology, 48(1), 178–190. https://doi.org/10.1007/s00248-003-0004-4
Hyatt, D., Chen, G., Locascio, P. F., Land, M. L., Larimer, F. W., & Hauser, L. J. (2010). Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics.
Jochum, L. M., Schreiber, L., Marshall, I. P. G., & Jørgensen, B. B. (2018). Single-Cell Genomics Reveals a Diverse Metabolic Potential of Uncultivated Desulfatiglans-Related Deltaproteobacteria Widely Distributed in Marine Sediment. Frontiers in Microbiology, 9(September), 1–16. https://doi.org/10.3389/fmicb.2018.02038
Johnson, M., Zaretskaya, I., Raytselis, Y., Merezhuk, Y., Mcginnis, S., & Madden, T. L. (2008). NCBI BLAST: a better web interface. Nucleic Acids Research, 36(April), 5–9. https://doi.org/10.1093/nar/gkn201
Joshi, N., & Fass, J. (2011). Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files. Retrieved from https://github.com/najoshi/sickle.
Kang, D. D., Froula, J., Egan, R., & Wang, Z. (2015). MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ, 1–15. https://doi.org/10.7717/peerj.1165
Karpinets, T. V. (2010). CAZymes Analysis Toolkit (CAT): Web service for searching and analyzing carbohydrate-active enzymes in a newly sequenced organism using CAZy database. Glycobiology, 20(12), 1574–1584. https://doi.org/10.1093/glycob/cwq106
Kashefi, K., Holmes, D., Baross, J., & Lovley, D. (2003). Thermophily in the Geobacteraceae: Geothermobacter ehrlichii gen. nov., sp. nov., a Novel Thermophilic Member of the Geobacteraceae from the “Bag City” Hydrothermal Vent. Applied and Environmental Microbiology, 69(5), 2985–2993. https://doi.org/10.1128/AEM.69.5.2985
Kato, S., & Yamagishi, A. (2016). A novel large filamentous deltaproteobacterium on hydrothermally inactive sulfide chimneys of the Southern Mariana Trough. Deep-Sea Research Part I, 110, 99–105. https://doi.org/10.1016/j.dsr.2015.12.015
Katoh, K., & Toh, H. (2008). Recent developments in the MAFFT multiple sequence alignment program. Briefings in Bioinformatics, 9(4), 81–92. https://doi.org/10.1093/bib/bbn013
Kearse, M., Moir, R., Wilson, A., Stones-havas, S., Sturrock, S., Buxton, S., … Drummond, A. (2012). Geneious Basic: An integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics, 28(12), 1647–1649. https://doi.org/10.1093/bioinformatics/bts199
Kirsch, J., Méjean, V., & Verméglio, A. (2002). Involvement of a putative molybdenum enzyme in the reduction of selenate by Escherichia coli, 3865–3872.
Kroll, J. J., Roof, M. B., Hoffman, L. J., Dickson, J. S., & Hank Harris, D. L. (2005). Proliferative enteropathy: a global enteric disease of pigs caused by Lawsonia intracellularis. Animal Health Research Reviews, 6(2), 173–197. https://doi.org/10.1079/ahr2005109
Leary, N. A. O., Wright, M. W., Brister, J. R., Ciufo, S., Haddad, D., Mcveigh, R., … Pruitt, K. D. (2016). Reference sequence (RefSeq) database at NCBI: current status, taxonomic expansion, and functional annotation. Nucleic Acids Research, 44, 733–745. https://doi.org/10.1093/nar/gkv1189
Li, D., Liu, C., Luo, R., Sadakane, K., & Lam, T. (2015). MEGAHIT: an ultra-fast single-node solution for large and complex metagenomics assembly via succinct de Bruijn graph. Bioinformatics, 31(January), 1674–1676. https://doi.org/10.1093/bioinformatics/btv033
Li, H., Handsaker, B., Wysoker, A., Fennell, T., Ruan, J., Homer, N., … Durbin, R. (2009). The Sequence Alignment/Map format and SAMtools. Bioinformatics, 25(16), 2078–2079. https://doi.org/10.1093/bioinformatics/btp352
Li, H., & Durbin, R. (2009). Fast and accurate short read alignment with Burrows-Wheeler transform. Bioinformatics (Oxford, England), 25(14), 1754–1760. https://doi.org/10.1101/gr.129684.111
Löffler, C., Kuntze, K., Vazquez, J. R., Rugor, A., Kung, J. W., Böttcher, A., & Boll, M. (2011). Occurrence, genes and expression of the W/Se-containing class II benzoyl-coenzyme A reductases in anaerobic bacteria. Environmental Microbiology, 13(3), 696–709. https://doi.org/10.1111/j.1462-2920.2010.02374.x
Lovley, D. R., Ueki, T., Zhang, T., Nikhil, S., Shrestha, P. M., Flanagan, K. A., … Nevin, K. P. (2011). Geobacter: The Microbe Electric’s Physiology, Ecology, and Practical Applications. Advances in Microbial Physiology (1st ed., Vol. 59). Elsevier Ltd. https://doi.org/10.1016/B978-0-12-387661-4.00004-5
Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Buchner, A., … Bode, A. (2004). ARB: a software environment for sequence data. Nucleic Acids Research, 32(4), 1363–1371. https://doi.org/10.1093/nar/gkh293
Macy, J. M., Rech, S., Auling, G., Dorsch, M., Stackebrandt, E., & Sly, L. I. (1993). Subclass of Proteobacteria with a Novel Type of Anaerobic Respiration, 135–142.
Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet.Journal, 17(1), 5–7.
Martin, W., Baross, J., Kelley, D., & Russell, M. J. (2008). Hydrothermal vents and the origin of life. Nature Reviews Microbiology, 6(11), 805–814. https://doi.org/10.1038/nrmicro1991
McInerney, M. J., Rohlin, L., Mouttaki, H., Kim, U., Krupp, R. S., Rios-hernandez, L., … Gunsalus, R. P. (2007). The genome of Syntrophus aciditrophicus: Life at the thermodynamic limit of microbial growth. PNAS, 104(18), 7600–7605.
Moran, N. A. (2002). Microbial Minimalism: Genome Reduction in Bacterial Pathogens. Cell, 108, 583–586.
Moriya, Y., Itoh, M., Okuda, S., Yoshizawa, A. C., & Kanehisa, M. (2007).
KAAS: an automatic genome annotation and pathway reconstruction server. Nucleic Acids Research, 35, 182–185. https://doi.org/10.1093/nar/gkm321 Nguyen, L., Schmidt, H. A., Haeseler, A. Von, & Minh, B. Q. (2014). IQ-TREE: A Fast and Effective Stochastic Algorithm for Estimating Maximum-Likelihood Phylogenies. Molecular Biology and Evolution, 32(1), 268–274. https://doi.org/10.1093/molbev/msu300 Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 1043–1055. https://doi.org/10.1101/gr.186072.114.Freely Peng, Y., Leung, H. C. M., Yiu, S. M., & Chin, F. Y. L. (2012). IDBA-UD: A de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics, 28(11), 1420–1428. https://doi.org/10.1093/bioinformatics/bts174 Rees, G. N., Grassia, G. I. N. S., Sheehy, A. J., Dwivedi, P. P., & Patel, B. K. C. (1995). Desulfacinum infernum gen. nov., sp. nov., a Thermophilic Sulfate-Reducing Bacterium from a Petroleum Reservoir. International Journal of Systematic Bacteriology, 45(1), 85–89. 46 Sauer, F. D. (1986). Tetrahydromethanopterin methyltransferase, a component of the methane synthesizing complex of Methanobacterium thermoautotrophicum. Biochemical and Biophysical Research Communications, 136(2), 542–547. https://doi.org/10.1016/0006-291X(86)90474-2 Schnell, S., Bak, F., Pfennig, N., & Konstanz, U. (1989). Anaerobic degradation of aniline and dihydroxybenzenes by newly isolated sulfate-reducing bacteria and description of Desuifobacterium anilini. Archives of Microbiology, 556–563. Schuchmann, K., & Müller, V. (2014). Autotrophy at the thermodynamic limit of life: a model for energy conservation in acetogenic bacteria. Nature Reviews Microbiology, 12(December). https://doi.org/10.1038/nrmicro3365 Seeman, T. (2013). barrnap 0.7: rapid ribosomal RNA prediction. Sheik, C. 
S., Jain, S., & Dick, G. J. (2014). Metabolic flexibility of enigmatic SAR324 revealed through metagenomics and metatranscriptomics. Environmental Microbiology, 16, 304–317. https://doi.org/10.1111/1462-2920.12165 Sieber, C. M. K., Probst, A. J., Sharrar, A., Thomas, B. C., Hess, M., Tringe, S. G., & Banfield, J. F. (2018). Recovery of genomes from metagenomes via a dereplication, aggregation and scoring strategy. Nature Microbiology, 3(July). https://doi.org/10.1038/s41564-018-0171-1 Sievert, S. M., & Kuever, J. (2000). Desulfacinum hydrothermale sp. nov., a thermophilic, sulfate-reducing bacterium from geothermally heated sediments near Milos Island (Greece). International Journal of Systematic and Evolutionary Microbiology, 1239–1246. Slobodkin, A. I., Reysenbach, A., Slobodkina, G. B., Kolganova, T. V, & Kostrikina, N. A. (2013). Dissulfuribacter thermophilus gen. nov., sp. nov., a thermophilic, autotrophic, sulfur-disproportionating, deeply branching deltaproteobacterium from a deep-sea hydrothermal vent. International Journal of Systematic and Evolutionary Microbiology, 1967–1971. https://doi.org/10.1099/ijs.0.046938-0 Søndergaard, D., Pedersen, C. N. S., & Greening, C. (2016). HydDB: A web tool for hydrogenase classification and analysis. Scientific Reports, 6, 1–8. https://doi.org/10.1038/srep34212 Sorokin, D. Y., & Chernyh, N. A. (2016). ‘Candidatus Desulfonatronobulbus propionicus’: a first haloalkaliphilic member of the order Syntrophobacterales from soda lakes. Extremophiles, 20(6), 895–901. https://doi.org/10.1007/s00792-016- 0881-3 Stamatakis, A. (2006). RAxML-VI-HPC: Maximum likelihood-based phylogenetic analyses with thousands of taxa and mixed models. Bioinformatics, 22(21), 2688– 2690. https://doi.org/10.1093/bioinformatics/btl446 Swan, B. K., Martinez-Garcia, M., Preston, C. M., Woyke, T., Lamy, D., Reinthaler, T., … Gerhard, J. (2011). Potential for Chemolithoautotrophy Among Ubiquitous Bacteria Lineages in the Dark Ocean. 
Science, 333(6047), 1296–1300. Teske, A., Callaghan, A. V., & LaRowe, D. E. (2014). Biosphere frontiers of subsurface life in the sedimented hydrothermal system of Guaymas Basin. Frontiers in Microbiology, 5(JULY), 1–11. https://doi.org/10.3389/fmicb.2014.00362 47 Thorup, C., & Schramm, A. (2017). Disguised as a Sulfate Reducer: Growth of the Deltaproteobacterium Desulfurivibrio alkaliphilus by Sulfide Oxidation with Nitrate. MBio, 8(4), 1–9. Wasmund, K., Mußmann, M., & Loy, A. (2017). The life sulfuric: microbial ecology of sulfur cycling in marine sediments. Environmental Microbiology Reports, 9, 323– 344. https://doi.org/10.1111/1758-2229.12538 Wu, Y., Tang, Y., Tringe, S. G., Simmons, B. A., & Singer, S. W. (2014). MaxBin: an automated binning method to recover individual genomes from metagenomes using an expectation-maximization algorithm. Microbiome, 1–18. Zehr, J. P., & Ward, B. B. (2002). Nitrogen Cycling in the Ocean: New Perspectives on Processes and Paradigms. Applied and Environmental Microbiology, 68(3), 1015– 1024. https://doi.org/10.1128/AEM.68.3.1015
