
Uncovering New Players and New Roles in Microbial Anoxic Carbon Transformations

Dissertation

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Lindsey Marie Solden

Graduate Program in Microbiology

The Ohio State University

2018

Dissertation Committee:

Professor Kelly Wrighton, Adviser

Professor Jeffrey Firkins

Professor Venkat Gopalan

Professor Daniel Wozniak

Copyrighted by

Lindsey Marie Solden

2018

Abstract

Organic carbon in anoxic ecosystems flows in a cascade from complex plant material to more labile sugars, and ultimately to short-chain fatty acids (SCFA) and gases like carbon dioxide and methane. Microbial communities, groups of microorganisms that interact with one another, facilitate this process. Microbial anaerobic carbon degradation is exemplified in ruminants. These animals harness energy from plant material using the power of interacting microorganisms, which break down plant carbon into SCFA under largely anoxic conditions in the rumen. Because microbial SCFA can provide up to 80% of the animal’s energy, understanding microbial carbon degradation mechanisms in the rumen is important for many agricultural industries, including the production of meat, milk, leather, and wool. Beyond domesticated ruminants, there are over 75 million wild ruminants that are fundamental members of ecosystems from Alaska to Australia. Furthermore, the microbial enzymes that break down plant material in the rumen have industrial applications for modifying enzymatic cocktails in biofuel production.

The research presented here uses cultivation-independent and laboratory approaches to assign carbon degradation capabilities to specific members of the microbial community in the moose rumen. Moose, animals that naturally forage on woody biomass, were selected to provide access to natural rumen microbial communities that are especially adapted to a high lignocellulose diet. We sampled rumen fluid from moose in the spring, summer, and winter, along a seasonal gradient in lignocellulose. Rumen fluid was sampled via the rumen cannula, offering access to the active microbial interactions mediating complex carbon degradation. From these rumen fluid samples, we performed high-throughput shotgun metagenomics and metaproteomics, coupled to multiple methods for metabolite quantification (1H NMR, sequential fiber analyses, and carbohydrate microarray polymer profiling (CoMPP)). We binned hundreds of genomes, resulting in 77 unique (~>80% complete) genomes. A majority of these genomes (71%) belong to novel genera, families, and orders. Five of these genomes belong to an uncultivated family, BS11, and are the first genomic representatives of this family. A newly resolved genus in this family was the most enriched member on a high lignocellulose diet and was found to ferment hemicellulose sugars.

To uncover interactions between these microorganisms and determine their functional role, we mapped metaproteomics data to the unique genome data. This revealed that most of the carbon degradation enzymes were encoded within polysaccharide utilization loci from uncultivated Bacteroidetes genomes. We then characterized the carbon metabolite chemistry, focusing primarily on carbohydrate polymers and sugars, using CoMPP and 1H NMR. Linking our proteomes to metabolomics, we discovered that proteins from only seven Bacteroidetes genomes were processing all plant polymers detected, suggesting that a few generalist microorganisms are responsible for most of the carbon degradation in the rumen. One of these highly active Bacteroidetes genomes contained protospacer linkages to viral genomes, indicating that immunity against viral predation may be required for some organisms to sustain carbon degradation in the rumen.

Finally, winter rumen fluid with elevated condensed tannins was used to enrich for tannin-degrading microorganisms. From these reactors, we isolated a Streptococcus sp. that can degrade Sorghum condensed tannins (CT) in the presence of glucose. Label-free proteomics was performed to evaluate the dynamic proteome when the isolate was grown in the presence or absence of CT. CT led to the enrichment of many proteins annotated as tannase enzymes, transcriptional regulators for phenolic metabolism, putative enzymes involved in phenolic metabolism, stress response proteins, and proteins originating from prophage. Cumulatively, this dissertation research examines the rumen on multiple scales to identify microbial community and viral interactions, microorganism physiology, and the putative enzymes that facilitate carbon flow through the rumen.


Acknowledgments

This body of work would not have been possible without the help, guidance, and support from so many people. First and foremost, I would like to thank my PhD advisor, Dr. Kelly Wrighton. It has been an honor to be one of her first graduate students. Kelly has provided every possible opportunity to learn, grow, and challenge myself. She has taught me both consciously and unconsciously how to navigate being a female academic scientist. Her passion and intense enthusiasm for her research are both motivational and inspiring, and I hope to emulate her in the future. I appreciate all of her time spent teaching me how to design experiments, manage scientific collaborations, and communicate findings in talks and writing. I am indebted to her for her patience, generosity, mentorship, and most importantly friendship. I would not be the scientist or person I am today without her. I truly cannot thank her enough.

I also would like to thank my graduate committee, Dr. Jeffrey Firkins, Dr. Daniel Wozniak, and Dr. Venkat Gopalan, for making themselves available for helpful discussions, critical comments, and for sharing the most precious of all resources, time. I would also like to thank Dr. Barbara Wolfe for providing support, career development opportunities, and mentorship. Finally, I am especially indebted to my family, friends, and lab members who have provided me with kind words, friendship, and love. All of you are essential in my amazing support system, and I wouldn’t be where I am today without you. I am incredibly lucky.


Vita

2009 ...... Mentor High School

2013 ...... B.S. Microbiology, The Ohio State University

2013 to present ...... Graduate Teaching/Research Associate, Department of Microbiology, The Ohio State University

Publications

Naas AE, Solden LM, Norbeck AD, Brewer H, Hagem LH, Heggenes IM, McHardy AC, Mackie RI, Pasa-Tolic L, Arntzen MO, Eijsink VGH, Koropatkin NM, Hess M, Wrighton KC, Pope PB. (2018) “Candidatus Paraporphyromonas polyenzymogenes” encodes multi-modular cellulases linked to the type IX secretion system. Microbiome 6 (44).

Solden LM, Wrighton KC. (2017) Finding life’s missing pieces. Nature Microbiology 2, 1458-1459.

Angle JA, Morin T, Solden LM, Narrowe A, Smith G, Borton MA, Rey-Sanchez C, Daly RA, Mirfenderesgi G, Hoyt DW, Riley W, Miller CS, Bohrer G, Wrighton KC. (2017) Methanogenesis in oxygenated soils is a substantial fraction of wetland methane emissions. Nature Comm 8: 1567.

Solden LM, Hoyt DE, Collins WB, Plank JE, Daly RA, Hildebrand E, Beavers TJ, Wolfe R, Nicora CD, Purvine SO, Carstensen M, Lipton MA, Spalinger DE, Firkins JL, Wolfe BA, Wrighton KC. (2017) New roles in hemicellulose degradation for the uncultivated Bacteroidetes family BS11. The ISME J 11(3): 691.

Borton MA, Sabag-Daigle A, Wu J, Solden LM, O’Banion BS, Daly RA, Wolfe R, Gonzalez JF, Wysocki VH, Ahmer BMM, Wrighton KC. (2017) Chemical and pathogen induced inflammation disrupt the murine intestinal microbiome. Microbiome 5.47.

Faulkner MJ, Wenner BA, Solden LM, Weiss WP (2017). Source of supplemental dietary copper, zinc, and manganese affects fecal microbial relative abundance in lactating dairy cows. J Dairy Science 100 (2): 1037-1044.

Solden LM, Lloyd K, Wrighton KC. (2016). The Bright Side of Microbial Dark Matter: Lessons learned from the uncultivated majority. Current Opinions in Microbiology, 31: 217-226.

Fields of Study

Major Field: Microbiology


Table of Contents

Abstract

Acknowledgments

Vita

List of Tables

List of Figures

Chapter 1: Introduction

1.1 Rumen microbial membership and community genomics

1.2 Rumen viruses are currently undersampled

1.3 The first ecosystems perspective on the rumen microbiome

Chapter 2: New roles in rumen hemicellulosic sugar fermentation for the uncultivated Bacteroidetes family BS11

2.1 Introduction

2.2 Materials and Methods

2.2.1 Experimental design and sample collection

2.2.2 DNA extraction and 16S rRNA gene sequencing

2.2.3 Forage and rumen chemical characterization

2.2.4 Hemicellulose metabolites identified by 1H NMR

2.2.5 Metagenomic assembly, annotation and binning

2.2.6 Phylogenetic analyses

2.2.7 Metaproteomic extraction, spectral analysis, and data acquisition

2.3 Results and Discussion

2.3.1 Dietary increases in woody biomass

2.3.2 Microbial members respond to increases in woody biomass

2.3.3 Members of the BS11 are poorly described, yet prevalent in mammalian gastrointestinal tracts

2.3.4 Multiple near-complete genomes resolve the family BS11 and identify two new candidate genera

2.3.5 BS11 ferments hemicellulosic monomers to produce short-chain fatty acids

2.4 Conclusions

Chapter 3: Decrypting carbon degradation and phage infection networks in the rumen ecosystem

3.1 Introduction

3.2 Materials and Methods

3.2.1 Experimental design, sequencing, assembly, genome reconstruction

3.2.2 Genome dereplication, annotation and proteomics

3.2.3 Phylogenetic analyses

3.2.4 Detecting polysaccharide utilization loci and predicting substrates

3.2.5 Metabolic reconstruction and network analyses

3.2.6 Carbohydrate microarray profiling

3.2.7 Discovery of viral contigs from metagenomes

3.2.8 Taxonomic classification of viral contigs via vContact

3.3 Results and Discussion

3.3.1 MAGs define taxonomic and metabolic classification of the rumen microbial communities

3.3.2 Polysaccharide utilization loci are critical for rumen carbon degradation

3.3.3 Carbon turnover is mediated by many uncultivated taxa

3.3.4 Viral infections are a key modulating factor of rumen microbial ecosystems

3.4 Conclusion

Chapter 4: Unlocking condensed tannin resistance mechanisms in the moose rumen

4.1 Introduction

4.2 Materials and Methods

4.2.1 Experimental design

4.2.2 Preparation of stocks, media, and cell suspension fluid

4.2.3 Preparation of cells

4.2.4 DNA extraction and 16S rRNA gene sequencing

4.2.5 Isolation, genome sequencing, and genome analyses

4.2.6 CT degradation experiment, proteomics and metabolite analyses

4.3 Results and Discussion

4.3.1 Sorghum CT enriches for potential CT-degrading microorganisms

4.3.2 S. gallolyticus genome contains putative phenolic degrading genes

4.3.3 Proteomics reveals differential expression between CT and non-CT treatments

4.4 Conclusions and Outlook

Appendix A: Additional Data

Appendix B: Naming description and metabolic summary of all genomes

Metabolic summary and naming description for novel lineages

Description of novel groups in the Saccharibacteria, Lentisphaerae, Tenericutes, and Proteobacteria phyla

Description of novel groups within the

Description of sampled families within the Bacteroidales

Description of novel genera within known Bacteroidales families

Respiratory capacity recovered in genomes

References


List of Tables

Table 1. Genes used for CT-degradation gene BLAST search.

Table 2. The phylogenetic affiliations of novel genomes reconstructed.


List of Figures

Figure 1. The core rumen microbiome.

Figure 2. Experimental design.

Figure 3. Chemistry of consumed plant material.

Figure 4. Non-metric multidimensional scaling of all collected samples.

Figure 5. Rumen fluid microbial communities are statistically different in winter.

Figure 6. BS11 relevance in other moose.

Figure 7. Different BS11 populations are associated with different rumen fluid size fractions.

Figure 8. BS11 is a novel family in the Bacteroidales.

Figure 9. Genome enabled metabolic potential of BS11 Candidatus Alcium.

Figure 10. Rank abundance curve of top 50 genera sampled in 16S rRNA gene amplicon sequencing.

Figure 11. Phylogeny and genomic sampling of rumen MAGs.

Figure 12. Metabolic reconstruction of all 77 unique near-complete genomes.

Figure 13. Proteomic summary of all 77 unique near-complete genomes.

Figure 14. Novelty and relative abundance of all recovered genomes across all metagenomes.

Figure 15. Proteomics summary of carbon degrading enzymes and mechanisms.

Figure 16. Detection and expression of polysaccharide utilization loci (PUL) across recovered Bacteroidales members.

Figure 17. Measured plant polymers and sugars in winter rumen fluid.

Figure 18. Network analysis of plant carbon degradation.

Figure 19. Summary of viral contigs, host interactions and expression of viral proteins.

Figure 20. Host-viral interaction network.

Figure 21. Illustration of known CT degradation and tolerance mechanisms.

Figure 22. Experimental Design for CT enrichment.

Figure 23. CT resistant microbial taxa.

Figure 24. 16S rRNA gene alignment.

Figure 25. Heatmap of proteins detected in proteomics with significant differences between CT treatments.

Figure 26. Gene orientation of putative phenolic metabolism genes.

Figure 27. Integration of tools to understand rumen microbial communities.

Figure 28. 18S rRNA gene and ITS sequencing results.


Chapter 1: Introduction

1.1 Rumen microbial membership and community genomics

Because of the agricultural value of ruminants, there is a great body of scientific research aimed at understanding the microbial members and mechanisms for carbon degradation in the rumen. To date, there are many cultured bacterial isolates from the rumen (e.g., Prevotella, Ruminococcus, Butyrivibrio, and Fibrobacter) that have clearly defined metabolisms and enzymatic mechanisms for degrading plant biomass [3,4]. For instance, in the laboratory, Prevotella spp. are known to import and degrade starch and hemicellulose polymers using polysaccharide utilization loci (PUL) [5,6], while Ruminococcus spp. are known to degrade cellulose using extracellular protein complexes called cellulosomes [7].

Recently, culture-independent investigations have revealed the prevalence and importance of these systems in situ, but have also highlighted roles for uncultured microorganisms in rumen carbon processing [1,2,8-13]. A large-scale single-gene census from 30 different types of ruminants in over 700 individual animals revealed a core microbiome present in 90% of animals [1]. While this study represented an important first-of-its-kind survey inventorying the microbial composition across ruminants, it relied on poorly articulated sample collection, was limited by a small number of animals at each location, and was reliant on a 16S rRNA database that is not well annotated for uncultivated lineages [14]. To remedy this latter technical concern, I reanalyzed the microbial sequencing data using an updated Silva database [15]. While the reported core microbiome was composed of unclassified members in four taxonomic groups (Figure 1A, colored boxes), my findings uncovered that within those groups, especially the Bacteroidetes, there were highly abundant uncultured families and genera that could be resolved using my analysis approach (Figure 1B). Despite being prevalent in nearly every sampled ruminant species, and also in many mammalian gastrointestinal tracts, these lineages lacked a single sequenced genome or isolated representative in any publicly available database, obscuring their metabolic role in the host gastrointestinal tract. To address this knowledge gap, my thesis uses multi-omic and metabolic-enrichment strategies to uncover the phylogenetic and metabolic novelty residing in the rumen, yielding insights that extend more broadly to other mammalian gastrointestinal systems.

[Figure 1 image. Panel A: Prevotella, Butyrivibrio, Ruminococcus, Unclassified Clostridiales, Unclassified Bacteroidales, Unclassified Lachnospiraceae, Unclassified Ruminococcaceae, and non-core genera; axis: Average Relative Abundance (0.0 to 1.0). Panel B: RC9, RF16, BS11, S24-7, BF311, Unclassified Paraprevotellaceae, and Unclassified Prevotellaceae; axis: Average Relative Abundance (0.00 to 0.12).]

Figure 1. The core rumen microbiome. This figure was created using data reported in the Supplementary materials by Henderson et al. (2015) [1]. (A) Genera previously assigned by Henderson et al. (2015) as part of the core rumen microbiome are represented by boxes and sized to their relative abundance [1]. Colored taxonomic groups represent uncultivated lineages prevalent and abundant across ruminant animals. (B) Reanalysis by Solden et al. (2017) using the Silva database resolved Bacteroidales lineages (families and genera) that were well defined by 16S rRNA but lacked genomic or cultivated representatives [2]. This dissertation research (Chapters 2 and 3) uncovers the metabolic function and in situ proteomic expression of many of these lineages.


Previous metagenomics investigations in the rumen have focused on surveying the carbohydrate degrading enzymes encoded in the rumen with the goal of developing new biofuel enzyme cocktails [8-10,16-23]. In particular, enzymes of interest included cohesins, dockerins, endoglucanases, β-glucosidases, and cellobiohydrolases, which are primarily proteins involved in the formation of lignocellulolytic multi-enzyme complexes. Rarely were these enzymes actually linked to specific organisms or metabolic pathways. Hess et al. in 2011 performed metagenomics in cow rumen and expressed enzymes reconstructed in metagenomics data, showing that 20% of the enzymes tested were active on biofuel relevant substrates [9]. This study also was the first to reconstruct nine near-complete (>75%) genomes from a rumen metagenome, identifying the linkage of enzymes, pathways, and organisms. Some of these genomes were from uncultivated Bacteroidetes members and contained up to 50 glycoside hydrolases per genome [9]. Besides this study, most other rumen metagenomics studies focused on gene inventories and did not bin genomes from their metagenomic data.

Here, our genome-binned approach allows the resolution of organism-specific pathways and provides insights into microbial communities as networks characterized by symbioses, competition, and partitioning of community-essential roles [24]. Stressing the importance of binned-metagenomic approaches, a recent large-scale genomic study binned nearly 8,000 genomes from publicly available metagenomic datasets across a range of ecosystems from rumen to hydrothermal vent systems [25]. This study resulted in 615 near-complete (>75%) metagenome-assembled genomes (MAGs) from the rumen [25]. Unfortunately, this study, like many other rumen metagenomic studies, did not use MAGs to describe the metabolic potential of these new lineages, leaving the role of uncultivated microorganisms in the rumen unresolved.

Even more elusive than the identity and metabolic potential encoded within the rumen microbiome is the microbial cross-feeding between taxa that sustains anaerobic carbon degradation. Culture-based studies have attempted to disentangle microbial interactions by examining co-cultures of known carbon-degrading microorganisms [26-29]. The results of these studies demonstrated that the presence of hemicellulose degraders, such as Prevotella ruminicola and Butyrivibrio fibrisolvens, enhances the ability of other organisms to degrade cellulose [28,29]. In co-cultures of P. ruminicola, Ruminococcus flavefaciens, and Fibrobacter succinogenes, the non-cellulolytic P. ruminicola enhanced cellulose digestion by both R. flavefaciens and F. succinogenes [29]. Another study showed that in co-culture with B. fibrisolvens, F. succinogenes also degrades cellulose more efficiently but switches its metabolism to utilizing both hemicellulose and glucose, potentially explaining the enhanced degradation [28]. These results reveal that the coordinated degradation of carbon by multiple organisms can enhance total carbon degradation. Because these studies used co-cultured isolates from the rumen, interactions between many potentially important uncultivated microorganisms or additional strains were likely missed. Further contributing to this knowledge gap is the fact that few studies have inferred microbial metabolisms from genomes assembled from metagenomes, confining our knowledge of microbial metabolic roles to inferences gleaned from phylogenetically distant laboratory organisms [9,10]. Studying metabolic and viral interactions in microbial communities is a new frontier in microbial ecology and requires a blend of multiple ‘omic and laboratory-based tools to tease apart these interactions.

1.2 Rumen viruses are currently undersampled

Viruses are increasingly recognized as important regulators of microbial populations through cell lysis, nutrient fluxes, and reprogramming host metabolism [30]. Many lytic phages have been isolated from abundant rumen bacterial genera such as Bacteroides, Ruminococcus, Streptococcus, and others [31-35]. Cultivation-independent techniques have highlighted that, like their microbial hosts, rumen viral populations are diverse and many of them have never been sampled before [30,36-39]. One of the first rumen viromes was sampled from rumen fluid of three Holstein dairy cows on the same diet [38]. A majority of the viruses (78%) did not have a significant hit in the RefSeq database, similar to other virome studies at the time [40-42]. Overall, the viral taxonomy was similar between cows, with a dominance of Siphoviridae, followed by Myoviridae and Podoviridae viral families [38]. More recently, Anderson et al. (2017) used metagenomes to sample viromes from five cows on four different diets [30]. While the overall viral populations changed with dietary shifts, there were 14 novel viruses present and abundant in all cows on all diets, suggesting a potential core viral population in the rumen. To date, rumen viral populations are under-sampled. Given the critical role of rumen microorganisms in host health and productivity, it is imperative to better understand the viral processes influencing microbial population dynamics and carbon cycling in the rumen.

One factor that can have dramatic impacts on microbial communities is the viral infection mechanism. Lytic infections produce virion progeny and result in cell death, while lysogenic viruses replicate with the host cells without producing virions [43]. Because rumen viral populations are understudied, the interactions between microorganisms and viruses are also not well known. In the three Holstein dairy cow viromes, Berg-Miller et al. (2012) estimated that 14% of their virome (of the 22% that could be classified) consisted of prophages [38]. This observation could be an artifact of the database used, but it is interesting to consider that lysogenic phages (viruses integrated into a host genome) may dominate in the rumen ecosystem, as has been proposed for the human gut [42,44]. While lysogenic phages are integrated in the host microbial genome, they can confer many benefits on the host cell, such as preventing infection by other viruses with super-exclusion proteins or altering cell physiology by introducing novel functions [43]. These phages can also impact microbial population dynamics by changing gene expression of their host, lysing competing species, and facilitating the transfer of DNA, potentially conferring new phenotypes and generating new strains [45]. In the rumen, these phage dynamics could contribute to more stable communities under either adverse or constantly changing environmental dietary conditions, as seen in soil and marine communities [43]. Lastly, metagenomic evidence indicates that phage-resistance mechanisms such as CRISPR-Cas systems are widely encoded in rumen microbial genomes [38], although it is unclear how these mechanisms operate in the rumen ecosystem. The biological and ecosystem impact of viral taxonomy and infection mechanism is only beginning to be realized [43], and with further sampling, the microbial-viral interactions in the rumen and their impact on the host will soon be uncovered. In Chapter 3 of this dissertation, viral diversity, proteome abundances, and host linkages are uncovered.

1.3 The first ecosystems perspective on the rumen microbiome

This body of research uses a combination of meta-omic and physiological enrichment methods to begin understanding how the rumen functions as an ecosystem.

Here we asked a fundamental question: how is carbon degraded in the rumen? To begin to answer this question, I explored novel microorganisms, novel microbial metabolisms, and microbial metabolic interactions by integrating a variety of tools and techniques. The three primary objectives of my dissertation research were to:

1. Uncover new roles in hemicellulose degradation for the novel BS11 family in the Bacteroidales (Chapter 2).

2. Decipher carbon degradation networks and microbial phage interactions across the rumen microbial ecosystem (Chapter 3).

3. Enrich for novel condensed tannin (CT)-degrading microorganisms and uncover enzymes expressed in the presence of CT relative to non-CT controls (Chapter 4).

Together this work will help deepen knowledge of the microbial interactions and metabolisms underpinning anaerobic carbon degradation in the rumen and other anoxic ecosystems.


Chapter 2: New roles in rumen hemicellulosic sugar fermentation for the uncultivated Bacteroidetes family BS11¹

Ruminants have co-evolved with their gastrointestinal microbial communities that digest plant materials to provide energy for the host. Some arctic and boreal ruminants have already been shown to be vulnerable to dietary shifts caused by changing climate, yet we know little about the metabolic capacity of the ruminant microbiome in these animals. Here, we use meta-omics approaches to sample rumen fluid microbial communities from Alaskan moose foraging along a seasonal lignocellulose gradient. Winter diets with increased hemicellulose and lignin strongly enriched for BS11, a Bacteroidetes family lacking cultivated or genomically sampled representatives. We show that BS11 are cosmopolitan host-associated bacteria prevalent in gastrointestinal tracts of ruminants and other mammals. Metagenomic reconstruction yielded the first four BS11 genomes and helped to phylogenetically resolve two genera within this previously taxonomically undefined family. Genome-enabled metabolic analyses uncovered multiple pathways for fermenting hemicellulose monomeric sugars to short-chain fatty acids (SCFA), metabolites vital for ruminant energy. Active hemicellulosic sugar fermentation and SCFA production were validated by shotgun proteomics and rumen metabolites, illuminating the role BS11 play in carbon transformations within the rumen. Our results also highlight the currently unknown metabolic potential residing in the rumen that may be vital for sustaining host energy in response to a changing vegetative environment.

¹ This chapter was reproduced verbatim from Solden et al. 2017 The ISME J. The text benefitted from the

2.1 Introduction

Ruminants are herbivorous mammals that depend on their microbial symbionts for the degradation of plant biomass, providing the host with energy in the form of short-chain fatty acids (SCFA) [3]. A recent census of microbial membership from over 30 ruminant species from across the globe revealed seven core bacterial groups conserved across 94% of the samples [1]. Notably, three of these seven bacterial taxa (the BS11 gut group, BF311, and an unclassified Bacteroidales family) represent uncultivated families within the phylum Bacteroidetes. Although the Bacteroidetes are inferred to play key roles in rumen carbon transformations, these three prevalent lineages remain elusive due to a lack of genomic sampling and physiological characterization.

Due to recent genomic efforts, we have gained new metabolic insights into ruminant microbes. The Hungate 1000 project was critical in advancing our knowledge of ruminant physiology, sequencing the genomes of hundreds of isolated rumen microorganisms [46]. Recent metagenomics studies have focused on the vast carbon degradation potential that lies within the rumen microbiome from a variety of hosts [8,9]. However, with a few exceptions, most rumen metagenomics projects have not reconstructed genomes from metagenomic information, and thus we have little insight into the uncultivated and prevalent Bacteroidetes families distributed across ruminants.

Given the abundance of unclassified Bacteroidales in browsing ruminants, we selected moose as our model, largely because the diet of these animals naturally spans a gradient of carbon complexity (from highly digestible grasses and leaves in spring to predominantly woody biomass in winter). This diet variability offers an opportunity for uncovering understudied rumen microbial taxa, which may be enriched under specific diet regimes. We used Alaskan moose fitted with rumen cannulae that were naturally foraging along a seasonal lignocellulose gradient, gaining real-time access to the rumen microbiome. Using metagenomics, we uncovered the first four genomes from the BS11 gut group. Metaproteomic data, in tandem with measured rumen NMR metabolites, confirmed that members of the BS11 group have the potential to degrade hemicellulose monomeric sugars in the rumen and express the genes for doing so. These results shed further light on the metabolic versatility of the rumen, and link uncultivated Bacteroidetes members to essential carbon degradation processes.

2.2 Materials and Methods

2.2.1 Experimental design and sample collection

Moose of good health and body condition were released into a 12-acre enclosure for two weeks during the spring (May), summer (August), and winter (January) in 2014-2015 at the University of Alaska’s Matanuska Research Center. Two of these naturally foraging moose were fitted with rumen cannulae, offering a unique, first-of-its-kind opportunity to continuously sample rumen fluid in real time and in response to diet. Based on prior studies, moose were allowed to acclimate to the new diet for seven days prior to our first sample collection. Rumen fluid and corresponding fecal samples were collected at three time points during the second week of the diet (Figure 2). Samples were collected via two copper tubes cemented into the plug of the rumen cannula. Each tube was attached to a Tygon hose placed inside the rumen, either 10 cm long (to sample the dorsal sac) or 30 cm long (to sample the ventral sac). These samples were combined for microbial and chemical analyses. Samples were immediately frozen and stored at -20 °C until shipment from Alaska, and at -80 °C until DNA extraction and corresponding analyses in Ohio.

Figure 2. Experimental design. Two cannulated moose at the University of Alaska Matanuska Research Center were monitored for diet consumption and changes in their rumen microbiome. On the top are pictures of willow trees in this pasture at time of diet trial. Spring forages typically consist of buds, young shoots, and new leaves, whereas winter forages consist primarily of woody twigs without leaves. These differences in plant materials were reflected in the color of rumen fluid (pictured at the bottom), with spring and summer rumen fluid samples green in color, and winter samples brown in color. Colored boxes represent weeks of the diet trial (2 weeks per season, with pelleted diet in grey) and black squares (moose 1) or circles (moose 2) represent sampling dates during the second week of the diet trial. Between diet trials moose consumed a pelleted diet ad libitum consisting of: 13% crude protein, 15% crude fiber, 1.85% crude fat, and less than 1% each of calcium and phosphorus (Alaska Mill & Feed’s Reindeer Pellet).

2.2.2 DNA extraction and 16S rRNA gene sequencing

Genomic DNA was extracted from rumen fluid, feces, and tissue samples (0.5 g each) using a MoBio PowerSoil DNA Isolation Kit following the manufacturer’s protocol, with the additional step of pre-heating PowerBead tubes at 70°C for 10 min before placing them in a Thermomixer at 2,000 rpm for 10 min. DNA was sequenced at the Next Generation Sequencing facility at Argonne National Laboratory on a single lane of Illumina MiSeq using 2 x 251-bp paired-end reads following established HMP protocols47. Data processing was performed using QIIME 1.9.0 unless otherwise noted, with analysis details included in the Supplementary Methods. Our QIIME pipeline and NMDS workflow in R are available on GitHub (https://github.com/lmsolden/Qiime-pipeline, https://github.com/lmsolden/NMDS-with-loadings).

2.2.3 Forage and rumen chemical characterization

Following visual observation of plant species consumed by moose, representative samples were collected from leaves, stems, or whole plants, immediately frozen, and subsequently lyophilized. Sequential detergent fiber analyses (neutral detergent fiber, NDF; acid detergent fiber, ADF; acid detergent lignin, ADL) were performed on all plants following the procedure outlined by Spalinger et al. (2010), with only lignin corrected for ash48. For rumen content chemical analyses, lyophilized digesta samples were chopped and ground by mortar and pestle to obtain a uniform particle size. Subsamples previously dried at 105°C were analyzed for crude protein (CP; N × 6.25) by Kjeldahl determination using a Foss Tecator digestion system. Subsamples previously dried at 55°C were analyzed sequentially for NDF, ADF, and ADL using standard protocols, described in detail in the supplementary methods49. Lignin was determined as the fraction of ash-free ADF insoluble in 72% sulfuric acid. Cellulose and hemicellulose were estimated as the mass loss from ADF to ADL and NDF to ADF, respectively. The entire ADF filtrate from one winter rumen fluid was concentrated to 10 mL for 1H NMR analyses of hemicellulosic monomers. Short-chain fatty acids in filtered (0.2 µm) rumen fluids were detected on a Shimadzu HPLC fitted with an Aminex HPX-87H organic acid column using the manufacturer’s protocol.
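The by-difference estimates described above can be illustrated with a short sketch. This is not the analysis code used in this study: the function names and the example percentages are hypothetical, chosen only to show the arithmetic (hemicellulose = NDF − ADF, cellulose = ADF − ADL, crude protein = N × 6.25), and all inputs are assumed to be expressed as percent of dry matter.

```python
def fiber_fractions(ndf, adf, adl):
    """Estimate fiber fractions (percent of dry matter) by difference.

    ndf, adf, adl: neutral detergent fiber, acid detergent fiber, and
    acid detergent lignin, each as percent of dry matter.
    """
    if not (ndf >= adf >= adl >= 0):
        raise ValueError("expected NDF >= ADF >= ADL >= 0")
    return {
        "hemicellulose": ndf - adf,  # mass lost from NDF to ADF
        "cellulose": adf - adl,      # mass lost from ADF to ADL
        "lignin": adl,               # residue after the final step
    }


def crude_protein(nitrogen_pct):
    """Convert Kjeldahl nitrogen (percent) to crude protein (N x 6.25)."""
    return nitrogen_pct * 6.25


# Hypothetical values for illustration only (not measured data).
example = fiber_fractions(ndf=60.0, adf=45.0, adl=15.0)
example_cp = crude_protein(2.0)
```

Because each fraction is obtained by subtraction, any error in an upstream step carries into the downstream estimates, which is one reason sequential fiber values are usually read as operational estimates rather than exact polymer contents.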

2.2.4 Hemicellulose metabolites identified by 1H NMR

Filtered (0.2 µm) rumen fluid samples from each season were sent to EMSL at the Pacific Northwest National Laboratory for NMR metabolite analysis. Two hundred and seventy µl of the filtered sample (rumen fluid or hemicellulose acid extraction fluid) were mixed with 30 µl of 5 mM 2,2-dimethyl-2-silapentane-5-sulfonate (DSS) in D2O50. NMR data were acquired on a Varian Direct Drive 600 MHz spectrometer operating VNMRJ 4.0 software. On each sample, a one-dimensional 1H nuclear Overhauser effect spectroscopy with presaturation experiment was performed following standard Chenomx data collection guidelines. These 1D 1H spectra were collected with either 512 transients (filtered rumen fluid) or 2048 transients (hemicellulose acid extraction). Collected spectra were analyzed using Chenomx 8.1 software with quantification based on spectral intensities relative to the 0.5 mM DSS internal standard. Two-dimensional spectra were acquired on the rumen fluid samples and aided the 1D 1H assignments.
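The 0.5 mM internal standard concentration follows directly from the mixing volumes given above (30 µl of 5 mM DSS in a 300 µl total volume). The sketch below works through that dilution and shows the general idea of quantifying a metabolite from per-proton signal intensity relative to the standard. It is a simplified illustration under stated assumptions: Chenomx fits library spectra rather than integrating single peaks, the function names are hypothetical, and the peak areas in the example are invented.

```python
def final_concentration(stock_mM, stock_uL, sample_uL):
    """Concentration of a stock solution after mixing with a sample."""
    return stock_mM * stock_uL / (stock_uL + sample_uL)


def dilution_factor(stock_uL, sample_uL):
    """Factor by which the sample itself was diluted by adding the stock."""
    return (stock_uL + sample_uL) / sample_uL


def analyte_mM(analyte_area, analyte_protons, std_area, std_protons, std_mM):
    """Concentration from per-proton peak area relative to a standard."""
    per_proton_analyte = analyte_area / analyte_protons
    per_proton_std = std_area / std_protons
    return per_proton_analyte / per_proton_std * std_mM


# Volumes and stock concentration as stated in the text.
dss_mM = final_concentration(stock_mM=5.0, stock_uL=30.0, sample_uL=270.0)
factor = dilution_factor(stock_uL=30.0, sample_uL=270.0)

# Invented peak areas; the DSS trimethylsilyl singlet arises from 9 protons.
in_tube_mM = analyte_mM(analyte_area=2.0, analyte_protons=1,
                        std_area=9.0, std_protons=9, std_mM=dss_mM)
original_mM = in_tube_mM * factor  # back-corrected to the undiluted sample
```

Whether reported concentrations should be back-corrected for the 300/270 dilution depends on how the quantification software was configured, which is not specified here.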

2.2.5 Metagenomic assembly, annotation and binning

One winter rumen fluid sample was separated into a pellet of plant material (gentle centrifugation for 5 min at 3,000 x g) and the supernatant was sequentially filtered through a 0.8 µm filter and then onto a 0.2 µm filter. DNA was extracted from four fractions: the pellet (1 g), half of the biomass retained on each of the 0.8 and 0.2 µm filters, and the filtrate that passed through the 0.2 µm filter. DNA was sequenced with Illumina Hi-Seq 2500 at The Ohio State University. 16S rRNA gene sequences were reconstructed from the Illumina trimmed unassembled reads using EMIRGE51. Trimmed reads were assembled de novo to generate genome fragments using IDBA-UD52. Genes were called, annotated, and analyzed as previously described53,54. A combination of phylogenetic signal, coverage, and GC content was used to identify BS11 genomic bins55. Additional assembly and binning methods and validation information are available in the Supplementary Methods. Genomic completion of the BS11 bins was assessed based on the presence of a core gene set that typically occurs only once per genome and is widely conserved among bacteria and archaea56. For sequence-based comparison, average amino acid identity (AAI) and average nucleotide identity (ANI) values were calculated using the AAI and ANI calculators from the Kostas lab.
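The idea behind estimating completion from single-copy core genes can be sketched in a few lines. This is an illustrative simplification, not the procedure or gene set used here: the marker names in the example are placeholders, and the function name and its redundancy output are assumptions added for illustration.

```python
from collections import Counter


def completeness_and_redundancy(marker_hits, core_markers):
    """Estimate bin completeness from single-copy core genes.

    marker_hits: iterable of marker names detected in a genome bin
                 (a name may appear more than once).
    core_markers: set of marker names expected once per genome.
    Returns (percent of core markers present, percent present in >1 copy).
    """
    if not core_markers:
        raise ValueError("core_markers must not be empty")
    counts = Counter(hit for hit in marker_hits if hit in core_markers)
    present = len(counts)
    duplicated = sum(1 for n in counts.values() if n > 1)
    total = len(core_markers)
    return 100.0 * present / total, 100.0 * duplicated / total


# Placeholder marker set and hits, for illustration only.
core = {"rpL2", "rpL3", "rpS8", "rpS19"}
hits = ["rpL2", "rpL3", "rpL3", "rpS8"]
completeness, redundancy = completeness_and_redundancy(hits, core)
```

A marker detected more than once can indicate that sequence from more than one organism was placed in the same bin, so the second value is a rough check on bin purity.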

2.2.6 Phylogenetic analyses

Existing reference datasets for the 11 ribosomal proteins chosen as single-copy phylogenetic marker genes (RpL2, 3, 4, 6, 14, 15, 16, and 18 and RpS8, 17, and 19) were augmented with sequences mined from sequenced genomes from the order Bacteroidales from the NCBI and JGI IMG databases (August 2015). Each individual protein dataset was aligned using MUSCLE 3.8.31 and then manually curated to remove end gaps57. Alignments were concatenated to form an 11-gene, 63-taxon alignment and then run through ProtPipeliner, a Python script developed in-house for generation of phylogenetic trees2. The pipeline runs as follows: alignments are curated with minimal editing by GBLOCKS58, and model selection is conducted via ProtTest 3.459. A maximum likelihood phylogeny for the concatenated alignment was constructed using RAxML version 8.3.1 under the LG model of evolution with 100 bootstrap replicates60 and visualized in iTOL61. Glycoside hydrolases of selected functional classes (e.g. chitin, hemicellulose, debranching) were identified by a Pfam HMM search. Briefly, the Pfam search was performed and parsed into an output table organized by function per genome. Additionally, we manually identified genes for central carbon metabolism, respiration, motility, and fermentation product generation in all genomes.
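The concatenation of per-protein alignments into a single multi-gene alignment, mentioned above, can be sketched as follows. This is a minimal illustration and not the ProtPipeliner code: the function name is hypothetical, alignments are represented as plain dictionaries mapping taxon name to aligned sequence, and a taxon missing from one protein alignment is assumed to be padded with gap characters.

```python
def concatenate_alignments(alignments, gap="-"):
    """Concatenate per-gene alignments into one alignment per taxon.

    alignments: list of dicts, each mapping taxon name -> aligned sequence.
    Taxa absent from a given gene are filled with gap characters so that
    every taxon ends up with a sequence of the same total length.
    """
    taxa = sorted({taxon for aln in alignments for taxon in aln})
    parts = {taxon: [] for taxon in taxa}
    for aln in alignments:
        lengths = {len(seq) for seq in aln.values()}
        if len(lengths) != 1:
            raise ValueError("sequences within an alignment must share one length")
        width = lengths.pop()
        for taxon in taxa:
            parts[taxon].append(aln.get(taxon, gap * width))
    return {taxon: "".join(chunks) for taxon, chunks in parts.items()}


# Toy alignments for two proteins and two taxa (invented sequences).
rpl2 = {"genome_A": "MK-", "genome_B": "MKT"}
rps8 = {"genome_A": "GG"}
combined = concatenate_alignments([rpl2, rps8])
```

Padding missing genes with gaps lets partial genomes, or a single scaffold carrying only some of the marker genes, be placed in the same tree as complete genomes.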

2.2.7 Metaproteomic extraction, spectral analysis, and data acquisition

The other half of the same filter used for metagenomics was selected for metaproteomic analyses. Filters were sonicated in SDS-lysis buffer using water bath sonication. Proteins in the supernatant were precipitated, and the protein pellets were washed twice with acetone and lightly dried under nitrogen. Filter Aided Sample Preparation (FASP) kits were used for protein digestion according to the manufacturer’s instructions. Resultant peptides were snap frozen in liquid N2, digested again overnight using the FASP kit, and concentrated to ~30 µL using a SpeedVac. Final peptide concentrations were determined using a bicinchoninic acid (BCA) assay. All mass spectrometric data were acquired using a Q-Exactive Pro connected to an ACQUITY UPLC M-Class liquid chromatography system via an in-house column packed using Phenomenex Jupiter 3 µm C18 particles and an in-house built electrospray apparatus. MS/MS spectra were compared to the predicted protein collections using the search tool MSGF+62. Contaminant proteins typically observed in proteomics experiments were also included in the protein collections searched. The searches were performed using +/- 15 ppm parent mass tolerance, parent signal isotope correction, partially tryptic enzymatic cleavage rules, and variable oxidation of methionine. Additionally, a decoy sequence approach63 was employed to assess false discovery rates. Data were collated using an in-house program, imported into a SQL server database, filtered to ~1% FDR (peptide-to-spectrum level), and combined at the protein level to provide unique peptide count (per protein) and observation count (i.e. spectral count) data. Protein identification was typically based on two or more peptides per protein, but we have reported single peptides with low confidence that were manually confirmed by examining the spectra.
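The decoy-based filtering to ~1% FDR described above can be illustrated with a simplified sketch. This is not the in-house program used in this study: it assumes each peptide-spectrum match (PSM) carries a hypothetical score for which higher is better, estimates FDR as the ratio of decoy to target matches above a cutoff, and keeps the largest set of top-ranked matches that satisfies the limit.

```python
def filter_to_fdr(psms, max_fdr=0.01):
    """Return scores of target PSMs retained at the requested FDR.

    psms: list of (score, is_decoy) tuples; higher score = better match.
    FDR at a given rank is estimated as decoys / targets among all PSMs
    ranked at or above it. The deepest rank meeting max_fdr is the cutoff.
    """
    ranked = sorted(psms, key=lambda psm: psm[0], reverse=True)
    targets = 0
    decoys = 0
    cutoff = 0
    for rank, (_score, is_decoy) in enumerate(ranked, start=1):
        if is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets > 0 and decoys / targets <= max_fdr:
            cutoff = rank
    return [score for score, is_decoy in ranked[:cutoff] if not is_decoy]


# Invented scores for illustration only.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
strict = filter_to_fdr(example_psms, max_fdr=0.01)
lenient = filter_to_fdr(example_psms, max_fdr=0.5)
```

In the toy example, the strict limit stops before the first decoy, whereas the lenient limit admits the lower-scoring target because one decoy among three targets still falls within the allowed ratio.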

2.3 Results and Discussion

2.3.1 Dietary increases in woody biomass

We assessed dietary plant chemistry, rumen content chemistry, and rumen microbial membership from moose foraging along a seasonal lignocellulose gradient. Across all time points, the diet was composed primarily of deciduous shrubs (e.g., birch and willow)2. Winter plants were statistically different in chemistry, with 3-fold higher lignin, 40% higher cellulose and hemicellulose, and 63% lower protein concentrations compared to spring forages (Figure 3). Reflecting winter increases in dietary woody biomass, rumen fluid transitioned from green colored fluids in spring to brown fluids in winter (Figure 2). Winter rumen fluid samples also had significantly higher cellulose and lignin concentrations with lower crude protein concentrations than spring rumen fluid samples2.

[Figure 3 plot: bar chart by season (spring, summer, winter) of percent of dry matter (%) for lignin, cellulose, hemicellulose, and protein, and of PPC (mg BSA/mg dry matter) for CT; * and ** mark differences at p<0.05.]

Figure 3. Chemistry of consumed plant material. Data include the average of insoluble fiber determined by sequential fiber analyses on primary plant species consumed across all seasons, with standard deviation shown. The condensed tannin (CT) column represents the average protein precipitating capacity (PPC) for the same primary plant species consumed across all seasons. All values are reported as a percent of dry matter that was remaining at each sequential step. Significant differences (p<0.05, Student’s t-test) between spring and winter are denoted by *, while significant differences between spring and summer are denoted by **.

2.3.2 Microbial members respond to increases in woody biomass

16S rRNA gene analyses showed that microbial communities in rumen fluids and feces changed in response to seasonal plant chemistry (Figure 4). Winter rumen microbial communities were statistically different from spring and summer, and all three treatments could be distinguished from pelleted rations (p=0.001) (Figure 4B). Spring and summer microbial communities could not be statistically differentiated. Increased dietary woody biomass and decreased SCFA concentrations were correlated with changes in microbial community structure, but not with alpha diversity (envfit, p<0.05) (Figure 5A).

[Figure 4 plot: NMDS ordinations (NMDS1 vs. NMDS2) of feces and rumen fluid samples; reported stress values 0.059 and 0.149; labeled points include Spring Day 1, Spring Day 3, Spring Day 5, Spring week 2, and Captive Week 2.]

Figure 4. Non-metric multidimensional scaling (NMDS) of all collected samples. Color indicates season of diet (week 2 samples: spring, light green; summer, dark green; winter, orange; pelleted ration, gray). Week 1 of spring diet (purple) for both feces (A) and rumen fluid (B) was sampled to determine how quickly microbial communities change. Day 1 of spring diet in rumen fluids is more similar to all week 2 diet samples than all other diet groups (mrpp, p=0.002). Day 1 of spring diet in feces is more similar to captive diet samples than spring (mrpp, p=0.001). After 5 days of native spring diet, feces is statistically similar to week 2 spring diet samples and distinct from captive diet fecal samples (mrpp, p=0.002).

We noted that four OTUs demonstrated the largest change in relative abundance on the winter diet compared to spring. Of these four OTUs, which all represented at least 0.8% average relative abundance across the winter samples, the two most dynamic OTUs were members of the ‘uncultivated BS11 gut group’, while the other two OTUs were members of the Prevotella genus. The most enriched OTU, a BS11, increased 800-fold from spring (0.005% abundance) to winter (4.4%) (Figure 5B). This OTU was a low-abundance member in spring rumen fluids (1026 OTU rank) and became the third most abundant OTU detected in winter rumen fluid. Here, we targeted the BS11 for genomic reconstructions, given that this prevalent ruminant bacterial family lacks any physiological or phylogenetic information. Additionally, these uncultivated organisms represent conditionally rare members of the rumen community that responded to the increase in dietary woody biomass, suggesting a unique role in rumen carbon cycling (Figure 5B).

[Figure 5 plot: (A) NMDS ordination (NMDS1 vs. NMDS2) of spring, summer, and winter rumen fluid samples with fitted vectors for butyrate, acetate, propionate, crude protein, hemicellulose, cellulose, and lignin. (B) Relative abundance (%) of OTUs in spring and winter plotted against spring OTU rank, with OTUs (1) to (4) marked.]

Figure 5. Rumen fluid microbial communities are statistically different in winter. (A) Non-metric multidimensional scaling (NMDS) of Bray-Curtis similarity metric shows a statistically significant (mrpp p=0.001) separation of rumen microbial communities from winter to spring in both moose. Vectors are fitted with envfit and the length represents strength of the correlation (solid lines significant at p<0.005, dotted lines p>0.05). (B) Rank abundance of all the OTUs in spring and the corresponding abundance of the same OTUs in the rumen fluid in winter. The top two most enriched OTUs in winter are highlighted with an asterisk, with the third and fourth most dominant OTUs in winter highlighted with a dotted line.

2.3.3 Members of the BS11 are poorly described, yet prevalent in mammalian gastrointestinal tracts

Based on our findings in Alaska, we sampled the entire gastrointestinal tracts (GIT) from deceased moose from different geographic regions to better understand the biogeography of BS11. A majority of these samples were from wild animals opportunistically collected in Minnesota as part of a study identifying factors involved in the 58% decline in moose over the past decade64. We demonstrated that BS11 is present in rumen fluids from Ohio and Minnesota moose, in addition to Alaska.

[Figure 6 plot: bar charts of BS11 relative abundance (%) across sample groups (Minnesota, Columbus Zoo, Alaska, OSU Dairy Farm) and across GIT tissues and sample types, grouped by cause of death (illness or vehicle collision).]

Figure 6. BS11 relevance in other moose. (A) The BS11 family enriched in cannulated moose on winter forage in Alaska (orange) was also enriched in wild and captive moose that died of illness (blue box). This family is detected but not abundant in rumen fluid collected at 4 time points from a dairy cow at OSU Waterman Dairy Farm (scarlet box), 4 healthy Minnesota moose (brown box), or in our healthy Alaska moose consuming a highly digestible diet. (B) BS11 family abundance in MN and Columbus Zoo moose. Abbreviations are as follows: Ab, abomasum; RF, rumen fluid; RT, rumen tissue; Om, omasum; Re, reticulum. In all tissues (abomasum, rumen, colon, omasum, reticulum, small intestine) and samples (feces, rumen fluid), the abundance of BS11 is plotted on the y-axis. Significant differences of BS11 abundances between causes of death were found in rumen fluid and reticulum tissue.


Similar to the results of the seasonal diet experiment, the BS11 were the most dominant members in five wild Minnesota moose and moose from the Columbus Zoo, whose cause of death was inferred to be illness, and all appeared nutritionally deprived (Figure 6). This increase at the family level was driven largely by one OTU that was not similar (<90% identical) to the OTUs enriched on the Alaskan winter diet. The increased relative abundance of the BS11 family in rumen fluids of sick moose could not be attributed to season due to a lack of sampling. It was, however, shown to be independent of host geography, captivity status, diet, age, and sex, and was higher with more complex dietary carbon (Alaska winter) or diminished health status (wild MN and Columbus Zoo).

In contrast, the BS11 family was not enriched in four healthy wild Minnesota moose where vehicle collisions were the cause of death (2.1% ± 1), nor was it prevalent in rumen fluids from a dairy cow (Figure 6A). The abundance of BS11 in our sampled cow rumen (3%) was similar to that in a microbial community analysis of 394 cattle (1.4%)1. In addition to the rumen fluids, we also examined the GIT tissue for BS11 and found that high abundances were confined to reticulum tissue from moose that died of illness; BS11 was not significantly enriched in other tissues from the rumen, omasum, abomasum, small intestine, or colon, or in feces, from healthy or cachectic moose (Figure 6B). The low abundance of BS11 in moose rumen tissue is consistent with a survey from hunter kills65. The increase in BS11 under specific conditions further supports the idea that these bacteria may represent dormant metabolic potential that responds to nutritional, environmental, or health-related stressors in ruminants.

Given that a microbial rumen census reported BS11 as the fifth most abundant taxon sampled in rumen fluid, and prevalent in 94% of 32 different hosts, this family appears highly adept at colonizing rumen fluids1. We mined the dominant BS11 16S rRNA gene in our Alaska samples (Figure 5B) from other ruminant datasets. This OTU was detected in rumen fluid surveys from reindeer, elk, white-tailed deer, and muskoxen, as well as yak, cow, camel, goats and sheep13,66-71. Analogous to our findings, these ruminants consumed a high fiber diet, with the BS11 often associated with the plant adherent fraction of the rumen67,69. BS11 also colonized monogastric hosts and was detected in feces from a wide variety of zoo animals consuming a plant-based diet (e.g. elephants, orangutans, and gorillas)72,73, as well as in human feces74 and periodontal disease pockets (HE681238). Based on our analysis, BS11 is likely host associated15, as we failed to identify 16S rRNA gene sequences originating from a non-animal source; i.e., no sequences were found from other high woody biomass anoxic habitats, such as wetland soils.

BS11 remain enigmatic due to poor taxonomic assignment and a lack of metabolic information. Based on the ARB Silva (v123) database, the BS11 family contains 3,344 unique 16S rRNA gene sequences (resolved at >97% identity) but lacks any finer level of taxonomic resolution (e.g. genera)15. Other 16S rRNA gene databases fail to recognize the BS11, for example commonly assigning sequences as “unclassified members of the phylum Bacteroidetes” (RDP)75. This inconsistent taxonomic assignment across databases may contribute to the misidentification of BS11 in 16S rRNA surveys in hosts. Furthermore, the metabolic roles for the BS11 are unknown. For example, the most enriched BS11 OTU in our dataset shared only 85% 16S rRNA gene identity with the nearest isolated bacterium (a member of Porphyromonadaceae). Despite the lack of genomic sampling in the BS11, the broad host distribution in animals consuming a vegetarian-based diet suggests a role in the degradation of plant-based compounds.

2.3.4 Multiple near-complete genomes resolve the family BS11 and identify two new candidate genera

To define the phylogeny and metabolic role of BS11, we sequenced four different size fractions from a winter rumen fluid sample (a loosely-centrifuged pellet of plant material, biomass contained on a 0.8 µm filter, biomass contained on a 0.2 µm filter, and the 0.2 µm filtrate). Near full-length 16S rRNA genes were reconstructed from metagenomic reads51 and phylogenetic analyses showed that two BS11 genera were dominant in these samples (Figure 7). In our metagenomic dataset, the most dominant BS11 OTU (EU381891) was detected in the plant pellet and the 0.8 µm filter fractions and not on smaller-sized soluble contents, while the second most abundant BS11 OTU (EU470401), constituting a different genus, was recovered from all size fractions greater than 0.2 µm in the rumen fluid (Figure 7). Both dominant OTUs were identical to the two OTUs in our amplicon sequencing that responded to the winter diet treatment. Our metagenomic findings show that the two dominant BS11 genera in moose rumen fluid are associated with different size fractions, suggesting a possible niche partitioning.

[Figure 7 plot: maximum likelihood tree of BS11 16S rRNA gene OTUs (tree scale 0.1, bootstrap values at nodes) alongside a network linking OTUs to the pellet, 0.8 µm, and 0.2 µm size fractions, with a circle-size legend for average abundance (0.3%, 9%).]

Figure 7. Different BS11 populations are associated with different rumen fluid size fractions. Maximum likelihood tree of the reconstructed near full-length BS11 16S rRNA gene OTUs (circles). Branches are named by corresponding Silva accession number, with Prevotella (EU259391) as the outgroup. Network analysis (right) highlights the distribution of the reconstructed 16S rRNA sequences in the different size fractions (diamonds). Circle size represents the average relative abundance across the detected fractions. The gray box indicates sequences corresponding to Ca. Alcium, whereas the white box indicates sequences corresponding to Ca. Hemicellulyticus. The blue and red stars indicate the EMIRGE 16S sequences that match identically over the V4 region to the enriched BS11 OTUs (blue and red stars, respectively).

We reconstructed four BS11 genomes with an estimated completion of 87->99% and an average genome size of 2.9 Mbp2. We also recovered a BS11 scaffold containing 28 ribosomal proteins (0.155 Mbp), and a separate bin that contained BS11 scaffolds that could not be confidently assigned to one of the four BS11 genomes (1.2 Mbp). To evaluate the presence of BS11 in other rumen systems, we searched publicly available assembled ruminant metagenomes (img.jgi.doe.gov) for BS11 ribosomal genes, identifying putative BS11 ribosomal protein S3 genes in sheep, cow, and reindeer rumen fluids2,9,13. However, these genomic fragments and the functional properties also contained on the scaffolds were unbinned and not taxonomically assigned to the BS11 due to a lack of available reference genomes. Until now, the metabolic roles of these BS11 genera were missing pieces of the carbon degradation process in the rumen.

Phylogenetic analyses using concatenated alignments of 11 ribosomal proteins from the 4 near-complete genomes and the one scaffold supported the BS11 as a monophyletic family within the order Bacteroidales (Figure 8A). This topology was also confirmed in single gene analyses. The average amino acid identity across the genomes and 16S rRNA gene linkages revealed that these four genomes could be assigned to the two BS11 genera identified in our reconstructed 16S rRNA phylogeny2. The Candidatus Alcium genus includes the OTU that most increased in abundance in response to a winter diet (red star corresponding to red star in Figure 5B), while the Candidatus Hemicellulyticus genus is the second most enriched OTU relative to the spring diet (blue star, Figure 5B, Figure 7). Following the recent naming protocol for near-complete (>95%) genomes from metagenomics76, we propose the name Candidatus Alcium based on the mammalian source (Alces alces) for this widely distributed OTU. For the second genus, composed of two near-complete genomes (>96%) and one partial genomic fragment, we propose the name Candidatus Hemicellulyticus based on the inferred metabolism described below.


Figure 8. BS11 is a novel family in the Bacteroidales. (A) Concatenated ribosomal protein tree of the Bacteroidales shows that the 4 BS11 genomes and 1 BS11 scaffold are a novel family in the Bacteroidales (gray box). The number of genomes in each family is denoted in parentheses. BS11 sequences are shown in light blue. Numbers on the node represent bootstrap support, using 100 bootstraps. (B) Glycoside hydrolase (GH) genes are reported as the average number of GH genes per genome for each family (x-axis). GHs were clustered into functional categories (denoted by color gradient). The cluster labeled as “other” includes α-amylase, pectinesterase, concanavalin, and polyphenol oxidoreductase laccase.

2.3.5 BS11 ferments hemicellulosic monomers to produce short-chain fatty acids

We infer a fermentation-based lifestyle for the four BS11, as these genomes lack c-type cytochromes, a linked electron transport chain, a complete tricarboxylic acid cycle, and pyruvate dehydrogenase. The metabolic capabilities for one Ca. Hemicellulyticus genome (F08-3) may be more extensive, as this genome had a partial NADH dehydrogenase (subunits D, E, F, C, B, A), succinate dehydrogenase, and a putative b1 oxidase. We note, however, that the NADH dehydrogenase may be an annotation artifact, as the homologs identified had a low identity to known proteins and the remaining required subunits (M, N, L, J, K, H, I, and G) were not detected in the genome. Members of Ca. Hemicellulyticus and Ca. Alcium had bacterioferritin (cytochrome b1 oxidase), but given the lack of other electron transport chain complexes, we suggest this enzyme may play a role in iron assimilation77. All four genomes are inferred to have a gram-negative cell envelope and lacked flagella, but encode an 11-gene gliding motility complex that in other organisms enables attachment and movement across surfaces with low water tension (e.g. biofilm)78,79. This attachment-based motility may be a physiological adaptation to adhere to plant material in rumen fluid.

To put the global carbon-degrading metabolism of BS11 in a phylogenetic context, we profiled the glycoside hydrolases (GH) in genomes from across the Bacteroidales order. Each BS11 genome encodes up to 29 GH, which is fewer than the average for other families (mean 73.3 ± 38.6). The BS11 genomes contained up to 11 chitin-associated genes, the most of any family sampled across the order. While chitin is not commonly associated with a plant-based diet in ruminants, several of these genes (shown in red) confer the degradation of glycoproteins (e.g., human mucin) in other gut bacteria80,81. Thus, it is possible these genes may have an alternative role in BS11 as well82. Like most other members of the Bacteroidales, all members of the BS11 sampled to date encode the capacity for releasing hemicellulosic monomeric sugars.

In contrast to other families in this order, the BS11 lack a single genome representative with the capacity for cellulose degradation. However, similar to the gene prevalence in the Bacteroidetes, both Ca. Hemicellulyticus genomes contained Sus-like polysaccharide utilization loci (PUL). The gene organization was similar to prior reports from rumen metagenomes or Bacteroidetes isolate genomes83,84, with SusC and SusD co-localized. Each Ca. Hemicellulyticus genome contained two scaffolds with this structure, and the Sus genes were surrounded by either peptidases or glycoside hydrolases, suggesting these loci may be involved in coordinating plant biomass degradation.

Given the increase in dietary hemicellulose when BS11 were enriched and the presence of GH for releasing hemicellulose, we used proton NMR to identify the sugar compositions associated with the different size fractions on the winter diet. Unlike cellulose (glucose subunits), hemicelluloses are heterogeneous plant polymers containing multiple monomeric sugar units (e.g., xylose, mannose, galactose, rhamnose, and arabinose) arranged in different proportions85. The polymeric properties of hemicelluloses vary depending on plant species and phenological progression3, with much less known about the structural characteristics of arctic or boreal forages. Our NMR results show that soluble rumen fluids and plant materials contain different combinations of hemicellulosic sugars. Specifically, the soluble fraction contained fucose (37%), galactose (24%), arabinose (19%), xylose (14%) and rhamnose (7%), whereas solid plant material contained xylose (79%), arabinose (12%), fucose (7%), and galactose (2%) (Figure 9A and 9B).


Figure 9. Genome enabled metabolic potential of BS11 Candidatus Alcium. (A) 1H NMR of hemicellulose sugars detected in raw filtered rumen fluid from winter sample. (B) Acid extraction of rumen fluid to specifically release hemicellulose sugars from solid plant material. (C) Genome cartoon represents metabolic predictions for Ca. Alcium. Concentrations of hemicellulose sugars detected with 1H NMR are listed in red. The metabolic pathway of each hemicellulose sugar is represented in colored boxes: orange, mannose; purple, rhamnose; blue, fucose. Genes detected are represented by numbers in colored boxes: white, presence of the enzyme in the genome; red, absence; black, genomic and proteomic support. Abbreviations: M1P: mannose-1-phosphate, M6P: mannose-6-phosphate, GDP-D-M: GDP-D-mannose, F6P: fructose-6-phosphate, G3P: glyceraldehyde-3-phosphate, DHAP: dihydroxyacetone phosphate, Oaa: oxaloacetate, Cit: citrate, Iso: isocitrate, 2-oxo: 2-oxoglutarate, Suc-CoA: succinyl-CoA, Succ: succinate, Fum: fumarate, Mal: malate, Gln: glutamine, Glu: glutamate, Pyr: pyruvate.

Our data suggest that the two BS11 genera are specialized for the fermentation of different hemicellulosic monomers. While all four BS11 genomes are inferred to ferment mannose, only the Ca. Alcium genomes contain genes for releasing and utilizing more accessible hemicellulosic side-chain residues (e.g., rhamnose and fucose)85. Fucose fermentation may be particularly important to energy generation, as the most complete Ca. Alcium genome has four copies of α-L-fucosidase, two fucose permeases, and a fucose isomerase. Consistent with the distribution of Ca. Alcium genomes in all size fractions (i.e. not necessarily associated with plant biomass), the genomes encode the capacity to ferment hemicellulose monomers most abundant in the bulk rumen fluid (fucose and rhamnose) (Figure 9C).

Unlike the broadly distributed Ca. Alcium, the Ca. Hemicellulyticus genomes were recovered primarily from fractions containing solid plant biomass (Figure 7). Metabolite analysis of this plant biomass fraction revealed enrichment of xylose relative to fucose and rhamnose, likely reflecting the xylose backbone in hardwood hemicellulose85. Consistent with the metabolite and abundance data from the plant fractions, only the Ca. Hemicellulyticus genomes encoded genes for xylose fermentation, and these genomes lacked genes for fucose or rhamnose fermentation, sugars in higher abundance in the soluble fractions2. Specifically, these BS11 may utilize the xylose isomerase pathway to convert xylose to xylulose 5-phosphate, which is ultimately degraded by a complete pentose phosphate pathway also present in the most complete Ca. Hemicellulyticus genome. Taking the genomic insights together, the preference for different hemicellulosic monomers by Ca. Alcium (mannose, fucose, rhamnose) and Ca. Hemicellulyticus (mannose, galactose, xylose) likely explains the distribution of these genera in the rumen fractions (Figure 7).

In BS11, we predict hemicellulose sugars are shunted to a complete glycolysis pathway in Ca. Hemicellulyticus, while both Ca. Alcium genomes have a partial EMP pathway lacking enolase, pyruvate kinase, and triose-phosphate isomerase homologs (Figure 9C). This missing functionality in Ca. Alcium may be rescued via the full methylglyoxal detoxification pathway, in a proposed pathway similar to bovine rumen Butyrivibrio86,87. However, it has been shown that methylglyoxal is an inhibitor of sugar metabolism during metabolic imbalance88. This alternative pathway, present in only the Ca. Alcium genomes, would yield only 0.5 ATP/glucose, which could be used under growth limitation conditions when high ATP yields are not required86. The winter diet, when BS11 genera are enriched, may be a growth-limiting condition, as high amounts of lignin are thought to restrict access to other carbohydrates like cellulose and hemicellulose89. Together, the metabolite and genomic data support the degradation of hemicellulosic monomers as a shared metabolic feature of BS11 (Figure 9C). This explains the increase in BS11 during winter when woody plant tissue (and hemicellulose) comprises a much greater proportion of the moose diet. Additionally, organismal distributions on different size fractions and sugar metabolite profiles suggest the two genera may be functionally specialized for specific hemicellulose monomers.

We infer the Rnf complex identified in all four BS11 genomes supports reverse electron flow90, using the sodium gradient to oxidize NADH and generate reduced ferredoxin. These reducing equivalents can then be disposed of by the generation of hydrogen via FeFe hydrogenases91. BS11 genomes also encode the production of SCFA to generate ATP via substrate-level phosphorylation. Ca. Hemicellulyticus has the potential to produce butyrate, propionate, and acetate, while Ca. Alcium can only produce propionate and acetate.

The production of SCFA by microbial symbionts is vital to ruminant nutrition, and supplies as much as 80% of the host energy and gluconeogenic substrates3. Although the BS11 may contribute to SCFA, the lower absolute concentration produced on the winter diet is likely a result of the poorer quality diet. This capacity for metabolic exchange with the host hints at why BS11 may be maintained across a broad host range.

Metaproteomics supported functional genomic interpretations; the key genes in metabolic pathways identified by metagenomics were expressed in the rumen (Figure 9C). In Ca. Alcium, proteins for rhamnose degradation, the methylglyoxal shunt, and acetate production were detected (Figure 9C). In contrast, in Ca. Hemicellulyticus, we detected proteins for butyrate and hydrogen production, but not acetate. Surprisingly, many genes annotated as proteins of unknown function were highly expressed, comprising three of the five most highly detected proteins in Ca. Alcium. This large number of highly expressed but unannotated genes indicates the large portion of metabolic information still to be discovered in relatively well-annotated phyla like the Bacteroidetes.

2.4 Conclusions

In summary, we uncovered important functional qualities for uncultivated, yet ubiquitous host-associated Bacteroidetes. Using >95% complete genomes, we resolved at least two new genera within the BS11 family, here named Candidatus Alcium and Candidatus Hemicellulyticus. Together, our metagenomic, metaproteomic, and metabolomic data indicate BS11 are specialized to ferment many different hemicellulosic monomers (xylose, fucose, mannose, rhamnose), producing acetate and butyrate for the host. In the context of adaptation to changing environmental conditions, we have identified BS11 that become dominant in the rumen when the host consumes a high woody biomass diet. Furthermore, the release and use of hemicellulosic monomeric sugars may help explain broad distribution in other animals, as hemicellulose represents one of the most abundant food sources on this planet, constituting the fiber in foliage, fruits, vegetables, and grains92.

Keystone ruminants such as the North American moose (Alces alces) and caribou (Rangifer tarandus) are experiencing nutritional limitation and severe population declines in response to climate-related changes in the plant chemical landscape93,94. Climate change in Arctic and boreal ecosystems is increasing the abundance of woody deciduous shrubs95 and potentially making the carbon more recalcitrant via the production of secondary metabolites like condensed tannins, which may impact rumen microbial metabolism96,97. Currently, the connection between dietary carbon composition, rumen microbial metabolism, and host nutrition is largely unexplored in wild ruminants from climate-sensitive regions98. The fact that necessary host metabolites are produced under new dietary regimes suggests conditionally rare taxa like BS11, which are generally concealed within the metabolic reservoir of the rumen, may be one of the many important, yet understudied, players enabling wild herbivores to nutritionally adapt to a rapidly changing world.


Chapter 3: Decrypting carbon degradation and phage infection networks in the rumen ecosystem2

Much of the knowledge of microbial carbon degradation in the rumen comes from physiological characterization of cultivated representatives, or from carbon-degrading gene inventories obtained from cultivation-independent metagenomic studies. However, we lack a holistic view of the microbial food web responsible for carbon flow in the rumen. Here, we leveraged fistulated moose to gain real-time access to rumen microbial communities actively degrading woody plant biomass into short chain fatty acids. Characterization of plant-derived compounds using 1H NMR and carbohydrate microarrays quantified the rumen carbon nutrient landscape. To create a database for metaproteomics, we resolved 810 viral contigs and 77 unique microbial metagenome assembled genomes, with 71% of these microbial genomes uncharacterized with regard to metabolic pathways. Network analyses of metabolites and metaproteomic data directly linked expressed proteins from uncultivated microorganisms to measured plant polymers and sugars, revealing a three-tiered trophic system for carbon processing. Several uncultivated Bacteroidetes genotypes operate as metabolic hubs, degrading six or more carbon substrates. Our metaproteomics data also demonstrated that viral proteins are present and can infect microbial hosts, further mediating rumen carbon turnover. Our findings elucidate the microbial and viral members, as well as their metabolic interdependencies underpinning in situ carbon degradation in the rumen ecosystem.

2 The majority of data presented in this chapter was submitted for publication at Nature Microbiology as of submission of this document. The citation is as follows: Solden, L.M., Naas, A.E., Roux, S., Daly, R.A., Collins, W.B., Nicora, C.D., Purvine, S.O., Hoyt, D.W., Schuckel, J., Jorgensen, B., Willats, W., Spalinger, D.E., Firkins, J.L., Lipton, M.S., Sullivan, M.B., Pope, P.B., Wrighton, K.C. Decrypting carbon degradation and phage infection networks in the rumen ecosystem. Submitted. Text benefitted from the writing and editing of all authors.

3.1 Introduction

The rumen has served as a model ecosystem for investigating microorganisms involved in anaerobic carbon degradation99. Many anaerobic microbes have been isolated from the rumen and have clearly defined roles in plant biomass degradation. Some of the first functional metagenomic studies also originated from rumen samples8,9 and have shown that uncultivated microorganisms are ubiquitous across ruminant animals, are abundant, and encode many novel carbon-degrading enzymes1,9. These rumen metagenomic studies expanded our knowledge of the rumen ecosystem and provided evidence that uncultivated microorganisms are involved in anaerobic carbon degradation. While this information has begun to refine our understanding of the rumen, the activity and metabolic interactions between rumen microorganisms and between microorganisms and their chemical environment remain unknown.

To uncover microbial metabolic interactions involved in anaerobic carbon degradation, we sampled rumen fluid from moose consuming a diet high in more recalcitrant woody biomass (e.g., twigs and bark). These moose were rumen fistulated, allowing us to sample rumen microorganisms and their expressed proteins in their native environment during the digestion process. Using genome-resolved metaproteomics combined with multiple methods of carbon metabolite quantification, we provide the first multi-scale analyses of carbon cycling in the rumen. These data were used to infer microbial interactions with available metabolites in situ. Here, we show the complex interplay between rumen microorganisms, their enzyme systems, and available substrate pools to define the trophic structure of the rumen ecosystem.

3.2 Materials and Methods

3.2.1 Experimental design, sequencing, assembly, genome reconstruction

Two fistulated female moose (both aged 12 years) were released into a 12-acre enclosure for two weeks during the spring (May), summer (August) and winter (January) in 2014–2015 at the University of Alaska’s Matanuska Research Center. Rumen microbial community samples were collected during the second week of each dietary regime following the experimental design (Figure 2). The animal sampling protocol (IACUC 505811-4) was approved by the University of Alaska, Fairbanks. For microbial community analyses, rumen fluid samples were first centrifuged at low speed (6,000 x g) to separate plant material (pellet). The supernatant was then sequentially filtered onto a 0.8 µm filter (F08) and a 0.2 µm filter (F02). One half of the pellet and the filters were submitted for sequencing. The remaining half of the filters and pellet were frozen, and sent to Environmental Molecular Sciences Laboratory for metaproteomics. Additionally, the 0.2 µm filtrate was concentrated and submitted for sequencing of small cells and viruses. Illumina sequences from DNA extracted from each of the four samples were assembled individually, and then co-assembled55. Genome fragments were binned using multiple approaches. Individual assemblies and co-assembled metagenomes were binned individually using emergent self-organizing map analysis (ESOM), MetaBAT, and a combination of phylogenetic signature and GC content (manual binning)53,54,100. For ESOM, the primary map structure was established using 2 kb fragments (all fragments >10 kb were subdivided into 2 kb fragments). For MetaBAT binning, we used the very sensitive setting for all scaffolds >2 kb. Manual binning was performed as previously described54. Code and further software descriptions for metagenome assembly are available online at https://github.com/TheWrightonLab/metagenome_assembly.

3.2.2 Genome dereplication, annotation and proteomics

Because we used multiple binning approaches on all four metagenomes, we recovered the same genome multiple times. We first dereplicated our genomes by generating an alignment of scaffolds within one genome individually against scaffolds of all other bins at 98% nucleotide identity or greater. Genomes were then grouped at >50% similarity level, and the best representative was chosen based on a scoring system of single copy genes. The score is equivalent to the number of archaeal or bacterial single copy genes (-2) times the number of multiple single copy genes101. In the case of a tie, the genome with the greatest nucleotide information (length) was chosen. Details of the annotation and proteomic analysis were published previously2 and are briefly summarized here. Genes were predicted using MetaProdigal and compared using USEARCH40 to KEGG, UniRef90, and InterproScan with single and reverse best-hit (RBH) matches greater than 60 bits reported102-106. These predicted proteins formed a database that was searched via MSGF+62 with collected 2D-LC-MS/MS data from the extracted biomass on the filters and plant pellet as described previously2. Briefly, filters were extracted using SDS-lysis buffer, 100 mM Tris/HCl and sonication at 40% amplitude for 20 seconds and repeated. This is consistent with best practices for proteomic extraction methods in the gut107. Data were collated using an in-house program, imported into a SQL server database, filtered to ~1% FDR (peptide to spectrum level) and combined at the protein level to provide unique peptide count (per protein) and observation count (that is, spectral count) data. Protein identification was based on two or more unique peptides per protein.

All scripts used for metagenome annotation are available online https://github.com/TheWrightonLab/metagenome_annotation.
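The representative-selection rule in this section can be sketched in Python. This is only an illustrative sketch and not the code in the repository above. It assumes the score is read as the number of single copy genes minus two times the number of single copy genes found in multiple copies, which is one interpretation of the sentence above, and the class and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Bin:
    name: str
    single_copy_genes: int  # distinct single copy marker genes detected
    multi_copy_genes: int   # marker genes detected more than once
    length_bp: int          # total nucleotide length of the bin


def score(b: Bin) -> int:
    # Assumed reading of the rule: reward completeness, penalize redundancy.
    return b.single_copy_genes - 2 * b.multi_copy_genes


def pick_representative(group: list[Bin]) -> Bin:
    # Highest score wins; ties are broken by the greatest length.
    return max(group, key=lambda b: (score(b), b.length_bp))
```

Under this reading, a bin with more markers but several duplicated markers can lose to a slightly less complete bin with no duplicates.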

3.2.3 Phylogenetic analyses

Phylogenetic analysis was performed using two different markers, the 16S rRNA gene and a syntenic block of 16 universal ribosomal proteins (L2-L6, L14-L16, L18, L22, L24, S3, S8, S10, S17 and S19)101,108. Both methods were used for validation of phylogeny when possible; however, 16S rRNA gene sequences were only recovered in 16 genome bins. Genomes from other rumen metagenomic datasets were recruited for the phylogenetic resolution of novel lineages. Ribosomal proteins were identified in reference genomes using MetaProdigal and the annotation pipeline described above. Individual ribosomal proteins were aligned by phyla using MUSCLE57 with default parameters and then manually curated to remove end gaps. Individual protein alignments were concatenated in Geneious version 7109. Phylogenetic analyses for ribosomal proteins and 16S rRNA genes were inferred using RAxML60, under the PROTGAMMALG method for protein sequences and GTRGAMMA method for 16S rRNA genes108. All trees were rooted to genomes in the phylum. For assigning names to our genome bins, we required at least three genomes, which are monophyletic by concatenated ribosomal protein tree analyses with bootstrap support >70. If present, 16S rRNA gene phylogeny needed to be congruent with concatenated ribosomal protein analyses.

3.2.4 Detecting polysaccharide utilization loci and predicting substrates

These genes are encoded within polysaccharide utilization loci (PUL), which are composed of SusCD-like proteins surrounded by GH, transporters, and carbon recognition proteins used for the import and degradation of complex carbohydrates. To comprehensively profile the PULs from all of our Bacteroidetes genomes, we first performed a general search for co-localized SusCD-like proteins. Since SusC is a TonB-dependent receptor and commonly found in most organisms, we scanned our genome annotation files for PFAM IDs associated with SusD-like proteins (PF07980, PF12741, PF12771, PF14322)110,111. If genes with the PFAM IDs were co-localized with a TonB-receptor or annotated SusC-like protein, we examined the surrounding genes for a CAZyme (identified via dbCAN112) or peptidase within 6 open reading frames in the same gene orientation. This analysis resulted in a total of 198-like proteins based on experimentally verified PULs113-115. Many of these PUL systems have a unique organization of GH or peptidase genes surrounding the SusCD-like proteins, and the PUL could not be confidently assigned to a substrate via putative functional gene annotations.
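The PUL search described above can be illustrated with a short Python sketch. It is not the pipeline used in this study. It assumes genes are supplied in scaffold order, interprets "co-localized" as an immediately adjacent open reading frame, and uses hypothetical field names for annotations that would come from the PFAM and dbCAN searches.

```python
from dataclasses import dataclass, field

# PFAM IDs for SusD-like proteins, as listed in the text.
SUSD_PFAMS = {"PF07980", "PF12741", "PF12771", "PF14322"}


@dataclass
class Gene:
    gene_id: str
    strand: str                           # "+" or "-"
    pfams: set = field(default_factory=set)
    is_susc_like: bool = False            # TonB-dependent receptor or SusC-like
    is_cazyme_or_peptidase: bool = False  # hypothetical flag from annotation


def find_candidate_puls(genes, window=6):
    """Return indices of SusD-like genes that anchor a candidate PUL.

    `genes` must be ordered by position along a single scaffold.
    """
    hits = []
    for i, g in enumerate(genes):
        if not (g.pfams & SUSD_PFAMS):
            continue
        # Assumption: the SusC-like partner is the ORF directly before or after.
        neighbors = genes[max(0, i - 1):i] + genes[i + 1:i + 2]
        if not any(n.is_susc_like for n in neighbors):
            continue
        lo, hi = max(0, i - window), min(len(genes), i + window + 1)
        nearby = [genes[j] for j in range(lo, hi) if j != i]
        # Require a CAZyme or peptidase within the window, in the same orientation.
        if any(n.is_cazyme_or_peptidase and n.strand == g.strand for n in nearby):
            hits.append(i)
    return hits
```

Candidate loci returned by a scan like this would still need manual comparison against experimentally verified PULs before a substrate is assigned.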

All glucan PUL containing a GH16, GH30 as well as a beta-glucosidase (i.e. GH3) were categorized as a putative mixed-linked glucan (MLG) PUL113. We did not characterize any PUL as cellulolytic because this has never been experimentally verified. However, for all PUL that contained an endoglucanase (GH5, GH9), β-glucosidase (GH3), and/or cellobiose phosphorylase (GH94), in absence of GH that target the side-chains of branched glucans, we specifically identified them as β-(1,4)-glucan PULs.

3.2.5 Metabolic reconstruction and network analyses

To construct the metabolic heat map, we scanned our genome annotations for genes involved in specific metabolisms. For polymer degradation, we performed a PFAM scan, as previously described2, and looked for GH enzymes involved in the degradation of specific polymer substrates detected with our comprehensive microarray polymer profiling (CoMPP) analyses, following similar structure to the PUL analyses. For the ability to degrade cellulose, a full cellulosome needed to be detected, or a PUL identical to a PUL experimentally verified to degrade cellulose needed to be present in the bin. For sugar utilization, we required the full pathway of all rare 6C sugar monomers (fucose, mannose, rhamnose), and at least 6 of 9 EMP pathway genes. For 5C sugars, we required the presence of the specific isomerase and epimerase and the full pentose phosphate pathway. For fermentation end products, we looked for all possible pathways, and required 2/3 of the genes to be present for the organism to be capable of producing it. All genes and associated EC numbers used in these analyses can be found in Appendix B.
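The completeness thresholds above can be expressed as a few small Python functions. This is a minimal sketch under stated assumptions: gene identifiers are arbitrary strings, the function names are hypothetical, and the actual gene lists are those in Appendix B rather than anything defined here.

```python
def count_hits(detected, genes):
    """Number of required genes that were detected in a genome."""
    return len(set(detected) & set(genes))


def has_emp(detected, emp_genes, minimum=6):
    # At least 6 of the 9 EMP pathway genes must be present.
    return count_hits(detected, emp_genes) >= minimum


def can_use_rare_hexose(detected, sugar_pathway, emp_genes):
    # Full sugar-specific pathway plus a sufficiently complete EMP pathway.
    full = count_hits(detected, sugar_pathway) == len(set(sugar_pathway))
    return full and has_emp(detected, emp_genes)


def can_make_product(detected, alternative_pathways):
    # Any one alternative pathway with at least 2/3 of its genes is enough.
    for pathway in alternative_pathways:
        genes = set(pathway)
        # Integer comparison avoids floating-point rounding at exactly 2/3.
        if genes and 3 * count_hits(detected, genes) >= 2 * len(genes):
            return True
    return False
```

Treating each end product as a set of alternative pathways mirrors the statement that all possible pathways were considered.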

Beyond genomic potential, we wanted to evaluate which of these metabolic traits were active. For polymer degradation, at least one gene with > 1 unique peptide in the polymer degradation mechanism was required (e.g., for cellulose, one of the cellulosome modules needed to be detected in metaproteomics; for PULs, one of the co-localized genes needed to be detected). Because all sugars eventually are fermented via the EMP pathway, we looked for the expression of specific sugar isomerases, or kinases (rhamnose isomerase, galactokinase, mannose-6-phosphate isomerase, xylose isomerase, arabinose isomerase, glucose-6-phosphate isomerase, fucose isomerase) to determine substrate. We also required the detection of one downstream EMP pathway gene. Glucose-6-phosphate isomerase was utilized for determining whether or not an organism was expressing glucose metabolism genes. However, because galactose enters the EMP pathway at this step, we also required peptides for glucokinase or a glucose specific transporter to determine if it was also using glucose. If we detected proteins for the degradation of a plant compound or a sugar isomerase, we created a connection between genome nodes and substrate nodes. The network was visualized in Cytoscape 3.4.0. The number of edges connected to a node determined connectivity.
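The genome-substrate network described above is bipartite, and connectivity is simply node degree. The following Python sketch shows one way to build it; the function name, the node labels, and the input format (one pair per proteomics-supported link) are assumptions for illustration only.

```python
from collections import defaultdict


def build_network(detections):
    """Build a bipartite genome-substrate network.

    `detections` is an iterable of (genome, substrate) pairs, one for each
    link supported by detected proteins. Duplicate pairs collapse to one edge.
    Returns the edge set and a dict mapping each node to its degree.
    """
    edges = set(detections)
    degree = defaultdict(int)
    for genome, substrate in edges:
        degree[("genome", genome)] += 1
        degree[("substrate", substrate)] += 1
    return edges, dict(degree)
```

Genome nodes with high degree in such a network correspond to organisms linked to many substrates.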

3.2.6 Carbohydrate microarray profiling

Cell-wall glycans were sequentially extracted from whole rumen content from winter rumen fluid samples (n=6, 10 mg each) using 50 mM diamino-cyclo-hexane-tetra-acetic acid (CDTA), pH 7.5 and 4 M NaOH with 1% v/v NaBH4. These solvents (CDTA and NaOH with NaBH4) are known to predominantly release pectins and non-cellulosic polysaccharides, respectively116. For each extraction, 300 µl of solvent was added to 10 mg of rumen samples and then incubated at room temperature with shaking for 2 h. Samples were then centrifuged at 2,700 x g for 10 min to remove cell debris. Retained supernatants were diluted sequentially (1/2, 1/5, 1/5, 1/5) in microarray printing buffer (55.2% glycerol, 44% water, 0.8% Triton X-100) and the four dilutions printed in quadruplicate onto nitrocellulose membranes using a non-contact microarray robot (Arrayjet, Roslin, UK). Every replicate was therefore represented by a 16-spot sub-array (four concentrations and four printing replicates). Arrays were probed with monoclonal antibodies (mAbs) or carbohydrate binding modules (CBMs)116, scanned, and uploaded into Array-Pro Analyzer 6.3 analysis software. The maximal mean spot signal was set to 100%, and all other values within that data set were adjusted accordingly. A mean spot signal minimum was set as 5%. Results from six rumen fluid samples were averaged to calculate a mean abundance of individual plant polymers.
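The signal normalization described above can be sketched in Python as follows. This is a hedged illustration rather than the Array-Pro Analyzer procedure: the function name is hypothetical, and setting values below the 5% minimum to zero is an assumption about how the cutoff was applied.

```python
def normalize_compp(mean_signals, cutoff=5.0):
    """Rescale mean spot signals so the largest value in the data set is 100.

    `mean_signals` maps a probe or polymer name to its mean spot signal.
    Scaled values below `cutoff` are reported as 0.0 (assumed handling).
    """
    top = max(mean_signals.values())
    if top <= 0:
        raise ValueError("the maximal mean spot signal must be positive")
    scaled = {name: 100.0 * value / top for name, value in mean_signals.items()}
    return {name: (value if value >= cutoff else 0.0) for name, value in scaled.items()}
```

For example, raw signals of 200, 50, and 4 become 100, 25, and 0 after scaling and applying the cutoff.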

3.2.7 Discovery of viral contigs from metagenomes

VirSorter was used to recover viral contigs based on the identification of viral hallmark genes, strings of hypothetical proteins, and other viral signatures, as previously described. Each of the four metagenomic datasets was used in a separate VirSorter run. We only considered VirSorter categories 1 and 2 (and 4 and 5, the provirus equivalents of categories 1 and 2), which are the categories with the highest confidence117. The dataset of detected viral populations was manually curated to a final set of 1,907 viral contigs by ensuring consistency with a viral origin in the PFAM annotation, as described previously118. This viral database was dereplicated by clustering at 95% nucleic acid identity with Cd-Hit v4.6119 to generate a final 810 unique viruses for classification.

3.2.8 Taxonomic classification of viral contigs via vContact

A network-based gene content classification was used to place the 1,053 viruses in the context of known viruses118,119. Briefly, predicted proteins from viral contigs were clustered with predicted proteins from viruses in the NCBI RefSeq database (v75, June 2016)120 based on all-vs-all BLASTp search with an E-value threshold of 10^-4, and protein clusters (PCs) were defined with MCL, as previously described118,119. vContact was then used to calculate a similarity score for each contig-genome or genome-genome pair (https://bitbucket.org/MAVERICLab/vcontact, accessed September 2016)121. The stringency of the similarity score was evaluated through 1,000 randomizations by permuting PCs or singletons (proteins without significant shared similarity to other protein sequences) within pairs of genomes and/or contigs having a significance score ≤1 (a negative control)121. Subsequently, pairs of genomes and/or contigs with a similarity score >1 were clustered into viral clusters (VCs) with the Markov clustering algorithm (MCL) with an inflation of 2, as previously described118. The resulting network was visualized with Cytoscape software (version 3.4.0, http://cytoscape.org/), using an edge-weighted spring embedded model, which places the genomes and/or contigs that share more PCs in closer proximity in the display. Reference sequences from the RefSeq genomes that co-clustered with the 1,053 rumen viruses in this study were used to predict viral taxonomy. A last common ancestry (LCA) approach was applied to all reference sequence containing VCs in which RefSeq genomes were clustered. Taxonomic affiliation of rumen viruses was based on the taxonomy of the RefSeq genomes. If the RefSeq genomes differed in taxonomy, the highest taxonomic level in common for the reference sequences was retained. If VCs exclusively contained rumen viruses from this study, the VC was considered “novel.”
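The taxonomy assignment rule for viral clusters can be sketched as a shared-lineage calculation. The Python sketch below is an assumption-laden illustration and not vContact functionality: lineages are represented as tuples ordered from the broadest to the most specific rank, the shared part of the lineage is taken as the common prefix, and the function name is hypothetical.

```python
def classify_vc(reference_lineages):
    """Assign taxonomy to a viral cluster from its RefSeq members.

    `reference_lineages` holds one lineage tuple per RefSeq genome in the
    cluster, ordered from broadest to most specific rank. Returns None when
    the cluster has no reference genomes (a "novel" cluster), otherwise the
    part of the lineage shared by every reference genome.
    """
    if not reference_lineages:
        return None
    shared = []
    for names in zip(*reference_lineages):
        if len(set(names)) != 1:
            break  # references disagree from this rank downward
        shared.append(names[0])
    return tuple(shared)
```

A cluster whose references agree on order and family but not on genus would therefore be assigned at the family level.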

Predicted viral proteins from unique genomes were searched as described for MAGs106. Viral protein identification was first compared to the microbial metagenomes, and overlapping hits were subtracted. To be more sensitive and able to detect more ‘rare’ viruses, we considered hits for viral proteins based on one or more unique peptides per protein.


3.3 Results and Discussion

3.3.1 MAGs define taxonomic and metabolic classification of the rumen microbial communities

We obtained 53.8 Gbp of Illumina HiSeq sequencing data from size-fractionated rumen fluid samples (plant pellet, 0.8-µm filter, 0.2-µm filter, 0.2-µm filtrate)2. Here, we leveraged unique access to rumen-fistulated and naturally foraging moose rather than sampling hunter-killed animals as with prior moose microbiome studies10,65,98,122 (Figure 10). This real-time sampling captured the in situ rumen conditions, providing metaproteomics and metabolite data to enable recapitulation of active microbial carbon cycling. Metagenomes were assembled separately, co-assembled, and binned using several approaches (ESOM, MetaBAT, phylogenetic signature and GC content) to allow for the greatest recovery of genomes55.

[Figure 10 image: bar chart of genus relative abundance (%) against rank (1 to 50). Labeled genera: Prevotella sp., RC9 sp., uncultivated Ruminococcaceae genera, uncultivated BS11 genera, uncultivated Prevotellaceae genera, Treponema sp., Tenericutes RF9 order, uncultivated Lachnospiraceae genera, Butyrivibrio sp., Ruminococcus sp., uncultivated S24-7 genera, Fibrobacter sp., Selenomonas sp., Methanobrevibacter sp. Legend: asterisks mark genomes recovered and genomes recovered with a 16S rRNA gene.]

Figure 10. Rank abundance curve of top 50 genera sampled in 16S rRNA gene amplicon sequencing. Blue bars denote taxa genomically sampled here and include genus level taxonomic assignment. Genera denoted as “uncultivated” could not be resolved beyond the family level and represent novel genera. Asterisks indicate that a sampled genome in that genus had a partial 16S rRNA gene sequence (>300 bp). Inset: Moose rumen fluid sampling was conducted via the cannula highlighted with the white arrow.

We reconstructed a total of 356 metagenome-assembled genomes (MAGs). The same genome was often sampled multiple times (different size fractions or binning methods); thus, we selected the highest scoring genome as the unique genotype101. Because our goal was to create a robust genome database for metaproteomic curation, we focused our analyses on 77 unique bacterial and archaeal MAGs (>75% complete) and 810 viral contigs (>10,000 bp). A majority (71%) of the MAGs belonged to uncultivated lineages that lacked genomically derived metabolic or phylogenetic insights to the genus level. Based on the recently proposed GSC standards123, for every genome we report the recovery of tRNA, rRNA, and assembly quality statistics to conclude that all genomes are at least medium quality, with 11 of the 77 being high quality. These genomes represent the most enriched genera identified in 16S rRNA amplicon analyses (Figure 10), hinting at their functional relevance in this ecosystem.

Genome bins were taxonomically assigned based on phylogenetic congruence of single-copy marker gene, concatenated ribosomal protein, and 16S rRNA gene trees. We also recruited 345 genomes from other metagenomics datasets including the recently published UBA genomes10-12,25 to resolve new classes (2), orders (1), families (4), and genera (9) across six phyla (Bacteroidetes, Firmicutes, Tenericutes, Saccharibacteria, Proteobacteria, Lentisphaerae) (Figure 11). Three of these lineages are composed entirely of genomes recovered from rumen datasets, suggesting a lineage-ecosystem relationship for these organisms (Bacteroidetes families Candidatus Ruminaceae and Candidatus Hungataceae, and Firmicutes genus Candidatus Hungatadium). A detailed taxonomy, metabolic overview, and naming description for each of our newly recovered MAGs is provided (Appendix B). The remaining 29% of our genomes belong to genera with existing cultivated representatives: Prevotella (6), Rhodospirillum (3), Ruminococcus (1), Ruminoclostridium (1), Selenomonas (2), Butyrivibrio (5), Methanobrevibacter (1), Fibrobacter (1), and Treponema (1). Using a combination of genomes from our study, as well as those previously unassigned from other rumen metagenome studies, we provide the first context for 189 genomes belonging to 12 previously undescribed lineages (Figure 11).

[Figure 11 image: circular phylogenetic tree, tree scale 0.5. Phylogenetic novelty legend: class, order, family, genus, known species; asterisk, 16S present. Dataset legend: genome sampled here, Svartstrom et al. 2017, Parks et al. 2017, Stewart et al. 2017, Hess et al. 2011, Hungate 1000 project. Labeled groups: Ruminoclostridium sp., Ca. Algibacter, Ruminococcus sp., Ca. Vansoestibacter, Selenomonas sp., Ca. Hungatadium, Butyrivibrio sp., Methanobrevibacter sp., RF9, Prevotella sp., Ca. Tuttuvakaceae, Ca. Hungataceae, Ca. Nanosyncoccales, Ca. Palmerella, Treponema sp., Ca. Alyeska, uncultivated Aeromonadaceae genus, Ca. Ormerodus, Ca. Denaliaceae, Rhodospirillum sp., Fibrobacter sp., BS11 Ca. Alcium, RC9 sp., Ca. Ruminaceae.]

Figure 11. Phylogeny and genomic sampling of rumen MAGs. Maximum likelihood tree of ribosomal protein rpsC (S3) with reference isolate genomes, genomes from other rumen MAG studies, and genomes recovered here. Branches are marked with lines colored by rumen dataset where genome originated (center legend). Colored circles on the outside of the tree highlight genome phylogenetic novelty of genomes recovered in this study (top left legend). Asterisks indicate genome bins containing a partial 16S rRNA gene sequence (>300 bp). Tree is rooted by Euryarchaeota. Phylum-level groups are outlined in shades of blue and are labelled on the inside circle (Eur, Euryarchaeota; Sa, Saccharibacteria (TM7); Spir, ; Lent, Lentisphaera; Gp, ; Ap, ; Fi, ). Named groups have grey shading behind circles, including genomes belonging to known genera. Note red lines are missing for RC9 genomes because they did not contain rpsC proteins.

Metabolic reconstruction of our 77 unique MAGs revealed that the capacity to use plant polymers and sugars was predominant across phyla, with starch and glucose the most well represented substrates (Figure 12). The capacity for fermenting multiple sugars was a widely encoded trait in the Bacteroidetes, with genomes having the capacity to use an average of 5 different sugars. On the other hand, the Firmicutes, Saccharibacteria (TM7), and Tenericutes are more specialized. Acetate was the most widely encoded fermentation product, while hydrogen and succinate production were mainly constrained to Firmicutes and Bacteroidetes lineages respectively. Comparative genomic analyses reveal broader metabolic capabilities than fermentation alone, with capacity for respiratory metabolism using fumarate, nitrate, nitrite, and trimethylamine-N-oxide as electron acceptors encoded in our genomes (Figure 12). With the exception of respiration, carbon metabolisms (e.g., polymer degradation, sugar utilization, and SCFA production) were all well represented in our metaproteomic data from different organisms (Figure 13). This indicates that the rumen is a functionally redundant ecosystem that may support SCFA production under different dietary regimes.

[Figure 12 image. (A) Tree, scale 0.5, with branches colored by phylum: Bacteroidetes, Proteobacteria, Tenericutes, Fibrobacteres, Spirochaetes, Saccharibacteria (TM7), Lentisphaera, Firmicutes, Euryarchaeota. (B) Heatmap with one column per genome bin (bin names prefixed BACT, FIBRO, LENT, APROT, GPROT, DPROT, SPIRO, FIRM, TENER, TM7, ARCHA). Functional categories: Plant polymer degradation, Sugar Utilization, EMP pathway genes, PPP, End products, Energy generation. Row labels, in order: pectin, starch, chitin, galactan, xyloglucan, xylan, arabinan, mannan, cellulose, glucan, fucose cleavage, arabinose cleavage, rhamnose cleavage, PPO, laccase, rhamnose, mannose, xylose, fructose, galactose, arabinose, fucose, glucose, glucokinase, glucose-6-P isomerase, 6-phosphofructokinase, fructose BP aldolase, triose-P-isomerase, G-3-P dehydrogenase, PG kinase, 2,3-BPG mutase, enolase, pyruvate kinase, PEP carboxykinase, oxidative PPP, non-oxidative PPP, succinate, lactate, acetate, butyrate, propionate, hydrogen, butanol, rnf, ech, complex I, fumarate reductase, nitrate reductase, nitrite reductase, TMAO reductase, cytochrome c oxidase, cytochrome b1, cytochrome d ubiquinol.]

Figure 12. Metabolic reconstruction of all 77 unique near-complete genomes in this study. (A) Maximum likelihood tree of concatenated ribosomal proteins from all genome bins recovered in this study. Branches are colored by phyla. (B) Heatmap shows the presence of genes or pathways (listed on the right) found in each genome bin (bottom). The presence of a gene/pathway is denoted by a box, colored by taxonomic assignment of the bin. Genes/pathways that were not detected in that bin are represented with a black box. For pathway completion, 60% of the pathway needed to be present. Functional category denoted in bold left side. Abbreviations are as follows: PPP, Pentose Phosphate Pathway; EMP, Embden-Meyerhof-Parnas; P, Phosphate; BP, bisphosphate; G-3-P, glyceraldehyde-3-phosphate; PG, phosphoglycerate; BPG, bisphosphoglycerate; PPO, polyphenol oxidase; TMAO, trimethylamine-N-oxide.

Figure 13. Proteomic summary of all 77 unique near-complete genomes in this study. (A) Maximum likelihood tree of concatenated ribosomal proteins from all genome bins recovered in this study. Branches are colored by phyla. (B) Heatmap shows the presence of genes (listed on the right) found in each genome bin (bottom). The presence of a gene is denoted by a box, colored by taxonomic assignment of the bin. Genes that were not detected in that bin are represented with a black box. Six clusters of genes are shown based on shared functional features (functional category denoted in bold left side). Abbreviations are the same as Figure 12.

3.3.2 Polysaccharide utilization loci are critical for rumen carbon degradation

Consistent with prior rumen metagenomic studies8-13, the Bacteroidetes were the most abundant and well-sampled phylum in our dataset (Figure 14). This included genomes from previously established taxonomies, as well as newly resolved families (Ca. Hungataceae, Ca. Denaliaceae, Ca. Ruminaceae) and genera (Prevotellaceae Ca. Alyeska, Muribaculaceae (S24-7) Ca. Ormerodus) (Figure 14). Our metaproteomics data reveal the significant role of Bacteroidetes in rumen carbon transformation: members of this phylum accounted for 91% of the 84 total glycoside hydrolase (GH) proteins detected (Figure 15). Consistent with prior reports from cultivated representatives, these Bacteroidetes GHs are located within polysaccharide utilization loci (PUL) and not cellulosome complexes. We do not think the high representation of PUL in our proteomics data is biased by the abundance of Bacteroidetes relative to Firmicutes, as we detected cellulosome GHs from low-abundance Firmicutes. These data provide the first insight into the contribution of PULs, relative to cellulosomes, to active rumen carbon transformation, and leave open the possibility for their importance in human gastrointestinal tracts as well115.

Figure 14. Novelty and relative abundance of all recovered genomes across all metagenomes. Bars on the left axis indicate phylum (Sa, Saccharibacteria; Lent, Lentisphaera; Tener, Tenericutes; Prot, Proteobacteria; F, Fibrobacteres; E, Euryarchaeota; S, Spirochaetes). Circles represent genome bins and are colored by phylogenetic novelty. Black lines separate family level groupings, while grey lines separate genera within the same family. Names are listed on the right, with proposed names of groups uncovered here in bold. Parentheses denote total number of genomes in this group, with stars indicating the presence of a 16S rRNA gene sequence. Genome circles with a recovered 16S rRNA gene fragment are starred.

Figure 15. Proteomics summary of carbon degrading enzymes and mechanisms. (A) Glycoside hydrolase (GH) genes detected in bins with metaproteomics and associated phylum-level taxonomy. Most of the GH genes detected belong to Bacteroidetes bins. (B) Similarly, analysis of carbon-degrading mechanisms detected in metaproteomics revealed that PUL mechanisms were detected far more often than cellulosome modules.

PUL are defined as gene clusters of SusCD-like proteins surrounded by enzymes that enable the bacterium to recognize, import, and degrade polymeric carbohydrates115,124. Across our 31 Bacteroidetes genomes (Figure 16A), we identified 198 PULs, 35% of which had at least one of the co-localized genes (SusCD-like or GH) detected in proteomics (Figure 16B). More than half of the expressed PUL were assigned to the RC9 and Prevotella genera (Figure 16B), with the former specialized on hemicellulose and pectin and the latter more generalist (Figure 16C). Our global proteomic sampling of active PUL and predicted substrates in the rumen illustrates the functional differentiation of Bacteroidetes members in carbon processing and suggests the existence of specific clade-substrate associations.
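The PUL definition and the expression criterion used here can be illustrated with a simplified sketch. It is not the PUL prediction method used in this study. It assumes gene annotations are available in genomic order, treats an adjacent SusC/SusD pair as the PUL anchor, and collects glycoside hydrolases within a hypothetical window of five genes on either side. The expression rule follows the Figure 16 criterion of at least one PUL gene detected with two or more unique peptides. All function names, labels, and locus identifiers are hypothetical.

```python
from typing import Dict, List


def find_puls(genes: List[str], window: int = 5) -> List[Dict]:
    """Locate candidate PULs in a list of annotations given in genomic order.

    A candidate is an adjacent SusC/SusD pair (either order); glycoside
    hydrolases (labels starting with 'GH') within `window` genes of the
    pair are reported as co-localized enzymes. The window size is an
    illustrative assumption.
    """
    puls = []
    for i in range(len(genes) - 1):
        if {genes[i], genes[i + 1]} == {"SusC", "SusD"}:
            lo = max(0, i - window)
            hi = min(len(genes), i + 2 + window)
            ghs = [g for g in genes[lo:hi] if g.startswith("GH")]
            puls.append({"sus_index": i, "glycoside_hydrolases": ghs})
    return puls


def pul_is_expressed(pul_gene_ids: List[str],
                     unique_peptides: Dict[str, int],
                     min_peptides: int = 2) -> bool:
    """A PUL counts as expressed if any of its genes has enough unique peptides."""
    return any(unique_peptides.get(g, 0) >= min_peptides for g in pul_gene_ids)
```

For example, `find_puls(["other", "GH43", "SusC", "SusD", "GH10", "other"])` reports one candidate locus anchored at index 2 with GH43 and GH10 as neighboring enzymes.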

PUL detected in proteomics were highly coordinated with the available rumen substrates. For instance, our PUL analyses showed that hemicellulose degradation was broadly expressed across the active community (Figure 16C), and thus we predicted this substrate was highly available in the rumen. Our chemical analyses supported this inference, demonstrating that hemicellulose polymers (e.g., xylan, mannan, and xyloglucan) and their monomeric sugar constituents (e.g., xylose, glucose) were highly abundant (Figure 17, Figure 18). Additionally, the most abundant PUL detected in proteomics operates on pectin sub-structures measured in our chemical analyses (galactan, arabinan, and de-esterified pectin homogalacturonan). These metabolites support predictions from our reconstructed microbial genomes and paired proteomes, assigning previously undescribed lineages to roles in hemicellulose and pectin degradation in the rumen.

Figure 16. Detection and expression of polysaccharide utilization loci (PUL) across recovered Bacteroidales members. (A) Maximum likelihood tree of concatenated ribosomal proteins from all Bacteroidetes genome bins recovered here. These genomes span at least 10 families within the Bacteroidales order. Lines connecting the tree to genome bin names are colored by taxonomic family assignment, with colors assigned in the legend. Bootstrap support greater than 70 is represented by a grey circle. (B) The number of PUL systems encoded (light grey) and expressed (colored) from genome bins. (C) Expressed PUL systems organized by substrate. Colored boxes indicate that at least 1 gene within the PUL was detected in proteomics, with 2 or more unique peptides.

Figure 17. Measured plant polymers and sugars in winter rumen fluid. CoMPP values of detected cellulose (black), hemicellulose (dark grey), and pectin (light grey) polymers.

3.3.3 Carbon turnover is mediated by many uncultivated taxa

In anaerobic ecosystems, carbohydrate degradation is performed through interconnected microbial metabolisms125. To uncover these metabolic networks in the rumen, we reconstructed the active carbon degradation network and related these expression data to the carbon metabolite pools. Based on linkages to specific substrate classes, genomes were assigned to one of three trophic levels in the carbon food chain: (1) plant polymer degradation, (2) mixed polymer degradation and sugar fermentation, or (3) exclusive sugar fermentation (Figure 18C).
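The three-level assignment described above can be written as a simple rule, sketched below. This is an illustration rather than the analysis code used in this study. Substrate names follow the node labels in Figure 18C, and the function name and the handling of genomes with no detected substrate links are assumptions.

```python
from typing import Optional, Set

# Substrate classes, named after the polymer and sugar nodes in Figure 18C.
POLYMERS = {
    "cellulose", "xyloglucan", "starch", "mlg", "xylan", "mannan",
    "galactan", "fructan", "arabinan", "pectin", "beta-1,4-glucan",
}
SUGARS = {
    "glucose", "xylose", "mannose", "galactose",
    "rhamnose", "fucose", "arabinose",
}


def trophic_level(substrates: Set[str]) -> Optional[int]:
    """Assign a genome to a trophic level from its proteomics-supported substrates.

    1 = polymer degradation only, 2 = polymers and sugars,
    3 = sugar fermentation only, None = no substrate links detected.
    """
    unknown = substrates - POLYMERS - SUGARS
    if unknown:
        raise ValueError(f"unrecognized substrates: {sorted(unknown)}")
    uses_polymers = bool(substrates & POLYMERS)
    uses_sugars = bool(substrates & SUGARS)
    if uses_polymers and uses_sugars:
        return 2
    if uses_polymers:
        return 1
    if uses_sugars:
        return 3
    return None
```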

Organisms from which we solely detected proteins for polymer degradation included the Prevotella sp. (BACT35, BACT31), Ruminococcus sp. (FIRM12), and Fibrobacter sp. (FIBRO1). These genomes are closely related to previously isolated rumen bacteria that have been shown to degrade polymers in the laboratory. Here, we confirm for the first time that these processes are active in situ. Notably, the Prevotella sp. that is most dominant in our dataset is the only member to express genes for xyloglucan degradation, suggesting that use of this substrate contributes to the dominance of this organism in the rumen. Cellulose, possibly the most recalcitrant substrate we detected, is degraded by enzymes within cellulosomes expressed from two Firmicutes genomes in the Ruminococcaceae, Ruminococcus sp. (FIRM12) and Ca. Vansoestibacter (FIRM7) (Figure 18C, Appendix B). The rarity of cellulose and xyloglucan degraders suggests that these lineages play a foundational role in the conversion of complex plant material into more readily degradable substrates, fueling the metabolisms of other sugar and polymer degraders.

Figure 18. Network analysis of plant carbon degradation. (A) Average comprehensive microarray polymer profiling (CoMPP) value of detected carbon polymers in rumen fluid. (B) Concentration of 5 and 6 C sugars measured by 1H NMR. (C) Network nodes represent carbon substrates (rectangles) and genome bins (circles). Abbreviations and antibodies used for carbon substrates are provided (Table X). Genome bins are sized by total coverage (abundance). Nodes are connected if proteins for degrading the substrate were detected in metaproteomics and unique to the bin. Polymers are at the top in dark grey; sugars are at the bottom in light grey. White stars indicate genome bins with proteins for short-chain fatty acid production detected in metaproteomics. Highly connected (>6 connections) genome nodes and edges to substrates are outlined in red. (D) Summary of the degradation of bulk polymer types by hub genomes.

Unlike exclusive polymer degraders, the next trophic level is composed of metabolically more versatile members. Seven Bacteroidetes genomes (BACT22, BACT25, BACT24, BACT33, BACT32, BACT13, BACT15) co-express the genes for the utilization of six or more substrates (polymers and sugars) and are referred to as metabolic hubs (Figure 18C, red outline). Of these genomes, Prevotella sp. BACT32 is a generalist, expressing genes for the degradation of pectin, starch, hemicellulose, and β-1,4-glucans (Figure 18D). The other highly connected genomes are more specialized in their substrate utilization, only expressing genes for the degradation of either hemicellulose or pectin polymers and the fermentation of the resulting sugars. For example, RC9 (BACT13) expresses a PUL for the import and degradation of xylan, and it also expresses xylose isomerase for fermentation of the cleaved xylose monomer. More globally, every carbon substrate detected (polymer or sugar) has the potential to be degraded by proteins expressed by metabolic hub genomes. All hub genomes belong to the Bacteroidetes and use the PUL system, which imports plant polymers prior to degradation. This highlights that Bacteroidetes PULs are a successful and vital strategy for carbon degradation in this rumen ecosystem.
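The identification of metabolic hubs amounts to counting distinct substrate connections per genome in the bipartite genome-substrate network. The sketch below illustrates the idea under stated assumptions. The text defines hubs as genomes using six or more substrates, so the default threshold is inclusive, and it is exposed as a parameter because the Figure 18 caption is worded slightly differently. The edge list and bin names in the usage note are hypothetical and are not data from this study.

```python
from collections import defaultdict
from typing import Iterable, List, Tuple


def hub_genomes(edges: Iterable[Tuple[str, str]],
                min_substrates: int = 6) -> List[str]:
    """Return genomes linked to at least `min_substrates` distinct substrates.

    `edges` holds (genome_bin, substrate) pairs, one per proteomics-supported
    link. Duplicate pairs are counted once.
    """
    substrates_by_genome = defaultdict(set)
    for genome, substrate in edges:
        substrates_by_genome[genome].add(substrate)
    return sorted(
        genome
        for genome, substrates in substrates_by_genome.items()
        if len(substrates) >= min_substrates
    )
```

With a toy edge list in which a hypothetical `binA` is linked to xylan, mannan, pectin, xylose, glucose, and arabinose while `binB` is linked only to glucose and xylose, `hub_genomes` returns `["binA"]`.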

The third trophic level is composed of obligate fermenters of five- and six-carbon sugars, organisms that do not express GHs to degrade plant polymers but, instead, express the isomerase or kinase for the incorporation of specific sugars into central metabolism. Glucose is the most abundant sugar and also the most highly connected metabolite node, with 12 genomes expressing glucose-6-phosphate isomerase for its utilization. However, only one hub genome expresses genes for glucose degradation, implying that versatile organisms tend to rely on substrates other than glucose. Unlike in the first two trophic groups, Firmicutes constitute more than 80% of the genomes in this third fraction. Seven of these Firmicutes genomes belong to three genera, Butyrivibrio, Ruminoclostridium, and Selenomonas, which are well represented in rumen culture collections. Surprisingly, all Butyrivibrio genomes encode GH genes for hemicellulose degradation that were not detected in our metaproteomes. This may reflect the ability of the Bacteroidetes PUL system to outcompete Butyrivibrio carbon degradation mechanisms on a high lignocellulose diet in the moose rumen. The remaining 11 genomes are the first metabolically characterized representatives of new Bacteroidetes and Firmicutes families or genera. These results demonstrate the clear contribution of known and novel lineages to active sugar fermentation in the rumen (Figure 18, Appendix B).

The key outcome of microbial carbon degradation in the rumen is the production of butyrate, acetate, and propionate, which are the critical SCFAs for host energy3. While SCFA production was encoded in nearly all the genomes (Figure 12), our proteomics data highlighted the contributions of 17 members (Figure 13, Figure 18, star). These findings highlight that a diverse array of substrates can support the production of SCFA. Many of these SCFA-producing bacteria were conserved across ruminants (e.g., RC9, uncultivated Prevotellaceae, uncultivated Ruminococcaceae), despite differences in plant diets. This finding suggests that the metabolic end products, and not substrate utilization, dictate the maintenance of these microorganisms across ruminants. Collectively, our integrated metabolite-proteomic results parse microorganisms into substrate niches and reveal the metabolic triaging of plant biomass among different microbial genotypes.

Figure 19. Summary of viral contigs, host interactions and expression of viral proteins. (A) The number and affiliated taxonomy of unique viral genomes across metagenomes. The 0.2 µm filtrate had the most viral contigs recovered as many of the microorganisms remained on the filter. (B) Rank abundance curve of all unique viruses detected and their total coverage across all 4 metagenomes. Red bars indicate genomes with proteins detected in metaproteomics.

3.3.4 Viral infections are a key modulating factor of rumen microbial ecosystems

Recent bovine rumen metagenomics datasets have noted differences in viral populations with diet, although viral taxonomy and activity in non-bovine rumen ecosystems remain poorly characterized8,36. Here, we identified a set of 810 viral populations, with 93 circular (closed) viral genomes, from our moose metagenomes. We taxonomically classified these rumen viral populations and defined 148 different viral genera, 75% of which lack a database reference sequence and represent novel candidate viral genera (Figure 19). Thirty-five of these novel moose viral genera are the same genera as viruses previously sampled in the cow rumen38 and may represent conserved rumen viruses. This is the first demonstration that different animals share viral genotypes.

We also mined our metaproteomics data for expression of viral genes, detecting a total of 64 viral proteins from 53 viral contigs (Figure 19B). Most viral proteins (80%) had no known function and were identified only as hypothetical proteins, while the remainder were largely structural proteins such as capsids126. While 23% of our viral genomes belong to tailed phages (Siphoviridae, Podoviridae, Myoviridae), only 4 tail genes were detected in our proteomic data. This undersampling of tail proteins was previously reported in marine systems126 and, together with isolate data from marine phages, indicates that MS proteomics may not capture tail proteins. However, we note there are only a handful of viral proteomic datasets currently available for comparison. This is the first time that viruses have been observed in proteomic data in the rumen, signifying that these previously enigmatic components are active players in moose rumen microbial communities.

Figure 20. Host-viral interaction network. (A) Viral genomes (hexagons) are connected to predicted microbial host genomes (circles) by edges. All genomes are sized by abundance across datasets. Viral genomes are colored by taxonomy: Siphoviridae, grey; Podoviridae, blue; Myoviridae, orange; novel genera, white zigzag; novel genera found in other rumen datasets, red zigzag. Microbial host genome bins are colored by taxonomy, labeled on the bottom left. An edge was drawn if a confident link could be established by tetranucleotide frequency (grey edges) or protospacers within CRISPR-cas systems (black edges). Edge length has no meaning. Viral genomes/contigs with proteins detected in proteomics are outlined in red. The outlined box is enlarged in (B) to show specific protospacer links. (B) Genome cartoon of Ca. Alyeska bin BACT25, highlighting that this organism expresses genes for hemicellulose (MLG) degradation via the PUL system to yield acetate and succinate. This genome bin contains a CRISPR-cas type I-U system with protospacer links to 6 viral genomes.

We next considered whether these rumen viruses could impact carbon cycling by infecting ecologically critical microbes. To evaluate virus-host interaction dynamics, we used in silico approaches to link viruses and hosts via genomic features. Using similarities in virus-host genome tetranucleotide frequency and CRISPR protospacer matches to viral genomes, hosts were predicted for 113 viral contigs spanning 4 of the 9 phyla sampled. Forty-six of these viral contigs could be directly linked to a specific metagenomic bin, including the most dominant and active bacterial carbon degrading populations, Prevotella sp. (BACT35) and BS11 genome bins (Figure 20A). One bacterial bin in particular, Ca. Alyeska (BACT25), which encodes a Type I-U CRISPR-Cas system127, contained protospacers from 6 viral contigs/genomes, 5 of which are members of previously undescribed viral genera (Figure 20B). One of these novel viral genomes had proteins detected in our proteomic data (hypothetical protein, caaA1 transport protein) (Figure 20B). This implies viral interactions with active carbon cycling microorganisms. These findings show that viral predation could indirectly impact all levels of carbon processing, targeting organisms that use complex polymers like xyloglucan (BACT35), hemicellulose polymers (BACT25), and hemicellulosic sugars like xylose (BS11).
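The two in silico linkage signals described above can be illustrated with a simplified sketch. It is not the method or software used in this study, which is not specified in this section. Here tetranucleotide similarity is approximated by the Manhattan distance between normalized 4-mer frequency vectors, and a CRISPR link is approximated by an exact match of a host spacer (or its reverse complement) within a viral sequence. Both simplifications, and all function names, are assumptions.

```python
from collections import Counter
from itertools import product
from typing import List

# All 256 tetranucleotides over the unambiguous DNA alphabet.
KMERS = ["".join(p) for p in product("ACGT", repeat=4)]
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def tetranucleotide_freq(seq: str) -> List[float]:
    """Normalized 4-mer frequencies; k-mers with ambiguous bases are ignored."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 4] for i in range(len(seq) - 3))
    total = sum(counts[k] for k in KMERS)
    if total == 0:
        raise ValueError("sequence contains no unambiguous tetranucleotides")
    return [counts[k] / total for k in KMERS]


def tnf_distance(seq_a: str, seq_b: str) -> float:
    """Manhattan distance between two tetranucleotide frequency profiles."""
    fa, fb = tetranucleotide_freq(seq_a), tetranucleotide_freq(seq_b)
    return sum(abs(x - y) for x, y in zip(fa, fb))


def reverse_complement(seq: str) -> str:
    return seq.upper().translate(_COMPLEMENT)[::-1]


def spacer_matches(spacer: str, viral_seq: str) -> bool:
    """True if the spacer or its reverse complement occurs in the viral sequence."""
    spacer, viral_seq = spacer.upper(), viral_seq.upper()
    return spacer in viral_seq or reverse_complement(spacer) in viral_seq
```

In practice a smaller `tnf_distance` between a viral contig and a genome bin would be read as weaker evidence than a spacer match, which is consistent with the separate edge types drawn in Figure 20. Any distance cutoff would need to be chosen and validated separately.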

In addition to linking viruses to hosts computationally and via CRISPR-cas protospacers, we also mined our microbial genomes for integrated prophage. Here we recovered a total of 40 prophage in 24 of our microbial genome bins. Most of the prophage were identified in Firmicutes bins. Although none of these phage proteins were detected in proteomics (likely due to the low abundance of these organisms in the ecosystem), these prophage may be contributing to the decreased replication and growth of these organisms on a low nutrient diet128. Notably, we did not detect prophage in any of the seven metabolic hub genomes, which may be contributing to their inferred success in this ecosystem. Our results expand upon prior detection of viruses in the rumen to indicate that viral predation is actively occurring in the rumen ecosystem, and that dominant and important carbon degrading organisms are engaged in an arms race with phage.

3.4 Conclusion

Here, we provide a phylogenetic framework and naming system for hundreds of genomes from at least 12 new lineages in the Bacteroidetes, Firmicutes, and Tenericutes that lack a cultivated representative to the genus or family level. Many of these lineages are not unique to the rumen but are present across other host ecosystems. In particular, the Bacteroidetes PUL enzyme systems are known to be active in the human gut111,114,115, soils124, and the deep ocean113. Furthermore, by linking these reconstructed microbial genomes and paired proteomes to metabolites, we have dissected carbohydrate active phenotypes, partitioning genomes into unique substrate niches. Additionally, our rumen virome and proteome data identify viruses conserved across animal gastrointestinal tracts and show that these viruses actively infect dominant carbohydrate degrading microorganisms to modulate gastrointestinal carbon degradation. In conclusion, themes identified here will extend beyond the rumen to the basic ecology of other anoxic carbon-rich ecosystems, providing a foundation to model and develop a predictive understanding of carbon degradation.


Chapter 4: Unlocking condensed tannin resistance mechanisms in the moose rumen

4.1 Introduction

One of the most elusive steps of the microbial carbon cycle is the anaerobic degradation of plant polyphenolics known as condensed tannins. Condensed tannins (CT) are polymers of flavanol units linked by carbon-carbon bonds129. These carbon polymers have been shown to be strong controllers of ecosystem function130-133, with additions to soils retarding rates of microbial organic matter decomposition. Several mechanisms have been proposed by which CT controls microbial metabolism, including CT binding to and inactivating carbon degrading enzymes, forming complexes with proteins and causing them to precipitate, and/or causing cells to lyse due to toxicity131-133. Microorganisms can evade these negative consequences by developing resistance to CT, which is described as microbial growth and metabolism in the presence of CT concentrations typically toxic to other microorganisms134. Mechanisms of microbial CT resistance include both biodegradation and tolerance135. CT tolerant, but not degradative, organisms (1) upregulate proteins that enhance CT precipitation, (2) secrete polymeric substrates (e.g., lipopolysaccharide, glycocalyx) that bind CTs and protect cell membrane integrity, and (3) facilitate CT polymerization131,135-137 (Figure 21). CT biodegradation, on the other hand, is defined here as the alteration of the CT chemical structure by enzymatic processes138 and may result in energy generation for the organism and/or detoxification of the compound to a less noxious metabolite. Given the importance of microbial-CT interactions in human and animal gastrointestinal tracts139-141, bio-energy142, and bioremediation of phenolic waste streams143, there is a critical need to understand the currently enigmatic CT degrading enzymes, the organisms in which they operate, and the prevalence of these processes in a diverse array of biomes.

Previous studies examining the relationship between gut microorganisms and tannins have resulted in many isolates from human saliva144, feces145, rumen fluid146,147, and tannery soil143. Lactobacillus plantarum WCFS1, isolated from human saliva, can degrade a mixture of dietary tannins using the enzymes tannase and gallate decarboxylase144. In another study, Selenomonas ruminantium, isolated from rumen fluid, grew fermentatively on CT as the sole carbon and energy source, apparently because of the tannase enzyme148. However, these prior isolate experiments did not use purified CT, and the preparations included contaminating hydrolysable tannin (HT). Tannase (or tannin acyl hydrolase) is present in both of these organisms and cleaves HT to release glucose. This enzyme likely operated on HT, rather than degrading the CT polymer, releasing glucose to support bacterial fermentation in the presence of CT. Additionally, most of these experiments were devoid of genomic data and thus lack correlation to specific gene function. Fortunately, there is convincing evidence for CT degradation from metabolite experiments in systems ranging from the human gut to soils and wastewater149. Anoxic enrichments of human fecal material removed >90% of the starting polymeric CTs, degrading them to monomeric CTs and further into low-molecular-weight aromatic acids139,150. Given the importance of CTs across ecosystems, there is a need for controlled, pure-culture physiological studies with purified CTs and isolate genome sequencing to resolve microbial-CT interactions.

To enrich for CT-tolerant or -degradative organisms, we used anaerobic batch reactors inoculated with washed cell suspensions of moose rumen fluid. These reactors were amended with CT (1.5 g CT/100 mL media) and compared to non-CT treated controls. Moose rumen fluid was selected as the inoculum because these microbial communities are naturally exposed to high CT concentrations (up to 2 g/L in fall and winter94 when samples were taken). Given the obligate dependence of these animals on rumen microorganisms for energy, we posited that there is a high probability that this microbiome contains currently unrecognized CT resistance mechanisms. Microbial community analyses revealed enrichment of three OTUs belonging to the Pasteurellaceae and Streptococcaceae families in CT-treated reactors. From these CT enrichments, we obtained a Streptococcus gallolyticus isolate that grew in the presence of glucose and CT.

I then developed proteomic approaches and analysis methods to identify possible enzymes enriched upon CT exposure compared to non-CT conditions. Complementary work, including quantification of CT-derived metabolites with HPLC and mass spectrometry, growth measurements, and microscopy, is currently ongoing in the laboratory.

Figure 21. Illustration of known CT degradation and tolerance mechanisms. (Left) Proposed pathway for CT degradation, with unknown genes marked with “??” in black boxes. Many genes in this pathway that are known are poorly annotated. (Right) Putative tolerance mechanisms include (i) production of secreted enzymes that bind CT, (ii) those that enhance nutrient scavenging (e.g., metal acquiring siderophores), and (iii) those involved in the formation of biofilms or capsules conferring membrane protection from CT. Green X denotes tannin bound compounds.

4.2 Materials and Methods

4.2.1 Experimental design

Previous studies examining the potential for microbial tannin degradation used excreta or digesta from animals, including ruminants, that consume forages with high tannin content145,147,151. Following these successes, we used rumen fluid from moose consuming a high lignocellulose diet that contained up to 2 g/L CT in fall and winter94. One moose was allowed to forage in a native Alaskan pasture for 1 week. On the 8th day, rumen fluid was collected via the rumen cannula and then immediately shipped on ice to Columbus, Ohio. Upon arrival, rumen fluid was filtered to remove plant biomass, then microbial cells were collected via centrifugation and rinsed with anoxic cell suspension fluid. The rinsed cell pellet was then resuspended into basal bicarbonate media with either 1.5 g CT/100 mL media or no CT under anaerobic conditions with an N2:CO2 (80:20) headspace (Figure 22). We also boiled 10 mL of the cell suspension for 20 minutes and inoculated 0.1 mL of this killed cell suspension into media with 1.5 g CT/100 mL media as a killed cell control (Figure 22). These bottles were incubated for 28 days at 39°C. Samples (1 mL) were taken on day 28 for DNA extraction and sequencing (see methods below).

Figure 22. Experimental design for CT enrichment. Rumen fluid samples from moose consuming a diet high in CT were spun down, washed, and inoculated into bicarbonate enrichment media. After a 28-day incubation, DNA was extracted and submitted for sequencing.

4.2.2 Preparation of stocks, media, and cell suspension fluid

Condensed tannin stock was prepared at 15 g CT/100 mL media (10x concentration) and placed in 10 mL media tubes or diluted (0.1x) when supplemented on plates, for a final concentration of 1.5 g/100 mL media. Sorghum CT (1.43 g) was dissolved in sodium bicarbonate anaerobic media (9.53 mL). After the CT dissolved, the entire contents were taken up in a 10 mL syringe under a flame and then filtered through a 0.2-µm cellulose nitrate filter into a sterile anaerobic media bottle. The CT stock was then bubbled for 20 min with 80:20 N2:CO2 gas using a flame-sterilized cannula in the media and in the headspace.
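The stock and working concentrations above can be checked with a short calculation. The sketch below uses only the masses and volumes stated in this section. The helper names are hypothetical, and the dilution helper assumes a simple C1V1 = C2V2 relationship.

```python
def conc_g_per_100ml(mass_g: float, volume_ml: float) -> float:
    """Concentration in g per 100 mL from a mass (g) dissolved in a volume (mL)."""
    if volume_ml <= 0:
        raise ValueError("volume must be positive")
    return mass_g / volume_ml * 100.0


def stock_volume_ml(final_conc: float, final_volume_ml: float,
                    stock_conc: float) -> float:
    """Volume of stock needed for a target concentration (C1V1 = C2V2).

    Concentrations must share the same units (here g/100 mL).
    """
    if stock_conc <= 0:
        raise ValueError("stock concentration must be positive")
    return final_conc * final_volume_ml / stock_conc


# 1.43 g sorghum CT in 9.53 mL media gives approximately 15 g/100 mL (10x stock).
stock = conc_g_per_100ml(1.43, 9.53)

# A 0.1x dilution of the stock gives the 1.5 g/100 mL working concentration.
working = stock * 0.1

# Under the C1V1 = C2V2 assumption, 10 mL at 1.5 g/100 mL requires 1 mL of stock.
needed = stock_volume_ml(1.5, 10.0, 15.0)
```

Expressed per liter, the 1.5 g/100 mL working concentration corresponds to 15 g/L.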

One liter of bicarbonate media was prepared using ammonium chloride (0.25 g), sodium phosphate-monobasic (0.6 g), potassium chloride (0.1 g), sodium bicarbonate (2.5 g), and vitamin and mineral stocks from David Lovely’s lab (10 mL each)152. The materials were weighed out in a 2-L Erlenmeyer flask, with the exception of the sodium bicarbonate. MilliQ water (980 mL) was added, and then the medium was bubbled under N2:CO2 (80:20) on a stir plate with the heat on. After five minutes, the sodium bicarbonate was added and the medium was bubbled while stirring until the solution came to a boil. The media was then cooled on an ice bath and distributed anaerobically into media bottles.

Brain Heart Infusion (BHI) plates were prepared according to manufacturer’s instructions (37 g BHI per liter of water) with 15 g of agar added per liter. Media was autoclaved for 20 mins and plates were poured when bottle was cool to the touch. Plates were stored at room temperature in the anaerobic chamber for 2 d to equilibrate. In the anaerobic chamber, CT stock (15 g CT/100 mL media) was diluted in basal bicarbonate media to a final concentration of 1.5 g CT/100 mL media. The CT stock was prepared fresh on the day plates were inoculated. Using a pipette, 1 mL of diluted CT stock (1.5 g/100 mL) was spread on the plate.

Cell suspension fluid was prepared using ammonium chloride (0.25 g), sodium phosphate-monobasic (0.379 g) and sodium phosphate-dibasic (0.9713 g) per liter of water. Materials were placed in an Erlenmeyer flask (2 L) and boiled under N2 gas before being anoxically distributed into reactors.

4.2.3 Preparation of cells

Immediately upon arrival, rumen fluid was placed in the anaerobic chamber and allowed to equilibrate for 2-3 h. Rumen fluid was filtered twice through three layers of cheesecloth into an autoclaved plastic beaker to remove all plant material. Filtered rumen fluid was distributed into autoclaved, anoxic centrifuge bottles and centrifuged at 6,000 x g for 6 min. Bottles were transferred back into the anaerobic chamber to remove the supernatant. The cell pellet was gently washed with fresh cell suspension solution (100 mL) by swirling the bottle. Bottles were then balanced and centrifuged again for 6 min at 6,000 x g. Supernatant was removed, and cells were resuspended in 100 mL of fresh cell suspension solution by swirling the bottle. Resuspended cells were placed in an anaerobic media bottle and sealed in the anaerobic chamber. A parallel resuspended washed cell solution (10 mL) was boiled for 20 min for the killed cell control.

4.2.4 DNA extraction and 16S rRNA gene sequencing

For the enrichment cultures, samples were taken immediately after inoculation and 28 d later by removing 1 mL using a sterile needle under flame. Cells were spun down at 10,000 x g for 5 min. Supernatant was removed and cell pellets were resuspended in lysis buffer from the MoBio DNA extraction kit, following the manufacturer’s protocol. DNA was submitted to Argonne National Laboratory on 2/15/2017 for 16S rRNA gene sequencing and analyzed according to previously published methods2 using QIIME 1.9.047. Raw reads from this experiment can be found in /home/projects/170215_Wrighton_Daly_16S_fastqs/ with barcode sequences for these samples found in the folder: /home/projects/170215_Wrighton_Daly_16S_fastqs/LMSqiime_1487796531/validate_mapping_file_output. Sample descriptions are as follows: LMS1, live cells without CT; LMS2, killed cells with CT; LMS3, live cells with CT.

4.2.5 Isolation, genome sequencing, and genome analyses

From the anoxic reactors, we sought to isolate the enriched taxa. The reactor bottle with CT and live cells was placed in the anaerobic chamber, where the enrichment was streaked on BHI plates supplemented with 1 mL of diluted CT stock (1.5 g/100 mL).

Once growth appeared (after 3-7 d), colonies were restreaked on fresh BHI plates supplemented with CT (1.5 g/100 mL). Cultures were tested for isolation when colonies on the plate contained identical colony morphologies. The identity of the isolates was confirmed using PCR amplification and sequencing of the near full length 16S rRNA gene using 27F and 1492R primers153.

To sequence the genome of our isolates, one colony was inoculated into BHI media tubes (10 mL). Cells were grown to late log phase (~48 h) before removing 6 mL and pelleting. DNA was extracted from the pellets using the Qiagen DNeasy PowerSoil kit, with the cell lysis step including a FastPrep run at 5.0 m/s for 30 s. DNA was sequenced on a HiSeq 2500 at the OSU Comprehensive Cancer Center, which is an NCI subsidized shared resource supported by Cancer Center Support Grant #P30CA016058. Low quality reads were removed using sickle154. High quality, trimmed reads were assembled using IDBA-ud52. Genomes were binned using a combination of GC content, taxonomic signature, and coverage, as previously described2.

Genomes were annotated using the Wrighton Lab annotation pipeline as previously described2. Because genes involved in CT degradation are likely not well annotated in any database, we mined our genomes for genes that have previously been implicated in CT degradation using a BLAST search, with hits having a bit score >100 considered homologs155. Genes included in the BLAST search can be found in Table 1. A FASTA file of all of these genes can be found on the Wrighton lab server here: /home/projects/alaska_moose/CT_degradation/Degradation_gene_blasts/CT_GrantBlast.txt.
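The bit score cutoff described above can be applied to tabular BLAST output with a few lines of code. The following is a minimal sketch, not the pipeline used in this work: it assumes the standard 12-column tabular format (BLAST+ `-outfmt 6`), and the example rows are illustrative placeholders. Only the identity, length, e-value, and bit score of the first row come from the enoate reductase hit reported later in this chapter; the query names, coordinates, mismatch counts, and the whole second row are made up for the example.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST6_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def filter_hits(handle, min_bitscore=100.0):
    """Return hits whose bit score is strictly greater than min_bitscore."""
    hits = []
    for row in csv.reader(handle, delimiter="\t"):
        # Skip blank, malformed, or comment lines.
        if len(row) != len(BLAST6_COLUMNS) or row[0].startswith("#"):
            continue
        hit = dict(zip(BLAST6_COLUMNS, row))
        hit["pident"] = float(hit["pident"])
        hit["evalue"] = float(hit["evalue"])
        hit["bitscore"] = float(hit["bitscore"])
        if hit["bitscore"] > min_bitscore:
            hits.append(hit)
    return hits


# Placeholder input: hypothetical query IDs, coordinates, and mismatch counts.
EXAMPLE = (
    "enoate_reductase\tscaffold_5_0020\t22.0\t394\t0\t0\t1\t394\t1\t394\t9e-25\t102\n"
    "hypothetical_query\thypothetical_gene\t25.0\t80\t0\t0\t1\t80\t1\t80\t1e-3\t35\n"
)

hits = filter_hits(io.StringIO(EXAMPLE))
for hit in hits:
    print(hit["qseqid"], hit["sseqid"], hit["bitscore"], hit["evalue"])
```

A bit score threshold is independent of database size, which makes it easier to reuse across searches than an e-value cutoff.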

Gene name | Function | Organism | Ref.
Gallate decarboxylase (lpdBCD) | 3,4,5-THB to pyrogallol | Lactobacillus plantarum WCFS1 | 144
Catechol oxidase | Oxidation of diphenols | Plants | 156
Tannin acyl hydrolase | Cleaving gallic acid from sugar units | many | 157
Polyphenol oxidase | Oxidation of aromatic compounds | many | 158
Super oxide dismutase | Dismutation of superoxide to peroxide | many | 159
Protocatechuate 3,4-dioxygenase | Breaks down protocatechuate with O2 | Pseudomonas putida, Paenibacillus sp. JJ | 160
Protocatechuate 4,5-dioxygenase | Breaks down protocatechuate with O2 | Sphingomonas paucimobilis | 161
Chalcone isomerase | Flavanone to chalcone (heterocyclic C ring opening) | Eubacterium ramulus | 162
Enoate reductase | Reduction of carbon-carbon double bonds | Eubacterium ramulus | 162
fdeBCE | Naringenin degradation | Herbaspirillum seropedicae SmR1 | 163
catA | Catechol 1,2-dioxygenase | Pseudomonas putida | 164
Laccase | Oxidase enzymes typically operating on phenols | subtilis | 165
Multi-copper oxidase | Oxidize substrate with electrons | E. coli | 166
MarR | Transcriptional regulator | Sulfolobus solfataricus | 167-169
LysR | Transcriptional regulator | Pseudomonas putida, Acinetobacter calcoaceticus, Herbaspirillum seropedicae SmR1 | 170,171

Table 1. Genes used to search via BLAST for CT-degradation genes present in the isolate genome.

4.2.6 CT degradation experiment, proteomics and metabolite analyses

Isolates were maintained in basal bicarbonate media with 1 mM glucose and 1.5 g CT/100 mL of media. To quantify differential expression of proteins in the presence of CT and without CT, we set up an experiment with triplicates in two treatments: (i) S. gallolyticus with glucose and CT and (ii) S. gallolyticus with glucose and no CT. For the first exploratory experiment, glucose was added to media because the organism grew optimally in the presence of CT concomitant with glucose (data not shown). S. gallolyticus was transferred to basal bicarbonate media with glucose (1 mM) and without CT, grown for 48 h, and transferred again; this was repeated twice. The last transfer was grown for 48 h and inoculated into the experimental tubes (1 mL for a 10% transfer). Three of the experimental tubes were amended with 1 mL of CT stock (for a final concentration of 1.5 g/100 mL), whereas the other three tubes lacked CT amendment and served as glucose-only non-CT controls. Once inoculated, cells grew in the presence of CT for 96 h, a time previously shown to allow for up to 62% CT removal under these conditions (data not shown). After 96 h, liquid culture (1.5 mL) from each tube was collected, centrifuged for 10 min at 10,000 x g, separated from the supernatant, and stored at -80 °C until shipment to the Pacific Northwest National Lab.

At Pacific Northwest National Lab, proteins in the pellet were precipitated and washed twice with acetone. Then, the pellet was lightly dried under N2. Filter Aided Sample Preparation (FASP) kits were used for protein digestion according to the manufacturer’s instructions. Resultant peptides were snap-frozen in liquid N2, digested again overnight and concentrated to ~30 µl using a SpeedVac (Labconco, Kansas City, MO, USA). Final peptide concentrations were determined using a bicinchoninic acid assay. All mass spectrometric data were acquired using a Q Exactive Plus (Thermo Scientific, Waltham, MA, USA) connected to a nanoACQUITY UPLC M-Class liquid chromatography system (Waters, Milford, MA, USA) via an in-house 70 cm column packed using Phenomenex Jupiter 3 µm C18 particles (Torrance, CA, USA) and an in-house built electrospray apparatus. MS/MS spectra were compared with the predicted protein collections using the search tool MSGF+58. Contaminant proteins typically observed in proteomics experiments were also included in the protein collections searched. The searches were performed using ± 20 ppm parent mass tolerance, parent signal isotope correction, partially tryptic enzymatic cleavage rules, and variable oxidation of methionine. In addition, a decoy sequence approach63 was employed to assess false discovery rates. Data were collated using an in-house program, imported into a SQL server database, filtered to ~1% FDR (peptide to spectrum level) 581, and combined at the protein level to provide unique peptide count (per protein) and observation count (that is, spectral count) data.
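To illustrate the general idea behind decoy-based false discovery rate filtering, the sketch below estimates FDR as the ratio of decoy to target matches above a score threshold and then picks the most permissive threshold that stays under a chosen FDR. This is a generic illustration, not the in-house program or the MSGF+ scoring used here: the function names are hypothetical, the example scores are made up, and the convention that a higher score is better is an assumption.

```python
def estimate_fdr(psms, threshold):
    """Estimate FDR among peptide-spectrum matches scoring at or above threshold.

    psms is a list of (score, is_decoy) pairs; higher scores are assumed better.
    """
    targets = 0
    decoys = 0
    for score, is_decoy in psms:
        if score >= threshold:
            if is_decoy:
                decoys += 1
            else:
                targets += 1
    if targets == 0:
        # No target hits: treat any decoy hit as a fully false set.
        return 1.0 if decoys else 0.0
    return decoys / targets


def most_permissive_threshold(psms, max_fdr=0.01):
    """Return the lowest score threshold whose estimated FDR is within max_fdr."""
    candidates = {score for score, _ in psms}
    passing = [t for t in candidates if estimate_fdr(psms, t) <= max_fdr]
    return min(passing) if passing else None


# Made-up example: three target matches and one decoy match.
example_psms = [(10.0, False), (9.0, False), (8.0, True), (7.0, False)]
threshold = most_permissive_threshold(example_psms, max_fdr=0.34)
print(threshold, estimate_fdr(example_psms, threshold))
```

Because the estimated FDR is not monotonic in the threshold, the sketch evaluates every candidate threshold rather than stopping at the first failure.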

For analysis of proteomics data, peptides were mapped to the genome, and only unique peptides were analyzed, i.e., if a peptide mapped to multiple proteins it was removed from the analyses. Normalized spectral abundance factor (NSAF) values were calculated for each protein in every sample, and averaged per treatment172. The heat map was constructed in R with the pheatmap package173 and represents the average difference in z-score values from each treatment (CT and non-CT). Proteins were considered significantly different between treatments using a standard t-test (paired, two tailed) on NSAF values.
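For readers unfamiliar with NSAF, the calculation divides each protein's spectral count by its length and then normalizes by the sum of those ratios across all proteins in the sample. The sketch below shows this along with a per-protein z-score across samples. It is an illustration under stated assumptions: the protein names and counts are hypothetical, and the use of the sample standard deviation for the z-score is an assumption rather than a documented setting of the analysis described here.

```python
from statistics import mean, stdev


def nsaf(spectral_counts, lengths):
    """Normalized spectral abundance factor for each protein in one sample.

    NSAF_i = (SpC_i / L_i) / sum_j (SpC_j / L_j)
    """
    saf = {name: spectral_counts[name] / lengths[name] for name in spectral_counts}
    total = sum(saf.values())
    return {name: value / total for name, value in saf.items()}


def row_zscores(values):
    """Z-score one protein's values across samples (sample standard deviation)."""
    mu = mean(values)
    sd = stdev(values)
    if sd == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sd for v in values]


# Hypothetical spectral counts and protein lengths (amino acids) for one sample.
counts = {"protein_A": 30, "protein_B": 10, "protein_C": 5}
lengths = {"protein_A": 300, "protein_B": 200, "protein_C": 500}

sample_nsaf = nsaf(counts, lengths)
print(sample_nsaf)
print(row_zscores([1.0, 3.0]))
```

Normalizing by length keeps long proteins, which yield more peptides, from appearing more abundant than short ones.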

Concurrent samples (0.5 mL of liquid culture) were collected at 0 and 96 h for metabolite analyses. Samples were centrifuged for 10 min at 10,000 x g, and the supernatant was separated. Supernatant samples were concentrated to 100 µl. The concentrated supernatant (10 µl) was run on an HPLC C8 reversed phase column using a solvent gradient from 80:20 water:acetonitrile to 90% (v/v) acetonitrile, all containing 0.1% (v/v) trifluoroacetic acid. Analyses for metabolite data using HPLC and LC-MS are underway.

4.3 Results and Discussion

4.3.1 Sorghum CT enriches for potential CT-degrading microorganisms

After 28 d of incubation, 16S rRNA gene analyses on the anoxic reactors revealed an enrichment of three OTUs in the presence of CT compared to the non-CT control. Two of these OTUs belong to uncultivated lineages in the Pasteurellaceae family, and one OTU belonged to the Streptococcaceae (Figure 23). Interestingly, both of these families have been implicated in tannin degradation. The Pasteurellaceae family is typically found in a variety of animal gastrointestinal tracts, including marsupials and ruminants consuming plants with high CT content174-176. One Pasteurellaceae isolate from koala feces was able to degrade tannic acid and tannin protein complexes145. There are many isolates from the Streptococcus genus, with strains of S. gallolyticus shown to degrade tannins177,178. In light of previous isolates, our preliminary 16S rRNA gene findings suggest that organisms enriched in the experiment may not only tolerate CT, but may be capable of CT-degradation.

[Figure 23 plot: relative abundance in community (%) of each taxon, with no CT and with CT addition. Taxa shown: Pasteurellaceae 1, Pasteurellaceae 2, Streptococcaceae*, Pasteurellaceae 3, Coriobacteriaceae, Desulfovibrionaceae, Ruminococcaceae 1, Ruminococcaceae 2, Family XIII Incertae Sedis, and PeH15.]

Figure 23. CT-resistant microbial taxa. CT amendment decreases relative abundance of dominant taxa (grey) and enriches for rare members (orange). OTUs enriched in the presence of CT are in orange. The isolate is an identical match to the 16S rRNA gene sequence of the OTU in the Streptococcaceae family (asterisk).

Samples from the CT reactors were streaked on BHI plates supplemented with 1 mL of 1.5 g CT/100 mL media. Three colonies recovered on the plates were confirmed to be pure. Phylogenetic analysis of the near-full length (1,408 bp) 16S rRNA gene confirmed that one isolate was most similar (99.9% identity) to S. gallolyticus subsp. gallolyticus VTM4R20, a previously isolated strain from moose rumen fluid179. For that strain, moose rumen fluid was collected in Vermont during the hunting season (October) when CT concentration in leaves is high180. VTM4R20 is capable of producing ropy EPS, extracellular protein, and biogenic amines from ornithine, tyrosine, and lysine179. It was also able to produce acid from a variety of carbohydrates including cellulose, cellobiose, D-arabinose, D-mannose, D-raffinose, melibiose, salicin, inulin, glycogen, and N-acetylglucosamine179. Notably, the full length 16S rRNA gene from this S. gallolyticus subsp. gallolyticus VTM4R20 isolate and our isolate have a V4 region identical to the V4 sequence in the enrichments, demonstrating that we were able to culture one of the dominant CT-responding microorganisms (Figure 24). Ongoing research efforts are targeting the Pasteurellaceae and working up the other isolates.

[Figure 24 image: sequence alignment with rows labeled Streptococcus V4 OTU and S. gallolyticus isolate.]

Figure 24. 16S rRNA gene alignment. Alignment of the OTU enriched in condensed tannin anoxic reactors (Streptococcus V4 OTU) and the full-length 16S rRNA gene sequence from our Streptococcus isolate, obtained from moose rumen fluid incubated with CT for 28 days, shows perfect congruence, suggesting the isolate reflects the enriched 16S rRNA gene sequence from the reactor.

4.3.2 S. gallolyticus genome contains putative phenolic degrading genes

Our S. gallolyticus isolate was grown up on BHI and submitted for genome sequencing at The Ohio State University (Columbus, OH). We obtained 13.1 Gbp of sequencing data (87,074,986 high quality paired end reads). High quality reads were assembled into 3,334 scaffolds (>1000 bp). Assembly quality was extremely high (N50 of 195,572), with 93% of the reads mapping to the assembly. The S. gallolyticus genome was present in only 19 contigs and was estimated by CheckM to be >99% complete, with a size of 2.3 Mbp. Genome metabolic analyses revealed two ATPases (V-type and F-type), an incomplete TCA cycle, no NADH dehydrogenase complex and no terminal cytochrome oxidases. Taken together, these findings indicate that this organism is likely an obligate fermenter. Our S. gallolyticus isolate genome has both a pyruvate dehydrogenase and a pyruvate formate lyase for converting pyruvate to acetyl-CoA. The S. gallolyticus isolate has the capacity to utilize glucose, sucrose, mannose, fructose, lactose, and galactose sugars to produce lactate, acetate, 2,3-butanediol, ethanol, and succinate. We did not detect any hydrogenases for the production of H2.
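The N50 statistic quoted above is the scaffold length at which scaffolds of that length or longer contain at least half of the assembled bases. A minimal sketch of the calculation follows; the function name and the example lengths are hypothetical and are not taken from this assembly.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths.

    Scaffolds are sorted from longest to shortest and summed until the
    running total reaches at least half of the total assembly length.
    """
    if not lengths:
        raise ValueError("at least one scaffold length is required")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


# Hypothetical scaffold lengths in bp.
example_lengths = [2, 2, 2, 3, 3, 4, 8, 8]
print(n50(example_lengths))
```

A high N50 alongside a low contig count, as reported for this genome, indicates that most of the sequence sits in a small number of long scaffolds.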

Identifying putative genes for metabolizing phenolics in a genome is a challenge, as most cultured isolates shown to degrade CT lack genome sequencing and the enzymes for the initial depolymerization steps of this metabolism are currently unknown. Here, we used a combination of BLAST searches and limited annotations to identify putative phenolic metabolism genes encoded in the genome. Organisms that degrade tannins likely degrade both HT and CT. Consistent with this hypothesis, the S. gallolyticus genome encoded two tannase enzymes on the same scaffold within six open reading frames. This well-annotated gene is attributed to HT degradation144,148. Upstream, we also detected a MarR transcriptional regulator, which has been shown in Sulfolobus solfataricus to respond to aromatic compounds and induce detoxification mechanisms167.

We then sought to identify other putative transcriptional regulators for detoxification mechanisms of aromatics and discovered a total of 27, including MarR (12), LysR (12), and PadR (2)167-171. These transcriptional regulators are scattered throughout the genome, leaving many places to look for potential upstream regions of CT-degrading genes.

We also examined the genome for genes that are not directly related to phenolic metabolism but may have activities that could cause ring opening for CT polymer degradation (e.g., degradation of CT polymer or dimers into catechin monomers) (Table 1). For instance, polyphenol oxidase, laccases, superoxide dismutase, and peroxidases could potentially oxidize phenolic compounds using oxygen, peroxide, and superoxide (Table 1). We also considered glutathione-S-transferases and glutathione reductases, as these have been shown to act on xenobiotic substrates for detoxification181,182, and poorly annotated (low bit score) hydrolases, reductases, and oxidoreductases because these could be new genes involved in CT metabolism. Although many of these genes support aerobic metabolisms, work on a recent anaerobic lignin-degrading isolate identified putative roles for peroxidase/catalase183 and others have reported low-potential bacterial laccases (-400 mV)184, suggesting that activity of these degradative enzymes may occur across broad redox ranges. S. gallolyticus encodes a superoxide dismutase, three peroxidases, and two glutathione-S-transferases. The glutathione-S-transferases are located downstream of a LysR transcriptional regulator and represent potential gene targets for CT-degradation mechanisms.

Sorghum CT used in this study is composed of flavanol units called catechin and epicatechin, linked by carbon-carbon bonds185. The enzymes and mechanism for degrading these compounds are still largely unknown. There have been enzymatic studies examining the degradation of naringenin, a flavanone structurally similar to catechin and epicatechin162,163. Eubacterium ramulus, a common microorganism in the human gut, can catabolize naringenin monomers using bacterial chalcone isomerase, for heterocyclic ring opening, and enoate reductase162. Herbaspirillum seropedicae SmR1 has an fde operon downstream from a LysR-type transcriptional regulator that confers naringenin catabolism via the meta-cleavage pathway163. Because these enzymes are not well annotated in databases, we used a BLAST search to identify homologous genes in the S. gallolyticus genome. One gene, annotated as NADH-dehydrogenase flavin oxidoreductase (scaffold_5_0020), hit to E. ramulus enoate reductase with an e-value of 9e-25, a bit score of 102, and a length of 394 bps. Although this hit has a relatively low percent identity (22%), we did identify some of the conserved regions found in all active enoate reductases162, although these proteins are still not very well characterized. Further biochemical experiments are needed to determine the activity of this enzyme on catechin monomers and determine if this protein is involved in CT degradation in S. gallolyticus.

After initial degradation of flavanols (catechin and epicatechin), the phenyl short chain alkyl acids are modified and degraded via known phenolic metabolism pathways (Figure 21). Based on the detection of protocatechuic acid (PCA) in our metabolite data (data not shown), we examined the S. gallolyticus genome for PCA degradation pathways. We did not detect any of the dioxygenase enzymes for ring opening of PCA (Table 1, protocatechuate 4,5-dioxygenase, protocatechuate 3,4-dioxygenase), which may explain detection of PCA in our metabolites. S. gallolyticus does encode genes for further degradation of PCA via the para- and ortho-cleavage pathways, although these pathways were incomplete (ortho: 4-carboxymuconolactone decarboxylase, 3-carboxymuconate cycloisomerase; para: 2-hydroxymuconic-6-semialdehyde dehydrogenase, 5-carboxy-2-hydroxymuconate semialdehyde dehydrogenase, 4-oxalocrotonate decarboxylase). Additionally, we did not detect genes for further degradation of these metabolites via the TCA cycle, suggesting that CT is not being used for energy generation, but perhaps is broken down for detoxification purposes. In summary, S. gallolyticus has the genomic potential to convert CT degradation intermediates into lower molecular weight compounds, but the pathway remains incomplete without activity measurements and further metabolite analyses.

4.3.3 Proteomics reveals differential expression between CT and non-CT treatments

To determine which proteins were up-regulated in the presence of CT and might hint at enzyme targets involved in CT degradation, we incubated the S. gallolyticus isolate with and without CT for 96 h and sampled replicates from each treatment for metaproteomics. Using our metaproteomics data, we compared the triplicate samples from CT to the non-CT treatment to look for differential detection in proteins. Overall, we detected 1,784 proteins with 294,299 unique peptides. For the remaining analyses, we discuss only the pellet samples.

Comparative analyses between CT and non-CT controls highlighted a few general metabolic changes in S. gallolyticus in the presence of CT. There was a decrease in phosphoenolpyruvate (PEP) carboxykinase, suggesting a decreased conversion of oxaloacetate (OAA) to PEP, perhaps due to a demand for amino acids or fatty acids (Figure 25)186. Additionally, pyruvate dehydrogenase increased in detection in the presence of CT. In studies of anaerobic growth with Klebsiella pneumoniae, deletion of pyruvate formate lyase and corresponding dependence on pyruvate dehydrogenase resulted in maximum 2,3-butanediol yields187. In support of this idea, we also detected more peptides for diacetyl reductase and alcohol dehydrogenase in the presence of CT. These general shifts in downstream carbon flux indicate that S. gallolyticus is metabolically responding to the presence of CT. Further experiments are needed to determine if S. gallolyticus is metabolizing CT-derived intermediates that could explain this shift in downstream carbon flux.

In addition to general metabolic changes, we also detected proteins that have previously been implicated in tannin metabolism (resistance and/or degradation). In particular, we detected the upregulation of six transcriptional regulators that can respond to phenolics like catechin167-171, including LysR, MarR, and PadR (Figure 25). Other proteins on these scaffolds were detected in higher amounts in the presence of CT, but the differences were not significant. We also detected an increase in peptides from one tannase enzyme, the putative enoate reductase, and a hydroxyacylglutathione hydrolase (Figure 25). Notably, we detected a decrease in one of the glutathione-S-transferase proteins in the presence of CT, contrary to our hypothesis. While these enzymes currently cannot be pieced together into a cohesive metabolic pathway, their generic annotation, prior implication in anaerobic lignin degradation183, and increased detection in the presence of CT lend merit to their further examination and characterization. We also note other enzymes that fit this description (Figure 25). For downstream phenolic metabolism genes (e.g., PCA degradation), we did not detect significant differential expression of any of these proteins between CT and non-CT. Future experiments incubating S. gallolyticus with radioactively labeled CT-derived intermediates such as PCA, catechin, or other compounds identified with metabolite analyses will help to further resolve this CT metabolism.

As we were analyzing our metaproteomics data, we noted higher detection of stress-related proteins in the CT treatment (Figure 25). In addition to increased detection of damage and osmotic stress response proteins (cinA and csbD, respectively), we also detected a decrease in ribosomal proteins, amino acid transporters, and other metabolite transporters. These changes indicate a reallocation of resources away from growth (and theoretically toward survival strategies), which is a generalized stress response observed in many different organisms188. CT is known to sequester or bind metals; thus, organisms respond to this CT stressor by increasing the production of siderophores and other metal assimilation systems189. In support of this hypothesis, we saw a significant increase in iron transporter proteins in the presence of CT; however, the genome did not contain siderophores or other iron sequestration mechanisms (Figure 25). Interestingly, we also detected proteins likely originating from a prophage encoded within the genome. In other Streptococcus species, prophages have been shown to have crucial roles in capsule production190, trigger stationary phase under growth stress191, and become lytic in response to antibiotics192. Other studies have noted that Streptococcus spp. appear stressed in the presence of tannins, which is consistent with our results193. The occurrence of prophage within the genome will be an important consideration for selecting a model organism for future physiological experiments, because the prophage could cause cell lysis under higher CT concentrations. Alternatively, the prophage genes may contain mechanisms to regulate CT responses, such as capsule production in other Streptococcus sp.190, and should not be disregarded as an important mechanism for CT tolerance.

Overall, one contig contained the largest number of proteins detected in our proteomics (scaffold 8). This scaffold was very similar in gene content and organization to Lactobacillus plantarum WCFS1, which can degrade mixed dietary tannins (both CT and HT)144. Specifically, we detected two tannase enzymes downstream from gallate decarboxylase subunits C, B, and D (lpdCBD, Figure 26). The gallate decarboxylase subunits for S. gallolyticus are co-localized in the same orientation, which differs from L. plantarum WCFS1, where lpdC is in the opposite direction and in a different area of the genome. The gallate decarboxylase subunits were not detected in our proteomics, which is expected because the Sorghum CT used in this experiment does not contain any hydrolysable tannin. We did, however, detect peptides for the lysR and marR transcriptional regulators, one tannase enzyme, multiple hypothetical proteins, and other metabolic genes in this region. This suggests that this region of the genome may play an important role in mediating response to tannins in this S. gallolyticus isolate.

[Figure 25 heatmap: columns are CT and NCT; rows are individual proteins labeled by annotation and scaffold gene ID, grouped into Metabolism, Transcriptional regulators, Phenolic metabolism, Stress response, and Phage proteins; the color scale shows z-score from -1.5 to 1.5.]

Figure 25. Heatmap of proteins detected in proteomics with significant differences between CT treatments. Enzymes are sorted by functional category (listed on the left in bold) with specific gene annotation listed on the right. Values represent the average z-score for all replicates in a treatment. Dark red values mean that the protein was detected in higher amounts for that treatment. Treatments are abbreviated as follows: CT, condensed tannins added; NCT, no condensed tannins added.

[Figure 26 image: gene arrows along a genome region with position labels from 304,000 to 327,100, and a bar chart of average NSAF value (axis from 0.0000 to 0.0010) for CT and NCT. Gene labels recovered from the image include marR, lysR, TanR, lpdC, lpdB, lpdD, tanB, rpiR, cheY, glutathione reductase, metal-dep hydrolase, ceramidase, transporter, transcriptional regulator, and hypothetical protein.]

Figure 26. Gene orientation of putative phenolic metabolism genes. Bar chart on top represents the average NSAF value for each gene shown in both CT and Non CT (NCT). Gene names with significant differences between CT and NCT are highlighted in red. Arrows represent genes scaled to length and are colored by inferred function (light blue, hypothetical proteins; dark green, membrane transport protein; dark blue, putative phenolic metabolism gene; pink, transcriptional regulator; light green, other). Gene names are listed below the arrows. Gene names in red were detected significantly higher in proteomics in the presence of CT.

4.4 Conclusions and Outlook

My dissertation research sought to piece together the complexity of rumen microbial communities using multiple tools and analyses (Figure 27). A dietary experiment harnessing the seasonal differences in lignocellulose was performed to enrich for novel microorganisms. I then used metagenomics and metaproteomics tools from rumen fluid samples to profile the organisms present (who is there) and describe their metabolic potential and activity2 (Chapter 2). After, I examined the metabolites and detected proteins, coupled to genome-resolved metagenomics to profile the active rumen microbial community and infer metabolic interaction networks involved in plant carbon degradation (Chapter 3). Finally, I profiled the metabolic potential and expressed proteins of Streptococcus gallolyticus in the presence of CT, to determine proteins putatively involved in CT metabolism (Chapter 4). This isolate is the first to be documented to degrade pure forms of CT anaerobically (data not shown) and will serve as a model to understand how microorganisms tolerate and degrade condensed tannins. This work has advanced knowledge of the microbial interactions and metabolisms underpinning anaerobic carbon degradation in the rumen.

Figure 27. Integration of tools to understand rumen microbial communities. Puzzle pieces represent tools and analyses used in this work to better understand the rumen microbial ecosystem. This approach provided a more complete picture of carbon metabolism in the rumen.

Using metagenomic and single gene analyses, this research determined the microorganisms present in the rumen under different dietary conditions (Figure 27, Who is there?). One of the most interesting findings from this work was that changes in diet enriched for rare microbial taxa that were at extremely low abundance in spring, and became the most dominant taxa in the winter (Figure 5B). The dramatic change in abundance of these taxa could potentially represent a PCR artifact194 or sampling anomaly (e.g., taxa could be more abundant on the plants in the winter time, representing transient taxa)195. However, because these taxa were observed multiple times within a time series, it is much more likely that these microorganisms represent conditionally rare 103 taxa2,196 that respond to changes in rumen chemistry. Conditionally rare taxa, or taxa that are typically found in very low abundance but occasionally bloom under certain conditions, are responsible for most microbial community dynamics across ecosystems196. Rare taxa are of particular interest because it is thought that they can contribute to ecosystem stability by acting as a reservoir that rapidly responds to environmental changes197. This is an especially intriguing idea in a ruminant host ecosystem, where the functional role of rumen microbial communities is to harness energy from dietary plant material for the host to survive. Here, we enriched for taxa that were rare on a low lignocellulose diet, and showed that these taxa are the most abundant members (e.g., BS11, RC9, Ca. Aleyska) and also produce SCFA, the energy currency for the host. While we did not perform metagenomics and metaproteomics in the spring, I would hypothesize that these rare taxa are not active in the spring and that carbon degradation and SCFA production are performed by other microbial community members that are dominant in the spring. 
Our results provide evidence that the rumen ecosystem may maintain taxa at low abundance to allow for flexibility of dietary conditions. This idea is also supported by the fact that many of our conditionally rare taxa are found in over 90% of the ruminants sampled across the globe1.
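The working definition used above (taxa that sit at very low abundance but occasionally bloom) can be illustrated with a minimal sketch. The function name, the thresholds `rare_max` and `bloom_min`, and the abundance values are hypothetical placeholders chosen for illustration; they are not the cutoffs or the data used in this dissertation.

```python
def conditionally_rare_taxa(abundance, rare_max=0.001, bloom_min=0.01):
    """Flag taxa that are rare in at least one sample and abundant in another.

    abundance maps a taxon name to its relative abundance (fraction of the
    community) in each sample of a time series. Both thresholds are
    assumptions made for this sketch.
    """
    flagged = []
    for taxon, series in abundance.items():
        is_rare_sometimes = min(series) <= rare_max
        blooms_sometimes = max(series) >= bloom_min
        if is_rare_sometimes and blooms_sometimes:
            flagged.append(taxon)
    return sorted(flagged)


# Hypothetical relative abundances for spring, summer, and winter samples.
example = {
    "taxon_A": [0.0005, 0.0008, 0.12],     # rare, then blooms
    "taxon_B": [0.20, 0.18, 0.15],         # always abundant
    "taxon_C": [0.0009, 0.002, 0.08],      # rare, then blooms
    "taxon_D": [0.0001, 0.0002, 0.0003],   # always rare (possibly transient)
}
crt = conditionally_rare_taxa(example)
print(crt)
```

A taxon that is always rare (taxon_D) is not flagged, which mirrors the distinction drawn above between transient taxa and taxa that respond to changes in rumen chemistry.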

Another tool used in this research to understand the rumen microbial ecosystem was genome-resolved metaproteomics. Our goal was to further characterize the enriched taxa by predicting their specific roles in carbon degradation. One downside to genome-resolved metaproteomics is that it misses the functions active within the unbinned fraction. To address this concern, I mapped our metaproteomics data to the entire metagenome assembly. A total of 19,190 proteins were detected in the metaproteomics from winter rumen fluid. Just under half (43%) of these proteins were binned in genomes >70% complete. I examined the non-binned proteins to see if there were metabolisms and organisms missed in my proteomics analyses. The unbinned metaproteomics story is largely consistent with the genome resolved metaproteomics. Unbinned proteins detected were mainly taxonomically assigned to Prevotella, other Firmicutes and Bacteroidetes, and Protozoa. These proteins were likely unbinned due to strain heterogeneity or presence on a scaffold <2 kb, or they were binned but in a genome estimated to be <70% complete.

Glycoside hydrolases (GHs) were detected in the unbinned fraction but were largely assigned to the Bacteroidetes (15), further suggesting that our genome resolved view presented in Chapter 3 is not biased by the database selected. We did, however, detect GH proteins from Firmicutes (7), Protozoa (8), and Fungi (2). Further supporting the idea that the Bacteroidetes are responsible for the majority of carbon degradation in the winter, we detected an additional 500 SusCD-like proteins in the unbinned metaproteomics.

Using the scaffolds, rather than our near-complete genomes, revealed that there are proteins detected from Protozoa. Eukaryotes do not assemble well in metagenomics datasets198, and although we had scaffolds with protozoal genes on them, we could not confidently bin out these organisms. The genes were mainly taxonomically assigned to Oxytricha trifallax and Epidinium ecaudatum, which are known rumen ciliates199. There were also microorganisms for which we have genomes but that were not well represented in the metaproteomics, including Methanobrevibacter sp. (1 genome), Butyrivibrio sp. (5 genomes), and Fibrobacter sp. (1 genome). Perhaps these microorganisms are not active under these conditions in the rumen or were too low in abundance to be detected.

Metatranscriptomics, with its amplification, would allow deeper sampling of the active rumen microbial community and may provide insight into the functional role of these low-abundance microorganisms. Finally, this research focused entirely on saccharolytic metabolisms; however, we recognize that proteolysis, lipolysis, and urea cycling are important and present in the metaproteomics. We detected proteins from PUL that contain peptidases (Figure 15), as well as evidence in our Prevotella genomes for the degradation of amino acids. In the human gut, the amount of dietary nitrogen, and therefore microbially available nitrogen, is thought to shape the outcomes of competition for carbon substrates200. Biohydrogenation of fatty acids by rumen bacteria could also be important for fiber digestion201. While beyond the scope of this dissertation, future studies should consider these nitrogen based metabolisms to gain a more complete understanding of the rumen ecosystem as a whole.

The genome resolved metaproteomics data were used to examine carbon degradation networks present in the rumen on a high lignocellulose diet. Metabolic reconstruction from genomes showed that, in general, these organisms are functionally redundant based on their metabolic potential to ferment sugars and produce short-chain fatty acids (Figure 13). By overlaying metaproteomics and incorporating these data into a network analysis, it was clear that these organisms are operating in different functional guilds202. This is best exemplified with the Prevotellaceae family. Within this family, 11 genomes from three different genera were detected in metaproteomics. Members in this family belong to different trophic levels in our network analyses, with some only expressing proteins for polymer degradation and others only expressing proteins for sugar fermentation (Figure 18C). Further differentiating these closely related organisms, within the same trophic level these organisms also utilize different resources. We detected proteins for utilizing one or two substrates from some Prevotella sp. genomes (BACT31 and BACT29), whereas proteins for the degradation of over six different substrates were detected from others of the same genus (BACT32 and BACT33). These closely related organisms might be partitioning resources in order to coexist, similar to previous observations of the Bacteroidaceae family in the human gut203. This resource-based regulation can contribute to functional stability in the host ecosystem202 and eliminate the need for keystone species. Other studies have shown that Bacteroidetes are important ecosystem stabilizers, as they are abundant and highly connected in the ecological network204.

The hypothesis that closely related Bacteroidetes organisms differentiate their use of carbon substrates is intriguing and should be tested further. If and when resources permit, an experiment using feedbags with 13C-labeled hemicellulose and pectin polymers incubated in the fistulated moose rumen, paired with metagenomics (DNA-SIP), could shine light on differentiation of sugar utilization by closely related organisms. In coordination, 16S rRNA gene analyses could identify co-correlated taxa with specific carbon polymers and also be used to model the response of these members under different dietary regimes as has been performed recently in the mouse gut200.

This study was one of the first studies to incorporate metabolites, genome resolved metagenomics, and genome resolved metaproteomics to examine the rumen ecosystem. However, themes uncovered here have been seen in other recent rumen metagenomics and proteomics papers. Microorganisms identified here are prevalent in ruminants and other animals consuming woody biomass across the globe1,46,205. More recently, metagenome assembled genomes, viral genomes, and genomes from culture collections have provided a metabolic blueprint for these uncultivated taxa1,10,206, with many of the groups resolved here (Ca. Ruminaceae, Ca. Hungataceae, Ca. Algibacter) currently only originating from rumen samples. Genomic information from the rumen has expanded our knowledge of the rumen ecosystem by highlighting the prevalence of the polysaccharide utilization loci and the Bacteroidetes11, expanding the sampling of well-known phyla12, and describing microbial interactions with viruses30,39. For instance, our understanding of polysaccharide utilization loci is currently limited to gene surveys and laboratory assays with cultured isolates (e.g., Bacteroides, Flavobacterium, Prevotella). Gene surveys examining PULs across uncultivated taxa have identified many PULs without CAZymes and only hypothetical proteins in the locus, suggesting that there is much more functional diversity waiting to be discovered in the Bacteroidetes11,12.

Appendix A: Additional Data

The same DNA from samples collected for 16S rRNA gene analyses (Figure 2) was also submitted for 18S and ITS sequencing at the Joint Genome Institute. Both 18S and ITS communities differed significantly by treatment and by moose (MRPP, p<0.02) and visually clustered by treatment; however, spring and summer could not be visually distinguished, similar to the 16S rRNA gene analyses (Figure 28A, Figure 28B, Figure 4).

We correlated these community analyses to measured metabolites in the rumen fluid. Both 18S and ITS communities correlated significantly to acetate, propionate, butyrate, protein, lignin, and cellulose (envfit p<0.01) and did not correlate significantly to hemicellulose, similar to the 16S rRNA gene community analyses.

LEfSe analyses could not determine any significantly discriminant OTUs with an LDA above 2.0 with an alpha of 0.01 between treatment or moose in the 18S analyses. However, we did detect abundant 18S rRNA sequences in our metagenomics data using EMIRGE that belong to the SAR Alveolata phylum (Figure 28C). LEfSe analyses on the ITS data determined one OTU that is a discriminating feature between moose. This OTU belongs to the Amphisphaeriaceae family from the Ascomycota phylum, which has been previously detected in leaf litter207. There were a total of 15 discriminant features by treatment. Six OTUs were more abundant in the winter diet and distinguish the winter diet from spring and summer (Figure 28D). This is interesting because it is thought that fungi contribute to degradation of lignin and polyphenolics, which were 8% of dry matter and 0.6 mg/g BSA higher in the winter.

[Figure 28 image. Panels A and B: NMDS ordinations (NMDS1 vs. NMDS2) with fitted vectors for acetate, propionate, butyrate, crude protein, cellulose, lignin, and NDF; legend: spring, summer, winter, moose A, moose B. Panel C: relative abundance (%) of U57763_Ophryoscolex, HM212039_Trichostomatia_unknown, U57769_Trichostomatia_unknown, and AM158444_Trichostomatia_Entodinium. Panel D: LDA score (log 10) of Myriangiales_uncultured, Sporidiobolales_Unknown_26517, Curvibasidium, Mrakiella, Caecomyces, and Neocallimastigaceae_Unknown_27230. Phylum legend: Ascomycota, SAR_Alveolata, Basidiomycota, Neocallimastigomycota.]

Figure 28. 18S rRNA gene and ITS sequencing results. (A) Non-metric multidimensional scaling (NMDS) of Bray-Curtis similarity metric for 18S rRNA gene sequencing shows a significant separation of communities by treatment and moose. Samples are colored by treatment and shapes denote moose. Stress = 0.03. (B) NMDS of Bray-Curtis similarity metric for ITS sequencing shows a significant separation of communities by treatment and moose, with the key matching part A. (C) Relative abundance of reconstructed 18S rRNA genes in metagenomic data colored by phyla. (D) Significantly discriminant ITS features of the winter diet, colored by phyla.
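For readers unfamiliar with the metric named in panels A and B, the following is a minimal sketch of the Bray-Curtis dissimilarity between two samples (the similarity is one minus this value). The function name and the OTU counts are hypothetical and serve only as an illustration; the ordinations in this figure were not necessarily computed with this code.

```python
def bray_curtis(sample_a, sample_b):
    """Bray-Curtis dissimilarity between two equal-length count vectors.

    Each position must refer to the same OTU in both samples.
    """
    if len(sample_a) != len(sample_b):
        raise ValueError("samples must list the same OTUs in the same order")
    total = sum(sample_a) + sum(sample_b)
    if total == 0:
        return 0.0  # two empty samples are treated as identical
    # Sum of absolute count differences, scaled by the total counts.
    diff = sum(abs(a - b) for a, b in zip(sample_a, sample_b))
    return diff / total


# Hypothetical OTU counts for two samples.
sample_1 = [10, 0, 5]
sample_2 = [5, 5, 5]
dissimilarity = bray_curtis(sample_1, sample_2)
similarity = 1.0 - dissimilarity
print(round(dissimilarity, 3), round(similarity, 3))
```

A value of 0 means the two samples have identical counts for every OTU, and a value of 1 means they share no OTUs.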


Appendix B: Naming description and metabolic summary of all genomes

Metabolic summary and naming description for novel lineages

To resolve the placement of novel genomes in our dataset containing one or two genomes, we recruited near neighbors from recently published or publicly available rumen metagenomic datasets2,9-12,25. This included genomes from the Uncultured Bacteria and Archaea (UBA) dataset25, 99 partial genomes from a moose metagenome (MMAG)10, and smaller cattle rumen metagenomes (RMG and Hess)12. Here, for each of our novel genomes (e.g., representing new genera, families, orders, classes) we provide a summary of their metabolic potential, genes detected in metaproteomics, and the proposed taxonomy using the established Candidatus (Ca.) classification for the uncultivated majority76. Additionally, we provide a list of genomes from the databases above that belong to each of these proposed taxonomies (Table 2).


Bin names | Phylum | Class | Order | Family | Genus
BACT10, *BACT11, BACT13, BACT15, BACT4, BACT3, BACT9 | Bacteroidetes | Bacteroidia | Bacteroidales | Rikenellaceae | RC9
BACT22 | Bacteroidetes | Bacteroidia | Bacteroidales | Ca. Hungataceae | Undefined
BACT24, BACT25, BACT30 | Bacteroidetes | Bacteroidia | Bacteroidales | Prevotellaceae | Ca. Alyeska
BACT36, *BACT38, BACT39 | Bacteroidetes | Bacteroidia | Bacteroidales | Prevotellaceae | Ca. Palmerella
*BACT6, BACT7 | Bacteroidetes | Bacteroidia | Bacteroidales | Undefined | Undefined
BACT8 | Bacteroidetes | Bacteroidia | Bacteroidales | Ca. Ruminaceae | Undefined
BACT18 | Bacteroidetes | Bacteroidia | Bacteroidales | Ca. Denaliaceae | Undefined
BACT19, BACT20 | Bacteroidetes | Bacteroidia | Bacteroidales | Paludibacteraceae | Undefined
BACT21 | Bacteroidetes | Bacteroidia | Bacteroidales | Muribaculaceae | Ca. Ormerodus
*BACT14, BACT082, BACT201 | Bacteroidetes | Bacteroidia | Bacteroidales | BS11 | Ca. Alcium
BACT083, BACTP3 | Bacteroidetes | Bacteroidia | Bacteroidales | BS11 | Ca. Hemicellulyticus
FIRM4, FIRM5, *FIRM6 | Firmicutes | Clostridia | Clostridiales | Clostridiales vadinBB60 | Undefined
*FIRM20 | Firmicutes | Clostridia | Clostridiales | Lachnospiraceae | Ca. Hungatadium
*FIRM11, FIRM13 | Firmicutes | Clostridia | Clostridiales | Ruminococcaceae | Ca. Algibacter
*FIRM7, FIRM8 | Firmicutes | Clostridia | Clostridiales | Ruminococcaceae | Ca. Vansoestibacter
*TENER1-3 | Tenericutes | Mollicutes | RF9 | Ca. Tuttuvakaceae | Undefined
*TENER4 | Tenericutes | Mollicutes | RF9 | Ca. Tuttuvakaceae | Ca. Mikiplasma
*TM71, TM72, TM74 | Saccharibacteria | Nanosyncoccalia | Nanosyncoccales | Nanosynechococcus | Nanosyncoccus

Table 2. The phylogenetic affiliations of novel genomes reconstructed. Genomes are assigned to each taxonomic level with placement defined by >70% bootstrap support. Taxonomic levels that lacked either bootstrap support, three genomes in a monophyletic group, or a nearby existing reference genome are marked as undefined. Asterisks indicate genome bins containing a partial 16S rRNA gene sequence >300 bp, which was used to validate taxonomic assignment.

Description of novel groups in the Saccharibacteria, Lentisphaerae, Tenericutes, and Proteobacteria phyla

We recovered four novel Saccharibacteria genomes (TM71, TM72, TM73, TM74), which all belong to the same genus within a novel class in the Saccharibacteria (TM7). Two of the three genomes contain a partial 16S rRNA gene fragment, and these fragments are monophyletic. These genomes encode the metabolic potential to ferment simple sugars into lactate or acetate (Figure 12); however, we did not detect any of these proteins in our metaproteomics. We propose to assign these monophyletic genomes to Ca. Nanosyncoccalia (class), Ca. Nanosyncoccales (order), Ca. Nanosynecchococcus (family), Ca. Nanosyncoccus (genus).

We also recovered two genomes in the Lentisphaerae phylum. Metabolic reconstruction of these genomes suggested a fermentative lifestyle, with a partial EMP pathway, likely resulting in acetate production, but we did not detect these proteins in our metaproteomics. One of these genomes is similar to three previously recovered metagenomes from Rifle, CO (RifOxyA12, RifoxyGWF2, RifOxyC12)108, and together these represent a new class in the Lentisphaerae. The other genome in our dataset (LENT2) is similar to UBA genomes (UBA1407, UBA1724) and one Rifle genome (RifoxyB12)101, which are part of the class Lentisphaeria, but are not classified beyond this.

All of our recovered Tenericutes genomes (TENER1-4) are monophyletic and can be assigned to the Mollicutes class. Two of these genomes contain a partial 16S rRNA gene fragment that belongs to the order called RF9, which is an order lacking a cultivated representative and is prevalent across 99% of ruminant animals sampled1. Based on AAI analyses, the members of this monophyletic clade all belong to the same family208, for which we propose the name Ca. Tuttuvakaceae after the Inuit word for moose “Tuttuvak.” This family currently lacks sufficient genomic sampling to resolve genera. Our most dominant and active Tenericutes genome (TENER4) may rely on the fermentation of proteins and amino acids because in proteomics we only detected hypothetical proteins, endopeptidases, and ribosomal proteins. Two of our Ca. Tuttuvakaceae genomes (TENER1, TENER4) encode full pathways for glycolysis, while the other two (TENER2, TENER3) only encode the ability to break down 3-carbon sugars. All genomes encode few glycoside hydrolases, indicating reliance on fermentation of glucose or glycerol to acetate, the latter product providing energy for the ruminant.

Within the Proteobacteria, we reconstructed one novel genome in the Aeromonadaceae family within the Gammaproteobacteria. This genome is monophyletic with 6 genomes from the RMG and MMAG databases and likely represents a new genus; however, these genomes were lower quality and did not contain a 16S rRNA gene sequence. Given the lack of confident assignment, a name was not provided for this genus; however, the genome was deposited to JGI for use in future studies.

Description of novel groups within the Firmicutes

To confidently assign our genomic bins, we used a combination of genomic and single gene (16S rRNA) databases. Some phyla, such as the Firmicutes, do not have agreement across these databases208. For instance, in NCBI Catabacter belongs to the Catabacteraceae family within the Clostridiales120. In Silva, this family does not exist but is encompassed in the Christensenellaceae family15. Additionally, many named genera are not monophyletic, as genera such as Ruminococcus, Clostridium, and Eubacterium can be found in many separate monophyletic groups. This incongruence prevented the genus-level assignment of six of our Lachnospiraceae genome bins (FIRM21, FIRM19, FIRM17, FIRM15, FIRM16, FIRM27).

Three of our genomes (FIRM4-6) belong to the Clostridiales vadinBB60 group family within the Clostridiales Incertae Sedis order. FIRM5 and FIRM6 only contain the capacity to degrade 3-carbon sugars and were not detected in our metaproteomics. FIRM4 has the full glycolysis pathway and has the ability to produce acetate and hydrogen. These two groups are found on separate monophyletic clades; however, pairwise AAI results could not distinguish separate genera, as all of the genomes had <65% pairwise AAI209. In metaproteomics we detected proteins from FIRM4 for the degradation of mixed linked glucans, indicating this group's potential importance in carbon degradation. This group was also detected in UBA and RMG datasets.

Within the Ruminococcaceae we resolved one new genus, with two of our genomes (FIRM7, FIRM8), and genomes from the RMG (2) and UBA (8) datasets. These genomes encode the capacity to ferment many sugars including rhamnose, xylose, fucose, and glucose. FIRM8 encodes the capacity to degrade starch, chitin, galactan, xylan, and glucans, while FIRM7 can mainly cleave hemicellulose polymers like xylan, mannan and glucan. FIRM8 is expressing genes for rhamnose fermentation, while FIRM7 contains a highly expressed (11 unique peptides) GH16 likely involved in cellulose degradation, highlighting the differentiation within a genus for participation in the anaerobic carbon cycle. For this group of organisms, we propose the new genus name Ca. Vansoestibacter after the author of Nutritional Ecology of the Ruminant, Dr. Peter J. Van Soest3.

We also resolved a second genus in the Ruminococcaceae with no isolated representatives, which is supported by a full length (1,506 bp) 16S rRNA gene fragment in FIRM11 that is 97% similar to the {Eubacterium} coprostanoligenes group (EU844054, KC162980), with brackets indicating that the taxonomy of this group needs to be reassigned15. FIRM13 can utilize many sugars (glucose, fucose, arabinose, galactose and mannose) and is expressing genes for acetate production (acetate kinase). FIRM13 also contains two dockerin and two cohesin molecules and 21 glycoside hydrolases (GH5, GH31, GH3, GH4, GH20, GH63), with one scaffold containing three GHs, including Cellulase M, indicating that this organism may use cellulosomes to degrade complex plant polymers, although no evidence for this expression was found in the metaproteomic data210. FIRM11 is expressing genes for fermenting fucose via glycolysis and also has the capacity to ferment galactose, xylose, mannose and rhamnose. As genomes from the moose metagenome10 (105 bin19, 103 bin67) assisted in the resolution of this genus, we propose to name this genus Ca. Algibacter for the Swedish word for moose “Älg”.

One of our genomes (FIRM20) belongs to a monophyletic clade with other genomes sequenced as part of the Hungate 1000 project (http://www.rmgnetwork.org/hungate1000.html) (Figure 13)11. FIRM20 encodes the capacity to ferment fucose, arabinose, galactose, xylose, and mannose, and is expressing genes for fucose fermentation. We propose to name this group of organisms within the Lachnospiraceae Ca. Hungatadium after Dr. Robert Hungate.

Description of sampled families within the Bacteroidales

Within the Bacteroidetes we sampled genomes from at least 10 families within the Bacteroidales, resolving 3 new families and 3 new genera. Notably, with the combined rumen datasets9-12,25 and our genomes, the BS11 family and the previously described isolate Lentimicrobium saccharophilum within the proposed Lentimicrobiaceae form a monophyletic family, with at least 6 monophyletic groups. Previous phylogenetic analyses of this group utilized the Greengenes database release 05_13, which did not contain BS11 as a classification211. The metabolic capacity of BS11 members was previously described by our group2. This family now contains 52 members, with at least 3 resolved genera, including Lentimicrobium211, Ca. Alcium and Ca. Hemicellulyticus2.

We also resolved 3 other families within the order Bacteroidales, a group which single gene analyses have uncovered across many host ecosystems but which has remained largely unresolved until recent metagenomic sampling212. Genome BACT8 belongs to a newly resolved family, which only contains genomes from rumen metagenomic datasets, and thus we propose the family name Ca. Ruminaceae. BACT8 encodes the capacity to ferment fructose and galactose, and to produce lactate and succinate. It contains one PUL system for arabinan degradation, and three other PULs for which a substrate could not be confidently determined. In metaproteomics we detected proteins involved in gliding motility, ribosomal proteins, and one protein within a PUL for which a substrate could not be assigned.

Our phylogenetic analyses resolved a second family in the Bacteroidales, here named Ca. Hungataceae (after Robert Hungate, the father of rumen microbiology), composed of BACT22 from our dataset, genome AJ from Hess 2011 and 2 UBA genomes9,25. BACT22 is a highly versatile organism with the capacity to utilize every sugar examined, except rhamnose, and can degrade many different hemicellulose polymers. BACT22 is one of our hub genomes as it is expressing genes for the degradation of many different substrates (mannan, glucose, mannose, xylose, arabinose, and galactose).

The third newly resolved family in Bacteroidales (here named Ca. Denaliaceae after the national park in Alaska where moose are well-studied) contains BACT18, and seven other genomes recovered from anaerobic reactors and elephant feces metagenomes. BACT18 has the capacity to degrade xyloglucan, and cleave many sugars from polymeric substrates. It also encodes the capacity to ferment all sugars examined. BACT18 also encodes respiratory capabilities with a full NADH dehydrogenase complex I and fumarate reductase. BACT18 was one of our only Bacteroidetes genomes found in the sugar utilization trophic level, as we detected proteins for the degradation of rhamnose, arabinose, and xylose but not for the degradation of plant polymers. This could be a result of the PULs in BACT18 lacking confident substrate annotation or the low abundance of BACT18 in our metagenomes.

Description of novel genera within known Bacteroidales families

In addition to three new families, we also recovered three new genera existing within established families in the order Bacteroidales. One novel genus contains three genomes that clade with Paraprevotella clara by concatenated ribosomal protein analyses, but without significant bootstrap support. One of these genomes contains a partial 16S rRNA gene (374 base pairs) that is 90% identical to Prevotellaceae UCG-003 genus. These results suggest that these genomes are the first genomic sampling of a novel genus here named Candidatus Palmerella after Palmer, Alaska where our study took place (BACT36, BACT38, BACT39). All of these genomes encode an Rnf complex that likely participates in ion-motive electron transport90, are lacking an NADH dehydrogenase complex I, contain genes for central glycolysis and encode the capacity for the degradation of fructose, galactose, and arabinose. There is differential capacity between these genomes to ferment rhamnose, mannose, xylose and fucose. BACT39 was the only Ca. Palmerella genome detected in our metaproteomics and is expressing genes for the degradation of xylan and the fermentation of xylose and mannose, indicating a role in hemicellulose breakdown.

Also within the Prevotellaceae, three of our genomes (BACT24, BACT25, BACT30) and nine UBA genomes, all from rumen metagenomic datasets, are monophyletic away from previously described genera within Prevotellaceae. These genomes encode a wide capacity of polymer degradation, with the exception of chitin. Additionally, these genomes contain complete glycolysis pathways, an Rnf complex, and the capacity to produce succinate and acetate. Here we propose the name Ca. Alyeska, after the archaic spelling of Alaska where our study took place. One Ca. Alyeska bin 25 is a hub genome, expressing genes for the degradation of many compounds, highlighting this group’s importance in the rumen.

The third genus we resolved was within the S24-7 (Muribaculaceae) family, with one of our genomes (BACT21) and seven genomes from the UBA, RMG and MMAG datasets. This genome encodes a wide respiratory capacity, as we detected cytochrome b1, cytochrome d ubiquinol, NADH dehydrogenase complex I, and fumarate reductase. We did not detect the expression of this metabolism; thus we predict that BACT21 is breaking down starch, mannan, and fructan and fermenting glucose. For this genus, we propose the name Ca. Ormerodus after the first author of the paper containing the first S24-7 genomes and the resolution of many new families in the Bacteroidales212.

Finally, we also sampled seven genomes from the RC9 gut group genus within the Rikenellaceae family. This genus (labeled as unclassified Bacteroidales in Greengenes) contains 10,855 accession numbers in Silva obtained from rumen, cecum, feces, and wastewater samples15. With our genomes (7) and recently published genomes from others10,25, this genus now contains genomes from 80 members. Surprisingly, despite prior genomic sampling, the metabolism of these lineages remained undefined. Here we reconstructed the metabolism of our RC9 genomes and identified the potential for hemicellulose and pectin polymer degradation using the Polysaccharide Utilization Loci system. The RC9 genomes encode up to 16 PUL systems per genome and on average are expressing PULs for the degradation of three different substrates. BACT13, BACT15, and BACT11 are the most active RC9 members, expressing PULs for pectin, starch, glucans, mannan, peptides, and arabinans (Figure 15). Although active in carbon polymer degradation, the RC9 genomes are also expressing genes for sugar fermentation. Unlike the Prevotella sp., the RC9 are expressing genes for the fermentation of one sugar, and it is typically a sugar present in the polymers they can degrade. For instance, RC9 sp. bin BACT10 is expressing genes for pectin and starch degradation, and genes for the fermentation of rhamnose, a common pectin sugar. RC9 sp. bin BACT13 is expressing PULs for the degradation of arabinan, pectin, and xylan and is fermenting arabinose, rhamnose, and xylose. This metabolically versatile genus within the Rikenellaceae now has a metabolic role assigned and a functional activity that can be used for the cultivation of this genus or to infer the role of these organisms in single gene studies.

Respiratory capacity recovered in genomes

The reduction of trimethylamine-N-oxide (TMAO) to trimethylamine (TMA) has not previously been seen in the rumen, but has been observed in other cultured organisms from the gut213. While we detected the presence of these genes in our genomes, we did not detect any of these metabolisms in our metaproteomics. Furthermore, we did not detect TMAO or the resulting product TMA in our winter rumen fluid NMR data. We did detect TMA in our spring and summer rumen fluids, which may indicate that this metabolism is occurring in the rumen under different dietary conditions. TMA can also result from the degradation of phosphatidylcholine, which is found in many plant membranes214, and can be further oxidized to TMAO, which requires oxygen. The capacity for anaerobic respiration using fumarate has been observed in Prevotella, Porphyromonas, Fibrobacter, and Ruminococcus species isolated from the rumen, although this gene can also operate during fermentation for succinate production and may be a mechanism for rumen organisms to maintain low partial H2 pressures215.


References

1 Henderson, G. et al. Rumen microbial community composition varies with diet and host, but a core microbiome is found across a wide geographical range. Sci Rep 5, 14567, doi:10.1038/srep14567 (2015).
2 Solden, L. M. et al. New roles in hemicellulosic sugar fermentation for the uncultivated Bacteroidetes family BS11. The ISME journal 11, 691-703 (2017).
3 Van Soest, P. Nutritional Ecology of the Ruminant. (1994).
4 Rubino, F. et al. Divergent functional isoforms drive niche specialisation for nutrient acquisition and use in rumen microbiome. The ISME journal 11, 932-944, doi:10.1038/ismej.2016.172 (2017).
5 Emerson, E. L. & Weimer, P. J. Fermentation of model hemicelluloses by Prevotella strains and Butyrivibrio fibrisolvens in pure culture and in ruminal enrichment cultures. Appl Microbiol Biotechnol 101, 4269-4278, doi:10.1007/s00253-017-8150-7 (2017).
6 Purushe, J. et al. Comparative genome analysis of Prevotella ruminicola and Prevotella bryantii: insights into their environmental niche. Microbial ecology 60, 721-729 (2010).
7 Halliwell, G. & Bryant, M. The cellulolytic activity of pure strains of bacteria from the rumen of cattle. Microbiology 32, 441-448 (1963).
8 Brulc, J. M. et al. Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proceedings of the National Academy of Sciences of the United States of America 106, 1948-1953, doi:10.1073/pnas.0806191105 (2009).
9 Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463-467, doi:10.1126/science.1200387 (2011).
10 Svartström, O. et al. Ninety-nine de novo assembled genomes from the moose (Alces alces) rumen microbiome provide new insights into microbial plant biomass degradation. The ISME journal (2017).
11 Seshadri, R. et al. Cultivation and sequencing of rumen microbiome members from the Hungate1000 Collection. Nature biotechnology (2018).
12 Stewart, R. D. et al. Assembly of 913 microbial genomes from metagenomic sequencing of the cow rumen. Nat Commun 9, 870 (2018).
13 Pope, P. B. et al. Metagenomics of the Svalbard reindeer rumen microbiome reveals abundance of polysaccharide utilization loci. PloS one 7, e38571, doi:10.1371/journal.pone.0038571 (2012).
14 DeSantis, T. Z. et al. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Applied and environmental microbiology 72, 5069-5072, doi:10.1128/AEM.03006-05 (2006).
15 Quast, C. et al. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res 41, D590-596, doi:10.1093/nar/gks1219 (2013).
16 Dai, X. et al. Metagenomic insights into the fibrolytic microbiome in yak rumen. PloS one 7, e40430 (2012).
17 Del Pozo, M. V. et al. Microbial β-glucosidases from cow rumen metagenome enhance the saccharification of lignocellulose in combination with commercial cellulase cocktail. Biotechnology for biofuels 5, 73 (2012).
18 Duan, C. J. et al. Isolation and partial characterization of novel genes encoding acidic cellulases from metagenomes of buffalo rumens. Journal of applied microbiology 107, 245-256 (2009).
19 Ferrer, M. et al. Functional metagenomics unveils a multifunctional glycosyl hydrolase from the family 43 catalysing the breakdown of plant polymers in the calf rumen. PloS one 7, e38134 (2012).
20 Ferrer, M. et al. Novel hydrolase diversity retrieved from a metagenome library of bovine rumen microflora. Environmental Microbiology 7, 1996-2010 (2005).
21 Ross, E. M. et al. High throughput whole rumen metagenome profiling using untargeted massively parallel sequencing. BMC genetics 13, 53 (2012).
22 Singh, K. M. et al. Metagenomic analysis of Surti buffalo (Bubalus bubalis) rumen: a preliminary study. Molecular biology reports 39, 4841-4848 (2012).
23 Wang, D., Williams, B. A., Ferruzzi, M. G. & D’Arcy, B. R. Microbial metabolites, but not other phenolics derived from grape seed phenolic extract, are transported through differentiated Caco-2 cell monolayers. Food chemistry 138, 1564-1573 (2013).
24 Tyson, G. W. et al. Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428, 37 (2004).
25 Parks, D. H. et al. Recovery of nearly 8,000 metagenome-assembled genomes substantially expands the tree of life. Nature microbiology 2, 1533 (2017).
26 Debroas, D. & Blanchart, G. Interactions between proteolytic and cellulolytic rumen bacteria during hydrolysis of plant cell wall protein. Reproduction Nutrition Development 33, 283-288 (1993).
27 Miron, J., Duncan, S. H. & Stewart, C. Interactions between rumen bacterial strains during the degradation and utilization of the monosaccharides of barley straw cell‐walls. Journal of Applied Microbiology 76, 282-287 (1994).
28 Miron, J. & Ben-Ghedalia, D. Digestion of cell-wall monosaccharides of ryegrass and alfalfa hays by the ruminal bacteria Fibrobacter succinogenes and Butyrivibrio fibrisolvens. Canadian journal of microbiology 39, 780-786 (1993).
29 Fondevila, M. & Dehority, B. Interactions between Fibrobacter succinogenes, Prevotella ruminicola, and Ruminococcus flavefaciens in the digestion of cellulose from forages. Journal of animal science 74, 678-684 (1996).
30 Anderson, C. L., Sullivan, M. B. & Fernando, S. C. Dietary energy drives the dynamic response of bovine rumen viral communities. Microbiome 5, 155 (2017).
31 Klieve, A. V. & Swain, R. A. Estimation of ruminal bacteriophage numbers by pulsed-field gel electrophoresis and laser densitometry. Applied and environmental microbiology 59, 2299-2303 (1993).
32 Raya, R. R. et al. Isolation and characterization of a new T-even bacteriophage, CEV1, and determination of its potential to reduce O157: H7 levels in sheep. Applied and environmental microbiology 72, 6405-6410 (2006).
33 Klieve, A. et al. Bacteriophages that infect the cellulolytic ruminal bacterium Ruminococcus albus AR67. Lett Appl Microbiol 38, 333-338 (2004).
34 Lockington, R. A., Attwood, G. T. & Brooker, J. D. Isolation and characterization of a temperate bacteriophage from the ruminal anaerobe Selenomonas ruminantium. Applied and environmental microbiology 54, 1575-1580 (1988).
35 Klieve, A. V., Gregg, K. & Bauchop, T. Isolation and characterization of lytic phages from Bacteroides ruminicola ss brevis. Current Microbiology 23, 183-187 (1991).
36 Ross, E. M., Petrovski, S., Moate, P. J. & Hayes, B. J. Metagenomics of rumen bacteriophage from thirteen lactating dairy cattle. BMC microbiology 13, 242 (2013).
37 Kav, A. B. et al. Insights into the bovine rumen plasmidome. Proceedings of the National Academy of Sciences 109, 5452-5457 (2012).
38 Berg Miller, M. E. et al. Phage–bacteria relationships and CRISPR elements revealed by a metagenomic survey of the rumen microbiome. Environmental microbiology 14, 207-227 (2012).
39 Gilbert, R. A. et al. Toward Understanding Phage: Host Interactions in the Rumen; Complete Genome Sequences of Lytic Phages Infecting Rumen Bacteria. Frontiers in microbiology 8 (2017).
40 Breitbart, M. et al. Genomic analysis of uncultured marine viral communities. Proceedings of the National Academy of Sciences 99, 14250-14255 (2002).
41 Parsley, L. C. et al. Census of the viral metagenome within an activated sludge microbial assemblage. Applied and environmental microbiology 76, 2673-2677 (2010).
42 Reyes, A. et al. Viruses in the faecal microbiota of monozygotic twins and their mothers. Nature 466, 334 (2010).
43 Howard-Varona, C., Hargreaves, K. R., Abedon, S. T. & Sullivan, M. B. Lysogeny in nature: mechanisms, impact and ecology of temperate phages. The ISME journal (2017).
44 Kim, M.-S., Park, E.-J., Roh, S. W. & Bae, J.-W. Diversity and abundance of single-stranded DNA viruses in human feces. Applied and environmental microbiology 77, 8062-8070 (2011).
45 Davies, E. V., Winstanley, C., Fothergill, J. L. & James, C. E. The role of temperate bacteriophages in bacterial infection. FEMS microbiology letters 363 (2016).
46 Creevey, C. J., Kelly, W. J., Henderson, G. & Leahy, S. C. Determining the culturability of the rumen bacterial microbiome. Microbial biotechnology 7, 467-479 (2014).
47 Caporaso, J. G. et al. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample.
Proceedings of the National Academy of Sciences of the United States of America 108 Suppl 1, 4516-4522, doi:10.1073/pnas.1000080107 (2011). 125 48 Spalinger, D. E., Collins, W. B., Hanley, T. A., Cassara, N. E. & Carnahan, A. M. The impact of tannins on protein, dry matter, and energy digestion in moose (Alces alces). Canadian Journal of Zoology 88, 977-987, doi:10.1139/z10-064 (2010). 49 AOAC(Association of Official Analytical Chemists). Fiber (Acid Detergent) and Lignin in Animal Feed (973.18) in Official methods of analyses, AOAC International, Arlington, VA. (1990). 50 Weljie, A. M., Newton, J., Mercier, P., Carlson, E., Slupsky, C.M. Targeted profiling: quantitative analysis of 1H NMR metabolomics data. Analytical Chemistry 78 (2006). 51 Miller, C. S., Baker, B. J., Thomas, B. C., Singer, S. W. & Banfield, J. F. EMIRGE: reconstruction of full-length ribosomal genes from microbial community short read sequencing data. Genome Biol 12, R44, doi:10.1186/gb- 2011-12-5-r44 (2011). 52 Peng, Y., Leung, H. C., Yiu, S. M. & Chin, F. Y. IDBA-UD: a de novo assembler for single-cell and metagenomic sequencing data with highly uneven depth. Bioinformatics 28, 1420-1428, doi:10.1093/bioinformatics/bts174 (2012). 53 Wrighton, K. C. et al. Fermentation, hydrogen, and sulfur metabolism in multiple uncultivated . Science 337, 1661-1665, doi:10.1126/science.1224041 (2012). 54 Daly R.A., B. M. A., Wilkins M.J. , Hoyt D.W., Kountz D.J., Wolfe R.A., Welch S.A., Marcus D.N., Trexler R.V, MacRae J. , Krzycki J.A. , Cole D.R., Mouser P.J., Wrighton K.C. . Microbial metabolisms in a 2.5 km deep ecosystem created by hydraulic fracturing in shales. Nature Microbiology in press (2016). 55 Sharon, I. et al. Time series community genomics analysis reveals rapid shifts in bacterial species, strains, and phage during infant gut colonization. Genome Res 23, 111-120, doi:10.1101/gr.142315.112 (2013). 56 Wu, M. & Eisen, J. A. A simple, fast, and accurate method of phylogenomic inference. 
Genome Biol 9, R151, doi:10.1186/gb-2008-9-10-r151 (2008). 57 Edgar, R. C. MUSCLE: a multiple sequence alignment method with reduced time and space complexity. BMC Bioinformatics 5, 113, doi:10.1186/1471-2105-5-113 (2004). 58 Talavera, G. & Castresana, J. Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments. Syst Biol 56, 564-577, doi:10.1080/10635150701472164 (2007). 59 Darriba, D., Taboada, G. L., Doallo, R. & Posada, D. ProtTest 3: fast selection of best-fit models of protein evolution. Bioinformatics 27, 1164-1165, doi:10.1093/bioinformatics/btr088 (2011). 60 Stamatakis, A. RAxML version 8: a tool for phylogenetic analysis and post- analysis of large phylogenies. Bioinformatics 30, 1312-1313, doi:10.1093/bioinformatics/btu033 (2014). 61 Letunic, I. & Bork, P. Interactive Tree Of Life (iTOL): an online tool for phylogenetic tree display and annotation. Bioinformatics 23, 127-128, doi:10.1093/bioinformatics/btl529 (2007).

126 62 Kim, S. & Pevzner, P. A. MS-GF+ makes progress towards a universal database search tool for proteomics. Nat Commun 5, 5277, doi:10.1038/ncomms6277 (2014). 63 Elias, J. E. & Gygi, S. P. Target-decoy search strategy for mass spectrometry- based proteomics. Methods Mol Biol 604, 55-71, doi:10.1007/978-1-60761-444- 9_5 (2010). 64 Wunschmann, A. et al. Necropsy findings in 62 opportunistically collected free- ranging moose (alces alces) from Minnesota, USA (2003-13). Journal of wildlife diseases 51, 157-165, doi:10.7589/2014-02-037 (2015). 65 Ishaq, S. L. & Wright, A. D. Insight into the bacterial gut microbiome of the North American moose (Alces alces). BMC Microbiol 12, 212, doi:10.1186/1471- 2180-12-212 (2012). 66 Gharechahi, J., Zahiri, H. S., Noghabi, K. A. & Salekdeh, G. H. In-depth diversity analysis of the bacterial community resident in the camel rumen. Syst Appl Microbiol 38, 67-76, doi:10.1016/j.syapm.2014.09.004 (2015). 67 Gruninger, R. J., Sensen, C. W., McAllister, T. A. & Forster, R. J. Diversity of rumen bacteria in canadian cervids. PloS one 9, e89682, doi:10.1371/journal.pone.0089682 (2014). 68 Guo, W. et al. Evaluation of composition and individual variability of rumen microbiota in yaks by 16S rRNA high-throughput sequencing technology. Anaerobe 34, 74-79, doi:10.1016/j.anaerobe.2015.04.010 (2015). 69 Koike, S., Yoshitani, S., Kobayashi, Y. & Tanaka, K. Phylogenetic analysis of fiber-associated rumen bacterial community and PCR detection of uncultured bacteria. FEMS Microbiology Letters 229, 23-30, doi:10.1016/s0378- 1097(03)00760-2 (2003). 70 Kong, H. H. & Segre, J. A. Skin microbiome: looking back to move forward. J Invest Dermatol 132, 933-939, doi:10.1038/jid.2011.417 (2012). 71 McCann, J. C., Wiley, L. M., Forbes, T. D., Rouquette, F. M., Jr. & Tedeschi, L. O. Relationship between the rumen microbiome and residual feed intake- efficiency of Brahman bulls stocked on bermudagrass pastures. 
PloS one 9, e91864, doi:10.1371/journal.pone.0091864 (2014). 72 Ley, R. E. et al. Evolution of mammals and their gut microbes. Science 320, 1647-1651, doi:10.1126/science.1155725 (2008). 73 Yamano, H., Koike, S., Kobayashi, Y. & Hata, H. Phylogenetic analysis of hindgut microbiota in Hokkaido native horses compared to light horses. Animal Science Journal 79, 234-242, doi:10.1111/j.1740-0929.2008.00522.x (2008). 74 Leach, A. L., Chong, J. P. & Redeker, K. R. SSuMMo: rapid analysis, comparison and visualization of microbial communities. Bioinformatics 28, 679-686, doi:10.1093/bioinformatics/bts017 (2012). 75 Wang, Q., Garrity, G. M., Tiedje, J. M. & Cole, J. R. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Applied and environmental microbiology 73, 5261-5267, doi:10.1128/AEM.00062-07 (2007).

127 76 Konstantinidis, K. T. & Rossello-Mora, R. Classifying the uncultivated microbial majority: A place for metagenomic data in the Candidatus proposal. Syst Appl Microbiol 38, 223-230, doi:10.1016/j.syapm.2015.01.001 (2015). 77 Carrondo, M. A. Ferritins, iron uptake and storage from the bacterioferritin viewpoint. EMBO J 22, 1959-1968, doi:10.1093/emboj/cdg215 (2003). 78 McBride, M. J. & Zhu, Y. Gliding motility and Por secretion system genes are widespread among members of the phylum bacteroidetes. J Bacteriol 195, 270- 278, doi:10.1128/JB.01962-12 (2013). 79 Mignot, T., Shaevitz, J. W., Hartzell, P. L. & Zusman, D. R. Evidence that focal adhesion complexes power bacterial gliding motility. Science 315, 853-856, doi:10.1126/science.1137223 (2007). 80 Etzold, S. & Juge, N. Structural insights into bacterial recognition of intestinal mucins. Current opinion in structural biology 28, 23-31 (2014). 81 Macfarlane, G., Gibson, G. & Cummings, J. Comparison of fermentation reactions in different regions of the human colon. Journal of applied microbiology 72, 57-64 (1992). 82 Frederiksen, R. F. et al. Bacterial chitinases and chitin-binding proteins as virulence factors. Microbiology 159, 833-847, doi:10.1099/mic.0.051839-0 (2013). 83 Naas, A. E. et al. Do rumen Bacteroidetes utilize an alternative mechanism for cellulose degradation? mBio 5, e01401-01414 (2014). 84 Rosewarne, C. P., Pope, P. B., Cheung, J. L. & Morrison, M. Analysis of the bovine rumen microbiome reveals a diversity of Sus-like polysaccharide utilization loci from the bacterial phylum Bacteroidetes. Journal of industrial microbiology & biotechnology 41, 601-606 (2014). 85 Scheller, H. V. & Ulvskov, P. Hemicelluloses. Annu Rev Plant Biol 61, 263-289, doi:10.1146/annurev-arplant-042809-112315 (2010). 86 Hackmann, T. J. & Firkins, J. L. Electron transport phosphorylation in rumen butyrivibrios: unprecedented ATP yield for glucose fermentation to butyrate. 
Front Microbiol 6, 622, doi:10.3389/fmicb.2015.00622 (2015). 87 Kelly, W. J. et al. The glycobiome of the rumen bacterium Butyrivibrio proteoclasticus B316T highlights adaptation to a polysaccharide-rich environment. PloS one 5, e11942 (2010). 88 Yomano, L., York, S., Shanmugam, K. & Ingram, L. Deletion of methylglyoxal synthase gene (mgsA) increased sugar co-metabolism in ethanol-producing Escherichia coli. Biotechnology letters 31, 1389-1398 (2009). 89 Hall, S. J., Treffkorn, J. & Silver, W. L. Breaking the enzymatic latch: impacts of reducing conditions on hydrolytic enzyme activity in tropical forest soils. Ecology 95, 2964-2973, doi:10.1890/13-2151.1 (2014). 90 Biegel, E. & Muller, V. Bacterial Na+-translocating ferredoxin:NAD+ oxidoreductase. Proceedings of the National Academy of Sciences of the United States of America 107, 18138-18142, doi:10.1073/pnas.1010318107 (2010). 91 Schut, G. J. & Adams, M. W. The iron-hydrogenase of Thermotoga maritima utilizes ferredoxin and NADH synergistically: a new perspective on anaerobic

128 hydrogen production. J Bacteriol 191, 4451-4457, doi:10.1128/JB.01582-08 (2009). 92 Baker, D. N., Norris K.H., Li, B.W. in Dietary Fibers: Chemistry and Nutrition 67-79 (2012). 93 Lenart, E. A., Bowyer, R. T., Hoef, J. V. & Ruess, R. W. Climate change and caribou: effects of summer weather on forage. Canadian Journal of Zoology 80, 664-678, doi:10.1139/z02-034 (2002). 94 Mcart, S. H. et al. Summer dietary nitrogen availability as a potential bottom-up constraint on moose in south-central Alaska. Ecology 90, 1400-1411, doi:Doi 10.1890/08-1435.1 (2009). 95 Tape, K., Sturm, M. & Racine, C. The evidence for shrub expansion in Northern Alaska and the Pan-Arctic. Global Change Biology 12, 686-702, doi:10.1111/j.1365-2486.2006.01128.x (2006). 96 Kaarlejärvi, E. et al. Effects of Warming on Shrub Abundance and Chemistry Drive Ecosystem-Level Changes in a Forest–Tundra Ecotone. Ecosystems 15, 1219-1233, doi:10.1007/s10021-012-9580-9 (2012). 97 Lavola, A. et al. Combination treatment of elevated UVB radiation, CO and temperature has little effect on silver birch (Betula pendula) growth and phytochemistry. Physiologia plantarum, doi:10.1111/ppl.12051 (2013). 98 Ishaq, S. L. & Wright, A. D. High-throughput DNA sequencing of the ruminal bacteria from moose (Alces alces) in Vermont, Alaska, and Norway. Microbial ecology 68, 185-195, doi:10.1007/s00248-014-0399-0 (2014). 99 Hobson, P. N. & Stewart, C. S. The rumen microbial ecosystem. (Springer Science & Business Media, 2012). 100 Kang, D. D., Froula, J., Egan, R. & Wang, Z. MetaBAT, an efficient tool for accurately reconstructing single genomes from complex microbial communities. PeerJ 3, e1165 (2015). 101 Anantharaman, K. et al. Thousands of microbial genomes shed light on interconnected biogeochemical processes in an aquifer system. Nat Commun 7 (2016). 102 Consortium, U. The universal protein resource (UniProt) in 2010. Nucleic acids research 38, D142-D148 (2010). 103 Edgar, R. C. 
Search and clustering orders of magnitude faster than BLAST. Bioinformatics 26, 2460-2461 (2010). 104 Hyatt, D., LoCascio, P. F., Hauser, L. J. & Uberbacher, E. C. Gene and translation initiation site prediction in metagenomic sequences. Bioinformatics 28, 2223- 2230 (2012). 105 Kanehisa, M., Goto, S., Sato, Y., Furumichi, M. & Tanabe, M. KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40, D109-114, doi:10.1093/nar/gkr988 (2012). 106 Zdobnov, E. M. & Apweiler, R. InterProScan–an integration platform for the signature-recognition methods in InterPro. Bioinformatics 17, 847-848 (2001). 107 Zhang, X. et al. Assessing the impact of protein extraction methods for human gut metaproteomics. Journal of proteomics (2017).

129 108 Hug, L. A. et al. A new view of the tree of life. Nature Microbiology 1, 16048 (2016). 109 Kearse, M. et al. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28, 1647-1649 (2012). 110 Bolam, D. N. & Koropatkin, N. M. Glycan recognition by the Bacteroidetes Sus- like systems. Current opinion in structural biology 22, 563-569 (2012). 111 Koropatkin, N. M., Cameron, E. A. & Martens, E. C. How glycan metabolism shapes the human gut microbiota. Nature reviews. Microbiology 10, 323 (2012). 112 Yin, Y. et al. dbCAN: a web resource for automated carbohydrate-active enzyme annotation. Nucleic acids research 40, W445-W451 (2012). 113 Kabisch, A. et al. Functional characterization of polysaccharide utilization loci in the marine Bacteroidetes ‘Gramella forsetii'KT0803. The ISME journal 8, 1492 (2014). 114 Martens, E. C., Chiang, H. C. & Gordon, J. I. Mucosal glycan foraging enhances fitness and transmission of a saccharolytic human gut bacterial symbiont. Cell host & microbe 4, 447-457 (2008). 115 Martens, E. C., Koropatkin, N. M., Smith, T. J. & Gordon, J. I. Complex glycan catabolism by the human gut microbiota: the Bacteroidetes Sus-like paradigm. Journal of Biological Chemistry 284, 24673-24677 (2009). 116 Moller, I. et al. High-throughput screening of monoclonal antibodies against plant cell wall glycans by hierarchical clustering of their carbohydrate microarray binding profiles. Glycoconjugate journal 25, 37-48 (2008). 117 Roux, S., Enault, F., Hurwitz, B. L. & Sullivan, M. B. VirSorter: mining viral signal from microbial genomic data. PeerJ 3, e985 (2015). 118 Roux, S. et al. Ecogenomics and potential biogeochemical impacts of globally abundant ocean viruses. Nature (2016). 119 Li, W. & Godzik, A. Cd-hit: a fast program for clustering and comparing large sets of protein or nucleotide sequences. Bioinformatics 22, 1658-1659 (2006). 120 Pruitt, K. D., Tatusova, T., Brown, G. 
R. & Maglott, D. R. NCBI Reference Sequences (RefSeq): current status, new features and genome annotation policy. Nucleic acids research 40, D130-D135 (2011). 121 Bolduc, B., Youens-Clark, K., Roux, S., Hurwitz, B. L. & Sullivan, M. B. iVirus: facilitating new insights in viral ecology with software and community data sets imbedded in a cyberinfrastructure. The ISME journal 11, 7 (2017). 122 Ishaq, S. L., Sundset, M. A., Crouse, J. & Wright, A.-D. G. High-throughput DNA sequencing of the moose rumen from different geographical locations reveals a core ruminal methanogenic archaeal diversity and a differential ciliate protozoal diversity. Microbial Genomics 1 (2015). 123 Bowers, R. M. et al. Minimum information about a single amplified genome (MISAG) and a metagenome-assembled genome (MIMAG) of bacteria and archaea. Nature biotechnology 35 (2017). 124 Larsbrink, J. et al. A discrete genetic locus confers xyloglucan metabolism in select human gut Bacteroidetes. Nature 506, 498 (2014).

130 125 Vanwonterghem, I., Jensen, P. D., Ho, D. P., Batstone, D. J. & Tyson, G. W. Linking microbial community structure, interactions and function in anaerobic digesters using new molecular techniques. Current opinion in biotechnology 27, 55-64 (2014). 126 Brum, J. R. et al. Illuminating structural proteins in viral “dark matter” with metaproteomics. Proceedings of the National Academy of Sciences 113, 2436- 2441 (2016). 127 Makarova, K. S. et al. An updated evolutionary classification of CRISPR–Cas systems. Nature Reviews. Microbiology 13, 722 (2015). 128 Noguchi, Y. & Katayama, T. The Escherichia coli Cryptic Prophage Protein YfdR Binds to DnaA and Initiation of Chromosomal Replication Is Inhibited by Overexpression of the Gene Cluster yfdQ-yfdR-yfdS-yfdT. Frontiers in microbiology 7 (2016). 129 Hagerman, A. E. (Oxford, 2002). 130 Freeman, C., Ostle, N. & Kang, H. An enzymic 'latch' on a global carbon store. Nature 409, 149, doi:10.1038/35051650 (2001). 131 Bae, H. D., McAllister, T. A., Yanke, J., Cheng, K.-J. & Muir, A. Effects of condensed tannins on endoglucanase activity and filter paper digestion by Fibrobacter succinogenes S85. Applied and environmental microbiology 59, 2132-2138 (1993). 132 Chung, K.-T., Lu, Z. & Chou, M. Mechanism of inhibition of tannic acid and related compounds on the growth of intestinal bacteria. Food and Chemical Toxicology 36, 1053-1060 (1998). 133 Halvorson, J. J., Gonzalez, J. M. & Hagerman, A. E. Retention of Tannin-C is Associated with Decreased Soluble Nitrogen and Increased Cation Exchange Capacity in a Broad Range of Soils. Soil Science Society of America Journal 77, 1199, doi:10.2136/sssaj2012.0326 (2013). 134 Eloff, J. N. A sensitive and quick microplate method to determine the minimal inhibitory concentration of plant extracts for bacteria. Planta medica 64, 711-713 (1998). 135 Bhat, T. K., Singh, B. & Sharma, O. P. Microbial degradation of tannins--a current perspective. Biodegradation 9, 343-357 (1998). 
136 Coulis, M., Hättenschwiler, S., Rapior, S. & Coq, S. The fate of condensed tannins during litter consumption by soil animals. Soil Biology and Biochemistry 41, 2573-2578 (2009). 137 Ikigai, H., Nakae, T., Hara, Y. & Shimamura, T. Bactericidal catechins damage the lipid bilayer. Biochimica et Biophysica Acta (BBA)-Biomembranes 1147, 132- 136 (1993). 138 Nie, X., Zhao, L., Wang, N. & Meng, X. Phenolics-protein interaction involved in silver carp myofibrilliar protein films with hydrolysable and condensed tannins. LWT-Food Science and Technology 81, 258-264 (2017). 139 Monagas, M. et al. Insights into the metabolism and microbial biotransformation of dietary flavan-3-ols and the bioactivity of their metabolites. Food & function 1, 233-253 (2010).

131 140 Foley, W. J. & Moore, B. D. Plant secondary metabolites and vertebrate herbivores–from physiological regulation to ecosystem function. Current opinion in plant biology 8, 430-435 (2005). 141 Moore, B. D., Wiggins, N. L., Marsh, K. J., Dearing, M. D. & Foley, W. J. Translating physiological signals to changes in feeding behaviour in mammals and the future effects of global climate change. Anim Prod Sci 55, 272-283 (2015). 142 Young, L. Y. & Frazer, A. C. The fate of lignin and lignin‐derived compounds in anaerobic environments. Geomicrobiology Journal 5, 261-293, doi:10.1080/01490458709385973 (1987). 143 Chowdhury, S. P., Khanna, S., Verma, S. & Tripathi, A. Molecular diversity of tannic acid degrading bacteria isolated from tannery soil. Journal of applied microbiology 97, 1210-1219 (2004). 144 Reverón, I. et al. Differential Gene Expression by Lactobacillus plantarum WCFS1 in Response to Phenolic Compounds Reveals New Genes Involved in Tannin Degradation. Applied and environmental microbiology 83, e03387-03316 (2017). 145 Osawa, R. et al. koalarum gen. nov., sp. nov., a new tannin-protein complex degrading bacterium. Systematic and Applied Microbiology 18, 368-373 (1995). 146 Brooker, J. et al. Streptococcus caprinus sp. nov., a tannin‐resistant ruminal bacterium from feral goats. Lett Appl Microbiol 18, 313-318 (1994). 147 Odenyo, A. A. et al. Characterization of Tannin-tolerant Bacterial Isolates from East African Ruminants. Anaerobe 7, 5-15, doi:10.1006/anae.2000.0367 (2001). 148 Skene, I. K. & Brooker, J. D. Characterization of tannin acylhydrolase activity in the ruminal bacterium Selenomonas ruminantium. Anaerobe 1, 321-327, doi:10.1006/anae.1995.1034 (1995). 149 Field, J., Kortekaas, S. & Lettinga, G. The tannin theory of methanogenic toxicity. Biological Wastes 29, 241-262 (1989). 150 Deprez, S. et al. Polymeric proanthocyanidins are catabolized by human colonic microflora into low-molecular-weight phenolic acids. J Nutr 130, 2733-2738 (2000). 
151 Brooker, J. D., O'Donovan, L., Skene, I. & Sellick, G. in ACIAR PROCEEDINGS. 117-122 (ACIAR; 1998). 152 Lovley, D. R. & Phillips, E. J. Novel mode of microbial energy metabolism: organic carbon oxidation coupled to dissimilatory reduction of iron or manganese. Applied and environmental microbiology 54, 1472-1480 (1988). 153 Wilson, K. H., Blitchington, R. & Greene, R. Amplification of bacterial 16S ribosomal DNA with polymerase chain reaction. Journal of clinical microbiology 28, 1942-1946 (1990). 154 Joshi NA, F. J. . Sickle: A sliding-window, adaptive, quality-based trimming tool for FastQ files (Version 1.33) [Software] (2011). 155 Pearson, W. R. An introduction to sequence similarity (“homology”) searching. Current protocols in bioinformatics, 3.1. 1-3.1. 8 (2013).

132 156 Harel, K., Mayer, A. M. & Shain, Y. Catechol oxidases from apples, their properties, subcellular location and inhibition. Physiologia plantarum 17, 921-930 (1964). 157 Adamczyk, B., Kitunen, V. & Smolander, A. Response of soil C and N transformations to condensed tannins and different organic N-condensed tannin complexes. Applied soil ecology 64, 163-170 (2013). 158 Adamczyk, B., Kitunen, V. & Smolander, A. Polyphenol oxidase, tannase and proteolytic activity in relation to tannin concentration in the soil organic horizon under silver birch and Norway spruce. Soil Biology and Biochemistry 41, 2085- 2093 (2009). 159 Imlay, J. A. Diagnosing oxidative stress in bacteria: not as easy as you might think. Current opinion in microbiology 24, 124-131 (2015). 160 Bull, C. & Ballou, D. Purification and properties of protocatechuate 3, 4- dioxygenase from Pseudomonas putida. A new iron to subunit stoichiometry. Journal of Biological Chemistry 256, 12673-12680 (1981). 161 Masai, E. et al. Genetic and Biochemical Characterization of 4-Carboxy-2- Hydroxymuconate-6-Semialdehyde Dehydrogenase and Its Role in the Protocatechuate 4, 5-Cleavage Pathway inSphingomonas paucimobilis SYK-6. J Bacteriol 182, 6651-6658 (2000). 162 Gall, M. et al. Enzymatic conversion of flavonoids using bacterial chalcone isomerase and enoate reductase. Angewandte Chemie International Edition 53, 1439-1442 (2014). 163 Marin, A. et al. Naringenin degradation by the endophytic diazotroph Herbaspirillum seropedicae SmR1. Microbiology 159, 167-175 (2013). 164 Jiménez, J. I., Miñambres, B., García, J. L. & Díaz, E. Genomic analysis of the aromatic catabolic pathways from Pseudomonas putida KT2440. Environmental microbiology 4, 824-841 (2002). 165 Enguita, F. J., Martins, L. O., Henriques, A. O. & Carrondo, M. A. Crystal structure of a bacterial endospore coat component a laccase with enhanced thermostability properties. Journal of Biological Chemistry 278, 19416-19425 (2003). 166 Djoko, K. Y., Chong, L. 
X., Wedd, A. G. & Xiao, Z. Reaction mechanisms of the multicopper oxidase CueO from Escherichia coli support its functional role as a cuprous oxidase. Journal of the American Chemical Society 132, 2005-2015 (2010). 167 Fiorentino, G., Ronca, R., Cannio, R., Rossi, M. & Bartolucci, S. MarR-like transcriptional regulator involved in detoxification of aromatic compounds in Sulfolobus solfataricus. J Bacteriol 189, 7351-7360 (2007). 168 Otani, H. et al. The activity of CouR, a MarR family transcriptional regulator, is modulated through a novel molecular mechanism. Nucleic acids research 44, 595-607 (2015). 169 Sulavik, M., Gambino, L. & Miller, P. The MarR repressor of the multiple antibiotic resistance (mar) operon in Escherichia coli: prototypic member of a family of bacterial regulatory proteins involved in sensing phenolic compounds. Molecular Medicine 1, 436 (1995). 133 170 Romero-Arroyo, C. E., Schell, M. A., Gaines, G. & Neidle, E. L. catM encodes a LysR-type transcriptional activator regulating catechol degradation in Acinetobacter calcoaceticus. J Bacteriol 177, 5891-5898 (1995). 171 Rothmel, R. et al. Nucleotide sequencing and characterization of Pseudomonas putida catR: a positive regulator of the catBC operon is a member of the LysR family. J Bacteriol 172, 922-931 (1990). 172 Paoletti, A. C. et al. Quantitative proteomic analysis of distinct mammalian Mediator complexes using normalized spectral abundance factors. Proceedings of the National Academy of Sciences 103, 18928-18933 (2006). 173 Kolde, R. pheatmap: Pretty Heatmaps. 2015. URL http://CRAN. R-project. org/package= pheatmap. R package version 1, 2 (2017). 174 Brix, L., Hansen, M. J., Kelly, A., Bertelsen, M. F. & Bojesen, A. M. Occurrence of bacteria in the oral cavity of the Tasmanian devil (Sarcophilus harrisii). Journal of Zoo and Wildlife Medicine 46, 241-245 (2015). 175 Kuhnert, P., Scholten, E., Haefner, S., Mayor, D. & Frey, J. Basfia succiniciproducens gen. nov., sp. 
nov., a new member of the family Pasteurellaceae isolated from bovine rumen. International journal of systematic and evolutionary microbiology 60, 44-50 (2010). 176 Tsuchida, S., Murata, K., Ohkuma, M. & Ushida, K. Isolation of Streptococcus gallolyticus with very high degradability of condensed tannins from feces of the wild Japanese rock ptarmigans on Mt. Tateyama. The Journal of general and applied microbiology 63, 195-198 (2017). 177 Brooker, J. D. et al. Streptococcus-Caprinus Sp-Nov, a Tannin-Resistant Ruminal Bacterium from Feral Goats. Lett Appl Microbiol 18, 313-318, doi:DOI 10.1111/j.1472-765X.1994.tb00877.x (1994). 178 O’donovan, L. & Brooker, J. D. Effect of hydrolysable and condensed tannins on growth, morphology and metabolism of Streptococcus gallolyticus (S. caprinus) and . Microbiology 147, 1025-1033 (2001). 179 Pellegrini, S. I. A comparative analysis of the moose rumen microbiota and the pursuit of improving fibrolytic systems, The University of Vermont and State Agricultural College, (2015). 180 Anderson, R. & Ryser, P. Early autumn senescence in red maple (Acer rubrum L.) is associated with high leaf anthocyanin content. Plants 4, 505-522 (2015). 181 Majid, S., Khanduja, K. L., Gandhi, R. K., Kapur, S. & Sharma, R. R. Influence of ellagic acid on antioxidant defense system and lipid peroxidation in mice. Biochemical pharmacology 42, 1441-1445 (1991). 182 Das, M., Bickers, D. R. & Mukhtar, H. Effect of ellagic acid on hepatic and pulmonary xenobiotic metabolism in mice: studies on the mechanism of its anticarcinogenic action. Carcinogenesis 6, 1409-1413 (1985). 183 Deangelis, K. M. et al. Complete genome sequence of "Enterobacter lignolyticus" SCF1. Stand Genomic Sci 5, 69-85, doi:10.4056/sigs.2104875 (2011). 184 Desai, S. & Nityanand, C. Microbial laccases and their applications: a review. Asian J Biotechnol 3, 98-124 (2011). 185 Strumeyer, D. H. & Malin, M. J. Condensed tannins in grain sorghum. Isolation, fractionation, and characterization. 
J Agr Food Chem 23, 909-914 (1975). 134 186 Sauer, U. & Eikmanns, B. J. The PEP—pyruvate—oxaloacetate node as the switch point for carbon flux distribution in bacteria: We dedicate this paper to Rudolf K. Thauer, Director of the Max-Planck-Institute for Terrestrial Microbiology in Marburg, Germany, on the occasion of his 65th birthday. FEMS microbiology reviews 29, 765-794 (2005). 187 Jung, M.-Y. et al. Improvement of 2, 3-butanediol yield in by deletion of the pyruvate formate-lyase gene. Applied and environmental microbiology 80, 6195-6203 (2014). 188 Schimel, J., Balser, T. C. & Wallenstein, M. Microbial stress‐response physiology and its implications for ecosystem function. Ecology 88, 1386-1394 (2007). 189 Slabbert, N. in Plant polyphenols 421-436 (Springer, 1992). 190 Brown, L., Kim, J.-H. & Cho, K. H. Presence of a Prophage Determines Temperature-Dependent Capsule Production in . Genes 7, 74 (2016). 191 Feiner, R. et al. A new perspective on lysogeny: prophages as active regulatory switches of bacteria. Nature Reviews Microbiology 13, 641 (2015). 192 Devos, S. et al. Membrane vesicle secretion and prophage induction in multidrug‐ resistant Stenotrophomonas maltophilia in response to ciprofloxacin stress. Environmental microbiology (2017). 193 Krause, D. O., Smith, W. J., Brooker, J. D. & McSweeney, C. S. Tolerance mechanisms of streptococci to hydrolysable and condensed tannins. Animal feed science and technology 121, 59-75 (2005). 194 Pinto, A. J. & Raskin, L. PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PloS one 7, e43093 (2012). 195 Zhou, J. et al. Random sampling process leads to overestimation of β-diversity of microbial communities. mBio 4, e00324-00313 (2013). 196 Shade, A. et al. Conditionally rare taxa disproportionately contribute to temporal changes in microbial diversity. mBio 5, e01371-01314 (2014). 197 Shade, A. & Gilbert, J. A. 
Temporal patterns of rarity provide a more complete view of microbial diversity. Trends in microbiology 23, 335-340 (2015). 198 Castelle, C. J. & Banfield, J. F. Major New Microbial Groups Expand Diversity and Alter our Understanding of the Tree of Life. Cell 172, 1181-1197 (2018). 199 Aguilar, C. & Gutiérrez-Sánchez, G. Review: sources, properties, applications and potential uses of tannin acyl hydrolase. Food Science and Technology International 7, 373-382 (2001). 200 Holmes, A. J. et al. Diet-microbiome interactions in health are controlled by intestinal nitrogen source constraints. Cell metabolism 25, 140-151 (2017). 201 Harfoot, C. & Hazlewood, G. in The rumen microbial ecosystem 382-426 (Springer, 1997). 202 Ley, R. E., Peterson, D. A. & Gordon, J. I. Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell 124, 837-848 (2006). 203 Sonnenburg, J. L. et al. Glycan foraging in vivo by an intestine-adapted bacterial symbiont. Science 307, 1955-1959 (2005).

204 Trosvik, P. & Muinck, E. J. Ecology of bacteria in the human gastrointestinal tract—identification of keystone and foundation taxa. Microbiome 3, 44 (2015).

205 Wong, M. T. et al. Substrate-driven convergence of the microbial community in lignocellulose-amended enrichments of gut microflora from the Canadian beaver (Castor canadensis) and North American moose (Alces americanus). Frontiers in Microbiology 7, doi:10.3389/fmicb.2016.00961 (2016).

206 Stewart, R. et al. Assembly of hundreds of microbial genomes from the cow rumen reveals novel microbial species encoding enzymes with roles in carbohydrate metabolism. bioRxiv, doi:10.1101/162578 (2017).

207 Montez, R. D. The influence of the invasive Chinese tallow (Triadica sebifera) leaflitter on aquatic chemistry and microbial community composition. Masters of Science, thesis. Austin State University (2016).

208 Konstantinidis, K. T., Rossello-Mora, R. & Amann, R. Uncultivated microbes in need of their own taxonomy. The ISME journal, doi:10.1038/ismej.2017.113 (2017).

209 Yutin, N. & Galperin, M. Y. A genomic update on clostridial phylogeny: Gram-negative spore formers and other misplaced clostridia. Environmental microbiology 15, 2631-2641 (2013).

210 Schwarz, W. The cellulosome and cellulose degradation by anaerobic bacteria. Applied microbiology and biotechnology 56, 634-649 (2001).

211 Sun, L. et al. Lentimicrobium saccharophilum gen. nov., sp. nov., a strictly anaerobic bacterium representing a new family in the phylum Bacteroidetes, and proposal of Lentimicrobiaceae fam. nov. International journal of systematic and evolutionary microbiology 66, 2635-2642 (2016).

212 Ormerod, K. L. et al. Genomic characterization of the uncultured Bacteroidales family S24-7 inhabiting the guts of homeothermic animals. Microbiome 4, 36 (2016).

213 Fennema, D., Phillips, I. R. & Shephard, E. A. Trimethylamine and trimethylamine N-oxide, a flavin-containing monooxygenase 3 (FMO3)-mediated host-microbiome metabolic axis implicated in health and disease. Drug Metabolism and Disposition 44, 1839-1850 (2016).

214 Neill, A. R., Grime, D. W. & Dawson, R. Conversion of choline methyl groups through trimethylamine into methane in the rumen. Biochemical Journal 170, 529-535 (1978).

215 Asanuma, N. & Hino, T. Activity and properties of fumarate reductase in ruminal bacteria. The Journal of general and applied microbiology 46, 119-125 (2000).
