<<

University of Massachusetts Amherst ScholarWorks@UMass Amherst

Doctoral Dissertations Dissertations and Theses

April 2014

Anaerobic Microbes and Communities In the Context of Soil and the Equine Digestive Tract

Amy Biddle University of Massachusetts Amherst

Follow this and additional works at: https://scholarworks.umass.edu/dissertations_2

Part of the Commons

Recommended Citation Biddle, Amy, "Anaerobic Microbes and Communities In the Context of Soil and the Equine Digestive Tract" (2014). Doctoral Dissertations. 2. https://doi.org/10.7275/3thj-5m20 https://scholarworks.umass.edu/dissertations_2/2

This Open Access Dissertation is brought to you for free and open access by the Dissertations and Theses at ScholarWorks@UMass Amherst. It has been accepted for inclusion in Doctoral Dissertations by an authorized administrator of ScholarWorks@UMass Amherst. For more information, please contact [email protected].

ANAEROBIC MICROBES AND COMMUNITIES IN THE CONTEXT OF SOIL

AND THE EQUINE DIGESTIVE TRACT

A Dissertation Presented

By

AMY SANDERS BIDDLE

Submitted to the Graduate School of the University of Massachusetts, Amherst in partial fulfillment of the requirements for the degree of

DOCTOR OF PHILOSOPHY

February 2014

Department of Microbiology

© Copyright by Amy Sanders Biddle 2014 All Rights Reserved

ANAEROBIC MICROBES AND COMMUNITIES IN THE CONTEXT OF SOIL

AND THE EQUINE DIGESTIVE TRACT

A Dissertation Presented

By

AMY SANDERS BIDDLE

Approved as to style and content by:

______Jeffrey L. Blanchard, Chair

______Susan Leschine, Member

______Samuel P. Hazen, Member

______Kristen M. DeAngelis, Member

______John M. Lopes, Department Head Department of Microbiology

DEDICATION

To all of the horse owners who are anxiously waiting for the vet to arrive tonight.

ACKNOWLEDGMENTS

I wish to thank my advisor, Jeffrey L. Blanchard, for his friendship and guidance and for fostering a supportive lab environment in which to do science; Susan Leschine for her sound advise and uncanny ability to appear at odd hours when things are falling apart; members of my committee for their time and helpful suggestions; members of the

Blanchard and Leschine Labs for all of their assistance and input, especially Kelly Haas,

Supratim Mukherjee, Maddie Coppi, Tom Warnick, Mary Hagen, Tara Mahendrarajah,

Josephine Aiyeku, Sonia Filipczak, and Ava Bakhtyari; Grace Pold and members of the

DeAngelis Lab for their technical expertise, snacks and general good cheer; Lucy Stewart and the members of the Holden Lab for their help with GC and the project;

Ruthie, Lorelle, Sue and Maryanne who keep the Microbiology Department running so smoothly; Carlos Gradil and Sam Black for their assistance and encouragement for the horse gut work; Manfred Auer and Danielle Jorgens for giving me a nudge at the very

beginning.

Special thanks go to friends and family who have cheered me onward through each step of this process, in particular Suzy Biddle, Chris Madigan, Marilyn Miller, Gale

Christensen, Jody Goodrich, Libby Volckening, and Charles Malloch.

v

ABSTRACT

ANAEROBIC MICROBES AND COMMUNITIES IN THE CONTEXT OF SOIL

AND THE EQUINE DIGESTIVE TRACT

FEBRUARY 2014

AMY SANDERS BIDDLE, B.S., UNIVERSITY OF NEW HAMPSHIRE

M.Ed., ANTIOCH NEW ENGLAND

Ph.D., UNIVERSITY OF MASSACHUSETTS AMHERST

Directed by: Jeffrey L. Blanchard

Soil and herbivore gut environments present different challenges to plant degrading

in terms of nutrient availability, fluctuations in moisture, pH and temperature, and temporal constraints, however complex communities of microbes in each serve similar roles in hydrolyzing and fermenting the diverse components of plant biomass.

This dissertation describes four projects with the underlying purpose to further understand the structure and functioning of anaerobic plant degrading communities.

(1) A three-year microcosm experiment using enrichment and serial transfers to reduce the diversity of a complex soil community over time, tested the hypothesis that changes in community structure would be consistent across replicate samples and enabled the detection and isolation of persistent community members. (2) A second project to track the changes in bacterial community structure and function that occur during starch induced lactate acidosis in horses identified specific microbes that could be implicated in the recovery and/or resistance to these community changes. (3) Genomic data was used to

vi

compare clostridial inhabiting the gut (belonging to the , and

Ruminococcaceae) and those that are free-living (belonging to the ) to identify metabolic strategies that could enable specialization to a host associated or free- living lifestyle. (4) The genomic sequence analysis of indolis , a member of the C. saccharolyticum species group, provided insights into the genetic potential of this poorly described taxa which will drive hypotheses regarding its metabolic and ecological activities and help to resolve distinctions between closely related taxa in this taxonomically confusing clade within the Lachnospiraceae.

vii

TABLE OF CONTENTS

Page

ACKNOWLEDGEMENTS ...... v

ABSTRACT ...... vi

LIST OF TABLES ...... x

LIST OF FIGURES ...... xii

CHAPTER

1. INTRODUCTION ...... 1

1.1 Microbes in plant rich anaerobic environments ...... 1 1.2 Plant degradation in soil ...... 1 1.3 Plant degradation in the herbivore gut ...... 3 1.4 The genomic basis of plant degradation in soil and gut microbes ...... 4 1.5 The genome analysis of the Clostridium saccharolyticum species group ...... 5

2. EARLY ESTABLISHMENT AND PERSISTENCE OF MICROCOSM COMMUNITY DIVERSITY THROUGH THREE YEARS OF SERIAL TRANSFERS ...... 7

2.1 Abstract ...... 7 2.2 Introduction ...... 8 2.3 Materials and methods ...... 10 2.4 Results ...... 15 2.5 Discussion ...... 25 2.6 Conclusion ...... 29

3. AN IN VITRO MODEL OF THE HORSE GUT MICROBIOME ENABLES IDENTIFICATION OF LACTATE-UTILIZING BACTERIA THAT DIFFERENTIALLY RESPOND TO STARCH INDUCTION ...... 31

3.1 Abstract ...... 31 3.2 Introduction ...... 32 3.3 Materials and methods ...... 35 3.4 Results ...... 38 3.5 Discussion ...... 47 3.6 Conclusion ...... 51

viii

4. UNTANGLING THE GENETIC BASIS OF FIBROLYTIC SPECIALIZATION BY LACHNOSPIRACEAE AND IN DIVERSE GUT COMMUNITIES ...... 52

4.1 Abstract ...... 52 4.2 Introduction ...... 52 4.3 Materials and methods ...... 55 4.4 Results and Discussion ...... 58 4.5 Conclusion ...... 66

5. THE COMPLETE GENOME SEQUENCE OF ...... 68

5.1 Abstract ...... 68 5.2 Introduction ...... 68 5.3 Organism information ...... 70 5.4 Genome sequencing information ...... 70 5.5 Genome Properties ...... 74 5.6 Conclusion ...... 82

6. CONCLUSIONS AND FUTURE DIRECTIONS ...... 83

APPENDICES

A. PHYLOGENETIC TREES OF OTUS AND ISOLATES FROM THREE MICROCOSM FAMILIES ...... 88

B. HYDROGEN SULFIDE GAS LEVELS BY HORSE ...... 91

C. HEADSPACE GAS LEVELS BY HORSE ...... 92

D. RAREFACTION CURVES BY HORSE ...... 93

E. ALPHA DIVERSITY ESTIMATES: HORSE GUT ...... 94

F. BEST BLAST HITS FOR MOST ABUNDANT HORSE GUT OTUS AT TIME 48 ...... 96

G. TAXA FROM EACH FAMILY USED IN COMPARATIVE ANALYSES ...... 97

ix

H. ACTIVITIES OF ENZYMES THAT DIFFER BETWEEN THE LACHNOSPIRACEAE, CLOSTRIDIACEAE, AND RUMINOCOCCACEAE ...... 99

I. SUBSTRATE AFFINITIES OF CARBOHYDRATE BINDING MODULES THAT DIFFER BETWEEN THE LACHNOSPIRACEAE, CLOSTRIDIACEAE, AND RUMINOCOCCACEAE ...... 100

REFERENCES ...... 101

x

LIST OF TABLES

Table Page

1. Sequence metrics of five microcosm samples at nine transfer points ...... 17

2. Sequence metrics of sequencing quality assessment ...... 24

3. Sequence metrics of horse gut samples ...... 41

4. Classification and general features of Clostridium indolis DSM 755 ...... 71

5. Project information ...... 72

6. Nucleotide content and gene count levels of the genome of C. indolis DSM 755 ...... 74

7. Number of genes in C. indolis DSM 755 associated with the 25 general COG functional categories ...... 75

8. Number of genes in C. indolis DSM 755 not found in near-relatives associated with the 25 general COG functional categories ...... 76

9. Selected carbohydrate active genes in the C. indolis DSM 755 genome ...... 77

10. Selection of C. indolis DSM 755 genes related to citrate utilization ...... 78

11. Selection of C. indolis DSM 755 genes related to nitrogen fixation ...... 79

12. Selection of C. indolis DSM 755 genes related to lactate utilization ...... 81

13. Selection of C. indolis DSM 755 genes related to degradation of aromatics ...... 82

xi

LIST OF FIGURES

Figure Page

1. Short chain fatty acid metabolites in microcosms over time ...... 16

2. Rarefaction curve of microcosm sequencing data ...... 17

3. Richness in microcosms over time ...... 18

4. Alpha diversity of microcosms over time ...... 19

5. Beta diversity of microcosms by sample and transfer ...... 19

6. Distribution of microcosm sequences by phyla ...... 20

7. Distribution of Proteobacteria sequences by OTU ...... 21

8. Distribution of microcosm sequences by OTU, across all transfers ...... 22

9. Distribution of shared sequences in all replicates at Transfer 62 ...... 23

10. pH of cultures over time by horse ...... 39

11. Short chain fatty acid metabolites over time by horse ...... 40

12. Distribution of horse gut sequences by phyla ...... 42

13. Distribution of horse gut Firmicute sequences by family ...... 43

14. Distribution of most abundant horse gut OTUs at time 48 ...... 44

15. Change in most abundant horse gut Veillonellaceae OTUs over time ...... 45

16. Change in single, abundant horse gut Lactobacillaceae OTU over time ...... 46

17. Change in abundant horse gut Streptococcaceae OTUs over time ...... 47

18. 16S Neighbor-joining tree of the representative taxa used in this analysis ...... 59

19. Proportion of members of each family that are gut associated ...... 60

20. Comparison of Carbohydrate-Active enZymes (CAZy) with respect to abundance in the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae ...... 61

xii

21. Glycoside hydrolase families that differ in abundance ...... 61

22. Glycoside hydrolase enzymes that differ in abundance ...... 62

23. Carbohydrate-binding module (CBM) families that differ in abundance ...... 63

24. Average abundance of phosphotransferase system (PTS) and ATP-binding cassette (ABC) transporter genes ...... 64

25. Average abundance of sugar transport genes ...... 65

26. Average abundance of genes in each Ecocyc degradation pathway ...... 66

27. Carbohydrate and Polymeric Compound Degradation pathways ...... 67

28. Phylogenetic tree of 16S rRNA genes of the Clostridium saccharolyticum species group ...... 69

29. Distribution of ABC and PTS transporters in the genomes of C. indolis and close relatives ...... 77

30. Citrate utilization genes in C. indolis ...... 79

xiii

CHAPTER 1

INTRODUCTION

1.1 Microbes in plant rich anaerobic environments

Plant biomass is a composite of fibrils and sheets of cellulose, hemicellulose, pectin, lignin, waxes, and proteins forming a complex network that provides support for plants while resisting attack from bacteria and fungi. While soil and herbivore gut environments present different challenges to plant degrading bacteria in terms of nutrient availability, fluctuations in moisture, pH and temperature, and temporal constraints, complex communities of microbes in each serve similar roles in hydrolyzing and fermenting the diverse components of plant biomass. The mechanisms of plant degradation by microbial communities are fundamental to carbon flow dynamics in anaerobic environments, and underlie diverse research areas including climate change, gut microbe-host interactions, soil formation, waste management, and biofuels development.

1.2 Plant degradation in soil

Three quarters of the carbon in terrestrial ecosystems is found as organic matter in soils, mostly of plant origin [1]. Anaerobic degradation accounts for as much as 10% of cellulose hydrolysis in soils through transformations mediated by microbes [2]. Soil microbes convert lignocellulose to dissolved organic matter, recalcitrant aggregates, or gases such as CO 2 and CH 4 that are released into the atmosphere [3]. Soil microbes can

be challenged by nitrogen limitation, and/or wide fluctuations in temperature [4],

moisture levels [5], and redox conditions [6], but under normal conditions there is little

1

mixing of the substrate, favoring the persistence of slow growing bacteria. Aerobic fungi and bacteria begin the decomposition process of fresh leaf litter, but as layers build over time and become saturated with water, the rate of oxygen diffusion is soon insufficient to maintain aerobic conditions, resulting in a thin oxic layer restricted to the surface [7].

Mature forest soil, where layers of leaf litter have accumulated over decades, is an ideal source for discovering anaerobic plant degraders because anoxic conditions prevail as leaf layers become compacted and waterlogged.

Soil is one of the most diverse microbial habitats, harboring thousands of species in a single gram [8]. While estimates of soil species diversity are quite high in comparison to other environments [9], phylogenetic diversity has been found to be surprisingly low

[10], suggesting the presence of many closely related species with similar functions that may be adapted to slightly different niches [11]. Efforts to understand the physiological ecology of plant degrading microbial communities in soil has been hampered by the complexity of these communities. Forest soil organisms such as Clostridium phytofermentans [12] have been isolated and studied for biofuels development, but the process of isolation of single bacterial strains is laborious. The estimated 1% of soil microbes that are culturable have exacting nutritional requirements that are likely met in natural environments through associations with other organisms. In fact, cellulose breakdown is hastened by cocultures of cellulolytic and noncellulolytic microbes [13–

17]. Reports suggest that noncellulolytic microbes may help to maintain a favorable pH, consume oxygen or inhibitory metabolites [14], cooperate through crossfeeding of nitrogenous compounds [17], or remove inhibitors to degradation [18].

2

Chapter 2 of this dissertation describes a three-year microcosm experiment using enrichment and serial transfers to reduce the diversity of a complex soil community over time. This strategy tested the hypothesis that changes in community structure over time would be consistent across replicate samples, and enabled the detection of persistent community members.

1.3 Plant degradation in the herbivore gut

The gastrointestinal tract harbors a dense and diverse community of microbes with

bacterial numbers exceeding 10 14 per gram [19]. 70-80% of total gut volume is devoted to fermentation in herbivores, while this volume is much less in omnivores and carnivores (17% in humans and 14% in dogs) [20,21]. In horses, microbial fermentation supplies the host with over 70% of its energy needs through the production of volatile fatty acids and microbial proteins [22].

The biochemistry of plant degradation is essentially the same between soils and the gut, but the processivity in the digestive systems of animals is inherently inefficient, as the transit time of material through the gut constrains the extent of breakdown. Horses, with an average transit time of 30 hours, digest 50% of the dry matter they consume, while giant pandas, with an average transit time of 8 hours, digests only 20% of dry matter [23].

While the transit time through the digestive system imposes temporal constraints, each compartment of the gut is a remarkably stable environment in terms of temperature and moisture levels, although pH fluctuates over time between meals [24]. This stability can

be disrupted by dietary and/or metabolic changes with drastic consequences for the

3

microbial community and its activity. In horses, a sudden influx of dietary starch, favoring the growth of lactic acid bacteria in the hindgut, can precipitate lactic acidosis, leading to colic and/or laminitis.

Chapter 3 describes an experiment to track the changes in bacterial community structure and function that occur during starch induced lactate acidosis in horses and identify specific microbes that could be implicated in the recovery and/or resistance to these dramatic changes.

1.4 The genomic basis of plant degradation in soil and gut microbes

While the capacity to break down lignocellulose is a widespread characteristic in the microbial world, the majority of cellulolytic anaerobic bacteria are , mainly from families within the Order Clostridiales. Cellulolytic activity in this group is concentrated within three families, the Lachnospiraceae, and Ruminococcaceae (clusters

III and XIVa as described by Collins [25]). Not surprisingly, cultured species from these groups have been isolated from environments where plant degradation is actively occurring such as soils, compost, guts, and bioreactors. The abundance of clostridia species varies by environment with gut samples being especially enriched. Soils can harbor fewer than 5% [26], while this phyla comprises between 50–80% of the taxa in the core human [27,28] and more than 84% of the active fraction

[29]. The majority of the Firmicutes in the human gut is estimated to be clostridial [28–

30], while estimates of clostridia in the horse gut are closer to 70% [31–33].

4

Chapter 4 of this dissertation describes work to use genomic data to distinguish between clostridial species inhabiting the gut (belonging to the Lachnospiraceae, and

Ruminococcaceae) from those that are free-living (belonging to the Clostridiacea).

1.5 The genome analysis of the Clostridium saccharolyticum species group

Efforts to isolate bacteria from the experimental microcosms discussed in chapter 2 have resulted in the discovery of novel species as well as new strains closely related to those already in the literature. One particularly abundant family, the Lachnospiraceae, yielded several isolates closely related to Clostridium saccharolyticum , a taxa that has gained attention because its saccharolytic capacity has been shown to be syntrophic with the cellulolytic activity of Bacteroides cellulosolvens in coculture, enabling the conversion of cellulose to ethanol in a single step [16,34]. The C. saccharolyticum species group is a poorly described and taxonomically confusing clade. This group includes C. indolis , C.

sphenoides , C. methoxybenzovorans , C. celerecrescens , and Desulfotomaculum

guttoideum , none of which are well studied, despite interesting characteristics. Members

of this group such as C. celerecrescens are cellulolytic [35], and others are known to degrade unusual substrates such as methylated aromatic compounds (C. methoxybenzovorans ) [36], and the insecticide lindane ( C. sphenoides ) [37]. Seven members of this group (including two microcosm isolates) were targeted for whole genome sequencing in a proposal for the Community Sequencing Project of the Joint

Genome Institute (JGI) (http://www.jgi.gov) to provide insight into the genetic potential of these taxa that could then direct experimental efforts to understand the physiology and ecology of this species group.

5

Chapter 5 of this dissertation describes the genomic sequence analysis of Clostridium indolis , a member of the C. saccharolyticum species group, a poorly described and taxonomically confusing clade within the Lachnospiraceae.

6

CHAPTER 2

EARLY ESTABLISHMENT AND PERSISTENCE OF MICROCOSM COMMUNITY DIVERSITY THROUGH THREE YEARS OF SERIAL TRANSFERS

2.1 Abstract

Substrate utilization in nature results from metabolic and spatial interactions between microbes that are more easily teased apart in simplified communities. While these can be established by intentionally combining bacteria of interest, it is not clear how stable resulting communities would be over time. Here we used a strategy of enrichment and serial transfers to reduce the diversity of a complex soil community, to enable the detection of persistent community members, and to discover whether changes in community structure over time would be consistent across replicate samples. Five replicate microcosms containing switchgrass as the primary carbon source were inoculated with forest soil.

Community structure over time was determined using deep sequencing to track changes in bacterial groups across sixty-two serial transfers over a period of three years. After an initial, dramatic reduction in bacterial diversity, the five replicates were not significantly different over time (ANOVA p < 0.05). After sixty-two transfers 83-93% of reads were shared between replicates with a total of 265 shared OTUs (out of 92,346 OTUs). All of these except for one classified into the Firmicutes. Of those that classified to the family level, 82-91.5% of the reads were found in the Enterococcaceae, Clostridiaceae,

Lachnospiraceae, and Ruminococcaceae (137 shared OTUs). OTUs that differed between samples came from the same families. The most abundant OTUs include novel species

7

and new strains of the biofuels catalyst, Clostridium phytofermentans . Metabolic outputs were consistent between replicates over time.

The consistency of metabolic products and similarity of community profiles between replicates suggest selection for closely related taxa that may have similar metabolic capacity and/or cooperative interactions. The persistence of taxa in a community context suggests the potential of this method for developing stable communities from complex environments such as soil, bioreactors, and the gut.

2.2 Introduction

Soil harbors thousands of species in a single gram [8], representing enormous taxonomic and metabolic diversity. As revealed by 16S rRNA surveys, an estimated 99% of microbial diversity is missed using traditional culture techniques [38–40]. A recent analysis of 16S surveys suggested that while estimates of soil species diversity are quite high in comparison to other environments [9], phylogenetic diversity at higher taxonomic levels is surprisingly low [10]. This suggests the presence of many closely related species with similar functions that may occupy redundant or slightly different niches

[11].

Traditional techniques of continuous culture in chemostats or serial transfers for enrichment have been commonly used to select for specific metabolic traits such as substrate utilization or stress tolerance, thus simplifying the process of isolating bacteria

8

of interest from environmental samples. How communities change during enrichment is a question with broad implications for understanding the response of community composition and function to disturbance, and developing model systems to study the dynamics of microbial communities that underlie ecosystem processes. In particular, the simplification of complex plant degrading communities from the gut, soil, or bioreactors offers a useful approach for teasing out spatial and metabolic interactions that evade understanding due to the complexity of the whole community. Understanding the interactions between plant degrading microbes in these environments could provide insight into the mechanisms of degradation at the community level and strategies for improving its efficiency for waste treatment, digestive physiology and biofuels production. Establishing simplified communities that resist drastic changes in membership could assist in the development of mixed cultures for a wide range of applications, from bioreactors to probiotic production.

While the composition of microbial communities has been shown to be sensitive to disturbance regardless of the taxonomic level described or the method of measurement employed, these changes may not affect ecosystem processing rates due to functional redundancy (species that perform similar ecological roles) [41]. In cellulose-rich microcosms inoculated with defined communities of microbes, functional redundancy has

been demonstrated to maintain both biodiversity and ecosystem function [42]. In one study, efforts to link membership to function using a stable isotope probe pointed to ten taxa in the Clostridiaceae and thirteen taxa in Enterobacteriaceae as the most active

9

fermenters of glucose, suggesting the presence of many closely related and functionally redundant taxa in the earthworm gut community [43].

In this study, five communities were established from one forest soil sample and transferred weekly, then monthly into minimal media with switchgrass as the primary carbon source. Here we demonstrate that the selective pressure of enrichment and serial transfer resulted in a rapid reduction of diversity that was consistent between the five replicates over a period of three years (sixty-two transfers). Deep sequencing of the 16S rRNA gene and analysis of metabolic outputs revealed that the resulting five communities, while not identical, were strikingly similar with respect to membership, relative abundance, and metabolism.

2.3 Materials and methods

2.3.1 Soil sampling

A soil sample was collected on 9/21/09 from a mixed hardwood forest in Montague,

Massachusetts at a depth of 6-10 cm, and stored at 4 oC overnight prior to inoculation. A thick slurry was prepared by the addition of sterile water followed by vortexing.

2.3.2 Inoculation and maintenance of communities

The anaerobic techniques of Hungate [7] were used in all media preparation and transfers. Microcosms were established by inoculating five 25 ml samples of modified

MS media and 10% salt solution with 1 g of soil slurry. Cultures were incubated at 30 oC, and vortexed 2-3 times during the incubation period. Each sample was transferred

10

(2%) into fresh media every seven days for a total of 33 transfers, then every 3-4 weeks into 10 ml of the above media thereafter. Samples were collected from each microcosm at each transfer for DNA extraction and metabolic analysis. Sequencing was performed for all replicates at Time 0 and Transfers 2, 10, 20, 30, 40, 49, 55, and 62, except for replicates 1 and 2 from Transfer 55.

Modified MS media (adapted from [8,9]) was prepared as follows: For 1.0 L: KH 2PO 4,

1.04 g; K 2HPO 4, 1.11 g; NaHCO 3, 2.50 g; NH 4Cl, 0.40 g; L-cysteine HCl, 0.50 g; resazurin 0.1.5%, 1.20 ml, switchgrass (see below for preparation), 6.00 g. The pH was adjusted to 7.0 using 6N KOH. Switchgrass was a homogenate of chopped stem and leaf material grown and harvested as hay at the University of Massachusetts, South Deerfield farm facility in 2009. The switchgrass was further dried in a 50 oC oven for two days, and ball milled at 200 Hz for 2 minutes. Salt solution was prepared as follows: for 100 ml: MgCl 2×6H 2O, 1.0 g; CaCl 2×2H 2O, 0.15 g, FeSO 4×7H 2O, 0.00125g; autoclaved for

20 minutes.

DNA extraction, amplification, and sequencing

DNA was extracted from each sample as described elsewhere [6,44] with the following modifications. Samples were subjected to two extractions in Matrix E bead tubes (MP

Biomedicals, Solon, OH) containing 5% CTAB in 1 M NaCl and 0.25 M phosphate buffer (pH 8), phenol: chloroform: isopropyl alcohol (25:24:1), and 0.1 M ammonium aluminum sulfate, followed by separation with 24:1 chloroform: isoamyl alcohol. DNA was then precipitated with 30% polyethylene glycol 6000 in 1.6 M NaCl, washed with

11

70% ethanol, and resuspended in 10 mM Tris, pH 8. Replicate extractions were pooled and purified using MoBio PowerClean DNA clean-up kit (MoBio, Carlsbad, CA) and quantified using Quant-iT PicoGreen assay (Invitrogen, Carlsbad, CA).

Amplification of the V4 region of the 16S rRNA gene and attachment of Illumina adaptors and barcodes for multiplexing samples (reverse read only) was done in triplicate as described elsewhere [45]. Briefly, genomic DNA was amplified with universal forward 515F (5’- Illumina adapter- Forward primer pad- Forward primer linker-

GTGCCAGCMGCCGCGGTAA-3’) and universal reverse 806R (5’- Illumina adapter-

Golay barcode- Reverse primer pad- Reverse primer linker-

GGACTACHVGGGTWTCTAAT-3’). Each PCR reaction mixture contained 10 ng of genomic DNA, 2.5 ul 10X buffer, 2.0 ul MgCl 2 (25 mM), 2.0 ul dNTP (2.5 mM each),

5.0 mM (each) forward and reverse primers, 1.25 ul (25 ug) BSA (Roche, Indianapolis,

IN), 0.25 ul (1.25 U) Ex Taq (TaKaRa, Japan), and molecular grade water to reach a volume of 25 ul. PCR was performed with 3 min of initial denaturation at 94 oC followed by 30 cycles of the following program (denaturation, 94 oC for 45 sec; annealing, 50 oC for 30 sec; and extension, 72 oC for 45 sec) followed by a final extension at 72 oC for 7 min. PCR products were cleaned using Qiagen MinElute kit (Qiagen, Valencia, CA) as directed and quantified using the Quant-iT PicoGreen assay (Invitrogen, Carlsbad, CA).

Equal concentrations were pooled and sequenced using the Illumina MiSeq platform at the Dana Farber Cancer Institute, Molecular Biology Core Facilities (Cambridge, MA).

12

2.3.3 Sequence analysis

Sequences were demultiplexed and trimmed of bar codes and primer sequences, then filtered for quality. Reverse and forward reads were assembled into contigs using default

parameters in FLASH [46]. Sequence processing was done in QIIME using the following workflow: Reads were aligned using default parameters (PyNAST) [47], operational taxonomic units (OTUs) were picked at the 97% similarity threshold using

QIIME [48] default parameters. Chimeric sequences were identified using ChimeraSlayer

[49] and removed, taxonomic assignments were made against the most recent Greengenes database (October, 2012) [50]. Rarefaction curves, alpha diversity (Shannon and

Simpson), richness (observed species and Chao1) and Good’s coverage were calculated using default parameters in QIIME [48]. The Shapiro-Wilk test was used to confirm the normal distribution of alpha diversity values, and ANOVA was used to indicate differences between samples and across transfer points using the statistical package R

[51].

Beta diversity estimates (Bray-Curtis and weighted Unifrac) were made at a sampling depth of 40,000 in QIIME [48]. Ordination of the Bray-Curtis and weighted Unifrac matrices was done using non-metric multidimensional scaling (NMDS) and principle components analysis (PCoA) respectively. 2-D plots were visualized in R [51]. Sequence data has been submitted to the NCBI Sequence Read Archive (SRA), Accession number:

SRP029387.

13

2.3.4 Metabolic outputs

Concentrations of acetate, ethanol, lactate, formate, propionate, and butyrate were measured from centrifuged, filtered culture supernatant by HPLC using a BioRad

Aminex HPX-87H column (Bio-Rad, Hercules, CA) with 0.005 M H 2SO 4 as the running buffer.

2.3.5 Isolation of microcosm bacteria

Eight strains were isolated from microcosms at Transfer 43. Prior to DNA extraction, culture purity was confirmed by microscopy and taxonomy assigned by amplifying the

16S rRNA gene using universal bacterial primers: 27F (5’-

AGAGTTTGATCCTGGCTCAG-3’) and 1492R (5’-ACGGCTACCTTGTTACGACTT-

3’) (Integrated DNA Technology, Coralville, IA). Using the taq PCR Core Kit (Qiagen,

Valencia, CA), each PCR reaction contained a single colony or 2.0 ul of culture, 5.0 ul

10X buffer, 2.0 ul MgCl 2 (25 mM), 1.0 ul dNTP (2.5 mM each), 5.0 mM (each) forward and reverse primers, 0.25 ul Taq polymerase and molecular grade water to reach a

volume of 50.0 ul. PCR was performed with 5 min of initial denaturation at 95 oC followed by 30 cycles of the following program (denaturation, 94 oC for 1 min; annealing, 53 oC for 1 min; and extension, 72 oC for 2 min) followed by a final extension at 72 oC for 10 min. PCR products were cleaned using Qiagen PCR Purification kit

(Qiagen, Valencia, CA) as directed and sequenced at the University of Massachusetts

Genomics Resource Lab (Amherst, MA) using an ABI 3130XL Sequencer (Beckman

Coulter, Danvers, MA). General taxonomy assignments based on full length 16S rRNA sequences were done using the Ribosomal Database Project (RDP) classifier tool [52,53].

14

Finer level taxonomic assignments were made by constructing neighbor-joining phylogenetic trees in MEGA [54] using 16S rRNA gene sequences from the isolates and all type species from the order Clostridiales downloaded from RDP [52].

2.3.6 Deep sequencing of pooled isolates

A sample containing equal concentrations of genomic DNA (5 ng/ul) from eight

bacteria isolated from the microcosm cultures was amplified and sequenced using the

Illumina Miseq platform. Sequence data was analyzed as described above. Local

BLASTN [55] was used to run full-length isolate 16S rRNA sequences against a database of the representative microcosm sequences created by QIIME [48] during OTU picking to determine which OTU mapped to each isolate.

2.4 Results

2.4.1 Metabolic outputs were consistent between replicates

Measurements of fermentation products (propionate, ethanol, butyrate, acetate, formate, and lactate) at fourteen time points via HPLC revealed that levels of lactate, formate, and propionate dropped below detectable levels after Transfer 20. Concomitantly, the levels of acetate and butyrate increased. There was a leveling off in the production of all metabolites after Transfer 30, with increasing variability between replicates in the production of acetate and butyrate over time. (Fig. 1).

2.4.2 Sequencing metrics of microcosm samples

As shown in Table 1, there were over 4.5 million sequences and 99,558 OTUs in the dataset following quality filtering and initial OTU picking in QIIME [48]. Chimera

15

removal using Chimera Slayer [49] left more than 4.2 million sequences and 92,346

OTUs for further analysis. Average read length was 254 bp.

50

45 Acetate 40

35

30

3 Butyrate

2

1

6 Ethanol 4

2

0

15 Formate 10 Concentration (mM) Concentration

5

0

1 Lactate

0

2 Propionate

1

0

0 20 40 60 Transfers

Figure 1. Short chain fatty acid metabolites in microcosms over time. Concentration (mM) of acetate, butyrate, lactate, and propionate measured by HPLC at thirteen transfer points. Error bars show standard deviation of five samples.

16 Table 1. Sequence metrics of five microcosm samples at nine transfer points.

Before Chimera Checked Chimera After Chimera > 1.5% Check Check Abundance Total number of 4,554,420 4,204,943 3,328,037 sequences Sequences per 114,734.5 105,123.575 83,200.925 sample (mean) Sequences per 49,400 45,989 37 sample (min) Number of OTUs 99,558 92,346 18 Total number of sequences, sequences per sample, and OTUs before and after chimera detection using Chimera Slayer[49].

T0 = S1 = S2 = S3 = S4 = S5 =

Figure 2. Rarefaction curves of microcosm sequencing data . Sequences per sample vs. observed species at a sampling depth of 40,000 sequences calculated and plotted in QIIME [48].

17 Rarefaction curves of the number of sequences per sample by observed species (Fig. 2) indicated a slowing down of new species at the minimum sequence depth of 40,000, but not a clear leveling off. The rarefaction curves were based at a low sampling depth due to one sample with a low number of sequences. Since all other samples had over twice this depth, and Good’s Coverage exceeds 94% for all samples we are confident that our sequencing efforts have captured most of the diversity in these samples.

2.4.3 Diversity is reduced early, and then stabilizes over time

Estimators of observed species (Fig. 3a), and Chao1 (Fig. 3b) indicated that species richness fell dramatically between Time 0 and Transfer 2, and then leveled off. Alpha diversity as estimated by Shannon (Fig. 4a) and Simpson Indices (Fig. 4b) showed a similar pattern, with the highest diversity at Time 0, followed by a sharp decline and leveling off. Based on ANOVA tests ( α=0.05) there were no statistical differences

between replicates or across transfers after Time 0 for all four diversity measures.

a b 10000 30000

7500

20000

5000 Chao1

10000 Observed Species 2500

0 0

0 20 40 60 0 20 40 60 Transfer Transfer

Figure 3. Richness of microcosms over time. Observed species ( a) and Chao1 (b) calculated in QIIME [48]. Error bars show standard deviation of five samples.

18 a b 1.00 8

0.75 6

0.50 4 Simpson Index Simpson Shannon Index Shannon

2 0.25

0 0.00

0 20 40 60 0 20 40 60 Transfer Transfer Figure 4. Alpha diversity in microcosms over time. Shannon Index ( a) and Simpson Index ( b) calculated in QIIME [48]. Error bars show standard deviation of five samples.

Beta diversity plots showed that after Time 0 there was no clear pattern in distance

between replicates or across transfers as shown by weighted Unifrac (Fig. 5a and b),

although there were two loose clusters dividing the earlier transfers (below 20) from the

later transfers (30 and above). Bray-Curtis dissimilarity plots showed the same pattern

(not shown).

a 0.2 b 0.2 Transfer 0 Sample 2 0.1 0 0.1 10 1 20 2 0.0 0.0 30 PCo2 3 PCo2 40 4 −0.1 −0.1 49 5 Explains 16.90% of variation 16.90% Explains Explains 16.90% of variation 16.90% Explains 55

−0.2 −0.2 62

−0.6 −0.4 −0.2 0.0 0.2 −0.6 −0.4 −0.2 0.0 0.2 PCo1 PCo1 Explains 57.57% of variation Explains 57.57% of variation

Figure 5. Beta diversity of microcosms by sample and transfer. Principle component (PCoA) plots of weighted Unifrac analysis colored by Sample ( a) and Transfer ( b). The circled region highlights Transfers 2-20.

19

2.4.4 Simplified community structure is consistent between replicates across transfers

The microbial community of the original sample was dominated by three major phyla:

Acidobacteria , Proteobacteria , and Verrucomicrobia , with a total of 10 phyla at an abundance of greater than 1.5%, comprising 93% of the sequences (Fig. 6). The

Firmicutes at Time 0 were at an abundance of 0.47%. There was an immediate and dramatic reduction in the number of phyla across replicates from Time 0 to Transfer 2 that was sustained throughout the time course, with only the Firmicutes and the

Proteobacteria present at abundances greater than 1.5% at and beyond Transfer 2.

1.00

Acidobacteria

0.75 Actinobacteria

Bacteria; Other

Bacteroidetes

Chloroflexi

0.50 FCPU426

Firmicutes

Relative Abundance Relative Planctomycetes

Proteobacteria 0.25 TM6

Verrucomicrobia

0.00

0 2 10 20 30 40 49 55 62 Transfer

Figure 6. Distribution of microcosm sequences by phyla. Relative abundance of sequences for each operational taxonomic unit found at or greater than 1.5% at each transfer point by phyla.

20

2.4.5 Taxa level community composition shifts dramatically early, then stabilizes

Consistent with the dramatic changes seen in phyla level abundance, there were enormous shifts in the identity and distribution of OTUs between Time 0 and Transfer 2.

At Time 0 there were twelve OTUs with abundance greater than 1.5%, all belonging to the Acidobacteria , Proteobacteria , and Verrucomicrobia phyla. Together these OTUs comprised only 33-42% of the total sequence abundance at Time 0, with the majority of

OTUs found at less than 1.5% abundance each.

Of the major phyla present at Time 0, only the Proteobacteria were present above 1.5% at Transfer 2, however the identity of the OTUs changed dramatically (Fig. 7).

0.20 0.15 1 0.10 0.05 0.00

0.20 0.15 2 0.10 0.05 0.00 Entbac1

0.20 Entbac2 0.15 3 0.10 Entbac3 0.05 Gamma1 0.00 Gamma2 Relative Abundance Relative 0.20 0.15 4 0.10 0.05 0.00

0.20 0.15 5 0.10 0.05 0.00 0 2 10 20 30 Transfer

Figure 7. Distribution of Proteobacteria sequences by OTU. Relative abundance of Proteobacteria sequences for each operational taxonomic unit found at or greater than 1.5% at each transfer point.

21

The two Proteobacteria OTUs found at Time 0 in abundances greater than 1.5%

(Gamma1 and Gamma2) did not persist in Transfer 2 and beyond, and the three OTUs from the Proteobacteria that were abundant after Transfer 2 were not abundant at Time 0.

These three OTUs, members of the Enterobacteriaceae, persisted until Transfer 20 and then declined (Fig. 7 and 8). None of the abundant OTUs from Time 0 persisted at

Transfer 2 at greater than 1.5% abundance. Additionally, none of the eight OTUs at 1.5% abundance or greater at Transfer 2 were found at greater than 0.09% in the Time 0 dataset.

0.75

0.50 1

0.25 Clost1

Clost2 0.00 Clost3 0.75 Clost4

0.50 2 Clost5

0.25 Clost6

0.00 Entbac1

0.75 Entbac2

Entbac3 0.50 3 Entcoc1 0.25 Lachno1 0.00 Relative Abundance Relative Lachno2 0.75 Lachno3

0.50 4 Lachno4

0.25 Lachno5

0.00 Lachno6

Rumino1 0.75 Rumino2

0.50 5

0.25

0.00 2 10 20 30 40 49 55 62 Transfer

Figure 8. Distribution of microcosm sequences by OTU, across all transfers. Relative abundance of sequences for each operational taxonomic unit found at or greater than 1.5% at each transfer point. Data is missing for Transfer 55, replicates 1 and 2.

22

Beyond Transfer 2, eighteen OTUs from five families were found at abundances greater than 1.5% (Fig. 8) including the Enterobacteriaceae discussed above. The four other families that contained abundant OTUs were the Enterococcaceae, Lachnospiraceae,

Clostridiaceae, and Ruminococcaceae (Fig. 8). The phylogenetic relationships of these

OTUs to selected type species is shown (Appendix A). By Transfer 62, ten OTUs were present in the sequence data at abundances greater than 1.5%, three of which were common to all five replicates (Fig. 9). Two OTUs, Entcoc1 and Lachno6 were found at greater than 1.5% abundance in every replicate and time point after Time 0. Entcoc1,

Lachno6, and Lachno4 comprised between 48-67% of the total sequence abundance in all

1 2 3 4 5

0.75

All shared

0.50 Entcoc1

Lachno4

Lachno6 Relative Abundance Relative

0.25

0.00

Transfer 62

Figure 9. Distribution of shared sequences in all replicates at Transfer 62. Relative abundance of three most abundant operational taxonomic units (OTUs) shared between all samples as well as all shared OTUs for Transfer 62.

23

replicates at Transfer 62 (Fig. 9). The five replicates at this time point were remarkably similar, sharing 265 OTUs that together comprised 83-93% of the reads (Fig. 9).

2.4.6 Sequencing metrics of eight pooled isolates

The number of total OTUs even at Transfer 62 was surprisingly high (21,255). To test the ability of our sequencing and analyses strategies to accurately assess OTU assignments, genomic DNA was pooled from the eight isolated strains and the resulting mixture was sequenced using the Illumina MiSeq platform as above for the microcosms.

As shown in Table 2, there were over 334,000 sequences and 5,900 OTUs in the dataset following quality filtering and initial OTU picking in QIIME [48]. Average read length was 254 bp. Chimera removal using Chimera Slayer [49] removed approximately 1,000 sequences, leaving 4,131 OTUs at a similarity threshold of 97% for further analysis.

Table 2. Sequence metrics of sequencing quality assessment.

Before Chimera After Chimera Isolate Check Check Sequences Total number of 334,881 333,086 305,356 sequences

Number of OTUs 5,918 4,131 8 Total number of sequences and OTUs before and after chimera detection using Chimera Slayer [49].

The sequence identities of the eight, pooled isolates were matched against the OTU sequences in the dataset using local BLASTN to verify recovery of the known sequences.

Percent identity to each OTU representative sequence ranged from 97% to 100%, and the coverage was 100%. Phylogenetic affinities between the isolate sequences, OTU representatives, and selected type species verify of relationships (Appendix A).

24

The eight isolates corresponded to the top eight OTUs and represented 92% of the total reads. The remaining reads all classified as Firmicutes.

2.5 Discussion

This research suggests that for plant degrading anaerobic bacteria from soil, a dramatic change in microbial community composition occurs during the initial process of culture enrichment and serial transfers, but this drastic change is followed by relative stability, in a process that is highly replicable. Additionally, this strategy enables the discovery of novel microbes in a community context that might otherwise evade detection due to low abundance in the natural environment.

2.5.1 Dramatic change early during the time course

The soil sample used to inoculate the microcosm cultures at Time 0 was comprised of three abundant phyla ( Acidobacteria , Proteobacteria , and Verrucomicrobia ), with over half of the sequences spread between several less abundant phyla, notably the

Actinobacteria, Bacteroidetes, Planctomycetes , and Chloroflexi). At Time 0 in this study,

as in many other soil surveys, the Firmicutes was a relatively rare group. The distribution

of bacterial phyla found at Time 0 in this study mirrors reports from 16S rRNA surveys

for a variety of soil samples [56–58].

The rapid and dramatic change in bacterial community structure we observed between

Time 0 and Transfer 2 has been observed in other microcosm experiments [59,60]. All

eighteen OTUs present at abundances of greater than 1.5% after Transfer 2 were present,

25

but extremely rare in Time 0, and the Time 0 and Transfer 2 cultures had no OTU in common at abundances greater than 1.5%.

Further changes in abundance, notably the rise and decline of Enterobacteriaceae taxa, occurred over a period of 30 transfers, indicating that the community that emerged from the initial shift was resistant to further change. By Transfer 62, 83-93% of the total sequence abundance was shared between all samples. Three OTUs, Lachno4, Entcoc1 and Lachno6 comprised between 48-67% of these sequences.

2.5.2 Stability of microcosm communities over time

Throughout the time course of this experiment, four bacterial families, Enterococcaceae,

Lachnospiraceae, Clostridiaceae, and Ruminococcaceae were abundant and persistent in all samples (Fig. 7). There were 21,256 OTUs at Transfer 62, however only 265 were common across all five replicates. This suggests the possibility that many OTUs are spurious, as many of the OTUs that are not shared are at low abundance. We thus sequenced a pooled sample of eight isolates from the microcosms in order to assess our ability to distinguish real OTUs from artifacts.

We were able to recover the eight isolate sequences from the 5,900 called OTUs in the sequencing assessment dataset at abundances greater than 1.5%. The relative abundances of the eight isolates were not equal indicating primer/ PCR bias, differences in DNA recovery and/or variation in rRNA gene copy numbers (S5). Errors in PCR and sequencing can generate artifacts resulting in overestimation of diversity [61,62].

26

Differentiating real sequences from artifacts presents a challenge for researchers seeking to estimate OTU abundance and diversity accurately, and is an active topic of discussion in the field of microbial ecology.

Following published recommendations [61], we used a 97% threshold in our OTU

picking method and focused our analysis on more abundant OTUs, and those that were shared between samples and over time. Despite the risk of ignoring important rare taxa, we chose a more conservative abundance cutoff (1.5%) based on the resolution of our sequence assessment experiment. As sequencing technology improves it will be possible to determine the number of OTUs in a complex sample with greater accuracy and confidence.

2.5.3 Reproducibility

Our microcosm enrichments produced simplified, replicate communities from a complex environmental sample. A similar result was found from adapted consortia from tropical soil also using switchgrass as a carbon source in which a few dominant

Firmicutes OTUs and an abundance of rare taxa emerged after 8 weeks of incubation

[63]. Stable, replicable consortia of plant degraders have also been produced in intentional communities of up to eight species [42]. We know of no study that demonstrates community stability and replicability over a time course as long as three years, and suggest that these results demonstrate the robustness of this strategy for the long term cultivation of bacterial communities in industrial contexts such as bioreactors or probiotic production. Since the microcosm experimental design in no way models

27

conditions in the natural soil environment, we make no speculations about whether the observed patterns of diversity reflect ecologically relevant trends.

2.5.4 Potential functional redundancy among the four Firmicutes families

While the four bacterial families, Enterococcaceae, Lachnospiraceae, Clostridiaceae, and Ruminococcaceae were abundant and persistent in all samples the distribution of

OTUs within each family differed by replicate (Fig. 8). We suggest that these closely related taxa may be fulfilling similar ecological roles in the microcosm environment.

While it is unclear whether these groups form stable associations in nature, we hypothesize that they were active in the utilization of plant material in our experimental system, and served roles similar to those described in previous studies [2,64,65]. The

Lachnospiraceae, Ruminococcaceae and some Clostridiaceae are well known gut symbionts that are specialized to degrade complex plant material [66–68]. We suspect that taxa from these groups, specifically OTUs closely related to known cellulose degraders Clostridium phytofermentans, Clostridium sporosphaeroides , and Clostridium celerecrescens (Appendix A), to be involved in solubilizing complex plant polysaccharides to oligosaccharides, making them available for noncellulolytic, fermentative taxa such as those related to Clostridium xylanovorans and Enterococcus sp .

In addition to being common gut symbionts, Enterococcaceae are ubiquitous in soils,

freshwater and marine sediments, watersheds, and vegetation (reviewed in [69]). They

are capable of fermenting a wide variety of carbon sources producing short chain fatty

acids such as acetate, lactate, and butyrate, but they are not cellulolytic [69,70]. There is

an extensive body of literature describing the pathogenicity of Enterococcus sp . [70] and

28

their utility as indicators of fecal contamination in drinking water [69,71]. Enterococcus

is used in cocultures, producing bacteriocins in the production of fermentation products

such as sausages [72] and cheese [73], but relatively little is known about their ecology

and how they interact with other taxa in the environment.

We are especially intrigued by the persistent co-occurrence of OTUs Entcoc1, Lachno4,

and Lachno6, as these are closely related to bacteria that are commonly found in the gut.

Given the high abundance of the Entcoc1, an OTU related to a group of well described

lactic acid bacteria, we were surprised to detect low amounts of lactic acid in the culture

supernatant. We suggest that lactate utilizers in the community were using this resource

to produce butyrate and acetate as has been described in gut environments [74,75].

Elucidating the roles and interactions between members of these and other members of

our microcosm communities in the degradation of plant material is a subject of continued

research.

2.6 Conclusions

Enrichment and serial transfers have been traditionally used to aid in the isolation of

bacteria from environmental samples, but the specific changes communities undergo, the

stability of communities across transfers, and the replicability of such strategies over a

long time course have not been described. This research suggests that enrichment and

serial transfers do not generate communities that are strongly divergent in terms of

bacterial community diversity and membership or metabolic outputs. While community

membership over time may differ slightly between replicate samples, the differences

29

involve closely related species, thus reflect functional redundancy and not divergent communities or metabolic activity. Additionally, this strategy represents an effective way to detect and select for novel taxa that might be missed in shorter-term enrichments and isolation protocols from complex communities. These have implications for the maintenance of long-term stable communities in bioreactors, and other industrial processes utilizing mixed communities.

30

CHAPTER 3

AN IN VITRO MODEL OF THE HORSE GUT MICROBIOME ENABLES IDENTIFICATION OF LACTATE-UTILIZING BACTERIA THAT DIFFERENTIALLY RESPOND TO STARCH INDUCTION

3.1 Abstract

Laminitis is a chronic, crippling disease triggered by the sudden influx of dietary starch.

Starch reaches the hindgut resulting in enrichment of lactic acid bacteria, lactate

accumulation, and acidification of the gut contents. Bacterial products enter the

bloodstream and precipitate systemic inflammation. Hindgut lactate levels are normally

low because specific bacterial groups convert lactate to short chain fatty acids. Why this

mechanism fails when lactate levels rapidly rise, and why some hindgut communities can

recover is unknown.

Fecal samples from three adult horses eating identical diets provided bacterial

communities for this in vitro study. Triplicate microcosms of fecal slurries were enriched with lactate and/or starch. Metabolic products (short chain fatty acids, headspace gases, and hydrogen sulfide) were measured and microbial community compositions determined using Illumina 16S rRNA sequencing over 12-hour intervals.

We report that patterns of change in short chain fatty acid levels and pH in our in vitro system are similar to those seen in in vivo laminitis induction models. Community differences between microcosms with disparate abilities to clear excess lactate suggest profiles conferring resistance of starch-induction conditions. Where lactate levels recover following starch induction conditions, propionate and acetate levels rise correspondingly

31

and taxa related to Megasphaera elsdenii reach levels exceeding 70% relative abundance.

In lactate and control cultures, taxa related to Veillonella montpellierensis are enriched as lactate levels fall. Understanding these community differences and factors promoting the growth of specific lactate utilizing taxa may be useful to prevent acidosis under starch- induction conditions.

3.2 Introduction

3.2.1 The equine digestive tract

Horses are hindgut fermenters, adapted to grazing continually on marginal forages that change seasonally, thus slowly [76]. The hindgut (caecum and colon) comprises roughly two thirds of the volume of the equine digestive tract [77]. Here complex plant material is fermented by microbes to short chain fatty acids (SCFA) such as acetate, propionate, and butyrate, which provide 60-70% of the daily energy needs of the horse [22,78]. Rapid dietary change and modern feeding practices of 2-3 meals a day of starch-based concentrate and/or fructans from rich pasture can disrupt normal fermentation in the hindgut, causing lactic acidosis, and colic [79–82], and predisposing animals to bouts of laminitis [83–85].

3.2.2 Laminitis and lactic acidosis

Laminitis is a chronic, crippling disease, accounting for 15% of all lameness in horses in the United States, with over 27% unable to return to normal work, and 4.7% mortality

[86]. It is characterized by weakened adhesion and eventual detachment of the distal phalynx from the lamellae of the inner hoof wall resulting in permanent rotation of the

32

coffin bone and severe pain. Factors released into the bloodstream by bacteria in the gut during lactic acidosis are thought to serve as triggers for dietary laminitis [87], however the molecular mechanisms underlying induction are unknown.

Surveys of equine hindgut bacteria using culture based methods [88] and 16S rRNA gene sequencing [31] have detected a diverse community of novel microbes dominated by Firmicutes, with Bacteroidetes, Proteobacteria, and Verrucomicrobia as other major phyla. Studies have detected a greater proportion of fibrolytic bacteria than starch and lactate utilizing bacteria in the cecum than in the colon [89], reflecting a substrate content normally low in starch and soluble carbohydrates due to the action of endogenous enzymes and absorption of nutrients in the small intestine.

Experimental in vivo models of laminitis induction using starch gruel or oligosaccharide

[90] have revealed changes in hindgut microbiota during the developmental stage (24-36 hours post induction), correlated with a drop in pH and an increase in lactic acid concentration [91]. Lactic acid bacteria, specifically members of the Streptococcus bovis/ equinus group (now renamed Streptococcus lutetiensis [92]), have been implicated as

major producers of lactic acid, rapidly increasing in numbers as lactic acid levels rise and

the pH drops. At the lowest caecal pH (4-4.5) and levels of lactic acid reaching 1000

umol/g caecal fluid, acid sensitive fibrolytic and gram negative bacteria die off, while

Lactobacilli sp . and Mitzuokella sp. increase [91]. By 32-36 hr, hindgut lactate levels and

pH approach normal levels in most horses [91]. In other experiments of starch induction,

33

blood D-lactate levels peaked at 20-24 hr, then declined and disappeared by 36-40 hr

[93].

3.2.3 Lactate utilizing bacteria

Lactate levels in the hindgut are normally low due to the activity of lactate utilizing

bacteria. It is unclear why this mechanism fails during conditions of starch induction.

While studies of lactate producers have pointed to specific taxa that proliferate during the

developmental stage of laminitis [91] , little is known about how the abundance of lactate

utilizing bacteria changes over the same time course, which lactate utilizers survive the

drop in pH, and which lactate utilizers are active in the later stages to bring lactate

concentrations back to normal levels.

3.2.4 Experimental summary

In this study we used fecal samples collected from 3 healthy, adult horses eating an

identical pasture based diet in an in vitro model system to track bacterial metabolites and community shifts over time in response to enrichment with starch and/or lactate. Patterns of changes in lactate concentration and pH were similar to those reported in published in vivo studies [79]. Illumina 16s rRNA amplicon sequencing was used to track changes in hindgut microbiota over the time course, specifically identifying lactate utilizing taxa and

bacterial groups that proliferated as lactate levels drop. Additionally, we identified community differences in cultures lacking the ability to clear excess lactate, which may lead to further insight into why some horses are resistant to starch induction, and point to

bacteria with the potential to attenuate or prevent lactic acidosis.

34

3.3 Materials and Methods

3.3.1 Sampling and in vitro enrichments

Fecal samples were manually collected from the midrectum of three Morgan geldings cohoused at the University of Massachusetts Hadley Farm and fed identical hay based diets. None of the animals had received antibiotics, anthelmenthics, or other medications for at least three months prior to sampling. Samples were protected from oxygen exposure during collection by inverting the glove around each as it was removed, and

placing each immediately in a container evacuated of oxygen. Samples were kept as close to 39 oC as possible in a heated, insulated container during transport and the following steps of dilution and inoculation. In an anaerobic chamber, samples were diluted to make a 10% slurry in anaerobic dilution media prepared as described by Bryant [94]. This slurry was used to inoculate basal media described by de Carvalho [95] at 2.5% which was enriched with either 1% soluble starch, 50 mM sodium lactate, or both 1% soluble starch and 50 mM sodium lactate. Control cultures were diluted at 2.5% of basal media with no enrichment. Cultures were kept under anaerobic conditions, incubated at 39 oC and sampled every 12 hours for metabolite analysis and DNA extraction. Sampling and enrichments were subsequently repeated for the same horses, inoculant concentrations, culture and incubation conditions in 125 ml serum bottles (Wheaton, Millville, NJ) fitted with stoppers that enable headspace gas collection and analysis.

3.3.2 Metabolite measurements

Short chain fatty acids (acetate, lactate, butyrate, succinate, formate, and propionate) were measured for samples taken over the time course using high performance liquid

35

chromatography (HPLC) (Shimadzu, Japan) with an Aminex HPX-87H column (Bio-

Rad, Hercules, CA). Headspace gases were analyzed using a gas chromatograph

(Shimadzu GC-8A, Shimadzu, Japan) fitted with a HayeSep DB column 100/120

(Bandera, TX). Hydrogen sulfide levels were measured using the methylene blue assay as in Cline [96] modified for culture samples as follows: at each time point, 1.0 ml of culture was removed from each serum bottle via syringe, transferred into sealed vials containing an equal volume of 1.2% degassed zinc acetate solution, and stored at 4 oC until all samples were collected. To a 1.0 ml subsample, 62.5 ul of 7% sodium hydroxide was added. Following a 15 min incubation at room temperature, 187.5 ul of 0.1% N,N’- dimethyl-p-phenylenediamine and 187.5 ul 10 mM iron (III) chloride were added, stoppered immediately, and incubated for 20 minutes at room temperature. The resulting suspension was spun down and the absorbance of the supernatant was measured at 670 nm in comparison to standard solutions (0 - .55mM sodium sulfide, and uninoculated media controls. pH was determined using EMD colorphast pH strips (Fischer Scientific,

Pittsburgh, PA).

3.3.3 DNA extraction, amplification, and sequencing

DNA was extracted from each sample as described elsewhere [44] with the following modifications. Samples were subjected to two extractions in Matrix E bead tubes containing 5% CTAB in 1 M NaCl and .25M phosphate buffer (pH 8), phenol: chloroform: isopropyl alcohol (25:24:1), and 0.1 M ammonium aluminium sulfate, followed by separation with 24:1 chloroform: isoamyl alcohol. DNA was then precipitated with 30% polyethylene glycol 6000 in 1.6 M NaCl, washed with 70%

36

ethanol, and resuspended in 10 mM Tris, pH 8. Replicate extractions were pooled and purified using MoBio PowerClean DNA clean-up kit (MoBio, Carlsbad, Ca) and quantified using Quant-iT PicoGreen assay (Invitrogen, Carlsbad, Ca).

Amplification of the V4 region of the 16S rRNA gene and attachment of Illumina adaptors and barcodes for multiplexing samples (reverse read only) was done in triplicate as described elsewhere [45]. Briefly, genomic DNA was amplified with universal forward 515F (5’- Illumina adapter- Forward primer pad- Forward primer linker-

GTGCCAGCMGCCGCGGTAA-3’) and universal reverse 806R (5’- Illumina adapter-

Golay barcode- Reverse primer pad- Reverse primer linker-

GGACTACHVGGGTWTCTAAT-3’). Each PCR reaction mixture contained 10 ng of genomic DNA, 2.5 ul 10X buffer, 2.0 ul MgCl 2 (25 mM), 2.0 ul dNTP (2.5 mM each),

5.0 mM (each) forward and reverse primers, 1.25 ul (25 ug) BSA (Roche, Indianapolis,

IN), 0.25 ul (1.25 U) Ex Taq (TaKaRa, Japan), and molecular grade water to reach a volume of 25 ul. PCR was performed with 3 min of initial denaturation at 94 oC followed by 30 cycles of the following program (denaturation, 94 oC for 45 sec; annealing, 50 oC for 30 sec; and extension, 72 oC for 45 sec) followed by a final extension at 72 oC for 7 min. PCR products were cleaned using Qiagen MinElute kit (Qiagen, Valencia, Ca) as directed and quantified using the Quant-iT PicoGreen assay (Invitrogen, Carlsbad, Ca).

Equal concentrations were pooled and sequenced using the Illumina MiSeq platform at the Dana Farber Cancer Institute, Molecular Biology Core Facilities (Cambridge, MA).

3.3.4 Sequence analysis

Sequences were demultiplexed and trimmed of bar codes and primer sequences using

37

FastQC [97], then filtered for quality and reverse and forward reads were assembled into contigs using FLASH [46]. Sequence processing was done in QIIME [48] using the following workflow: Reads were aligned using default parameters (PyNAST) [47], operational taxonomic units (otus) were picked at the 97% similarity threshold using the subsampled open-reference option, chimeric sequences were identified using

ChimeraSlayer [49] and removed, taxonomic assignments were made against the most recent greengenes database (October, 2012) [50]. Sequence data has been submitted to the NCBI Sequence Read Archive (SRA), Accession number: SRP028582.

3.4 Results

3.4.1 Patterns of changes in metabolites

Despite variation in the pH response between the three horses, in all of the starch and starch/lactate enrichments, the pH dropped to below 6 by hour 12, and reached levels

between 4 and 5 by hour 18 (Fig. 10). These values paralleled the peak in lactate levels for these cultures over the same time course (Fig. 11). The control and lactate enriched treatment groups showed an initial increase in lactate followed by a rapid decline as the lactate was used up. In the starch enriched cultures, lactate levels peaked by hour 18, exceeding 100 mM in cultures enriched with both starch and lactate. Where lactate levels dropped over time, there was a corresponding increase in acetate, propionate, and butyrate.

The three horse cultures differed dramatically in the ability to attenuate accumulated lactate. The lactate was reduced to below detectable limits in the control and lactate

38

!"#$%"& '()$($* +$(%), +$(%),-(#.-'()$($*

2 ?"%<*-4

?"%<*-5

?"%<*-6

1 >?

0

/

3 43 53 63 73 /3 3 43 53 63 73 /3 3 43 53 63 73 /3 3 43 53 63 73 /3 89:*-;,%<=

Figure 10. pH of cultures over time by horse. pH of cultures from each horse and culture condition measured at 6 hour intervals.

enriched treatment groups of all three horse cultures, however the peak was higher for horse 3 cultures and took longer to drop. In the starch and starch/lactate enrichments the differences were more dramatic. Lactate persisted at maximum levels in horse 3 cultures over the full time course while dropping for horse 1 and 2 cultures by hour 36.

Headspace gases (hydrogen and methane) and hydrogen sulfide levels measured at 12 hour intervals did not show consistent differences between treatment and control conditions due to variation between horses especially for the starch and starch/lactate cultures (Appendix B and C).

39

!"#$%&' !"#$%&( !"#$%&) '+*

=2%.1.%

'** >?.@#1.% 012.1.% ,"-.#"/ A#"B8"-1.%

+*

* '+*

'** 012.1.%

+*

* '+*

'** 3.1#24& ,"-2%-.#1.8"-&:9<;

+*

* '+* 3.1#24&&1-5&012.1.%

'**

+*

*

* '* (* )* 6* +* * '* (* )* 6* +* * '* (* )* 6* +* 789%:4#$;

Figure 11. Short chain fatty acid metabolites over time by horse. Concentration (mM) of acetate, butyrate, lactate, and propionate measured by high performance liquid chromatography from each horse and culture condition at times 0, 6, 18, 36, and 48 hours.

3.4.2 Sequence metrics

As shown in Table 3, there were over 6 million sequences and 41,000 OTUs in the dataset following quality filtering and initial OTU picking in QIIME [48]. Chimera removal using Chimera Slayer [49] left more than 2.5 million sequences and 32,000

OTUs for further analysis. Average read length was 254 bp.

Rarefaction curves of the number of sequences per sample by observed species

(Appendix D) indicated a leveling off in terms of new species at the minimum sequence

40

depth of 9685. Since 83% of the samples had over twice this depth, and Good’s coverage at the depth of the smallest library (9685) was 89% (Appendix E), we are confident that our sequencing efforts have captured most of the diversity in these samples.

Table 3. Sequence metrics of horse gut samples.

Before After Chimera Check Chimera Check Total number of sequences 6,043,194 2,576,535 Sequences per sample 125,900 53,678 (mean) Sequences per sample 51,890 9,685 (min) Number of OTUs 41,055 32,563 Total number of sequences, sequences per sample, and OTUs before and after chimera detection using Chimera Slayer.

3.4.3 Relative abundance at the phylum level

Taxonomic assignments at the phylum level (Fig. 12) showed that the Firmicutes were the most abundant group by time 48 in all cultures regardless of enrichment conditions, with a corresponding decline in all other major groups, namely the Verrucomicrobia,

Spirochaetes, Proteobacteria, and Bacteroidetes, however at this taxonomic level, clear differences between horse and treatment groups were not apparent.

3.4.4 Relative abundance at the family level for Firmicutes

Differences between horse cultures and treatment conditions were evident at the family level, specifically for the most abundant phyla, the Firmicutes. The Veillonellaceae, a family with known lactate utilizing species [98,99], increased in abundance in all cultures in the control and lactate treatment groups. In the starch and starch/lactate enrichments there were dramatic differences between horse 3 cultures and those of horse 1 and 2

41

!"#$%&' !"#$%&( !"#$%&) '+**

*+-, ."/0#"1

*+,*

*+(,

*+** '+**

B#463%3E&FD#G3#463%"03 *+-,

234030% H340%#<3E&H340%#"<7%0%$ *+,* H340%#<3E&I

*+(, H340%#<3E&I<#=<4D0%$

*+** H340%#<3E&ID$"C340%#<3 '+** H340%#<3E&J#"0%"C340%#<3

*+-, H340%#<3E&5K<#"463%0%$ 503#46& @%130

H340%#<3E&L%##D4"=<4#"C<3 *+(, H340%#<3EM06%# *+** '+** 503#46&&3/7&234030%

*+-,

*+,*

*+(,

*+**

* '( (8 )9 8: * '( (8 )9 8: * '( (8 )9 8: ;<=%>6#$?

Figure 12. Distribution of horse gut sequences by phyla. Relative abundance of sequences for each operational taxonomic unit found at or greater than 1% at each time point for each culture condition and horse sample by phyla.

(Fig. 13). By time 48, the Veillonellaceae dropped to less than 5% in horse 3 cultures, while in horse 1 and 2 cultures this group was the highest in abundance, making up more than 70% of the total sequences (Fig. 4). The most abundant family for the starch and starch/lactate horse 3 cultures at time 48 were the Lactobacillaceae, making up greater than 40% of the total sequences at this time point.

The Streptococcaceae reached a peak in abundance by 24 hours in the starch and starch/ lactate conditions for all horse cultures, after which the abundance dropped. In horse 1 and 3 cultures this decrease was accompanied by an increase in the Lactobacillaceae,

42

!"#$%&' !"#$%&( !"#$%&)

*+-, ."/0#"1

*+,*

*+(,

*+** E34<11

*+-, .1"$0#<7<3FG06%#

234030% .1"$0#<7<34%3% *+,* ."#<"C340%#<34%3%

*+(, 2346/"$H<#34%3%

*+** 2340"C34<1134%3%

2340"C34<1131%$FG06%# *+-, 2%D4"/"$0"434%3% 503#46 @%130

@D=

*+,*

*+(,

*+**

* '( (8 )9 8: * '( (8 )9 8: * '( (8 )9 8: ;<=%>6#$?

Figure 13. Distribution of horse gut Firmicute sequences by family. Relative abundance of Firmicutes sequences found at or greater than 1% by family at each time point for each culture condition and horse sample.

however in horse 2 the abundance of Lactobacillaceae remained relatively low even as the Streptococcaceae declined over time.

3.4.5 Distribution of abundant OTUs

Identification of specific taxa that were most abundant by hour 48 (Fig. 14 and

Appendix F) suggested community differences between horse cultures of specific interest in light of differences in lactate utilization and attentuation. There was a striking difference in OTU abundance and distribution between horse 3 cultures and those of horse 1 and 2 in all treatment conditions at hour 48. While the most abundant sequence

43

!"#$%&' !"#$%&( !"#$%&)

*+.

*+-

*+, 9%241:;%&<=>08405%

*+(

/"01#"2 345141% 614#57 614#57&408&345141% /"01#"2 345141% 614#57 614#57&408&345141% /"01#"2 345141% 614#57 614#57&408&345141%

<04%#"$?"#"=451%#&@"=:2:$&$1#4:0&AB6CD&,**''&EF,GH S4#4=451%#":8%$&8:$14$"0:$&

/2"$1#:8:>@&?4#4?>1#:I:5>@&$1#4:0&@:0401:>@&$>=$?+&2451:2V1:54&J

3451"=45:22>$&%O>:&$1#4:0&PAJ&*,KK&E'**GH& 61#%?1"5"55>$&%O>:&E'**GH

B%N4$?74%#4&%2$8%0::&M6B&(*,-*&E'**GH W%:22"0%224&54;:4%&$1#4:0&SW'EFFGH

B:1$>"Q%224&R4242>8:0::&$1#4:0&BF&E'**GH W%:22"0%224&@"01?%22:%#%0$:$&$1#4:0&

Figure 14. Distribution of most abundant horse gut OTUs at time 48. Relative abundance of OTUs found at greater than 5% in each culture condition by horse. OTUs are identified by best BLAST match, and percent sequence similarity is given.

(between 50-60% relative abundance, with 98% identity to Veillonella montpellierensis ) under control and lactate conditions for horse 3 was also found in horse 1 and 2 cultures, it reached a lower abundance in both. While horse 3 control and lactate enriched cultures were able to attenuate lactate, it persisted in high concentrations in starch and starch/lactate enrichments in which no member of the Veillonellaceae was highly abundant.

A different OTU, related to Megasphaera elsdenii (100% identity) was the most abundant OTU in horse 1 and 2 cultures, especially in the starch and starch and lactate

44

enrichments, reaching relative abundances of over 70%. While this OTU was present in single numbers at Time 0 in all three horse samples, and persisted in horse 3 in low numbers under all culture conditions (Fig. 15), it never reached abundances of greater than 0.40% at any condition or time point.

!"#$%&' !"#$%&( !"#$%&)

*+-, ."/0#"1 *+,*

*+(,

*+**

*+-, 234030% *+,*

*+(,

*+**

*+-, 503#46

>%130:?%&@AB/73/4% *+,*

*+(,

*+**

*+-, 503#46&3/7&234030%

*+,*

*+(,

*+** * '* (* )* 8* ,* * '* (* )* 8* ,* * '* (* )* 8* ,* 9:;%<6#$=

@11&C%:11"/%1134%3% 5%1%/";"/3$&A"?:$&$0#3:/&HI&

Figure 15. Change in most abundant horse gut Veillonellaceae OTUs over time. Relative abundance of most abundant Veillonellaceae OTUs over time in each culture condition by horse. OTUs are identified by best BLAST match, and percent sequence similarity is given.

The most abundant OTU in horse 3 starch and starch and lactate cultures was related to

Lactobacillus equi (100% sequence identity) and represented 97.7% of the

Lactobacillaceae sequences across the dataset. This OTU was present in all three horse

45

!"#$%&' !"#$%&( !"#$%&)

?"<7#"5 *+, @6>767%

A76#>2

A76#>2&6<=&@6>767%

*+)

*+( 4%567/8%&9:;<=6<>%

*+'

*+*

* '* (* )* ,* -* * '* (* )* ,* -* * '* (* )* ,* -* ./0%&12#$3

Figure 16. Change in single, abundant horse gut Lactobacillaceae OTU over time. Relative abundance of the single, dominant Lactobacillus OTU over time for each culture condition by horse.

cultures, reaching abundances of 20% and greater in horse 1 in starch and starch and lactate enrichments at time 36, but then falling to 10% or less by time 48 (Fig. 16).

Instead of being dominated by one or two abundant OTUs, the distribution of the

Streptococcaceae taxa was more dispersed, with the two most abundant OTUs (related to

Streptococcus equi and Streptococcus infantarius (with 100% and 99% sequence identities respectively) together accounting for less than 50% of the sequences in this group (Fig. 17).

46

!"#$%&' !"#$%&( !"#$%&) *+'**

*+*-, ."/0#"1

*+*,*

*+*(,

*+*** *+'**

*+*-, 234030%

*+*,*

*+*(,

*+*** *+'**

*+*-, 503#46 >%130:?%&@AB/73/4% *+*,*

*+*(,

*+*** *+'** 503#46&3/7&234030%

*+*-,

*+*,*

*+*(,

*+*** * '* (* )* 8* ,* * '* (* )* 8* ,* * '* (* )* 8* ,* 9:;%&<6#$=

50#%C0"4"44B$&%DB:&<'**E= 50#%C0"4"44B$&:/F3/03#:B$&$BA$C+&:/F3/03#:B$&.G'H&

Figure 17. Change in abundant horse gut Streptococcaceae OTUs over time. Relative abundance of dominant Streptococcus OTUs over time for each culture condition by horse. OTUs are identified by best BLAST match, and percent sequence similarity is given.

3.5 Discussion

3.5.1 Our in vitro system captures elements observed in vivo

The accumulation of lactate has long been recognized as an early event in the microbial response to carbohydrate overload leading to colic and laminitis in horses [79]. In the in vitro system described herein, we were able to simulate many key aspects of starch induced conditions that have been reported elsewhere for in vivo experiments. The extent

and timing of pH changes (Fig.10) and fluctuations of SCFA levels (Fig. 11), especially

47

with respect to lactate, acetate, and propionate are similar to those reported for in vivo starch induction studies [79,91].

No study thus far has tracked microbial changes over the time course of starch induction in horses using 16S rRNA deep sequencing as described here, however, the changes that we observed in community composition are similar to what has been reported in equine and ruminant culture and probe based studies [81,100]. Specifically, in the first 24 hours following enrichment, we noted an increase in Firmicutes, especially members of the

Streptococcaceae, coinciding with a decrease in fibrolytic groups (Ruminococcaceae and

Lachnospiraceae), followed by an increase in the Lactobacillaceae as the abundance of

Streptococcaceae falls.

Despite a sample size of three horses, we were able to observe between-horse differences in the ability to attenuate accumulated lactate as has been described in previous studies [79,91].

Certainly the microbial dynamics reported here may not reflect the actual community compositions of the caecum and large intestine of horses who recover or resist conditions of lactic acidosis, however the elucidation of endogenous lactate utilizers that thrive under conditions of low pH following starch enrichment in our in vitro model may provide insight into microbial mechanisms of resistance and/or recovery.

48

3.5.2 Lactate producing bacteria proliferate following starch induction

While we observed the pattern of increase in Streptococcus species during the first 24 hours of starch induction followed by an increase in Lactobacilli species noted in other studies [81,91] we expected the Streptococcus lutetiensis group (formerly Streptococcus bovis [92]) to be more highly represented in the starch and starch/lactate cultures, as it has been identified as the major lactate producer in in vivo starch induction studies

[93,101]. Our data indicates a high abundance of the family Streptococcaceae in the starch and starch/lactate cultures, but fails to resolve subtle differences between species.

We recognize that basing taxonomic assignments on a single region (V4) of the16S gene is inherently limited [102,103] for all bacterial groups since 500-700 bases are recommended for species resolution [104], and suspect that taxa within the Streptococcus lutetiensis group may be especially difficult to distinguish due to their close phylogenetic relationships [92,105]. Sequencing with longer reads would clarify the species composition of this family in our starch and starch/lactate enrichments.

3.5.3 Members of the Veillonellaceae are the most abundant lactate utilizing bacteria

While studies have pointed to Streptococcus lutetiensis and Lactobacillus sp as major lactic acid producers [93], relatively little attention has been paid to bacteria in the horse gut that actively utilize lactic acid. Studies of lactic acidosis in ruminants [106,107] have identified specific genera, namely Megasphaera , Veillonella , Selenomonas ,

Propionibacterium , and Anaerovibrio as key lactate utilizers. In fact, strains of

Megasphaera elsdenii have been shown to be effective in preventing lactic acidosis in cattle and are under development as probiotic therapies [108–110].

49

Using deep sequencing of the 16S rRNA gene of fecal communities challenged with starch, lactate, or both in an in vitro model of starch induction, we report here changes in the abundance of specific microbes associated with the reduction of lactic acid. As we observed at time 0 in this study, 16S rRNA surveys of the equine gut microbiome have shown that under normal conditions, the Veillonellaceae (known lactate utilizers) comprise 1% or less of the total bacterial abundance [31,111,112]. One probe-based in vivo study did not see a difference in the abundance of Veillonellaceae in response to dietary change despite an increase in lactate levels [101]. It is unclear why the

Veillonellaceae in that study did not increase in abundance as lactate accumulated, or which specific taxa were present in those horses.

In our study we observed that one particular taxa closely related to Megasphaera elsdenii was highly abundant in all starch and starch/lactate cultures in which lactate accumulated and was attenuated. This taxa was present in very low abundance in cultures in which lactate persisted. It is unclear why this taxa fails to proliferate in any of the horse 3 cultures while reaching such high abundances under all conditions in horse 1 and

2 cultures. A second OTU related to Veillonella montpellierensis was highly abundant in control and lactate enrichments specifically in horse 2 and 3 cultures, but did not thrive in the starch or starch/lactate enrichments, suggesting that other factors such as pH or competitive interactions exert selective pressure under starch and starch/lactate enrichment conditions.

50

While our data indicates a relationship between the presence of the Megasphaera elsdenii OTU and the reduction of lactate, conclusive evidence that this taxa is responsible for reducing lactate levels will require further study. It is possible that other community members with lactate utilizing capabilities are playing active roles as well.

Understanding the factors stimulating or preventing the proliferation of lactate utilizers in the horse gut microbiome could provide valuable information about why some horses are more sensitive to starch induction, and the microbial basis behind mechanisms of resistance.

3.6 Conclusion

A robust in vitro model for starch induced laminitis in horses as described here could provide a convenient and cost effective means to understand the microbial dynamics underlying colic and laminitis, and test hypotheses for ways to prevent or interrupt the progress of these equine diseases. Specific taxa in the family Veillonellaceae were highly abundant in starch-enriched cultures that were able to attenuate lactate. These could provide useful insights into mechanisms of recovery or resistance, and could be valuable, individually or in consortia, as probiotics to prevent starch induced colic and laminitis.

51

CHAPTER 4

UNTANGLING THE GENETIC BASIS OF FIBROLYTIC SPECIALIZATION BY LACHNOSPIRACEAE AND RUMINOCOCCACEAE IN DIVERSE GUT COMMUNITIES

4.1 Abstract

The Lachnospiraceae and Ruminococcaceae are two of the most abundant families from the order Clostridiales found in the mammalian gut environment, and have been associated with the maintenance of gut health. While they are both diverse groups, they share a common role as active plant degraders. By comparing the genomes of the

Lachnospiraceae and Ruminococcaceae with the Clostridiaceae, a more commonly free- living group, we identify key carbohydrate-active enzymes, sugar transport mechanisms, and metabolic pathways that distinguish these two commensal groups as specialists for the degradation of complex plant material.

4.2. Introduction

4.2.1 Taxonomic Revision of the Clostridiales is a Work in Progress

Classically, the genus Clostridium was described as comprising spore-forming, non- sulfate reducing obligate anaerobic bacteria with a gram-positive cell wall. Over the years, the classification and naming of new species based on phenotypic traits has led to much confusion about the relationships between taxa in this and related groups. In fact, many species classified as Clostridium are more closely related to members of other genera than to the type species, [25,113]. Prior to 2009, the order

Clostridiales was divided into eight families, many of which were recognized at the time to be paraphyletic [114]. The most recent taxonomic revision of the Phylum Firmicutes in

52

Bergey’s Manual of Systematic Bacteriology [115] divided the Clostridiales into ten named families. An additional nine families were identified as Incertae sedis (Latin for uncertain placement) in an effort to regroup species found to fall outside of the named families.

In the past decades, surveys of 16S rRNA gene sequence diversity have led to the identification of thousands of novel taxa and a new appreciation for the overwhelming diversity of the microbial world. At the same time, knowledge about the roles of new species in their environments has remained scanty [116]. For diverse groups with confusing taxonomic structure, such as the clostridia, linking the phylogeny of novel, uncultured taxa to possible ecological and/or physiological roles requires extensive prior knowledge and time-consuming literature review.

4.2.2 Lachnospiraceae and Ruminococcaceae are active members of the gut environment

Abundance estimates based on 16S rRNA surveys suggest that Firmicutes comprise

between 50–80% of the taxa in the core human gut microbiota [27,28], and more than

84% of the active fraction [29]. Lachnospiraceae and Ruminococcaceae are the most

abundant Firmicute families in gut environments, accounting for roughly 50% and 30% of

phylotypes respectively [27,30]. Lachnospiraceae such as Eubacterium rectale , Eubacterium ventriosum , sp . and sp . have been associated with the production

of butyrate necessary for the health of colonic epithelial tissue [117,118], and have been

shown to be depleted in inflammatory bowel disease [119]. Unusual polysaccharide

binding and degradation strategies have been described in flavefaciens

53

[64,120], while another Ruminococcaceae, Faecalibacterium prausnitzii , has been shown to be depleted in Crohn’s disease [121].

4.2.3 The complexity of plant material poses challenges for bacterial decomposition

Plant biomass is a fibrous composite of fibrils and sheets of cellulose, hemicellulose, lignin, waxes, pectin, and proteins forming a complex network that provides support for the plant while resisting attack from bacteria and fungi. Due to the size and complexity of the substrate, bacterial glycoside hydrolases (GH) are generally produced extracellularly.

In anoxic environments such as the gut, bacteria utilize complexed, multienzyme catalytic systems found on the cell surface or in organelles called cellulosomes. These complexes are modular in design and often include one or more carbohydrate-binding modules (CBM) that attach to the substrate enabling easy access [2,65]. Surveys of plant degradation in the rumen show that bacteria that can degrade easily available substrates colonize plant material first, and that these communities are replaced by others capable of degrading more recalcitrant substrates such as cellulose [66]. Non-adherent Bacteriodes

sp . and Bifidobacterium sp . have been shown to outcompete gram-positive bacteria (such

as Firmicutes) for easily hydrolysable starch [122,123], while Lachnospiraceae and

Ruminococcaceae persist in fibrolytic communities and are uniquely suited to degrade a

wide variety of recalcitrant substrates [66].

54

4.2.4 Genomic clues to fibrolytic function in gut environments

With progress of the Human Microbiome Project and other efforts to understand the complexity of microbial communities living on and inside of humans, an increasing number of sequenced genomes and datasets have been released [124]. Here we use genomic data to describe two families of Clostridiales that are highly abundant in the human gut microbiota, the Lachnospiraceae and the Ruminococcaceae. By comparing the distribution and abundance of carbohydrate-active enzymes and transporters, and the differences in key metabolic pathways present in the genomes of each group, we reveal genetic components supporting plant degradation by these fibrolytic specialists, and provide clues to help distinguish gut microbes from their primarily free-living relatives in the Clostridiaceae.

4.3 Materials and methods

4.3.1 Phylogenetic arrangement of Lachnospiraceae, Clostridiaceae, and

Ruminococcaceae

Sixty-eight high quality (greater than 1,200 bases) 16S rRNA sequences of

representative taxa from each family in the most recent taxonomic revision of the Phylum

Firmicutes in Bergey’s Manual of Systematic Bacteriology [115] were downloaded from

the Ribosomal Database Project website [52], and aligned using MUSCLE [125]. A

maximum likelihood tree was created from these sequences with 500 bootstrap replicates

using Escherichia /Shigella coli (ATCC 11775T; X80725) as the outgroup. Tree building

was accomplished using the Tamura_Nei model [126] with default parameters in the

program MEGA [54]. Tree visualization was carried out using Figtree [127].

55

56

4.3.2 Habitat association by group

Isolation site and habitat preference (gut vs . non-gut) was determined for listed members of each group from either the IMG (Integrated Microbial Genomes) metadata table [128] and published reports of isolation or 16S surveys. Pathogenic organisms were treated as non-gut residents if they were not reported to be part of the normal gut flora of a mammalian species. Logistic regression was performed to test whether each group was more likely to be gut associated, and the Wald test was used to test whether group assignment significantly predicted gut association.

4.3.3 Comparative analysis of carbohydrate-active enzymes

The numbers of carbohydrate-active genes and gene families were tabulated for each member of the Lachnospiraceae, Clostridiaceae, and Ruminococcaceae found in the

Carbohydrate-Active enZymes Database (CAZy) [129] (Appendix G). GH and CBM families represented in greater than 50% of total taxa (more than 15 of 31) and a difference of greater than two times in average abundance between any two groups were chosen for further analysis. All statistical analyses were done in R [51] using the package

MASS [130]. Gene counts for each family and individual genes were tested for normality using the Shapiro-Wilk test [131]. Due to overdispersion and zero values across the data set, we did not transform the data [132]. Instead, we compared the fits of the negative

binomial and Poisson regression models for the gene counts for each family.

Clostridiaceae was used as a reference. Significance was estimated from the model with the best fit as determined by the p-values reported for likelihood ratio tests comparing both models.

57

Genomes were further searched for enzymes from each significant GH and CBM family using the genome comparison tool in the Integrated Microbial Genomes system [128].

Enzymes with significantly different abundances per group were identified by the best fit

between negative binomial and Poisson regression models as described above for each

GH and CBM family. Enzyme functions were assigned using UniprotKB [133].

4.3.4 Comparative analysis of sugar transport genes

Genomes from each member of the Lachnospiraceae, Clostridiaceae, and

Ruminococcaceae found in the IMG database [128] were compared in terms of numbers of carbohydrate transport genes. Specifically, genomes were searched for 63 PTS

(phosphotransferase system) genes, and 137 ABC (Adenosine triphosphate-binding cassette) transporter genes identified by KEGG orthology [134]. Gene counts for each taxa were tabulated and the difference between groups for each category of transporter genes was tested using the best fit between negative binomial or Poisson regression models as determined by likelihood ratio test ( p-value < 0.05) as described earlier.

Individual genes were chosen for further analysis if they had an average number of copies

per taxa of at least 1% in any group. Significant differences between counts for these genes between families were estimated as described above for each category.

4.3.5 Comparative analysis of metabolic pathways

Metabolic pathways characterized in Ecocyc [135] for each member of the

Lachnospiraceae, Clostridiaceae, and Ruminococcaceae in the Biocyc database system [136]

(Appendix G) were tabulated for each pathway class. Significantly different pathway classes

58

between the three families were identified using either one-way ANOVA (for data with a normal distribution based on the Shapiro-Wilks test) [131] or the best fit between negative

binomial or Poisson regression models as described above.

For carbohydrate degradation pathways showing a difference between families, the

percentage of genomes in each family containing the pathway was determined to

highlight differences between families for these functions.

4.4 Results and Discussion

4.4.1 Phylogenetic arrangement of Lachnospiraceae, Clostridiaceae, and

Ruminococcaceae

The topology of the 16S rRNA neighbor-joining tree for the representative taxa chosen for

this study confirms the clustering of the Lachnospiraceae, Clostridiaceae, and

Ruminococcaceae into distinct clades (Figure 18) with the exception of Clostridium

sporosphaeroides DSM 1294, which is identified as a Lachnospiraceae in Bergey’s newest

revision [115] yet clearly clusters with the Ruminococcaceae.

59

1 Butyrivibrio_proteoclasticus_B316_U37378 0.55 Butyrivibrio_fibrisolvens_ATCC_19171_U41172 0.37 Eubacterium_cellulosolvens_ATCC_43171_X71860 0.21 Catonella_morbi_ATCC_51271_X87151 0.55 Shuttleworthia_satelles_D143K-13_AF399956 0.04 Oribacterium_sinus_AIP_354.02_AY323228 1 Roseburia_intestinalis_L1-82_AJ312385 0.79 Roseburia_hominis_(T)_A2-183_AJ270482 0.48 Roseburia_inulinivorans_(T)_A2-194_AJ270473 Eubacterium_rectale_L34627 0.05 0.95 Blautia_producta_ATCC_27340_L76595 0.86 Ruminococcus_hansenii_ATCC_27752_M59114 0.4 Ruminococcus_obeum_X85101 0.45 Blautia_hydrogenotrophica_S5a36_X95624 Marvinbryantia_formatexigens_I-52_AJ505973 0.21 0.09 0.71 Clostridium_saccharolyticum_DSM_2544_Y18185

0.56 Ruminococcus_gauvreauii_CCRI-16110_EF529620

0.75 Clostridium_symbiosum_M59112 1 Clostridium_bolteae_(T)_16351_AJ508452 0.11 Clostridium_asparagiforme_N6_AJ582080 0.3 Eubacterium_eligens_L34420 Butyrivibrio_crossotus_DSM2876_FR733670 Eubacterium_ventriosum_L34421 0.49 Ruminococcus_torques_ATCC_27756_L76604 0.51 0.92 Coprococcus_comes_ATCC_27758_EF031542 0.1 0.39 Clostridium_nexile_DSM_1787_X73443 Ruminococcus_gnavus_ATCC_29149_X94967 0.55 Clostridium_hylemonae_TN-272_AB023973 0.9 Dorea_formicigenerans_L34619 0.17 0.49 0.6 Dorea_longicatena_III-35_AJ132842 Clostridium_scindens_ATCC_35704_AF262238 Coprococcus_catus_VPI-C6-61_AB038359 0.99 0.39 Coprococcus_eutactus_ATCC_27759_EF031543 0.51 Anaerostipes_caccae_L1-92_AJ270487 0.6 Eubacterium_hallii_L34621 Clostridium_phytofermentans_ISDg_CP000885 Cellulosilyticum_lentocellum_DSM_5427_X71851 1 Clostridium_beijerinckii_DSM791_X68179 0.6 Clostridium_butyricum_VPI3266_AJ458420 0.47 Clostridium_perfringens_ATCC_13124_CP000246 0.3 Clostridium_novyi_JCM1406_AB045606 0.49 Clostridium_cellulovorans_743B_CP002160 Clostridium_tetani_NCTC_279_X74770 0.33 0 1 Clostridium_sporogenes_ATCC3584_X68189 1 0.41 Clostridium_botulinum_L37585 0.6 Clostridium_acetobutylicum_ATCC_824_AE001437 Clostridium_carboxidivorans_DSM15243_FR733710 0.88 0.96 Clostridium_kluyveri_DSM_555_CP000673 Clostridium_ljungdahlii_DSM_13528_CP001666 1 Alkaliphilus_oremlandii_OhILAs_DQ250645 Alkaliphilus_metalliredigens_QYMF_CP000724 0.42 Clostridium_aldrichii_DSM_6159_X71846

0.56 0.99 Clostridium_alkalicellulosi_Z-7026_AY959944 0.44 Clostridium_straminisolvens_CSK1_AB125279 0.57 Clostridium_thermocellum_ATCC_27405_CP000568 0.93 1 Clostridium_cellulolyticum_ATCC_35319_X71847

0.8 Clostridium_josui_FERM_P-9684_AB011057 Clostridium_stercorarium_ATCC_35414_AF266461 0.99 Clostridium_viride_T2-7_(DSM_6836)_X81125 Eubacterium_siraeum_L34625 0.75 Faecalibacterium_prausnitzii_ATCC_27768_AJ413954 0.78 0.75 Ruminococcus_albus_ATCC_27210_L76598

0.78 Ruminococcus_flavefaciens_ATCC_19208_L76603 Clostridium_methylpentosum_DSM_5476_Y18181 0.43 Ethanoligenens_harbinense_YUAN-3_AY295777 0.91 0.56 Ruminococcus_bromii_ATCC_27255_L76600 0.95 Clostridium_sporosphaeroides_DSM_1294_X66002 Clostridium_leptum_DSM_753T_AJ305238 Escherichia/Shigella_coli_ATCC_11775T_X80725

0.02 Figure 18. 16S Neighbor-joining tree of the representative taxa used in this analysis . Bootstrap values are given for 500 replicates. (blue) Lachnospiraceae, (red) Clostridiaceae, (green) Ruminococcaceae.

60 4.4.2 Habitat Association by Group

The logit model concluded that the three groups differ significantly in their likelihood to

be gut associated (Figure 19). The Wald test predicted that the Ruminococcaceae and

Lachnospiraceae were more likely than the Clostridiaceae to be gut associated (X2 = 17.5, df = 2, P (>X2) = 0.00016).

%" >8:?3091;/8:<8<" #$-" @A097/;=;8:<8<" #$," B6C;30:0::8:<8<" #$+" #$*" #$)" #$(" #$'" !" #$&" ./010/203"04"567"8990:;87<=" #$%" #" Figure 19. Proportion of members of each family that are gut associated. The Clostridiaceae are less likely to be gut associated based on logistical regression model predictions ( p-value < 0.05) tested using the Wald test ( Χ2 = 17.5, df = 2, P (> Χ2) = 0.00016).

4.4.3 Comparative Analysis of Carbohydrate-Active Enzymes

The Lachnospiraceae, Clostridiaceae, and Ruminococcaceae differ with respect to the average numbers of carbohydrate-active genes and gene families, particularly the glycoside hydrolases (GH) and carbohydrate-binding modules (CBM) (Figure 20), which are more abundant and more diverse in the Lachnospiraceae and Ruminococcaceae.

A closer look at the average numbers of genes per GH family reveals that significant differences exist between these groups for thirteen GH families most of which include enzymes used to degrade complex plant polymers. With the exception of GH1, all of these families are more highly represented in the Lachnospiraceae, and

61 '#" ($" !" 07@A9=?BC67@575" !" a /6AB8<@C>56A464" b !" !" (#" -D=?E6CFC7@575" &$" ,?<@D5>E>6A464" G:;C9=@=@@7@575" F9:>8

&$" %$" &#"

%#" !" %$" 3456785"9:;<56"=>"8595?" 2345674"89:;45"<="=6:>?>4@" !" %#" !" $" $"

#" #" ()" (*+" ,-" ./" ,01" )*" )+," -." /0" -12" Figure 20. Comparison of Carbohydrate-Active enZymes (CAZy) with respect to abundance in the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae. (a) Average numbers of CAZy families per group. ( b) Average numbers of CAZy genes per group. CAZy enzyme classes indicated as follows: GH = glycoside hydrolase, GTF = glucosyltransferase, CE = carbohydrate esterase, PL = polysaccharide lyase. (*) Groups showing significant differences in either one-way ANOVA testing (for data with a normal distribution based on the Shapiro-Wilks test) or the best fit between negative binomial and Poisson models as described above.

Ruminococcaceae (Figure 21). GH2, GH3, GH43, and GH51 are associated with

cleaving pectin and hemicellulose sidechains, GH5 and GH9 contain cellulases, GH13

and GH31 consist of starch-degrading alpha-glucosidases, while GH10 includes

xylanases. GH94 contains phosphorylases that cleave beta-glycosidic bonds in cellobiose,

cellodextrin and chitobiose.

$#" !" <3=>59;?@23=131" ," AB9;C2@D@3=131" +" E67@59=9==3=131" *"

)" !" !" !" (" !" !" !" !" '" !" !" &" !" !" /012341"567812"9:"4151;" !" %" !" !" !" !" $"

#" -.$" -.%" -.&" -.(" -.," -.$#" -.$&" -.&$" -.&)" -.'&" -.($" -.,'" Figure 21. Glycoside hydrolase families that differ in abundance. Families of glycoside hydrolases that differ in abundance between the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae based on best fit between negative binomial or Poisson regression models (Clostridiaceae as reference) as determined by likelihood ratio test ( p-value < 0.05).

62 When GH enzyme families are broken into individual enzymes (Figure 22; Activities are given in Appendix H), differences between microbial families related to specific plant degradation processes are revealed. The enzyme that differs significantly between the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae in the GH1 family is a beta-glucosidase (EC: 3.2.1.21), which is found in almost every GH family.

Considering other GH families, the Lachnospiraceae and Ruminococcaceae have higher numbers of genes equipped to degrade a wide variety of polysaccharides. The

Ruminococcaceae are enriched in endo-1, 4-beta-xylanase and cellulase genes, while both groups have higher numbers of alpha-glucosidases and both alpha and beta- galactosidases. Thus, members of these microbial families are better equipped to cleave the cellulose and hemicellulose components of plant material.

'"$!!# N41=8,2K*:41545# '"!!!# G),29:*3*41545# &"$!!# O0@*8,1,1141545#

&"!!!#

%"$!!#

%"!!!#

!"$!!# L:,K,:M,8#,;#+5852#K5:#94F4## !"!!!#

7-4@?)425# G5))0)425# A@?),201:425# A)K=4-+)01,2*3425# D83,-%.6-E-F?)48425#H)01,2?)15:4@*3425#I594-+4)419,2*3425#I594-+)010:,8*3425#I594-@488,2*3425# A)K=4-+4)419,2*3425# ()*+,-%./-+)01,2*3425# J?)48#%.6-E-F?),2*3425# 6-7-+)0148,9:482;5:425# H)0148#%.'-E-+)01,2*3425# %.6-7-+)0148#<:481=*8+#58>?@5# G5))0),25#%.6-E-15)),<*,2*3425# B4)9,25#7-C-+)01,2?)9:482;5:425#B48848#583,-%.6-E-@488,2*3425#D8>?@5#

Figure 22. Glycoside hydrolase enzymes that differ in abundance. Glycoside hydrolase enzymes that differ significantly between the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae based on best fit between negative binomial or Poisson regression models (Clostridiaceae as reference) as determined by likelihood ratio test ( p-value < 0.05).

63 Carbohydrate-binding modules that differ significantly between the Lachnospiraceae,

Clostridiaceae, and Ruminococcaceae include CBM6, CBM22, and CBM48 (Figure 23;

Activities are given in Appendix I). All are significantly enriched in the

Ruminococcaceae which has been shown to have unusual substrate binding capabilities

[120]. CBM6 binds to both cellulose and hemicellulose components of plant material, while CBM22 binds primarily to xylan, and CBM48 is associated with glycogen.

)" ;2<=498>?12<020" !" *@98A1?B?2<020" (" !" !" C56?49<9<<2<020"

'"

!" &" !" !"

%" !" !" ./01230"4567018"9:"*+,"30408" $" !"

#" *+,%" *+,)" *+,%%" *+,&%" *+,&(" *+,'-" *+,(#" Figure 23. Carbohydrate-binding module (CBM) families that differ in abundance. CBMs with (*) significantly different patterns of abundance in the genomes of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae based on best fit between negative binomial or Poisson regression models (Clostridiaceae as reference) as determined by likelihood ratio test ( p-value < 0.05).

4.4.4 Comparative Analysis of Transporter Proteins

Genomes of the Clostridiaceae contained more PTS (phosphotransferase) genes than the

Lachnospiraceae or Ruminococcaceae, while the later two groups were more highly enriched in ABC (ATP-binding cassette) genes (Figure 24).

PTS systems transport a wide variety of mono- and disaccharides, especially hexoses such as glucose [137]. In PTS transport, substrates are phosphorylated upon entry, which makes their subsequent metabolism efficient, and also provides a means for regulation

64 and preferential sugar utilization via catabolite repression [138]. Thus, PTS transport enables bacteria living in carbohydrate-limited environments, such as soils and sediments, to efficiently utilize and compete for substrates as they become available.

ABC transporters, on the other hand, tend to carry oligosaccharides, and have less preference for hexoses [139,140]. Oligosaccharide import is energetically favorable

because it enables the conservation of the energy of hydrolysis intracellularly. Regulation of ABC transporters is less well studied than for PTS; however, they are thought to be controlled by proteins acting to block specific domains [141]. The abundance of ATP transporters in the Lachnospiraceae and Ruminococcaceae is consistent with their capacity to utilize complex plant material, and transport degradation products of various sizes and compositions. Since carbon is not limited and is present as a range of complex

'$" !" =1>?379:@01>/1/" -A79;0@B@1>/1/" '#" C45@37>7>>1>/1/" &$"

&#" !"

%$"

%#" !"

+./012/"3456/0"78"2/3/9":/0";1<1"" $"

#" ()*" +,-"

Figure 24. Average abundance of phosphotransferase system (PTS) and ATP- binding cassette (ABC) transporter genes. A comparison of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae genomes from the IMG database [128] (*) Genes showing significant differences between groups (Clostridiaceae as reference) in best fit between negative binomial or Poisson regression models as determined by the likelihood ratio test ( p-value < 0.05).

65 polymers in the gut environment, the ability to utilize many different substrates may be more advantageous than the efficient intake of a preferred carbon source.

On the single gene level, the average abundance of sugar transport genes in each genome of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae demonstrates the preference of the Clostridiaceae for simple hexoses such as glucose and cellobiose, and the wider range of substrates, including pentoses, transported by the Lachnospiraceae and

Ruminococcaceae (Figure 25).

)" V15NI67:8E15/1/" !" ($%" >3670E89815/1/" !" !" (" W4H8I6565515/1/" !" '$%" !" '" &$%" &" !" #$%" !" !" !" !" !" #" CU/E12/"I4H@/E"6D"2/I/7":/E"01R1"

*+,-DE45067/-7:/58;5"==>"?DE4CB" C.>-E8@67/"0E1I7:6E0"?E@7C.>B" *+,-./01-23456789/-7:/58;5-<==>"?@23AB"*+,-5/336@867/-7:/58;5"<==C->"?5/3C->B"*+,-H1II67/-7:/58;5-==C-J"?H1IKLMB"C.>-H43T:3/"7421E"0E1I7:6E0"?H7H"?:07GB"*+,-:N67:N60E1I7D/E17/-/IOPH/"="?:07=B" C.>-H/0NP3-2131506789/"0E1I7:6E0"?H23.B" C.>-H13067/QH13069/R0E8I"0E1I7:6E0"?H7HKSB" C.>-H43T:3/"7421E"0E1I7:6E0":/EH/17/7"?H7HAGB" +E1I7:6E0/E"G/I/7" Figure 25. Average abundance of sugar transport genes . A comparison of each genome of Lachnospiraceae, Clostridiaceae, and Ruminococcaceae. (*) Genes showing significant differences between groups (Clostridiaceae as reference) in best fit between negative binomial or Poisson regression models as determined by the likelihood ratio test ( p-value < 0.05).

4.4.5 Comparative Analysis of Metabolic Pathways

Four metabolic pathways were found to differ in terms of average number of genes

between the Lachnospiraceae, Clostridiaceae, and Ruminococcaceae, namely alcohol

66 degradation, carbohydrate degradation, polymeric compound degradation, and generation of precursor metabolites and energy (Figure 26). Considering degradation pathway classes for carbohydrates and polymeric compounds, ten specific pathways were identified as differing between the Lachnospiraceae, Clostridiaceae, and

Ruminococcaceae (Figure 27) revealing the capacity to break down a full range of plant- derived substrates including cellulose, hemicellulose, and starch.

20 * Lachnospiraceae 18 Clostridiaceae 16 * Ruminococcaceae 14 12 10 8 6 * * 4

Average numbers of genes Average 2 0

Protein AlcoholsAldehyde Amino Acids CarbohydratesCarboxylates C1 Compounds Inorganic Nutrients Fatty Acid and Lipids Aromatic Compounds Polymeric CompoundsSecondary Metabolites Amines and Polyamines

Nucleosides and Nucleotides Precursor Metabolites Generation Ecocyc Degradation Pathway Figure 26. Average abundance of genes in each Ecocyc degradation pathway. A comparison between Lachnospiraceae, Clostridiaceae, and Ruminococcaceae genomes. (*) Pathways showing significant differences between groups in either one-way ANOVA testing (for normal data) ( p-value < 0.05) or the best fit between negative binomial or Poisson regression models as determined by the likelihood ratio test ( p-value < 0.05).

4.5 Conclusion

Genome comparisons of the carbohydrate-active enzymes, transporters, and metabolic

pathways of the Lachnospiraceae and Ruminococcaceae in comparison with the

Clostridiaceae as described here reveal these groups to be more highly specialized for the degradation of complex plant material.

67 $!!"# 6.0:?23978.04.4# ,!"# I/[email protected]# +!"# *!"# P>H7?20200.04.4# )!"# (!"# '!"# &!"# %!"# $!"# !"# M4804?1.A4#2N#A4?2H43#;71:#9.1:;.<#

F

I4//>/234#504//>/232H4=# -./.01234#564/278#9.1:;.<=#

-/>0234#.?@#A/>0234B$B9:239:.14#

GB.0418.H7?.14#.?@#GB.041

In gut environments the ability to degrade cellulose and hemicellulose components of

plant material enables members of the Lachnospiraceae and Ruminococcaceae to decompose substrates that are indigestible by the host. These compounds are then fermented and converted into short chain fatty acids (mainly acetate, butyrate, and propionate) that can be absorbed and used for energy by the host.

68

CHAPTER 5

THE COMPLETE GENOME SEQUENCE OF CLOSTRIDIUM INDOLIS

5.1 Abstract

Clostridium indolis DSM 755 is a bacteria commonly found in soils and feces of birds

and mammals. Despite its prevalence, little is known about the ecology or physiology of

this species, however close relatives, C. saccharolyticum and C. hathewayi , have

demonstrated interesting metabolic potentials related to plant degradation and human

health. The genome of C. indolis DSM 755 reveals an abundance of genes in functional groups associated with the transport and utilization of carbohydrates, as well as citrate utilization and nitrogen fixation. Our genome analysis suggests hypotheses to be tested in future culture based work to better understand the physiology of this poorly described species.

5.2 Introduction

The C. saccharolyticum species group is a poorly described and taxonomically confusing clade in the Lachnospiraceae, a family within the Clostridiales that includes members of clostridia, cluster XIVa [25]. This group includes C. indolis , C. sphenoides , C. methoxybenzovorans , C. celerecrescens , and Desulfotomaculum guttoideum , none of which are well studied (Figure 28). C. saccharolyticum has gained attention because its saccharolytic capacity was shown to be syntrophic with the cellulolytic activity of

Bacteroides cellulosolvens in coculture, enabling the conversion of cellulose to ethanol in a single step [16,34]. Members of this group such as C. celerecrescens are themselves

69

cellulolytic [35] and others are known to degrade unusual substrates such as methylated aromatic compounds (C. methoxybenzovorans ) [36], and the insecticide lindane ( C.

sphenoides ) [37]. C. indolis was targeted for whole genome sequencing to provide insight into the genetic potential of this taxa that could then direct experimental efforts to understand its physiology and ecology.

71 Desulfotomaculum_guttoideum_DSM_4024_(Y11568.1) 98 Clostridium_sphenoides_ATCC_19403_(AB075772.1) 83 Clostridium_celerecrescens_DSM_5628_(X71848.1)

87 Clostridium_indolis_DSM_755:1620643-1622056 100 Clostridium_methoxybenzovorans_DSM_12182_(AF067965.1) 100 Clostridium_saccharolyticum_WM1_DSM_2544:_NC_014376:18567-20085 100 Clostridium_algidixylanolyticum_SPL75_(AF092549.1) 100 Clostridium_hathewayi_DSM_13479:_ADLN00000000:_202-1639 Clostridium_phytofermentans_ISDg:_CP000885:_15754-17276

54 Clostridium_jejuense_HY-35-12_(AY494606.2) 100 Clostridium_xylanovorans_HESP1_(AF116920.1) Clostridium_stercorarium_stercorarium_DSM_8532:_CP003992:_856992-858513 Bacillus_subtilis_DSM_10_(AJ276351)

0.02 Figure 28. Phylogenetic tree of 16S rRNA genes of the Clostridium saccharolyticum species group . The strains and their corresponding NCBI accession numbers (and, when applicable, draft sequence coordinates) for 16S rRNA genes are: Desulfotomaculum guttoideum strain DSM 4024 T, Y11568; C. sphenoides ATCC 19403 T, AB075772; C. celerecrescens DSM 5628 T, X71848; C. indolis DSM 755 T, Pending release by JGI: 1620643-1622056; C. methoxybenzovorans SR3, AF067965; C. saccharolyticum WM1 T, NC_014376:18567-20085; C. algidixylanolyticum SPL73 T, AF092549; C. hathewayi DSM 13479 T, ADLN00000000: 202-1639; C. jejuense HY-35-12 T, AY494606; C. xylanovorans HESP1 T, AF116920; C. phytofermentans ISDg T, CP000885: 15754-17276; and C. stercorarium ATCC 35414 T, CP003992: 856992-858513. The tree uses sequences aligned by MUSCLE, and was inferred using the Neighbor-Joining method [142]. The optimal tree with the sum of branch length = 0.44337019 is shown. The percentage of replicate trees in which the associated taxa clustered together in the bootstrap test (500 replicates) are shown next to the branches [143]. The tree is drawn to scale, with branch lengths in the same units as those of the evolutionary distances used to infer the phylogenetic tree. The evolutionary distances were computed using the Maximum Composite Likelihood method [144] and are in the units of the number of base substitutions per site. Evolutionary analyses were conducted in MEGA5 [54]. subtilis DSM 10 T (AJ276351) was used as an outgroup.

70

5.3 Organism information

The general features of Clostridium indolis DSM 755 are listed in Table 4. C. indolis

DSM 755 was originally named for its ability to hydrolyze tryptophan to indole, pyruvate, and ammonia [145] in the classic Indole Test used to distinguish bacterial species. It has been isolated from soil [146], feces [147], and clinical samples from infections [148]. Despite its prevalence, C. indolis is not well characterized, and there are conflicting reports about its physiology. It is described as a sulfate reducer with the ability to ferment some simple sugars, pectin, pectate, mannitol, and galacturonate, and convert pyruvate to acetate, formate, ethanol, and butyrate [115]. According to this source, neither lactate nor citrate are utilized, however other studies demonstrate that fecal isolates closely related to C. indolis may utilize lactate [74], and that the type strain

DSM 755 utilizes citrate [149]. It is unclear whether C. indolis is able to make use of a wider range of sugars or break down complex carbohydrates, however growth is reported to be stimulated by fermentable carbohydrates [115].

5.4 Genome sequencing information

5.4.1 Genome project history

The genome was selected based on the relatedness of C. indolis DSM 755 to C.

saccharolyticum , an organism with interesting saccharolytic and syntrophic properties.

The genome sequence was completed on May 2, 2013, and presented for public access on

June 3, 2013. Quality assurance and annotation done by DOE Joint Genome Institute

(JGI) as described below. Table 5 presents a summary of the project information and its association with MIGS version 2.0 compliance [150]

71

Table 4. Classification and general features of Clostridium indolis DSM 755.

MIGS ID Property Term Evidence Code Current classification Domain Bacteria TAS [151]AM Phylum Firmicutes TAS [115]AM Class Clostridia TAS [113]AM Order Clostridiales TAS [113,115]AM Family Lachnospiraceae TAS [115]AM Genus Clostridium TAS [25,152]AM Species Clostridium indolis TAS [145,146]AM Strain DSM 755 Gram stain Negative TAS [145,146]AM Cell shape Rod TAS [145,146]AM Motility Motile TAS [145,146]AM Sporulation Terminal, spherical spores TAS [145,146]AM Temperature range Mesophilic TAS [145,146]AM Optimum temperature 37 oC TAS [145,146]AM Carbon sources Glucose, lactose, sucrose, mannitol, TAS [145,146]AM pectin, pyruvate, others Terminal electron Sulfate TAS [145,146]AM receptor Indole test Positive TAS [145,146]AM MIGS-6 Habitat Isolated from soil, feces, wounds TAS [146,147]AM MIGS-6.3 Salinity Inhibited by 6.5% NaCl TAS [145,146]AM MIGS-22 Oxygen Anaerobic TAS [145,146]AM MIGS-15 Biotic relationship Free living and host associated TAS [146,147]AM MIGS-14 Pathogenicity No NAS MIGS-4 Geographic location Soil, feces TAS [146,147]AM Evidence Codes - IDA: Inferred from Direct Assay; TAS: Traceable Author Statement (i.e., a direct report exists in the literature); NAS: Non-traceable Author Statement (i.e., not directly observed for the living, isolated sample, but based on a generally accepted property for the species, or anecdotal evidence). These evidence codes are from http://www.geneontology.org/GO.evidence.shtml of the Gene Ontology project [153].

5.4.2 Growth conditions and DNA isolation

C. indolis DSM 755 was cultivated anaerobically on GS-2 media as described elsewhere

[12]. DNA for sequencing was extracted using the DNA Isolation Bacterial Protocol available through the JGI ( http://www.jgi.doe.gov ). The quality of DNA extracted was assessed by gel electrophoresis and NanoDrop (ThermoScientific, Wilmington, DE) according to the JGI recommendations, and the quantity was measured using the Quant- iT TM Picogreen assay kit (Invitrogen, Carlsbad, CA) as directed.

72

Table 5. Project information.

MIGS ID Property Term MIGS-31 Finishing quality Improved Draft MIGS-28 Libraries used Shotgun and long insert mate pair (Illumina) SMRTbell TM (PacBio) MIGS-29 Sequencing platforms Illumina and PacBio MIGS-31.2 Fold coverage 759.7X (Illumina) 51.6X (PacBio) MIGS-30 Assemblers Velvet, AllpathsLG MIGS-32 Gene calling method Prodigal, GenePRIMP Genome Database release June 3, 2013 (IMB) Genbank ID Pending release by JGI Genbank Date of Release Pending release by JGI GOLD ID Gi22434 Project relevance Anaerobic plant degradation

5.4.3 Genome sequencing and assembly

The draft genome of Clostridium indolis was generated at the DOE Joint genome

Institute (JGI) using a hybrid of the Illumina and Pacific Biosciences (PacBio)

technologies. An Illumina std shotgun library and long insert mate pair library was

constructed and sequenced using the Illumina HiSeq 2000 platform [154]. 16,165,490

reads totaling 2,424.8 Mb were generated from the std shotgun and 26,787,478 reads

totaling 2,437.7 Mb were generated from the long insert mate pair library. A Pacbio

SMRTbellTM library was constructed and sequenced on the PacBio RS platform. 99,448

raw PacBio reads yielded 118,743 adapter trimmed and quality filtered subreads totaling

330.2 Mb. All general aspects of library construction and sequencing performed at the

JGI can be found at http://www.jgi.doe.gov. All raw Illumina sequence data was passed

through DUK, a filtering program developed at JGI, which removes known Illumina

sequencing and library preparation artifacts [155]. Filtered Illumina and PacBio reads

were assembled using AllpathsLG (PrepareAllpathsInputs: PHRED 64=1 PLOIDY=1

FRAG COVERAGE=50 JUMP COVERAGE=25; RunAllpath- sLG: THREADS=8

73

RUN=std pairs TARGETS=standard VAPI WARN ONLY=True OVERWRITE=True)

[156]. The final draft assembly contained 1 contig in 1 scaffold. The total size of the genome is 6.4 Mb. The final assembly is based on 2,424.6 Mb of Illumina Std PE ,

2,437.6 Mb of Illumina CLIP PE and 330.2 Mb of PacBio post filtered data, which provides an average 759.7X Illumina coverage and 51.6X PacBio coverage of the genome, respectively.

5.4.4 Genome annotation

Genes were identified using Prodigal [157], followed by a round of manual curation using GenePRIMP [5] for finished genomes and Draft genomes in fewer than 10 scaffolds. The predicted CDSs were translated and used to search the National Center for

Biotechnology Information (NCBI) nonredundant database, UniProt, TIGRFam, Pfam,

KEGG, COG, and InterPro databases. The tRNAScanSE tool [158] was used to find tRNA genes, whereas ribosomal RNA genes were found by searches against models of the ribosomal RNA genes built from SILVA [159]. Other non–coding RNAs such as the

RNA components of the protein secretion complex and the RNase P were identified by searching the genome for the corresponding Rfam profiles using INFERNAL [160].

Additional gene prediction analysis and manual functional annotation was performed within the Integrated Microbial Genomes (IMG) platform [128] developed by the Joint

Genome Institute, Walnut Creek, CA, USA [161].

74

5.5 Genome properties

The genome of C. indolis DSM 755 consists of a 6,383,701 bp circular chromosome

with GC content of 44.93% (Table 6). Of the 5,903 genes predicted, 5,802 were protein-

coding genes, and 101 RNAs; 170 pseudogenes were also identified. 81.21% of genes

were assigned with a putative function with the remaining annotated as hypothetical

proteins. The genome summary and distribution of genes into COGs functional categories

are listed in Tables 7 and 8.

Table 6. Nucleotide content and gene count levels of the genome of C. indolis DSM 755.

Attribute Genome (total) Value % of total a Size (bp) 6,383,701 G+C content (bp) 2,868,247 44.93 Coding region (bp) 5,688,007 89.10 Total genes b 5,903 100.00 RNA genes 101 1.71 Protein-coding genes 5,802 98.02 Protein-coding with function pred. 4,794 81.21 Genes in paralog clusters 4,527 76.69 Genes assigned to COGs 4,643 78.65 Genes with signal peptides 421 7.13 Genes with transmembrane helices 1,494 25.31 Paralogous groups 4,527 76.69 a) The total is based on either the size of the genome in base pairs or the total number of protein coding genes in the annotated genome. b) Also includes 170 pseudogenes.

The genomes of C. indolis and its near relatives ( C. saccharolyticum, C. hathewayi, and

C. phytofermentans ) have similar numbers of genes in each of the 25 broad COG categories (not shown), however differences exist in the type and distribution of genes in specific functional groups (Table 8), particularly those related to COG categories (G)

Carbohydrate transport and metabolism, (C) Energy production and conversion, and (Q)

Secondary metabolites biosynthesis, transport and catabolism.

75

Table 7. Number of genes in C. indolis DSM 755 associated with the 25 general COG functional categories

Code Value % of total a Description J 184 3.57 Translation A 0 0 RNA processing and modification K 531 10.3 Transcription L 191 3.71 Replication, recombination and repair B 1 0.02 Chromatin structure and dynamics D 28 0.54 Cell cycle control, mitosis and meiosis Y 0 0 Nuclear structure V 107 2.08 Defense mechanisms T 335 6.5 Signal transduction mechanisms M 235 4.56 Cell wall/membrane biogenesis N 70 1.36 Cell motility Z 0 0 Cytoskeleton W 0 0 Extracellular structures U 41 0.8 Intracellular trafficking and secretion O 124 2.41 Posttranslational modification, protein turnover, chaperones C 261 5.06 Energy production and conversion G 910 17.65 Carbohydrate transport and metabolism E 493 9.56 Amino acid transport and metabolism F 110 2.13 Nucleotide transport and metabolism H 153 2.97 Coenzyme transport and metabolism I 77 1.49 Lipid transport and metabolism P 325 6.3 Inorganic ion transport and metabolism Q 70 1.36 Secondary metabolites biosynthesis, transport and catabolism R 590 11.45 General function prediction only S 319 6.19 Function unknown - 1260 21.35 Not in COGs a) The total is based on the total number of protein coding genes in the annotated genome.

5.5.1 Carbohydrate transport and metabolism

Plant biomass is a complex composite of fibrils and sheets of cellulose, hemicellulose, waxes, pectin, proteins, and lignin. Bacteria from soil and the gut generally possess a variety of genes to degrade and transport the diversity of substrates encountered in these plant-rich environments. The genome of C. indolis includes 910 genes (17.65% of total protein coding genes) in this COG group including glycoside hydrolases with the potential to degrade complex carbohydrates including starch, cellulose, and chitin

(Table 9), as well as an abundance of carbohydrate transporters (Figure 29).

76

Table 8. Number of genes in C. indolis DSM 755 not found in near-relatives associated with the 25 general COG functional categories *

Code Value Description J 4 Translation A 0 RNA processing and modification K 5 Transcription L 9 Replication, recombination and repair B 1 Chromatin structure and dynamics D 0 Cell cycle control, mitosis and meiosis Y 0 Nuclear structure V 1 Defense mechanisms T 2 Signal transduction mechanisms M 8 Cell wall/membrane biogenesis N 2 Cell motility Z 0 Cytoskeleton W 0 Extracellular structures U 1 Intracellular trafficking and secretion O 10 Posttranslational modification, protein turnover, chaperones C 28 Energy production and conversion G 6 Carbohydrate transport and metabolism E 8 Amino acid transport and metabolism F 1 Nucleotide transport and metabolism H 11 Coenzyme transport and metabolism I 2 Lipid transport and metabolism P 11 Inorganic ion transport and metabolism Q 10 Secondary metabolites biosynthesis, transport and catabolism R 18 General function prediction only S 21 Function unknown * Number of genes from a set of 158 genes not found in near relatives (C. saccharolyticum, C. phytofermentans, C. hathewayi ) associated with the 25 general COG functional categories.

Almost 8% of the protein-coding genes in the genome of C. indolis were found to be

associated with carbohydrate transport, represented by two main strategies. ABC (ATP

binding cassette) transporters tend to carry oligosaccharides, and have less affinity for

hexoses [139,140], while PTS (phosphotransferase system) transporters carry many

different mono- and disaccharides, especially hexoses [137]. PTS systems provide a

means of regulation via catabolite repression [138], and are thought to enable bacteria

living in carbohydrate-limited environments to more efficiently utilize and compete for

substrates [138]. Both C. indolis and its near relatives are more highly enriched in ABC

77 Table 9. Selected carbohydrate active genes in the C. indolis DSM 755 genome.

Gene Count Gene Product Gene ID 19 Beta-glucosidase (GH-1) EC:3.2.1.86 EC:3.2.1.23 Beta-galactosidase/ 8 EC:3.2.1.25 beta-glucuronidase (GH-2) EC:3.2.1.31 Beta-glucosidase/ related EC:3.2.1.21 7 glucosidases (GH-3) EC:3.2.1.52 Alpha-galactosidases/ EC:3.2.1.86 14 6-phospho-beta-glucosidases EC:3.2.1.122 (GH-4) EC:3.2.1.22 2 Cellulase, endogluconase (GH-5) EC:3.2.1.4 EC:3.2.1.10 EC:3.2.1.20 14 Alpha-amylase EC:2.4.1.7 EC:3.2.1.70 8 Beta-xylosidase (GH 39) EC:3.2.1.37 2 Chitinase (GH 18) EC:3.2.1.14 than PTS transporters (Fig 29), however nearly a third of C. indolis and C. saccharolyticum transporters are PTS genes, suggesting a preference for hexoses, as well as an adaptation to more marginal environments. C. indolis also possesses ten genes associated with all

a b 15 ABC

PTS

4

10

2 5 Number of COGS Number Percentage of genes Percentage

0 0

C. indolis C. saccharolyticum C. hathewayi C. phytofermentans C. indolis C. saccharolyticum C. hathewayi C. phytofermentans Species Species Figure 29. Distribution of ABC and PTS transporters in the genomes of C. indolis and close relatives. Shown by ( a) Number of COGS, and ( b) Percentage of genes in the genome.

78

three components of the TRAP-type C4-dicarboxylate transport system, which transports

C4-dicarboxylates such as formate, succinate, and malate [162], as well as six putative malate dehydrogenases and two putative succinate dehydrogenases suggesting that C. indolis may have the potential to utilize both of these short chain fatty acids.

5.5.2 Energy production and conversion

The genome of C. indolis contains 261 genes in COG category (C) Energy production

and conversion, 28 of which are not found in the near relatives analyzed, including genes

for citrate utilization (Table 10 and Figure 30) and nitrogen fixation (Table 11).

Table 10. Selection of C. indolis DSM 755 genes related to citrate utilization.

Locus Tag Gene Product Gene ID

*K401DRAFT_2892 holo-ACP synthase (CitX) EC:2.7.7.61

*K401DRAFT_2893 citrate lyase acyl carrier (CitD) EC:4.1.3.6 EC:4.1.3.6 *K401DRAFT_2894 citrate lyase beta subunit (CitE) EC:2.8.3.10 EC:4.1.3.6 *K401DRAFT_2895 citrate lyase alpha subunit (CitF) EC:2.8.3.10 triphosphoribosyl-dephospho-CoA synthase *K401DRAFT_2896 EC:2.7.8.25 (CitG) *K401DRAFT_2897 citrate (pro3S)-lyase ligase (CitC) EC:6.2.1.22 response regulator, CheY-like receiver domain, K401DRAFT_2898 – winged helix DNA binding domain *K401DRAFT_2899 signal transduction histidine kinase –

KO:K03303 *K401DRAFT_2900 citrate transporter, CITMHS family TC.LCTP * Genes without homologs in near relatives ( C. saccharolyticum, C. phytofermentans, C. hathewayi ).

79 K401DRAFT_2892 2893 2894 2895 2896 2897 2898 2899 2900

CitX CitE CitF CitG CitC RR HK CITMHS 3154420 CitD 3164793 Figure 30. Citrate utilization genes in C. indolis. A single gene cluster on K401DRAFT_scaffold0000.1.1, including citCDEFGX , the citrate transporter CitMHS , and a putative two-component system.

Table 11. Selection of C. indolis DSM 755 genes related to nitrogen fixation.

Locus Tag Gene Product Gene ID *K401DRAFT_0533 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_0534 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_0535 nitrogenase subunit (ATPase) (nifH) Pfam00142 *K401DRAFT_0884 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_0885 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_0886 nitrogenase subunit (ATPase) (nifH) Pfam00142 *K401DRAFT_3349 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_3350 nitrogenase Mo-Fe protein, α and β chains Pfam00148 *K401DRAFT_3351 nitrogenase subunit (ATPase) (nifH) Pfam00142 *K401DRAFT_3874 nitrogenase Mo-Fe protein, α and β chains (nifD) Pfam00148 *K401DRAFT_3875 nitrogenase Mo-Fe protein, α and β chains (nifK) Pfam00148 *K401DRAFT_3876 nitrogenase Fe protein Pfam00142 *K401DRAFT_3878 nitrogenase Mo-Fe protein, α and β chains (nifD) Pfam00148 *K401DRAFT_3879 nitrogenase Mo-Fe protein, α and β chains (nifK) Pfam00148 K401DRAFT_3880 dinitrogenase Fe-Mo cofactor Pfam02579 *K401DRAFT_3895 nitrogenase Mo-Fe protein, α and β chains (nifD) Pfam00148 *K401DRAFT_3896 nitrogenase Mo-Fe protein, α and β chains (nifK) Pfam00148 *K401DRAFT_5519 nitrogenase Mo-Fe protein, α and β chains (nifB) Pfam04055 *K401DRAFT_5520 nitrogenase Mo-Fe protein, α and β chains (nifE) Pfam00148 *K401DRAFT_5521 nitrogenase Mo-Fe protein (nifK) Pfam00148 *K401DRAFT_5522 nitrogenase component 1, alpha chain Pfam00148 *K401DRAFT_5525 nitrogenase subunit (ATPase) (nifH) Pfam00142 * Genes without homologs in near relatives ( C. saccharolyticum, C. phytofermentans, C. hathewayi ). Nitrogenase genes have a common gene identifier (EC:1.18.6.1), therefore the Pfam numbers are given to distinguish between subunits.

80

5.5.3 Citrate utilization

Citrate is a metabolic intermediary found in all living cells. In aerobic bacteria, citrate is utilized as part of the tricarboxylic acid (TCA) cycle. In anaerobes, citrate is fermented to acetate, formate, and/or succinate. The first step is the conversion of citrate to acetate and oxaloacetate in a reaction catalyzed by citrate lyase (EC:4.1.3.6) [163]. C. sphenoides , a close relative of C. indolis that does not yet have a sequenced genome has been shown to utilize citrate [164], but there is conflicting evidence as to whether this phenotype is present in C. indolis [115,149]. The genome of C. indolis reveals a group of seven citrate

genes organized in a cluster similar to operons found in other bacterial species [163,165]

(Figure 30) including CitD, CitE, and CitF, the three subunits of the citrate lyase gene

[163], CitG and CitX which have been shown to be necessary for citrate lyase function

[165], CitMHS, a citrate transporter, and a putative two component system similar to

citrate regulatory mechanisms in other bacteria [166].

5.5.4 Nitrogen Fixation

Nitrogen fixation has been observed in other clostridia [167,168] but has not been

demonstrated in the C. saccharolyticum species group. It has been suggested that the

capacity to fix nitrogen confers a selective advantage to cellulolytic microbes that live in

nitrogen limited environments such as many soils [167]. The C. indolis genome reveals

22 nitrogenases in four gene clusters (Table 11), none of which are found in the near relatives analyzed in this study. Genes needed for the nitrogenase component proteins

(nifH , nifD , and nifK ) are present in C. indolis , but two of the four genes required to synthesize the nitrogenase iron-molybdenum cofactor ( nifV and nifN ) are absent.

81

5.5.5 Lactate utilization

The genome of C. indolis includes both D- and L-lactate dehydrogenases, which convert

lactate to pyruvate. Additionally, there is a lactate transporter, suggesting the ability of

C. indolis to utilize exogenous lactate (Table 12).

Table 12. Selection of C. indolis DSM 755 genes related to lactate utilization.

Locus Tag Gene Product Gene ID K401DRAFT_1877 L-lactate dehydrogenase EC:1.1.1.27 K401DRAFT_5775 L-lactate dehydrogenase EC:1.1.1.27 *K401DRAFT_3431 L-lactate transporter, LctP family TC.LCTP K401DRAFT_3220 D-lactate dehydrogenase EC:1.1.1.28 * Genes without homologs in near relatives ( C. saccharolyticum, C. phytofermentans, C. hathewayi ).

5.5.6 Bacterial microcompartments (BMC)

Additionally, the C. indolis genome reveals 11 genes associated with carboxysome shell

proteins used to concentrate carbon dioxide and sequester biochemical processes in two

gene clusters with additional genes associated ethanolamine and propanediol utilization.

Ethanolamine microcompartments have not been experimentally demonstrated in the

Lachnospiraceae, and the ethanolamine-ammonia lyase gene (EC:4.2.1.7) is not found in

the C. indolis genome, suggesting either a alternative pathway or secondary role for these

BMC genes.

5.5.7 Secondary metabolites biosynthesis, transport and catabolism

Protocatechuate and other aromatics are intermediaries in the degradation of lignin in

plant rich environments [169]. The genome of C. indolis contains two protocatechuate

82

dioxygenases and an aromatic hydrolase, revealing the potential for utilizing aromatic compounds (Table 13).

Table 13. Selection of C. indolis DSM 755 genes related to degradation of aromatics.

Locus Tag Gene Product Gene ID

*K401DRAFT_3571 Protocatechuate 3,4-dioxygenase beta subunit EC:1.13.11.3

*K401DRAFT_3568 Protocatechuate 3,4-dioxygenase beta subunit EC:1.13.11.3 EC:5.3.3.3 *K401DRAFT_3412 Aromatic ring hydroxylase EC:4.2.1.120 * Genes without homologs in near relatives ( C. saccharolyticum, C. phytofermentans, C. hathewayi ).

5.6 Conclusion

The genomic sequence of C. indolis reported here reveals the metabolic potential of this organism to utilize a wide assortment of fermentable carbohydrates and intermediates including citrate, lactate, malate, succinate, and aromatics, and points to potential ecological roles in nitrogen fixation and ethanolamine utilization. Further culture-based characterization is necessary to confirm the metabolic activity suggested by this genome analysis, and to expand the description of C. indolis .

83

CHAPTER 6

CONCLUSIONS AND FUTURE DIRECTIONS

6.1 Conclusions and future directions

This dissertation describes progress made on four projects exploring the community structure and activities of anaerobic microbes from soil and gut environments.

6.1.1 Microcosmic evolution

Utilizing selective pressure to enrich for organisms with specific metabolic characteristics is a fundamental technique in microbiology, however, the application of this strategy to establish stable, replicate communities from a natural sample is a novel approach, as co-cultures of isolated strains have been traditionally employed. Given that cultured isolates are widely recognized as differing from their wild type origins, and the probability of simple pair-wise combinations occurring in nature is low, the ecological relevance of such studies could be questioned, however the enormous complexity of microbial communities in soil warrants the establishment of tractable model systems.

This research demonstrates that long-term enrichment can produce simplified, replicable communities, enabling the isolation of novel microbes, and providing a flexible model for future studies seeking to assess the effects of biotic and/or abiotic factors in a community context. Future experiments to isolate more microbes from the microcosm communities are ongoing, and will provide organisms for further sequencing and characterization.

84

It is interesting that the OTUs that persisted in the microcosm system for more than three years were so rare in the original soil sample. Whether their persistence was due to the ability to outcompete others for resources, establish cooperative networks with other taxa, or the adoption of another strategy is a question for further study. Metagenomic studies of isolates in pairwise combinations could inform efforts to understand the mechanisms underlying the success of the persistent OTUs, and provide insight into the metabolic strategies at work to degrade plant biomass by the adapted communities.

6.1.2 Lactate fermentation in the horse gut

By utilizing an in vitro system to model the changes in bacterial community structure

and function occurring during starch induced lactic acidosis in horses, we were able to

identify and isolate specific microbes that could be implicated in the recovery and/or

resistance to lactate build up. Future experiments will assess the extent to which these

microbes and/or microbial communities can attenuate or prevent lactic acidosis in our in

vitro system modeling conditions of starch induction. We are excited by the prospect that

one of our horse gut isolates could be developed as a probiotic to prevent lactic acidosis

in horses.

Our research points to many additional questions regarding the utilization of lactic acid

by horse gut communities. While we have demonstrated changes in community structure

associated with lactate clearance, it is not clear which lactate (or starch) utilizers are most

active and under what circumstances. Stable isotope probes could be used to track the

flow of labeled starch and lactate through the community, pointing, in a time course, to

85

the microbes that utilize it most quickly and in the greatest abundance. These experiments could be coupled with tests of substrate and/or environmental preferences to identify whether specific conditions favor high lactate utilization rates. Additionally, RNA-seq could be used over the time course to measure gene expression levels indicating specific metabolic and/or competitive strategies at work to survive the stress of high lactate levels and lowered pH.

6.1.3 Comparative genomics of gut vs. free-living microbes

At the time of this writing there were 9,244 sequenced bacterial genomes available at the IMG web portal [128], and more are being released every day. While it is true that some bacterial groups are underrepresented in the database, there is still a wealth of genomic data available with which to generate and test hypotheses in silico . These results can inform and drive experimental design in the lab by providing insights into the genetic potentials of microbes of interest.

Here genome comparisons of the carbohydrate-active enzymes, transporters, and metabolic pathways between the Lachnospiraceae, Ruminococcaceae, and Clostridiaceae revealed the Lachnospiraceae and Ruminococcaceae to be more highly specialized for the degradation of complex plant material. In the gut environment, the ability to degrade both cellulose and hemicellulose components of plant biomass could enable members of these groups to utilize a wider range of substrates that are indigestible by the host. Further experimentation is needed in the lab to determine the extent to which the genetic potential of members of these groups is expressed and under what circumstances. Incorrect or

86

misleading gene annotation could suggest a function that does not exist. Substrate preferences, cooperative interactions with other community members, and environmental limitations are factors that influence ecosystem function without a clear genetic signal. A comparative genomics approach as described here provides insight into what molecular tools a microbe brings to the table, generating hypotheses about core genes and functions that may be associated with the commensal lifestyle.

6.1.4 Genome sequencing and characterization

Clostridium indolis , like many other bacteria, was isolated and characterized prior to the advent of molecular tools and the description of this taxa is scanty and conflicting. Its membership in the C. saccharolyticum species group makes it interesting in terms of

potential saccharolytic ability. The genome sequence of Clostridium indolis reveals the potential to utilize a wider range of substrates than originally thought, including lactate, citrate, aromatics, and ethanolamine. Additionally, there is evidence that C. indolis may fix nitrogen, which could give it a selective advantage in nitrogen poor soil environments, and that it may use bacterial microcompartments to sequester metabolic reactions. These results point to the need for validation experiments that could inform a more complete description of this taxa and its role in soil and gut environments.

DNA samples from four additional members of the C. saccarolyticum group and two closely related microbes that were isolated from the microcosms are in the process of

being sequenced. These data will provide insight into the genetic potential of the rest of

87

the C. saccarolyticum group and enable comparative genomics to better determine how

they are related to each other.

88

APPENDIX A

PHYLOGENETIC TREES OF OTUS AND ISOLATES FROM THREE MICROCOSM FAMILIES

a

KNHs214 0.32 Clost1 1

0.88 OTU_1

Clostridium_tepidiprofundi 0.16 Clostridium_cellulovorans 0.46 0.19 Clostridium_acetobutylicum

Clostridium_kluyveri

0.35 Clostridium_histolyticum

0.28 Eubacterium_combesii

1 0.63 Clostridium_botulinum 0.35 Clostridium_sporogenes

Clostridium_beijerinckii 1 Clostridium_saccharobutylicum 0.59 0.82 Clostridium_intestinale

0.35 Eubacterium_multiforme 0.82 Clostridium_perfringens

Clostridium_novyi 1 0 Clostridium_haemolyticum

Clost3 0.9 Anaerosporobacter_mobilis 0.93 Clost2

0.68 Clost5

Alkaliphilus_halophilus 0.85 1 Alkaliphilus_oremlandii 0.76 Clost4 0.71 Clost6

Escherichia/Shigella_coli 0.02

89

b

Clostridium_sphenoides 0.13 Clostridium_celerecrescens 0.33

Lachno6 0.83

OTU_2 0.65 0.91 KNHs206

Clostridium_xylanolyticum

0.26 OTU_5 0.64 KNHs205

1 Lachno4

0.16 0.55 Clostridium_xylanovorans 0.34 Clostridium_jejuense

Clostridium_hathewayi

0.35 OTU_7

0.33 Lachno11

0.26 Robinsoniella_peoriensis 0.39 KNHs208 0.2

OTU_8 0.75 KNHs209

0.38 Coprococcus_catus

Lachno3

Clostridium_populeti 0.43 0.41 Anaerostipes_caccae

0.32 Lachno2

0.72 0.65 Clostridium_phytofermentans

1 OTU_3

0.7 Lachno5 0.38 0.95 KNHs212

Roseburia_intestinalis 0.99 0 Eubacterium_rectale

Clostridium_colinum

Cellulosilyticum_lentocellum

Escherichia/Shigella_coli 0.02

90

c

OTU_4

0.3

KNHs216 0.99

Rumino2

0.53

Rumino1

1 0.82 Clostridium_sporosphaeroides

0.91 Clostridium_leptum

Ruminococcus_bromii 0.55

Clostridium_methylpentosum

0.37 Ruminococcus_flavefaciens 0.63

0.52

Ruminococcus_albus

0 Faecalibacterium_prausnitzii

Clostridium_viride

0.91 Clostridium_cellulolyticum

1 Clostridium_straminisolvens

0.97

Clostridium_thermocellum

Escherichia/Shigella_coli 0.02

Phylogenetic trees of OTUs and isolates from three microcosm families. To verify the match between isolate sequences (red), abundant representative OTUs (blue), and type species (black), a phylogenetic tree was constructed for each group. ( a) Clostridiaceae, (b) Lachnospiraceae, ( c) Ruminococcaceae. 16S rRNA sequences downloaded from the Ribosomal Protein Database [52], isolate sequences, and sequences from abundant representative OTUs were aligned in ClustalW 2.0.12 [170] using the Mobyle 1.5 web interface [171]. Sequences were trimmed, and a neighbor-joining tree [142] was made in MEGA [54] using the following parameters: 100 bootstrap iterations, Maximum Composite Likelihood method, substitutions to include transitions/transversions, uniform rates, homogeneous patterns among lineages, and complete deletions [144].

91

APPENDIX B

HYDROGEN SULFIDE GAS LEVELS BY HORSE

Control Lactate Starch Starch and Lactate

0.5 Horse 1 Horse 2 Horse 3

0.4

0.3

0.2 Concentration (mM) Concentration

0.1

0.0

0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 0 10 20 30 40 50 Time(hrs)

Hydrogen sulfide concentrations over time by horse. Concentration (mM) of hydrogen sulfide measured by the Cline (methylene blue) assay from each horse and culture condition at 12 hour intervals.

92

APPENDIX C

HEADSPACE GAS LEVELS BY HORSE

Horse 1 Horse 2 Horse 3

0.006 Control 0.004

0.002

0.000

0.006 Lactate 0.004

0.002

0.000

0.006 Starch 0.004 Concentration (mM) Concentration

0.002

0.000 Starch andStarch Lactate 0.006

0.004

0.002

0.000 10 20 30 40 10 20 30 40 10 20 30 40 Time(hrs)

Hydrogen Methane

Headspace gas concentrations over time by horse. Concentration (mM) of hydrogen and methane gases measured by gas chromatography from each horse and culture condition at times 9, 20, 32, and 45 hours.

93 APPENDIX D

RAREFACTION CURVES BY HORSE

Rarefaction curves by horse. Observed species by number of sequences per sample for each horse dataset generated using a sampling depth of 9685 (the minimum number of sequences per sample and default parameters in QIIME).

94

APPENDIX E

ALPHA DIVERSITY ESTIMATES: HORSE GUT

Chao1 Obs. Shannon Good's Treatment Time Horse Reads Index species Index Cov. Time 0 0 1 131143 14744.1 8504 9.6 97% Starch 12 1 54460 12453.6 5520 9.4 94% Lactate 12 1 74884 12035.9 5952 8.6 96% StarchLactate 12 1 57130 12321.5 5629 9.3 94% Control 12 1 55196 11710.1 5528 9.5 94% Starch 24 1 31602 8295.0 3742 9.1 93% Control 24 1 56883 10609.1 4939 8.2 95% StarchLactate 24 1 25002 7377.6 3160 8.9 92% Lactate 24 1 62382 9736.8 4506 7.6 96% Starch 36 1 45499 8546.2 3672 7.2 95% StarchLactate 36 1 67107 8368.9 3735 5.0 97% Control 36 1 87432 11093.7 5543 8.1 97% StarchLactate 48 1 97717 4676.6 2244 2.1 99% Lactate 48 1 66097 7949.1 3491 5.4 97% Starch 48 1 82264 7082.1 3621 4.7 98% Control 48 1 72983 9405.9 4153 6.5 97% Time 0 0 2 126317 14301.8 8679 9.5 97% Lactate 12 2 48086 10453.3 4993 9.1 94% Control 12 2 35210 9303.7 4207 8.8 93% StarchLactate 12 2 26465 7229.7 3267 8.7 93% Starch 12 2 50658 8722.4 4364 8.3 95% StarchLactate 24 2 11172 4002.9 1558 7.0 91% Lactate 24 2 76671 11587.7 5775 8.2 96% Starch 24 2 9844 5160.5 1737 8.0 88% Control 24 2 70657 12132.5 5737 8.5 95% Control 36 2 26829 7043.4 3008 8.3 93% StarchLactate 36 2 29086 4553.5 1701 4.0 96% Starch 36 2 26266 4388.6 1938 5.4 95% Starch 48 2 98241 6076.0 3311 4.6 98% Control 48 2 19390 4969.0 2283 8.1 93% Lactate 48 2 82176 5558.0 2506 3.5 98% StarchLactate 48 2 144133 4536.4 2219 2.5 99%

95

APPENDIX E

ALPHA DIVERSITY ESTIMATES: HORSE GUT (CONTINUED)

Chao1 Obs. Shannon Good's Treatment Time Horse Reads Index species Index Cov. Time 0 0 3 108582 12506.1 7879 9.5 97% StarchLactate 12 3 10392 3859.8 1727 8.1 90% Control 12 3 25386 6828.4 3054 8.3 93% Starch 12 3 9685 4068.9 1723 7.7 89% Lactate 12 3 10229 4201.1 1951 8.7 89% StarchLactate 24 3 10701 4092.4 1828 8.3 90% Control 24 3 42982 9068.7 3969 8.4 95% Starch 24 3 13472 4661.1 2147 8.0 91% Lactate 24 3 52228 8632.4 3983 8.0 96% Lactate 36 3 86097 7136.3 3575 5.2 98% StarchLactate 36 3 12871 6140.9 2481 9.0 88% Control 36 3 81289 8040.5 3866 5.8 97% Control 48 3 43180 5152.0 2527 4.4 97% Lactate 48 3 68708 4247.8 2137 3.1 98% StarchLactate 48 3 31334 6190.7 2805 6.0 95% Starch 48 3 20417 4417.9 2102 5.3 94% Alpha diversity estimates for each sample. Chao1, observed species, Shannon Index and Good’s coverage for each horse dataset generated using default parameters in QIIME.

96

APPENDIX F

BEST BLAST HITS FOR MOST ABUNDANT HORSE GUT OTUS AT TIME 48

Percent NCBI Accession Best BLAST hit similarity Number Parabacteroides distasonis ATCC 8503 85% NR_074376.1 Streptococcus equi subsp. zooepidemicus 100% NR_102812.1 MGCS10565 Desulfovibrio piger ATCC 29098 99% NR_041778.1 Clostridium paraputrificum ATCC 25780 99% NR_026135.1 Mitsuokella jalaludinii M9 100% NR_028840.1 Megasphaera elsdenii DSM 20460 100% NR_102980.1 Anaerosporobacter mobilis IMSNU 40011 94% NR_042953.1 Selenomonas bovis strain WG 98% NR_044111.1 Veillonella montpellierensis ADV 281.99 98% NR_028839.1 Lactobacillus equi YIT 0455 100% NR_028623.1 Selenomonas ruminantium subsp. lactilytica 98% NR_075026.1 TAM6421 Veillonella caviae PV1 99% NR_025762.1 Best BLAST hit from the 16S rRNA Reference Sequence Database (http://blast.ncbi.nlm.nih.gov ). Percent similarity and NCBI accession numbers for each reference sequence is given.

97

APPENDIX G

TAXA FROM EACH FAMILY USED IN COMPARATIVE ANALYSES

Lachnospiraceae caccae DSM 14662[ID] Blautia hansenii VPI C7-24, DSM 20583[ID] Blautia hydrogenotrophica DSM 10507[ID] Blautia producta ATCC 27340[IP] crossotus DSM 2876[ID] Butyrivibrio fibrisolvens 16/4[C][IF] Butyrivibrio proteoclasticus B316[C][IF][B] morbi ATCC 51271[ID] lentocellum RHM5, DSM 5427[C][IF][B] Clostridium asparagiforme DSM 15981[ID] ATCC BAA-613[ID] DSM 15053[ID] Clostridium nexile DSM 1787 [ID] Clostridium phytofermentans ISDg[C][IF][B] Clostridium saccharolyticum WM1, DSM 2544 [C][IF][B] ATCC 35704[ID] Clostridium sporosphaeroides DSM 1294[IP] Clostridium symbiosum WAL-14163[ID][B] Coprococcus catus GD/7[C][IF] Coprococcus comes ATCC 27758[ID] ATCC 27759 [ID] formicigenerans ATCC 27755[ID] DSM 13814[ID] Eubacterium cellulosolvens 6[IP] Eubacterium eligens ATCC 27750[C][IF][B] Eubacterium hallii DSM 3353[ID] Eubacterium rectale ATCC 33656 [C][IF][B] Eubacterium ventriosum ATCC 27560[ID] ignava ATCC 51276[ID] formatexigens I-52, DSM 14469[ID][B] sinus F0268[ID][B] A2-183, DSM 16839[C][IF][B] M50/1[C][IF][B] DSM 16841[ID] Ruminococcus gnavus ATCC 29149[ID] Ruminococcus obeum A2-162[C][IF] Ruminococcus torques L2-14[C][IF]

98

Clostridiaceae metalliredigens QYMF[C][IF][B] Alkaliphilus oremlandii OhILAs[C][IF][B] Clostridium acetobutylicum ATCC 824[C][IF][B] NCIMB 8052[C][IF][B] BoNT/A2 Kyoto-F [C][IF][B] Clostridium butyricum 5521[ID] Clostridium carboxidivorans P7, DSM 15243[ID] Clostridium cellulovorans 743B, ATCC 35296 [C][IF][B] DSM 555[C][IF][B] PETC, DSM 13528[C][IF][B] NT[C][IF][B] ATCC 13124[C][IF][B] ATCC 15579[ID] Massachusetts E88[C][IF][B]

Ruminococcaceae cellulolyticus CD2, DSM 1870[IP] colihominis DSM 17241[ID] Clostridium alkalicellulosi Z-7026, DSM 17461[IP] Clostridium cellulolyticum H10 [C][IF][B] DSM 753[ID] Clostridium methylpentosum R2, DSM 5476[ID] Clostridium papyrosolvens DSM 2782[ID] stercorarium , DSM 8532 [C][IF] DSM 2360 [C][ID][B] Ethanoligenens harbinense YUAN-3T, DSM 18485 [C][IF][B] Faecalibacterium prausnitzii A2-165 [ID][B] Faecalibacterium prausnitzii SL3/3 [C][IF] Ruminococcus albus 7 [C][IF][B] Ruminococcus bromii L2-63 [C][IF] Ruminococcus flavefaciens FD-1 [ID] Ruminococcus gauvreauii CCRI-16110; EF529620Shuttleworthia satelles DSM 14600[ID]

[C]: CAZy database [129], [I]: IMG database[128] with status designation at time of publication: F = Finished, D = Draft, P = Permanent draft, [B]: Biocyc database [133,134]

99

APPENDIX H

ACTIVITIES OF ENZYMES THAT DIFFER BETWEEN THE LACHNOSPIRACEAE, CLOSTRIDIACEAE, AND RUMINOCOCCACEAE

EC Enzyme Role in Carbohydrate Degradation* Hydrolysis of 1,6-α-D glucosidic linkages of EC:3.2.1.10 Oligo-1,6-glucosidase oligosaccharides produced from starch or glycogen Transfer of glucose units from oligosaccharides to EC:2.4.1.25 4-α-glucanotransferase acceptor sugars (glucose, maltose, or maltotriose 1,4-α-glucan branching Transfers a segment of 1,4-α-D-glucan chain to EC:2.4.1.18 enzyme another similar glucan chain Endohydrolysis of 1,4-α-D-glucosidic linkages in EC:3.2.1.1 α-amylase polysaccharides Addition or removal of 1,4-α-D-glucosyl linkages in EC:2.4.1.4 Amylosucrase starch metabolism Maltose α-D- EC:5.4.99.16 Conversion of maltose to α,α-trehalose glucosyltransferase EC:3.2.1.8 Endo-1,4-β-xylanase Endohydrolysis of 1,4-β-D-xylosidic bonds in xylans

EC:3.2.1.4 Cellulase Endohydrolysis of 1,4-β glycosidic bonds in cellulose Hydrolysis of D-glucosyl-N-acylsphingosine, EC:3.2.1.45 Glucosylceramidase releasing D-glucose and N-acylsphingosine Mannan endo-1,4-β- Hydrolysis of 1,4-β-D-mannosidic linkages in EC:3.2.1.78 mannosidase mannans, galactomannans and glucomannans Hydrolysis of β-D-galactosides, releasing β-D- EC:3.2.1.23 Beta-galactosidase galactose Hydrolysis of β-D-glucuronoside, releasing D- EC:3.2.1.31 Beta-glucuronidase glucuronate and an alcohol Hydrolysis of β-D-mannosides, releasing β-D- EC:3.2.1.25 Beta-mannosidase mannose Glucan 1,3-β- Successive hydrolysis of 1,3-β-D-glucans, releasing EC:3.2.1.58 glucosidase β-D-glucose Successive hydrolysis of 1,4-β-D-xylans, releasing EC:3.2.1.37 Xylan 1,4-β-xylosidase D-xylose Cellulose 1,4-β- Hydrolysis of 1,4-β-D-glucosidic residues in EC:3.2.1.91 cellobiosidase cellulose, releasing cellobiose Hydrolysis of terminal α-D-galactose residues, EC:3.2.1.22 Alpha-galactosidase releasing α-D-galactose Hydrolysis of terminal α-D-glucose residues, EC:3.2.1.20 Alpha-glucosidase releasing α-D-glucose *Activities as identified in available databases [129,133]

100

APPENDIX I

SUBSTRATE AFFINITIES OF CARBOHYDRATE BINDING MODULES THAT DIFFER BETWEEN THE LACHNOSPIRACEAE, CLOSTRIDIACEAE, AND RUMINOCOCCACEAE

CBM Substrate Affinity* CBM2 cellulose, chitin, xylan CBM6 cellulose, xylan, β-1,3 glucan, β-1,4- glucan CBM22 xylan CBM32 galactose, lactose, polygalacturonic acid, LacNAc CBM35 xylans, mannans, mannooligosaccharides, β -galactan CBM48 glycogen chitopentaose, peptidoglycan degrading enzymes such as CBM50 peptidases and amidases * Activities as identified in available databases [133].

101

REFERENCES

1. Schimel DS (1995) Terrestrial ecosystems and the carbon cycle. Glob Change Biol 1: 77–91.

2. Leschine S (1995) Cellulose Degradation in Anaerobic Environments. Annu Rev Microbiol 49: 399–426.

3. Ryan MG, Law BE (2005) Interpreting, measuring, and modeling soil respiration. Biogeochemistry 73: 3–27. doi:10.1007/s10533-004-5167-7.

4. Lipson DA, Schadt CW, Schmidt SK (2002) Changes in Soil Microbial Community Structure and Function in an Alpine Dry Meadow Following Spring Snow Melt. Microb Ecol 43: 307–314.

5. Fierer N, Schimel JP, Holden PA (2003) Influence of Drying–Rewetting Frequency on Soil Bacterial Community Structure. Microb Ecol 45: 63–71.

6. DeAngelis KM, Silver WL, Thompson AW, Firestone MK (2010) Microbial communities acclimate to recurring changes in soil redox potential status. Environ Microbiol 12: 3137–3149. doi:10.1111/j.1462-2920.2010.02286.x.

7. Tiedje JM, Sexstone AJ, Parkin TB, Revsbech NP (1984) Anaerobic processes in soil. Plant Soil 76: 197–212.

8. Torsvik V, Øvreås L (2002) Microbial diversity and function in soil: from genes to ecosystems. Curr Opin Microbiol 5: 240–245.

9. Tringe SG (2005) Comparative Metagenomics of Microbial Communities. Science 308: 554–557. doi:10.1126/science.1107851.

10. Lozupone CA, Knight R (2007) Global patterns in bacterial diversity. Proc Natl Acad Sci 104: 11436–11440.

11. Van der Heijden MGA, Bardgett RD, van Straalen NM (2008) The unseen majority: soil microbes as drivers of plant diversity and productivity in terrestrial ecosystems. Ecol Lett 11: 296–310. doi:10.1111/j.1461-0248.2007.01139.x.

12. Warnick Thomas A (2002) Clostridium phytofermentans sp. nov., a cellulolytic from forest soil. Int J Syst Evol Microbiol 52: 1155–1160. doi:10.1099/ijs.0.02125-0.

13. Schink B (1997) Energetics of syntrophic cooperation in methanogenic degradation. Microbiol Mol Biol Rev 61: 262–280.

102

14. Kato S, Haruta S, Cui ZJ, Ishii M, Igarashi Y (2004) Effective cellulose degradation by a mixed-culture system composed of a cellulolytic Clostridium and aerobic non-cellulolytic bacteria. FEMS Microbiol Ecol 51: 133–142. doi:10.1016/j.femsec.2004.07.015.

15. Fondevila M, Dehority BA (1996) Interactions between Fibrobacter succinogenes, Prevotella ruminicola, and Ruminococcus flavefaciens in the digestion of cellulose from forages. J Anim Sci 74: 678–684.

16. Murray WD (1986) Symbiotic relationship of Bacteroides cellulosolvens and Clostridium saccharolyticum in cellulose fermentation. Appl Environ Microbiol 51: 710–714.

17. Cavedon K, Canale-Parola E (1992) Physiological interactions between a mesophilic cellulolytic Clostridium and a non-cellulolytic bacterium. FEMS Microbiol Ecol 86: 237–245.

18. Osborne JM, Dehority BA (1989) Synergism in degradation and utilization of intact forage cellulose, hemicellulose, and pectin by three pure cultures of ruminal bacteria. Appl Environ Microbiol 55: 2247–2250.

19. Ott SJ, Musfeldt M, Ullmann U, Hampe J, Schreiber S (2004) Quantification of Intestinal Bacterial Populations by Real-Time PCR with a Universal Primer Set and Minor Groove Binder Probes: a Global Approach to the Enteric Flora. J Clin Microbiol 42: 2566–2572. doi:10.1128/JCM.42.6.2566-2572.2004.

20. Parra R (1978) Comparison of foregut and hindgut fermentation in herbivores.

21. Van Soest PJ (1984) Nutritional Ecology of the Ruminant. 2nd ed. Ithaca, NY: Cornell University Press.

22. Argenzio RA, Southworth M, Stevens CE (1974) Sites of organic acid production and absorption in the equine gastrointestinal tract. Am J Physiol 226: 1043–1050.

23. Kentucky Equine Research, Inc. (2008) Understanding Digestion in the Horse, A Comparative Approach. Equinews 10: 6–10.

24. De Fombelle A, Julliand V, Drogoul C, Jacotot E (2003) Characterization of the microbial and biochemical profile of the different segments of the digestive tract in horses given two distinct diets. Acta Agric Scand Secion Anim Sci 77: 293–304.

25. Collins MD, Lawson PA, Willems A, Cordoba JJ, Fernandez-Garayzabal J, et al. (1994) The phylogeny of the genus Clostridium: proposal of five new genera and eleven new species combinations. Int J Syst Bacteriol 44: 812–826.

26. Roesch LF, Fulthorpe RR, Riva A, Casella G, Hadwin AK, et al. (2007) Pyrosequencing enumerates and contrasts soil microbial diversity. ISME J 1: 283– 290.

103

27. Jalanka-Tuovinen J, Salonen A, Nikkilä J, Immonen O, Kekkonen R, et al. (2011) Intestinal Microbiota in Healthy Adults: Temporal Analysis Reveals Individual and Common Core and Relation to Intestinal Symptoms. PLoS ONE 6: e23035. doi:10.1371/journal.pone.0023035.

28. Turnbaugh PJ, Hamady M, Yatsunenko T, Cantarel BL, Duncan A, et al. (2008) A core gut microbiome in obese and lean twins. Nature 457: 480–484. doi:10.1038/nature07540.

29. Peris-Bondia F, Latorre A, Artacho A, Moya A, D’Auria G (2011) The Active Human Gut Microbiota Differs from the Total Microbiota. PLoS ONE 6: e22448. doi:10.1371/journal.pone.0022448.

30. Tap J, Mondot S, Levenez F, Pelletier E, Caron C, et al. (2009) Towards the human intestinal microbiota phylogenetic core. Environ Microbiol 11: 2574–2584. doi:10.1111/j.1462-2920.2009.01982.x.

31. Daly K, Stewart CS, Flint HJ, Shirazi-Beechey SP (2001) Bacterial diversity within the equine large intestine as revealed by molecular analysis of cloned 16S rRNA genes. FEMS Microbiol Ecol 38: 141–151.

32. Costa MC, Arroyo LG, Allen-Vercoe E, Stämpfli HR, Kim PT, et al. (2012) Comparison of the Fecal Microbiota of Healthy Horses and Horses with Colitis by High Throughput Sequencing of the V3-V5 Region of the 16S rRNA Gene. PLoS ONE 7: e41484. doi:10.1371/journal.pone.0041484.

33. Shepherd ML, Swecker WS, Jensen RV, Ponder MA (2012) Characterization of the fecal bacteria communities of forage-fed horses by pyrosequencing of 16S rRNA V4 gene amplicons. FEMS Microbiol Lett 326: 62–68. doi:10.1111/j.1574- 6968.2011.02434.x.

34. MURRAY WD, Khan AW (1982) Clostridium saccharolyticum sp. nov., a saccharolytic species from sewage sludge. Int J Syst Bacteriol 32: 132–135.

35. Palop ML, Valles S, Pinaga F, Flors A (1989) Isolation and Characterization of an Anaerobic, Cellulolytic Bacterium, Clostridium celerecrescens sp. nov. Int J Syst Bacteriol 39: 68–71.

36. Mechichi T, Patel BKC, Sayadi S (2005) Anaerobic degradation of methoxylated aromatic compounds by Clostridium methoxybenzovorans and a nitrate-reducing bacterium Thauera sp. strain Cin3,4. Int Biodeterior Biodegrad 56: 224–230. doi:10.1016/j.ibiod.2005.09.001.

37. Heritage AD, MacRae IC (1977) Degradation of lindane by cell-free preparations of Clostridium sphenoides. Appl Environ Microbiol 34: 222–224.

104

38. Amann RI, Ludwig W, Schleifer K-H (1995) Phylogenetic identification and in situ detection of individual microbial cells without cultivation. Microbiol Rev 59: 143–169.

39. DeLong EF, Pace NR (2001) Environmental diversity of bacteria and archaea. Syst Biol 50: 470–478.

40. McCaig AE, Grayston SJ, Prosser JI, Glover LA (2001) Impact of cultivation on characterisation of species composition of soil bacterial communities. FEMS Microbiol Ecol 35: 37–48.

41. Allison SD, Martiny JB (2008) Resistance, resilience, and redundancy in microbial communities. Proc Natl Acad Sci 105: 11512–11519.

42. Wohl DL, Arora S, Gladstone JR (2004) Functional redundancy supports biodiversity and ecosystem function in a closed and constant environment. Ecology 85: 1534–1540.

43. Wüst PK, Horn MA, Drake HL (2010) Clostridiaceae and Enterobacteriaceae as active fermenters in earthworm gut content. ISME J 5: 92–106. doi:10.1038/ismej.2010.99.

44. Griffiths RI, Whiteley AS, O’Donnell AG, Bailey MJ (2000) Rapid method for coextraction of DNA and RNA from natural environments for analysis of ribosomal DNA-and rRNA-based microbial community composition. Appl Environ Microbiol 66: 5488–5491.

45. Caporaso JG, Lauber CL, Walters WA, Berg-Lyons D, Huntley J, et al. (2012) Ultra-high-throughput microbial community analysis on the Illumina HiSeq and MiSeq platforms. ISME J. Available: http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej20128a.html?WT.mc _id=ISMEHP12. Accessed 5 March 2013.

46. Magoc T, Salzberg SL (2011) FLASH: fast length adjustment of short reads to improve genome assemblies. Bioinformatics 27: 2957–2963.

47. Caporaso JG, Bittinger K, Bushman FD, DeSantis TZ, Andersen GL, et al. (2010) PyNAST: a flexible tool for aligning sequences to a template alignment. Bioinformatics 26: 266–267.

48. Caporaso JG, Kuczynski J, Stombaugh J, Bittinger K, Bushman FD, et al. (2010) QIIME allows analysis of high-throughput community sequencing data. Nat Methods 7: 335–336.

49. Haas BJ, Gevers D, Earl AM, Feldgarden M, Ward DV, et al. (2011) Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res 21: 494–504.

105

50. DeSantis TZ, Hugenholtz P, Larsen N, Rojas M, Brodie EL, et al. (2006) Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol 72: 5069–5072.

51. R Core Team (2012) R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing. Available: http://www.R-project.org/.

52. Cole JR (2004) The Ribosomal Database Project (RDP-II): sequences and tools for high-throughput rRNA analysis. Nucleic Acids Res 33: D294–D296. doi:10.1093/nar/gki038.

53. Wang Q, Garrity GM, Tiedje JM, Cole JR (2007) Naive Bayesian Classifier for Rapid Assignment of rRNA Sequences into the New Bacterial Taxonomy. Appl Environ Microbiol 73: 5261–5267. doi:10.1128/AEM.00062-07.

54. Tamura K, Peterson D, Peterson N, Stecher G, Nei M, et al. (2011) MEGA5: Molecular Evolutionary Genetics Analysis Using Maximum Likelihood, Evolutionary Distance, and Maximum Parsimony Methods. Mol Biol Evol 28: 2731–2739. doi:10.1093/molbev/msr121.

55. Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403–410.

56. Youssef NH, Elshahed MS (2008) Diversity rankings among bacterial lineages in soil. ISME J 3: 305–313.

57. Hansel CM, Fendorf S, Jardine PM, Francis CA (2008) Changes in Bacterial and Archaeal Community Structure and Functional Diversity along a Geochemically Variable Soil Profile. Appl Environ Microbiol 74: 1620–1633. doi:10.1128/AEM.01787-07.

58. Janssen PH (2006) Identifying the Dominant Soil Bacterial Taxa in Libraries of 16S rRNA and 16S rRNA Genes. Appl Environ Microbiol 72: 1719–1728. doi:10.1128/AEM.72.3.1719-1728.2006.

59. Schäfer H, Bernard L, Courties C, Lebaron P, Servais P, et al. (2001) Microbial community dynamics in Mediterranean nutrient-enriched seawater mesocosms: changes in the genetic diversity of bacterial populations. FEMS Microbiol Ecol 34: 243–253.

60. Riemann L, Steward GF, Azam F (2000) Dynamics of Bacterial Community Composition and Activity during a Mesocosm Diatom Bloom. Appl Environ Microbiol 66: 578–587. doi:10.1128/AEM.66.2.578-587.2000.

106

61. Kunin V, Engelbrektson A, Ochman H, Hugenholtz P (2010) Wrinkles in the rare biosphere: pyrosequencing errors can lead to artificial inflation of diversity estimates. Environ Microbiol 12: 118–123. doi:10.1111/j.1462- 2920.2009.02051.x.

62. Zhou J, Wu L, Deng Y, Zhi X, Jiang Y-H, et al. (2011) Reproducibility and quantitation of amplicon sequencing-based detection. ISME J 5: 1303–1313.

63. DeAngelis KM, Fortney JL, Borglin S, Silver WL, Simmons BA, et al. (2012) Anaerobic Decomposition of Switchgrass by Tropical Soil-Derived Feedstock- Adapted Consortia. mBio 3: e00249–11–e00249–11. doi:10.1128/mBio.00249-11.

64. Flint HJ, Bayer EA, Rincon MT, Lamed R, White BA (2008) Polysaccharide utilization by gut bacteria: potential for new insights from genomic analysis. Nat Rev Microbiol 6: 121–131. doi:10.1038/nrmicro1817.

65. Lynd LR, Weimer PJ, van Zyl WH, Pretorius IS (2002) Microbial Cellulose Utilization: Fundamentals and Biotechnology. Microbiol Mol Biol Rev 66: 506– 577. doi:10.1128/MMBR.66.3.506-577.2002.

66. Brulc JM, Antonopoulos DA, Miller MEB, Wilson MK, Yannarell AC, et al. (2009) Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proc Natl Acad Sci 106: 1948–1953.

67. Arumugam M, Raes J, Pelletier E, Le Paslier D, Yamada T, et al. (2011) Enterotypes of the human gut microbiome. Nature 473: 174–180. doi:10.1038/nature09944.

68. Biddle A, Stewart L, Blanchard J, Leschine S (2013) Untangling the Genetic Basis of Fibrolytic Specialization by Lachnospiraceae and Ruminococcaceae in Diverse Gut Communities. Diversity 5: 627–640. doi:10.3390/d5030627.

69. Byappanahalli MN, Nevers MB, Korajkic A, Staley ZR, Harwood VJ (2012) Enterococci in the Environment. Microbiol Mol Biol Rev 76: 685–706. doi:10.1128/MMBR.00023-12.

70. Gilmore MS (2002) The Enterococci: Pathogenesis, Molecular Biology, and Antibiotic Resistance. Washington, D.C.: ASM Press. 439 p.

71. Cabral JPS (2010) Water Microbiology. Bacterial Pathogens and Water. Int J Environ Res Public Health 7: 3657–3703. doi:10.3390/ijerph7103657.

72. Callewaert R, Hugas M, Vuyst LD (2000) Competitiveness and bacteriocin production of< i> Enterococci in the production of Spanish-style dry fermented sausages. Int J Food Microbiol 57: 33–42.

107

73. Foulquié Moreno MR, Rea MC, Cogan TM, De Vuyst L (2003) Applicability of a bacteriocin-producing< i> Enterococcus faecium as a co-culture in Cheddar cheese manufacture. Int J Food Microbiol 81: 73–84.

74. Duncan SH, Louis P, Flint HJ (2004) Lactate-Utilizing Bacteria, Isolated from Human Feces, That Produce Butyrate as a Major Fermentation Product. Appl Environ Microbiol 70: 5810–5817. doi:10.1128/AEM.70.10.5810-5817.2004.

75. Louis P, Flint HJ (2009) Diversity, metabolism and microbial ecology of butyrate- producing bacteria from the human large intestine. FEMS Microbiol Lett 294: 1–8. doi:10.1111/j.1574-6968.2009.01514.x.

76. Janis C (1976) The evolutionary strategy of the Equidae and the origins of rumen and cecal digestion. Evolution: 757–774.

77. Argenzio R (1975) Functions of the equine large intestine and their interrelationship in disease. Cornell Vet 65: 303–330.

78. Argenzio RA, Hintz HF (1972) Effect of diet on glucose entry and oxidation rates in ponies. J Nutr 102: 879–892.

79. Garner HE, Coffman JR, Hahn AW, Hutcheson D, Tumbleson ME (1975) Equine laminitis of alimentary origin: an experimental model. Am J Vet Res 36: 441–449.

80. Al Jassim RAM, Andrews FM (2009) Bacterial Community of the Horse Gastrointestinal Tract and its Relation to Fermentative Acidosis, Laminitis, Colic, and Stomach Ulcers. Vet Clin North Am Equine Pract 25: 199–215.

81. Milinovich GJ, Trott DJ, Burrell PC, van Eps AW, Thoefner MB, et al. (2006) Changes in equine hindgut bacterial populations during oligofructose-induced laminitis. Environ Microbiol 8: 885–898. doi:10.1111/j.1462-2920.2005.00975.x.

82. Crawford C, Sepulveda MF, Elliott J, Harris PA, Bailey SR (2007) Dietary fructan carbohydrate increases amine production in the equine large intestine: Implications for pasture-associated laminitis. J Anim Sci 85: 2949–2958. doi:10.2527/jas.2006- 600.

83. Garner HE, Moore JN, Clark L, Amend JF, Tritschler L.G., et al. (1978) Changes in the caecal flora associated with the onset of laminitis. Equine Vet J 10: 249–252.

84. De Fombelle A, Julliand V, Drogoul C, Jacotot E (2001) Feeding and microbial disprders in horses: 1-Effects of an abrupt incorporation of two levels of barley in a hay diet on microbial profile and activities. J Equine Vet Sci 21: 439–445.

85. Gonçalves S, Julliand V, Leblond A (2002) Risk factors associated with colic in horses. Vet Res 33: 641–652. doi:10.1051/vetres:2002044.

86. Lameness and Laminitis in U.S. Horses (2005). USDAAPHISVS CEAH 4: 250.

108

87. Garner H.E., Moore J.N., Johnson J.H., Clark L., Amend, J.F.,Tritschler L.G., Coffmann J.R. (1978) Changes in the Caecal Flora associated with the Onset of Laminits. Equine Vet J 10: 249–252.

88. Mackie RI, Wilkins CA (1988) Enumeration of anaerobic bacterial microflora of the equine gastrointestinal tract. Appl Environ Microbiol 54: 2155–2160.

89. Kern DL, Slyter LL, Leffel EC, Weaver JM, Oltjen RR (1974) Ponies vs. steers: Microbial and chemical characteristics of intestinal microbiota. J Anim Sci 38: 559–564.

90. Eps A van, Pollitt CC (2006) Equine laminitis induced with oligofructose. Equine Vet J 38: 203–208.

91. Milinovich GJ, Burrell PC, Pollitt CC, Klieve AV, Blackall LL, et al. (2008) Microbial ecology of the equine hindgut during oligofructose-induced laminitis. ISME J 2: 1089–1100.

92. Poyart C (2002) Taxonomic dissection of the Streptococcus bovis group by analysis of manganese-dependent superoxide dismutase gene (sodA) sequences: reclassification of “Streptococcus infantarius subsp. coli” as Streptococcus lutetiensis sp. nov. and of Streptococcus bovis biotype II.2 as Streptococcus pasteurianus sp. nov. Int J Syst Evol Microbiol 52: 1247–1255. doi:10.1099/ijs.0.02044-0.

93. Al Jassim RAM, Scott PT, Trebbin AL, Trott D, Pollitt CC (2005) The genetic diversity of lactic acid producing bacteria in the equine gastrointestinal tract. FEMS Microbiol Lett 248: 75–81. doi:10.1016/j.femsle.2005.05.023.

94. Bryant MP, Burkey LA (1953) Culture methods and some characteristics of some of the more numerous groups of bacteria in the bovine rumen. J Dairy Sci 36: 205– 217.

95. Carvalho IPC de, Detmann E, Mantovani HC, Paulino MF, Valadares Filho S de C, et al. (2011) Growth and antimicrobial activity of lactic acid bacteria from rumen fluid according to energy or nitrogen source. Rev Bras Zootec 40: 1260– 1265.

96. Cline JD (1969) Spectrophotometric determination of hydrogen sulfide in natural waters. Limnol Ocean 14: 454–458.

97. Andrews S (2012) FastQC. Cambridge, UK: Babraham Institute.

98. Kolenbrander P (2006) The Genus Veillonella. In: Dworkin M, Falkow S, Rosenberg E, Schleifer K-H, Stackebrandt E, editors. The Prokaryotes. New York, NY: Springer US. pp. 1022–1040. Available: http://www.springerlink.com/index/10.1007/0-387-30744-3_36. Accessed 26 June 2013.

109

99. Haikara A, Helander I (2006) Pectinatus, Megasphaera and Zymophilus. In: Dworkin M, Falkow S, Rosenberg E, Schleifer K-H, Stackebrandt E, editors. The Prokaryotes. New York, NY: Springer US. pp. 965–981. Available: http://www.springerlink.com/index/10.1007/0-387-30744-3_32. Accessed 26 June 2013.

100. Bailey SR, Baillon M-L, Rycroft AN, Harris PA, Elliott J (2003) Identification of Equine Cecal Bacteria Producing Amines in an In Vitro Model of Carbohydrate Overload. Appl Environ Microbiol 69: 2087–2093. doi:10.1128/AEM.69.4.2087- 2093.2003.

101. Daly K, Proudman CJ, Duncan SH, Flint HJ, Dyer J, et al. (2011) Alterations in microbiota and fermentation products in equine large intestine in response to dietary variation and intestinal disease. Br J Nutr 107: 989–995. doi:10.1017/S0007114511003825.

102. Kumar PS, Brooker MR, Dowd SE, Camerlengo T (2011) Target Region Selection Is a Critical Determinant of Community Fingerprints Generated by 16S Pyrosequencing. PLoS ONE 6: e20956. doi:10.1371/journal.pone.0020956.

103. Chakravorty S, Helb D, Burday M, Connell N, Alland D (2007) A detailed analysis of 16S ribosomal RNA gene segments for the diagnosis of pathogenic bacteria. J Microbiol Methods 69: 330–339. doi:10.1016/j.mimet.2007.02.005.

104. Clarridge JE (2004) Impact of 16S rRNA Gene Sequence Analysis for Identification of Bacteria on Clinical Microbiology and Infectious Diseases. Clin Microbiol Rev 17: 840–862. doi:10.1128/CMR.17.4.840-862.2004.

105. Schlegel L (2003) Reappraisal of the taxonomy of the Streptococcus bovis/Streptococcus equinus complex and related species: description of Streptococcus gallolyticus subsp. gallolyticus subsp. nov., S. gallolyticus subsp. macedonicus subsp. nov. and S. gallolyticus subsp. pasteurianus subsp. nov. Int J Syst Evol Microbiol 53: 631–645. doi:10.1099/ijs.0.02361-0.

106. Marounek M, Bartos S (1987) Interactions between rumen amylolytic and lactate- utilizing bacteria in growth on starch. J Appl Bacteriol 63: 233–238.

107. Mackie RI, Gilchrist FMC, Heath S (1984) An in vivo study of ruminal micro- organisms influencing lactate turnover and its contribution to volatile fatty acid production. J Agric Sci 103: 37–51.

108. Robinson JA, Smolenski WJ, Greening RC, Ogilvie ML, Bell RL, et al. (1992) Prevention of acute acidosis and enhancement of feed intake in the bovine by Megasphaera elsdenii 407A. J Anim Sci 70: 310.

109. Kung L, Hession AO (1995) Preventing in vitro lactate accumulation in ruminal fermentations by inoculation with Megasphaera elsdenii. J Anim Sci 73: 250–256.

110

110. Aikman PC, Henning PH, Humphries DJ, Horn CH (2011) Rumen pH and fermentation characteristics in dairy cows supplemented with Megasphaera elsdenii NCIMB 41125 in early lactation. J Dairy Sci 94: 2840–2849. doi:10.3168/jds.2010-3783.

111. Willing B, VöRöS A, Roos S, Jones C, Jansson A, et al. (2009) Changes in faecal bacteria associated with concentrate and forage-only diets fed to horses in training. Equine Vet J 41: 908–914. doi:10.2746/042516409X447806.

112. Shepherd ML, Swecker WS, Jensen RV, Ponder MA (2012) Characterization of the fecal bacteria communities of forage-fed horses by pyrosequencing of 16S rRNA V4 gene amplicons. FEMS Microbiol Lett 326: 62–68. doi:10.1111/j.1574- 6968.2011.02434.x.

113. Stackebrandt E, Rainey FA (1997) Phylogenic relationships. In: Rood JI, McClane BA, Songer JG, Titball RW, editors. The Clostridia: Molecular Biology and Pathogenesis. New York, NY: Academic Press. p. 533.

114. Garrity GM, Bell JA, Lilburn T (2005) The Revised Road Map to the Manual. In: Brenner DJ, Krieg NR, Staley JT, Garrity GM, editors. Bergey’s Manual® of Systematic Bacteriology, The Proteobacteria, Part A, Introductory Essays. New York, NY: Springer, Vol. 2. pp. 159–220.

115. Bergey’s manual of systematic bacteriology: Volume Three: The Firmicutes (2009). 2nd ed. New York, NY: Springer.

116. DeLong EF (2009) The microbial ocean from genomes to biomes. Nature 459: 200–206. doi:10.1038/nature08059.

117. Barcenilla A, Pryde SE, Martin JC, Duncan SH, Stewart CS, et al. (2000) Phylogenetic Relationships of Butyrate-Producing Bacteria from the Human Gut. Appl Environ Microbiol 66: 1654–1661. doi:10.1128/AEM.66.4.1654-1661.2000.

118. Duncan SH, Barcenilla A, Stewart CS, Pryde SE, Flint HJ (2002) Acetate utilization and butyryl coenzyme A (CoA): acetate-CoA transferase in butyrate- producing bacteria from the human large intestine. Appl Environ Microbiol 68: 5186–5190.

119. Frank DN, St Amand AL, Feldman RA, Boedeker EC, Harpaz N, et al. (2007) Molecular-phylogenetic characterization of microbial community imbalances in human inflammatory bowel diseases. Proc Natl Acad Sci 104: 13780–13785.

120. Ding S-Y, Rincon MT, Lamed R, Martin JC, McCrae SI, et al. (2001) Cellulosomal Scaffoldin-Like Proteins fromRuminococcus flavefaciens. J Bacteriol 183: 1945–1953.

111

121. Fujimoto T, Imaeda H, Takahashi K, Kasumi E, Bamba S, et al. (2013) Decreased abundance of Faecalibacterium prausnitzii in the gut microbiota of Crohn’s disease. J Gastroenterol Hepatol 28: 613–619. doi:10.1111/jgh.12073.

122. Macfarlane S, Macfarlane GT (2006) Composition and Metabolic Activities of Bacterial Biofilms Colonizing Food Residues in the Human Gut. Appl Environ Microbiol 72: 6204–6211. doi:10.1128/AEM.00754-06.

123. Cameron EA, Maynard MA, Smith CJ, Smith TJ, Koropatkin NM, et al. (2012) Multidomain Carbohydrate-binding Proteins Involved in Bacteroides thetaiotaomicron Starch Metabolism. J Biol Chem 287: 34614–34625. doi:10.1074/jbc.M112.397380.

124. Proctor LM (2011) The Human Microbiome Project in 2011 and Beyond. Cell Host Microbe 10: 287–291. doi:10.1016/j.chom.2011.10.001.

125. Edgar RC (2004) MUSCLE: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Res 32: 1792–1797. doi:10.1093/nar/gkh340.

126. Tamura K, Nei M (1993) Estimation of the number of nucleotide substitutions in the control region of mitochondrial DNA in humans and chimpanzees. Mol Biol Evol 10: 512–526.

127. Rambaut A, Drummond A (n.d.) FigTree v1.3.1. http://tree.bio.ed.ac.uk/software/figtree/.

128. Markowitz VM, Chen I-MA, Palaniappan K, Chu K, Szeto E, et al. (2011) IMG: the integrated microbial genomes database and comparative analysis system. Nucleic Acids Res 40: D115–D122. doi:10.1093/nar/gkr1044.

129. Cantarel BL, Coutinho PM, Rancurel C, Bernard T, Lombard V, et al. (2009) The Carbohydrate-Active EnZymes database (CAZy): an expert resource for Glycogenomics. Nucleic Acids Res 37: D233–D238. doi:10.1093/nar/gkn663.

130. Venables WN, Ripley BD (2002) Modern Applied Statistics with S. New York, NY: Springer. Available: http://www.stats.ox.ac.uk/pub/MASS4.

131. Shapiro SS, Wilk MB (1965) An analysis of variance test for normality (complete samples). Biometrika 52: 591–611.

132. O’Hara RB, Kotze DJ (2010) Do not log-transform count data. Methods Ecol Evol 1: 118–122. doi:10.1111/j.2041-210X.2010.00021.x.

133. The UniProt Consortium (2011) Reorganizing the protein space at the Universal Protein Resource (UniProt). Nucleic Acids Res 40: D71–D75. doi:10.1093/nar/gkr981.

112

134. Kanehisa M, Goto S, Sato Y, Furumichi M, Tanabe M (2011) KEGG for integration and interpretation of large-scale molecular data sets. Nucleic Acids Res 40: D109–D114. doi:10.1093/nar/gkr988.

135. Keseler IM, Mackie A, Peralta-Gil M, Santos-Zavaleta A, Gama-Castro S, et al. (2012) EcoCyc: fusing model organism databases with systems biology. Nucleic Acids Res 41: D605–D612. doi:10.1093/nar/gks1027.

136. Karp PD (2005) Expansion of the BioCyc collection of pathway/genome databases to 160 genomes. Nucleic Acids Res 33: 6083–6089. doi:10.1093/nar/gki892.

137. Saier MH (2000) Families of transmembrane sugar transport proteins. Mol Microbiol 35: 699–710.

138. Brückner R, Titgemeyer F (2002) Carbon catabolite repression in bacteria: choice of the carbon source and autoregulatory limitation of sugar utilization. FEMS Microbiol Lett 209: 141–148.

139. Jojima T, Omumasaba CA, Inui M, Yukawa H (2009) Sugar transporters in efficient utilization of mixed sugar substrates: current knowledge and outlook. Appl Microbiol Biotechnol 85: 471–480. doi:10.1007/s00253-009-2292-1.

140. Stülke J, Hillen W (2000) Regulation of carbon catabolism in Bacillus species. Annu Rev Microbiol 54: 849–880.

141. Rees DC, Johnson E, Lewinson O (2009) ABC transporters: the power to change. Nat Rev Mol Cell Biol 10: 218–227. doi:10.1038/nrm2646.

142. Saitou N, Nei M (1987) The neighbor-joining method: A new method for reconstructing phylogenetic trees. Mol Biol Evol 4: 406–425.

143. Felsenstein J (1985) Confidence limits on phylogenies: An approach using the bootstrap. Evolution 39: 783–791.

144. Tamura K, Nei M, Kumar S (2004) Prospects for inferring very large phylogenies by using the neighbor-joining method. Proc Natl Acad Sci 101: 11030–11035.

145. McClung LS, McCoy E (1957) Genus II Clostridium Prazmovski 1880. Bergey’s Manual of Determinative Bacteriology. Baltimore: Williams and Wilkins. pp. 634– 693.

146. Ng H, Vaughn RH (1963) Clostridium rubrum sp. n. and other pectinolytic clostridia from soil. J Bacteriol 85: 1104–1113.

147. Drasar BS, Goddard P, HEATON S, PEACH S, WEST B (1976) Clostridia isolated from faeces. J Med Microbiol 9: 63–71.

113

148. Woo PCY (2005) Clostridium bacteraemia characterised by 16S ribosomal RNA gene sequencing. J Clin Pathol 58: 301–307. doi:10.1136/jcp.2004.022830.

149. Antranikian G, Friese C, Quentmeier A, Hippe H, Gottschalk G (1984) Distribution of the ability for citrate utilization amongst Clostridia. Arch Microbiol 138: 179–182.

150. Field D, Garrity G, Gray T, Morrison N, Selengut J, et al. (2008) The minimum information about a genome sequence (MIGS) specification. Nat Biotechnol 26: 541–547.

151. Woese CR, Kandler O, Wheelis ML (1990) Towards a natural system of organisms: proposal for the domains Archaea, Bacteria, and Eucarya. Proc Natl Acad Sci 87: 4576–4579.

152. Lawson PA, Llop-Perez P, Hutson RA, Hippe H, Collins MD (1993) Towards a phylogeny of the clostridia based on 16S rRNA sequences. FEMS Microbiol Lett 113: 87–92.

153. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, et al. (2000) Gene Ontology: tool for the unification of biology. Nat Genet 25: 25–29.

154. Bennett S (2004) Solexa, Inc. Pharmacogenomics 5: 433–438.

155. Mingkun L, Copeland A, Han J (2011) DUK. Walnut Creek, CA, USA: JGI.

156. Gnerre S, MacCallum I, Przybylski D, Ribeiro FJ, Burton JN, et al. (2010) High- quality draft assemblies of mammalian genomes from massively parallel sequence data. Proc Natl Acad Sci 108: 1513–1518. doi:10.1073/pnas.1017351108.

157. Hyatt D, Chen G-L, LoCascio P, Land M, Larimer F, et al. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.

158. Lowe TM, Eddy SR (1997) tRNAscan-SE: a program for improved detection of transfer RNA genes in genomic sequence. Nucleic Acids Res 25: 0955–0964.

159. Pruesse E, Quast C, Knittel K, Fuchs BM, Ludwig W, et al. (2007) SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res 35: 7188–7196. doi:10.1093/nar/gkm864.

160. Nawrocki EP, Kolbe DL, Eddy SR (2009) Infernal 1.0: inference of RNA alignments. Bioinformatics 25: 1335–1337.

161. Markowitz VM, Mavromatis K, Ivanova NN, Chen I-MA, Chu K, et al. (2009) IMG ER: a system for microbial genome annotation expert review and curation. Bioinformatics 25: 2271–2278. doi:10.1093/bioinformatics/btp393.

114

162. Forward JA, Behrendt MC, Wyborn NR, Cross R, Kelly DJ (1997) TRAP transporters: a new family of periplasmic solute transport systems encoded by the dctPQM genes of Rhodobacter capsulatus and by homologs in diverse gram- negative bacteria. J Bacteriol 179: 5482–5493.

163. Bott M (1997) Anaerobic citrate metabolism and its regulation in enterobacteria. Arch Microbiol 167: 78–88.

164. Walther R, Hippe H, Gottschalk G (1977) Citrate, a specific substrate for the isolation of Clostridium sphenoides. Appl Environ Microbiol 33: 955–962.

165. Schneider K, Dimroth P, Bott M (2000) Biosynthesis of the Prosthetic Group of Citrate Lyase †. Biochemistry (Mosc) 39: 9438–9450. doi:10.1021/bi000401r.

166. Brocker M, Schaffer S, Mack C, Bott M (2009) Citrate Utilization by Corynebacterium glutamicum Is Controlled by the CitAB Two-Component System through Positive Regulation of the Citrate Transport Genes citH and tctCBA. J Bacteriol 191: 3869–3880. doi:10.1128/JB.00113-09.

167. Leschine SB, Holwell K, Canale-Parola E (1988) Nitrogen fixation by anaerobic cellulolytic bacteria. Science 242: 1157–1159.

168. Chen JS, Toth J, Kasap M (2001) Nitrogen-fixation genes and nitrogenase activity in Clostridium acetobutylicum and Clostridium beijerinckii. J Ind Microbiol Biotechnol 27: 281–286.

169. Crawford RL, McCoy E, Harkin JM, Kirk TK, Obst JR (1973) Degradation of methoxylated benzoic acids by a Nocardia from a lignin-rich environment: significance to lignin degradation and effect of chloro substituents. Appl Microbiol 26: 176–184.

170. Larkin MA, Blackshields G, Brown NP, Chenna R, McGettigan PA, et al. (2007) Clustal W and Clustal X version 2.0. Bioinformatics 23: 2947–2948. doi:10.1093/bioinformatics/btm404.

171. Neron B, Menager H, Maufrais C, Joly N, Maupetit J, et al. (2009) Mobyle: a new full web bioinformatics framework. Bioinformatics 25: 3005–3011. doi:10.1093/bioinformatics/btp493.

115