An Integrated Investigation of the Microbial Communities Underpinning Biogas Production in Anaerobic Digestion Systems

DISSERTATION

Presented in Partial Fulfillment of the Requirements for the Degree Doctor of Philosophy in the Graduate School of The Ohio State University

By

Michael Christopher Nelson

Graduate Program in Environmental Science

The Ohio State University

2011

Dissertation Committee:

Dr. Mark Morrison, Advisor

Dr. Zhongtang Yu, Co-Advisor

Dr. Olli Tuovinen

Copyrighted by

Michael Christopher Nelson

2011

ABSTRACT

Anaerobic digestion (AD) has been used for decades as a waste processing technology; however, its potential as a renewable energy resource has spurred increased attention to how the microbial communities in AD systems carry out the digestion process and what factors influence their activity. Previous analyses of the microbial diversity in AD were generally based on community analysis techniques such as DGGE or small sequencing libraries. Additionally, conflicting results have been reported regarding the diversity and abundance of and in AD systems.

The overall objective of my research was to further describe the microbial consortia that participate in the AD process as well as to investigate the patterns of their diversity. In the first study, an initial baseline of the microbial diversity participating in the AD process was established using a meta-analysis approach. This removed the bias inherent in individual studies, allowing the global diversity of microbes in AD systems to be determined. The major bacterial groups were identified as the phyla , Proteobacteria, , and , while the largest archaeal groups were the Methanosaeta and an as yet uncultured clade known as WSA2/ArcI. The second study was focused on determining the effects of feedstock and biomass fraction on the microbial communities in a sequential digester operation. Using large 16S rRNA clone libraries, both the bacterial and archaeal populations were examined. The resulting communities were found to be very distinct, indicating that feedstock composition had a large effect on selecting a microbial community best suited to that particular feedstock. Differences were also observed between the granular and liquid biomass fractions, indicating potential differences in the metabolic role of these two microbial communities within a single AD system.

Using pyrosequencing, the third study was designed to examine changes in microbial community structure in relation to known reactor operational parameters in an AD system operated for a yearlong period. Analysis of six time points showed a highly variable microbial community, with large shifts in microbial abundance observed at the phylum level for the Bacteria. Within the Archaea, a significant shift from Methanosaeta to Methanosarcina was observed due to additions of acetate to the system, with an increase of unclassified due to an increase in temperature. Exploratory use of the statistical analysis method canonical correspondence analysis (CCA) failed to fully associate the patterns of bacterial diversity with reactor performance; however, the use of CCA itself was fundamentally justified, and future analyses employing the method will require a larger number of samples and more complete performance data for proper analysis. The fourth study was carried out in tandem with the third and sought to examine the shared microbial communities between AD systems with vastly different operational designs and parameters. The results indicated that the majority of OTUs were unique to the system from which they were recovered, with sequences corresponding to shared OTUs not being recovered in equal abundance for each sample. This result confirms that the microbial diversity of AD systems is highly variable. Finally, as no members of the class were recovered in the sequencing datasets, multiplex qPCR assays for five select genera of methanogens found to be abundant in AD systems were developed. The results of the qPCR assays showed that the hydrogenotrophic methanogen genus Methanobacterium was abundant in nearly all samples but was not recovered using the archaeal primers used for 16S sequencing. This suggests that 16S sequencing libraries can miss important groups of Bacteria or Archaea, hindering the ability to link the microbial diversity to operational performance.

The overall results of my research provide greater insight into the microbial communities that participate in the AD process than was previously documented. Incorporation of these results into the design and further optimization of AD systems will further improve the stability and performance of these systems in the future.

To my father and mother.

ACKNOWLEDGMENTS

I first would like to thank all of my mentors who have served on my dissertation committee, Drs. Mark Morrison, Zhongtang Yu, and Olli Tuovinen. You have all played a crucial role in my development as an independent researcher, and your insight and guidance have been invaluable. I wish to thank Drs. Morrison and Yu for graciously accepting me into their lab and for serving as my advisor and co-advisor, respectively, throughout my graduate studies. It is because of them that I have been able to participate in an area of research that I greatly enjoy and cannot imagine myself not being a part of. I doubt that I would have achieved everything that I have without their generous support. A special acknowledgement goes to Dr. Yu for his patience and understanding while supervising me on a day-to-day basis. To Dr. Tuovinen, I thank you for your support during my transition from to the environmental science graduate program and your enthusiastic support of both me and my research.

I wish to thank Dr. Floyd Schanbacher for his service on my candidacy committee, his generosity in providing many of the digester samples used in this study, and for sharing his knowledge of anaerobic digestion systems. Within the Morrison/Yu lab group I specifically thank Jill Stiverson for her technical support in training me when I first joined the lab and for her kindness and friendship during the 5 years that we have worked together. To the other former and present members of my lab, of whom there are too many to recount here, I thank you all for your integral role in my research but, more importantly, for your friendship.

To my fellow graduate students working in the Department of Sciences, I thank you all for your friendship and serving as a source of often needed entertainment and diversion from my studies. May you all succeed in your own research.

VITA

June 2001 ...... Sanford High School

2005 ...... B.A., Microbiology, Ohio Wesleyan University

2005 to August 2006 ...... Graduate Teaching Associate, Department of Microbiology, The Ohio State University

September 2006 to present ...... Graduate Research Associate, Environmental Science Graduate Program, The Ohio State University

Publications

Nelson, M.C., Morrison, M., Yu, Z., 2011. A meta-analysis of the microbial diversity observed in anaerobic digesters. Bioresource Technology, 102, 3730-3739.

Cressman, M.D., Yu, Z., Nelson, M.C., Moeller, S.J., Lilburn, M.S., Zerby, H.N., 2010. Interrelations between the microbiotas in the litter and in the intestines of commercial broiler chickens. Appl Environ Microbiol, 76, 6572-6582.

Fields of Study

Major Field: Environmental Science

Focus: Microbial Ecology

TABLE OF CONTENTS

Abstract ...... ii

Dedication...... v

Acknowledgments...... vi

Vita ...... viii

List Of Tables ...... xv

List Of Figures ...... xvi

Chapter 1: Introduction ...... 1

Chapter 2: Review Of Literature ...... 5

2.1 Anaerobic Digestion for Waste Treatment ...... 5

2.1.1 System Designs...... 6

2.1.2 The Anaerobic Digestion Process ...... 9

2.1.3 The Microbial Diversity of AD ...... 14

2.1.4 The Black Box ...... 16

2.2 Molecular Analysis Methods as Applied to AD Systems: ...... 16

2.2.1 Community Profiling Methods: ...... 18

2.2.2 Specific and Quantitative Methods: ...... 22

2.2.3 Sequencing of 16S rRNA Genes: ...... 25

2.3 Statistical Analysis Techniques and Methods: ...... 28

2.3.1 Alpha Diversity Statistics: ...... 29

2.3.2 Beta Diversity Statistics: ...... 33

2.3.3 Multivariate Analyses: ...... 36

2.4 Summary ...... 39

Chapter 3: A Meta-Analysis Of The Microbial Diversity Observed In Anaerobic Digesters ...... 40

3.1 Abstract ...... 40

3.2 Introduction ...... 41

3.3 Materials and Methods ...... 43

3.3.1 Sequence Data Collection ...... 43

3.3.2 Phylogenetic Analyses ...... 44

3.3.3 Nucleotide Accession Numbers...... 45

3.4 Results and Discussion ...... 45

3.4.1 Data summary ...... 46

3.4.2 Archaea ...... 47

3.4.3 Bacteria ...... 48

3.4.4 Diversity Estimates ...... 54

3.4.5 Analysis Considerations ...... 57

3.5 Conclusions ...... 58

Chapter 4: Microbial Diversity Changes In The Biomass Fractions Of An Anaerobic Digester As A Result Of Feedstock Changes ...... 68

4.1 Abstract: ...... 68

4.2 Introduction: ...... 69

4.3 Materials and Methods: ...... 70

4.3.1 Digester Descriptions: ...... 71

4.3.2 Sample Processing and DNA Extraction: ...... 71

4.3.3 PCR Amplification and Clone Library Construction: ...... 72

4.3.4 Sequence Analysis: ...... 73

4.3.5 Nucleotide Accession Numbers: ...... 73

4.4 Results: ...... 74

4.4.1 Sludge Sample Descriptions: ...... 74

4.4.2 Collective Diversity Summary: ...... 74

4.4.3 Whole Samples: ...... 75

4.4.4 Sample Comparisons ...... 77

4.5 Discussion: ...... 78

4.6 Conclusions: ...... 82

Chapter 5: Analysis Of The Shifts In Microbial Community Diversity Over Time In An Anaerobic Digester Treating Ethanol Distillery Wastes ...... 91

5.1 Abstract: ...... 91

5.2 Introduction: ...... 92

5.3 Materials and Methods: ...... 94

5.3.1 Reactor Operation: ...... 94

5.3.2 Microbial Community DNA Extraction: ...... 95

5.3.3 PCR-DGGE: ...... 95

5.3.4 Design of Template Specific Amplification Primers: ...... 96

5.3.5 454 Sequencing Primers: ...... 97

5.3.6 Generation of 454 Amplicons and Sequencing: ...... 97

5.3.7 Sequence Processing: ...... 98

5.3.8 Bioinformatic and Statistical Analysis: ...... 99

5.4 Results and Discussion: ...... 100

5.4.1 DGGE Analysis: ...... 100

5.4.2 Pyrosequencing Analysis:...... 101

5.4.3 Community Comparisons Among Samples: ...... 105

5.4.4 Statistical Analysis: ...... 109

5.5 Conclusions: ...... 110

Chapter 6: Examination And Comparison Of Microbial Community Structure In Four Anaerobic Digestion Sludge Samples Using Pyrosequencing ...... 121

6.1 Abstract: ...... 121

6.2 Introduction: ...... 122

6.3 Materials and Methods: ...... 124

6.3.1 Sample Acquisition: ...... 124

6.3.2 Microbial Community DNA Extraction: ...... 125

6.3.3 PCR-DGGE: ...... 125

6.3.4 Generation of 454 Amplicons and Sequencing: ...... 126

6.3.5 Sequence Processing: ...... 127

6.3.6 Bioinformatic and Statistical Analysis: ...... 127

6.4 Results and Discussion: ...... 128

6.4.1 DGGE Analysis: ...... 128

6.4.2 Pyrosequencing Analysis:...... 128

6.4.3 Individual Sample Composition: ...... 131

6.4.4 Sample Comparisons: ...... 135

6.5 Conclusions: ...... 137

Chapter 7: Quantification Of Methanogenic Archaea Using A Multiplex Quantitative Real Time PCR (qPCR) Assay ...... 146

7.1 Abstract: ...... 146

7.2 Introduction: ...... 147

7.3 Materials and Methods: ...... 149

7.3.1 Primer/Probe Set Design ...... 149

7.3.2 qPCR Analysis ...... 150

7.4 Results and Discussion: ...... 150

7.5 Conclusions: ...... 155

Chapter 8: General Discussion...... 160

Works Cited ...... 167

Appendix A: Additional Pyrosequencing Methods ...... 186

LIST OF TABLES

Table 3.1: Diversity statistics for Archaea, Bacteria, and 'Major' phylum groups ...... 60

Table 3.2: Estimates of current taxonomic coverage for Archaea and Bacteria ...... 61

Table 4.1: Classification table of sequences for all sample fractions

Table 4.2: Diversity statistics for all sample fractions ...... 86

Table 5.1: List of primers used in this study for both DGGE and pyrosequencing analysis of samples

Table 5.2: Sequence summary data and alpha diversity indices for the six sample days and the overall dataset

Table 6.1: List of primers used in this study for both DGGE and pyrosequencing analysis

Table 6.2: Individual and total sequence statistics and alpha diversity measures

Table 7.1: Primers and probes used in this study

Table 7.2: Comparison of the proportion of methanogens in qPCR and pyrosequencing data

Table A: List of all archaeal primers developed for pyrosequencing

Table A: List of all bacterial primers developed for pyrosequencing

LIST OF FIGURES

Figure 3.1 Phylogenetic tree of all sequences with phylum level branches grouped and labeled

Figure 3.2 Treemap of observed taxa shown in their hierarchical order

Figure 3.3 Rarefaction curves for the Archaea, Bacteria and 'major' phyla

Figure 4.1 Phylogenetic tree of archaeal OTU sequences with nearest neighbor isolates and representative

Figure 4.2 Phylogenetic tree of Chloroflexi

Figure 4.3 Principal coordinates analysis plot of sample fractions

Figure 4.4 Venn diagram of shared OTUs between samples for the granular (left) and liquid (right) biomass fractions

Figure 5.1 Simplified reactor performance details

Figure 5.2 Dendrograms showing sample similarity groupings derived from bacterial DGGE banding patterns and pyrosequencing OTU abundance data

Figure 5.3 Bacterial phylum level distribution of sequences

Figure 5.4 Distribution of sequences for the primary archaeal genera

Figure 5.5 Rarefaction curves

Figure 5.6 Canonical correspondence analysis triplot

Figure 6.1 Denaturing gradient gel electrophoresis (DGGE) banding profiles

Figure 6.2 Neural network map showing the distribution and abundance of for each sample and the entire dataset

Figure 6.2 alternate: Distribution of bacterial sequences

Figure 6.3 Distribution of archaeal sequences ...... 143

Figure 6.4 Rarefaction curves for each of the four samples

Figure 6.5 Procrustes analysis of the principal coordinates analysis (PCoA) ordination for both the DGGE and pyrosequencing sample Ochiai correlation coefficients

Figure 7.1 Proportional abundance of methanogens in the samples used for pyrosequencing

Figure 7.2 Proportional abundance of methanogens in AD samples recovered over time from the same AD systems used in Chapter

CHAPTER 1: INTRODUCTION

Worldwide energy consumption has been growing at an accelerating rate since the late 20th century. Projections for the year 2035 anticipate a doubling of the global energy demand from the year 1990 to 18.6 billion tonnes of oil equivalent (Gtoe) (Energy Information Administration, 2010). The majority of this growth is expected to come from countries that are not members of the Organisation for Economic Co-operation and Development (OECD), primarily the rapidly developing countries of China, India, and Brazil (Energy Information Administration, 2010). Increased energy consumption, however, has resulted in a concomitant increase in greenhouse gas (GHG) emissions, the global negative effects of which were recently summarized in the fourth assessment report of the Intergovernmental Panel on Climate Change (IPCC, 2007). In light of the combined challenges of meeting anticipated global energy demand and mitigating the effects of GHG emissions on global climate change, there has been an increased push to develop new energy sources that have a lower environmental impact than fossil fuel based energy.

A variety of different waste streams are produced as a result of human activity, each of which has a varying degree of negative environmental impact. Waste streams consisting of a high proportion of organic matter such as livestock manure, food processing wastes, or waste activated sludge (WAS) from waste water treatment plants (WWTPs) are especially detrimental when discharged into aquatic environments such as streams or lakes. This is due to nutrient overloading, which can lead to the eutrophication of waterways. Additionally, manures and waste water sludge can harbor potentially pathogenic microorganisms such as Escherichia coli or Cryptosporidium spp. that must be killed or removed prior to disposal (Ahring et al., 2002). Anaerobic digestion (AD) has been used as a waste treatment process for high strength organic wastes for several decades (Ahring, 2003; Lettinga, 1995). During the AD process, a complex consortium of microorganisms converts organic carbon into methane biogas, a potential energy resource. As such, AD has the dual environmental benefit of treating high strength organic waste streams while also producing a carbon neutral, renewable energy resource.

While anaerobic digestion of organic wastes has been used as a waste treatment method for decades, only in recent years has the AD process been viewed as a potential source of energy through the capture and use of the produced methane biogas. Unfortunately, AD systems for energy production have not yet proven to be commercially viable at large scale in the U.S. because potential operators are wary of the maintenance and operating costs of current AD technologies. A large portion of these costs is related to the need for constant monitoring of the stability of the AD process and to possible shutdowns in operation due to failures of the system. As AD is a microbially mediated process, the microbial communities have a large influence on the overall stability of the system and thus have a direct influence on the potential economic viability of AD systems. While numerous studies have examined the microbial communities in AD systems, the microbial diversity and activity in such systems are still regarded as a "black box" (Ahring, 2003; Rivière et al., 2009). Certain aspects of microbial diversity have been elucidated through various molecular analysis methods; however, these datasets were often small, comprising fewer than 500 sequences. Estimates of total species richness just for AD systems treating municipal WAS suggest that approximately 10,000-13,000 microbial species are involved in the AD of WAS and that at most 75% of them have been observed (Rivière et al., 2009). Estimates for other AD systems and/or feedstocks are less available or conclusive, and comparisons of the diversity between previous results are often contradictory.

The primary objective of my research is to further the examination of the microbial communities participating in the AD process using a variety of molecular and statistical analysis methods. In order to thoroughly examine the microbial diversity in AD systems, I first performed a meta-analysis of publicly available sequence datasets. The goal of this study was to establish a baseline for the global diversity of microorganisms found in AD systems, removing biases inherent in singular studies. Study 2 examined the development of the microbial community in a novel down flow sand bed filter reactor that has not yet been reported in the literature. The objective of this study was to examine the changes in microbial community composition and structure between the community present at the start of digester operation and the community that ultimately establishes itself when the feedstock is compositionally different from what the community was initially exposed to. Additionally, an investigation of the differences in the microbial communities that develop as granular and planktonic biomass was conducted.

The third and fourth studies were designed to take advantage of recent advances in DNA sequencing technologies to investigate the microbial diversity more thoroughly than previously achievable. For study 3, I analyzed the microbial communities and their shifts over time in a sand bed filter reactor. Statistical analyses were used to compare microbial diversity between samples, and through the exploratory use of canonical correspondence analysis the relationships between the observed microbial diversity and changes in operational parameters of the digester system were examined. The fourth study was an in-depth comparison of the microbial communities recovered from different AD systems to establish a common core of species involved in the AD process. Within this study, a deeper and more thorough investigation of the differences between granular and planktonic biomass communities was carried out to extend the results observed in study 2. Finally, in study 5 I developed a multiplex qPCR assay for select genera of methanogens frequently observed in AD systems. This assay allows for a more accurate representation of methanogen abundance than sequencing surveys and highlights the potential bias of relying on a single analysis method for diversity and population analysis.

The combined results of this research will provide better and more thorough insight into the microbial communities of AD systems and their relationship with operational parameters.

CHAPTER 2: REVIEW OF LITERATURE

2.1 Anaerobic Digestion for Waste Treatment

A variety of organic waste streams are created as a result of human activity. These include municipal sewage, food processing wastes, manures, and wastes resulting from bioenergy (e.g. ethanol, biodiesel) production. The disposal of these wastes is regulated by the U.S. EPA, which has established rules forbidding the direct release of these waste streams into public waterways. In order to satisfy the permitting conditions for discharge, waste treatment processes are necessary to decrease the amount of organic carbon and other nutrients such as nitrogen and phosphorus, reduce the counts of pathogenic microorganisms, and remove potentially toxic or harmful pollutants. To achieve this goal, biological waste treatment processes are generally used as they represent a cost effective strategy for the treatment of a wide variety of waste streams.

The anaerobic digestion (AD) process has proven to be particularly effective at the treatment of high strength organic wastes compared to aerobic waste treatment. This is due to a number of factors, including a significant reduction in the volume of sludge produced (~90% reduction), zero or reduced energy input for system operation, and a decreased physical space requirement (Lettinga, 1995). Additionally, during the AD process, organic waste is converted into methane biogas, which can be collected and used as a potential renewable energy resource (Ahring, 2003; Angelidaki et al., 2003). Depending on the waste feedstock being considered, the estimates of recoverable methane vary from 50-500 cubic meters per ton of feedstock (Angelidaki et al., 2003; Yu et al., 2010). A recent estimate of the potential recoverable methane from biowastes in Canada was over 10 million tonnes of oil equivalent (Mtoe) (Levin et al., 2007). This represents a significant potential energy resource and has renewed interest in the development and implementation of AD systems.
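To put the methane yield range above on the same footing as the Mtoe figure, the volumes can be converted to tonnes of oil equivalent. The following is a minimal sketch only: the heating value of methane (about 35.8 MJ per cubic meter) and the 41.868 GJ definition of one tonne of oil equivalent are assumptions brought in for illustration and are not taken from the cited studies, and the function names are hypothetical.

```python
# Assumed constants (not from the cited studies):
METHANE_LHV_MJ_PER_M3 = 35.8  # approximate lower heating value of CH4
MJ_PER_TOE = 41_868.0         # 1 tonne of oil equivalent = 41.868 GJ


def methane_energy_toe(volume_m3: float) -> float:
    """Convert a methane volume (cubic meters) to tonnes of oil equivalent."""
    if volume_m3 < 0:
        raise ValueError("methane volume must be non-negative")
    return volume_m3 * METHANE_LHV_MJ_PER_M3 / MJ_PER_TOE


def yield_range_toe(low_m3: float = 50.0, high_m3: float = 500.0) -> tuple[float, float]:
    """Energy per unit of feedstock for the 50-500 cubic meter yield range."""
    return methane_energy_toe(low_m3), methane_energy_toe(high_m3)
```

Under these assumed constants, the reported yield range corresponds to roughly 0.04 to 0.43 toe per ton of feedstock.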

2.1.1 System Designs

As AD became increasingly popular as a waste treatment process in various industries, a number of different AD systems have been designed and implemented. One of the earliest system designs was the continuously stirred tank reactor (CSTR). In a CSTR, the influent feedstock is brought into the reactor and mixed via mechanical or hydraulic means to homogenize and distribute the microbial biomass and influent waste feedstock throughout the reactor vessel. The biogas generated during the AD process rises to the top of the reactor where it is collected while treated effluent is drawn off the top. Depending on the exact system design being used, the entire waste effluent can then be released into the environment or downstream treatment system, or a small portion can be recirculated back into the reactor with the influent. In the latter design, the goal is to keep a highly enriched population of active microorganisms in the reactor vessel to aid in the conversion of the waste feedstock. This is necessary as the hydraulic retention time and the solids retention time of standard CSTR systems are coupled, meaning that the rate of biomass removal is linked directly to the removal of effluent. If a CSTR is operated at too short a hydraulic retention time, the majority of the active microbial population can be washed out of the reactor with the effluent. This can result in the breakdown of the system as biogas generation declines or stops, necessitating costly manual intervention to reestablish a properly functioning microbial community.
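The washout risk described above can be illustrated with a simple calculation. This is a sketch under stated assumptions rather than a design rule: it treats the CSTR as an ideal chemostat without recycle, in which biomass is lost at the dilution rate (the reciprocal of the hydraulic retention time) and washout occurs once that rate exceeds the maximum specific growth rate of the population. The function names and the growth rate used in the test values are hypothetical.

```python
def hydraulic_retention_time(volume_m3: float, flow_m3_per_day: float) -> float:
    """Hydraulic retention time in days: reactor volume divided by flow rate."""
    if volume_m3 <= 0 or flow_m3_per_day <= 0:
        raise ValueError("volume and flow must be positive")
    return volume_m3 / flow_m3_per_day


def washes_out(hrt_days: float, mu_max_per_day: float) -> bool:
    """Return True if an ideal, no-recycle CSTR would wash out its biomass.

    Assumes solids retention time equals hydraulic retention time, so the
    population is lost whenever the dilution rate (1 / HRT) exceeds its
    maximum specific growth rate (mu_max, an illustrative input).
    """
    if hrt_days <= 0:
        raise ValueError("HRT must be positive")
    dilution_rate = 1.0 / hrt_days
    return dilution_rate > mu_max_per_day
```

For example, with a hypothetical maximum growth rate of 0.3 per day, a 20 day retention time retains the population while a 2 day retention time does not.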

One of the most common applications of the CSTR design is in the treatment of waste activated sludge (WAS) resulting from aerobic waste water treatment. Due to the increasingly stringent rules covering the application of WAS to fields as a form of fertilizer, municipal waste water treatment plants (WWTPs) are increasingly relying on AD as a means to reduce the volume of WAS, the total nutrient and pathogen level of the treated sludge, and increasingly as a source of bioenergy to fuel plant operations (Iranpour et al., 1999). The typical hydraulic retention times of such systems are generally between 14 and 30 days, and systems can be operated in either batch mode or continuous mode (Alatriste-Mondragón et al., 2006).

While the CSTR was the original AD system implemented widely in industry, the upflow anaerobic sludge blanket (UASB) design, along with its derivatives, has become the most popular type of AD system currently implemented (Lettinga, 1995). This is because UASB type digesters can be operated with low hydraulic retention times (~4 h) while still maintaining operational stability and performance. In the UASB design, influent feedstock enters the bottom of the reactor vessel and flows upward to the top of the reactor, where it is removed. The upward hydraulic flow and low retention time cause the development of granular biomass, which is key in maintaining an active microbial community in the digester. The granular biomass settles downward, against the hydraulic flow, to form a sludge zone where the majority of active microbial metabolism occurs. While aspects of biomass granulation are still being investigated, it is generally accepted that filamentous microorganisms such as the methanogen Methanosaeta play a role in the physical binding and aggregation of microbes during granule formation. Due to the close proximity of microbes in anaerobic granules, the rate of carbon conversion into CH4 has been shown to be higher in granular biomass than in non-granular biomass due to increased transfer of metabolites between syntrophic organisms. While biomass granules have been investigated intensively due to their unique physicochemical properties and their necessary development for proper functioning of UASB systems, no studies have yet examined the microbial diversity of organisms which exist as planktonic cells in the liquid medium.

A derivative of the CSTR design is the sand bed filter (SBF) reactor, which is capable of decoupling the hydraulic and solids retention times. In this system, rather than having an upward hydraulic flow pattern as used in standard CSTR designs, influent feedstock enters near the top of the reactor vessel and treated effluent is removed at the bottom (Yu et al., 2010). To prevent biomass washout, a layer of sand at the bottom of the reactor serves as a biological filter to retain microbial biomass. This serves to decouple the hydraulic and biomass retention times as seen with UASB systems, allowing for a lower hydraulic retention time than CSTR systems. As the SBF system is still in commercial development, no research has been published concerning the operation of and microbial diversity within these systems and how they compare to other types of AD systems. As such, the SBF system offers a new avenue of research for the investigation of the microbial communities in AD systems.

2.1.2 The Anaerobic Digestion Process

At its simplest level, the anaerobic digestion (AD) process takes a waste organic feedstock and converts it into biogas, the principal components of which are carbon dioxide (CO2) and methane (CH4). In reality, however, AD is a complex process, combining a range of microbially mediated and thermo-chemical reactions into a single overall process. To allow for a better understanding of how AD occurs in waste management systems, the International Water Association in 2002 published a generalized model for the AD process called the Anaerobic Digestion Model 1 (ADM1) (IWA Task Group for Mathematical Modelling of Anaerobic Digestion Processes, 2002). The ADM1 model breaks the overall AD process down into four discrete phases: polymer hydrolysis (I), acidogenesis (II), acetogenesis (III), and methanogenesis (IV). Each of these phases represents one of the major microbial processes that occur during the AD of organic waste and is discussed further below.

2.1.2.1 Phase I - Polymer Hydrolysis

During the polymer hydrolysis phase, complex organic polymers such as cellulose, fats, and are degraded into monomeric or oligomeric components. This process occurs primarily through the activity of extracellular enzymes secreted by hydrolytic bacteria attached to a polymeric substrate (Song et al., 2005; Vavilin et al., 1996). A number of bacterial species involved in the polymer hydrolysis phase have been isolated from AD systems, including species belonging to the genera Acetivibrio, Bacteroides, Clostridium, and Coprothermobacter (Etchebehere et al., 1998; O'Sullivan et al., 2005; Yang et al., 1990). As the polymer hydrolysis phase occurs extracellularly, it does not directly contribute to the energy harvesting metabolism of the involved microbial community. Thus, the bacterial species active in the polymer hydrolysis phase are also active during the acidogenic phase in order to gain energy for cell growth and replication. The polymer hydrolysis phase of AD is generally considered to be the primary rate limiting step for the conversion of waste feedstocks such as manure that have a high concentration of recalcitrant compounds such as cellulose (Yadvika et al., 2004).

2.1.2.2 Phase II - Acidogenesis

During the acidogenic phase, the monomeric or oligomeric compounds produced during the hydrolysis phases are metabolized intercellularly into a variety of fermentation products, primarily short chain fatty acids (SCFAs) with minor amounts of lactate and alcohols (IWA Task Group for Mathematical Modelling of Anaerobic Digestion

Processes, 2002). The majority of organic carbon processed during the acidogenic phase,

~50%, is converted into acetate while approximately 20% is released as CO2 (Ahring,

2003). The remaining balance is converted to a mixture of non-acetate SCFAs, the principle of which are butyrate, propionate, and valerate. Both the acetate and CO2 produced during the acidogenesis phase can be directly used in methanogenesis, however carbon in the form of non-acetate SCFAs must be converted to one of these two forms in order to be transformed into methane. A wide variety of microbes have been identified as participating in the acidogenesis phase of AD. The phyla Actinomycetes, Bacteroidetes,

Chloroflexi, Firmicutes, and all contain species which are known to

10 participate in acidogenesis. In addition to the genera listed for the polymer hydrolysis phase, genera of bacteria which solely participate in the acidogenesis phase include those belonging to the family Anaerolineaceae in the phylum Chloroflexi, the genera

Bifidobacterium and Paludibacter in the phylum Bacteroidetes, and thermophilic bacteria in the phylum (Balk et al., 2002; Dong et al., 2000; Ueki et al., 2006;

Yamada et al., 2006).
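The carbon balance described above for the acidogenic phase can be illustrated with a short calculation. The following is a minimal sketch that uses only the approximate fractions quoted in the text (~50% to acetate, ~20% to CO2); the function name and the 100 g input are hypothetical and chosen purely for illustration.

```python
# Approximate carbon partitioning during acidogenesis, using the
# fractions quoted in the text (~50% acetate, ~20% CO2).
ACETATE_FRACTION = 0.50
CO2_FRACTION = 0.20


def partition_carbon(total_carbon_g):
    """Split a mass of organic carbon into acidogenesis product pools."""
    acetate = total_carbon_g * ACETATE_FRACTION
    co2 = total_carbon_g * CO2_FRACTION
    # Whatever remains is assigned to the non-acetate SCFA pool
    # (butyrate, propionate, valerate, and others).
    other_scfa = total_carbon_g - acetate - co2
    return {"acetate": acetate, "co2": co2, "other_scfa": other_scfa}


# Hypothetical input: 100 g of organic carbon entering acidogenesis.
pools = partition_carbon(100.0)
print(pools)
```

Under these assumed fractions, roughly 30% of the carbon ends up as non-acetate SCFAs, which is the pool that must pass through acetogenesis before it can contribute to methane.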

2.1.2.3 Phase III - Acetogenesis

While acetate is the most abundant SCFA produced during the acidogenesis phase, other SCFAs (e.g. propionate and butyrate) are also produced as mentioned above. While acetate can be directly used as a substrate in the methanogenesis pathway of certain methanogens, other SCFAs cannot be used directly for methanogenesis by any known methanogen. These non-acetate SCFAs thus need to be converted to either acetate or

CO2+H2 in order to maximize methane production. Additionally, if these non-acetate

SCFAs are allowed to build up in concentration, they can cause an overall decrease in the pH of the system, which can prevent the growth and/or metabolism of certain microbes, particularly the methanogens (Garcia et al., 2000). The concentration of propionate is especially important as concentrations as low as 30 mM are known to have an inhibitory effect on methanogenesis (Barredo and Evison, 1991). A group of bacteria known as syntrophic acetogens are responsible for the conversion of non-acetate SCFAs into acetate and CO2+H2. Genera such as Syntrophobacter and Smithella of the class Deltaproteobacteria are capable of converting propionate to acetate (Boone and Bryant,

1980; de Bok et al., 2001) while members of the genus Syntrophomonas of the phylum

Firmicutes are capable of converting butyrate and longer chain SCFAs (Sousa et al.,

2007; Zhang et al., 2004). The syntrophic label for these bacteria comes from their requirement for a partner group of hydrogen utilizing organisms in order to preserve a favorable thermodynamic gradient for the conversion of SCFAs to acetate. In AD, this partner is often a hydrogen utilizing methanogen; however, reports have indicated that sulfate reducing bacteria are also capable of supporting syntrophic acetogenesis under conditions of high sulfate concentration (Ariesyady et al., 2007).

2.1.2.4 Phase IV - Methanogenesis

The terminal phase of the AD process is methanogenesis. During this phase of the

process, acetate, formate, and/or CO2+H2 are converted to methane (CH4).

Methanogenesis is exclusively carried out by a specialized group of microorganisms, the archaeal methanogens. The methanogens are conceptually divided into three groups according to phylogenetic and phenotypic similarity (Anderson et al., 2009; Bapteste et al., 2005). The first group, the Class I methanogens, includes the orders Methanobacteriales, Methanococcales, and Methanopyrales, while the second and third groups, the Class II and Class III methanogens, include only the orders Methanomicrobiales and Methanosarcinales, respectively (Anderson et al., 2009). This distinction is important as the Class I and II methanogens are both regarded as being hydrogenotrophic, in that they use only formate and/or CO2+H2 as their substrates for methanogenesis. The Class III methanogens, on the other hand, are characterized as being acetoclastic or acetotrophic, being able to use acetate and, depending on the particular genus, methanol, methylamines or other one carbon compounds (Anderson et al., 2009). During the AD process, Class I

methanogens play a crucial role in the consumption of free hydrogen in the reactor. This is critical as the syntrophic conversion of SCFAs such as propionate to acetate is thermodynamically unfavorable under all but low hydrogen partial pressures (Schink,

1997). Studies examining the distribution of syntrophic acetogens have found that they are often located in close physical proximity to hydrogenotrophic methanogens and engage in interspecies hydrogen transfer (Ariesyady et al., 2007). Commonly recovered hydrogenotrophic methanogens include the genera Methanobacterium, , and (Hori et al., 2006; Leclerc et al., 2004).
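The dependence of syntrophic propionate oxidation on hydrogen partial pressure can be made concrete with a short free energy calculation. The sketch below assumes the commonly written stoichiometry propionate + 3 H2O -> acetate + HCO3- + H+ + 3 H2 and a textbook standard free energy change of approximately +76 kJ/mol at pH 7; this value, the solute concentrations, and the temperature are assumptions chosen for illustration and are not taken from the studies cited above.

```python
import math

R_KJ = 8.314e-3          # gas constant, kJ mol^-1 K^-1
DELTA_G0_PRIME = 76.1    # kJ/mol at pH 7 (assumed textbook value)


def delta_g_propionate_oxidation(p_h2_atm, propionate_m=1e-3,
                                 acetate_m=1e-3, bicarbonate_m=0.05,
                                 temp_k=310.15):
    """Approximate free energy change (kJ/mol) of propionate oxidation.

    Applies dG = dG0' + RT ln Q with the proton activity fixed at pH 7
    (already absorbed into dG0'). Using dG0' away from 25 C is itself
    an approximation. All default concentrations are hypothetical.
    """
    # Three H2 are released per propionate, hence the cubic dependence.
    q = (acetate_m * bicarbonate_m * p_h2_atm ** 3) / propionate_m
    return DELTA_G0_PRIME + R_KJ * temp_k * math.log(q)


for p_h2 in (1.0, 1e-3, 1e-5):
    dg = delta_g_propionate_oxidation(p_h2)
    print(f"pH2 = {p_h2:g} atm -> dG = {dg:.1f} kJ/mol")
```

Because the hydrogen partial pressure enters the reaction quotient to the third power, each tenfold drop in hydrogen lowers the free energy change substantially. Under these assumptions the reaction is endergonic at 1 atm of hydrogen and becomes exergonic only at very low partial pressures, which is the role played by the hydrogenotrophic partner.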

While hydrogenotrophic methanogens are important in the maintenance of low hydrogen partial pressures in the reactor vessel, it is estimated that the majority of the carbon converted to methane is from acetate (Mackie and Bryant, 1981). The Class III acetoclastic (acetotrophic) methanogens utilize primarily acetate for the production of

CH4. The two primary families of Class III methanogens are the Methanosarcinaceae and Methanosaetaceae. While members of the Methanosarcinaceae are able to use other methylated compounds beyond acetate, all known species of the Methanosaetaceae use acetate as their sole substrate for methanogenesis and thus energy recovery. Most likely due to their sole reliance on a single substrate for energy, Methanosaeta spp. have a very low minimum acetate concentration (Ks= ~5-70 µM) at which they are able to grow and

actively convert acetate to CH4 (Jetten et al., 1992). Methanosarcina spp., on the other hand, have a much higher acetate threshold (Ks= ~1 mM), but also have a higher growth rate compared to Methanosaeta. Thus, studies examining the population dynamics of acetoclastic methanogens have found that members of the Methanosaetaceae

predominate under conditions of low acetate concentration in the reactor medium (< 1 mM) while members of the Methanosarcinaceae predominate under conditions in which the acetate concentration is higher (Hori et al., 2006; Sekiguchi et al., 1999; Westermann et al., 1989).
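The competition between the two acetoclastic groups can be illustrated with a simple Monod growth model, in which specific growth rate is mu = mu_max * S / (Ks + S). The sketch below is only illustrative: the Ks values are picked from within the ranges quoted above, the maximum growth rates are hypothetical placeholders (chosen so that Methanosarcina grows faster at saturation, as the text states), and treating Ks as a Monod half-saturation constant is itself an assumption.

```python
def monod_growth_rate(acetate_mm, mu_max_per_day, ks_mm):
    """Specific growth rate (per day) from the Monod equation."""
    return mu_max_per_day * acetate_mm / (ks_mm + acetate_mm)


# Hypothetical kinetic parameters (acetate in mM, mu_max in 1/day).
METHANOSAETA = {"mu_max_per_day": 0.1, "ks_mm": 0.05}   # low Ks, slow growth
METHANOSARCINA = {"mu_max_per_day": 0.3, "ks_mm": 1.0}  # high Ks, fast growth


def faster_grower(acetate_mm):
    """Return the name of the group with the higher Monod growth rate."""
    saeta = monod_growth_rate(acetate_mm, **METHANOSAETA)
    sarcina = monod_growth_rate(acetate_mm, **METHANOSARCINA)
    return "Methanosaeta" if saeta > sarcina else "Methanosarcina"


for acetate in (0.05, 0.1, 1.0, 5.0):
    print(f"{acetate} mM acetate -> {faster_grower(acetate)}")
```

With these assumed parameters the high-affinity, slow-growing population outcompetes the other at low acetate concentrations, and the ranking reverses as acetate accumulates, mirroring the population shifts reported in the cited studies.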

2.1.3 The Microbial Diversity of AD

While the AD process is commonly divided into four distinct phases, in reality all four phases proceed concurrently, carried out by an array of interdependent Bacteria and Archaea. As already mentioned, the conversion of propionate to acetate is thermodynamically unfavorable except under conditions where the hydrogen partial pressure in the reactor is kept to a minimum. This results in cooperation between syntrophic propionate oxidizers such as the genus Smithella and hydrogenotrophic methanogens such as Methanospirillum (de Bok et al., 2001). Numerous studies have investigated the effects of operational parameters on the microbial diversity in AD systems. The organic loading rate, which indicates the amount of organic carbon being added to the system and is usually linked with the hydraulic retention time, is known to have a large effect on microbial diversity. Studies have shown that if the organic loading rate is too high, the growth and metabolism of fermentative bacteria can exceed the growth and metabolism of the syntrophic bacteria and methanogens, leading to a buildup of SCFAs and acidification of the system (Ahring et al., 1995; Nielsen et al.,

2007). Other studies have examined the effects of sulfur and nitrogen compounds on the microbial diversity of an AD system, with occasionally differing conclusions. AD systems receiving waste feedstocks with high sulfate loads, for example, have been found

to have either a decrease or no change in methane production (Pender et al., 2004;

Petersen and Ahring, 1992). The addition of transition metals (e.g. Ni, Co, Cu) has been found to increase CH4 production as a result of supplying necessary metal cofactors for methanogens and other species (Fermoso et al., 2010; Speece et al., 2006).

A special case of how system operation can affect the microbial diversity in AD systems is the temperature at which the AD process is carried out. While the majority of

AD systems in the US and globally are operated at mesophilic temperatures (~20-40°C), the use of psychrophilic (<15°C) and thermophilic (>50°C) digestion systems has increased over the past several years. The primary reason for the operation of an AD system at psychrophilic or thermophilic temperatures varies depending on the exact circumstances involved. A common justification for psychrophilic AD is that it negates the need to heat the influent feedstock to a temperature which is higher than the temperature at which the feedstock was discharged from its source (O'Reilly et al., 2010).

The microbial communities in psychrophilic AD systems are similar in composition to those in mesophilic AD systems, with fermentative members of the Bacteroidetes and syntrophic members of the Proteobacteria being predominant (O'Reilly et al., 2010). Unlike psychrophilic AD, thermophilic AD requires an energy input to heat the influent feedstock to thermophilic temperatures. However, thermophilic

AD has been proposed as offering a greater degree of pathogen reduction, decreased retention time, and higher rate of biogas production compared to mesophilic AD (Ahring,

2003; Iranpour et al., 2002). Studies of thermophilic AD have shown that the bacterial communities in such systems are dominated by thermophilic genera such as


Thermoanaerobacter, , and members of the phylum Thermotogae (Balk et al., 2002; Hernon et al., 2006; Yamada et al., 2006). One of the negatives of thermophilic AD systems is a higher likelihood of system failure due to an increase in

SCFAs, especially propionate, which causes a severe decrease in biogas production

(Speece et al., 2006). To overcome this, research has investigated splitting the acidogenesis phase of AD from the methanogenic phase into two reactor vessels, with the former operating at thermophilic temperatures and the latter operating at mesophilic temperatures (Lv et al., 2010). The use of such temperature-phased anaerobic digestion (TPAD) systems, however, is still in its infancy and has yet to be demonstrated as commercially viable in large scale operation.

2.1.4 The Black Box

Due to the importance of the microbial consortia in the AD process, the diversity and distribution of these organisms have been extensively studied. However, a recurring theme has been to ultimately refer to these systems as a "black box" (e.g. Ahring, 2003; Rivière et al., 2009). The implication of this phrase is that while numerous studies have examined the microbial diversity in AD systems, more remains unknown about the microbial communities within these systems than is currently known. A wide variety of methods have been developed and are available for the investigation of the microbial diversity in AD systems, and their use and application will be discussed next.

2.2 Molecular Analysis Methods as Applied to AD Systems:

Traditional examinations of the microbial ecology in the environment were based

on cultivation based methods that attempted to isolate and identify species in pure culture. This has led to a number of species being isolated from AD systems, including fermentative and syntrophic members of the Bacteria and methanogenic Archaea (e.g.

Liu et al., 1999; Yamada et al., 2006; Zellner et al., 1998; Zhang et al., 2004).

Cultivation based methodology, however, is limited by the finding that a large percentage of microbial species present in the environment are impossible to culture using current methods. Estimates of the uncultured microbial diversity range from 70-99% of actual microbial species, highlighting the large number of uncharacterized species, some of which are expected to participate in AD (Hugenholtz et al., 1998; Ward et al., 1990). As methods of molecular analysis became more refined and less expensive during the latter decades of the 20th century, culture independent analysis of the microbial communities in environmental samples became the preferred methodology for the investigation of AD systems. These methods are generally based on analysis of certain well characterized genes, the most common of which are the ribosomal RNA (rRNA) genes.

The rRNA genes are useful in the investigation of microbial diversity as they are present in all known Bacteria and Archaea and represent stable, evolutionarily conserved

DNA fragments. The prokaryotic ribosome is composed of two subunits, the 50S or large subunit and the 30S or small subunit. The 50S subunit has two RNA components, the 5S and 23S rRNAs, while the 30S subunit has only one RNA component, the 16S rRNA. While studies investigating microbial diversity have used all three rRNAs, the

16S rRNA gene (rrs) has become the standard target for molecular analysis. The 16S rRNA gene is preferred because its length (~1600 nucleotides) provides a large degree of

phylogenetic information while being short enough to be adequately sequenced using standard sequencing technologies. Further, a traditional reason for the use of the 16S rRNA gene goes back to the foundational work by Carl Woese, who used 16S rRNA gene sequences to determine that the Archaea were in fact a separate branch of life from the Eubacteria (Woese and Fox, 1977). This led to the later recommendation that DNA homology be used as the basis for bacterial classification over traditional measures such as profiles of metabolic activity or cell envelope structure.

As methods for the extraction and analysis of DNA from mixed microbial communities have improved, a variety of analysis methods have been developed and applied to the study of anaerobic digestion systems. These methods include single strand conformation polymorphism (SSCP), denaturing gradient gel electrophoresis (DGGE), restriction fragment length polymorphism (RFLP), fluorescent in situ hybridization

(FISH), quantitative real-time PCR (qPCR), and DNA sequencing methods. Each of these analysis methods has a set of known advantages and disadvantages for community analysis and will be discussed.

2.2.1 Community Profiling Methods:

Two commonly used methods for investigating the microbial diversity in anaerobic digester systems, SSCP and DGGE, rely on the principle of separating 16S rRNA gene fragments in an acrylamide gel so that fragments with different nucleotide sequence composition will form distinct bands when visualized. As denoted by its name, SSCP utilizes single stranded fragments of the 16S gene which assume a unique structural

conformation depending on the nucleotide sequence. The sequences can be visualized as unique bands in an acrylamide gel due to differences in their migratory ability caused by differences in secondary structure conformation. As applied to anaerobic digestion systems, SSCP was used by Leclerc et al. (2004) to examine archaeal diversity in 44 different AD systems. Among the unique "species" bands they found, the two most prominent were identified as Methanosaeta concilii and a Methanobacterium belonging to the vadin cluster first described by Godon et al. (1997).

While SSCP has been recently used by other groups to examine the microbial diversity in

AD systems (e.g. Lefebvre et al., 2007; Zhao et al., 2008), its use has so far been limited as more advanced methods of analysis have been developed. This is due to complications in the ability to accurately determine the taxonomic classification of a specific band after separation. Band classification using SSCP is typically based on comparison to profiles of known microbial communities (Delbès et al., 2000; Leclerc et al., 2004), and thus classification is highly dependent on the accuracy and reliability of this comparison.

In contrast to SSCP, DGGE uses double stranded 16S rRNA gene fragments amplified from one of the hyper-variable regions. When run through an acrylamide gel containing a gradient of the denaturing agents urea and formamide, the DNA fragments partially denature as the hydrogen bonding between nucleotides of the two strands is disrupted. This denaturation causes the DNA fragment to exceed the pore size of the gel, preventing further migration and forming a band that can be visualized in a similar manner as used in SSCP. Unlike in SSCP, however, DGGE bands can be extracted from

the gel and sequenced if desired. This allows direct identification of a band without relying on a proxy community. Due to its low cost and ease of use, numerous studies have examined the microbial diversity in AD systems using DGGE. Primarily these studies have examined the effects of various operating conditions, feedstocks, and system designs on microbial community development. For example, Akarsubasi et al. (2005) saw changes in the abundance of different Archaea in response to changes in feedstock composition.

A third community profiling method, RFLP, discriminates 16S rRNA sequences by digesting PCR amplified 16S fragments with a set of restriction enzymes to produce a set of small, varied DNA fragments which can be visualized on an agarose gel. Different patterns of banding for different sequences arise from changes in the number and location of restriction sites across the 16S gene and are used to determine if a given sequence is unique compared to another. For large scale, rapid community profiling, a variant of

RFLP known as terminal RFLP (T-RFLP) that analyzes only one or two fragments of the

16S gene, termed the terminal fragment, has been developed. In this method, near full length 16S rRNA gene sequences are first PCR amplified from a mixed community DNA sample using either one, or two, primers that are labeled with a fluorescent marker. The mixed PCR product is then digested using one or more restriction enzymes and the length and quantity of the fluorogenically labeled fragments are determined by capillary gel electrophoresis. The resulting profiles of fragment lengths can then be compared between samples to determine relative abundance and diversity within and between samples, or to a database of fragments generated from known species to tentatively

assign a fragment in an unknown sample to a known species (Abdo et al., 2006).

McKeown et al. (2009) used T-RFLP in complement with small clone libraries to examine changes in the microbial diversity over a 3.5 year period of an AD system operating at psychrophilic (4-15°C) temperatures. The T-RFLP profiles for each sample, however, were not directly compared to the clone library sequences and thus the degree to which the two methods produce similar results is not known.
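The principle behind T-RFLP can be illustrated with an in silico digestion. The sketch below locates the first restriction site from the labeled 5' end of each amplicon and reports the resulting terminal fragment length. It assumes a single 5' label and the enzyme MspI (recognition site C^CGG); the sequences and taxon names are hypothetical toy examples far shorter than a real 16S amplicon.

```python
from collections import defaultdict


def terminal_fragment_length(sequence, site="CCGG", cut_offset=1):
    """Length of the 5' terminal fragment after digestion.

    `site` is the recognition sequence and `cut_offset` the number of
    bases of the site retained on the 5' fragment (1 for MspI, C^CGG).
    An amplicon without the site is returned uncut at full length.
    """
    position = sequence.upper().find(site)
    if position == -1:
        return len(sequence)
    return position + cut_offset


# Hypothetical toy amplicons, labeled at the 5' end.
amplicons = {
    "taxon_a": "ATGCATGCCGGTTAACC",
    "taxon_b": "ATGCATGACCGGTTAAC",
    "taxon_c": "TTTTTTTCCGGAAAA",
    "taxon_d": "ATATATATAT",
}

lengths = {name: terminal_fragment_length(seq)
           for name, seq in amplicons.items()}

# Group taxa by fragment length to mimic peaks in a T-RFLP profile.
peaks = defaultdict(list)
for name, length in lengths.items():
    peaks[length].append(name)

for length in sorted(peaks):
    print(length, sorted(peaks[length]))
```

In this toy example two unrelated sequences yield a terminal fragment of the same length, so they would appear as a single peak. This is the same ambiguity that limits the assignment of T-RFLP peaks to individual species in real profiles.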

While all three of the discussed community profiling methods have been successfully applied in the analysis of the microbial diversity in anaerobic digestion systems, each method has specific drawbacks that must be considered. For SSCP and T-RFLP, a primary limitation of the method is the inability to accurately assign a band or fragment peak of interest to a known species. This is because the band or fragment itself cannot be directly sequenced and instead must be compared to known bands for SSCP or fragments for T-RFLP. With T-RFLP, it is often the case that multiple species can be assigned to a single fragment as the comparison is based solely on the presence and location of a restriction enzyme site in a given sequence. Additionally, false or "pseudo" peaks can interfere with the proper interpretation of a T-RFLP profile (Egert and

Friedrich, 2003). For DGGE, a major limitation is the reproducibility of banding patterns due to the complexities of casting the denaturing gradient gel, and the formation of heteroduplex sequences which form a single band instead of two (Gafan and Spratt,

2005). A further criticism of DGGE is that it is not sensitive enough to detect or represent changes in rare species which may have a large influence on microbial community function and dynamics (Muyzer et al., 1993). A known issue affecting all

three profiling methods is the possibility for PCR induced errors or bias. Multiple studies have shown that PCR amplification can result in biased amplicon pools that do not accurately reflect the true composition of the microbial community (Reysenbach et al., 1992; Suzuki and Giovannoni, 1996). Despite these known limitations, however, community profiling methods provide a useful means of screening a large number of samples in a cost efficient manner.

2.2.2 Species Specific and Quantitative Methods:

While community profiling methods have proven useful in determining broad changes in microbial diversity, they are limited in their ability to both specifically identify and quantify a desired species target. Both FISH probing and qPCR analysis allow for the identification and quantification of only certain desired targets, however the underlying principles they are based on and different methods used make them suited to different tasks.

FISH utilizes fluorogenic nucleotide probes that hybridize to pre-defined genomic sequence targets in intact microbial cells sampled from an environment. The fluorogenic probe is then visualized using fluorescent microscopy, which allows for visual location of cells containing the hybridized probe. These cells can then be enumerated either visually or with software designed for that purpose. FISH probes have been designed targeting the 16S rRNA of a number of different taxa and by using probes labeled with different fluorescent molecules it is possible to semi-quantitatively compare the abundance of the targets to each other or to the total number of cells in the sample. One of the useful

applications of FISH to AD samples has been in determining the physical structure of the microbial biomass that forms in AD systems. Angenent et al. (2004) used FISH probes directed towards members of the methanogens in order to determine that the genus

Methanosaeta was involved in anaerobic granule formation while Yamada et al. (2005) used the technique to examine the physical distribution of Chloroflexi species in granules.

A modification of the standard FISH method, microautoradiography FISH (MAR-FISH), combines the principle of FISH probing with microautoradiography to allow for the codetermination of both cell identity and metabolic function (Lee et al., 1999). This technique has been successfully applied by Ariesyady and colleagues (Ariesyady et al.,

2007a; 2007b) in the study of syntrophic bacteria that participate in the conversion of non-acetic acid SCFAs to acetate, providing further insight into the metabolism and physical distribution of syntrophic bacteria in AD systems.

qPCR takes advantage of recent advancements in PCR technology to monitor the production of DNA during the PCR cycle in real time. The two primary methods for qPCR use either SYBR green, a non-specific fluorogenic molecule that binds to double stranded DNA, or a dual-labeled TaqMan® probe that binds to a specific DNA target and has a fluorescent marker at the 5' end and a quencher molecule that absorbs the fluorescence at the 3' end. In SYBR green based assays, the amount of DNA product produced after each amplification cycle is quantified by the relative fluorescent intensity of each sample. For TaqMan, the dual-labeled probe is expected to bind to target DNA along with the standard PCR amplification primers. During the primer extension phase, the probe is degraded into individual nucleotides due to the 5'-3' exonuclease activity of Taq polymerase (Holland et al., 1991). With the fluorescent marker no longer quenched, the fluorescence can be detected. For both types of assays, a value called the critical or cycle threshold (Ct) is determined as the cycle at which the fluorescence of a sample rises above background. By comparing the Ct values of samples with unknown amounts of initial target DNA to those of standards with known starting quantities of template DNA, it is possible to accurately quantify the abundance of a particular gene sequence in a mixed community DNA sample.
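The quantification step described above is usually carried out with a standard curve, in which Ct is regressed against the log10 of the known starting quantity and the fitted line is then inverted for unknown samples. The following is a minimal sketch of that calculation using an ordinary least squares fit; the standard quantities and Ct values are hypothetical numbers invented for illustration, and the function names are not taken from any qPCR software package.

```python
import math


def fit_standard_curve(quantities, ct_values):
    """Least squares fit of Ct = slope * log10(quantity) + intercept."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ct_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ct_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def quantify(ct, slope, intercept):
    """Estimate starting copies of an unknown from its Ct value."""
    return 10 ** ((ct - intercept) / slope)


def amplification_efficiency(slope):
    """Per-cycle efficiency implied by the slope (1.0 means doubling)."""
    return 10 ** (-1.0 / slope) - 1.0


# Hypothetical ten-fold dilution series of a plasmid standard.
standard_copies = [1e3, 1e4, 1e5, 1e6, 1e7]
standard_ct = [30.0, 26.7, 23.4, 20.1, 16.8]

slope, intercept = fit_standard_curve(standard_copies, standard_ct)
unknown_copies = quantify(23.4, slope, intercept)
efficiency = amplification_efficiency(slope)

print(f"slope = {slope:.2f}, intercept = {intercept:.2f}")
print(f"estimated copies at Ct 23.4: {unknown_copies:.3g}")
print(f"amplification efficiency: {efficiency:.2f}")
```

A slope near -3.3 corresponds to an amplification efficiency close to 100%, which is one common check on the quality of a standard curve before unknown samples are quantified.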

Both SYBR green and TaqMan based qPCR have been applied extensively to the analysis of methanogens in AD systems. Yu et al. (2005) designed and tested a set of

TaqMan qPCR primers and probes directed to the 4 primary orders of methanogens witnessed in AD systems. These were then used to monitor the dynamics of acetoclastic methanogens in model AD systems with varying acetate concentrations (Yu et al., 2006).

Franke-Whittle et al. (2009) designed a refined set of primers for use in quantifying the genera Methanoculleus, Methanosarcina, and Methanothermobacter using SYBR green based assays. Aside from targeting 16S rRNA, qPCR has also been applied to functional genes as an alternative method for enumerating methanogens. Steinberg et al. (2009) developed a set of TaqMan probes targeted to the mcrA gene, which codes for the methyl-CoM reductase alpha subunit, and successfully applied them to the examination of methanogen diversity in several AD samples and methanogenic sediments.

While both FISH and qPCR are able to identify, locate, and enumerate specific microbial taxa or functional genes, they both require prior knowledge of the potential diversity present in the sample in order to design the necessary primers or probes.

Further, the development of primers and probes that specifically target only the taxon of interest is often difficult due to varying degrees of sequence similarity between related taxa. Thus, while these two methods are valuable for the analysis of known species found in complex microbial communities such as those found in AD systems, they are less appropriate for the discovery or analysis of unknown species.

2.2.3 Sequencing of 16S rRNA Genes:

While community profiling methods allow for a determination of broad changes in microbial diversity between samples, they are often unable to uncover the diversity of species which represent a rare minority of the community (<1%) (Muyzer et al., 1993).

To overcome this, direct interrogation of microbial communities through sequencing of the 16S rRNA gene is often the preferred method for determining microbial diversity in environmental samples. Two contrasting strategies are currently used for the investigation of microbial diversity based on 16S rRNA gene sequencing.

The most common strategy currently used is the amplification of nearly full length 16S rRNA gene fragments from mixed community DNA and the cloning of these fragments in plasmid vectors in E. coli. The cloned 16S sequences are then sequenced using the

Sanger di-deoxy chain termination method, which is able to produce high quality sequence reads up to 800 nt in length. The second strategy relies on the amplification and sequencing of millions of amplicons using newly developed massively parallel sequencing systems such as pyrosequencing as developed by 454 Life Sciences

(Margulies et al., 2005).


The generation and sequencing of 16S rRNA clone sequences has been used to examine the microbial diversity in a number of different AD systems. Roest et al. (2005) used small 16S rRNA clone libraries (<100 sequences) to examine the microbial diversity in a UASB digester operating on wastewater from a Kraft paper mill; however, the number of sequences analyzed represents only a small fraction of the expected microbial diversity in AD systems. The largest clone library study was conducted by Rivière et al.

(2009), examining nearly 10,000 sequences generated from 7 AD systems at wastewater treatment plants throughout the world. Despite the large size of the sequence dataset, they found only 6 operational taxonomic units (OTUs) to be common between the samples, highlighting the large degree of variation in the microbial diversity among AD systems.

Other studies have used primers specific to a particular phylum to more closely examine the diversity of groups such as the Chloroflexi or (Chouari et al., 2005;

Yamada et al., 2005).

While 16S rRNA clone libraries have provided a notable degree of insight into the diversity of microorganisms that participate in AD, the typical number of clone sequences generated and analyzed is often too small to adequately describe the entire community in a given AD system. Massively parallel sequencing of 16S rRNA amplicons using "next generation" sequencing technologies such as pyrosequencing offers the ability to achieve unprecedented levels of sequence coverage compared to the traditional cloning and sequencing strategy. This strategy, first applied to the study of hydrothermal vent communities (Sogin et al., 2006), has proven to be successful at recovering rare species that are often missed by clone libraries. Due to the enormous scale of the sequencing operation, multiple samples can be investigated in multiplex using a tag or barcoded sequencing approach (Huber et al., 2007;

Parameswaran et al., 2007). To date, only a few investigations of AD systems using pyrosequencing have been reported. One of the first reports was a metagenomic analysis of a single AD sample, the results of which were split across three papers (Krause et al.,

2008; Kröber et al., 2009; Schlüter et al., 2008). The study by Kröber et al. integrated

16S clone libraries as a comparison of the two techniques; however, the number of cloned sequences analyzed (109 total sequences for Bacteria and Archaea) is subject to the same limitations as those listed above for small clone libraries, namely an inability to adequately recover minority species in the sample. Two additional 16S rRNA based pyrosequencing surveys have recently been reported. The first, by Bibby et al. (2010), used a strategy of generating a large number of sequences for a small number of samples

(n=6) to examine the diversity and abundance of rare pathogenic microorganisms in biosolids from a waste water treatment facility that were processed using anaerobic digestion or composting methods. The second, by Werner et al. (2011), used an alternative strategy of examining a large number of samples (n=112) with smaller sequence libraries in an effort to determine common species as well as compare microbial diversity to operational data.
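The tag or barcoded approach mentioned above relies on a short, sample-specific sequence at the start of each read, which is used to sort the pooled reads back into their source samples after sequencing. The sketch below illustrates this sorting step under simplifying assumptions: barcodes are matched exactly at the 5' end of the read, and the barcodes, sample names, and reads are all hypothetical. Production pipelines typically also tolerate sequencing errors in the barcode and trim the amplification primer.

```python
def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by exact match of a 5' barcode.

    Returns (bins, unassigned), where bins maps each sample name to
    its reads with the barcode removed.
    """
    bins = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        for barcode, sample in barcode_to_sample.items():
            if read.startswith(barcode):
                bins[sample].append(read[len(barcode):])
                break
        else:
            # No barcode matched; keep the read aside for inspection.
            unassigned.append(read)
    return bins, unassigned


# Hypothetical barcodes and pooled reads from two digesters.
barcodes = {"ACGT": "digester_1", "TGCA": "digester_2"}
pooled_reads = ["ACGTGGGTTT", "TGCAAAACCC", "ACGTCCCC", "GGGGAAAA"]

bins, unassigned = demultiplex(pooled_reads, barcodes)
print(bins)
print(unassigned)
```

The two sampling strategies contrasted in the text differ mainly in how the pooled reads are divided: a few barcodes with many reads each, or many barcodes with fewer reads per sample.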

One of the primary drawbacks for the use of either clone library sequencing or pyrosequencing of 16S rRNA genes is the biased amplification of sequences during initial amplicon generation (Polz and Cavanaugh, 1998; Reysenbach et al., 1992; Suzuki

and Giovannoni, 1996). Depending on the primers used, sample libraries can fail to represent over 50% of the true microbial diversity in a sample (Hong et al., 2009; Jeon et al., 2008). This bias can be seen as a justifying factor for the use of multiple sequencing strategies by different research groups when examining the microbial diversity of AD systems, as the bias from each group will be different. Another potential source of error for both clone library and pyrosequencing datasets is the generation of chimeric sequences, which are known to be produced during the PCR amplification of mixed community DNA (Ashelford et al., 2005). Screening software has been created to identify and remove these false sequences from the dataset, but it is often left to the judgment of the individual researcher whether a sequence is truly anomalous or in fact represents an entirely new and previously unidentified microbial species (Ashelford et al.,

2006; Haas et al., 2011). An additional drawback that is more specific to pyrosequencing datasets is the issue of noise or artifactual sequences generated as a result of the sequencing operation. Again, software has been developed to address this issue (e.g. Quince et al., 2009); however, given the relative newness of pyrosequencing, no single set of processing standards has been agreed upon as of yet (Reeder and Knight, 2009).

2.3 Statistical Analysis Techniques and Methods:

While modern molecular technologies have allowed for greater and more accurate interrogation of the microbial diversity in anaerobic digestion systems, the resulting datasets are often not amenable to direct sample comparison. This stems from a number of factors including uneven sampling and potential errors introduced due to sampling method (e.g. sequencing errors or DGGE band discrimination). To address this issue, a

number of statistical description and comparison methods are used to reduce the complexity of the original dataset and allow for comparison of diversity between samples. The majority of these methods were developed by classical biological ecologists studying relationships of macrobiotic species such as plants or birds in defined environmental plots, however there is support for their application to microbial datasets

(Hughes et al., 2001). The first statistical concepts for measuring and evaluating biotic diversity go back to the 1943 paper by Fisher, Corbet, and Williams (Fisher et al., 1943) where the concept of alpha diversity is first mentioned. Later, Whittaker (Whittaker,

1972) defined the three common terms for different types of diversity analysis: alpha, beta, and gamma diversity. Alpha diversity refers to the measured species richness and evenness of a given sample, while beta diversity refers to the richness and evenness of species compared between two samples. Gamma diversity was defined as the sum of alpha diversity for a set of samples and has been further extended to represent

³JHRJUDSKLF´scale diversity (Hunter, 2004). As applied to microbial community ecology, alpha and beta diversity are the primary measures considered and deserve a more thorough examination.

2.3.1 Alpha Diversity Statistics:

The simplest alpha diversity metric that can be reported is the number of species observed in a given sample. For macrobiotic lifeforms such as plants, this is a simple matter of counting the number of different phylotypes (e.g. species of birds, families of insects) observed and their abundance. However, when analyzing microbiological sequence data, the resulting raw sequences are often processed and assigned to artificial "species" designators called operational taxonomic units (OTUs). This is accomplished by comparing the sequence similarity of a set of sequences and grouping them so that sequences of a given group share a common percent similarity. In practice, it is common for percent similarity to instead be replaced with dissimilarity, or genetic distance, as the measure for determining OTUs. As OTU groupings serve as the observed "species" for diversity calculations and comparisons, the distance cutoff used to define OTUs has a large impact on the alpha diversity of a sample. Commonly used distance cutoffs for investigation of microbial 16S sequences are 0.05 and 0.03, representing sequence similarities of 95% and 97%, respectively (Schloss and Handelsman, 2004).

These values are used partly out of tradition, which allows for direct comparison of reported OTU values from multiple environments, and because of their ability to approximate the taxonomic ranks of genus and species, although there is still disagreement surrounding the best distance cutoff to use to approximate these ranks (Kim et al., 2011; Rossello-Mora, 2003).

As complete sampling of the environment is rarely feasible for a number of technical and financial reasons, a question often asked is what is the maximum diversity of the sampled environment and how well does the sampling represent this maximum.

Borrowing again from classical community ecology, a number of diversity statistics are used to assess various parameters of measured diversity and to estimate the maximum richness that could be expected based on current sampling effort. A common first approach is rarefaction analysis, which performs a statistical resampling of subsets of the measured data and plots the number of units observed against the number of individual units sampled. The original concept and methodology was proposed by Sanders in 1968 and later refined by Hurlbert (1971) and Simberloff (1972). The resulting rarefaction plot, an example of which is seen in Figure 3.3, allows for a determination of whether the level of sampling effort was sufficient to capture a large proportion of the diversity. This is generally done by visually analyzing the slope of the curve to determine if it has reached a horizontal asymptote, which would represent the maximum species richness in the sample. From the rarefaction curve, parametric estimation of the location of the asymptote is possible by performing a curve fitting operation and has occasionally been used to estimate maximum richness (Larue et al., 2005; Youssef and Elshahed, 2008). Parametric estimation of maximum species richness is more robust than non-parametric estimation; however, the choice of the curve fitting model can strongly influence the resulting estimate of the asymptote (Hughes et al., 2001; Youssef and Elshahed, 2008).
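As a minimal sketch of the rarefaction calculation described above, the expected number of OTUs in a random subsample of n sequences can be computed analytically from the OTU abundance counts, using the hypergeometric formulation generally attributed to Hurlbert (1971), rather than by repeated random resampling. The function names and the OTU counts below are hypothetical illustrations and are not taken from this study.

```python
from math import comb


def rarefied_richness(counts, n):
    """Expected number of OTUs observed in a random subsample of n sequences."""
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must lie between 0 and the total number of sequences")
    # For each OTU: 1 minus the probability that it is absent from the subsample.
    return sum(1 - comb(total - c, n) / comb(total, n) for c in counts)


def rarefaction_curve(counts, step=1):
    """(subsample size, expected richness) pairs for plotting a rarefaction curve."""
    total = sum(counts)
    return [(n, rarefied_richness(counts, n)) for n in range(0, total + 1, step)]


# Hypothetical OTU abundance counts (number of sequences assigned to each OTU).
otu_counts = [50, 30, 10, 5, 3, 1, 1]
for size, richness in rarefaction_curve(otu_counts, step=20):
    print(size, round(richness, 2))
```

A curve that is still rising steeply at the full sample size indicates that additional sampling would be expected to recover further OTUs.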

Non-parametric estimation methods, which do not use a curve fitting model, have become the popular choice for estimating maximum richness as the calculations are easier to compute and have been implemented in a number of common bioinformatics analysis packages such as MOTHUR and QIIME (Caporaso et al., 2010; Schloss et al., 2009). The Chao1 estimator is the simplest richness estimator and was described by Anne Chao in 1987 (Chao, 1987). It calculates maximum species richness by correcting the number of species observed using the number of species observed once (singletons) and twice (doubletons). Chao1 has become a near standard estimator of maximum microbial species richness for a variety of environments (Acosta-Martinez et al., 2008; Edwards et al., 2004; Huber et al., 2007; Rivière et al., 2009); however, the suitability of the estimator for all datasets has not yet been thoroughly examined. For certain datasets, such as those produced by pyrosequencing of 16S rRNA, whether singletons and doubletons represent true species in the sample is still under debate, and thus the reliability of the Chao1 estimate for pyrosequencing data has yet to be fully established. A second richness estimator developed by Chao is the abundance-based coverage estimator (ACE). Similar to Chao1, ACE uses a portion of the dataset to estimate maximum richness as a correction of the number of species observed; however, instead of using only the number of singletons and doubletons, the ACE statistic analyzes the dataset based on the number of "rare" sequences (Chao and Bunge, 2002). As commonly calculated, "rare" is defined as species with fewer than a set number of sequences; however, this value is able to be changed, and thus the ACE statistic may provide more accurate estimates of maximum richness for pyrosequencing data.
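The Chao1 correction described above can be sketched as follows. This is an illustration under stated assumptions: it uses the classic form S_obs + F1^2 / (2 * F2), where F1 and F2 are the numbers of singleton and doubleton OTUs, and falls back to the bias-corrected form S_obs + F1 * (F1 - 1) / (2 * (F2 + 1)) when no doubletons are present. It is not claimed to be the exact variant implemented in any particular package, and the function name and counts are hypothetical.

```python
def chao1(counts):
    """Chao1 richness estimate from a list of per-OTU sequence counts."""
    observed = [c for c in counts if c > 0]
    s_obs = len(observed)                         # number of OTUs actually observed
    f1 = sum(1 for c in observed if c == 1)       # singletons
    f2 = sum(1 for c in observed if c == 2)       # doubletons
    if f2 > 0:
        return s_obs + (f1 * f1) / (2 * f2)       # classic form
    return s_obs + f1 * (f1 - 1) / 2              # bias-corrected form with F2 = 0


# Hypothetical sample: 8 observed OTUs, 4 singletons, 2 doubletons.
print(chao1([10, 5, 2, 2, 1, 1, 1, 1]))
```

Because the estimate depends only on the singleton and doubleton counts, any sequencing artifacts that inflate those two classes feed directly into the estimated richness.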

A final set of alpha diversity statistics are measures of species evenness and equitability. The Simpson diversity index (D), first detailed in 1949 (Simpson, 1949), determines the evenness of a community on a 0-1 scale, with a completely homogeneous community having a value of 0 and a completely heterogeneous community having a value of 1. Two additional forms of the Simpson index, 1/D and 1-D, are also occasionally reported in the literature under the Simpson index name, and care needs to be taken not to confuse the values of the resulting indices. The Shannon diversity index (H') was developed out of information theory by Claude Shannon in 1948 (Krebs, 1998) and is commonly applied to ecological data. The Shannon index provides a measure of both richness and evenness in a population, such that a community with perfect evenness would have a value of H' equal to the natural log of the number of observed species (Hmax). The Shannon equitability measure (E_H) is the ratio of H' to Hmax and provides an additional measure of the evenness of a sample, with a value of 1 indicating that a sample has complete evenness.
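The evenness measures above can be illustrated with a short sketch. It assumes the simple proportional form of the Simpson index, D as the sum of squared relative abundances (a finite-sample form based on n(n-1)/N(N-1) is also in common use), and the natural-log form of the Shannon index. Function names and example counts are hypothetical.

```python
from math import log


def simpson_d(counts):
    """Simpson index D: sum of squared relative abundances (higher = less even)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts if c > 0)


def shannon_h(counts):
    """Shannon index H': -sum(p * ln p) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * log(c / total) for c in counts if c > 0)


def shannon_equitability(counts):
    """E_H = H' / Hmax, where Hmax = ln(number of observed OTUs)."""
    observed = sum(1 for c in counts if c > 0)
    if observed < 2:
        raise ValueError("equitability is undefined for fewer than two observed OTUs")
    return shannon_h(counts) / log(observed)


even = [25, 25, 25, 25]      # hypothetical perfectly even community
skewed = [97, 1, 1, 1]       # hypothetical community dominated by one OTU
print(simpson_d(even), simpson_d(skewed))
print(shannon_equitability(even), shannon_equitability(skewed))
```

With these definitions the even community gives the minimum D possible for four OTUs (0.25) and an equitability of 1, while the dominated community gives a D close to 1 and a low equitability.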

2.3.2 Beta Diversity Statistics:

While measures of alpha diversity are informative for determination of the richness and abundance of species in a particular environment, it is often desired and necessary to compare the diversity observed in one environment to the diversity in a different environment. This involves determining the beta diversity of the samples and is generally calculated as a measure of similarity or dissimilarity between the two samples.

To achieve this, a number of different beta diversity measures have been developed, each with its own inherent strengths and weaknesses, to determine the level of similarity or difference between sets of species abundance data. These statistics are generally grouped into two different sets of analyses, Q or R, depending on the relationships being measured (Legendre and Legendre, 1998). Q analyses measure the relationships between sample sites and are the most common beta diversity analyses used for microbial ecology data. R analyses measure the relationships between observed species, and are generally used in multivariate analysis methods for which they serve as the mathematical foundation of the analysis (Zuur et al., Ch. 10, 2007).

The two most commonly used R analysis metrics are the Pearson correlation coefficient and the Spearman rank correlation coefficient. The Pearson correlation coefficient calculates the strength of a linear relationship between two observed species and can be used to determine if one or more sets of observed species are linearly correlated (Zuur et al., Ch. 10, 2007). An issue that can obfuscate interpretation of the Pearson correlation coefficient is its need for normally distributed data with few zero-count samples. The Spearman rank correlation coefficient, which calculates the Pearson correlation of ranked variables, can be used to examine non-normally distributed data; however, the presence of a large number of zeros will still affect the result. The presence of multiple sample sites with zero counts for a species is an issue for many beta diversity analysis metrics, whether Q or R, and is termed the double zeros issue (Zuur et al., Ch. 10, 2007). A result of the double zeros issue is that two variables, either species or sites depending on the analysis, with a large proportion of shared 0 values can be artificially declared more similar than their true biological similarity.
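The double zeros issue can be demonstrated with a small sketch. Two hypothetical species that are negatively correlated across four sites become positively correlated, under both the Pearson and Spearman coefficients, once ten additional sites where both species are absent are appended. All names and counts are invented for illustration.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def ranks(values):
    """Ranks starting at 1, with tied values given their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranked values."""
    return pearson(ranks(x), ranks(y))


# Hypothetical counts of two species at four sites where both occur.
species_a = [5, 1, 3, 2]
species_b = [1, 4, 2, 5]
absent = [0] * 10  # ten further sites where neither species was detected

print(pearson(species_a, species_b), spearman(species_a, species_b))
print(pearson(species_a + absent, species_b + absent),
      spearman(species_a + absent, species_b + absent))
```

The shared absences carry no information about how the two species relate where they do occur, yet they dominate both coefficients.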

For microbial ecological data, Q analysis metrics are the most common type of beta diversity measures. As mentioned for R analyses, depending on the diversity measure used, the double zeros problem can artificially inflate the similarity of two sites if they share a large number of zero-count species. This can be overcome, however, through the use of asymmetrical metrics which don't incorporate shared absence of a species in their measurement. The two simplest measures are the Jaccard and Sørensen/Dice similarity coefficients. The Jaccard coefficient is a simple measure of the number of species present in both samples divided by the sum of the number of species jointly present and the number of species unique to either sample. The Sørensen/Dice coefficient is similar to Jaccard but weights the shared species by multiplying their count by two. Both the Jaccard and Sørensen/Dice similarity coefficients treat the observed species as presence/absence data, with species abundance disregarded. To consider species abundance, the Bray-Curtis or Ochiai similarity coefficients can be used. These coefficients use different methods to calculate the similarity between sites; however, they are less affected by highly variable counts than other measures such as Euclidean distance and thus are more amenable to microbial 16S sequencing data.
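A minimal sketch of three of these similarity coefficients is given below, assuming each sample is represented as a list of counts over the same ordered set of OTUs. With a as the number of shared OTUs and b and c as the numbers unique to each sample, Jaccard is a / (a + b + c) and Sørensen/Dice is 2a / (2a + b + c); the Bray-Curtis similarity is taken here as twice the summed minimum counts divided by the total counts of both samples. The function names and counts are hypothetical.

```python
def jaccard(x, y):
    """Jaccard similarity on presence/absence: shared / (shared + unique to either)."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    either = sum(1 for a, b in zip(x, y) if a > 0 or b > 0)
    return shared / either


def sorensen(x, y):
    """Sorensen/Dice similarity: 2 * shared / (OTUs in x + OTUs in y)."""
    shared = sum(1 for a, b in zip(x, y) if a > 0 and b > 0)
    present_x = sum(1 for a in x if a > 0)
    present_y = sum(1 for b in y if b > 0)
    return 2 * shared / (present_x + present_y)


def bray_curtis_similarity(x, y):
    """Abundance-based similarity: 2 * sum of minimum counts / total counts."""
    return 2 * sum(min(a, b) for a, b in zip(x, y)) / (sum(x) + sum(y))


# Hypothetical OTU counts for two samples; the fourth OTU is absent from both.
sample_1 = [10, 0, 5, 0, 1]
sample_2 = [8, 2, 0, 0, 1]
print(jaccard(sample_1, sample_2))
print(sorensen(sample_1, sample_2))
print(bray_curtis_similarity(sample_1, sample_2))
```

None of the three values changes if further OTUs absent from both samples are appended to the lists, which is the asymmetry that sidesteps the double zeros issue.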

An additional set of beta diversity measures developed primarily for microbial ecological 16S data are the UniFrac metric developed by Lozupone, Hamady, and Knight (Lozupone et al., 2006) and the parsimony test as implemented in MOTHUR (Schloss et al., 2009). Both methods are based on a phylogenetic tree of all species for all samples and are not affected by the double zeros issue. The parsimony test, first described by Slatkin and Maddison (1990), examines the clustering of species belonging to individual samples and then compares the distribution to a large number of randomly generated trees to determine if the original distribution is significantly more similar than random chance would dictate. The UniFrac metric is calculated from the branch lengths of the tree, as the fraction of the total branch length that leads to species from only one of the two samples being compared rather than being shared between them, and the significance of the original distribution is determined through a random shuffling of the node labels for a large number of trees (Lozupone et al., 2006). As the UniFrac metric is based on summed branch lengths, it is able to serve as a distance metric for use in comparing the similarity of multiple samples in a single tree. Further refinement of the UniFrac metric has involved the incorporation of a weighting factor to allow for samples of differing evenness to be compared.
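As a rough sketch of the unweighted form of UniFrac, the distance between two samples can be computed as the branch length leading to sequences from only one of the two samples, divided by the branch length leading to sequences from either sample. The tree representation used here, a flat list of branches each annotated with the set of samples that have descendant sequences below it, is a simplifying assumption made for illustration, as are the function name, sample labels, and branch lengths.

```python
def unweighted_unifrac(branches, sample_a, sample_b):
    """Fraction of branch length unique to one of the two samples."""
    unique = 0.0
    total = 0.0
    for length, samples in branches:
        in_a = sample_a in samples
        in_b = sample_b in samples
        if in_a or in_b:
            total += length          # branch leads to at least one of the samples
            if in_a != in_b:
                unique += length     # branch leads to exactly one of the samples
    return unique / total


# Hypothetical tree ((s1:0.1, s2:0.2):0.3, (s3:0.4, s4:0.1):0.2), where
# sequence s2 comes from sample "B" and s1, s3, s4 come from sample "A".
tree = [
    (0.1, {"A"}),        # branch to s1
    (0.2, {"B"}),        # branch to s2
    (0.3, {"A", "B"}),   # internal branch above s1 and s2 (shared)
    (0.4, {"A"}),        # branch to s3
    (0.1, {"A"}),        # branch to s4
    (0.2, {"A"}),        # internal branch above s3 and s4
]
print(unweighted_unifrac(tree, "A", "B"))
```

A value of 0 indicates that every branch is shared by both samples, while a value of 1 indicates that the two samples occupy entirely separate parts of the tree.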

2.3.3 Multivariate Analyses:

While the determination and comparison of the alpha and beta diversity in and between different environments can provide beneficial insight into the diversity and distribution of microorganisms in an environment, it is unable to explain how microbial communities react to and influence various environmental parameters. Such explanations require statistically linking the measured species diversity of a sample with one or more measured environmental variables such as temperature or pH. This has been achieved to a degree using small, laboratory scale experiments where the environmental variables can be accurately controlled and the effect of a single parameter change, such as an increase in temperature, can be determined based on observed changes in microbial diversity. For example, Lefebvre et al. (2007) examined the effect of NaCl concentration on the microbial communities of lab scale batch digesters using a variety of molecular analysis methods and saw that diversity remained high even at high NaCl concentrations, but that activity decreased at similar levels for systems operating on different feedstocks.

However, such controlled analyses are univariate (only one variable is affected) and are unable to explain the effects of multiple variables on multiple species. Multivariate statistical analyses have been developed that allow for the examination of ecological communities based on measured species diversity. The most commonly used analyses in microbial ecology are principal components analysis (PCA), principal coordinates analysis (PCoA), and correspondence analysis (CA).


Principal components analysis (PCA) was one of the first ordination methods developed and has been applied to a wide variety of ecological datasets (Zuur et al., Ch. 12, 2007). PCA relies on the decomposition of a data matrix into a set of eigenvalues and eigenvectors which explain the variance of the matrix in a reduced set of synthetic axes.

The power of PCA and related ordination methods is that the first axis explains the largest proportion of total variance of the data, with further axes explaining less and less of the variance. The maximum number of synthetic axes that can be determined is equal to the number of variables in the original matrix, however only the first two or three axes are generally presented because of the simplicity of visualizing 2D and 3D plots. PCA is commonly used as a form of R analysis, using the Pearson correlation coefficient to determine the relationship between species variables. The underlying use of the Pearson correlation coefficient, however, generally makes PCA ill-suited for microbial datasets derived from sequencing surveys due to the double zeros issue. PCA has, however, been successfully applied to DGGE and T-RFLP data where the double zeros issue is less prevalent (Dollhopf et al., 2001).

Unlike PCA, which calculates similarity using the Pearson correlation coefficient, principal coordinates analysis (PCoA) is an ordination method based on the measures of beta diversity for a set of samples. PCoA takes a similarity or dissimilarity matrix for a set of samples, generated using an appropriate beta diversity measure, and decomposes it to determine a minimized set of artificial variables, referred to as components, which explain the variance of the dataset. The coordinates of where each sample lies on each of the newly created components are then determined. By selecting combinations of two or three components, a 2D or 3D plot can be created which shows the distribution of samples in the artificial component space. Samples that have very similar composition will have similar principal coordinates for a given component, while dissimilar samples will have coordinate values that place them far apart from each other.
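The decomposition step of PCoA can be sketched in a few lines. This illustration follows the classical approach of double-centering the squared distance matrix and then extracting eigenvectors; to stay dependency-free it uses simple power iteration and recovers only the first principal coordinate, whereas a real analysis would use a numerical library to obtain all axes. The function names and the toy distance matrix are hypothetical.

```python
from math import sqrt


def double_center(dist):
    """Gower-centered matrix B computed from a symmetric distance matrix."""
    n = len(dist)
    a = [[-0.5 * dist[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(row) / n for row in a]
    grand_mean = sum(row_mean) / n
    return [[a[i][j] - row_mean[i] - row_mean[j] + grand_mean for j in range(n)]
            for i in range(n)]


def first_principal_coordinate(dist, iterations=500):
    """Sample coordinates on the first PCoA axis, found by power iteration."""
    b = double_center(dist)
    n = len(b)
    vector = [float(i + 1) for i in range(n)]   # arbitrary non-constant start
    eigenvalue = 0.0
    for _ in range(iterations):
        product = [sum(b[i][j] * vector[j] for j in range(n)) for i in range(n)]
        norm = sqrt(sum(value * value for value in product))
        if norm == 0:
            break
        vector = [value / norm for value in product]
        eigenvalue = norm                        # converges to the largest eigenvalue
    # Coordinates are the eigenvector scaled by the square root of its eigenvalue.
    return [value * sqrt(eigenvalue) for value in vector]


# Hypothetical distances between three samples that lie along a single gradient.
distances = [
    [0.0, 1.0, 3.0],
    [1.0, 0.0, 2.0],
    [3.0, 2.0, 0.0],
]
print(first_principal_coordinate(distances))
```

For this toy matrix a single axis reproduces the original distances exactly; with real beta diversity matrices several axes are usually needed, each explaining a decreasing share of the variance.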

Correspondence analysis (CA) is a popular ordination method that has been used in the analysis of macrobiotic data since the late ¶V(Zuur et al., Ch. 13, 2007). The foundation of CA, unlike PCA or PCoA, relies on measuring the similarity of samples and sites using the chi-square function to determine chi-square distances (Greenacre, Ch.

4, 2007). These chi-square distances are then used to plot either the samples or species along a set of synthetic axes in a manner similar to PCA and PCoA. As CA uses the chi-square distance as the basis for ordination, the results are the same regardless of whether the data is being analyzed as an R (species) or Q (samples) analysis. The resulting axes function for ordination of both species and samples, allowing both items to be plotted at the same time (Greenacre, Ch. 5, 2007). In 1986, Ter Braak developed a variant of CA he termed canonical correspondence analysis (CCA) (Ter Braak, 1986). In this variant, the synthetic axes used for plotting the species and samples are first determined by the distribution of a set of environmental variables such that the variation of the environmental variables correlates with the variation of the species by matrix (Greenacre, Ch. 24, 2007). The ability of CCA to summarize and visualize the joint variation between species, samples, and environmental parameters has made it a popular and useful tool for the analysis of macrobiotic communities, but it has seen only limited implementation with microbial datasets (Høj et al., 2005; Ramette, 2007; Supaphol et al., 2011).

2.4 Summary

The use of AD for energy production has gained increasing interest. The microbial communities involved in the AD process have been examined under various conditions; however, it is estimated that the majority of the diversity in these systems remains to be observed. Modern molecular analysis techniques such as DGGE and 16S rRNA sequencing offer reliable methods for the analysis of the microbial communities therein. Further, the availability of pyrosequencing represents a significant advancement in the analysis of AD systems, allowing for a deeper analysis of rare species than previously possible using cloning and sequencing strategies. In combination with statistical analysis methods, the microbial diversity of AD systems and the interactions of these microbes within the digester system can be interrogated more closely and accurately than ever before.


CHAPTER 3

A META-ANALYSIS OF THE MICROBIAL DIVERSITY OBSERVED IN ANAEROBIC DIGESTERS

3.1 Abstract

In this study, the collective microbial diversity in anaerobic digesters was examined using a meta-analysis approach. All 16S rRNA gene sequences recovered from anaerobic digesters available in public databases were retrieved and subjected to phylogenetic and statistical analyses. As of May 2010, 16,519 bacterial and 2,869 archaeal sequences were found in GenBank. The bacterial sequences were assigned to 5,926 operational taxonomic units (OTUs, based on ≥97% sequence identity) representing known bacterial phyla, with Proteobacteria (1,590 OTUs), Firmicutes (1,352 OTUs), Bacteroidetes (705 OTUs), and Chloroflexi (693 OTUs) being predominant. Archaeal sequences were assigned to 296 OTUs, primarily Methanosaeta and the uncharacterized WSA2 group. Nearly 60% of all sequences could not be classified to any established genus. Rarefaction analysis indicates that approximately 60% of bacterial and 90% of archaeal diversity in anaerobic digesters has been sampled. This analysis of the global bacterial and archaeal diversity in AD systems can guide future studies to further examine the microbial diversity involved in AD and the development of comprehensive analytical tools.


3.2 Introduction

Anaerobic digestion (AD) of organic wastes is a microbially mediated process whereby complex organic wastes are ultimately converted into methane biogas, a potential renewable energy source. The overall AD process can be conceptually divided into four phases defined by the primary catabolic reactions that occur at each phase: hydrolysis of complex polymers (I, hydrolysis), fermentation of the hydrolysis end-products to short chain fatty acids (SCFAs) (II, acidogenesis), conversion of SCFAs of three or more carbons to primarily acetate (III, syntrophic acetogenesis), and the production of methane (IV, methanogenesis) (Yu et al., 2010). The guilds of microbes involved in each phase of AD are interdependent through cross-feeding and/or maintenance of chemothermodynamic gradients. As a result, the AD process can quickly and easily break down when one of the four phases is out of balance, such as when an accumulation of SCFAs leads to acidification of the entire system (Chen et al., 2008). Failures of the AD process in bioreactors treating high strength organic wastes from industrial or agricultural operations can lead to damaging economic losses. As AD is increasingly looked upon as a source of bioenergy, the reliability and stability of AD systems becomes critical to ensuring both reliable energy supplies and uninterrupted core business operations.

Numerous studies have been conducted to gain a better understanding of the microbiomes present in AD reactors and their influence on the efficiency and stability of the AD processes (e.g. reviewed in Chen et al., 2008). While initial studies employed traditional cultivation-based methods, the primary methods in current use are DNA-based molecular biology methods such as cloning and sequencing of either functional or 16S ribosomal RNA (rRNA) genes, FISH, DGGE, single-stranded conformation polymorphisms (SSCP), and quantitative PCR (Leclerc et al., 2004; Malin and Illmer, 2008; Sousa et al., 2007). Because it allows for identification of potential known and unknown microbes present in AD reactors, cloning and sequencing of 16S rRNA genes has been generally favored over other methods.

Most studies to date, however, have focused on a single specific AD system (e.g. upflow anaerobic sludge bed (UASB) reactors or continuous stirred tank reactors (CSTRs)) processing a single waste stream (e.g. municipal sewage, brewery wastewater). Many of the datasets published contain a small number of cloned sequences (generally <100), thus revealing only a small portion of the full diversity present in anaerobic digesters (e.g. Lefebvre et al., 2007). Some of these studies were further limited by a narrow focus on one particular microbial group such as the Archaea or a particular phylum (e.g. Chouari et al., 2003). Additionally, many sequences recovered from AD systems were deposited into GenBank but have not yet been reported in the literature, contributing little to no additional information on the microbial diversity and its function. As a result, the understanding of the microbiomes involved in AD is fragmented and likely biased, exemplified by these microbiomes still being regarded as a "black box". This knowledge gap limits the understanding of how these complex microbiomes either hamper or enhance the efficiency and stability of AD systems.

A few studies have examined the microbial diversity of anaerobic digesters using relatively large (>200 sequences) 16S rRNA clone libraries (Chouari et al., 2005). Additionally, a few studies have used 454-pyrosequencing, either alone or in combination with the Sanger sequencing technology, to analyze the microbiomes in anaerobic digesters, producing large datasets of short, difficult to classify sequence reads (Krause et al., 2008). To date, however, there has been no collective overview of the microbial diversity generally found in AD systems. In this study, a meta-analysis was performed on all publicly available 16S rRNA gene sequences generated by Sanger sequencing from various anaerobic digesters in an effort to provide a collective appraisal of the microbial diversity in AD systems. Estimates of the current coverage of the microbial diversity already identified in anaerobic digesters were made and particular gaps in the knowledge and understanding of the microbial populations involved in AD were identified.

3.3 Materials and Methods

3.3.1 Sequence Data Collection

Initial sequence sets were obtained from the GenBank (http://www.ncbi.nlm.nih.gov) and RDP (Release 10, http://rdp.cme.msu.edu) databases using the search terms 'anaerobic digester', 'bioreactor', 'CSTR', and 'UASB' in the months through May 01, 2010. All non-16S rRNA sequences were removed and the resulting composite dataset was de-replicated to remove identical records based on accession number. Sequences not recovered from methanogenic AD systems, particularly those corresponding only with heavy metal and chlorinated solvent remediation, were manually removed according to the annotation provided in the GenBank sequence records. Published datasets that were not automatically retrieved using the search terms were manually added. Sequences with vector nucleotides were trimmed to leave only nucleotides confirmed as rRNA after alignment against the 16S reference sequences from E. coli (accession number: U00096) for bacteria or Methanothermobacter thermoautotrophicus (accession number: AE000666) for archaea. Sequences shorter than 250 bp were removed from the dataset to avoid uncertainties in comparing and classifying short sequences that have little or no sequence overlap. The remaining sequences comprised the redacted composite dataset used in this study.
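The automated parts of this filtering (removal of non-16S records, de-replication by accession number, and the 250 bp length cutoff) could be sketched as below. The record layout, the gene label string, and the accession numbers are hypothetical; the manual curation and vector trimming steps described above are not represented.

```python
def filter_records(records, min_length=250):
    """Keep 16S rRNA records with unique accession numbers and sufficient length.

    Each record is assumed to be an (accession, gene, sequence) tuple.
    """
    seen = set()
    kept = []
    for accession, gene, sequence in records:
        if gene != "16S rRNA":
            continue                      # drop non-16S rRNA sequences
        if accession in seen:
            continue                      # de-replicate on accession number
        if len(sequence) < min_length:
            continue                      # drop sequences shorter than the cutoff
        seen.add(accession)
        kept.append((accession, gene, sequence))
    return kept


# Hypothetical records; accession numbers are placeholders, not real entries.
example = [
    ("XX000001", "16S rRNA", "A" * 300),
    ("XX000001", "16S rRNA", "A" * 300),   # duplicate accession
    ("XX000002", "23S rRNA", "A" * 400),   # not a 16S sequence
    ("XX000003", "16S rRNA", "A" * 249),   # too short
    ("XX000004", "16S rRNA", "A" * 250),
]
print([accession for accession, _, _ in filter_records(example)])
```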

3.3.2 Phylogenetic Analyses

Sequences were grouped into batches of roughly 5,000 sequences by size such that the shortest sequence was no more than 20% shorter than the longest sequence in each batch. Batches were submitted for NAST alignment with the minimum alignment length set to 80% of the shortest sequence in each batch and all other criteria using default values (DeSantis et al., 2006). The resulting aligned sequences were imported into ARB and inserted into the Greengenes database ARB tree using the positional variance by parsimony method (Ludwig et al., 2004). Unaligned sequences were classified en masse to taxonomic ranks with the Classifier program implemented as part of the RDP database using default parameters (Wang et al., 2007). Based on the classifications determined with the Classifier program, distance matrices were computed within ARB using Jukes-Cantor correction for the following groups: Archaea, Bacteria, Proteobacteria, Firmicutes, Bacteroidetes, Chloroflexi, and the collected "minor phyla" of Bacteria that comprised sequences not assigned to any of the aforementioned phyla. Individual distance matrices were analyzed using MOTHUR to cluster OTUs, generate rarefaction curves, and determine the nonparametric ACE and Chao1 richness estimates (Schloss et al., 2009). A parametric estimation of the expected maximum number of OTUs was conducted using the non-linear models procedure (PROC NLIN) of SAS (V9.1, SAS Inst. Inc., Cary, NC). This method fits the monomolecular function to the rarefaction output generated by MOTHUR to determine the asymptote that serves as the upper bound of the curves as previously described (Larue et al., 2005). The value defined by the asymptote is an estimate of the expected maximum species richness complementary to the ACE and Chao1 estimates and has been used previously to estimate maximum species richness (Larue et al., 2005; Youssef and Elshahed, 2008). Unless otherwise stated, the term OTU was defined as a grouping of sequences that share ≤0.03 sequence dissimilarity and is taken to represent the species taxonomic rank. The following dissimilarity cut-offs were used to approximate other taxonomic ranks: 0.05 for genus, 0.10 for family, 0.15 for class/order, and 0.20 for phylum (Schloss and Handelsman, 2004). A treemap based on the output from the RDP Classifier was constructed using version 4.1.1 of the program Treemap (http://www.cs.umd.edu/hcil/treemap).
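Two of the steps above lend themselves to a short illustration: the grouping of sequences into length-based batches, and the Jukes-Cantor correction applied to observed distances. The batching routine is one plausible greedy reading of the stated rule (shortest sequence no more than 20% shorter than the longest, roughly 5,000 sequences per batch) and is an assumption, not a record of the procedure actually used; the correction uses the standard formula d = -(3/4) ln(1 - 4p/3), where p is the observed proportion of differing sites. Function names and sequence lengths are hypothetical.

```python
from math import log


def batch_by_length(lengths, max_shortfall=0.20, max_batch=5000):
    """Greedily group sequence lengths so that, within a batch, the shortest
    is no more than max_shortfall shorter than the longest."""
    batches = []
    current = []
    for length in sorted(lengths, reverse=True):
        too_short = bool(current) and length < (1 - max_shortfall) * current[0]
        if current and (too_short or len(current) >= max_batch):
            batches.append(current)
            current = []
        current.append(length)
    if current:
        batches.append(current)
    return batches


def jukes_cantor(p):
    """Jukes-Cantor corrected distance from an observed proportion of differences."""
    if not 0 <= p < 0.75:
        raise ValueError("p must be in the range [0, 0.75)")
    return -0.75 * log(1 - 4 * p / 3)


# Hypothetical sequence lengths in base pairs.
print(batch_by_length([1500, 1400, 1250, 1199, 1000, 300, 260]))
# An observed dissimilarity of 0.03 corresponds to a slightly larger corrected distance.
print(jukes_cantor(0.03))
```

The correction is small near the species-level cutoff but grows with divergence, so it matters more for the family and phylum level cut-offs than for the 0.03 cutoff.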

3.3.3 Nucleotide Accession Numbers

The accession numbers for all sequences analyzed in this study are available from the author. The sequences are currently maintained in an in-house ARB database of anaerobic digester sequences. A copy of this database and the sequence alignment are also available by request from the author.

3.4 Results and Discussion

This study was conducted as a naïve meta-analysis of all publicly available 16S rRNA gene sequences recovered from AD reactors worldwide. The term naïve is used here to imply that sequences were collected and analyzed irrespective of their previously determined taxonomic associations or other analyses. As AD becomes an increasingly engineered process for waste management and biogas production, it becomes necessary to understand the totality of microorganisms that are able to participate in the process. Thus, the results represent a fresh, updated, and global view of the microbial diversity involved in the AD process in general.

3.4.1 Data Summary

A total of 19,388 sequences (16,519 bacterial and 2,869 archaeal) were retrieved and analyzed. The bacterial sequences were assigned to 5,935 OTUs, while the archaeal sequences were assigned to 296 OTUs. The most abundant bacterial OTU contained 1,799 sequences while the most abundant archaeal OTU contained 480 sequences. Of the bacterial sequences analyzed, a majority (approx. 63%) were classified within four 'major' phyla: Chloroflexi, Proteobacteria, Firmicutes, and Bacteroidetes (Table 3.1 and Figures 3.1 and 3.2). Based on assignment using the RDP Classifier, 1,667 of the bacterial sequences were listed as "unclassified" at the phylum level. The remaining sequences were classified within 'minor' phyla (supplementary figure), of which the Synergistetes, Planctomycetes, and one other phylum were the only 'minor' phyla with representation >1% of all bacterial sequences. Figure 3.1 shows a phylogenetic tree of the OTU sequences, grouped at the phylum level, with the Hugenholtz taxonomy applied. This taxonomy was used for the phylogenetic tree as it uses previously described group names for clades not recognized in the RDP taxonomy and thus aids in comparison with previously published studies. While direct correlations of sequence abundance in the composite dataset to the corresponding populations of microorganisms in any given AD reactor are likely invalid, species from the phyla Proteobacteria, Firmicutes, and Bacteroidetes are likely ubiquitous in all AD reactors as nearly all studies have found sequences from these phyla (e.g. Ariesyady et al., 2007a; Hernon et al., 2006). Furthermore, these phyla contain several species that are known to participate in one or more of the phases of the general AD process.

3.4.2 Archaea

The majority of the archaeal sequences, nearly 95%, were classified within the phylum Euryarchaeota, with only 139 sequences being classified within Crenarchaeota and 22 sequences failing to be classified to any of the recognized archaeal phyla. The crenarchaeal sequences were confined solely to the class Thermoprotei. Among the Euryarchaeota, 888 sequences were not assigned to any existing class, with the remaining sequences being assigned primarily to the methanogenic classes Methanomicrobia (1,590 sequences) and Methanobacteria (223 sequences), with 7 sequences assigned to the genus Thermogymnomonas in the class Thermoplasmata. The small numbers of Crenarchaeota and Thermoplasmata sequences suggest that these two groups are either allochthonous to or scarce in AD reactors and are probably not functionally vital in the AD process.

As seen in Figure 3.2, the obligately acetoclastic Methanosaeta (formerly Methanothrix) was the most predominant archaeal genus, representing approximately 55% of the archaeal sequences assigned to a genus. Methanospirillum, which preferentially utilizes H2/CO2 over formate as substrate, and the obligately hydrogenotrophic genus Methanobacterium each represented nearly 10% of the archaeal sequences assigned to a genus. The genera Methanoculleus, Methanolinea, and Methanosarcina each represented roughly 5% of genus-assigned sequences. The other 10 methanogen genera were only represented by a small number of sequences in the dataset (Figure S1). These results are in general agreement with the finding reported in the literature that the predominant acetoclastic methanogens in AD systems are Methanosaeta spp. (Leclerc et al., 2004).

While the RDP taxonomic assignments, as shown in Figure 3.2, suggest a large number of unclassified Euryarchaeota, the phylogenetic tree based on the Hugenholtz taxonomy (Figure 3.1) showed that most of these sequences belonged to the uncultured WSA2 group of methanogens. This WSA2 group, also known as ArcI, has been found in several AD systems and currently has no cultured representatives (Chouari et al., 2005; Rivière et al., 2009). The lack of cultured isolates with high similarity to this group of sequences makes it difficult to infer the function (e.g., substrates) of the methanogens represented by these sequences. However, work by Chouari et al. (2005) suggested that members of this group are methanogens capable of growth on at least H2/CO2.

3.4.3 Bacteria

3.4.3.1 Chloroflexi

The Chloroflexi comprised a total of 3744 sequences, approximately 22% of the bacterial sequences. Compared to the other 'major phyla', however, Chloroflexi diversity was the lowest, with only 693 OTUs generated and a Simpson's Index of […], which is much greater than that for the other three major phyla. Highlighting the unevenness of the phylum, over 60% of all the Chloroflexi sequences (13% of all bacterial sequences) were assigned to six OTUs classified as "unclassified Anaerolineaceae". Overall, just over […] of the Chloroflexi sequences were assigned to unclassified Anaerolineaceae, while less than 5% were assigned to a known genus (Figure 3.2). The 8 genera that were identified include all 5 genera in the family Anaerolineaceae, Caldilinea, Dehalogenimonas, and Sphaerobacter (Figure S2). The majority of all the Chloroflexi sequences were recovered by a group of researchers in France through multiple studies investigating the microbial diversity in digested sludge from municipal wastewater treatment plants located in France, Germany, and Chile (Chouari et al., 2005; Rivière et al., 2009). Although it is not possible to infer their functions in anaerobic digesters, the bacteria represented by these OTUs could play a significant role, at least in the municipal sludge digesters where these sequences were recovered. Future studies are needed to examine the distribution and abundance of Chloroflexi species in other types of anaerobic digesters.
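The link between unevenness and a large Simpson's Index can be illustrated with a short calculation. The sketch below assumes the dominance form of the index, D = sum(n_i(n_i - 1)) / (N(N - 1)), in which larger values mean lower diversity; whether this exact variant was the one computed for the composite dataset is an assumption, and the two OTU count vectors are hypothetical toy data, not values from this study.

```python
def simpson_index(otu_counts):
    """Simpson's dominance index D = sum(n_i * (n_i - 1)) / (N * (N - 1)).

    Larger D means a few OTUs hold most of the sequences (lower diversity).
    """
    total = sum(otu_counts)
    if total < 2:
        raise ValueError("need at least two sequences")
    return sum(n * (n - 1) for n in otu_counts) / (total * (total - 1))


# Hypothetical communities: both have 100 sequences spread over 5 OTUs.
uneven = [92, 2, 2, 2, 2]     # one dominant OTU
even = [20, 20, 20, 20, 20]   # sequences spread evenly

print(f"uneven community: D = {simpson_index(uneven):.3f}")
print(f"even community:   D = {simpson_index(even):.3f}")
```

With the same number of sequences and OTUs, the community dominated by a single OTU gives the larger index, which is the pattern described for the Chloroflexi relative to the other major phyla.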

3.4.3.2 Proteobacteria

The Proteobacteria was the second largest and yet most diverse phylum in the dataset. While 159 fewer sequences were assigned to the Proteobacteria (3,585) than the Chloroflexi, there were more than twice as many OTUs identified (1,590), with a Simpson's Index nearly two orders of magnitude lower. Unlike the Chloroflexi sequences, the majority (approx. 70%) of the proteobacterial sequences were assigned to a total of 169 known genera (Figure S3). All 5 classes within the Proteobacteria were represented, but the Beta-, Alpha-, and Deltaproteobacteria together represented roughly 86% of the proteobacterial sequences. The Epsilonproteobacteria represented only 58 sequences, with sequences classified to the genus Arcobacter comprising 28 of those. The Gammaproteobacteria represented 378 sequences, with the genus Pseudomonas representing 17% of those. Only 43 sequences, approximately 0.25% of all bacterial sequences, were classified within Enterobacteriales, indicating that enteric bacteria, which include many common human enteric pathogens found in sewage, are recovered at a low rate from most AD reactors.

The fourth largest bacterial OTU in the dataset was assigned to the genus Brachymonas, which was the most numerically represented genus in the Proteobacteria. This genus is capable of denitrification and, along with other members of the Comamonadaceae, is able to degrade organic acids including acetate. Some members of this OTU may be responsible for denitrification in some anaerobic digesters, for example by utilizing the nitrate that is produced in the aerobic treatment process of wastewater treatment plants. The genera Rhodobacter and Thauera are similarly capable of growth on organic acids and were the second and fourth most abundant proteobacterial genera, respectively. The third most abundant genus, Smithella, is known to participate in the syntrophic oxidation of propionate to acetate for use by acetoclastic methanogens (Ariesyady et al., 2007b). As propionate is an important fermentation product from phase II (Speece et al., 2006), it is not surprising that syntrophic propionate consumers such as Smithella and Syntrophobacter were correspondingly represented by a large number of sequences in the dataset.

3.4.3.3 Firmicutes

The third largest identified phylum was the Firmicutes, with 2,549 sequences assigned. A majority of the sequences within the Firmicutes, nearly 86%, were assigned to the class Clostridia. The class Bacilli was represented by only 172 sequences, while the class Erysipelotrichi was represented by only […] sequences. 'Unclassified Firmicutes' comprised 150 sequences. A total of 108 recognized genera were identified, comprising 1,141 sequences (Figure S4). Within the class Clostridia, over 1/3 of the sequences were either unclassified to any known order or were unclassified below the order Clostridiales (Figure […]). A further […] were determined to be 'Unclassified Ruminococcaceae', a family that contains a number of cellulolytic and amylolytic species isolated from several types of anaerobic environments, including AD reactors.

All genera comprising greater than 5% of Firmicutes sequences were found within the class Clostridia (Figure S4). The cellulolytic genus Acetivibrio was the predominant genus with 162 assigned sequences, while the genus […] was second in abundance with 116 sequences. Both of these genera include a number of species that are known to degrade complex biopolymers such as cellulose. This observation corroborates the finding that cellulose is a component of many AD feedstocks and that cellulolytic bacteria are important participants in phase I of the AD process (Burrell et al., 2004; Yu et al., 2010). The proteolytic genera Coprothermobacter and Sedimentibacter represented 104 and 103 sequences, respectively. The only other genus representing more than 5% of Firmicutes sequences is Syntrophomonas, a genus comprised of syntrophic acetogens capable of butyrate and propionate degradation. Within the class Bacilli, the two primary genera were Lactococcus and Bacillus, representing 31 and 29 sequences, respectively.

3.4.3.4 Bacteroidetes

Approximately 50% of the Bacteroidetes sequences, including the second largest bacterial OTU in the dataset and two other OTUs representing more than 50 sequences, could not be classified to any known class, with a further 190 sequences assigned to the group 'unclassified Bacteroidales' (Figure 3.2). In total, only about 30% of the Bacteroidetes sequences could be classified to one of 30 recognized genera (Figure S2). The ninth largest OTU within the bacterial dataset was assigned to the most abundant Bacteroidetes genus, Paludibacter, which in total represented 217 sequences. Paludibacter is exemplified by the type strain P. propionicigenes WB4, a mesophilic anaerobic bacterium that produces propionate and acetate as its primary fermentation products (Ueki et al., 2006). The second largest genus in this phylum was Prolixibacter, which is typified by P. bellariivorans F2, a psychrotolerant organism capable of sugar fermentation via the mixed acid pathway (Holmes et al., 2007). The third largest genus within the Bacteroidetes is Parabacteroides, which is saccharolytic and produces acetate and succinate as its primary fermentation end products. Proteiniphilum, the fourth largest genus, is defined by the proteolytic type strain P. acetatigenes, initially isolated from a mesophilic UASB (Chen and Dong, 2005).

3.4.3.5 Minor Phyla

In addition to the above discussed 'major' phyla, […] 'minor' phyla were represented by the retrieved sequences (Figures […]). Of the 'minor' phyla, only the Synergistetes, Planctomycetes, and Actinobacteria represented more than 1% of all the bacterial sequences. Some known genera were represented in these 'minor phyla' (Figures 3.2, S2). The genus Cloacibacillus, within the Synergistetes, contained the 5th and 7th largest OTUs in the bacterial dataset and was the single largest identified genus among the Bacteria. The type strain for this genus, C. evryensis str. 158, was isolated from an AD reactor at a municipal wastewater treatment plant and is capable of anaerobic degradation (Ganesan et al., 2008). The phylum Synergistetes itself was recently defined and includes the genus Aminobacterium, to which 176 sequences were assigned, as well as other protein-degrading bacteria. Comprising just over 6% of the bacterial sequences in the dataset, members of the Synergistetes most likely play an important role in the acidogenic phase (phase II) of the AD process.

The Planctomycetes was the second most abundant of the minor phyla; however, the majority of the sequences (60%) were derived from a single study using Planctomycetes-specific primers (Chouari et al., 2003). A further 30% were derived from the large-scale sequencing effort by Rivière et al. (2009), which included the same digester sampled by Chouari et al. (2003, 2005). Excluding these sequences gives only 40 sequences from 27 separate studies, suggesting a low and/or sporadic occurrence of Planctomycetes in typical AD systems. Within the phylum Actinobacteria, 26% of the sequences were associated with the propionate-producing suborder Propionibacterineae, a result that highlights the importance of propionate as an intermediate product during the AD process. Indeed, nearly 30% of the electrons generated from complex substrates flow through propionate during biomethanation (Speece et al., 2006). Interestingly, the largest actinobacterial genus was the recently described Iamia, a genus defined by a single type strain (I. majanohamensis str. F12) capable of nitrate reduction (Kurahashi et al., 2009). This genus also includes the previously characterized Microthrix parvicella, which has been found in numerous sewage treatment systems and is involved in foaming and sludge bulking problems (Blackall et al., 1996).

3.4.4 Diversity Estimates

For all of the discussed groups, the ACE estimate of richness was the highest, while the rarefaction-based estimate was the lowest (Table 3.1). The ACE and Chao1 estimates of maximum species richness differed greatly (43-47%) for the Bacteria, Chloroflexi, and Proteobacteria, while the corresponding estimates for the Archaea, Firmicutes, and Bacteroidetes differed by less than 19%. These differences stem from the method of correction each calculation uses and have been discussed elsewhere (Hughes et al., 2001).

In comparison, the richness estimates derived from the rarefaction curves differed much less from the Chao1 estimate than from the ACE estimate. Whereas individual studies represent the alpha diversity of microorganisms in a given digester, the composite dataset represents the global diversity of microorganisms in AD systems in general. Given the large array of options in digester design, operation, feedstock, and geographic region, it should not be surprising that the global diversity seen in the composite dataset is much greater than the alpha diversity reported for an individual digester.
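The size of the gap between the ACE and Chao1 estimates can be checked directly from the values in Table 3.1. The sketch below assumes the difference is expressed relative to the ACE estimate, i.e. (ACE - Chao1) / ACE; this definition is an assumption chosen because it is consistent with the ranges quoted in the text, not a formula stated in the source.

```python
# ACE and Chao1 richness estimates at the 0.03 cut-off, copied from Table 3.1.
ESTIMATES = {
    "Archaea": (362, 336),
    "Bacteria": (20538, 11717),
    "Chloroflexi": (3238, 1858),
    "Proteobacteria": (6548, 3498),
    "Firmicutes": (3184, 2674),
    "Bacteroidetes": (1494, 1221),
}


def relative_difference(ace, chao1):
    """Percent by which Chao1 falls below ACE (assumed definition)."""
    return 100.0 * (ace - chao1) / ace


rel_diff = {
    group: relative_difference(ace, chao1)
    for group, (ace, chao1) in ESTIMATES.items()
}

for group, pct in rel_diff.items():
    print(f"{group:15s} {pct:5.1f}%")
```

Under this definition the Bacteria, Chloroflexi, and Proteobacteria fall at roughly 43-47%, and the Archaea, Firmicutes, and Bacteroidetes fall below 19%.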

Despite numerous previous studies, the results of this study showed that the complete microbial diversity in AD reactors remains to be detailed. Rarefaction analysis of the Archaea showed that sampling at the phylum (0.20 phylogenetic distance) and family (0.10 phylogenetic distance) levels has reached a maximum, while sampling at the species (0.03 phylogenetic distance) level is incomplete (Figure 3.3). For the Bacteria, only sampling at the phylum level has neared its expected maximum while the curves for lower taxonomic ranks are still projecting upward. This is corroborated by the estimation of current coverage at the various taxonomic levels for the Bacteria (Table 3.2). At the species level, 90% of the expected archaeal diversity has already been observed, while only 61% of the bacterial diversity has been revealed. Rarefaction curves for the four 'major' phyla suggest that sampling of the Chloroflexi and Bacteroidetes is more complete than that of the Proteobacteria and Firmicutes as the curves have begun to reach a horizontal plateau (Figure 3.3c). The estimation of current coverage, however, shows that all four 'major' phyla have a similar percent coverage at the species level, which is similar to that of the Bacteria (Table 3.1). This is due to there being fewer unobserved OTUs for the Chloroflexi and Bacteroidetes than for the Proteobacteria and Firmicutes, even though the percent coverage is similar. Rarefaction analysis and the diversity statistics show that the known bacterial diversity in AD systems is incomplete below the phylum level. Nevertheless, the global diversity revealed in this study may serve as a framework for future studies of alpha and beta diversity. More specifically, the collective sequence dataset can be useful in developing molecular tools such as primers and probes to detect and quantify specific groups of either bacteria or archaea. Knowledge arising from such studies will shed light on the ecology of individual bacteria or archaea found in the dataset.
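The plateau behaviour used above to judge sampling completeness can be reproduced with a small rarefaction calculation. The sketch below uses the analytic (hypergeometric) expectation for the number of OTUs in a random subsample rather than repeated random resampling, so it is a stand-in for, not a reproduction of, the procedure used to generate the curves in this study; the OTU count vector is hypothetical.

```python
from math import comb


def expected_otus(otu_counts, depth):
    """Expected number of OTUs seen when `depth` sequences are drawn
    without replacement from a library with the given OTU sizes."""
    total = sum(otu_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must lie between 0 and the library size")
    all_draws = comb(total, depth)
    # For each OTU: 1 - P(no sequence from this OTU is drawn).
    return sum(1 - comb(total - n, depth) / all_draws for n in otu_counts)


# Hypothetical library: 100 sequences in 8 OTUs, three of them singletons.
library = [50, 30, 10, 5, 2, 1, 1, 1]

for depth in (1, 10, 25, 50, 100):
    print(f"{depth:3d} sequences -> {expected_otus(library, depth):.2f} OTUs")
```

A curve that is still rising steeply at the full library size indicates that further sequencing would be expected to reveal additional OTUs, whereas a curve that has flattened suggests that most OTUs at that distance have been observed.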

Current second-generation sequencing technologies are capable of providing sufficient coverage and depth to fully explore an individual sample or compare multiple samples through multiplexing. Studies examining multiple AD reactor designs, fed differing feedstocks and operated under various conditions, will not only greatly increase the number of sequences available for analysis, but also provide better datasets upon which to perform comparisons of the beta microbial diversity between individual AD systems. Such knowledge can aid in the design and operation of a system uniquely tailored to a specific feedstock or application, such as energy production. Additionally, the new sequences can be added to the composite sequence datasets analyzed in this study to help define the full diversity in anaerobic digesters. Knowledge of the full diversity may greatly advance our understanding of the microbiome underpinning the AD process and define the significance of individual bacteria and archaea.

One issue that the composite dataset cannot currently address is the beta diversity of species when comparing two or more AD systems. While the dataset contained sequences recovered from a wide array of digesters, beta diversity could not be determined because few studies both generated large sequence datasets and provided detailed information (e.g., design, feedstock, operation, etc.) on the AD system analyzed.

The different methodologies and sequence submission criteria used by different researchers also compromise such analyses. Rivière et al. (2009) recently defined a "core group alpha" of six bacterial OTUs from analyzing municipal sludge digesters. This "core group alpha" is comprised of bacteria affiliated with Bacteroidetes, Chloroflexi, Synergistetes and Betaproteobacteria. While these three phyla and one class were abundant in the composite dataset analyzed in this study, members of the Firmicutes or Alpha- and Delta-Proteobacteria were absent in core group alpha. Although it is possible that the unique environment within a specific AD reactor can select for a distinct microbiome, it is intriguing that only a small number of 'core OTUs' can be found among the large numbers of OTUs identified in anaerobic digesters. Systematic studies examining multiple AD reactor designs with greater depth of coverage should help further define the 'core microbiome' in digesters.

Many of the sequences analyzed in this study were classified as either uncultured bacteria or archaea. Although phylogenetic analysis of 16S rRNA gene sequences can provide some insight into the functional diversity of anaerobic digesters, function-oriented approaches are needed. Metagenomic studies, as well as techniques such as MAR-FISH and SIP, have been used to a limited degree to discover the function of uncultured bacterial groups (Ariesyady et al., 2007a; Schlüter et al., 2008). While these methods should probably be used more frequently, ultimately cultivation-based studies are also needed to define the functions of uncharacterized groups of bacteria and archaea in anaerobic digesters.

3.4.5 Analysis Considerations

Analysis of a composite dataset carries a few potential caveats and limitations that bear mentioning. While datasets from many different studies were combined in this study, the single dataset by Rivière et al. (2009) was the source of over 50% of the sequences analyzed. Furthermore, the two large datasets created by Chouari et al. (2003, 2005) both generated sequences from one of the same sources used by Rivière et al. (2009), a municipal WWTP digester in Evry, France. These three studies have the effect of artificially inflating the representation of digesters handling municipal sludge compared to other types of AD systems. Additionally, some studies focused solely on one or a few phylogenetic groups, again skewing the composite dataset by increasing sequences from these groups. This is best exemplified by the study by Chouari et al. (2003), where Planctomycetes-specific PCR primers were used. As such, while the global diversity defined by the dataset can represent the types of bacteria and archaea generally present in a digester, the abundance of some OTUs may not directly reflect the actual abundance of the corresponding organisms found in any given AD system.

A second issue is related to the decision, made by researchers, as to which sequences (all vs. unique) to submit to public databases and how to describe these sequences. One third of the sequences in the composite dataset were unpublished at the time they were accessed, and thus had little associated metadata accompanying them. These two issues become important when determining maximum species richness or assessing the current level of coverage, and they hinder determination of the beta diversity found between different digesters. As previously suggested by Schloss and Handelsman (2004), standardized criteria for the submission of sequences to public databases could help improve the quality of data available to researchers working in the AD field. As a beginning to this discussion, it is proposed that researchers investigating AD systems submit all high-quality sequences they have generated to the public databases (GenBank/EMBL/DDBJ) after thorough chimera checking. The resulting increase in sequence redundancy will greatly increase the confidence of maximum richness estimates. Information on key parameters, especially reactor design, feedstock, and operational temperature, should also be included.

3.5. Conclusions

Nearly 90% of the archaeal and 60% of the bacterial species-level diversity in anaerobic digestion (AD) systems has been observed. The bacterial phyla Chloroflexi, Proteobacteria, Firmicutes, and Bacteroidetes and the archaeal class Methanomicrobia are well represented by the available sequences, and the corresponding microorganisms are probably important participants in the AD process. The global diversity contains numerous groups for which there is no close cultured representative, especially the majority of sequences assigned to the phyla Chloroflexi and Bacteroidetes. Future studies will need to utilize multiple approaches to further characterize the microbial diversity and its function in individual AD systems.

Table 3.1 Diversity statistics for Archaea, Bacteria, and 'Major' phylum groups

Group            Total      % Unclassified  # of      ACE (a)  Chao1 (a)  Rarefaction     Current
                 Sequences  to Phylum       OTUs (a)                      Estimation (a)  Coverage (b)
Archaea           2869       2.15%            296        362       336        327          90%
Bacteria         16519      16.28%           5926      20538     11717       9646          61%
Chloroflexi       3744      -                 693       3238      1858       1157          60%
Proteobacteria    3585      -                1590       6548      3498       2658          60%
Firmicutes        2549      -                1352       3184      2674       2298          59%
Bacteroidetes     2436      -                 705       1494      1221       1076          66%

a: Values were calculated using a 0.03 dissimilarity cut-off
b: Coverage = # OTUs / Rarefaction Estimate

Table 3.2 Estimates of current taxonomic coverage for Archaea and Bacteria. Distances refer to the following approximate taxonomic ranks: 0.03 = species, 0.05 = genus, 0.1 = family, 0.2 = phylum

Domain    Distance  # OTUs  Rarefaction  Current
                            Estimation   Coverage (a)
Archaea   0.03        296      327        90%
Archaea   0.05        209      223        94%
Archaea   0.1         112      116        97%
Archaea   0.2          49       52        94%
Bacteria  0.03       5926     9646        61%
Bacteria  0.05       4981     7195        69%
Bacteria  0.1        3186     3754        85%
Bacteria  0.2        1142     1150        99%

a: Coverage = # OTUs / Rarefaction Estimate
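The coverage column can be recomputed from the footnote formula as a consistency check. The sketch below copies the OTU counts and rarefaction estimates from Table 3.2; the variable and function names are illustrative only. Because the tabulated estimates are rounded to whole OTUs, the recomputed percentages are expected to agree with the reported ones only to within about one percentage point.

```python
# (distance, observed OTUs, rarefaction estimate, reported coverage %) from Table 3.2.
TABLE_3_2 = {
    "Archaea": [
        (0.03, 296, 327, 90),
        (0.05, 209, 223, 94),
        (0.10, 112, 116, 97),
        (0.20, 49, 52, 94),
    ],
    "Bacteria": [
        (0.03, 5926, 9646, 61),
        (0.05, 4981, 7195, 69),
        (0.10, 3186, 3754, 85),
        (0.20, 1142, 1150, 99),
    ],
}


def coverage(observed_otus, rarefaction_estimate):
    """Coverage = # OTUs / Rarefaction Estimate, expressed as a percentage."""
    return 100.0 * observed_otus / rarefaction_estimate


for domain, rows in TABLE_3_2.items():
    for distance, observed, estimate, reported in rows:
        computed = coverage(observed, estimate)
        print(f"{domain:8s} {distance:4.2f}  computed {computed:5.1f}%  reported {reported}%")
```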


Figure 3.1 Phylogenetic tree of all sequences with phylum level branches grouped and labeled. Individual sequences, where shown, are labeled with their accession number.

Figure 3.2 Treemap of observed taxa shown in their hierarchical order. Treemap showing taxonomic ranking of all taxa with greater than 50 sequences for all 19,388 retrieved sequences. Taxonomic ranks are shown hierarchically as boxes within boxes, with each rank color coded as follows: Domain - white, Phylum - red, Class - orange, Order - yellow, Family - green, Genus - blue. The size of each box is proportional to the number of sequences assigned to that taxon with respect to the entire dataset. Each box is labeled according to its RDP classification, with domain and bacterial phylum headings and end-rank boxes including the number of sequences represented if size permits. The placement of boxes is arbitrary with respect to boxes within the same taxonomic rank and does not correspond to any form of phylogeny or relatedness. a: unclassified Lachnospiraceae. b: unclassified Veillonellaceae.

Figure 3.3 Rarefaction curves for the Archaea (a), Bacteria (b), and 'major' phyla (c). Curves were generated using mothur for the following distances: 0.03, 0.05, 0.10, and 0.20 for the Archaea and Bacteria, and for the four 'major' phyla: Chloroflexi, Proteobacteria, Firmicutes, and Bacteroidetes. For all graphs, the x-axis represents the number of sequences analyzed and the y-axis represents the number of OTUs generated.

CHAPTER 4: MICROBIAL DIVERSITY CHANGES IN THE BIOMASS FRACTIONS OF AN ANAEROBIC DIGESTER AS A RESULT OF FEEDSTOCK CHANGES

4.1 Abstract:

To date, few studies have examined the effect of the microbial community used to seed an anaerobic digestion (AD) system during initial system startup, or whether this community establishes itself as the predominant community once steady state operation is achieved. Further, no studies have examined the differences in microbial diversity that occur between granular biomass and cells that remain in the planktonic or liquid phase. This study was focused on examining the changes in microbial diversity between the initial seed sludge and the subsequent community that becomes established after steady state operation has been achieved.

Using 16S rRNA clone libraries, the diversity of microbes in both the granular and liquid biomass fractions of 3 AD sludge samples was examined. The results showed that each sludge sample had a unique bacterial community associated with it, with the distribution of sequences at the phylum level highly variable. This suggests that each feedstock enriched a microbial population uniquely suited to it. Differences between the granular and liquid biomass fractions of each sample were less pronounced than differences attributable to the change in feedstock between samples. The most prominent difference between the two fractions was an increased abundance of Methanolinea in the liquid fraction compared to the predominance of Methanosaeta in the granular fraction, suggesting that there are different functional groups in each fraction. Further analyses will be necessary to fully explore the effect of biomass fraction on microbial diversity.

4.2 Introduction:

The anaerobic digestion (AD) of organic waste to biogas relies on a complex microbial community to carry out the conversion of complex feedstocks into methane biogas. During initial startup of an AD system, it is common to use a seed sludge to establish a microbial population in the reactor environment prior to feedstock input (Ahring, 2003). In such cases, the goal is to provide an active microbial community in the reactor that will quickly begin the digestion process without causing a buildup of harmful intermediate metabolites such as volatile fatty acids (VFAs). The source of seed sludge can vary, but common sources include wastewater treatment plants (WWTPs) and previously established AD systems (Ahring et al., 2002; Angenent and Sung, 2001). While the microbial community in the seed sludge is expected to become the predominant community in the reactor during operation, there have only been a few direct comparisons of the microbial community in the seed sludge with the community present during steady state operation (Akarsubasi et al., 2005; Hernon et al., 2006).

The two most common types of AD systems are the upflow anaerobic sludge blanket (UASB) and continuously stirred tank reactor (CSTR). In UASB systems, the majority of the microbial biomass forms compact granules which settle to the bottom of the reactor and help prevent biomass washout when operating the system with low hydraulic retention times (Skiadas et al., 2003). In contrast, in CSTR systems the biomass typically does not exhibit granulation but is instead dispersed. The majority of previous studies analyzing the microbial communities in AD systems have focused primarily on whole sludge samples (e.g. Chouari et al., 2005; Lee et al., 2008). Several studies have specifically examined the microbial consortia that comprise granules, finding that the methanogen Methanosaeta, fermentative Clostridia, and syntrophic acetogens comprise the majority of species (Díaz et al., 2006). The microbial community that exists in the liquid fraction of the digester, however, has not been studied as a separate biomass fraction on its own.

While sludge from UASB and CSTR systems has been used as seed sludge for the startup of the opposite system (Akarsubasi et al., 2005; Angenent et al., 2004), there have been no investigations to determine if the granular or liquid biomass fractions, or both, serve to establish a predominant community during steady state operation. Thus, in this study we used sludge from a UASB at a commercial jam and jelly manufacturer to seed a pilot scale sand bed filter reactor operating on food processing waste from a commercial potato chip manufacturer. The sludge from this run was subsequently used as seed for a second run operating on processing wastewater from a commercial Swiss cheese producer. Sludge samples were fractionated into liquid and granular biomass fractions and 16S rRNA clone libraries for the archaea and bacteria were constructed and sequenced. Bioinformatics analysis was used to examine the alpha and beta diversity in the biomass fractions and in the whole sludge samples.

4.3 Materials and Methods:


4.3.1 Digester Descriptions:

Two runs of a novel, pilot scale (8000 gal working volume) sand bed filter reactor were conducted in 2006 using food processing wastes. The source inoculum for the first run of the pilot scale reactor was granular sludge obtained from a stable UASB digester operating at a commercial jam and jelly processing facility (sample designation S). While exact chemical data for the feedstock were not provided to us, this feedstock contained a high concentration of simple sugars and pectin and would be similar to that used by Mohan and Sunny (2008). A 50 ml sample of the seed sludge was frozen at -80°C for later microbial analysis. The feedstock for the initial operating run consisted of off-spec corn and potato chips from a commercial snack manufacturer and contained a high concentration of starches and lipids. After a 7 month operational period, the digestion process was shut down and a bulk collection of sludge was made. The majority of this sludge was held at 4°C prior to being used as the seed inoculum for the second run. A 50 ml sample of this sludge was frozen at -80°C for use in microbial analysis after completion of the pilot study (sample N). The second run of the digester was operated similarly to the first but for an 8 month period. After digester shutdown, a 50 ml sample of sludge was preserved (sample B) at -80°C for further analysis. The waste feedstock for the second operating run was processing wastewater from a commercial Swiss cheese manufacturer and consisted of a high concentration of lactose and whey. Due to the commercial nature of the feedstocks and digester operation, chemical data for the feedstocks and operating performance data for the digester were not provided to us.

4.3.2 Sample Processing and DNA Extraction:


Sludge samples were fractionated into a liquid (L) or granular (G) biomass fraction by gravity filtration through Whatman grade 1 filter paper (Kent, UK). Granular biomass was gently washed with 3 ml sterile H2O before transfer to a collection tube. The first flow-through from the filter, i.e. not including the 3 ml wash, was collected, transferred to microcentrifuge tubes, and centrifuged at 16,000 x g to pellet planktonic cells in the liquid fraction. Cell pellets were combined to achieve a combined weight of 0.25 g prior to DNA extraction. For both granular and liquid biomass, 0.25 g of sample was used for DNA extraction by the RBB+C method (Yu and Morrison, 2004).

4.3.3 PCR Amplification and Clone Library Construction:

Cloning, clone selection, and sequencing were performed as described by Cressman et al. (2010). Briefly, nearly full length 16S rRNA gene fragments were PCR amplified and then cloned into the TOPO-TA pCR4.0 (Invitrogen, Carlsbad, CA) sequencing vector and transformed into E. coli. Individual colonies for each sample fraction (SG, SL, NG, NL, BG, BL) were grown in three 96-well culture blocks (total 1728 clones) and screened using RFLP to reduce the number of redundant sequences. After RFLP screening, 192 colonies were selected for each sample fraction for sequencing (total of 1152 bacterial sequences). The archaeal cloning procedure was similar to the bacterial, with initial amplicons generated using the Arc8f and 1492r primers (Lane, 1991). Only 864 archaeal clones were sequenced due to lower diversity seen during RFLP analysis. The sequencing primer used for both the Bacteria and Archaea was Unv907r (Lane, 1991). Sequences identified as being representative of an OTU for the entire dataset were resequenced using the primers M13f and 530f (Lane, 1991) to obtain longer sequences.

4.3.4 Sequence Analysis:

Returned sequences were trimmed based on a quality score cutoff of 40 or higher using 4Peaks (http://mekentosj.com/science/4peaks/). Trimmed sequences for the Bacteria and Archaea were checked for chimeric sequences using Mallard and Pintail (Ashelford et al., 2005; Ashelford et al., 2006). Sequences were aligned using the Greengenes NAST aligner and the resulting aligned files were imported into the Greengenes ARB database (DeSantis et al., 2006). A phylogenetic tree was constructed by inserting all of the sequences into the main Greengenes tree (tree_all) using the parsimony method and removing sequences not associated with this study. A distance matrix of all sequences was generated within ARB using the Jukes-Cantor correction. The distance matrix was analyzed using mothur to generate rarefaction curves, diversity indices, and OTU groupings (Schloss et al., 2009). Unaligned sequences were classified using the RDP Classifier (Wang et al., 2007). Sample composition was compared based on sequence classifications using the Libcompare function of the RDP, the OTU-based libshuff command in mothur, and the tree-based UniFrac (Lozupone et al., 2006; Schloss et al., 2009). Venn diagrams were generated using mothur (Schloss et al., 2009).
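The Jukes-Cantor correction mentioned above converts the observed proportion of differing sites between two aligned sequences (p) into an estimated number of substitutions per site, d = -(3/4) ln(1 - 4p/3). The following is a minimal sketch of that calculation only; the function names and the handling of gap characters are assumptions made for illustration and do not reproduce the ARB implementation or its filter settings.

```python
import math

def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Positions where either sequence has a gap ('-' or '.') are skipped;
    this gap handling is an illustrative assumption.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in "-." or b in "-.":
            continue
        compared += 1
        if a != b:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared

def jukes_cantor(p):
    """Jukes-Cantor distance d = -(3/4) * ln(1 - 4p/3).

    The correction is undefined once p reaches 0.75 (saturation).
    """
    if p >= 0.75:
        raise ValueError("Jukes-Cantor distance is undefined for p >= 0.75")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)

# Example: one mismatch in four aligned positions gives p = 0.25.
p = p_distance("ACGT", "ACGA")
d = jukes_cantor(p)
```

For small distances the correction changes little, so an uncorrected difference of 0.03 corresponds to a corrected distance only slightly above 0.03.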

4.3.5 Nucleotide Accession Numbers:

The nucleotide accession numbers of the sequences reported in this study are:

GU388436:GU388842, GU388844:GU388924, GU388926:GU388949,

GU388951:GU389313, GU389315:GU389338, GU389340:GU389376,

GU389378:GU389428, GU389430:GU389451, GU389453:GU389484,


GU389486:GU389623, GU389625:GU389686, GU389688:GU389963,

GU389965:GU389976, GU389978:GU389984, GU389986:GU390080, GU390082,

GU390084:GU390098, GU390100:GU390118, GU390120:GU390121,

GU390123:GU390201, GU390203:GU390230

4.4 Results:

4.4.1 Sludge Sample Descriptions:

Sample S, taken from a stable operating UASB at a commercial jam and jelly manufacturer, had large black granules, around 2-3 mm in size, while sample N, the sludge resulting from the first run of the pilot scale sand bed filter reactor, had finer granules between 1 and 2 mm in size. The biomass of the sample B sludge was generally dispersed, with very little granulation.

4.4.2 Collective Diversity Summary:

A total of 1775 sequences, 969 bacterial and 806 archaeal, were recovered after trimming and chimera checking. These sequences were processed using mothur to produce a total of 224 bacterial and 20 archaeal OTUs for the entire dataset.

Classification of all sequences showed 10 bacterial and 1 archaeal phyla across all samples in total. Combined, 4 bacterial phyla, the Proteobacteria, Firmicutes,

Chloroflexi, and Bacteroidetes represented roughly 99% of the bacterial sequences as seen in Table 4.1. All of the archaeal sequences except for two were classified to the methanogenic class Methanomicrobia, with the two exceptions being classified to the order or as unclassified Euryarchaeota respectively (Figure 4.1).


Interestingly, no sequences corresponding to the other primary methanogenic class, the

Methanobacteria, were recovered from the samples. Overall, the genus Methanosaeta represented over 80% of all archaeal sequences, while the genus Methanolinea represented just over 10% (Figure 4.1).
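The phylum-level proportions summarized above can be recomputed from the row totals in Table 4.1. The sketch below is only an arithmetic check using the counts transcribed from that table; the variable names are illustrative. Under these counts, the four major phyla account for about 77% of all bacterial sequences and about 99% of the sequences that could be classified to a phylum, so the figure quoted above appears to refer to the classified sequences, although that reading is an inference.

```python
# Per-fraction counts transcribed from Table 4.1. Only the sums are used,
# so the column position of empty cells does not matter here.
counts = {
    "Bacteria (all)": [147, 172, 172, 158, 162, 158],
    "unclassified Bacteria": [15, 4, 54, 49, 52, 37],
    "Proteobacteria": [54, 87, 53, 49, 4, 46],
    "Firmicutes": [69, 61, 55, 49, 5, 11],
    "Chloroflexi": [1, 4, 2, 101, 50],
    "Bacteroidetes": [8, 15, 3, 6, 14],
}

total = sum(counts["Bacteria (all)"])
unclassified = sum(counts["unclassified Bacteria"])
major_phyla = ["Proteobacteria", "Firmicutes", "Chloroflexi", "Bacteroidetes"]
major = sum(sum(counts[name]) for name in major_phyla)

# Share of the four major phyla among all bacterial sequences.
share_all = major / total
# Share among sequences that were classified to some phylum.
share_classified = major / (total - unclassified)

print(f"total bacterial sequences: {total}")
print(f"share of all sequences: {share_all:.1%}")
print(f"share of phylum-classified sequences: {share_classified:.1%}")
```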

4.4.3 Whole Samples:

Sample S had the fewest number of sequences that were unable to be classified to a phylum compared to samples N and B (Table 4.1). The Proteobacteria and Firmicutes represented the bulk of sample S diversity (~85%) with the Bacteroidetes representing

7%. The primary proteobacterial constituents were unclassified Pseudomonadaceae,

Arcobacter, and Simplicispira. Interestingly, only 8 sequences were assigned to the

Deltaproteobacteria, all from the granular (SG) fraction, suggesting a lower than expected abundance of syntrophic bacteria. The Firmicutes was primarily comprised of members of the Clostridia, with only 1 sequence assigned to the genus . In the SG fraction, the primary Firmicutes taxa were the genus Fusibacter and unclassified

Clostridiales while in the SL fraction they were Fusibacter, unclassified Veillonellaceae, and unclassified Ruminococcaceae, the latter two being significantly greater in the SL fraction than in the SG fraction as determined using the RDP Libcompare utility. While nearly all of the archaeal sequences from the SG fraction were assigned to the genus

Methanosaeta, the sequences from the SL fraction showed a nearly 50:50 split between

Methanosaeta and Methanolinea (Figure 4.1). The SL fraction was the only sample fraction to have such a large number of Methanolinea associated sequences.


Similar to sample S, the primary phyla recovered from sample N were the

Proteobacteria and Firmicutes, however the number of unclassified Bacteria increased to nearly 1/3 of all sequences from only 6% for sample S. Within the Proteobacteria, over

75% of sequences were classified within the family Syntrophaceae, with 10% classified within the Pseudomonadaceae. The Firmicutes was primarily comprised of the genus

Clostridium, members of the Ruminococcaceae, and the genera Tissierella and

Sedimentibacter. For the Archaea, nearly all sequences from the NG fraction were classified to Methanosaeta, while in the NL fraction nearly 18% of sequences were classified as unclassified Methanomicrobiales. The sole unclassified Euryarchaeota sequence in the dataset (GU388956) was found in the NL fraction and shares 99% sequence identity with sequences belonging to the WSA2/ArcI group of methanogens

(Figure 4.1).

Unlike the S and N samples, sample B had a much lower overall richness, with the

BG fraction having less than half the number of OTUs compared to the SG and NG fractions (Table 4.2). The majority of sequences in the B sample were either unclassified

Chloroflexi (47%) (Figure 4.2) or unclassified Bacteria (28%), with the Proteobacteria representing the next largest phylum at 16%. In the BG sample fraction, nearly 2/3 of sequences were unclassified Chloroflexi compared to only 1/3 in the BL fraction (Table

4.1). Only 9 sequences from the BG fraction were not assigned to either the Chloroflexi or unclassified Bacteria, with 5 sequences assigned to the Clostridia, 2 to the

Gammaproteobacteria, and 2 to the . The BL fraction had fewer sequences assigned to the Chloroflexi and unclassified Bacteria than the BG fraction, and

an increased number of proteobacterial and Firmicutes sequences. Sequences classified to the Proteobacteria were primarily within the Deltaproteobacteria, with 17% of BL sequences classified within the Syntrophaceae (Table 4.1).

4.4.4 Sample Comparisons

As seen in Table 4.2, the Shannon diversity indices for granular and liquid biomass fractions were similar within the S and N samples, while the BG fraction had a much lower value than the BL fraction. Comparisons of the whole sludge samples showed that all three samples harbored statistically unique microbial communities based on UniFrac and libshuff analysis. As seen in Figure 4.3, the granular and liquid fractions of a single sample were more similar to each other than to the respective fractions of either of the other two samples. The BG and BL fractions, however, showed greater variation from each other, with the BL fraction being more similar to the fractions of the S and N samples than the BG sample. Comparing the number of shared OTUs, the three granular fractions shared 3 OTUs, with 6 OTUs shared between all of the liquid fractions (Figure

4.4). Representative sequences of the three shared OTUs from the granular fraction were classified to the genera Methanosaeta and Anaerofustis while the OTU representatives of the shared liquid fraction were classified to Methanosaeta, Methanolinea, and

Pseudomonadaceae. As seen in Table 4.2, the SG and NG fractions had a similar number of OTUs at the 0.03 dissimilarity cutoff, while the BG fraction had less than half the number of OTUs of the other two granular fractions. A similar result was seen for the liquid fractions, although not to the same degree, with the BL fraction having only 80% as many OTUs. The differences in OTU abundance for each of the sample fractions are similar to differences in the values for the Shannon index of diversity (H′) for the fractions.
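For reference, the two diversity statistics reported in Table 4.2 can be sketched from a list of per-OTU sequence counts. The Shannon index is H′ = -Σ p_i ln p_i, and Chao1 estimates total richness from the number of singleton (F1) and doubleton (F2) OTUs. The block below is an illustrative sketch using one common, bias-corrected form of Chao1; the function names are assumptions, and the exact estimator variant applied by mothur to these data is not restated here.

```python
import math

def shannon_index(otu_counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    n = sum(otu_counts)
    if n == 0:
        raise ValueError("no sequences")
    return -sum((c / n) * math.log(c / n) for c in otu_counts if c > 0)

def chao1(otu_counts):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    s_obs = sum(1 for c in otu_counts if c > 0)
    f1 = sum(1 for c in otu_counts if c == 1)  # singleton OTUs
    f2 = sum(1 for c in otu_counts if c == 2)  # doubleton OTUs
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical OTU abundance vector: 7 observed OTUs, 3 singletons, 2 doubletons.
example_counts = [10, 5, 2, 2, 1, 1, 1]
richness_estimate = chao1(example_counts)
diversity = shannon_index(example_counts)
```

A community dominated by a few abundant OTUs yields a low H′ even when many rare OTUs are present, which is consistent with the low value reported for the BG fraction.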

4.5 Discussion:

The design of this study incorporated two different but interconnected goals: to examine changes in microbial communities in response to different feedstock conditions following adaptation, and to examine differences in the granular and liquid biomass fractions. As expected for a UASB system, the sample S sludge was highly granular.

Despite the sludge biomass of CSTR type systems generally being dispersed, the N sample sludge also had granular biomass. This was previously seen by Akarsubasi et al.

(2005) using a laboratory scale CSTR seeded with a UASB derived sludge as well as in previous operating runs of the sand-bed filter reactor (unpublished data). This is most likely due to the uncoupling of the hydraulic and solids retention times caused by the retention of biomass by the sand-bed, which allows for granular biomass to develop. The sample B sludge, however, primarily had a dispersed biomass with very little granulation and was similar to sludge samples recovered from other CSTR systems.

The overall taxonomic diversity in the dataset was similar to the global taxonomic diversity seen in anaerobic digesters (Chapter 3). The Archaea were found to be primarily divided between the two genera Methanosaeta and Methanolinea.

Methanosaeta has been proposed as one of the primary species responsible for methanogenic granule formation (Angenent et al., 2004; Zheng et al., 2006) and is capable of growth solely on acetate (Kamagata et al., 1992). The predominance of


Methanosaeta in the granular fractions of all samples follows expectations of their role in granule development. The presence of Methanosaeta in the liquid fractions would also be expected as planktonic cells would be expected to serve as the seed for granule formation (Angenent et al., 2004; Hulshoff Pol et al., 2004). Methanolinea was isolated from a municipal sewage sludge digester and is capable of growth on H2 and formate

(Imachi et al., 2008). Interestingly, Methanolinea was primarily found in the SL fraction, although sequences classified to the Methanomicrobiales and phylogenetically related to

Methanolinea also represented over 20% of archaeal sequences in the NL fraction. This

suggests that hydrogenotrophic methanogens could play an active role in consuming H2 and/or formate in the liquid fraction while supporting the finding that acetoclastic methanogenesis is the primary pathway used in anaerobic granules (Angenent et al., 2004; Leclerc et al., 2004).

As seen in previous studies (Chouari et al., 2005; Rivière et al., 2009, Chapter 3) the primary bacterial phyla in the dataset were the Proteobacteria, Firmicutes,

Chloroflexi, and Bacteroidetes. While the overall diversity was similar to patterns seen in previous digester samples, it was not evenly distributed between the three whole sludge samples, particularly at the phylum level. The Chloroflexi were almost exclusively found in sample B, comprising nearly 50% of sequences recovered from that sample in total, with the BG fraction particularly enriched (Table 4.1). Conversely, less than 1% of sequences from samples S and N were related to the Chloroflexi. Rivière et al. (2009) and Levén et al. (2007) both witnessed large proportions (25-45%) of

Chloroflexi sequences in municipal WWTP and household organic waste sludge samples

respectively; however, members of the Chloroflexi have not previously been identified in

AD systems operating primarily on whey (Lee et al., 2008). It is possible that this result is due to the selective enrichment of the latent Chloroflexi found in the seed sludge

(sample N) during the operational period for sample B. This would support Baas Becking's statement “Everything is everywhere, but, the environment selects”; however, the depth of sequence coverage for the samples cannot adequately confirm this hypothesis. Due to the phylogenetic distance between the Chloroflexi sequences from this study and known isolates (Figure 4.2), the metabolic function of the particular

Chloroflexi species recovered in this study cannot be conclusively resolved; however, their prevalence in sample B possibly indicates a preference for the simple sugars and proteins found in cheese processing wastewaters. This would agree with previous findings for the metabolism of Chloroflexi species isolated from AD systems, which ferment various sugars into acetate and other SCFAs (Sekiguchi, 2006; Yamada et al.,

2007; Yamada et al., 2006).

Presumably due to the high percentage of Chloroflexi, the BG fraction had significantly lower proportions of Proteobacteria, Firmicutes, and Bacteroidetes than the

BL fraction or fractions of the other two samples. This result is reflected in the species richness measurements, where samples S and N had similar species richness while sample B had roughly 60-65% of the richness of either the S or N sample. Because of the overall predominance of Chloroflexi sequences in sample B, statistical testing with

UniFrac and libshuff found no significant difference between the two biomass fractions for sample B. Principal coordinate analysis (PCoA) using UniFrac-determined sample correlations, however, showed that the sample B fractions were more dissimilar to each other than the sample S and N fractions were, with the BL fraction being more similar to samples S and N than to the BG fraction (Figure 4.3). The sample B sludge had a dispersed biomass with only slight granulation, while samples S and N were highly granular. It is likely that the granular biomass of sample N, which was used to seed the reactor for the second (sample B) run, began to break down and become dispersed during digester operation, with members of the Chloroflexi forming the remaining granular biomass while other species became planktonic. This would agree with findings by

Yamada et al. that suggest that members of the Chloroflexi are associated with sludge bulking in AD systems (Yamada et al., 2005).

As sample S served as the seed sludge for the sample N digester run, we expected to see similar microbial diversity between the two samples, with constituents of the S sludge becoming the primary constituents in sample N. Overall the two samples shared only 17 OTUs, ~15% of the total for either sample, 6 of which were classified to the genus Methanosaeta. Classification based comparison showed that the S and N samples both had similar numbers of Firmicutes, but that the distribution and abundance at lower taxonomic levels were different. For sample S, the largest represented genus was

Fusibacter while in sample N the largest genus was Clostridium. In both cases, no sequences from the predominant genus of one sample were witnessed in the other sample.

The abundance of Fusibacter is particularly interesting as only 2 sequences were recovered from previously published AD 16S sequence data sets (Chapter 3). DGGE analysis showed that a digester operating on a saline waste activated sludge harbored a large proportion of Fusibacter related species (Shin et al., 2010); however, the salinity of the food processing feedstock used for sample S is not known. Within the Proteobacteria, the primary differences between the S and N samples were the large numbers of Epsilon- and Gammaproteobacteria in the S sample and nearly no Deltaproteobacteria. In the SL fraction the largest identified genus was Arcobacter, which is commonly associated with fecal contaminated wastewaters (Collado et al., 2010), while no sequences classified to the Epsilonproteobacteria were recovered from the N sample. Curiously, no

Deltaproteobacteria, a class encompassing a number of syntrophic genera (Hatamoto et al., 2007), were found in the SL fraction while they were recovered in nearly equal abundance in both N sample fractions. This dearth of syntrophic bacteria in the S sample fails to correspond with the number of sequences recovered corresponding to

Methanosaeta, which usually forms a close association with syntrophic bacteria as a source of acetate (Angenent et al., 2004). Conversely, the large abundance of syntrophic bacteria in both fractions of sample N corresponds with the need to convert the high concentrations of SCFAs that would result from the oxidation of fats and lipids in the feedstock.

4.6 Conclusions:

Overall, the microbial diversity in the study dataset was similar to patterns seen previously (Chapter 3), however the diversity of individual samples and sample fractions was not homogenous. As exact chemical data on the feedstocks was unavailable, only tentative statements can be made concerning the direct effect of the feedstock on the resultant microbial community, however statistical analysis showed that feedstock had

the largest effect on determining sample variation, as seen in Figure 4.3. While large phylum level changes in diversity were not seen between samples S and N, shifts at lower taxonomic levels were identified, especially in the Proteobacteria. Sample B was represented by sequences associated with the Chloroflexi at a level of enrichment not previously seen in other AD systems (e.g. Rivière et al., 2009). The source of the

Chloroflexi in sample B is presumed to be the seed, sample N sludge as sequences from both are highly similar. This would fit the statement “Everything is everywhere, but, the environment selects” but would require further sequencing effort to confirm.

Pyrosequencing libraries would possess sufficient sequencing depth to determine if the feedstock is selecting latent species in the seed sludge or providing a new community of species itself.

Differences between the granular and liquid biomass fractions were observed for all samples, with hydrogenotrophic methanogens from the order Methanomicrobiales being more prevalent in the liquid fraction and the acetotrophic genus Methanosaeta overwhelmingly dominant in the granular fractions, suggesting that hydrogenotrophic methanogenesis occurs primarily in the liquid fraction. Differences between fractions were also seen for the Bacteria, indicating that granular biomass may not solely be responsible for degradative activity in anaerobic digesters. Future analyses of the microbial diversity in AD systems that develop granular biomass should examine both the granular and liquid biomass fractions in order to form a comprehensive picture of the activity within the system.


Table 4.1: Classification table of sequences for all sample fractions

Classification SG SL NG NL BG BL

Bacteria 147 172 172 158 162 158

unclassified Bacteria 15 4 54 49 52 37

Deferribacteres - Caldithrix 1

Firmicutes 69 61 55 49 5 11

unclassified Firmicutes 3 1

Bacilli - Lactobacillus 1

unclassified Clostridia 4 2 2 2

unclassified Clostridiales 20 10 10 11 4

Incertae Sedis XIV - Proteocatella 1

Incertae Sedis XIII - Anaerovorax 7 3 1 2 3

unclassified Incertae Sedis XII 1 2

Incertae Sedis XII - Fusibacter 21 11

Incertae Sedis XII - Acidaminobacter 1

Incertae Sedis XI - Sedimentibacter 4 2

Incertae Sedis XI - Tissierella 3 2 7 6 1

unclassified Ruminococcaceae 9 7 7

Ruminococcaceae - Acetivibrio 1 3 2 3

Ruminococcaceae - Ethanoligenens 1

Ruminococcaceae - Sporobacter 1

unclassified Lachnospiraceae 1 1

unclassified Veillonellaceae 5 19 2 1 2

unclassified 1 1

Eubacteriaceae - Eubacterium 1

Eubacteriaceae - Anaerofustis 1 1 1

Clostridiaceae - Clostridium 15 16

Spirochaetes - Treponema 2 1

Bacteroidetes 8 15 3 6 14

unclassified Bacteroidetes 5 6 2 6 11

unclassified Bacteroidetes incertae sedis 2 1

Bacteroidetes incertae sedis - Prolixibacter 1 8 1

unclassified Bacteroidales 2

Rikenellaceae - Rikenella 1

Proteobacteria 54 87 53 49 4 46

unclassified Proteobacteria 1 4 2 1

Epsilonproteobacteria 8 31

Helicobacteraceae - Sulfurovum 1 4

Campylobacteraceae - Arcobacter 7 27


Deltaproteobacteria 8 43 38 2 33

unclassified Deltaproteobacteria 2 2 1

unclassified Syntrophaceae 16 16 1 13

Syntrophaceae - Smithella 3 25 14 1 7

Syntrophaceae - Syntrophus 1 7

Syntrophobacteraceae - Syntrophobacter 1 2 3 1

Geobacteraceae - Geobacter 2

Syntrophorhabdaceae - Syntrophorhabdus 2 2

Desulfobacterales - Desulfobulbus 2

Betaproteobacteria 18 18 1 8

Burkholderiaceae - Polynucleobacter 1

unclassified 1

Alcaligenaceae - Castellaniella 1

Alcaligenaceae - Bordetella 1

unclassified Oxalobacteraceae 1

unclassified Comamonadaceae 1

Comamonadaceae - Simplicispira 17 16 1

Comamonadaceae - Delftia 1

Comamonadaceae - Hydrogenophaga 1

Comamonadaceae - Polaromonas 1

Hydrogenophilales - Thiobacillus 1

Rhodocyclales - Propionivibrio 1

Alphaproteobacteria 3 2 2 1

unclassified Alphaproteobacteria 1

unclassified Rhizobiales 1

Rhizobiaceae - Rhizobium 2

Beijerinckiaceae - Chelatococcus 1

Caulobacterales - Phenylobacterium 1

unclassified Rhodobacteraceae 1

Rhodobacteraceae - Rhodobacter 1

Gammaproteobacteria 17 35 5 9 3

unclassified Gammaproteobacteria 2 2

unclassified Pseudomonadaceae 11 30 1 5 1

Pseudomonadaceae - Pseudomonas 6 4 2 2 1

Xanthomonadales - Pseudoxanthomonas 1

Alteromonadales - Shewanella 1

- unclassified Acidobacteria 1

Acidobacteria - Gp18 1

Planctomycetes - unclassified Planctomycetaceae 1

OP10 - genera incertae sedis 1 1

Lentisphaerae - Victivallis 2

Chloroflexi - unclassified Chloroflexi 1 4 2 101 50


Table 4.2: Diversity statistics for all sample fractions

Sample ID   No. of sequences   No. of OTUs   No. of Whole Sample OTUs   No. OTUs Unique to Sample Fraction   Chao1    Shannon

SG          286                71            110                        43                                   116.88   2.98

SL          304                60            110                        35                                   91.23    3.05

NG          302                73            104                        38                                   149.15   3.29

NL          291                60            104                        24                                   87.00    3.28

BG          295                31            67                         16                                   65.20    1.91

BL          294                50            67                         23                                   69.46    2.54


Figure 4.1: Phylogenetic tree of archaeal OTU sequences with nearest neighbor isolates and representative WSA2/ArcI sequence. OTUs are identified using the accession number of the representative sequence, followed by the number of sequences assigned to that OTU for each sample fraction.


Figure 4.2: Phylogenetic tree of Chloroflexi OTU sequences with selected nearest neighbor isolates and uncultured sequences. OTUs are identified using the accession number of the representative sequence, followed by the number of sequences assigned to the OTU for each sample fraction.


[Figure 4.3 plot. Horizontal axis: Component 1, 29.8% of variation explained. Vertical axis: Component 2, 27.3% of variation explained. Plotted points: BG, BL, NG, NL, SG, SL.]

Figure 4.3: Principal Coordinates Analysis plot of sample fractions determined using the Unifrac metric. The plot represents 57.5% of total variance in the dataset, with the horizontal axis representing 29.8% and the vertical axis representing 27.7% of total variance respectively.


Figure 4.4: Venn diagram of shared OTUs between samples for the granular (left) and liquid (right) biomass fractions. Each sample fraction is represented by a circle whose size is proportional to the number of OTUs recovered from that fraction.
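The shared and unique OTU counts summarized in a Venn diagram of this kind reduce to set intersections and differences over the OTU labels observed in each fraction. The sketch below illustrates the idea with hypothetical OTU labels; the names and memberships are invented for illustration and are not taken from the study data or from mothur output.

```python
def shared_otus(*fractions):
    """OTUs present in every one of the given fractions."""
    sets = [set(f) for f in fractions]
    return set.intersection(*sets)

def unique_otus(target, *others):
    """OTUs present in the target fraction and in none of the others."""
    remaining = set(target)
    for other in others:
        remaining -= set(other)
    return remaining

# Hypothetical OTU memberships for three granular fractions.
sg = {"otu01", "otu02", "otu03", "otu04"}
ng = {"otu01", "otu03", "otu05"}
bg = {"otu01", "otu06"}

shared_by_all = shared_otus(sg, ng, bg)
only_in_sg = unique_otus(sg, ng, bg)
```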


CHAPTER 5

ANALYSIS OF THE SHIFTS IN MICROBIAL COMMUNITY DIVERSITY OVER TIME IN AN ANAEROBIC DIGESTER TREATING ETHANOL DISTILLERY WASTES

5.1 Abstract:

The microbial communities in anaerobic digestion (AD) systems are highly variable and influenced by a wide array of factors related to the operational performance of the system. Previous attempts to analyze the structure and composition of these communities, and their relationships to performance variables, often relied on small sequencing datasets, which are unable to provide sufficient coverage of the actual microbial diversity in such systems. We developed a set of bacterial and archaeal primers for use in multiplex pyrosequencing analysis of six AD sludge samples taken from an AD system that operated for over a year. The results showed that during a change from mesophilic to thermophilic operation, the primary microbial community shifted from one comprised primarily of members of the phylum Chloroflexi to one comprised primarily of members of the phylum Thermotogae. This corresponded to an increase in unclassified

Euryarchaeota, which remained the predominant archaeal group even after a return to mesophilic operation. Canonical correspondence analysis (CCA) was used in an exploratory manner in an attempt to link the observed microbial diversity to known operational parameters for the system. This analysis showed that temperature had the greatest identifiable effect on community structure, while other examined variables had

no strongly identifiable community association. Interestingly, the results of both the CCA and traditional beta diversity measures showed that the initial and final sample points shared the most similar microbial community, despite separation of the sampling points by a period of over 1 year.

5.2 Introduction:

Anaerobic digestion (AD) is the process by which complex organic matter, typically a high strength organic waste stream, is degraded by a mixed microbial consortium to produce biogas. The microbial diversity found within AD systems is highly varied, as discussed in Chapter 3, and has been examined using a number of different technologies including FISH, DGGE, SSCP, and traditional culture methods

(Hori et al., 2006; Leclerc et al., 2004; Lee et al., 2008; Supaphol et al., 2011). Currently, the preferred method for examining the microbial diversity of AD systems is through the cloning and sequencing of 16S rRNA genes. Numerous studies have used this technique to examine both the alpha and beta diversity of AD sludge samples derived from different sources, detailing an extensive range of microbial diversity (Chouari et al., 2005;

Figuerola and Erijman, 2007; Rivière et al., 2009). Estimates of the current coverage of diversity in AD systems, however, suggest that between 20 and 40% of bacterial species have yet to be witnessed in 16S clone libraries (Rivière et al., 2009,

Chapter 3).

Currently, “next generation” sequencing technologies such as those developed by

454 Life Sciences are capable of producing hundreds of thousands of sequence reads per

run. Taking advantage of these technologies, researchers have been able to examine microbial diversity in a range of habitats with a greater level of coverage than is practical using clone libraries (Andersson et al., 2008; Huber et al., 2007). While previous applications of this technology were limited to short read lengths of around 200 base pairs, the current FLX Titanium chemistry for the Roche 454 systems allows for high quality read lengths over 500 nt. Additionally, multiple sample libraries can be sequenced and analyzed in parallel by using short nucleotide sequences, commonly referred to as barcodes or multiplex identifiers (MIDs), located between the 454 adapter primer and the template specific primer (Binladen et al., 2007).

While previous studies examining changes in microbial diversity in anaerobic digesters have used traditional 16S cloning and sequencing, the application of “next generation” sequencing technology to detailing the microbial diversity in AD systems is still limited. Three primary datasets generated using pyrosequencing of AD samples are currently reported. The first application of pyrosequencing to AD samples was done as a metagenomic study on a single sample obtained from an agricultural biogas facility

(Krause et al., 2008; Schlüter et al., 2008; Kröber et al., 2009). A second study compared the microbial diversity in biosolids from two different anaerobic digestion processes used for treating municipal waste water treatment sludge (Bibby et al., 2010). The third study examined microbial communities in nine AD systems treating brewery wastewater over the course of a year (Werner et al., 2011). In the latter two studies, pyrosequencing libraries were constructed only using primers specific to the bacterial domain. As methanogenic Archaea play an important role in the conversion of waste to

methane biogas, it is important to monitor changes in their diversity concurrently with the

Bacteria in order to gain a comprehensive understanding of microbial dynamics in AD systems.

In this study we analyzed six sludge samples obtained over time from a downflow sandbed filter (SBF) reactor using 454 pyrosequencing. We designed and used novel, barcoded primers directed toward the V1-3 hypervariable regions of the 16S rRNA of both Archaea and Bacteria to deeply investigate changes in the microbial diversity.

Alpha, beta, and gamma diversity measures were determined and used to compare sample richness and evenness. The pyrosequencing data were also compared to measured performance variables for the system using canonical correspondence analysis (CCA) to explore the relationships between microbial community structure and system performance. Additionally, we showed that the results of DGGE analysis using the V3 hypervariable region provided similar sample similarity correlations to those derived from the pyrosequencing data. This supports the use of DGGE as a diagnostic screening method prior to sample analysis with pyrosequencing.

5.3 Materials and Methods:

5.3.1 Reactor Operation:

A pilot scale sand bed filter (SBF) reactor was operated continuously for 500+ days. The total reactor volume was 8000 gal with a working hydraulic volume of 6000 gal. From day 1 through day 197, the feedstock consisted of thin stillage from a commercial ethanol distiller. From day 198 through the end of the operational period the feedstock consisted of whole stillage from the same distiller. Loading rate, system pH,

COD, biogas production, CH4 percentage, and other parameters were automatically recorded daily. During days 0-92, the digester was operated at 100 °F (37 °C). From days 93-201 the temperature was increased stepwise to a maximum of 132 °F (55 °C) (Figure 5.1). After thermophilic operation, the reactor temperature was brought back to 100 °F (37 °C) for the remainder of the operational period. Sludge samples were obtained periodically from sampling ports 2 feet and 6 feet from the top of the sand bed and frozen at -80 °C for microbial analysis.

5.3.2 Microbial Community DNA Extraction:

Samples obtained from the 2 foot sampling port were thawed on ice and approximately 0.25 g of homogenized sludge sample was used for community DNA extraction using the RBB+C method (Yu and Morrison, 2004). The quality of the extracted community DNA was visually assessed using agarose gel electrophoresis, and the DNA was quantified using the Quant-it dsDNA Broad Range assay kit (Molecular Probes, Carlsbad, CA) on a MX3000P real-time PCR system (Agilent, La Jolla, CA).

5.3.3 PCR-DGGE:

DGGE analysis of the archaeal and bacterial communities for each sample was performed by PCR amplification of the V3 hyper-variable region using the extracted community DNA and the primers listed in Table 5.1. Amplified products were visually checked for size and quality via agarose gel electrophoresis and separated in a 40-60% denaturing gradient acrylamide gel as previously described (Yu and Morrison, 2004).


DGGE gels were processed using BioNumerics (v.5.1; Applied Maths, Inc., Austin, TX) to determine sample banding patterns as previously described (Cressman et al., 2010). Distance matrices and dendrograms were constructed using the Jaccard, Sørensen, and Ochiai correlation coefficients and UPGMA clustering, respectively.
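The three band-matching coefficients named above can be written out for a pair of DGGE lanes scored as presence or absence of band classes. With a shared bands, and n1 and n2 bands in each lane, Jaccard = a / (n1 + n2 - a), Sørensen = 2a / (n1 + n2), and Ochiai = a / sqrt(n1 * n2). The block below is a minimal sketch of these formulas only; the function name and the representation of lanes as sets of band positions are assumptions and do not reflect how BioNumerics stores or matches bands.

```python
import math

def band_similarities(lane_a, lane_b):
    """Jaccard, Sørensen, and Ochiai similarity for two lanes.

    Each lane is an iterable of band classes scored as present.
    """
    a, b = set(lane_a), set(lane_b)
    if not a or not b:
        raise ValueError("each lane must contain at least one band")
    shared = len(a & b)
    return {
        "jaccard": shared / len(a | b),
        "sorensen": 2 * shared / (len(a) + len(b)),
        "ochiai": shared / math.sqrt(len(a) * len(b)),
    }

# Hypothetical lanes: four and three bands, two of them shared.
similarities = band_similarities({1, 2, 3, 4}, {3, 4, 5})
# A distance for clustering can be taken as 1 - similarity.
distances = {name: 1.0 - value for name, value in similarities.items()}
```

All three coefficients ignore shared absences, so two lanes are never counted as similar merely because both lack a band.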

5.3.4 Design of Template Specific Amplification Primers:

Template specific primers were designed to amplify roughly the first 500 base pairs of the 16S rRNA gene, encompassing the V1-V3 hyper-variable regions. Bacterial type strains from the RDP database were downloaded and aligned using ClustalW. The primer binding positions of the traditional 27F and 519R primers developed by Lane (1991) were annotated to the alignment and new, degenerate primers were selected by shifting the proposed primer binding site in either the 5′ or 3′ direction from that of the traditional primers. New candidate primers were analyzed using the RDP Probecheck utility to determine primer specificity and completeness of coverage. To design the archaeal primers, whole genome sequences were downloaded for all Archaea available in GenBank, and the entire 16S rRNA gene plus the 50 bases immediately upstream of the 5′ end of the sequence were excised and aligned using ClustalW. Using a strategy similar to that for the bacterial primers, the primer binding positions of the Arc8f and 519R primers of Lane were annotated to the alignment. New candidate primers were determined by shifting the primer binding site in either the 5′ or 3′ direction from the positions of the primers by Lane. Candidate primers, with various degrees of degeneracy, were selected and tested using RDP Probecheck. Candidate primer pairs were selected based on their degree of domain specificity, completeness of coverage, and Tm

96 compatibility. To ensure primer compatibility and specificity, the candidate primers were tested on pooled community DNA samples to ensure amplification of the correct sized product. Amplified products were cloned using the TOPO-TA Kit for Sequencing

(Invitrogen, Carlsbad, CA) and 10 clones for each domain were selected for sequencing at the Plant Microbe Genome Facility (The Ohio State University, Columbus, OH). The returned sequences were checked against the GenBank database using BLAST to confirm target domain specificity.

5.3.5 454 Sequencing Primers:

To generate the primers used for 454 amplicon sequencing, each of the template specific primer sets was combined with an 8-nucleotide sample identifier, referred to as a barcode, and either the 454 Adapter A or B primer as follows: 454 Adapter Primer-barcode-template specific primer. A total of 29 unique barcodes were chosen from the list provided by Hamady et al. (2008) such that the nucleotide incorporation pattern of the barcode did not have two sequential nucleotides included during a TACG flow cycle. All primers developed in this study are listed in Table A.1 in the appendix.

5.3.6 Generation of 454 Amplicons and Sequencing:

Barcoded primers were randomly assigned to the sludge samples as listed in Table 5.1. Amplicons for 454 pyrosequencing were generated by PCR amplification using Platinum Hi-Fidelity Taq polymerase (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions with an MJ Mini Thermocycler. The PCR amplification program was as follows: an initial 90 °C denaturation step followed by 35 cycles of a 95 °C denaturing step for 30 seconds, a 60 °C primer annealing step for 30 seconds, and a 72 °C extension step for 1 minute, followed by a final 10 minute extension step at 72 °C.

Amplification products were visually checked via agarose gel electrophoresis for quality and size. The amplified product for each sample was excised from the gel and purified using a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA) to remove remaining template DNA and unincorporated primers. The gel purified amplicons were quantified using the Quant-iT dsDNA Broad Range assay kit (Molecular Probes, Carlsbad, CA) and the MX3000P real-time PCR system (Agilent, La Jolla, CA). Amplicons were diluted with TE buffer to a concentration of 1 × 10⁸ copies/µl.
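The dilution target is stated in copies/µl, whereas a fluorometric dsDNA assay reports a mass concentration. The conversion used in this study is not given in the text; the sketch below assumes the common approximation of 660 g/mol per base pair of double-stranded DNA. The concentration and amplicon length in the example are hypothetical values chosen only for illustration.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MOLAR_MASS = 660.0      # g/mol per base pair of dsDNA (assumed average)

def copies_per_ul(conc_ng_per_ul, length_bp):
    """Convert a dsDNA mass concentration to amplicon copies per microliter."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (length_bp * BP_MOLAR_MASS)

def dilution_factor(conc_ng_per_ul, length_bp, target_copies_per_ul=1e8):
    """Fold dilution needed to reach the target copy concentration."""
    return copies_per_ul(conc_ng_per_ul, length_bp) / target_copies_per_ul

# Hypothetical amplicon: 10 ng/ul of a 500 bp product.
stock = copies_per_ul(10.0, 500)
fold = dilution_factor(10.0, 500)
print(f"{stock:.3e} copies/ul, dilute {fold:.1f}-fold")
```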

Archaeal and bacterial pools were made by combining 20 µl of diluted amplicon for each sample into separate 1.5 ml tubes. An initial sequencing pool was generated by combining 60 µl of the bacterial pool with 2 µl of the archaeal pool. This initial pool was sequenced using 1/16th of a PicoTiterPlate (PTP) at the University of Illinois Keck Center for Comparative and Functional Genomics using the 454 FLX Titanium sequencing chemistry. The distribution of amplicons between the sample libraries was used to recalibrate the pooling to achieve a more even amplicon distribution. The distribution of amplicons in the recalibrated pool was confirmed as before using a 1/16th PTP test run. The final amplicon pool was sequenced using a full PTP in both the forward and reverse directions using the 454 FLX Titanium sequencing chemistry.

5.3.7 Sequence Processing:

Returned sequences were processed using the Qiime bioinformatics analysis package, version 1.2.1 (Caporaso et al., 2010). Sequences were demultiplexed, and barcodes and primers trimmed, using the default options. Sequences were screened to remove reads shorter than 200 nt or longer than 750 nt, as well as reads with an overall average quality score below Q25, homopolymer stretches longer than 8 nt, or more than 2 N base calls. Sequences generated from the reverse primer were reverse complemented and combined with the forward reads prior to further analysis. Demultiplexed, screened sequences were aligned against the Greengenes core set database using PyNAST and chimera checked using ChimeraSlayer as implemented in Qiime. A full description of methods for sequence processing is available in Appendix A.
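The screening criteria listed above can be sketched as a single predicate. This is an illustrative re-implementation in plain Python, not the Qiime code itself; the function names are hypothetical, and it assumes one Phred quality score per base.

```python
import re

MIN_LENGTH = 200          # nt
MAX_LENGTH = 750          # nt
MIN_MEAN_QUALITY = 25     # overall average quality score
MAX_HOMOPOLYMER = 8       # nt
MAX_AMBIGUOUS = 2         # N base calls

def longest_homopolymer(seq):
    """Length of the longest run of a single repeated character in seq."""
    runs = re.finditer(r"(.)\1*", seq)
    return max((len(run.group(0)) for run in runs), default=0)

def passes_screen(seq, quals):
    """Return True if a read satisfies all five screening criteria."""
    seq = seq.upper()
    if not MIN_LENGTH <= len(seq) <= MAX_LENGTH:
        return False
    if sum(quals) / len(quals) < MIN_MEAN_QUALITY:
        return False
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    if seq.count("N") > MAX_AMBIGUOUS:
        return False
    return True

# Hypothetical 240 nt read with uniform quality scores.
read = "ACGT" * 60
print(passes_screen(read, [30] * len(read)))
```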

5.3.8 Bioinformatic and Statistical Analysis:

Sequences were clustered into OTUs at the 0.97 similarity cutoff using the default options for uclust as implemented in the Qiime package. Chao1, Shannon, and Simpson alpha diversity indices, alpha rarefaction curves, and phylogenetic distance were calculated for each sample using default values. Beta diversity measurements between samples were determined using the Bray-Curtis, Jaccard, Ochiai, and Sørensen/Dice correlation coefficients along with the weighted and unweighted UniFrac methods.
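For readers unfamiliar with the abundance-based indices, a minimal sketch of the Shannon and Simpson calculations follows. The logarithm base and the exact Simpson variant applied by Qiime are not stated in the text, so both are assumptions here: the base is left as a parameter and the Simpson function returns the dominance form (the sum of squared proportions), which increases as a community becomes more uneven. The count vectors are invented.

```python
import math

def shannon(counts, base=math.e):
    """Shannon index H = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson dominance D = sum(p_i ** 2); higher values mean less evenness."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

# Hypothetical communities of four OTUs each.
even_community = [25, 25, 25, 25]
uneven_community = [97, 1, 1, 1]

print(shannon(even_community, base=2), simpson_dominance(even_community))
print(shannon(uneven_community, base=2), simpson_dominance(uneven_community))
```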

Procrustes analysis of the DGGE and pyrosequencing data was conducted using the similarity matrices derived using the Ochiai correlation coefficient for each dataset.

To examine the relationships among OTUs, samples, and environmental variables, canonical correspondence analysis (CCA) was conducted using the vegan package (Oksanen et al., 2011) of the statistical software R. The species abundance matrix was created by filtering the OTU table generated by Qiime so that only OTUs representing greater than 0.01% of the dataset's sequences were included. The environmental matrix contained the following parameter data: temperature, organic loading rate, digester COD, digester pH, biogas production, and methane production. Data in the environmental matrix were log transformed to remove the confounding effect of values with incompatible units. The environmental variables temperature (T), organic loading rate (LR), digester pH (DpH), and methane production (CH4) were included in the model for constrained ordination of the species matrix. Significance was determined by ANOVA for the overall analysis and by term for each environmental variable included in the model.
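The two preprocessing steps that precede the ordination (abundance filtering of the OTU table and log transformation of the environmental matrix) can be sketched as follows. The ordination itself was run with vegan in R; this Python fragment covers only the preprocessing, uses invented data structures and values, and assumes a base-10 logarithm because the base is not stated in the text.

```python
import math

def filter_otus(otu_table, min_fraction=0.0001):
    """Keep OTUs whose total count exceeds min_fraction of all sequences.

    otu_table maps an OTU identifier to a list of per-sample counts.
    """
    total = sum(sum(counts) for counts in otu_table.values())
    return {otu: counts for otu, counts in otu_table.items()
            if sum(counts) / total > min_fraction}

def log_transform(env_matrix):
    """Log10-transform every (positive) value of each environmental variable."""
    return {name: [math.log10(v) for v in values]
            for name, values in env_matrix.items()}

# Hypothetical two-sample OTU table with 20,000 sequences in total.
otu_table = {
    "OTU_A": [9000, 6000],
    "OTU_B": [3000, 1998],
    "OTU_C": [1, 0],
    "OTU_D": [0, 1],
}
kept = filter_otus(otu_table)

# Hypothetical environmental values on very different scales.
env = log_transform({"T": [100.0, 1000.0], "DpH": [7.0, 7.5]})
print(sorted(kept), env["T"])
```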

5.4 Results and Discussion:

5.4.1 DGGE Analysis:

An initial sample analysis was first performed using DGGE to determine general sample diversity as well as initial sample similarities. Regardless of the correlation coefficient used (Jaccard, Sørensen or Ochiai) the resulting dendrograms shared a similar topology, with the primary differences being the degree of similarity between samples. This is expected as the Jaccard, Sørensen and Ochiai correlation coefficients use similar methods of calculating correlation, differing primarily in that Jaccard and Sørensen treat the raw data as binary “species” presence/absence values while Ochiai is able to consider numerical abundance. For DGGE analysis, the sample banding patterns were treated as binary band presence/absence, thus eliminating most of the variation between how the values were calculated for each method. Figure 5.2a shows a dendrogram of the bacterial samples created using UPGMA clustering from the Ochiai determined distance matrix.

The grouping of the time points remained consistent between the DGGE analysis and pyrosequencing analysis (Figure 5.2b). Procrustes analysis using Ochiai correlation coefficients for both the DGGE and pyrosequencing samples showed no major differences in sample distribution, thus supporting the use of DGGE as an initial screening method.

5.4.2 Pyrosequencing Analysis:

Next generation sequencing technologies such as 454 pyrosequencing offer two primary strategies for examination of microbial diversity. The first strategy is to take advantage of the large number of sequence reads to deeply sequence a limited number of samples at depths impractical to reach using traditional cloning and sequencing strategies. The second strategy is to examine a large number of samples at a lower depth of coverage. As previous estimates of maximum species richness in AD systems were based primarily on small sequence libraries, we decided to follow the first strategy and examine the microbial diversity in our samples at a sequence depth unprecedented for AD derived samples.

After demultiplexing and initial sequence processing, a total of just over 478,000 sequences remained in the dataset. The average sequence length was 379.5 nt with a standard deviation of 42.1 nt. The median number of sequences recovered per sample was approximately 70,000, with the actual numbers of reads listed in Table 5.2. These sequences were assigned to a total of 40,101 OTUs at the 97% similarity level, of which just over 60% were singletons. While there is still debate regarding the biological validity of low abundance OTUs (those with only one sequence representative) in pyrosequencing datasets, we decided to keep these OTUs in the dataset for analysis as they had passed both the initial quality and subsequent chimera screening steps. We further justify this decision on the grounds that only one third of the singleton OTUs were unclassified Bacteria while nearly 20% could be classified to a genus, suggesting that these sequences were not inherently more aberrant than those assigned to more abundant OTUs. It is possible that the number of singleton OTUs is inflated due to incomplete OTU searching by the uclust OTU picker, even though we used the most current OTU picking options implemented in Qiime (v1.2.1). This incomplete OTU searching has been reported as a potential source of bias for the uclust OTU picking method (Caporaso, 2010) and its implications will be discussed when relevant.

Figure 5.3 shows the taxonomic distribution of sequences at the phylum level for each of the six sample time-points as well as for the entire dataset. The sequences assigned to the phylum Chloroflexi represented the largest proportion, ~40%, of all of the recovered sequences. This phylum has been observed in similar proportions in previous surveys of AD systems (Rivière et al., 2009). The majority of sequences classified to the Chloroflexi could be assigned to the family Caldilineaceae, whose members are believed to be primarily responsible for the fermentation of simple sugars to SCFAs (Yamada et al., 2006). The phylum Firmicutes and unclassified Bacteria each represented nearly 18% of the total dataset. The proportion of unclassified Bacteria is similar to that seen in the pyrosequencing datasets of Bibby et al. (2011) and Werner et al. (2011), and is slightly higher than those seen in large Sanger sequencing datasets (Rivière et al., 2009). The phylum Thermotogae represented approximately 8% of all sequences in the dataset; however, these were almost entirely recovered from the two samples taken during the thermophilic digestion phase and were found only in low abundance, < 20 sequences, in samples corresponding to the mesophilic digestion phases. The Actinobacteria represented roughly 2.5% of all sequences.

Interestingly, the phylum Proteobacteria, which has previously been identified as a primary constituent in AD systems (Chapter 3, Rivière et al., 2009, Werner et al., 2011) and contains a number of known syntrophic bacteria (Ariesyady et al., 2007), represented only 1% of all sequences in the dataset and no more than 2.2% in any sample individually (Figure 5.3). As syntrophic conversion of non-acetate SCFAs to acetate is a principal step of the AD process, the abundance of Proteobacteria in the samples for this study does not fit with current expectations of their abundance in AD systems. It is possible that the low recovery of sequences associated with the Proteobacteria points to an inherent bias in the bacterial domain template specific primers used to generate the amplicon libraries. While the primers were tested in silico for their completeness of coverage of the bacterial domain, the actual specificity when used in vitro to generate the amplicon libraries can be different from the in silico results. This difference could greatly affect the actual recovery of sequences from certain groups and has been observed before with broad specificity primers (Jeon et al. 2008, Reysenbach et al. 1992). Further exploration of the specificity of the primers designed in this study will need to be conducted, preferably in comparison to the primers originally designed by Lane using a variety of samples.

The archaeal diversity in the dataset was low, with only 11 taxa identified, of which only four were known genera (Figure 5.4). The two acetoclastic genera Methanosarcina and Methanosaeta combined to represent over 60% of archaeal sequences, while unclassified Euryarchaeota represented 25% and unclassified Archaea represented 10% of archaeal sequences. The order Methanomicrobiales represented only 1.2% of archaeal sequences, while none were assigned to the class Methanobacteria. Both groups contain known hydrogenotrophic methanogens, and their near absence in the dataset could suggest that acetoclastic methanogenesis is the sole pathway for methane production in the system. This is highly unlikely, however, as acetoclastic methanogenesis is expected to be responsible for only 60-70% of the methane produced in AD systems, with hydrogenotrophic methanogenesis comprising the balance (IWA Task Group for Mathematical Modelling of Anaerobic Digestion Processes, 2002). Two alternative explanations are available. The first is that, as proposed for the Proteobacteria, the archaeal primers are biased against members of the Methanobacteria, thus preferentially amplifying members of the Methanomicrobia, which are generally observed in greater abundance in AD systems (Leclerc et al., 2004, Rivière et al., 2009). The use of quantitative real-time PCR (QPCR) directed toward known methanogens would allow this hypothesis to be tested. A second explanation is that sequences which were assigned as either unclassified Archaea or unclassified Euryarchaeota in fact represent all of the hydrogenotrophic methanogens in the samples. There is evidence to support this latter argument, as the WSA2/ArcI group of methanogens is assigned to the unclassified Euryarchaeota using the RDP classifier (Chapter 3) and members of this group are proposed to have a hydrogenotrophic metabolism (Chouari et al., 2005). That very few sequences assigned to known hydrogenotrophic methanogens were recovered, however, makes this second explanation less likely in comparison to the first.

5.4.3 Community Comparisons Among Samples:

Measures of alpha diversity for each of the sample days are given in Table 5.2. While efforts were made to create normalized pools of amplicons prior to sequencing, ultimately the recovered number of sequences for each sample varied from approximately 61 to 107 thousand. The number of OTUs generated for a given sample, however, was not directly correlated to the number of sequences recovered, suggesting that a simple increase in the number of sequences recovered from a sample will not necessarily inflate the number of OTUs generated. To illustrate, sample days 122 and 331 both had over 100,000 sequence reads; however, the number of OTUs generated for day 331 is only 40% of the number of OTUs generated for day 122. Similarly, day 246 had the fewest recovered sequences, but had 86% as many OTUs as day 122. Thus, while the individual samples had variable numbers of raw sequence reads, it is reasonable to assume that the alpha diversity indices reflect the underlying microbial diversity of each sample. This is supported by analysis of the rarefaction curves for each sample (Figure 5.5), which shows that all six samples were extensively sampled.

Due to the large proportion of singleton and doubleton OTUs in each of the sample datasets (Table 5.2), the Chao1 maximum richness estimator is likely an unreliable estimate of true maximum species richness. This is because Chao1 is a non-parametric estimator of maximum species richness that uses the number of singleton and doubleton OTUs as a means to correct the actual observed number of OTUs (Hughes et al., 2001). While Hughes et al. (2001) argued that this method of estimating maximum species richness is valid for small clone libraries where sampling is often incomplete, the vast majority of singleton and doubleton OTUs in pyrosequencing datasets are more likely to be attributable to sequencing noise than to being representative of truly rare species.
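To make the dependence of Chao1 on singleton and doubleton counts concrete, the sketch below evaluates two standard forms of the estimator using the Day 57 values from Table 5.2 (8,939 observed OTUs, 6,047 singletons, 1,225 doubletons). Which form Qiime applied is not stated in the text; the bias-corrected form is shown because, when rounded, it reproduces the tabulated Day 57 value, but that correspondence is an observation about the arithmetic and not a statement of the method used. Function names are hypothetical.

```python
def chao1_classic(s_obs, singletons, doubletons):
    """Classic Chao1: S_obs + F1^2 / (2 * F2)."""
    return s_obs + singletons ** 2 / (2 * doubletons)

def chao1_bias_corrected(s_obs, singletons, doubletons):
    """Bias-corrected Chao1: S_obs + F1 * (F1 - 1) / (2 * (F2 + 1))."""
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))

# Day 57 values taken from Table 5.2.
day57 = (8939, 6047, 1225)
print(round(chao1_classic(*day57)), round(chao1_bias_corrected(*day57)))
```

Because the correction term grows with the square of the singleton count, any inflation of singletons by sequencing noise is amplified in the estimate, which is the concern raised above.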

Parametric estimations of maximum richness based on curve fitting of various exponential equations to rarefaction curve data have been found to provide a complementary measure of maximum species richness (Larue et al., 2005, Youssef et al., 2008, Chapter 3). Fitting the monomolecular function to the rarefied OTU data for each sample gave much lower estimates of maximum species richness, with the values being 50-60% of the Chao1 estimates. Using the rarefaction estimate of maximum species richness, it is possible to determine the coverage of each sample individually by dividing the number of OTUs seen in a sample by the estimated maximum. The median estimated coverage for the samples was 65%, with day 246 having the lowest estimated coverage (60%) while day 331 had the highest (91%). The estimated coverage for a given sample was linearly correlated to the number of sequences recovered for that sample (R² = 0.93), confirming that a large number of sequences (>100,000/sample) are necessary to adequately cover the majority of the microbial diversity in AD systems.
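The coverage calculation described above is a simple ratio and can be reproduced directly from the observed OTU counts and individual rarefaction estimates in Table 5.2. The short sketch below does so; the variable names are illustrative only.

```python
from statistics import median

# (observed OTUs, individual rarefaction estimate) per sample day, from Table 5.2.
table_5_2 = {
    57: (8939, 14402),
    122: (12075, 14722),
    150: (6813, 9906),
    246: (10458, 17389),
    331: (4753, 5210),
    427: (9674, 15592),
}

# Coverage = observed OTUs / estimated maximum richness.
coverage = {day: otus / estimate for day, (otus, estimate) in table_5_2.items()}
median_coverage = median(coverage.values())

for day, value in sorted(coverage.items()):
    print(f"Day {day}: {value:.1%}")
print(f"Median: {median_coverage:.1%}")
```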

Initial comparison of the bacterial diversity between the samples can be accomplished by comparing the taxonomic distribution of sequences at the phylum level as seen in Figure 5.3. The phylum level diversity seen in sample day 57 had similar proportions to the global dataset of AD sequences determined in Chapter 3, with the Chloroflexi representing over 55% of sequences and the Bacteroidetes representing nearly 20%. These two phyla contain known fermentative bacteria and are commonly observed in AD systems (Rivière et al., 2009). The most obvious change in diversity is the increase of the phylum Thermotogae in the two samples taken during thermophilic operation (days 122 and 150) compared to the initial population observed on day 57. The phylum Thermotogae contains known thermophilic bacteria which are capable of fermenting sugars to SCFAs, an important step in the AD process, and which have been recovered from other AD systems operating at thermophilic temperatures (Sasaki et al., 2011; Weiss et al., 2008). This increase in Thermotogae occurred with a concomitant decrease in the phyla Chloroflexi and Bacteroidetes, which have a similar metabolic scheme to the Thermotogae, but have fewer known thermophilic genera. A small number of Thermotogae sequences were recovered for sample day 57, suggesting that the increase in temperature from mesophilic to thermophilic enriched members of the Thermotogae, which took over the metabolic role of mesophilic members of the Chloroflexi and Bacteroidetes.

Sample days 246 and 331 had very dissimilar diversity for the Bacteria at the phylum level (Figure 5.3), although the archaeal diversity was more similar (Figure 5.4). These two time periods followed a change of the primary feedstock from thin stillage to whole stillage on day 226. It is possible that the microbial community on day 246 was still undergoing changes in response to the new feedstock, and that these changes were not fully manifested until day 331. Day 427 is more similar to day 246 than to day 331, however, suggesting that the microbial diversity observed for day 331 is aberrant. This assumption is supported by the Shannon and Simpson diversity indices and the measure of phylogenetic distance determined for day 331, which suggest that the sample has a much lower richness (Shannon and phylogenetic distance) and is more uneven (Simpson) compared to the other samples.

Tests of sample relatedness using the pyrosequencing data were conducted using both a species abundance and a phylogenetic approach. The species abundance comparison method relied on determining the correlation coefficient between samples using the Ochiai method, while the phylogenetic approach used the UniFrac distance method. Ordination of the samples using principal coordinates analysis gave similar patterns for both methods, which was confirmed through the use of Procrustes analysis. In both analysis methods, sample days 57 and 427 were nearly identical, a result suggested by the phylum level distribution of sequences for the Bacteria but not for the Archaea. This is most likely due to the deliberate undersampling of the Archaea, which artificially lessens their contribution to overall sample similarity using either the species abundance or phylogenetic approach. Interestingly, the two thermophilic samples were not very similar to each other, a finding supported by the bacterial and archaeal taxonomic distribution of sequences. Again, this is most likely due to the prolonged adaptation of thermophilic bacteria, which resulted in their gradual increase from almost no representation at sample day 57 to representing over 40% of sequences at sample day 150.

5.4.4 Statistical Analysis:

As both the DGGE and pyrosequencing results failed to show that the samples were closely related based on their temporal acquisition, we hypothesized that changes in reactor operating conditions had a strong effect on the microbial community dynamics at any given time point. In order to explore how the microbial diversity of a sample was related to changes in reactor conditions, canonical correspondence analysis (CCA) was employed. This technique has previously been used in the analysis of microbial diversity in AD systems using DGGE banding patterns (Supaphol et al., 2011) and was recently applied to pyrosequencing data of the Bacteria from samples obtained from AD systems operating on brewery wastewater (Werner et al., 2011). An issue that has often limited the application of statistical methods commonly used in studies of macroorganism diversity to microbial data has been the double zeros problem, where two communities sharing a large number of non-present species are determined to be statistically more similar than the underlying biology would dictate. In order to mitigate this issue, we applied CCA only to OTUs that represented at least 47 sequences, or 0.01% of the complete dataset. Hence, the initial 40,101 OTUs were reduced to 427 OTUs that represented over 80% of all sequences. A similar data pre-processing step was used by Werner et al. (2011) in their analysis and supports the decision to focus solely on OTUs whose abundance could presumably be linked with changes in reactor performance.
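The double zeros problem can be illustrated with a toy example. In the sketch below, two hypothetical samples share no OTUs at all, yet a measure that counts joint absences as agreement (the simple matching coefficient, used here only for contrast) scores them as highly similar, whereas the Jaccard coefficient, which ignores joint absences, does not. The data and function names are invented for illustration and are not part of the analysis in this study.

```python
def simple_matching(x, y):
    """Fraction of OTUs with the same presence/absence state, joint absences included."""
    agree = sum(1 for i, j in zip(x, y) if bool(i) == bool(j))
    return agree / len(x)

def jaccard(x, y):
    """Shared OTUs divided by OTUs present in either sample; joint absences ignored."""
    shared = sum(1 for i, j in zip(x, y) if i and j)
    either = sum(1 for i, j in zip(x, y) if i or j)
    return shared / either

# 100 hypothetical OTUs: sample A contains OTUs 0-4, sample B contains OTUs 5-9,
# and the remaining 90 OTUs are absent from both samples.
n_otus = 100
sample_a = [1 if i < 5 else 0 for i in range(n_otus)]
sample_b = [1 if 5 <= i < 10 else 0 for i in range(n_otus)]

print(simple_matching(sample_a, sample_b), jaccard(sample_a, sample_b))
```

Removing OTUs that are rare across the whole dataset, as done above with the 0.01% threshold, reduces the number of such joint absences before ordination.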

As seen in Figure 5.6, the primary distribution of OTUs is by sample day, with temperature (T) being the main environmental parameter to have a strongly identifiable community associated with it, as denoted by the large cluster of OTUs lying along the vector denoting the temperature gradient. As would be expected, the majority of OTUs coinciding with the temperature gradient were recovered from the two sample days corresponding to the period of thermophilic operation, days 122 and 150. Of the other three environmental parameters included in the model, organic loading rate (LR), digester pH (DpH), and methane production (CH4), digester pH was the only other variable to have an identifiable associated microbial community.

Unlike the sample similarity results seen using beta diversity measures of the DGGE or pyrosequencing data, sample day 246 is shown as being more similar to sample day 122 than to day 331 in the CCA. This is most likely due to the increased similarity of day 246 to day 122 when compared on both their microbial composition and environmental parameters. The results suggest that the microbial communities in these two samples are functionally more similar than their sequence abundance and phylogenetic distribution would indicate. Similar to the findings of sample relatedness for the pyrosequencing data, sample days 57 and 427 were shown to be nearly identical when compared against their environmental parameters and microbial diversity. This result was obtained despite a period of over 1 year between the sampling points and numerous operational changes. One possible interpretation of this result is that the microbial communities observed at these two sample points may represent a type of stable state community for AD of ethanol stillage wastes.

5.5 Conclusions:

As the microbial communities in AD systems are highly variable and complex, it is often a challenge to adequately sample them using traditional cloning and sequencing strategies. The use of large scale pyrosequencing of a time course set of AD samples allowed for a greater depth of sequencing effort per sample, allowing low abundance species to be identified. Changes in the microbial diversity between samples were observed at taxonomic levels from phylum down to genus, and both species abundance and phylogenetic measures of sample similarity suggested that differences were not related to the temporal nature of the sampling. Canonical correspondence analysis (CCA) was successfully used in an exploratory manner to examine the changes in microbial diversity in relation to reactor operating conditions. The primary effect observed was due to a change in temperature of the system from mesophilic to thermophilic. Other measured parameters could not adequately explain changes in microbial diversity, suggesting that parameters not measured have a greater effect on microbial diversity. Both DGGE and pyrosequencing analysis gave similar measures of sample similarity, validating the use of DGGE as an initial screening method for AD samples. The combined use of DGGE screening, deep pyrosequencing, and CCA should allow for greater exploration of the microbial communities in AD systems and how they respond to changes in system operation.

Table 5.2: Sequence summary data and alpha diversity indices for the six sample days and the overall dataset.

                                                                  Individual
           # of                                                   Rarefaction                       Phylogenetic
Sample     Sequences   OTUs    Singletons   Doubletons   Chao 1   Estimate      Shannon   Simpson   Distance
Day 57     63446       8939    6047         1225         23849    14402         5.29      0.0744    1149.9
Day 122    107860      12075   7943         1787         29716    14722         5.36      0.0403    1591.1
Day 150    72733       6813    4496         923          17749    9906          4.09      0.1489    919.6
Day 246    61094       10458   7226         1379         29374    17389         6.05      0.0261    1290.9
Day 331    106212      4753    2873         745          10283    5210          2.27      0.5128    641.1
Day 427    66714       9674    6577         1316         26094    15592         5.45      0.0612    1240.1
Total      478059      40101   24614        6110

Figure 5.1: Simplified reactor performance details. Feedstock loading rate (green line), biogas production rate (blue line), and temperature (red line) are read off the left y-axis. Digester pH (purple line) is read off the right y-axis. Sample days are denoted as downward facing black arrows. The initial feedstock was thin stillage from a commercial ethanol distiller (dashed orange line) and was changed to whole stillage (solid orange line) from the same distiller. Other measured variables are not shown for the sake of clarity.

[Figure 5.1 plot. x-axis: Day from Reactor Startup (44 to 394); left y-axis: 0 to 160; right y-axis: pH, 0.00 to 14.00; legend: Organic Loading Rate (lbs COD/1000 gal), Temperature (F), CH4 (ft^3/hr), Digester pH.]

Figure 5.2: Dendrograms showing sample similarity groupings derived from bacterial DGGE banding patterns (a) and pyrosequencing OTU abundance data (b). Both trees were constructed using UPGMA clustering based on sample similarity correlations determined using the Ochiai correlation coefficient. For both analyses, days 57 and 427 are more similar to each other than to their nearest sample points in time.

[Figure 5.3 plot. x-axis: Sample Day (57, 122, 150, 246, 331, 427, Total); y-axis: 0% to 100%; legend: Proteobacteria, Actinobacteria, Thermotogae, Bacteroidetes, U_Bacteria, Firmicutes, Chloroflexi.]

Figure 5.3: Bacterial phylum level distribution of sequences for each sample day and the overall dataset. Only phyla covering at least 1% of all sequences are shown, collectively representing over 99.8% of all bacterial sequences in the dataset.

[Figure 5.4 plot. x-axis: Sample Day (57, 122, 150, 246, 331, 427, Total); y-axis: 0% to 100%; legend: U_Methanomicrobiaceae, Methanoculleus, U_Methanosarcinales, U_Archaea, Methanosaeta, U_Euryarchaeota, Methanosarcina.]

Figure 5.4: Distribution of sequences for the primary archaeal genera and unclassified taxa. Only taxa that represented at least 1% of archaeal sequences in any sample are shown. The seven taxa shown collectively represent over 99.5% of all archaeal sequences in the dataset.

[Figure 5.5 plot. x-axis: Number of Sequences Analyzed (0 to 60000); y-axis: Number of OTUs Generated (0 to 30000); one curve per sample day: 57, 122, 150, 246, 331, 427.]

Figure 5.5: Rarefaction curves of the observed number of OTUs generated in a sample as a function of the number of sequences analyzed. The curves for all six sample days are nearing a horizontal asymptote, indicating that the samples have all been well sampled.

Figure 5.6: Canonical Correspondence Analysis triplot showing the distribution of samples and OTUs in relation to measured environmental variables. The four environmental variables included in the CCA model are represented as blue vectors in the plot. The direction of a vector in relation to any other indicates the correlation between the environmental variables. Digester pH (DpH) is inversely correlated with organic loading rate (LR) as they point in opposite directions, while temperature (T) is not correlated to methane production rate (CH4) as they are orthogonal to each other. The sample days are labeled in red and their position in the plot is determined by their relationship to the environmental variables. OTUs are represented as either gray + signs or labeled with black text such that the most abundant OTUs are labeled preferentially over lower abundance OTUs.

CHAPTER 6:

EXAMINATION AND COMPARISON OF MICROBIAL COMMUNITY STRUCTURE IN FOUR ANAEROBIC DIGESTION SLUDGE SAMPLES USING PYROSEQUENCING

6.1 Abstract:

Anaerobic digestion (AD) systems are commonly used to treat organic wastes through the conversion of waste carbon into methane biogas. A number of different AD systems have been designed and are operated worldwide; however, few studies have examined the microbial communities that exist in AD systems of different operational design in a comparative manner. Additionally, while granular microbial biomass from upflow anaerobic sludge blanket (UASB) systems has been well studied, the planktonic or liquid biomass has not been investigated. Using deep 16S pyrosequencing we identified differences in the microbial community composition of digested sludge from a municipal waste water treatment plant (WWTP), fractionated biomass from a UASB, and sludge biomass from a novel downflow sand bed filter (SBF) reactor. While the proportion of unclassified bacterial sequences was similar for each of the four samples, ranging from 11-21%, the proportion of other phyla was highly variable. The phylum Chloroflexi represented over one third of sequences in the dataset; however, in the SBF and UASB granular (UASB-G) samples this proportion was 45-51%, while in the UASB liquid (UASB-L) sample it represented only 18% of the sequences. The primary Archaea were members of the acetoclastic genus Methanosaeta, however proportions of members of the unclassified Archaea, Euryarchaeota, and varied greatly between the samples. Beta diversity measures determined using DGGE and pyrosequencing showed similar results, with the two biomass fractions of the UASB sample being the most similar to each other and the SBF and WWTP samples being more different.

6.2 Introduction:

As the anaerobic digestion (AD) process has become increasingly utilized as a waste treatment step in a number of industrial operations, a variety of different AD systems have been designed and implemented. One of the most common AD systems in use is the upflow anaerobic sludge blanket (UASB) system originally designed by Lettinga and colleagues (Lettinga 1995, Lettinga et al., 1980). This system uses a hydraulic upflow pattern where waste influent enters the digester from the bottom and is removed as effluent at the top. One of the defining characteristics of UASB systems is the development of granular biomass particles, which are necessary for proper system operation. These biomass granules are able to settle to the bottom of the reactor, forming a sludge blanket that prevents biomass washout. Due to the popularity of the UASB system in commercial waste treatment operations, numerous studies have examined the microbial diversity of granular sludge (Díaz et al., 2006; Fernández et al., 2008; Hatamoto et al., 2007; Leclerc et al., 2004). However, aside from the work done in Chapter 4, no studies have examined UASB systems to determine if there are differences in the microbial diversity between the granular biomass and planktonic biomass.


During the treatment of municipal sewage at waste water treatment plants (WWTPs), large volumes of microbial biomass are recirculated from aerobic settling tanks back into upstream parts of the treatment process in what is termed the activated sludge process (Gujer et al., 1999). Since not all of the sludge biomass is recirculated, a large proportion of sludge, often referred to as waste activated sludge (WAS), is transferred to an AD system to reduce sludge volume, organic content, and pathogen levels prior to further disposal (Ahring et al., 2002). The downflow sand bed filter (SBF) is a relatively new AD system that is a derivative of the CSTR design. Unlike UASB and WWTP systems, the hydraulic flow operates in a downward pattern, with influent coming in at the top of the reactor and effluent being removed from the bottom (Yu et al., 2010). To prevent the loss of microbial biomass, the bottom two feet of the SBF system is a layer of sand that serves to trap and retain microbial biomass in the digester tank. As the SBF system has not yet been implemented in wide commercial operation, the only examination of the microbial communities in these systems was the work discussed in Chapter 4.

Previous studies of AD systems have found differences in the microbial consortia present in AD systems of different operational design (Leclerc et al., 2004, Rivière et al., 2009). Aside from the analyses discussed in Chapter 5, few studies have explored the microbial diversity in AD systems using large scale pyrosequencing. As previously mentioned, one set of studies was based on metagenomic analysis of a single agricultural biogas facility (Krause et al., 2008, Kröber et al., 2009, Schlüter et al., 2008). The other two studies examined only the bacterial diversity in digested biosolids from municipal WWTPs or whole sludge biomass from UASB type systems operating at commercial breweries (Bibby et al., 2011, Werner et al., 2011). As noted in Chapter 5, the lack of coverage of members of the Archaea in these studies is an omission, as the final step of the AD process, methanogenesis, is exclusively carried out by members of the Archaea.

In this chapter we examined the microbial diversity in sludge samples from the two most common AD systems (UASB and WWTP) and the newer SBF in order to determine which groups of microorganisms are common to each type of system and which are found only in a certain system. Further, the microbial biomass of the UASB sample was fractionated into its constituent granular and liquid components to allow for a more detailed examination of the differences between granular and liquid biomass than was possible in Chapter 4. Using the novel, barcoded primers designed in Chapter 5, we examined both the bacterial and archaeal domains and found that each type of AD system harbored a unique microbial consortium and that there are differences between the granular and liquid biomass of UASB systems.

6.3 Materials and Methods:

6.3.1 Sample Acquisition:

A sludge sample representing a steady state operational period of a downflow sand bed filter (sample SBF) operating on whole stillage from a commercial ethanol distillery was obtained and stored at -80°C for later analysis. Bulk sludge from a UASB operating at a commercial jam and jelly processor was obtained in a 3 gallon bucket, which was hand mixed to suspend the granular biomass prior to aliquoting into a 50ml sample, which was frozen at -80°C for later analysis. Digested activated sludge from a municipal waste water treatment plant (sample WWTP) was obtained from a sampling port directly beneath the digester, divided into 50ml aliquots, and frozen at -80°C. All digesters operated at mesophilic temperatures.

6.3.2 Microbial Community DNA Extraction:

Samples were thawed on ice and agitated by hand, and approximately 0.25g of homogenized sludge for samples SBF and WWTP was transferred to a 2ml screw cap tube for community DNA extraction using the RBB+C method (Yu and Morrison, 2004). The UASB sample was split into granular (sample UASB-G) and liquid (sample UASB-L) biomass by filtering the thawed sample through a Steriflip 40 micron nylon net filter (Millipore, Billerica, MA). The granular biomass was transferred back to a sterile 50ml centrifuge tube and approximately 0.25g of granular biomass was transferred to a 2ml screw cap tube for community DNA extraction. Liquid biomass was pelleted by taking 6ml of the liquid flow through, centrifuging at 16,000 x g, and then decanting the supernatant. The entire liquid biomass pellet was used for community DNA extraction.

The quality of the extracted community DNA was visually assessed using agarose gel electrophoresis, and the DNA was quantified using the Quant-it dsDNA Broad Range assay kit (Molecular Probes, Carlsbad, CA) on a MX3000P real-time PCR system (Agilent, La Jolla, CA).

6.3.3 PCR-DGGE:

DGGE analysis of the archaeal and bacterial communities was performed by PCR amplification of the V3 hyper-variable region of sample community DNA using the primers listed in Table 6.1. Amplified products were visually checked for size and quality via agarose gel electrophoresis and separated in a 40-60% denaturing gradient acrylamide gel as previously described (Yu and Morrison, 2004). DGGE gels were processed using BioNumerics (v.5.1; Applied Maths, Inc., Austin, TX) to determine sample banding patterns as previously described (Cressman et al., 2010). Distance matrices were constructed using the Jaccard, Sørensen and Ochiai correlation coefficients, and dendrograms were constructed using UPGMA clustering.
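The three band-matching coefficients named above can be illustrated with a minimal sketch. It assumes each lane has already been reduced to a set of band positions (presence/absence); the band positions and the function name are hypothetical, and this is not the BioNumerics implementation.

```python
import math

def similarity_coefficients(bands_a, bands_b):
    """Return Jaccard, Sorensen (Dice), and Ochiai similarity for two band sets."""
    a, b = set(bands_a), set(bands_b)
    if not a or not b:
        # No bands in one lane: treat similarity as zero rather than divide by zero.
        return {"jaccard": 0.0, "sorensen": 0.0, "ochiai": 0.0}
    shared = len(a & b)
    return {
        "jaccard": shared / len(a | b),                 # shared / total distinct bands
        "sorensen": 2 * shared / (len(a) + len(b)),     # shared bands counted twice
        "ochiai": shared / math.sqrt(len(a) * len(b)),  # geometric-mean normalisation
    }

# Hypothetical band positions for two DGGE lanes (not data from this study).
lane_1 = {1, 2, 3, 5, 8}
lane_2 = {2, 3, 5, 9}
scores = similarity_coefficients(lane_1, lane_2)
print(scores)
```

All three coefficients use only the number of shared bands and the band counts of each lane, so they differ only in how the shared count is normalised.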

6.3.4 Generation of 454 Amplicons and Sequencing:

Barcoded primers designed in Chapter 5 were randomly assigned to the sludge samples as listed in Table 6.1. Amplicons for 454 pyrosequencing were generated by PCR amplification using Platinum Hi-Fidelity Taq polymerase (Invitrogen, Carlsbad, CA) according to the manufacturer's instructions with an MJ Mini Thermocycler. The PCR amplification program was as follows: an initial 90°C denaturation step followed by 35 cycles of a 95°C denaturing step for 30 seconds, a 60°C primer annealing step for 30 seconds, and a 72°C extension step for 1 minute, followed by a final 10 minute extension step at 72°C. Amplification products were visually checked via agarose gel electrophoresis for quality and size. Bands corresponding to the amplified product were excised and extracted from the gel using a QIAquick Gel Extraction Kit (Qiagen, Valencia, CA). The gel extracted amplicons were quantified using the Quant-it dsDNA Broad Range assay kit (Molecular Probes, Carlsbad, CA) and the MX3000P real-time PCR system (Agilent, La Jolla, CA). Amplicons were diluted with TE buffer to a concentration of 1 x 10^8 copies/ul. Archaeal and bacterial pools were made by combining 20ul of diluted amplicon for each sample into a separate 1.5ml tube for each domain. The final sequencing pool was created by combining 60ul of the bacterial pool with 2ul of the archaeal pool. The combined amplicon pool was sequenced in both the forward and reverse direction at the University of Illinois Keck Center using the 454 FLX Titanium sequencing chemistry.
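The dilution to a fixed copy number can be sketched as follows. The conversion assumes an average mass of 660 g/mol per base pair of double-stranded DNA, which is a common approximation and not a value stated in this chapter; the example concentration, amplicon length, and final volume are hypothetical.

```python
AVOGADRO = 6.022e23        # molecules per mole
BP_MASS_G_PER_MOL = 660.0  # assumed average mass of one dsDNA base pair

def copies_per_ul(conc_ng_per_ul, amplicon_bp):
    """Convert a dsDNA concentration (ng/ul) to amplicon copies per ul."""
    grams_per_ul = conc_ng_per_ul * 1e-9
    return grams_per_ul * AVOGADRO / (amplicon_bp * BP_MASS_G_PER_MOL)

def dilution_volumes(stock_copies_per_ul, target_copies_per_ul, final_ul):
    """Return (stock volume, diluent volume) in ul needed to reach the target."""
    if stock_copies_per_ul < target_copies_per_ul:
        raise ValueError("stock is already below the target concentration")
    stock_ul = final_ul * target_copies_per_ul / stock_copies_per_ul
    return stock_ul, final_ul - stock_ul

# Hypothetical amplicon: 10 ng/ul, 500 bp, diluted to 1e8 copies/ul in 100 ul.
stock = copies_per_ul(10.0, 500)
stock_ul, te_ul = dilution_volumes(stock, 1e8, 100.0)
print(f"{stock:.3e} copies/ul; mix {stock_ul:.2f} ul amplicon with {te_ul:.2f} ul TE")
```

Normalising each amplicon to the same copy number before pooling is what is intended to give each sample an equal share of reads; the 60ul to 2ul ratio then sets the bacterial to archaeal proportion in the final pool.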

6.3.5 Sequence Processing:

Returned sequences were processed using the Qiime bioinformatics analysis package, version 1.2.1 (Caporaso et al. 2010). Sequences were demultiplexed, and barcodes and primers were trimmed, using the default options. Sequences were screened to eliminate those with read lengths less than 200nt or longer than 750nt, an overall average quality score less than Q25, homopolymer stretches longer than 8nt, or more than 2 N base calls. Sequences generated from the reverse primer were reverse complemented and combined with the forward reads prior to further analysis. Demultiplexed, screened sequences were aligned against the Greengenes core set database using PyNAST and chimera checked using ChimeraSlayer as implemented in Qiime. A full description of methods for sequence processing is available in Appendix A.
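The screening thresholds listed above can be expressed as a small standalone filter. This is only a sketch of the stated criteria, not the Qiime code path; the function names are hypothetical and each read is assumed to arrive as a sequence string with one quality score per base.

```python
import re

MIN_LEN, MAX_LEN = 200, 750  # accepted read length range (nt)
MIN_MEAN_Q = 25              # minimum overall average quality score
MAX_HOMOPOLYMER = 8          # longest homopolymer run allowed (nt)
MAX_N = 2                    # maximum number of ambiguous N base calls

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def longest_homopolymer(seq):
    """Length of the longest run of a single repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)

def passes_screen(seq, quals):
    """Return True if a read satisfies all four screening criteria."""
    seq = seq.upper()
    if not (MIN_LEN <= len(seq) <= MAX_LEN):
        return False
    if sum(quals) / len(quals) < MIN_MEAN_Q:
        return False
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    return seq.count("N") <= MAX_N

def reverse_complement(seq):
    """Reverse complement, as applied to reads generated from the reverse primer."""
    return seq.upper().translate(_COMPLEMENT)[::-1]

# Hypothetical reads used only to exercise the filter.
good = "ACGT" * 60
print(passes_screen(good, [30] * len(good)))                   # passes all criteria
print(passes_screen("A" * 9 + good, [30] * (len(good) + 9)))   # homopolymer too long
```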

6.3.6 Bioinformatic and Statistical Analysis:

Sequences were clustered into OTUs using the default options for uclust as implemented in the Qiime package at the 0.97 similarity cutoff. Chao1, Shannon (H'), and Simpson (D) alpha diversity indices, alpha rarefaction curves, and phylogenetic distance were calculated for each sample using default values. Beta diversity measurements between samples were determined using the Bray-Curtis, Jaccard, Ochiai, and Sørensen/Dice correlation coefficients along with weighted and unweighted UniFrac methods. Procrustes analysis of the DGGE and pyrosequencing data was conducted using the similarity matrices derived using the Ochiai correlation coefficient for each dataset.
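As an illustration of the abundance-based indices, the sketch below computes Shannon (H'), Simpson (D), and Shannon equitability from a list of OTU read counts. It assumes the natural logarithm for H' (some packages use log base 2) and takes D as the dominance form, the sum of squared proportions; both choices are assumptions for illustration, and the counts are hypothetical.

```python
import math

def shannon(counts):
    """Shannon index H' using the natural logarithm (an assumed log base)."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def simpson_dominance(counts):
    """Simpson index D as the sum of squared proportions (higher = less even)."""
    total = sum(counts)
    return sum((c / total) ** 2 for c in counts)

def equitability(counts):
    """Shannon equitability: H' divided by its maximum, ln(observed OTUs)."""
    observed = sum(1 for c in counts if c > 0)
    return shannon(counts) / math.log(observed) if observed > 1 else 0.0

# Hypothetical OTU tables: one perfectly even, one dominated by a single OTU.
even = [25, 25, 25, 25]
skewed = [85, 5, 5, 5]
for name, counts in (("even", even), ("skewed", skewed)):
    print(name, round(shannon(counts), 3), round(simpson_dominance(counts), 3),
          round(equitability(counts), 3))
```

Under these definitions a community dominated by one OTU has a higher D and a lower equitability than an even community with the same number of OTUs.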

6.4 Results and Discussion:

6.4.1 DGGE Analysis:

Similar to the strategy used in Chapter 5, the samples were analyzed using DGGE to gain an initial assessment of the microbial diversity in the samples and sample correlations. The DGGE results are shown in Figure 6.1. For both the Archaea and Bacteria, the DGGE banding patterns for the two UASB samples were quite similar, with the bacterial similarity being approximately 76% and the archaeal similarity nearly 90%. The SBF reactor sample was more similar to the UASB samples than the WWTP sample, although this similarity was around 70% for both the bacterial and archaeal profiles. The most interesting result is the very low similarity of the WWTP archaeal profile to any of the other three samples. This suggests that the archaeal diversity in WWTP systems is very different from the diversity found in UASB and SBF systems.

6.4.2 Pyrosequencing Analysis:

As the microbial diversity in AD systems is quite broad, as discussed in Chapter 3, we followed the same sequencing strategy as used in Chapter 5 and designed the experiment to achieve a high degree of sequencing depth for each sample. While measures were taken during amplicon library construction to achieve an even distribution of sequencing reads for each sample, after processing of the raw pyrosequencing reads the sample libraries had a very uneven number of sequences (Table 6.2). To compensate, alpha diversity measures were calculated based on rarefied values to reduce the effect of sample size. Further, analysis showed that the number of OTUs generated for a given sample was not linearly correlated to the number of sequences comprising the sample, implying that measures of alpha and beta diversity would reflect underlying differences in microbial diversity as opposed to differences in the number of sequencing reads. While singleton OTUs represented over 65% of all OTUs generated, as was seen in the dataset for Chapter 5, only 1/3 of these were unclassified Bacteria while over 17% were able to be classified to a genus.

The overall microbial diversity in the dataset was similar to the global diversity described in Chapter 3 (Figure 6.2). The Chloroflexi represented nearly 40% of all sequences in the dataset while the Bacteroidetes and unclassified Bacteria both represented approximately 17% of sequences. Members of both the Chloroflexi and Bacteroidetes are involved in the conversion of simple sugars and carbohydrates into SCFAs during the acidogenesis phase of AD (Hernon et al., 2006, Yamada et al., 2006), and thus their abundance in the dataset is expected (Chapter 3). The abundance of unclassified Bacteria is again similar to results seen in other pyrosequencing datasets (Bibby et al., 2011, Werner et al., 2011, Chapter 5) and large Sanger sequencing datasets (Rivière et al., 2009), suggesting that the sequences generated by pyrosequencing were not inherently more aberrant or unique compared to previous studies of microbial diversity in AD systems. Sequences classified to the phyla Actinobacteria, Proteobacteria, and made up a much larger proportion of sequences recovered than in the dataset reported in Chapter 5. The increased abundance of Proteobacteria in this dataset suggests that the primers are not inherently biased against this phylum, as had been suggested in Chapter 5.

The archaeal diversity in the dataset was limited primarily to members of the class Methanomicrobia, with the class Thermoprotei representing less than 1% of all archaeal sequences (Figure 6.3). As seen in Chapter 5, no sequences were assigned to the methanogenic class Methanobacteria, members of which have been recovered in multiple AD systems (Chapter 3, Leclerc et al., 2004). While the archaeal primers were tested in silico and in vitro for their ability to amplify members of the Methanobacteria, the findings suggest that the primers may be biased against members of the class when used for community DNA amplification. The acetoclastic genus Methanosaeta was the predominant taxon, representing nearly 52% of all archaeal sequences. There were a total of 65 OTUs classified as Methanosaeta; however, one of these represented over 88% of Methanosaeta sequences. Similar results were seen for OTUs classified as unclassified Archaea, with the dominant OTU representing 75% of sequences classified as such. Two OTUs represented 54% and 35% respectively of sequences classified as unclassified Euryarchaeota. The second most abundant unclassified Euryarchaeota OTU shared 95% sequence similarity to sequences associated with the WSA2/ArcI group of methanogens, which have been proposed to be hydrogenotrophic methanogens (Chouari et al., 2005).


6.4.3 Individual Sample Composition:

Each of the samples had a unique microbial composition based on the distribution of bacterial sequences classified at the phylum level (Figure 6.2). The SBF sample had the greatest proportion of sequences classified to the phylum Chloroflexi (50%), with the family Caldilineaceae accounting for over 43% of sequences in the sample. The next largest phylum, the Bacteroidetes, represented just over 26% of sequences in the sample, with the majority of sequences classified as either unclassified Bacteroidetes (14% of total) or unclassified Porphyromonadaceae (8% of total). Unclassified Bacteria represented nearly 11.5% of sequences while the Firmicutes represented just over 6%. The majority of Firmicutes sequences were either unclassified Clostridiales (3.7% of total) or unclassified Clostridia (1.27% of total). The Actinobacteria (~3%) and Proteobacteria (~2%) were the only other phyla representing more than 1% of sequences in the SBF sample, with unclassified Actinomycetales (2.8% of total) being the predominant group in the Actinobacteria and the genus Smithella (~1% of total) and unclassified Syntrophaceae (0.75% of total) being most abundant in the Proteobacteria. The SBF sample was the only sample from which sequences classified to the Synergistetes were observed, although only 2 sequences were recovered for this phylum. The most abundant archaeal groups were unclassified Euryarchaeota (38%), the genus Methanosaeta (32%), and unclassified Archaea (25%). The only other archaeal taxa with greater than 1% abundance were the unclassified Thermoprotei (2.4%) and unclassified Methanosarcinales (1.5%).


The UASB-G and UASB-L samples, despite being collected from the same reactor, had different bacterial and archaeal compositions. The UASB-G sample had a higher proportion of sequences classified to the phylum Chloroflexi, 45% to 18%, while the UASB-L sample had a greater proportion of Bacteroidetes, 26.8% to 8.4%. The most abundant genus in the Chloroflexi for both samples was (25% UASB-G, 10% UASB-L), which engages in the fermentation of sugars to SCFAs. The most abundant group in the Bacteroidetes was unclassified Porphyromonadaceae (3.7% UASB-G, 13.7% UASB-L), which also engages in the fermentation of sugars to SCFAs. Additionally, the Firmicutes had greater representation in the UASB-L sample (26%) than the UASB-G sample (6%) while the phylum Actinobacteria was more prevalent in the UASB-G fraction (14%) than in the UASB-L (4.4%). As these four phyla all contain genera that are known to produce SCFAs, the differences in their distribution between the granular and liquid biomass likely indicate that while the species present in each fraction are different, they are equivalent in terms of metabolic function. This is supported by the fact that the four phyla together represented nearly equivalent proportions in either sample, 73.4% UASB-G to 75.7% UASB-L. There was little difference in the abundance of unclassified bacterial sequences (18% UASB-G to 16.4% UASB-L) while the UASB-G fraction had nearly twice as many Proteobacteria as the UASB-L fraction, 6.4% to 3.8%. The genus Syntrophobacter, which is able to convert propionate to acetate (Boone and Bryant 1980), represented 3.2% of all UASB-G sequences but only 0.65% of all UASB-L sequences. Combined, members of known syntrophic families in the Proteobacteria represented over 4% of all sequences in the UASB-G sample while they represented only 1% in the UASB-L fraction. The only Proteobacterial taxon that had a higher abundance in the UASB-L sample than the UASB-G sample was the genus Tolumonas, which produces acetate, ethanol and formate as fermentation products but does not engage in syntrophic conversion of any SCFA (Caldwell et al., 2010). Within the archaeal domain, the genus Methanosaeta represented over 65% of UASB-G sequences but only 36% of UASB-L sequences. The enrichment of Methanosaeta in the granular sample is expected as this genus has been found to be the predominant methanogen in granular biomass from UASB systems (Leclerc et al., 2004, Satoh et al., 2007). The liquid sample was predominated by an unclassified Archaea OTU (57% of archaeal sequences) that is tentatively associated with members of the Crenarchaeota.

The WWTP sample had the highest proportion of both sequences (21%) and OTUs (32%) assigned as unclassified Bacteria of the four samples. This abundance is much higher than the level seen in the Sanger sequencing library of Rivière et al. (2009), which found only 6-19% of sequences were unclassified Bacteria. This increase is possibly due to the increased depth of coverage for the WWTP sample, which had 40 times as many sequence reads as each individual sample in the study by Rivière et al. (2009). The phyla Chloroflexi and Firmicutes represented 25% and 23% of sequences in the sample respectively. Similarly to the SBF sample, the unclassified Caldilineaceae was the most abundant group within the Chloroflexi, representing 18% of all sequences while the genus Levilinea, which is part of the Caldilineaceae, represented a further 5.5%. Yoon et al. (2010) recently showed that the class Anaerolineae, which contains the family Caldilineaceae, represents the majority of Chloroflexi found in municipal waste activated sludge and that these species are important in forming flocculant biomass. Within the Firmicutes, the largest group was unclassified Clostridiales (11% of sequences) while the proteolytic genus Sedimentibacter represented 2.7% of WWTP sequences. The family Ruminococcaceae represented 2.2% of WWTP sequences while sequences classified to the genus Syntrophomonas were almost exclusively found in the WWTP sample (208 sequences compared to 9 for the SBF sample and none in either UASB sample).

The Bacteroidetes represented 13.5% of WWTP sequences, with the majority of sequences being unclassified Bacteroidetes (8.5%) or unclassified Bacteroidales (3%). This result is similar to that seen by Rivière et al. (2009) and in Chapter 3, and highlights the limited knowledge concerning the role of this phylum in different AD systems. The Proteobacteria represented over 10% of WWTP sequences, with the classes Alpha- and Betaproteobacteria almost exclusively recovered from this sample. The family Comamonadaceae represented 2.5% of all sequences, with half of those classified to the genus Acidovorax. This family is commonly recovered from activated sludge treatment systems (Gumaelius et al., 2001; Heylen et al., 2008) and thus its abundance in digested sludge would be expected. The syntrophic genus Smithella was also abundant, representing 1.6% of all sequences. The only other phyla to represent more than 1% of WWTP sequences were the Spirochaetes (2.9%) and Actinobacteria (1.3%). Similar to the taxonomic diversity of the Bacteria, the archaeal diversity was also greater in the WWTP sample than in the other samples. As with the other 3 samples, the genus Methanosaeta was the predominant taxon, representing 46% of sequences, but the unclassified Methanosarcinales represented 32% while unclassified Euryarchaeota and Archaea represented only 7% and 1% of archaeal sequences respectively. The family Methanomicrobiaceae and genus Methanoculleus, which are hydrogenotrophic methanogens, were recovered almost exclusively from the WWTP sample.

6.4.4 Sample Comparisons:

As seen in Table 6.2, while each of the samples had differing numbers of raw sequence reads, the numbers of OTUs generated for a given sample were not linearly correlated with the number of reads. Additionally, analysis of the rarefaction curves (Figure 6.4) and estimates of maximum richness (Table 6.2) show that the majority of expected species were recovered for each sample. This provides support that differences in sample size alone do not explain differences in sample composition. Further, the phylogenetic distance measure for each sample, which determines the degree of phylogenetic diversity, was not linearly correlated to sample size.
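The idea behind a rarefaction curve can be sketched analytically: for a given subsampling depth, the expected number of OTUs observed follows from the hypergeometric probability that each OTU is missed entirely. The sketch below uses this closed form rather than the repeated random subsampling that an analysis package may perform; the OTU counts are hypothetical.

```python
from math import comb

def expected_otus(counts, depth):
    """Expected number of OTUs seen when `depth` reads are drawn without replacement."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total number of reads")
    all_draws = comb(total, depth)
    # For each OTU, 1 minus the probability that none of its reads are drawn.
    return sum(1 - comb(total - c, depth) / all_draws for c in counts if c > 0)

# Hypothetical community: two abundant OTUs and a tail of rare ones.
counts = [50, 30, 15, 3, 1, 1]
curve = [(depth, expected_otus(counts, depth)) for depth in (0, 1, 10, 50, 100)]
for depth, richness in curve:
    print(depth, round(richness, 2))
```

A curve that flattens well before the full read depth indicates that additional sequencing would recover few new OTUs, which is the reasoning applied to Figure 6.4.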

On the basis of OTU composition, both the SBF and WWTP samples shared the same dominant OTU, which was classified to the family Caldilineaceae of the phylum Chloroflexi. This OTU represented nearly 23% of all sequences in the SBF sample, but just over 4% in the WWTP. Similarly, this same OTU represented only approximately 3% of sequences for either UASB sample. In the UASB-G sample, the two most abundant OTUs, each representing roughly 8% of sequences in the sample, were assigned to the genus Levilinea in the family Caldilineaceae while in the UASB-L sample the most abundant OTU, representing 11% of sequences in the sample, was assigned to the family Porphyromonadaceae in the phylum Bacteroidetes. The predominance of the single OTU classified to the Caldilineaceae in the SBF sample corresponds to the higher Simpson (D) and lower Shannon equitability (E_H) values for that sample compared to the other 3 samples, which each had a more even distribution of sequences between OTUs (Table 6.2). As the evenness of a microbial community has been suggested to be a key determinant of the ability of the community to withstand perturbations (Wittebolle et al., 2009), the results suggest that the SBF system is more likely to suffer from negative operational events leading to a potential breakdown in system operation.

The initial DGGE analysis (Figure 6.1) of the samples suggested that the bacterial diversity of the two UASB samples was most similar, with the WWTP sample being the least similar to the other 3 samples. To compare the microbial diversity between samples using the pyrosequencing data, beta diversity measures were determined using both species abundance and phylogenetic methods. Sample correlation coefficients were determined using the Ochiai method for the species abundance data or the unweighted UniFrac method from the phylogenetic tree of OTUs. Ordination using principal coordinates analysis (PCoA) for both the Ochiai and UniFrac determined sample correlations produced similar results, with the primary difference between the two methods being the degree of separation between the granular and liquid fractions of the UASB sample. The species abundance data determined that these two samples were more similar than when using the phylogenetic distribution of sequences. The SBF and WWTP samples were very distinct from each other, with each being more similar to the UASB sample than to each other. Procrustes analysis (Figure 6.5) of the DGGE and pyrosequencing species abundance data showed that the two analysis methods provided similar sample correlations. This supports the finding in Chapter 5 that DGGE is able to provide an initial indication of similarities in the microbial population of AD samples.

6.5 Conclusions:

The microbial communities in AD systems are highly variable, with the distribution and abundance of taxonomic groups differing between all four AD samples analyzed. While two of the samples, UASB-G and UASB-L, were both taken from the same AD system, they represented different fractions of the microbial biomass, granular and planktonic, respectively. Thus, while these two samples shared a greater proportion of OTUs with each other than with any of the other samples, only 11% of OTUs were common between the two. The UASB biomass fractions were both more similar to the SBF sample than to the WWTP sample, with the WWTP sample having the lowest similarity to any of the other samples. This result was seen both in the initial DGGE analysis and in the subsequent pyrosequencing analysis. This confirms that DGGE is able to provide initial comparisons of the microbial diversity between AD samples, and that pyrosequencing is able to provide a more detailed description of the microbial population. Further studies examining a wide variety of AD systems can use DGGE as an initial screening method to select which samples should be explored more thoroughly using methods such as pyrosequencing to clarify the phylogenetic and functional distribution of microbes in AD systems.


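Table 6.1: List of primers used in this study for both DGGE and pyrosequencing analysis.

Purpose         Domain    Sample  Primer Pair                Barcode Sequence  Forward Primer Sequence  Reverse Primer Sequence
DGGE            Archaea   All     GC-Arc344F (a) / Univ519R  -                 ACGGGYGCAGCAGGCGCGA      ATTACCGCGGCKGCTG
DGGE            Bacteria  All     GC-Eub357F (a) / Univ519R  -                 CCTACGGGAGGCAGCAG        ATTACCGCGGCKGCTG
Pyrosequencing  Archaea   SBF     454ArcF / 454ArcR          CATGCATG          WCYGGTTGATCCYGCCRG       YGGTRTTACCGCGGCGGCT
Pyrosequencing  Archaea   UASB-G  454ArcF / 454ArcR          ATGCATGC          WCYGGTTGATCCYGCCRG       YGGTRTTACCGCGGCGGCT
Pyrosequencing  Archaea   UASB-L  454ArcF / 454ArcR          GATGAGCA          WCYGGTTGATCCYGCCRG       YGGTRTTACCGCGGCGGCT
Pyrosequencing  Archaea   WWTP    454ArcF / 454ArcR          CATCAGCA          WCYGGTTGATCCYGCCRG       YGGTRTTACCGCGGCGGCT
Pyrosequencing  Bacteria  SBF     454BactF / 454BactR        TCAGCTGA          AKRGTTYGATYNTGGCTCAG     GTNTBACCGCDGCTGCTG
Pyrosequencing  Bacteria  UASB-G  454BactF / 454BactR        GCATGCAT          AKRGTTYGATYNTGGCTCAG     GTNTBACCGCDGCTGCTG
Pyrosequencing  Bacteria  UASB-L  454BactF / 454BactR        GAGATCAG          AKRGTTYGATYNTGGCTCAG     GTNTBACCGCDGCTGCTG
Pyrosequencing  Bacteria  WWTP    454BactF / 454BactR        GATCTGCA          AKRGTTYGATYNTGGCTCAG     GTNTBACCGCDGCTGCTG

a: primers GC-Arc344F and GC-Eub357F both had a 40nt GC-clamp attached at the 5' end of the primer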
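To illustrate how the barcodes in Table 6.1 identify the sample of origin, the sketch below bins reads by their first 8 nt using the bacterial barcodes. It assumes the barcode sits at the very start of each read and must match exactly, with no error correction; this is a simplification for illustration and not the Qiime demultiplexing procedure. The function name and example reads are hypothetical.

```python
# Bacterial barcodes as listed in Table 6.1.
BACTERIAL_BARCODES = {
    "TCAGCTGA": "SBF",
    "GCATGCAT": "UASB-G",
    "GAGATCAG": "UASB-L",
    "GATCTGCA": "WWTP",
}
BARCODE_LEN = 8

def demultiplex(reads, barcode_to_sample):
    """Assign reads to samples by exact barcode match and trim the barcode."""
    bins = {sample: [] for sample in barcode_to_sample.values()}
    unassigned = []
    for read in reads:
        sample = barcode_to_sample.get(read[:BARCODE_LEN].upper())
        if sample is None:
            unassigned.append(read)                   # no matching barcode
        else:
            bins[sample].append(read[BARCODE_LEN:])   # keep the read minus its barcode
    return bins, unassigned

# Hypothetical reads: two carry known barcodes, one does not.
reads = ["TCAGCTGAACGTACGT", "gcatgcatTTTT", "AAAAAAAACCCC"]
bins, unassigned = demultiplex(reads, BACTERIAL_BARCODES)
print(bins, unassigned)
```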


Table 6.2: Individual and total sequence statistics and alpha diversity measures.

Individual Rarefaction Shannon Simpson Sample Sequences # of OTUs Singletons Doubletons Chao 1 Estimatea (H') (D) PDb SBF 66714 9674 6577 1316 26094 15584 5.45 0.0612 1073.034 UASB-G 76613 8173 5350 1155 20551 11660 5.49 0.0239 1007.190 UASB-L 37377 5643 3934 744 16027 9661 5.63 0.0241 711.229 WWTP 42965 8777 6305 1054 27614 15763 6.67 0.0077 994.928 Total 223669 28658 19186 3879 a: Maximum OTU richness estimate derived from fitting of the monomolecular curve function to the rarefaction data. 139 b: Phylogenetic distance
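As a worked check on Table 6.2, the Chao 1 column appears to be reproduced by the bias-corrected form of the estimator, S_obs + F1(F1 - 1) / (2(F2 + 1)), where F1 and F2 are the singleton and doubleton counts. That this is the form used is an inference from the tabulated numbers rather than something stated in the methods; the sketch below simply recomputes each row.

```python
def chao1_bias_corrected(observed, singletons, doubletons):
    """Bias-corrected Chao 1 richness estimate."""
    return observed + singletons * (singletons - 1) / (2 * (doubletons + 1))

# (observed OTUs, singletons, doubletons, reported Chao 1) from Table 6.2.
TABLE_6_2 = {
    "SBF": (9674, 6577, 1316, 26094),
    "UASB-G": (8173, 5350, 1155, 20551),
    "UASB-L": (5643, 3934, 744, 16027),
    "WWTP": (8777, 6305, 1054, 27614),
}

estimates = {}
for sample, (s_obs, f1, f2, reported) in TABLE_6_2.items():
    estimates[sample] = chao1_bias_corrected(s_obs, f1, f2)
    print(sample, round(estimates[sample]), reported)
```

Because the estimate grows with the square of the singleton count, the large share of singleton OTUs in each sample drives the gap between observed and estimated richness.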


Figure 6.1: Denaturing gradient gel electrophoresis (DGGE) banding profiles of the V3 hypervariable region for each sample for both the Archaea and Bacteria. Sample correlation coefficients were determined using the Ochiai method and used for creation of the dendrogram using the UPGMA method.


Figure 6.2: Neural network map showing the distribution and abundance of bacterial phyla for each sample and the entire dataset. Each sample is represented by a large colored circle (SBF=green, UASB-G=pink, UASB-L=orange, WWTP=blue). Individual phyla are represented as black circles whose size and label are sized to proportionally represent the abundance of the phylum in the total dataset. Lines connecting the samples to phyla are colored to indicate the sample of origin and the line width is proportionally set to show the relative abundance of a phylum in the original sample.


[Chart. Vertical axis: 0% to 100%. Horizontal axis: SBF, UASB-G, UASB-L, WWTP, Total. Legend: TM7, Spirochaetes, Proteobacteria, Actinobacteria, Firmicutes, Unclassified_Bacteria, Bacteroidetes, Chloroflexi.]

Figure 6.2 alternate: Distribution of bacterial sequences for each sample and the overall dataset. Only phyla representing at least 1% of sequences for any sample are shown.


[Chart. Vertical axis: % Representation, 0% to 100%. Horizontal axis: SBF, UASB-G, UASB-L, WWTP, Total. Legend: Methanoculleus, U_Thermoprotei, U_Methanomicrobiaceae, U_Methanosarcinales, U_Euryarchaeota, U_Archaea, Methanosaeta.]

Figure 6.3: Distribution of archaeal sequences for each sample and the overall dataset. Only taxa that represented at least 1% of archaeal sequences for any sample are shown.


[Plot. Vertical axis: Number of OTUs Generated, 0 to 30000. Horizontal axis: Number of Sequences Analyzed, 0 to 60000. Legend: WWTP, UASB-L, UASB-G, SBF.]

Figure 6.4: Rarefaction curves for each of the four samples.


[Plot. Axis labels: Degree of variation: 43%; Degree of variation: 32%.]

Figure 6.5: Procrustes analysis of the principal coordinates analysis (PCoA) ordination for both the DGGE and pyrosequencing sample Ochiai correlation coefficients. The principal coordinates for both the DGGE (red) and pyrosequencing (blue) samples are represented in a 2D space whose axes are synthetic but represent the degree of variation between samples. Samples clustered closer together are more similar in microbial composition than samples which are not. Sample labels are as follows: SBF (diamond), UASB-G (square), UASB-L (triangle), WWTP (circle).


CHAPTER 7

QUANTIFICATION OF METHANOGENIC ARCHAEA USING A MULTIPLEX QUANTITATIVE REAL TIME PCR (qPCR) ASSAY

7.1 Abstract:

The final stage of the anaerobic digestion (AD) process is the formation of methane biogas by archaeal methanogens. Previous reports in the literature, and the work discussed in previous chapters, have described differing proportions of methanogens in various AD systems. Additionally, the abundance of acetoclastic versus hydrogenotrophic methanogens in AD systems is debated. Multiplex quantitative real time PCR (qPCR) is able to provide accurate quantification of microbial groups in complex ecosystems such as AD systems. Using a combination of previously published and newly designed primers and TaqMan probes, two multiplex qPCR assays for five methanogenic genera previously determined to be abundant in AD systems (Methanobacterium, Methanoculleus, Methanosaeta, Methanosarcina, and the WSA2/ArcI group) were developed. Reanalyzing samples previously analyzed primarily by 16S sequencing revealed differences in abundance between the sequencing and qPCR results for the quantified genera. In particular, the genus Methanobacterium, which was not recovered in any sequencing dataset, was determined to be one of, if not the, most abundant methanogens in the samples. The proportional abundance of the genera Methanosarcina and Methanosaeta also differed by up to 88% when determined using qPCR versus pyrosequencing. This indicates that 16S rRNA sequencing is not an accurate determinant of the presence or abundance of known methanogens in AD systems or other environments.

7.2 Introduction:

Anaerobic digestion (AD) is a popular waste treatment process for the remediation of high strength organic wastewaters. Conceptually, the AD process is divided into four distinct phases, the terminal phase of which is the production of methane (CH4) biogas. This step is carried out by only a few known microorganisms, the methanogenic Archaea. As the production of methane biogas through AD has recently been investigated as a source of clean, renewable energy, increasing attention has been given to the diversity and abundance of methanogens in AD systems. The methanogens in AD systems are generally divided into two phenotypic groups based on their primary substrates for methanogenesis. The hydrogenotrophic methanogens utilize CO2 and H2 as their primary substrates, while the acetoclastic (also known as acetotrophic) methanogens primarily utilize acetate. Several methanogenic genera of both types have been defined by isolates recovered from AD sludge, including the acetoclastic Methanosaeta and the hydrogenotrophic Methanolinea (Imachi et al., 2008; Kamagata et al., 1992).

Within AD systems, the two methanogen types play important roles in the maintenance of the AD process. The acetoclastic methanogens are believed to be responsible for the majority of methane synthesis, while the hydrogenotrophic methanogens are believed to play a vital role in the consumption of H2 in order to maintain favorable thermodynamic gradients for some bacterial species such as syntrophic acetogens (Demirel and Scherer, 2008). There is disagreement in the literature, however, as to the presence and abundance of genera belonging to these two groups. Sequencing based surveys such as those conducted by Rivière et al. (2009) and in Chapters 4-6 have found that acetoclastic methanogens represent the most abundant groups, with known hydrogenotrophic methanogens a minority. However, some studies using community profiling methods such as DGGE or quantitative methods such as qPCR have found a greater abundance of hydrogenotrophic methanogens than of acetoclastic methanogens (Goberna et al., 2009).

Because sequencing surveys are known to have potential issues recovering all members of a community (Jeon et al., 2008; Reysenbach et al., 1992), quantitative real-time PCR provides a better method to accurately determine the abundance of certain methanogens in AD systems. Several qPCR primers and probes have been developed to quantify methanogens in AD systems based on both the 16S rRNA and mcrA genes (Steinberg and Regan, 2009; Yu et al., 2005). These assays have only been used in uniplex reactions, where only one target is monitored at a time. Multiplex qPCR offers the ability to monitor up to three targets in a single reaction, improving assay throughput and lowering the cost of analysis. In this study a multiplex qPCR assay was developed to quantify four commonly observed methanogenic genera, Methanobacterium, Methanosaeta, Methanosarcina, and Methanoculleus, along with the previously identified WSA2/ArcI group.


7.3 Materials and Methods:

7.3.1 Primer/Probe Set Design

Based on the results of methanogen abundance in Chapter 3, we chose five methanogen groups that have been observed in most AD samples for incorporation into the multiplex qPCR assay. These are the genera Methanosaeta (Mst), Methanobacterium (Mbt), Methanoculleus (Mcu), and Methanosarcina (Msc), and the uncultured WSA2/ArcI group (referred to hereafter solely as ArcI). Total Archaea (TArc) was also included to allow for determination of the percentage of total Archaea that each group represents in a given sample. For each group and the total Archaea, a primer/probe set was selected consisting of a forward and reverse primer in combination with a TaqMan probe specific for each target.
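The role of the TArc measurement can be illustrated with a short calculation. The following is a minimal sketch, not the analysis code used in this study: the function name and the copy numbers are hypothetical, and it assumes that each group and TArc have already been quantified in the same units.

```python
def percent_of_total_archaea(group_copies, tarc_copies):
    """Express each quantified group as a percentage of the TArc copy number."""
    if tarc_copies <= 0:
        raise ValueError("total archaeal copy number must be positive")
    return {group: 100.0 * copies / tarc_copies
            for group, copies in group_copies.items()}


# Hypothetical copy numbers for one sample (illustration only).
example_copies = {"Mst": 2.0e5, "Msc": 5.0e3, "Mbt": 7.5e5, "ArcI": 1.0e2}
shares = percent_of_total_archaea(example_copies, tarc_copies=1.0e6)
```

Because the groups and TArc are measured with separate primer/probe sets, the percentages need not sum to 100, and a sum above 100 would indicate that the TArc set is under-reporting relative to the group-specific sets.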

Previously published primer/probe sets were analyzed against the RDP database using the Probe Match tool (http://rdp.cme.msu.edu/probematch/) to determine their specificity in amplifying only the specified targets. In four cases (Methanosaeta, Methanosarcina, Methanoculleus, and total Archaea), previously published primers and probes were determined to be accurate enough for use without modification. Primers and probes for Methanobacterium and ArcI were designed using Primrose (Ashelford et al., 2002). The target sequences for Methanobacterium were created from a compilation of isolated species within the genus along with sequences recovered from AD systems and classified to the genus. Sequences derived from the study by Rivière et al. (2009), as well as related sequences identified in the dataset generated in Chapter 4, were used for the design of the ArcI primer/probe set. The specificity of the primers designed in this study was determined in silico using Probe Match as well as through the cloning and sequencing of PCR products amplified using the primers. Primer/probe sets for each group are listed in Table 7.1.
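An in silico specificity check of this kind amounts to matching primers that contain degenerate (IUPAC) bases against candidate target sequences. The sketch below is only an illustration of that idea and is not how Probe Match or Primrose are implemented. The function names and the template sequence are hypothetical; the two primers are the TArc-787F and TArc-1059R sequences listed in Table 7.1, and only exact matches (no mismatches) are considered.

```python
import re

# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
# Complement table that also handles degenerate codes.
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def primer_to_regex(primer):
    """Compile a primer with IUPAC codes into a regular expression."""
    parts = []
    for base in primer.upper():
        options = IUPAC[base]
        parts.append(options if len(options) == 1 else "[" + options + "]")
    return re.compile("".join(parts))


def reverse_complement(seq):
    """Return the reverse complement, keeping degenerate codes consistent."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def find_amplicon(template, forward, reverse):
    """Return (start, end) of the predicted amplicon on the template, or None."""
    fwd_hit = primer_to_regex(forward).search(template)
    if fwd_hit is None:
        return None
    # The reverse primer anneals to the opposite strand, so its site on the
    # given strand is the reverse complement of the primer sequence.
    rev_site = primer_to_regex(reverse_complement(reverse))
    rev_hit = rev_site.search(template, fwd_hit.end())
    if rev_hit is None:
        return None
    return fwd_hit.start(), rev_hit.end()


# Hypothetical template built around the two TArc primer sites.
template = "GG" + "ATTAGATACCCGGGTAGTCC" + "ACGTACGT" + "AGAGGAGGTGCATGGC" + "TT"
hit = find_amplicon(template, "ATTAGATACCCSBGTAGTCC", "GCCATGCACCWCCTCT")
```

A real specificity screen would additionally allow a small number of mismatches and would be run against a full reference database rather than a single sequence.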

7.3.2 qPCR analysis

Using pooled community DNA from various AD systems, a sample derived standard was created for each of the primer/probe sets as previously described (Yu et al., 2005). Each standard was subsequently quantified and then diluted serially in TE to achieve a 7 log range of concentration (i.e. 1 to 10^7 amplicon copies/ml). Samples and standards were analyzed in triplicate 25 µl multiplex reactions using the following PCR reaction conditions: 1X PCR buffer, 5 mM MgCl2, 0.07% BSA, 0.8 mM dNTPs, 0.3 mM of forward and reverse primer for each primer/probe set, 0.1 µM of each probe, 0.2 mM ROX, 0.05 U/µl Platinum Taq polymerase (Invitrogen, Carlsbad, CA), 0.5 µl of standard or sample DNA, and the balance as Milli-Q water. The multiplex qPCR was carried out on an MX3000P real-time PCR system (Stratagene, La Jolla, CA) using a two-step amplification program as follows: 95°C for 5 minutes, followed by 40 cycles of 95°C for 15 seconds and 60°C for 1 minute, with quantification of the fluorescent signal occurring after the 60°C primer annealing/extension step. The abundance of each target was determined by averaging the Ct value for each sample across two replicate runs, with individual replicates whose Ct was more than 1.5 standard deviations from the average removed as outliers.
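The quantification steps described above can be sketched as follows. This is a minimal illustration under stated assumptions rather than the analysis actually performed: the function names, the standard curve values (an idealised slope of -3.3 and intercept of 40) and the sample Ct values are all hypothetical, the sample standard deviation is assumed, and the six Ct values from two triplicate runs are assumed to be pooled before filtering. Pooling matters here because a 1.5 standard deviation cutoff can never exclude a value from a set of only three.

```python
from statistics import mean, stdev


def fit_standard_curve(log10_copies, cts):
    """Least-squares fit of Ct = slope * log10(copies) + intercept."""
    x_bar, y_bar = mean(log10_copies), mean(cts)
    sxx = sum((x - x_bar) ** 2 for x in log10_copies)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(log10_copies, cts))
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def copies_from_ct(ct, slope, intercept):
    """Invert the standard curve to estimate the target copy number."""
    return 10 ** ((ct - intercept) / slope)


def mean_ct_without_outliers(cts, n_sd=1.5):
    """Average Ct after dropping replicates more than n_sd SDs from the mean."""
    centre, spread = mean(cts), stdev(cts)
    kept = [ct for ct in cts if abs(ct - centre) <= n_sd * spread]
    return mean(kept)


# Hypothetical standards spanning 10^0 to 10^7 copies.
std_logs = list(range(8))
std_cts = [40.0 - 3.3 * x for x in std_logs]
slope, intercept = fit_standard_curve(std_logs, std_cts)

# Hypothetical Ct values pooled from two triplicate runs of one sample.
sample_cts = [20.0, 20.1, 19.9, 20.0, 20.1, 23.0]
sample_ct = mean_ct_without_outliers(sample_cts)
sample_copies = copies_from_ct(sample_ct, slope, intercept)
```

In this example the 23.0 replicate lies more than 1.5 standard deviations from the pooled mean and is dropped before averaging, while the remaining five values are kept.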

7.4 Results and Discussion:


With the exception of the primer/probe set targeting the genus Methanoculleus (Mcu), each of the primer/probe sets was able to generate reproducible results when used in both uniplex and multiplex qPCR reactions. As a result, the Mcu set was not utilized for further analysis. Figure 7.1 shows the percent abundance of each of the four remaining groups (Mst, Msc, Mbt, ArcI) in the same samples used for pyrosequencing in the studies detailed in Chapters 5 and 6. The most striking difference between the qPCR data and the sequencing data is the large abundance of Methanobacterium in the qPCR data, while no sequences for this genus were recovered in the pyrosequencing data. This result was unexpected because the archaeal primers used for constructing the amplicon pools for both pyrosequencing studies (Chapters 5 and 6) matched Methanobacterium sequences when checked in silico. The most likely explanation is that the primers used for these studies, though capable of annealing to and amplifying Methanobacterium sequences, were biased against this genus when used in vitro to amplify 16S sequences from community DNA. This has been seen before, where broad specificity primers have failed to amplify members known to be present in the sample community (Jeon et al., 2008).

The abundance of Methanobacterium sequences in the samples suggests that hydrogenotrophic methanogens are present in high abundance in AD samples. This would agree with findings that hydrogenotrophic methanogens play an important role in the removal of hydrogen in AD systems (Schink, 1997). As can be seen in Figure 7.1, the total archaeal population determined using the TArc primer/probe set (blue line) often produced a lower value than the sum of the four quantified groups (orange line). Indeed, the average value for the TArc set was around 1 log lower than results previously reported in the literature using the same primer/probe set for the investigation of AD systems (Shin et al., 2010; Song et al., 2010). The reason for this result is currently unknown; however, issues such as the efficiency of extraction of all archaeal DNA in the samples or, more likely, insufficient optimization of the qPCR assay could both negatively affect the results.

Aside from the differences between the qPCR and pyrosequencing datasets for the genus Methanobacterium, differences in the proportional abundance of the other two quantified genera and the ArcI group were observed. Table 7.2 shows the percent abundance of the four measured groups in both the pyrosequencing datasets described in Chapters 5 and 6 and the qPCR data generated in this study. While the genus Methanosaeta was generally recovered in similar proportions using both methods, the abundance differed between 4 and 24% depending on the sample. Two samples, Day 57 and Day 427/SBF, had a higher measured abundance of Methanosaeta when using qPCR, while the other seven samples had higher abundance in the pyrosequencing data. The largest differences between the pyrosequencing and qPCR datasets were for the genus Methanosarcina. For two samples, Day 246 and Day 331, Methanosarcina was determined to represent 83 and 92% of all Archaea, respectively, using pyrosequencing, but only 4% when determined using qPCR. In both of these samples, the abundance of Methanosaeta in both the pyrosequencing and qPCR data was low, while the abundance of Methanobacterium in the qPCR data was over 90%. This pattern agrees with the hypothesis that the pyrosequencing primers failed to recover Methanobacterium sequences, as the pyrosequencing data would represent only members of the Archaea not associated with Methanobacterium. The ArcI group was found in only three samples in either the qPCR or pyrosequencing data: UASB-G, UASB-L, and WWTP. For the UASB samples, the proportion of ArcI in the sample was much greater in the pyrosequencing data than in the qPCR data (Table 7.2). In the WWTP sample, the ArcI group was observed to represent only 4% of total sequences in the pyrosequencing dataset, while in the qPCR dataset this level was 69%. Rivière et al. (2009), using clone library sequences, determined that the ArcI group represented up to 77% of Archaea in samples derived from waste water treatment plants. Due to the discrepancies between the pyrosequencing and qPCR data for the ArcI group, its true abundance in AD systems, particularly waste water treatment plants, is unresolved.

Figure 7.2 shows the percent abundance of the four measured groups for all of the 21 samples obtained from the digester run described in Chapter 5. While the abundance of none of the groups can be directly correlated to any of the performance data for the reactor, some generalized conclusions can be reached based on our knowledge of the digester operation. Days 97-194 corresponded to the step-wise increase in temperature from mesophilic (~37°C) to thermophilic (~55°C). As seen in the figure, this corresponded with a decrease in the total archaeal population, with Methanobacterium representing an even greater proportion of the total as temperature was increased. This agrees with previous reports of methanogen abundance during the shift from mesophilic to thermophilic operation (Hofman-Bang et al., 2003).


Over the course of the thermophilic operational period, transition metals were added to the reactor vessel in order to stimulate the growth and activity of methanogens. These additions occurred on days 119, 128, 132, 143, 154, 158, and 165. As seen in Figure 7.2, there was an increase in the proportion of Methanobacterium; however, as these additions coincided with the period of thermophilic operation, any direct effect of metals addition on the methanogen population is confounded by the temperature increase. During the period from day 183 to 197, a series of acetic acid additions were made as the primary source of feed for the system. As can be seen in Figure 7.2, this led to an initial increase in the proportion of Methanosaeta and Methanosarcina in the system, along with an overall increase in the abundance of methanogens. As both Methanosaeta and Methanosarcina are acetoclastic, the increase in their abundance in the samples would be expected when acetate was added. Sample days 246-331 coincided with an increase in the pH of the system to nearly 7.8. This was primarily due to an increase in ammonia production that was subsequently brought under control through the addition of HCl (data not shown). Disturbances such as an increase in ammonia production are known to have a negative effect on methanogen populations, often leading to a shift from acetoclastic methanogenesis to a combined pathway based on acetate oxidation by syntrophic bacteria combined with hydrogenotrophic methanogenesis (Angenent et al., 2002; Karakashev et al., 2006; Zinder and Koch, 1984). This would correspond to the increase in Methanobacterium abundance to nearly 95% of the total. The greater abundance of Methanosarcina compared to Methanosaeta is most likely due to the increased tolerance of Methanosarcina spp. to inhibitory compounds such as ammonia, and to the fact that some Methanosarcina spp. are capable of using a variety of substrates for methanogenesis.

7.5 Conclusions:

While 16S rRNA sequencing surveys are a popular strategy for uncovering the microbial diversity in environmental samples such as AD systems, the results observed in such datasets may not accurately represent the true presence and abundance of certain phylogenetic clades. The multiplex qPCR assays developed in this study showed that the genus Methanobacterium was missed in prior sequencing datasets of the AD samples discussed in previous chapters. This indicates that for AD systems, where methanogens play a key role in the AD process, qPCR assays such as the two developed here should be used in combination with sequencing surveys to validate the sequencing results. As only three known genera and one uncultured clade (ArcI) were quantified, the ability to accurately link the results to reactor performance characteristics was limited, despite these groups supposedly representing the major methanogenic groups in AD systems. This study, however, was the first to quantify the abundance of the ArcI group, which was found to be abundant in the AD systems of WWTPs, but only a minority member of other AD systems. The unique distribution of the ArcI group suggests that this group serves a unique function in AD systems operating on municipal waste water sludge and deserves further examination of its distribution in the environment.


Table 7.1 Primers and probes used in this study. Multiplex sets are defined as the primer/probe sets used in the multiplex qPCR reactions.

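Set    Target group      Name         Sequence (5' - 3')                Amplicon length  Reference
Set A  WSA2/ArcI         ArcI-F       GCTCATGCATTGCATGG                 143 bp           This Study
Set A  WSA2/ArcI         ArcI-Taq     Cy5-GTAATACCGGCAGCTCGAGTGG-BHQ    143 bp           This Study
Set A  WSA2/ArcI         ArcI-R       TATCCGGCTACGAACGTT                143 bp           This Study
Set A  Methanobacterium  Mbt-202F     CGCCTAAGGATGGATC                  148 bp           This Study
Set A  Methanobacterium  Mbt-341Taq   FAM-CGCGAAACCTCCGCAATGC-BHQ       148 bp           This Study
Set A  Methanobacterium  Mbt-399R     TAAGAGTGGCACTTGGGK                148 bp           This Study
Set B  Methanoculleus    Mcu-934F     AGGAATTGGCGGGGGAGCAC              309 bp           Franke-Whittle et al. 2009; Shigematsu et al. 2003
Set B  Methanoculleus    Mcu-1023Taq  Cy5-GAATGATTGCCGGGCTGAAGACTC-BHQ  309 bp           Franke-Whittle et al. 2009; Shigematsu et al. 2003
Set B  Methanoculleus    Mcu-1200R    CCGGATAATTCGGGGCATGCTG            309 bp           Franke-Whittle et al. 2009; Shigematsu et al. 2003
Set B  Methanosarcina    MB1b         CGGTTTGGTCAGTCCTCCGG              271 bp           Shigematsu et al. 2003
Set B  Methanosarcina    SAR761Taq    HEX-ACCAGAACGGGTTCGACGGTGAGG-BHQ  271 bp           Shigematsu et al. 2003
Set B  Methanosarcina    SAR835R      AGACACGGTCGCGCCATGCCT             271 bp           Shigematsu et al. 2003
Set B  Methanosaeta      MS1b         CCGGCCGGATAAGTCTCTTGA             272 bp           Shigematsu et al. 2003
Set B  Methanosaeta      SAE761Taq    FAM-ACCAGAACGGACCTGACGGCAAGG-BHQ  272 bp           Shigematsu et al. 2003
Set B  Methanosaeta      SAE835R      GACAACGGTCGCACCGTGGCC             272 bp           Shigematsu et al. 2003
-      Total Archaea     TArc-787F    ATTAGATACCCSBGTAGTCC              271 bp           Yu et al. 2005
-      Total Archaea     TArc-915Taq  HEX-AGGAATTGGCGGGGGAGCAC-BHQ      271 bp           Yu et al. 2005
-      Total Archaea     TArc-1059R   GCCATGCACCWCCTCT                  271 bp           Yu et al. 2005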


Table 7.2 Comparison of the proportion of methanogens in qPCR and pyrosequencing data. Analysis method refers to the qPCR (Q) data determined in this study and pyrosequencing (P) data determined in chapters 5 and 6. The group Other Archaea represents all sequences in the pyrosequencing data that were not assigned to any of the 4 groups examined using qPCR. Mst: Methanosaeta, Msc: Methanosarcina, Mbt: Methanobacterium.

Sample         Analysis method  Mst     Msc     ArcI     Mbt  Other Archaea
Day 57         Q                83%     -       0.04%    17%  -
Day 57         P                67%     0.36%   -        -    33%
Day 122        Q                36%     -       0.25%    64%  -
Day 122        P                48%     -       -        -    52%
Day 150        Q                12%     0.74%   0.08%    87%  -
Day 150        P                18%     6%      -        -    76%
Day 246        Q                4%      4%      0.01%    92%  -
Day 246        P                8%      83%     -        -    9%
Day 331        Q                0.002%  4%      0.0001%  96%  -
Day 331        P                1.71%   92%     -        -    7%
Day 427 / SBF  Q                47%     -       0.72%    52%  -
Day 427 / SBF  P                32%     0.30%   -        -    68%
UASB-G         Q                41%     0.004%  0.04%    59%  -
UASB-G         P                65%     0.25%   9%       -    26%
UASB-L         Q                21%     0.002%  0.03%    79%  -
UASB-L         P                35%     -       2.50%    -    63%
WWTP           Q                30%     -       69%      1%   -
WWTP           P                46%     -       4%       -    50%
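The method comparison summarised in Table 7.2 can be reproduced with a few lines of code. The sketch below is illustrative only: the Methanosaeta and Methanosarcina percentages are transcribed from Table 7.2, the variable and function names are hypothetical, and a dash in the table is assumed to mean 0%.

```python
# Percent abundance from Table 7.2 as (qPCR, pyrosequencing) pairs.
# A "-" entry in the table is treated as 0.0 (an assumption).
MST = {
    "Day 57": (83.0, 67.0), "Day 122": (36.0, 48.0), "Day 150": (12.0, 18.0),
    "Day 246": (4.0, 8.0), "Day 331": (0.002, 1.71),
    "Day 427/SBF": (47.0, 32.0), "UASB-G": (41.0, 65.0),
    "UASB-L": (21.0, 35.0), "WWTP": (30.0, 46.0),
}
MSC = {
    "Day 57": (0.0, 0.36), "Day 122": (0.0, 0.0), "Day 150": (0.74, 6.0),
    "Day 246": (4.0, 83.0), "Day 331": (4.0, 92.0),
    "Day 427/SBF": (0.0, 0.30), "UASB-G": (0.004, 0.25),
    "UASB-L": (0.002, 0.0), "WWTP": (0.0, 0.0),
}


def absolute_differences(table):
    """Absolute difference in percentage points between the two methods."""
    return {sample: abs(q - p) for sample, (q, p) in table.items()}


def higher_by_qpcr(table):
    """Samples in which qPCR gave the higher proportional abundance."""
    return [sample for sample, (q, p) in table.items() if q > p]


msc_diff = absolute_differences(MSC)
largest_msc_sample = max(msc_diff, key=msc_diff.get)
mst_higher_by_qpcr = higher_by_qpcr(MST)
```

The largest Methanosarcina discrepancy occurs in the Day 331 sample (92% by pyrosequencing versus 4% by qPCR), and Methanosaeta is higher by qPCR only in the Day 57 and Day 427/SBF samples.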


[Figure 7.1 chart. X-axis: samples 57, 122, 150, 246, 331, 427/SBF, UASB-G, UASB-L, WWTP. Left y-axis: proportional abundance, 0% to 100%. Right y-axis: log scale, 1.00E+00 to 1.00E+07. Legend: ArcI, Methanobacterium, Methanosaeta, Methanosarcina, TArc, Total (ArcI+Mbt+MS1b+MB1b).]

Figure 7.1 Proportional abundance of methanogens in the samples used for pyrosequencing. The four quantified methanogens are presented as proportional abundance in columns for each sample. Total archaea (TArc) and the sum of the four quantified methanogens are shown as lines, with values read off the right y-axis. As can be seen, the TArc primer/probe set often gave lower values than the sum of the four quantified genera.


[Figure 7.2 chart. X-axis: sample days 57, 79, 85, 97, 98, 116, 122, 134, 150, 164, 188, 194, 246, 261, 289, 331, 350, 352, 382, 394, 427. Left y-axis: proportional abundance, 0% to 100%. Right y-axis: log scale, 1.00E+00 to 1.00E+07. Legend: ArcI, Methanobacterium, Methanosaeta, Methanosarcina, TArc, Total (ArcI+Mbt+MS1b+MB1b).]

Figure 7.2 Proportional abundance of methanogens in AD samples recovered over time from the same AD systems used in Chapter 5. The four quantified methanogens are presented as proportional abundance in columns for each sample. Total archaea (TArc) and the sum of the four quantified methanogens are shown as lines, with values read off the right y-axis.


CHAPTER 8

GENERAL DISCUSSION

While AD has been used for decades as a waste processing technology, its ability to serve as a potential renewable energy resource has spurred increased attention into how the microbial communities in AD systems carry out the digestion process and what factors influence their activity. Previous surveys of the microbial diversity have uncovered a wide variety of microorganisms involved in the AD process; however, even the largest studies estimate that at most 75% of the expected maximum number of species involved in AD have been observed (Rivière et al. 2009). This has led to these microbial communities being referred to as “black boxes” whose ultimate diversity and activity are still poorly understood (Ahring 2003). The goal of the research presented in this dissertation was to use advanced molecular and statistical analysis methods in order to further characterize the microbial diversity within these complex systems.

Because initial studies of microbial diversity in AD systems were often based on small sequencing datasets, and many datasets were produced and not published, I conducted a meta-analysis of available 16S rRNA sequences in order to re-evaluate and update the known diversity participating in AD. The results, as discussed in Chapter 3, showed that the majority of bacterial species in AD systems are members of the phyla Bacteroidetes, Chloroflexi, Firmicutes, and Proteobacteria, while the majority of Archaea were methanogens. This result agrees with individual studies of the alpha microbial diversity in AD systems, as would be expected; however, a significant number of bacterial sequences could not be classified to a known genus. This severely inhibits attempts to place these species in their proper context within the four phases of the AD process.

Within the Archaea, the obligately acetoclastic genus Methanosaeta was the most predominant group, while known hydrogenotrophic methanogens represented a minority. Initial analysis suggests that the large number of unclassified Euryarchaeota in the dataset represent an as yet uncultured group of hydrogenotrophic methanogens referred to as the WSA2/ArcI group. It is possible that this group represents the “missing” hydrogenotrophic methanogens which are expected to be abundant in AD systems, where they serve as a hydrogen sink to allow syntrophic acetogenesis to occur (Schink, 1997). Using qPCR for the WSA2/ArcI group showed that it is abundant in WWTP systems, but has low or no representation in other AD systems. Further analysis of the role of the WSA2/ArcI group is necessary to determine whether it is ubiquitous in AD systems and what its presence may indicate.

Historically, only the granular biomass of systems that produce granular sludge has been investigated, with the liquid/planktonic biomass component left unresolved. Fractionation of granular sludge into separate granular and liquid components, as used in Chapters 4 and 6, showed differences in microbial diversity that may be indicative of AD system performance. Principally, in the study detailed in Chapter 4, both the family Veillonellaceae and the phylum Bacteroidetes were more abundant in the liquid fraction than in the granular fraction. These differences were shown to be even greater in the study discussed in Chapter 6, where large differences in the abundance of specific genera were identified. As these differences have not previously been explored, the results of this research suggest that the examination of both biomass fractions should be included in the design of further experiments on granular sludge systems such as the UASB design popularly used in industry.

The effects of organic loading rate, hydraulic retention time, and concentration of SCFAs on microbial populations in AD systems have traditionally been examined using community profiling methods such as DGGE. However, the ability of DGGE to accurately reflect changes in microbial diversity is debated. The use of large scale pyrosequencing of bacterial and archaeal communities provided a thorough and extensive account of the microbial diversity in the studies discussed in Chapters 5 and 6. Comparisons of sample similarity using DGGE and pyrosequencing data provided similar findings. Future researchers using DGGE as a screening method to determine initial sample similarities can use the results of such analysis to determine which samples should be investigated further using more robust techniques such as pyrosequencing. While pyrosequencing can be applied to hundreds of samples in a single multiplexed sequencing run, the number of sequences per sample decreases as the number of samples increases. In cases such as AD systems, where minority species may be important indicators of reactor performance, pre-screening with DGGE can allow the researcher to choose a smaller subset of samples for deeper sequencing analysis. A caveat, however, is that this similarity has only been shown in our results where the hypervariable region screened using DGGE (V3) was also covered within the pyrosequencing data (V1-3). As different hypervariable regions are often used for DGGE and pyrosequencing analyses, the results of comparisons using different hypervariable regions will need to be confirmed.

While a global framework for the microbial diversity of AD systems is now available (Chapter 3), it is not necessarily representative of any single AD system. As seen in the results of Chapters 4-6, the microbial diversity of different AD systems, and even of one AD system over time, is highly variable and dependent on a number of known and unknown factors. Known factors influencing diversity, including temperature, feedstock composition, and organic loading rate among others, have been investigated primarily using community profiling methods such as DGGE. As demonstrated in Chapters 5 and 6, DGGE is able to accurately portray relationships between microbial communities; however, it is unable to provide detailed insight into which microbial groups are changing. Multiplexed pyrosequencing allows for the examination of multiple samples at a single time, with the generation of thousands of sequences per sample. The exploratory use of CCA in Chapter 5 and its use by Werner et al. (2011) in the analysis of pyrosequencing datasets showed that common ecological statistics used in the study of macrobiotic communities can successfully be applied to microbial communities. This allows for a statistical linkage of specific microbial groups to operational parameters. Later studies will need to incorporate such statistical analyses into the experimental design to robustly analyze the patterns of microbial diversity in AD systems and determine if certain groups are linked to negative events that lead to system breakdown.

Although differences between the microbial consortia of granular and liquid biomass are now known to occur (Chapters 4, 6), the importance and role of the microorganisms in the liquid fraction need to be clarified. It is known that during initial granule development, select microbial species play a role in physically aggregating microbes (Hulshoff Pol et al. 2004); however, whether there is a selective inclusion or exclusion of certain microbes is not yet understood. A controlled study examining the stages of granule formation and steady state operation of a small scale UASB reactor using multiplexed pyrosequencing of the granular and liquid fractions could determine the extent to which the microbial communities of these fractions are distinct from each other. By applying advanced statistical analysis methods such as CCA or machine learning, it could be possible to determine if the two fractions represent different microbial niches in a single AD reactor. To control for the possibility that the planktonic microbiota are simply transient residents, brought into and removed from the reactor with the incoming feedstock and treated effluent, sampling of both the feedstock and effluent microbial communities should also be conducted.

Ultimately, while 16S sequencing surveys using pyrosequencing can provide detailed information on the microbial diversity in AD systems, they are unable to accurately portray the metabolic functioning of the system. This is partly due to the large number of as yet unclassified microorganisms, for which little to no metabolic information is known (Chapters 3-6). Further, the recovery of 16S sequences for identifiable species will not necessarily correlate with their known metabolic activity. The rapidly advancing fields of metagenomics and metatranscriptomics can provide unique insight into the patterns of microbial activity in AD systems, even for species which have not yet been isolated in pure culture. Metagenome analysis using pyrosequencing allows for the assembly of whole genomes for uncultured microorganisms, while recent advances in Illumina sequencing technology provide unprecedented levels of coverage for transcript libraries. Using a multiple sequencing technology approach, highly accurate metagenomic libraries can be constructed from combined Sanger and pyrosequencing data for a small number of samples. These assembled metagenomes can then serve as a scaffold for the mapping of transcript reads, allowing for the tracking of changes in microbial activity in response to a variety of perturbations. It is very likely that the results of such analyses will yield more insight into the activity of microorganisms in AD systems than has previously been obtained using traditional molecular analysis methods.

As AD is increasingly necessary for waste treatment, and has been proposed to be a key component in the development of clean and renewable energy systems, further developments to maximize treatment efficiency and biogas production are necessary. The results of the work presented in the preceding chapters provide a more accurate depiction of the microbial communities involved in the AD process. These results, combined with those from the proposed analyses above, can then be incorporated into the current ADM1 model to create a more accurate picture of how the AD process occurs, providing new avenues for system optimization.


WORKS CITED

Abdo, Z., Schüette, U.M.E., Bent, S.J., Williams, C.J., Forney, L.J., Joyce, P., 2006. Statistical methods for characterizing diversity of microbial communities by analysis of terminal restriction fragment length polymorphisms of 16S rRNA genes. Environ Microbiol, 8, 929-38.

Acosta-Martinez, V., Dowd, S., Sun, Y., Allen, V., 2008. Tag-encoded pyrosequencing analysis of bacterial diversity in a single soil type as affected by management and land use. Soil Biol Biochem, 40, 2762-2770.

Ahring, B., Sandberg, M., Angelidaki, I., 1995. Volatile fatty-acids as indicators of process imbalance in anaerobic digesters. Appl Microbiol Biotechnol, 43, 559-565.

Ahring, B.K., 2003. Perspectives for anaerobic digestion. Adv Biochem Eng Biotechnol, 81, 1-30.

Ahring, B.K., Mladenovska, Z., Iranpour, R., Westermann, P., 2002. State of the art and future perspectives of thermophilic anaerobic digestion. Water Sci Technol, 45, 293-8.

Akarsubasi, A.T., Ince, O., Kirdar, B., Oz, N.A., Orhon, D., Curtis, T.P., Head, I.M., Ince, B.K., 2005. Effect of wastewater composition on archaeal population diversity. Water Res, 39, 1576-84.

Alatriste-Mondragón, F., Samar, P., Cox, H.H.J., Ahring, B.K., Iranpour, R., 2006. Anaerobic codigestion of municipal, farm, and industrial organic wastes: a survey of recent literature. Water Environ Res, 78, 607-36.

Anderson, I.J., Ulrich, L.E., Lupa, B., Susanti, D., Porat, I., Hooper, S.D., Lykidis, A., Sieprawska-Lupa, M., Dharmarajan, L., Goltsman, E., Lapidus, A., Saunders, E., Han, C., Land, M., Lucas, S., Mukhopadhyay, B., Whitman, W.B., Woese, C., Bristow, J., Kyrpides, N., 2009. Genomic characterization of methanomicrobiales reveals three classes of methanogens. PLoS ONE, 4, e5797.

Andersson, A.F., Lindberg, M., Jakobsson, H., Bäckhed, F., Nyrén, P., Engstrand, L., 2008. Comparative analysis of human gut microbiota by barcoded pyrosequencing. PLoS ONE, 3, e2836.


Angelidaki, I., Ellegaard, L., Ahring, B.K., 2003. Applications of the anaerobic digestion process. Adv Biochem Eng Biotechnol, 82, 1-33.

Angenent, L.T., Sung, S., 2001. Development of anaerobic migrating blanket reactor (AMBR), a novel anaerobic treatment system. Water Res, 35, 1739-47.

Angenent, L.T., Sung, S., Raskin, L., 2002. Methanogenic population dynamics during startup of a full-scale anaerobic sequencing batch reactor treating swine waste. Water Res, 36, 4648-54.

Angenent, L.T., Sung, S., Raskin, L., 2004. Formation of granules and Methanosaeta fibres in an anaerobic migrating blanket reactor (AMBR). Environ Microbiol, 6, 315-22.

Ariesyady, H.D., Ito, T., Okabe, S., 2007a. Functional bacterial and archaeal community structures of major trophic groups in a full-scale anaerobic sludge digester. Water Res, 41, 1554-68.

Ariesyady, H.D., Ito, T., Yoshiguchi, K., Okabe, S., 2007b. Phylogenetic and functional diversity of propionate-oxidizing bacteria in an anaerobic digester sludge. Appl Microbiol Biotechnol, 75, 673-83.

Ashelford, K.E., Chuzhanova, N.A., Fry, J.C., Jones, A.J., Weightman, A.J., 2005. At least 1 in 20 16S rRNA sequence records currently held in public repositories is estimated to contain substantial anomalies. Appl Environ Microbiol, 71, 7724-36.

Ashelford, K.E., Chuzhanova, N.A., Fry, J.C., Jones, A.J., Weightman, A.J., 2006. New screening software shows that most recent large 16S rRNA gene clone libraries contain chimeras. Appl Environ Microbiol, 72, 5734-41.

Balk, M., Weijma, J., Stams, A.J.M., 2002. Thermotoga lettingae sp. nov., a novel thermophilic, methanol-degrading bacterium isolated from a thermophilic anaerobic reactor. Int J Syst Evol Microbiol, 52, 1361-8.

Bapteste, E., Brochier, C., Boucher, Y., 2005. Higher-level classification of the Archaea: evolution of methanogenesis and methanogens. Archaea, 1, 353-63.

Barredo, M.S., Evison, L.M., 1991. Effect of propionate toxicity on methanogen-enriched sludge, Methanobrevibacter smithii, and Methanospirillum hungatii at different pH values. Appl Environ Microbiol, 57, 1764-9.

Bibby, K., Viau, E., Peccia, J., 2010. Pyrosequencing of the 16S rRNA gene to reveal bacterial pathogen diversity in biosolids. Water Res, 44, 4252-60.

Binladen, J., Gilbert, M.T.P., Bollback, J.P., Panitz, F., Bendixen, C., Nielsen, R., Willerslev, E., 2007. The use of coded PCR primers enables high-throughput sequencing of multiple homolog amplification products by 454 parallel sequencing. PLoS ONE, 2, e197.

Blackall, L.L., Stratton, H., Bradford, D., Dot, T.D., Sjörup, C., Seviour, E.M., Seviour, R.J., 1996. "Candidatus Microthrix parvicella", a filamentous bacterium from activated sludge sewage treatment plants. Int J Syst Bacteriol, 46, 344-6.

Boone, D.R., Bryant, M., 1980. Propionate-Degrading Bacterium, Syntrophobacter wolinii sp. nov. gen. nov., from Methanogenic Ecosystems. Appl Environ Microbiol, 40, 626-632.

Burrell, P.C., O'Sullivan, C., Song, H., Clarke, W.P., Blackall, L.L., 2004. Identification, detection, and spatial resolution of Clostridium populations responsible for cellulose degradation in a methanogenic landfill leachate bioreactor. Appl Environ Microbiol, 70, 2414-9.

Caldwell, M.E., Allen, T.D., Lawson, P.A., Tanner, R.S., 2010. Tolumonas osonensis sp. nov., isolated from anoxic freshwater sediment. Int J Syst Evol Microbiol.

Caporaso, J.G. (2010, December 17). New default parameters for uclust otu pickers [Web log message]. Retrieved from http://qiime.wordpress.com/2010/12/17/new-default-parameters-for-uclust-otu-pickers/

Caporaso, J.G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F.D., Costello, E.K., Fierer, N., Peña, A.G., Goodrich, J.K., Gordon, J.I., Huttley, G.A., Kelley, S.T., Knights, D., Koenig, J.E., Ley, R.E., Lozupone, C.A., McDonald, D., Muegge, B.D., Pirrung, M., Reeder, J., Sevinsky, J.R., Turnbaugh, P.J., Walters, W.A., Widmann, J., Yatsunenko, T., Zaneveld, J., Knight, R., 2010. QIIME allows analysis of high-throughput community sequencing data. Nat Methods, 7, 335-6.

Chao, A., 1987. Estimating the Population Size for Capture-Recapture Data with Unequal Catchability. Biometrics, 43, 783-791.

Chao, A., Bunge, J., 2002. Estimating the number of species in a Stochastic abundance model. Biometrics, 58, 531-539.

Chen, S., Dong, X., 2005. Proteiniphilum acetatigenes gen. nov., sp. nov., from a UASB reactor treating brewery wastewater. Int J Syst Evol Microbiol, 55, 2257-61.

Chen, Y., Cheng, J.J., Creamer, K.S., 2008. Inhibition of anaerobic digestion process: a review. Bioresour Technol, 99, 4044-64.

Chouari, R., Le Paslier, D., Daegelen, P., Ginestet, P., Weissenbach, J., Sghir, A., 2003. Molecular evidence for novel planctomycete diversity in a municipal wastewater treatment plant. Appl Environ Microbiol, 69, 7354-63.

Chouari, R., Le Paslier, D., Daegelen, P., Ginestet, P., Weissenbach, J., Sghir, A., 2005. Novel predominant archaeal and bacterial groups revealed by molecular analysis of an anaerobic sludge digester. Environ Microbiol, 7, 1104-15.

Chouari, R., Le Paslier, D., Dauga, C., Daegelen, P., Weissenbach, J., Sghir, A., 2005. Novel major bacterial candidate division within a municipal anaerobic sludge digester. Appl Environ Microbiol, 71, 2145-53.

Collado, L., Levican, A., Perez, J., Figueras, M.J., 2010. Arcobacter defluvii sp. nov., isolated from sewage samples. Int J Syst Evol Microbiol, [Epub ahead of print] doi: 10.1099/ijs.0.025668-0.

Cressman, M.D., Yu, Z., Nelson, M.C., Moeller, S.J., Lilburn, M.S., Zerby, H.N., 2010. Interrelations between the microbiotas in the litter and in the intestines of commercial broiler chickens. Appl Environ Microbiol, 76, 6572-6582.

de Bok, F.A.M., Stams, A.J.M., Dijkema, C., Boone, D.R., 2001. Pathway of propionate oxidation by a syntrophic culture of Smithella propionica and Methanospirillum hungatei. Appl Environ Microbiol, 67, 1800-4.

Delbès, C., Moletta, R., Godon, J.J., 2000. Monitoring of activity dynamics of an anaerobic digester bacterial community using 16S rRNA polymerase chain reaction--single-strand conformation polymorphism analysis. Environ Microbiol, 2, 506-15.

Demirel, B., Scherer, P., 2008. The roles of acetotrophic and hydrogenotrophic methanogens during anaerobic conversion of biomass to methane: a review. Rev Environ Sci Biotechnol, 7, 173-190.

DeSantis, T.Z., Hugenholtz, P., Keller, K., Brodie, E.L., Larsen, N., Piceno, Y.M., Phan, R., Andersen, G.L., 2006. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res, 34, W394-9.

Díaz, E.E., Amils, R., Sanz, J.L., 2003. Molecular ecology of anaerobic granular sludge grown at different conditions. Water Sci Technol, 48, 57-64.

Díaz, E.E., Stams, A.J.M., Amils, R., Sanz, J.L., 2006. Phenotypic properties and microbial diversity of methanogenic granules from a full-scale upflow anaerobic sludge bed reactor treating brewery wastewater. Appl Environ Microbiol, 72, 4942-9.

Dollhopf, S.L., Hashsham, S.A., Tiedje, J.M., 2001. Interpreting 16S rDNA T-RFLP data: application of self-organizing maps and principal component analysis to describe community dynamics and convergence. Microb Ecol, 42, 495-505.

Dong, X., Xin, Y., Jian, W., Liu, X., Ling, D., 2000. Bifidobacterium thermacidophilum sp. nov., isolated from an anaerobic digester. Int J Syst Evol Microbiol, 50 Pt 1, 119-25.

Edwards, J., McEwan, N., Travis, A., Wallace, R.J., 2004. 16S rDNA library-based analysis of ruminal bacterial diversity. Antonie van Leeuwenhoek, 86, 263-81.

Egert, M., Friedrich, M.W., 2003. Formation of pseudo-terminal restriction fragments, a PCR-related bias affecting terminal restriction fragment length polymorphism analysis of microbial community structure. Appl Environ Microbiol, 69, 2555-62.

Etchebehere, C., Pavan, M.E., Zorzópulos, J., Soubes, M., Muxí, L., 1998. Coprothermobacter platensis sp. nov., a new anaerobic proteolytic thermophilic bacterium isolated from an anaerobic mesophilic sludge. Int J Syst Bacteriol, 48 Pt 4, 1297-304.

Energy Information Administration. (2010). International Energy Outlook 2010 (DOE Report No. DOE/EIA-0484(2010)). Washington, DC: U.S. Government Printing Office.

Fermoso, F.G., Bartacek, J., Manzano, R., van Leeuwen, H.P., Lens, P.N.L., 2010. Dosing of anaerobic granular sludge bioreactors with cobalt: Impact of cobalt retention on methanogenic activity. Bioresour Technol, 101(24), 9429-9437.

Fernández, N., Díaz, E.E., Amils, R., Sanz, J.L., 2008. Analysis of microbial community during biofilm development in an anaerobic wastewater treatment reactor. Microb Ecol, 56, 121-32.

Figuerola, E.L.M., Erijman, L., 2007. Bacterial taxa abundance pattern in an industrial wastewater treatment system determined by the full rRNA cycle approach. Environ Microbiol, 9, 1780-1789.

Fisher, R.A., Corbet, A.S., Williams, C., 1943. The relation between the number of species and the number of individuals in a random sample of an animal population. J Anim Ecol, 12, 42-58.

Franke-Whittle, I.H., Goberna, M., Insam, H., 2009. Design and testing of real-time PCR primers for the quantification of Methanoculleus, Methanosarcina, Methanothermobacter, and a group of uncultured methanogens. Can J Microbiol, 55, 611-6.

Gafan, G.P., Spratt, D.A., 2005. Denaturing gradient gel electrophoresis gel expansion (DGGEGE)--an attempt to resolve the limitations of co-migration in the DGGE of complex polymicrobial communities. FEMS Microbiol Lett, 253, 303-7.

Ganesan, A., Chaussonnerie, S., Tarrade, A., Dauga, C., Bouchez, T., Pelletier, E., Le Paslier, D., Sghir, A., 2008. Cloacibacillus evryensis gen. nov., sp. nov., a novel asaccharolytic, mesophilic, amino-acid-degrading bacterium within the phylum 'Synergistetes', isolated from an anaerobic sludge digester. Int J Syst Evol Microbiol, 58, 2003-12.

Garcia, J.L., Patel, B.K., Ollivier, B., 2000. Taxonomic, phylogenetic, and ecological diversity of methanogenic Archaea. Anaerobe, 6, 205-26.

Goberna, M., Insam, H., Franke-Whittle, I.H., 2009. Effect of biowaste sludge maturation on the diversity of thermophilic bacteria and archaea in an anaerobic reactor. Appl Environ Microbiol, 75, 2566-72.

Godon, J.J., Zumstein, E., Dabert, P., Habouzit, F., Moletta, R., 1997. Molecular microbial diversity of an anaerobic digestor as determined by small-subunit rDNA sequence analysis. Appl Environ Microbiol, 63, 2802-13.

Greenacre, M., 2007. Correspondence Analysis in Practice. Chapman & Hall, Boca Raton, FL.

Gujer, W., Henze, M., Mino, T., van Loosdrecht, M., 1999. Activated Sludge Model No. 3. Water Science and Technology, 39, 183-193.

Gumaelius, L., Magnusson, G., Pettersson, B., Dalhammar, G., 2001. Comamonas denitrificans sp. nov., an efficient denitrifying bacterium isolated from activated sludge. Int J Syst Evol Microbiol, 51, 999-1006.

Haas, B.J., Gevers, D., Earl, A., Feldgarden, M., Ward, D.V., Giannoukos, G., Ciulla, D., Tabbaa, D., Highlander, S.K., Sodergren, E., Methe, B., DeSantis, T.Z., Petrosino, J.F., Knight, R., Birren, B.W., 2011. Chimeric 16S rRNA sequence formation and detection in Sanger and 454-pyrosequenced PCR amplicons. Genome Res, 21(3), 494-504.

Hamady, M., Walker, J.J., Harris, J.K., Gold, N.J., Knight, R., 2008. Error-correcting barcoded primers for pyrosequencing hundreds of samples in multiplex. Nat Methods, 5, 235-7.

Hatamoto, M., Imachi, H., Ohashi, A., Harada, H., 2007. Identification and cultivation of anaerobic, syntrophic long-chain fatty acid-degrading microbes from mesophilic and thermophilic methanogenic sludges. Appl Environ Microbiol, 73, 1332-40.

Hernon, F., Forbes, C., Colleran, E., 2006. Identification of mesophilic and thermophilic fermentative species in anaerobic granular sludge. Water Sci Technol, 54, 19-24.

Heylen, K., Lebbe, L., De Vos, P., 2008. Acidovorax caeni sp. nov., a denitrifying species with genetically diverse isolates from activated sludge. Int J Syst Evol Microbiol, 58, 73-7.

Høj, L., Olsen, R.A., Torsvik, V.L., 2005. Archaeal communities in High Arctic wetlands at Spitsbergen, Norway (78 degrees N) as characterized by 16S rRNA gene fingerprinting. FEMS Microbiol Ecol, 53, 89-101.

Holland, P.M., Abramson, R.D., Watson, R., Gelfand, D.H., 1991. Detection of specific polymerase chain reaction product by utilizing the 5'----3' exonuclease activity of Thermus aquaticus DNA polymerase. Proc Natl Acad Sci USA, 88, 7276-80.

Holmes, D.E., Nevin, K.P., Woodard, T.L., Peacock, A.D., Lovley, D.R., 2007. Prolixibacter bellariivorans gen. nov., sp. nov., a sugar-fermenting, psychrotolerant anaerobe of the phylum Bacteroidetes, isolated from a marine-sediment fuel cell. Int J Syst Evol Microbiol, 57, 701-7.

Hong, S., Bunge, J., Leslin, C., Jeon, S., Epstein, S., 2009. Polymerase chain reaction primers miss half of rRNA microbial diversity. ISME J, 3(12), 1365-1373.

Hori, T., Haruta, S., Ueno, Y., Ishii, M., Igarashi, Y., 2006. Dynamic transition of a methanogenic population in response to the concentration of volatile fatty acids in a thermophilic anaerobic digester. Appl Environ Microbiol, 72, 1623-30.

Huber, J.A., Welch, D.B.M., Morrison, H.G., Huse, S.M., Neal, P.R., Butterfield, D.A., Sogin, M.L., 2007. Microbial population structures in the deep marine biosphere. Science, 318, 97-100.

Hugenholtz, P., Goebel, B.M., Pace, N.R., 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J Bacteriol, 180, 4765-74.

Hughes, J.B., Hellmann, J.J., Ricketts, T.H., Bohannan, B.J., 2001. Counting the uncountable: statistical approaches to estimating microbial diversity. Appl Environ Microbiol, 67, 4399-406.

Hulshoff Pol, L.W., de Castro Lopes, S.I., Lettinga, G., Lens, P.N.L., 2004. Anaerobic sludge granulation. Water Res, 38, 1376-89.

Hunter, M.L., Gibbs, J.P., 2006. Fundamentals of Conservation Biology (3rd ed.). Wiley-Blackwell, Malden, MA.

Hurlbert, S., 1971. The nonconcept of species diversity: a critique and alternative parameters. Ecology, 52, 577-586.

Imachi, H., Sakai, S., Sekiguchi, Y., Hanada, S., Kamagata, Y., Ohashi, A., Harada, H., 2008. Methanolinea tarda gen. nov., sp. nov., a methane-producing archaeon isolated from a methanogenic digester sludge. Int J Syst Evol Microbiol, 58, 294-301.

IPCC, 2007: Climate Change 2007: Synthesis Report. Contribution of Working Groups I, II and III to the Fourth Assessment Report of the Intergovernmental Panel on Climate Change [Core Writing Team, Pachauri, R.K and Reisinger, A. (eds.)]. IPCC, Geneva, Switzerland, 104 pp.

Iranpour, R., Oh, S., Cox, H.H.J., Shao, Y.J., Moghaddam, O., Kearney, R.J., Deshusses, M.A., Stenstrom, M.K., Ahring, B.K., 2002. Changing mesophilic wastewater sludge digestion into thermophilic operation at Terminal Island Treatment Plant. Water Environ Res, 74, 494-507.

Iranpour, R., Stenstrom, M., Tchobanoglous, G., Miller, D., Wright, J., Vossoughi, M., 1999. Environmental engineering: energy value of replacing waste disposal with resource recovery. Science, 285, 706-11.

IWA Task Group for Mathematical Modelling of Anaerobic Digestion Processes, 2002. Anaerobic Digestion Model No. 1 (ADM1). IWA Publishing, London, UK.

Jeon, S., Bunge, J., Leslin, C., Stoeck, T., Hong, S., Epstein, S.S., 2008. Environmental rRNA inventories miss over half of protistan diversity. BMC Microbiol, 8, 222.

Jetten, M., Stams, A., Zehnder, A., 1992. Methanogenesis from acetate: a comparison of the acetate metabolism in Methanothrix soehngenii and Methanosarcina spp. FEMS Microbiol Rev, 88, 181-197.

Kamagata, Y., Kawasaki, H., Oyaizu, H., Nakamura, K., Mikami, E., Endo, G., Koga, Y., Yamasato, K., 1992. Characterization of three thermophilic strains of Methanothrix ("Methanosaeta") thermophila sp. nov. and rejection of Methanothrix ("Methanosaeta") thermoacetophila. Int J Syst Bacteriol, 42, 463-8.

Karakashev, D., Batstone, D.J., Trably, E., Angelidaki, I., 2006. Acetate oxidation is the dominant methanogenic pathway from acetate in the absence of Methanosaetaceae. Appl Environ Microbiol, 72, 5138-41.

Kim, M., Morrison, M., Yu, Z., 2011. Evaluation of different partial 16S rRNA gene sequence regions for phylogenetic analysis of microbiomes. J Microbiol Methods, 84, 81-7.

Krause, L., Diaz, N.N., Edwards, R.A., Gartemann, K.-H., Krömeke, H., Neuweger, H., Pühler, A., Runte, K.J., Schlüter, A., Stoye, J., Szczepanowski, R., Tauch, A., Goesmann, A., 2008. Taxonomic composition and gene content of a methane-producing microbial community isolated from a biogas reactor. J Biotechnol, 136, 91-101.

Kröber, M., Bekel, T., Diaz, N.N., Goesmann, A., Jaenicke, S., Krause, L., Miller, D., Runte, K.J., Viehöver, P., Pühler, A., Schlüter, A., 2009. Phylogenetic characterization of a biogas plant microbial community integrating clone library 16S-rDNA sequences and metagenome sequence data obtained by 454-pyrosequencing. J Biotechnol, 142, 38-49.

Kurahashi, M., Fukunaga, Y., Sakiyama, Y., Harayama, S., Yokota, A., 2009. Iamia majanohamensis gen. nov., sp. nov., an actinobacterium isolated from Holothuria edulis, and proposal of Iamiaceae fam. nov. Int J Syst Evol Microbiol, 59, 869-73.

Larue, R., Yu, Z., Parisi, V.A., Egan, A.R., Morrison, M., 2005. Novel microbial diversity adherent to plant biomass in the herbivore gastrointestinal tract, as revealed by ribosomal intergenic spacer analysis and rrs gene sequencing. Environ Microbiol, 7, 530-43.

Lane, D.J., 1991. 16S/23S rRNA sequencing. In: Stackebrandt, E., Goodfellow, M. (Eds.), Nucleic Acid Techniques in Bacterial Systematics. Wiley, Chichester, England, pp. 115-175.

Leclerc, M., Delgènes, J.-P., Godon, J.-J., 2004. Diversity of the archaeal community in 44 anaerobic digesters as determined by single strand conformation polymorphism analysis and 16S rDNA sequencing. Environ Microbiol, 6, 809-19.

Lee, C., Kim, J., Shin, S.G., Hwang, S., 2008. Monitoring bacterial and archaeal community shifts in a mesophilic anaerobic batch reactor treating a high-strength organic wastewater. FEMS Microbiol Ecol, 65, 544-54.

Lee, N., Nielsen, P.H., Andreasen, K.H., Juretschko, S., Nielsen, J.L., Schleifer, K.H., Wagner, M., 1999. Combination of fluorescent in situ hybridization and microautoradiography-a new tool for structure-function analyses in microbial ecology. Appl Environ Microbiol, 65, 1289-97.

Lefebvre, O., Quentin, S., Torrijos, M., Godon, J.J., Delgenès, J.-P., Moletta, R., 2007. Impact of increasing NaCl concentrations on the performance and community composition of two anaerobic reactors. Appl Microbiol Biotechnol, 75, 61-9.

Legendre, P., Legendre, L., 1998. Numerical Ecology, 2nd ed. Elsevier Science, New York, NY.

Lettinga, G., 1995. Anaerobic digestion and wastewater treatment systems. Antonie van Leeuwenhoek, 67, 3-28.

Lettinga, G., Van Velsen, A., Hobma, S., Zeeuw, W.D., Klapwijk, A., 1980. Use of the upflow sludge blanket (USB) reactor concept for biological wastewater treatment, especially for anaerobic treatment. Biotechnol Bioeng, 22, 699-734.

Levén, L., Eriksson, A.R.B., Schnürer, A., 2007. Effect of process temperature on bacterial and archaeal communities in two methanogenic bioreactors treating organic household waste. FEMS Microbiol Ecol, 59, 683-93.

Levin, D.B., Zhu, H., Beland, M., Cicek, N., Holbein, B.E., 2007. Potential for hydrogen and methane production from biomass residues in Canada. Bioresour Technol, 98, 654-60.

Liu, Y., Balkwill, D.L., Aldrich, H.C., Drake, G.R., Boone, D.R., 1999. Characterization of the anaerobic propionate-degrading syntrophs Smithella propionica gen. nov., sp. nov. and Syntrophobacter wolinii. Int J Syst Bacteriol, 49 Pt 2, 545-56.

Lozupone, C., Hamady, M., Knight, R., 2006. UniFrac--an online tool for comparing microbial community diversity in a phylogenetic context. BMC Bioinformatics, 7, 371.

Ludwig, W., Strunk, O., Westram, R., Richter, L., Meier, H., Yadhukumar, Buchner, A., Lai, T., Steppi, S., Jobb, G., Förster, W., Brettske, I., Gerber, S., Ginhart, A.W., Gross, O., Grumann, S., Hermann, S., Jost, R., König, A., Liss, T., Lüssmann, R., May, M., Nonhoff, B., Reichel, B., Strehlow, R., Stamatakis, A., Stuckmann, N., Vilbig, A., Lenke, M., Ludwig, T., Bode, A., Schleifer, K.-H., 2004. ARB: a software environment for sequence data. Nucleic Acids Res, 32, 1363-71.

Lv, W., Schanbacher, F.L., Yu, Z., 2010. Putting microbes to work in sequence: Recent advances in temperature-phased anaerobic digestion processes. Bioresour Technol, 101(24), 9409-9414.

Mackie, R.I., Bryant, M.P., 1981. Metabolic activity of fatty acid-oxidizing bacteria and the contribution of acetate, propionate, butyrate, and CO(2) to methanogenesis in cattle waste at 40 and 60 degrees C. Appl Environ Microbiol, 41, 1363-73.

Malin, C., Illmer, P., 2008. Ability of DNA content and DGGE analysis to reflect the performance condition of an anaerobic biowaste fermenter. Microbiol Res, 163, 503-11.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., Berka, J., Braverman, M.S., Chen, Y.-J., Chen, Z., Dewell, S.B., Du, L., Fierro, J.M., Gomes, X.V., Godwin, B.C., He, W., Helgesen, S., Ho, C.H., Ho, C.H., Irzyk, G.P., Jando, S.C., Alenquer, M.L.I., Jarvie, T.P., Jirage, K.B., Kim, J.-B., Knight, J.R., Lanza, J.R., Leamon, J.H., Lefkowitz, S.M., Lei, M., Li, J., Lohman, K.L., Lu, H., Makhijani, V.B., McDade, K.E., McKenna, M.P., Myers, E.W., Nickerson, E., Nobile, J.R., Plant, R., Puc, B.P., Ronan, M.T., Roth, G.T., Sarkis, G.J., Simons, J.F., Simpson, J.W., Srinivasan, M., Tartaro, K.R., Tomasz, A., Vogt, K.A., Volkmer, G.A., Wang, S.H., Wang, Y., Weiner, M.P., Yu, P., Begley, R.F., Rothberg, J.M., 2005. Genome sequencing in microfabricated high-density picolitre reactors. Nature, 437, 376-80.

Mohan, Sunny, 2008. Study on biomethonization of waste water from jam industries. Bioresour Technol, 99, 210-213.

Muyzer, G., de Waal, E.C., Uitterlinden, A.G., 1993. Profiling of complex microbial populations by denaturing gradient gel electrophoresis analysis of polymerase chain reaction-amplified genes coding for 16S rRNA. Appl Environ Microbiol, 59, 695-700.

Nielsen, H.B., Uellendahl, H., Ahring, B.K., 2007. Regulation and optimization of the biogas process: Propionate as a key parameter. Biomass Bioenerg, 31, 820-830.

Oksanen, J., Blanchet, F.G., Kindt, R., Legendre, P., O'Hara, R.B., Simpson, G.L., Solymos, P., Stevens, M.H.H., Wagner, H., 2011. vegan: Community Ecology Package. R package version 1.18-25/r1537. http://R-Forge.R-project.org/projects/vegan/

O'Reilly, J., Lee, C., Chinalia, F., Collins, G., Mahony, T., O'Flaherty, V., 2010. Microbial community dynamics associated with biomass granulation in low-temperature (15 degrees C) anaerobic wastewater treatment bioreactors. Bioresour Technol, 101(16), 6336-6344.

O'Sullivan, C.A., Burrell, P.C., Clarke, W.P., Blackall, L.L., 2005. Structure of a cellulose degrading bacterial community during anaerobic digestion. Biotechnol Bioeng, 92, 871-8.

Parameswaran, P., Jalili, R., Tao, L., Shokralla, S., Gharizadeh, B., Ronaghi, M., Fire, A.Z., 2007. A pyrosequencing-tailored nucleotide barcode design unveils opportunities for large-scale sample multiplexing. Nucleic Acids Res, 35, e130.

Pender, S., Toomey, M., Carton, M., Eardly, D., Patching, J.W., Colleran, E., O'Flaherty, V., 2004. Long-term effects of operating temperature and sulphate addition on the methanogenic community structure of anaerobic hybrid reactors. Water Res, 38, 619-30.

Petersen, S., Ahring, B., 1992. The influence of sulfate on substrate utilization in a thermophilic sewage-sludge digester. Appl Microbiol Biotechnol, 36, 805-809.

Polz, M.F., Cavanaugh, C.M., 1998. Bias in template-to-product ratios in multitemplate PCR. Appl Environ Microbiol, 64, 3724-30.

Quince, C., Lanzén, A., Curtis, T.P., Davenport, R.J., Hall, N., Head, I.M., Read, L.F., Sloan, W.T., 2009. Accurate determination of microbial diversity from 454 pyrosequencing data. Nat Methods, 6, 639-41.

Ramette, A., 2007. Multivariate analyses in microbial ecology. FEMS Microbiol Ecol, 62, 142-60.

Reeder, J., Knight, R., 2009. The 'rare biosphere': a reality check. Nat Methods, 6, 636-7.

Reysenbach, A.L., Giver, L.J., Wickham, G.S., Pace, N.R., 1992. Differential amplification of rRNA genes by polymerase chain reaction. Appl Environ Microbiol, 58, 3417-8.

Rivière, D., Desvignes, V., Pelletier, E., Chaussonnerie, S., Guermazi, S., Weissenbach, J., Li, T., Camacho, P., Sghir, A., 2009. Towards the definition of a core of microorganisms involved in anaerobic digestion of sludge. ISME J, 3, 700-14.

Roest, K., Heilig, H.G.H.J., Smidt, H., de Vos, W.M., Stams, A.J.M., Akkermans, A.D.L., 2005. Community analysis of a full-scale anaerobic bioreactor treating paper mill wastewater. Syst Appl Microbiol, 28, 175-85.

Rossello-Mora, R., 2003. Opinion: The species problem, can we achieve a universal concept? Syst Appl Microbiol, 26, 323-326.

Sasaki, D., Hori, T., Haruta, S., Ueno, Y., Ishii, M., Igarashi, Y., 2011. Methanogenic pathway and community structure in a thermophilic anaerobic digestion process of organic solid waste. J Biosci Bioeng, 111, 41-6.

Satoh, H., Miura, Y., Tsushima, I., Okabe, S., 2007. Layered structure of bacterial and archaeal communities and their in situ activities in anaerobic granules. Appl Environ Microbiol, 73, 7300-7.

Schink, B., 1997. Energetics of syntrophic cooperation in methanogenic degradation. Microbiol Mol Biol Rev, 61, 262-80.

Schloss, P.D., Handelsman, J., 2004. Status of the microbial census. Microbiol Mol Biol Rev, 68, 686-91.

Schloss, P.D., Westcott, S.L., Ryabin, T., Hall, J.R., Hartmann, M., Hollister, E.B., Lesniewski, R.A., Oakley, B.B., Parks, D.H., Robinson, C.J., Sahl, J.W., Stres, B., Thallinger, G.G., Van Horn, D.J., Weber, C.F., 2009. Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 75, 7537-41.

Schlüter, A., Bekel, T., Diaz, N.N., Dondrup, M., Eichenlaub, R., Gartemann, K.-H., Krahn, I., Krause, L., Krömeke, H., Kruse, O., Mussgnug, J.H., Neuweger, H., Niehaus, K., Pühler, A., Runte, K.J., Szczepanowski, R., Tauch, A., Tilker, A., Viehöver, P., Goesmann, A., 2008. The metagenome of a biogas-producing microbial community of a production-scale biogas plant fermenter analysed by the 454-pyrosequencing technology. J Biotechnol, 136, 77-90.

Sekiguchi, Y., 2006. Yet-to-be cultured microorganisms relevant to methane fermentation processes. Microbes and Environments, 21, 1-15.

Sekiguchi, Y., Kamagata, Y., Nakamura, K., Ohashi, A., Harada, H., 1999. Fluorescence in situ hybridization using 16S rRNA-targeted oligonucleotides reveals localization of methanogens and selected uncultured bacteria in mesophilic and thermophilic sludge granules. Appl Environ Microbiol, 65, 1280-8.

Shin, S.G., Lee, S., Lee, C., Hwang, K., Hwang, S., 2010. Qualitative and quantitative assessment of microbial community in batch anaerobic digestion of secondary sludge. Bioresour Technol, 101, 9461-9470.

Simberloff, D., 1972. Properties of rarefaction diversity measurement. Am Nat, 106, 414-418.

Skiadas, I.V., Gavala, H.N., Schmidt, J.E., Ahring, B.K., 2003. Anaerobic granular sludge and biofilm reactors. Adv Biochem Eng Biotechnol, 82, 35-67.

Slatkin, M., Maddison, W.P., 1990. Detecting isolation by distance using phylogenies of genes. Genetics, 126, 249-60.

Sogin, M.L., Morrison, H.G., Huber, J.A., Welch, D.M., Huse, S.M., Neal, P.R., Arrieta, J.M., Herndl, G.J., 2006. Microbial diversity in the deep sea and the underexplored "rare biosphere". Proc Natl Acad Sci USA, 103, 12115-20.

Song, H., Clarke, W.P., Blackall, L.L., 2005. Concurrent microscopic observations and activity measurements of cellulose hydrolyzing and methanogenic populations during the batch anaerobic digestion of crystalline cellulose. Biotechnol Bioeng, 91, 369-78.

Song, M., Shin, S.G., Hwang, S., 2010. Methanogenic population dynamics assessed by real-time quantitative PCR in sludge granule in upflow anaerobic sludge blanket treating swine wastewater. Bioresour Technol, 101 Suppl 1, S23-8.

Sousa, D.Z., Pereira, M.A., Smidt, H., Stams, A.J.M., Alves, M.M., 2007. Molecular assessment of complex microbial communities degrading long chain fatty acids in methanogenic bioreactors. FEMS Microbiol Ecol, 60, 252-65.

Sousa, D.Z., Smidt, H., Alves, M.M., Stams, A.J.M., 2007. Syntrophomonas zehnderi sp. nov., an anaerobe that degrades long-chain fatty acids in co-culture with Methanobacterium formicicum. Int J Syst Evol Microbiol, 57, 609-15.

Speece, R.E., Boonyakitsombut, S., Kim, M., Azbar, N., Ursillo, P., 2006. Overview of anaerobic treatment: thermophilic and propionate implications. Water Environ Res, 78, 460-73.

Steinberg, L., Regan, J., 2009. An mcrA-Targeted Real-Time Quantitative PCR Method to Examine Methanogen Communities. Appl Environ Microbiol, 75(13), 4435-4442.

Supaphol, S., Jenkins, S.N., Intomo, P., Waite, I.S., O'Donnell, A.G., 2011. Microbial community dynamics in mesophilic anaerobic co-digestion of mixed waste. Bioresour Technol, 102, 4021-7.

Suzuki, M.T., Giovannoni, S.J., 1996. Bias caused by template annealing in the amplification of mixtures of 16S rRNA genes by PCR. Appl Environ Microbiol, 62, 625-30.

Ter Braak, C., 1986. Canonical correspondence analysis: a new eigenvector technique for multivariate direct gradient analysis. Ecology, 67, 1167-1179.

Ueki, A., Akasaka, H., Suzuki, D., Ueki, K., 2006. Paludibacter propionicigenes gen. nov., sp. nov., a novel strictly anaerobic, Gram-negative, propionate-producing bacterium isolated from plant residue in irrigated rice-field soil in Japan. Int J Syst Evol Microbiol, 56, 39-44.

Vavilin, V.A., Vasiliev, V.B., Rytov, S.V., 1996. Simulation of constituent processes of anaerobic degradation of organic matter by the "methane" model. Antonie van Leeuwenhoek, 69, 15-23.

Wang, Q., Garrity, G.M., Tiedje, J.M., Cole, J.R., 2007. Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol, 73, 5261-7.

Ward, D.M., Weller, R., Bateson, M.M., 1990. 16S rRNA sequences reveal numerous uncultured microorganisms in a natural community. Nature, 345, 63-5.

Weiss, A., Jérôme, V., Freitag, R., Mayer, H.K., 2008. Diversity of the resident microbiota in a thermophilic municipal biogas plant. Appl Microbiol Biotechnol, 81, 163-73.

Werner, J.J., Knights, D., Garcia, M.L., Scalfone, N.B., Smith, S., Yarasheski, K., Cummings, T.A., Beers, A.R., Knight, R., Angenent, L.T., 2011. Bacterial community structures are unique and resilient in full-scale bioenergy systems. Proc Natl Acad Sci USA, 108, 4158-4163.

Westermann, P., Ahring, B.K., Mah, R.A., 1989. Threshold acetate concentrations for acetate catabolism by aceticlastic methanogenic bacteria. Appl Environ Microbiol, 55, 514-5.

Wittebolle, L., Marzorati, M., Clement, L., Balloi, A., Daffonchio, D., Heylen, K., De Vos, P., Verstraete, W., Boon, N., 2009. Initial community evenness favours functionality under selective stress. Nature, 458, 623-6.

Whittaker, R., 1972. Evolution and measurement of species diversity. Taxon, 21(2/3), 213-251.

Woese, C.R., Fox, G.E., 1977. Phylogenetic structure of the prokaryotic domain: the primary kingdoms. Proc Natl Acad Sci USA, 74, 5088-90.

Yadvika, Santosh, Sreekrishnan, T.R., Kohli, S., Rana, V., 2004. Enhancement of biogas production from solid substrates using different techniques--a review. Bioresour Technol, 95, 1-10.

Yamada, T., Imachi, H., Ohashi, A., Harada, H., Hanada, S., Kamagata, Y., Sekiguchi, Y., 2007. Bellilinea caldifistulae gen. nov., sp. nov. and Longilinea arvoryzae gen. nov., sp. nov., strictly anaerobic, filamentous bacteria of the phylum Chloroflexi isolated from methanogenic propionate-degrading consortia. Int J Syst Evol Microbiol, 57, 2299-306.

Yamada, T., Sekiguchi, Y., Hanada, S., Imachi, H., Ohashi, A., Harada, H., Kamagata, Y., 2006. Anaerolinea thermolimosa sp. nov., Levilinea saccharolytica gen. nov., sp. nov. and Leptolinea tardivitalis gen. nov., sp. nov., novel filamentous anaerobes, and description of the new classes Anaerolineae classis nov. and Caldilineae classis nov. in the bacterial phylum Chloroflexi. Int J Syst Evol Microbiol, 56, 1331-40.

Yamada, T., Sekiguchi, Y., Imachi, H., Kamagata, Y., Ohashi, A., Harada, H., 2005. Diversity, localization, and physiological properties of filamentous microbes belonging to Chloroflexi subphylum I in mesophilic and thermophilic methanogenic sludge granules. Appl Environ Microbiol, 71, 7493-503.

Yang, J.C., Chynoweth, D.P., Williams, D.S., Li, A., 1990. Clostridium aldrichii sp. nov., a cellulolytic mesophile inhabiting a wood-fermenting anaerobic digester. Int J Syst Bacteriol, 40, 268-72.

Yoon, D.-N., Park, S.-J., Kim, S.-J., Jeon, C.O., Chae, J.-C., Rhee, S.-K., 2010. Isolation, characterization, and abundance of filamentous members of Caldilineae in activated sludge. J Microbiol, 48, 275-83.

Youssef, N.H., Elshahed, M.S., 2008. Species richness in soil bacterial communities: A proposed approach to overcome sample size bias. J Microbiol Methods, 75, 86-91.

Yu, Y., Kim, J., Hwang, S., 2006. Use of real-time PCR for group-specific quantification of aceticlastic methanogens in anaerobic processes: population dynamics and community structures. Biotechnol Bioeng, 93, 424-33.

Yu, Y., Lee, C., Kim, J., Hwang, S., 2005. Group-specific primer and probe sets to detect methanogenic communities using quantitative real-time polymerase chain reaction. Biotechnol Bioeng, 89, 670-9.

Yu, Z., Morrison, M., 2004. Comparisons of different hypervariable regions of rrs genes for use in fingerprinting of microbial communities by PCR-denaturing gradient gel electrophoresis. Appl Environ Microbiol, 70, 4800-6.

Yu, Z., Morrison, M., 2004. Improved extraction of PCR-quality community DNA from digesta and fecal samples. BioTechniques, 36, 808-12.

Yu, Z., Michel, F.C., Hansen, G., Wittum, T., Morrison, M., 2005. Development and application of real-time PCR assays for quantification of genes encoding tetracycline resistance. Appl Environ Microbiol, 71, 6926-33.

Yu, Z., Morrison, M., Schanbacher, F.L., 2010. Production and Utilization of Methane Biogas as Renewable Fuel. in: N.Q. Alain Vertes, Hideaki Yukawa and Hans Blaschek (Ed.) Biomass to Biofuels: Strategies for Global Industries. Wiley, New York, New York.

Zellner, G., Messner, P., Winter, J., Stackebrandt, E., 1998. Methanoculleus palmolei sp. nov., an irregularly coccoid methanogen from an anaerobic digester treating wastewater of a palm oil plant in North-Sumatra, Indonesia. Int J Syst Bacteriol, 48 Pt 4, 1111-7.

Zhang, C., Liu, X., Dong, X., 2004. Syntrophomonas curvata sp. nov., an anaerobe that degrades fatty acids in co-culture with methanogens. Int J Syst Evol Microbiol, 54, 969-73.

Zhao, Y., Ren, N., Wang, A., 2008. Contributions of fermentative acidogenic bacteria and sulfate-reducing bacteria to lactate degradation and sulfate reduction. Chemosphere, 72, 233-42.

Zheng, D., Angenent, L.T., Raskin, L., 2006. Monitoring granule formation in anaerobic upflow bioreactors using oligonucleotide hybridization probes. Biotechnol Bioeng, 94, 458-72.

Zinder, S.H., Koch, M., 1984. Non-acetoclastic methanogenesis from acetate: acetate oxidation by a thermophilic syntrophic coculture. Arch Microbiol, 138, 263-272.

Zuur, A.K., Ieno, E.N., Smith, G.M., 2007. Analysing Ecological Data. Springer Science, New York, NY.

APPENDIX A:

ADDITIONAL PYROSEQUENCING METHODS

Supplementary Materials and Methods for Chapters 5 and 6:

Data Processing:

Processing of the sequence datasets provided by the Keck sequencing center was conducted using the Qiime bioinformatics software package, v1.1.2 (Caporaso et al. 2010). The returned sequence (*.fna) and quality (*.qual) files were first processed to deconvolute the returned data files and generate the individual sample libraries needed for further analysis. The split_libraries.py command was used to generate initial sample libraries with the following screening parameters: sequences with homopolymer stretches greater than 8 nt, more than 2 N bases, sequence lengths greater than 650 bp, or overall quality scores < 25 were removed. Forward and reverse primers were trimmed, allowing for a 1 base primer mismatch, and the barcodes were checked using the hamming_8 option. Sequences generated from the reverse template primer were reverse complemented, and the forward and reverse-complemented reads were then combined into a single sample file. Sequences were aligned using PyNAST against the Greengenes core set database using the align_seqs.py command with standard options. Sequences were then screened for chimeras using the identify_chimeric_seqs.py command by the ChimeraSlayer method with standard options. Sequences identified as being chimeric were filtered from both the aligned and unaligned sequence files prior to further analyses.
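The screening criteria listed above can be illustrated with a minimal Python sketch. This is not the split_libraries.py implementation; it only restates the four thresholds given in the text. The function names are hypothetical, and the "overall quality score" is interpreted here as the mean of the per-base quality values, which is an assumption.

```python
import re

# Thresholds taken from the screening parameters stated in the text.
MAX_HOMOPOLYMER = 8   # reads with a homopolymer run longer than 8 nt are removed
MAX_N = 2             # reads with more than 2 ambiguous (N) bases are removed
MAX_LENGTH = 650      # reads longer than 650 bp are removed
MIN_MEAN_QUAL = 25    # reads with a mean quality below 25 are removed (assumed: mean)


def longest_homopolymer(seq):
    """Return the length of the longest run of one repeated base."""
    return max((len(m.group(0)) for m in re.finditer(r"(.)\1*", seq)), default=0)


def passes_screen(seq, quals):
    """Return True if a read survives all four screening criteria."""
    seq = seq.upper()
    if longest_homopolymer(seq) > MAX_HOMOPOLYMER:
        return False
    if seq.count("N") > MAX_N:
        return False
    if len(seq) > MAX_LENGTH:
        return False
    if not quals or sum(quals) / len(quals) < MIN_MEAN_QUAL:
        return False
    return True
```

A read is kept only when every test passes, so the order of the checks affects speed but not the outcome.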

To generate OTU groupings, the unaligned, non-chimeric sequence file was processed using the pick_otus.py command using the uclust method with standard options. With these options, sequences are binned into OTUs at a 97% similarity cutoff, which is the most commonly used threshold for binning OTUs. Representative sequences for each OTU were chosen with the pick_rep_set.py command using standard options, with the aligned, non-chimeric sequence file serving as the file from which sequences were chosen. The aligned representative sequences were classified using the assign_taxonomy.py command using the RDP method. The OTU mapping file (the result of pick_otus.py) and the taxonomy file were used to generate the OTU table using the make_otu_table.py command. Prior to construction of a phylogenetic tree, the aligned sequences were processed using the filter_alignment.py command to remove the hypervariable regions as noted by the Lane mask. The resulting filtered alignment was used to construct a phylogenetic tree using the FastTree method implemented in the make_phylogeny.py command.
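The relationship between the OTU mapping file and the OTU table can be sketched as follows. This is an illustrative reconstruction rather than the make_otu_table.py code. It assumes that each line of the mapping file holds an OTU identifier followed by tab-separated read identifiers, and that each read identifier has the form SampleID_number. The function names are hypothetical.

```python
from collections import defaultdict


def parse_otu_map(lines):
    """Parse 'otu_id<TAB>read_id<TAB>read_id...' lines into {otu_id: [read_ids]}."""
    otu_map = {}
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue  # skip blank or malformed lines
        otu_map[fields[0]] = fields[1:]
    return otu_map


def make_otu_table(otu_map):
    """Count reads per sample for each OTU.

    Assumption: the sample name is everything before the last underscore
    of the read identifier (e.g. 'D1_42' belongs to sample 'D1').
    """
    table = defaultdict(lambda: defaultdict(int))
    for otu_id, read_ids in otu_map.items():
        for read_id in read_ids:
            sample = read_id.rsplit("_", 1)[0]
            table[otu_id][sample] += 1
    return {otu: dict(counts) for otu, counts in table.items()}
```

For example, a mapping line listing reads D1_1, D1_2, and D2_1 under OTU 0 yields a table row with two reads from sample D1 and one from sample D2.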

To calculate alpha diversity measures within Qiime, the alpha_rarefaction.py command was used with the following options: the number of repetitions was 50; diversity metrics determined were the number of observed OTUs, singletons, and doubletons (metric osd), the Chao1 measure of expected richness, the Shannon and Simpson diversity indices, and the measure of sample phylogenetic distance (metric PD_whole_tree). This version of the command rarefies the OTU table for the construction of rarefaction curves based on each of the desired alpha diversity metrics, stepping ~5000 sequences between calculations. To calculate the estimated maximum richness based on the rarefaction curve of observed OTUs, the OTU mapping file was reformatted to generate a mothur-compatible OTU file, which was used to generate rarefaction curves for each sample at a frequency of every 1 sequence. The resulting rarefaction data from mothur were then analyzed using SAS as described in Chapter 3.
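The non-phylogenetic alpha diversity measures named above can be written out in a short sketch that takes the per-OTU read counts of one sample. The formulas are the standard textbook definitions, not code taken from Qiime. The bias-corrected form of Chao1, a base-2 logarithm for the Shannon index, and the 1 - sum(p^2) form of the Simpson index are assumptions about which variants were reported.

```python
import math
import random


def observed_otus(counts):
    """Number of OTUs with at least one read."""
    return sum(1 for c in counts if c > 0)


def chao1(counts):
    """Bias-corrected Chao1: S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of singletons and doubletons."""
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return observed_otus(counts) + f1 * (f1 - 1) / (2.0 * (f2 + 1))


def shannon(counts, base=2):
    """Shannon index H = -sum(p_i * log(p_i)); base 2 is an assumption."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total, base) for c in counts if c > 0)


def simpson(counts):
    """Simpson index reported as 1 - sum(p_i^2) (assumed variant)."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return 1.0 - sum((c / total) ** 2 for c in counts)


def rarefy(counts, depth, rng=random):
    """Subsample `depth` reads without replacement and return the new counts."""
    pool = [i for i, c in enumerate(counts) for _ in range(c)]
    picked = rng.sample(pool, depth)
    out = [0] * len(counts)
    for i in picked:
        out[i] += 1
    return out
```

A rarefaction curve is then obtained by calling rarefy at increasing depths, repeating each depth several times (50 repetitions in the analysis described above), and averaging the chosen metric over the repetitions.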

Measures of beta diversity between the samples were calculated in Qiime using the beta_rarefaction.py command for the following metrics: the binary Jaccard, Ochiai, and Sørensen measures, Bray-Curtis, and weighted and unweighted UniFrac. All distance measures were processed using the principal_coordinates.py command to ordinate the samples in a two-dimensional space. PCoA plots were compared by Procrustes analysis using the transform_coordinate_matrices.py command.
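The four non-phylogenetic distance measures listed above can be sketched for a single pair of samples as follows. The definitions are the standard ones, expressed as dissimilarities (1 minus similarity), and are not taken from the Qiime source. The function names are hypothetical, and both samples are assumed to contain at least one read. The UniFrac measures are omitted because they require the phylogenetic tree.

```python
import math


def _shared_and_unique(x, y):
    """Return (a, b, c): OTUs present in both samples, only in x, only in y."""
    a = sum(1 for i, j in zip(x, y) if i > 0 and j > 0)
    b = sum(1 for i, j in zip(x, y) if i > 0 and j == 0)
    c = sum(1 for i, j in zip(x, y) if i == 0 and j > 0)
    return a, b, c


def binary_jaccard(x, y):
    """1 - a / (a + b + c), using presence/absence only."""
    a, b, c = _shared_and_unique(x, y)
    return 1.0 - a / (a + b + c)


def binary_sorensen(x, y):
    """1 - 2a / (2a + b + c), using presence/absence only."""
    a, b, c = _shared_and_unique(x, y)
    return 1.0 - 2.0 * a / (2 * a + b + c)


def binary_ochiai(x, y):
    """1 - a / sqrt((a + b) * (a + c)), using presence/absence only."""
    a, b, c = _shared_and_unique(x, y)
    return 1.0 - a / math.sqrt((a + b) * (a + c))


def bray_curtis(x, y):
    """sum(|x_i - y_i|) / sum(x_i + y_i), using read abundances."""
    num = sum(abs(i - j) for i, j in zip(x, y))
    den = sum(i + j for i, j in zip(x, y))
    return num / den
```

Computing one of these functions for every pair of samples gives the square distance matrix that serves as input to the principal coordinates analysis.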

Table A.1: List of all archaeal primers developed for pyrosequencing.

Primer Name   Barcode    Full Primer Sequence
454ArcF-3     CATGCATG   GCCTCCCTCGCGCCATCAGCATGCATGWCYGGTTGATCCYGCCRG
454ArcF-4     ATGCATGC   GCCTCCCTCGCGCCATCAGATGCATGCWCYGGTTGATCCYGCCRG
454ArcF-5     TGCATGCA   GCCTCCCTCGCGCCATCAGTGCATGCAWCYGGTTGATCCYGCCRG
454ArcF-6     GAGAGAGA   GCCTCCCTCGCGCCATCAGGAGAGAGAWCYGGTTGATCCYGCCRG
454ArcF-7     CATCAGCA   GCCTCCCTCGCGCCATCAGCATCAGCAWCYGGTTGATCCYGCCRG
454ArcF-8     CTCTCTCT   GCCTCCCTCGCGCCATCAGCTCTCTCTWCYGGTTGATCCYGCCRG
454ArcF-9     TCTCTCAG   GCCTCCCTCGCGCCATCAGTCTCTCAGWCYGGTTGATCCYGCCRG
454ArcF-23    CAGATCTG   GCCTCCCTCGCGCCATCAGCAGATCTGWCYGGTTGATCCYGCCRG
454ArcF-24    GATGAGCA   GCCTCCCTCGCGCCATCAGGATGAGCAWCYGGTTGATCCYGCCRG
454ArcF-25    CTCATGAG   GCCTCCCTCGCGCCATCAGCTCATGAGWCYGGTTGATCCYGCCRG
454ArcF-26    AGAGCTCT   GCCTCCCTCGCGCCATCAGAGAGCTCTWCYGGTTGATCCYGCCRG
454ArcF-27    TCTGCAGA   GCCTCCCTCGCGCCATCAGTCTGCAGAWCYGGTTGATCCYGCCRG
454ArcF-28    CTGATCAG   GCCTCCCTCGCGCCATCAGCTGATCAGWCYGGTTGATCCYGCCRG
454ArcF-29    AGCTGATC   GCCTCCCTCGCGCCATCAGAGCTGATCWCYGGTTGATCCYGCCRG
454ArcF-30    TCTCTCTC   GCCTCCCTCGCGCCATCAGTCTCTCTCWCYGGTTGATCCYGCCRG
454ArcR-3     CATGCATG   GCCTTGCCAGCCCGCTCAGCATGCATGYGGTRTTACCGCGGCGGCT
454ArcR-4     ATGCATGC   GCCTTGCCAGCCCGCTCAGATGCATGCYGGTRTTACCGCGGCGGCT
454ArcR-5     TGCATGCA   GCCTTGCCAGCCCGCTCAGTGCATGCAYGGTRTTACCGCGGCGGCT
454ArcR-6     GAGAGAGA   GCCTTGCCAGCCCGCTCAGGAGAGAGAYGGTRTTACCGCGGCGGCT
454ArcR-7     CATCAGCA   GCCTTGCCAGCCCGCTCAGCATCAGCAYGGTRTTACCGCGGCGGCT
454ArcR-8     CTCTCTCT   GCCTTGCCAGCCCGCTCAGCTCTCTCTYGGTRTTACCGCGGCGGCT
454ArcR-9     TCTCTCAG   GCCTTGCCAGCCCGCTCAGTCTCTCAGYGGTRTTACCGCGGCGGCT
454ArcR-23    CAGATCTG   GCCTTGCCAGCCCGCTCAGCAGATCTGYGGTRTTACCGCGGCGGCT
454ArcR-24    GATGAGCA   GCCTTGCCAGCCCGCTCAGGATGAGCAYGGTRTTACCGCGGCGGCT
454ArcR-25    CTCATGAG   GCCTTGCCAGCCCGCTCAGCTCATGAGYGGTRTTACCGCGGCGGCT
454ArcR-26    AGAGCTCT   GCCTTGCCAGCCCGCTCAGAGAGCTCTYGGTRTTACCGCGGCGGCT
454ArcR-27    TCTGCAGA   GCCTTGCCAGCCCGCTCAGTCTGCAGAYGGTRTTACCGCGGCGGCT
454ArcR-28    CTGATCAG   GCCTTGCCAGCCCGCTCAGCTGATCAGYGGTRTTACCGCGGCGGCT
454ArcR-29    AGCTGATC   GCCTTGCCAGCCCGCTCAGAGCTGATCYGGTRTTACCGCGGCGGCT
454ArcR-30    TCTCTCTC   GCCTTGCCAGCCCGCTCAGTCTCTCTCYGGTRTTACCGCGGCGGCT
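Each full primer in Table A.1 reads as a fixed sequencing adapter, followed by the 8 nt barcode, followed by the template-specific primer. The sketch below makes that structure explicit and also reports how many positions differ between two barcodes. The two adapter constants are read off from the prefix shared by the forward and reverse rows of the table; the function names are hypothetical.

```python
# Prefixes shared by all forward and all reverse primers in the table.
ADAPTER_FWD = "GCCTCCCTCGCGCCATCAG"
ADAPTER_REV = "GCCTTGCCAGCCCGCTCAG"


def split_primer(full_primer, barcode):
    """Split a full primer into (adapter, barcode, template primer).

    The adapter is matched as a prefix rather than by searching for the
    barcode, because a barcode such as CATCAGCA also overlaps the end of
    the forward adapter.
    """
    for adapter in (ADAPTER_FWD, ADAPTER_REV):
        if full_primer.startswith(adapter + barcode):
            template = full_primer[len(adapter) + len(barcode):]
            return adapter, barcode, template
    raise ValueError("primer does not start with a known adapter plus barcode")


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))
```

Applied to the 454ArcF-7 row, split_primer returns the forward adapter, the barcode CATCAGCA, and the template primer WCYGGTTGATCCYGCCRG.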

Table A.2: List of all bacterial primers developed for pyrosequencing.

Primer Name   Barcode    Full Primer Sequence
454BactF-1    TCAGCTGA   GCCTCCCTCGCGCCATCAGTCAGCTGAAKRGTTYGATYNTGGCTCAG
454BactF-10   TCTCAGAG   GCCTCCCTCGCGCCATCAGTCTCAGAGAKRGTTYGATYNTGGCTCAG
454BactF-11   GATGCATC   GCCTCCCTCGCGCCATCAGGATGCATCAKRGTTYGATYNTGGCTCAG
454BactF-12   GATCTGCA   GCCTCCCTCGCGCCATCAGGATCTGCAAKRGTTYGATYNTGGCTCAG
454BactF-13   CTCTGAGA   GCCTCCCTCGCGCCATCAGCTCTGAGAAKRGTTYGATYNTGGCTCAG
454BactF-14   CAGATGAG   GCCTCCCTCGCGCCATCAGCAGATGAGAKRGTTYGATYNTGGCTCAG
454BactF-15   AGAGCTGA   GCCTCCCTCGCGCCATCAGAGAGCTGAAKRGTTYGATYNTGGCTCAG
454BactF-16   GAGATCAG   GCCTCCCTCGCGCCATCAGGAGATCAGAKRGTTYGATYNTGGCTCAG
454BactF-17   GAGATCTC   GCCTCCCTCGCGCCATCAGGAGATCTCAKRGTTYGATYNTGGCTCAG
454BactF-18   TCTGCAGA   GCCTCCCTCGCGCCATCAGTCTGCAGAAKRGTTYGATYNTGGCTCAG
454BactF-19   AGAGAGAG   GCCTCCCTCGCGCCATCAGAGAGAGAGAKRGTTYGATYNTGGCTCAG
454BactF-2    GCATGCAT   GCCTCCCTCGCGCCATCAGGCATGCATAKRGTTYGATYNTGGCTCAG
454BactF-20   TCAGAGAG   GCCTCCCTCGCGCCATCAGTCAGAGAGAKRGTTYGATYNTGGCTCAG
454BactF-21   GATCAGCT   GCCTCCCTCGCGCCATCAGGATCAGCTAKRGTTYGATYNTGGCTCAG
454BactF-22   CTGAGAGA   GCCTCCCTCGCGCCATCAGCTGAGAGAAKRGTTYGATYNTGGCTCAG
454BactR-1    TCAGCTGA   GCCTTGCCAGCCCGCTCAGTCAGCTGAGTNTBACCGCDGCTGCTG
454BactR-10   TCTCAGAG   GCCTTGCCAGCCCGCTCAGTCTCAGAGGTNTBACCGCDGCTGCTG
454BactR-11   GATGCATC   GCCTTGCCAGCCCGCTCAGGATGCATCGTNTBACCGCDGCTGCTG
454BactR-12   GATCTGCA   GCCTTGCCAGCCCGCTCAGGATCTGCAGTNTBACCGCDGCTGCTG
454BactR-13   CTCTGAGA   GCCTTGCCAGCCCGCTCAGCTCTGAGAGTNTBACCGCDGCTGCTG
454BactR-14   CAGATGAG   GCCTTGCCAGCCCGCTCAGCAGATGAGGTNTBACCGCDGCTGCTG
454BactR-15   AGAGCTGA   GCCTTGCCAGCCCGCTCAGAGAGCTGAGTNTBACCGCDGCTGCTG
454BactR-16   GAGATCAG   GCCTTGCCAGCCCGCTCAGGAGATCAGGTNTBACCGCDGCTGCTG
454BactR-17   GAGATCTC   GCCTTGCCAGCCCGCTCAGGAGATCTCGTNTBACCGCDGCTGCTG
454BactR-18   TCTGCAGA   GCCTTGCCAGCCCGCTCAGTCTGCAGAGTNTBACCGCDGCTGCTG
454BactR-19   AGAGAGAG   GCCTTGCCAGCCCGCTCAGAGAGAGAGGTNTBACCGCDGCTGCTG
454BactR-2    GCATGCAT   GCCTTGCCAGCCCGCTCAGGCATGCATGTNTBACCGCDGCTGCTG
454BactR-20   TCAGAGAG   GCCTTGCCAGCCCGCTCAGTCAGAGAGGTNTBACCGCDGCTGCTG
454BactR-21   GATCAGCT   GCCTTGCCAGCCCGCTCAGGATCAGCTGTNTBACCGCDGCTGCTG
454BactR-22   CTGAGAGA   GCCTTGCCAGCCCGCTCAGCTGAGAGAGTNTBACCGCDGCTGCTG
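The template-specific portions of these primers contain IUPAC degenerate bases (for example K, R, Y, N, B, and D), so trimming a primer from a read while allowing a single mismatch requires comparing each read base against the set of bases the code permits. The following sketch shows one way to do this; it is an illustration rather than the trimming routine used by Qiime, and the function names are hypothetical. The example primer AKRGTTYGATYNTGGCTCAG is the portion of the 454BactF primers that follows the adapter and barcode in Table A.2.

```python
# Standard IUPAC nucleotide codes and the bases each one allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_mismatches(primer, read_start):
    """Count read bases that are not allowed by the primer's IUPAC code."""
    return sum(base not in IUPAC[code] for code, base in zip(primer, read_start))


def trim_primer(read, primer, max_mismatches=1):
    """Return the read with the leading primer removed, or None if the
    read is too short or the primer region has too many mismatches."""
    read = read.upper()
    if len(read) < len(primer):
        return None
    if primer_mismatches(primer, read[:len(primer)]) > max_mismatches:
        return None
    return read[len(primer):]
```

An ambiguous base (N) in the read is counted as a mismatch here, which is a conservative choice made for this sketch.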
