Comparative Metagenomic Approaches Reveal Swine-Specific Bacterial Populations Useful for Fecal Source Identification

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Doctor of Philosophy

in the Department of Civil and Environmental Engineering

by

Regina Lamendella

M.S. Environmental Science (2006), University of Cincinnati

B.S. Biology (2004), Lafayette College

Committee Chair: Dr. Daniel B. Oerther, Ph.D.

Abstract

Despite current efforts to reduce fecal loads into aquatic environments, the problem persists, partially due to the inability to reliably identify the origin of fecal pollution. The swine industry has come under increasing environmental scrutiny due to augmented production and concentration of farming operations, resulting in large amounts of more concentrated waste products. Swine waste often harbors several human pathogens and thus presents a great risk to human health. Recently, methods targeting host-specific microbial populations have been proposed to identify specific sources and loads of fecal pollution entering environmental waters.

Currently, there are a limited number of swine-specific markers available for source tracking studies. This is in part explained by our limited understanding of phylogenetic and functional diversity present within the swine gut microbiome. Several selective pressures imposed at the host and microbe level are predicted to result in unique microbial populations within the swine gastrointestinal system, which could serve as useful targets for swine fecal source tracking purposes.

Two popular targets utilized for fecal source tracking are Bifidobacterium and Bacteroidales 16S rRNA genes. In this study, the 16S rRNA gene sequence diversity of these two bacterial groups was examined to determine the identity and distribution of swine-specific populations. The occurrence and abundance of these fecal bacterial populations were studied using samples from several geographically diverse host fecal types and impacted environmental samples. This molecular ecology approach revealed the diversity patterns of these popular fecal source tracking targets and unveiled previously unknown swine fecal source-specific populations relevant to environmental swine fecal pollution.

Since 16S rRNA gene approaches underestimate functional diversity, the swine fecal microbiome was analyzed using metagenomics-based (i.e., collective fecal microbial genomes) approaches to reveal the identity and functionality of the uncultivable majority within the swine distal gut. Function-specific genes represent another potential pool of swine-specific genetic targets. The use of this in silico comparative metagenomic approach facilitated the discovery of several genetic targets harbored exclusively in the swine gut. Coupling the detection of these function-specific targets to the 16S rRNA-based assays will help further validate whether a swine fecal source is present within a given environmental sample. The approaches described herein have led to the design of more comprehensive swine fecal source-specific assays, which will ultimately facilitate the accurate identification of swine fecal pollution in environmental waters and the estimation of its relative contribution.

Chapter one of this dissertation discusses the detrimental impacts of swine fecal pollution, microbial source tracking methods, and how comparative metagenomics can be utilized for uncovering source-specific targets. Chapter two focuses on analyzing bifidobacterial diversity within different mammalian and avian fecal sources with the purpose of testing this bacterial group as a potential fecal source tracking target. Chapter three evaluates the currently available swine-specific fecal source tracking PCR-based assays within surface and groundwater surrounding swine farms. Chapter four examines Bacteroidales host distribution and presence within fecally contaminated environmental samples with the purpose of revealing populations that are both specific to swine fecal sources and detectable in environmental monitoring scenarios. Chapter five discusses the application of comparative metagenomics as an approach to expose bacterial populations and functional attributes unique to the swine distal gut. Chapter six discusses future directions of the field of microbial source tracking.


To my Mom and Dad,

Thanks for showing me the way.


Acknowledgements

I am thankful for support received from:

United States Environmental Protection Agency Traineeship Award;

The National Science Foundation GK-12 Fellowship;

University of Cincinnati's Rindsberg Fellowship

I am indebted to the following people for assisting me in the accomplishment of this dissertation:

Daniel B. Oerther for his contagious passion for teaching and scientific endeavor;

Makram Suidan for serving as a member on my master's and PhD committees;

Alison Weiss and Alice Layton for being part of my committee and serving as successful female role models in science;

My colleagues at both USEPA and University of Cincinnati, especially Jorge Santo Domingo, Cathy Kelty, Jingrang Lu, Hodon Ryu, Randy Revetta, Claudine Curioso, and Brandon Iker;

Nancy McCreary Waters and Lorraine Mineo (Lafayette College) for inspiring me to pursue a graduate degree in environmental science;

My family for their continued support in everything I pursue;

My husband, Chris, for his unwavering support, patience, unconditional love, and fantastic coffee-making skills.


Table of Contents

ABSTRACT
ACKNOWLEDGEMENTS
TABLE OF CONTENTS

Chapter 1: Genomic Approaches for Microbial Source Tracking
1.1. Impact of Fecal Pollution
1.1.0. Microbial Water Quality and Fecal Pollution
1.1.1. Non-point sources of fecal pollution
1.1.2. Issues associated with swine waste
1.2. Fecal Source Tracking: State of Science
1.2.0. Fecal Source Identification
1.2.1. Microbial Source Tracking (MST) targets
1.2.2. Swine gastrointestinal microbiota
1.2.3. Do host-specific populations exist in the gastrointestinal tract?
1.2.4. Non-16S rRNA gene targets proposed for fecal source tracking
1.3. Comparative Metagenomics Applications for Fecal Source Tracking
1.3.0. Fecal bacterial genomes
1.3.1. Comparative genomics as a tool for marker development
1.3.2. Metagenomics
1.3.3. Metagenomic methods
1.3.4. Bioinformatics analysis
1.3.5. Comparative metagenomics
1.3.6. Competitive hybridization methods for comparative metagenomics
1.4. Specific aims of research
1.5. Significance of Research

Chapter 2: Bifidobacteria in Feces and Environmental Waters
2.1. Abstract
2.2. Introduction
2.3. Materials and Methods
2.3.0. Sample Collection
2.3.1. DNA extraction
2.3.2. Group- and -targeted 16S rRNA gene PCR assays
2.3.3. Cloning and sequencing analyses
2.4. Results and Discussion
2.4.0. Cell lysis efficiency, assay specificity, and assay detection limits
2.4.1. Bifidobacterium genus- and species-specific PCR results
2.4.2. Bifidobacteria population diversity
2.4.3. Phylogenetic analyses of Bifidobacterium clones
2.5. Acknowledgements

Chapter 3: Evaluation of Swine-Specific PCR Assays Used for Fecal Source Tracking and Analysis of Molecular Diversity of Swine-Specific "Bacteroidales" Populations
3.1. Abstract
3.2. Introduction
3.3. Materials and Methods
3.3.0. Sample Collection
3.3.1. DNA extractions
3.3.2. PCR assays and limits of detection
3.3.3. Cloning and sequencing analyses
3.3.4. Statistical analyses and molecular diversity estimates
3.4. Results and Discussion
3.4.0. Detection limits and performance of PCR assays for fecal sources
3.4.1. PCR assay performance in water samples adjacent to three swine farms in Illinois
3.4.2. PCR assay performance in Texas surface water samples
3.4.3. Probabilities of swine-associated PCR detection at sites using Bayesian statistics
3.4.4. Phylogenetic analysis and population diversity of fecal and environmental Bacteroidales clones
3.5. Acknowledgments

Chapter 4: Molecular Ecology of Fecal and Environmental Bacteroidales Populations Reveals Swine-specific Fecal Source Tracking Markers
4.1. Abstract
4.2. Introduction
4.3. Methods
4.3.0. Sample Collection
4.3.1. PCR assays and limits of detection
4.3.2. Cloning and sequencing analyses
4.3.3. Molecular diversity estimates and primer design
4.3.4. Preliminary Primer Evaluation
4.4. Results and Discussion
4.4.0. Diversity of Bacteroidales 16S rRNA gene sequences
4.5. Conclusions

Chapter 5: Comparative Fecal Metagenomics Unveils Unique Functional Capacity of the Swine Gut
5.1. Abstract
5.2. Introduction
5.3. Materials and Methods
5.3.0. Fecal Sample Collection
5.3.1. Pyrosequencing and Sequence Analysis
5.3.2. Diversity Indices
5.3.3. Statistical Analyses
5.4. Results
5.5. Discussion

Chapter 6: Crystal Ball: The future of Microbial Source Tracking

Appendices
References


CHAPTER 1

Genomic Approaches for Microbial Source Tracking

1.1. Impact of Fecal Pollution

1.1.0. Microbial Water Quality and Fecal Pollution

Animal feces can contain pathogens such as E. coli O157:H7, Listeria spp., Salmonella spp., Cryptosporidium spp., and Giardia spp., which, when introduced to waters used for recreation, public water supplies, and fish cultivation, may lead to severe gastrointestinal illness and other diseases. Feces also contain high levels of nitrogen, which, if found in groundwater used as a drinking water source, may contribute to human disease (e.g., methemoglobinemia). High nutrient levels found in fecal waste can also contribute to eutrophication and hypoxia of environmental waters, leading to decreased productivity of fish and shellfish beds. Thus, protecting the nation’s waters from fecal pollution is one of the most challenging problems facing environmental scientists charged with guarding waters used for recreation, public water supplies, and fish cultivation. The Clean Water Act mandates that states adopt water quality standards in order to reduce pollutant discharges into U.S. waterways. The National Pollutant Discharge Elimination System (NPDES) has assisted in meeting these standards by significantly reducing pollutant loading from point sources, which are defined as single localized sources of pollution (e.g., a discharge pipe). While these regulations have been deemed effective in reducing fecal loads into environmental waters, nearly 13% of the rivers, estuaries, and lakes that fail to meet regulatory standards do so as a result of elevated levels of fecal pollution (USEPA, 2005).

Since strict waste load allocations have been implemented for point sources, non-point sources are thought to be generally responsible for water impairments. Non-point sources of fecal pollution do not originate from single sources, but are more diffuse and are thought to be associated with, but not limited to, agricultural, residential, commercial, and industrial environments, use of manure as fertilizer, wildlife, combined sewer overflows, and problematic septic systems.

1.1.1. Non-point sources of fecal pollution

Non-point sources of fecal contamination are a significant detriment to water quality and pose risks to human health and aquatic ecosystems. A recent study found that current farming practices are responsible for 70% of the pollution in U.S. rivers and streams (Horrigan et al., 2002). In particular, animal manure has been identified as a large contributor to water pollution due to its overabundance (Jongbloed and Lenis, 1998). The USEPA estimates that the volume of manure from confined animal feeding operations (CAFOs) is three times our nation’s volume of human waste, and the Code Federal Regulation 68 Part 7180 identified that improperly managed manure has been the cause of serious acute and chronic water quality problems throughout the United States (USEPA, 2003). Excess production of manure, combined with its rapid movement into water and air, has harmful impacts on the environment and public health. Concentration of manure in lagoons and storage pits can lead to elevated levels of toxic gases, such as hydrogen sulfide and ammonia, and the stored manure can also contain heavy metal compounds. Additionally, high concentrations of antibiotics in manure that reaches the environment can perpetuate bacterial antibiotic resistance (Chee-Sanford et al., 2001).

Animal wastes have also been implicated in contributing to hypoxia in waterbodies, such as the Gulf of Mexico, at levels that cannot support animal life (CENR, 2000). For example, the USDA estimated that animal manure in the Mississippi River’s drainage basin contributes 15% of the nitrogen load entering the Gulf of Mexico (Ribaudo et al., 2003). As a consequence, high nutrient concentrations from animal manure impose significant detrimental impacts on the aquatic ecology of the basin, as demonstrated by a marked reduction in fish cultivation in the Northern Gulf of Mexico basin (Osterberg and Wallinga, 2003).

1.1.2. Issues associated with swine waste

Over the past four decades, the swine industry worldwide has come under increasing environmental scrutiny due to augmented production and concentration of farming operations, which produce large amounts of more concentrated waste products. The number of large confined swine operations (those with at least 1,000 swine animal units) increased 1,238% from 1982 to 1997 in the United States (Kellog et al., 2000). Additionally, the nitrogen and phosphorus produced in swine operations each increased by at least 20% from 1982 to 1997, from 996.9 to 1,197.5 million pounds for nitrogen and from 269.9 to 355.2 million pounds for phosphorus (Kellog et al., 2000). Swine operations have become fewer and larger and have more concentrated manure nutrients, reflecting an imbalance between natural assimilative capacity and the quantity of manure nutrients produced on each farm.
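The percent increases implied by the nutrient figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch, assuming the quoted values are in millions of pounds for 1982 and 1997; the function name `percent_increase` is illustrative and does not come from the cited report.

```python
def percent_increase(start: float, end: float) -> float:
    """Return the percent change from `start` to `end`."""
    return (end - start) / start * 100.0


# Values quoted in the text (million pounds, 1982 and 1997).
nitrogen = percent_increase(996.9, 1197.5)
phosphorus = percent_increase(269.9, 355.2)

print(f"nitrogen:   {nitrogen:.1f}%")
print(f"phosphorus: {phosphorus:.1f}%")
```

With these inputs, nitrogen rises by about 20% and phosphorus by about 32%.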

The marked increase in the amount of swine waste produced per farming operation has raised concerns about swine waste storage and/or treatment processes. In the U.S., swine manure spills and leaks are commonplace in Iowa and North Carolina, the top hog-producing states. More than half of the tested manure storage structures leaked at rates above the legal limit of 16 million gallons per year (Simkins et al., 2002). Moreover, the Environmental Integrity Project documented 329 manure spills in Iowa between 1992 and 2002, due to failure or overflow of manure storages, uncontrolled runoff from open feedlots, improper manure application on cropland, deliberate pumping of manure onto the ground, and intentional breaches in storage lagoons (Merkel, 2004). Swine fecal pollution of water has been noted as a problem outside of the U.S. as well. For example, the Environment Agency of England and Wales identified 119 substantiated water pollution incidents associated with pig farming in 1998 (Pellini and Morris, 2001). Thus, preventing swine waste from entering waters used for recreation, shellfishing, and public water supplies is essential to meeting water quality standards in the U.S. and other countries.

1.2. Fecal Source Tracking: State of Science

1.2.0. Fecal Source Identification

Despite current efforts to reduce fecal loads into aquatic environments, the problem persists, partially due to the inability to reliably identify the origin of fecal pollution. If the origin of fecal pollution could be correctly and rapidly identified, best management practices and remediation efforts could be introduced in a timely and cost-effective manner. Thus, reliable fecal source tracking methods will assist watershed managers in implementing management practices in order to protect our natural water resources. Microbial source tracking (MST) is a rapidly emerging discipline of applied environmental microbiology that focuses on identifying the source(s) of fecal contamination impacting a given body of water. In the U.S., federal requirements to develop and implement total maximum daily loads (TMDLs) have pressured states, territories and tribes to pinpoint fecal pollution sources (USEPA, 2005). Since TMDLs determine the maximum amount of pollutant that a waterbody is able to receive and still meet applicable water quality objectives, it becomes imperative to identify and quantify fecal inputs from point and non-point sources.


A number of MST methods targeting fecal bacteria have been suggested to discriminate among different fecal pollution sources (Appendix A). Early source-tracking studies relied on matching bacterial DNA fingerprints of strains isolated from water samples and fecal sources (library-dependent methods) to allocate the contribution of different fecal sources (Wiggins et al., 1999). Recent developments in molecular biology now allow the use of culture-independent techniques based on molecular markers to monitor the microbial quality of environmental waters. Culture-independent methods are more rapid, less expensive, and capable of targeting a large number of organisms that would not be captured via cultivation in the laboratory. Culture-independent, library-independent methods based on the polymerase chain reaction (PCR) in particular have been gaining popularity among the scientific and environmental monitoring communities (Simpson et al., 2002; Santo Domingo et al., 2007). However, only a limited number of these methods have been successfully used in field applications. This is explained by the fact that most methods only partly comply with critical criteria for source identification, such as host specificity, host distribution, and temporal and geographic stability of the genetic markers. Additionally, for an MST method to be robust it must exhibit a high level of accuracy and low detection limits, and ideally it should correlate with traditional fecal indicator densities. Thus, it is imperative to evaluate currently proposed MST methods against these criteria, improve these methods, and perhaps develop more robust source tracking assays.

1.2.1. Microbial Source Tracking (MST) targets

Numerous microbial targets have been suggested as potential source identifiers for microbial source tracking, including E. coli, Enterococcus spp., Bacteroidales, Bifidobacterium spp., Rhodococcus spp., F-specific coliphage, adenovirus, enterovirus, and others (USEPA, 2005). However, PCR assays targeting the 16S rRNA gene of Bacteroidales (Kreader, 1995; Bernhard and Field, 2000; Carson et al., 2005; Dick et al., 2005; Layton et al., 2006; Okabe et al., 2007; Kildare et al., 2007) and Bifidobacterium (Nebra et al., 2003; Bonjoch et al., 2004; King et al., 2007) populations have been utilized widely in the United States and Canada for fecal source tracking purposes. The focus on these bacterial targets reflects their important roles in the gut, high abundance, and suspected host specificity in mammals. Additionally, members of Bifidobacterium and Bacteroidales are strict anaerobes and are therefore not expected to survive long once they are released into receiving waters, suggesting they are promising indicators of recent fecal contamination events (Resnick and Levin, 1981; Carrillo et al., 1985; Avelar et al., 1998; Kreader, 1998).

Members of the Bacteroidetes division are ecologically diverse, and several studies have suggested that some members of this bacterial group demonstrate host-specific distributions (Dick et al., 2005; Lamendella et al., 2007). Additionally, recent analysis of 11,831 bacterial 16S rRNA sequences from three healthy adults revealed that the Bacteroidetes comprise between 20% and 40% of total microbial sequences (Eckburg et al., 2005). Other 16S rRNA gene-based studies suggest members of Bacteroidetes are also abundant in non-human mammalian hosts such as cattle (Wood et al., 1998), pigs (11% of phylotypes related to Bacteroides/Prevotella) (Leser et al., 2002), and horses (Bacteroidales comprised 18% of horse fecal phylotypes) (Daly et al., 2001). While Bifidobacterium spp. have been suggested as an indicator of recent fecal pollution, very little is known about their diversity, distribution, and occurrence in different hosts. Arguably, characterization of microbial diversity patterns in the gastrointestinal tract could have significant implications for fecal source tracking, as community members that exhibit preferential host distribution represent a unique pool of genetic targets that can be used in the development of host-specific MST assays.


1.2.2. Swine gastrointestinal microbiota

Animal gastrointestinal tracts are dominated by anoxic conditions that favor the growth of strictly anaerobic as well as facultatively anaerobic bacteria. Fungi, archaea, bacteriophages, and enteric viruses also co-inhabit most animal guts. Besides the human gut, most studies describing the mammalian gut microbiota have focused on cattle, although a wide variety of host systems have been studied, including swine, mice, and several wildlife species. The pig gut is similar in physiology and anatomy to the human gut. Not surprisingly, several studies have shown similarities between the microbial communities of these two gut systems. For example, the analysis of more than 4,000 pig gut 16S rRNA gene sequences indicated most populations are closely related to , Eubacteria, Bacillus-Lactobacillus-Streptococci, and Bacteroides-Prevotella species (Leser et al., 2001). Castillo et al. (2007) highlighted that the Bacteroides-Prevotella group was mostly found in colonic samples, while the low GC gram-positive bacteria were more commonly found in proximal regions of the GI tract such as the ileum. Together, these studies suggest that members of the Bacteroides-Prevotella group are relevant populations in a fecal sample, given their high abundance in the distal regions of the gastrointestinal tract. It should be noted that the presence of similar populations (as determined by 16S rDNA phylogenetic analysis) does not mean that the microbes present in each gut are functionally the same. In fact, pig-specific Bacteroidetes populations have been described in the literature (Dick et al., 2005; Okabe et al., 2007). Additional evidence is needed (e.g., presence and sequence diversity of functional genes) to further substantiate the level of genetic similarity between the human and swine microbial communities. This information is needed not only to develop methods for fecal source identification, but also to better understand the functional diversity the gut microbiota confers to the swine host, which could have implications for swine health and for swine models used in human studies.

1.2.3. Do host-specific populations exist in the gastrointestinal tract?

Previous research supports the concept that specific populations of bacteria are present or over-abundant in one host gut type and absent or underrepresented in others. Molecular-based studies are starting to reveal differences in microbial structure in the gastrointestinal tract dependent on diet, age, spatial orientation, time, host anatomy, physiology, and immunological pressures (Gordon, 2001; Hopkins et al., 2001; Leser et al., 2000; Leser et al., 2002). For example, Matsuki et al. (1998) found that nine species of bifidobacteria are isolated in high frequency from the human gut, among them B. adolescentis and B. dentium, which have recently been used to track human sources of fecal pollution (Nebra et al., 2003; King et al., 2007). Bacteroidales have also been shown to share host-specific relationships with certain hosts, as evidenced by uncultured Prevotella- and Bacteroides-like populations being harbored exclusively within humans, elk, pigs, horses, and cattle (Dick et al., 2005). Examples of host specificity have also been suggested outside of mammalian hosts, as in the termite gut, where recent research has suggested that candidate Termite Group 1, a diverse and distinct bacterial group, is exclusive to lower termites (Ikeda-Ohtsubo et al., 2007).

The animal gastrointestinal tract harbors a diverse microbial community, in which several populations are presumed to be involved in mutualistic associations with each other and the host (Bäckhed et al., 2005). Indeed, this complex assemblage of microorganisms represents an important component of animal physiology. For example, the gastrointestinal microbiota possesses the capacity for diverse pathways of carbohydrate degradation, providing the host with nutritional value (Bry et al., 1996; Gill et al., 2006). Other research has shown the importance of gut microorganisms in the development of the host immune system (Pena et al., 2004).

Intestinal bacteria within animal groups are expected to differ due to diversity in gastrointestinal conditions, such as temperature, diet, antibiotic use, and anatomy of the digestive system. The diversity harbored within the gut appears to be the result of strong host selection and coevolutionary forces. For example, members of the Cytophaga-Flavobacteria-Bacteroidetes (CFB) group associated with mammalian hosts are the most derived, or furthest away from the common ancestor, indicating that these populations evolved quickly once they became commensals of the mammalian gut (Bäckhed et al., 2005). Additionally, this study demonstrated that members of CFB are present in mammalian and non-mammalian hosts, indicating that this interaction was ancient and that different members of CFB coevolved with their hosts. Recently, Lamendella et al. (2007) further supported the host-specific distributions of certain fecal Bacteroidales. However, rarefaction analysis of more than 1,200 fecal and water-derived 16S rRNA gene clones indicated that Bacteroidales is a widely diverse group and that additional sequencing is necessary to resolve the level of host specificity of 16S rRNA-based assays. Moreover, these results suggest that the 16S rRNA gene might not have the resolution needed to discriminate between true host-specific populations and those that show different levels of preferential host distribution. This is not surprising in light of the fact that the rRNA operon is involved in general protein synthesis and not in host-microbial interactions, although it provides a general view of the lifestyle of gut microorganisms (i.e., cosmopolitan versus host-restricted).
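The rarefaction analysis mentioned above can be illustrated with a short sketch. It computes the expected number of operational taxonomic units (OTUs) recovered when a clone library is subsampled to a given depth, using the standard analytical rarefaction formula. This is an assumption made for illustration: the text does not state which rarefaction method or software was used, and the clone counts below are hypothetical, not data from this study.

```python
from math import comb


def rarefaction_curve(otu_counts, depths):
    """Expected number of OTUs seen when `depth` clones are drawn
    without replacement from a library with the given OTU counts."""
    total = sum(otu_counts)
    curve = []
    for n in depths:
        if n > total:
            raise ValueError("subsampling depth exceeds library size")
        # An OTU with c clones is missed with probability C(total-c, n) / C(total, n).
        expected = sum(1 - comb(total - c, n) / comb(total, n) for c in otu_counts)
        curve.append(expected)
    return curve


# Hypothetical clone library: number of clones assigned to each OTU.
library = [50, 30, 10, 5, 2, 1, 1, 1]
depths = [1, 10, 25, 50, 100]

for depth, expected in zip(depths, rarefaction_curve(library, depths)):
    print(depth, round(expected, 2))
```

A curve that is still climbing at the full library size suggests that additional sequencing would recover further OTUs, which is the kind of pattern the text describes for Bacteroidales.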

1.2.4. Non-16S rRNA gene targets proposed for fecal source tracking

The structure and composition of the gut microbiota show a balance of selection at the microbial level, where there is competition for space and nutrients, and at the host level, where the microbial organ is responsible for the fitness of its given host. Interestingly, while the human gut has been reported to harbor several hundred bacterial species (Ley et al., 2006), relatively few bacterial divisions are represented in this complex ecosystem, suggesting that there is a strong host selection for specific bacteria with a functional repertoire that is beneficial to the host. If this is true, it is reasonable to assume that host-specific populations will carry some genes that reflect this level of host specificity. As a result, functional genes involved in host-microbial interactions also represent a potential pool of host-specific genetic markers for MST method development. Genes associated with microbial surface proteins, cellular processes, metabolism, and host immunity have been shown to be present in gut bacterial symbionts (Ley et al., 2006; Bäckhed et al., 2005). Indeed, the development of source tracking assays targeting functional genes, such as the enterococcal surface protein (esp) gene of Enterococcus faecium, has been shown to be a viable strategy (Scott et al., 2005). However, the genetic identity and sequence information for functional genes are not available to the same extent as for 16S rRNA genes, limiting their use in source tracking method development. In spite of these limitations, approaches to find host-specific genes, such as competitive hybridization of fecal communities, have yielded DNA fragments specific to human, cattle, and chicken fecal communities (Shanks et al., 2006; Shanks et al., 2007; Lu et al., 2007). These enriched DNA fragments appear to be similar to surface proteins, suggesting their involvement in host-microbial interactions. Coupling the detection of function-specific markers to 16S rDNA-based assays would further verify that a specific fecal source is present in a given environmental sample.

Previous research has revealed that some enteric bacterial populations have coevolved with their hosts. Host selection for specific bacteria is thought to be the result of competition for space and nutrients as well as the development of a collective functional repertoire that is beneficial to the host. Since “top-down” forces imposed by selection at the host level and the “bottom-up” forces of microbe-microbe dynamics may result in differentiated populations in a given gastrointestinal system, it is hypothesized that the swine gastrointestinal tract harbors unique bacterial populations and/or functional attributes that are useful targets for microbial source tracking purposes.

1.3. Comparative Metagenomics Applications for Fecal Source Tracking

1.3.0. Fecal bacterial genomes

The genome content of more than 5,000 eukaryotic, bacterial, and viral species has now been defined through whole genome sequencing. As de novo sequencing methods are becoming more cost-effective, genome comparisons of closely related bacteria are becoming more readily available. Studying gene content among different bacterial species can provide insight into fundamental aspects of environmental niches and metabolism. For example, genome sequences have elucidated host-microbial specific metabolic functions, cell-to-cell signaling activities, and host-specific virulence factors of fecal bacteria. Teasing apart the different environmental niches exploited by bacteria may uncover previously unknown host-specific functions unique to avian and mammalian distal gut environments. Thus, studying fecal genomes can help fill some of the microbial ecology knowledge gaps associated with MST.

In order to better understand the importance of the gut microbiome to human health, several studies have focused on sequencing microbes that inhabit different environments of the human body (Turnbaugh et al., 2007). Thus far, 63 genomes of bacteria inhabiting the human gastrointestinal tract have been sequenced. The hierarchical clustering of these genomes can be seen in Fig. 1.1. The genomes of hundreds of additional gut bacterial strains are anticipated to be available as part of ongoing initiatives such as the Human Microbiome Project (Turnbaugh et al., 2007).

Fig. 1.1. Hierarchical clustering of 63 currently sequenced genomes derived from the human gastrointestinal microbiome.

Additionally, these genomes will serve as the database of reference genomes for human gut metagenomics projects, allowing short reads to be mapped to well-curated genomes. One of the questions scientists are trying to answer is what comprises the core human gastrointestinal microbiome, or in other words, what gut microbial genetic factors are common to the vast majority of humans. In trying to answer this daunting question, scientists have discovered that several factors dictate the community structure of endobiotic environments (Bäckhed et al., 2005; Ley et al., 2006; Ley et al., 2008). Though the human gut microbiome remains largely unexplored, studies have revealed that more than 90% of all gut phylotypes belong to the Firmicutes and the Bacteroidetes, which are just two of the 121 described bacterial divisions (Eckburg et al., 2005). We have also learned that the differences between individuals are greater than the differences between different colonic sampling sites in one individual, and these vast population differences are most likely sustained by functional redundancy (Eckburg et al., 2005).

The findings from human gut microbiome studies have important ramifications for the field of microbial source tracking. For example, before robust, source-specific targets are chosen, understanding the core fecal microbiome within potential fecal sources and how the community structure changes during fate and transport to the environment are essential.

Understanding inter-individual variations will be essential in choosing phylogenetic and/or functional markers that have wide distributions within the same host type. Additionally, the finding that only two divisions dominate human and other mammalian fecal environments suggests that diversity may lie at too fine a phylogenetic resolution (i.e., species/sub-species level) for traditional 16S rRNA gene-targeted approaches. While 16S rRNA gene approaches have been useful in identifying phylogenetic clades endemic to specific host gut types, this approach has several limitations (Wu and Eisen, 2008). Because protein sequences are conserved at the amino acid level instead of at the nucleotide level, phylogenetic analyses of protein sequences are thought to be less prone to biases associated with 16S rRNA gene approaches (Loomis and Smith, 1990; Lockhart et al., 1992; Hasegawa and Hashimoto, 1993; Baldauf et al., 2000; Wu and Eisen, 2008). Additionally, since the third position of each codon is more variable, protein-encoding genes are thought to be more useful in phylogenetic studies of more closely related organisms (Wu and Eisen, 2008). This approach is especially applicable to identifying host-specific fecal source tracking bacterial populations that cannot be well resolved using the 16S rRNA approach.
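The claim that the third codon position tolerates more change than the first two can be illustrated with a short calculation. The following is a minimal sketch, not part of the original analysis: it assumes the standard genetic code, treats all 64 codons (including stop codons) equally, and counts, for each codon position, the fraction of single-base substitutions that leave the encoded amino acid unchanged. The function name is hypothetical.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code with codons ordered TTT, TTC, TTA, TTG, TCT, ...
# (third base varies fastest; '*' marks a stop codon).
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def synonymous_fraction_by_position():
    """Return, for codon positions 1-3, the fraction of single-base
    substitutions that do not change the encoded amino acid."""
    unchanged = [0, 0, 0]
    total = [0, 0, 0]
    for codon, aa in CODON_TABLE.items():
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue  # not a substitution
                mutant = codon[:pos] + base + codon[pos + 1:]
                total[pos] += 1
                if CODON_TABLE[mutant] == aa:
                    unchanged[pos] += 1
    return [u / t for u, t in zip(unchanged, total)]


fractions = synonymous_fraction_by_position()
for position, fraction in enumerate(fractions, start=1):
    print(f"codon position {position}: {fraction:.1%} of substitutions are synonymous")
```

Under these assumptions the third position shows by far the largest share of silent substitutions, which is consistent with the argument that nucleotide-level variation in protein-encoding genes can resolve closely related organisms while the amino acid sequence remains conserved.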

1.3.1. Comparative genomics as a tool for marker development

Genome comparisons of phylogenetically related gut bacteria have helped uncover the protein-encoding genes that dictate their lifestyle and how these might be adapted to different ecological niches within a given gut type. For example, recent genome-based studies showed that bifidobacteria sense and adapt to their environment (Lee et al., 2008). Comparison of the Bifidobacterium dentium genome to those of other Bifidobacterium species showed that B. dentium possesses genes encoding extracellular polysaccharide, collagen adhesion, and restriction modification systems not present in other bifidobacterial genomes, suggesting how this commensal interacts with its environment (Ley et al., 2008). Additionally, Klijn et al. (2005) performed a comparative analysis of publicly available bifidobacterial genomes and showed that these commensals co-exist with their host and survive the diversity of host responses developed to eliminate pathogenic bacteria. Altogether, comparative genomics of this bacterial group uncovered how bifidobacteria harbor genetic factors responsible for interacting with their host environments, suggesting that they may contain promising targets for host-specific PCR assay development.

The sequencing of several Campylobacter species provides another example of drastic genomic differences between members of the same genus. C. jejuni was the first food-borne pathogen to be completely sequenced; however, little was known about how Campylobacter species cause disease until comparative genomic analysis uncovered major functional differences between these genomes. Recently, Parker et al. (2006) compared the genomes of C. jejuni strain RM1221, isolated from a chicken carcass; C. jejuni strain NCTC 11168; C. coli strain RM2228, a multi-drug-resistant chicken isolate; C. lari strain RM2100, a clinical isolate; and C. upsaliensis strain RM3195, taken from a patient with Guillain-Barré syndrome.

This comparative approach uncovered several structural genomic differences due to insertion of genes coding for virulence and survival, explaining the different pathogenicity and infectivity between these strains within different hosts. Additional analysis revealed interesting differences in genes coding for synthesis of products that are important for the interaction of Campylobacter with its environment, suggesting the presence of host-specific factors for each of these species.

Differences in membrane-associated genes of closely related microorganisms, such as E. coli O55 and O157 (Tarr et al., 2000), enterococci (Shanks et al., 2006), Salmonella enterica (Selander, 1997), Streptococcus pneumoniae (Tettelin et al., 2001), and Neisseria meningitidis (Tettelin et al., 2000), have been previously described. Large-scale comparative genomic approaches such as these could be useful to fecal source tracking research by employing in silico approaches for discovering unique functional attributes associated with avian and mammalian fecal bacterial genomes.

Nearly forty Salmonella genomes are publicly available (as of 2008) through the Joint Genome Institute’s Integrated Microbial Genomes database. Hierarchical clustering of these genomes revealed that certain serovars cluster more closely together due to similarity in their whole genome content (Fig. 1.2). Comparative genomics of these serovars revealed the presence of certain functional genes in specific serovars. For example, Salmonella enterica enterica serovars possess an overabundance of genetic factors coding for various transposases, which are required for excising and inserting mobile elements, and for phage integrases, which are perhaps involved in the adaptation of these serovars within their hosts (Fig. 1.3). Interestingly, other studies have identified similar genes as well as type three secretion systems and effector proteins, which play important roles in yielding the different host-specificities of S. gallinarum and S. enteritidis (Eswarappa et al., 2009). This exercise provides insights as to how comparative genomics can be used to identify host-specific genomic factors that provide a molecular basis for host-pathogen interactions. These co-evolutionary elements may be promising targets for fecal source tracking due to their highly evolved host-specificity.

Fig. 1.2. Hierarchical clustering of Salmonella genomes using Integrated Microbial Genomes

Alternatives to in silico comparative genomics are competitive hybridization methods, which can be used to select for regions of the genome that differ between microbial strains. One example is genome fragment enrichment (GFE). This in vitro method is used to enrich genomic DNA fragment pools for unique sequences by competitive hybridization, followed by targeted sequencing and annotation of subsets of chromosomal regions. GFE experiments have been used successfully to identify unique and divergent sequences in the closely related genomes of E. faecalis (ATCC 19433) and E. faecium (ATCC 19434) (Shanks et al., 2006). Using GFE, the authors identified 225 E. faecalis-specific DNA regions and confirmed these chromosomal variations by both experimental and bioinformatic analyses. Other examples of in vitro hybridization methods include suppressive subtractive hybridization methods (Akopyants et al., 1998; Kang et al., 2006), which have been used to identify differential genetic elements within bacterial strains.


1 - Salmonella enterica enterica sv Choleraesuis SC-B67
2 - Salmonella enterica enterica sv Paratyphi A ATCC 9150
3 - Salmonella enterica enterica sv Typhi Ty2
4 - Salmonella enterica enterica sv Typhi CT18
5 - Salmonella typhimurium LT2
6 - Salmonella enterica arizonae sv 62:z4,z23:--
7 - Salmonella enterica enterica sv Paratyphi B SPB7
8 - Salmonella enterica enterica sv Schwarzengrund SL480
9 - Salmonella enterica enterica sv Javiana GA_MM04042433
10 - Salmonella enterica enterica sv Kentucky CDC 191
11 - Salmonella enterica enterica sv Heidelberg SL486
12 - Salmonella enterica enterica sv Newport SL317
13 - Salmonella enterica enterica sv Virchow SL491
14 - Salmonella enterica enterica sv Hadar RI_05P066
15 - Salmonella enterica enterica sv Weltevreden HI_N05-537
16 - Salmonella enterica enterica sv Saintpaul SARA29
17 - Salmonella enterica enterica sv 4,[5],12:i:- CVM23701
18 - Salmonella enterica enterica sv Saintpaul SARA23
19 - Salmonella enterica enterica sv Kentucky CVM29188
20 - Salmonella enterica Agona SL483
21 - Salmonella enterica sv Dublin CT_02021853
22 - Salmonella enterica sv Heidelberg SL476, CVM30485
23 - Salmonella enterica sv Newport SL254
24 - Salmonella enterica sv Paratyphi A AKU_12601
25 - Salmonella enterica sv Schwarzengrund CVM19633
26 - Salmonella enterica enterica sv Typhi J185
27 - Salmonella enterica enterica sv Typhi E98-2068
28 - Salmonella enterica enterica sv Typhi E98-0664
29 - Salmonella enterica enterica sv Typhi E02-1180
30 - Salmonella enterica enterica sv Typhi E01-6750
31 - Salmonella enterica enterica sv Typhi E00-7866
32 - Salmonella enterica enterica sv Typhi 404ty
33 - Salmonella enterica enterica sv Typhi E98-3139
34 - Salmonella enterica enterica sv Typhi AG3
35 - Salmonella enterica enterica sv Typhi M223
36 - Salmonella enterica enterica sv Enteritidis P125109
37 - Salmonella enterica enterica sv Gallinarum 287/91

Fig. 1.3. Heatmap of Clusters of Orthologous Genes (COGs) from 37 different Salmonella genomes. Red = several genes from a given COG present in a genome. Dark blue = no genes from a given COG present in a genome.

1.3.2. Metagenomics

The term metagenomics was coined by Jo Handelsman over a decade ago and refers to the culture-independent, molecular analysis of environmental samples of cohabiting microbial populations (Handelsman, 2004). While several hundred microbial genomes have been sequenced individually using whole genome sequencing, metagenomics involves sampling the genome sequences of an entire community of organisms inhabiting the sampled environment. Metagenomics allows for the examination of all of the gene content present in a given sample, allowing scientists to begin to view species and functional composition through a holistic and unbiased lens. With this new insight into microbial communities, we can answer questions such as: Which species are present in this environment? What are their abundances? What are the active populations present? What unique functions are present in this environment? Perhaps the most exciting question is embedded in the unknown. With more than half of metagenomic reads mapping to putative proteins of unknown function, novel gene mining is becoming an exciting prospect. For example, one of the largest metagenomics studies to date, the Sargasso Sea study (Venter et al., 2004), discovered an estimated 1.2 million new genes. More than 700 of these genes were proteorhodopsins, which were discovered in phyla previously unknown to have light-harvesting functions, such as the Bacteroidetes (Handelsman, 2004). Additionally, the Sargasso Sea data set uncovered an abundance of genes involved in phosphorus cycling. Because the ocean is a phosphorus-limited ecosystem, these data can provide scientists with the information needed to discover the relatively poorly understood mechanisms of phosphorus recovery and utilization (Handelsman, 2004).


Metagenomic approaches have also been employed to discover and isolate novel compounds in several environments, such as new antibiotic compounds from soil and marine environments. Metagenomics has also helped reveal novel biocatalysts such as low-temperature-active lipases (Hardeman and Sjoling, 2007), alkane hydroxylases (Xu et al., 2008), cytotoxic compounds (Lon et al., 2005; Schmidt et al., 2005), and many other bio-mining prospects. Due to the high microbial diversity of the environments screened for novel functions, large numbers of sequences need to be screened rapidly. Cell-based high-throughput screening using substrate-induced gene-expression screening (SIGEX) allows for the screening of catabolic genes induced by various substrates (Uchiyama and Watanabe, 2008). Additionally, in order to access the genomes of numerically low-abundance members of a metagenome, cell sorting technology such as fluorescence-activated cell sorting (FACS) and whole genome amplification have been used to uncover these largely unseen members (Marcy et al., 2007). These novel metagenomic screening approaches may be useful for uncovering novel host-specific functionalities of various host fecal environments and how fecal bacteria behave during their fate and transport to the environment.

As of June 2009, 63 metagenome projects have been completed, and hundreds of new metagenomics projects are producing a huge amount of DNA sequence data. In addition, advances in the throughput and cost-efficiency of de novo sequencing technology are fueling a rapid increase in the number and size of metagenomic datasets. One of the most daunting bottlenecks for studying metagenomes is data analysis. Currently, the two most comprehensive, rapid, and interactive metagenomics program suites can be found on the MG-RAST and IMG/M ER web servers. Both of these pipelines perform gene calling against different known gene families. Additionally, each pipeline can perform comparative metagenomics between query metagenomes and those currently available to the public. For example, Fig. 1.4 provides a hierarchical clustering of currently available metagenomic datasets using COG functional profiles in the JGI IMG-ER pipeline. Principal components analysis can also be used to compare the overall level of similarity between metagenomes (Fig. 1.5). These comparative approaches allow the user to identify differences amongst environmental metagenomes.
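Both clustering and ordination of metagenomes start from a pairwise comparison of functional profiles. The following is a minimal sketch of that first step, under stated assumptions: the COG identifiers and counts are invented placeholders, and Bray-Curtis dissimilarity on relative abundances is an assumed choice of measure, which may differ from what the IMG or MG-RAST pipelines compute internally.

```python
from itertools import combinations


def relative_abundance(counts):
    """Convert raw COG hit counts into proportions that sum to 1."""
    total = sum(counts.values())
    return {cog: n / total for cog, n in counts.items()}


def bray_curtis(p, q):
    """Bray-Curtis dissimilarity between two abundance profiles
    (0 = identical profiles, 1 = no shared categories)."""
    cogs = set(p) | set(q)
    difference = sum(abs(p.get(c, 0.0) - q.get(c, 0.0)) for c in cogs)
    combined = sum(p.get(c, 0.0) + q.get(c, 0.0) for c in cogs)
    return difference / combined


# Hypothetical COG counts for three metagenomes (illustrative values only).
profiles = {
    "gut_A": {"COG0001": 40, "COG0002": 10, "COG0003": 50},
    "gut_B": {"COG0001": 35, "COG0002": 15, "COG0003": 50},
    "marine": {"COG0001": 5, "COG0004": 95},
}

normalized = {name: relative_abundance(c) for name, c in profiles.items()}
distances = {
    (a, b): bray_curtis(normalized[a], normalized[b])
    for a, b in combinations(sorted(normalized), 2)
}

for pair, d in distances.items():
    print(pair, round(d, 3))
```

In this toy example the two gut profiles are much closer to each other than either is to the marine profile. A distance matrix of this kind is the usual input to hierarchical clustering; normalizing to relative abundance first keeps differences in sequencing depth from dominating the comparison.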

1.3.3. Metagenomic methods

One main goal of metagenomic projects is to better understand the functional diversity within an environment of interest by finding genes contained in short-length sequences. Very few fecal source and environmental functional metagenomics projects have been completed to date; however, these types of projects could offer a substantial amount of ecological data to the microbial source tracking field. Using rapid pyrosequencing technologies, the phylogenetic and functional content of several fecal source and environmental samples can be revealed. This information is central to identifying populations relevant to source tracking and understanding their fate from excretion to the environment. Functional metagenomics presents the opportunity to identify the previously unknown functional capacity of microbial communities within fecal source and environmental samples, while also mining for genes with previously unidentified functions. Some of these functional genes may represent metabolic and/or physiological functions unique to a given fecal source.


Fig. 1.4. Hierarchical clustering of 65 publicly available metagenomes with COG functional profiles using Integrated Microbial Genomes v2.8.

Since bacterial genes are short and densely packed within a bacterial genome, unassembled metagenomic reads (i.e., 100-400 base pairs) often contain a significant portion of a gene, and algorithms such as BLASTX can be used to search them against a protein database or other metagenomic data sets. Commonly used protein databases include the non-redundant (nr), Swiss-Prot, RefSeq, Protein Data Bank (PDB), SEED, COG, and KEGG databases. While several recent studies (Tringe et al., 2005) have used this approach, it should be noted that this type of gene-finding method is dependent on the availability of closely related sequences in the database. Additionally, this type of comparative approach often will not assign more than half of the shorter pyrosequencing reads to a hit in a given database. Another disadvantage of homology-based gene-finding methods is that finding novel genes within a metagenomic dataset is improbable because identification relies on comparison to known functions.
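The proportion of reads that receive a database hit can be estimated directly from similarity search output. The following is a minimal sketch, assuming the search results are in the standard 12-column tab-delimited BLAST format (query, subject, percent identity, alignment length, mismatches, gap openings, query start, query end, subject start, subject end, e-value, bit score). The e-value cutoff of 1e-5, the read and protein names, and the function names are illustrative assumptions, not values taken from this study.

```python
def best_hits(tabular_lines, max_evalue=1e-5):
    """Keep the highest-scoring hit per read among hits passing the e-value cutoff."""
    best = {}
    for line in tabular_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue > max_evalue:
            continue  # too weak to count as an assignment
        if query not in best or bitscore > best[query][1]:
            best[query] = (subject, bitscore)
    return best


def assigned_fraction(read_ids, hits):
    """Fraction of all reads that received at least one accepted hit."""
    return sum(1 for read in read_ids if read in hits) / len(read_ids)


# Hypothetical example: four reads, only one of which has a convincing hit.
reads = ["read1", "read2", "read3", "read4"]
example_output = [
    "read1\tprotA\t85.0\t60\t9\t0\t1\t180\t10\t69\t1e-20\t95.0",
    "read1\tprotB\t70.0\t55\t16\t0\t4\t168\t3\t57\t1e-10\t60.0",
    "read2\tprotC\t40.0\t30\t18\t0\t1\t90\t5\t34\t0.5\t25.0",
]

hits = best_hits(example_output)
print(hits)
print(assigned_fraction(reads, hits))
```

Reads with no accepted hit remain unassigned, which is the situation described above for a large share of short pyrosequencing reads; a stricter or looser cutoff would shift this fraction accordingly.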

1.3.4. Bioinformatics analysis

While homology-based approaches are widely used for gene prediction in metagenomics-based studies, they are often very computationally expensive and thus result in long run times. Other, model-based gene-prediction methods can predict novel genes and do so at a lower computational cost. These methods involve the ab initio prediction of genes from microbial DNA and are employed in programs such as GLIMMER (Delcher et al., 1999), GeneMark.hmm (Lukashin and Borodovsky, 1998), and MetaGene (Noguchi et al., 2006). These approaches are based on identifying open reading frames (ORFs), beginning with a start codon and ending with a stop codon. Since sequenced metagenomes typically comprise 100-700 bp fragments depending on the sequencing technology used, many ORFs are incomplete within a single read and will be ignored by these gene-finding methods. Another limitation is that these methods are based on sequence models from only currently sequenced bacterial genomes, resulting in poor gene prediction quality in metagenomic datasets from species currently underrepresented in the pool of sequenced genomes (Hoff et al., 2008). State-of-the-art large-scale machine learning approaches, such as Orphelia, are gaining popularity as these programs have become readily available on web servers, are tailored towards metagenomic data sets of various read lengths (300 bp or 700 bp), and show high specificity and sensitivity in metagenomic gene prediction.
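The dependence of ORF-based gene finding on complete start-to-stop reading frames can be shown with a deliberately naive scanner. The following is a minimal sketch and not the algorithm used by GLIMMER, GeneMark.hmm, MetaGene, or Orphelia, all of which rely on statistical models: it scans only the forward strand, accepts ATG, GTG, and TTG as start codons, and uses an arbitrary minimum length. The function name and test sequence are hypothetical.

```python
START_CODONS = {"ATG", "GTG", "TTG"}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_complete_orfs(seq, min_codons=10):
    """Return (start, end) coordinates of forward-strand ORFs that contain
    both a start codon and an in-frame stop codon (end is exclusive)."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon in START_CODONS:
                start = i  # open a candidate ORF at the first start codon
            elif start is not None and codon in STOP_CODONS:
                if (i + 3 - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # close the candidate and keep scanning
    return orfs


# A toy 42 bp "gene": start codon, 12 alanine codons, stop codon.
gene = "ATG" + "GCT" * 12 + "TAA"

print(find_complete_orfs(gene))       # the full gene is reported
print(find_complete_orfs(gene[:30]))  # a truncated read yields nothing
```

The truncated fragment carries most of the coding sequence but no stop codon, so a scanner that requires complete ORFs reports nothing for it. This is the behavior that makes fragment-aware gene predictors attractive for short metagenomic reads.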

1.3.5. Comparative metagenomics

While gene prediction is central to discovering the functionality of a metagenomics dataset, the addition of several metagenomes to publicly available databases provides the opportunity to compare these datasets to one another. To date, several mammalian and avian gastrointestinal environments have been studied using metagenomic approaches, revealing previously unknown functional capacities. For example, metagenomic studies have revealed that the gut microbiota influences host nutrition, controls energy balance, supports the immune system, and offers protection against pathogens. The importance of a healthy gut microbial consortium is best supported by examples where imbalance in the gut microbiome can lead to devastating disease states, including inflammatory bowel disease and obesity (Gordon et al., 2005). One of the most interesting comparative human gut metagenomics projects to date, by Kurokawa et al. (2008), compared fecal samples from 13 healthy humans of different ages.

This study revealed that adult and infant type gut microbiomes have enriched gene families that generally did not overlap. Their study showed that the commonly enriched gene sets encoded different core functions within the adult and infantile gut microbiota.


Fig. 1.5. Principal Components Analysis of COGs from 33 gut and environmental metagenomes. Groups labeled in the plot: Bath Hot Springs, Human Gut, Obsidian Hot Spring, Soil and Pristine Groundwater, Sludge, Mouse Gut, Hypersaline Mat, and Marine Planktonic.

This sequencing effort also revealed several hundred gene families that are exclusively found in the human gut, suggesting that various strategies are employed by each type of microbiota to adapt to its intestinal environment. As more fecal metagenomes are sequenced, the host-specificity of these genetic factors can be further scrutinized. When studying the metagenomes of chicken, mouse, and human fecal microbiomes, Qu et al. (2008) found that virulence-associated components or “metavirulomes” differed significantly and clustered by host environment. These data suggest that we will soon be able to define core and variable gene content within different host species. Metagenomic approaches will have important implications for source tracking, as they allow for the discovery of host-specific genetic factors that shape these microbiomes.
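The idea of core and host-exclusive gene content can be expressed with simple set operations. The following is a minimal sketch under stated assumptions: each fecal metagenome is reduced to the set of gene families detected in it, the family names and host labels are invented placeholders, and "exclusive" is defined here as present in every sample of one host and absent from every sample of all other hosts. Real comparisons would also need to account for sequencing depth and abundance rather than presence alone.

```python
def core_and_exclusive(samples_by_host):
    """samples_by_host maps a host name to a list of sets, one set of
    detected gene families per metagenome from that host."""
    # Core: families found in every sample of a host.
    core = {
        host: set.intersection(*samples)
        for host, samples in samples_by_host.items()
    }
    # Pan: families found in at least one sample of a host.
    pan = {
        host: set.union(*samples)
        for host, samples in samples_by_host.items()
    }
    exclusive = {}
    for host in core:
        seen_elsewhere = set().union(*(pan[h] for h in pan if h != host))
        exclusive[host] = core[host] - seen_elsewhere
    return core, exclusive


# Hypothetical gene family content of four fecal metagenomes.
samples_by_host = {
    "swine": [{"famA", "famB", "famC"}, {"famA", "famB", "famD"}],
    "human": [{"famB", "famE"}, {"famB", "famC", "famE"}],
}

core, exclusive = core_and_exclusive(samples_by_host)
print(core)
print(exclusive)
```

In this toy data set, famA is the only family present in all swine samples and never seen in a human sample, so it is the kind of candidate that would be carried forward for marker evaluation.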

1.3.6. Competitive hybridization methods for comparative metagenomics

A novel competitive DNA hybridization approach was used on a metagenomic scale to identify bovine-, human-, and avian-specific fecal DNA sequences. For example, Shanks et al. (2006) retrieved several GFE sequences associated with Bacteroidales-related surface-associated and secreted proteins, which were previously shown to be present in gut bacterial symbionts (Ley et al., 2006; Bäckhed et al., 2005). These data suggested that host-fecal Bacteroidales-like genes had a capacity for interacting with their respective environments and were promising host-specific targets. The enriched fragments were used to develop PCR assays that showed high host specificity when challenged with non-host fecal samples (Shanks et al., 2007). While several studies have demonstrated the value of 16S rRNA gene-based assays, currently available PCR methods show cross-amplification of non-target fecal source samples, reducing confidence in the ability of these assays to accurately identify sources of fecal pollution in an unknown environmental sample. Coupling the detection of function-specific markers to 16S rDNA-based assays could provide additional evidence and further verification that a specific fecal source is present in a given environmental sample.
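Host specificity and cross-amplification are commonly summarized as sensitivity (the fraction of target-host samples that test positive) and specificity (the fraction of non-target samples that test negative). The following is a minimal sketch of that calculation; the sample counts and host labels are invented for illustration and are not results from this or any cited study, and the function name is hypothetical.

```python
def assay_performance(results, target_host):
    """results is an iterable of (host, pcr_positive) pairs.
    Returns (sensitivity, specificity) for the given target host."""
    tp = fn = tn = fp = 0
    for host, positive in results:
        if host == target_host:
            if positive:
                tp += 1  # target sample correctly detected
            else:
                fn += 1  # target sample missed
        else:
            if positive:
                fp += 1  # cross-amplification of a non-target sample
            else:
                tn += 1  # non-target sample correctly negative
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return sensitivity, specificity


# Hypothetical challenge of a pig-targeted assay: 10 pig samples and
# 20 non-pig samples, with two cross-amplifying cattle samples.
results = (
    [("pig", True)] * 9 + [("pig", False)]
    + [("cattle", True)] * 2 + [("cattle", False)] * 8
    + [("human", False)] * 10
)

sensitivity, specificity = assay_performance(results, "pig")
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```

Reporting both values makes the trade-off explicit: an assay can detect most target samples and still be of limited use for source identification if it cross-amplifies a common non-target host.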


1.4. Specific aims of research

The main goal of this thesis is to discover host-specific populations harbored by the swine gastrointestinal tract and to target genetic markers specific to these populations for environmental fecal source tracking. This goal will be pursued through the following specific aims:

Specific Aim 1. Evaluate currently available source-tracking assays for host-specificity, host distribution, geographic stability and occurrence in the environment.

a. Investigate the distribution of bifidobacterial species in human and animal fecal sources and their occurrence in environmental waters.

b. Examine host-specificity, sensitivity, geographic stability, and detection in the environment of swine-specific Bacteroidales and swine-specific methanogen source tracking targets.

Specific Aim 2. Characterize the diversity of Bacteroidales populations in fecal and environmental samples using 16S rRNA gene sequences.

Specific Aim 3. Perform in silico comparative metagenomics to unveil functional and/or overabundant genes restricted to the swine gastrointestinal microbial community that may be useful for source-tracking purposes.


1.5. Significance of Research

Due to the number of large animal farming operations, swine feces are an important source of pollution in several states, including Ohio, Illinois, Iowa, and North Carolina. Swine fecal pollution introduces pathogens to the environment, and therefore detecting potential sources in environmental waters is critical to public health. Currently, there are a very limited number of assays capable of detecting swine fecal pollution in environmental waters, and these assays have been applied in a limited number of watersheds. One of the specific aims of this study is to further validate swine-specific assays against fecal and water samples from different geographic locations. Additional markers will also be developed using phylogenetic analyses of the 16S rDNA and other genomic approaches that include in silico comparative metagenomics and competitive hybridization. From a basic research standpoint, this study will contribute to a better understanding of the molecular ecology of the mammalian gut and the phylogenetic and functional diversity of fecal bacteria. The proposed study will also benefit the MST scientific community as a whole, as well as local and regional resource managers, by improving their knowledge of fecal contamination sources in their watersheds. Development of pig fecal source tracking assays will also complement research efforts within EPA’s National Risk Management Research Laboratory, in which our research group is also developing fecal detection markers for human, cattle, and avian-related pollution. Results from this study will eventually assist in achieving the goals of clean water standards and developing Total Maximum Daily Loads (TMDLs) for fecal bacteria within impacted watersheds.


CHAPTER 2

Bifidobacteria in Feces and Environmental Waters

as published in

Applied and Environmental Microbiology, 74:575-584. doi:10.1128/AEM.01221-07

Copyright © 2008, American Society for Microbiology. All Rights Reserved.

Regina Lamendella has a non-exclusive license agreement (License Number: 2298490436726) with American Society for Microbiology provided by the Copyright Clearance Center for permission for reproduction of this publication within this dissertation document.

2.1. Abstract

Bifidobacteria have been recommended as potential indicators of human fecal pollution in surface waters even though very little is known about their presence in non-human fecal sources. The objective of this research was to shed light on the occurrence and molecular diversity of this fecal indicator group in different animals and environmental waters. Genus- and species-specific 16S rRNA gene PCR assays were employed to study the presence of bifidobacteria among 269 fecal DNA extracts from 32 different animals. Twelve samples from three wastewater treatment plants and 34 water samples from two fecally impacted watersheds were also tested. The species-specific assays showed that B. adolescentis, B. bifidum, B. dentium, and B. catenulatum had the broadest host distribution (11.9% to 17.4%), while B. breve, B. infantis, and B. longum were detected in less than 3% of all fecal samples. Phylogenetic analysis of 356 bifidobacterial clones obtained from different animal feces showed that approximately 67% of all the sequences clustered with cultured bifidobacteria, while the rest formed a supercluster with low sequence identity (i.e., <94%) to previously described Bifidobacterium spp. The B. pseudolongum subcluster (>97% similarity) contained 53 fecal sequences from seven different animal hosts, suggesting the cosmopolitan distribution of members of this clade. In contrast, two clades containing B. thermophilum and B. boum clustered exclusively with 37 and 18 pig fecal clones, respectively, suggesting host-specificity. Using genus-specific assays, bifidobacteria were detected in only three of the surface water DNA extracts, although other fecal anaerobic bacteria were detected in these waters. Overall, the results suggest that the use of bifidobacterial species as potential markers to monitor human fecal pollution in natural waters may be questionable.

2.2. Introduction

Members of the Bifidobacterium genus have been described as some of the most common and beneficial bacteria in the intestinal tract of humans (Scardovi, 1984), constituting up to 91% of the total fecal microflora in infants (Harmsen et al., 2000; Langendijk et al., 1995). Some important roles of bifidobacteria have recently been elucidated through the completion of the B. longum genome (Klijn et al., 2005). For example, homologs of genes encoding numerous enzymes for processing complex carbohydrates such as xylo-oligosaccharides, pectin, and fructo-oligosaccharides have been discovered, demonstrating the adaptability of bifidobacteria to utilize a wide variety of complex carbohydrates otherwise recalcitrant to humans (Klijn et al., 2005; Schell et al., 2002). Other characteristics that might contribute to the persistence of bifidobacteria within their hosts include exopolysaccharide production (Nagaoka et al., 1994), secreted membrane proteins involved in cell adhesion (Klijn et al., 2005), and bacteriocin production (Yildirim et al., 1999).

Bifidobacteria have stringent nutrient requirements and grow poorly outside of the animal gut, making this bacterial group a potentially useful indicator of recent fecal pollution (Resnick and Levin, 1981). Some bifidobacterial species are thought to be strictly of human origin, while others have been suggested as exclusively associated with animal feces (Biavati et al., 2000). Specifically, B. dentium and B. adolescentis have been suggested as useful for tracking human fecal sources in surface waters (King et al., 2007; Nebra et al., 2003). Group- and species-specific 16S rRNA gene assays have been developed for bifidobacterial populations frequently isolated from human feces (Kaufmann et al., 1997; Kok et al., 1996; Matsuki et al., 1998; Matsuki et al., 2002). While the presence of these bifidobacterial species has been determined in human and infant subjects (Matsuki et al., 1999), their presence and diversity in non-human hosts have yet to be tested. This is critical if these species are to be considered as useful markers for the specific detection of human fecal pollution sources in environmental waters.

The primary objective of this study was to determine the occurrence of presumed human bifidobacteria in non-human hosts using 16S rRNA gene-based PCR assays. Phylogenetic analyses of 16S rRNA gene fecal clone libraries were performed to elucidate the bifidobacterial population diversity in different animal guts. Additionally, the presence of bifidobacteria in fecally impacted water samples was studied to determine the potential of this bacterial group as an indicator of human fecal pollution in environmental waters.


2.3. Materials and Methods

2.3.0. Sample Collection.

A total of 269 fecal samples were collected from locations in West Virginia, Texas, Ohio, and Nebraska from 32 different species of animals and birds (see Table 2.1 and Table S1). Individual farms were selected to represent a large variety of animal operations. The selection was also based on the goal of including as many different animal types as possible to check for host-specificity, with emphasis on hosts considered to be important sources of fecal pollution in the United States. There were three main categories of fecal sources represented in this collection: human, domesticated animals, and wildlife. Some of the wildlife host types that are not considered important fecal sources were included in this study to expand our library of potential non-target hosts. Where individual droppings were available, sterile toothpicks were used to expose the interior of the fecal mass (i.e., 1 mm in diameter from the mass). Approximately 0.5 to 1.0 g of the exposed fecal mass was placed into individual sterile vials containing 3.5 ml of phosphate-buffered saline (pH 7.2) (American Public Health Association, 1998). For deer feces, a single pellet was placed into the sample vial. To collect human samples, anonymous adult volunteers were requested to place approximately 1 g of fresh feces into sterile vials containing 3.5 ml of phosphate-buffered saline, using a sterile spatula. Septic samples were collected from nine septic tanks (Plum Creek, NE) using sterile cotton swabs, which were then placed in sterile vials containing 3.5 ml of phosphate-buffered saline. Samples were transported on ice to the laboratory and stored at -80 °C for six to eight weeks prior to analyses.


Water samples were collected from different sites within two different watersheds (Ohio River basin and Lower Rio Grande) known to be impacted by different fecal pollution sources (see Figs. S1 and S2 for maps of sampling points). Specifically, samples associated with the Ohio River basin were collected from Twelve-mile Creek (Alexandria, KY), which is a multi-use watershed with cattle, human (septic and combined sewer overflow, CSO), and wildlife fecal inputs. Additionally, Brush Creek, which is impaired for its designated use due to high fecal bacterial counts (from a suspected faulty wastewater treatment plant, WWTP), feeds into Twelve-mile Creek. Water samples were also collected on the Ohio River, approximately 200 meters upstream and downstream of where Twelve-mile Creek meets the Ohio River (confluence), in addition to the confluence point. Samples were collected from sites on the Ohio River presumed to be impacted by human pollution, as several combined sewer overflows run directly into sites near these collection points. This portion of the Ohio River is currently on the impaired waters 303(d) list for exceeding fecal bacteria concentrations. Duplicate water samples were collected in September of 2005 from six sites along Twelve-mile Creek (Alexandria, KY) and three sites (upstream, downstream, and confluence) along the Ohio River. Water samples from Twelve-mile Creek (Site 6) and the Ohio River (confluence point) were also collected in duplicate in September of 2007. Water samples were also collected at four points along the Lower Rio Grande (Las Cruces watershed, El Paso, TX), which has been placed on the impaired waters 303(d) list. Samples were collected approximately 0.75 miles upstream and downstream of the Bustamante and Northwest WWTPs, in addition to sampling points at Sunland Park and Courchesne Bridge. Water samples (i.e., 75 to 300 ml) were filtered onto 0.45 µm polycarbonate filters (GE Osmonics, Minnetonka, MN), which were stored at -80 °C until genomic DNA extractions were performed.


Table 2.1. Detection limits of genus and species-specific bifidobacterial PCR assays

                                       Lm26-f/   Bif164-f/ g-Bif-f/
Fecal or Environmental Sample          Lm3-r [b] Bif662-r  g-Bif-r   BiADO     BiANG     BiBIF     BiBRE     BiCAT     BiDEN     BiGAL     BiINF     BiLON
Deer (6) [a]                           1x10^-8   1x10^-12  1x10^-12  -         -         -         -         1x10^-8   1x10^-10  -         -         -
Horse (8)                              -         1x10^-9   -         -         -         -         -         1x10^-9   1x10^-8   -         -         -
Cattle (8)                             1x10^-9   1x10^-12  1x10^-12  1x10^-9   -         -         -         1x10^-8   -         -         -         -
Chicken (5)                            -         1x10^-9   1x10^-11  - [c]     -         -         -         -         -         -         -         -
Pig (10)                               1x10^-10  1x10^-14  1x10^-13  -         1x10^-8   1x10^-8   -         1x10^-10  1x10^-10  -         -         -
Human (9)                              1x10^-9   1x10^-13  1x10^-14  1x10^-13  -         1x10^-11  -         1x10^-11  -         -         -         1x10^-8
Dry Creek WWTP Influent (2)            1x10^-8   1x10^-11  1x10^-9   1x10^-11  1x10^-8   1x10^-10  na        1x10^-10  na        na        na        na
Dry Creek WWTP MLSS/RAS (2)            1x10^-8   1x10^-9   1x10^-9   1x10^-10  1x10^-8   1x10^-9   na        1x10^-9   na        na        na        na
Dry Creek WWTP Secondary Aeration (2)  na [d]    1x10^-9   1x10^-9   1x10^-10  1x10^-8   1x10^-8   na        1x10^-8   na        na        na        na
Dry Creek WWTP Effluent (1)            na        1x10^-9   1x10^-9   1x10^-9   1x10^-8   1x10^-8   na        1x10^-8   na        na        na        na
NWWTP Effluent (3)                     1x10^-8   1x10^-8   1x10^-8   1x10^-8   na        1x10^-8   na        1x10^-8   na        na        na        na
Rio Grande Downstream of NWWTP (2)     na        1x10^-8   na        na        na        na        na        na        na        na        na        na
Upstream of Bustamante WWTP (2)        na        1x10^-8   na        na        na        na        na        1x10^-8   na        na        na        na
Bustamante WWTP Effluent (2)           na        1x10^-9   1x10^-9   na        na        na        na        na        na        na        na        na
Downstream of Bustamante WWTP (2)      na        na        1x10^-8   na        na        na        na        na        na        na        na        na
Sunland Park Upstream of NWWTP (2)     1x10^-8   na        1x10^-8   na        na        na        na        na        na        na        na        na
B. adolescentis (DSM 20086)            1x10^-10  1x10^-14  1x10^-14  1x10^-14  -         -         -         -         -         -         -         -
B. angulatum (DSM 20225)               1x10^-10  1x10^-15  1x10^-15  -         1x10^-14  -         -         -         -         -         -         -
B. bifidum (DSM 20082)                 1x10^-9   1x10^-14  1x10^-14  -         -         1x10^-14  -         -         -         -         -         -
B. breve (DSM 20213)                   1x10^-10  1x10^-13  1x10^-14  -         -         -         1x10^-14  -         -         -         -         -
B. catenulatum (DSM 20103)             1x10^-11  1x10^-12  1x10^-14  -         -         -         -         1x10^-13  -         -         -         -
B. gallicum (DSM 20093)                1x10^-12  1x10^-14  1x10^-14  -         -         -         -         -         -         1x10^-13  -         -
B. infantis (DSM 20088)                1x10^-9   1x10^-12  1x10^-14  -         -         -         -         -         -         -         1x10^-13  -

[a] Numbers in parentheses indicate the number of samples in each fecal or environmental composite (i.e., deer comprises n=6 individual fecal samples).
[b] All detection limits have the units of grams of DNA extract.
[c] - indicates that the limit of detection was below 1x10^-8 g of DNA extract.
[d] na indicates no detection limit assay was performed due to absence or very low intensity PCR signal in previous assays.


Samples were also collected from the effluents (75 to 100 ml) of three wastewater treatment plants (WWTPs): Dry Creek WWTP (Alexandria, KY), Northwest WWTP (El Paso, TX), and Bustamante WWTP (El Paso, TX). Additionally, samples from different stages of wastewater treatment, including influent wastewater, secondary aeration, and return activated sludge (RAS), were collected from the Dry Creek WWTP in sterile 50 ml conical tubes. To ease filtration, samples from the three latter locations were first centrifuged at 8,000 x g for 10 minutes at 4 °C, and the supernatant was then filtered onto 0.45 µm polycarbonate filters. DNA extractions from pellets and filters were performed immediately after the centrifugation/filtration process (see Table S2 in supplemental material for WWTP design, capacity, and location).

2.3.1. DNA extraction.

Total DNA was extracted with the UltraClean Fecal DNA Isolation Kit according to the manufacturer's instructions (MO BIO Laboratories, Inc., Carlsbad, CA) using 250 µl of each fecal slurry. For water samples, DNA was extracted directly from whole filters using the UltraClean Soil DNA Isolation Kit (MO BIO Laboratories, Inc.). Total DNA was eluted in 50 µl of 10 mM Tris and quantified using a NanoDrop® ND-1000 UV spectrophotometer (NanoDrop Technologies, Wilmington, DE). To test for extraneous DNA contamination introduced during laboratory procedures, no-template and extraction blanks were included in the PCR assays. DNA extracts were stored at -20 °C until further processing.

Cell lysis experiments were conducted to assess the overall performance of the fecal extraction kit. B. breve cells (10^6 to 10 cells) were spiked into 100 ml of Ohio River water samples in which B. breve had previously been undetected. The samples were filtered and processed as mentioned above. The same numbers of cells used in the filtration experiments were added directly into the bead beating solution of the extraction kit, and the samples were processed using the same extraction protocol.

2.3.2. Group- and species-targeted 16S rRNA gene PCR assays.

PCR assays were performed on all fecal and water DNA extracts using three genus-specific and nine species-specific PCR assays targeting the 16S rRNA gene of Bifidobacterium spp. Group-specific Bacteroides-Prevotella (Bernhard and Field, 2000) and Clostridium coccoides (Matsuki et al., 2002) 16S rDNA-based PCR assays were used to determine the presence of fecal anaerobic bacteria in water samples and to determine PCR-inhibition potential. The Bifidobacterium genus-specific primer sets Bif164-f and Bif662-r (Kok et al., 1996), Lm26-f and Lm3-r (Kaufmann et al., 1997), and g-Bifid-f and g-Bifid-r (Matsuki et al., 2002) were used to detect bifidobacteria in fecal samples. Reactions for the genus-specific assays were conducted using previously described protocols (Kok et al., 1996, Matsuki et al., 2002), with the exception of the Lm26-f and Lm3-r assay, for which the following cycling conditions were used: 94 °C for 5 min, followed by 35 cycles of 94 °C for 20 s, 55 °C for 20 s, and 72 °C for 30 s, and a final extension step of 72 °C for 5 min. Nine bifidobacterial species-specific primer sets were used to target B. adolescentis, B. angulatum, B. bifidum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. infantis, and B. longum following the PCR conditions described elsewhere (Matsuki et al., 1999). These species have been previously isolated from human feces (Biavati et al., 1986, Biavati et al., 2000). Fecal and water DNA template concentrations ranged from 2 to 21 ng of DNA per µl for each reaction. Final PCR solutions (25 µl total volume) contained 2.5 µl of Takara Ex Taq 10X buffer (20 mM Mg2+), 2 µl of dNTP mixture (2.5 mM each), 0.4 µl of 4% BSA, 17 µl of UltraPure water, 0.5 µl of primer at 25 pmol per µl concentration, and 0.625 units of Ex Taq DNA polymerase (TAKARA Mirus Bio., Madison, WI). Reactions were conducted on a DNA Engine 2 Tetrad thermal cycler (Bio-Rad Laboratories, Inc., Hercules, CA). Amplification products were visualized using 1% agarose gels and GelSTAR nucleic acid stain (Cambrex BioScience, East Rutherford, NJ).
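When many reactions are set up at once, the per-reaction volumes listed above are usually combined into a master mix. The following is a minimal sketch of that scaling, not the protocol used in this study. It assumes that 0.5 µl refers to each of the two primers and that a 10% pipetting overage is acceptable; both are illustrative assumptions. Polymerase and template volumes are omitted because the text gives them in units and concentrations rather than in µl.

```python
# Per-reaction volumes (µl) taken from the reaction composition described above.
PER_REACTION_UL = {
    "Ex Taq 10X buffer": 2.5,
    "dNTP mixture": 2.0,
    "4% BSA": 0.4,
    "UltraPure water": 17.0,
    "forward primer": 0.5,  # assumption: 0.5 µl of each primer
    "reverse primer": 0.5,  # assumption: 0.5 µl of each primer
}


def master_mix(n_reactions, overage=0.10):
    """Scale per-reaction volumes to n reactions plus a fractional overage.

    The overage value is a hypothetical default, not a value from the study.
    """
    if n_reactions < 1:
        raise ValueError("n_reactions must be at least 1")
    factor = n_reactions * (1.0 + overage)
    return {name: round(vol * factor, 2) for name, vol in PER_REACTION_UL.items()}


if __name__ == "__main__":
    for name, vol in master_mix(24).items():
        print(f"{name:20s}{vol:8.2f} µl")
```

The listed components account for 22.9 µl of the 25 µl reaction under these assumptions; the remainder would be taken up by polymerase and template.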

The performance of each assay was determined in PCR experiments adding known concentrations of fecal and water DNA extracts. Using this approach, it is possible to determine the detection limits of an assay for environmental DNA extracts (Lamendella et al., 2007). PCR assays were performed using serial fecal DNA dilutions (1x10^-8 to 1x10^-16 g DNA) of composite samples from animals that tested positive for each assay and that represented the different types of general sources of pollution, that is, human, domesticated animals (chicken, pig, cattle, horse), and wildlife (deer). We performed similar assays using serial dilutions of influent, return activated sludge, secondary aeration, effluent, and environmental water DNA extracts that yielded positive signals with a given bifidobacterial assay. To determine the inhibition potential at a specific environmental DNA concentration, DNA extracts were used as template in general- and human-specific Bacteroidetes assays (Bernhard and Field, 2000) as well as in C. coccoides PCR assays (Matsuki et al., 2002).
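The way a detection limit is read off such a dilution series can be sketched as follows. This is an illustration only: the function name and the example results are hypothetical, and the rule used here (the smallest template mass that still gave a visible product) is an assumed reading of how the limits in Table 2.1 were scored.

```python
def detection_limit(results):
    """Return the smallest template mass (g) that gave a positive PCR signal.

    `results` maps template mass in grams to True (product visible on the
    gel) or False. Returns None when no dilution amplified.
    """
    positives = [mass for mass, amplified in results.items() if amplified]
    return min(positives) if positives else None


if __name__ == "__main__":
    # Hypothetical ten-fold series from 1x10^-8 g down to 1x10^-16 g of DNA,
    # in which every dilution down to 1x10^-12 g amplified.
    series = {10.0 ** -e: e <= 12 for e in range(8, 17)}
    print(detection_limit(series))
```

A stricter rule could require that all higher concentrations also amplify; that check is left out to keep the sketch short.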

Detection limits were also established for seven of the nine bifidobacterial species markers using DNA extracts from pure cultures of B. adolescentis (DSM 20086), B. angulatum (DSM 20225), B. bifidum (DSM 20082), B. breve (DSM 20213), B. catenulatum (DSM 20103), B. gallicum (DSM 20093), and B. infantis (DSM 20088). Serial dilutions (1x10^-8 to 1x10^-16 g DNA) from each pure culture were used as template in the respective PCR assays. Potential cross-reactivity of the bifidobacteria species-specific assays was determined using 1 ng of the aforementioned pure culture DNA extracts and DNA extracts of additional non-target species: B. pseudolongum subsp. globosum (DSM 20092), B. ruminantium (DSM 6489), and B. suis (DSM 20211). Coverage of the genus-specific assays was tested by using all ten pure culture DNA extracts (1 ng) as template in each of the three genus-specific bifidobacterial assays.

2.3.3. Cloning and sequencing analyses.

Bif164-f/Bif662-r PCR products were used to determine the phylogenetic diversity of bifidobacteria in different hosts. Sequencing and data analyses were performed as previously described (Lamendella et al., 2007). Briefly, PCR products were purified using the QIAquick PCR Purification Kit according to the manufacturer's instructions (QIAGEN, Valencia, CA). Representative PCR products derived from 14 different human and animal feces (alpaca, cat, cattle, chicken, deer, dog, goat, goose, human, peacock, pig, pigeon, seagull, and sheep) were cloned into the pCR4.1 TOPO vector as described by the manufacturer (Invitrogen, Carlsbad, CA). Individual E. coli clones were subcultured into 300 µl of Luria Broth containing 50 µg/ml ampicillin and screened for inserts using M13 PCR. Clones were submitted to the Children's Hospital DNA Core Facility (Cincinnati, OH) for sequencing using Big Dye sequencing chemistry (Applied Biosystems, Foster City, CA), M13 forward and reverse primers, and an Applied Biosystems PRISM 3730XL DNA Analyzer. Sequences were manually verified and aligned using Sequencher 4.7 software. Potential chimeric sequences detected using the Bellerophon (Hugenholtz and Huber, 2003) and Mallard (Ashelford et al., 2006) programs were not included in further analyses. Sequences were also submitted to BLAST homology search algorithms to assess sequence similarity to publicly available sequences (Altschul et al., 1990). Phylogenetic analysis used ARB software, and trees were inferred from 456 sequence positions [E. coli bases 179 to 655] using neighbor-joining (Kimura correction) and maximum parsimony (Phylip DNAPARS tool) (26). To statistically evaluate branching confidence, bootstrap values were obtained from a consensus of 100 parsimony trees. The Arcanobacterium haemolyticum 16S rRNA gene sequence (accession # AJ234059) was used as the outgroup (Pascual et al., 1999), while cultured Bifidobacterium species were included in the analyses as points of reference. Representative sequences generated in this study have been deposited in the GenBank database under accession numbers EU359826 to EU359907.
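The distance correction applied before neighbor-joining can be illustrated with a short sketch. It assumes that "Kimura correction" refers to the Kimura two-parameter model, which is an interpretation rather than something the text states, and it is not the ARB implementation. The function name is hypothetical, and positions with gaps or ambiguous bases are simply skipped.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq1, seq2):
    """Kimura two-parameter distance between two aligned DNA sequences.

    P is the proportion of transitions (A<->G, C<->T) and Q the proportion
    of transversions; d = -0.5 * ln((1 - 2P - Q) * sqrt(1 - 2Q)).
    """
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    transitions = transversions = compared = 0
    for a, b in zip(seq1.upper(), seq2.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguous bases
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1
        else:
            transversions += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    p = transitions / compared
    q = transversions / compared
    return -0.5 * math.log((1 - 2 * p - q) * math.sqrt(1 - 2 * q))


if __name__ == "__main__":
    print(k2p_distance("AAAAAAAAAA", "GAAAAAAAAA"))
```

For very divergent pairs the logarithm is undefined; the sketch lets that error propagate rather than guessing at a fallback.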

Community richness and diversity of the clone libraries were studied using rarefaction analysis (Heck et al., 1975, Holland, 2003, Hurlbert, 1971, Simberloff, 1978), the abundance-based coverage estimator (ACE), the Chao 1 estimator of species richness, and Shannon's and Simpson's diversity indices, all calculated with EstimateS software. Rarefaction curves were produced by using the individual-based Coleman method and the sample-based Mao Tau method available through EstimateS (Colwell, 2005).
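To make these estimators concrete, the following minimal sketch computes three of them from a list of OTU abundances. It is not the EstimateS implementation. It assumes the bias-corrected form of Chao 1 and the reciprocal form of Simpson's index; both choices are assumptions, and ACE is omitted for brevity. The function name and example counts are hypothetical.

```python
import math


def richness_and_diversity(otu_counts):
    """Compute observed richness, Chao 1, Shannon, and inverse Simpson.

    `otu_counts` holds the number of clones assigned to each OTU.
    """
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)
    if n == 0:
        raise ValueError("at least one clone is required")
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # singletons
    f2 = sum(1 for c in counts if c == 2)  # doubletons
    # Bias-corrected Chao 1 (assumed form): S_obs + F1(F1 - 1) / (2(F2 + 1))
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    shannon = -sum((c / n) * math.log(c / n) for c in counts)
    inverse_simpson = 1.0 / sum((c / n) ** 2 for c in counts)
    return {
        "s_obs": s_obs,
        "chao1": chao1,
        "shannon": shannon,
        "inverse_simpson": inverse_simpson,
    }


if __name__ == "__main__":
    # Hypothetical clone library: 7 OTUs, three of them singletons.
    print(richness_and_diversity([10, 5, 2, 2, 1, 1, 1]))
```

Because Chao 1 depends only on the counts of singletons and doubletons, a library dominated by singletons yields an estimate far above the observed richness.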

2.4. Results and Discussion

2.4.0. Cell lysis efficiency, assay specificity, and assay detection limits.

The detection limit for B. breve cells spiked directly into the extraction kit was 10 cells, while filtration of B. breve cells yielded a detection limit of 100 cells, suggesting that up to 90% of the cells could be lost during the filtration/bead beating process. To compensate for the impact of the filtration step on the assay detection limits, we increased the number of cycles from 35 to 45 for every genus- and host-specific assay using as template DNA extracts from water and wastewater samples collected in 2007 (n=14) (Table 2.1). Of all samples tested (i.e., n=552), only two water samples tested positive for any given assay (i.e., B. breve and B. bifidum using Dry Creek WWTP RAS DNA as template), even after extending the protocol to 45 cycles. Adding 10 cycles is equivalent to increasing cell detection by potentially up to three orders of magnitude (2^10 ≈ 10^3, assuming the product doubles every cycle), which should compensate for the reduced extraction/cell lysis performance of most nucleic acid extraction protocols. Consequently, failure to detect bifidobacteria using 16S rRNA gene PCR-based methods strongly suggests low survival rates of this bacterial group in environmental waters, particularly when other fecal anaerobic bacteria were detected in the same samples.
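The arithmetic behind the two figures in this paragraph (a loss of up to 90% and a gain of up to three orders of magnitude) can be checked with a short sketch. The function names are hypothetical, and the per-cycle efficiency of 1.0 (perfect doubling) is an idealized assumption; real reactions are less efficient, which is why the gain is stated as an upper bound.

```python
import math


def extra_cycle_gain(extra_cycles, efficiency=1.0):
    """Theoretical fold increase in product from additional PCR cycles.

    `efficiency` is the fraction of templates copied per cycle
    (1.0 means perfect doubling, an idealized assumption).
    """
    return (1.0 + efficiency) ** extra_cycles


def fraction_lost(limit_direct, limit_filtered):
    """Fraction of cells lost, inferred from two detection limits in cells."""
    return 1.0 - limit_direct / limit_filtered


if __name__ == "__main__":
    gain = extra_cycle_gain(45 - 35)
    print(gain, math.log10(gain))
    print(fraction_lost(10, 100))
```

With perfect doubling, ten extra cycles give a 1024-fold increase, which comfortably exceeds the ten-fold loss inferred from the spiking experiment.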

Specificity was confirmed for the species-specific assays, with the exception of the BiLON assay, which also amplified B. pseudolongum. All genus-specific assays amplified DNA from all bifidobacterial strains used in this study. The detection limits of the bifidobacterial assays ranged from 10^-8 to 10^-14 g of DNA and from 10^-8 to 10^-11 g of DNA when using fecal and water DNA extracts, respectively (Table 2.1). Assay detection limits using pure cultures indicated that the Bif164-f/Bif662-r (10^-12 to 10^-15 g of DNA) and g-Bif-f/g-Bif-r (10^-14 to 10^-15 g of DNA) markers had lower detection limits than Lm26-f/Lm3-r (10^-9 to 10^-12 g of DNA) (Table 2.1), which is in agreement with the lower sensitivity of the latter marker in fecal and environmental DNA extracts. In general, the bifidobacterial genus-specific markers (g-Bif-f/g-Bif-r and Bif164-f/Bif662-r) had lower detection limits than the species-specific assays in fecal and environmental matrices. These results are not surprising, because the densities of host-specific bacteria tend to be one to two orders of magnitude lower than those of general fecal bacterial groups (Lamendella et al., 2007).

2.4.1. Bifidobacterium genus- and species-specific PCR results.


The presence of bifidobacteria was confirmed in 25 of the 32 different animals studied. However, only 10 animal types had positive signals with all three genus-specific assays. Moreover, of the 269 total fecal samples, only 56, 98, and 87 of the DNA extracts were positive for the Lm26-f/Lm3-r, Bif164-f/Bif662-r, and g-Bif-f/g-Bif-r assays, respectively (Table 2.2). Surprisingly, no more than five of the 19 human fecal samples and three of the nine septic samples were positive with any of the genus-specific assays. Moreover, none of the genus-specific primers produced PCR signals when fecal DNA extracts from armadillo, dove, fox, guinea pig, hedgehog, raccoon, squirrel, and vulture were used as template. Altogether, these data suggest that these genus-specific assays may target different populations of bifidobacteria and that several assays might be needed to understand the occurrence of these populations in animal gut systems. While these results may also suggest that some species might not be found in detectable numbers in some gut types, additional samples must be analyzed to confirm this trend. It should be noted that, using culture techniques, Resnick and Levin (1981) could not isolate bifidobacteria from the feces of chickens, cows, dogs, horses, cats, sheep, beavers, goats, and turkeys, while Rhodes and Kator (1999) did not find any bifidobacteria in deer, muskrat, and raccoon scat. However, our results demonstrate that culture-based techniques may fail to detect bifidobacteria in non-human hosts, as at least 21% of all the chicken, dairy cattle, cat, goat, pig, and sheep fecal samples had positive signals with all genus-specific markers. Bifidobacterium signals were also detected in at least one fecal sample from coyote, deer, and dog using genus-targeted assays.


Table 2.2. Results from Bifidobacterium genus-specific PCR assays using different animal fecal DNA extracts

                        Bifidobacterium Genus-targeted Primer
                        Lm26-f &    Bif164-f &  g-Bif-f &
Animal Type (n) [a]     Lm3-r       Bif662-r    g-Bif-r
Alpaca (2)              - [b]       -           1
Beef Cattle (14)        1           -           1
Bobcat (1)              -           1           1
Canadian Goose (20)     -           4           2
Chicken (29)            6           9           14
Coyote (11)             7           3           2
Dairy Cattle (14)       5           11          7
Deer (17)               2           4           7
Domestic Cat (10)       3           5           6
Domestic Dog (15)       2           3           1
Ferret (1)              -           -           1
Goat (4)                1           3           2
Hog (1)                 1           -           -
Horse (16)              3           4           -
Human (19)              2           5           3
Llama (1)               -           1           -
Peacock (1)             -           1           1
Pig (43)                17          33          27
Pigeon (4)              -           1           1
Possum (2)              1           -           1
Prairie Dog (2)         1           -           1
Rabbit (4)              1           -           -
Septic (9)              -           1           3
Sheep (8)               3           6           3
Turkey (10)             -           3           2
Total [c]               56          98          87
Percent positive        20.8%       36.4%       32.3%

[a] (n) indicates the number of total fecal samples tested for that given animal type. All squirrel (4), armadillo (1), dove (1), fox (1), guinea pig (1), hedgehog (1), raccoon (1), and vulture (1) fecal sample DNA extracts produced no PCR signal using any of the three genus-specific assays.
[b] (-) indicates no amplification product was visualized for any of the samples from a given animal type using that designated primer.
[c] The total number of amplification products for all 269 fecal samples using that given genus-specific marker.


We investigated the presence of nine bifidobacterial species within 269 fecal samples representing 32 different animal types (Table 2.3). As expected, some of the species were not detected or were infrequently detected in the hosts tested. For example, the BiINF assay was positive for less than one percent of all fecal samples (i.e., only two piglets); B. breve was detected only in two pig, two chicken, one dairy cow, and one rabbit fecal sample, while B. longum was detected in one human, two pig, and two sheep fecal samples. The absence of B. infantis and B. breve in the human samples and in most fecal samples tested can be explained by the fact that these species are normally present only in infants. The low frequency of detection of B. longum in human fecal samples and in septic samples was not expected, as previous culture-dependent and culture-independent studies have indicated the incidence of B. longum in human feces (Reuter, 2001). However, it should be noted that the assay used to detect B. longum in this study has been reported to depend on a higher template concentration than other bifidobacterial assays (Matsuki et al., 2002). In contrast, some of the species were detected frequently and in multiple hosts, as in the case of B. bifidum, B. adolescentis, B. catenulatum, and B. dentium, which were found in 7, 8, 13, and 16 different hosts, respectively. These species were also detected in the highest number of fecal samples. Additionally, B. gallicum was detected at high frequencies in chicken and horse fecal samples. Cattle, chicken, deer, human, pig, rabbit, and sheep were among the animals showing the highest diversity of bifidobacterial species. The significance of these findings in terms of host-microbial interactions is unknown, although it suggests that some bifidobacterial species prefer a cosmopolitan lifestyle. B. bifidum, B. adolescentis, and B. catenulatum were found at particularly high frequencies in humans and pigs. The high incidence of these species in human samples is consistent with previous studies showing that they are among the most frequently detected bifidobacterial species in the human adult intestinal microflora (Biavati et al., 1986, Finegold et al., 1974, Matsuki et al., 1999, Moore and Holdeman, 1974). However, the B. catenulatum amplification frequency in human feces was lower in this study (21%) than in a previous report, in which nearly all (92%) Japanese adult feces indicated the presence of B. catenulatum (Matsuki et al., 1999). Differences in diet might explain these results. The low detection of the B. dentium marker in human fecal samples is consistent with previous non-culture-based studies, in which B. dentium was detected in only three of 48 adult human fecal samples (Matsuki et al., 1999).

Some bifidobacterial species, for example, B. adolescentis and B. dentium, have been suggested to be good targets for tracking human fecal pollution in environmental waters (Bonjoch et al., 2004, Nebra et al., 2003). However, previous host-specificity studies of B. adolescentis and B. dentium have been performed using a limited number of fecal samples and host types. Using a larger dataset, our results showed that these species were not exclusive to human feces, as they were also detected in several animals (Table 2.3). Moreover, their detection in cattle and swine feces is significant to environmental monitoring programs in the U.S., as feces from these animals are important sources of fecal pollution in water. In cases in which it is necessary to discriminate between human, cattle, and/or swine pollution, assays targeting B. adolescentis and B. dentium might not be adequate. Another important finding was the relatively lower frequency of detection of bifidobacteria in environmental waters and wastewater treatment effluents when compared to other anaerobic fecal bacteria tested in this study (Table 2.4). For example, none of the genus- or species-specific assays detected


Table 2.3. Results from Bifidobacterium species-specific PCR assays using different animal fecal DNA extracts

                        Bifidobacterium Species and Group-specific Primer [b]
Animal Type (n) [a]     BiADO   BiANG   BiBIF   BiBRE   BiCAT   BiDEN   BiGAL   BiINF   BiLON
Alpaca (2)              1       - [c]   -       -       -       -       -       -       -
Beef Cattle (14)        -       -       -       -       1       1       -       -       -
Canadian Goose (20)     -       -       -       -       2       2       -       -       -
Chicken (29)            1       -       2       2       -       -       3       -       -
Coyote (11)             1       -       -       -       1       1       -       -       -
Dairy Cattle (14)       4       -       -       1       2       8       1       -       -
Deer (17)               1       -       -       -       2       2       2       -       -
Domestic Cat (10)       -       -       2       -       5       -       -       -       -
Domestic Dog (15)       -       -       1       -       2       -       -       -       -
Goat (4)                -       -       -       -       2       2       -       -       -
Guinea Pig (1)          -       -       -       -       -       1       -       -       -
Hog, Feral (1)          -       -       -       -       -       1       -       -       -
Horse (16)              -       -       -       -       2       3       4       -       -
Human (19)              7       -       4       -       4       1       -       -       1
Pig (43)                13      11      22      2       20      15      7       2       2
Pigeon (4)              -       -       1       -       -       -       -       -       -
Possum (2)              -       -       -       -       -       1       -       -       -
Prairie Dog (2)         -       -       -       -       -       1       -       -       -
Rabbit (4)              -       1       2       1       1       1       1       -       -
Septic (9)              1       1       3       -       1       1       1       -       -
Sheep (8)               3       1       -       -       2       5       2       -       2
Squirrel (4)            -       -       -       -       -       1       -       -       -
Turkey (10)             -       1       -       -       -       -       -       -       -
Total [d]               32      15      37      6       47      47      21      2       5
Percent positives       11.9%   5.6%    13.8%   2.2%    17.4%   17.4%   7.8%    0.74%   1.9%

[a] (n) indicates the number of total fecal samples tested for that given animal type.
[b] PCR assays targeted B. adolescentis (BiADO), B. angulatum (BiANG), B. bifidum (BiBIF), B. breve (BiBRE), B. catenulatum group (BiCAT), B. dentium (BiDEN), B. gallicum (BiGAL), B. infantis (BiINF), and B. longum (BiLON).
[c] (-) indicates no amplification product was visualized for any of the samples from a given animal type using that designated primer.
[d] The total number of amplification products for all 269 fecal samples using that given species or group-specific marker.

more than 50% of the tested environmental samples (g-Bif-f/g-Bif-r, 50%; Bif164-f/Bif662-r, 37%; BiADO, 36%; and BiCAT, 26%). In contrast, other fecal bacterial groups were detected in nearly all samples (i.e., Bacteroidales, 89%; C. coccoides, 98%). Interestingly, BiDEN, a proposed human-specific fecal indicator, was not detected in any of the 46 environmental samples, even though 61% of the environmental samples tested were positive for the human Bacteroides spp. marker (HF183F/Bac708R). While the BiADO assay appears to be the most sensitive of all the bifidobacterial species-specific markers, it should be noted that this marker was detected only in wastewater and was absent from all surface water samples tested (n=34). Nested PCR approaches have been reported to increase detection of bifidobacteria in water samples (Bonjoch et al., 2004, King et al., 2007). However, the low sensitivity of bifidobacterial species-specific assays with environmental samples containing human-specific Bacteroidales and C. coccoides (even after increasing the number of PCR cycles) suggests that bifidobacteria might not be a reliable indicator to track human pollution sources in natural waters.
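The sensitivity values quoted above follow the definition given with Table 2.4: the number of positive PCR results divided by the number of positive and negative results. A minimal sketch of that calculation is shown here for the Lm26-f/Lm3-r column of Table 2.4, read top to bottom. The function name is hypothetical, and entries marked ND contribute nothing because they contain neither a plus nor a minus sign.

```python
def sensitivity(replicate_calls):
    """Fraction of positive PCR results among all scored replicates.

    Each entry is a string such as "+/-" (one positive, one negative
    duplicate), "-" (single sample), or "+/+/+" (triplicate).
    """
    positives = sum(call.count("+") for call in replicate_calls)
    scored = sum(call.count("+") + call.count("-") for call in replicate_calls)
    if scored == 0:
        raise ValueError("no scored replicates")
    return positives / scored


if __name__ == "__main__":
    # Lm26-f/Lm3-r column of Table 2.4, one entry per environmental sample.
    lm26 = [
        "-/-", "-/-", "+/-", "-/-", "-/-", "-/-", "+/-",  # 12-Mile Creek sites
        "-/-", "-/-", "-/-", "-/-",                       # Ohio River sites
        "+/-", "-/-", "-/-", "-",                         # Dry Creek WWTP
        "+/-", "+/+/+", "-/-",                            # NWWTP sites
        "-/-", "-/-", "-/-",                              # Bustamante sites
        "+/+", "-/-",                                     # Sunland Park, Courchesne
    ]
    print(round(sensitivity(lm26), 2))
```

This column contains 9 positive results among 46 scored replicates, which rounds to the 0.20 reported in the table.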

2.4.2. Bifidobacteria population diversity.

Rarefaction analysis using the Mao Tau method (Colwell et al., 2005) of 356 bifidobacterial clones generated in this study suggested that the sequence diversity is approaching operational taxonomic unit (OTU) saturation (Fig. 2.1). Estimates of species richness and diversity were calculated for Bifidobacterium-related sequences obtained in this study and for Bacteroidales fecal bacterial populations from a previous study (Lamendella et al., 2007) to allow a relative comparison of community species richness. Running statistical analyses using EstimateS (version 5.0.1) (Colwell, 2005) with 100 randomizations on 356 fecal bacterial sequences from Bacteroidales


Table 2.4. Presence of bifidobacteria in surface waters and wastewater samples

                                   Lm26-f/   Bif164-f/ g-Bif-f/                                                                            C.                      Human
Environmental Sample               Lm3-r     Bif662-r  g-Bif-r   BiADO [c] BiANG   BiBIF   BiBRE   BiCAT   BiDEN   BiGAL   BiINF   BiLON   coccoides Bacteroidales Bacteroides
12-Mile Creek Site 1 (2005)        -/- [a]   -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/-           ND [a]
12-Mile Creek Site 2 (2005)        -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/-           ND
12-Mile Creek Site 3 (2005)        +/-       +/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/-           ND
12-Mile Creek Site 4 (2005)        -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           ND
12-Mile Creek Site 5 (2005)        -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/-           ND
12-Mile Creek Site 6 (2005)        -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           ND
12-Mile Creek Site 6 (2007)        +/-       +/+       +/+       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           -/-
Ohio River Upstream (2005)         -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           ND
Ohio River/12-Mile Creek (2005)    -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           ND
Ohio River/12-Mile Creek (2007)    -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/-       +/-           -/-
Ohio River Downstream (2005)       -/-       -/-       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           ND
Dry Creek WWTP Influent            +/-       +/+       +/+       +/+       +/-     +/+     -/-     +/-     -/-     -/-     +/-     +/-     +/+       +/+           +/+
Dry Creek WWTP RAS                 -/-       -/-       +/+       +/+       -/- [b] -/-     -/- [b] +/+     -/-     -/-     -/-     -/-     +/+       +/+           +/+
Dry Creek WWTP Secondary Aeration  -/-       -/-       +/+       +/+       +/-     +/+     -/-     +/+     -/-     -/-     -/-     -/-     +/+       +/+           +/+
Dry Creek WWTP Effluent            -         -         +         +         -       -       -       +       -       -       -       -       +         +             +
Rio Grande Upstream of NWWTP       +/-       +/-       +/+       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           -/-
NWWTP Effluent                     +/+/+     +/+/+     +/+/+     +/+/+     -/-/-   +/+/+   -/-/-   +/+/+   -/-/-   -/-/-   -/-/-   -/-/-   +/+/+     +/+/+         +/+/+
Rio Grande Downstream of NWWTP     -/-       +/+       -/-       -/-       -/-     -/-     -/-     -/-     -/-     -/-     -/-     -/-     +/+       +/+           -/-
Bustamante Upstream                -/-       +/-       +/+       -/-       -/-     -/-     -/-     +/+     -/-     -/-     -/-     -/-     +/+       +/+           +/+
Bustamante WWTP Effluent           -/-       +/+       +/+       -/-       -/-     -/-     -/-     +/-     -/-     -/-     -/-     -/-     +/+       +/+           +/+
Bustamante Downstream              -/-       +/+       +/+       -/-       -/-     -/-     -/-     -/-     -/-     +/+     -/-     -/-     +/+       +/+           +/+
Sunland Park Upstream of WWTP      +/+       +/-       +/+       -/-       -/-     -/-     -/-     -/-     -/-     +/-     -/-     -/-     +/+       +/+           +/-
Courchesne Downstream of WWTP      -/-       -/-       +/-       -/-       -/-     -/-     -/-     -/-     -/-     +/-     -/-     -/-     +/+       +/+           -/-
Total Sensitivity [d]              0.20      0.37      0.50      0.36      0.04    0.15    0.00    0.26    0.00    0.09    0.02    0.02    0.98      0.89          0.61

[a] "-/-" or "+/+" indicates that both duplicate samples produced either negative or positive PCR results; "+/-" indicates that only one of the duplicate samples produced a positive signal. Environmental samples were processed in duplicate with the exception of Dry Creek WWTP effluent (n=1) and NWWTP effluent (n=3). ND, not determined.
[b] Indicates samples that produced positive signals after increasing the number of PCR cycles from 35 to 45.
[c] BiADO, BiANG, BiBIF, BiBRE, BiCAT, BiDEN, BiGAL, BiINF, and BiLON refer to PCR assays targeting B. adolescentis, B. angulatum, B. bifidum, B. breve, B. catenulatum, B. dentium, B. gallicum, B. infantis, and B. longum, respectively.
[d] Sensitivity was calculated by dividing the number of positive PCR results by the number of positive and negative PCR results.

and Bifidobacterium fecal bacteria communities indicated that OTU richness and diversity indices were significantly higher for Bacteroidales than for Bifidobacterium in the fecal communities tested (Table 2.5). Clones sharing at least 98% sequence identity to one another were placed in the same taxonomic unit. The observed number of OTUs for 356 clones from Bifidobacterium fecal libraries was 22, while 356 Bacteroidales fecal clones formed 116 OTUs. Two non-parametric estimators of OTU richness, mean Chao 1 and ACE, were calculated to be 28.25 and 28.32 for fecal Bifidobacterium and 455.31 and 1089.49 for fecal Bacteroidales, respectively. The confidence intervals for the fecal community estimators did not overlap (p<0.05), suggesting that there is a significant difference between Bifidobacterium and Bacteroidales fecal OTU richness. Thus, for these fecal clone libraries, Bacteroidales appear to have higher species richness than Bifidobacterium spp. However, the observed species richness of Bacteroidales appears to be driven by singletons (species captured once), or low abundance classes, as nearly 75% of the observed OTUs are singletons. In contrast, less than 25% of the Bifidobacterium OTUs are singletons, and the remaining OTUs are derived from species captured more than once. These findings have important implications in the development of assays targeting specific sources of fecal pollution and in further understanding how fecal bacterial populations adapt to a particular set of gut conditions.


[Figure 2.1: plot of the number of operational taxonomic units (y-axis, 0 to 35) against the number of clones (x-axis, 0 to 400). Series: sample-based Mao Tau method, Chao 1 estimator of species richness, and ACE estimator of species richness.]

Fig. 2.1. Observed (diamonds) and estimated (squares and triangles) OTU richness of Bifidobacterium spp. in animal feces versus sample size. The rarefaction curve (i.e., number of observed phylotypes (Sobs) as a function of number of clones) was calculated using the sample-based Mao Tau method, and the Chao 1 and ACE estimates were averaged over 100 simulations. The dotted lines indicate the 95% confidence interval for the rarefaction calculations.
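For readers unfamiliar with rarefaction, the expected number of OTUs in a subsample can be computed directly from OTU abundances. The sketch below uses the individual-based formula of Hurlbert (1971), which is cited in the methods. It is not the sample-based Mao Tau estimator computed by EstimateS for Fig. 2.1, and the function name and example library are hypothetical.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs in a random subsample of n clones.

    Individual-based rarefaction: each OTU with N_i clones out of N total
    is missed with probability C(N - N_i, n) / C(N, n).
    """
    total = sum(otu_counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the total number of clones")
    denominator = comb(total, n)
    return sum(1 - comb(total - c, n) / denominator for c in otu_counts)


if __name__ == "__main__":
    # Hypothetical library of 20 clones spread over 6 OTUs.
    library = [8, 5, 3, 2, 1, 1]
    for n in (1, 5, 10, 20):
        print(n, round(rarefied_richness(library, n), 2))
```

A curve that flattens as n approaches the library size indicates that further sequencing would recover few additional OTUs, which is the saturation argument made for the bifidobacterial clones.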


Table 2.5. Bifidobacterium and Bacteroidales sequence diversity and richness estimators

                                    Bifidobacterium      Bacteroidales
                                    Fecal Population     Fecal Population
Number of sequences                 356                  366
Number of OTUs (98%)                22                   116
Singletons                          5                    85
Chao 1 estimator of OTU richness    28                   455
Chao 1 95% confidence interval      23 - 62              280 - 821
ACE estimator of OTU richness       28                   1089
ACE standard deviation              0.4                  121
Shannon's Index of Diversity        2.5                  3.7
Simpson's Index of Diversity        9.5                  16.1

2.4.3. Phylogenetic analyses of Bifidobacterium clones.

Three hundred and fifty-six partial 16S rRNA gene sequences derived from 14 different animal feces were analyzed. The final phylogenetic analysis included 55 pig, 53 chicken, 51 cattle, 51 human, 43 cat, 22 deer, 19 pigeon, 17 seagull, 12 sheep, 11 peacock, 10 dog, six goose, three alpaca, and three goat sequences, as well as sequences from cultured Bifidobacterium strains. A total of 32 chimeric sequences were excluded from the analysis. More than half (i.e., 56%) of all unique clone sequences exhibited low sequence similarity (≤96%) to bifidobacteria-related 16S rDNA sequences, indicating that the phylogenetic diversity of Bifidobacterium-like sequences may be currently underrepresented in the publicly available databases.

The overall phylogenetic tree topology was in agreement with previous studies using cultured strains (Leblond-Bourget et al., 1996, Sakata et al., 2006). For instance, sequences having >97% sequence similarity with cultured strains formed specific clusters with the B. catenulatum, B. pseudocatenulatum, B. longum, B. coryneforme, B. asteroides, and B. pseudolongum groups (Leblond-Bourget et al., 1996, Miyake et al., 1998, Sakata et al., 2006). The overall topology of the neighbor-joining tree was supported by parsimony trees with 100 resamplings (data not shown). Most major branching orders of the phylogenetic tree were supported by bootstrap values of at least 63% of the parsimony bootstrap resamplings, while two branches of the unidentified clades containing fecal sequences distantly related to cultured Bifidobacterium species were not supported by high bootstrap values (43% and 21%). These low bootstrap values may result from the use of partial 16S rRNA sequences, which limits phylogenetic resolution in a comparative analysis (Hugenholtz et al., 1998). All other subclusters in the parsimony analysis were supported by bootstrap values of 90% and higher.

Phylogenetic analysis of the fecally derived bifidobacterial clones revealed previously unidentified host-microbial distributions (Fig. 2.2). For example, approximately 67% of all the sequences (Groups I and III) were associated with cultured bifidobacteria. Bifidobacterium Group I contains several reference strains as well as mammalian and avian-derived clones. Twenty-four of 51 human fecal clones clustered with B. ruminantium, B. adolescentis, B. pseudocatenulatum, and B. longum (>97% sequence similarity), while 16 other human fecal clones formed subclusters related to identifiable strains. Additionally, all 43 domestic cat sequences were found within Group I, sharing high sequence similarity with B. pseudocatenulatum, B. longum, and unidentified human fecal clones. Close sequence identity among human and cat bifidobacteria-related clones may suggest that the close interaction between domesticated animals and their owners serves as a pathway for sharing gut microflora. Interestingly, dog sequences were missing from the predominantly “human” clade, perhaps due to the low number of clones examined. Another interesting host-distribution pattern was noted in Group I, in which 21 chicken, 3 goose, and 2 peacock clones formed a subcluster with B. gallinarum, a species commonly isolated from the chicken cecum (Lauer, 1990). This avian-derived clade suggests that B. gallinarum and closely related populations may exhibit host-specificity.

Group II contained cultured strains of bifidobacteria, including B. coryneforme, B. indicum, B. asteroides, B. minimum, and B. subtile (Fig. 2.2). None of the 356 clones clustered with these bacteria, suggesting these species may not be common intestinal members in the animal types used in the study. Group III comprised cultured strains including B. pseudolongum, B. animalis, B. choerinum, B. bifidum, B. thermophilum, and B. boum, in addition to fecal sequences derived from eight different animal fecal samples. Most notably, B. pseudolongum formed a subcluster (>97% similarity) with 53 fecal sequences from seven different animal types. Previous studies have characterized B. pseudolongum strains isolated from feces of pig, chicken, bull, calf, guinea pig, rabbit, lamb, and cattle, further supporting the seemingly cosmopolitan lifestyle of species within this clade (Biavati et al., 1982, Mitsuoka, 1969, Yaeshima et al., 1992). In contrast, two subclusters within Group III were composed of 37 and 18 pig fecal clones. These are closely related to B. thermophilum and B. boum, species which have been previously isolated from swine feces (Biavati et al., 2000, Resnick and Levin, 1981).

Culture-based studies previously indicated that B. suis (closely related to B. longum) was the predominant bifidobacterial species in the gastrointestinal tracts of pigs (Matteuzzi et al., 1971). However, Mikkelsen et al. (2003) found that most bifidobacterial isolates had restriction patterns nearly identical (>99.5%) to B. boum, supporting our finding that these populations may be endemic to the swine gut.

[Figure 2.2: phylogenetic tree with clades labeled Group I, Group II, Group III, and Group IV.]

Fig. 2.2. Phylogenetic tree of Bifidobacterium 16S rRNA gene sequences (356) derived from 14 different mammalian and avian hosts, based on a neighbor-joining algorithm. Numbers in parentheses indicate the number of sequences associated with each clade for a given host. Clone libraries were generated using the genus-specific primer set Bif164-f and Bif662-r. Sequences for cultured bifidobacteria were added to the analyses as reference points, while the 16S rRNA sequence of Arcanobacterium haemolyticum (accession #: AJ234059) was used as the outgroup.

Group IV contained nearly one third of the total fecal clones (Fig. 2.2). Nine out of the 14 different animal feces were represented within this group. Approximately 22% of the sequences formed a supercluster within Group IV that did not associate with any cultured Bifidobacterium species and exhibited some of the lowest similarities to publicly available sequences.

Interestingly, most of the pseudo-ruminant (multi-gastric) fecal clones were members of these unidentified clades. For example, 22/22 deer, 9/12 sheep, 3/3 goat, and 3/3 alpaca sequences were associated with these distant subclusters. These results suggest that this cluster comprises a novel bifidobacteria-like group common to the pseudoruminant gut (three-chamber stomach typical of horses, llamas, camels, and alpacas). The rest of the sequences in Group IV were primarily avian-derived clones with high sequence similarity to Aeriscardovia aeriphila, previously known as B. aerophilum (Simpson et al., 2003).

The results obtained in this study suggest that certain bifidobacterial species might prefer a cosmopolitan lifestyle, while others appear to exhibit preferential host-distribution. These findings are relevant to monitoring microbial water quality because certain bifidobacterial species, although suggested in previous studies, might not be good targets for methods that determine human fecal pollution in watersheds impacted by different fecal sources. This is the case for B. adolescentis and B. dentium, as these species were detected by molecular means in several non-human hosts. On the other hand, some bifidobacterial groups might represent good target populations for assessing the specific contribution of fecal sources like swine and avian hosts. At the 16S rDNA level, fecal bifidobacteria do not appear to be as diverse as other fecal anaerobic bacteria like Bacteroides, according to our diversity calculations. From a detection standpoint, low sequence diversity suggests that molecular methods could be used to comprehensively study the dynamics of bifidobacteria in different environmental scenarios. Such low phylogenetic diversity might also indicate that bifidobacteria as a group have a relatively narrow environmental niche, which is compatible with their poor survival outside of the animal gut. The relatively low incidence of bifidobacteria detected with the species-specific assays in this study suggests that their overall densities in sources of fecal pollution are also low and, consequently, that the levels of bifidobacteria reaching environmental waters might not always correlate with the densities of traditional indicators of fecal pollution or with health risks. In fact, bifidobacteria have not been detected in waters showing evidence of pollution as determined by the presence of fecal indicators that exhibit higher environmental survival rates (Carrillo et al., 1985). Similarly, Rhodes and Kator (1999) failed to detect bifidobacteria in summer months, when water temperatures were between 23 and 30 °C, an important fact considering the potential for higher exposure to waterborne pathogens due to the increase in recreational activities during this period.

Moreover, in the latter study bifidobacteria were isolated in only 11 of the 250 water samples tested, even though the overall fecal coliform average for the samples was approximately 374 per 100 ml. In our study the majority of the water samples tested positive for fecal anaerobic bacteria (Bacteroidales and C. coccoides) but negative for bifidobacteria. Hence, the use of bifidobacteria as indicators of fecal pollution in environmental waters might only be applicable in a limited number of circumstances (Carrillo et al., 1985), such as fecal contamination associated with rainstorm events or near specific sources of pollution when there are high loads of recent fecal contamination.

2.5. Acknowledgements

This research was supported in part by an Augmentation Award to JSD from the National Center for Computational Toxicology of the U.S. EPA, Office of Research and Development. We are grateful for the technical assistance from Jingrang Lu and Christopher Luedeker, and to George Di Giovanni and Donald Stoeckel for sharing their fecal sample collections. We would also like to thank Brent Gilpin and Sharon Long for providing bifidobacterial DNA extracts and pure cultures. The opinions expressed in this paper are those of the author(s) and do not necessarily reflect the official positions and policies of the U.S. EPA. Any mention of products or trade names does not constitute recommendation for use by the U.S. EPA.


CHAPTER 3

Evaluation of Swine-Specific PCR Assays Used for Fecal Source Tracking and Analysis of

Molecular Diversity of Swine-Specific "Bacteroidales" Populations

as published in

Applied and Environmental Microbiology, 75:5787-5796.

Copyright © 2009, American Society for Microbiology. All Rights Reserved.

Regina Lamendella has a non-exclusive license agreement (License Number: 2298500401970) with American Society for Microbiology provided by the Copyright Clearance Center for reproduction of this work within this dissertation document.

3.1. Abstract

In this study we evaluated the specificity, distribution, and sensitivity of Prevotella-based (PF163 and PigBac1) and methanogen-based (P23-2) PCR assays proposed to detect swine fecal pollution in environmental waters. The assays were tested against 222 fecal DNA extracts derived from target and non-target animal hosts, and against 34 groundwater and 15 surface water samples from five different sites. We also investigated the phylogenetic diversity of 1,340 Bacteroidales 16S rRNA gene sequences derived from swine feces, swine waste lagoons, swine manure pits, and waters adjacent to swine operations. Most swine fecal samples were positive for the host-specific Prevotella-based PCR assays (80-87%), while fewer were positive using the methanogen-targeted PCR assay (53%). Similarly, the Prevotella markers were detected more frequently than the methanogen-targeted assay in waters historically impacted with swine fecal contamination. However, the PF163 PCR assay cross-reacted with 23% of non-target fecal DNA extracts, although Bayesian statistics suggested that it yielded the highest probability of detecting pig fecal contamination in sites with a history of swine fecal pollution. Phylogenetic analyses revealed previously unknown swine-associated clades, comprising clones from geographically diverse swine sources and from water samples adjacent to swine operations, that are not targeted by the Prevotella assays. While deeper sequencing coverage might be necessary to better understand the molecular diversity of fecal Bacteroidales species, results of sequence analyses supported the presence of swine fecal pollution in the studied watersheds. Overall, due to non-target cross-amplification and poor geographic stability of currently available host-specific PCR assays, development of additional assays is necessary to accurately detect sources of swine fecal pollution.

3.2. Introduction

The size of swine farming operations has increased significantly during the last few decades as a result of the high demand for pork products. In fact, pork is now considered the most popular meat worldwide (Guan and Holley, 2003). In the United States, the number of large confined swine animal units increased three orders of magnitude from 1982 to 1997 (Kellog et al., 2000), making the swine industry among the top three producers of domesticated animal feces. A direct consequence of this trend is the increase in swine fecal waste, which in turn has raised environmental concerns. When introduced to water, swine fecal waste can present a risk to human health because this waste can harbor a variety of human pathogens (Buckholt et al., 2002, Fratamico et al., 2004, Guan and Holley, 2003, Krapac et al., 2002, Wegener et al., 1996). The diversity and relatively high frequency of human pathogens in swine feces make swine important reservoirs of zoonotic pathogens. Moreover, the marked increase in the number of large operations has resulted in increased manure production and application in small geographic areas, creating an imbalance between the assimilative capacity of manure-applied farmland and the amount of manure nutrients produced on each farm. This imbalance is evidenced by the 20% increase (from 1982 to 1997) in nitrogen and phosphorus produced in swine operations, thus potentially contributing to the detrimental eutrophication of aquatic ecosystems (Kellog et al., 2000). Swine manure spills and leaks are commonplace in the top hog-production states, such as Iowa and North Carolina, due to failure or overflow of manure storage, uncontrolled runoff from open feedlots, improper manure application on cropland, deliberate pumping of manure onto the ground, and intentional breaches in storage lagoons (Merkel, 2004, Wing et al., 2002).

Recently, swine-associated PCR-based methods targeting members of the Bacteroidales order (i.e., Prevotella) and methanogen populations (Dick et al., 2005, Okabe et al., 2007, Ufnar et al., 2007) have been proposed to discriminate swine fecal pollution events from other potential fecal contributions (e.g., human, bovine, wildlife) to environmental waters. Nevertheless, the value of these assays in reliably detecting fecal pollution sources in watershed-based studies has not been thoroughly investigated. The main goals of this study were to determine host-specificity, frequency of detection, and detection limits of currently available swine-associated PCR-based microbial source tracking (MST) assays. To achieve these objectives, assays were tested against swine and non-target fecal samples, samples from swine manure pits and swine waste lagoons, and water samples presumed to be impacted by swine fecal sources. Furthermore, we investigated the phylogenetic diversity of Bacteroidales 16S rRNA gene sequences derived from some of the aforementioned samples to resolve the level of specificity, relative abundance, and environmental occurrence of Bacteroidales-specific 16S rRNA gene sequences.

3.3. Materials and Methods

3.3.0. Sample Collection.

Fecal (n=215), manure pit (n=4), and waste lagoon (n=3) samples were collected from different sites in Illinois, Nebraska, Ohio, Texas, Delaware, and West Virginia (see Table S1 in supplemental material). Selection of source material was based on the goal of including as many different animal types as possible to check for host-specificity, with emphasis on hosts considered to be important sources of fecal pollution in the United States. Approximately 1.0 to 2.0 g of the fecal material was placed into individual sterile vials and processed as previously described (Chee-Sanford et al., 2001, Koike et al., 2007). One liter of manure pit and lagoon liquid was collected in autoclaved bottles and transported on ice to the laboratory. To ease filtration, manure pit and lagoon samples were first centrifuged at 8,000 x g for 10 min at 4 °C and the supernatant was then filtered onto 0.45 µm polycarbonate filters. DNA extractions were performed for both pellets and filters immediately after the centrifugation/filtration process.

Water samples were collected from multiple sites within two Texas watersheds and three sites in Illinois known to be impacted with fecal pollution sources (Fig. S3). Specifically, water samples from Texas were collected (n=5) from the Red River basin (TX) in section 207A of Buck Creek (Collingsworth, TX), which is currently on the impaired waters 303(d) list for exceeding fecal bacteria concentrations. Water samples (n=6) were also collected from Lake Granbury (Segment 1205) in the Brazos River Basin (TX), which serves as a critical water supply in North Texas and provides water for more than 250,000 customers. Texas water samples (100 ml) were collected, placed on ice for transportation to the laboratory, and filtered within six hours of sample collection. Samples were filtered through 0.2 µm pore size filters as previously described (Lamendella et al., 2007). DNA was extracted using the QIAamp DNA mini kit (Qiagen, Valencia, California) and stored at -80 °C until further analyses.

Duplicate water samples were also collected from monitoring wells located on three commercial swine operations in Illinois and surface water adjacent to these operations, herein described as Sites A, C, and E (Fig. S3). Specifically, samples were collected from 14 wells from Site A, six wells from Site C, and 12 wells from Site E. Site A is a 4,000-pig finishing operation that uses a two-stage waste handling system in which a concrete settling basin collects solids prior to the supernatant liquid passively moving into an unlined lagoon. The aim of the two-stage waste handling system is to reduce fecal loading into the lagoon. Site C is a farrowing and nursery operation that houses up to 2,500 sows and that utilizes a single-stage, 6 m deep unlined lagoon to directly collect both feces and urine. Site E is a 2,300-hog finishing facility that uses a concrete-lined pit system for manure storage. Well installation and groundwater sample collection have been previously described (Koike et al., 2007). Surface water samples were also collected from a groundwater field seep at Site C, from streams north and south of Site A, and from a stream south of Site C.

Groundwater samples (250 ml) from the sites in Illinois were collected in sterile plastic bottles and transported to the laboratory on ice. Samples were centrifuged at 17,700 × g for 20 min at 4 °C and the supernatants were discarded. The pellets were then washed three times with 0.1 volume of phosphate-buffered saline (120 mM NaH2PO4 [pH 8.0], 0.85% NaCl) before extraction of total DNA (Koike et al., 2007). Three surface water samples (500 ml) were also collected from a stream located less than 100 m from a swine operation housing approximately 200 animals in Loudonville, Ohio. Samples were stored in ice coolers and transported to the laboratory within four hours of collection. Each sample was centrifuged at 3,600 × g for 20 min at 4 °C, after which the supernatants were filtered onto 0.22 µm polycarbonate membranes (Lamendella et al., 2007). Filters and pellets were stored at -20 °C until further processing.

3.3.1. DNA extraction.

Fecal DNA was extracted with the FastDNA SPIN Kit (MP Biomedicals, Inc., Solon, OH) according to the manufacturer’s instructions using 250 µl of each fecal slurry. For Ohio water samples, DNA was extracted directly from whole filters and pellets using the FastDNA SPIN Kit. For Illinois water samples, DNA was extracted from pellets, also using the FastDNA SPIN Kit. Total DNA from corresponding filters and pellets was eluted in 100 µl of 10 mM Tris and combined in a sterile tube. DNA was then quantified using a NanoDrop® ND-1000 UV spectrophotometer (NanoDrop Technologies, Wilmington, DE). To test for the presence of extraneous DNA contamination introduced during laboratory procedures, extraction blanks (n=8) were included in the PCR assays. DNA extracts were stored at -20 °C until further processing.

3.3.2. PCR assays and limits of detection.

Four different PCR assays, including three assays reported to be swine-associated, were tested using DNA extracts from fecal sources and water samples as templates. Two of the swine-associated PCR assays, PF163 and PigBac1, target the 16S rRNA gene of members of the Bacteroidales order, specifically Prevotella spp. (Dick et al., 2005, Okabe et al., 2007), while P23-2 targets the methyl coenzyme-M reductase (mcrA) gene of methanogens (Ufnar et al., 2007) (Table 3.1). Additionally, a general Bacteroidales 16S rRNA gene PCR assay (Bac32) (Bernhard and Field, 2000) was used to detect potential PCR inhibition, to detect the overall presence of Bacteroidales fecal anaerobic bacteria in water samples, and to assess, via sequencing studies, Bacteroidales phylogenetic diversity in different fecal sources and environmental samples impacted by fecal pollution.

Table 3.1. Description of primers tested in this study

Primer Name   Primer Sequence                          Target                                       Reference
P23-2f        5'-TCTGCGACACCGGTAGCCATTGA-3'            mcrA gene of methanogens                     (Ufnar et al., 2007)
P23-2r        5'-ATACACTGGCGACATTCTTGAGGATTAC-3'
Bac32F        5'-AACGCTAGCTACAGGCTT-3'                 16S rRNA gene of Bacteroidales               (Bernhard and Field, 2000)
Bac708R       5'-CAATCGGAGTTCTTCGTG-3'
PigBac1f      5'-CGGGTTGTAAACTGCTTTTATGAAG-3'          16S rRNA gene of Prevotella-related group    (Okabe et al., 2007)
PigBac1r      5'-CGCTCCCTTTAAACCCAATAAA-3'
PF163         5'-GCGGATTAATACCGTATGA-3'                16S rRNA gene of Prevotella-related group    (Dick et al., 2005)
Bac708R       5'-CAATCGGAGTTCTTCGTG-3'

Reactions for the general Bacteroidales assay were conducted using the previously described protocol (Bernhard and Field, 2000). PCR conditions for the PF163 swine-associated assay have not been previously described and were determined as follows: 94 °C for 5 min, followed by 35 cycles of 94 °C for 20 s, 58 °C for 20 s, and 72 °C for 30 s, and a final extension step of 72 °C for 5 min. The other swine-associated assays, PigBac1 and P23-2, were used as described elsewhere (Okabe et al., 2007, Ufnar et al., 2007). Fecal and water DNA template concentrations used in the PCR assays were adjusted based on published detection limits. Specifically, 0.2 and 1 ng were used for Bac32, 1 and 10 ng were used for the PF163 and PigBac1 assays, and 50 ng were used for the P23-2 reactions, as suggested by the authors who originally designed these assays. Multiple template concentrations were used because the commonly found levels of the targeted populations differ for each assay. Final PCR solutions (25 µl total volume) contained 2.5 µl of Takara Ex Taq 10X buffer (20 mM Mg2+), 2 µl of dNTP mixture (2.5 mM each), 1 µl of 25% acetamide, 17.5 µl of UltraPure water, 12.5 pmol of each forward and reverse primer, and 0.625 units of Ex Taq DNA polymerase (TAKARA Mirus Bio., Madison, WI). Reactions were conducted on a DNA Engine 2 Tetrad thermal cycler (Bio-Rad Laboratories, Inc., Hercules, CA). Amplification products were visualized using 1% agarose gels and GelSTAR nucleic acid stain (Cambrex BioScience, East Rutherford, NJ). PCR inhibition was tested in water DNA extracts by using the 8F and 787R general bacterial 16S rRNA gene-targeted primer set, as described by Buchholz-Cleven et al. (1997).
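The reaction recipe above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it assumes that the stated 20 mM Mg2+ refers to the 10X buffer stock, and the helper names are hypothetical. The cycling time ignores ramping between temperatures.

```python
REACTION_VOLUME_UL = 25.0  # total PCR volume stated in the text


def final_concentration(stock_conc, stock_volume_ul, total_ul=REACTION_VOLUME_UL):
    """Dilute a stock: C_final = C_stock * V_stock / V_total (same units as stock)."""
    return stock_conc * stock_volume_ul / total_ul


# 12.5 pmol of primer in 25 µl; pmol per µl is numerically equal to µM.
primer_um = 12.5 / REACTION_VOLUME_UL
dntp_mm = final_concentration(2.5, 2.0)         # each dNTP, mM
mg_mm = final_concentration(20.0, 2.5)          # Mg2+, mM (assumes 20 mM in 10X stock)
acetamide_pct = final_concentration(25.0, 1.0)  # percent

# Volume left for primers, polymerase, and template after the listed reagents.
listed_volumes_ul = [2.5, 2.0, 1.0, 17.5]
remaining_ul = REACTION_VOLUME_UL - sum(listed_volumes_ul)


def cycling_seconds(initial_s, cycles, step_seconds, final_s):
    """Total hold time of a PCR program, excluding temperature ramps."""
    return initial_s + cycles * sum(step_seconds) + final_s


# PF163 program: 5 min, then 35 x (20 s, 20 s, 30 s), then 5 min.
pf163_seconds = cycling_seconds(5 * 60, 35, [20, 20, 30], 5 * 60)
```

Under these assumptions the listed reagents occupy 23 µl, leaving 2 µl for primers, enzyme, and template DNA.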

The performance of each swine-associated assay was determined in PCR assays containing known concentrations of fecal and water DNA extracts. Using this approach, it is possible to determine the detection limits of an assay against environmental extracts (Lamendella et al., 2007). PCR assays were performed using templates consisting of serial DNA dilutions (1 x 10^-8 to 1 x 10^-16 g DNA) of composite fecal samples of swine from different age groups, lagoons, manure pits, and selected water samples yielding positive PCR results.
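A minimal sketch of how a detection limit can be read from such a dilution series follows. It assumes tenfold steps and defines the limit as the smallest template mass that still gives a positive PCR result. The function names and the presence/absence calls are hypothetical and are not results from this study.

```python
def tenfold_series(start_exp=-8, end_exp=-16):
    """Exponents of a tenfold dilution series, from 10^start_exp to 10^end_exp grams."""
    return list(range(start_exp, end_exp - 1, -1))


def detection_limit_exp(results):
    """Return the exponent of the smallest template mass that amplified.

    `results` maps a dilution exponent (grams of DNA as a power of ten)
    to True (band observed) or False (no band). Returns None if no
    dilution amplified.
    """
    positives = [exp for exp, positive in results.items() if positive]
    return min(positives) if positives else None


# Hypothetical gel calls: amplification down to 10^-13 g, none below.
hypothetical_calls = {exp: exp >= -13 for exp in tenfold_series()}
limit = detection_limit_exp(hypothetical_calls)
```

Keying the results by integer exponent avoids floating-point comparisons between very small masses.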

3.3.3. Cloning and sequencing analyses.

General Bacteroidales (Bac32F/Bac708R) PCR products were used in cloning experiments to qualitatively assess the molecular diversity of Bacteroidales species in different hosts. Sequencing and data analyses were performed as previously described (Lamendella et al., 2007). Briefly, PCR products were purified using the QIAquick PCR Purification Kit according to the manufacturer’s instructions (QIAGEN, Valencia, CA). Representative PCR products derived from swine feces (feral and domesticated), manure pits, lagoons, and water adjacent to swine farms in Illinois and Ohio were cloned into the pCR4.1 TOPO vector as described by the manufacturer (Invitrogen, Carlsbad, CA). Individual E. coli clones were subcultured into 300 μL of Luria Broth containing 50 μg/mL ampicillin and screened for inserts using M13 PCR. Clones were submitted to Children’s Hospital DNA Core Facility (Cincinnati, OH) for sequencing using Big Dye sequencing chemistry (Applied Biosystems, Foster City, CA), M13 forward and reverse primers, and an Applied Biosystems PRISM 3730XL DNA Analyzer. Sequences were manually verified and cleaned using Sequencher 4.7 software (Gene Codes, Ann Arbor, MI). Chimeric sequences detected using Bellerophon (Huber et al., 2004) were not included in further analyses. Non-chimeric sequences were submitted to Greengenes for alignment using the Nearest Alignment Space Termination (NAST) algorithm (DeSantis et al., 2006a, DeSantis et al., 2006b). Sequences were also submitted to BLAST homology search algorithms to assess sequence similarity to the Greengenes database (Altschul et al., 1990, DeSantis et al., 2006a). The distance matrix and phylogenetic tree were generated using ARB software (Ludwig et al., 2004). Trees were inferred from 650 sequence positions using neighbor-joining (using a Kimura correction) and maximum parsimony (using the Phylip DNAPARS tool) (Ludwig et al., 2004). To statistically evaluate branching confidence, bootstrap values were obtained from a consensus of 100 parsimonious trees using MEGA software (http://www.megasoftware.net). A Werenella spp. 16S rRNA gene sequence (accession number AJ234059) was used as the outgroup, while cultured Bacteroidales species were included in the analyses as points of reference. Representative sequences generated in this study have been deposited in the GenBank database under accession numbers FJ596647-FJ596751.

3.3.4. Statistical analyses and molecular diversity estimates.

The ability of each marker to accurately detect swine feces within a given water sample was determined using Bayes’ theorem as described by Kildare et al. (2007) with minor modifications. Briefly, the posterior probability, P(S│T), that a given pig-specific PCR assay generated a true positive signal in a water sample was estimated using the following formula:

$$P(S \mid T) = \frac{P(T \mid S)\,P(S)}{P(T \mid S)\,P(S) + P(T \mid S')\,P(S')}$$

where P(T│S) is the proportion of positive signals in swine fecal samples; P(T│S’) is the proportion of positive signals in non-swine fecal samples (false positives in fecal samples); P(S) is the prior probability of swine fecal contamination in a water sample; and P(S’) is the probability that a given water sample is not contaminated with pig feces. P(S│T) was calculated over a range of possible prior probabilities, as in all cases the prior probability of swine fecal contamination was unknown. The posterior probability that each swine-associated marker generated a false negative result in a water sample was calculated using the same Bayesian framework as described above, with the exception that P(T│S) is the proportion of false negative signals in swine fecal samples and P(T│S’) is the proportion of true negative signals in non-swine fecal samples.

Additionally, to understand if the combination of any two assays increased the confidence that swine contamination was present in a water sample, the posterior probability from one assay (e.g., PigBac1) was used as the prior probability of the second assay (e.g., P23-2). A similar approach was used to determine the confidence of using more than two markers, although in this case the posterior probability of two combined assays (e.g., PigBac1) was used as the new prior probability of the third assay (e.g., P23-2).
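The Bayesian calculation and the chaining of assays described above can be sketched as follows. This is a minimal illustration, not the original analysis code. The function names are hypothetical, and the sensitivity and false-positive values are placeholders chosen only to resemble the ranges reported in the abstract. Chaining posteriors in this way also implicitly assumes that the assays respond independently given the true contamination state.

```python
def posterior_swine(p_t_given_s, p_t_given_not_s, prior):
    """P(S|T) from Bayes' theorem.

    p_t_given_s     : proportion of positive signals in swine fecal samples
    p_t_given_not_s : proportion of positive signals in non-swine fecal samples
    prior           : P(S), prior probability of swine contamination
    """
    numerator = p_t_given_s * prior
    denominator = numerator + p_t_given_not_s * (1.0 - prior)
    if denominator == 0:
        raise ValueError("posterior undefined: a positive signal has zero probability")
    return numerator / denominator


def combined_posterior(assays, prior):
    """Chain assays: each posterior becomes the prior for the next assay."""
    probability = prior
    for p_t_given_s, p_t_given_not_s in assays:
        probability = posterior_swine(p_t_given_s, p_t_given_not_s, probability)
    return probability


# Placeholder performance values (illustrative, not the study's estimates).
assay_a = (0.85, 0.23)
assay_b = (0.53, 0.05)

# The true prior is unknown, so evaluate over a range of priors.
priors = [0.1, 0.25, 0.5, 0.75, 0.9]
single = [posterior_swine(*assay_a, prior=p) for p in priors]
paired = [combined_posterior([assay_a, assay_b], prior=p) for p in priors]
```

When both assays have a likelihood ratio above one, the paired posterior exceeds the single-assay posterior at every prior, which is the sense in which combining markers increases confidence.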

Molecular diversity analysis and assemblage comparison of clone libraries were performed using the DOTUR and SONS software packages, respectively (Schloss et al., 2005; Schloss et al., 2006). Specifically, DOTUR was used to place sequences into operational taxonomic units, to compute Chao 1 indices, abundance-based coverage estimators (ACE), and Shannon and Simpson’s diversity indices, and to perform rarefaction analyses. These diversity estimates were calculated for lagoon, manure pit, feces, and water sequences to determine if the current level of sequencing performed in this study saturated Bacteroidales diversity and to confirm the level of inclusiveness of the Bacteroidales-based assays. SONS was used to characterize community structure overlap amongst clone libraries derived from lagoons, manure pits, feces, and water adjacent to swine farms.
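For readers unfamiliar with these estimators, the sketch below computes three of them from a list of clone counts per OTU. It uses common textbook forms (bias-corrected Chao 1, Shannon's H' with natural logarithms, and Simpson's D without replacement). These are assumptions and may differ in detail from the formulas implemented in DOTUR. ACE is omitted for brevity, and the example counts are hypothetical.

```python
import math


def chao1(counts):
    """Bias-corrected Chao 1 richness: S_obs + F1 (F1 - 1) / (2 (F2 + 1))."""
    s_obs = sum(1 for c in counts if c > 0)
    singletons = sum(1 for c in counts if c == 1)
    doubletons = sum(1 for c in counts if c == 2)
    return s_obs + singletons * (singletons - 1) / (2 * (doubletons + 1))


def shannon(counts):
    """Shannon diversity H' = -sum(p_i ln p_i) over observed OTUs."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def simpson_d(counts):
    """Simpson's D = sum n_i (n_i - 1) / (N (N - 1)); its reciprocal grows with diversity."""
    total = sum(counts)
    return sum(c * (c - 1) for c in counts) / (total * (total - 1))


# Hypothetical clone library: number of clones assigned to each OTU.
example_counts = [4, 3, 2, 2, 1, 1, 1]
richness_estimate = chao1(example_counts)
```

A large gap between observed OTUs and the Chao 1 estimate, driven by many singletons, indicates that the library has not saturated the diversity of the sampled community.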

3.4. Results and Discussion

3.4.0. Detection limits and performance of PCR assays for fecal sources.

With the exception of fecal samples collected from eight-week-old piglets, results from the detection limit experiments indicated that the general Bacteroidales assay had lower detection limits than the swine-associated markers in fecal and environmental samples (Table 3.2). This was expected because host-specific populations represent a small group within general bacterial classes (Bernhard and Field, 2000, Lamendella et al., 2007). The abundance of fecal populations targeted by the PF163 assay changed with host age, suggesting that physiological and dietary changes can have an impact on the dynamics of these bacterial populations. In contrast, the abundance of the populations targeted by the PigBac1 assay did not change over time. Interestingly, the PF163 and PigBac1 assays target different Prevotella clades. Each Prevotella subgroup might occupy different niches in the swine gut and therefore be involved in different interactions with the host. Because it is not well understood how 16S rRNA gene sequence similarity can predict the level of convergence of genes relevant to survival in the gut environment (Lozupone et al., 2008), it is possible that these populations coexist by sharing limited overlapping niches, particularly niches involved in polysaccharide utilization (Bjursell et al., 2006), as has been suggested for Bacteroides fragilis and B. thetaiotaomicron in the human gut (Xu et al., 2007). The detection limits for the mcrA-based assay were similar for all fecal samples, regardless of age, suggesting no major changes in the abundance of swine-associated methanogen populations. This is in contrast with the general belief that methanogen densities increase with age in pigs and humans (Maczulak et al., 1989).

Table 3.2. Environmental limits of detection for general Bacteroidales and swine-associated PCR assays for DNA extracted from fecal and water samples.

Sample Type                          General          Swine-associated   Swine-associated   Swine-associated
                                     Bacteroidales    Prevotella         Methanogens        Prevotella
                                     (Bac32) a        (PF163)            (P23-2)            (PigBac1)
Pig fecal (8 weeks) b                1x10^-15         1x10^-15 c         1x10^-8            1x10^-12 d
Pig fecal (6 months)                 1x10^-15         1x10^-13           1x10^-8            1x10^-12
Pig fecal (3-5 years)                1x10^-15 c       1x10^-12           1x10^-8            1x10^-12
Lagoon (Site A)                      1x10^-12         1x10^-9            5x10^-7 e          >1x10^-8 f
Lagoon (Site C)                      1x10^-14         1x10^-9            1x10^-8            >1x10^-8 f
Manure Pit (Site E)                  1x10^-12         1x10^-8            >1x10^-7 e         >1x10^-8 f
Water (Site A9)                      1x10^-10         1x10^-9            >1x10^-7 e         >1x10^-8 f
Water (Site C-South Stream)          1x10^-9          1x10^-9            1x10^-7            >1x10^-8 f
Water (Site E8)                      1x10^-14         1x10^-8            >1x10^-7 e         >1x10^-8 f
Water (Site C2)                      1x10^-12         1x10^-8            5x10^-7            >1x10^-8 f
Water (Lake Granbury-Site 18015)     1x10^-10         1x10^-9            >1x10^-7 e         1x10^-9
Water (Buck Creek-Site 10A)          1x10^-11         1x10^-9            >1x10^-7 e         >1x10^-8 f

a All detection limits are in units of grams of DNA extract.
b For pig fecal DNA detection limits, four fecal samples from each age class (i.e., eight weeks, six months, and three to five years) were pooled.
c Detection limit for the duplicate sample was 1x10^-14 grams of DNA.
d Detection limit for the duplicate sample was 1x10^-13 grams of DNA.
e “>” means that the detection limit was more than 1x10^-7 grams of DNA extract.


In most cases, the detection limits for all assays were lower in swine feces than in lagoon and manure pit samples, suggesting poor survival of these fecal anaerobic bacterial populations during waste management practices. The PF163 assay had the lowest detection limits for most fecal, lagoon, and manure samples. The PigBac1 assay was the least sensitive for both lagoon and manure samples. Specifically, when swine fecal DNA extracts were used as PCR templates, the methanogen assay (P23-2) was at least four orders of magnitude less sensitive than each of the other two host-specific assays. This is not surprising, as methanogens have been reported to be 3-4 orders of magnitude less prevalent than the total number of anaerobic bacteria in pigs and humans (Sorlini et al., 1988). However, when the lagoon samples were tested, the difference between the methanogen assay and the PF163 assay was only 1-1.5 orders of magnitude.

Overall, these results may suggest differences in survival rates between different host-specific bacterial groups in manure pits and lagoons. As these are the most likely swine pollution sources, these results have practical implications when selecting specific assays for source tracking studies. For example, differences in detection limits among pig-associated assays may suggest that different pig-associated populations are prevalent in lagoons versus manure pits. Thus, targeting these different manure pit and lagoon populations may enhance the usefulness of these PCR-based assays as risk assessment tools and in estimating fecal load rates for total maximum daily loads. Our data are consistent with differential distribution among host populations and differential survivorship with respect to the fate and transport of the marker targets, as noted by others (Leach et al., 2008).

With the exception of raccoon feces, of which only a third of the individual samples were positive, most fecal sources produced positive signals with the general fecal Bacteroidales assay (Table 3.3.). Differences were noted, however, in host distribution and geographical stability for each of the swine-associated assays. For example, PigBac1 and PF163 assays were positive for at least 80% of swine fecal samples, while the methanogen-based PCR assay (i.e., P23-2) yielded positive results in only 53% of swine fecal samples. Specifically, the methanogen assay performed poorly in fecal samples derived from Ohio and Texas, perhaps due to differences in animal husbandry practices used at these locations (Table 3.3. and Table S1). Moreover, while all swine manure pit and lagoon DNA extracts (n=6) yielded positive signals with the PF163 assay, only one sample was positive with either the PigBac1 or P23-2 assays. Altogether, these data demonstrate that the PF163 marker was the most frequently detected of the targets tested in this study.

While PF163 and PigBac1-like populations were present in most pig fecal samples, the host-specificity tests showed higher false positive rates for these assays than for the methanogen-based assay. For example, false positive signals were obtained for several non-target hosts, including horse, human, and chicken fecal DNA extracts (Table 3.3.). Gourmelon et al. (2007) reported cross-reactivity for the PF163 assay, particularly when using chicken fecal DNA templates. These data indicate that these host-specific Prevotella populations are also present in the feces of animals other than swine, and therefore, assays based on these specific populations are prone to introduce false positive signals when analyzing environmental samples.


Table 3.3. Specificity of general Bacteroidales and swine-associated PCR markers using fecal DNA extracts.

                           General          Pig              Pig              Pig
                           Bacteroidales    Prevotella       Methanogen       Prevotella
Fecal Type (origin)        (Bac32)          (PF163)          (P23-2)          (PigBac1)

Pig Feces (DE)             100 (9/9)a       44 (4/9)         78 (7/9)         44 (4/9)
Pig Feces (OH)             100 (52/52)      98 (51/52)       42 (22/52)       81 (42/52)
Pig Feces (TX)             100 (7/7)        100 (7/7)        29 (2/7)         100 (7/7)
Pig Feces (TX)             100 (9/9)        89 (8/9)         22 (2/9)         100 (9/9)
Pig Feces (WV)             90 (18/20)       70 (14/20)       90 (18/20)       90 (18/20)
Cattle (WV)                100 (20/20)      40 (9/20)        5 (1/20)         10 (2/20)
Pig Manure Pits (OH)       100 (3/3)        100 (3/3)        0 (0/3)          33 (1/3)
Pig Manure Pit (IL)        100 (1/1)        100 (1/1)        0 (0/1)          0 (0/1)
Pig Lagoons (IL)           100 (2/2)        100 (2/2)        0 (0/2)          0 (0/2)
Human Feces (WV)           100 (10/10)      30 (3/10)        30 (3/10)        60 (6/10)
Chicken (DE)               88 (7/8)         50 (4/8)         38 (3/8)         63 (5/8)
Raccoon (NE)               34 (23/68)       4 (3/68)         1 (1/68)         29 (20/68)
Horse (WV)                 100 (12/12)      67 (8/12)        0 (0/12)         50 (6/12)
Cattle Lagoon (OH)         100 (1/1)        0 (0/1)          0 (0/1)          0 (0/1)

Diagnostic Specificityb    -                0.77             0.93             0.67
Diagnostic Sensitivityc    -                0.87             0.49             0.79

a Percent positive PCR result using a given marker on a given source type. Numbers in parentheses indicate the number of positive PCR results divided by the total number of source type samples tested.
b Diagnostic specificity is defined as the number of non-pig fecal source samples that produce negative PCR results divided by the total number of non-pig fecal source samples tested (n=119).
c Diagnostic sensitivity is expressed as the number of pig source samples testing positive divided by the total number of pig source samples tested (n=103).
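The diagnostic sensitivity and specificity definitions in the footnotes of Table 3.3 can be illustrated with a short Python sketch. The counts below are transcribed from the table rows; the function and variable names are illustrative assumptions and are not part of the original analysis.

```python
from typing import Dict, List, Tuple

Counts = List[Tuple[int, int]]  # (positive PCR results, samples tested)

# Pig-derived rows of Table 3.3: pig feces (DE, OH, TX, TX, WV),
# pig manure pits (OH, IL), and pig lagoons (IL).
PIG_SOURCES: Dict[str, Counts] = {
    "PF163":   [(4, 9), (51, 52), (7, 7), (8, 9), (14, 20), (3, 3), (1, 1), (2, 2)],
    "P23-2":   [(7, 9), (22, 52), (2, 7), (2, 9), (18, 20), (0, 3), (0, 1), (0, 2)],
    "PigBac1": [(4, 9), (42, 52), (7, 7), (9, 9), (18, 20), (1, 3), (0, 1), (0, 2)],
}

# Non-pig rows: cattle, human, chicken, raccoon, horse, cattle lagoon.
NON_PIG_SOURCES: Dict[str, Counts] = {
    "PF163":   [(9, 20), (3, 10), (4, 8), (3, 68), (8, 12), (0, 1)],
    "P23-2":   [(1, 20), (3, 10), (3, 8), (1, 68), (0, 12), (0, 1)],
    "PigBac1": [(2, 20), (6, 10), (5, 8), (20, 68), (6, 12), (0, 1)],
}


def diagnostic_sensitivity(target_counts: Counts) -> float:
    """Fraction of target (pig) source samples that tested positive."""
    positives = sum(p for p, _ in target_counts)
    tested = sum(n for _, n in target_counts)
    return positives / tested


def diagnostic_specificity(non_target_counts: Counts) -> float:
    """Fraction of non-target source samples that tested negative."""
    positives = sum(p for p, _ in non_target_counts)
    tested = sum(n for _, n in non_target_counts)
    return (tested - positives) / tested


if __name__ == "__main__":
    for assay in PIG_SOURCES:
        sens = diagnostic_sensitivity(PIG_SOURCES[assay])
        spec = diagnostic_specificity(NON_PIG_SOURCES[assay])
        print(f"{assay}: sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Pooling the counts this way weights each sample equally, so large collections such as the Ohio pig feces and the raccoon samples dominate the pooled estimates.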

3.4.1. PCR assay performance in water samples adjacent to three swine farms in Illinois.

Water samples collected in Illinois were positive at a relatively high frequency (i.e., 40-78%) using the general Bacteroidales assay (Table 3.4. and Table S3), suggesting the presence of fecal contamination. Considering that all Illinois sampling stations were adjacent to swine farms and no other domesticated animal practices are known to occur near these sampling stations, it is assumed that swine are the primary source of fecal contamination at these sites.


Indeed, water samples tested positive for the swine-associated assays at many of the sites, although at a relatively low frequency. For example, at Site A the general Bacteroidales marker produced positive signals in 40-45% of the different water DNA extracts tested, while the host-specific assays produced positive PCR signals for only 5-20% of the samples. Of the swine-associated markers, the methanogen-targeted P23-2 assay yielded the lowest frequency of detection, with a positive PCR result in only 0-22% of the water sample DNA extracts from the three Illinois sampling locations. This may be explained by the lower abundance of methanogens found in the pig gut as compared to Bacteroidales (Butine and Leedle, 1989) and by the fact that managed swine fecal waste might select for different methanogenic populations (Ufnar et al., 2007). Interestingly, all three markers produced the highest proportion of positive pig-associated signals at site C (22-56%), as compared to site A (0-20%) and site E (0-38%), which may be explained by the different waste handling strategies employed at each farm. The two-stage waste handling system used at site A may have resulted in reduced fecal loading into the shallow (1.5 m) lagoon, supporting our finding of a lower proportion of general and pig-associated signals at site A. Additionally, few positive pig signals were observed at site E, which may be explained by the use of a concrete-lined manure pit, limiting the direct flow of waste into groundwater at this site. In contrast, site C had the highest frequency of fecal and pig-associated signals, which may be a result of the single-stage, deep (6 m) waste management system. It should also be noted that there were no consistent spatial relationships between the positive swine-associated PCR results and proximity to waste storage.


Table 3.4. Proportion of positive PCR results using general Bacteroidales and swine-associated markers on water DNA extracts

Markers: General Bacteroidales (Bac32); Swine-associated Prevotella (PigBac1); Swine-associated Prevotella (PF163); Swine-associated Methanogens (P23-2)

                                           DNA template concentration
Water Sampling Site                        0.2 ng    1 ng      1 ng      10 ng     50 ng     10 ng

Illinois* Site A Wells (n=16)              6/16      7/16      1/16      3/16      1/16      2/16
Illinois Site A Lagoon (n=1)               1/1       0/1       0/1       1/1       0/1       1/1
Illinois Site A North Stream (n=1)         1/1       1/1       0/1       0/1       0/1       0/1
Illinois Site A North Tile (n=1)           1/1       0/1       0/1       0/1       0/1       0/1
Illinois Site C Wells (n=6)                3/6       1/6       2/6       2/6       2/6       1/6
Illinois Site C Lagoon (n=1)               1/1       1/1       1/1       1/1       0/1       1/1
Illinois Site C seep from field (n=1)      1/1       1/1       0/1       1/1       0/1       0/1
Illinois Site C South Stream (n=1)         1/1       1/1       1/1       1/1       0/1       0/1
Illinois Site E Wells (n=11)               5/11      6/11      4/11      3/11      0/11      0/11
Illinois Site E Manure Pit (n=1)           1/1       1/1       0/1       1/1       0/1       1/1
Texas Lake Granbury (n=10)                 10/10     nd**      nd**      6/10      2/10      10/10
Texas Buck Creek (n=10)                    10/10     nd**      nd**      6/10      0/10      0/10

3.4.2. PCR assay performance in Texas surface water samples.

A high frequency of positive signals was obtained for the general Bacteroidales marker and for PF163 in water samples from Lake Granbury and Buck Creek (Table 3.4. and Table S3). Interestingly, while the PigBac1 assay was positive for all the samples from Lake Granbury, none of the samples tested from Buck Creek yielded positive PCR signals. In addition, the methanogen-targeted marker produced a positive PCR signal in only one of the water samples from either Lake Granbury or Buck Creek. PF163 was also the only swine-associated marker detected in three water samples collected close to a swine farming operation in Ohio (data not shown). These results suggest that the PF163 marker might be more frequently detected in different environmental settings than the other two swine-associated assays tested in this study.


3.4.3. Probabilities of swine-associated PCR detection at sites using Bayesian statistics.

Because all of the swine PCR assays showed some level of cross-reactivity with non-target fecal sources, the probability of detecting feces originating from swine-associated sources within a given water sample was estimated using Bayesian statistics (Kildare et al., 2007). When the confidence of each assay was tested for water samples using a range of prior probabilities, a positive result from either the P23-2 or PF163 assay always yielded a higher confidence of detecting a true positive pig fecal signal than the PigBac1 assay (Fig. 3.1.a). The probability of pig fecal contamination given a negative PCR result indicated that, of the three pig-specific markers, PF163 yielded the lowest probability of false negatives (Fig. 3.1.b). Altogether, these data indicate that the PF163 assay had the highest probability of yielding true positive and true negative PCR results in environmental waters, as compared to the other pig-specific markers tested in this study.
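The posterior probabilities discussed above follow from Bayes' Theorem. The Python sketch below shows one way to compute them, assuming that the diagnostic sensitivity and specificity reported in Table 3.3 stand in for the true-positive and true-negative rates of each assay; whether the original analysis used exactly these inputs is an assumption, and the function names are illustrative.

```python
from typing import Dict, Tuple

# (sensitivity, specificity) per assay, taken from Table 3.3.
ASSAYS: Dict[str, Tuple[float, float]] = {
    "PF163": (0.87, 0.77),
    "P23-2": (0.49, 0.93),
    "PigBac1": (0.79, 0.67),
}


def posterior_given_positive(prior: float, sensitivity: float, specificity: float) -> float:
    """P(swine contamination | positive PCR result), for 0 < prior < 1."""
    true_pos = sensitivity * prior
    false_pos = (1.0 - specificity) * (1.0 - prior)
    return true_pos / (true_pos + false_pos)


def posterior_given_negative(prior: float, sensitivity: float, specificity: float) -> float:
    """P(swine contamination | negative PCR result), for 0 < prior < 1."""
    false_neg = (1.0 - sensitivity) * prior
    true_neg = specificity * (1.0 - prior)
    return false_neg / (false_neg + true_neg)


if __name__ == "__main__":
    for prior in (0.1, 0.3, 0.5, 0.7, 0.9):
        for name, (sens, spec) in ASSAYS.items():
            pos = posterior_given_positive(prior, sens, spec)
            neg = posterior_given_negative(prior, sens, spec)
            print(f"prior={prior:.1f} {name}: P(S|+)={pos:.3f}  P(S|-)={neg:.3f}")
```

Under these assumed inputs, PigBac1 gives the lowest posterior after a positive result and PF163 gives the lowest posterior after a negative result at every prior between 0 and 1.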

The Bayesian analysis also reveals important limitations regarding the utility of some of the currently available assays. Specifically, while a combination of assays could increase the accuracy of detecting swine fecal pollution in environmental waters (Fig. 3.1.c, d), the results from this study suggest that the currently available assays have limited value as environmental monitoring tools. For example, only 40% of the water samples tested in this study were positive using any of the three swine-associated assays. This number is lower than expected considering the close proximity of the sites to swine operations and the high occurrence of the general Bacteroidales marker (75% of the samples). While PCR inhibition is one possible explanation for the low occurrence of the swine-associated markers, there were only two occurrences where a swine-associated marker produced a PCR product and the general Bacteroidales marker did not. Additionally, all but three (i.e., 37/40) water DNA extracts produced a positive PCR result when using the general bacterial 16S rRNA gene-targeted assay (8F/787R), suggesting PCR inhibition was not impacting PCR results associated with these environmental samples. Assuming that most Bacteroidales clades are of fecal origin, the results suggest that the targeted host-specific populations survive poorly in conditions associated with waste management and transport into environmental waters. Bayesian statistics indicated that a PCR positive result from the three host-specific markers used in this study would result in greater than 90% confidence that a given water sample is indeed contaminated with swine feces (at P(S) > 0.2). However, only eight of all water samples tested (n=43) were positive for two or more of the swine-associated assays. Moreover, only two samples were positive for all four swine-associated assays. Thus, to improve statistical confidence that a water sample is indeed contaminated with swine feces, better assays are needed, particularly assays capable of detecting multiple groups of environmentally-relevant swine-associated fecal bacteria.
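The effect of combining assays can be sketched in Python by multiplying likelihood ratios, one per assay result. This sketch assumes that assay results are conditionally independent given the contamination status and that the sensitivity and specificity values from Table 3.3 apply to water samples; both are simplifying assumptions made for illustration, and the function name is hypothetical.

```python
from typing import Iterable, Tuple

# One entry per assay result: (sensitivity, specificity, result_was_positive)
AssayResult = Tuple[float, float, bool]


def combined_posterior(prior: float, results: Iterable[AssayResult]) -> float:
    """Posterior probability of swine contamination after several assay results.

    Works on the odds scale: posterior odds = prior odds x product of
    likelihood ratios. Assumes conditional independence between assays
    and requires 0 < prior < 1 and 0 < specificity < 1.
    """
    odds = prior / (1.0 - prior)
    for sensitivity, specificity, positive in results:
        if positive:
            odds *= sensitivity / (1.0 - specificity)   # positive likelihood ratio
        else:
            odds *= (1.0 - sensitivity) / specificity   # negative likelihood ratio
    return odds / (1.0 + odds)


if __name__ == "__main__":
    p23_only = [(0.49, 0.93, True)]
    p23_pigbac1 = p23_only + [(0.79, 0.67, True)]
    all_three = p23_pigbac1 + [(0.87, 0.77, True)]
    for label, res in [("P23-2", p23_only),
                       ("P23-2 + PigBac1", p23_pigbac1),
                       ("P23-2 + PigBac1 + PF163", all_three)]:
        print(f"{label}: {combined_posterior(0.2, res):.3f}")
```

With a prior of 0.2 and all three assays positive, this independence-based sketch gives a posterior above 0.9, which is in line with the confidence level reported in the text.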


[Figure 3.1, panels A-D: posterior probability of swine fecal contamination (y-axis, 0 to 1) plotted against the prior probability of swine contamination (x-axis, 0 to 1). Panels A and B show curves for PF163, P23-2, PigBac1, and Bac32. Panels C and D show curves for P23-2 only; P23-2 and PigBac1; and P23-2, PigBac1, and PF163.]

Fig. 3.1. Probability of swine fecal contamination using a Bayesian statistical model. (a) Posterior probability of contamination given a positive PCR result using each of the four primer sets tested in this study over a range of prior probabilities. (b) Posterior probability of contamination given a negative PCR result using each of the four primer sets tested in this study over a range of prior probabilities. (c) Posterior probability of contamination, given a positive PCR result using the P23-2 primer set alone, in combination with PigBac1, and all three host-specific assays together. (d) Posterior probability of contamination, given a negative PCR result using the P23-2 primer set alone, in combination with PigBac1, and all three host-specific assays together.


3.4.4. Phylogenetic analysis and population diversity of fecal and environmental Bacteroidales clones.

Because several studies have suggested that Bacteroidales 16S rRNA gene sequences are promising targets for method development (Bernhard and Field, 2000, Dick et al., 2005, Lamendella et al., 2007, Okabe et al., 2007), clone libraries were developed from feces and fecally impacted waters to assess the level of diversity, host specificity, and membership of Bacteroidales associated with swine fecal sources and waters impacted with swine fecal pollution. Clone libraries were also developed to understand why the Bacteroidales-based assays did not perform as well as expected in environmental scenarios. By unveiling the diversity of the current pig-associated targets it was possible to provide information on the level of diversity relevant to MST assay development.

A total of 1,502 partial 16S rRNA gene Bacteroidales sequences derived from nine different animal fecal sources, three waste storage facilities (i.e., manure pits, lagoons, and septic tanks), and six sampling sites adjacent to swine operations were analyzed (Table 3.5). The final phylogenetic analysis included sequences from cattle (n=294), pig (n=219), human (n=67), horse (n=48), chicken (n=16), elk (n=13), cat (n=9), seagull (n=8), human infant (n=6) feces, as well as from swine manure pit (n=245), human septic (n=59), swine lagoon (n=55), and surface water (n=360) samples adjacent to a swine operation in Ohio, and surface water (n=51) and groundwater (n=135) adjacent to a swine operation in Illinois (Fig. 3.2.). A total of 245 anomalous sequences were excluded from the analysis (10). More than half of all unique clone sequences (i.e., 56%) exhibited low sequence similarity (< 97%) to Bacteroidales 16S rDNA sequences, indicating that the phylogenetic diversity of Bacteroidales-like sequences may be currently underrepresented in the publicly available databases. Interestingly, less than 2% of all swine fecal, manure pit, and lagoon sequences derived from geographically diverse locations were associated with Bacteroides spp. As the latter sequences clustered with clones retrieved from other non-swine sources, these populations could be considered cosmopolitan. Similarly, Hong et al. (2008) concluded that Bacteroides spp. were present at relatively low abundance in swine feces, even after targeting 14 different Bacteroides species. The results in this study further indicate that Bacteroides spp. may only account for a limited number of PCR-amplified Bacteroidales 16S rRNA genes from the swine fecal wastes (Leser et al., 2001) and therefore are not ideal sequences for the development of inclusive host-specific assays.

Fig. 3.2. Phylogenetic tree of 1,502 Bacteroidales 16S rRNA gene sequences derived from different animal hosts, and from water, lagoon, and manure samples, based on a neighbor joining algorithm. Numbers in parentheses indicate the number of sequences associated in each clade for a given host. Sequences for cultured Bacteroidetes members were added to the analyses as reference points.

In contrast, nearly all (i.e., 98%) of swine-associated sequences clustered with members related to Prevotella species, suggesting this group exists in higher abundance within swine fecal waste. Analysis of Prevotella-like sequences associated with swine fecal pollution sources revealed interesting diversity patterns. For example, more than 80% of swine manure pit and swine fecal Bacteroidales 16S rRNA gene sequences belong to the same Prevotella clades. More importantly, half of the sequences retrieved from the water samples examined in this study (i.e., 250/495) clustered within the aforementioned Prevotella clades, supporting the contribution of swine fecal sources in these waters. However, when compared to all the water clones from this study, the PF163 primer did not match the vast majority of the environmental water sequences (i.e., 474 out of 495), suggesting that this assay can underestimate swine pollution in the environment. It should be noted that the PF163 primer matched 24% and 47% of fecal and manure pit sequences, respectively, and that depending on the site, 5 to 56% of the water samples tested were positive for the PF163 assay. Hence, not only does the PF163 assay target a subset of host-specific populations of apparently low abundance, but these populations might not survive well in aquatic environments. Furthermore, a lower number of sequences from fecal sources (i.e., 13/245 manure pit; 5/219 fecal) and water samples (0/495) matched the PigBac1 primer, which may explain the lower frequency of detection of this marker in manure pit and water samples. Altogether, these data suggest that multiple swine-associated assays should be sought to maximize the coverage of these environmentally relevant host-specific populations.
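The comparison of a primer against clone sequences described above can be sketched in Python as a simple in silico coverage check. The text does not state how primer matches were determined, so the matching rule here (forward strand only, IUPAC degenerate bases, optional mismatch allowance) is an assumption, and the primer and clone sequences are hypothetical placeholders rather than the PF163 or PigBac1 sequences.

```python
from typing import Iterable, Tuple

# IUPAC nucleotide codes mapped to the bases they allow.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def primer_matches(primer: str, sequence: str, max_mismatches: int = 0) -> bool:
    """True if the primer aligns somewhere on the sequence (forward strand
    only) with at most max_mismatches mismatched positions."""
    k = len(primer)
    for start in range(len(sequence) - k + 1):
        window = sequence[start:start + k]
        mismatches = sum(1 for p, b in zip(primer, window) if b not in IUPAC[p])
        if mismatches <= max_mismatches:
            return True
    return False


def primer_coverage(primer: str, sequences: Iterable[str],
                    max_mismatches: int = 0) -> Tuple[int, int]:
    """Return (number of sequences matched, number of sequences checked)."""
    seqs = list(sequences)
    hits = sum(primer_matches(primer, s, max_mismatches) for s in seqs)
    return hits, len(seqs)


if __name__ == "__main__":
    # Hypothetical primer and clone sequences, for illustration only.
    primer = "GCRGATTAAT"
    clones = ["TTGCAGATTAATCC", "TTGCGGATTAATCC", "TTGCTGATTAATCC", "AAAA"]
    hits, total = primer_coverage(primer, clones)
    print(f"{hits}/{total} clone sequences match the primer exactly")
```

A full check would also test the reverse complement and the reverse primer of each assay; those steps are left out to keep the sketch short.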

More than 80% (i.e., 44 of 55) of the swine lagoon clones were associated with either Bacteroides or Parabacteroides species, and not with the Prevotella-like sequences that dominated the feces and manure pit clone libraries. Interestingly, all sequences derived from Site A and Site C, which used lagoon-type waste management, clustered with eleven lagoon sequences, theoretically supporting swine contamination within these wells. Nevertheless, the majority (i.e., 42 of 55) of other swine lagoon sequences could be considered cosmopolitan, as they also cluster with clones derived from human septic tanks, human feces, and horse feces. While more sequencing from swine lagoons is needed, these data suggest that many of the numerically dominant Bacteroidales populations derived from swine lagoons can withstand the conditions found in this type of fecal waste storage.

Rarefaction of the Chao 1 richness estimator indicated that, for Bacteroidales assemblages derived from surface water samples from Ohio, deeper sequencing coverage might not reveal significant novel diversity, as the curve was approaching a horizontal asymptote at approximately 138 OTUs (Fig. 3.3.). Richness estimators and rarefaction analysis also indicated that at the current sequencing effort, swine feces, manure pit, and surface water contained more diverse Bacteroidales assemblages than sequences derived from groundwater and clones developed using PF163 PCR products from feces (Table 3.5.; Fig. 3.3.). When pair-wise comparisons were inferred between pig feces, pig manure pit, pig lagoons, and water adjacent to swine operations, it was found that Bacteroidales sequences derived from pig fecal samples were a subset of water and fresh manure pit samples using OTU0.03 memberships with the SONs program (Fig. 3.4). This suggests that manure pits may be a better surrogate for fecal sources, as these samples represent a mixture of several individual pigs and are more representative of fecal waste entering the environment. Using community fingerprint data, Ziemer et al. (2004) showed that Bacteroides-Prevotella species present within the fecal and manure pit samples are different. More interestingly, 65 OTUs were shared among water, fecal, and manure pit sequences, indicating that several source fecal and manure pit Bacteroidales populations are indeed found in water samples believed to be contaminated with swine pollution. Sequences representing these OTUs clustered together in host-specific Prevotella phylogenetic clades (Fig. 3.2.). These populations may be promising pig fecal source tracking targets as they are also relatively high in abundance in environmental samples. This finding supports our conclusion that several swine-specific assays will be required to track swine fecal sources in environmental samples due to the diversity of populations that show host-specificity. By choosing assays that target multiple groups, the limitations associated with using a single marker (Hong et al., 2008, Santo Domingo et al., 2007) can be circumvented, including limitations for assays developed using newly discovered Bacteroidales host-specific clades.


[Figure 3.3: rarefaction curves. x-axis: number of Bacteroidales 16S rRNA gene sequences (0 to 440); y-axis: number of (97%) OTUs (0 to 180). Series: Pig Lagoon, PF163 Fecal, Pig Manure Pit, Pig fecal, Ground Water Adjacent to Pig Farms, Surface Water Adjacent to Pig Farms.]

Fig. 3.3. Rarefaction of Chao 1 richness estimators using sampling without replacement for swine source and environmental Bacteroidales assemblages. Sequences within an OTU are at most 3% distant from the most similar sequence in that given OTU.
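As an illustration of rarefaction by sampling without replacement, the Python sketch below computes the expected number of observed OTUs in a random subsample of a clone library, using the standard hypergeometric rarefaction formula. Note that this rarefies the observed OTU count, not the Chao 1 estimate plotted in Fig. 3.3, and the OTU abundance vector is a hypothetical example rather than data from this study.

```python
from math import comb
from typing import Sequence


def expected_otus(otu_counts: Sequence[int], depth: int) -> float:
    """Expected number of OTUs seen when `depth` sequences are drawn
    without replacement from a library with the given OTU abundances.

    For each OTU with abundance c in a library of N sequences, the
    probability of missing it entirely is C(N - c, depth) / C(N, depth).
    """
    total = sum(otu_counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the library size")
    all_draws = comb(total, depth)
    return sum(1.0 - comb(total - c, depth) / all_draws for c in otu_counts)


if __name__ == "__main__":
    # Hypothetical library: 4 OTUs with 5, 3, 1, and 1 sequences.
    library = [5, 3, 1, 1]
    for depth in range(0, sum(library) + 1, 2):
        print(f"depth={depth:2d}  expected OTUs={expected_otus(library, depth):.2f}")
```

A curve that flattens as depth approaches the library size suggests that further sequencing would add few new OTUs, which is the kind of reasoning applied to the surface water library in the text.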


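[Figure 3.4: four-set Venn diagram. Chao 1 richness per community: Pig Lagoon (L), 15 (CI = 8-42); Pig Manure Pit (MP), 122 (CI = 85-210); Water near Pig Farms (W), 125 (CI = 103-172); Pig Feces (F), 86 (CI = 67-131). OTU counts shown in the overlap regions: 8, 14, 65, 17, 4.]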

Fig. 3.4. Venn diagram comparing community structure of Bacteroidales populations derived from pig feces, pig manure pit, pig lagoons, and water adjacent to swine operations. Numbers indicate the OTUs shared at the 97% similarity (OTU0.03) by each of the overlapping communities. Pair-wise comparisons were inferred between all four communities to generate a shared Chao estimate for any two overlapping communities. To better define the number of OTUs in overlap regions, shared Chao estimates were generated using SONs where one community (e.g., water) was compared to the other three communities (e.g., lagoon, manure pit, and fecal) pooled as one community. Chao 1 richness estimate of all four libraries pooled together was 263 (CI = 221-337).


Table 3.5. Similarity-based OTUs and richness estimates for general Bacteroidales swine source and environmental samples

                        Swine              Swine               Water Adjacent      Swine
                        Lagoon             Manure Pit          to Swine Farms      Feces

Number of Sequences     43                 209                 582                 199
Number of OTUsa         10                 59                  86                  55
Chao 1b (95% CI)        15 (11-42)         122 (85-210)        125 (103-173)       86 (68-132)
ACEb                    19 (12-49)         131 (92-215)        138 (113-188)       104 (77-164)
Bootstrapb              12                 74                  103                 68
Shannonb                1.75 (1.52-1.97)   3.18 (2.98-3.37)    2.79 (2.63-2.95)    2.84 (2.59-3.08)
Simpsonb                0.17               0.079               0.19                0.17

a Operational taxonomic units have been defined as each of the sequences that are at most 3% distant from the most similar sequence in a given OTU.
b The Chao1 and abundance-based coverage estimators (ACE) are non-parametric methods used to estimate richness by adding a correction factor to the observed number of species (7, 8). Bootstrap is a non-parametric estimate of species richness as described by Smith and Belle (33). Shannon and Simpson diversity indices give the classic Shannon-Weaver Index of diversity and Simpson Index of diversity.
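The richness and diversity measures named in Table 3.5 can be sketched in Python from a vector of OTU abundances. The formulas used are the classic Chao 1 estimator (with the bias-corrected form when no doubletons are present), the Shannon index with natural logarithms, and the finite-sample Simpson index; whether the software used in this study applied exactly these variants is an assumption, and the abundance vector is a hypothetical example.

```python
from math import log
from typing import Sequence


def chao1(otu_counts: Sequence[int]) -> float:
    """Chao 1 richness: observed OTUs plus a correction based on
    singletons (f1) and doubletons (f2)."""
    observed = sum(1 for c in otu_counts if c > 0)
    f1 = sum(1 for c in otu_counts if c == 1)
    f2 = sum(1 for c in otu_counts if c == 2)
    if f2 > 0:
        return observed + f1 * f1 / (2.0 * f2)
    # Bias-corrected form, used when there are no doubletons.
    return observed + f1 * (f1 - 1) / 2.0


def shannon(otu_counts: Sequence[int]) -> float:
    """Shannon diversity index H = -sum(p * ln p)."""
    total = sum(otu_counts)
    return -sum((c / total) * log(c / total) for c in otu_counts if c > 0)


def simpson(otu_counts: Sequence[int]) -> float:
    """Simpson index D = sum(n * (n - 1)) / (N * (N - 1)).
    Lower values indicate a more even, more diverse assemblage."""
    total = sum(otu_counts)
    return sum(c * (c - 1) for c in otu_counts) / (total * (total - 1))


if __name__ == "__main__":
    # Hypothetical library: 6 OTUs, 20 sequences.
    library = [10, 5, 2, 1, 1, 1]
    print(f"Chao 1  = {chao1(library):.1f}")
    print(f"Shannon = {shannon(library):.3f}")
    print(f"Simpson = {simpson(library):.3f}")
```

In this form the Simpson index decreases as diversity increases, which matches the pattern in Table 3.5 where the manure pit library has the highest Shannon value and the lowest Simpson value.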


In this study, we evaluated the utility of currently available swine-targeted PCR assays within multiple environmental scenarios. Overall, the assays targeting Prevotella populations were found to more frequently detect swine fecal pollution than the methanogen-based assay in the environmental samples analyzed in this study. Both Prevotella-targeted assays cross-reacted with non-target fecal sources, questioning their potential value as standalone assays in complex multi-use watersheds. However, the application of Bayes’ Theorem shows how a probability model can inform users about the utility of host-specific markers within watershed-based studies.

Sequence analysis of the general Bacteroidales clones demonstrated the presence of novel Prevotella clades that are not accounted for when using the currently available pig-specific assays. Additionally, phylogenetic analyses discriminated between endemic versus cosmopolitan Bacteroidales populations. By comparing membership and structure of fecal and environmental microbial communities, it was possible to demonstrate the presence of swine-associated populations which could also be found in swine-impacted waters. While additional tests are necessary to examine host specificity and universal occurrence of the latter populations in environmental waters, the findings in this study suggest that these sequences are promising targets for swine-fecal source tracking assay development. Understanding the molecular diversity of both fecal and environmental microbial populations clearly provided an additional method for confirming specific fecal pollution sources in environmental waters and will likely unveil novel targets for future method development.


3.5. Acknowledgments

This research was supported in part by an Augmentation Award to JSD from the National Center for Computational Toxicology of the U.S. EPA, Office of Research and Development. We are grateful to Jingrang Lu and Donald Stoeckel for sharing their fecal sample collections. We thank the Brazos River Authority and Phyllis Dyer for water sample collection. We also thank Elizabeth Casarez, Joy Truesdale and Nicholas Garcia for technical assistance.

The U.S. Environmental Protection Agency, through its Office of Research and Development, funded and managed, or partially funded and collaborated in, the research described herein. It has been subjected to the Agency’s peer and administrative review and has been approved for external publication. Any opinions expressed are those of the author(s) and do not necessarily reflect the views of the Agency, therefore, no official endorsement should be inferred. Any mention of trade names or commercial products does not constitute endorsement or recommendation for use.


CHAPTER 4

Molecular Ecology of Fecal and Environmental Bacteroidales Populations Reveals Swine-specific Fecal Source Tracking Markers

4.1. Abstract

Swine fecal waste presents risks to human health because this waste can harbor a variety of human pathogens and represents an important reservoir for zoonotic pathogens. Methods that can identify and quantify swine fecal pollution are imperative for the proper implementation of best management practices. More recently, source tracking methods have focused on PCR-based assays targeting the 16S rRNA gene of Bacteroidales populations. However, the utility of these Bacteroidales-targeted methods has been questionable in field applications because these methods only partly comply with critical criteria for source identification like host-specificity, host-distribution, and temporal and geographic stability. In order to improve these methods, and develop more robust swine-specific Bacteroidales-targeted fecal source tracking assays, the occurrence and diversity of Bacteroidales populations from 14 animal fecal sources and nine fecally contaminated watersheds was further investigated. Phylogenetic and diversity analyses of 6,311 Bacteroidales sequences revealed that some of the previously developed markers target populations shared by multiple hosts. These analyses also unveiled several populations specific to swine fecal sources that were also present in swine fecally-impacted waters. These populations were used as targets for new swine fecal source tracking assay development. Preliminary in silico and experimental analysis indicated six of the swine-specific assays were more than 98% specific to swine fecal sources. Presence of these markers within water samples known to be contaminated with swine feces ranged from 20-53% of all water samples tested. Moreover, at least one marker was positive for 88% of all water samples tested, suggesting multiple markers may be necessary to accurately track host-specific Bacteroidales targets. Altogether, these data show that deeper sequencing efforts can help elucidate diversity patterns of source tracking bacterial targets and uncover host-specific distributions that are also relevant for environmental detection. While further sequencing is necessary to cover the high diversity of this bacterial group, this study shows the utility of deeply probing molecular diversity of bacterial source tracking targets.

4.2. Introduction

Non-point sources of fecal contamination are a significant detriment to water quality and impose risks to human health and aquatic ecosystems. A recent study found that current farming practices are responsible for 70% of the pollution in U.S. rivers and streams (Horrigan et al., 2002). In particular, animal manure has been identified as a large contributor to water pollution due to its over-abundance (Jongbloed and Lenis, 1998). The U.S. Environmental Protection Agency (USEPA) estimates that the volume of manure from confined animal feeding operations (CAFOs) is three times our nation’s volume of human fecal waste (USEPA, 2003). Over the past four decades, the swine industry worldwide has become an increasing environmental concern due to augmented production and concentration of farming operations, which produce large amounts of more concentrated waste products. The marked increase in the amount of swine waste produced per farming operation has raised concerns about swine waste storage and treatment processes. For example, the Environmental Integrity Project documented 329 manure spills in Iowa between 1992 and 2002, due to failure or overflow of manure storages, uncontrolled runoff from open feedlots, improper manure application on cropland, deliberate pumping of manure onto the ground, and intentional breaches in storage lagoons (Merkel, 2004).

When introduced into water, swine fecal waste can present risks to human health because this waste can harbor a variety of human pathogens and represents an important reservoir for zoonotic pathogens (Buckholt et al., 2002; Fratamico et al., 2004; Guan and Holley, 2003; Krapac et al., 2002). Swine waste also can contain high concentrations of antibiotics (Chee-Sanford et al., 2001), nutrients (Kellog et al., 2000), and heavy metals (Petraitis, 2007). Thus, preventing swine waste from entering waters used for recreation, shellfishing, and public water supplies is essential to meeting water quality standards that protect environmental and human health.

If the origin of fecal pollution can be correctly and rapidly identified, best management practices and remediation efforts could be introduced in a timely and cost-effective manner.

Microbial source tracking (MST) is a rapidly emerging discipline of applied environmental microbiology that focuses on identifying the source(s) of fecal contamination impacting a given body of water. More recently, MST methods have focused on PCR-based assays targeting the 16S rRNA gene of various Bacteroidales populations (Bernhard and Field, 2000; Dick et al., 2005; Layton et al., 2006; Okabe et al., 2007; Kildare et al., 2007; Mieszkin et al., 2009).

Members of this bacterial group are ecologically diverse and numerically abundant within the distal guts of several animals, and several studies have suggested that some members of this bacterial group demonstrate host-specific distributions (Backhed et al., 2005; Dick et al., 2005; Lamendella et al., 2007). However, only a limited number of these Bacteroidales methods have been successfully used in field applications. This may be explained by the fact that most methods only partly comply with critical criteria for source identification, such as host-specificity, host-distribution, and temporal and geographic stability of the genetic markers. In order to improve these methods, and perhaps develop more robust Bacteroidales-targeted swine-specific fecal source tracking assays, the occurrence and diversity of Bacteroidales fecal bacteria need further investigation.

The currently available pig-specific assays were designed based on small 16S rRNA gene fecal sequence libraries derived from a limited number of pig fecal samples from local geographical areas. For example, Bernhard and Field designed their pig-specific assay based on a host-specific phylogenetic clade containing fewer than ten 16S rRNA gene sequences from one pooled fecal sample collected in Oregon (Dick et al., 2005). Similarly, Okabe et al. (2007) developed an assay that targets two small Prevotella-related, pig-specific clades, each containing fewer than five sequences derived from pig feces from two farms in Japan. Given the vast diversity of Bacteroidales populations, additional sequencing is needed in order to resolve the level of specificity of these 16S rRNA gene-based assays. The community structure, diversity, and membership of Bacteroidales populations harbored within different animal types and environmental systems are still poorly characterized. In order to design more comprehensive assays for accurately quantifying contributions of fecal pollution from different hosts, it is imperative to characterize the diversity and distribution of Bacteroidales populations and their relative abundances in both fecal and environmental matrices. Thus, evaluating the community structure, membership, and abundance of Bacteroidales populations from several geographically diverse host fecal types and from environmental samples known to contain fecal pollution can reveal previously unknown host-specific populations, estimate relative abundances of Bacteroidales populations in various hosts, and reveal populations relevant to environmental fecal pollution.


This study examined the diversity and distribution of Bacteroidales populations derived from several fecal source and fecally-impaired environmental samples, as an approach to identify populations specific to swine fecal sources that can also be detected within environmental waters impacted by swine fecal pollution. Studying the molecular diversity patterns of this fecal bacterial group highlighted some potential cross-amplification issues with currently available swine-specific assays and uncovered several new potential swine-specific targets.

This study demonstrated the importance of understanding the molecular diversity of microbial source tracking targets, so as to maximize the efficacy of future microbial source tracking assays.

4.3. Methods

4.3.0. Sample Collection.

Fecal, manure pit, waste lagoon, septic tank, wastewater, sediment, manure-applied soil, groundwater, and surface water samples were collected for processing. See Table S1 in the supplemental material for a detailed description of sample sources. Source material was selected to include as many different animal types as possible to check for host-specificity, with emphasis on hosts considered to be important sources of fecal pollution in the United States. DNA extraction methods for fecal source and environmental samples are described elsewhere (Lamendella et al., 2007; Lamendella et al., 2009).

4.3.1. PCR assays and limits of detection.

The general Bacteroidales 16S rRNA gene PCR assay (Bac32f/Bac708r) (Bernhard and Field, 2000) was used to amplify total Bacteroidales fecal anaerobic bacteria in water and fecal source samples. Reactions for the general Bacteroidales assay were conducted using the following conditions: 94 ºC for 5 min, followed by 30 cycles of 94 ºC for 20 s, 55 ºC for 20 s, and 72 ºC for 30 s, and a final extension step of 72 ºC for 5 min. Fecal and water sample DNA template concentrations used in the PCR assays were adjusted based on published detection limits (Bernhard and Field, 2000). Final PCR solutions (25 µl total volume) contained 2.5 µl of Takara Ex Taq 10X buffer (20 mM Mg2+), 2 µl of dNTP mixture (2.5 mM each), 1 µl of 25% acetamide, 17.5 µl of UltraPure water, 12.5 pmol each of the forward and reverse primers, and 0.625 units of Ex Taq DNA polymerase (TAKARA Mirus Bio., Madison, WI). Reactions were conducted on a DNA Engine 2 Tetrad thermal cycler (Bio-Rad Laboratories, Inc., Hercules, CA). Amplification products were visualized using 1% agarose gels and GelSTAR nucleic acid stain (Cambrex BioScience, East Rutherford, NJ).

4.3.2. Cloning and sequencing analyses.

General Bacteroidales (Bac32f/Bac708r) PCR products were used in cloning experiments to assess the molecular diversity of Bacteroidales species in different hosts. Sequencing was performed as previously described (Lamendella et al., 2007). Briefly, PCR products were purified using the QIAquick PCR Purification Kit according to the manufacturer’s instructions (QIAGEN, Valencia, CA). Representative PCR products derived from swine feces (feral and domesticated), manure pits, lagoons, and water adjacent to swine farms in Illinois and Ohio were cloned into the pCR4.1 TOPO vector as described by the manufacturer (Invitrogen, Carlsbad, CA). Individual E. coli clones were subcultured into 300 μL of Luria Broth containing 50 μg/mL ampicillin and screened for inserts using M13 PCR. Clones were submitted to the Children’s Hospital DNA Core Facility (Cincinnati, OH) for sequencing using Big Dye sequencing chemistry (Applied Biosystems, Foster City, CA), M13 forward and reverse primers, and an Applied Biosystems PRISM 3730XL DNA Analyzer. Sequences were manually verified and cleaned using Sequencher 4.7 software (Gene Codes, Ann Arbor, MI). Chimeric sequences detected using Bellerophon (Huber et al., 2004) were not included in further analyses. Non-chimeric sequences were submitted to Greengenes for alignment using the Nearest Alignment Space Termination (NAST) algorithm (DeSantis et al., 2006a; DeSantis et al., 2006b). Sequences were also submitted to BLAST homology search algorithms to assess sequence similarity to the Greengenes database (Altschul et al., 1990; DeSantis et al., 2006a). The distance matrix and phylogenetic tree were generated using ARB software (Ludwig et al., 2004). Trees were inferred from 650 sequence positions using neighbor-joining (with a Kimura correction) and maximum parsimony (using the Phylip DNAPARS tool) (Ludwig et al., 2004). To statistically evaluate branching confidence, bootstrap values were obtained from a consensus of 100 parsimonious trees using MEGA software (http://www.megasoftware.net). The 16S rRNA gene sequence of Werenella spp. (accession number AJ234059) was used as the outgroup, while cultured Bacteroidales species were included in the analyses as points of reference. Primers were designed using ARB software based on this total Bacteroidales phylogenetic tree. Representative sequences generated in this study have been deposited in the GenBank database under accession numbers NNNNN-NNNNN.
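The distance correction applied before neighbor-joining can be illustrated with a short sketch. The text above does not state which Kimura model was applied within ARB, so the example assumes the two-parameter (K2P) model; the function name and the toy sequences in the usage note are hypothetical.

```python
import math

PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}


def k2p_distance(seq_a, seq_b):
    """Kimura two-parameter distance between two aligned DNA sequences.

    Assumption: columns with a gap or ambiguity code in either sequence
    are skipped rather than counted as differences.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = transitions = transversions = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a not in "ACGT" or b not in "ACGT":
            continue  # skip gaps and ambiguity codes
        compared += 1
        if a == b:
            continue
        if {a, b} <= PURINES or {a, b} <= PYRIMIDINES:
            transitions += 1      # A<->G or C<->T
        else:
            transversions += 1    # purine <-> pyrimidine
    if compared == 0:
        raise ValueError("no comparable positions")
    if transitions == 0 and transversions == 0:
        return 0.0
    p = transitions / compared
    q = transversions / compared
    inner = (1 - 2 * p - q) * math.sqrt(1 - 2 * q)
    if inner <= 0:
        raise ValueError("sequences too divergent for the K2P correction")
    # d = -1/2 * ln((1 - 2P - Q) * sqrt(1 - 2Q))
    return -0.5 * math.log(inner)
```

For two 10-base toy sequences that differ by one transition and one transversion, `k2p_distance("AAAAAAAAAA", "GAAAAAAAAC")` returns a value larger than the uncorrected distance of 0.2, which is the purpose of the correction: it compensates for multiple substitutions at the same site.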

4.3.3. Molecular diversity estimates and primer design.

Molecular diversity analyses and assemblage comparison of clone libraries were performed using Mothur v1.6 software (Schloss et al., 2009). First, a distance matrix was calculated using uncorrected pair-wise distances between aligned sequences. Then, sequences were assigned to operational taxonomic units (OTUs) using the furthest-neighbor algorithm. Chao 1, Abundance-based Coverage Estimator (ACE), and Good's coverage were calculated for each Bacteroidales library at OTU0.03 distance. Sample rarefaction curves were calculated using resampling without replacement with 1,000 randomizations. A rectangular phylogram was generated to describe similarity between libraries. The clustering was performed using the UPGMA algorithm, with the distance between communities calculated using the Yue and Clayton theta, as described within the Mothur manual (http://www.mothur.org/wiki/Tree.shared). The Yue & Clayton measure of similarity between the structures of any two Bacteroidales assemblages (OTU distance=0.03) was also used to create a heat map of pair-wise similarities. The statistical significance of these pair-wise similarities was tested using the Cramer-von Mises statistic, as described within the Mothur manual (http://www.mothur.org/wiki/Libshuff). Heat maps of Bacteroidales populations (OTU0.03) from each fecal source and environmental library were created; the abundance of each OTU was log10 transformed and scaled to the largest log10 abundance value. Mothur was also used to retrieve sequences shared by multiple libraries at the OTU0.03 definition.
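The first two steps of this workflow (uncorrected pair-wise distances, then furthest-neighbor OTU assignment) can be sketched in a few lines. This is a minimal pure-Python illustration, not Mothur's implementation; the function names, the handling of gapped columns, and the toy sequences in the usage note are assumptions made for the example.

```python
from itertools import combinations


def p_distance(seq_a, seq_b):
    """Uncorrected distance: fraction of differing positions.

    Assumption: columns with a gap in either sequence are ignored.
    """
    pairs = [(a, b) for a, b in zip(seq_a, seq_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no overlapping positions")
    return sum(a != b for a, b in pairs) / len(pairs)


def furthest_neighbor_otus(seqs, cutoff=0.03):
    """Furthest-neighbor (complete-linkage) clustering of aligned sequences.

    Two clusters are merged only while the LARGEST distance between any of
    their members stays within the cutoff, so every pair of sequences in an
    OTU is at most `cutoff` apart.
    """
    names = list(seqs)
    dist = {frozenset((x, y)): p_distance(seqs[x], seqs[y])
            for x, y in combinations(names, 2)}
    clusters = [[n] for n in names]

    def linkage(c1, c2):
        return max(dist[frozenset((x, y))] for x in c1 for y in c2)

    while len(clusters) > 1:
        # Find the pair of clusters with the smallest complete-linkage distance.
        d, i, j = min((linkage(clusters[i], clusters[j]), i, j)
                      for i, j in combinations(range(len(clusters)), 2))
        if d > cutoff:
            break
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters
```

With three hypothetical 100-base sequences where `a` and `b` differ by 2%, `b` and `c` differ by 2%, but `a` and `c` differ by 4%, the 0.03 cutoff yields two OTUs (`[a, b]` and `[c]`). A nearest-neighbor rule would have chained all three together, which is why the furthest-neighbor choice gives more conservative OTUs.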

4.3.4. Preliminary Primer Evaluation.

Host-specificity testing was performed using a previously described library of fecal source DNA extracts (n=96) (Lamendella et al., 2009). PCR conditions for the swine-associated assays were as follows: 94 ºC for 5 min, followed by 35 cycles of 94 ºC for 20 s, 54-63 ºC for 20 s (see Table 4.5 for annealing temperatures), and 72 ºC for 30 s, and a final extension step of 72 ºC for 5 min. One ng of template DNA was added to each reaction, as detection limits for host-specific assays ranged from 10^-9 to 10^-12 ng of DNA. Environmental detection of each swine marker was evaluated using previously described water samples collected adjacent to swine farms in Illinois (Lamendella et al., 2009). In silico specificity tests were also conducted using more than 20,000 Bacteroidetes 16S rRNA sequences from this study and from the SSU ARB/Silva database, which served as the positional tree server.
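As an illustration of how a host-specificity value can be derived from presence/absence PCR results, the sketch below assumes the common definition of specificity as the fraction of non-target samples that give no amplification. The function name, the data layout, and the example results are hypothetical and are not taken from the 96-extract library.

```python
def host_specificity(results, target_host="pig"):
    """Fraction of non-target fecal samples with no amplification.

    results: iterable of (host, amplified) pairs, one per DNA extract.
    Assumed definition: true negatives / all non-target samples.
    """
    non_target = [amplified for host, amplified in results if host != target_host]
    if not non_target:
        raise ValueError("no non-target samples to evaluate")
    true_negatives = sum(1 for amplified in non_target if not amplified)
    return true_negatives / len(non_target)


# Hypothetical presence/absence calls for one assay.
example = [
    ("pig", True), ("pig", False),
    ("cattle", False), ("human", False), ("horse", True), ("dog", False),
]
print(host_specificity(example))
```

In this toy set one of the four non-target extracts amplifies, so the assay would be scored as 75% specific. Target-host samples do not enter the calculation; they would instead inform a separate sensitivity estimate.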


4.4. Results and Discussion

4.4.0. Diversity of Bacteroidales 16S rRNA gene sequences.

A total of 6,311 non-chimeric Bacteroidales sequences were retrieved from 14 different fecal sources and nine different fecally-contaminated watersheds (Tables 4.1 and 4.2). Detailed descriptions of fecal source samples and environmental samples can be found in Table S1.

When the diversity of each Bacteroidales library was assessed, Chao 1 and ACE indices revealed a high diversity in most fecal and environmental Bacteroidales assemblages (Table 4.3). For example, at the OTU0.03 distance, Chao 1 diversity indices for swine feces, cattle feces, pig manure pit, wastewater, and several fecally-impacted surface waters exceeded 300 OTUs (Table 4.3). Even sequencing several hundred Bacteroidales clones for many of the libraries did not saturate the diversity of this bacterial group. For example, 268, 351, 280, and 502 sequences were retrieved from pig feces, pig manure pits, cattle feces, and surface water (TN), respectively, yet these covered only 51-67% of Bacteroidales diversity. Additionally, rarefaction curves of the observed number of Bacteroidales OTUs from several fecal and environmental libraries did not appear to be approaching a horizontal asymptote, indicating that the current sequencing effort had not saturated diversity (Fig. 4.1). Altogether, these data suggest that even this deep sequencing effort did not comprehensively cover the diversity of this fecal bacterial group within some fecal source and environmental samples. If Bacteroidales is going to be used as a source tracking target, an even more comprehensive sequencing approach should be used to account for the total diversity of this fecal bacterial group.
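The relationship between observed OTUs, Chao 1, and Good's coverage can be made concrete with a short sketch. It uses the bias-corrected form of Chao 1 and the standard singleton-based definition of Good's coverage. The abundance vector is hypothetical: it was chosen only so that it agrees with the Cat Feces row of Table 4.3 (10 sequences, 6 observed OTUs), and it is not the actual clone data.

```python
def good_coverage(otu_counts):
    """Good's coverage: 1 - (singleton OTUs / total sequences)."""
    counts = [c for c in otu_counts if c > 0]
    n_sequences = sum(counts)
    singletons = sum(1 for c in counts if c == 1)
    return 1 - singletons / n_sequences


def chao1(otu_counts):
    """Bias-corrected Chao 1 richness: S_obs + f1*(f1 - 1) / (2*(f2 + 1))."""
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)
    f1 = sum(1 for c in counts if c == 1)  # OTUs seen exactly once
    f2 = sum(1 for c in counts if c == 2)  # OTUs seen exactly twice
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))


# Hypothetical library: 10 sequences in 6 OTUs (illustrative only).
toy_library = [4, 2, 1, 1, 1, 1]
print(chao1(toy_library), good_coverage(toy_library))
```

For this toy vector the estimators give a Chao 1 of 9 and a coverage of 0.60. Both are driven by the four singleton OTUs: the more OTUs that are seen only once, the more unseen OTUs the estimator infers and the lower the coverage, which is the pattern reported for the large pig feces and manure pit libraries.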

In contrast, some Bacteroidales libraries showed lower levels of diversity, namely river sediment, septic tank, pig lagoon, and pig manure-applied soil. For example, the Chao 1 estimates were 12, 85, 153, and 86 Bacteroidales OTUs from river sediment, septic tank, pig lagoon, and pig manure-applied soil, respectively (Table 4.3). This finding suggests that these environments are imposing selective pressures on fecal Bacteroidales populations, perhaps driving down diversity of this bacterial group within these environments.

Table 4.1. List of Bacteroidales 16S rRNA gene sequences from fecal source and environmental samples.

Source                      Number of Sequences
Seagull Feces               8
Cat Feces                   10
Cattle Feces (Beef)         280
Chicken Feces               12
Dog Feces                   11
Fish Gut                    49
Horse Feces                 32
Human                       34
Pig Feces                   268
Pig Lagoon                  564
Pig Manure Pit              351
Pig Manure-applied soil     83
River Sediment              78
Septic Tank                 63
Surface Water               3410
Groundwater                 134
Wastewater                  937


Table 4.2. Bacteroidales 16S rRNA gene sequences derived from fecally impacted water samples

Source                                            Number of Sequences   Potential Source of Contamination
Surface Water, Canada                             746                   Avian
Surface Water, Massachusetts                      570                   Human and Cattle
Surface Water, Nebraska                           162                   Cattle and Horse
Surface Water, New York                           539                   Unknown
Groundwater, Illinois                             134                   Swine
Surface Water, Ohio, Illinois, North Carolina     702                   Swine
Surface Water, Singapore                          184                   Multiple Sources Including Swine
Surface Water, Tennessee                          502                   Human, cattle, wildlife

When Bacteroidales OTUs from fecal source and environmental samples were clustered, relationships between Bacteroidales assemblages could be inferred. For example, Bacteroidales assemblages from pig feces, pig manure pits, and pig manure-applied soils clustered together, suggesting some level of similarity among Bacteroidales assemblage structure within these samples (Fig. 4.2). Interestingly, pig lagoon Bacteroidales did not cluster with the aforementioned group, but rather shared more similarity with surface water environments. This finding suggests that a different Bacteroidales population structure exists within swine lagoons as compared to manure pits or feces. This finding has important implications for source tracking assay development, as different swine waste management practices (i.e., manure pit vs. lagoon) result in different Bacteroidales assemblage structures. Thus, multiple markers may in fact be necessary to target these differentiated fecal source populations. Interestingly, Bacteroidales assemblage structure from swine-impacted waters clustered more closely with swine lagoon and other surface water environments, suggesting that swine fecal source Bacteroidales populations that make it into the environment may undergo another population shift. Understanding the differential survivability of fecal source-specific populations is imperative for microbial source tracking, especially when quantifying different fecal sources in environmental monitoring scenarios. Recent studies have indicated that human and bovine-specific Bacteroidales markers have differential survivability under various environmental conditions (Bell et al., 2009; Walters and Field, 2009). Thus, studying the molecular diversity of source-specific populations from feces, through waste management, and ultimately to transport into the environment is imperative for marker discovery and for estimating the relative abundance of these source-specific targets.


Table 4.3. Diversity of Bacteroidales 16S rRNA gene sequences derived from fecal source and environmental samples.

Sample                    Observed        Chao (95% CI)     ACE (95% CI)        Good's Coverage
                          OTUs (.03)                                            (0.03 distance)
Cat Feces                 6               9 (6-29)          14 (7-55)           0.6000
Cattle Feces (Beef)       155             357 (271-507)     486 (398-465)       0.6393
Chicken Feces             8               23 (11-76)        21 (10-85)          0.5000
Dog Feces                 7               9 (7-23)          11 (9-14)           0.6364
Fish Gut                  4               4 (0-6)           45 (4-8)            0.9796
Horse Feces               13              49 (23-139)       34 (18-108)         0.7273
Human Feces               19              65 (31-190)       91 (55-162)         0.6000
Pig Feces                 140             413 (294-625)     1072 (874-1325)     0.6082
Pig Lagoon                101             153 (126-210)     241 (198-303)       0.9184
Pig Manure Pit            207             851 (588-1294)    1799 (1521-2135)    0.5426
River Sediment            11              12 (11-22)        13 (11-26)          0.9615
Septic Tank               28              85 (45-219)       161 (109-245)       0.6984
Pig Manure-applied Soil   35              86 (52-185)       166 (117-244)       0.7089
Water, Canada             205             424 (335-571)     511 (439-607)       0.8458
Water, Massachusetts      140             385 (271-599)     769 (639-933)       0.8439
Water, New York           75              123 (96-185)      189 (151-247)       0.9314
Water, Pig                314             854 (675-1122)    1486 (1297-1713)    0.9314
Water, Singapore          118             354 (245-555)     387 (269-598)       0.5109
Water, Tennessee          242             652 (501-890)     1118 (960-1310)     0.6713
Wastewater                248             538 (432-705)     977 (850-1131)      0.8388


[Figure 4.1: rarefaction plot. x-axis: Number of Bacteroidales Sequences (0-1000); y-axis: Observed OTUs (.03 distance) (0-350). Curves: Wastewater; Water, Singapore; Water, Pig-impacted; Water, New York; Water, Massachusetts; Water, Canada; Pig manure pit; Pig Lagoon; Pig, Fecal; Cattle, fecal.]

Fig. 4.1. Bacteroidales rarefaction curves for well-sampled (n>200 sequences) fecal and environmental libraries. Dotted lines indicate 95% confidence intervals.
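Rarefaction curves of this kind can be approximated by repeatedly subsampling a library without replacement and counting the OTUs recovered at each depth. The sketch below is a minimal illustration of that resampling idea, not the Mothur routine; the function name, the fixed random seed, and the toy library are assumptions.

```python
import random


def rarefaction_curve(otu_labels, depths, n_iter=1000, seed=0):
    """Mean number of OTUs observed at each subsampling depth.

    otu_labels: list with one OTU label per sequence in the library.
    depths: subsample sizes, each no larger than len(otu_labels).
    Sequences are drawn WITHOUT replacement, n_iter times per depth.
    """
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    curve = []
    for depth in depths:
        total = 0
        for _ in range(n_iter):
            total += len(set(rng.sample(otu_labels, depth)))
        curve.append(total / n_iter)
    return curve


# Hypothetical library: 10 sequences assigned to 6 OTUs.
toy_labels = ["otu1"] * 4 + ["otu2"] * 2 + ["otu3", "otu4", "otu5", "otu6"]
curve = rarefaction_curve(toy_labels, depths=[1, 5, 10], n_iter=200)
```

A curve that is still climbing at the full library size, as in most of the libraries plotted above, indicates that additional sequencing would continue to recover new OTUs.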


Fig. 4.2. Rectangular phylogram generated with the UPGMA algorithm from the distances between communities, as calculated by the Yue & Clayton measure of similarity at 0.03 OTU distance.


Table 4.4. Cramer von Mises Statistic to compare pair-wise Bacteroidales assemblage structure.

Comparison                                   Score         Significance
Pig Lagoon vs. Pig Feces                     0.02085069    <0.001
Pig Feces vs. Pig Manure Pit                 0.04897388    <0.001
Pig Lagoon vs. Pig Manure Pit                0.03995258    <0.001
Water, swine-impacted vs. Pig Feces          0.00286472    0.013
Water, swine-impacted vs. Pig Lagoon         0.03678591    <0.001
Water, swine-impacted vs. Pig Manure Pit     0.0459043     <0.001
Horse Feces vs. Pig Feces                    0.00282966    0.889
Pig Feces vs. Horse Feces                    0.01244319    0.09
Cattle Feces vs. Pig Feces                   0.00137051    0.91
Pig Feces vs. Cattle Feces                   0.01121759    0.065
Cattle Feces vs. Pig Lagoon                  0.00206563    0.947
Pig Lagoon vs. Cattle Feces                  0.00487155    0.652


Fig. 4.3. Yue & Clayton measure of similarity between the structures of any two Bacteroidales assemblages (OTU distance=0.03). The color key indicates the level of similarity shared between two assemblages.


Pair-wise comparisons of Bacteroidales assemblage structure revealed several patterns. For example, Bacteroidales assemblages derived from water, wastewater, pig lagoon, and septic tank shared a high level of similarity to one another, as denoted by the brighter red squares (Fig. 4.3). These data suggest that some Bacteroidales populations may be shared within different environmental waters. This analysis also unveiled high levels of similarity between pig manure pits/pig feces and waters impacted by swine feces, suggesting that pig source-specific populations exist that can also be identified within pig-impacted waters. It should be noted that swine lagoon Bacteroidales assemblages also share some similarity with swine-impacted waters; however, swine lagoons are even more similar to wastewater and water impacted by other sources of fecal contamination. These findings suggest that care is needed when choosing swine lagoon-specific source tracking targets. Additionally, when hypothesis testing was used to compare any two Bacteroidales assemblages, pig feces, pig manure pit, pig lagoon, and pig manure-applied soil all had statistically significantly different (p<0.025) Bacteroidales assemblage structures. This finding suggests that these different swine fecal sources contain significantly different Bacteroidales population structures, providing further evidence for the need for multiple marker approaches to cover these differentiated host-specific populations. This hypothesis testing also showed that Bacteroidales community structures in swine and cattle feces, swine and horse feces, and swine lagoon and cattle feces were not statistically significantly different. This finding suggests that these environments may harbor similar Bacteroidales populations, which needs to be taken into account when designing host-specific assays. The cosmopolitan nature of some Bacteroidales populations has been previously noted (Jeter et al., 2009; Lamendella et al., 2009), and finding truly host-specific Bacteroidales populations may prove to be difficult given our poor understanding of the diversity and host-distribution of this bacterial group. Additionally, the limited phylogenetic resolution of the 16S rRNA gene may further complicate targeting these smaller clusters of host-specific populations.
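The Yue & Clayton theta underlying these pair-wise comparisons can be written compactly from OTU relative abundances: the sum of the products of the two libraries' relative abundances, divided by the sum of squared abundances in each library minus that same product sum. The sketch below is an illustrative implementation of that formula; the function name and the toy OTU counts are hypothetical.

```python
def yue_clayton_theta(counts_a, counts_b):
    """Yue & Clayton similarity between two communities.

    counts_a, counts_b: dicts mapping OTU id -> number of sequences.
    theta = sum(a_i * b_i) / (sum(a_i^2) + sum(b_i^2) - sum(a_i * b_i)),
    where a_i and b_i are relative abundances of OTU i in each library.
    """
    total_a = sum(counts_a.values())
    total_b = sum(counts_b.values())
    sum_ab = sum_a2 = sum_b2 = 0.0
    for otu in set(counts_a) | set(counts_b):
        a = counts_a.get(otu, 0) / total_a
        b = counts_b.get(otu, 0) / total_b
        sum_ab += a * b
        sum_a2 += a * a
        sum_b2 += b * b
    return sum_ab / (sum_a2 + sum_b2 - sum_ab)


# Hypothetical libraries sharing one of their two OTUs.
lib_1 = {"otu_x": 1, "otu_y": 1}
lib_2 = {"otu_x": 1, "otu_z": 1}
print(yue_clayton_theta(lib_1, lib_2))
```

Theta is 1 for libraries with identical structure and 0 for libraries with no shared OTUs; the two toy libraries above, which share one evenly abundant OTU, score one third. For clustering, the corresponding distance is 1 minus theta. Because the measure weights OTUs by relative abundance, two libraries can share many rare OTUs and still score low if their dominant OTUs differ.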

In order to better understand the distribution of Bacteroidales populations within the different fecal source and environmental samples, a heat map of OTU presence and abundance was generated. This method helped elucidate Bacteroidales diversity patterns within fecal and environmental matrices. For example, this analysis unveiled which OTUs were shared by multiple hosts (i.e., cattle and pig). Surprisingly, several OTUs were shared between swine fecal sources and cattle feces. Additionally, many populations were shared between municipal wastewater and swine lagoons. While many populations appeared to have shared distributions within several fecal and environmental samples, this analysis also led to the discovery of swine-specific populations that could also be identified in environmental samples known to be contaminated with swine feces. Another interesting finding was that in most cases more than half of the Bacteroidales OTUs were completely unique to a given library. This finding suggests that sample representation may have a large impact on observed population structure and that further sequencing studies from an even more diverse array of fecal and environmental samples are necessary for a complete understanding of Bacteroidales diversity.
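The heat map values are described in the Methods as log10-transformed OTU abundances scaled to the largest log10 value. A minimal sketch of that transformation follows. How absent OTUs (zero counts) were handled is not stated, so the `+ 1` pseudocount used here is an assumption, as are the function name and the toy counts.

```python
import math


def scaled_log_abundance(table):
    """Log10-transform OTU counts and scale to the largest value.

    table: dict mapping library name -> dict of OTU id -> count.
    ASSUMPTION: log10(count + 1) is used so that absent OTUs map to 0.
    """
    logged = {lib: {otu: math.log10(count + 1) for otu, count in otus.items()}
              for lib, otus in table.items()}
    largest = max(v for otus in logged.values() for v in otus.values())
    if largest == 0:
        return logged  # every count is zero; nothing to scale
    return {lib: {otu: v / largest for otu, v in otus.items()}
            for lib, otus in logged.items()}


# Hypothetical counts for two OTUs in two libraries.
toy_table = {
    "pig_feces": {"otu1": 99, "otu2": 0},
    "pig_lagoon": {"otu1": 9, "otu2": 0},
}
heat = scaled_log_abundance(toy_table)
```

In the toy table, the most abundant OTU maps to 1.0, an OTU with one tenth of that abundance maps to 0.5, and absent OTUs stay at 0, so rare but present OTUs remain visible next to dominant ones on the color scale.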

Fig. 4.4. Heat map of Bacteroidales OTU0.03 presence and relative abundance. The color key indicates the level of log10-normalized OTU abundance.

Several swine-specific fecal markers have recently been published that target the 16S rRNA gene of Bacteroidales populations (Dick et al., 2005; Okabe et al., 2007; Ufnar et al., 2007; Mieszkin et al., 2009). However, the host-specificity and host-distribution of these targets have not been adequately assessed, and the utility of these assays in identifying swine fecal contamination on a broad geographic scale is largely unknown. A few recent studies have assessed the utility of these markers in environmental monitoring scenarios, with generally poor results (Table 4.5). For example, while 16S rDNA-based assays targeting swine Bacteroidales populations exhibit moderate levels of host-specificity (i.e., 70-95% in animal fecal samples), their occurrence in environmental samples downstream of suspected swine inputs was poor (Gourmelon et al., 2007; Lamendella et al., 2009). The authors recommended the use of different animal-specific Bacteroidales primer sets to increase confidence in the identification of animal fecal pollution. In another study in Japan, pig-specific Bacteroidales quantitative PCR assays amplified DNA extracted from cattle feces (Okabe et al., 2007).


Obviously, this “cross-amplification” can confound quantification of fecal loads from various sources, limiting resolution of the relative contribution of fecal sources. While these environmental studies are important in evaluating the utility of "swine-specific" markers, they offer no solution for improving the utility of these markers. Additionally, it should be noted that all of the currently available markers have been designed based on very small sequence libraries derived from swine fecal sources. In order to evaluate currently available swine-specific markers, in silico searches were performed using this large library of Bacteroidales sequences.
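The idea behind an in silico primer search can be sketched as a scan of each library sequence for a site that matches the primer within a chosen number of mismatches. The example below is a simplified stand-in for the ARB-based searches used in this study, not a reproduction of them. The function names, the mismatch threshold, and the three toy "clone" sequences are hypothetical; the primer sequences are taken from Table 4.5.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Reverse complement of an unambiguous DNA string."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def count_mismatches(a, b):
    return sum(x != y for x, y in zip(a, b))


def primer_hits(primer, sequences, max_mismatches=0, is_reverse=False):
    """Names of sequences containing a site that matches the primer.

    sequences: dict mapping sequence name -> DNA string (sense strand).
    Reverse primers are reverse-complemented before searching.
    """
    probe = reverse_complement(primer) if is_reverse else primer.upper()
    hits = []
    for name, seq in sequences.items():
        seq = seq.upper()
        for start in range(len(seq) - len(probe) + 1):
            window = seq[start:start + len(probe)]
            if count_mismatches(probe, window) <= max_mismatches:
                hits.append(name)
                break  # one matching site is enough to call a hit
    return hits


# Hypothetical library; only the primer (PPGA1F, Table 4.5) is from the text.
PPGA1F = "GGTTTCCGGTAGACGATG"
toy_library = {
    "lagoon_clone": "ACGT" + "GGTTTCCGGTAGACGATG" + "TTAA",  # exact site
    "cattle_clone": "ACGT" + "GGTTTCCGGTAGACGTTG" + "TTAA",  # one mismatch
    "human_clone": "A" * 30,                                  # no site
}
print(primer_hits(PPGA1F, toy_library))
print(primer_hits(PPGA1F, toy_library, max_mismatches=1))
```

With zero mismatches allowed only the lagoon clone is hit, while allowing a single mismatch also picks up the cattle clone. This illustrates how a marker that appears host-specific under stringent matching can cross-amplify when annealing conditions tolerate near-matches.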

Performing in silico searches for currently available swine-targeted markers provided some evidence for their lack of host-specificity and environmental detection (Table S5). For example, the Bac1f/Bac1r primer set hit several non-target fecal sources, including human, cattle, and wildlife sequences (Table S5). Additionally, while the Bac2f/Bac2R marker showed high host-specificity and matched only one human fecal-derived sequence, it did not hit any sequences derived from swine fecally-impacted waters, which might explain its lack of detection within environmental samples. The PF163 marker hybridized to a few non-swine fecal sources but also matched some sequences derived from swine-contaminated water, which may explain why this marker worked the best within environmental monitoring scenarios (Lamendella et al., 2009). However, it should be noted that this marker did show both in silico and experimental host-specificity problems. In addition, the Mothur software allows the user to retrieve sequences from OTUs shared by any number of fecal or environmental libraries. This tool revealed sequences shared by different hosts, which was useful when designing swine-source specific markers. Interestingly, some of the non-target sequences that the Bac1, PF163, and Bac2 markers hit fall within OTUs shared by multiple different hosts, suggesting that these markers target cosmopolitan Bacteroidales populations. Additionally, sequences unique to swine fecal sources and environments known to be impacted by swine feces, but not shared by non-swine sources, could be retrieved. The identity of these sequences can be found in Table S4. Studying the diversity patterns of Bacteroidales populations provided some evidence as to why the currently available markers are performing poorly in watershed-based studies. In addition, studying diversity patterns and phylogeny (Fig. S4 and S5) helped uncover additional targets that may offer more swine fecal source-specificity and environmental detection (Table 4.5).

Table 4.5. New Swine-specific Bacteroidales 16S rRNA gene-targeted marker performance.

Primer       Sequence 5'-->3'     Annealing  Expected       Manure  Pig     Water, pig-  Pig    Water,     Manure  Non-      Specificity  Environmental
Name                              Temp (ºC)  fragment size  Pit     Lagoon  impacted     Feces  Singapore  Soil    specific               Detection
Pig40R       ACACGCTTAACCCACGGC   58         571            4       8       0            7      0          0       0         0.65         na
PigGenR      GCCTACCTCCCGCGCACT   62         624            35      36      14           14     6          0       1         0.81         na
NSPig200F    CCACGGGATAGCCCGTCG   62         563            74      47      5            60     6          1       2         0.77         na
NSPig200R    TTCCGCCTACCTCCCGCG   62         652            63      36      34           33     8          2       1         0.6          na
PPGA1F†      GGTTTCCGGTAGACGATG   56         516            0       25      1            0      0          0       0         0.99         0.25
Pig160R      CGGGGCAGCATGAATTCA   56         669            1       29      1            0      0          0       0         0.7          na
Pigsub1F†    CTTGCTGAGTTTGATGGC   54         650            0       55      0            0      0          0       0         0.98         0.375
Pig60CAF†    CTCATGGATAGCCTTCTG   54         589            0       11      14           0      0          0       0         1            0.2
PigPond28F†  GAAGCTTGCCTCCGATAG   56         653            0       10      0            0      0          0       0         0.98         0.525
MAPF†        CCCTGCGACTCGGGGATA   60         609            4       0       0            0      0          11      0         0.99         0.325
PigPond17F†  ATCTGCCCATCGCCCGGG   62         601            0       11      1            0      0          0       0         1            0.525
Pig65F       GGTACCATGGTCAAAGCT   54         671            9       0       3            0      0          19      0         0.63         na
PiglagoonF   ACACGCTTATTCGGCG     50         558            0       11      0            0      0          0       0         0.82         na
PigtotalF    CCACGGGATAGCCCGTCG   62         589            66      40      6            55     6          1       1         0.85         na
PigtotalR    TCAGGCCTTGCGCCCATT   58         347            21      0       13           0      0          32      0         0.58         na
Pigtotal2F   GGATAGCCCGTCGAAAGG   58         584            65      40      6            54     8          1       1         0.87         na

* All forward markers were combined with the general Bacteroidales Bac708 reverse primer, while all reverse primers listed were combined with the general Bacteroidales Bac32 forward primer (Bernhard and Field, 2000). Primers marked with † displayed greater than 98% specificity and were tested within environmental matrices to further evaluate environmental detection.

Initially, 17 markers were designed based on phylogenetic (Fig. S4 and S5) and diversity patterns of Bacteroidales populations. In silico and experimental testing revealed that six of these markers show high levels of swine fecal source specificity (>98%) and environmental detection (Table 4.5). Interestingly, markers designed to target swine-specific lagoon sequences showed higher levels of detection within water samples known to be impacted by swine fecal pollution. Additionally, it should be noted that some of the swine-specific fecal source tracking markers show differential detection within the same water sample. For example, the PigPond17F assay was detected in six water samples in which no other swine-specific marker was detected. When all six markers were used, a positive signal was found for at least one marker in 35 of 40 water samples collected adjacent to swine farms. While these markers need further validation, these findings suggest that a multiple swine-specific marker approach may be needed to confirm the source of fecal pollution in watershed-based studies.
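The gain from combining markers can be expressed as a simple tally over a presence/absence matrix: the detection rate of each marker alone, and the fraction of samples positive for at least one marker. In the results above the combined figure is 35/40 = 87.5%, which rounds to the 88% quoted in the chapter abstract. The sketch below shows the calculation; the function name, the data layout, and the four toy water samples are hypothetical.

```python
def detection_summary(results):
    """Per-marker and combined detection rates.

    results: dict mapping sample name -> dict of marker name -> bool
             (True if the marker amplified in that sample).
    Returns (per_marker_rates, fraction_with_at_least_one_positive).
    """
    n_samples = len(results)
    markers = sorted({m for calls in results.values() for m in calls})
    per_marker = {
        m: sum(bool(calls.get(m, False)) for calls in results.values()) / n_samples
        for m in markers
    }
    any_marker = sum(any(calls.values()) for calls in results.values()) / n_samples
    return per_marker, any_marker


# Hypothetical presence/absence calls for two markers in four water samples.
toy_results = {
    "water_1": {"PigPond17F": True, "PPGA1F": False},
    "water_2": {"PigPond17F": False, "PPGA1F": True},
    "water_3": {"PigPond17F": False, "PPGA1F": False},
    "water_4": {"PigPond17F": True, "PPGA1F": True},
}
per_marker, combined = detection_summary(toy_results)
```

In the toy data each marker alone detects half of the samples, while the pair together detects three quarters, because the two markers are positive in partly different samples. That is the same effect as the differential detection described above.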


4.5. Conclusions

This study evaluated the molecular diversity of Bacteroidales populations within fecal source and environmental matrices. Results from this study have highlighted the importance of understanding the distribution and occurrence of fecal source tracking targets within feces, waste management processes, and environmental waters. This deeper sequencing effort revealed the identity of shared and swine-specific Bacteroidales populations, which was useful for swine fecal source-specific assay development. Moreover, this study revealed a high diversity of Bacteroidales populations within fecal and environmental matrices, and revealed several distinct swine-specific populations, suggesting that multiple targets are necessary for accurately assessing swine fecal pollution in watershed-based applications. Future studies should focus deeper sequencing efforts on the molecular diversity of more geographically diverse samples, which will lead to a more comprehensive understanding of the distribution of Bacteroidales populations and their utility in fecal source tracking studies.


CHAPTER 5

Comparative Fecal Metagenomics Unveils Unique Functional Capacity of the Swine Gut

5.1. Abstract

Metagenomic approaches are providing rapid and more robust means to investigate the composition and functional genetic potential of complex microbial communities. In this study, we utilized a metagenomic approach to further understand the phylogenetic and functional diversity of the swine gut. In order to better understand the relationship between the swine gut microbiome and other endobiotic environments, we also performed a large-scale comparative metagenomic analysis of phylogenetic and functional composition. Total DNA extracts from eight Yorkshire pig fecal samples were pooled and used as a template for both a GS20 and an FLX sequencing run using 454 pyrosequencing technology. More than 130 megabases of sequence data were retrieved from this sequencing effort. Analysis of the species and functional composition of swine fecal metagenomic fragments was performed using the MG-RAST and JGI IMG/M-ER annotation pipelines with the SEED Subsystem, COG, Pfam, and KEGG databases. Species composition results showed that both swine fecal microbiomes were, not surprisingly, dominated by the Firmicutes and Bacteroidetes phyla. At a finer phylogenetic resolution, the Prevotella genus dominated the swine fecal metagenome, and Treponema and Anaerovibrio were exclusively found within the pig fecal metagenomes. Functional analysis revealed that carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of both swine fecal metagenomes. Genes associated with cell wall and capsule, stress, and virulence were also very abundant in both metagenomes. When analyzing proteins involved in the cell wall and capsule subsystem, comparative metagenomics revealed several glycosyl transferases and carbohydrate uptake systems. Pfams and COGs related to several virulence factors were numerous within the gene families unique to the swine fecal metagenomes. Several proteins involved in carbohydrate transport and attachment were both abundant and unique to the distal swine gut, with more than 50 metagenomic fragments sharing high sequence homology to putative carbohydrate membrane transporters. The present study demonstrates that pyrosequencing is a useful approach for comparing the gene content of complex gut microbiomes. This study suggests that the variable microbiome content of a given host species comprises a plasticity reflective of host ecology.

5.2. Introduction

The gastrointestinal tract of animals harbors a complex microbial network, and its composition reflects the constant co-evolution of these microorganisms with their host environment. Uncovering the composition and functions associated with gut microbial consortia is of great importance to food and water safety and to animal physiology and health. As a result of issues that relate to food safety and animal nutrition and health, the structure and function of the gut microbial community have received significant attention from researchers. The application of molecular tools to microbiology has sparked the use of the 16S rRNA gene as a phylogenetic marker, and has shed light on the diversity and composition of microbial communities within several animal gut systems. For example, a recent study comparing the 16S rRNA genes from multiple mammalian feces revealed that host diet and host phylogeny influence bacterial diversity, with herbivores harboring the most diversity of all mammals (Ley et al., 2008).

While the cloning and sequencing of the 16S rRNA gene has revealed impressive microbial diversity within distal gut environments, this approach offers only limited information on the physiological role of microbial consortia within a given gut environment. More recently, the use of metagenomic approaches has allowed scientists to reveal significant differences in metabolic potential within different environments. Metagenomics allows for the examination of DNA from all of the gene content present in a given sample, allowing scientists to begin to view species and functional composition through a holistic and unbiased lens. Pyrosequencing has generated more than 189 megabases of sequencing data, representing 234,000 genes from 17 different endobiotic microbiomes. Studying gut metagenomes has helped uncover several important biological characteristics of these microbiomes. For example, when 13 human gut metagenomes were compared, Kurokawa et al. (2007) found that adult-type and infant-type gut microbiomes have enriched gene families sharing little overlap, suggesting different core functions within the adult and infant gut microbiota. This study also revealed that several hundred gene families are found exclusively in the adult human gut, suggesting that various strategies are employed by each type of microbiota to adapt to its intestinal environment (Kurokawa et al., 2007). Other gut microbiome studies also support these significant differences in core and variable gene content among different animal hosts (Ley et al., 2006; Qu et al., 2008; Warnecke et al., 2007; Brulc et al., 2009). The gastrointestinal tract of animals harbors a large, complex, and dynamic microbial community, and its composition has been difficult to predict, since individual hosts may in fact be too different from one another to be considered replicate habitats (Sloan et al., 2006). This finding suggests that gut ecology may not follow the rules of traditional ecology, and that genetic and/or nutritional differences between individuals may be more important in shaping individual communities (Ley et al., 2008). Thus, comparing the gene content of multiple endobiotic microbiomes can help elucidate the ecological underpinnings of gut systems.

While 16S rRNA gene sequencing, FISH, and T-RFLP approaches have been utilized to describe the microbial diversity harbored within the pig distal gut (Leser et al., 2002; Castillo et al., 2007), the total functional gene content of this gut microbiome has not been studied. Gut metagenomes have been described from humans, cows, chickens, termites, and mice, providing a premise for comparative functional endobiotic metagenomics. Using a comparative metagenomics approach, we can determine potential phylogenetic and functional differences harbored within the swine gut environment. Studying the functional capacity of the swine gut can shed light on swine health, nutrition, food and water safety, prevalence of antibiotic resistance, and the overall ecology of the swine gut.

In this study we applied the first random-sample pyrosequencing approach to the complex microbiome of the distal pig gut. The overall goal of this study was to characterize the swine fecal microbiome with respect to species composition and functional content. In order to better understand the relationship between the swine gut microbiome and other endobiotic environments, we performed a large-scale comparative metagenomic analysis of phylogenetic and functional composition. The present study demonstrates that pyrosequencing is a useful approach for comparing the gene content of complex gut microbiomes. Additionally, although sequencing coverage was low and variable for each metagenome, a comparative metagenomic approach was useful in identifying unique and/or overabundant taxonomic and functional elements within swine distal gut microbiomes. It also appears that the genes associated with the variable portion of endobiotic microbiomes cluster by host environment with surprising hierarchical trends. This study suggests that the variable microbiome content of a given host species comprises a plasticity reflective of host ecology.

5.3. Materials and Methods

5.3.0. Fecal Sample Collection.

Fecal samples were collected in April 2006 from a large swine operation located in Loudonville, Ohio, housing more than 1,000 head of swine. Fecal samples were transported to the laboratory on ice and stored at -20°C until further processing. DNA extractions were performed on fecal samples collected from eight six-month-old Yorkshire pigs. Fecal DNA was extracted with the FastDNA SPIN Kit (MP Biomedicals, Inc., Solon, OH) according to the manufacturer’s instructions using 250 µl of each fecal sample. Total DNA was then quantified using a NanoDrop® ND-1000 UV spectrophotometer (NanoDrop Technologies, Wilmington, DE).

5.3.1. Pyrosequencing and Sequence Analysis.

Three µg of each fecal DNA extract (n=8) were pooled and sent for sequencing to 454 Life Sciences, where one run was performed using the first-generation Genome Sequencer GS 20 instrument and a second run using the Genome Sequencer FLX instrument. The sequences were compared using the BLASTX algorithm with an expectation value (E-value) cutoff of 1×10−5 (Altschul et al., 1990). The BLASTN algorithm (E<1×10−5 and a hit length >50 nucleotides) was used to identify small subunit rDNA genes from the RDP (Cole et al., 2007), Silva SSU (Pruesse et al., 2007), and Greengenes databases (DeSantis et al., 2006a). Additionally, both metagenomes were submitted to the Joint Genome Institute's IMG/M-ER annotation pipeline (Markowitz et al., 2009) using the proxygene method (Dalevi et al., 2009) for gene annotation. The metagenomes used in this paper are freely available from the SEED, JGI's IMG/M, and the NCBI Short Read Archive, and are also being made accessible through CAMERA. The NCBI genome project ID and GOLD ID for the swine fecal sequences generated in this project are 39267 and Gm00197, respectively.
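The SSU rDNA screening thresholds described above (E<1×10−5 and hit length >50 nucleotides) can be illustrated with a minimal Python sketch. It assumes the BLASTN results are available as 12-column tab-separated text (query, subject, percent identity, alignment length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score); that layout, the function name `filter_ssu_hits`, the example rows, and the choice to keep only the highest-scoring hit per read are all illustrative assumptions rather than details reported for this study.

```python
import csv
import io

E_VALUE_CUTOFF = 1e-5   # E-value threshold stated in the text
MIN_HIT_LENGTH = 50     # hits must be longer than this many nucleotides

def filter_ssu_hits(tabular_text):
    """Return {read_id: (subject, evalue, bitscore)} for hits passing both cutoffs.

    Assumes 12-column tab-separated BLAST output. Keeping only the
    best-scoring hit per read is an assumption made for this sketch.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank and comment lines
        read_id, subject = row[0], row[1]
        hit_length = int(row[3])
        evalue = float(row[10])
        bitscore = float(row[11])
        if evalue >= E_VALUE_CUTOFF or hit_length <= MIN_HIT_LENGTH:
            continue  # fails the E-value or the length criterion
        if read_id not in best or bitscore > best[read_id][2]:
            best[read_id] = (subject, evalue, bitscore)
    return best

# Hypothetical rows: read1 has two passing hits, read2 is too short,
# read3 has an E-value above the cutoff.
example = "\n".join([
    "\t".join(["read1", "ssuA", "98.0", "120", "2", "0", "1", "120", "5", "124", "1e-40", "200"]),
    "\t".join(["read1", "ssuB", "95.0", "110", "5", "0", "1", "110", "9", "118", "1e-30", "150"]),
    "\t".join(["read2", "ssuC", "99.0", "40", "0", "0", "1", "40", "1", "40", "1e-12", "75"]),
    "\t".join(["read3", "ssuD", "80.0", "90", "18", "0", "1", "90", "3", "92", "0.01", "30"]),
])
hits = filter_ssu_hits(example)
print(hits)
```

Only `read1` survives in this toy input, assigned to its higher-scoring subject.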

5.3.2. Diversity Indices.

Observed richness and the Chao1, abundance-based coverage (ACE), jackknife, and bootstrap estimators were used to evaluate community richness, while community diversity was described using the classical Shannon, non-parametric Shannon, and Simpson indices. Sampling coverage was calculated using Good's coverage for the given operational taxonomic unit (OTU) definition, while the Boneh estimate was used to calculate the number of additional OTUs that would be observed for an additional 500 SSU reads. The aforementioned SSU rDNA diversity indices were calculated using the Mothur v 1.5.0 program (Schloss et al., 2009), and calculations for each index can be found in the Mothur manual (http://www.mothur.org/wiki/Mothur_manual). Functional diversity was assessed using subsystem, COG, and Pfam abundances from all available endobiotic metagenomes. The diversity estimators used included Shannon-Wiener, Simpson's lambda, and Pielou's evenness analyses for measuring species richness and evenness. Functional diversity estimates, K-dominance plots, Principal Components Analysis, and clustering were performed using the PRIMER v6 ecological software package.
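To make several of the indices named above concrete, the following Python sketch computes observed richness, a bias-corrected Chao1 estimate, Good's coverage, the Shannon index, and a Simpson dominance index from a list of OTU read counts. The formulas are the standard textbook forms; whether they match the Mothur v 1.5.0 calculators term for term is an assumption, and the function name `diversity_summary` and the toy counts are purely illustrative.

```python
import math
from collections import Counter

def diversity_summary(otu_counts):
    """Richness, coverage, and diversity indices from per-OTU read counts."""
    counts = [c for c in otu_counts if c > 0]
    n = sum(counts)                    # total reads
    s_obs = len(counts)                # observed OTUs
    freq = Counter(counts)
    f1, f2 = freq[1], freq[2]          # singletons and doubletons
    # Bias-corrected Chao1 richness estimate
    chao1 = s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))
    # Good's coverage: share of reads that do not belong to singleton OTUs
    coverage = 1 - f1 / n
    # Shannon index with natural logarithm
    shannon = -sum((c / n) * math.log(c / n) for c in counts)
    # Simpson dominance (higher values mean a few OTUs dominate)
    simpson = sum(c * (c - 1) for c in counts) / (n * (n - 1))
    return {"sobs": s_obs, "chao1": chao1, "coverage": coverage,
            "shannon": shannon, "simpson": simpson}

# Hypothetical OTU table: 7 OTUs, 22 reads
summary = diversity_summary([10, 5, 2, 2, 1, 1, 1])
print(summary)
```

With three singletons and two doubletons, Chao1 adds one unseen OTU to the seven observed, and coverage is 19/22.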


5.3.3. Statistical Analyses.

To compare the distribution of taxonomic and functional groups between the two metagenomes, a non-parametric Wilcoxon exact test was used, as a non-parametric statistic makes minimal assumptions about the underlying distribution. The test takes into account the magnitude of the differences between two paired variables to identify whether significant differences exist. Prior to statistical analysis, the data were normalized for sequencing coverage by calculating the percent distribution. To expose functions overabundant in or unique to a given metagenomic dataset, we performed two-way hierarchical clustering of normalized COG and Pfam abundances using the Bioinformatics Toolbox with Matlab version 2009a. Additionally, to evaluate whether these unique or overabundant functions were statistically meaningful, we employed the binomial test within the ShotgunFunctionalizeR program (Kristiansson et al., 2009a; Kristiansson et al., 2009b).
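The normalization and paired comparison described above can be sketched in Python as follows. The sketch converts raw counts to percent distributions and then computes the Wilcoxon signed-rank statistic with a normal-approximation Z-score. Note that this is the large-sample approximation, not the exact test named in the text, and it omits tie and continuity corrections; the function names and the toy counts are assumptions made for illustration.

```python
import math

def to_percent(counts):
    """Normalize raw counts to a percent distribution."""
    total = sum(counts)
    return [100.0 * c / total for c in counts]

def signed_rank_z(x, y):
    """Wilcoxon signed-rank W+ and its normal-approximation Z-score.

    Zero differences are dropped and tied absolute differences share
    their average rank. No tie or continuity correction is applied.
    """
    diffs = [a - b for a, b in zip(x, y) if a != b]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1      # mean of ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    return w_plus, (w_plus - mean) / sd

# Hypothetical read counts for four taxa in two metagenomes
a = to_percent([60, 25, 10, 5])
b = to_percent([40, 30, 20, 10])
w_plus, z = signed_rank_z(a, b)
print(w_plus, z)
```

Because both inputs are percent distributions, sequencing depth no longer affects the paired differences, which is the purpose of the normalization step.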

5.4. Results

In order to better understand the functional gene content and metabolic potential of the swine fecal microbial community, we utilized a pyrosequencing approach. The overall goal of this study was to characterize the phylogenetic and functional content of the swine fecal microbiome and to compare this distal gut environment to other currently available endobiotic metagenomes, as a method for revealing potential functional and phylogenetic differences harbored within the swine gut. Approximately 130 megabases of sequence data were retrieved from the swine fecal metagenome, making this study the largest endobiotic metagenome sequencing effort to date (Table 5.1).


16S rDNA reads from the GS20 and FLX swine fecal metagenomes showed similar taxonomic distributions when compared against the Ribosomal Database Project and Silva SSU databases. Both swine fecal microbiomes were dominated by the Firmicutes and Bacteroidetes phyla (Fig. 5.1), as has been previously shown in several molecular phylogenetic studies of mammalian gut environments, and consistent with the molecular phylogenetic study of the swine gut (Leser et al., 2002). Archaeal sequences constituted less than 1% of total SSU sequences retrieved in either swine metagenome (Fig. S13), and were dominated by the Methanomicrobia and Thermococci, which is consistent with previous molecular diversity studies of pig manure (Snell-Castro et al., 2005). While these populations are only a very small fraction of the total microflora (Sorlini et al., 1998; Lamendella et al., 2009), archaeal methanogens contribute significantly to the metabolic potential within a gut environment (Jensen, 1996). The majority of eukaryotic SSU sequences derived from the swine metagenomes are related to Chordata (i.e., host), fungi, and the Viridiplantae (i.e., feed). Interestingly, in both swine metagenomes, we retrieved sequences sharing high sequence homology to Balantidium coli, the only known ciliate protozoan pathogen, which causes the disease balantidiasis in humans, swine, and other mammalian hosts. Viral sequences were rare, comprising only 0.37% and 0.64% of total metagenomic hits when compared against the SEED database. The low abundance of viral sequences retrieved from the swine fecal metagenomes is consistent with viral proportions seen in the termite, chicken, and cow gastrointestinal metagenomes, and may be a direct result of limited representation of viral genetic information in currently available databases (Qu et al., 2008).

When taking a closer look at the taxonomic distribution of the numerically abundant bacterial orders derived from the swine metagenomes, Clostridiales/unclassified Firmicutes, Bacteroidales, Spirochaetales, unclassified Gammaproteobacteria, and Lactobacillales were the top five most abundant bacterial groups using both the RDP and Greengenes taxonomic classification schemes (Fig. S11). At the genus-level taxonomic resolution, Prevotella species were the most abundant, comprising 19-22% of SSU sequences within both swine fecal metagenomes (Fig. 5.2). Of the classified Clostridiales, Sporobacter was the next most abundant genus within both swine fecal metagenomic datasets. The Anaerovibrio, Clostridium, and Streptococcus genera encompassed at least 5% of SSU sequences in either the GS20 or FLX datasets.

We performed comparative metagenomics on 16S rRNA sequences derived from publicly available (September 2009) endobiotic metagenomic datasets from the SEED databases, in order to reveal phylotype differences between mammalian, avian, invertebrate, and fish distal gut microbiomes. It should be noted that the comparative analysis was performed on the percent of sequences with similarity to each bacterial phylum, which normalized for differences in sequencing depth among the metagenomes (Table 5.2). Interestingly, the distribution of phyla from swine feces appeared closest to that of the cow rumen and chicken cecum, sharing more similar proportions of Bacteroidetes, Firmicutes, , and (Fig. 5.3) using both RDP and Greengenes databases. Human adult and infant distal gut microbiomes had significantly higher abundances of Actinobacteria (p<0.05) than the swine microbiome (Table S6). The fish gut microbiome comprised Proteobacteria and Firmicutes, while the termite gut was dominated by Spirochetes. Interestingly, the swine fecal metagenome harbored significantly higher numbers of Spirochetes than all other metagenomes, with the exception of the termite gut (Table S7). Comparing the abundance of bacteria classified at the genus level revealed


Table 5.1. Summary of pyrosequencing data from Yorkshire swine fecal samples

| | Yorkshire Pig Fecal Metagenome GS20 | Yorkshire Pig Fecal Metagenome FLX |
|---|---|---|
| Total no. of sequences | 157,221 | 462,501 |
| Total sequence size (bp) | 24,518,676 | 106,193,719 |
| Average sequence length (bp) | 155.95 | 229.61 |
| Genes* | 42677 | 124684 |
| CDS* | 42349 (99.23%) | 124050 (99.49%) |
| RNA* | 328 (0.77%) | 634 (0.51%) |
| rRNA* | 328 | 634 |
| 5S | 25 | 46 |
| 16S | 114 | 248 |
| 18S | 1 | 2 |
| 23S | 181 | 325 |
| 28S | 1 | 3 |
| Ribosomal Database Project 16S rDNA hits | 328 (0.21%) | 1100 (0.24%) |
| Greengenes 16S rDNA hits | 295 (0.19%) | 912 (0.20%) |
| w/ Func Prediction* | 33249 (77.9%) | 93804 (75.2%) |
| COG* | 33997 (79.7%) | 97053 (77.8%) |
| Pfam* | 34589 (81.0%) | 99027 (79.4%) |
| TIGRfam* | 16117 (37.8%) | 44040 (35.3%) |
| Genome Properties* | 3881 (9.1%) | 10599 (8.5%) |
| Signalp* | 11125 (26.1%) | 35780 (28.7%) |
| TransMb* | 8863 (20.8%) | 26949 (21.6%) |
| MetaCyc* | 3694 (8.7%) | 10815 (8.7%) |

* Indicates that these summary statistics were generated using the IMG/M-ER annotation system offered through the Joint Genome Institute (Markowitz et al., 2008) using the proxygene method (Dalevi et al., 2009).
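As a quick consistency check on Table 5.1, the short Python sketch below recomputes the average read length of each run and the combined sequence yield from the reported read counts and total base pairs. The dictionary layout and variable names are illustrative; the input numbers are taken directly from the table.

```python
# Read counts and total base pairs as reported in Table 5.1
runs = {
    "GS20": {"reads": 157_221, "bases": 24_518_676},
    "FLX": {"reads": 462_501, "bases": 106_193_719},
}

# Average read length per run (bp)
mean_length = {name: r["bases"] / r["reads"] for name, r in runs.items()}

# Combined yield of both runs in megabases
total_megabases = sum(r["bases"] for r in runs.values()) / 1e6

for name, value in mean_length.items():
    print(f"{name}: {value:.2f} bp average read length")
print(f"Combined yield: {total_megabases:.1f} Mb")
```

The recomputed averages agree with the tabulated 155.95 bp and 229.61 bp, and the combined yield of roughly 130.7 Mb matches the "approximately 130 megabases" cited in the text.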


[Figure 5.1: pie charts of phylum-level 16S rRNA gene composition for the Swine Fecal Metagenome (GS 20) and the Swine Fecal Metagenome (FLX), each classified against the Ribosomal Database Project and the Greengenes database.]

Fig. 5.1. Taxonomic composition of bacterial phyla from 16S rDNA sequences retrieved from swine fecal metagenomes. The percent of sequences in each of the bacterial phyla from the GS20 and FLX swine fecal metagenomes is shown. E-value cutoff for SSU rDNA hits for all databases used is 1×10−5 with a minimum length of 50 bp.

that Prevotella were significantly more abundant in the swine fecal metagenome as compared to all other endobiotic metagenomes (p<0.05), with the exception of the cow rumen, while Bacteroides species were more abundant in chicken and human distal gut microbiomes (Fig. 5.4). Additionally, the Anaerovibrio and Treponema genera were exclusively found within the pig fecal metagenomes. Hierarchical clustering of phylotype distribution (genus-level) from each endobiotic microbiome revealed that the community structure of the swine fecal microbiome was significantly different (p<0.05) from that of the other endobiotic microbiomes (Fig. 5.5). Of all the microbiomes used in the comparative analysis, the swine metagenomes exhibited the highest resemblance to the cow rumen, displaying 59% similarity at the genus level. Surprisingly, swine fecal community structure (genus-level) was less than 40% similar to any of the human fecal microbiomes used in this study.


Table 5.2. The results of a Wilcoxon test to compare taxonomic distribution of bacterial orders from endobiotic microbiomes.

| Metagenome | Z-score | One-sided p value | Two-sided p value |
|---|---|---|---|
| Pig Feces GS20 | 1.0493 | 0.147 | 0.2941 |
| Cow Rumen | 1.8544 | 0.0318 | 0.0637 |
| Chicken Cecum | 2.6024 | 0.0046 | 0.0093 |
| Human In-A | 2.5245 | 0.0058 | 0.0116 |
| Human In-B | 3.7919 | 0.0001 | 0.0002 |
| Human In-D | 2.1297 | 0.0166 | 0.0332 |
| Human In-E | 2.8309 | 0.0023 | 0.0046 |
| Human In-M | 2.76 | 0.0029 | 0.0058 |
| Human In-R | 2.35 | 0.0094 | 0.0188 |
| Human F1-S | 3.2413 | 0.0059 | 0.0012 |
| Human F1-T | 1.6986 | 0.0447 | 0.0894 |
| Human F1-U | 3.1582 | 0.0008 | 0.0016 |
| Human F2-V | 2.0933 | 0.0182 | 0.0363 |
| Human F2-W | 2.4777 | 0.0066 | 0.0132 |
| Human F2-X | 2.8621 | 0.0021 | 0.0042 |
| Human F2-Y | 2.54 | 0.0055 | 0.1102 |
| Mouse Cecum | 3.4127 | 0.0003 | 0.0006 |
| Termite Gut | 3.4543 | 0.0003 | 0.0006 |
| Fish gut | 3.7503 | 0.0001 | 0.0002 |


[Figure 5.2: pie charts of genus-level 16S rDNA composition for the (A) GS20 and (B) FLX swine fecal metagenomes.]

Fig. 5.2. Taxonomic composition of bacterial genera from 16S rDNA sequences retrieved from swine fecal metagenomes. The percent of sequences in each of the bacterial genera from the (A) GS20 and (B) FLX swine fecal metagenomes is shown. E-value cutoff for rDNA hits using the RDP database was 1×10−5 with a minimum length of 50 bp.


Fig. 5.3. Taxonomic distribution of bacterial phyla from endobiotic microbiomes using RDP and Greengenes classification schemes.


[Figure 5.4: bar chart. y-axis: percent of 16S rDNA reads classified to bacterial genera (0 to 45). x-axis (genera): Veillonella, Prevotella, Treponema, Butyrivibrio, Fibrobacter, Mitsuokella, Spirochaeta, Clostridium, Bacteroides, Anaerovibrio, Eubacterium, Paenibacillus, Lactobacillus, Sporobacter, Anaerofilum, Streptococcus, Herbaspirillum, Ruminococcus, Bifidobacterium. Series: Cow Rumen, Chicken Cecum, Human Infant, Human Adult, Pig Feces, Termite Gut, Fish Gut.]

Fig. 5.4. The taxonomic distribution of bacterial genera present in endobiotic metagenomes using the RDP classification


[Figure 5.5: dendrogram. Complete linkage; Transform: Square root; Resemblance: S17 Bray Curtis similarity. y-axis: Similarity (0 to 100); x-axis: Samples. Sample groups: Cow Rumen, Chicken Cecum, Human Infant, Human Adult, Pig Feces, Mouse Cecum, Termite Gut, Fish Gut.]

Fig. 5.5. Hierarchical clustering of genus-level 16S rRNA gene abundance from endobiotic microbiomes. Hierarchical clustering using Bray-Curtis dissimilarities. Red-colored branch lengths indicate that no statistical difference in similarity profiles could be identified for these respective nodes.
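The resemblance measure underlying Fig. 5.5 (Bray-Curtis similarity on square-root-transformed abundances) can be sketched in a few lines of Python. The function name `bray_curtis_similarity` and the toy abundance vectors are illustrative assumptions, and the sketch covers only the pairwise similarity, not the complete-linkage clustering or the significance testing of nodes.

```python
import math

def bray_curtis_similarity(a, b):
    """Percent Bray-Curtis similarity after a square-root transform.

    a and b are abundance vectors over the same ordered set of genera.
    """
    ta = [math.sqrt(v) for v in a]
    tb = [math.sqrt(v) for v in b]
    shared_total = sum(x + y for x, y in zip(ta, tb))
    difference = sum(abs(x - y) for x, y in zip(ta, tb))
    return 100.0 * (1 - difference / shared_total)

# Hypothetical genus abundances for two samples
sample_1 = [16, 9, 0]
sample_2 = [4, 9, 4]
similarity = bray_curtis_similarity(sample_1, sample_2)
print(f"{similarity:.1f}% similar")
```

The square-root transform down-weights the most abundant genera so that rarer genera contribute more to the resemblance between samples; identical profiles score 100 and profiles with no shared genera score 0.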


In order to assess the diversity of each endobiotic metagenome, we applied several statistical models for measuring genotype richness, evenness, and coverage of SSU rDNA hits against the RDP database. Overall, while coverage of the GS20 pig fecal metagenome was lower (i.e., 91%) than that of the FLX run (i.e., 97%), all diversity indices indicate that both swine metagenomes had similar genotype diversity (Tables 5.3, 5.4, and 5.5). Swine fecal microbiomes appeared to have higher richness and lower evenness as compared to chicken, mouse, fish, and termite endobiotic communities. This trend was further supported by a cumulative k-dominance plot, as both swine k-dominance curves are less elevated than those of all other endobiotic microbiomes (Fig. S15).
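For readers unfamiliar with k-dominance plots, the Python sketch below shows how such a curve is built: taxa are ranked from most to least abundant and the cumulative percent of reads is accumulated across ranks. The function name `k_dominance` and the toy counts are illustrative assumptions and are not taken from the PRIMER v6 output used in this study.

```python
def k_dominance(counts):
    """Cumulative percent abundance across taxa ranked from most to least abundant."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(ranked)
    curve, running = [], 0
    for c in ranked:
        running += c
        curve.append(100.0 * running / total)
    return curve

# Hypothetical communities with the same richness and read total
even_curve = k_dominance([25, 25, 25, 25])
skewed_curve = k_dominance([5, 50, 15, 30])
print(even_curve)
print(skewed_curve)
```

At every rank the skewed community's curve sits at or above the even community's curve, so a lower (less elevated) curve indicates that the top-ranked taxa account for a smaller share of the reads.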

Rarefaction of the observed number of OTUs (genus-level) indicated that several of the individual human microbiomes were under-sampled (Fig. S14); we therefore combined the individual pig fecal, human infant, and human adult SSU hits, and performed diversity analyses on the total number of SSU hits. The total human adult and pig microbiomes shared similar diversity patterns, and overall were more diverse than the human infant microbiota.
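The idea behind rarefaction, and behind the ES(n) columns reported in Tables 5.4 and 5.5, can be illustrated with the classical analytical formula for the expected number of taxa in a random subsample of n reads drawn without replacement. The Python sketch below is a generic implementation of that formula; whether Mothur and PRIMER compute their values in exactly this way is an assumption, and the function name and toy counts are illustrative.

```python
from math import comb

def expected_richness(counts, n):
    """Expected number of taxa in a random subsample of n reads.

    For each taxon with N_i reads out of N total, the probability that it
    is absent from the subsample is C(N - N_i, n) / C(N, n).
    """
    total = sum(counts)
    if n > total:
        raise ValueError("subsample size exceeds the number of reads")
    all_draws = comb(total, n)
    return sum(1 - comb(total - c, n) / all_draws for c in counts if c > 0)

# Hypothetical community: 3 OTUs, 10 reads
counts = [5, 3, 2]
curve = [expected_richness(counts, n) for n in range(1, 11)]
print(curve)
```

A curve that is still climbing at the full sample size suggests under-sampling, which is the pattern the text reports for several individual human microbiomes.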

To predict the metabolic potential within the swine fecal microbiome, both the MG-RAST and the IMG/M-ER annotation pipelines were used. The MG-RAST pipeline uses BLASTX to directly assign metagenomic fragments to SEED functional subsystems, while the IMG/M-ER pipeline uses the proxygene annotation method against COG, KEGG, Pfam, and other available databases. The broad functional classifications of the swine fecal metagenomic reads were consistent with those expected from previous metagenomic studies of the chicken cecum, cow rumen, human distal gut, and termite gut. Similar proportions of broad-level SEED subsystem classifications were retrieved for both the GS20 and FLX swine fecal metagenomes (Fig. 5.6). However, it should be noted that only 10% of sequences retrieved from the GS20 pig fecal metagenome were assigned to 574 subsystems, while more than 25% of all FLX reads were classified into


Table 5.3. Diversity analyses of the endobiotic microbiomes using 16S rRNA gene sequences.

| Metagenome | Sobs | Chao1 | ACE | jackknife | Shannon | Shannon (non-parametric) | Simpson | boneh | coverage |
|---|---|---|---|---|---|---|---|---|---|
| Pig Feces GS20 | 52 | 77.09 (61.24-120.12) | 116.05 (89.07-162.68) | 76.88 (62.74-91.02) | 3.17 (3.03-3.32) | 3.36 | 0.07 (0.05-0.08) | 10.34 | 0.91 |
| Pig Feces FLX | 71 | 113.86 (86.42-190.10) | 125.60 (103.78-161.95) | 119.78 (92.49-147.06) | 3.19 (3.10-3.29) | 3.27 | 0.08 (0.07-0.09) | 5.84 | 0.97 |
| Cow Rumen | 40 | 63.00 (48.33-103.51) | 168.17 (120.97-242.89) | 63.63 (49.92-77.33) | 2.56 (2.35-2.77) | 2.86 | 0.15 (0.11-0.19) | 10.58 | 0.88 |
| Chicken Cecum | 37 | 47.11 (39.89-72.43) | 68.02 (52.45-99.29) | 51.00 (40.63-61.37) | 2.25 (2.11-2.39) | 2.36 | 0.20 (0.17-0.23) | 5.58 | 0.97 |
| Human In-A | 20 | 33.75 (23.40-75.55) | 62.23 (41.01-104.88) | 32.94 (22.19-43.70) | 2.52 (2.25-2.79) | 2.84 | 0.10 (0.06-0.15) | 5.05 | 0.81 |
| Human In-B | 10 | 20.50 (12.03-64.19) | 27.79 (13.32-105.26) | 23.03 (10.30-35.76) | 0.84 (0.50-1.17) | 1.15 | 0.68 (0.53-0.82) | 3.02 | 0.90 |
| Human In-D | 26 | 32.00 (27.33-53.10) | 34.06 (28.41-52.93) | 35.00 (26.68-43.32) | 2.97 (2.80-3.13) | 3.16 | 0.05 (0.04-0.07) | 4.95 | 0.90 |
| Human In-E | 18 | 22.20 (18.79-40.34) | 26.41 (20.24-49.62) | 25.00 (17.67-32.33) | 1.11 (0.88-1.34) | 1.26 | 0.60 (0.51-0.69) | 3.72 | 0.96 |
| Human In-M | 26 | 46.00 (32.02-92.48) | 80.76 (54.86-129.91) | 43.95 (31.51-56.39) | 2.97 (2.72-3.22) | 3.42 | 0.05 (0.02-0.08) | 7.34 | 0.69 |
| Human In-R | 21 | 23.50 (21.41-36.27) | 26.77 (22.44-44.13) | 27.00 (20.21-33.79) | 2.57 (2.38-2.76) | 2.72 | 0.10 (0.07-0.13) | 2.83 | 0.87 |
| Human F1-S | 22 | 31.00 (24.00-62.45) | 39.21 (29.33-62.40) | 31.00 (22.68-39.32) | 2.68 (2.49-2.87) | 2.85 | 0.08 (0.06-0.10) | 4.30 | 0.90 |
| Human F1-T | 37 | 64.14 (46.04-118.51) | 109.84 (79.72-161.17) | 66.22 (47.95-84.48) | 3.05 (2.83-3.26) | 3.36 | 0.07 (0.04-0.10) | 9.39 | 0.82 |
| Human F1-U | 17 | 20.75 (17.64-39.02) | 21.96 (18.14-38.53) | 23.00 (16.21-29.79) | 2.30 (2.04-2.56) | 2.49 | 0.15 (0.08-0.21) | 3.22 | 0.91 |
| Human F2-V | 37 | 46.10 (39.59-68.96) | 48.59 (41.00-70.52) | 51.00 (40.63-61.37) | 3.07 (2.89-3.26) | 3.29 | 0.07 (0.05-0.09) | 7.64 | 0.87 |
| Human F2-W | 25 | 36.00 (27.88-66.94) | 55.50 (39.11-90.92) | 37.00 (27.40-46.60) | 2.72 (2.50-2.93) | 2.96 | 0.08 (0.06-0.11) | 5.85 | 0.86 |
| Human F2-X | 19 | 21.00 (19.29-32.96) | 22.80 (19.83-36.32) | 24.00 (17.80-30.20) | 2.57 (2.38-2.76) | 2.72 | 0.09 (0.06-0.12) | 3.06 | 0.94 |
| Human F2-Y | 27 | 40.20 (30.44-77.60) | 41.54 (31.66-72.36) | 39.78 (29.54-50.01) | 2.87 (2.67-3.08) | 3.10 | 0.07 (0.05-0.09) | 5.82 | 0.87 |
| Mouse Cecum | 14 | 36.50 (19.23-110.77) | 41.22 (20.35-130.67) | 39.09 (19.22-58.95) | 2.18 (1.78-2.58) | 2.69 | 0.15 (0.04-0.25) | 4.13 | 0.67 |
| Termite Gut | 13 | 27.00 (15.92-80.11) | 30.75 (16.84-95.03) | 29.19 (14.56-43.82) | 2.05 (1.72-2.38) | 2.38 | 0.16 (0.09-0.23) | 3.39 | 0.79 |
| Fish gut | 14 | 19.00 (14.86-42.91) | 20.45 (15.44-42.93) | 20.00 (13.21-26.79) | 2.29 (2.05-2.54) | 2.50 | 0.11 (0.07-0.15) | 3.71 | 0.87 |
| Pig Feces Total | 91 | 127.25 (105.56-181.27) | 184.42 (150.70-237.20) | 127.57 (108.75-146.39) | 3.15 (3.11-3.20) | 3.19 | 0.06 (0.06-0.07) | 0.34 | 0.99 |
| Human Infant Total | 59 | 80.00 (66.47-118.05) | 83.37 (69.43-115.92) | 82.03 (68.30-95.75) | 2.66 (2.52-2.79) | 2.78 | 0.17 (0.14-0.20) | 1.25 | 0.96 |
| Human Adult Total | 72 | 89.00 (77.34-126.16) | 85.74 (77.28-107.71) | 89.60 (77.72-101.48) | 3.35 (3.30-3.40) | 3.39 | 0.05 (0.04-0.05) | 0.37 | 0.99 |


Table 5.4. Diversity analyses for endobiotic metagenomes using SEED Subsystem annotations.

| Metagenome | S | N | d | J' | Brillouin | Fisher | ES(5000) |
|---|---|---|---|---|---|---|---|
| Pig Feces GS20 | 574 | 16093 | 59.16 | 0.871 | 5.458 | 116.3 | 472 |
| Pig Feces FLX | 714 | 117061 | 61.09 | 0.851 | 5.575 | 101.2 | 488.3 |
| Human In-A | 570 | 11722 | 60.73 | 0.8865 | 5.527 | 125.3 | 505.4 |
| Human In-B | 461 | 5210 | 53.75 | 0.8813 | 5.25 | 122.1 | 456.8 |
| Human In-D | 623 | 21569 | 62.33 | 0.876 | 5.573 | 119.8 | 505.2 |
| Human In-E | 555 | 12022 | 58.97 | 0.8686 | 5.397 | 120.3 | 474.3 |
| Human In-M | 612 | 9802 | 66.48 | 0.8921 | 5.603 | 144.7 | 553.1 |
| Human In-R | 626 | 20477 | 62.96 | 0.8787 | 5.591 | 122.1 | 512 |
| Human F1-S | 609 | 17794 | 62.13 | 0.8797 | 5.567 | 122.1 | 507.2 |
| Human F1-U | 604 | 12009 | 64.19 | 0.8975 | 5.644 | 134 | 532.7 |
| Human F2-V | 659 | 24015 | 65.24 | 0.8757 | 5.623 | 125.3 | 522.8 |
| Human F2-W | 620 | 19007 | 62.83 | 0.8807 | 5.592 | 122.8 | 514.2 |
| Human F2-X | 629 | 18778 | 63.82 | 0.8805 | 5.602 | 125.4 | 519.5 |
| Human F2-Y | 617 | 21710 | 61.69 | 0.8718 | 5.538 | 118.2 | 505.7 |
| Lean Mouse Cecum | 496 | 5146 | 57.92 | 0.8906 | 5.359 | 135.4 | 493 |
| Termite Gut | 591 | 33635 | 56.6 | 0.8587 | 5.439 | 101.8 | 453.7 |

Table 5.5. Diversity analyses for endobiotic metagenomes using COG and Pfam annotations.

| Sample | S | N | d | J' | Brillouin | Fisher | ES(5000) |
|---|---|---|---|---|---|---|---|
| Yorkshire Pig Fecal Metagenome GS20 (COG) | 2821 | 34010 | 270.3 | 0.9199 | 7.157 | 730.5 | 1672 |
| Yorkshire Pig Fecal Metagenome GS20 (Pfam) | 3043 | 50716 | 280.8 | 0.8785 | 6.936 | 710.7 | 1500 |
| Yorkshire Pig Fecal Metagenome FLX (COG) | 3717 | 97095 | 323.6 | 0.9081 | 7.385 | 766.5 | 1806 |
| Yorkshire Pig Fecal Metagenome FLX (Pfam) | 4314 | 141670 | 363.6 | 0.8574 | 7.115 | 840.4 | 1610 |
| Human Gut Community Subject 7 (COG) | 2256 | 12888 | 238.3 | 0.9332 | 6.932 | 791.7 | 1650 |
| Human Gut Community Subject 7 (Pfam) | 2065 | 13849 | 216.4 | 0.9078 | 6.692 | 671.9 | 1491 |
| Human Gut Community Subject 8 (COG) | 2295 | 15978 | 237 | 0.9243 | 6.919 | 734.4 | 1598 |
| Human Gut Community Subject 8 (Pfam) | 2116 | 19015 | 214.7 | 0.8962 | 6.675 | 609.5 | 1424 |
| Termite Gut (COG) | 2261 | 44458 | 211.2 | 0.9117 | 6.939 | 503.3 | 1410 |
| Termite Gut (Pfam) | 2079 | 49791 | 192.1 | 0.8809 | 6.645 | 438.5 | 1253 |
| Mouse Gut Community lean1 (COG) | 816 | 1562 | 110.8 | 0.9421 | 5.699 | 689.6 | 816 |
| Mouse Gut Community lean1 (Pfam) | 777 | 1637 | 104.9 | 0.9281 | 5.609 | 578.8 | 777 |
| Mouse Gut Community lean2 (COG) | 782 | 1424 | 107.6 | 0.9691 | 5.806 | 711.5 | 782 |
| Mouse Gut Community lean2 (Pfam) | 637 | 1317 | 88.54 | 0.9134 | 5.333 | 485.7 | 637 |
| Mouse Gut Community lean3 (COG) | 787 | 1508 | 107.4 | 0.9144 | 5.493 | 664.2 | 787 |
| Mouse Gut Community lean3 (Pfam) | 749 | 1578 | 101.6 | 0.8989 | 5.394 | 558 | 749 |
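The d and J' columns in Tables 5.4 and 5.5 can be related to their usual definitions with a short Python sketch. It assumes that d is Margalef's richness index, d = (S − 1)/ln N, and that J' is Pielou's evenness, J' = H'/ln S, both with natural logarithms; these definitions are not stated in the text, but the Margalef formula reproduces the tabulated d values from the reported S and N. The function names and the toy abundance vector are illustrative.

```python
import math

def margalef_d(s, n):
    """Margalef richness: (S - 1) / ln(N)."""
    return (s - 1) / math.log(n)

def pielou_j(counts):
    """Pielou evenness: Shannon H' divided by ln(S)."""
    counts = [c for c in counts if c > 0]
    n = sum(counts)
    shannon = -sum((c / n) * math.log(c / n) for c in counts)
    return shannon / math.log(len(counts))

# S and N as reported for the two swine runs in Table 5.4
d_gs20 = margalef_d(574, 16093)
d_flx = margalef_d(714, 117061)
print(f"GS20 d = {d_gs20:.2f}, FLX d = {d_flx:.2f}")

# Hypothetical abundance vector: a perfectly even community has J' of 1
print(pielou_j([5, 5, 5, 5]))
```

Because d divides richness by the logarithm of the number of assigned reads, the FLX run's 140 additional subsystems raise d only slightly despite a roughly sevenfold larger N.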


[Figure 5.6: two pie charts, Pig Fecal Metagenome (GS20) and Pig Fecal Metagenome (FLX), showing the percent of reads per Level 1 SEED subsystem. Legend categories: Prophage; Metabolism of Aromatic Compounds; Nitrogen Metabolism; Potassium metabolism; Sulfur Metabolism; Fatty Acids and Lipids; Motility and Chemotaxis; Regulation and Cell signaling; Phosphorus Metabolism; Membrane Transport; Stress Response; Cell Division and Cell Cycle; Respiration; Unclassified; Nucleosides and Nucleotides; RNA Metabolism; Cofactors, Vitamins, Prosthetic Groups, Pigments; Virulence; Cell Wall and Capsule; DNA Metabolism; Amino Acids and Derivatives; Protein Metabolism; Carbohydrates; Clustering-based subsystems.]

Fig. 5.6. SEED Subsystem functional composition of two swine fecal metagenomes. The percent of reads in each Level 1 SEED subsystem from the GS20 and FLX swine fecal metagenomic runs is shown. The BLASTX cutoff was 1×10⁻⁵ with a 50 base pair minimum alignment length.
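The thresholds stated in the caption (E-value cutoff of 1×10⁻⁵ and a 50 base pair minimum alignment length) can be applied to a list of BLASTX hits before tallying the percent of reads per subsystem. The sketch below is illustrative only and is not the MG-RAST implementation. The `Hit` record, its field names, and the choice to keep a single best hit per read are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass

@dataclass(frozen=True)
class Hit:
    """Hypothetical record for one BLASTX hit of a read to a subsystem protein."""
    read_id: str
    subsystem: str
    evalue: float
    align_len: int

def subsystem_percentages(hits, max_evalue=1e-5, min_align_len=50):
    """Percent of assigned reads per subsystem after threshold filtering."""
    best = {}
    for h in hits:
        # Discard hits that fail either threshold from the figure caption.
        if h.evalue > max_evalue or h.align_len < min_align_len:
            continue
        # Assumption: each read is counted once, using its lowest E-value hit.
        current = best.get(h.read_id)
        if current is None or h.evalue < current.evalue:
            best[h.read_id] = h
    tally = Counter(h.subsystem for h in best.values())
    total = sum(tally.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in tally.items()}
```

Changing `min_align_len` to 30 would correspond to the threshold quoted for the cross-metagenome comparison in Fig. 5.8.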


714 subsystems. When both pig fecal metagenomes were annotated using proxygenes available through JGI’s IMG/M-ER pipeline, nearly one third of all reads from the GS20 and FLX pig fecal metagenomes were assigned to Pfams, and over 20% were assigned to COGs. This finding suggests that the proxygene method for gene-centric approaches to metagenomic studies is more robust than the direct BLASTX assignment strategy. Diversity analyses of Subsystems, COGs, and Pfams retrieved from the swine metagenomes and the other endobiotic metagenomes tested in this study showed that larger sequencing efforts yield significantly more functional classes. For example, an additional 140 Subsystems, 896 COGs, and 1271 Pfams were retrieved from the FLX run as compared to the GS20 metagenome, suggesting that additional sequencing efforts are necessary for all microbiomes to cover the high functional diversity in gut environments.

Carbohydrate metabolism was the most abundant SEED subsystem, representing 13% of both swine fecal metagenomes (Fig. 5.6). Genes associated with cell wall and capsule, stress, and virulence were also very abundant in both metagenomes. It should be noted that 15-16% of annotated reads from the swine fecal metagenomes were categorized into the clustering-based subsystems, most of which have unknown or putative functions. Additionally, only 75% to 90% of metagenomic reads were assigned to subsystems, suggesting the need for improved binning and coding region prediction algorithms to annotate these unknown sequences. To make the functional analysis more informative, we applied statistical methods to identify which of the 29 broad-level functional subsystems are more, or less, represented in the different microbiomes. As expected, all endobiotic metagenomes were dominated by carbohydrate metabolism, with the amino acid, protein, cell wall and capsule, and virulence subsystems also represented in relatively high abundance. Interestingly, protein metabolism and amino acid subsystems were statistically significantly more abundant in chicken, pig, and cow gut


Fig. 5.7. Hierarchical clustering of (A) COG and (B) Pfam normalized abundance from endobiotic microbiomes. Clustering was performed using Bray-Curtis dissimilarities.


metagenomes (Fig. 5.8). This elevated protein turnover is consistent with an increased use of amino acids for protein accretion in food production animals. The termite, fish, and pig gut metagenomes had a higher proportion of reads classified to the chemotaxis and motility subsystems than the other endobiotic metagenomes.

To better understand the functional similarity of the Yorkshire pig fecal metagenome to currently available metagenomic projects, we performed hierarchical clustering and principal components analysis of COG and Pfam normalized abundance profiles. Clustering of 55 currently available metagenomes using both COG and Pfam assignments revealed that the swine fecal metagenomes shared approximately 70% similarity with human fecal metagenomes and 60% similarity with the termite gut (Fig. 5.7). Principal components analysis supported the same overall clustering patterns as the hierarchical clustering (Fig. S10). The relatively high similarity between the swine fecal metagenome and the invertebrate termite gut was surprising and suggests previously unknown functional similarities between these endobiomes.
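The clustering step described above can be illustrated with a short Python sketch that computes Bray-Curtis dissimilarities between abundance profiles and merges samples by average linkage. This is an illustration under stated assumptions rather than the procedure actually run in this study: the normalization (relative abundance), the linkage rule (average), and all function names are assumptions, and percent similarity is taken here as 100 × (1 - dissimilarity).

```python
def relative_abundance(counts):
    """Normalize raw counts to proportions (assumed normalization)."""
    total = sum(counts)
    return [c / total for c in counts]

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    numerator = sum(abs(x - y) for x, y in zip(a, b))
    denominator = sum(x + y for x, y in zip(a, b))
    return numerator / denominator

def average_linkage(profiles):
    """Agglomerate samples; returns a list of (cluster_a, cluster_b, dissimilarity)."""
    clusters = {name: [name] for name in profiles}  # cluster label -> member samples

    def dist(c1, c2):
        # Average linkage: mean pairwise dissimilarity between cluster members.
        pairs = [(m1, m2) for m1 in clusters[c1] for m2 in clusters[c2]]
        return sum(bray_curtis(profiles[m1], profiles[m2]) for m1, m2 in pairs) / len(pairs)

    merges = []
    while len(clusters) > 1:
        labels = sorted(clusters)
        d, a, b = min(
            (dist(a, b), a, b)
            for i, a in enumerate(labels)
            for b in labels[i + 1:]
        )
        merges.append((a, b, d))
        clusters[f"({a},{b})"] = clusters.pop(a) + clusters.pop(b)
    return merges
```

With toy profiles, two samples that merge at a dissimilarity of 0.3 would be read as sharing 70% similarity under the convention assumed here.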

While analysis of broad-level functions provided some insight into the functional potential encoded by the swine fecal microbiome, we also investigated functional roles of the swine gut at a finer functional resolution. Investigating the SEED Subsystem Hierarchy 1 functional annotation revealed that subsystems associated with specialized cell wall and capsule enzymes, DNA recombination, and prophage were enriched within the swine fecal metagenome (Fig. S16 and Fig. S17), and these were thus further investigated. Upon interrogating genes within the DNA recombination and prophage subsystem, the swine fecal metagenome was found to be enriched for RstA phage-related replication proteins, terminases, and portal proteins. Additionally, more than 30 metagenomic fragments shared high homology to unknown phage proteins, and these unknown phage classes were dominated by swine metagenomic reads. When analyzing proteins involved in the cell wall and capsule subsystem, comparative metagenomics revealed an overabundance of an unknown glycosyl transferase, a phosphoglucosamine mutase, and a phosphotransferase (Table 5.6). Interestingly, an N-acetylglucosamine-specific PTS system, specific proteins involved in mannose uptake, and novel capsular polysaccharide synthesis enzymes were found exclusively within the swine fecal metagenome. Hierarchical clustering of all genes retrieved from the cell wall and capsule functional subsystem for each endobiotic microbiome revealed that swine fecal cell wall/capsule profiles were more than 60% similar to those of the cow rumen. Additionally, the swine cell wall and capsule profiles were significantly more similar to those of the termite gut than to those of the human gut. Altogether, these novel functions can endow the swine gut microbiota with diversified surface polysaccharide structures, allowing the host immune system to accommodate such a diverse microbiota (Ley et al., 2008). The presence of novel carbohydrate binding proteins and transporters suggests that the swine gut microbiome is capable of exploiting a diverse array of substrates.

When carbohydrate subsystems were compared across endobiotic microbiomes, maltose and maltodextrin utilization was the most abundant carbohydrate subsystem in the swine gut, termite gut, and cow rumen. This may be a result of the high level of complex polysaccharides found in the diets of swine, cattle, and termites. Analysis of carbohydrate metabolism using the SEED subsystem approach unveiled several proteins unique to the swine gut. For example, an outer surface protein that is part of the cellobiose operon, a beta-glucoside-specific IIA component and a cellobiose-specific IIC component of the PTS system, and a protein similar to CDP-glucose 4,6-dehydratase were unique to the swine fecal metagenome. These unique carbohydrate


Table 5.6. List of cell wall and capsule SEED subsystem functions overabundant in swine fecal metagenome

Function | Pig Feces | Human Adult | Human Infant | Cow Rumen | Termite Gut | Mouse Cecum | Fish gut
putative glycosyltransferase - possibly involved in cell wall localization and side chain formation of rhamnose-glucose polysaccharide | 112 | 9 | 10 | 10 | 0 | 1 | 0
Phosphoglucosamine mutase (EC 5.4.2.10) | 97 | 18 | 9 | 0 | 20 | 0 | 1
COG3178: Predicted phosphotransferase related to Ser/Thr protein kinases | 66 | 10 | 6 | 4 | 5 | 2 | 1
3-deoxy-D-manno-octulosonate 8-phosphate phosphatase (EC 3.1.3.45) | 27 | 10 | 9 | 2 | 0 | 1 | 3
O-antigen export system, permease protein | 23 | 3 | 2 | 4 | 0 | 0 | 1
Glutamine synthetase, clostridia type (EC 6.3.1.2) | 21 | 4 | 1 | 3 | 0 | 0 | 0
D-glycero-D-manno-heptose 1-phosphate guanosyltransferase | 20 | 7 | 6 | 1 | 0 | 5 | 0
UDP-glucose 4-epimerase (EC 5.1.3.2) | 14 | 1 | 2 | 0 | 9 | 1 | 1
Capsular polysaccharide synthesis enzyme Cap8D | 9 | 0 | 1 | 1 | 0 | 0 | 0
D-alanine--D-alanine ligase B (EC 6.3.2.4) | 8 | 0 | 0 | 0 | 0 | 0 | 0
PTS system, N-acetylglucosamine-specific IIB component (EC 2.7.1.69) | 7 | 0 | 0 | 0 | 0 | 0 | 0
Mannose-1-phosphate guanylyltransferase (GDP) (EC 2.7.7.22) | 5 | 0 | 0 | 0 | 0 | 0 | 0
2-Keto-3-deoxy-D-manno-octulosonate-8-phosphate synthase (EC 2.5.1.55) | 3 | 0 | 0 | 0 | 0 | 0 | 0
capsular polysaccharide biosynthesis protein, putative | 3 | 0 | 0 | 0 | 0 | 0 | 0
Capsular polysaccharide synthesis enzyme Cap8L | 3 | 0 | 0 | 0 | 0 | 0 | 0


Table 5.7. Pfams and COGs unique to swine fecal metagenomes.

Pfam | Descriptive Pfam Name | Pig Feces FLX Abundance | Pig Feces GS20 Abundance | COG | Descriptive COG Name | Pig Feces FLX Abundance | Pig Feces GS20 Abundance
pfam00723 | Glyco_hydro_15 | 8 | 1 | COG5039 | Exopolysaccharide biosynthesis protein | 8 | 2
pfam04843 | Herpes_teg_N | 10 | 5 | COG2076 | Membrane transporters of cations and cationic drugs | 8 | 2
pfam00686 | CBM_20 | 10 | 13 | COG3387 | Glucoamylase and related glycosyl hydrolases | 9 | 1
pfam09492 | Pec_lyase | 11 | 1 | COG4833 | Predicted glycosyl hydrolase | 9 | 2
pfam07659 | DUF1599 | 11 | 3 | COG3468 | Type V secretory pathway, adhesin AidA | 11 | 3
pfam04886 | PT | 11 | 6 | COG5276 | Uncharacterized conserved protein | 11 | 3
pfam09978 | DUF2212 | 12 | 3 | COG2931 | RTX toxins and related Ca2+-binding proteins | 13 | 6
pfam06810 | Phage_GP20 | 12 | 3 | COG2311 | Predicted membrane protein | 13 | 2
pfam05504 | Spore_GerAC | 14 | 1 | COG0412 | Dienelactone hydrolase and related enzymes | 13 | 5
pfam03190 | DUF255 | 17 | 2 | COG5297 | Cellobiohydrolase A (1,4-beta-cellobiosidase A) | 14 | 1
pfam08522 | DUF1735 | 18 | 4 | COG3386 | Gluconolactonase | 14 | 2
pfam04650 | YSIRK_signal | 18 | 10 | COG4372 | Uncharacterized protein conserved in bacteria with the myosin-like | 15 | 7
pfam09087 | Cyc-maltodext_N | 19 | 10 | COG5295 | Autotransporter adhesin | 15 | 7
pfam10438 | Cyc-maltodext_C | 19 | 9 | COG2133 | Glucose/sorbosone dehydrogenases | 17 | 1
pfam03806 | ABG_transport | 25 | 11 | COG5099 | RNA-binding protein of the Puf family, translational repressor | 19 | 10
pfam04122 | CW_binding_2 | 26 | 2 | COG2247 | Putative cell wall-binding domain | 20 | 2
pfam03895 | YadA | 26 | 16 | COG5283 | Phage-related tail protein | 20 | 13
pfam02181 | FH2 | 26 | 9 | COG4677 | Pectin methylesterase | 21 | 11
pfam06750 | DiS_P_DiS | 27 | 5 | COG2978 | Putative p-aminobenzoyl-glutamate transporter | 23 | 8
pfam08800 | VirE_N | 32 | 11 | COG3899 | Predicted ATPase | 25 | 2
pfam03160 | Calx-beta | 38 | 2 | COG1331 | Highly conserved protein containing a thioredoxin domain | 26 | 7
pfam01344 | Kelch_1 | 38 | 4 | COG2374 | Predicted extracellular nuclease | 41 | 8
pfam05860 | Haemagg_act | 43 | 10 | COG3712 | Fe2+-dicitrate sensor, membrane component | 51 | 7
pfam05594 | Fil_haemagg | 45 | 6 | COG5422 | RhoGEF, Guanine nucleotide exchange factor for Rho/Rac/Cdc42-like GTPases | 51 | 26
pfam04773 | FecR | 54 | 8 | COG3291 | FOG: PKD repeat | 87 | 6

metabolism proteins may suggest the importance of cellobiose metabolism within the swine gut.

Two-way hierarchical clustering of COGs and Pfams retrieved from swine, human, termite, and mouse gut microbiomes revealed gene families unique to the swine distal gut (Fig. 5.10). Additionally, the swine fecal FLX run yielded a pool of Pfams and COGs unique to the FLX run, suggesting that the deeper level of sequencing uncovered a larger proportion of functional diversity. Interestingly, this analysis unveiled a large pool of Pfams and COGs unique to the swine fecal metagenome.
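Once abundance tables are in hand, gene families unique to one metagenome can be listed with a simple set comparison. The sketch below is a hypothetical illustration and not the two-way clustering procedure itself. The function name, the nested-dictionary layout, the `min_count` parameter, and the family names in the usage example are all invented for the example.

```python
def unique_to_target(abundance, target, min_count=1):
    """Return families (and counts) seen in `target` and in no other sample.

    abundance: dict mapping sample name -> dict of family ID -> read count.
    """
    seen_elsewhere = set()
    for sample, families in abundance.items():
        if sample != target:
            # A family counts as present elsewhere only if it has reads there.
            seen_elsewhere.update(f for f, n in families.items() if n > 0)
    return {
        family: n
        for family, n in abundance[target].items()
        if n >= min_count and family not in seen_elsewhere
    }

# Toy input with made-up family names and counts.
toy = {
    "pig": {"fam_a": 5, "fam_b": 3, "fam_c": 0},
    "human": {"fam_b": 2, "fam_c": 1},
    "termite": {"fam_a": 0},
}
swine_only = unique_to_target(toy, "pig")
```

Raising `min_count` is one way to guard against families that appear unique only because of a single spurious read. This matters when sequencing depth differs between samples, as it does between the GS20 and FLX runs.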

5.5. Discussion

The overall goal of this study was to characterize the phylogenetic and functional content of the swine fecal microbiome and to compare this distal gut environment to other currently available endobiotic metagenomes, as a method for revealing potential functional and phylogenetic differences harbored within the swine gut. While human colonic, bovine ruminal, and chicken cecal microbial communities are dominated by non-motile microbes, termite hindguts harbor motile populations of bacteria (Warnecke et al., 2007). In fact, when screening for 16S rRNA gene sequences from the termite gut metagenome, this study and others noted an abundance of spirochete genera. Thus, it was not surprising that Warnecke et al. (2007) revealed an abundance of genes related to chemotaxis and chemosensation within the termite gut.

In addition, the fish gut harbored two genera of bacteria, Paenibacillus and Herbaspirillum, which are known to possess complex social behaviors (Ingham and Ben-Jacob, 2008; Kirchhof et al., 2001), thus supporting the overabundance of reads annotated in the chemotaxis and motility subsystem within the fish gut microbiome. Surprisingly, this pig fecal metagenome revealed a very high abundance of the motile Treponema and Anaerovibrio genera as compared to other endobiotic microbiomes. This finding was unexpected considering that a thorough culture-independent study of swine gut microbiota (Leser et al., 2002) revealed a very low abundance of Spirochetes (i.e., 0.3% of all phylotypes). Altogether, these findings suggest that the pig gut harbors previously unknown social dynamics, similar to the termite gut, which may be relevant for maintaining compartmentalization and promoting niche selection within monogastric systems.

These findings show that while a majority of metagenomic reads were associated with the conserved “core microbiome”, functional metagenomic studies such as this one unveil the “variable microbiome” encoded to carry out unique functions (Qu et al., 2008). While this study has further supported the finding that taxonomically diverse gut organisms maintain a conserved core set of genes, our findings also suggest that the variable microbiome is more abundant than previously anticipated. For example, of the 160 functional SEED Subsystems (Level 1), DNA repair/recombination and antibiotic resistance subsystems were the 5th and 6th most abundant functions within all endobiotic microbiomes. Since the frequency of a gene encoding a particular metabolic function is usually related to its relative importance in an environment (Qu et al., 2008), transferable elements are most likely very important in shaping microbiome composition and diversity in gastrointestinal environments. When comparing prophage and transposon genes from each endobiotic microbiome, it was clear that pig distal gut microbiomes harbored an abundant and diverse array of horizontal gene transfer mechanisms. This finding from the MG-RAST pipeline was further supported by the IMG/M-ER annotation. When putative transposases for all available endobiotic metagenomes were retrieved using the JGI pipeline, the swine fecal metagenome harbored the most diverse transposase fingerprint, with 26 different transposase families (Fig. 5.9). The importance of transposable elements was further supported by the fact that 42% of large contigs (>500 bp) assembled from all pig fecal metagenomic reads matched putative transposases (Table 5.8). Additionally, 24% of all large contigs matched proteins related to antibiotic resistance mechanisms. These results suggest that lateral gene transfer and mobile elements allow gut microbial populations to perpetually change their cell surface for sensing their environment and collecting nutrient resources present in the distal intestine (Ley et al., 2008). Interestingly, a majority of these transposable elements belonged to Bacteroidetes genomes, and Ley et al. (2006) have shown that these genetic elements aid in the adaptation of this diverse group of bacteria to distal gut environments. Many of the genetic features unique to the swine fecal metagenome also encoded cell surface features of different Bacteroidetes populations, suggesting the adaptation of Bacteroidetes populations to distinct niches within the swine distal gut microbiome.

While the role of diet, antibiotic usage, and genetics in shaping the ecology of the distal pig gut needs further study, industrialization of the swine industry has led to the standardization of diet and antibiotic usage for optimization of meat production. Altogether, these findings suggest that genes coding for Bacteroidetes cell surface features may represent a large pool of potential swine fecal source-specific markers. The development of swine fecal source-specific markers is imperative for the assessment and protection of food and water safety.

When analyzing proteins involved in the cell wall and capsule subsystem, comparative metagenomics revealed several glycosyl transferases and carbohydrate uptake systems (Table 5.6). This unique pool of glycosyltransferases provides a capacity for diversification of surface polysaccharide structures, helping shape the microbiota and allowing for the presence of various food and other environmental antigens. Because the environment surrounding gut microbes can vary between and among host species, a direct result of this diversification is the generation of swine-specific microbiomes. It has been suggested that acquisition of new types of carbohydrate binding proteins, transporters, and degradation enzymes through horizontal gene transfer allows a wider array of substrates to be utilized for energy harvest (Ley et al., 2008). Pfams and COGs related to virulence factors such as adhesins were numerous within the gene families unique to the swine fecal metagenomes (Table 5.7). Proteins involved in carbohydrate transport and attachment were both abundant and unique to the distal swine gut, with more than 50 metagenomic fragments sharing high sequence homology to putative carbohydrate membrane transporters. Additionally, other proteins involved in carbohydrate metabolism were unique to the swine metagenome, including glycosyl hydrolases, cellobiohydrolases, gluconolactonases, maltodextrin metabolism proteins, and pectin lyases. Many of the proteins unique to the swine fecal metagenome had unknown functions. These unique gene families provide another line of evidence that the variable microbiome is a result of microbial interaction with the surrounding environment.

Studying the swine distal gut metagenome also shed light on the diversity and high occurrence of antibiotic resistance mechanisms employed by the microbiome (Fig. 5.11). Antibiotics are widely used as additives in food or water within swine feeding operations to prevent and treat animal disease and to promote animal growth (Carlson and Fangman, 2000). Since antibiotics are commonly excreted in urine and feces, the swine gut is a probable hotspot for commensal and pathogenic bacteria to develop mechanisms for resistance to these antibiotics. Seepage and runoff of swine waste carrying antibiotics and antibiotic-resistant bacteria into both surface water and groundwater pose a significant threat to public health. Nearly six percent of all assigned metagenomic reads retrieved from both swine fecal metagenomes were involved in antibiotic resistance mechanisms. Interestingly, tetracycline resistance was the most abundant class of virulence subsystems within the swine fecal metagenome, which may be explained by the fact that this antibiotic class is reported to comprise nearly half of the total amount of antibiotics used in commercial swine operations (USDA, 2001).

Perhaps the most striking finding is the diversity of resistance mechanisms retrieved from the swine fecal microbiome. Resistance to fluoroquinolones was also well represented in the swine fecal metagenome, and may be explained by an increase in non-therapeutic use of fluoroquinolones within pig feed. While initial studies indicated there was a low risk for fluoroquinolone resistance to develop, new studies from several countries are showing that the use of fluoroquinolones was the most important factor associated with finding resistant E. coli and Campylobacter on farms with a history of use (Taylor et al., 2009). Additionally, fluoroquinolone resistance was found on farms with no history of fluoroquinolone use, suggesting that resistant organisms, such as Campylobacter, have the ability to spread between pig farms. It should also be noted that genes were retrieved with high sequence similarity to the methicillin-resistant Staphylococcus subsystem. This finding is important considering that MRSA carriage has been elevated in swine and in exposed farmers and veterinarians (van den Broek et al., 2009), suggesting MRSA is a significant risk in swine farm resident and worker cohorts.

More than 12% of virulence subsystems identified in the pig fecal metagenome were classified as multi-drug resistance mechanisms, suggesting the pig gut could be a hot-spot for multiple-antibiotic resistant bacteria. One subsystem, the MexA-MexB-OprM multiple drug efflux pump, was found exclusively in the swine fecal metagenome. This antibiotic resistance mechanism has only been described in Pseudomonas aeruginosa strains, known to carry resistance in cystic fibrosis patients, and has not been described in endobiotic distal gut environments. Additionally, more than 10% of virulence-associated sequences were assigned to previously undescribed virulence subsystems, suggesting that unknown virulence mechanisms are at work within the distal gut. Altogether, the high abundance of metagenomic sequences assigned to known and unknown antibiotic resistance subsystems suggests that functional metagenomics is a possible tool for surveying and predicting the prevalence of antibiotic resistance within high cell density environments. Moreover, the availability of longer read sequencing technologies will allow for more complete coverage of these resistance gene cassettes, which will aid in the development of strategies that may inhibit antibiotic resistance gene transfer.

In this study we applied the first random-sample pyrosequencing approach to the complex microbiome of the distal pig gut. The overall goal of this study was to characterize the swine fecal microbiome with respect to species composition and functional content. In order to better understand the relationship between the swine gut microbiome and other endobiotic environments, we took a large-scale comparative metagenomics approach in the context of phylogenetic and functional composition. The present study demonstrates that pyrosequencing is a useful approach for comparing the gene content of complex gut microbiomes. This study also showed that a comparative metagenomic approach was useful in identifying unique and/or overabundant taxonomic and functional elements within swine distal gut microbiomes. It also appears that the genes associated with the variable portion of endobiotic microbiomes cluster by host environment with surprising hierarchical trends. This study suggests that the variable microbiome content of a given host species exhibits plasticity reflective of host ecology.


Table 5.8. Summary of BLASTX results of pig fecal assembled contigs

Contig Name | Contig Length | Number of Reads | Predicted Protein | Organism | Accession Number | E-value | Percent Identity
contig09884 | 1444 | 159 | hypothetical protein | Bacteroides fragilis | BAA95637 | 0 | 99%
contig00095 | 646 | 22 | tetracycline resistant protein TetQ | Bacteroides sp. D1 | ZP_04543830 | 2.00E-111 | 99%
contig01271 | 812 | 22 | tetracycline resistance protein | Prevotella intermedia | AAB51122 | 3.00E-102 | 98%
contig01956 | 731 | 17 | macrolide-efflux protein | Faecalibacterium prausnitzii A2-165 | ZP_05613628 | 3.00E-85 | 99%
contig01189 | 549 | 14 | macrolide-efflux protein | Bacteroides finegoldii DSM 17565 | ZP_05859238 | 8.00E-83 | 98%
contig00070 | 603 | 11 | rRNA (guanine-N1-)-methyltransferase | Faecalibacterium prausnitzii A2-165 | ZP_05614052 | 2.00E-81 | 100%
contig07794 | 846 | 27 | putative transposase | Bacteroides fragilis | AAA22911 | 4.00E-81 | 98%
contig03360 | 671 | 10 | ABC transporter, ATP-binding protein | Bacillus thuringiensis serovar pondicheriensis BGSC 4BA1 | ZP_04090641 | 8.00E-77 | 77%
contig09748 | 650 | 13 | hypothetical protein PRABACTJOHN_03572 | Parabacteroides johnsonii DSM 18315 | ZP_03477882 | 9.00E-71 | 77%
contig00180 | 846 | 26 | macrolide-efflux protein | Faecalibacterium prausnitzii A2-165 | ZP_05613628 | 6.00E-67 | 90%
contig00608 | 527 | 7 | ISPg3, transposase | Prevotella tannerae ATCC 51259 | ZP_05734821 | 1.00E-59 | 67%
contig04843 | 578 | 7 | hypothetical protein COPEUT_02459 | Coprococcus eutactus ATCC 27759 | ZP_02207638 | 2.00E-57 | 88%
contig00340 | 847 | 24 | conserved hypothetical protein | Bacteroides sp. 4_3_47FAA | ZP_05257903 | 6.00E-56 | 72%
contig02245 | 616 | 7 | putative transposase | Bacteroides thetaiotaomicron VPI-5482 | NP_809147 | 3.00E-52 | 62%
contig09776 | 531 | 9 | resolvase, N domain protein | Faecalibacterium prausnitzii A2-165 | ZP_05613620 | 5.00E-41 | 100%
contig02310 | 557 | 11 | replication initiator protein A | Faecalibacterium prausnitzii A2-165 | ZP_05613624 | 1.00E-38 | 100%
contig02075 | 524 | 9 | transposase | Bacteroides fragilis 3_1_12 | ZP_05284372 | 7.00E-38 | 92%
contig02837 | 529 | 7 | hypothetical protein CLOSS21_01510 | Clostridium sp. SS2/1 | ZP_02439046 | 6.00E-37 | 67%
contig09732 | 632 | 11 | hypothetical protein BACCOP_00975 | Bacteroides coprocola DSM 17136 | ZP_03009123 | 1.00E-35 | 62%
contig09862 | 574 | 16 | conserved hypothetical protein | Oxalobacter formigenes HOxBLS | ZP_04576182 | 1.00E-34 | 100%
contig00069 | 897 | 21 | regulatory protein | Sphingobacterium spiritivorum ATCC 33300 | ZP_03965851 | 4.00E-29 | 43%
contig00129 | 529 | 9 | transposase, putative | Bacteroides sp. 2_1_7 | ZP_05288481 | 8.00E-26 | 75%
contig00130 | 674 | 11 | hypothetical protein BACCOP_00975 | Bacteroides coprocola DSM 17136 | ZP_03009123 | 6.00E-24 | 43%
contig09924 | 1355 | 55 | conserved hypothetical protein | Magnetospirillum gryphiswaldense MSR-1 | CAJ30045 | 2.00E-23 | 45%
contig00140 | 552 | 13 | ISPg7, transposase | Cyanothece sp. PCC 8802 | YP_003135760 | 5.00E-23 | 44%
contig00572 | 675 | 16 | transposase, putative | Bacteroides sp. 2_1_7 | ZP_05288481 | 2.00E-21 | 57%
contig09792 | 556 | 9 | hypothetical protein ALIPUT_01364 | Alistipes putredinis DSM 17216 | ZP_02425220 | 2.00E-16 | 67%
contig09902 | 528 | 14 | putative transposase | Lentisphaera araneosa HTCC2155 | ZP_01873850 | 2.00E-12 | 63%
contig09796 | 867 | 17 | hypothetical protein CLONEX_03424 | Clostridium nexile DSM 1787 | ZP_03291203 | 3.00E-07 | 35%
contig01049 | 548 | 5 | No significant similarity found | - | - | - | -
contig04775 | 565 | 4 | No significant similarity found | - | - | - | -
contig09740 | 531 | 7 | No significant similarity found | - | - | - | -
contig09927 | 656 | 29 | No significant similarity found | - | - | - | -


Fig. 5.8. Comparison of SEED subsystem functional composition of endobiotic microbiomes. The percent of reads in each Level 1 SEED subsystem from the currently available endobiotic metagenomes is shown. The BLASTX cutoff was 1×10⁻⁵ with a 30 base pair minimum alignment length.


Fig. 5.9. Transposases derived from endobiotic metagenomes using IMG 2.8.


Fig. 5.10. Two-way hierarchical clustering of (A) COG and (B) Pfam normalized abundance.


[Figure 5.11: pie chart of subsystems within the Resistance to Antibiotics and Toxic Compounds category; the percentage labels could not be reliably matched to categories in the extracted text. Legend categories: Tetracycline resistance, ribosome protection type; Streptothricin resistance; Integrons; Streptococcus pyogenes virulence regulators; Acriflavin resistance cluster; Zinc resistance; Resistance to fluoroquinolones; Resistance to Vancomycin; Methicillin resistance in Staphylococci; Mercuric reductase; USS-DB-6; Vibrio Polysaccharide (VPS) Biosynthesis; MexA-MexB-OprM Multidrug Efflux System; USS-DB-5; Multidrug Resistance, Tripartite Systems Found in Gram Negative Bacteria; Cobalt-zinc-cadmium resistance; Arsenic resistance; Multidrug Resistance Efflux Pumps; Beta-lactamase; MexE-MexF-OprN Multidrug Efflux System; Streptococcus pneumoniae Vancomycin Tolerance Locus; Multiple Antibiotic Resistance MAR locus; Mercury resistance operon; USS-DB-4; The mdtABCD multidrug resistance cluster; Multidrug Resistance, 2-protein version Found in Gram-positive bacteria; Aminoglycoside adenylyltransferases; Streptococcal Mga Regulon; USS-DB-1; Multidrug efflux pump in Campylobacter jejuni (CmeABC operon); Teicoplanin-resistance in Staphylococcus; USS-DB-2; Tolerance to colicin E2.]

Fig. 5.11. Resistance to Antibiotics and Toxic Compounds SEED Subsystem composition of swine fecal microbiomes. The BLASTX cutoff was 1×10⁻⁵ with a 50 base pair minimum alignment length.


CHAPTER 6

Crystal Ball: The future of Microbial Source Tracking

Several source-specific biological markers have been targeted for microbial source tracking (MST) purposes, yet very few of these assays have offered utility within watershed-based studies. In order for MST to work within environmental scenarios and to be used for regulatory purposes, studies need to focus on scrutinizing host-specificity and environmental detection of potential source-tracking markers before they are made publicly available. This dissertation offers a method for both designing and scrutinizing molecular targets by comprehensively studying the diversity of Bacteroidales 16S rRNA gene targets within a variety of fecal source and environmental samples. Deeply probing the diversity of these molecular targets in fecal and environmental scenarios shed light on Bacteroidales population distribution and abundance within fecal source and environmental matrices. For example, populations previously thought to have host-specific distributions were unveiled as populations shared by multiple hosts. Studying the molecular diversity of this target also showed that, at the 16S rRNA gene level, several swine-specific Bacteroidales targets existed at high levels of coverage; however, these populations exist in low abundances. Additionally, abundances of the swine-specific populations varied in their distribution within swine fecal sources. Altogether, these findings suggest that Bacteroidales 16S rRNA gene source-specific targets exist; however, further research is needed to understand their relative abundances within several fecal source targets and their fate and transport into the environment if these markers are to be quantitative in nature. With the advent of pyrosequencing and other next-generation sequencing technologies, we can more comprehensively study the diversity of total Bacteroidales, which will allow for estimation of abundances of targeted host-specific subgroups of this vastly diverse group of fecal bacteria.

In addition to 16S rRNA gene-based source tracking markers, future assays should focus on the development of functional gene-based targets. Protein-encoding genes offer a larger pool of potentially more host-specific markers. This dissertation highlighted several functional genes showing specificity to the swine fecal metagenome, and many of these genes coded for proteins associated with the microbial cell surface. These genes are interesting from a source tracking standpoint as they are thought to be more involved in host/environmental-microbial interactions. Since these proteins are in constant contact with the swine gut environment, portions of these proteins may have a faster rate of evolution, giving the bacteria a selective advantage to exploit a sub-niche within the complex gut environment. Considering that domesticated swine operations are becoming larger and more centralized, and that diets and antibiotics consist of carefully calculated, commercially available feeds, the inter-individual variability in the gut environment of swine may be very low. These increasingly standardized conditions may result in protein-encoding markers that could be more widely applicable. However, considering that diet and antibiotic regimens will likely continue to evolve to more efficiently produce higher quality pork, we must consider that MST will also be a constantly evolving scientific field.

16S rRNA genes and protein-encoding genes offer a large group of potentially host-specific genetic markers useful for microbial source tracking. In order for these markers to become useful fecal source identification tools, they need to be used in unison. Additionally, future markers should focus on targeting source-specific pathogenic markers in order to more accurately predict human health risks. Given that array-based technologies have dropped significantly in cost and have low limits of detection, I envision a microbial source tracking array with both pathogen and fecal indicator genetic targets. For example, the current version of the PhyloChip simultaneously tests for 10,000 different bacterial types for approximately $150. The current state of microbial source tracking needs to shift from development of one or two markers to development of hundreds or thousands of source-specific markers. Additionally, chip-based technologies could also be used to evaluate markers more rapidly. In order for fecal source tracking to move from bench-top validation to a useful product for the end-user, we need to use the best available technologies more efficiently. Using chip-based technologies will allow for simultaneous detection and quantification of these targets, which will help counties, states, and regions more rapidly and accurately assess fecal pollution sources in environmental scenarios and ultimately help meet water quality standards.


APPENDICES

Table S1. Description of fecal samples used in this study

Animal Type | Location | Sample Size | Type of operation(s) | Size range of operation(s) | Range of Diet | Number of Locations Sampled
Alpaca | Berkeley County, WV | 2 | - | - | - | -
Beef Cattle | Berkeley County, WV | 14 | small beef farm (1) | 25 beef cattle | hay, grass | 1
Canadian Goose | Berkeley County, WV | 20 | - | - | - | -
Chicken | Berkeley County, WV | 29 | Small dairy farm (2), small family farm (2) | 10-75 chickens | commercial feed, cracked corn, scratch | 4
Dairy Cattle | Berkeley County, WV | 14 | Small commercial dairy farm (3) | 75-200 dairy cattle | hay, grain | 3
Domestic Cat | Berkeley County, WV | 10 | Houses (3) | 4-6 cats | commercial feed | 3
Domestic Dog | Berkeley County, WV | 15 | Houses (4) | 1-4 dogs | commercial feed | -
Ferret | Berkeley County, WV | 1 | House (1) | - | - | -
Goat | Berkeley County, WV | 4 | - | - | - | -
Guinea Pig | Berkeley County, WV | 1 | House (1) | - | - | -
Hedgehog | Berkeley County, WV | 1 | House (1) | - | - | -
Horse | Berkeley County, WV | 11 | small dairy farm (2), family hobby (1) | 2-5 horses | hay, grass, sweet feed | 3
Human | Berkeley County, WV | 19 | households (2), workplace (1) | - | - | 2
Llama | Berkeley County, WV | 1 | small dairy farm | 1 llama | Corn, alfalfa/hay mix | 1
Peacock | Berkeley County, WV | 1 | family hobby | 4 peacocks | commercial feed | 1
Pig | Berkeley County, WV | 24 | small pig farm (1), large pig farm (1) | 55-300 swine | Wheat, soy, barley mix, hay | 2
Pigeon | Berkeley County, WV | 4 | - | - | - | -
Prairie Dog | Berkeley County, WV | 2 | - | - | - | -
Rabbit | Berkeley County, WV | 3 | - | - | - | -
Whitetail Deer | Berkeley County, WV | 15 | - | - | - | -
Pig | Loudonville, OH | 19 | small pig farm (2), large pig farm (1) | - | corn, commercial feed, scraps | 3
Whitetail Deer | Waco, TX | 2 | - | - | - | 2
Coyote | Waco, TX | 11 | - | - | - | 3
Raccoon | Waco, TX | 1 | - | - | - | 1
Squirrel, Grey | Waco, TX | 4 | - | - | - | 2
Armadillo | Waco, TX | 1 | - | - | - | 1
Turkey, Wild | Waco, TX | 2 | - | - | - | 1
Turkey, Wild | West Virginia | 8 | - | - | - | 1
Horse | Waco, TX | 1 | Ranch (1) | - | - | 1
Horse | Plum Creek, NE | 4 | Ranch (1) | - | - | 1
Vulture | Waco, TX | 1 | - | - | - | 1
Hog, Feral | Waco, TX | 1 | Ranch (1) | - | - | 1
Rabbit, Jack | Waco, TX | 1 | - | - | - | 1
Bobcat | Waco, TX | 1 | - | - | - | 1
Fox, Grey | Waco, TX | 1 | - | - | - | 1
Raccoon | Waco, TX | 2 | - | - | - | 2
Septic Tanks | Plum Creek, NE | 9 | Houses (9) | 2-10 people per house | variable | 9
Sheep | Delaware | 6 | grazing farm | - | - | 1
Dove | Plum Creek, NE | 1 | - | - | - | 1
Possum | Waco, TX | 2 | - | - | - | -


Table S2. Description of wastewater treatment plants sampled in this study

WWTP Sample Type | Area Served | Sample Size | Volume (ml) | Average Daily Flow, Million Gallons per Day (MGD) | Wastewater type | Treatment Technology | Effluent Discharge
Northwest WWTP Effluent | North and West El Paso, TX | Triplicate | 75 | 17.5 MGD | Residential, Industrial | Extended aeration activated sludge, air scrubbers for odor control, UV disinfection of effluent | Rio Grande
Bustamante WWTP Effluent | South, East, and Lower Valley El Paso | Duplicate | 75 | 39 MGD | Residential, Industrial | Extended aeration activated sludge, air scrubbers for odor control | Rio Grande
Dry Creek WWTP Effluent | Northern Kentucky (Boone, Campbell, Kenton County) | Single | 100 | 33 MGD | Residential, Industrial, commercial | Conventional activated sludge, chlorination for odor control | Ohio River
Dry Creek WWTP Influent | | Duplicate | 40 | | | |
Dry Creek WWTP Return Activated Sludge | | Duplicate | 40 | | | |
Dry Creek WWTP Secondary Aeration | | Duplicate | 40 | | | |


Table S3. PCR results using general Bacteroidales and swine-associated markers for individual water DNA extracts

Markers: General Bacteroidales (Bac32); Swine-associated Prevotella (PigBac1); Swine-associated Prevotella (PF163); Swine-associated Methanogens (P23-2)

Water Sampling Site* | DNA template concentration: 0.2 ng | 1 ng | 1 ng | 10 ng | 50 ng | 10 ng
A-2 | - | + | - | - | - | -
A-4 | +(a) | - | +(b) | - | +(b) | +
A-5 | + | + | - | - | - | -
A-6 | + | + | - | - | - | -
A-9 | + | + | - | +(a) | - | +
A-11 | - | - | - | + | - | -
A-13 | + | + | - | + | - | -
A-14 | + | + | - | - | - | -
A-Facility Well | - | + | - | - | - | -
A-Lagoon | + | - | - | + | - | +
A-North Stream | + | + | - | - | - | -
A-North Tile | + | - | - | - | - | -
Percent Positive (Site A) | 45 | 40 | 5 | 20 | 5 | 15
C-2 | + | - | - | + | + | +
C-3 | +(b) | - | + | - | + | -
C-4 | - | - | - | + | - | -
C-7 | + | + | + | - | - | -
C-Lagoon | + | + | + | + | - | +
C-seep from field | + | +(b) | - | + | - | -
C-South Stream | + | + | + | + | - | -
Percent Positive (Site C) | 78 | 44 | 44 | 56 | 22 | 22
E-1 | + | - | - | + | - | -
E-2 | + | + | + | - | - | -
E-7 | +(b) | + | + | + | - | -
E-8 | + | + | + | + | - | -
E-9 | - | + | - | + | - | -
E-10 | + | + | +(a) | - | - | -
E-12 | - | + | - | - | - | -
E-Manure Pit | + | + | - | + | - | +
Percent Positive (Site E) | 46 | 54 | 31 | 38 | 0 | 8
Positive Control | + | + | + | + | + | +
Positive Control | + | + | + | + | + | -
Total Percent Positive Illinois (n=42) | 52 | 45 | 21 | 33 | 7 | 14
Lake Granbury 18015 | +/+(c) | nd(d) | nd | +/+ | - | +
Lake Granbury 11861 | +/+ | nd | nd | +(b)/- | - | +
Lake Granbury 18018 | +/+ | nd | nd | -/- | + | +
Lake Granbury 70015 | +/+ | nd | nd | +/- | - | +
Lake Granbury 18038 | +/+ | nd | nd | +/+ | - | +
Percent Positive (Lake Granbury) | 100 | nd | nd | 60 | 20 | 100
Buck Creek Site 04 | +/+ | nd | nd | +/- | - | -
Buck Creek Site 05 | +/+ | nd | nd | +/- | - | -
Buck Creek Site 10A | +/+ | nd | nd | +/+ | - | -
Buck Creek Site 10C | +/+ | nd | nd | +/- | - | -
Buck Creek Site 11 | +/+ | nd | nd | +(b)/- | - | -
Percent Positive (Buck Creek) | 100 | nd | nd | 60 | 0 | 0

(a) PCR generated multiple bands, one of which was of the correct size.

(b) Low intensity PCR product.

(c) “-/-” or “+/+” indicates that both duplicate samples produced either negative or positive PCR results. “+/-” indicates that only one of the duplicate samples produced a positive signal.

(d) Not determined.


Pig Feces-Pig manure Pit-Pig Lagoon- Pig Feces-Pig Manure Pit-Water, pig- Pig Feces-Pig manure Pit-Pig Lagoon Water, Singapore impacted Pig Lagoon-Water, pig-impacted OHPig107_696_bp__dna|pig|888 OHPig084_694_bp__dna|pig|385 MP090_698_bp__dna|pigmp|471 PPGA094_692_bp__dna|piglagoon|618 MP211_696_bp__dna|pigmp|888 OHPig046_694_bp__dna|pig|385 OHpig035_700_bp__dna|pig|471 PPGA196_692_bp__dna|piglagoon|618 PPGA541_696_bp__dna|piglagoon|888 BSING010_694_bp__dna|waterSING|385 PWO033(3)_700_bp__dna|waterPIG|471 PPGA085_692_bp__dna|piglagoon|618 PPGA552_696_bp__dna|piglagoon|888 MP134_694_bp__dna|pigmp|385 MP201_689_bp__dna|pigmp|474 PPGA061_692_bp__dna|piglagoon|618 PPGA313_696_bp__dna|piglagoon|888 ASING026_694_bp__dna|waterSING|385 FieldAY695697.1_689_bp__dna|pig|474 PPGA060_692_bp__dna|piglagoon|618 TMP009_694_bp__dna|pigmp|891 MP167_694_bp__dna|pigmp|389 FieldAY695695.1_688_bp__dna|pig|474 PPGA079_692_bp__dna|piglagoon|618 OHPig098_694_bp__dna|pig|891 MP029_694_bp__dna|pigmp|389 PWO058(2)_689_bp__dna|waterPIG|474 PPGA013_692_bp__dna|piglagoon|618 PPGA444_694_bp__dna|piglagoon|891 OHFECAL065_694_bp__dna|pig|389 OkabeAB237871.1_695_bp__dna|pig|483 PPGA042_692_bp__dna|piglagoon|618 PPGA377_694_bp__dna|piglagoon|891 MP034_694_bp__dna|pigmp|389 HanjimaAB237871.1_695_bp__dna|pig|483 PPGA106_692_bp__dna|piglagoon|618 PPGA331_694_bp__dna|piglagoon|891 ASING030_694_bp__dna|waterSING|389 WPGA019_694_bp__dna|waterPIG|483 PPGA010_692_bp__dna|piglagoon|618 PPGA328_694_bp__dna|piglagoon|891 OHPig049_696_bp__dna|pig|394 MP316_694_bp__dna|pigmp|483 PPGA048_692_bp__dna|piglagoon|618 PPGA431_694_bp__dna|piglagoon|891 MP168_696_bp__dna|pigmp|394 MP057_694_bp__dna|pigmp|483 PPGA180_692_bp__dna|piglagoon|618 PPGA381_694_bp__dna|piglagoon|891 BSING001_696_bp__dna|waterSING|394 OHPig118_694_bp__dna|pig|483 PPGA179_692_bp__dna|piglagoon|618 PPGA325_694_bp__dna|piglagoon|891 MP059_694_bp__dna|pigmp|417 MP219_694_bp__dna|pigmp|483 PPGA162_692_bp__dna|piglagoon|618 MP238_694_bp__dna|pigmp|909 
MP273_695_bp__dna|pigmp|417 MP225_694_bp__dna|pigmp|483 PPGA057_692_bp__dna|piglagoon|618 TMP028_694_bp__dna|pigmp|909 MP258_695_bp__dna|pigmp|417 PWO323_694_bp__dna|waterPIG|483 PPGA058_692_bp__dna|piglagoon|618 MP146_694_bp__dna|pigmp|909 OHPig090_694_bp__dna|pig|417 PWO087_696_bp__dna|waterPIG|483 PPGA056_692_bp__dna|piglagoon|618 OHPig170_694_bp__dna|pig|909 MP144_694_bp__dna|pigmp|417 MP103_694_bp__dna|pigmp|540 PPGA250_692_bp__dna|piglagoon|618 MP019_694_bp__dna|pigmp|909 BSING036_694_bp__dna|waterSING|417 OHPig079_694_bp__dna|pig|540 PPGA031_692_bp__dna|piglagoon|618 OHPig169_694_bp__dna|pig|909 PWO377_694_bp__dna|waterPIG|540 PPGA158_692_bp__dna|piglagoon|618 TMP065_694_bp__dna|pigmp|909 Pig Feces-Water, pig-impacted OHFECAL093_694_bp__dna|pig|540 PPGA009_692_bp__dna|piglagoon|618 OHPig082_694_bp__dna|pig|909 OHPig083_694_bp__dna|pig|499 PWO230_694_bp__dna|waterPIG|540 PPGA172_692_bp__dna|piglagoon|618 PPGA528_694_bp__dna|piglagoon|909 PWO128_694_bp__dna|waterPIG|499 MP119_694_bp__dna|pigmp|565 PPGA099_692_bp__dna|piglagoon|618 PPGA453_694_bp__dna|piglagoon|909 OHPig072_689_bp__dna|pig|528 MP266_694_bp__dna|pigmp|565 PPGA006_692_bp__dna|piglagoon|618 PPGA098_696_bp__dna|piglagoon|930 OHpig038_689_bp__dna|pig|528 MP208_694_bp__dna|pigmp|565 PPGA230_692_bp__dna|piglagoon|618 OHFECAL080_696_bp__dna|pig|930 OHpig030a_689_bp__dna|pig|528 OHPig145_694_bp__dna|pig|565 PPGA044_692_bp__dna|piglagoon|618 TMP046_696_bp__dna|pigmp|930 OHPig116_689_bp__dna|pig|528 PWO296_694_bp__dna|waterPIG|565 PPGA233_692_bp__dna|piglagoon|618 TMP058_696_bp__dna|pigmp|930 PWO200_689_bp__dna|waterPIG|528 OHPig078_694_bp__dna|pig|576 PPGA091_692_bp__dna|piglagoon|618 MP271_696_bp__dna|pigmp|930 OHPig109_694_bp__dna|pig|552 TMP052_694_bp__dna|pigmp|576 PPGA144_692_bp__dna|piglagoon|618 PPGA036_696_bp__dna|piglagoon|930 PWO262_694_bp__dna|waterPIG|552 PWO324_694_bp__dna|waterPIG|576 PPGA087_692_bp__dna|piglagoon|618 PPGA209_696_bp__dna|piglagoon|930 OHFECAL086_696_bp__dna|pig|561 
MP228_694_bp__dna|pigmp|671 PPGA208_692_bp__dna|piglagoon|618 PPGA216_696_bp__dna|piglagoon|930 PWO282_696_bp__dna|waterPIG|561 FieldAY695694.1_694_bp__dna|pig|671 PPGA038_692_bp__dna|piglagoon|618 PPGA171_696_bp__dna|piglagoon|930 OHPig048_694_bp__dna|pig|611 PWO682_694_bp__dna|waterPIG|671 PPGA234_692_bp__dna|piglagoon|618 PPGA115_696_bp__dna|piglagoon|930 PWO433_694_bp__dna|waterPIG|611 PWO677_694_bp__dna|waterPIG|671 PPGA002_692_bp__dna|piglagoon|618 PPGA117_696_bp__dna|piglagoon|930 OHFECAL042_694_bp__dna|pig|628 WPGA008_694_bp__dna|waterPIG|975 PWO453_692_bp__dna|waterPIG|618 PPGA066_696_bp__dna|piglagoon|930 PWO494_692_bp__dna|waterPIG|628 MP189_694_bp__dna|pigmp|975 PPGA259_689_bp__dna|piglagoon|881 PPGA101_696_bp__dna|piglagoon|930 OHPig138_696_bp__dna|pig|659 MP123_694_bp__dna|pigmp|975 WPGA038_689_bp__dna|waterPIG|881 PPGA055_696_bp__dna|piglagoon|930 PWO598_696_bp__dna|waterPIG|659 MP114_694_bp__dna|pigmp|975 WPGA067_689_bp__dna|waterPIG|881 PPGA049_696_bp_ OHPig132_694_bp__dna|pig|975 PPGA258_689_bp__dna|piglagoon|881 WPGA020_690_bp__dna|waterPIG|882 Pig manure pit- Water, pig-impacted Pig manure pit-Water, Singapore Pig Lagoon-Septic Tank WPGA054_690_bp__dna|waterPIG|882 MP124_694_bp__dna|pigmp|570 MP111_694_bp__dna|pigmp|374 LGIL019_691_bp__dna|piglagoon|718 PPGA342_690_bp__dna|piglagoon|882 PWO308_694_bp__dna|waterPIG|570 MP281_694_bp__dna|pigmp|374 Contig0278_Septic_(1)_691_bp__dna|septic|718 PPGA267_690_bp__dna|piglagoon|882 PWO659_696_bp__dna|waterPIG|606 ASING012_694_bp__dna|waterSING|374 PPGA278_686_bp__dna|piglagoon|883 Pig Feces-Pig manure Pit-Water, pig- PWO642_696_bp__dna|waterPIG|606 MP213_694_bp__dna|pigmp|414 impacted WPGA043_686_bp__dna|waterPIG|883 MP003_696_bp__dna|pigmp|606 MP031_694_bp__dna|pigmp|414 no OTUs shared WPGA040_686_bp__dna|waterPIG|883

MP291_696_bp__dna|pigmp|606 MP015_694_bp__dna|pigmp|414 WPGA018_686_bp__dna|waterPIG|883

PWO422_696_bp__dna|waterPIG|606 BSING027_694_bp__dna|waterSING|414 WPGA035_686_bp__dna|waterPIG|883

MP227_691_bp__dna|pigmp|664 Pig Lagoon-Cattle Feces WPGA017_686_bp__dna|waterPIG|883

PWO606_687_bp__dna|waterPIG|664 PPGA372_690_bp__dna|piglagoon|231 WPGA004_686_bp__dna|waterPIG|883

AY597140.1TN_690_bp__dna|cattle|231 PPGA277_686_bp__dna|piglagoon|883

NECattle019_690_bp__dna|cattle|231 WPGA027_696_bp__dna|waterPIG|884

PPGA263_690_bp__dna|piglagoon|265 PPGA289_696_bp__dna|piglagoon|884

NECattle127_690_bp__dna|cattle|265 WPGA007_694_bp__dna|waterPIG|890

NECattle062_690_bp__dna|cattle|265 WPGA049_694_bp__dna|waterPIG|890

PPGA396_690_bp__dna|piglagoon|268 WPGA021_695_bp__dna|waterPIG|890

NECattle122_690_bp__dna|cattle|268 WPGA046_694_bp__dna|waterPIG|890 NECattle167_690_bp__dna|cattle|268 WPGA030_693_bp__dna|waterPIG|890 NECattle069_690_bp__dna|cattle|268 WPGA050_694_bp__dna|waterPIG|890 NECattle136_690_bp__dna|cattle|273 PPGA460_694_bp__dna|piglagoon|890 NECattle130_690_bp__dna|cattle|273 PPGA335_694_bp__dna|piglagoon|890 PPGA482_690_bp__dna|piglagoon|273 WPGA042_694_bp__dna|waterPIG|890 PPGA262_690_bp__dna|piglagoon|273 PPGA491_694_bp__dna|piglagoon|890 AY597167.1_690_bp__dna|cattle|273 WPGA060_694_bp__dna|waterPIG|890 NECattle076_690_bp__dna|cattle|273 PPGA496_694_bp__dna|piglagoon|890 PPGA439_694_bp__dna|piglagoon|274 WPGA013_694_bp__dna|waterPIG|890 AY597157.1TX_694_bp__dna|cattle|274 PPGA452_694_bp__dna|piglagoon|890 PPGA489_694_bp__dna|piglagoon|274 PPGA555_705_bp__dna|piglagoon|890 NECattle135_694_bp__dna|cattle|274 PPGA457_694_bp__dna|piglagoon|890 NECattle124_694_bp__dna|cattle|274 WPGA015_694_bp__dna|waterPIG|890 NECattle077_694_bp__dna|cattle|274 WPGA012_694_bp__dna|waterPIG|890


PPGA467_694_bp__dna|piglagoon|321 WPGA070_694_bp__dna|waterPIG|890 AY597169.1_694_bp__dna|cattle|321 WPGA051_694_bp__dna|waterPIG|890 AY597170.1_694_bp__dna|cattle|321 WPGA059_694_bp__dna|waterPIG|890 NECattle165_694_bp__dna|cattle|321 WPGA023_694_bp__dna|waterPIG|890 PPGA253_689_bp__dna|piglagoon|362 WPGA032_694_bp__dna|waterPIG|890 AY597156.1TX_689_bp__dna|cattle|362 WPGA009_694_bp__dna|waterPIG|890

Table S4. Bacteroidales 16S rRNA sequences shared between two or more libraries.
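The entries in Table S4 can be grouped programmatically. The following is a minimal sketch, assuming that each label follows the pattern `name_length_bp__dna|library|OTU` (an interpretation inferred from the entries themselves, not stated in the table) and that the final field is the OTU identifier. The function names `parse_label` and `shared_otus` are hypothetical.

```python
import re
from collections import defaultdict

# Assumed label layout: <name>_<length>_bp__dna|<library>|<OTU id>
LABEL_PATTERN = re.compile(r"^(?P<name>.+)_(?P<length>\d+)_bp__dna$")

def parse_label(label):
    """Return (name, length, library, otu) parsed from one Table S4 style label."""
    name_part, library, otu = label.split("|")
    match = LABEL_PATTERN.match(name_part)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    return match.group("name"), int(match.group("length")), library, int(otu)

def shared_otus(labels):
    """Map each OTU id to its sorted libraries, keeping OTUs found in two or more."""
    libraries = defaultdict(set)
    for label in labels:
        _, _, library, otu = parse_label(label)
        libraries[otu].add(library)
    return {otu: sorted(found) for otu, found in libraries.items() if len(found) >= 2}

# A few labels copied from the table above.
labels = [
    "PPGA467_694_bp__dna|piglagoon|321",
    "AY597169.1_694_bp__dna|cattle|321",
    "PPGA253_689_bp__dna|piglagoon|362",
    "AY597156.1TX_689_bp__dna|cattle|362",
    "WPGA070_694_bp__dna|waterPIG|890",
]
print(shared_otus(labels))
```

In this small subset, OTUs 321 and 362 each contain sequences from both a pig lagoon library and a cattle library, whereas OTU 890 is represented by a single library and is therefore not reported as shared.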


Table S5. In silico testing of currently available swine-specific Bacteroidales markers


Table S6. Binomial test for comparing abundance of bacterial phyla from distal gut metagenomes.

Taxon | p-value | Statistical outcome

Cow Rumen vs Pig Feces
Firmicutes | 3.17E-05 | Less abundant in Pig Feces
Bacteroidetes | 1.39E-03 | More abundant in Pig Feces
Actinobacteria | 2.56E-03 | Less abundant in Pig Feces
Proteobacteria | 3.15E-02 | More abundant in Pig Feces
Spirochaetes | 3.68E-02 | More abundant in Pig Feces

Chicken Cecum vs Pig Feces
Spirochaetes | 2.78E-08 | More abundant in Pig Feces
Bacteroidetes | 1.83E-04 | Less abundant in Pig Feces
Proteobacteria | 4.98E-04 | More abundant in Pig Feces
Fibrobacteres | 1.30E-02 | More abundant in Pig Feces

Human Infant vs Pig Feces
Actinobacteria | 1.69E-57 | Less abundant in Pig Feces
Bacteroidetes | 5.32E-10 | More abundant in Pig Feces
Spirochaetes | 2.78E-02 | More abundant in Pig Feces

Human Adult vs Pig Feces
Actinobacteria | 7.37E-09 | Less abundant in Pig Feces
Bacteroidetes | 1.25E-03 | More abundant in Pig Feces
Proteobacteria | 3.72E-03 | Less abundant in Pig Feces
Spirochaetes | 1.44E-02 | More abundant in Pig Feces

Fish Gut vs Pig Feces
Proteobacteria | 8.95E-24 | Less abundant in Pig Feces
Bacteroidetes | 1.02E-10 | More abundant in Pig Feces

Termite Gut vs Pig Feces
Spirochaetes | 5.89E-25 | Less abundant in Pig Feces
Firmicutes | 1.12E-09 | More abundant in Pig Feces
Bacteroidetes | 1.10E-08 | More abundant in Pig Feces
Actinobacteria | 1.57E-05 | Less abundant in Pig Feces
Fibrobacteres | 1.15E-02 | Less abundant in Pig Feces
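To illustrate how p-values of this kind can be obtained, the sketch below implements one common formulation of a two-sided exact binomial test. It is an assumption that this formulation resembles the test behind Table S6; the table does not specify the exact procedure. Here the proportion of reads assigned to a taxon in the comparison metagenome is treated as the expected proportion, and the pig fecal counts are tested against it. All read counts and function names are hypothetical.

```python
import math

def log_binom_pmf(k, n, p):
    """Log probability of k successes in n trials with success rate p (0 < p < 1)."""
    log_coeff = math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
    return log_coeff + k * math.log(p) + (n - k) * math.log1p(-p)

def binomial_test(k, n, p0):
    """Two-sided exact p-value: total probability of outcomes no more likely than k."""
    log_observed = log_binom_pmf(k, n, p0)
    p_value = sum(
        math.exp(log_p)
        for log_p in (log_binom_pmf(i, n, p0) for i in range(n + 1))
        if log_p <= log_observed + 1e-9  # small tolerance for floating-point ties
    )
    return min(1.0, p_value)

def compare_to_pig(pig_hits, pig_total, other_hits, other_total):
    """Test pig fecal counts against the proportion seen in another metagenome."""
    p0 = other_hits / other_total
    p_value = binomial_test(pig_hits, pig_total, p0)
    direction = "More" if pig_hits / pig_total > p0 else "Less"
    return p_value, f"{direction} abundant in Pig Feces"

# Hypothetical counts: 300 of 1000 pig fecal reads vs 200 of 1000 comparison reads.
p_value, outcome = compare_to_pig(300, 1000, 200, 1000)
print(f"{p_value:.2E}", outcome)
```

Working in log space avoids numerical underflow when the number of reads is large. When many taxa are tested at once, a multiple-testing correction would normally be considered as well.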


Table S7. Binomial test for comparing abundance of bacterial genera from distal gut metagenomes.

Taxon | p-value | Statistical outcome

Cow Rumen vs Pig Feces
Butyrivibrio | 2.01E-03 | Less abundant in Pig Feces
Anaerovibrio | 2.89E-02 | More abundant in Pig Feces
Treponema | 1.64E-02 | More abundant in Pig Feces

Chicken Cecum vs Pig Feces
Prevotella | 4.69E-48 | More abundant in Pig Feces
Bacteroides | 5.89E-24 | Less abundant in Pig Feces
Treponema | 6.26E-03 | More abundant in Pig Feces
Anaerovibrio | 1.33E-02 | More abundant in Pig Feces
Lactobacillus | 1.33E-02 | Less abundant in Pig Feces
Sporobacter | 2.06E-02 | More abundant in Pig Feces

Human Infant vs Pig Feces
Bifidobacterium | 1.18E-28 | Less abundant in Pig Feces
Prevotella | 6.12E-05 | More abundant in Pig Feces
Treponema | 4.32E-02 | More abundant in Pig Feces
Bacteroides | 6.53E-04 | Less abundant in Pig Feces

Human Adult vs Pig Feces
Prevotella | 6.13E-03 | More abundant in Pig Feces
Bifidobacterium | 6.13E-03 | Less abundant in Pig Feces
Treponema | 1.63E-02 | More abundant in Pig Feces
Bacteroides | 1.57E-04 | Less abundant in Pig Feces

Termite Gut vs Pig Feces
Treponema | 1.69E-11 | More abundant in Pig Feces
Spirochaeta | 5.87E-10 | Less abundant in Pig Feces
Prevotella | 7.80E-03 | More abundant in Pig Feces

Fish Gut vs Pig Feces
Paenibacillus | 1.28E-09 | Less abundant in Pig Feces
Herbaspirillum | 1.87E-06 | Less abundant in Pig Feces
Ralstonia | 3.04E-03 | Less abundant in Pig Feces
Prevotella | 6.05E-03 | More abundant in Pig Feces
Treponema | 3.60E-02 | More abundant in Pig Feces


[Map labels: Ohio River; Upstream; Confluence; Downstream; Sites 1-6; Brush Creek; Twelvemile Creek; CSO]

Fig. S1. Map of Twelve-mile Creek and Ohio River (Area mapped using Enviromapper; http://www.epa.gov/enviro/emef/). Water samples were collected from six sites along the Twelve-mile Creek downstream of Brush Creek and three sites located on the Ohio River. The location of a combined sewer overflow is designated by “CSO”.


Fig. S2. Map of the Rio Grande/El Paso area (courtesy of El Paso Water Utilities). Samples were collected upstream of, at the effluent of, and downstream from the Northwest and Bustamante wastewater treatment plants. The Sunland Park and Courchesne samples were taken from the river just upstream from the Northwest plant in New Mexico.


[Panels: Site A; Site C; Site E]

Fig. S3. Site A, C, and E well locations (circles) and groundwater flow directions (arrows). Numbers in parentheses are well depths (m), and stratigraphic columns represent the geology of each site.


[Legend: swine-specific clade; cosmopolitan clade]

Fig. S4. Phylogenetic tree of 6,311 Bacteroidales 16S rRNA gene sequences derived from different animal hosts, and from water, lagoon, and manure samples, based on a neighbor-joining algorithm.


[Legend: swine-specific clades]

Fig. S5. Phylogenetic tree of 6,311 Bacteroidales 16S rRNA gene sequences derived from different animal hosts, and from water, lagoon, and manure samples, based on a neighbor-joining algorithm.


Fig. S6. Hierarchical clustering of all genes retrieved from the cell wall and capsule SEED subsystem for available endobiotic microbiomes. Hierarchical clustering using Bray-Curtis dissimilarities. Red-colored branch lengths indicate that no statistical difference in similarity profiles could be identified for these respective nodes.
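The Bray-Curtis dissimilarity underlying this clustering can be illustrated with a short sketch. The gene-abundance profiles below are hypothetical, and the code only builds the pairwise dissimilarity matrix that a hierarchical clustering routine would take as input; it does not reproduce the clustering or the significance testing shown in the figure.

```python
def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two abundance profiles of equal length."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0

def pairwise_dissimilarity(profiles):
    """Return {(name_i, name_j): dissimilarity} for every unordered pair of samples."""
    names = sorted(profiles)
    return {
        (a, b): bray_curtis(profiles[a], profiles[b])
        for i, a in enumerate(names)
        for b in names[i + 1:]
    }

# Hypothetical read counts for three cell wall and capsule genes per metagenome.
profiles = {
    "PigFeces": [6, 7, 4],
    "CowRumen": [10, 0, 6],
    "ChickenCecum": [6, 7, 5],
}
for pair, value in pairwise_dissimilarity(profiles).items():
    print(pair, round(value, 3))
```

A value of 0 means two profiles are identical and a value of 1 means they share no genes; samples joined at short branch lengths in a dendrogram are those with small dissimilarities.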


Fig. S7. Respiration SEED subsystem from endobiotic metagenomes.


Fig. S8. Lipid SEED subsystem from endobiotic metagenomes.


Fig. S9. Virulence SEED subsystem from endobiotic metagenomes.


Fig. S10. Principal Components Analysis of gut metagenomes using (A) COG and (B) Pfam abundance.


Fig. S11. Taxonomic distribution of bacterial orders from endobiotic microbiomes using RDP and Greengenes classification schemes.


Fig. S12. Hierarchical clustering of all genes retrieved from the cell wall and capsule SEED subsystem for available endobiotic microbiomes. Hierarchical clustering using Bray-Curtis dissimilarities. Red-colored branch lengths indicate that no statistical difference in similarity profiles could be identified for these respective nodes.


[Pie charts, panels A and B. Categories: Actinoplanes phage phiAsp2; Bacteriophage 11b; Bacteriophage ROSA; Bacteriophage phBC6A51; Bacteriophage phBC6A52; Lactobacillus plantarum bacteriophage phiJL-1; Retro-transcribing viruses; dsDNA viruses, no RNA stage; ssDNA viruses. Slice labels: 1%, 1%, 2%, 5%, 91%]

Fig. S13. Metagenomic fragments related to viruses for (A) the GS20 pig fecal metagenome and (B) the FLX pig fecal metagenome, using the SEED subsystem.

[Plot: x-axis, Number of 16S rDNA reads (0-900); y-axis, Number of OTUs (RDP genus level) (0-80); series: CowRumen, ChickenCecum, HumanInM, HumanInB, humanInA, HumanF1U, HumanInD, humanF2Y, HumanF2X, humanF1S, humanF2V, humanF2W, humanF1T, PigFecesFLX]

Fig. S14. Rarefaction curves of observed OTUs at the RDP genus level.
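A rarefaction curve of this kind can be computed analytically. The sketch below uses the standard expected-richness formula, E[S_n] = Σ_i [1 − C(N − N_i, n) / C(N, n)], where N is the total number of reads and N_i the number of reads assigned to OTU i. It is an assumption that the rarefaction software used for this figure implements this calculation in the same way, and the OTU counts shown are hypothetical.

```python
from math import comb

def rarefied_richness(counts, n):
    """Expected number of OTUs observed in a random subsample of n reads."""
    total_reads = sum(counts)
    if not 0 <= n <= total_reads:
        raise ValueError("subsample size must be between 0 and the total read count")
    all_subsamples = comb(total_reads, n)
    # comb(total_reads - c, n) counts subsamples that miss an OTU with c reads;
    # it is 0 when n exceeds total_reads - c, so that OTU is then certain to appear.
    return sum(1 - comb(total_reads - c, n) / all_subsamples for c in counts if c > 0)

def rarefaction_curve(counts, step):
    """Return (n, expected richness) points from 0 reads up to the full sample."""
    total_reads = sum(counts)
    sizes = list(range(0, total_reads, step)) + [total_reads]
    return [(n, rarefied_richness(counts, n)) for n in sizes]

# Hypothetical genus-level read counts for one metagenome.
otu_counts = [120, 60, 30, 10, 5, 2, 1, 1]
for n, richness in rarefaction_curve(otu_counts, 50):
    print(n, round(richness, 2))
```

A curve that flattens as n approaches the full sample size suggests that additional reads would reveal few new genera, whereas a curve that is still rising indicates undersampling.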


Fig. S15. k-dominance plots for gut metagenomes using RDP genus level taxonomy
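For readers unfamiliar with k-dominance plots, the sketch below shows how the plotted values are typically derived: taxa are ranked from most to least abundant, and the cumulative percentage of reads is recorded at each rank. The genus counts are hypothetical, and the function name `k_dominance` is chosen for illustration only.

```python
def k_dominance(counts):
    """Cumulative percent abundance of taxa ranked from most to least abundant."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    total = sum(ranked)
    curve = []
    running = 0
    for count in ranked:
        running += count
        curve.append(100.0 * running / total)
    return curve

# Hypothetical genus-level read counts for two metagenomes.
even_community = [25, 25, 25, 25]
dominated_community = [85, 5, 5, 5]
print(k_dominance(even_community))       # rises gradually with rank
print(k_dominance(dominated_community))  # starts high because one genus dominates
```

Curves that start high and plateau quickly indicate communities dominated by a few genera, while curves that rise slowly indicate a more even distribution.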


Fig. S16. Percent contribution of endobiotic metagenomes to each of 160 functional SEED Subsystems (Hierarchy Level 1). Distribution of metagenomic reads for each microbiome was normalized to account for metagenome size.
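The normalization mentioned in this caption can be sketched as follows. The caption does not state the exact procedure, so this example assumes the simplest approach: reads assigned to a subsystem are divided by the total number of reads in each metagenome, and the resulting proportions are then rescaled so that the contributions of all metagenomes to that subsystem sum to 100%. All counts and names are hypothetical.

```python
def percent_contribution(subsystem_reads, total_reads):
    """Percent contribution of each metagenome to one subsystem, adjusted for size."""
    # Step 1: convert raw read counts to proportions of each metagenome.
    normalized = {
        name: subsystem_reads[name] / total_reads[name] for name in subsystem_reads
    }
    # Step 2: rescale the proportions so they sum to 100 across metagenomes.
    denominator = sum(normalized.values())
    if denominator == 0:
        return {name: 0.0 for name in normalized}
    return {name: 100.0 * value / denominator for name, value in normalized.items()}

# Hypothetical read counts for one subsystem and hypothetical metagenome sizes.
subsystem_reads = {"PigFeces": 100, "CowRumen": 50, "TermiteGut": 500}
total_reads = {"PigFeces": 1000, "CowRumen": 1000, "TermiteGut": 10000}
for name, percent in percent_contribution(subsystem_reads, total_reads).items():
    print(name, round(percent, 1))
```

In this example the termite gut metagenome has the most reads assigned to the subsystem, but after accounting for its larger size its contribution is no greater than that of the cow rumen.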


Fig. S17. Percent contribution of endobiotic metagenomes to Prophage/DNA recombination SEED Subsystem. Distribution of metagenomic reads for each microbiome was normalized to account for metagenome size.


REFERENCES

Akopyants, N.S., Fradkov, A., Diatchenko, L., Hills, J.E., Siebert, P.D., Lukyanov, S.A.,

Sverdlov, E.D., and D.E. Berg. 1998. PCR based subtractive hybridization and differences in gene content among strains of Helicobacter pylori. Proc. Natl. Acad. Sci. USA 95:13108-13113.

Altschul, S.F., W. Gish, W. Miller, E.W. Myers, and D.J. Lipman. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

American Public Health Association. 1998. Standard methods for the examination of water and wastewater. APHA/AWWA/WEF, Washington, DC.

Ashelford, K. E., N. Chuzhanova, J. C. Fry, A. J. Jones, and A. J. Weightman. 2006. New screening software shows that most recent large 16S rRNA gene clone libraries contain chimeras. Appl. Environ. Microbiol. 72:5734-5741.

Avelar, K.E.S., S.R. Moraes, L.J.F. Pinto, W.G. Silva e Souza, R.M.C.P. Domingues, and

M.C. de Souza Ferreira. 1998. Influence of stress conditions on Bacteroides fragilis survival and protein profiles. Zentralbl. Bakteriol. 287:399-409.

Bäckhed, F., R.E. Ley, J.L. Sonnenburg, D.A. Peterson, and J.I. Gordon. 2005. Host-

Bacterial Mutualism in the Human Intestine. Science. 307:1915-1920.

177

Baldauf, S.L., A.J. Roger, I. Wenk-Siefert, and W.F. Doolittle. 2000. A -level phylogeny of eukaryotes based on combined protein data. Science. 290:972-977.

Bell, A., A.C. Layton, L. McKay, D. Williams, R. Gentry, and G.S. Sayler. 2009. Factors influencing the persistence of fecal Bacteroides in stream water. J Environ Qual. 38:1224-32.

Bernhard, A.E., and K.G. Field. 2000. A PCR assay to discriminate human and ruminant feces on the basis of host differences in Bacteroides-Prevotella genes encoding 16S rRNA. Appl.

Environ. Microbiol. 66:4571-4574.

Biavati, B., M. Vescovo, S. Torriani, and V. Bottazzi. 2000. Bifidobacteria: history, ecology, physiology and applications. Ann. Microbiol. 50:117-131.

Biavati, B., V. Scardovi and W.E.C. Moore. 1982. Electrophoretic patterns of proteins in the genus Bifidobacterium and proposal of four new species. Int. J. Syst. Bacteriol. 32:358-373.

Biavati, B., P. Castagnoli, and L. D. Trovatelli. 1986. Species of the genus Bifidobacterium in the feces of human adults. Microbiologica. 9:39-45.

Bjursell, M. K., E. C. Martens, and J. I .Gordon. 2006. Functional genomic and metabolic

178 studies of the adaptations of a prominent adult human gut symbiont, Bacteroides thetaiotaomicron, to the suckling period. J. Biol. Chem. 281:36269-36279.

Bonjoch, X., E. Ballesté and A.R. Blanch. 2004. Multiplex PCR with 16S rRNA gene-targeted primers of Bifidobacterium spp. to identify sources of fecal pollution. Appl. Environ. Microbiol.

70:3171-3175.

Brulc, J.M., D.A. Antonopoulos, M.E. Berg Miller, M.K. Wilson, A.C. Yannarell, et al.

2009. Gene-centric metagenomics of the fiber-adherent bovine rumen microbiome reveals forage specific glycoside hydrolases. Proc Natl Acad Sci U S A. 106:1948–1953.

Bry, L., P.G. Falk, T. Midtvedt, and J.I. Gordon. 1996. A model of host microbial interactions in an open mammalian ecosystem. Science. 273:1380 1383.

Buchholz-Cleven, B. E., B. Rattunde, and K. L. Straub. 1997. Screening for genetic diversity of isolates of anaerobic Fe(II)-oxidizing bacteria using DGGE and whole-cell hybridization. Syst

Appl Microbiol. 20:301–309.

Buckholt, M.A., J.H. Lee, and S. Tzipori. 2002. Prevalence of Enterocytozoon bieneusi in swine: an 18-month survey at a slaughterhouse in Massachusetts. Appl. Environ. Microbiol.

68:2595-2599.

179

Butine, T. J., and J. A. Leedle. 1989. Enumeration of selected anaerobic bacterial groups in cecal and colonic contents of growing-finishing pigs. Appl. Environ. Microbiol. 55:1112-1116.

Carlson, M., and T. Fangman. 2000. G2353, antibiotics and other additives for swine: food safety considerations. Columbia Extension Service, University of Missouri, Columbia.

Carrillo, M., E. Estrada, and T.C. Hazen. 1985. Survival and enumeration of the fecal indicators Bifidobacterium adolescentis and Escherichia coli in a tropical rain forest watershed.

Appl. Environ. Microbiol. 50:468-476.

Carson, C.A., J.M. Christiansen, H. Yampara-Iquise, V.W. Benson, C. Baffaut, J.V. Davis,

R.R. Broz, W.B. Kurtz, W.M. Rogers, and W.H. Fales. 2005. Specificity of a Bacteroides thetaiotaomicron Marker for Human Feces. Appl. Environ. Microbiol. 71:4945-4949.

Castillo, M., G. Skene, M. Roca, M. Anguita, I. Badiola, S.H. Duncan, H.J. Flint, S.M.

Martín-Orúe. 2007. Application of 16S rRNA gene-targetted fluorescence in situ hybridization and restriction fragment length polymorphism to study porcine microbiota along the gastrointestinal tract in response to different sources of dietary fibre

FEMS Microbiology Ecology. 59:138–146.

Chao, A. 1984. Non-parametric estimation of the number of classes in a population. Scand. J.

Stat. 11:265–270.

180

Chao, A., and S.M. Lee. 1992. Estimating the number of classes via sample coverage. J Am Stat

Assoc. 87:210–217.

Chee-Sanford, J. C., R. I. Aminov, I. J. Krapac, N. Garrigues-Jeanjean, and R.I. Mackie.

2001. Occurrence and diversity of tetracycline resistance genes in lagoons and groundwater underlying two swine production facilities. Appl. Environ. Microbiol. 67:1494-1502.

Cole, J.R., B. Chai, T. Marsh, R. Farris, Q. Wang, S. Kulam, S. Chandra, D. McGarrell, T.

Schmidt, G. Garrity, and J. Tiedje. 2003. The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy.

Nucleic Acids Res. 31:442-443.

Cole JR, Chai B, Farris RJ, Wang Q, Kulam-Syed-Mohideen AS, McGarrell DM, Bandela

AM, Cardenas E, Garrity GM, Tiedje JM. 2007. The ribosomal database project (RDP-II): introducing myRDP space and quality controlled public data. Nucleic Acids Res. 35:169-172.

Colwell, R.K. 2005. EstimateS: Statistical estimation of species richness and shared species from samples. Version 7.5, http://purl.oclc.org/estimates.

Committee on Environment and Natural Resources. 2000. Hypoxia in the Northern Gulf of

Mexico: An Integrated Assessment. Washington, DC: National Science and Technology

Council, Office of the President of the United States. 7 pp.

181

Dalevi D, Ivanova, NN, Mavromatis K, Hooper S, Szeto E, Hugenholtz P, Kyrpides NC, and

Markowitz VM. 2008. Annotation of metagenome short reads using proxygenes.

Bioinformatics. 24:i7-13.

Dalevi ,D., N.N. Ivanova, K. Mavromatis, S.D. Hooper, E. Szeto, P. Hugenholtz, N.C.

Kyrpides, and V.M. Markowitz. 2008. Annotation of metagenome short reads using proxygenes. Bioinformatics. 24:i7-13.

Daly, K., C. S. Stewart, H. J. Flint, and S. P. Shirazi-Beechey. 2001. Bacterial diversity within the equine large intestine as revealed by molecular analysis of cloned 16S rRNA genes.

FEMS Microbiol. Ecol. 38:141-151.

Delcher, A.L., D. Harmon, S. Kasif, O. White, and S.L. Salzberg. 1999. Improved microbial gene identification with GLIMMER. Nucleic Acids Res. 27:4636-41.

DeSantis, T. Z., P. Hugenholtz, K. Keller, E. L. Brodie, N. Larsen, Y. M. Piceno, R. Phan, and G. L. Andersen. 2006. NAST: a multiple sequence alignment server for comparative analysis of 16S rRNA genes. Nucleic Acids Res. 34:394-9.

DeSantis, T. Z., P. Hugenholtz, N. Larsen, M. Rojas, E. L. Brodie, K. Keller, T. Huber, D.

Dalevi, P. Hu, and G. L. Andersen. 2006. Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl. Environ. Microbiol 72:5069-72.

182

Dick, L.K., A.E. Bernhard, T.J. Brodeur, J.W. Santo Domingo, J.M. Simpson, S.P.

Walters, and K.G. Field. 2005. Host distributions of uncultivated fecal Bacteroidales bacteria reveal genetic markers for fecal source identification. Appl. Environ. Microbiol. 71: 3184-3191.

Eckburg, P.B., E.M. Bik, C.N. Bernstein, E. Purdom, L. Dethlefsen, M. Sargent, S.R. Gill,

K.E. Nelson, and D.A. Relman. 2005. Diversity of the human intestinal microbial flora.

Science. 308:1635-1638.

Eswarappa, S.M., J.J. Balasundaram, N.M. Dixit, and D. Chakravortty. 2009. Host- specificity of Salmonella enterica serovar Gallinarum: insights from comparative genomics.

Infect Genet Evol. 9:468-73.

Finegold, S.M., H.R. Attebery, and V.L. Sutter. 1974. Effect of diet on human fecal flora: comparison of Japanese and American diets. Am. J. Clin. Nut. 27:1456-1469.

Fratamico, P.M., L.K. Bagi, E.J. Bush, and B.T. Solow. 2004. Prevalence and characterization of shiga toxin-producing Escherichia coli in swine feces recovered in the

National Animal Health Monitoring System's Swine 2000 study. Appl. Environ. Microbiol.

70:7173-7178.

Gill, S.R., M. Pop, R.T. DeBoy, P.B. Eckburg, P.J. Turnbaugh, B.S. Samuel, J.I.

Gordon, D.A. Relman, C.M. Fraser-Liggett, and K.E. Nelson. 2006. Metagenomic Analysis of the Human Distal Gut Microbiome. Science. 312:1355-1359.

183

Gordon, D.M. 2001. Geographical structure and host specificity in bacteria and the implications for tracing the source of coliform contamination. Microbiology. 147:1079-1085.

Gordon, D.M., and J. Lee. 1999. The genetic structure of enteric bacteria from Australian mammals. Microbiology. 145:2673-2682.

Gordon, J.I. 2005. A genomic view of our symbiosis with members of the gut microbiota. J

Pediatr Gastroenterol Nutr. 40:S28

Gourmelon, M., M. P. Caprais, R. Segura, C. Le Mennec, S. Lozach, J. Y. Piriou, and A.

Rince. 2007. Evaluation of two library-independent microbial source tracking methods to identify sources of fecal contamination in French estuaries Appl. Environ. Microbiol. 73:4857-

4866

Guan, T. Y., and R. A. Holley. 2003. Pathogen survival in swine manure environments and transmission of human enteric illness--a review. J. Environ. Qual. 32:383-392.

Handelsman J. 2004. Metagenomics: application of genomics to uncultured microorganisms.

Microbiology and Molecular Biology Reviews. 68:669-685.

Hardeman F, and S. 2007. Metagenomic approach for the isolation of a novel low- temperature-active lipase from uncultured bacteria of marine sediment. FEMS Microbiol Ecol.

59:524–534.

Harmsen, H.J.M., G.R. Gibson, P. Elfferich, G.C. Raangs, A.C.M. Wildeboer-Veloo, A.

184

Argaiz, M.B. Roberfroid, and G.W. Welling. 2000. Comparison of viable cell counts and fluorescence in situ hybridization using specific rRNA-based probes for the quantification of human fecal bacteria. FEMS Microbiol. Lett. 183:125-129.

Harrington, E.D., A.H. Singh, T. Doerks, I. Letunic, C. von Mering, L.J. Jensen, J. Raes, and P. Bork. 2007. Quantitative assessment of protein function prediction from metagenomics shotgun sequences. Proceedings of the National Academy of Sciences. 104:13913-13918.

Hasegawa M, Hashimoto T. 1993. Ribosomal RNA trees misleading? Nature 361:23.

Heck, K.L., Jr., G. Van Belle, and D. Simberloff. 1975. Explicit calculation of the rarefaction diversity measurement and the determination of sufficient sample size. Ecology. 56:1459-1461.

Hoff, K.J., T. Lingner, P. Meinicke, and M. Tech. 2009. Orphelia: predicting genes in metagenomic sequencing reads. Nucleic Acids Res. (Epub)

Hoff, K.J., M. Tech, T. Lingner, R. Daniel, M. Morgenstern, and P. Meinicke. 2008. Gene prediction in metagenomic fragments: a large scale machine learning approach. BMC Bioinformatics. 9:217.

Holland, S. 2003. aRarefactWin 1.3 program. University of Georgia, Athens. www.uga.edu/strata/software/Software.html.

Hong, P.-Y., J.-H. Wu, and W.-T. Liu. 2008. Relative abundance of Bacteroides spp. in stools and wastewaters as determined by hierarchical oligonucleotide primer extension. Appl. Environ. Microbiol. 74:2882-2893.

Hopkins, M.J., R. Sharp, and G.T. Macfarlane. 2001. Age and disease related changes in intestinal bacterial populations assessed by cell culture, 16S rRNA abundance, and community cellular fatty acid profiles. Gut. 48:198-205.

Horrigan, L., R. Lawrence, and P. Walker. 2002. How sustainable agriculture can address the environmental and human health harms of industrial agriculture. Environ. Health Perspect. 110:445–456.

Huber, T., G. Faulkner, and P. Hugenholtz. 2004. Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics. 20:2317-2319.

Hugenholtz, P., and T. Huber. 2003. Chimeric 16S rDNA sequences of diverse origin are accumulating in public databases. Int. J. Syst. Evol. Microbiol. 53:289-293.

Hugenholtz, P., and G.W. Tyson. 2008. Microbiology: Metagenomics. Nature. 455:481-483.

Hugenholtz, P., B. Goebel, and N. R. Pace. 1998. Impact of culture-independent studies on the emerging phylogenetic view of bacterial diversity. J. Bacteriol. 180:4766–4774.

Hurlbert, S.H. 1971. The nonconcept of species diversity: a critique and alternative parameters. Ecology. 52:577-586.

Huson, D.H., A.F. Auch, J. Qi, and S.C. Schuster. 2007. MEGAN analysis of metagenomic data. Genome Res. 17:377-386.

Ikeda-Ohtsubo, W., M. Desai, U. Stingl, and A. Brune. 2007. Phylogenetic diversity of 'Endomicrobia' and their specific affiliation with termite gut flagellates. Microbiology. 153:3458-3465.

Ingham, C.J., and E. Ben-Jacob. 2008. Swarming and complex pattern formation in Paenibacillus vortex studied by imaging and tracking cells. BMC Microbiol. 8:36.

Jensen, B. B. 1996. Methanogenesis in monogastric animals. Environ. Monit. Assess. 42:99-112.

Jeter, S.N., C.M. McDermott, P.A. Bower, J.L. Kinzelman, M.J. Bootsma, G.W. Goetz, and S.L. McLellan. 2009. Bacteroidales diversity in ring-billed gulls (Larus delawarensis) residing at Lake Michigan beaches. Appl. Environ. Microbiol. 75:1525-1533.

Jimenez-Clavero, M.A., C. Fernandez, J.A. Ortiz, J. Pro, G. Carbonell, J.V. Tarazona, N. Roblas, and V. Ley. 2003. Teschoviruses as indicators of porcine fecal contamination of surface water. Appl. Environ. Microbiol. 69:6311-6315.

Johnson, C. C., and S. M. Finegold. 1987. Uncommonly encountered, motile, anaerobic gram-negative bacilli associated with infection. Rev. Infect. Dis. 9:1150-1162.

Jongbloed, A.W., and N.P. Lenis. 1998. Environmental concerns about animal manure. J. Anim. Sci. 76:2641–2648.

Kang, M. S., D. Hancock, T. E. Besser, and D. R. Call. 2006. Identification of specific gene sequences conserved in contemporary epidemic strains of Salmonella enterica. Appl. Environ. Microbiol. 72:6938-6947.

Kaufmann, P., A. Pfefferkorn, M. Teuber, and L. Meile. 1997. Identification and quantification of Bifidobacterium species isolated from food with genus-specific 16S rRNA-targeted probes by colony hybridization and PCR. Appl. Environ. Microbiol. 63:1268-1273.

Kellogg, R. L., C. H. Lander, D. C. Moffitt, and N. Gollehon. 2000. Manure nutrients relative to the capacity of cropland and pastureland to assimilate nutrients: spatial and temporal trends for the United States. U.S. Department of Agriculture, Washington, DC.

Khatib, L.A., Y.L. Tsai, and B.H. Olson. 2003. A biomarker for the identification of swine fecal pollution in water, using the STII toxin gene from enterotoxigenic Escherichia coli. Appl. Microbiol. Biotechnol. 63:231-238.

Kildare, B.J., C.M. Leutenegger, B.S. McSwain, D.G. Bambic, V.B. Rajal, and S. Wuertz. 2007. 16S rRNA-based assays for quantitative detection of universal, human-, cow-, and dog-specific fecal Bacteroidales: A Bayesian approach. Water Res. 41:3701-3715.

King, E. L., D. S. Bachoon, and K. W. Gates. 2007. Rapid detection of human fecal contamination in estuarine environments by PCR targeting of Bifidobacterium adolescentis. J. Microbiol. Methods. 68:76-81.

Kirchhof, G., B. Eckert, M. Stoffels, J. I. Baldani, V. M. Reis, and A. Hartmann. 2001. Herbaspirillum frisingense sp. nov., a new nitrogen-fixing bacterial species that occurs in C4-fibre plants. Int. J. Syst. Evol. Microbiol. 51:157-168.

Klijn, A., A. Mercenier, and F. Arigoni. 2005. Lessons from the genomes of bifidobacteria. FEMS Microbiol. Rev. 29:491-509.

Koike, S., I. G. Krapac, H. D. Oliver, A. C. Yannarell, J. C. Chee-Sanford, R. I. Aminov, and R. I. Mackie. 2007. Monitoring and source tracking of tetracycline resistance genes in lagoons and groundwater adjacent to swine production facilities over a 3-year period. Appl. Environ. Microbiol. 73:4813-4823.

Kok, R.G., A. D. Waal, F. Schut, G. W. Welling, G. Weenk, and K. J. Hellingwerf. 1996. Specific detection and analysis of a probiotic Bifidobacterium strain in infant feces. Appl. Environ. Microbiol. 62:3668-3672.

Krapac, I. G., W. S. Dey, W. R. Roy, C. A. Smyth, E. Storment, S. L. Sargent, and J. D. Steele. 2002. Impacts of swine manure pits on groundwater quality. Environ. Poll. 120:475-492.

Kreader, C.A. 1995. Design and evaluation of Bacteroides DNA probes for the specific detection of human fecal pollution. Appl. Environ. Microbiol. 61:1171-1179.

Kreader, C.A. 1998. Persistence of PCR-detectable Bacteroides distasonis from human feces in river water. Appl. Environ. Microbiol. 64:4103-4105.

Kristiansson, E., P. Hugenholtz, and D. Dalevi. 2009a. ShotgunFunctionalizeR: An R-package for functional analysis of metagenomic data. Bioinformatics. (Epub)

Kristiansson, E., P. Hugenholtz, and D. Dalevi. 2009b. ShotgunFunctionalizeR: An R-package for functional analysis of metagenomic data - Supplement. Bioinformatics. (Epub)

Kurokawa, K., T. Itoh, T. Kuwahara, K. Oshima, H. Toh, A. Toyoda, H. Takami, H. Morita, V.K. Sharma, T.P. Srivastava, T.D. Taylor, H. Noguchi, H. Mori, et al. 2007. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 14:169–181.

Lamendella, R., J. W. Santo Domingo, D. B. Oerther, J. R. Vogel, and D. M. Stoeckel. 2007. Assessment of fecal pollution sources in a small northern-plains watershed using PCR and phylogenetic analyses of Bacteroidetes 16S rRNA gene. FEMS Microbiol. Ecol. 59:651-660.

Lamendella, R., J.W. Santo Domingo, S.G. Ghosh, G. DiGiovanni, R. Mackie, A.C. Yannarell, and D.B. Oerther. 2009. Evaluation of swine-specific PCR assays used for fecal source tracking and analysis of molecular diversity of swine-specific Bacteroidales populations. Appl. Environ. Microbiol. 75:5787-5796.

Langendijk, P. S., F. Schut, G. J. Jansen, G. J. Raangs, G. R. Kamphuis, M. H. Wilkinson, and G. W. Welling. 1995. Quantitative fluorescence in situ hybridization of Bifidobacterium spp. with genus-specific 16S rRNA-targeted probes and its application in fecal samples. Appl. Environ. Microbiol. 61:3069-3075.

Lauer, E. 1990. Bifidobacterium gallicum sp. nov. isolated from human feces. Int. J. Syst. Bacteriol. 40:100-102.

Layton, A., L. McKay, D. Williams, V. Garrett, R. Gentry, and G. Sayler. 2006. Development of Bacteroides 16S rRNA gene TaqMan-based real-time PCR assays for estimation of total, human, and bovine fecal pollution in water. Appl. Environ. Microbiol. 72:4214-4224.

Leach, M. D., S. L. Broschat, and D. R. Call. 2008. A discrete, stochastic model and correction method for bacterial source tracking. Environ. Sci. Technol. 42:524-529.

Leblond-Bourget, N., H. Philippe, I. Mangin, and B. Decaris. 1996. 16S rRNA and 16S to 23S internal transcribed spacer sequence analyses reveal inter- and intraspecific Bifidobacterium phylogeny. Int. J. Syst. Bacteriol. 46:102-111.

Lee, J.H., V.N. Karamychev, S.A. Kozyavkin, D. Mills, A.R. Pavlov, N.V. Pavlova, N.N. Polouchine, P.M. Richardson, V.V. Shakhova, A.I. Slesarev, B. Weimer, and D.J. O'Sullivan. 2008. Comparative genomic analysis of the gut bacterium Bifidobacterium longum reveals loci susceptible to deletion during pure culture growth. BMC Genomics. 9:247.

Leser, T.D., R.H. Lindecrona, T.K. Jensen, B.B. Jensen, and K. Moller. 2000. Changes in bacterial community structure in the colon of pigs fed different experimental diets and after infection with Brachyspira hyodysenteriae. Appl. Environ. Microbiol. 66:3290-3296.

Leser, T.D., J.Z. Amenuvor, T.K. Jensen, R.H. Lindecrona, M. Boye, and K. Moller. 2002. Culture-independent analysis of gut bacteria: the pig gastrointestinal tract microbiota revisited. Appl. Environ. Microbiol. 68:673–690.

Ley, R. E., M. Hamady, C. Lozupone, P. J. Turnbaugh, R. R. Ramey, J. S. Bircher, M. L. Schlegel, T. A. Tucker, M. D. Schrenzel, R. Knight, and J. I. Gordon. 2008. Evolution of mammals and their gut microbes. Science. 320:1647-1651.

Ley, R.E., D.A. Peterson, and J.I. Gordon. 2006. Ecological and evolutionary forces shaping microbial diversity in the human intestine. Cell. 124:837-848.

Lockhart, P.J., C.J. Howe, D.A. Bryant, T.J. Beanland, and A.W. Larkum. 1992. Substitutional bias confounds inference of cyanelle origins from sequence data. J. Mol. Evol. 34:153-162.

Long, P.F., W.C. Dunlap, C.N. Battershill, and M. Jaspars. 2005. Shotgun cloning and heterologous expression of the patellamide gene cluster as a strategy to achieving sustained metabolite production. Chembiochem. 6:1760–1765.

Loomis, W.F. and D.W. Smith. 1990. Molecular phylogeny of Dictyostelium discoideum by protein sequence comparison. Proc Natl Acad Sci. 87:9093-9097.

Lozupone, C., and R. Knight. 2005. UniFrac: a new phylogenetic method for comparing microbial communities. Appl. Environ. Microbiol. 71:8228-8235.

Lozupone, C.A., M. Hamady, B. L. Cantarel, P. M. Coutinho, B. Henrissat, J. I. Gordon, and R. Knight. 2008. The convergence of carbohydrate active gene repertoires in human gut microbes. Proc. Natl. Acad. Sci. U S A. 105:15076-15081.

Lu, J., J.W. Santo Domingo, and O.C. Shanks. 2007. Identification of chicken-specific fecal microbial sequences using a metagenomic approach. Water Res. 41:3561-3574.

Ludwig, W., O. Strunk, R. Westram, L. Richter, H. Heier, I. Yadhukumar, A. Buchner, T. Lai, S. Steppi, G. Jobb, W. Förster, L. Brettske, S. Gerber, A.W. Ginhart, O. Gross, S. Grumann, R. Hermann, A. Jost, T. König, R. Liss, M. Lüßmann, B. May, B. Nonhoff, S. Reichel, R. Strehlow, A. Stamatakis, N. Stuckmann, A. Vilbig, M. Lenke, T. Ludwig, A. Bode, and K.H. Schleifer. 2004. ARB: a software environment for sequence data. Nucleic Acids Res. 32:1363-1371.

Lukashin, A., and M. Borodovsky. 1998. GeneMark.hmm: new solutions for gene finding. Nucleic Acids Res. 26:1107-1115.

Macauley, J.J., Z. Qiang, C.D. Adams, R. Surampalli, and M.R. Mormile. 2006. Disinfection of swine wastewater using chlorine, ultraviolet light and ozone. Water Res. 40:2017-2026.

Maczulak, A. E., M. J. Wolin, and T. L. Miller. 1989. Increase in colonic methanogens and total anaerobes in aging rats. Appl. Environ. Microbiol. 55:2468-2473.

Marcy, Y., C. Ouverney, E.M. Bik, T. Losekann, N. Ivanova, H.G. Martin, E. Szeto, D. Platt, P. Hugenholtz, D.A. Relman, and S.R. Quake. 2007. Dissecting biological "dark matter" with single-cell genetic analysis of rare and uncultivated TM7 microbes from the human mouth. Proc. Natl. Acad. Sci. U S A. 104:11889–11894.

Markowitz, V.M., N. Ivanova, E. Szeto, K. Palaniappan, K. Chu, D. Dalevi, I.M. Chen, Y. Grechkin, I. Dubchak, I. Anderson, A. Lykidis, K. Mavromatis, P. Hugenholtz, and N.C. Kyrpides. 2008. IMG/M: a data management and analysis system for metagenomes. Nucleic Acids Res. 36:D534-D538.

Martin, A.P. 2002. Phylogenetic approaches for describing and comparing the diversity of microbial communities. Appl. Environ. Microbiol. 68:3673-3682.

Matsuki, T., K. Watanabe, R. Tanaka, and H. Oyaizu. 1998. Rapid identification of human intestinal bifidobacteria by 16S rRNA-targeted species- and group-specific primers. FEMS Microbiol. Lett. 167:113-121.

Matsuki, T., K. Watanabe, R. Tanaka, M. Fukuda, and H. Oyaizu. 1999. Distribution of bifidobacterial species in human intestinal microflora examined with 16S rRNA-gene-targeted species-specific primers. Appl. Environ. Microbiol. 65:4506-4512.

Matsuki, T., K. Watanabe, J. Fujimoto, Y. Miyamoto, T. Takada, K. Matsumoto, H. Oyaizu, and R. Tanaka. 2002. Development of 16S rRNA-gene-targeted group-specific primers for the detection and identification of predominant bacteria in human feces. Appl. Environ. Microbiol. 68:5445-5451.

Matteuzzi, D., F. Crociani, G. Zani, and L. D. Trovatelli. 1971. Bifidobacterium suis n. sp.: a new species of the genus Bifidobacterium isolated from pig feces. Allg. Mikrobiol. 11:387-395.

Merkel, M. 2004. Threatening Iowa’s Future: Iowa’s Failure to Implement and Enforce the Clean Water Act for Livestock Operations. Washington, DC: Environmental Integrity Project. 20 pp. Available at: http://www.environmentalintegrity.org/pub194.cfm.

Mieszkin, S., J.P. Furet, G. Corthier, and M. Gourmelon. 2009. Estimation of pig fecal contamination in a river catchment by real-time PCR using two pig-specific Bacteroidales 16S rRNA genetic markers. Appl. Environ. Microbiol. 75:3045-3054.

Mikkelsen, L. L., C. Bendixen, M. Jakobsen, and B. B. Jensen. 2003. Enumeration of bifidobacteria in gastrointestinal samples from piglets. Appl. Environ. Microbiol. 69:654-658.

Mitsuoka, T. 1969. Comparative studies on bifidobacteria isolated from the alimentary tract of man and animals (including descriptions of Bifidobacterium thermophilum nov. spec. and Bifidobacterium pseudolongum nov. spec.). Zentralbl. Bakteriol. 210:52-64.

Miyake, T., K. Watanabe, T. Watanabe, and H. Oyaizu. 1998. Phylogenetic analysis of the genus Bifidobacterium and related genera based on 16S rDNA sequences. Microbiol. Immunol. 42:661-667.

Moore, W.E., and L.V. Holdeman. 1974. Human fecal flora: the normal flora of 20 Japanese-Hawaiians. Appl. Microbiol. 27:961-979.

Nagaoka, M., S. Hashimoto, T. Watanabe, T. Yokokura, and Y. Mori. 1994. Anti-ulcer effects of lactic acid bacteria and their cell wall polysaccharides. Biol. Pharm. Bull. 17:1012–1017.

Nebra, Y., X. Bonjoch, and A.R. Blanch. 2003. Use of Bifidobacterium dentium as an indicator of the origin of fecal water pollution. Appl Environ Microbiol. 69:2651-2656.

Noguchi, H., J. Park, and T. Takagi. 2006. MetaGene: prokaryotic gene finding from environmental shotgun sequences. Nucleic Acids Res. 34:5623–5630.

Ogura, D.S., K. Ehrlich, T. Itoh, Y. Takagi, T. Sakaki, T. Hayashi, and M. Hattori. 2007. Comparative metagenomics revealed commonly enriched gene sets in human gut microbiomes. DNA Res. 14:169-181.

Okabe, S., N. Okayama, O. Savichtcheva, and T. Ito. 2007. Quantification of host-specific Bacteroides-Prevotella 16S rRNA genetic markers for assessment of fecal pollution in freshwater. Appl. Microbiol. Biotechnol. 74:890-901.

Osterberg, D., and D. Wallinga. 2004. Addressing externalities from swine production to reduce public health and environmental impacts. Am. J. Public Health. 94:1703-1708.

Parker, C.T., B. Quinones, W.G. Miller, S.T. Horn, and R.E. Mandrell. 2006. Comparative genomic analysis of Campylobacter jejuni strains reveals diversity due to genomic elements similar to those present in C. jejuni strain RM1221. J. Clin. Microbiol. 44:4125-4135.

Pascual, C., G. Foster, E. Falsen, K. Bergstrom, C. Greko, and M.D. Collins. 1999. Actinomyces bowdenii sp. nov., isolated from canine and feline clinical specimens. Int. J. Syst. Bacteriol. 49:1873-1877.

Pellini, T., and J. Morris. 2001. A framework for assessing the impact of the IPPC directive on the performance of the pig industry. J. Environ. Manage. 63:325-333.

Peña, J.A., S.Y. Li, P.H. Wilson, S.A. Thibodeau, A.J. Szary, et al. 2004. Genotypic and phenotypic studies of murine intestinal lactobacilli: Species differences in mice with and without colitis. Appl Environ Microbiol. 70:558–568.

Petraitis, E. 2007. Research into heavy metal concentrations in agricultural soils. Ekologica. 53:64–69.

Pieper, R., R. Jha, B. Rossnagel, A.G. Van Kessel, W.B. Souffrant, and P. Leterme. 2008. Effect of barley and oat cultivars with different carbohydrate compositions on the intestinal bacterial communities in weaned piglets. FEMS Microbiol. Ecol. 66:556-566.

Podar, M., C.B. Abulencia, M. Walcher, D. Hutchison, K. Zengler, J.A. Garcia, T. Holland, D. Cotton, L. Hauser, and M. Keller. 2007. Targeted access to the genomes of low-abundance organisms in complex microbial communities. Appl. Environ. Microbiol. 73:3205–3214.

Pruesse, E., C. Quast, K. Knittel, B. Fuchs, W. Ludwig, J. Peplies, and F.O. Glöckner. 2007. SILVA: a comprehensive online resource for quality checked and aligned ribosomal RNA sequence data compatible with ARB. Nucleic Acids Res. 35:7188-7196.

Qu, A., et al. 2008. Comparative metagenomics reveals host specific metavirulomes and horizontal gene transfer elements in the chicken cecum microbiome. PLoS ONE. 3:e2945.

Raes, J., K.U. Foerstner, and P. Bork. 2007. Get the most out of your metagenome: computational analysis of environmental sequence data. Curr. Opin. Microbiol. 10:490-498.

Resnick, I. G., and M. A. Levin. 1981. Assessment of bifidobacteria as indicators of human fecal pollution. Appl. Environ. Microbiol. 42:433-438.

Reuter, G. 2001. The Lactobacillus and Bifidobacterium microflora of the human intestine: composition and succession. Curr. Issues Intest. Microbiol. 2:43-53.

Rhodes, M. W., and H. Kator. 1999. Sorbitol-fermenting bifidobacteria as indicators of diffuse human faecal pollution in estuarine watersheds. J. Appl. Microbiol. 87:528-535.

Ribaudo, M., J. Kaplan, L. Christensen, N. Gollehon, R. Johansson, V. Breneman, J. Agapoff, and M. Aillery. 2003. Manure Management for Water Quality: Costs to Animal Feeding Operations of Applying Manure Nutrients to Land. Washington, DC: US Department of Agriculture, Economic Research Service. 90 pp.

Sakata, S., C. S. Ryu, M. Kitahara, M. Sakamoto, H. Hayashi, M. Fukuyama, and Y. Benno. 2006. Characterization of the genus Bifidobacterium by automated ribotyping and 16S rRNA gene sequences. Microbiol. Immunol. 50:1-10.

Santo Domingo, J.W., D. G. Bambic, T. A. Edge, and S. Wuertz. 2007. Quo vadis source tracking? Towards a strategic framework for environmental monitoring of fecal pollution. Water Res. 41:3539-3552.

Scardovi, V. 1984. Genus Bifidobacterium. Orla-Jensen, 1924, 472, p. 1418-1434. In N.R. Krieg and J.G. Holt (ed.), Bergey's manual of systematic bacteriology, vol. 1. Williams and Wilkins, Baltimore, MD.

Schell, M. A., M. Karmirantzou, B. Snel, D. Vilanova, G. Pessi, M. C. Zwahlen, F. Desiere, P. Bork, M. Delley, and F. Arigoni. 2002. The genome sequence of Bifidobacterium longum reflects its adaptation to the human gastrointestinal tract. Proc. Natl. Acad. Sci. U S A. 99:14422–14427.

Schloss, P.D., S.L. Westcott, T. Ryabin, J.R. Hall, M. Hartmann, E.B. Hollister, R.A. Lesniewski, B.B. Oakley, D.H. Parks, C.J. Robinson, J.W. Sahl, B. Stres, G.G. Thallinger, D.J. Van Horn, and C.F. Weber. 2009. Introducing mothur: Open Source, Platform-independent, Community-supported Software for Describing and Comparing Microbial Communities. Appl. Environ. Microbiol. (Epub)

Schloss, P. D., and J. Handelsman. 2005. Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl. Environ. Microbiol. 71:1501-1506.

Schloss, P. D., B. R. Larget, and J. Handelsman. 2004. Integration of microbial ecology and statistics: a test to compare gene libraries. Appl. Environ. Microbiol. 70:5485-5492.

Schloss, P.D., and J. Handelsman. 2006. Introducing SONS, a tool for OTU-based comparisons of membership and structure between microbial communities. Appl. Environ. Microbiol. 72:6773-6779.

Schloss, P.D., and J. Handelsman. 2006a. Introducing TreeClimber, a test to compare microbial community structures. Appl. Environ. Microbiol. 72:2379-2384.

Schmidt, E.W., J.T. Nelson, D.A. Rasko, S. Sudek, J.A. Eisen, M.G. Haygood, and J. Ravel. 2005. Patellamide A and C biosynthesis by a microcin-like pathway in Prochloron didemni, the cyanobacterial symbiont of Lissoclinum patella. Proc. Natl. Acad. Sci. U S A. 102:7315–7320.

Scott, T.M., T.M. Jenkins, J. Lukasik, and J.B. Rose. 2005. Potential use of a host associated molecular marker in Enterococcus faecium as an index of human fecal pollution. Environ. Sci. Technol. 39:283-287.

Selander, R.K. 1997. DNA sequence analysis of the genetic structure and evolution of Salmonella enterica, p. 191-213. In B. A. M. van der Zeijst, W. P. M. Hoekstra, and J. D. A. van Embden (ed.), Ecology of pathogenic bacteria: molecular and evolutionary aspects. Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands.

Shanks, O.C., J.W. Santo Domingo, R. Lamendella, C.A. Kelty, and J.E. Graham. 2006. Competitive metagenomic DNA hybridization identifies host-specific microbial genetic markers in cow fecal samples. Appl. Environ. Microbiol. 72:4054–4060.

Shanks, O.C., J.W. Santo Domingo, J. Lu, C.A. Kelty, and J.E. Graham. 2007. Identification of bacterial DNA markers for the detection of human fecal pollution in water. Appl. Environ. Microbiol. 73:2416–2422.

Simberloff, D. 1978. Use of rarefaction and related methods in ecology, p. 150-165. In K.L. Dickson, J.J. Cairns, and R.J. Livingston (ed.), Biological data in water pollution assessment: quantitative and statistical analyses. American Society for Testing and Materials, Philadelphia, Pa.

Simpkins, W.W., M.R. Burkart, M.F. Helmke, T.N. Twedt, D.E. James, R.J. Jaquis, and K.J. Cole. 2002. Potential impact of waste storage structures on water resources in Iowa. J. Am. Water Resources Assoc. 38:759–771.

Simpson, J.M., J.W. Santo Domingo, and D.J. Reasoner. 2002. Microbial source tracking: state of the science. Environ. Sci. Technol. 36:5279-5288.

Simpson, P. J., C. Stanton, G. F. Fitzgerald, and R. P. Ross. 2003. Genomic diversity and relatedness of bifidobacteria isolated from a porcine cecum. J. Bacteriol. 185:2571-2581.

Sloan, W.T., M. Lunn, S. Woodcock, I.M. Head, S. Nee, and T.P. Curtis. 2006. Quantifying the roles of immigration and chance in shaping community structure. Environ. Microbiol. 8:732-740.

Smith, E. P., and G. van Belle. 1984. Nonparametric estimation of species richness. Biometrics. 40:119-129.

Snell-Castro, R., J.J. Godon, J.P. Delgenes, and P. Dabert. 2005. Characterisation of the microbial diversity in a pig manure storage pit using small subunit rDNA sequence analysis. FEMS Microbiol. Ecol. 52:229-242.

Sorlini, C., T. Brusa, G. Ranalli, and A. Ferrari. 1988. Quantitative determination of methanogenic bacteria in the feces of different mammals. Curr. Microbiol. 17:33-36.

Stepanauskas, R., and M.E. Sieracki. 2007. Matching phylogeny and metabolism in the uncultured marine bacteria, one cell at a time. Proc. Natl. Acad. Sci. U S A. 104:9052–9057.

Stoeckel, D.M., and V.J. Harwood. 2007. Performance, design, and analysis in microbial source tracking studies. Appl. Environ. Microbiol. 73:2405–2415.

Tarr, P.I., L.M. Schoening, Y.L. Yea, T.R. Ward, S. Jelacic, and T.S. Whittam. 2000. Acquisition of the rfb-gnd cluster in evolution of Escherichia coli O55 and O157. J. Bacteriol. 182:6183-6191.

Taylor N.M., F.A. Clifton-Hadley, A.D. Wales, A. Ridley, and R.H. Davies. 2009. Farm-level risk factors for fluoroquinolone resistance in E. coli and thermophilic Campylobacter spp. on finisher pig farms. Epidemiol. Infect. 137:1121-1134.

Tettelin, H., K.E. Nelson, I.T. Paulsen, J.A. Eisen, T.D. Read, S. Peterson, J. Heidelberg, R.T. DeBoy, D.H. Haft, R.J. Dodson, A.S. Durkin, M. Gwinn, J.F. Kolonay, W.C. Nelson, J.D. Peterson, L.A. Umayam, O. White, S.L. Salzberg, M.R. Lewis, D. Radune, E. Holtzapple, H. Khouri, A.M. Wolf, T.R. Utterback, C. L. Hansen, L. A. McDonald, T. V. Feldblyum, S. Angiuoli, T. Dickinson, E.K. Hickey, I.E. Holt, B.J. Loftus, F. Yang, H.O. Smith, J.C. Venter, B.A. Dougherty, D.A. Morrison, S.K. Hollingshead, and C.M. Fraser. 2001. Complete genome sequence of a virulent isolate of Streptococcus pneumoniae. Science. 293:498-506.

Tettelin, H., N. J. Saunders, J. Heidelberg, A. C. Jeffries, K. E. Nelson, J. A. Eisen, K. A. Ketchum, D. W. Hood, J. F. Peden, R. J. Dodson, W. C. Nelson, M. L. Gwinn, R. DeBoy, J. D. Peterson, E. K. Hickey, D. H. Haft, S. L. Salzberg, O. White, R. D. Fleischmann, B. A. Dougherty, T. Mason, A. Ciecko, D. S. Parksey, E. Blair, H. Cittone, E. B. Clark, M. D. Cotton, T. R. Utterback, H. Khouri, H. Qin, J. Vamathevan, J. Gill, V. Scarlato, V. Masignani, M. Pizza, G. Grandi, L. Sun, H. O. Smith, C. M. Fraser, E. R. Moxon, R. Rappuoli, and J. C. Venter. 2000. Complete genome sequence of Neisseria meningitidis serogroup B strain MC58. Science. 287:1809-1815.

Thompson, C.L., B. Wang, and A.J. Holmes. 2008. The immediate environment during postnatal development has long-term impact on gut community structure in pigs. ISME J. 2:739-748.

Thompson, C.L., and A.J. Holmes. 2009. A window of environmental dependence is evident in multiple phylogenetically distinct subgroups in the faecal community of piglets. FEMS Microbiol. Lett. 290:91-97.

Torsvik, V., and L. Ovreas. 2002. Microbial diversity and function in soil: from genes to ecosystems. Curr. Opin. Microbiol. 5:240–245.

Tringe, S.G., C. von Mering, A. Kobayashi, A.A. Salamov, K. Chen, H.W. Chang, M. Podar, J.M. Short, E.J. Mathur, J.C. Detter, P. Bork, P. Hugenholtz, and E.M. Rubin. 2005. Comparative metagenomics of microbial communities. Science. 308:554-557.

Turnbaugh, P.J., R.E. Ley, M.A. Mahowald, V. Magrini, E.R. Mardis, and J.I. Gordon. 2006. An obesity-associated gut microbiome with increased capacity for energy harvest. Nature. 444:1027-1031.

Turnbaugh, P.J., R.E. Ley, M. Hamady, C.M. Fraser-Liggett, R. Knight, et al. 2007. The Human Microbiome Project. Nature. 449:804–810.

Uchiyama, T., and K. Watanabe. 2008. Substrate-induced gene expression (SIGEX) screening of metagenome libraries. Nat. Protoc. 3:1202-1212.

U. S. Environmental Protection Agency. 2003. National pollutant discharge elimination system permit regulation and effluent limitation guidelines and standards for concentrated animal feeding operations (CAFOs). 68 Federal Register 7180.

U. S. Environmental Protection Agency. 2005. Microbial Source Tracking Guide Document. Office of Research and Development. United States Environmental Protection Agency. 143 pp.

Ufnar, J.A., D.F. Ufnar, S.Y. Wang, and R.D. Ellender. 2007. Development of a swine-specific fecal pollution marker based on host differences in methanogen mcrA genes. Appl. Environ. Microbiol. 73:5209-5217.

United States Department of Agriculture. 2005. Animal and Plant Health Inspection Service. National Animal Health Monitoring System. Swine 2000: Part IV Changes in the U. S. Pork Industry, 1990-2000. Fort Collins, Colorado. 50 pp.

USDA. 2001. Part I: Reference of Swine Health and Management in the United States, 2000. N338.0801. National Animal Health Monitoring System.

Van den Broek, I.V., B.A. van Cleef, A. Haenen, E.M. Broens, P.J. van der Wolf, M.J. van den Broek, X.W. Huijsdens, J.A. Kluytmans, A.W. van de Giessen, and E.W. Tiemersma. 2009. Methicillin-resistant Staphylococcus aureus in people living and working in pig farms. Epidemiol. Infect. 137:700-708.

Venter, J.C., K. Remington, J.F. Heidelberg, A.L. Halpern, D. Rusch, J.A. Eisen, D. Wu, I. Paulsen, K.E. Nelson, W. Nelson, D.E. Fouts, S. Levy, A.H. Knap, M. W. Lomas, K. Nealson, O. White, J. Peterson, J. Hoffman, R. Parsons, H. Baden-Tillson, C. Pfannkoch, Y.H. Rogers, and H. O. Smith. 2004. Environmental genome shotgun sequencing of the Sargasso Sea. Science. 304:66–74.

Von Mering, C., P. Hugenholtz, J. Raes, S.G. Tringe, T. Doerks, L.J. Jensen, N. Ward, and P. Bork. 2007. Quantitative phylogenetic assessment of microbial communities in diverse environments. Science. 315:1126-1130.

Walters, S.P., and K.G. Field. 2009. Survival and persistence of human and ruminant-specific faecal Bacteroidales in freshwater microcosms. Environ. Microbiol. 11:1410-1421.

Warnecke, F., P. Luginbuhl, N. Ivanova, M. Ghassemian, T.H. Richardson, J.T. Stege, M. Cayouette, A.C. McHardy, G. Djordjevic, N. Aboushadi, R. Sorek, S.G. Tringe, M. Podar, H.G. Martin, V. Kunin, D. Dalevi, J. Madejska, E. Kirton, D. Platt, E. Szeto, A. Salamov, K. Barry, N. Mikhailova, N.C. Kyrpides, E.G. Matson, E.A. Ottesen, X.N. Zhang, M. Hernandez, C. Murillo, and L.G. Acosta. 2007. Metagenomic and functional analysis of hindgut microbiota of a wood-feeding higher termite. Nature. 450:560-565.

Weber, T.E., S.L. Trabue, C.J. Ziemer, and B.J. Kerr. 2009. Evaluation of elevated dietary corn fiber from corn germ meal in growing female pigs. J. Anim. Sci. (Epub)

Wegener, H.C., and D.L. Baggesen. 1996. Investigation of an outbreak of human salmonellosis caused by Salmonella enterica ssp. enterica serovar Infantis by use of pulsed field gel electrophoresis. Int. J. Food Microbiol. 32:125-131.

Wiggins, B.A., R.W. Andrews, R.A. Conway, C.L. Corr, E.J. Dobratz, D.P. Dougherty, J.R. Eppard, S.R. Knupp, M.C. Limjoco, J.M. Mettenburg, J.M. Rinehardt, J. Sonsino, R.L. Torrijos, and M.E. Zimmerman. 1999. Use of antibiotic resistance analysis to identify nonpoint sources of fecal pollution. Appl. Environ. Microbiol. 65:3483–3486.

Wing, S., S. Freedman, and L. Band. 2002. The potential impact of flooding on confined animal feeding operations in eastern North Carolina. Environ. Health Perspect. 110:387-391.

Wood, J., K.P. Scott, C.J. Newbold, and H.J. Flint. 1998. Estimation of the relative abundance of different Bacteroides and Prevotella ribotypes in gut samples by restriction enzyme profiling of PCR-amplified 16S rRNA gene sequences. Appl. Environ. Microbiol. 64:3683-3689.

Wu, M., and J. Eisen. 2008. A simple, fast, and accurate method of phylogenomic inference. Genome Biol. 9:R151.

Xu, M., X. Xiao, and F. Wang. 2008. Isolation and characterization of alkane hydroxylases from a metagenomic library of Pacific deep-sea sediment. Extremophiles. 12:255–262.

Xu, J., M. A. Mahowald, R.E. Ley, C. A. Lozupone, M. Hamady, E.C. Martens, B. Henrissat, P.M. Coutinho, P. Minx, P. Latreille, H. Cordum, A. Van Brunt, K. Kim, R.S. Fulton, L.A. Fulton, S.W. Clifton, R.K. Wilson, R.D. Knight, and J.I. Gordon. 2007. Evolution of symbiotic bacteria in the distal human intestine. PLoS Biol. 5:e156.

Yaeshima, T., T. Fujisawa, and T. Mitsuoka. 1992. Bifidobacterium globosum, subjective synonym of Bifidobacterium pseudolongum, and description of Bifidobacterium pseudolongum subsp. pseudolongum comb. nov. and Bifidobacterium pseudolongum subsp. globosum comb. nov. Syst. Appl. Microbiol. 15:380-385.

Yildirim, Z., D. K. Winters, and M.G. Johnson. 1999. Purification, amino acid sequence and mode of action of bifidocin B produced by Bifidobacterium bifidum NCFB 1454. J. Appl. Microbiol. 86:45-54.

Ziemer, C.J., M.A. Cotta, and T.R. Whitehead. 2004. Application of group specific amplified rDNA restriction analysis to characterize swine fecal and manure storage pit samples. Anaerobe. 10:217-227.
