UNIVERSITY OF CINCINNATI

Date: 21-May-2010

I, Ben W Humrighouse, hereby submit this original work as part of the requirements for the degree of: Master of Science in Environmental Science

It is entitled: Phylogenetic analysis of bacterial 16S rRNA sequences found in bulk water samples collected throughout a metropolitan area drinking water distribution system

Student Signature: Ben W Humrighouse

This work and its defense approved by:

Committee Chair: Daniel Oerther, PhD

Jorge W Santo Domingo, PhD

Margaret Kupferle, PhD, PE


Phylogenetic analysis of bacterial 16S rRNA gene sequences found

in bulk water samples collected throughout a metropolitan area

drinking water distribution system

A thesis submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

Master of Science

in the Department of Civil and Environmental Engineering

of the College of Engineering

by

Ben Humrighouse

May, 2010

B.S. Ohio University

June, 2001

Abstract

Assays for the detection of total coliforms and E. coli are used in the United States to detect fecal contamination in finished drinking water, as a measure to prevent illness. However, according to statistics published by a number of researchers, as many as 764 documented waterborne outbreaks occurred between 1971 and 2002, resulting in over half a million cases of illness and 79 deaths (Reynolds, Mena, & Gerba, 2008). It has also been theorized that as many as 19 million cases per year of waterborne illness are caused by public water systems fed by ground water and surface water in the US. It has therefore been argued that coliforms are not good indicator organisms for predicting the risk of waterborne illness. To gather more microbiological information regarding drinking water, much research has been conducted on microorganisms in the planktonic phase, as well as in biofilms present in drinking water distribution systems (DWDS).

Biofilms have the potential to harbor pathogens including bacteria, viruses, and protozoa, and to provide resistance against antibiotics and residual disinfectants (Helmi et al., 2008); (Stewart & Costerton, 2001), and have therefore been studied in depth using culture-based and molecular techniques. Not all biofilms are alike: different species of bacteria may colonize surfaces (internal or external), proliferate, and form biofilm (Zhang, Choi, Dionysiou, Sorial, & Oerther, 2006). Biofilms also create additional surface area for the attachment of other substances, including microbes. Because biofilms are continually being established, maturing, and sloughing off, it seems reasonable that the bacteria in the planktonic phase of a distribution system would be well suited to microbial community analysis. With this in mind, we decided to analyze bacterial genomic DNA in order to gain knowledge of the current community structures present in drinking water samples, to serve as a baseline for measuring future differences or shifts in community structure. Although biofilm and planktonic populations may differ to some degree (e.g., in culturability and activity) (Boe-Hansen, Albrechtsen, Arvin, & Jorgensen, 2002), they may be very similar in terms of phylogeny. We therefore analyzed the planktonic bacteria present at 31 sampling sites within a DWDS in order to characterize the bacterial community structures in these areas of the distribution system.

The United States Environmental Protection Agency has been charged with continuing to ensure public drinking water safety and source water protection via development of compliance standards and continuing research. Through analysis of bulk water samples, this study aims to identify and compare the microbial communities that are present in a DWDS fed by ground water and surface water. This study was conducted in order to gain a better understanding of the bacterial communities that are present within a municipal DWDS at Points of Use (POUs), with the focus of the study being a phylogenetic comparison between those communities found in different areas of the DWDS based on source.

A total of 2786 16S rDNA clones were analyzed in this study. Using existing databases for sequence comparison, we found that Actinobacteria (mainly

Mycobacterium sp .) and represented nearly 46% and 36% of the

total clones examined, respectively. Other bacterial groups identified in this study include

Betaproteobacteria , , Cyanobacteria , Firmicutes , Planctomycetes ,

and others. While Alphaproteobacteria have been shown to be a numerically dominant group in chlorinated drinking water system simulators (Williams, Domingo, Meckes, Kelty, & Rochon, 2004); (Williams, Santo Domingo, & Meckes, 2005), the surprising abundance of mycobacterial sequences recovered in this study indicates significant differences in microbial community structure between the DWDS analyzed here and others in the literature. These differences could be attributed to the different raw water sources or water treatment methods used by each distribution system. While some of the genera identified in this study have been associated with public health risks, analysis of 16S rDNA clones does not confirm the presence of pathogenic strains of any of the organisms identified, as current methods for detection and identification require several steps, including selective enrichment, isolation, and final confirmation via in vitro studies.

Acknowledgements

I am indebted to the following people for assisting me in the accomplishment of this thesis:

Dr. Daniel B. Oerther (University of Cincinnati, Department of Civil and Environmental

Engineering, College of Engineering) and Dr. Jorge W. Santo Domingo (United States

Environmental Protection Agency), my advisors and mentors, for their invaluable help in learning the scientific process and their continuous inspiration and patience;

Dr. Margaret Kupferle, for her valuable input serving as a member of my committee; the

US Environmental Protection Agency Traineeship Award for financial support; my coworkers at both US EPA and University of Cincinnati, and Randy Revetta for his assistance in sample preparation and collection;

I would also like to thank Dr. Geoffrey Buckley, Dr. Nancy Bain, and Dr. Scott Moody

(Ohio University Departments of Geography and Biology) for inspiring me to pursue a graduate degree in environmental science.


To my loving wife Keri and my Family, for their support in all ventures of my life


Table of Contents

Dedication

Abstract

Acknowledgements

Table of Contents

Chapter 1. Introduction

1.1 Literature Review of Studies using 16S Molecular Approach with Environmental

Samples

1.2 Available Molecular Biology Tools in Drinking Water Research

1.3 Cultivation vs. Cultivation-Independent Library-Based Approaches

Chapter 2. Biofilms and bulk water microbial analysis

Chapter 3. Cincinnati Drinking Water Distribution System

3.1 Site description of Distribution System Network

3.2 Description of sampling stations

3.3 Benefits of This Study

3.4 Application of library and culture-independent PCR assays in Drinking Water.

Chapter 4. Molecular Survey of Drinking Water Distribution System in Cincinnati using PCR and phylogenetic analyses of bacterial 16S rDNA

4.1 Material and Methods

4.1.1 Sample Collection

4.1.2 Molecular Techniques

4.1.3 Sequence analysis

4.2 Results

4.3 Discussion

4.4 Conclusions and Future Studies

Appendix A

Bibliography


Chapter 1

Introduction

In 1974, the United States Congress passed the Safe Drinking Water Act (SDWA) in order to protect public health by regulating America’s drinking water supply. The

SDWA authorizes the United States Environmental Protection Agency (US EPA) to set standards for the nation’s drinking water, in order to protect the public from any natural or man-made contaminants. The 1996 amendments to the SDWA enhanced the laws by allowing for source water protection and funding for water system improvements

(including research-based activities). One of the main points of the 1996 amendments is that the EPA is required to strengthen protection against microbial contamination as well as strengthen the control over the byproducts formed during chemical disinfection of drinking water. These amendments allow for the funding of drinking water research projects such as this study.

Coliform bacteria counts including Total Coliform Counts as well as E. coli testing are conducted by public water systems as a requirement by the United States

Environmental Protection Agency (US EPA), under the Safe Drinking Water Act, as these organisms are indicators of contamination by animal waste or human sewage

(www.epa.gov/safewater/contaminants/ecoli.html). These water systems are required to monitor for total coliforms, and if a sample tests positive, it must be tested for E. coli using culture-based assays. The frequency of these assays depends mainly on the number of people the system serves. This method relies on cultivation of the microbes, and has been criticized for providing little other information regarding the safety of public drinking water.

Reynolds and colleagues (Reynolds et al., 2008) recently reviewed studies that have addressed microbial outbreaks in drinking water in the US over the last 30 years. Citing Blackburn and Calderon (2004), they reported 764 documented waterborne outbreaks between 1971 and 2002. These outbreaks resulted in 575,547 cases of illness and 79 deaths. However, the actual number of illnesses associated with drinking water consumption is thought to be much higher, as many gastrointestinal illnesses are not reported, and infections other than gastroenteritis are not usually reported.
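To put the reported totals in proportion, the short sketch below derives a few simple rates from the three figures quoted above. It is only illustrative arithmetic: the function name `summarize` is hypothetical, and counting 1971 through 2002 as 32 inclusive reporting years is an assumption, not something stated in the cited sources.

```python
# Figures quoted in the text (Blackburn and Calderon, as cited by Reynolds et al., 2008).
OUTBREAKS = 764
CASES = 575_547
DEATHS = 79
# Assumption: 1971 through 2002 counted inclusively, i.e. 32 reporting years.
YEARS = 2002 - 1971 + 1


def summarize(outbreaks, cases, deaths, years):
    """Return simple rates derived from the reported totals."""
    return {
        "outbreaks_per_year": outbreaks / years,
        "cases_per_outbreak": cases / outbreaks,
        "deaths_per_100k_cases": deaths / cases * 100_000,
    }


summary = summarize(OUTBREAKS, CASES, DEATHS, YEARS)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

These are averages over documented outbreaks only; they say nothing about how cases were distributed across individual outbreaks or about unreported illness.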

Due to the current practice of monitoring total coliforms and E. coli in drinking water, and the inability of these indicator organisms to predict the majority of waterborne illness via drinking water, more information describing the community structures of microbes

(specifically bacteria) inhabiting a drinking water distribution network is justified. This study was conducted in order to gain a better understanding of the bacterial community structures that exist in the harsh oligotrophic conditions of drinking water while comparing community structures present in ground water fed areas of a DWDS to those community structures present in areas of a DWDS fed by surface water.

The primary sources of drinking water in the world are surface water and ground water. Previous studies have shown that ground water and surface water differ greatly with respect to types of microbes, organic content, and mineral quantities (Chapelle, 2000). Regardless of their microbial composition, source waters undergo a similar series of disinfection treatments before reaching the consumer as potable water. It is also known that drinking water treatment processes vary in their effectiveness at removing microorganisms (Norton & LeChevallier, 2000). Given these differences in initial water quality and treatment regimens, it can be assumed that the bacterial communities in drinking water distribution networks may differ in their members, or in the relative abundances of those members.

Traditional culture-based approaches have provided valuable information regarding microorganisms in drinking water samples. However, these techniques tend to be time consuming and to grossly underestimate the microbial diversity in environmental samples. These shortcomings stem mainly from the selectivity of the growth media and other laboratory conditions, such as incubation temperature: a given medium supplies only specific nutrients in specific amounts, so only organisms suited to those conditions will grow. A further problem is that organisms may enter a dormant phase under oligotrophic conditions (such as those of a DWDS), persisting in a viable but non-culturable state when optimal conditions for growth are not maintained.

Bacteria inhabiting drinking water distribution systems have been studied using culture-based approaches. Many of these studies have used heterotrophic plate count (HPC) methods and media such as R2A, among others (Lehtola, Miettinen, Vartiainen, & Martikainen, 2002); (Uhl & Schaule, 2004). These approaches have been successful in gathering bacterial presence data as well as phylogenetic information, but have been criticized for their inability to capture the diverse phylogenetic structure and relationships that exist in drinking water system environments (Szewzyk, Szewzyk, Manz, & Schleifer, 2000); (Eichler et al., 2006); (Santo Domingo et al., 2003); (Williams et al., 2004); (Keinanen-Toivola et al., 2006). Studying such bacterial community structures with molecular biology tools can provide considerably more information and insight into the complex microbial communities that exist in such environments.

The advent of molecular biology tools has provided researchers with a wealth of new information regarding microorganisms in samples taken from various environments. These technologies, including the polymerase chain reaction (PCR), are currently used for detection, quantification, phylogenetic analysis, and many other applications. These methods can target molecules that are unique to specific types, classes, and species of microorganisms. For example, the 16S rRNA gene is a universally recognized target, as it is present in all bacteria and is highly conserved across bacterial lineages. The gene encodes the RNA component of the small ribosomal subunit, which is essential for protein synthesis in bacterial cells. Although the gene as a whole is highly conserved, it also contains variable regions whose sequences differ among bacteria, which makes it useful for distinguishing them.
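The contrast between conserved and variable positions can be made concrete with a per-column Shannon entropy over an alignment. The sketch below is a minimal illustration, not a method used in this study: the function names and the toy alignment are hypothetical, and real 16S rRNA gene alignments are roughly 1,500 positions long and contain gaps that this sketch does not handle.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy (bits) of one alignment column; 0 means fully conserved."""
    total = len(column)
    counts = Counter(column)
    return sum((n / total) * math.log2(total / n) for n in counts.values())


def conservation_profile(aligned_sequences):
    """Entropy at every position of equal-length, pre-aligned sequences."""
    if len({len(s) for s in aligned_sequences}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    return [column_entropy(col) for col in zip(*aligned_sequences)]


# Hypothetical toy alignment: positions 0-2 and 5 conserved, 3-4 variable.
toy_alignment = ["ACGTAC", "ACGTTC", "ACGAGC", "ACGCCC"]
profile = conservation_profile(toy_alignment)
for position, bits in enumerate(profile):
    label = "conserved" if bits == 0 else "variable"
    print(f"position {position}: {bits:.2f} bits ({label})")
```

Conserved stretches are where universal primers can be placed, while the higher-entropy positions carry the signal used to tell groups of bacteria apart.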

The 16S rRNA gene has been used for phylogenetic analysis of bacteria in a broad spectrum of environmental samples, including soil, water, and air (Keinanen-Toivola, Revetta, & Santo Domingo, 2006); (Stephen, McCaig, Smith, Prosser, & Embley, 1996); (Angenent, Kelley, St Amand, Pace, & Hernandez, 2005), in order to obtain information regarding the identity of individual bacteria, as well as the bacterial community structure that exists in such samples. This target gene has been used in recent drinking water studies (Eichler et al., 2006); (Keinanen-Toivola et al., 2006); (Santo Domingo, Meckes, Simpson, Sloss, & Reasoner, 2003); (Schmeisser et al., 2003); (Williams et al., 2005); (Revetta, Pemberton, Lamendella, Iker, & Santo Domingo, 2009); (Feazel et al., 2009); (Williams et al., 2004) in order to gain more insight regarding the types of bacteria present in such an oligotrophic environment. Metagenomic analyses have also produced information describing the function of other genes present in biofilm and bulk water samples from drinking water (Schmeisser et al., 2003). Most of these studies have used distribution system simulators, annular reactors, and other submerged materials to develop biofilms for use in 16S phylogenetic analyses (Williams et al., 2004); (Santo Domingo et al., 2003); (Keinanen-Toivola et al., 2006); (Schmeisser et al., 2003); (Regan, Harrington, & Noguera, 2002). Although such studies have provided phylogenetic information from samples in distribution systems, the majority have focused on a limited number of sampling sites, or on annular reactors.

Studies on biofilms and their ability to harbor pathogens have been conducted (Donlan, 2002); (Stewart & Costerton, 2001); (Helmi et al., 2008). Biofilms tend to slough off during the normal course of their development, consequently entering the bulk water phase within a pipeline. We theorize that this process of biofilm development and sloughing occurs at a regular rate due to the steady pressure and flow of water in a DWDS.

This study aims to reveal differences in the bacterial composition of two areas of a metropolitan drinking water distribution system that are fed by two different sources of water (ground water and surface water). Emphasis was placed on the number of sampling sites, as this approach may provide more valuable information regarding the community structures that exist in water consumed by the customers of a metropolitan area. To our knowledge, this is the largest 16S clone library study that has been conducted on a drinking water distribution system. The study may also provide much needed information regarding bacterial community dynamics that may be influenced by raw water sources and subsequent water treatment technologies. We chose the full-cycle 16S approach in order to obtain bacterial sequence data from grab samples taken at 31 sites within the DWDS. These sequence data were analyzed using computer programs developed by mathematicians and microbial ecologists in order to reveal differences between the sampling areas.

1.1 Literature Review of Studies using 16S Molecular Approach in Water

Culture-based methods and biochemical techniques (Keinanen, Martikainen, & Kontro, 2004) are commonly used in an attempt to characterize drinking water bacterial communities, and microbial community structure and biomass in developing drinking water biofilms have been analyzed using such techniques. Many of these phenotypic methods, however, are time consuming and tend to grossly underestimate the microbial diversity within water samples. To avoid these problems, molecular techniques are now widely used to detect and characterize microorganisms that are difficult to culture.

For example, studies aimed at improving detection of Legionella using molecular methods have been conducted, owing to the theorized ability of this organism to live as an intracellular symbiont within amoebae. This relationship was described by Rowbotham (1986) and studied in vivo and in vitro by Vandenesch et al. (1990).

Additionally, bacteria may persist in the environment in a viable but non-culturable (VBNC) state, although the existence of such a state has been debated. The primary reasons for an organism to enter such a state are environmental in nature; examples include lack of nutrients, changes in temperature or pH, and competition for resources. Once an organism reaches such a state, it may be difficult for a researcher to cultivate because of its very specific needs. Because VBNC bacteria do not grow on artificial media, bacterial enumeration using culture techniques can result in a gross underestimation of bacterial counts (Barer, 1997).

In recent literature, a large number of studies have relied on 16S rRNA sequence analysis to detect as well as characterize natural microbial communities. The small-subunit rRNA gene (16S in bacteria) is vital to protein synthesis and has therefore been maintained in all living organisms (Olsen, Lane, Giovannoni, Pace, & Stahl, 1986). Moreover, the primary structure of this gene (that is, the DNA sequence) is highly conserved, giving researchers the ability to exploit this molecule in many different ways. As a consequence, microbial taxonomists have used sequence analysis of this gene to infer the phylogenetic affiliation of bacterial populations (Olsen et al., 1986). This target has also been used to detect specific bacteria, including a host of microbes native to the intestinal tract of animals, for microbial source tracking (Lamendella, Domingo, Oerther, Vogel, & Stoeckel, 2007).

The 16S rRNA gene has also been used as a tool in the construction of phylogenetic trees based on the mixed DNA found in complex environmental samples. Since it is possible to amplify this gene from whole-community DNA extracts using universal primers, it is then possible to develop 16S rDNA clone libraries and study the composition of complex microbial communities without relying on culture-based techniques. Using this approach, scientists have been able to study the microbial networks of a wide array of environments. For example, Dunbar and colleagues used this molecule in a study comparing microbial diversity within soil samples using culture-based and molecular techniques (Dunbar, Takala, Barns, Davis, & Kuske, 1999). In another example, Bernhard and Field used genetic markers for fecal anaerobes to detect non-point sources of fecal pollution (Bernhard & Field, 2000). The molecule has also been used to explore the diversity of members of a particular genus. For example, September, Brozel, and Venter (2004) explored the diversity of non-tuberculoid mycobacterial species in drinking water distribution systems, constructing phylogenetic trees that support the diversity of mycobacterial sequences found in the mixed DNA of the drinking water system under study. In another study, Santo Domingo et al. (2003) monitored and compared the impacts of different disinfection chemicals (chlorine and chloramine) on microbial communities within a simulated drinking water distribution system.

Many studies have relied on the inherent qualities of the 16S target in DWDS biofilm

research as well. An example of this type of work can be seen in a study conducted by

Williams and colleagues (Williams, et al ., 2004), in which the researchers were able to

grow biofilms under a number of different conditions using annular reactors. They were

able to successfully grow and sample biofilms from these reactors in order to assess the

microbial diversity within each system.

The 16S as well as the 23S rRNA molecules have also been used in a number of studies as targets for fluorescent oligonucleotide probes. In one example (Manz et al., 1993), the researchers were able not only to detect and identify specific types of bacteria, but also to quantify them.

The goal of the study was to accurately characterize and compare the microbial

community structures within a drinking water distribution system, based on source water

type, using a 16S rDNA phylogenetic approach. Clone libraries were developed and

sequenced to more accurately characterize microbial populations in drinking water

samples. Water samples from the two existing drinking water distribution systems were

used to extract whole microbial community DNA. The two distribution systems will be

referred to as DS1 (ground water fed) and DS2 (surface water fed).
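One simple way to compare two clone libraries such as DS1 and DS2 is to convert clone counts to relative abundances and compute a dissimilarity index between them. The sketch below uses the Bray-Curtis index purely as an illustration; it is not claimed to be the statistic used in this thesis, the function names are hypothetical, and the clone counts are invented toy numbers rather than study results.

```python
def relative_abundance(counts):
    """Convert clone counts per taxon into fractions of the library."""
    total = sum(counts.values())
    return {taxon: n / total for taxon, n in counts.items()}


def bray_curtis(a, b):
    """Bray-Curtis dissimilarity: 0 for identical profiles, 1 for no shared taxa."""
    taxa = set(a) | set(b)
    difference = sum(abs(a.get(t, 0) - b.get(t, 0)) for t in taxa)
    total = sum(a.get(t, 0) + b.get(t, 0) for t in taxa)
    return difference / total


# Hypothetical clone counts, NOT data from this study.
ds1_counts = {"Actinobacteria": 50, "Alphaproteobacteria": 30, "Betaproteobacteria": 20}
ds2_counts = {"Actinobacteria": 20, "Alphaproteobacteria": 60, "Cyanobacteria": 20}

ds1 = relative_abundance(ds1_counts)
ds2 = relative_abundance(ds2_counts)
dissimilarity = bray_curtis(ds1, ds2)
print(f"Bray-Curtis dissimilarity between DS1 and DS2: {dissimilarity:.2f}")
```

Working on relative abundances rather than raw counts keeps libraries of different sizes comparable, at the cost of ignoring differences in sampling depth.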

1.2 Available Molecular Tools used to Analyze Community Structures in Drinking

Water Research

Many techniques and technologies exist, which enable researchers to analyze the

microbes, or microbial community, in an environmental sample. These technologies are

mainly based on the exploitation of the microbial DNA, or RNA, present in a given

sample.

Molecular methods for such analysis include PCR and sequencing, denaturing gradient gel electrophoresis (DGGE), quantitative polymerase chain reaction, restriction fragment length polymorphism, terminal-restriction fragment length polymorphism (T-RFLP), and others. Some molecular tools enable one to identify specific organisms by means of library-independent approaches. Other tools allow the researcher to analyze differences in the community structure of the microbes in a given sample. Clone libraries, DGGE, and T-RFLP are a few such tools that can be used to obtain a fingerprint of the microbial community structure.

Most molecular-based analyses of community structures begin with the amplification of DNA or RNA in order to generate enough starting material for further analysis.

The polymerase chain reaction (PCR), developed in 1983 by Kary Mullis, is a technology widely used to amplify template DNA by exploiting natural DNA replication mechanisms. One needs only the template DNA of interest, some basic reagents (primers, nucleotides, a DNA polymerase, and buffer), and a heat source (preferably one that can change temperature as programmed). The applications of this technology have branched into different scientific disciplines including biochemistry, forensic science, and the detection and diagnosis of infectious disease. Mullis was awarded the 1993 Nobel Prize in Chemistry, shared with Michael Smith, for the invention of the method. Amplification of DNA from environmental samples (such as water) is critical to the analysis of microbes in such environments, especially for the detection and enumeration of specific microbes.

After amplification of the DNA present in the sample, one is left with billions of copies of the template. These copies can be used for direct community analysis (fingerprinting), such as with DGGE or T-RFLP, or they can be inserted into plasmid vectors that are then introduced into chemically competent or electrocompetent E. coli cells, a process called cloning. Once the DNA is inside the cells, one can take advantage of natural bacterial replication to produce even more copies of the amplified DNA.
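The figure of billions of copies follows from the exponential nature of PCR: under ideal conditions each cycle doubles the number of template molecules. The sketch below works through that arithmetic. The function names and the `efficiency` parameter are illustrative assumptions, and real reactions plateau well before the idealized numbers are reached.

```python
import math


def copies_after(cycles, start_copies=1, efficiency=1.0):
    """Idealized copy number after a given number of PCR cycles.

    efficiency=1.0 means perfect doubling each cycle; lower values model
    incomplete amplification (an assumption for illustration only).
    """
    return start_copies * (1 + efficiency) ** cycles


def cycles_to_reach(target_copies, start_copies=1, efficiency=1.0):
    """Smallest whole number of cycles needed to reach the target copy number."""
    ratio = target_copies / start_copies
    return math.ceil(math.log(ratio) / math.log(1 + efficiency))


print(f"Copies from one template after 30 cycles: {copies_after(30):,.0f}")
print(f"Cycles to reach one billion copies: {cycles_to_reach(1e9)}")
print(f"Cycles at 90% efficiency: {cycles_to_reach(1e9, efficiency=0.9)}")
```

Starting from more than one template molecule, as in any real environmental extract, shifts the curve earlier but does not change its exponential shape.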

Denaturing gradient gel electrophoresis (DGGE) has been used to highlight differences between microbial communities in environmental and clinical samples (Iasur-Kruh, Hadar, Milstein, Gasith, & Minz, 2009); (van Vliet et al., 2009). The technique is based on the fact that different genetic sequences contain different proportions of guanine (G) and cytosine (C) bases. As PCR products migrate through a polyacrylamide gel containing a gradient of chemical denaturant, each product eventually reaches a threshold denaturant concentration at which it begins to denature. At this point, its movement through the gel slows dramatically. Products with different sequences therefore stop at different positions, separating the PCR product mix into distinct bands. The resulting banding pattern illustrates the community profile.
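Since DGGE separation depends on base composition, a rough first look at how a set of amplicons might order on a gel is to compare their GC fractions. The sketch below does only that. It is a simplification under stated assumptions: the sequences are hypothetical toy strings, and actual band positions depend on melting-domain behavior and not on overall GC content alone.

```python
def gc_fraction(sequence):
    """Fraction of G and C bases in a DNA sequence."""
    sequence = sequence.upper()
    if not sequence:
        raise ValueError("sequence must not be empty")
    return (sequence.count("G") + sequence.count("C")) / len(sequence)


# Hypothetical toy amplicons, far shorter than real PCR products.
amplicons = {
    "clone_A": "ATATGCATAT",
    "clone_B": "GCGCGCATGC",
    "clone_C": "ATGCGCATAT",
}

# Lower GC content generally denatures at lower denaturant concentration.
ranked = sorted(amplicons, key=lambda name: gc_fraction(amplicons[name]))
for name in ranked:
    print(f"{name}: GC = {gc_fraction(amplicons[name]):.0%}")
```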

Another molecular fingerprinting method is terminal-restriction fragment length polymorphism (T-RFLP). This method also begins with amplification of DNA, but uses one or more fluorescently labeled primers. The fluorescent PCR products are then cut with restriction enzymes to produce fluorescently labeled fragments of varying size that can be detected with optical systems. T-RFLP is highly reproducible, and it can generate data on the relative abundances of certain operational taxonomic units (approximating species) in a sample (Braker, Ayala-del-Rio, Devol, Fesefeldt, & Tiedje, 2001).
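The T-RFLP idea can be sketched as an in-silico digest: for each amplicon, find the first restriction site downstream of the labeled end and record the length of that terminal fragment. The example below assumes a label on the 5' end of the forward strand and uses MspI (recognition site CCGG, cutting after the first C) as an illustrative enzyme. The function name and the toy amplicons are hypothetical and do not come from this study.

```python
from collections import Counter


def terminal_fragment_length(sequence, site, cut_offset):
    """Length of the labeled 5' terminal fragment after digestion.

    The fragment runs from the labeled 5' end to the first cut site. If the
    enzyme site is absent, the full amplicon length is returned.
    """
    position = sequence.upper().find(site)
    if position == -1:
        return len(sequence)
    return position + cut_offset


# MspI recognizes CCGG and cuts after the first C (C^CGG).
MSPI_SITE, MSPI_CUT_OFFSET = "CCGG", 1

# Hypothetical toy amplicons, far shorter than real 16S rRNA gene products.
amplicons = ["AAAACCGGTTTT", "AACCGGTTTTTT", "AAAACCGGAAAA", "AAAAAAAATTTT"]

profile = Counter(
    terminal_fragment_length(a, MSPI_SITE, MSPI_CUT_OFFSET) for a in amplicons
)
for length, count in sorted(profile.items()):
    print(f"{length} nt: {count} amplicon(s)")
```

Amplicons that share a terminal fragment length fall into the same peak, which is why a single peak can represent more than one organism.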

Fluorescence in situ hybridization (FISH) is a molecular technique used to visualize microbial communities with fluorescently labeled probes, which are designed to bind to specific nucleic acid sequences of particular groups of organisms. For example, probes have been designed to target specific genera and species of bacteria (Braun, Richert, & Szewzyk, 2009). However, in order to use this methodology, one must first obtain the target sequence (or conserved region) for these genera or species of bacteria for probe design.
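Probe design rests on base-pair complementarity: a probe binds where the target contains the reverse complement of the probe sequence. The sketch below checks a candidate probe against target sequences with an optional mismatch allowance. All names and sequences are hypothetical toy examples, and the sketch ignores hybridization temperature, probe accessibility, and the other practical factors that govern real FISH experiments.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(sequence):
    """Reverse complement of a DNA sequence."""
    return sequence.upper().translate(COMPLEMENT)[::-1]


def probe_binding_sites(target, probe, max_mismatches=0):
    """Start positions in target where the probe could hybridize."""
    site = reverse_complement(probe)
    target = target.upper()
    hits = []
    for start in range(len(target) - len(site) + 1):
        window = target[start:start + len(site)]
        mismatches = sum(a != b for a, b in zip(window, site))
        if mismatches <= max_mismatches:
            hits.append(start)
    return hits


# Hypothetical toy sequences: the probe is complementary to ACCGTA.
probe = "TACGGT"
target_organism = "TTGACCGTAAGG"
non_target_organism = "TTGACCTTAAGG"  # one base differs within the site

print(probe_binding_sites(target_organism, probe))
print(probe_binding_sites(non_target_organism, probe))
print(probe_binding_sites(non_target_organism, probe, max_mismatches=1))
```

The last call shows why specificity matters: tolerating even a single mismatch lets the probe bind the non-target sequence as well.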

1.3 Cultivation vs. Culture-Independent Library-Based Approaches

Culture-based approaches to environmental sample analysis have proven valuable in many studies over the years. Such methods have produced a wealth of data regarding microorganisms that can be isolated from samples taken from numerous environments, including soil, water, air, marine, and even intestinal environments. The gold standard in water quality remains the presence or absence of E. coli. Although this result tells us little about the other microbes in the sample, E. coli has proven to be a decent indicator organism regarding public health.

However, the culture-based approach has numerous drawbacks when used to characterize the bacterial community structures present in environmental samples. It relies on many assumptions, including that the organisms in the sample are viable; that they are able to use the specific media provided for growth; that they will thrive under the specific environmental conditions provided (temperature, pH, humidity, aerobic or anaerobic atmosphere, etc.); and that competition for resources between organisms is negligible. Not only are these methods very specific in nature, but it can take days, weeks, or even months before organisms are cultured and identified.

The use of a cultivation-independent library-based approach has let researchers gather enormous amounts of information regarding the types of microbes in samples. Microbial communities from environmental samples can be analyzed by constructing and comparing 16S rRNA gene clone libraries and by fingerprinting techniques. These approaches are more robust, in that they give researchers a snapshot of the operational taxonomic units (OTUs) present in their samples. For example, a researcher can isolate DNA from a sample, insert a fragment of that DNA into a genetically engineered host cell (usually E. coli), and let the host produce millions of copies of the fragment through normal proliferation. With so many copies available, further reactions, such as sequencing reactions, can be carried out to identify each base in the sequence and, by comparison with databases, assign the sequence to a phylogenetic group. Gene sequences have been accumulating rapidly in publicly available databases over recent years. Databases such as GenBank (www.ncbi.nlm.nih.gov) and RDP (rdp.cme.msu.edu) provide data and services to researchers, including annotated bacterial and archaeal small-subunit 16S rRNA sequences that are commonly used for comparison and classification. Sequences are constantly being added to these databases from isolated or cloned DNA fragments originating in a variety of environments.
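The database comparison step can be pictured as finding the reference sequence most similar to a query clone. The sketch below scores percent identity between a query and a small set of references and reports the best match. It is a deliberately simplified illustration: the reference names and sequences are hypothetical, the sequences are assumed to be pre-aligned and of equal length, and real database searches rely on alignment heuristics and statistical classifiers, not a position-by-position comparison like this.

```python
def percent_identity(a, b):
    """Percentage of identical positions between two pre-aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a.upper(), b.upper()))
    return 100.0 * matches / len(a)


def best_match(query, references):
    """Return (reference name, percent identity) for the closest reference."""
    scored = ((name, percent_identity(query, seq)) for name, seq in references.items())
    return max(scored, key=lambda hit: hit[1])


# Hypothetical toy references, NOT real database entries.
toy_references = {
    "reference_A": "ACGTACGTAC",
    "reference_B": "ACGTTTGTAC",
    "reference_C": "TTGTACGGAC",
}
query_clone = "ACGTACGTAT"

name, identity = best_match(query_clone, toy_references)
print(f"Closest reference: {name} ({identity:.0f}% identity)")
```

A percent-identity cutoff applied to scores like these is one common way of grouping sequences into OTUs, although the appropriate cutoff is a matter of convention.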

Chapter 2. Biofilms and Bulk Water Microbial Analysis

Many studies have been conducted on biofilms and bulk water from drinking water samples. These include culture-based studies, phylogenetic analyses, and studies of the potential for harboring pathogens, as well as studies of biofilm characteristics such as colonization, growth, sloughing, and regrowth, and of other factors such as responses to disinfection.

Biofilms can grow on just about any surface with sufficient moisture and resources for growth, including the surfaces of interior walls of a pipe in a drinking water distribution system under harsh oligotrophic conditions and residual disinfectant concentrations.

The process of biofilm formation has been described as follows. First, bacteria attach to a solid surface, which in turn provides more surface area for additional bacteria to adhere. The bacteria produce extracellular polymeric substances (EPS) as the biofilm grows. The biofilm grows, matures, and finally disperses. Biofilms are essentially communities of bacteria that can form relationships through communication, and can produce substances that protect the community through synergistic interactions.

These substances can protect the biofilm inhabitants against phagocytes, disinfectants, and antibiotics. Studies have shown that members of a biofilm can protect themselves by forming a dense matrix of cells and EPS. With the cells and EPS on the outer surfaces of the film, the cells inside are able to live and communicate with each other while being protected from detergents, disinfectants, antibiotics, and macrophages. One study has shown that the antibiotic resistance of biofilms can increase up to 1000-fold (Stewart & Costerton, 2001).

Studies utilizing R2A agar (Reasoner & Geldreich, 1985) have been conducted in order to analyze the bacterial communities that can be cultured in a medium that closely mimics the chemical composition of water flowing through drinking water distribution systems. Researchers have utilized this approach in many studies (Uhl & Schaule, 2004). For example, biofilm formation in drinking water has been shown to be affected by low concentrations of phosphorus (Lehtola, Miettinen, & Martikainen, 2002); (Williams et al., 2004); (Kalmbach, Manz, & Szewzyk, 1997).

Many studies utilizing culture-based approaches, molecular tools, and combinations of both have been conducted on drinking water bacteria (LeChevallier, Babcock, & Lee, 1987); (Norton & LeChevallier, 2000); (Olson & Nagy, 1984); (Payment, Gamache, & Paquette, 1988); (Schmeisser et al., 2003).

However, the use of molecular tools has provided a greater amount of information regarding the identities of bacteria that inhabit such systems. Santo Domingo and colleagues (2003) showed that the 16S rRNA gene is a useful target for molecular phylogenetic studies of drinking water bacteria, because more accurate phylogenetic information can be gathered using this approach. They also showed that the majority of bacteria inhabiting the system under study belong to the classes Alpha- and Betaproteobacteria. Further, they showed that other inhabitants of the system under study include Gram-positive bacteria such as Mycobacteria, Alphaproteobacteria such as Pedomicrobium, Hyphomicrobium, and Sphingomonas, Betaproteobacteria such as Dechloromonas and Aquaspirillum, and Gammaproteobacteria such as Legionella. Santo Domingo and colleagues showed that the Gram-positive bacteria were more abundant in the clone libraries generated than in the isolates grown on R2A (Santo Domingo et al., 2003).

Research has been conducted in order to evaluate the ability of biofilm to protect

pathogens. For example, Helmi and colleagues utilized molecular assays to describe the

potential for biofilm to protect parasites and viruses over approximately 40 days after

spiking the annular reactors (Helmi, et al ., 2008).

Researchers have also studied the responses of the bacterial communities

occurring after changes in treatment regimens and disinfection types (chlorine and

chloramine) (Pryor et al ., 2004); (Williams, et al ., 2005).

Classification of bacteria, whether by culture-based or molecular approaches, has revealed that the majority of culturable bacteria found in distribution system samples fall into the phylum Proteobacteria (Martiny, Albrechtsen, Arvin, & Molin, 2005). Other phyla that have been identified include Actinobacteria, Firmicutes, and Bacteroidetes.

Chapter 3. Cincinnati Drinking Water Distribution System

3.1 Site Description of Distribution System Network

The distribution system network is located north of the Ohio River and is surrounded mainly by the interstate I-275 beltway (except for portions of the West side).

The Miller plant provides water to the light blue areas on the map, and the Bolton water treatment plant provides water to the orange colored section of the map (Figure 1). The

Miller plant provides roughly 88 percent of the water to the greater Cincinnati area and the Bolton plant provides the other 12 percent. The Miller plant is fed by the Ohio River, and the Bolton Plant draws water from the Great Miami Aquifer. These plants also differ by treatment train (Figures 2 and 3).


Figure 1. Map of the Greater Cincinnati Water Works Service Area

Figure 2. Schematic representation of the surface water treatment regimen used to treat

Ohio River water for public use. http://www.cincinnati-oh.gov/water/pages/-3283-/


Figure 3. Schematic representation of the ground water treatment regimen used to treat water pumped from the Great Miami Aquifer for public use. http://www.cincinnati-oh.gov/water/pages/-3283-/

3.2 Description of Sampling Stations

Sampling sites visited in this study were chosen based on their location in the distribution system network. Clone libraries were generated from numerous sites within both areas of the distribution system. These sites are the same sampling sites used by the

Cincinnati Water Works employees to gather data such as pH, free chlorine, total chlorine, temperature, and sometimes bacterial detection samples. The majority of these sites are gas stations and public works buildings. These buildings, being smaller structures, have a relatively low pipe volume, which makes them more attractive sites for sampling, as internal pipes can be flushed more readily with a 5-minute flush before sampling.


Figure 4. Sampling sites visited throughout course of study

3.3 Benefits of This Study

This study aims to reveal differences in bacterial community structures that exist within two areas of a metropolitan area drinking water distribution system. The benefits associated with this study are that the data gathered will provide the US Environmental

Protection Agency with a better understanding of the bacteria that can be found in bulk water samples in drinking water. Many past studies have used culture-based approaches to identify living members of the bacterial communities in drinking water, but more information is needed to paint a better picture of the diversity of bacteria that exist in drinking water, as the biofilm in drinking water networks can harbor potential pathogens and may thereby affect public health. As stated in the abstract, statistics published by a number of researchers document as many as 764 waterborne outbreaks between 1971 and 2002, which resulted in over half a million cases of illness and 79 deaths. Also, it has been theorized that as many as 19 million cases/year of waterborne illness are caused by public water systems fed by ground water and surface water in the US. Because E. coli does not reliably predict gastrointestinal illness or other infections, phylogenetic analysis of bacteria found in bulk water samples is justified as a means of gathering more information on the complex community structures in drinking water distribution systems.

These community structures may produce favorable conditions for pathogenic microbes.

Also, detectable disturbances of relative abundances of specific organisms or certain genera of organisms may serve as a signal or early warning of contamination or system failure, to utility operators.

3.4 Application of Library and Culture-Independent PCR Assays in Drinking

Water.

In order to gain a more comprehensive perspective on the diversity of bacteria in bulk water samples of a DWDS, a clone library-based, culture-independent full cycle 16S approach was used. This approach, implemented through the development of clone libraries, was chosen for this study because it can identify bacteria down to the genus level (if not species). With the development and continued updating of publicly available databases such as GenBank (www.ncbi.nlm.nih.gov), which contains sequences of more than 10,000 organisms, and Greengenes (www.greengenes.lbl.gov), which houses over 230,000 aligned 16S rDNA sequences for comparisons, searching and comparing sequence data generated from clone libraries has become a common approach to phylogenetic analysis. Furthermore, market competition has driven down the cost of chemicals and enzymes needed for PCR, cloning, and sequencing reactions, making the full cycle 16S approach more attractive. Sequence data were analyzed against the NCBI database using BLASTn software (Altschul et al., 1997) as well as with the RDP classification tool (Wang, Garrity, Tiedje, & Cole, 2007). Sequencher 4.6 (Gene Codes, Ann Arbor, MI) was used to edit the contiguous sequences. Greengenes, an internet-based tool for analysis of 16S sequence data, was used to align sequences as well as to check for chimeric sequences (DeSantis et al., 2006).

Chapter 4. Molecular Survey of Drinking Water Distribution System in

Cincinnati using PCR and phylogenetic analyses of bacterial 16S rDNA

4.1 Materials and Methods

4.1.1 Sample Collection

The general experimental design used in this study is shown in Figure 5. Water samples were obtained directly from faucet heads at 31 different sites within the drinking water distribution systems of a metropolitan area. The major differences between DS1 and DS2 are water source and treatment processes. Water in the distribution system coming from surface water undergoes a treatment train that includes the addition of settling aids before a primary settling stage. Following the settling stage (reservoir), the pH of the water is adjusted and final settling occurs. After the final settling stage is accomplished, the water is passed through a traditional sand and gravel filtration step, and finally through granular activated carbon, which helps in the removal of organics.

Water gathered from ground water sources undergoes a somewhat different train of treatment. First, the water is pumped up from well fields located within an aquifer.

Secondly, the water is softened through the addition of lime. Following the softening process, the water is allowed to settle, helping to remove solids and the previously added lime. After the water is allowed to settle, chlorine and fluoride are added prior to traditional filtration through coal, sand, and gravel. Finally, the water is pumped to a reservoir for storage before release into the distribution system. In this study, the source for DS1 is ground water, while the source for DS2 is surface water.

Faucets were run with the cold valve completely open for 5 minutes in order to flush out idle water from the pipes within the structure. One liter polypropylene

(Nalgene) bottles were used to collect water. A total of 5 liters was obtained from each sample location and transported to the laboratory in coolers. Temperature, pH, free chlorine concentration, and total chlorine concentration were measured and recorded at each sampling site (Table 3 in Appendix). Samples were processed within three hours of collection.

[Figure 5 flowchart steps: collection and filtration of water (5 L) from two distribution systems; extraction of genomic DNA from water filters; PCR with universal primers 8F and 787R; cloning (ligation of PCR product to vector and transformation of plasmid into E. coli competent cells); M13 PCR to confirm correct insert size; automated sequencing; phylogenetic analysis.]

Figure 5. Experimental approach used in this study. The end result is the phylogenetic identification of bacterial populations directly from whole microbial community DNA extracts.

4.1.2 Molecular Techniques

DNA Extraction

Prior to nucleic acid extractions, water samples were filtered through polycarbonate membranes (47 mm in diameter, 0.22 µm pore size; Osmonics Laboratory Products). Water samples from three areas within a municipal drinking water distribution system were used to extract whole microbial community DNA. Membranes were folded in half, placed in 2 ml tubes, and stored at -80°C until used in extractions. DNA extractions were performed using the UltraClean Soil DNA kit (MoBio Laboratories Inc., Solana Beach, CA), as others have been able to extract DNA from various environmental samples with this kit (Cordova-Kreylos et al., 2006); (Perreault, Andersen, Pollard, Greer, & Whyte, 2007); (Blackwood, Oaks, & Buyer, 2005); (Souza et al., 2006). Extracted DNA was stored at -20°C in a buffered solution provided in the DNA extraction kit. Bacterial community characterization was performed by 16S rDNA sequencing analysis. This gene, which is present in all bacteria, is considered a good phylogenetic marker due to the different levels of sequence conservation present in the entire gene family. Microbial community

DNA was used in Polymerase Chain Reaction (PCR) studies to develop 16S rDNA clone libraries. Primers 8F (5'-AGAGTTTGATCCTGGCTCAG) and 787R (5'-GGACTACCAGGGTATCTAAT), which target universally conserved regions of the 16S rRNA gene, were used to generate the mixed community PCR products from which clone libraries were developed, as done in other studies (Rodriguez, Wachlin, Altendorf, Garcia, & Lipski, 2007); (Badenoch, Mills, Woolley, & Wetherall, 2007); (Wilson, Blitchington, & Greene, 1990); (Merrill, Dunbar, Richardson, & Kuske, 2006); (Kuske, Barns, Grow, Merrill, & Dunbar, 2006). These primers generate a PCR product that covers more than half of the entire 16S rRNA gene and yet does not require primer walking for sequencing.
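As a small illustration of how the two primers relate to a sequenced amplicon, the following sketch computes the reverse complement of 787R, which is the motif one would expect to see near the 3' end of a clone read in the forward orientation. The primer sequences are those given above; the helper names (`reverse_complement`, `gc_content`) are hypothetical and were not part of the workflow used in this study.

```python
# Primer sequences as stated in the text, written 5' -> 3'.
PRIMER_8F = "AGAGTTTGATCCTGGCTCAG"
PRIMER_787R = "GGACTACCAGGGTATCTAAT"

# Translation table for unambiguous DNA bases only (an assumption of this sketch).
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an unambiguous DNA sequence."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def gc_content(seq: str) -> float:
    """Return the fraction of G and C bases in a sequence."""
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


# The reverse primer anneals to the forward strand, so its reverse complement
# is what appears at the downstream end of a forward-oriented amplicon.
REVERSE_SITE = reverse_complement(PRIMER_787R)

if __name__ == "__main__":
    print("787R site on forward strand:", REVERSE_SITE)
    print("GC fraction 8F:", gc_content(PRIMER_8F))
    print("GC fraction 787R:", gc_content(PRIMER_787R))
```

A check of this kind can help orient clone reads before alignment, since inserts may be ligated into the vector in either direction.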

The PCR assays contained the following reagents per 50 µl: 5 U of Ex Taq DNA polymerase (TaKaRa Mirus Bio Corp., Madison, WI), 10X Buffer (5 µl), dNTP mix (4 µl), forward and reverse primers 8F and 787R (75 picomoles each), DNA template (2 µl), and Ultra Pure water (32.75 µl). The thermal cycler conditions used were as follows: an initial denaturation step of 3 min at 94°C, followed by 35 cycles of 1 min at 94°C, 1 min at 56°C, and 1 min at 72°C. An extension step of 7 min at 72°C was included at the end of the cycling run, followed by a hold at 4°C. To confirm PCR product formation, products were screened using gel electrophoresis in 1% agarose (Fisher Scientific) at 80 V for 100 min using GelStar™ as the DNA staining dye (BioWhittaker Molecular Applications, Rockland, ME).
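The reaction setup and cycling program described above can be written down as a small data structure, which makes the arithmetic easy to check. The sketch below uses only the values stated in the text; the class and function names are hypothetical, and the time computed is the programmed hold time only (ramping between temperatures and the final 4°C hold are not included).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    label: str
    temp_c: float
    minutes: float


# Cycling program as stated in the text.
INITIAL_DENATURATION = Step("initial denaturation", 94, 3)
CYCLE = (
    Step("denaturation", 94, 1),
    Step("annealing", 56, 1),
    Step("extension", 72, 1),
)
N_CYCLES = 35
FINAL_EXTENSION = Step("final extension", 72, 7)

# Reagent volumes stated in microliters. Enzyme and primer volumes are not
# given in the text, so they are deliberately left out here.
STATED_VOLUMES_UL = {
    "10X buffer": 5.0,
    "dNTP mix": 4.0,
    "DNA template": 2.0,
    "Ultra Pure water": 32.75,
}
REACTION_VOLUME_UL = 50.0


def programmed_minutes() -> float:
    """Total programmed hold time, excluding ramp times and the 4 C hold."""
    per_cycle = sum(step.minutes for step in CYCLE)
    return INITIAL_DENATURATION.minutes + N_CYCLES * per_cycle + FINAL_EXTENSION.minutes


def unstated_volume_ul() -> float:
    """Volume left over for the components whose volumes are not stated."""
    return REACTION_VOLUME_UL - sum(STATED_VOLUMES_UL.values())


if __name__ == "__main__":
    print("Programmed time (min):", programmed_minutes())
    print("Volume left for enzyme and primers (µl):", unstated_volume_ul())
```

Under these stated values, the components with listed volumes account for 43.75 µl of the 50 µl reaction, leaving 6.25 µl for polymerase and primers.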

TA Cloning

Prior to cloning experiments, mixed community PCR products were cleaned with a QIA-quick kit (Qiagen, Inc., Valencia, CA) and then cloned using a TOPO TA cloning kit (Invitrogen Corp., Carlsbad, CA). Transformed cells were grown on Luria-Bertani agar plates containing ampicillin (100 mg/ml) as selective agent. Colonies were selected from LB-ampicillin plates and screened for the proper insert size by PCR using M13 primers. Gel electrophoresis was used to confirm PCR products. Images were documented using a

Kodak digital camera (model DC290 Zoom) and software package (Kodak 1D version

3.6.3). PCR products of expected size were cleaned as previously mentioned before use in sequencing reactions.

4.1.3 Sequence Analysis

Sequencing reactions were performed using cleaned PCR products and the ABI

Big Dye Terminator kit following the manufacturer’s instructions (Applied Biosystems,

Foster City, CA). Sequence reactions were cleaned using DyeEx plates (Qiagen) and dried down using a lyophilization step. PCR products were then resuspended in 25 µl of Hi-Di™ Formamide (Applied Biosystems, Foster City, CA) and analyzed on an ABI 3730xl DNA analyzer to obtain sequence data. The sequence data obtained were compared to sequences found in the NCBI database using BLASTn software (Altschul et al., 1997). Sequencher 4.6 (Gene Codes, Ann Arbor, MI) was used to edit the contiguous sequences. Sequences were examined for chimeras with Chimera Check (Cole et al., 2003). Greengenes, an internet-based tool for analysis of 16S sequence data, was used to align sequences as well as to check for chimeric sequences (DeSantis et al., 2006).

4.2 Results

DNA extractions performed on membrane filters yielded between 2 ng/µl and 13 ng/µl. PCR assays generated products of the expected size visualized by gel

electrophoresis, further confirming the quality of the DNA extracted. A total of 2786 16S

rDNA clones generated from drinking water samples were obtained and analyzed in this

study. Depending on the efficiency of sequencing reactions, approximately 85 clones

were analyzed per library. A total of 2325 16S rRNA clones were used for analysis after

screening for chimeras using Bellerophon 3 (Huber, Faulkner, & Hugenholtz, 2004) and

screening for more than 5 ambiguous bases using MOTHUR, which is an open source

bioinformatics computer program (Schloss et al ., 2009).
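The ambiguous-base screen described above can be illustrated with a short sketch. This is not the MOTHUR implementation; it is a minimal stand-in that assumes sequences are held in a dictionary keyed by clone name, and the clone names and toy sequences are hypothetical.

```python
UNAMBIGUOUS = set("ACGT")
GAP_CHARS = set("-.")


def count_ambiguous(seq: str) -> int:
    """Count bases other than A, C, G, T, ignoring alignment gap characters."""
    return sum(
        1
        for base in seq.upper()
        if base not in UNAMBIGUOUS and base not in GAP_CHARS
    )


def screen_ambiguous(seqs: dict[str, str], max_ambiguous: int = 5) -> dict[str, str]:
    """Keep only sequences with at most `max_ambiguous` ambiguous bases."""
    return {
        name: seq
        for name, seq in seqs.items()
        if count_ambiguous(seq) <= max_ambiguous
    }


# Toy input for illustration only; these are not sequences from the study.
toy = {
    "clone_a": "ACGTNNACGT",  # 2 ambiguous bases, retained
    "clone_b": "ACNNNNNNGT",  # 6 ambiguous bases, removed
    "clone_c": "ACGTACGTAC",  # 0 ambiguous bases, retained
}
kept = screen_ambiguous(toy)

if __name__ == "__main__":
    print(sorted(kept))
```

The default threshold of 5 mirrors the cutoff stated in the text, so a sequence with exactly 5 ambiguous bases is retained and one with 6 is discarded.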

Phylogenetic Analysis

Alignments of sequences were produced using the website Greengenes (DeSantis et al., 2006). These alignments were also used to check for chimeric sequences using the software program Bellerophon 3. Resulting chimera-free alignments were submitted to the Greengenes website http://greengenes.lbl.gov for classification. Summaries of these sequence classifications can be seen in Figures 6-10. The sequences gathered from the surface water areas of the WDS and those from the ground water areas of the WDS are summarized in separate pie charts.

Figure 6. Classification of all sequences obtained in the study

Overall, 46% of the sequences in this study were from the Actinobacteria group, 36% were from the Alphaproteobacteria group, 5% from Betaproteobacteria, 3% each from the Planctomycetes and Firmicutes groups, 2% each from the groups Cyanobacteria, Gammaproteobacteria, and Unclassified, 1% from Deinococcus-Thermus, and less than 1% from δ-proteobacteria (Figure 6).
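Percentages such as those above are relative abundances, that is, the number of clones assigned to a group divided by the total number of classified clones. A minimal sketch of that calculation follows; the function name is hypothetical and the toy labels are invented for illustration and are not data from this study.

```python
from collections import Counter


def relative_abundance(labels):
    """Return percent relative abundance per group, most abundant first."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {group: 100.0 * n / total for group, n in counts.most_common()}


# Toy classification labels, one per clone (illustrative only).
toy_labels = (
    ["Actinobacteria"] * 5
    + ["Alphaproteobacteria"] * 4
    + ["Firmicutes"] * 1
)
shares = relative_abundance(toy_labels)

if __name__ == "__main__":
    for group, pct in shares.items():
        print(f"{group}: {pct:.0f}%")
```

Because each value is a share of the library total, libraries of different sizes can be compared, although small libraries give coarser estimates.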

Figure 7. Classifications of bacteria found in the ground water fed area of the distribution system

The dominant phyla of bacteria found in areas of the distribution system fed by ground water were Actinobacteria (50%) and Proteobacteria (48%). Other groups, including Firmicutes, Cyanobacteria, Deinococcus-Thermus, Planctomycetes, Bacteroidetes, and Verrucomicrobia, were each present at 1% or less (Figure 7). Alphaproteobacteria (91%), Betaproteobacteria (5%), and Gammaproteobacteria (4%) accounted for the sequences from within the phylum Proteobacteria (Figure 8).

Figure 8. Ground water Proteobacteria


Figure 9. Classifications of surface water sequences

The relative abundances of sequences obtained from the surface water sampling sites were slightly different from those obtained from the ground water samples, in that the abundances of Proteobacteria (41%) and Actinobacteria (45%) were slightly lower (Figure 9). In addition, surface water libraries revealed other groups such as Planctomycetes (4%), Firmicutes (5%), Cyanobacteria (3%), Bacteroidetes (<1%), Verrucomicrobia (1%), and Deinococcus (1%) (Figure 9). This observation allows one to hypothesize that either the source of water or the treatment processes involved in the production of the finished water have an impact on the relative abundances of the bacterial classes and groups that can be detected using 16S-based assays.


Figure 10. Surface water Proteobacteria

Sequences identified as being members of the Proteobacteria phylum were further classified to the class level (Figure 10). The Alphaproteobacterial sequences were most abundant, accounting for 82% of the proteobacterial sequences. Sequences identified as Betaproteobacteria accounted for 14% of the sequences in this phylum, and 4% of the sequences were classified as Gammaproteobacteria. Sequences identified as Deltaproteobacteria and Epsilonproteobacteria each accounted for less than one percent of the sequences in this phylum.

Analysis of Texas and CDC samples

Additional water samples were taken from taps in laboratories in Atlanta, Georgia

(at the Centers for Disease Control and Prevention) as well as from a laboratory tap in

Texas. These samples were analyzed for phylogenetic classification in order to compare the relative abundances of bacterial sequences obtained from tap water samples from different geographic regions. Results of this analysis showed differences in community structures between the different taps. All samples were shipped to the EPA on ice, and DNA was extracted upon arrival. The majority of clones analyzed from the CDC samples were classified as Gammaproteobacteria (71%), followed by Alphaproteobacteria (25%) and Actinobacteria (3%). Members of the phylum Firmicutes were also represented in the samples (1%). A total of 88 partial 16S sequences were obtained and analyzed from the CDC tap water samples. Samples taken from taps in Atlanta, Texas, and the system under study showed major differences in relative abundances of bacterial groups. Although the sequence data set collected from the distribution system under study was much larger than that of either the Texas samples or the Atlanta samples, the relative abundances of the bacterial classes and phyla still showed major differences in composition. The two Texas libraries were composed of 54 and 51 sequences. The two Atlanta libraries were composed of 54 and 34 sequences.


Figure 11. Classifications of sequences gathered from Atlanta tap water samples

[Figure 12 pie chart, "Classifications of Sequences from Texas Tap Water Samples": Alphaproteobacteria 73%, Cyanobacteria 10%, Betaproteobacteria 6%, Firmicutes 5%, Gammaproteobacteria 3%, Deltaproteobacteria 2%, Actinobacteria 1%.]

Figure 12. Classifications of sequences gathered from Texas tap water samples

Samples analyzed from Texas tap water showed greater similarity with the tap water samples gathered from the distribution system under study, in that the Alphaproteobacterial sequences were more dominant in the clone libraries generated. Also, the classifications covered many other groups such as Cyanobacteria, Betaproteobacteria, and Deltaproteobacteria. However, one major difference between the samples analyzed in the current study and the samples received from Atlanta and Texas was that the relative abundance of Actinobacteria sequences was much greater in the samples taken from the Cincinnati WDS. Sequences closely related to the class Gammaproteobacteria were shown to dominate the libraries generated from the samples sent from CDC, again with sequences closely related to Actinobacteria represented in low relative abundances. It should be noted that the sequences representing the Actinobacteria class were further classified to the genus level by the classification tool in Greengenes (which utilizes public databases such as RDP and NCBI). In all, 1234 Actinobacterial sequences were recovered. Of these, 94% were classified as Mycobacterium, 3% as Microbacteria, 1% as Pseudonocardia, and 4% were unidentified Actinobacteria.

4.3 Discussion

Sequence comparisons with existing databases revealed Actinobacteria as the dominant class of bacteria found in the WDS studied. Alphaproteobacteria was the next most abundant class of bacteria in both surface water fed locations and ground water fed locations. Previous studies have shown Betaproteobacteria to be the dominant class of bacteria found in drinking water (Kalmbach et al., 1997). However, this difference may be attributed to abiotic factors, such as nutrient loads or available organic carbon, or possibly to hydrogeochemical factors affecting those areas of study. For example, Rubin and Leff (2007) showed differences in community structure related to nutrient availability in the Ohio River (USA), a source of drinking water for many communities. Clone libraries generated from samples gathered from another distribution system (CDC samples) were dominated by the Gammaproteobacteria class. Tap water samples gathered from Texas were also analyzed, and found to be dominated by Alphaproteobacterial sequences. Clones analyzed from the Texas samples seemed to be more diverse, as other bacterial classes were represented, including Alpha-, Beta-, Delta-, and Gammaproteobacteria. Firmicutes and Cyanobacteria were also represented.

Legionella, Pseudomonas, and Agrobacterium-like sequences were also identified in this study. Only 170 clones generated from the area of the distribution system fed by surface water, and 48 from areas fed by ground water, matched sequences in the RDP database at a value of less than 90%, accounting for 7% of the total number of clones examined. Phylogenetic analysis of the clones examined in this study illustrated similarities to previous studies pertaining to drinking water biofilms. For example, 16S rRNA gene sequences identified as mycobacteria, Alphaproteobacteria, and Betaproteobacteria have been previously shown to be present in both drinking water biofilms and drinking water planktonic populations (Santo Domingo et al., 2003); (Williams et al., 2004); (Williams et al., 2005).

In this study, nearly identical sequences were recovered from 31 independent sites within the same distribution system, suggesting that these bacterial groups can be considered part of the normal microbiota of this drinking water system. Moreover, the isolation of mycobacteria and Alphaproteobacteria (Covert, Rodgers, Reyes, & Stelma, 1999); (Falkinham, Norton, & LeChevallier, 2001) and the finding of rRNA-based clones affiliated to these bacterial groups (Keinanen-Toivola et al., 2006) suggest that some drinking water bacteria are capable of surviving the relatively harsh conditions found in drinking water. It should be noted that many of the sequences identified in this study are closely related to partial 16S sequences deposited into public databases. In fact, using BLAST, we found that many sequences identified in this study matched most closely to sequences submitted by Santo Domingo and Williams (Williams et al., 2004). However, the main difference between this study and most other phylogenetic studies conducted on drinking water is the high relative proportion of mycobacterial 16S sequences. Although the sequence data from this study share many similarities with those of Santo Domingo and Williams et al., this difference may be related to the fact that mycobacteria tend to be slow-growing bacteria, and the Santo Domingo study was conducted over a period of 36 weeks under different residual disinfectants (chlorine and chloramines), possibly not allowing enough time and stable conditions for the mycobacterial populations to reach maximum relative abundance (Williams et al., 2004). Recently, Feazel and colleagues (Feazel et al., 2009) were able to detect high numbers of mycobacterial sequences in swabs of disassembled shower heads from a variety of geographic locations. The results of this study clearly support the findings of other studies, in that mycobacteria inhabit drinking water distribution systems, whether in the biofilm or in planktonic phase.

Sequence analysis of the 16S rDNA clones generated from the surface water samples showed that 46% of the total sequences were closely related to the Alphaproteobacteria class. This supports the findings of Williams et al. (2004), who showed that Alphaproteobacteria are a numerically dominant group in drinking water distribution systems using chlorine as the disinfectant. Interestingly, 45% of the sequences recovered from the surface water fed areas, and 50% of the sequences recovered from the ground water fed areas, of the WDS were affiliated with mycobacterial species in publicly available databases. Sequences closely related to the species Mycobacterium gordonae, Mycobacterium massiliense, Mycobacterium sacrum, and Mycobacterium sp. accounted for this portion of sequences when using BLAST.

Many of the phylogenetic groups identified in this study have been shown to be present in natural surface water samples (Zwart et al., 2002). In that study, Zwart et al. identified bacterial divisions found in freshwater samples. The results of this study identify many of the divisions that were identified by Zwart et al. For example, Actinobacteria, Alpha-, and Betaproteobacterial sequences dominated the clone libraries generated by Zwart and colleagues. It should be noted that these divisions also dominated the sequences found in this study, with similar relative abundances (other than the Actinobacterial sequences). Additionally, Zwart and colleagues concluded that the sequences obtained in their study were found only in freshwater samples, and were not associated with soil types in the respective sampling areas. Their study also indicated that the relative abundance of 16S rRNA gene sequences found in their samples could be considered descriptive of freshwater samples in general, as their sampling efforts included different parts of the world. Comparison between the results of these two studies suggests that bacterial 16S rRNA gene sequences that can be recovered from drinking water are correlated with those obtainable from natural freshwater samples, and also that Mycobacterial sequences may be dominant in both environments. This comparison also suggests that the harsh conditions of the man-made drinking water environment may selectively favor species from the genus Mycobacterium.

Group                 Zwart Percentage    Study Percentage
α-Proteobacteria      14.1                36
β-Proteobacteria      16.5                5
γ-Proteobacteria      4.4                 2
δ-Proteobacteria      2.2                 <1
ε-Proteobacteria      0.6                 <1
CFB                   14.4                N/A
Actinobacteria        18.9                46
Gram Pos low GC       0.4                 N/A
Cyanobacteria         8.1                 2
Verrucomicrobia       7.8                 <1
Planctomycetes        3.6                 3
Green non sulphur     1.5                 N/A
Holophaga             1.2                 N/A
OP10                  1.9                 N/A
other                 1.6                 N/A
unknown bacterial     2.9                 2

Table 1. Comparison of Relative Abundance of Different Phyla of Bacteria between

Zwart et al . 2002 and This Study.
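The comparison in Table 1 can be made explicit by computing, for each group with a numeric value in both columns, the difference in percentage points between this study and Zwart et al. The sketch below copies only those table rows; entries reported as "<1" or "N/A" are omitted because they have no exact value. The variable and function names are hypothetical.

```python
# (Zwart percentage, this-study percentage), copied from Table 1.
# Rows reported as "<1" or "N/A" in either column are left out.
TABLE_1 = {
    "Alphaproteobacteria": (14.1, 36.0),
    "Betaproteobacteria": (16.5, 5.0),
    "Gammaproteobacteria": (4.4, 2.0),
    "Actinobacteria": (18.9, 46.0),
    "Cyanobacteria": (8.1, 2.0),
    "Planctomycetes": (3.6, 3.0),
    "unknown bacterial": (2.9, 2.0),
}


def percentage_point_differences(table):
    """Return (group, study minus Zwart) pairs, largest increase first."""
    diffs = {group: round(study - zwart, 1) for group, (zwart, study) in table.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


ranked = percentage_point_differences(TABLE_1)

if __name__ == "__main__":
    for group, diff in ranked:
        print(f"{group}: {diff:+.1f} percentage points")
```

Ranked this way, Actinobacteria and Alphaproteobacteria show the largest increases relative to the freshwater survey, and Betaproteobacteria the largest decrease.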

These results suggest that although source waters are treated before distribution, either live bacterial cells or DNA fragments, or both, may be introduced to the distribution system, and can subsequently be detected through sequence analysis.

Although either or both of these scenarios is possible (or even probable), the nonparametric estimators used to compare the two areas of the distribution system show that the areas fed by surface water are more diverse than those fed by ground water, based on higher numbers of species observed. This result supports the notion that bacterial communities present in ground water are less diverse than those present in surface water. This notion could be explained by the fact that surface water tends to be more mobile than ground water, and comes into contact with a wider variety of soil and associated fauna. However, it is also possible that the man-made chemistry of the drinking water from either the surface water or ground water treatment plant could be the most significant factor that influences the bacterial community structure of the distribution system. For example, the average pH of the samples coming from the ground water plant was 9.2, and the average pH of the samples coming from the surface water plant was 8.7. Another factor that could influence the composition of the drinking water bacterial communities is the soil type in any area under study, as all water (surface or ground water) is exposed to soil before the treatment process. For example, the water samples analyzed from Texas and Atlanta could show differences in community structure as a direct result of the types of bacteria that inhabit those bioregions.

Comparison of Different Indices and Estimators

Nonparametric Estimators

So, how many species are there in the samples collected in this study? Current technology, including mathematical modeling methods, is still far from being able to describe all bacteria in natural samples. This is one issue that microbial ecologists grapple with when looking at diversity in environmental samples. However, based on sequence analysis, it is possible to identify “species” or Operational Taxonomic Units (OTUs).
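To make the idea of an OTU concrete, the following sketch groups aligned sequences by pairwise percent identity using a simple greedy rule: each sequence joins the first OTU whose seed it matches at or above a chosen cutoff (97% is a commonly used value). This is an illustrative simplification and not the clustering method used by MOTHUR or in this study; the function names and toy sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity over alignment columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    matches = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-." or y in "-.":
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def greedy_otus(aligned: dict[str, str], cutoff: float = 97.0) -> list[list[str]]:
    """Assign each sequence to the first OTU whose seed it matches at >= cutoff."""
    otus: list[list[str]] = []
    for name, seq in aligned.items():
        for otu in otus:
            seed_seq = aligned[otu[0]]
            if percent_identity(seq, seed_seq) >= cutoff:
                otu.append(name)
                break
        else:
            otus.append([name])  # no seed was close enough: start a new OTU
    return otus


# Toy aligned sequences of length 100 (illustrative only).
seed = "ACGT" * 25
toy_alignment = {
    "seed": seed,
    "near": "TT" + seed[2:],        # 2 mismatches to seed: 98% identity
    "far": "T" * 12 + seed[12:],    # 9 mismatches to seed: 91% identity
}
toy_otus = greedy_otus(toy_alignment)

if __name__ == "__main__":
    print(toy_otus)
```

The outcome of a greedy scheme depends on input order and on the cutoff, which is one reason richness estimates are usually reported alongside the OTU definition used.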

Current practice in microbial ecology is to group sequences which are ≥97% similar into the same “species” or OTU. One way of dealing with the problem of incomplete sampling is to use one of the variety of estimators that have been developed by mathematicians and ecologists. One such group is the nonparametric estimators. Chao1, developed by Anne Chao in 1984, is one such estimator; the statistic aims to estimate the absolute number of species in a sample, and has been used to analyze bacterial sequence data in other studies (Schloss & Handelsman, 2005); (Bik et al., 2006). The formula is as follows:

$$S_{Chao1} = S_{obs} + \frac{F_1^2}{2F_2}$$

where $S_{obs}$ = number of species observed in the sample; $F_1$ = number of observed species represented by a single individual (singlets); and $F_2$ = number of observed species represented by two individuals (doublets). The estimate is a function of the ratio of the number of singlets to the number of doublets.

The estimate is considered to have reached its asymptote once each species is represented at least twice: no singlets remain ($F_1 = 0$), so the estimate equals the observed richness (Chao, Colwell, Lin, & Gotelli, 2009). Colwell and Coddington (1994) evaluated the performance of Chao1 and other estimators by their ability to estimate the species richness of a Costa Rican seed bank. The estimators used in their study produced accurate values of species richness from small numbers of samples.
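As an illustration of the formula above, the following is a minimal Python sketch that computes Chao1 from a list of per-OTU sequence counts. The function name `chao1` and the example counts are hypothetical and are not taken from this study. When no doublets are present the classic formula is undefined, so the sketch falls back on the commonly used bias-corrected form, S_obs + F1(F1 - 1) / (2(F2 + 1)); that fallback is an assumption of this sketch, not a step described in this work.

```python
def chao1(otu_counts):
    """Estimate species richness with the classic Chao1 formula.

    otu_counts: iterable of non-negative integers, one per OTU,
    giving the number of sequences assigned to that OTU.
    """
    counts = [c for c in otu_counts if c > 0]
    s_obs = len(counts)                          # observed OTUs
    f1 = sum(1 for c in counts if c == 1)        # singlets
    f2 = sum(1 for c in counts if c == 2)        # doublets
    if f2 == 0:
        # Assumption: use the bias-corrected form when F2 = 0,
        # because the classic formula would divide by zero.
        return s_obs + f1 * (f1 - 1) / 2.0
    return s_obs + f1 ** 2 / (2.0 * f2)


# Hypothetical library: 8 observed OTUs, 4 singlets, 2 doublets.
example_counts = [10, 5, 2, 2, 1, 1, 1, 1]
estimate = chao1(example_counts)   # 8 + 4**2 / (2 * 2) = 12.0
```

In this hypothetical library the estimator adds four unseen OTUs to the eight observed; with no singlets the estimate would simply equal the observed richness.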

The Chao1 estimate is an example of a nonparametric estimator based on a group of statistics called “mark, release, and recapture” (MRR). In other words, statistics based on this methodology track the numbers of “species” that are seen more than once. Chao1, for example, compares the abundance of singlets (species seen only once) with that of doublets (species seen twice). This nonparametric estimator, as well as others such as ACE, Jack1, and Jack2, has been compared at different similarity cutoff values (e.g., 90%, 95%, and 99%) and found to be useful when comparing diversity among communities (Shaw et al., 2008). In their study, which was based on a large data set of 6000 sequences gathered from the Global Ocean Survey (aquatic communities), Shaw and colleagues found that diversity rankings of aquatic bacterial communities changed with the percent similarity cutoff, that is, with the chosen definition of an operational taxonomic unit (OTU) for a given sample set. However, they concluded that richness and evenness statistics such as Chao1, rarefaction (S_obs), and the Shannon and Simpson indices were sufficient for comparing and ranking the diversity of the bacterial communities sampled in their study.

Comparisons were made at different similarity cutoffs between libraries constructed from clones generated in this study (Figure 13). The 100% and 98% similarity cutoffs are shown; the 95%, 97%, and 99% cutoffs were also examined (data not shown). Results for the Chao1 estimator show that bacterial communities inhabiting or simply flowing through the portion of the DWDS fed by surface water tend to be more diverse than those found in areas of the DWDS fed by ground water.


Figure 13. Chao1 estimates for ground water and surface water showing number of expected OTUs per number of sequences analyzed at OTU cutoff values of 100% similarity and 98% similarity.

The abundance-based coverage estimator, ACE, is a similar estimator that also considers the numbers of singlets, doublets, triplets, and so on, up to species represented by ten individuals. Although both methods (Chao1 and ACE) tend to underestimate diversity at low sample sizes (Colwell & Coddington, 1994), a comparison of estimates between the area of the DWDS fed by ground water and the area fed by surface water reveals that the area fed by surface water has greater bacterial richness. Although not surprising, the fact that both nonparametric estimators ranked the area fed by surface water as more diverse is another line of evidence suggesting that the drinking water originating from the Miller treatment plant is more diverse than the water coming from the Bolton plant.


Figure 14. Calculated ACE values for surface and ground water collective libraries at 100% and 98% sequence identity levels.
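The ACE calculation is not written out in the text, so the following Python sketch uses the standard published form of the estimator as an assumption: OTUs with ten or fewer sequences are treated as rare, sample coverage is C = 1 - F1 / N_rare, and the estimate is S_abund + S_rare / C + (F1 / C) * gamma^2. The function name `ace`, the parameter `rare_threshold`, and the example counts are hypothetical; this is not necessarily the exact code path used by DOTUR or MOTHUR.

```python
def ace(otu_counts, rare_threshold=10):
    """Abundance-based coverage estimator (ACE) of species richness.

    otu_counts: iterable of non-negative integers, one per OTU.
    rare_threshold: OTUs with at most this many sequences are "rare".
    """
    counts = [c for c in otu_counts if c > 0]
    rare = [c for c in counts if c <= rare_threshold]
    s_rare = len(rare)
    s_abund = len(counts) - s_rare
    if s_rare == 0:
        return float(s_abund)            # nothing to extrapolate from
    n_rare = sum(rare)                   # sequences belonging to rare OTUs
    f1 = sum(1 for c in rare if c == 1)  # singlets
    if f1 == n_rare:
        # Every rare OTU is a singlet: coverage is zero, ACE is undefined.
        raise ValueError("ACE is undefined when all rare OTUs are singlets")
    coverage = 1.0 - f1 / n_rare
    # Squared coefficient of variation of the rare-OTU abundances.
    spread = sum(c * (c - 1) for c in rare)
    gamma_sq = max(
        (s_rare / coverage) * spread / (n_rare * (n_rare - 1)) - 1.0, 0.0
    )
    return s_abund + s_rare / coverage + (f1 / coverage) * gamma_sq


# Hypothetical library: one abundant OTU and six rare OTUs.
example_estimate = ace([20, 4, 4, 2, 2, 1, 1])
```

Like Chao1, the estimate collapses to the observed richness when no singlets remain among the rare OTUs.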

Rarefaction analysis

Rarefaction analysis is another tool used by biologists to compare species richness between samples at the macro- and microbiological levels. Rarefaction can be used to compare different assemblages of organisms. Rarefaction takes into account all of the species (in this study, sequences) and allows one to estimate the richness of the sample or assemblage at smaller sample sizes. An advantage of rarefaction is that species richness can be compared between samples of different sizes, as long as the investigator “rarefies” the larger sample to the number of individuals in the smaller sample. For example, 500 sequences drawn from the larger clone library would be compared with the 500 sequences gathered from the smaller clone library.

Rarefaction allows the calculation of the species richness for a given number of sampled individuals and allows the construction of rarefaction curves. Such a curve is a plot of the number of species as a function of the number of individuals sampled. If the curve has a steep slope, a large fraction of the species diversity has not been sampled. If the curve is already flattening, a reasonable number of individuals has been sampled, and more intensive sampling will probably yield only a small number of additional species. Rarefaction was developed by Howard Sanders in 1968 as a means of comparing samples of different sizes gathered in a marine benthic ecosystem study.
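The rarefaction calculation described above can be sketched in Python as follows. The sketch assumes the analytical (hypergeometric) form of rarefaction, in which the expected number of OTUs in a subsample of n sequences is the sum, over OTUs, of the probability that the OTU appears at least once. The function names and the example counts are hypothetical, and DOTUR may instead compute its curves by repeated random resampling.

```python
from math import comb


def rarefied_richness(otu_counts, n):
    """Expected number of OTUs in a random subsample of n sequences."""
    counts = [c for c in otu_counts if c > 0]
    total = sum(counts)
    if not 0 <= n <= total:
        raise ValueError("n must be between 0 and the number of sequences")
    all_subsamples = comb(total, n)
    # comb(total - c, n) counts the subsamples that miss an OTU of size c.
    return sum(1.0 - comb(total - c, n) / all_subsamples for c in counts)


def rarefaction_curve(otu_counts, step=1):
    """(n, expected OTUs) pairs from 0 sequences up to the full library."""
    total = sum(otu_counts)
    sizes = list(range(0, total + 1, step))
    if sizes[-1] != total:
        sizes.append(total)   # always include the full library
    return [(n, rarefied_richness(otu_counts, n)) for n in sizes]


# Hypothetical libraries of different sizes, compared at equal depth.
larger = [30, 10, 5, 3, 1, 1]
smaller = [12, 6, 2]
depth = sum(smaller)   # rarefy the larger library to the smaller one
richness_larger = rarefied_richness(larger, depth)
richness_smaller = rarefied_richness(smaller, depth)
```

Comparing `richness_larger` with `richness_smaller` at the same depth is the comparison the text describes; plotting the output of `rarefaction_curve` gives curves of the kind discussed in this section.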

Rarefaction curves generated with the computer program DOTUR (Schloss & Handelsman, 2005) were used to compare the sum of all clone libraries generated from the ground water fed areas of the DWDS with the sum of the libraries generated from the surface water fed areas. In this study, rarefaction curves generated at different species definitions (e.g., 98% and 100%) showed the same pattern. When comparing ground water curves with surface water curves, one can see that at each level of species definition the surface water curves show more diversity when rarefied to equal numbers of sequences. The results of this analysis are summarized in Figure 15.


Figure 15. Rarefaction curves showing expected number of OTUs vs. number of sequences in both surface water and ground water at OTU definitions of 100%, 99%, and 97% sequence similarity.

Although there was a slightly larger amount of sequence data gathered from the surface water areas, rarefaction analysis allows us to make comparisons between these assemblages.

Community Analysis

Several tools are currently used by microbiologists to analyze microbial communities. Such analyses typically address membership (the OTUs in a sample) and structure (the combination of membership and abundance of the OTUs). One computer program, MOTHUR, developed by Dr. Patrick Schloss and colleagues, allows one to analyze microbial communities in a variety of ways. The program can report the similarities that exist between the communities under study, based on a defined species level, or OTU. It also allows one to compare the memberships and structures of communities, and to report the abundance of OTUs that are unique to one community or shared by two communities (Schloss et al., 2009). The computer program MOTHUR was therefore used to analyze the surface water and ground water clone libraries in a manner that allows comparison of shared OTUs.

MOTHUR

The computer program MOTHUR was used to compare the surface and ground water libraries at the 98% species cutoff value (OTU definition). All of the surface water sequences were grouped together to form one library, and all of the ground water sequences were grouped into another. The analysis revealed 181 species present in the surface water library and 103 species in the ground water library. Thirty-three OTUs were shared between the two large libraries, accounting for 13% of the total OTUs. The total richness of the two groups combined was 251, as seen in Figure 16.

The number of species in group S is 181
The number of species in group G is 103
The number of species shared between groups S and G is 33
Percentage of species that are shared in groups S and G is 13.1474
The total richness for all groups is 251

Figure 16. Venn diagram illustrating shared species (OTUs) between the collective surface and ground water libraries.
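The figures reported for the two collective libraries follow from simple set arithmetic, which the short Python sketch below reproduces. The function name `shared_otu_summary` is hypothetical; the three input values are the ones reported above for groups S and G.

```python
def shared_otu_summary(n_first, n_second, n_shared):
    """Return (total richness, percent of OTUs shared) for two libraries."""
    # OTUs found in both libraries are counted once in the combined total.
    total_richness = n_first + n_second - n_shared
    percent_shared = 100.0 * n_shared / total_richness
    return total_richness, percent_shared


# Values reported for the surface (S) and ground (G) water libraries.
total_richness, percent_shared = shared_otu_summary(181, 103, 33)
```

With these inputs the combined richness is 251 OTUs and the shared fraction is about 13.15%, so the percentage refers to shared OTUs rather than shared sequences.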

Sequences that were shared between the two groups were classified, as well as sequences that were not shared. Figure 17 illustrates the classification of the sequences belonging to the shared OTUs; Actinobacteria, Alphaproteobacteria, Betaproteobacteria, Gammaproteobacteria, and Cyanobacteria were shared between these two libraries.

[Pie chart: classification of OTUs shared between surface and ground water libraries. Actinobacteria 55%; Alphaproteobacteria 40%; Betaproteobacteria 4%; Gammaproteobacteria 1%; Cyanobacteria <1%.]

Figure 17. Classification of sequences in OTUs shared between the surface water library and the ground water library.

Sequences that were not shared between the two libraries were classified in the same manner. The following pie chart summarizes the results of this analysis.


Figure 18. Classification of sequences in OTUs not shared between the surface water library and the ground water library.

MOTHUR also allows the user to separate out the sequences that are not shared between the communities under study. Figure 18 is the result of this analysis and classification. Interestingly, many of the sequences that were not shared between the two collective libraries were classified as belonging to the groups Firmicutes (15%), Verrucomicrobia (1%), Bacteroidetes (1%), Planctomycetes (11%), Deinococcus-Thermus (2%), Deltaproteobacteria (<1%), and Epsilonproteobacteria (<1%). However, major groups such as Alphaproteobacteria, Actinobacteria, Betaproteobacteria, and Gammaproteobacteria, which were common to the two collective groups, also had many sequences that were not shared. These sequences belonged to OTUs that were present in only one of the two collective libraries (ground water or surface water).

Figure 19. Heat map generated using MOTHUR illustrating OTUs defined at the 98% sequence identity level and their relative abundances in surface water and ground water.

The heat map generated using MOTHUR allows one to visualize the differences in the number of OTUs in the collective surface and ground water libraries. Each line represents an OTU, and the shade of the line indicates the relative abundance of that OTU in the library. The results illustrate the greater species richness of the collective surface water library compared with the collective ground water library.

Within the MOTHUR computer program, one can use the program s-Libshuff to compare two different libraries. The program calculates the Cramér-von Mises test statistic, and depending on the calculated values, one can say with statistical confidence whether or not two libraries differ. The results of this analysis (Table 2) show that the cumulative libraries, ground water and surface water, are significantly different from one another. The significance values associated with the calculated dCXY scores are well below 0.025, the threshold applied to each of the two comparisons given an experiment-wide false detection rate of alpha = 0.05.

Comparison        dCXY score    Significance
Ground-Surface    0.00107406    <0.0001
Surface-Ground    0.00378481    <0.0001

Table 2. Results of analysis using s-Libshuff (as used in MOTHUR)
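The 0.025 threshold can be reproduced with a short Python sketch, assuming it arises from dividing the experiment-wide rate of 0.05 equally across the two directional comparisons (a Bonferroni-style correction). The function name is hypothetical, and because Table 2 reports significance only as "<0.0001", the sketch uses 0.0001 as an upper bound on each value.

```python
def per_comparison_threshold(alpha, n_comparisons):
    """Split an experiment-wide error rate equally across comparisons."""
    return alpha / n_comparisons


# Two directional comparisons: Ground-Surface and Surface-Ground.
threshold = per_comparison_threshold(0.05, 2)

# Upper bounds on the significance values reported in Table 2.
significance_upper_bounds = {
    "Ground-Surface": 0.0001,
    "Surface-Ground": 0.0001,
}
libraries_differ = all(
    p < threshold for p in significance_upper_bounds.values()
)
```

Both comparisons fall below the per-comparison threshold, which is the basis for calling the two collective libraries significantly different.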

Mycobacteria

Alignment with sequences available within the NCBI database (GenBank) showed that a large number of sequences obtained in this study were related to the following mycobacterial species: M. mucogenicum, M. massiliense, M. sacrum, M. gordonae, and M. gadium (with sequence similarities of 98% and higher). These clones represented 1063 of the 2325 clones examined. These data further support several studies on the occurrence of environmental nontuberculous mycobacteria (NTM) in drinking water using culture-based and molecular techniques (Covert et al., 1999; September et al., 2004; Torvinen et al., 2004).

It should be noted that the percentage of mycobacteria-related sequences obtained in this study is relatively high. Previous studies of biofilms exposed to chlorinated drinking water have shown that mycobacterial species account for a smaller percentage of the species richness of the biofilm (Williams et al., 2004; Keinanen-Toivola et al., 2006). However, LeChevallier and colleagues recognized that exposure to chlorine in concentrations similar to those found in DWDS selected for Gram-positive bacteria (35). Because of the differences in phylogenetic composition of the sequences obtained from samples in the distribution system under study, two drinking water samples from other locations in the US were obtained.

As previously mentioned, water samples were obtained from Atlanta, Georgia and a Texas water laboratory. Mycobacterial sequences represented 3% or less of these two libraries. These data support the notion that certain conditions in the distribution system under study selectively favor the proliferation of mycobacterial species. One possibility is the amount of assimilable organic carbon (AOC), as previous work has shown that the density of mycobacteria increases with increasing AOC (Falkinham et al., 2001). Other factors that influence mycobacterial densities are heterotrophic bacterial counts, iron, water retention times, and turbidity (Falkinham et al., 2001). Iivanainen and colleagues showed variation in mycobacteria in surface water with respect to specific hydrogeochemical characteristics of drainage areas (Iivanainen, Martikainen, Vaananen, & Katila, 1993).

Nutrients, acidic runoff, and organic matter from peatlands were shown to increase the occurrence of mycobacteria in brook waters. This illustrates the importance of geochemical variables that may affect the source waters utilized by the DWDS under study. Covert and Rodgers showed that 81% of the samples positive for nontuberculous mycobacteria (NTM) had detectable levels of chlorine (between 0.7 ppm free chlorine and 1.0 ppm total chlorine). These values are within the range of free chlorine values in the present study. Using culture-based and molecular assays, Covert and Rodgers (Covert et al., 1999) found that Mycobacterium gordonae was the only NTM species recovered from distribution systems fed by ground water sources. Among the isolates grown from samples taken from distribution systems fed by surface water, Covert and Rodgers found M. mucogenicum to be the most commonly isolated NTM. Interestingly, this organism was also the most frequently isolated organism in all of their samples.

The findings of this study, as well as others that have identified mycobacterial species in drinking water systems, are not surprising considering the morphology of this genus of bacteria. Mycobacterium species have a thick cell wall that is hydrophobic, waxy, and rich in mycolic acids. These characteristics of the Mycobacterium cell wall are thought to contribute to the organism’s hardiness and to its ability to resist antibiotics in infections caused by certain species of the genus (e.g., members of the Mycobacterium tuberculosis complex and members of the Mycobacterium avium complex). Certain species of Mycobacterium have been found to resist specific biocides when physically associated with biofilms (Bardouniotis, Ceri, & Olson, 2003).

In general terms, the high numbers of Mycobacterium and the different proportions of some members of this bacterial group in distribution systems are significant findings, given that several Mycobacterium species have been associated with potentially life-threatening illnesses and can cause severe illness and death in immunocompromised individuals. Although not all environmental mycobacteria are pathogenic, some species identified in this study have been associated with diseases in humans. For example, Mycobacterium mucogenicum was recently identified as the causative agent of neurological infections (Adekambi, Foucault, La Scola, & Drancourt, 2006; Kline et al., 2004). This is an interesting finding, as this species is considered a rapidly growing mycobacterial species, while the mycobacterial strains of most public health concern are believed to be slow-growing mycobacteria. While the molecular evidence presented in this study confirmed the presence of Mycobacterium sp. in this drinking water distribution system, there is no evidence regarding whether or not the identified strains are indeed pathogenic.

As with Mycobacterium, the detection of Legionella-like sequences does not immediately imply a public health risk, as water has been the source of several Legionella species of little or no clinical relevance (Ohnishi et al., 2004). However, these results are interesting because they suggest that Legionella can survive the harsh environmental conditions within drinking water. One possible survival mechanism relates to the ability of this bacterial group to live as an intracellular symbiont of environmental protozoa (King, Shotts, Wooley, & Porter, 1988; Ohno, Kato, Yamada, & Yamaguchi, 2003). Indeed, chlorine disinfection studies have shown that bacterium-protozoan associations can provide chlorine-sensitive bacteria with increased resistance to free chlorine (King et al., 1988). Another mechanism relates to the ability of Legionella to grow within biofilms (Rogers et al., 1994). From a microbial water quality standpoint, these survival mechanisms are relevant to human health, as they increase the persistence of potentially pathogenic bacteria in drinking water (Szewzyk et al., 2000; Parsek & Singh, 2003).

4.4 Conclusions and Future Work

This study aimed to gather phylogenetic information by way of analysis of the highly conserved bacterial 16S rRNA gene. This gene, which is widely targeted by microbiologists and microbial ecologists because of its conserved nature, was used in this study to describe the bacterial genomic DNA that could be gathered by sampling bulk water at 31 sampling sites in the Cincinnati Metropolitan Water Distribution System. Results of this study have shown that different bacterial genera are present in DWDS water samples. In addition, although the majority of bacterial phyla are represented in both areas (those fed by ground water and those fed by surface water), classification of these partial 16S sequences revealed many differences between the compiled libraries.

Firstly, many of the sequences obtained from the surface water fed area of the distribution system were not detected in the ground water samples. This result could be explained by the idea that ground water is formed by percolation of surface water down to the water table, in this case the Great Miami Aquifer. This aquifer water may have a very different chemistry from that of water flowing in the Ohio River, as it travels through the soil and may dissolve and retain minerals and metals along the way. Also, ground water is thought to contain less dissolved oxygen than surface water, which may be another factor that influences bacterial growth.

Secondly, it was interesting to find that, although the majority of the bacterial genera were identified in both the surface water and the ground water libraries, phylogenetic analysis using 98% as the species identity cutoff showed that only 33 OTUs were shared between the two compiled libraries, representing only 13% of the total OTUs that could be formed at this species identity level. This result suggests that there could be a vast diversity among the specific bacterial OTUs that were identified in this study.

Thirdly, sequences closely related to Mycobacterium species were found in high relative abundances in this study. Previous studies have identified this genus of bacteria in drinking water samples from a variety of geographic regions using culture-based as well as molecular assays. Overall, mycobacterial species accounted for roughly 46% of all sequences obtained in this study. Although partial 16S rRNA gene analysis cannot confirm the activity or infectivity of these organisms, the strikingly high relative abundances of the sequences gathered in this study may justify a project that aims to quantify the active portion of these Mycobacterium species within the DWDS under study, or in others across the country. Although culture-based studies have quantified mycobacteria in drinking water, a large number of the bacteria may be persisting in a viable but non-culturable state, which prevents culture-based approaches from accurately describing the relative abundance of these organisms. A molecular approach targeting the RNA of mycobacteria might be a more valuable indicator of the active fraction of the bacterial community that inhabits DWDS.

In conclusion, future work in this field should focus on developing molecular approaches that aim to identify active bacterial species present at points of use (POUs) within DWDS, as this approach will be more representative of the living bacteria that consumers are exposed to on a regular basis. Another idea for future work on this topic would be to attempt to correlate environmental soil types with specific 16S rRNA sequences obtained from drinking water, as ground water and surface water are exposed to different soils (bioregions). This could be the most important factor that influences the bacterial community structure of drinking water based on the 16S rRNA sequencing approach. A study focusing on temporal shifts in bacterial community structure at a limited number of sampling sites could also shed light on potential differences in community structure. Future work could also include studies that aim to describe the fate and transport of viruses in simulated DWDS, as they could be protected to some degree by the bacterial biofilms that are present in DWDS. Samples taken along the same water main should also be analyzed and compared, to add a spatial component to the study.

Appendix

Gel images:

Figure 20. Agarose gel electrophoresis of PCR assay using 16S rDNA universal primers and drinking water microbial community DNA extracts. PCR products were approximately 800 bp. The DNA ladder bands are 2000 bp, 1000 bp, 500 bp, 250 bp, and 100 bp (from top to bottom).

Date          Location   pH    Temp °C   Cl (Total)   Cl (Free)
8/17/2006     2I         9.2   22.9      1.15         1.10
8/17/2006     2Y         9.2   23.6      0.91         0.73
8/17/2006     1W         9.1   24.4      0.87         0.81
8/23/2006     4B         8.6   26.3      0.82         0.75
8/23/2006     4T         8.7   26.1      1.08         0.97
8/23/2006     4R         8.7   27.3      0.62         0.58
8/23/2006     4P         8.6   27.7      1.09         1.02
8/23/2006     4G         8.7   27.6      1.04         0.92
9/7/2006      6A         8.7   24.2      1.03         0.96
9/7/2006      5A         8.7   26.2      0.94         0.90
9/7/2006      6C         8.7   25.5      1.12         1.04
9/13/2006     6A
9/13/2006     6C
9/13/2006     5B*
9/13/2006     5C
9/13/2006     3S
9/20/2006     4B         8.7   24.3      1.07         0.98
9/20/2006     4T         8.7   23.7      0.86         0.82
9/20/2006     4R         8.7   24.3      0.67         0.63
9/20/2006     4P         8.7   24.2      0.76         0.73
9/20/2006     4F         8.7   24.4      0.53         0.51
9/27/2006     6A         8.9   22.4      0.94         0.90
9/27/2006     5A         8.8   22.6      1.06         0.90
9/27/2006     6C         8.8   22.3      1.18         1.09
9/27/2006     5B         8.8   21.4      1.22         1.16
9/27/2006     5C         8.7   22        0.80         0.72
10/12/2006    2E         8.8   21.2      0.32         0.29
10/12/2006    2G         9.2   18.1      1.04         1.00
10/12/2006    3G         8.8   21.2      0.51         0.48
10/12/2006    U          8.7   20.8      0.87         0.78
10/12/2006    3E         8.7   20.8      0.50         0.47
11/2/2006     5P         8.8   14.8      1.18         1.07
11/2/2006     2B         8.8   15.9      0.90         0.87
11/2/2006     1A         8.7   16.8      0.73         0.63
11/2/2006     2T         8.8   15.4      1.05         1.00
11/2/2006     1D         8.8   14.4      0.91         0.87
11/16/2006    2E         9.1   15.4      0.72         0.64
11/16/2006    2G         9     15.8      1.11         1.03
11/16/2006    3G         8.7   14.5      0.59         0.54
11/16/2006    3E         8.7   15.1      0.79         0.74
11/16/2006    3F         8.7   15        0.54         0.50
11/30/2006    4B         8.5   14.6      1.18         1.07
11/30/2006    4T         8.7   13.1      1.26         1.20
11/30/2006    4R         8.6   11        0.98         0.91
11/30/2006    4P         8.7   11.2      1.18         1.12
11/30/2006    4F         8.7   13.9      0.92         0.87
1/25/2007     2Y         9.3   10.6      1.40         1.29
1/25/2007     2J1        9.2   11.2      1.40         1.35
1/25/2007     2N1        9.5   10.8      1.42         1.34
1/25/2007     1W         9.2   11.1      1.24         1.16
1/25/2007     2U         9.3   11.1      1.37         1.32

Table 3. Water quality data gathered during sampling of various points in the Cincinnati area drinking water distribution system.

Library        Number of Clones
1A071907       85
1W070907       65
1W             88
2E             87
2I 070907      76
2I             92
2J1            89
2N1 070907     85
2N1 072007     72
2N1            95
2T 072007      71
2U1 072007     81
2U2 072007     87
2U             91
2Y 070907      77
2Y             89
3E             87
3G             84
3K             87
3S             76
4B             77
4P1            85
4P2            84
4P             93
4R             87
5A             90
5B             64
5C             88
6A             81
CA             157
JSD            216

Libraries: 31; SUM: 2786
AVE Clones per Site: 89.87096774

Table 4. Number of clones per library
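The summary rows of Table 4 can be checked against the individual library sizes with a few lines of Python. The variable names are hypothetical; the clone counts are transcribed from Table 4 in the order listed.

```python
# Clone counts per library, in the order listed in Table 4.
clone_counts = [
    85, 65, 88, 87, 76, 92, 89, 85, 72, 95,
    71, 81, 87, 91, 77, 89, 87, 84, 87, 76,
    77, 85, 84, 93, 87, 90, 64, 88, 81, 157, 216,
]

n_libraries = len(clone_counts)
total_clones = sum(clone_counts)
mean_clones = total_clones / n_libraries
```

The totals agree with the table: 31 libraries, 2786 clones, and an average of roughly 89.87 clones per library.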

References

Adekambi, T., Foucault, C., La Scola, B., & Drancourt, M. (2006). Report of two fatal cases of Mycobacterium mucogenicum central nervous system infection in immunocompetent patients. J Clin Microbiol, 44(3), 837-840.
Altschul, S. F., Madden, T. L., Schaffer, A. A., Zhang, J., Zhang, Z., Miller, W., et al. (1997). Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res, 25(17), 3389-3402.
Angenent, L. T., Kelley, S. T., St Amand, A., Pace, N. R., & Hernandez, M. T. (2005). Molecular identification of potential pathogens in water and air of a hospital therapy pool. Proc Natl Acad Sci U S A, 102(13), 4860-4865.
Badenoch, P. R., Mills, R. A., Woolley, M. W., & Wetherall, B. L. (2007). Clostridium novyi keratitis. Br J Ophthalmol, 91(5), 691.
Bardouniotis, E., Ceri, H., & Olson, M. E. (2003). Biofilm formation and biocide susceptibility testing of Mycobacterium fortuitum and Mycobacterium marinum. Curr Microbiol, 46(1), 28-32.
Barer, M. R. (1997). Viable but non-culturable and dormant bacteria: time to resolve an oxymoron and a misnomer? J Med Microbiol, 46(8), 629-631.
Bernhard, A. E., & Field, K. G. (2000). Identification of nonpoint sources of fecal pollution in coastal waters by using host-specific 16S ribosomal DNA genetic markers from fecal anaerobes. Appl Environ Microbiol, 66(4), 1587-1594.
Bik, E. M., Eckburg, P. B., Gill, S. R., Nelson, K. E., Purdom, E. A., Francois, F., et al. (2006). Molecular analysis of the bacterial microbiota in the human stomach. Proc Natl Acad Sci U S A, 103(3), 732-737.
Blackwood, C. B., Oaks, A., & Buyer, J. S. (2005). Phylum- and class-specific PCR primers for general microbial community analysis. Appl Environ Microbiol, 71(10), 6193-6198.
Boe-Hansen, R., Albrechtsen, H. J., Arvin, E., & Jorgensen, C. (2002). Bulk water phase and biofilm growth in drinking water at low nutrient conditions. Water Res, 36(18), 4477-4486.
Braker, G., Ayala-del-Rio, H. L., Devol, A. H., Fesefeldt, A., & Tiedje, J. M. (2001). Community structure of denitrifiers, bacteria, and archaea along redox gradients in Pacific Northwest marine sediments by terminal restriction fragment length polymorphism analysis of amplified nitrite reductase (nirS) and 16S rRNA genes. Appl Environ Microbiol, 67(4), 1893-1901.
Braun, B., Richert, I., & Szewzyk, U. (2009). Detection of iron-depositing Pedomicrobium species in native biofilms from the Odertal National Park by a new, specific FISH probe. J Microbiol Methods, 79(1), 37-43.
Chao, A., Colwell, R. K., Lin, C. W., & Gotelli, N. J. (2009). Sufficient sampling for asymptotic minimum species richness estimators. Ecology, 90(4), 1125-1133.
Chapelle, F. H. (2000). Ground-Water Microbiology and Geochemistry. Wiley, New York.
Cole, J. R., Chai, B., Marsh, T. L., Farris, R. J., Wang, Q., Kulam, S. A., et al. (2003). The Ribosomal Database Project (RDP-II): previewing a new autoaligner that allows regular updates and the new prokaryotic taxonomy. Nucleic Acids Res, 31(1), 442-443.

Colwell, R. K., & Coddington, J. A. (1994). Estimating terrestrial biodiversity through extrapolation. Philos Trans R Soc Lond B Biol Sci, 345(1311), 101-118.
Cordova-Kreylos, A. L., Cao, Y., Green, P. G., Hwang, H. M., Kuivila, K. M., Lamontagne, M. G., et al. (2006). Diversity, composition, and geographical distribution of microbial communities in California salt marsh sediments. Appl Environ Microbiol, 72(5), 3357-3366.
Covert, T. C., Rodgers, M. R., Reyes, A. L., & Stelma, G. N., Jr. (1999). Occurrence of nontuberculous mycobacteria in environmental samples. Appl Environ Microbiol, 65(6), 2492-2496.
DeSantis, T. Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E. L., Keller, K., et al. (2006). Greengenes, a chimera-checked 16S rRNA gene database and workbench compatible with ARB. Appl Environ Microbiol, 72(7), 5069-5072.
Donlan, R. M. (2002). Biofilms: microbial life on surfaces. Emerg Infect Dis, 8(9), 881-890.
Dunbar, J., Takala, S., Barns, S. M., Davis, J. A., & Kuske, C. R. (1999). Levels of bacterial community diversity in four arid soils compared by cultivation and 16S rRNA gene cloning. Appl Environ Microbiol, 65(4), 1662-1669.
Eichler, S., Christen, R., Holtje, C., Westphal, P., Botel, J., Brettar, I., et al. (2006). Composition and dynamics of bacterial communities of a drinking water supply system as assessed by RNA- and DNA-based 16S rRNA gene fingerprinting. Appl Environ Microbiol, 72(3), 1858-1872.
Falkinham, J. O., 3rd, Norton, C. D., & LeChevallier, M. W. (2001). Factors influencing numbers of Mycobacterium avium, Mycobacterium intracellulare, and other Mycobacteria in drinking water distribution systems. Appl Environ Microbiol, 67(3), 1225-1231.
Feazel, L. M., Baumgartner, L. K., Peterson, K. L., Frank, D. N., Harris, J. K., & Pace, N. R. (2009). Opportunistic pathogens enriched in showerhead biofilms. Proc Natl Acad Sci U S A, 106(38), 16393-16399.
Helmi, K., Skraber, S., Gantzer, C., Willame, R., Hoffmann, L., & Cauchie, H. M. (2008). Interactions of Cryptosporidium parvum, Giardia lamblia, vaccinal poliovirus type 1, and bacteriophages phiX174 and MS2 with a drinking water biofilm and a wastewater biofilm. Appl Environ Microbiol, 74(7), 2079-2088.
Huber, T., Faulkner, G., & Hugenholtz, P. (2004). Bellerophon: a program to detect chimeric sequences in multiple sequence alignments. Bioinformatics, 20(14), 2317-2319.
Iasur-Kruh, L., Hadar, Y., Milstein, D., Gasith, A., & Minz, D. (2009). Microbial Population and Activity in Wetland Microcosms Constructed for Improving Treated Municipal Wastewater. Microb Ecol.
Iivanainen, E. K., Martikainen, P. J., Vaananen, P. K., & Katila, M. L. (1993). Environmental Factors Affecting the Occurrence of Mycobacteria in Brook Waters. Appl Environ Microbiol, 59(2), 398-404.
Kalmbach, S., Manz, W., & Szewzyk, U. (1997). Isolation of new bacterial species from drinking water biofilms and proof of their in situ dominance with highly specific 16S rRNA probes. Appl Environ Microbiol, 63(11), 4164-4170.

63 Keinanen-Toivola, M. M., Revetta, R. P., & Santo Domingo, J. W. (2006). Identification of active bacterial communities in a model drinking water biofilm system using 16S rRNA-based clone libraries. FEMS Microbiol Lett, 257 (2), 182-188. Keinanen, M. M., Martikainen, P. J., & Kontro, M. H. (2004). Microbial community structure and biomass in developing drinking water biofilms. Can J Microbiol, 50 (3), 183-191. King, C. H., Shotts, E. B., Jr., Wooley, R. E., & Porter, K. G. (1988). Survival of coliforms and bacterial pathogens within protozoa during chlorination. Appl Environ Microbiol, 54 (12), 3023-3033. Kline, S., Cameron, S., Streifel, A., Yakrus, M. A., Kairis, F., Peacock, K., et al . (2004). An outbreak of bacteremias associated with Mycobacterium mucogenicum in a hospital water supply. Infect Control Hosp Epidemiol, 25 (12), 1042-1049. Kuske, C. R., Barns, S. M., Grow, C. C., Merrill, L., & Dunbar, J. (2006). Environmental survey for four pathogenic bacteria and closely related species using phylogenetic and functional genes. J Forensic Sci, 51 (3), 548-558. Lamendella, R., Domingo, J. W., Oerther, D. B., Vogel, J. R., & Stoeckel, D. M. (2007). Assessment of fecal pollution sources in a small northern-plains watershed using PCR and phylogenetic analyses of Bacteroidetes 16S rRNA gene. FEMS Microbiol Ecol, 59 (3), 651-660. LeChevallier, M. W., Babcock, T. M., & Lee, R. G. (1987). Examination and characterization of distribution system biofilms. Appl Environ Microbiol, 53 (12), 2714-2724. Lehtola, M. J., Miettinen, I. T., & Martikainen, P. J. (2002). Biofilm formation in drinking water affected by low concentrations of phosphorus. Can J Microbiol, 48 (6), 494-499. Lehtola, M. J., Miettinen, I. T., Vartiainen, T., & Martikainen, P. J. (2002). Changes in content of microbially available phosphorus, assimilable organic carbon and microbial growth potential during drinking water treatment processes. Water Res, 36 (15), 3681-3690. 
Manz, W., Szewzyk, U., Ericsson, P., Amann, R., Schleifer, K. H., & Stenstrom, T. A. (1993). In situ identification of bacteria in drinking water and adjoining biofilms by hybridization with 16S and 23S rRNA-directed fluorescent oligonucleotide probes. Appl Environ Microbiol, 59 (7), 2293-2298. Martiny, A. C., Albrechtsen, H. J., Arvin, E., & Molin, S. (2005). Identification of bacteria in biofilm and bulk water samples from a nonchlorinated model drinking water distribution system: detection of a large nitrite-oxidizing population associated with Nitrospira spp. Appl Environ Microbiol, 71 (12), 8611-8617. Merrill, L., Dunbar, J., Richardson, J., & Kuske, C. R. (2006). Composition of Bacillus species in aerosols from 11 U.S. cities. J Forensic Sci, 51 (3), 559-565. Norton, C. D., & LeChevallier, M. W. (2000). A pilot study of bacteriological population changes through potable water treatment and distribution. Appl Environ Microbiol, 66 (1), 268-276. Ohnishi, H., Mizunoe, Y., Takade, A., Tanaka, Y., Miyamoto, H., Harada, M., et al. (2004). Legionella dumoffii DjlA, a member of the DnaJ family, is required for intracellular growth. Infect Immun, 72 (6), 3592-3603.

Ohno, A., Kato, N., Yamada, K., & Yamaguchi, K. (2003). Factors influencing survival of Legionella pneumophila serotype 1 in hot spring water and tap water. Appl Environ Microbiol, 69 (5), 2540-2547. Olsen, G. J., Lane, D. J., Giovannoni, S. J., Pace, N. R., & Stahl, D. A. (1986). Microbial ecology and evolution: a ribosomal RNA approach. Annu Rev Microbiol, 40, 337-365. Olson, B. H., & Nagy, L. A. (1984). Microbiology of potable water. Adv Appl Microbiol, 30, 73-132. Pace, N. R., Stahl, D. A., Lane, D. J., & Olsen, G. J. (1986). The analysis of microbial populations by rRNA sequences. Advances in Microbial Ecology, 9, 1-55. Parsek, M. R., & Singh, P. K. (2003). Bacterial biofilms: an emerging link to disease pathogenesis. Annu Rev Microbiol, 57, 677-701. Payment, P., Gamache, F., & Paquette, G. (1988). Microbiological and virological analysis of water from two water filtration plants and their distribution systems. Can J Microbiol, 34 (12), 1304-1309. Perreault, N. N., Andersen, D. T., Pollard, W. H., Greer, C. W., & Whyte, L. G. (2007). Characterization of the prokaryotic diversity in cold saline perennial springs of the Canadian high Arctic. Appl Environ Microbiol, 73 (5), 1532-1543. Pryor, M., Springthorpe, S., Riffard, S., Brooks, T., Huo, Y., Davis, G., et al. (2004). Investigation of opportunistic pathogens in municipal drinking water under different supply and treatment regimes. Water Sci Technol, 50 (1), 83-90. Reasoner, D. J., & Geldreich, E. E. (1985). A new medium for the enumeration and subculture of bacteria from potable water. Appl Environ Microbiol, 49 (1), 1-7. Regan, J. M., Harrington, G. W., & Noguera, D. R. (2002). Ammonia- and nitrite-oxidizing bacterial communities in a pilot-scale chloraminated drinking water distribution system. Appl Environ Microbiol, 68 (1), 73-81. Revetta, R. P., Pemberton, A., Lamendella, R., Iker, B., & Santo Domingo, J. W. (2009). Identification of bacterial populations in drinking water using 16S rRNA-based sequence analyses. Water Res.
Reynolds, K. A., Mena, K. D., & Gerba, C. P. (2008). Risk of waterborne illness via drinking water in the United States. Rev Environ Contam Toxicol, 192, 117-158. Rodriguez, C., Wachlin, A., Altendorf, K., Garcia, F., & Lipski, A. (2007). Diversity and antimicrobial susceptibility of oxytetracycline-resistant isolates of Stenotrophomonas sp. and Serratia sp. associated with Costa Rican crops. J Appl Microbiol, 103 (6), 2550-2560. Rowbotham, T. J. (1986). Current views on the relationships between amoebae, legionellae and man. Isr J Med Sci, 22 (9), 678-689. Rubin, M. A., & Leff, L. G. (2007). Nutrients and other abiotic factors affecting bacterial communities in an Ohio River (USA). Microb Ecol, 54 (2), 374-383. Santo Domingo, J. W., Meckes, M. C., Simpson, J. M., Sloss, B., & Reasoner, D. J. (2003). Molecular characterization of bacteria inhabiting a water distribution system simulator. Water Sci Technol, 47 (5), 149-154. Schloss, P. D., & Handelsman, J. (2005). Introducing DOTUR, a computer program for defining operational taxonomic units and estimating species richness. Appl Environ Microbiol, 71 (3), 1501-1506.

Schloss, P. D., Westcott, S. L., Ryabin, T., Hall, J. R., Hartmann, M., Hollister, E. B., et al. (2009). Introducing mothur: open-source, platform-independent, community-supported software for describing and comparing microbial communities. Appl Environ Microbiol, 75 (23), 7537-7541. Schmeisser, C., Stockigt, C., Raasch, C., Wingender, J., Timmis, K. N., Wenderoth, D. F., et al. (2003). Metagenome survey of biofilms in drinking-water networks. Appl Environ Microbiol, 69 (12), 7298-7309. September, S. M., Brozel, V. S., & Venter, S. N. (2004). Diversity of nontuberculoid Mycobacterium species in biofilms of urban and semiurban drinking water distribution systems. Appl Environ Microbiol, 70 (12), 7571-7573. Shaw, A. K., Halpern, A. L., Beeson, K., Tran, B., Venter, J. C., & Martiny, J. B. (2008). It's all relative: ranking the diversity of aquatic bacterial communities. Environ Microbiol, 10 (9), 2200-2210. Souza, V., Espinosa-Asuar, L., Escalante, A. E., Eguiarte, L. E., Farmer, J., Forney, L., et al. (2006). An endangered oasis of aquatic microbial biodiversity in the Chihuahuan desert. Proc Natl Acad Sci U S A, 103 (17), 6565-6570. Stephen, J. R., McCaig, A. E., Smith, Z., Prosser, J. I., & Embley, T. M. (1996). Molecular diversity of soil and marine 16S rRNA gene sequences related to beta-subgroup ammonia-oxidizing bacteria. Appl Environ Microbiol, 62 (11), 4147-4154. Stewart, P. S., & Costerton, J. W. (2001). Antibiotic resistance of bacteria in biofilms. Lancet, 358 (9276), 135-138. Szewzyk, U., Szewzyk, R., Manz, W., & Schleifer, K. H. (2000). Microbiological safety of drinking water. Annu Rev Microbiol, 54, 81-127. Torvinen, E., Suomalainen, S., Lehtola, M. J., Miettinen, I. T., Zacheus, O., Paulin, L., et al. (2004). Mycobacteria in water and loose deposits of drinking water distribution systems in Finland. Appl Environ Microbiol, 70 (4), 1973-1981.
Uhl, W., & Schaule, G. (2004). Establishment of HPC(R2A) for regrowth control in non-chlorinated distribution systems. Int J Food Microbiol, 92 (3), 317-325. van Vliet, M. J., Tissing, W. J., de Bont, E. S., Meessen, N. E., Kamps, W. A., & Harmsen, H. J. (2009). Denaturing gradient gel electrophoresis of PCR-amplified gki genes: a new technique for tracking streptococci. J Clin Microbiol, 47 (7), 2181-2186. Vandenesch, F., Surgot, M., Bornstein, N., Paucod, J. C., Marmet, D., Isoard, P., et al. (1990). Relationship between free amoeba and Legionella: studies in vitro and in vivo. Zentralbl Bakteriol, 272 (3), 265-275. Wang, Q., Garrity, G. M., Tiedje, J. M., & Cole, J. R. (2007). Naive Bayesian classifier for rapid assignment of rRNA sequences into the new bacterial taxonomy. Appl Environ Microbiol, 73 (16), 5261-5267. Williams, M. M., Domingo, J. W., Meckes, M. C., Kelty, C. A., & Rochon, H. S. (2004). Phylogenetic diversity of drinking water bacteria in a distribution system simulator. J Appl Microbiol, 96 (5), 954-964. Williams, M. M., Santo Domingo, J. W., & Meckes, M. C. (2005). Population diversity in model potable water biofilms receiving chlorine or chloramine residual. Biofouling, 21 (5-6), 279-288.

Wilson, K. H., Blitchington, R. B., & Greene, R. C. (1990). Amplification of bacterial 16S ribosomal DNA with polymerase chain reaction. J Clin Microbiol, 28 (9), 1942-1946. Zhang, K., Choi, H., Dionysiou, D. D., Sorial, G. A., & Oerther, D. B. (2006). Identifying pioneer bacterial species responsible for biofouling membrane bioreactors. Environ Microbiol, 8 (3), 433-440. Zwart, G., Crump, B., Kamst-van Agterveld, M., Hagen, F., & Han, S. (2002). Typical freshwater bacteria: an analysis of available 16S rRNA gene sequences from plankton of lakes and rivers. Aquatic Microbial Ecology, 28, 141-155.
