
Population dynamics and metabolic potential of a pilot-scale microbial community

performing enhanced biological phosphorus removal

by

CHRISTOPHER EVAN LAWSON

B.A.Sc. (Civil Engineering), University of British Columbia, 2010

A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF

THE REQUIREMENTS FOR THE DEGREE OF

MASTER OF APPLIED SCIENCE

in

THE FACULTY OF GRADUATE AND POSTDOCTORAL STUDIES

(Civil Engineering)

THE UNIVERSITY OF BRITISH COLUMBIA

(Vancouver)

September 2014

© Christopher Evan Lawson, 2014

Abstract

Enhanced biological phosphorus removal (EBPR) is an environmental biotechnology of global importance, essential for protecting receiving waters from eutrophication and enabling phosphorus recovery. Current understanding of EBPR is largely based on empirical evidence and black-box models that fail to appreciate the driving force responsible for nutrient cycling and ultimate phosphorus removal, namely microbial communities. Accordingly, this thesis focused on understanding the microbial ecology of a pilot-scale microbial community performing EBPR to better link bioreactor processes to underlying microbial agents.

Initially, temporal changes in microbial community structure and activity were monitored in a pilot-scale EBPR treatment plant by examining the ratio of small subunit ribosomal RNA (SSU rRNA) to SSU rRNA gene over a 120-day study period. Although the majority of operational taxonomic units (OTUs) in the EBPR ecosystem were rare, many maintained high potential activities, suggesting that rare OTUs made significant contributions to protein synthesis potential. Few significant differences in OTU abundance and activity were observed between bioreactor redox zones, although differences in temporal activity were observed among phylogenetically cohesive OTUs. Moreover, observed temporal activity patterns could not be explained by measured process parameters, suggesting that alternate ecological forces shaped community interactions in the bioreactor milieu.

Subsequently, a metagenome was generated from pilot plant biomass samples using 454 pyrosequencing. Comparison of microbial community metabolism across multiple metagenomes from different environments revealed that EBPR community function was enriched in biofilm formation, phosphorus metabolism, and aromatic compound degradation, reflective of local bioreactor conditions. Population genomes binned from metagenomic contigs showed that M. parvicella genomes displayed remarkable genomic cohesion across EBPR ecosystems, with functional differences related to biofilm formation and antibiotic resistance that likely reflect adaptation to habitat-specific selection pressures. Additionally, novel metabolic insights into Gordonia spp. in the EBPR ecosystem suggested a potential role for its involvement in polyphosphate and triacylglycerol cycling.

Overall, these findings offer valuable insight into EBPR microbial ecology and will guide future studies aimed at monitoring spatiotemporal patterns in population dynamics and gene expression. Moreover, this work demonstrates that molecular sequencing approaches can be successfully used to gain deeper insight into the microbial communities responsible for wastewater remediation.

Preface

I was responsible for the design and initiation of this research program with direct input from my supervisors, Dr. Steven Hallam and Dr. Eric Hall. My thesis committee members, Dr. William Ramey, Dr. Barry Rabinowitz, and Dr. Don Mavinic, also made significant contributions to the design of the research program.

In Chapter 2, I generated and analyzed the small subunit ribosomal RNA (SSU rRNA) gene amplicons and transcripts from biomass samples collected from the UBC enhanced biological phosphorus removal (EBPR) Pilot Plant. Melanie Scofield and Aria Hahn provided initial training on laboratory protocols and assisted with sample collection. Niels Hanson assisted with data visualization and provided bioinformatic support. Blake Strachan received training from me and assisted with sample collection and subsequent laboratory processing. Sam Bailey, Mike Harvard, Rony Das, and Fred Koch assisted with the operation and maintenance of the UBC EBPR Pilot Plant. I drafted the manuscript with direct input from Dr. William Ramey and Dr. Steven Hallam. Dr. Eric Hall, Dr. Barry Rabinowitz, and Dr. Don Mavinic also provided constructive feedback. Excerpts from Chapter 1 were presented at the 86th Annual Water Environment Federation Technical Exhibition and Conference, Chicago, Illinois, October 9th, 2013 and have been submitted for publication in a peer-reviewed journal:

Lawson, C.E., Strachan, B.J., Hanson, N.W., Hahn, A.S., Hall, E.R., Rabinowitz, B., Mavinic, D.S., Ramey, W.D., & Hallam, S.J. Microbial community structure and activity in a pilot-scale enhanced biological phosphorus removal ecosystem. In review.

In Chapter 3, I generated and analyzed the EBPR metagenome from a biomass sample collected from the UBC EBPR Pilot Plant. Masaru Nobu from the University of Illinois at Urbana-Champaign performed the metagenomic binning with my interpretation. Excerpts from Chapter 2 were presented at the 15th International Symposium on Microbial Ecology, Seoul, Korea, August 29th, 2014 and are in preparation for submission to a peer-reviewed journal.

Table of Contents

Abstract
Preface
Table of Contents
List of Tables
List of Figures
List of Symbols and Abbreviations
Acknowledgements
Dedication

Chapter 1: Introduction - the microbial ecology of enhanced biological phosphorus removal
    1.1 Phosphorus: a broken biogeochemical cycle
    1.2 The removal of phosphorus from municipal wastewaters
    1.3 EBPR biochemical transformations and metabolic models
    1.4 EBPR process configurations
    1.5 Metagenomic insights: an ecosystem model for EBPR
        1.5.1 Polyphosphate-accumulating organisms
        1.5.2 Glycogen-accumulating organisms
        1.5.3 Filamentous hydrolyzing bacteria
        1.5.4 Fermenting bacteria
        1.5.5 Denitrifying bacteria
        1.5.6 Nitrifying bacteria
        1.5.7 Predators: bacteriophage and protozoa
        1.5.8 EBPR environments: from macro to micro
    1.6 Research motivation and objectives

Chapter 2: Microbial community structure and activity in a pilot-scale EBPR ecosystem
    2.1 Synopsis
    2.2 Background
    2.3 Experimental procedures
        2.3.1 Pilot plant operation and sampling
        2.3.2 Nucleic acid extraction and cDNA synthesis
        2.3.3 PCR amplification and pyrosequencing of SSU rDNA and cDNA
        2.3.4 Processing of pyrotag sequences
        2.3.5 Statistical analysis
    2.4 Results
        2.4.1 SSU rDNA and rRNA sequencing
        2.4.2 Overview of microbial community structure and activity
        2.4.3 Relative rRNA abundance across EBPR redox zones
        2.4.4 Abundance and activity of core EBPR taxa
        2.4.5 Temporal dynamics of community structure and activity
    2.5 Discussion
        2.5.1 Rare biosphere is active in EBPR ecosystems
        2.5.2 Temporal activity patterns suggest high microdiversity within genera
        2.5.3 Anticipatory life strategy for EBPR microbes?
    2.6 Concluding remarks

Chapter 3: Metagenomic analysis of a pilot-scale microbial community performing enhanced biological phosphorus removal
    3.1 Synopsis
    3.2 Background
    3.3 Experimental procedures
        3.3.1 Sampling
        3.3.2 DNA extraction and sequencing
        3.3.3 Metagenomic assembly and binning
        3.3.4 Gene annotation and pathway analysis
        3.3.5 Genome comparisons
        3.3.6 Prophage and CRISPR reconstruction
    3.4 Results and discussion
        3.4.1 Sequencing statistics
        3.4.2 Community structure: comparison of pyrotag and metagenomic results
        3.4.3 Microbial community metabolism
        3.4.4 Comparison of population genomes to existing reference genomes
        3.4.5 EBPR ecosystem bacteria-phage interactions
    3.5 Concluding remarks

Chapter 4: Conclusions and future directions
    4.1 Conclusions, limitations, and future directions

Bibliography

Appendix A – Chapter 2 supplementary material

Appendix B – Chapter 3 supplementary material

List of Tables

Table 2.1 Pilot plant process operations and performance data

Table 3.1 Metagenome assembly and sequencing statistics
Table 3.2 Comparison of community structure based on pyrotag and metagenomic methods
Table 3.3 ORFs assigned to capsular and exopolysaccharides metabolism
Table 3.4 Metagenomic contig binning statistics
Table 3.5 Polyphosphate metabolism, M. parvicella
Table 3.6 Polyphosphate metabolism, Gordonia spp.

Table A1 Sampling and sequencing statistics
Table A2 OTU richness and diversity estimates
Table A3 Abundance and activity of select EBPR taxa
Table A4 Indicator OTUs – rDNA abundance
Table A5 Indicator OTUs – SSU rRNA:rDNA ratio

Table B1 Marker genes identified in population genome bins
Table B2 Bin001 (Candidatus ‘Microthrix parvicella’) variable genomic regions
Table B3 Bin002 (Gordonia spp.) variable genomic regions
Table B4 Prophage regions from metagenome
Table B5 Summary of spacer sequences

List of Figures

Figure 1.1 Nutrient cycling in EBPR bioreactors
Figure 1.2 Original EBPR biochemical model
Figure 1.3 Bioreactor configurations to achieve EBPR
Figure 1.4 EBPR ecosystem distributed metabolism
Figure 1.5 Acetate uptake model in Candidatus ‘Accumulibacter phosphatis’
Figure 1.6 EBPR microbial activity dynamics
Figure 1.7 Activated sludge floc composition

Figure 2.1 Generic configuration of the UBC EBPR pilot plant
Figure 2.2 Rarefaction curves for each day in time series
Figure 2.3 Relationship between SSU rDNA and rRNA frequencies
Figure 2.4 Microbial community structure and activity
Figure 2.5 Activity profiles for selected genera
Figure 2.6 Bray-Curtis dissimilarities between samples
Figure 2.7 UBC EBPR Pilot Plant phosphate (PO4) removal performance
Figure 2.8 RDA biplots

Figure 3.1 Comparison of taxonomic composition using pyrotag and metagenomic methods
Figure 3.2 SEED subsystem comparison of microbial metagenomes with the UBC EBPR Pilot Plant metagenome
Figure 3.3 M. parvicella pathway comparison
Figure 3.4 Fine-scale comparison of M. parvicella genomes
Figure 3.5 Gordonia spp. pathway comparison
Figure 3.6 Prophage coding sequence (CDS) regions reconstructed from metagenome
Figure 3.7 Total spacers count from EBPR metagenome
Figure 3.8 CRISPR spacer-repeat loci (region G4)

Figure B1 UDP-D-xylose biosynthesis pathways in M. parvicella spp.
Figure B2 dTDP-L-rhamnose biosynthesis pathways in M. parvicella spp.

List of Symbols and Abbreviations

ADP     Adenosine diphosphate
AOB     Ammonium oxidizing bacteria
ATP     Adenosine triphosphate
BLAST   Basic Local Alignment Search Tool
BOD     Biochemical oxygen demand
BNR     Biological nutrient removal
COD     Chemical oxygen demand
CRISPR  Clustered regularly interspaced short palindromic repeats
DNA     Deoxyribonucleic acid
DAPI    4',6-diamidino-2-phenylindole
EBPR    Enhanced biological phosphorus removal
ED      Entner Doudoroff
EMP     Embden–Meyerhof–Parnas
ePGDB   Environmental pathway/genome databases
EPS     Extracellular polymeric substances
FISH    Fluorescent in situ hybridization
GAO     Glycogen accumulating organism
GC      Guanine-cytosine
HGT     Horizontal gene transfer
HRT     Hydraulic retention time
LCFA    Long-chain fatty acid
N       Nitrogen
NADH    Nicotinamide adenine dinucleotide
NH4     Ammonium
NMDS    Non-metric multidimensional scaling
NOB     Nitrite oxidizing bacteria
NOx     Nitrate/nitrite-nitrogen
ORF     Open reading frame
OTU     Operational taxonomic unit
PAO     Polyphosphate accumulating organism
P       Phosphorus
PCR     Polymerase chain reaction
Pi      Phosphate
Pit     Inorganic phosphate transport
PMF     Proton motive force
PolyP   Polyphosphate
Pst     Phosphate-specific transport
PHA     Poly-β-hydroxyalkanoate
PHB     Poly-β-hydroxybutyrate
RDA     Redundancy analysis
RNA     Ribonucleic acid
rDNA    Ribosomal DNA
rRNA    Ribosomal RNA
RT      Reverse transcriptase
SCFA    Short-chain fatty acid
SERC    Staging Environmental Research Centre
SRT     Solids retention time
SSU     Small subunit
TAG     Triacylglycerol
TCA     Tricarboxylic acid
TKN     Total Kjeldahl Nitrogen
TSS     Total suspended solids
TP      Total Phosphorus
UBC     University of British Columbia
UCT     University of Cape Town

Acknowledgements

I would like to acknowledge my supervisor and mentor Dr. Steven Hallam for his unwavering enthusiasm and support during all my academic pursuits. His constant belief that I could successfully pursue research at the interface of microbial ecology and environmental engineering has truly motivated me to become the scientist I am today. Additionally, I would like to thank Dr. William Ramey for the enormous investment he has made toward my scientific training. I truly cherish the many evening-long conversations in Bill’s office learning about microbiology and discussing the philosophies of science. Many thanks also go to Dr. Donald Mavinic and Dr. Eric Hall for supporting me during my Master's degree and allowing me to pursue my passion for an interdisciplinary research project. Without their support, this project would not have been possible. In the same regard, I must also thank Dr. Barry Rabinowitz for his constant enthusiasm and willingness to participate in my research work, providing his invaluable insight on enhanced biological phosphorus removal. I am also grateful for having worked with members of both the Hallam Lab and the Pollution Control and Waste Management group. Their constant support and constructive feedback over the years have made my time at UBC most enjoyable. Finally, I could not have completed this degree without the enormous support of my family. In particular, I wish to thank Christine Tam and Keith Lawson for their continual motivation, support, and investment in my ongoing interests.

To my grandmother, Helen Sorensen, for always encouraging me to follow my dreams.

Chapter 1: Introduction - the microbial ecology of enhanced biological phosphorus removal

1.1 Phosphorus: a broken biogeochemical cycle

Phosphorus is an essential nutrient for all forms of life. It is a central component of DNA, cellular membranes, and ATP, and has no known substitute. However, population growth and intensive farming have resulted in a collapse of the natural phosphorus cycle, leading to the deterioration of surface water quality and a shortage of easily mineable deposits of phosphate rock (Elser and Bennett, 2011). The discharge of excess phosphorus to the aquatic environment can promote excessive algal growth and eutrophication, which exerts a substantial oxygen demand on receiving waters with the potential to upset marine food webs (Wright et al., 2012), and produces a variety of products that adversely affect the suitability of water for consumption (Orihel et al., 2012). While this sink of phosphorus creates a serious threat to the aquatic environment and human health, our major reserves of phosphate for food production are diminishing, creating a one-way flow of phosphorus from rocks to farms to lakes and oceans (Elser and Bennett, 2011). Therefore, the ability to remove and subsequently recover phosphorus from municipal wastewaters is critical to restoring the phosphorus balance, while preventing the deterioration of surface water quality and sustaining available water and nutrient resources for human societies and the biosphere.

1.2 The removal of phosphorus from municipal wastewaters

With the global population becoming increasingly urbanized, the removal of phosphorus from municipal wastewaters has become a crucial aspect of wastewater management. Fundamental to meeting this objective is the enhanced biological phosphorus removal (EBPR) process, which is widely recognized as the most advanced and sustainable treatment technology in application for the removal and recovery of phosphorus from municipal wastewaters (Coats et al., 2011). The process leverages microbial community metabolism, resulting in the accumulation of polyphosphate within the biomass of polyphosphate accumulating organisms (PAO), from which it can subsequently be recovered as a commercial-grade fertilizer via struvite crystallization (Britton et al., 2005). However, despite its successful application in Canada and abroad, development of EBPR technology has largely relied on “black-box” empirical approaches to predict complex biological processes, which tend towards oversimplification (Follows and Dutkiewicz, 2011) and neglect key microbial interactions and activities facilitating phosphorus removal (Mino and Satoh, 2006). Consequently, reliable phosphorus removal to regulated limits is difficult to achieve with EBPR alone and often requires the use of supplementary chemical precipitation (Johnson and Daigger, 2009). These chemical methods increase treatment plant operating costs due to the large volumes of chemical waste sludge generated and additionally reduce the availability of phosphorus for nutrient recovery. Unpredictable loss or reduced activity of microorganisms responsible for phosphorus removal has been the main observation associated with EBPR process instability (Gu et al., 2010). Accordingly, if EBPR is to fully achieve its potential as an effective and reliable environmental biotechnology, a more comprehensive understanding of the microbial ecology of EBPR ecosystems is needed.

1.3 EBPR biochemical transformations and metabolic models

EBPR is achieved in the activated sludge process by cycling biomass through anaerobic “feast conditions” and aerobic “famine conditions” (Barnard, 1975) (Figure 1.1). This configuration, combined with an anoxic zone for nitrogen removal, is termed biological nutrient removal (BNR). In EBPR processes, it is presumed that system performance is largely dictated by the ability of PAO to store intracellular compounds, namely poly-β-hydroxyalkanoate (PHA), polyphosphate (polyP), and glycogen (Smolders et al., 1995). In the anaerobic zone of EBPR systems, PAO take up short-chain fatty acids (SCFAs), such as acetate, and store them as PHAs, while degrading internally stored polyP and glycogen for energy and reducing equivalents (Smolders et al., 1994a). Effectively sequestering SCFAs during the anaerobic phase is believed to give PAO a selective advantage over other organisms present in the microbial community for subsequent growth in the aerobic phase, allowing for their proliferation in the system. Under aerobic conditions, internally stored PHA is oxidized and used for growth, conservation of energy, Pi uptake, and glycogen production (Smolders et al., 1994b).

Figure 1.1 Nutrient cycling in EBPR bioreactors (taken from McMahon and Read, 2013). The anaerobic zone receives high soluble phosphate (P) and organic carbon loading from settled wastewater influent (primary effluent). The anaerobic zone is characterized by phosphate release and carbon uptake; the aerobic zone is characterized by phosphate uptake and subsequent P removal in waste activated sludge.

The microbiology of EBPR has been subject to ongoing review (Mino et al., 1998; Seviour et al., 2003; Oehmen et al., 2007; McMahon and Read, 2013; Kang and Noguera, 2014). Two initial metabolic models have been used or expanded upon to describe the biochemical transformations of PAO, based on bulk chemical measurements from lab-scale and pilot-scale systems; namely, the Comeau-Wentzel model (Comeau et al., 1986; Wentzel et al., 1986) and the Mino model (Mino et al., 1987). The models are generally based on acetate as the primary substrate, although PAO can assimilate other soluble substrates, such as propionate and glucose, for storage as PHA (Jeon and Park, 2000). The major difference between the two models relates to the origin of reducing equivalents needed for PHA biosynthesis. In the Comeau-Wentzel model, reducing equivalents were assumed to be produced anaerobically by the tricarboxylic acid (TCA) cycle, whereas the Mino model proposed that reducing equivalents were generated from the consumption of internally stored carbohydrates (glycogen) based on experimental observations. Under anaerobic conditions, glycogen was assumed to be converted to pyruvate via the Embden–Meyerhof–Parnas (EMP) pathway, producing nicotinamide adenine dinucleotide (NADH). This hypothesis was later contested by Pereira et al. (1996), who used in vivo 13C and 31P nuclear magnetic resonance (NMR) to show that acetate was mainly used for poly-β-hydroxybutyrate (PHB) storage under anaerobic conditions and that the conversion of glycogen via the Entner Doudoroff (ED) pathway appeared to be the main source of reducing equivalents. However, more recent genomic and transcriptomic data unambiguously implicate the EMP pathway as the main route for glycolysis in model PAO (Garcia Martin et al., 2006; He et al., 2011b). It was also suggested that glycogen alone could not provide sufficient reducing equivalents to convert SCFAs to PHA (Pereira et al., 1996), suggesting that other mechanisms, such as the TCA or glyoxylate cycle, were active (Louie et al., 2000; Burow et al., 2008). Therefore, it is likely that both glycogen and variants of the TCA cycle are utilized to generate reducing power, combining the initial ideas of both the Comeau-Wentzel and Mino models.

Summaries of the main biochemical transformations that transpire in the anaerobic and aerobic zones of EBPR systems are as follows (Figure 1.2; adapted from McMahon et al., 2010):

Anaerobic (feast) environment:

i. SCFAs are rapidly assimilated (active transport via the proton motive force, PMF) and stored as PHAs, whose chemical composition depends on the feed carbon substrate. PHB is synthesized with acetate as the carbon source; poly-β-hydroxyvalerate (PHV) and poly-β-hydroxy-2-methylvalerate (PH2MV) are produced with propionate as the carbon source.

ii. Adenosine triphosphate (ATP), the cell's “energy currency”, is produced from the transfer of an energy-rich phosphoryl group from intracellular polyP to adenosine diphosphate (ADP), resulting in the release of cations and Pi to the bulk liquid.

iii. Intracellularly stored glycogen is degraded for the production of ATP and reducing equivalents (NADH).

Aerobic (famine) environment:

iv. Stored PHA is catabolized through the TCA cycle as a carbon and energy source for biomass growth. Portions of the carbon and ATP produced are used for regeneration of polyP and glycogen.

v. Pi levels in the bulk liquid decrease, coupled to an increase in intracellular polyP levels.

vi. Biomass storage carbohydrates (i.e. glycogen) are replenished.

Figure 1.2 Original EBPR biochemical model adapted from Comeau et al. (1986) and Mino et al. (1987), shown for the anaerobic phase and the aerobic phase. Ac-: acetate; H+: hydrogen ion; PAO: polyphosphate accumulating organism; polyP: polyphosphate; gly: glycogen; TCA: tricarboxylic acid.

1.4 EBPR process configurations

Several bioreactor configurations exist for the process design of EBPR treatment plants (Figure 1.3). Design and development of EBPR treatment processes have largely been empirical (Oldham and Stevens, 1984; Oldham, 1985; Barnard, 1998); however, several key design considerations are based on the initially proposed EBPR metabolic models (Section 1.3; Fuhs and Chen, 1975; Comeau et al., 1986; Wentzel et al., 1986). These considerations include the following.

i. Sufficient availability of SCFAs to drive PAO carbon storage and phosphorus release (Rabinowitz and Oldham, 1986). Approximately 7 to 10 mg of acetate are needed to remove 1 mg of phosphorus, based on experimental evidence (Grady et al., 2011). To ensure optimal phosphorus removal, particularly in regions with colder wastewater temperatures, SCFAs are added to the anaerobic zone using external carbon sources (e.g. sodium acetate) or pre-fermentation of primary sludge (Rabinowitz and Oldham, 1986).

ii. Alternating anaerobic-aerobic regimes to drive PHA and polyP cycling (Nicholls and Osborn, 1979). Spatial separation of electron donor and electron acceptor selects for bacteria capable of polymer storage, ultimately resulting in excess phosphorus assimilation due to polyP synthesis requirements in the aerobic zone.

iii. Strict maintenance of anaerobic conditions in the anaerobic zone (Barnard, 1976). Oxygen or nitrate entering the anaerobic zone is believed to provide denitrifying bacteria (Section 1.5.5) with an alternative electron acceptor for growth and organic carbon consumption, reducing SCFA availability for PAO. Oxygen or nitrate can potentially enter the anaerobic zone through aggressive mixing or recycle activated sludge lines, which should therefore be minimized.
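As a rough illustration of consideration (i), the acetate requirement implied by the range of approximately 7 to 10 mg of acetate per mg of phosphorus can be sketched as follows. The function names, the example concentrations, and the simple subtraction of SCFAs already present in the influent are illustrative assumptions only; this is not a design method taken from the cited literature.

```python
# Hypothetical helper functions; only the 7-10 mg acetate per mg P range
# comes from the text above. All concentrations are in mg/L.

def acetate_demand(p_to_remove, ratio_low=7.0, ratio_high=10.0):
    """Return the (low, high) acetate demand for a target phosphorus removal."""
    if p_to_remove < 0:
        raise ValueError("phosphorus to remove must be non-negative")
    return (ratio_low * p_to_remove, ratio_high * p_to_remove)


def external_acetate_needed(p_to_remove, influent_scfa_as_acetate):
    """Return the (low, high) acetate shortfall after crediting influent SCFAs.

    Assumes, for illustration, that all influent SCFAs (expressed as acetate)
    are available to PAO in the anaerobic zone.
    """
    low, high = acetate_demand(p_to_remove)
    return (max(0.0, low - influent_scfa_as_acetate),
            max(0.0, high - influent_scfa_as_acetate))


if __name__ == "__main__":
    # Assumed example: remove 5 mg P/L with 30 mg/L of SCFAs (as acetate) in the influent.
    print(acetate_demand(5.0))                 # (35.0, 50.0)
    print(external_acetate_needed(5.0, 30.0))  # (5.0, 20.0)
```

In this assumed example, the shortfall of 5 to 20 mg/L would have to come from an external carbon source or from pre-fermentation of primary sludge.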


Figure 1.3 Bioreactor configurations to achieve EBPR. Advantages and disadvantages of each configuration are summarized in Grady et al. (2011). Q: process flow rate; inf: influent; acetate: external acetate addition; IR: internal recycle flow; RAS: recycle activated sludge; eff: effluent.

1.5 Metagenomic insights: an ecosystem model for EBPR

Recent advances in molecular biology and sequencing throughput have led microbiologists and engineers to appreciate EBPR as a model ecosystem for understanding microbial community metabolism and environmental biotechnology (Nielsen et al., 2012). Previous methodologies based on cultivation-dependent approaches have limited the study of in situ microbial communities performing EBPR at wastewater treatment plants due to isolation biases and low taxonomic resolution (Seviour and Nielsen, 2010). Indeed, decades of research were spent searching for the “super bug” responsible for EBPR, once believed to be Acinetobacter spp. (Fuhs and Chen, 1975). Cultivation-independent approaches based on high-throughput sequencing have now revealed that the majority of microorganisms in natural and engineered ecosystems are uncultured (Amann et al., 1998; Hugenholtz et al., 1998; Rappé and Giovannoni, 2003). These approaches are increasingly being employed to resolve central questions in microbial ecology; namely, who are the key microbial players, what are their functions, and how do they interact? The answers can in turn be used to realize engineering objectives related to managing microbial communities for society's benefit (e.g. biological nutrient removal).

Culture-independent analysis of microbial communities across multiple full-scale wastewater treatment plants has revealed that PAO represent only a minor fraction of microorganisms in EBPR ecosystems (Nielsen et al., 2010; Nielsen et al., 2012). These studies have also shown that EBPR ecosystems are diverse, but share a core microbiome (defined at the genus level), despite differences in treatment plant layout, operations, and wastewater characteristics (Nielsen et al., 2010; Zhang et al., 2012; Nielsen et al., 2012). Core microorganisms of known functional relevance to EBPR ecosystems include PAO, glycogen-accumulating organisms (GAO), hydrolyzers, fermenters, nitrifiers, denitrifiers, and predators (viruses and protozoa) (Figure 1.4). A summary of the core EBPR microbiome is presented below.

1.5.1 Polyphosphate-accumulating organisms

Culture-independent methods have identified the betaproteobacterial Rhodocyclus-related Candidatus ‘Accumulibacter phosphatis’ (hereafter, Accumulibacter) and the actinobacterial genus Tetrasphaera to be important PAOs, based on their abundance and activity in full-scale EBPR ecosystems (Hesselmann et al., 1999; Maszenan et al., 2000; Zilles et al., 2002; Kong et al., 2004; Kong et al., 2005). While the ecophysiology of Accumulibacter agrees with the initially proposed metabolic models (Section 1.3), the ecophysiology of Tetrasphaera is markedly different (He and McMahon, 2012; Kristiansen et al., 2012). Under anaerobic conditions, Kristiansen et al. (2012) proposed that Tetrasphaera PAO synthesize glycogen granules using energy generated through polyP degradation and glucose fermentation. Under subsequent aerobic conditions, stored glycogen is catabolized to provide energy for growth and for the polyP replenishment needed for subsequent anaerobic metabolism (Kristiansen et al., 2012). Both Accumulibacter and Tetrasphaera are also believed to be capable of denitrification (Kong et al., 2004; Flowers et al., 2008; Kristiansen et al., 2012). Whether this is completed independently or in concert with other microbial partners has yet to be determined (Flowers et al., 2013; Kim et al., 2013).

Figure 1.4 EBPR ecosystem distributed metabolism. Hydrolyzing bacteria, such as Chloroflexi, convert larger macromolecules into soluble carbon molecules that are subsequently fermented into short-chain fatty acids by fermenting bacteria, such as Streptococcus. PAO, such as Accumulibacter, assimilate SCFAs and store them as poly-β-hydroxyalkanoates under anaerobic conditions. Both bacteriophage and grazers (predators) modulate EBPR community dynamics.

It is important to note that other microorganisms have also been observed to accumulate polyP granules based on in situ 4',6-diamidino-2-phenylindole (DAPI) staining and genomic evidence, including Candidatus ‘Microthrix parvicella’ (hereafter, Microthrix) (Erhart et al., 1997; McIlroy et al., 2013; Wang et al., 2014), Gordonia amarae-like organisms (hereafter, Gordonia) (Wong et al., 2005; Beer et al., 2006), and Dechloromonas (Goel et al., 2005; Kong et al., 2007). However, direct evidence of their involvement in phosphorus removal and continuous polyP cycling has yet to be shown. Nevertheless, it seems likely that PAO are not phylogenetically cohesive units, but rather consist of several diverse taxonomic groups that vary among different treatment systems (Mino, 1998; Seviour and Nielsen, 2010).

1.5.2 Glycogen-accumulating organisms

Glycogen-accumulating organisms are considered competitors to PAO in EBPR ecosystems, based on their ability to compete for SCFAs. Two main GAO often identified in lab-scale and some full-scale EBPR ecosystems include the Gammaproteobacteria Candidatus ‘Competibacter phosphatis’ (hereafter, Competibacter) and the tetrad-forming Alphaproteobacteria related to Defluviicoccus vanus (Crocetti et al., 2002; Wong et al., 2004; McIlroy and Seviour, 2009). Ecophysiological differentiation between GAO and PAO results from the ability of GAO to perform anaerobic-aerobic cycling of PHA and glycogen, but not polyP (Cech and Hartman, 1993; Wong and Liu, 2006). As competitors, it is presumed that GAO populations should be minimized for stable phosphorus removal. Known factors that control the balance between GAO and PAO populations include SCFA/P ratio, carbon source, pH, and temperature, among other factors, as reviewed by Oehmen et al. (2007).

Recent genomic comparisons have revealed that key metabolic differences between GAO and PAO relate to their phosphate transport systems (McIlroy et al., 2014; Nobu et al., 2014). Here, the genomes of known PAO (i.e. Accumulibacter and Tetrasphaera) encode both the high-affinity phosphate-specific transport (Pst) system and the low-affinity inorganic phosphate transport (Pit) system, whereas GAO genomes only encode the Pst system (Garcia Martin et al., 2006; McIlroy et al., 2014; Nobu et al., 2014). Indeed, the Pit system is also missing in Microthrix, which can accumulate polyP, but is not believed to participate in polyP cycling linked to anaerobic SCFA uptake (Andreasen and Nielsen, 2000; McIlroy et al., 2013). In PAO, the Pit system is believed to be essential for generating the proton motive force under anaerobic conditions through export of Pi in symport with protons (Figure 1.5; Saunders et al., 2007; Burow et al., 2008). As such, several researchers have hypothesized that the Pit system is a prerequisite for polyP cycling in EBPR (McIlroy et al., 2014; Nobu et al., 2014); however, further studies are needed to test this hypothesis.

Figure 1.5 Acetate uptake model in Candidatus ‘Accumulibacter phosphatis’ (taken from Saunders et al., 2007). Proton motive force is generated by export of phosphate in symport with hydrogen ions.

1.5.3 Filamentous hydrolyzing bacteria

Earlier studies based on microscopic identification observed numerous filament morphologies in activated sludge plants, including those configured for EBPR (Eikelboom, 2000). These investigations have been greatly improved by the application of culture-independent methods, which overcome difficulties associated with identification of bacteria based on morphological features (Nielsen et al., 2008). While excess filamentous bacteria result in serious operational problems at wastewater treatment plants due to sludge settling issues and foaming (Seviour and Nielsen, 2010), they are essential for hydrolysis of macromolecules and the generation of low-molecular-weight soluble substrates utilized by other community members, such as PAO. Many non-filamentous bacteria have also been implicated in the hydrolysis of macromolecules, including members affiliated with the phyla , Firmicutes, and (Xia et al., 2008). Macromolecule degradation is accomplished by secretion of exoenzymes by hydrolyzing bacteria, including lipases, proteases, esterases, chitinases, galactosidases, glucuronidases, and phosphatases (Kragelund et al., 2007). These exoenzymes typically remain associated with their producer cell (surface-associated) or diffuse into the adjacent extracellular polymeric substance (EPS) layer, and function by repeatedly fragmenting macromolecules into molecules small enough for assimilation (Confer and Logan, 1998; Wingender et al., 1999).

The abundant macromolecules entering wastewater treatment plants are typically lipids, polysaccharides, and proteins. Bacteria specialized in lipid degradation in EBPR ecosystems include Microthrix (Nielsen et al., 2002) and the Mycolata (mainly Gordonia and Mycobacterium) (Kragelund et al., 2007). Previous in situ experimental data and genomic information have provided significant insight into the ecophysiology of the lipid-accumulating Microthrix (Rossetti et al., 2005; McIlroy et al., 2013). Metabolic models proposed by McIlroy et al. (2013) suggest that under anaerobic conditions, Microthrix preferentially takes up and accumulates long-chain fatty acids (LCFA), such as triacylglycerols, using energy generated through trehalose and/or polyP degradation or partial oxidation of LCFAs. Under subsequent aerobic conditions, stored triacylglycerols are processed via β-oxidation and ethylmalonyl-CoA pathways into the TCA cycle, providing energy and precursor metabolites for growth (McIlroy et al., 2013). Comparison of the genome of the isolated Microthrix RN1 strain to two metagenomes recovered from full-scale Danish EBPR plants indicates that limited metabolic differences exist between Microthrix strains. This suggests that proposed metabolic models are generally applicable to Microthrix strains across EBPR ecosystems, and that removal of lipids could be a potential control strategy for excess Microthrix growth resulting in sludge bulking and foaming (McIlroy et al., 2013). In comparison, variable substrate uptake patterns have been observed for the Mycolata, where some members utilize LCFAs (e.g. oleic acid) (Soddell et al., 1998) while others take up acetate or glucose (Carr et al., 2006; Kragelund et al., 2007). Indeed, this highlights the diversity of mycolic acid in EBPR ecosystems and indicates that further ecological studies are needed to determine effective control strategies.

Filamentous bacteria implicated in the hydrolysis of polysaccharides and proteins in EBPR ecosystems are commonly affiliated with the phyla Bacteroidetes, Chloroflexi, and candidate division TM7 (Miura et al., 2007; Xia et al., 2007; Kragelund et al., 2009; Yoon et al., 2010; Albertsen et al., 2013). Microbial degradation of polysaccharides and proteins in wastewaters is essential for the generation of monosaccharides (e.g. glucose) and amino acids, often the rate-limiting step for biological nutrient removal (Dueholm et al., 2001; Morgenroth et al., 2002). Interestingly, a group of protein-hydrolyzing epiphytic rods affiliated with the family Saprospiraceae has been observed to grow attached to filaments belonging to the phyla Chloroflexi, Proteobacteria, and candidate phylum TM7 (Xia et al., 2008). The advantage of epiphytic growth is currently unknown; however, it is hypothesized that such interactions may be symbiotic, where attachment protects epifloral bacteria from washout and, in return, provides amino acid substrates to their filamentous hosts (Xia et al., 2008).

1.5.4 Fermenting bacteria

Fermenting bacteria carry out anaerobic degradation of wastewater organic carbon derived from the hydrolysis of macromolecules. Here, bacteria ferment simple monosaccharides, fatty acids, and amino acids to SCFAs, which are primary substrates for both PAO and denitrifying bacteria (Section 1.5.5) (Vollertsen et al., 2006). SCFA availability is often limiting in EBPR ecosystems, and operators are therefore required to add external sources or use pre-fermentation processes to achieve optimal nutrient removal (Section 1.4; Rabinowitz and Oldham, 1986), which is an important economic consideration in wastewater treatment.

Previous culture-independent studies using labeling and isotope experiments coupled to fluorescence in situ hybridization (FISH) have identified bacteria affiliated with the phyla Firmicutes, , and Bacteroidetes as active fermenters in EBPR ecosystems (Kong et al., 2008; Nielsen et al., 2012). Kong et al. (2008) demonstrated that the genera Streptococcus and Tetrasphaera were dominant monosaccharide fermenters in full-scale EBPR plants, where the main fermentation products (in descending order) were propionic acid, lactic acid, acetic acid, and formic acid. Subsequent studies by Nielsen et al. (2012) were consistent with this, and further implicated the genera Propionicimonas (Actinobacteria) and Lactococcus (Firmicutes) in active fermentation. Indeed, these studies reveal that fermentation is diverse in EBPR ecosystems based on the identification of multiple metabolic pathways, which likely ensures that a broad mixture of fermentation products is produced to sustain other microbial populations (Kong et al., 2008; Nielsen et al., 2012). Nonetheless, further insight into fermentation processes in EBPR ecosystems is needed to better control and optimize SCFA production for efficient phosphorus removal.

1.5.5 Denitrifying bacteria

Denitrification is widespread among the prokaryotes and allows microorganisms to cope with oxygen-limited conditions. It is the second step in nitrogen removal from municipal wastewaters (with nitrification, discussed below) and is often performed together with EBPR for complete nutrient removal. At full-scale wastewater treatment plants, the addition of an anoxic zone (defined in engineering as the presence of nitrate, in the absence of oxygen) is required to select for the growth of denitrifying organisms. Denitrifying bacteria commonly identified in EBPR ecosystems are affiliated with the families , Comamonadaceae, and Hyphomicrobiaceae (Kong et al., 2004; Osaka et al., 2006; Hesselsoe et al., 2009); however, many other bacteria are also known to denitrify (Daims & Wagner, 2010). As such, strategies based on phylogenetic markers (e.g. SSU rRNA gene) have limited application as process diagnostics for monitoring and control of denitrification at EBPR treatment plants because some organisms that denitrify are not dependent on denitrification and will also grow in non-denitrifying conditions. Instead, functional gene markers based on key enzymes involved in nitrate reduction have been used to monitor in situ activity of denitrifiers in environmental samples (Braker et al., 1998; Taroncher-Oldenburg et al., 2003). Such markers include core genes involved in the reduction of nitrate to nitrogen gas, including the respiratory nitrate reductase (Nar), nitrite reductase (Nir), nitric oxide reductase (Nor), and nitrous oxide reductase (Nos) genes.

1.5.6 Nitrifying bacteria

Nitrification is the first step in nitrogen removal from wastewater. Most nitrogen in municipal wastewater enters the treatment plant as ammonium (NH4+) or nitrogen-based organic molecules (urea and proteins). Microbial degradation of urea and proteins by hydrolyzing bacteria (Section 1.5.3) results in the further release of NH4+ through the process of ammonification. In biological wastewater treatment, nitrification is achieved in the aerobic zone by maintaining sufficient solids retention times (SRT) for the proliferation of slow-growing nitrifying bacteria (Grady et al., 2011). Two main groups of bacteria are known to catalyze the transformation of ammonia to nitrate: the ammonium-oxidizing bacteria (AOB) and the nitrite-oxidizing bacteria (NOB) (Daims & Wagner, 2010). AOB catalyze the oxidation of ammonium to nitrite using the membrane-bound enzyme ammonia monooxygenase (amoA) and the periplasmic enzyme hydroxylamine oxidoreductase (HAO) (Hooper et al., 1997; Olson and Hooper, 1983). Bacteria affiliated with the genus Nitrosomonas are commonly considered the most important AOB in wastewater treatment plants (including EBPR), based on culture-independent surveys of 16S rRNA and amoA gene sequences (Purkhold et al., 2000). However, other microorganisms have also been implicated in ammonia oxidation at wastewater treatment plants, including the bacterial genera Nitrosococcus and Nitrosospira, as well as ammonia-oxidizing archaea (Park et al., 2006; Stahl and Torre, 2012).

Subsequent to ammonium oxidation, NOB catalyze the conversion of nitrite to nitrate. NOB are more phylogenetically diverse than AOB, where known NOB include the genera Nitrobacter, Nitrococcus, and Nitrospina affiliated with the phylum Proteobacteria and the genus Nitrospira affiliated with the phylum Nitrospirae (Daims & Wagner, 2010). The key enzyme involved in nitrite oxidation is nitrite oxidoreductase (Nxr) (Spieck et al., 1996; Bock and Wagner, 2006). Recently, recovery of the Candidatus ‘Nitrospira defluvii’ (hereafter, Nitrospira) genome through metagenomic sequencing identified variant nxr genes that differed dramatically from those of other known nitrite oxidizers from natural ecosystems (Spieck et al., 2006; Lücker et al., 2010). This was consistent with comparative genomic analyses showing that Nitrospira isolates from activated sludge and natural ecosystems were evolutionarily distant because the sludge Nitrospira genome was shaped by horizontal gene transfer (HGT) events with anaerobic ammonium-oxidizing planctomycetes (Lücker et al., 2010). The extent to which HGT has shaped other genomes in EBPR ecosystems remains to be more fully explored. As such, further insight into genomic differentiation processes likely holds promise for identifying functionally important traits specific to survival in EBPR ecosystems.

1.5.7 Predators: bacteriophage and protozoa

Aside from bacteria, bacteriophage (i.e. viruses) and eukaryotic protozoa also play important roles in EBPR ecosystems. While a large number of viruses enter EBPR ecosystems through sewage, our knowledge of bacteriophage ecology and their impact on microbial population dynamics is limited (Otawa et al., 2007). Nevertheless, recent advances in environmental genomics have shed light on important phage-bacteria interactions that transpire in EBPR ecosystems (Kunin et al., 2008; Albertsen et al., 2012). For example, using a combination of metagenomics and community expression analysis, Kunin et al. (2008) revealed that bacteriophage actively prey on globally dispersed Accumulibacter populations, where Accumulibacter adapts locally to phage predation pressures. Here, key genes differentiating Accumulibacter populations were related to EPS gene cassettes, which act as phage defense mechanisms by masking bacteriophage receptors on bacterial cell surfaces (Forde and Fitzgerald, 2003; Kunin et al., 2008). Additional phage defense mechanisms found in the Accumulibacter genomes included the recently discovered CRISPR (clustered regularly interspaced short palindromic repeats)-Cas (CRISPR-associated proteins) adaptive immunity system (Barrangou et al., 2007; Horvath et al., 2010), where the majority of CRISPR spacers were derived from local phage DNA (Kunin et al., 2008). Indeed, this suggests that bacteriophage significantly impact EBPR population dynamics and that future studies are needed to further elucidate phage-bacteria interactions.

Protozoan grazers are the main consumers of bacteria in the environment and play a major role in controlling bacterial biomass and nutrient cycling in EBPR ecosystems (Sherr et al., 2002; Moreno et al., 2010; Warren et al., 2010). Protozoa can reach concentrations of 10⁵ to 10⁶ cells ml⁻¹ in activated sludge and are dominated by a taxonomically diverse range of ciliates (Warren et al., 2010), in addition to flagellates such as Giardia, Apicomplexa, , amoebae, and pathogenic protozoa (e.g. Giardia and Cryptosporidium). Microscopic identification of protozoa by plant operators has been used as a convenient indicator for assessing changes in biochemical oxygen demand (BOD) removal performance, although limited work has extended these metrics to assessing phosphorus removal (Madoni et al., 1993; Warren et al., 2010). While most literature indicates that protozoa play direct roles in bacterial consumption and inorganic particle removal, little is known about protozoan feeding dynamics, food selectivity, and interaction with other microbial community members (Petropoulos and Gilbride, 2005; Moreno et al., 2010; Warren et al., 2010). As such, further understanding of protozoan ecology in EBPR ecosystems is needed to better predict bacterial responses to protozoan grazing.

1.5.8 EBPR environments: from macro to micro

The co-existence of microorganisms in EBPR ecosystems is influenced by conditions at both the macro- and micro-scale. Three redox conditions, separated spatially by individual anaerobic, anoxic, and aerobic zones, characterize the EBPR macro environment. While the hydraulic retention time in each zone is between 1 and 10 hours, the solids retention time can be greater than 20 days, requiring EBPR microbes to constantly adapt to changing redox conditions with each bioreactor cycle. Moreover, microorganisms must also adapt to strong nutrient gradients between bioreactor redox zones, which often requires strategies for temporary carbon sequestration and storage (e.g. PHA granules and triacylglycerol). Indeed, these dynamic conditions are believed to select for versatile microorganisms that maintain some level of activity across most redox zones in the ecosystem (Nielsen et al., 2012) (Figure 1.6). Exposure to such dynamic conditions promotes niche partitioning in EBPR ecosystems, as some microorganisms are more suited for growth in one zone versus another (e.g. fermenters versus nitrifiers). This niche partitioning is further driven by the diversity of substrates present in wastewaters and those generated through macromolecule hydrolysis (Section 1.5.3). For example, Kindaichi et al. (2013) observed that several core EBPR microbes maintained distinct and highly specialized substrate uptake patterns, even when exposed to variations in substrate and growth conditions. This suggests that strong selection pressures for metabolic specialization exist in EBPR ecosystems, likely driven by biochemical conflicts (Johnson et al., 2012). Such niche partitioning also implies that substrate type (i.e. wastewater composition) plays a substantial role in determining EBPR community structure (Kindaichi et al., 2013).

[Figure 1.6 diagram; labels: wastewater influent, effluent, anoxic recycle, aerobic recycle, WAS]

Figure 1.6 EBPR microbial activity dynamics (adapted from Nielsen et al., 2012). Most microorganisms are versatile and maintain some level of activity across the three redox conditions. Hydrolyzing bacteria, such as Microthrix, produce exoenzymes in all zones. Fermenting bacteria, such as Streptococcus, are highly active in the anaerobic zone, but can also grow in the aerobic zone because they tolerate oxygen and nitrate without using them. Polyphosphate-accumulating organisms (PAO), such as Accumulibacter, actively store acetate under anaerobic conditions and grow under aerobic conditions. Some Accumulibacter strains can also denitrify in the anoxic zone.

Microenvironment conditions, such as those associated with activated sludge floc, also drive community assembly in EBPR ecosystems. Flocs are typically 50-100 μm in diameter and contain a complex mixture of different microcolonies, filamentous bacteria, and EPS (Nielsen et al., 2012) (Figure 1.7). The function of activated sludge floc remains largely unknown; however, many believe microbial floc assemblage is important for biomass retention (Grady et al., 2011), resistance to chemical and enzymatic breakdown (Henriques and Love, 2007), and protection from predation (Forde and Fitzgerald, 2003). Floc formation is also proposed to be the consequence of stochastic and deterministic ecological factors that are complex and poorly understood (van der Gast et al., 2008; Ayarza et al., 2011). Elucidating these factors, and the local interactions between microorganisms in activated sludge floc, is essential for understanding controls on microbial assemblage in EBPR ecosystems.

Figure 1.7 Activated sludge floc composition (taken from Nielsen et al., 2012).

1.6 Research motivation and objectives

The overarching motivation for this research was to gain further insight into the microbial ecology of biological phosphorus removal, such that design principles for improved engineering of EBPR ecosystems can be developed. Resulting principles will support the optimization of existing nutrient removal systems for improved treatment performance and process stability, while also enabling the development of novel EBPR process designs and control strategies.

Chapter 2 examines the activity of abundant and rare (i.e. very low abundance) microorganisms present in the University of British Columbia (UBC) EBPR Pilot Plant using pyrotag sequencing. Currently, little is known about microbial activity dynamics in engineered ecosystems, in particular among rare members of the biosphere (Sogin et al., 2006; Pedrós-Alió, 2006; Kim et al., 2013). We hypothesized that microbial activity dynamics vary in EBPR ecosystems, both spatially and temporally, and posited that rare microorganisms make important contributions to nutrient cycling, based on recent observations from natural ecosystems (Neufeld et al., 2008; Pester et al., 2010; Campbell et al., 2011; Wilhelm et al., 2014).

Chapter 3 examines the functional potential of the UBC EBPR Pilot Plant using metagenomic sequencing. To date, studies exploring the functional potential of full-scale EBPR communities have been limited, but suggest that EBPR ecosystems manifest a high degree of microdiversity and significant selection pressure from phage (Albertsen et al., 2011; Albertsen et al., 2013). We hypothesized that local selection pressures in EBPR ecosystems are responsible for genomic differentiation events observed between microbial populations, and that such differentiation is acquired through mobile gene pools, based on current ecological theory (Polz et al., 2013; Cordero and Polz, 2014).

This thesis had the following objectives:

1. Compile current knowledge on the microbial ecology of biological phosphorus removal.

2. Determine the spatial and temporal activity dynamics of microorganisms in the UBC EBPR Pilot Plant.

3. Assess the functional potential of microbial communities in the UBC EBPR Pilot Plant and compare this potential to other EBPR ecosystems.

Chapter 2: Microbial community structure and activity in a pilot-scale EBPR ecosystem

2.1 Synopsis

Enhanced biological phosphorus removal (EBPR) relies on diverse but specialized microbial communities to mediate the cycling and ultimate removal of phosphorus from municipal wastewaters. However, little is known about microbial activity and dynamics in relation to process fluctuations in EBPR ecosystems. Here, temporal changes in microbial community structure and activity were monitored across each bioreactor zone in a pilot-scale EBPR treatment plant by examining the ratio of small subunit ribosomal RNA (SSU rRNA) to SSU rRNA gene (rDNA) over a 120-day study period. Although the majority of OTUs in the EBPR ecosystem were rare, many maintained high potential activities based on SSU rRNA:rDNA ratios, suggesting that rare OTUs contribute substantially to protein synthesis potential in EBPR ecosystems. Few significant differences in OTU abundance and activity were observed between bioreactor redox zones, although differences in temporal activity were observed among phylogenetically cohesive OTUs. Moreover, observed temporal activity patterns could not be explained by measured process parameters, suggesting that other ecological drivers, such as grazing or viral lysis, modulated community interactions. Taken together, these results point towards complex regulatory controls within the EBPR ecosystem based on “anticipatory” life strategies that attune EBPR communities to changing bioreactor redox conditions and nutrient concentrations.

Sections of this chapter have been submitted for publication and were presented at:

Lawson, C.E., Rabinowitz, B., Mavinic, D.S., Ramey, W.D., Hallam, S.J. (2013). Structure of the active microbial community in a pilot-scale enhanced biological phosphorus removal process revealed through 454-pyrotag sequencing. Proceedings of the 86th Annual Water Environment Federation Technical Exhibition and Conference, Chicago, Illinois, October 5-9, 2013.

2.2 Background

Despite its widespread usage, the stability and efficacy of EBPR can be unreliable at full-scale operations due to removal or reduced activity of taxa mediating polyphosphate storage and nutrient cycling (Neethling et al., 2005). The cause of this instability has been unpredictable, as our knowledge of microbial community structure and activity within the EBPR ecosystem is limited. As such, there is a pragmatic interest in charting the structure, function, and activities of microbial communities in EBPR ecosystems to better link bioreactor processes to microbial agents and ultimately improve the design and operation of wastewater treatment plants.

Multiple molecular surveys targeting the small subunit ribosomal RNA gene (SSU rDNA) have been conducted to chart microbial abundance across EBPR ecosystems (Eschenhagen et al., 2003; Hall et al., 2010; Wan et al., 2011; Silvia et al., 2012) and recent studies have improved quantitative insights through the use of high-throughput amplicon sequencing (Zhang et al., 2012; Kim et al., 2013a; Saunders et al., 2013). These studies revealed that despite being diverse, EBPR ecosystems shared a core microbiome composed of relatively stable and abundant taxa (Nielsen et al., 2010; Zhang et al., 2012). Stable isotope probing and microautoradiography have provided several metabolic linkages between bioreactor processes and active taxa, including polyphosphate accumulation (Kong et al., 2005; Kim et al., 2013b), nitrification (Dolinšek et al., 2013), denitrification (Osaka et al., 2006; Thomsen et al., 2007), hydrolysis (Kragelund et al., 2007), fermentation (Kong et al., 2008; Nielsen et al., 2012), and grazing (Moreno et al., 2010). While these studies effectively linked microbial community structure to function in EBPR ecosystems, they neither accounted for total diversity of active microbes nor determined dynamics of these microbial activities in relation to process fluctuations. Moreover, these studies did not consider whether the rare biosphere, defined by the long tail of low-abundance taxa identified in microbial communities (Sogin et al., 2006; Pedrós-Alió, 2006), contributed to EBPR nutrient cycling and process stability.

High-throughput sequencing combining both SSU rDNA and rRNA can be used to measure the abundance and potential activity of abundant and rare taxa within microbial communities (Campbell et al., 2011; Campbell and Kirchman, 2013; Hunt et al., 2013; Wilhelm et al., 2014). Here, the potential activity of microbial taxa was inferred based on the ratio of recovered rRNA to rDNA sequences. While this approach does not directly measure in situ microbial growth rates, rRNA can represent past, present, or emerging cellular activities and provide a robust proxy for microbial protein synthesis potential (Blazewicz et al., 2013). Moreover, monitoring rRNA dynamics over time can identify changes in ribosome synthesis and degradation, and inform hypotheses related to life strategies within microbial communities (Lepp and Schmidt, 1998; Barnard et al., 2013). Indeed, the cyclical exposure of microbes to dynamic EBPR redox conditions (anaerobic to aerobic) is believed to support multiple coexisting life strategies differentiated by population-level metabolic traits (e.g. aerobic respiration versus fermentation) (Nielsen et al., 2012).
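The rRNA:rDNA ratio described above can be illustrated with a minimal Python sketch. It assumes that read counts are first converted to relative abundances within each library so that the two libraries are comparable; the function names and the OTU counts are hypothetical and are not taken from this study.

```python
def relative_abundance(counts):
    """Convert raw read counts per OTU into fractions of the library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("library contains no reads")
    return {otu: n / total for otu, n in counts.items()}


def rrna_rdna_ratios(rrna_counts, rdna_counts):
    """Return the rRNA:rDNA ratio for every OTU detected in both libraries.

    OTUs with rRNA reads but no rDNA reads are skipped here; handling them
    (for example by imputation) is a separate decision.
    """
    rrna = relative_abundance(rrna_counts)
    rdna = relative_abundance(rdna_counts)
    return {
        otu: rrna[otu] / rdna[otu]
        for otu in rrna
        if rdna.get(otu, 0) > 0
    }


if __name__ == "__main__":
    # Hypothetical counts: OTU_3 is rare in the rDNA library but has a high ratio.
    rdna = {"OTU_1": 900, "OTU_2": 90, "OTU_3": 10}
    rrna = {"OTU_1": 500, "OTU_2": 300, "OTU_3": 200}
    for otu, ratio in sorted(rrna_rdna_ratios(rrna, rdna).items()):
        print(f"{otu}: rRNA:rDNA = {ratio:.2f}")
```

A ratio above 1 indicates that an OTU is proportionally better represented in the rRNA library than in the rDNA library, which is how a rare OTU can still show high potential activity.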

In the present study, microbial community dynamics were monitored across each bioreactor zone in a pilot-scale EBPR treatment plant over a 120-day study period using 454 pyrosequencing; this targeted the V6-V8 region of SSU rDNA and rRNA with three-domain resolution. Resulting sequence information was used to explore the potential activity of rare and abundant taxa in EBPR ecosystems and assess whether microbial community activity exhibited temporal variation under steady-state bioreactor conditions. Resulting datasets indicate that a large proportion of rare taxa are potentially active in EBPR ecosystems, similar to natural ecosystems (Jones and Lennon, 2010; Campbell et al., 2011; Hunt et al., 2013; Wilhelm et al., 2014) and that rare taxa may make important contributions to EBPR nutrient cycling and process stability.

2.3 Experimental procedures

2.3.1 Pilot plant operation and sampling

This study was conducted at the Staging Environmental Research Centre (SERC; lat. 49.245378, long. -123.22940) located on the UBC campus. SERC is an EBPR pilot plant operating with a municipal wastewater feed and supplementary acetate addition as a SCFA source. The plant uses a University of Cape Town (UCT) activated sludge configuration with a Zee-Weed® membrane system installed in the aerobic zone for solids-liquid separation and is designed for carbon oxidation, nitrification-denitrification, and EBPR (Figure 2.1). The pilot plant was operated under steady-state conditions at a solids retention time (SRT) of 15 days for the duration of the study. Wastewater temperatures ranged from 13 to 20 °C. Biomass samples were collected from each bioreactor zone every 2 weeks from February 2013 to May 2013 to characterize microbial community structure and activity (Table A1). To avoid RNA degradation, samples were immediately flash frozen and stored at -80 °C until further use. Table 2.1 summarizes operating conditions and performance of the treatment plant during the 120-day study period. Chemical oxygen demand (COD), orthophosphate-phosphorus (PO4-P), total phosphorus (TP), ammonium-nitrogen (NH4-N), nitrate/nitrite-nitrogen (NOx), total Kjeldahl nitrogen (TKN), total suspended solids (TSS), and SCFA were measured according to Standard Methods (APHA, 2005).

Figure 2.1 Generic configuration of the UBC EBPR pilot plant. The UBC EBPR pilot plant employs a University of Cape Town (UCT) activated sludge configuration with membrane filtration for solids-liquid separation. Carbon-rich primary effluent enters the anaerobic zone and is mixed with recycle activated sludge from the anoxic zone. Anaerobic conditions are characterized by minimal dissolved oxygen and nitrate concentrations and microbial phosphorus release. Sodium acetate is added to the anaerobic zone as an external carbon source to improve biological phosphorus removal and denitrification. Activated sludge then enters the anoxic zone and is mixed with nitrate-rich activated sludge from the aerobic zone. Dissolved oxygen concentrations are minimal. Here, anaerobic respiration of carbon with nitrate occurs; nitrogen is removed from the system as nitrogen gas. Biomass then enters the aerobic zone where nitrification, carbon oxidation, and phosphorus uptake occur. Dissolved oxygen levels are maintained via air sparging. A submerged membrane unit provides solids-liquid separation; effluent is returned to the sewer system and biomass is retained in the bioreactor, such that all microorganisms experience the same solids retention time governed by the solids wasting rate. Phosphorus removal occurs via solids wastage.

Table 2.1 Process operations and performance data

Parameter         Unit        N     Mean             Max      Min
Influent
  COD             mg/L        25    168.52 ± 41.1    579      59
  SCFA            mg COD/L    24    43.24 ± 4.3      67.09    21.21
  PO4-P           mg/L        98    2.40 ± 0.09      3.35     0.66
  TP              mg/L        30    3.80 ± 0.3       5.52     2.35
  NH4-N           mg/L        99    36.90 ± 1.09     48.60    2.82
  NOx-N           mg/L        30    0.095 ± 0.017    0.36     0.00
  TKN             mg N/L      19    41.93 ± 4.96     88.20    22.60
Effluent
  COD             mg/L        25    29.36 ± 11.4     122      0.00
  PO4-P           mg/L        98    0.29 ± 0.11      3.99     0.00
  TP              mg P/L      30    0.24 ± 0.13      1.33     0.00
  NH3-N           mg/L        99    0.19 ± 0.09      4.70     0.00
  NOx-N           mg/L        99    16.19 ± 0.8      24.30    4.50
  TKN             mg N/L      30    1.36 ± 0.35      4.06     0.00
Operational data
  pH¹             pH          23    7.28 ± 0.09      7.68     6.62
  Temperature     °C          106   16.37 ± 0.31     20.40    13.00
  Dissolved O2    mg/L        110   2.26 ± 0.19      7.20     0.25
  TSS¹            mg/L        20    4441 ± 266       5350     3230
  Influent flow   L/min       22    3.43 ± 0.03      3.6      3.3
  Recycle ratio   -           -     1                -        -
  SRT             days        -     15               -        -
  HRT             hours       -     10               -        -

N, number of measurements. Max, maximum observed value. Min, minimum observed value.
¹Measurements were taken in the aerobic zone only.
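As an illustration of how the influent and effluent means in Table 2.1 relate to treatment performance, the following Python sketch computes percent removal as (influent - effluent) / influent. The function and dictionary names are hypothetical, and the calculation uses only the tabulated means, so it is a rough indicator rather than a reported result of this study.

```python
def removal_efficiency(influent, effluent):
    """Percent removal from mean influent and effluent concentrations."""
    if influent <= 0:
        raise ValueError("influent concentration must be positive")
    return 100.0 * (influent - effluent) / influent


# Mean concentrations copied from Table 2.1 as (influent, effluent).
TABLE_2_1_MEANS = {
    "COD": (168.52, 29.36),
    "PO4-P": (2.40, 0.29),
    "TP": (3.80, 0.24),
    "TKN": (41.93, 1.36),
}

if __name__ == "__main__":
    for parameter, (influent, effluent) in TABLE_2_1_MEANS.items():
        pct = removal_efficiency(influent, effluent)
        print(f"{parameter}: {pct:.1f}% removal")
```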

2.3.2 Nucleic acid extraction and cDNA synthesis

Total genomic DNA and RNA were extracted from biomass samples using the FastDNA® Spin Kit for Soil (MP Biomedicals, Solon, OH, USA) and the RNeasy Mini Kit (Qiagen, Valencia, CA, USA), respectively. A 60 second bead-beating step with a spherical ceramic bead (lysing matrix E) at a speed setting of 4.0 on a FastPrep (MP Biomedicals) was used for sample lysis. An on-column DNase I digestion was applied to remove DNA contamination from RNA prior to RT-PCR (Qiagen). Genomic DNA was quality checked using agarose gel electrophoresis and total RNA was quality checked using the Bioanalyzer RNA 6000 Nano assay (Agilent, Santa Clara, CA, USA) to ensure only high quality nucleic acids were used for downstream analysis. Total RNA was reverse transcribed to complementary DNA (cDNA) using random hexamers and a Superscript® III first-strand synthesis kit (Invitrogen). DNA contamination in RNA samples was determined by performing cDNA synthesis reactions without reverse transcriptase (RT). Reactions without RT were then subjected to PCR identically to reactions with RT.

2.3.3 PCR amplification and pyrosequencing of SSU rDNA and cDNA

The V6-V8 region of the SSU rRNA gene was amplified from DNA and cDNA templates using the universal primer pair 926F (5’-AAACTYAAAKGAATTGRCGG-3’) and 1392R (5’-ACGGGCGGTGTGTRC-3’). Primers were modified to include 454 pyrosequencing adaptor sequences and reverse primers included a five base-pair barcode according to previously published protocols (Engelbrektson et al., 2010; Allers and Wright et al., 2012). PCR reactions (50 μl) were performed in duplicate and pooled to minimize PCR bias. Each reaction used 0.6 μl Taq DNA Polymerase (5 U/μl), 5 μl 10X PCR buffer, 3 μl magnesium chloride, 4 μl 2 mM dNTP mix, 1 μl of each primer, and 10 ng template. Negative controls were included with each reaction to ensure that no contamination of DNA had occurred. Samples were diluted to 10 ng/μl and pooled in equal concentrations prior to sequencing. Emulsion PCR and sequencing were performed at Genome Quebec (Montreal, Canada) on the Roche 454 GS FLX Titanium platform (454 Life Sciences, Branford, CT, USA) according to the manufacturer’s instructions.

2.3.4 Processing of pyrotag sequences

A total of 1,432,464 SSU rDNA and rRNA pyrotag sequences were processed using the Quantitative Insights Into Microbial Ecology (QIIME) version 1.4.0 software package (Caporaso et al., 2010). Sequences with fewer than 150 bases, ambiguous ‘N’ bases, or homopolymer runs were removed before chimera detection. Chimeric sequences were identified with QIIME via ChimeraSlayer and removed prior to taxonomic assignment. A total of 633,521 rDNA and 794,255 rRNA non-chimeric sequences were clustered at 97% similarity into operational taxonomic units (OTUs). Representative sequences from each cluster were queried against the SILVA 111 ribosomal RNA database (Quast et al., 2013) using the Basic Local Alignment Search Tool (BLAST) (Altschul et al., 1990) to assign taxonomy. Singleton OTUs (represented by one read only) were omitted from downstream analysis to reduce overprediction of rare OTUs (Kunin et al., 2010).
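The read-level filters and singleton removal described above can be sketched in a few lines of Python. This is a minimal illustration, not the QIIME implementation: the function names are hypothetical, and the homopolymer threshold (`max_homopolymer=6`) is an assumption, since the text does not state the run length that was used.

```python
import re


def passes_quality_filter(seq, min_length=150, max_homopolymer=6):
    """Return True if a read survives the three filters named in the text.

    max_homopolymer is an assumed threshold (not given in the thesis):
    any run of a single base longer than this value fails the filter.
    """
    seq = seq.upper()
    if len(seq) < min_length:  # fewer than 150 bases
        return False
    if "N" in seq:  # ambiguous base call
        return False
    # (.)\1{6,} matches one base followed by 6 or more copies of itself,
    # i.e. a homopolymer run of at least max_homopolymer + 1 bases.
    if re.search(r"(.)\1{%d,}" % max_homopolymer, seq):
        return False
    return True


def drop_singletons(otu_counts):
    """Remove OTUs represented by a single read (otu_counts: dict id -> reads)."""
    return {otu: n for otu, n in otu_counts.items() if n > 1}
```

Chimera detection and OTU clustering are deliberately left out, as they depend on external tools (ChimeraSlayer, QIIME) rather than simple per-read rules.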

2.3.5 Statistical analysis

To make microbial community data more suitable for multivariate analysis, OTU matrices were standardized and Hellinger-transformed (Legendre and Legendre, 1998; Ramette, 2007). SSU rDNA values for rare OTUs that recovered rRNA but not rDNA were imputed using non-parametric multiplicative replacement (Martín-Fernández et al., 2003). Differences in microbial community structure and activity during the study period were explored using non-metric multidimensional scaling (NMDS) based on Bray-Curtis dissimilarity. A stress value was calculated, which measures how far the distances in the reduced-space configuration are from being monotonic to the original distances in the OTU matrix (Legendre and Legendre, 1998).

Redundancy analysis (RDA) was performed to interpret changes in microbial community structure and activity with process variables including SCFA/P ratio, temperature, dissolved oxygen, effluent nitrate/nitrite, and effluent phosphorus. Permutation tests were conducted to assess the significance of RDA constraints (Legendre et al., 2011). Indicator analysis was used to identify OTUs specifically associated with time periods predefined based on NMDS ordinations. The statistical significance of the indicator value was evaluated using a randomization procedure. NMDS and RDA were performed using the package vegan, version 2.0.8, and indicator analysis was performed using the package labdsv version 1.5.0 in R.
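For readers unfamiliar with the two transformations named in this section, the following Python sketch shows the Hellinger transformation and the Bray-Curtis dissimilarity on a small count matrix. It is an illustration only: the analysis itself was run in R with vegan, the function names here are hypothetical, and the text does not specify exactly which matrix (raw, standardized, or transformed) was passed to each step.

```python
import math


def hellinger_transform(counts):
    """Hellinger transformation: square root of each row's relative abundances.

    counts is a list of samples (rows), each a list of OTU read counts.
    """
    transformed = []
    for row in counts:
        total = sum(row)
        if total == 0:
            transformed.append([0.0] * len(row))  # empty sample stays all-zero
        else:
            transformed.append([math.sqrt(x / total) for x in row])
    return transformed


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two samples with non-negative entries."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator if denominator else 0.0
```

A dissimilarity of 0 means two samples have identical OTU profiles and 1 means they share no OTUs; NMDS then seeks a low-dimensional configuration whose inter-sample distances preserve the rank order of these values.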

2.4 Results

2.4.1 SSU rDNA and rRNA sequencing

A total of 616,766 rDNA and 778,266 rRNA pyrotag sequences recovered from 72 time-resolved and replicated biomass samples were clustered with a 97% similarity cutoff into 30,946 OTUs after singleton removal (Table A2). These OTUs encompassed 53 phyla affiliated with 402 families spanning Archaea, Bacteria and Eukaryota. Relative activity of microbial populations was inferred based on the ratio of recovered SSU rRNA sequences to rDNA sequences (SSU rRNA:rDNA ratio) for a given time interval. OTUs were considered active if rRNA recovery exceeded rDNA recovery (Rodriguez-Blanco et al., 2009; Jones and Lennon, 2010). Abundant OTUs were arbitrarily defined as having a frequency >1% in at least one sample, intermediate OTUs as having a frequency between 1% and 0.1% in at least one sample, and rare OTUs as having a frequency <0.1% in all samples (Galand et al., 2009). Rarefaction curves constructed from rDNA and rRNA libraries across each bioreactor redox zone and time point indicated that the total diversity of each sample was not reached despite the depth of sequencing (Figure 2.2).
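The abundance classes and the activity criterion defined above can be written as simple rules. The sketch below is a hedged illustration with hypothetical function names; frequencies are expressed as fractions (0.01 = 1%), and the handling of values falling exactly on the 1% and 0.1% boundaries is an assumption, since the text does not address it.

```python
def classify_otu(frequencies):
    """Classify one OTU from its relative frequencies across all samples.

    frequencies: iterable of fractions (0.01 means 1%).
    Boundary values (exactly 1% or 0.1%) are assigned to 'intermediate'
    here; that choice is an assumption.
    """
    peak = max(frequencies)
    if peak > 0.01:
        return "abundant"
    if peak >= 0.001:
        return "intermediate"
    return "rare"


def rrna_rdna_ratio(rrna, rdna):
    """SSU rRNA:rDNA ratio, or None when no rDNA was recovered.

    The thesis imputes rDNA values in that case; imputation is not shown.
    """
    if rdna == 0:
        return None
    return rrna / rdna


def is_active(rrna, rdna):
    """An OTU is considered active when rRNA recovery exceeds rDNA recovery."""
    return rrna > rdna
```

For example, an OTU whose highest frequency in any sample is 0.5% is classified as intermediate, and it counts as active in a given sample whenever its rRNA:rDNA ratio exceeds 1.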

[Figure 2.2 image: six rarefaction panels, one per library type and redox zone (rDNA/anaerobic, rDNA/anoxic, rDNA/aerobic, rRNA/anaerobic, rRNA/anoxic, rRNA/aerobic), with one curve per sample. Horizontal axis: Sequencing Depth; vertical axis: # of OTUs.]

Figure 2.2 Anaerobic (red), anoxic (green), and aerobic (blue) zone rarefaction curves for each day in time series. Horizontal axis shows number of reads sampled; vertical axis indicates accumulation of OTUs. Rows 1 and 2 indicate SSU rDNA and rRNA reads, respectively. Curves that become horizontal with increasing sequencing depth indicate that full sampling depth has been reached.

The majority of OTUs in the EBPR bioreactor were rare; only 403 OTUs had maximal rDNA abundance greater than 0.1% at any time point sampled (Figure 2.2; Figure 2.3). This is consistent with previous reports on microbial communities from natural (Sogin et al., 2006; Gibbons et al., 2013) and engineered ecosystems (Kim et al., 2013c), where rare taxa are thought to act as a seed bank of dormant microbes capable of resuscitating in response to environmental change (Lennon and Jones, 2011). We observed that on average 29% of all OTUs present in the EBPR bioreactor were dormant (n = 8995), based on OTUs that recovered rDNA sequences but not rRNA sequences. This approach has previously been used to estimate the frequency of dormant cells in natural ecosystems (Lennon and Jones, 2010). Indeed, the majority of these OTUs (>99%) belonged to the rare EBPR biosphere, representing 8% of total community composition based on rDNA. However, many rare EBPR taxa were not dormant and maintained a consistently high level of activity in the bioreactor (see below). Interestingly, 48% of total rRNA abundance was recovered from the rare EBPR biosphere, suggesting that rare taxa contributed almost equally to the community’s protein synthesis potential (Figure 2.3). On average, approximately 90% of eukaryotic OTUs, 65% of archaeal OTUs, and 35% of bacterial OTUs from the rare biosphere were active, based on average SSU rRNA:rDNA ratios. In comparison, only 11% of OTUs with rDNA abundance >0.1% (n = 403) were active. Overall, we observed that rDNA and rRNA for individual OTUs did not correlate well within the EBPR ecosystem (Figure 2.3). This suggests that microbial abundance does not necessarily scale with potential activity in the EBPR milieu. Similar results were found for bacterial communities along an estuarine salinity gradient that experiences wide variation in environmental conditions over intermittent temporal and spatial scales (Sharp et al., 2009; Campbell and Kirchman, 2013).
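The dormancy estimate reported above (OTUs with rDNA reads but no rRNA reads, as a share of all OTUs) amounts to a simple count. The sketch below is illustrative only: the function name and the input layout are hypothetical, and the choice of denominator (every OTU with at least one read of either kind) is an assumption about how the percentage was computed.

```python
def dormant_fraction(otu_table):
    """Fraction of OTUs that recovered rDNA reads but no rRNA reads.

    otu_table: dict mapping OTU id -> (rdna_reads, rrna_reads).
    The denominator is every OTU with at least one read of either type
    (an assumption; the thesis does not spell out the denominator).
    """
    observed = [reads for reads in otu_table.values() if sum(reads) > 0]
    if not observed:
        return 0.0
    dormant = [reads for reads in observed if reads[0] > 0 and reads[1] == 0]
    return len(dormant) / len(observed)
```

As a consistency check on the reported figures, 8995 dormant OTUs out of the 30,946 OTUs retained after singleton removal is about 29%.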


Figure 2.3 Relationship between SSU rDNA and rRNA frequencies for abundant, intermediate, and rare OTUs in the pyrotag dataset, separated by bioreactor redox zone (anaerobic, anoxic, aerobic). Each point reflects paired SSU rRNA and rDNA coordinates for each OTU and time point.

2.4.2 Overview of microbial community structure and activity

The EBPR pilot plant maintained a relatively diverse microbial community structure during the study period, with rDNA datasets manifesting greater diversity overall than rRNA datasets (Table S2). Occasionally, rRNA diversity was greater than rDNA diversity, a difference attributed to increased representation of active eukaryotic taxa. Figure 2.4 summarizes abundance and activity of major taxa identified. When core taxa were defined as OTUs detected across all biomass samples (Shade and Handelsman, 2011), a core microbial community was consistently recovered throughout the 120-day study period.

[Figure 2.4 image: bubble plot with taxa as rows, grouped into Archaea, Bacteria, and Eukaryota, and three panels: SSU rRNA, SSU rDNA, and SSU rRNA:rDNA ratio. Legend: pyrotag abundance (1%, 4%, 9%, 16%); SSU rRNA:rDNA ratio (1, 4, >10); redox zone (aerobic, anoxic, anaerobic). Vertical axis: Relative Abundance.]

Figure 2.4 Microbial community structure and activity. Relative abundance of SSU rDNA and rRNA pyrotags for microbial taxa recovered from the UBC EBPR bioreactor (indicated by circles). Circles on right panel indicate corresponding SSU rRNA:rDNA ratio. Differences in abundance and activity between zones for all OTUs were assessed by one-way analysis of variance (ANOVA) to test for significance (p-value <0.05).

On average, the most abundant bacterial phyla (rDNA >1%) were Bacteroidetes (29.6%), Proteobacteria (28.7%), Actinobacteria (24.6%), Planctomycetes (3.8%), Firmicutes (2.8%), Chloroflexi (1.8%), Verrucomicrobia (1.4%), and Nitrospirae (1.0%) (Figure 2.4). Thirty-nine other bacterial phyla had rDNA abundances <1%, including Gemmatimonadetes (0.79%), Acidobacteria (0.77%), Armatimonadetes (0.20%), and Candidate division TM7 (0.19%). With the exception of Bacteroidetes, these results were consistent with other high-throughput sequencing studies of activated sludge (Zhang et al., 2012; Hu et al., 2012). Within bacteria, OTUs affiliated with Candidate phylum TM6, Armatimonadetes, Firmicutes, and Proteobacteria appeared to be highly active, whereas OTUs affiliated with Candidate division TM7, Planctomycetes, Verrucomicrobia, Bacteroidetes, Actinobacteria, and Chloroflexi appeared less active, based on observed SSU rRNA:rDNA ratios (Figure 2.4). The most abundant archaeal phyla included Euryarchaeota (0.20%) and Thaumarchaeota (0.01%), where the majority of OTUs were affiliated with the methanogenic classes Methanobacteria and Methanomicrobia (Figure 2.4). Methanogenic archaea appeared to be particularly active in the EBPR ecosystem, averaging SSU rRNA:rDNA ratios between 2 and 10. While methane production was not measured in this study, active populations of methanogens have previously been observed in activated sludge, although their role in carbon turnover was considered minor (Gray et al., 2002).

On average, the most abundant eukaryotic clades included Opisthokonta (1.8%), Stramenopiles/Alveolata/Rhizaria (SAR) (0.54%), Amoebozoa (0.04%), and Excavata (0.01%). The most abundant phyla included LKM11 group (Fungi-related) (1.24%), Ciliophora (0.43%), Ichthyosporea (0.32%), and Rotifera (0.09%) (Figure 2.4). Recently, Evans and Seviour (2012) identified Ascomycota, Basidiomycota, and Cryptomycota as dominant fungi in an EBPR ecosystem using fungal 18S rDNA clone libraries. However, OTUs affiliated with these fungal groups were rare (<0.02%) throughout the 120-day study period. While eukaryotic OTUs represented only 2.4% of rDNA, they represented a significant proportion of the community rRNA, encompassing >35% on average. We observed that Ciliophora, Amoebozoa, Excavata (mainly Euglenida), Rotifera, and Basidiomycota were consistently active in the EBPR ecosystem, with Ciliophora and Amoebozoa averaging SSU rRNA:rDNA ratios >100 (Figure 2.4). Ciliophora and Amoebozoa have recently been identified as active protozoa that are abundant in activated sludge and play a major role in bacterial grazing and nutrient cycling (Moreno et al., 2010).

2.4.3 Relative rRNA abundance across EBPR redox zones

To test whether potential activity shifted across EBPR redox zones, SSU rRNA:rDNA ratios were compared from the anaerobic, anoxic, and aerobic zones for each OTU at a given time interval. Anaerobic, anoxic, and aerobic zone hydraulic retention times were approximately 1, 3, and 6 hours, respectively, while the biomass solids retention time was 15 days. We observed that SSU rRNA:rDNA ratios for the majority of OTUs were not statistically different across the redox zones (Figure 2.4). These results were surprising, as we expected SSU rRNA:rDNA ratios to modulate based on changing redox conditions. Additionally, no changes in the relative abundance of SSU rRNA or rDNA were observed, suggesting that OTU abundance and ribosome levels were constant across redox zones. This contrasts with previous studies of EBPR communities enriched in Accumulibacter, which used reverse transcription quantitative PCR assays to show that rRNA abundances fluctuated with changing redox conditions in a sequencing batch reactor (He and McMahon, 2011a).

2.4.4 Abundance and activity of core EBPR taxa

The recently proposed core EBPR microbiome includes microbes mediating key bioreactor processes, including hydrolysis, fermentation, nitrification, denitrification, and biological phosphorus removal (Nielsen et al., 2010; Nielsen et al., 2012). Pyrosequencing allowed us to monitor potential activity of core EBPR taxa (genus-level) affiliated with these processes, in addition to poorly characterized taxa (Table A3). Eukaryote populations commonly implicated in bacterial grazing were also monitored.

Hydrolyzing bacteria were the most abundant taxa in the EBPR ecosystem. However, their potential activity appeared to be the lowest. Abundant hydrolyzers were affiliated with the filamentous bacteria Microthrix and the Mycolata (mycolic-acid containing filaments). Microthrix is a genus specialized in lipid degradation (McIlroy et al., 2013), whereas the Mycolata (mainly Gordonia and Mycobacterium) are implicated in the breakdown of lipids and polysaccharides (Kragelund et al., 2007). Both groups are also commonly associated with foaming episodes and sludge separation problems (i.e. bulking) in activated sludge (Seviour and Nielsen, 2010). Despite being abundant, their average SSU rRNA:rDNA ratios were typically low (Table A3). This is consistent with previous reports that indicate Gordonia and Microthrix are either slow growing or carry low cellular levels of rRNA (Blackall et al., 1996; de los Reyes and Raskin, 2002; Rossetti et al., 2005). Other known hydrolyzers abundant in the EBPR ecosystem included filamentous bacteria affiliated with the classes , Flavobacteriales, and Sphingobacteriales of the Bacteroidetes and Caldilineae, Thermomicrobia, and Anaerolineae of the Chloroflexi. These groups are known to encompass many of the polysaccharide and protein-hydrolyzing bacteria detected in activated sludge (Miura et al., 2007; Xia et al., 2007; Kragelund et al., 2009; Yoon et al., 2010). SSU rRNA:rDNA ratios for these groups were also low, suggesting they had reduced metabolic activities in the bioreactor or are slow growing (Table A3).

The most abundant known fermenting microorganisms detected in the EBPR ecosystem belonged to the phylum Firmicutes and were affiliated with Streptococcus and Lactococcus. Despite having average rDNA abundances <0.5%, Streptococcus and Lactococcus consistently had the highest potential activity of all bacteria present in the EBPR ecosystem, averaging SSU rRNA:rDNA ratios of >20 (Table A3). These groups are frequently identified in EBPR ecosystems as active fermenters, producing a diverse range of fermentation products, such as short-chain fatty acids (Kong et al., 2008; Nielsen et al., 2012). Other fermenters detected in the EBPR ecosystem were affiliated with the classes Clostridiales and Bacteroidales. SSU rRNA:rDNA ratios indicated that these taxa had moderate activity levels (Table A3).

Polyphosphate accumulating organisms (PAO) detected in the EBPR ecosystem were affiliated with the genera Propionivibrio and Tetrasphaera. Species of Propionivibrio were assigned to either uncultured Candidatus ‘Accumulibacter sp.’ or uncultured bacterium and maintained very high SSU rRNA:rDNA ratios (Table A3). Tetrasphaera-related PAO were observed to have very low abundance and potential activity in the EBPR ecosystem (Table A3), in contrast to other studies that have reported high levels of the Actinobacterial-PAO in full-scale systems (Kong et al., 2005; Nguyen et al., 2011; Mielczarek et al., 2013). This suggests that Tetrasphaera was likely not a significant contributor to biological phosphorus removal in our system. Competitors to PAO, known collectively as glycogen accumulating organisms (GAO), were also detected in the bioreactor, albeit at very low rDNA and rRNA abundances (Table A3). Known GAO observed were affiliated with the genus Defluviicoccus; no Competibacter-related GAO were detected. Defluviicoccus appeared to be relatively active (average SSU rRNA:rDNA ratio ~1). However, their relative rRNA abundance (proxy for protein synthesis potential) was still low (<0.1%), indicating they were likely not major competitors to PAO (Table A3).

Known nitrifying community members identified in the EBPR ecosystem included the ammonia-oxidizing bacteria (AOB) Nitrosomonas and Nitrosospira and the nitrite-oxidizing bacteria (NOB) Nitrospira and Candidatus ‘Nitrotoga’. Nitrosomonas and Nitrosospira, which are commonly cited as the main AOB in activated sludge (Zhang et al., 2011), were potentially active in the EBPR ecosystem (rRNA:rDNA ratio >2) despite belonging to the rare biosphere (Table A3). Nitrosococcus-related AOB, which have previously been observed in activated sludge (Juretschko et al., 1998), were also detected at low rDNA abundances and moderate SSU rRNA:rDNA ratios (Table A3). Similarly, ammonia-oxidizing archaea (AOA) affiliated with Thaumarchaeota and Marine Group 1 were identified in the rare EBPR biosphere, albeit at low rRNA and rDNA abundances (<0.01%). These groups have been detected in a wide variety of natural and engineered ecosystems, including activated sludge (Park et al., 2006; Stahl and Torre, 2012).

Members of the Nitrospira were abundant NOB in the bioreactor (average rDNA ~1%), consistent with previous reports (Luker et al., 2010). However, their SSU rRNA:rDNA ratio was ~0.74, on average. Indeed, this agrees with the presumed “K-strategist” lifestyle of Nitrospira, which suggests that these NOB possess a reduced maximum specific growth rate, but are well adapted to low nitrite concentrations allowing for their proliferation in EBPR ecosystems (Nogueira and Melo, 2006). In addition to Nitrospira, Candidatus ‘Nitrotoga’-related NOB were detected. Interestingly, these bacteria had the highest SSU rRNA:rDNA ratios of all nitrifying bacteria, which may reflect their high adaptability to cold-weather climates, such as those experienced in Canada (Alawi et al., 2007; Alawi et al., 2009) (Table A3).

Known denitrifying bacteria typically observed in EBPR ecosystems are affiliated with the families Rhodocyclaceae, Comamonadaceae, and Hyphomicrobiaceae (Kong et al., 2004; Osaka et al., 2006; Hesselsoe et al., 2009). Abundant denitrifiers identified in our system belonged to the Rhodocyclaceae family, including Dechloromonas, Thauera and to a lesser extent Zoogloea (Table A3). Dechloromonas consistently had the highest potential activity of all denitrifiers present in the EBPR ecosystem, averaging SSU rRNA:rDNA ratios >8 (Table A3), and has also been implicated in polyphosphate accumulation (Goel et al., 2005; Kong et al., 2007). Accumulibacter has also been shown to denitrify (Flowers et al., 2009) and was also potentially active as described above.

Eukaryotes affiliated with Peritrichia, Suctoria, and Aspidisca within the Ciliophora and Vannella within the Amoebozoa were observed to be the dominant protozoan OTUs in the EBPR ecosystem (Table A3). These groups had the highest SSU rRNA:rDNA ratios of all taxa in the bioreactor and are implicated in bacterial grazing (Moreno et al., 2010; Nielsen and Seviour, 2010). Other potentially active grazing populations included Euglenida and members of the phylum Cercozoa (both protozoa), as well as Rotifera-related taxa (Rotifers) (Table A3). As the number of rRNA operons and ribosomes per cell drastically differs between eukaryotes and prokaryotes, it is difficult to compare these results to prokaryotic rRNA levels directly (Gong et al., 2013). Nonetheless, the abundance of protozoan rRNA (>35%) in the EBPR ecosystem still suggests that bacterial grazing was likely significant and played a considerable role in controlling population dynamics and nutrient cycling.

In addition to well-characterized taxa, OTUs affiliated with Nannocystis within the Myxococcales (Myxobacteria) were also abundant and potentially active in the EBPR ecosystem. Myxobacteria are known for their ability to aggregate into fruiting bodies and to secrete a number of extracellular compounds including exoenzymes, antibiotics, and bioflocculants (Zhang et al., 2002; Velicer and Vos, 2009). While this suggests that Nannocystis-related taxa may actively contribute to floc formation in the bioreactor, further work is needed to characterize their phenotype in EBPR ecosystems.

It is important to note that activity differed among individual OTUs within the same genus for numerous core EBPR taxa (Figure 2.5). For example, OTU 54595 and OTU 37371 were both highly abundant taxa affiliated with Microthrix that exhibited markedly different activity profiles; OTU 54595 had high SSU rRNA:rDNA ratios on days 48 and 65, whereas OTU 37371 SSU rRNA:rDNA ratios were consistently low. Similar variability among OTUs of the same genus was found for most genera, including Accumulibacter, Dechloromonas, Streptococcus, Lactococcus, and Gordonia (Figure 2.5). This suggests that a high degree of microdiversity exists within EBPR ecosystems with potential implications for bioreactor performance. Interestingly, many OTUs across different genera appeared to share a sharp increase in activity on days 48 and 65 (Figure 2.5). However, the cause of this increased activity could not be explained by measured process parameters, suggesting that other ecological drivers were at play.


Figure 2.5 Activity profiles for selected genera. Profile of relative abundance of SSU rDNA and rRNA pyrotags for selected OTUs (97% similarity cutoff) affiliated with genera over 120-day study period. Solid line indicates SSU rDNA abundance; dashed line indicates SSU rRNA abundance. Profiles on right panel indicate corresponding SSU rRNA:rDNA ratio. OTUs marked with an asterisk (*) were affiliated with uncultured Propionivibrio bacteria.

2.4.5 Temporal dynamics of community structure and activity

Although the EBPR community maintained a core population over the 120-day period, temporal dynamics in community structure and activity were still observed. This dynamic behavior was explored by ordination analysis of microbial community structure and activity at each time point using NMDS. As shown in Figure 2.6A, the bioreactor manifested two related community structures during days 1 – 34 (Period I) and days 79 – 120 (Period III), separated by a transition period during days 48 – 65 (Period II). The latter part of this transition period coincided with a process upset in biological P removal performance (days 57 – 65), where removal efficiencies dropped from 100% to ~70% on average (Figure 2.7). Overall changes in potential activity differed from changes in microbial community structure, where activity appeared to be much more variable, likely reflecting the increased responsiveness of rRNA to EBPR ecosystem dynamics (Figure 2.6B) (Barnard et al., 2013).

[Figure 2.6 image: two NMDS ordination plots based on Bray-Curtis dissimilarity, with axes NMDS1 and NMDS2. Plot A, community structure (stress = 0.074); Plot B, community activity (stress = 0.098). Periods I, II, and III are labeled. Point labels: 1 = Day 1, 2 = Day 19, 3 = Day 34, 4 = Day 48, 5 = Day 65, 6 = Day 79, 7 = Day 98, 8 = Day 112; A = anaerobic, B = anoxic, C = aerobic.]

Figure 2.6 Bray-Curtis dissimilarities between samples (i.e. bioreactor communities) across all time points determined by non-metric multidimensional scaling. (A) NMDS unconstrained ordination of microbial community structure (SSU rDNA abundance) and (B) activity (SSU rRNA:rDNA ratio) for each time point and redox zone.

[Figure 2.7 image: UBC EBPR Pilot Plant PO4 concentrations, showing influent, effluent, and % removal series with Periods I, II, and III marked. Axes: PO4 (mg/L), % removal, and Days in Operation (Jan 31st to May 30th, 2013).]

Figure 2.7 UBC EBPR Pilot Plant phosphate (PO4) removal performance during the 120-day study period. Triangles represent influent PO4 concentrations, circles represent effluent PO4 concentrations, and diamonds represent PO4 percent removal efficiency. Red lines indicate sampling events.

To interpret changes in microbial structure and activity with measured process parameters, redundancy analysis (RDA) was conducted. Here, an attempt was made to explain the variance in microbial community structure and activity by linear combinations of several environmental variables, including influent SCFA/total phosphorus ratio, temperature, dissolved oxygen, and effluent PO4. However, RDA revealed no significant canonical axes (p > 0.05; Figure 2.8), based on permutation testing done using the anova.cca() function in the R package vegan. This indicates that variations along the canonical axes were not distinguishable from random chance alone (Legendre et al., 2011). It is possible that the lack of significant canonical axes was due to a limited number of time points (n=8) and a lack of seasonal resolution. A recent survey by Flowers et al. (2013) suggests that bacterial community structure in EBPR ecosystems can change with seasonal temperature variation. However, other studies have shown only weak correlations between microbial community structure and process parameters (Mielczarek et al., 2013).

Unexpectedly, the influent SCFA/total phosphorus ratio did not correlate well with effluent phosphate concentrations (Figure 2.8). Approximately 7 to 10 mg of acetate are needed to remove 1 mg of phosphorus, based on experimental evidence (Grady et al., 2011). Given this, SCFA availability was likely not limiting our ability to achieve low effluent phosphorus concentrations, as the loading of SCFAs to the anaerobic zone of our pilot-scale bioreactor was in excess of what is typically necessary for good biological phosphorus removal (ratio >20), due to the addition of external sodium acetate to the process.
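The stoichiometric argument above can be made concrete with a short calculation. The sketch below is an illustration under stated assumptions: the function names are hypothetical, the 4 mg/L phosphorus value in the usage note is an invented example rather than a measured value, and the comparison assumes the SCFA/P ratio is expressed on the same mass basis as the 7 to 10 mg acetate per mg phosphorus requirement.

```python
def acetate_demand_mg_per_l(phosphorus_mg_per_l, acetate_per_p=(7.0, 10.0)):
    """Range of acetate (mg/L) needed to remove a given phosphorus load (mg/L).

    acetate_per_p is the 7 to 10 mg acetate per mg phosphorus range
    quoted in the text (Grady et al., 2011).
    """
    low, high = acetate_per_p
    return phosphorus_mg_per_l * low, phosphorus_mg_per_l * high


def scfa_is_in_excess(scfa_to_p_ratio, required_ratio=10.0):
    """True when the supplied SCFA/P ratio exceeds the upper requirement.

    Assumes the ratio uses the same units as the requirement.
    """
    return scfa_to_p_ratio > required_ratio
```

For a hypothetical influent of 4 mg/L phosphorus, `acetate_demand_mg_per_l(4.0)` gives a demand of 28 to 40 mg/L acetate, whereas a ratio above 20 corresponds to more than 80 mg/L supplied, which is why SCFA limitation is considered unlikely here.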

A further attempt was made to resolve changes in community structure and activity using indicator analysis. OTUs were considered indicators of a particular time period if their relative frequency during that period was >50% compared to other time periods. We observed that abundant indicator OTUs (>1% rDNA) from Periods I and III were affiliated with Dechloromonas, Thauera, and Hydrogenophilaceae (Table A4). Dechloromonas-related OTUs appeared very active, although their potential activity levels were consistently high across all Periods. No indicator OTUs were observed during Period II. Consistent with RDA, indicator analysis revealed no abundant indicator OTUs associated with changes in microbial activity (Table A5).

[Figure 2.8 image: two RDA biplots (scaling 1) with samples labeled by day (Day 1 to Day 112) and environmental vectors T, DO, eff PO4, and SCFA/P. Biplot A, community abundance: horizontal axis RDA1 (17% explained, p=0.217). Biplot B, community activity: horizontal axis RDA1 (18% explained, p=0.204). Vertical axes: RDA2.]

Figure 2.8 RDA biplots constrained by four environmental parameters: temperature (T), dissolved oxygen (DO), effluent phosphate (eff PO4), and SCFA/total phosphorus ratio (SCFA/TP). Biplot A, no statistically significant axes were found: RDA1 (p=0.217), RDA2 (p=0.274). Biplot B, no statistically significant axes were found: RDA1 (p=0.204), RDA2 (p=0.290).

2.5 Discussion

2.5.1 Rare biosphere is active in EBPR ecosystems

In natural and engineered ecosystems much of the microbial diversity is associated with low abundance taxa (i.e. rare biosphere) (Sogin et al., 2006; Shade et al., 2012; Kim et al., 2013c). While the degree to which rare taxa contribute to ecosystem function remains poorly understood (Pedrós-Alió, 2012), recent studies suggest that the rare biosphere maintains a persistent microbial seed bank (Gibbons et al., 2013), containing both active and inactive members (Campbell et al., 2011). Our results indicate that even though 71% of the OTUs in the EBPR ecosystem maintained some activity, a large proportion were inactive or dormant (29%), possibly reflecting the large changes in nutrient and electron acceptor availability experienced across bioreactor redox zones. Cellular stress response associated with such changes is a well-documented control on dormancy in bacteria (Braeken et al., 2006; Lennon and Jones, 2011) and it is reasonable to expect that these stressors exist in EBPR ecosystems, given the rapid and constant exposure to alternating redox conditions and substrate availabilities. Other factors contributing to dormancy in natural ecosystems include environmental perturbation and predation (Lennon and Jones, 2011). As EBPR ecosystems are subject to fluctuating operating conditions, such as toxic shock loadings (Henriques and Love, 2007), and strong selective pressure from phages (Kunin et al., 2008; Albertsen et al., 2012), these factors likely contribute to dormant microbial seed bank maintenance in engineered ecosystems as well.

Active taxa have also been observed within the rare biosphere of natural ecosystems (Campbell et al., 2011; Hugoni et al., 2013; Wilhelm et al., 2014) and some studies show that rare taxa make important contributions to nutrient cycling (Neufeld et al., 2008; Pester et al., 2010). Our results indicated that many rare taxa in the EBPR ecosystem maintained high activity levels over the study period, based on SSU rRNA:rDNA ratios and increases in rRNA abundances over time. Particular examples were OTUs affiliated with fermenting bacteria Streptococcus and Lactococcus and nitrifying bacteria Nitrosomonas and Candidatus ‘Nitrotoga’. It should be noted that the number of rRNA operons and ribosomes per cell varies between taxonomic groups, especially between eukaryotes and prokaryotes, which could bias the correlation between rRNA abundance and microbial activity found here (Crosby and Criddle, 2003; Blazewicz et al., 2013; Gong et al., 2013). With this caveat in mind, our results remained consistent with a recent study that identified low-abundance bacteria affiliated with Streptococcus and Lactococcus as important glucose fermenters in EBPR ecosystems using stable-isotope probing combined with fluorescent in situ hybridization (Nielsen et al., 2012).

The occurrence of consistently rare and active taxa is interesting from a functional perspective because active taxa are expected to increase in abundance over time (Pedrós-Alió, 2012). It is possible that populations within these rare and active taxa did not become abundant due to high maintenance energy demands associated with fluctuating nutrient concentrations and redox conditions across the EBPR ecosystem. Such conditions have the potential to limit growth, particularly for taxa with thermodynamically constrained energy metabolisms, such as fermentation (Russell and Cook, 1995). Another possibility is that rare and active taxa do indeed grow, but experience prompt decay via viral lysis or protozoan grazing (i.e. “top-down” regulation). Similar controls have been invoked in marine ecosystems, where rare taxa exhibiting high activity levels have also been observed (Hunt et al., 2013). Given these constraints, our results indicate that the rare EBPR biosphere might act as a seed bank, but may also contribute to metabolic transformations within the EBPR ecosystem based on the high potential for protein synthesis observed among rare taxa.

2.5.2 Temporal activity patterns suggest high microdiversity within genera

Although it is commonly assumed that OTUs affiliated with the same genus in EBPR ecosystems share functional similarity (Zhang et al., 2012; Nielsen et al., 2012), we observed that potential activities between such OTUs varied dynamically over time. For example, two abundant OTUs affiliated with Microthrix had considerably different activity levels during the 120-day study period (Figure 2.5). While recent reports indicate that low genomic diversity exists among Microthrix strains (McIlroy et al., 2013), our results suggest that fine-scale differences in genome content or gene expression could select for differential OTU activity or function within the EBPR ecosystem. Indeed, fine-scale population differences have been observed among other phylogenetically cohesive units abundant in EBPR ecosystems, based on taxonomic (He et al., 2007; McMahon et al., 2007) and genomic comparisons (Flowers et al., 2013).

Further investigation into the activity profiles of individual OTUs revealed that some taxa shared a common spike in activity during days 48 and 65. This spike could not be explained by measured process parameters and could reflect increases in grazing activities or viral lysis. In freshwater ecosystems, increased predation by viruses and grazers has been shown to stimulate bacterial activity, likely through nutrient and substrate regeneration (Pradeep Ram and Sime-Ngando, 2008; Berdjeb et al., 2011). Interestingly, we observed a considerable increase in amoeboid protozoan activity (Vannellida-related) during days 48 and 65, consistent with increased predation potential (Figure 2.5). However, as ecological interactions between bacteria, viruses, and grazing populations are poorly understood in EBPR ecosystems, further work is needed to interpret the impact these relationships have on EBPR ecosystem function and performance.

2.5.3 Anticipatory life strategy for EBPR microbes?

Our results indicate that the relative abundance of rRNA for individual OTUs was similar across all bioreactor redox zones (anaerobic/anoxic/aerobic). This was surprising, as previous studies using EBPR communities enriched in Accumulibacter showed rRNA abundance changed with sequencing batch reactor conditions (He and McMahon, 2011a). Sequencing batch reactors provide perfect separation between redox zones, whereas the continuous flow process employed in this study constantly recycles biomass from downstream, such that there is substantial mixing between zones. Indeed, the apparent level of similarity in rRNA abundance observed in our study may be due to ribosome carry-over and/or mixing between redox zones and therefore not reflect true differences in rRNA synthesis. Similar results were found at the protein level, where synthesized proteins could not be assigned to a particular redox zone (Wilmes et al., 2008). In a follow-up study, Wexler et al. (2009) used radioactive protein labeling to confirm that differential protein synthesis occurred at very low levels. Based on this information, it is likely that some differential rRNA synthesis transpired, despite our inability to detect differences based on rRNA:rDNA ratios. However, we hypothesize that the majority of ribosome activity is controlled at the translational level, where ribosomes become active or inactive based on a given taxon’s metabolic status. Indeed, minimizing rRNA (and therefore ribosome) synthesis across the rapid and cyclical redox changes encountered in the EBPR milieu would have minimal energy costs compared to constant production of new ribosomes during each bioreactor pass. Nonetheless, future studies are needed to elucidate rRNA regulation processes in continuous flow EBPR ecosystems; process changes that should be more observable under perturbed conditions (He and McMahon, 2011b).

2.6 Concluding remarks

In summary, using a combination of SSU rDNA and rRNA sequencing we show that both rare and abundant taxa are potentially active in the EBPR ecosystem, and posit that rare taxa play important functional roles. Furthermore, we reveal that rRNA:rDNA ratios can vary dramatically among phylogenetically cohesive OTUs. This suggests that fine-scale population differences exist in EBPR ecosystems with the potential to impact process performance. Few changes in OTU rRNA abundance were detected across the EBPR redox zones consistent with “anticipatory” life strategies among EBPR microbiota, and observed temporal activity patterns could not be explained by measured process parameters. Taken together, these results point towards complex regulatory controls within the EBPR ecosystem that may be influenced by microniches (e.g. floc gradients) and localized spatial interactions (Shapiro and Polz, 2014). Given the abundance of potentially active eukaryotic OTUs identified in this study, these regulatory controls may include predation (e.g. grazers), in addition to other ecological drivers. It is important to note that rRNA abundances measured here reflect the potential for protein synthesis, including past, present, and future activities (Blazewicz et al., 2013). As such, additional studies combining labeling and incubation experiments with amplicon and plurality or single cell sequencing are needed to confirm whether rare taxa do indeed make meaningful contributions to nutrient cycling in EBPR ecosystems. When conducted with temporal resolution under different perturbation scenarios, such studies have potential to define ecological design principles needed to engineer improved nutrient cycling and population stability at EBPR wastewater treatment plants.

Chapter 3: Metagenomic analysis of a pilot-scale microbial community performing enhanced biological phosphorus removal

3.1 Synopsis

In this study, a metagenome was generated from biomass samples collected from a pilot-scale EBPR treatment plant using 454 pyrosequencing. Resulting DNA sequences were used to compare the diversity of microorganisms and metabolic processes present in the EBPR ecosystem to multiple metagenomes from different environments. These comparisons revealed that EBPR community function was enriched in biofilm formation, phosphorus metabolism, and aromatic compound degradation, reflective of local bioreactor conditions. Consistent with the taxonomic composition of the pilot-scale EBPR treatment plant obtained by SSU rRNA pyrotag sequencing (Chapter 2), population genomes extracted from assembled metagenomes were affiliated with Candidatus ‘Microthrix parvicella’ and Gordonia spp. These population genomes were subsequently compared with existing reference genomes to identify conserved and variable genomic regions. Although the M. parvicella population genome displayed remarkable similarity to other M. parvicella strains, functional differences in biofilm formation and antibiotic resistance, often associated with mobile genetic elements, reflected adaptation to habitat-specific selection pressures. This was further supported by the presence of prophage and phage defense mechanisms (EPS, restriction-modification systems, CRISPR) recovered from the metagenome. In the case of Gordonia spp., a potentially novel role in polyP and TAG cycling was identified. Taken together, our findings provide deeper insight into EBPR community function and enable future efforts aimed at monitoring spatiotemporal patterns in gene expression.

Sections of this Chapter will be submitted for publication and were presented at:

Lawson, C.E., Hall, E.R., Mavinic, D.S., Ramey, W.D., Hallam, S.J. (2014). Metagenomic analysis of a pilot-scale microbial community performing enhanced biological phosphorus removal. Proceedings of the 15th International Symposium on Microbial Ecology, Seoul, Korea, August 24-29, 2014.

3.2 Background

Elucidating the identity and function of microorganisms driving enhanced biological phosphorus removal (EBPR) has been a long-standing research objective since the inception of the technology over four decades ago (Mino et al., 1998; Seviour et al., 2003). It is now recognized that EBPR is carried out by a diverse and stable core microbial community that mediates distinct steps in bioreactor nutrient cycling (Nielsen et al., 2012; Section 1.5). While previous investigations have probed the ecophysiology of some core microorganisms, including Accumulibacter, Tetrasphaera, and Microthrix (Nielsen et al., 2002; Zilles et al., 2002; Kong et al., 2005, 2007), limited information exists on the metabolic pathways catalyzing key biochemical transformations within the EBPR milieu. Moreover, ecological considerations that influence the assembly, partitioning, and ultimate control of microorganisms in EBPR ecosystems, such as local selection pressures, remain unclear (McMahon and Read, 2013).

Recent advances in sequencing technology permit the study of microbial communities in unprecedented detail (Riesenfeld et al., 2004; Tringe et al., 2005; Garcia Martin et al., 2006). Indeed, this enables reconstruction of the metabolic blueprints underlying microbial controls on bioreactor performance (Tyson et al., 2004; Garcia Martin et al., 2006). To date, only a few studies exploring the metabolic potential of full-scale EBPR communities using metagenomic approaches have been published (Albertsen et al., 2012; Albertsen et al., 2013). Resulting datasets indicate that EBPR ecosystems manifest a high degree of microdiversity and significant selection pressure from phage (Kunin et al., 2008). While these studies have started to generate reference genomes for core EBPR microbes, including Microthrix (McIlroy et al., 2013) and Tetrasphaera (Kristiansen et al., 2012), further efforts are needed to accurately reconstruct metabolic pathways and assess ecological and evolutionary dynamics of core microbial players in the EBPR milieu (Albertsen et al., 2012).

In this study, we examined the structure and functional potential of a pilot-scale microbial community performing EBPR using metagenomic sequencing. Resulting environmental sequence information was used to compare metabolic potential between the EBPR pilot plant and other ecosystems and to investigate genomic variation between available reference genomes and binned population genomes. Our results indicate that the pilot-scale microbial community was enriched with genes for biofilm formation, fatty acid metabolism, and aromatic compound degradation, consistent with reports from other activated sludge ecosystems (Sanapareddy et al., 2009; Albertsen et al., 2012), and highlight that local selection pressures are likely responsible for genomic differentiation within microbial populations from disparate treatment plant locations.

3.3 Experimental procedures

3.3.1 Sampling

A total of 9 biomass samples were collected from the anaerobic, anoxic, and aerobic zones (1 per zone in triplicate) of the UBC EBPR Pilot Plant (lat. 49.245378, long. -123.22940). The plant uses a University of Cape Town (UCT) activated sludge configuration with a Zee-Weed® membrane system installed in the aerobic zone for solids-liquid separation and is designed for carbon oxidation, nitrification-denitrification, and EBPR (see Section 2.2.1). Samples were collected on March 5, 2013 from each bioreactor zone, immediately flash frozen in liquid nitrogen, and stored at -80 °C until further processing.

3.3.2 DNA extraction and sequencing

Total genomic DNA was extracted from biomass samples using the FastDNA® Spin Kit for Soil (MP Biomedicals, Solon, OH, USA). A 60 second bead-beating step with a spherical ceramic bead (lysing matrix E) at a speed setting of 4.0 on a FastPrep (MP Biomedicals) was used for sample lysis. Genomic DNA was quality checked using agarose gel electrophoresis. Library preparation and sequencing were performed at Genome Quebec (Montreal, Canada) on the Roche 454 GS FLX Titanium platform (454 Life Sciences, Branford, CT, USA) according to the manufacturer’s instructions. Resulting reads were de-replicated and filtered using a minimum length of 100 bp, and allowing for no ambiguous bases. Reads that did not meet minimum requirements were not used in downstream analysis.
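The read filtering criteria described above can be sketched as follows. This is an illustration only: it assumes reads are available as (identifier, sequence) pairs, treats de-replication as removal of exact duplicate sequences, and uses a hypothetical function name; the actual tool and de-replication rule used for this dataset are not specified here.

```python
def filter_reads(reads, min_length=100):
    """Keep unique reads of at least min_length bp that contain only A, C, G, or T.

    reads: iterable of (read_id, sequence) pairs.
    De-replication is assumed to mean dropping exact duplicate sequences.
    """
    seen = set()
    kept = []
    for read_id, seq in reads:
        seq = seq.upper()
        if len(seq) < min_length:
            continue  # too short
        if set(seq) - set("ACGT"):
            continue  # contains an ambiguous base such as N
        if seq in seen:
            continue  # exact duplicate of a read already kept
        seen.add(seq)
        kept.append((read_id, seq))
    return kept
```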

3.3.3 Metagenomic assembly and binning

Filtered reads were subsequently assembled into contigs using the software package Newbler with overlap parameters of 95% minimum identity and a minimum length of 40 bp (Margulies et al., 2005). Resulting contigs were binned into population genomes using the software MaxBin 1.3, developed at the Joint BioEnergy Institute, with default parameters (Wu et al., 2014). MaxBin is based on an expectation-maximization algorithm and provides genome-related statistics, including estimated completeness, GC content, and genome size (Wu et al., 2014). The taxonomy of resulting population genome bins was assigned using the Metagenome Analyzer (MEGAN) (Huson et al., 2011).

3.3.4 Gene annotation and pathway analysis

Gene annotation and metabolic pathway prediction were accomplished using the in-house MetaPathways 2.0 pipeline (Konwar et al., 2013; Hanson et al., 2014). Briefly, open reading frames (ORFs) were predicted using the Prokaryotic Dynamic Programming Genefinding Algorithm (Prodigal) and queried against the Kyoto Encyclopedia of Genes and Genomes (KEGG), SEED subsystems, Clusters of Orthologous Groups of proteins (COG), RefSeq, and MetaCyc protein databases using the optimized LAST algorithm for functional annotation. Taxonomic annotation of predicted ORFs was accomplished using MEGAN; nucleotide sequences were also queried against the SILVA database to identify SSU rRNA genes. Environmental Pathway/Genome Databases (ePGDB) were subsequently reconstructed from annotated ORFs using Pathway Tools (Karp et al., 2010), which predicts metabolic pathways from MetaCyc: a highly curated database of 2,151 pathways and 14,084 reactions representing all domains of life. Pathway inference was based on reaction coverage of at least 50 percent in a particular pathway and/or the presence of all “key reactions” (Karp et al., 2011).
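The pathway inference rule stated above can be illustrated with a short sketch. It is a simplification and not the Pathway Tools algorithm itself: the function name and inputs are hypothetical, and the "and/or" in the rule is interpreted here as a logical OR.

```python
def pathway_inferred(pathway_reactions, key_reactions, observed_reactions,
                     min_coverage=0.5):
    """Return True if a pathway would be called present under the stated rule.

    pathway_reactions: all reaction IDs belonging to the pathway.
    key_reactions: subset designated as key reactions (may be empty).
    observed_reactions: reaction IDs supported by annotated ORFs.
    """
    pathway_reactions = set(pathway_reactions)
    key_reactions = set(key_reactions)
    observed = set(observed_reactions)
    if not pathway_reactions:
        return False
    coverage = len(pathway_reactions & observed) / len(pathway_reactions)
    # Assumption: a pathway without designated key reactions relies on coverage alone.
    has_all_keys = bool(key_reactions) and key_reactions <= observed
    return coverage >= min_coverage or has_all_keys
```

For example, a four-reaction pathway with two annotated reactions meets the 50 percent coverage criterion, whereas a pathway with one of four reactions annotated is only called present if that reaction is its sole key reaction.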

3.3.5 Genome comparisons

Comparison of binned population genomes with isolate genomes was accomplished using the protein Basic Local Alignment Search Tool (BLASTP). Sequences for complete isolate genomes were downloaded from the National Center for Biotechnology Information (NCBI) website (www.ncbi.nlm.nih.gov) and ORFs for both the binned population and isolate genomes were predicted using Prodigal (Hyatt et al., 2010). Predicted ORFs from population and isolate genomes were compared using BLASTP and the percent amino acid similarity between best reciprocal BLASTP hits was plotted to identify regions of low similarity. Low amino acid similarity regions that were flanked by mobile genetic element (MGE) signatures (e.g. transposases or integrases) and had variable guanine-cytosine (GC) content were considered putative genomic islands. ORFs were queried against the NCBI RefSeq or non-redundant (nr) database using BLASTP for annotation.
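A minimal sketch of the best reciprocal hit logic is given below. It assumes the BLASTP results from each search direction have already been parsed into (query, subject, similarity, e-value, bit score) tuples, and it selects the best hit per query by bit score; both the input layout and the tie-breaking choice are assumptions made for illustration. The default e-value cutoff of 1e-6 follows the value reported in Section 3.4.4.

```python
def best_hits(hits, max_evalue=1e-6):
    """Return {query: (subject, similarity, bitscore)} keeping the top-scoring hit."""
    best = {}
    for query, subject, similarity, evalue, bitscore in hits:
        if evalue > max_evalue:
            continue  # hit does not pass the e-value cutoff
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, similarity, bitscore)
    return best


def reciprocal_best_hits(forward_hits, reverse_hits, max_evalue=1e-6):
    """Pair ORFs that are each other's best hit in both search directions."""
    forward = best_hits(forward_hits, max_evalue)
    reverse = best_hits(reverse_hits, max_evalue)
    pairs = {}
    for query, (subject, similarity, _) in forward.items():
        if subject in reverse and reverse[subject][0] == query:
            pairs[query] = (subject, similarity)
    return pairs
```

Population ORFs without a reciprocal partner in an isolate genome are the candidates for the low-similarity regions described above.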

3.3.6 Prophage and CRISPR reconstruction

The Phage Search Tool (PHAST) was used to identify, annotate, and graphically display prophage sequences recovered from the EBPR metagenome based on data clustering algorithms (Zhou et al., 2011). PHAST was also used to evaluate the completeness of putative phage. Clustered regularly interspaced short palindromic repeats (CRISPR) were identified and reconstructed from raw metagenomic reads using Crass: the CRISPR assembly tool (Skennerton et al., 2013). Crass uses short- and long-read algorithms to scan unassembled metagenomic reads to identify and cluster CRISPR loci. Subsequently, graphical methods were used to reconstruct variable spacer arrangements that catalogue the historical interactions between host (i.e. microbe) and virus (Skennerton et al., 2013).

3.4 Results and discussion

3.4.1 Sequencing statistics

Pyrosequencing of 9 activated sludge samples collected from the anaerobic, anoxic, and aerobic zones of the UBC EBPR Pilot Plant resulted in 1,208,421 reads after quality filtering with an average read length of 712 bp (Table 3.1). Raw reads were subsequently assembled into contigs using Newbler. Approximately 37% of raw reads assembled into contigs over 300 bp with an average N50 contig size of 4,990 bp and a maximum contig length of 205,236 bp (Table 3.1). This represents 318 Mb of non-redundant nucleotides, encompassing the equivalent of approximately 70-80 full bacterial genomes across the three redox zones.

Table 3.1 Metagenome assembly and sequencing statistics

Number of samples          9
Sequenced reads            1,208,421
Sequenced bases            860,108,034
Avg. read quality          33
Avg. read length (bp)      712 ± 26 (s.d.)
Contigs                    15,137
Reads assembled            446,957
N50 (bp)                   4,990
Max. contig length (bp)    205,236

ORF: open reading frame. N50: length of smallest contig in set that contains the fewest (largest) contigs whose combined length represents ≥ 50% of the assembly.
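The N50 definition given in the Table 3.1 footnote can be expressed as a short calculation. The sketch below is illustrative (the function name is hypothetical) and operates on a plain list of contig lengths rather than on the assembler output.

```python
def n50(contig_lengths):
    """Length of the smallest contig in the smallest set of largest contigs
    whose combined length is at least 50% of the total assembly length."""
    lengths = sorted(contig_lengths, reverse=True)
    half_total = sum(lengths) / 2
    running = 0
    for length in lengths:
        running += length
        if running >= half_total:
            return length
    return 0  # empty assembly
```

For contigs of 2, 3, 4, 5, 6, 7, 8, 9, and 10 bp (54 bp in total), the three largest contigs sum to 27 bp, so the N50 is 8 bp.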

3.4.2 Community structure: comparison of pyrotag and metagenomic results

Overall, the taxonomic composition of the EBPR microbial community based on metagenomic methods was comparable to results obtained from pyrotag sequencing (Chapter 2), although quantitative differences were observed among some phyla (Table 3.2). In particular, Bacteroidetes displayed considerable quantitative differences between methods; SSU rDNA pyrotag abundances affiliated with Bacteroidetes accounted for 25% of the community, whereas ORFs assigned to Bacteroidetes accounted for 6.5% of the community. Further examination revealed that these differences were largely within the orders Flavobacteriales and Sphingobacteriales (especially the family Saprospiraceae) (Figure 3.1). It is possible that the lower Bacteroidetes abundances based on ORF counts reflect the limited number of sequenced representatives present in available databases. Indeed, a lack of sequenced reference genomes strongly biases microbial community structure when employing metagenomic methods due to incorrect annotation of metagenomic reads (Albertsen et al., 2013). Such biases likely occurred in our dataset, as many SSU rDNA sequences were affiliated with uncultured bacteria. Additional caveats associated with pyrotag community analysis may have influenced our results. These include differences in rrn operon copy number between phylogenetic groups and PCR primer biases (Crosby and Criddle, 2003; Pinto and Raskin, 2012). PCR primer biases arise because of poor complexing between primer and template and/or poor primer extension, resulting in non-uniform amplification of SSU rRNA amplicons across taxa (Pinto and Raskin, 2012).

Table 3.2 Comparison of community structure based on pyrotag and metagenomic methods

Phyla/Class            ORF count   Metagenome   Pyrotags
Actinobacteria           397,139        47.8%      32.0%
Bacteroidetes             53,895         6.5%      25.0%
Betaproteobacteria       131,894        15.9%      14.8%
Deltaproteobacteria       39,232         4.7%       3.8%
Gammaproteobacteria       40,651         4.9%       2.7%
Alphaproteobacteria       79,151         9.5%       5.1%
Firmicutes                18,020         2.2%       2.8%
Cyanobacteria              9,639         1.2%       0.2%
Verrucomicrobia           15,329         1.8%       2.6%
Planctomycetes            17,392         2.1%       3.5%
Acidobacteria              3,437         0.4%       0.7%
Chloroflexi                3,322         0.4%       1.9%
Euryarchaeota              1,932         0.2%       0.4%
Nitrospirae                7,875         0.9%       0.8%
Other                     11,625         1.4%       3.7%
Total                    830,533       100.0%     100.0%
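The metagenome percentages in Table 3.2 follow directly from the ORF counts. As a quick check, the sketch below recomputes them; the counts are copied from the table, while the function and variable names are illustrative.

```python
def relative_abundance(counts):
    """Convert raw ORF counts per taxon into percentages of the total count."""
    total = sum(counts.values())
    return {taxon: 100.0 * n / total for taxon, n in counts.items()}


# ORF counts per phylum/class as listed in Table 3.2.
orf_counts = {
    "Actinobacteria": 397139, "Bacteroidetes": 53895,
    "Betaproteobacteria": 131894, "Deltaproteobacteria": 39232,
    "Gammaproteobacteria": 40651, "Alphaproteobacteria": 79151,
    "Firmicutes": 18020, "Cyanobacteria": 9639,
    "Verrucomicrobia": 15329, "Planctomycetes": 17392,
    "Acidobacteria": 3437, "Chloroflexi": 3322,
    "Euryarchaeota": 1932, "Nitrospirae": 7875, "Other": 11625,
}
percentages = relative_abundance(orf_counts)
```

Rounded to one decimal place, this gives 47.8% for Actinobacteria and 6.5% for Bacteroidetes, matching the tabulated values.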

[Figure: Comparison of pyrotag and metagenomic results. Taxa grouped by phylum/class; x-axis: % relative abundance; series: metagenome, pyrotags.]

Figure 3.1 Comparison of taxonomic composition using pyrotag and metagenomic methods. Pyrotag abundance based on SSU rDNA abundance; metagenome abundance based on ORF counts.

3.4.3 Microbial community metabolism

The overall functional potential of the UBC EBPR metagenome was compared to 45 microbial metagenomes collected from nine distinct biomes (Dinsdale et al., 2008), as well as two previously published wastewater treatment plant metagenomes: the Aalborg East (AAE) EBPR metagenome (Albertsen et al., 2012) and the Mallard Creek activated sludge (Non-EBPR) metagenome. Comparisons were made based on the percentage of ORFs assigned to the SEED subsystems (Figure 3.2). Average values for the 45 microbial metagenomes were adopted from Sanapareddy et al. (2009). No photosynthesis was predicted across all WWTPs, consistent with previous observations (Sanapareddy et al., 2009; Albertsen et al., 2012). ORFs involved in capsular and exopolysaccharide biosynthesis were also more enriched in the WWTP metagenomes, possibly due to the high selection pressure favoring floc formation in activated sludge systems (Figure 3.2; Table 3.3). As expected, the EBPR metagenomes had a larger fraction of phosphorus metabolism compared with the Non-EBPR metagenome. Additionally, the fraction of ORFs assigned to aromatic compound degradation was greater in the UBC EBPR and Non-EBPR metagenomes. This may be attributed to these facilities receiving higher industrial wastewater loadings (Sanapareddy et al., 2009), although aromatic compounds were not directly measured in this study.
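The comparison metric plotted in Figure 3.2 (percent difference relative to the 45-metagenome average) can be sketched as below. The formula is an assumption based on the figure axis label, and the function name is hypothetical; the inputs are the percentage of ORFs assigned to a SEED subsystem in a WWTP metagenome and in the reference average.

```python
def percent_difference(wwtp_percent, reference_percent):
    """Percent difference of a WWTP subsystem fraction relative to the reference
    average (assumed form: (wwtp - reference) / reference * 100)."""
    if reference_percent == 0:
        raise ValueError("reference fraction must be non-zero")
    return (wwtp_percent - reference_percent) / reference_percent * 100.0
```

Under this form, a subsystem with no assigned ORFs in a WWTP metagenome (as for photosynthesis) yields -100%, and a subsystem at twice the reference fraction yields +100%.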

To further examine microbial community metabolism in the UBC EBPR Pilot Plant, population genome bins were extracted from metagenomic contigs using MaxBin (Wu et al., 2014), producing four partial genomes (Table 3.4). Here, partial genomes were considered to be representative of a population, as metagenomic assembly algorithms cannot discriminate single-nucleotide polymorphisms (Sharon and Banfield, 2013). While metagenomic assembly algorithms confound analysis of fine-scale heterogeneity, generation of genomes from metagenomes does provide sufficient resolution to study population metabolic potential, diversity, and evolutionary dynamics (Sharon and Banfield, 2013). The two most complete population genomes were taxonomically affiliated with Candidatus ‘Microthrix parvicella’ (Bin 1) and Gordonia spp. (Bin 2). Genome completeness was estimated at 92.5% and 74.8%, respectively, based on the presence of 107 essential single-copy marker genes (Wu et al., 2014) (Table 3.4; Table B1). The M. parvicella genome contained 99 unique marker genes, indicating it was nearly complete (Table B1). A total of 4 out of 99 unique markers were found in duplicate, which may represent some contamination in the population bin or indicate that M. parvicella carries multiple copies of these genes.

[Figure: SEED subsystem comparison of wastewater treatment plant (WWTP) metagenomes. y-axis: % difference compared to 45 microbial metagenomes; series: UBC EBPR, Non-EBPR, AAE EBPR.]

Figure 3.2 SEED subsystem comparison of microbial metagenomes with the UBC EBPR Pilot Plant metagenome. SEED subsystems contain 23 functional categories of microbial metabolism. Each WWTP metagenome was compared to average values for 45 microbial metagenomes adapted from Sanapareddy et al. (2009).

Other population bins either had low genome completeness or were taxonomically ambiguous and were not included in downstream analysis.

Table 3.3 ORFs assigned to capsular and exopolysaccharides metabolism

SEED capsular and exopolysaccharides categories                               UBC EBPR   AAE EBPR   Non-EBPR
Alginate metabolism                                                                857        192        184
Capsular heptose biosynthesis                                                      665        159        125
Capsular polysaccharide (CPS) of Campylobacter                                       5          2          6
Colanic acid biosynthesis                                                          300        105         99
dTDP-rhamnose synthesis                                                           1253        428        340
Gram-negative cell wall components                                                2557        728        739
O-Methyl phosphoramidate capsule modification in Campylobacter                     102          9         10
Peptidoglycan biosynthesis                                                        4982       1244       1301
Pseudaminic acid biosynthesis                                                       14          3          4
Rhamnose containing glycans                                                       2192        678        554
Serotype determining capsular polysaccharide biosynthesis in Staphylococcus          9          0          3
Sialic acid metabolism                                                            1032        265        303
Xanthan exopolysaccharide biosynthesis and export                                   21          3          6

Table 3.4 Metagenomic contig binning statistics

Bin      Abundance   Completeness (%)a   Genome size (bp)   GC content (%)
Bin001       16.47                92.5          4,165,338               66
Bin002        4.43                74.8          4,447,347               67
Bin003        3.72                67.3          4,139,412               63
Bin004        2.53                11.2          3,619,713               71

a Completeness based on percentage of 107 marker genes identified (Table B1; Dupont et al., 2012)
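The completeness estimate for Bin001 can be reproduced from the marker gene counts reported in Section 3.4.3 (99 unique markers out of 107, with 4 found in duplicate). The sketch below shows the arithmetic; the function names are illustrative and the duplicate fraction is only a rough indicator, not a formal contamination estimate.

```python
def completeness(unique_markers_found, total_markers=107):
    """Percent of the single-copy marker gene set detected in a genome bin."""
    return 100.0 * unique_markers_found / total_markers


def duplicated_fraction(duplicated_markers, unique_markers_found):
    """Percent of detected markers present in more than one copy."""
    return 100.0 * duplicated_markers / unique_markers_found
```

With 99 of 107 markers, completeness evaluates to approximately 92.5%, and 4 duplicated markers out of 99 corresponds to roughly 4% of detected markers.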

3.4.4 Comparison of population genomes to existing reference genomes

Candidatus ‘Microthrix parvicella’ population genome

To elucidate local selection pressures driving genomic differentiation events in the EBPR ecosystem, we compared functional differences between the M. parvicella population genome recovered here with previously published M. parvicella genomes isolated from activated sludge: M. parvicella strain Bio17-1 (Muller et al., 2012) and M. parvicella strain RN1 (McIlroy et al., 2013). Genomic differences were considered to be associated with habitat-specific gene pools shaped by local bioreactor conditions (Polz et al., 2013; Cordero and Polz, 2014). Overall, the three M. parvicella-related genomes displayed high sequence homology; strain Bio17-1 and strain RN1 shared 89 and 87 percent amino acid similarity with the M. parvicella population genome, respectively, based on best reciprocal BLASTP matches using an e-value cutoff of 1e-6. Indeed, this agrees with recent population comparisons that highlighted the low genomic diversity between M. parvicella RN1 and related EBPR community strains from full-scale treatment plants in Denmark (McIlroy et al., 2013).

To verify whether the M. parvicella RN1 metabolic model (McIlroy et al., 2013) could be extended to related strains from the UBC EBPR Pilot Plant, metabolic pathways were compared across all available M. parvicella genomes using MetaPathways (Konwar et al., 2013; Hanson et al., 2014). From Figure 3.3, it is clear that the majority of metabolic pathways are indeed conserved across M. parvicella strains, including fatty acid β-oxidation and biosynthesis, triacylglycerol (TAG) biosynthesis and degradation, the pentose phosphate pathway, and the Embden-Meyerhof-Parnas (EMP) glycolysis pathway. Genes responsible for polyphosphate (polyP) storage were also identified, including a single polyphosphate kinase (ppk) (Table 3.5). Despite the remarkable metabolic similarities between M. parvicella strains, minor differences in pathway composition were observed, particularly in strain RN1. Pathways missing from strain RN1 included cell capsule and exopolysaccharide biosynthesis; for example, dTDP-L-rhamnose biosynthesis (Tsukioka et al., 1997; Graninger et al., 2002) and UDP-D-xylose biosynthesis (Coyne et al., 2011; Gu et al., 2011).

Figure 3.3 M. parvicella pathway comparison. Columns represent M. parvicella strains; rows represent inferred MetaCyc pathways. Branches indicate clustering of M. parvicella strains (columns) and pathways (rows) based on Manhattan method. Pathways discussed in main text are listed on the right.
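As an illustration of the distance measure named in the Figure 3.3 caption, the sketch below computes pairwise Manhattan distances between pathway presence/absence profiles. The profiles shown are hypothetical placeholders rather than the inferred MetaCyc pathway calls, and the linkage method used to build the dendrograms is not addressed.

```python
def manhattan_distance(profile_a, profile_b):
    """Sum of absolute differences between two equal-length pathway profiles."""
    if len(profile_a) != len(profile_b):
        raise ValueError("profiles must cover the same pathways")
    return sum(abs(a - b) for a, b in zip(profile_a, profile_b))


def pairwise_distances(profiles):
    """Return {(name_a, name_b): distance} for every unordered pair of strains."""
    names = sorted(profiles)
    return {(a, b): manhattan_distance(profiles[a], profiles[b])
            for i, a in enumerate(names) for b in names[i + 1:]}


# Hypothetical presence (1) / absence (0) calls for five pathways per genome.
profiles = {
    "UBC": [1, 1, 1, 1, 0],
    "Bio17-1": [1, 1, 1, 1, 1],
    "RN1": [1, 1, 0, 0, 1],
}
distances = pairwise_distances(profiles)
```

For binary profiles the Manhattan distance is simply the number of pathways called differently between two genomes, so smaller values indicate more similar pathway complements.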

Predicted ORFs for these pathways formed syntenic regions conserved across the M. parvicella population genome and strain Bio17-1, but not strain RN1 (Figure B1). Structural variations in exopolysaccharides and/or cell capsule residues are known bacteriophage resistance mechanisms that prevent phage adsorption through physical barriers (i.e. EPS layer) or modifications to cell surface receptors (Hanlon et al., 2001; Labrie et al., 2010). Given that some phage carry substrate-specific polysaccharide-degrading enzymes, including polysaccharases and lyases (Sutherland, 1995), these structural differences may be important bacterial adaptive features for evading phage predation in EBPR ecosystems, as recently proposed by others (Kunin et al., 2008; Albertsen et al., 2012; McIlroy et al., 2013).

To further assess functional differentiation between M. parvicella-related strains, fine-scale variations in genome content were examined. Our results identified 21 loci encompassing 274 ORFs in the M. parvicella population genome that were missing in strain Bio17-1 and/or strain RN1 (Figure 3.4; Table B2). Indeed, these ORFs formed discrete gene clusters, often located on putative genomic islands, suggesting they were acquired together and encode some adaptive function (Figure 3.4). Consistent with pathway-level comparisons, annotation of the gene clusters revealed genes encoding cell envelope and exopolysaccharide biosynthesis enzymes. Other annotated gene clusters were associated with heavy metal/antibiotic resistance and antibiotic biosynthesis, as well as restriction-modification systems (possible phage defense).

The presence of toxic compound and antibiotic resistance genes on mobile genetic elements agrees with previous studies that examined plasmids from activated sludge microbial communities (Szczepanowski et al., 2008; Zhang et al., 2011; Sentchilo et al., 2013). Acquisition of such resistance mechanisms is likely essential for survival in EBPR ecosystems that receive high loadings of toxic chemicals and antibiotics in wastewater influent, and also contributes to the global dispersal of antibiotic resistant bacteria (Czekalski et al., 2014).
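One simple way to group population-genome ORFs that lack a counterpart in an isolate genome into discrete loci is sketched below. It assumes ORFs are supplied in genome order together with the set of ORFs that have a best reciprocal BLASTP match, and it uses a hypothetical minimum run length; it does not implement the MGE-signature or GC-content criteria used to designate putative genomic islands.

```python
def missing_loci(ordered_orfs, shared_orfs, min_orfs=2):
    """Group consecutive ORFs without an isolate match into candidate loci.

    ordered_orfs: ORF identifiers in their order along the population genome.
    shared_orfs: ORFs with a best reciprocal hit in the isolate genome.
    min_orfs: hypothetical minimum number of adjacent ORFs to report a locus.
    """
    shared = set(shared_orfs)
    loci, current = [], []
    for orf in ordered_orfs:
        if orf in shared:
            if len(current) >= min_orfs:
                loci.append(current)
            current = []  # a shared ORF ends the current run
        else:
            current.append(orf)
    if len(current) >= min_orfs:
        loci.append(current)  # run extending to the end of the sequence
    return loci
```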

Table 3.5 Polyphosphate metabolism, M. parvicella

Gene   Protein                                                EC No.     M. parvicella UBC   M. parvicella RN1a   M. parvicella Bio17-1
ppk1   polyphosphate kinase 1                                 2.7.4.1    1                   1                    1
ppx    Exopolyphosphatase                                     3.6.1.11   -                   2                    -
adk    Adenylate kinase                                       2.7.4.3    1                   1                    1
pap    polyP:AMP phosphotransferase                           2.7.4.1    -                   -                    -
ppgK   polyphosphate glucokinase                              2.7.1.63   1                   1                    1
ppnK   polyphosphate/ATP NAD kinase                           2.7.1.23   -                   1                    -
phoB   Pho regulon, DNA-binding response regulator            -          2                   2                    2
phoR   Pho regulon, sensory histidine kinase                  2.7.13.3   -                   1                    -
phoH   Pho regulon, phosphate starvation-inducible protein    -          1                   1                    1
phoU   Chaperone-like PhoR/PhoB inhibitory protein            -          -                   1                    -
pstA   Pi ABC transporter - membrane subunit                  -          -                   1                    -
pstB   Pi ABC transporter - ATP binding subunit               3.6.3.27   1                   1                    1
pstC   Pi ABC transporter - membrane subunit                  -          -                   1                    -
pstS   Pi ABC transporter - periplasmic binding protein       -          -                   1                    -
pitA   low-affinity Pi transport protein                      -          -                   -                    -
ppa    Inorganic pyrophosphatase                              3.6.1.1    1                   1                    1
alp    secreted alkaline phosphatase                          3.1.3.1    -                   -                    -

a Values revised based on McIlroy et al. (2013).


Figure 3.4 Fine-scale comparison of M. parvicella genomes. X-axis indicates position along the M. parvicella population genome; y-axis indicates the percent amino acid similarity of isolate genomes (M. parvicella RN1 and Bio17-1) with the population, as determined by best reciprocal BLASTP hits. Purple vertical bars indicate putative genomic islands. Arrows indicate predicted ORFs.
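The best reciprocal hit criterion used for this comparison can be illustrated with a minimal sketch. It assumes BLASTP results have already been reduced to (query, subject, bit score) tuples; the ORF identifiers, scores, and function names below are hypothetical and are not taken from the thesis data or pipeline.

```python
from typing import Dict, Iterable, Tuple

Hit = Tuple[str, str, float]  # (query_id, subject_id, bit_score)


def best_hits(hits: Iterable[Hit]) -> Dict[str, Tuple[str, float]]:
    """Keep the highest-scoring subject for each query."""
    best: Dict[str, Tuple[str, float]] = {}
    for query, subject, score in hits:
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return best


def reciprocal_best_hits(a_vs_b: Iterable[Hit], b_vs_a: Iterable[Hit]) -> Dict[str, str]:
    """Return {orf_in_A: orf_in_B} for pairs that are each other's best hit."""
    best_ab = best_hits(a_vs_b)
    best_ba = best_hits(b_vs_a)
    pairs: Dict[str, str] = {}
    for orf_a, (orf_b, _) in best_ab.items():
        if orf_b in best_ba and best_ba[orf_b][0] == orf_a:
            pairs[orf_a] = orf_b
    return pairs


# Hypothetical toy hits (assumed identifiers and scores, for illustration only).
population_vs_isolate = [
    ("pop_001", "iso_A", 250.0),
    ("pop_001", "iso_B", 90.0),
    ("pop_002", "iso_B", 120.0),
    ("pop_003", "iso_A", 80.0),
]
isolate_vs_population = [
    ("iso_A", "pop_001", 248.0),
    ("iso_A", "pop_003", 79.0),
    ("iso_B", "pop_002", 118.0),
]

rbh = reciprocal_best_hits(population_vs_isolate, isolate_vs_population)
# Population ORFs without a reciprocal partner are candidates for variable regions.
unmatched = sorted({q for q, _, _ in population_vs_isolate} - set(rbh))
```

In this toy case pop_003 has no reciprocal partner, which is the kind of ORF that would appear as a gap in a plot such as Figure 3.4.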

Gordonia spp. population genome

Metabolic pathways and variable genomic regions were also examined in the Gordonia spp. population genome in comparison to 8 closely related Gordonia spp. isolates from different habitats. Our results show that the functional potential of Gordonia spp. from the UBC EBPR Pilot Plant was most similar to that of G. amarae (Figure 3.5), consistent with previous studies that isolated Gordonia spp. from activated sludge systems (Soddell et al., 1998; Soddell et al., 2006). Core metabolic pathways inferred across Gordonia spp. were similar to those of M. parvicella strains, including fatty-acid β-oxidation and biosynthesis, TAG biosynthesis and degradation, the pentose phosphate pathway, and the Embden-Meyerhof-Parnas glycolysis pathway. The glyoxylate cycle was also inferred for most Gordonia spp., which could play a role in generating essential carbon precursors for biosynthesis under anaerobic conditions (Burrows et al., 2008b).

The overrepresentation of lipid and fatty acid metabolism is consistent with isolate experiments indicating their use as growth substrates (Soddell et al., 1998) and supports previous observations correlating Gordonia spp. abundance with lipid loading rates in full-scale wastewater treatment plants (Frigon et al., 2006). While this implies that Gordonia spp. in EBPR ecosystems may also be specialized in lipid degradation, previous ecophysiological studies showed that acetate and glucose, not lipids, were the main substrates utilized by Gordonia spp. in situ, indicating that substrate preferences between Gordonia spp. may vary considerably (Carr et al., 2006; Kragelund et al., 2007). Most Gordonia spp., including the population genome binned here, were also capable of nitrate reduction, based on the presence of a respiratory nitrate reductase (nar) gene and nitric-oxide reductase (norQ) gene. This implies that anaerobic respiration for growth under anoxic conditions is possible.

Surprisingly, no Gordonia spp. examined here, including the binned population genome, encoded the potential for polyhydroxyalkanoate (PHA) storage, based on the absence of the key enzyme PHA synthase (phaC). This contradicts previous ecophysiological studies suggesting that PHA was accumulated by Gordonia spp. in activated sludge (Nielsen et al., 2009). Recently, phaC was also found to be missing from the M. parvicella genome (McIlroy et al., 2013). The authors suggested that the Nile blue A staining methods used to identify PHA storage may have incorrectly identified TAGs as PHA, given that these methods stain both PHA and lipids (Serafim et al., 2002). Alternatively, a novel multifunctional fusion gene or putative hydrolase may have substituted for the missing PHA synthase normally encoded by phaC (McIlroy et al., 2013). Another possibility is that the reaction catalyzed by PHA depolymerase (detected in Gordonia spp.) is reversible, although no evidence of such activity has been reported in the literature. As such, the ability of Gordonia spp. to accumulate PHA granules in EBPR ecosystems requires further confirmation.

Genes encoding the enzymes necessary for polyP storage were also present across Gordonia spp. (Table 3.6). All Gordonia spp. genomes encoded a single ppk gene and multiple copies of the polyP-hydrolyzing enzyme exopolyphosphatase (ppx) (Rao et al., 2009). Regeneration of ATP from polyP requires polyphosphate:AMP phosphotransferase (PAP) activity (Rao et al., 2009).


Figure 3.5 Gordonia spp. pathway comparison. Columns represent Gordonia spp. strains; rows represent inferred MetaCyc pathways. Branches indicate clustering of Gordonia spp. strains (columns) and pathways (rows) based on the Manhattan method. Pathways discussed in the main text are listed on the right. G. polyisoprenivorans, Gpol; G. terrae, Gter; G. rhizosphera, Grhi; G. soli, Gsol; Gordonia spp. UBC, Gubc; G. amarae, Gama; G. paraffinivorans, Gpar; G. KTR9, Gktr; G. aichiensis, Gaic.
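As a minimal sketch of the distance measure behind this kind of clustering, the Manhattan distance between two pathway profiles is the sum of absolute differences across pathways. The example below only computes pairwise distances and a nearest neighbour; it does not reproduce the hierarchical clustering itself. The strain labels follow Figure 3.5, but the presence/absence values are invented for illustration and the function names are assumptions.

```python
from typing import Dict, List


def manhattan(u: List[int], v: List[int]) -> int:
    """Sum of absolute differences between two equal-length profiles."""
    if len(u) != len(v):
        raise ValueError("profiles must have the same length")
    return sum(abs(a - b) for a, b in zip(u, v))


def nearest_neighbour(target: str, profiles: Dict[str, List[int]]) -> str:
    """Name of the profile with the smallest Manhattan distance to the target."""
    others = (name for name in profiles if name != target)
    return min(others, key=lambda name: manhattan(profiles[target], profiles[name]))


# Hypothetical presence (1) / absence (0) calls for five pathways.
# These values are NOT taken from Figure 3.5.
profiles = {
    "Gubc": [1, 1, 0, 1, 0],
    "Gama": [1, 1, 0, 1, 1],
    "Gter": [0, 1, 1, 0, 1],
    "Gsol": [0, 0, 1, 0, 1],
}

closest = nearest_neighbour("Gubc", profiles)
```

With these invented profiles, Gubc differs from Gama in a single pathway, so Gama is returned as its nearest neighbour.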

While Accumulibacter and Tetrasphaera PAO encode PAP (Garcia Martin et al., 2006; Kristiansen et al., 2012), it was not identified in the Gordonia spp. genomes examined here. However, polyphosphate kinase (ppk) and adenylate kinase (adk), both of which were identified in Gordonia spp. (Table 3.6), are reported to form a complex exhibiting some PAP activity (Ishige and Noguchi, 2000). This suggests that Gordonia spp. in the EBPR ecosystem have the potential to regenerate ATP from polyP. Both the specific phosphate transport (pst) system and the inorganic phosphate transport (Pit) system were also identified in most Gordonia spp. (Table 3.6). In Accumulibacter, the Pit system is believed to be essential for generating the proton motive force under anaerobic conditions through export of Pi in symport with protons (Saunders et al., 2007; Burow et al., 2008). The Pit system is missing in glycogen accumulating organisms (McIlroy et al., 2014; Nobu et al., 2014), as well as M. parvicella (McIlroy et al., 2013), leading some researchers to hypothesize that it is required for polyP cycling in EBPR. However, while Gordonia spp. have been observed to store polyP granules (Wong et al., 2005; Beer et al., 2006), their ability to continuously cycle polyP across anaerobic-aerobic conditions remains unknown.

Analysis of the variable regions across the Gordonia spp. genomes identified 39 gene clusters containing 316 ORFs (Table B3). While assigning biological roles to some of these clusters was difficult due to the large number of hypothetical proteins, many appeared to be involved in aromatic compound degradation and fatty acid metabolism. This metabolic variability suggests that substrate uptake patterns likely differ among Gordonia spp., consistent with in situ observations from activated sludge populations (Carr et al., 2006; Kragelund et al., 2007). Other genes commonly identified within the variable regions encoded PIN-domain toxin-antitoxin proteins.

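Table 3.6 Polyphosphate metabolism, Gordonia spp.

| Gene | EC No.* | Gordonia spp. UBC | G. aichiensis | G. amarae | G. KTR9 | G. paraffinivorans | G. polyisoprenivorans | G. rhizosphera | G. soli | G. terrae |
|------|---------|-------------------|---------------|-----------|---------|--------------------|-----------------------|----------------|---------|-----------|
| ppk1 | 2.7.4.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| ppx | 3.6.1.11 | 3 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 |
| adk | 2.7.4.3 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pap | 2.7.4.1 | - | - | - | - | - | - | - | - | - |
| ppgK | 2.7.1.63 | - | - | - | - | - | - | - | - | - |
| ppnK | 2.7.1.23 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| phoB | - | 2 | 1 | 2 | 1 | 0 | 0 | 0 | 0 | 2 |
| phoR | 2.7.13.3 | - | - | - | - | - | - | - | - | - |
| phoH | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| phoU | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pstA | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pstB | 3.6.3.27 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pstC | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pstS | - | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| pitA | - | 1 | 2 | 1 | 1 | 2 | 1 | 0 | 1 | 2 |
| ppa | 3.6.1.1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 | 1 |
| alp | 3.1.3.1 | 1 | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 1 |

*Protein names are provided in Table 3.5.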

While the precise role of these proteins is unclear, previous reports suggest that resident toxin-antitoxin operons originally acquired through mobile gene pools may be associated with retardation of cell growth and persistence in stressful environments (Arcus et al., 2005; Gerdes et al., 2005). This agrees with the low activity levels often observed among Gordonia spp. in activated sludge (de los Reyes and Raskin, 2002; Nielsen et al., 2008); however, much still remains to be learned about the ecological role of toxin-antitoxin systems in bacteria (Arcus et al., 2005).

3.4.5 EBPR ecosystem bacteria-phage interactions

Given the prevalence of phage defense mechanisms in the EBPR community, attempts were made to identify and annotate prophage sequences in the metagenome using the Phage Search Tool (PHAST) (Zhou et al., 2011). PHAST identified 5 partial prophage regions in the metagenome, 3 of which were affiliated with the M. parvicella population genome (Figure 3.6; Table B4). Overall, phage recovered from the metagenome appeared to be novel, as no consistent classification of viral coding sequences could be assigned. The phage regions encoded mainly structural components, replication machinery, and hypothetical proteins. Interestingly, one prophage encoded the terminal quinol oxidase cytochrome bd, which facilitates microaerophilic respiration and nitric oxide resistance in bacteria (Mason et al., 2009). Such enzymes have the potential to confer a fitness advantage on bacterial hosts under the microaerophilic conditions present in EBPR ecosystems. Comparable potential has previously been reported in marine ecosystems, where viral metabolic reprogramming of host carbon flux towards energy production and viral genome replication was proposed under sunlit and dark ocean conditions (Hurwitz et al., 2013). Nevertheless, our interpretation remains largely speculative and requires further investigation.

To further explore bacteria-phage interactions in the EBPR ecosystem, clustered regularly interspaced short palindromic repeats (CRISPR) were identified and reconstructed from raw metagenomic reads using Crass (Skennerton et al., 2013). CRISPR are an adaptive prokaryotic immune system found in half of all sequenced bacterial and archaeal genomes. They recognize and cleave foreign DNA entering the cell using unique “spacer” sequences (Barrangou et al., 2007; Horvath and Barrangou, 2010). Spacer sequences (i.e. short pieces of excised foreign DNA) are incorporated into the host genome between direct repeat clusters, providing a catalogue of the dynamic and rapidly evolving interactions between host and virus (Horvath and Barrangou, 2010).

A total of 40 unique CRISPR were identified in the EBPR metagenome, each differentiated by a specific direct repeat sequence and multiple spacers (Figure 3.7). Average direct repeat and spacer lengths were 35 bp and 33 bp, respectively, although some variation in sequence length was observed (Table B5). While most spacers did not map to the assembled metagenomic contigs, the two largest spacer-repeat arrays matched contigs in the M. parvicella population genome (Figure 3.8). Here, some spacers had greater abundance within a specific CRISPR locus, indicating that some phage attacks were more widespread among the population than others. Moreover, many CRISPR formed a large number of unconnected spacer arrangements, suggesting that discrete CRISPR loci originated from different strains in a population (Skennerton et al., 2013). Attempts were also made to map spacer sequences to phage regions reconstructed from the metagenome; however, no matches were found. This implies that reconstructed prophage regions may not represent the most recent lytic events that occurred in the pilot-scale EBPR ecosystem.
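The spacer-to-contig comparison described above can be sketched as follows. This is an illustrative simplification, not the procedure used with Crass: it assumes a repeat-spacer array is available as a single string, uses exact substring matching on both strands (real analyses usually tolerate mismatches through alignment), and uses short invented sequences and hypothetical function names.

```python
from typing import Dict, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Reverse complement of a DNA string containing only A, C, G, T."""
    return seq.translate(_COMPLEMENT)[::-1]


def split_array(array: str, repeat: str) -> List[str]:
    """Return the spacers found between consecutive copies of the direct repeat."""
    pieces = array.split(repeat)
    # The first and last pieces are flanking sequence, not spacers.
    return [p for p in pieces[1:-1] if p]


def map_spacers(spacers: List[str], contigs: Dict[str, str]) -> Dict[str, List[Tuple[str, str]]]:
    """Exact-match each spacer against every contig on both strands."""
    hits: Dict[str, List[Tuple[str, str]]] = {}
    for spacer in spacers:
        rc = reverse_complement(spacer)
        found = []
        for name, seq in contigs.items():
            if spacer in seq:
                found.append((name, "+"))
            elif rc in seq:
                found.append((name, "-"))
        hits[spacer] = found
    return hits


# Toy data (assumed): a 5 bp repeat and 6 bp spacers, far shorter than the
# ~35 bp repeats and ~33 bp spacers reported in the text.
repeat = "GTTCC"
array = "AAA" + repeat + "ACGTAC" + repeat + "TTGACA" + repeat + "CCC"
contigs = {"contig_1": "GGGTGTCAAGGG", "contig_2": "CCCCCCCC"}

spacers = split_array(array, repeat)
mean_spacer_length = sum(len(s) for s in spacers) / len(spacers)
hits = map_spacers(spacers, contigs)
```

Here the second spacer matches contig_1 only on the reverse strand, which is why both orientations are checked; a spacer with an empty hit list corresponds to the "no match" outcome reported for the reconstructed prophage regions.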


Figure 3.6 Prophage coding sequence regions reconstructed from the metagenome. A detailed description of each region can be found in Table B4. Each row represents one partial prophage. *Prophage found in M. parvicella UBC population genome.

[Figure 3.7: bar chart of spacer counts. X-axis: Direct repeat ID; y-axis: Number of spacers (0 to 40).]

Figure 3.7 Total spacer count from EBPR metagenome. X-axis indicates direct repeat ID arbitrarily assigned by Crass (Skennerton et al., 2013); y-axis indicates number of spacers associated with each direct repeat sequence.

[Figure 3.8 legend: spacer count, colour gradient from 1 to 5.]

Figure 3.8 CRISPR spacer-repeat loci (region G4) in M. parvicella population genome. Arrows indicate direct repeats; circles indicate spacers; diamonds indicate flanking sequences. Spacer abundance indicated by colour gradient.

3.5 Concluding remarks

In summary, our results show that metagenomic and pyrotag sequencing approaches provide comparable estimates of EBPR community structure; however, biases may exist due to the underrepresentation of some taxa in available sequence databases (e.g. RefSeq) and/or because of non-uniform amplification of SSU rRNA amplicons using PCR methods. Additionally, we show that EBPR microbial communities are enriched in cell capsule and exopolysaccharide biosynthesis, consistent with local selection pressures favoring floc formation. Other enriched functions were associated with phosphorus metabolism and aromatic compound degradation, possibly reflecting microbial adaptation to local bioreactor conditions and influent composition, respectively.

Recovery of population genomes from metagenomic contigs revealed that M. parvicella strains from different geographical locations manifest remarkable genomic similarity, in agreement with previous reports (McIlroy et al., 2013). While this suggests that the proposed M. parvicella metabolic model can be broadly applied across EBPR ecosystems, fine-scale genomic differences relating to EPS formation and toxic compound/antibiotic resistance indicate that further work is needed to understand the eco-evolutionary dynamics, such as viral lysis or predation, that tune M. parvicella population structure. Novel metabolic insights into Gordonia spp. in the EBPR ecosystem suggest a potential role in polyP cycling that should be further explored. In addition, the presence of phage and phage defense mechanisms (EPS, restriction-modification systems, CRISPR) highlights the need to further elucidate the role that viruses potentially play in modulating microbial community dynamics in EBPR ecosystems. Taken together, our findings provide insight into EBPR community function and enable future efforts aimed at monitoring spatiotemporal patterns in gene expression using metatranscriptomics to elucidate regulatory controls on community metabolism.

Chapter 4: Conclusions and future directions

Enhanced biological phosphorus removal (EBPR) is an environmental biotechnology of global importance, essential for protecting receiving waters from eutrophication and enabling phosphorus recovery (Nielsen et al., 2012). Current understanding of EBPR technology is largely based on empirical evidence and black-box models that fail to appreciate the intricate microbial community interactions responsible for nutrient cycling and ultimate phosphorus removal. This empirical approach has limited further development of EBPR technology, which can experience unpredictable process failures and struggles to meet increasingly stringent effluent regulations as a stand-alone process. Accordingly, for EBPR to realize its full potential as an efficient and reliable environmental biotechnology, greater understanding of the microbial ecology of these engineered ecosystems is needed. Insights into the structure and function of microbial communities performing EBPR are starting to emerge, as reviewed in Chapter 1, including the description of a core EBPR microbiome (Nielsen et al., 2010; 2012). This thesis aimed to build on previous efforts by exploring the temporal and spatial activity dynamics of the core EBPR microbiome using 454-pyrotag sequencing (Chapter 2) and by examining the metabolic potential of a pilot-scale EBPR community through metagenomic approaches (Chapter 3). Recent ecological theory was incorporated into the interpretation of the research findings, such that rules governing the assembly and control of microbial communities can be harnessed for engineering purposes.

4.1 Conclusions, limitations, and future directions

The findings presented in Chapter 2 represent the first high-throughput examination of microbial activity dynamics in an EBPR ecosystem. Using a combination of SSU rDNA and rRNA sequencing, our investigation expanded the current knowledge of active microbial players participating in EBPR, and revealed that rare (i.e. very low abundance) microorganisms have the potential to contribute to nutrient cycling and process stability in these engineered ecosystems. At present, rare microorganisms are generally ignored in engineered ecosystems such as EBPR, an oversight that stems from the assumption that low abundance equates to limited functional importance. Our results further illustrate that microbial activity can be highly dynamic, even among phylogenetically cohesive units. This suggests that fine-scale population differences exist in EBPR ecosystems with potential to impact process performance. We also revealed that rRNA abundance for individual taxa remained constant across bioreactor redox zones (anaerobic/anoxic/aerobic) in the continuous flow process employed in this study, indicating that EBPR communities are attuned to changing bioreactor redox conditions and nutrient concentrations (Section 2.5.3).
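To make the activity measure concrete, the comparison of SSU rRNA to SSU rRNA gene (rDNA) abundance for each OTU can be sketched as below. This is a minimal illustration under stated assumptions: the ratio is taken between relative abundances in the two libraries, OTUs absent from the rDNA library are skipped, and the OTU names, read counts, and function names are hypothetical rather than taken from Chapter 2.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, int]) -> Dict[str, float]:
    """Convert raw read counts per OTU into fractions of the library total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reads in sample")
    return {otu: n / total for otu, n in counts.items()}


def rrna_to_rdna_ratio(rrna_counts: Dict[str, int], rdna_counts: Dict[str, int]) -> Dict[str, float]:
    """Per-OTU ratio of rRNA to rDNA relative abundance.

    OTUs with no rDNA reads are skipped because the ratio is undefined.
    """
    rrna = relative_abundance(rrna_counts)
    rdna = relative_abundance(rdna_counts)
    return {otu: rrna.get(otu, 0.0) / frac for otu, frac in rdna.items() if frac > 0}


# Hypothetical read counts: OTU_3 is rare in the rDNA library but
# well represented in the rRNA library.
rdna_counts = {"OTU_1": 900, "OTU_2": 90, "OTU_3": 10}
rrna_counts = {"OTU_1": 500, "OTU_2": 100, "OTU_3": 400}

ratios = rrna_to_rdna_ratio(rrna_counts, rdna_counts)
most_active = max(ratios, key=ratios.get)
```

In this toy example the rarest OTU has the highest ratio, which mirrors the pattern of rare but potentially active taxa described above. As noted in the text, such a ratio reflects protein synthesis potential rather than measured activity.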

It is important to note that rRNA abundances measured in this study reflect the potential for protein synthesis, including past, present, and future activities (Blazewicz et al., 2013). As such, additional studies combining labeling and incubation experiments with RNA sequencing are needed to confirm whether rare taxa do indeed make bona fide contributions to nutrient cycling in EBPR ecosystems. These studies could be complemented by metatranscriptomic approaches that measure total RNA expression to better understand microbial community responses to bioreactor dynamics. Such responses should also be tested under different perturbation scenarios in order to elucidate the environmental cues controlling population dynamics and gene expression in EBPR ecosystems. Specific perturbations of interest include sudden dilution of nutrients (C, N, and Pi), seasonal and weekly nutrient changes, changes in wastewater temperature, and the recycle of nitrates to the anaerobic zone.

The metabolic potential of a pilot-scale EBPR microbial community was examined in Chapter 3 using 454 pyrosequencing. Here, major functions enriched in the EBPR community were related to extracellular polymeric substances (EPS), phosphorus metabolism, and aromatic compound degradation, likely reflecting microbial adaptation to the incoming wastewater composition (i.e. substrate) and local bioreactor selection pressures (Sanapareddy et al., 2009; Albertsen et al., 2012). This was further explored by comparative analysis of population genomes binned from the assembled metagenome. Here, a population genome for Candidatus ‘Microthrix parvicella’ showed remarkable similarity to previously sequenced strains sourced from disparate EBPR ecosystems, indicating that existing M. parvicella metabolic models (McIlroy et al., 2013) may be broadly applicable. Fine-scale genomic comparisons also revealed that differentiation between Microthrix strains was related to bacteriophage and toxin/antibiotic resistance, highlighting that local selection pressures from phage and toxic compounds likely contributed to community dynamics. This was further supported by the presence of prophage and phage defense mechanisms (EPS, restriction-modification systems, CRISPR) recovered from the metagenome. Comparative analysis of a Gordonia spp. population genome revealed that EBPR populations encode the metabolic potential for polyP cycling, a previously unrecognized finding. This supports the notion that PAO may consist of several diverse phylogenetic groups that vary among different treatment systems (Mino et al., 1998; McMahon et al., 2010).

The metagenomic work presented here provides much needed insight into EBPR community function; however, several constraints need to be considered in future studies. For example, low sequencing depth in this study prevented examination of the rare biosphere, which was shown in Chapter 2 to potentially play key roles in EBPR ecosystems. As such, future investigations should combine 454 pyrosequencing, which offers the longer reads (~700 bp) needed for assembly, with “deep” sequencing approaches such as the Illumina HiSeq or MiSeq platforms, which generate orders of magnitude more short-read paired-end data. In addition to sequencing depth, a lack of indigenous reference genomes limited the accuracy of binning and gene annotation (Chapter 3). This can be partially overcome by single-cell genomic approaches that enable whole genome amplification and sequencing of individual microorganisms (Rinke et al., 2013).

Bibliography

Alawi, M., Lipski, A., Sanders, T., and Spieck, E. (2007) Cultivation of a novel cold-adapted nitrite oxidizing betaproteobacterium from the Siberian Arctic. ISME J. 1: 256–264.

Alawi, M., Off, S., Kaya, M., and Spieck, E. (2009) Temperature influences the population structure of nitrite-oxidizing bacteria in activated sludge. Environ. Microbiol. Rep. 1: 184–90.

Albertsen, M., Hansen, L.B.S., Saunders, A.M., Nielsen, P.H., and Nielsen, K.L. (2012) A metagenome of a full-scale microbial community carrying out enhanced biological phosphorus removal. ISME J. 6: 1094–106.

Albertsen, M., Hugenholtz, P., Skarshewski, A., Nielsen, K.L., Tyson, G.W., and Nielsen, P.H. (2013) Genome sequences of rare, uncultured bacteria obtained by differential coverage binning of multiple metagenomes. Nat. Biotechnol. 31: 533–8.

Albertsen, M., Saunders, A.M., Nielsen, K.L., and Nielsen, P.H. (2013) Metagenomes obtained by “deep sequencing” - what do they tell about the enhanced biological phosphorus removal communities? Wat. Sci. Tech. 68: 1959–68.

Allers, E., Wright, J.J., Konwar, K.M., Howes, C.G., Beneze, E., Hallam, S.J., and Sullivan, M.B. (2012) Diversity and population structure of Marine Group A bacteria in the Northeast subarctic Pacific Ocean. ISME J. 1–13.

Altschul, S.F., Gish, W., Miller, W., Myers, E.W., and Lipman, D.J. (1990) Basic Local Alignment Search Tool. J. Mol. Biol. 215: 403–410.

APHA (2005) Standard Methods for the Examination of Water and Waste Water. American Public Health Association, American Water Works Association and Water Environment Federation, Washington, DC.

Amann, R., Lemmer, H., and Wagner, M. (1998) Monitoring the community structure of wastewater treatment plants: a comparison of old and new techniques. FEMS Microbiol. Ecol. 25: 205–215.

Andreasen, K. and Nielsen, P.H. (2000) Growth of Microthrix parvicella in Nutrient Removal Activated Sludge Plants: Studies of in situ Physiology. Wat. Res. 34:

Arcus, V.L., Rainey, P.B., and Turner, S.J. (2005) The PIN-domain toxin-antitoxin array in mycobacteria. Trends Microbiol. 13: 360–5.

Ayarza, J.M. and Erijman, L. (2011) Balance of neutral and deterministic components in the dynamics of activated sludge floc assembly. Microb. Ecol. 61: 486–95.

Ayarza, J.M., Guerrero, L.D., and Erijman, L. (2010) Nonrandom assembly of bacterial populations in activated sludge flocs. Microb. Ecol. 59: 436–44.

Barnard, J.L. (1976) A Review of Biological Phosphorus Removal in the Activated Sludge Process. Water SA 2: 136–144.

Barnard, J.L. (1998) The Development of Nutrient-Removal Processes (Abridged). J. CIWEM 12: 330–337.

Barnard, R.L., Osborne, C.A., and Firestone, M.K. (2013) Responses of soil bacterial and fungal communities to extreme desiccation and rewetting. ISME J. 7: 2229–41.

Barrangou, R., Fremaux, C., Deveau, H., Richards, M., Boyaval, P., Moineau, S., et al. (2007) CRISPR Provides Acquired Resistance Against Viruses in Prokaryotes. Science 315: 1709–1712.

Beer, M., Stratton, H.M., Griffiths, P.C., and Seviour, R.J. (2006) Which are the polyphosphate accumulating organisms in full-scale activated sludge enhanced biological phosphate removal systems in Australia? J. Appl. Microbiol. 100: 233–43.

Berdjeb, L., Pollet, T., Domaizon, I., and Jacquet, S. (2011) Effect of grazers and viruses on bacterial community structure and production in two contrasting trophic lakes. BMC Microbiol. 11: 88.

Blackall, L.L., Stratton, H., Bradford, D., Dot, T.D., Sjörup, C., Seviour, E.M., and Seviour, R.J. (1996) “Candidatus Microthrix parvicella”, a filamentous bacterium from activated sludge sewage treatment plants. Int. J. Syst. Bacteriol. 46: 344–6.

Blazewicz, S.J., Barnard, R.L., Daly, R.A., and Firestone, M.K. (2013) Evaluating rRNA as an indicator of microbial activity in environmental communities: limitations and uses. ISME J. 1–8.

Bock, E. and Wagner, M. (2006) The Prokaryotes. In, Dworkin,M., Falkow,S., Rosenberg,E., Schleifer,K.-H., and Stackebrandt,E. (eds). Springer, Berlin, Heidelberg, pp. 457–495.

Braeken, K., Moris, M., Daniels, R., Vanderleyden, J., and Michiels, J. (2006) New horizons for (p)ppGpp in bacterial and plant physiology. Trends Microbiol. 14: 45–54.

Braker, G. and Fesefeldt, A. (1998) Development of PCR Primer Systems for Amplification of Nitrite Reductase Genes (nirK and nirS) To Detect Denitrifying Bacteria in Environmental Samples. Appl. Environ. Microbiol. 64: 3769–3775.

Britton, A., Koch, F.A., Mavinic, D.S., Adnan, A., Oldham, W.K., and Udala, B. (2005) Pilot-scale struvite recovery from anaerobic digester supernatant at an enhanced biological phosphorus removal wastewater treatment plant. J. Environ. Eng. Sci. 4: 265–277.

Burow, L.C., Mabbett, A.N., McEwan, A.G., Bond, P.L., and Blackall, L.L. (2008) Bioenergetic models for acetate and phosphate transport in bacteria important in enhanced biological phosphorus removal. Environ. Microbiol. 10: 87–98.

Campbell, B.J. and Kirchman, D.L. (2013) Bacterial diversity, community structure and potential growth rates along an estuarine salinity gradient. ISME J. 7: 210–20.

Campbell, B.J., Yu, L., Heidelberg, J.F., and Kirchman, D.L. (2011) Activity of abundant and rare bacteria in a coastal ocean. PNAS 108: 12776–12781.

Caporaso, J.G., Paszkiewicz, K., Field, D., Knight, R., and Gilbert, J.A. (2012) The Western English Channel contains a persistent microbial seed bank. ISME J. 6: 1089–93.

Cech, J.S. and Hartman, P. (1993) Competition between polyphosphate and polysaccharide accumulating bacteria in enhanced biological phosphate removal systems. Wat. Res. 27: 1219–1225.

Cech, J.S., Hartman, P., and Macek, M. (1994) Bacteria and protozoa population dynamics in biological phosphate removal systems. Wat. Sci. Tech. 29: 109–117.

Coats, E.R., Watkins, D.L., and Kranenburg, D. (2011) A Comparative Environmental Life-Cycle Analysis for Removing Phosphorus from Wastewater: Biological versus Physical/Chemical Processes. Water Environ. Res. 83: 750–760.

Comeau, Y., Hall, K.J., Hancock, R.E.W., and Oldham, W.K. (1986) Biochemical model for enhanced biological phosphorus removal. Water Res. 20: 1511–1521.

Confer, D.R. and Logan, B.E. (1998) Location of Protein and Polysaccharide Hydrolytic Activity in Suspended and Biofilm Wastewater Cultures. Water Res. 32: 31–38.

Cordero, O.X. and Polz, M.F. (2014) Explaining microbial genomic diversity in light of evolutionary ecology. Nat. Rev. Microbiol. 12: 263–73.

Coyne, M.J., Fletcher, C.M., Reinap, B., and Comstock, L.E. (2011) UDP-Glucuronic Acid Decarboxylases of Bacteroides fragilis and Their Prevalence in Bacteria. J. Bacteriol. 193: 5252–5259.

Crocetti, G.R., Banfield, J.F., Keller, J., Bond, P.L., and Blackall, L.L. (2002) Glycogen-accumulating organisms in laboratory-scale and full-scale wastewater treatment processes. Microbiology 148: 3353–64.

Crosby, L.D. and Criddle, C.S. (2003) Understanding bias in microbial community analysis techniques due to rrn operon copy number heterogeneity. Biotechniques 34: 790–4, 796, 798 passim.

Czekalski, N., Gascón Díez, E., and Bürgmann, H. (2014) Wastewater as a point source of antibiotic-resistance genes in the sediment of a freshwater lake. ISME J. 8: 1381–90.

Daims, H. and Wagner, M. (2010) The microbiology of nitrogen removal. In Microbial Ecology of Activated Sludge (eds. Seviour, R.J., and Nielsen, P.H.). IWA, London, United Kingdom.

De los Reyes, F.L. and Raskin, L. (2002) Role of filamentous microorganisms in activated sludge foaming: relationship of mycolata levels to foaming initiation and stability. Water Res. 36: 445–59.

Dinsdale, E.A., Edwards, R.A., Hall, D., Angly, F., Breitbart, M., Brulc, J.M., et al. (2008) Functional metagenomic profiling of nine biomes. Nature 452: 629–32.

Dolinšek, J., Lagkouvardos, I., Wanek, W., Wagner, M., and Daims, H. (2013) Interactions of nitrifying bacteria and heterotrophs: identification of a Micavibrio-like putative predator of Nitrospira spp. Appl. Environ. Microbiol. 79: 2027–37.

Dueholm, T.E., Andreasen, K.H., and Nielsen, P.H. (2001) Transformation of lipids in activated sludge. Water Sci. Technol. 43: 165–72.

Dunfield, P.F., Tamas, I., Lee, K.C., Morgan, X.C., McDonald, I.R., and Stott, M.B. (2012) Electing a candidate: a speculative history of the bacterial phylum OP10. Environ. Microbiol. 14: 3069–80.

Eikelboom, D. (2000) Process control of activated sludge plants. IWA, London, United Kingdom.

Engelbrektson, A., Kunin, V., Wrighton, K.C., Zvenigorodsky, N., Chen, F., Ochman, H., and Hugenholtz, P. (2010) Experimental factors affecting PCR-based estimates of microbial species richness and evenness. ISME J. 4: 642–647.

Eschenhagen, M., Schuppler, M., and Röske, I. (2003) Molecular characterization of the microbial community structure in two activated sludge systems for the advanced treatment of domestic effluents. Water Res. 37: 3224–32.

Elser, J. and Bennett, E. (2011) Phosphorus cycle: A broken biogeochemical cycle. Nature 478: 29–31.

Evans, T.N. and Seviour, R.J. (2012) Estimating biodiversity of fungi in activated sludge communities using culture-independent methods. Microb. Ecol. 63: 773–86.

Flowers, J.J., Cadkin, T.A., and McMahon, K.D. (2013) Seasonal bacterial community dynamics in a full-scale enhanced biological phosphorus removal plant. Water Res. 47: 7019–31.

Flowers, J.J., He, S., Malfatti, S., Del Rio, T.G., Tringe, S.G., Hugenholtz, P., and McMahon, K.D. (2013) Comparative genomics of two “Candidatus Accumulibacter” clades performing biological phosphorus removal. ISME J. 1–14.

Flowers, J.J., He, S., Yilmaz, S., Noguera, D.R., and McMahon, K.D. (2009) Denitrification capabilities of two biological phosphorus removal sludges dominated by different “Candidatus Accumulibacter” clades. Environ. Microbiol. Rep. 1: 583–588.

Follows, M.J. and Dutkiewicz, S. (2011) Modeling Diverse Communities of Marine Microbes. Ann. Rev. Mar. Sci. 3: 427–451.

Forde, A. and Fitzgerald, G.F. (2003) Molecular organization of exopolysaccharide (EPS) encoding genes on the lactococcal bacteriophage adsorption blocking plasmid, pCI658. Plasmid 49: 130–142.

Fredriksson, N.J., Hermansson, M., and Wilén, B.-M. (2012) Diversity and dynamics of Archaea in an activated sludge wastewater treatment plant. BMC Microbiol. 12: 140.

Frigon, D., Guthrie, R.M., Bachman, G.T., Royer, J., Bailey, B., and Raskin, L. (2006) Long-term analysis of a full-scale activated sludge wastewater treatment system exhibiting seasonal biological foaming. Water Res. 40: 990–1008.

Fuhs, G.W. and Chen, M. (1975) Microbiological basis of phosphate removal in the activated sludge process for the treatment of wastewater. Microb. Ecol. 2: 119–38.

Galand, P.E., Casamayor, E.O., Kirchman, D.L., and Lovejoy, C. (2009) Ecology of the rare microbial biosphere of the Arctic Ocean. PNAS 106: 22427–32.

García Martín, H., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C., et al. (2006) Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge communities. Nat. Biotechnol. 24: 1263–9.

Van der Gast, C.J., Ager, D., and Lilley, A.K. (2008) Temporal scaling of bacterial taxa is influenced by both stochastic and deterministic ecological factors. Environ. Microbiol. 10: 1411–8.

81 Gerdes, K., Christensen, S.K., and Løbner-Olesen, A. (2005) Prokaryotic toxin-antitoxin stress response loci. Nat. Rev. Microbiol. 3: 371–82.

Gibbons, S.M., Caporaso, J.G., Pirrung, M., Field, D., Knight, R., and Gilbert, J. a. (2013) Evidence for a persistent microbial seed bank throughout the global ocean. PNAS 110: 4651–4655.

Ginige, M.P., Keller, J., and Blackall, L.L. (2005) Investigation of an Acetate-Fed Denitrifying Microbial Community by Stable Isotope Probing, Full-Cycle rRNA Analysis, and Fluorescent In Situ Hybridization-Microautoradiography. Appl. Environ. Microbiol. 71: 8683–8691.

Goel R.K., Sanhueza P., Noguera D.R. (2005). Evidence of Dechloromonas Sp. Participating in Enhanced Biological Phosphorus Removal (EBPR) in a Bench-Scale Aerated Anoxic Reactor. Water Environment Federation 78th Annual Technical Exhibition and Conference, Water Environment Federation, Washington DC, 3864-3871.

Gong, J., Dong, J., Liu, X., and Massana, R. (2013) Extremely high copy numbers and polymorphisms of the rDNA operon estimated from single cell analysis of oligotrich and peritrich ciliates. 164: 369–79.

Grady, C., Daigger, G., Love, N., and Filipe, C. (2011) Biological Wastewater Treatment. CRC Press, Boca Raton, Florida.

Graninger, M., Kneidinger, B., Bruno, K., Scheberl, A., and Messner, P. (2002) Homologs of the Rml Enzymes from Salmonella enterica Are Responsible for dTDP-β-L-Rhamnose Biosynthesis in the Gram-Positive Thermophile Aneurinibacillus thermoaerophilus DSM 10155. Appl. Environ. Microbiol. 68: 3708–3715.

Gray, N.D., Miskin, I.P., Kornilova, O., Curtis, T.P., and Head, I.M. (2002) Occurrence and activity of Archaea in aerated activated sludge wastewater treatment plants. Environ. Microbiol. 4: 158–68.

Gu, A.Z., Saunders, A., Neethling, J.B., Stensel, H.D., and Blackall, L.L. (2008) Functionally Relevant Microorganisms to Enhanced Biological Phosphorus Removal Performance at Full-Scale Wastewater Treatment Plants in the United States. Water Environ. Res. 80: 688–698.

Gu, X., Lee, S.G., and Bar-Peled, M. (2010) Biosynthesis of UDP-xylose and UDP-arabinose in Sinorhizobium meliloti 1021: first characterization of a bacterial UDP-xylose synthase, and UDP-xylose 4-epimerase. Microbiology 157: 260–269.

Hall, E.R., Monti, A., and Mohn, W.W. (2010) A comparison of bacterial populations in enhanced biological phosphorus removal processes using membrane filtration or gravity sedimentation for solids-liquid separation. Water Res. 44: 2703–14.

Hanlon, G.W., Denyer, S.P., Olliff, C.J., and Ibrahim, L.J. (2001) Reduction in Exopolysaccharide Viscosity as an Aid to Bacteriophage Penetration through Pseudomonas aeruginosa Biofilms. Appl. Environ. Microbiol. 67: 2746–2753.

Hanson, N.W., Konwar, K.M., Wu, S.-J., and Hallam, S.J. (2014) MetaPathways v2.0: A master-worker model for environmental Pathway/Genome Database construction on grids and clouds. 2014 IEEE Conf. Comput. Intell. Bioinforma. Comput. Biol. 1–7.

He, S., Gall, D.L., and McMahon, K.D. (2007) “Candidatus Accumulibacter” population structure in enhanced biological phosphorus removal sludges as revealed by polyphosphate kinase genes. Appl. Environ. Microbiol. 73: 5865–74.

He, S. and McMahon, K.D. (2011a) “Candidatus Accumulibacter” gene expression in response to dynamic EBPR conditions. ISME J. 5: 329–40.

He, S. and McMahon, K.D. (2011b) Microbiology of “Candidatus Accumulibacter” in activated sludge. Microb. Biotechnol. 4: 603–19.

Henriques, I.D.S. and Love, N.G. (2007) The role of extracellular polymeric substances in the toxicity response of activated sludge bacteria to chemical toxins. Water Res. 41: 4177–85.

Hesselmann, R.P.X., Werlen, C., Hahn, D., van der Meer, J.R., and Zehnder, A.J.B. (1999) Enrichment, Phylogenetic Analysis and Detection of a Bacterium That Performs Enhanced Biological Phosphate Removal in Activated Sludge. Syst. Appl. Microbiol. 22: 454–465.

Hesselsoe, M., Fu, S., Schloter, M., Bodrossy, L., Iversen, N., Roslev, P., et al. (2009) Isotope array analysis of Rhodocyclales uncovers functional redundancy and versatility in an activated sludge. ISME J. 3: 1349–1364.

Hooper, A.B., Vannelli, T., Bergmann, D.J., and Arciero, D.M. (1997) Enzymology of the oxidation of ammonia to nitrite by bacteria. Antonie Van Leeuwenhoek 71: 59–67.

Horvath, P. and Barrangou, R. (2010) CRISPR/Cas, the immune system of bacteria and archaea. Science. 327: 167–70.

Hu, M., Wang, X., Wen, X., and Xia, Y. (2012) Microbial community structures in different wastewater treatment plants as revealed by 454-pyrosequencing analysis. Bioresour. Technol. 117: 72–9.

Hugenholtz, P., Goebel, B.M., and Pace, N.R. (1998) Impact of Culture-Independent Studies on the Emerging Phylogenetic View of Bacterial Diversity. J. Bacteriol. 180: 4765–4774.

Hugoni, M., Taib, N., Debroas, D., Domaizon, I., Jouan Dufournel, I., Bronner, G., et al. (2013) Structure of the rare archaeal biosphere and seasonal dynamics of active ecotypes in surface coastal waters. Proc. Natl. Acad. Sci. U. S. A. 110: 6004–09.

Hunt, D.E., Lin, Y., Church, M.J., Karl, D.M., Tringe, S.G., Izzo, L.K., and Johnson, Z.I. (2013) Relationship between abundance and specific activity of bacterioplankton in open ocean surface waters. Appl. Environ. Microbiol. 79: 177–84.

Hurwitz, B.L., Hallam, S.J., and Sullivan, M.B. (2013) Metabolic reprogramming by viruses in the sunlit and dark ocean. Genome Biol. 14:

Huson, D.H., Mitra, S., Ruscheweyh, H.-J., Weber, N., and Schuster, S.C. (2011) Integrative analysis of environmental sequences using MEGAN4. Genome Res. 21: 1552–60.

Hyatt, D., Chen, G.-L., Locascio, P.F., Land, M.L., Larimer, F.W., and Hauser, L.J. (2010) Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics 11: 119.

Ishige, K. and Noguchi, T. (2000) Inorganic polyphosphate kinase and adenylate kinase participate in the polyphosphate:AMP phosphotransferase activity of Escherichia coli. Proc. Natl. Acad. Sci. U. S. A. 97: 14168–71.

Jeon, C.K. and Park, J.M. (2000) Enhanced biological phosphorus removal in a sequencing batch reactor supplied with glucose as a sole carbon source. Water Res. 34: 2160–2170.

Johnson, B.R. and Daigger, G.T. (2009) Integrated nutrient removal design for very low phosphorus levels. Water Sci. Technol. 60: 2455–2462.

Johnson, D.R., Goldschmidt, F., Lilja, E.E., and Ackermann, M. (2012) Metabolic specialization and the assembly of microbial communities. ISME J. 6: 1985–91.

Jones, S.E. and Lennon, J.T. (2010) Dormancy contributes to the maintenance of microbial diversity. Proc. Natl. Acad. Sci. U. S. A. 107: 5881–6.

Juretschko, S., Timmermann, G., Schmid, M., Schleifer, K., Pommerening-Röser, A., and Wagner, M. (1998) Combined Molecular and Conventional Analyses of Nitrifying Bacterium Diversity in Activated Sludge: Nitrosococcus mobilis and Nitrospira-Like Bacteria as Dominant Populations. Appl. Environ. Microbiol. 64: 3042–3051.

Kang, D. and Noguera, D.R. (2014) Candidatus Accumulibacter phosphatis: Elusive Bacterium Responsible for Enhanced Biological Phosphorus Removal. J. Environ. Eng. 140: 2–10.

Kim, B.-C., Kim, S., Shin, T., Kim, H., and Sang, B.-I. (2013a) Comparison of the bacterial communities in anaerobic, anoxic, and oxic chambers of a pilot A(2)O process using pyrosequencing analysis. Curr. Microbiol. 66: 555–65.

Kim, J.M., Lee, H.J., Lee, D.S., and Jeon, C.O. (2013b) Characterization of the denitrification-associated phosphorus uptake properties of “Candidatus Accumulibacter phosphatis” clades in sludge subjected to enhanced biological phosphorus removal. Appl. Environ. Microbiol. 79: 1969–79.

Kim, T.-S., Jeong, J.-Y., Wells, G.F., and Park, H.-D. (2013c) General and rare bacterial taxa demonstrating different temporal dynamic patterns in an activated sludge bioreactor. Appl. Microbiol. Biotechnol. 97: 1755–65.

Kindaichi, T., Nierychlo, M., Kragelund, C., Nielsen, J.L., and Nielsen, P.H. (2013) High and stable substrate specificities of microorganisms in enhanced biological phosphorus removal plants. Environ. Microbiol. 15: 1821–31.

Kong, Y., Nielsen, J.L., and Nielsen, P.H. (2005) Identity and Ecophysiology of Uncultured Actinobacterial Polyphosphate-Accumulating Organisms in Full-Scale Enhanced Biological Phosphorus Removal Plants. Appl. Environ. Microbiol. 71: 4076–4085.

Kong, Y., Nielsen, J.L., and Nielsen, P.H. (2004) Microautoradiographic Study of Rhodocyclus-Related Polyphosphate-Accumulating Bacteria in Full-Scale Enhanced Biological Phosphorus Removal Plants. Appl. Environ. Microbiol. 70: 5383–5390.

Kong, Y., Xia, Y., Nielsen, J.L., and Nielsen, P.H. (2007) Structure and function of the microbial community in a full-scale enhanced biological phosphorus removal plant. Microbiology 153: 4061–4073.

Kong, Y., Xia, Y., and Nielsen, P.H. (2008) Activity and identity of fermenting microorganisms in full-scale biological nutrient removing wastewater treatment plants. Environ. Microbiol. 10: 2008–19.

Kragelund, C., Levantesi, C., Borger, A., Thelen, K., Eikelboom, D., Tandoi, V., et al. (2007) Identity, abundance and ecophysiology of filamentous Chloroflexi species present in activated sludge treatment plants. FEMS Microbiol. Ecol. 59: 671–82.

Kragelund, C., Remesova, Z., Nielsen, J.L., Thomsen, T.R., Eales, K., Seviour, R., et al. (2007) Ecophysiology of mycolic acid-containing Actinobacteria (Mycolata) in activated sludge foams. FEMS Microbiol. Ecol. 61: 174–84.

Kristiansen, R., Nguyen, H.T.T., Saunders, A.M., Nielsen, J.L., Wimmer, R., Le, V.Q., et al. (2012) A metabolic model for members of the genus Tetrasphaera involved in enhanced biological phosphorus removal. ISME J. 1–12.

Kunin, V., Engelbrektson, A., Ochman, H., and Hugenholtz, P. (2010) Wrinkles in the rare biosphere: pyrosequencing errors. Environ. Microbiol. 12: 118–123.

Kunin, V., He, S., Warnecke, F., Peterson, S.B., Martin, H.G., Haynes, M., et al. (2008) A bacterial metapopulation adapts locally to phage predation despite global dispersal. Genome Res. 18: 293–297.

Labrie, S.J., Samson, J.E., and Moineau, S. (2010) Bacteriophage resistance mechanisms. Nat. Rev. Microbiol. 8: 317–27.

Legendre, P., Oksanen, J., and ter Braak, C.J.F. (2011) Testing the significance of canonical axes in redundancy analysis. Methods Ecol. Evol. 2: 269–277.

Legendre, P. and Legendre, L. (1998) Numerical Ecology. Elsevier Science BV, Amsterdam, the Netherlands.

Lennon, J.T. and Jones, S.E. (2011) Microbial seed banks: the ecological and evolutionary implications of dormancy. Nat. Rev. Microbiol. 9: 119–30.

Lepp, P. and Schmidt, T. (1998) Nucleic acid content of Synechococcus spp. during growth in continuous light and light/dark cycles. Arch. Microbiol. 170: 201–7.

Louie, T.M., Mah, T.J., Oldham, W., and Ramey, W.D. (2000) Use of metabolic inhibitors and gas chromatography / mass spectrometry to study poly-b-hydroxyalkanoates metabolism involving cryptic nutrients in enhanced biological phosphorus removal systems. Wat. Res. 34: 1507–1514.

Lücker, S., Wagner, M., Maixner, F., Pelletier, E., Koch, H., Vacherie, B., et al. (2010) A Nitrospira metagenome illuminates the physiology and evolution of globally important nitrite- oxidizing bacteria. Proc. Natl. Acad. Sci. U. S. A. 107: 13479–84.

Madoni, P., Davoli, D., and Chierici, E. (1993) Comparative Analysis of the Activated Sludge Microfauna in Several Sewage Treatment Works. Wat. Res. 27: 1485–1491.

Margulies, M., Egholm, M., Altman, W.E., Attiya, S., Bader, J.S., Bemben, L.A., et al. (2005) Genome sequencing in microfabricated high-density picolitre reactors. Nature 437: 376–80.

Martin-Fernandez, J.A., Barcelo-Vidal, C., and Pawlowsky-Glahn, V. (2003) Dealing With Zeros and Missing Values in Compositional Data Sets Using Nonparametric. Math. Geol. 35: 253–278.

Martinez-Garcia, M., Brazel, D.M., Swan, B.K., Arnosti, C., Chain, P.S.G., Reitenga, K.G., et al. (2012) Capturing single cell genomes of active polysaccharide degraders: an unexpected contribution of Verrucomicrobia. PLoS One 7: e35314.

Mason, M.G., Shepherd, M., Nicholls, P., Dobbin, P.S., Dodsworth, K.S., Poole, R.K., and Cooper, C.E. (2009) Cytochrome bd confers nitric oxide resistance to Escherichia coli. Nat. Chem. Biol. 5: 94–6.

Maszenan, A.M., Seviour, R.J., Patel, B.K., Schumann, P., Burghardt, J., Tokiwa, Y., and Stratton, H.M. (2000) Three isolates of novel polyphosphate-accumulating gram-positive cocci, obtained from activated sludge, belong to a new genus, Tetrasphaera gen. nov., and description of two new species, Tetrasphaera japonica sp. nov. and Tetrasphaera australiensis sp. nov. Int. J. Syst. Evol. Microbiol. 50: 593–603.

McIlroy, S. and Seviour, R.J. (2009) Elucidating further phylogenetic diversity among the Defluviicoccus-related glycogen-accumulating organisms in activated sludge. Environ. Microbiol. Rep. 1: 563–8.

McIlroy, S.J., Albertsen, M., Andresen, E.K., Saunders, A.M., Kristiansen, R., Stokholm-Bjerregaard, M., et al. (2014) “Candidatus Competibacter”-lineage genomes retrieved from metagenomes reveal functional metabolic diversity. ISME J. 8: 613–24.

McIlroy, S.J., Kristiansen, R., Albertsen, M., Karst, S.M., Rossetti, S., Nielsen, J.L., et al. (2013) Metabolic model for the filamentous “Candidatus Microthrix parvicella” based on genomic and metagenomic analyses. ISME J. 1–12.

McMahon, K.D. and Read, E.K. (2013) Microbial Contributions to Phosphorus Cycling in Eutrophic Lakes and Wastewater. Annu. Rev. Microbiol. 67: 199–219.

McMahon, K.D., Shaomei, H., and Oehmen, A. (2010) The microbiology of phosphorus removal. In Microbial Ecology of Activated Sludge (eds. Seviour, R.J., and Nielsen, P.H.). IWA, London, United Kingdom.

McMahon, K.D., Yilmaz, S., He, S., Gall, D.L., Jenkins, D., and Keasling, J.D. (2007) Polyphosphate kinase genes from full-scale activated sludge plants. Appl. Microbiol. Biotechnol. 77: 167–73.

Meyer, F., Paarmann, D., D’Souza, M., Olson, R., Glass, E.M., Kubal, M., et al. (2008) The metagenomics RAST server - a public resource for the automatic phylogenetic and functional analysis of metagenomes. BMC Bioinformatics 9: 386.

Mielczarek, A.T., Nguyen, H.T.T., Nielsen, J.L., and Nielsen, P.H. (2013) Population dynamics of bacteria involved in enhanced biological phosphorus removal in Danish wastewater treatment plants. Water Res. 47: 1529–44.

Mino, T., Tsuzuki, Y. and Matsuo, T. (1987) Effect of phosphorus accumulation on acetate metabolism in the biological phosphorus removal process. In: Biological Phosphate Removal from Wastewaters (Ramadori, R., Ed.), pp. 27-38. Pergamon Press, Oxford.

Mino, T., van Loosdrecht, M.C.M., and Heijnen, J.J. (1998) Microbiology and Biochemistry of the Enhanced Biological Phosphorus Removal Process. Water Res. 32: 3193–3207.

Mino, T. and Satoh, H. (2006) Wastewater genomics. Nat. Biotechnol. 24: 1229–1230.

Miura, Y., Watanabe, Y., and Okabe, S. (2007) Significance of Chloroflexi in performance of submerged membrane bioreactors (MBR) treating municipal wastewater. Environ. Sci. Technol. 41: 7787–94.

Morales-Belpaire, I. and Gerin, P.A. (2007) Factors affecting the fate of active proteins introduced in wastewater sludges: investigation with green fluorescent protein. Water Res. 41: 1723–33.

Moreno, A.M., Matz, C., Kjelleberg, S., and Manefield, M. (2010) Identification of ciliate grazers of autotrophic bacteria in ammonia-oxidizing activated sludge by RNA stable isotope probing. Appl. Environ. Microbiol. 76: 2203–11.

Morgenroth, E., Kommedal, R., and Harremoës, P. (2002) Processes and modeling of hydrolysis of particulate organic matter in aerobic wastewater treatment: a review. Water Sci. Technol. 45: 25–40.

Neufeld, J.D., Chen, Y., Dumont, M.G., and Murrell, J.C. (2008) Marine methylotrophs revealed by stable-isotope probing, multiple displacement amplification and metagenomics. Environ. Microbiol. 10: 1526–35.

Nguyen, H.T.T., Le, V.Q., Hansen, A.A., Nielsen, J.L., and Nielsen, P.H. (2011) High diversity and abundance of putative polyphosphate-accumulating Tetrasphaera-related bacteria in activated sludge systems. FEMS Microbiol. Ecol. 76: 256–67.

Nicholls, A.H.A., Osborn, D.W., and Nicholls, H.A. (1979) Bacterial stress: prerequisite for biological removal of phosphorus. J. WPCF 51: 557–569.

Nielsen, J.L., Nguyen, H., Meyer, R.L., and Nielsen, P.H. (2012) Identification of glucose-fermenting bacteria in a full-scale enhanced biological phosphorus removal plant by stable isotope probing. Microbiology 158: 1818–25.

Nielsen, P.H., Kragelund, C., Seviour, R.J., and Nielsen, J.L. (2009) Identity and ecophysiology of filamentous bacteria in activated sludge. FEMS Microbiol. Rev. 33: 969–98.

Nielsen, P.H., Mielczarek, A.T., Kragelund, C., Nielsen, J.L., Saunders, A.M., Kong, Y., et al. (2010) A conceptual ecosystem model of microbial communities in enhanced biological phosphorus removal plants. Water Res. 44: 5070–5088.

Nielsen, P.H., Roslev, P., Dueholm, T.E., and Nielsen, J.L. (2002) Microthrix parvicella, a specialized lipid consumer in anaerobic-aerobic activated sludge plants. Wat. Sci. Tech. 46: 73–80.

Nielsen, P.H., Saunders, A.M., Hansen, A.A., Larsen, P., and Nielsen, J.L. (2012) Microbial communities involved in enhanced biological phosphorus removal from wastewater — a model system in environmental biotechnology. Curr. Opin. Biotechnol. 23: 452–459.

Nobu, M.K., Tamaki, H., Kubota, K., and Liu, W.-T. (2014) Metagenomic characterization of “Candidatus Defluviicoccus tetraformis strain TFO71,” a tetrad-forming organism, predominant in an anaerobic-aerobic membrane bioreactor with deteriorated biological phosphorus removal. Environ. Microbiol. doi:10.111: 1–13.

Nogueira, R. and Melo, L.F. (2006) Competition Between Nitrospira spp. and Nitrobacter spp. in Nitrite-Oxidizing Bioreactors. Biotechnol. Bioeng. 95: 169–175.

Oehmen, A., Lemos, P.C., Carvalho, G., Yuan, Z., Keller, J., Blackall, L.L., and Reis, M.A.M. (2007) Advances in enhanced biological phosphorus removal: from micro to macro scale. Water Res. 41: 2271–300.

Oldham, W.K. (1986) Excess biological phosphorus removal in the activated sludge process using primary sludge fermentation. Can. J. Civ. Eng.

Oldham, W.K. (1985) Full Scale Optimization of Biological Phosphorus Removal at Kelowna, Canada. Wat. Sci. Tech. 17: 243–257.

Oldham, W.K. and Stevens, G.M. (1984) Initial operating experiences of a nutrient removal process (Modified Bardenpho) at Kelowna, British Columbia. Can. J. Civ. Eng. 11: 474–479.

Olson, T.C. and Hooper, A.B. (1983) Energy coupling in the bacterial oxidation of small molecules: an extracytoplasmic dehydrogenase in Nitrosomonas. FEMS Microbiol. Lett. 19: 47–50.

Orihel, D.M., Bird, D.F., Brylinsky, M., Chen, H., Donald, D.B., Huang, D.Y., et al. (2012) High microcystin concentrations occur only at low nitrogen-to-phosphorus ratios in nutrient-rich Canadian lakes. Can. J. Fish. Aquat. Sci. 69: 1457–1462.

Osaka, T., Yoshie, S., Tsuneda, S., Hirata, A., Iwami, N., and Inamori, Y. (2006) Identification of acetate- or methanol-assimilating bacteria under nitrate-reducing conditions by stable-isotope probing. Microb. Ecol. 52: 253–66.

Otawa, K., Lee, S.H., Yamazoe, A., Onuki, M., Satoh, H., and Mino, T. (2007) Abundance, diversity, and dynamics of viruses on microorganisms in activated sludge processes. Microb. Ecol. 53: 143–52.

Park, H.-D., Wells, G.F., Bae, H., Criddle, C.S., and Francis, C.A. (2006) Occurrence of ammonia-oxidizing archaea in wastewater treatment plant bioreactors. Appl. Environ. Microbiol. 72: 5643–7.

Pedrós-Alió, C. (2006) Marine microbial diversity: can it be determined? Trends Microbiol. 14: 257–63.

Pedrós-Alió, C. (2012) The rare bacterial biosphere. Ann. Rev. Mar. Sci. 4: 449–66.

Pereira, H., Lemos, P.C., Reis, M.A.M., Crespo, J.P.S.G., Carrondo, M.J.T., and Santos, H. (1996) Model for carbon metabolism in biological phosphorus removal processes based on in vivo 13C-NMR labelling experiments. Wat. Res. 30: 2128–2138.

Pester, M., Bittner, N., Deevong, P., Wagner, M., and Loy, A. (2010) A “rare biosphere” microorganism contributes to sulfate reduction in a peatland. ISME J. 4: 1591–602.

Petropoulos, P. and Gilbride, K.A. (2005) Nitrification in activated sludge batch reactors is linked to protozoan grazing of the bacterial population. Can. J. Civ. Eng. 799: 791–799.

Pinto, A.J. and Raskin, L. (2012) PCR biases distort bacterial and archaeal community structure in pyrosequencing datasets. PLoS One 7: e43093.

Polz, M.F., Alm, E.J., and Hanage, W.P. (2013) Horizontal gene transfer and the evolution of bacterial and archaeal population structure. Trends Genet. 29: 170–5.

Pradeep Ram, A.S. and Sime-Ngando, T. (2008) Functional responses of prokaryotes and viruses to grazer effects and nutrient additions in freshwater microcosms. ISME J. 2: 498–509.

Purkhold, U., Pommerening-Röser, A., Juretschko, S., Schmid, M.C., Koops, H., and Wagner, M. (2000) Phylogeny of All Recognized Species of Ammonia Oxidizers Based on Comparative 16S rRNA and amoA Sequence Analysis: Implications for Molecular Diversity Surveys. Appl. Environ. Microbiol. 66: 5368–5382.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., et al. (2013) The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41: D590–6.

Rabinowitz, B. and Oldham, W.K. (1986) Excess biological phosphorus removal in the activated sludge process using primary sludge fermentation. Can. J. Civ. Eng. 13: 345–351.

Ramette, A. (2007) Multivariate analyses in microbial ecology. FEMS Microbiol. Ecol. 62: 142–60.

Rappé, M.S. and Giovannoni, S.J. (2003) The uncultured microbial majority. Annu. Rev. Microbiol. 57: 369–94.

Rinke, C., Schwientek, P., Sczyrba, A., Ivanova, N.N., Anderson, I.J., Cheng, J.-F., et al. (2013) Insights into the phylogeny and coding potential of microbial dark matter. Nature 499: 431–7.

Rodríguez-Blanco, A., Ghiglione, J.-F., Catala, P., Casamayor, E.O., and Lebaron, P. (2009) Spatial comparison of total vs. active bacterial populations by coupling genetic fingerprinting and clone library analyses in the NW Mediterranean Sea. FEMS Microbiol. Ecol. 67: 30–42.

Rossetti, S., Tomei, M.C., Nielsen, P.H., and Tandoi, V. (2005) “Microthrix parvicella”, a filamentous bacterium causing bulking and foaming in activated sludge systems: a review of current knowledge. FEMS Microbiol. Rev. 29: 49–64.

Rotthauwe, J. and Witzel, K. (1997) The Ammonia Monooxygenase Structural Gene amoA as a Functional Marker: Molecular Fine-scale Analysis of Natural Ammonia-Oxidizing Populations. Appl. Environ. Microbiol. 63: 4704–4712.

Russell, J.B. and Cook, G.M. (1995) Energetics of bacterial growth: balance of anabolic and catabolic reactions. Microbiol. Rev. 59: 48–62.

Saunders, A.M., Larsen, P., and Nielsen, P.H. (2013) Comparison of nutrient-removing microbial communities in activated sludge from full-scale MBRs and conventional plants. Wat. Sci. Tech. 68: 366–71.

Sentchilo, V., Mayer, A.P., Guy, L., Miyazaki, R., Green Tringe, S., Barry, K., et al. (2013) Community-wide plasmid gene mobilization and selection. ISME J. 7: 1173–1186.

Seviour, R.J., Mino, T., and Onuki, M. (2003) The microbiology of biological phosphorus removal in activated sludge systems. FEMS Microbiol. Rev. 27: 99–127.

Seviour, R.J., and Nielsen, P.H. (2010) Microbial Ecology of Activated Sludge. IWA, London, United Kingdom.

Shade, A. and Handelsman, J. (2012) Beyond the Venn diagram: the hunt for a core microbiome. Environ. Microbiol. 14: 4–12.

Shade, A., Hogan, C.S., Klimowicz, A.K., Linske, M., McManus, P.S., and Handelsman, J. (2012) Culturing captures members of the soil rare biosphere. Environ. Microbiol. 14: 2247–52.

Shapiro, B.J. and Polz, M.F. (2014) Ordering microbial diversity into ecologically and genetically cohesive units. Trends Microbiol. 22: 235–247.

Sharon, I. and Banfield, J.F. (2013) Microbiology. Genomes from metagenomics. Science 342: 1057–8.

Sharp, J.H., Yoshiyama, K., Parker, A.E., Schwartz, M.C., Curless, S.E., Beauregard, A.Y., et al. (2009) A Biogeochemical View of Estuarine Eutrophication: Seasonal and Spatial Trends and Correlations in the Delaware Estuary. Estuaries and Coasts 32: 1023–1043.

Sherr, E.B. and Sherr, B.F. (2002) Significance of predation by protists in aquatic microbial food webs. Antonie Van Leeuwenhoek 81: 293–308.

Silva, A.F., Carvalho, G., Oehmen, A., Lousada-Ferreira, M., van Nieuwenhuijzen, A., Reis, M.A.M., and Crespo, M.T.B. (2012) Microbial population analysis of nutrient removal-related organisms in membrane bioreactors. Appl. Microbiol. Biotechnol. 93: 2171–80.

Skennerton, C.T., Imelfort, M., and Tyson, G.W. (2013) Crass: identification and reconstruction of CRISPR from unassembled metagenomic data. Nucleic Acids Res. 41: e105.

Soddell, J.A., Stainsby, F.M., Eales, K.L., Seviour, R.J., and Goodfellow, M. (2006) Gordonia defluvii sp. nov., an actinomycete isolated from activated sludge foam. Int. J. Syst. Evol. Microbiol. 56: 2265–9.

Soddell, J.A., Seviour, R.J., Blackall, L.L., and Hugenholtz, P. (1998) New Foam-Forming Nocardioforms Found in Activated Sludge. Wat. Sci. Tech. 37: 495–502.

Sogin, M.L., Morrison, H.G., Huber, J.A., Mark Welch, D., Huse, S.M., Neal, P.R., et al. (2006) Microbial diversity in the deep sea and the underexplored “rare biosphere”. Proc. Natl. Acad. Sci. U. S. A. 103: 12115–20.

Smolders, G., van der Meij, J., van Loosdrecht, M., and Heijnen, J. (1994a) Model of the Anaerobic Metabolism of the Biological Phosphorus Removal Process: Stoichiometry and pH Influence. Biotechnol. Bioeng. 43: 461–470.

Smolders, G., van der Meij, J., van Loosdrecht, M., and Heijnen, J. (1994b) Stoichiometric Model of the Aerobic Metabolism of the Biological Phosphorus Removal Process. Biotechnol. Bioeng. 44: 837–848.

Smolders, G., van der Meij, J., van Loosdrecht, M., and Heijnen, J. (1995) A Structured Metabolic Model for Anaerobic and Aerobic Stoichiometry and Kinetics of the Biological Phosphorus Removal Process. Biotechnol. Bioeng. 47: 277–287.

Spieck, E., Ehrich, S., Aamand, J., and Bock, E. (1998) Isolation and immunocytochemical location of the nitrite-oxidizing system in Nitrospira moscoviensis. Arch. Microbiol. 169: 225–30.

Spieck, E., Hartwig, C., McCormack, I., Maixner, F., Wagner, M., Lipski, A., and Daims, H. (2006) Selective enrichment and molecular characterization of a previously uncultured Nitrospira-like bacterium from activated sludge. Environ. Microbiol. 8: 405–15.

Stahl, D.A. and de la Torre, J.R. (2012) Physiology and diversity of ammonia-oxidizing archaea. Annu. Rev. Microbiol. 66: 83–101.

Sutherland, I. (1995) Polysaccharide lyases. FEMS Microbiol. Rev. 16: 323–347.

Szczepanowski, R., Linke, B., Krahn, I., Gartemann, K.-H., Gützkow, T., Eichler, W., et al. (2009) Detection of 140 clinically relevant antibiotic-resistance genes in the plasmid metagenome of wastewater treatment plant bacteria showing reduced susceptibility to selected antibiotics. Microbiology 155: 2306–19.

Taroncher-Oldenburg, G., Griner, E.M., Francis, C.A., and Ward, B.B. (2003) Oligonucleotide Microarray for the Study of Functional Gene Diversity in the Nitrogen Cycle in the Environment. Appl. Environ. Microbiol. 69: 1159–1171.

Thomsen, T.R., Kong, Y., and Nielsen, P.H. (2007) Ecophysiology of abundant denitrifying bacteria in activated sludge. FEMS Microbiol. Ecol. 60: 370–82.

Tsukioka, Y., Yamashita, Y., Oho, T., Nakano, Y., and Koga, T. (1997) Biological function of the dTDP-rhamnose synthesis pathway in Streptococcus mutans. J. Bacteriol. 179: 1126–1134.

Tyson, G.W., Chapman, J., Hugenholtz, P., Allen, E.E., Ram, R.J., Richardson, P.M., et al. (2004) Community structure and metabolism through reconstruction of microbial genomes from the environment. Nature 428: 37–43.

Velicer, G.J. and Vos, M. (2009) Sociobiology of the myxobacteria. Annu. Rev. Microbiol. 63: 599–623.

Vollertsen, J., Petersen, G., and Borregaard, V.R. (2006) Hydrolysis and fermentation of activated sludge to enhance biological phosphorus removal. Water Sci. Technol. 53: 55.

Wan, C.-Y., De Wever, H., Diels, L., Thoeye, C., Liang, J.-B., and Huang, L.-N. (2011) Biodiversity and population dynamics of microorganisms in a full-scale membrane bioreactor for municipal wastewater treatment. Water Res. 45: 1129–38.

Warren, A., Salvado, H., Curds, C.R., and Roberts, D.M. (2010) Protozoa in activated sludge processes. In Microbial Ecology of Activated Sludge (eds. Seviour, R.J., and Nielsen, P.H.). IWA, London, United Kingdom.

Wentzel, M.C., Lotter, L.H., Loewenthal, R.E., and Marais, G. (2000) Metabolic behaviour of Acinetobacter spp. in enhanced biological phosphorus removal - a biochemical model. Water SA 12: 209.

Wexler, M., Richardson, D.J., and Bond, P.L. (2009) Radiolabelled proteomics to determine differential functioning of Accumulibacter during the anaerobic and aerobic phases of a bioreactor operating for. Environ. Microbiol. 11: 3029–3044.

Wilhelm, L., Besemer, K., Fasching, C., Urich, T., Singer, G.A., Quince, C., and Battin, T.J. (2014) Rare but active taxa contribute to community dynamics of benthic biofilms in glacier-fed streams. Environ. Microbiol. 1–11.

Wilmes, P., Andersson, A.F., Lefsrud, M.G., Wexler, M., Shah, M., Zhang, B., et al. (2008) Community proteogenomics highlights microbial strain-variant protein expression within activated sludge performing enhanced biological phosphorus removal. ISME J. 2: 853–64.

Wingender, J., Neu, T.R., and Flemming, H.-C. (1999) Microbial extracellular polymeric substances: characterization, structure, and function. 1st edition. Springer, Berlin, Heidelberg.

Wong, M.-T., Mino, T., Seviour, R.J., Onuki, M., and Liu, W.-T. (2005) In situ identification and characterization of the microbial community structure of full-scale enhanced biological phosphorous removal plants in Japan. Water Res. 39: 2901–14.

Wong, M.-T., Tan, F.M., Ng, W.J., and Liu, W.-T. (2004) Identification and occurrence of tetrad-forming Alphaproteobacteria in anaerobic-aerobic activated sludge processes. Microbiology 150: 3741–8.

Wright, J.J., Konwar, K.M., and Hallam, S.J. (2012) Microbial ecology of expanding oxygen minimum zones. Nat. Rev. Microbiol. 10: 381–94.

Xia, Y., Kong, Y., and Nielsen, P.H. (2007) In situ detection of protein-hydrolysing microorganisms in activated sludge. FEMS Microbiol. Ecol. 60: 156–65.

Xia, Y., Kong, Y., and Nielsen, P.H. (2008) In situ detection of starch-hydrolyzing microorganisms in activated sludge. FEMS Microbiol. Ecol. 66: 462–71.

Xia, Y., Kong, Y., and Thomsen, T.R. (2008) Identification and Ecophysiological Characterization of Epiphytic Protein-Hydrolyzing Saprospiraceae (“Candidatus Epiflobacter” spp.) in Activated Sludge. Appl. Environ. Microbiol. 74: 2229–2238.

Yoon, D.-N., Park, S.-J., Kim, S.-J., Jeon, C.O., Chae, J.-C., and Rhee, S.-K. (2010) Isolation, characterization, and abundance of filamentous members of Caldilineae in activated sludge. J. Microbiol. 48: 275–83.

Zhang, J., Liu, Z., Wang, S., and Jiang, P. (2002) Characterization of a bioflocculant produced by the marine myxobacterium Nannocystis sp. NU-2. Appl. Microbiol. Biotechnol. 59: 517–22.

Zhang, T., Shao, M.-F., and Ye, L. (2012) 454 Pyrosequencing Reveals Bacterial Diversity of Activated Sludge From 14 Sewage Treatment Plants. ISME J. 6: 1137–47.

Zhang, T., Ye, L., Tong, A.H.Y., Shao, M.-F., and Lok, S. (2011) Ammonia-oxidizing archaea and ammonia-oxidizing bacteria in six full-scale wastewater treatment bioreactors. Appl. Microbiol. Biotechnol. 91: 1215–25.

Zhang, T., Zhang, X.-X., and Ye, L. (2011) Plasmid metagenome reveals high levels of antibiotic resistance genes and mobile genetic elements in activated sludge. PLoS One 6: e26041.

Zhou, Y., Liang, Y., Lynch, K.H., Dennis, J.J., and Wishart, D.S. (2011) PHAST: a fast phage search tool. Nucleic Acids Res. 39: W347–52.

Zilles, J.L., Peccia, J., Kim, M., Hung, C., and Noguera, D.R. (2002) Involvement of Rhodocyclus-Related Organisms in Phosphorus Removal in Full-Scale Wastewater Treatment Plants. Appl. Environ. Microbiol. 68: 2763–2769.

Appendix A – Chapter 2 supplementary material

Table A1 Sampling and sequencing statistics

Sample                      DNA total  DNA filtered^a  RNA total  RNA filtered^a
Day 1 (January 31, 2013)
Anaerobic, replicate 1          11970           11731      11570           11435
Anaerobic, replicate 2          12171           11962      12001           11805
Anaerobic, replicate 3          10262           10052       9771            9652
Anoxic, replicate 1             10807           10626      11035           10893
Anoxic, replicate 2              6080            5965      14432           14272
Anoxic, replicate 3             10813           10603      10677           10528
Aerobic, replicate 1             9363            9198      11551           11422
Aerobic, replicate 2                -               -      11661           11517
Aerobic, replicate 3             8027            7887      14786           14591
Day 19 (February 18, 2013)
Anaerobic, replicate 1          12874            8271      12262           12123
Anaerobic, replicate 2           1588           10974      12599           12408
Anaerobic, replicate 3           4427           10931       9295            9136
Anoxic, replicate 1              4766            9453      10280           10132
Anoxic, replicate 2              2993            8343       7810            7698
Anoxic, replicate 3              5719            9232       9682            9546
Aerobic, replicate 1             1829           10044       8623            8500
Aerobic, replicate 2              543           11521       8715            8596
Aerobic, replicate 3             1527            8973      11697           11519
Day 34 (March 05, 2013)
Anaerobic, replicate 1           8680            8305      13623           13412
Anaerobic, replicate 2          11509           10174       8763            8635
Anaerobic, replicate 3          11530           10057      13138           12861
Anoxic, replicate 1              9704            8345       8323            8190
Anoxic, replicate 2              8780            9974       9319            9158
Anoxic, replicate 3              9696           10380       9493            9253
Aerobic, replicate 1            10410           11071      10022            9805
Aerobic, replicate 2            11657           10564      12007           11775
Aerobic, replicate 3             9423            9341       9815            9649
Day 48 (March 19, 2013)
Anaerobic, replicate 1           8755           12513      12288           12066
Anaerobic, replicate 2          10784            1530      11252           11088
Anaerobic, replicate 3          10572            4264      10231           10065
Anoxic, replicate 1              8772            4595      11303           11094
Anoxic, replicate 2             10265            2698      11560           11368
Anoxic, replicate 3             10602            5566       3201            3153
Aerobic, replicate 1            11260            1776      11742           11554
Aerobic, replicate 2            10673             520      12423           12240
Aerobic, replicate 3             9469            1481      11387           11270
Day 65 (April 05, 2013)
Anaerobic, replicate 1          13358           12874      11925           11637
Anaerobic, replicate 2          10609            9916      12921           12639
Anaerobic, replicate 3           4603            4381       9028            8787
Anoxic, replicate 1              3457            3352      10977           10748
Anoxic, replicate 2              5056            4859       8936            8753
Anoxic, replicate 3              5404            5210       8762            8567
Aerobic, replicate 1             9545            9226      15612           15333
Aerobic, replicate 2             7720            7441      10641           10470
Aerobic, replicate 3             7173            6893      11773           11538
Day 79 (April 19, 2013)
Anaerobic, replicate 1           8573            8459      11596           11289
Anaerobic, replicate 2           8836            8732      12279           11969
Anaerobic, replicate 3           9848            9738      11125           10839
Anoxic, replicate 1              9037            8924      11409           11100
Anoxic, replicate 2              3372            3328      10646           10288
Anoxic, replicate 3              9133            9027      12142           11730
Aerobic, replicate 1            10582           10454      12574           12272
Aerobic, replicate 2             8487            8362          -               -
Aerobic, replicate 3            10420           10287      24867           24282
Day 98 (May 08, 2013)
Anaerobic, replicate 1           6691            6585      12135           11843
Anaerobic, replicate 2          11117           10923      11405           11041
Anaerobic, replicate 3           8929            8752      11717           11446
Anoxic, replicate 1              9855            9671      11636           11264
Anoxic, replicate 2             15029           14731      12271           11849
Anoxic, replicate 3              9297            9121      11683           11446
Aerobic, replicate 1            12622           12329      19446           18885
Aerobic, replicate 2            22141           21733          -               -
Aerobic, replicate 3            10512           10332      10763           10443
Day 112 (May 22, 2013)
Anaerobic, replicate 1           9131            8938      13108           12814
Anaerobic, replicate 2          10019            9798      11319           11078
Anaerobic, replicate 3          10094            9907      10925           10666
Anoxic, replicate 1              7506            7363       9813            9573
Anoxic, replicate 2             10870           10663      11569           11279
Anoxic, replicate 3              6043            5951       8990            8817
Aerobic, replicate 1            10405           10201      11992           11637
Aerobic, replicate 2            10313           10119       8925            8689
Aerobic, replicate 3             9434            9266      11008           10846

^a Singletons removed.

Table A2 OTU richness and diversity estimates

            rRNA pool                                rDNA pool
Sample      OTUs^a  Shannon^b  Simpson^b  Chao1^c    OTUs  Shannon  Simpson    Chao1
Anaerobic
Day 1         2929       4.63      13.29  3286.66    3557     5.17    19.53  4029.03
Day 19        3667       5.34      47.37  4177.63    5297     6.07    38.76  6175.46
Day 34        3937       5.56      49.37  4509.88    2739     5.73    38.44  3269.21
Day 48        4040       5.47      42.79  4480.33    5196     6.11    44.17  6198.78
Day 65        4269       5.77      53.11  5041.57    4789     6.69    94.95  5737.37
Day 79        4596       5.84      54.09  5220.22    2952     5.64    59.31  3301.23
Day 98        4430       5.92      64.92  5067.12    3422     5.87    62.55  3783.41
Day 112       4114       5.44      27.98  4545.44    3583     5.85    58.92  4041.99
Anoxic
Day 1         2748       3.95       5.69  3084.18    2862     4.71    10.69  3161.52
Day 19        3092       5.30      46.60  3320.17    4733     6.23    63.88  5329.31
Day 34        3249       5.44      42.80  3709.45    2358     6.12    74.65  2873.56
Day 48        3386       5.39      44.73  3814.49    4451     5.82    36.83  5011.12
Day 65        3616       5.63      42.10  4104.28    2448     6.25    69.61  2860.08
Day 79        4679       5.83      51.83  5303.43    2642     5.67    64.22  2961.01
Day 98        4627       5.92      63.56  5270.51    3808     5.80    61.13  4303.06
Day 112       3802       5.42      30.71  4426.81    3144     5.85    63.53  3588.08
Aerobic
Day 1         3036       4.37      10.47  3507.68    2167     5.02    18.63  2693.75
Day 19        3123       5.22      44.88  3625.90    4596     6.10    61.01  5441.68
Day 34        3728       5.39      37.64  4182.73    1161     5.81    60.00  1389.04
Day 48        3757       5.51      50.83  4297.09    3372     5.49    36.63  3784.72
Day 65        4015       5.59      44.87  4446.16    2857     6.21    65.46  3392.51
Day 79        4552       5.74      58.31  6239.53    3147     5.69    61.60  3464.46
Day 98        4526       6.04      68.24  6183.16    4473     5.84    63.40  5062.06
Day 112       4017       5.56      37.41  4487.16    3595     5.28    53.46  4130.54

^a Number of OTUs in sample.
^b Shannon and Simpson diversity indices consider both OTU (97%) richness and evenness. Inverse Simpson values are reported.
^c Chao1 index for OTU richness.
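For readers who wish to reproduce estimates of the kind reported in Table A2, the following sketch shows how the three statistics can be computed from a vector of per-OTU read counts. It is a minimal illustration and not the pipeline used in this thesis: the function names are hypothetical, the Shannon and inverse Simpson indices are written in their standard plug-in forms, and the bias-corrected form of Chao1 is assumed. The software actually used may apply different variants of these estimators.

```python
import math

def shannon(counts):
    """Shannon index H' = -sum(p_i * ln(p_i)) over OTUs with nonzero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)

def inverse_simpson(counts):
    """Inverse Simpson index 1 / sum(p_i^2) (plug-in form, an assumption)."""
    total = sum(counts)
    return 1.0 / sum((c / total) ** 2 for c in counts if c > 0)

def chao1(counts):
    """Bias-corrected Chao1 (assumed variant): S_obs + F1*(F1-1) / (2*(F2+1)),
    where F1 and F2 are the numbers of OTUs observed exactly once and twice."""
    s_obs = sum(1 for c in counts if c > 0)
    f1 = sum(1 for c in counts if c == 1)
    f2 = sum(1 for c in counts if c == 2)
    return s_obs + f1 * (f1 - 1) / (2 * (f2 + 1))

# Hypothetical read counts for six OTUs in one sample.
example_counts = [5, 2, 2, 1, 1, 1]
print(shannon(example_counts), inverse_simpson(example_counts), chao1(example_counts))
```

Because Chao1 depends on the numbers of OTUs seen once and twice, its value is sensitive to any filtering of low-count OTUs applied before the calculation.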

Table A3 Abundance and activity of select EBPR taxa

Taxonomy by functional group      %rDNA^a  range        %rRNA^a  range        rRNA:rDNA^a  range           Higher classification
Hydrolyzers
Candidatus Microthrix parvicella    11.80  8.15–15.22      3.49  0.94–4.57           0.43  0.14–0.87       Actinobacteria
Gordonia                             8.40  0.74–29.22      2.90  0.99–5.89           0.84  0.26–1.36       Corynebacteriales
Chitinophagaceae                     7.02  2.31–14.77      1.28  0.70–1.67           0.41  0.10–0.78       Sphingobacteriales
Saprospiraceae                       5.80  2.12–11.24      1.10  0.42–2.97           0.26  0.10–0.51       Sphingobacteriales
Flexibacter                          4.82  1.20–9.74       0.65  0.27–1.21           0.35  0.07–1.18       Cytophagales
NS9 marine group                     3.52  0.37–6.52       0.74  0.08–1.30           0.29  0.19–0.51       Flavobacteriales
env.OPS 17                           1.48  0.46–3.15       0.31  0.07–0.94           0.51  0.10–2.41       Sphingobacteriales
Mycobacterium                        1.31  0.33–3.17       0.24  0.16–0.32           0.37  0.08–0.70       Corynebacteriales
Persicobacter                        1.13  0.01–3.57       1.24  0.01–3.01           2.08  0.30–5.59       Cytophagales
PHOS-HE51                            1.06  0.08–4.72       0.16  0.02–0.39           0.35  0.10–0.82       Sphingobacteriales
Flavobacterium                       0.84  0.52–1.22       0.26  0.12–0.61           0.50  0.20–1.75       Flavobacteriales
Chryseobacterium                     0.77  0.01–4.81       0.08  0.02–0.17           1.11  0.05–3.19       Flavobacteriales
Thermomicrobia                       0.62  0.25–0.95       0.11  0.06–0.18           0.33  0.06–0.68       Chloroflexi
Caldilinea                           0.59  0.28–1.15       0.23  0.05–0.43           0.54  0.18–1.11       Chloroflexi-Caldilineae
Anaerolineae                         0.33  0.11–0.67       0.12  0.02–0.23           0.54  0.22–1.39       Chloroflexi
Candidate division TM7               0.19  0.05–0.78       0.01  0.00–0.01           0.11  0.01–0.43       TM7
Fermenters
Lachnospiraceae                      0.52  0.33–0.82       0.86  0.22–1.43           2.35  0.77–3.67       Clostridiales
Christensenellaceae                  0.46  0.16–0.94       0.30  0.13–0.45           1.14  0.46–3.31       Clostridiales
Ruminococcaceae                      0.46  0.36–0.56       0.38  0.17–0.56           1.22  0.56–3.10       Clostridiales
Streptococcus                        0.38  0.23–0.55       5.42  1.7–12.03          20.90  4.81–46.43      Lactobacillales
Paludibacter                         0.29  0.14–0.44       0.12  0.06–0.18           0.58  0.36–1.51       Bacteroidales
Lactococcus                          0.16  0.06–0.22       4.65  2.97–8.47          49.28  17.51–112.03    Lactobacillales
PAO/GAO
Propionivibrio                       1.38  0.46–2.63       1.21  0.59–1.72           1.97  0.53–8.05       Rhodocyclaceae
Tetrasphaera                         0.11  0.08–0.14       0.02  0.00–0.04           0.25  0.03–0.50       Actinobacteria
Candidatus Accumulibacter sp.        0.06  0.02–0.14       0.83  0.36–1.38          31.30  6.25–99.31      Rhodocyclaceae
Defluviicoccus                       0.06  0.03–0.10       0.07  0.02–0.11           1.64  0.69–3.75       Alphaproteobacteria
AOB/NOB
Nitrospira                           0.90  0.57–1.38       0.51  0.25–0.84           0.74  0.35–0.98       Nitrospirales
Candidatus Nitrotoga                 0.04  ND–0.09         0.36  0.17–0.63          58.21  1.98–226.67     Nitrosomonadales
Nitrosomonas                         0.01  ND–0.02         0.05  0.03–0.09           8.86  1.94–33.00      Nitrosomonadales
Denitrifiers
Thauera                              2.60  0.28–4.89       1.56  0.28–3.29           0.91  0.20–1.92       Rhodocyclaceae
Acidovorax                           2.27  0.61–3.63       0.62  0.26–1.05           0.56  0.22–2.12       Comamonadaceae
Dechloromonas                        1.44  0.97–2.09       7.39  4.38–12.6           8.81  3.55–29.61      Rhodocyclaceae
Zoogloea                             0.40  0.10–0.63       0.19  0.13–0.23           0.97  0.34–2.41       Rhodocyclaceae
Sphaerotilus                         0.38  0.12–0.89       0.70  0.26–1.06           3.79  0.82–11.64      Comamonadaceae
Rhodobacter                          0.30  0.08–0.65       0.11  0.06–0.15           0.82  0.20–1.96       Rhodobacteraceae
Ancalomicrobium                      0.24  0.10–0.34       0.13  0.07–0.18           0.81  0.41–1.09       Hyphomicrobiaceae
Azospira                             0.18  0.11–0.28       0.08  0.06–0.13           0.68  0.27–1.40       Rhodocyclaceae
Variovorax                           0.18  0.08–0.40       0.16  0.09–0.22           1.75  0.24–3.50       Comamonadaceae
Hyphomicrobium                       0.17  0.05–0.37       0.10  0.04–0.19           0.99  0.34–1.65       Hyphomicrobiaceae
Grazers
Peritrichia                          0.20  ND–0.73        10.50  0.87–19.85        412.97  18.54–1224.67   Ciliophora
Suctoria                             0.20  ND–0.81         1.80  0.25–5.87          96.44  8.27–197.75     Ciliophora
Rotifera                             0.09  ND–0.27         6.69  0.82–38.70         96.84  6.17–282.40     Metazoa
Euglenida                            0.04  ND–0.11         0.87  0.05–2.32          35.52  11.73–58.71     Excavata
Vannella                            0.004  ND–0.01         0.83  0.09–3.96         262.17  14.78–652.83    Amoebozoa
Aspidisca                              ND  ND              1.61  0.26–4.76        1446.34  256.00–4395.00  Ciliophora
Other
Planctomycetes                       3.78  1.87–6.56       1.78  0.87–3.02           0.66  0.34–1.09       Bacteria
Hydrogenophilaceae                   1.48  0.21–3.45       0.59  0.34–0.75           1.42  0.17–4.58       Betaproteobacteria
LKM11 group                          1.24  0.13–6.63       0.17  0.03–0.51           0.84  0.09–4.67       Fungi
Acidobacteria                        0.77  0.36–1.00       0.14  0.09–0.24           0.25  0.15–0.35       Bacteria
Nannocystis                          0.64  0.05–1.51       2.10  0.14–6.0            6.43  1.75–28.78      Myxobacteria
Prosthecobacter                      0.42  0.19–0.71       0.03  0.01–0.04           0.10  0.06–0.23       Verrucomicrobia
Armatimonadetes                      0.20  ND–0.70         0.92  0.01–3.08           6.21  1.63–13.23      Bacteria
Methanosarcinales                    0.13  0.02–0.34       0.37  0.13–0.86           5.17  2.87–3.20       Archaea
Methanobacteriales                   0.03  ND–0.05         0.23  0.06–0.66          13.11  1.87–6.95       Archaea
Basidiomycota                        0.02  0.01–0.03       0.97  0.12–3.91          75.85  7.46–151.09     Fungi
Fungi                                0.01  ND–0.02         0.24  0.06–0.46           5.77  0.14–13.84      Fungi

^a Mean values; rRNA:rDNA is the SSU rRNA:rDNA ratio. ND = not detectable. ND values were imputed to calculate SSU rRNA:rDNA ratios.
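The footnote to Table A3 notes that ND values were imputed before SSU rRNA:rDNA ratios were calculated, without giving the imputation rule at this point. The sketch below illustrates one common way such a ratio can be computed. It is an assumption-laden example only: the function names are hypothetical, and the choice to substitute a pseudocount of one read for an undetected taxon is an assumption made for illustration, not the method documented in this thesis.

```python
def relative_abundance(count, total_reads, pseudocount=1):
    """Percent abundance of a taxon in one library.

    An undetected taxon (count == 0) is imputed as `pseudocount` reads.
    This imputation rule is an assumption made for illustration.
    """
    reads = count if count > 0 else pseudocount
    return 100.0 * reads / total_reads

def rrna_rdna_ratio(rrna_count, rrna_total, rdna_count, rdna_total, pseudocount=1):
    """Ratio of %rRNA to %rDNA for one taxon in paired RNA and DNA libraries."""
    rrna_pct = relative_abundance(rrna_count, rrna_total, pseudocount)
    rdna_pct = relative_abundance(rdna_count, rdna_total, pseudocount)
    return rrna_pct / rdna_pct

# Hypothetical taxon detected in the RNA library but not in the DNA library.
print(rrna_rdna_ratio(rrna_count=160, rrna_total=10000, rdna_count=0, rdna_total=10000))
```

Under this assumed rule, taxa that are undetected in the rDNA pool but present in the rRNA pool yield very large ratios, so the ratio for such taxa depends strongly on sequencing depth and on the pseudocount chosen.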

Table A4 Indicator OTUs – rDNA abundance

Period 1 = Days 1, 19, & 34; Period 2 = Days 48 & 65; Period 3 = Days 79, 98, & 112.

OTU^a    Period 1  Period 2  Period 3  p value  Taxonomic classification  rDNA_max  rDNA_avg  rRNA_max  rRNA_avg
Abundant
23654        0.16      0.11      0.73    0.007  Thauera                      1.73%     0.75%     2.50%     1.18%
32670        0.14      0.10      0.76    0.008  Hydrogenophilaceae           1.38%     0.57%     0.50%     0.32%
33447        0.55      0.09      0.36    0.003  Dechloromonas                1.08%     0.48%     1.22%     0.77%
Intermediate
56156        0.14      0.09      0.77    0.006  Flexibacter                  0.96%     0.35%     0.06%     0.04%
4653         0.23      0.07      0.70    0.003  Dechloromonas                0.68%     0.29%     9.95%     5.24%
28771        0.69      0.06      0.25    0.006  Aeromonas                    0.52%     0.17%     0.10%     0.05%
61685        0.19      0.02      0.80    0.003  Denitratisoma                0.51%     0.14%     0.07%     0.02%
59787        0.31      0.08      0.62    0.003  Gemmatimonadaceae            0.45%     0.20%     0.00%     0.00%
57183        0.73      0.00      0.26    0.005  Filomicrobium                0.39%     0.16%     0.00%     0.00%
47748        0.25      0.03      0.72    0.003  Polyangiaceae                0.39%     0.12%     0.01%     0.00%
27989        0.78      0.06      0.16    0.003  Gordonia                     0.32%     0.08%     0.03%     0.02%
37305        0.66      0.06      0.28    0.004  Enhydrobacter                0.26%     0.12%     0.33%     0.25%
33208        0.35      0.08      0.57    0.007  Caldilinea                   0.26%     0.14%     0.02%     0.01%
43703        0.76      0.09      0.15    0.007  Solirubrobacterales          0.24%     0.08%     0.05%     0.03%
28383        0.04      0.00      0.94    0.008  Oceanospirillales            0.24%     0.06%     0.14%     0.03%
62931        0.06      0.02      0.92    0.003  Planctomycetes               0.23%     0.04%     0.01%     0.00%
62243        0.24      0.16      0.60    0.008  MLE1-12                      0.21%     0.09%     0.45%     0.28%
45955        0.05      0.01      0.93    0.009  Bacteroidales                0.20%     0.04%     0.02%     0.01%
62405        0.62      0.07      0.31    0.004  Azospira                     0.18%     0.09%     0.04%     0.03%
14606        0.25      0.04      0.71    0.005  Haliangiaceae                0.18%     0.07%     0.08%     0.05%
43954        0.11      0.00      0.89    0.005  Phenylobacterium             0.17%     0.03%     0.01%     0.00%
41360        0.61      0.09      0.30    0.008  Lactococcus                  0.16%     0.08%     6.74%     3.54%
9875         0.83      0.06      0.12    0.008  Alphaproteobacteria          0.14%     0.04%     0.00%     0.00%
41070        0.10      0.07      0.83    0.009  Chitinophagaceae             0.14%     0.05%     0.10%     0.02%
22155        0.87      0.00      0.12    0.003  Candidate division TM7       0.13%     0.03%     0.00%     0.00%
32176        0.14      0.03      0.83    0.003  Hydrogenophilaceae           0.13%     0.06%     0.02%     0.01%
31872        0.09      0.01      0.88    0.005  Bryobacter                   0.13%     0.03%     0.09%     0.04%
16742        0.92      0.00      0.08    0.003  Gordonia                     0.12%     0.03%     0.05%     0.02%
52371        0.32      0.08      0.60    0.005  Hyphomonadaceae              0.11%     0.05%     0.01%     0.01%
25575        0.28      0.08      0.64    0.003  Bacteroidetes                0.11%     0.05%     0.04%     0.02%
34754        0.68      0.05      0.28    0.004  Rhizobacter                  0.10%     0.04%     0.01%     0.01%

^a Only OTUs with >0.1% rDNA abundance were reported.

Table A5 Indicator OTUs – SSU rRNA:rDNA ratio

OTU^a    Day 1  Days 19, 34  Day 48  Days 65, 79, 98  Day 112  p val  Taxonomic classification  rRNA max  rRNA avg  rDNA max  rDNA avg
Abundant
14721     0.00         0.03    0.04             0.03     0.90  0.007  Verrucomicrobia              0.01%     0.01%     1.07%     0.32%
Intermediate
21475     0.04         0.06    0.80             0.01     0.09  0.008  Gordonia                     0.07%     0.03%     0.40%     0.07%
19222     0.03         0.07    0.69             0.05     0.16  0.009  Pirellula                    0.51%     0.18%     0.25%     0.12%
469       0.04         0.04    0.66             0.03     0.22  0.006  Afipia                       0.02%     0.01%     0.16%     0.08%
10095     0.06         0.09    0.68             0.07     0.11  0.007  Neisseriaceae                0.25%     0.18%     0.10%     0.04%
Rare
33323     0.02         0.10    0.08             0.59     0.21  0.007  Conthreep                    0.11%     0.03%     0.02%     0.00%
57134     0.00         0.02    0.91             0.00     0.07  0.007  Haplozoon                    0.57%     0.10%     0.00%     0.00%
2749      0.00         0.02    0.89             0.00     0.09  0.001  Haplozoon                    0.31%     0.06%     0.00%     0.00%
27860     0.00         0.02    0.89             0.00     0.08  0.003  Haplozoon                    0.12%     0.02%     0.00%     0.00%

^a Only OTUs with >0.1% rDNA or rRNA abundance were reported.

Appendix B – Chapter 3 supplementary material

[Figure: UDP-D-xylose biosynthesis pathway diagrams; panels: Microthrix parvicella Bio17-1 and Microthrix parvicella UBC strain]

Figure B1 UDP-D-xylose biosynthesis pathways in M. parvicella spp. Reactions indicated in bronze; genome ORF indicated in purple; enzyme classification number indicated in blue.


[Figure: dTDP-L-rhamnose biosynthesis pathway diagrams; panels: M. parvicella UBC strain and M. parvicella Bio17-1]

Figure B2 dTDP-L-rhamnose biosynthesis pathways in M. parvicella spp. Reactions indicated in bronze; genome ORF indicated in purple; enzyme classification number indicated in blue.

Table B1 Marker genes identified in population genome bins

Columns: TIGRFAM Accession; Function; Bin001 (M. parvicella); Bin002 (Gordonia spp.); Bin003 (N/A); Bin004 (N/A)

Total marker N/A 103 83 81 13
Unique marker N/A 99 80 72 12
PGK N/A 1 1 2 1
Ribosomal_L23 ribosomal protein L23 1 1
Ribosomal_L5 ribosomal protein L5 1 1
Ribosomal_L3 ribosomal protein L3 1 1 1
Ribosomal_L6 ribosomal protein L6 2 2 2
Ribosomal_S17 ribosomal protein S17 1
Ribosomal_S9 ribosomal protein S9 1 1 1
Ribosomal_S8 ribosomal protein S8 1 1 1
Ribosomal_S11 ribosomal protein S11 1 1
Ribosomal_S13 ribosomal protein S13 1 1 1
Ribosomal_L10 ribosomal protein L10 1 1 1
Ribosomal_L4 ribosomal protein L4 1 1 1
tRNA-synt_1d tRNA synthetases class I 1 2
GrpE protein GrpE 1 2 1 1
Methyltransf_5 methyltransferase 1 1 1 2
TIGR00001 ribosomal protein L35 1 1 1
TIGR00002 ribosomal protein S16 1 1 1
TIGR00009 ribosomal protein L28 1 1 1
TIGR00012 ribosomal protein L29 1
TIGR00019 peptide chain release factor 1
TIGR00029 ribosomal protein S20 1
TIGR00043 probable rRNA maturation factor YbeY 1 1 1
TIGR00059 ribosomal protein L17 1
TIGR00060 ribosomal protein L18 1 1
TIGR00061 ribosomal protein L21 1 1 1
TIGR00062 ribosomal protein L27 1 1 1
TIGR00064 signal recognition particle-docking protein FtsY 1 1
TIGR00082 ribosome-binding factor A 1 1 1
TIGR00086 SsrA-binding protein 1 1 2
TIGR00092 GTP-binding protein YchF 1 1 1
TIGR00115 trigger factor 1 1 2
TIGR00116 translation elongation factor Ts 1 1 2
TIGR00152 dephospho-CoA kinase 1 1 1 1
TIGR00158 ribosomal protein L9 1 1 1
TIGR00165 ribosomal protein S18 1 1 1
TIGR00166 ribosomal protein S6 1 1 1
TIGR00168 translation initiation factor IF-3 1 1 1
TIGR00234 tyrosine--tRNA ligase 1 1 2
TIGR00337 CTP synthase 1 1 1
TIGR00344 alanine--tRNA ligase 1 1
TIGR00362 chromosomal replication initiator protein DnaA 1 1 1
TIGR00389 glycine--tRNA ligase 1 1 1
TIGR00392 isoleucine--tRNA ligase 2
TIGR00396 leucine--tRNA ligase 2 1
TIGR00409 proline--tRNA ligase 1 1 1
TIGR00414 serine--tRNA ligase 1 1 2
TIGR00418 threonine--tRNA ligase 1 1
TIGR00420 tRNA (5-methylaminomethyl-2-thiouridylate)-methyltransferase 1 1 1
TIGR00422 valine--tRNA ligase
TIGR00435 cysteine--tRNA ligase 2 1 1
TIGR00436 GTP-binding protein Era 1 1 1
TIGR00442 histidine--tRNA ligase 1 1 1
TIGR00459 aspartate--tRNA ligase 1
TIGR00460 methionyl-tRNA formyltransferase 1 1
TIGR00468 phenylalanine--tRNA ligase, alpha subunit 1 1
TIGR00472 phenylalanine--tRNA ligase, beta subunit 1 1
TIGR00487 translation initiation factor IF-2 1 1 1 1
TIGR00496 ribosome recycling factor 1 2 1
TIGR00575 DNA ligase, NAD-dependent 1 1 2 1
TIGR00631 excinuclease ABC subunit B 1 1
TIGR00663 DNA polymerase III, beta subunit 1 1 1
TIGR00755 ribosomal RNA small subunit methyltransferase A 1 1
TIGR00810 preprotein translocase, SecG subunit 1 1 1
TIGR00855 ribosomal protein L7/L12 1 1 1
TIGR00922 transcription termination/antitermination factor NusG 1 1 1
TIGR00952 ribosomal protein S15 1 1 1
TIGR00959 signal recognition particle protein 1 1
TIGR00963 preprotein translocase, SecA subunit 1 1
TIGR00964 preprotein translocase, SecE subunit 1 1
TIGR00967 preprotein translocase, SecY subunit 1 1
TIGR00981 ribosomal protein S12 1 1
TIGR01009 ribosomal protein S3 1 1
TIGR01011 ribosomal protein S2 1 1 1 1
TIGR01017 ribosomal protein S4 1 1 1
TIGR01021 ribosomal protein S5 1 1
TIGR01024 ribosomal protein L19 1 1 1
TIGR01029 ribosomal protein S7 1 1
TIGR01030 ribosomal protein L34
TIGR01031 ribosomal protein L32 1 1
TIGR01032 ribosomal protein L20 1 1 1
TIGR01044 ribosomal protein L22 1 1
TIGR01049 ribosomal protein S10 1 1 1
TIGR01050 ribosomal protein S19 1 1 1
TIGR01059 DNA gyrase, B subunit 1
TIGR01063 DNA gyrase, A subunit 1
TIGR01066 ribosomal protein L13 1 1 1
TIGR01067 ribosomal protein L14 1 1
TIGR01071 ribosomal protein L15 1 1 1
TIGR01079 ribosomal protein L24 1 1
TIGR01164 ribosomal protein L16 1 1
TIGR01169 ribosomal protein L1 1 1 1
TIGR01171 ribosomal protein L2 1 1 1
TIGR01391 DNA primase 1 1
TIGR01393 elongation factor 4 1 1
TIGR01632 ribosomal protein L11 1 1 1
TIGR01953 transcription termination factor NusA 1 1 1
TIGR02012 protein RecA 1 1
TIGR02013 DNA-directed RNA polymerase, beta subunit 1 1 1
TIGR02027 DNA-directed RNA polymerase, alpha subunit 1
TIGR02191 ribonuclease III 1 1 1
TIGR02350 chaperone protein DnaK 1
TIGR02387 DNA-directed RNA polymerase, gamma subunit 1
TIGR02397 DNA polymerase III, subunit gamma and tau 1 1 1
TIGR02432 tRNA(Ile)-lysidine synthetase 1 1
TIGR02729 Obg family GTPase CgtA 1 1 1
TIGR03263 guanylate kinase 1
TIGR03594 ribosome-associated GTPase EngA 1 1

Table B2 Bin001 (Candidatus ‘Microthrix parvicella’) variable genomic regions

Columns: contig_orf; %ID M. parvicella RN1; %ID M. parvicella Bio17-1; %ID nr database; RefSeq annotation; non-redundant (nr) annotation

outer membrane biosynthesis (EPS-related)
1_89 100 #N/A 47 hypothetical protein [Caulobacter vibrioides] hypothetical protein HypdeDRAFT_1834
1_90 100 #N/A 34 putative secreted protein [Nocardioidaceae bacterium Broad-1] G-D-S-L family lipolytic protein
1_91 100 #N/A 30 #N/A glycosyl transferase family 2
1_92 100 #N/A no hit #N/A no hit
1_93 100 #N/A no hit #N/A no hit
1_94 100 #N/A no hit #N/A no hit
1_95 100 40.87 59 CoA transferase [Frankia sp. EAN1pec] formyl-CoA transferase
1_96 100 54.85 69 hypothetical protein [Phenylobacterium zucineum] hypothetical protein PHZ_c2268
1_97 100 28.34 50 hypothetical protein [Actinopolymorpha alba] Hydroxymethylglutaryl-CoA lyase
1_98 100 90.53 54 hypothetical protein [Candidatus Microthrix parvicella] carotenoid oxygenase
1_99 100 83.05 61 hypothetical protein [Candidatus Microthrix parvicella] acetyl-CoA acetyltransferase
1_100 100 #N/A no hit #N/A no hit
1_101 100 74.84 34 hypothetical protein [Candidatus Microthrix parvicella]
1_102 100 83.09 55 hypothetical protein [Candidatus Microthrix parvicella] enoyl-CoA hydratase
1_103 100 77.66 42 hypothetical protein [Candidatus Microthrix parvicella] short-chain dehydrogenase/reductase SDR
1_104 100 #N/A no hit #N/A no hit
1_105 100 36.84 33 #N/A periplasmic component of the Tol biopolymer transport system
1_106 100 38.05 29 group 1 glycosyl transferase [Acidothermus cellulolyticus] glycosyltransferase
1_107 100 #N/A 31 hypothetical protein [Salinicoccus albus] serine O-acetyltransferase
1_108 100 #N/A 34 #N/A Alpha-1,4-glucan-protein synthase, UDP-forming
1_109 100 32.89 40 hypothetical protein [Salinispora arenicola] dtdp-4-dehydrorhamnose reductase
1_110 100 36.21 35 3-demethylubiquinone-9 3-methyltransferase [alpha proteobacterium LLX12A] Methyltransferase type 11
1_111 100 37.3 41 glycosyl transferase family 2 [Prevotella loescheii] putative glycosyltransferase
1_112 99.93 #N/A 23 translocation protein TolB [Achromobacter arsenitoxydans] lipid A core-O-antigen ligase-like enyme
1_113 100 #N/A 30 glycosyl transferase [Arenimonas oryziterrae] cell wall biosynthesis glycosyltransferase
1_114 100 #N/A 28 polysaccharide biosynthesis family protein [Lyngbya aestuarii] polysaccharide biosynthesis protein
1_115 100 #N/A 26 #N/A exopolysaccharide transport protein
1_116 100 38.66 35 glycosyltransferase, group 1 family protein [delta proteobacterium NaphS2] putative family 2 glycosyltransferase WsfG
1_117 100 44.28 52 UDP-phosphate galactose phosphotransferase [Actinoplanes globisporus] UDP-phosphate galactose phosphotransferase
1_118 100 #N/A 33 N-formylglutamate amidohydrolase [Clostridium sp. KLE 1755] N-formylglutamate amidohydrolase superfamily
1_119 100 36.08 50 hypothetical protein [ canus] hypothetical protein SPAM21_01025
1_120 100 #N/A 65 hypothetical protein [Salinibacterium sp. PAMC 21357] hypothetical protein SPAM21_01030
1_121 100 88.24 53 hypothetical protein [Candidatus Microthrix parvicella] AMP-dependent synthetase and ligase
1_122 100 58.97 33 hypothetical protein [Candidatus Microthrix parvicella] putative hydrolase
1_123 100 67.34 40 hypothetical protein [Candidatus Microthrix parvicella] hypothetical protein
1_124 100 66.17 no hit hypothetical protein [Candidatus Microthrix parvicella] no hit
1_125 100 31.03 53 hypothetical protein [Azoarcus toluclasticus] ThiamineS protein
1_126 100 #N/A 62 glycosyl hydrolase [Cupriavidus taiwanensis] glycosyl hydrolase, bnr repeat

restriction-modification system, partial
2_1 #N/A 100 no hit #N/A no hit
2_2 #N/A 94.93 38 hypothetical protein [Candidatus Microthrix parvicella] 5-methylcytosine restriction system component-like protein
2_3 #N/A #N/A no hit #N/A no hit
2_4 #N/A #N/A 41 hypothetical protein [Kocuria sp. UCD-OTCP] restriction endonuclease
2_5 #N/A #N/A no hit #N/A no hit
2_6 #N/A #N/A no hit hypothetical protein [Dehalobacter sp. FTH1] no hit

antibiotic biosynthesis
2_26 #N/A #N/A no hit hypothetical protein, partial [Rhodococcus erythropolis] no hit
2_27 #N/A #N/A 45 hypothetical protein [Synechococcus elongatus] hydroxyneurosporene-O-methyltransferase
2_28 #N/A #N/A 57 Tryptophan halogenase PrnA [Streptomyces rimosus] halogenase
2_29 #N/A #N/A 34 hypothetical protein [SAR406 cluster bacterium SCGC AB-629-J13] Kynurenine 3-monooxygenase
2_30 #N/A #N/A 37 hypothetical protein [Photorhabdus temperata] hypothetical protein plu0999
2_31 #N/A 30.25 40 putative Histidinol-phosphate aminotransferase 2 [Streptomyces aurantiacus] class I and II (histidinol-phosphate) aminotransferase
2_32 #N/A #N/A 37 #N/A protein involved in biosynthesis of mitomycin antibiotics/polyketide
2_33 #N/A #N/A no hit #N/A no hit
2_34 #N/A #N/A no hit #N/A no hit
2_35 #N/A #N/A 34 #N/A transposase of ISGme9, IS481 family
2_36 #N/A #N/A no hit hypothetical protein [candidate division OP9 bacterium SCGC AAA255-E04] no hit

antibiotic resistance
3_55 #N/A #N/A no hit conserved hypothetical protein [Candidatus Microthrix parvicella] no hit
3_56 #N/A #N/A 40 Plasmid stabilization system protein [Plesiocystis pacifica] Plasmid stabilization system protein
3_57 98.51 98.51 47 hypothetical protein [Candidatus Microthrix parvicella] cytochrome P450
3_58 91.19 91.19 38 putative Transcriptional regulator [Candidatus Microthrix parvicella] putative TetR family transcriptional regulator
3_59 93.65 93.65 53 catechol 2,3 dioxygenase [Arthrobacter gangotriensis] glyoxalase/bleomycin resistance protein/dioxygenase
3_60 #N/A #N/A 78 cupin [Gordonia rhizosphera] cupin domain-containing protein
3_61 #N/A #N/A no hit #N/A no hit
3_62 #N/A 39.24 56 hypothetical protein [Streptomyces sp. TOR3209] bifunctional deaminase-reductase domain protein
3_63 #N/A 46.67 66 hypothetical protein [Amycolatopsis nigrescens] glyoxalase/bleomycin resistance protein/dioxygenase
3_64 #N/A #N/A no hit #N/A no hit
3_65 #N/A #N/A no hit #N/A no hit

3_128 #N/A #N/A 55 transposase [Comamonadaceae] transposase IS4 family protein
3_129 92.13 82.54 35 hypothetical protein [Candidatus Microthrix parvicella] transposase IS4 family protein
3_130 #N/A 50.94 55 integrase [Mycobacterium] transposase
3_131 #N/A #N/A no hit tranposase [Corynebacterium diphtheriae] no hit
3_132 #N/A 26.7 40 hypothetical protein [Streptomyces sp. PVA 94-07] putative aspartate/aromatic aminotransferase
3_133 #N/A 28.81 37 hypothetical protein [Nocardiopsis ganjiahuensis] Inositol phosphatase/fructose-1,6-bisphosphatase:Inositol monophosphatase
3_134 #N/A #N/A 38 hypothetical protein [Amycolatopsis sp. ATCC 39116] sulfotransferase
3_135 #N/A 25.56 42 adenylylsulfate kinase [Pyrobaculum arsenaticum] putative adenylyl-sulfate kinase (modular protein)
3_136 #N/A 29.2 50 hypothetical protein [Chitiniphilus shinanonensis] putative methyltransferase
3_137 #N/A #N/A 43 hypothetical protein [Chitiniphilus shinanonensis] Phosphoenolpyruvate synthase/pyruvate phosphate dikinase
3_138 #N/A #N/A 47 hypothetical protein [Amycolatopsis sp. ATCC 39116] putative gamma-glutamyl-gamma-aminobutyrate hydrolase
3_139 #N/A #N/A 45 hypothetical protein [Methylotenera mobilis] nucleotidyl transferase
3_140 #N/A 34.84 46 isoleucyl-tRNA synthase [Streptomyces] isoleucyl-tRNA synthetase
3_141 #N/A #N/A 24 #N/A Proline--tRNA ligase
3_142 #N/A #N/A 46 peptidase U61 LD-carboxypeptidase A [Micromonospora sp. CNB394] peptidase U61 LD-carboxypeptidase A
3_143 #N/A 58.1 60 leucyl-tRNA synthetase [Frankia sp. Iso899] leucyl-tRNA synthetase
3_144 #N/A #N/A no hit #N/A no hit
3_145 100 #N/A 57 hypothetical protein [Salinispora pacifica] hypothetical protein Mspyr1_53990
3_146 99.08 29.59 61 hypothetical protein [Salinispora pacifica] site-specific recombinase XerD
3_147 99.72 29.68 63 hypothetical protein [Salinispora pacifica] site-specific recombinase XerD
3_148 41.24 48 45 integrase [ sp. UCD-THP] transposase

8_1 100 #N/A rRNA adenine methyltransferase [Streptomyces violaceusniger]

8_2 96.53 #N/A hypothetical protein [Gordonia rhizosphera]

8_3 97.48 #N/A #N/A

8_4 100 #N/A #N/A

8_5 94.55 #N/A #N/A

8_6 93.59 #N/A hypothetical protein [Mycobacterium smegmatis]

8_7 99.73 #N/A AAA ATPase [Frankia sp. CN3]

8_8 90.15 #N/A hypothetical protein [Rhodococcus sp. P14]

8_9 88.99 48.08 DNA methylase N-4/N-6 domain protein [Frankia sp. CN3]

8_10 83.41 #N/A antirestriction protein [Synechococcus sp. PCC 7002]

8_11 #N/A #N/A #N/A

8_12 #N/A #N/A hypothetical protein [Nocardia sp. 348MFTsu5.1]

8_13 92.16 #N/A hypothetical protein [Frankia sp. CN3]

8_14 #N/A #N/A #N/A

8_15 #N/A #N/A #N/A

8_16 90.38 #N/A #N/A

8_17 80.19 #N/A #N/A

8_18 97.87 34.88 XRE family transcriptional regulator [Rhodococcus sp. P14]

8_19 94.69 #N/A hypothetical protein [Rhodococcus sp. P14]

8_20 92.64 25.17 recombinase [Rhodococcus sp. P14]

outer membrane biosynthesis (EPS-related)
9_1 100 #N/A no hit #N/A no hit
9_2 33.33 32.6 55 glutamate-1-semialdehyde aminotransferase-like protein [Thauera linaloolentis] putative bifunctional protein Glutamate-1-semialdehyde 2,1-aminomutase/3-deoxy-manno-octulosonate
9_3 83.97 #N/A 26 hypothetical protein [Afipia broomeae] aldo/keto reductase
9_4 #N/A #N/A no hit #N/A no hit
9_5 #N/A 30.56 57 hypothetical protein [Pseudomonas veronii] putative polysaccharide biosynthesis protein
9_6 33.33 33.46 70 transposase [Arthrobacter sp. 161MFSha2.1] integrase catalytic subunit
9_7 40.45 #N/A 74 transposase [Curtobacterium sp. B8] transposase IS3/IS911 family protein
9_8 49.51 #N/A 56 N-acetylneuraminate synthase [Salisaeta longa] N-acetylneuraminate synthase
9_9 24.15 #N/A 30 hypothetical protein [Prochlorothrix hollandica] glycosyltransferase family 28 protein
9_10 90.91 34.27 50 DegT/DnrJ/EryC1/StrS aminotransferase [Pseudomonas putida] DegT/DnrJ/EryC1/StrS aminotransferase
9_11 99.08 34.68 66 N-acetyl glucosamine/N-acetyl galactosamine epimerase [Amycolatopsis benzoatilytica] polysaccharide biosynthesis protein CapD
9_12 100 43.16 32 hypothetical protein [Sphingomonas wittichii] conserved hypothetical protein
9_13 100 32.56 40 ABC transporter, ATP-binding/permease protein [Tetrasphaera elongata] ABC transporter
9_14 100 30.17 42 type 11 methyltransferase [Natrialba asiatica] type 11 methyltransferase
9_15 100 #N/A 28 hypothetical protein [Paenibacillus fonticola] glycosyl transferase, group 1 family protein
9_16 99.63 #N/A 32 hypothetical protein [Actinoplanes globisporus] Phytanoyl-CoA dioxygenase (PhyH)

10_69 #N/A #N/A 38 #N/A transcriptional regulator, putative ATPase, winged helix family
10_70 #N/A 26.82 30 transcriptional regulator, partial [Corynebacterium-like bacterium B27] serine/threonine protein kinase
10_71 #N/A #N/A no hit #N/A no hit
10_72 #N/A #N/A 53 hypothetical protein [Nostoc sp. PCC 7120] hypothetical protein all7065
10_73 #N/A #N/A 45 hypothetical protein [Singularimonas variicoloris] XRE family transcriptional regulator
10_74 #N/A 33.54 48 serine kinase [Bradyrhizobium japonicum] HipA domain-containing protein
10_75 #N/A 33.99 40 putative uncharacterized protein [Bacteroides intestinalis CAG:315] hypothetical protein MLP_42460
10_76 42.03 #N/A 40 hypothetical protein [Kocuria rhizophila] hypothetical protein MMAR_4827
10_77 #N/A #N/A no hit #N/A no hit
10_78 #N/A #N/A 33 hypothetical protein [Ilumatobacter coccineus] glyoxalase/bleomycin resistance protein/dioxygenase

19_59 100 #N/A no hits #N/A no hits 19_60 100 #N/A no hits #N/A no hits 19_61 100 37.82 46 hypothetical protein [Amycolatopsis decaplanina] hypothetical protein Rv3179 outer membrane autotransporter barrel 19_62 #N/A #N/A 40 #N/A domain 19_63 #N/A #N/A no hit #N/A no hit excisionase family DNA binding domain- 19_64 #N/A 38.41 45 #N/A containing protein

111 19_65 #N/A #N/A 39 hypothetical protein [Thermus islandicus] hypothetical protein Caur_3718 19_66 #N/A 100 transcriptional regulator [Methylobacter marinus]

19_67 #N/A 99.47 35 filamentation induced by cAMP protein Fic [Frankia sp. EUN1f] filamentation induced by cAMP protein Fic 19_68 #N/A 98.84 hypothetical protein [Candidatus Microthrix parvicella]

19_69 #N/A 87.61 hypothetical protein [Candidatus Microthrix parvicella]

19_70 #N/A 88.37 hypothetical protein [Candidatus Microthrix parvicella]

19_71 #N/A #N/A #N/A outer membrane protien (EPS-related) LPXTG-motif cell wall anchor domain 22_45 #N/A 96.98 50 hypothetical protein [Candidatus Microthrix parvicella] protein 22_46 #N/A 100 40 hypothetical protein [Candidatus Microthrix parvicella] heme oxygenase 22_47 69.78 100 42 hypothetical protein [Candidatus Microthrix parvicella] PilT protein domain protein 22_48 84.78 98.91 54 hypothetical protein [Candidatus Microthrix parvicella] prevent-host-death family protein 22_49 95 99.69 43 hypothetical protein [Candidatus Microthrix parvicella] exodeoxyribonuclease V alpha chain 22_50 38.35 99.56 37 hypothetical protein [Candidatus Microthrix parvicella] exodeoxyribonuclease V subunit beta 22_51 35.03 99.5 36 hypothetical protein [Candidatus Microthrix parvicella] exodeoxyribonuclease V subunit gamma 22_52 #N/A 99.38 50 hypothetical protein [Candidatus Microthrix parvicella] putative exonuclease SbcC 22_53 32 100 40 hypothetical protein [Candidatus Microthrix parvicella] putative exonuclease 22_54 #N/A 98.43 no hit hypothetical protein [Candidatus Microthrix parvicella] no hit 22_55 #N/A 98.33 39 hypothetical protein [Candidatus Microthrix parvicella] putative endoribonuclease 22_56 #N/A 98.89 56 hypothetical protein [Candidatus Microthrix parvicella] beta-lactamase domain-containing protein 22_57 99.63 99.88 33 hypothetical protein [Candidatus Microthrix parvicella] outer membrane adhesin like proteiin 22_58 #N/A 100 42 hypothetical protein [Candidatus Microthrix parvicella] restriction-modification system transposase of ISAar4, IS3 family, IS3 group, 31_35 31.82 81.17 55 hypothetical protein [Candidatus Microthrix parvicella] orfB 31_36 #N/A 89.58 60 hypothetical protein [Candidatus Microthrix parvicella] transposase IS3/IS911 family protein 31_37 #N/A 65.85 no hit #N/A no hit 31_38 #N/A 51.54 48 hypothetical protein [Candidatus Microthrix parvicella] hypothetical protein SZN_33891 31_39 #N/A 27.1 35 transposase [Bordetella petrii] integrase catalytic subunit 
31_40 #N/A 30.65 48 hypothetical protein [Alicyclobacillus acidoterrestris] IstB ATP binding domain-containing protein 31_41 #N/A #N/A no hit #N/A no hit 31_42 70.53 72.18 47 hypothetical protein [Candidatus Microthrix parvicella] Type II restriction enzyme, methylase subunit

31_43 #N/A #N/A #N/A no hit

32_1 #N/A #N/A no hit #N/A no hit 32_2 #N/A 31.4 38 hypothetical protein [Frankia sp. CN3] hypothetical protein AN3325.2 32_3 #N/A #N/A 35 integrase [Xanthomonas vesicatoria] integrase, catalytic region 32_4 #N/A 35.03 70 hypothetical protein [Amycolatopsis methanolica] transposase 32_5 88.37 90.7 80 resolvase [Candidatus Microthrix parvicella] invertase/recombinase-like protein 32_6 #N/A 97.37 no hit #N/A no hit 32_7 #N/A 100 57 hypothetical protein [Candidatus Microthrix parvicella] putative acyl-CoA dehydrogenase FADE16 ABC-type phosphate/phosphonate transport 32_8 #N/A 99.63 62 hypothetical protein [Candidatus Microthrix parvicella] system, periplasmic component restriction-modification system EcoKI restriction-modification system protein 34_39 #N/A 100 29 hypothetical protein [Candidatus Microthrix parvicella] HsdS 34_40 32.86 100 36 hypothetical protein [Candidatus Microthrix parvicella] hypothetical protein Noca_4767 34_41 68.33 100 71 hypothetical protein [Candidatus Microthrix parvicella] phage Gp37Gp68 family protein type I restriction modification system, 34_42 #N/A 96.72 68 hypothetical protein [Candidatus Microthrix parvicella] methyltransferase subunit 34_43 #N/A 100 41 hypothetical protein [Candidatus Microthrix parvicella] hypothetical protein Rhom172_2840 34_44 #N/A 98.9 60 hypothetical protein [Candidatus Microthrix parvicella] ATPase outer membrane biosynthesis (EPS-related) 39_20 89.08 28.99 no hit #N/A no hit 39_21 89.36 #N/A no hit #N/A no hit 39_22 89.55 30.88 no hit #N/A no hit 39_23 84.32 #N/A no hit #N/A no hit 39_24 90.2 91.62 61 hypothetical protein [Candidatus Microthrix parvicella] integrase catalytic subunit 39_25 99.75 100 53 hypothetical protein [Candidatus Microthrix parvicella] phage integrase 39_26 46.79 99.68 57 hypothetical protein [Candidatus Microthrix parvicella] integrase 39_27 59.16 100 65 putative integrase/recombinase y4rC [Candidatus Microthrix parvicella] tyrosine recombinase XerC 39_28 100 94.24 56 putative
Integrase/transposase (fragment) [Candidatus Microthrix parvicella] Integrase catalytic region 39_29 100 91.58 no hit #N/A no hit 39_30 98.45 97.67 hypothetical protein [Candidatus Microthrix parvicella] transposition helper protein

39_31 100 #N/A no hit #N/A no hit

39_32 100 #N/A 54 hypothetical protein [Candidatus Microthrix parvicella] conserved hypothetical protein alkyl hydroperoxide reductase/ Thiol specific 39_33 100 34.21 39 alkyl hydroperoxide reductase [Caldithrix abyssi] antioxidant/ Mal allergen 39_34 100 #N/A 43 phytanoyl-CoA dioxygenase [Parvibaculum lavamentivorans] phytanoyl-CoA dioxygenase 39_35 100 #N/A 35 phytanoyl-CoA dioxygenase [Streptomyces clavuligerus] phytanoyl-CoA dioxygenase 39_36 100 #N/A 29 hypothetical protein [Saccharopolyspora spinosa] SnoK-like protein 39_37 100 #N/A no hit #N/A no hit 39_38 100 85.76 44 hypothetical protein [Candidatus Microthrix parvicella] putative acyl-CoA N-acyltransferase 39_39 100 #N/A no hit #N/A no hit 39_40 100 51.56 42 hypothetical protein [Cesiribacter andamanensis] D-alanine export protein 39_41 100 98.19 33 hypothetical protein [Candidatus Microthrix parvicella] transposase of ISAar12, IS1380 family 39_42 100 42.8 58 O-acyltransferase [Riemerella anatipestifer] acyltransferase 39_43 99.73 45.9 46 hypothetical protein [Candidatus Microthrix parvicella] Lipopolysaccharide biosynthesis protein 39_44 99.64 #N/A 30 uncharacterized protein [Clostridium sp. CAG:411] hypothetical protein 39_45 99.38 #N/A 37 #N/A hypothetical protein MAXJ12_18373 restriction-modification system 48_1 96.97 #N/A 56 transposase [Nakamurella multipartita] transposase 48_2 29.19 #N/A 27 #N/A hypothetical protein 48_3 #N/A #N/A no hit #N/A no hit 48_4 #N/A #N/A 24 #N/A hypothetical protein GORBP_083_00290 48_5 #N/A #N/A 44 hypothetical protein [Streptomyces sp. ScaeMP-e10] DNA restriction-modification system protein restriction-modification system 57_17 #N/A #N/A 34 #N/A restriction endonuclease 57_18 #N/A #N/A 37 phage integrase family protein [Mycobacterium parascrofulaceum] phage integrase family protein 57_19 #N/A 29.44 43 putative transposase [Arthrobacter nicotinovorans] integrase family protein 57_20 #N/A #N/A 35 hypothetical protein [Frankia sp.
Iso899] hypothetical protein HMPREF1020_00117 57_21 #N/A #N/A no hit #N/A no hit toxin/antibiotic resistance 69_1 #N/A 83.33 40 putative integrase/recombinase y4rA [Candidatus Microthrix parvicella] putative integrase/recombinase 69_2 #N/A 56.25 no hit hypothetical protein [Candidatus Microthrix parvicella] no hit 69_3 #N/A #N/A 66 toxic anion resistance protein [Frankia sp. EAN1pec] toxic anion resistance family protein 69_4 #N/A #N/A 41 #N/A hypothetical protein Krad_3315 69_5 #N/A #N/A 66 hypothetical protein [Streptomyces sp. MspMP-M5] glyoxalase/bleomycin resistance/dioxygenase

69_6 #N/A #N/A no hit #N/A no hit Haloacid dehalogenase domain protein 69_7 #N/A #N/A 28 hypothetical protein [Frankia sp. CcI3] hydrolase type 3 Plasma membrane calcium-transporting 69_8 #N/A 31.52 37 calcium-translocating P-type ATPase PMCA-type [Bacteroides sp. CAG:462] ATPase 69_9 #N/A #N/A 54 hypothetical protein [Actinomadura atramentaria] conserved hypothetical protein 69_10 #N/A #N/A 33 phosphoribosyltransferase [Paenibacillus sp. ICGEB2008] hypothetical protein PPE_00945 69_11 #N/A #N/A 41 ATP/GTP-binding protein [Actinomadura atramentaria] ATP/GTP-binding protein 69_12 #N/A #N/A 55 tellurium resistance protein TerA [Deinococcus maricopensis] tellurium resistance protein TerA 69_13 #N/A #N/A 72 chemical-damaging agent resistance protein C [Pasteurella pneumotropica] tellurium resistance protein TerD 69_14 #N/A #N/A 71 chemical-damaging agent resistance protein C [Pseudomonas fragi] tellurium resistance protein TerD 69_15 #N/A #N/A 63 stress protein [Actinomadura atramentaria] stress protein outer membrane/cell wall biosynthesis 95_1 #N/A #N/A 31 capsular polysaccharide biosynthesis protein [Fusobacterium sp. CAG:815] capsular polysaccharide biosynthesis protein 95_2 #N/A #N/A 32 #N/A putative aminoglycoside phosphotransferase 95_3 #N/A 28.79 32 hypothetical protein [Bacillus sp. L1(2012)] nucleoside-diphosphate-sugar transferase 95_4 #N/A 62.02 63 valyl-tRNA synthetase [Thermobifida fusca] valS gene product 95_5 #N/A #N/A no hit #N/A no hit 95_6 #N/A 33.98 26 phage integrase domain protein, partial [Bacillus nealsonii] integrase family protein

108_1 #N/A 100 #N/A

108_2 #N/A 100 hypothetical protein [candidate division EM 19 bacterium JGI 0000001-G10]

108_3 #N/A 100 hypothetical protein [Candidatus Microthrix parvicella]

108_4 #N/A 100 hypothetical protein [Candidatus Microthrix parvicella]

108_5 #N/A #N/A #N/A

108_6 #N/A 99.63 hypothetical protein [Candidatus Microthrix parvicella]

108_7 #N/A #N/A #N/A

108_8 100 #N/A hypothetical protein [Mycobacterium sp. 360MFTsu5.1]

108_9 100 #N/A #N/A

108_10 100 #N/A methylenetetrahydrofolate reductase [Nonomuraea coxensis]

108_11 100 #N/A hypothetical protein, partial [Gordonia paraffinivorans]

108_12 100 #N/A hypothetical protein [Gordonia paraffinivorans]

108_13 100 #N/A hypothetical protein [Vibrio nigripulchritudo]

108_14 100 #N/A hypothetical protein [Vibrio nigripulchritudo]

108_15 #N/A #N/A #N/A polysaccharide biosynthesis (likely cell wall-related) 130_1 #N/A 31.34 41 alcohol dehydrogenase [Fulvimarina pelagi] Nucleotidyl transferase 130_2 #N/A 33.82 31 glycosyl transferase family 2 [Chthoniobacter flavus] glycosyl transferase family protein 130_3 #N/A #N/A 37 metallophosphatase [Scytonema hofmanni] metallophosphoesterase putative N-acetylglucosamine-6-phosphate 130_4 #N/A #N/A 43 hypothetical protein [Actinomadura atramentaria] deacetylase 130_5 #N/A #N/A 34 hypothetical protein [Candidatus Poribacteria sp. WGA-4E] predicted protein 130_6 #N/A 36.68 36 hypothetical protein [Streptomyces sp. CNT372] glycosyl transferase family protein 130_7 #N/A 28.9 29 epimerase [Desulfotomaculum carboxydivorans] NAD-dependent epimerase/dehydratase 130_8 29.11 28.85 30 coenzyme PQQ biosynthesis protein E [Pyrobaculum aerophilum] family 2 glycosyl transferase 130_9 #N/A 37.89 34 hypothetical protein [Verrucomicrobium spinosum] glycosyltransferase, group 2 family protein pyrroloquinoline quinone biosynthesis protein 130_10 #N/A 26.11 31 hypothetical protein [Methanosarcina barkeri] E 130_11 #N/A #N/A 30 glycosyltransferase family protein [delta proteobacterium NaphS2] glycosyl transferase family 2 130_12 #N/A #N/A 34 #N/A glycosyltransferase

321_1 #N/A 33.61 48 hypothetical protein [Amycolatopsis balhimycina] transposase 321_2 #N/A 53.33 51 hypothetical protein [Candidatus Microthrix parvicella] transposase 321_3 #N/A 34.65 51 hypothetical protein [Nocardioides sp. CF8] putative transposase 321_4 #N/A #N/A no hit #N/A no hit 321_5 #N/A 30.24 59 transposase [Corynebacterineae] putative transposase 321_6 #N/A 29.07 67 integrase family protein [Rhodococcus wratislaviensis] integrase family protein 321_7 #N/A 30.16 44 hypothetical protein [Gordonia polyisoprenivorans] transposase IS3/IS911

Table B3 Bin002 (Gordonia spp.) variable regions

contig_orf; %ID G. terrae; %ID G. rhizosphera; %ID G. amarae; %ID nr; RefSeq database annotation; Non-redundant (nr) database annotation

6_14 #N/A #N/A #N/A 88 hypothetical protein [Mycobacterium marinum] hypothetical protein GOALK_093_00260 6_15 #N/A #N/A #N/A 46 hypothetical protein [Gordonia alkanivorans] ref|YP_005585358.1| 6_16 #N/A #N/A #N/A na #N/A no hit

10_1 #N/A #N/A #N/A 49 hypothetical protein [Candidatus Microthrix parvicella] oxidoreductase domain protein 10_2 #N/A 29.69 #N/A 31 hypothetical protein [Saccharomonospora saliphila] cupin superfamily protein 10_3 #N/A #N/A #N/A #N/A no hit

10_4 #N/A #N/A #N/A #N/A no hit

10_5 #N/A #N/A #N/A hypothetical protein [Candidatus Microthrix parvicella] no hit

10_6 #N/A #N/A #N/A 37 hypothetical protein [Streptomyces vitaminophilus] cupin 4 family protein 10_7 #N/A #N/A #N/A 35 #N/A cupin 4 10_8 #N/A #N/A #N/A 36 #N/A hypothetical protein MC7420_2498 10_9 #N/A #N/A #N/A 30 #N/A cupin 4 family protein 10_10 #N/A #N/A #N/A 28 cupin superfamily protein [Janibacter hoylei] cupin 10_11 #N/A #N/A #N/A 28 hypothetical protein [Candidatus Microthrix parvicella] hypothetical protein SGM_5272 10_12 #N/A #N/A #N/A 27 #N/A hypothetical protein P9303_19051 10_13 #N/A #N/A #N/A 55 carbamoyl transferase [Hahella chejuensis] carbamoyl transferase 10_14 44.67 35.8 #N/A 58 hypothetical protein [Micromonospora sp. CNB394] ISMsm2, transposase toxin-antitoxin systems (toxin) 11_7 #N/A 49.64 56.52 Hypothetical protein [Methylobacterium mesophilicum] PilT protein-like

11_8 #N/A #N/A 62.86 88 hypothetical protein [Thiomonas sp. FB-6] prevent-host-death family protein 11_9 #N/A #N/A #N/A 76 hypothetical protein [Nocardioides sp. Iso805N] hypothetical protein Gbro_3426 11_10 #N/A #N/A #N/A 45 hypothetical protein [Nocardioides sp. Iso805N] PIN domain protein Fatty-acid metabolism 20_6 #N/A #N/A 30.79 44 hypothetical protein [Cellulomonas sp. JC225] secretory lipase 20_7 29.05 #N/A 43.48 52 putative serine/threonine protein kinase [Gordonia hirsuta] putative serine/threonine protein kinase 20_8 #N/A #N/A #N/A 35 putative lipase [Gordonia hirsuta] secreted protein 20_9 #N/A #N/A 46.88 47 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_13_00090

20_10 #N/A #N/A 37.91 38 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_13_00090 Fatty-acid metabolism 21_1 #N/A #N/A #N/A #N/A no hit

21_2 37.04 36.77 #N/A 69 hypothetical protein [Gordonia soli] Acetyl-CoA acetyltransferase 21_3 62.37 #N/A #N/A 69 hypothetical protein [Gordonia terrae] hypothetical protein ROP_17870 21_4 32.14 #N/A #N/A 72 hypothetical protein [Gordonia terrae] hypothetical protein GOTRE_175_01880 21_5 84.77 #N/A #N/A 87 Acetyl-CoA acetyltransferase [Gordonia terrae] Acetyl-CoA acetyltransferase 21_6 55.16 57.44 55.35 76 aldehyde dehydrogenase [Rhodococcus ruber] putative aldehyde dehydrogenase 21_7 31.62 #N/A 28.45 67 CAIB/BAIF family protein [Rhodococcus sp. EsD8] formyl-coenzyme A transferase 21_8 82.12 54.17 53.65 83 acyl-CoA dehydrogenase [Gordonia terrae] acyl-CoA dehydrogenase 21_9 56.98 #N/A #N/A 57 putative acetyltransferase [Gordonia amicalis] putative acetyltransferase 21_10 79.92 #N/A 49.43 82 Enoyl-CoA hydratase / carnithine racemase [Gordonia terrae] Enoyl-CoA hydratase / carnithine racemase 21_11 69.58 63.2 #N/A 74 enoyl-CoA hydratase [Rhodococcus rhodochrous] enoyl-CoA hydratase

virion core protein (lumpy skin disease virus)-like 35_1 #N/A #N/A 94.05 50 hypothetical protein [Gordonia amarae] protein 35_2 #N/A #N/A 89.23 91 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_28_00130 35_3 30.4 30.83 87.74 88 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_28_00140 35_4 #N/A #N/A 94.42 66 short-chain dehydrogenase [Frankia sp. CN3] short-chain dehydrogenase/reductase SDR 35_5 29.56 #N/A 84.3 43 hypothetical protein [Gordonia amarae] AMP-dependent synthetase and ligase 35_6 #N/A 46 67.22 68 putative LuxR family transcriptional regulator [Gordonia amarae] putative LuxR family transcriptional regulator 35_7 28.32 27.8 #N/A 51 #N/A putative LuxR family transcriptional regulator 35_8 #N/A #N/A 65.17 66 putative LuxR family transcriptional regulator [Gordonia amarae] putative LuxR family transcriptional regulator Fatty-acid metabolism/ toxin-antitoxin 36_1 #N/A #N/A 83.12 84 hypothetical protein [Acetobacter pasteurianus] hypothetical protein GOAMR_43_00440 36_2 #N/A 24.19 86.9 87 hypothetical protein [Nocardia sp. BMG111209] putative major facilitator superfamily transporter 36_3 #N/A #N/A 93.95 64 lipase [Nocardia farcinica] putative lipase 36_4 #N/A #N/A 91.61 92 hypothetical protein [Gordonia hirsuta] hypothetical protein GOAMR_43_00470 36_5 #N/A #N/A #N/A 47 hypothetical protein [Gordonia bronchialis] hypothetical protein Gbro_0376 36_6 28.8 44.34 #N/A 45 UDP-glucuronosyltransferase [Gordonia rhizosphera] UDP-glucuronosyl/UDP-glucosyltransferase 36_7 #N/A 52.02 #N/A 55 TetR family transcriptional regulator [Gordonia rhizosphera] TetR family transcriptional regulator 36_8 33.94 42.86 27.94 86 putative RutC family protein YjgH [Streptomyces aurantiacus] endoribonuclease L-PSP 36_9 #N/A #N/A 78.68 50 hypothetical protein [Gordonia amarae] DNA polymerase beta domain-containing protein

36_10 #N/A #N/A #N/A 64 twitching protein PilT [Actinomyces massiliensis] PIN domain-containing protein 36_11 #N/A 36.23 #N/A 75 prevent-host-death protein [Arthrobacter sp. PAO19] prevent-host-death family protein

37_1 #N/A #N/A #N/A 77 sulfate transporter [Gordonia rhizosphera] putative sulfate transporter 37_2 36.63 36.9 30.98 78 NADPH:quinone reductase [Gordonia rhizosphera] putative oxidoreductase 37_3 #N/A 61.36 #N/A 88 AcrR family transcriptional regulator [Gordonia rhizosphera] putative TetR family transcriptional regulator 37_4 71.78 69.53 #N/A 93 cyclic diguanylate phosphodiesterase [Gordonia rubripertincta] hypothetical protein GOAMR_07_00140 37_5 89.15 90.93 #N/A 98 putative aldehyde dehydrogenase [Gordonia soli] putative aldehyde dehydrogenase 37_6 86.76 85.4 #N/A 94 hypothetical protein [Gordonia] hypothetical protein GOAMR_07_00160 37_7 #N/A #N/A #N/A 88 hypothetical protein [Gordonia soli] hypothetical protein GOAMR_07_00170 37_8 29.37 31.22 30.17 86 alpha/beta hydrolase [Gordonia polyisoprenivorans] putative carboxylesterase 37_9 #N/A #N/A #N/A 93 glutamate synthase large subunit [Gordonia soli] glutamate synthase large subunit 37_10 #N/A #N/A #N/A 93 glutamate synthase [Gordonia] glutamate synthase large subunit

38_1 55.37 #N/A #N/A 61 hypothetical protein [Nocardia sp. 348MFTsu5.1] isoniazid inducible protein iniC 38_2 49.57 #N/A #N/A 52 hypothetical protein [Nocardia sp. 348MFTsu5.1] isoniazid inducible protein IniA 38_3 #N/A #N/A #N/A hypothetical protein [Gordonia effusa] no hit

38_4 #N/A #N/A #N/A 43 hypothetical protein [Nocardia sp. 348MFTsu5.1] hypothetical protein GOEFS_014_00250 38_5 #N/A #N/A #N/A 45 #N/A hypothetical protein MPHLEI_10590 38_6 33.66 #N/A #N/A 35 hypothetical protein, partial [Rhodococcus sp. JVH1] conserved hypothetical proline and threonine rich protein 38_7 #N/A #N/A #N/A 39 putative LuxR family transcriptional regulator [Gordonia effusa] putative LuxR family transcriptional regulator Bacteriophage 41_1 #N/A #N/A #N/A 29 #N/A minor tail protein 41_2 #N/A #N/A #N/A 50 hypothetical protein [Gordonia sihwensis] gp23 phage tail tape measure protein, TP901 family, core 41_3 #N/A #N/A #N/A 51 hypothetical protein [Gordonia sihwensis] region 41_4 #N/A #N/A #N/A #N/A

41_5 #N/A #N/A #N/A #N/A no hit

41_6 #N/A #N/A #N/A hypothetical protein [Nocardia farcinica]

41_7 #N/A #N/A #N/A hypothetical protein [Gordonia sihwensis]

41_8 #N/A #N/A #N/A hypothetical protein [Gordonia sihwensis]

41_9 #N/A #N/A #N/A #N/A no hit

41_10 #N/A #N/A #N/A #N/A no hit

41_11 #N/A #N/A #N/A hypothetical protein [Gordonia sihwensis]

41_12 #N/A #N/A #N/A #N/A no hit

41_13 #N/A #N/A #N/A hypothetical protein [Gordonia sihwensis]

50_1 #N/A #N/A #N/A 96 ABC transporter [Gordonia namibiensis] putative UvrA-like protein 50_2 #N/A #N/A #N/A #N/A no hit

50_3 #N/A #N/A #N/A #N/A no hit

50_4 #N/A #N/A #N/A #N/A no hit

50_5 #N/A #N/A #N/A #N/A no hit

50_6 #N/A #N/A 55.34 56 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_38_00120 50_7 42.24 #N/A 72.27 73 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_38_00110

51_1 #N/A #N/A 65.27 hypothetical protein [Gordonia aichiensis]

51_2 #N/A #N/A #N/A #N/A no hit

51_3 #N/A #N/A #N/A #N/A no hit

51_4 #N/A #N/A #N/A 61 hypothetical protein [Gordonia malaquae] XRE family transcriptional regulator 51_5 #N/A #N/A #N/A 67 hypothetical protein [Mycobacterium tuberculosis complex] hypothetical protein MLP_05470 51_6 33.57 43.28 #N/A 69 putative helicase [Gordonia aichiensis] putative helicase 51_7 #N/A #N/A #N/A 72 putative Xre family DNA-binding protein [Gordonia aichiensis] putative Xre family DNA binding protein 51_8 25.73 78.42 94.33 95 glycerol kinase [Gordonia soli] glycerol kinase Hydrogen metabolism 52_1 #N/A 57.97 #N/A 62 hypothetical protein [Kineosphaera limosa] NHL repeat containing protein 52_2 #N/A 45.98 #N/A 54 NiFe hydrogenase maturation protein [Nocardioides sp. CF8] 52_3 #N/A 30.26 #N/A 62 hydrogenase [Nocardia sp. 348MFTsu5.1] hydrogenase assembly chaperone HypC/HupF 52_4 #N/A 79.34 #N/A 81 hydrogenase formation protein HypD [Actinoplanes globisporus] hydrogenase maturation protein HypD 52_5 #N/A 66.38 #N/A 70 hydrogenase [Mycobacterium vaccae] hydrogenase maturation protein HypE 52_6 55.72 #N/A #N/A 55 hypothetical protein [Nocardia sp. 348MFTsu5.1] TetR family transcriptional regulator 52_7 56.88 #N/A #N/A 60 hydrolase [Gordonia sp. KTR9] putative hydrolase 52_8 70.65 66.3 82.61 83 hypothetical protein [Gordonia sp. KTR9] hypothetical protein GOAMR_20_01660 52_9 #N/A #N/A 81.19 81 amine oxidase [Gordonia polyisoprenivorans] putative flavin-containing amine oxidase 52_10 #N/A 34 #N/A 60 transcriptional regulator [Herbaspirillum frisingense] MarR family transcriptional regulator 52_11 #N/A #N/A #N/A #N/A no hit

carbohydrate metabolism 53_1 #N/A #N/A 80.56 81 hypothetical protein [Nocardia farcinica] hypothetical protein GOAMR_20_01870 53_2 #N/A 44.09 87.1 88 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_20_01880 53_3 #N/A #N/A 77.05 55 hypothetical protein [Gordonia hirsuta] 16S RNA G1207 methylase RsmC 53_4 #N/A 31.93 #N/A 54 Trehalose and maltose hydrolases [Thermoanaerobacter] glycoside hydrolase 53_5 #N/A #N/A #N/A 65 PfkB domain protein [Gillisia limnaea] PfkB family protein carbohydrate kinase 53_6 50.75 24.57 27.37 72 putative sugar transporter [Nocardia asteroides] sugar transporter 53_7 #N/A 29.7 #N/A 51 ROK-family transcriptional regulator [Rhodococcus sp. EsD8] putative NagC family transcriptional regulator 53_8 #N/A #N/A 88.54 89 putative aminotransferase [Gordonia aichiensis] putative aminotransferase

60_1 #N/A #N/A #N/A #N/A

60_2 #N/A #N/A #N/A 40 hypothetical protein [Gordonia neofelifaecis] hypothetical protein SCNU_02285 60_3 #N/A #N/A #N/A 30 hypothetical protein [Gordonia amarae] virulence-associated E family protein 60_4 #N/A #N/A #N/A #N/A no hit

60_5 #N/A #N/A #N/A #N/A no hit

60_6 #N/A #N/A 27.2 hypothetical protein [Propionibacterium sp. HGH0353] site-specific recombinase, DNA invertase Pin

60_7 #N/A #N/A 27.69 33 #N/A conserved membrane protein of unknown function 60_8 35.34 #N/A 38.92 41 hypothetical protein [Actinoplanes globisporus] MerR family transcriptional regulator 60_9 #N/A #N/A 50.72 88 hypothetical protein [Streptomyces sp. HGB0020] putative ArsR family transcriptional regulator polysaccharide biosynthesis 65_2 #N/A 26.93 24.88 polysaccharide biosynthesis protein [Rhodococcus ruber]

65_3 #N/A #N/A 32.93 glycosyl transferase family 1 [Mycobacterium gilvum] saxobsidens DD2]

65_4 #N/A #N/A #N/A hypothetical protein [Mycobacterium gilvum]

65_5 #N/A #N/A 30.15 polymerase [Mycobacterium vanbaalenii]

65_6 #N/A #N/A #N/A glycosyl transferase family 1 [Mycobacterium fortuitum]

65_7 #N/A #N/A #N/A hypothetical protein [Smaragdicoccus niigatensis]

71_1 32.54 40.5 #N/A 98 3-phosphoglycerate dehydrogenase [Gordonia namibiensis] D-3-phosphoglycerate dehydrogenase 71_2 68.49 66.22 #N/A 85 peptidase S1 family protein [Gordonia amarae] peptidase S1 family protein 71_3 #N/A #N/A #N/A 33 hypothetical protein [Gordonia amarae] hypothetical protein GPOL_c30100 71_4 #N/A #N/A #N/A 48 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_09_00320 71_5 #N/A #N/A #N/A 71 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_09_00330

71_6 #N/A #N/A #N/A 72 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_09_00340 71_7 28.39 37.67 28.77 93 phosphoenolpyruvate synthase [Corynebacterium terpenotabidum] phosphoenolpyruvate synthase 71_8 #N/A 47.52 #N/A 86 putative phosphotransferase [Gordonia amarae] putative phosphotransferase

96_1 #N/A #N/A #N/A 40 hypothetical protein [Nonomuraea coxensis] integrase, catalytic region 96_2 #N/A #N/A #N/A 51 #N/A bacterial stress protein 96_3 48.48 48.48 28.91 60 MerR family transcriptional regulator [Rhodococcus] MerR family transcriptional regulator 96_4 60.1 #N/A #N/A 83 Cyanate permease [Gordonia terrae] putative major facilitator superfamily transporter 96_5 61.32 59.58 #N/A 63 beta-lactamase [Rhodococcus] beta-lactamase 96_6 #N/A #N/A #N/A 45 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_12_00400 96_7 45.27 41.5 #N/A 82 MarR family transcriptional regulator [Glaciibacter superstes] putative MarR family transcriptional regulator 96_8 #N/A #N/A #N/A 62 DoxX family protein [Micromonospora sp. ATCC 39149] DoxX family protein 96_9 48.84 48.97 #N/A 82 NmrA family protein, partial [Leifsonia aquatica] putative NAD(P)H--quinone oxidoreductase

97_1 #N/A #N/A #N/A 37 #N/A hypothetical protein FrCN3DRAFT_8038 97_2 #N/A #N/A #N/A 54 putative PnuC family transporter [Gordonia malaquae] nicotinamide mononucleotide transporter PnuC 97_3 #N/A #N/A #N/A 45 hypothetical protein [Gordonia malaquae] cytidylyltransferase 97_4 #N/A 38.35 42.93 42 putative hydrolase [Gordonia malaquae] NUDIX hydrolase 97_5 36.05 36.56 37.21 73 DGPFAETKE family protein [Nocardiopsis halotolerans] DGPFAETKE family protein 97_6 32.66 42.43 37.26 68 RNA polymerase sigma24 factor [Frankia sp. BCU110501] sigma-70 region 2 domain-containing protein 97_7 #N/A #N/A #N/A 63 beta-lactamase [Streptomyces bottropensis] hypothetical protein PAI11_08780 97_8 53.82 34.83 35.56 60 AraC family transcriptional regulator [Saccharomonospora cyanea] AraC family transcriptional regulator 97_9 36.67 35.24 #N/A 47 hypothetical protein [Gordonia hirsuta] peptidase S51 dipeptidase E Antibiotic biosynthesis 98_1 33.33 #N/A #N/A 45 #N/A hypothetical protein GOEFS_014_00330 98_2 #N/A #N/A 27.66 40 hypothetical protein [Gordonia amarae] HipA domain-containing protein 98_3 #N/A #N/A #N/A 80 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_19_00060 98_4 #N/A #N/A #N/A 66 antibiotic biosynthesis monooxygenase [Marinobacter lipolyticus] Antibiotic biosynthesis monooxygenase 98_5 65.58 65.28 #N/A 91 3-oxoacyl-ACP synthase [Gordonia polyisoprenivorans] 3-oxoacyl- 98_6 #N/A 52.27 #N/A 55 LpqP protein [Mycobacterium marinum] LpqP protein 98_7 32.64 46.9 #N/A 90 putative carboxylesterase [Gordonia amarae] putative carboxylesterase 98_8 32.65 #N/A #N/A 92 hypothetical protein [Mycobacterium tuberculosis complex] putative TetR family transcriptional regulator

pyridoxamine 5'-phosphate oxidase-like FMN-binding 98_9 27.27 #N/A #N/A 69 pyridoxamine 5'-phosphate oxidase [Ornithinimicrobium pekingense] prot

99_1 #N/A #N/A #N/A #N/A putative transposase

99_2 #N/A #N/A #N/A 42 prevent-host-death protein [ marinus] toxin-antitoxin system, antitoxin component, PHD family 99_3 #N/A #N/A #N/A 70 #N/A hypothetical protein GOAMR_19_01360 99_4 69.66 71.13 #N/A 84 NAD-dependent deacetylase [Gordonia polyisoprenivorans] NAD-dependent deacetylase 99_5 67.26 68.87 #N/A 92 cytosine deaminase [Gordonia rhizosphera] putative amidohydrolase 99_6 40 45.81 #N/A 73 hypothetical protein [Pseudoclavibacter faecalis] protein CrcB homolog 99_7 50.88 53.49 #N/A 81 hypothetical protein [Pseudoclavibacter faecalis] protein CrcB homolog 99_8 54.47 44.57 50.61 93 3-hydroxyacyl-CoA dehydrogenase [Rhodococcus wratislaviensis] putative 3-hydroxyacyl-CoA dehydrogenase 99_9 43.56 82.98 43.87 90 acyl-CoA dehydrogenase [Rhodococcus sp. DK17] acyl-CoA dehydrogenase, short-chain specific 99_10 63.35 62.11 #N/A 90 TetR family transcriptional regulator [Mycobacterium vaccae] putative TetR family transcriptional regulator 99_11 #N/A #N/A #N/A 99 amidohydrolase [Mycobacterium] putative decarboxylase

108_1 #N/A #N/A #N/A #N/A no hit

108_2 #N/A #N/A #N/A #N/A no hit

108_3 #N/A #N/A #N/A #N/A no hit

108_4 #N/A #N/A #N/A 53 hypothetical protein [Gordonia araii] hypothetical protein GOARA_013_00350 108_5 #N/A #N/A #N/A 38 N-acetylmuramoyl-L-alanine amidase [Corynebacterium terpenotabidum] lysozyme 108_6 #N/A #N/A #N/A 26 hypothetical protein [Gordonia polyisoprenivorans] TPR repeat protein 108_7 #N/A #N/A #N/A #N/A no hit

Bacteriophage 116_1 #N/A #N/A #N/A 50 hypothetical protein [Streptomyces aurantiacus] gp14 116_2 #N/A #N/A #N/A 42 #N/A gp104 116_3 #N/A #N/A #N/A 49 #N/A gp15 116_4 #N/A #N/A #N/A 58 #N/A gp14 116_5 #N/A #N/A #N/A #N/A

116_6 #N/A #N/A #N/A hypothetical protein [Mycobacterium abscessus]

116_7 #N/A #N/A #N/A Bacteriophage protein [Mycobacterium abscessus]

116_8 #N/A #N/A #N/A #N/A

116_9 #N/A #N/A #N/A #N/A

116_10 #N/A #N/A #N/A hypothetical protein [Dietzia alimentaria]


132_1 #N/A #N/A 44.98 94 monooxygenase [Rhodococcus sp. DK17] putative FMNH2-dependent monooxygenase 132_2 #N/A #N/A #N/A 75 putative FMNH2-dependent monooxygenase [Gordonia amarae] putative FMNH2-dependent monooxygenase 132_3 #N/A #N/A #N/A 92 #N/A putative acetyl-CoA acyltransferase 132_4 50 66.12 #N/A 68 acetyl-CoA acetyltransferase [Nocardia sp. 348MFTsu5.1] acyl-CoA dehydrogenase, C-terminal domain protein 132_5 30.43 29.35 #N/A 81 putative oxidoreductase [Gordonia soli] 2-deoxy-D-gluconate 3-dehydrogenase 132_6 34.57 55.71 56.41 49 putative oxidoreductase [Gordonia soli] serine/threonine protein kinase 132_7 #N/A 50 #N/A 50 putative LuxR family transcriptional regulator [Gordonia soli] transcriptional regulator, LuxR family Aromatic compound degradation 133_1 33.68 74.91 33.33 75 AMP-dependent synthetase [Gordonia rhizosphera] putative acyl-CoA synthetase 133_2 47.69 45.81 #N/A 88 putative 4-hydroxy-2-oxovalerate aldolase [Gordonia aichiensis] 4-hydroxy-2-oxovalerate aldolase CmtG 133_3 #N/A #N/A #N/A 80 acetaldehyde dehydrogenase CmtH [Gordonia aichiensis] acetaldehyde dehydrogenase CmtH 133_4 40.57 37.74 #N/A 71 2-hydroxypenta-2,4-dienoate hydratase CmtF [Gordonia aichiensis] 2-hydroxypenta-2,4-dienoate hydratase CmtF 133_5 26.89 #N/A #N/A 75 p-cumate dioxygenase small subunit [Gordonia polyisoprenivorans] p-cumate dioxygenase small subunit 133_6 #N/A #N/A 32.41 75 putative oxidoreductase [Gordonia polyisoprenivorans] putative oxidoreductase 133_7 #N/A #N/A #N/A 50 hypothetical protein [Streptomyces prunicolor] putative hydrolase cytochrome 159_1 #N/A #N/A #N/A 93 Mce family protein [Gordonia amarae] Mce family protein 159_2 26.75 #N/A 25.74 83 Mce family protein [Gordonia amarae] Mce family protein 159_3 51.71 #N/A 53.41 96 hypothetical protein [Nocardia asteroides] YrbE family protein 159_4 #N/A #N/A #N/A hypothetical protein [Nocardia asteroides] YrbE family protein

159_5 #N/A #N/A #N/A YrbE family protein [Nocardia asteroides] YrbE family protein

159_6 41.33 #N/A #N/A short-chain dehydrogenase [Mycobacterium fortuitum] putative oxidoreductase

159_7 #N/A 29.59 #N/A 93 cytochrome P450 [Mycobacterium sp. 360MFTsu5.1] putative cytochrome P450

165_1 24.54 #N/A 94.32 35 dihydropyrimidinase [Gordonia amarae] hydantoin racemase 165_2 #N/A #N/A #N/A 47 Asp/Glu racemase [Jannaschia sp. CCS1] ATP-dependent protease FtsH 165_3 #N/A #N/A 80.46 31 hypothetical protein [Gordonia amarae] hypothetical protein Xcel_2882 165_4 #N/A #N/A #N/A 31 hypothetical protein [Sporichthya polymorpha] hypothetical protein Xcel_2882 165_5 #N/A #N/A #N/A 53 hypothetical protein [Sporichthya polymorpha] hypothetical protein 165_6 33.12 30.38 #N/A 43 hypothetical protein [Dermabacter sp. HFH0086] ribosomal protein N-acetylase 165_7 #N/A #N/A #N/A 42 hypothetical protein [Gordonia malaquae] ATP-binding protein

Bacteriophage 166_1 #N/A #N/A #N/A #N/A no hit

166_2 #N/A #N/A #N/A hypothetical protein [Niabella aurantiaca] no hit

166_3 #N/A #N/A #N/A #N/A no hit

166_4 #N/A #N/A #N/A #N/A no hit

166_5 #N/A #N/A #N/A 33 phage tail tape measure protein [Rhodospirillum centenum] phage tail tape measure protein, TP901 family 166_6 #N/A #N/A #N/A #N/A no hit

166_7 #N/A #N/A #N/A #N/A no hit

166_8 #N/A #N/A #N/A #N/A no hit

166_9 #N/A #N/A #N/A #N/A no hit

Aromatic compound degradation 167_1 #N/A #N/A #N/A 55 #N/A 2-ketocyclohexanecarboxyl-CoA hydrolase 167_2 #N/A 38.99 31.3 80 acetyl-CoA acetyltransferase [Smaragdicoccus niigatensis] acetyl-CoA acetyltransferase 167_3 #N/A 36.84 #N/A hypothetical protein [Mycobacterium sp. VKM Ac-1815D]

167_4 #N/A 66.26 #N/A 79 acetyl-CoA acetyltransferase [Mycobacterium thermoresistibile] acetyl-CoA acetyltransferase 167_5 #N/A #N/A #N/A hypothetical protein [Mycobacterium abscessus]

167_6 28.57 28.09 #N/A 70 dioxygenase [Mycobacterium indicus pranii] hydroxylase beta subunit, benzoate 1,2-dioxygenase 167_7 #N/A #N/A #N/A 80 dioxygenase [Mycobacterium sp. VKM Ac-1815D] dioxygenase large subunit 167_8 #N/A #N/A #N/A 74 dioxygenase [Mycobacterium sp. VKM Ac-1815D] benzoate 1,2-dioxygenase, large subunit

168_1 #N/A #N/A #N/A #N/A no hit

168_2 #N/A #N/A #N/A #N/A no hit

168_3 #N/A #N/A #N/A #N/A no hit

168_4 #N/A #N/A #N/A #N/A no hit

168_5 #N/A #N/A #N/A #N/A no hit

168_6 #N/A #N/A #N/A #N/A no hit

168_7 #N/A #N/A #N/A #N/A no hit

168_8 #N/A #N/A #N/A #N/A no hit

168_9 #N/A #N/A #N/A 39 ParB-like protein [Bifidobacterium breve] partitioning protein parB 168_10 #N/A #N/A #N/A #N/A no hit

168_11 #N/A #N/A #N/A hypothetical protein [Rhodococcus rhodnii] no hit Stress response

177_1 83.92 87.21 #N/A 95 UDP-galactopyranose mutase [Gordonia soli] UDP-galactopyranose mutase 177_2 65.73 72.31 #N/A 66 stage II sporulation protein SpoIID [Gordonia sp. KTR9] SpoIID/LytB domain-containing protein 177_3 #N/A #N/A #N/A #N/A

177_4 #N/A #N/A #N/A #N/A

177_5 58.71 55.83 #N/A 62 putative N-acetylmuramoyl-L-alanine amidase [Gordonia terrae] putative N-acetylmuramoyl-L-alanine amidase 177_6 #N/A #N/A #N/A hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_20_00230

177_7 #N/A #N/A #N/A 93 type I restriction-modification enzyme, R subunit [Mycobacterium avium] type I restriction-modification system restriction subunit

177_8 #N/A #N/A #N/A #N/A

178_1 #N/A #N/A 71.73 72 antibiotic transporter [Gordonia namibiensis] putative ABC transporter permease protein

178_2 #N/A #N/A 78.59 69 IclR family transcriptional regulator [Gordonia namibiensis] putative ABC transporter ATP-binding protein

178_3 35.23 #N/A #N/A 74 putative non-ribosomal peptide synthetase [Gordonia alkanivorans] putative non-ribosomal peptide synthetase

179_1 #N/A #N/A #N/A #N/A

179_2 #N/A #N/A #N/A hypothetical protein [Amycolatopsis nigrescens]

179_3 #N/A #N/A #N/A #N/A

179_4 #N/A 25.56 #N/A 62 methylmalonyl-CoA carboxyltransferase 12S subunit [Rhodococcus opacus] carboxyl transferase

179_5 43.85 #N/A #N/A 80 Short-chain dehydrogenase/reductase SDR [Rhodococcus sp. EsD8] short-chain dehydrogenase/reductase SDR

179_6 #N/A #N/A #N/A 82 acyl-CoA dehydrogenase [Mycobacterium abscessus] acyl-CoA dehydrogenase

Toxin-antitoxin system

242_1 43.4 #N/A 83.19 84 hypothetical protein [Microbacterium sp. 292MF] putative methylated-DNA-cysteine methyltransferase

242_2 50.1 50.89 81.23 82 hypothetical protein [Microbacterium maritypicum] putative methylated-DNA-cysteine methyltransferase

242_3 #N/A #N/A #N/A #N/A -

242_4 #N/A #N/A 42.74 integrase [Mycobacterium abscessus] integrase

242_5 #N/A #N/A #N/A #N/A -

242_6 #N/A #N/A #N/A antitoxin HicB [Propionibacterium] Fe-S-cluster redox enzyme

242_7 #N/A #N/A #N/A 68 toxin HicA [Actinomyces odontolyticus] response regulator, CheY-like receiver domain

242_8 #N/A #N/A #N/A 59 hypothetical protein [Rhodococcus erythropolis] Fic protein

Toxin-antitoxin system

253_2 #N/A #N/A #N/A 38 hypothetical protein [Promicromonospora sukumoe] hypothetical protein Snas_5449

253_3 #N/A #N/A #N/A #N/A no hit

253_4 #N/A #N/A #N/A #N/A no hit

253_5 #N/A #N/A #N/A 78 hypothetical protein [Methylosinus sp. LW4] Ribbon-helix-helix protein, copG family

253_6 #N/A #N/A #N/A #N/A no hit

253_7 #N/A #N/A #N/A 64 hypothetical protein [Streptomyces sp. HPH0547] hypothetical protein FraEuI1c_6647

253_8 #N/A #N/A #N/A #N/A no hit

253_9 #N/A 37.07 #N/A 57 transcriptional modulator of MazE/toxin, MazF [Cyanothece] putative PemK-like protein

253_10 #N/A #N/A #N/A 51 antitoxin [Patulibacter americanus] ChpI

253_11 #N/A #N/A #N/A #N/A no hit

253_12 #N/A #N/A #N/A 76 #N/A helix-turn-helix domain protein

253_13 58.11 #N/A #N/A 80 rifampin ADP-ribosyl transferase [Brevibacterium casei] rifampin ADP-ribosyl transferase

253_14 #N/A #N/A #N/A #N/A no hit

254_1 81.79 80.18 #N/A 94 amino acid adenylation protein [Gordonia polyisoprenivorans] putative non-ribosomal peptide synthetase

254_2 57.41 #N/A #N/A 60 hypothetical protein [Actinomadura flavalba] alpha/beta hydrolase fold protein

254_3 48.37 #N/A #N/A 57 hypothetical protein [Actinomadura flavalba] transcriptional regulator

254_4 #N/A 29.69 #N/A 57 filamentation induced by cAMP protein fic [Rhodococcus qingshengii] filamentation induced by cAMP protein Fic

255_1 83.13 37.5 #N/A 85 putative oxidoreductase [Gordonia aichiensis] short-chain dehydrogenase/reductase SDR

255_2 70.79 71.19 #N/A 93 hypothetical protein [Gordonia soli] hypothetical protein GOAMR_03_01340

255_3 73.26 71.17 #N/A 92 enoyl-CoA hydratase [Rhodococcus erythropolis] putative enoyl-CoA isomerase

255_4 70.32 32.41 #N/A 84 hypothetical protein [Gordonia paraffinivorans] hypothetical protein GOAMR_03_01370

255_5 #N/A 40.35 #N/A 79 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_03_01380

255_6 #N/A 75.84 #N/A 93 heat shock protein 90 [Gordonia rhizosphera] chaperone protein HtpG

256_1 #N/A #N/A #N/A 53 conserved hypothetical protein [Bradyrhizobium sp. STM 3843] hypothetical protein

256_2 #N/A #N/A #N/A 82 twitching motility protein PilT [Mycobacterium sp. 141] PilT protein domain-containing protein

256_3 #N/A #N/A #N/A 82 hypothetical protein [Mycobacterium sp. 141] transcription regulator of the Arc/MetJ class

256_4 52.89 52.51 #N/A 62 putative Fis family transcriptional regulator [Gordonia malaquae] helix-turn-helix domain-containing protein

256_5 40.16 41.79 46.44 85 putative aldehyde dehydrogenase [Gordonia hirsuta] aldehyde dehydrogenase

256_6 #N/A #N/A #N/A 85 oxidoreductase, Rxyl_3153 family protein [Rhodococcus wratislaviensis] zinc-containing alcohol dehydrogenase

257_1 31.84 26 27.09 65 hypothetical protein [Nocardia sp. BMG111209] putative dioxygenase

257_2 #N/A #N/A #N/A 72 hypothetical protein [Gordonia amarae] hypothetical protein GOAMR_10_00090

257_3 #N/A #N/A #N/A 51 hypothetical protein [Sorangium cellulosum] hypothetical protein O3I_17803

257_4 47.79 79.1 93.31 94 acyl-CoA synthetase [Gordonia rhizosphera] putative fatty-acid--CoA ligase

257_5 #N/A #N/A #N/A 64 hypothetical protein [Nocardiopsis lucentensis] hypothetical protein MLP_16900

257_6 #N/A #N/A #N/A 53 hypothetical protein [Nocardiopsis halotolerans] PilT protein domain-containing protein

257_7 #N/A #N/A #N/A 66 #N/A hypothetical protein MAP4268c

Table B4 Prophage regions from metagenome

Annotation e value Sequence

Region 1
PHAGE_Megavi_chiliensis_NC_016072: collagen-like protein 4.00E-11 MARPRRPKPKESGRRTKKKVSILNTESIEYVDWKDVNLLRRFQSDRAKIRARRVTGNNTQQQRQVAVAIR
PHAGE_Rhodoc_REQ3_NC_016654: single stranded DNA binding protein 7.00E-30 MATNTVTIIGNVTRDPELRFTPSGQAVANFGVAVNRRWQNRQTNEWEEATSFFDIVAWAQLGENVSESCP
30S ribosomal protein S6 [Ilumatobacter coccineus YM16-304] gi|470180472|ref|YP_007566516.1| 3.00E-25 MNRAYELMVIIDADVADAENKVVVDRVEELIGAAGGELSSTDRWGRRKFAYLINHKAEGYYVVFEFTADP
PHAGE_Ostreo_2_NC_014789: putative 3-methyl-2-oxobutanoate hydroxymethyltransferase 6.00E-47 MSDRPTVPQIRARKVRDGAEPLVMITAHDAPTARIADAGGVDMILVGDSLAMVALGYEDTLQVTIDDMVH
hypothetical N/A MNPTVRDPSSSDTDCQNCGSTGLPTEPVQRVYCSPESPSDLDAATIDNEIEVWCAACVANYPHLAVKPG
PHAGE_Prochl_P_SSM2_NC_006883: phage tail fiber-like protein 1.00E-09 MSSQPPARRSLFWRSRRFWFAAVVVVVLGAGGVWRLLDQVDLPEEITNPLANTSLICDASVPVGTCSIDN
hypothetical protein [Frankia symbiont of Datisca glomerata] gi|336180323|ref|YP_004585698.1| 5.00E-21 MVMAGAGGGVMPGVRDAAGSGSLSRGVGCRTVEQLFARLQGRTSPSGEVDYRLTRRSTLRSLAAGEVDRA
PHAGE_Microm_MpV1_NC_014767: hypothetical protein 3.00E-05 VSSLDNRTVLHRTRRHPSAGNLGPRPLLLAPGTPPPGLPPTSFPHGLSQQTNFVEFGPDLDHSTHQRLLG
PHAGE_Mycoba_Myrna_NC_011273: gp28 1.00E-57 VIPPRLATLLDQLAPLAERFAGRGHRLYLVGGSVRDLLLGSDRLPDDLDFTTEASPEAIKAALDGWVDAL
PHAGE_Rhodoc_E3_NC_021347: putative histone deacetylase protein 9.00E-09 MTVLVVRSESSVRHDTGQWHPERAARLTATTAALSDPELDGALRFVEARQATDDELHMVHTPEHVARIRV
hypothetical N/A MVLVVNLGWDVVGEPVVPGEDPGFEDSALSGLPTCSRVAVGAI
GCN5-like N-acetyltransferase [Acidimicrobium ferrooxidans DSM 10331] gi|256370849|ref|YP_003108673.1| 3.00E-25 MTAQLAVDARGAHPSVAGVGRCVADATERGFDRILTAALHRDDLFPFVHHGFEPVEELVVLAHDLIEVPV
hypothetical protein Isova_3007 [Isoptericola variabilis 225] gi|334338427|ref|YP_004543579.1| 8.00E-14 MTTRQVHRLVVGMMMLLLTTIGLPVGTGGAVGADGAVATSRGAAPQETLRILHTTTFVPADGTFTFTVDT
PHAGE_Pandor_salinus_NC_022098: serine/threonine kinase motif-containing 2.00E-20 MPGNATDSPSPSADYMPGAMLGNRYRLERKVGTGGMAQVWEASDLVLDRRVAVKILHPHLATDTSVERFR
putative RNA polymerase ECF subfamily sigma factor [Ilumatobacter coccineus YM16-304] gi|470180484|ref|YP 2.00E-34 MTSSRRAGATDDELVAWAQGGDRLAIEVLLRRHYDRLYAVCRGVVGLGDADDATQATMMGIVGGLARFDG
serine/threonine protein kinase [Haliangium ochraceum DSM 14365] gi|262193765|ref|YP_003264974.1| 1.00E-05 VSDEPLRPGALGPPPQPEPSVREAAIASSLTRFDELIVDGRLTEGNEPHSRSDGSMPSSAPVDELAQVRE
hypothetical protein YM304_42790 [Ilumatobacter coccineus YM16-304] gi|470180487|ref|YP_007566531.1| 2.00E-05 MATSPAATHDPPSAPAGPSGTQSHGDAHWADQVADLIVDTVDRVRDRTVTPAHALAKYVVYGAVIAVLIL
PHAGE_Bacill_G_NC_023719: gp344 5.00E-65 MSTIHRKVLIIGSGPAGLTAGIYTSRAQLEPLLVEGEPSSTSDQPGGQLMLTTEVENFPGFPDQVQGPDL
PHAGE_Achrom_JWAlpha_NC_023556: hypothetical protein 3.00E-10 MSAAITHLTNSSFSEEVTGSDLPVLVDFWAEWCGPCKTIAPVLEELAQEHGDKLRIAKVDVDSEQALALR
PHAGE_Salmon_SSU5_JQ965645: putative ParB-like nuclease domain-containing protein 1.00E-09 VLEVAVDDIRANPFQPRVDFDPESLGGLAASIAALGVLQPLLVRPAAQGTYHLIAGERRWRAARQAGLAT

PHAGE_Natria_PhiCh1_NC_004084: putative plasmid partitioning protein Soj 5.00E-26 VTSKTKKIKKPKSDVDVKGDVAKPTQTMTTRVVAIANQKGGVGKTTTTVNLGAALAERDLRVLVIDLDPQ
PHAGE_Pandor_dulcis_NC_021858: pif1-like helicase 6.00E-05 MEWIELTGKTIDEARDLALEKLGVHESEAEVEVLEHPSVSMFGRVKSMARIRARVAPVAAPAKEERRRRG
PHAGE_Megavi_chiliensis_NC_016072: hypothetical protein 7.00E-05 MPDLLGPFDPLFEFFGAIVAGIYAVIPSFGVAIVLFTFLVMVVTTPLTVKSTKSMLQMQRLQPELKQLQA
hypothetical N/A VTRNRVRRRLRHLMADAERRGTLVTKDYLIVGGPAISNLSFDELSTHLNNALTSAERSVQGTRRHSPG
hypothetical protein PFREUD_24220 [Propionibacterium freudenreichii subsp. shermanii CIRM-BIA1] gi|297627 3.00E-10 VKRTYQPKTRRRARRHGFRHRMAERSGRAVVKARRRKGRARLSA
hypothetical N/A MDELSELSVRRCPVDGVRRVGPVASLPARPQASGKGG
PHAGE_Bacill_Pony_NC_022770: replication initiator protein 2.00E-05 MEDEATSVWEAVARGVTHQVSSVVWRTTFSEVRAVDYDGATLTIVAPSLVLRDRIDNRFRPLLMGVISDL
hypothetical N/A VCEPLDPGDDTTVIPTGRLPRTSYLGIIMDHRWIQLVRSRKGFG
PHAGE_Mycoba_DS6A_NC_023744: gp34 9.00E-34 VKFRCERDVLADAVGSAGRATSGRGGALPVLAGLRLRLDGDHLEITGSDLDLTVTAEIEVAGGDDGVAVI

Region 2
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp090; PP_02770; phage(gi593774803) 6.00E-82 MSISLRNDFVVDLLMAAGSDEFLCKAARVSTQGSASIDSEESYGLLNFLMKNRHGSPFEHGMMTFRIEAP
PHAGE_Cyanop_PP_NC_022751: hypothetical protein; PP_02771; phage(gi557307645) 2.00E-07 MFAMAAVLHEGACKYGANNWRGITIEDHLNHLIMHAYAYLSGDRSDEHLSHIMCRAMFAQAVEITEQEKQ
hypothetical; PP_02772 N/A MNAINHFNNTAVIIRGQDNQPQMELFDLVPDINSVRAEDIPDSHESNLVSHARRELEMIGEEPEWVEGYL
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp078; PP_02773; phage(gi593774791) 2.00E-32 MANEEKTFKVEGAELIYKNFAGEKSAFNATGKREVSVVLSPEFAETLLADGWNVRQTKPDEDGEFRYYIT
hypothetical; PP_02774 N/A MSEELHNVVITDTQVRCSCGYEFDTDLPPEAHKLGYIHAQMQLASKVDNQQTPDPDGY
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp072; PP_02775; phage(gi593774785) 0 MVRGQSFYAIWDETKGLWSTDEYDVQRLVDEDLHRYASELHEKKGVNYTVLNLKSFDTKIWTTFRSYMRH
hypothetical; PP_02776 N/A MALGRNKFTLTDREFAIIERSIDLRIKHLVGEINRTLPEVGLDNPHIRDMDRTLYECRELSKKLSAIKKA
hypothetical; PP_02777 N/A LTSNIFLSHPLLINEILTELIPKLNLGLIMLGVSQWEQSTFTVG
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp070; PP_02778; phage(gi593774783) 6.00E-18 METTEYIDDVQVSPTTIGILVGIASFGVGLTIGYFVGKRNKDVVYRTIPAVNSGAITSVVYDYSYSEDVV
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp069; PP_02779; phage(gi593774782) 1.00E-60 MKYIPEQLARNLSRQILVARKHSPRVLFVAGIAGVVTSTVLACKATLKLEAELDEMQTQINNVKELKTEH
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp065; PP_02780; phage(gi593774778) 4.00E-17 MLKRTITFTDYNGVRHTEDHYFHLSKVDLVRLEVAGEKSFAEYLQDIVKTEDRKGLIEIFEKLIQLSYGK
hypothetical; PP_02781 N/A MENDTNTTVETNPYLESMKEGFAKGAATAVATVVVTQLATLLIEKSVGAVRNHRNKIADENTDQ
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp054; PP_02782; phage(gi593774767) 3.00E-46 MTIKTVLHKTEKSLRDNSPVILTAIGVSGTLSTAYLAGKASYEIGYAFYDDMPTRDFFKLNWKKYIPAAV

hypothetical; PP_02783 N/A MKQILDAISYVGGLAVGVTLTAVAGVIVLVIAREVYEQLTKSDD
hypothetical; PP_02784 N/A MESHTEPKKTFAGRIVSFAKNDIVQMTALMAATAVVSAGLAREITLRQAFTFEQMDYILKDKDLQKALID
hypothetical; PP_02785 N/A MFNRAIQVKMVNTKKQEPQEPVASDSYFEKKAEVVSREIDGVMRKVGMLATGYVVVDTLRQVLVARANRF
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp041; PP_02786; phage(gi593774754) 2.00E-08 MFGNFRKKEDPKLSAAIDAIYEEMTTHGPDSPEYPNMLGYLERLTELQAPKRHNRVSPDQMAVVLGNLLG
PHAGE_Strept_Dp_1_NC_015274: holin; PP_02787; phage(gi327198372) 3.00E-07 MEQNGTVDVQGFQLSNRTYNKLKAFVTVILPAFSSAYYGLAELWDFPNVAAVIGTTAIITTLFGTLLGIS
hypothetical protein [butyrate-producing bacterium SM4/1] gi|479183015|ref|YP_0078101 5.00E-09 MGIKFIEQKIIYKDEYEDFRRYLYEPYKELGGNGTADRIMAEIAKLPIRGYHTSDSNRIHLRREDKSNGT
hypothetical protein [butyrate-producing bacterium SM4/1] gi|479183020|ref|YP_0078101 5.00E-14 MLDILKFNSGVPTRLENGEVVNGIVSKTWVERYRDPGEFTFTAKESSDLLSKLPVGTLISHMQTSEVMVV
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp037; PP_02790; phage(gi593774750) 7.00E-12 MNLTSLDLYSNNVFVARFDCEPDGTSPFLLVDESGLGAETIVQRYVMESASGEQFYDLTVPSRTISLQMI
PHAGE_Rhodoc_ReqiPoco6_NC_023694:tape measure protein; PP_02791; phage(gi593774749) 0 MPSVDNKIVSIEFDNNSFERKVAETMASLDKLKASLAFNDANKSFADLDSSVKKINFGSMASAVDGISTK
hypothetical; PP_02792 N/A MKDGQLTANGTAVGGTETDSGLTTGSFNVISGRKYRIDVFAMILGSTAGNIASCKLTNASNTVLAQNNVL
PHAGE_Rhodoc_ReqiPoco6_NC_023694:head protein; PP_02793; phage(gi593774744) 5.00E-41 MTKLTWDAPTEKTYETGLDRGVLYLQDGTAVVWNGLTAVTESSSRSTTPLYFDGKKFKDVVNLDQPKSKL
hypothetical; PP_02794 N/A MNSDKIYVYRQDAELPDLGVAWYDRDGNLIDFSNGYSFTVKLVSQKDKTVALTKTAGIVGSNSKPNIIIG
PHAGE_Caulob_CcrRogue_NC_019408:putative lectin-like domain protein; PP_02795; phage 2.00E-16 MHGGGYDTTIGSDHSVGLAYKEIYPAGSVTCPTWNQVSNGPDAWVALTLAFKPEPEAPWEGPEVYYSNED
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp030; PP_02796; phage(gi593774743) 1.00E-40 MLKLIIKGDEVYDESTGQFGTVNDTILELEHSLLSVSKWESKFEKPFLANTEKTVDEIMDYIRFMIITPD
PHAGE_Rhodoc_ReqiPoco6_NC_023694:head protein; PP_02797; phage(gi593774742) 1.00E-85 MTILTWDQSGQRLYETGVDRGVLYIPNAVSGLYDNGVAWNGLVSVTESPSGAESSAQYADNSKYLNLVSA
PHAGE_Rhodoc_ReqiPoco6_NC_023694:prohead protease; PP_02798; phage(gi593774739) 0 MKAADFSGWATKAGLKCTDGRTITPDAFKDQNGVTVPLVWQHGHNDVENVLGHAVLEHRPEGVYAYCYFN
hypothetical; PP_02799 N/A LILQLKENARIRQAEVREKLDKIISQLASKSKNARSKAEREKAKQEIESIRDELKTALEGAREAYAKLKD
PHAGE_Rhodoc_ReqiPoco6_NC_023694:portal protein; PP_02800; phage(gi593774737) 1.00E-134 LAILNQVKRAINAFRSNEQSTQRYVTNADIGPGSSIRPHTTHSRHFNERSIVTSIYTRISVDVASVAIRH
hypothetical; PP_02801 N/A MTARINADFKNSKGEKVSIDFANAVLSKAVSKNQREAYFKVGAVWVAAMLAGKYIGNARMSR
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp020; PP_02802; phage(gi593774733) 2.00E-06 MSDVAVEEFLEHFGTKGMKWGVRNSRPTSGSSGKSEKPKHSKKKIAVGIAVGVGAIAVGVILAKNHKVKV
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp019; PP_02803; phage(gi593774732) 1.00E-09 MSDISVEDFLAHYGVKGMKWGVRNESKSSNVTLGPPAGVVMRKDGSILIKPGANLQRLVRSNGESLPMKD
PHAGE_Rhodoc_ReqiPoco6_NC_023694:TerL; PP_02804; phage(gi593774729) 0 MTLSNTATPKYYAEFRAAVLRGEIPVNEEISLEMNRIDDLIADPDIYYDDKAVEGFISYCENELTLTDGG

PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp022; PP_02805; phage(gi593774735) 3.00E-05 MDTMSDHEEAREEFLEHYGVKGMKWGVRKKPSSSKRSLVTNAKKMSDSDLKAAVERLRLEREYVNINKDL
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp014; PP_02806; phage(gi593774727) 2.00E-32 MESSILKSTKKVLGLSEDYIAFDLDIMTHINAAFSILNQLGVGPSTGFTIEDEQAQWSSFSSDAAVVNLV
PHAGE_Thermo_THSA_485A_NC_018264:glycoside hydrolase family 25; PP_02807; phage(gi397912616) 1.00E-06 MSDTLYPVFYGTRLVTFDVLEATFSSKCHPEFWRRMKNFLLHQGGKFGIGGGWRAVGAQPDLSGFAPEGK
hypothetical; PP_02808 N/A VARFMVIGDSISEARAVTTLGDRWQDRLAKMLRTKFPCVGV
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp011; PP_02809; phage(gi593774724) 3.00E-15 VHHINPMSVDDLIRHEEWVLNPEYLITTTHDTHNAIHYGDQSLLKKPFTPRQMGDTKLW
PHAGE_Rhodoc_ReqiPoco6_NC_023694:gp010; PP_02810; phage(gi593774723) 5.00E-17 MVEGTASAQVITHFLKLGTEREKLERERLRQEIILGQAKTDQIASAERVEKLYSRALQAMRQYQGQEIDD

Region 3
PHAGE_Staphy_80alpha_NC_009526: tape measure protein; PP_03288; phage(gi148717898) 9.00E-08 VRAVFEARVAGAQKGLRDLAGDADKAGAKVDATAKSLKDLSSVTAKPKIDLAIEDAQRRLTAVTKELGEL
hypothetical; PP_03289 N/A MSQWDDLDDHDRAWALGVGLADAEAEAEANAATCPSCGGLKAECQDPDNQHAYVVTTGRCYRTRALMEAQ
hypothetical; PP_03290 N/A MRTRVLNHFHTFHPEAQAELNRLAAEEARLTLALARRSESEPEPKPKRRMSEPPAPVDDIAVELEKVRAA
hypothetical; PP_03291 N/A MAQTAVAGVVAAGRVPTWIIPQASIATDPTPGTYSIPLTALTGGTTVKADCHMDAGDLSVSRSAQTRERQ
hypothetical; PP_03292 N/A MPQNPAEFTRVSTPAGHFSVPSALVEAAGSEWKVLKQDAADSNGLPYPPKLREQSAAATPQAEASASADN
hypothetical; PP_03293 N/A VHTEIKAALIAAGVYAVDGPADDLPSDGGVVRQAAVLWPSPGSHTYTRVSGSSSGRVDRVLITCVGATTF
PHAGE_Strept_20617_NC_023503: phage protein; PP_03294; phage(gi588295123) 1.00E-06 VSAGAEFSRFARALRIAAGGLESDGRKAVDRVAQGALRTAQAHAGVDSGDLRSSLRVSSRGGELRAAVET
hypothetical; PP_03295 N/A VIGPAIDAALPVMHANAESMMLDRCTIERATSTWDEAAQKTVTTWAPVITESVCDVDDGAASGRSIVTDE
PHAGE_Mycoba_Brujita_NC_011291: gp9; PP_03296; phage(gi206599553) 5.00E-06 VTVDGSGCAVQMLPTLRLVYLVSITNDGAAVSNPEWSAAGFVRGSWTCRLRGVTATMRHGFEDWPADLLG
hypothetical; PP_03297 N/A MSARKSAETNPQPKAVDVPDEVTPDVPKPPVKVPSIMGSTFAERAAANKAVAKSRTEAKG
PHAGE_Mycoba_Butters_NC_021061: major capsid protein; PP_03298; phage(gi479336468) 3.00E-77 MSNARQRLEAAVKSLQEFSDQLDAADAPLSGEDMSNLKSRMEEIKDLKGQVEAEAEAAGALKDAKAFMAA
PHAGE_Mycoba_HufflyPuff_NC_022981: capsid maturation protease; PP_03299; phage(gi563398893) 1.00E-24 VKKSYAFAAVKSLDSENPNGEFEVVLSAATVDRDGEVIEARAFEPLPESIPFHAFHDFHDPIGRGVPFYD
PHAGE_Mycoba_Murphy_NC_021305: portal protein; PP_03300; phage(gi508179175) 2.00E-61 VFVSNGSLVTKTPLLAGSPTYFPKMSTAGLIYPTAYSQMYRGQLWINILVNKLAKAQARLPFPVYERDEL
hypothetical; PP_03301 N/A VMFSNLVLKNRWRDRVVVTLKSGESFAGVLWSNDSRALVIRNASALGAGENRTDLSLDGEVIVLMADVAY
PHAGE_Mycoba_Butters_NC_021061: terminase; PP_03302; phage(gi479336464) 2.00E-52 VTEYAPLFRHRPPERWTNGDLAAKVGIDLGLPPDDEQRELLDMIYAEKAPDRPAAFEVCVVGPRQNIKTS

PHAGE_Mycoba_Butters_NC_021061: hypothetical protein; PP_03303; phage(gi479336463) 8.00E-17 MSEVVCVCGKAFAAKSNRARYCSDRCRKRAQRGGGEVVELPAKVGQVEGSSLAQAGPVETATVDALKAAD

Region 4
attL N/A TTGGGTGTCGGTC
PHAGE_Arthro_vB_ArS_ArV2_NC_022972: putative endodeoxyribonuclease; PP_03315; phage(gi563398 6.00E-07 MKPQRLTIPAPAPWINANARDHWTKKGRLTRSWRSASAAWARHQKLRPVTQPVVIVATVVKTNSRRFDVE
hypothetical; PP_03316 N/A MIATRPECGTGYGIRYHLAKDEPFCDLCTDHVMDRRLVRETRTLTGSTTTQERTLRQAIHALAQMLDEHD
hypothetical; PP_03317 N/A MSSTTRAQAEALATFVRQLRPDWDHPGIVHAIGRCQREAVSEIAVALIRLAENGQAKTPALLPEPGRHWK
PHAGE_Strept_VWB_NC_005345: hypothetical protein VWBp15; PP_03318; phage(gi41057231) 3.00E-08 VDDTLHSHPKTRRAGLAAMGLWTVCGSYCMAYKTNGFVPEWFVAGFQSGRKLAADLVRAGKWEDAVKDNE
PHAGE_Pseudo_F116_NC_006552: DNA adenine methyltransferase; PP_03319; phage(gi56692911) 6.00E-45 VKPPFSYYGGKMTVGPEIARILPAHKHYVEPFAGSLAVLLAKEPSHAETVNDLDGDIVTFWRVLRDRADD
PHAGE_Arthro_vB_ArS_ArV2_NC_022972: hypothetical protein; PP_03320; phage(gi563398174) 2.00E-08 VSLADWTCATCTTEGRRGSCCPGRCYCGHDTCHAFASWTPRPVLNVTDISKPKGKKGSAWAEREESTWID
PHAGE_Mycoba_BigNuz_NC_023692: gp55; PP_03321; phage(gi593774685) 8.00E-133 VSVYEGHGVTLHHGDCLDVLRSLPDCSVDSVVCDPPYALGFMGREWDTFGMDVGRGAQARSQRRAEVTPT
hypothetical; PP_03322 N/A MNTNRPALPVHVVNRCKFAALTFGNVRAAREYEQRTGRHASACGNCGKWHA
hypothetical; PP_03323 N/A VARLMPNQPKTPIRSVRIPDEEWRAAQARAAERGETVTDLIRRALRRYAK
hypothetical; PP_03324 N/A MSDPTPIKALDWHGFLRILAAHSPSKLLSANTIHADACEGVQIDPKRLGALFKAAADAGYIRLVGVENAA
PHAGE_Mycoba_32HC_NC_023602: DnaQ; PP_03325; phage(gi589893377) 7.00E-21 VSAPLVFLDTETDGIHPGRRVWEVAMIRRDYDGDSVKQMETHFFVGLDLRDSDPFGLRVGGFWDRHPAGR
PHAGE_Arthro_vB_ArS_ArV2_NC_022972: hypothetical protein; PP_03326; phage(gi563398164) 4.00E-31 MDLTDSIAPRSDQMNAEDLLTGPRTFTVTEVRKGSSAEQPVSIYLAEFPSDRPFKPSKTVRRLIVSAWGK
PHAGE_Mycoba_Charlie_NC_023729: gp47; PP_03327; phage(gi593779210) 5.00E-42 MTITEPGIVTDLDERTYHADRGSLSHSGSKNLHDSPARFRWLLDNRVEKDSFDVGTLAHKLILRSTDNRI
hypothetical; PP_03328 N/A VSYFDLPEVQEARADLRDALDAAWQDARDHGHDDHRPTRAETDTDQWHAAWDDDEGHTWADMVRGVAVDS
hypothetical; PP_03329 N/A MTAAAIVPAQCPNCGRLTVDNRQHCPDMPASQWARVCLAMTCAKCGTGYHTHTTKETQS
hypothetical; PP_03330 N/A VAGRGDIANTRHVLTIGPVGSAGRHVTIHRHYDGPDSTTWGHLIIAGCWRGIADQLDHRIHTKKGHGWDE
hypothetical; PP_03331 N/A MSDAEIITQARLQLMHGTDPTDVADYILDALTERAWRGPSRDHDEVTA
hypothetical; PP_03332 N/A VGEADPMTLHTRVAVTSGDVTPEAVFAQCRSIIGADKHQVIEGDPLAMAPGQGLPALMWVESSNGEPETC
hypothetical; PP_03333 N/A MNIVETSDQIGALLILTATLVPVVAWAVWVAIDVLWERRTR
hypothetical; PP_03334 N/A MNDTTTIVWADLATEHHCDTCTCAPPIPDLNAALQAAATLDVRGAYSVRIDLDGHITLQGDAPNMLNLIG

hypothetical; PP_03335 N/A LGRPAAFERFGGALLFLRSVVAAGGDECAGDDDGHEREGGE
attR N/A TTGGGTGTCGGTC
hypothetical; PP_03336 N/A MSNIPPLLTTSEVAKSCGDVAVKTVTRWVESGQLAYAQKLGGLRGAYLFDPAEVARFKKSRERQVTS
PHAGE_Mycoba_32HC_NC_023602: integrase; PP_03337; phage(gi589893371) 2.00E-23 MSQDLAAAYLAHLEAEHAPANTIAARARVLRSVGSAGTATREDIEAWWATRRDLSPATRSNDLANLRAFY

Region 5
attL N/A CAGCAGGCCGATGCC
PHAGE_Bacill_G_NC_023719: gp245; PP_05619; phage(gi593777701) 3.00E-22 MTTPQISAPDSTAAPDTPAHGRPLVGFEHCELRYPNGTHALSDVNLTVREGEFVSVVGPSGCGKSTLLRL
hypothetical; PP_05620 N/A MTTHTDQAAAPKDPVEPDPGAPSTTTDIAVLAQKASAASGF
PHAGE_Liston_phiHSIC_NC_006953: hypothetical protein LPPPVgp44; PP_05621; phage(gi62362410) 2.00E-43 MTPEDGTSASARANDDFADFDAGYRFGQYSDDDFSAADFAPSTPEAPAQDGPLPPPFPLDGIDLLSPPGF
hypothetical; PP_05622 N/A MDRLAPAPFASRQIKRLVAALWGRMSPAERQAFKEWITKQ
hypothetical; PP_05623 N/A VSEKTGLQVGTIANIRDGKTQNPTYYALKRLSDYFEVNP
PHAGE_Liston_phiHSIC_NC_006953: putative helicase subunit; PP_05624; phage(gi62362409) 4.00E-62 MRALAALAKEPNMSLIATAAKPADRPVLITLCGDSGMGKTSLAASFPKPIFIRAEDGMQAIPANNRPDAF
PHAGE_Liston_phiHSIC_NC_006953: putative helicase subunit; PP_05625; phage(gi62362409) 3.00E-12 VGFLRLVTFTKGDDGERKKAISTGDRELVCHAVASNISKNRYGLTEALPFAAGENPLFAVIPALGAKHSI
PHAGE_Liston_phiHSIC_NC_006953: hypothetical protein LPPPVgp42; PP_05626; phage(gi62362408) 9.00E-20 MAGFWNLSDGEDAAKTGAEYEIPGGNMDPIPAGSSVLAMIDEAKWDHTQNDAEEYISLRWTVLAPEEYKN
PHAGE_Erwini_vB_EamM_Y2_NC_019504: tail fiber; PP_05627; phage(gi422934766) 2.00E-13 MEQRSEEWFAARKGRVTASMVGAILGVSPNLSRAGAMRRMVRDAHGAEPEFTGNIATQYGERNEDGAVDE
PHAGE_Vibrio_pYD21_A_NC_020846: hypothetical protein; PP_05628; phage(gi472340491) 2.00E-05 LTKISKAGAVSYAKAVAELLPGTDLEKWRGKPSTYWMLK
PHAGE_Liston_phiHSIC_NC_006953: putative helicase; PP_05629; phage(gi62362405) 2.00E-132 MGQKRDAQKKKDAEMTLRPYQQAAVDAAVEWMRKSLAPACIEAATGAGKSHIIAEIARQIHHQTSKRVLC
PHAGE_Synech_S_MbCM6_NC_019444: hypothetical protein; PP_05630; phage(gi418487471) 1.00E-11 MATRAFAASVRQRRPITTDQVTITTAAANPSGATLPTAVAGDRVIIANRGANPVNIYPATGAAIGALAAN
hypothetical; PP_05631 N/A MLALAENTQAVRAMVEAMKAQNTHFADNNEMFKALGPVLSDLRHDGADSKAHLAAIRDALNRGR
PHAGE_Acinet_AP22_NC_017984: putative endolysin/autolysin; PP_05632; phage(gi388570824) 4.00E-23 MTMKTSDAGLFALALHEGIVPAPYRDSVGVWTYGIGHTLGAGYPDPAKMLRGMPSNLDAALRDVFDLFRR
hypothetical protein KVU_1650 [Ketogulonicigenium vulgare WSH-001] gi|385234143|ref|YP_00579 PP_05633 3.00E-06 LGRIVFQLKGRDMDWAPYARIAARYIIGGVGGTAVGDAVLNDPDLMNILTIAISGAAAALTEYLYALAKR
hypothetical; PP_05634 N/A MIAFIWKLILGGLWRPLLAVLGAAGLYVKGRADAKAKADSRALDATVKGQEAARKGRAEAVEKLRQGKTP
PHAGE_Roseob_1_NC_015466: hypothetical protein RDJLphi1_gp31; PP_05635; phage(gi331028085) 1.00E-09 MATLVPRLAHSAALILLALPAQAQTPCTGLPDALAALAARYDEAPRVSGLMANGQLLIVTASEAGGFTVL
hypothetical; PP_05636 N/A MSDLIERLERWGRDEGLQNHFIGSRSARKDCAEAATALAEAEAEIARLKDVLEIVAAGPILGEPMARWIN

hypothetical; PP_05637 N/A LTSRVAIKRALEAAGFVFLRGGWARKEAAPRLQDKIDRAVKDAAESVERIKNVGTHENHVGTHEK
hypothetical; PP_05638 N/A MTIAEKIQMMRDAGLRRTAARIWQNPNGTWSHHKDAQREWYGLDGCETDLRFFEEWEA
PHAGE_Rhodob_RcapNL_NC_020489: phage integrase/recombinase; PP_05639; phage(gi461474991) 1.00E-15 VMREAPYYGWTIFTRTRSGKSIGGDVSAAAKLAGVKKTAHGLRKTRATVLAEGGATASQIAAWTGHKTLA
PHAGE_Rhizob_RR1_A_NC_021560: hypothetical protein; PP_05640; phage(gi514231508) 9.00E-13 LIQAIAELRRDGPEDATWTASLVADGNPYQSSLAGAVATTLNAALSGDLIPKADADLAVALMVEKAADVV
PHAGE_Rhodoc_REQ2_NC_016652: hypothetical protein; PP_05641; phage(gi372449849) 3.00E-06 LRSGDTCPGRPRGSASATSATTRCHLIAAAPDLARALLDARAEHAASLIRINAEAVEAVAQARADAQAAV
cytochrome bd quinol oxidase subunit 1 [Intrasporangium calvum DSM 43043] gi|317123393|ref|Y05.1|; PP_05642 6.00E-78 MYRRSARLGAIILLLGGVAVTISGDLQSRVMTQVQPMKMAAAEALYDTSPEGKGASFSIISVGTPDGQHE
cytochrome d oxidase cyd, subunit II [Kytococcus sedentarius DSM 20547] gi|256824081|ref|YP_.1|; PP_05643 3.00E-90 MELTTVWFILIAVLWIGYFVLEGFDFGVGILFPVLGRDDPDLGSNDLAETGEIRRRVMLSTVGPVWDGNE
cytochrome d ubiquinol oxidase subunit II [Nocardiopsis dassonvillei subsp. dassonvillei DSMi|297559180|ref|YP_003678154.1|; PP_05644 1.00E-23 MSVGSLFVALFPDVMPSTTDPAFSLTTINASSTDYTLKIMTWVAVVFTPIVIGYQGWSYWTFRKRVSGHH
PHAGE_Plankt_PaV_LD_NC_016564: ABC transporter; PP_05645; phage(gi371496158) 5.00E-13 MGPIDPSLLRALPGARSRVARLAGMGVISGVLALGQAIAVAASVTAIVRGSSLAMPLAVLGAVLVLRGLV
attR N/A CAGCAGGCCGATGCC

Table B5 Summary of spacer sequences

GID; DR consensus; # DR Variants; Ave. DR Length; Spacer count; Ave. SP Length; Ave. SP Coverage; # of Flankers; Ave. FL Length; # of Reads; Coverage

G4 GTGCACCCCGGCAGCCCGCCGGGGTGGGAGTCTCAAC 1 37 45 35 1 6 37 26 1:25,2:10,3:6,4:1,5:2,6:1,
G9 CTCTCCGTCGGCGTTCGTCGACGGCCTCATTGAAGC 1 36 36 36 1 3 35 41 1:30,2:6,
G10 CTTCCCCCGGCCATCAGGCCGGGGCTCCATTGCGGC 1 36 33 37 3 4 42 52 1:22,2:1,4:1,7:2,8:2,9:2,10:1,11:1,12:1
G12 AGGAGGGGCTTGCGGTGTTTGTTCAGG 1 27 3 40 1 0 0 2 1:3,
G15 CGGTTCACCTCCACGTGCGTGGAGACAAC 1 29 23 32 1 0 0 4 1:23,
G16 ATTCACTGCCGTGTAGGCAGCTCAGAA 1 27 10 33 1 0 0 3 1:4,2:6,
G19 GGCTCCCCCGCACACGCGGGGATCGACCC 1 29 6 32 1 0 0 2 1:6,
G20 CCTGCCAAGAAAGCGCCGGCAAAGAAGGCACCGGTTAAAAAGGC 1 44 3 18 1 1 31 19 1:1,2:2,
G23 GTTGCACTCAGGCTTTGCCCTGAGTGGGGATTGAAAC 1 37 5 35 1 1 38 1 1:5,
G27 CCAGCATTCCCGGCCTAGTGTCGGGCTCCGTTGAAGCGG 1 39 3 31 1 0 0 1 1:3,
G28 AGCCTACCAATGGGAAGTCGGTAGGGAAACCACGGCGCGCG 1 41 3 25 1 0 0 1 1:3,
G29 GAGTGTAGCTATCCGGGGTGAGAGAGGGAGCTACAAC 1 37 3 30 1 0 0 1 1:3,
G33 CTTATAATTGCACCAGTTTGGGATTGAAAC 1 30 31 36 1 2 39 9 1:31,
G35 GTAGCGCCCGTCCTTAGTGACGGGCGAGGATTGAAAC 1 37 22 34 1 0 0 5 1:22,
G37 GTCGCGCGCCCTTCACGGGGCGCGCGTGGATTGAAAC 1 37 10 34 1 0 0 1 1:10,
G41 GCTCTCCGCGCCCGCGCGGGCGCGGCCTCGTTGAAGC 1 37 4 35 1 1 38 4 1:4,
G48 GATCCCGCCCTCACCCGCACGGGCCGCA 1 28 3 33 2 1 37 4 2:1,3:2,
G50 GTTCGCCATCGCATAGATGGTTTAGAAAA 1 29 9 31 1 0 0 1 1:9,
G53 CGGGCTCGCCCGTCAGCGATGACGGGCGCGGATTGAAAC 1 39 5 34 1 0 0 1 1:5,
G56 AGTTCTCGTCCCCTCGCGGGGTTTTGGGTCTGACGAC 1 37 15 37 1 1 42 9 1:14,2:1,
G58 CGCGTTCCCCGCAGGCGCGGGGATGAACCG 1 30 4 32 1 0 0 4 1:4,
G59 GGGGTCGCCCCTCGTGATCACGAGGGGCGTGGATTGAAAC 1 40 9 32 1 1 37 2 1:9,
G60 CGTCGCGCCCCTCACGGGGCGCGCGGATTGAAACTA 1 36 4 32 1 0 0 1 1:4,
G63 GACACGCTCCCCGGCGACGGGGAGCGAGGATTGAAACCAC 1 40 4 31 1 0 0 1 1:4,
G67 ATTTCCGCGACTGAAAGGTCGCGGCCTCATTGAAGC 1 36 9 36 1 0 0 1 1:9,
G70 CATGTGCTCAACGCCTTTCGGCATCAACGAATGATTCAC 1 39 5 33 1 0 0 1 1:5,
G71 GGGGGAGGCCAGGAGGCGGCCGTATCGTGA 1 30 3 31 1 0 0 1 1:2,2:1,
G72 CTGTTGCACCCGCCTCTCGGGGCGGGTGGGGATTGAAACCA 1 41 5 31 1 2 33 1 1:5,
G73 GTTGATAGCAATAATTCAAAGATACATTCTAAAAGCTATTCACAAC 1 46 6 30 1 1 32 1 1:6,
G74 CCTTCAATGAGGCCGAGGCACGAGGCCTCGGAAAAC 1 36 8 35 1 2 37 2 1:8,
G76 GTCACAAAGGAGGTTCCGCTCACGCGGATTGAAACA 1 36 3 36 1 0 0 1 1:3,
G83 GTTTTTGCGCGTCTGCTCGAAAACCACAA 1 29 3 28 1 0 0 1 1:3,
G84 ACAAGGCGTTGTAGACCCCGACGGGAGGAGGGATGAATTCGC 1 42 3 36 1 0 0 1 1:3,
G85 GTCGCCGCCTTCACCGGCGGCGCGGATTGAAAC 1 33 12 33 1 1 102 3 1:11,2:1,
G87 CTTTCAGTCTCCGCTCTTTCGGAGTAGGTGAGGAATC 1 37 5 35 1 0 0 1 1:5,
G90 ATCGCGACTTGCGTCGCTCCTACG 1 24 3 46 1 0 0 2 1:3,
G91 CCGCCGCCGGTGCAGGCGATGATGCTCAATGCCTGCGGTGCAGCGCC 1 47 3 23 1 1 28 1 1:3,
G98 CGTGGTCCCCGCGTGGGCGGGGATGAGCCGCC 1 32 3 29 1 0 0 2 1:3,
G100 GCGCGAGGTCGCGTTGTGACGCGACCAATTGAAACTA 1 37 5 33 1 0 0 1 1:5,
G101 GCTTCAATTCGGCCACGGCGTTGATGCCGTGGAAAC 1 36 5 36 1 0 0 1 1:5,

DR, direct repeats; SP, spacers; FL, flankers
