bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

1 Species-level microbiome composition of activated sludge - introducing the MiDAS 3 2 ecosystem-specific reference database and

3 Marta Nierychlo, Kasper Skytte Andersen, Yijuan Xu, Nick Green, Mads Albertsen, Morten S. 4 Dueholm, Per Halkjær Nielsen.

5 Center for Microbial Communities, Department of Chemistry and Bioscience, Aalborg University, 6 Aalborg, Denmark.

7 *Correspondence to: Per Halkjær Nielsen, Center for Microbial Communities, Department of 8 Chemistry and Bioscience, Aalborg University, Fredrik Bajers Vej 7H, 9220 Aalborg, Denmark; 9 Phone: +45 9940 8503; Fax: Not available; E-mail: [email protected]

10 Abstract 11 The function of microbial communities in wastewater treatment systems and anaerobic digesters is 12 dictated by the physiological activity of its members and complex interactions between them. Since 13 functional traits are often conserved at low taxonomic ranks (genus, species, strain), the development 14 of high taxonomic resolution and reliable classification is the first crucial step towards understanding 15 the role of microbes in any ecosystem. Here we present MiDAS 3, a comprehensive 16S rRNA gene 16 reference database based on high-quality full-length sequences derived from activated sludge and 17 anaerobic digester systems. The MiDAS 3 taxonomy proposes unique provisional names for all 18 microorganisms down to species level. MiDAS 3 was applied for the detailed analysis of microbial 19 communities in 20 Danish wastewater treatment plants with nutrient removal, sampled over 12 years, 20 demonstrating community stability and many abundant core taxa. The top 50 most abundant species 21 belonged to genera, of which >50% have no known function in the system, emphasizing the need for 22 more efforts towards elucidating the role of important members of wastewater treatment ecosystems. 23 The MiDAS 3 taxonomic database guided an update of the MiDAS Field Guide – an online resource 24 linking the identity of microorganisms in wastewater treatment systems to available data related to 25 their functional importance. The new field guide contains a complete list of genera (>1,800) and 26 species (>4,200) found in activated sludge and anaerobic digesters. The identity of the microbes is 27 linked to functional information, where available. The website also provides the possibility to BLAST 28 the sequences against MiDAS 3 taxonomy directly online. The MiDAS Field Guide is a collaborative 29 platform acting as an online knowledge repository and facilitating understanding of wastewater 30 treatment ecosystem function.

1

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

31 Introduction 32 Biological nutrient removal (BNR) plants are the principal wastewater treatment systems across the 33 urbanized world. Besides their primary role in removing pollutants and pathogens, these systems are 34 increasingly seen as resource recovery facilities, contributing to sustainable resource management. 35 At-plant anaerobic digestion exploits the value of sludge as a source of energy, reducing plant carbon 36 footprint and placing wastewater treatment plants (WWTPs) on the road to achieving a circular 37 economy. Complex microbial communities define the function of biological wastewater treatment 38 systems, and understanding the role of individual microbes requires knowledge of their identity, to 39 which known and putative functions can be assigned. The gold standard for identification and 40 characterization of the diversity, composition, and dynamics of microbial communities is presently 41 16S rRNA gene amplicon sequencing (Boughner and Singh, 2016). A crucial step is a reliable 42 taxonomic classification, but this is strongly hampered by the lack of sufficient reference sequences 43 in public databases for many important microbes present in wastewater treatment systems (McIlroy 44 et al., 2015). In addition, high taxonomic resolution is missing from the large-scale public reference 45 databases (SILVA (Quast et al., 2013), Greengenes (DeSantis et al., 2006), RDP (Cole et al., 2014)), 46 often resulting in poor classification. 47 MiDAS (Microbial Database for Activated Sludge) was established in 2015 as an ecosystem- 48 specific database for wastewater treatment systems that provided manually curated taxonomic 49 assignment (MiDAS taxonomy 1.0) based on the SILVA database and associated physiological 50 information profiles (midasfieldguide.org) for all abundant and process-critical genera in activated 51 sludge (AS) (McIlroy et al., 2015). Both taxonomy and database were later updated (MiDAS 2.0) to 52 cover abundant microorganisms found in anaerobic digesters (AD) and influent wastewater (McIlroy 53 et al., 2017). However, more high-identity 16S rRNA gene reference sequences are needed to be able 54 to classify microbes at the lowest possible taxonomic levels. Currently, taxonomic classification can 55 only be provided down to genus level, limiting elucidation of the true diversity in these ecosystems. 56 Consequently, physiological differences between the members of the same genus that coexist in 57 activated sludge cannot be reliably linked to the individual phylotypes due to insufficient 58 phylogenetic resolution. Finally, manual curation of large-scale databases such as SILVA is not 59 feasible due to the growing number of sequences and frequent taxonomy updates (Glöckner et al., 60 2017). 61 Recent developments of sequencing technology and bioinformatic tools enabled generation of 62 millions of high-quality full-length 16S rRNA reference sequences from any environmental 63 ecosystem (Callahan et al., 2019; Karst et al., 2019, 2018). Obtaining full-length 16S rRNA gene 64 sequences directly from environmental samples has made it possible to establish a near-complete 65 reference database for high-taxonomic resolution studies of the wastewater treatment ecosystem, 66 based on approximately one million 16S rRNA gene sequences retrieved directly from the wastewater 67 treatment plants, which provided more than 9,500 exact sequence variants (ESVs) after dereplication 68 and error-correction. Moreover, we created the AutoTax pipeline for generating a comprehensive 69 ecosystem-specific taxonomic database with de novo names for novel taxa. The names are first 70 inherited from sequences present in large-scale SILVA taxonomy, and de novo names are provided 71 for all remaining unclassified taxa based on reproducible clustering with rank-specific identity

2

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

72 thresholds (Yarza et al., 2014). These de novo names provide placeholder names for all novel 73 phylotypes and act as fixed unique identifiers, making the taxonomic assignment independent of the 74 analyzed dataset and enabling cross-study comparisons. 75 Here we present the MiDAS 3 reference database, which is based on the ESVs previously 76 obtained from activated sludge and anaerobic digester systems (Dueholm et al., 2019), amended with 77 additional sequences of potential importance in the field. These sequences represent from 78 influent wastewater, potential pathogens, and genome-derived sequences of microbes found in 79 wastewater treatment systems. The MiDAS 3 taxonomy is built using AutoTax (Dueholm et al., 2019) 80 and proposes unique provisional names for all microorganisms important in wastewater treatment 81 ecosystems at species-level resolution. We have furthermore curated some of these species-level 82 names, and here we demonstrate the resolution power of MiDAS 3 by analyzing species-level 83 community composition in 20 Danish full-scale WWTPs carrying out nutrient removal with chemical 84 phosphorus removal (BNR) or enhanced biological phosphorus removal (EBPR) over 12 years. We 85 provide important information about diversity, core communities, and the most abundant species in 86 these ecosystems. Although the plants investigated are Danish, the results and approach are relevant 87 for similar systems worldwide. 88 We also present the updated MiDAS Field Guide, an online platform that links the MiDAS 3 89 taxonomy-derived identity of all microbes from wastewater treatment systems to abundance 90 information based on our long-term survey, and functional information (where available). MiDAS 3 91 alleviates important problems related to the taxonomic classification of microbes in environmental 92 ecosystems, such as missing reference sequences and low taxonomic resolution, while our MiDAS 93 Field Guide acts as an online knowledge repository facilitating understanding the function of 94 microbes present in wastewater treatment ecosystems.

95 Materials and methods 96 MiDAS 3 reference database and taxonomy 97 The MiDAS 3 reference database was built using full-length 16S rRNA gene ESVs from 21 activated 98 sludge WWTPs and 16 ADs systems, and taxonomic classification was assigned as described by 99 Dueholm et al., (2019) and shown in Figure 1. The database was amended with 95 additional 100 sequences from published genomes for the bacteria that are potentially important in wastewater 101 treatment systems, but not present in the reference ESVs obtained (Table S2). Taxonomy was 102 assigned to all sequences using the workflow presented in Figure 1. Gaps in the SILVA-derived 103 taxonomy were filled with a de novo taxonomy that provides ‘midas_x_y’ placeholder names, where 104 x represents the first letter of taxonomic level, for which de novo assignment was created, and y is a 105 number derived from the ESV that forms the cluster centroid for a given taxon. Where applicable, 106 taxonomic classifications were curated with taxon names for ESVs that represent genomes published 107 in peer-reviewed literature. Additionally, several manual curations were made, based on MiDAS 2.1 108 database and published literature (full list of amendments is presented in Table S3). MiDAS 109 taxonomy file is available for download from the web platform (midasfieldguide.org/Downloads).

3

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

110 Activated sludge samples 111 Sampling of the activated sludge biomass from 20 Danish full-scale municipal WWTPs was carried 112 out within the MiDAS project (McIlroy et al., 2015). The plants were sampled up to four times a year 113 (February, May, August, October) from 2006 to 2018 with a total of 712 samples. All samples were 114 taken from the aeration tank and sent overnight to Aalborg University for processing. Basic 115 information about the design and operation of the WWTPs as well as number of samples collected 116 are given in Table S1. 117 Amplicon sequencing and bioinformatic analysis 118 DNA extraction, sample preparation including amplification of V1-3 region of 16S rRNA gene, and 119 amplicon sequencing were conducted as described by Stokholm-Bjerregaard et al., (2017). Forward 120 reads were processed as described by Dueholm et al., (2019) with raw reads trimmed to 250 bp, 121 quality filtered, exact amplicon sequence variants (ASVs) identified, and taxonomy assigned using 122 the MiDAS 3 reference database using the SINTAX classifier (Edgar, 2016). 123 Data analysis and visualization 124 Data was imported into R (R Core Team, 2018) using RStudio (RStudio Team, 2015), analyzed and 125 visualized using ggplot2 v.3.1.0 (Wickham, 2009) and ampvis2 v.2.4.9 (K. S. Andersen et al., 2018). 126 The dataset containing 75,017 ASVs was rarefied to 10,000 reads per sample, resulting in 57,902 127 ASVs. Simpson’s alpha-diversity index calculation and redundancy analyses were performed using 128 the ampvis2 package. Redundancy Analysis (RDA) was performed using the Hellinger 129 transformation and constrained to WWTP location. For the analysis of rank abundance and core 130 microbiome, all ASVs without taxonomic classification at the rank analyzed were removed. The 131 remaining datasets consisting of 29,493 ASVs (50.9%) and 18,452 ASVs (31.9%) with genus- and 132 species-level classification were normalized to 100%. These ASVs were represented in total by 1647 133 genera and 3651 species. The unclassified ASVs were primarily due to missing ESV reference 134 sequences for rare ASVs, and the inability of the amplicons to resolve the species-level taxonomy for 135 certain taxa. An example of the latter is genus Trichococcus, where the species T. pasteurii, T. 136 flocculiformis, and T. patagoniensis cannot be distinguished even with full-length 16S rRNA gene 137 sequences.

138 Results and discussion 139 The MiDAS reference database provides species-level classification 140 In the creation of the MiDAS 3 database, we have applied a novel approach (Figure 1), where high- 141 throughput sequencing of high-quality full-length 16S rRNA genes (Karst et al., 2018) was combined 142 with a new automatic taxonomy concept, AutoTax (Dueholm et al., 2019). The database contains 143 full-length sequences from 21 Danish WWTPs carrying out nutrient removal, and 16 mesophilic and 144 thermophilic Danish ADs. The MiDAS 3 reference database was additionally supplemented with 145 high-quality full-length 16S rRNA gene sequences of microbes with known importance in the field, 146 such as bacteria present in the influent wastewater and pathogens (Table S2). 147 MiDAS 3 taxonomic assignment is based on the AutoTax framework, where sequences first 148 inherit taxonomic classification from the SILVA database (release 132, Quast et al., 2013), including

4

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

149 species names (based on type strains), if available. The remaining unclassified sequences are assigned 150 unique MiDAS names down to species level. Thus, the new release of MiDAS 3 taxonomy proposes 151 a provisional genus and species name for all microorganisms found in activated sludge and anaerobic 152 digester ecosystems, acting as stable identifiers and placeholder names for the sequences, until the 153 representative microorganisms are further characterized and assigned approved names. Additionally, 154 names derived from recently published genomes and other sources not yet incorporated in SILVA 155 taxonomy, were manually curated (see Table S3 for full list of curated names) to provide a near- 156 complete database and taxonomy for microorganisms in wastewater treatment ecosystems. For a few 157 species there were inconsistencies between the genus assigned by SILVA and the type strain name 158 (see Table S4); these issues relate to the fact the SILVA taxonomy is updated based on phylogenetic 159 analyses of the available 16S rRNA gene sequences, whereas type strain names have to be approved 160 by the international taxonomy committee. The possibility to include additional sequences ensures that 161 the database can easily be updated based on the current state of the ecosystem-specific field, and will 162 drive the future releases of the MiDAS database. 163 Two examples of improved taxonomy are shown in Table 1. is one of the most 164 abundant genera in Danish influent wastewater (McIlroy et al., 2017), with two abundant species 165 identified using the MiDAS 3 reference database. A. defluvii is described by the correct classification 166 down to genus level in all but one of the reference databases tested (MiDAS 2, SILVA, Greengenes, 167 RDP). However, the other abundant species (midas_s_8989) was classified only at family level, with 168 genus name either missing or assigned incorrectly by the other databases. 169 Genus Ca. Amarolinea contains abundant filamentous bacteria in Danish WWTPs (Nierychlo 170 et al., 2019) and can be associated with bulking incidents (M. H. Andersen et al., 2018; Nierychlo 171 and Nielsen, 2017). All three abundant species belonging to the genus lack taxonomic classification 172 below order level in public large-scale databases, preventing identification and analysis of this 173 important member of the activated sludge system.

174 Microbiome composition of Danish activated sludge plants 175 In this study, we performed a 12-year survey of bacteria present in 20 Danish full-scale WWTPs with 176 nutrient removal that primarily treated municipal wastewater. All plants had biological nitrogen 177 removal and most also had EBPR (17 plants), while 3 plants had chemical P-removal (BNR plants). 178 The survey was carried out using 16S rRNA gene sequencing of the V1-3 region on the Illumina 179 MiSeq platform. Using the MiDAS 3 taxonomy, microbial community composition was 180 characterized at genus and species level in all plants based on 712 samples collected 2 to 4 times per 181 year for each plant from 2006 to 2018. 182 The overall microbial diversity within each plant was estimated using the Simpson index of 183 diversity and compared for all plants analyzed (Figure 2). The Simpson index places a greater weight 184 on species evenness than the richness (Kim et al., 2017), thus allowing the assessment of how evenly 185 distributed relative taxa contributions are across the samples. Since the value of the index represents 186 the probability that two randomly selected individuals will belong to different taxa, the low values of 187 Simpson index of diversity (Figure 2) indicate that the microbial communities were dominated by a 188 low number of abundant taxa across the plants. Additionally, the low spread of the index in each plant 189 indicates that the microbial communities across the plants were consistently stable across the years.

5

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

190 Similar alpha diversity indices were observed in WWTPs across the world by Wu et al., (2019), 191 calculated as Inverse Simpson (Figure S1). 192 The overall variation in activated sludge community structure in 20 Danish WWTPs was 193 visualized using RDA plot (Figure 3) with the WWTP location as the constrained variable. In many 194 cases the samples coming from different WWTPs were not completely separated from each other, 195 indicating high similarity between the microbial communities in those WWTPs. This is also 196 highlighted by the fact that the eigenvalues of the two principal components plotted relative to the 197 total variance in the data are relatively small (9.7% and 5.6%). Fredericia, Hirtshals, Esbjerg E, and 198 Esbjerg W WWTPs appeared to have a more distinct microbial community composition, presumably 199 due to a high fraction of industrial waste in the influent (Table S1). A pronounced clustering of the 200 samples coming from individual WWTPs indicates that variation between the plants was likely more 201 pronounced than within individual plants over time, and suggests long-term stability of the microbial 202 communities in Danish plants. 203 The microbial diversity of the WWTPs was composed of 1,647 genera and 3,651 species. 204 However, a relatively small fraction of these comprised a large proportion of the reads, thus 205 representing the abundant taxa that are assumed to carry out the main functions in the system (Figure 206 4a) (Saunders et al., 2016). The abundant taxa were defined as those that constitute the top 80% of 207 reads obtained by amplicon sequencing. In the BNR and EBPR plants, we found 157 and 164 208 abundant genera and 421 and 504 abundant species, respectively. These species had a relative ASV 209 read abundance above approx. 0.04%. 210 The overlap of taxa between the BNR and EBPR abundant communities both at genus and 211 species level was high (Figure 4b), and the core taxa, defined as being present and abundant in ≥80% 212 of the samples (Saunders et al., 2016), constituted more than 45% of the total reads at genus level and 213 more than 25% of total reads at species level in both types of plants (Figure 4c). The number of 214 genera and species belonging to the core communities in BNR and EBPR plants constituted a small 215 fraction of the total number of taxa present (1-6-3.5%). Both types of plants shared 72% of genera 216 and 65% of species in their respective core communities. The results show that the inclusion of an 217 anaerobic process step in the EBPR plants compared to the conventional BNR plants did not affect 218 the communities very much.

219 Species-level community composition 220 The most abundant species present in the BNR and EBPR plants are shown in Figure 5. Among the 221 10 most abundant species, two belong to genera with a provisional ‘midas’ name, while of the top 50 222 and top 100 species, 13 and 36 species, respectively, belong to genera not possessing genus-level 223 classification in SILVA (top 100 species are presented in Figure S2). These would be excluded from 224 the analysis if MiDAS 3 was not used as the reference database, demonstrating the advantage of using 225 the ecosystem-specific taxonomy. 226 A large fraction of the most abundant species is poorly described in the literature. The top 50 227 species belong to 41 different genera, of which almost 60% (28 genera) have no known function in 228 wastewater treatment systems (McIlroy et al., 2017, 2015). The function of 65% of all genera defined 229 as abundant in the EBPR and BNR plants (Figure 2a) is unknown, demonstrating the need for more 230 studies that link identity and function. A few species, however, match SILVA type strains, such as

6

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

231 Acidovorax defluvii and Intestinibacter bartletti, so these may be applied for deeper studies of their 232 physiology. 233 The most abundant species in both BNR and EBPR plants belong to the genus Tetrasphaera, 234 which represents polyphosphate-accumulating organisms (PAO) with versatile metabolism, 235 including the potential for denitrification (Herbst et al., 2019; Kristiansen et al., 2013). Other very 236 abundant species belong to the filamentous genera Ca. Microthrix (McIlroy et al., 2013) and Ca. 237 Villigracilis (Nierychlo et al., 2019); the former known to cause serious bulking problems (Rossetti 238 et al., 2005). Trichococcus may also possess filamentous morphology in activated sludge and may be 239 implicated in bulking (Liu and Seviour, 2001). Rhodoferax is a denitrifier (McIlroy et al., 2014), 240 while the function in activated sludge is unknown for species in the Romboutsia and Rhodobacter 241 genera. The novel genus midas_g_17, for which no functional information is available, belongs to 242 the family Saprospiraceae, and was wrongly classified as Ca. Epiflobacter in the previous version of 243 MiDAS taxonomy. Based on the coverage of the FISH probes available and mapping of the partial 244 16S sequences (Xia et al., 2008) to MiDAS 3 database, Ca. Epiflobacter name was re-assigned to a 245 correct genus, which is present at much lower abundance in the ecosystem. MiDAS 3 provides good 246 opportunities to reclassify sequences from previous studies, for which information about distribution 247 and/or function is available. For example, after mapping a set of partial 16S rRNA gene sequences 248 originally related to genus Aquaspirillum (Thomsen et al., 2007), they could be reclassified as 249 midas_g_33 – an abundant novel genus, indicating its potential role in denitrification. 250 Interestingly, the diversity of abundant species within the different genera was generally low in 251 the plants, typical with 3-4 abundant species in each of the most abundant genera. However, the 252 known PAO Tetrasphaera and the less abundant Ca. Accumulibacter exhibited slightly higher 253 diversity, each having 10 and 5 abundant species, respectively (Figure 6). Tetrasphaera had one 254 dominant species (midas_s_5) present in most plants, while the other species appeared more 255 randomly distributed. No clear dominance of any of Ca. Accumulibacter species could be observed 256 as their abundance was very different across the plants. Two of these, Ca. A. phosphatis and Ca. A. 257 aalborgensis are described at the species-level, based on the genomes available (Albertsen et al., 2016; 258 Martín et al., 2006). Interestingly, the genus Dechloromonas, which is also suggested to be a putative 259 PAO (Stokholm-Bjerregaard et al., 2017), was more abundant than Ca. Accumulibacter, and their 260 diversity was low with only two dominant species present in almost all plants. In general, high species 261 diversity of the PAOs may suggest a high degree of specialization allowing the exploitation of 262 different niches in the activated sludge ecosystem. Ca. Accumulimonas, another putative PAO, which 263 was described in the previous version of MiDAS, is not included in the present taxonomy version, as 264 sequences representing that genus, according to the present SILVA taxonomy, were classified to the 265 genus Halomonas.

266 The MiDAS reference database provides a common language for the field 267 Only a few studies have made surveys of the microbial communities in activated sludge systems, and 268 in all cases, it has only been with genus-level and not species-level classification (Saunders et al., 269 2016; Wu et al., 2019). Since the different studies have applied different extraction procedures, 270 different primers (typically V1-3 or V4), and different databases for taxonomic assignment (MiDAS 271 2, SILVA, RDP or Greengenes), it can be difficult to compare results across studies at the genus level,

7

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

272 and impossible at lower taxonomic levels. Therefore, the use of the ecosystem-specific MiDAS 273 database will provide a common language for all work in the field. Another key advantage of the 274 MiDAS 3 taxonomy is that it includes taxa abundant in similar systems across the world (Dueholm 275 et al., 2019), which are missing in SILVA, RDP, and Greengenes. By using this approach, and similar 276 protocols for extraction and primer selection (Albertsen et al., 2015; Dueholm et al., 2019), it will be 277 possible to compare the results of WWTP microbiome studies in the future.

278 The MiDAS reference database for anaerobic digesters 279 Anaerobic digestion is a key technology increasingly used worldwide to reduce waste, generate 280 energy, and minimize the carbon footprint of the plant. In order to provide a comprehensive reference 281 database for the microbes present in the whole wastewater treatment system, MiDAS 3 is based on 282 sequences from activated sludge as well as 16 full-scale Danish anaerobic digesters treating primary 283 and waste activated sludge. A detailed analysis of bacterial and archaeal community structure at 284 species-level has been performed in another study (Jiang et al., n.d.). The study was based on >1000 285 samples collected over 6 years from 46 anaerobic digesters, located at 22 WWTPs in Denmark. When 286 the microbial data was coupled with the analysis of operational, physicochemical, and performance 287 parameters, the main factors driving the assembly of the AD community were elucidated.

288 MiDAS Field Guide online platform (midasfieldguide.org) 289 The new MiDAS Field Guide (www.midasfieldguide.org) is a comprehensive database of microbes 290 in wastewater treatment systems, which was updated at the same time as the release of the MiDAS 3 291 database took place. All microbes found in activated sludge and anaerobic digesters are now included 292 in the field guide and, for the first time, species are listed for each genus. Our database now covers 293 more than 1,800 genera and 4,200 species found in biological nutrient removal treatment plants and 294 anaerobic digesters. More detailed abundance information is now provided both at genus and species 295 level, and functional information is described where available. 296 The MiDAS Field Guide integrates the identity of the microbes with the available functional 297 information (Figure 7) into a community knowledge platform in the form of a searchable database 298 of referenced information, thus acting as a central, on-line repository for current knowledge, where 299 all are welcome to contribute. Due to the rapidly growing amount of relevant literature, and high 300 number of taxa included, the MiDAS Field Guide is a continuous work in progress, with our efforts 301 prioritized towards the microorganisms that comprise the core microbiome as well as other abundant 302 or process-critical species. 303 New features launched in the updated MiDAS Field Guide include: 1) phylogenetic Search by 304 taxonomy function, where all microbes found in the ecosystem are displayed in their respective 305 taxonomic groups, 2) the possibility of classifying sequences using blast function, and MiDAS 3 306 taxonomy directly online. Updated protocols for DNA extraction and amplicon library preparation 307 are available online to facilitate establishment of common framework for studies of microbial ecology 308 of wastewater treatment systems.

8

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

309 Conclusions 310 Here we present the new ecosystem-specific MiDAS 3 reference database, based on a comprehensive 311 set of full-length 16S rRNA gene sequences derived from activated sludge and anaerobic digester 312 systems, and MiDAS 3 taxonomy built using AutoTax. It proposes unique provisional names for all 313 microorganisms important in wastewater treatment ecosystems, down to species level. MiDAS 3 was 314 used in the detailed analysis of microbial communities in 20 Danish wastewater treatment plants 315 (WWTPs) with nutrient removal, sampled over 12 years. This enabled unprecedented resolution, 316 revealing for the first time all species present in the plants, the stability of the communities, many 317 abundant core taxa, and the diversity of species in important functional guilds, exemplified by the 318 PAOs. We found many genera without known function, emphasizing the need for more efforts 319 towards elucidating the role of important members of wastewater treatment ecosystems. We also 320 present a new version of the MiDAS Field Guide, an online resource with a complete list of genera 321 and species found in activated sludge and anaerobic digesters. Through the field guide, microbe 322 identity is linked to available knowledge on function, and user microbial 16S rRNA gene sequences 323 can be classified online with the MiDAS 3 taxonomy. The central purpose of the MiDAS Field Guide 324 is to facilitate collaborative research into wastewater treatment ecosystem functions, which will 325 support efforts towards the sustainable production of clean water and bioenergy, and the development 326 of resource recovery, ending ultimately in the circular economy.

327 Acknowledgement 328 The project has been funded by the Danish Research Council (grant 6111-00617A), Innovation Fund 329 Denmark (OnlineDNA, grant 6155-00003A), the Villum foundation (Dark Matter and grant 13351), 330 and Danish wastewater treatment plants.

331 References 332 Albertsen, M., Karst, S.M., Ziegler, A.S., Kirkegaard, R.H., Nielsen, P.H., 2015. Back to Basics – 333 The Influence of DNA Extraction and Primer Choice on Phylogenetic Analysis of Activated 334 Sludge Communities. PLoS ONE 10, e0132783. 335 https://doi.org/10.1371/journal.pone.0132783 336 Albertsen, M., McIlroy, S.J., Stokholm-Bjerregaard, M., Karst, S.M., Nielsen, P.H., 2016. 337 “Candidatus Propionivibrio aalborgensis”: A Novel Glycogen Accumulating Organism 338 Abundant in Full-Scale Enhanced Biological Phosphorus Removal Plants. Front. Microbiol. 339 7. https://doi.org/10.3389/fmicb.2016.01033 340 Andersen, K.S., Kirkegaard, R.H., Karst, S.M., Albertsen, M., 2018. ampvis2: an R package to 341 analyse and visualise 16S rRNA amplicon data. bioRxiv 299537. 342 https://doi.org/10.1101/299537 343 Andersen, M.H., McIlroy, S.J., Nierychlo, M., Nielsen, P.H., Albertsen, M., 2018. Genomic 344 insights into Candidatus Amarolinea aalborgensis gen. nov., sp. nov., associated with

9

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

345 settleability problems in wastewater treatment plants. Systematic and Applied Microbiology. 346 https://doi.org/10.1016/j.syapm.2018.08.001 347 Boughner, L.A., Singh, P., 2016. Microbial Ecology: Where are we now? Postdoc J 4, 3–17. 348 https://doi.org/10.14304/SURYA.JPR.V4N11.2 349 Callahan, B.J., Wong, J., Heiner, C., Oh, S., Theriot, C.M., Gulati, A.S., McGill, S.K., Dougherty, 350 M.K., 2019. High-throughput amplicon sequencing of the full-length 16S rRNA gene with 351 single-nucleotide resolution. Nucleic Acids Res 47, e103–e103. 352 https://doi.org/10.1093/nar/gkz569 353 Cole, J.R., Wang, Q., Fish, J.A., Chai, B., McGarrell, D.M., Sun, Y., Brown, C.T., Porras-Alfaro, 354 A., Kuske, C.R., Tiedje, J.M., 2014. Ribosomal Database Project: data and tools for high 355 throughput rRNA analysis. Nucleic Acids Res 42, D633–D642. 356 https://doi.org/10.1093/nar/gkt1244 357 DeSantis, T.Z., Hugenholtz, P., Larsen, N., Rojas, M., Brodie, E.L., Keller, K., Huber, T., Dalevi, 358 D., Hu, P., Andersen, G.L., 2006. Greengenes, a chimera-checked 16S rRNA gene database 359 and workbench compatible with ARB. Appl. Environ. Microbiol. 72, 5069–5072. 360 https://doi.org/10.1128/AEM.03006-05 361 Dueholm, M.S., Andersen, K.S., Petriglieri, F., McIlroy, S.J., Nierychlo, M., Petersen, J.F., 362 Kristensen, J.M., Yashiro, E., Karst, S.M., Albertsen, M., Nielsen, P.H., 2019. 363 Comprehensive ecosystem-specific 16S rRNA gene databases with automated taxonomy 364 assignment (AutoTax) provide species-level resolution in microbial ecology. bioRxiv. 365 Edgar, R.C., 2016. SINTAX: a simple non-Bayesian taxonomy classifier for 16S and ITS 366 sequences. bioRxiv 074161. https://doi.org/10.1101/074161 367 Glöckner, F.O., Yilmaz, P., Quast, C., Gerken, J., Beccati, A., Ciuprina, A., Bruns, G., Yarza, P., 368 Peplies, J., Westram, R., Ludwig, W., 2017. 25 years of serving the community with 369 ribosomal RNA gene reference databases and tools. Journal of Biotechnology, 370 Bioinformatics Solutions for Big Data Analysis in Life Sciences presented by the German 371 Network for Bioinformatics Infrastructure 261, 169–176. 372 https://doi.org/10.1016/j.jbiotec.2017.06.1198 373 Herbst, F.-A., Dueholm, M.S., Wimmer, R., Nielsen, P.H., 2019. The Proteome of Tetrasphaera 374 elongata is adapted to Changing Conditions in Wastewater Treatment Plants. Proteomes 7. 375 https://doi.org/10.3390/proteomes7020016 376 Jiang, C., Andersen, M.H., Peces, M., Nierychlo, M., Yashiro, E., Andersen, K.S., Kirkegaard, 377 R.H., Kucheryavskiy, S., Hao, L., Høgh, J., Hansen, A.A., Dueholm, M.S., Nielsen, P.H., 378 n.d. Associations between species-level microbial communities and key variables revealed 379 in anaerobic digesters at 22 Danish wastewater treatment plants during a six-years survey. 380 BioRxiv. 381 Karst, S.M., Dueholm, M.S., McIlroy, S.J., Kirkegaard, R.H., Nielsen, P.H., Albertsen, M., 2018. 382 Retrieval of a million high-quality, full-length microbial 16S and 18S rRNA gene sequences 383 without primer bias. Nature Biotechnology. https://doi.org/10.1038/nbt.4045 384 Karst, S.M., Ziels, R.M., Kirkegaard, R.H., Albertsen, M., 2019. Enabling high-accuracy long-read 385 amplicon sequences using unique molecular identifiers and Nanopore sequencing. bioRxiv 386 645903. https://doi.org/10.1101/645903

10

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

387 Kim, B.-R., Shin, J., Guevarra, R.B., Lee, J.H., Kim, D.W., Seol, K.-H., Lee, J.-H., Kim, H.B., 388 Isaacson, R.E., 2017. Deciphering Diversity Indices for a Better Understanding of Microbial 389 Communities. Journal of Microbiology and Biotechnology 27, 2089–2093. 390 https://doi.org/10.4014/jmb.1709.09027 391 Kristiansen, R., Nguyen, H.T.T., Saunders, A.M., Nielsen, J.L., Wimmer, R., Le, V.Q., McIlroy, 392 S.J., Petrovski, S., Seviour, R.J., Calteau, A., Nielsen, K.L., Nielsen, P.H., 2013. A 393 metabolic model for members of the genus Tetrasphaera involved in enhanced biological 394 phosphorus removal. ISME J 7, 543–554. https://doi.org/10.1038/ismej.2012.136 395 Liu, J.R., Seviour, R.J., 2001. Design and application of oligonucleotide probes for fluorescent in 396 situ identification of the filamentous bacterial morphotype Nostocoida limicola in activated 397 sludge. Environmental Microbiology 3, 551–560. https://doi.org/10.1046/j.1462- 398 2920.2001.00229.x 399 Martín, H.G., Ivanova, N., Kunin, V., Warnecke, F., Barry, K.W., McHardy, A.C., Yeates, C., He, 400 S., Salamov, A.A., Szeto, E., Dalin, E., Putnam, N.H., Shapiro, H.J., Pangilinan, J.L., 401 Rigoutsos, I., Kyrpides, N.C., Blackall, L.L., McMahon, K.D., Hugenholtz, P., 2006. 402 Metagenomic analysis of two enhanced biological phosphorus removal (EBPR) sludge 403 communities. Nat Biotechnol 24, 1263–1269. https://doi.org/10.1038/nbt1247 404 McIlroy, S.J., Kirkegaard, R.H., McIlroy, B., Nierychlo, M., Kristensen, J.M., Karst, S.M., 405 Albertsen, M., Nielsen, P.H., 2017. MiDAS 2.0: an ecosystem-specific taxonomy and online 406 database for the organisms of wastewater treatment systems expanded for anaerobic digester 407 groups. Database 2017. https://doi.org/10.1093/database/bax016 408 McIlroy, S.J., Kristiansen, R., Albertsen, M., Michael Karst, S., Rossetti, S., Lund Nielsen, J., 409 Tandoi, V., James Seviour, R., Nielsen, P.H., 2013. Metabolic model for the filamentous 410 ‘Candidatus Microthrix parvicella’’ based on genomic and metagenomic analyses.’ ISME J 411 7, 1161–1172. https://doi.org/10.1038/ismej.2013.6 412 McIlroy, S.J., Saunders, A.M., Albertsen, M., Nierychlo, M., McIlroy, B., Hansen, A.A., Karst, 413 S.M., Nielsen, J.L., Nielsen, P.H., 2015. MiDAS: the field guide to the microbes of 414 activated sludge. Database 2015, bav062. https://doi.org/10.1093/database/bav062 415 McIlroy, S.J., Starnawska, A., Starnawski, P., Saunders, A.M., Nierychlo, M., Nielsen, P.H., 416 Nielsen, J.L., 2014. Identification of active denitrifiers in full-scale nutrient removal 417 wastewater treatment systems. Environ Microbiol n/a-n/a. https://doi.org/10.1111/1462- 418 2920.12614 419 Nierychlo, M., Miłobędzka, A., Petriglieri, F., McIlroy, B., Nielsen, P.H., McIlroy, S.J., 2019. The 420 morphology and metabolic potential of the Chloroflexi in full-scale activated sludge 421 wastewater treatment plants. FEMS Microbiol Ecol 95. 422 https://doi.org/10.1093/femsec/fiy228 423 Nierychlo, M., Nielsen, P.H., 2017. Experiences in various countries: Denmark, in: Tandoi, V., 424 Rossetti, S., Wanner, J. (Eds.), Activated Sludge Separation Problems: Theory, Control 425 Measures, Practical Experiences. IWA Publishing, pp. 198–209. 426 Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., Glöckner, F.O., 427 2013. The SILVA ribosomal RNA gene database project: improved data processing and 428 web-based tools. Nucl. Acids Res. 41, D590–D596. https://doi.org/10.1093/nar/gks1219

11

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

429 R Core Team, 2018. R: A language and environment for statistical computing. R Foundation for 430 Statistical Computing, Vienna, Austria. Available online at https://www.R-project.org/. 431 Rossetti, S., Tomei, M.C., Nielsen, P.H., Tandoi, V., 2005. “Microthrix parvicella”, a filamentous 432 bacterium causing bulking and foaming in activated sludge systems: a review of current 433 knowledge. FEMS Microbiol. Rev. 29, 49–64. https://doi.org/10.1016/j.femsre.2004.09.005 434 RStudio Team, 2015. RStudio: Integrated Development for R. RStudio, Inc., Boston, MA URL 435 http://www.rstudio.com/. 436 Saunders, A.M., Albertsen, M., Vollertsen, J., Nielsen, P.H., 2016. The activated sludge ecosystem 437 contains a core community of abundant organisms. The ISME journal 10, 11–20. 438 Stokholm-Bjerregaard, M., McIlroy, S.J., Nierychlo, M., Karst, S.M., Albertsen, M., Nielsen, P.H., 439 2017. A Critical Assessment of the Microorganisms Proposed to be Important to Enhanced 440 Biological Phosphorus Removal in Full-Scale Wastewater Treatment Systems. Front. 441 Microbiol. 8. https://doi.org/10.3389/fmicb.2017.00718 442 Thomsen, T.R., Kong, Y., Nielsen, P.H., 2007. Ecophysiology of abundant denitrifying bacteria in 443 activated sludge. FEMS Microbiology Ecology 60, 370–382. https://doi.org/10.1111/j.1574- 444 6941.2007.00309.x 445 Wickham, H., 2009. ggplot2 - Elegant Graphics for Data Analysis. Springer Science + Business 446 Media. 447 Wu, L., Ning, D., Zhang, B., Li, Y., Zhang, P., Shan, X., Zhang, Q., Brown, M., Li, Z., Nostrand, 448 J.D.V., Ling, F., Xiao, N., Zhang, Y., Vierheilig, J., Wells, G.F., Yang, Y., Deng, Y., Tu, Q., 449 Wang, A., Zhang, T., He, Z., Keller, J., Nielsen, P.H., Alvarez, P.J.J., Criddle, C.S., 450 Wagner, M., Tiedje, J.M., He, Q., Curtis, T.P., Stahl, D.A., Alvarez-Cohen, L., Rittmann, 451 B.E., Wen, X., Zhou, J., 2019. Global diversity and biogeography of bacterial communities 452 in wastewater treatment plants. Nature Microbiology 1. https://doi.org/10.1038/s41564-019- 453 0426-5 454 Xia, Y., Kong, Y., Nielsen, P.H., 2008. In situ detection of starch-hydrolyzing microorganisms in 455 activated sludge. FEMS Microbiology Ecology 66, 462–471. https://doi.org/10.1111/j.1574- 456 6941.2008.00559.x 457 Yarza, P., Yilmaz, P., Pruesse, E., Glöckner, F.O., Ludwig, W., Schleifer, K.-H., Whitman, W.B., 458 Euzéby, J., Amann, R., Rosselló-Móra, R., 2014. Uniting the classification of cultured and 459 uncultured bacteria and archaea using 16S rRNA gene sequences. Nat Rev Micro 12, 635– 460 645. https://doi.org/10.1038/nrmicro3330 461

12

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

462 Tables:

463 Table 1. Classification of abundant species belonging to genus Acidovorax and Ca. Amarolinea with 464 different taxonomies. Taxonomy Phylum Class Order Family Genus Species MiDAS 3 Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax A. defluvii MiDAS 2.1 Proteobacteria Acidovorax - SILVA Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax - Greengenes Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - - RDP Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Acidovorax - MiDAS 3 Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae Acidovorax midas_s_8989 MiDAS 2.1 Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - - SILVA Proteobacteria Gammaproteobacteria Betaproteobacteriales Burkholderiaceae - - Greengenes Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae - - RDP Proteobacteria Betaproteobacteria Burkholderiales Comamonadaceae Simplicispira - MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca . Amarolinea midas_s_1 MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum - SILVA Chloroflexi Anaerolineae C10_SB1A - - - Greengenes Chloroflexi Anaerolineae SHA-20 - - - RDP ------MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca . Amarolinea midas_s_553 MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum - SILVA Chloroflexi Anaerolineae C10_SB1A - - - Greengenes Chloroflexi Anaerolineae SHA-20 - - - RDP ------MiDAS 3 Chloroflexi Anaerolineae Caldilineales Amarolineaceae Ca . Amarolinea Ca . A. aalborgensis MiDAS 2.1 Chloroflexi SJA-15 C10_SB1A C10_SB1A Ca. Amarilinum - SILVA Chloroflexi Anaerolineae C10_SB1A - - - Greengenes Chloroflexi Anaerolineae SHA-20 - - - RDP ------465 MiDAS 2.1 (McIlroy et al., 2017); SILVA: Release 132_SSURef_NR99 (Quast et al., 2013); GreenGenes: Release 16s_13.5 (DeSantis 466 et al., 2006); Ribosomal Database Project: Release 16S_v16 training set (Cole et al., 2014).

13

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

467 Figures:

468 469 Figure 1. MiDAS 3 taxonomic assignment workflow based on the AutoTax framework. (1) ESVs 470 from the study of Danish plants (Dueholm et al., 2019), amended with AddOns ESVs coming from, 471 e.g., published genomes, are first mapped to the SILVA_132_SSURef_Nr99 database to identify the 472 closest relative. (2) Taxonomy is adopted from this sequence after trimming, based on identity and 473 the taxonomic thresholds proposed by Yarza et al., (2014) with 94.5% sequence similarity for genus- 474 level. To gain species information, ESVs are mapped to sequences from type strains extracted from 475 the SILVA database, and species names were adopted if the identity was >98.7% and the type strain 476 genus matched that of the closest reference in the complete database. (3) ESVs were also clustered 477 by greedy clustering at different identities, corresponding to the thresholds proposed by Yarza et al., 478 (2014) to generate a stable de novo taxonomy. (4) The de novo taxonomy was curated with taxon 479 names for ESVs that represent, e.g., genomes from strains published in peer-reviewed literature. (5) 480 Finally, a comprehensive taxonomy was obtained by filling gaps in the SILVA-based taxonomy with 481 the de novo taxonomy with provisional “midas” placeholder names (adapted from Dueholm et al., 482 2019).

14

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

483 484 Figure 2. Alpha diversity estimates of activated sludge community composition using ASV-level 485 taxa in individual plants, calculated using the Simpson index of diversity (1-D). Data represents 712 486 samples from 20 Danish full-scale WWTPs collected from 2006 to 2018 (with 17-51 samples per 487 plant).

15

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

488 489 Figure 3. Redundancy analysis (RDA) plots of activated sludge communities, constrained to 490 visualize the variance among 712 samples explained by WWTP location. Axes show the percent of 491 total variance described by each component (first and second value represent relative contribution of 492 each axis to the total variance in the data and the constrained space only, respectively). Data represents 493 samples from 20 Danish full-scale WWTPs collected from 2006 to 2018.

16

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

494 495 Figure 4 a-c. a) Cumulative read abundance of genus- and species-level taxa found in activated 496 sludge, plotted in rank order. Dashed line denotes 80% of the reads, which is assumed to cover the 497 abundant taxa, b) Venn-diagram showing abundant (top 80% of the reads) genera and species shared 498 between BNR and EBPR plants, c) core community (observation frequency vs abundance frequency) 499 in Danish BNR and EBPR plants for genus- and species-level taxa. Abundance cut-off values 500 represent the lowest abundance of the taxon belonging to the top 80% of the reads in given group 501 (genus/species). Darker points indicate multiple observations at given positions. Colored points 502 designate core taxa. Data represents 111 samples from BNR plants and 601 samples from EBPR 503 plants.

17

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

504 505 Figure 5. Boxplot showing the occurrence of the top 50 most abundant species in Danish BNR and 506 EBPR WWTPs.

18

bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

507 Figure 6. Occurrence of abundant PAO species in 20 full-scale activated sludge WWTPs. Data represents average read abundance values for each plant, based on samples collected 2–4 times per year from 2006 to 2018. Individual PAO genera are indicated by different colors.

19 bioRxiv preprint doi: https://doi.org/10.1101/842393; this version posted November 14, 2019. The copyright holder for this preprint (which was not certified by peer review) is the author/funder, who has granted bioRxiv a license to display the preprint in perpetuity. It is made available under aCC-BY-NC-ND 4.0 International license.

Figure 7. MiDAS Field Guide example of Tetrasphaera genus entry.

20