
Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm

Communities along an Acid Mine Drainage Recovery Gradient

A thesis presented to

the faculty of

the College of Arts and Sciences of Ohio University

In partial fulfillment

of the requirements for the degree

Master of Science

Daniel I. Wolf

August 2018

© 2018 Daniel I. Wolf. All Rights Reserved.

This thesis titled

Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm

Communities along an Acid Mine Drainage Recovery Gradient

by

DANIEL I. WOLF

has been approved for

the Department of Environmental and Plant Biology

and the College of Arts and Sciences by

Morgan L. Vis

Professor of Environmental and Plant Biology

Robert Frank

Dean, College of Arts and Sciences

ABSTRACT

WOLF, DANIEL, M.S., August 2018, Environmental and Plant Biology

Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm

Communities along an Acid Mine Drainage Recovery Gradient

Director of Thesis: Morgan L. Vis

In southeastern Ohio, historical coal mining has exposed streams to acid mine drainage (AMD). Active remediation has been successful for some of these streams, while others have not recovered based on macroinvertebrate assessment. In this study, biofilms were collected from three Not Recovered streams, three Recovered streams, and two Unimpaired streams. Biodiversity was characterized by metabarcoding using four DNA barcode markers and high-throughput amplicon sequencing. Two universal markers (16S and 18S) were sequenced along with an algal-specific marker (UPA) and a diatom-specific marker (rbcL). Ordination of Bray-Curtis dissimilarity calculated from the total operational taxonomic units (OTUs) present at each site showed that the Unimpaired and Not Recovered sites were significantly different.

Additionally, the Shannon index for the rbcL and UPA markers showed significantly lower alpha diversity in Not Recovered streams than in Unimpaired streams. Further taxonomic investigation revealed a decrease in the relative abundance and number of OTUs of diatoms in Not Recovered streams compared with Recovered and Unimpaired streams, including some diatom genera that indicate good water quality. Overall, results from this study describe biofilm diversity during remediation and support previous findings from other methods used to assess stream impairment from AMD.

DEDICATION

I would like to dedicate this to my family, friends, and mentors who influenced me to be curious, learn, and grow.

ACKNOWLEDGMENTS

I would first like to acknowledge and thank Dr. Morgan Vis, who provided me with an impeccable research environment in which her students are always a priority. Her mentorship over the last two years has molded me from a student who likes science into a passionate researcher. I would like to thank my committee members, Dr. Sarah Wyatt and Dr. Kelly Johnson. Both Dr. Wyatt and Dr. Johnson provided me with excellent advice on my project design, allowing me to create a successful research project. I would also like to thank Dr. Kelly Johnson and the Ohio University watershed group for MAIS sampling at my sites. I would like to thank Dr. Rebecca Snell for her help with the statistics in this project and Nathan Smucker for his knowledge of indicator diatoms.

Additionally, I would like to thank the Ohio Department of Natural Resources for collecting permits. I would like to thank Dr. Bill Broach and Rachel Yoho at the Ohio University Genomics Facility for help with project design and sequencing. I would like to thank the Student Enhancement Award for funding my master's project. I thank the Vis lab members (present and past), including Josh Evans, Emily Keil, Lexie Redmond, and Amanda Szinte, for helping me with collections, water analysis, and much-needed moral support. This whole experience wouldn’t have been the same without my friends in the Ohio University Environmental and Plant Biology Department, especially Laura Mason, Alexander Meyers, and Sarah Smith. Last but not least, I would like to thank Dr. Eric Salomaki for being a role model and encouraging me to work harder and continue my scientific career.

TABLE OF CONTENTS

Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
Introduction
Materials and Methods
Pilot Study
Study Site Selection and Location
Biofilm Collection
Chlorophyll a Analysis
Biofilm Extraction
Amplicon PCR and Sequencing
Bioinformatics
Statistical Analysis
Results
Stream Chemical and Physical Data
Illumina Data Analyses
Within Site Replication
Beta Diversity of all OTUs
Analyses of all OTUs
Analyses of Algal OTUs
Alpha Diversity
Discussion
Experimental Design
Comparison of Stream Types
Conclusion
References
Appendix 1: Site Information
Appendix 2: Shannon Index Values
Appendix 3: Rarefaction Plot of 16S data by replicate
Appendix 4: Rarefaction Plot of 18S data by replicate
Appendix 5: Rarefaction Plot of UPA
Appendix 6: Rarefaction Plot of rbcL data by Replicate
Appendix 7: 16S Relative Abundance
Appendix 8: 16S Number of OTUs
Appendix 9: 18S Relative Abundance
Appendix 10: 18S Algal Relative Abundance
Appendix 11: Number of 18S algal OTUs
Appendix 12: UPA Relative Abundance
Appendix 13: UPA OTUs per
Appendix 14: rbcL Relative Abundance
Appendix 15: rbcL OTUs
Appendix 16: Read Counts throughout Bioinformatics Pipeline


LIST OF TABLES

Table 1: PCR Amplification Primers
Table 2: PCR Thermocycler Settings
Table 3: Mean Stream Chemical and Physical Data
Table 4: Comparison of OTUs in Replicates
Table 5: rbcL Diatom Relative Abundance


LIST OF FIGURES

Figure 1: Map of Collection Sites
Figure 2: OTU Venn Diagram
Figure 3: Shannon Diversity Boxplots
Figure 4: 16S Rarefaction Plot per Site
Figure 5: 18S Rarefaction Plot per Site
Figure 6: UPA Rarefaction Plot per Site
Figure 7: rbcL Rarefaction Plot per Site
Figure 8: Bray-Curtis Non-metric Multidimensional Scaling (NMDS)
Figure 9: Bray-Curtis Constrained Analysis of Principal Coordinates (CAP)
Figure 10: Relative Abundance of Algal Phyla for the 18S Marker
Figure 11: Relative Abundance of Algal Phyla for the UPA Marker
Figure 12: Relative Abundance of Diatom Genera for the rbcL Marker


INTRODUCTION

Freshwater biofilms establish and grow on rocks, wood, and other surfaces in aquatic ecosystems (Wetzel 1983). The biofilm, a complex community of algae, fungi, bacteria, and other unicellular organisms embedded in a gelatinous polysaccharide matrix, is a vital source of energy in stream ecosystems (Wetzel 1983, Stevenson et al. 1996). These communities may be the first to respond to environmental stress in freshwater streams, reflect current environmental conditions (Edwards and Kjellerup 2013), and can have short recovery times from environmental perturbations (Belanger and Rupe 1996). The microorganisms that make up the biofilm community differ in their tolerance of environmental conditions, and community-level changes in stream biofilms have been used to track changes in water chemistry due to eutrophication (Nelson et al. 2013). Organisms within the biofilm that require specific environmental conditions can be used as “indicators” of water quality; for example, the diatom genera Cymbella and Cocconeis are known to inhabit streams with high water quality (Wang et al. 2005).

In the Appalachian region, a substantial source of pollution is acid mine drainage (AMD) from coal mines abandoned prior to regulation (U.S. EPA 1994). An estimated 12,000 stream km, or 14% of all stream km in this region, have been affected (U.S. EPA 1997). When coal tailings and waste from these abandoned mines are exposed to air, water, and bacterial processes, oxidation of sulfide materials produces sulfuric acid and iron oxide (Gray 1997). The acidic stream water leaches metals from soils and rocks, resulting in high concentrations of Al, Fe, Mn, Cu, As, and other toxins (U.S. EPA 1994). To restore ecological stability, different techniques have been used to remediate AMD streams. Active treatment of AMD in the Appalachian region relies primarily on alkaline dosers, which discharge calcium oxide into AMD headwaters (Costello 2003).

Environmental stresses caused by AMD have a variety of effects on the diversity of organisms in the biofilm community. Reduced species richness in stream biofilms simplifies the food web and may reduce the ecological stability of the ecosystem (Gray 1997, Smucker et al. 2014). In particular, decreases in diatom species richness have been documented with increasing AMD impairment because some diatom taxa are sensitive to, or less tolerant of, this pollution (Verb and Vis 2001, 2005, Bray et al. 2008). Diatom taxa living within stream biofilms are correlated with a range of AMD impairment levels (Zalack et al. 2010). Comparing the diatom community structure of AMD-impaired streams with that of streams that have undergone remediation can be an effective tool to assess the efficacy of remediation efforts (Smucker et al. 2014). Whole biofilm community changes have been documented using functional measures such as fatty acid profiles (Guschina and Harwood 2009, DeForest et al. 2016, Drerup and Vis 2016). Although the effects of AMD and remediation have been investigated via diatom diversity and, to a much lesser extent, other biofilm organisms and functional measures, a study of whole biofilm community diversity has yet to be undertaken. Determining whole biofilm species richness, evenness, and diversity would be difficult using traditional microscopic methods because of cryptic species and/or the sheer quantity of organisms to identify (Stein et al. 2014, Zimmermann et al. 2015).

As with traditional microscopy, individual members of the biofilm community have been identified using molecular data (Souza-Egipsy et al. 2008, Baker et al. 2009, Dopheide et al. 2009, Lear et al. 2009). A DNA barcode, or gene fragment, can be targeted with specific primers for PCR and subsequent Sanger sequencing for taxonomic identification (Ratnasingham and Hebert 2007). Barcoding on a specimen-by-specimen basis would be too costly and time-consuming for an entire community of organisms, which may contain hundreds or thousands of taxa per sample (Elbrecht and Leese 2015).

With the advent of high-throughput sequencing, environmental DNA (eDNA) barcoding, or metabarcoding, can be used to identify multiple taxa within a community simultaneously and so determine its diversity (Zimmermann et al. 2015). Within the biofilm community, the groups most prominently investigated using molecular techniques are bacteria (16S rDNA barcode region) (Souza-Egipsy et al. 2008, Lear et al. 2009) and members of the Fungi and Protista kingdoms (18S rDNA barcode region) (Baker et al. 2009, Dopheide et al. 2009). Organisms that are not easily amplified, or that have low taxonomic resolution with the primer sets for these standard markers, have gone undetected regardless of their abundance and importance in the ecosystem (Marcelino et al. 2016). Multi-marker analyses can use both universal and taxonomic group-specific primer sets to target barcode regions of a wide range of organisms and increase detection of unique operational taxonomic units (OTUs). This multiple-marker strategy has been successful in increasing the known diversity in other ecosystems (Marcelino et al. 2016, Pfendler et al. 2018).

The combination of high-throughput sequencing and a multi-marker barcoding approach has yet to be employed to study stream biofilm biodiversity and has high potential to yield insights into community differences associated with varying water quality. Remediation of AMD has resulted in streams with higher water quality, yet not all streams have shown the same improvements in biological condition. The purpose of this study was to examine biofilm diversity in streams along an AMD recovery gradient to investigate differences in the overall biofilm community, with a focus on the algal community. It was hypothesized that biofilm taxonomic diversity measures would reflect water quality and remediation status, similar to other biological metrics.


MATERIALS AND METHODS

Pilot Study

A pilot study was conducted to test various aspects of the laboratory protocols as well as the sequencing and bioinformatics pipeline. AMD streams contain pollutants (e.g., metals) that may interfere with DNA extraction and sequencing. In addition, biofilms have a thick polysaccharide matrix that may hamper extraction of high-quality DNA. In the preliminary study, one Recovered site (MC 0300) and one Not Recovered site (MC 0240) were sampled in early May 2017. Environmental DNA (eDNA) was extracted using the DNeasy PowerSoil or PowerBiofilm kits (QIAGEN, Valencia, CA, USA) according to the manufacturer's protocol to determine which kit was more successful at removing inhibitors that could cause problems with PCR or sequencing. The PowerBiofilm kit performed better based on DNA concentration and the number of reads from high-throughput sequencing.

I also tested the appropriate amount of starting material to maximize the amount of DNA recovered without clogging the columns in the extraction kit. Starting material weighing 0.05 to 0.1 g was used for the preliminary extractions; these values were determined based on the extraction kit protocol for stream rocks. The amount of starting material only slightly affected the DNA concentration after extraction: using 0.1 g of material improved the DNA concentration of a few samples, but most showed no change. During the homogenization step, two durations (1 and 4 minutes) were tested. Based on amplification success (gel electrophoresis) and DNA concentration, 4 minutes was best for all markers, especially the rbcL marker, which targets diatoms whose silica frustules may require additional lysing time.

PCR amplification of DNA extracts was conducted using the 16S, 18S, UPA, and rbcL primers with Illumina overhangs (Table 1). Thermocycler settings were optimized for each primer pair during the pilot study (Table 2). Amplified PCR product from each primer pair was multiplexed and sequenced using an Illumina MiSeq V2 kit (Illumina Inc., San Diego, CA, USA). I chose to sequence samples from both extraction kits, two replicates of the same eDNA sample, and a negative extraction control. The data were processed with Qiime1 (Caporaso et al. 2010) because Qiime2 was in early beta testing at the time.

Data analyses after sequencing revealed the importance of sterile technique, as the negative controls contained more sequences than expected. Therefore, in the primary study, the lab bench was bleached prior to extraction and PCR reactions were conducted in a laminar flow hood. Taxonomic assignment using Qiime1 did not show any obvious primer biases based on the range of taxa sequenced. Rarefaction curves were calculated using CLC Genomics Workbench (version 10.1.1; https://www.qiagenbioinformatics.com) to determine the number of samples that could be sequenced in a single MiSeq run while maintaining good sequencing depth.

Rarefaction curves for all markers showed that the accumulation of new operational taxonomic units (OTUs) per replicate plateaued at approximately 5,000 to 8,000 reads for each marker. Therefore, 31 multiplexed samples (4 amplicons per sample) could be run together, which exceeds the number planned for the primary study.
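The logic behind a rarefaction curve can be sketched as repeated random subsampling of reads without replacement, recording how many distinct OTUs appear at each depth. The Python example below is a hypothetical illustration of that idea, not the CLC Genomics Workbench or Qiime algorithm; the OTU counts, depths, and number of iterations are invented for the example.

```python
import random
from statistics import mean


def rarefaction_curve(otu_counts, depths, iterations=10, seed=1):
    """Return (depth, mean distinct OTUs) pairs for one replicate.

    otu_counts: dict mapping OTU id -> read count
    depths: read depths to subsample (each must not exceed the total reads)
    """
    rng = random.Random(seed)
    # Expand the count table into one OTU label per read
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = []
    for depth in depths:
        if depth > len(reads):
            raise ValueError("depth exceeds the number of reads")
        # Subsample without replacement and count distinct OTUs, several times
        observed = [len(set(rng.sample(reads, depth))) for _ in range(iterations)]
        curve.append((depth, mean(observed)))
    return curve


# Hypothetical replicate: one dominant OTU plus 100 rare OTUs (7,000 reads total)
counts = {"OTU_1": 5000, **{f"OTU_{i}": 20 for i in range(2, 102)}}
for depth, richness in rarefaction_curve(counts, [100, 1000, 3000, 7000]):
    print(depth, richness)
```

A curve that flattens well before the total read count suggests that extra reads would add few new OTUs, which is the criterion used above to judge sequencing depth.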

Study Site Selection and Location

To investigate biofilm communities in streams that have undergone AMD remediation and compare them with streams unaffected by AMD, a total of nine sites were sampled (Appendix 1). These sites were categorized prior to sampling as Unimpaired, Recovered, or Not Recovered. Stream reach categories were determined using MAIS (Macroinvertebrate Aggregate Index for Streams) scores (Johnson 2007) from at least the last 5 years (2011-2016). A linear regression was used to test for a significant change in MAIS scores over the previous 5 years at both Recovered and Not Recovered sites.

Recovered sites were those with MAIS scores that had significantly (P < 0.05) improved since monitoring started and had sustained a MAIS score > 12 for the past 3 years. Not Recovered sites were those with MAIS scores that had significantly (P < 0.05) improved over the last 5 years but had not been ≥ 12 for the past two consecutive years. Sites classified as Unimpaired had not been previously affected by AMD and have been designated Exceptional Warmwater Habitat (EWH) by the Ohio EPA (OEPA 1987). Three watersheds were sampled, each with one site of each category, to give three sites per category. However, one of the Unimpaired streams (ELRM 0.1) was removed from the dataset due to low 2017 MAIS scores. Therefore, the remaining sites were as follows: WB SC RM1.8 (Recovered), Unimpaired EB001 (Unimpaired), and WB 51 (Not Recovered) in Sunday Creek; HF 039 (Recovered) and HF 090 (Not Recovered) in Hewett Fork (within Raccoon Creek); and MC 0300 (Recovered), LM 0110 (Unimpaired), and MC 0240 (Not Recovered) in Monday Creek (Fig. 1).

Biofilm Collection

The stream sites were sampled on two consecutive days (August 7-8, 2017) with similar weather conditions. At each site, three independent biofilm replicates were collected. For each replicate, 10 cobbles were randomly selected within a single riffle; a 3-cm circular area of each cobble was scrubbed with a sterile soft-bristle brush, and the material was pooled into a sterile 50 mL conical tube using a sterile 60 mL plastic syringe. The three conical tubes were placed on ice for transport to the laboratory. Several physical stream characteristics were measured at each site. Current velocity was measured using the Global Water Flow Probe (Global Water Instrumentation, Inc., College Station, TX, USA); three measurements (in areas that were visually high, medium, and low flow) were taken within the riffle to calculate the mean current velocity. Concurrently, stream depth was measured using the metric markings on the Global Water Flow Probe, and mean depth was calculated. The Multi-Parameter PCTestr™ 35 (Oakton Instruments, Vernon Hills, IL, USA) was used to measure pH, specific conductance, and water temperature. Stream width was measured across the widest part of the sampled riffle using a tape measure. Canopy cover was visually estimated in the middle of the riffle on a scale of 0 to 100% shade. Stream water was filtered through a 0.45 µm filter into a 50 mL conical tube for chemical analysis in the laboratory.

On the collection date, the biofilm material was centrifuged in the 50 mL conical tubes at 3,400 rpm for 20 min at 4˚C. The supernatant was removed, leaving the pelleted biofilm material. Before freezing, samples were lightly stirred in the conical tube, and approximately 1 g of biofilm material from each tube was placed in a microcentrifuge tube and centrifuged for 30 seconds at 11,600 rpm. Any remaining supernatant was removed, and the samples were placed in a -20˚C freezer until extraction. The remaining biofilm material was frozen for later use if needed. A voucher of the biofilm from each sample was stored in a 2.5% glutaraldehyde solution in a 20 mL vial.

Within 48 hours of collection, stream water was analyzed using a HACH DR/890™ colorimeter (HACH Company, Loveland, CO, USA) and HACH powder pillows as per the manufacturer’s protocol for sulfate (SO₄²⁻) (Method 8051), iron (Fe) (Method 8008), and nitrate (NO₃⁻) (Method 8192).

Chlorophyll a Analysis

For each replicate from each site, a subsample of biofilm material was freeze-dried. Chlorophyll a was analyzed using 3-5 mg of freeze-dried biofilm. The samples were soaked in 90% acetone for 18-20 hours as described in EPA Method 445.0 (Arar and Collins 1997). The concentration of chlorophyll a was determined using a Turner TD-700 fluorometer (Turner Designs, Sunnyvale, CA, USA). To correct for phaeophytin a, 0.1 N HCl solution was added. The results of the three replicates per site were reported as mean chlorophyll a concentration (µg/g).
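Acidification converts chlorophyll a to phaeophytin a and lowers its fluorescence by a ratio r, measured on a pure chlorophyll a standard; comparing readings before and after acid separates the two pigments. A minimal Python sketch of the acid-corrected calculation in the form given in EPA Method 445.0 follows. The response factor, ratio, readings, extract volume, and sample mass are hypothetical, and normalizing to freeze-dried mass is an assumption about how the µg/g values were derived.

```python
def chlorophyll_a_ug_per_g(fs, r, rb, ra, extract_ml, dry_mass_mg):
    """Acid-corrected chlorophyll a per gram of freeze-dried biofilm.

    fs: fluorometer response factor (ug/L per fluorescence unit), from calibration
    r:  before/after acid fluorescence ratio of a pure chlorophyll a standard
    rb: sample fluorescence before acidification
    ra: sample fluorescence after acidification
    extract_ml: volume of 90% acetone extract (mL)
    dry_mass_mg: mass of freeze-dried biofilm extracted (mg)
    """
    if r <= 1:
        raise ValueError("acid ratio r must be greater than 1")
    # Chlorophyll a in the extract, corrected for phaeophytin a (ug/L)
    extract_ug_per_l = fs * (r / (r - 1)) * (rb - ra)
    # Total chlorophyll a in the extract (ug), divided by sample mass (g)
    chl_ug = extract_ug_per_l * extract_ml / 1000
    return chl_ug / (dry_mass_mg / 1000)


# Hypothetical readings for one replicate
print(chlorophyll_a_ug_per_g(fs=1.0, r=2.0, rb=30.0, ra=20.0, extract_ml=10.0, dry_mass_mg=4.0))
```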

Biofilm Extraction

Biofilm extraction was conducted using sterile technique to minimize DNA contamination from laboratory sources. Extractions were completed over a three-day period. Each set of extractions (one set per day) contained one replicate from each stream plus a negative control (autoclaved Nanopure H2O). The negative control was subjected to the same procedures as the other samples and later used in the bioinformatics pipeline to remove potential contaminant sequences from the extractions conducted on the same day. The DNeasy PowerBiofilm kit was used to extract 0.1 g of biofilm. The manufacturer’s protocol was followed with one modification: homogenization was performed using the TissueLyser LT (QIAGEN) for 4 min at 50 oscillations/sec. After extraction, each eDNA sample was quantified using a Qubit Fluorometer 3.0 ( Technologies, Carlsbad, CA, USA). All samples were diluted to 1 ng/µl and stored at -20˚C in 100 µl aliquots.

Amplicon PCR and Sequencing

The first step in the two-step PCR procedure used to prepare Illumina 16S amplicon sequencing libraries for multiple samples and markers was to amplify the four barcode markers in individual PCR reactions for each eDNA sample. The V4 region of 16S rDNA (prokaryotes) was amplified using primers 515F and 806RB (Caporaso et al. 2011, Apprill et al. 2015). The 18S rDNA (eukaryotes) amplicon was amplified with primers 1391F and EukBr (Amaral-Zettler et al. 2009). Both the 16S and 18S primers were chosen from those available via the Earth Microbiome Project (http://www.earthmicrobiome.org). The Universal Plastid Amplicon (UPA), which targets a portion of the 23S rDNA chloroplast gene in algae, was amplified using primers p23SrV_f1 and p23SrV_r1 (Sherwood and Presting 2007). Lastly, an rbcL chloroplast marker suitable for amplicon sequencing (< 600 bp) that specifically targets diatom taxa was designed based on previously published primers. Sequences from 15 freshwater diatom genera were downloaded from GenBank and used as a test community to determine the best primer pair in Geneious v10.1 (Kearse et al. 2012). Two primers, cfD and DtrbcL_3R (Lang and Kaczmarska 2011, Hamsher et al. 2011), were chosen based on the amplicon size (338 bp) and their ability to bind to all genera in the test community (Geneious v10.1).
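The in silico test described above (checking that a primer pair binds every genus in a reference test community and noting the amplicon size) can be approximated in a few lines. The Python sketch below is a simplified stand-in for the Geneious analysis, not a reproduction of it: it searches for exact matches of IUPAC-degenerate primers on one strand, allows no mismatches, and uses hypothetical toy primers and sequences rather than cfD, DtrbcL_3R, or real rbcL data.

```python
import re

# IUPAC nucleotide codes as regular-expression character classes
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "S": "[CG]", "W": "[AT]",
    "K": "[GT]", "M": "[AC]", "B": "[CGT]", "D": "[AGT]",
    "H": "[ACT]", "V": "[ACG]", "N": "[ACGT]",
}
COMPLEMENT = str.maketrans("ACGTRYSWKMBDHVN", "TGCAYRSWMKVHDBN")


def reverse_complement(seq):
    return seq.upper().translate(COMPLEMENT)[::-1]


def primer_regex(primer):
    return re.compile("".join(IUPAC[base] for base in primer.upper()))


def amplicon_length(template, fwd, rev):
    """Product length (bp, primers included) on the given strand, or None.

    Exact matches only: no mismatches and no search of the other strand.
    """
    template = template.upper()
    f = primer_regex(fwd).search(template)
    if f is None:
        return None
    # The reverse primer anneals to the opposite strand, so look for its
    # reverse complement downstream of the forward site
    r = primer_regex(reverse_complement(rev)).search(template, f.end())
    if r is None:
        return None
    return r.end() - f.start()


# Hypothetical degenerate primers and toy "genus" sequences
FWD = "ACGGAYCC"
REV = "CTACYGGT"
test_community = {
    "GenusA": "TT" + "ACGGATCC" + "AGGCTTAA" + "ACCGGTAG" + "TT",
    "GenusB": "GG" + "ACGGACCC" + "TTAAGGCCTT" + "ACCAGTAG" + "GG",
    "GenusC": "TTTTTTTTTT" + "ACCGGTAG",
}
for genus, seq in test_community.items():
    print(genus, amplicon_length(seq, FWD, REV))
print("binds all:", all(amplicon_length(s, FWD, REV) for s in test_community.values()))
```

Real primer evaluation tolerates some mismatches (especially away from the 3' end), which this exact-match sketch deliberately ignores.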

To improve the sequencer's ability to distinguish similar sequences read simultaneously, library complexity was increased by adding 0-2 N’s at the 5’ end of the gene-specific primers, which were in turn preceded (on the 5’ side) by either the forward or reverse Illumina overhang tail of 33 bp.
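A minimal Python sketch of how such staggered primer variants can be assembled 5' to 3' (overhang, then 0-2 N's, then the gene-specific primer). The overhang sequences below are the adapter overhangs listed in Illumina's 16S Metagenomic Sequencing Library Preparation guide and are an assumption here (the primers actually used are given in Table 1); the gene-specific primer is a hypothetical placeholder.

```python
# Adapter overhangs from Illumina's 16S Metagenomic Sequencing Library
# Preparation guide (assumed here; compare with Table 1)
FWD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REV_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"


def staggered_primers(overhang, gene_primer, max_spacer=2):
    """Return primer variants written 5'->3': overhang + 0..max_spacer N's + primer."""
    return [overhang + "N" * n + gene_primer for n in range(max_spacer + 1)]


# Hypothetical gene-specific primer, for illustration only
for oligo in staggered_primers(FWD_OVERHANG, "ACGGAYCC"):
    print(len(oligo), oligo)
```

Because each variant shifts the gene-specific region by one base, clusters from the same marker do not all read identical bases in the same sequencing cycle.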

The primary PCR amplification was completed using 12.5 µl of Kapa HiFi HotStart ReadyMix (Kapa Biosystems, Wilmington, MA, USA), 0.5 µl of each primer (10x), and 11.5 ng eDNA, prepared in a sterile laminar flow hood. One master mix per gene was prepared and aliquoted for all samples along with a negative control (autoclaved Nanopure H2O), and all reactions were amplified simultaneously. Detailed primer assembly information and thermocycler settings are provided in Tables 1 and 2, respectively.

Secondary PCR amplification and sequencing were performed by the Ohio University Genomics Facility. Samples were purified using Agencourt AMPure XP beads (Beckman Coulter Inc., Indianapolis, IN, USA) and quantified using a Qubit Fluorometer 3.0. Equimolar amounts of the 16S, 18S, UPA, and rbcL PCR products from a single replicate were pooled, indexed using Nextera Index 1 and 2 primers (to generate uniquely identified samples), and enriched with the Kapa HiFi HotStart 2x PCR Master Mix (Kapa Biosystems, Wilmington, MA, USA). Amplified pooled samples were analyzed on an Agilent 2100 Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies, Santa Clara, CA, USA) for size range and quantified on the Qubit Fluorometer 3.0. Libraries from all samples were pooled in equimolar ratios, denatured using 0.2 N NaOH, diluted to 18 pM, and loaded with 10% PhiX clustering control on the Illumina MiSeq with the 600-cycle (2 x 300 bp paired-end) Sequencing Reagents V3.
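Pooling "equimolar amounts" requires converting mass concentrations (as reported by the Qubit) into molar concentrations using fragment length. A minimal Python sketch follows, assuming the common approximation of 660 g/mol per base pair of double-stranded DNA; the concentrations, fragment lengths, and target amount are hypothetical and are not values from this study.

```python
BP_MASS_G_PER_MOL = 660  # approximate mass of one base pair of dsDNA


def ng_per_ul_to_nm(conc_ng_per_ul, length_bp):
    """Convert a dsDNA concentration from ng/ul to nM."""
    return conc_ng_per_ul * 1e6 / (BP_MASS_G_PER_MOL * length_bp)


def equimolar_volumes(libraries, fmol_each):
    """Volume (ul) of each library that contributes `fmol_each` fmol.

    libraries: dict name -> (concentration in ng/ul, fragment length in bp)
    """
    volumes = {}
    for name, (conc, length) in libraries.items():
        molarity_nm = ng_per_ul_to_nm(conc, length)  # 1 nM == 1 fmol/ul
        volumes[name] = fmol_each / molarity_nm
    return volumes


# Hypothetical Qubit readings and fragment sizes for one replicate's amplicons
amplicons = {"16S": (12.0, 420), "18S": (8.5, 260), "UPA": (10.2, 520), "rbcL": (6.1, 470)}
for name, vol in equimolar_volumes(amplicons, fmol_each=50).items():
    print(f"{name}: {vol:.2f} ul")
```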

Bioinformatics

Reads were demultiplexed using Illumina CASAVA 1.8. Demultiplexed reads were parsed by gene, and the primer sequences were trimmed using separate_genes.py (Marcelino and Verbruggen 2016). Reads were quality filtered based on Q30 score and trimmed to remove low-quality sequence while maintaining at least 50 bp of overlap between forward and reverse reads using Qiime2 (Caporaso et al. 2010, version 2017.12, https://docs.qiime2.org/2017.12/). Trimmed and filtered reads were denoised and dereplicated to produce Amplicon Sequence Variants (ASVs) using DADA2 (Callahan et al. 2016). ASVs are resolved at 100% sequence identity, as opposed to OTUs clustered at 97% identity (the previous standard). I will refer to these ASVs as OTUs, the term more often used in this field of study. After processing with DADA2, OTUs were filtered using the sequences present in PCR and extraction controls: because one negative control was included for each set of extractions and each PCR run, OTUs present in a negative control were removed from the samples processed in the same extraction set or PCR run. OTUs with only one or two reads assigned (singletons and doubletons) were also removed from the dataset. The SILVA database (release 128, Quast et al. 2013) was used to assign taxonomy to 16S and 18S OTUs. For UPA OTUs, taxonomy was assigned from a curated SILVA (release 123) database (Sherwood et al. 2017), and for rbcL OTUs, taxonomy was assigned using the R-Syst Micro-Algae v6 database (http://www.rsyst.inra.fr/, 11/2017). Reference taxonomy databases were trimmed to contain only the region of the gene amplified by the primers in this study. A naïve Bayes machine learning classifier was trained and used to assign taxonomy (confidence = 0.7, classify-sklearn: Qiime2). OTUs in the 16S dataset classified as chloroplast or mitochondria were removed, as these are organelle rather than prokaryote sequences. Non-algal taxa (Embryophytes and non-photosynthetic Bacteria) present in the UPA marker dataset were also removed. Analysis of Variance (ANOVA) and Kruskal-Wallis Rank Sum Tests were performed in R (version 3.4.2) on OTUs per phylum in each stream category. Venn diagrams for each DNA marker were created (bioinformatics.psb.ugent.be/webtools/Venn/, 3/30/18) to visualize the number of OTUs shared among stream categories and distinctive to a single category (Fig. 2).
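The control-based and low-abundance filtering described above can be sketched as follows. This Python example is a hypothetical illustration of the logic, not the scripts used in this study: it assumes an OTU table held as nested dictionaries, a mapping from each sample to the negative control(s) of its extraction set and PCR run, and that singleton/doubleton status is judged from an OTU's total read count across the retained samples.

```python
def filter_otu_table(table, controls_for_sample, min_total_reads=3):
    """Remove control-derived and low-abundance OTUs from an OTU table.

    table: dict sample -> dict OTU -> read count (includes control samples)
    controls_for_sample: dict sample -> list of negative-control names that
        share its extraction set or PCR run
    min_total_reads: OTUs with fewer total reads are dropped (3 removes
        singletons and doubletons)
    """
    filtered = {}
    for sample, controls in controls_for_sample.items():
        # OTUs detected in any control processed alongside this sample
        contaminants = {otu for c in controls for otu, n in table[c].items() if n > 0}
        filtered[sample] = {
            otu: n for otu, n in table[sample].items() if otu not in contaminants
        }
    # Total reads per OTU across the retained (non-control) samples
    totals = {}
    for counts in filtered.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    return {
        sample: {otu: n for otu, n in counts.items() if totals[otu] >= min_total_reads}
        for sample, counts in filtered.items()
    }


# Hypothetical table: two samples, each with its own day's extraction control
table = {
    "S1": {"otuA": 50, "otuB": 2, "otuC": 10},
    "S2": {"otuA": 40, "otuC": 5, "otuD": 1},
    "NC_day1": {"otuC": 3},
    "NC_day2": {},
}
controls = {"S1": ["NC_day1"], "S2": ["NC_day2"]}
print(filter_otu_table(table, controls))
```

Note that an OTU removed from one sample because it appeared in that sample's control (otuC in S1) is kept in a sample processed with a different control, matching the per-run removal described above.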

Statistical Analysis

Environmental data were standardized to a mean of zero and a standard deviation of one using the vegan package in R (Oksanen et al. 2018, vegan function decostand, R version 3.4.2). Analysis of Variance (ANOVA) and Kruskal-Wallis Rank Sum Tests were performed in R (version 3.4.2) to determine which environmental variables differed significantly (P < 0.05) among the three stream categories. Those variables were included in further analyses (Table 3).
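For readers unfamiliar with the Kruskal-Wallis test, a minimal Python sketch of the H statistic (with average ranks for ties and the standard tie correction) is shown below. With three stream categories there are two degrees of freedom, and the chi-square survival function for two degrees of freedom is exactly exp(-H/2), so the approximate P value needs no statistics library. This is an illustration, not the R implementation used here, the chi-square approximation is rough for groups as small as these, and the example values are hypothetical.

```python
import math
from collections import Counter


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value ties with the current one
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def kruskal_wallis_3groups(*groups):
    """Tie-corrected H and chi-square (df = 2) P value for exactly three groups."""
    if len(groups) != 3:
        raise ValueError("this sketch handles exactly three groups")
    pooled = [x for g in groups for x in g]
    n = len(pooled)
    ranks = average_ranks(pooled)
    total = 0.0
    start = 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    ties = sum(t ** 3 - t for t in Counter(pooled).values())
    correction = 1 - ties / (n ** 3 - n)
    if correction == 0:
        raise ValueError("all observations are identical")
    h /= correction
    return h, math.exp(-h / 2)  # chi-square survival function with df = 2


# Hypothetical values of one variable in three stream categories
print(kruskal_wallis_3groups([1, 2, 3], [4, 5, 6], [7, 8, 9]))
```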

Alpha diversity was calculated using the Shannon diversity index for each gene marker (Appendix 2). The Shannon index was also calculated separately for algal-only OTUs in the 18S and 16S data. One replicate from LM 0110 (Unimpaired) for the UPA gene was removed due to a low number of reads (< 50 reads total). A pairwise Kruskal-Wallis test was used to determine significant differences between Shannon index values. The Shannon index was plotted for each marker by stream category (Fig. 3). Rarefaction curves were calculated for each marker (Qiime2) to visualize the rate at which additional reads increased the number of OTUs and to ensure that sequencing depth was adequate for each site (Figs. 4–7) and each replicate (Appendices 3–6). Relative abundance and OTU richness of taxa (phylum for 16S, 18S, and UPA; genus for rbcL) per sample (combined replicates within a stream) were calculated for each gene marker (Appendices 7–15). Relative abundance for the 18S marker was also calculated for algal taxa only (Appendix 10).
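The Shannon index is H' = -Σ p_i ln p_i, where p_i is the proportion of reads assigned to OTU i. A minimal Python sketch computing it from one sample's OTU read counts, together with relative abundance, is shown below. The natural logarithm is an assumption (it is vegan's default); some other tools default to base 2, so the base should match the software actually used. The counts are hypothetical.

```python
import math


def relative_abundance(counts):
    """Proportion of total reads for each OTU (or taxon)."""
    total = sum(counts.values())
    return {otu: n / total for otu, n in counts.items()}


def shannon(counts, base=math.e):
    """Shannon index H' = -sum(p_i * log(p_i)) over OTUs with nonzero counts."""
    return -sum(p * math.log(p, base) for p in relative_abundance(counts).values() if p > 0)


# Hypothetical OTU read counts for one sample
sample = {"OTU_1": 120, "OTU_2": 60, "OTU_3": 15, "OTU_4": 5}
print(relative_abundance(sample))
print(round(shannon(sample), 3))
```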

Beta diversity was explored using the phyloseq (McMurdie and Holmes 2013) and vegan packages in R. The OTU tables were Hellinger transformed (Legendre and Gallagher 2001). To investigate the similarity in community composition between and within categories, Bray-Curtis and Jaccard distance matrices were calculated and visualized using non-metric multidimensional scaling (NMDS) ordination. Because ordinations of Bray-Curtis and Jaccard distances showed similar results, only the NMDS of Bray-Curtis distances is provided (Fig. 8). Stress values for 16S, 18S, UPA, and rbcL using Bray-Curtis and Jaccard distances were all between 0.115 and 0.151. The Bray-Curtis and Jaccard matrices were tested using PERMANOVA (Vegan: Adonis) with a post hoc test (RVAideMemoire: pairwise.perm.manova, Hervé 2018). A Constrained Analysis of Principal Coordinates (CAP) based on Bray-Curtis distances was used to overlay the environmental data. Using the envfit function (Vegan), R² values were calculated for each environmental factor (Fig. 9). Factors with R² < 0.5 were removed from the dataset, and the ordination was re-conducted. An ANOVA was conducted to assess the significance of the constrained axes used in the ordination.


RESULTS

Stream Chemical and Physical Data

Stream chemical and physical data were collected for all sites (Appendix 1).

There were significant differences (P < 0.05) among stream categories in numerous measures: 2016/2017 MAIS scores, specific conductance, temperature, mean current velocity, mean depth, % canopy, nitrate, iron, and sulfate (Table 3). The mean MAIS scores (2016 and 2017) differed significantly (P < 0.05) among the three categories. The 2017 MAIS scores were not available for all sites (WBSCM1.8, WB51, MC0300, and MC0240). Specific conductance and sulfate, measures that can be associated with AMD, were significantly higher (P < 0.05) in Not Recovered than in Unimpaired sites, but only sulfate was significantly higher (P < 0.05) in Not Recovered than in Recovered sites (Table 3). Iron, another measure associated with AMD, differed among the categories, with only Recovered sites significantly higher (P < 0.05) than Not Recovered sites (Table 3). Temperature was significantly lower (P < 0.05) in Unimpaired (19˚C) than in Recovered (21˚C) and Not Recovered (21˚C) sites. Not Recovered sites had significantly greater (P < 0.05) mean current velocity (44 cm/s) than either Unimpaired (16.7 cm/s) or Recovered (15.4 cm/s) sites. Mean depth was significantly lower (P < 0.05) in Unimpaired (14.5 cm) than in Recovered (31.1 cm) sites. Canopy cover was significantly greater (P < 0.05) in Unimpaired (90%) than in Not Recovered (30%) sites; canopy cover in Recovered (70%) sites was not significantly different from either Unimpaired or Not Recovered sites. Mean nitrate was significantly greater in Not Recovered (0.08 mg/L) than in Unimpaired (0.03 mg/L) or Recovered (0.05 mg/L) sites. Chlorophyll a, pH, and stream width did not differ significantly (P ≥ 0.05) among stream categories (Table 3).

Illumina Data Analyses

The Illumina MiSeq amplicon sequencing produced a total of 7,525,168 paired-end (15,042,336 single) reads, and 5,056,340 paired-end reads remained after quality filtering and denoising. After low-abundance sequences, chimeras, and controls were filtered out, 1,719,015 paired-end reads (71,625 average reads per sample) remained for the 16S dataset, 430,798 (17,950 per sample) for the 18S dataset, 978,304 (40,763 per sample) for the UPA dataset, and 178,219 (7,426 per sample) for the rbcL dataset. Filtered reads were assigned to a total of 13,884 16S OTUs, 2,583 18S OTUs, 2,221 UPA OTUs, and 662 rbcL OTUs (Appendix 16). Rarefaction curves for the 16S, 18S, and UPA amplicons showed OTUs beginning to plateau at ~5,000 to 10,000 reads per replicate, similar to the preliminary study (Appendices 3–6). OTUs from the rbcL marker began to plateau at ~1,000 reads. Rarefaction curves were also calculated per site (combined replicates); these plateaued for each marker, indicating that enough reads were sequenced for each site (Figs 4–7).
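Rarefaction asks how many OTUs would have been observed had only a given number of reads been sequenced. The Python sketch below illustrates the idea for one replicate's OTU count table. It is not the pipeline used in this study: the function name `rarefaction_curve`, the toy counts, and the nested design (one random shuffle per iteration, with successively longer prefixes as subsamples) are assumptions for illustration.

```python
import random


def rarefaction_curve(otu_counts, depths, iterations=10, seed=1):
    """Mean number of OTUs observed when `depth` reads are drawn without replacement.

    otu_counts: dict mapping OTU id -> read count for one replicate (hypothetical input).
    depths: read depths at which to evaluate the curve, in ascending order.
    """
    # Expand the count table into one OTU label per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    rng = random.Random(seed)
    totals = [0] * len(depths)
    for _ in range(iterations):
        # One random read order per iteration; its prefixes are nested subsamples,
        # so for ascending depths each iteration's curve never decreases.
        rng.shuffle(reads)
        for k, depth in enumerate(depths):
            totals[k] += len(set(reads[:depth]))
    return [t / iterations for t in totals]


# Toy replicate: a few abundant OTUs and a rare tail.
counts = {"OTU_1": 50, "OTU_2": 30, "OTU_3": 15, "OTU_4": 4, "OTU_5": 1}
curve = rarefaction_curve(counts, depths=[0, 10, 50, 100])
print(curve)
```

A curve that flattens as depth increases, as described above for ~5,000 to 10,000 reads, indicates that further sequencing would add few new OTUs.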

Within Site Replication

Within a site, three independent replicates were collected so that within-site variability could be examined. For 16S, 18S, UPA, and rbcL, the mean percentage of OTUs (algal and non-algal) observed in all three replicates at a site was 19.05%, 16.19%, 17.05%, and 15.38%, respectively, and the mean percentage observed in two of the three replicates was 22.71%, 23.59%, 22.24%, and 31.76%, respectively. Overall, for all four markers, the majority of OTUs (by difference, about 53–61%) were observed in only a single replicate (Table 4).
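The replicate percentages above can be sketched as follows: for one site, count how many of its replicates each OTU appears in, then express the counts as percentages of the site's OTUs. The function name and toy data are hypothetical. Averaging such per-site percentages across sites would yield means like those reported, but whether the thesis averaged in exactly this way is an assumption.

```python
from collections import Counter


def replicate_occupancy(replicates):
    """Percent of a site's OTUs detected in exactly 1, 2, ..., n replicates.

    replicates: list of iterables of OTU ids, one per replicate (hypothetical input).
    """
    detections = Counter()
    for rep in replicates:
        for otu in set(rep):  # count each OTU at most once per replicate
            detections[otu] += 1
    total = len(detections)  # OTUs seen in at least one replicate at this site
    by_k = Counter(detections.values())
    return {
        k: (100 * by_k[k] / total if total else 0.0)
        for k in range(1, len(replicates) + 1)
    }


site = [{"a", "b", "c"}, {"a", "b"}, {"a", "d"}]
print(replicate_occupancy(site))  # {1: 50.0, 2: 25.0, 3: 25.0}
```

Because every OTU at a site falls into exactly one class, the percentages sum to 100. This is why the single-replicate share can be obtained by subtracting the reported three- and two-replicate values from 100.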

Beta Diversity of all OTUs

Prior to assigning taxonomy, beta diversity was analyzed with Bray-Curtis dissimilarity and the Jaccard index. For each of the four markers, stream categories and watersheds had significantly (P < 0.05) different centroids using both the Bray-Curtis and Jaccard distance matrices. PERMANOVA of the Bray-Curtis distance matrix showed that stream category (R² = 0.17–0.20) had a slightly greater effect on the distribution of centroids than watershed (R² = 0.16–0.17) for all markers. NMDS ordinations of the Bray-Curtis values showed similar results for all markers, with Recovered and Unimpaired sites on one side of the plot, close to each other and further from Not Recovered sites (Fig. 8). Unimpaired sites were furthest from Not Recovered sites. Within this pattern, sites within a watershed showed some clustering (Fig. 8). For each marker, Recovered site WBSCRM1.8 clustered closer to Unimpaired sites than the other Recovered sites did. Replicates from a given site clustered close to each other, but site MC0300 had one replicate that tended to lie further away in the 16S, 18S, and UPA plots (Fig. 8).
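The sketch below shows Bray-Curtis and Jaccard dissimilarities, followed by a one-factor PERMANOVA that computes the pseudo-F and R² (among-group sum of squares divided by total sum of squares, the quantity reported above) from a distance matrix and derives a p-value by permuting group labels. The reference list cites the R package vegan, which provides these analyses. This plain-Python version is an independent illustration written from the standard formulas, not the code used in the thesis, and the sample vectors and group labels are hypothetical.

```python
import random
from itertools import combinations


def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two OTU abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(x, y))
    den = sum(a + b for a, b in zip(x, y))
    return num / den if den else 0.0


def jaccard(x, y):
    """Jaccard dissimilarity computed on presence/absence."""
    px = {i for i, a in enumerate(x) if a > 0}
    py = {i for i, b in enumerate(y) if b > 0}
    union = px | py
    return 1 - len(px & py) / len(union) if union else 0.0


def _ss(dist, idx):
    """Sum of squared pairwise distances among samples `idx`, divided by their count."""
    idx = list(idx)
    return sum(dist[i][j] ** 2 for i, j in combinations(idx, 2)) / len(idx)


def permanova(dist, groups, permutations=999, seed=1):
    """One-factor PERMANOVA: returns (pseudo-F, R^2, permutation p-value)."""
    n = len(groups)
    labels = sorted(set(groups))
    a = len(labels)
    ss_total = _ss(dist, range(n))

    def stats(g):
        ss_within = sum(_ss(dist, [i for i in range(n) if g[i] == lab]) for lab in labels)
        ss_among = ss_total - ss_within
        f = float("inf") if ss_within == 0 else (ss_among / (a - 1)) / (ss_within / (n - a))
        return f, ss_among / ss_total

    f_obs, r2 = stats(groups)
    rng = random.Random(seed)
    shuffled = list(groups)
    hits = 0
    for _ in range(permutations):
        rng.shuffle(shuffled)  # permute group labels among samples
        if stats(shuffled)[0] >= f_obs:
            hits += 1
    return f_obs, r2, (hits + 1) / (permutations + 1)


samples = [
    [10, 0, 5], [8, 1, 6], [9, 0, 4],   # hypothetical "Unimpaired" replicates
    [0, 10, 1], [1, 9, 0], [0, 12, 2],  # hypothetical "Not Recovered" replicates
]
groups = ["U", "U", "U", "NR", "NR", "NR"]
dist = [[bray_curtis(x, y) for y in samples] for x in samples]
f, r2, p = permanova(dist, groups)
print(f"pseudo-F = {f:.2f}, R2 = {r2:.2f}, p = {p:.3f}")
```

With only six samples there are few distinct label partitions, so the smallest attainable p-value is large. Field designs like the one in this study need more samples for permutation tests to have power.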

A constrained ordination was used to overlay the environmental variables on OTU composition (Fig. 9). In the ordination for each marker, Unimpaired sites and WBSCRM1.8 (a Recovered site) were on one side of the plot, furthest from Not Recovered sites (Fig. 9). Recovered sites MC0300 and HF039 had points in the middle, but closer to the Unimpaired sites (Fig. 9). The first two axes accounted for 22.2% of the variation in the 16S ordination, 20.8% in the 18S ordination, 23.7% in the UPA ordination, and 24.7% in the rbcL ordination. In all four ordinations, the vectors for % canopy cover and 2017 MAIS scores pointed toward the Unimpaired sites and WBSCRM1.8 (Fig. 9), whereas the vectors for mean current velocity, nitrate, and sulfate pointed toward the Not Recovered sites (Fig. 9). Depth pointed toward Recovered and Not Recovered sites in the rbcL and UPA ordinations, and temperature pointed in a similar direction in the 16S and UPA ordinations.

Analyses of all OTUs

All OTUs, algal and non-algal, in the 16S and 18S datasets were assigned taxonomy, and phylum-level composition was reported as relative abundance (Appendices 7, 9). Read counts of OTUs that could not be classified at the phylum level were excluded from relative abundance. Relative abundance was calculated per sample and per category to compare the diversity of each biofilm community.
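The relative-abundance calculation described above can be sketched as follows: reads from OTUs without a phylum assignment are dropped from both numerator and denominator before percentages are taken. The text does not specify whether per-category values were obtained by pooling reads across samples or by averaging per-sample percentages. The `pool` helper below assumes pooling, and all names and counts are hypothetical.

```python
from collections import Counter


def pool(samples):
    """Sum OTU read counts across samples (e.g., all replicates in a category)."""
    pooled = Counter()
    for sample in samples:
        pooled.update(sample)  # Counter.update adds counts for mapping input
    return pooled


def phylum_relative_abundance(otu_reads, otu_phylum):
    """Percent of phylum-classified reads assigned to each phylum.

    OTUs missing from `otu_phylum` (unclassified at phylum level) are excluded
    from both the numerator and the denominator.
    """
    totals = Counter()
    for otu, reads in otu_reads.items():
        phylum = otu_phylum.get(otu)
        if phylum is not None:
            totals[phylum] += reads
    classified = sum(totals.values())
    if classified == 0:
        return {}
    return {phylum: 100 * n / classified for phylum, n in totals.items()}


taxonomy = {"OTU_1": "Proteobacteria", "OTU_2": "Bacteroidetes", "OTU_3": "Proteobacteria"}
rep1 = {"OTU_1": 40, "OTU_2": 10, "OTU_4": 30}  # OTU_4 has no phylum assignment
rep2 = {"OTU_1": 10, "OTU_2": 10, "OTU_3": 30, "OTU_4": 20}
category = pool([rep1, rep2])
print(phylum_relative_abundance(category, taxonomy))
```

In this example, the 50 unclassified OTU_4 reads do not count toward the 100-read denominator.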

In all three stream categories for the 16S dataset, the most abundant phylum was Proteobacteria (63.87 – 64.65%), followed by Bacteroidetes (9.61 – 12.69%). All stream categories also contained a similar relative abundance of (3.25 – 5.43%) and (3.93 – 5.86%) (Appendix 7). The only difference among categories in the five most abundant phyla was (3.76 – 4.02%) in the Unimpaired and Recovered categories and (2.87%) in the Not Recovered category. Among 16S OTUs, the greatest overlap was between Recovered and Unimpaired sites (1264 OTUs). A total of 1928 OTUs were shared among all three categories (Fig. 2).

A wide taxonomic range of eukaryotic taxa was detected using the 18S marker (Appendix 9). For all three categories, the greatest relative abundances of reads were assigned to the same three phyla: Bacillariophyta (diatoms) (33.05 – 57.22%), followed by (11.10 – 18.20%) and Arthropoda (5.17 – 14.35%). These three most abundant phyla were followed by (4.33%) and Nematoda (4.11%) in the Unimpaired category, Annelida (2.63%) and Ciliophora (2.61%) in the Recovered category, and (9.35%) and (4.36%) in the Not Recovered category. Among 18S OTUs, the greatest overlap was between Not Recovered and Recovered streams (230 OTUs). Recovered and Unimpaired sites shared more OTUs (199) than Not Recovered and Unimpaired sites (118 OTUs). A total of 424 OTUs were shared among all three categories (Fig. 2).
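The Venn counts reported here and below can be read in two ways. A pairwise "shared" number may refer to the exclusive region of a three-set Venn diagram (shared by those two categories only) or to the full intersection (which also includes OTUs found in all three). The text does not say which convention was used. The sketch below computes both from OTU sets, using hypothetical function names and toy OTU ids.

```python
def venn3(a, b, c):
    """Sizes of the seven exclusive regions of a three-set Venn diagram."""
    a, b, c = set(a), set(b), set(c)
    return {
        "all three": len(a & b & c),
        "a & b only": len((a & b) - c),
        "a & c only": len((a & c) - b),
        "b & c only": len((b & c) - a),
        "a only": len(a - b - c),
        "b only": len(b - a - c),
        "c only": len(c - a - b),
    }


def shared(x, y):
    """Inclusive pairwise overlap (also counts OTUs present in the third set)."""
    return len(set(x) & set(y))


unimpaired = {"OTU_1", "OTU_2", "OTU_3", "OTU_4"}
recovered = {"OTU_3", "OTU_4", "OTU_5"}
not_recovered = {"OTU_4", "OTU_5", "OTU_6"}
regions = venn3(unimpaired, recovered, not_recovered)
print(regions)
print(shared(unimpaired, recovered))  # 2: OTU_3 (pair only) + OTU_4 (all three)
```

The seven exclusive regions always sum to the number of distinct OTUs across the three categories.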

Analyses of Algal OTUs

The 16S and 18S OTU datasets were then examined for algal taxa only. Cyanobacteria is the only algal prokaryotic phylum, and 369 individual cyanobacterial OTUs were present in the 16S dataset: 178 OTUs (18,232 reads) in Unimpaired, 165 OTUs (15,623 reads) in Recovered, and 175 OTUs (40,710 reads) in Not Recovered categories (Appendices 7, 8). The 18S dataset contained 509 algal OTUs (213,290 reads) belonging to a total of 10 algal phyla (Appendices 9, 10, 11). Relative abundance was calculated using only algal OTUs. Unimpaired, Recovered, and Not Recovered categories contained 224, 242, and 352 algal OTUs, respectively. Algal phyla Haptophyta, Miozoa, , Cryptophyta, , and Rhodophyta each represented less than 2% of the relative abundance in all three stream categories (Fig. 10). Algal reads assigned to Ochrophyta represented less than 2% of the relative abundance in the Unimpaired and Recovered categories but 3.92% in the Not Recovered category. The relative abundance of Cercozoa was 1.04% in the Unimpaired category but higher in the Recovered (3.92%) and Not Recovered (15.67%) categories. was observed in similar relative abundance in all three stream categories (4.21 – 5.86%). Overall, the dominant phylum in all three stream categories was Bacillariophyta, with a relative abundance of 89.17% in the Unimpaired category, 87.35% in the Recovered category, and 70.71% in the Not Recovered category (Fig. 10).

The UPA marker amplified 859,892 reads (1354 OTUs) from a total of 11 algal phyla (Appendices 12, 13). Unimpaired sites contained 478 algal OTUs (239,909 reads), and Recovered sites contained 583 algal OTUs (258,330 reads). Not Recovered sites contained 690 algal OTUs (361,653 reads) and had the most OTUs (496) not found in other categories (Fig. 2). A total of 165 OTUs were shared by all three categories. Not Recovered and Recovered sites shared more OTUs (146) than either Not Recovered or Recovered sites shared with Unimpaired sites (70 and 113, respectively) (Fig. 2). Excluding prokaryotes (Cyanobacteria), the UPA marker identified a greater number of algal OTUs than 18S (Appendices 11, 13). In all three categories, Haptophyta, Charophyta, Cryptophyta, Euglenozoa, Cercozoa, Rhodophyta, and Ochrophyta each represented less than 2% of the relative abundance (Fig. 11). Unimpaired and Recovered categories had a lower relative abundance of Chlorophyta (< 2%) and a greater relative abundance of Miozoa (5.43 – 5.85%) than the Not Recovered category (3.71% Chlorophyta and 2.79% Miozoa). Bacillariophyta had the greatest relative abundance in the Recovered (66.26%) and Unimpaired (69.39%) categories, followed by Cyanobacteria (23.40% and 21.94%, respectively), whereas the Not Recovered category had the greatest relative abundance of Cyanobacteria (77.44%), followed by Bacillariophyta (13.76%) (Fig. 11). The lower number of Bacillariophyta OTUs in the Not Recovered category (92) compared with the Unimpaired category (155) was not statistically significant (P = 0.06) but may be biologically important. The number of Bacillariophyta OTUs in the Recovered category did not differ significantly from the Not Recovered (P = 0.24) or Unimpaired (P = 0.41) categories.

The diatom-specific rbcL marker identified a greater number of diatom OTUs than either the 18S or UPA marker. After unknowns were removed, the rbcL marker produced 301 diatom OTUs (77,202 reads) in Unimpaired sites, 223 diatom OTUs (43,375 reads) in Recovered sites, and 174 diatom OTUs (15,052 reads) in Not Recovered sites. In total, only 96 OTUs were shared among all three categories (Fig. 2). Recovered and Unimpaired sites shared more OTUs (121) than either Recovered or Unimpaired sites shared with Not Recovered sites (36 and 32, respectively) (Fig. 2). The number of diatom OTUs was significantly greater (P < 0.05) in Unimpaired than in Not Recovered sites, and the Recovered category did not differ significantly from either of the other two categories. The specificity of the rbcL marker allowed relative abundance to be compared at the genus level (Fig. 12, Table 5). Unknown diatom reads made up 16.4%, 29.7%, and 26.3% of the Unimpaired, Recovered, and Not Recovered relative abundance, respectively; these reads were removed prior to the final relative abundance calculation.

The major diatom genera (>10% relative abundance) at Unimpaired sites were, in descending order, Cymbella, Nitzschia, Navicula, and Encyonema. The major diatom genera at Recovered sites were Cymbella, Encyonema, Nitzschia, and Fragilaria. The relative abundance of Cymbella in Recovered (23.19%) and Unimpaired (22.60%) sites was much greater than in Not Recovered (0.24%) sites. The major genera at Not Recovered sites were Nitzschia, Navicula, Sellaphora, and Fragilaria (Fig. 12, Table 5).

Alpha Diversity

In addition to relative abundance, alpha diversity was investigated using the Shannon diversity index, which was analyzed by category (Fig. 3). The Shannon index for 16S data (all OTUs) showed significantly (P < 0.05) lower diversity in Not Recovered sites than in Recovered and Unimpaired sites, but no significant difference between Unimpaired and Recovered sites. For the algal (cyanobacterial) OTUs in the 16S data, the Shannon index was highest in Unimpaired sites and lowest in Not Recovered sites, but there were no significant differences among stream categories (Fig. 3). The Shannon index using 18S OTUs (all OTUs and algal OTUs only) did not differ significantly among stream categories. For the UPA marker, Shannon diversity was significantly greater (P < 0.05) in the Unimpaired than in the Not Recovered category, and the Recovered category did not differ significantly from either. The Shannon index for rbcL sequences showed a similar trend, with the Unimpaired category significantly greater (P < 0.05) than the Not Recovered category, but no significant difference between the Recovered category and the other two categories (Fig. 3).
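The Shannon index combines richness and evenness: H' = -Σ p_i ln p_i, where p_i is the proportion of reads in OTU i. The minimal Python sketch below uses the natural logarithm, which is also the default in vegan's `diversity()`. The log base actually used in the thesis is not stated here and is an assumption, and the counts are hypothetical.

```python
import math


def shannon(counts):
    """Shannon index H' = -sum(p_i * ln p_i) over OTUs with nonzero counts."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


even = [10, 10, 10, 10]  # four equally abundant OTUs
uneven = [37, 1, 1, 1]   # same richness, dominated by one OTU
print(round(shannon(even), 3), round(shannon(uneven), 3))
```

Both toy communities have four OTUs, but the one dominated by a single OTU scores much lower. A drop in evenness alone can therefore lower H' without any loss of OTUs.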


DISCUSSION

Experimental Design

In this study, three independent replicates per stream site were used. The majority of OTUs were present in only one of the three replicates, and this pattern held for all four markers. This lack of similarity among replicates was unexpected. It may be due to the small amount of starting material in each DNA extraction relative to the amount of material collected for a sample. However, this is difficult to determine because analytical replicates of each sample were not performed.

Nevertheless, the individual replicates from a location did group together in the ordinations for most streams. These replicates were pooled and provided an overall picture of the diversity in the stream. Future stream biofilm metabarcoding projects should continue to include independent replicates, and potentially analytical replicates, to better validate the OTUs detected at each site.

Sequencing eDNA using 16S and 18S markers has advanced our knowledge of community structure and diversity in stream biofilms (Bond et al. 2000, Souza-Egipsy et al. 2008, Baker et al. 2009, Lear et al. 2009). Multi-marker approaches have recently been shown to increase the taxonomic resolution of metabarcoding studies (Marcelino et al. 2016, Pfendler et al. 2018). In the current study, using the 16S, 18S, UPA, and rbcL genes to amplify freshwater biofilm eDNA provided a broad taxonomic characterization of the prokaryotes and eukaryotes (16S and 18S markers) and a more detailed taxonomic representation of the algal community (UPA and rbcL markers). In my study, the eukaryotic algal diversity recovered by the 18S and UPA sequences comprised the same 10 algal phyla. Interestingly, the UPA marker recovered more eukaryotic algal OTUs than the 18S marker in all three stream categories, and the rbcL marker recovered more diatom OTUs than either the 18S or UPA marker. Previous studies have recommended the rbcL gene over 18S or UPA specifically for diatom barcoding because of its greater ability to distinguish species (Evans et al. 2007, Hamsher et al. 2011, Kermarrec et al. 2013). In situ metabarcoding of stream biofilms using 18S and 16S provided an outline of the diversity, but my study showed that algal-specific markers increased the number of OTUs and the taxonomic depth observed for algal taxa, consistent with previous findings (Marcelino et al. 2016).

A limiting factor in all metabarcoding studies is the completeness and accuracy of the taxonomic reference database (Kermarrec et al. 2013, Zimmermann et al. 2015). When analyzing relative abundance in this study, OTUs with undefined taxonomy or OTUs that could not be classified beyond were removed. In addition, certain taxa are repeated in the databases with different sequences, making it difficult to adequately characterize species richness. As the number of metabarcoding studies increases, it is imperative that we continue to investigate the taxonomy of unknown OTUs and of OTUs with incorrect taxonomic assignments to unlock the full potential of eDNA and metabarcoding (Elbrecht et al. 2017).

Comparison of Stream Types

This study was conducted to further investigate the change in biofilm algal diversity along an AMD recovery gradient. Of the 10 eukaryotic algal phyla sequenced with both 18S and UPA, representatives (OTUs) of each phylum were present in each stream category. Algal prokaryotes (Cyanobacteria) sequenced with 16S and UPA were also present in each stream category. Therefore, the biotic differences among stream categories were not due to differences in the phyla present, but potentially to differences among the taxa represented within each phylum. Although each phylum was represented, the Shannon index showed significantly lower algal diversity (richness and evenness) in Not Recovered streams than in Unimpaired streams. Algal diversity increased along the AMD remediation gradient from Not Recovered to Recovered to Unimpaired sites. Increases in algal diversity along an AMD gradient have been recorded in previous studies using morphology, molecular markers, and functional measures (Verb and Vis 2001, Bray et al. 2008, Drerup and Vis 2016). Benthic algae are at the base of the food chain in aquatic ecosystems, and a decrease in algal diversity will alter the community, which may negatively affect upper trophic levels (Sterner and Hessen 1994). Impacts on upper trophic levels are measured with the MAIS, which corresponded to stream categories in this study.

Individual OTUs within each phylum also varied by AMD remediation category. Not Recovered sites clustered furthest from Unimpaired sites, and Recovered sites clustered in between or closer to Unimpaired sites. Taxa within each algal phylum in biofilm communities can be more or less tolerant of AMD, which may explain the spread of OTUs across categories (Verb and Vis 2001, Zalack et al. 2010, Pool et al. 2013). The Venn diagrams support the Bray-Curtis ordinations, as the fewest OTUs were shared between Not Recovered and Unimpaired sites. In addition, each stream category had a large number of OTUs distinctive to that category. This could be due to taxa with particular environmental tolerances, or to rare taxa that occur only in very small abundances and may not be present in reference databases (Zalack et al. 2010, Debroas et al. 2015).

Investigation of algal relative abundance in each stream category revealed a change in the percentage of diatoms (Bacillariophyta) in the biofilm community. The relative abundance of diatoms in the UPA and 18S data was much lower in the Not Recovered category than in the Unimpaired and Recovered categories. This result was also reflected in fewer rbcL OTUs and UPA diatom OTUs in Not Recovered sites than in Unimpaired sites. Similarly, previous studies using morphological identification of biofilm taxa have observed decreases in diatom species richness, diversity, and abundance with increasing AMD impairment (Verb and Vis 2000, Smucker and Vis 2009). Diatoms are important contributors of polyunsaturated fatty acids (PUFAs) in streams, which are a vital source of energy in the ecosystem (Napolitano 1999, Torres-Ruiz et al. 2007, Torres-Ruiz and Wehr 2010). Research on stream biofilm essential fatty acids reported a reduction in the proportion of PUFAs with increasing AMD impairment relative to Unimpaired streams (DeForest et al. 2016, Drerup and Vis 2016). A decrease in PUFAs may also negatively affect higher trophic levels, because many organisms cannot synthesize their own PUFAs and rely on those supplied by biofilm (Torres-Ruiz and Wehr 2010).

The UPA marker showed an increase in the relative abundance of Cyanobacteria in the Not Recovered sites, but this was not supported by the 16S data. Additionally, the 16S and UPA cyanobacterial OTUs varied by site within each category rather than showing an overall increase in the Not Recovered streams. These findings indicate that Cyanobacteria most likely did not increase; rather, a sharp decrease in diatoms in the community raised the relative abundance of Cyanobacteria. This sharp change in diatom relative abundance appears to be unique and is supported by multiple biological metrics.
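The compositional effect invoked above can be illustrated with a little arithmetic. Because relative abundances must sum to 100%, a taxon whose read count stays constant will still appear to increase when another taxon declines. The counts below are hypothetical and are not taken from this study. Read counts are themselves only a proxy for biomass, so this is a sketch of the reasoning, not a reanalysis.

```python
def percent(counts):
    """Convert read counts to percent relative abundance."""
    total = sum(counts.values())
    return {taxon: 100 * n / total for taxon, n in counts.items()}


# Hypothetical algal read counts: Cyanobacteria are held constant while
# diatom reads drop tenfold.
before = {"Bacillariophyta": 700, "Cyanobacteria": 250, "other": 50}
after = {"Bacillariophyta": 70, "Cyanobacteria": 250, "other": 50}
print(round(percent(before)["Cyanobacteria"], 1))  # 25.0
print(round(percent(after)["Cyanobacteria"], 1))   # 67.6
```

The cyanobacterial share nearly triples even though its read count is unchanged. This is why the diatom decline, rather than a cyanobacterial increase, is the more parsimonious explanation when absolute cyanobacterial signals do not rise consistently.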

Benthic diatom communities and individual taxa have been shown to be indicators of stream water quality (Pan et al. 1996, Winter and Duthie 2000, Porter et al. 2008, Stevenson 2014, Hausmann et al. 2016). Diatoms have been used to assess AMD impacts in this mining region (Smucker and Vis 2009, Zalack et al. 2010, Pool et al. 2013), and the diatom index of biotic integrity (AMD-DIBI) was developed specifically to assess AMD pollution in the Western Allegheny Plateau (Zalack et al. 2010). One metric used in this index is the relative abundance of Cymbella, a genus that decreases in streams with increasing impairment. A similar result was observed in the current study, with Cymbella very sparse in Not Recovered streams compared to Recovered and Unimpaired streams. Likewise, Encyonema has been noted as an indicator of streams with higher water quality and lower nutrients (Hausmann et al. 2016), and this genus was observed in greater relative abundance in Recovered and Unimpaired sites than in Not Recovered sites. Sellaphora has been recorded as an indicator of streams with high conductivity and neutral pH, similar to the Not Recovered sites in this study (Hausmann et al. 2016). Sellaphora had its greatest relative abundance in Not Recovered sites and decreased in abundance in Recovered and Unimpaired sites. Metabarcoding of diatoms using the rbcL marker allowed genus-level, and in some cases species-level, identification of indicator taxa. As diatom rbcL reference databases continue to grow and be curated, a greater number of indicator species within the community can be investigated using this method.

Although the indicator taxa Cymbella, Encyonema, and Sellaphora were in relatively high abundance, other potential indicator taxa were present only in low abundance. Cocconeis has been reported to be an indicator of higher water quality (Wang et al. 2005). Cocconeis represented 0.08% of the relative abundance in Unimpaired and 0.2% in Recovered sites, but it was not observed in any of the Not Recovered sites or in one Recovered site (HF039). In addition, Pinnularia, which has been described as a possible indicator of AMD impairment (Zalack et al. 2010), was in very low abundance. Pinnularia had a relative abundance of 0.49% in Not Recovered, 0.02% in Recovered, and 0.006% in Unimpaired sites, an approximately 25- to 80-fold decrease in the higher quality streams, which is consistent with reports that it occurs in more impacted streams. Although these taxa were found in low relative abundance in each category, they offer important insight into stream water quality and remediation status.
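The fold decreases quoted for Pinnularia are simple ratios of the reported relative abundances. The short check below uses only the three percentages given above. The helper name `fold_decrease` is illustrative.

```python
# Pinnularia relative abundance (%) by category, as reported above.
pinnularia = {"Not Recovered": 0.49, "Recovered": 0.02, "Unimpaired": 0.006}


def fold_decrease(values, reference, other):
    """How many times smaller `other` is than `reference`."""
    return values[reference] / values[other]


for category in ("Recovered", "Unimpaired"):
    fold = fold_decrease(pinnularia, "Not Recovered", category)
    print(f"{category}: {fold:.1f}-fold lower than Not Recovered")
```

The ratios come out at about 24.5 and 81.7, which round to the 25- to 80-fold range stated in the text.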

Eunotia exigua and Frustulia are indicator diatoms previously reported in AMD-impaired streams in the area, but they were not observed in this study (Verb and Vis 2000, Smucker and Vis 2009). Eunotia exigua was not in our reference database, but other Eunotia species and Frustulia species were. Had these organisms been present in the samples, they should have been identified at least to genus with the rbcL marker. Not Recovered sites in this study are not as severely impacted as the AMD-impacted sites in the studies that previously made these observations (Verb and Vis 2000, Smucker and Vis 2009). It is possible that even the most impaired sites in this study had good enough water quality that Eunotia and Frustulia were absent.

Conclusion

Multi-marker metabarcoding of stream biofilm communities along an AMD remediation gradient showed that biofilm communities cluster by remediation status, similar to the MAIS. Biofilm diversity increased with remediation status (from Not Recovered to Recovered and Unimpaired). Most biological metrics used in this study showed Recovered sites to be more similar to Unimpaired sites than to Not Recovered sites. In addition, Not Recovered sites were more dissimilar to Unimpaired sites than to Recovered sites. Diatom indicator taxa established in previous studies also corresponded to water quality and remediation status in the current study. Overall, this research showed that data generated with a multi-marker metabarcoding approach provide insight into community-level differences in biofilm diversity among sites in different states of AMD remediation. Numerous future directions could be explored, including sampling more streams in each category, seasonal studies, other group-specific markers, and deeper taxonomic analysis as databases continue to be curated.


REFERENCES

Amaral-Zettler, L. A., McCliment, E. A., Ducklow, H. W., & Huse, S. M. 2009. A method for studying protistan diversity using massively parallel sequencing of V9 hypervariable regions of small-subunit ribosomal RNA Genes. PLoS ONE 4:e6372.

Apprill, A., McNally, S., Parsons, R., & Weber, L. 2015. Minor revision to V4 region SSU rRNA 806R gene primer greatly increases detection of SAR11 bacterioplankton. Aquat. Microb. Ecol. 75:129–137.

Arar, E. J., & Collins, G. B. 1997. In Vitro Determination of Chlorophyll a and Pheophytin a in Marine and Freshwater Algae by Fluorescence. Revision 1.2. Environmental Protection Agency, Office of Research and Development, Cincinnati, Ohio.

Baker, B. J., Tyson, G. W., Goosherst, L., & Banfield, J. F. 2009. Insights into the diversity of eukaryotes in acid mine drainage biofilm communities. Appl. Environ. Microbiol. 75:2192–2199.

Belanger, S. E., Rupe, K. L., Lowe, R. L., Johnson, D., & Pan, Y. 1996. A flow-through laboratory microcosm suitable for assessing effects of surfactants on natural periphyton. Environ. Toxicol. Water Qual. 11:65–76.

Bray, J. P., Broady, P. A., Niyogi, D. K., & Harding, J. S. 2008. Periphyton communities in New Zealand streams impacted by acid mine drainage. Mar. Freshwater Res. 59:1084–1091.

Bond, P. L., Smriga, S. P., & Banfield, J. F. 2000. Phylogeny of microorganisms populating a thick, subaerial, predominantly lithotrophic biofilm at an extreme acid mine drainage site. Appl. Environ. Microbiol. 66:3842–3849.

Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J., & Holmes, S. P. 2016. DADA2: High-resolution sample inference from Illumina amplicon data. Nat. Methods 13:581–583.

Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E. K., Huttley, G. A., et al. 2010. QIIME allows analysis of high-throughput community sequencing data. Nat. Methods 7:335.

Caporaso, J. G., Lauber, C. L., Walters, W. A., Berg-Lyons, D., Lozupone, C. A., Turnbaugh, P. J., Fierer, N., & Knight, R. 2011. Global patterns of 16S rRNA diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. USA 108:4516–4522.

Costello, C. 2003. Acid Mine Drainage: Innovative Treatment Technologies. Report for U.S. Environmental Protection Agency, Office of Solid Waste and Emergency Response, Technology Innovation Office, Washington, DC. 52 pp. Available at: www.clu-in.org/s.focus/c/pub/i/1054 (last accessed February 2017).

Debroas, D., Hugoni, M., & Domaizon, I. 2015. Evidence for an active rare biosphere within freshwater community. Mol. Ecol. 24:1236–1247.

DeForest, J. L., Drerup, S. A., & Vis, M. L. 2016. Using fatty acids to fingerprint biofilm communities: a means to quickly and accurately assess stream quality. Environ. Monit. Assess. 188:277.

Drerup, S. A., & Vis, M. L. 2016. Responses of Stream Biofilm Phospholipid Fatty Acid Profiles to Acid Mine Drainage Impairment and Remediation. Water Air Soil Pollut. 227:1–10.

Dopheide, A., Lear, G., Stott, R., & Lewis, G. 2009. Relative diversity and community structure of ciliates in stream biofilms according to molecular and microscopy methods. Appl. Environ. Microbiol. 75:5261–5272.

Edwards, S. J., & Kjellerup, B. V. 2013. Applications of biofilms in bioremediation and biotransformation of persistent organic pollutants, pharmaceuticals/personal care products, and heavy metals. Appl. Microbiol. Biotechnol. 97:9909–9921.

Elbrecht, V., & Leese, F. 2015. Can DNA-based ecosystem assessments quantify species abundance? Testing primer bias and biomass–sequence relationships with an innovative metabarcoding protocol. PLoS ONE 10:e0130324.

Elbrecht, V., Vamos, E. E., Meissner, K., Aroviita, J., & Leese, F. 2017. Assessing strengths and weaknesses of DNA metabarcoding-based macroinvertebrate identification for routine stream monitoring. Methods Ecol. Evol. 8:1265–1275.

Evans, K. M., Wortley, A. H., & Mann, D. G. 2007. An assessment of potential diatom “barcode” genes (cox1, rbcL, 18S and ITS rDNA) and their effectiveness in determining relationships in Sellaphora (Bacillariophyta). Protist 158:349–364.

Gray, N. F. 1997. Environmental impact and remediation of acid mine drainage: a management problem. Environ. Geol. 30:62–71.

Guschina, I. A., & Harwood, J. L. 2009. Algal lipids and effect of the environment on their biochemistry. In Lipids in Aquatic Ecosystems. Springer, New York, NY, pp. 1–24.

Hamsher, S. E., Evans, K. M., Mann, D. G., Poulíčková, A., & Saunders, G. W. 2011. Barcoding diatoms: exploring alternatives to COI-5P. Protist 162:405–422.

Hausmann, S., Charles, D. F., Gerritsen, J., & Belton, T. J. 2016. A diatom-based biological condition gradient (BCG) approach for assessing impairment and developing nutrient criteria for streams. Sci. Total Environ. 562:914–927.

Hervé, M. 2018. RVAideMemoire: Testing and Plotting Procedures for Biostatistics. R package version 0.9-69.

Johnson, K. S. 2007. Field and Laboratory Methods for Using the MAIS (Macroinvertebrate Aggregated Index for Streams) in Rapid Bioassessment of Ohio Streams. Available at: http://www.epa.ohio.gov/dsw/credibledata/references.aspx (last accessed May 2017).

Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S., Cooper, A., Markowitz, S., & Duran, C. 2012. Geneious Basic: an integrated and extendable desktop software platform for the organization and analysis of sequence data. Bioinformatics 28:1647–1649.

Kermarrec, L., Franc, A., Rimet, F., Chaumeil, P., Humbert, J. F., & Bouchez, A. 2013. Next-generation sequencing to inventory taxonomic diversity in eukaryotic communities: a test for freshwater diatoms. Mol. Ecol. Resour. 13:607–619.

Lang, I., & Kaczmarska, I. 2011. A protocol for a single-cell PCR of diatoms from fixed samples: method validation using Ditylum brightwellii (T. West) Grunow. Diatom Res. 26:43–49.

Lear, G., Niyogi, D., Harding, J., Dong, Y., & Lewis, G. 2009. Biofilm bacterial community structure in streams affected by acid mine drainage. Appl. Environ. Microbiol. 75:3455–3460.

Legendre, P., & Gallagher, E. D. 2001. Ecologically meaningful transformations for ordination of species data. Oecologia 129:271–280.

Marcelino, V. R., Verbruggen, H., Rosenberg, E., Koren, O., Reshef, L., Efrony, R., & Zilber-Rosenberg, I. 2016. Multi-marker metabarcoding of coral skeletons reveals a rich microbiome and diverse evolutionary origins of endolithic algae. Sci. Rep. 6:31508.

McMurdie, P. J., & Holmes, S. 2013. phyloseq: An R Package for Reproducible Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE 8:e61217.

Napolitano, G. E. 1999. Fatty acids as trophic and chemical markers in freshwater ecosystems. In Arts, M. T. & Wainman, B. C. [Eds] Lipids in Freshwater Ecosystems. Springer, New York, NY, pp. 21–44.

Nelson, C. E., Bennett, D. M., & Cardinale, B. J. 2013. Consistency and sensitivity of stream periphyton community structural and functional responses to nutrient enrichment. Ecol. Appl. 23:159–173.

OEPA. 1987. Biological criteria for the protection of aquatic life: volume II: users manual for biological field assessment of Ohio surface waters. Div. Water Qual. Monit. & Assess., Surface Water Section, Columbus, Ohio.

Oksanen, J., Blanchet, F. G., Kindt, R., Legendre, P., Minchin, P. R., O’Hara, R. B., Simpson, G. L., et al. 2013. Package “vegan”: Community ecology package. R package version 2.4-6.

Pan, Y., Stevenson, R. J., Hill, B. H., Herlihy, A. T., & Collins, G. B. 1996. Using diatoms as indicators of ecological conditions in lotic systems: a regional assessment. J. N. Am. Benthol. Soc. 15:481–495.

Pfendler, S., Karimi, B., Maron, P. A., Ciadamidaro, L., Valot, B., Bousta, F., Alaoui-Sosse, L. 2018. Biofilm biodiversity in French and Swiss show caves using the metabarcoding approach: First data. Sci. Total Environ. 615:1207–1217.

Pool, J. R., Kruse, N. A., & Vis, M. L. 2013. Assessment of mine drainage remediated streams using diatom assemblages and biofilm enzyme activities. Hydrobiologia 709:101–116.

Porter, S. D., Mueller, D. K., Spahr, N. E., Munn, M. D., & Dubrovsky, N. M. 2008. Efficacy of algal metrics for assessing nutrient and organic enrichment in flowing waters. Freshw. Biol. 53:1036–1054.

Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., & Glöckner, F. O. 2013. The SILVA ribosomal RNA gene database project: improved data processing and web-based tools. Nucleic Acids Res. 41:D590–D596.

Ratnasingham, S., & Hebert, P. D. 2007. BOLD: The Barcode of Life Data System

(http://www.barcodinglife.org). Mol. Ecol. Notes 7:355–364.

Sherwood, A. R. & Presting, G. G. 2007. Universal primers amplify a 23S rDNA plastid

marker in eukaryotic algae and cyanobacteria. J. Phycol. 43:605–8.

Sherwood, A. R., Dittbern, M. N., Johnston, E. T., & Conklin, K. Y. 2017. A

metabarcoding comparison of windward and leeward airborne algal diversity across

the Koʻolau mountain range on the island of Oʻahu, Hawaiʻi. J. Phycol. 53:437–

445.

Smucker, N. J., & Vis, M. L. 2009. Use of diatoms to assess agricultural and coal mining

impacts on streams and a multiassemblage case study. J. N. Am. Benthol. Soc.

28:659–675.

Smucker, N. J., Drerup, S. A., & Vis, M. L. 2014. Roles of benthic algae in the structure,

function, and assessment of stream ecosystems affected by acid mine drainage. J.

Phycol. 50:425–36.

Souza-Egipsy, V., González-Toril, E., Zettler, E., Amaral-Zettler, L., Aguilera, A. &

Amils, R. 2008. Prokaryotic community structure in algal photosynthetic biofilms

from extreme acidic streams in Rio Tinto (Huelva, Spain). Int. Microbiol. 11:251–

60.

Stein, E. D., Martinez, M. C., Stiles, S., Miller, P. E., & Zakharov, E. V. 2014. Is DNA

barcoding actually cheaper and faster than traditional morphological methods:

results from a survey of freshwater bioassessment efforts in the United

States? PLoS One 9:e95525.

Sterner, R. W., & Hessen, D. O. 1994. Algal nutrient limitation and the nutrition of

aquatic herbivores. Annu. Rev. Ecol. Syst. 25:1–29.

Stevenson, R. J., Bothwell, M. L., Lowe, R. L., & Thorp, J. H. 1996. Algal ecology:

Freshwater benthic ecosystems. Academic Press.

Stevenson, R. J. 2014. Ecological assessments with algae: a review and synthesis. J. Phycol.

50:437–461.

Torres-Ruiz, M., & Wehr, J. D. 2010. Changes in the nutritional quality of decaying leaf

litter in a stream based on fatty acid content. Hydrobiologia 651:265–278.

Torres-Ruiz, M., Wehr, J. D., & Perrone, A. A. 2007. Trophic relations in a stream food

web: importance of fatty acids for macroinvertebrate consumers. J. N. Am. Benthol.

Soc. 26:509–522.

US Environmental Protection Agency (US EPA) 1994. Acid mine drainage prediction.

US EPA Report #EPA-530-R-94-036, US EPA Office of Solid Waste, Washington,

DC.

United States Environmental Protection Agency (US EPA). 1997. A citizen’s handbook

to address contaminated coal mine drainage. US EPA Report #EPA-903-K-97-003,

US EPA Region 3, Philadelphia, PA, USA.

Verb, R. G., & Vis, M. L. 2000. Comparison of benthic diatom assemblages from streams

draining abandoned and reclaimed coal mines and nonimpacted sites. J. N. Am.

Benthol. Soc. 19:274–288.

Verb, R. G., & Vis, M. L. 2001. Macroalgal communities from an acid mine drainage

impacted watershed. Aquat. Bot. 71:93–107.

Verb, R. G., & Vis, M. L. 2005. Periphyton assemblages as bioindicators of mine-drainage

in unglaciated Western Allegheny Plateau lotic systems. Water Air Soil

Pollut. 161:227–65.

Wang, Y. K., Stevenson, R. J. & Metzmeier, L. 2005. Development and evaluation of a

diatom-based index of biotic integrity for the Interior Plateau ecoregion, USA. J. N.

Am. Benthol. Soc. 24:990–1008.

Wetzel, R. G. 1983. Attached algal-substrata interactions: fact or myth, and when and

how? In Periphyton of Freshwater Ecosystems. Springer, Netherlands. pp. 207–215.

Winter, J. G., & Duthie, H. C. 2000. Epilithic diatoms as indicators of stream total N and

total P concentration. J. N. Am. Benthol. Soc. 19:32–49.

Zalack, J. T., Smucker, N. J., & Vis, M. L. 2010. Development of a diatom index of

biotic integrity for acid mine drainage impacted streams. Ecol. Indic. 10:287–295.

Zimmermann, J., Glöckner, G., Jahn, R., Enke, N., & Gemeinholzer, B. 2015.

Metabarcoding vs. morphological identification to assess diatom diversity in

environmental studies. Mol. Ecol. Resour. 15:526–542.

Table 1: Amplification primers for each gene marker. The citation refers to the primary primer sequence. The amplicon size (bp) is the size of the PCR product after amplification, including primers. Thermocycler settings for the first PCR amplification are given in Table 2.

Target | Primer name | Citation | Sequence (5’–3’) | Illumina overhang + 0–2 N’s + primer (5’–3’)
16S (V4) | 515F | Caporaso et al. 2011 | GTGYCAGCMGCCGCGGTAA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGYCAGCMGCCGCGGTAA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGTGYCAGCMGCCGCGGTAA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGTGYCAGCMGCCGCGGTAA
16S (V4) | 806RB | Apprill et al. 2015 | GGACTACNVGGGTWTCTAAT | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACNVGGGTWTCTAAT; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNGGACTACNVGGGTWTCTAAT; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNGGACTACNVGGGTWTCTAAT
18S | 1391F | Amaral-Zettler et al. 2009 | GTACACACCGCCCGTC | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTACACACCGCCCGTC; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGTACACACCGCCCGTC; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGTACACACCGCCCGTC
18S | EukBr | Amaral-Zettler et al. 2009 | TGATCCTTCTGCAGGTTCACCTAC | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGATCCTTCTGCAGGTTCACCTAC; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNTGATCCTTCTGCAGGTTCACCTAC; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNTGATCCTTCTGCAGGTTCACCTAC
UPA (23S) | P23SrV_f1 | Sherwood & Presting 2007 | GGACAGAAAGACCCTATGAA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACAGAAAGACCCTATGAA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGGACAGAAAGACCCTATGAA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGGACAGAAAGACCCTATGAA
UPA (23S) | P23SrV_r1 | Sherwood & Presting 2007 | TCAGCCTGTTATCCCTAGAG | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCAGCCTGTTATCCCTAGAG; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNTCAGCCTGTTATCCCTAGAG; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNTCAGCCTGTTATCCCTAGAG
rbcL | cfD | Hamsher et al. 2011 | CCRTTYATGCGTTGGAGAGA | TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCRTTYATGCGTTGGAGAGA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNCCRTTYATGCGTTGGAGAGA; TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNCCRTTYATGCGTTGGAGAGA
rbcL | DtrbcL3R | Lang and Kaczmarska 2011 | ACACCWGACATACGCATCCA | GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACACCWGACATACGCATCCA; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNACACCWGACATACGCATCCA; GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNACACCWGACATACGCATCCA
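Every construct in Table 1 follows one pattern: the Illumina overhang, then zero, one, or two N's, then the gene-specific primer. The short Python sketch below regenerates such a set from the overhang and primer strings copied from the table. The helper name `phased_constructs` is hypothetical and is not part of the original workflow. Staggering the primer start by 0–2 bases is a common way to raise base diversity on Illumina flow cells, although the rationale is not stated in Table 1.

```python
# Illumina overhang adapters as listed in Table 1 (forward and reverse).
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"


def phased_constructs(overhang: str, primer: str, max_spacer: int = 2) -> list[str]:
    """Return overhang + 0..max_spacer 'N' spacer bases + primer (hypothetical helper)."""
    return [overhang + "N" * n + primer for n in range(max_spacer + 1)]


if __name__ == "__main__":
    # 515F (16S V4) forward primer from Table 1
    for construct in phased_constructs(FORWARD_OVERHANG, "GTGYCAGCMGCCGCGGTAA"):
        print(construct)
```

The same call with `REVERSE_OVERHANG` and a reverse primer (for example DtrbcL3R) yields the right-hand constructs for that row.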

Table 2: Thermocycler settings for the first PCR amplification (* = touchdown PCR)

DNA marker | First amplification conditions
16S (V4) | Initial denaturing = 95˚C, 5 min; 28 cycles: denature = 98˚C, 20 s; annealing = 55˚C, 15 s; extension = 72˚C, 25 s; final extension = 72˚C, 5 min
18S | Initial denaturing = 95˚C, 5 min; 27 cycles: denature = 98˚C, 20 s; annealing = 58˚C, 15 s; extension = 72˚C, 25 s; final extension = 72˚C, 5 min
UPA* | Initial denaturing = 94˚C, 4 min; (1) 16 cycles: denature = 94˚C, 30 s; annealing = 66˚C (−0.5˚C / cycle), 30 s; extension = 72˚C, 30 s; (2) 19 cycles: denature = 94˚C, 30 s; annealing = 58˚C, 30 s; extension = 72˚C, 30 s; final extension = 72˚C, 1 min
rbcL | Initial denaturing = 95˚C, 5 min; 31 cycles: denature = 98˚C, 20 s; annealing = 55˚C, 15 s; extension = 72˚C, 30 s; final extension = 72˚C, 5 min
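The UPA program in Table 2 is a two-phase touchdown: 16 cycles starting at 66˚C and dropping 0.5˚C per cycle, then 19 cycles at 58˚C, for 35 cycles in total. The sketch below lists the annealing temperature for every cycle under one stated assumption that the table does not spell out: the first touchdown cycle anneals at 66˚C and the decrement applies from the second cycle on. Under that assumption the last touchdown cycle anneals at 58.5˚C, half a degree above the second phase. The function name is hypothetical.

```python
def touchdown_annealing(start=66.0, step=0.5, touchdown_cycles=16,
                        final_temp=58.0, final_cycles=19):
    """Per-cycle annealing temperatures (˚C) for a two-phase touchdown PCR.

    Assumption: cycle 1 anneals at `start` and every later touchdown cycle
    is `step` degrees cooler; phase 2 holds `final_temp` for `final_cycles`.
    """
    phase1 = [start - step * i for i in range(touchdown_cycles)]
    phase2 = [final_temp] * final_cycles
    return phase1 + phase2


schedule = touchdown_annealing()
print(len(schedule), schedule[0], schedule[15], schedule[16])  # 35 66.0 58.5 58.0
```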

Table 3: Mean (range) of chemical and physical parameters measured for streams in each category. For Unimpaired streams (n = 2), the two measured values are given in place of a range. Significant differences among categories (P < 0.05) were determined with a Kruskal-Wallis test followed by post hoc comparisons and are indicated by letters; values that share a letter are not significantly different from each other. MAIS = Macroinvertebrate Aggregate Index for Streams, Chl. a = Chlorophyll a.

Variable | Not Recovered (n=3) | Recovered (n=3) | Unimpaired (n=2)
MAIS 2017 | 12A (12–13) | 15B (14–16) | 14C (14)
Chl. a (µg/g) | 448.0 (187.9–868.6) | 490.2 (157.7–971.7) | 601.0 (300.7, 901.3)
pH | 7.7 (7.5–8.0) | 7.5 (7.4–7.7) | 7.6 (7.3, 7.9)
Specific Conductance (µS/cm) | 613A (510–770) | 490AB (380–560) | 370B (210, 530)
Temp. (˚C) | 21A (19–22) | 21A (20–23) | 19B (19)
Mean Current Velocity (cm/s) | 44.0A (26.9–72.7) | 15.4B (5.0–32.5) | 16.7B (7.3, 26.1)
Mean Depth (cm) | 19.4AB (11.8–34.3) | 31.1A (17.7–44.2) | 14.5B (8.0, 21.0)
Width (m) | 6.7 (3.2–10.0) | 9.1 (8.4–9.7) | 9.0 (7.3, 10.7)
% Canopy | 30A (0–90) | 70AB (70–80) | 90B (80, 90)
Nitrate (mg/L) | 0.08A (0.07–0.10) | 0.05B (0.02–0.06) | 0.03B (0.02, 0.03)
Iron (mg/L) | 0.17A (0.04–0.42) | 0.32B (0.22–0.49) | 0.15AB (0.13, 0.16)
Sulfate (mg/L) | 120.0A (70.0–170.0) | 56.7B (30.0–80.0) | 22.5B (15.0, 30.0)
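Table 3 (and Figure 3) rely on the Kruskal-Wallis rank test. As a hedged illustration of what the statistic measures, the pure-Python sketch below computes H using average ranks for ties and the usual tie correction. The function names are hypothetical, the post hoc procedure used in the thesis is not named and is not reproduced, and the P value would come from comparing H with a chi-square distribution on k − 1 degrees of freedom (`scipy.stats.kruskal` performs the same test). With only two or three streams per category, that chi-square approximation is rough.

```python
from itertools import chain


def average_ranks(values):
    """Ranks 1..N, with tied values sharing the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H with tie correction (undefined if every value is tied)."""
    pooled = list(chain.from_iterable(groups))
    n = len(pooled)
    ranks = average_ranks(pooled)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12.0 / (n * (n + 1)) * total - 3 * (n + 1)
    # Tie correction: divide by 1 - sum(t^3 - t) / (n^3 - n).
    tie_sizes = {}
    for v in pooled:
        tie_sizes[v] = tie_sizes.get(v, 0) + 1
    ties = sum(t ** 3 - t for t in tie_sizes.values())
    return h / (1 - ties / (n ** 3 - n))


# Sulfate (mg/L) from Appendix 1: Not Recovered, Recovered, Unimpaired
print(kruskal_wallis_h([70, 120, 170], [60, 30, 80], [30, 15]))
```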


Table 4: Comparison of replicates within a stream site. Values are the mean percentage of a site's OTUs that were detected in all three, in two, or in only one of its replicates, with the range (%) among sites in parentheses.

Marker | OTUs in all three replicates | OTUs in two replicates | OTUs in one replicate
16S | 19.05 (12.21–22.84) | 22.71 (20.71–25.33) | 58.24 (55.53–67.08)
18S | 16.19 (10.21–24.95) | 23.59 (20.0–27.07) | 60.23 (53.61–69.51)
UPA | 17.50 (8.57–38.94) | 22.24 (17.53–38.94) | 60.86 (47.78–73.27)
rbcL | 15.38 (4.05–27.37) | 31.76 (15.75–43.93) | 52.86 (34.21–77.40)
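The three columns of Table 4 sum to roughly 100% for each marker, which suggests that every OTU detected at a site is counted once, according to how many of the site's replicates contain it, and the site percentages are then averaged. A minimal sketch of that per-site calculation, assuming presence/absence data and hypothetical OTU IDs, is:

```python
def replicate_occupancy(replicates):
    """Percent of a site's OTUs found in exactly 3, 2, or 1 of its replicates.

    `replicates` is a list of sets of OTU IDs, one set per replicate.
    """
    all_otus = set().union(*replicates)
    counts = {k: 0 for k in range(len(replicates), 0, -1)}
    for otu in all_otus:
        counts[sum(otu in rep for rep in replicates)] += 1
    return {k: 100.0 * v / len(all_otus) for k, v in counts.items()}


# Hypothetical OTU IDs for one site with three replicates
site = [{"otu1", "otu2", "otu3"}, {"otu1", "otu2"}, {"otu1", "otu4"}]
print(replicate_occupancy(site))  # {3: 25.0, 2: 25.0, 1: 50.0}
```

Averaging these dictionaries over the sites of a marker would give one row of the table under this assumption.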

Table 5: Mean relative abundance (%) of diatom genera based on rbcL sequences for each stream category. Genera in bold are taxa noted to indicate water quality.

Genus | Not Recovered | Recovered | Unimpaired
Achnanthidium | 7.14 | 2.65 | 3.47
Amphora | 0.11 | 0.01 | 1.90
Cocconeis | 0.00 | 0.20 | 0.08
Craticula | 0.00 | 0.00 | 0.01
Cyclotella | 0.32 | 1.19 | 2.35
Cymbella | 0.24 | 23.19 | 22.60
Discostella | 0.07 | 0.00 | 0.00
Encyonema | 2.62 | 18.55 | 10.12
Fistulifera | 1.50 | 1.59 | 2.68
Fragilaria | 11.93 | 10.97 | 5.46
Gomphonema | 8.24 | 9.48 | 3.40
Halamphora | 0.07 | 0.03 | 2.80
Mayamaea | 0.32 | 0.01 | 0.16
Melosira | 0.56 | 2.66 | 4.85
Navicula | 21.86 | 5.09 | 10.39
Nitzschia | 25.71 | 15.47 | 17.10
Pinnularia | 0.49 | 0.02 | 0.01
Planothidium | 0.46 | 0.12 | 1.76
Reimeria | 0.43 | 0.17 | 2.67
Sellaphora | 16.70 | 7.38 | 3.47
Surirella | 1.24 | 1.22 | 4.73

[Figure 1 map panels: A, Ohio with the sampled watersheds outlined (scale 0–160 km); B–D, site locations (scale 0–8 km) in Sunday Creek (WB51, EB001, WBSCRM1.8), Hewett Fork (HF090, HF039) and Monday Creek (LM0110, MC0300, MC0240); legend: Not Recovered, Recovered, Unimpaired.]

Figure 1: A. Map of Ohio with the sampled watersheds outlined. B. Site locations within Sunday Creek, C. Hewett Fork and D. Monday Creek. Site categories: triangle = Not Recovered, star = Recovered, square = Unimpaired. Detailed information about each location is given in Appendix 1.

Figure 2: Venn diagrams for each DNA marker showing the number of OTUs unique and shared between and among site categories.

[Figure 3 panels: 16S, 18S, rbcL and UPA; y-axis: Shannon; x-axis: Category (Not Recovered, Recovered, Unimpaired).]

Figure 3: Shannon diversity index per stream category (Not Recovered, Recovered, Unimpaired) grouped by DNA marker (16S, 18S, UPA, rbcL). Each box spans the lower (25%) and upper (75%) quartiles. The mean Shannon value is represented by the bold line. Minimum and maximum values are indicated by whiskers, and outliers by circles. Significant differences among categories (P < 0.05) were determined with a Kruskal-Wallis test followed by post hoc comparisons and are denoted by letters above each box; boxes that share a letter are not significantly different from each other.
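The Shannon index summarized in Figure 3 (and listed per replicate in Appendix 2) is H′ = −Σ pᵢ log pᵢ, where pᵢ is the fraction of a sample's reads in OTU i. The logarithm base is not stated. A natural-log index cannot exceed ln(13,884) ≈ 9.54, where 13,884 is the total number of 16S OTUs in Appendix 16, yet several 16S values in Appendix 2 exceed 10. This suggests that a larger base such as 2 was used, although that is an inference rather than a documented setting. The sketch below therefore defaults to base 2 and takes the base as a parameter; the function name is hypothetical.

```python
import math


def shannon(counts, base=2):
    """Shannon index H' = -sum(p_i * log(p_i)), skipping zero counts.

    base=2 is an assumption (see text); pass math.e for the natural-log version.
    """
    total = sum(counts)
    h = 0.0
    for c in counts:
        if c > 0:
            p = c / total
            h -= p * math.log(p, base)
    return h


print(shannon([25, 25, 25, 25]))          # about 2.0: four equally common OTUs
print(shannon([25, 25, 25, 25], math.e))  # ln(4), about 1.386
```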


Figure 4: Rarefaction Plot of 16S marker showing the number of OTUs as a function of sequence depth (number of reads). Replicates for a site were pooled.

Figure 5: Rarefaction Plot of 18S marker showing the number of OTUs as a function of sequence depth (number of reads). Replicates for a site were pooled.

Figure 6: Rarefaction Plot of UPA marker showing the number of OTUs as a function of sequence depth (number of reads). Replicates for a site were pooled.

Figure 7: Rarefaction Plot of rbcL marker showing the number of OTUs as a function of sequence depth (number of reads). Replicates for a site were pooled.
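A rarefaction curve plots how many OTUs would be observed if only n reads had been sequenced. The thesis does not say how Figures 4–7 were generated, so the following is only one plausible approach. Instead of repeated random subsampling, the sketch computes the expected OTU count analytically: for a sample of N reads in which OTU i has Nᵢ reads, E[Sₙ] = Σᵢ [1 − C(N − Nᵢ, n) / C(N, n)]. Binomial coefficients are evaluated through log-gamma so that read depths in the tens of thousands stay cheap. All function names and the example counts are hypothetical.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k), via log-gamma."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def expected_richness(counts, depth):
    """Expected number of OTUs in a random subsample of `depth` reads
    drawn without replacement (hypergeometric rarefaction)."""
    total = sum(counts)
    if not 0 <= depth <= total:
        raise ValueError("depth must be between 0 and the total read count")
    richness = 0.0
    for c in counts:
        if c <= 0:
            continue
        if total - c < depth:
            richness += 1.0  # this OTU cannot be missed at this depth
        else:
            # 1 - P(no read of this OTU is drawn)
            richness += 1.0 - exp(log_comb(total - c, depth) - log_comb(total, depth))
    return richness


def rarefaction_curve(counts, step):
    """(depth, expected OTUs) points from `step` reads up to the full sample."""
    return [(d, expected_richness(counts, d))
            for d in range(step, sum(counts) + 1, step)]


# Hypothetical OTU read counts for one pooled site
print(rarefaction_curve([50, 30, 10, 5, 3, 1, 1], step=20))
```

A curve that flattens before reaching the full read depth indicates that additional sequencing would add few new OTUs.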


[Figure 8 panels: 16S, 18S, rbcL and UPA; axes: NMDS1 and NMDS2; legend (Site): WB51, HF090, MC0240, HF039, MC0300, WBSCRM1.8, EB001, LM0110.]

Figure 8: Non-metric multidimensional scaling (NMDS) ordination of Bray-Curtis distance matrices calculated from Hellinger-transformed OTU tables for 16S (stress = 0.145), 18S (stress = 0.139), rbcL (stress = 0.121), and UPA (stress = 0.130). Each point represents one replicate from a site. Colors represent replicates from the same site. Circles = Not Recovered sites, triangles = Recovered sites, squares = Unimpaired sites.
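Figures 8 and 9 start from the same distance matrix: each replicate's OTU counts are Hellinger-transformed (the square root of each OTU's relative abundance), and Bray-Curtis dissimilarity, Σ|uᵢ − vᵢ| / Σ(uᵢ + vᵢ), is computed between every pair of replicates. The pure-Python sketch below reproduces only that step. The ordination itself is not reproduced. In R, vegan's `decostand(x, method = "hellinger")` and `vegdist(x, method = "bray")` are the usual equivalents, but the thesis cites vegan without showing its calls, so the helper names and the toy table here are hypothetical.

```python
import math


def hellinger(row):
    """Hellinger transform of one sample: sqrt of each OTU's relative abundance."""
    total = sum(row)
    return [math.sqrt(x / total) for x in row]


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity: sum |u_i - v_i| / sum (u_i + v_i)."""
    numerator = sum(abs(a - b) for a, b in zip(u, v))
    denominator = sum(a + b for a, b in zip(u, v))
    return numerator / denominator


def bray_curtis_matrix(otu_table):
    """Pairwise dissimilarities among samples (rows) after Hellinger transform."""
    transformed = [hellinger(row) for row in otu_table]
    return [[bray_curtis(a, b) for b in transformed] for a in transformed]


# Hypothetical OTU table: three replicates (rows) x four OTUs (columns)
table = [[10, 0, 5, 5], [8, 2, 6, 4], [0, 20, 0, 1]]
for row in bray_curtis_matrix(table):
    print([round(d, 3) for d in row])
```

The Hellinger step down-weights very abundant OTUs before the distance is taken, which is why it is commonly paired with ordination of amplicon data.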

[Figure 9 panels: 16S, 18S, rbcL and UPA; axes: CAP1 and CAP2, each labeled with percent variation explained; overlaid environmental vectors include MAIS, sulfate, nitrate, % canopy, current velocity (CV), temperature and depth; legend (Site): WB51, HF090, MC0240, HF039, MC0300, WBSCRM1.8, EB001, LM0110.]

Figure 9: Constrained Analysis of Principal Coordinates (CAP) of 16S, 18S, rbcL, and UPA based on Bray-Curtis distance matrices calculated from Hellinger-transformed OTU tables and constrained by stream chemical and physical data. Each point represents one replicate within a site. Both axes were statistically significant (P < 0.05) following a PERMANOVA test for each marker. Environmental variables with R² > 0.5 that were significantly (P < 0.05) correlated with the ordination are overlaid; arrows represent each variable's effect along both axes of the ordination. Percent variation is shown for each axis. Colors represent replicates from the same site. Circles = Not Recovered sites, triangles = Recovered sites, squares = Unimpaired sites.

[Figure 10 pie charts: mean relative abundance (%) of algal phyla, 18S marker]
1 (Unimpaired): Bacillariophyta 89.17, Chlorophyta 4.76, Rhodophyta 1.87, Ochrophyta 1.59, Cryptophyta 1.04, Cercozoa 0.92, Euglenozoa 0.56, Miozoa 0.08, Haptophyta 0.01
2 (Recovered): Bacillariophyta 87.35, Chlorophyta 4.21, Cercozoa 3.92, Rhodophyta 1.5, Ochrophyta 1.24, Cryptophyta 1.12, Euglenozoa 0.54, Charophyta 0.05, Haptophyta 0.04, Miozoa 0.03
3 (Not Recovered): Bacillariophyta 70.71, Cercozoa 15.67, Chlorophyta 5.86, Ochrophyta 3.92, Rhodophyta 1.27, Euglenozoa 1.14, Cryptophyta 1.11, Charophyta 0.25, Miozoa 0.06, Haptophyta 0.01

Figure 10: Mean relative abundance (%) of algal phyla for the 18S marker in Unimpaired (1), Recovered (2), and Not Recovered (3) categories.

[Figure 11 pie charts: mean relative abundance (%) of algal phyla, UPA marker]
1 (Unimpaired): Bacillariophyta 69.39, Cyanobacteria 21.94, Miozoa 5.43, Chlorophyta 1.8, Ochrophyta 0.78, Cryptophyta 0.3, Rhodophyta 0.21, Haptophyta 0.05, Euglenozoa 0.05, Charophyta 0.04
2 (Recovered): Bacillariophyta 66.26, Cyanobacteria 23.4, Miozoa 5.85, Ochrophyta 1.93, Chlorophyta 1.39, Rhodophyta 0.86, Cryptophyta 0.12, Euglenozoa 0.09, Haptophyta 0.04, Charophyta 0.04, Cercozoa 0.02
3 (Not Recovered): Cyanobacteria 77.44, Bacillariophyta 13.76, Chlorophyta 3.71, Miozoa 2.79, Ochrophyta 1.82, Rhodophyta 0.21, Cercozoa 0.13, Euglenozoa 0.10, Cryptophyta 0.03, Charophyta 0.02, Haptophyta 0.01

Figure 11: Mean relative abundance (%) of algal phyla for the UPA marker in Unimpaired (1), Recovered (2), and Not Recovered (3) categories.

[Figure 12: "rbcL Relative Abundance of Diatom Genera"; y-axis: Relative Abundance (%), 0–27; x-axis: Diatom Genera (the 21 genera listed in Table 5); legend: Unimpaired, Recovered, Not Recovered.]

Figure 12: Scatter plot of relative abundance (%, y-axis) of each diatom genus (x-axis) in each category, based on the rbcL marker. Relative abundance was calculated by summing the diatom reads per genus across all sites within a category and dividing by the total number of diatom reads in that category. Unknown genera were removed before relative abundance was calculated. Shapes represent the different categories. Diatom genera in bold are considered water quality indicators.
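The caption of Figure 12 describes a pooled calculation: reads are summed across the sites of a category before dividing, rather than averaging each site's percentages. Appendix 14 appears consistent with this. For example, the three Not Recovered site values for Achnanthidium (5.46, 1.84 and 12.72%) average 6.67%, whereas the category column reports 7.14%. The sketch below, with hypothetical read counts and a hypothetical helper name, shows the pooled version and drops reads labeled "Unknown" first, as the caption states.

```python
from collections import Counter


def pooled_relative_abundance(site_reads, exclude=("unknown",)):
    """Percent of a category's diatom reads assigned to each genus.

    site_reads: one {genus: read_count} dict per site in the category.
    Reads are summed across sites before dividing, so sites with more
    diatom reads carry more weight than in a mean of per-site percentages.
    """
    pooled = Counter()
    for reads in site_reads:
        for genus, n in reads.items():
            if genus.lower() not in exclude:
                pooled[genus] += n
    total = sum(pooled.values())
    return {genus: 100.0 * n / total for genus, n in pooled.items()}


# Hypothetical read counts for two sites in one category
sites = [{"Navicula": 30, "Nitzschia": 10, "Unknown": 60},
         {"Navicula": 10, "Nitzschia": 50}]
print(pooled_relative_abundance(sites))  # {'Navicula': 40.0, 'Nitzschia': 60.0}
```

Averaging the two sites' own percentages instead would give Navicula about 45.8%, which illustrates why the two approaches can differ.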

APPENDIX 1: SITE INFORMATION

Location information for individual sites, including chemical and physical data from the collection day.

Variable | MC0240 | HF090 | WB51 | MC0300 | HF039 | WBSCRM1.8 | LM0110 | EB001 | *ELRM 0.1
Watershed | Monday Creek | Hewett Fork | Sunday Creek | Monday Creek | Hewett Fork | Sunday Creek | Monday Creek | Sunday Creek | Sunday Creek
Longitude | -82.21287 W | -82.25307 W | -82.13725 W | -82.24669 W | -82.28427 W | -82.08601 W | -82.28317 W | -82.03234 W | -82.03166 W
Latitude | 39.49024 N | 39.34740 N | 39.59062 N | 39.50078 N | 39.31913 N | 39.51858 N | 39.54786 N | 39.56972 N | 39.57274 N
Category | Not Recovered | Not Recovered | Not Recovered | Recovered | Recovered | Recovered | Unimpaired | Unimpaired | Unimpaired
Current MAIS | 12 | 13 | 12 | 16 | 14 | 16 | 14 | 14 | NA
Chl. a (µg/g) | 287.3 | 868.6 | 187.9 | 157.7 | 971.7 | 341.3 | 300.7 | 901.3 | 271.2
pH | 7.5 | 7.5 | 8 | 7.4 | 7.5 | 7.7 | 7.3 | 7.9 | 7.9
Specific Conductance (µS/cm) | 510 | 560 | 770 | 530 | 380 | 560 | 530 | 210 | 210
Temp. (˚C) | 21 | 22 | 19 | 20 | 23 | 20 | 19 | 19 | 20
Mean Current Velocity (cm/s) | 26.9 | 32.5 | 72.7 | 32.5 | 8.5 | 5 | 26.1 | 7.3 | 23
Mean Depth (cm) | 34.3 | 11.8 | 12 | 44.2 | 31.3 | 17.7 | 21 | 8 | 17
Width (m) | 10 | 3.2 | 7 | 9.7 | 8.4 | 9.3 | 7.3 | 10.7 | 9.3
Canopy (proportion) | 0 | 0 | 0.9 | 0.7 | 0.7 | 0.8 | 0.8 | 0.9 | 0.9
Nitrate (mg/L) | 0.07 | 0.07 | 0.1 | 0.06 | 0.06 | 0.02 | 0.02 | 0.03 | 0.02
Iron (mg/L) | 0.05 | 0.42 | 0.04 | 0.22 | 0.49 | 0.26 | 0.13 | 0.16 | 0.27
Sulfate (mg/L) | 70 | 120 | 170 | 60 | 30 | 80 | 30 | 15 | 12

APPENDIX 2: SHANNON INDEX VALUES

Shannon index values for each replicate at each site, calculated from all OTUs for each DNA marker (16S, 18S, UPA, rbcL) and from algal OTUs only for 16S and 18S.

Location | Category | 16S | 16S Algae Only | 18S | 18S Algae Only | UPA | rbcL
EB001 | Unimpaired | 9.571 | 3.624 | 6.342 | 4.748 | 5.843 | 5.283
EB001 | Unimpaired | 9.495 | 4.214 | 6.584 | 4.908 | 5.563 | 6.100
EB001 | Unimpaired | 8.869 | 5.211 | 4.908 | 3.720 | 5.582 | 5.295
HF039 | Recovered | 10.100 | 3.398 | 6.350 | 4.324 | 4.593 | 4.370
HF039 | Recovered | 10.040 | 4.007 | 6.762 | 4.868 | 4.993 | 5.572
HF039 | Recovered | 9.834 | 4.588 | 6.031 | 4.276 | 5.583 | 5.114
HF090 | Not Recovered | 9.372 | 3.103 | 7.415 | 4.598 | 3.936 | 3.072
HF090 | Not Recovered | 9.579 | 3.970 | 6.757 | 4.044 | 3.965 | 3.766
HF090 | Not Recovered | 10.063 | 4.524 | 7.441 | 4.896 | 4.857 | 4.314
LM0110 | Unimpaired | 10.445 | 1.961 | 6.061 | 4.004 | 5.687 | 4.055
LM0110 | Unimpaired | 10.748 | 3.813 | 7.495 | 5.491 | 0.787 | 6.129
LM0110 | Unimpaired | 10.733 | 4.229 | 6.266 | 4.379 | 6.230 | 5.248
MC0240 | Not Recovered | 9.792 | 3.011 | 7.063 | 5.769 | 4.731 | 2.507
MC0240 | Not Recovered | 8.781 | 3.871 | 6.001 | 3.956 | 4.203 | 3.531
MC0240 | Not Recovered | 8.820 | 4.350 | 5.450 | 3.473 | 3.591 | 4.881
MC0300 | Recovered | 10.166 | 2.490 | 6.741 | 4.394 | 5.506 | 4.492
MC0300 | Recovered | 9.640 | 3.832 | 6.809 | 4.775 | 5.966 | 5.419
MC0300 | Recovered | 9.789 | 4.340 | 5.644 | 4.417 | 3.525 | 3.368
WB51 | Not Recovered | 8.421 | 3.652 | 6.035 | 4.388 | 5.689 | 5.470
WB51 | Not Recovered | 8.658 | 4.229 | 6.260 | 4.642 | 5.572 | 5.439
WB51 | Not Recovered | 7.421 | 5.447 | 5.592 | 3.975 | 5.017 | 3.888
WBSCRM1.8 | Recovered | 9.926 | 3.468 | 6.361 | 4.317 | 5.383 | 4.385
WBSCRM1.8 | Recovered | 9.230 | 4.098 | 5.740 | 3.750 | 4.656 | 4.315
WBSCRM1.8 | Recovered | 9.811 | 5.172 | 5.859 | 4.407 | 5.448 | 4.643

APPENDIX 3: RAREFACTION PLOT OF 16S DATA BY REPLICATE

Rarefaction Plot of 16S data by replicate showing the number of OTUs as a function of sequence depth (number of reads). Replicates from the same site are in the same color.

APPENDIX 4: RAREFACTION PLOT OF 18S DATA BY REPLICATE

Rarefaction Plot of 18S data by replicate showing the number of OTUs as a function of sequence depth (number of reads). Replicates from the same site are in the same color.

APPENDIX 5: RAREFACTION PLOT OF UPA DATA BY REPLICATE

Rarefaction Plot of UPA data by replicate showing the number of OTUs as a function of sequence depth (number of reads). Replicates from the same site are in the same color.

APPENDIX 6: RAREFACTION PLOT OF RBCL DATA BY REPLICATE

Rarefaction Plot of rbcL data by replicate showing the number of OTUs as a function of sequence depth (number of reads). Replicates from the same site are in the same color.


APPENDIX 7: 16S RELATIVE ABUNDANCE

Relative abundance (%) for each phylum in the 16S data.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Acidobacteria | 2.92 | 2.90 | 1.61 | 2.59 | 5.28 | 3.50 | 3.48 | 4.02 | 4.59 | 2.57 | 3.76
Actinobacteria | 3.82 | 9.08 | 5.74 | 5.86 | 5.31 | 6.25 | 3.44 | 5.19 | 4.22 | 3.52 | 3.93
Aminicenantes | 0.01 | 0.01 | 0.01 | 0.01 | 0.02 | 0.02 | 0.03 | 0.02 | 0.03 | 0.00 | 0.02
Armatimonadetes | 0.58 | 0.86 | 0.14 | 0.56 | 0.75 | 0.66 | 0.59 | 0.67 | 0.40 | 0.40 | 0.40
| 9.44 | 8.77 | 10.99 | 9.61 | 9.54 | 9.85 | 13.09 | 10.65 | 11.47 | 14.47 | 12.69
Bathyarchaeota | 0.04 | 0.03 | 0.01 | 0.03 | 0.07 | 0.04 | 0.07 | 0.06 | 0.11 | 0.01 | 0.07
| 0.04 | 0.06 | 0.04 | 0.05 | 0.03 | 0.03 | 0.02 | 0.02 | 0.06 | 0.01 | 0.04
Chlorobi | 0.13 | 0.04 | 0.03 | 0.08 | 0.17 | 0.08 | 0.12 | 0.12 | 0.14 | 0.06 | 0.11
| 1.43 | 6.02 | 1.34 | 2.78 | 1.98 | 1.91 | 1.26 | 1.75 | 1.57 | 0.73 | 1.22
Crenarchaeota | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Cyanobacteria | 8.56 | 4.84 | 0.32 | 5.43 | 2.52 | 1.56 | 6.67 | 3.25 | 3.55 | 4.80 | 4.06
Deferribacteres | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Deinococcus-Thermus | 1.87 | 1.64 | 0.54 | 1.48 | 0.45 | 0.41 | 0.45 | 0.43 | 0.14 | 0.35 | 0.22
Elusimicrobia | 0.01 | 0.00 | 0.00 | 0.00 | 0.04 | 0.02 | 0.00 | 0.02 | 0.09 | 0.05 | 0.07
| 0.01 | 0.00 | 0.02 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.03 | 0.00 | 0.02
| 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.04 | 0.03 | 0.03 | 0.05 | 0.02 | 0.04
| 0.53 | 0.86 | 0.82 | 0.70 | 0.81 | 0.85 | 0.49 | 0.74 | 0.80 | 0.41 | 0.64
| 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.00 | 0.04 | 0.00 | 0.00 | 0.03
Gemmatimonadetes | 0.76 | 0.80 | 0.18 | 0.63 | 0.88 | 0.40 | 0.68 | 0.62 | 0.85 | 0.41 | 0.67
Ignavibacteriae | 0.07 | 0.02 | 0.01 | 0.04 | 0.15 | 0.11 | 0.14 | 0.13 | 0.25 | 0.04 | 0.17
Nitrospinae | 0.05 | 0.03 | 0.03 | 0.04 | 0.04 | 0.16 | 0.08 | 0.10 | 0.11 | 0.02 | 0.07
| 0.78 | 0.77 | 0.40 | 0.69 | 1.21 | 1.05 | 0.99 | 1.08 | 1.52 | 0.96 | 1.29
Planctomycetes | 2.70 | 4.62 | 1.06 | 2.87 | 3.55 | 2.69 | 2.69 | 2.95 | 3.44 | 1.89 | 2.81
| 64.40 | 56.03 | 75.78 | 64.65 | 63.66 | 67.23 | 61.39 | 64.54 | 62.35 | 66.13 | 63.87
Saccharibacteria | 0.16 | 1.14 | 0.23 | 0.47 | 0.25 | 0.30 | 0.18 | 0.25 | 0.14 | 0.10 | 0.13
| 0.03 | 0.01 | 0.00 | 0.02 | 0.09 | 0.09 | 0.05 | 0.08 | 0.17 | 0.20 | 0.18
Synergistetes | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Tenericutes | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.01 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00
Thaumarchaeota | 0.29 | 0.22 | 0.24 | 0.26 | 0.36 | 0.17 | 0.23 | 0.24 | 0.49 | 0.26 | 0.40
| 1.37 | 1.24 | 0.45 | 1.10 | 2.81 | 2.58 | 3.83 | 2.99 | 3.42 | 2.59 | 3.08

APPENDIX 8: 16S NUMBER OF OTUS

Total number of 16S OTUs and number of 16S OTUs assigned to Cyanobacteria for each site and category.

Site | Total OTUs | Cyanobacteria OTUs
MC0240 | 4043 | 96
HF090 | 3424 | 98
WB51 | 2008 | 26
Not Recovered | 6891 | 175
MC0300 | 3390 | 77
HF039 | 3683 | 76
WBSCRM1.8 | 2811 | 67
Recovered | 6663 | 165
LM0110 | 5508 | 139
EB001 | 2977 | 81
Unimpaired | 6895 | 178
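The category rows in Appendix 8 are smaller than the sums of their sites. For example, the Not Recovered sites contribute 4,043 + 3,424 + 2,008 = 9,475 site-level OTUs, yet the category total is 6,891. This suggests that the category totals count each OTU once even when it is detected at several sites, which is the same pattern seen in the category columns of Appendices 11, 13 and 15. A minimal sketch of the two counts, using hypothetical OTU IDs and a hypothetical helper name, is:

```python
def category_totals(site_otus):
    """Return (distinct OTUs across sites, sum of per-site OTU counts)."""
    distinct = set().union(*site_otus)
    return len(distinct), sum(len(s) for s in site_otus)


# Hypothetical OTU IDs for three sites in one category
sites = [{"a", "b", "c"}, {"b", "c", "d"}, {"c", "e"}]
print(category_totals(sites))  # (5, 8)
```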


APPENDIX 9: 18S RELATIVE ABUNDANCE

Relative abundance (%) of each phylum in the 18S data.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Ascomycota (Fungi) | 1.29 | 11.01 | 1.55 | 4.36 | 2.19 | 1.13 | 0.87 | 1.43 | 1.31 | 0.94 | 1.14
Amoebozoa | 16.71 | 17.94 | 20.31 | 18.20 | 12.30 | 16.48 | 17.44 | 15.31 | 9.92 | 12.52 | 11.10
Annelida | 0.28 | 8.24 | 0.01 | 2.65 | 2.47 | 3.21 | 1.59 | 2.63 | 7.84 | 0.13 | 4.33
| 0.08 | 0.22 | 0.02 | 0.10 | 0.07 | 0.04 | 0.01 | 0.04 | 0.06 | 0.01 | 0.04
Arthropoda | 23.64 | 10.54 | 6.56 | 14.35 | 14.82 | 11.39 | 2.35 | 10.65 | 7.00 | 2.97 | 5.17
Bacillariophyta | 41.27 | 18.67 | 37.16 | 33.05 | 41.27 | 46.18 | 57.50 | 46.90 | 48.45 | 67.72 | 57.22
(Fungi) | 0.42 | 5.50 | 0.87 | 2.12 | 0.17 | 0.35 | 0.10 | 0.24 | 0.28 | 0.22 | 0.25
Cercozoa | 3.58 | 3.55 | 22.35 | 9.35 | 1.35 | 4.05 | 0.63 | 2.46 | 1.06 | 1.54 | 1.28
Charophyta | 0.05 | 0.16 | 0.17 | 0.12 | 0.05 | 0.02 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00
Chlorophyta | 2.57 | 2.65 | 3.04 | 2.74 | 2.24 | 1.78 | 3.38 | 2.26 | 3.29 | 2.78 | 3.05
Choanozoa | 0.31 | 0.12 | 0.00 | 0.16 | 0.30 | 0.30 | 0.19 | 0.28 | 0.16 | 0.13 | 0.15
Chordata | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
(Fungi) | 0.42 | 2.41 | 0.12 | 0.94 | 0.12 | 3.13 | 0.14 | 1.52 | 0.86 | 0.09 | 0.51
Ciliophora | 1.21 | 1.83 | 1.06 | 1.35 | 2.05 | 1.65 | 5.64 | 2.61 | 2.23 | 2.13 | 2.19
| 0.53 | 0.06 | 0.02 | 0.23 | 5.68 | 0.22 | 0.48 | 2.07 | 1.73 | 0.02 | 0.95
Cryptophyta | 0.73 | 0.65 | 0.13 | 0.52 | 1.01 | 0.43 | 0.33 | 0.60 | 0.86 | 0.43 | 0.67
Embryophyta/Streptophyta | 0.61 | 6.03 | 0.03 | 2.10 | 5.23 | 0.06 | 0.12 | 1.77 | 0.29 | 0.27 | 0.28
Euglenozoa | 0.26 | 0.44 | 0.97 | 0.53 | 0.18 | 0.18 | 0.70 | 0.29 | 0.43 | 0.28 | 0.36
Gastrotricha | 1.06 | 0.64 | 1.54 | 1.08 | 2.30 | 2.04 | 3.00 | 2.32 | 1.25 | 1.74 | 1.47
Haptophyta | 0.00 | 0.01 | 0.00 | 0.00 | 0.03 | 0.01 | 0.04 | 0.02 | 0.01 | 0.00 | 0.01
Heterokontophyta (Oomycota) | 0.67 | 0.73 | 0.35 | 0.59 | 0.25 | 0.40 | 0.36 | 0.34 | 0.36 | 0.17 | 0.27
Miozoa | 0.05 | 0.02 | 0.00 | 0.03 | 0.01 | 0.00 | 0.06 | 0.02 | 0.07 | 0.02 | 0.05
| 0.00 | 0.00 | 0.00 | 0.00 | 0.04 | 1.41 | 1.21 | 0.92 | 0.35 | 0.08 | 0.23
Nematoda | 0.25 | 0.85 | 0.04 | 0.37 | 0.13 | 0.44 | 0.33 | 0.32 | 7.48 | 0.09 | 4.11
Nematomorpha | 0.00 | 0.48 | 0.01 | 0.15 | 0.00 | 0.03 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00
| 0.00 | 0.02 | 0.00 | 0.01 | 0.09 | 0.01 | 0.00 | 0.03 | 0.00 | 0.00 | 0.00
Ochrophyta | 1.81 | 4.08 | 2.79 | 2.81 | 1.42 | 1.25 | 0.91 | 1.23 | 1.51 | 1.83 | 1.66
Platyhelminthes | 0.03 | 0.01 | 0.02 | 0.02 | 0.79 | 0.01 | 0.08 | 0.28 | 0.09 | 0.03 | 0.06
Retaria | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Rhodophyta | 1.19 | 0.40 | 0.03 | 0.59 | 0.53 | 1.01 | 0.77 | 0.80 | 0.89 | 1.57 | 1.20
Rotifera | 0.66 | 1.77 | 0.47 | 0.94 | 2.59 | 2.47 | 1.50 | 2.31 | 1.76 | 2.18 | 1.95
Rozelida (Fungi) | 0.31 | 0.92 | 0.34 | 0.51 | 0.31 | 0.30 | 0.28 | 0.30 | 0.43 | 0.12 | 0.29
Tardigrada | 0.00 | 0.00 | 0.00 | 0.00 | 0.01 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
| 0.01 | 0.05 | 0.03 | 0.03 | 0.02 | 0.00 | 0.00 | 0.00 | 0.03 | 0.00 | 0.02

APPENDIX 10: 18S ALGAL RELATIVE ABUNDANCE

Relative abundance (%) for each algal phylum in the 18S data.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Bacillariophyta | 82.31 | 65.97 | 61.00 | 70.71 | 87.48 | 85.53 | 90.68 | 87.35 | 87.08 | 91.05 | 89.17
Cercozoa | 6.14 | 8.39 | 28.84 | 15.67 | 2.04 | 6.80 | 0.63 | 3.92 | 1.10 | 0.76 | 0.92
Charophyta | 0.10 | 0.56 | 0.27 | 0.25 | 0.10 | 0.04 | 0.00 | 0.05 | 0.00 | 0.00 | 0.00
Chlorophyta | 5.12 | 9.36 | 5.00 | 5.86 | 4.74 | 3.30 | 5.33 | 4.21 | 5.91 | 3.73 | 4.76
Cryptophyta | 1.45 | 2.28 | 0.21 | 1.11 | 2.15 | 0.79 | 0.52 | 1.12 | 1.55 | 0.58 | 1.04
Euglenozoa | 0.53 | 1.55 | 1.59 | 1.14 | 0.39 | 0.34 | 1.11 | 0.54 | 0.77 | 0.37 | 0.56
Haptophyta | 0.00 | 0.04 | 0.00 | 0.01 | 0.05 | 0.03 | 0.06 | 0.04 | 0.02 | 0.00 | 0.01
Miozoa | 0.10 | 0.08 | 0.00 | 0.06 | 0.02 | 0.00 | 0.09 | 0.03 | 0.14 | 0.03 | 0.08
Ochrophyta | 1.87 | 10.35 | 3.04 | 3.92 | 1.89 | 1.30 | 0.37 | 1.24 | 1.83 | 1.37 | 1.59
Rhodophyta | 2.38 | 1.41 | 0.05 | 1.27 | 1.13 | 1.88 | 1.21 | 1.50 | 1.60 | 2.11 | 1.87

APPENDIX 11: NUMBER OF 18S ALGAL OTUS

Number of OTUs assigned to each algal phylum in the 18S data.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Bacillariophyta | 61 | 45 | 57 | 107 | 61 | 58 | 60 | 101 | 76 | 83 | 111
Cercozoa (Imbricatea & Thecofilosea) | 15 | 14 | 6 | 26 | 9 | 13 | 4 | 21 | 5 | 4 | 8
Charophyta | 1 | 1 | 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0
Chlorophyta | 45 | 50 | 27 | 95 | 35 | 22 | 17 | 51 | 27 | 27 | 38
Cryptophyta | 12 | 17 | 4 | 21 | 9 | 7 | 5 | 11 | 9 | 5 | 9
Euglenozoa | 14 | 13 | 6 | 24 | 4 | 10 | 7 | 18 | 10 | 7 | 15
Haptophyta | 0 | 1 | 0 | 1 | 1 | 2 | 1 | 2 | 1 | 0 | 1
Miozoa | 4 | 1 | 0 | 5 | 1 | 0 | 1 | 1 | 4 | 1 | 4
Ochrophyta | 26 | 46 | 9 | 69 | 18 | 17 | 4 | 32 | 17 | 22 | 34
Rhodophyta | 2 | 2 | 1 | 4 | 3 | 5 | 2 | 5 | 4 | 2 | 4

APPENDIX 12: UPA RELATIVE ABUNDANCE

Relative abundance (%) of each phylum in the UPA data.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Bacillariophyta | 1.69 | 6.63 | 39.40 | 13.76 | 70.24 | 63.44 | 67.59 | 66.26 | 70.29 | 68.17 | 69.39
Cercozoa | 0.03 | 0.16 | 0.23 | 0.13 | 0.00 | 0.03 | 0.00 | 0.02 | 0.00 | 0.01 | 0.00
Charophyta | 0.01 | 0.02 | 0.02 | 0.02 | 0.04 | 0.05 | 0.01 | 0.04 | 0.03 | 0.06 | 0.04
Chlorophyta | 0.42 | 8.34 | 3.29 | 3.71 | 1.32 | 1.35 | 1.59 | 1.39 | 2.19 | 1.27 | 1.80
Cryptophyta | 0.00 | 0.00 | 0.09 | 0.03 | 0.05 | 0.13 | 0.19 | 0.12 | 0.38 | 0.19 | 0.30
Cyanobacteria | 96.90 | 82.86 | 42.93 | 77.44 | 22.40 | 25.23 | 20.41 | 23.40 | 20.85 | 23.43 | 21.94
Euglenozoa | 0.05 | 0.06 | 0.24 | 0.10 | 0.06 | 0.13 | 0.05 | 0.09 | 0.04 | 0.05 | 0.05
Haptophyta | 0.00 | 0.00 | 0.01 | 0.01 | 0.03 | 0.03 | 0.06 | 0.04 | 0.08 | 0.00 | 0.05
Miozoa | 0.11 | 0.12 | 9.71 | 2.79 | 3.17 | 5.97 | 9.17 | 5.85 | 4.91 | 6.15 | 5.43
Ochrophyta | 0.54 | 1.52 | 4.01 | 1.82 | 2.32 | 2.17 | 0.87 | 1.93 | 0.96 | 0.53 | 0.78
Rhodophyta | 0.24 | 0.28 | 0.08 | 0.21 | 0.38 | 1.47 | 0.06 | 0.86 | 0.26 | 0.14 | 0.21

APPENDIX 13: UPA OTUS PER PHYLUM

Number of UPA OTUs assigned to each phylum.

Phylum | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Bacillariophyta | 27 | 25 | 78 | 92 | 90 | 95 | 72 | 138 | 121 | 112 | 155
Cercozoa | 3 | 4 | 8 | 11 | 0 | 3 | 0 | 3 | 0 | 1 | 1
Charophyta | 2 | 1 | 2 | 5 | 2 | 3 | 1 | 5 | 2 | 3 | 4
Chlorophyta | 35 | 42 | 88 | 128 | 48 | 61 | 18 | 101 | 35 | 35 | 63
Cryptophyta | 1 | 0 | 4 | 5 | 1 | 10 | 1 | 10 | 13 | 7 | 16
Cyanobacteria | 160 | 160 | 208 | 316 | 130 | 146 | 70 | 193 | 170 | 117 | 179
Euglenozoa | 7 | 5 | 9 | 19 | 8 | 15 | 4 | 25 | 10 | 7 | 17
Haptophyta | 0 | 1 | 2 | 3 | 1 | 1 | 1 | 1 | 1 | 0 | 1
Miozoa | 4 | 2 | 10 | 11 | 9 | 16 | 16 | 20 | 16 | 16 | 18
Ochrophyta | 23 | 36 | 58 | 90 | 22 | 54 | 12 | 74 | 11 | 10 | 18
Rhodophyta | 5 | 1 | 5 | 10 | 6 | 10 | 3 | 13 | 5 | 2 | 6

APPENDIX 14: RBCL RELATIVE ABUNDANCE

Relative abundance (%) of each diatom genus in the rbcL data.

Genus | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Achnanthidium | 5.46 | 1.84 | 12.72 | 7.14 | 3.27 | 3.55 | 2.03 | 2.65 | 2.03 | 5.57 | 3.47
Amphora | 0.00 | 0.00 | 0.25 | 0.11 | 0.06 | 0.00 | 0.00 | 0.01 | 3.15 | 0.09 | 1.90
Cocconeis | 0.00 | 0.00 | 0.00 | 0.00 | 0.05 | 0.00 | 0.34 | 0.20 | 0.07 | 0.09 | 0.08
Craticula | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.02 | 0.01
Cyclotella | 2.56 | 0.00 | 0.16 | 0.32 | 0.51 | 2.53 | 0.82 | 1.19 | 0.93 | 4.41 | 2.35
Cymbella | 0.48 | 0.07 | 0.35 | 0.24 | 17.98 | 21.96 | 25.56 | 23.19 | 36.34 | 2.58 | 22.60
Discostella | 0.00 | 0.15 | 0.00 | 0.07 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00 | 0.00
Encyonema | 2.14 | 0.12 | 5.20 | 2.62 | 21.06 | 5.60 | 23.48 | 18.55 | 9.21 | 11.45 | 10.12
Fistulifera | 2.35 | 0.10 | 2.70 | 1.50 | 0.99 | 2.64 | 1.33 | 1.59 | 2.86 | 2.41 | 2.68
Fragilaria | 18.58 | 19.03 | 3.51 | 11.93 | 14.25 | 2.69 | 13.56 | 10.97 | 3.40 | 8.47 | 5.46
Gomphonema | 9.39 | 5.77 | 10.43 | 8.24 | 15.62 | 13.68 | 5.45 | 9.48 | 3.61 | 3.10 | 3.40
Halamphora | 0.00 | 0.00 | 0.15 | 0.07 | 0.04 | 0.00 | 0.04 | 0.03 | 4.63 | 0.12 | 2.80
Mayamaea | 0.00 | 0.00 | 0.70 | 0.32 | 0.00 | 0.00 | 0.02 | 0.01 | 0.06 | 0.30 | 0.16
Melosira | 4.01 | 0.10 | 0.28 | 0.56 | 5.44 | 2.17 | 1.91 | 2.66 | 4.29 | 5.67 | 4.85
Navicula | 3.87 | 44.53 | 3.30 | 21.86 | 5.17 | 7.40 | 4.02 | 5.09 | 7.42 | 14.72 | 10.39
Nitzschia | 42.75 | 15.69 | 32.00 | 25.71 | 7.56 | 13.73 | 19.00 | 15.47 | 16.87 | 17.44 | 17.10
Pinnularia | 0.21 | 0.77 | 0.28 | 0.49 | 0.11 | 0.00 | 0.00 | 0.02 | 0.00 | 0.02 | 0.01
Planothidium | 0.21 | 0.00 | 0.96 | 0.46 | 0.05 | 0.07 | 0.16 | 0.12 | 0.08 | 4.21 | 1.76
Reimeria | 0.00 | 0.00 | 0.95 | 0.43 | 0.00 | 0.00 | 0.30 | 0.17 | 0.48 | 5.85 | 2.67
Sellaphora | 7.80 | 11.74 | 23.48 | 16.70 | 6.66 | 23.05 | 0.60 | 7.38 | 1.99 | 5.62 | 3.47
Surirella | 0.21 | 0.09 | 2.59 | 1.24 | 1.19 | 0.92 | 1.37 | 1.22 | 2.58 | 7.86 | 4.73

APPENDIX 15: RBCL OTUS

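Number of rbcL OTUs assigned to each diatom genus.

Genus | MC0240 | HF090 | WB51 | Not Recovered | MC0300 | HF039 | WBSCRM1.8 | Recovered | LM0110 | EB001 | Unimpaired
Achnanthidium | 3 | 3 | 13 | 14 | 8 | 8 | 12 | 17 | 21 | 17 | 26
Amphora | 0 | 0 | 1 | 1 | 1 | 0 | 0 | 1 | 2 | 1 | 2
Cocconeis | 0 | 0 | 0 | 0 | 1 | 0 | 1 | 1 | 3 | 1 | 3
Craticula | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
Cyclotella | 3 | 0 | 1 | 4 | 2 | 4 | 3 | 4 | 5 | 4 | 5
Cymbella | 1 | 1 | 1 | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2
Discostella | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Encyonema | 2 | 1 | 7 | 11 | 9 | 7 | 7 | 15 | 11 | 9 | 17
Fistulifera | 2 | 1 | 4 | 7 | 4 | 4 | 8 | 10 | 12 | 11 | 14
Fragilaria | 5 | 2 | 5 | 9 | 5 | 5 | 5 | 9 | 5 | 6 | 10
Gomphonema | 5 | 6 | 17 | 22 | 21 | 15 | 11 | 29 | 21 | 20 | 31
Halamphora | 0 | 0 | 1 | 1 | 1 | 0 | 3 | 3 | 3 | 2 | 4
Mayamaea | 0 | 0 | 1 | 1 | 0 | 0 | 1 | 1 | 2 | 3 | 4
Melosira | 1 | 1 | 1 | 2 | 1 | 1 | 1 | 1 | 1 | 1 | 1
Navicula | 4 | 10 | 12 | 21 | 15 | 15 | 16 | 24 | 30 | 24 | 33
Nitzschia | 9 | 19 | 34 | 48 | 22 | 34 | 44 | 65 | 69 | 67 | 92
Pinnularia | 1 | 3 | 2 | 5 | 1 | 0 | 0 | 1 | 0 | 1 | 1
Planothidium | 1 | 0 | 1 | 2 | 1 | 2 | 2 | 2 | 2 | 2 | 2
Reimeria | 0 | 0 | 1 | 1 | 0 | 0 | 3 | 3 | 1 | 3 | 3
Sellaphora | 5 | 10 | 10 | 18 | 10 | 16 | 9 | 23 | 22 | 26 | 39
Surirella | 1 | 1 | 1 | 2 | 3 | 5 | 2 | 5 | 6 | 3 | 6
Ulnaria | 3 | 3 | 2 | 3 | 5 | 2 | 7 | 7 | 5 | 3 | 5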

APPENDIX 16: READ COUNTS THROUGHOUT BIOINFORMATICS PIPELINE

Number of paired-end reads throughout the bioinformatics pipeline.

Process | 16S | 18S | UPA | rbcL | Sum
Reads after separating by gene (sum) | 2,809,716 | 795,912 | 1,920,562 | 351,792 | 5,877,982
Reads after separating by gene (average per replicate) | 90,636 | 25,675 | 61,954 | 11,348 |
Reads after dada2 (sum) | 2,376,650 | 717,345 | 1,679,059 | 283,286 | 5,056,340
Reads after dada2 (average per replicate) | 76,666 | 23,140 | 54,163 | 9,138 |
Features after dada2 (sum) | 16,078 | 3,032 | 2,629 | 757 |
Final filtered reads (sum) | 1,719,015 | 430,798 | 978,304 | 178,219 | 3,306,336
Final filtered reads (average per replicate) | 71,625 | 17,950 | 40,763 | 7,426 |
Final filtered OTU count (sum) | 13,884 | 2,583 | 2,221 | 662 |
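The read totals in Appendix 16 can be turned into the share of reads that survive each step. The sketch below copies the per-marker sums from the table. It only does arithmetic on those published numbers and makes no assumption about how dada2 or the final filtering were configured. It also checks that the marker columns add up to the Sum column.

```python
# Read counts from Appendix 16: (after separating by gene, after dada2, final filtered)
READS = {
    "16S": (2_809_716, 2_376_650, 1_719_015),
    "18S": (795_912, 717_345, 430_798),
    "UPA": (1_920_562, 1_679_059, 978_304),
    "rbcL": (351_792, 283_286, 178_219),
}


def retention(stages):
    """Percent of the first-stage reads remaining at each stage."""
    return [100.0 * n / stages[0] for n in stages]


for marker, stages in READS.items():
    print(marker, [round(p, 1) for p in retention(stages)])

# The per-stage sums should match the Sum column of Appendix 16.
print([sum(s[i] for s in READS.values()) for i in range(3)])
```

By this arithmetic, roughly 61% (16S), 54% (18S), 51% (UPA) and 51% (rbcL) of the gene-separated reads remain in the final filtered tables.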
