Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm
Communities along an Acid Mine Drainage Recovery Gradient
A thesis presented to
the faculty of
the College of Arts and Sciences of Ohio University
In partial fulfillment
of the requirements for the degree
Master of Science
Daniel I. Wolf
August 2018
© 2018 Daniel I. Wolf. All Rights Reserved.
This thesis titled
Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm
Communities along an Acid Mine Drainage Recovery Gradient
by
DANIEL I. WOLF
has been approved for
the Department of Environmental and Plant Biology
and the College of Arts and Sciences by
Morgan L. Vis
Professor of Environmental and Plant Biology
Robert Frank
Dean, College of Arts and Sciences
ABSTRACT
WOLF, DANIEL, M.S., August 2018, Environmental and Plant Biology
Multi-marker Metabarcoding Assessment of Biodiversity within Stream Biofilm
Communities along an Acid Mine Drainage Recovery Gradient
Director of Thesis: Morgan L. Vis
In southeastern Ohio, historical coal mining has exposed streams to acid mine drainage (AMD). Active remediation has been successful for some of these streams, while others have not recovered based on macroinvertebrate assessment. In this study, biofilms were collected from three Not Recovered streams, three Recovered streams, and two Unimpaired streams. Biodiversity was characterized by metabarcoding using four DNA barcode markers and high-throughput amplicon sequencing. Two universal markers (16S and 18S) were sequenced along with an algal-specific marker (UPA) and a diatom-specific marker (rbcL). Ordination of the Bray-Curtis index calculated from the total operational taxonomic units (OTUs) present at each site showed that the Unimpaired and Not Recovered sites were significantly different. Additionally, the Shannon index for the rbcL and UPA markers showed significantly lower alpha diversity in Not Recovered streams compared to Unimpaired streams. Further taxonomic investigation revealed a decrease in the relative abundance and number of OTUs of diatoms in Not Recovered streams compared to Recovered and Unimpaired streams, including some diatom genera that are indicators of good water quality. Overall, results from this study describe biofilm diversity during remediation and support previous findings using other methods to assess stream impairment from AMD.
DEDICATION
I would like to dedicate this to my family, friends, and mentors who influenced me to be curious, learn, and grow.
ACKNOWLEDGMENTS
I would first like to acknowledge and thank Dr. Morgan Vis, who provided me with an impeccable research environment where her students are always the priority. Her mentorship over the last two years has molded me from a student who likes science into a passionate researcher. I would like to thank my committee members, Dr. Sarah Wyatt and Dr. Kelly Johnson. Both Dr. Wyatt and Dr. Johnson provided me with excellent advice on my project design, allowing me to create a successful research project. I would also like to thank Dr. Kelly Johnson and the Ohio University watershed group for MAIS sampling at my sites. I would like to thank Dr. Rebecca Snell for her help with the statistics in this project. I would like to thank Nathan Smucker for his knowledge of indicator diatoms. Additionally, I would like to thank the Ohio Department of Natural Resources for collecting permits. I would like to thank Dr. Bill Broach and Rachel Yoho at the Ohio University Genomics Facility for help with project design and sequencing. I would like to thank the Student Enhancement Award for funding my master’s project. I thank the Vis lab members (present and past), including Josh Evans, Emily Keil, Lexie Redmond, and Amanda Szinte, for helping me with collections, water analysis, and much needed moral support. This whole experience wouldn’t have been the same without my friends in the Ohio University Environmental and Plant Biology Department, especially Laura Mason, Alexander Meyers, and Sarah Smith. Last but not least, I would like to thank Dr. Eric Salomaki for being a role model and encouraging me to work harder and continue my scientific career.
TABLE OF CONTENTS
Abstract
Dedication
Acknowledgments
List of Tables
List of Figures
Introduction
Materials and Methods
    Pilot Study
    Study Site Selection and Location
    Biofilm Collection
    Chlorophyll a Analysis
    Biofilm Extraction
    Amplicon PCR and Sequencing
    Bioinformatics
    Statistical Analysis
Results
    Stream Chemical and Physical Data
    Illumina Data Analyses
    Within Site Replication
    Beta Diversity of all OTUs
    Analyses of all OTUs
    Analyses of Algal OTUs
    Alpha Diversity
Discussion
    Experimental Design
    Comparison of Stream Types
Conclusion
References
Appendix 1: Site Information
Appendix 2: Shannon Index Values
Appendix 3: Rarefaction Plot of 16S data by replicate
Appendix 4: Rarefaction Plot of 18S data by replicate
Appendix 5: Rarefaction Plot of UPA
Appendix 6: Rarefaction Plot of rbcL data by Replicate
Appendix 7: 16S Relative Abundance
Appendix 8: 16S Number of OTUs
Appendix 9: 18S Relative Abundance
Appendix 10: 18S Algal Relative Abundance
Appendix 11: Number of 18S algal OTUs
Appendix 12: UPA Relative Abundance
Appendix 13: UPA OTUs per phylum
Appendix 14: rbcL Relative Abundance
Appendix 15: rbcL OTUs
Appendix 16: Read Counts throughout Bioinformatics Pipeline
LIST OF TABLES
Table 1: PCR Amplification Primers
Table 2: PCR Thermocycler Settings
Table 3: Mean Stream Chemical and Physical Data
Table 4: Comparison of OTUs in Replicates
Table 5: rbcL Diatom Relative Abundance
LIST OF FIGURES
Figure 1: Map of Collection Sites
Figure 2: OTU Venn Diagram
Figure 3: Shannon Diversity Boxplots
Figure 4: 16S Rarefaction Plot per Site
Figure 5: 18S Rarefaction Plot per Site
Figure 6: UPA Rarefaction Plot per Site
Figure 7: rbcL Rarefaction Plot per Site
Figure 8: Bray-Curtis Non-metric multidimensional scaling (NMDS)
Figure 9: Bray-Curtis Constrained Analysis of Principal Coordinates (CAP)
Figure 10: Relative Abundance of Algal Phyla for the 18S Marker
Figure 11: Relative Abundance of Algal Phyla for the UPA Marker
Figure 12: Relative Abundance of Diatom Genera for the rbcL Marker
INTRODUCTION
Freshwater biofilms establish and grow on rocks, wood, and other surfaces in aquatic ecosystems (Wetzel 1983). The biofilm, consisting of a complex community of algae, fungi, bacteria, and other unicellular organisms in a gelatinous polysaccharide matrix, is a vital source of energy in stream ecosystems (Wetzel 1983, Stevenson et al. 1996). These communities may be the first to respond to environmental stress in freshwater streams, reflect the current environmental conditions (Edwards and Kjellerup 2013), and can have short recovery times from environmental perturbations (Belanger and Rupe 1996). The microorganisms that comprise the biofilm community can be more or less tolerant of varied environmental conditions, and community-level changes in stream biofilms have been used to track changes in water chemistry due to eutrophication (Nelson et al. 2013). Organisms living within the biofilm that require specific environmental conditions can be used as “indicators” of water quality, such as the diatom genera Cymbella and Cocconeis, which are known to inhabit streams with high water quality (Wang et al. 2005).
In the Appalachian region, a substantial source of pollution is acid mine drainage (AMD) from coal mines abandoned prior to regulation (U.S. EPA 1994). It is estimated that 12,000 stream km, or 14% of all stream km in this region, have been affected (U.S. EPA 1997). When coal tailings and waste from these abandoned mines are exposed to air, water, and bacterial processes, sulfuric acid and iron oxide are produced from the oxidation of sulfide materials (Gray 1997). The acidic stream water leaches metals from soils and rocks, causing high concentrations of Al, Fe, Mn, Cu, As, and other toxins (U.S. EPA 1994). In an effort to restore ecological stability, different techniques have been used to remediate AMD streams. Active treatment of AMD in the Appalachian region is primarily by alkaline doser, which discharges calcium oxide into AMD headwaters (Costello 2003).
Environmental stresses caused by AMD have a variety of effects on the diversity of organisms in the biofilm community. Reduction in species richness in stream biofilms simplifies the food web and may reduce the ecological stability of the ecosystem (Gray 1997, Smucker et al. 2014). In particular, decreases in diatom species richness have been documented with increased AMD impairment due to diatom taxa that are sensitive to or less tolerant of this pollution (Verb and Vis 2001, 2005, Bray et al. 2008). Diatom taxa living within stream biofilms are correlated with a range of AMD impairment levels (Zalack et al. 2010). Comparing the diatom community structure of AMD-impaired streams and streams that have undergone remediation can be an effective tool to assess the efficacy of remediation efforts (Smucker et al. 2014). Whole biofilm community changes have been documented using functional measures such as fatty acid profiles (Guschina and Harwood 2009, DeForest et al. 2016, Drerup and Vis 2016). Although the effects of AMD and remediation have been investigated via diatom diversity and, to a much lesser extent, other biofilm organisms and functional measures, a study of the whole biofilm community diversity has yet to be undertaken. Determination of whole biofilm species richness, evenness, and diversity would be difficult using traditional microscopic methods due to cryptic species and/or the sheer quantity of organisms to identify (Stein et al. 2014, Zimmermann et al. 2015).
Similar to traditional microscopy, individual members of the biofilm community have been identified using molecular data (Souza-Egipsy et al. 2008, Baker et al. 2009, Dopheide et al. 2009, Lear et al. 2009). A DNA barcode or gene fragment can be targeted with specific primers for PCR and subsequent Sanger sequencing for taxonomic identification (Ratnasingham and Hebert 2007). Barcoding on a specimen-by-specimen basis would be too costly and time consuming for the entire community of organisms, which may have hundreds or thousands of taxa per sample (Elbrecht and Leese 2015). With the advent of high-throughput sequencing, environmental DNA (eDNA) barcoding, or metabarcoding, can be utilized to identify multiple taxa within a community simultaneously to determine the diversity (Zimmermann et al. 2015). Within the biofilm community, the most prominent groups of organisms that have been investigated using molecular techniques are bacteria (16S rDNA barcode region) (Souza-Egipsy et al. 2008, Lear et al. 2009) as well as the Fungi and Protista kingdoms (18S rDNA barcode region) (Baker et al. 2009, Dopheide et al. 2009). Organisms that are not easily amplified or have low taxonomic resolution with the primer sets for these standard markers have gone undetected, regardless of abundance and importance in the ecosystem (Marcelino et al. 2016). Multi-marker analyses can utilize both universal and taxonomic group-specific primer sets to target barcode regions of a wide range of organisms and increase detection of unique operational taxonomic units (OTUs). This multiple marker strategy has been successful in increasing the known diversity in other ecosystems (Marcelino et al. 2016, Pfendler et al. 2018).
The combination of high-throughput sequencing and a multi-marker barcoding approach has yet to be employed to study stream biofilm biodiversity and has a high potential to yield insights into community differences associated with varying water quality. Remediation of AMD has resulted in streams with higher water quality. Yet, not all streams have shown the same improvements in biological condition. The purpose of this study was to examine biofilm diversity in streams along an AMD recovery gradient to investigate any differences in the overall biofilm community, with a focus on the algal community. It was hypothesized that biofilm taxonomic diversity measures in streams would reflect water quality and remediation status similar to other biological metrics.
MATERIALS AND METHODS
Pilot Study
A pilot study was conducted to test various aspects of the laboratory protocols as well as the sequencing and bioinformatics pipeline. AMD streams contain pollutants (i.e. metals) that may interfere with DNA extraction and sequencing. In addition, biofilms have a thick polysaccharide matrix that may hamper extraction of high-quality DNA. In the preliminary study, one Recovered site (MC 0300) and one Not Recovered site (MC 0240) were sampled in early May 2017. The environmental DNA (eDNA) was extracted using the DNeasy PowerSoil or PowerBiofilm kits (QIAGEN, Valencia, CA, USA) as per the manufacturer’s protocol to determine which would be more successful at removing inhibitors from the extraction that could cause problems with PCR or sequencing. The PowerBiofilm kit was determined to be better based on DNA concentration and number of reads from the high-throughput sequencing.
I also tested the appropriate amount of starting material in order to maximize the amount of DNA recovered without clogging the columns in the extraction kit. Starting material weighing 0.05 to 0.1 g was used for the preliminary extractions. These values were determined based on the extraction kit protocol for stream rocks. The amount of starting material only slightly impacted the DNA concentration after extraction. Using 0.1 g of material improved the DNA concentration of a few samples, but the majority had no change in concentration. During the homogenization step, two durations (1 and 4 minutes) were tested. It was determined based on amplification success (gel electrophoresis) and DNA concentration that 4 minutes was best for all markers, especially the rbcL marker that targeted diatoms with a silica frustule that may require additional lysing time.
PCR amplification of DNA extracts was conducted using the 16S, 18S, UPA, and rbcL primers with Illumina overhangs (Table 1). Thermocycler settings were optimized for each primer pair during the pilot study (Table 2). Amplified PCR product from each primer pair was multiplexed and sequenced using an Illumina MiSeq V2 kit (Illumina Inc., San Diego, CA, USA). I chose to sequence samples from both extraction kits, two replicates of the same eDNA sample, as well as a negative extraction control. The data were processed with Qiime1 (Caporaso et al. 2010) (Qiime2 was in early beta testing). Data analyses after sequencing revealed the importance of sterile technique, as the negative controls contained more sequences than originally expected. Therefore, in the primary study, the lab bench was bleached prior to extraction and PCR reactions were conducted in a laminar flow hood. Assignment of taxonomy using Qiime1 did not show any obvious primer biases based on the range of taxa sequenced. Rarefaction curves were calculated using CLC Genomics Workbench (version 10.1.1; https://www.qiagenbioinformatics.com) to determine the number of samples that could be sequenced in a single MiSeq run while maintaining good sequencing depth. Rarefaction curves for all markers showed that the accumulation of new operational taxonomic units (OTUs) per replicate plateaued at approximately 5,000 to 8,000 reads with each marker. Therefore, 31 multiplexed (4 amplicons per sample) samples could be run together, which is in excess of the number proposed for the primary study.
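The idea behind a rarefaction curve can be illustrated with a minimal Python sketch. It is not the CLC Genomics Workbench or Qiime implementation; the function name, the OTU counts, and the subsampling depths are hypothetical and serve only to show how reads are subsampled at increasing depths and the distinct OTUs counted.

```python
import random

def rarefaction_curve(otu_counts, depths, n_iter=20, seed=0):
    """Mean number of distinct OTUs observed at each subsampling depth.

    otu_counts: dict mapping OTU id -> read count (hypothetical data).
    depths: read depths to subsample, without replacement.
    """
    rng = random.Random(seed)
    # Expand the count table into one OTU label per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = {}
    for depth in depths:
        if depth > len(reads):
            continue  # cannot subsample more reads than were sequenced
        observed = [len(set(rng.sample(reads, depth))) for _ in range(n_iter)]
        curve[depth] = sum(observed) / n_iter
    return curve

# Hypothetical replicate: 5 abundant OTUs and 95 rare ones (2,785 reads).
example_counts = {f"otu{i}": (500 if i < 5 else 3) for i in range(100)}
curve = rarefaction_curve(example_counts, depths=[100, 500, 1000, 2500])
```

A curve that flattens as depth increases suggests that additional reads would add few new OTUs, which is the criterion used above to judge sequencing depth.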
Study Site Selection and Location
In order to investigate biofilm communities in streams that have undergone AMD remediation and compare them with those unimpacted by AMD, a total of nine sites were sampled (Appendix 1). These sites were categorized prior to sampling as Unimpaired, Recovered, and Not Recovered. Stream reach categories were determined using MAIS (Macroinvertebrate Aggregate Index for Streams) scores (Johnson 2007) from at least the last 5 years (2011-2016). A linear regression was used to test for significant change in MAIS scores over the previous 5 years in both Recovered and Not Recovered sites. Recovered sites were classified as having MAIS scores that had significantly (P < 0.05) improved since monitoring started and had sustained a MAIS score > 12 for the past 3 years. Not Recovered streams were those with MAIS scores significantly (P < 0.05) improving over the last 5 years but not ≥ 12 for the past two consecutive years. Sites classified as Unimpaired had not been previously impacted by AMD and have been classified as Excellent Warm Water Habitat (EWH) according to the EPA (OEPA 1987). Three watersheds were sampled in order to have three sites in each category. However, one of the Unimpaired streams (ELRM 0.1) was removed from the dataset due to low 2017 MAIS scores. Therefore, the remaining sites were as follows: WB SC RM1.8 (Recovered), Unimpaired EB001 (Unimpaired), and WB 51 (Not Recovered) in Sunday Creek; HF 039 (Recovered) and HF 090 (Not Recovered) in Hewett Fork (within Raccoon Creek); and MC 0300 (Recovered), LM 0110 (Unimpaired), and MC 0240 (Not Recovered) in Monday Creek (Fig. 1).
Biofilm Collection
The stream sites were sampled on two consecutive days (August 7-8, 2017) with similar weather conditions. At each site, three independent biofilm replicates were collected. For each replicate, 10 cobbles were randomly selected within a single riffle, a 3-cm circular area of each cobble was scrubbed with a sterile soft bristle brush, and the material was pooled into a sterile 50 mL conical tube using a sterile 60 mL plastic syringe. The three conical tubes were placed on ice for transport to the laboratory. A number of physical stream characteristics were measured at each site. Current velocity was measured using the Global Water Flow Probe (Global Water Instrumentation, Inc., College Station, TX, USA); three measurements (in areas that were visually high, medium, and low flow) were taken within the riffle to calculate the mean current velocity. Concurrently, stream depth was measured using the metric markings on the Global Water Flow Probe and mean depth was calculated. The Multi-Parameter PCTestr™ 35 (Oakton Instruments, Vernon Hills, IL, USA) was used to measure pH, specific conductance, and water temperature. Stream width was measured across the widest part of the sampled riffle using a tape measure. Canopy cover was visually estimated in the middle of the riffle using a scale of 0 to 100% shade. Stream water was filtered through a 0.45 µm filter into a 50 mL conical tube for chemical analysis in the laboratory.
On the collection date, the biofilm material was centrifuged in the 50 mL conical tubes at 3,400 rpm for 20 min at 4°C. The supernatant was removed, leaving the pelleted biofilm material. Before freezing, samples were lightly stirred in the conical tube and approximately 1 g of biofilm material from each tube was placed in a microcentrifuge tube and centrifuged for 30 seconds at 11,600 rpm. Any remaining supernatant was removed and the samples were placed in a -20°C freezer until extraction. The remaining biofilm material was frozen to be used, if needed. A voucher of the biofilm from each sample was stored in a 2.5% glutaraldehyde solution in a 20 mL vial. Within 48 hours of collection, stream water was analyzed using a HACH DR/890™ colorimeter (HACH Company, Loveland, CO, USA) and HACH powder pillows as per the manufacturer’s protocol for sulfate (SO₄²⁻) (Method 8051), iron (Fe) (Method 8008), and nitrate (NO₃⁻) (Method 8192).
Chlorophyll a Analysis
For each replicate from each site, a subsample of biofilm material was freeze-dried. Chlorophyll a was analyzed using 3-5 mg of freeze-dried biofilm. The samples were soaked in 90% acetone for 18-20 hours as described by EPA method 445.0 (Arar and Collins 1997). The concentration of chlorophyll a was determined using a Turner TD-700 fluorometer (Turner Designs, Sunnyvale, CA, USA). To correct for phaeophytin-a, 0.1 N HCl solution was added. The results of the three replicates per site were reported as mean chlorophyll a concentration (µg/g).
Biofilm Extraction
Biofilm extraction was conducted using sterile technique to minimize DNA contamination from lab sources. Extractions were completed during a three-day period. Each set (1 per day) of extractions contained one replicate from each stream plus a negative control (autoclaved Nanopure H2O). The negative control was subjected to the same procedures as the other samples and later used in the bioinformatics pipeline to remove potential contaminant sequences from the extractions conducted on the same day. The DNeasy PowerBiofilm kit was used to extract 0.1 g of biofilm. The manufacturer’s protocol was utilized with one modification: homogenization was performed using the TissueLyser LT (QIAGEN) for 4 min at 50 oscillations/sec. After extraction, the eDNA sample was quantified using a Qubit Fluorometer 3.0 (Life Technologies, Carlsbad, CA, USA). All samples were diluted to 1 ng/µl and stored at -20°C in 100 µl aliquots.
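The dilution to a common working concentration follows the usual C1·V1 = C2·V2 relationship. A minimal sketch is given below; the function name and the 8 ng/µl Qubit reading are hypothetical, and the 100 µl final volume is assumed here only because it matches the aliquot size stated above.

```python
def dilution_volumes(stock_ng_per_ul, target_ng_per_ul=1.0, final_ul=100.0):
    """Volumes of stock DNA and diluent for a C1*V1 = C2*V2 dilution."""
    if stock_ng_per_ul < target_ng_per_ul:
        raise ValueError("stock is already more dilute than the target")
    stock_ul = target_ng_per_ul * final_ul / stock_ng_per_ul
    return stock_ul, final_ul - stock_ul

# Hypothetical extract quantified at 8 ng/ul.
stock_ul, diluent_ul = dilution_volumes(8.0)
# 12.5 ul of extract plus 87.5 ul of diluent gives 100 ul at 1 ng/ul.
```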
Amplicon PCR and Sequencing
The first step in the two-step PCR procedure to prepare Illumina 16S amplicon sequencing libraries for multiple samples and markers was to amplify the four barcode markers in individual PCR reactions for each eDNA sample. The V4 region of 16S rDNA
(prokaryotes) was amplified using primers 515F and 806RB (Caporaso et al. 2011,
Apprill et al. 2015). The 18S rDNA (eukaryotes) amplicon was amplified with primers
1391F and EukBr (Amaral-Zettler et al. 2009). Both the 16S and 18S primers were chosen from those available via the Earth Microbiome Project
(http://www.earthmicrobiome.org). The Universal Plastid Amplicon (UPA), which targets a portion of the 23S rDNA chloroplast gene in algae, was amplified using primers p23SrV_f1 and p23SrV_r1 (Sherwood and Presting 2007). Lastly, an rbcL chloroplast marker suitable for amplicon sequencing (< 600 bp) to specifically target diatom taxa was designed based on previously published primers. Sequences from 15 freshwater diatom genera were downloaded from GenBank and used as a test community to determine the best primer pair in Geneious v10.1 (Kearse et al. 2012). Two primers, cfD and DtrbcL_3R (Lang and Kaczmarska 2011, Hamsher et al. 2011), were chosen based on the amplicon size (338 bp) and ability to bind to all genera from the test community (Geneious v10.1). In order to increase the ability of the sequencer to distinguish similar sequences simultaneously, the library was made more complex by adding 0-2 N’s at the 5’ end of the primers, followed by either the forward or reverse Illumina overhang tail of 33 bp.
The primary PCR amplification was completed using 12.5 µl of Kapa HiFi HotStart ReadyMix (Kapa Biosystems, Wilmington, MA, USA), 0.5 µl of each primer (10x), and 11.5 ng eDNA prepared in a sterile laminar flow hood. One master mix per gene was prepared and aliquoted for all samples along with a negative control (autoclaved Nanopure H2O) and amplified simultaneously. Thermocycler settings and detailed primer assembly information are provided (Tables 1, 2).
Secondary PCR amplification and sequencing were performed by the Ohio University Genomics Facility. Samples were purified using Agencourt AMPure XP beads (Beckman Coulter Inc., Indianapolis, IN, USA) and quantified using a Qubit Fluorometer 3.0. Equimolar amounts of the 16S, 18S, UPA, and rbcL PCR reactions from a single replicate were pooled, indexed using Nextera Index 1 and 2 primers (to generate uniquely identified samples), and enriched with the Kapa HiFi HotStart 2x PCR Master Mix (Kapa Biosystems, Wilmington, MA, USA). Amplified pooled samples were analyzed on an Agilent 2100 Bioanalyzer High Sensitivity DNA Chip (Agilent Technologies, Santa Clara, CA, USA) for size range and quantified on the Qubit Fluorometer 3.0. Libraries from all samples were pooled in equimolar ratios, denatured using 0.2 N NaOH, diluted to 18 pM, and loaded with 10% PhiX clustering control on the Illumina MiSeq with the 600 cycle (2 x 300 bp paired end) Sequencing Reagents V3.
Bioinformatics
Reads were demultiplexed using Illumina CASAVA 1.8. Demultiplexed reads were parsed by gene and the primer sequences were trimmed using separate_genes.py (Marcelino and Verbruggen 2016). Reads were quality filtered based on Q30 score and trimmed to remove low quality reads while maintaining at least a 50 bp overlap of forward and reverse reads using Qiime2 (Caporaso et al. 2010, version 2017.12, https://docs.qiime2.org/2017.12/). Trimmed and filtered reads were denoised and dereplicated to produce Amplicon Sequence Variants (ASVs) using DADA2 (Callahan et al. 2016). ASVs are OTUs that have been assessed based on a 100% match as opposed to the 97% match OTUs of the previous standard. I will refer to these ASVs as OTUs, a term more often used in this field of study. After processing with DADA2, OTUs were filtered using the sequences present in PCR and extraction controls. Since I used one negative control for each set of extractions and PCR runs, OTUs present in negative controls were removed from samples that were in the same extraction or PCR run. OTUs that had only one or two reads assigned (singletons and doubletons) were removed from the dataset as well. The SILVA database (release 128, Quast et al. 2013) was employed to assign taxonomy to 16S and 18S OTUs. For UPA OTUs, taxonomy was assigned from a SILVA (release 123) curated database (Sherwood et al. 2017), and for the rbcL OTUs, taxonomy was assigned using the R-Syst Micro-Algae v6 database (http://www.rsyst.inra.fr/, 11/2017). Reference taxonomy databases were trimmed to contain only the region of the gene amplified by the primers in this study. A machine learning naïve Bayes classifier was trained and used to assign taxonomy (confidence = 0.7, classify-sklearn: Qiime2). OTUs present in the 16S dataset classified as chloroplast or mitochondria after assigning taxonomy were removed, as these are not prokaryote sequences. Non-algal taxa (Embryophytes and non-photosynthetic Bacteria) present in the UPA marker dataset were also removed. Analysis of Variance (ANOVA) and Kruskal-Wallis Rank Sum Test were calculated using R (version 3.4.2) on OTUs per phylum in each stream category. Venn diagrams for each DNA marker were created (bioinformatics.psb.ugent.be/webtools/Venn/, 3/30/18) to visualize the number of OTUs shared among stream categories and distinctive to a single category (Fig. 2).
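The control-based and low-abundance filtering described above can be sketched in Python as follows. This is an illustration rather than the pipeline actually used: the function and variable names are hypothetical, a single batch label stands in for the separate extraction and PCR runs, and singletons and doubletons are assumed to be defined on read totals across the whole dataset.

```python
def filter_otus(table, batch_of, control_otus, min_reads=3):
    """Remove control OTUs per batch, then singletons and doubletons.

    table: {sample: {otu: reads}}
    batch_of: {sample: batch label (e.g. extraction day)}
    control_otus: {batch label: set of OTUs found in that negative control}
    """
    # Step 1: drop OTUs seen in the negative control of the same batch.
    cleaned = {}
    for sample, counts in table.items():
        contaminants = control_otus.get(batch_of[sample], set())
        cleaned[sample] = {otu: n for otu, n in counts.items()
                           if otu not in contaminants}
    # Step 2: drop OTUs with fewer than min_reads reads in total
    # (assumption: totals are taken over all remaining samples).
    totals = {}
    for counts in cleaned.values():
        for otu, n in counts.items():
            totals[otu] = totals.get(otu, 0) + n
    return {sample: {otu: n for otu, n in counts.items()
                     if totals[otu] >= min_reads}
            for sample, counts in cleaned.items()}

# Hypothetical two-sample table with one negative control per day.
table = {"s1": {"a": 10, "b": 1, "c": 4},
         "s2": {"a": 5, "b": 1, "c": 6, "d": 7}}
batch_of = {"s1": "day1", "s2": "day2"}
control_otus = {"day1": {"c"}, "day2": {"d"}}
filtered = filter_otus(table, batch_of, control_otus)
```

In this example OTU "c" is removed only from the sample extracted alongside the control that contained it, and OTU "b" is dropped everywhere as a doubleton.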
Statistical Analysis
Environmental data were standardized by scaling the mean to zero and standard deviation to one using the vegan package in R (Oksanen et al. 2018, vegan package: decostand, R version 3.4.2). Analysis of Variance (ANOVA) and Kruskal-Wallis Rank Sum Test were calculated using R (version 3.4.2) to determine which environmental variables were significantly different (P < 0.05) among the three stream categories. Those stream variables were included in further analyses (Table 3).
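The standardization step amounts to a z-score per environmental variable. The sketch below shows the calculation in Python rather than R; the conductance values are hypothetical, and the use of the sample standard deviation (n - 1 denominator) is an assumption made to mirror the usual behavior of scaling functions in R.

```python
from statistics import mean, stdev

def standardize(values):
    """Scale one environmental variable to mean 0 and standard deviation 1."""
    m = mean(values)
    s = stdev(values)  # sample standard deviation (n - 1 denominator)
    return [(v - m) / s for v in values]

# Hypothetical specific conductance readings across five sites.
conductance = [120.0, 340.0, 560.0, 410.0, 980.0]
z_scores = standardize(conductance)
```

After this step, variables measured in different units (for example temperature and sulfate) contribute on a comparable scale to the ordination.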
Alpha diversity was calculated using the Shannon diversity index for each gene marker (Appendix 2). The Shannon index was calculated separately for algal-only OTUs in the 18S and 16S data. One replicate from LM 0110 (Unimpaired) for the UPA gene was removed due to a low number of reads (< 50 reads total). A Kruskal-Wallis pairwise test was used to determine significant differences between Shannon index values. The Shannon index was plotted for each marker by stream category (Fig. 3). Rarefaction curves were calculated for each marker to visualize the rate at which additional reads increased the number of OTUs (Qiime2) and to ensure sequence depth was adequate for each site (Figs 4–7) and each replicate (Appendices 3–6). Relative abundance and OTU richness of taxa (phylum for 16S, 18S, and UPA, and genus for rbcL) per sample (combined replicates within a stream) were calculated for each gene marker (Appendices 7–15). Relative abundance in the 18S marker was also calculated for algal taxonomy only (Appendix 10).
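For reference, the Shannon index is H' = -Σ p_i ln(p_i), where p_i is the proportion of reads assigned to OTU i. A minimal Python sketch is shown below; the read counts are hypothetical, and the natural logarithm is an assumption, since some tools report the index in base 2.

```python
import math

def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over OTUs with nonzero reads."""
    total = sum(counts)
    return -sum((n / total) * math.log(n / total) for n in counts if n > 0)

# Hypothetical samples: four equally abundant OTUs versus one dominant OTU.
even_sample = shannon_index([10, 10, 10, 10])
skewed_sample = shannon_index([37, 1, 1, 1])
```

With equal richness, the evenly distributed sample yields the higher index (ln 4 for four equally abundant OTUs), which is why the index responds to both richness and evenness.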
Beta diversity was explored using the phyloseq (McMurdie and Holmes 2013) and vegan packages in R. The OTU tables were Hellinger transformed (Legendre and Gallagher 2001). To investigate the similarity in community composition between and within categories, a Bray-Curtis distance matrix and Jaccard index were calculated and visualized using a non-metric multidimensional scaling (NMDS) ordination. Because ordinations of Bray-Curtis and Jaccard showed similar results, only the NMDS of Bray-Curtis is provided (Fig. 8). Stress levels for 16S, 18S, UPA, and rbcL using the Bray-Curtis and Jaccard indices were all between 0.115 and 0.151. The Bray-Curtis and Jaccard indices were tested using a PERMANOVA (vegan: adonis) and post hoc test (RVAideMemoire: pairwise.perm.manova, Hervé 2018). A Constrained Analysis of Principal Coordinates (CAP) using Bray-Curtis values was created to overlay the environmental data. Using the function envfit (vegan), R² values were calculated for each environmental factor (Fig. 9). Environmental factors with R² values < 0.5 were removed from the dataset and the ordination was re-conducted. An ANOVA was conducted to assess the significance of the constrained axes used in ordination.
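The transformation and dissimilarity measures named above can be written out in a few lines. The Python sketch below is for illustration only and is not the phyloseq or vegan code used in this study; the two read-count vectors are hypothetical, and the Jaccard measure is shown in its presence/absence form, which is an assumption.

```python
import math

def hellinger(counts):
    """Hellinger transform: square root of each OTU's relative abundance."""
    total = sum(counts)
    return [math.sqrt(n / total) for n in counts]

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors."""
    return (sum(abs(x - y) for x, y in zip(a, b))
            / sum(x + y for x, y in zip(a, b)))

def jaccard(a, b):
    """Jaccard dissimilarity on OTU presence/absence."""
    present_a = {i for i, x in enumerate(a) if x > 0}
    present_b = {i for i, x in enumerate(b) if x > 0}
    return 1 - len(present_a & present_b) / len(present_a | present_b)

# Hypothetical read counts for four OTUs at two sites.
site_a = hellinger([9, 16, 0, 0])   # -> [0.6, 0.8, 0.0, 0.0]
site_b = hellinger([0, 16, 9, 0])   # -> [0.0, 0.8, 0.6, 0.0]
bc = bray_curtis(site_a, site_b)    # 1.2 / 2.8
jc = jaccard(site_a, site_b)        # 1 - 1/3
```

Both measures equal 0 for identical communities and 1 for communities that share no OTUs; the resulting pairwise matrix is what the NMDS and PERMANOVA operate on.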
RESULTS
Stream Chemical and Physical Data
Stream chemical and physical data were collected for all sites (Appendix 1). There were significant differences (P < 0.05) among stream categories in numerous measures as follows: 2016/2017 MAIS scores, specific conductance, temperature, mean current velocity, mean depth, % canopy, nitrate, iron, and sulfate (Table 3). The mean MAIS scores (2016 and 2017) significantly differed (P < 0.05) among the three categories. The 2017 MAIS scores were not available for some sites (WBSCM1.8, WB51, MC0300, and MC0240). Specific conductance and sulfate, measures that can be associated with AMD, were significantly higher (P < 0.05) in Not Recovered than in Unimpaired sites, but only sulfate was significantly higher (P < 0.05) in Not Recovered than in Recovered sites (Table 3). Another AMD indicator measure, iron, differed among the categories, with only Recovered being significantly higher (P < 0.05) than Not Recovered (Table 3). Temperature was significantly lower (P < 0.05) in Unimpaired (19°C) compared to Recovered (21°C) and Not Recovered (21°C) sites. Not Recovered sites had significantly greater (P < 0.05) mean current velocity (44 cm/s) than either Unimpaired (16.7 cm/s) or Recovered (15.4 cm/s) sites. Mean depth was significantly lower (P < 0.05) in Unimpaired (14.5 cm) compared to Recovered (31.1 cm) sites. Canopy cover was significantly greater (P < 0.05) in Unimpaired (90%) than Not Recovered (30%) sites. Canopy cover in Recovered (70%) sites was not significantly different from either Unimpaired or Not Recovered sites. Mean nitrate was significantly greater in Not Recovered (0.08 mg/L) sites compared to Unimpaired (0.03 mg/L) or Recovered (0.05 mg/L) sites. Chlorophyll a, pH, and stream width were not significantly different (P > 0.05) among stream categories (Table 3).
Illumina Data Analyses
Illumina MiSeq amplicon sequencing produced a total of 7,525,168 paired-end (15,042,336 single) reads. After quality filtering and denoising, a total of
5,056,340 paired-end reads remained. Following filtering of low-abundance sequences, chimeras, and controls, 1,719,015 paired-end reads (71,625 average reads per sample) remained for the 16S dataset, 430,798 paired-end reads (17,950 average reads per sample) for the 18S dataset, 978,304 paired-end reads (40,763 average reads per sample) for the
UPA dataset, and 178,219 paired-end reads (7,426 average reads per sample) for the rbcL dataset. Filtered reads were assigned to a total of 13,884 16S OTUs, 2,583 18S OTUs,
2,221 UPA OTUs, and 662 rbcL OTUs (Appendix 16). Rarefaction curves for 16S, 18S, and UPA amplicons showed that OTUs began to plateau at ~5,000 to 10,000 reads per replicate, similar to the preliminary study (Appendices 3–6). OTUs from the rbcL marker began to plateau at ~1,000 reads. Rarefaction curves were also calculated per site (combined replicates) and showed OTUs plateauing for each marker, indicating that enough reads were sequenced for each site (Figs. 4–7).
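As an illustration of how a rarefaction curve of this kind can be computed, the following Python sketch repeatedly subsamples reads without replacement and records the mean number of OTUs observed at each depth. It is a minimal sketch and not the pipeline used in this study; the function name `rarefaction_curve`, the toy OTU counts, and the sampling depths are hypothetical.

```python
import random


def rarefaction_curve(otu_counts, depths, n_iter=20, seed=0):
    """Mean number of OTUs observed when subsampling reads without replacement.

    otu_counts: dict mapping OTU id -> read count for one sample (hypothetical input).
    depths: iterable of subsampling depths (number of reads drawn).
    """
    rng = random.Random(seed)
    # Expand the count table into one OTU label per read.
    reads = [otu for otu, n in otu_counts.items() for _ in range(n)]
    curve = {}
    for depth in depths:
        if depth > len(reads):
            continue  # cannot draw more reads than the sample contains
        richness = [len(set(rng.sample(reads, depth))) for _ in range(n_iter)]
        curve[depth] = sum(richness) / n_iter
    return curve


# Toy sample with 1,000 reads spread unevenly over six OTUs (not study data).
toy_sample = {"otu_a": 500, "otu_b": 300, "otu_c": 150,
              "otu_d": 40, "otu_e": 9, "otu_f": 1}
curve = rarefaction_curve(toy_sample, depths=[10, 100, 500, 1000])
for depth, mean_otus in curve.items():
    print(depth, round(mean_otus, 2))
```

A curve that flattens as depth increases suggests that additional sequencing would recover few new OTUs, which is the interpretation applied to the per-site curves above.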
Within Site Replication
Within a site, three independent replicates were collected so that within-site variability could be examined. The mean percentages of OTUs (algal and non-algal) observed in all three replicates at a site for 16S, 18S, UPA, and rbcL were 19.05%, 16.19%, 17.05%, and 15.38%, respectively. The mean percentages of OTUs observed in two of the three replicates for 16S, 18S, UPA, and rbcL were 22.71%, 23.59%, 22.24%, and 31.76%, respectively.
Overall, the majority of OTUs were observed in only a single replicate for all four markers (Table 4).
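The replicate overlap percentages reported above can be illustrated with a short Python sketch that counts, for one site, how many replicates each OTU was observed in. This is a hedged example only: the function name `replicate_occupancy` and the toy OTU sets are hypothetical, and the sketch assumes that percentages are taken over all OTUs detected at the site.

```python
from collections import Counter


def replicate_occupancy(replicates):
    """Percent of a site's OTUs found in exactly 1, 2, ... replicates.

    replicates: list of sets of OTU ids, one set per replicate (hypothetical input).
    Assumes at least one OTU was detected at the site.
    """
    # Number of replicates in which each OTU was observed.
    hits = Counter(otu for rep in replicates for otu in rep)
    total_otus = len(hits)
    by_k = Counter(hits.values())
    return {k: 100.0 * by_k.get(k, 0) / total_otus
            for k in range(1, len(replicates) + 1)}


# Toy site with three replicates (not study data).
site = [{"a", "b", "c", "d"}, {"a", "b", "e"}, {"a", "f", "g", "h"}]
occupancy = replicate_occupancy(site)
print(occupancy)
```

In this toy site, most OTUs occur in a single replicate, which mirrors the pattern summarized in Table 4.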
Beta Diversity of all OTUs
Prior to assigning taxonomy, beta diversity was analyzed with Bray-Curtis dissimilarity and the Jaccard index. Stream categories and watersheds had significantly (P < 0.05) different centroids using the Bray-Curtis distance matrix and Jaccard index for each of the four markers. PERMANOVA of the Bray-Curtis distance matrix determined that stream category (R2 = 0.17–0.20) had a slightly greater effect on the distribution of centroids than watershed (R2 = 0.16–0.17) for all markers. An NMDS of the Bray-Curtis values showed similar results for all gene markers, with the Recovered and Unimpaired sites on one side of the graph, closer to each other and further from Not Recovered sites (Fig. 8). The
Unimpaired sites were furthest from the Not Recovered sites. Within this pattern, sites within a watershed showed some clustering (Fig. 8). Recovered site WBSCRM1.8 tended to cluster closer to Unimpaired sites than the other Recovered sites did for each marker. The replicates for a given site clustered in close proximity to each other, but site MC0300 had one replicate that tended to be further away in the 16S, 18S, and UPA plots (Fig. 8).
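For readers unfamiliar with the two measures, the following Python sketch shows the standard definitions of Bray-Curtis dissimilarity (abundance based) and Jaccard distance (presence/absence based) for a pair of samples. It is a minimal illustration, not the software used in this study; the function names and toy OTU counts are hypothetical, and the Jaccard calculation assumes that counts were first reduced to presence/absence.

```python
def bray_curtis(x, y):
    """Bray-Curtis dissimilarity between two OTU count vectors of equal length."""
    numerator = sum(abs(a - b) for a, b in zip(x, y))
    denominator = sum(x) + sum(y)  # assumes at least one read in total
    return numerator / denominator


def jaccard_distance(x, y):
    """Jaccard distance computed on presence/absence of each OTU."""
    present_x = {i for i, a in enumerate(x) if a > 0}
    present_y = {i for i, b in enumerate(y) if b > 0}
    union = present_x | present_y  # assumes at least one OTU is present
    return 1 - len(present_x & present_y) / len(union)


# Toy read counts for four OTUs in two samples (not study data).
sample_1 = [10, 0, 5, 5]
sample_2 = [5, 5, 0, 10]
bc = bray_curtis(sample_1, sample_2)
jd = jaccard_distance(sample_1, sample_2)
print(bc, jd)
```

Both measures range from 0 (identical samples) to 1 (no shared OTUs). A matrix of such pairwise values is the input to ordination methods such as NMDS and to PERMANOVA.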
A constrained ordination was used to overlay the environmental variables on the OTU diversity (Fig. 9). In the ordination for each marker, Unimpaired sites as well as
WBSCRM1.8 (Recovered site) were to one side of the graph and furthest from Not
Recovered sites (Fig. 9). Recovered sites MC0300 and HF039 had points in the middle, but closer to the Unimpaired sites (Fig. 9). The first two axes accounted for 22.2% of the variation in the 16S ordination, 20.8% in the 18S ordination, 23.7% in the UPA ordination, and 24.7% in the rbcL ordination. In all four ordinations, the vectors for % canopy cover and 2017 MAIS scores pointed in the direction of the Unimpaired sites and WBSCRM1.8 (Fig. 9). Vectors for mean current velocity, nitrate, and sulfate pointed in the direction of the Not Recovered sites (Fig. 9). Depth pointed towards Recovered and Not Recovered sites in the rbcL and UPA ordinations, and temperature pointed in a similar direction in the 16S and UPA ordinations.
Analyses of all OTUs
All OTUs, algal and non-algal, in the 16S and 18S datasets were assigned taxonomy, and the phylum-level analysis was reported as relative abundance (Appendices 7,
9). Read counts of OTUs that could not be classified at the phylum level were excluded from the relative abundance calculations. Relative abundance was calculated per sample and per category to compare the diversity of each biofilm community.
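The relative abundance calculation described above can be sketched in Python as follows. This is an illustrative example under stated assumptions rather than the analysis code used here: the function name, the label used for unclassified reads, and the toy read counts are all hypothetical.

```python
def phylum_relative_abundance(read_counts, unclassified_labels=("Unclassified",)):
    """Percent relative abundance per phylum after dropping unclassified reads.

    read_counts: dict mapping phylum name -> read count (hypothetical input).
    """
    kept = {phylum: n for phylum, n in read_counts.items()
            if phylum not in unclassified_labels}
    total = sum(kept.values())  # assumes at least one classified read
    return {phylum: 100.0 * n / total for phylum, n in kept.items()}


# Toy read counts for one stream category (not study data).
toy_counts = {
    "Proteobacteria": 640,
    "Bacteroidetes": 110,
    "Cyanobacteria": 50,
    "Other phyla": 200,
    "Unclassified": 250,
}
rel_abund = phylum_relative_abundance(toy_counts)
print(rel_abund)
```

Because unclassified reads are removed before the denominator is computed, the percentages of the classified phyla sum to 100%.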
In all three stream categories for the 16S dataset, the most abundant phylum was
Proteobacteria (63.87–64.65%), followed by Bacteroidetes (9.61–12.69%). All stream categories also contained a similar relative abundance of Cyanobacteria (3.25–5.43%) and Actinobacteria (3.93–5.86%) (Appendix 7). The only difference among categories in the five most abundant phyla was the presence of Acidobacteria (3.76–4.02%) in the
Unimpaired and Recovered categories versus Planctomycetes (2.87%) in the Not Recovered category. Analysis of 16S OTUs showed the greatest number of overlapping OTUs between Recovered and
Unimpaired (1,264 OTUs) sites. A total of 1,928 OTUs were shared among all three categories (Fig. 2).
A wide taxonomic range of eukaryotic taxa was detected using the 18S marker
(Appendix 9). For all three categories, the greatest relative abundances of reads were assigned to the same three phyla: Bacillariophyta (diatoms) (33.05–57.22%), followed by Amoebozoa (11.10–18.20%) and Arthropoda (5.17–14.35%). These three most abundant phyla were followed by Annelida (4.33%) and Nematoda (4.11%) in the
Unimpaired, Annelida (2.63%) and Ciliophora (2.61%) in the Recovered, and Cercozoa
(9.35%) and Ascomycota (4.36%) in the Not Recovered category. The greatest number of overlapping 18S OTUs was between Not Recovered and Recovered streams (230 OTUs).
Recovered and Unimpaired sites contained a greater number of overlapping OTUs (199) than Not Recovered and Unimpaired (118 OTUs) sites. A total of 424 OTUs were shared among all three categories (Fig. 2).
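The overlap counts reported for the stream categories correspond to the regions of a three-set Venn diagram. The Python sketch below shows one way such counts can be derived from the sets of OTUs detected in each category. It is a hypothetical illustration: the function name and toy OTU identifiers are not from this study, and it assumes that each pairwise count excludes OTUs shared by all three categories, as the separate reporting of pairwise and three-way totals suggests.

```python
def venn_region_counts(unimpaired, recovered, not_recovered):
    """Number of OTUs in each exclusive region of a three-set Venn diagram."""
    u, r, n = set(unimpaired), set(recovered), set(not_recovered)
    return {
        "all_three": len(u & r & n),
        "unimpaired_recovered_only": len((u & r) - n),
        "recovered_not_recovered_only": len((r & n) - u),
        "unimpaired_not_recovered_only": len((u & n) - r),
        "unimpaired_unique": len(u - r - n),
        "recovered_unique": len(r - u - n),
        "not_recovered_unique": len(n - u - r),
    }


# Toy OTU sets per category (not study data).
toy_unimpaired = {"o1", "o2", "o3", "o4"}
toy_recovered = {"o1", "o2", "o5"}
toy_not_recovered = {"o1", "o5", "o6", "o7"}
regions = venn_region_counts(toy_unimpaired, toy_recovered, toy_not_recovered)
print(regions)
```

Because the seven regions are mutually exclusive, their counts sum to the total number of distinct OTUs across the three categories.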
Analyses of Algal OTUs
The 16S and 18S OTU datasets were examined for algal taxa only. Cyanobacteria is the only algal prokaryote phylum, and 369 individual cyanobacterial OTUs were present in the 16S dataset: 178 OTUs (18,232 reads) in the Unimpaired, 165 OTUs (15,623 reads) in the Recovered, and 175 OTUs (40,710 reads) in the Not Recovered category (Appendices 7, 8). The 18S dataset contained a total of 10 algal phyla, to which 509 individual OTUs (213,290 reads) belonged (Appendices 9, 10, 11). Relative abundance was calculated using only algal OTUs. The Unimpaired, Recovered, and Not
Recovered categories contained 224, 242, and 352 algal OTUs, respectively. The algal phyla
Haptophyta, Miozoa, Charophyta, Cryptophyta, Euglenozoa, and Rhodophyta each represented less than 2% of the relative abundance in all three stream categories (Fig.
10). Algal reads assigned to Ochrophyta represented less than 2% of the relative abundance in the Unimpaired and Recovered categories but were more abundant in the
Not Recovered category (3.92%). The relative abundance of Cercozoa was 1.04% in the
Unimpaired category but was greater in the Recovered (3.92%) and Not Recovered
(15.67%) categories. Chlorophyta was observed in similar relative abundance in all three stream categories (4.21–5.86%). Overall, the dominant phylum in all three stream categories was Bacillariophyta. The relative abundance of Bacillariophyta was 89.17% in the
Unimpaired category, 87.35% in the Recovered category, and 70.71% in the Not
Recovered category (Fig. 10).
The UPA marker amplified reads from a total of 11 algal phyla, comprising 859,892 reads (1,354 OTUs) (Appendices 12, 13). Unimpaired sites contained a total of 478 algal
OTUs (239,909 reads), and Recovered sites contained 583 algal OTUs (258,330 reads). Not
Recovered sites contained 690 algal OTUs (361,653 reads) and had the most OTUs
(496 OTUs) not found in other categories (Fig. 2). A total of 165 OTUs were shared by all three categories. Not Recovered and Recovered sites shared more OTUs (146) than either Not Recovered or Recovered shared with Unimpaired (70 and 113, respectively)
(Fig. 2). Excluding prokaryotes (Cyanobacteria), the UPA marker identified a greater number of algal OTUs than 18S (Appendices 11, 13). In all three categories, Haptophyta,
Charophyta, Cryptophyta, Euglenozoa, Cercozoa, Rhodophyta, and Ochrophyta each represented less than 2% of the relative abundance (Fig. 11). The Unimpaired and Recovered categories had a lower relative abundance of Chlorophyta (< 2%) and a greater relative abundance of
Miozoa (5.43–5.85%) compared to the Not Recovered category (3.71% Chlorophyta and
2.79% Miozoa). Bacillariophyta had the greatest relative abundance in the Recovered
(66.26%) and Unimpaired (69.39%) categories, followed by Cyanobacteria (23.40% and
21.94%, respectively), whereas the biofilm composition in the Not Recovered category had the greatest relative abundance of Cyanobacteria (77.44%), followed by Bacillariophyta
(13.76%) (Fig. 11). The lower number of Bacillariophyta OTUs in the Not Recovered
(92) category in comparison with Unimpaired (155) was not statistically significant (P =
0.06) but may be biologically important. The number of Bacillariophyta OTUs in the Recovered category was not significantly different from that in the Not Recovered (P = 0.24) or Unimpaired categories
(P = 0.41).
The diatom-specific rbcL marker identified a greater number of diatom OTUs than either the 18S or UPA marker. The rbcL marker produced 301 diatom OTUs
(77,202 reads) in Unimpaired sites, 223 diatom OTUs (43,375 reads) in Recovered sites, and 174 diatom OTUs (15,052 reads) in Not Recovered sites after unknowns were removed. In total, only 96 OTUs were shared among all three categories (Fig. 2).
Recovered and Unimpaired sites shared more OTUs (121) than either Recovered or
Unimpaired sites shared with Not Recovered sites (36 and 32, respectively) (Fig. 2).
The number of diatom OTUs was significantly (P < 0.05) greater in Unimpaired sites than in Not
Recovered sites. The Recovered category was not significantly different from either of the other two categories. The specificity of the rbcL marker allowed relative abundance to be compared at the genus level (Fig. 12, Table 5). Unknown diatom reads were 16.4%,
29.7%, and 26.3% of the Unimpaired, Recovered, and Not Recovered relative abundance, respectively. These reads were removed prior to the final relative abundance calculation.
The major (>10% relative abundance) diatom genera at Unimpaired sites were, in descending order, Cymbella, Nitzschia, Navicula, and Encyonema. The major diatom genera at Recovered sites were Cymbella, Encyonema, Nitzschia, and Fragilaria. The relative abundance of Cymbella in Recovered (23.19%) and Unimpaired (22.60%) sites was much greater than in Not Recovered (0.24%) sites. The major genera in Not Recovered sites were Nitzschia, Navicula, Sellaphora, and
Fragilaria (Fig. 12, Table 5).
Alpha Diversity
In addition to relative abundance, alpha diversity was investigated using the Shannon diversity index, which was analyzed by category (Fig. 3). The Shannon index for 16S data (all
OTUs) showed significantly (P < 0.05) lower diversity in Not Recovered sites compared with the Recovered and Unimpaired sites, but no significant difference between
Unimpaired and Recovered sites. For the algal OTUs only (Cyanobacteria) in the 16S data, the Shannon index was highest in Unimpaired sites and lowest in Not Recovered sites, but there were no significant differences among stream categories (Fig. 3). The Shannon index using 18S OTUs (all and algal OTUs only) was not significantly different among stream categories. For the UPA marker, Shannon diversity was significantly greater (P < 0.05) in the Unimpaired than in the Not Recovered category, and the Recovered category was not significantly different from either the Unimpaired or Not Recovered category.
Comparison of the Shannon index for rbcL sequences among categories showed a similar trend, with the Unimpaired category significantly greater (P < 0.05) than the Not
Recovered, but no significant difference between the Recovered category and the other two categories (Fig. 3).
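For reference, the Shannon index is H' = -Σ p_i ln(p_i), where p_i is the proportion of reads belonging to OTU i. The Python sketch below computes it from a vector of read counts. It is a minimal example and not the code used in this study; the function name and toy counts are hypothetical, and the natural logarithm is assumed.

```python
import math


def shannon_index(counts):
    """Shannon diversity index H' (natural log) from a list of OTU read counts."""
    total = sum(counts)  # assumes at least one read
    proportions = [n / total for n in counts if n > 0]
    return -sum(p * math.log(p) for p in proportions)


# Two toy communities with the same number of OTUs but different evenness.
even_community = [25, 25, 25, 25]
uneven_community = [97, 1, 1, 1]
h_even = shannon_index(even_community)
h_uneven = shannon_index(uneven_community)
print(round(h_even, 3), round(h_uneven, 3))
```

The even community has the higher index although both contain four OTUs, which illustrates that the Shannon index responds to evenness as well as to the number of OTUs.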
DISCUSSION
Experimental Design
In this study, three independent replicates per stream site were utilized. The majority of OTUs were present in only one of the three replicates, and this finding was observed for all four markers. The lack of similarity among replicates was unexpected. This result may be due to the small amount of starting material in each DNA extraction relative to the amount of material collected for a sample. However, this is difficult to determine because analytical replicates of each sample were not performed.
Nevertheless, the individual replicates from a location did group together in the ordinations for most streams. These replicates were pooled and provided an overall picture of the diversity in the stream. Future stream biofilm metabarcoding projects should continue to include independent replicates, and potentially analytical replicates, to increase validation of OTUs at each site.
Sequencing eDNA using 16S and 18S markers has advanced our knowledge of community structure and diversity in stream biofilms (Bond et al. 2000, Souza-Egipsy et al. 2008, Baker et al. 2009, Lear et al. 2009). Multi-marker approaches have recently shown the ability to increase the taxonomic resolution of metabarcoding studies
(Marcelino et al. 2016, Pfendler et al. 2018). In the current study, the use of 16S, 18S,
UPA, and rbcL genes to amplify freshwater biofilm eDNA provided a broad taxonomic characterization of the prokaryotes and eukaryotes (16S and 18S markers) and a more detailed taxonomic representation of the algal community (UPA and rbcL). In my study, the 18S and UPA sequences represented the same 10 phyla of eukaryotic algae. Interestingly, the UPA marker showed a greater number of eukaryotic algal OTUs than the 18S marker in all three stream categories. The rbcL marker had a greater number of diatom OTUs than either the 18S or UPA marker. Previous studies have recommended the rbcL gene instead of 18S or UPA specifically for diatom barcoding because it has greater species-distinguishing capabilities (Evans et al. 2007,
Hamsher et al. 2011, Kermarrec et al. 2013). In situ metabarcoding of stream biofilms using 18S and 16S provided an outline of the diversity, but my study showed that algal-specific markers increased the number of OTUs and the taxonomic depth observed for algal taxa, similar to findings from previous studies (Marcelino et al. 2016).
A limiting factor in all metabarcoding studies is the strength and sensitivity of the taxonomic database (Kermarrec et al. 2013, Zimmermann et al. 2015). When analyzing relative abundance in this study, OTUs with undefined taxonomy or OTUs that could not be classified beyond kingdom were removed. In addition, certain taxa are repeated in the databases with different sequences, making it difficult to adequately characterize species richness. As the number of metabarcoding studies increases, it is imperative that we continue to investigate the taxonomy of unknown OTUs and OTUs with incorrect taxonomic assignments in the datasets to unlock the full potential of eDNA and metabarcoding (Elbrecht et al. 2017).
Comparison of Stream Types
This study was conducted to further investigate the change in biofilm algal diversity along an AMD recovery gradient. Of the 10 eukaryotic algal phyla sequenced with both 18S and UPA, representatives (OTUs) of each phylum were present in each stream category. Algal prokaryotes (Cyanobacteria) sequenced with 16S and UPA were also present in each stream category. Therefore, the biotic differences among the stream categories were not due to differences in the phyla present but potentially to differences among the taxa represented within a phylum. Although each phylum was represented, Shannon diversity showed significantly lower algal diversity (abundance and evenness) in Not Recovered streams compared to Unimpaired streams. Algal diversity increased along the AMD remediation gradient from Not Recovered to Recovered to
Unimpaired sites. The increase in algal diversity along an AMD gradient has been recorded in previous studies using morphology, molecular markers, and functional measures (Verb and Vis 2001, Bray et al. 2008, Drerup and Vis 2016). Benthic algae are the base of the food chain in aquatic ecosystems, and a decrease in algal diversity will alter the community, which may negatively affect the upper trophic levels (Sterner and
Hessen 1994). Impacts on upper trophic levels are measured with the MAIS, which correlated with stream categories in this study.
Individual OTUs within each phylum also varied depending on the AMD remediation category. Not Recovered sites differed significantly from, and clustered furthest from,
Unimpaired sites, and Recovered sites clustered in between or closer to Unimpaired sites.
Individual taxa within each algal phylum in biofilm communities can be more or less tolerant to AMD, which may explain the spread of OTUs across categories (e.g., Verb and Vis 2001,
Zalack et al. 2010, Pool et al. 2013). The Venn diagrams support the Bray-Curtis ordinations because the fewest OTUs were shared between Not Recovered and
Unimpaired sites. In addition, each stream category had a large number of OTUs distinctive to that category. This could be due to taxa with certain environmental tolerances or rare taxa that only occur in very small abundances and may not be present in reference databases (Zalack et al. 2010, Debroas et al. 2015).
Investigation of algal relative abundance in each stream category revealed a change in the percentage of diatoms (Bacillariophyta) in the biofilm community. The relative abundance of diatoms in the UPA and 18S data was much lower in the Not Recovered category compared to Unimpaired and Recovered. This result was also observed as a decrease in rbcL OTUs and UPA diatom OTUs in Not Recovered sites compared to
Unimpaired sites. Similarly, previous studies using morphological identification of biofilm taxa have observed a decrease in diatom species richness, diversity, and abundance with increased AMD impairment (Verb and Vis 2000, Smucker and Vis
2009). Diatoms are important contributors of polyunsaturated fatty acids (PUFAs) in the stream, which are a vital source of energy in the ecosystem (Napolitano 1999, Torres-Ruiz et al. 2007, Torres-Ruiz and Wehr 2010). Research investigating stream biofilm essential fatty acids reported a reduction in the proportion of PUFAs with increased AMD impairment compared to Unimpaired streams (DeForest et al. 2016, Drerup and Vis
2016). A decrease in PUFAs may also have a negative effect on the higher trophic levels, which rely on biofilm PUFAs because many organisms cannot synthesize their own
(Torres-Ruiz and Wehr 2010).
The UPA marker showed an increase in the relative abundance of Cyanobacteria in the Not Recovered sites, but this was not supported by the 16S data. Additionally, the 16S and UPA Cyanobacteria OTUs varied by site within each category rather than showing an overall increase in the Not Recovered category streams. These findings indicate that there is most likely not an increase in Cyanobacteria, but rather a drastic decrease in diatom relative abundance in the community, which increased the relative abundance of
Cyanobacteria. The drastic change in diatom relative abundance appears to be unique and is supported by multiple biological metrics.
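The compositional reasoning above can be illustrated with a small Python example in which the number of Cyanobacteria reads is held constant while the number of diatom reads is reduced. All read counts are hypothetical and chosen only for illustration; they are not data from this study.

```python
def percent_composition(read_counts):
    """Percent relative abundance of each taxon in a dict of read counts."""
    total = sum(read_counts.values())
    return {taxon: 100.0 * n / total for taxon, n in read_counts.items()}


# Hypothetical read counts for a reference community.
reference = {"Bacillariophyta": 700, "Cyanobacteria": 220, "Other": 80}
# Same Cyanobacteria and Other counts; only the diatom count is reduced.
impaired = dict(reference, Bacillariophyta=40)

reference_pct = percent_composition(reference)
impaired_pct = percent_composition(impaired)
print(round(reference_pct["Cyanobacteria"], 1))
print(round(impaired_pct["Cyanobacteria"], 1))
```

Although the Cyanobacteria read count is identical in both communities, its share of the total rises when diatom reads decline, because relative abundances must sum to 100%.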
Benthic diatom communities and individual taxa have been shown to be indicators of stream water quality (Pan et al. 1996, Winter and Duthie 2000, Porter et al.
2008, Stevenson 2014, Hausmann et al. 2016). Diatoms have been used to assess AMD impacts in this mining region (Smucker and Vis 2009, Zalack et al. 2010, Pool et al.
2013), and the diatom index of biotic integrity (AMD-DIBI) was developed specifically to assess AMD pollution in the Western Allegheny Plateau (Zalack et al. 2010). One metric used in this index is the relative abundance of Cymbella. This genus decreases in streams with increasing impairment. In the current study, a similar result was observed, with
Cymbella being very sparse in Not Recovered streams compared to Recovered and
Unimpaired streams. Likewise, Encyonema has been noted to be an indicator of streams with higher water quality and lower nutrients (Hausmann et al. 2016), and this genus was observed in greater relative abundance in Recovered and Unimpaired sites compared to
Not Recovered sites. Sellaphora has been recorded as an indicator of streams with high conductivity and neutral pH, similar to Not Recovered sites in this study (Hausmann et al.
2016). Sellaphora was in the greatest relative abundance in Not Recovered sites and decreased in abundance in Recovered and Unimpaired sites. Metabarcoding of diatoms using the rbcL marker allowed for genus and in some cases species identification of indicator taxa. As diatom rbcL reference databases continue to grow and be curated, a greater number of indicator species within the community can be investigated using this method.
Although the indicator taxa Cymbella, Encyonema, and Sellaphora were in relatively high abundance, other potential indicator taxa were only in low abundance.
Cocconeis has been reported to be an indicator of higher water quality (Wang et al.
2005). Cocconeis represented 0.08% of the relative abundance in Unimpaired and 0.2% in Recovered sites, but it was not observed in any of the Not Recovered sites or in one
Recovered site (HF039). In addition, Pinnularia was in very low abundance and has been described as a possible indicator of AMD impairment (Zalack et al. 2010).
Pinnularia had a relative abundance of 0.49% in Not Recovered, 0.02% in Recovered, and 0.006% in Unimpaired sites, representing a roughly 25- to 80-fold decrease in the higher quality streams, which is consistent with reports that it occurs in more impacted streams.
Although these taxa were found in low relative abundance in each category, they offer important insight into the stream water quality and remediation status.
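The approximate fold differences for Pinnularia follow directly from the relative abundances reported above, as the short Python calculation below shows. The percentages are those given in the text; the variable names are illustrative only.

```python
# Pinnularia relative abundance (%) by stream category, as reported in the text.
pinnularia_pct = {"Not Recovered": 0.49, "Recovered": 0.02, "Unimpaired": 0.006}

# Fold difference of the Not Recovered value relative to each other category.
fold_difference = {
    category: pinnularia_pct["Not Recovered"] / pct
    for category, pct in pinnularia_pct.items()
    if category != "Not Recovered"
}
for category, fold in fold_difference.items():
    print(category, round(fold, 1))
```

The two ratios are approximately 24.5 and 81.7, consistent with the rounded range stated above.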
Eunotia exigua and Frustulia are indicator diatoms previously reported in AMD-impaired streams in the area but were not observed in this study (Verb and Vis 2000,
Smucker and Vis 2009). Eunotia exigua was not in our reference database, but other
Eunotia species and Frustulia species were present in the reference database. If these organisms were present in the samples, they should have been identified at least to the genus level with the rbcL marker. Not Recovered sites in this study are not as severely impacted as the AMD-impacted sites in the studies that previously made these observations (Verb and Vis 2000, Smucker and Vis 2009). It is possible that the lower water quality sites chosen in this study had good enough water quality that Eunotia and Frustulia were not present.
Conclusion
Analysis of stream biofilm communities along an AMD remediation gradient using multi-marker metabarcoding supports biofilm community clustering by remediation status, similar to the MAIS. The biofilm community increases in diversity as streams increase in remediation status (from Not Recovered to Recovered and
Unimpaired). Most biological metrics used in this study showed Recovered sites to be more similar to Unimpaired sites than to Not Recovered sites. In addition, Not Recovered sites were more dissimilar to Unimpaired sites than to Recovered sites. Diatom indicator taxa established from previous studies corresponded to water quality and remediation status in the current study as well. Overall, this research showed that the data generated from a multi-marker metabarcoding approach provided insights into community-level differences in biofilm diversity among sites categorized in different states of AMD remediation. Numerous future directions could be explored, including sampling more streams in each category, seasonal studies, using other group-specific markers, and deeper taxonomic analysis as databases continue to be curated.
REFERENCES
Amaral-Zettler, L. A., McCliment, E. A., Ducklow, H. W., & Huse, S. M. 2009. A
method for studying protistan diversity using massively parallel sequencing of V9
hypervariable regions of small-subunit ribosomal RNA Genes. PLoS ONE 4:e6372.
Apprill, A., McNally, S., Parsons, R., & Weber, L. 2015. Minor revision to V4 region
SSU rRNA 806R gene primer greatly increases detection of SAR11
bacterioplankton. Aquat. Microb. Ecol. 75:129–137.
Arar, E. J., & Collins, G. B. 1997. In Vitro Determination of Chlorophyll a and
Pheophytin a in Marine and Freshwater Algae by Fluorescence. 1.2st ed.
Environmental Protection Agency, Office of Research and Development. Cincinnati,
Ohio.
Baker, B.J., Tyson, G.W., Goosherst, L. & Banfield, J.F. 2009. Insights into the diversity
of eukaryotes in acid mine drainage biofilm communities. Appl. Environ. Microbiol.
75:2192–9.
Belanger, S. E., Rupe, K. L., Lowe, R. L., Johnson, D. & Pan, Y. 1996. A flow-through
laboratory microcosm suitable for assessing effects of surfactants on natural
periphyton. Environ. Toxicol. Water Qual. 11:65-76.
Bray, J. P., Broady, P. A., Niyogi, D. K., & Harding, J. S. 2008. Periphyton communities
in New Zealand streams impacted by acid mine drainage. Mar. Freshwater Res.
59:1084-1091.
Bond, P. L., Smriga, S. P., & Banfield, J. F. 2000. Phylogeny of microorganisms
populating a thick, subaerial, predominantly lithotrophic biofilm at an extreme acid
mine drainage site. Appl. Environ. Microbiol. 66:3842–3849.
Callahan, B. J., McMurdie, P. J., Rosen, M. J., Han, A. W., Johnson, A. J., & Holmes, S.
P. 2016. DADA2: High-resolution sample inference from Illumina amplicon data.
Nat. Methods 13:581–583.
Caporaso, J. G., Kuczynski, J., Stombaugh, J., Bittinger, K., Bushman, F. D., Costello, E.
K., Huttley, G. A., et al. 2010. QIIME allows analysis of high-throughput
community sequencing data. Nat. Methods 7:335.
Caporaso, J. G., Lauber, C. L., Walters, W. A., Berg-Lyons, D., Lozupone, C. A.,
Turnbaugh, P. J., Fierer, N., & Knight, R. 2011. Global patterns of 16S rRNA
diversity at a depth of millions of sequences per sample. Proc. Natl. Acad. Sci. USA
108:4516–4522.
Costello, C. 2003. Acid Mine Drainage: Innovative Treatment Technologies. Report for
U.S. Environmental Protection Agency. Office of Solid Waste, and Emergency
Response Technology Innovation Office, Washington, DC. 52 pp. Available at:
www.clu-in.org/s.focus/c/pub/i/1054. (last accessed February 2017).
Debroas D., Hugoni M., & Domaizon, I. 2015. Evidence for an active rare biosphere
within freshwater protists community. Mol. Ecol. 24:1236-1247.
DeForest, J. L., Drerup, S. A. & Vis, M. L. 2016. Using fatty acids to fingerprint biofilm
communities: a means to quickly and accurately assess stream quality. Environ.
Monit. Assess. 188:277.
Drerup, S. A., & Vis, M. L. 2016. Responses of Stream Biofilm Phospholipid Fatty Acid
Profiles to Acid Mine Drainage Impairment and Remediation. Water Air Soil
Pollution 227:1–10.
Dopheide, A., Lear, G., Stott, R. & Lewis, G. 2009. Relative diversity and community
structure of ciliates in stream biofilms according to molecular and microscopy
methods. Appl. Environ. Microbiol. 75:5261–72.
Edwards, S. J. & Kjellerup, B. V. 2013. Applications of biofilms in bioremediation and
biotransformation of persistent organic pollutants, pharmaceuticals/personal care
products, and heavy metals. Appl. Microbiol. Biotechnol. 97:9909–21.
Elbrecht, V. & Leese, F. 2015. Can DNA-based ecosystem assessments quantify species
abundance? Testing primer bias and biomass—sequence relationships with an
innovative metabarcoding protocol. PLoS ONE 10:e0130324.
Elbrecht, V., Vamos, E. E., Meissner, K., Aroviita, J., & Leese, F. 2017. Assessing
strengths and weaknesses of DNA metabarcoding-based macroinvertebrate
identification for routine stream monitoring. Methods Ecol. Evol. 8:1265–1275.
Evans, K. M., Wortley, A. H., & Mann, D. G. 2007. An assessment of potential diatom
“barcode” genes (cox1, rbcL, 18S and ITS rDNA) and their effectiveness in
determining relationships in Sellaphora (Bacillariophyta). Protist. 158:349–364.
Gray, N. F. 1997. Environmental impact and remediation of acid mine drainage: A
management problem. Environ. Geol. 30:62–71.
Guschina, I. A. & Harwood, J. L. 2009. Algal lipids and effect of the environment on
their biochemistry. Lipids in Aquatic Ecosystems. Springer, New York, NY, pp. 1–
24.
Hamsher, S. E., Evans, K. M., Mann, D. G., Poulícková, A., & Saunders, G. W. 2011.
Barcoding diatoms: exploring alternatives to COI-5P. Protist 162:405–422.
Hausmann, S., Charles, D. F., Gerritsen, J., & Belton, T. J. 2016. A diatom-based
biological condition gradient (BCG) approach for assessing impairment and
developing nutrient criteria for streams. Sci. Total Environ. 562:914–927.
Hervé, M. 2018. RVAideMemoire: Testing and Plotting Procedures for Biostatistics. R
package version 0.9-69.
Johnson, K. S. 2007. Field and laboratory Methods for using the MAIS
(Macroinvertebrate Aggregated Index for Streams) in Rapid Bioassessment of Ohio
Streams. Available at: http://www.epa.ohio.gov/dsw/credibledata/references.aspx
(last accessed May 2017).
Kearse, M., Moir, R., Wilson, A., Stones-Havas, S., Cheung, M., Sturrock, S., Buxton, S.,
Cooper, A., Markowitz, S., & Duran, C., 2012. Geneious Basic: an integrated and
extendable desktop software platform for the organization and analysis of sequence
data. Bioinformatics 28:1647–1649.
Kermarrec, L., Franc, A., Rimet, F., Chaumeil, P., Humbert, J. F., & Bouchez, A. 2013.
Next-generation sequencing to inventory taxonomic diversity in eukaryotic
communities: a test for freshwater diatoms. Mol. Ecol. Resour. 13:607–619.
Lang, I., & Kaczmarska, I. 2011. A protocol for a single-cell PCR of diatoms from fixed
samples: method validation using Ditylum brightwellii (T. West) Grunow. Diatom
Research 26:43–49.
Lear, G., Niyogi, D., Harding, J., Dong, Y. & Lewis, G. 2009. Biofilm bacterial
community structure in streams affected by acid mine drainage. Appl. Environ.
Microbiol. 75:3455–60.
Legendre, P. & Gallagher, E. D. 2001. Ecologically meaningful transformations for
ordination of species data. Oecologia 129:271–280.
Marcelino, V. R., Verbruggen, H., Rosenberg, E., Koren, O., Reshef, L., Efrony, R., &
Zilber-Rosenberg, I. 2016. Multi-marker metabarcoding of coral skeletons reveals a
rich microbiome and diverse evolutionary origins of endolithic algae. Sci. Rep.
6:31508.
McMurdie, P. J., & Holmes, S. 2013. phyloseq: An R Package for Reproducible
Interactive Analysis and Graphics of Microbiome Census Data. PLoS ONE
8:e61217.
Napolitano, G. E., 1999. Fatty acids as trophic and chemical markers in freshwater
ecosystems. In Arts, M. T. & B. C. Wainman [Eds] Lipids in Freshwater
Ecosystems. Springer, New York, NY, pp. 21–44.
Nelson, C. E., Bennett, D. M., & Cardinale, B. J. 2013. Consistency and sensitivity of
stream periphyton community structural and functional responses to nutrient
enrichment. Ecol. Appl. 23:159-173.
OEPA. 1987. Biological criteria for the protection of aquatic life: volume II: users
manual for biological field assessment of Ohio surface waters. Div. Water Qual.
Monit. & Assess. Surface Water Section, Columbus, Ohio.
Oksanen, J., Blanchet, F. G., Kindt, R., Legendre, P., Minchin, P. R., O’Hara, R. B.,
Simpson, G. L., et al. 2013. Package “vegan” Community ecology package. R
package version 2.4-6.
Pan, Y., Stevenson, R. J., Hill, B. H., Herlihy, A. T., & Collins, G. B. 1996. Using
diatoms as indicators of ecological conditions in lotic systems: a regional
assessment. J. N. Am. Benthol. Soc. 15:481–495.
Pfendler, S., Karimi, B., Maron, P. A., Ciadamidaro, L., Valot, B., Bousta, F., Alaoui-
Sosse, L. 2018. Biofilm biodiversity in French and Swiss show caves using the
metabarcoding approach: First data. Sci. Total Environ. 615:1207-1217.
Pool, J. R., Kruse, N. A., & Vis, M. L. 2013. Assessment of mine drainage remediated
streams using diatom assemblages and biofilm enzyme
activities. Hydrobiologia, 709:101–116.
Porter, S. D., Mueller, D. K., Spahr, N. E., Munn, M. D., & Dubrovsky, N. M. 2008.
Efficacy of algal metrics for assessing nutrient and organic enrichment in flowing
waters. Freshw. Biol. 53:1036–1054.
Quast, C., Pruesse, E., Yilmaz, P., Gerken, J., Schweer, T., Yarza, P., Peplies, J., &
Glöckner, F. O. 2013. The SILVA ribosomal RNA gene database project: improved
data processing and web-based tools. Nucleic Acids Res. 41:D590–D596.
Ratnasingham, S., & Hebert, P. D. 2007. BOLD: The Barcode of Life Data System
(http://www.barcodinglife.org). Mol. Ecol. Notes 7:355–364.
Sherwood, A. R. & Presting, G. G. 2007. Universal primers amplify a 23S rDNA plastid
marker in eukaryotic algae and cyanobacteria. J. Phycol. 43:605–8.
Sherwood, A. R., Dittbern, M. N., Johnston, E. T., & Conklin, K. Y. 2017. A
metabarcoding comparison of windward and leeward airborne algal diversity across
the Ko‘olau mountain range on the island of O‘ahu, Hawai‘i. J. Phycol.
53:437–445.
Smucker, N. J., & Vis, M. L. 2009. Use of diatoms to assess agricultural and coal mining
impacts on streams and a multiassemblage case study. J. N. Am. Benthol. Soc.
28:659–675.
Smucker, N. J., Drerup, S. A., & Vis, M. L. 2014. Roles of benthic algae in the structure,
function, and assessment of stream ecosystems affected by acid mine drainage. J.
Phycol. 50:425–36.
Souza-Egipsy, V., González-Toril, E., Zettler, E., Amaral-Zettler, L., Aguilera, A. &
Amils, R. 2008. Prokaryotic community structure in algal photosynthetic biofilms
from extreme acidic streams in Rio Tinto (Huelva, Spain). Int. Microbiol.
11:251–60.
Stein, E. D., Martinez, M. C., Stiles, S., Miller, P. E., & Zakharov, E. V. 2014. Is DNA
barcoding actually cheaper and faster than traditional morphological methods:
results from a survey of freshwater bioassessment efforts in the United
States? PLoS One 9:e95525.
Sterner, R. W., & Hessen, D. O. 1994. Algal nutrient limitation and the nutrition of
aquatic herbivores. Annu. Rev. Ecol. Syst. 25:1–29.
Stevenson, R. J., Bothwell, M. L., Lowe, R. L., & Thorp, J. H. 1996. Algal ecology:
Freshwater benthic ecosystems. Academic Press.
Stevenson, J. 2014. Ecological assessments with algae: a review and synthesis. J. Phycol.
50:437–461.
Torres-Ruiz, M., & Wehr, J. D. 2010. Changes in the nutritional quality of decaying leaf
litter in a stream based on fatty acid content. Hydrobiologia 651:265–278.
Torres-Ruiz, M., Wehr, J. D., & Perrone, A. A. 2007. Trophic relations in a stream food
web: importance of fatty acids for macroinvertebrate consumers. J. N. Am. Benthol.
Soc. 26:509–522.
US Environmental Protection Agency (US EPA). 1994. Acid mine drainage prediction.
US EPA Report #EPA-530-R-94-036, US EPA Office of Solid Waste, Washington,
DC.
United States Environmental Protection Agency (US EPA). 1997. A citizen’s handbook
to address contaminated coal mine drainage. US EPA Report
#EPA-903-K-97-003, US EPA Region 3, Philadelphia, PA, USA.
Verb, R. G., & Vis, M. L. 2000. Comparison of benthic diatom assemblages from streams
draining abandoned and reclaimed coal mines and nonimpacted sites. J. N. Am.
Benthol. Soc. 19:274–288.
Verb, R. G., & Vis, M. L. 2001. Macroalgal communities from an acid mine drainage
impacted watershed. Aquat. Bot. 71:93–107.
Verb, R. G., & Vis, M. L. 2005. Periphyton assemblages as bioindicators of
mine-drainage in unglaciated Western Allegheny Plateau lotic systems. Water Air Soil
Pollut. 161:227–65.
Wang, Y. K., Stevenson, R. J. & Metzmeier, L. 2005. Development and evaluation of a
diatom-based index of biotic integrity for the Interior Plateau ecoregion, USA. J. N.
Am. Benthol. Soc. 24:990–1008.
Wetzel, R. G. 1983. Attached algal-substrata interactions: fact or myth, and when and
how? In Periphyton of freshwater ecosystems. Springer, Netherlands, pp. 207–215.
Winter, J. G., & Duthie, H. C. 2000. Epilithic diatoms as indicators of stream total N and
total P concentration. J. N. Am. Benthol. Soc. 19:32–49.
Zalack, J. T., Smucker, N. J., & Vis, M. L. 2010. Development of a diatom index of
biotic integrity for acid mine drainage impacted streams. Ecol. Indic. 10:287–295.
Zimmermann, J., Glöckner, G., Jahn, R., Enke, N., & Gemeinholzer, B. 2015.
Metabarcoding vs. morphological identification to assess diatom diversity in
environmental studies. Mol. Ecol. Resour. 15:526–542.
Table 1: Amplification primers and PCR settings for each gene marker. The citation refers to the primary primer sequence. The amplicon size (bp) is the size of the PCR product after amplification including primers. Thermocycler settings for the first PCR amplification.
Target    Primer Name  Citation                    Sequence (5’ – 3’)
                       Illumina Overhang + 0-2 N’s + primer (5’ – 3’)

16S (V4)  515F         Caporaso et al. 2011        GTGYCAGCMGCCGCGGTAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTGYCAGCMGCCGCGGTAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGTGYCAGCMGCCGCGGTAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGTGYCAGCMGCCGCGGTAA

          806RB        Apprill et al. 2015         GGACTACNVGGGTWTCTAAT
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGGGACTACNVGGGTWTCTAAT
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNGGACTACNVGGGTWTCTAAT
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNGGACTACNVGGGTWTCTAAT

18S       1391F        Amaral-Zettler et al. 2009  GTACACACCGCCCGTC
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGTACACACCGCCCGTC
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGTACACACCGCCCGTC
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGTACACACCGCCCGTC

          EukBr        Amaral-Zettler et al. 2009  TGATCCTTCTGCAGGTTCACCTAC
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTGATCCTTCTGCAGGTTCACCTAC
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNTGATCCTTCTGCAGGTTCACCTAC
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNTGATCCTTCTGCAGGTTCACCTAC

UPA (23S) P23SrV_f1    Sherwood & Presting 2007    GGACAGAAAGACCCTATGAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGGGACAGAAAGACCCTATGAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNGGACAGAAAGACCCTATGAA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNGGACAGAAAGACCCTATGAA

          P23SrV_r1    Sherwood & Presting 2007    TCAGCCTGTTATCCCTAGAG
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGTCAGCCTGTTATCCCTAGAG
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNTCAGCCTGTTATCCCTAGAG
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNTCAGCCTGTTATCCCTAGAG

rbcL      cfD          Hamsher et al. 2011         CCRTTYATGCGTTGGAGAGA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGCCRTTYATGCGTTGGAGAGA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNCCRTTYATGCGTTGGAGAGA
                       TCGTCGGCAGCGTCAGATGTGTATAAGAGACAGNNCCRTTYATGCGTTGGAGAGA

          DtrbcL3R     Lang and Kaczmarska 2011    ACACCWGACATACGCATCCA
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGACACCWGACATACGCATCCA
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNACACCWGACATACGCATCCA
                       GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAGNNACACCWGACATACGCATCCA
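The three overhang variants listed for each primer in Table 1 follow one pattern: the Illumina overhang, then zero, one, or two N's, then the gene-specific primer. A minimal Python sketch of that construction is given here for illustration only; the function name `staggered_primers` and the constant names are hypothetical and are not part of the thesis workflow. The overhang sequences are copied from Table 1.

```python
# Illumina overhang sequences as they appear at the start of the
# forward and reverse fusion primers in Table 1.
FORWARD_OVERHANG = "TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG"
REVERSE_OVERHANG = "GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG"


def staggered_primers(overhang, primer, max_spacer=2):
    """Return overhang + 0..max_spacer N's + primer (hypothetical helper)."""
    return [overhang + "N" * n + primer for n in range(max_spacer + 1)]


# 16S (V4) primers from Table 1.
forward_16s = staggered_primers(FORWARD_OVERHANG, "GTGYCAGCMGCCGCGGTAA")
reverse_16s = staggered_primers(REVERSE_OVERHANG, "GGACTACNVGGGTWTCTAAT")

for seq in forward_16s + reverse_16s:
    print(seq)
```

Each call returns three sequences that differ only in the length of the N spacer, matching the three entries per primer in the table.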
Table 2: Thermocycler Settings for 1st PCR Amplification
DNA Marker   1st Amplification Conditions (* = Touchdown PCR)

16S (V4)     Initial denaturing = 95˚C, 5 min
             28 Cycles:
               Denature = 98˚C, 20 s
               Annealing = 55˚C, 15 s
               Extension = 72˚C, 25 s
             Final Extension = 72˚C, 5 min

18S          Initial denaturing = 95˚C, 5 min
             27 Cycles:
               Denature = 98˚C, 20 s
               Annealing = 58˚C, 15 s
               Extension = 72˚C, 25 s
             Final Extension = 72˚C, 5 min

UPA          *Initial denaturing = 94˚C, 4 min
             1) 16 Cycles:
               Denature = 94˚C, 30 s
               Annealing = 66˚C (-0.5˚C / cycle), 30 s
               Extension = 72˚C, 30 s
             2) 19 Cycles:
               Denature = 94˚C, 30 s
               Annealing = 58˚C, 30 s
               Extension = 72˚C, 30 s
             Final Extension = 72˚C, 1 min

rbcL         Initial denaturing = 95˚C, 5 min
             31 Cycles:
               Denature = 98˚C, 20 s
               Annealing = 55˚C, 15 s
               Extension = 72˚C, 30 s
             Final Extension = 72˚C, 5 min
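The UPA touchdown program in Table 2 can be written out as a per-cycle annealing schedule. The Python sketch below is illustrative only and rests on an assumption not stated in the table: the first touchdown cycle anneals at 66˚C and the 0.5˚C decrement is applied after each cycle. The function name `touchdown_annealing` is hypothetical.

```python
def touchdown_annealing(start_c, step_c, touchdown_cycles, final_c, final_cycles):
    """Annealing temperature (deg C) for every cycle of a two-phase program.

    Assumes the first touchdown cycle runs at start_c and each later
    touchdown cycle is step_c lower than the one before it.
    """
    touchdown = [start_c - step_c * i for i in range(touchdown_cycles)]
    return touchdown + [final_c] * final_cycles


# UPA settings from Table 2: 16 touchdown cycles from 66 deg C at
# -0.5 deg C per cycle, followed by 19 cycles at 58 deg C.
schedule = touchdown_annealing(66.0, 0.5, 16, 58.0, 19)

for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} C")
```

Under this assumption the last touchdown cycle anneals at 58.5˚C, one step above the fixed 58˚C used for the remaining 19 cycles, for 35 cycles in total.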
Table 3: Mean (range) of chemical and physical parameters measured for streams in each category. Significant differences among categories (in bold) were calculated using a Kruskal-Wallis post hoc test (P < 0.05). Values that share a letter are not significantly different from each other. MAIS = Macroinvertebrate Aggregate Index for Streams, Chl. a = Chlorophyll a.
                              Category
Variable                      Not Recovered (n=3)   Recovered (n=3)       Unimpaired (n=2)

MAIS 2017                     12A (12-13)           15B (14-16)           14C (14)
Chl. a (µg/g)                 448.0 (187.9-868.6)   490.2 (157.7-971.7)   601.0 (300.7, 901.3)
pH                            7.7 (7.5-8.0)         7.5 (7.4-7.7)         7.6 (7.3, 7.9)
Specific Conductance (µS/cm)  613A (510-770)        490AB (380-560)       370B (210, 530)
Temp. (˚C)                    21A (19-22)           21A (20-23)           19B (19)
Mean Current Velocity (cm/s)  44.0A (26.9-72.7)     15.4B (5.0-32.5)      16.7B (7.3, 26.1)
Mean Depth (cm)               19.4AB (11.8-34.3)    31.1A (17.7-44.2)     14.5B (8.0, 21.0)
Width (m)                     6.7 (3.2-10.0)        9.1 (8.4-9.7)         9.0 (7.3, 10.7)
% Canopy                      30A (0-90)            70AB (70-80)          90B (80, 90)
Nitrate (mg/L)                0.08A (0.07-0.10)     0.05B (0.02-0.06)     0.03B (0.02, 0.03)
Iron (mg/L)                   0.17A (0.04-0.42)     0.32B (0.22-0.49)     0.15AB (0.13, 0.16)
Sulfate (mg/L)                120.0A (70.0-170.0)   56.7B (30.0-80.0)     22.5B (15.0, 30.0)
Table 4: Comparison of replicates within a stream site. Values are the mean percent similarity of OTUs in one, two, or three of the replicates with the range (%) among sites below.
Gene   OTUs in All Three   OTUs in Two         OTUs in One
       Replicates          Replicates          Replicate

16S    19.05               22.71               58.24
       (12.21 – 22.84)     (20.71 – 25.33)     (55.53 – 67.08)
18S    16.19               23.59               60.23
       (10.21 – 24.95)     (20.0 – 27.07)      (53.61 – 69.51)
UPA    17.50               22.24               60.86
       (8.57 – 38.94)      (17.53 – 38.94)     (47.78 – 73.27)
rbcL   15.38               31.76               52.86
       (4.05 – 27.37)      (15.75 – 43.93)     (34.21 – 77.40)
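One way percentages like those in Table 4 could be computed for a single site is sketched below in Python. This is an illustration under assumptions, not the procedure used in the thesis: each replicate is treated as a set of OTU identifiers, and the denominator is taken to be the number of distinct OTUs across the three replicates. The function name `replicate_overlap` and the OTU labels are hypothetical.

```python
from collections import Counter


def replicate_overlap(replicates):
    """Percent of distinct OTUs found in exactly 3, 2, or 1 replicates.

    Assumes presence/absence data and a denominator equal to the number
    of distinct OTUs pooled over all replicates at the site.
    """
    # For each OTU, count how many replicates it occurs in.
    occurrences = Counter(otu for rep in replicates for otu in set(rep))
    total = len(occurrences)
    if total == 0:
        raise ValueError("no OTUs found in any replicate")
    # Count how many OTUs occur in exactly k replicates.
    tally = Counter(occurrences.values())
    return {k: 100.0 * tally.get(k, 0) / total for k in (3, 2, 1)}


# Hypothetical OTU identifiers for three replicates from one site.
rep_a = {"otu1", "otu2", "otu3", "otu4"}
rep_b = {"otu1", "otu2", "otu5"}
rep_c = {"otu1", "otu6", "otu7", "otu8"}

overlap = replicate_overlap([rep_a, rep_b, rep_c])
print(overlap)
```

Averaging these per-site percentages across sites, and recording the minimum and maximum, would give a mean and range in the form reported in the table.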
Table 5: Mean Relative abundance (%) of diatom genera based on rbcL sequences for each stream category. Genera in bold are taxa noted to indicate water quality.
                Category
Genus           Not Recovered   Recovered   Unimpaired

Achnanthidium    7.14            2.65        3.47
Amphora          0.11            0.01        1.90
Cocconeis        0.00            0.20        0.08
Craticula        0.00            0.00        0.01
Cyclotella       0.32            1.19        2.35
Cymbella         0.24           23.19       22.60
Discostella      0.07            0.00        0.00
Encyonema        2.62           18.55       10.12
Fistulifera      1.50            1.59        2.68
Fragilaria      11.93           10.97        5.46
Gomphonema       8.24            9.48        3.40
Halamphora       0.07            0.03        2.80
Mayamaea         0.32            0.01        0.16
Melosira         0.56            2.66        4.85
Navicula        21.86            5.09       10.39
Nitzschia       25.71           15.47       17.10
Pinnularia       0.49            0.02        0.01
Planothidium     0.46            0.12        1.76
Reimeria         0.43            0.17        2.67
Sellaphora      16.70            7.38        3.47
Surirella        1.24            1.22        4.73