
ANALYZING ALGAL DIVERSITY IN AQUATIC SYSTEMS

USING NEXT GENERATION SEQUENCING

______

A Thesis

Presented to

The Honors Tutorial College

Ohio University

______

In Partial Fulfillment

of the Requirements for Graduation

from the Honors Tutorial College

with the degree of

Bachelor of Science in Environmental and Plant Biology

______

By

Mariah A. Thrush

May 2013

This thesis has been approved by The Honors Tutorial College and the Department of Environmental and Plant Biology

______Dr. Morgan Vis Professor, Environmental and Plant Biology Thesis Advisor

______Dr. Harvey Ballard Honors Tutorial College, Director of Studies Environmental and Plant Biology

______Jeremy Webster Dean, Honors Tutorial College

ACKNOWLEDGEMENTS

I’d like to thank my thesis advisor Dr. Morgan Vis for her time, support and advice throughout my academic career. She helped to keep me on track even as plans changed and changed again. Thanks also go to Dr. Harvey Ballard; he has been my Director of Studies and a source of support and advice since I began at Ohio University. Daryl Lam also spent a great deal of time teaching me genetic techniques and bioinformatics programs, for which I am immensely grateful. I’d also like to thank Vijay Nadella and the staff in the Ohio University Genomics Facility for their work; I wouldn’t have my data without their hard work with this new technology. Thanks to Dr. Kelly Johnson and the Boat of Knowledge program for collecting most of the large river samples for me. I would like to thank my labmates Lauren Fuelling, Emily Johnston, Eric Solamaki, Sam Drerup, and Emily Keil for their help, support, and friendship. A special thanks goes out to Sam Drerup for his help with Hewett Fork field collections, as well as chlorophyll a and phosphorus measurements. The following funding sources made it possible to complete this thesis: Honors Tutorial College Plant Biology Research fund, the Jeanette G. Grasselli Brown Undergraduate Research Award, and a Research and Creative Activity Grant from VP Research Office.

I want to thank my family and friends for their continued support through my undergraduate degree, especially my parents, my sister Sam, and my boyfriend Weylin. Your encouragement during the difficult times has meant a great deal to me.

TABLE OF CONTENTS

Acknowledgements...... 3
List of Tables...... 5
List of Figures...... 6
Introduction...... 8
Chapter One
Introduction...... 12
Methods...... 13
Results...... 19
Discussion and Conclusions...... 23
Chapter Two
Introduction...... 39
Methods...... 42
Results...... 47
Discussion and Conclusions...... 48
References...... 60

LIST OF TABLES

Table 1.1: Water chemistry for the Ohio and Muskingum Rivers...... 29

Table 1.2: Number of sequences after Prinseq filters were applied...... 30

Table 1.3: Number of sequences for genera included in positive control...... 32

Table 2.1: Primers used for PCR to prepare samples for multiplexing. Shaded bases indicate the barcode used to separate the different environmental samples during the bioinformatics pipeline...... 50

Table 2.2: Water chemistry data for sites along Hewett Fork. Data collected October 2, 2012. Site designations as in Fig. 2.1...... 51

Table 2.3: Calculated values for AFDM, Chl. a, and autotrophic index for sites along Hewett Fork. Site designations as in Fig. 2.1...... 52

LIST OF FIGURES

Figure 1.1: Positive control consisting of 11 unialgal cultures. The taxonomic groups identified in GenBank represent the unialgal cultures as follows: Cyanobacteria (Oscillatoria, Nostoc, Spirulina), Zygnemophyceae (Spirogyra), Euglenozoa (Euglena, Phacus), Chlorophyceae (Chlamydomonas, Volvox), Bacillariophyta (Navicula, Synedra), and Synurophyceae (Synura). X, Y, and Z were identified through a GenBank search, but do not represent the unialgal cultures...... 33

Figure 1.2: Ohio River phytoplankton sample composition from single chip data with a total of 19 taxonomic groups identified via a GenBank search...... 34

Figure 1.3: Muskingum River phytoplankton sample composition from single chip data with a total of 20 taxonomic groups identified via a GenBank search...... 35

Figure 1.4: Ohio River phytoplankton sample composition from multiplex data with a total of 19 taxonomic groups identified via a GenBank search...... 36

Figure 1.5: Muskingum River phytoplankton sample composition from multiplex data with a total of 19 taxonomic groups identified via a GenBank search...... 37

Figure 1.6: Composition of Ohio and Muskingum River phytoplankton single chip samples; sequences identified via a GenBank search in March 2012...... 38

Figure 2.1: Map of Hewett Fork sampling sites. Note that sites HF 137 and HF 129 are upstream of the doser, all other sites are downstream, and site HF 120 is at the mouth of Carbondale Creek flowing into Hewett Fork...... 53

Figure 2.2: Sites on Hewett Fork. A. HF 095 (not sampled) closer to the doser exhibited an orange deposit covering the stream bottom, making sampling impossible. B. HF 010 farther from the doser showed no orange sediments...... 54

Figure 2.3: Filters for AFDM before combustion. Site HF 120 was an orange color due to iron oxide...... 55

Figure 2.4: AFDM values for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression line for the samples is y = -0.9405x + 12.857...... 56

Figure 2.5: Chlorophyll a measurements for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression for the samples is y = 0.0044x + 0.0515...... 57

Figure 2.6: Autotrophic index values for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression for the samples is y = -18.682x + 222.78...... 58

Figure 2.7: Agarose gel of PCR products for sites along Hewett Fork. All sites had PCR product, but the bands for HF 129 and HF 120 are not visible on the image. Site designations as in Fig. 2.1...... 59

INTRODUCTION

Algae can be broadly defined as photosynthetic, oxygen-producing aquatic bacteria or protists, but there are exceptions to this definition: some algae are terrestrial, and some taxa are heterotrophic (Graham et al. 2009). Algae are a polyphyletic assemblage of organisms; several separate endosymbiotic events (primary, secondary, and tertiary) led to the diversity of algae seen today (Keeling 2010).

Depending on where they occur in the water column, algae can be categorized as phytoplankton or periphyton/benthic algae. Phytoplankton are algae adapted to live in the open water of any body of water, while periphyton/benthic algae are unicellular, filamentous, and colonial algae that attach to the stream bottom (benthos) (Graham et al. 2009). Phytoplankton vary in shape and in size, ranging from 0.2 µm to 20 mm (Graham et al. 2009). This variation stems from the need to balance buoyancy (to stay within the photic zone), surface area for nutrient uptake, and deterrence of predation (Reynolds 2006). Periphyton is found attached to the substratum of streams and within the littoral zone of lakes. The size and shape of periphyton are primarily influenced by physical habitat factors such as the level of disturbance, current velocity, and substratum, all of which help determine the algae present in a given habitat (Graham et al. 2009). Grazing pressure also influences which benthic algal species are present and the height to which they grow (Steinman 1996).

Algae (phytoplankton or periphyton) form the base of the food chain in aquatic ecosystems; primary production by algae is regarded as an important food source for consumers in streams (Minshall 1978, Vannote et al. 1980). Benthic algae can also help consolidate loose sediments to create a stable substratum for other organisms; some diatoms, such as Nitzschia curvilineata Hustedt, exude sticky mucilage that facilitates this process (Sutherland et al. 1998). Algae can modulate the chemical characteristics of their environment as well. For example, benthic algae take up various nutrients and thereby facilitate and mediate nutrient cycling in streams (Mulholland 1996); phytoplankton in lakes absorb various nutrients from the water column, helping to prevent lake eutrophication (Wetzel 1996).

Biodiversity is defined as the diversity of species within a specified area or the variety of genetic material within a discrete population (Swingland 2001). High biodiversity within a stream is important; research has found that having many taxa within an aquatic ecosystem helps keep streams healthy and functioning. For example, McGrady-Steed et al. (1997) showed that aquatic ecosystems with high biodiversity are more resistant to invasive species establishment. High biodiversity leads to ecosystems in which organisms have different morphologies and behaviors adapted to their habitats. For example, benthic algae have a wide variety of structures adapted to specific conditions and locations; Steinman (1996) outlined six growth forms that inhibit or facilitate various types of grazers. These growth forms and morphologies aid in visual identification, but learning to identify species takes a great deal of time and expertise.

Diversity of algae is typically analyzed and enumerated using microscopy, but molecular techniques have become a useful tool for researchers. DNA barcodes are short segments of DNA that are conserved enough to be common among related species, but different enough to distinguish between species (Saunders 2005). Sanger sequencing was initially used, but this process was time consuming because only one taxon is sequenced per reaction. Next generation sequencing using DNA barcodes is replacing Sanger sequencing because it can assess all species in a sample with a single reaction (Pfrender et al. 2010). Although some researchers are wary of sequencing technology (DeWalt 2011), the phycological community has embraced this powerful new tool. Currently, algal researchers primarily use DNA barcodes for phylogenetic purposes, to quickly identify species and detect cryptic species (Le Gall and Saunders 2010, Hamsher et al. 2011). Next generation sequencing is prevalent in macroinvertebrate biodiversity research (Ekrem et al. 2010, Hajibabaei et al. 2011), but its use in algal biodiversity research is still in its infancy. By using molecular techniques instead of traditional microscopy, the diversity of an algal community can be quickly assessed without expertise in algal taxonomic identification. Algal diversity can be used as a biomonitoring tool: the biodiversity within an ecosystem is catalogued, and inferences about the health of the ecosystem are then made from the composition of the algal community.

Biodiversity of algae, whether plankton or periphyton, is important to ecosystem function, and next generation sequencing of DNA barcodes is a promising new tool to elucidate diversity in a variety of habitats. Therefore, this thesis seeks to develop methodologies for conducting biodiversity studies using next generation sequencing and to use the resulting information to assess stream health. To develop methods, phytoplankton biodiversity of two samples from the Ohio and Muskingum Rivers was examined using DNA barcodes with next generation sequencing (Chapter 1). Using the methods developed for phytoplankton, an acid mine drainage remediation gradient along the length of Hewett Fork, a small stream in southeastern Ohio, was assessed with eight periphyton samples (Chapter 2). In this study, it was hypothesized that the sites farther downstream from the remediation site would have higher biodiversity because water quality would have greatly improved in comparison to the upstream sites. No previous studies have catalogued all species of benthic algae found within Hewett Fork, but approximately 80 species of diatoms have been recorded previously (Gray 2011, Pool et al. 2013).

CHAPTER ONE

INTRODUCTION

A large river is defined as a sixth-order or greater river with positive net primary productivity according to the River Continuum Concept (Vannote et al. 1980).

This type of river is typically deep and wide, so benthic algae are only found near the shallow banks and most of the productivity is from phytoplankton suspended in the photic zone of the water column. Examples of this river type in Ohio are the

Muskingum and Ohio Rivers. The Muskingum River is a tributary of the Ohio River, and the Ohio eventually drains into the Mississippi River. These rivers are used for commerce, and a series of locks and dams, 20 on the Ohio River and 11 on the Muskingum River, helps accommodate heavy industrial use (Benke and Cushing 2005). For the

Ohio River, farms and forests make up the majority of the land use in the watershed, but 1% of the basin has had mining activity (Benke and Cushing 2005). Current land use around the Muskingum River is dominated by deciduous forests (65%), with farming and urban development in the remaining 35% (ILGARD). The Ohio River has a diverse phytoplankton community despite light limiting conditions of high sediment load and a fast current. A survey of the river cataloged 134 taxa of cyanobacteria and algae with the most abundant algal groups being and diatoms (Wehr and

Thorp 1997). The survey conducted by Wehr and Thorp spanned three pools, each more than 100 km long, sampled once a month for 1 year; within each pool, several habitats were sampled, for a total of 40 samples per pool each month. There appear to have been no surveys of the Muskingum River, and little is known about its algal flora.

Next generation sequencing is a relatively new technology that has been used quite successfully to sequence whole genomes (e.g. Metzker 2009), but it has been less explored for sequencing entire populations (environmental sequencing) of eukaryotic groups such as algae or macroinvertebrates (Sherwood et al. 2008, Hajibabaei et al. 2011). Pfrender et al. (2010) asserted that next generation sequencing is the next major research method for taxonomic identification and environmental sampling. Coupled with DNA barcodes, this new research tool could make taxonomic identification of whole communities fairly quick, easy, and inexpensive, and it has shown great promise in assessing bacterial communities (Creer et al. 2010, Dopheide et al. 2009). Its power for elucidating algal diversity has yet to be fully explored.

This research was designed to fill the gap in our knowledge of how next generation sequencing can be used to analyze algal diversity. This new tool was used to investigate the phytoplankton diversity of samples from the Ohio and Muskingum Rivers. In addition, a sample with known algal diversity was analyzed at the same time to evaluate the accuracy of these methods.

METHODS

Both the Ohio and Muskingum are large rivers. The Ohio River is a 9th order stream forming the southern border of Ohio, and the Muskingum River is a tributary of the Ohio River that lies entirely within the state of Ohio. Ohio University’s Boat of Knowledge conducted research cruises coordinated with various southeastern Ohio high schools in fall 2011 (http://books.ohio.edu/). On this vessel, phytoplankton and water chemistry samples were collected from river mile 272 on the Ohio River and river mile 3 on the Muskingum River.

Phytoplankton was collected with a liter bottle from the top 0.5 m of the water column. A 250 mL sample was filtered onto a Whatman #6 filter disc. The filter disc was folded and placed into silica desiccant to preserve the DNA. Three replicate discs were prepared in order to ensure enough DNA for analysis. These discs were stored in the freezer (-20°C) until DNA extraction.

Water chemistry data were collected by deploying a DataSonde (Hydrolab, Loveland, CO) off the stern of the boat. The DataSonde recorded temperature, pH, specific conductance, dissolved oxygen, chlorophyll a (chl. a), phosphorus, and nitrogen. Data were collected three times at 15-minute intervals; the mean of these measurements was reported (Table 1.1).

For analysis of phytoplankton diversity via next generation sequencing, material from the silica preserved sample was gently scraped off the filter disc into a mortar. Liquid nitrogen was added to the mortar, and the pestle was used to grind the sample into a fine dust; the ground sample was transferred to a 1.5 mL centrifuge tube.

Since diatom silica frustules might remain intact, the ground sample in the first lysis buffer was frozen at -80 °C for 30 minutes and thawed at 25 °C for 15 minutes, and this cycle was repeated once to ensure that the frustules were broken, exposing the cell contents. After the freeze-thaw step, DNA was extracted using the NucleoSpin Plant II kit (Macherey-Nagel, Bethlehem, PA) following the manufacturer’s protocol, briefly described here. RNase A was added to the thawed solution, which was incubated at 65 °C for ten minutes. The solution was loaded into a filter tube and centrifuged so that the flow-through liquid passed through the filter while the DNA remained on it. A washing buffer was added to the filter tube and centrifuged to wash away contaminants and purify the DNA still on the filter. An additional washing buffer was used in the same manner as the first. A final volume of 100 µL of buffer was used to elute the DNA.

The DNA extract was used in a PCR reaction consisting of 2 µL of DNA, 1.5 µL of each primer, UPAf (5’-GGACAGAAAGACCCTATGAA-3’) and UPAr (5’-TCAGCCTGTTATCCCTAGAG-3’), 10.2 µL dH2O, and 15 µL AmpliTaq Gold® PCR Master Mix (Life Technologies, Grand Island, NY). These primers were chosen because they target Domain V of the 23S rRNA plastid gene, which is present only in photosynthetic organisms (Sherwood and Presting 2007). A 2720 Thermal Cycler (Applied Biosystems, Grand Island, NY) was used to perform touchdown PCR amplification as follows: 2 minutes at 95°C, then 35 cycles of 94°C for 30 s, annealing for 30 s starting at 66°C and dropping 0.5°C every cycle until 58°C was reached, and 72°C for 30 s (Sherwood et al. 2008). PCR products were prepared for sequencing using the UltraClean PCR Clean-up kit (MO Bio Laboratories, Inc., Carlsbad, CA) following the manufacturer’s protocol, briefly described here. A buffer was added to the PCR product, and the solution was transferred to a filter tube and centrifuged; the buffer allows the DNA to bind to the filter, while other PCR components (e.g. unused primers, genomic DNA) passed through the filter and were discarded. Another buffer was added to the filter and centrifuged to remove additional unwanted compounds while leaving the DNA on the filter. The cleaned PCR product was resuspended in 25 µL of water instead of the typical 50 µL to concentrate the product for sequencing.
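The touchdown annealing step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the thermal cycler's actual program: the function name and its parameters are hypothetical, and it assumes that the first cycle anneals at 66°C, that each later cycle is 0.5°C cooler, and that all cycles after the 58°C floor is reached stay at 58°C.

```python
def touchdown_annealing_schedule(start_c=66.0, floor_c=58.0, step_c=0.5, cycles=35):
    """Return the annealing temperature (deg C) for each PCR cycle.

    Assumption: cycle 1 anneals at start_c and every later cycle is
    step_c cooler, never going below floor_c.
    """
    return [max(floor_c, start_c - step_c * i) for i in range(cycles)]


schedule = touchdown_annealing_schedule()

# 1-based number of the first cycle that anneals at the 58 deg C floor
first_floor_cycle = schedule.index(58.0) + 1

for cycle, temp in enumerate(schedule, start=1):
    print(f"cycle {cycle:2d}: anneal at {temp:.1f} C")
print(f"floor reached at cycle {first_floor_cycle}")
```

Under these assumptions the floor is reached at cycle 17, so roughly the first half of the 35 cycles are the stringent "touchdown" cycles and the remainder run at the constant 58°C annealing temperature.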

A known mixture of algal species was created to serve as a positive control for the experiment. All kits and procedures outlined above were used to collect and process this sample. Unialgal cultures were obtained from Carolina Biological Supply Company (Burlington, NC). Cultures were chosen to represent a wide taxonomic sampling of cyanobacteria and algae as follows: cyanobacteria (Oscillatoria, Nostoc, Spirulina), green algae (Chlamydomonas, Micrasterias, Scenedesmus, Spirogyra, Ulothrix, Volvox), diatoms (Navicula, Synedra), Synurophyte (Synura), Euglenozoa (Euglena, Phacus), and dinoflagellate (Peridinium). Each culture was filtered onto a separate disc, and DNA was extracted using the same method described for the phytoplankton samples. Each DNA extract was amplified via PCR using the same method described for the phytoplankton samples, and amplification of PCR product from each culture was checked on an agarose gel. No PCR product was obtained from three of the green algae (Micrasterias, Ulothrix, Scenedesmus) or from the dinoflagellate Peridinium, so these cultures were not used in the remainder of the study. The PCR products of the other eleven cultures were combined prior to using the same PCR cleanup kit as for the phytoplankton samples.

The cleaned PCR product samples were submitted to the Ohio University Genomics Facility. The facility personnel performed the rest of the laboratory work to obtain the sequence data. The procedures followed at the Genomics Facility are briefly outlined here. All the equipment and consumables used were manufactured by Life Technologies (Grand Island, NY). The ~400 base pair (bp) samples were purified and fragmented into 200 bp lengths to accommodate the size restrictions associated with next generation sequencing (200 bp chemistry). Adapters were ligated to the 200 bp fragments; these adapters expedite the process of creating a library by allowing population DNA to be sequenced without the need for clonal purification. An E-Gel® Agarose Gel Electrophoresis system (Life Technologies, Grand Island, NY) was used to size-select 300 bp fragments (200 bp fragment + adapters) from the entire sample. The fragments were quantified using a qPCR kit with the Stratagene MX3000p (Agilent Technologies, Santa Clara, CA), and a high sensitivity chip was used to determine the number of DNA molecules per microliter; the sample was then diluted to a concentration of approximately 300 million molecules per microliter. An ePCR was completed to attach the fragments with adapters to small beads following the manufacturer’s protocols; a thermocycler (Life Technologies, Grand Island, NY) was used to complete this step. After enriching the sample to increase the concentration, the beads were loaded onto a 314 next-generation chip; there is only one bead per well so that each fragment is sequenced separately. The chip was loaded into the Ion Torrent System and run according to the manufacturer’s protocol.

Sequence data for the Muskingum and Ohio River samples were generated in February 2012, when the Ion Torrent system was new to the Ohio University Genomics Facility. Each site sample was placed on a separate 314 chip; these samples will be referred to as Ohio single and Muskingum single. In February 2013, it was decided to amplify and sequence these DNA extracts again and to add a positive control (a sample with a known algal mix) because Ion Torrent technology had progressed such that all three samples could be multiplexed (i.e. run together on the same 314 chip). The two river samples from this run will be referred to as Ohio multiplex and Muskingum multiplex. It should also be noted that in the time between the two sequencing runs, the technique for the 200 bp chemistry was refined and the chemistry was subsequently changed to maximize the efficiency of each chip. These changes provided the opportunity to determine whether multiplexing and/or different chemistries alter the results for the two river sites.

FASTQ files (Li et al. 2008) were received from the Genomics Facility. These files contain all sequences produced by the Ion Torrent system along with a quality score for each base, called a PHRED score (Green and Ewing 1998). The PHRED scale is logarithmic: a score of 10 corresponds to a 1 in 10 chance that the base call is wrong, a score of 20 to a 1 in 100 chance, and so on; generally, a score of 20 or higher is considered acceptable (Daryl Lam, pers. comm.). The FASTQ files were analyzed and processed using the program Prinseq, version 0.20.3 (Schmieder and Edwards 2011). Prinseq allows users to cull data of a certain quality or type from the tens of thousands of reads that next generation sequencing can produce. Sequences with the following characteristics were eliminated from the data set: > 100 bp, exact duplicate sequences, exact reverse complement sequences, and/or PHRED quality scores < 25. Remaining sequences were trimmed on both the 3' and 5' ends based on a window size of 10, a step size of 1, and a threshold value of < 25.
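The logarithmic PHRED scale described above follows the standard relation Q = -10 log10(P), where P is the probability that a base call is wrong. The short Python sketch below illustrates the conversion; the function names are hypothetical, and the decoding helper assumes the common Sanger-style FASTQ encoding (ASCII offset 33), which should be checked against the actual files.

```python
import math


def phred_to_error_probability(q):
    """Probability that a base call with PHRED score q is wrong."""
    return 10 ** (-q / 10)


def error_probability_to_phred(p):
    """PHRED score corresponding to an error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(quality, offset=33):
    """Convert a FASTQ quality string to a list of PHRED scores.

    Assumption: Sanger-style encoding (ASCII offset 33).
    """
    return [ord(ch) - offset for ch in quality]


for q in (10, 20, 25):
    p = phred_to_error_probability(q)
    print(f"Q{q}: error probability {p:.4f} (about 1 in {round(1 / p)})")
```

On this scale, the threshold of 25 used for filtering and trimming corresponds to an error probability of roughly 0.3% per base call.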

Window size indicates that 10 base pairs were analyzed at a time for an average PHRED score of 25 or above; step size indicates that the window advanced one base at a time, and trimming stopped once a window had an average PHRED score of 25 or above. The trimmed sequences remaining after processing were saved into a FASTA file (Pearson and Lipman 1988). Because the FASTA files were large, a custom Perl script (Daryl Lam, unpublished) was used to split each environmental sample into FASTA files of 5000 sequences each to expedite supercomputer processing. The files were uploaded to the Ohio Supercomputer, and all FASTA files were run through BLAST (Basic Local Alignment Search Tool) (Altschul et al. 1990) against the National Center for Biotechnology Information’s GenBank nucleotide database to identify as many sequences as possible and return a zipped .blastn file. Once BLAST was finished, the .blastn and .fasta files were uploaded and converted to .rma files so the program MEGAN (MEtaGenome Analyzer) (Huson et al. 2011) could be used to analyze the identified sequences. Pie charts of algal community composition for each environmental sample were created in MEGAN.
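The sliding-window trimming described in the methods above can be illustrated with a small Python sketch. This is not Prinseq's code and may differ from its exact behavior; the function name is hypothetical, and the sketch assumes that bases are removed from an end one step at a time until the window at that end has a mean PHRED score at or above the threshold.

```python
def trim_by_quality(scores, window=10, step=1, threshold=25):
    """Return (start, end) of the retained region of a read (end exclusive).

    scores: list of per-base PHRED scores.
    Assumption: an end is trimmed `step` bases at a time until the mean
    score of the `window` bases at that end is >= threshold.
    """
    n = len(scores)

    def window_ok(lo, hi):
        chunk = scores[lo:hi]
        return sum(chunk) / len(chunk) >= threshold

    # Trim the 5' end (left side of the read).
    start = 0
    while start < n and not window_ok(start, min(start + window, n)):
        start += step
    start = min(start, n)

    # Trim the 3' end (right side of the read).
    end = n
    while end > start and not window_ok(max(end - window, start), end):
        end -= step
    end = max(end, start)

    return start, end


# Hypothetical read: 3 poor bases, 20 good bases, then 4 very poor bases.
example_scores = [10] * 3 + [30] * 20 + [5] * 4
print(trim_by_quality(example_scores))  # (1, 25)
```

Because the test is on the window mean, a few low-quality bases can survive at the ends when the rest of the window is of high quality, as in the example, where two of the three poor bases at the 5' end are retained.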

RESULTS

The two river sites showed variation in water chemistry (Table 1.1). The Muskingum River had a chl. a measurement six times greater than that of the Ohio River. The water temperature differed by 6 °C between the two sites, but this difference was probably associated with sampling date, as the Ohio River was sampled on September 14, 2011 and the Muskingum River on October 5, 2011. In addition, the phosphorus level in the Ohio River was relatively low, whereas that of the Muskingum was much higher, indicating a eutrophic status (Table 1.1). The pH, specific conductance, nitrate level, and dissolved O2 were similar for the two sites.

Pie charts were created by categorizing all sequences into higher taxonomic levels of either phyla or classes for the diatoms and green algae. This grouping was done to reduce the number of taxonomic entities displayed in each chart. All pie charts were standardized to the same taxonomic groups in order to facilitate comparisons among samples.

The positive control sample composed of 11 algal genera yielded 47,522 sequences after the Prinseq filter. A total of 28,003 sequences were identified via GenBank (Table 1.2). The sequences were categorized into 16 taxonomic groups (Fig. 1.1). Sequence data in GenBank matched 9 of the genera present in the positive control (Table 1.3). Two genera, Volvox and Navicula, were not present in GenBank, so their sequences from the positive control could not be matched. However, GenBank does contain sequences of other genera of the family Naviculaceae within the Bacillariophyta, as well as other members of the class Chlorophyceae within the Chlorophyta (Table 1.2). Taxa identified via GenBank but not included in the control sample were as follows: Cryptophyta, Dinophyceae, Embryophyta, Eustigmatophyceae, PX clade, Rhodophyta, Trebouxiophyceae (Fig. 1.1). The four most represented taxa in the community were Cyanobacteria (Oscillatoria, Nostoc, Spirulina) (33%), Zygnemophyceae (Spirogyra) (21%), Euglenozoa (Euglena, Phacus) (17%), and green algae (Chlorophyceae + Trebouxiophyceae) (Chlamydomonas) (15%). Bacillariophyta (Navicula, Synedra) (4%) and Synurophyceae (Synura) (3%) were also present. The unusually large proportion of Proteobacteria reads (3%) was surprising because the touchdown PCR method employed should have inhibited most non-photosynthetic bacterial sequences from amplifying (Sherwood et al. 2008). Excluding Proteobacteria, non-photosynthetic bacteria (Actinobacteria + Bacteroidetes) represented 0.001% of the community composition.

From the Ohio single sample, a total of 29,417 sequences were produced; of those, 21,176 could be identified via GenBank (Table 1.2). The division Cyanobacteria was the best-represented taxonomic group, accounting for 47% of the identified sequences (Fig. 1.2). Green algae (Chlorophyceae, Prasinophyceae, Trebouxiophyceae, and Ulvophyceae) represented 20% of the identified sequences, and the other abundant sequences were diatoms (Bacillariophyta) (14%) and Eustigmatophyceae (7%). The remaining 12% of sequences were from 11 other photosynthetic taxonomic groups (Fig. 1.2). The primarily non-photosynthetic Proteobacteria accounted for 0.001% of the sequences.

The Muskingum single sample produced more sequences than the Ohio single sample with 82,604 total sequences, of which 52,671 were identified through GenBank (Table 1.2). The division Cyanobacteria was the most abundant group with 47% of the identified sequences (Fig. 1.3). Green algae (Chlorophyceae, Prasinophyceae, Trebouxiophyceae, and Ulvophyceae) were 25% of sequences, Eustigmatophyceae were 15%, and diatoms (Bacillariophyta) represented 8%. The 13 other photosynthetic taxonomic groups contributed 5% of the sequence diversity. Non-photosynthetic bacteria (Firmicutes and Proteobacteria) were a small proportion, only 0.001% of the sequences.

A total of 21,401 sequences were produced from the Ohio multiplex sample, of which 13,995 were identified via GenBank (Table 1.2). Cyanobacteria accounted for 54% of the sequences identified, the largest percentage of cyanobacteria within all the environmental samples from either the 2012 or 2013 sequencing (Fig. 1.4). Green algae (Chlorophyceae, Prasinophyceae, Trebouxiophyceae, and Ulvophyceae) were 19% of the sequences, diatoms (Bacillariophyta) were 8%, and Eustigmatophyceae were 7%. The other 12% of the community was represented by the remaining 12 taxonomic groups. Sequences attributed to non-photosynthetic bacteria (Proteobacteria and Verrucomicrobia) were only 0.004% of the community composition (Fig. 1.4).

The Muskingum multiplex sample contained 47,544 sequences total, of which 27,501 were identified through GenBank (Table 1.2). Cyanobacteria represented 51% of the community composition, while green algae (Chlorophyceae, Prasinophyceae, Trebouxiophyceae) accounted for 21%, Eustigmatophyceae for 11%, and diatoms (Bacillariophyta) for 6% (Fig. 1.5). The remaining 11% of sequence diversity was represented by 13 other taxa. Non-photosynthetic bacteria (Proteobacteria and Verrucomicrobia) accounted for only 0.006%.

All four Ohio and Muskingum samples (single or multiplex) had similar compositions, with cyanobacteria representing approximately one half of each. Community composition of the multiplexed samples was very similar to that of the single samples, but not identical, even though the same extracted DNA was used from each river. For example, the percentages of Cyanobacteria in the single and multiplexed samples were very similar: Ohio single = 47%, Muskingum single = 47%, Ohio multiplex = 54%, Muskingum multiplex = 51%. Within each sequencing run, the Ohio River sample had less than half the number of sequences of the Muskingum sample, and the Ohio multiplex sample had less than half the number of the positive control. The Muskingum multiplex sample and the positive control had similar numbers of reads, but the Muskingum single sample had nearly twice as many (Table 1.2). Of the 4 river samples, the Ohio multiplex sample contained the smallest number of sequences, while the Muskingum single sample contained the largest number of sequences.
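The read-count comparisons above can be checked with simple arithmetic on the totals reported in the text. The following Python sketch is illustrative only: the dictionary and function names are hypothetical, and the numbers are the total and GenBank-identified sequence counts quoted in this section.

```python
# (total sequences, sequences identified via GenBank), as quoted in the text
READS = {
    "Positive control": (47522, 28003),
    "Ohio single": (29417, 21176),
    "Muskingum single": (82604, 52671),
    "Ohio multiplex": (21401, 13995),
    "Muskingum multiplex": (47544, 27501),
}


def identified_fraction(sample):
    """Fraction of a sample's sequences that were identified via GenBank."""
    total, identified = READS[sample]
    return identified / total


def read_ratio(sample_a, sample_b):
    """Ratio of total sequences in sample_a to total sequences in sample_b."""
    return READS[sample_a][0] / READS[sample_b][0]


for sample in READS:
    print(f"{sample}: {identified_fraction(sample):.1%} of sequences identified")

print(f"Ohio single / Muskingum single: {read_ratio('Ohio single', 'Muskingum single'):.2f}")
print(f"Ohio multiplex / Muskingum multiplex: {read_ratio('Ohio multiplex', 'Muskingum multiplex'):.2f}")
print(f"Muskingum single / Muskingum multiplex: {read_ratio('Muskingum single', 'Muskingum multiplex'):.2f}")
```

With these counts, between roughly 58% and 72% of the sequences in each sample were identified, and the Ohio sample has fewer than half as many sequences as the Muskingum sample within each run.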

When the data from the single chip samples were generated, they were processed in March 2012 with the sequences available in GenBank at the time. When these samples were reanalyzed in February 2013 using the data available in GenBank at that date, the composition of the pie charts differed considerably. The earlier analysis showed cyanobacteria and diatoms each representing approximately one third of the community composition (Fig. 1.6). After observing the large difference in taxonomic structure between the multiplexed and single samples, the single samples were reanalyzed via GenBank at the same time as the multiplex samples in April 2013 to eliminate GenBank sequence coverage as a source of variation.

DISCUSSION

The difference in the number of sequences between the Ohio and Muskingum River sites may reflect the chl. a measurements, with the low chl. a in the Ohio suggesting low biomass and the much higher chl. a in the Muskingum suggesting high biomass. This appears to be a real difference since it was seen in both the single chip and multiplex data. The positive control also had roughly twice as many sequences as the Ohio samples. The high levels of chl. a may be partially attributed to the elevated levels of phosphorus recorded in the Muskingum River sample as compared to the Ohio River sample.

The positive control sample with a known mix of 11 algal genera was an excellent example to highlight the advantages and disadvantages of utilizing GenBank for sequence identification. Volvox was not included in the identified sequences because GenBank does not contain any plastid sequences for Volvox, even though it is a very common freshwater alga. A whole-genome shotgun (WGS) sequence for

Volvox was located in GenBank, but there were large gaps within the sequences, making it impossible to locate the UPA plastid sequence and, therefore, the WGS sequence was not useful. In addition, there are few diatom sequences within GenBank, relative to the over 250 genera of diatoms recognized (Round et al. 1990). This lack of sequence data could explain the absence of the diatom genus Navicula and the low numbers of Synedra sequences recorded in the positive control. Although we did get a relatively accurate overall representation of the algae present, our results were limited by the sequences available in GenBank.

In addition to unidentified sequences, there were sequence identifications that appear to have been initially misleading. The large number of Proteobacteria sequences found within the positive control was puzzling at first. When the sequences

were examined more closely, some of them belonged to the order Rhodospirillales, a group of photosynthetic bacteria nicknamed “purple bacteria” (Trüper and Pfennig 1978). Since

Rhodospirillales is photosynthetic, it would possess plastids that contained the UPA barcode used. Similarly, the dinoflagellate (Dinophyceae) sequences recovered from the positive control seemed unlikely. We attempted to include a dinoflagellate,

Peridinium, in the positive control, but it did not PCR amplify. The failure of

Peridinium to PCR amplify was expected since Sherwood and Presting (2007), when developing this barcode region, noted that they could not get PCR amplification of dinoflagellates and therefore this region would not be able to detect these organisms.

Upon further research into these dinoflagellate (Dinophyceae) sequences in GenBank, it was discovered that they were from a recent study on dinoflagellates engulfing diatoms and utilizing the diatoms’ plastids (Imanian et al. 2010). These sequences were classified as dinoflagellate origin in GenBank, but really were derived from diatoms and were probably identified in the positive control because of the Navicula and Synedra sequences. The small number of Rhodophyta reads in the positive control most likely can be attributed to contamination within the laboratory because that is the group of organisms that are researched in the laboratory. Therefore, it would appear that these next generation sequencing techniques are very sensitive and further precautions are needed such as filtering and extracting in a sterile hood to prevent outside contamination. Field samples could also be filtered and extracted in a sterile hood, or filtering could occur in the field, though care should be taken to thoroughly clean equipment in between sites to prevent cross contamination from site to site.

There were other unexplained taxonomic groups identified from the positive control as follows: Cryptophyta, Embryophyta, Eustigmatophyceae, and the

Phaeophyceae-Xanthophyceae clade. The identification of these sequences may be explained several different ways. These may have been minute traces of contamination from the commercial cultures. It is also possible that they were misidentifications due to using only a 200 bp segment of a 400 bp barcode. In the future, using cultures from research-based culture collections such as the Culture Collection of Algae at Gottingen

University could eliminate the possibility of outside contamination as these cultures would be of better quality. In addition, the 400 bp chemistry for the Ion Torrent system was released in March 2013, and using this method would provide the whole barcode such that misidentifications due to short fragments would be greatly reduced.

GenBank is updated on a daily basis with sequences from newly published and unpublished sources. This constant influx of new sequences proved to be an issue in this study for analyzing the river data. The community composition pie charts from

GenBank identifications produced immediately after the single chip sequencing versus those produced a year later displayed noticeably different results from the same sequence data for each site. Therefore, if data are to be compared, the GenBank search needs to be conducted at the same time due to changes in the data from year to year in

GenBank.

Although the aforementioned shortcomings of GenBank caused initial discrepancies when examining the data, it is important to note that the algal diversity

between the two experiments (single chip versus multiplex) was very similar, but not identical, despite using the same extracted DNA for both experiments. There may be a few additional sources of variation including changing Ion Torrent chemistries, human error, and the absence or presence of multiplexing samples. When the single chip samples were tested, the Ohio University Genomics Facility had just received the Ion

Torrent and those data were among the first samples to be run through the Ion Torrent machine. Since this was a new procedure, there could have been human error due to inexperience with the system. Also in 2012, samples were not yet being multiplexed, so there was more ‘space’ for each river sample per chip in the single chip samples as opposed to the multiplexed river samples and positive control. The changing chemistries between the two time periods could also bias the data towards certain taxonomic groups.

Wehr and Thorp (1997) conducted an algae survey of the Ohio River. Of the

134 taxa recorded, 60% were diatoms (Bacillariophyta), while the other 40% consisted of cyanobacteria, green algae (Chlorophyceae, Prasinophyceae, Trebouxiophyceae), diatoms (Bacillariophyta), Cryptophyta and dinoflagellates (Dinophyceae). The community composition of the Ohio River samples from the next generation sequencing was quite different. There are numerous explanations for this difference such as the lack of diatom sequences in GenBank. However, there are other potential sources of variation as well, such as the locations sampled on the River and time of year sampled.

Overall, much information was gleaned from the five next generation samples.

In theory using next generation sequences for the capture of algal diversity would appear to be very straightforward. However, there are significant hurdles to be overcome before it can be put into practice. One of the more significant advancements that will probably make this technology more robust for diversity surveys is the 400 bp chemistry, which will allow for the sequencing of the whole barcode and cut back on misidentifications and sequences that can’t be identified. Along with that technological advance, more standardized protocols for all aspects from collecting the sample in the field through the sequencing of the environmental sample will need to be developed and evaluated. The positive control sample results attested to the sensitivity of the method to DNA in the laboratory. Lastly, a reference database for this particular barcode would be of great benefit to focus on particular taxa that need sequencing and to better assess the sequences from GenBank.

Table 1.1. Water chemistry for the Ohio and Muskingum Rivers.

Site       Date     Temp (°C)  pH   Specific conductance (µS/cm)  Nitrate (ppm)  Phosphorus (ppm)  Chl. a (µg/l)  Dissolved O2 (mg/l)
Ohio       9/14/11  25         7.4  506                           0.9            0.31              31.3           6.6
Muskingum  10/5/11  16         7.8  740                           0.4            2.06              206.1          9.2

Table 1.2. Number of sequences after Prinseq filters were applied to the raw sequence data with taxonomic group identified via GenBank search. PX clade indicates the Phaeophyceae-Xanthophyceae clade. Columns, left to right: Positive control, Ohio single, Muskingum single, Ohio multiplexed, Muskingum multiplexed.

Bacillariophyta
Bacillariophyta 1045 2929 4280 1108 1724
Chlorophyceae 3246 2010 6815 1116 3914
Pedinophyceae 24
Prasinophyceae 297 145 162 75
Trebouxophyceae 1070 2103 6052 1352 1972
Ulvophyceae 13 17 7
Cryptophyta
Cryptophyta 22 776 809 770 524
Cyanobacteria
Cyanobacteria 9265 10070 23958 7583 14612
Dinoflagellata
Dinophyceae 626 347 1428 236 677
Euglenozoa
Euglenozoa 4854 401 201 205 121
Haptophyta
Haptophyceae 19 6
Heterokontophyta
Eustigmatophyceae 11 1410 7688 947 2898
PX clade 16 387 19 211 17
Raphidophyceae 148 69
Synurophyceae 868
Glaucophyta
Glaucocystophyceae 5
Embryophyta 37 7 9 9
Mesostigmatophyceae 129 234 47 34
Streptophytina 5
Zygnemophyceae 5932 78 648 90 658
Rhizaria 11 10 6 8
Rhodophyta
Rhodophyta 47 35 264 26 86
Non-photosynthetic bacteria
Actinobacteria 6 6
Bacteriodetes 32
Firmicutes 6
Proteobacteria 932 20 45 42 165
Verrucomicrobia 9 5
Total sequences identified by GenBank 27039 21156 52620 13944 27331
Total sequences, identified and unidentified by GenBank 47,522 29,417 82,604 21,401 47,544

Table 1.3. Number of sequences identified via GenBank for genera included in positive control.

Taxonomic group                      Number of sequences
Cyanobacteria
  Oscillatoria                       3,496
  Nostoc                             1,403
  Spirulina                          1,854
Chlorophyta
  Chlamydomonas                      1,598
  Spirogyra                          5,905
Bacillariophyta
  Synedra                            5
Synurophyceae
  Synura                             868
Euglenozoa
  Euglena                            21
  Phacus                             2,049
Genus not found, but family found
  Volvox (Chlorophyceae)             3,246
  Navicula (Naviculaceae)            248

Figure 1.1. Positive control consisting of 11 unialgal cultures. The taxonomic groups identified in GenBank represent the unialgal cultures as follows: Cyanobacteria (Oscillatoria, Nostoc, Spirulina), Zygnemophyceae (Spirogyra), Euglenozoa (Euglena, Phacus), and Chlorophyceae (Chlamydomonas, Volvox), Bacillariophyta (Navicula, Synedra) and Synurophyceae (Synura). Cryptophyta, Dinophyceae, Embryophyta, Eustigmatophyceae, PX (Phaeophyceae-Xanthophyceae) clade, Rhodophyta, Trebouxiophyceae were identified through a GenBank search, but do not represent the unialgal cultures.

Figure 1.2. Ohio River phytoplankton sample composition from single chip data with a total of 19 taxonomic groups identified via a GenBank search.

Figure 1.3. Muskingum River phytoplankton sample composition from single chip data with a total of 20 taxonomic groups identified via a GenBank search.

Figure 1.4. Ohio River phytoplankton sample composition from multiplex data with a total of 19 taxonomic groups identified via a GenBank search.

Figure 1.5. Muskingum River phytoplankton sample composition from multiplex data with a total of 19 taxonomic groups identified via a GenBank search.

Figure 1.6. Composition of Ohio and Muskingum River phytoplankton single chip samples; sequences identified via a GenBank search in March 2012.

CHAPTER TWO

INTRODUCTION

Coal mining has been an important industry in the past and remains prevalent today. In the US, the industry has been regulated since 1977. A legacy of pre-regulation mining is acid mine drainage (AMD), which is a significant environmental problem worldwide and throughout most of Appalachia (Gray 1997). Thousands of miles of streams within Appalachia have been affected by AMD and the Ohio

Department of Natural Resources has cataloged 1,300 affected stream miles in Ohio

(ODNR). AMD is the acidic, metal rich water that flows from abandoned mines.

When iron pyrite (FeS2) and other sulphidic materials found within the mines come in contact with water and oxygen, they are oxidized to sulphuric acid (H2SO4) and ferric hydroxide (Johnson and Hallberg 2005). In regions where the streams do not have natural buffering capacity, the water coming from the mine may be less than pH 3 and laden with dissolved metals. When these waters mix with streams, the AMD can cause both physical and chemical stress on a stream and the biota (Hogsden and Harding

2012). The soluble metals, such as aluminum (Al) and manganese (Mn) in the acidified water, precipitate around pH 5. This precipitate is a physical stressor, coating the stream bottom and making it unsuitable habitat for much of the aquatic life (Planas

1996).
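
The oxidation of pyrite described above can be summarized as a single balanced equation. The form below is the commonly quoted net stoichiometry and is offered only as an illustrative summary; it is not taken from the sources cited in this chapter, and the intermediate steps (oxidation of ferrous to ferric iron and hydrolysis of ferric iron) are omitted.

```latex
4\,\mathrm{FeS_2} + 15\,\mathrm{O_2} + 14\,\mathrm{H_2O} \rightarrow 4\,\mathrm{Fe(OH)_3} + 8\,\mathrm{H_2SO_4}
```

Under this stoichiometry each mole of pyrite yields two moles of sulphuric acid and one mole of ferric hydroxide, which is consistent with the low pH and the metal precipitates noted for AMD-impacted streams.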

Since AMD is so harmful to aquatic life and limits water uses, there have been significant efforts to ameliorate this type of pollution. Remediation for AMD can be

categorized into passive and active treatment. Active treatments like water aeration and lime addition require ongoing action to maintain stream restoration (Johnson and

Hallberg 2005). If these methods should stop at any point, the AMD effects would return and potentially destroy any remediation achieved. For example, the lime doser in a southeastern Ohio watershed (Hewett Fork) was offline for two weeks and research showed that fish were seriously impacted, while macroinvertebrates displayed little impact (Kruse et al. 2012). Passive treatments like wetland buffers and limestone drainage areas do not require ongoing maintenance, which make them appealing options (Johnson and Hallberg 2005). Passive treatments have been successful in many cases (Walter et al. 2012), but some streams require more action to meet State and Federal water quality standards. A passive wetland was first constructed in Hewett Fork (Farley et al. 2004), but found to be ineffective (NPS,

2009), necessitating the installation of an active treatment lime doser.

Remediation, either passive or active, is quite expensive and there is a need to determine the effectiveness. Typically, to gauge the efficacy of remediation, both chemical and biological sampling is conducted (NPS 2011). While both chemical and biological parameters are important to determine the health of a stream, biological parameters can detect one-time events that chemistry wouldn’t identify unless a measurement was taken before, during, and after an event (Dodds 2002). Fish and macroinvertebrates are commonly sampled, and many different metrics have been developed to quantify and rank streams according to relative stream health. The Index of Biotic Integrity (IBI) (Karr 1981) and the Index of Wellbeing (Iwb) (Patrick et al.

1973) are usually used for fish. Both indices utilize several metrics including species abundance, diversity, and percentage of tolerant/sensitive species, which are assigned a point value; fish caught within a stream reach are identified and tallied, and the points are combined to produce an index score correlated to the health of an ecosystem. The Invertebrate Community Index (ICI) (Ohio EPA 1988) was adapted from the IBI for macroinvertebrates; the Macroinvertebrate Aggregated Index for

Streams (MAIS) (Smith and Voshell 1997) is also commonly used. Similar to the fish metrics, macroinvertebrate metrics are given a point value, and communities receive a score based on the species collected.

Fish and macroinvertebrate indices have been widely used, but they provide a partial picture of the biotic community. Benthic algae, particularly diatoms, could help provide a more complete picture, as they are the primary producers in the stream.

Diatoms are not as commonly used as fish and macroinvertebrates for freshwater surveys, but diatom indices have been developed. Zalack et al. (2010) developed the

AMD Diatom Index of Biotic Integrity (AMD-DIBI) that could also be implemented into biological surveys. The AMD-DIBI uses the same concept and point system that was developed for the IBI, but it is tailored to the diatom community; for example, instead of examining different fish types (darters, suckers), percentages of acidophilic and eutraphentic diatoms are analyzed (Zalack et al. 2010). The AMD-DIBI scores are determined with the traditional method of examining acid-boiled diatom frustules via microscopy; the process of preparing and identifying diatoms takes a great deal of time and expertise to complete. It should be noted that the entire photosynthetic

periphyton community is typically not sampled and analyzed (but see Verb and Vis

2005).

All of these indices use the biodiversity of a habitat to find relative stream health. Biodiversity is defined as the diversity of species or genetic material within a demarcated area (Swingland 2001). Using any of the metrics discussed above requires identification expertise that can take years to acquire. Next generation sequencing could bypass this issue and allow a researcher inexperienced with taxonomic identification to analyze biodiversity of a stream community.

In this study, periphyton diversity along an AMD active treatment gradient was examined using next generation sequencing. It was hypothesized that the distance from the doser will correlate with benthic algae composition; sites further away from the doser should have a greater number of algal taxa while the sites closer to the doser would have few taxa and lower diversity due to the stressful chemical conditions.

METHODS

Study Site Description

Hewett Fork is a tributary of Raccoon Creek in Southeast Ohio, within Athens and Vinton Counties (Fig. 2.1). In the early 1900s, mining took place near river mile

11 from Rice Hocking underground mine, which was abandoned in 1923 (Farley et al.

2004). In 1991, the Ohio Department of Natural Resources and the Division of

Mineral Resources Management funded a passive wetland treatment system. The acid load decreased somewhat, but still average acid load was high (329 kg/day) (Farley et

al. 2004). To ameliorate the effects of Hewett Fork acidic water input into Raccoon

Creek, an Aqua-Fix® alkaline doser was placed near Carbondale in 2004 to neutralize the acidified waters and provide better water quality, which should in turn result in the recovery of the biological community (Kruse et al. 2012). As a result, a gradient of restoration is present along the length of Hewett Fork (Fig. 2.2). All sites included in this study are located along the main stem of Hewett Fork, with the exception of HF

120, which is located at the mouth of Carbondale Creek leading into Hewett Fork just below the doser (Fig. 2.2). In order to monitor the efficacy of the remediation, the

Raccoon Creek Partnership (RCP), a watershed group, monitors 117 river miles on a regular basis for chemistry, fish and macroinvertebrates (NPS 2011).

Field Methods

Nine of the permanent RCP sites were visited on October 2, 2012. At each site, water temperature, pH, and conductivity were measured with a hand-held meter

(Myron L Ultrameter 6P), and current velocity was measured with a pygmy meter

(USGS Pygmy meter model 6205). Water samples were collected for analysis by a

professional lab (Cambridge Lab) of total alkalinity, Ca²⁺, total SO4, hardness, and total dissolved metals including Al, Mn, and Fe. A water sample was collected in the field and filtered through a Whatman #6 filter for measurement of inorganic phosphorus in the laboratory. Periphyton was sampled by randomly collecting ten rocks in a riffle, and scraping a known area (7.1 cm² per rock) using a toothbrush and an O-ring template. The slurry from the rocks was pooled, placed in a 250 ml Nalgene bottle and stored on ice for transport to the laboratory.

Laboratory Methods

Within 12 hours of sampling, the periphyton sample was apportioned for a voucher sample, DNA sequencing, chlorophyll a (chl. a), and ash-free dry mass

(AFDM). For the voucher, 10 ml of the periphyton sample was preserved in 2.5% calcium carbonate buffered glutaraldehyde. For DNA analysis, 1 ml of the sample was filtered onto a Millipore 0.45 µm filter disc. The filter disc was folded and placed into a 20 ml scintillation vial approximately ½ full of silica desiccant (Fisher brand 200 mesh). Three replicates of the desiccant-dried material were made as a precaution against lab processing errors. An aliquot of 1 ml was filtered onto a Whatman #6 filter for the chl. a analysis; the filters were wrapped in aluminum foil and placed immediately into the freezer until further processing. For AFDM, 5 ml of sample was filtered onto a pre-combusted, dried, and weighed Whatman filter.

For chl. a analysis, the protocol by Arar and Collins (1997) was followed. The filter was steeped in 8 ml of 90% acetone overnight. The supernatant was decanted into a cuvette and the fluorescence was measured using the TD-700 fluorometer

(Turner Design).

For AFDM, the protocol of Steinman et al. (1996) was followed. The filter was placed into a drying oven until the filter reached a constant mass. The filter was placed in a desiccator to return to room temperature. It was weighed and oxidized at 500°C for two hours to ensure that the thick film was completely combusted. The oxidized filter was placed into the desiccator to return to room temperature, and the final weight recorded.

Inorganic phosphorus was analyzed by using methodology from Stainton et al.

(1974). Water samples collected in the field were kept frozen and in the dark until processing. For each sample, 1 ml of color reagent was added to 5 ml of a water sample. The sample was set aside for a fifteen-minute reaction time to allow the color to develop. The samples were measured using a spectrophotometer

(Thermo Spectronic, Genesys 20).

Calculations

AFDM was calculated using the following equation:

$$\mathrm{AFDM} = m_{\text{precombusted loaded filter}} - m_{\text{combusted loaded filter}}$$

Chl. a was calculated using the following equations:

$$\text{Chl. } a = \left[ \frac{R_d / R_c}{(R_d / R_c) - 1} \times (R_d - R_c) \right]$$

Where Rd is the loaded filter fluorescence before acidification and Rc is the filter fluorescence 90 seconds after acidification. Chl. a value is expressed as µg/l.

The autotrophic index (AI) is a unitless ratio of AFDM in mg and Chl. a in mg/l:

$$\mathrm{AI} = \frac{\mathrm{AFDM}}{\text{Chl. } a}$$

This ratio is informative in that higher values indicate more heterotrophic conditions while lower values indicate more autotrophic conditions.
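
As a worked illustration of the ratio, the short Python sketch below recomputes the autotrophic index from the AFDM and chl. a values listed in Table 2.3. It assumes that chl. a, which the table reports in µg/l, is first converted to mg/l, and it uses the cutoff of 100 attributed to Hauer and Lamberti (2007) in the Discussion to label each site. The function and variable names are illustrative only and are not part of the original analysis.

```python
# Autotrophic index (AI) = AFDM (mg) / chl. a (mg/l), recomputed from Table 2.3.
# Assumption: chl. a is listed in µg/l and must be converted to mg/l first.

HETEROTROPHIC_THRESHOLD = 100  # cutoff cited from Hauer and Lamberti (2007)

# site: (AFDM in mg, chl. a in µg/l), copied from Table 2.3
TABLE_2_3 = {
    "HF 137": (9, 55.4),
    "HF 129": (17, 78.1),
    "HF 120": (10, 44.8),
    "HF 090": (6, 89.4),
    "HF 060": (6, 32.7),
    "HF 045": (9, 74.1),
    "HF 039": (6, 126.9),
    "HF 010": (6, 69.0),
}


def autotrophic_index(afdm_mg, chl_a_ug_per_l):
    """Return AFDM divided by chl. a after converting chl. a to mg/l."""
    chl_a_mg_per_l = chl_a_ug_per_l / 1000.0
    return afdm_mg / chl_a_mg_per_l


def classify(ai):
    """Label a site using the threshold of 100 (values above are heterotrophic)."""
    return "heterotrophic" if ai > HETEROTROPHIC_THRESHOLD else "autotrophic"


results = {site: autotrophic_index(*values) for site, values in TABLE_2_3.items()}

for site, ai in results.items():
    print(f"{site}: AI = {ai:.1f} ({classify(ai)})")
```

Values recomputed this way may differ from Table 2.3 in the last decimal place because the tabulated chl. a values are rounded.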

DNA Extraction

For analysis of algal diversity via next generation sequencing, the silica-preserved sample was gently scraped off the filter disc into a mortar. Liquid nitrogen was added to the mortar, and the pestle was used to grind the sample into a fine dust. The ground sample was transferred into a 1.5 ml centrifuge tube. Since diatom silica frustules might remain intact, the ground sample with the first lysis buffer added was frozen at -80 °C for 30 minutes and thawed at 25 °C for 15 minutes; this cycle was repeated once to ensure that the diatom frustules were broken, exposing the cell contents. DNA was extracted using the NucleoSpin Plant II kit (Macherey-Nagel) following the manufacturer’s protocol after the freeze-thaw step; 100 µl of DNA was eluted.

PCR and Next Generation Sequencing – 400 bp chemistry

The PCR conditions and reaction mix as stated in Chapter 1 were utilized with the exception of the primer sequences. To facilitate sample multiplexing, four barcodes were designed and attached to the forward primer (p23F) (Table 2.1). This primer modification made it possible to put PCR product from two sites on one 314 sequencing chip and then sort the sequences by sites using bioinformatics after sequencing. The DNA was extracted, PCR amplified and cleaned using the same methods outlined in Chapter 1. The extracted and cleaned DNA was processed by the

Ohio University Genomics Facility for next-generation sequencing via the Ion Torrent

(Life Technologies) with 400 bp chemistry using the same methods outlined in

Chapter 1, except 400 bp sequences were kept whole instead of split into 200 bp segments.
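
The sorting of multiplexed reads by barcode can be sketched in a few lines of Python. This is a minimal illustration and not the pipeline used by the Genomics Facility. It assumes that the barcode is the 10-base stretch at positions 31 to 40 of each forward primer in Table 2.1 (between the shared 30-base adapter and the 20-base p23F primer), that each read begins with its barcode, and that only exact matches are accepted. The example reads are made up, and the bins are named after the primers because the assignment of barcodes to sites is not stated here.

```python
# Sketch: assign reads to bins using the barcode at the start of each read.
# Assumptions (not from the thesis): barcodes are bases 31-40 of the forward
# primers in Table 2.1, reads start with the barcode, and matching is exact.

BARCODES = {
    "CTAAGGTAAC": "p23F_BC1",
    "TAAGGAGAAC": "p23F_BC2",
    "AAGAGGATTC": "p23F_BC3",
    "TACCAAGATC": "p23F_BC4",
}
BARCODE_LENGTH = 10


def demultiplex(reads):
    """Group reads by leading barcode; the barcode is trimmed from assigned reads."""
    bins = {name: [] for name in BARCODES.values()}
    bins["unassigned"] = []
    for read in reads:
        seq = read.upper()
        name = BARCODES.get(seq[:BARCODE_LENGTH])
        if name is None:
            bins["unassigned"].append(seq)  # keep the untrimmed read for inspection
        else:
            bins[name].append(seq[BARCODE_LENGTH:])
    return bins


# Hypothetical reads for illustration only
example_reads = [
    "CTAAGGTAACGGACAGAAAGACCCTATGAAGCTT",  # begins with the BC1 barcode
    "TACCAAGATCGGACAGAAAGACCCTATGAAGCAA",  # begins with the BC4 barcode
    "GGGGGGGGGGGGACAGAAAGACCCTATGAA",      # no recognized barcode
]
binned = demultiplex(example_reads)

for name, seqs in binned.items():
    print(name, len(seqs))
```

Exact matching is the simplest choice; a production pipeline would typically also tolerate sequencing errors within the barcode.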

RESULTS

The values for AFDM, chl. a and the autotrophic index were calculated (Table

2.3). AFDM for the sites ranged from 6 to 17 mg. The filter from site

HF 120, just below the doser, exhibited a red-orange color (Fig. 2.3). Sites HF 137,

HF 129 and HF 120 all had AFDM measurements ≥ 9 mg; the highest AFDM was at site HF 129, just upstream of the doser (Fig. 2.4). The sites farther from the doser had a mass of ≤ 9 mg. Chl. a measurements ranged from 0.0327 to 0.1269 mg/L (Table

2.3). Most sites had a measurement between 0.04 and 0.10 mg/L, with the exception of

HF 060 with a lower value and HF 039 with a higher value (Fig. 2.5). The values for the sites upstream of the doser were similar to the most downstream sites of the doser.

Autotrophic index had a broad range of values from 47.28 to 223.2 (Table 2.3). The four closest sites to the doser (HF 137, HF 129, HF 120, HF 060) had values above 160, while the four sites farther from the doser had values lower than 130 (Fig. 2.6).

All sites, except HF 190, yielded PCR product. Therefore, HF 190 was excluded from the experiment. Sites HF 129 and HF 120 had very light bands relative to the others (Fig. 2.7), but they could still be used in the study. Sites HF 090, HF 039, and HF 010 produced strong bands (Fig. 2.7).

The 200 bp chemistry requires 50 ng of PCR product for sequencing. The periphyton samples yielded far less. Attempts were made to increase the yield by pooling numerous PCR products of each site, by using ready-to-go PCR beads (GE

Healthcare, Pittsburgh, PA) and by pooling products of these PCR beads. However, none of these methods yielded enough PCR product. At the same time, ABI

announced 400 bp chemistry, which would be a significant improvement for this study as the PCR product/barcode is ~400 bp and the samples would not need such a high yield. However, due to unforeseen circumstances and delays, the data for the 400 bp next generation sequencing were not gathered in time. The 400 bp chemistry for Ion

Torrent was released in mid-March 2013 after several delays from the originally planned release date in early January 2013. The first 400 bp sequence data produced in late March had technical difficulties, triggering the weeks-long process of troubleshooting the new chemistry.

DISCUSSION

The restoration gradient along Hewett Fork has been documented in numerous studies (Kruse et al. 2012, Pool et al. 2013, Gray and Vis 2013). The data for chl. a, and autotrophic index values in the present study also showed this gradient. The chl. a values displayed a clear positive trend, with sites farther from the doser having higher chl. a values than sites closer to the doser implying that the algal community closer to the doser has less biomass when compared to sites farther downstream. This same trend in chl. a data has been noted in an earlier study of Hewett Fork (Gray 2011). The results of the autotrophic index indicated that sites close to the doser had heterotrophic conditions, while the sites farther from the doser had comparatively autotrophic conditions. According to Hauer and Lamberti (2007), a value over 100 indicates a heterotrophic community and therefore, only sites HF 090, HF 039, and HF 010 were autotrophic. In streams of this size, in natural conditions there should be an

autotrophic community composed of numerous photosynthetic taxa (Vannote et al.

1980).

The AFDM values were contrary to the expected result. The sites closer to the doser were expected to have lower biomass due to decreased biomass in stressed conditions as compared to sites further from the doser in less stressed conditions. This trend was also seen in Hewett Fork data from Fuelling (2013), though not as noticeable. Kruse et al. (unpublished) found that calcium hydroxide precipitates close to the doser turn into calcium oxide precipitates when combusted, releasing water that artificially increases the organic content; this would explain why site HF 120, the site immediately downstream of the doser, would have an unusually high AFDM value.

The red-orange color seen on the AFDM filter from site HF 120 can be partially explained by the water chemistry at that site. The pH at HF 120 dropped to 3.26; Fe3+ precipitates into iron hydroxides at a pH above 2 (Trendall and Morris, 2000; Wei et al., 2005).

The absence of the next-generation data using 400 bp chemistry highlights the risks of using new technology: release dates can be pushed back multiple times and the first releases might not work. When on a strict time schedule for a thesis, this all spells trouble in the end. The Ohio University Genomics Facility is continuing to troubleshoot the

400 bp chemistry and we hope to receive the data at a later date.

Table 2.1 Primers used for PCR to prepare samples for multiplexing. Shaded bases indicate barcode to separate the different environmental samples during the bioinformatics pipeline.

Primer name  Primer sequence
p23F_BC1     5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGCTAAGGTAACGGACAGAAAGACCCTATGAA-3'
p23F_BC2     5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGTAAGGAGAACGGACAGAAAGACCCTATGAA-3'
p23F_BC3     5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGAAGAGGATTCGGACAGAAAGACCCTATGAA-3'
p23F_BC4     5'-CCATCTCATCCCTGCGTGTCTCCGACTCAGTACCAAGATCGGACAGAAAGACCCTATGAA-3'
p23R         5'-CCTCTCTATGGGCAGTCGGTGATTCAGCCTGTTATCCCTAGAG-3'

Table 2.2 Water chemistry data for sites along Hewett Fork. Data collected October 2, 2012. Site designations as in Fig. 2.1.

Site    Temp  pH   Conductivity  Total alkalinity  Hardness      Total Fe  Total Al  Total Mn  Total Ca  SO4     Inorganic P
        (°C)       (µS/cm)       (mg/l)            (mg CaCO3/l)  (mg/l)    (mg/l)    (mg/l)    (mg/l)    (mg/l)  (µg/l)
HF 137  14    5.2  518           2.42              235           0.18      0.902     2.46      60.2      246     0.067
HF 129  14    6.5  513           16.3              221           1.25      <0.050    1.56      57.7      223     0.069
HF 120  16    3.3  1070          0.0               279           14.90     18.1      1.98      68.5      501     0.074
HF 090  17    6.9  979           27.2              477           0.58      0.058     0.587     161       502     0.066
HF 060  16    6.8  926           26.1              450           0.46      <0.050    0.382     148       471     0.069
HF 045  16    6.7  937           24.1              556           0.22      0.055     0.136     152       471     0.067
HF 039  16    6.7  874           25.7              426           0.55      0.117     0.261     138       421     0.069
HF 010  15    6.6  696           28.8              320           0.380     0.080     0.259     95.2      305     0.0678

Table 2.3. Calculated values for AFDM, Chl. a, and autotrophic index for sites along Hewett Fork. Site designations as in Fig. 2.1.

Site    AFDM (mg)  Chl. a (µg/l)  Autotrophic Index
HF 137  9          55.4           162.4
HF 129  17         78.1           217.7
HF 120  10         44.8           223.2
HF 090  6          89.4           67.1
HF 060  6          32.7           183.5
HF 045  9          74.1           121.5
HF 039  6          126.9          47.3
HF 010  6          69.0           87.0

Figure 2.1. Map of Hewett Fork sampling sites. Note that sites HF 137 and HF 129 are upstream of the doser and all other sites are downstream; site HF 120 is at the mouth of Carbondale Creek flowing into Hewett Fork.

Figure 2.2. Sites on Hewett Fork. A. HF 095 (not sampled) closer to the doser exhibited an orange deposit covering the stream bottom making sampling impossible. B. HF 010 farther from the doser showed no orange sediments.

Figure 2.3. Filters for AFDM before combustion. Site HF 120 was an orange color due to iron oxide.

Figure 2.4. AFDM values for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression line for the samples is y = -0.9405x + 12.857.
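
The regression line quoted in this caption can be reproduced with an ordinary least-squares fit. The sketch below assumes that x is the rank of each site in the order listed in Table 2.3 (1 = HF 137 through 8 = HF 010) and that y is AFDM in mg; with that assumption the fit returns the coefficients given above. The function name is illustrative and the fitting tool actually used for the figure is not stated.

```python
# Least-squares line for AFDM against site order, using the values in Table 2.3.
# Assumption: x is the rank of each site in table order (1 = HF 137 ... 8 = HF 010).

afdm_mg = [9, 17, 10, 6, 6, 9, 6, 6]          # HF 137 ... HF 010
site_rank = list(range(1, len(afdm_mg) + 1))  # 1, 2, ..., 8


def least_squares(x, y):
    """Return (slope, intercept) of the ordinary least-squares line through (x, y)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = least_squares(site_rank, afdm_mg)
print(f"y = {slope:.4f}x + {intercept:.3f}")  # prints: y = -0.9405x + 12.857
```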

Figure 2.5. Chlorophyll a measurements for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression for the samples is y = 0.0044x + 0.0515.

Figure 2.6. Autotrophic index values for sites along Hewett Fork. Site designations as in Figure 2.1. The linear regression for the samples is y = -18.682x + 222.78.

Figure 2.7. Agarose gel of PCR products for DNA extracts from sites along Hewett Fork. All sites had PCR product, but the bands for HF 129 and HF 120 are not visible on the image. Site designations as in Fig. 2.1.

REFERENCES

Altschul, S. F., Gish, W., Miller, W., Myers, E. W. & Lipman, D. J. 1990. Basic local alignment search tool. J. Mol. Biol. 215:403-410.

Arar, E. J. & Collins, G. B. 1997. In Vitro Determination of Chlorophyll a and Phaeophytin a in Marine and Freshwater Algae by Fluorescence. Method 445.0. National Research Exposure Laboratory, USEPA, Cincinnati, OH. pp 1-5.

Benke, A. C. & Cushing, C. E. 2005. Rivers of North America, Elsevier Press, Burlington, MA, 1168 pp.

Creer, S., Fonseca, V. G., Porazinska, D. L., Giblin-Davis, R. M., Sung, W., Power, D. M., Packer, M., Carvalho, G. R., Blaxter, M. L., Lambshead, P. J. D. & Thomas, W. K. 2010. Ultrasequencing of the meiofaunal biosphere: Practice, pitfalls and promises. Mol. Ecol. 19:4-20.

DeWalt, R. E. 2011. DNA barcoding: A taxonomic point of view. J. N. Am. Benthol. Soc. 30:174-81.

Dodds, W. K. 2002. Freshwater Ecology: Concepts and Environmental Applications, Academic Press, San Diego, CA, 596 pp.

Dopheide, A., Lear, G., Stott, R. & Lewis, G. 2009. Relative diversity and community structure of in stream biofilms according to molecular and microscopy methods. Appl. Environ. Microbiol. 75:5261-72.

Ekrem, T., Stur, E. & Hebert, P. D. N. 2010. Females do count: Documenting chironomidae (diptera) species diversity using DNA barcoding. Org. Divers. & Evol. 10:397-408.

Farley, M., McCamment, B., Bryenton, D., Miller, B. & Greenlee, M. 2004. Stream Dosing for Acid Mine Drainage Pollution at Carbondale and Jobs Hollow in Southeastern, Ohio, Ohio Department of Natural Resources Division of Mineral Resources Management, Jackson, Ohio.

Fuelling, L. J. 2013. Interactive effects of AMD and grazing on periphyton productivity, biomass, and diatom diversity. Ohio University, Athens, Ohio. Thesis.

Graham, L. E., Graham, J. M. & Wilcox, L. W. 2009. Algae, 2nd ed. Benjamin-Cummins Publishing Company, San Francisco, CA, 616 pp.

Gray, J. B. 2011. Reference diatom assemblage response to transplantation into a stream receiving treatment for acid mine drainage in Southeastern Ohio. Ohio University, Athens, Ohio. Thesis. http://etd.ohiolink.edu/view.cgi?ohiou1317921115.

Gray, J. B. & Vis, M. L. 2013. Reference diatom assemblage response to restoration of an acid mine drainage stream. Ecol. Ind. 29:234-245.

Gray, N. F. 1997. Environmental impact and remediation of acid mine drainage: A management problem. Environ. Geol. (Berlin) 30:62-71.

Green, P. & Ewing, B. 1998. Basecalling of automated sequencer traces using phred. Genome Res. 8:175-194.

Hajibabaei, M., Shokralla, S., Zhou, X., Singer, G. A. C. & Baird, D. J. 2011. Environmental barcoding: A next-generation sequencing approach for biomonitoring applications using river benthos. PLoS One 6:e17497.

Hogsden, K. L. & Harding, J. S. 2012. Consequences of acid mine drainage for the structure and function of benthic stream communities: A review. Freshw. Sci. 31:108-120.

Hudson, D. H., Mitra, S., Weber, N., Ruscheweyh, H. & Schuster, S. C. 2011. Integrative analysis of environmental sequences using MEGAN4. Genome Res. 21:1552-1560.

ILGARD. 2012. Lower Muskingum River watershed management plan. 2012. Accessed 2013. http://www.muskingumriver.org/lowermuskingumwatershedplan.pdf.

Imanian, B., Pombert, J. & Keeling, P. J. 2010. The complete plastid genomes of the two 'dinotoms' Durinskia baltica and Kryptoperidinium foliaceum. PLoS One 5:e10711.

Johnson, D. B. & Hallberg, K. B. 2005. Acid mine drainage remediation options: A review. Sci. Total Environ. 338:3-14.

Karr, J. R. 1981. Assessment of biotic integrity using fish communities. Fisheries 6:21-27.

Keeling, P. J. 2010. The endosymbiotic origin, diversification and fate of plastids. Phil. Trans. Roy. Soc. London B Biol. Sci. 365:729-48.

Kruse, N. A., Bowman, J. R., Mackey, A. L., McCament, B. & Johnson, K. S. 2012. The lasting impacts of offline periods in lime dosed streams: A case study in Raccoon Creek, Ohio. Mine Water Environ. 31:266-272.

61

Kruse, N.A., DeRose, L., Korenowsky, R., Bowman, J.R., Lopez, D., Johnson, K. & Rankin, E. The role of remediation, natural alkalinity sources and physical stream parameters in stream recovery. Unpublished. Li, H., Ruan, J. & Durbin, R. 2008. Mapping short DNA sequencing reads and calling variants using mapping quality scores. Genome Res. 18:1851-8. McGrady-Steed, J., Harris, P. M. & Morin, P. J. 1997. Biodiversity regulates ecosystem predictability. Nature (London) 390:162-5. Metzker, M. L. 2009. Sequencing technologies—the next generation. Nat. Rev. Gen., 11:31-46. Minshall, G. W. 1978. Autotrophy in stream ecosystems. Bioscience 28:767-71. Mulholland, P. J. 1996. Role in nutrient cycling in streams. In Stevenson, R. J. Bothwell, M. L. Lowe,R.L. [Ed.] Algal Ecology: Freshwater Benthic Ecosystem. Academic Press, San Diego, CA, pp. 639. NPS. 2009. NPS report - Raccoon Creek watershed - Carbondale II doser. non-point source monitoring system. 2013. http://www.watersheddata.com/userview_file.aspx?UserFileLo=1&UserFileID =99. NPS 2011. NPS report - raccoon creek watershed - carbondale II doser. non-point source monitoring system. 2013. http://www.watersheddata.com/userview_file.aspx?UserFileLo=1&UserFileID =119. ODNR. Acid mine drainage abatement program. 2013. http://www.ohiodnr.com/mineral/acid/tabid/10421/Default.aspx. Ohio Environmental Protection Agency. 1988. Biological Criteria for the Protection of Aquatic Life, Volume II, Ohio EPA, Columbus, Ohio. Patrick, D. L., Bush, J. W. & Chen, M. M. 1973. Methods for measuring levels of well-being for a health status index. Health Serv Res 8:228-245. Pearson, W. R. & Lipman, D. J. 1988. Improved tools for biological sequence comparison. Proc. Natl. Acad. Sci. U. S. A. 85:2444-8. Peterson, C. G. & Hoagland, K. D. 1990. Effects of wind-induced turbulence and algal mat development on epilithic diatom succession in a large reservoir. Arch. Hydrobiol. 118:47-68. Pfrender, M. E., Hawkins, C. P., Bagley, M., Courtney, G. 
W., Creutzburg, B. R., Epler, J. H., Fend, S., Ferrington, L. C., Jr., Hartzell, P. L., Jackson, S., Larsen,

62

D. P., Levesque, C. A., Morse, J. C., Petersen, M. J., Ruiter, D., Schindel, D. & Whiting, M. 2010. Assessing macroinvertebrate biodiversity in freshwater ecosystems: Advances and challenges in DNA-based approaches. Q. Rev. Biol. 85:319-40. Planas, D. 1996. Acidification effects. In Stevenson, R. J. Bothwell, M. L. Lowe,R.L. [Ed.] Algal Ecology: Freshwater Benthic Ecosystems. Academic Press, San Diego, CA, pp. 530. Pool, J. R. 2010. Use of diatom assemblages and biofilm enzyme activities for assessment of acid mine remediated streams in southeastern Ohio. Ohio University, Athens, Ohio. Thesis. http://etd.ohiolink.edu/view.cgi?acc num=ohiou1280508985. Reynolds, C. S. 2006. The Ecology of Phytoplankton, Cambridge University Press, Cambridge, UK, 535 pp. Round, F. E., Crawford, R. M. & Mann, G. M. 1990. The Diatoms: Biology and Morphology of the Genera, Cambridge University Press, Cambridge, UK, 747 pp. Saunders, G. 2005. Applying DNA barcoding to red macroalgae: A preliminary appraisal holds promise for future applications. Philos. Trans. R. Soc. B-Biol. Sci. 360:1879-88. Schmieder, R. & Edwards, R. 2011. Quality control and preprocessing of metagenomic datasets. Bioinformatics (Oxford) 27:863-4. Sherwood, A. R. & Presting, G. G. 2007. Universal primers amplify a 23S rDNA plastid marker in eukaryotic algae and cyanobacteria. J. Phycol. 43:605-8. Sherwood, A. R., Chan, Y. L. & Presting, G. G. 2008. Application of universally amplifying plastid primers to environmental sampling of a stream periphyton community RID A-3383-2009. Mol. Ecol. Resour. 8:1011-4. Smith, E. P. & J. R. Voshell. 1997. Studies of benthic macroinvertebrates and fish in streams within EPA Region 3 for development of Biological Indicators of Ecological Condition. Part 1, Benthic Macroinvertebrates. Report to U. S. Environmental Protection Agency. Cooperative Agreement CF821462010. EPA, Washington, D.C. Stainton, M.J., Capel, M.J., & Armstrong, F.A.J. 1974. The chemical analysis of fresh water. 
Environment Canada; Fisheries and Marine Service, Miscellaneous Publication No. 25, pp. 67-69.

63

Steinman, A. D. 1996. Effects of grazers on freshwater benthic algae. In Stevenson, R. J. Bothwell, M. L. Lowe,R.L. [Ed.] Algal Ecology: Benthic Freshwater Ecosystems.Academic Press, San Diego, CA, pp. 373. Steinman, A. D., Lamberti, G. A. & Leavitt, P. R. 1996. Biomass and pigments of benthic algae. In Hauer, F. R. & Lamberti, G. A. [Eds.] Methods in Stream Ecology. Academic Press, Burlington, Massachusettes, pp. 357-380. Sutherland, T. F., Amos, C. L. & Grant, J. 1998. The effect of buoyant biofilms on the erodibility of sublittoral sediments of a temperate microtidal estuary. Limnol. Oceanogr. 43:225-35. Swingland, I. R. 2001. Definition of biodiversity. In Levin, S. A. [Ed.] Encyclopedia of Biodiversity, Volume I. Academic Press, San Diego, CA, pp. 377-391. Trendall, A. F. & Morris, R. C. 1983. Iron-Formation: Facts and Problems, Elsevier, New York City, NY, 558 pp. Trüper, H. G. & Pfennig, N. 1978. of the Rhodospirillales. In Clayton, R. K. & Sistrom, W. R. [Eds.] The Photosynthetic Bacteria. Plenum Press, New York City, NY, pp. 19-27. Vannote, R. L., Minshall, G. W., Cummins, K. W., Sedell, J. R. & Cushing, C. E. 1980. The river continuum concept. Can. J. Fish. Aquat. Sci. 37:130-7. Verb, R. G. & Vis, M. L. 2005. Periphyton assemblages as bioindicators of mine- drainage in unglaciated western Allegheny plateau lotic systems. Water, Air Soil Poll. 161:227-65. Walter, C. A., Nelson, D. & Earle, J. I. 2012. Assessment of stream restoration: Sources of variation in macroinvertebrate recovery throughout an 11-year study of coal mine drainage treatment. Restor. Ecol. 20:431-40. Wehr, J. D. & Thorp, J. H. 1997. Effects of navigation dams, tributaries, and littoral zones on phytoplankton communities in the Ohio River. Can. J. Fish. Aquat. Sci. 54:378-95. Wei, X., Viadero Jr., R. C. & Buzby, K. M. 2005. Recovery of iron and aluminum from acid mine drainage by selective precipitation. Environ. Eng. Sci. 22:745- 755. Wetzel, R. G. 1996. 
Benthic algae and nutrient cycling in lentic freshwater ecosystems. In Stevenson, R. J. Bothwell, M. L. Lowe,R.L. [Ed.] Algal Ecology: Benthic Freshwater Ecosystems. Academic Press, San Diego, CA, pp. 667.

64

Zalack, J. T., Smucker, N. J. & Vis, M. L. 2010. Development of a diatom index of biotic integrity for acid mine drainage impacted streams. Ecol. Indic. 10:287- 295.

65