Analysis of Algae-Vulnerable Lakes in Utah Using R Plotting Tools to Visualize Water Quality Data
Sunayna Dasgupta and Aiswarya Rani Pappu
Department of Civil and Environmental Engineering
University of Utah

Abstract - Algae formation in a water body is a direct outcome of eutrophication. Eutrophication adversely impacts the biological, physical, chemical and aesthetic components of a water body. It usually occurs due to an increased rate of nutrient loading in the form of nitrogen and phosphorus. This study presents a comparative analysis of algae-vulnerable lakes and water bodies in Utah and categorizes them based on the Trophic State Index.

Keywords: eutrophication, algae, lakes

I. INTRODUCTION

A. Problem

Algae are primarily aquatic, single or multicellular organisms containing chlorophyll. Examples of algae include diatoms, green and red algae, and primitive photosynthetic bacteria such as Cyanobacteria (blue-green algae). Algal biomass is one of the primary surface water quality criteria. A higher level of algal biomass in a water body can be associated with a broad range of changes in the dissolved oxygen concentration. Fluctuations in dissolved oxygen concentration affect sensitive aquatic biota [6]. The process of algal biomass formation in a water body is commonly known as algal bloom or eutrophication. It usually occurs due to an increase in nutrient loading [1] [2] to the water body, light levels, pH and temperature. Figure 1 illustrates the ill effects of algal contamination, showing a fish kill resulting from eutrophication and the formation of green algal biomass in a water body.

Figure 1. (a) Fish kill due to algal growth (eutrophication), and (b) Water Stargrass (Heteranthera dubia), a rooted macrophyte in the Yakima River, Washington (USGS).

There are three basic methods for estimating algal biomass:

• Computing the chlorophyll a amount (CHL a)
• Measurement of the carbon biomass as the ash-free dry mass (AFDM)
• Measurement of the particulate organic carbon (POC)

B. CHL a quantification

Detection and quantification of chlorophyll a (CHL a) has proven to be an effective way to assess the presence of algae in a water body [8]. Since algae have chlorophyll as their primary photosynthetic pigment, CHL a quantification provides useful information for measuring algal population density in a water body. Chlorophyll is the green pigment that traps sunlight and converts it to energy for metabolism.

C. Algae vulnerable lakes in Utah

According to a recent report, three of Utah’s largest public drinking water systems, tap reservoirs, and twenty rivers have developed green biota. The Utah Division of Water Quality released a list of algae-vulnerable water bodies [3]:

• Huntington Creek*
• Provo River*
• Utah Lake
• Gunlock Reservoir
• Lost Creek Reservoir
• Scofield Reservoir
• Pineview Reservoir*
• East Canyon Reservoir
• Steinaker Reservoir*
• Minersville Reservoir
• Yuba Reservoir
• Rockport Reservoir
• Red Fleet Reservoir*
• Jordanelle Reservoir
• Starvation Reservoir*
• Otter Creek Reservoir
• Wide Hollow Reservoir
• Flaming Gorge Reservoir*
• Sand Hollow Reservoir*
• Millsite Reservoir*
• Deer Creek Reservoir*

*Denotes source of drinking water

Algal blooms have been found to cause health risks to both humans and animals. Hence regular monitoring of such water bodies is important for promoting a healthy environment.

I. OBJECTIVE

The present report discusses the collection, organization and visual analysis of water quality data collected from various counties of Utah. The collected data was analyzed using tools for R. Chlorophyll a was analyzed visually along a timeline of sampling data. A database consisting of the data from various counties was created and then manipulated for analysis.

II. SOFTWARE DESCRIPTION

A. MySQL

MySQL Workbench is a unified visual tool used by architects, developers, and DBAs [9]. It provides data modeling, an SQL development environment, and comprehensive administration tools for server configuration, user administration and backup. It enables a developer to design and visualize models and to generate and manage databases.

B. RStudio

RStudio is a free and open source integrated development environment for R [10]. It provides an environment for data analysis. The RMySQL package was used to create a connection with an Observations Data Model (ODM) database in MySQL. The variables of interest can then be queried and visualized by creating plots with R plotting packages.

III. DATA COLLECTION

The data required for the analysis was collected from EPA’s STORET (STOrage and RETrieval), a data warehouse repository for sharing water monitoring data, including biological and physical parameters. This data can be used by state environmental agencies, EPA and other federal agencies, universities, private citizens and others.

A. Data query and download

The STORET database can be queried for data on specific chemical, physical and biological attributes and parameters of water resources, as well as the methods used in evaluation [7]. Queries can be based on monitoring location information and on the data collected at that location.

The STORET data can be accessed by following the steps below:

• Go to the STORET main page, http://www.epa.gov/storet/
• Click the download data link (Figure 2).
• Under the STORET Data Warehouse link, click the yellow button titled "browse or download Modernized STORET data" (Figure 3).
• Under STORET/WQX Warehouse Reports-STORET Results, click the link titled Results Download (Figure 4).
• Choose the respective state, county, station type, date, activity medium, activity intent and community sampled (default), and the characteristics intended.
• Before downloading the query results, note the number of records found and narrow the query if the results exceed 3 gigabytes.
• Select the report types, check the ‘appropriate user profile’ box, enter an email address, prefix the intended name, and select data elements for the report.
• Under batch processing, click the immediate button; the data will then be sent to the provided email address.

Figure 2. EPA STORET Web interface

Figure 3. Data access tool

Figure 4. Data selection window

B. Database creation

Data downloaded from the STORET site was organized using the Observations Data Model (ODM) in MySQL [9]. Chlorophyll a data was available only for the years 2006 and 2007. This data was sorted as a time series and stored in a column named data value. Data from 10 different counties were plotted using R, according to the available time series. Figure 5 shows the data values loaded into the table created for the Beaver County ODM.

Figure 5. Variables table created for Beaver County in MySQL

IV. DATA VISUALISATION USING R

The data was visualized using RStudio [10]. An R script was used to connect to the local ODM named chlorophyll a ODM. The code was written to sort the data and create time series plots for all counties.

V. TROPHIC STATE INDEX

The TSI (Trophic State Index) is a common way to characterize a lake’s trophic status. It indicates the overall health status of a water body. The quantities of chlorophyll a, phosphorus, nitrogen and other nutrients are the primary determinants that independently estimate the algal bloom density in a water body at a specific location. The index uses algal biomass as the basis for classifying the trophic state of a water body. It is a dimensionless numeric value that ranges approximately from 0 to 100. The index is simple to calculate, use and understand. A simplified equation can be used to calculate the Trophic State Index for chlorophyll a (Carlson, R.E. et al., 1996):

TSI(CHL) = 9.81 ln(CHL) + 30.6

where CHL represents the average value of chlorophyll a in µg/l.

Generally, every TSI value indicates algal population densities and water system characteristics. Water bodies with low TSI values, ranging from 30-40, are generally transparent, have low algal population densities, and have adequate DO (dissolved oxygen) concentrations. Water bodies with low to midrange TSI values, from 40-50, are moderately clear and have a high chance of increased algal growth. Water bodies with midrange TSI values, between 50-70, are generally more turbid, have higher algal population densities, and also exhibit low DO levels. Water bodies with high TSI values (70 and above) are observed to have heavy, dense algal blooms with severe DO problems [4].

VI. RESULTS

Figures 6 to 15 show the time series plots for Beaver, Cache, Davis, Duchesne, Morgan, Salt Lake, Summit, Wasatch, Wayne and Weber counties.

Figure 6. Plot showing Chlorophyll a record for Beaver County during 2006-2007.

Figure 7. Plot showing Chlorophyll a record for Cache County during 2006-2007.

Figure 8. Plot showing Chlorophyll a record for Davis County during 2006-2007.

Figure 9. Plot showing Chlorophyll a record for Duchesne County during 2006-2007.

Figure 10. Plot showing Chlorophyll a record for Morgan County during 2006-2007.

Figure 11. Plot showing Chlorophyll a record for Salt Lake County during 2006-2007.

Figure 12. Plot showing Chlorophyll a record for Summit County during 2006-2007.

Figure 13. Plot showing Chlorophyll a record for Wasatch County during 2006-2007.

Figure 14. Plot showing Chlorophyll a record for Wayne County during 2006-2007.

VII. CONCLUSIONS

MySQL and RStudio proved to be useful tools for database creation, storage of chlorophyll a data as a time series, statistical analysis and visualization of data. Database creation in MySQL was somewhat challenging because of formatting requirements, but these issues were resolved. Once the database was created and the data was stored, linking it to RStudio and creating visualizations was comparatively easy. Due to limited field sampling for chlorophyll a in Utah, only a limited number of data points were available. The graphs generated using R and the statistical analysis showed the presence of algae-vulnerable sites in Utah, with the highest presence indicated in Salt Lake County. Their ease of use and compatibility make MySQL and R efficient tools for handling large amounts of data and conducting comparative and statistical analysis.
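
As a rough illustration of the classification described in Section V, the following R sketch computes a chlorophyll a based TSI from county-averaged concentrations and assigns each value to one of the TSI ranges given in the text. It assumes the commonly cited Carlson relation TSI(CHL) = 9.81 ln(CHL) + 30.6 with CHL in µg/l. The function names `tsi_chl` and `classify_tsi`, the county labels, and the sample concentrations are hypothetical and are not taken from the study's data or script. Because the ranges in the text share their end points, the sketch assigns a boundary value such as 40 to the higher class, which is also an assumption.

```r
# Chlorophyll a based Trophic State Index (dimensionless).
# Assumption: commonly cited Carlson coefficients; chl is in ug/L.
tsi_chl <- function(chl) {
  stopifnot(is.numeric(chl), all(chl > 0))
  9.81 * log(chl) + 30.6  # log() is the natural logarithm in R
}

# Map TSI values to the ranges described in Section V.
# right = FALSE makes each interval closed on the left, e.g. [40, 50).
classify_tsi <- function(tsi) {
  cut(
    tsi,
    breaks = c(-Inf, 30, 40, 50, 70, Inf),
    labels = c(
      "below 30: not described in the text",
      "30-40: transparent, low algal density, adequate DO",
      "40-50: moderately clear, algal growth likely to increase",
      "50-70: turbid, higher algal density, low DO",
      "70+: dense algal blooms, DO problems"
    ),
    right = FALSE
  )
}

# Hypothetical chlorophyll a samples (ug/L); illustrative values only,
# not measurements from STORET or from the study.
samples <- data.frame(
  county = c("County A", "County A", "County B", "County B"),
  chl    = c(1.0, 3.0, 20.0, 60.0)
)

# The index is defined on the average chlorophyll a value,
# so average per county before applying the formula.
county_avg <- aggregate(chl ~ county, data = samples, FUN = mean)
county_avg$tsi   <- tsi_chl(county_avg$chl)
county_avg$class <- classify_tsi(county_avg$tsi)

print(county_avg)
```

In practice the `samples` data frame would be replaced by the chlorophyll a values queried from the ODM database, with one row per sample and a column identifying the county.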