Technological Feasibility Analysis

Date: 10/6/2020

Team Name: Presto Proxies

Team Members: Melissa Peiffer, Rodgers, Justin Coffey, Colin Taylor

Project Sponsor: Dr. Nicholas McKay

Team Mentor: David Failing


Table of Contents

1. Introduction
2. Technological Challenges
3. Technology Analysis
   a. Reading NetCDF files for Paleoclimate data
   b. Visualization methods for time series data and index reconstruction
   c. Front End Framework
   d. Back End Framework
4. Technology Integration
5. Conclusion


1. Introduction

Climate can be described as the average, long-term weather patterns observed over many years in a specific region. Observations of the climate include metrics such as temperature, precipitation, and air pressure. The climate affects and has always affected everything, from ecosystems to societies. To better understand our climate now and in the future, it is essential we know how it has changed in the past. The study of past climate variations is known as paleoclimatology. Past climate conditions can be deciphered by observing imprints left on the natural environment, such as the isotopes present in coral skeletons or the substances frozen in the layers of an ice core. Quantitative data, known as proxy data, can be collected from these imprints and used to reconstruct models of past climate conditions.

Over the last 30 years, thousands of proxy data sets have been collected. The quantity of the collected data makes it nearly impossible to analyze climate variation patterns by hand. It would be too time-consuming to search the data set for notable information, and difficult to view the information in a meaningful way.

The Paleoclimate Dynamics Laboratory at NAU, along with collaborators at the University of Southern California, is launching a new project called PReSto (Paleoclimate Reconstruction Storehouse) that will streamline the creation of paleoclimate reconstructions. Visual reconstruction of paleoclimate data allows scientists to isolate specific patterns in their data and condense it into meaningful subsets. To accomplish this task, our vision is a modern web application that utilizes the reconstructions being produced by PReSto. The application will present these reconstructions to a wide range of informed end users in the form of interactive maps and graphs, which will allow users to navigate through past climate reconstructions in space and time.
These visualizations will give users insight into climate variation patterns that would otherwise be difficult to obtain. If successful, it is difficult to overstate the impact of this application. Access to past climate data is of great public interest. Thousands of educators, scientists, and policymakers would benefit from the application. Educators would be able to integrate the application into their sustainability curriculum and better inform our society about climate-related issues. The ability to visualize past climate data will allow researchers to better understand current climate variations and to better predict future climate conditions. Finally, the application will inform policymakers seeking to instate laws and protections that mitigate the effects of climate change. It is crucial to address climate change now in order to prevent catastrophic changes to our climate in the future. The application is one of many tools that will provide a way to combat climate change by giving insight into the patterns of climate variation.

In this document, we begin by outlining the major technological challenges we expect to encounter as we develop our project. Once we have identified these challenges, we will closely examine each item in more detail, presenting our methods for analyzing that technology, alternatives to that technology, and how we ultimately decided to integrate that technology into our project. The technological challenges we expect to encounter are listed below.

2. Technological Challenges

● We will need a way to read NetCDF files for paleoclimate data.
● We will need visualization methods for time series data and index reconstruction.
● We will need a front end framework.
● We will need a back end framework.

3. Technology Analysis

a. Reading NetCDF files for Paleoclimate data

Intro to the Issue
In order to generate visualizations, users must choose a NetCDF file containing the proxy data. Several libraries exist for the explicit purpose of manipulating NetCDF data. Our challenge is to select an appropriate library that can handle large NetCDF files efficiently. NetCDF formatted files are made up of plaintext metadata headers and binary compressed scientific data, which must be placed into a data structure in order to be usable. As such, selecting a library to read the NetCDF files is a top priority for our application. We have outlined some of the desired characteristics such a library would have below.

Desired Characteristics
Our library must meet the following criteria. It should have a small library size to allow for efficient loading and manipulation of NetCDF data, be able to place the NetCDF data into a comparable data structure, and be able to return subsets of NetCDF data. The library should be implemented in Python to remain consistent with our chosen programming language. Our analysis of Python libraries for data management has led us to two primary alternatives: NetCDF4 and GDAL.

Alternatives
Below, we have provided a broad overview of NetCDF4 and GDAL.
● NetCDF4 is an open-source Python module used to read NetCDF files into NumPy arrays. It is built on existing C libraries for reading large data structures.
● GDAL, the Geospatial Data Abstraction Library, is a wide-ranging library with bindings for multiple programming languages, such as C# and Python, for reading vector and raster geospatial data.

Analysis
After looking at both GDAL and NetCDF4, we examined each of our criteria to compare the feasibility of each technology. Our comparison of these metrics has led us to the observations below.

Analysis of NetCDF4
● NetCDF4 was built specifically to interface with NetCDF data.
● NetCDF4 has a library size of 3.12 MB.
● NetCDF4 automatically returns data in the form of a NumPy array.
● NetCDF4 can return subsets of specific data.

Analysis of GDAL
● GDAL is a general library that interfaces with a wide variety of data.
● GDAL has a library size of 126 MB.
● GDAL automatically returns data in the form of a NumPy array.
● GDAL can return subsets of data.


Alternative | Technology Type | Concept | Library Size | Includes Conversion to Comparable Data Structure | Able to Return Data Subsets
NetCDF4 | Python library | A Python interface for the NetCDF C library | 3.12 MB | Yes | Yes
GDAL | Python library | A Python library for vector and raster geospatial data | 126 MB | Yes | Yes

Chosen Approach
Our chosen approach is NetCDF4. Both libraries met our criteria well, but NetCDF4 is superior in terms of library size and efficiency. Also, NetCDF4 is designed specifically to interface with NetCDF data, whereas GDAL is a much broader library and would introduce unnecessary overhead.

Proving Feasibility
In order to prove the feasibility of NetCDF4, our team must import the library into Python and test its functionality using a sample NetCDF file. The tests should include reading the NetCDF file, defining dimensions for data visualization, and accessing specific variables and values from the NetCDF file. If the NetCDF4 library successfully handles the data provided, we will be able to use it to access the data for visualization.

b. Visualization methods for time series data and index reconstruction

Intro to the Issue
Our program needs to be able to generate different types of visualizations for both time series data and index reconstruction. The time series data should be displayed in five dimensions: spatial (x, y, z) coordinates, time, and uncertainty. The index reconstruction should be displayed in four dimensions: spatial coordinates and time. Our challenge is to select an appropriate library to construct these visualizations and to give the user the ability to find specific information from their data. Our program also needs to provide users with the ability to export data visualizations and data subsets. We have outlined some of the desired characteristics we would like our visualization library to have below.

Desired Characteristics
Our library must meet the following criteria. It should be able to handle large amounts of data, allow for the creation of interactive plots in multiple file formats, be able to isolate specific data subsets, and plot data in the form of NumPy arrays. Once again, the library should be implemented in Python for consistency. Our research has led us to four alternatives: Matplotlib, Seaborn, ggplot, and Pygal.

Alternatives
● Matplotlib is a standard Python library that utilizes the NumPy library to plot 2D graphs and other plots.
● Seaborn is an extra layer of abstraction over Matplotlib that allows complex plot types to be made with less code than if they were implemented with only the Matplotlib library.
● ggplot is a library originally implemented in R and later ported to Python, used for visualization in data science applications. It implements a high-level API that allows for complex plots with very little code.
● Pygal is an open-source Python library that by default produces SVG images, which helps ensure scalability of images without producing excessive pixelation. It also provides many options for user interaction, which may be useful for our purposes.

Analysis
Below are our observations of each library, with an overview of their benefits and deficits.

Analysis of Matplotlib
● Matplotlib can create both static and animated plots, but it is a low-level plotting library that may be difficult to use.

Analysis of Seaborn
● Seaborn is a wrapper library over Matplotlib which offers many of the same functionalities as Matplotlib with less code and complexity.

Analysis of ggplot
● ggplot is simple and can isolate data by specific periods. However, it is harder to customize.

Analysis of Pygal
● Pygal can create interactive plots but cannot handle large datasets.

Alternative | Technology Type | Concept | Can Handle Large Datasets | Interactive Graphs in Multiple File Formats | Allows for Data Subsets
Matplotlib | Python library | A low-level, comprehensive library for embedding plots into applications | Yes | Yes | Yes
Seaborn | Wrapper library over Matplotlib | An abstraction layer over Matplotlib; allows for the same functionality with less code | Yes | Yes | Yes
ggplot | R library ported to Python | A comprehensive statistical library for data science applications | Yes | Yes | Yes
Pygal | Python library | An interactive data visualization library focused on aesthetic data visualization | No | Yes | No


Chosen Approach
Our chosen approach is ggplot. Pygal did not meet two of our desired criteria, so we eliminated it as an option. While both Matplotlib and Seaborn met our requirements, we decided against using these libraries to reduce overhead. Matplotlib is a large, low-level library that would introduce a steep learning curve to our project. Seaborn is less complex but would introduce a dependency on Matplotlib. The ggplot library met our criteria and offers added simplicity, making it the most desirable option.

Proving Feasibility
In order to prove the feasibility of ggplot, our team must import the library and test its functionality using a sample data set. Specifically, our team must ensure that the chosen library can generate visualizations for specific time periods and regions, and generate NumPy arrays for the provided time series and index data. Once we are able to generate the necessary visualizations, we will need a proper front end framework to display them to users.

c. Front End Framework

Intro to the Issue
For this project, we need a suitable front end framework. We need an intuitive user interface that looks professional and is easy to navigate. The main challenge is to find a front end framework that provides support for user interface creation and is compatible with visualization libraries and with ggplot. There are several desirable characteristics for our front end framework.

Desired Characteristics
The front end framework must be capable of integrating visualization libraries to display the paleoclimate visualizations to the user. It should also provide a template for building a professional user interface that is clean and readable. We have decided to look at Angular and Reactjs as our alternatives for the front end framework.

Alternatives
● Angular is a front end framework that provides pre-formatted templates using TypeScript.
● Reactjs is a front end framework that provides component-based UI templates using JavaScript.

Analysis
We have looked over both Angular and Reactjs.

Analysis of Angular
● Angular is easy to learn and provides several templates for building a professional and clean user interface. It extends HTML with new attributes and seems to work best on simple Single Page Applications (SPAs). Angular is compatible with several visualization libraries.

Analysis of Reactjs
● Reactjs has a well-defined lifecycle, uses a component-based approach, and uses JavaScript, making it very simple to use and learn. Reactjs also uses a special syntax called JSX, which allows you to mix HTML with JavaScript. Furthermore, Reactjs is compatible with several visualization libraries.

Alternative | Technology Type | Concept | Dependencies | Language | Data Binding
Reactjs | JavaScript library | Brings HTML into JavaScript; works with the virtual DOM; server-side rendering | Requires additional tools to manage dependencies | JavaScript + JSX | One-way data binding
Angular | Full-fledged MVC framework | Brings JavaScript into HTML; works with the real DOM; client-side rendering | Manages dependencies automatically | JavaScript + HTML | Two-way data binding

Chosen Approach
Our chosen approach is Reactjs because it offers several visualization libraries (such as Victory and React-Vis) and is implemented in JavaScript and JSX. Working with JavaScript and JSX would be helpful since we are building a web application, and these languages are well suited to that task. Having the visualization libraries will be crucial to the project since we need to create visualizations. Our team members have worked with JavaScript before, whereas they have not worked with TypeScript, so we also have some familiarity with the tools included with Reactjs.

Proving Feasibility
To prove the feasibility of Reactjs, our team must set up a test webpage implemented with the framework. In order to be successful, the webpage we implement must have a foundation for visualization. Our front end framework will also need a working back end framework to support it.

d. Back End Framework

Intro to the Issue
For users to load their data, we need a comprehensive back end framework to support our system. The back end framework should be able to take that data and pass it to a Python script, which will generate visualizations. Our main challenge is to select an appropriate back end framework for our project. In order to do so, we have outlined some desired characteristics below.

Desired Characteristics
We would like a back end framework that can run Python scripts and quickly access static files. It should also be able to connect to the front end and produce the data visualizations. Specifically, it needs to serve NetCDF files from the server running the framework, run Python scripts directly, and take in user input from HTML forms. Below we have listed all of the alternatives.

Alternatives
● Django is an extremely versatile, full-stack, open source back end framework.
● Pyramid is an open source back end framework that works well with both large and small applications.
● TurboGears is a full-stack, open source back end framework that is focused on data-driven web applications.

● Web2Py is a full-stack, open source back end framework that is data-driven but less complex.

Analysis
Below is an overview of our technology alternatives and how they hold up against our criteria.

Analysis of Django
○ Django has a powerful testing interface and many useful features, such as built-in database features. The provided features of Django may reduce overhead. However, Django is complex and would introduce a somewhat difficult learning curve.

Analysis of Pyramid
○ Pyramid includes features such as data documentation, testing, and the ability to select a programming language and database layer.

Analysis of TurboGears
○ TurboGears is built using components of other frameworks, such as libraries and middleware. It provides a quick way to build data-driven applications. However, it may be less reliable.

Analysis of Web2Py
○ Web2Py includes a built-in integrated development environment (IDE), can run on multiple operating systems, and is a powerful tool for data-driven applications. However, there are fewer maintainers of Web2Py than there are of Pyramid or Django, so receiving support would be difficult.

Alternative | Technology Type | Built-in Database Tools | Request Processing | Extension Libraries | Python 3 Support
Django | Full-stack web framework | ORM for communicating with database | Synchronous | Yes | Yes
Pyramid | Full-stack web framework | No specific ORM, but SQLAlchemy is recommended | Synchronous | Yes | Yes
TurboGears | Full-stack web framework | SQLAlchemy ORM | Synchronous | Yes | Yes
Web2Py | Full-stack web framework | Custom DAL that acts like an ORM | Synchronous | Yes | No

Chosen Approach
Our chosen approach is Django because it is versatile and will provide a framework for the necessary features of our project. While all of the back end frameworks we examined could be implemented to suit our needs, having a comprehensive testing environment is desirable. Therefore, we decided to use Django as our framework.

Proving Feasibility
In order to prove the feasibility of Django, our team must set up a test server to verify that the back end framework provides all the necessary features for our project.

4. Technology Integration

In order to implement our final system, we must have a comprehensive understanding of how each technology will interact. Each technology has been selected to work for the purposes of our project, but they cannot work alone. In this section, we propose our envisioned system and detail how the major components of our project will be integrated with one another.


Figure 1: Proposed System Diagram for Technology Integration

The front end will be served from a React-powered front end hosted alongside the Django back end, as shown in the web server portion of Figure 1. Users will then choose different options that will be stored and posted through an HTML form. This can be seen in Figure 1 as a red arrow from the user machine to the Django back end on the web server. Once the required information is received by the server, the required file will be retrieved and passed, along with the form information, to a Python parsing script that will retrieve the necessary data from the file and pass it along to be plotted by a separate script, shown in Figure 1 by arrows that travel through the back end to various scripts and back into the back end. The plot produced will then be added to the front end and displayed to the user when served to their web browser, which again crosses from the web server to the user machine in Figure 1.
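The parsing step in this flow can be sketched with plain NumPy. The function name and window bounds here are hypothetical; in the real pipeline the bounds would come from the posted form and the array from the NetCDF reader:

```python
import numpy as np

# Hypothetical parsing step: given a gridded array (time, lat, lon) and a
# user-selected window, return the subset to hand off to the plotting script.
def select_window(data, t0, t1, lat0, lat1, lon0, lon1):
    return data[t0:t1, lat0:lat1, lon0:lon1]

grid = np.arange(2 * 3 * 4, dtype=float).reshape(2, 3, 4)
window = select_window(grid, 0, 1, 0, 2, 1, 3)
print(window.shape)  # (1, 2, 2)
```

Because NumPy slicing returns views rather than copies, subsetting like this stays cheap even on large reconstruction grids.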

5. Conclusion

Understanding the past climate is crucial to developing an understanding of the current climate. With visualization software such as PReSto, researchers can easily analyze mass amounts of data to identify common patterns. These patterns can then be used to project the effects of current and future climate variations. The information provided by this software is essential to help researchers develop ways to mitigate the effects of climate change.

Our project must integrate several technologies to provide researchers with the tools to visualize and analyze their data. In order to build the project, we must be able to read NetCDF files, create visualizations with that data, and export the resulting visualizations. To do so, we must implement an appropriate front end framework and an appropriate back end framework. We must also utilize appropriate plotting and visualization libraries.

For our frameworks, we decided to work with Reactjs for our front end and Django for our back end. Reactjs has a variety of visualization libraries for us to work with, all of which can be used from JavaScript or JSX. As for Django, it has many tools that can handle functions we would otherwise have to implement ourselves. The only downside is Django's learning curve, but we believe overcoming it will positively impact the project. We have also decided to use the plotting library ggplot, because we need to isolate and plot specific data for our visualizations, and it seems to fit our needs best. Finally, we decided to use NetCDF4 to read the provided data files, as it was built specifically to handle NetCDF files and will provide us with ways to manipulate the NetCDF data as needed. We have analyzed each technology and summarized how each will be integrated into the final system.
However, in order to truly understand how the selected technologies will work in our system, we must begin to implement and test them. Moving forward, we will begin to analyze the overall design of our system, using the technologies we have discussed thus far to begin prototyping.