Bringing 'Bee-cological' Data to Life through a Relational Database and an Interactive Visualization Tool

by Xiaojun Wang

A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Master of Science in Bioinformatics and Computational Biology

August 2018

APPROVED BY:
Dr. Carolina Ruiz
Dr. Robert J. Gegear
Dr. Elizabeth F. Ryder

Abstract

Over the past decade, bumblebees have declined in abundance and geographic distribution at an alarming rate, raising major social, economic, and ecological concerns worldwide. However, we presently have no effective bumblebee conservation strategies because we lack information on the specific ecological needs of each species. The ‘Beecology Project’ was created to fill this knowledge gap by enlisting citizen scientists to collect data on floral resource use patterns of foraging bees in naturally occurring mixed-species communities across Massachusetts. In addition to its research goals, the Beecology Project has the educational goal of providing a modular, integrated biology - computer science framework (a BIO-CS bridge) to assist teachers in developing curricula that meet the next-generation biology and computer science standards at the high school level. The Beecology team has developed Android and Web mobile apps to help citizen scientists collect and submit field data on bumblebee and plant species interactions. Other Beecology team members have also collected a substantial amount of bumblebee data through field research and online digital museum collections. However, there was no central location dedicated to storing these data. There was also no way for users such as researchers, educators, and the general public to access all of the collected data in an ecologically meaningful way.

To fill these gaps, I created a flexible relational database for receiving and storing data submissions from the Android app, the Web app, and other research data files. I also used web development technologies such as D3, Google Maps, and Angular to develop several interactive visualization tools. These tools enable users to explore the contents of the database in different ways, thereby facilitating exploratory studies, hypothesis generation and testing, and the development of effective conservation strategies for threatened bumblebee species. In addition, each tool was designed in a modular way, which will accelerate the development of modular high school curricula integrating biology practices and computational thinking.

Acknowledgments

I would like to express my greatest gratitude to my advisors, Professor Carolina Ruiz, Professor Elizabeth F. Ryder, and Professor Robert Gegear, who gave me the chance to participate in this exciting BIO-CS project. I would also like to thank them for their constant help, for reading the paper with great care, and for offering me invaluable advice and informative suggestions.

I would also like to thank every team and every team member in our Beecology project:

Database and visualization team: Ellen Pierce, who provided Excel records of field observations; Fareya Ikram, who assisted me in importing the data into the database; and Quyen Hoang, who helped me import the data and gave me advice on web service development and visualization.

App and website development team: Linh Hoang, Andrew Gao, Ziyang Yu, Jacob Moon, and Jackson Oliva, who designed and developed the Android application; Huy Tran, who developed the web application; Ankit Kumar, who provided help with website design and web service improvement; Akshit Soota, who integrated Firebase Authentication with the web service; Andrew Walter, who helped us improve the website and server; and Sarun Paisarnsrisomsuk, who helped me at the teacher workshop.

Website content team: Eoin O’Connell, Kenneth Levasseur, Sam Coache, Rachel Murphy, Kenedi Heather, and Devin Stevens, who provided the background of the project and introductory tutorials.

Simulation team: Kevin Heath, Michael LoTurco, and Rachel Blakely, who are developing the simulation modeling software.

Finally, I would like to thank my parents, Tao Wang and Xiujun Geng, and my aunt Hong Wang and my cousin Whitney Zhang for supporting me during my studies.

This material is based upon work supported by the National Science Foundation under Grant No. 1742446. Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Overview:
  1.2 Background
    1.2.1 Bumblebees
    1.2.2 Database
    1.2.3 Visualization Tool
    1.2.4 The Beecology Project
2 Methodology and Results
  2.1 Data sources
  2.1 Database Design
    2.1.1 Introduction
    2.1.2 Schema
  2.2 Web Service
  2.3 Interactive Data Visualizations
    2.3.1 Introduction
    2.3.2 Time Controller
    2.3.3 Species selector
    2.3.4 Data Preprocessing
    2.3.5 Diversity Map
    2.3.6 Population
    2.3.7 Floral Preferences
    2.3.8 Model, View, Controller (MVC)
3 Discussion and Future Work
Bibliography
Appendix
  A.1 Section 2.4.5 Diversity
  A.2 Section 2.4.6 Population by season
  A.3 Section 2.3 Web Service

List of Figures

Figure 1.1: Overview of Technological Needs for the Beecology Project
Figure 1.2: Angular Architecture
Figure 2.1: Roadmap of Data Flow
Figure 2.2: The four study sites used by Pierce and Gegear to collect data on bumblebee-plant species associations.
Figure 2.3: Abstraction graph for the Beecology Database
Figure 2.4: How to store picture and video of observation in database
Figure 2.5: Entity - Relationship Graph for the Beecology Database
Figure 2.6: Sample data stored in tables according to the schema
Figure 2.7: Roadmap of Web Service
Figure 2.8: Web service example - Submitting an observation to the database
Figure 2.9: Web service example - Getting all observations from the database
Figure 2.10: Firebase Authentication - Getting observations by user
Figure 2.11: The First Page of Data Visualization
Figure 2.12: Angular Modularity of the Beecology Visualization Tools
Figure 2.13: Time controller
Figure 2.14: Species selector
Figure 2.15: Process data by date
Figure 2.16: Design Process to Generate the Diversity Map
Figure 2.17: Gridview Design for Diversity Map
Figure 2.18: Calculate diversity of each grid cell
Figure 2.19: The diversity value is mapped to a color gradient
Figure 2.20: Diversity Map
Figure 2.21: The Diversity Map Allows Interactive Zooming by the User
Figure 2.22: Stacked bar presents species abundance of one grid cell
Figure 2.23: Overview of Location Visualization
Figure 2.24: Overview of Location Visualization
Figure 2.25: Overview of Location Visualization
Figure 2.26: Map and Map cluster markers - Zoom in and out
Figure 2.27: Initial design for map cluster markers showing species selection
Figure 2.28: Google Map Marker Clusters library
Figure 2.29: Circle map cluster markers -> Donuts cluster marker
Figure 2.30: Add Donuts graph function to Google Map Marker Cluster Library
Figure 2.31: Donuts Map cluster markers - all observations
Figure 2.32: Map cluster markers change dynamically when the user selects species
Figure 2.33: Map cluster markers adjust dynamically when the user zooms in
Figure 2.34: Prototype of Line graph for bumblebee phenology
Figure 2.35: Line graph for bumblebee season
Figure 2.36: Line graph for bumblebee season - Data filtered by time and species
Figure 2.37: Mapping observation data within heatmap cells
Figure 2.38: Heatmap Prototype
Figure 2.39: Heatmap
Figure 2.40: Heatmap - Data filtered by species and hover event
Figure 2.41: Design Process to Generate the Floral Visualization
Figure 2.42: How to build a bipartite graph
Figure 2.43: Click event on one bumblebee rectangle
Figure 2.44: Bipartite graph with trait filters of bumblebee and flower
Figure 2.45: Bipartite graph click event - display statistic graphs of selected bumblebee and preferred flowers
Figure 2.46: Flower Preference Visualization - Overview graph
Figure 2.47: Show flower preferences of Bombus fervidus
Figure 2.48: Angular MVC implementation of heat map in Summary visualization

List of Tables

Table 2.1: Data source exploration of three data sources
Table 2.2: “Bumblebee” species Table
Table 2.3: “Flower” Table
Table 2.4: “Feature” Table
Table 2.5: “Observation” Table
Table 2.6: A matrix of bumblebee - flower network
Table A.1: Lists of API methods

1 Introduction

1.1 Overview:

Insect pollination plays a crucial role in ecosystem function, biodiversity maintenance, and agricultural productivity worldwide.[1] [2] [3] For example, pollination provided by different bumblebee species provides food, nest sites, and shelter for a diverse array of wildlife and also contributes billions of dollars to the agricultural economy each year.[4] However, many insect pollinator species have declined in abundance and geographic distribution at an alarming rate over recent years,[5] raising major environmental, social, and economic concerns. Although the cause of pollinator decline is currently unknown, significant contributing factors are likely to include habitat fragmentation and loss; pathogens, parasites, and pesticides; impacts of non-native plants and pollinators; and climate change.[6] [7] [8] In the case of bumblebee decline, a considerable amount of empirical effort has been devoted to examining the potential impacts of pathogens and pesticides on wild populations.[7] [9] [10] [11] [12] However, far less is known about the effects of human-induced changes to critical resources needed for bumblebees to complete their annual life cycle.
