Bringing 'Bee-cological' Data to Life through a Relational Database and an Interactive Visualization Tool

by

Xiaojun Wang

A Thesis

Submitted to the Faculty

of the

WORCESTER POLYTECHNIC INSTITUTE

in partial fulfillment of the requirements for the

Degree of Master of Science

in

Bioinformatics and Computational Biology

Aug 2018

APPROVED BY:

Dr. Carolina Ruiz

Dr. Robert J. Gegear

Dr. Elizabeth F. Ryder

Abstract

Over the past decade, bumblebees have declined in abundance and geographic distribution at an alarming rate, raising major social, economic, and ecological concern worldwide. However, we presently lack effective bumblebee conservation strategies because little is known about the specific ecological needs of each species. The ‘Beecology Project’ was created to fill this knowledge gap by enlisting citizen scientists to collect data on floral resource use patterns of foraging bees in naturally occurring mixed-species communities across Massachusetts. In addition to its research goals, the Beecology Project has the educational goal of providing a modular, integrated biology-computer science framework (a BIO-CS bridge) to assist teachers in developing curricula that meet the next generation biology and computer science standards at the high school level. The Beecology team has developed Android and Web mobile apps that help citizen scientists collect and submit field data on bumblebee and plant species interactions. Other Beecology team members have also collected a substantial amount of bumblebee data through field research and online digital museum collections. However, there was no central location dedicated to storing these data, and there was no way for users such as researchers, educators, and the general public to access all of the collected data in an ecologically meaningful way. To fill these gaps, I created a flexible relational database for receiving and storing data submissions from the Android app, the Web app, and other research data files. I also used web development technologies such as D3, Google Maps, and Angular to develop several interactive visualization tools. These tools enable users to explore the contents of the database in different ways, thereby facilitating exploratory studies, hypothesis generation and testing, and the development of effective conservation strategies for threatened bumblebee species. In addition, each tool was designed in a modular way, which will accelerate the development of modular high school curricula integrating biology practices and computational thinking.

Acknowledgments

I would like to express my greatest gratitude to my advisors, Professor Carolina Ruiz, Professor Elizabeth F. Ryder, and Professor Robert Gegear, who gave me the chance to participate in this exciting BIO-CS project. I would also like to thank them for their constant help, for reading this thesis with great care, and for offering me invaluable advice and informative suggestions.

I would also like to thank every team and every team member in our Beecology project. Database and visualization team: Ellen Pierce, who provided Excel records of field observations; Fareya Ikram, who assisted me in importing the data into the database; and Quyen Hoang, who helped me import the data and gave me advice on web service development and visualization. App and website development team: Linh Hoang, Andrew Gao, Ziyang Yu, Jacob Moon, and Jackson Oliva, who designed and developed the Android application; Huy Tran, who developed the web application; Ankit Kumar, who provided help on web site design and web service improvement; Akshit Soota, who integrated Firebase Authentication with the web service; Andrew Walter, who helped us improve the website and server; and Sarun Paisarnsrisomsuk, who helped me at the teacher workshop. Website content team: Eoin O’Connell, Kenneth Levasseur, Sam Coache, Rachel Murphy, Kenedi Heather, and Devin Stevens, who provided the background of the project and introductory tutorials. Simulation team: Kevin Heath, Michael LoTurco, and Rachel Blakely, who are developing the simulation modeling software.

Finally, I would like to thank my parents, Tao Wang and Xiujun Geng, and my aunt Hong Wang and my cousin Whitney Zhang for supporting me during my studies.

This material is based upon work supported by the National Science Foundation under Grant No. 1742446.

Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Overview
  1.2 Background
    1.2.1 Bumblebees
    1.2.2 Database
    1.2.3 Visualization Tool
    1.2.4 The Beecology Project
2 Methodology and Results
  2.1 Data sources
  2.1 Database Design
    2.1.1 Introduction
    2.1.2 Schema
  2.2 Web Service
  2.3 Interactive Data Visualizations
    2.3.1 Introduction
    2.3.2 Time Controller
    2.3.3 Species selector
    2.3.4 Data Preprocessing
    2.3.5 Diversity Map
    2.3.6 Population
    2.3.7 Floral Preferences
    2.3.8 Model, View, Controller (MVC)
3 Discussion and Future Work
Bibliography
Appendix
  A.1 Section 2.4.5 Diversity
  A.2 Section 2.4.6 Population by season
  A.3 Section 2.3 Web Service

List of Figures

Figure 1.1: Overview of Technological Needs for the Beecology Project
Figure 1.2: Angular Architecture
Figure 2.1: Roadmap of Data Flow
Figure 2.2: The four study sites used by Pierce and Gegear to collect data on bumblebee-plant species associations.
Figure 2.3: Abstraction graph for the Beecology Database
Figure 2.4: How to store picture and video of observation in database
Figure 2.5: Entity-Relationship Graph for the Beecology Database
Figure 2.6: Sample data stored in tables according to the schema
Figure 2.7: Roadmap of Web Service
Figure 2.8: Web service example - Submitting an observation to the database
Figure 2.9: Web service example - Getting all observations from the database
Figure 2.10: Firebase Authentication - Getting observations by user
Figure 2.11: The First Page of
Figure 2.12: Angular Modularity of the Beecology Visualization Tools
Figure 2.13: Time controller
Figure 2.14: Species selector
Figure 2.15: Process data by date
Figure 2.16: Design Process to Generate the Diversity Map
Figure 2.17: Gridview Design for Diversity Map
Figure 2.18: Calculate diversity of each grid cell
Figure 2.19: The diversity value is mapped to a color gradient
Figure 2.20: Diversity Map
Figure 2.21: The Diversity Map Allows Interactive Zooming by the User
Figure 2.22: Stacked bar presents species abundance of one grid cell
Figure 2.23: Overview of Location Visualization
Figure 2.24: Overview of Location Visualization
Figure 2.25: Overview of Location Visualization
Figure 2.26: Map and Map cluster markers - Zoom in and out
Figure 2.27: Initial design for map cluster markers showing species selection
Figure 2.28: Google Map Marker Clusters library
Figure 2.29: Circle map cluster markers -> Donuts cluster marker
Figure 2.30: Add Donuts graph function to Google Map Marker Cluster Library
Figure 2.31: Donuts Map cluster markers - all observations
Figure 2.32: Map cluster markers change dynamically when the user selects species
Figure 2.33: Map cluster markers adjust dynamically when the user zooms in
Figure 2.34: Prototype of Line graph for bumblebee phenology
Figure 2.35: Line graph for bumblebee season
Figure 2.36: Line graph for bumblebee season - Data filtered by time and species
Figure 2.37: Mapping observation data within heatmap cells
Figure 2.38: Heatmap Prototype
Figure 2.39: Heatmap
Figure 2.40: Heatmap - Data filtered by species and hover event
Figure 2.41: Design Process to Generate the Floral Visualization
Figure 2.42: How to build a bipartite graph
Figure 2.43: Click event on one bumblebee rectangle
Figure 2.44: Bipartite graph with trait filters of bumblebee and flower
Figure 2.45: Bipartite graph click event - display statistic graphs of selected bumblebee and preferred flowers
Figure 2.46: Flower Preference Visualization - Overview graph
Figure 2.47: Show flower preferences of Bombus fervidus
Figure 2.48: Angular MVC implementation of heat map in Summary visualization

List of Tables

Table 2.1: Data source exploration of three data sources
Table 2.2: “Bumblebee” species Table
Table 2.3: “Flower” Table
Table 2.4: “Feature” Table
Table 2.5: “Observation” Table
Table 2.6: A matrix of bumblebee - flower network
Table A.1: Lists of API methods

1 Introduction

1.1 Overview

Insect pollination plays a crucial role in ecosystem function, biodiversity maintenance, and agricultural productivity worldwide.[1] [2] [3] For example, pollination by different bumblebee species sustains plants that provide food, nest sites, and shelter for a diverse array of wildlife, and it also contributes billions of dollars to the agricultural economy each year.[4] However, many insect pollinator species have declined in abundance and geographic distribution at an alarming rate over recent years,[5] raising major environmental, social, and economic concerns. Although the cause of pollinator decline is currently unknown, significant contributing factors are likely to include habitat fragmentation and loss; pathogens, parasites, and pesticides; impacts of non-native plants and pollinators; and climate change.[6] [7] [8] In the case of bumblebee decline, a considerable amount of empirical effort has been devoted to examining the potential impacts of pathogens and pesticides on wild populations.[7] [9] [10] [11] [12] However, far less is known about the effects of human-induced changes to the critical resources that bumblebees need to complete their annual life cycle. These resources include appropriate sources of floral nectar and pollen, nesting locations, and overwintering sites.[8] Although there is some evidence that species vary in their specific habitat requirements[13] or ‘ecological needs’,[14] we currently do not have sufficient information at the species level to establish a direct link between a reduction in one or more of these needs and wild population decline in threatened species. Consequently, the impact of ongoing changes to the structure and diversity of wild bumblebee communities on ecosystem stability remains poorly understood. The ‘Beecology Project’ was created to rapidly fill this critical knowledge gap by using an army of citizen scientists to digitally collect data on bumblebee-plant species associations from natural areas across Massachusetts.

From an educational perspective, the Beecology Project provides an integrated, modular biology-computer science framework (BIO-CS Bridge)[15] to assist high school teachers in developing curricula that satisfy the next generation science standards for the state of Massachusetts. To assist citizen scientists in the collection and submission of data, the Beecology team has developed Android and Web mobile apps. However, for such data to be useful in the context of research and public education, there must be a repository in which they can be stored and integrated with data from other sources such as research labs and museum archives. There is also a critical need for information in the Beecology Database, once developed, to be accessible in a way that is biologically meaningful to researchers, educators, and the general public. For example, how can researchers compare historical and current data to test for species declines at a particular geographic location? And how can teachers and students use the database to effectively test scientific hypotheses? Finally, presenting the information in the Beecology Database in a modular way is essential for the development and implementation of modular STEM+C curricula.[15] (Figure 1.1).

Figure 1.1: Overview of Technological Needs for the Beecology Project. The Beecology Project has amassed data on bumblebee-flower species interactions from Android and Web applications, research laboratories, and museum specimens. However, it lacked a central database to store and manage these data and a visualization tool to present them to users in a way that is biologically and educationally meaningful. Filling these major technological gaps, which is the focus of my thesis, is essential for the continuation and success of the project and, ultimately, for the development of effective conservation strategies and BIO-CS high school curricula.

In this thesis, to fill these gaps in the Beecology Project, I designed a database for receiving and storing observation data from the Android and Web mobile apps and from Excel sheets, CSV files, and other text files. This database can handle large and heterogeneous data sets. It also allows multiple users to interact with the data at the same time without interference while protecting the integrity of the data. To demonstrate the power of citizen science, I also developed a web-based tool that generates multiple interactive data visualizations. These visualizations give researchers, students, teachers, conservation groups, and the general public a flexible way to use the collected ecological data to expedite exploratory study, develop hypotheses, explore the causes of bumblebee decline, and find bumblebee conservation strategies. The visualization tool not only displays bumblebee and plant data but also extends the scope of study to quantitative analyses of bumblebee abundance, diversity, and phenology and of bumblebee-plant interactions. In addition, the modular design of the visualization tool makes it easier to develop educational material that will help high school teachers and students learn how to design and develop web-based visualizations of biological data.

1.2 Background

1.2.1 Bumblebees

Life Cycle

Bumblebees have an annual cycle that commences when hibernating queens emerge from their overwintering sites and locate a suitable nest site. The queen then lays eggs in a mass of pollen that she has collected from spring flowers and feeds the larvae until they emerge as adults.[16] [17] These ‘worker’ bees then continue to collect food for the growing colony and perform a variety of duties around the nest, including housekeeping, feeding the young, nest construction, and colony defense.[18] [19] [20] In late summer to fall, depending on the species, the colony begins to produce males and virgin queens, which leave the nest and locate a mate.[21] Mated queens then locate a suitable overwintering site, and the rest of the colony dies when temperatures drop below freezing.[21] The cycle continues the following spring. The dynamics of the population over the entire cycle are called ‘phenology’, which varies from species to species. Understanding differences in species phenology can assist in both conservation and commercial uses of bumblebees.

Ecological Importance

Bumblebees are flower-visiting insects whose foraging activities provide critical pollination services to a wide variety of wild and crop plants. In fact, numerous plant species native to North America depend entirely on bumblebees for pollination. Collectively, the floral visitation patterns of bumblebees support the function and diversity of ecosystems because the plants that they pollinate provide food, shelter, and nesting sites for other wildlife.[8] However, this ‘keystone’ role of bumblebees is in serious jeopardy due to the recent rapid decline of many species around the globe. For example, research comparing current and historical distributions of 8 species in the United States showed that in the past 20 years, the relative abundances of 4 bumblebee species have decreased by up to 96%, and their species ranges have narrowed by 23-87%.[7] The continued decline of bumblebees and other pollinators will eventually lead to ecosystem collapse, producing massive reductions in biodiversity and ecosystem functions. It is therefore imperative that the factors causing wild population decline in bumblebees be identified and mitigated.

Floral resource use

Bumblebees need a source of high-quality pollen and nectar over their entire life cycle.[22] [23] Although they are considered foraging generalists because they visit flowers of more than one plant species, they do not visit flowers indiscriminately. For example, species whose individuals have long tongues tend to forage on flowers with a long corolla tube or nectar spur.[24] Economic foraging decisions mean more nectar for the colony and, consequently, greater colony reproductive output (fitness). There is also some evidence that nectar chemistry may play a significant role in determining the foraging preferences of some species, but this idea needs further empirical evaluation.[25]

1.2.2 Database

Database

A database is an organized collection of information that can be easily accessed and managed.[26] The schema of the database defines how the data are organized and how they are connected.[27]

Database Management Systems

Each database has a structure, which provides logical models for handling different types of data easily. Database management systems are applications used to create and manage databases of various shapes, sizes, and sorts.[27]

Relational Database Management Systems

Relational database systems implement the relational model, which organizes all kinds of information by defining it as related entities and fields across tables.[27] In these tables, each column (attribute) holds values of one data type. Every instance in the database is identified by a unique key and corresponds to a row in a table, and the attribute values of that instance fill the columns of the row. The tables are connected to one another to form a relational model.[27]

Schema

A database schema is the logical architecture of a database created on a database management system.[27] It offers a graphical view that describes how the database is organized, and it provides a way to represent database objects such as tables, attributes, and relations.[27] A database schema shows the tables, their attributes, and the relationships between tables. The schema is defined in the native language of the database system, and it helps database administrators understand the database structure.[27]
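To make the relational model and the role of a schema more concrete, the following is a minimal sketch and not the Beecology schema itself. It uses Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs without a database server. The table names, column names, and sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for PostgreSQL (assumption:
# chosen only so that the sketch runs without a database server).
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforce the relations declared below

# Hypothetical schema: every table has a unique key, and the observation
# table refers to the other two tables through foreign keys.
conn.executescript("""
CREATE TABLE bumblebee (
    bee_id  INTEGER PRIMARY KEY,
    species TEXT NOT NULL UNIQUE
);
CREATE TABLE flower (
    flower_id INTEGER PRIMARY KEY,
    species   TEXT NOT NULL UNIQUE
);
CREATE TABLE observation (
    obs_id      INTEGER PRIMARY KEY,
    bee_id      INTEGER NOT NULL REFERENCES bumblebee(bee_id),
    flower_id   INTEGER NOT NULL REFERENCES flower(flower_id),
    observed_on TEXT NOT NULL
);
""")

# Made-up sample rows (not taken from the Beecology data).
conn.execute("INSERT INTO bumblebee VALUES (1, 'Bombus impatiens')")
conn.execute("INSERT INTO flower VALUES (1, 'Trifolium pratense')")
conn.execute("INSERT INTO observation VALUES (1, 1, 1, '2018-06-15')")

# A join follows the keys to reassemble one observation from related tables.
rows = conn.execute("""
    SELECT b.species, f.species, o.observed_on
    FROM observation AS o
    JOIN bumblebee AS b ON b.bee_id = o.bee_id
    JOIN flower AS f ON f.flower_id = o.flower_id
""").fetchall()

# A row that points at a non-existent bee violates the relation and is rejected.
try:
    conn.execute("INSERT INTO observation VALUES (2, 99, 1, '2018-06-16')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
```

Storing each species once and referring to it by key is what lets the database reject inconsistent rows, which is one way a relational design can protect data integrity.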

PostgreSQL

PostgreSQL is an object-relational database management system.[28] It is free and open-source software. A fundamental characteristic that makes PostgreSQL flexible and robust is that it allows user-defined objects and behaviors.[29] As Lisa Smith explained, "[Beyond the] numeric, floating-point, string, boolean and date types, PostgreSQL even includes data types like universally unique identifiers (uuid), monetary, enumerated, geometric, binary, network address, bit string, text search, Extensible Markup Language (XML), JavaScript Object Notation (JSON), array, composite and range types, as well as some internal types for object identification and log location."[29] Since data values and types in the field of ecology are varied, a PostgreSQL database is a good choice in this field because it can store and manage ecological data in a flexible and stable way.

Web service

The World Wide Web Consortium (W3C) defines a web service as "a software system designed to support interoperable machine-to-machine interaction over a network".[30] A web service may, for example, enable a client end (e.g., a Web app, a mobile app, or a visualization tool) to send data to, or request data from, a remote database. A web service does not rely on any one operating system or programming language; any client can use it. Furthermore, a web service deployed on the server side can protect a remote database, since public users see only parts of the data through the client side, with limited permissions. A web service implements an API (application programming interface). The API specifies the data communication standards: what data the client end should submit and how to submit them through the Hypertext Transfer Protocol (HTTP).[31] HTTP is designed to enable communication between clients and servers under a request-response protocol.[32]
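The request-response idea can be illustrated with a small sketch. The code below is not the Beecology web service: it models a single request as a function call instead of opening a network connection, the endpoint name /observations and the field names bee_species and date are hypothetical, and an in-memory list stands in for the remote database. Only the HTTP method names and status codes are standard.

```python
import json

# In-memory list standing in for the remote database (assumption).
observations = []

REQUIRED_FIELDS = ("bee_species", "date")  # hypothetical required fields


def handle_request(method, path, body=None):
    """Return (status_code, json_text) for one HTTP-style request."""
    if path != "/observations":  # hypothetical endpoint name
        return 404, json.dumps({"error": "unknown path"})
    if method == "GET":
        # The client requests data: respond with every stored observation.
        return 200, json.dumps(observations)
    if method == "POST":
        # The client submits data: check it against the API's rules first.
        record = json.loads(body or "{}")
        missing = [name for name in REQUIRED_FIELDS if name not in record]
        if missing:
            return 400, json.dumps({"error": "missing fields", "fields": missing})
        observations.append(record)
        return 201, json.dumps({"id": len(observations)})
    return 405, json.dumps({"error": "method not allowed"})


# A client submits one observation and then fetches everything back.
post_status, _ = handle_request(
    "POST",
    "/observations",
    json.dumps({"bee_species": "Bombus impatiens", "date": "2018-06-15"}),
)
get_status, payload = handle_request("GET", "/observations")
```

Because the client only ever exchanges JSON text and status codes with the service, it never touches the database directly, which is the separation the paragraph above describes.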

1.2.3 Visualization Tool

Interactive Visualization

Interactive visualization is a type of graphical visualization that is emerging as a dynamic form of digital communication, providing attractive presentations that allow users to interact directly with data in order to construct their knowledge of it.[33] When interactive visualization is combined with traditional analysis approaches, it offers quick and dynamic ways to facilitate data exploration and analysis. Interactivity satisfies the analytical needs of users, and the visualization tools are well suited to the characteristics of long-term data.[33] An interactive user interface is also an effective way for ecologists to explore data directly, compose or clarify hypotheses, and present and exchange new ideas within research groups.[33]

Web-Based Interactive Visualization

In this thesis, interactive visualization is implemented as a web-based application. The web offers a flexible way to connect applications, data, and users, and visualization can use this feature to remotely link associated data and visual representations. As explained in "Web-based information visualization": "Current web-based techniques of interaction and navigation follow an intuitive point-and-click paradigm that lets users follow associated hyperlinks and drill down the data."[34] Web-based visualization interactions follow a similar approach. Furthermore, new web techniques and paradigms will continue to contribute to web-based visualizations as the public gains experience using the web.

Web Development

A web-based application, or website, is a set of web pages. A web page is composed of three parts: HTML, CSS, and JavaScript. HTML (Hypertext Markup Language) is a markup language used to write the content of web pages. Text and data are provided by HTML tags, and a pair of opening and closing tags forms one element.[35] CSS (Cascading Style Sheets) is a stylesheet language that describes the layout of a web page.[36] JavaScript is a scripting language that programs the behavior of web pages. It can control HTML tags and CSS.[37] SVG (Scalable Vector Graphics) is a vector image format with support for interactivity and animation; images do not lose any quality when they are zoomed or resized. It is supported through HTML tags, and SVG provides predefined shape elements such as rectangle, circle, polygon, and path, among others.[38]

D3

D3.js is an open-source JavaScript library for manipulating HTML and CSS based on data.[39] D3 helps a developer visualize data interactively by using HTML, SVG, and CSS.[39] It makes these operations simpler than writing plain JavaScript directly, and it can create interactive graphics for websites.

NVD3

NVD3 is a reusable charting library built on D3.js.[40] Since D3 is intentionally a low-level library, NVD3 provides a higher-level visualization specification language based on D3.[40] In addition, it makes D3 components reusable and intuitive.

Modularity

Modularity refers to the property of a system or a software/web application that allows it to be divided into smaller modules and integrated with similar modules, which makes software development far more manageable. Modules are divided based on functionality, which avoids redundancy. In a modular system, it is easier to design and implement new components that add functionality. In addition, an existing module can be removed from the system and replaced with a new one while the whole system remains functional.

Angular

Angular is a framework that allows the construction of apps that work across any platform, including the web, mobile web, native mobile, and native desktop.[41] It is designed to be modular, that is, to break a big system into well-connected, smaller pieces. Modularity makes it easier to implement reusable code and also makes the code more understandable. Integrating Angular with the Beecology visualization tool will help developers, high school teachers, and students better understand how web-based visualization tools can be developed in a modular way. An Angular app is composed of several components, which are connected via routing. These components may have templates attached to them, which may display component properties and attach events to interact with the properties.[41] A component may use a service to access a particular feature or perform a very specific task.[41] Services must be injected into components before they can be used from within the components; this is referred to as Dependency Injection.[41] (Figure 1.2)
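The dependency injection pattern described above can be sketched in a few lines. Angular itself is TypeScript-based and performs injection through its own injector, so the Python code below only illustrates the general pattern and does not use or reproduce Angular's API. The class names ObservationService, StubService, and SummaryComponent are hypothetical.

```python
class ObservationService:
    """Hypothetical service: the one place that knows how to supply data."""

    def __init__(self, records):
        self._records = records

    def count_by_species(self):
        counts = {}
        for record in self._records:
            name = record["species"]
            counts[name] = counts.get(name, 0) + 1
        return counts


class StubService:
    """Hypothetical replacement module that offers the same method."""

    def count_by_species(self):
        return {"Bombus affinis": 1}


class SummaryComponent:
    """Hypothetical component: receives its service instead of creating it."""

    def __init__(self, service):
        self.service = service  # dependency injected by the caller

    def render(self):
        counts = self.service.count_by_species()
        return [f"{name}: {n}" for name, n in sorted(counts.items())]


# Inject the data-backed service into the component.
service = ObservationService([
    {"species": "Bombus impatiens"},
    {"species": "Bombus vagans"},
    {"species": "Bombus impatiens"},
])
lines = SummaryComponent(service).render()

# Swap in a different module without changing the component at all.
stub_lines = SummaryComponent(StubService()).render()
```

Because the component depends only on the method it calls, a module can be replaced while the rest of the system keeps working, which is the property the Modularity paragraph relies on.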

Figure 1.2: Angular Architecture[41]

1.2.4 The Beecology Project

The Beecology Project is an ecological monitoring project that aims to use citizen scientists to collect long-term data on interactions between bumblebees and flowering plants at the species level through the use of mobile apps. Data collected by the project can later be used by academic researchers, conservation groups, garden clubs, K-12 institutions, and the general public to learn more about bumblebees and the important role that they play in maintaining the function and diversity of ecosystems around the world. Phase 1 of the project focuses on the collection of bumblebee-plant species association data, and Phase 2 will focus on nesting and overwintering site preference data for each species. Data collection will also be expanded to include other pollinator groups (e.g., butterflies and solitary bees) and additional geographic regions in eastern North America (e.g., New England states and Canadian provinces). The innovative aspect of the Beecology Project is the collection of data on bumblebee behavior (pollen vs. nectar collection) and bee-flower interactions rather than just the bee species, sex, and geographic location. Other members of the Beecology team have developed Android and Web mobile apps for collecting and submitting observation (i.e., bumblebee-flower sighting) data. These apps contain guides and constraints that lead users to identify the bumblebee species correctly, filling the taxonomic skills gap for non-experts, which will greatly decrease the verification time for logs (i.e., observations submitted by users). In addition, other members of the Beecology team have developed further components. Simulation modeling software called ‘SimBee’ is in development (Kevin Heath, unpublished results); it will allow users to simulate the environment of bumblebees and native plant species and to add and examine environmental stressors in order to predict the effects of these stressors on populations over time. We also have a website that integrates all of our tools: the Android and Web applications, the visualization tool, and the simulation modeling software. It also contains information on the Beecology project, including the background of the project and introductory tutorials.[42] The Beecology Project was created as a modular system to allow its use as part of a more formal educational project, the Bio-CS Bridge Project, whose goal is to help teachers design modular curricula integrating biology and computer science. The biology components in this system are derived from science and engineering practices and allow students to practice experimental design and hypothesis testing. The computer science infrastructure consists of interconnected computational components that are designed to support the computational needs of each biological component. Modularity of software design was critical in order to allow teachers to use the software as a working tool (‘plug and play’) or to allow students to learn to develop particular software components (replacing or adding a component in the system). Thus, the Web app and web site were implemented in Angular 6 to achieve modularity.

Other Related Projects

Citizen science projects such as Bumble Bee Watch[43] in North America and Beewatch[44] in the United Kingdom can help track changes in bumblebee distributions. Bumble Bee Watch provides Web and mobile (iOS and Android) applications that let users identify bumblebee species and submit their observation data. These apps do not collect information on bumblebee-flower associations. Observation data are presented only as distribution maps, which show observation locations based on GPS information. Ecological information such as the abundance of different bumblebee species, their activity within a season (phenology), and diversity calculations is not easily available to users. Beewatch in the United Kingdom provides Web applications that let users identify bumblebee species and submit their observation data. It gathers information on bumblebee-flower associations through expert identification of the flowers in pictures submitted by users. Like Bumble Bee Watch, Beewatch provides only a distribution map of observation data, with no other interactive data visualizations.

2 Methodology and Results

In creating the Beecology database and visualization tool, it was important to place them in the context of the Beecology system (Figure 2.1). The effective transfer of observational data from the Android and Web apps to the database over the Internet requires the development and implementation of a Web service. A Web service acts as a ‘middleman’ that establishes data communication between the client end and the server end during periods of online connectivity. The Web service can receive and process data and insert them into the database and an associated file system. It can also fetch data from the database and send them to the client end. Consequently, interactive visualization tools can be developed that fetch data through the Web service and then show the data in a user-friendly way. The interactive data visualizations, along with the Android and Web applications, are part of the client end. In contrast, the resources located on the server, namely the file system, the database, and the Web service, are part of the server end. Figure 2.1 provides a schematic overview of how the client and server ends were integrated into the system design.

Figure 2.1: Roadmap of Data Flow. Users collect observation data (bumblebee, flower, and their association) using the Android and Web applications. The apps submit data to the Web service, which receives the data and stores them in the database. Pictures and videos of bumblebee and flower associations are stored in the file system. Visualization tools can get data from the database through the Web service. The Web app and the visualization tool are parts of the web site. The Android app, Web app, and visualization tool make up the client end. The Web service, file system, and database make up the server end. Knowledgeable authorized users with database administration permissions can import other data files, such as Excel and CSV files, into the database directly.

2.1 Data sources

Information contained in the Beecology database was obtained from the following three sources (Table 2.1):

| Category | Item | Excel Records of Field Observations | Museum Records | Android / Web apps |
|---|---|---|---|---|
| Attributes an observation could have | bumblebee species | ○ | ○ | ○ |
| | gender | ○ | ○ | ○ |
| | bumblebee traits (abdomen, head, thorax) | | | ○ |
| | bumblebee behavior | ○ | | ○ |
| | flower species | ○ | | ○ |
| | flower shape | ○ | | ○ |
| | flower color | ○ | | ○ |
| | spatial information | ○ | ○ | ○ |
| | collection date | ○ | ○ | ○ |
| | picture and video of bumblebee | | | ○ (web app submits picture only) |
| Data source | data size | 18228 | 4331 | almost 800 * |
| | date range | 2015-2017, June - October | 1800-2014, March - October | 2017-2018 *, May - August |
| | geographical distribution | 4 conservation sites in MA | United States, Southern Canada | Eastern United States |

Table 2.1: Data source exploration of three data sources. ○: the data source contains the attribute. *: Only the size of the data submitted from the Android/Web apps at the completion of this thesis is listed here.

1) Excel Records of Field Observations: The 18228 bumblebee observations in this data set were collected at four conservation sites from June to October in 2015 to 2017 (see Figure 2.2). They were collected as weekly field observations of bumblebee-plant species interactions by Ellen Pierce and Dr. Robert J. Gegear as part of an ongoing field research project. Data were transcribed from digital voice recordings and then entered into Excel spreadsheets. See Table 2.1 for the variables entered into the database.

Figure 2.2: The four study sites used by Pierce and Gegear to collect data on bumblebee-plant species associations. A. Breakneck Hill conservation site, Southborough, MA. B. Mass Audubon's Wachusett Meadow Wildlife Sanctuary, Princeton, MA. C. Bullitt Reservation, Ashfield, MA. D. Pelham Lake Park, Heath, MA.

2) Museum Records: The 4331 observations and specimens in this data set were collected in the United States and southern Canada over roughly the last 200 years (1800-2014). They come from the Yale Peabody Museum of Natural History [45] and MCZBASE (The Database of the Zoological Collections) of the Museum of Comparative Zoology, Harvard University [46]. Of these, 1763 observations and specimens were collected in Massachusetts. This data set spans long time scales and large spatial scales; however, its size is small.

3) Android and Web apps: Almost 800 observations have been submitted to the database at this writing via the Beecology Android app and web app. Each record (see Table 2.1) includes bumblebee species, flower species, bumblebee and flower traits, bumblebee behavior, collection date, spatial information, and the location of picture and video files (videos are submitted only by the Android app).

Data details

Bumblebee species: There are in total 11 bumblebee species and almost 90 plant species. The 11 bumblebee species are: Bombus affinis, Bombus bimaculatus, Bombus borealis, Bombus fervidus, Bombus griseocollis, Bombus impatiens, Bombus pensylvanicus, Bombus perplexus, Bombus ternarius, Bombus terricola and Bombus vagans.

Bee traits:
Behavior: pollen, nectar, unknown
Tongue length: short, medium, long
Gender: worker, queen (both worker and queen are female), male, female, unknown

The almost 90 plant species include Alfalfa, Allegheny Monkey Flower, Alsike clover, American burnweed and American wild mint, among others.

Flower traits:
Flower shape: short/no tube, closed tube, long tube, tube with spur, open tube, unknown
Flower color: red, white, yellow, pink, blue, orange, purple, brown, green, white/pink, other color, unknown

2.1 Database Design

2.1.1 Introduction

The database was developed in PostgreSQL 9.5 and is hosted on a remote Ubuntu 16.04.1 server. It was designed as a relational database to allow expandability, flexibility, transferability, and reusability. The database created in this thesis is called the “beecology database”.

An entity may be defined as a thing capable of an independent existence that can be uniquely identified.[27] An entity is an abstraction from the complexities of a domain. A relationship describes how entities are related to one another.[27] Based on the research objectives, the data can be attributed to four entities: Bumblebee, Flowers, Observation records, and Traits (Figure 2.3); thus, the beecology database is composed of four tables. The structure of the four tables and the relationships among them follow basic ecological concepts (Figure 2.3).

Figure 2.3: Abstraction graph for the Beecology Database. Four biological concepts (observation logs, bumblebee species, flower species and traits) are abstracted into four entities (blue ellipses). There are relationships between entities: an observation has one bumblebee and may have one flower. Some attributes (pink rectangles) of entities can be represented as relationships between entities: the interaction of bumblebee and flower, bumblebee behavior, is logged only in Observation. Both bumblebee and flower have key traits such as head, abdomen, and color.

Bumblebee species and flower species are each documented and indexed in their own table, which contains basic background information: common name, Latin name, appearance features, and other traits as described below. In the database, the bumblebee and flower tables are independent of the observation data, and they can be edited to add new bumblebee or flower species.

Traits are important characteristics of bumblebees and flowers. Every bumblebee species has different color traits associated with the head, abdomen, and thorax, which are used collectively to identify females and males. Plant species likewise have a specific combination of flower color and shape traits. Bumblebees also have behavioral traits, which define how individuals interact with the floral traits. It is convenient to extract traits and document them in one table, to which new traits of bumblebees and flowers can be added.

One observation record includes one observed bumblebee, the flower on which it is foraging (if it is collecting food on a flower), the traits of both bee and flower, and record details such as date, location (latitude and longitude coordinates), and bumblebee behavior. In some cases it also includes the location in the file system where a picture and video of the observation are stored (Figure 2.4). The observation table integrates and indexes all of the observation records; it grows as users submit their observation data.

Figure 2.4: How to store the picture and video of an observation in the database. An observation sometimes includes a picture of the observed bee and the flower, and in some cases a video of the bee as well. Both observation pictures and videos are stored in a file system on the server, outside of the database. Links (relative paths) to the locations in the file system where individual observation pictures and videos are stored are included in the database, as shown in the "Observation" table in section 2.2.

Finally, the entity-relationship (ER) graph (Figure 2.5) follows the points described above. The ER graph is drawn in Crow’s Foot Notation [47]; the four tables represent the four entities and their attributes. PK denotes the primary key, the unique identifier of one instance of an entity. FK denotes the foreign key, the attribute that associates an entity with other entities. The lines, and the symbols at their ends, illustrate an association between two entities. Cardinality and modality are the measures of the logical rules for a relationship. Cardinality represents the maximum number of times that an instance of one entity can be connected with instances of the associated entity.[48] Modality represents the minimum number of times that an instance of one entity can be connected with an instance of the associated entity.[48] Cardinality can be 1 or many; its symbol is placed at the outer end of the relationship line, nearest the entity. A straight line represents 1, and a foot with three toes represents many. Modality can be 1 or 0; its symbol is placed on the inner side of the relationship line, next to the cardinality symbol. A straight line represents 1, and a circle represents 0.[48]

Figure 2.5: Entity-Relationship Graph for the Beecology Database. An observation must have one and only one bumblebee species and can have zero or one flower species. One bumblebee species can be found in zero or more observation instances. One flower species can be found in zero or more observation instances. One trait can be found in zero or more flower species or bumblebee species instances. One bumblebee or flower must have one or more traits.

2.1.2 Schema

A database schema is the skeleton that represents the business rules of the database.[27] It states how the data are organized and how the relationships among them are connected [27] (Tables 2.2-2.5).

| Column Name | Type | Description |
|---|---|---|
| bee_id | serial | The unique id of the bee species. The primary key |
| bee_name | Text | The name of the bumblebee species |
| common_name | Text | The common name of the bumblebee species |
| description | Text | The description of the bumblebee species |
| active_months | Text | The active months of the bumblebee species |
| confused | Text | The species that the bumblebee species could be confused with |
| bee_pic_path | Text | The relative path of the bumblebee species' picture |
| head_set | Text[] | All of the possible head features the bee could have |
| thorax_set | Text[] | All of the possible thorax features the bee could have |
| abdomen_set | Text[] | All of the possible abdomen features the bee could have |
| tongue_length | Text | Tongue length of the bumblebee species |

Table 2.2: “Bumblebee” species Table

| Column Name | Type | Description |
|---|---|---|
| flower_id | Serial | The unique id of the flower species. The primary key |
| flower_species | Text | The Latin name of the flower species |
| flower_genus | Text | The Latin name of the flower genus |
| flower_color | Text | Color of the flower |
| flower_shape | Text | Shape of the flower |
| flower_common_name | Text | The common name of the flower species |

Table 2.3: “Flower” Table

| Column Name | Type | Description |
|---|---|---|
| feature_id | Text | The unique id of the feature |
| feature_name | Text | The name of the feature |
| description | Text | The description of the feature |
| feature_pic_path | Text | The relative path of the feature picture |

Table 2.4: “Feature” Table

(*): Required attribute. The value of the other attributes, those without ‘*’, could be ‘unknown’.

| Column Name | Type | Description |
|---|---|---|
| beerecord_id (*) | serial | The unique id of the observation. The primary key |
| bee_name | Text | The species name of the bee |
| bee_dict_id | Integer | The id of the bee dictionary. If the logged species of the bee is unknown, this value is -1. |
| gender | Text | The gender of the bee |
| coloration_abdomen (*) | Text | The coloration type of the bee's abdomen |
| coloration_thorax (*) | Text | The coloration type of the bee's thorax |
| coloration_head (*) | Text | The coloration type of the bee's head |
| flower_id | int | The id of the flower species |
| flower_shape | Text | The shape of the flower |
| flower_color | Text | The color of the flower |
| flower_name | Text | The name of the flower |
| time (*) | Timestamp | The date of the observation |
| loc_info (*) | Text | The location information of the observation, a pair of latitude and longitude |
| city_name (*) | Text | City name of the location information (loc_info) |
| user_id (*) | Text | The id of the user who submitted this observation |
| record_pic_path | Text | The relative path of the observation picture location |
| record_video_path | Text | The relative path of the observation video location |
| bee_behavior | beebehavior_enum | The collection behavior of the bumblebee (pollen, nectar, or unknown) |

Table 2.5: “Observation” Table
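The required-attribute rule of Table 2.5 can be illustrated with a minimal JavaScript sketch. The helper name `validateObservation`, the decision to treat empty strings as missing, and the choice of which optional text attributes default to 'unknown' are assumptions made for illustration; they are not taken from the Beecology code base.

```javascript
// Required attributes of Table 2.5, excluding beerecord_id, which is a serial
// column and is therefore assumed to be generated by the database.
const REQUIRED = [
  "coloration_abdomen", "coloration_thorax", "coloration_head",
  "time", "loc_info", "city_name", "user_id",
];

// Optional text attributes that Table 2.5 allows to be 'unknown'
// (an illustrative subset, not the full list of optional columns).
const OPTIONAL_TEXT = ["bee_name", "gender", "flower_shape", "flower_color", "flower_name"];

// Treat undefined, null and the empty string as "no value supplied".
function isBlank(value) {
  return value === undefined || value === null || value === "";
}

// Hypothetical helper: reports missing required attributes and returns a copy
// of the input in which blank optional text attributes are set to 'unknown'.
function validateObservation(input) {
  const missing = REQUIRED.filter((key) => isBlank(input[key]));
  const record = { ...input };
  for (const key of OPTIONAL_TEXT) {
    if (isBlank(record[key])) record[key] = "unknown";
  }
  return { ok: missing.length === 0, missing, record };
}
```

A caller would reject the submission when `ok` is false and report the `missing` list back to the client.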

Figure 2.6: Sample data stored in tables according to the schema. The schema is shown in black; examples of table entries are in white with blue headers. Following the schema rules, the data are stored in tables, and the tables are associated with one another. Together, the associated tables, their data and the schema form the entire database.

2.2 Web Service

In this thesis, a web service was created to enable communication between the beecology database (server side) and the beecology web app, Android app and website (client side) (Figure 2.7). Our web service is implemented in Node.js v8.9.1. Node.js is an open-source, cross-platform JavaScript run-time environment that executes JavaScript code outside the browser.[49] The API implemented by our web service follows the RESTful (Representational State Transfer) standard.[50] It maps the HTTP methods (POST, GET, PUT, DELETE) in HTTP requests to CRUD (create, retrieve, update, delete) data operations.[50]

Figure 2.7: Roadmap of Web Service. The beecology Android and web apps communicate with the beecology database by sending an HTTP request to the web service, using parameters to carry the data (such as observation data). The web service defines the API, which includes the request name, method and data. The API maps the HTTP methods (POST, GET, PUT, DELETE) in the HTTP request to database CRUD (create, retrieve, update, delete) operations. After executing the operation in the database, the web service sends the database query result back to the client in JSON format. The result typically includes whether the operation succeeded, a set of data, and a success or failure message.

A client must follow the API standard to format the observation data into a valid parameter string, and then send the web service an HTTP request that contains a request name, a request method and the parameter string. The web service recognizes the purpose of the request from the request name and request method. The bumblebee ID app will typically request to add an observation to the database (Figure 2.8), while the visualization tool will typically request data to display (Figure 2.9). The web service transforms the parameter string it receives into Structured Query Language (SQL) that the database can execute.
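The mapping from request name and method to a database operation can be sketched as follows. This is a minimal illustration, not the thesis implementation: the table name `beerecord`, the route table, the helper names `buildInsert` and `toQuery`, and the reduced column list are all assumptions. The request names 'record' and 'beevisrecord' are those used in Figures 2.8 and 2.9. The sketch produces a parameterized query with PostgreSQL-style `$1, $2, ...` placeholders so that submitted values are never concatenated into the SQL text.

```javascript
// Columns accepted for an observation insert (a subset of Table 2.5, chosen for brevity).
const OBSERVATION_COLUMNS = ["bee_name", "flower_name", "time", "loc_info", "city_name", "user_id"];

// Build a parameterized INSERT for the supplied parameters.
// The table name "beerecord" is an assumption made for this sketch.
function buildInsert(params) {
  const cols = OBSERVATION_COLUMNS.filter((c) => params[c] !== undefined);
  if (cols.length === 0) throw new Error("No recognized observation attributes supplied");
  const placeholders = cols.map((_, i) => `$${i + 1}`);
  return {
    text: `INSERT INTO beerecord (${cols.join(", ")}) VALUES (${placeholders.join(", ")})`,
    values: cols.map((c) => params[c]),
  };
}

// Hypothetical route table keyed by "<METHOD> <request name>".
const ROUTES = {
  "POST record": (params) => buildInsert(params),
  "GET beevisrecord": () => ({ text: "SELECT * FROM beerecord", values: [] }),
};

// Translate an incoming request into the SQL the database should execute.
function toQuery(method, requestName, params = {}) {
  const handler = ROUTES[`${method} ${requestName}`];
  if (!handler) throw new Error(`Unsupported request: ${method} ${requestName}`);
  return handler(params);
}
```

The returned `{ text, values }` pair is the shape that common PostgreSQL clients for Node.js accept for parameterized queries; which client the web service actually uses is not stated in this section.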

Figure 2.8: Web service example - Submitting an observation to the database. The apps build an HTTP request to submit an observation to the database. In the body of the request, they set the request name ‘record’ and the request method ‘POST’, and put all required data, such as bumblebee name, flower name and collection time, into the request parameters. The apps send the request to the web service, which analyzes the request body and, from the request name ‘record’ and the method ‘POST’, locates its ‘submit observation’ function. This function transforms the data in the parameters into SQL and sends it to the database, which executes the SQL to insert the observation data into the observation table. If the operation succeeds, the web service sends a success message back to the apps as the HTTP response. Otherwise the web service sends a failure message back to the apps.

Figure 2.9: Web service example - Getting all observations from the database. The apps build an HTTP request to get all observations from the database. In the body of the request, they set the request name ‘beevisrecord’ and the request method ‘GET’; request parameters are not required. The apps send the request to the web service, which analyzes the request body and, from the request name ‘beevisrecord’ and the method ‘GET’, locates its ‘get observations’ function. This function generates an SQL statement and sends it to the database, which executes the SQL to retrieve all observations from the observation table. If the operation succeeds, the web service sends a success message and the observations in JSON format back to the apps as the HTTP response. Otherwise the web service sends a failure message back to the apps.

Some data in the database and file resources on the server should not be accessible to unauthorized users. To solve this problem, other members of the Beecology team developed an advanced user security module using Firebase Authentication, which was added to our existing web service. Firebase Authentication provides secure user authentication in a simple way. Furthermore, it provides an end-to-end user identity solution that supports multiple ways of authenticating users, including email and password accounts, phone authentication, and Google and Facebook login, among others [51] (Figure 2.10).

Figure 2.10: Firebase Authentication - Getting observations by user. A user’s submitted observations should be accessible to that user only and not to others. A user who logs in successfully with a Google account, Facebook account, phone number or email through Firebase Authentication is recognized as an authorized user and can get his/her own observations. A user whose login fails, or who sends a request without Firebase Authentication, is recognized as an unauthorized user. Requests from an unauthorized user are rejected, and the user receives a request rejection message.

2.3 Interactive Data Visualizations

2.3.1 Introduction

The visualization tool was developed with D3.js, a JavaScript visualization framework, primarily because web-based interactive visualizations are convenient and widely accessible. JavaScript is the predominant front-end scripting language for web pages, and D3 is a portable, small, and highly customizable visualization framework. Three types of information contained in the Beecology database can be visualized by clicking one of the tabs in the top bar of the data visualization home screen (Figure 2.11): bumblebee species diversity over time and space (‘Diversity’); bumblebee species abundance over time and space (‘Population’); and bumblebee-flower species interactions (‘Floral’). The ‘Population’ option contains three sub-visualizations: 1) species location over time and space (‘Location’), 2) seasonal changes in species abundance (‘Season’), and 3) a summary of observations over different time scales (‘Summary’). Figure 2.11 shows the data visualization home screen.

Figure 2.11: The First Page of Data Visualization. The first page of the visualizations is the diversity visualization. The navigation bar at the top of the homepage lets the user browse to the different visualizations.

Angular Architecture of the Beecology Visualization Tool

In order for the visualization tool to be integrated with the rest of the Beecology web resources, it had to be migrated to Angular 6 after development was completed. The implementation of the beecology data visualizations in Angular starts with a root component (Figure 2.12). The root component routes to the Navigation component, which in turn routes to five main ‘child’ components: the Diversity Component, Location Component, Season Component, Summary Component, and Floral Component. Our five beecology data visualizations are packaged into these five components. Our time controller and species selector, which are used by several of the visualizations, are packaged into two additional components; both are child components of the Navigation component. The Navigation component controls when to show one data visualization and hide the others. An API service is a service module that lets an Angular application communicate with the web service to get data. We also implemented a BeeRecord service that provides observation data pre-processing functions for the components (Figure 2.12).

Figure 2.12: Angular Modularity of the Beecology Visualization Tool. The Root App Component is the root component of the whole Angular structure of the visualization tool. Through routing, the Root App Component leads to the Navigation Component, which forms the second layer of the structure. Seven components are nested under the Navigation Component as the third layer of the structure. The Time Controller component can communicate with the Diversity Component, Location Component, and Season Component. The Species Selector component can communicate with the Location Component, Season Component and Summary Component. The Api Service and BeeRecord Service provide the required data for the Diversity Component, Location Component, Season Component, Summary Component and Floral Component. The five visualizations in section 2.3.1 are implemented by these five components.

2.3.2 Time Controller

The time controller in this visualization combines a time slider and a time range selector. It is applicable to all of the visualization tools in this thesis that present data over time. The user can select one or more years and months to filter the data by time range, and the visualization tools immediately reflect the newly filtered data (Figure 2.13).

Figure 2.13: Time controller. The selection shown would include data for the months of May, June, and July for 2015.

The upper slider is the year controller. It starts at 1800 and ends at the current year. Only a small amount of historical data falls in the interval 1800-2014. To make historical and current data comparable, we consolidate the historical data into fewer but larger subsets. Thus the start of the slider is labeled ‘<2000’ instead of ‘1800’: when this position is selected, all of the historical data from 1800 to 2000 are combined into one subset. The next position combines the historical data from 2000 to 2014 into another subset. More recent data are shown year by year. The lower slider is the month controller. The start month is April and the end month is October, which corresponds to the active months of bumblebees.
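The consolidation of historical years into slider positions can be sketched as a small bucketing function. This is an illustration under stated assumptions: the text does not say which bucket the year 2000 itself belongs to, so the sketch places years before 2000 in the first bucket and 2000 through 2014 in the second. The label '2000-2014' and the function names are hypothetical; only the label '<2000' comes from the text.

```javascript
// Map an observation year to the label of its year-slider position.
// Assumption: '<2000' holds years before 2000; the second bucket holds 2000..2014 inclusive.
function yearBucket(year) {
  if (year < 2000) return "<2000";
  if (year <= 2014) return "2000-2014";
  return String(year); // recent data are kept year by year
}

// Build the ordered list of slider positions up to a given current year.
function sliderPositions(currentYear) {
  const positions = ["<2000", "2000-2014"];
  for (let y = 2015; y <= currentYear; y++) positions.push(String(y));
  return positions;
}
```

With this scheme, a museum record from 1950 and one from 1875 fall into the same position, while field observations from 2016 and 2017 remain separate.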

2.3.3 Species selector

To see visualization results for different species of bumblebees, a species selector is necessary. The user can select one or more species to filter the observation data, and the data visualization immediately reflects the newly filtered data. This selector is applicable to all of the data visualizations that present data species by species (Figure 2.14).

Figure 2.14: Species selector. (a) Prototype of the species selector; each checkbox receives a click event and tells the visualization tools which bumblebee species are to be shown or hidden. (b) The species selector gets the species data from the database and lists all of the 11 bumblebee species with different color legends.

2.3.4 Data Preprocessing

Since observation data will often be visualized by time and species, we use the data processing function “d3.nest()” provided by D3 to re-group and nest the observation data. When a data visualization is initialized, it first requests all of the observation data from the database and stores them in a cache. Next, the cached data are passed to d3.nest for preprocessing. d3.nest builds a simple time and species index of the observation data. Once the ranges of time and species are chosen via the time controller and species selector, our visualization tools follow the indexes to find the appropriate data, which avoids scanning every observation for each selection (Figure 2.15).
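The kind of nested index described above can be sketched in plain JavaScript. This is a stand-in written without D3 so that it runs on its own; it is not the d3.nest call used in the tool. The function names, the nesting order (year, then month, then species), and the use of UTC when extracting year and month are assumptions. The field names `time` and `bee_name` follow Table 2.5.

```javascript
const MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun", "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"];

// Group observations into index[year][month][species] -> array of observations.
function buildIndex(observations) {
  const index = {};
  for (const obs of observations) {
    const date = new Date(obs.time);
    const year = String(date.getUTCFullYear());
    const month = MONTHS[date.getUTCMonth()];
    const byMonth = (index[year] = index[year] || {});
    const bySpecies = (byMonth[month] = byMonth[month] || {});
    (bySpecies[obs.bee_name] = bySpecies[obs.bee_name] || []).push(obs);
  }
  return index;
}

// Follow the index to collect observations for the selected years, months and species.
function selectObservations(index, years, months, species) {
  const result = [];
  for (const y of years) {
    for (const m of months) {
      for (const s of species) {
        const hit = ((index[y] || {})[m] || {})[s] || [];
        result.push(...hit);
      }
    }
  }
  return result;
}
```

Each lookup touches only the selected keys, so the cost depends on the size of the selection rather than on the total number of cached observations.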

Figure 2.15: Process data by date. D3 re-groups observation data by year and month. Here, the time controller selects August of 2017 and 2018; all of the data in this range are required for the visualization. The data visualization first finds the 2017 and 2018 datasets using the ‘2017’ and ‘2018’ indexes, then finds the August data using the ‘Aug’ index. The data in the green dashed border box are the required data.

2.3.5 Diversity Map

The change in biological diversity, which can be affected by many environmental factors, is a good measure of environmental changes that could lead to the rise or decline of species in an area. Areas showing a decline in diversity would be priority conservation areas. Species diversity is of major interest in both theoretical and applied studies.[52] Diversity indices are a key metric used to study spatio-temporal changes in natural communities, to identify priority areas for protection and to support effective conservation planning.[53] Hence scientists are interested in changes in the species diversity of bumblebees, especially rare or threatened species, and in their distribution with reference to reserve locations.

Diversity Indices

A diversity index is a quantitative measure of species diversity in a given community, based on species abundance and richness.[54] Abundance is the number of individuals per species. Richness is the number of species present. The Shannon index (H), a commonly used information-statistic index, is used in this thesis (Formula 2.1).[54] This index assumes all species are randomly represented in a sample.

Formula 2.1: Shannon Index

$$H = -\sum_{i=1}^{R} p_i \ln p_i$$

where $p_i$ is the proportion of individuals belonging to species i, that is, the number of individuals of that species found (n) divided by the total number of individuals found (N),[54] and R is the richness, which is the number of species in the community.
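As a worked illustration of Formula 2.1, the following sketch computes the Shannon index from per-species counts. The function name `shannonIndex` and the input shape (an object mapping species name to count) are assumptions, as is the use of the natural logarithm, since the logarithm base is not stated in the text.

```javascript
// counts: object mapping species name -> number of individuals observed.
// Returns H = -sum(p_i * ln(p_i)) over species with at least one individual.
function shannonIndex(counts) {
  const values = Object.values(counts).filter((n) => n > 0);
  const total = values.reduce((sum, n) => sum + n, 0);
  if (total === 0) return 0; // no observations: report zero diversity
  let h = 0;
  for (const n of values) {
    const p = n / total; // proportion n/N for this species
    h -= p * Math.log(p);
  }
  return h;
}
```

A community with a single species gives H = 0, and two equally abundant species give H = ln 2, about 0.693. Adding species or evening out their abundances raises the index.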

A broad geographical extent map is the main source of data on species distributions. Diversity is estimated by calculating a diversity index for each cell of a grid system.[55] To visualize the diversity map, we need spatial information on each species, a map that is customizable by the developer, and a time controller that lets users view species information at different temporal scales (months to years). (Figure 2.16)

Figure 2.16: Design Process to Generate the Diversity Map

Google Maps provides map development interfaces that allow developers to visualize their own data using spatial information (latitude and longitude). For the grid system, Google Maps allows objects to be added to the map to indicate areas, lines or collections of objects. These objects are called overlays. Overlays are tied to latitude and longitude coordinates; therefore they move when the user drags or zooms the map. They can also interact with users, so that developers can attach interactive events to overlays, such as mouse click and double click, or change the grid size when zooming in or out (Figure 2.17).

Figure 2.17: Gridview Design for Diversity Map

For each cell, the ranges of width and height correspond to ranges of longitude and latitude on the map. Each observation can be plotted as one point on the map according to its GPS information. Whether a given point lies inside or on the boundary of a grid cell can be determined by checking whether the latitude and longitude of the observation fall within the latitude and longitude intervals of the grid cell. All of the observations in a grid cell take part in the diversity calculation for that cell (Figure 2.18).
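The cell-membership test can be sketched as follows. This is a simplified illustration, not the overlay code of the tool: it assumes square cells of a fixed size in degrees, whereas the tool ties the grid size to the map zoom level, and it uses half-open intervals so that a point on a shared boundary is assigned to exactly one cell. The function names and the input fields `lat` and `lng` are hypothetical.

```javascript
// Identify the grid cell containing a point. cellSize is the cell width and
// height in degrees (an assumption made for this sketch).
function cellKey(lat, lng, cellSize) {
  const row = Math.floor(lat / cellSize);
  const col = Math.floor(lng / cellSize);
  return `${row}:${col}`;
}

// Group observations by cell and count individuals per species in each cell.
// Returns a Map from cell key to an object { speciesName: count }.
function countByCell(observations, cellSize) {
  const cells = new Map();
  for (const { lat, lng, bee_name } of observations) {
    const key = cellKey(lat, lng, cellSize);
    if (!cells.has(key)) cells.set(key, {});
    const counts = cells.get(key);
    counts[bee_name] = (counts[bee_name] || 0) + 1;
  }
  return cells;
}
```

The per-species counts of each cell are the input a diversity index needs, and the same counts could feed the stacked bar shown when a cell is clicked.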

Figure 2.18: Calculate diversity of each grid cell

Mapping the numerical diversity value to a color is a user-friendly way to display the diversity values of all grid cells. With a gradient color scale, we can smoothly interpolate between two colors from any palette to obtain a continuous scale (Figure 2.19). Typically, the Shannon index in real ecosystems ranges from 1.5 to 3.5,[56] and its value is rarely greater than 4.[57] The upper limit for the diversity visualization is a Shannon index of 5.00. In the current dataset, no diversity value of any grid cell exceeds 5.00.

Most of the values are in the range of 0 to 2.

D3 provides an interpolation function that returns an interpolator between two arbitrary values a and b, where a and b can be RGB colors. The interpolator maps a normalized parameter t in [0, 1] to the corresponding value in the range [a, b]:

    colorScale = d3.interpolate(a, b);

D3 also provides a scale function that maps a continuous, quantitative input domain to a continuous output range. We use it to map the diversity range [0, 5.0] to [0, 1]:

    diversityScale = d3.scale.linear().domain([0, 5.0]).range([0, 1]);

After these two mapping steps, a diversity value D can be mapped to a color value C:

    C = colorScale(diversityScale(D));
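The two mapping steps can also be written out without D3, which makes the arithmetic explicit. The sketch below is a plain-JavaScript stand-in for the D3 calls, not the code of the tool. The helper names are hypothetical, the yellow and red endpoint colors are illustrative values, and the clamping of out-of-range inputs is an addition of this sketch (a D3 linear scale does not clamp unless asked to).

```javascript
// Linear scale from [d0, d1] to [0, 1], clamped to the output range.
function linearScale(d0, d1) {
  return (value) => Math.min(1, Math.max(0, (value - d0) / (d1 - d0)));
}

// Interpolate channel by channel between two [r, g, b] colors for t in [0, 1].
function interpolateRgb(a, b) {
  return (t) => {
    const channel = (i) => Math.round(a[i] + (b[i] - a[i]) * t);
    return `rgb(${channel(0)}, ${channel(1)}, ${channel(2)})`;
  };
}

const diversityScale = linearScale(0, 5.0);
// Illustrative endpoints: pure yellow for low diversity, pure red for high diversity.
const colorScale = interpolateRgb([255, 255, 0], [255, 0, 0]);

// Map a diversity value D to a color C in two steps.
function colorFor(diversity) {
  return colorScale(diversityScale(diversity));
}
```

With these endpoints, a diversity of 0 maps to yellow, 5.0 maps to red, and 2.5 maps to an orange halfway between them.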

Figure 2.19: The diversity value is mapped to a color gradient. One light color and one dark color from any palette are smoothly interpolated into a continuous scale. Low Shannon diversity is mapped to the light color and high diversity to the dark color. The domain of Shannon diversity is [0.0, 5.0].

Figure 2.20 depicts the final appearance of the diversity map. It contains all of the observation data currently in the database. Here we chose a yellow-to-red gradient color scale to map the diversity value. Only grid cells that contain observation data are visible.

Figure 2.20: Diversity Map. Each colored grid position contains observation data. Colors are scaled to represent biodiversity at each grid position, with red (rgb(255,255,0)) representing highest diversity and yellow (rgb(251,255,12)) representing lowest diversity. See text for details.

When the user zooms in or out of the map, the grid system is refreshed, and the grid size and the diversity of each new grid cell are updated interactively (Figure 2.21).

Figure 2.21: The Diversity Map Allows Interactive Zooming by the User. (a) There are three grid cells containing data in the city of Worcester. (b) After zooming in one level, the grid system redivides the overlay and recalculates the diversity. As a result, there are now four grid cells containing data in Worcester, and their positions are more precise.

Users can click any colored grid cell to get the species abundance in that area; this is done by binding a mouse event to the Google Map:

    this.google.maps.event.addListener(this.map, 'click', function (event) { /* handler body: see Appendix A.1 */ });

(See Appendix A.1.) A stacked bar reports the species distribution and the total number of each species in the grid cell (Figure 2.22).

Figure 2.22: Stacked bar presents species abundance of one grid cell. (a) Grid cell with high diversity and bumblebee abundance. The stacked bar shows that there are 6 species and a total of 284 observations in this area. (b) Grid cell with low diversity and bumblebee abundance. The stacked bar shows there is only 1 species, Bombus vagans, in this area, and the total number of observations is 1. The diversity is 0.

2.3.6 Population

The population visualization shows bumblebee population in three different ways:

1. Location: distribution of population by map location for the time and species selected by the user (Figure 2.23).

Figure 2.23: Overview of Location Visualization. Clicking ‘Population’ in the navigation bar displays a nested navigator under the top navigation bar. Clicking ‘Location’ (shown in orange) displays the Location Visualization on the webpage. It is composed of year and month time controllers, a Google Map covered by donut graphs, and a species selector.

2. Season: population change over the bumblebee active season (May-October) for the years selected by the user (Figure 2.24).

Figure 2.24: Overview of Season Visualization. Clicking ‘Season’ (shown in orange) displays the Season Visualization on the webpage. It is composed of a year controller, a multi-line graph and a species selector.

3. Summary: total observation numbers by season and year for the species selected by the user (Figure 2.25).

Figure 2.25: Overview of Summary Visualization. Clicking ‘Summary’ (shown in orange) displays the Summary Visualization on the webpage. It is composed of a heatmap and a species selector.

Population by Location

Similar to the diversity map, the Population Location visualization presents the distribution on the map of all of the observations selected by the user. However, instead of calculating the diversity index at a given location, the tool displays the number of each species that was observed. Importing large amounts of data can slow down the mapping process considerably. Google Map Marker Cluster, a library for the Google Maps API, uses a small number of cluster markers instead of thousands of map markers to represent many observations (Figure 2.26a).

Figure 2.26: Map and map cluster markers - Zoom in and out. (a) Densely packed map markers are clustered into fewer cluster markers by the Google Marker Cluster library. Similar to the diversity map, Google Marker Cluster divides the map into a gridview and adds cluster markers to the Google Map as an overlay. (b) There are 11 locations in the Western United States, which are clustered into 2 cluster markers of 4 and 7. After zooming in on the map, there are 2 cluster markers with a reduced total number.

Notice that as users zoom into any one cluster location, the number in that cluster may decrease, as clusters are subdivided into smaller clusters. Zooming out of the map consolidates the markers into larger clusters again (Figure 2.26b).

To observe one or more species at the same time, the user can employ the species selector. In the initial implementation of the Population Location visualization, all map markers were divided into 11 categories of clusters, one category per species. Selecting or deselecting options in the species selector shows or hides the corresponding categories on the map. The library allows the developer to customize the style of the cluster marker icon. In the initial design of the Population Location visualization, we set a circle as the icon of the cluster markers. The text in the circle is the total number of map markers in one area. The colors of the circles are the same as the colors of the species options in the species selector (Figure 2.27).

Figure 2.27: Initial design for map cluster markers showing species selection. When ‘Bombus impatiens’ is checked in the species selector, only the blue circle cluster markers, which show the population distribution of Bombus impatiens, appear on the map.

However, circle cluster markers of different species are likely to overlap in the same area of the map, making it difficult for viewers to distinguish the species’ population distributions. The 11 categories of species clusters are created independently of one another by the Google Map Marker Cluster library. The library only measures the distance between two map markers, not the distance between two cluster markers: if two map markers are too close, they are assigned to one cluster. It cannot recognize when clusters from two or more categories are too close to be easily distinguished (Figure 2.28).

Figure 2.28: Google Map Marker Cluster library. The library draws the circle clusters of different species as overlapping circles, making it difficult to distinguish different population sizes.

Therefore, for the final design of this data visualization we did not divide the map markers into 11 different categories of species clusters. Instead, the library clusters all map markers into a single category, so cluster markers no longer overlap. To present a clear summary of species abundance in one area, we use a donut chart as the cluster marker icon. A donut chart can combine the map markers of all species in an area into a single cluster marker. The donut charts are produced by D3. Since the Google Map Marker Cluster library is open source, we were able to integrate the D3 donut chart code into the library (Figures 2.29 and 2.30).

Figure 2.29: Circle map cluster markers -> donut cluster markers. The initial marker cluster design, the circle map cluster marker, is changed to a donut cluster marker. All map markers of 3 bumblebee species are combined in one donut graph as a single cluster marker.

Figure 2.30: Adding the donut graph function to the Google Map Marker Cluster library. The library ignores the bumblebee category and treats all the markers in one area as one cluster. The classification work is assigned to the D3 donut graph.
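To make the division of labor concrete, the sketch below shows how the markers of one cluster could be turned into donut segments: the cluster supplies an unclassified list of markers, and the per-species counting happens at drawing time. The function name `donutSegments`, the marker shape `{ species }` and the optional species filter are assumptions for illustration; the thesis code draws the arcs with D3, which is left out here so the example runs on its own.

```javascript
// Hypothetical helper: compute donut segments for the markers of one cluster.
// selectedSpecies is optional; when given, only those species are counted.
function donutSegments(markersInCluster, selectedSpecies) {
  const counts = new Map();
  for (const m of markersInCluster) {
    if (selectedSpecies && !selectedSpecies.includes(m.species)) continue;
    counts.set(m.species, (counts.get(m.species) || 0) + 1);
  }
  const total = Array.from(counts.values()).reduce((a, b) => a + b, 0);
  const segments = [];
  let start = 0;
  for (const [species, count] of counts) {
    // Each species receives a share of the full circle proportional to its count.
    const angle = (count / total) * 2 * Math.PI;
    segments.push({ species: species, count: count, startAngle: start, endAngle: start + angle });
    start += angle;
  }
  return { total: total, segments: segments };
}

// Made-up cluster: three Bombus impatiens markers and one Bombus bimaculatus marker.
const cluster = [
  { species: "Bombus impatiens" },
  { species: "Bombus impatiens" },
  { species: "Bombus bimaculatus" },
  { species: "Bombus impatiens" },
];
const allSpecies = donutSegments(cluster);
const onlyBimaculatus = donutSegments(cluster, ["Bombus bimaculatus"]);
```

The `total` value corresponds to the number shown in the center of the donut, and the segment angles give the relative abundance of each species in that cluster.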

To avoid having to calculate when overlap makes a donut chart necessary, all of the observation data from the database are presented as donut chart cluster markers (Figure 2.31). No cluster markers overlap on the map.

Figure 2.31: Donut map cluster markers, all observations. Donut cluster markers solved the problem of overlapping cluster markers.

The donut chart dynamically updates the species population numbers when the species data are changed through the species selector and time selector. For example, if the user selects only two species, Bombus impatiens and Bombus bimaculatus, the donut charts hide the sectors of the other species, and only the population distributions of Bombus impatiens and Bombus bimaculatus are displayed on the map (Figure 2.32).

Figure 2.32: Map cluster markers change dynamically when the user selects species. The user has selected Bombus impatiens and Bombus bimaculatus; thus, only these species are visualized. Bombus impatiens populations are larger than Bombus bimaculatus populations at most locations.

When the user zooms in on the map, the Google Map Marker Cluster library renders new donut cluster markers for Bombus impatiens and Bombus bimaculatus (Figure 2.33).

Figure 2.33: Map cluster markers adjust dynamically when the user zooms in. The donut chart cluster markers are re-rendered as a larger number of cluster markers with smaller centered numbers when the user zooms in on the map.

Population by Season

To present the phenology, or abundance of different bumblebee species across the active season of the year (typically May to October in New England), a line graph is an appropriate visualization. It makes the peaks in the population of a species easy to find (Figure 2.34 a; species 3).

Figure 2.34: Prototype of the line graph for bumblebee phenology. (a) With the Y axis on a linear scale, it is difficult to distinguish the peaks of bumblebee species 1 and 2. (b) With a logarithmic scale on the Y axis, the peaks of all three bumblebee species are evident.

However, the number of some bumblebee species at their peak may be an order of magnitude greater than that of other species. The lines representing species with smaller populations are then compressed at the bottom of the graph. To solve this issue, we use a logarithmic scale on the vertical axis. On a logarithmic scale the counts are not plotted equidistantly; instead, two equal percent changes are positioned the same distance apart. The same distance on the scale therefore covers a wider range of counts toward the top of the vertical axis than toward the bottom (Figure 2.34 b).
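A short numerical sketch may help show the effect. It compares where a count lands on a linear axis and on a base-10 logarithmic axis. The function names, the 300-pixel axis height and the domain of 1 to 1000 are assumptions for illustration only, and the handling of zero counts (whose logarithm is undefined) is deliberately left out.

```javascript
// Position (in pixels from the bottom) of a count on a linear axis.
function linearY(count, domainMax, heightPx) {
  return (count / domainMax) * heightPx;
}

// Position of a count on a base-10 logarithmic axis.
// domainMin and count must be greater than 0.
function logY(count, domainMin, domainMax, heightPx) {
  const span = Math.log10(domainMax) - Math.log10(domainMin);
  return ((Math.log10(count) - Math.log10(domainMin)) / span) * heightPx;
}

const heightPx = 300;
// A rare species peaking at 10 observations next to a common one peaking at 1000.
const rareLinear = linearY(10, 1000, heightPx);
const rareLog = logY(10, 1, 1000, heightPx);
// Equal ratios (10 -> 100 and 100 -> 1000) cover equal distances on the log axis.
const stepLow = logY(100, 1, 1000, heightPx) - logY(10, 1, 1000, heightPx);
const stepHigh = logY(1000, 1, 1000, heightPx) - logY(100, 1, 1000, heightPx);
```

On the linear axis the rare species peaks about 3 pixels above the baseline, whereas on the logarithmic axis it reaches about a third of the axis height, so its peak stays visible next to the common species.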

To produce the line graphs, we use the reusable D3 charting library nvd3 (see Appendix A.2).

    d3.select("svg")                 // select the svg tag
      .datum(data)                   // assign the data set
      .call(nv.models.lineChart());  // call the nvd3 line chart generation function
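The data set handed to `.datum(...)` has to be in the series format that nvd3 line charts read: an array of objects, each with a `key` (the series name) and a `values` array of `{x, y}` points. The sketch below shows one way such series could be built from raw observation records. The record shape `{ species, date }`, the function name `toLineSeries` and the pooling of all years into one count per month are assumptions for illustration, not the thesis implementation.

```javascript
// Hypothetical record shape: { species: string, date: "YYYY-MM-DD" }.
// Returns one series per species, with one point per month of the active season.
function toLineSeries(records, months = [5, 6, 7, 8, 9, 10]) {
  const bySpecies = new Map();
  for (const r of records) {
    const month = Number(r.date.slice(5, 7));
    if (!months.includes(month)) continue; // outside the active season
    if (!bySpecies.has(r.species)) {
      bySpecies.set(r.species, new Map(months.map((m) => [m, 0])));
    }
    const counts = bySpecies.get(r.species);
    counts.set(month, counts.get(month) + 1);
  }
  return Array.from(bySpecies, ([species, counts]) => ({
    key: species,
    values: months.map((m) => ({ x: m, y: counts.get(m) })),
  }));
}

// Made-up records spanning two years.
const seasonRecords = [
  { species: "Bombus impatiens", date: "2016-08-03" },
  { species: "Bombus impatiens", date: "2017-08-15" },
  { species: "Bombus bimaculatus", date: "2016-06-01" },
  { species: "Bombus impatiens", date: "2016-12-01" },
];
const series = toLineSeries(seasonRecords);
```

Each element of `series` becomes one line in the chart, and the December record is dropped because it falls outside the months shown on the X axis.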

Similar to the location visualization above, the bumblebee season visualization also integrates the time controller and species selector. Since the same months are always shown along the X axis, the month time controller is disabled here (Figure 2.35).

Figure 2.35: Line graph for bumblebee season. The line graph in the season visualization shows all of the bumblebee species by month across all of the years in the database.

To see the power of the season visualization, we set the time range to 2015 - 2018 and select the species Bombus impatiens, Bombus bimaculatus, Bombus pensylvanicus and Bombus borealis (Figure 2.36). Since there are no Bombus pensylvanicus observations in 2015 - 2018, only the lines for Bombus impatiens, Bombus bimaculatus and Bombus borealis are shown on the webpage. When the user hovers over the graph, the observation numbers for the month corresponding to the X axis value of the hover point are listed in the tool box: from 2015 to 2018, there are 6240 Bombus impatiens, 15 Bombus bimaculatus and 22 Bombus borealis observations in August.

Figure 2.36: Line graph for bumblebee season, with data filtered by time and species. A pop-up tool box is produced by hovering over the graph and shows the number of observations for the nearest month.

Population Summary

A time-based summary of observation records tells scientists the status of observations in different time periods. A heat map is a graphical visualization in which the values held in a matrix are portrayed as colors. Calendar heat maps are useful for visualizing recurring discrete activities over long periods of time. In this thesis, the color of each matrix position is determined by the total number of observations in the corresponding month (Figure 2.37).

Figure 2.37: Mapping observation data to heatmap cells. The square with the blue border represents the total number of observations in August 2017. The data are looked up by date index, the total number of observations is calculated, and this number is assigned to the blue-bordered square. The same is done for the green- and purple-bordered squares.
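The lookup described in this caption amounts to grouping observations by year and month and counting them. A minimal sketch follows, assuming records of the hypothetical shape `{ date: "YYYY-MM-DD" }`; the function names `monthlyTotals` and `heatmapCells` are illustrative and not taken from the thesis code.

```javascript
// Count observations per "YYYY-MM" key.
function monthlyTotals(records) {
  const totals = new Map();
  for (const r of records) {
    const key = r.date.slice(0, 7);
    totals.set(key, (totals.get(key) || 0) + 1);
  }
  return totals;
}

// Build one heatmap cell per (year, month) pair; months with no data get 0.
function heatmapCells(totals, years) {
  const cells = [];
  for (const year of years) {
    for (let month = 1; month <= 12; month++) {
      const key = year + "-" + String(month).padStart(2, "0");
      cells.push({ year: year, month: month, total: totals.get(key) || 0 });
    }
  }
  return cells;
}

// Made-up observation dates.
const summaryRecords = [
  { date: "2017-08-01" },
  { date: "2017-08-20" },
  { date: "2017-09-02" },
];
const totals = monthlyTotals(summaryRecords);
const cells = heatmapCells(totals, [2017]);
```

Here the cell for August 2017 receives the value 2 and the cell for September 2017 the value 1; every other month of 2017 is 0.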

Figure 2.38: Heatmap prototype. Similar to the diversity visualization, the observation number of each month is assigned to a cell of a grid-view graph, the heatmap. The darker the color of a grid cell, the greater the number of observations in that month. The Y axis represents the month and the X axis represents the year.

Following the heatmap prototype (Figure 2.38), D3 was used to construct the heatmap (Figure 2.39). The method of color scaling is the same as the diversity value mapping method in section 2.4.5.

    // Add a square for each month and map its color to that month's observation total
    d3.select('svg').append('g').selectAll('rect').data(data).enter()
      .append('rect').attr('fill', function (d) { return colorscale(d); });

Figure 2.39 shows the current status of the database, with all bumblebee species selected.

Figure 2.39: Heatmap. The color mapping is similar to the diversity value mapping method in section 2.4.5. The color range is [#ffffbf, #a50026], and the domain of the observation number of each grid cell is [0, maximum number of observations per month]. The light yellow color (#ffffbf) represents the lowest observation number, 0, and the dark red color (#a50026) represents the highest observation number.
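As a rough illustration of this mapping, the sketch below interpolates linearly in RGB between the two colors named in the caption. It is written without D3 so that it runs on its own; the function names are hypothetical, and plain RGB interpolation is an assumption about how the scale blends the two end colors.

```javascript
// Convert "#rrggbb" to [r, g, b].
function hexToRgb(hex) {
  const n = parseInt(hex.slice(1), 16);
  return [(n >> 16) & 255, (n >> 8) & 255, n & 255];
}

// Build a function mapping an observation total in [0, maxValue] to a color.
function makeColorScale(maxValue, low = "#ffffbf", high = "#a50026") {
  const a = hexToRgb(low);
  const b = hexToRgb(high);
  return function (value) {
    // Clamp to [0, 1]; a maximum of 0 maps everything to the low color.
    const t = maxValue === 0 ? 0 : Math.min(Math.max(value / maxValue, 0), 1);
    const c = a.map((v, i) => Math.round(v + (b[i] - v) * t));
    return "rgb(" + c.join(", ") + ")";
  };
}

const colorscale = makeColorScale(18);
const emptyMonth = colorscale(0);  // light yellow end of the scale
const busiestMonth = colorscale(18); // dark red end of the scale
```

A month with no observations receives the light yellow end of the scale, the busiest month receives the dark red end, and every other month falls in between in proportion to its total.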

The user can also choose to display summary numbers for selected species only. As Figure 2.40 shows, Bombus vagans, Bombus perplexus and Bombus ternarius are selected, so only the observation numbers of these 3 species are visualized on the heatmap. When the user hovers over the grid cell for September 2017, the cell is highlighted with a dark orange border and the observation number for that month is shown in the tool box. There are 18 observations in September 2017.

Figure 2.40: Heatmap, with data filtered by species and a hover event. The user has selected Bombus vagans, Bombus perplexus and Bombus ternarius with the species selector. The visualization updates the heatmap with the newly filtered data. When the user hovers over the September 2017 cell, a pop-up tool box shows the total number of observations of these species in September 2017.

2.3.7 Floral Preferences

The interaction between bumblebees and plants is still poorly understood at the species level, but is likely to depend on a number of traits. From the bumblebee’s perspective, tongue length, gender, and behavior (foraging for nectar or pollen) are possible factors relevant to specialization.[58] From the plant’s perspective, potential traits are flower shape and flower color. To allow users to test for important associations between bumblebees and plants at the species level, a Floral Preferences Tool was created in which species associations are represented as a bipartite network, where pollinators and plants are nodes and pollination interactions form the links connecting plant and pollinator communities.[59]

To allow users to test hypotheses regarding the bumblebee and flower traits that mediate interactions, the bipartite graph needs to present the overall dataset along with a variety of informative sub-datasets. Thus, all of the traits can be set as data filters to generate the sub-datasets. Furthermore, statistics on the traits will help scientists design pollination network hypotheses. While the bipartite graph shows the results, statistical graphs such as pie charts and histograms provide details of the traits (Figure 2.41).

Figure 2.41: Design process for generating the floral visualization

Bipartite Graph Generation and Visualization

The bipartite graph displays the abundance of interactions between bumblebees and the flowers they visit. First, a matrix representing the bumblebee - flower pollination network is created during data preprocessing (Table 2.6). Each number in the data matrix represents the number of interactions between a bumblebee species and a flower species. From this matrix, we can build a basic bipartite graph (Figure 2.42 (a), (b)).

Table 2.6: A matrix of the bumblebee - flower network. There are 3 bumblebee species and 5 flower species in the data. The number n(i,j) in cell (i,j) is the number of observations in which bumblebee species i visited flower species j.
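The preprocessing step that produces such a matrix can be sketched as follows. The record shape `{ bee, flower }`, the function name `interactionMatrix` and the alphabetical ordering of species are assumptions made for this illustration; the thesis does not specify them.

```javascript
// Hypothetical record shape: { bee: string, flower: string }, one per observation.
// Returns the species lists and a matrix where matrix[i][j] counts how often
// bee species i was observed visiting flower species j.
function interactionMatrix(records) {
  const bees = [...new Set(records.map((r) => r.bee))].sort();
  const flowers = [...new Set(records.map((r) => r.flower))].sort();
  const matrix = bees.map(() => flowers.map(() => 0));
  for (const r of records) {
    matrix[bees.indexOf(r.bee)][flowers.indexOf(r.flower)] += 1;
  }
  return { bees: bees, flowers: flowers, matrix: matrix };
}

// Made-up observations.
const network = interactionMatrix([
  { bee: "Bombus impatiens", flower: "Red clover" },
  { bee: "Bombus impatiens", flower: "Red clover" },
  { bee: "Bombus impatiens", flower: "Cow vetch" },
  { bee: "Bombus bimaculatus", flower: "Cow vetch" },
]);
```

In this small example the row for Bombus impatiens holds 1 visit to Cow vetch and 2 visits to Red clover, and the row for Bombus bimaculatus holds a single visit to Cow vetch.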

In this thesis, rectangular nodes represent bumblebee and flower species, and the height of each rectangle is proportional to the total number of interactions involving that species. Interacting bumblebee and flower species are connected by lines, and the height of each line is proportional to the number of interactions between the two species (Figure 2.42(c)). To make the proportions easier to see, we treat the colors of the bumblebee rectangles as the main colors, with one color per bumblebee species. On the flower side, the tool reuses the bumblebee colors rather than giving each flower species its own color. Each flower rectangle is divided into sub-rectangles whose heights represent the interaction numbers and whose colors are those of the associated bumblebee species (Figure 2.42(d)).

Figure 2.42: How to build a bipartite graph.

(a) Let $E(b_i, f_j)$ be the number of interactions between bumblebee $b_i$ and flower $f_j$. List all interaction numbers according to Table 2.6.

(b) For each $E(b_i, f_j)$, start at $b_i$ on the left side and draw a line to $f_j$ on the right side; $b_i$ and $f_j$ are nodes. This creates a simple bipartite graph.

(c) Transform the shape of all nodes to rectangles. Define the height of the bipartite graph as $H$, the interval between rectangles as $hh$, the number of bumblebee species as $n_b$, and the number of flower species as $n_f$. Writing $N(\cdot)$ for the number of observations of a species, the height of each rectangle node is

$$h(b_i) = \bigl(H - (n_b - 1)\,hh\bigr)\,\frac{N(b_i)}{\sum_k N(b_k)}, \qquad h(f_j) = \bigl(H - (n_f - 1)\,hh\bigr)\,\frac{N(f_j)}{\sum_k N(f_k)}.$$

(d) Each flower rectangle is divided into sub-rectangles whose heights represent the interaction numbers:

$$h\bigl(E(b_i, f_j)\bigr) = h(f_j)\,\frac{E(b_i, f_j)}{N(f_j)}.$$

The color of sub-rectangle $E(b_i, f_j)$ is the color of the associated bumblebee node $b_i$.
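The height rules in panels (c) and (d) can be checked numerically with a short sketch. The function name `bipartiteLayout` and the example numbers are assumptions; the rows of the matrix are bumblebee species and the columns are flower species, and the SVG drawing itself is left out.

```javascript
// matrix[i][j]: number of interactions between bee species i and flower species j.
// H: total height of the bipartite graph; hh: gap between neighboring rectangles.
function bipartiteLayout(matrix, H, hh) {
  const beeTotals = matrix.map((row) => row.reduce((a, b) => a + b, 0));
  const flowerTotals = matrix[0].map((_, j) =>
    matrix.reduce((sum, row) => sum + row[j], 0));
  const grandTotal = beeTotals.reduce((a, b) => a + b, 0);

  // Height left for rectangles after subtracting the gaps, as in panel (c).
  const beeSpace = H - (beeTotals.length - 1) * hh;
  const flowerSpace = H - (flowerTotals.length - 1) * hh;
  const beeHeights = beeTotals.map((n) => beeSpace * (n / grandTotal));
  const flowerHeights = flowerTotals.map((n) => flowerSpace * (n / grandTotal));

  // subHeights[i][j]: height of bee i's colored slice inside flower j, as in panel (d).
  const subHeights = matrix.map((row) =>
    row.map((e, j) =>
      flowerTotals[j] === 0 ? 0 : flowerHeights[j] * (e / flowerTotals[j])));

  return { beeHeights: beeHeights, flowerHeights: flowerHeights, subHeights: subHeights };
}

// Made-up network: 2 bee species (rows) and 2 flower species (columns).
const layout = bipartiteLayout([[3, 1], [0, 4]], 110, 10);
```

With a graph height of 110 and a gap of 10, each side has 100 units to distribute. Both bee species have 4 interactions out of 8, so each bee rectangle is 50 units high, and the slices inside each flower rectangle add up to that flower's height.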

We use D3 to create SVG rectangle elements and assign them attributes such as height values.

    d3.select('svg').append('rect').attr('height', heightValue); // heightValue: the computed rectangle height

The user can focus on the interactions of only one bee or one flower species by clicking on the rectangle representing that species, or on the species name. When the user clicks on one bumblebee rectangle in the left bumblebee bar, only the visits of that bumblebee species are shown. Only the flower rectangles visited by that bumblebee species remain visible; the other rectangles shrink to the minimum size. Here, we set the minimum height to 0px, so these minimal rectangles are hidden from the bipartite graph (Figure 2.43). Clicking on a flower rectangle or species name has a similar effect from the flower perspective.

Figure 2.43: Click event on one bumblebee rectangle. We define the minimum height of a flower node not visited by bumblebee $b_i$ as $mh$, the interval between rectangles as $hh$, the total number of flower species as $ns$, the number of flower species not visited by bumblebee $b_i$ as $n$, and the height of the bipartite graph as $H$. After the user clicks on the $b_2$ node, all of the flower nodes not visited by $b_2$ ($f_1$, $f_3$, $f_4$) transition to the minimum size. The heights of the other flower nodes ($f_2$, $f_5$) are recalculated from each interaction's proportion of the total number of observations of $b_2$, written $N(b_2)$: $h(f_2)' = (H - n \cdot mh - (ns - 1) \cdot hh) \cdot E(b_2, f_2) / N(b_2)$ and $h(f_5)' = (H - n \cdot mh - (ns - 1) \cdot hh) \cdot E(b_2, f_5) / N(b_2)$.
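The recalculation in this caption can be written as a small function and checked with numbers. The function name `flowerHeightsOnBeeClick` and the interaction counts in the example are assumptions; the formula follows the caption, with unvisited flowers set to the minimum height.

```javascript
// row: interaction counts of the clicked bee species with every flower species.
// H: graph height; hh: gap between rectangles; mh: minimum height (0 hides a node).
function flowerHeightsOnBeeClick(row, H, hh, mh = 0) {
  const ns = row.length;                        // total number of flower species
  const n = row.filter((e) => e === 0).length;  // flower species not visited
  const beeTotal = row.reduce((a, b) => a + b, 0);
  const space = H - n * mh - (ns - 1) * hh;
  // Unvisited flowers shrink to mh; visited ones share the remaining space.
  return row.map((e) => (e === 0 ? mh : space * (e / beeTotal)));
}

// Made-up counts for a bee that visited only the 2nd and 5th of five flowers.
const clickedHeights = flowerHeightsOnBeeClick([0, 6, 0, 0, 2], 140, 10);
```

With a graph height of 140 and gaps of 10, 100 units remain for the visited flowers, which receive 75 and 25 units in proportion to 6 and 2 visits, while the three unvisited flowers collapse to height 0.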

Filters on flower and bumblebee traits filter the bumblebee-flower dataset; the data are then preprocessed again and the bipartite graph is re-rendered under the chosen trait conditions. The options of each trait filter, shown as checkboxes, are retrieved from the database. Each trait filter has a switch that activates or disables it, which adds or removes all options of that trait from the filter conditions (Figure 2.44).

Figure 2.44: Bipartite graph with trait filters for bumblebees and flowers. Bumblebees have the traits behavior, tongue length, and gender, and their trait filters are placed on the bumblebee side of the bipartite graph. Flowers have the traits flower shape and color, and their trait filters are placed on the flower side.
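One way to express the switch-plus-checkbox behavior is sketched below. The filter structure `{ active, allowed }`, the field names on the records and the function name `applyTraitFilters` are assumptions for illustration; a disabled filter simply places no condition on its trait.

```javascript
// filters: { traitName: { active: boolean, allowed: string[] } }
// A record passes if, for every active filter, its trait value is among the checked options.
function applyTraitFilters(records, filters) {
  return records.filter((r) =>
    Object.entries(filters).every(([trait, f]) =>
      !f.active || f.allowed.includes(r[trait])));
}

// Made-up observations with one bee trait and one flower trait each.
const observations = [
  { bee: "Bombus fervidus", behavior: "nectar", flowerColor: "purple" },
  { bee: "Bombus impatiens", behavior: "pollen", flowerColor: "purple" },
  { bee: "Bombus impatiens", behavior: "nectar", flowerColor: "yellow" },
];

const filtered = applyTraitFilters(observations, {
  behavior: { active: true, allowed: ["nectar"] },
  flowerColor: { active: false, allowed: ["purple"] }, // switched off: ignored
});
```

Only the two nectar-foraging observations remain, because the flower color filter is switched off and therefore does not restrict the result. The filtered records would then be passed through the same preprocessing as the full dataset before the graph is redrawn.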

In addition, when one bumblebee or flower species is selected, statistical graphs of the traits of the visited flowers or visiting bumblebees are presented. The statistical graphs are placed under the trait filters. In this thesis, we use the reusable D3 charting library nvd3 to generate the statistical graphs (Figure 2.45).

Figure 2.45: Bipartite graph click event, displaying statistical graphs for the selected bumblebee and its preferred flowers. When one bumblebee species is selected (clicked) in the bipartite graph, only the flowers preferred by that species are displayed in the bipartite graph. The statistics of the traits of the preferred floral species are dynamically updated at the same time. For flower traits, a pie chart is used to show shape proportions, and a histogram is used to show color proportions.

Using the Floral Preferences Visualization

When the Floral Preferences visualization is initially selected by the user, the data displayed represent all of the observation data from Excel records in the database that contain bumblebee and flower information. Data submitted by users through the apps were not included in this visualization, because the combination of flower traits and species name cannot be checked for accuracy; the visualizations could reflect incorrect information if these data were included. In Figure 2.46, there are 9 bumblebee species on the left and 84 flower species on the right of the bipartite graph. We also display the count and percentage of each species after the species name. Statistical graphs are shown under each trait filter and present the basic statistical information for each trait.

Figure 2.46: Flower Preference Visualization - Overview graph

Figure 2.47 shows an example of a user selecting a particular bumblebee species, in this case Bombus fervidus. Bombus fervidus and its preferred flowers are highlighted in the bipartite graph, and the statistical graphs are updated to show the trait status of Bombus fervidus and its preferred flowers.

Figure 2.47: Flower preferences of Bombus fervidus. There are 18 observations that contain Bombus fervidus. In all of them the behavior is nectar foraging, and 83% of the Bombus fervidus individuals are male. Among the preferred flowers, Cow vetch was visited 9 times, Red clover 5 times, and other flower species 4 times. Flowers with a narrow tube accounted for almost 50% of these visits. Purple is the flower color that Bombus fervidus visited most frequently.

2.3.8 Model, View, Controller (MVC)

One advantage of using Angular is that it requires a modular construction of each component of the software system. In this thesis, Angular is used with the Model, View, Controller software design pattern to increase the modularity of the application. Each application component is divided into three parts:

Model: the dataset and the structure of the data. In Angular, the Model can be provided by the service module.

View: the User Interface (UI) that users see. In Angular, the template module is a View. It contains HTML elements and directives, which can attach custom behavior to the HTML elements.

Controller: the intermediary between the Model and the View. It can receive inputs from users and perform application-specific tasks based on those inputs, including modifying the Model. In addition, it can update the View when the Model is changed. In Angular, the controller part can be implemented in Components.

Figure 2.48: Angular MVC implementation of the heat map in the Summary visualization. Summary-component.html is a template, an HTML file that belongs to the summary component, summary-component.ts. It contains an svg tag into which the heatmap graph is loaded. BeerecodServcie.ts defines a function getBeeRecordsNumberByMonth(), which returns a data set containing the observation number per month. Summary-component.ts can call this service and get the data (MODEL). It then uses D3 to find the svg tag in the template (VIEW), create the heatmap and bind the data set to it. We call summary-component.ts a controller because it connects the MODEL and the VIEW.
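The division of roles in this figure can be mimicked in a few lines of plain JavaScript, without Angular or D3. Only the method name `getBeeRecordsNumberByMonth` comes from the figure; the class names, the record shape and the text-based "view" are stand-ins invented for this sketch, and a real Angular component would receive the service through dependency injection and draw into the template's svg tag instead.

```javascript
// MODEL: stands in for the service that supplies observation numbers per month.
class BeeRecordService {
  constructor(records) {
    this.records = records; // hypothetical shape: { date: "YYYY-MM-DD" }
  }
  getBeeRecordsNumberByMonth() {
    const counts = {};
    for (const r of this.records) {
      const key = r.date.slice(0, 7);
      counts[key] = (counts[key] || 0) + 1;
    }
    return counts;
  }
}

// VIEW: stands in for the template; it only knows how to display what it is given.
function renderHeatmapView(counts) {
  return Object.keys(counts).sort().map((k) => k + ": " + counts[k]);
}

// CONTROLLER: stands in for the component; it fetches from the Model and updates the View.
class SummaryComponent {
  constructor(service, view) {
    this.service = service;
    this.view = view;
  }
  refresh() {
    return this.view(this.service.getBeeRecordsNumberByMonth());
  }
}

const service = new BeeRecordService([
  { date: "2017-08-01" },
  { date: "2017-08-20" },
  { date: "2017-09-02" },
]);
const rendered = new SummaryComponent(service, renderHeatmapView).refresh();
```

Because the controller only talks to the service and the view through their public functions, either side can be replaced (for example, a different data source or a different chart type) without touching the other.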


3 Discussion and Future Work

In order to create a functioning Beecology Project, we filled the gaps in the system described in the Introduction. We created a database to provide a data repository to store data submitted by users. We also developed a visualization tool to visualize and analyze data from the database. Both of these have been integrated in the Beecology Project system.

Database

We have created a database for storing observational data from the Android and Web apps and other common data files. The database is designed to help scientists and the general public store field observations of pollinator-plant associations, which can be used in the contexts of academic research and STEM+C education. Currently, the database can only be updated by submitting data through the mobile apps. Although the database also allows bulk data submissions from other sources such as research laboratories and museum archives, only people with database management expertise can currently make them, because they need to know how to access the database directly.

Database flexibility

The database was designed as a relational database in order to allow easy additions to functionality in the future. Such additions may include new bumblebee species, new plant species, and new floral traits. It is also possible to create species tables for new pollinator groups such as hummingbirds and butterflies as the Beecology Project expands. To do so, a table with a structure similar to the bumblebee table shown in Figure 2.5 would need to be created. If the user submits an observation of another species to the database, the observation table can use the species ID to refer to the species table, so that new species can be related to observations. An additional ‘category’ attribute to distinguish bumblebees from other pollinators in the observation table may also be required.

Future directions for the database

There are some limitations in the database, which are listed below.

a) The storage of two or more flower shapes and/or flower colors for a flower species is not supported in the flower table.

b) Users can input flower name, shape and color without strict identification. Issues such as duplicated species names, incorrectly identified species, or incorrect traits being associated with a species can occur. Currently, these errors are corrected manually by expert data curators.

c) There is no mature content management system (CMS). Only a project team member who has server management and database management knowledge can do large-batch database management work. A CMS could provide a complete and user-friendly interface to create, retrieve, update and delete data in the database for non-technical administrators, such as the pollination experts in the Beecology Project.

Solutions

To achieve a CMS as described in c), an advanced user group policy that divides users into groups with different data access permission levels is required. For a) and b), we are planning to change the design of the flower table to accept compound values. In addition, we would like to obtain data access from Go Botany[60], a project that contains web-based tools to teach botany and plant identification, to optimize the flower identification part of the apps. In this way, it would be possible to integrate correct and relatively complete plant data into our database, including plant species name, shape, color and other traits from Go Botany. For c), we would also need other team members to design another web-based application, including a web page and user interface design, new functions, additions to the web service, and new identity rules and solutions in Firebase Authentication.

Since the Android application was developed ahead of the database, it was installed with a local database including 11 bumblebee species, bumblebee traits and flower traits. To keep consistency with the Android application, the web app has a similar design. We plan to make the apps synchronize with the database so that when new bumblebee species and new traits need to be added to the database, developers do not have to add the new data to the local database separately. The Web service has a function ‘/BeeDex’ (see Appendix Table A.1) that lets apps get bumblebee species information from the database. Using this function, the web and Android apps can get and show all bumblebee species in the database through the web service, rather than from a local (e.g., on the mobile device) database.

Visualization tool

We have also developed a web-based visualization tool comprising several ecologically relevant ways to view the database contents. Collectively, the visualizations were designed to provide access to the data in a way that is meaningful to researchers, educators and members of the general public interested in bee ecology and life history. The tool was developed in a modular way to allow high school teachers to develop modular curricula that incorporate computing and computational thinking into biological theory and practice. Our visualization tool facilitates understanding of bumblebee biology in the following areas:

1) Diversity: This visualization allows users to visualize the diversity of bumblebee species both spatially and temporally. Users can compare diversity values at different time periods, such as pre- and post- species decline and at different geographic locations such as high and low elevation. This visualization also shows users the actual numerical makeup of different bumblebee species at a given geographic location via a stacked bar graph.

2) Population: This visualization allows users to see data on one or more bumblebee species in the following contexts.

a) Location: This visualization shows the numbers of user-selected bumblebee species distributed across the map. Users can compare the geographic distributions of one or more bumblebee species in different time periods. Users can answer questions about where particular bumblebee species of interest are distributed, and whether the geographic distribution has changed over time.

b) Season: The season visualization shows bumblebee phenology, that is, abundance changes over time within the active season (May - October). Users can visualize the change in number of each bumblebee species over the season month by month, or at 2-week time intervals. Users can compare the active seasons and peaks of activity of different bumblebee species. They can also see and compare changes in the phenology of bumblebee species in different years. This visualization could provide a reference that can be used to take actions to protect threatened bumblebees during their active periods.

c) Summary: The summary visualization shows the number of bumblebee observations submitted by month and year. Users can select one or more species. This visualization gives users an overview of the status of observations in different time periods. Users can see how many observations were submitted during a given month-year combination.

3) Floral: The floral visualization shows the numbers of interactions between particular bumblebee and plant species, together with information on factors that may affect these interactions: bumblebee traits (behavior, tongue length, gender) and flower traits (flower shape and flower color). It allows users to see what kinds of flowers are preferred by a bumblebee species, and which bumblebee species frequent a given kind of flower. The visualization provides trait filters to show the abundance of bumblebee and flower species under different trait conditions. Statistical graphs give a statistical summary of the traits. This visualization can help scientists construct hypotheses about which associations between bumblebees and plants are important at the species level, and which traits of bumblebees and flowers may influence or determine a particular association.

Future directions for the visualization tool

For the population season and summary visualizations, we would also like to present the abundance changes at different time scales: not only by month and by two weeks (season visualization only), but also by week and by day. For the next phase of the Beecology Project, the collection and submission of data on nest and overwintering site preferences is planned. We would like to integrate the floral visualization with other visualizations, such as the diversity map, to test whether the diversity value in a geographic area is related to flower preferences, nesting habitat or overwintering site availability. Doing so would greatly accelerate our understanding of bumblebee decline. In addition, we would like to add interactive visualizations for other pollinator groups.

Currently, the floral visualization does not include data submitted through the apps until they have been checked manually for accuracy. Once we include the compound trait attribute for flowers in the database, we plan to include all observations that contain both bumblebee and flower data. The visualization tool is currently only available in browsers on laptop and desktop computers. The development of an adaptive user interface is essential for mobile and tablet users.

Bibliography

[1]. Klein, A.-M., et al., Importance of Pollinators in Changing Landscapes for World Crops. Proceedings: Biological Sciences, 2007. 274(1608): p. 303-313.
[2]. Ashman, T.L., et al., Pollen limitation of plant reproduction: Ecological and evolutionary causes and consequences. Ecology, 2004. 85(9): p. 2408-2421.
[3]. Aguilar, R., et al., Plant reproductive susceptibility to habitat fragmentation: Review and synthesis through a meta-analysis. Ecology Letters, 2006. 9(8): p. 968-980.
[4]. Gallai, N., et al., Economic valuation of the vulnerability of world agriculture confronted with pollinator decline. Ecological Economics, 2009. 68(3): p. 810-821.
[5]. Potts, S.G., et al., Global pollinator declines: trends, impacts and drivers. Trends in Ecology & Evolution, 2010. 25(6): p. 345-353.
[6]. Vanbergen, A.J. and I.P. Initiative, Threats to an ecosystem service: pressures on pollinators. Frontiers in Ecology and the Environment, 2013. 11(5): p. 251-259.
[7]. Cameron, S.A., et al., Patterns of widespread decline in North American bumble bees. Proceedings of the National Academy of Sciences, 2011. 108(2): p. 662-667.
[8]. Goulson, D., et al., Bee declines driven by combined stress from parasites, pesticides, and lack of flowers. Science, 2015. 347(6229).
[9]. Frick, W.F., et al., An Emerging Disease Causes Regional Population Collapse of a Common North American Bat Species. Science, 2010. 329(5992): p. 679.
[10]. Marletto, F., A. Patetta, and A. Manino, Laboratory assessment of pesticide toxicity to bumblebees. Bulletin of insectology, 2003. 56(1): p. 155-158.
[11]. Tasei, J., G. Ripault, and E. Rivault, Hazards of imidacloprid seed coating to Bombus terrestris (Hymenoptera: Apidae) when applied to sunflower. Journal of economic entomology, 2001. 94(3): p. 623-627.
[12]. Vredenburg, V.T., et al., Dynamics of an emerging disease drive large-scale amphibian population extinctions. Proceedings of the National Academy of Sciences, 2010. 107(21): p. 9689.
[13]. Biesmeijer, J.C., et al., Parallel Declines in Pollinators and Insect-Pollinated Plants in Britain and the Netherlands. Science, 2006. 313(5785): p. 351.
[14]. Kleijn, D. and I. Raemakers, A retrospective analysis of pollen host plant use by stable and declining bumble bee species. Ecology, 2008. 89(7): p. 1811-1823.
[15]. Ryder, E.F., R.J. Gegear, C. Ruiz, and S. Weaver. "Building Educational Bridges between Computer Science and Biology through Transdisciplinary Teamwork and Modular Curriculum Design". National Science Foundation (NSF) STEM+C grant #1742446.
[16]. Alford, D.V., Bumblebees. 1975: Davis-Poynter.
[17]. Sladen, F., Scientific Books: The Humble-Bee, Its Life History and How to Domesticate It, with Descriptions of All the British Species of Bombus and Psithyrus. Science, 1913. 37: p. 180-182.
[18]. Jandt, J.M., E. Huang, and A. Dornhaus, Weak Specialization of Workers Inside a Bumble Bee (Bombus impatiens) Nest. Behavioral Ecology and Sociobiology, 2009. 63(12): p. 1829-1836.
[19]. Yerushalmi, S., S. Bodenhaimer, and G. Bloch, Developmentally determined attenuation in circadian rhythms links chronobiology to social organization in bees. Journal of Experimental Biology, 2006. 209(6): p. 1044.
[20]. Goulson, D., A Sting in the Tale. 2016: Random House.
[21]. Goulson, D., Bumblebees: behaviour, ecology, and conservation. 2010: Oxford University Press on Demand.
[22]. Ayasse, M., J. Stökl, and W. Francke, Chemical ecology and pollinator-driven speciation in sexually deceptive orchids. Phytochemistry, 2011. 72(13): p. 1667-1677.
[23]. Juillet, N. and G. Scopece, Does floral trait variability enhance reproductive success in deceptive orchids? Perspectives in Plant Ecology, Evolution and Systematics, 2010. 12(4): p. 317-322.
[24]. Hatfield, R., et al., Conserving bumble bees. Guidelines for Creating and Managing Habitat for America's Declining Pollinators, 2012.
[25]. Somme, L., et al., Pollen and nectar quality drive the major and minor floral choices of bumble bees. Apidologie, 2015. 46(1): p. 92-106.
[26]. Elmasri, R., Fundamentals of database systems. 2008: Pearson Education India.
[27]. Batini, C., S. Ceri, and S.B. Navathe, Conceptual database design: an Entity-relationship approach. Vol. 116. 1992: Benjamin/Cummings Redwood City, CA.
[28]. Momjian, B., PostgreSQL: introduction and concepts. Vol. 192. 2001: Addison-Wesley New York.
[29]. Smith, L. What PostgreSQL has over other open source SQL databases: Part I. Oct 8, 2015; Available from: https://www.compose.com/articles/what-postgresql-has-over-other-open-source-sql-databases/.
[30]. Haas, H. and A. Brown, Web services glossary. W3C Working Group Note (11 February 2004), 2004. 9: p. 784-786.
[31]. Fielding, R., et al., Hypertext transfer protocol--HTTP/1.1. 1999.
[32]. Fielding, R. and J. Reschke, Hypertext transfer protocol (HTTP/1.1): Message syntax and routing. 2014.
[33]. Ferster, B., Interactive visualization: Insight through inquiry. 2012: MIT Press.
[34]. Rohrer, R.M. and E. Swing, Web-based information visualization. IEEE Computer Graphics and Applications, 1997(4): p. 52-59.
[35]. Berners-Lee, T. and D. Connolly, Hypertext markup language-2.0. 1995.
[36]. Bos, B., et al., Cascading style sheets, level 2 CSS2 specification. Available via the World Wide Web at http://www.w3.org/TR/1998/REC-CSS2-19980512, 1998: p. 1472-1473.
[37]. Flanagan, D., JavaScript: the definitive guide. 2006: O'Reilly Media, Inc.
[38]. Quint, A., Scalable vector graphics. IEEE MultiMedia, 2003(3): p. 99-102.
[39]. Bostock, M., D3. js. Data Driven Documents, 2012. 492: p. 701.
[40]. Partners, N. NVD3 re-usable charts for d3.js. Available from: http://nvd3.org/index.html.
[41]. Google. Angular. Available from: https://angular.io/.
[42]. The Beecology Project. 2018; Available from: https://beecology.wpi.edu.
[43]. Bumble Bee Watch. Available from: https://www.bumblebeewatch.org/
[44]. BeeWatch. Available from: http://homepages.abdn.ac.uk/wpn003/beewatch/index.php.
[45]. Yale Peabody Museum of Natural History. Available from: http://peabody.yale.edu/.
[46]. University, M.o.C.Z.-H. MCZBASE:The Database of the Zoological Collections. Available from: https://mczbase.mcz.harvard.edu/.
[47]. Everest, G.C. Basic data structure models explained with a common example. in Proc. Fifth Texas Conference on Computing Systems. 1976. Computer Society Publications Austin, TX (Oct.). Long Beach, CA.
[48]. Pressman, R.S., Software engineering: a practitioner's approach. 2005: Palgrave Macmillan.
[49]. Tilkov, S. and S. Vinoski, Node.js: Using JavaScript to build high-performance network programs. IEEE Internet Computing, 2010. 14(6): p. 80-83.
[50]. Richardson, L. and S. Ruby, RESTful web services. 2008: O'Reilly Media, Inc.
[51]. Google. Firebase Authentication | Firebase. Available from: https://firebase.google.com/docs/auth/.
[52]. Devictor, V., et al., Spatial mismatch and congruence between taxonomic, phylogenetic and functional diversity: the need for integrative conservation strategies in a changing world. Ecology letters, 2010. 13(8): p. 1030-1040.
[53]. Lavergne, S., et al., Biodiversity and climate change: integrating evolutionary and ecological responses of species and communities. Annual review of ecology, evolution, and systematics, 2010. 41: p. 321-350.
[54]. M. Beals, L.G., S. Harrell. Diversity Indices. 2000 Available from: http://www.tiem.utk.edu/~gross/bioed/bealsmodules/shannonDI.html.
[55]. Hurlbert, A.H. and W. Jetz, Species richness, hotspots, and the scale dependence of range maps in ecology and conservation. Proceedings of the National Academy of Sciences, 2007. 104(33): p. 13384-13389.
[56]. MacDonald, G.M., Biogeography: Space, Time, and Life. 2003: NY: John Wiley & Sons, Inc.
[57]. Margalef, R., Homage to Evelyn Hutchinson, or why is there an upper limit to diversity.
Transactions of the Connecticut Academy of Arts and Sciences. Transactions of the

59 Connecticut Academy of Arts and Sciences, 1972. 44: p. 211-235. [58]. Fonkalsrud, S., Interaction patterns and specialization in a local and national Norwegian pollination network. 2014, The University of Bergen. [59]. Olesen, J.M., et al., The modularity of pollination networks. Proceedings of the National Academy of Sciences, 2007. 104(50): p. 19891-19896. [60]. Farnsworth, E., Go Botany: Integrated Tools To Advance Botanical Learning. Rhodora, 2012. 114(958): p. 214-215.

Appendix

A.1 Section 2.4.5 Diversity

When a user clicks a grid cell, a stacked bar chart pops up on the right side of the diversity map. google.maps.event.addListener catches mouse click events on the map. Its third parameter is a callback function that runs after each click event.

Pseudo-code:

    google.maps.event.addListener(google map object, 'click', function (event) {
        // event.latLng holds the position that was clicked
        var clickPoint = { lat: event.latLng.lat(), lng: event.latLng.lng() };
        var cell_ID = function () {
            check which grid cell clickPoint is located in;
            return the cell identifier;
        };
        var speciesdata = function () {
            return all species and their counts, found by looking up cell_ID in the preprocessed data;
        };
        var stackedbar = d3.select(svg tags);
        stackedbar.data(speciesdata).show();
    });
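The two lookup steps in the pseudo-code above (finding the grid cell that contains the click point, then reading the species counts for that cell) can be sketched without the map itself. This is a minimal sketch under stated assumptions: the grid origin, cell size, "row-col" cell identifier, and the sample counts are all hypothetical and are not taken from the Beecology implementation.

```javascript
// Hypothetical grid definition: south-west corner, cell size in degrees,
// and grid dimensions are assumed values for illustration only.
const grid = { south: 41.0, west: -73.5, cellSize: 0.5, rows: 4, cols: 8 };

// Hypothetical preprocessed data: species counts keyed by cell identifier.
const preprocessed = {
  "1-5": { "Bombus impatiens": 12, "Bombus griseocollis": 3 },
};

// Return a "row-col" identifier for the cell containing the point,
// or null when the point lies outside the grid.
function cellIdFor(point, g) {
  const row = Math.floor((point.lat - g.south) / g.cellSize);
  const col = Math.floor((point.lng - g.west) / g.cellSize);
  if (row < 0 || row >= g.rows || col < 0 || col >= g.cols) return null;
  return `${row}-${col}`;
}

// Return the species counts of a cell as an array of { species, count }
// objects, a shape that can be bound to a stacked bar chart.
function speciesDataFor(cellId, data) {
  const counts = (cellId && data[cellId]) || {};
  return Object.entries(counts).map(([species, count]) => ({ species, count }));
}

const clickPoint = { lat: 41.7, lng: -70.9 };
const cellId = cellIdFor(clickPoint, grid);
console.log(cellId, speciesDataFor(cellId, preprocessed));
```

In a click handler, clickPoint would be built from the map event instead of being hard-coded, and the returned array would be passed to the chart.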

A.2 Section 2.4.6 Population by season

The population-by-season view is a line graph with a logarithmic y scale. In the pseudo-code below, linegraph is the line chart that NVD3 generates with nv.models.lineChart() inside nv.addGraph(). d3.scale.log() generates a logarithmic scale that maps the minimum value of the data set to 0 and the maximum value of the data set to the height of the y axis.

    nv.addGraph(function () {
        var linegraph = nv.models.lineChart();
        set x axis and y axis attributes here;
        var yScale = d3.scale.log()
            .domain([minimum value of data set, maximum value of data set])
            .range([0, height of y axis]);
        linegraph.yScale(yScale);   // apply the logarithmic scale to linegraph

        use d3 to select the svg tags for linegraph;
        assign the season data to linegraph;
        call linegraph;             // coded this way so that linegraph is reusable

        return linegraph;
    });
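The mapping that a logarithmic scale performs can be written out directly, which makes the domain and range arguments above easier to read. The following is a minimal sketch in plain JavaScript, not the D3 implementation; makeLogScale, the sample counts, and the 300-pixel axis height are hypothetical.

```javascript
// Build a function that maps a value in [dMin, dMax] onto [rMin, rMax]
// in proportion to its logarithm.
function makeLogScale(domain, range) {
  const [dMin, dMax] = domain;
  const [rMin, rMax] = range;
  if (dMin <= 0 || dMax <= 0) {
    throw new RangeError("a log scale needs a strictly positive domain");
  }
  const logMin = Math.log(dMin);
  const logSpan = Math.log(dMax) - logMin;
  return (value) =>
    rMin + ((Math.log(value) - logMin) / logSpan) * (rMax - rMin);
}

// Hypothetical seasonal counts spanning three orders of magnitude.
const counts = [1, 10, 100, 1000];
const yScale = makeLogScale(
  [Math.min(...counts), Math.max(...counts)],
  [0, 300] // assumed height of the y axis in pixels
);

for (const c of counts) {
  console.log(c, yScale(c).toFixed(1));
}
```

Each tenfold increase in the count moves the point by the same distance along the axis, which is why small and large populations stay readable on one graph. A count of zero has no position on a logarithmic axis, so such values would need separate handling.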

A.3 Section 2.3 Web Service

We provide pseudo-code below to illustrate how the web service handles a sample request. In the sample request, a client sends the web service an HTTP 'GET' request to retrieve an observation from the database by its observation id (record_id). The web service is built on 'express', a Node.js framework. The router created by 'express.Router()' matches the HTTP request method and the request name, and express then finds and executes the 'retrieve an observation' function. The following pseudo-code shows how that function is implemented.

    find HTTP request method 'GET'
    find HTTP request name '/beerecord'

    router = express.Router();
    router.get('/beerecord/:record_id?', function (request, response, next) {
        record_id = request.params.record_id;
        if (record_id is null || record_id is not an integer) {
            return response.status(400).json(Utility.getErrorResponse('Bee record id must be an integer'));
        }
        connect to the database;
        build the SQL statement
            "SELECT * FROM the observation table (beerecord) WHERE observation id (the 'record_id' column of beerecord) = record_id";
        run the SQL statement in the database;
        const result = result of executing the SQL statement;

        if (result is success)
            return response.status(200).json(result);
        else
            return response.status(404).json(failure message, error message);
    });
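The decision logic of the handler above (reject a malformed id with 400, report a missing observation with 404, otherwise return the observation with 200) can be exercised without express or a database. This is a minimal sketch under stated assumptions: the in-memory Map stands in for the beerecord table, and parseRecordId, getBeeRecord, the field names, and the 404 message text are hypothetical.

```javascript
// Hypothetical in-memory stand-in for the beerecord table.
const beerecords = new Map([
  [7, { beerecord_id: 7, bee_name: "Bombus impatiens" }],
]);

// Accept only strings made of digits; anything else is not a valid id.
function parseRecordId(raw) {
  if (typeof raw !== "string" || !/^\d+$/.test(raw)) return null;
  return Number.parseInt(raw, 10);
}

// Return the status code and JSON body the handler would send.
function getBeeRecord(rawId, table) {
  const id = parseRecordId(rawId);
  if (id === null) {
    return { status: 400, body: { error: "Bee record id must be an integer" } };
  }
  const record = table.get(id);
  if (record === undefined) {
    return { status: 404, body: { error: "Bee record not found" } };
  }
  return { status: 200, body: record };
}

console.log(getBeeRecord("7", beerecords));
console.log(getBeeRecord("abc", beerecords));
console.log(getBeeRecord("8", beerecords));
```

Returning as soon as the id fails validation keeps the database from being queried with a malformed value, which matches the early return in the pseudo-code.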

Table A.1 lists all request functions.

| Request name | Request method | Request parameters | Function and response |
|---|---|---|---|
| /BeeDex | GET | id (bumblebee dex id) | If id is null, it will return information on all of the bumblebee species. Otherwise it will find and return the information for the bumblebee species corresponding to id. |
| /beerecord | DELETE | record_id(*) (observation id) | It will delete the observation with record_id from the database. If the observation is deleted successfully, the response value will be a success message. |
| /beerecord | GET | record_id(*) (observation id) | It will find and return the observation in the database with record_id. |
| /beerecorduser | GET | user_id (provided by Firebase Authentication) | It will return all of the observations submitted by the user whose client end (Android/Web app) generates the request. |
| /beevisrecords | GET | null | It will return all of the observation data in the database, including bumblebee species name, flower species name, location information, gender, behavior and date. |
| /record | POST | userid, chead (bumblebee head), cabdomen (bumblebee abdomen), cthorax (bumblebee thorax), gender, Loc, cityname, fname, fshape, fcolor, beename, time, beedictid, beebehavior, recordpicpath, recordvideopath | It will insert a new observation into the database. If the database inserts the new observation successfully, the response value will be a success message. |
| /uploadImage64 | POST | recordImage | It will transform the provided Base64 image string into an image file, save it in the server's file system, and return the location of the image file. |
| /uploadVideo | POST | recordVideo | It will save the provided video file in the server's file system and return the location of the video file. |
| /record | PUT | id, gender, fname, fshape, fcolor, beename, beebehavior, beedictid | It will update the observation with new data. |
| /flowercolors | GET | null | It will return all color values of the flower color trait. |
| /flowershapes | GET | null | It will return all shape values of the flower shape trait. |

Table A.1: List of API methods
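The behaviour listed for /uploadImage64 in Table A.1 (turn a Base64 string into an image file, save it, and return its location) can be sketched with Node.js built-in modules. This is a minimal sketch, not the Beecology server code: saveBase64Image, the temporary upload directory, the file name, and the sample bytes are hypothetical, and stripping an optional data URL prefix is an assumption about what clients may send.

```javascript
const fs = require("node:fs");
const os = require("node:os");
const path = require("node:path");

// Decode a Base64 image string, write it to disk, and return the file path.
function saveBase64Image(base64String, directory, fileName) {
  // Assumption: the client may send a data URL, so drop that prefix if present.
  const payload = base64String.replace(/^data:image\/\w+;base64,/, "");
  const bytes = Buffer.from(payload, "base64");
  const filePath = path.join(directory, fileName);
  fs.writeFileSync(filePath, bytes);
  return filePath;
}

// Hypothetical upload directory created under the system temp folder.
const uploadDir = fs.mkdtempSync(path.join(os.tmpdir(), "bee-upload-"));

// The 8-byte PNG signature stands in for real image data.
const sample = Buffer.from([0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a])
  .toString("base64");

const savedPath = saveBase64Image(
  "data:image/png;base64," + sample,
  uploadDir,
  "record.png"
);
console.log(savedPath);
```

The returned path plays the role of the image location in the response, which a client could then submit as recordpicpath when posting to /record.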
