Bringing 'Bee-cological' Data to Life through a Relational Database and an Interactive Visualization Tool

by Xiaojun Wang

A Thesis Submitted to the Faculty of the WORCESTER POLYTECHNIC INSTITUTE in partial fulfillment of the requirements for the Degree of Master of Science in Bioinformatics and Computational Biology

Aug 2018

APPROVED BY: Dr. Carolina Ruiz, Dr. Robert J. Gegear, Dr. Elizabeth F. Ryder

Abstract

Over the past decade, bumblebees have declined in abundance and geographic distribution at an alarming rate, raising major social, economic and ecological concern worldwide. However, we presently lack effective bumblebee conservation strategies because too little is known about the specific ecological needs of each species. The ‘Beecology Project’ was created to fill this knowledge gap by enlisting citizen scientists to collect data on floral resource use patterns of foraging bees in naturally occurring mixed-species communities across Massachusetts. In addition to its research goals, the Beecology Project also has the educational goal of providing a modular, integrated biology - computer science framework (a BIO-CS bridge) to assist teachers in developing curricula that meet the next-generation biology and computer science standards at the high school level.

The Beecology team has developed Android and Web mobile apps to help citizen scientists collect and submit field data on bumblebee and plant species interactions. Other Beecology team members have also collected a substantial amount of bumblebee data through field research and online digital museum collections. However, there was no central location dedicated to storing such data. There was also no way for users such as researchers, educators, and the general public to access all of the collected data in an ecologically meaningful way.
To fill these gaps, I created a flexible relational database for receiving and storing data submissions from the Android app, the Web app and other research data files. I also used web development technologies such as D3, Google Maps, and Angular to develop several interactive visualization tools. These tools enable users to explore the contents of the database in different ways, thereby facilitating exploratory studies, hypothesis generation and testing, and the development of effective conservation strategies for threatened bumblebee species. In addition, each tool was designed in a modular way, which will accelerate the development of modular high school curricula integrating biology practices and computational thinking.

Acknowledgments

I would like to express my greatest gratitude to my advisors, Professor Carolina Ruiz, Professor Elizabeth F. Ryder and Professor Robert Gegear, who gave me the chance to participate in this exciting BIO-CS project. I would also like to thank them for their constant help, for reading the paper with great care, and for offering me invaluable advice and informative suggestions.

I would also like to thank every team and every team member in our Beecology project:

Database and visualization team: Ellen Pierce, who provided Excel records of field observations; Fareya Ikram, who assisted me in importing the data into the database; Quyen Hoang, who helped me import the data and gave me advice on web service development and visualization.

App and website development team: Linh Hoang, Andrew Gao, Ziyang Yu, Jacob Moon and Jackson Oliva, who designed and developed the Android application; Huy Tran, who developed the web application; Ankit Kumar, who provided help with website design and web service improvement; Akshit Soota, who integrated Firebase Authentication with the web service; Andrew Walter, who helped us improve the website and server; Sarun Paisarnsrisomsuk, who helped me at the teacher workshop.
Website content team: Eoin O’Connell, Kenneth Levasseur, Sam Coache, Rachel Murphy, Kenedi Heather and Devin Stevens, who provided the background of the project and introductory tutorials.

Simulation team: Kevin Heath, Michael LoTurco and Rachel Blakely, who are developing the simulation modeling software.

Finally, I would like to thank my parents, Tao Wang and Xiujun Geng, my aunt Hong Wang and my cousin Whitney Zhang for supporting me during my studies.

This material is based upon work supported by the National Science Foundation under Grant No. 1742446. Disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.

Contents

List of Figures
List of Tables
1 Introduction
  1.1 Overview:
  1.2 Background
    1.2.1 Bumblebees
    1.2.2 Database
    1.2.3 Visualization Tool
    1.2.4 The Beecology Project
2 Methodology and Results
  2.1 Data sources
  2.1 Database Design
    2.1.1 Introduction
    2.1.2 Schema
  2.2 Web Service
  2.3 Interactive Data Visualizations
    2.3.1 Introduction
    2.3.2 Time Controller
    2.3.3 Species selector
    2.3.4 Data Preprocessing
    2.3.5 Diversity Map
    2.3.6 Population
    2.3.7 Floral Preferences
    2.3.8 Model, View, Controller (MVC)
3 Discussion and Future Work
Bibliography
Appendix
  A.1 Section 2.4.5 Diversity
  A.2 Section 2.4.6 Population by season
  A.3 Section 2.3 Web Service

List of Figures

Figure 1.1: Overview of Technological Needs for the Beecology Project
Figure 1.2: Angular Architecture
Figure 2.1: Roadmap of Data Flow
Figure 2.2: The four study sites used by Pierce and Gegear to collect data on bumblebee-plant species associations.
Figure 2.3: Abstraction graph for the Beecology Database
Figure 2.4: How to store picture and video of observation in database
Figure 2.5: Entity - Relationship Graph for the Beecology Database
Figure 2.6: Sample data stored in tables according to the schema
Figure 2.7: Roadmap of Web Service
Figure 2.8: Web service example - Submitting an observation to the database
Figure 2.9: Web service example - Getting all observations from the database
Figure 2.10: Firebase Authentication - Getting observations by user
Figure 2.11: The First Page of Data Visualization
Figure 2.12: Angular Modularity of the Beecology Visualization Tools
Figure 2.13: Time controller
Figure 2.14: Species selector
Figure 2.15: Process data by date
Figure 2.16: Design Process to Generate the Diversity Map
Figure 2.17: Gridview Design for Diversity Map
Figure 2.18: Calculate diversity of each grid cell
Figure 2.19: The diversity value is mapped to a color gradient
Figure 2.20: Diversity Map
Figure 2.21: The Diversity Map Allows Interactive Zooming by the User
Figure 2.22: Stacked bar presents species abundance of one grid cell
Figure 2.23: Overview of Location Visualization
Figure 2.24: Overview of Location Visualization
Figure 2.25: Overview of Location Visualization
Figure 2.26: Map and Map cluster markers - Zoom in and out
Figure 2.27: Initial design for map cluster markers showing species selection
Figure 2.28: Google Map Marker Clusters library
Figure 2.29: Circle map cluster markers -> Donuts cluster marker
Figure 2.30: Add Donuts graph function to Google Map Marker Cluster Library
Figure 2.31: Donuts Map cluster markers - all observations
Figure 2.32: Map cluster markers change dynamically when the user selects species
Figure 2.33: Map cluster markers adjust dynamically when the user zooms in
Figure 2.34: Prototype of Line graph for bumblebee phenology
Figure 2.35: Line graph for bumblebee season
Figure 2.36: Line graph for bumblebee season - Data filtered by time and species
Figure 2.37: Mapping observation data within heatmap cells
Figure 2.38: Heatmap Prototype
Figure 2.39: Heatmap
Figure 2.40: Heatmap - Data filtered by species and hover event
Figure 2.41: Design Process to Generate the Floral Visualization
Figure 2.42: How to build a bipartite graph
Figure 2.43: Click event on one bumblebee rectangle
Figure 2.44: Bipartite graph with trait filters of bumblebee and flower
Figure 2.45: Bipartite graph click event - display statistic graphs of selected bumblebee and preferred flowers
Figure 2.46: Flower Preference Visualization - Overview graph
Figure 2.47: Show flower preferences of Bombus fervidus
Figure 2.48: Angular MVC implementation of heat map in Summary visualization

List of Tables

Table 2.1: Data source exploration of three data sources
Table 2.2: “Bumblebee” species Table
Table 2.3: “Flower” Table
Table 2.4: “Feature” Table
Table 2.5: “Observation” Table
Table 2.6: A matrix of bumblebee - flower network
Table A.1: Lists of API methods

1 Introduction

1.1 Overview:

Insect pollination plays a crucial role in ecosystem function, biodiversity maintenance, and agricultural productivity worldwide [1] [2] [3]. For example, pollination provided by different bumblebee species supports food, nest sites and shelter for a diverse array of wildlife and also contributes billions of dollars to agriculture each year [4]. However, many insect pollinator species have declined in abundance and geographic distribution at an alarming rate over recent years [5], raising major environmental, social and economic concerns. Although the cause of pollinator decline is currently unknown, significant contributing factors are likely to include habitat fragmentation and loss; pathogens, parasites and pesticides; impacts of non-native plants and pollinators; and climate change [6] [7] [8]. In the case of bumblebee decline, a considerable amount of empirical effort has been devoted to examining the potential impacts of pathogens and pesticides on wild populations [7] [9] [10] [11] [12]. However, far less is known about the effects of human-induced changes to critical resources needed for bumblebees to complete their annual life cycle.