Department of Information and Computer Science Paula Järvinen

A DATA MODEL BASED APPROACH FOR VISUAL ANALYTICS OF MONITORING DATA

Thesis submitted in partial fulfillment of the requirements for the degree of Licentiate of Science in Technology.

Espoo, April 10, 2013

Supervisor: Professor Samuel Kaski
Instructor: Docent Kai Puolamäki
Inspector: Dr. Matti Gröhn

Aalto University, P.O. BOX 11000, 00076 AALTO
www.aalto.fi

Abstract of licentiate thesis

Author: Paula Järvinen
Title of thesis: A data model based approach for visual analytics of monitoring data
Department: Department of Information and Computer Science
Field of research: Information technology
Supervising professor: Samuel Kaski
Code of professorship: T-61
Thesis advisor(s): Kai Puolamäki
Thesis examiner(s): Matti Gröhn
Number of pages: 135 + 20
Language: English
Date of submission for examination: 13.03.2013

Abstract

Data modelling is a well-established method in software engineering. This work explores its use in the emerging field of visual analytics. Visual analytics is a recent approach to finding knowledge in large masses of data. It combines the strengths of automatic data processing with the visual perception and analysis capabilities of the human user. The approach has its roots in information visualization and data analysis, where the use of data models is not common practice.

The backbone of this work is the domain data model. The model incorporates the main concepts of a given domain, which remain similar regardless of the application but can be tuned for visualization and analysis purposes. The work proposes three uses for data models. The first is the construction of visual analytics applications in the domain. The second is supporting reasoning with the help of metadata. The third is using the data model as a means of visualizing large data spaces.

The study focuses on the analysis of monitoring data, which is nowadays collected in vast amounts and from a wide variety of fields. The approach is evaluated using two cases from different applications in the monitoring data domain: analysing the eating and exercise habits of dieting people, and studying the energy efficiency and indoor conditions of buildings. In addition to the approach and the evaluation cases, the work introduces visual analytics, data modelling and monitoring data, and discusses the evaluation of visual analytics. The multidisciplinary research area of visual analytics is presented in the form of a framework constructed as a part of this work.

The results suggest that data modelling is a useful method in visual analytics. A domain model approach can save effort in constructing new visual analytics applications. Supporting reasoning and browsing data with the help of the data model would be especially useful for users who are less familiar with data analysis but know the application domain well. Combining the data model approach with descriptive visualizations can yield powerful tools for analysing data.

Keywords: visual analytics, information visualization, data modelling, monitoring data

Aalto University, P.O. BOX 11000, 00076 AALTO
www.aalto.fi

Abstract of licentiate thesis (translated from the Finnish abstract)

Author: Paula Järvinen
Title of thesis: Tietomallien hyödyntäminen monitorointidatan visuaalisessa data-analyysissä (A data model based approach for visual analytics of monitoring data)
Department: Department of Information and Computer Science
Field of research: Information technology
Supervising professor: Samuel Kaski
Code of professorship: T-61
Thesis advisor(s): Kai Puolamäki
Thesis examiner(s): Matti Gröhn
Date of submission for examination: 13.03.2013
Number of pages: 135 + 20
Language: English

Abstract

Data modelling is an established method in software engineering. This work studies how it can be applied to visual analytics. Visual analytics is a recent research area that aims to bring out knowledge from masses of data. It seeks to exploit human visual abilities and reasoning skills together with automatic analysis methods. The use of data modelling in visual analytics is not established practice.

The work is based on a domain-specific data model. The model contains the central concepts of the application domain, which remain the same from one application to another. In addition, the model takes into account the needs of visualization and analysis methods. The work proposes three uses for the data model. The first is producing analysis applications with a platform based on the model, the second is supporting the user's reasoning on the basis of metadata stored in the model, and the third is browsing large data sets through a view generated from the model.

The work focuses on monitoring data, which is nowadays collected from an increasing number of fields. The approach is evaluated by applying it to two monitoring applications: analysing people's eating and exercise habits, and assessing the indoor conditions and energy efficiency of buildings. In addition, the work presents a broad survey of visual analytics, data modelling, monitoring data, and the usability evaluation of visual analytics.

The results suggest that a domain-specific data model is useful in visual analytics. A platform based on the model can speed up the production of new applications. The reasoning support and data browsing view offered by the model benefit especially domain experts who are not experts in data analysis. Combining the possibilities offered by the approach with visualizations that describe the target can yield effective tools for analysing data.

Keywords: visual analytics, information visualization, data modelling, monitoring

Acknowledgements

I would like to thank my advisor, Docent Kai Puolamäki, for the valuable advice and support he has given me in the writing of this thesis. I would also like to thank my advisor Professor Samuel Kaski for leading me through the final stages of the process. Thanks also to nutritionist Leena Metsäranta for valuable discussions, and to all other participants of the user evaluations. My deepest thanks go to my colleagues at VTT, especially Sanni, and my friends and family for their understanding and support. I am also grateful to my employer VTT for the financial contribution.

Contents

1 Introduction
2 Visual analytics
  2.1 State of the art of research
  2.2 Commercial markets
3 Framework for visual analytics
  3.1 Visual analytics problems
  3.2 Data
  3.3 Insight
  3.4 Human cognition and perception
    3.4.1 Visual perception
    3.4.2 Memory system
    3.4.3 Data graphics
  3.5 Analytical reasoning
    3.5.1 Sense-making in visual analytics
    3.5.2 Human mental model
    3.5.3 Precepts and artefacts for reasoning
  3.6 User interaction
    3.6.1 Requirements for user interaction
    3.6.2 Interaction tasks
    3.6.3 Interaction techniques
    3.6.4 Spatial and temporal data
  3.7 Data analysis
    3.7.1 Data analysis process
    3.7.2 Methods for data exploration
    3.7.3 Descriptive methods
    3.7.4 Predictive methods
    3.7.5 Methods for anomaly detection
  3.8 Information visualization
    3.8.1 Univariate data
    3.8.2 Bivariate data
    3.8.3 Multivariate data
    3.8.4 Hierarchical and network structures
    3.8.5 Temporal and spatial data