Department of Information and Computer Science

Paula Järvinen

A DATA MODEL BASED APPROACH FOR VISUAL ANALYTICS OF MONITORING DATA

Thesis submitted in partial fulfillment of the requirements for the degree of Licentiate of Science in Technology. Espoo, April 10, 2013
Supervisor: Professor Samuel Kaski
Instructor: Docent Kai Puolamäki
Inspector: Dr. Matti Gröhn

Aalto University, P.O. BOX 11000, 00076 AALTO www.aalto.fi Abstract of licentiate thesis

Author: Paula Järvinen
Title of thesis: A data model based approach for visual analytics of monitoring data
Department: Department of Information and Computer Science
Field of research: Information technology
Supervising professor: Samuel Kaski
Code of professorship: T-61
Thesis advisor(s): Kai Puolamäki
Thesis examiner(s): Matti Gröhn
Number of pages: 135 + 20
Language: English
Date of submission for examination: 13.03.2013

Abstract

Data modelling is a well-established method in software engineering. This work explores its use in the emerging field of visual analytics. Visual analytics is a recent approach to finding knowledge from data masses. It combines the strengths of automatic data processing and the visual perception and analysis capabilities of the human user. The approach has its roots in information visualization and data analysis, in which the use of data models is not common practice. The backbone of this work is the domain data model. The model incorporates the main concepts of a given domain, which remain similar regardless of the application, but which can be tuned for visualization and analysis purposes. The work proposes three uses for data models. The first is the construction of visual analytics applications in the domain. The second is supporting reasoning with the help of metadata. The third is using the data model as an approach to visualize large data spaces. The study focuses on the analysis of monitoring data, which is nowadays collected in vast amounts and from a wide variety of fields. The approach is evaluated using two cases from different applications in the monitoring data domain: analysing the eating and exercise habits of dieting people, and studying the energy efficiency and indoor conditions of buildings. In addition to the approach and the evaluation cases, the work introduces visual analytics, data modelling and monitoring data, and discusses the evaluation of visual analytics. The multi-discipline research area of visual analytics is represented in the form of a framework constructed as a part of this work. The results suggest that data modelling is a useful method in visual analytics. A domain model approach can save effort in constructing new visual analytics applications. Supporting reasoning and browsing data with the help of the data model would be especially useful for users who are not so familiar with data analysis, but know the application domain well. Combining the data model approach with descriptive visualizations can bring powerful tools for analysing data.

Keywords: Visual analytics, information visualization, data modelling, monitoring data

Aalto-yliopisto, PL 11000, 00076 AALTO www.aalto.fi Lisensiaatin tutkimuksen tiivistelmä

Tekijä: Paula Järvinen
Työn nimi: Tietomallien hyödyntäminen monitorointidatan visuaalisessa data-analyysissä
Laitos: Tietojenkäsittelytieteen laitos
Tutkimusala: Informaatiotekniikka
Vastuuprofessori: Samuel Kaski
Professuurikoodi: T-61
Työn ohjaajat: Kai Puolamäki
Työn tarkastajat: Matti Gröhn
Jätetty tarkastettavaksi: 13.03.2013
Sivumäärä: 135 + 20
Kieli: Englanti

Tiivistelmä

Tiedon mallinnus on vakiintunut ohjelmistotekniikan menetelmä. Työssä tutkitaan sen soveltamismahdollisuuksia visuaaliseen data-analyysiin. Visuaalinen data-analyysi on tuore tutkimusalue, jonka tavoitteena on tietämyksen esiin saaminen tietomassoista. Siinä pyritään hyödyntämään ihmisen visuaalisia kykyjä ja päättelytaitoja yhdessä automaattisten analyysimenetelmien kanssa. Tiedon mallinnuksen käyttäminen visuaalisessa data-analyysissä ei ole vakiintunutta. Työn perustana on sovellusaluekohtainen tietomalli. Malli sisältää sovellusalueen keskeiset käsitteet, jotka pysyvät samoina yksittäisestä sovelluksesta toiseen. Lisäksi malli huomioi visualisointi- ja analyysimenetelmien tarpeet. Työssä ehdotetaan kolmea käyttötapaa tietomallille. Ensimmäinen on analyysisovellusten tuottaminen malliin perustuvan alustan avulla, toinen on käyttäjän päättelyn tukeminen malliin talletetun metadatan perusteella ja kolmas laajojen tietoaineistojen selailu mallista generoidun näkymän avulla. Työssä keskitytään monitorointidataan, jota kerätään nykyisin yhä useammilta alueilta. Lähestymistapaa arvioidaan soveltamalla sitä kahteen monitorointisovellukseen: ihmisten ruokailu- ja liikuntatapojen analysointiin sekä rakennusten sisäilmaolosuhteiden ja energiatehokkuuden arviointiin. Lisäksi työssä esitetään laaja katsaus visuaaliseen data-analyysiin, tiedon mallinnukseen, monitorointidataan sekä visuaalisen data-analyysin käytettävyyden arviointiin. Tulokset antavat olettaa, että sovellusaluekohtaisesta tietomallista on hyötyä visuaalisessa data-analyysissä. Malliin perustuva alusta voi nopeuttaa uusien sovellusten tuottamista. Mallin tarjoama päättelytuki ja aineiston selailunäkymä hyödyttävät erityisesti kohdealueen asiantuntijoita, jotka eivät ole data-analyysin tuntijoita. Yhdistämällä lähestymistavan tarjoamiin mahdollisuuksiin kohdetta kuvaavia visualisointeja voidaan saada aikaan tehokkaita työkaluja tiedon analysointiin.

Avainsanat: visuaalinen data-analyysi, tiedon visualisointi, tiedon mallinnus, monitorointi

Acknowledgements

I would like to thank my advisor, Docent Kai Puolamäki, for the valuable advice and support he has given me in the writing of this thesis. I would also like to thank my advisor Professor Samuel Kaski for leading me through the final stages of the process. Thanks also to nutritionist Leena Metsäranta for valuable discussions, and to all other participants of the user evaluations. My deepest thanks go to my colleagues at VTT, especially Sanni, and my friends and family for their understanding and support. I am also grateful to my employer VTT for the financial contribution.


Contents

1 Introduction
2 Visual analytics
  2.1 State of the art of research
  2.2 Commercial markets
3 Framework for visual analytics
  3.1 Visual analytics problems
  3.2 Data
  3.3 Insight
  3.4 Human cognition and perception
    3.4.1 Visual perception
    3.4.2 Memory system
    3.4.3 Data graphics
  3.5 Analytical reasoning
    3.5.1 Sense-making in visual analytics
    3.5.2 Human mental model
    3.5.3 Precepts and artefacts for reasoning
  3.6 User interaction
    3.6.1 Requirements for user interaction
    3.6.2 Interaction tasks
    3.6.3 Interaction techniques
    3.6.4 Spatial and temporal data
  3.7 Data analysis
    3.7.1 Data analysis process
    3.7.2 Methods for data exploration
    3.7.3 Descriptive methods
    3.7.4 Predictive methods
    3.7.5 Methods for anomaly detection
  3.8 Information visualization
    3.8.1 Univariate data
    3.8.2 Bivariate data
    3.8.3 Multivariate data
    3.8.4 Hierarchical and network structures
    3.8.5 Temporal and spatial data
    3.8.6 Large collections of information
  3.9 Infrastructure
4 Evaluation of visual analytics
  4.1 Traditional techniques
  4.2 Evaluation in visual analytics
5 Data modelling
  5.1 Data models in information visualization
  5.2 Data models in visual analytics
6 Monitoring data
7 A data model based approach for visual analytics
  7.1 Domain data model
  7.2 Domain model for monitoring data
  7.3 Discussion
8 Tool concept
  8.1 Reasoning process
  8.2 User interface
  8.3 Analysis and visualization
  8.4 Construction of a new application
  8.5 Discussion
9 Case HyperFit
  9.1 HyperFit data
  9.2 HyperFit visual analytics tool
  9.3 Creating the application
  9.4 Reasoning examples
  9.5 User evaluation
    9.5.1 Test settings
    9.5.2 Results
    9.5.3 Answers to the evaluation questions
  9.6 Discussion
10 Case MMEA
  10.1 MMEA visual analytics tool
  10.2 Creating the application
  10.3 The user interface
  10.4 Reasoning example
  10.5 Discussion
11 Summary and conclusions
References
Appendix A: Visual analytics tool survey
Appendix B: HyperFit tool construction steps
Appendix C: HyperFit tool evaluation
Appendix D: MMEA tool construction steps
Appendix E: Comparison of visual analytics approaches


1 Introduction

Problem

Huge amounts of measurement data are collected in databases. In industry, sensors monitor the state of processes and machinery, while building automation systems store building information, environmental meters collect measurement data from the environment, and enterprises track the actions and behaviour of consumers. This data is expected to contain important knowledge that can be used to improve processes and support decision making. Utilising this collected data is not, however, a straightforward task. Analysing the data requires special skills, methods and tools that are usually unfamiliar to the data users or decision makers. Similarly, specialists in these tools and methods may lack the expertise in the application domain that is needed to effectively interpret the results. In addition, analysis tools require data to be available in simple files when, in reality, this is seldom the case. Instead, data normally exists in disparate forms and storages, ranging from unstructured data to operational databases.

Figure 1 illustrates the phases of data analysis and the problems related to them. Data coming in different forms from several sources requires a considerable amount of pre-processing; the relevant data must be recognized and extracted from the data sources, and noisy data must be cleaned. In operational databases the interesting and important features can be hidden behind complex structures and codes. Variables also often exist in non-comparable units that have to be harmonized. Analysis and visualization methods require specific data formats, so the data must be transformed accordingly. The sheer variety of data and analysis methods can be confusing to the analyst. Even a simple operational database can contain dozens of data entities, and each entity dozens of attributes. From these, the analyst has to find the most meaningful properties and attribute combinations, as well as the best methods to analyse them.

Figure 1. Phases and problems of data analysis.
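To make the pre-processing steps described above concrete, the following minimal sketch (Python with pandas) harmonizes units, cleans faulty readings and resamples two hypothetical monitoring sources to a common format; the file names, column names and unit conversions are this example's own assumptions, not data from the thesis cases.

```python
# A minimal pre-processing sketch: extract relevant columns, clean noisy
# values, harmonize units and reshape the data for analysis. File names,
# column names and unit conventions are hypothetical.
import pandas as pd

# Two hypothetical sources with different conventions.
indoor = pd.read_csv("indoor_sensors.csv", parse_dates=["timestamp"])
weather = pd.read_csv("weather_station.csv", parse_dates=["time"])

# Harmonize variable names and units (Fahrenheit -> Celsius).
weather = weather.rename(columns={"time": "timestamp", "temp_f": "temp_c"})
weather["temp_c"] = (weather["temp_c"] - 32.0) / 1.8

# Remove obviously faulty sensor readings.
indoor = indoor[indoor["temp_c"].between(-30, 60)]

# Bring both series to a common hourly resolution and combine them,
# which is the tabular format many analysis and visualization methods expect.
hourly = (
    indoor.set_index("timestamp")[["temp_c", "co2_ppm"]].resample("1h").mean()
    .join(
        weather.set_index("timestamp")[["temp_c"]].resample("1h").mean(),
        rsuffix="_outdoor",
    )
)
print(hourly.head())
```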


Visual analytics

Visual analytics is a recent approach to analysing accumulated data resources. It promises easy-to-use tools that combine interactive visualizations and data analysis techniques. The approach is based on the utilisation of human skills to interpret visual representations in combination with automatic data analysis. Visual analytics research and tool development have emerged in recent years, with visual analytics conferences and a dedicated journal, and tools developed by research laboratories as well as by commercial companies. These tools vary from application-specific solutions to products of established business analytics vendors (SAP, IBM, ORACLE, SAS).

Data models in visual analytics

This work examines how data models could be utilized in visual analytics. Data modelling is an established method in software engineering and has many uses in the software development process. A data model defines the key concepts of an application domain, together with their relationships, attributes and data types. The model forms a template for storing data and constructing the software. The model is also a good means of communication between system developers, users and business people. The use of data models in visual analytics is not common practice.

This work proposes using a domain data model in visual analytics. Domain means here a field or scope of knowledge or activity, such as monitoring of objects, marketing or bioinformatics, comprising a variety of applications. The domain model defines the concepts of a given data domain. It is based on the idea that the key concepts and the user tasks inside a data domain stay similar, regardless of the application. The model is constructed in such a form that visualization and analysis can be easily applied. To keep the work within reasonable limits, the study focuses on monitoring data.

Three uses of the data model are suggested. The first is the construction of visual analytics applications. The data model can be used as an architectural model for a platform of visual analytics applications. The platform contains a database that is constructed based on the model, a library of predefined visualization and analysis methods that are compatible with the model, and a general-purpose, domain-specific user interface. At the simplest level a new application can be constructed by merely loading the application data into the database and making the predefined analysis and visualization methods available through the user interface. There is no need to build case-by-case solutions for each application in the domain.

The second use of the model is to support reasoning. Metadata can be added to the data model to guide the user in reasoning by suggesting the visualization and analysis methods that are most applicable in different situations. This kind of support could be especially useful for users who are familiar with the application domain but not with data analysis.

The third use is to represent the data contents in the user interface in the form of a data model. An application data model, generated from the database, is shown to the user as an aid to navigate the data jungle. This offers an approach to the focus–context problem of information visualization. The application model can also be used as a building block of more elaborate user interfaces showing descriptive models of the objects of interest, such as maps or building models. Linking analysis results with descriptive visualizations could help users in interpreting the results.

The research questions addressed in this work are:
· Are data models useful in visual analytics?

· What kind of model is useful?

· What kind of visual analytic tool can be constructed based on the suggested model?

· Does this approach help users find useful knowledge from the data?

· How straightforward is it to apply the approach to a new case in the domain?

· Can the results be generalized to other domains?

· What are the advantages and disadvantages of this approach compared to other kinds of visual analytics solutions?

These questions are examined by constructing a domain model for monitoring data, defining a visual analytics tool based on the model, and applying the suggested approach to two application cases: analysing the eating and exercise habits of dieting people (case HyperFit), and studying the energy efficiency and indoor conditions of buildings (case MMEA). The objective in the HyperFit case is to examine how users are able to reason and gain insight using this kind of tool. The MMEA case focuses on determining how well the concept can be applied to other kinds of data, and how to construct an application-specific user interface based on the model.
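To make the idea of a domain data model more concrete, the sketch below shows one plausible shape such a model for monitoring data could take, including a small amount of metadata for suggesting visualizations; the class and attribute names are illustrative assumptions for this introduction, not the model defined later in Chapter 7.

```python
# Illustrative sketch of a monitoring-data domain model: the concept and
# attribute names are assumptions for this example, not the model of Chapter 7.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Variable:
    name: str                 # e.g. "indoor temperature"
    unit: str                 # e.g. "degC"
    scale: str                # "nominal", "ordinal", "interval" or "ratio"
    # Metadata used to suggest suitable visualizations to the user.
    suggested_plots: List[str] = field(default_factory=lambda: ["line", "histogram"])

@dataclass
class Measurement:
    variable: Variable
    timestamp: datetime
    value: float

@dataclass
class MonitoredObject:
    name: str                 # e.g. a building, or a person on a diet
    variables: List[Variable]
    measurements: List[Measurement] = field(default_factory=list)

# The same structure serves any application in the domain; only the
# instances (objects, variables, measurements) change per application.
building = MonitoredObject(
    "Office building A",
    [Variable("indoor temperature", "degC", "interval")],
)
```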

Contents of this work

The work starts with an introduction to visual analytics (Chapter 2), presenting the concept, the state of the art of research and the commercial markets for visual analytics. The building blocks of this multi-discipline research area are introduced with the help of a framework (Chapter 3) that was constructed in this work in order to structure and study the elements of visual analytics applications. The evaluation methods of visual analytics are discussed in Chapter 4. Next, the two other key elements of this work are introduced: data modelling in Chapter 5 and monitoring data in Chapter 6. Chapter 7 presents the contributions of this work by introducing the suggested domain model, and Chapter 8 the concept for a visual analytics tool based on the model. Chapter 9 presents the experiments on monitoring dieting people (case HyperFit) and Chapter 10 the experiments on the energy efficiency and indoor conditions of buildings (case MMEA). Finally, Chapter 11 presents the summary and conclusions.


2 Visual analytics

Visual analytics is a recent approach to finding knowledge from data masses, defined as “the science of analytical reasoning facilitated by interactive visual interfaces” (Thomas and Cook, 2005). It provides visual tools for finding insight from complex, conflicting and dynamic data to support analytical reasoning and decision making. The basic idea of visual analytics is to unite the strengths of automatic data analysis and the visual perception and analysis capabilities of the human user. Humans can easily recognize patterns, colour, shape, orientation and spatial position, detect changes and movement, and identify specific areas and items from visual presentations. Humans are good at reasoning and generating problem-solving heuristics (Card, Mackinlay and Shneiderman, 1999). Computers, on the other hand, have superior working memory and unlimited information processing capacity without cognitive biases (Green, Ribarsky and Fisher, 2008). The goal is to create systems that utilize human strengths while providing external aids to compensate for human weaknesses (Thomas and Cook, 2005). Visual analytics is especially focused on situations where huge amounts of data and the complexity of the problem make automatic reasoning impossible without human interaction.

Visual analytics is a multi-disciplinary field combining methodologies from several research areas. Figure 2 illustrates the scope of visual analytics (Keim et al, 2006). It merges different branches of analytics, data management, knowledge representation and human factors into an analytical reasoning process. Production, presentation and dissemination serve to summarize the results of analytical efforts and deliver them to the intended audience.

Figure 2. Scope of visual analytics (Keim et al, 2006).


Visual analytics tools

Visual analytics produces tools to support analytical reasoning. A visual analytics tool processes the raw data and shows the information in the form of abstract and interconnected visualizations. Users can then look for patterns, trends, anomalies, similarities and other relevant “nuggets of information” in the visualizations. Users can launch analyses, browse and navigate visualizations, and highlight and select important areas for further study. The analysis can be a collaborative process, shared among analysts working on related problems (Thomas and Cook, 2005).

Thomas and Cook list the tasks involved in visual analytics as including (1) information gathering, (2) re-representation of the information in a form that aids analysis, (3) development of insight through the manipulation of this representation, and (4) creation of some knowledge product or direct action based on the knowledge insight, repeated in an overall cycle. They give an example of a car shopper who first gathers information from magazines, the Internet and dealers and then creates a table of attributes. The data is manipulated until the shopper gains insight and finally makes a purchase decision.

Figure 3 [1] shows a vision of a visual analytics tool. It combines data coming from different sources and in different formats. The tool supports data exploration from different perspectives and levels. The user can look for trends and patterns, similarities and differences from a variety of visualizations. The visualizations are interlinked and equipped with interactive direct manipulation techniques.

Figure 3. A vision of a visual analytics tool.

[1] VTT internal material, illustration by Hannu Kuukkanen


2.1 State of the art of research

Visual analytics has its origins in US national security. The US Department of Homeland Security (DHS) started a research initiative on visual analytics for homeland security. The “National Visualization and Analytics Centre” (NVAC), founded in 2004, coordinates these research efforts. The agenda for the US visual analytics research programme is laid out in the book “Illuminating the Path” (Thomas and Cook, 2005), which defines visual analytics and describes the research challenges. The agenda calls for new means to analyse the disparate, conflicting and dynamic information that is central to identifying and preventing threats and disasters. Essential in the analysis process is human judgement in making decisions from data in rapidly changing situations. The agenda names these fundamentally new solutions as visual analytics. The agenda covers appropriate methods and technologies, needs for technology and research, and the steps for putting research into practice. The addressed methods and technologies include analytical reasoning, visual representations, interaction techniques, data representations and transformations, and communicating results. Scalability, privacy and security, collaboration and sharing information across organizations and agencies are highlighted. Although the report focuses on homeland security, such as analysing terrorist and border threats and emergencies, it suggests that the new capabilities have an impact on all areas where understanding complex and dynamic information is important, ranging from business to scientific research.

Idea and challenges

While the abovementioned agenda leaves the concept of visual analytics at a fairly general level, several publications have since appeared introducing the idea and challenges of visual analytics in more detail. The first is the article “A Visual Analytics Agenda”, published in IEEE Computer Graphics and Applications (Thomas and Cook, 2006). The article summarises the agenda and underscores that visual analytics is not limited just to safety and security, but has broader applicability (Johnson et al, 2006). To achieve the goals of the agenda there is a need to put effort into studies that integrate visualization technology, data analysis, and the perception and cognition sciences.

The report “NIH/NSF Visualization Research Challenges” by Johnson et al (2006) complements the agenda and introduces similar goals, but from the point of view of visualization. The report discusses the broad spectrum of application domains that can benefit from visualizations, including health, science and engineering, and outlines a data exploration process supported by visualization, adapted from van Wijk (2005). The report gives visualization research recommendations and evaluates the state of the field. In it, Johnson et al emphasize the integration of information visualization with other disciplines: human perception and cognition, data exploration and user interaction. They also call for new visualization techniques, novel interaction metaphors, and methods for measuring and evaluating the value of visualizations.

Daniel Keim et al have published a variety of articles introducing the concept. “Challenges in Visual Analytics” (Keim et al, 2006) discusses visual analytics and its scope based on the visual analytics agenda. The article defines key visual analytics problems, including the need for integrated data sources and human expertise in finding solutions, and lists research challenges and solutions for the field. Scalability challenges arise from the large amounts of data involved. Tools for filtering, aggregation and data reduction, as well as scalable visualizations, are required. Data streams, heterogeneous data fusion, preprocessing, and data quality also need to be taken into account. The key challenges identified include supporting reasoning by providing semantics for analysis and decision-making, integrating technologies with existing systems, and evaluation. The article introduces the visual analytics mantra: “Analyze first, show the important, zoom, filter, and analyze further, details on demand”. It is a modification of the information-seeking mantra by Shneiderman (1996): “Overview first, zoom/filter, details on demand”.

In “Visual Analytics: definition, process, challenges” Keim et al (2008) present definitions and technological challenges similar to the previous article. A quick tour of the state of the art of the components is given, covering visualization, data management, data analysis, perception, cognition, human–computer interaction, infrastructure and evaluation. The visual analytics process is introduced, modified from the suggestions by van Wijk (2005) and Johnson et al (2006). The article “Visual analytics: Scope and challenges” (Keim et al, 2008b) is similar in content to the previous articles, although application domains are also introduced. These include physics and astronomy, business, environmental monitoring, disaster and emergency management, security, software analytics, biology, medicine and health, engineering, and personal information management. Similarly, “Visual analytics: combining automated discovery with interactive visualizations” (Keim et al, 2008a) repeats the earlier content and introduces a text analysis application of visual analytics.

Icke presents the developments in visual analytics with the help of a framework called the “multifaceted overview” (Icke, 2009), which divides the elements of visual analytics into system, user and human–machine collaboration aspects. The system aspects include data analytical tasks and visualization types; the user aspects include expertise and collaboration; and the human–machine collaboration aspects include interactivity and utility. The article focuses mainly on the system aspects, such as data types and visualization types, and covers the other aspects lightly. Icke does not consider human reasoning, insight, perception and cognition at all.

The book “Mastering the Information Age” (Keim et al, 2010) is the outcome of a two-year project called VisMaster CA, a coordination action funded by the European Commission during 2008–2010. The book introduces the topic of visual analytics, presents a brief history, and introduces application areas, the visual analytics process, the building blocks of visual analytics, and scientific challenges and recommendations. The book covers the components of visual analytics, including data management, data mining, space and time, infrastructure, perception and cognitive aspects, and evaluation of visual analytics. For each of these components, the state of the art and the specific challenges and opportunities with respect to visual analytics research are discussed.

A technical report by VTT [2] (Järvinen et al, 2009) studied visual analytics in order to clarify the concept and to examine its industrial implications. A demonstration tool and a roadmap for industrial and consumer applications were developed.

[2] Technical Research Centre of Finland


Journals

Special journal issues on visual analytics have been published since 2004. The earliest is in IEEE Computer Graphics and Applications, September/October 2004. The introductory article by Wong and Thomas (2004) defines visual analytics as “an outgrowth of scientific and information visualization that combines the art of human intuition and the science of mathematical deduction to directly perceive patterns and derive knowledge and insight from them”. The term visual analytics was probably used for the first time here. Wong and Thomas consider the formation of abstract visual metaphors in combination with a human-information discourse to be essential to the visual analytics concept. The objective is to enable detection of the expected and discovery of the unexpected within massive, dynamically changing information spaces. According to Wong and Thomas, visual analytics integrates knowledge management, statistical analysis, cognitive science, and decision science. They suggest that visual analytics suits almost all fields, being driven by critical needs in biology and security. The journal issue includes six articles as representatives of visual analytics, covering document visualizations, flaws and intruders in computer network systems, analysis of geospatial data point sets, finding trading patterns in stock market data, studying ocean bottoms, and analysis of social behaviour in web environments.

IEEE Computer Graphics and Applications published another special issue on visual analytics in 2007. The introductory article “Discovering the unexpected” (Cook, Earnshaw and Stasko, 2007) outlines a visualization timeline showing the key developments in visualization leading to visual analytics. The timeline starts from 1952, when A.S. Douglas received a PhD in human–computer interaction using a CRT display and the EDSAC 1 computer [3], and ends with the first IEEE visual analytics symposium in 2006. The article emphasises the role of human interaction, using models to cope with the large scale and diversity of data, minimizing cognitive load with the right kinds of visualizations, and a seamless sense-making process. The other articles in the issue cover computer network defence, earthquake simulations, tools for understanding set members, and tools and methods to assist sense-making.

Since the publication of the Visual Analytics Agenda in 2006, visual analytics articles have appeared in a variety of journals. Among them the most frequent publishers are IEEE Transactions on Visualization and Computer Graphics, and Information Visualization by SAGE Journals. In 2012 visual analytics received its own journal, the International Journal of Visual Analytics for Advanced Information Management (IJVAAIM) by the Information Resources Management Association [4]. The journal’s stated mission is to “collect the latest research essential to the emerging field of scientific and information visualization analytics”.

[3] http://www.pongstory.com/1952.htm (accessed 1 October 2012)
[4] http://www.irma-international.org/ (accessed 1 October 2012)

Conferences

Visual analytics has also gained a strong presence in international conferences. The first symposium on Visual Analytics Science and Technology (VAST) was held by IEEE in 2006. Chung Wong et al (2007) introduce the highlights of this conference in the first visual analytics themed issue of the Information Visualization journal. The issue covers topics such as wormhole detection in wireless networks, exploration of network data, productive reading techniques, collaborative decision making, linguistic algorithms, exploration of patterns of hotel visits, and pixel-based visualization optimization. Visual analytics contests arranged in conjunction with the VAST conferences since 2008 have had an important impact on research, bringing novel ideas and solutions to the field. In addition to VAST, the ACM SIGKDD Workshop on Visual Analytics and Knowledge Discovery (VAKD) has since also been held.

2.2 Commercial markets

To examine the current state of the visual analytics market, a web survey was conducted. The survey searched for products using the criteria “visual analytics”, “visual data mining” or “data mining with visualizations”. Tools producing only static visual reports, basic statistical tools such as Excel, and analysis packages such as R and Matlab were omitted from the survey. A list of the studied products is presented in Appendix A. The search resulted in 55 software products, which were grouped into six categories, as shown in Figure 4: (1) visual analytics tools, (2) tools by major software houses, (3) analysis and visualization environments, (4) libraries, (5) application-oriented tools, and (6) technology-oriented tools.

Figure 4. Visual analytics product categories: visual analytics tools (3), major software houses (10), analysis and visualization environments (23), libraries (3), application-oriented (5), technology-oriented (11).

Three tools defined themselves as visual analytics tools: Data Clarity Suite [5] by Visual Analytics Inc., Nuix Visual Analytics [6] by Nuix, and PV-WAVE [7] by Rogue Wave. Of the major software houses, SAP, IBM, ORACLE, SAS, Microsoft and Information Builders had added visual analytics to their offering, usually by buying an independent visual analytics product and integrating it as part of their services. The biggest group (22 products) were general-purpose data analysis and visualization environments that contained features characteristic of visual analytics, such as data analysis combined with interactive visualizations. Three libraries provided open source software for data mining and visualization tasks. The application-oriented tools (5 products) focused on specific application areas, including learning, software analysis, business, and biosciences. The technology-oriented tools (11 products) offered data analysis focused on a specific data type (text, geospatial data, network data), analysis method (Bayesian modelling, neural networks) or visualization technique (3D, 4D).

[5] http://www.visualanalytics.com (accessed 1 October 2012)
[6] http://www.nuix.com/visual-analytics (accessed 1 October 2012)
[7] http://www.roguewave.com/products/pv-wave-family.aspx (accessed 1 October 2012)

The majority of the tools were provided by business consultants. Three were open source products. Most of the tools were “general-purpose”, suggesting application areas covering almost every aspect of human activity: business, marketing, banking and finance, safety and security, healthcare, medical diagnostics, science, industry, government, politics, media, communications, transportation, logistics, consumer applications and education.

Another source for the market survey consisted of reports from market analysis companies. Three companies were included: Frost & Sullivan [8], ReportLinker [9] and Gartner Group [10]. All recognized the new field and made reference to recommended application areas. Frost & Sullivan mention visual analytics and business intelligence with reference to “what’s hot”, highlighting the use of visual analytics in network security and medical analysis. ReportLinker recognized genomics, market research, education and training, medicine, the energy sector and mobile phones as potential application domains. Gartner Group refers to visual analytics as a “hot topic” in business intelligence and market research, security, social analytics and the market. It recommends extending the use of visual analytics and mentions products from SAS and Tibco Software. Gartner positioned data visualization on a strongly upward curve.

[8] http://www.frost.com (3 October 2012)
[9] http://academic.reportlinker.com (3 October 2012)
[10] http://www.gartner.com (3 October 2012)


3 Framework for visual analytics

Visual analytics is a mixture of technologies, theories and methodologies. A framework for visual analytics was constructed as part of this work in order to structure the elements of the multi-discipline research area, to review their state of the art, and to identify the building blocks of visual analytics tools. Another motivation for this quite extensive framework was that there is no textbook or other comprehensive material introducing this new research area. The present literature consists mainly of visions and challenges or detailed application descriptions. The only wider introductions are in the Visual Analytics Agenda (Thomas and Cook, 2005) and in the report of the VisMaster CA project (Keim et al, 2010), which are the main sources of this framework.

The framework divides the visual analytics elements into cornerstones and building blocks. The cornerstones are problem, data and insight. The building blocks represent the knowledge, technologies and methods used in the construction of visual analytics applications. They include human capabilities, communication, enabling technologies, and infrastructure. Figure 5 shows the framework.

Figure 5. Visual analytics framework.

The cornerstones problem, data and insight are stated clearly in the definition of visual analytics by Keim et al (2006): “The ultimate goal is to gain insight into the problem at hand which is described by vast amounts of scientific, forensic or business data from heterogeneous sources”. The need for visual analytics arises from a problem or issue that needs to be solved with the help of data, which is expected to contain hidden knowledge that can be used to support reasoning and decision making. The ultimate goal is to gain insight.

Visual analytics solutions rely on human capabilities. The most essential of these is the human perception and reasoning process. Communication refers to methods for communication with the outside world, users and the audience. It includes user interaction, annotation, collaborative problem solving, and production of analytic outputs and presentations. Enabling technologies provide methods for insight-searching, including data analysis and information visualization. All of these components are tied together by the infrastructure, which comprises an architectural model and services for various data processing and management tasks.

Each of the elements has its specific, more or less mature theories and methods. When these are tied together in a visual analytics tool, new challenges emerge, both within each element and in the co-operation between elements. This chapter introduces the framework elements and cornerstones and discusses the special questions arising in relation to them. Human capabilities, including perception and reasoning, are discussed, and user interaction, data analysis and information visualization are reviewed. The final section considers the role of infrastructure. Presentation and collaboration issues are omitted from the introduction, as they have less importance in this study.

3.1 Visual analytics problems

Problems arise where there is a lack of relevant knowledge needed to produce an immediate solution (Eysenck and Keane, 2005). Problems can be divided into well-defined and ill-defined problems. Well-defined problems are clearly specified in all aspects. They include the initial state or situation, the range of possible moves or strategies, and the goal, such that it is clear when the goal has been reached. A maze is an example of a well-defined problem. Ill-defined problems are unspecified, and these account for many of the everyday problems commonly encountered. Problems can also be knowledge-rich or knowledge-lean. Knowledge-rich problems can only be solved by individuals possessing a considerable amount of specific knowledge. Knowledge-lean problems require no special knowledge. A significant problem to one group of people may be of no importance to another, and where a person with less expertise may see a problem, someone with the relevant expertise may see none (Eysenck and Keane, 2005).

The Visual Analytics Agenda (Thomas and Cook, 2006) classifies visual analytics problems as (1) making judgements based on evidence about an issue, (2) discovering new understanding, and (3) supporting decision making. These all fall into the category of ill-defined, knowledge-rich problems. Keim et al (2006) define the characteristics of visual analytics problems: there needs to be a large information space, possibly integrated from disparate data sources, and human expertise in finding solutions. A well-defined problem where an optimal estimate can be calculated automatically should not be classed as a visual analytics problem. In addition, according to Keim et al, visualization problems that do not involve automatic data analysis methods do not fall into the category of visual analytics problems.


Visual analytics problems are typically highly application-domain specific and require domain expertise. For example, bioinformatics has different problems to marketing.

3.2 Data

The data used in visual analytics comes in a huge variety of forms. It can differ in type: numeric or textual, image, video or audio; or in structure: unstructured, structured or semi-structured, in addition to which structured data can also have a hierarchical or network structure. Data can also have different dimensions; it can be 1-dimensional, 2-dimensional or multidimensional. It can be temporal, spatial or spatio-temporal; it can also be static or streaming. It can originate from various sources: collections of documents, spreadsheets, e-mails, Web pages, structured databases or measurements by devices. There can be huge amounts of data. A dataset is considered large when it contains more than 100,000 data items, and “massive” when there are more than hundreds of millions of items (Ferreira de Oliveira and Levkowitz, 2003). Big data is defined by the McKinsey Global Institute [11] as a “dataset whose size is beyond the ability of typical database software tools to capture, store, manage and analyse.”

[11] http://www.mckinsey.com/insights/mgi (accessed 1 October 2012)

In addition to the actual data, there can be metadata – information about the data, describing its properties or semantics. Relational databases and ontology models, in particular, store data about relationships between data elements.

Data types

The Visual Analytics Agenda (Thomas and Cook, 2005) characterizes data according to properties that have an impact on the data representation. These are: (1) data type, (2) level of structure, (3) geospatial characteristics and (4) temporal characteristics. Data can be numeric, non-numeric or both. Sensor data is often numeric and language data non-numeric. The level of structure varies from completely structured, such as categorical data, to completely unstructured, such as a narrative description on a web page. Unstructured does not always mean that there is no structure, but rather that the structure is only humanly interpretable. Geospatial characteristics are associated with a particular location or region. Any data can have a geospatial association. Furthermore, all data types can have a temporal association, which may be either discrete or continuous.

Each data type has its own specific methods of analysis and visualization. Shneiderman (1996) introduces a data type taxonomy for information visualization consisting of seven data types: 1-dimensional, 2-dimensional, 3-dimensional, temporal, multidimensional, tree, and network. Examples of 1-dimensional data are written documents or lists. Maps, floor plans or newspaper layouts represent 2-dimensional data. 3-dimensional data represents real-world objects, such as molecules, the human body or buildings. Temporal data represents timelines. Multidimensional data consists of data items with several data attributes. Trees are collections of data structured as hierarchies, and networks represent more complex relationships among data items. All of these data types can occur in combination.

In statistics, data is described as nominal, ordinal, interval-scaled, ratio-scaled, discrete or continuous. In nominal data, values are assigned to code labels. Data items can be counted but not ordered. ID numbers, eye colour, and zip codes are examples of nominal data. If data is ordinal, values can be ranked and put in order, but not measured. Examples of ordinal data include rankings (e.g., the taste of chocolate on a scale from 1 to 5), grades, and height categories such as tall, medium or short. Interval-scaled data has a scale, and the distance between any two values can be measured, but the zero point is arbitrary. Scores on an interval scale can be added and subtracted but not meaningfully multiplied or divided. Examples of interval data are calendar dates or temperatures in Celsius or Fahrenheit. Ratio-scaled data has a definite zero point. Ratio examples are temperature in Kelvin, length, time, and frequency counts. A set of data is said to be continuous if the values may take on any value within a finite or infinite interval. Continuous data can be counted, ordered and measured. Examples are height, weight and temperature. A set of data is said to be discrete if the values are distinct and separate, i.e. they can be counted. (andrews.edu, 2005)

This work mainly concerns multivariate data consisting of attributes categorized as nominal, ordinal, or quantitative. Quantitative data can be both interval- and ratio-scaled. Attributes can be single-valued or time series with a time stamp attached to each value.
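As a small illustration of these attribute types, the sketch below (Python with pandas, using made-up values) represents nominal, ordinal and quantitative attributes, the last of these both as single values and as a time-stamped series.

```python
# A small sketch (with made-up values) of the attribute types used in this
# work: nominal, ordinal and quantitative attributes, the latter either
# single-valued or as a time-stamped series.
import pandas as pd

meals = pd.DataFrame({
    "meal_type": pd.Categorical(["breakfast", "lunch", "dinner"]),           # nominal
    "portion_size": pd.Categorical(["small", "medium", "large"],
                                   categories=["small", "medium", "large"],
                                   ordered=True),                            # ordinal
    "energy_kcal": [350.0, 620.0, 540.0],                                    # quantitative (ratio)
})

# A quantitative time series: one value per time stamp.
weight = pd.Series(
    [81.2, 80.9, 80.5],
    index=pd.to_datetime(["2013-01-01", "2013-01-08", "2013-01-15"]),
    name="weight_kg",
)
print(meals.dtypes)
print(weight)
```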

3.3 Insight

There is almost a community-wide consensus that insight is the ultimate goal of visual analytics (Chen, 2010). But what does it mean? In cognition science, insight refers to “an experience of suddenly realizing how to solve a problem”, known commonly as an “aha” or “eureka” moment (Eysenck and Keane, 2005). Insight occurs suddenly, it “pops out”. It does not necessarily require gradual accumulation of information, but past experience is beneficial.

In visual analytics there is no precise or agreed definition of insight, although a number of suggestions have been proposed. A precise definition would be especially useful in insight-based evaluations in order to know how to measure insights. Chen (2010) characterizes insights as “unexpected discoveries”, “a deepened understanding”, or “a new way of thinking”. Chang et al (2009) define two distinct types of insight. One is spontaneous insight, a “moment of enlightenment”; the other is knowledge-building insight, “an advance in knowledge or a piece of information”, a form of learning. Smuc et al (2009) define insight for evaluation purposes as “understanding gained by an individual using a visualization tool (or parts thereof) for the purpose of data analysis, which is a gradual process towards discovering new knowledge.”

North et al (2006) take a different point of view. In “Towards measuring visualization insight” they consider that any formal definition would be either too restrictive to capture the essence of insight or too general and vague to be useful. Instead, characteristics of insight are listed. Insight is “complex, involving large amounts of information, deep – building over time, qualitative – uncertain and subjective, often unexpected, and relevant to the data domain”. They also introduce different kinds of insight: when a pattern is selected in a visualization, when a cognitive script is identified for data analysis, or when a mental model of the data is completed. In this work, insights are considered loosely as “aha experiences” that users have when making expected and unexpected findings.


3.4 Human cognition and perception

Visual analytics combines analytical reasoning with interactive visualization, both of which are subject to the strengths and limitations of human perceptual and cognitive abilities. Building effective visual tools requires a deep understanding of the capabilities and limits of the human information and visual system. Visualizations designed without knowledge of human perceptual and cognitive principles can lead to poor or misleading ad hoc solutions (Johnson et al, 2006). If information is presented in an inappropriate way it can, in the worst case, lead to incorrect decisions. A classic example is the O-ring damage of the space shuttle Challenger (Tufte, 1983): a scatterplot made by E. Tufte would have shown the fatal damage that the original visualization did not reveal. In contrast, good visualizations can improve the efficiency, effectiveness and capabilities of decision makers and analysts (Ware, 2004). A good visualization can tell a story, help make discoveries, and show the “big picture” or different views of phenomena (Chen, 2010).

Cognition science studies human cognition, the purpose of which is to make sense of the environment and decide what action(s) might be appropriate (Eysenck and Keane, 2005). It includes aspects such as attention, perception, learning, memory, problem solving, reasoning and thinking. Human perception research studies the understanding of sensory information. Perception, as defined by Sekuler and Blake (in Eysenck and Keane, 2005), means “the acquisition and processing of sensory information in order to see, hear, taste, or feel objects in the world; also guides an organism’s action with respect to those objects”.

This section introduces the areas of human cognition and perception that are the most important from the point of view of visual analytics: the visual perception and memory systems (Thomas and Cook, 2005). Knowing the principles of human visual perception helps to favour the visual patterns, symbols, colours and other effects that are interpreted easily, and to avoid visual elements that are misleading or difficult to perceive. The limitations and strengths of the human memory system are important to know in creating fluent reasoning processes that do not overload the memory system and distract attention. The topics covered here are: the human visual process; perception of colour, depth and size; recognition of objects; motion and action detection and perception; attention and searching; and the architecture of memory and memory processes. The last topic, data graphics, introduces design principles for visual interfaces in line with the findings of cognition research. Human reasoning is introduced in Section 3.5.

3.4.1 Visual perception

What happens when a visual stimulus reaches the eye? The perception process proceeds in stages, each at different rates. The process begins with rapid parallel processing, including the extraction of features, orientation, colour, texture and movement (reception), continues with pattern perception (transduction), and ends with slow sequential goal-driven processing. During this process some visual symbols are understood quickly. These are sensory symbols. Others, arbitrary symbols, are interpreted slowly. These are learned and easily forgotten. Letters and numbers are examples of arbitrary symbols, whereas a line connecting two areas is a sensory symbol. (Ware, 2004)


Acuity is the ability to see detail. Examples of acuities are the ability to differentiate distinct points or letters or to distinguish a pattern of bright and dark bars. Humans have certain superacuities where the human eye is especially adept. These include perception of depth, co-linearity of line segments, and stereoscopic depth. However, up to 20% of the population are stereoblind, and often unaware of it. Contrast sensitivity is the visual ability to see objects that may not be outlined clearly or that do not stand out from their background. The ability varies between individuals and decreases with age. Striped patterns covering large areas appear to flicker and cause visual stress or, in the worst case, can induce epileptic seizures. The human eye is able to work in varying light levels. Physical luminance and perceived brightness do not necessarily correspond to each other. Strong contrast differences can cause strange effects, such as ghost effects near dark and bright bands (the Hermann grid illusion). Another effect is that the same grey patches look different on different backgrounds.

Colour perception (Ware, 2004)

Colour vision is helpful in distinguishing objects. Humans recognize three qualities of colour: hue, brightness and saturation. Interpretation of colours is affected by lighting conditions and the colour environment (colour constancy). Colour blind individuals cannot distinguish all colours. The most common forms of colour blindness are red–green deficiency and blue–yellow deficiency. 12% of men and 1% of women are colour blind. The ability to distinguish colours decreases with very small objects (small-field colour blindness). Colour contrast effects can also distort the reading of colour-coded values. The human eye does not fully correct against chromatic aberration, which leads to certain special colour perception effects. Blue text on a black background is unreadable when in close proximity to white or red. People also see red and blue on a black background differently, and red seems to “pop up” on a blue background.

Human perception of shape or motion with colours is limited, and for that reason colours are recommended only for labelling. In the selection of colours for labelling, distinctness, hue, contrast with the background, size, number (5–10 colours are rapidly distinguished) and convention (red associated with hot, blue with cold, etc.) should be considered. The use of the basic colours red, green, yellow, blue, black and white is recommended. If more colours are needed, these should be pink, purple, orange or aqua. Pure colours are easily remembered and can be named by 75% of individuals. Connecting meanings to colours depends on familiarity, expectations and cultural background.

With respect to colour scales there is no general agreement. The “rainbow” scale is the default choice, although this is not recommended where shape is important. With greyscales it is difficult to distinguish shades, and only two shades plus black and white should therefore be used. A blue–yellow scale is recommended for the red–green colour blind. Some guidelines on the use of colour scales with different data types have been proposed: for nominal data distinctive colours should be used; for ordinal data a fixed colour scale can be used; and with interval data colour changes should reflect differences in the data. Ratio is impossible to represent using colour scales.
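The guidelines above on matching colour scales to data types can be illustrated with a small sketch; the mapping below uses matplotlib colormap names, and the specific colormap choices are this example's own assumptions rather than recommendations from the cited sources.

```python
# Illustrative mapping from measurement scale to a colour scheme, following
# the guidelines above; the chosen matplotlib colormaps are only examples.
import matplotlib as mpl

COLOUR_SCHEMES = {
    "nominal": "tab10",      # distinctive, unordered colours for labelling
    "ordinal": "YlOrRd",     # a fixed, ordered colour scale
    "interval": "coolwarm",  # colour changes reflect differences around a reference
}

def colormap_for(scale: str):
    """Return a colormap suited to the given measurement scale.

    Ratio data is not mapped to colour at all, since the guidelines above
    note that ratios are impossible to represent with colour scales.
    """
    if scale == "ratio":
        raise ValueError("encode ratio data with position or length, not colour")
    return mpl.colormaps[COLOUR_SCHEMES[scale]]

print(colormap_for("ordinal").name)  # -> 'YlOrRd'
```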


Depth and size (Ware, 2004; Eysenck and Keane, 2005)

The human visual system transforms the two-dimensional retinal image into a three-dimensional world. The physical world has many sources of depth information. The transformation occurs with the help of various distance and depth cues. These principles can be used to create the impression of distance, depth effects, or three-dimensional scenes. The perspective cue, occurring when lines converge at a single point, gives a powerful impression of depth. Image blur is another depth cue. When an object boundary is sharp, the object seems to be nearer. Distant objects lose contrast and appear hazy. Changes in object texture, shading and contour are also cues for distance. An object that occludes another appears to be closer to the viewer. Objects placed higher in the visual field are seen as more distant. Object familiarity – for example objects that are known to be very big or small – affects the interpretation of size.

Shading is a cue for three-dimensional depth. As two-dimensional surfaces do not cast shadows, shading can be used to give a three-dimensional impression. Cast shadows help determine the height of objects. Shadow effects are especially powerful with objects that are in motion. The movement of objects within a 3D space gives depth cues: near objects pass by rapidly, distant ones slowly.

When several cues are used the effects can be additive, selective or multiplicative. Additivity means that all information from the different cues is added together. In selectivity, information from a single cue is used and the others ignored. In multiplicative interaction, information from the cues is combined in a multiplicative fashion. There is not yet a clear understanding of how the cues interact. Experiments have shown that depth perception degrades when cues conflict. There is evidence that occlusion is the strongest depth cue.

Object recognition (Eysenck and Keane, 2005) Object recognition is a complex process. Humans are surrounded by a numerous array of objects and must identify where one ends and the next starts. In addition to dealing with overlapping objects, objects have to be recognized accurately over a wide range of viewing distances and orientations and ultimately categorized. There are numerous theories regarding the perception of visual objects. The most influential are Marr’s sketch-based theory and recognition by composition by Irwin Biedermann (Eysenck and Keane, 2005). Marr’s theory is founded on efficient image recognition based on three main kinds of representation sketches. The first phase identifies the outline, the second the depth and orientation, and the final creates a 3D model representation. Composition theory suggests that images are recognised as structures built from three-dimensional primitives called geons. Face recognition is thought to differ from other forms of object recognition. It is important for humans to distinguish between familiar and unfamiliar faces, name persons and remember their doings. A specifically associated brain area dedicated to face processing has been recognized. Numerous kinds of information can be extracted from the face. The ability to distinguish between familiar and unfamiliar faces is learned by humans earlier than the ability to recognise many other objects. Humans are also experts at recognizing facial expressions. Cross-cultural studies have


identified six universal expressions: anger, disgust, fear, happiness, sadness and surprise (Ware, 2004). These are expressed in the popular “smileys”. Some visual objects are processed pre-attentively, before conscious attention (Ware, 2004). Such objects seem to “pop out” at the observer. Bolded text is an example of a pre-attentive feature that pops out. Other pre-attentive features include line orientation, length, width, collinearity, size, curvature, spatial grouping, added marks, numerosity (up to four), colour (hue, intensity), blur, motion (flicker, direction), and spatial position (2D position, stereo depth, convex/concave shape from shading). Combining pre-attentive features does not necessarily lead to pre-attentive processing; for example, combining width and height, or conjunction searches such as “find red and circular”, are not pre-attentive. Grouping size and colour, or spatial features, motion and stereo depth, can, however, remain pre-attentive; for example, “find the red moving things” is a pre-attentive task.

Perception of patterns (Ware, 2004) Pattern perception is summed up in the Gestalt laws12, which can be translated directly into design principles. These laws include proximity, similarity, connectedness, continuity, symmetry, closure and relative size. Proximity, or nearness, means that things that are near to each other appear to be grouped together. Similarity groups similar things together, for instance similar colours and shapes. Connectedness means that graphical objects connected by lines are perceived as belonging together. Continuity means that humans are more likely to construct visual entities out of visual elements that are smooth and continuous rather than ones that contain abrupt changes in direction. In symmetry, symmetrically arranged pairs are strongly perceived to belong together. Closure refers to the tendency to close contours that have gaps in them. Relative size refers to the tendency for the smaller components of a pattern to be perceived as objects. Visual grammars such as Mind Maps or UML diagrams are applications of the Gestalt laws; UML, for instance, utilizes similarity and continuity. Humans are not consciously aware of all the objects that they see: there is evidence of perception without awareness, including detection and localization of light, orientation, shape, direction of movement and flicker (Eysenck and Keane, 2005).

Motion and action (Eysenck and Keane, 2005) Humans have an acute ability to perceive movement, and there is evidence that our brains have a special area dedicated to motion detection. Biological motion – that of humans and animals – is perceived especially accurately and effectively. Our ability to detect motion is based on structural and dynamic cues. Causality is a form of motion perception involving the ability to perceive motion that causes motion in other objects. Motion, animation and causality can be used effectively to reveal patterns and structures in data. Change blindness is a phenomenon in which observers do not notice an unexpected object appearing in a visual display, or fail to detect that an object has moved or disappeared. The main factor in change blindness is whether the changed object was

12 Gestalt School of Psychology (Germany, 1912 onward)


attended before the change. Other factors include the observers’ intentions, sensitivity, similarity of unexpected objects and attended objects, and the extent of change.

Attention (Eysenck and Keane, 2005) Visual attention refers to the process of seeking out visual stimuli and focusing on them. Attention is selective and its capacity is limited. There are two modes of attention, active and passive. Active attention has goals and expectations and is a top-down process, whereas passive attention is based on bottom-up processes. The attention capacity varies depending on the perceptual load, and is determined by the number of units in the field of vision and the nature of processing required for each unit. The attention capacity is greater when the task difficulty is high and increases in conditions of high effort or motivation. Visual attention resembles a spotlight. Individuals focus their attention initially on a general area and then on a specific object or objects. Everything in this fairly small space is seen clearly, but not the area surrounding it. When an individual is doing several things at a time, information needs to be coordinated from two or more senses. This requires interpretation of visual, auditory and tactile modalities. Tasks involving the same sense modality seem to interfere with each other more than those involving different sense modalities. Practice improves dual-task performance. Performance also depends on the task difficulty.

3.4.2 Memory system

Architecture of human memory (Eysenck and Keane, 2005)

Human memory is used for numerous purposes in everyday life. We use memory when talking, reading, writing, and in making sense of the world around us. Memory use has personal qualities and is influenced by situational demands. Human memory research has produced many theories, but there is no consensus concerning the nature of the memory system. The most influential approaches propose the division of human memory into sensory, working and long-term memory. Sensory memory is a brief memory store. It is a modality-specific store, comprising an iconic image store for vision and an echoic store for hearing, in which information is held very briefly to be processed further. Working memory is used for coping with everyday life. It is a short-term store of very limited capacity and can contain only about seven digits at a time. It is in a constant state of activation and can be accessed immediately and effortlessly. It is also fragile, and any distraction easily causes forgetting. Working memory is suggested to have four components: (1) a central unit that controls what information is held and stored, (2) a phonological loop, which holds information in speech-based form, (3) an episodic buffer, which holds and integrates diverse information, and (4) a visuo-spatial sketchpad, also called the visual working memory, which is specialized for spatial and visual coding. The latter’s capacity is limited to a small number (3–5) of simple visual objects and patterns. The visual working memory stores information on position, shape, colour and texture. An example of the implications of the limitations of the visual working memory is data glyph design. A glyph is a visual object that represents one


or more data variables. A data glyph should be held in visual memory and encoded according to the visual memory capacity. Long-term memory is a life-long information store and has unlimited capacity. It is argued that there might be several long-term memories. Four types are suggested: (1) episodic memory involves recalling personal events from the past, (2) semantic memory involves knowledge of the world, (3) procedural memory is used in skill learning, and (4) the perceptual representation system is used for improving performance.

Memory processes How do things end up in long-term memory? One theory is the levels-of-processing framework proposed by Craik and Lockhart (in Eysenck and Keane, 2005), which assumes that the attentional and perceptual processes at learning determine what information is stored in long-term memory. There are various levels of processing, ranging from shallow or physical analysis of a stimulus (e.g. detecting specific letters in words) to deep or semantic analysis that extracts meaning from the stimulus. The depth of processing of a stimulus has a large effect on its memorability: deeper levels of analysis produce more elaborate, longer-lasting and stronger memory traces than shallow levels of analysis. As regards memory rehearsal, a distinction is drawn between maintenance and elaborative rehearsal. Maintenance involves repeating previous analyses, whereas elaborative rehearsal involves deeper or more semantic analysis of the learning material. There is no agreement as to which method better enhances long-term memory. The existing theories explain both incidental and intentional learning. Some people have exceptional memory skills, which may be due either to deliberate strategies or to naturally outstanding memory abilities.

3.4.3 Data graphics

Constructing good visualizations is challenging. They should invite users to explore the data, they should be based on cognitive, perceptual and graphic design principles, and they should improve the efficiency, effectiveness and capabilities of analysts. Data graphics is the theory underpinning effective visualization. Many of the principles have been developed by Edward Tufte (1983), who has offered valuable guidance on presenting static information. The principles were originally intended for graphic design, but are also valid for computer-based visualizations. They are not an exact theory, but rather a collection of rules of thumb. They provide guidelines for building visualizations that are intuitive and convey the intended meaning clearly to the observer. The main guideline is that representations should match the task performed by the user. Attention should be on the data; the main purpose of visualization is to show the data. Superfluous information can be distracting and make the task more difficult. “Chart junk” – decoration that does not tell the viewer anything new – should be avoided. The purpose of chart junk may be to make the graphics appear more scientific or lively (or to give the designer an opportunity to exercise their artistic skills). Gridlines, decorations, vibrations and redundancy are all chart junk. In the worst cases, the design overwhelms the data. In addition, the proportions of the visual representation should match the information being represented. If the graphics


are not proportional to the numerical quantities, the visualization will be misleading. Multifunction elements can effectively display complex multivariate data, but they must be used with caution. A map that uses shading and colour to represent coordinate data and other properties is an example of multifunction graphics. Complex multifunction elements can easily turn data graphics into graphical puzzles. The Visual Analytics Agenda (Thomas and Cook, 2005) introduces three principles of data graphics, adapted from Norman and Dunaeff (1994): appropriateness, naturalness and matching. The appropriateness principle states that visual representations should provide neither more nor less information than that needed for the task at hand. Naturalness calls for visual representations that most closely match the information being presented; new visual metaphors are only useful for representing information when they match the user’s cognitive model of the information. The matching principle states that representations are most effective when they match the task to be performed by the user.


3.5 Analytical reasoning

Fluent analytical reasoning is essential to visual analytics. Psychology and cognitive science have a long history of research into human reasoning, yet no unifying theory has emerged (Green, Ribarsky and Fisher, 2008). Instead, there exists a number of competing theories, heuristics and principles for sub-processes. The visual analytics community has defined processes, frameworks, precepts and elements that support reasoning, but it, too, lacks universal consensus, although a couple of dominant approaches do exist. An agreed definition and a cognitive model of the visual analytics process have been recognized as essential for constructing fluent reasoning environments (Liu and Stasko, 2010).

Cognitive science has three major theories of sense-making (Eysenck and Keane, 2005). First, there are abstract rule theories, according to which people use mental logic when confronted with a reasoning task. The second is the mental model approach, in which people form mental models, or representations, and use these models – rather than rules – to draw conclusions. The third is the probabilistic approach, based on the notion that humans use cognitive processes developed for coping with the uncertainties of everyday life, and that these habitual ways of thinking carry over into deductive reasoning tasks. In visual analytics the mental model approach is the most influential, and many visual analytics reasoning models and tools are founded on it, including the works by Green et al (2009), Liu and Stasko (2010), and Wright et al (2006).

This section introduces approaches to sense-making in visual analytics, discusses the human mental model, and presents precepts for reasoning.

3.5.1 Sense-making in visual analytics

Many visual analytics tools have adopted the sense-making model of Pirolli and Card (2005), shown in Figure 6. The model is based on the mental model approach, with an emphasis on the role of schematic knowledge structures – a set of schemas used to organize information. It consists of four phases: (1) information gathering, (2) representation of the information in a schema that aids analysis, (3) the development of insight through the manipulation of the schema representation, and (4) the creation of some knowledge product or direct action based on the insight. The representations may exist informally in the analyst’s mind, or be aided by paper and pencil or a computer-based system. The process includes several feedback loops. The foraging loop is the cycle of activities around finding information. It starts from external data sources and proceeds to producing a subset, or “shoebox”, of information; evidence files are snippets of data from the shoebox. The loop involves processes that aim at seeking information, searching and filtering it, and reading and extracting information into the schema. The sense-making loop involves making sense of the information and interacts heavily with the other loops. The process ends with a presentation of the work product. The model has been criticised for leaving the concept too open and for revealing little about what goes on within each step (Pottenger, Fisher and Ribarsky, 2009).


Figure 6. Model of sense-making for intelligence analysis (Pirolli and Card, 2005).

Daniel Keim et al (2008) have drafted a visual analytics process model (Figure 7) based on the visualization processes proposed by van Wijk (2005) and Johnson et al (2006). The model consists of stages and the transitions between them. Complex, heterogeneous data from different sources, of various types and levels of quality, need to be filtered, cleaned of noise and transformed in order to be processed jointly. The data is then processed and abstracted using mathematical, statistical and data mining algorithms and models. Visualizations highlight the important features, including commonalities and anomalies, making it easy for users to perceive new aspects of the data. The principle of a visual analytics tool has been summed up in the visual analysis mantra by Keim et al (2006):

“Analyse First – Show the important – Zoom, Filter and Analyse Further – Details on demand”.

Figure 7. Visual analytics process (Keim et al, 2008).


Kang and Stasko (2011) studied the reasoning process of intelligence analysis in visual analytics through empirical testing. Their study involved a group of students carrying out intelligence analysis exercises. Based on the findings, Kang and Stasko introduce an overall analysis process with four iterative phases: (1) constructing a conceptual model, (2) collection of information, (3) analysis, and (4) production. In their guidelines for planning visual analytics tools they question the sequential nature of the Pirolli and Card sense-making process: the analyst does not work through the phases in order but on all four tasks in parallel. According to Kang and Stasko, the key to intelligence analysis is not the analysis of a specific data set but the construction of a frame, and tools should support all four reasoning phases. The authors emphasize that the results are from the intelligence analysis domain and can differ from statistical analysis. They also provide guidelines on collaboration, stating that whereas collaboration is part of the process, little collaboration actually takes place during the analysis stage when working on the data and content; they therefore recommend asynchronous collaboration. Shrinivasan and van Wijk (2008) claim that in the Pirolli and Card model, support for reasoning is limited to the information foraging loop. They suggest a three-view model that also takes into account the sense-making loop: (1) the Data view is a container of interactive visualization tools for exploring data, (2) the Navigation view provides an overview of the exploration process by capturing the visualization states automatically – a history tracking mechanism – and (3) the Knowledge view is a diagrammatic representation for recording the findings. The three views support the three phases of the analytical reasoning process: model construction, revisiting and falsification. A prototype of the framework (Aruvi) was implemented and evaluated with four analysts, each with their own data and representing different domains. The analysts agreed that the knowledge view helped the analysis process by building a bridge between the visualizations and the knowledge gained.

3.5.2 Human mental model

The mental model, or human cognitive model as the visual analytics community calls it, is defined as a representation of a possible state of affairs in the world (Eysenck and Keane, 2005). It has been argued that an agreed concept of a human cognitive model for visual analytics is required in order to construct fluent reasoning environments. No such model has yet emerged, but precepts and requirements have been listed.

Liu and Stasko (2010) suggest a mental model that is a functional analogue representation of an external interactive visualization system. Having a mental model enables the layout of visualizations to be kept consistent with users’ mental maps. They suggest a model where text, images and coarse spatial relations are overlaid. Green et al (2008) suggest a framework for a human cognition model and a set of guidelines. The central point of the model is discovery. In their approach, a computer presents the information in an ontological-like structure within a relevant, possibly human-defined context, and humans directly interact with the visualized information. New knowledge creation produces relationships and annotations. The model requires multiple views of the same information, direct interaction, insulation of reasoning flow, searching by pattern, and creation and analysis of hypotheses. Green et al evaluate five visual analytics systems against their principles. They state that multiple practical problems are encountered among these systems, but do not specify them.


Instead, they highlight the following areas for further study: what number of multiple views can be used; how to suggest information to users without interrupting reasoning; how to maintain a fluent interactive process; and how to manage annotations.

3.5.3 Precepts and artefacts for reasoning

Several precepts and requirements for visual analytics have been proposed to support analytical reasoning. The earliest precepts are from Amar, Eagan and Stasko (2005), who define primitives for “low-level” analysis tasks. Here, low-level tasks refer to the acquisition of the knowledge needed to gain insight into higher-level phenomena, such as understanding trends and predicting future behaviour. Such tasks include the kinds of specific questions that a person might ask when working with data sets. Amar et al extracted the tasks by interviewing students and analysing their responses. The resulting list of analytical tasks is shown in Table 1. Other tasks, such as comparison, can be constructed using these primitive tasks. As “high-level tasks” the authors list complex decision making under uncertainty, learning a domain, identification of trends, and predicting the future.

Table 1. Analytical tasks.

Retrieve value: Find the attributes of specific cases.
Filter: Find data cases satisfying defined conditions.
Compute derived value: Compute the average, median, count, or more complex derived values.
Find extremum: Find data cases having the highest or lowest value of a defined attribute.
Sort: Rank data cases according to some ordinal metric of a selected attribute.
Determine range: Find the span of values of an attribute of data cases.
Characterize distribution: Create the distribution of a set of data cases with a quantitative attribute, e.g. to understand “normality”.
Find anomalies: Find unexpected values in the attributes of data cases; these are usually a fruitful source for further exploration.
Cluster: Find clusters of data cases with similar attributes.
Correlate: Find correlations between two attributes of a data set.
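As a rough illustration of how the primitive tasks in Table 1 map onto concrete operations, the following R sketch applies some of them to a small invented data frame of daily monitoring records; the variable names and values are hypothetical and chosen only for this example.

# Hypothetical monitoring data: person, daily steps and energy intake (kcal)
d <- data.frame(person = c("A", "B", "C", "D", "E"),
                steps  = c(4200, 11200, 7600, 2900, 9800),
                kcal   = c(2100, 2600, 1900, 2400, 2200))

d[d$person == "C", ]            # retrieve value: attributes of a specific case
subset(d, steps > 5000)         # filter: cases satisfying a condition
mean(d$kcal); median(d$kcal)    # compute derived values
d[which.max(d$steps), ]         # find extremum: case with the highest steps value
d[order(d$kcal), ]              # sort by an attribute
range(d$steps)                  # determine range of an attribute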

The Visual Analytics Agenda (Thomas and Cook, 2005) lists pieces of information, artefacts, that the analyst identifies and creates during the reasoning process. These are classified as: (1) elemental artefacts that are derived from isolated pieces of information, (2) pattern artefacts that are derived from collections of information, (3) hierarchical knowledge constructs, and (4) complex reasoning constructs.


3.6 User interaction

The objective of user interaction in visual analytics is to support the analytical reasoning process. User interaction is a broad topic that has been researched intensively since the appearance of graphical user interfaces in the 1980s. The field studies the relationship between people and technology and looks for methods of utilizing this knowledge in the design and development of solutions (Sears and Jacko, 2009); it combines computing, psychology, education, graphic design and industrial engineering. It has been argued that interaction is a neglected component in information visualization, and that the major focus has been on the presentation of information (Yi et al, 2007). Recently, however, the emphasis in visualization has shifted from static visualizations that show all the data at a glance to interactive visualizations.

This section focuses on the questions of user interaction specific to visual analytics. It introduces the requirements for user interaction in visual analytics, interaction tasks, interaction techniques, and special questions concerning spatial and temporal data. The specific interaction problems that arise with very large amounts of data are introduced in Section 3.8.6.

3.6.1 Requirements for user interaction

According to the Visual Analytics Agenda (Thomas and Cook, 2005), user interaction should provide a high-level dialogue between the analyst and the information; visual representations are the interface, or view, into the data. The analyst observes the data representations, interprets and makes sense of them, considers the next question to ask, and formulates a strategy for how to proceed. The task of user interaction is to provide ways of gaining new perspectives on the data, filtering out results, and requesting new visual representations. The user should be protected against cognitive overload, helped to overcome biases, and encouraged to continue the analysis until it produces the desired outcome. There should be a fluent dialogue between analyst and information, in which the user can engage fully in the analytical reasoning process.

The Visual Analytics Agenda categorises the interaction needs into four groups. The first is interactions for modifying data transformation (filtering). These include techniques such as direct manipulation, dynamic queries, brushing, and details-on-demand. The next group is interactions for modifying visual mappings, which include techniques that allow users to interactively change the mappings between the data and their visual representations. The third group is interactions for modifying view transformations (navigation). These include selecting and highlighting objects of interest, panning and zooming, and balancing between overview and detail. The final group is interactions for human information discourse: interactions that support creating abstractions, comparing and categorizing data, extracting, recombining, creating and testing hypotheses, and annotating data.

Scholtz (2006) proposes requirements from the usability viewpoint, distinguishing five areas where support for reducing cognitive workload is needed. These are: (1) Situation awareness: awareness of where the user is in the process, and what is the best step forward; (2) Collaboration: where multiple people are working together,


synchronously or asynchronously, in order to keep track of activities and findings; (3) Interaction: the capability to view occluded information, to move the level of abstraction of the view up and down, high-level undos, and data sharing; (4) Support for creativity; and (5) Utility: bringing improvements to the process and the product.

According to Ware (2004), interaction with visualizations refers to the dialogue between the user and the system as the user explores the data set to uncover insights. It is a process made up of a number of interlocking feedback loops, divided into three levels. At the lowest level is the data manipulation loop, through which objects are selected and moved using eye–hand coordination. At the intermediate level is an exploration and navigation loop, through which the analyst finds their way in a large visual space. At the highest level is the problem-solving loop, through which the analyst forms hypotheses about the data and refines them by repeating the process. Each loop has a time scale for human action reflecting what the user is cognitively and perceptually capable of doing.

3.6.2 Interaction tasks

According to the Visual Analytics Agenda, visual analytics needs two kinds of interaction technique: high-level operations, and basic interactions that support these higher processes. However, there is no clear understanding of what the high- and low-level operations are. The agenda calls for taxonomies of user interaction techniques that support analytical reasoning.

There are a variety of taxonomies that list and classify the interaction tasks or techniques to be used among visualizations. Yi et al (2007) have listed eleven taxonomies and categorized them as (1) taxonomies of low-level interaction techniques, (2) taxonomical dimensions of interaction techniques, (3) taxonomies of interaction operations, and (4) taxonomies of user tasks.

Yi et al have also created their own taxonomy by reviewing the literature. They propose seven general categories of interaction: (1) Select: “mark something interesting”; (2) Explore: “show me something else”; (3) Reconfigure: “show me a different arrangement” – where different perspectives of the data are shown; (4) Encode: “show me a different representation” – where colour, size, fonts, shapes and orientation are used to visually encode data points with attributes of interest; (5) Abstract/Elaborate: “show me more or less detail” – abstracting data into higher-level components or, inversely, drilling down to details; (6) Filter: “show me something conditionally” – focusing on important data and decreasing complexity, e.g. by leaving out statistically unimportant data points; all kinds of filters, sliders and dynamic queries are useful here; and (7) Connect: “show me related items” – where associations and relationships between data items are highlighted (e.g., brushing).

Zhou and Gotz (2009) have surveyed the actions of several different visual analysis environments and identified twenty distinct actions. These are (listed in alphabetical order): annotate, bookmark, brush, create, metaphor, delete, edit, filter, inspect, merge, modify, pan, query, redo, undo, remove, restore, revisit, split, zoom. They further categorized these actions as exploration actions, insight actions and meta actions. The exploration actions (query, filter, pan and zoom) are performed as users access and explore data in search of new insight. Insight actions are performed as they


discover or manipulate the insights obtained over the course of the analysis, for instance by annotating and bookmarking. Meta actions, such as undo and redo, are related to the users’ action history, not to the visual presentation or the data set. The interaction tasks vary depending on the visualization representations: different operations are required for spatio-temporal visualizations than for hierarchical and network structures. Chi (2002) has categorized the operations of different kinds of visualizations, including scientific, geographically-based, 2D, multi-dimensional plots, information landscapes and spaces, trees, text, web visualizations, and visualization spreadsheets. Chi also introduces 50 visualizations and their locations in the taxonomy. For each visualization the taxonomy gives example data, analytical needs, visualization transformations, abstraction and visual mapping, and operations within the view.

Shneiderman (1996) introduces seven tasks for information seeking when interacting with large data sets: overview, zoom, filter, details-on-demand, relate, history, and extract.

3.6.3 Interaction techniques

In visual interaction there are two basic interaction techniques: direct manipulation, which allows the user to filter or select elements of the visualization directly, and dynamic queries, where the user interacts with sliders, menus and buttons. Direct manipulation techniques are recommended because they do not distract attention from the analysis process; menus, buttons and sliders are often scattered around the user interface, and using them requires extra effort (Roberts, 2007).

A popular technique in visual analytics is the use of coordinated multiple views (Roberts, 2007), a specific exploratory visualization technique. Data is represented in multiple windows and the operations in the views are coordinated: data elements that are selected and highlighted in one view are highlighted concurrently in all other views that include the same data element. This operation is called brushing. The user can change the style of the brush, the bounding region and the brushing effects. The method is effective for discovering outliers. An example of brushing is shown in Figure 8.


Figure 8. Brushing. The rounded area is highlighted in the histogram and on the map.

3.6.4 Spatial and temporal data

Temporal and spatial data have operation lists of their own. Aigner et al (2007) give a list of requirements for temporal data. Browsing the time axis is viewed as important, as is switching between different levels of temporal aggregation: days, weeks and months. The choice of temporal scale is also considered important, as relationship patterns may not be detectable when the data is examined at certain scales. Scaling is also needed to integrate data measured at different scales.

Spatial data operations are listed by Andrienko et al (2010). The operations rotate, zoom in and out, fly-over, annotate points of interest, and hyperlink, are viewed as basic requirements. Scaling is also considered important and may significantly affect the results, as with temporal data.


3.7 Data analysis

Data analysis has a key role in visual analytics. The goal of data analysis is to examine data with the purpose of drawing conclusions about the information it contains. Several disciplines are involved, each using similar methods but with a slightly different focus. These include data mining, knowledge discovery, machine learning, and statistics, among others (Figure 9). Data mining is defined as “a science of extracting useful information from large data sets or databases” (Tan et al, 2006). Machine learning is “programming computers to optimize a performance criterion using example data or past experience” (Alpaydin, 2004). Sometimes the division between machine learning and data mining is made based on the data sets: data mining is focused on analysing large databases, whereas in machine learning the focus is on learning patterns from data. The roots of data analysis are in statistics. The development of computers and the ability to store and manage large amounts of data have made large-scale statistical computation possible, and have launched the development of new methods that would be tedious to perform manually.

Figure 9. Data analysis techniques (Tan, Steinbach and Kumar 2006).

Data analysis has been studied intensively and numerous algorithms exist. It has applications in business, science, and social science domains. A wide range of tools and commercial applications are available, some of them in highly competitive markets such as Customer Relationship Management (CRM). There are also several statistics programs and packages available, both for casual users and for specialists (Excel13, SAS14, SPSS15, R16). A recent area of data analysis is visual data mining. Information visualization, data mining and user interaction evolved as separate fields in the past, but since the turn of the 2000s they have become increasingly integrated as visual data mining. The idea of visual data mining first emerged in 1999, when Wong (1999) argued that rather than using visual data exploration and analytical mining algorithms as separate tools, a stronger data mining strategy would be to couple the visualizations and analytical

13 http://www.microsoft.com (accessed 18 October 2012) 14 http://www.sas.com/technologies/analytics/statistics/ (accessed 18 October 2012) 15 http://www-01.ibm.com/software/analytics/spss/ (accessed 18 October 2012) 16 http://www.r-project.org/ (accessed 18 October 2012)


processes into one data mining tool. Many data mining techniques involve mathematical steps that require user intervention, and visualization could support these processes. Visual data mining is not just about using visualization to explore data; it is an analytical mining process in which visualizations play a major role (Ferreira de Oliveira and Levkowitz, 2003). The newest branch (Wong, 1999) of data mining is visually controlled data mining, a new class of data mining methods characterised by Puolamäki, Papapetrou and Lijffijt (2010), who define the requirements for a useful data mining method in visual analytics as follows: the method should be fast enough – a sub-second response is needed for efficient interaction; its parameters should be representable and understandable using visualisations; and the parameters should be adjustable by visual controls. This section presents the data analysis process and gives an overview of data analysis methods. Numerous methods have been developed for data analysis. The methods introduced here are the most common ones and those considered useful from the point of view of this work. They cover (1) methods for data exploration, (2) descriptive methods, (3) predictive methods and (4) methods for anomaly detection. This introduction omits a large number of methods, including methods for text analysis and other unstructured data. Examples of basic visual data mining techniques are introduced in the next section on visualization.

3.7.1 Data analysis process

Data analysis is an iterative process starting with selecting the target data from the raw material and pre-processing and transforming it into a suitable form (Figure 10).

Figure 10. The data mining process (adapted from Tan, Steinbach and Kumar, 2006).

As in visual analytics, data analysis uses several data types: database records, matrix data, documents, graphs, links, transaction data, transaction sequences, sequence data, genomes, spatio-temporal data. Similarly, the quality of data causes problems. The


data can contain noise, there may be missing values and duplicate data, and thus a data cleaning phase is required before using the data. Other kinds of pre-processing may also be required, such as data aggregation, sampling, dimensionality reduction, subset selection, feature creation, and attribute transformation (Tan et al, 2006). Next, the data is run through a data mining algorithm that creates patterns from the data. The user interprets and evaluates the results and starts a new iteration with possible modifications to the raw data, the algorithm and the algorithm parameters.
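The pre-processing steps mentioned above can be sketched in R roughly as follows; the data frame, variable names and cleaning choices are invented for illustration only.

# Hypothetical raw measurements with duplicates and a missing value
raw <- data.frame(sensor = c("s1", "s1", "s2", "s2", "s2"),
                  day    = c(1, 1, 1, 2, 3),
                  value  = c(20.1, 20.1, NA, 21.4, 19.8))

clean <- raw[!duplicated(raw), ]            # remove duplicate records
clean <- na.omit(clean)                     # drop rows with missing values
clean$value_z <- scale(clean$value)         # attribute transformation (standardization)
daily <- aggregate(value ~ day, data = clean, FUN = mean)   # aggregation by day
sub   <- clean[sample(nrow(clean), size = 2), ]             # simple random sample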

3.7.2 Methods for data exploration

The purpose of data exploration is to gain a better understanding of the characteristics of the data (Tan et al, 2006). The central methods are summary statistics and visualizations. Summary statistics are numbers that summarize properties of the data. Amar et al (Amar, Eagan and Stasko, 2005) classified the statistical methods as (1) computing derived values: average, median, count, and more complex values; (2) finding extremum: finding data cases having the highest or lowest value of a defined attribute; (3) determining range: finding the span of values of an attribute of data cases; and (4) characterizing distributions: creating the distribution of a set of data cases with a quantitative attribute, e.g. to understand “normality”. The visual methods utilize humans’ ability to recognize patterns. Single variables are expressed in visual form, for instance as histograms and line charts. Correlation is a basic statistical method for studying two variables. The prevailing method is calculation of the Pearson correlation coefficient (r), where the correlation between two variables, x and y, is calculated with the formula:

r = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{(n-1)\,S_x S_y}

where n is the number of observation pairs, S_x and S_y are the standard deviations, and \bar{x} and \bar{y} the means of the variables x and y. The correlation produces positive or negative values within the range −1 to 1. If the result is zero, there is no linear correlation between the variables. Values of −1 and 1 indicate complete linear dependence between the variables, either negative or positive. Often the square of the correlation coefficient, R², is calculated. This value ranges from 0 to 1 and indicates how much of the variance of one variable is explained by the other; it is often expressed as a percentage. For instance, if R² is 0.32, 32% of the variance of one variable is explained by the other. Correlations are visualized in the form of scatterplots. Exploration methods for higher dimensions use projections of the data onto a two-dimensional plane. These are called dimension reduction methods. They include principal component analysis (PCA) and multidimensional scaling. The result of PCA can be visualized as a two-dimensional plot. Visual methods are introduced in more detail in Section 3.8.
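In R the correlation coefficient and its square can be computed directly with the built-in cor function. The two variables below are simulated purely for illustration and do not come from the evaluation cases.

set.seed(1)
x <- rnorm(100)              # e.g. simulated daily steps
y <- 0.6 * x + rnorm(100)    # a variable partly explained by x (simulated)

r  <- cor(x, y)              # Pearson correlation coefficient
r2 <- r^2                    # proportion of variance explained
r; r2
plot(x, y)                   # scatterplot visualization of the correlation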

3.7.3 Descriptive methods

The goal of descriptive methods is to discover patterns and rules in data. The methods focus on finding clusters, patterns and associations in the data (Tan et al, 2006).


Clustering looks for groups of objects such that the objects in a group are similar (or related) to one another and different from (or unrelated to) the objects in other groups. The similarity of objects is defined on the basis of similarity (or distance) measures. Euclidean distance can be used if the attributes are continuous; otherwise problem-specific measures are needed. Clustering has been an active research topic and a large number of algorithms are available, including K-means clustering and its variants, hierarchical clustering, agglomerative clustering and density-based clustering. Market segmentation is an application of clustering. Pattern detection involves finding combinations of items that occur frequently in databases. Sequential pattern discovery finds rules that predict strong sequential dependencies among different events. Association rule mining involves the prediction of occurrences of an item based on occurrences of other items; it produces dependency rules such as “buyers of milk and diapers are likely to buy beer”.
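A minimal clustering sketch in R, using K-means on standardized attributes of a simulated data set; the two customer attributes and the choice of two clusters are assumptions made only for this example.

set.seed(2)
# Simulated customer attributes for a market segmentation style example
customers <- data.frame(income   = c(rnorm(30, 30), rnorm(30, 60)),
                        spending = c(rnorm(30, 20), rnorm(30, 45)))

km <- kmeans(scale(customers), centers = 2)   # K-means with two clusters
table(km$cluster)                             # cluster sizes
plot(customers, col = km$cluster)             # clusters shown on a scatterplot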

3.7.4 Predictive methods

The purpose of predictive modelling is to build models that predict the value of one variable from the known values of other variables (Hand, Mannila and Smyth, 2001). The predicted objects are predefined. Regression and classification are two much-used predictive methods. Regression predicts the value of a continuous variable based on other variables using linear or nonlinear models (Tan et al, 2006). Linear regression is easy to visualize, often shown as a line on a scatterplot. The area has been studied extensively and has its origins in statistics. It has various uses, both in commerce and in science; application examples include predicting sales based on advertising expenditure, stock markets, or wind as a function of temperature or humidity. Classification creates a model for a class attribute as a function of the values of other attributes, using a training set; unseen records are then assigned to a class, and the accuracy of the model is evaluated with a test set. Several techniques have been developed, including decision trees, Bayesian methods, rule-based classifiers and neural networks. Classification is a much-used method and commercial applications are available. Examples include classification of credit card transactions as legitimate or fraudulent, classification of e-mails as spam, and classification of news stories as finance, weather, entertainment or sports (Tan et al, 2006).
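The two predictive methods can be sketched in base R with lm for regression and a logistic glm as a simple classifier; the simulated data and the 0.5 decision threshold are assumptions made only for this illustration.

set.seed(3)
ads   <- runif(50, 0, 100)                     # simulated advertising expenditure
sales <- 5 + 0.8 * ads + rnorm(50, sd = 10)    # simulated sales
fit   <- lm(sales ~ ads)                       # linear regression
predict(fit, data.frame(ads = 70))             # predicted sales for a new case

spam  <- data.frame(exclaims = c(rpois(20, 1), rpois(20, 6)),   # simulated e-mail feature
                    is_spam  = rep(c(0, 1), each = 20))
clf   <- glm(is_spam ~ exclaims, data = spam, family = binomial)  # logistic classifier
predict(clf, data.frame(exclaims = 5), type = "response") > 0.5   # classify a new e-mail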

3.7.5 Methods for anomaly detection

Anomalies are observations whose characteristics differ significantly from the normal profile. Anomaly detection methods look for sets of data points that are considerably different from the remainder of the data. The methods build a profile of “normal” behaviour and detect significant deviations from it. The profile can consist of patterns or summary statistics for the overall population. Anomaly detection schemes can be graphical, statistical, distance-based or model-based. Credit card fraud detection, telecommunication fraud detection, network intrusion detection and fault detection are examples of application areas (Tan et al, 2006).
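A simple statistical anomaly detection sketch in R: the profile of “normal” behaviour is summarized by the mean and standard deviation, and observations that deviate strongly from it are flagged. The simulated transaction amounts and the three-standard-deviation threshold are assumptions of this example.

set.seed(4)
amounts <- c(rnorm(200, mean = 50, sd = 10), 250, 310)   # simulated transactions with two anomalies

z <- (amounts - mean(amounts)) / sd(amounts)   # standardized deviation from the profile
which(abs(z) > 3)                              # indices of suspected anomalies
boxplot.stats(amounts)$out                     # alternative: outliers by the boxplot rule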


3.8 Information visualization

Visualizations are an efficient way to gain insight from data. Visual representations translate data into a visible form that highlights important features and makes it easy for users to perceive salient aspects quickly. Visualization takes advantage of the capabilities of the human eye to see, explore and understand large amounts of information at once, providing external aids to cognition (Thomas and Cook, 2005).

Visualizations can amplify human cognitive capabilities. Card et al (Card, Mackinlay and Shneiderman, 1999) list six basic ways: (1) increasing cognitive resources, e.g. by using the visual medium to expand human working memory; (2) reducing search, e.g. by representing a large amount of data in a small space; (3) enhancing the recognition of patterns, e.g. by organizing information according to its time relationships; (4) supporting the easy perceptual inference of relationships; (5) supporting perceptual monitoring of a large number of potential events; and (6) providing a manipulable medium that enables exploration of a space of parameter values.

Visual presentations have a long history, originating from the early maps of ancient China. The Konya Town Map dating from 6200 BC is the first known visualization. Computer-aided visualization has been studied actively since 1980 when the development of computing power made image processing possible. Improved rendering, real-time interactivity and lowering costs have accelerated this development. In the early years of information visualization the focus was on viewing the entirety of a data set at a glance, to discover interesting and hidden patterns and connections. Visualizations showing the entirety of the data set have been replaced by tools that support the process of seeking insight (Chen, 2010).

Collaborative and social visualization is a new phenomenon that has developed along with social media. Users publish easy-to-use tools and visualizations on the web and communicate findings in easily accessible visual form with other people (e.g., IBM’s Many Eyes). The inclusion of visualization features in office tools such as Excel, and free visualization tools on the web, have brought the creation of visualizations within everyone’s reach. Nowadays, a vast number of visualizations are readily available. Lengler and Martin (2007) identified 160 visualization methods in their study classifying management visualizations. Of these, they selected 100 methods and classified them into six categories, forming a “periodic table of visualization methods”: (1) data visualizations, including pie charts, area charts and line graphs; (2) information visualizations, including semantic networks and treemaps; (3) concept visualizations, such as concept maps or Gantt charts; (4) metaphor visualizations, such as metro maps; (5) strategic visualizations, such as the strategy canvas; and (6) compound visualizations consisting of several of the aforementioned formats. Chen (2010) classifies visualizations into scientific visualizations (based on physical data) and information visualizations (abstract data), or into functional and aesthetic visualizations. Visualizations are further classified as abstract or representative. The sheer variety of visualizations makes it difficult for users to identify the most relevant visual presentation for the task at hand. Selecting the right visualization depends on the properties of the data and the purpose of use: on the dimensionality of the data, the data type and structure, and the size of the data set. The visualizations for


single-variable ordinal data are different from those for multivariate quantitative data, and some visualizations work well only with small data sets. This section introduces a sample of the variety of visualizations. Many of them are familiar from statistics and belong to the basic exploration and descriptive methods of data analysis. They are general-purpose and easy to interpret, and have proven their value over a long history of use. The introduction is structured by data type: univariate data, bivariate data, multivariate data, hierarchical and network structures, spatial and temporal data, and approaches to the visualization of large amounts of data. Most of the examples have been made with the help of the R statistical package.

3.8.1 Univariate data

Bar charts (Figure 11) and pie charts (Figure 12) are the basic methods used to visualize univariate ordinal data. The data is divided into classes based on levels or factors. In a bar chart, each bar represents one class and its height is proportional to the number of data points belonging to that class; the height can express either a frequency or a proportion. In a pie chart the values are represented as proportional sectors. Pie charts are not very effective at showing differences between sectors, especially if the differences are small (Venables and Smith, 2012).
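In R, a bar chart and a pie chart of a single categorical variable can be produced with barplot and pie; the activity classes below are invented for the example.

activity <- factor(c("light", "light", "moderate", "heavy", "light", "moderate"))
counts   <- table(activity)   # class frequencies

barplot(counts, main = "Bar chart of activity classes")
pie(counts, main = "Pie chart of activity classes")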

Figure 11. Bar chart. Figure 12. Pie chart

Histograms (Figure 13) are similar to bar charts, but are used to represent quantitative data. A histogram defines a sequence of breaks and then counts the number of observations in the bins formed by the breaks. These counts are plotted as bars, as in the bar chart. In addition, a point can be added at the top of each rectangle and the points connected with straight lines; this is called a frequency polygon (Venables and Smith, 2012).
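A histogram and the corresponding frequency polygon can be drawn in R as follows; the simulated values stand in for any quantitative monitoring variable.

set.seed(5)
kcal <- rnorm(200, mean = 2200, sd = 300)     # simulated daily energy intake

h <- hist(kcal, breaks = 15, main = "Histogram with frequency polygon")
lines(h$mids, h$counts, type = "b")           # frequency polygon over the bin midpoints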


Figure 13. Histogram.

Line graphs are used for displaying quantitative data as a continuous function of a single variable. Common uses are showing frequency distributions (Figure 14) and time series. Time series show the evolution of data over time by placing the data values on a time axis (Verzani, 2002).

Figure 14. Line chart.


3.8.2 Bivariate data

The basic visual method for analysing bivariate data is the scatterplot (Figure 15) (Hand et al, 2001). Scatterplots are a good means of finding correlations, clusters and outliers between two attributes. A third dimension can be added by using a visual effect such as the colour or size of the plotted points (bubble charts), or animation (animated bubble charts). A linear regression line can be added to the plot to indicate the correlation between the variables. Scatterplots are inadequate for higher dimensions and for very high densities of data points, where it becomes difficult to distinguish interesting features (Hand et al, 2001).
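A scatterplot with a fitted regression line, and a third variable encoded as point colour, can be sketched in R as follows; the variables imitate the steps-versus-balance setting of Figure 15 but are simulated for illustration.

set.seed(6)
steps   <- rnorm(100, 8000, 2500)                   # simulated daily steps
balance <- rnorm(100, 0, 200)                       # simulated dieting balance
low     <- sample(c(TRUE, FALSE), 100, replace = TRUE, prob = c(0.2, 0.8))

plot(steps, balance, col = ifelse(low, "red", "black"))   # colour encodes feeling low
abline(lm(balance ~ steps))                               # regression line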

Figure 15. The scatterplot shows the number of daily steps taken by the dieting people and the dieting balance. The regression line does not indicate much correlation between number of steps taken and dieting balance. The red spots represent individuals who are feeling low.

3.8.3 Multivariate data

Often, real-world data is multidimensional, consisting of many variables without a clear hierarchy. Dimension reduction methods aim at projecting the data into a low-dimensional space (1D–3D) while maintaining the correct relations between the data points. There are several methods, with different optimization goals and complexities. One of the best known is principal component analysis (PCA), which tries to find a linear subspace of maximal variance and is based on matrix algebra (Alpaydin, 2004).
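A PCA projection of multivariate data onto its first two principal components can be computed in R with prcomp; the nutrient data below is simulated for illustration only.

set.seed(7)
nutrients <- data.frame(carbs   = rnorm(60, 250, 40),
                        protein = rnorm(60, 90, 15),
                        fat     = rnorm(60, 80, 20),
                        fibre   = rnorm(60, 25, 5))

p <- prcomp(nutrients, scale. = TRUE)          # PCA on standardized variables
summary(p)                                     # variance explained by each component
plot(p$x[, 1:2], xlab = "PC1", ylab = "PC2")   # two-dimensional projection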


An example of a PCA plot of the daily nutrition of dieters is shown in Figure 16.

Figure 16. A PCA of food nutrients, indicating a group of sugar-carb-fibre eaters, a group of protein eaters, and a group of fat eaters among the dieters.

In a parallel coordinates visualization each dimension corresponds to a vertical axis and each data element is displayed as a series of connected points along the dimensions/axes. To show different combinations, the order of the variables can be changed (Hand et al, 2001). An example is shown in Figure 17. Subsets can be highlighted with colours.
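Parallel coordinates are available in R through the MASS package (which is distributed with R); the data and the colouring by gender are invented for this sketch.

library(MASS)
set.seed(8)
nutrients <- data.frame(kcal = rnorm(40, 2200, 300), carbs = rnorm(40, 250, 40),
                        fat  = rnorm(40, 80, 20),   fibre = rnorm(40, 25, 5))
gender <- sample(c("male", "female"), 40, replace = TRUE)

parcoord(nutrients, col = ifelse(gender == "male", "black", "red"))   # one line per case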

Figure 17. Parallel coordinates visualization of nutrient consumption. In the figure, black represents men, red women.

If there are more than two variables in the data set, scatterplot matrices, correlation matrices or correlation networks can be used to show pairwise correlations for all


variable combinations (Hand et al, 2001). Figure 18 shows a scatterplot matrix, Figure 19 a correlation matrix and Figure 20 a correlation network of the same data. Each brings slightly different aspects to light. The data represents the daily nutrient consumption of dieting people.
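A scatterplot matrix and a simple colour-coded correlation matrix can be produced with base R (a correlation network would additionally need a graph-drawing package); the nutrient values are again simulated.

set.seed(9)
d <- data.frame(kcal  = rnorm(60, 2200, 300), carbs = rnorm(60, 250, 40),
                fat   = rnorm(60, 80, 20),    fibre = rnorm(60, 25, 5))

pairs(d)                          # scatterplot matrix of all variable pairs
cm <- cor(d)                      # pairwise correlation matrix
round(cm, 2)
image(1:ncol(cm), 1:ncol(cm), cm, axes = FALSE, xlab = "", ylab = "")   # colour-coded matrix
axis(1, 1:ncol(cm), colnames(cm)); axis(2, 1:ncol(cm), rownames(cm))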

Figure 18. Scatterplot matrix, enabling shapes and forms to be examined. The plot between kcal and carbohydrates has a clear linear upward shape, indicating positive correlation. The shape between fibre and fat is descending and indicates a slight negative correlation. Shapes without a clear form indicate low correlation between the variables.

Figure 19. In the correlation matrix, colours indicate the strength of correlations. Blue is positive and red negative. The matrix shows a strong correlation between kcals and carbohydrates, suggesting most daily energy derives from carbohydrates. Another interesting feature is the negative correlation between fibre and saturated fat, suggesting that people who eat saturated fats eat less fibre.


Figure 20. In the correlation network the nodes express the variables and the link thickness indicates the correlation value. Here again, the correlations between kcal, fat and carbohydrates are strong. The strong correlation between fat and saturated fat is expected.

A cluster dendrogram (Figure 21) is a tree-like graphical representation of hierarchical clustering, in which the entire sequence of cluster merges is shown (Hand et al, 2001).
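Hierarchical clustering and its dendrogram can be produced in R with hclust; the four-cluster cut mirrors Figure 21 but uses invented index values.

set.seed(10)
quality_index <- matrix(rnorm(30 * 3), nrow = 30)   # simulated food quality indices

hc <- hclust(dist(quality_index))   # agglomerative hierarchical clustering
plot(hc)                            # cluster dendrogram
groups <- cutree(hc, k = 4)         # cut the tree into four clusters
table(groups)                       # cluster sizes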

Figure 21. Individuals categorized into four clusters based on the food quality index.

Glyphs are symbols used to describe multivariate discrete data. A single glyph corresponds to one sample in a data set, and the data values are mapped to the visual properties of the glyph. Glyphs can be constructed from pre-attentive visual features that “pop out”, such as spatial position, colour, shape, orientation, texture, motion, and blinking, so that the glyphs themselves are perceived pre-attentively. Glyphs can represent 4–8 dimensions, but are most effective with about four. An example of a glyph is the Chernoff face (Figure 22), in which data is mapped to facial features. Other popular glyphs are star glyphs, where dimensions are represented as angular spokes radiating from the centre, and stick figure icons, where data dimensions are mapped to the rotation angles of the limbs (Ware, 2004).
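Base R provides star glyphs through the stars function; each row of the (simulated) matrix becomes one glyph whose spokes encode the variable values.

set.seed(11)
m <- matrix(runif(10 * 5), nrow = 10,
            dimnames = list(paste0("person", 1:10),
                            c("kcal", "carbs", "fat", "protein", "fibre")))

stars(m)                        # one star glyph per person; spoke length encodes each variable
stars(m, draw.segments = TRUE)  # alternative: segment (sector) glyphs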


Figure 22. Chernoff faces 17

3.8.4 Hierarchical and network structures

If the data has a hierarchical or network structure it can be represented as trees and graphs. There are two major approaches: node-link diagrams and enclosures, such as treemaps (Card, Mackinlay and Shneiderman, 1999). Node-link diagrams have several variants, including traditional vertical or horizontal hierarchies, networks, hyperbolic (star) trees and concept maps (Figure 23). The UML model used in this work is one application of a node-link diagram. There are numerous drawing algorithms for optimizing the graph layout of node-link diagrams (Card et al, 1999). Node-link diagrams are useful in topology-based visual analytics tasks, such as finding clusters and connected components, detecting patterns and outliers, and determining the shortest path between nodes. They have limitations, however: they do not scale up well, and they produce cluttered overviews with few readable labels (Kang, Getoor and Singh, 2007).
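As a sketch, a small node-link diagram can be drawn in R with the igraph package (an external package, assumed here, that must be installed separately); the edge list is invented.

library(igraph)   # external package: install.packages("igraph")

edges <- data.frame(from = c("A", "A", "B", "C", "C"),
                    to   = c("B", "C", "D", "D", "E"))
g <- graph_from_data_frame(edges, directed = FALSE)

plot(g, vertex.size = 25)   # simple node-link layout of the network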

17 Chernoff faces: http://kspark.kaist.ac.kr/Human%20Engineering.files/Chernoff/Chernoff%20Faces.ht m (accessed 18 October 2012)


Figure 23. Different kinds of node-link diagrams: a) hierarchy, b) concept map, c) network, d) hyperbolic tree (Pirolli, Card and Van Der Wege, 2003).


Treemapping, developed by Johnson and Shneiderman (in Card, Mackinlay and Shneiderman, 1999), is a method for visualizing hierarchically structured information. A treemap shows the data as nested rectangular cells, the size of which is proportional to the cell frequency. It allows very large hierarchies to be displayed in their entirety. Figure 24 shows an example.

Figure 24. Treemap showing the age, gender and work activity of dieting people. The biggest group is youngish women who do light work.

3.8.5 Temporal and spatial data

Temporal data

Temporal data embodies the change in properties over time. Time has a hierarchical system of granularities, including seconds, minutes, hours, days, weeks, months, years, decades and centuries, which are organized in different calendar systems. Time is also cyclic, consisting of natural cycles and recurrences; some are regular and predictable, such as the seasons, others less regular, such as social or economic cycles. Time can also be branching: branching time represents alternative time scenarios. Linear time corresponds to the natural perception of time, and most visualizations have a linear time axis (Aigner et al, 2007).

The time series diagram (Figure 25) is the basic visualization of linear time. It shows the evolution of the data over time by placing the data values on a linear time axis, thus supporting the detection of temporal trends and patterns. In addition to time series, time points can be interpreted as an ordinary numeric variable and used in any kind of visualization that applies to quantitative data.


Figure 25. Time series. The figure shows the variation in mood of the dieting people during the dieting period.

Correlations between time series can be revealed by cross-correlation, a standard method of estimating the degree to which two series are correlated (Bourke, 1996). An example of cross-correlation is shown in Figure 26.

Figure 26. Cross-correlation of feelings and exercise. The biggest correlation has a lag of three days. This may be interpreted as indicating that experiencing good feelings on day one results in a high exercise level on day three, or the opposite. The lowest correlation has a lag of 5 days, possibly indicating that an active day results in feeling low five days later. A more accurate interpretation would require examination of the data values.


Auto-correlation is the cross-correlation of a variable with itself. Figure 27 shows an example of auto-correlation.
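In R, cross-correlation and auto-correlation are available as the ccf and acf functions. The two daily series below are simulated, with a three-day lag built in deliberately, so the cross-correlation should peak at a lag of about three days.

set.seed(12)
feelings <- rnorm(100)                                                  # simulated daily mood score
exercise <- c(rep(0, 3), 0.5 * feelings[1:97]) + rnorm(100, sd = 0.5)   # follows mood with a 3-day lag

plot(ts(feelings), main = "Time series of daily mood")  # basic time series diagram
ccf(feelings, exercise, lag.max = 10)                   # cross-correlation of the two series
acf(feelings)                                           # auto-correlation of a single series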

Figure 27. Auto-correlation. The figure indicates no strong correlation. The feelings of today do not predict the feelings of tomorrow.

Cyclic time is composed of a set of recurring temporal elements, such as the seasons of the year or the days of the week. Rolling data points on a cyclic time axis at different speeds is an efficient way to find cyclic patterns in time. Figure 28 (Aigner et al, 2008) is an example of a cyclic presentation. Branching time can be represented with charts such as the Gantt chart. A comprehensive overview of time-oriented visualization can be found in the study by Aigner et al (2011).
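A simple way to look for weekly cycles in R is to fold a daily series onto the days of the week and aggregate; the step counts and dates are simulated, with higher weekend values built in for illustration.

set.seed(13)
days  <- seq(as.Date("2012-01-02"), by = "day", length.out = 84)   # 12 weeks of daily data
steps <- rnorm(84, 8000, 1500) + ifelse(format(days, "%u") %in% c("6", "7"), 2000, 0)

weekday <- factor(format(days, "%u"), levels = as.character(1:7),
                  labels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
barplot(tapply(steps, weekday, mean), main = "Mean steps by day of week")   # weekly cycle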

Figure 28. Cyclic time (Aigner et al, 2008).
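One simple way to expose such recurring patterns without a dedicated cyclic display is to fold the time axis onto the period of interest and aggregate, for example by day of week. A small pandas sketch with synthetic daily measurements (the column name and values are invented):

    import numpy as np
    import pandas as pd

    # Synthetic daily measurements over 12 weeks with a weekly pattern.
    days = pd.date_range("2012-01-02", periods=84, freq="D")
    rng = np.random.default_rng(1)
    values = 10 + 3 * (days.dayofweek >= 5) + rng.normal(0, 1, len(days))  # weekends higher
    series = pd.Series(values, index=days, name="exercise_minutes")

    # Fold the linear time axis onto the weekly cycle and aggregate.
    weekly_profile = series.groupby(series.index.dayofweek).mean()
    weekly_profile.index = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]
    print(weekly_profile.round(1))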


Spatial data

Spatial data has a geographic or spatial component and can be mapped to a location, which gives it a number of properties that distinguish it from other types of data: distance, proximity, direction and scale variance are meaningful. Andrienko et al (2007) list the structure and properties of geographic spaces. Physical spaces are heterogeneous, consisting of islands, mountains, oceans, state boundaries or cities. There are also specific metric properties and topological relations; for example, distances in actual geographic spaces differ from their metric properties, and places near to each other are more related than distant places. Phenomena and events can occur in a physical space at different points in time. Heterogeneity is therefore a complicating factor in fully automatic processing.

Spatial data has traditionally been presented by means of maps. Geovisualization has a long tradition and established conventions for showing data on maps. Advances in computing have brought interactive maps, dynamic maps, three-dimensional maps and maps combined with other graphics. Through web applications with open APIs (e.g., Google Maps) users can add information of their own to maps. An example is shown in Figure 29.

Figure 29. Google Map Visualization18. Landing places of boat refugees presented on Google Map.

Spatio-temporal data

Spatio-temporal data combines the properties of temporal and spatial data. Andrienko et al (2010) describe methods for presenting spatio-temporal data. These include (1) map animation; (2) the space-time cube, in which two dimensions represent space and the third dimension represents time; and (3) coordinated multiple views, which can show spatial, temporal and thematic aspects of data simultaneously.

18 Personal material


3.8.6 Large collections of information

The problem of representing, navigating and finding details in large collections of data is known as the focus + context problem. The basic principle of exploring information collections is given by Ben Shneiderman (1996):

“Overview first, zoom and filter, details on demand”.

The principle is to keep a view of the whole data available (context), while pursuing detailed analysis of a part of it (focus). The information needed in the overview may be different from that needed in detailed analysis, and these two types of information should be combined within a display. Card et al (Card, Mackinlay and Shneiderman, 1999) classify the methods for information selection and reduction used by these techniques. These include filtering, selective aggregation, micro-macro readings, highlighting and distortion. Filtering is the selection of cases according to whether variables are within specified ranges. Selective aggregation creates new cases that are aggregated from other cases. Micro-macro readings, defined by Tufte (1983), are graphics in which “detail cumulates into larger coherent structures”; the overall set of items provides a macro environment against which the micro reading of individual highlighted items can be interpreted. In highlighting, individual items are made visually distinctive. Distortion means relative changes in the number of pixels in the space, such as a change in the size of objects, a change in size due to perspective, or a change in the size of the space. Distortion techniques magnify regions of interest and shrink irrelevant regions (Card, Mackinlay and Shneiderman, 1999). Figure 30 shows a Table Lens visualization, which is an example of distortion with tabular information (Rao and Card, 1994).

Figure 30. Table lens (Rao and Card, 1994)
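Two of the reduction methods listed above, filtering and selective aggregation, are straightforward to express on tabular data; a small pandas sketch with an invented table of dieting people:

    import pandas as pd

    # Invented table of dieting people.
    people = pd.DataFrame({
        "age": [24, 31, 45, 52, 29, 38],
        "gender": ["f", "f", "m", "f", "m", "f"],
        "work_activity": ["light", "light", "heavy", "light", "light", "heavy"],
        "weight_change_kg": [-3.1, -1.2, 0.4, -2.0, -0.5, -1.8],
    })

    # Filtering: keep only cases whose variables fall within specified ranges.
    in_focus = people[(people["age"] < 40) & (people["work_activity"] == "light")]

    # Selective aggregation: create new cases aggregated from the filtered ones.
    summary = in_focus.groupby("gender")["weight_change_kg"].agg(["count", "mean"])
    print(summary)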


Examples of visualizations where parts of the structure are hidden until needed include Furnas’ fisheye view and the Cone Tree (Figure 31). Rapid zooming techniques enable users to zoom in and out of regions of interest (Pad++), while multiple windows enable the user to view an overview as well as other content in separate windows.

Figure 31. Cone Tree (Robertson, Mackinlay and Card, 1991)

Dense layouts and pixel-oriented visualization techniques are other approaches to visualizing large data sets (Keim, 1996). In these, each attribute value is represented by one pixel. The techniques can be divided into query-independent techniques that directly visualize the data or a certain portion of it, and query-dependent techniques that visualize the data in the context of a specific query. Figure 32 represents the principle of pixel techniques.

Figure 32. Pixel-oriented visualization (Keim, 1996)
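A heavily simplified, query-independent variant of the idea maps each attribute value to one coloured pixel and arranges the values line by line in a separate window per attribute. The NumPy/Matplotlib sketch below uses synthetic data; real pixel-oriented techniques use more elaborate arrangements, such as recursive patterns:

    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(2)
    attributes = {"temperature": rng.normal(21, 2, 4096),
                  "co2": rng.normal(600, 120, 4096)}

    fig, axes = plt.subplots(1, len(attributes), figsize=(8, 4))
    for ax, (name, values) in zip(axes, attributes.items()):
        # One pixel per value, arranged line by line into a 64 x 64 window.
        ax.imshow(values.reshape(64, 64), cmap="viridis", interpolation="nearest")
        ax.set_title(name)
        ax.set_xticks([])
        ax.set_yticks([])
    plt.show()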


3.9 Infrastructure

Infrastructure glues the visual analytics elements together as a working software environment. There is no standard definition of software infrastructure. One definition is given by CIO.gov19 (in Galorath, 2008): “IT infrastructure consists of the equipment, systems, software, and services used in common across an organization, regardless of mission/program/project. IT Infrastructure also serves as the foundation upon which mission/program/project-specific systems and capabilities are built.” In software engineering, infrastructures include system architecture, services, component libraries and interfaces. Infrastructure is expressed with the help of an architectural model that defines the decomposition into components and the interfaces for different kinds of services.

The need for infrastructure is recognized and discussed by the visual analytics community. Currently, each visual analytics system builds its own infrastructure. Services that a visual analytics infrastructure could provide include launching and blending different kinds of analysis and visualizations, rendering visualizations in different environments (2D, 3D), data pre-processing, data transformation, data management, scalability, interpreting semantics, support for collaboration, storing, and dissemination and communication of results (Keim et al, 2010). Many of these tasks are similar in each application. With the help of a common infrastructure, many of these tasks could be implemented as components and used through open application programming interfaces (APIs). Nevertheless, there will always be application-specific features to implement. Construction of an infrastructure requires distinguishing the concepts and processes that are required in the visual analytics environment. Listings and taxonomies of reasoning tasks and operations required in visualization are part of this effort. Developing approaches for distinguishing process phases and data processing requirements during reasoning is also a step towards building infrastructure.

Suggestions for infrastructure or infrastructure elements have been made both in information visualization and in visual analytics. Chi et al (2002) have developed a model for visualization infrastructure that includes an information visualization pipeline and an operator model for visualization. It divides the processing into three phases: (1) data transformation, (2) visualization transformation and (3) visual mapping transformation. Data transformation turns raw data into mathematical form; visualization transformation establishes a visual-spatial model of the data; visual mapping transformation determines the appearance of the visual-spatial model to the user. The authors list the elementary operations required throughout the model, classified according to the same three phases. Visualizations sharing similar operations can be standardized, taking advantage of their similarities, and can be re-used in many problem domains, used to construct modular systems, and offered as ready-made components to the information visualization community.

19the US Chief Information Officers Council (CIO.gov) http://www.cio.gov/ (accessed 18 October 2012)


Bull (2008) has used methods of model-driven software engineering for generating information visualizations. Model-driven engineering uses a series of models to specify, design, implement and deploy software. Bull has provided a collection of formal view models for common information visualization techniques, a method for designing and customising information visualizations, and a code generation technique. These are packaged as an open source visualization toolkit. Bull has also provided a collection of platform-independent UML models for visualization, including a bar chart, node-link diagram and treemap.

Another set of software patterns for visualization is provided by Heer and Agrawala (2006). They present twelve design patterns for information visualization software as well as a reference model and UML models for elements of information visualization software. The models include Data column, Cascaded Table, Relational graph, Proxy tuple, Expression, Scheduler, Operator, Renderer, Production rule, Camera and Dynamic query binding. The patterns form a network of interactions between patterns.

A visual analytics infrastructure is also introduced by Garg et al (2008). Their approach is based on logic programming: relations are encoded as rules and facts on which the computing is based. Using this system, the analyst is able to construct models of arbitrary relationships in the data, explore the data for scenarios that fit the model, refine the model and query the model. The framework components are a knowledge base, an ILP (Inductive Logic Programming) system, and a visualization system.

Brennan et al (2006) describe an architecture for a visual analytics framework for collaboration among multiple analysts. The framework supports sharing, translating between and fusing representations while keeping track of the source information. The perspectives support a variety of visualization techniques and representational styles. The framework consists of: (1) a common knowledge base capturing the factual aspects of the application; (2) a knowledge base local to an analyst containing cached facts from the common knowledge base and additional or modified facts generated during analysis; and (3) inference engines for deriving conclusions based on the factual information present in the local knowledge base.


4 Evaluation of visual analytics

The purpose of evaluation in visual analytics is to confirm whether a tool has a positive impact, worth and significance (Plaisant, Grinstein and Scholtz, 2009). At present, there are no agreed methods and guidelines for how visual analytics should be evaluated. The prevailing view, however, is that traditional methods and metrics are not sufficient and that novel approaches are required. This chapter briefly reviews the traditional evaluation methods and introduces the approaches suggested for visual analytics.

4.1 Traditional techniques

Human computer interaction (HCI) studies have developed several methods and techniques, and variants of these, for usability evaluation. The central methods are heuristic analysis, cognitive walkthrough and user observation. In heuristic analysis, an expert assesses how well the design is in line with user interface guidelines, principles and rules of thumb. Cognitive walkthrough is a task-specific usability inspection method in which users accomplish predefined tasks using the thinking-aloud approach. The tasks are performed by a test user representing the intended target group. The method aims to find answers to questions such as: “Will the user know what to do”, “Will the user try to achieve the right effect” or “Will the user notice that the correct action is available”. In user observation, an evaluator observes how users actually use the product in their natural environment (Benyon, Turner and Turner, 2005).

Evaluation methods measure specific features of the target system. Several measures are available, both quantitative and qualitative. The choice of metrics depends on the testing objectives, such as overall usability or learnability. The common usability metrics according to the ISO/IEC 25062 (2006) usability standard are effectiveness, efficiency and satisfaction. Effectiveness can be measured, for example, as the percentage of tasks completed successfully. Efficiency is measured according to the time taken to complete a task. Qualitative metrics are used to measure behaviour. Physical and physiological measures can also be used. Eye-movement tracking can map the user’s focus of attention on screen. Emotions, anxiety and pleasure can be gauged using physiological measures, such as changes in heart rate or rate of perspiration (Benyon, Turner and Turner, 2005).

4.2 Evaluation in visual analytics

The visual analytics community has argued that the present methods do not fit well to visual analytics evaluation. The call for new methods rests on several grounds. In information visualization, evaluation has traditionally consisted of predefined benchmarking tests under controlled conditions. These are typically low-level tasks such as: “Find the minimum and maximum value from a data set with the help of visualizations”. Predefined tests are viewed as problematic because visual analytics is by nature exploratory, and the set of tasks that users want to perform may not be known beforehand (North, 2006). Predefined tests also leave little room for the unexpected. The duration of analytic studies, which can last for days, has also caused concern. Furthermore, controlled studies may not effectively represent real situations and are difficult to arrange. Predefined, premature completion times also leave little room for insight.

One key problem is that traditional metrics are not well suited to visual analytics (Plaisant, Grinstein and Scholtz, 2009). Measures such as performance and accuracy do not necessarily measure the goal of visual analytics, which is insight. The purpose of evaluation should be to determine to what degree visualizations achieve this goal (North, 2006). The suggested approaches can be categorized in two groups. The first is “insight-based” methods, which are based on the notion that if the purpose of visual analytics is insight, then the purpose of evaluation should be to determine to what degree the tools achieve this purpose (North, 2006). The metrics would thus be user-reported insights. The other approach incorporates evaluation into the design process, giving a set of appropriate evaluation methods for each process phase. These latter methods include both traditional evaluation methods and insight-based methods.

Insight-based methods

The use of insight-based methods raises several questions, such as (North, 2006): What is the agreed definition of insight? What kinds of methods and metrics adequately represent insight? Are controlled experiments useful? Should predefined tasks be used? What kind of participants should be used? North introduces an insight-based approach and suggests using controlled experiments both with and without predefined tasks. If tasks are used, they should be more complex than traditional benchmarking tasks, for example estimation tasks or tasks that involve uncertainty. Users should be directed to interpret the visualizations in articulate written answers so that their insights can be captured. Another alternative is to eliminate tasks entirely and study what insights the users gain on their own, letting them explore the data in the way that they choose. In this approach users are initially oriented with the help of starting questions, but are then left to freely explore the data and report their insights as and when they find them. The users verbalize their findings in a think-aloud protocol, enabling the evaluator to capture their insights. Findings are marked as insight occurrences, which can be quantified based on different metrics: complexity, time to generate, errors, insight depth category and so on. During the evaluation, other kinds of usability data can also be collected, such as which features helped to gain insight and what caused problems for users. According to North, both types of methods are needed.

Smuc et al (2009) refine the insight-based method suggested by North by introducing a three-level method: (1) Counting insights (as suggested by North). The user analyses data in a controlled think-aloud test setting. During the process, a moderator measures certain variables, such as the number of insights, the time taken for insight generation, and correctness. (2) Predefined insight categories, related to the tool’s intended purpose. If an insight type is missing from the insights that users find, redesign is required. (3) Relational insight organizer: a graph used to study and compare the insight generation of participants. It visualizes the prior knowledge of users and their insight generation process. The tool can be used to easily compare the insight generation of users with their a priori knowledge. The authors conclude that no superior method exists, and that the method must be chosen according to the evaluation goals, the time available, and the research question. In addition to domain experts, they recommend using “semi-experts” in the evaluations if real experts are difficult to obtain.


Design process methods

T. Munzner (2009) introduces a “nested model” in which evaluation is integrated into the design process. The design process is divided into nested phases, and distinct evaluation methods are suggested for each level. The levels are: (1) domain problem and data characterization; (2) operations and data type abstraction; (3) visual encodings and interaction design; and (4) algorithm design. Munzner defines both the threats and validation approaches for each level. The validation methods are: (1) observe and interview target users; (2) test on target users; (3) field study of use to justify designs, qualitative/quantitative results analysis; (4) lab study, measure system time/memory usage. Perer and Shneiderman (2009) combine the design process and insight-based evaluation. Their methodology consists of five phases, supported by development of the tool: (1) interview, in order to understand the intentions of the domain experts; (2) training; (3) early use (2-4 weeks); (4) mature use (2-4 weeks); (5) outcome. The system is installed for use during the development process, and observers then visit and interview the user about their recall of insights. Updates of the software are then carried out based on the results. Later, domain experts explain the impact of the tool to their work. The authors have performed four case studies. They claim that the method led to new insights and discoveries that might not have been found using traditional usability experiments. The drawback is that the methods require a lengthy period.

Levels of evaluation

According to Scholtz (2006), evaluation should not focus only on the visualizations or the usability of the user interface, but should address a range of aspects. Scholtz suggests aspects such as situation awareness, collaboration, interaction, creativity, and utility. Other aspects suggested by Scholtz are derived from the “Dialogue principles” standard (ISO 9241-110, 2006): suitability for the task, self-descriptiveness, controllability, conformity with user expectations and consistency, error tolerance, suitability for individualization and customization, and suitability for learning.

The Visual Analytics Agenda suggests that no particular technique will be suitable for all problems and evaluation must therefore be targeted at different levels and different methods and metrics applied to each level. The agenda suggests three levels of evaluation: component, system and work environment. Components that do not involve interaction can be evaluated using methods that measure performance, accuracy, or identification of limitations. If user interaction is included, empirical user evaluation can be used, with standard metrics of effectiveness, efficiency and user satisfaction. The system level evaluates the target uses. It can be performed in lab conditions, and its objectives are learnability and user satisfaction. The work environment level measures technology adoption, adoption rate, trust and productivity. Case studies and ethnographic studies are recommended at this level. The recommendations given in the report by Keim et al (2010) call for more research. New evaluation methods and techniques, standards, guidelines, repositories containing datasets for benchmarking and showcases are required.


5 Data modelling

A data model is an abstract model that defines the concepts of a subset of real-world information and their relationships. Data modelling is a widely used technique in software engineering. Data models have been in common use since the 1990s, when the use of relational databases became widespread. The models have several uses in information systems. This chapter introduces data modelling and how it has been used in visual analytics and information visualization.

The key concepts of a data model are business objects, which represent the concepts of interest, relations between objects, and attributes, which are properties of the model objects. In addition, there are classification objects for categorising the business objects, such as code values or taxonomies. The relations have cardinalities. There can be one-to-one, one-to-many and many-to-many relationships. If a relationship contains additional information, it can also be a model object. Model instances represent real data items that are structured according to the model. The model structure is defined in a model schema. A data model schema is often expressed in visual form as an entity relation diagram, which is a form of node-link diagram. There are several styles of model schema, with differing notations of nodes, links and relation cardinalities. Figure 33 shows an example data model expressed in UML20 (Unified Modelling Language™). It defines a cookery book, including categorized recipes and wine recommendations.

Figure 33. UML model.

20 http://www.uml.org/ (accessed 18 October 2012)


Recipe is the key business object (green rectangle). Other business objects are Cook-book, Author, Foodstuff and Wine (grey rectangles). Category is a classification object that categorises the recipes as starters, main dishes and desserts (orange rectangle). The relationships between objects are expressed by links between the objects, and the cardinalities by numbers and asterisks (* = many) at the end of the links. The white rectangles are objects that contain information on relationships. The relationships are:
· Cook-book–Recipe (one-to-many): collects the individual recipes of the cook-book.
· Recipe–Foodstuff (many-to-many): lists the foodstuff used in a recipe. The relationship is many-to-many, because the same foodstuff can be used in several recipes. The relationship has an object (Ingredient) that tells the amount of the foodstuff needed in the recipe.
· Recipe–Wine (many-to-many): tells which wines are recommended with the food. The relationship is many-to-many because the same wine can be recommended with many recipes. The relationship object WineRecommendation contains special information about serving a wine with a specific food.
· Recipe–Author (many-to-many): tells the authors of the recipe. A many-to-many relationship means that one recipe can have several authors and one author can have written several recipes.
· Recipe–Category (one-to-many): collects the recipes in different categories. In this case, one recipe can belong only to one category.
· Cook-Book–Author (many-to-many): names the editor of the whole cook-book. The relation is many-to-many because an editor can have edited several cook-books and one cook-book can have several editors.

Example instances from the model are shown below:

Cook-book
Title: 100 Cakes | Description: 100 delicious cakes for everybody and everyday | Authors (from the relationship): Paula

Author
Name: Paula | Contact info: Meet only at home | Cook-Books: 100 Cakes | Recipes (from the relationship): Sugar cake
Name: Sanni | Contact info: tel… | Cook-Books: - | Recipes (from the relationship): Chocolate cake

Foodstuff
Name: Egg; Sugar; Flour; Chocolate; Almond

Wine
Title: Cake wine 1 | Description: Smooth, dark… | Rating: 3
Title: Cake wine 2 | Description: Yellow, sparkling… | Rating: 4

Category
Title: Chocolate cakes; Other cakes

Recipe
Title: Chocolate cake | Description: A delicious cake | Preparation: Mix all and bake | Category: Chocolate cakes | Cook-book: 100 Cakes | Authors: Sanni
Title: Sugar cake | Description: A quick … | Preparation: Mix eggs and sugar… | Category: Other cakes | Cook-book: 100 Cakes | Authors: Paula

Ingredient
Recipe: Chocolate cake | Foodstuff: Almond | Amount: 100 g
Recipe: Chocolate cake | Foodstuff: Egg | Amount: 3
Recipe: Chocolate cake | Foodstuff: Chocolate | Amount: 50 g
Recipe: Sugar cake | Foodstuff: Sugar | Amount: 150 g
Recipe: Sugar cake | Foodstuff: Egg | Amount: 2
Recipe: Sugar cake | Foodstuff: Flour | Amount: 100 g

Wine recommendation
Recipe: Chocolate cake | Wine: Cake wine 1 | Recommendation: Serve ice cold
Recipe: Sugar cake | Wine: Cake wine 2 | Recommendation: Serve room temp
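For readers more at home with code than with UML diagrams, the same schema can be sketched as Python dataclasses; this is purely illustrative and not part of the modelling tools used in this work. One-to-many relationships become lists, and the many-to-many relationships Recipe–Foodstuff and Recipe–Wine are carried by their own relationship classes (Ingredient, WineRecommendation):

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class Author:
        name: str
        contact_info: str = ""

    @dataclass
    class Foodstuff:
        name: str

    @dataclass
    class Wine:
        title: str
        description: str = ""
        rating: int = 0

    @dataclass
    class Category:
        title: str

    @dataclass
    class Ingredient:            # relationship object Recipe-Foodstuff with an amount
        foodstuff: Foodstuff
        amount: str

    @dataclass
    class WineRecommendation:    # relationship object Recipe-Wine with serving advice
        wine: Wine
        recommendation: str

    @dataclass
    class Recipe:
        title: str
        description: str
        preparation: str
        category: Category                                            # one category per recipe
        authors: List[Author] = field(default_factory=list)          # many-to-many
        ingredients: List[Ingredient] = field(default_factory=list)  # via Ingredient
        wines: List[WineRecommendation] = field(default_factory=list)

    @dataclass
    class CookBook:
        title: str
        description: str
        editors: List[Author] = field(default_factory=list)  # many-to-many Cook-Book-Author
        recipes: List[Recipe] = field(default_factory=list)  # one-to-many Cook-Book-Recipe

    chocolate_cake = Recipe("Chocolate cake", "A delicious cake", "Mix all and bake",
                            Category("Chocolate cakes"), [Author("Sanni", "tel...")],
                            [Ingredient(Foodstuff("Almond"), "100 g"),
                             Ingredient(Foodstuff("Egg"), "3"),
                             Ingredient(Foodstuff("Chocolate"), "50 g")],
                            [WineRecommendation(Wine("Cake wine 1", "Smooth, dark", 3),
                                                "Serve ice cold")])
    book = CookBook("100 Cakes", "100 delicious cakes for everybody and everyday",
                    [Author("Paula", "Meet only at home")], [chocolate_cake])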


Data models are usually constructed with a specific application in mind and define only the data needed in that application. For example, the previous example omits data that could relate to the domain: what other books the authors have written, the environmental and nutritional details of the foodstuff, or the grapes the wine is made of.

Why useful

The data model serves as a template for the whole application. Often several levels of models are constructed, varying in purpose and level of detail: one for defining the software structure (object model), another for constructing the database (database model) and one for data exchange. Data models are also useful in communication and documentation. The visual model provides a quick overview of the data. It can be used in discussions between IT developers and business people. People with different backgrounds and levels of experience can easily understand the conceptual structure from the model (Simsion and Witt, 2005). Conceptual data modelling is applicable only to structured data, not to unstructured data such as images, video or textual content.

History and development

Data modelling originated in the 1970s when P.P.S. Chen published his articles on entity-relationship modelling (Chen 1975, 1977). He introduced the basic concepts of entities, attributes and relations, and a graphical presentation, the entity-relationship diagram. The model was intended for the construction of databases. Modelling came into use along with the development of the relational model and databases, which began in the early 1970s after Ted Codd published his article “A relational model of data for large shared data banks” (Codd, 1970). Entity-relationship modelling lent itself well to planning relational databases. As the popularity of relational databases grew towards the end of the 1980s, several other styles of modelling also emerged, along with data modelling languages and software tools supporting them. CASE tools (Computer Aided Software Engineering) generated schemas for relational databases and application code directly from the entity-relationship model. Examples of commercial tools were AD/Cycle by IBM, based on the structured analysis and design (SA/SD) methodologies of Yourdon and DeMarco (DeMarco, 1979), and tools supporting the “information engineering” methods of James Martin (Martin and Finkelstein, 1981), including IEF (Information Engineering Facility) by Texas Instruments and IEW (Information Engineering Workbench).

The emergence of object-oriented modelling in the early 1990s expanded the use of data models towards software construction. Traditionally, data and procedures had been separate, but object orientation combined data objects and their processing methods. Object modelling defines the data objects, their relationships, object attributes and the processing methods for each data object. Object modelling is nowadays a widely applied technique. The most popular method is UML (Unified Modelling Language) (Booch, Rumbaugh and Jacobson, 2005). In addition to data models, it provides many other techniques for modelling business and information systems, including functional models. Tools supporting these methodologies also exist, such as Microsoft Visio21 and Rational Rose by IBM22. In this work, the UML data models are constructed using Microsoft Visio.

Along with the rise of the Semantic Web around 2000, semantic models and ontologies have also become popular. Ontologies are often mentioned in the context of data models and represent another kind of data modelling. Ontologies provide a generic view of data, referring to a shared and common understanding of a specific domain (Ding and Foo, 2002). Ontologies can define both the concepts and the instances of the domain. The difference between ontologies and data models is defined by Spyns et al (Spyns, Meersman and Jarrar, 2002) as follows: “Unlike data models, the fundamental asset of ontologies is their relative independence of particular applications, i.e. an ontology consists of relatively generic knowledge that can be reused by different kinds of applications/tasks.” Ontologies are often specified in ontology specification languages such as OWL (Web Ontology Language)23 and RDF (Resource Description Framework)24 defined by W3C, although ER modelling techniques can be applied as well. The use of the term ontology is not fully established: conceptual data models defined with ontology definition languages are often called ontologies, while graphs showing instance data are called semantic graphs.

Generic and standard models

Data modelling is time-consuming work. It requires co-operation between domain experts and model designers to extract and define the concepts and their relationships. The notion of reusing models has therefore gained ground. Generic and standard models for a variety of domains are published both by standards bodies and by business or industrial consortiums. Examples of standard models are ISO 1492625 (Integration of life-cycle data for process plants including oil and gas production facilities) for data exchange, and the ISO Standard for the Exchange of Product Model Data STEP (ISO 10303)26, which uses the EXPRESS modelling language. MIMOSA27 is an example of a standard model by an industry trade association, providing open information standards for operations and maintenance (Mathew et al, 2006).

In addition, the research community suggests standard models. One example is the data model for monitoring data by D. Inaudi (2002), which is proposed as an open standard aiming at the archival of long-term monitoring data. The model is also intended for data acquisition, data analysis and representation. Another example is the model by Manolescu et al (2009), which is proposed for the analysis of continuously updated data.

21 http://www.microsoft.com (accessed 18 October 2012) 22 http://www-01.ibm.com/software/awdtools/developer/rose/ (accessed 18 October 2012) 23 http://www.w3.org/TR/owl-features/ (accessed 18 October 2012) 24 http://www.w3.org/TR/#tr_RDF (accessed 18 October 2012) 25http://www.iso.org/iso/iso_catalogue/catalogue_tc/catalogue_detail.htm?csnumber=29556 (accessed 18 October 2012) 26 http://www.steptools.com/library/standard/ (accessed 18 October 2012) 27 http://www.mimosa.org/ (accessed 18 October 2012)


5.1 Data models in information visualization

Data models have been used in information visualization for mapping data and visualizations together. Three approaches can be distinguished. The first focuses on generating visualizations from the relational model. The second group of methods generates visualizations by means of manual mapping. The third group encompasses automatic methods for mapping.

Early approaches

The earliest uses of data models with visualizations presented the contents of relational databases. Catarci et al (1997) thoroughly explore and classify the early attempts from the beginning of the 1980s to the mid-1990s. A total of 78 systems were evaluated. The objectives of these pioneering systems were the same as in visual analytics today: visual representations that are effective at expressing different kinds of knowledge and that can be used to understand database content, to focus on meaningful items, and to find query patterns and reason from the query results. The visual representations used in them are classified by Catarci et al as form-based (tables), diagram-based, icon-based and hybrid. The data model is used as a starting point for browsing data, either for top-down refinements, selective or hierarchical zooms or removing irrelevant objects. Another use is to form visual queries with the help of a data model, thus building visual query systems for databases. These early systems produce only static visualizations, without any intention for deeper analysis.

Manual mappings

The “Snap” system by North and Shneiderman (2000) uses a relational model in the creation of coordinated visualizations, in which basic relational concepts are modelled and mapped to visualization concepts. A model object corresponds to a whole visualization, and an instance (tuple) to an item of the visualization. Each visualization displays a single model object. The relations between the objects are used to create the coordinated visualization. A one-to-one relationship is used in brushing-and-linking, showing overviews, details-on-demand and synchronized scrolling. A one-to-many relationship is used for drilling down to details. Snap is intended to offer an open application interface so that researchers and developers can adopt it as a visualization tool. A related approach is Rivet (Bosch et al, 2000), a visualization system for computer system visualizations, although its scope is wider than that of Snap: it supports the development of the visualizations themselves. Here, too, the basic data elements are the model object and instance (tuple). Visual primitives, metaphors, are used to define the drawing attributes of a tuple and for mapping the fields of a tuple to a location, colour, fill pattern and size. Complex visualizations can be composed with the help of these primitives. There are also mechanisms to support the coordination of multiple views and brushing. Another similar use of data models is presented by Falconer et al (2009) for creating visualizations of ontology instances. The user can define ontology visualizations with the help of the tool, which is used to map ontology concepts to visualization concepts, for example to show one concept as a node and another as a link in a node-link diagram.


Both the ontology concepts and visualization concepts are defined as data models. Example models exist for node-link diagram, list, line chart, pie chart, map, heat map, table, parallel coordinates and tree visualizations. The work by Falconer et al is based on the PhD thesis by Bull (2008) on generating and customizing visualizations with the help of Model Driven Engineering, which proposes a model-based approach for creating visualizations and viewers for visualizations.

Automatic mapping

Automatic methods are a popular choice for web visualizations. Unstructured data is the starting point, and visualizations are generated with the help of a model derived from the data.

Cammarano et al (2007) use an RDF-like data model in the automatic creation of visualizations from heterogeneous web data. The user can make a query and select a desired visualization, such as a map, timeline or scatterplot. The visualizations have specific requirements, and the system uses schema matching to couple the underlying heterogeneous data with the specification of the visualization. A related approach comes from Gilson et al (2008). They use three ontology models to create visualizations automatically from web data. External visualization toolkits can be used to render the visualizations. The starting point is tabular data that the user has extracted from the web and the visualization selected by the user. A source data ontology is generated from the user data with a schema mapping process that uses similarity measures for the creation of the ontology concepts. The next step is mapping from the source data ontology to the visual representation ontology. This is done with the help of a semantic bridging ontology that stores information about visualization tools, styles and parameters. Finally, the visualization toolkit is invoked to show the visualization.

5.2 Data models in visual analytics

The use of data models in visual analytics is not widespread. Ten visual analytics approaches were found that claim to use data models. The approaches vary in terms of the kinds of model used and the ways in which they are utilized. The majority use data models to extract concepts from the original data and create an ontology of them. The ontology instances can then be analysed with the help of different kinds of visualizations, typically a node-link diagram. A couple of approaches use a data model as part of the system architecture and generate applications using a set of models. These models are either fixed application models, general-purpose models or models generated from the concepts existing in the data. A summary of the studied approaches is presented in Table 2, which shows the main purpose, main use and type of model. The approaches are categorized as: (1) ontology networks; (2) fixed application models; (3) general-purpose domain models; and (4) general-purpose models. The approaches are briefly introduced thereafter.


Table 2. Data models in visual analytics.

Approach | Purpose | Use of data model | Data model type

1) Ontology networks
(Liu, Navathe and Stasko, 2011) | Framework for network presentation | Network generation and presentation | Model generated from data tables
(Alani et al, 2005) | Community analysis | Network presentation | Existing ontology model
(Shen, Ma and Eliassi-Rad, 2006) | Social networks | Network presentation, guiding analysis | Either existing ontology or derived from network data
“Harvest” (Gotz, Zhou and Aggarwal, 2006) | Visual investigation | Adding new concepts to ontology, analysis and presentation | Existing ontology

2) Fixed application models
“ER-explorer” (Dai et al, 2008) | Text analysis | Discovering entities from text, presentation | Fixed application ER-model
“Jigsaw” (Grg, Spence and Stasko, 2008) | Investigative analysis | Discovering entities from text | Fixed application ER-model

3) General-purpose domain models
(Manolescu et al, 2009) | Framework for visual analytics for dynamic data, example from dataflow of publications and projects | Database accumulation and queries, presentation with an external toolkit | ER-model for dynamic data
Inaudi (2002) | Monitoring structures, especially bridge monitoring | Storing data, presentation and analysis with an external toolkit | ER-model of monitoring with sensors

4) General-purpose models
NetLens (Kang et al, 2007) | Digital libraries analysis | Application generation, analysis and presentation | Two levels of ER-models: general model and application model
Streit et al (2012) | General-purpose, example from biomedical data | Tool setup, analysis process guidance, data retrieval and presentation | Three levels of ER-models: setup model, domain model, analysis session model


Ontology networks

In “Network-based visual analytics” (Liu, Navathe and Stasko, 2011), a network representation is created from relational tabular data and the network can be manipulated in numerous ways: assigning weights to edges, doing projections, grouping nodes, dividing networks into sub-networks. Individual nodes have interactions for selecting, filtering, moving, hiding, showing and expanding. Several network layout algorithms and analytical measures are available. The user can modify the network by dragging and dropping model attributes from a schema view and making connections between them. The purpose is to provide a general approach for multi-dimensional and multi-level visual analytics.

A related approach is introduced by Alani et al (2005) for community analysis. Here, the starting point is an ontology model and knowledge base. The conceptual model – ontology model – is used for searching connections between instances of the model objects. The user can select relationships of interest from the model and the system finds corresponding instances and shows them as lists of instances.

Gotz et al (Gotz, Zhou and Aggarwal, 2006) introduce Harvest, a tool with which users can add new concepts to ontologies and relate concepts together. Tools for visual exploration of the original and new synthesized concepts are available. The concepts can represent instances of data and model concepts. The data can be analysed with the help of graph-like structures and timelines.

Shen et al (Shen, Ma and Eliassi-Rad, 2006) describe a tool for examining social networks, expressed as semantic graphs representing terrorist connections. It uses an ontology graph to guide the analysis. The ontology graph is a variant of the entity-relation model, describing the relationships of actors in the semantic graph. The user can select entity attributes of the diagram, and the whole network is visualized in light of the selected entity attribute, showing the connections to instances of another entity. The authors term the method semantic abstraction. Subsets of entity values can be selected to show connections between selected groups. The graph shows the frequencies of the links as numbers on the edges, and the node sizes represent the disparity of connected entity types for each node type.

Fixed application models

ER-explorer by Dai et al (2008) supports the exploration of text data with the help of an entity-relationship model. It uses a predefined model defining 24 entity types, including people, geo-political entities, dates and others. Their algorithm extracts the data and shows the entity instances in a text cube view and a network view.

Grg et al (Grg, Spence and Stasko, 2008) have a related fixed-model approach. They introduce Jigsaw, a tool for investigative analysis. It derives instances of entity types from textual data and makes connections between them. Four predefined entity types are used: person, place, data and organization. The results can be viewed in different ways: as a tabular connection view, a semantic graph view, a scatterplot view and a text view showing the original data. The views are interactive and users can examine the relationships between the entities in multiple ways.


General-purpose domain models

ReaViz (Manolescu et al, 2009) is a visual analytics application for dynamic, continuously updated data. It is built on a data model that comprises three kinds of entities: Application-dependent entities that model the data used by the specific visualization/analysis applications, workflow-related entities that capture the definition and instances of workflows, and visualization-related entities that capture the information items required by the data visualization modules. An application prototype demonstrates the flow of publications and projects.

Inaudi (2002) suggests using a relational model for handling data flow and introduces a data model for monitoring structures with sensors. The analysis and visualization is performed with the help of a package that allows exploration of the database. It is designed especially for analysis of curvature measurements of bridge monitoring projects using special bridge analysis tools.

General-purpose models

NetLens (Kang et al, 2007) is an approach to analysing textual data from digital libraries with the help of an entity-relation model. The motivation is to avoid network overviews of the whole data, as such networks easily grow too large and become cluttered. NetLens uses an abstract entity-relation model, called a “content-actor” data model. The model has two entity types and includes relations between the entities and within an entity. An application schema is constructed based on the content-actor model and the data is stored according to the application model. A general-purpose user interface is constructed on the abstract model, which can be applied to any application data set built using the abstract model. The models have a viewer and different tools to explore the data, including histograms and filters. An example model is applied to express papers and people, and their attributes.

The work of Streit et al (2012) is a general-purpose approach with the objective of supporting user orientation in the data and guiding the analysis with the help of data models. It uses three kinds of models: a setup model, a model of the domain, and a model of the analysis session. The setup model defines the data sources and the ways to interact with them, both for data retrieval and for visualization. The analysis session model defines the tasks and analysis steps for different analyses. The domain model maps the analysis tasks and the setup model elements together. Building the system requires a tedious authoring phase in which the data sources, their interfaces and the analysis tasks are modelled and mapped together; this resembles a normal software construction process of defining user tasks and use cases. Analysis support is implemented with predefined analysis processes, although the user is not obliged to use them.


6 Monitoring data

Nearly all branches of human life generate monitoring data. Machinery, industrial processes and structures such as buildings and bridges are equipped with sensors that gather data on the state of the monitored object. Natural and environmental phenomena are also monitored with sensors. Finance, business and public authorities also collect vast amounts of data. Telecommunication software and other computer software have built-in data collection methods. People are also monitored in numerous ways, such as through medical records and the tracking of behaviour and consumption habits. Collecting monitoring data is not a new phenomenon, but advances in sensor and data processing technology have extensively broadened its reach. Sensors are readily available and easy to install and use. The majority of sensor data is collected for useful purposes, but some is collected merely because it is possible. The collected data is generally expected to contain important knowledge. However, the speed at which data can be accumulated far outpaces our ability to understand the data we collect.

Essentially, monitoring data consists of value-timestamp pairs that form a time series. The measurement values depend on the monitored object and can represent anything from temperature, vibration or position to opinions. The measurement frequencies also vary. Measurements can be ongoing real-time data or history data providing a snapshot of the past. The focus in this work is on history data. Stream data manipulation adds complexity to data processing, and how the principles presented in this work could be extended to stream data is a subject for further research.

This chapter is written from the point of view of industrial monitoring, but also discusses other areas. Industrial monitoring has a long tradition and its concepts and procedures are well defined. The chapter introduces the area, indicators, the monitoring process, and analysis and presentation methods. A survey on the use of visual analytics in monitoring was also conducted as part of this work.

What is monitoring?

Industrial monitoring focuses on process performance: ensuring that the process remains stable, stays within its predefined limits, and operates at its full potential, also after process modifications and improvements. Monitoring is related to quality control and looks for areas that may conceal sources of variation in quality. Monitoring tasks include following the monitored objects, recognizing new and unknown phenomena, and reacting based on the observations. Methods of statistical process control (SPC) are applied to monitoring and controlling a process (Fass and Pezzetti, 2007). Several standards exist in this area. MIMOSA (2012) provides open specifications for Enterprise Application Integration (EAI) and Condition-Based Maintenance (CBM). ISO 13374 is a standard for condition monitoring and diagnostics of machines (ISO 13374, 2007). IEEE 1451.1 defines a standard for a Smart Transducer Interface for Sensors and Actuators (IEEE 1451.1, 2004). SEMI E10-0304 provides a specification for the Definition and Measurement of Equipment Reliability, Availability, and Maintainability (RAM) (SEMI E10-0304).

Business performance is often monitored with the help of data warehouses. Data warehouses gather knowledge from various data sources across an organization in order to form a consolidated view of the business. They provide data views in the form of dashboards, “data marts”, and enterprise reporting systems. Data warehouses provide online analytical processing (OLAP), that is, tools for interactive analysis of multidimensional data of varied granularities. Data warehouses do not usually have links to the source systems and it is not possible to trace back to the original data. They are heavy systems that are tedious to build and require specialized technical expertise (Azarm, Nargesian and Peyton, 2011a).

Key Performance Indicators

The performance of monitored objects is followed with the help of key performance indicators or indexes that express the health of the monitored object. Examples of industrial indicators are energy performance indexes (Heilala et al, 2010a, 2010b). The finance and business world uses key performance indicators that reflect the critical success factors of an organization. Human and wellbeing studies calculate a variety of indexes such as the Body Mass Index or Burnout Index. A quick Google search identified a variety of indicators for all branches of human activity, from football to hotel websites. Performance indicators are usually long-term considerations. What they are and how they are measured are domain-specific questions, and the definitions do not change often.

Monitoring process

The monitoring process is represented in both the MIMOSA and ISO 13374 standards by means of a level structure (Figure 34).

Figure 34. Level structure (ISO TC 108/SC 5, 2005): Sensor/Transducer/Manual Entry, Data Acquisition (DA), Data Manipulation (DM), State Detection (SD), Health Assessment (HA), Prognostic Assessment (PA) and Advisory Generation (AG), with connections to external systems, technical data and configuration, archiving, and displays and information presentation.


The functions of monitoring comprise the following:
a) Data Acquisition (DA) block: converts an output from the transducer to a digital parameter representing a physical quantity and related information (such as time, calibration, data quality, data collector utilized, and sensor configuration).
b) Data Manipulation (DM) block: performs signal analysis, computes meaningful descriptors, and derives virtual sensor readings from the raw measurements.
c) State Detection (SD) block: facilitates the creation and maintenance of normal baseline “profiles”, searches for abnormalities whenever new data is acquired, and determines in which abnormality zone, if any, the data belongs (e.g., “alert” or “alarm”).
The final three blocks normally attempt to combine monitoring technologies in order to assess the current health of the machine, predict future failures, and provide recommended action steps for operations and maintenance personnel. These three blocks and the functions they should support are as follows:
d) Health Assessment (HA) block: diagnoses any faults and rates the current health of the equipment or process, considering all state information.
e) Prognostic Assessment (PA) block: determines future health states and failure modes based on the current health assessment and projected usage loads on the equipment and/or process, as well as remaining useful life predictions.
f) Advisory Generation (AG) block: provides actionable information regarding maintenance or operational changes required to optimize the life of the process and/or equipment.
MIMOSA presents a cyclic version of the model (Figure 35):

Figure 35. Cyclic monitoring model (ISO TC 108/SC 5, 2005).


Analysis methods

Monitoring data is analysed with the help of statistical methods. Time series analysis methods are applicable, and detecting trends and patterns over time is central to the analysis. Monitoring uses control charts, which are a form of line graph with limits for variation (Gravois, 2007). CUSUM (cumulative sum) is one key method (Chang, 2012). As measurements are taken, the difference between each measurement and the benchmark value is calculated and cumulatively summed. If the process is under control, the measurements do not deviate significantly from the benchmark, so measurements greater than the benchmark and those less than the benchmark average each other out, and the CUSUM value varies narrowly around zero. If the process is out of control, measurements are more likely to fall on one side of the benchmark, so the CUSUM value progressively departs from zero. Data mining methods are also used.
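The CUSUM calculation itself is only a running sum of deviations from the benchmark; a minimal Python sketch, with invented measurement values:

    def cusum(measurements, benchmark):
        """Cumulative sum of deviations from a benchmark value."""
        total, path = 0.0, []
        for value in measurements:
            total += value - benchmark
            path.append(total)
        return path

    # In-control data hovers around the benchmark; the CUSUM stays near zero.
    print(cusum([10.1, 9.8, 10.2, 9.9, 10.0], benchmark=10.0))
    # A drift to one side of the benchmark makes the CUSUM depart progressively.
    print(cusum([10.1, 10.3, 10.4, 10.6, 10.8], benchmark=10.0))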

Presentation

In industry and business, process control data is traditionally presented by means of dashboards. Dashboards show the status and trends of the processes as numbers and static visualizations. Industrial dashboards are often called SCADA (supervisory control and data acquisition) systems. They are attached to the process control system, such as a process automation or building automation system. In business, measurements are stored and analysed with general-purpose business intelligence tools. There is a huge variety of tools on the market, including tools that enable users to create tailored dashboards. Application-specific tools are also available, as well as tools connected to enterprise resource planning (ERP) systems. General-purpose tools such as Excel and SAS are also used. Market research uses statistical packages such as SPSS, which provide basic statistics, graphs and plots, as well as tailored tools.

Visual analytics and monitoring data

Few visual analytics studies are associated with monitoring data. Only Inaudi (2002) discusses visual analytics in connection with monitoring structures with sensors. Inaudi suggests using a relational model for handling the data flow and introduces a data model for this purpose. The model is proposed as an open and free standard, as discussed earlier in Section 5.2, and the author also introduces a database structure (Figure 36). Analysis and visualization are performed with the help of a package that allows exploration of the database. The package is intended for the analysis of curvature measurements in bridge monitoring using special bridge analysis tools.


Figure 36. Data model for monitoring structures with sensors (Inaudi, 2002).


7 A data model based approach for visual analytics

This chapter introduces the suggested data model based approach to visual analytics. It presents the idea of a domain model and its uses in visual analytics. Domain here means a field or scope of knowledge or activity, such as the monitoring of objects, marketing or bioinformatics, comprising a variety of applications. A domain data model is a general-purpose data model for the whole domain. A domain model for monitoring data is presented, and the approach is compared to other data models and uses of data models in visual analytics. A tool concept based on the domain model is presented in the next chapter.

7.1 Domain data model

In the monitoring data domain, data consists of similar concepts regardless of how the data is collected or stored: objects of interest, measurements, indexes and background information. For example, in monitoring the energy efficiency of a preheating furnace for steel bars, the object of interest is the furnace, the measurements include temperatures in different areas of the furnace and gas consumption (Heilala et al, 2010a, 2010b), and the background information includes the surface area of the furnace. The indexes are energy use parameters (EUPs), functions calculated from the data to identify the state of energy use. In indoor air quality monitoring the object of interest is a building space, the measurements include temperature, CO2 and the number of occupants, and the background information comprises the volume and purpose of use of the space. An indoor air quality index is calculated to define whether the air quality is within accepted limits28. The same elements can also be distinguished in body weight management, a completely unrelated application area (Jrvinen, 2007). The object of interest here is a person, the measurements are the food eaten, energy expenditure in exercise and daily weight, and the background information includes start weight, age, gender and height. The indexes are the body mass index, a food quality index and the energy balance.

This suggests constructing a domain-specific data model for the visual analytics of monitoring data. The model would include the key concepts of monitoring: object of interest, measurements, background properties, and indexes. On the other hand, visualizations and analysis methods require the data to be in specific forms, regardless of the data domain. If the model also takes into account the specific needs of visualizations and analysis methods, it can be used as a template for visual analytics applications in the domain.
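As a rough, hypothetical illustration of what such a domain model implies (the actual model is defined later in this work, and the names below are invented), the four concepts and a derived index might be sketched as follows:

    from dataclasses import dataclass, field
    from datetime import datetime
    from typing import Callable, Dict, List

    @dataclass
    class Measurement:                 # value-timestamp pair of one measured quantity
        quantity: str                  # e.g. "temperature", "CO2", "weight"
        timestamp: datetime
        value: float

    @dataclass
    class MonitoredObject:             # object of interest: furnace, building space, person...
        name: str
        background: Dict[str, float] = field(default_factory=dict)   # e.g. height, volume
        measurements: List[Measurement] = field(default_factory=list)

    @dataclass
    class Index:                       # derived indicator such as BMI or an energy use parameter
        name: str
        formula: Callable[[MonitoredObject], float]

        def value(self, obj: MonitoredObject) -> float:
            return self.formula(obj)

    # Hypothetical use in the weight-management application:
    def body_mass_index(person: MonitoredObject) -> float:
        latest_weight = person.measurements[-1].value
        return latest_weight / person.background["height_m"] ** 2

    person = MonitoredObject("person 42", {"height_m": 1.70},
                             [Measurement("weight", datetime(2012, 10, 18), 68.5)])
    bmi = Index("BMI", body_mass_index)
    print(round(bmi.value(person), 1))   # 23.7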

Three uses of the model are suggested: (1) for construction of visual analytics applications, (2) for supporting reasoning, and (3) as a user interface element.

Construction of visual analytics applications

The model could serve as an architectural model for a platform of visual analytics applications. The platform would contain a database constructed based on the model. The application data would be stored there in a “domain standard” form. Predefined visualization and analysis methods applicable to the data would be stored in a library that could be accessed via standard interfaces.

28 https://www.rakennustieto.fi/kortistot/rt/kortit/10946


An application-independent user interface could then be constructed on top of the model and would include features that are useful in the monitoring domain. When a new monitoring application requires analysis, the data from the original data sources would be loaded into the database. The predefined analysis and visualization methods would be automatically available through the library interface. The user would be able to use the application with the help of the application-independent user interface. This is possible because the underlying domain model is similar regardless of the specific application. In addition, a tailored user interface could be built on top of the general-purpose interface. Figure 37 outlines the architecture.

Figure 37. System architecture. The domain database stores the application data, which is loaded from the external application sources. The analysis and visualization methods, chosen for the domain, are stored in a library. The data and the methods are available to the user via an application-independent user interface. The core of the system is the visual analytics engine, which delivers the data between the different components.

In the above approach, users are not required to have any detailed knowledge about the analysis or visualization methods. They simply obtain suitable pre-selected methods from the library via the user interface. This approach could be beneficial to users who know the application domain but are not familiar with the analysis methods. Using a domain specific approach also provides flexibility, which is lacking from many of the present visual analytics applications. It also avoids the need to build case-by-case solutions. However, it does not solve all problems, as the data still has to be extracted, cleaned and harmonized as before, and scalability requires additional attention.
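As a rough illustration of how the components in Figure 37 could interact, the following sketch wires a domain database, a method library and an engine together. All names (DomainDatabase, MethodLibrary, VisualAnalyticsEngine) and the in-memory storage are assumptions made for this sketch only; they are not the actual implementation.

```python
# Illustrative sketch only: the class names and in-memory storage are
# hypothetical, not taken from the thesis prototype.
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class DomainDatabase:
    """Stores the application data in the 'domain standard' form of the model."""
    measurements: dict = field(default_factory=dict)

    def load(self, variable: str, rows: list) -> None:
        # Rows are (timestamp, value) pairs coming from the external sources.
        self.measurements.setdefault(variable, []).extend(rows)

    def query(self, variable: str) -> list:
        return self.measurements.get(variable, [])


@dataclass
class MethodLibrary:
    """Predefined analysis and visualization methods behind a standard interface."""
    methods: dict = field(default_factory=dict)

    def register(self, name: str, fn: Callable) -> None:
        self.methods[name] = fn


@dataclass
class VisualAnalyticsEngine:
    """Delivers data between the database, the method library and the user interface."""
    db: DomainDatabase
    library: MethodLibrary

    def run(self, method_name: str, variable: str) -> Any:
        data = self.db.query(variable)
        return self.library.methods[method_name](data)


# Usage sketch: register one method and apply it to one measured variable.
db = DomainDatabase()
db.load("room temperature", [("2012-01-01T12:00", 21.5), ("2012-01-01T13:00", 22.1)])
lib = MethodLibrary()
lib.register("mean", lambda rows: sum(v for _, v in rows) / len(rows))
engine = VisualAnalyticsEngine(db, lib)
print(engine.run("mean", "room temperature"))  # -> 21.8
```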

Figure 38 outlines the suggested solution represented as a level structure, as in industrial models (MIMOSA, 2012; ISO 13374, 2007). At the Metering level, measurement data is collected from disparate sources. The Measurements level stores the measurement and background data in diverse data storages: repositories, databases and files. At the Data model level, the user defines and enters the application concepts into the system database according to the domain model. The set of analysis and visualization methods can be checked and new methods can be installed if required. At the Mapping level, the data coming from the disparate sources is mapped to the database objects. This includes cleaning, harmonization and transformation. At the Visual Analytics level, all predefined visualizations and analyses are automatically available through the domain-specific user interface. If required, a more elaborate application-specific user interface can be constructed on top of the general-purpose user interface. This work covers the three top levels.

Figure 38. Level architecture of measurement data analysis.

Support of reasoning

Visual analytics systems should avoid cognitive overload, such as inconvenient user tasks that interrupt the reasoning process. Measurement data often includes a very wide range of variables. This presents a huge number of visualization and analysis options, which can be confusing for the user. Users would therefore benefit from guidance on appropriate methods, thus limiting the number of choices open to them. Ferreira de Oliveira and Levkowitz (2003) claim that at present the selection of the right method is a largely intuitive and ad hoc process.


In this work, reasoning is supported with the help of metadata that is added to the domain model. Metadata is “data about data”, such as information about data types, units, value ranges, etc. Metadata is used in two ways: (1) for selecting the suitable visualization and analysis methods in different situations, and (2) for supporting the interpretation of the analysis results.

From the technical point of view, the applicability of visualization and analysis methods often depends, at least with respect to the methods used in this work, on the number of variables, the data types of the variables, the number of observations, and the time scale. As introduced in Section 3.8, there are specific methods for single variables, for two or more variables, for time series, and for ordinal and quantitative data. Combining the metadata from the data model and user selections, the appropriate analysis methods can be suggested for users. For example, if the user has selected two numerical variables for analysis and the amount of data items is moderate, the system can suggest calculation of correlations and showing scatterplots.
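As an illustration of how such metadata-driven guidance could work, the sketch below encodes a few suggestion rules of this kind. The rules merely paraphrase the examples given in the text; the function, its names and its thresholds are hypothetical, not the rules implemented in the prototype.

```python
# Illustrative sketch of metadata-driven method suggestion; the rules below
# paraphrase the text and are not the exact rules of the prototype.
def suggest_methods(var_types: list, time_series: bool, n_items: int) -> list:
    """Suggest analysis/visualization methods for the user's current selection.

    var_types   -- data types of the selected variables: "nominal", "ordinal"
                   or "quantitative" (taken from the data model metadata)
    time_series -- True if the selection is analysed over time
    n_items     -- number of data items in the selection
    """
    n_vars = len(var_types)
    if n_vars == 1:
        if time_series:
            return ["time series line graph", "auto-correlation"]
        if var_types[0] == "nominal":
            return ["pie chart", "bar chart"]
        return ["basic statistics", "histogram"]
    if n_vars == 2:
        if time_series:
            return ["cross-correlation"]
        if all(t == "quantitative" for t in var_types) and n_items < 10_000:
            return ["correlation coefficient", "scatterplot with regression line"]
        return ["scatterplot"]
    # Three or more variables.
    return ["correlation network", "residual network", "parallel coordinates", "PCA"]


# Example from the text: two numerical variables, a moderate amount of data.
print(suggest_methods(["quantitative", "quantitative"], time_series=False, n_items=500))
# -> ['correlation coefficient', 'scatterplot with regression line']
```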

Value ranges and other kinds of application metadata can be shown in the visualizations to help interpret the visualization result. In the monitoring domain, it is important to compare the monitored values to the limits and reference values. Alarms are activated if the value is not within the defined limits. This kind of metadata, such as limits, goal values and reference values of variables, can be easily included in the data model and shown in visualizations.

Data model as a user interface element

Monitoring data can accumulate in large amounts. This requires a method for visualizing large collections of data. In this work the data model is used as an approach to the focus–context problem of information visualization. Data models are known to be efficient means of communication. A data model shows clear concepts and their relationships to the user. Here, the overview of the data is given with the help of a data model (context). The user can select interesting concepts from the model (focus) and apply visualization and analysis methods to them. This kind of user interface can be generated from the domain model.
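As a rough illustration, such an overview can be generated directly from the model by turning its concepts and relationships into a node-link graph. The sketch below uses the networkx library; the concept names and attributes are hypothetical examples, not the generated user interface itself.

```python
# Illustrative sketch: generating a focus+context overview graph from the
# domain model with networkx. The concept names are hypothetical examples.
import networkx as nx

model_concepts = {
    "Person":          {"kind": "object of interest"},
    "Age":             {"kind": "background"},
    "Daily weight":    {"kind": "measurement"},
    "Daily steps":     {"kind": "measurement"},
    "Body mass index": {"kind": "index"},
}

overview = nx.Graph()
for name, attrs in model_concepts.items():
    overview.add_node(name, **attrs)           # context: every concept is visible
for concept in model_concepts:
    if concept != "Person":
        overview.add_edge("Person", concept)   # relationships of the domain model

# Focus: the user picks interesting concepts from the overview for analysis.
focus = [n for n, d in overview.nodes(data=True) if d["kind"] == "measurement"]
print(focus)  # e.g. ['Daily weight', 'Daily steps']
```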


7.2 Domain model for monitoring data

There are many ways to create a data model. A domain-specific model preserves semantics effectively and is comprehensible, but is applicable only in the application domain for which it is built. A generic model loses semantics and can make applying and understanding the model complicated. A good model can make the application flexible, easy to build and update. A poor model can make the application complicated and difficult to maintain. Here, a semi-generic model is suggested as an approach that is both comprehensible and applicable. It defines the key concepts, that is, those that remain similar from one application to another in the domain and are understandable and meaningful to the user. The model takes into account the analysis and visualization needs and includes metadata that can be used to support reasoning.

Suggested data model

The key concepts of the measurement data domain are:

· objects of interest,

· measurements (value, timestamp pairs),

· indexes (key performance indicators), and

· background knowledge of the object of interest.

A simple data model showing these key concepts is given below:

Figure 39. Key concepts of measurement data.

An object of interest is a phenomenon that is monitored. It can be equipment, a building, or an individual person. In one application there can be several objects of interest with their specific background data, measured properties and indexes. Objects of interest can form hierarchies. Measurements are measured values of a property of the object of interest, such as the temperature of a room, the energy usage of a machine, exercise length or food eaten by a person. Background knowledge consists of static properties of the object of interest. Examples of these are the age and type of a machine, the structure and material of a building, or the date of birth of a person. Indexes are values calculated from the measurements and background knowledge that users want to follow. Examples of indexes are the energy performance of a building, the body mass index of a person, or the air quality index of a room.

This simple model does not take into account the features that could be helpful to support reasoning. Useful additions to the model could be:

· The data types of the measurements, background properties and indexes. These can be nominal, ordinal or quantitative. Measurements can be absolute or cumulative.

· Properties of properties: quantitative properties can have a unit, value limits, and goal and reference values. Nominal and ordinal properties have code values. Measured properties have a measurement frequency.

· Hierarchies of objects of interest. For example, an area contains buildings, buildings consist of spaces, and equipment consists of parts.

· Grouping of properties. Measurements can sometimes relate to each other or serve as joint measures for a given aspect. These can be grouped together to assist the user. An example grouping is food nutrients: fat, protein, carbohydrates, sugar and fibre.

· Variables used in index calculation. Knowledge of these can be used for exploring details and finding explanations behind the indexes.

The model, complemented with these features, is shown in Figure 40. The grey objects above the line represent metadata and the rest the actual data.


Figure 40. Domain model for monitoring data.

The concepts of the final model are:

· ObjectOfInterestType (acronym OIType) lists the phenomena that are monitored. Example object of interest types are building, building space, area, or person.

· OITypeProperty describes the kinds of properties objects of interest types can have. Properties can be background properties or measured properties. Their data types can be nominal, ordinal or quantitative. Categorical properties are coded, and the coding and the code values are defined in Code and CodeValue objects. If the property is numerical, it has a unit, lower and upper value limits, and goal and reference values. Measured properties also have a measurement frequency and a type, the type indicating whether absolute or cumulative values are measured. Dependent properties can be grouped together in PropertyGroup and GroupMember.

· OITypeIndex defines the kind of index related to each object of interest type. OITypeIndexProperty links indexes to the properties used in index calculation.

· ObjectOfInterest (acronym OI) lists the real objects that are monitored.

· OIHierarchy defines the hierarchical structures of ObjectsOfInterest.

· OI…Detail is an object for descriptive information of ObjectsOfInterest. A building can have a location on a map, or a 3D model, for example. There can be OIDetail objects for different kinds of ObjectOfInterestTypes, named accordingly (e.g. OIBuildingDetail, OIPersonDetail).

· OIMeasuredProperty stores the limits, reference and goal values of an individual object of interest. These can differ from the type-level values of the OITypeProperty object.

· OIMeasurementValue stores the actual time-stamped measurement values.

· OIBackgroundProperty stores the background properties of the real objects.

· OIIndexValue stores the index values of the real monitored objects. These can be time stamped.
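To make the model more concrete, the following minimal sketch expresses a few of the concepts above as Python data classes. The field names follow the text; the types, defaults and overall structure are assumptions made for illustration only, not the schema used in the prototype.

```python
# Illustrative sketch of a few concepts from Figure 40; field names follow the
# text, while the types and structure are assumptions, not the implementation.
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OITypeProperty:
    """Background or measured property of an object-of-interest type."""
    name: str
    kind: str                               # "background" or "measured"
    data_type: str                          # "nominal", "ordinal" or "quantitative"
    unit: Optional[str] = None
    lower_limit: Optional[float] = None
    upper_limit: Optional[float] = None
    goal_value: Optional[float] = None
    reference_value: Optional[float] = None
    frequency: Optional[str] = None         # e.g. "hourly", "daily"
    measurement_type: Optional[str] = None  # "absolute" or "cumulative"


@dataclass
class ObjectOfInterestType:
    """Phenomenon that is monitored, e.g. building, building space, person."""
    name: str
    properties: list = field(default_factory=list)


@dataclass
class ObjectOfInterest:
    """A real monitored object, an instance of an object-of-interest type."""
    name: str
    oi_type: ObjectOfInterestType
    parent: Optional["ObjectOfInterest"] = None   # OIHierarchy as a parent link


@dataclass
class OIMeasurementValue:
    """One time-stamped measurement value of a real object."""
    obj: ObjectOfInterest
    prop: OITypeProperty
    timestamp: str
    value: float


# Usage sketch: a building-space type with one measured property.
space = ObjectOfInterestType("building space", [
    OITypeProperty("temperature", "measured", "quantitative",
                   unit="°C", lower_limit=19.0, upper_limit=25.0, frequency="hourly")])
room = ObjectOfInterest("Room A123", space)
m = OIMeasurementValue(room, space.properties[0], "2012-02-01T09:00", 21.4)
```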


7.3 Discussion

Why a new model?

Instead of creating a new model, another alternative could have been to use the models presented by the official standards and the research community and enlarge these with new features. There are, however, many reasons for developing a new data model. The focus of the industrial standards, such as MIMOSA (2012) and ISO 13374 (ISO 13374, 2007), is on sensors and data collection, which is not the aim of the present work. Furthermore, the standard models are huge in scope, seeking to take into account every possible alternative and detail. The inclusion of concepts not relevant to the goals of this work was considered unnecessary. While the model suggested by Inaudi (2002) shares some of the same areas of interest as the present work – analysis and visualization – and is of reasonable size, its scope is more limited. The Inaudi model does not store background data on the monitored objects or metadata on measurements, which can be useful in data analysis. The model is focused more on modelling sensors and collecting units. Expansion of the earlier models with the concepts suggested in this work is, however, considered a possible approach, and offers one direction for further research.

Comparison to related work

In comparison to the other approaches that use data models, introduced in Section 5.2, this approach falls between the classes “General-purpose domain model” and “General model”. It has a domain model, but it can be applied to all applications inside the domain, being more of a “General-purpose domain model”. On the other hand, it is not applicable outside the domain. As it is restricted to a domain that shares similar concepts, constructing applications is more straightforward than with the general models. There is no need to create an application model as in the approaches by Kang et al (2007) and Streit et al (2012), instead the application data is instance data of the domain model. In addition, the user interface can be built to serve the needs of the domain. This is in line with the claim by the Visual Analytics Agenda that visual analytics systems that do not adequately take into account the context of the data and their use will likely fail.

The author’s search for visual analytics approaches that use metadata to support reasoning produced few results. Kang et al (2007) mention using metadata and state that “the quality of the interface depends on the richness and value of metadata”, but provide no further explanation. The only other example found comes from healthcare (Azarm, Nargesian and Peyton, 2011b) and has a data architecture that combines report delivery with business analysis applications to produce a “healthcare analytics dashboard”. It has a metadata repository represented as a conceptual model. The metadata comprises details of the captured data, minimum and maximum values, data types, technical metadata, business metadata, and information on data availability and sources. The repository is connected to business intelligence tools that store and report the data. The metadata is used to keep track of the lineage between the results and the original report data, not actually to support reasoning.


Furthermore, none of the studied visual analytics approaches support the dynamic context-sensitive selection of suitable visualization and analysis methods. They used either one kind of visualization, typically a network, or different kinds of visualizations that were fixed in the user interface. Some flexibility was found in the work of Streit et al (2012), who propose reconfigurable analysis steps that can accommodate different visualization methods, but the reconfiguration is performed prior to the analysis.

Data models or elements of them have been used as user interface elements, but not in the way specifically suggested in the present work. From the studied works presented in Chapter 5, only Shen et al (Shen, Ma and Eliassi-Rad, 2006) enabled the selection of concepts from the ontology graph in order to form a new kind of graph. Others used concept lists to form new networks or other pre-selected views.


8 Tool concept

In this chapter a concept for a visual analytics tool based on the defined domain model is introduced. The concept consists of a reasoning process, a user interface, and a selection of analysis and visualization methods. A step-by-step process for applying the concept in the construction of new applications is defined. Finally, the work is reviewed against the requirements for visual analytics tools and compared to other ways of implementing analytics solutions.

8.1 Reasoning process

The tool is constructed around a reasoning process, which is based on the model by Pirolli and Card (2005). The flow of the process is represented in Figure 41. First, the user is shown the available data objects (1) and their relationships in the form of a data model or some other visualization. The user can browse the objects and view basic visualizations and statistics. The user then selects model objects for analysis (2). The system recommends suitable analysis and visualization methods to apply, based on the data types of the selected objects, the number of objects, and the time scale. Based on the user’s selection, an analysis and visualization is performed and the results are shown to the user (3). The user interprets the result and looks for patterns (4). The user continues the analysis by making a new analysis and visualization selection (5). The user can select another analysis and visualization with the same data, add new data objects, remove uninteresting or irrelevant objects, and continue the analysis with this new data set. Alternatively, the user can select a subset of data directly from the visualization and include this subset in the data to be analysed. The process continues with the user exploring the data, going deeper into the details – down to the raw data – until insight is gained.

Figure 41. Reasoning process.


Steps 1 and 2 correspond to Pirolli’s information gathering, step 3 to Pirolli’s representation of the information, and step 4 to the development of insight. The model includes Pirolli’s Foraging loop – a cycle of activities around finding information (steps 1 and 2) – and the Sense-making loop for making sense of the information (steps 3 and 4). In step 5 the user can create new concepts for analysis, as the work of Gotz et al (Gotz, Zhou and Aggarwal, 2006) suggests.

8.2 User interface

A general-purpose user interface is designed to support the reasoning process, which sets the requirements for the user interface functionality. The tasks that the user should be able to do include:

· get an overview of the data,

· filter a dataset for analysis,

· select an analysis and visualization method for the dataset,

· see the result in visual form,

· and pick subsets of the analysis results and form a new dataset for further analysis.

The techniques used to support the analysis are direct manipulation, allowing the user to filter or select elements from the visualizations; coordinated multiple views; and brushing. Use of dynamic queries, menus and buttons is minimized to avoid interrupting the user’s attention.

This section introduces the user interface layout, how the reasoning tasks are performed, and how direct manipulation, coordinated multiple views and brushing work. The examples are from the HyperFit case introduced in detail in Chapter 9. A slightly different user interface is presented in the MMEA case in Chapter 10.

Layout

Figure 42 shows the user interface layout. The surface is divided into a data area (left), a user work area (right), and a tool panel (top). The data area shows the data contents as a node-link diagram constructed from the data model. Colours are used to distinguish the background properties (green), measurements (orange) and indexes (violet). The thickness of the links corresponds to the volume of the dataset. The work area is used for selecting data sets, launching analyses and showing visualizations. The tool panel has tools for picking variables from the model, selecting subsets, and adjusting the time scale and period.


Figure 42. User interface layout.

Overview of the data

The data model provides an overview of the available data. In the example given there is only one kind of object of interest, and the diagram shows only the properties of dieting people. Clicking the model objects gives the user quick access to basic statistics, histograms, time series and raw data (Figure 43).

Figure 43. Overview of the data.


Filter datasets for analysis

The user forms the dataset for analysis by picking properties, measurements and indexes of interest from the data model and placing them in the work area using the data picker tool. By default, all objects of interest and the whole time period are used in the analysis, but the user can define a subset using the subset tool. The time scale tools allow the user to adjust the time period and the time granularity (day/week/month). These tools are not described in further detail here, as they are basic user interface elements and not essential to the approach.

Selecting an analysis and visualization method for the dataset

After the dataset is selected, the system suggests suitable analyses to the user. The suggestions are based on the data types of the selected data variables (measurement/background data, ordinal, quantitative, time series etc.), the number of selected variables, the number of objects of interest the user has selected, and the length of the time period. The user receives the suggestions in a menu that appears in the work area. If there is only one object, the user receives the suggestions by clicking the object (Figure 44 a). If there is more than one object, they are linked together with a dashed line, and clicking the line gives the suggestions (Figure 44 b). When there are more variables, the user selects the objects to include in the analysis and receives the suggestions for the selected group (Figure 44 c).

Figure 44. Selecting analysis and visualization methods: a) single variable, b) two variables, c) multiple variables.


Showing the analysis results in visual form

The user receives the analysis results in visual form in the work area. Correlations between variables are expressed as the thickness and colour of the connecting line. Some visualizations, such as scatterplots, time series and histograms, are shown as independent windows and the user can either keep them open in the work area or close them (Figure 45).

Figure 45. Showing the analysis results.

Picking subsets of analysis results for further analysis

Each visualization has a visual subset selection tool, the “Magic rubber band”, in the top right corner of the visualization window. The user can select a data subset from the visualization with the tool, create a new object from the subset, and continue the analysis with it (Figure 46).

Figure 46. Creating subsets.


Brushing

The user can simultaneously keep several visualization windows open and highlight the same data objects in each visualization. The user can select objects in the visualizations using the subset selection tool (Figure 47).

Figure 47. Brushing.

Details-on-demand

The user can click an object or a relationship and get the underlying raw data (Figure 48).

Figure 48. Viewing details.


8.3 Analysis and visualization

The suggested approach does not require a fixed set of analysis and visualization methods. The only requirement is that the methods are compatible with the data types of the domain model. The analysis and visualization methods used in this work were chosen with the following requirements in mind: the methods have to be applicable to monitoring data, which consists of time series and static variables, of nominal, ordinal and quantitative variables, and of single variables and groups of variables. The methods should be applicable to the simple table-like data structures that the data model supports. They should also be comprehensible to non-expert users, as the application users can be domain professionals with no advanced data or statistical analysis skills.

In this work a set of data exploration and descriptive methods has been chosen. It includes the majority of the methods introduced in Section 3.8. In addition to existing methods, a new visual method for analysing correlations between multiple variables, the residual network method, is introduced later in this section.

The methods used are listed in Table 3. They include basic statistics, pie charts and histograms, time series, cross- and auto-correlations of time series, scatterplots with linear regression, correlation matrices and networks, PCA, parallel coordinates and hierarchical clustering. They are categorized by data type (nominal, ordinal, and quantitative), single values or time series, and the number of variables (univariate, bivariate, multivariate). Single values can be background values or single measurements, or means or sums of time series values.


Table 3. Analysis and visualization methods.

No time point / single time point (background variables, measurements/indexes, means or sums of time series):

  Univariate
    Nominal: pie chart, bar chart
    Ordinal: basic statistics (mean, variance, min, max), pie chart, bar chart
    Quantitative: basic statistics (mean, variance, min, max, count), histograms, clustering
  Bivariate
    Ordinal: scatterplot with regression line, correlation coefficient r and R2 (with caution)
    Quantitative: scatterplot with regression line, correlation coefficient r and R2
  Multivariate
    Quantitative: multiple scatterplots, correlation matrix or network, residual network, parallel coordinates, PCA

Time series (hourly, daily or weekly measurements of each object of interest; sums or means of several objects of interest):

  Univariate
    Ordinal and quantitative: time series line graph, auto-correlations
  Bivariate
    Ordinal: cross-correlations (with caution)
    Quantitative: cross-correlations


Residual network

The residual network is a new visual method for analysing correlations between multiple variables. The starting point is the correlation network, in which the thickness and colour of a link indicate the value of the correlation: positive correlations are shown in blue and negative ones in reddish tones. Figure 49 shows the correlations between kcal intake and the energy percentages of nutrients in the daily diet of dieting people.

Figure 49. Starting point showing all correlations.

If the user considers a correlation to be uninteresting or self-evident, the correlation can be “discarded” by replacing it with the corresponding residual. A new network is created using the residual. In the previous figure the correlation between kcals and sugar was the strongest and self-evident. In Figure 50 it is discarded, and the correlations between kcals and fat and between kcals and carbohydrates become the most meaningful. The inference could be that people who do not get energy from sugar get it from fat.

Figure 50. Correlation between kcals and sugar is discarded.


In Figure 51, the correlation between kcals and fat is discarded. The inference here might be that those who do not eat fibre, carbohydrates or fat obtain kcals from protein (pure low-carb diet).

Figure 51. Correlation between kcals and fat is discarded.

The principle behind the method is based on linear regression. In regression analysis one variable (the dependent variable) is explained with the help of one or more other variables (the independent variables). The residual is the difference between the observed value of the dependent variable (kcals) and the value predicted by the model (the regression line). The residual represents the variance of the dependent variable that is not explained by the independent variable. So, if the dependent variable is replaced with its residual in the correlation network, the network shows the correlations as if the effects of the independent variable had been discarded.

In Figure 52, on the left, is a regression line that explains the correlation between kcals and carbohydrates. On the right-hand side is a residual plot showing the residuals on the vertical axis and the independent variable (carbohydrates) on the horizontal axis. In the example, the residual represents the variance in kcals that carbohydrates do not explain. Now, if kcals are replaced with the residual, the network shows the correlations to the other variables as if the effects of carbohydrates had been discarded.

Figure 52. Scatterplot with regression line (left, r = 0.773) and the corresponding residual plot (right, r ≈ 0).
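For readers who want to reproduce the idea, the following minimal sketch computes one residual network step with numpy. The data is synthetic and the variable roles (kcals explained by sugar, with fat as the remaining variable) follow the example in the text; it is an illustration of the principle, not the implementation used in the tool.

```python
# Minimal sketch of the residual network idea with numpy; the data is random
# and the variable roles (kcals explained by sugar) follow the text's example.
import numpy as np

rng = np.random.default_rng(0)
n = 200
sugar = rng.normal(size=n)
fat = rng.normal(size=n)
kcals = 2.0 * sugar + 1.0 * fat + rng.normal(scale=0.5, size=n)

def correlations_with(target, others):
    """Correlation of the target variable with each of the other variables."""
    return {name: float(np.corrcoef(target, x)[0, 1]) for name, x in others.items()}

# Starting point: correlation network around kcals (as in Figure 49).
print(correlations_with(kcals, {"sugar": sugar, "fat": fat}))

# Discard the kcals-sugar correlation: regress kcals on sugar, keep the residual.
slope, intercept = np.polyfit(sugar, kcals, 1)
residual = kcals - (slope * sugar + intercept)

# New network: correlations of the residual with the remaining variables (Figure 50).
print(correlations_with(residual, {"sugar": sugar, "fat": fat}))
# The residual is (numerically) uncorrelated with sugar, so the link to fat
# now carries the remaining, more meaningful correlation.
```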


8.4 Construction of a new application

The process of constructing a new application consists of six steps, as shown in Figure 53. The prerequisites are that the database is constructed according to the data model, that a set of analysis and visualization methods is mapped to the model, and that a general-purpose user interface exists.

Figure 53. Construction of a new application.

The steps include the following tasks:

Step 1: Define object of interest types

Object of interest types define the phenomenon being monitored, for example a building, person or machinery. The OIType object in the model is updated.

Step 2: Define property types

All possible property types for the different object of interest types are defined. For example, for buildings, age, volume and energy consumption are defined.

The properties are classified as background/measured, nominal/ordinal/quantitative, and cumulative/absolute. Units, coding, value ranges, and goal and reference values are defined. If properties are dependent on each other, they can form a property group. The model objects Property, PropertyGroup, GroupMember, Code and CodeValue are updated.

Step 3: Define indexes and calculated values

The indexes and calculated values and their formulas are defined. The property types used in the formulas are attached to the index. The index calculation formulas are then implemented; this implementation work is required because the indexes are unique to each application. The model objects Index and IndexProperty are updated.

Step 4: Define visualizations and analysis

The platform provides a predefined set of visualizations and analyses. In this step the set is reviewed: new methods can be added and unnecessary ones discarded.

Step 5: Do the mapping

Mappings from the original data sources are performed. There are three types of mapping:

1) Objects of interest: the real objects of interest are defined. These are the actual objects being monitored, e.g. buildings or persons. They can form hierarchies, e.g. a building belongs to an area, a person lives in a building. The model objects ObjectOfInterest and OIHierarchy are updated.

2) Measured properties: the actual measurements are listed and mapped to the property types. E.g., water consumption is measured for the Otaniemi Digitalo building. The measurement frequency is added. The measurement details, e.g. value ranges, reference and goal values, can be redefined if the values differ from the values given in the property definition. The model objects OIMeasuredProperty and OIMeasurementValue are updated.

3) Background properties: the background variables are added. The OIBackgroundProperty object is updated.

The database can then be loaded with the data from the original sources. This requires cleaning, transformation and harmonization. This may involve coding and defining SQL queries, depending on the system implementation. Index value calculation can be launched as soon as the required variable values are available. OIIndexValue is updated.
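A small illustrative sketch of this loading phase is given below. It assumes the source delivers rows as (object name, source variable, timestamp, value) tuples; the mapping table, unit conversions and dictionary output are hypothetical examples of the cleaning and harmonization step, not the actual implementation.

```python
# Illustrative sketch of the Step 5 loading phase: source rows are mapped to
# model property types, harmonized to the model's units and stored as
# time-stamped measurement values. The mapping table is an assumption.
SOURCE_TO_PROPERTY = {
    # source variable name      -> (property type, conversion to the model unit)
    "water_consumption_litres":   ("water consumption", lambda v: v / 1000.0),         # litres -> m3
    "indoor_temp_f":              ("temperature", lambda v: (v - 32.0) * 5.0 / 9.0),    # F -> C
}

def load_rows(rows, store):
    """Map raw source rows to OIMeasurementValue-like records."""
    for obj_name, source_var, timestamp, value in rows:
        if source_var not in SOURCE_TO_PROPERTY:
            continue  # cleaning: drop variables that are not mapped
        prop, convert = SOURCE_TO_PROPERTY[source_var]
        store.append({"object": obj_name, "property": prop,
                      "timestamp": timestamp, "value": convert(value)})

store = []
load_rows([("Otaniemi Digitalo", "water_consumption_litres", "2012-03-01", 5300.0)], store)
print(store)  # [{'object': 'Otaniemi Digitalo', 'property': 'water consumption', ...}]
```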

Step 6: Add application-specific features

Details of objects of interest are added. These vary depending on the objects (OIDetail). The user interface can have application-specific descriptive visualizations that are mapped to the user interface.


8.5 Discussion

Comparison to requirements

The design fulfils many of the requirements for a visual analytics tool given by the Visual Analytics Agenda, presented in Section 3.6. It includes filtering using techniques such as direct manipulation and brushing, and details-on-demand. Navigation is supported in many ways, including browsing within the data by selecting and highlighting objects of interest. Human information discourse is supported by providing ways to extract, recombine, create abstractions and compare. Cognitive load is reduced by direct manipulation and the auto-suggestion of analysis methods. The user interface also fulfils special requirements such as browsing the time axis and switching between different levels of temporal aggregation: daily, weekly, monthly.

However, there are some requirements that the user interface does not fulfil. These include categorizing data, creating and testing hypotheses, annotating data, and allowing the user to interactively change the mapping between the data and their visual representations. There are also several other interesting techniques that could be used, such as different kinds of time axes and animations.

A comparison to other kinds of visual analytics solutions

There are several ways to analyse data with the help of visual analytics. One way is to invite or employ a data analysis expert to solve the problem at hand or to construct an application with the help of an analysis package and its visualization features, such as Matlab or R. Another possibility is to buy a commercial package, either general-purpose or application-specific, and train personnel to use it. This work suggests a third approach: a domain-specific platform that can be used to quickly construct visual analytics applications in the domain.

A comparison with the alternative ways of analysing data using visual analytics was conducted to identify the advantages and disadvantages of this approach compared to the others. The comparison was made with the help of the software quality model of the ISO/IEC 9126 standard (ISO/IEC 9126, 2001). The model includes a set of characteristics and sub-characteristics that can be evaluated for each product: Functionality, Reliability, Usability, Efficiency, Maintainability and Portability. The evaluation was performed as a self-evaluation and expresses only the subjective views of the author, based on the author’s 30 years’ experience in the software industry. The full evaluation criteria and the results for each characteristic of the different approaches are presented in Appendix E. The results of the evaluation are summarised below.

Tailored solutions by a data analysis expert can produce highly accurate and efficient solutions to customer problems. However, they rely heavily on the skills of the implementer. Usability, reliability, maintainability and portability may receive less attention, and interactive visualizations and their interoperability may be limited. Such solutions are also not cost-efficient: development and modifications require expensive expert work, and the acquired knowledge is transferred – or lost – through individual experts. This alternative might be recommendable for distinct and complex special cases, situations that have special performance requirements, or when no other solutions are feasible.

The solutions provided by commercial packages can be reliable, usable, and support security and integration well. However, they may not include all required analysis methods, and possibilities for modification can be limited. They may also require a tedious configuration and learning phase and may need a trained person to use the system. General-purpose solutions are not necessarily very efficient and require considerable computer resources. These kinds of solutions might be recommendable for enterprises that have varying data analysis needs and do not need the most accurate analysis methods.

An application-specific solution can provide an accurate and fine-tuned solution only if a suitable solution exists. Such a solution incorporates expert knowledge of the application area. Usability, security, maintainability and portability depend on the maturity of the product.

The domain approach can provide an accurate solution within the domain by capturing the special expertise of the domain. The standard data model makes the approach flexible: functionality can be adjusted, and security and interoperability features can be provided as standard services. The solutions become stable as they mature. This kind of solution could be recommended for companies analysing several kinds of data from a given domain, or for consultants who want to quickly construct analysis applications for customers.


9 Case HyperFit

HyperFit is an Internet service for the personal management of nutrition and exercise (Järvinen et al, 2008). It provides tools for promoting a healthy diet and physical activity. The service includes food and exercise diaries and tools for self-evaluation and personal analysis. It hosts a database of product-specific or average nutritional data on approximately 2,500 foods. The product-specific data comes from Finnish food producers (Fazer Bakeries, Raisio, Valio, Lännen Tehtaat) and the average data from the Fineli® Finnish Food Composition Database29. The application was built during a two-year project in 2005–2007 as part of the FENIX Technology Programme (FENIX – Interactive Computing) run by the Finnish National Technology Agency (TEKES).

The application was used during the project in several field tests by individual users, weight management groups and professional nutritionists. Many users continued using the system after the testing periods. The accumulated database currently contains information on the eating and exercise habits of 393 users, including 1,578 exercise diary entries and 14,688 food diary entries, providing an interesting source of information for nutrition research.

This chapter continues below with an introduction of the HyperFit data and the HyperFit visual analytics tool, followed by a step-by-step presentation of the tool’s generation process. Examples of reasoning using the tool are then given, followed by evaluation of the approach and discussion on the experiment. The objective in the evaluation is to examine how users are able to reason and gain insight using this kind of tool.

9.1 HyperFit data

The original HyperFit database (data model in Figure 54) contains the background information of persons (birth date, gender, height, start weight, waistline, hipline and work activity), personal goals (ideal weight and weight target, targets for exercise and daily steps), food diaries recording what people eat, when and how much, exercise diaries recording how people exercise, what sports, when and how long for, including the number of daily steps, weight measurements and feelings. In addition to personal data the database contains information on food products including identification and classification information, nutritional content, nutritional composition (e.g. alcohol and salt content), vitamins, minerals and ingredients. Products are classified according to a food taxonomy. Exercise also has a taxonomy based on name, type and physical load factor.

29 www.fineli.fi


Figure 54. Original HyperFit data (Järvinen, 2007).


9.2 HyperFit visual analytics tool

The HyperFit visual analytics tool is a software application constructed as part of the present work according to the principles described in Chapter 7. The tool enables users to explore the contents of the HyperFit database using a variety of analysis and visualization means. The tool could be used by professional nutritionists wishing to study the dieting and exercise habits of people, for example to discover new connections between mental health, sports and dieting. The application could also serve food retailers and producers and actors in the sports and sports equipment sector. Retail businesses, for example, could use the tool to determine the kinds of food products consumers eat and how they exercise.

A subset of the HyperFit data contents was selected for the visual analytics application (defined in detail in Section 9.2). The data completeness and the tool requirements guided the selection. Some data attributes were not usable due to insufficient user data entries (e.g. hipline and waistline). As the main purpose of this work was to test the feasibility of the concept, including every similar type of data attribute was not essential. To keep the implementation within reasonable limits, the product information was limited to the most important nutrients (kcal, protein, carbohydrates, sugar, fat, fibres). After cleaning, the database consisted of data from 108 persons over the time period from 1 December 2003 to 15 September 2006.

For analysis, a set of basic methods for single, bivariate and multivariate variables was selected. As the intended users were not necessarily familiar with data analysis, the objective was to use methods that are easy to understand and interpret. The user interface of the tool is the general-purpose user interface presented in Section 8.2. The tool was implemented as a mixed working and paper prototype. The data management and analysis were implemented with the R Stats Package30, while the user interface was simulated with a paper prototype.

9.3 Creating the application

Construction of the visual analytics application followed the process defined in Section 8.4. An overview of the process steps is provided below. The detailed process is documented in Appendix B.

Step 1: Define object of interest types

In this case only one object of interest is used: dieting people. The focus of interest is on what they eat, how they exercise, and how they feel.

Other possibilities for objects of interest could be food products and exercise types.

Step 2: Define properties

The background properties are gender, age, height, work activity (1 – light, 2 – medium, 3 – normal) and start weight. The measured properties are daily weight (kg), daily feeling (scores from 1 to 5), daily steps, daily exercise duration and exercise kcals, daily kcal from food, and daily kcals from protein, fat, carbohydrates, sugar and fibre.

30 A programming language for statistical computing under GNU General Public License


A detailed definition with types, units, coding, limits and frequencies is given in Appendix B.
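As an illustration, a few of these property definitions could be written down as configuration along the following lines; the values follow the text above, while the structure and field names are assumptions made for this sketch only.

```python
# Illustrative configuration sketch for a few of the Step 2 property types;
# the values follow the text, the dictionary structure is an assumption.
HYPERFIT_PROPERTIES = [
    {"name": "gender",        "kind": "background", "data_type": "nominal"},
    {"name": "work activity", "kind": "background", "data_type": "ordinal",
     "codes": {1: "light", 2: "medium", 3: "normal"}},
    {"name": "start weight",  "kind": "background", "data_type": "quantitative", "unit": "kg"},
    {"name": "daily weight",  "kind": "measured",   "data_type": "quantitative",
     "unit": "kg", "frequency": "daily"},
    {"name": "daily feeling", "kind": "measured",   "data_type": "ordinal",
     "lower_limit": 1, "upper_limit": 5, "frequency": "daily"},
    {"name": "daily kcal from food", "kind": "measured", "data_type": "quantitative",
     "unit": "kcal", "frequency": "daily"},
]
```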

Step 3: Define indexes

Three indexes are used: body mass index, dieting balance and food quality index.

Body mass index is defined as the individual's body weight divided by the square of his or her height. The index is a widely used measure for body fat, defining the “thickness” or “thinness” of a person. Index values less than 18.5 mean underweight, values from 18.5 to 25 normal weight, from 25 to 30 overweight, from 30 to 35 significant overweight, from 35 to 40 difficult overweight and values over 40 unhealthy overweight.

Dieting balance is an index developed in the HyperFit project. It compares a person’s daily energy (= energy from food – energy expended through exercise) with the daily energy target. Zero means dieting is in balance, a negative value indicates that the user has not eaten enough compared to their energy expenditure, and a positive value the opposite.

The food quality index is a simplified version of the food quality analysis developed in the HyperFit project and is a combination of dieting balance and food quality. The amounts of fat, hard fat, protein, carbohydrates, fibre and sugar are calculated and compared to the recommendations of the Finnish National Nutrition Council31. The user receives a score from 0 to 6 depending on how closely these values match the official recommendations. The food quality index was developed exclusively for this work and is not an official index.
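To make the index definitions concrete, the sketch below implements the body mass index classification and the dieting balance as described above. The food quality index is omitted because its detailed scoring rules are not reproduced here; the function names and example values are illustrative only.

```python
# Illustrative sketch of two of the three indexes; the BMI classes follow the
# text and the dieting balance follows its verbal definition. The food quality
# index scoring is not reproduced here.
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Body weight divided by the square of the height."""
    return weight_kg / (height_m ** 2)

def bmi_class(bmi: float) -> str:
    if bmi < 18.5:
        return "underweight"
    if bmi < 25:
        return "normal weight"
    if bmi < 30:
        return "overweight"
    if bmi < 35:
        return "significant overweight"
    if bmi < 40:
        return "difficult overweight"
    return "unhealthy overweight"

def dieting_balance(kcal_from_food: float, kcal_from_exercise: float,
                    daily_energy_target: float) -> float:
    """Daily energy (food minus exercise) compared with the daily energy target.

    Zero means dieting is in balance; a negative value means the person has
    eaten less than the target relative to their energy expenditure, a
    positive value the opposite.
    """
    return (kcal_from_food - kcal_from_exercise) - daily_energy_target

print(bmi_class(body_mass_index(70.0, 1.68)))   # ~24.8 -> 'normal weight'
print(dieting_balance(2100.0, 300.0, 1800.0))   # 0.0 -> in balance
```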

Step 4: Define visualization and analysis methods

Analyses and visualizations for single variables include basic statistics (mean, standard deviation, min, max), histograms, pie charts, clustering, time series (all measurements, means/sums of a time point), and auto-correlations. Bivariate analyses include calculating correlation coefficients, scatterplots, linear regression and cross-correlations. Multivariate methods include correlation networks and residual networks. A detailed table of the analysis and visualization methods is presented in Appendix B.

Step 5: Map to original data

In this step the values from the original HyperFit database are transformed to the model objects and properties. The mappings from the original HyperFit database are presented in Appendix B.

Step 6: Add application-specific features

Indexes and calculated value formulas are implemented. The user interface is constructed from the data model. In this case no descriptive visualizations are used.

31 http://www.ravitsemusneuvottelukunta.fi/portal/en/national_nutrition_council/ (accessed 08.11.2012)


9.4 Reasoning examples

Two examples of how reasoning is conducted with the tool are presented in this section. The examples cover two kinds of analysis task: answering questions and finding explanations. The tasks are “How feelings relate to dieting” and “Find explanations for overweight”. The cases were used as part of the user evaluation.

How feelings relate to dieting

The user selects the Feelings object and the indexes that measure dieting – Body mass index, Dieting balance and Food quality – from the data model to the work area. These form a network (Figure 55a). Firstly, the user wishes to study the correlation between Feelings and Dieting balance. The user selects the edge between these objects and receives suggestions for analysis (Figure 55a). The user selects correlation and scatterplot and gets the resulting visualization (Figure 55b), which shows that there is only a small correlation (R=0.12) between feelings and dieting balance. Similarly, the user receives the other correlations and the final results with scatterplots as shown in Figure 55c. There is a slight negative correlation between food quality and feelings, indicating that as food quality improves, feelings deteriorate and vice versa.

a) b) c)

Figure 55. Feelings and dieting

From here the user can continue the analysis, for instance by selecting and brushing subsets and continuing to examine questions such as “How do feelings and food quality correlate with people who exercise a lot”, or by searching for details on a specific group “Which people have low feelings and poor food quality, and what do they eat”. Figure 56 shows the results of the latter case.


Figure 56. Continuing the analysis

Finding explanations for overweight

In the next example the user examines what causes some people to have normal and others high body mass indexes, and what the differences in eating or exercising behaviour between these groups are. First, the user selects the Body Mass Index object from the data model and plots a basic histogram. From there, the user selects two subsets: normal-weight (BMI under 25) and overweight people (BMI 25 or over). The system then generates two data subset objects in the work area (Figure 57).

Figure 57. Subsets of body mass index.


Next, the user wants to study the differences in the number of daily steps, exercise duration and consumed kcals. The user picks the corresponding variables into the work area and plots the basic statistics and histograms. An example is shown in Figure 58. Surprisingly, overweight people take more steps than normal-weight people.

Figure 58. Daily steps of normal and overweight people

The exercise results are summarized in Table 4. No other dramatic differences were found in this data set. Normal-weight people seem to exercise slightly longer and expend slightly more kcals during exercise than overweight people.

Table 4. Exercise of normal and overweight people

Exercise measurement   Unit          Mean, normal-weight   Mean, overweight
Steps                  daily steps   5511                  6480
Exercise duration      min           49                    42
Exercise kcals         kcal          333                   326

Users can similarly study differences in food constituent consumption, as shown in Table 5. Both groups seem to have quite similar eating habits. A surprise finding was the level of sugar consumption, with normal-weight people found to consume more sugar than overweight people.


Table 5. Food consumption of normal and overweight people.

Food constituent   Unit   Mean consumption, normal-weight   Mean consumption, overweight   Recommendation/day
Fat                E%     32                                34                             25-35 E%
Sugar              E%     17.5                              17.3                           max 10 E%
Fibre              g      27.1                              28.0                           25-35 g
Protein            E%     17.7                              18.5                           10-20 E%
Carbohydrates      E%     47.7                              44.0                           55-60 E%

9.5 User evaluation

The objective of the user evaluation was to determine whether users were able to find interesting observations and insights, both expected and unexpected, from the data, and whether the constructed application was usable and useful. The specific evaluation questions were:

· Is the data model a good starting point for gaining an overview of the data?

· Is the guidance obtained from the data model understandable and useful?

· Is the set of visualizations that the system provides sufficient and easy to interpret?

· Is the user able to continue the analysis using the visualizations by selecting subsets and adding new data objects to the analysis? and

· Do the users find useful information, expected and unexpected, and get new insight with the help of the application?

9.5.1 Test settings

Evaluation method

The selected evaluation method was a combination of a controlled insight-based method and traditional usability testing. The users performed predefined test tasks in a controlled environment, thinking aloud during the testing. The test tasks were planned based on the principles defined by North (2006), according to which the user is allowed to explore the data in a way that they choose, starting with the help of initial questions. Insights are captured throughout the session, along with traditional usability metrics.

Three test sessions were arranged: (1) a traditional “thinking aloud” usability test, (2) pair testing where feedback was gathered from the discussion between the test persons, and (3) a pilot test to verify the test settings. The results of the pilot test were used only to develop the actual tests and are not included in the results.

Users

The test users represented both domain experts and semi-experts. One was a professional nutritionist, two were researchers of user experience, and one was a casual user interested in nutrition. A summary of the test persons is given in Table 6.

Table 6. Test users.

Test     Gender   Age   Role in testing                                                   Experience
Test 1   Female   62    Professional nutritionist, specialised in counselling             Familiar with nutrition science and the original HyperFit system
Test 2   Female   40    User experience (UX) researcher                                   Familiar with UX research
Test 2   Male     37    User experience (UX) researcher and multivariate data analyst     Familiar with UX research and multivariate data analysis
Pilot    Female   39    Casual user                                                       Interested in nutrition

Test environment

A paper prototype was constructed. The data model and the work area were represented by laminated paper sheets. The data objects consisted of laminated pieces of paper that the user could pick from the model and place in the work area. The suggested analysis alternatives also consisted of laminated pieces of paper that the test moderator added to the work area as a computer would have done. Based on the user’s choice, the moderator added the pre-calculated visualizations and analysis results to the work area. A rubber band represented the subsetting tool. The test sessions were recorded. Figure 59 shows the test setting.

Figure 59. Test setting.


Flow of the test

The test session started with an introduction to the test process. The user was asked to fill in a questionnaire on the user’s background. It was confirmed to the user that the purpose was to test the concept, not the user. Permission to record the test was requested.

The user interface was introduced by means of a brief example. The testing scenario was introduced and the test tasks given. The user was encouraged to think aloud when performing the tests.

The tests were not identical in the two test rounds. After the test with the professional nutritionist, a couple of corrections were made to the data. The absolute grams of the food constituents were converted to energy percentages (E%), which had a slight effect on the analysis results and on the comments and conclusions of the users in the second test. The class limits for overweight and normal-weight people were also redefined.

After performing the test tasks, the users were asked a couple of interview questions (Appendix C) to obtain qualitative information. Finally, the user was presented with a small gift for their participation. Each test took approximately 1.5 hours.

Test scenario and tasks

The test scenario was

“Imagine that you are a nutrition researcher and you have been given this interesting database and a new tool to study it. You have all kinds of questions, beliefs and suspicions that you want to get answers to.”

The users were given five tasks to complete. These included free browsing, finding answers to questions, looking for explanations, and confirming beliefs and suspicions. The tasks required studying basic statistics, time series, histograms, correlations between two or more variables and scatterplots, as well as selecting subsets from visualizations and continuing the analysis with the subsets. The tasks were:

1. Browse and familiarize yourself with the data.

Examine the background variables and indexes to see what kinds of people are dieting. Did you discover anything of interest?

2. What is the relationship between feelings and dieting?

Study the correlations between feelings and the indexes: body mass index, food quality and dieting balance. Did you notice anything of interest? What would you like to examine further?

3. Find explanations for overweight.


Have a look at the body mass indexes. Form subgroups of overweight and normal-weight people (index value over/under 25). Do they exercise differently? Compare steps, exercise duration and kcals. Do they eat differently? Compare the fat, sugar and fibre intake of both groups. What do you find?

4. Is eating less carbohydrates a good way to diet?

Examine the correlations between carbohydrates and the weight index and dieting balance index. What do you notice?

5. Examine the correlation between food constituents.

What is the biggest source of energy? Try omitting this from the analysis and see what happens.

Metrics

The testing focused on measuring user satisfaction, drawn from the observations and interviews, and the insights gained. The individual metrics were the number of comments indicating overall satisfaction, usefulness and ease of use, and the number of assists by the moderator, which indicates ease of use. Insights, or “aha experiences”, were counted from the number of findings, both expected and unexpected. In addition, the number of deductions made based on the findings and the number of new ideas on how to continue examining the data were also counted.

The typical quantitative metrics for efficiency and effectiveness, such as completion rate and fail rate, were not used here, as the main emphasis was not on the efficient use of the user interface – especially as there was no implemented user interface in use – but on finding interesting information using the tool.

Analysis of results

For the analysis of the results, the session recordings were transcribed. The findings related to the predefined metrics were then identified from the transcriptions and interviews and documented in Excel sheets. One sheet documented user satisfaction and contained categorized expressions of overall satisfaction, usefulness, ease of use and assists. Another sheet documented each task, categorizing expected findings (the user expected them), unexpected findings (a surprise for the user), conclusions (“this means that…”) and ideas for continued action (“next I would like to…”).

9.5.2 Results

The main objectives of the user testing were to determine whether the constructed application is usable, useful, understandable, and sufficient for performing analyses, and whether it helps the user to find interesting things from data. The test results revealed a working user interface and satisfied users. After brief initial uncertainty, users quickly learned to use the user interface independently to make findings and draw conclusions.


The test results are presented in detail below, classified as overall satisfaction, usefulness, ease of use and insights.

Overall satisfaction

The comments related to overall satisfaction were positive: “It was fun”, “Positive experience”, “Really interesting”, “Clearly positive”, “A really interesting tool”. Specific comments were also given related to the data: “This data is interesting”, “This food is interesting”. The appearance of the user interface was also liked: “Easy, visually successful”, “Nice graphical layout, variables visible”. No negative comments were given.

Usefulness

All users found uses for the tool. According to the professional nutritionist, its primary use would be in nutrition research. She did not consider the tool to be very useful for counselling individuals, but possibly for counselling groups. The UX researchers considered the tool suitable for multivariate analysis and for showing and studying the results. The data analyst saw possible uses for the tool in analysing data from user experience research.

Ease of use

The tool was considered to be simple and straightforward to use. Initially, the users experienced a degree of uncertainty, but quickly learned their way around the user interface and the analysis process: “Easy tool; a bit confusing at first – but it soon became clear during the first two tasks”.

Selecting objects for analysis, selecting analysis and visualization methods, and selecting subsets with the “rubber band” were considered easy and fun: “Variables visible and easy to select to the work area; same applies to the set of analyses”.

Users quickly developed their own ideas of how to progress further, select new subsets, and study new correlations with other variables. Unfortunately, the paper prototype, with its limited number of pre-processed analyses, restricted how far the users could pursue these ideas.

Basic statistics with histograms, networks of correlations and scatterplots were the most used visualizations. Histograms were the most familiar and easy to interpret. The users studied the shapes of the histograms, the tails, and individual instances. Interpreting correlations proved less straightforward, although the use of scatterplots helped in this respect. The users studied patterns and outliers in the scatterplots with interest.

Interpreting the visualizations was easy for the nutritionist, for whom the data domain was more familiar. The other test persons felt that more nutritional background knowledge would be useful to interpret the results. The non-professionals had difficulty interpreting negative correlations, while the nutritionist experienced no problems in this respect.

Assistance was needed to explain concepts (e.g. dieting balance) and the use of certain features, and when viewing a visualization for the first time. In the first test assistance was needed ten times, in the second eight times.

The paper prototype does not give a full picture of the ease of use, but rather introduces the idea. Preserving ease of use in actual implementation requires special attention. Two improvement areas were highlighted by the users: firstly, the addition of an overview picture of the whole dataset and, secondly, a help service to explain the meaning of different analyses and nutritional concepts.

Insights

In both tests the users made numerous findings, both expected and unexpected, and were able to draw conclusions based on them. Table 7 shows the numbers of insights, unexpected findings, conclusions and ideas for further analysis.

Table 7. Insights.

Test     Insights   Unexpected findings   Conclusions   Ideas for further analysis
Test 1   15         5                     8             5
Test 2   20         1                     11            5

Examples of the main findings and conclusions from the tasks are presented below.

Who is dieting?

"This is funny, why are the normal-weight people dieting?" "Surprisingly many men, quite unexpected!" "A surprising number over thirty and middle-aged people" "One anorexic, 46 kg!" "Mainly light work"

How do they eat?

"People seem to eat fiber quite well, unbelievable good quantity!" "Looks like people eat too much fat." "But people eat terribly much sugar!” "No extremes in food quality, goes up and down, getting worse towards Christmas” "Strongest sources of energy are sugar and fat, biggest negative is fibre" "So switching to a sugarless diet means most energy comes from fat?" "Fibre seems to be an independent variable"


How do they exercise?

"Unbelievable, they don’t exercise at all, just walk for half an hour" "Overweight burn more energy during exercise, normal-weight exercise longer. Maybe the overweight try harder while normal-weight people exercise more steadily" "The overweight take more steps than the others, they try to lose weight by walking" "This one’s a sports addict”

How do they feel?

"Between two and three, a bit on the positive side”. "One seems to be quite low." "Negative correlation with dieting balance, but very small, nobody seems to eat to sorrow" "Negative correlation with food quality, when food quality improves feelings deteriorate – no wonder, it means stress"

How’s the diet?

"People seem to gain more weight than lose" "Looks like people burn too little energy and eat too much" "No real zero-carb people here, i.e. getting less than 30% from carbs; can’t say based on this data whether cutting out carbs is beneficial"

Ideas for continuing the analysis

In both tests the users came up with new ideas about what to examine. These included examining other interesting subsets, searching for details, using different time granularities, and studying the effects of changes: “It would be interesting to know what foods people are getting most fibre from, bread or vegetables?”, “As food quality changes, feelings change – what happens here? What do people eat more or less of?”.

9.5.3 Answers to the evaluation questions

Based on the results, the following answers to the evaluation questions can be suggested:

Is the data model a good starting point for gaining an overview of the data? The model does not provide an explicit overview of the data; rather, it provides a good starting point for analysis and for selecting variables. It shows the variables that are available and enables their easy selection. Another kind of overview could be considered.

Is the guidance obtained from the data model understandable and useful? Users were content with the analysis method suggestions. For a user familiar with data analysis the current model would be sufficient, but for others some additional help with the method descriptions might be useful.

Is the set of visualizations that the system provides sufficient and easy to interpret? The visualizations used, namely basic statistics, histograms, time series, scatterplots and correlation networks, were relatively easy to interpret. The users could study the visualizations and focus on interesting patterns and details. For the person familiar with the domain the visualizations revealed more than for the others, and she was able to make the interpretations more quickly. The only problematic issue was the interpretation of negative correlation.

Is the user able to continue the analysis using the visualizations by selecting subsets and adding new data variables? Yes, these were easily learned and much-liked features.

Do the users find useful information, expected and unexpected, and gain new insight with the help of the application? Yes, the analysis confirmed users’ assumptions, brought to light numerous findings, both expected and unexpected, and helped to gain new insight.

9.6 Discussion

In the application construction, extracting, cleaning and harmonizing the data took the most effort. Data extraction proved to be complicated. For example, obtaining the daily kcal intake of each person from the food diaries in the HyperFit database involved several steps. The database does not store this attribute directly. Instead, there is a table showing which food products each person has eaten, at what time of day, and in how many portions. To determine the daily kcal intake, a chain of processing steps is required. First, for each product eaten, the product category and portion size in grams must be determined. Next, the product information must be examined to determine the number of kcals per 100 g of the product. Even then, the kcal value is not directly accessible: one has to find the code that represents kcals and look up the value corresponding to this code in a separate nutritive table. The kcal intake from the product can then be calculated. After summing over all products eaten by each person during the day, the information is available for visualization and analysis.
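As an illustration, the chain can be sketched in R roughly as follows. The table and column names (diary, products, nutritive, ENERC_KCAL) are hypothetical stand-ins, not the actual HyperFit schema, and the values are miniature examples only:

# Hypothetical miniature versions of the food diary and product tables
diary <- data.frame(person_id = c(1, 1, 2), date = "2006-10-02",
                    product_id = c(10, 20, 10), portions = c(1, 2, 1.5))
products <- data.frame(product_id = c(10, 20), category = c("bread", "milk"),
                       portion_g = c(30, 200))
nutritive <- data.frame(product_id = c(10, 10, 20, 20),
                        code = c("ENERC_KCAL", "FIBER", "ENERC_KCAL", "FAT"),
                        value = c(250, 5, 60, 3))       # per 100 g of product

kcal <- subset(nutritive, code == "ENERC_KCAL")         # assumed code for kcal
d <- merge(diary, products, by = "product_id")          # attach category and portion size
d <- merge(d, kcal, by = "product_id")                  # attach kcal per 100 g
d$kcal <- d$portions * d$portion_g / 100 * d$value      # kcal from each diary entry

# daily kcal intake per person, ready for visualization and analysis
aggregate(kcal ~ person_id + date, data = d, FUN = sum)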

Another time-consuming step was deciding which variables the user would be interested in, and in what form the variables should be shown. A domain expert would be useful in this step, as well as in defining the indexes. In this case, only the body mass index is an official index; the other two were constructed specifically for the user testing in the present study. As the user interface was not implemented in practice, one can only speculate as to what its actual implementation might entail.

The clear benefit of this approach was that once the data and indexes were entered into the database, using the analysis and visualizations was straightforward.

The results of the user evaluations suggest that the constructed system fulfils the expectations well. They also suggest that the system would be most beneficial to domain experts who are familiar with the domain but not with data analysis. This kind of tool could free such experts to focus on the data instead of pondering what analysis methods to use or seeking the assistance of a data analyst.


10 Case MMEA

The focus of the MMEA case was on determining how well the concept fits other kinds of data, and on building an application-specific user interface. A visual analytics tool was constructed to analyse the energy efficiency and indoor conditions of buildings. The user interface included descriptive visualizations of the objects of interest. In this case, no user evaluation was performed, only a self-evaluation of the experiences of constructing the application.

MMEA Indoor is an ongoing pilot project carried out as part of the MMEA (Measurement, Monitoring and Environmental Assessment) research programme under the Finnish Cluster for Energy and Environment (CLEEN)32. The MMEA programme develops applications for environmental and measurement data. The data is accessible from the MMEA Platform, which serves as a “market place” for environmental and measurement data. Data collected from sensors and databases is delivered to applications in a unified format by means of a message service provided by the platform. The focus of the MMEA Indoor pilot is energy-efficient indoor environments. The project develops tools for building users and operators to enable better control of their indoor environments, aiming at energy-efficient, comfortable and productive conditions.

This chapter begins with an introduction to the MMEA visual analytics tool. Next, the tool generation is presented step-by-step. The user interface is then introduced and examples are given of how reasoning is carried out with the tool. The chapter ends with a self-evaluation and discussion on the experiment.

10.1 MMEA visual analytics tool

The MMEA tool is a visual analytics application constructed based on the principles described in chapter 7. The tool is intended for use by specialists for problem identification and diagnostics. Users can explore and navigate the data, launch analyses and visualizations, request details and examine interesting subsets. The intended user is an expert or researcher in building indoor conditions and energy usage. Casual web users could also use the tool to check indoor quality and energy efficiency information. The knowledge obtained could be used to produce alerts and warnings and to control building systems.

The monitored objects are office buildings in the research campus area in Otaniemi, Finland. The data consists of measurements of power, reactive power, district heat and water consumed by each building; building gross volume, gross floor area and building age; indoor conditions of the rooms (from one building only), including room temperature, CO2 and space occupancy; and environmental measurements of the campus area, including outdoor temperature and relative humidity.

The measurements are stream data with an accumulation rate of one measurement per hour. In the case study a snapshot of one week (January 17 – January 23, 2011) was used. The data used consisted of information on 24 buildings, including 2,856 power measurements, 2,109 reactive power, 678 water and 2,016 district heat measurements; the indoor conditions of one building, including the room temperature of 217 rooms (36,456 entries), the occupancy of 255 rooms (42,840 entries) and CO2 measurements of 7 meeting rooms (1,176 entries); and temperature and humidity measurements for the Otaniemi campus area (186 entries).

32 CLEEN = Cluster for Energy and Environment, a company managing the Strategic Centre for Science, Technology and Innovation for energy and environment in Finland

The system calculates indexes for the energy performance of buildings and indoor conditions of building spaces. The set of visualizations and analysis methods includes basic statistics (min, max, mean, standard deviation), histograms, time series with different time granularities (hour/day/week), clustering, correlations, scatterplots, linear regression, auto-correlations and cross-correlations.

In this case the user interface was constructed using descriptive visualizations of the objects of interest. A map of the Otaniemi campus area and an IFC33 building model of one of the buildings were used.

The tool was implemented as a mixed working and paper prototype, similarly to the HyperFit tool. Data management and analysis were implemented with the R Stats package, while the user interface was simulated with a paper prototype.

10.2 Creating the application

The application construction followed the process steps defined in Section 8.4. Further details are provided in Appendix D.

Step 1: Define object of interest types

Three object of interest types are used: geographical area, area buildings and building spaces. These form a hierarchy: building spaces belong to buildings, which belong to a geographical area.

Step 2. Define properties

The background properties of each object of interest type are (1) Buildings: Gross volume (m3), gross floor area (m2) and age (years); (2) Building spaces: purpose of use (meeting room / office room).

Measured area properties are temperature (°C) and humidity (%). Building measurements are power (kWh), reactive power (kVARh), water consumption (m3) and district heat (kWh). Building space properties are indoor temperature (°C), CO2 (ppm) and space occupancy (occupied/not occupied). A detailed definition including types, units, codings, limits and frequencies is given in Appendix D.
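A minimal sketch of how these object types and their properties might be laid out as tables following the domain data model is given below. The names and values are illustrative only, not the actual MMEA schema:

areas <- data.frame(area_id = 1, name = "Otaniemi campus")

buildings <- data.frame(
  building_id  = 1:3,
  area_id      = 1,                        # buildings belong to an area
  gross_volume = c(25000, 18000, 40000),   # m3 (illustrative values)
  gross_floor  = c(8000, 6000, 12000),     # m2
  age          = c(35, 12, 48)             # years
)

spaces <- data.frame(
  space_id    = 1:4,
  building_id = c(1, 1, 2, 3),             # spaces belong to buildings
  purpose     = c("office", "meeting", "office", "meeting")
)

# hourly measurements are rows that point to an object of interest and a property
measurements <- data.frame(
  object_id = 1,
  property  = "power_kWh",
  time      = as.POSIXct("2011-01-17 00:00") + 3600 * (0:23),
  value     = runif(24, 50, 120)
)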

Step 3. Define indexes

Two kinds of indexes are used: (1) Building energy performance index: formed by dividing the building energy consumption by the building volume; and (2) Indoor air quality index: calculated using the classification of indoor climate by the Finnish Building Information Foundation (RTS)34. There are three classes indicating the quality of indoor conditions: S1 individualized, S2 good, S3 satisfactory. The class is determined by the room temperature, room humidity and outdoor temperature. Both indexes are calculated per hour. A detailed definition of indexes with types, units, codings, limits and frequencies is given in Appendix D.

33 Industry Foundation Classes (IFC), an open, neutral and standardized specification for Building Information Models (BIM), (buildingsmart.com/standards/ifc) (accessed 18 December 2012)
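The index calculations of this step could look roughly as follows in R. The indoor classification rule below is a placeholder only; the actual S1/S2/S3 limits come from the RTS indoor climate classification and also involve the outdoor temperature and humidity:

# (1) building energy performance index: hourly consumption divided by building volume
hourly <- data.frame(building_id  = c(1, 1, 2),
                     energy_kWh   = c(90, 110, 60),          # illustrative values
                     gross_volume = c(25000, 25000, 18000))
hourly$energy_index <- hourly$energy_kWh / hourly$gross_volume

# (2) indoor air quality index: S1/S2/S3 class from room and outdoor conditions
classify_indoor <- function(room_temp, room_hum, out_temp, target = 21.5) {
  dev <- abs(room_temp - target)                             # placeholder rule only
  ifelse(dev < 0.5, "S1", ifelse(dev < 1.5, "S2", "S3"))
}
classify_indoor(room_temp = 22.4, room_hum = 35, out_temp = -10)   # "S2"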

Step 4. Select visualization and analysis methods

Analyses and visualizations for single variables include basic statistics (mean, standard deviation, min, max), histograms, pie charts, clustering, time series (all measurements, means/sums of a time point), and auto-correlations. Methods for bivariates include correlation coefficient, scatterplot, linear regression, cross-correlations; and for multivariate data correlation matrix, correlation network, residual plots and PCA. A detailed table of analysis and visualization methods used is given in Appendix D.
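Most of the listed methods map to functions available in a base R installation, which the implementation relied on. The following sketch runs them on synthetic hourly data (one illustrative week); the data frame and its values are invented for demonstration only:

set.seed(1)
week <- data.frame(power = rnorm(168, 100, 20),   # 7 x 24 hourly values (synthetic)
                   heat  = rnorm(168, 300, 50),
                   water = rnorm(168, 2, 0.5))

summary(week$power); sd(week$power)      # basic statistics (min, max, mean, sd)
hist(week$power)                         # histogram
acf(week$power)                          # auto-correlation
ccf(week$power, week$heat)               # cross-correlation of two series
cor(week)                                # correlation matrix
plot(week$heat, week$power)              # scatterplot of a bivariate pair
abline(lm(power ~ heat, data = week))    # linear regression line
kmeans(scale(week), centers = 3)         # clustering
prcomp(week, scale. = TRUE)              # PCA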

Step 5. Map to original data

In this case there is no need for mapping as the measurement data comes from the MMEA Platform. Pre-processing is performed already on the platform.

Step 6. Add application-specific features

Application related data is added, including details of the buildings and building spaces: the names and addresses of the buildings, building coordinates, purpose of use of a building space, and the location of the space in the building model.

10.3 The user interface

The user interface is more complicated than in the HyperFit case, as there are now several objects of interest forming a hierarchy, as well as descriptive visualizations. A map of the area and a building IFC model are used in the user interface to locate the buildings. Interacting with the map and the building model requires application-specific implementation.

The area map or building model was used as the starting point for navigation. Menus were used as an alternative way to navigate the data, especially to support navigation in hierarchies. The data model was not used in this case, although it could be added as an alternative means of browsing the data.

This section introduces the user interface layout, how the reasoning tasks are performed and how direct manipulation, coordinated multiple views and brushing work.

34 https://www.rakennustieto.fi/index/english.html (accessed 12 March 2012)


Layout

Figure 60 represents the user interface layout. The surface is divided into three sections. From the left area the user can select objects of interest for analysis, including the whole area, buildings and building spaces, and their properties. The right side shows the user selections and the analysis suggestions. The middle area is for the descriptive visualizations and the analysis results. The descriptive visualizations and analysis results are interlinked so that highlighting an object or area is shown in both kinds of visualizations, if possible. The user can select subsets of data for further analysis from both the descriptive and abstract visualizations.

Figure 60. User interface layout.

Giving overview

The overview shows, on the descriptive visualization, the status of the indexes for the object of interest that the user has selected from the hierarchy. Figure 61 shows an overview of all Otaniemi buildings, and Figure 62 an overview of one Otaniemi building. The building indexes are visualized using different symbols and colours. By clicking the index symbols the user can quickly get the basic statistics, histograms, time series and raw data.


Figure 61. MMEA Otaniemi overview.

Figure 62. Digitalo overview.

Forming datasets for analysis

The user forms the dataset for analysis by selecting the objects of interest and their properties from the menus on the left. The resulting selections are shown on the right under Selected variables. For example, in Figure 63 the user has selected one building and two variables for analysis. The user can select several variables from one or more objects of interest. The time scale tool allows the user to adjust the time period. The selection of variables is not described in more detail, as it is a basic user interface element and not essential here.

Figure 63. Selecting variables.

Selection of the analysis and visualization method

After the dataset has been selected, the system suggests suitable analyses to the user. The user selects variables by clicking them and receives the suggestions. The user can also adjust the time granularity of the analysis (hour/day/week). In the example in Figure 64 the user has selected three variables and received suggestions for multivariate analysis.

Figure 64. Analysis suggestions.
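The suggestion step can be thought of as a lookup in the metadata attached to the data model: each method records how many variables it applies to, and the suggestions are filtered by the user's selection. The sketch below is a simplified illustration with hypothetical names; a fuller rule set could also take the data types of the selected variables into account:

# hypothetical metadata: how many selected variables each method applies to
methods <- data.frame(
  method   = c("histogram", "basic statistics", "time series", "auto-correlation",
               "scatterplot", "linear regression", "cross-correlation",
               "correlation matrix", "correlation network", "PCA"),
  min_vars = c(1, 1, 1, 1, 2, 2, 2, 3, 3, 3),
  max_vars = c(1, 1, 1, 1, 2, 2, 2, Inf, Inf, Inf)
)

suggest <- function(n_selected) {
  subset(methods, min_vars <= n_selected & n_selected <= max_vars)$method
}
suggest(3)    # multivariate suggestions, as in the Figure 64 situation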


Showing the analysis results in visual form

The results are shown in the middle area in independent windows, as in the HyperFit case (Figure 65). The descriptive visualization remains in the background.

Figure 65. Analysis results.

Picking subsets

The user can pick subsets in a similar way to the HyperFit tool. The selected objects are also highlighted on the descriptive visualization (Figure 66).

Figure 66. Selecting subsets.


Brushing and details on demand

Brushing is performed in the same way as with the HyperFit tool, but the selected objects are also highlighted on the map visualization. The user can get details from both descriptive and abstract visualizations.

10.4 Reasoning example

Let us assume that the user is interested in the energy efficiency of the campus area buildings. First, the user views an overview of the energy efficiency of the buildings on the map. The different colours indicate the index values of each building, showing a rough estimate of the energy performance of each building. To get a better overview, the user wants to see the basic statistics and time series (Figure 67). These show two interesting things: most buildings have a low energy performance index (i.e. low energy usage), but some have very high values. Naturally, energy usage changes considerably at weekends and during day and night time. To link the analysis results to the actual buildings, the user selects the area of the visualization in question to highlight the corresponding buildings on the map.

Figure 67. Basic statistics and time series.

Next, the user wants to determine possible reasons for the high variance in energy performance index values. The first idea is to compare building age with the index values. A scatterplot showing correlation coefficient and regression line gives an indication that there is some correlation between age and energy performance index (Figure 68).


Figure 68. Correlation with age.
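The analysis behind this kind of view amounts to a few lines of R. The data below is synthetic and serves only to illustrate the correlation coefficient, scatterplot and regression line:

set.seed(2)
age   <- sample(5:60, 24, replace = TRUE)             # 24 buildings (synthetic ages)
index <- 0.002 * age + rnorm(24, mean = 0.05, sd = 0.02)

cor(age, index)                                       # correlation coefficient
plot(age, index, xlab = "Building age (years)",
     ylab = "Energy performance index")               # scatterplot
abline(lm(index ~ age))                               # regression line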

The user can select an object from the scatterplot and view the corresponding buildings on the map. In the example, the user is interested in plots that indicate a high index value and high building age. These are shown to come from a building that is highlighted on the map. The user can also select data subsets of visualizations and perform further analysis with these subsets. In the example in Figure 69 three clusters are formed based on the indexes, and the statistics and details of each cluster are shown.

Figure 69. Analysis of subsets.


The second example is of a multivariate analysis aimed at studying correlations between energy performance index, district heat and water consumption (Figure 70). The example shows a strong correlation between district heat and energy performance. The user wants to examine this correlation further by producing a scatterplot. The scatterplot shows an interesting pattern, which the user also wishes to investigate further. The pattern proves to represent a building, which is shown on the map.

Figure 70. Multivariate analysis example.
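The multivariate step can be sketched as a correlation matrix over the selected variables, after which the strongest pair is examined with a scatterplot. Again the values are synthetic and only illustrate the mechanics:

set.seed(3)
heat  <- rnorm(24, 300, 60)                    # district heat (synthetic)
water <- rnorm(24, 2, 0.5)                     # water consumption
index <- 0.0004 * heat + rnorm(24, 0.02, 0.01) # energy performance index

vars <- data.frame(index, heat, water)
round(cor(vars), 2)                            # correlation matrix of the selection
plot(vars$heat, vars$index)                    # examine the strongest pair further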

10.5 Discussion

Two conclusions can be drawn from the experiment. The first relates to the construction of the application, and the second to the user interface including application-specific descriptive visualizations.

Constructing the application

Applying the step-by-step generation of the application was straightforward and did not require changes to the concept. However, identifying the objects of interest, in other words the objects to be monitored, might be problematic for the user. In the example case, the MMEA project had done the preparatory work and identified these beforehand. The same problem applies to the properties and indexes. Choosing which variables and indexes to form and monitor requires prior examination and decision making. In the example case the variables and indexes were specified by domain experts. This suggests that, as with any other software project, a requirements analysis phase conducted by domain experts is necessary before implementation.

The set of visualization and analysis methods was feasible and did not require modifications. Mapping to the original data required minimal effort as the data was already in the right format, coming from the MMEA platform. Only implementation of the indexes required coding.

In this case, too, the user interface was not fully implemented in practice, and thus one can only speculate as to what its actual implementation might entail. The hierarchy of objects of interest and the use of descriptive visualizations both bring added complexity to the user interface compared to the HyperFit tool.

The experiment suggests that applications can be constructed in this way inside the domain.

User interface

The main difference compared to the HyperFit solution was the use of descriptive visualizations. This was possible because the objects of interest were the kinds of objects that could be positioned on maps and construction models. Informal demonstrations suggested that using the descriptive visualizations was helpful by providing further context to the findings. This suggests that combining different kinds of visualizations can result in more powerful reasoning tools than using only abstract visualizations. This is possible only in cases where natural descriptive visualizations exist.


11 Summary and conclusions

This work examined the possibilities and advantages of utilizing data models in visual analytics. Data modelling is an established method in software engineering, but not common practice in information visualization and data analysis. Visual analytics is a recent approach for obtaining knowledge from data masses. The approach has its roots in information visualization and data analysis and combines the strengths of automatic data processing and the visual perception and analysis capabilities of the human user.

This work introduced the concept of visual analytics, the state of the art of visual analytics research, and commercial markets. The building blocks of this multi- discipline research area were represented in the form of a framework constructed as a part of this work. In addition, the methods for evaluation of visual analytics were reviewed. The two other key elements of this work, data modelling and monitoring data, were also introduced. The backbone of this work is the domain data model. The model incorporates the main concepts of a given domain, which remain similar regardless of the application, but which can be tuned for visualization and analysis purposes. Three uses of the domain model were studied. The first was using the domain model in the construction of visual analytics applications. The second was to support reasoning by giving suggestions of suitable visualizations and analysis methods with the help of metadata added to the model. The third was using the data model as a user interface element, offering a new approach for navigation in large collections of data. To keep the work within reasonable limits the study focused on the monitoring domain.

A concept for a visual analytics tool for analysing monitoring data was drafted to study the applicability of the domain model approach. The concept includes a reasoning process, a user interface, and a set of visualization and analysis methods for monitoring data. In addition, the process of creating a new application within the domain was defined. The concept was evaluated using two cases from different application areas within the monitoring domain. The first was HyperFit, a visual analytics tool intended for professional nutritionists to support research on dieting and exercise habits. The other was the MMEA indoor pilot for studying energy efficient indoor environments. The purpose of the evaluations was to determine whether users can reason and gain insight using this kind of tool and determine the applicability of the concept to other kinds of monitoring data.

Discussion

The suggested approach utilized several of the results and methods introduced in the visual analytics framework and also brought new contributions to the area. Figure 71 illustrates these aspects. The area to which this work brings the greatest new contribution is infrastructure. The work proposes a domain-specific infrastructure, based on the domain data model. The model forms a standard template for storing data and constructing applications in the domain. User interaction is another area of contribution. Using metadata to support reasoning, and the data model based approach to navigating data, are new concepts in visual analytics. The reasoning process was constructed based on existing approaches. The most influential of these was the process by Pirolli and Card (2005). The variety of visualization and analysis methods presented a range of potential methods for this work. The methods chosen for the experiments were data exploration and descriptive methods; the emphasis was on familiarity and being self-explanatory. The user interface techniques used were direct manipulation, coordinated multiple views, and brushing. The human perception results guided the interface design.

Figure 71. Relation of this work to the visual analytics framework. Blue bubbles represent utilized areas and orange stars contributed areas.

The cornerstones of the framework, problems, data and insight, express the boundaries of this work. The problems were limited to the monitoring data domain, which falls into the categories “making judgement on an issue based on evidence”, “discovering new understanding” or “supporting decision making”. This work considered static history data, supplemented with metadata and descriptive data. Insight was understood as interesting findings obtained from the data, expected or unexpected.

A set of research questions was presented at the beginning of the study. The answers to these that emerged from the study are discussed here in the form of a self-interview.

Are data models useful in visual analytics? The results of this work suggest that they are. All three suggested uses proved to be useful. The data model based architecture provided a flexible mechanism for constructing new applications within a single domain. Support for reasoning, based on metadata, freed the user to focus on the domain instead of finding suitable methods for analysis. As regards the user interface element, the data model showed clear and understandable concepts to the user.


What kind of data model is useful? A semi-generic domain model was suggested, being a general-purpose data model for the whole domain. It includes the key concepts of the domain, those that remain similar from one application to another in the domain. The model is in a form compatible with visualization and analysis methods, and is reinforced with metadata used to give guidance to users. The domain model preserved some semantics of the domain while providing a general-purpose approach.

What kind of visual analytic tool can be constructed based on the suggested model? This work suggests one alternative with two different user interface solutions, a general purpose user interface and a user interface including descriptive visualizations of the objects of interest. Many other alternatives could, however, be implemented as the model sets no limits in this respect. The suggested approach comprised a reasoning process and provided a set of basic analysis and visualization methods that can be easily extended. The only requirement is that the methods are compatible with the data types of the domain model.

Does this approach help users find useful knowledge from the data? The user evaluation with HyperFit indicates that this is possible. The user guidance proved helpful. Users did not need to have expertise in analysis methods; all that was required was the ability to interpret the visual results. Users familiar with the domain were better able to gain insight than non-experts. Guidance was given for the interpretation of the results by showing the limits and reference values for measurements. In the MMEA case, descriptive visualizations were shown to help the user understand the context of the results and could add further value to the data analysis. Adding more user guidance based on semantics or user action history offers one direction for future research.

How straightforward is it to apply the approach to a new case in the domain? In the simplest cases this requires mapping the original data sources to the tool's data storage and implementing the calculations for application-specific indexes. In principle, nothing else is required. If an application-specific user interface is needed, extra implementation is required. Additionally, if new analysis and visualization methods are needed, these will need to be implemented and mapped to the data model. This does not, however, eliminate arduous data preprocessing and formatting. Domain expertise is important in the definition of the application content.

Can the results be generalized to domains other than monitoring data? This question cannot be answered by the present study alone. Applying the results to some other domain that has a set of key concepts that remain similar from one application to another is a good candidate for further research.

What are the advantages and disadvantages of this approach compared to other kinds of visual analytics solutions? Four kinds of solutions were distinguished: tailored solutions by a data analysis expert, commercial packages, application-specific solutions and the “domain approach” of this work. Compared to the others, the domain approach can provide accurate and flexible solutions within the domain by capturing the special expertise of the domain. Such solutions can be reliable, stable, secure and interoperable, as those features can be provided as a standard service. Tailored solutions may surpass the domain approach in functionality, but usability, reliability, maintainability and portability may receive less attention. Commercial packages can be reliable and usable, and support security and integration well, but functionality, efficiency and maintainability can be limited. An application-specific solution can provide an accurate and fine-tuned solution, but only for that specific application.

Comparison to related work

The use of data models in visual analytics is not widespread. Only ten studies were found, the majority of which used data models to extract concepts from data and to create ontology models, while the rest used data models as part of system architecture in the form of either fixed application models, domain models or general-purpose models. The general-purpose model approaches generated independent applications with their own database schemas and concepts and required a tedious application definition and generation phase. Two domain model approaches were identified. One was used for handling work flows in publishing, and the other for monitoring structures (e.g. bridges) with sensors. The latter (Inaudi, 2002) was the nearest approach to the present study, although its focus was more on gathering sensor data than analysis. Combining the data models of the present work and Inaudi’s approach could produce a powerful domain model for monitoring applications.

Only one example of metadata use was found, although in that case the metadata was used to keep track of the lineage between results and original data as opposed to guiding the end user. A data model or elements of such a model were used as user interface elements, although not in the exact way suggested in the present study.

Conclusions

Data modelling can be considered a useful method in visual analytics. The domain approach might facilitate the construction of new visual analytics applications in the domain. The suggestion of suitable analysis methods to the user can be helpful, especially for users that are unfamiliar with analysis methods but know the application domain well. Use of a data model as a user interface element shows the concepts clearly to the user and makes communication with the data easy. Combining abstract and descriptive visualizations can produce powerful tools for analytical reasoning. Such solutions could bring new alternatives to the data analysis market. These could be recommended for companies that need to analyse different kinds of data within a given domain, or for consultants that want to quickly construct analysis applications for customers.

The uses for data models suggested in this work hardly represent the only possibilities. Many important areas of visual analytics are not addressed in the study. It applies only a fraction of the overall potential of visualization and analysis methods and does not take complex data structures into account. In addition, special questions of scalability, streaming and big data sets are not addressed, and annotation, collaboration and presentation issues are not included. These limitations leave the path open for future uses of data models.


References

Aigner, W., Miksch, S., Müller, W., Schumann, H. and Tominski, C. (2008) "Visual methods for analyzing time-oriented data", IEEE Transactions on Visualization and Computer Graphics, vol. 14, no. 1, pp. 47-60.

Aigner, W., Miksch, S., Müller, W., Schumann, H. and Tominski, C. (2007) "Visualizing time-oriented data – A systematic view", Computers and Graphics, vol. 31, no. 3, pp. 401-409.

Aigner, W., Miksch, S., Schumann, H. and Tominski, C. (2011) Visualization of time-oriented data, 1st edn., Springer-Verlag, NY, USA.

Alani, H., Dasmahapatra, S., O'Hara, K. and Shadbolt, N. (2005) "Identifying communities of practice through ontology network analysis", Intelligent Systems, IEEE, vol. 18, no. 2, pp. 18-25.

Alpaydin, E. (2004) Introduction to machine learning, 1st edn., MIT press, Cambridge, MA, USA.

Amar, R., Eagan, J. and Stasko, J. (2005) "Low-level components of analytic activity in information visualization", IEEE Symposium on Information Visualization (INFOVIS) 2005, eds. J.T. Stasko and M.O. Ward, IEEE Computer Society, Minneapolis, MN, USA, 23-25 Oct., pp. 111.

andrews.edu (2005) An Introduction to Statistics, [Online]. Available: http://www.andrews.edu/~calkins/math/edrm611/edrm01.htm [2012, October 12].

Andrienko, G., Andrienko, N., Demsar, U., Dransch, D., Dykes, J., Fabrikant, S.I., Jern, M., Kraak, M.J., Schumann, H. and Tominski, C. (2010) "Space, time and visual analytics", International Journal of Geographical Information Science, vol. 24, no. 10, pp. 1577-1600.

Andrienko, G., Andrienko, N., Jankowski, P., Keim, D., Kraak, M.J., MacEachren, A. and Wrobel, S. (2007) "Geovisual analytics for spatial decision support: Setting the research agenda", International Journal of Geographical Information Science, vol. 21, no. 8, pp. 839-857.

Azarm, M., Nargesian, F. and Peyton, L. (2011a), "Managing and mapping data lineage for business intelligence and analytics applications in health care", 2011 International Conference on Information Society (i-Society), IEEE, London, UK, 27-29 June, pp. 120.

Azarm, M., Nargesian, F. and Peyton, L. (2011b) "Tool Support and Data Management for Business Analytics Applications in Healthcare", International Journal for Infonomics (IJI), vol. 4, no. 3/4, pp. 484-493.


Benyon, D., Turner, P. and Turner, S. (2005) Designing interactive systems: People, activities, contexts, technologies, Pearson Education, Essex, UK.

Booch, G., Rumbaugh, J. and Jacobson, I. (2005) Unified Modeling Language User Guide, 2nd edn., Addison-Wesley.

Bosch, R., Stolte, C., Tang, D., Gerth, J., Rosenblum, M. and Hanrahan, P. (2000) "Rivet: a flexible environment for computer systems visualization", ACM SIGGRAPH Computer Graphics, vol. 34, no. 1, pp. 73-82.

Bourke, P. (1996) Cross Correlation [Homepage of Paul Burke], [Online]. Available: http://paulbourke.net/miscellaneous/correlate/ [2012, October 12].

Brennan, S.E., Mueller, K., Zelinsky, G., Ramakrishnan, I., Warren, D.S. and Kaufman, A. (2006) "Toward a multi-analyst, collaborative framework for visual analytics", 2006 IEEE Symposium On Visual Analytics Science And Technology IEEE Computer Society, Baltimore, Maryland, USA, Nov. 2, pp. 129.

Bull, R.I. (2008) Model driven visualization: towards a model driven engineering approach for information visualization, University of Waterloo, Department of Computer Science.

Cammarano, M., Dong, X.L., Chan, B., Klingner, J., Talbot, J., Halevey, A. and Hanrahan, P. (2007) "Visualization of heterogeneous data", IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1200-1207.

Card, S.K., Mackinlay, J.D. and Shneiderman, B. (1999) Readings in information visualization: using vision to think, 1st edn., Morgan Kaufmann.

Catarci, T., Costabile, M.F., Levialdi, S. and Batini, C. (1997) "Visual query systems for databases: A survey", Journal of visual languages and computing, vol. 8, no. 2, pp. 215-260.

Chang, A. (2012) Oct-last update, CUSUM explained [Homepage of statTools.net], [Online]. Available: http://www.stattools.net/CUSUM_Exp.php [2012, Oct 18].

Chang, R., Ziemkiewicz, C., Green, T.M. and Ribarsky, W. (2009) "Defining insight for visual analytics", IEEE Computer Graphics and Applications, vol. 29, no. 2, pp. 14-17.

Chen, C. (2010) "Information visualization", Wiley Interdisciplinary Reviews: Computational Statistics, vol. 2, no. 4, pp. 387-403.

Chen, P.P.S. (1977) "The entity-relationship model: a basis for the enterprise view of data", Proceedings of the 1977 ACM national computer conference, ACM, New York, NY, USA, June 13-16, pp. 77.

Chen, P.P.S. (1975) "The entity-relationship model", ACM SIGIR Forum, vol. 10, no. 3, pp. 9-9.


Chi, E.H. (2002) "A taxonomy of visualization techniques using the data state reference model", InfoVis 2000, IEEE Symposium on Information Visualization 2000, IEEE Computer Society, 9-10 Oct., pp. 69.

Codd, E.F. (1970) "A relational model of data for large shared data banks", Communications of the ACM, vol. 13, no. 6, pp. 377-387.

Cook, K., Earnshaw, R. and Stasko, J. (2007) "Guest editors' introduction: discovering the unexpected", Computer Graphics and Applications, IEEE, vol. 27, no. 5, pp. 15-19.

Dai, H., Lim, E., Hady Wirawan, L. and Hweehwa, P. (2008) "Visual Analytics for Supporting Entity Relationship Discovery on Text Data", Intelligence and Security Informatics, Lecture Notes in Computer Science Volume 5075, eds. C.C. Yang, H. Chen, M. Chau, et al, Springer, Taipei, Taiwan, June 17, pp. 183.

DeMarco, T. (1979) Structured analysis and system specification, 1st edn., Prentice Hall, Englewood Cliffs, NJ, USA.

Ding, Y. and Foo, S. (2002) "Ontology research and development. Part 1 - a review of ontology generation", Journal of Information Science, vol. 28, no. 2, pp. 123-136.

Eysenck, M.W. and Keane, M.T. (2005) Cognitive psychology: A student's handbook, 5th edn., Taylor and Francis, Psychology Press, NY, USA.

Falconer, S.M., Bull, R.I., Grammel, L. and Storey, M.A. (2009) "Creating visualizations through ontology mapping", International Conference on Complex, Intelligent and Software Intensive Systems, CISIS'09, IEEE Computer Society, Burgos, Spain, 16-19 March, pp. 688.

Fass, A. and Pezzetti, G. (2007) "Statistical Methods for Monitoring Data Analysis", 7th International Symposium on Field Measurements in Geomechanics (FMGM), American Society of Civil Engineers, Reston, VA, USA, Sept 24-27, pp. 1.

Ferreira de Oliveira, M.C. and Levkowitz, H. (2003) "From visual data exploration to visual data mining: A survey", Visualization and Computer Graphics, IEEE Transactions on, vol. 9, no. 3, pp. 378-394.

Galorath, D. (2008) "Towards a Standard Definition of IT Infrastructure” [Online] Available: http://www.galorah.com [2012, Oct 18].

Garg, S., Nam, J.E., Ramakrishnan, I. and Mueller, K. (2008) "Model-driven visual analytics", IEEE Symposium on Visual Analytics Science and Technology VAST'08, IEEE Computer Society, Columbus, Ohio, USA, 19-24 Oct., pp. 19.

Gilson, O., Silva, N., Grant, P.W. and Chen, M. (2008) "From web data to visualization via ontology mapping", Computer Graphics Forum, vol. 27, no. 3, pp. 959-966.


Görg, C., Spence, R. and Stasko, J. (2008) "Jigsaw: supporting investigative analysis through interactive visualization", Information Visualization, vol. 7, no. 2, pp. 118-132.

Gotz, D., Zhou, M.X. and Aggarwal, V. (2006) "Interactive visual synthesis of analytic knowledge", IEEE Symposium On Visual Analytics Science And Technology, IEEE, Baltimore, Maryland, USA, Oct. 31 - Nov 2, pp. 51.

Gravois, M. (2007) Data Monitoring and Analysis Program Manual, Lawrence Berkeley National Laboratory, Berkeley, CA, USA.

Green, T.M., Ribarsky, W. and Fisher, B. (2009) "Building and applying a human cognition model for visual analytics", Information Visualization, vol. 8, no. 1, pp. 1-13.

Green, T.M., Ribarsky, W. and Fisher, B. (2008) "Visual analytics for complex concepts using a human cognition model", IEEE Symposium on Visual Analytics Science and Technology VAST'08, IEEE Computer Society, Columbus, Ohio, USA, 19-24 Oct, pp. 91.

Hand, D.J., Mannila, H. and Smyth, P. (2001) Principles of data mining, First edition, MIT press.

Heer, J. and Agrawala, M. (2006) "Software design patterns for information visualization", IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 5, pp. 853-860.

Heilala, J., Klobut, K., Salonen, T., Järvinen, P. and Shemeikka, J. (2010a) "Energy Use Parameters for Energy Efficiency Enhancement in Discrete Manufacturing Process", 7th CIRP International Conference on Intelligent Computation in Manufacturing Engineering, Dept. of Materials and Production Engineering, University of Naples "Federico II", Capri, Italy, June 23 - 25, pp. 1.

Heilala, J., Klobut, K., Salonen, T., Järvinen, P., Siltanen, P. and Shemeikka, J. (2010b) "Energy Efficiency Enhancement in Discrete Manufacturing Process with Energy Use Parameters", International Conference on Advances in Production Management Systems, Competitive and Sustainable Manufacturing Products and Services - APMS 2010, eds. M. Garetti, M. Taisch, T. Cavalieri S. and M. Tucci, Politecnico di Milano, Rhodes island, Greece, Oct 11-13, pp. 1.

Icke, I. (2009) Visual Analytics: A Multifaceted Overview, Citeseer, CUNY, The Graduate Center.

IEEE 1451.1 (2004) Standard: Smart Transducer Interface for Sensors and Actuators, 1st edn., IEEE 1451.4 Standard Working Group.

Inaudi, D. (2002) "Development of reusable software components for monitoring data management, visualization and analysis", SPIE, International Symposium on Smart Structures and Materials, eds. S.-. Liu and D.J. Pines, Smart Structures and Materials, San Diego, CA, USA, March 17-21.3, pp. 10.


ISO 13374 (2007) Standard: Condition monitoring and diagnostics of machines, 1st edn., ISO International Organization for Standardization, Geneva, Switzerland.

ISO 9241-110 (2006) Ergonomic requirements for office work with visual display terminals (VDTs) -- Part 10: Dialogue principles, 2nd edn., ISO, Geneva, Switzerland.

ISO TC 108/SC 5 (2005) Standard draft: Condition monitoring and diagnostics of machines — Data processing, communication, and presentation — Part 2: Data processing, 1st edn., ISO TC 108/SC 5/WG 6. Secretariat, Geneva, Switzerland.

ISO/IEC 25062 (2006) Usability standard, 1st edn., ISO International Organization for Standardization, Geneva, Switzerland.

ISO/IEC 9126 (2001), Standard ISO/IEC 9126-4 Software engineering — Product quality, 2nd edn., ISO International Organization for Standardization, Geneva, Switzerland.

Järvinen, P. (ed.) (2007) Hybrid media in personal management of nutrition and exercise, VTT publications 656, Espoo, Finland.

Jarvinen, P., Jarvinen, T., Lahteenmaki, L. and Sodergard, C. (2008) "HyperFit: hybrid media in personal nutrition and exercise management", Second International Conference on Pervasive Computing Technologies for Healthcare,IEEE, January 30th - February 1st, pp. 222.

Järvinen, P., Puolamäki, K., Siltanen, P. and Ylikerälä, M. (2009) Visual analytics. Final report, VTT Technical Research Centre of Finland.

Johnson, C., Moorhead, R., Munzner, T., Pfister, H., Rheingans, P. and Yoo, T.S. (2006) NIH/NSF visualization research challenges report, IEEE Computing Society, Los Alamitos, CA, USA.

Kang, H., Getoor, L. and Singh, L. (2007) "Visual analysis of dynamic group membership in temporal social networks", ACM SIGKDD Explorations Newsletter, vol. 9, no. 2, pp. 13-21.

Kang, H., Plaisant, C., Lee, B. and Bederson, B.B. (2007) "NetLens: iterative exploration of content-actor network data", Information Visualization, vol. 6, no. 1, pp. 18-31.

Kang, Y. and Stasko, J. (2011) "Characterizing the intelligence analysis process: Informing visual analytics design through a longitudinal field study", IEEE Conference on Visual Analytics Science and Technology (VAST), eds. S. Miksch and M. Ward, IEEE Computer Society, Providence, RI, USA, 23-28 Oct., pp. 21.

Keim, D., Andrienko, G., Fekete, J.D., Görg, C., Kohlhammer, J. and Melançon, G. (2008) "Visual analytics: Definition, process, and challenges" in Information visualization, eds. A. Kerren, J. Stasko, J. Fekete and C. North, First edition, Springer, Berlin Heidelberg, pp. 154-175.


Keim, D., Kohlhammer, J., Ellis, G. and Mansmann, F. (eds.) (2010) Mastering the information age: solving problems with visual analytics, First edition, Eurographics Association, Goslar, Germany.

Keim, D., Mansmann, F., Oelke, D. and Ziegler, H. (2008a) "Visual analytics: Combining automated discovery with interactive visualizations", Discovery Science, Springer, pp. 2.

Keim, D., Mansmann, F., Schneidewind, J., Thomas, J. and Ziegler, H. (2008b) "Visual analytics: Scope and challenges" in Visual Data Mining, eds. S.J. Simoff, M.H. Böhlen and A. Mazeika, 1st edn., Springer, Heidelberg, pp. 76-90.

Keim, D.A. (1996) "Pixel-oriented visualization techniques for exploring very large data bases", Journal of Computational and Graphical Statistics, vol. 5, no. 1, pp. 58-77.

Keim, D.A., Mansmann, F., Schneidewind, J. and Ziegler, H. (2006) "Challenges in visual data analysis", Tenth International Conference on Information Visualization IEEE, London, UK, July 5-7, pp. 9.

Lengler, R. and Eppler, M.J. (2007) "Towards a periodic table of visualization methods for management", Proceedings of the IASTED Conference on Graphics and Visualization in Engineering (GVE 2007), ACTA Press, Anaheim, CA, USA, January 3-5, pp. 83-88.

Liu, Z., Navathe, S.B. and Stasko, J.T. (2011) "Network-based visual analysis of tabular data", IEEE Conference on Visual Analytics Science and Technology (VAST), IEEE, Providence, RI, USA, Oct. 23-28, pp. 41.

Liu, Z. and Stasko, J.T. (2010) "Mental models, visual reasoning and interaction in information visualization: A top-down perspective", IEEE Transactions on Visualization and Computer Graphics, vol. 16, no. 6, pp. 999-1008.

Manolescu, I., Khemiri, W., Benzaken, V. and Fekete, J.D. (2009) "ReaViz::Reactive workflows for visual analytics", Data Management and Visual Analytics Workshop, Berlin, June 9.

Martin, J. and Finkelstein, C. (1981) Information engineering, 1st edn., Prentice Hall, USA.

Mathew, A., Zhang, L., Zhang, S. and Ma, L. (2006) "A review of the MIMOSA OSA-EAI database for condition monitoring systems", World Congress on Engineering Asset Management, Springer, Gold Coast, Australia, July 11-14, pp. 837.

MIMOSA (2012) Open specifications for Enterprise Application Integration (EAI) and Condition-based Maintenance (CBM)., Alliance of Operations and Maintenance (O&M), http://www.mimosa.org.


Munzner, T. (2009) "A nested process model for visualization design and validation", Visualization and Computer Graphics, IEEE Transactions on, vol. 15, no. 6, pp. 921-928.

Norman, D. and Dunaeff, T. (1994) Things that make us smart: Defending human attributes in the age of the machine, Basic Books, USA.

North, C. (2006) "Toward measuring visualization insight", Computer Graphics and Applications, IEEE, vol. 26, no. 3, pp. 6-9.

North, C. and Shneiderman, B. (2000) "Snap-together visualization: a user interface for coordinating visualizations via relational schemata", Proceedings of the working conference on Advanced visual interfaces, ACM Press, Napoli, Italy, May 28-30, pp. 128.

Perer, A. and Shneiderman, B. (2009) "Integrating statistics and visualization for exploratory power: From long-term case studies to design guidelines", Proceedings of International Conference on Intelligence Analysis IEEE, 8 May, pp. 39.

Pirolli, P. and Card, S. (2005) "The sensemaking process and leverage points for analyst technology as identified through cognitive task analysis", Proceedings of International Conference on Intelligence Analysis, 8 May, pp. 2.

Pirolli, P., Card, S.K. and Van Der Wege, M.M. (2003) "The effects of information scent on visual search in the hyperbolic tree browser", ACM Transactions on Computer-Human Interaction (TOCHI), vol. 10, no. 1, pp. 20-53.

Plaisant, C., Grinstein, G. and Scholtz, J. (2009) "Visual-analytics evaluation", IEEE Computer Graphics and Applications, vol. 29, no. 3, pp. 16-17.

Pottenger, W., Fisher, B. and Ribarsky, W. (2009) "Science of analytical reasoning", Information Visualization, vol. 8, no. 4, pp. 254-262.

Puolamaki, K., Papapetrou, P. and Lijffijt, J. (2010) "Visually Controllable Data Mining Methods", IEEE International Conference on Data Mining Workshops (ICDMW), IEEE, Sydney, Australia, 13-13 Dec, pp. 409.

Rao, R. and Card, S.K. (1994) "The table lens: merging graphical and symbolic representations in an interactive focus context visualization for tabular information", Proceedings of the SIGCHI conference on Human factors in computing systems: celebrating interdependence, ACM, Boston, MA, USA, pp. 318.

Roberts, J.C. (2007) "State of the art: Coordinated and multiple views in exploratory visualization", Fifth International Conference on Coordinated and Multiple Views in Exploratory Visualization, CMV'07, IEEE, 2-2 July, pp. 61.

Robertson, G.G., Mackinlay, J.D. and Card, S.K. (1991) "Cone trees: animated 3D visualizations of hierarchical information", Proceedings of the SIGCHI conference on Human factors in computing systems: Reaching through technology, ACM, April 27 - May 02, pp. 189.

Scholtz, J. (2006) "Beyond usability: Evaluation aspects of visual analytic environments", IEEE Symposium On Visual Analytics Science And Technology, IEEE, Baltimore, Maryland, USA, October 31 - November 2, pp. 145.

Sears, A. and Jacko, J.A. (2009) Human-computer interaction: Development process, 1st edn., CRC Press, Boca Raton, FL.

SEMI E10-0304, Standard. Available: www.semi.org/en/standards/ctr_031244.

Shen, Z., Ma, K.L. and Eliassi-Rad, T. (2006) "Visual analysis of large heterogeneous social networks by semantic and structural abstraction", IEEE Transactions on Visualization and Computer Graphics, vol. 12, no. 6, pp. 1427-1439.

Shneiderman, B. (1996) "The eyes have it: A task by data type taxonomy for information visualizations", Proceedings, IEEE Symposium on Visual Languages, IEEE, September 3-6, pp. 336.

Shrinivasan, Y.B. and van Wijk, J.J. (2008) "Supporting the analytical reasoning process in information visualization", Proceeding of the twenty-sixth annual SIGCHI conference on Human factors in computing systems, ACM, April 5-10, pp. 1237.

Simsion, G. and Witt, G. (2005) Data modeling essentials, Third edition, Morgan Kaufmann, San Francisco, CA.

Smuc, M., Mayr, E., Lammarsch, T., Aigner, W., Miksch, S. and Gartner, J. (2009) "To score or not to score? Tripling insights for participatory design", Computer Graphics and Applications, IEEE, vol. 29, no. 3, pp. 29-38.

Spyns, P., Meersman, R. and Jarrar, M. (2002) "Data modelling versus ontology engineering", ACM SIGMOD Record, vol. 31, no. 4, pp. 12-17.

Streit, M., Schulz, H., Lex, A., Schmalstieg, D. and Schumann, H. (2012) "Model-Driven Design for the Visual Analysis of Heterogeneous Data", IEEE Transactions on Visualization and Computer Graphics, vol. 18, no. 6, pp. 998-1010.

Tan, P.N., Steinbach, M. and Kumar, V. (2006) Introduction to data mining, 1st edn., Addison Wesley.

Thomas, J.J. and Cook, K.A. (2006) "A visual analytics agenda", Computer Graphics and Applications, IEEE, vol. 26, no. 1, pp. 10-13.

Thomas, J.J. and Cook, K.A. (2005) Illuminating the path: The research and development agenda for visual analytics, 1st edn., IEEE Computer Society, Los Alamitos, CA.


Tufte, E.R. (1983) The visual display of quantitative information, Graphics press, Cheshire, CT.

Van Wijk, J.J. (2005) "The value of visualization", IEEE Visualization 2005, IEEE, Baltimore, Maryland, 23-28 October, pp. 79.

Venables, W.N. and Smith, D.M. (2012) An introduction to R. Available: http://cran.r-project.org/doc/manuals/R-intro.pdf [2012, October 12].

Ware, C. (2004) "Information visualization: perception for design", 1st edn., Morgan Kaufmann.

Verzani, J. (2002) "simpleR - Using R for Introductory Statistics". Available: http://cran.r-project.org/doc/contrib/Verzani-SimpleR.pdf [2012, October 12].

Wong, P.C. (1999) "Guest editor's introduction: Visual data mining", IEEE Computer Graphics and Applications, vol. 19, no. 5, pp. 20-21.

Wong, P.C. and Thomas, J. (2004) "Guest Editors' Introduction: Visual Analytics", IEEE Computer Graphics and Applications, vol. 24, no. 5, pp. 20-21.

Wong, P.C. (2007) "Visual analytics science and technology", Information Visualization, vol. 6, no. 1, pp. 1-2.

Wright, W., Schroh, D., Proulx, P., Skaburskis, A. and Cort, B. (2006) "The Sandbox for analysis: concepts and methods", Proceedings of the SIGCHI conference on Human Factors in computing systems, ACM, Montreal, April 24-27, pp. 801.

Yi, J.S., ah Kang, Y., Stasko, J.T. and Jacko, J.A. (2007) "Toward a deeper understanding of the role of interaction in information visualization", IEEE Transactions on Visualization and Computer Graphics, vol. 13, no. 6, pp. 1224-1231.

Zhou, M.X. and Gotz, D. (2009) "Characterizing users' visual analytic activity for insight provenance", Information Visualization, vol. 8, no. 1, pp. 42-55.


Appendix A: Visual analytics tool survey

Product and web link (3 October 2012) | Company | Cat | Application areas | Description

D2K Data to Knowledge, http://d2k.tamu.edu/ | D2K Project | App | Learning and teaching | The Data to Knowledge Project is a partnership of teachers, administrators, and researchers dedicated to creating useful solutions to face the complex responsibilities of today's teachers; analytics and pattern discovery.

ILOG Visual CP, http://pauillac.inria.fr/~contraintes/OADymPPaC/Public/oadDiscovery/ilog_discovery.html | ILOG, commercial, free preview version download | App | Interactive visualization of constraints-based programs | Interactive visualization software designed to help analyze and debug constraints-based programs.

iThink (isee), http://www.iseesystems.com/ | isee systems, inc., commercial | App | Thinking for Business | Predictive analytical tools. Built-in functions facilitate mathematical, statistical, and logical operations, simulation.

Leadscope Model Applier, http://www.leadscope.com/ | Leadscope, commercial | App | Biology, chemistry | Predictive data miner; provides specialized data mining and visualization software for the pharmaceutical industry.

Partek, http://www.partek.com/ | Partek, commercial, free trial | App | Sequencing, microarray, integrated genomics | Visualization and analysis environment for genomics analysis.

DataDesk, http://www.datadesk.com/ | Data Description Inc., commercial | Env | General-purpose | Dynamic data exploration and visualization tools with interactive statistics procedures.

DataDetective, http://www.sentient.nl/ | Sentient Information Systems, commercial | Env | Business intelligence, crime analysis | Users are able to analyze very large data sets to find hidden patterns and complex relations by using artificial intelligence, statistical techniques and visualization methods. Relations, trends, clusters.

DecisionQ, http://www.decisionq.com/ | DecisionQ Corporation, commercial | Env | Financial sector, care industries, diagnostics, insurance, goods, logistics, national defence and other industries | Predictive statistical products.

Eudaptics Viscovery, http://www.eudaptics.com/ | Viscovery, commercial | Env | Telecom, insurance, banking, retail, e-business, media, finance, fund raising, voting analysis, industry, science | Interactive data visualization software and solutions; critical insight from all types of data. Explorative data mining, prediction and scoring, cluster analysis and segmentation, SOM, document classification, and much more.

GeneXproTools, http://www.gepsoft.com/ | Commercial | Env | Risk management, response models, financial services, marketing, and most scientific and engineering research | Modelling and data mining software for data analysis, designed for function finding, classification, time series prediction, and logic synthesis.

groupvisual io, http://groupvisual.io/ | groupvisual io, consulting | Env | General-purpose | Dashboarding, reporting, visualization tools and technologies; find patterns, distributions, correlations and/or anomalies across multiple data types using visual cues to explore and understand the information.

InforSense, http://www.inforsense.com/ | inforsense.com, commercial, consultant | Env | Business, healthcare, life sciences | Interactive web-based analytical applications linking multiple tables and charts together. Classification and predictive modelling, clustering and segmentation analysis, association analysis, multivariate and regression analysis, advanced data preparation methodologies.

KNIME Konstanz Information Miner, http://www.knime.org/ | University of Konstanz, open source | Env | General-purpose | Modern data analytics platform: statistics, data mining, analyze trends and predict potential results. Its visual workbench combines data access, data transformation, initial investigation, powerful predictive analytics and visualization.

Knowledge Seeker, Knowledge Studio, www.angoss.com | Angoss, commercial, consultant | Env | Business intelligence | Focused industry services expertise in the areas of deployment, integration and use of predictive analytics within enterprise environments. Interactive decision trees, predictive modelling and scoring.

KXEN, http://www.kxen.com/ | KXEN, commercial, consultant | Env | Communications, business, retail, finances | Classification, regression, and attribute importance.

NovoSpark, http://www.novospark.com/ | NovoSpark, commercial, 30 days free trial | Env | Oceanology, insurance and finance, medical research, aeronautics | Visualizes multidimensional datasets; lets the human eye perceive visual patterns by presenting each multidimensional observation as a single observation curve.

Nuggets, http://www.data-mine.com/ | Data Mining Technologies, commercial | Env | Direct marketing, healthcare, pharmaceuticals, genetics, CRM, telecommunications, utilities, financial services | "Builds models that uncover hidden facts and relationships. These models can predict for new data, allow you to generalize and reveal which indicators (i.e. variables) most impact your decisions."

OpenViz, http://www.openviz.com/ | Advanced Visual Systems | Env | Business intelligence, risk and financial customer analytics, drug discovery, life sciences, medicine, oil and gas, earth sciences, engineering, scientific research, education | A selection of built-in standard analytics procedures, or incorporate your own for modelling, trending and grouping analysis. Highly interactive features such as filtering, sampling and thresholding.



Panopticon, http://www.panopticon.com/ | Panopticon | Env | General-purpose | Data visualization software that supports fast, efficient visual analysis of real-time and historical time series data as well as very large data sets stored in standard relational databases.

RapidMiner (YALE), http://rapid-i.com/ | Rapid-I, commercial / GNU General Public License | Env | Business intelligence solutions | RapidMiner provides an environment for data mining, machine learning and business intelligence in Java. Integrates learning schemes and attribute evaluators of the Weka environment.

SALFORD SYSTEMS (CART, MARS, TreeNet, RandomForest), http://www.salfordsystems.com | Salford Systems, commercial | Env | Business intelligence: telecommunications, retail, catalog, healthcare and financial services markets | "... data mining and web mining tools": CART Decision Tree Software, MARS Predictive Modeling Software, TreeNet® Stochastic Gradient Boosting Software, LOGIT Software, Random Forests.

Spotfire, http://spotfire.tibco.com | Tibco Software, commercial | Env | Business decisions, enterprise analysis, decision analysis, big data | Interactive, visual data analysis. Data visualization helps users interpret critical relationships in multidimensional data. Spotfire information visualization allows users to easily query and comprehend complex data.

Statistica, http://www.statsoft.com/ | Statsoft, commercial | Env | Banking, finance, chemical/petrochemical, consumer product goods, food manufacturing, healthcare, insurance, marketing, pharmaceuticals, power industry, semiconductors | Suite of analytics software products: data analysis, data management, data visualization, and data mining procedures. Predictive modelling, clustering, classification, and exploratory techniques; support vector machines, EM and k-means clustering, classification and regression trees, generalized additive models, independent component analysis, stochastic gradient boosted trees, ensembles of neural networks, automatic feature selection, MARSplines, CHAID trees, nearest neighbour methods, association rules, random forests.

Tableau, http://www.tableausoftware.com/ | Tableau Software, commercial, free trial | Env | Business intelligence, marketing, finance, sales departments, supply chains, national security, engineering | "A visual spreadsheet for databases that allows you to visually explore, analyze and create reports. Fast analytics + visualization for everyone. Tableau combines data exploration and visualization in easy-to-use applications you can quickly master."

Think EDM (former K.wiz), http://www.thinkanalytics.com/ | ThinkAnalytics, commercial | Env | Telecommunications, media, banking and retail industries, CRM and business intelligence | "... knowledge discovery and data mining techniques designed to provide business users with intelligent analysis capabilities ...", "provides a range of Java visualization components including heatmaps, decision trees, 3D scatterplots and association rules".

Visokio Omniscope, http://www.visokio.com/ | Visokio, product and consulting | Env | Home use | "Filter, analyse and edit information in interactive point-and-click graphs, charts and maps, import web content, and create slider-driven models".


VisuMap, http://www.visumap.net | VisuMap Technologies, product and consulting | Env | Pharmaceutics, bioinformatics, financial analysis, market analysis, telecommunication industry, Internet traffic analysis, quality control | "... information visualization tools for high dimensional non-linear data. VisuMap provides a window into the patterns, relationships and correlations hidden in your data and enables you to interact with them in real time".

XLMiner, http://www.resample.com/xlminer/ | Resampling Stats, Inc., commercial | Env | - | Data mining add-in for Excel, with neural nets, classification and regression trees, logistic regression, linear regression, Bayes classifier, K-nearest neighbours, discriminant analysis, association rules, clustering, principal components, and more.

Birdeye, http://code.google.com/p/birdeye/ | Open source community | Misc | General-purpose | BirdEye is a community project to advance the design and development of a comprehensive open source information visualization and visual analytics library for Adobe Flex.

C4.5/C5.0/See5 | Open source | Misc | General-purpose | Data mining tool.

Weka, http://www.cs.waikato.ac.nz/ml/weka/ | The University of Waikato, GNU General Public License | Misc | A collection of algorithms for any data mining tasks | "... an open source software that supports data mining tasks, classification, data preprocessing, clustering, regression and visualization. It's a Java standalone application with a very nice GUI, issued under GNU General Public License. Open source machine learning software in Java, Weka is a collection of algorithms for any data mining tasks".

BusinessObjects XI, http://www.sap.com/solutions/analytics/business-intelligence/index.epx | SAP company | SWH | Business intelligence | "BusinessObjects XI is a full spectrum software tool that can do reporting, query and analysis, dashboards and visualization, intuitive discovery and advanced predictive analytics capabilities."

i2 Analyst's Notebook, http://www.i2group.com/uk | IBM i2, commercial | SWH | General-purpose | "A rich visual analysis environment underpinned by a local repository".

IBM Spatiotemporal Visual Analytics Workbench, http://researcher.watson.ibm.com | IBM, commercial | SWH | General-purpose | "IBM Spatiotemporal Visual Analytics Workbench converts raw data into interactive visualizations that are comprehendible by domain experts. The key capabilities include: support for multiple data layers; interactive visualization, including filtering, highlighting, and selection; in-context analytical algorithms; capturing and communicating insight; converting insight from historical data into live event monitoring rules."

Inxight, http://www.inxight.com/ | Inxight, SAP company, commercial | SWH | Text analysis | "Get valuable insight into big data based on the location of an event or transaction. SAP Data Services can help you leverage the power of location-based intelligence, geocoding, and geographic data services".


Oracle, http://www.oracle.com | Oracle | SWH | General-purpose | "New Oracle engineered systems deliver big data and high-speed visual analytics."

SAP Visual Intelligence | SAP | SWH | - | "SAP Visual Intelligence point and click interface and engaging visualizations allow you to quickly analyze data for rapid time to insight and business agility – no scripting required".

SAS Visual Analytics, http://www.sas.com | SAS, commercial | SWH | - | "Visually explore big data using high-performance, in-memory capabilities to understand your data, discover new patterns and publish reports to the web and mobile devices."

SPSS Clementine, http://www.spss.com/clementine/ | SPSS Inc. (VTT), commercial | SWH | Business, government agencies, academic institutions | Clustering (Kohonen, K-Means and Two-Step), missing value imputation based on both rule- and algorithm-based methods, enhanced Logistic Regression node, chi-squared, t-test, and ANOVA, Time-Series Modelling node that estimates exponential smoothing and ARIMA models to produce forecasts, CHAID, Exhaustive CHAID, and QUEST.

SQL Server (Power View) | Microsoft | SWH | Data analysts, business decision makers, and information workers | "Power View is a browser-based application that enables users to present and share insights with others in their organization through interactive presentations. Many capabilities for analysis, visualization, and sharing available with Power View."

WebFocus, http://www.informationbuilders.com | Information Builders, commercial | SWH | General-purpose | -

Bayesia, http://www.bayesia.com/ | Bayesia, commercial | Tech | Marketing, industry, health, risk management | Bayesian networks and graphics.

DataEngine, http://www.dataengine.de | MIT GmbH, commercial, Germany | Tech | Process control, quality control | "DataEngine is the software tool for intelligent data analysis which unites statistical methods with neural networks and fuzzy technologies."

Exelis, http://www.exelisvis.com/ | ITT, commercial | Tech | Defence and intelligence, monitoring natural resources, federal and local governments, earth observation, mining, oil and gas exploration, and biotechnology | "… access, analyze, and share all types of data and imagery. Interactive Data Language enables in-depth data analysis through industry-leading visualization"; built-in library of math, statistics, image processing and signal processing routines.

InFlow Network Mapping Software, http://www.orgnet.com | orgnet.com, commercial, consultant | Tech | Social network analysis software and services for organizations, communities, and their consultants | "Quickly and easily navigate large data sets to spot trends, detect outliers, and uncover data quality problems"; network centrality, small-world networks, cluster analysis, network density ...


Product and web- Company Cat Application areas Description link (3 October 2012) Miner commercialfree Tech Life Sciences, Clinical "Create engaging data visualizations and 3Dhttp://www.miner trial for 15 days Trials, Bank and live data-driven graphics! Miner3D 3d.com/ investment analysts, delivers new data insights by allowing researchers in you to visually spot trends, clusters, pharmaceutical, patterns or outliers." Kohonen's Self- biotech or material Organizing Maps,Trellis Charts, research, process Principal Component Analysis, K- engineers, sales Means Clustering, Visual Clustering, managers, geologists Data analysis and visualization in 3D NETMAP NetMap analytics Tech Insurance, retail, "innovative combination of link analysis http://www.netmap.c commercial corporate & and data visualization, with applications om.au/ government, crime to fraud detection and claims analysis." investigation and marketing industries Oculus GeoTime™ oculus Tech General-purpose State-of-the-art class libraries and http://www.oculusinf commercial rendering engines, visualization of o.com/ consultant geospatial and temporal data with complex data over space and time within a single, highly interactive 3D view. Purple Insight commercial Tech General-purpose "...easy-to-use and scalable program for MineSet data mining and real-time 3D http://www.algorith visualization. Featuring powerful mic-solutions.com/ interactive tools for data access as well leda/projects/mineset as analytical and visual data mining" .htm Sentinel Visualizer FMS Tech Law enforcement, Advanced social network analysis and http://www.fmsasg.c commercial investigators, visualization, link analysis, social om/ researchers, and network analysis, geospatial, and products/SentinelVis information workers, timelines. ualizer/ POLYANALYST Megaputer Tech Data mining, text "… derives actionable knowledge from http://www.megaput intelligence mining large volumes of text and structured er.com/ Amer. data, delivers this knowledge to decision Commercial makers in the form of predictive models" , Categorization, Clustering, Prediction, Link Analysis, Keyword and entity extraction, Pattern discovery, Anomaly detection"

ScienceGL, http://www.sciencegl.com/ | Commercial | Tech | Health, industry, forensics, business | 3D and 4D data visualization software for science, healthcare, technology and business applications.

Data Clarity Suite, http://www.visualanalytics.com | VAI Visual Analytics Inc. | VA | General-purpose | Pattern discovery and link analysis.

Nuix Visual Analytics, http://www.nuix.com/visual-analytics | Nuix | VA | General-purpose | Nuix Visual Analytics extends Nuix's powerful investigative review and analysis capabilities with a fully interactive data visualization and workflow framework.

PV-WAVE, http://www.roguewave.com/products/pv-wave-family.aspx | Commercial | VA | - | "Development environment for desktop visual data analysis applications for rapid importing, manipulation and analysis of data".


Appendix B: HyperFit tool construction steps

Step 1. Define object of interest types

ObjectOfInterestType

Id | Title
- | Person

Step 2. Define properties

OITypeProperty

Id | OITypeId | Title | Definition | Property type | Data type | NumUnit | CodeId | Lower limit | Upper limit | Goal value | Ref value | Meas. type | Freq

The property rows for the Person type are: Weight, Gender (coded), Work activity (coded), Height, Steps, Exercise duration, Exercise kcal, Food kcal (kcal), Protein (E%), Carbohydrates (E%), Fat (E%), Sugar (E%) and Fibre, each measured per day.


Property group

Id | Title
- | Food
- | Exercise

GroupMember

Id | PropertyId | GroupId
The rows assign the food properties (e.g. carbohydrates) to the Food group and the exercise properties (e.g. duration) to the Exercise group.

Code

Id | Title
- | Gender
- | WorkActivity

CodeValue

Id | CodeId | CodeValue | CodeTitle
- | Gender | 1 | Man
- | Gender | 2 | Woman
- | WorkActivity | 1 | Light
- | WorkActivity | 2 | Medium
- | WorkActivity | 3 | Hard
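To make the metamodel of Steps 1 and 2 concrete, the sketch below expresses the ObjectOfInterestType, OITypeProperty, Code and CodeValue tables as Python data classes. It is an illustration only: the field names follow the column headings above, but the actual HyperFit schema and types may differ.

```python
# A minimal sketch of the Step 1-2 metamodel entities; names mirror the table
# headings above, but this is not the HyperFit implementation itself.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ObjectOfInterestType:
    id: int
    title: str  # e.g. "Person"

@dataclass
class OITypeProperty:
    id: int
    oi_type_id: int                  # references ObjectOfInterestType.id
    title: str                       # e.g. "Weight", "Steps", "Food kcal"
    data_type: str                   # nominal / ordinal / quantitative
    num_unit: Optional[str] = None   # e.g. "kg", "kcal", "E%"
    code_id: Optional[int] = None    # references Code.id for coded properties
    lower_limit: Optional[float] = None
    upper_limit: Optional[float] = None
    goal_value: Optional[float] = None
    ref_value: Optional[float] = None
    frequency: Optional[str] = None  # e.g. "day"

@dataclass
class Code:
    id: int
    title: str  # e.g. "Gender", "WorkActivity"

@dataclass
class CodeValue:
    id: int
    code_id: int     # references Code.id
    code_value: int  # stored value, e.g. 1
    code_title: str  # label shown to the user, e.g. "Man", "Light"

# Example: the Gender code and its two values, as in the CodeValue table above
gender = Code(id=1, title="Gender")
gender_values = [
    CodeValue(id=1, code_id=gender.id, code_value=1, code_title="Man"),
    CodeValue(id=2, code_id=gender.id, code_value=2, code_title="Woman"),
]
```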

Step 3. Define indexes

OITypeIndex

Id | OITypeId | Title | Definition | DataType | NumUnit | CodeId | Lower limit | Upper limit | Ref value | Goal value | Frequency

The recoverable index rows are: a weight (body mass) index with an upper limit of 25; an energy balance index defined as energy consumption plus exercise kcal minus food kcal; and a food quality index, a percentage score with an upper limit of 30 and a fibre target of over 25 g.


OITypeIndexProperty

Id | OITypeIndexId | OITypePropertyId
The rows link each index to the properties from which it is computed, for example the energy balance index to the exercise and food energy properties.
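The index definitions above can be read as simple formulas over the daily property values. The sketch below illustrates two of them in Python: the energy balance follows the recoverable definition (energy consumption + exercise kcal - food kcal) and the body mass index uses the standard weight/height² formula; the function names, units and example values are assumptions, not taken from the HyperFit implementation.

```python
# Illustrative computation of two Step 3 indexes from daily property values.
def body_mass_index(weight_kg: float, height_m: float) -> float:
    """Standard body mass index: weight divided by height squared."""
    return weight_kg / (height_m ** 2)

def energy_balance(consumption_kcal: float, exercise_kcal: float,
                   food_kcal: float) -> float:
    """Daily energy balance: energy consumption + exercise kcal - food kcal."""
    return consumption_kcal + exercise_kcal - food_kcal

# Example day (values are made up for illustration)
print(round(body_mass_index(weight_kg=70.0, height_m=1.75), 1))   # 22.9
print(energy_balance(consumption_kcal=2000.0, exercise_kcal=300.0,
                     food_kcal=2200.0))                            # 100.0
```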

Step 4. Select visualization and analysis methods

The methods are chosen separately for variables with no time point or a single time point (background variables) and for time series (measurements and indexes, with hourly, daily and weekly sums and means), and within each group by measurement level (nominal, ordinal, quantitative).

Variables
· No time point, nominal: Gender, Work activity
· No time point, quantitative: Age, Height, Start weight
· Time series, ordinal: Feelings, Food quality
· Time series, quantitative: measurements (daily weight, steps, exercise duration, exercise kcals, food kcals, protein, carbohydrates, fat, sugar, fibre) and their means; indexes (body mass index, energy balance)

Univariate methods
· Nominal/ordinal, single time point: pie chart
· Quantitative, single time point: mean, variance, min, max, count; histogram
· Ordinal, time series: time series plot; auto-correlations
· Quantitative, time series: time series plot; auto-correlations; clustering

Bivariate methods
· Quantitative, single time point: scatterplot with regression line; correlation coefficient and R2
· Ordinal, time series: cross-correlations
· Quantitative, time series: cross-correlations; time series diagram of the correlation r calculated from two variables

Multivariate methods
· Correlation matrix or network; residual network; PCA; parallel coordinates
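The selection matrix above can be implemented as a simple lookup from the time dimension, measurement level and number of variables to a list of candidate methods. The Python sketch below illustrates the idea; it encodes only a few of the cells above, and the names are illustrative rather than part of the HyperFit implementation.

```python
# Candidate methods keyed by (time dimension, measurement level, variable count).
# Only a subset of the matrix above is encoded here, for illustration.
METHODS = {
    ("single", "nominal", "univariate"): ["pie chart"],
    ("single", "quantitative", "univariate"): ["mean/var/min/max/count", "histogram"],
    ("single", "quantitative", "bivariate"): ["scatterplot with regression line",
                                              "correlation coefficient and R2"],
    ("time series", "quantitative", "univariate"): ["time series plot",
                                                    "auto-correlations", "clustering"],
    ("time series", "quantitative", "bivariate"): ["cross-correlations"],
    ("time series", "quantitative", "multivariate"): ["correlation matrix or network",
                                                      "PCA", "parallel coordinates"],
}

def candidate_methods(time_dim: str, level: str, arity: str) -> list:
    """Return the candidate visualization/analysis methods for a variable set."""
    return METHODS.get((time_dim, level, arity), [])

print(candidate_methods("time series", "quantitative", "univariate"))
```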


Step 5. Map to original data

ObjectOfInterest | HyperFit source | Derivation
OIId | User-id |
Title | - |
Background properties: | |
Age | User-Birthdate |
Gender | User-Gender |
Height | User-Height |
StartWeight | UserBackground-Weight | First given weight
WorkActivity | UserBackground-Worktimeactivity |
Weight: Value, TimeStamp | User_Background_Weight | Daily mean
Weight: GoalValue | UserGoals-WeigthTarget |
Weight: RefValue | UserGoals-IdealWeight |
Feeling: Value, TimeStamp | ExerciseDiaries (parameter=2) - value | Daily mean
Steps: Value, TimeStamp | ExerciseDiaries (parameter=1) - value | Daily sum
Steps: GoalValue | UserGoals-DailyStepTarget |
Exercise duration: GoalValue | UserGoals-DailyBasicTarget + WeeklyExactTarget/7 |
Exercise duration: Value, TimeStamp | ExerciseDiariesEntries-duration | Daily sum
ExerciseKcal: Value, TimeStamp | ExerciseDiariesEntries-duration * Sports-energyfactor * UserBackground-Weight | Daily sum
FoodKcal: RefValue | UserGoals-EnergyNeed |
FoodKcal: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=11) |
FoodProtein: RefValue | 15 * UserGoals-EnergyNeed / 100 |
FoodProtein: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=12) | 400 * daily sum / FoodKcal
FoodCarbo: RefValue | 55 * UserGoals-EnergyNeed / 100 |
FoodCarbo: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=13) | 400 * daily sum / FoodKcal
FoodFat: RefValue | 30 * UserGoals-EnergyNeed / 100 |
FoodFat: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=16) | 900 * daily sum / FoodKcal
FoodSugar: RefValue | 10 * UserGoals-EnergyNeed / 100 |
FoodSugar: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=14) | 400 * daily sum / FoodKcal
FoodFibre: Value, TimeStamp | FoodDiary-ProductId, NutriContents-Value (Nutritionalpproid=18) | Daily sum
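The derived food values in the mapping above follow one pattern: the daily gram sums are converted to energy percentages (E%) with the factors 400 (protein, carbohydrates, sugar, i.e. 4 kcal/g) and 900 (fat, i.e. 9 kcal/g), and the reference values are fixed percentage shares of the user's daily energy need. The sketch below illustrates this arithmetic in Python; the function names are illustrative and not taken from the HyperFit code.

```python
# Sketch of the Step 5 derived-value arithmetic for food properties.
def nutrient_energy_percent(daily_grams: float, food_kcal: float,
                            kcal_per_gram: float) -> float:
    """Share of the daily food energy coming from one nutrient, in E%
    (e.g. 400 * grams / kcal for protein corresponds to kcal_per_gram = 4)."""
    return 100.0 * kcal_per_gram * daily_grams / food_kcal

def nutrient_ref_value(share_percent: float, energy_need_kcal: float) -> float:
    """Reference value as a fixed share of the daily energy need,
    e.g. 15 for protein, 55 for carbohydrates, 30 for fat, 10 for sugar."""
    return share_percent * energy_need_kcal / 100.0

# Example day: 80 g of protein in a 2000 kcal diet -> 16 E%
print(nutrient_energy_percent(daily_grams=80.0, food_kcal=2000.0, kcal_per_gram=4.0))
print(nutrient_ref_value(share_percent=15.0, energy_need_kcal=2000.0))  # 300.0
```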

Step 6. Add application specific features

No application-specific features were added


Appendix C: HyperFit tool evaluation

Background query

Date ______

Test user

Name ______

Age ______

Gender Man ____ Woman ____

Profession ______

Education ______

Work description:

______

Have you used other similar products? ______

Expectations of the test? ______


After session interview

CASE HyperFit

Date ______

Test users ______

1. What was good about the tool?

2. Was anything confusing or difficult about the tool?

3. What improvements would you suggest?

4. Would you find this kind of tool useful? How?

5. For whom would you recommend this tool and for what purpose?

6. Other comments?


Appendix D: MMEA tool construction steps

Step 1. Define object of interest types

ObjectOfInterestType

Id | Title
- | Area
- | Building
- | Space

Step 2. Define properties

OITypeProperty

Id | OITypeId | Title | Definition | Property type | Data type | NumUnit | CodeId | Lower limit | Upper limit | Goal value | Ref value | Meas. type | Freq

The recoverable property rows are: for a building, the hourly power (electricity usage), water consumption and district heat usage, together with the gross volume and floor area of the building; for a room, the hourly temperature, CO2 concentration and presence (coded); and for an area, the hourly operative temperature and humidity.


Code

Id | Title
- | Presence
- | Indoor Index

CodeValue

Id | CodeId | CodeValue | Title
- | Presence | 0 | No presence
- | Presence | 1 | Presence
- | Indoor Index | S1 | Individualized
- | Indoor Index | S2 | Good
- | Indoor Index | S3 | Satisfactory

Step 3. Define indexes

OITypeIndex

Id | OITypeId | Title | Definition | DataType (nominal/ordinal/quantitative) | CodeId | Lower limit | Upper limit | Ref value | Goal value | Frequency
· Building: Energy efficiency = power / gross volume (quantitative, 1/hour)
· Space: Indoor air quality, defined by the indoor classification (footnote 35), ordinal, using the Indoor Index code

OITypeIndexProperty

Id | OITypeIndexId | OITypePropertyId
- | BuildingEnergyEfficiency | Power
- | BuildingEnergyEfficiency | Gross volume
- | SpaceIndoorAirQuality | CT
- | SpaceIndoorAirQuality | CCO
- | SpaceIndoorAirQuality | WOT

35 https://www.rakennustieto.fi/index/english.html
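The building energy efficiency index defined above is a plain ratio computed hourly. The sketch below shows the calculation in Python; the unit choices and names are illustrative assumptions, and only the power / gross volume formula comes from the table.

```python
# Illustrative computation of the MMEA building energy-efficiency index.
def energy_efficiency_index(power_kw: float, gross_volume_m3: float) -> float:
    """Hourly building energy efficiency: power divided by gross volume."""
    return power_kw / gross_volume_m3

# Example: 120 kW average hourly power for a 30 000 m3 building
print(energy_efficiency_index(power_kw=120.0, gross_volume_m3=30000.0))  # 0.004
```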


Step 4. Select visualization and analysis methods

The methods are chosen separately for variables with no time point or a single time point (background variables) and for time series (measurements and indexes, with hourly, daily and weekly sums and means), and within each group by measurement level (nominal, ordinal, quantitative).

Variables
· No time point, nominal: Building, Room
· No time point, quantitative: Building: gross volume, floor area, age
· Time series, nominal: Room: presence
· Time series, ordinal: Room: indoor index
· Time series, quantitative: Building: power, reactive power, water, district heat; Room: CT (indoor temperature), CCO (CO2); Area: WOT (area temperature), WRH (area humidity); energy performance index; means of measurements

Univariate methods
· Nominal/ordinal, single time point: pie chart
· Quantitative, single time point: mean, variance, min, max, count; histogram
· Ordinal, time series: time series plot; auto-correlations
· Quantitative, time series: time series plot; auto-correlations; clustering

Bivariate methods
· Quantitative, single time point: scatterplot with regression line; correlation coefficient r and R2
· Ordinal and quantitative, time series: cross-correlations

Multivariate methods
· Correlation matrix or network; residual network; PCA; parallel coordinates

Step 5. Map to original data

No mapping is required; the data comes from the platform.


Step 6. Application-specific features

Coordinates and model details are required.

Object of interest | Title | Co-ordinates | IFC model
VTT digitalo | | 8794 | <>


Appendix E: Comparison of visual analytics approaches

Full criteria

· Functionality - A set of attributes that bear on the existence of a set of functions and their specified properties. The functions are those that satisfy stated or implied needs. The sub-characteristics are suitability, accuracy, interoperability, security, and functionality compliance.

· Reliability - A set of attributes that bear on the capability of software to maintain its level of performance under stated conditions for a stated period of time. The sub-characteristics are maturity, fault tolerance, recoverability, and reliability compliance.

· Usability - A set of attributes that bear on the effort needed for use, and on the individual assessment of such use, by a stated or implied set of users. The sub-characteristics are understandability, learnability, operability, attractiveness, and usability compliance.

· Efficiency - A set of attributes that bear on the relationship between the level of performance of the software and the amount of resources used, under stated conditions. The sub-characteristics are time behaviour, resource utilisation, and efficiency compliance.

· Maintainability - A set of attributes that bear on the effort needed to make specified modifications, including analyzability, changeability, stability, testability, and maintainability compliance.

· Portability - A set of attributes that bear on the ability of software to be transferred from one environment to another. These are adaptability, installability, co-existence, replaceability, and portability compliance.

Analysis approaches

ISO/IEC 9126 software quality evaluation criteria | Traditional data mining | General-purpose package solutions | Specific visual analytics application | Domain approach

Functionality

Suitability | Good, tailored specifically to the problems at hand. | Average, does not necessarily include the best possible analysis methods. | Good, but only if a suitable application exists. | Good inside the domain. Visualization capabilities can be limited.

Accuracy | Good, tailored to the problem. | Average. | Good if it matches the application needs. | Good, can be fitted to the problem at hand by adding analysis and visualization methods.

Interoperability | Limited, distinct file-based solutions. | May provide interoperability features. | Limited, distinct applications; interoperability may not be supported. | Standard database supports interoperability.

Security | Requires extra skills and implementation. | May be included. | Typically does not cover security, not the main focus. | Standard solution supports implementation.

Functionality Compliance | Changes require human implementation. | Data contents and methods can be modified easily, but within the limits of the solution. | Changes may require human implementation. | Data contents and methods can be modified easily inside the domain.

Reliability

Maturity | Single-shot solutions, maturity with time. | Matures through multiple users and versions. | Solutions mature through multiple use. | Standard solution, maturity with time.

Fault Tolerance | Depends heavily on the implementation and testing. | Supposedly well tested before launch. | Supposedly well tested before launch. | Standard solution, good fault tolerance can be achieved.

Recoverability | Not good in file-based systems. | If built on databases, includes standard database recoverability. | If built on databases, includes standard database recoverability. | Built on a database, includes standard features of databases.

Reliability Compliance | No. | May be included. | May be included. | Can be implemented.

Usability

Understandability | Depends on the implementation. | Can be confusing initially due to many features. May require considerable learning to apply specific application data. | Focus on the specific application, may be clear. | Usability testing suggests good understandability.

Learnability | Depends on the implementation and personal guidance. | Typically includes good tutorials, examples and guides to support learnability. | Depends on the implementation and heavily on the solution. | Usability testing suggests good learnability. Built-in user guidance lessens the need for learning.

Operability | Depends on the implementation. | Important for selling the product. | Important for selling the product. | Standard solution and libraries improve operability.

Attractiveness | Not the focus. | Important, does not sell otherwise. | Not necessarily the focus. | Depends on the final installation of the user interface.

Usability Compliance | Not the focus. | May provide restricted possibilities to comply with needs. | Not the focus. | The user interface can be tailored on standard interfaces to comply with needs.


Efficiency

Time Behaviour | Can be tuned (depends on the skills of the implementer). | Overall solutions are not known for efficiency. | Depends heavily on the implementation. | Unknown, depends on the database and its tuning capabilities.

Resource Utilisation | Can be tuned (depends on the skills of the implementer). | Overall solutions tend to demand resources. | Depends heavily on the implementation. | A standard solution, can be tuned.

Efficiency Compliance | Good. | Tuning not necessarily possible. | Tuning not necessarily possible. | Standard ways can be constructed.

Maintainability

Analysability | Depends on the implementation, not necessarily very large solutions. | Not necessarily possible, code not available or very large (open source). | Not necessarily possible, code not available. | Depends on the implementation; a standard solution may support analysability.

Changeability | Possible, requires rebuilding the scripts. | New version updates. | New version updates. | Standard interfaces can be implemented to support changing data, analysis and visualization methods, and alternative user interfaces.

Stability | Depends on the implementation. | Commercial products need to be stable. | Depends on the implementation. | Depends on the implementation; standard solutions are more stable.
Testability | Normal. | May include automatic testing features. | Normal. | Normal.

Maintainability Compliance | If an implementer is available at a reasonable price. | Requires dealings with product maintenance. | Requires dealings with the product developer. | Modifications outside data, analysis and visual methods, and the UI require dealings with the product developer.

Portability

Adaptability | Not good, built for specific cases. | Good. | Good inside the application. | Good inside the domain.

Installability | Ad-hoc installation procedures. | Typically includes automatic installation. May require a configuration phase. | Typically includes a good installation procedure. | An installation procedure can be built. Requires a configuration phase.

Co-Existence | No hindrances. | No hindrances. | No hindrances. | No hindrances.

Replaceability | Can be replaced with other kinds of solutions with some loss of functionality. | Other alternatives exist, can be replaced. | Can be replaced with a general-purpose solution, with some loss of functionality. | Can be replaced with a general-purpose solution, with some loss of functionality and usability.

Portability Compliance | Same scripts can be reused, packages are portable. | Unknown. | Unknown. | Can be implemented.