Visually Guiding Users in Selection, Exploration, and Presentation Tasks

Submitted by DI Samuel Gratzl, BSc.

Submitted at Institute of Computer Graphics

Supervisor and First Examiner: Assoz.Univ.-Prof. Dipl.-Ing. Dr.techn. Marc Streit

Second Examiner: Prof. Dr. Tobias Schreck

March 2017

Doctoral Thesis to obtain the academic degree of Doktor der technischen Wissenschaften in the Doctoral Program Technische Wissenschaften

JOHANNES KEPLER UNIVERSITY LINZ, Altenbergerstraße 69, 4040 Linz, Österreich, www.jku.at, DVR 0093696

Abstract

Making scientific discoveries based on large and heterogeneous datasets is challenging. The continuous improvement of data acquisition technologies makes it possible to collect more and more data. However, not only the amount of data is growing at a fast pace, but also its complexity. Visually analyzing such large, interconnected data collections requires a user to perform a combination of selection, exploration, and presentation tasks. In each of these tasks a user needs guidance in terms of (1) what data subsets are to be investigated from the data collection, (2) how to effectively and efficiently explore selected data subsets, and (3) how to effectively reproduce findings and tell the story of their discovery.

On the basis of a unified model called the SPARE model, this thesis makes contributions to all three guidance tasks a user encounters during a visual analysis session: The LineUp multi-attribute ranking technique was developed to order and prioritize item collections. It is an essential building block of the proposed guidance process that has the goal of better supporting users in data selection tasks by scoring and ranking data subsets based on user-defined queries. Domino is a generic visualization technique for relating and exploring data subsets, supporting users in the exploration of interconnected data collections. Phovea is a novel open-source visual analytics platform designed to speed up the creation of domain-specific exploration tools. The final building block of this thesis is CLUE, a universally applicable framework for capturing, labeling, understanding, and explaining visually driven exploration. Based on provenance data captured during the exploration process, users can author “Vistories”, visual stories based on the history of the exploration.

The practical applicability of the guidance model and visualization techniques developed is demonstrated by means of usage scenarios and use cases based on real-world data from the biomedical domain.

Zusammenfassung

Wissenschaftliche Entdeckungen anhand großer heterogener Datensätze zu machen ist eine Herausforderung. Die kontinuierlichen technologischen Verbesserungen ermöglichen es, immer schneller große Datenbestände zu erfassen und zu speichern. Jedoch steigt nicht nur die Datengröße sehr schnell an, sondern auch deren Komplexität. Die visuelle Analyse solch großer verknüpfter Datenmengen erfordert, dass der Benutzer eine Kombination aus Selektions-, Explorations- und Präsentationsaufgaben durchführt. Für jede dieser Aufgaben benötigen BenutzerInnen Unterstützung (Guidance) bezüglich: (1) welche Datenuntermenge aus einer Datensammlung näher untersucht werden soll; (2) wie diese Datenuntermenge effektiv und effizient exploriert werden kann; (3) wie Entdeckungen während der Analyse gefunden wurden und wie diese auch effektiv reproduziert und präsentiert werden können. Basierend auf einem vereinheitlichten Modell, dem SPARE-Modell, beschreibt diese Dissertation Lösungen zu allen bisher genannten Guidance-Aufgaben, denen BenutzerInnen während einer Analyse begegnen. Die LineUp-Visualisierungstechnik wurde für die Reihung und Priorisierung von beliebigen Sammlungen basierend auf einer Kombination von Datenattributen entwickelt. Sie ist ein essentieller Bestandteil des vorgeschlagenen Guidance-Prozesses, welcher das Ziel hat, BenutzerInnen bei der Datenselektion zu unterstützen, indem Daten basierend auf Benutzereingaben gereiht werden. Domino ist eine generische Visualisierungstechnik für die visuelle Exploration von Datenuntermengen und deren Relationen. Phovea ist eine neuartige Open-Source-Plattform für die visuelle Analyse, welche die Erstellung von domänenspezifischen Explorationsanwendungen beschleunigt. Den Abschluss dieser Dissertation bildet CLUE, ein universell einsetzbares Konzept zur Sammlung (Capture), Annotation (Labelling), Verständigung (Understanding) und Erklärung (Explaining) von visuell geleiteten Analysen. Basierend auf der aufgezeichneten Analyse können BenutzerInnen “Vistories” erstellen, um Entdeckungen zu präsentieren und zu teilen. Die praktische Anwendbarkeit der entwickelten Modelle und Visualisierungstechniken wird mithilfe von Anwendungsbeispielen aus dem biomedizinischen Bereich gezeigt.

“I want to remember that I am not my title, my position, my possessions, my appearance, my neighborhood, my age, my body size, my race or my income level. I am a soul.”

— Unknown

Acknowledgements

I would like to thank my supervisor Marc Streit. Since day one he fully integrated me in every aspect, showed me everything, and taught me a lot. Thank you for that. Moreover, I thank Alexander Lex and Nils Gehlenborg; together with Marc, we wrote all my papers. Thank you for the fruitful discussions, the help, the brainstorming sessions, and the last-minute improvements before the deadline. To appreciate your contributions and efforts, I write this thesis using the “we” form instead of “I”. These are our papers, not mine alone.

My thanks also go to the remaining members of the Caleydo team, especially Hendrik Strobelt, Holger Stitz, and Christian Partl. I always had the feeling that we were not a bunch of individuals but a team in which everyone contributes what he or she can. I also want to thank Hanspeter Pfister for giving me the opportunity to stay a couple of months at his lab at Harvard University. This was a memorable time from a personal, teaching, and work point of view.

My family and friends are my anchors in this sometimes crazy and stressful world. Thanks mom and dad for supporting me over the years, and thanks to my siblings and friends who stand by my side. I don’t know what the future will bring, but I know we will be there for each other.

Finally, I would also like to thank all those who challenged me over the years, those who pushed me to go beyond my limits in various ways. All of the great people mentioned, and those I unluckily forgot, helped me to arrive exactly at this point, at which you, my dear reader, can read the result of a four-year journey of my life.

This thesis and individual papers were supported in part by the Austrian Research Promotion Agency (FFG) (840232), the Austrian Science Fund (P27975-NBL, P 22902, and J 3437-N15), the State of Upper Austria (FFG 851460 and Dissertation scholarship), the Austrian Marshallplan Foundation, the European Union (324554), the Air Force Research Laboratory and DARPA grant FA8750-12-C-0300, the United States National Cancer Institute (U24 CA143867), the United States NIH/National Human Genome Research Institute (U01 CA198935, U24 CA144025, U24 CA143845, K99 HG00758, U01 CA1989353) and Boehringer Ingelheim Regional Center Vienna.

Contents

1. Introduction
   1.1. Problem Statement and Goals
   1.2. Contributions
   1.3. Guidance Classification
   1.4. Structure

2. Biomedical Background
   2.1. Problem Domain and Data
   2.2. Cancer Subtype Characterization
   2.3. Challenges

3. Related Work
   3.1. Visual Analysis of Heterogeneous Stratifications: StratomeX
   3.2. Multi-Attribute Ranking
   3.3. Multi-Dimensional Data Exploration
   3.4. Presentation and Storytelling

4. LineUp
   4.1. Introduction
   4.2. Requirement Analysis
   4.3. Ranking Design Space and Related Work
   4.4. Multi-Attribute Ranking Visualization Technique
   4.5. Use Cases
   4.6. Evaluation
   4.7. Conclusion

5. Guided StratomeX
   5.1. Introduction
   5.2. Application Areas
   5.3. Method
   5.4. Case Study
   5.5. Conclusion

6. Domino
   6.1. Introduction
   6.2. Domino Visualization Technique
   6.3. Spectrum of Supported Visualizations
   6.4. Interface and Interaction Design
   6.5. Use Cases
   6.6. Discussion
   6.7. Conclusion

7. CLUE
   7.1. Introduction
   7.2. CLUE Model
   7.3. Realizing the CLUE Model
   7.4. Usage Scenarios
   7.5. Discussion

8. Phovea
   8.1. Key Aspects
   8.2. Related Platforms
   8.3. Ecosystem
   8.4. Additional Libraries
   8.5. Discussion

9. Conclusion
   9.1. Discussion and Future Work
   9.2. Conclusion

Bibliography

A. Phovea Applications
   A.1. Caleydo LineUp and LineUp.js
   A.2. Caleydo Gapminder with CLUE
   A.3. StratomeX.js with CLUE
   A.4. Caleydo Pathfinder

List of Figures

1.1. The SPARE model
1.2. Relationships between my published contributions and the SPARE model
1.3. Five aspects of guidance
1.4. Structure of this thesis with respect to the SPARE model

3.1. StratomeX at a glance
3.2. StratomeX user interface
3.3. Block visualizations in StratomeX

4.1. Illustration of different ranking visualization techniques
4.2. A simple example demonstrating the basic design of the LineUp technique
4.3. Strategies for aligning serial combinations of attribute scores
4.4. Parallel combination of three attributes
4.5. Comparison between rankings
4.6. Visual mapping editor for mapping attribute values to normalized scores
4.7. Example of a customized food nutrition ranking
4.8. LineUp University usage scenario

5.1. Seamless integration of visual and computational components
5.2. Proposed guidance process
5.3. Guided StratomeX user interface
5.4. Flow chart of query options and the query wizard menu items
5.5. LineUp view with multi-attribute ranking and filtering
5.6. Illustration of Guided StratomeX, confirming a hypothesis from a study
5.7. Screenshots of Guided StratomeX, illustrating findings

6.1. Domino showing relationships between subsets of a music charts dataset
6.2. The three block types in Domino
6.3. Block visualization techniques categorized by block type
6.4. Example sequence demonstrating the creation of a combined block
6.5. Overview of possible relationship representations characterized
6.6. Examples of supported visualization techniques
6.7. Interface of the Domino prototype
6.8. Usage of placeholders and live previews to add new blocks to Domino
6.9. Relationship interaction mode
6.10. Key findings on subtypes in glioblastoma multiforme

7.1. Traditional workflow for visual data exploration and presentation
7.2. Information flow and stage transitions using the CLUE model
7.3. Three examples of transitions within the CLUE model
7.4. Provenance graph data model
7.5. Close-ups of the provenance and story view
7.6. Screenshot of a Gapminder-inspired application
7.7. Screenshot of CLUE applied to the StratomeX technique

8.1. Ecosystem of Phovea

A.1. Screenshot of Caleydo LineUp
A.2. Screenshot of Caleydo Gapminder
A.3. Screenshot of Caleydo StratomeX
A.4. Screenshot of Caleydo Pathfinder

List of Tables

1.1. Guidance classification of the primary contributions

6.1. Properties of the possible block relationship degrees

1 | Introduction

Contents

1.1. Problem Statement and Goals
1.2. Contributions
1.3. Guidance Classification
1.4. Structure

1.1. Problem Statement and Goals

Making scientific discoveries based on large and heterogeneous datasets is challenging. The continuous improvement of data acquisition technologies makes it possible to collect more and more data. However, not only the amount of data is growing at a fast pace, but also its complexity. Visually analyzing such large, interconnected data collections requires a user to perform a combination of selection, exploration, and presentation tasks. In each of these tasks a user needs guidance in terms of (1) what data subsets are to be investigated from the data collection, (2) how to effectively and efficiently explore selected data subsets, and (3) how to effectively reproduce findings and tell the story of their discovery.

In this thesis a data subset is defined as a homogeneous mathematical subset of a dataset. In the case of a heterogeneous multi-dimensional dataset consisting of multiple dimensions $D$ of set $I$, a subset is $S \subseteq I_D$. For example, in a patient dataset, a data subset can be a single dimension, such as the patient's gender, or a group of patients for this dimension. In the case of two-dimensional homogeneous datasets $R \times C$ consisting of sets of rows $R$ and columns $C$, a data subset $S$ is defined as $S = (R_S \subseteq R) \times (C_S \subseteq C)$. This includes special cases in which only one column or row is included.
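To make the definition concrete, here is a minimal Python sketch (an illustration of the definition only, not code from any thesis prototype; the dataset and names are hypothetical) that selects such a data subset from a two-dimensional dataset:

```python
import pandas as pd

# A two-dimensional homogeneous dataset R x C:
# rows R are patients, columns C are clinical attributes (hypothetical data).
data = pd.DataFrame(
    [[61, "female", 412],
     [58, "male", 833],
     [70, "female", 127]],
    index=["patient1", "patient2", "patient3"],  # R
    columns=["age", "gender", "days_to_death"],  # C
)

# A data subset S = (R_S ⊆ R) x (C_S ⊆ C):
rows_s = ["patient1", "patient3"]  # R_S, a group of patients
cols_s = ["gender"]                # C_S, the special case of a single column
subset = data.loc[rows_s, cols_s]
print(subset)
```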

Ceneda et al. [CGM+17] have recently characterized guidance in the context of Visual Analytics using the following definition:

Guidance is a computer-assisted process that aims to actively resolve a knowl- edge gap encountered by users during an interactive visual analytics session.

The authors further clarified that “according to this definition, guidance is a dynamic process that aims to support users in a particular task” [CGM+17]. The goal of this work was to develop models and techniques that guide users in the three tasks of what data subset is to be selected, how to explore selected data subsets, and how to reproduce findings and tell the story of their discovery. Challenges included the heterogeneity of the different guidance tasks and how to combine solutions for these tasks into one coherent guidance system. A common approach is to prioritize the data collection, which results in a ranking. However, when considering rankings that are based not on a single but on a combination of multiple scores, current approaches lack support in effectively communicating what drives the ranking. In addition, most visual exploration tools do not cover the aspects of presentation and reproducibility. Taking a screenshot is a common approach to presenting the findings gained in a visual analysis session. However, it has several drawbacks, for instance, that analysts cannot go back to the original exploration, since a screenshot is a static artifact.

1.2. Contributions

As outlined in the previous section, the goal of this work was to propose solutions for visually guiding users in selection, exploration, and presentation tasks. This goal can be divided into five interleaving stages that form the SPARE model (Scoring, Presentation, Authoring, Ranking, and Exploration), as illustrated in Figure 1.1. The transitions between the stages are split into user stage transitions on the one hand and information flow on the other hand. While information flow is unidirectional, users can transition freely between the stages.

Within the Exploration stage the analyst explores the dataset using interactive visualizations and, needing support in the selection task, defines a query that describes the targets of an automated search for particular patterns in the data collection. The Scoring stage computes for each possible subset in the dataset collection a score that indicates how well the subset matches the query. The scoring algorithm itself is domain- and query-specific. Further, the interpretation of the score depends on the algorithm used; for example, in the case of a score that computes p-values, the smaller the value the better, while for a similarity score the higher the better. To further analyze the resulting scores, the user switches to the Ranking stage in order to prioritize and rank the computed scores. The ranking itself is influenced by multiple attributes, including the computed score, meta-data, and other user-defined parameters, such as filters. Based on the resulting ranking, the analyst selects one or multiple results that match the query and continues the exploration in the Exploration stage, in which guidance mechanisms support the analyst in relating the currently added data subset to existing ones. This iterative process of querying, scoring, ranking, and exploring data subsets follows the visual analytics process by Keim et al. [KKEM10].

Figure 1.1.: The SPARE (Scoring, Presentation, Authoring, Ranking, and Exploration) model that constitutes the conceptual basis for guiding the user in selection, exploration, and presentation tasks.
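Because score semantics differ between algorithms (smaller p-values are better; larger similarity scores are better), an implementation of the Scoring stage typically normalizes raw scores to a common "higher is better" scale before they enter the Ranking stage. The following Python sketch is an assumption about how this could look, not code from the thesis prototypes:

```python
def normalize_p_value(p: float) -> float:
    """p-values: the smaller the better, so invert to 'higher is better'."""
    return 1.0 - p

def normalize_similarity(s: float, s_max: float = 1.0) -> float:
    """Similarity scores: already 'higher is better', just rescale to [0, 1]."""
    return s / s_max

# Hypothetical p-values for three candidate data subsets matching a query.
candidates = {"cluster_a": 0.001, "cluster_b": 0.2, "cluster_c": 0.04}

# Rank candidates by normalized score, best match first.
ranked = sorted(candidates,
                key=lambda c: normalize_p_value(candidates[c]),
                reverse=True)
print(ranked)  # ['cluster_a', 'cluster_c', 'cluster_b']
```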

After discovering a finding in the Exploration stage, the analyst can switch to the Authoring stage in order to prepare a shareable, understandable, and reproducible story describing how the finding was discovered. The resulting story will finally be presented in the Presentation stage, in which consumers of the story can go back to the Exploration stage in order to continue the exploration.
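Read as a graph, the SPARE model thus has five stages with directed information-flow edges but unrestricted user transitions. The following toy encoding is my own illustration of that reading, with the edge set inferred from the description above, not an API from the thesis:

```python
from enum import Enum

class Stage(Enum):
    SCORING = "S"
    PRESENTATION = "P"
    AUTHORING = "A"
    RANKING = "R"
    EXPLORATION = "E"

# Unidirectional information flow, as inferred from the model description:
# a query feeds scoring, scores feed the ranking, selected results feed the
# exploration, findings feed the authored story, the story feeds presentation.
INFORMATION_FLOW = {
    (Stage.EXPLORATION, Stage.SCORING),
    (Stage.SCORING, Stage.RANKING),
    (Stage.RANKING, Stage.EXPLORATION),
    (Stage.EXPLORATION, Stage.AUTHORING),
    (Stage.AUTHORING, Stage.PRESENTATION),
}

def user_can_transition(src: Stage, dst: Stage) -> bool:
    """Users may move freely between stages; only information flow is directed."""
    return src is not dst
```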

The realization of the SPARE model was described in multiple publications that are part of this thesis. Figure 1.2 illustrates the relationship between the SPARE model and the publications. Publications indicated in bold are primary papers on which this thesis builds. The author of this thesis is also the first author of these papers. An exception is the publication on Guided Visual Exploration of Genomic Stratifications in Cancer, work on which began before the author of this thesis started his PhD studies. However, he later joined the team and made significant contributions.

Figure 1.2.: Relationships between individual published contributions and the SPARE model. Publications indicated in bold form the basis of this thesis, while the rest are secondary publications to which the author of this thesis contributed.

1.2.1. Primary Publications

The following list of peer-reviewed publications forms the core of this thesis. Individual chapters are based on these publications.

LineUp: Visual Analysis of Multi-Attribute Rankings [GLG+13]: Samuel Gratzl, Alexander Lex (then Harvard University, now University of Utah), Nils Gehlenborg, Hanspeter Pfister (both Harvard University), and Marc Streit (JKU Linz). IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2277-2286. Acceptance Rate: 25.1%

LineUp is a visualization technique for interactively creating and working with multi-attribute rankings. It is therefore assigned to the Ranking stage of the SPARE model, which is concerned with guiding users in selection tasks by prioritizing data collections. Using intuitive and easy-to-use encodings and interactions, analysts are able to better understand, analyze, and manipulate complex rankings such as university rankings. The initial concept already existed when the author started his PhD studies, since it was originally designed merely as a support view for the publication on Guided StratomeX [SLG+14]. However, we soon found out that rankings are a common problem in various domains and thus we extracted and generalized our idea.

Together with the co-authors, the author of this thesis refined the initial concept, generalized it, and wrote the manuscript. In addition, he implemented the improved version of the original prototype and conducted the accompanying user study.

LineUp has been adopted both by the scientific community and by companies. For example, Kitware Inc. integrated LineUp in their Candela suite (https://candela.readthedocs.io/en/latest/components/lineup), and Microsoft provides LineUp as an extension (https://app.powerbi.com/visuals/show/TableSorter1450434005853) to their Power BI visual analysis product. The publication received the Best Paper Award (152 submissions) at the IEEE VIS conference in 2013. Tamara Munzner also used LineUp as an example in her textbook “Visualization Analysis and Design” [Mun14], which is used as primary literature in numerous visualization courses.

Guided StratomeX: Guided Visual Exploration of Genomic Stratifications in Cancer [SLG+14]: Marc Streit (JKU Linz) & Alexander Lex (then Harvard University, now University of Utah), Samuel Gratzl, Christian Partl, Dieter Schmalstieg (both Graz University of Technology), and Hanspeter Pfister, Peter J. Park & Nils Gehlenborg (all Harvard University). Nature Methods, 11(9):884-885. Impact Factor: 32.07

In this paper we introduced a guidance process for selection tasks in visual exploration applications, which corresponds to the left part of the SPARE model. We focused on supporting users in cancer subtype characterization by applying our guidance process to an extended version of the previously published comparative set visualization technique StratomeX [LSS+12], combined with our ranking technique LineUp. For an introduction to the domain problem of cancer subtype characterization refer to Section 2.2.

The initial concept and project was conceived by Marc Streit, Alexander Lex, and Nils Gehlenborg. Building on the StratomeX visualization technique [LSS+12], the author of this thesis contributed to the guidance process definition and supplementary material. In addition, he extended, integrated, and implemented the guidance process in the Caleydo framework. The extended version is hereafter referred to as “Guided StratomeX”.


Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets [GGL+14]: Samuel Gratzl, Alexander Lex (then Harvard University, now University of Utah), Nils Gehlenborg, Hanspeter Pfister (Harvard University), and Marc Streit (JKU Linz). IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), 20(12):2023-2032. Acceptance Rate: 23.0%

Domino is a general visualization technique for relating attributes between different data subsets on various levels of granularity. By reducing the basic elements to blocks and relationships, users can recreate a variety of existing visualization techniques. Domino is assigned to the Exploration stage of the SPARE model as it supports users in relating data subsets by using placeholders and previews.

The author designed the technique in collaboration with the other co-authors of the paper. Together with Marc Streit, he wrote major parts of the paper. In addition, he implemented the prototype. This paper won a Best Paper Honorable Mention Award at IEEE VIS 2014 (196 submissions).

Phovea/Caleydo Web: An Integrated Visual Analysis Platform for Biomedical Data [GGL+15]: Samuel Gratzl, Nils Gehlenborg (Harvard University), Alexander Lex (University of Utah), Hendrik Strobelt (Harvard University), Christian Partl (Graz University of Technology), and Marc Streit (JKU Linz). Poster Compendium of the IEEE Conference on Information Visualization (InfoVis ’15).

This poster introduced the Caleydo Web platform, which has recently been renamed to Phovea. Phovea is an open-source visual analysis platform designed for the challenges encountered in the biomedical domain. Unlike its predecessor, the Java-based Caleydo framework [LSKS10], Phovea is based on a web-client-server architecture. In addition to the platform and a corresponding client and server framework, Phovea provides development tools that support visualization researchers in efficiently creating new Phovea applications. The source code is available at http://github.com/phovea. A collection of applications written using Phovea is hosted at http://caleydoapp.org. Further information can be found at http://phovea.caleydo.org.

The author designed the architecture and implemented most of the platform. In addition, he wrote major parts of the manuscript. This poster presented a preview of the platform, which was still under active development at the time this thesis was written.

CLUE: From Visual Exploration to Storytelling and Back Again [GLG+16]: Samuel Gratzl, Alexander Lex (University of Utah), Nils Gehlenborg (Harvard University), Nicola Cosgrove, and Marc Streit (both JKU Linz). Computer Graphics Forum (EuroVis ’16), 35(3):491–500. Acceptance Rate: 27.3%

This paper introduced a generic model called CLUE for capturing, labeling, understanding, and explaining visualization-driven exploration. The CLUE model corresponds to the right side of the SPARE model. In this work, we proposed tracking all actions performed during the visual analysis in a provenance graph. On the basis of the provenance graph, analysts can create “Vistories”, visual stories based on the exploration, in order to present findings. Vistories can be shared with and viewed by other analysts. Moreover, analysts can go back to the exploration and continue the analysis at any point of the Vistory. In addition to the model, a prototype library based on Phovea was implemented that allows visual exploration applications to be extended with CLUE capabilities.
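As a rough sketch of the underlying idea (the data-structure names below are my own, not Phovea's or CLUE's actual API), a provenance graph chains application states via the actions that produced them, and a Vistory is an ordered sequence of annotated states:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Action:
    """An executed action, e.g., 'cluster dataset X' or 'select subset Y'."""
    name: str
    parameters: dict

@dataclass
class State:
    """An application state, reached by applying an action to a parent state."""
    id: int
    created_by: Action
    parent: Optional["State"] = None

@dataclass
class Slide:
    """One step of a Vistory: a captured state plus the author's annotation."""
    state: State
    annotation: str

# A Vistory is an ordered path through the provenance graph; consumers can
# jump from any slide back into exploration by restoring the referenced state.
Vistory = List[Slide]
```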

In collaboration with the other co-authors of the paper, the author of this thesis designed the CLUE model. In addition, he wrote major parts of the manuscript and implemented the prototype.

1.2.2. Secondary Publications

In addition to the primary publications that form the core of this thesis, the author contributed to several related peer-reviewed papers. The following list gives an overview, including a short description of each paper, the author’s contributions, and how each paper relates to this thesis.

Furby: Fuzzy Force-Directed Bicluster Visualization [SGG+14]: Marc Streit, Samuel Gratzl, Michael Gillhofer, Andreas Mayr, Andreas Mitterecker, and Sepp Hochreiter (all JKU Linz). BMC Bioinformatics, 15(Suppl 6):S4. Impact Factor: 2.789

Furby is a visualization technique for exploring fuzzy bicluster results. Furby generalizes the StratomeX visualization technique by removing the restriction of relating data subsets in the horizontal direction only. Together with Marc Streit, the author of this thesis designed the concept and wrote large parts of the manuscript. In addition, he extended and improved the prototype implemented as part of Michael Gillhofer’s Bachelor thesis [Gil13].

Opening the Black Box: Strategies for Increased User Involvement in Existing Algorithm Implementations [MPG+14]: Thomas Mühlbacher, Harald Piringer (both VRVis Forschungs GmbH), Samuel Gratzl, Michael Sedlmair (University of Vienna), and Marc Streit (JKU Linz). IEEE Transactions on Visualization and Computer Graphics (VAST ’14), 20(12):1643-1652. Acceptance Rate: 22.6%

This paper characterized types of user involvement in ongoing computation based on a systematic literature review, and presented strategies for achieving user involvement in visual analytics applications. This paper focused on the Scoring stage of the proposed SPARE model and on how users can be involved in various ways in the automatic computation of scores. The author contributed to the overall concept, literature review, and systematic evaluation of existing algorithm packages.

CloudGazer: A Divide-and-Conquer Approach to Monitoring and Optimizing Cloud-Based Networks [SGKS15]: Holger Stitz (JKU Linz), Samuel Gratzl, Michael Krieger (RISC Software GmbH), and Marc Streit (JKU Linz). Proceedings of the IEEE Pacific Visualization Symposium (PacificVis ’15), pages 175-138. Acceptance Rate: 30.4%

CloudGazer is a platform and visualization technique for monitoring large networks whose nodes and relationships are described by time-varying attributes. CloudGazer supports analysts in the exploration of heterogeneous networks by splitting the graph into multiple linked hierarchies. It therefore relates to the Exploration stage of the SPARE model. The author contributed to the initial concept, the design and implementation of the platform, and the manuscript.

ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor [SGAS16]: Holger Stitz (JKU Linz), Samuel Gratzl, Wolfgang Aigner (St. Poelten University of Applied Sciences), and Marc Streit (JKU Linz). IEEE Transactions on Visualization and Computer Graphics, 2016. Impact Factor: 1.4

ThermalPlot is a visualization technique designed to provide an overview of large time-varying item collections such as stock-market data. The ThermalPlot technique uses LineUp to filter and preselect interesting items in the collection, implementing the guidance process. The author contributed to the initial design, the implementation, and the manuscript.

Entourage: Visualizing Relationships between Biological Pathways using Contextual Subsets [LPK+13]: Alexander Lex (then Harvard University, now University of Utah), Christian Partl, Denis Kalkofen (both Graz University of Technology), Marc Streit (JKU Linz), Anne Mai Wassermann (then Novartis Institute for BioMedical Research, now Pfizer), Samuel Gratzl, Dieter Schmalstieg (Graz University of Technology), and Hanspeter Pfister (Harvard University). IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2536-2545. Acceptance Rate: 25.1%

Entourage is a visual exploration technique for exploring experimental data in interconnected but logically separated biological pathways. Analysts can prioritize pathways based on the similarity to a selected pathway using LineUp. The author contributed to the implementation of the prototype by integrating the guidance process.

Pathfinder: Visual Analysis of Paths in Graphs [PGS+16]: Christian Partl (Graz University of Technology), Samuel Gratzl, Marc Streit (JKU Linz), Anne Mai Wassermann (Pfizer), Hanspeter Pfister (Harvard University), Dieter Schmalstieg (Graz University of Technology), and Alexander Lex (University of Utah). Computer Graphics Forum (EuroVis ’16), 2016, 35(3):71-80. Acceptance Rate: 25.1%

Pathfinder is a technique for querying, ranking, and exploring paths in large graphs. Pathfinder uses an adapted version of the guidance process in which the Ranking and Exploration stages are merged by focusing the visual analysis on relations of the prioritized paths. The author contributed to the implementation by extending the underlying Phovea platform and by implementing the server components of the technique for incremental fetching of query results from the underlying graph database. This paper won a Best Paper Honorable Mention Award (183 paper submissions) at EuroVis 2016.

1.3. Guidance Classification

This section describes the individual primary publications and classifies them according to the guidance characterization published by Ceneda et al. [CGM+17]. An exception is the work on the Phovea visual analysis platform, since it is a technical framework that is not directly exposed to users. Ceneda et al. characterize guidance according to five aspects, as shown in Figure 1.3.

The Knowledge Gap Type identifies whether (a) the target the user wants to achieve, (b) the path via which a certain result can be achieved, or (c) both are unknown. For example, the analyst knows that she wants to cluster the data subset (= target known) but does not know how to trigger it (= path unknown). The Knowledge Gap Domain defines the domain in which the user needs guidance: most importantly data, but also tasks or infrastructure. Input refers to the information that is used as a basis for the guidance, ranging from data, to visualization images and the history of the analysis. Output characterizes what kind of answers the user receives (direct vs. indirect) and by which means they are presented, most importantly visually. Finally, the Guidance Degree specifies to which degree users are guided, ranging from Orienting (no active guidance but provides overview), to Directing (points to the solution) and Prescribing (activity leads to solution). For a more detailed description of the aspects of guidance refer to the original paper [CGM+17].

Figure 1.3.: Five aspects of guidance, according to Ceneda et al. [CGM+17]

Table 1.1 shows the classification of the primary publications.

| Paper | Knowledge Gap Type | Knowledge Gap Domain | Input | Output | Guidance Degree |
|---|---|---|---|---|---|
| LineUp | target, path | data | data | indirect answer, by visual means | orienting, directing |
| Guided StratomeX | target, path | data | data, domain models | direct answer, indirect answer, by visual means | orienting, directing, prescribing |
| Domino | target | data, VA methods | data | direct answer, by visual means | orienting |
| CLUE | target, path | tasks | history | direct answer, by visual means | orienting, directing, prescribing, fully automated |

Table 1.1.: Guidance classification of the primary contributions according to Ceneda et al. [CGM+17].

1.3.1. LineUp

LineUp supports users in filling the knowledge gap of finding an unknown target, i.e., the best possible data subsets from a data collection, by ranking them. In addition, as part of our future work, we propose to help users in finding a particular desired ranking by optimizing how attributes can be combined, thus addressing the “path unknown” knowledge gap. LineUp operates in the data domain and uses the data as input for its guidance. However, the answers are just indirect through the ranking, since the actions based on the top-ranked items are up to the analyst. In summary, LineUp realizes two guidance degrees: (1) Orientation by providing histograms showing distributions of individual attributes and (2) Directing through the ranking itself, which is the basis for the decision process.

1.3.2. Guided StratomeX

The guidance process we proposed as part of Guided StratomeX supports users in filling the knowledge gap of deciding which data subset to select by providing a “wizard-style” interface that lets the user choose between predefined query options. In addition to the dataset collection, domain models are the input for the guidance. Domain models in this context are specialized scoring algorithms chosen for a particular domain problem. This includes, for example, a log-rank test in the biomedical domain for quantifying the significance of differences in the survival of patients. Answers are both direct, since the scores provide the analyst with specific results based on the query formulated, and indirect as in LineUp. With our guidance process we support multiple guidance degrees: Orientation through the overview provided by LineUp; Directing by the ranking that can be created via LineUp; and Prescribing through the wizard interface that supports users in formulating queries.

1.3.3. Domino

This technique supports users in exploring relationships of data subsets. Users are not only supported in the data domain; suitable visualization techniques (VA methods) are also suggested through placeholders and previews. However, since users are not actively steered, Domino provides guidance only to the degree of Orientation.

1.3.4. CLUE

The CLUE model takes the history of the exploration as input to provide guidance for creating Vistories. The analyst who creates the Vistory guides the consumer of the Vistory through an analysis. In this sense, CLUE covers the full spectrum of guidance degrees: Orientation in the task domain is provided by showing the provenance graph during the exploration itself and through indicating the current step in the Vistory. Providing directions will be part of future work that seeks to analyze recorded provenance graphs to guide users to similar states in the exploration. Prescribing and fully automated guidance are realized through Vistories, in which the consumer can either use a stepper interface to watch a Vistory or play the Vistory as an animation.

1.4. Structure

This thesis is structured as follows. Figure 1.4 illustrates the relation to the SPARE model.

Figure 1.4.: Structure of this thesis with respect to the SPARE model

Chapter 2 introduces the characteristics and semantics of typical dataset types in the biomedical domain. In addition, the specific challenges of this domain are highlighted based on the problem of cancer subtype characterization. This chapter is partly based on the section “Biological Background and Data” in [LSS+12].

Chapter 3 discusses related work based on the presented model. Further, it explains the preexisting visualization technique StratomeX [LSS+12], which is an example exploration technique used throughout this thesis.

Chapter 4 introduces the LineUp visualization technique for analyzing multi-attribute rankings. LineUp is a general technique for prioritizing item collections for decision-making. It is later used as an essential part of the model presented in the following chapter. This chapter is based on [GLG+13].

Chapter 5 describes a guidance process for selection-based guidance tasks, reflecting the left side of the SPARE model. The presented model is described using the biomedical use case of cancer subtype characterization as an example. The StratomeX visualization technique was adapted and extended by the proposed guidance process and the previously presented LineUp technique for visualizing rankings. This chapter is based on [SLG+14].

Chapter 6 introduces the Domino visualization technique for extracting, comparing, and manipulating subsets across multiple tabular datasets. Domino is a general exploration technique that allows many existing visualization techniques to be replicated, such as StratomeX, Scatterplots, and Parallel Coordinates. This chapter is based on [GGL+14].

Chapter 7 presents the CLUE model for capturing, labeling, understanding, and explaining visualization-driven explorations. This model reflects the right side of the SPARE model. In order to reproduce discoveries and visual analysis sessions, provenance data is collected. This chapter is based on [GLG+16].

Chapter 8 focuses on Phovea, an integrated visual analysis platform for biomedical data. Motivated by a collection of key aspects, this chapter describes the ecosystem and its architecture. This chapter is based on [GGL+15].

Chapter 9 concludes this thesis with a discussion of the presented contributions along with possible future work.

2 | Biomedical Background

Contents

2.1. Problem Domain and Data
2.2. Cancer Subtype Characterization
2.3. Challenges

Biomedicine is a prime example of a field in which Visual Analytics is applied. In this chapter, the characteristics of this domain, the subset of its data that is relevant to this thesis, and its challenges are introduced. In addition, the specific domain problem of characterizing cancer subtypes is discussed in detail. The following chapters describe use cases and usage scenarios from this domain in order to show the practical applicability of the proposed techniques and models.

2.1. Problem Domain and Data

The cell is a fascinating building block of life. Complex processes within a cell, so-called metabolic pathways, create and define what we are. The blueprints for all processes are stored in the DNA, a sequence of base nucleotides. We differentiate between four different base types: cytosine (C), guanine (G), adenine (A), and thymine (T). DNA is packed into multiple chromosomes, and within a chromosome it is logically grouped into genes. Around 20,000 genes exist in the human genome. The DNA acts as a blueprint for building proteins. An introduction to molecular biology can be found in the book “The Processes of Life” by Lawrence Hunter [Hun09], which explains molecular biology to computer scientists. The process of converting the blueprint stored in the DNA into a protein consists of multiple steps, including transcribing the DNA into mRNA and the synthesis of polypeptides and thus proteins. While each cell of an organism contains the same DNA, different cell types have different purposes and functions. The reason is that

genes are differently expressed in different cell types, which means that cell types differ in the set of genes that are expressed and also in the expression levels, i.e., the amounts of gene product (mostly proteins) produced.
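For intuition, the transcription step can be mimicked in a few lines (a toy illustration for computer scientists, not part of the thesis): each base of the DNA template strand is complemented, with uracil (U) substituting for thymine:

```python
# Toy transcription: DNA template strand -> mRNA (complement each base, T -> U).
COMPLEMENT = {"C": "G", "G": "C", "A": "U", "T": "A"}

def transcribe(dna_template: str) -> str:
    return "".join(COMPLEMENT[base] for base in dna_template)

print(transcribe("TACGGATT"))  # AUGCCUAA
```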

One of the drivers of evolution is alterations in the process of creating proteins. These alterations can occur at different levels and granularities. A Single Nucleotide Polymorphism (SNP) is the smallest change, indicating that one nucleotide (C, G, A, or T) base pair at a particular position in the genomic DNA is altered. A Mutation indicates that one or more base pairs are inserted or deleted. Copy Number Variations (CNVs) can occur during cell division and result in certain regions of the DNA and whole genes being copied incorrectly to the replicated DNA; the variations range from DNA not being copied at all (deletion) to multiple copies being produced (amplification). Finally, chromosomal alterations are a special kind of CNV in which whole chromosomes are copied incorrectly. All of the aforementioned alterations may influence the function of a cell. For example, the APOE SNP was found to play a role in Alzheimer’s disease. Mutations in the genes BRCA1 and BRCA2 are related to breast cancer. CNVs can lead to an increased, decreased, or even lost function of a gene and often have an effect on the gene expression level of the altered gene. For example, an increased gene expression level can be caused by too many copies of the corresponding gene.

While this covers only a small portion of how a cell works, it shows the type of data that is usually captured in this context. Commonly acquired data on a gene includes its expression level (a numerical value indicating the amplification level), whether it is mutated (yes/no), and what kind of CNV it has (amplified/normal/deleted). These values are acquired for each of the approximately 20,000 genes for each tissue sample or even for each cell within a sample. If the sample is taken from a patient, additional meta-data is collected, such as age, race, and gender of the patient. In the case of multiple patients or samples, the resulting dataset can be represented as matrices containing the expression level, mutation, or CNV data for each gene and sample.
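For illustration (synthetic values and names, not TCGA data), such a collection can be represented as several matrices that share the same gene and sample indices:

```python
import pandas as pd

genes = ["BRCA1", "BRCA2", "APOE"]
samples = ["sample1", "sample2"]

# Numerical gene expression levels.
expression = pd.DataFrame([[2.3, 0.7], [1.1, 1.9], [0.4, 0.5]],
                          index=genes, columns=samples)

# Binary mutation calls (mutated yes/no).
mutation = pd.DataFrame([[True, False], [False, False], [False, True]],
                        index=genes, columns=samples)

# Ordinal copy number status: -1 = deleted, 0 = normal, +1 = amplified.
cnv = pd.DataFrame([[1, 0], [0, -1], [0, 0]],
                   index=genes, columns=samples)
```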

2.2. Cancer Subtype Characterization

Cancer is a class of complex diseases caused by a series of alterations in the DNA and related elements in the cell. These alterations can trigger abnormal cell growth rates, which can result in a tumor. The Cancer Genome Atlas (TCGA) project (http://cancergenome.nih.gov) is one of the most comprehensive sources for cancer genomics datasets. The goal of this project was to collect 500 patient samples for each of the twenty most common cancer types and to acquire for each sample high-quality genomic data, including gene expression levels, mutation frequencies, and CNVs. In addition, clinical parameters were collected for each patient, such as gender, race, and how many days the patient survived (days to death). Due to the incremental collection and processing of tumor samples, the datasets change frequently. The Broad Institute of MIT and Harvard developed an automated analysis pipeline called Firehose (http://gdac.broadinstitute.org) to preprocess and perform comprehensive automated analyses on each tumor cohort. The results are publicly available.

One of the goals of the TCGA project was to identify and characterize cancer subtypes. Traditionally, cancer types are classified according to the tissue or cell type they occur in, but it was found that this classification is too coarse-grained. For example, different breast cancers can be distinguished based on different genomic alterations. These cancer subtypes are not yet fully identified, but play an important role in the prognosis and treatment of patients.

In addition to the raw-data processing performed within the Firehose pipeline, unsupervised clustering methods are applied to the gene expression level matrices to automatically identify cancer subtypes by grouping patients. However, a variety of algorithms with different parameter settings are applied as part of the pipeline. Each clustering result is called a stratification, in which each cluster constitutes a cancer subtype candidate. In order to identify which candidates are actual cancer subtypes, biologists need to find supporting evidence. For example, a statistically significant difference in survival rates between two candidates supports a separation into two groups. Survival rates correspond to the collected days-to-death attribute and show how soon patients with a specific type of cancer die. Other types of supporting evidence are mutations and CNVs. For example, scientists found that mutations in the genes BRCA1 and BRCA2 are related to certain breast cancer subtypes. However, cancer is a multifactorial disease (i.e., the result of multiple alterations), which complicates the identification of supporting evidence.
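As a hedged sketch of such an evidence check (using the third-party lifelines package; the thesis prototypes do not necessarily use this library, and the data below is synthetic):

```python
from lifelines.statistics import logrank_test

# Days-to-death for patients in two cancer-subtype candidates (synthetic),
# plus event indicators (1 = death observed, 0 = censored).
days_a = [120, 250, 310, 400, 150]
days_b = [800, 950, 700, 1200, 600]
observed_a = [1, 1, 1, 0, 1]
observed_b = [1, 0, 1, 0, 1]

result = logrank_test(days_a, days_b,
                      event_observed_A=observed_a,
                      event_observed_B=observed_b)

# A small p-value supports separating the two candidates into distinct subtypes.
print(result.p_value)
```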

2.3. Challenges

The previous section introduced the problem of cancer subtype characterization, which involves several challenges that need to be addressed:

Heterogeneity plays a major role in this context. Multidimensional datasets range from homogeneous gene expression level matrices, to binary mutation and ordinal CNV matrices, to heterogeneous clinical variable tables. Furthermore, graphs representing the metabolic pathways of a cell (with node and link attributes) pose additional challenges in exploring these datasets.

Data Size: The TCGA project aimed to collect data for over 10,000 patients and to acquire data for each of the approximately 20,000 genes. Although this data size is manageable compared to big data produced in other fields, automated data mining methods have limited applicability. Within the TCGA Firehose pipeline, for example, several different clustering algorithms with different parameter settings are executed to identify cancer subtypes. A problem arises in identifying which resulting stratification can be confirmed by supporting evidence. Visual Analytics is used for this purpose. While automatic methods can support the user in this task by preselecting potential candidates, users with their domain knowledge and ability to detect hidden patterns are essential in this process. Having the user in the loop, however, requires interactive visualizations of the underlying data, for which the data size becomes an issue.

Combinatorial Explosion is another problem when trying to identify cancer subtypes based on supporting evidence. In theory, each mutation, CNV, or categorized meta-data attribute of the 20,000 genes could characterize a cancer subtype. In practice, combinations of genes often have effects on each other, which leads to an extensive number of possibilities to test. Visualizations can help to identify these patterns when analysts look at the right combination of data subsets. However, choosing which data subsets to visualize in what order is challenging. Guidance systems supporting users in this task can help to prioritize potentially interesting data subsets.
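As a rough back-of-the-envelope illustration (my own arithmetic, not a figure from the thesis): even restricting the search to pairs of genes already yields on the order of $10^8$ candidate combinations to test:

```latex
\binom{20\,000}{2} = \frac{20\,000 \cdot 19\,999}{2} \approx 2 \times 10^{8}
```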

Missing Values and Uncertainty both play a role in the biomedical field. Clinical variables can be missing or contain invalid values, for instance, a negative days-to-death value. Both a low signal-to-noise ratio and measurement errors during acquisition of the genomic data increase uncertainty. In addition, methods that process the raw measurements are evolving, which needs to be considered in the analysis. Clusters identified in one analysis run may be invalid in another, due to changes in the processing engine.

Domain-specific knowledge is a major challenge in this area. Since not all biological processes are fully understood, specialized knowledge in various domains (e.g., genetics, biochemistry, immunology) is required to correctly interpret the data that has to be encoded in the visualization. Thus, domain experts must always be in the loop when specialized visualizations are created for them.

3 | Related Work

Contents

3.1. Visual Analysis of Heterogeneous Stratifications: StratomeX
3.2. Multi-Attribute Ranking
3.3. Multi-Dimensional Data Exploration
3.4. Presentation and Storytelling

This chapter starts by introducing the StratomeX visualization technique for analyzing heterogeneous stratifications. StratomeX is the starting point for this thesis: an extended version of it is an essential building block of the tool described in Chapter 5, work on which started before the author of this thesis began his PhD studies. The related work of this thesis is grouped by the different aspects of the proposed SPARE model introduced in Section 1.2. The chapter continues with a detailed discussion of multi-attribute rankings and how they can be visualized, followed by a discussion of existing work related to the proposed multi-dimensional visual exploration technique Domino, representing the center stage of the SPARE model. Related work on how to present a visual analysis using storytelling techniques concludes this chapter. Note that the Scoring stage of the SPARE model is not discussed, since the available and useful scoring algorithms heavily depend on the problem domain. A set of scoring algorithms for the biological domain is presented as part of Chapter 5.

3.1. Visual Analysis of Heterogeneous Stratifications: StratomeX

StratomeX is a visualization technique first published by Lex et al. [LSS+12] in 2012. It extends the parallel sets visualization technique by showing each stratification together with the data that leads to it. This allows the analyst to better compare sets and highlights the causes that lead to a stratification. Cancer subtype characterization as described in Section 2.2 is a prime example in which StratomeX can be used.

Figure 3.1.: The width of the bands denotes the overlap between subsets of adjacent strat- ifications. While there is a strong correlation between stratifications 1 and 2, stratifications 2 and 3 are much more dissimilar. The subsets in stratifications 1 and 2 are almost identical, except for two subsets (clusters 1 and 4) in strat- ification 1 that are merged into a single larger one (cluster 1) in stratification 2. In contrast, the bands between stratification 2 and 3 fan out almost equally, which is an indicator for weak correlation.

Figure 3.1 illustrates the basic principle at a glance. Data subsets are represented as vertical blocks, connected by bands. Each column consists of a summary block at the top and individual blocks for each cluster that form the stratification. The height of the blocks is scaled to be proportional to the number of items they contain, if such scaling can be applied to the corresponding visualization technique. A detailed description of the items is given in Figure 3.2, focusing on a biomedical use case in which patient stratifications are explored.

Figure 3.2.: StratomeX user interface. Stratifications are represented as columns, showing additional data per block. Bands indicate set relationships between two neighboring columns.

StratomeX allows analysts to flexibly switch between different visualization techniques for each block. Depending on the data type associated with a block (such as a group of patients), different visualizations are used to present its data (Figure 3.3). While the header block at the top summarizes the data of all patients in a given column using a histogram, the visualization in each block below represents only the data of the patients from the corresponding patient subset. The height of the blocks is scaled to be proportional to the number of patients they contain, if such scaling can be applied to the corresponding visualization technique. Columns 3 and 4 are dependent columns, meaning that they use the same stratification as the column that they depend on. In this case, the third and fourth columns use the stratification of the second column, but apply it to a different dataset. In the third column, the average mRNA expression of all four groups from the second column is color-coded onto the KEGG PI3K-Akt signaling pathway (hsa04151). The fourth column shows survival data using the same stratification as the second column.

Figure 3.3.: Various visualization techniques are provided for visualizing the data associated with a block: (a) heatmaps for tabular data, (b) uniformly colored blocks for categorical data, (c) histograms for tabular data, (d) histograms for categorical data, (e) box plots for numerical data, (f) Kaplan-Meier plots, and (g) pathways with average values mapped onto the nodes. The height of the visualizations in (a) and (b) is scaled proportionally to the number of patients they represent, while the height of the other visualizations is constant.

In summary, StratomeX is a flexible and powerful visual exploration technique for exploring complex set relationships along with data. Through the intuitive relation of band and block widths to the number of their contained items, analysts can explore and relate complex stratifications effectively.
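The band widths in such a view boil down to set intersections between the clusters of two adjacent stratifications. A minimal Python sketch (my own illustration of the principle, not StratomeX code):

```python
from itertools import product

def band_widths(strat_a: dict, strat_b: dict) -> dict:
    """Overlap (band width) for every pair of clusters from two adjacent
    stratifications of the same item set."""
    return {(a, b): len(strat_a[a] & strat_b[b])
            for a, b in product(strat_a, strat_b)}

# Two stratifications of six patients (hypothetical cluster assignments).
s1 = {"cluster1": {1, 2, 3}, "cluster2": {4, 5, 6}}
s2 = {"clusterA": {1, 2}, "clusterB": {3, 4, 5, 6}}

print(band_widths(s1, s2))
# {('cluster1', 'clusterA'): 2, ('cluster1', 'clusterB'): 1,
#  ('cluster2', 'clusterA'): 0, ('cluster2', 'clusterB'): 3}
```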

3.2. Multi-Attribute Ranking

The purpose of the Ranking stage is to order, prioritize, and rank data subset collections. A wide variety of visualization techniques have been developed for, or have been applied to show, ranked data. The simplest form is a spreadsheet, in which the item is shown along with the value. A detailed discussion of the design of spreadsheets was published by Few [Few12]. Established general-purpose tools such as Microsoft Excel are feature-rich and well known by many users. An early implementation of using the visual attribute length to encode the score of an item is the table lens technique [RC94], which embeds bars within a spreadsheet. Besides being one of the first focus+context techniques, it allows users to set multiple focus areas and also supports categorical values. As in a spreadsheet, users can sort the table items according to an arbitrary column. The rank-by-feature approach by Seo and Shneiderman [SS05] uses ordered bars to present a ranked list together with a score. The ranking suggests potentially interesting features between two variables. The scores are calculated according to criteria such as correlation and uniformity. In addition, the authors propose the rank-by-feature prism, a heat map that shows the feature scores between multiple variables. However, as the scores are based only on a single attribute, the rank-by-feature system does not address multi-attribute rankings. The RankExplorer system by Shi et al. [SCL+12] uses stacked graphs [BW08] with augmented color bars and glyphs to compare rankings over time. While the system effectively lets users compare rankings, it can only incorporate information about the cause of a rank based on multiple attributes by showing details on demand in a coordinated-view fashion. Sawant and Healey [SH08] visualize multi-dimensional query results in a space-filling spiral. Items from the query result are ordered by a single attribute and placed on a spiral. A glyph representation is used for encoding the attributes of the items. By using animation, the visualization can morph between different query results, highlighting the similarities and differences. Here, too, the ranking is based only on a single attribute. Recent work by Behrisch et al. [BDS+13] addresses the comparison of multiple rankings using a small-multiple approach in combination with a radial node-link representation; however, it is not designed to encode the cause of the rankings. The work by Kidwell et al. [KLC08] focuses on the comparison of a large set of incomplete and partial rankings, for example, user-created movie rankings. The similarity of rankings is calculated and visualized using multi-dimensional scaling and heat maps. While their approach gives a good overview of similarities between a large number of rankings with many items, an in-depth comparison of rankings is not possible. As trees can be ranked, tree comparison techniques, such as TreeJuxtaposer [MGT+03], can also be used to compare rankings. However, encoding multiple attributes using trees is problematic.
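In contrast, LineUp (Chapter 4) makes the cause of a multi-attribute ranking explicit. At its core, such a ranking combines normalized attribute scores, for example as a weighted sum; the following Python sketch illustrates the principle with hypothetical data (a minimal illustration of the idea, not the LineUp implementation):

```python
from typing import Dict, List

def normalize(values: List[float]) -> List[float]:
    """Linearly map raw attribute values to scores in [0, 1]."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0
    return [(v - lo) / span for v in values]

def weighted_ranking(items: List[str],
                     attributes: Dict[str, List[float]],
                     weights: Dict[str, float]) -> List[str]:
    """Rank items by the weighted sum of their normalized attribute scores."""
    scores = {attr: normalize(vals) for attr, vals in attributes.items()}
    combined = [sum(weights[a] * scores[a][i] for a in attributes)
                for i in range(len(items))]
    order = sorted(range(len(items)), key=combined.__getitem__, reverse=True)
    return [items[i] for i in order]

# Hypothetical university ranking with two attributes.
unis = ["Uni A", "Uni B", "Uni C"]
attrs = {"citations": [120.0, 80.0, 100.0], "teaching": [0.6, 0.9, 0.7]}
print(weighted_ranking(unis, attrs, {"citations": 0.5, "teaching": 0.5}))
```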

3.3. Multi-Dimensional Data Exploration

The Exploration stage in the center of the SPARE model represents exploration tools and techniques that support analysts in relating data subsets with each other. As defined in Section 1.1, a subset is a mathematical subset of the dimensions and rows in a heterogeneous dataset. A subset of dimensions in a heterogeneous dataset is also known as a subspace of the dimensions. Chapter 6 introduces the Domino visualization technique as one possible technique for this purpose. Domino is designed for multiple heterogeneous, high-dimensional datasets and the relationships between their individual dimensions. These relationships can also be interpreted as set relationships. Therefore, the relevant body of related work comprises visualization techniques for both sets and high-dimensional heterogeneous data. As Domino is a meta-visualization technique that enables users to create new visualizations which are interlinked, the body of related work includes multiple coordinated view systems in general, and integrated view systems that explicitly represent relationships between linked visualizations in particular.

Set and Subset Relationships Visualizing sets and their relationships is a problem frequently encountered in many domains. The most widely used set visualization techniques are Venn and Euler diagrams, which, however, do not scale beyond very small numbers of sets. Consequently, the visualization of relationships between sets has been, and continues to be, an active area of research. In a recent survey paper [AMA+14], Alsallakh et al. reviewed and classified existing set visualization techniques based on three classes of tasks: element-related, set-related, and attribute-related tasks. Most techniques that address attribute-related tasks can handle one or a few attributes. Domino, however, is designed to deal with relationships between tabular datasets in which each of potentially thousands of dimensions could be seen as a separate attribute.

A technique that effectively compares groups of sets (categories) with each other is parallel sets [KBH06]. Parallel sets arranges the (non-overlapping) sets of a group in a column and compares them to adjacent groups of sets using bands. The width of a band encodes the elements shared between two sets. In previous work, we introduced techniques that use this parallel sets metaphor for comparing relationships of partitioned datasets and embed the tabular data from which the partitions were derived, thus visualizing both set relationships and attributes efficiently. A common example is clustering performed on a multi-dimensional dataset (e.g., gene expression data) where items (genes) are partitioned according to similarities in the data. Analysts are usually not only interested in the partitioning of the items, but also in the actual data associated with the items. The Matchmaker technique [LSP+10] supports this task by juxtaposing multiple partitioned tabular datasets, represented as heatmaps, and connecting the groups with bands and the items with individual splines. This column-based approach enables analysts to compare different partitions of the same dataset (e.g., the results of different clustering algorithms) or to explore relationships among multiple partitioned datasets containing the same items (e.g., multiple clustered expression datasets from different groups of patients containing the same genes). VisBricks [LSS+11] generalized this approach by allowing users to switch between various visualization techniques independently for each subset – a concept known as multiform visualization [LSS+11, Rob07]. Analysts can, for instance, choose parallel coordinates as the representation for individual groups to perform filtering tasks and a heatmap to see the overall pattern. StratomeX [LSS+12], as described before, introduced subset visualization for multiple heterogeneous datasets.

All three techniques—Matchmaker, VisBricks, and StratomeX—show the relationships between multiple partitioned sets together with the multi-dimensional datasets on which the partitions are defined. However, due to the column-based arrangement of the tabular datasets, the analyst is limited to relationships of one type. For instance, if two clustered gene expression tables are positioned side by side (genes × patients), the analyst must choose to visualize either gene or patient relationships, depending on the arrangement of the datasets. The free arrangement of subsets in Domino removes this restriction.

Another limitation of these techniques is that they require partitioned (categorical) datasets as columns, while Domino allows analysts to mix partitioned and numerical data. While our previous work focused on relationships between the partitioned sets, drawn as bands connecting the columns, Domino is able to show relationships between datasets at different levels of granularity: at the level of items, groups of items, or whole datasets. Finally, our earlier techniques do not allow users to extract and manipulate subsets, which is a central aspect of Domino.

In recent work we described Furby [SGG+14], a method for visualizing biclustering results in a force-directed layout. Bands between the clusters represent their overlaps in both rows and columns. However, similar to the approaches discussed above, Furby only encodes the overlap between whole biclusters and does not enable users to see individual relationships between shared items (rows/columns). Furthermore, Furby focuses on the overlap between matrix subsets, preventing users from including partitioned and numerical subsets in the sense-making process. Furby also does not address the extraction and manipulation of subsets.

Multiple Coordinated and Integrated Views An alternative to the multiform approaches described above is standard multiple coordinated views (MCV) [Rob07], where each subset is visualized as a separate view. Using linking & brushing, users can then explore the relationships between the subsets. In this sense, juxtaposition [GAW+11] is used for comparing multiple subsets and their relationships. MCV is also applicable to scenarios that include multiple datasets connected by different item types. However, as MCV systems keep relationships between linked views [JE12] only implicit, the approach has a major limitation: it requires users to actively select or filter items in order to see relationships between the subsets. Further, as the selected items from one view are simultaneously highlighted in all other views, users can generally only see that an item is part of the selection, but cannot determine which item in one view corresponds to which item in the other views if multiple items are selected.

Instead of implicitly linking items in an MCV setup, explicit links can be drawn to connect items across the views—an approach Javed and Elmqvist call integrated views [JE12] and a variant of explicit encoding as defined by Gleicher et al. [GAW+11]. Examples of integrated views are semantic substrates [SA06], the VisLink method [CC07], or context-preserving visual links [SWL+11]. Note that the multiform approaches described earlier in this section can also be seen as integrated views. While explicit links remove the need for interaction to see relationships, they come at the cost of added visual clutter. In Domino we also make use of explicitly connected multiform visualizations; however, we reduce visual clutter by representing the relationships at various levels of granularity. In addition to representing relationships between individual items as single lines, Domino aggregates lines from the same group of a partition into bands.

Subspace Comparison Detecting patterns in high-dimensional datasets is possible on multiple levels. Domino focuses on patterns between individual dimensions by visualizing their relationships in various ways. Another approach is exploring combinations of dimensions, i.e., subspaces, and comparing them with each other. Hund et al. [HBS+16] propose a system for exploring subspaces of patient groups. Anand et al. [AWD12] use scores based on random projections of dimensions to guide users to interesting data subspaces. In Domino a subspace can be created by combining multiple dimensions into a combined block. Therefore, subspaces can be compared by relating patterns contained in combined blocks with each other.

Meta-Visualization Techniques The approaches most closely related to Domino are Flexible Linked Axes (FLINA) [CvW11] by Claessen and van Wijk and ConnectedCharts [VM12] by Viau and McGuffin. Domino is akin to FLINA and ConnectedCharts, as it is also a meta-visualization for creating advanced visualization setups. However, due to its holistic conceptual approach that integrates one-dimensional numerical, categorical, and tabular data by connecting them at multiple levels of granularity, Domino can be applied to a broader spectrum of tasks.

FLINA [CvW11] is a meta-visualization approach for creating axis-based visualization techniques, whose axis relationships can be described using the ARGOI formalism. Although it is a general and powerful concept, FLINA is restricted to showing item relationships between one-dimensional numerical data represented as 1D scatterplots (axes). ConnectedCharts [VM12] are related to FLINA in the sense that they also explicitly draw item-based relationships between visualizations. However, ConnectedCharts and Domino conceptually go beyond FLINA, as an axis-based representation is only one of many possible visualization techniques for displaying a subset. Another important difference from FLINA is that ConnectedCharts and Domino include one-dimensional and tabular datasets. While FLINA and ConnectedCharts are both limited to item-based relationships between the visualizations, Domino visually links them also on coarser abstractions, making it possible to show relationships between single items, between groups of categorical data, or between whole datasets. A further difference between Domino and both FLINA and ConnectedCharts is its ability to let users not only explore relationships between subsets, but also refine existing and extract new subsets, which provides a great deal of flexibility in exploratory analysis scenarios.

3.4. Presentation and Storytelling

Chapter 7 introduces the CLUE model, which covers the right side of the SPARE model. CLUE closes the gap between exploration and presentation using provenance data. Since our model is independent of the exploratory visualization techniques employed, we limit our discussion of related work for the CLUE model to the capturing and use of provenance data, presentation and communication in visualization, and visual storytelling techniques.

3.4.1. Provenance

In the context of our work, the provenance of a state of a visual exploration process refers to all information and actions that lead to it. The provenance of the visual analysis as a whole comprises the provenance of all states that were visited during the exploration.

Ragan et al. [RESC16] have recently characterized provenance in visualization and data analysis according to the type and purpose of the collected provenance information. The type of provenance information includes data (i.e., the history of changes to the data), visualization (i.e., the history of changes to views), interaction (i.e., the interaction history), insight (i.e., the history of outcomes and findings), and rationale (i.e., the history of reasoning). Our prototype implementation currently captures all of this information. Data and interaction provenance are captured automatically, while insight and rationale are captured through the externalization of a thought process by the user. CLUE enables this, for instance, via annotations and bookmarks.
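As an illustration of this taxonomy, the following is a minimal data-structure sketch; the type and field names are hypothetical and do not reflect the actual CLUE implementation.

// Hypothetical sketch of a provenance record covering the five types
// characterized by Ragan et al. [RESC16]; names are illustrative only.
type ProvenanceType = "data" | "visualization" | "interaction" | "insight" | "rationale";

interface ProvenanceRecord {
  type: ProvenanceType;
  timestamp: number;                  // when the event occurred
  stateId: string;                    // the exploration state the record belongs to
  payload: unknown;                   // e.g., the action performed or the annotation text
  capturedBy: "automatic" | "user";   // data/interaction vs. insight/rationale
}

// Data and interaction provenance can be logged automatically on every action,
// whereas insight and rationale require explicit externalization by the user,
// for example through an annotation dialog.
function recordAnnotation(stateId: string, text: string): ProvenanceRecord {
  return { type: "insight", timestamp: Date.now(), stateId, payload: text, capturedBy: "user" };
}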

Provenance information is used for several purposes [RESC16]: for recalling the current state during the analysis, for recovering actions (undo and redo), for replicating (reproducing) steps of the analysis, for collaborative communication (i.e., for sharing the analysis process with others), for presenting the analysis, and for meta-analysis (i.e., for reviewing the analysis process).

VisTrails [BCS+05], for example, collects and visualizes provenance data for computational and visualization workflows for large-scale scientific data analysis. Users of VisTrails interactively model a workflow that produces, for instance, a visualization; the process of creating this workflow is tracked as provenance data. In CLUE, the focus is not on modelling a specific artifact with a goal in mind, but rather on automatically capturing the provenance of an interactive visual exploration process that may lead to discoveries, which are later told using integrated storytelling approaches.

The works by Heer et al. [HMSA08] and Kreuseler et al. [KNS04] discuss a concept for visual histories (i.e., the provenance of a visual exploration), including types, operations, management, and aggregation, and provide a prototypical implementation. However, in both cases, provenance data is used for exploration only and not to address storytelling aspects. Heer et al. pointed to storytelling based on provenance as future work.

Action recovery (undo, redo) is commonly integrated into software applications. Most tools, however, do not visualize the history of actions and rely on a linear undo-redo path, which makes recovery from analysis branches impossible. Exceptions to this implicit approach are integrated in Small Multiples, Large Singles [vdEvW13], Stack'n'Flip [SSL+12], and GraphTrail [DHRL+12]. In the former two, the history of the current artifact is shown explicitly at the bottom and implicitly through the strict left-to-right workflow. GraphTrail also supports branches in the history: it explicitly visualizes how a plot is derived from previous ones using basic data operations. However, by focusing on the data operations, only a fraction of the provenance of the visual exploration is captured, leaving out, for instance, the parameters of the visualizations.

With regard to provenance, the paper by Shrinivasan and van Wijk [SW08] is most closely related to CLUE. The authors proposed a technique that integrates three views: Data, Navigation, and Knowledge. The data view contains the actual visualization of the data. The navigation view shows the exploration process in a tree (i.e., the provenance). Using the knowledge view, users can capture and relate annotations to document findings, assumptions, and hypotheses, and link them to specific states in their exploration for justification. The knowledge view in combination with the navigation view is then used to communicate findings. In contrast to their work, the CLUE model also covers aspects of storytelling: by enabling authoring, we allow users to produce concise and effective stories based on the original exploration. This linear narrative approach is closely related to the traditional workflow of publishing results, with the additional benefit of having a back-link to the real exploration at all times.

3.4.2. Storytelling

Kosara and MacKinlay [KM13] highlighted the importance of visual data stories for visualization research. They defined a story as "an ordered sequence of steps, each of which can contain words, images, visualizations, video, or any combination thereof". They further stated that "stories can thus serve as part of a finding's provenance, similar to an event's narrated history". Different approaches can be used for telling a story. Stories are mostly told individually and linearly, but the management of multiple stories and branching have also been proposed [LHV12]. The degree to which users can influence how the story is told may vary from automated replay to crafting their own story.

In CLUE, we apply a narrative storytelling approach inspired by the work of Figueiras [Fig14a, Fig14b]. Figueiras discussed how to include narrative storytelling elements such as annotations and temporal structure in existing visualizations to enrich user involvement and understanding through story flow. Similarly, the authors of VisJockey [KSJ+14] noted that when interactive visualizations are integrated into data-driven stories, user guidance on how to interpret the visualizations is lacking by default. Therefore, the authors proposed the VisJockey technique, which enables readers to easily access the author's intended view by supplementing the visualization with highlights, annotations, and animation. In Tableau 1, storytelling features are integrated using an annotated stepper interface. This enables users to navigate through a series of interactive dashboards. These works have in common that they deal with existing visualizations and insights, focusing purely on the authoring and presentation stage, yet neglecting the underlying process of how the insights were discovered.

Ellipsis [SH14] is a domain-specific language for creating visual data stories based on existing visualizations. Journalists can combine visualizations and define triggers, states, annotations, and transitions via a programming interface or a visual editor. This allows them to define a wide range of story types, including linear, non-linear, interactive, automatic, and stepped stories. However, Ellipsis is only concerned with creating scripted stories, and does not utilize the visual data exploration process that leads to an insight.

With regard to storytelling, the work most closely related to CLUE is by Wohlfart and Hauser [WH07]. Their technique allows users to record interactions with a volume visualization, modify this recording, and annotate it in order to tell a story. In their work, however, the recording or capturing has to be actively triggered, and only a linear story can be captured. When users press a record button, they typically already know what they want to show. CLUE, in contrast, captures all actions and exploration paths, allowing users to extract and base their story on the provenance of the analysis.

Storytelling approaches prompting user involvement through play and pause techniques were explored by Pongnumkul et al. [PDL+11] to support navigation in step-by-step tutorials. Adobe Labs 2 provides a Photoshop plugin to create such tutorials: users can first record actions and then author them.

In summary, existing storytelling tools focus on how to tell a story, but rarely base the story on provenance data. The CLUE model solves this by providing links between points in the story and corresponding states in the exploration. This allows users to switch freely and easily between exploration and presentation.

1 http://www.tableau.com/
2 http://labs.adobe.com/technologies/tutorialbuilder/

4 | LineUp: Visual Analysis of Multi-Attribute Rankings

Contents

4.1. Introduction
4.2. Requirement Analysis
4.3. Ranking Design Space and Related Work
4.4. Multi-Attribute Ranking Visualization Technique
4.5. Use Cases
4.6. Evaluation
4.7. Conclusion

This chapter describes LineUp, a visualization technique for multi-attribute rankings. Rankings are a popular and universal approach to structuring otherwise unorganized collections of items by computing a rank for each item based on the value of one or more of its attributes. This allows us, for example, to prioritize data subsets based on a computed score, as presented in the SPARE model in Section 1.2. While the visualization of a ranking itself is straightforward, its interpretation is not, because the rank of an item represents only a summary of a potentially complicated relationship between its attributes and those of the other items. Driven by a requirement analysis, the visualization technique is introduced together with all its visual encodings. The practical applicability is shown using two general usage scenarios. In addition, we conducted a qualitative user study testing the visual encodings, which will be discussed in detail. More information about LineUp can be found at http://lineup.caleydo.org, including links to a JavaScript library 1 and a public instance 2.

1 http://github.com/Caleydo/lineupjs
2 http://lineup.caleydoapp.org

4.1. Introduction

We encounter ranked lists on a regular basis in our daily lives. From the “top at the box office” list for movies to “New York Times Bestsellers”, ranked lists are omnipresent in the media. Rankings have the important function of helping us to navigate content and provide guidance as to what is considered “good”, “popular”, “high quality”, and so on. They fulfill the need to filter content to obtain a set that is likely to be interesting but still manageable.

Some rankings are completely subjective, such as personal lists of favorite books, while others are based on objective measurements. Rankings can be based either on a single attribute, such as the number of copies sold to rank books for a bestseller list, or on multiple attributes, such as price, miles-per-gallon, and power to determine a ranking of affordable, energy-efficient cars. Multi-attribute rankings are ubiquitous and diverse. Popular examples include university rankings, rankings of food products by their nutrient content, rankings of computer hardware, and most livable city rankings.

When rankings are based on a single attribute or are completely subjective, their display is trivial and does not require elaborate visualization techniques. If a ranking, however, is based on multiple attributes, how these attributes contribute to the rank and how changes in one or more attributes influence the ranking is not straightforward to understand. In order to interpret, modify, and compare such rankings, we need advanced visual tools.

When interpreting a ranking, we might want to know why an item has a lower or a higher rank than others. For the aforementioned university rankings, for example, it might be interesting to analyze why a particular university is ranked lower than its immediate competitors. It could either be that the university scores lower across all attributes or that a single shortcoming causes the lower rank.

Another crucial aspect in multi-attribute rankings is how to make completely different types of attributes comparable to produce a combined ranking. This requires mapping and normalizing heterogeneous attributes and then assigning weights to compute a combined score. A student trying to decide which schools to apply to might wish to customize the weights of public university rankings, for example, to put more emphasis on the quality of education and student/faculty ratio than on research output. Similarly, a scientist ranking genes by mutation frequency might want to try to use a logarithmic instead of a linear function to map an attribute.

Another important issue is the comparison of multiple rankings of the same items. Several publications, for example, release annual university rankings, often with significantly different results. A prospective student might want to compare them to see if certain universities receive high marks across all rankings or if their ranks change considerably. Also, university officials might want to explore how the rank of their own university has changed over time.

Finally, if we can influence the attributes of one or more items in the ranking, we might want to explore the effect of changes in attribute values. For example, a university might want to find out whether it should reduce the student/faculty ratio by 3% or increase its research output by 5% in order to fare better in future rankings. If costs and benefits can be associated with these changes, such explorations can have an immediate impact on strategic planning.

Interactive visualization is ideally suited to tailoring multi-attribute rankings to the needs of individuals facing the aforementioned challenges. However, current approaches are largely static or limited, as discussed in our review of related work. In this chapter we describe a new technique that addresses the limitations of existing methods and is motivated by a comprehensive analysis of the requirements of multi-attribute rankings across various domains. Based on this analysis, we describe the design and implementation of LineUp, a visual analysis technique for creating, refining, and exploring rankings based on complex combinations of attributes. We demonstrate the application of LineUp in two use cases in which we explore and analyze university rankings and nutrition data.

We evaluate LineUp in a qualitative study that demonstrates the utility of our approach. The evaluation shows that users are able to solve complex ranking tasks in a short period of time.

4.2. Requirement Analysis

We identified the following requirements based on research on the types and applications of ranked lists, as well as interviews and feedback from domain experts in molecular biology, our initial target group for the application. We soon found, however, that our approach is much more generalizable, and thus included a wider set of considerations beyond expert use in a scientific domain. We also followed several iterations of the nested model for visualization design and validation [Mun09] and looped through the four nested layers to refine our requirements (i.e., domain problem characterization). We concluded our iterations with the following set of requirements:

R I: Encode rank Users of the visualization should be able to quickly grasp the ranks of the individual items. Tied ranks should be supported.

R II: Encode cause of rank In order to understand how the ranks are determined, users must be able to evaluate the overall item scores from which the ranking is derived and how they relate to each other. In many cases, scores are not uniformly distributed in the ranked list. For example, the top five items might have a similar score, while the gap to the sixth item could be much bigger. Depending on the application, the first five items might thus be much more relevant than the sixth. To achieve this, users should see the distribution of overall item scores and their relative difference between items and also be able to retrieve exact numeric values of the scores. If item scores are based on combinations of multiple attribute scores (see R III), the contribution of individual attributes to the overall item score should also be shown.

R III: Support multiple attributes To support rankings based on multiple attributes, users must be able to combine multiple attributes to produce a single all-encompassing ranking. It is also important that these combinations are salient. To make multiple attributes comparable, they must be normalized, as described in R V. In the simplest case, users want to combine numerical attributes by summing their scores. This combined sum of individual attribute scores then determines the ranking. However, more complex combinations, including weights for individual attributes and logical combinations of attributes, are helpful for advanced tasks (see Section 4.4.2).

R IV: Support filtering Users might want to exclude items from a ranking for various reasons. For example, when ranking cars, they might want to exclude those they cannot afford. Hence, users must be able to filter the items to identify a subset that supports their task. Filters must be applicable to numerical attributes as ranges, nominal attributes as subsets, and text attributes as (partial) string matches.

R V: Enable flexible mapping of attribute values to scores Attributes can be of different types (e.g., numerical, ordered categorical), scales (e.g., between 0 and 1 or unbounded), and semantics (e.g., both low and high values can be of interest). The ranking visualization must allow users to flexibly normalize attributes, i.e., map them to normalized scores. For example, when using a normalization to the unit interval [0,1], 1 is considered "of interest" and 0 "of no interest". While numerical attributes are often straightforward to normalize through linear scaling, other data types require additional user input or more sophisticated mapping functions.

Numerical attributes can have scales with arbitrary bounds, for instance, from 0 to 1 or from −1 to 1. They might also have no well-defined bounds at all. However, for a static and known dataset the bounds can be inferred from the data. For instance, while there is no upper limit for the number of citizens in a country, an upper bound can be inferred by using the number of citizens in the largest country as bound. In addition, values in a range can have different meanings. For example, if the attribute is a p-value expressing statistical significance, it ranges from 0 to 1, where 0 is the "best" and 1 the "worst". In other cases, such as log-ratios, where attribute values range from −1 to 1, 0 could be the "worst" and the extrema −1 and +1 the "best". Additionally, users might be interested in knowing the score of an attribute without the attribute actually influencing the ranking, thereby providing contextual information.
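For illustration, plausible mapping functions for the two semantics just described (these concrete choices are examples, not the only mappings LineUp supports) are an inversion for p-values and a magnitude-based mapping for log-ratios:

\[
m_{\text{p-value}}(a) = 1 - a, \quad a \in [0,1] \qquad\qquad m_{\text{log-ratio}}(a) = |a|, \quad a \in [-1,1]
\]

In both cases the mapped score is 1 for the most interesting values and 0 for the least interesting ones.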

R VI: Adapt scalability to the task While it is feasible to convey large quantities of ranked items using visualization, there is a trade-off between level of detail (LoD) and scalability. Where to make that trade-off depends largely on the given task. In some tasks only the first few items might be relevant, while in others the focus is on the context and position of a specific item. Also, some tasks may be primarily concerned with how multi-attribute scores are composed, while in other tasks individual scores might be irrelevant. A ranking visualization technique must be designed either with a specific task in mind or aim at optimizing the trade-off between LoD and scalability.

R VII: Handle missing values As real-world data is often incomplete, a ranking visualization technique must be able to deal with missing values. Trivial solutions for handling missing values are to omit the items or to assign the lowest normalized score to these attributes. However, "downgrading" or omitting an item because of missing values might not be acceptable for certain tasks. A well-designed visualization technique should include methods to sensibly deal with missing values.

R VIII: Interactive refinement and visual feedback The ranking visualization should enable users to dynamically add and remove attributes, modify attribute combinations, and change the weights and mappings of attributes. To enable users to judge the effect of modifications, it is critical that such changes are immediately reflected in the visualization. However, as these changes can have profound influences on the ranking, it is essential that the visualization helps users to keep track of the changes.

R IX: Rank-driven attribute optimization Optimizing the ranking of an item is an important task. Instead of analyzing how the ranking changes upon modifications of the attribute values or the weights, it should be possible, for example, to optimize the settings (i.e., values and/or weights) to find the best possible ranking of a particular item. Identifying the sensitivity of attributes (i.e., how strongly they influence the ranking), for example, to find the minimum change in an attribute value needed to gain ranks, is another rank optimization example.

R X: Compare multiple rankings An interactive ranking visualization that fulfills R I - R IX is a powerful tool addressing many different tasks. However, in some situations users are interested in putting multiple rankings into context with each other. An example is the comparison of competing university ranking methodologies. Observing changes over time, for instance, investigating university rankings over the last 10 years, is another example that requires the comparison of multiple ranking results.

4.3. Ranking Design Space and Related Work

Due to the ubiquitous presence of rankings and their broad applicability, a wide variety of visualization techniques have been developed for, or have been applied to show, ranked data. Based on the requirements introduced in the previous section, we discuss the design space of visual encodings suitable for ranking visualization, as outlined in Figure 4.1, as well as some specific ranking visualization techniques.

Figure 4.1.: Illustration of different ranking visualization techniques: (a) spreadsheet, (b) table with embedded bars, (c) multi-bar chart, (d) stacked bar, (e) slope graph / bump chart, (f) parallel coordinates plot.

4.3.1. Spreadsheets

The most basic way to present a set of ordered items is a ranked list showing the rank together with a label identifying the item. While simple ranked lists allow users to see the rank of the item (R I), they do not convey any information about what led to the rank – which violates R II. It is trivial to extend a ranked list by multiple columns resulting in a table or spreadsheet addressing the multi-attribute requirement (R III), as shown in Figure 4.1a. A detailed discussion of the design of spreadsheets was published by Few [Few12].

Established general purpose tools such as Microsoft Excel are feature-rich and well known by many users. These tools provide scripting interfaces that can help to address requirements R I - R VII. While scripting provides a great deal of flexibility, it is typically only mastered by advanced users. The major drawback of spreadsheets, however, is that they lack interactive visualizations of the tabular data (R VIII). Also, spreadsheets typically lack the ability to compare multiple lists intuitively (see requirement R X). A comparison of lists can only be achieved by linking several spreadsheets in a multiple coordinated views fashion [Rob07], which is not supported by most spreadsheet applications. In such a setup, however, answering questions related to the essential requirements to encode the rank (R I) and to encode the cause of the rank (R II) is tedious and time-consuming, especially as the number of ranked items increases.

Reading numerical values in a spreadsheet and comprehending the data, however, is a cognitively demanding, error-prone, and tedious task. It is therefore more effective to communicate trends and relationships by encoding the values in a graphical representation using visual variables such as position, length, and color. We discuss below how visual variables can be used to create visual representations that can cope with ranking data. In line with Ward et al. [WGK10], we divide the related work into techniques that are point-based, region-based, or line-based.

4.3.2. Point-Based Techniques

Using position as a visual variable is considered to be the most effective way of encoding quantitative data [Mac86]. Simple scatterplots can be used to compare two rankings (see R X). A scatterplot, however, can focus either only on communicating the rank itself (R I), by mapping the rank to interval scales, or on the cause of the rank (R II), by encoding the attribute value pairs in the position of the dots. While we can overcome the limitation of only comparing two rankings by using a scatterplot matrix (SPLOM), neither scatterplots nor SPLOMs can deal with R III, the multi-attribute requirement, which makes them an inefficient solution for complex rankings.

4.3.3. Region-Based Techniques

According to various established sources [Mac86, CM84, Ber10], length is another very effective visual variable for encoding quantitative data. For representing ranked lists, simple bar charts, for instance, can show the value of multiple items for a single attribute. To make use of the preattentive processing capabilities in interpreting relative changes in the length of bars (i.e., height), they are usually aligned to a common axis. Aligning the bars also redundantly uses position in addition to length. In cases where ranks are determined based on a single attribute, using bars to encode the attribute values is an effective way to communicate the cause of the rank, satisfying R II.

Bars can be used to encode multiple attributes (R III) in three different ways: by aligning bars for every attribute to a separate baseline, as shown in Figure 4.1b, by showing multiple bars per item (one for each attribute) on the same baseline, see Figure 4.1c, or by using stacked bar charts, see Figure 4.1d.

An early implementation of the first approach is the table lens technique [RC94], which embeds bars within a spreadsheet. Besides being one of the first focus+context techniques, it allows users to set multiple focus areas and also supports categorical values. As in a spreadsheet, users can sort the table items according to an arbitrary column. John et al. [JTS08] proposed an extension of the original table lens technique that adds two-tone pseudo coloring as well as hybrid clustering of items to reduce strong oscillation effects in the representation. The major benefit of using bar charts with multiple baselines is that it is easy to compare one attribute across items. In contrast, by switching to a layout that draws multiple bars per item side by side on the same baseline, the comparison across attributes for a single item is better supported, but comparing the bars across items becomes more difficult.

Stacked bar charts are appropriate when the purpose of the visualization is to present the totals (sum) of multiple item attributes, while also providing a rough overview of how the sum is composed [Few12]. Stacked bar charts are usually aligned to the baseline of the summary bar. However, shifting the baseline to any other attribute results in diverging stacked bar charts, see Figure 4.3b, which were discussed by Willard Brinton as early as 1939 [Bri39]. Changing the baseline makes it difficult to compare the sum between different stacked bars, but easier to relate the values of the aligned attribute. Diverging stacked bar charts are also known to be well suited to visualizing survey data that is based on rating scales, such as the Likert scale [RH11].

Compared to spreadsheets, bar-based techniques scale much better to a large number of items (R VI). While the minimum height of rows in spreadsheets is practically determined by the smallest font that a user can read, bars can be represented by a single-pixel line. It is even possible to go below the single pixel limit by using overplotting. This aggregation of multiple values to a single pixel introduces visual uncertainty [HLS+12]. While details such as outliers will be omitted, major trends will remain visible. Combined with additional measures such as a fish-eye feature, bar-based approaches are an effective technique for dealing with a large number of items.

4.3.4. Line-Based Techniques

Line-based techniques are also a widely used approach to visualizing rankings. In principle, lines can be used to connect the value of items across multiple attributes (R III) or to compare multiple rankings (R X). Although a wide array of line-based techniques exist [WGK10], only a few of them are able to also encode the rank of items (R I).

The first technique relevant in the context of ranking visualization is the slope graph [Tuf83, p.156]. According to Tufte, slope graphs allow users to track changes in the order of data items over time. The item values for every time point (attribute) are mapped onto an ordered numerical or interval scale. The scales are shown side by side, in an axis-like style without drawing them explicitly, and identical items are connected across time points. By judging differences in the slope, users are able to identify changes between the time points. Lines with slopes that are different from the others stand out. Note that slope graphs always use the same scale for all attributes shown. This makes it possible not only to interpret slope changes between two different time points, but also to relate changes across multiple or all time points. Although Tufte used the term slope graph only for visualizing time-dependent data, the technique can be applied equally to arbitrary multi-attribute data that uses the same scale.

Slope graphs that map ordered data to a scale using uniform spacing between items are referred to as bump charts [Tuf95, p.110]. Bump charts are specialized slope graphs that enable users to compare multiple rankings, fulfilling R X. The example in Figure 4.1e shows a bump chart where each column is sorted individually and the lines connect to the respective rank of each item. Tableau 3, for example, uses bump charts to trace ranks over time. However, while bump charts show the rank of items (R I), they do not encode the cause of the rank (R II), which means the actual values of the attributes that define the ranking are lost.

A line-based multi-attribute visualization that mixes scales, semantics, and data types across the encoded attributes is a parallel coordinates plot [Ins85], as shown in Figure 4.1f. In contrast to a slope graph, a parallel coordinates plot uses the actual attribute values mapped to the axis instead of the rank of the attribute. While this is more general than slope graphs, users lose the ability to relate the slopes across different attributes. A thorough discussion of the differences between the aforementioned line-based techniques was published by Park [Par11a, Par11b].

Fernstad et al. [FSJ13], for instance, propose a parallel coordinates plot that shows one axis for each attribute with an extra axis showing the overall rank. While this addresses R I - R III, adding the possibility to compare multiple rankings (R X) is difficult. In theory, we could create parallel coordinates showing multiple axes that map the rank and add further axes to show the attributes that influence each of the rankings. However, in this case it would be difficult to make clear to the user which rank axis belongs to which set of attribute axes.

3 http://www.tableau.com

4.4. Multi-Attribute Ranking Visualization Technique

LineUp is an interactive technique designed to create, visualize, and explore rankings of items based on a set of heterogeneous attributes. The visualization uses bar charts in various configurations. By default, we use stacked bar charts where the length of each bar represents the overall score of a given item. The vertical position of the bar in the bar chart encodes the rank of the item, with the most highly ranked item at the top. The basic design of the LineUp technique, as shown in Figure 4.2, is introduced in detail in Section 4.4.1. The components of the stacked bars encode the scores of attributes, which can be weighted individually. Combined scores can be created for sets of attributes using two different operations (see Section 4.4.2). Either the sum of the attribute scores is computed and visualized as stacked bars – a serial combination – or the maximum of the attribute scores in the set is determined and visualized as bars placed next to each other – a parallel combination. Such combinations can be nested arbitrarily.

Furthermore, LineUp can be used to create and compare multiple rankings of the same set of items (see Section 4.4.4). When comparing rankings, the individual rankings are lined up horizontally, and slope graphs are used to connect the items across rankings, as shown in Figure 4.8. The angle of the slope between two rankings represents the difference in ranks between two neighboring rankings.

Formally, rankings in LineUp are defined on a set of items $x_i \in X = \{x_1, \ldots, x_m\}$ and a heterogeneous set of attribute vectors $a_j \in A = \{a_1, \ldots, a_n\}$ so that each item $x_i$ is assigned a set of attribute values $A_i = \{a_{1i}, \ldots, a_{ni}\}$. Since the attributes are heterogeneous, for instance, numerical on different scales or categorical, the user first needs to normalize the attribute vectors $a_j$ by mapping them onto a common scale. To achieve this, the user provides a mapping function $m_j(a_j) = a'_j \in A'$ with $m_j : a_{ji} \rightarrow [m_{j_{\min}}, m_{j_{\max}}]$ for each attribute, with $0 \le m_{j_{\min}} \le m_{j_{\max}} \le 1$. The values of $m_{j_{\min}}$ and $m_{j_{\max}}$ can be defined by the user. Throughout this chapter we refer to mapped attribute values $a'_{ji}$ as attribute scores and to $A'$ as the mapped attributes.

Figure 4.2.: A simple example demonstrating the basic design of the LineUp technique. The screenshot shows the top-ranked universities from the Times Higher Education Top 100 Under 50 dataset. The first column shows the ranks of the universities, followed by their names and the categorical attribute Country. The list is sorted according to the combined attribute column containing four university performance attributes. Two numerical attribute columns that do not influence the ranking are also shown.

Additionally, the user may specify filters $f_{j_{\min}}$ and $f_{j_{\max}}$ on the original attribute values to remove items with attribute values $a_{ji}$ outside the filter range $[f_{j_{\min}}, f_{j_{\max}}]$ from the ranking. The visualizations and interactions provided for data mapping and filtering are described in detail in Section 4.4.6.

To assign an item score $s_i \in S \subset \mathbb{R}_0^+$ to each item $x_i$, the user interactively defines a scoring function $s$ over the mapped attributes in $A'$ through the LineUp user interface. The user selects a list $B = (a'_q)$ of one or more attributes from $A'$ with $1 \le q \le n$, where an attribute may be added more than once by cloning the attribute. The item score $s_B(x_i)$ over a list of mapped attributes $B$ from $A'$ is defined as

\[
s_B(x_i) = \sum_{a'_q \in B} w_{a'_q} \, a'_{qi} \quad \Big|\; 0 \le w_{a'_q} \le 1 \;\wedge\; \sum w_{a'_q} = 1,
\]

where $w_{a'_q}$ are weights assigned to each instance of a mapped attribute $a'_q$. Since the user may divide $B$ into multiple lists $B_l$ and combine, weight, and nest them arbitrarily, as discussed in Section 4.4.2, the final item score $s_i$ is defined recursively over nested lists of mapped attributes as

\[
s_i = s_B(x_i) =
\begin{cases}
\sum_{a'_q \in B} w_{a'_q} \, a'_{qi} & \Big|\; 0 \le w_{a'_q} \le 1 \;\wedge\; \sum w_{a'_q} = 1 \\[4pt]
\sum_{B_l} w_{B_l} \, s_{B_l}(x_i) & \Big|\; 0 \le w_{B_l} \le 1 \;\wedge\; \sum_{B_l} w_{B_l} = 1 \\[4pt]
\max_{B_l} s_{B_l}(x_i).
\end{cases}
\]

The operators $\sum$ and $\max$ represent the sum (serial combination) and the maximum (parallel combination) of the item scores over a list of attribute scores, respectively. Users can interactively change the weights $w \in \mathbb{R}_0^+$ for each list of attributes by changing the width of the corresponding column header in LineUp.

LineUp determines the rank $r_i \in \mathbb{N}^+$ of an item $x_i$ based on its item score $s_i$ (which is equivalent to $a'_{ji}$ for cases in which ranks are based on a single attribute $a_j$), with the item scoring $\max(S)$ being assigned rank 1, so that $r_j - r_i = d \in \mathbb{N}^+$ if $s_j < s_i$ and there are exactly $d-1$ other scores $s_k \in S$ with $s_j < s_k \le s_i$. Ties are allowed since two or more items may have the same score. To resolve ties, the scoring method described above can be applied recursively to a tied set of items, for instance, using different attributes.
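The recursive definition above translates directly into a small scoring routine. The following TypeScript sketch assumes that attribute values have already been mapped to scores in [0, 1] and that weights are normalized; all names are illustrative and not taken from the LineUp implementation.

// Minimal sketch of the recursive scoring scheme defined above.
type ScoreNode =
  | { kind: "attribute"; scores: number[] }                             // a'_q: one score per item
  | { kind: "serial"; children: { node: ScoreNode; weight: number }[] } // weighted sum
  | { kind: "parallel"; children: ScoreNode[] };                        // maximum

function score(node: ScoreNode, i: number): number {
  switch (node.kind) {
    case "attribute":
      return node.scores[i];
    case "serial":
      // weights are expected to sum to 1 (normalized elsewhere)
      return node.children.reduce((s, c) => s + c.weight * score(c.node, i), 0);
    case "parallel":
      return Math.max(...node.children.map((c) => score(c, i)));
  }
}

// Rank items by descending score; tied scores receive the same rank.
function rank(root: ScoreNode, itemCount: number): number[] {
  const s = Array.from({ length: itemCount }, (_, i) => score(root, i));
  return s.map((si) => 1 + s.filter((sj) => sj > si).length);
}

For instance, a serial node with two attribute children weighted 0.7 and 0.3 reproduces the stacked-bar combination, while wrapping the same children in a parallel node yields the maximum-based combination.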

4.4.1. Basic Design and Interaction

LineUp is a multi-column representation where users can arbitrarily combine various types of columns into a single table visualization, as shown in Figure 4.2. The following column types are supported:

• Rank columns showing the ranks of items.
• Textual attribute columns for labels of items or nominal attributes. Text columns provide contextual information on the basis of which users can also search or filter items.
• Categorical attribute columns can be used in a similar fashion as textual attribute columns.
• Numerical attribute columns encoding numerical attribute scores as bars. In addition to the name of the attribute, the header can show the distribution of attribute scores on demand. When the user selects a particular item, the corresponding bin in the histogram will be highlighted.
• Combined attribute columns representing combinations of sets of numerical attributes. The process and visual encoding of combined columns is explained in Section 4.4.2.

In its simplest form, LineUp presents each attribute as a separate column where numerical columns use bars to represent their values. The ranking can be driven by any column, using sorting based on scores or on lexicographic order. Figure 4.2 also shows a combined attribute column labeled "2012" with four nested attributes. The decreasing lengths of the stacked bar charts indicate that this combined column drives the ranking, which is also shown using black corners at the top of its column header. In this example, the item labeled "Universität Konstanz" is selected. For selected items we show the original and the normalized attribute value (score) inside the bars if sufficient space is available. To simplify tracking of items across columns, rows are given alternating background colors.

4.4.2. Combining Attributes

A fundamental feature of our ranking visualization is the ability to flexibly combine multiple attributes, as described in requirement R III. LineUp supports the combination of attributes either in series or in parallel, as formally introduced earlier in this section. Both types of combinations are created by dragging the header of a (combined) attribute column onto the header of another (combined) column. Removing an attribute from a combined column is possible by using drag-and-drop or by triggering an explode operation that splits the combined column into its individual parts.

Serial Combination

In a serial combination the combined score is computed as a weighted sum of its individual attribute scores. The column header of such a combined column contains the histograms of all individual attributes that contribute to the combined score as well as a histogram showing the distribution of the combination result. While the combined scores are represented as stacked bars, the weight of an attribute is directly reflected by the width of its column header and histogram. Altering weights can be realized either by changing the width of a column using drag-and-drop or by double-clicking on the percentages shown above the histograms to specify the exact distribution of weights. While the former approach is particularly valuable for experimenting with different weights because the ranking will be updated interactively, the latter is useful to reproduce an exactly specified distribution, as demonstrated in the university ranking use case presented in Section 4.5.2.

Stacked bars allow users to perceive both the combined attribute score and the contribution of individual attribute scores to it. However, stacked bars complicate the comparison of individual attribute scores across multiple items, as only the first one is aligned to the baseline. Therefore, LineUp realizes four alignment strategies, which are shown in Figure 4.3. Besides classical stacked bars, diverging stacked bars, where the baseline can be set to an arbitrary attribute, are provided. The third strategy sorts the bars of each row in decreasing order, highlighting the attributes that contribute the most to the score of an item. The last strategy aligns every attribute to its own baseline, resulting in a regular table with embedded bars. These strategies can be toggled dynamically and use animated transitions for state changes.

Figure 4.3.: Strategies for aligning serial combinations of attribute scores: (a) classical stacked bars, (b) diverging stacked bars, (c) ordered stacked bars, (d) all-aligned bars.

Parallel Combination

In contrast to the serial combination that computes a combined score by adding up multiple weighted attribute values, the parallel combination is defined as the maximum of a set of attribute scores. Due to the limited vertical space, only the attribute with the largest score is shown as the bar for a given item. The attribute scores that do not contribute to the rank of the item are only shown when the item is selected. The corresponding bars are drawn on top of each other above the largest bar, as illustrated in Figure 4.4. In order to avoid small values overlapping bigger ones, the bars are sorted according to length.

Figure 4.4.: Parallel combination of three attributes. Only the bar for the attribute with the largest score is shown for unselected items.

4.4.3. Rank Change Encoding

One of the major strengths of the proposed approach is that users receive immediate feedback when they interactively change weights, set filters, or create and refine attribute combinations. LineUp supports users in keeping track of rank changes by using a combination of animated transitions [HR07] and color coding. Animated transitions are an effective way to encode a small number of changes; however, when the number of changing items is very large and the trajectories of the animation paths cross each other, it is hard to follow the changes. Therefore, we additionally use color to encode rank changes, as demonstrated in Figure 4.5 (right). Items that move up in the ranking are visualized in green, whereas those that move down are shown in red. The rank change encoding is shown only for a limited time after a change has been triggered and is then faded out. The animation time and color intensity depend on the absolute rank change, which means the more ranks an item moves up or down, the longer and the more intense the highlighting. By providing interactive rank change and refinement capabilities combined with immediate visual feedback, LineUp is able to address requirement R VIII.
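A possible implementation of this feedback, sketched below under assumed constants (the cap, base opacity, and durations are illustrative and not LineUp's actual values), derives both the color intensity and the fade-out time from the magnitude of the rank change.

// Sketch of the rank change feedback described above: the hue encodes the
// direction of the change, while intensity and duration grow with its magnitude.
function rankChangeFeedback(oldRank: number, newRank: number) {
  const delta = oldRank - newRank;                  // > 0: item moved up, < 0: moved down
  const magnitude = Math.min(Math.abs(delta), 10);  // cap the effect for very large jumps
  return {
    color: delta > 0 ? "green" : delta < 0 ? "red" : "none",
    opacity: 0.2 + 0.08 * magnitude,                // more ranks gained or lost => more intense
    fadeOutMs: 500 + 300 * magnitude,               // ... and a longer-lasting highlight
  };
}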

4.4.4. Comparison of Rankings

Encoding rank changes within a ranking visually is an essential feature to help users track individual items. However, animated changes and color coding are of limited assistance when analyzing differences between rankings. In order to address this problem, a more persistent visual representation is needed that allows users to evaluate the changes in detail. In fact, we want to enable users to compare different rankings of the same set of items, as formulated in R X. However, comparing rankings is not only important to support users in answering "What if?" questions when underlying attribute values change, but is also highly relevant for analyzing how multiple different attribute combinations or weight configurations compare to each other. An example, presented in the use case of university rankings (see Section 4.5.2), is the comparison of rankings over time.

We realize the comparison of rankings by lining up multiple rankings horizontally next to each other, each having its own item order, and connecting identical items across the rankings with lines as in a slope graph. Examples with two rankings and with multiple rankings are given in Figures 4.5 and 4.8, respectively. The slope graph acts as a rank separator. This means that every attribute column that is separated by a slope graph has its own ranking. Also, changes in weights, mappings, filters, or attribute combinations only influence the order of items between the rank separators.
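The data behind such a slope graph is simple to derive: for every item that appears in both adjacent rankings, the ranks on both sides are paired so that a connecting line (and its rank difference) can be drawn. The sketch below illustrates this under assumed names; it is not the actual LineUp code.

// Pair the ranks of each item in two adjacent rankings (given as ordered id lists).
function slopeGraphLinks(left: string[], right: string[]) {
  const rightRanks = new Map(right.map((id, i) => [id, i + 1] as [string, number]));
  return left.flatMap((id, i) => {
    const r = rightRanks.get(id);
    // items missing on one side get no line (they may be filtered out there)
    return r === undefined ? [] : [{ id, leftRank: i + 1, rightRank: r, delta: r - (i + 1) }];
  });
}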

Due to limited vertical space, it can happen that items connected by lines in the slope graph end up outside the visible area of the ranking. To reduce clutter, we replace connection lines that have an invisible target with arrows pointing in the direction of the invisible item. In addition to the slope graphs and arrows, we also show the absolute difference in ranks.

To allow users to track the changes caused by actions they trigger, they can create a snapshot of individual or combined attribute columns. Snapshots are fully functional clones of columns that are inserted as the rightmost column of the table. A new snapshot is assigned its own rank column, and a slope graph is used to connect it to the column immediately to its left. Combined with the visual encodings for rank changes, snapshots are an effective way to answer "What if?" questions, for instance, "What happens if I change the weights in my ranking?" In Figure 4.5, for example, the user creates a snapshot which duplicates the current attribute combination. The user can then modify the weights of the attributes in the snapshot and compare the changes to the original attribute combination that holds the original ranking.

Figure 4.5.: Comparison between rankings. The concept of rank separators between score attribute columns makes it possible to use different orders on both sides and to relate them to each other by following the lines connecting them. Changes in either one of the rankings are immediately reflected in the visualization. This is visually supported with animations and also indicated by color changes of the rank label: green = item moved up, red = item moved down. The more intense the color, the more ranks were gained or lost.

4.4.5. Scalability

A powerful ranking visualization needs to scale in the number of attributes and the number of items it can handle effectively, as formulated in the scalability requirement (R VI). Here we discuss our approaches for both.

Many Attributes

In order to handle dozens of attributes, in addition to providing scrollbars, we allow users to reduce the width of attribute columns by collapsing or compressing them on demand. Collapsing a column reduces its width to only a few pixels. As bars cannot effectively encode data in such a small space, we switch the visualization from a bar chart to a grayscale heat map representation (darker values indicate higher scores). Examples of collapsed columns are the three rightmost columns in Figure 4.8.

While collapsing can be applied to any column, compressing is applicable only to serial combinations of attribute columns. To save space, users can change the level of detail for the combined column to replace the stacked bar showing all individual scores with a single summary bar. An example of this is shown in the “2011” and “2010” columns in Figure 4.8.

To further increase scalability with respect to the number of attributes, we provide a memo pad [SKKS08], as shown at the bottom of Figure 4.8. The memo pad is an additional area for storing and managing attributes that are currently not of immediate interest but might become relevant later in the analysis. It can hold any kind of column that a user removes from the table, including full snapshots. Attributes can be removed completely from the dataset by dragging them to the trash can icon of the memo pad.

We assign colors to each attribute based on a carefully chosen qualitative color scheme and repeat colors when the number of attributes exceeds seven [Hea96]. However, this approach becomes increasingly problematic with a growing number of attributes. One option to address this problem is to use the same color for semantically related attributes, as illustrated in the use case in Section 4.5. For instance, when the goal is to rank food products, we use the same color for all vitamin-related attributes. However, whether this approach is useful depends on the specific scenario and the task. Therefore, the mapping between attributes and colors can be refined by the users.
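A minimal sketch of such a color assignment, cycling through a seven-color qualitative palette with user-defined overrides for semantically related attributes; the concrete palette and the override mechanism are illustrative assumptions:

```typescript
// Seven-color qualitative scheme; colors repeat once the attribute count exceeds seven.
const QUALITATIVE_PALETTE = ['#1b9e77', '#d95f02', '#7570b3', '#e7298a', '#66a61e', '#e6ab02', '#a6761d'];

function assignAttributeColors(attributes: string[], overrides: Map<string, string> = new Map()): Map<string, string> {
  const colors = new Map<string, string>();
  attributes.forEach((attr, i) => {
    // A user-defined override (e.g., one shared color for all vitamin-related
    // attributes) wins; otherwise cycle through the qualitative scheme.
    colors.set(attr, overrides.get(attr) ?? QUALITATIVE_PALETTE[i % QUALITATIVE_PALETTE.length]);
  });
  return colors;
}
```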

Many Items

In order to make our technique useful in real-world scenarios, we need to cope with thousands of items. We use two strategies to achieve this: filtering and optimizing the visual representation. Filtering is straightforward; for the visual representation, we allow users to choose between uniform line spacing and a fish-eye selection mode [SSTR93]. Most users will be more familiar with uniform line spacing, which is, however, limited to showing only up to about 100 items at a time on a typical display. The fish-eye, in contrast, scales much better. The disadvantage is that changes in slope for comparison are less reliable due to distortion.
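The fish-eye distortion could be realized along the lines of the following sketch, which allocates row heights that fall off with distance from the focused item; function and parameter names are illustrative assumptions, not the LineUp implementation:

```typescript
// Sketch of a fish-eye row layout: rows near the focused index get more vertical
// space, distant rows are compressed to a few pixels.
function fishEyeRowHeights(itemCount: number, focusIndex: number, maxHeight = 20, minHeight = 1, radius = 15): number[] {
  const heights: number[] = [];
  for (let i = 0; i < itemCount; i++) {
    const distance = Math.abs(i - focusIndex);
    // Linear falloff from the focus; a smoother kernel could be used instead.
    const t = Math.max(0, 1 - distance / radius);
    heights.push(minHeight + (maxHeight - minHeight) * t);
  }
  return heights;
}
```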

4.4.6. Data Mapping

Data mapping is the process of transforming numerical or ordered categorical attribute values to a normalized range between 0 and 1 (R V) that can then be used to determine the length of the attribute score bars (1 corresponds to a full bar). By default, LineUp assumes a linear mapping and infers the bounds from the dataset so that no user interaction is required. To create more complex mappings, LineUp provides three approaches: choosing from a set of essential mapping functions (e.g., logarithmic or inversion), a visual data mapping editor that enables users to interactively create mappings, and a scripting interface to create sophisticated mappings. Since non-linear mappings, inversions, etc. can have a profound impact on the ranking, we render the bars of all non-linearly mapped attributes with a hatching pattern to communicate this.

In order to let users create mappings with different levels of complexity, the visual data mapping editor provides two options to interactively define them, as illustrated in Figure 4.6. All interactive changes in the mapping functions are immediately reflected in the visualization.


Figure 4.6.: Visual mapping editor for mapping attribute values to normalized scores. The parallel mapping editor used in (a)-(c) shows the distribution of values as well as the normalized scores as histograms on top of each other. Connection lines between the histograms make it easy to interpret the mapping. In (d)-(f) the layout of the mapping editor is changed to an orthogonal arrangement that resembles an actual mapping function. We show three mapping examples defined using the two different mapping editor layouts. (a) and (d) show the default case, where raw values are linearly mapped. In (b) and (e) raw values below 20 and above 60 are filtered. The remaining value range is spread between 0 and 1. In (c) and (f) the mapping is driven by three markers, to produce a mapping that emphasizes high and low values and at the same time filters low scores.

In the parallel mapping editor we show the histogram of the normalized attribute values above the original histogram of the raw data. Connection lines between the two histograms, drawn for every item, help the user to assess which attribute value maps to which score in the normalized range. By default, we apply a linear mapping that results in parallel lines between the histograms, as shown in Figure 4.6a. By dragging the minimum or maximum value markers in the histograms, users can filter items above or below a certain threshold,

as shown in Figure 4.6b. To flexibly create arbitrary mappings, mapping markers can be added, moved, and removed. Figure 4.6c shows a mapping scenario where the attribute values range from -1 to 1 but the scores are based on the absolute values. In addition, items with an attribute score of less than 0.2 are filtered.

The orthogonal mapping editor is an alternative view that uses a horizontal histogram of raw attribute values and a perpendicular vertical histogram of the normalized scores to visualize the mapping. This layout has the advantage that it can be interpreted just like a regular mapping function. Users can flexibly add, move, and remove support points to define the shape of the mapping function. We use linear interpolation to calculate the mappings between the user-defined support points. Figures 4.6d to 4.6f show the same examples that were used above to illustrate the parallel layout. Users can hover over any part of the mapping function to see the raw and normalized value pairs.
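Conceptually, both editor layouts define the same kind of function: a piecewise-linear interpolation between user-placed support points, with raw values outside the outermost points treated as filtered. A minimal sketch under these assumptions; names are illustrative, not the LineUp implementation:

```typescript
// Sketch: piecewise-linear mapping from raw attribute values to normalized scores.
interface SupportPoint { raw: number; normalized: number; }

function mapValue(value: number, points: SupportPoint[]): number | null {
  const sorted = [...points].sort((a, b) => a.raw - b.raw); // assume >= 2 points
  if (value < sorted[0].raw || value > sorted[sorted.length - 1].raw) {
    return null; // outside the mapped range -> item is filtered out
  }
  for (let i = 1; i < sorted.length; i++) {
    const lo = sorted[i - 1], hi = sorted[i];
    if (value <= hi.raw) {
      const t = hi.raw === lo.raw ? 0 : (value - lo.raw) / (hi.raw - lo.raw);
      return lo.normalized + t * (hi.normalized - lo.normalized); // linear interpolation
    }
  }
  return null; // unreachable for well-formed input
}

// Example: the absolute-value mapping from Figures 4.6c/f, where raw values in
// [-1, 1] are scored by their magnitude using three markers.
const absMapping: SupportPoint[] = [
  { raw: -1, normalized: 1 },
  { raw: 0, normalized: 0 },
  { raw: 1, normalized: 1 },
];
```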

The visual data mapping editor also shows a formal representation of the mapping function that is always in sync with the interactive mapping editor. Clicking this representation opens the JavaScript function editor that can be used to define complex mapping functions, such as polynomial and logarithmic functions, which cannot easily be defined in the visual editors.
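A logarithmic mapping entered in such a scripting interface might look roughly like the following; the (value, min, max) signature is a hypothetical assumption for illustration, not the editor's documented API:

```typescript
// Hypothetical mapping script: log-scale a raw value into [0, 1] given the
// attribute's value domain.
function logMapping(value: number, min: number, max: number): number {
  const safeMin = Math.max(min, 1e-6);   // guard against log(0)
  const v = Math.max(value, safeMin);
  return (Math.log(v) - Math.log(safeMin)) / (Math.log(max) - Math.log(safeMin));
}
```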

4.4.7. Missing Values

As real-world datasets are seldom complete, we need to deal with missing attribute values and encode them in the visualization. The only way to obtain a meaningful combined score based on multiple attributes where at least one has a missing value is to infer the missing values. However, there is no general solution for inferring missing values that works for every situation. We currently apply standard methods such as imputing the mean or median; however, the integration of more complex algorithms is conceivable [Sch02]. Besides the computation of missing value replacements, their visualization is crucial. As inference of missing values introduces artificial data, it is important to make the user aware of this fact. In LineUp we encode inferred data with a dashed border inside the bars.
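A sketch of such a mean/median imputation that keeps track of which values were inferred, so the renderer can mark them with the dashed border mentioned above; names are illustrative assumptions:

```typescript
// Sketch: impute missing values and flag them as inferred for the visualization.
interface ImputedValue { value: number; inferred: boolean; }

function imputeMissing(values: (number | null)[], strategy: 'mean' | 'median'): ImputedValue[] {
  const present = values.filter((v): v is number => v !== null).sort((a, b) => a - b);
  const mid = Math.floor(present.length / 2);
  const replacement = strategy === 'mean'
    ? present.reduce((sum, v) => sum + v, 0) / present.length
    : present.length % 2 === 1 ? present[mid] : (present[mid - 1] + present[mid]) / 2;
  // Items with inferred === true can be rendered with a dashed border inside the bars.
  return values.map((v) => (v === null ? { value: replacement, inferred: true } : { value: v, inferred: false }));
}
```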

4.5. Use Cases

We demonstrate the technique in two use cases: (1) the nutrition content of food products based on a subset of the USDA National Nutrient Database [Nut13] and (2) ranking of universities using data from the Times Higher Education 100 Under 50 University Ranking [Tim12] and the QS World University Ranking [Qua13].

4.5.1. Food Nutrition Data

Figure 4.7.: Example of a customized food nutrition ranking to identify healthier choices of breakfast cereals.

The first use case demonstrates how users can interactively create complex attribute combinations using the LineUp technique. Let us assume that John receives bad news from his doctor during his annual physical exam. In addition to his pre-existing high blood pressure, his cholesterol and blood sugar levels are elevated. The doctor advises John to increase his physical activity and improve his diet. John decides to systematically review his eating and drinking habits and begins by evaluating his usual breakfast. He loads a subset of the comprehensive food nutrition dataset from the USDA National Nutrient Database containing 19 nutrition facts (attributes) for each of about 8,200 food products (items) into LineUp. After loading the dataset, every attribute ends up in a separate column. As a first task, he filters the list to only include products in the category breakfast cereals. When he looks up his favorite breakfast cereals, he is shocked to find that they contain very high amounts of sugar and saturated fat. In order to find healthier choices, John searches for products that are high in dietary fiber and protein but low in sugar, saturated fat, and sodium. To rank the products according to these criteria, he creates a new serial attribute combination that assigns all attributes equal weight. In addition, as he is interested only in products that have “ready-to-eat” in their description, he applies

another filter. Since he wants low values of sugar, saturated fat, and sodium to receive a high score while high values should receive a low score, he uses the parallel mapping editor to invert the mapping function for these attributes. After looking at the top 20 items in the ranking, he realizes that none of the products matches his taste. He starts to slowly decrease the weight assigned to the sugar attribute, which means he reduces its impact on the overall ranking, and tracks the changes by observing the rank change color encoding and animations. He also uses the fish-eye to handle the large number of products when browsing the list until he finally finds his new breakfast cereal. Figure 4.7 shows the result of his analysis.
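The ranking John builds boils down to a weighted combination of normalized attribute scores, with inverted mappings for attributes where low raw values are desirable. A minimal sketch of such a combined score; for illustration, the inversion is folded into the combination step, and all names are assumptions rather than the LineUp API:

```typescript
// Sketch: weighted combination of normalized attribute scores.
interface WeightedAttribute {
  weight: number;                   // relative weight, e.g., equal weights initially
  inverted: boolean;                // true if low raw values should score high (sugar, sodium)
  score: (item: string) => number;  // normalized score in [0, 1] after data mapping
}

function combinedScore(item: string, attributes: WeightedAttribute[]): number {
  const total = attributes.reduce((sum, a) => sum + a.weight, 0);
  return attributes.reduce((sum, a) => {
    const s = a.inverted ? 1 - a.score(item) : a.score(item);
    return sum + (a.weight / total) * s; // weights are normalized to sum to 1
  }, 0);
}
```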

4.5.2. University Ranking

Figure 4.8.: LineUp showing a ranking of the top universities according to the QS World University Ranking 2012 dataset with custom attributes and weights, compared to the official ranking.

In the second use case we demonstrate how Jane, a prospective undergraduate student, utilizes LineUp to find the best institutions to apply to. As a basis for the selection process, Jane chooses the well-established QS World University Ranking. She starts by loading the annual rankings published from 2007 through 2012. As Jane does not want to leave the US, she adds a filter to the categorical country attribute to remove universities outside the US, and reviews the rankings for US institutions. By looking at the bar charts, she is able to see what factors contribute to the ranks and how they are weighted. The QS World University Ranking is based on six attributes: academic reputation (weighted

40%), employer reputation (10%), faculty/student ratio (20%), citations (20%), international faculty ratio (5%), and international student ratio (5%). Additionally, the authors publish performance data on five broad subject areas, such as arts & humanities and life sciences, which, however, do not influence the ranking. While university rankings try to capture the overall standing of institutions, Jane is a prospective undergraduate student and does not care much about research and citations but rather wants to emphasize teaching. Additionally, she wants to go to a university that has a renowned arts faculty and a strong international orientation, as documented by many exchange students and staff from abroad. To obtain the ranking that reflects her preferences, Jane wants to combine these attributes and adjust the weights accordingly. In order not to lose the original ranking, she takes a snapshot of the original attribute combination. In the new snapshot she removes employer reputation and citations from the combined score and adds arts & humanities to the weighted attributes. Next, she adjusts the weights by interactively resizing the width of some of the columns. The immediate feedback through the animated transitions and the changed color coding helps her to get a feeling of how sensitive the ranking is to changes in the attribute weights. She then refines the weights according to her preferences. The slope graph between the original ranking (stored in the snapshot) and the ranking based on the copied attribute combination clearly indicates that there are significant differences between the rankings. Jane realizes that she actually wants to find a university that not only matches her criteria but also has a high rank in the QS World University Ranking. To do this, she nests the original combined QS World University Ranking score, stored in the snapshot, within the customized combination. The result of this nested combination is shown in Figure 4.8. As a final step, she wants to make sure to only apply to universities that do not show a downward trend over the last 3 years. By following the slope graphs over time, she then picks the five universities that best fit her preferences and to which she will apply.

4.6. Evaluation

LineUp is tailored to allow novice users to solve tasks related to the creation, analysis, and comparison of multiple rankings. As discussed in the Related Work section, there is no single technique or software that fulfills all requirements described in Section 4.2. However, it is indeed possible to successfully solve most, if not all, ranking-related tasks with off-the-shelf tools such as Microsoft Excel or Tableau Desktop. To confirm this, we ran a pre-study where we asked an expert Excel user and an expert Tableau user to complete the ranking tasks described in Section 4.6.2. This informal test confirmed that solving these tasks with generic analysis and visualization tools is possible but tedious and time-consuming.

Also, tools such as Tableau and Excel require considerable scripting skills and experience to solve complex ranking tasks. In contrast, our technique aims to empower novice users to achieve the same results with very little training. A formal comparative study with a between-subject design where experts use Excel or Tableau and novices use LineUp would be unable to confirm this, as it would be impossible to tell whether the observed effects were caused by the difference in the subjects’ backgrounds or between the tools. Also, a within-subject design that uses either experts or novices would be highly problematic. The first option would be to use Excel and Tableau experts and compare their performance using each tool. However, the experts would not only be biased because of their previous training in the respective tool, but, even more importantly, they are not the target audience of our technique. The second option, a within-subject design that tests how novices would complete the tasks using the different tools, is not possible either because of the level of experience and knowledge necessary to perform the tasks in the other two tools. Consequently, we believe that it is more meaningful to show the effectiveness of the LineUp technique in a qualitative study.

4.6.1. Study Design

For the qualitative study we recruited eight participants (6 male, 2 female) between 26 and 34 years old. They were all researchers or students with a background in computer science, bioinformatics, or public health. Half of them indicated that they had some experience with visualization; one had considerable experience. In a pilot study with an additional participant, we ensured that the overall process ran smoothly and that the tasks were easy to understand. Prior to the actual study, we checked the participants for color blindness. We then introduced the LineUp technique and the study datasets. After the introduction, the participants had the opportunity to familiarize themselves with the software (using a different dataset than those used during the actual study) and to ask questions concerning the concept and interactions. Overall, the introduction and warm-up phase took about 25 minutes per subject. We advised the participants to “think aloud” during the study. In addition to measuring task completion time for the answers, we took notes on the participants’ approaches to the tasks and what problems they encountered. After each task, participants had to fill out the standardized NASA-TLX questionnaire for workload assessment [Har06]. After they had finished all tasks, we gave the subjects a questionnaire with 23 questions, which evaluated the tool on a 7-point Likert scale. It included task-specific and general questions about the LineUp technique and questions making comparisons with Excel and Tableau (which they were asked to answer only if they were sufficiently experienced in using one of the tools). Additionally, we concluded every session by asking open questions to collect detailed feedback and suggestions for improvements.

4.6.2. Results and Discussion

We designed 12 tasks that the participants had to perform using LineUp. The tasks covered all important aspects of the technique concerning the creation, analysis, and comparison of rankings. Detailed task descriptions, study results, and questionnaires can be found in the supplementary material of [GLG+13].

Although we aimed to formulate the tasks as atomically as possible, they intentionally had different levels of complexity. This is also apparent in the results of the NASA-TLX workload assessment questionnaires. We measured task completion times to approximate the potential for improvements in the interface and to identify disruptions in the workflow. In general, participants were able to fulfill the tasks successfully and in a short period of time. Two outliers, where users needed more time, were noticeable: Task 7, in which participants were asked to filter data and evaluate the change, and Task 12, a comprehensive example with subtasks. As Task 12 comprised multiple aspects, we expected it to take longer. The long task completion times for Task 7, on the other hand, were unexpected. While most users solved the task in reasonable time, some needed longer because they tried to evaluate the changes in-place in the table, while we had assumed that they would use the snapshot feature, which would make the task significantly easier.

In the questionnaire, which is based on a 7-point Likert scale ranging from strongly agree (1) to strongly disagree (7), the majority of participants stated that the technique is visually pleasing (mean 1.6), potentially helpful for many different application scenarios (1.3), and generally easy to understand (2.4).

In the questionnaire, participants were asked about their experience level in Excel and Tableau. Although the participants used only LineUp in the study, we wanted to know (if they had any experience in Excel or Tableau) whether (a) the tasks could be solved in one of the other tools and (b) this would have yielded more insight. The average level of experience in Excel was 3.8 on a 7-point scale ranging from novice (1) to expert (7). None of the participants were familiar with Tableau. Participants rated the expected difficulty of doing the same tasks with Excel at 4.4 on average. Most of them were convinced that LineUp would save time (1.6) and allow them to gather more insights (1.6).

In addition to evaluating the general effectiveness of our solution, we also wanted to find out if users are able to understand the mapping editor and which layout they prefer. The study showed that the participants found the parallel editor easy to understand (1.8) but that they were skeptical about the orthogonal layout (4.4).

In the open-ended feedback session, participants particularly valued the interactive approach combined with immediate feedback. Some of them stated that using drag-and-drop

to create complex rankings is much more intuitive than typing formulas in Excel. Also, snapshots for comparing rankings were positively mentioned several times. In addition to the positive aspects, participants provided suggestions for improvements and reported minor complications in their workflow. They mentioned, for instance, that the button for creating a combined score is hard to find and suggested introducing a mode in which users can hold a modifier key and then select the attributes they want to combine. Also, some participants said that the rank change encoding disappears too fast to keep track of the changes. However, after reviewing the notes taken during the study, it was obvious that the participants who mentioned this did not use the snapshot feature, which provides better support for tracking rank changes than the transient change indicators.

The Tableau and Excel experts from the pre-study were asked to complete the same tasks as the regular participants. As previously mentioned, both were able to perform most of the tasks correctly in the pre-study. However, their completion times suggest that novice users are considerably faster in solving the tasks with LineUp than the experts using Tableau or Excel. We did not formally measure the task completion time in the pre-study, as the goal of the pre-study was not to collect performance data that can be used for comparison, but to get an impression of whether the tasks are possible in general and how difficult they are to solve. Simple tasks that required users to filter, search, or sort by a certain attribute took about the same time in all tools. However, the pre-study revealed that the experts in both tools had problems solving tasks that involved “What if?” questions. For instance, it was difficult for them to change weights or mappings and evaluate changes, as these tasks benefit significantly from interactive refinement with immediate visual feedback. The Tableau expert even mentioned that “there is only trial-and-error or manual calculation. I do not know an elegant way of doing that”.

4.7. Conclusion

In this chapter we introduced LineUp, a technique for creating, analyzing, and comparing multi-attribute rankings. Initially, our goal for the technique was to enable domain experts to formulate complex biological questions in an intuitive and visual way. When we realized that the technique can be applied in many scenarios, we chose to generalize it to a domain-independent universal solution for rankings of items.

Our evaluation shows that major strengths of LineUp are the interactive refinement of weights and the ability to easily track changes. While this is valuable in many cases, it still requires users to actually perform the changes in order to see their results. In our evaluation, we validated the algorithm design and the encoding/interaction design aspects of LineUp according to Munzner’s model [Mun09]. What remains to be done is to observe actual users applying our tool in real-world analyses and to study adoption rates.

5 | Guided StratomeX

Guided Visual Exploration of Genomic Stratifications in Cancer

Contents

5.1. Introduction
5.2. Application Areas
5.3. Method
5.4. Case Study
5.5. Conclusion

This chapter introduces a guidance process for selecting data subsets from a dataset collection, which covers the left part of the SPARE model described in Section 1.2. The process is illustrated and motivated by a problem from the biomedical domain, namely the visual exploration of genomic stratifications in cancer. We propose the Guided StratomeX tool, which is an extended version of the existing visualization technique StratomeX combined with our guidance process in order to support users in selecting the next column, i.e., data subset. An introduction to the StratomeX technique is given in Section 3.1. A central element of the guidance process is a wizard interface that acts as a placeholder for a column, which helps the user to formulate queries for a data subset. The query is used as input to an algorithm that scores available data subsets in the dataset collection. The resulting scores are visualized with the LineUp visualization technique introduced in the previous chapter. Using LineUp, users can rank data subsets based on various attributes, and select individual ones that match the initial query, thus closing the guidance process loop. A detailed use case study demonstrates the usage of the proposed method and application. More information about Guided StratomeX can be found at http://stratomex.caleydo.org.

5.1. Introduction

Cancer is a heterogeneous disease, and molecular profiling of tumors from large cohorts has enabled characterization of new tumor subtypes. This is a prerequisite for improving personalized treatment and ultimately better patient outcomes. Potential tumor subtypes can be identified with methods such as unsupervised clustering [Vea10] or network-based stratification [HSC+13], which assign patients to sets based on high-dimensional molecular profiles. Detailed characterization of identified sets and their interpretation, however, remain a time-consuming exploratory process.

To address these challenges, we propose Guided StratomeX, an interactive visualization tool for efficiently comparing multiple patient stratifications, correlating patient sets with clinical information or genomic alterations, and viewing the differences between molecular profiles across patient sets. Although we focus on cancer genomics here, Guided StratomeX can also be applied in other disease cohorts. Guided StratomeX is an extension to and superset of the previously published StratomeX [LSS+12] visualization technique that is introduced in Section 3.1.

Thousands of patient stratifications can be derived from large cancer genomics datasets. This space of patient stratifications—which we call the ‘stratome’—contains stratifications based on, for example, clustering of mRNA, microRNA, or protein expression matrices; the mutation or copy number status of genes; or clinical variables. Due to the size of the stratome and the heterogeneity of the underlying datasets, integration of computational and visual approaches is indispensable to the analyst in identifying biologically or clinically meaningful stratifications, as well as clinical parameters and pathways that together provide a comprehensive view of each patient set.

Guided StratomeX also integrates a computational framework for query-based guided exploration of the stratome directly into the visualization (Figure 5.1), enabling discovery of novel relationships between patient sets and efficient generation and refinement of hypotheses about tumor subtypes. A ‘query wizard’ provides step-by-step instructions for defining queries, and a range of computational methods are used to generate rankings. Queries score stratifications, for example, based on their overlap with a particular patient set, or based on their overall similarity to a selected stratification. Furthermore, the analyst can query the collection for stratifications that contain patient sets that exhibit differences in survival or differential regulation of pathways. We use LineUp, a multi-attribute ranking technique, to visualize the results of these queries and to show which stratifications or pathways score high. The tight integration between the StratomeX and LineUp views, as well as the dynamic computation of scores, is essential for rapid identification of meaningful relationships between stratifications, clinical parameters, and pathways.

Figure 5.1.: Seamless integration of visual and computational components in the Guided StratomeX tool. In the StratomeX view (top) columns represent different stratifications, and the bands between the columns show the intersection between the subsets. The results of queries are lists of elements ranked by a score, which are shown in the LineUp view (bottom). Elements selected in the LineUp view are immediately visualized in the StratomeX view, enabling analysts to rapidly explore the results of queries.

5.2. Application Areas

Data analysis and visualization methods for cancer subtype analysis have three distinct application areas: (1) data exploration, i.e., discovery of novel insights, (2) hypothesis confirmation, i.e., finding supporting evidence for or against a working theory, and (3) presentation, i.e., communicating findings to others. Unlike other approaches, our guided visual exploration approach aims to address all three areas, with a focus on data exploration and hypothesis confirmation.

Data Exploration Within the data exploration area, we identify three primary tasks: (a) the creation of novel and improved stratifications, (b) judging the quality of stratifications, and (c) reasoning about stratifications. Stratifications are created using, e.g., clustering algorithms based on mRNA patterns [Vea10] or network-based stratification [HSC+13]. Our approach employs such methods, i.e., enables analysts to run various clustering algorithms, or to import the results of such algorithms. In addition, StratomeX enables analysts to manually refine stratifications, e.g., by splitting clusters based on a clinical variable.

The quality of stratifications can be judged based on algorithmically derived measures, such as Dunn’s index [CSC+12] or silhouette values [TMH+13], or visually, either by visualizing the content of clusters in, e.g., cluster heatmaps [ESBB98], or by visualizing differences between alternative clustering results [LSP+10]. Our approach is the first to integrate all of these methods: scores can be loaded as supplemental data for stratifications, which can then be used to judge and rank stratifications. More importantly, Guided StratomeX integrates both the visualization of cluster content and the analysis of cluster differences in a single concise visualization. Finally, our approach enables analysts to reason about stratifications, e.g., to identify supporting evidence in clinical or other data, by dynamically exploring the whole space of the stratome using targeted queries. This makes it easy for analysts to quickly check large quantities of candidate stratifications for mutual support.

The deep integration of analytical methods and visual exploration distinguishes the method described here from our previously published visualization-only approach [LSS+12]. The original method enables only a knowledge-driven approach, i.e., the confirmation and communication of existing hypotheses based on the analyst’s knowledge of the dataset. By integrating methods to identify and rank stratifications, clinical variables, and pathways, we enable a data-driven approach to cancer subtype analysis that does not rely on an analyst’s prior knowledge of the dataset. Such a data-driven approach is necessary for data exploration in large datasets to discover novel insights.

Confirmatory Analysis The data-driven approach, however, also plays a major role in confirmatory analysis. Guided StratomeX makes it possible to efficiently put candidate stratifications into the context of other data types, such as clinical outcomes, to judge the effects of different stratifications, or pathways, to speculate about causes and effects of a particular cancer subtype. While we employ algorithms such as gene set enrichment analysis [STM+05] to identify pathways, and logrank tests to identify interesting stratifications based on clinical variables, it is the deep integration of these analytic processes with the interactive visualization that accelerates the analytical workflow. This enables analysts to explore a larger number of hypotheses in less time and allows them to perform a deeper analysis of the data than possible with other approaches in the same amount of time.

Presentation Finally, Guided StratomeX is also well suited for the presentation of results. While it is not the goal of Guided StratomeX to produce publication-ready figures, our visual representation is suitable to efficiently convey important characteristics of candidate subtypes. The visual encoding used by StratomeX is easy to understand and visually appealing. Also, Guided StratomeX can be used to communicate among distributed teams, either by exporting figures from Guided StratomeX, or by passing along project files that contain all the data as well as the analysis setup. In Chapter 7, we present an advanced model for presenting visual analysis results based on recorded provenance data.

5.3. Method

Figure 5.2 illustrates the proposed guidance process for iteratively selecting data subsets from a dataset collection. Users perform the main visual analysis within the Exploration stage. Employing integrated tools such as wizards, users can formulate a query to the guidance system. The available query methods are specific to the visualization technique and the data domain. Within the Scoring stage, the defined query is transformed into a scoring algorithm that is used to compute a score for each subset in the dataset collection. The data subsets along with their resulting scores are then presented in an interactive ranking interface in the Ranking stage. By combining scores and refining filters, users can adapt the ranking to their needs. The resulting ranked list then drives the selection of the data subset that best matches the user’s query, which is added to the main visualization technique. This closes the loop by returning to the Exploration stage, in which users can continue their exploration and formulate new queries.

Figure 5.2.: Proposed guidance process, consisting of three stages: Exploration, Scoring, and Ranking.
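To make the loop concrete, the three stages can be thought of as the following generic interfaces. This is a schematic sketch of the process only; all type and method names are illustrative assumptions, not the implemented API:

```typescript
// Sketch of the guidance loop: Exploration -> Scoring -> Ranking -> Exploration.
interface Query<S> {
  description: string;      // e.g., 'logrank on Days to Death'
  score(subset: S): number; // scoring algorithm derived from the query
}

interface GuidanceProcess<S> {
  // Exploration: the user formulates a query, e.g., via the query wizard.
  formulateQuery(): Query<S>;
  // Scoring: the query's scoring algorithm is applied to every subset in the collection.
  scoreAll(query: Query<S>, collection: S[]): { subset: S; score: number }[];
  // Ranking: scored subsets are presented in an interactive ranking (e.g., LineUp);
  // the user's selection is added back to the exploration view, closing the loop.
  select(scored: { subset: S; score: number }[]): S;
}
```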

The proposed guidance process is widely applicable in various problem domains. Entourage [LPK+13] uses a similar combination of scoring a pathway collection and presenting the results as a ranking to guide users to potentially interesting related pathways. With Magnostics, Behrisch et al. [BBH+17] propose a set of feature descriptors for networks represented as matrix views, which can be used in a similar fashion to guide users to interesting subsets in large network collections.

In a variant of the proposed guidance process, the Exploration and Ranking stages are combined when the exploration of the ranked items themselves is of main interest. Pathfinder [PGS+16] is a visualization technique for querying and exploring ranked paths in large networks. Tatu et al. [TMF+12] propose a workflow for guiding users to interesting subspaces of large high-dimensional datasets. Both focus on the ranked and scored item collections, with additional support views showing details about individual items.

Figure 5.1 summarizes the guidance process as applied in Guided StratomeX. It consists of three components: (1) a main visual exploration technique, such as StratomeX [LSS+12], (2) a query wizard along with data-mining algorithms, and (3) a visual ranking interface, LineUp, for inspecting the query results. The process itself is independent of the visual exploration technique applied. Each component is described in more detail below.

5.3.1. StratomeX View

As described in the introduction of this chapter, we use an extended version of StratomeX as the main visual exploration technique in the Guided StratomeX tool. Figure 5.3 illustrates the basic user interface and components of StratomeX. A detailed introduction to the original StratomeX can be found in Section 3.1.

Figure 5.3.: Guided StratomeX user interface. Stratifications are represented as columns, showing additional data per block. Bands indicate set relationships between two neighboring columns. In addition, on the right side the query wizard is visible, acting as the starting point for the guidance process.

5.3.2. Query Wizard and Scores

The column on the right of Figure 5.3 shows the ‘query wizard’, an assistive user interface that supports users in the process of adding new columns to Guided StratomeX; it is an extension to the original StratomeX [LSS+12]. In addition to basic browsing, filtering, and ranking of stratifications, pathways, and clinical variables, Guided StratomeX supports a series of advanced query methods to find additional stratifications and pathways based on patterns identified in the StratomeX view.

Figure 5.4 summarizes the available options along with their algorithmic counterparts in a flow chart. Diamond nodes denote decision points within the wizard that require user input. Edges describe the options the user can choose from. Gray boxes represent actions taken by the user in the StratomeX view, white boxes with gray outlines mark actions executed by the system, and colored boxes indicate actions taken by the user in the LineUp view. Depending on the current step, the user either needs to select an option from a list, or is instructed to take actions in the user interface, for example, to select a stratification (column) or a set of patients (block) in StratomeX or to perform actions in the LineUp view. Once this workflow is successfully completed, a new column will be added to StratomeX.

Figure 5.4.: Flow chart of query options and the query wizard menu items used to trigger them.

Some of the queries implemented in Guided StratomeX are based on hypothesis tests for which p-values are provided along with the test scores. The results of these queries, however, must not be interpreted as statistically reliable results, since correction for multiple hypothesis testing is not provided in the current implementation, although some queries involve thousands of tests. Generally, the scores are provided to guide the user to stratifications or pathways that provide additional insight into patterns observed in the StratomeX view and to generate new hypotheses. The following scores are implemented in the proposed Guided StratomeX tool:

Scoring stratifications based on similarity to a selected stratification This query is useful for finding stratifications that are similar to a currently displayed stratification. The Adjusted Rand Index [HA85] is used to compare each stratification in the collection against the query stratification selected by the user.
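For illustration, the Adjusted Rand Index can be computed from the contingency table of two cluster-label assignments. A compact sketch, assuming both stratifications cover the same patients in the same order (names are illustrative):

```typescript
// Sketch of the Adjusted Rand Index [HA85] for comparing two stratifications,
// given as cluster labels per patient.
function adjustedRandIndex(a: number[], b: number[]): number {
  const comb2 = (n: number) => (n * (n - 1)) / 2;
  const n = a.length; // assume n >= 2
  const table = new Map<string, number>(); // contingency counts n_ij
  const rows = new Map<number, number>();  // cluster sizes in a
  const cols = new Map<number, number>();  // cluster sizes in b
  for (let k = 0; k < n; k++) {
    const key = `${a[k]}|${b[k]}`;
    table.set(key, (table.get(key) ?? 0) + 1);
    rows.set(a[k], (rows.get(a[k]) ?? 0) + 1);
    cols.set(b[k], (cols.get(b[k]) ?? 0) + 1);
  }
  const sumIJ = [...table.values()].reduce((s, v) => s + comb2(v), 0);
  const sumA = [...rows.values()].reduce((s, v) => s + comb2(v), 0);
  const sumB = [...cols.values()].reduce((s, v) => s + comb2(v), 0);
  const expected = (sumA * sumB) / comb2(n); // expected index under random labeling
  const max = (sumA + sumB) / 2;
  return (sumIJ - expected) / (max - expected);
}
```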

Scoring stratifications based on overlap with a selected patient set In contrast to the Adjusted Rand Index, which quantifies similarities between stratifications, this type of query is designed to identify stratifications that contain sets similar to a query set in a displayed stratification. The score for a set is the Jaccard index describing its similarity to the query set; it is computed for all sets in every stratification in the collection, but only the best score for each stratification is reported. In addition,

if the query is triggered from a binary stratification, such as mutations, a mutual exclusivity score is computed per set, which can be used to identify genes that are mutated in non-overlapping sets of patients.
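A sketch of the overlap score described above, reporting the best Jaccard index between the query set and any set of a stratification (illustrative names, not the thesis implementation):

```typescript
// Jaccard index: |A ∩ B| / |A ∪ B|.
function jaccard<T>(a: Set<T>, b: Set<T>): number {
  let intersection = 0;
  a.forEach((x) => { if (b.has(x)) intersection++; });
  return intersection / (a.size + b.size - intersection);
}

// Score a stratification (a list of patient sets) by its best-matching set;
// assumes the stratification contains at least one set.
function bestOverlapScore<T>(query: Set<T>, stratification: Set<T>[]): number {
  return Math.max(...stratification.map((set) => jaccard(query, set)));
}
```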

Scoring stratifications based on logrank test for patient survival This query identifies sets of patients that exhibit altered survival times compared to the rest of the patients in the same stratification. It uses the logrank test (Mantel-Haenszel test) to score the stratifications and assigns larger scores to more extreme differences in survival. Similar to the previous method, the score is computed for each set in every considered stratification, and the best result per stratification is presented to the user. A p-value for the best result is provided as guidance.
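For reference, the two-group logrank statistic accumulates, over all distinct event times, the observed versus expected events in one group. A simplified sketch under standard definitions, without the p-value computation and not the thesis implementation:

```typescript
// Each subject has a survival time, an event flag (true = death observed,
// false = censored), and a group (query set = 1, rest = 0).
interface Subject { time: number; event: boolean; group: 0 | 1; }

function logrankStatistic(subjects: Subject[]): number {
  const eventTimes = [...new Set(subjects.filter((s) => s.event).map((s) => s.time))].sort((x, y) => x - y);
  let observed1 = 0, expected1 = 0, variance = 0;
  for (const t of eventTimes) {
    const atRisk = subjects.filter((s) => s.time >= t); // risk set at time t
    const n = atRisk.length;
    const n1 = atRisk.filter((s) => s.group === 1).length;
    const d = atRisk.filter((s) => s.event && s.time === t).length;  // deaths at t
    const d1 = atRisk.filter((s) => s.event && s.time === t && s.group === 1).length;
    observed1 += d1;
    expected1 += (d * n1) / n;
    if (n > 1) {
      variance += (d * (n1 / n) * (1 - n1 / n) * (n - d)) / (n - 1);
    }
  }
  // Chi-squared statistic with one degree of freedom; larger = more extreme difference.
  return ((observed1 - expected1) ** 2) / variance;
}
```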

Scoring pathways based on gene set enrichment for a selected patient set This type of query is designed to identify pathways that are over- or underexpressed in a patient set relative to the rest of the cohort. It takes a set of patients as input and computes differential gene expression levels for patients in the query set against the rest of the patients in the same stratification. The differential expression levels are used to score pathways using either Gene Set Enrichment Analysis (GSEA) [STM+05] or Parametric Analysis of Gene Set Enrichment (PAGE) [KV05]. Additional meta-information, such as the number and percentage of mapped genes, is shown to allow filtering operations, such as the exclusion of pathways with too many or too few genes with expression levels.
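Of the two methods, PAGE [KV05] is the simpler one: it compares the mean differential expression of a gene set against the global distribution over all genes. A minimal sketch under that definition (names are illustrative):

```typescript
// PAGE z-score: Z = (Sm - mu) * sqrt(m) / sigma, where mu and sigma are the mean
// and standard deviation of all fold changes, Sm is the mean fold change of the
// genes in the set, and m is the gene-set size.
function pageZScore(allFoldChanges: number[], geneSetFoldChanges: number[]): number {
  const mean = (xs: number[]) => xs.reduce((s, x) => s + x, 0) / xs.length;
  const mu = mean(allFoldChanges);
  const sigma = Math.sqrt(mean(allFoldChanges.map((x) => (x - mu) ** 2)));
  const m = geneSetFoldChanges.length;
  return ((mean(geneSetFoldChanges) - mu) * Math.sqrt(m)) / sigma;
}
```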

Importing externally computed scores Any external score associated with stratifications or pathways can be imported using the data import wizard and used for exploration of the data.

5.3.3. Ranking Interface

The computed scores for each item in the dataset collection are visualized using LineUp. Figure 5.5 explains the user interface of LineUp as used in Guided StratomeX in more detail. In this example, each row corresponds to a gene for which a series of attributes is available. Attributes can be (1) simple general metrics, such as the number of groups or patients in a stratification, (2) data-specific metrics, such as the mutation rate, (3) computed scores based on queries triggered by the user, such as the similarity between two patient subsets, or (4) imported scores or groupings that have been computed using an external tool, such as the Mutation Significance (MutSig) q-value of the genes. By selecting a row, the corresponding stratification is added to the StratomeX view as a new column. Additionally, analysts can search for stratifications of interest by typing their name. The list of available datasets on the left allows analysts to select the subset of stratifications that will be scored by the query and incorporated into the ranking. In this example, only gene mutation calls are included. The ‘memo pad’ on the right allows analysts to store attributes that are not of immediate interest. The general visualization technique itself is presented in Chapter 4 of this thesis.

Figure 5.5.: LineUp view with multi-attribute ranking and filtering. The interactive ranking visualization allows users to filter and order stratifications according to a single attribute or a combination of multiple attributes, such as the sum or the maximum of attributes, associated with the stratifications.

5.4. Case Study

To demonstrate that StratomeX supports the generation and refinement of novel hypotheses using a combination of data-driven and knowledge-driven queries, we explored the clear cell renal carcinoma (ccRCC) data set published by The Cancer Genome Atlas (TCGA) consortium [The13].

We started by confirming the hypothesis from the study that patients with mutations in BAP1 have much worse outcomes compared to patients without such mutations. A query was triggered by using the StratomeX query wizard, looking for high log-rank scores in the ‘Days to Death’ variable in all mutation data subsets. The result of the query, shown in the LineUp view at the bottom, is a ranking of 19 significantly mutated genes based on the log-rank score (purple bars) and corresponding p-values (dark purple bars; the diagonal hatch pattern indicates ‘inverted’ mapping with -log(p)), obtained by applying the corresponding stratifications to the ‘Days to Death’ variable. The minimum patient set size for which scores were computed was 10. Additional columns show common metrics for mutated genes, and the MutSig q-values ranging between 0 and 0.1. Figure 5.6 shows the final outcome and confirmation. The top hit BAP1 was selected in the LineUp view and is shown in the StratomeX view (left column) next to the Kaplan-Meier plots for ‘Days to Death’ stratified by BAP1 mutation status (right column). The plots indicate that patients with a mutation in BAP1 have worse outcomes than those without a BAP1 mutation.

Figure 5.6.: Illustration of Guided StratomeX. The clear difference in the Kaplan-Meier plots shown on the right (one for patients with and one for patients without mutation) confirms the hypothesis from the study that patients with mutations in BAP1 have much worse outcomes than patients without such mutations.

While this observation was made in sequence-level data, we were also interested in whether we could find such patient sets based on patterns in functional data, such as the expression levels of microRNAs, mRNAs, or proteins. We queried a total of 51 clustering results obtained from the 23 May 2013 Firehose analysis run (16 for mRNA-seq data [Bro13e, Bro13d], 14 for microRNA-seq data [Bro13c, Bro13b], 14 for protein expression data (reverse-phase protein array, RPPA) [Bro13g, Bro13f], and 7 for DNA methylation data [Bro13a]) against the ‘Days to Death’ survival variable using the ‘logrank query’ of the query wizard.

The top result (logrank test score = 68.6, p-value = 1.1e-16) is a clustering of the RPPA data into 8 clusters found using a consensus non-negative matrix factorization clustering approach [Bro13g]. The Kaplan-Meier curves indicate that the 57 patients in cluster 8 have notably poorer outcomes (see Figure 5.7a). Since the outcomes of the patients in the other groups are all fairly similar, we decided to study the patients in cluster 8 (n = 57) relative to the remaining patients (n = 397) and created a new stratification of the RPPA data with only two groups: “cluster 8” and “rest” (see Figure 5.7b).

Figure 5.7.: Screenshots of Guided StratomeX, illustrating that patients in “cluster 8” (bottom right in 5.7a and top right in 5.7b) have fewer days to death than other patients. (a) Original stratification with eight clusters. (b) Merged stratification with “cluster 8” versus the “rest”.

In summary, we were not only able to quickly confirm a published finding of the study, but also highlighted a potential additional finding. A more detailed case study of how Guided StratomeX is applied to this cancer study was published by Streit et al. [SLG+14].

5.5. Conclusion

In this chapter we presented a guidance process for supporting users in characterizing cancer subtypes, realized by extending StratomeX with integrated guidance. Users can flexibly define queries that are converted into scores and yield a ranked list of matching data subsets. In addition, we showed in our case study that we can not only quickly reproduce published findings, but also make new ones. The proposed guidance process is not restricted to the presented domain problem; it can be applied to other domains by adapting or replacing the main visualization technique and defining the queries and scores that are useful for the problem at hand.

6 | Domino

Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets

Contents

6.1. Introduction
6.2. Domino Visualization Technique
6.3. Spectrum of Supported Visualizations
6.4. Interface and Interaction Design
6.5. Use Cases
6.6. Discussion
6.7. Conclusion

This chapter describes Domino, a novel multiform visualization technique for effectively representing subsets and the relationships between them. By providing comprehensive tools to arrange, combine, and extract subsets, Domino allows users to create both common visualization techniques and advanced visualizations tailored to specific use cases. It provides a powerful toolset for guidance in exploration-based user tasks, including innovative interactive features such as placeholders and live previews that support the rapid creation of complex analysis setups. Besides the concept itself and an example set of existing visualization techniques that it covers, we discuss the interface and interaction design of the prototype. The practical applicability is shown via two use cases. The chapter concludes with a discussion of the concept, its advantages, and its limitations. A prototype is available at http://domino.caleydo.org.

6.1. Introduction

Common heterogeneous data visualization methods show data associated with a single shared item type. The popular cars dataset [HV81], for example, is defined over the item type car and contains multiple observations across several data types for each car. A parallel coordinates plot could be used to visualize such data. Multi-dataset visualizations typically follow the same pattern: they refer to a single, shared item type. A medical dataset may contain data about patients, such as gender, birth date, or height, while a blood test dataset for the same patients will contain measurements such as blood type or white blood cell count. These datasets follow a relational data model where each dataset contains one or more observations for a given item type.

In practice, however, an increasing number of domains contain rich data for multiple item types. Figure 6.1 shows a simple illustrative example from the music industry1 that includes multiple datasets defined in terms of the item types artist (e.g., count of number-one hits, gender, origin) and country (e.g., number-one hits per artist, total number of albums sold, sales normalized to population).


Figure 6.1.: Domino showing relationships between subsets of a music charts dataset. The visualization illustrates that Whitney Houston is a female, inactive artist who has had many number-one hits in English-speaking countries but produced fewer than 10 studio albums. The schematic illustration in the top-right corner shows the setup using a graphical notation.

1The dataset was collected from multiple web sources. References and the dataset are available at http://domino.caleydo.org.

This chapter introduces Domino, a novel visualization technique that enables analysts to explore a mix of numerical and categorical data connected via various item types. This is achieved by allowing users to freely arrange, combine, and manipulate subsets (blocks), visualize associated data, and explicitly represent the relationships between these subsets. Domino enables analysts to rapidly assemble both established visualization techniques and novel combinations specifically tailored to the task at hand. In addition, we present a prototype implementation that enables users to exploit the wide spectrum of possibilities that the technique offers. To assist users in the process of adding new subsets to the setup and combining previously defined ones, we introduce placeholders that indicate possible options for placing a subset, and live previews that show possible visual encodings of the associated data.

We demonstrate the utility and flexibility of the system by means of two use cases: the first discusses a small music charts dataset, while the second uses a collection of datasets from cancer genomics.

6.2. Domino Visualization Technique

Domino is a meta-visualization technique that enables analysts to arrange, extract, and manipulate subsets of interest and show the relationships between them at multiple levels of detail. The approach is designed for scenarios that include multiple heterogeneous datasets which are connected by shared identifiers (artist, country, etc.). The goal of Domino is to effectively represent subsets, the data associated with the subsets, and the relationships between the subsets.

The basic visual elements of the Domino technique are blocks, similar to tiles in a dominoes game, and block relationships, which connect the individual blocks. Figure 6.1 shows a snapshot of an analysis performed with the previously introduced music charts dataset. Blocks represent one or multiple combined subsets, where subsets can be numerical (e.g., population of countries), partitioned (artists’ gender), or matrix data (# of number-one hits by artists in various countries). Depending on the data and the analysis task, various visualization techniques can be used to represent the subsets. While the matrix, for instance, is represented as a heatmap, the population of countries is visualized as a 1D scatterplot. Blocks can be arbitrarily positioned and rotated. The visual representations of the relationships between blocks are automatically displayed if the item types of two neighboring blocks match, analogously to matching dominoes. We consider two blocks as neighbors if they can be connected by a line without cutting across another block. The rows in the matrix that correspond to the artists, for example, are visually linked to the 1D scatterplot showing the # of studio albums.

Chapter 6. Domino 69 In Domino, users are able to switch between three levels of granularity at which the rela- tionships can be represented—focusing on the overlap between whole blocks, groups within blocks, or down to the lowest level of individual items. Depending on the arrangement of the blocks and the chosen granularity level, relationships are represented using lines, bands (bundles of lines), points, or rectangular regions (see Figure 6.1). The flexibility of freely arranging and associating blocks in Domino allows analysts to both quickly produce established visualization techniques, such as parallel sets, parallel coordinates, 2D scatter- plots, and to create tailored, complex visualizations to address the needs of a particular use case.

6.2.1. Blocks

The basic visual units of the Domino technique are blocks, rectangular regions that represent subsets. We define i as an item that is part of the set I. I contains only items of the same type (e.g., country). A subset S is defined as a mathematical subset of I, i.e., S ⊆ I (e.g., the countries in Europe). We distinguish between three functions that determine the possible types of blocks in Domino (see Figure 6.2):

Figure 6.2.: The three block types in Domino: partitioned (red), numerical (blue), and matrix blocks (green). The border style indicates the type of the items contained in the subset. While partitioned and numerical blocks are defined by one item type, matrix blocks have two, one in each direction. Only blocks with matching item types can be combined.

A partitioned block is defined by the function fp that maps items to groups that are associated with unique natural numbers (e.g., {Africa(0), Europe(1), ...} or {male(0), female(1)}):

fp(i ∈ I) ↦ g ∈ P,

where P is a partition of items I. The subset Sg of one specific group g ∈ P is defined as Sg = {i | fp(i) = g}. A partitioned block can be viewed as a grouping of items without any associated data. The grouping can be the output of a clustering algorithm, derived from a categorical data attribute, or a quantization of a numerical block.

A numerical block is defined by the function fn that maps items to numerical values (population, # of studio albums, or retail value):

fn(i ∈ I) ↦ v ∈ ℝ

A matrix block is defined by the function fm that maps the product set of two subsets of items to numerical values:

fm((i, j) ∈ I1 × I2) ↦ v ∈ ℝ

In contrast to partitioned and numerical blocks, in which all items are part of the same item set, a matrix block consists of two item sets (I1 and I2). The item sets can be identical (e.g., in a city distance matrix) or different (number-one hits for countries or gene expression of cancer patients). However, the item sets themselves need to be homogeneous with respect to the item type. According to our definition of blocks, the prominent cars dataset [HV81], for example, is not a single matrix block, but a collection of individual numerical and partitioned blocks for each variable (horse power, year of construction, etc.).
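These three definitions translate naturally into type declarations. The following sketch mirrors fp, fn, and fm in TypeScript; all names are illustrative assumptions, not the prototype's API:

```typescript
// Sketch of the three Domino block types.
type ItemType = string; // e.g., 'artist' or 'country'

interface PartitionedBlock<I> {
  kind: 'partitioned';
  itemType: ItemType;
  fp: (item: I) => number; // maps an item to its group number
}

interface NumericalBlock<I> {
  kind: 'numerical';
  itemType: ItemType;
  fn: (item: I) => number; // maps an item to a numerical value
}

interface MatrixBlock<I, J> {
  kind: 'matrix';
  rowType: ItemType; // item type of the rows, e.g., 'artist'
  colType: ItemType; // item type of the columns, e.g., 'country'
  fm: (i: I, j: J) => number;
}

type Block<I, J> = PartitionedBlock<I> | NumericalBlock<I> | MatrixBlock<I, J>;
```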

Figure 6.2 illustrates the three block types. The border style of the blocks encodes the different item types (e.g., country and artist). The sorting of items within blocks is essential for users to see patterns, for instance, in a clustered tabular dataset, or to see the distribution of items. For our technique, the sorting is also relevant because it determines the relationship representations that connect the blocks, as described in Section 6.2.2. From a mathematical point of view, sets do not define an order on the items that are contained in them. In Domino, items in blocks are sorted according to the functions defined above. In numerical blocks, items are sorted according to the associated data value. In partitioned blocks, the groups themselves are sorted according to the assigned natural number, while the individual items inside the group have no order. In matrix blocks, the two dimensions (horizontal and vertical) are sorted independently according to the mean value of the items. Sorting within blocks is illustrated by varying levels of brightness, as shown in Figure 6.2.

Multiform Visualization

An essential characteristic of blocks in Domino is that they visualize subsets using multiform visualization, i.e., users can switch between various visualization techniques.

Figure 6.3.: Block visualization techniques categorized by block type (partitioned, numer- ical, and matrix) including 1D and 2D heatmaps, mosaic and bar charts, 1D scatterplots, histograms, pie charts, and boxplots.

Domino is not bound to a predefined set of visualization techniques. Throughout this chapter, however, we use a set of standard techniques, which can be seen in Figure 6.3, from which the analyst can freely choose. Depending on the type of data represented by the block, only certain visualization types are suitable for encoding it. While a heatmap representation can be applied to all three block types, 1D scatterplots and bar charts are available for partitioned and numerical blocks. Histograms and box plots are provided to represent numerical subsets, and pie charts can only be used to encode partitioned blocks. Although our current implementation is focused on these visualization types, the set of multiform techniques supported can be extended arbitrarily, depending on use case and application context. For the cancer genomics use case described in Section 6.5.2, for instance, we added Kaplan-Meier plots [RNP+10] to encode patient survival data.

Combined Blocks

Blocks that share the same item type, as indicated by the same border style in the illustrations, can be attached to each other, to form a combined block. The resulting combined block contains the union of the two item subsets. Figure 6.4 demonstrates by means of the music charts example how the stitching of blocks works. The analyst starts with the partitioned block career status, which separates active from inactive artists. In a second step, the analyst adds the numerical block # of studio albums, whose items are automatically partitioned according to the artists' career status. The two blocks are then combined with the matrix block that holds the absolute # of number-one hits for 12 countries. Again, as only one partitioning can drive the grouping of items in a combined block, the matrix block also inherits the career status partitioning. Finally, the analyst attaches the partitioned block continent, which splits the countries into three groups (Australia, Europe, and North America).


Figure 6.4.: Example sequence demonstrating the creation of a combined block. The user starts by attaching a partitioned block to a numerical block, which inherits the partitioning. Then the user adds a matrix block that is also partitioned accordingly. In the final step, the matrix column items are partitioned based on a partitioned block matching the column item type.

As described in Section 6.2.1, items within blocks are sorted. In the case of a combined block, the item order is determined by a hierarchical sorting strategy. For instance, if the user defines a primary and a secondary sorting criterion, the latter is only used if the order cannot be resolved using the former.
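A hierarchical sorting strategy of this kind can be expressed compactly as a composed comparator. The following TypeScript sketch is a hypothetical illustration, not the prototype code; the lookup tables for career status and album counts are assumptions.

```typescript
type ItemId = number;
// A sorting criterion compares two items; 0 means "tied under this criterion".
type Criterion = (a: ItemId, b: ItemId) => number;

// Hierarchical sorting: a later criterion is consulted only when all earlier
// criteria leave two items tied.
function hierarchicalSort(items: ItemId[], criteria: Criterion[]): ItemId[] {
  return [...items].sort((a, b) => {
    for (const cmp of criteria) {
      const c = cmp(a, b);
      if (c !== 0) return c;
    }
    return 0;
  });
}

// Example: order artists by career status first, and by album count within
// equal-status groups (careerStatus and albumCount are hypothetical tables).
declare const careerStatus: Map<ItemId, number>;
declare const albumCount: Map<ItemId, number>;
const byStatus: Criterion = (a, b) => careerStatus.get(a)! - careerStatus.get(b)!;
const byAlbums: Criterion = (a, b) => albumCount.get(a)! - albumCount.get(b)!;
// hierarchicalSort(artists, [byStatus, byAlbums]);
```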

6.2.2. Block Relationships

In addition to blocks, the second class of visual elements in Domino are explicit representations of relationships between blocks. We consider blocks to be related to each other if their subsets contain items of the same type. In the previous section, we discussed that a partitioned block can only be subdivided into groups according to one primary partition, and a numerical block can only be sorted according to one primary sorting criterion. In the case of matrix blocks, these properties apply equally to each dimension. Consequently, the ability to create combined blocks by attaching multiple separate blocks requires re-sorting and/or re-distributing of the items in one or multiple blocks. Applying multiple sorting criteria hierarchically does not solve the issue, as only one can be the primary sorting criterion. However, in many analysis scenarios, users want to investigate the relationships between multiple blocks without giving up their designated sorting or partitioning. To address this issue, we introduce four degrees of relationships.

Relationship Degrees

A relationship degree defines the strength of the relationship between two blocks. We distinguish between four degrees: none, weak, medium, and strong, as shown in Table 6.1. The stronger the relationship, the more properties the blocks share. In Domino, the relationship degree is reflected by the proximity of two blocks in the layout. Users can freely switch between the four relationship degrees by moving blocks in the layout using drag-and-drop operations.

Shared . . .     None   Weak   Medium   Strong
Item Type         –      •      •        •
Partitioning      –      –      •        •
Sorting           –      –      –        •


Table 6.1.: Properties of the possible block relationship degrees (none, weak, medium, and strong) along with both a schematic diagram and an application example of each relationship degree. The stronger the relationship, the more properties two related blocks share.

Two blocks have no relationship if they do not share the same item type (none). In the example used in Table 6.1, the artist's career status cannot be matched with the population of countries. If the blocks contain items of the same type but retain their individual sorting or partitioning, their relationship is weak. For instance, the artists in the career status block are assigned to the active/inactive group and are then weakly connected to the corresponding bar in the # of studio albums bar chart, which is sorted by the height of the bars. Blocks that have a medium relationship degree have the same partitioning but do not require the same sorting or visualization technique (see Section 6.2.1), which means that items are aggregated on a per-group basis. In the example, the same subsets are used as in the weak case, but this time the # of studio albums block inherits the partitioning from the artists' career status block. However, since both blocks can retain their individual sortings, different visualization techniques can be used. Finally, in the strong case both blocks also have to use the same sorting and must therefore apply the same visualization technique. In the example, the bars encoding the # of studio albums are sorted within the career status group.
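The decision logic behind Table 6.1 can be summarized in a few lines. The following TypeScript sketch is illustrative; the property identifiers are assumptions rather than the prototype's actual data model.

```typescript
type Degree = "none" | "weak" | "medium" | "strong";

interface BlockProps {
  itemType: string;             // e.g., "artist" or "country"
  partitionId: string | null;   // identifies the partitioning currently applied
  sortingId: string | null;     // identifies the sorting currently applied
}

// The degree depends on how many properties two blocks share (cf. Table 6.1):
// matching item types make the relationship at least weak, a shared
// partitioning makes it medium, and a shared sorting makes it strong.
function relationshipDegree(a: BlockProps, b: BlockProps): Degree {
  if (a.itemType !== b.itemType) return "none";
  if (a.partitionId === null || a.partitionId !== b.partitionId) return "weak";
  if (a.sortingId === null || a.sortingId !== b.sortingId) return "medium";
  return "strong";
}
```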

As blocks with a strong relationship are directly attached to each other (see Section 6.2.1 on combined blocks), no explicit visual representation of the relationship is needed. Consequently, the following discussion on how to represent relationships only applies to medium and weak relationship degrees.

Relationship Representation

We characterize the visual representation of relationships by two aspects: the direction of blocks—either parallel or orthogonal—and the granularity of the relationship. Depending on the user's task, we differentiate between three levels of granularity: block, group, and item relationships. Figure 6.5 illustrates the possible combinations using our graphical notation introduced in Figure 6.2.

At the finest granularity level for visualizing relationships between items, we connect the items by drawing lines between parallel blocks or by drawing points at the hypothetical intersection point of related items of orthogonal blocks. While the former results in parallel coordinate-like setups, the latter results in scatterplots. The examples in the bottom row of Figure 6.5 show the correlation between the numerical 1D scatterplot # of studio albums and year of first album. The positions of the points in the scatterplot as well as the starting and end points of lines in the parallel coordinates plot depend on the currently applied visualization technique in the two blocks involved. For histograms, for example, the bin position is used.

To represent relationships at the coarser granularity levels, between groups and whole blocks, we draw bands and rectangular regions, respectively, depending on the arrangement of the blocks. The width of a band or the size of a region is proportional to the overlap between the subsets. In earlier work [LSS+11, LSP+10, LSS+12], we already made use of bands to relate groups between partitioned datasets. This includes the visualization technique StratomeX [LSS+12] that is described in more detail in Section 3.1. In the parallel case, blocks connected by bands produce the parallel sets technique [KBH06]. The "parallel-group" example in Figure 6.5 shows the relationships between the groups of the partitioned blocks gender and career status, which are arranged in parallel. In the orthogonal case, the size of the regions is proportional to the number of shared items.


Figure 6.5.: Overview of possible relationship representations characterized by the granularity of the representation (block, group, and item) and the orientation of the block arrangement (parallel, orthogonal). All combinations are illustrated using the corresponding relationship representation and an example from the music charts dataset.

Subsets that do not fully overlap result in unshared items $S_1 \setminus S_2$ and $S_2 \setminus S_1$. In the "block" examples in Figure 6.5, for instance, the relationships of "all artists" (green) to a subset of ten selected artists (gray) are shown. In the illustrations, unshared items are indicated in red. In our implementation a fading band is used to represent them.

Note that since the positioning of blocks in Domino is not restricted to perfect vertical or horizontal alignments, all relationship representations can be sheared.
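For group and block relationships, the proportionality between subset overlap and band width described above reduces to a simple computation. The following is a minimal sketch, assuming subsets are stored as sets of numeric item IDs and a hypothetical pixelsPerItem scale; the actual rendering in the prototype may differ.

```typescript
// Band width is proportional to the overlap of the two subsets; the unshared
// items (S1 \ S2 and S2 \ S1) are rendered as fading bands in our prototype.
function bandWidths(s1: Set<number>, s2: Set<number>, pixelsPerItem: number) {
  let shared = 0;
  for (const i of s1) {
    if (s2.has(i)) shared++;            // |S1 ∩ S2|
  }
  return {
    shared: shared * pixelsPerItem,
    unsharedLeft: (s1.size - shared) * pixelsPerItem,   // |S1 \ S2|
    unsharedRight: (s2.size - shared) * pixelsPerItem,  // |S2 \ S1|
  };
}
```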

6.2.3. Subset Extraction and Manipulation

Up to this point, we have focused our discussion on the exploration of predefined subsets, relationships between these subsets, and the different ways of visualizing them. Beyond predefined subsets, Domino also supports users in the process of creating new subsets. Users can define new subsets either by combining existing ones using logical operations (union, intersection, set difference) or by extracting parts of subsets based on selected items or groups. Depending on the visual representation of a block or relationship, users can, for example, extract items above a certain threshold in a numerical 1D scatterplot, items aggregated in a single bin of a histogram, or a collection of points in a scatterplot that is formed by orthogonally arranged numerical blocks. In Figure 6.10, for instance, we selected the group G-CIMP in the matrix block and extracted all items to a separate block.
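The logical operations and threshold-based extraction described above can be sketched as plain set manipulations. The following TypeScript is illustrative only; subsets are assumed to be sets of numeric item IDs, and the function names are not the prototype's API.

```typescript
type ItemId = number;

function union(a: Set<ItemId>, b: Set<ItemId>): Set<ItemId> {
  return new Set([...a, ...b]);
}

function intersection(a: Set<ItemId>, b: Set<ItemId>): Set<ItemId> {
  return new Set([...a].filter((i) => b.has(i)));
}

function difference(a: Set<ItemId>, b: Set<ItemId>): Set<ItemId> {
  return new Set([...a].filter((i) => !b.has(i)));
}

// Extraction by threshold, e.g., items above a cutoff in a numerical
// 1D scatterplot become a new subset (and hence a new block).
function extractAbove(
  items: Set<ItemId>,
  value: (i: ItemId) => number,
  threshold: number
): Set<ItemId> {
  return new Set([...items].filter((i) => value(i) > threshold));
}
```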

Enabling analysts to extract and manipulate subsets as needed during the analysis is a powerful approach in many scenarios. Ensuring that the user is always aware of the origin of the created subset is essential. In general, this provenance information can be provided either in a temporal or in a spatial manner. Domino offers both options.

In order to be able to present the provenance information, the user's actions must be tracked. To understand what steps led to the current subset configuration, users can then navigate through the history by using animation, or replay the actions. Alternatively, provenance information can be encoded directly in the visualization, which we call spatial provenance representation. One example of such a spatial provenance presentation is the technique by van den Elzen and van Wijk [vdEvW13], where they present previous analysis steps in a linear fashion, reminiscent of a filmstrip. Other examples are VisTrails [BCS+05], ExPlates [JE13], and GraphTrail [DHRL+12].

Instead of arranging the steps along a timeline, Domino contains the provenance information implicitly in the design of the technique. The relationship representation between subsets (blocks) and the colors of blocks allow analysts to track from which original subsets

new subsets have been derived. In Figure 6.10, for instance, it can be seen that we extracted all items in the G-CIMP group to a separate block. Further, if a newly extracted subset turns out to be irrelevant, users can go back to the original subset and follow a different path in the analysis. In the use cases presented in Section 6.5, we demonstrate how analysts make use of the temporal replay capabilities and the implicit encoding for tracking the evolution of the current subset configuration.

6.3. Spectrum of Supported Visualizations

The modular block concept of Domino enables analysts to create sophisticated visualization setups. In addition to using the inherent visualization techniques realized in multiform blocks (see Section 6.2.1), analysts can easily assemble standard techniques such as scatterplots, scatterplot matrices (SPLOMs) [CLN86], parallel coordinate plots [ID90], parallel sets [KBH06], mosaic plots [HK81], Sankey diagrams [RHF05], GPLOMs [IML13], Flexible Linked Axis [CvW11] (although our current implementation does not support free rotation), ConnectedCharts (with the exception of stacking and nesting) [VM12], Table Lens [RC94], and scattering points in parallel coordinates [YGX+09]—to name but a few. Our technique also conceptually subsumes our previously published Matchmaker [LSP+10], VisBricks [LSS+11], StratomeX [LSS+12], and Furby [SGG+14] techniques. Figure 6.6 shows a collection of techniques created with Domino. Each figure contains a schematic illustration together with an example generated with the music charts dataset.

However, in addition to the flexible composition of well-known techniques, another strength of Domino is that these techniques can be combined arbitrarily to form a wide spectrum of new hybrid visualizations which the analyst can tailor specifically to the task at hand, as demonstrated in the music charts visualization shown in Figure 6.1 and the example setup from our cancer genomics use case presented in Figure 6.10.

6.4. Interface and Interaction Design

The Domino technique opens up a wide range of possibilities for exploring, relating, and manipulating subsets. To ensure that users can apply this generic approach efficiently, a well-designed user interface combined with a specifically tailored interaction concept is essential. To assist users in the process of adding new blocks to the setup and combining existing blocks, we introduce placeholders that indicate the various positions for placing a subset and live previews for showing possible visualization outcomes (see Section 6.4.1).

Figure 6.6.: Examples of supported visualization techniques demonstrating the flexibility of the Domino approach: (a) parallel coordinate plot [ID90], (b) scatterplot, (c) parallel sets [KBH06], (d) mosaic plot [HK81], (e) Matchmaker [LSP+10], (f) scatterplot matrix [CLN86], (g) StratomeX [LSS+12], (h) GPLOM [IML13], (i) Sankey diagram [RHF05], (j) Flexible Linked Axis [CvW11], (k) ConnectedCharts [VM12].

Various interaction modes let users select and relate blocks, groups, or individual items (see Section 6.4.2). Interactions for block scaling (see Section 6.4.3) ensure that users can effectively handle scenarios of different complexity and scale—from small datasets like the music charts dataset with only 12 items to large scenarios including a multitude of datasets such as that discussed in the cancer use case described in Section 6.5.2.

As shown in Figure 6.7, the main user interface consists of two parts: the Domino Board and the Block Browser. In addition to the two main interfaces, support views provide information about the data and current selections.

Figure 6.7.: Interface of the Domino prototype. The Block Browser presents lists of possible subsets organized by type. Users can drag subsets to arbitrary positions on the Domino Board, where they appear as a new block. Support views on the left show selections and meta-information.

The Domino Board is the central visualization and interaction space of Domino, in which blocks can be arbitrarily positioned, manipulated, and combined using drag and drop. To handle setups with many related subsets, the board is realized as an infinite canvas.

The Block Browser is a list-based interface for adding subsets to the board. Users can switch between two different lists: the first presents subsets that are associated with data (numerical and matrix subsets), and the second holds the partitioned subsets. Depending on the type of the subset, additional columns present meta-data, such as size and item type. Users can add blocks to the board by dragging entries from the list to an arbitrary location on the board. The Block Browser is based on the LineUp technique [GLG+13] described in Chapter 4.

Block-specific and global toolbars allow the analyst to trigger actions (e.g., sort, transpose, remove), switch between interaction modes (see Section 6.4.2), and provide access to a linear undo-history. In addition, item-type-specific supporting blocks are available, such as labels and rulers (Section 6.4.3).

6.4.1. Placeholders and Live Previews

When adding a new block to the Domino Board, the user needs to decide where to place it and how to represent it. To support this process, we make use of placeholders and live previews, as illustrated in Figure 6.8. Placeholders assist analysts in the process of combining blocks by indicating the various potential positions for placing a subset in the current Domino setup. When a user hovers over the placeholder, live previews [vdEvW13] of the possible visualization outcomes are presented to help the user choose the appropriate technique for the subset.

Figure 6.8.: Usage of placeholders and live previews to add new blocks to Domino (Steps 1 to 4). When a block is dragged, placeholders (hatched areas) indicate possible drop locations. Live previews help users to choose a visualization technique.

While a block is dragged, all possible drop locations with a strong or weak block relationship are highlighted, as shown in Step 1 of Figure 6.8. The proximity of the placeholder to the block indicates the relationship degree, as introduced in Section 6.2.2. When the user drags a block or subset onto a placeholder, the applicable multiform visualization techniques are presented (Step 2). Hovering over one of the icons representing the techniques renders a live preview of the resulting visualization (Step 3). Dropping the block onto a specific icon confirms the selection and removes all other placeholders (Step 4).

6.4.2. Interaction Modes

In our implementation of the Domino technique, analysts can switch between three different interaction modes by using the toolbar, as indicated in Figure 6.7. The changing background color of the toolbar area emphasizes the active interaction mode.

As blocks and their relationships are the primary visual elements in Domino, the block interaction mode is designed for adding, moving, combining, and removing blocks.

(a) Block Mode (b) Relationship Mode

Figure 6.9.: Relationship interaction mode. (a) To reduce visual clutter, relationships between blocks that intersect with another block are culled. (b) By actively selecting blocks, users can reveal the otherwise hidden relationships. Relationships to unselected blocks are faded out.

In the item interaction mode, users can select and explore individual items within the blocks and within block relationships. Depending on the visualization technique employed, the analyst can directly select items, such as columns or rows in a heatmap, an aggregation of items represented by a bin in a histogram or bar chart, or parts of relationship representations.

Selected items are highlighted in all matching blocks and block relationships. While item block relationships highlight the whole matching line or point, the coarser granularities highlight only a portion, depending on the represented subset. In Figure 6.1, for example, we selected the artist Whitney Houston, who is highlighted in all blocks and block relationships.

Weak block relationships are shown for all blocks with matching item types. However, by default, a block relationship is culled automatically if it intersects with any other block or an artificially introduced separator. On the one hand, this avoids clutter and can be helpful to create independent sections on the board. On the other hand, this also prevents analysts from investigating block relationships of non-neighboring blocks. Therefore, the purpose of the relationship interaction mode is to let users explore these hidden relationships between blocks. In this mode, all block relationships between selected blocks are shown, regardless of whether they intersect with any other blocks. To reduce clutter, all unselected blocks are faded, as illustrated in Figure 6.9.
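The culling rule, i.e., hiding a relationship whose connecting line cuts across another block, amounts to a segment-rectangle intersection test. A minimal sketch follows; collinear edge cases are ignored for brevity, and the actual prototype may test visibility differently.

```typescript
interface Rect { x: number; y: number; w: number; h: number; }
interface Pt { x: number; y: number; }

// Standard 2D segment intersection test via orientation predicates
// (degenerate collinear configurations are not handled in this sketch).
function segmentsIntersect(p1: Pt, p2: Pt, q1: Pt, q2: Pt): boolean {
  const orient = (a: Pt, b: Pt, c: Pt) =>
    Math.sign((b.x - a.x) * (c.y - a.y) - (b.y - a.y) * (c.x - a.x));
  return (
    orient(p1, p2, q1) !== orient(p1, p2, q2) &&
    orient(q1, q2, p1) !== orient(q1, q2, p2)
  );
}

// A relationship is culled if its connecting line crosses any edge of
// another block's bounding rectangle.
function isCulled(from: Pt, to: Pt, otherBlocks: Rect[]): boolean {
  return otherBlocks.some((r) => {
    const corners: Pt[] = [
      { x: r.x, y: r.y }, { x: r.x + r.w, y: r.y },
      { x: r.x + r.w, y: r.y + r.h }, { x: r.x, y: r.y + r.h },
    ];
    for (let k = 0; k < 4; k++) {
      if (segmentsIntersect(from, to, corners[k], corners[(k + 1) % 4])) {
        return true;
      }
    }
    return false;
  });
}
```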

6.4.3. Scaling and Ruler

Accommodating a range of dataset sizes on various screen resolutions is challenging. Domino addresses scalability by its flexible zoom capabilities and initial scaling heuristics, which make blocks with various sizes fit on the board. In our implementation, blocks

can be scaled freely in the horizontal or the vertical direction. When the user hovers over a block while performing a scaling operation, only this block and strongly related blocks are affected by the operation; otherwise, all blocks on the board are scaled.

While the scaling approach ensures flexibility to deal with datasets of various sizes, it hampers the user’s ability to compare the sizes of multiple independent blocks, since they may have different scaling factors. To address this problem, we provide rulers that illustrate how much space on the board corresponds to how many items. They also serve as global scaling factors for a specific item type. In Figure 6.1, rulers for artists and countries are shown in the bottom left corner. When a scaling operation is applied to a ruler, all 1D and 2D heatmap and bar chart blocks are scaled simultaneously.
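Conceptually, a ruler is simply a global pixels-per-item scale for one item type. The following is a minimal sketch under that assumption; the names are illustrative.

```typescript
// A ruler defines a global scale for one item type: how many pixels
// a single item occupies on the board.
interface Ruler {
  itemType: string;        // e.g., "artist" or "country"
  pixelsPerItem: number;
}

// When a ruler is rescaled, all heatmap and bar chart blocks of that item
// type are resized so that their extent stays proportional to item count.
function blockExtent(itemCount: number, ruler: Ruler): number {
  return itemCount * ruler.pixelsPerItem;
}

// Example: dragging the artist ruler from 4 to 8 px/item doubles the extent
// of every artist block: blockExtent(12, { itemType: "artist", pixelsPerItem: 8 }) === 96.
```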

6.4.4. Subset Manipulation

Subsets can be created and manipulated in several ways. Set operations between blocks can be triggered by dragging a block onto another one. The desired operation (union, intersection, or set difference) is determined by dropping the block on an icon, similarly to choosing the visualization type as described in Section 6.4.1. Since relationships at the group and block granularity levels already encode set operations between two blocks, they can also be extracted into a new block by dragging the relationship representation to an open area on the board [KRL97]. In a similar fashion, individual groups of a partitioned block can be extracted into a new block. Finally, we use a small widget for extracting selected items as new blocks, as shown in the bottom left corner of Figure 6.1. The selected items of the corresponding item type are represented by a bar whose size corresponds to the number of selected items. By dragging the bar to an arbitrary position on the board, the selected items are added as a new block.

As introduced in Section 6.2.3, Domino distinguishes between temporal and spatial provenance. While spatial provenance is encoded directly by the Domino visualization, temporal provenance is implemented by storing all operations in a linear history. Users can undo and redo operations, revert all operations at once, and replay them as an animation.

6.5. Use Cases

We demonstrate the utility of Domino in two use cases. In the first scenario, we utilize our technique to explore the music charts dataset introduced at the beginning of this chapter (see Figure 6.1). In the second use case, we use Domino to characterize tumor subtypes.

6.5.1. Music Charts

At the beginning of the analysis, we want to know whether there are any exceptional correlations between the count of number-one hits in a country and the artists' country of origin. We therefore begin by adding the number-one hits matrix block and partition the artists by their country of origin while partitioning the countries by continent. To inspect individual items, we switch to the item interaction mode. Looking at the partitioned matrix block, we observe that the band U2 has the most number-one hits in their home country Ireland, while ABBA, for instance, is far more popular in Germany than in their home country Sweden. To understand the relevance of a number-one hit in a specific market, we add the sold albums numerical block as a bar chart below the matrix block and observe that the US leads by a wide margin. However, we assume that the absolute number of albums sold depends on the population size of a country. To confirm this, we add the population of the countries as an additional 1D scatterplot.

Another interesting question we seek to answer is how the count of number-one hits depends on the count of published studio albums. By adding the corresponding numerical blocks, we find that Elton John has released an impressive 33 albums. When looking at this number in the context of the year of his first album and the start of his career, it becomes obvious that he has been successful in the industry for a long time. By looking at the other artists, we see that only Elvis Presley published an album earlier than Elton John. However, as Elvis Presley (presumably) passed away, we continue to look for additional artists that are no longer active. According to the career status block, five out of twelve are inactive. To examine the inactive artists more closely, we correlate the status with the artists' gender. We further investigate the distribution of the gender in the inactive group by performing a logical intersection operation, which shows us that only a single female artist is no longer active: Whitney Houston. Figure 6.1 presents a screenshot of our analysis, in which Whitney Houston as well as the US are selected and highlighted across all blocks.

6.5.2. Cancer Subtype Analysis

A second dataset we studied using Domino is the glioblastoma multiforme (GBM) dataset generated by the TCGA project and described in several publications by that consortium [BTe13, Net08, Vea10]. Glioblastomas are aggressive brain tumors, and patients with this type of cancer have a median expected survival time of around 12 months after diagnosis [Net08]. There are, however, different subtypes of GBM, which are driven by different molecular changes and associated with different patient survival times. An introduction to cancer subtype characterization is given in Chapter 2.


Figure 6.10.: Key findings on subtypes in glioblastoma multiforme. Orange highlighting indicates patients with the G-CIMP subtype. (1) The mRNA gene expression matrix. In this heatmap, red and blue indicate expression levels higher and lower than the cohort average, respectively. Gray represents the average. (2) Partitioning block representing the tumor subtypes defined by Brennan et al. [BTe13]. (3) The mutation status of the IDH1 gene. Purple represents mutations; orange represents patients without mutation, and gray represents missing data. (4) Patient survival illustrated by Kaplan-Meier plots. (5) Patient age as boxplots. (6) Copy number for each patient and gene sorted by genomic location. Blue and red hues represent copy number losses and amplifications, respectively. Gray represents the normal state of two copies per gene. (7) Block extracted from the copy number heatmap to highlight uncommon copy number patterns observed in the G-CIMP subtype. (8) Copy number and gene expression levels for CDKN2A and EGFR. (9) Scatterplot showing the relationship between patient age and overall survival time.

Our goal for this case study is to recapitulate some of the findings of the most recent TCGA study on GBM [BTe13], which are presented in Figure 6.10. We start by adding the gene expression matrix (mRNA) to the board as a heatmap (1). This matrix contains data for over 12,000 genes and 528 patients, and each cell of the matrix corresponds to the activity of a given gene in the tumor sample from a given patient. Without clustering or sorting, the heatmap does not reveal any particular patterns. Using the gene expression subtypes published by Brennan et al. [BTe13], we partition the heatmap into five groups: Classical,

G-CIMP, Mesenchymal, Neural, and Proneural (2). We further combine the mutation status of the IDH1 gene with the expression subtype block (3). Mutations in IDH1 are known to play a role in GBM and to be associated with the G-CIMP subtype. As Figure 6.10 illustrates, this relationship becomes immediately evident in Domino.

In order to compare patient survival times in the five gene expression subtypes, we drag the overall survival variable (OS (days)) from the block browser onto the Domino board and create a medium relationship between this and the partitioned gene expression matrix so that the survival times are partitioned by expression subtype (4). We select the Kaplan-Meier survival plot as the block visualization, which shows that the overall survival times for the G-CIMP group seem to be better than for the other groups. This is a known characteristic of both the G-CIMP subtype and the IDH1 mutation [BTe13]. Another finding related to this subtype is that the patients in this group tend to be younger. We confirm this by visualizing the age of the patients in each expression subtype as a box plot next to the Kaplan-Meier plots, and observe a notable difference between the G-CIMP group and the four other groups (5).

Next, we study the copy number changes of genes in the patient genomes. We drag the copy number matrix for 560 patients and over 24,000 genes onto the board, creating a strong relationship between this block and the existing gene expression matrix block (6). Sorting the genes by their location within the genome reveals a number of common large-scale copy number losses and amplifications in GBM tumor genomes. As the copy number matrix is partitioned by gene expression subtypes, it also reveals that the copy number pattern for the G-CIMP group is remarkably different from the other groups. We study these patterns more closely by extracting and enlarging the corresponding block of all genes and G-CIMP patients, noting what appear to be two different types of copy number patterns in this patient group (7). Two genes that are known to be affected by copy number changes in GBM are the tumor suppressor gene CDKN2A and the oncogene EGFR. We extract these genes from both the copy number and the gene expression matrix and create a combined block for each gene (8). We find that the copy number of EGFR is in fact highly amplified in close to 50% of the patients, and that this copy number change has an impact on the corresponding gene expression levels. For CDKN2A we observe that frequently either one copy is deleted (heterozygous deletion) or both copies are deleted (homozygous deletion), with the expected effect on the gene expression levels.

We are particularly interested in the patients with the G-CIMP subtype of GBM. In order to see their copy number status and gene expression levels for CDKN2A and EGFR, we select this patient group in the partitioned gene expression matrix (mRNA). Due to the highlighting of the patients in all blocks and bands, it is evident that overall the G-CIMP patients tend to have no or low-level copy number changes in the two genes. This indicates

that the tumors of these patients are most likely driven by a mechanism that does not involve molecular alterations of EGFR or CDKN2A.

Finally, due to our earlier observation that patients in the G-CIMP group tend to be younger and have longer survival times, we are interested in whether there is an overall correlation between these two variables. We drag both the overall survival and patient age variables from the block browser onto the board. By visualizing them as axis plots with an orthogonal alignment, we create a scatterplot of the two variables (9). The scatterplot indicates that there might be a weak correlation between younger age and longer survival times. The selected G-CIMP patients are highlighted in orange, emphasizing the distinct age distribution in that group.

6.6. Discussion

Visualization Grammar and Templates Domino provides a comprehensive toolset to assemble both established visualization techniques and novel combinations. The concept could be generalized to a visualization grammar [Wil05] for the exploration of tabular datasets. The grammar could be utilized to define templates that serve as a blueprint for the creation of sophisticated visualizations. This would make it possible to customize a specific implementation of the Domino concept according to the template. In this approach, the feature set of Domino could be restricted to certain operations in order to allow users to create only specific kinds of composite visualizations, such as parallel coordinates or parallel sets, or more complex ones, depending on the task. Furthermore, a visual editor could be created that enables users to interactively define templates for specialized visualizations.

Use of colors Assigning unique colors to partitioned blocks is a challenging task. By default, we automatically assign colors by using predefined color schemes provided by ColorBrewer [HB03]. However, users can manually override the chosen colors if desired. The automatic color selection guarantees that groups belonging to the same partitioned block have different colors. However, depending on the number of partitioned blocks and contained groups, it might not be possible to choose different colors for all groups in all partitions. Assignment of the same color to semantically unrelated groups can cause confusion. A compromise would be to at least ensure that colors within a combined block are unique. However, this could result in situations where the color assignment to blocks needs to be adapted after the user interactively changes the configuration of combined blocks. As this change of colors would destroy the user’s mental map, we opted to allow duplicate colors instead.

Crossing lines and bands As defined at the end of Section 6.2.1 on blocks, the item order within a partitioned block is undefined. When visualizing relationships between blocks, this can lead to unnecessary line crossings between two weakly related blocks that use the parallel item block relationship granularity. In the current implementation, one has to strongly relate a copy of the opposite block and apply it as a secondary sorting criterion. This reduces line crossings, since the item order within a group will match the item order of the opposite block. To remedy this issue, more sophisticated reordering operations could be applied [PGU12, PWR04]. Relationships at the group granularity level can suffer from a similar ordering problem, resulting in crossings of bands. Again, reordering strategies need to be applied to minimize the crossings.

Partitioned matrix blocks The presented technique introduces three different block types: partitioned, numerical, and matrix blocks. Matrix blocks can be treated as the two-dimensional version of a numerical block. Therefore, a logical consequence would be to also introduce an analogous two-dimensional block type for a partitioned block. However, a partitioned matrix block is currently not supported in Domino, as the mapping of an item pair to a group ($f((i, j) \in I_1 \times I_2) \mapsto g \in P$) would result in a biclustering [CC00] in which one item of $I_1$ can be mapped to multiple groups. This prohibits splitting up the block horizontally or vertically, which is needed for propagating the partitioning to strongly related neighboring blocks. However, in Domino such scenarios can be handled by representing every group as a single block in the layout, where overlapping items between blocks are visualized using lines or bands [SGG+14].

Scalability and Performance We ensure visual scalability through the flexible zooming concept described in Section 6.4.3. Regarding data scalability, the size of matrix blocks is the critical issue. In the cancer subtype analysis case study presented here, matrix blocks have up to 560 × 24,000 items, demonstrating Domino's applicability to larger datasets. Item relationships for large subsets are, however, a performance issue, since individual lines for each data item need to be drawn. This issue can be addressed by using a higher relationship granularity.

An alternative approach is using a focus+context concept in which detailed information about the relationship is provided on demand. An example is LayerCake [CBS+15], which is a visualization tool for large-scale comparison of sequencing data. Encoding relationships between two or more dimensions in an interleaved visualization is another possible approach to tackling scalability. Alexander and Gleicher [AG16] propose a visualization technique for the large-scale comparison of topic models in document collections by encoding the relationships to topic models in the horizontal position and color of a document visualization.

6.7. Conclusion

We have presented Domino, a novel technique that visualizes subsets together with associated data, and relationships between them. Domino allows users to explore, extract, and manipulate subsets of multiple item types at various levels of granularity. Analysts can not only rapidly assemble common visualization techniques, but also create new combinations tailored to specific tasks and datasets. The prototype implementation uses live previews and placeholders to support and guide users in managing the wide range of possibilities that the technique offers.

7 | CLUE: From Visual Exploration to Storytelling and Back Again

Contents

7.1. Introduction
7.2. CLUE Model
7.3. Realizing the CLUE Model
7.4. Usage Scenarios
7.5. Discussion

The primary goal of visual data exploration tools is to enable the discovery of new insights. To justify and reproduce insights, the discovery process needs to be documented and communicated. Common approaches are capturing visualizations as images or videos. However, both have several drawbacks. Most importantly, neither approach provides the opportunity to return to any point in the exploration in order to review the state of the visualization in detail or to conduct additional analyses. This chapter presents CLUE (Capture, Label, Understand, Explain), a model that tightly integrates data exploration and presentation of discoveries. CLUE resembles the right part of the SPARE model, as introduced in Section 1.2. Based on provenance data captured during the exploration process, users can extract the key steps, add annotations, and author “Vistories”, visual stories based on the history of the exploration. These Vistories can be shared for others to view, but also to retrace the original analysis and extend it. After introducing the CLUE model, the realization of this model is discussed and its applicability shown in two usage scenarios, available at http://gapminder.caleydoapp.org and http://stratomex.caleydoapp.org. This chapter concludes with a discussion of the current limitations.

7.1. Introduction

Scientific progress is driven by discoveries based on observations. Accurate and efficient documentation and presentation of how discoveries were made is essential, since the scientific method requires that findings are reproducible. Making a discovery in a visualization tool and communicating it to an audience is typically a one-way process that does not allow users to switch back from presentation to exploration, as illustrated in Figure 7.1.

Figure 7.1.: Traditional workflow and information flow for visual data exploration and presentation of discoveries. Dashed edges indicate information flow, solid edges show transitions between stages. The information flow is sequential and different tools are used in each stage.

In the exploration stage of a data-driven research project, analysts apply interactive visualization and analysis tools to gain new insights. Then they document the discovery process for reproducibility and presentation. Presentation can be in the form of text or figures in a paper or slide-deck, or in the form of interactive visual data stories. Visual data stories are increasingly popular as they are engaging and can communicate complex subject matters efficiently [LRIC15]. Only in rare cases, however, can static figures or visual data stories be created straight out of the exploratory tool. Instead, an authoring stage, in which artifacts, such as screenshots, are sorted, edited, and annotated is necessary. In the case of interactive data stories, this process often consists of custom development of software. The final product is subsequently used to present a finding to a consumer. A consumer in this context can be, for instance, a reader of a news article, a reviewer or editor of a scientific publication, or a colleague of the analyst, trying to understand a finding and the process of its discovery. This workflow corresponds to the storytelling process introduced by Lee et al. [LRIC15]. The authoring stage described here includes their “make a story” and “build presentation” stages, since the story being told is a scientific discovery not requiring a distinction between scripter (author) and editor.

In a visual exploration process, findings are often captured by taking one or multiple screenshots of the visualizations, or by creating a screen recording that shows the steps that led to the discovery. Static images, however, cannot tell the story of the visual discovery, as they cannot convey information about the exploration process. Videos are difficult to create, edit, and update, and also do not capture the full analysis process. Furthermore, the tools used for exploration are in many cases not suitable for authoring and presentation. Neither images nor videos allow an exploration to be continued, and both prohibit users from asking additional questions. Given the sequential information flow and the separation of tools, it is inefficient for the analyst and creator of the story – and even impossible for the consumer – to work back from an artifact used for presentation to the exploration stage. The lack of a back-link from the curated story to the exploration stage and the underlying data makes it impossible (1) to reproduce and verify the findings explained in a figure or video and (2) to extend an exploration to make new discoveries. In this work, we propose a comprehensive set of solutions to these problems.

Figure 7.2.: Information flow and stage transitions using the CLUE model. The provenance graph of an exploratory session and Vistories (interactive visual stories) are in the center. Solid edges indicate possible stage transitions, dashed lines indicate information flow. In the exploration stage, provenance data is generated; in the authoring stage a Vistory is created by curating provenance data, which is then used in the presentation stage. Note that consumers of a Vistory can also switch to any other stage.

This chapter presents CLUE, a model for reproducing, annotating, and presenting visualization-driven data exploration based on automatically captured provenance data. In addition, it introduces Vistories – interactive visual data stories that are based on the

history of an exploratory analysis that can be used as an entry point to reproducing the original analysis and to launching new explorations. Figure 7.2 shows our proposed CLUE model. All information flow is routed through a central component, and all stages use the same universal tool. This allows the seamless stage transitions indicated by the solid edges.

In addition, this chapter presents a prototype implementation and a discussion of how to integrate CLUE with other visual exploration tools. To demonstrate the overall CLUE model, we describe a Gapminder-inspired usage scenario based on public health data. In a second usage scenario, we apply CLUE to a more complex multi-step visual exploration of cancer genomics data.

7.2. CLUE Model

The CLUE model bridges the gap between exploration and presentation. Its backbone is a rich provenance graph that contains all actions performed during the exploration. This includes exploration paths that led to findings, but also the dead ends encountered by the analyst. By putting the provenance graph at the center of our model, we are able to break the strict sequential order of the exploration, authoring, and presentation stages (see Figure 7.1) that dominates traditional workflows. CLUE allows users to seamlessly switch between Exploration Mode, Authoring Mode, and Presentation Mode. This integrated and flexible process is illustrated in Figure 7.2, where solid lines indicate the possible transitions between stages, and dashed lines represent information flow.

Provenance data makes scientific findings more reproducible and can also provide the basis for authoring stories. In CLUE, users create stories by defining a path through the provenance graph. We call such a provenance-based story a Vistory. States in the story can then be enriched with highlights and textual annotations, and, if desired, timed for automatic playback. Hence, the resulting story is not an artificial composition of visualizations, but a curated version of the actual exploration. Most importantly, this deep integration of the provenance graph introduces the back-link from presentation to exploration. Vistories can be shared and encourage collaborative visual data analysis. Consumers can step through a story, but also switch to the exploration mode and interactively build upon the previous analysis to gain new insights. Vistories also make the exploration process more efficient, as a user can revisit states and apply changes without redoing all steps to reach a particular state. Therefore, Vistories are more than visual data stories as defined by Lee et al. [LRIC15], since they allow consumers to continue the visual exploration in-place and build new stories themselves.


Figure 7.3.: Three examples of transitions within the CLUE model, highlighting different entry points. Numbers indicate the order in which the stages are visited by the user.

Figure 7.3 illustrates three scenarios that show how users can switch between modes. In the first example (Figure 7.3a), the process starts by investigating the data in exploration mode. After several iterations, the analyst discovers a finding worth presenting and switches to authoring mode to create a Vistory. Finally, the analyst previews the Vistory in presentation mode and makes it available to others. In the second example (Figure 7.3b), an editor creates a Vistory in the authoring mode, starting with an existing analysis session. After previewing the story in presentation mode and continued editing in authoring mode, the editor notices that the content of the story could be improved, and switches back to exploration mode in order to refine the visualization. Subsequently, the user returns to authoring mode and finishes the Vistory. In the last example (Figure 7.3c), a consumer starts by watching an existing Vistory. In that process, the consumer becomes curious about the consequences of adding another dataset to the analysis. The consumer switches to exploration mode and picks the relevant state, from where she can start her own analysis. Based on this new analysis, she creates a new Vistory that can be shared with collaborators.

These simple workflow examples illustrate that users can enter the process in any mode and switch freely between modes. Note that, from a conceptual point of view, switching from exploration directly to presentation is possible without going through an authoring step. While this can be useful when the only goal is to reproduce an analysis, authoring is required as a transition between exploration and presentation in practice to create an informative and concise story.

7.3. Realizing the CLUE Model

This section demonstrates the practical use of the CLUE model. We discuss how visual exploration tools can be extended by adding provenance capturing, authoring, and presentation capabilities. We also describe a prototype library for capturing and managing provenance and story data as well as a prototype that demonstrates the flexibility and efficacy of our model.

The library is used by a Visual Exploration Application. It is important to note that the Visual Exploration Application is not restricted to a specific visual exploration tool, but only has to comply with a set of basic requirements (see Section 8.4.2) and make appropriate calls to the library. The application is shown in all CLUE modes, although it is set to read-only during authoring and presentation.

The CLUE library consists of several building blocks. At its core is the provenance graph data model that forms the back end of CLUE and is used to store the captured exploration process. The other important components of the library are the provenance view and the story view.

The Provenance View provides a scalable visualization of the provenance graph. In exploration mode, a simplified version of the graph gives the analyst an overview of the provenance, by showing, for example, the states leading to the current one. When the system is in authoring mode, the provenance view is used for navigating and selecting the states of the exploration process that should be part of the story.

The Story View visualizes the elements of the story. Depending on the selected mode, the view shows different features. In presentation mode, it is essentially a stepper interface for the curated story. When the system is in authoring mode, it enables users to create, manage, and edit stories.

The visual components can be active in more than one mode and show different levels of detail depending on the mode. A switch between modes results in the addition, adaption, or removal of certain parts of the user interface. Animated transitions are applied to support users in maintaining their mental model during mode changes. In addition, the current exploration state along with the mode used are encoded in the URL of the visual web application. This allows users to conveniently share states and Vistories by exchanging links.
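Encoding the state and mode in the URL can be realized, for example, via the URL hash. The following sketch is a hypothetical illustration; the actual parameter names and URL scheme of the prototype may differ.

```typescript
// The current exploration state and mode are serialized into the URL so
// that a link alone is enough to share a state or a Vistory.
type Mode = "exploration" | "authoring" | "presentation";

function encodeStateInUrl(stateId: number, mode: Mode): void {
  // Parameter names are illustrative, not the prototype's actual scheme.
  const params = new URLSearchParams({ clue_state: String(stateId), clue_mode: mode });
  window.location.hash = params.toString();
}

function decodeStateFromUrl(): { stateId: number; mode: Mode } | null {
  const params = new URLSearchParams(window.location.hash.replace(/^#/, ""));
  const s = params.get("clue_state");
  const m = params.get("clue_mode") as Mode | null;
  return s !== null && m !== null ? { stateId: Number(s), mode: m } : null;
}
```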

7.3.1. Provenance Graph Data Model

The provenance graph data structure used in CLUE consists of four node types: state, action, object, and slide. Figure 7.4 illustrates the relationships between the different node types. An action transforms one state into another by creating, updating, or removing one or multiple objects. A state consists of all objects that are active at this point of the exploration. A slide points to a state along with annotations and descriptions explaining the state. A Vistory is made up of a sequence of slides. Switching between slides triggers actions that transition to the state associated with the target slide.

Figure 7.4.: The provenance graph data model consists of four different node types that are connected with each other by one or more edges.

Object and action nodes are generic, and refer to the application-dependent implementation. In order to improve the visualization, additional meta-data about objects and actions is stored. For objects, we store a type; we distinguish five types: data, visual, layout, logic, and selection. For actions, we store an operation: create, update, or remove.

Data actions deal with the addition, removal, or subsetting of datasets within the application. An example of a data action in our Gapminder usage scenario is the assignment of an attribute to an axis of the plot. Visual actions (e.g., switching an axis to logarithmic scale) manipulate the way datasets are shown to the user. Layout actions manipulate the layout of a visualization (e.g., manipulating the axes order in parallel coordinates or hiding the categorical color legend in a Gapminder plot). Logic actions, such as triggering a clustering algorithm, are concerned with the analytical aspect of the application. Finally, selection actions encompass user-triggered selections of data in the visualization.
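Putting the node types and their meta-data together, the provenance graph data model could be declared roughly as follows. This TypeScript sketch is illustrative; the field names are assumptions rather than the actual schema of the CLUE library.

```typescript
type ObjectType = "data" | "visual" | "layout" | "logic" | "selection";
type Operation = "create" | "update" | "remove";

// Objects are the application-specific artifacts that actions operate on.
interface ObjectNode {
  id: number;
  type: ObjectType;            // meta-data used by the provenance visualization
}

// An action transforms one state into another by creating, updating, or
// removing one or multiple objects.
interface ActionNode {
  id: number;
  operation: Operation;
  affects: number[];           // ids of the objects this action touches
  execute(): void;             // application-dependent implementation
  revert(): void;              // inverse action, needed for jumping between states
}

// A state consists of all objects active at this point of the exploration.
interface StateNode {
  id: number;
  parent: StateNode | null;    // the provenance graph forms a tree of states
  via: ActionNode | null;      // the action that produced this state
}

// A slide points to a state (or is a pure text slide) plus annotations.
interface SlideNode {
  state: StateNode | null;
  annotations: string[];
  durationMs: number;
}
```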

A state is characterized by the sequence of actions leading to it. Therefore, restoring a state is achieved by executing its corresponding actions. Jumping from one state to another is implemented by reverting the actions from one state to a common ancestor and executing the actions necessary to reach the target state.
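The jump operation described above can be sketched as follows, reusing trimmed versions of the types from the previous sketch. This is an illustration of the revert-to-ancestor-then-execute strategy, not the actual prototype code.

```typescript
// Types as in the previous sketch, trimmed to what is needed here.
interface ActionNode { execute(): void; revert(): void; }
interface StateNode { id: number; parent: StateNode | null; via: ActionNode | null; }

// Collect the path from a state up to the root (state first, root last).
function pathToRoot(s: StateNode): StateNode[] {
  const path: StateNode[] = [];
  for (let n: StateNode | null = s; n !== null; n = n.parent) path.push(n);
  return path;
}

// Jump between two states: revert up to the lowest shared ancestor, then
// execute the actions leading down to the target state.
function jumpTo(current: StateNode, target: StateNode): void {
  const targetPath = pathToRoot(target);                 // target ... root
  const onTargetPath = new Set(targetPath.map((s) => s.id));
  let n: StateNode = current;
  while (!onTargetPath.has(n.id)) {                      // 1. revert upwards
    n.via!.revert();
    n = n.parent!;
  }
  const ancestorIdx = targetPath.findIndex((s) => s.id === n.id);
  for (let k = ancestorIdx - 1; k >= 0; k--) {           // 2. execute downwards
    targetPath[k].via!.execute();
  }
}
```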

Chapter 7. CLUE 96 In addition, for the purpose of transitions, the action sequence is compressed before being executed by removing redundant actions. A sequence of five selections, for example, will be replaced by the last one, since all the others are just intermediate states that do not influence the final selection. Similarly, when a create-and-remove action is associated with the same object, we remove both actions, as they neutralize each other. This compression avoids the execution of superfluous actions.

7.3.2. Provenance Visualization

As provenance graphs grow quickly during the exploration process, it is challenging to develop an effective visualization for them. Our provenance visualization is based on a node-link tree layout that we combine with a Degree-of-Interest (DoI) function to adapt the detail level of nodes [Fur86]. An example of the provenance visualization for one of our usage scenarios can be seen in Figure 7.7(c), and a close-up in Figure 7.5.

Figure 7.5.: Close-ups of the provenance and story views. Based on the DoI, exploration states are represented at different levels of detail in the provenance view. The structure of the Vistory, shown on the right, corresponds to a path through the provenance graph, which is shown as a thick black line in the provenance visualization. Both the active slide in the story view and the associated exploration state in the provenance graph are highlighted in orange.

Layout We use a vertical node-link layout as the basis for the provenance graph visualization. Nodes represent states, while edges represent actions transforming one state into another. However, instead of using a plain balanced tree, we reorder and skew it such that the currently selected node and all its ancestor nodes in the path to the root are aligned vertically on the right side. The remaining nodes and branches are then lined up on the left side. This strategy leaves space for details of the selected nodes and their ancestors, including labels describing the state and thumbnails previewing the state of the visualization. However, the layout needs to be updated when the user selects a different state. We use animated transitions to convey such changes.

Detail Level We assign each node a DoI value that is influenced by several factors, including the current state selection, whether the state is an ancestor of the selected state (the distance term of the DoI function), and several filtering options that can be defined by the user (the a priori interest term of the DoI function). The DoI computed for each node is then used to adapt its representation with a combination of semantic and geometric zooming. We distinguish between four levels of increasing node detail.

Level 1: a bullet representing the state

Level 2: an icon describing the action associated with this state

Level 3: a label describing the action associated with this state

Level 4: a thumbnail of the application at the given state

Nodes of all detail levels are shown in Figure 7.5.
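As a minimal sketch, the mapping from DoI values to detail levels could look as follows; the thresholds and the weighting of the DoI terms are illustrative assumptions rather than the values used in the prototype.

```typescript
// DoI in the spirit of Furnas [Fur86]: a priori interest minus the distance
// of a state to the currently selected state in the provenance graph.
function degreeOfInterest(aprioriInterest: number,
                          distanceToSelected: number): number {
  return aprioriInterest - distanceToSelected;
}

type DetailLevel = 1 | 2 | 3 | 4; // bullet, icon, label, thumbnail

// Map the computed DoI to one of the four detail levels; higher DoI values
// (the selected state and its close ancestors) get richer representations.
function detailLevel(doi: number): DetailLevel {
  if (doi >= 0) { return 4; }
  if (doi >= -2) { return 3; }
  if (doi >= -4) { return 2; }
  return 1; // distant states collapse to a bullet
}
```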

Interaction Users can interact with the provenance graph visualization in several ways. Selecting a node shows the corresponding state in the application. At the same time, the selection of a node triggers a re-layout of the provenance graph visualization, since the DoI values of the nodes change according to the selection.

The user can bookmark a state for later use, add tags, or add notes. This additional metadata can then be used to filter the states in authoring mode, which enables a more efficient story editing process.

To ensure reproducibility, it is important to be able to prevent a user from modifying the provenance graph. However, when authoring, the editor might want to improve a previous state, for example, by changing an axis scale from linear to logarithmic. Rather than allowing the user to change existing states, we create a new branch. However, when starting a new branch, a user would need to redo all actions that previously followed the updated state. We address this problem by allowing the user to apply a subbranch of a provenance graph to any other state. By dragging one node onto another, the actions are replayed automatically if possible. An example where this is not possible is when the user seeks to manipulate an object that has already been removed in one of the earlier states.

The consequence of preventing users from modifying the provenance information is that the graph grows rapidly. Although our provenance visualization supports multiple detail levels, the design and implementation of a truly scalable provenance visualization was not the main focus of CLUE and is therefore open for future research.

7.3.3. Story Editor

A story is composed of a sequence of slides. We distinguish between two slide types: text slides, which contain introductory text and captions, and state slides, which are associated with a specific state in the provenance graph. Both slide types can be annotated using multiple methods, such as styled text, scalable arrows, and freely adjustable boxes.

Layout We use a vertical layout to present the slides, using the y-axis as a pseudo-timeline. The taller a slide, the longer it is shown when the story is played automatically. Similarly, the more space between two slides, the longer the transition between them. Both transition and slide duration can be manipulated by dragging the top and bottom border lines of the slide, respectively. We chose a vertical layout because of (1) the alignment of the story with the provenance graph, and (2) the better readability of horizontal labels.
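The pseudo-timeline mapping can be sketched as follows; the pixel-to-millisecond factor is an assumed constant, not a value from the prototype.

```typescript
// Sketch of the pseudo-timeline: a slide's height encodes how long it is
// shown, the gap to the next slide encodes the transition duration.
const MS_PER_PIXEL = 50; // illustrative scale factor

interface SlideBox {
  top: number;    // y position of the slide's top border (pixels)
  bottom: number; // y position of the slide's bottom border (pixels)
}

/** Duration for which a slide is shown during automatic playback. */
function slideDuration(slide: SlideBox): number {
  return (slide.bottom - slide.top) * MS_PER_PIXEL;
}

/** Duration of the transition from `slide` to the slide below it. */
function transitionDuration(slide: SlideBox, next: SlideBox): number {
  return (next.top - slide.bottom) * MS_PER_PIXEL;
}
```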

Interaction Vistories can be created and edited in various ways. Editors can (1) start with a default text slide, which is useful if they already have an idea about the story they want to tell, (2) extract the currently selected state and all its ancestors, or (3) extract all bookmarked states.

Individual slides can be rearranged using drag-and-drop. In addition, dragging a state node into the story editor wraps the state in a state slide when dropped, which allows a user to quickly create complex stories. Note that individual state slides need not be in sequential order in the provenance graph. The system automatically resolves the path between the states and plays all necessary actions, as discussed in Section 7.3.1. We indicate the currently active story in the provenance graph by connecting its states using a thick line. The provenance graph is also fully linked with the story editor: selecting a state slide in the story editor highlights the corresponding state in the provenance graph, and vice versa.

Annotations Each slide can be enriched with annotations that are shown as an overlay on top of the visual exploration application. The library currently supports three annotation types: text, arrow, and box. All of them are available in the movable annotation toolbar, which appears at the top left when a slide is selected in authoring mode. Figure 7.7(b) contains examples of all annotation types in blue. Annotations are positioned relative to anchor points in the visual exploration application. Anchor points represent important visual elements in the application, such as data points of the scatterplot in Gapminder. By linking annotations to anchor points, their positions can be adapted to layout changes caused by different screen resolutions and aspect ratios.
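A minimal sketch of this anchor-relative positioning is shown below; the types are illustrative and not the actual CLUE annotation API.

```typescript
// An annotation stores an offset relative to a named anchor point instead of
// absolute pixel coordinates, so it follows its anchor across layout changes.
interface AnchorPoint {
  id: string; // e.g., the data point of a country in the Gapminder scatterplot
  x: number;  // current position in the application (pixels)
  y: number;
}

interface Annotation {
  anchorId: string;
  dx: number; // offset relative to the anchor (pixels)
  dy: number;
}

function annotationPosition(annotation: Annotation,
                            anchors: Map<string, AnchorPoint>) {
  const anchor = anchors.get(annotation.anchorId);
  if (anchor === undefined) {
    return null; // the anchor does not exist in the current state
  }
  // Re-resolving the anchor after every layout change keeps the annotation
  // attached to its visual element across resolutions and aspect ratios.
  return {x: anchor.x + annotation.dx, y: anchor.y + annotation.dy};
}
```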

7.4. Usage Scenarios

We demonstrate the utility of CLUE for a variety of applications in two usage scenarios. The first is inspired by Hans Rosling’s Gapminder 1. It illustrates the workflow of how users interact with and switch between different CLUE modes. The second scenario reproduces parts of our recent Nature Methods publication about Guided StratomeX [SLG+14], a tool for characterizing cancer subtypes also described in Chapter 5. It highlights CLUE’s reproducibility support and its applicability to scientific analysis and storytelling. We provide links to the interactive Vistories for both usage scenarios below the respective figures.

7.4.1. Gapminder

A historian based in Europe is interested in assessing the interplay between wealth and health over the last 215 years. In particular, he would like to visualize changes in European countries to present his findings to a colleague in America. To explore health versus wealth, he first assigns income per person to the x-axis and life expectancy in years to the y-axis. The size of a mark in the scatterplot corresponds to the population of a country for the currently selected year. Continents are color-coded: Europe is shown in purple, America in red, Africa in blue, and Asia in brown. He applies a linear-to-log transformation to the wealth data and is ready to explore. Moving the slider on the timeline from 1800 to 2015, he observes an overall trend for Western countries: as wealth increased, people lived longer. The populations of most African countries, however, continue to have low GDPs and low life expectancies, as shown in Figure 7.6.

1http://www.gapminder.org

Figure 7.6.: Screenshot of a Gapminder-inspired application illustrating a Vistory in presentation mode. African countries are highlighted using annotations. Vistory: http://vistories.org/v/gapminder.

To create a Vistory based on this finding, he switches to authoring mode, where he extracts all states captured in the provenance graph. In the story editor, he annotates the states using the text and arrow annotation tools and previews the story by clicking on the play control. Now, he would like to take a closer look at health and wealth in Europe in particular, so he switches back to exploration mode, where he uses the continent legend to select Europe from the chart. As a result, all countries in other continents are shown with reduced opacity. He goes to a pivotal year in European history: 1946, right after World War II. He evaluates the state of Europe regarding health and wealth, and extracts his findings to the story editor. In the story editor, he adjusts duration times and annotates key findings, moves to presentation mode to review, and finally shares a link to his Gapminder Vistory with his American colleague.

The American historian views the Vistory in presentation mode and decides that she would like to look at the state of Africa at the points chosen for Europe. To do so, she switches to exploration mode. Here she selects the node in the provenance graph where her European colleague initially selected Europe, and selects Africa instead. Her exploration is now captured on another branch of the provenance graph, and she is free to extract her own Vistory. To reproduce the analysis done for Europe for her subset, she applies the original subbranch to her new branch.

7.4.2. StratomeX

We adapted Guided StratomeX [SLG+14], presented in Chapter 5, and extended it with our CLUE model. In the following, we describe how CLUE can be used to partly reproduce the use cases published in [SLG+14]. The initial exploration was illustrated in supplementary figures and a video. However, the video shows only the final curated story and not the exploration as a whole.

Figure 7.7.: Screenshot of CLUE applied to the StratomeX technique (a) in authoring mode. An annotation (b) highlights relevant aspects. The provenance graph view (c) and story view (d) show the history of the analysis and a Vistory being created. Vistory: http://vistories.org/v/stratomex.

We started to reproduce the case study by following the figure captions and the video step by step. This included frequent switching between exploration and authoring mode, while simultaneously creating the corresponding Vistory. In the end, we successfully reproduced Supplementary Figures 6(a) and 6(b) of the original paper. Figure 7.7 shows an intermediate screenshot of StratomeX in authoring mode, illustrating a central aspect of the technique. A similar picture is part of the original video. The left side shows StratomeX with annotations. On the right, the provenance graph of this analysis and the story view are shown.

With this CLUE-based implementation of StratomeX, we were able to retrace an analysis from a published paper and make it reliably reproducible as a Vistory. This offers all the benefits of the original research video, but was much easier and faster to produce. Moreover, consumers of the Vistory can go back to the analysis and, for example, check for confounding factors by adding other datasets, or start their own analysis to look for new findings. The Vistory is accessible at http://vistories.org/v/stratomex.

7.5. Discussion

Separation of Concerns The CLUE model contains three different modes: exploration, authoring, and presentation. Since each mode has a different focus, only the information and visual elements relevant to the current mode are shown in the prototype implementation. The provenance graph visualization, for instance, is shown in a simplified version in exploration mode, in full detail in authoring mode, and not at all in presentation mode. A different approach would be to show all views and elements at once and let the user decide which elements are useful for a specific task. Wohlfart and Hauser [WH07], for example, used such a unified interface. However, overwhelming the user with all possibilities can be distracting. In CLUE, we decided to reduce the number of elements in the interface by introducing three separate modes for exploration, authoring, and presentation, thereby making as much screen space as possible available for the data visualization.

One Tool for the Whole Process Lee et al. [LRIC15] raised the important question of whether one tool combining exploration, authoring, and presentation features is suitable and desirable. While Lee et al. state that it might be a promising endeavor, they also have concerns about it. The key question is whether a unified tool can cover all potential analysis needs. Creating a tool that allows all kinds of visual explorations is indeed challenging. We tackle this challenge by providing a library that existing visual exploration tools can use. However, we also consider adding options to import images, videos, and websites into Vistories, so that the presentation can be complemented by the output of incompatible tools. For these parts of a Vistory, however, we will not be able to provide a back-link to the analysis.

Collaboration Collaborative visual data exploration is a relevant and promising research direction. CLUE captures all actions performed during a visual exploration on a semantic level. Hence, extending this approach such that multiple users can perform an analysis based on the same provenance graph is a logical next step. In our current implementation, we support sequential collaboration, that is, one user at a time can perform the analysis, but the user can change over time. However, our ultimate goal is to enable synchronous collaboration. This will introduce several additional challenges, such as synchronization issues or the visualization of such multi-user provenance graphs.

When introducing user management, we will also be able to restrict the operations allowed on a Vistory. For example, one could prohibit modification of a published Vistory and only allow a fork to be modified.

Full Provenance The current implementation of the CLUE model captures the steps carried out by the analyst during visual exploration. However, the datasets used during exploration, tools employed outside of our application, and the version of the application used are not tracked. This information would be needed to truly reproduce every state of the exploration. The versioning of datasets, tools, and applications, however, is a subject of active research in other fields. For example, source code management systems such as Git and Subversion work well for all kinds of text files. In the biomedical domain, platforms such as Refinery [GPS+14] capture the provenance of workflow executions together with all input and output data. Approaches based on virtual machines and Docker 2 can be used to capture application versions. In the future, we plan to integrate all of the versioning approaches mentioned into a comprehensive solution that allows full provenance tracking for data-driven visual exploration.

Meta-Provenance CLUE currently captures only the visual exploration itself. Actions performed by the user in authoring and presentation mode are not tracked. As users can jump to different branches of a provenance graph during the analysis, the sequence of actions performed by the user can only be reliably reconstructed via the timestamps of actions. An interesting future research direction is therefore to track the provenance of how the provenance graph was created. Capturing this meta-provenance graph would allow us to analyze the process of visual exploration and also the evolution and use of the CLUE model, including how stories evolve and how users collaborate.

2http://www.docker.com

Scalability Since provenance graphs grow very quickly, scalability is an inherent problem. To mitigate this issue, the provenance graph visualization represents only states as nodes and actions as edges between them, while hiding object nodes. We found this to be intuitive, since users tend to think in states. We found our DoI approach to be useful for managing medium-sized provenance graphs, but we realize that for large provenance graphs of complex analysis sessions, additional methods will need to be developed. One aspect we plan to investigate is user-specified states of interest that can influence both the visualization of provenance graphs and the selection of states for stories.

State Selection An important question when creating a story is which intermediate states to select given a target state. A simple approach is to use the path from the start of the exploration to the desired target state. This path, however, may contain superfluous states and lead to long stories. Automatically identifying key states is challenging. While there are heuristics one could apply, such as removing intermediate selections that were not pursued further, these assumptions do not hold in the general case. Manual annotations or bookmarks of states during an analysis are strong indicators of the relevance of states. These can be leveraged to suggest key states for authoring and to emphasize them in the provenance graph. We plan to investigate methods for encouraging users to externalize their assumptions and reasoning, which is also important with respect to reproducibility.

Animated Transitions Animated transitions are effective for communicating changes between different states. However, the CLUE library is independent of the visual exploration tool used. Therefore, a story can only suggest the duration of an animation between states; the actual application must decide how the timing options are interpreted. Moreover, moving from one state to another can involve a series of actions. Currently, they are executed sequentially, but some of them could be executed in parallel in order to speed up the transition. Detecting independent actions, determining how they can be executed in parallel, and whether this can improve user understanding remain open research questions.

8 | Phovea: An Integrated Visual Analysis Platform for Biomedical Data

Contents

8.1. Key Aspects . . . . . . 107
8.2. Related Platforms . . . . . . 108
8.3. Ecosystem . . . . . . 109
8.4. Additional Libraries . . . . . . 112
8.5. Discussion . . . . . . 113

In many scientific domains, data analysis has replaced data acquisition, generation, and storage as the main challenge. This challenge stems not only from the volume but also from the complexity and heterogeneity of the data. Molecular biology is a prime example of this trend, where large initiatives like The Cancer Genome Atlas project and emerging technologies such as single-cell gene sequencing produce vast amounts of heterogeneous data. With datasets from different sources, with different meanings, on distinct levels of scale, and of various types (tables, text, graphs, etc.), there is a need for new visual analysis platforms that tackle these challenges. We identified six key aspects that a visual analysis platform for biological data should support. We address these aspects with the development of Phovea. This chapter describes its ecosystem and architecture. Phovea is open source under the BSD license and hosted at https://github.com/phovea. A collection of applications based on Phovea is available at http://caleydoapp.org. In addition, some of them are briefly described in Appendix A.

8.1. Key Aspects

From our experience with domain collaborators and with designing visualization systems in the past [LSKS10, LSS+12], we identified six key aspects that a visual analysis platform for biological data needs to support:

A I: Data Scale and Heterogeneity Not only is the size of datasets increasing, there is also a growing number of publicly available datasets that researchers want to integrate. Taken together, we observe that size, complexity, and heterogeneity increase beyond current analysis and visualization capabilities. The data spectrum ranges from clinical and expression data, through epigenetic data, to full genome sequence information. Challenges include accessing, processing, and interactively visualizing the data.

A II: Identifier Management An important aspect when integrating datasets from various sources is the mapping of identifiers between different annotation systems (e.g., Entrez, DAVID). Mappings, however, can be 1:1, 1:n, n:m, or even more complex if they are based on partially overlapping gene locations. Also, entities of different types (e.g., gene, protein, sample) that can be defined on different levels of granularity (e.g., chromosome, gene, base pair) lead to additional challenges; a minimal interface sketch is given after this list.

A III: Multiple Coordinated Views (MCV) The integrated analysis of multiple interconnected datasets can lead to new insights, yet it is often sensible to show different datasets as independent views, as the visualization can then be chosen to best represent the data. The coordination of these views provides the links between the datasets. The MCV system needs to visually link the entities across the annotation systems and granularity levels involved.

A IV: Provenance and Collaboration A recent review showed that it was not possible to reproduce the findings from almost 90% of over 50 cancer genomics studies [BE12]. This highlights the need for all stages of the analysis to be reproducible, interpretable, and communicable, including the visual analysis. Integrated support for provenance tracking, sharing of results, communication, and collaboration are essential.

A V: Integrated Data Analysis The integration of algorithms, statistics, and machine learning approaches like clustering or dimensionality reduction is crucial for most applications of visual analysis platforms to biomedical data. The back and forth between analysts and algorithms should be as tight and swift as possible. For instance, when a data query cannot provide immediate feedback due to the complexity of the query or the size of the data, the system should report intermediate results, which the analyst can use to judge the correctness and suitability of the parametrization and adjust it if necessary [MPG+14]. Data mining algorithms can also be used to guide analysts to interesting patterns proactively [SLG+14].

A VI: Adaptability The last key aspect deals with adaptability to changing environments. A visualization framework needs to be flexible enough to allow for, e.g., the addition of new data types, storage backends, visualization techniques, or processing algorithms. The platform should also support the creation of customized setups that are tailored to a specific application use case.
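As referenced in A II, the following TypeScript sketch illustrates how an ID mapping service motivated by that aspect could be exposed; the interface, system names, and lookup values are hypothetical, not the actual Phovea API.

```typescript
// Because mappings can be 1:1, 1:n, or n:m, each input ID resolves to a
// (possibly empty) list of target IDs.
interface IDMapper {
  map(ids: string[], from: string, to: string): Promise<string[][]>;
}

// In-memory toy implementation with a single hard-wired lookup table.
const toyMapper: IDMapper = {
  map: (ids, _from, _to) => {
    const table: Record<string, string[]> = {
      'TP53': ['7157'], // illustrative entries only
      'EGFR': ['1956'],
    };
    return Promise.resolve(ids.map((id) => table[id] ?? []));
  },
};

// Usage: mapping gene symbols to Entrez gene IDs (hypothetical system names).
toyMapper.map(['TP53', 'EGFR'], 'gene_symbol', 'entrez').then((result) => {
  // result[i] holds all target IDs for the i-th input symbol;
  // an empty list means the identifier could not be mapped.
  console.log(result);
});
```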

8.2. Related Platforms

BioJS [GGS+13] 1 is a library for representing biological data. Its core is a small event-driven architecture that can be extended via plugins that are collected in a public registry. Interfaces are not defined by the library but are described only within a plugin's documentation. This allows easy setup and creation of plugins for a range of different data types (A VI and A I). However, developers aiming to use multiple plugins in a setup with multiple coordinated views have to handle the synchronization and data mapping between individual plugins manually, hampering A II and A III. Moreover, the library focuses on the visualization of data only, not on how it is accessed or processed (A V). Dealing with large datasets in web-based frameworks is particularly challenging, since transferring the whole dataset to the client is not an option.

Caleydo [LSKS10] 2 is a standalone visualization framework for biological data and the predecessor of Phovea. Caleydo is implemented in Java as an Eclipse RCP 3 application and uses OpenGL/JOGL for rendering. Caleydo supports A II and A III; however, it lacks support for large datasets (A I), since it is a client-only application in which all datasets are loaded into main memory. Moreover, it has only rudimentary support for provenance (A IV) via a simple undo mechanism, and the integrated data processing (A V) is limited to a fixed set of hard-coded algorithms, such as clustering algorithms.

1http://biojs.net
2http://github.com/Caleydo
3http://wiki.eclipse.org/Rich_Client_Platform

8.3. Ecosystem

The Phovea platform has a client-server architecture and can be divided into a client framework, a server framework, and two supporting components for developing new Visual Analytics (VA) applications. A VA application in this context is a web application that uses a subset of client and server plugins. While individual components can be used in isolation, they work best in combination. Client and server are loosely coupled via REST and WebSocket interfaces. Figure 8.1 illustrates the four components and their relation to each other. In combination, all key aspects introduced in Section 8.1 are addressed.

Figure 8.1.: The ecosystem of Phovea provides four components: client framework, server framework, dev-tools, and deployment. In combination they can be used to develop new Visual Analytics (VA) applications with server backends.

8.3.1. Phovea Client Framework

The Phovea client framework is an extensible JavaScript library written in TypeScript. It consists of multiple plugins that are linked together at build time. Moreover, plugins are lazy-loaded, reducing the initial start-up time. The following list contains a selection of standard plugins, which are available at http://github.com/phovea:

phovea_core: the core library of the client framework, including commonly used data structures like vector, matrix, table, graph, and stratification (addressing A I), server-side communication, plugin management, and multiple coordinated view management (A III).

phovea_ui: provides common styles and helper functions, including dialogs based on Bootstrap 4.

4http://getbootstrap.com/

phovea_d3: provides utility functions for using the popular D3 library [BOH11], including the creation of bands between visualizations for visually linking the same items in different representations.

phovea_vis: includes a set of standard visualization techniques, among them box plot, histogram, scatterplot, pie chart, and heatmap implementations. The heatmap implementation supports the visualization of large datasets by generating the heatmap image on the server side.

phovea_importer: a general upload and import wizard for tabular data.

phovea_clue: implements the CLUE model and the corresponding provenance graph visualization techniques, addressing A IV. An introduction to CLUE is given in Chapter 7.

8.3.2. Phovea Server Framework

The Phovea server framework is a Python-based extensible web server library. It uses a similar extension mechanism to the client library and provides REST and WebSocket interfaces to clients. RESTful APIs are specified using the Swagger API framework 5. The following list contains a selection of server plugins available at http://github.com/phovea:

phovea_server: the core library of the server framework, which includes unified data management and access, a CSV data provider, RESTful API assembly, identifier (ID) management, and ID mapping (addressing A II).

phovea_processing_queue: an asynchronous processing engine based on Celery 6, addressing A V. In addition, it provides a small client library that simplifies the creation and management of tasks.

phovea_data_redis: implements a data connector to a Redis database 7. Redis is a fast in-memory key-value store that is used for ID management and mapping.

phovea_data_mongo: implements a data connector to a MongoDB database 8. User-generated provenance graphs are stored in this database.

phovea_data_hdf: a data connector for reading files stored in the HDF5 file format 9.

5http://swagger.io/
6http://celeryproject.org
7http://redis.io
8http://www.mongodb.com/
9http://hdfgroup.org/

8.3.3. Phovea Dev-Tools

Besides the two core libraries, the Phovea platform consists of a set of development tools (dev-tools). These tools simplify the creation and management of Phovea plugins and of applications based on Phovea.

Yeoman 10 is a scaffolding tool for creating new web applications. Phovea comes with its own generator, available at http://github.com/phovea/generator-phovea, which includes a set of useful commands ranging from the creation of new plugins and applications to the management of workspaces. A workspace is a set of locally checked-out plugins that are linked together. In addition, a PyCharm 11 IDE project is created for each workspace.

In Phovea, client plugins and applications are developed using a set of tools and languages: plugins are written in TypeScript and SASS 12. We use webpack 13 to bundle these files together and the Karma test runner 14 to execute unit tests. In addition, we ensure code quality through tslint 15. Finally, API documentation is generated using typedoc 16. Server plugins are developed in Python and tested using pytest 17. Code quality is ensured by flake8 18. API documentation is generated using Sphinx 19.

8.3.4. Phovea Continuous Integration and Deployment

Besides the presented tools that support developers during the development of new applications and plugins, Phovea includes a set of tools dealing with the continuous integration and deployment of created applications. We employ Travis 20 to continuously execute unit tests and check code quality whenever changes to a plugin or application are pushed.

Deploying an application involves defining a product definition. A product definition specifies which client and server plugins should be included in the deployed version of an application. This can differ from the local version since, for example, additional data providers can be included. Building a product definition results in a set of Docker 21 images. These images can be deployed on any Docker container service, including Amazon AWS or Google Cloud. Applications created within the Caleydo ecosystem are hosted and accessible at http://caleydoapp.org.

10http://yeoman.io
11http://www.jetbrains.com/pycharm/
12http://sass-lang.com
13http://webpack.github.io/
14http://karma-runner.github.io/
15http://palantir.github.io/tslint/
16http://typedoc.io/
17http://pytest.org
18http://flake8.pycqa.org/en/latest/
19http://www.sphinx-doc.org
20http://travis-ci.org/

8.4. Additional Libraries

8.4.1. LineUp.js

The LineUp visualization technique introduced in Chapter 4 was originally implemented in Caleydo. It has since been ported to a standalone JavaScript library, available at http://github.com/Caleydo/lineupjs. It is actively being developed, and recent extensions include support for advanced column types, such as box plots and sparklines.

8.4.2. CLUE Library

The CLUE library is a plugin of Phovea, available at http://github.com/phovea/phovea_clue. It is a prototypical implementation of the CLUE model introduced in Chapter 7. The individual provenance graph visualizations are implemented in D3 [BOH11]. We use the headless scriptable web browser PhantomJS 22 to generate screenshots on the server by replaying actions of the provenance graph. This is a generic approach for replaying a CLUE-enhanced Phovea application without the need for any customizations.

In order to use the library, a visual exploration application must use the command design pattern [GHJV95] for all recordable actions. These actions are then captured in the provenance graph. In Gapminder, one of the usage scenarios described in Section 7.4.1, recordable actions include choosing attributes for individual axes, switching the scales of axes, and selecting years and countries. More sophisticated applications, such as StratomeX, support a larger set of actions.
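The following TypeScript sketch illustrates such a recordable action for a Gapminder-like application. The Command interface and the plot API are simplified assumptions; the actual phovea_clue interfaces differ in detail. Executing the command returns its inverse, matching the requirement discussed in Section 8.5 that every action must provide the configuration needed to undo it.

```typescript
// Sketch of a recordable action following the command design pattern.
interface Command {
  execute(): Command; // performs the action and returns its inverse
  describe(): string; // human-readable label for the provenance graph
}

// Hypothetical minimal plot API used by the action.
interface Plot {
  getAttribute(axis: 'x' | 'y'): string;
  setAttribute(axis: 'x' | 'y', attribute: string): void;
}

class SetAxisAttribute implements Command {
  constructor(private plot: Plot,
              private axis: 'x' | 'y',
              private attribute: string) {}

  execute(): Command {
    // Remember the previous value so that this action can be inverted later.
    const previous = this.plot.getAttribute(this.axis);
    this.plot.setAttribute(this.axis, this.attribute);
    return new SetAxisAttribute(this.plot, this.axis, previous);
  }

  describe(): string {
    return `set ${this.axis}-axis to ${this.attribute}`;
  }
}
```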

21http://docker.com/
22http://phantomjs.org

8.5. Discussion

Using Phovea from the beginning when creating a new visual analysis application is the best-case scenario: the provided tools make creating a new application easy. In real-world scenarios, however, platforms such as the Refinery Platform 23 have been developed over multiple years and may want to use only individual Phovea components. Similarly, existing client applications may want to integrate the proposed CLUE model and face similar integration challenges.

Integration into existing platforms Phovea is composed of multiple separate components that work best in combination. However, individual components, such as the client or server framework, can be used in isolation, as described in Section 8.3. Both frameworks are published on package management systems, namely npm 24 and PyPI 25, for the client and server, respectively. The main challenge in integrating Phovea components with other platforms lies in the interfaces that are used and provided by the frameworks. On the client side, TypeScript interfaces are used. The communication between the client and server framework is described using the Swagger Open API standard. The web service provided by the server framework is based on the Web Server Gateway Interface (WSGI) 26 specification. In addition, since Phovea provides a plugin mechanism, existing platforms can easily inject their platform-specific adapters.

Integration of CLUE into an existing code base Another common integration task is to extend an existing web application with CLUE functionality. As described in Section 8.4.2, a requirement for tracking, authoring, and presenting a visual analysis is the use of the command design pattern. By separating the triggering and the execution of an action, it is possible to store the trigger and its related parameters in a provenance graph that can be replayed at a later stage. Moreover, when an action is executed, it has to provide the configuration of a trigger that inverts the action again. This ensures that analysts can flexibly navigate within the provenance graph by reverting and applying different action triggers.

23http://www.refinery-platform.org/
24http://npmjs.org
25http://pypi.python.org/pypi
26http://wsgi.org

9 | Conclusion

Contents

9.1. Discussion and Future Work . . . . . . 114
9.2. Conclusion . . . . . . 117

9.1. Discussion and Future Work

This thesis has described a wide spectrum of methods and techniques for visually guiding users in selection, exploration, and presentation tasks. This opens up a variety of possible directions for future work.

Ranking Optimizations LineUp, as introduced in Chapter 4, provides an efficient and effective visualization for the exploration of multi-attribute rankings. Analysts can interactively manipulate how different attributes contribute to the final score in order to explore the ranking space, i.e., the space of all possible rankings with the given set of attributes. The recently published WeightLifter paper [PSTW+17] introduced a technique for exploring this ranking space. In the future, we plan to extend LineUp by providing means for optimizing rankings, for instance, by calculating and communicating the extent to which one or multiple attributes need to be changed to achieve a given rank (see R IX of the LineUp requirement analysis). This can be seen as an optimization problem within the ranking space.

Proactive Guidance Chapter 5 introduced a guidance process for selecting a data subset from a data collection. The approach is based on a query wizard in which the analyst formulates a query that is used to configure an algorithm that computes a score for each data subset. The resulting scores are then used to rank and prioritize the data subsets.

However, in the current approach the analyst must actively trigger and request guidance. A possible future research direction is to proactively guide analysts in their exploration. In the StratomeX-based scenario described in Chapter 5, guidance could suggest potentially interesting data subset combinations that have a large overlap. The challenges include finding the balance between the automated discovery of patterns and manual exploration in which the user can find patterns in an exploratory way.

Advanced Guidance in Domino Domino is a flexible, generic visual exploration technique for multi-dimensional data that allows users to quickly reassemble existing visualization techniques. In our implementation, placeholders support analysts in creating combined blocks by indicating possible positions for placing a subset in the current Domino setup. In the future, we plan to enrich the placeholder interaction concept with further guidance capabilities that suggest potentially interesting relationships. Depending on the task at hand, possible drop positions could be ranked according to similarity measures, such as the correlation between groups of two partitioned blocks, or advanced statistical measures, such as scagnostics for orthogonal and Pargnostics [DK10] for parallel item relationships between two numerical blocks. For example, Wilkinson et al. [WAG06] used scagnostics such as density, shape, and outliers of 2D scatterplots to classify and guide users to potentially interesting pairwise combinations in high-dimensional datasets.

Multi-level Exploration Domino is a general technique for exploring data subsets and their relations to each other. However, it is difficult to drill down to the individual item level. LineUp, in contrast, is a general technique for ranking item collections that works at the item level only. In the future, we want to combine both concepts into one, such that analysts can start exploring the dataset at a higher abstraction and aggregation level using a concept similar to that of Domino, but can also drill down on demand. This is in accordance with Shneiderman’s information-seeking mantra: “Overview first, zoom and filter, then details-on-demand.” [Shn96].

Provenance Graph Analysis In the CLUE model presented in Chapter 7, all actions performed by analysts during the visual exploration are captured in a provenance graph. The captured actions can be used at a later stage as a basis for communicating the discovery by authoring a Vistory. In the future, we plan to analyze the collection of captured provenance graphs. This can be done for a single graph (e.g., by detecting cycles and similar states) or for a collection of graphs (e.g., by detecting shared analysis patterns and action sequences). Both can be used to support analysts by pointing to possible next actions or by indicating loops in the analysis. Provenance graph analysis can not only support an analyst in the visual exploration process, it also provides insights into how analysts explore datasets in general. A major challenge related to the meta-analysis of provenance graphs is the definition of a similarity measure for states. The objects a state consists of and their semantics heavily depend on the application in which CLUE is used. Further, the analyst might be interested in just a subset of what a state consists of. For example, the analyst might be interested in finding only similar states that have the same selected items as the current state. Another challenge is how to effectively and efficiently communicate the similarity of states. On the one hand, the visual solution needs to be scalable such that the analyst can get an overview of all similarities. On the other hand, the visualization needs to encode the similarities and dissimilarities between two or more states in detail.

Using Guidance to Capture Users’ Motivations Aside from capturing the “raw” provenance information, such as the data subsets manipulated or the interactions performed, user motivation is of interest to fully understand and reproduce an exploration. We propose that integrated guidance mechanisms can be used to capture a user’s rationale during an exploration. Users have to formally express their intentions in order to be given support by the guidance system, for instance, by interacting with a wizard. These expressions can be tracked and used to infer user intentions. Moreover, since the guidance system has only a limited set of options for interaction, users are forced to formally express their intention within this set of options to receive guidance. This results in a win-win situation for both users and reproducibility: users receive guidance that speeds up their process, and the system can collect more provenance information about user intentions.

In addition, by providing a prioritization of the suggestions produced by the guidance system, the choices and modifications made to the ranking can provide additional insights into the decision process of the analyst. This is especially useful when multiple options are examined. At a later stage, this information can be applied to understand why a specific option was chosen.

Gamification Performing an effective visual analysis is challenging due to the complexity of the domain problem. For example, in biomedicine, effectively exploring data collections and generating or verifying new hypotheses requires extensive experience and time-consuming studies. While domain knowledge will always remain a requirement, the system can better support the analyst in such exploration tasks. Better guidance that helps not only during but also before the exploration, by introducing the visual analytics application and possible tasks, enables novices to perform analyses. Gamification, which integrates aspects of games and game theory in non-game contexts, could potentially play a role in improving guidance systems. This includes the possibility to share and comment on one’s own work and on work done by others. A good example of using gamification aspects in the biomedical field is the protein folding game 1. In this game, anyone can try to find a good protein fold. Once a player has found a structure that fulfills particular minimal requirements, a scientist with domain knowledge takes a look at it. A possible direction for future work is to create and apply a similar idea to other visual analytics applications and domains. Challenges include how to introduce the task, problem domain, and tools such that novices can use them.

Open Visual Analysis Platform Chapter 7 introduces Vistories. We plan to launch a platform for sharing, viewing, and exploring Vistories along with the underlying provenance graphs, data, and applications. Our vision is that an increasing number of visual exploration systems will capture the provenance graphs of their analysis sessions and that these provenance and Vistory packages could be submitted along with papers as supplementary material. In addition, authors could provide a link below a static screenshot figure in their paper that points to a Vistory reproducing the figure. This has the potential to simplify the job of reviewers, ensure the reproducibility of the findings, improve the communication of the findings, and ultimately speed up scientific progress.

9.2. Conclusion

In the course of this work, several improvements and contributions for guiding and supporting users in their visual exploration of heterogeneous data were designed, implemented, and published. However, these results are not the end of the journey, but merely a milestone on the path to the greater goal of better guiding analysts so that they can discover more valuable findings in a shorter period of time. The CLUE model and the Phovea visual analysis platform both form a promising basis for further research.

FIN

1http://fold.it

Bibliography

[AG16] E. Alexander and M. Gleicher. Task-Driven Comparison of Topic Models. IEEE Transactions on Visualization and Computer Graphics, 22(1):320–329, January 2016.

[AMA+14] Bilal Alsallakh, Luana Micallef, Wolfgang Aigner, Helwig Hauser, Silvia Miksch, and Peter Rodgers. Visualizing Sets and Set-typed Data: State-of-the-Art and Future Challenges. In Rita Borgo, Ross Maciejewski, and Ivan Viola, editors, Eurographics Conference on Visualization (EuroVis) – State of The Art Reports, pages 1–21. Eurographics, June 2014.

[AWD12] A. Anand, L. Wilkinson, and T. N. Dang. Visual pattern discovery using random projections. In 2012 IEEE Conference on Visual Analytics Science and Technology (VAST ’12), pages 43–52, October 2012.

[BBH+17] M. Behrisch, B. Bach, M. Hund, M. Delz, L. von Rüden, J.-D. Fekete, and T. Schreck. Magnostics: Image-Based Search of Interesting Matrix Views for Guided Network Exploration. IEEE Transactions on Visualization and Computer Graphics, 23(1):31–40, January 2017.

[BCS+05] Louis Bavoil, Steven P. Callahan, Carlos Scheidegger, Huy T. Vo, Patricia Crossno, Claudio T. Silva, and Juliana Freire. VisTrails: Enabling Interactive Multiple-View Visualizations. In Proceedings of the IEEE Conference on Visualization (VIS ’05), pages 135–142. IEEE, 2005.

[BDS+13] Michael Behrisch, James Davey, Svenja Simon, Tobias Schreck, Daniel Keim, and Jörn Kohlhammer. Visual Comparison of Orderings and Rankings. In Proceedings of the EuroVis Workshop on Visual Analytics (EuroVA ’13), 2013.

[BE12] C. Glenn Begley and Lee M. Ellis. Drug development: Raise standards for preclinical cancer research. Nature, 483(7391):531–533, 2012.

[Ber10] Jacques Bertin. Semiology of Graphics: Diagrams, Networks, Maps. ESRI Press, Redlands, CA, USA, 2010. First published in French in 1967.

[BOH11] Michael Bostock, Vadim Ogievetsky, and Jeffrey Heer. D3: Data-Driven Documents. IEEE Transactions on Visualization and Computer Graphics, 17(12):2301–2309, 2011.

[Bri39] Willard Cope Brinton. Graphic presentation. Brinton associates, 1939.

[Bro13a] Broad Institute TCGA Genome Data Analysis Center. Clustering of Methylation: consensus NMF. 2013.

[Bro13b] Broad Institute TCGA Genome Data Analysis Center. Clustering of miRseq mature expression: consensus hierarchical. 2013.

[Bro13c] Broad Institute TCGA Genome Data Analysis Center. Clustering of miRseq mature expression: consensus NMF. 2013.

[Bro13d] Broad Institute TCGA Genome Data Analysis Center. Clustering of mRNAseq gene expression: consensus hierarchical. 2013.

[Bro13e] Broad Institute TCGA Genome Data Analysis Center. Clustering of mRNAseq gene expression: consensus NMF. 2013.

[Bro13f] Broad Institute TCGA Genome Data Analysis Center. Clustering of RPPA data: consensus hierarchical. 2013.

[Bro13g] Broad Institute TCGA Genome Data Analysis Center. Clustering of RPPA data: consensus NMF. 2013.

[BTe13] Cameron W Brennan, TCGA Research Network, and et al. The somatic genomic landscape of glioblastoma. Cell, 155(2):462–477, 2013.

[BW08] L. Byron and M. Wattenberg. Stacked Graphs - Geometry & Aesthetics. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’08), 14(6):1245–1252, 2008.

[CBS+15] Michael Correll, Adam L. Bailey, Alper Sarikaya, David H. O’Connor, and Michael Gleicher. LayerCake: a tool for the visual comparison of viral deep sequencing data. Bioinformatics, 31(21):3522–3528, November 2015.

[CC00] Yizong Cheng and George M Church. Biclustering of Expression Data. In Proceedings of the Conference on Intelligent Systems for Molecular Biology (ISMB ’00), pages 93–103, Palo Alto, CA, USA, 2000. AAAI Press.

[CC07] Christopher Collins and Sheelagh Carpendale. VisLink: Revealing Relationships Amongst Visualizations. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’07), 13(6):1192–1199, 2007.

[CGM+17] Davide Ceneda, Theresia Gschwandtner, Thorsten May, Silvia Miksch, Hans-Jörg Schulz, Marc Streit, and Christian Tominski. Characterizing Guidance in Visual Analytics. IEEE Transactions on Visualization and Computer Graphics (VAST ’16), 23(1):111–120, 2017.

[CLN86] Daniel B Carr, Richard J Littlefield, and Wesley L Nicholson. Scatterplot matrix techniques for large N. In Proceedings of the Symposium on the Interface of Computer Sciences and Statistics, pages 297–306. Elsevier North-Holland, 1986.

[CM84] William S. Cleveland and Robert McGill. Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association, 79(387):531–554, 1984.

[CSC+12] Christina Curtis, Sohrab P. Shah, Suet-Feung Chin, Gulisa Turashvili, Oscar M. Rueda, Mark J. Dunning, Doug Speed, Andy G. Lynch, Shamith Samarajiwa, Yinyin Yuan, Stefan Gräf, Gavin Ha, Gholamreza Haffari, Ali Bashashati, Roslin Russell, Steven McKinney, Metabric Group, Anita Langerød, Andrew Green, Elena Provenzano, Gordon Wishart, Sarah Pinder, Peter Watson, Florian Markowetz, Leigh Murphy, Ian Ellis, Arnie Purushotham, Anne-Lise Børresen-Dale, James D. Brenton, Simon Tavaré, Carlos Caldas, and Samuel Aparicio. The genomic and transcriptomic architecture of 2,000 breast tumours reveals novel subgroups. Nature, 486(7403):346–352, June 2012.

[CvW11] Jerry H. T. Claessen and Jarke J. van Wijk. Flexible Linked Axes for Multivariate Data Visualization. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’11), 17(12):2310–2316, 2011.

[DHRL+12] Cody Dunne, Nathalie Henry Riche, Bongshin Lee, Ronald Metoyer, and George Robertson. GraphTrail: Analyzing Large Multivariate, Heterogeneous Networks While Supporting Exploration History. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI ’12), pages 1663–1672. ACM, 2012.

[DK10] Aritra Dasgupta and Robert Kosara. Pargnostics: Screen-Space Metrics for Parallel Coordinates. IEEE Transactions on Visualization & Computer Graphics, 16(6):1017–1026, 2010.

[ESBB98] Michael B. Eisen, Paul T. Spellman, Patrick O. Brown, and David Botstein. Cluster analysis and display of genome-wide expression patterns. Proceedings of the National Academy of Sciences USA, 95(25):14863–14868, 1998.

[Few12] Stephen Few. Show Me the Numbers: Designing Tables and Graphs to Enlighten. Analytics Press, 2nd edition, 2012.

[Fig14a] A. Figueiras. How to Tell Stories Using Visualization. In Proceedings of the Conference on Information Visualisation (IV ’14), pages 18–18, 2014.

[Fig14b] A. Figueiras. Narrative Visualization: A Case Study of How to Incorporate Narrative Elements in Existing Visualizations. In Proceedings of the Conference on Information Visualisation (IV ’14), pages 46–52, 2014.

[FSJ13] Sara Johansson Fernstad, Jane Shaw, and Jimmy Johansson. Quality-based guidance for exploratory dimensionality reduction. Information Visualization, 12(1):44–64, January 2013.

[Fur86] George W. Furnas. Generalized fisheye views. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI ’86), pages 16–23. ACM, 1986.

[GAW+11] Michael Gleicher, Danielle Albers, Rick Walker, Ilir Jusufi, Charles D. Hansen, and Jonathan C. Roberts. Visual Comparison for Information Visualization. Information Visualization, 10(4):289–309, October 2011.

[GGL+14] Samuel Gratzl, Nils Gehlenborg, Alexander Lex, Hanspeter Pfister, and Marc Streit. Domino: Extracting, Comparing, and Manipulating Subsets across Multiple Tabular Datasets. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), 20(12):2023–2032, 2014.

[GGL+15] Samuel Gratzl, Nils Gehlenborg, Alexander Lex, Hendrik Strobelt, Christian Partl, and Marc Streit. Caleydo Web: An Integrated Visual Analysis Platform for Biomedical Data. In Poster Compendium of the IEEE Conference on Information Visualization (InfoVis ’15), Chicago, IL, USA, 2015. IEEE.

[GGS+13] John Gómez, Leyla J. García, Gustavo A. Salazar, Jose Villaveces, Swanand Gore, Alexander García, Maria J. Martín, Guillaume Launay, Rafael Alcántara, Noemi del Toro, Marine Dumousseau, Sandra Orchard, Sameer Velankar, Henning Hermjakob, Chenggong Zong, Peipei Ping, Manuel Corpas, and Rafael C. Jiménez. BioJS: an open source JavaScript framework for biological data visualization. Bioinformatics, 29(8):1103–1104, April 2013.

[GHJV95] Erich Gamma, Richard Helm, Ralph Johnson, and John Vlissides. Design patterns: elements of reusable object-oriented software. Addison-Wesley Longman, 1995.

[Gil13] Michael Gillhofer. Bi-Cluster Visualization. Bachelor Thesis, 2013.

[GLG+13] Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Hanspeter Pfister, and Marc Streit. LineUp: Visual Analysis of Multi-Attribute Rankings. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2277–2286, 2013.

[GLG+16] Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Nicola Cosgrove, and Marc Streit. From Visual Exploration to Storytelling and Back Again. Computer Graphics Forum, 35(3):491–500, June 2016.

[GPS+14] Nils Gehlenborg, Richard Park, Ilya Sytchev, Psalm Haseley, Stefan Luger, Anton Xue, Marc Streit, Shannan Ho Sui, Winston Hide, and Peter J. Park. Refinery Platform: A Foundation for Integrative Data Visualization Tools. In Poster Compendium of the Symposium on Biological Data Visualization (BioVis ’14), 2014.

[HA85] Lawrence Hubert and Phipps Arabie. Comparing partitions. Journal of Classification, 2(1):193–218, December 1985.

[Har06] Sandra G. Hart. NASA-Task Load Index (NASA-TLX); 20 Years Later. Proceedings of the Human Factors and Ergonomics Society Annual Meeting, 50(9):904–908, 2006.

[HB03] Mark Harrower and Cynthia A. Brewer. ColorBrewer.org: An Online Tool for Selecting Colour Schemes for Maps. The Cartographic Journal, 40(1):27–37, 2003.

[HBS+16] Michael Hund, Dominic Böhm, Werner Sturm, Michael Sedlmair, Tobias Schreck, Torsten Ullrich, Daniel A. Keim, Ljiljana Majnaric, and Andreas Holzinger. Visual analytics for concept exploration in subspaces of patient groups. Brain Informatics, 3(4):233–247, December 2016.

[Hea96] Christopher G. Healey. Choosing effective colours for data visualization. In Proceedings of the IEEE Conference on Visualization (Vis ’96), pages 263–270. IEEE, 1996.

[HK81] J. A. Hartigan and B. Kleiner. Mosaics for contingency tables. Proceedings of the Symposium on the Interface, pages 268–273, 1981.

[HLS+12] Clemens Holzhüter, Alexander Lex, Dieter Schmalstieg, Hans-Jörg Schulz, Heidrun Schumann, and Marc Streit. Visualizing Uncertainty in Biological Expression Data. In Proceedings of the SPIE Conference on Visualization and Data Analysis (VDA ’12), volume 8294, page 82940O. IS&T/SPIE, 2012.

[HMSA08] Jeffrey Heer, Jock Mackinlay, Chris Stolte, and Maneesh Agrawala. Graphical Histories for Visualization: Supporting Analysis, Communication, and Evaluation. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’08), 14(6):1189–1196, 2008.

[HR07] J. Heer and G. G. Robertson. Animated Transitions in Statistical Data Graphics. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’07), 13(6):1240–1247, 2007.

[HSC+13] Matan Hofree, John P Shen, Hannah Carter, Andrew Gross, and Trey Ideker. Network-based stratification of tumor mutations. Nature Methods, 10:1108–1115, September 2013.

[Hun09] Lawrence Hunter. The processes of life: an introduction to molecular biology. MIT Press, March 2009.

[HV81] Harold V. Henderson and Paul F. Velleman. Building Multiple Regression Models Interactively. Biometrics, 37(2):391–411, June 1981.

[ID90] A. Inselberg and B. Dimsdale. Parallel coordinates: a tool for visualizing multi-dimensional geometry. In Proceedings of the IEEE Conference on Visualization (Vis ’90), pages 361–378. IEEE, 1990.

[IML13] J.-F. Im, M.J. McGuffin, and R. Leung. GPLOM: The Generalized Plot Matrix for Visualizing Multidimensional Multivariate Data. IEEE Transactions on Visualization and Computer Graphics, 19(12):2606–2614, 2013.

[Ins85] Alfred Inselberg. The plane with parallel coordinates. The Visual Computer, 1(4):69–91, 1985.

[JE12] Waqas Javed and Niklas Elmqvist. Exploring the design space of composite visualization. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis ’12), pages 1–8. IEEE, March 2012.

[JE13] W. Javed and N. Elmqvist. ExPlates: Spatializing Interactive Analysis to Scaffold Visual Exploration. Computer Graphics Forum (EuroVis ’13), 32(3pt4):441–450, 2013.

[JTS08] Mathias John, Christian Tominski, and Heidrun Schumann. Visual and analytical extensions for the table lens. In Proceedings of the SPIE Conference on Visualization and Data Analysis (VDA ’08), 2008.

[KBH06] Robert Kosara, Fabian Bendix, and Helwig Hauser. Parallel Sets: Interactive Exploration and Visual Analysis of Categorical Data. IEEE Transactions on Visualization and Computer Graphics, 12(4):558–568, 2006.

[KKEM10] Daniel A Keim, Joern Kohlhammer, Geoffrey Ellis, and Florian Mansmann, editors. Mastering the Information Age - Solving Problems with Visual Analytics. Eurographics, Goslar, Germany, 2010.

[KLC08] Paul Kidwell, Guy Lebanon, and William S. Cleveland. Visualizing incomplete and partially ranked data. IEEE Transactions on Visualization and Computer Graphics, 14(6):1356–1363, 2008.

[KM13] Robert Kosara and Jock Mackinlay. Storytelling: The next step for visualization. Computer, (5):44–50, 2013.

[KNS04] M. Kreuseler, T. Nocke, and H. Schumann. A History Mechanism for Visual Data Mining. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 49–56. IEEE, 2004.

[KRL97] J. Kolojejchick, S. F. Roth, and P. Lucas. Information appliances and tools in Visage. IEEE Computer Graphics and Applications, 17(4):32–41, 1997.

[KSJ+14] Bum Chul Kwon, Florian Stoffel, Dominik Jäckle, Bongshin Lee, and Daniel Keim. VisJockey: Enriching Data Stories through Orchestrated Interactive Visualization. In Poster Compendium of the Computation + Journalism Symposium, 2014.

[KV05] Seon-Young Kim and David J. Volsky. PAGE: Parametric Analysis of Gene Set Enrichment. BMC Bioinformatics, 6(1):144, 2005.

[LHV12] Endre M. Lidal, Helwig Hauser, and Ivan Viola. Geological storytelling: graphically exploring and communicating geological sketches. In Proceed- ings of the Symposium on Sketch-Based Interfaces and Modeling (SBIM ’12),

BIBLIOGRAPHY 124 pages 11–20. Eurographics Association, 2012.

[LPK+13] Alexander Lex, Christian Partl, Denis Kalkofen, Marc Streit, Samuel Gratzl, Anne Mai Wassermann, Dieter Schmalstieg, and Hanspeter Pfister. Entourage: Visualizing Relationships between Biological Pathways using Contextual Subsets. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2536–2545, 2013.

[LRIC15] Bongshin Lee, N.H. Riche, P. Isenberg, and S. Carpendale. More Than Telling a Story: Transforming Data into Visually Shared Stories. IEEE Computer Graphics and Applications, 35(5):84–90, 2015.

[LSKS10] Alexander Lex, Marc Streit, Ernst Kruijff, and Dieter Schmalstieg. Caleydo: Design and Evaluation of a Visual Analysis Framework for Gene Expression Data in its Biological Context. In Proceedings of the IEEE Symposium on Pacific Visualization (PacificVis ’10), pages 57–64. IEEE, 2010.

[LSP+10] Alexander Lex, Marc Streit, Christian Partl, Karl Kashofer, and Dieter Schmalstieg. Comparative Analysis of Multidimensional, Quantitative Data. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’10), 16(6):1027–1035, 2010.

[LSS+11] Alexander Lex, Hans-Jörg Schulz, Marc Streit, Christian Partl, and Dieter Schmalstieg. VisBricks: Multiform Visualization of Large, Inhomogeneous Data. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’11), 17(12):2291–2300, 2011.

[LSS+12] Alexander Lex, Marc Streit, Hans-Jörg Schulz, Christian Partl, Dieter Schmalstieg, Peter J. Park, and Nils Gehlenborg. StratomeX: Visual Analysis of Large-Scale Heterogeneous Genomics Data for Cancer Subtype Characterization. Computer Graphics Forum (EuroVis ’12), 31(3):1175–1184, 2012.

[Mac86] Jock Mackinlay. Automating the Design of Graphical Presentations of Relational Information. ACM Transactions on Graphics, 5(2):110–141, 1986.

[MGT+03] Tamara Munzner, François Guimbretière, Serdar Tasiran, Li Zhang, and Yunhong Zhou. TreeJuxtaposer: Scalable Tree Comparison Using Focus+Context with Guaranteed Visibility. In Proceedings of the ACM Conference on Computer Graphics and Interactive Techniques (SIGGRAPH ’03), pages 453–462. ACM, 2003.

[MPG+14] Thomas Mühlbacher, Harald Piringer, Samuel Gratzl, Michael Sedlmair, and Marc Streit. Opening the Black Box: Strategies for Increased User Involvement in Existing Algorithm Implementations. IEEE Transactions on Visualization and Computer Graphics (VAST ’14), 20(12):1643–1652, 2014.

[Mun09] Tamara Munzner. A Nested Process Model for Visualization Design and Validation. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’09), 15(6):921–928, 2009.

[Mun14] Tamara Munzner. Visualization Analysis and Design. CRC Press, Taylor & Francis Group, Boca Raton, 2014.

[Net08] The Cancer Genome Atlas Research Network. Comprehensive genomic characterization defines human glioblastoma genes and core pathways. Nature, 455(7216):1061–1068, 2008.

[Nut13] Nutrient Data Laboratory. USDA National Nutrient Database for Standard Reference, Release 25, 2013.

[Par11a] Charlie Park. Edward Tufte’s “Slopegraphs”, 2011.

[Par11b] Charlie Park. A Slopegraph Update, 2011.

[PDL+11] Suporn Pongnumkul, Mira Dontcheva, Wilmot Li, Jue Wang, Lubomir Bourdev, Shai Avidan, and Michael F. Cohen. Pause-and-play: Automatically Linking Screencast Video Tutorials with Applications. In Proceedings of the 24th Annual ACM Symposium on User Interface Software and Technology (UIST ’11), pages 135–144. ACM, 2011.

[PGS+16] Christian Partl, Samuel Gratzl, Marc Streit, Anne Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg, and Alexander Lex. Pathfinder: Visual Analysis of Paths in Graphs. Computer Graphics Forum (EuroVis ’16), 35(3):71–80, June 2016.

[PGU12] A. Pilhofer, A. Gribov, and A. Unwin. Comparing Clusterings Using Bertin’s Idea. IEEE Transactions on Visualization and Computer Graphics, 18(12):2506–2515, December 2012.

[PSTW+17] Stephan Pajer, Marc Streit, Thomas Torsney-Weir, Florian Spechtenhauser, Torsten Möller, and Harald Piringer. WeightLifter: Visual Weight Space Exploration for Multi-Criteria Decision Making. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’16), 23(1):611–620, 2017.

[PWR04] Wei Peng, Matthew O. Ward, and Elke A. Rundensteiner. Clutter Reduction in Multi-Dimensional Data Visualization Using Dimension Reordering. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’04), pages 89–96. IEEE Computer Society, 2004.

[Qua13] Quacquarelli Symonds. QS World University Ranking, 2013.

[RC94] Ramana Rao and Stuart K. Card. The table lens: merging graphical and symbolic representations in an interactive focus+context visualization for tabular information. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI ’94), pages 318–322. ACM, 1994.

[RESC16] E. D. Ragan, A. Endert, J. Sanyal, and J. Chen. Characterizing Provenance in Visualization and Data Analysis: An Organizational Framework of Provenance Types and Purposes. IEEE Transactions on Visualization and Computer Graphics (VAST ’15), 22(1):31–40, 2016.

[RH11] N. B. Robbins and R. M. Heiberger. Plotting Likert and Other Rating Scales. In Proceedings of the 2011 Joint Statistical Meeting, 2011.

[RHF05] Patrick Riehmann, Manfred Hanfler, and Bernd Froehlich. Interactive Sankey diagrams. In Proceedings of the IEEE Symposium on Information Visualization (InfoVis ’05), pages 233–240. IEEE, 2005.

[RNP+10] Jason T. Rich, J. Gail Neely, Randal C. Paniello, Courtney C. J. Voelker, Brian Nussenbaum, and Eric W. Wang. A practical guide to understanding Kaplan-Meier curves. Otolaryngology–Head and Neck Surgery, 143(3):331–336, 2010.

[Rob07] Jonathan C. Roberts. State of the Art: Coordinated & Multiple Views in Exploratory Visualization. In Proceedings of the Conference on Coordinated and Multiple Views in Exploratory Visualization (CMV ’07), pages 61–71. IEEE, 2007.

[SA06] Ben Shneiderman and Aleks Aris. Network Visualization by Semantic Substrates. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’06), 12(5):733–740, 2006.

[Sch02] Judi Scheffer. Dealing with missing data. Research Letters in the Information and Mathematical Sciences, 3(1):153–160, 2002.

[SCL+12] Conglei Shi, Weiwei Cui, Shixia Liu, Panpan Xu, Wei Chen, and Huamin Qu. RankExplorer: Visualization of Ranking Changes in Large Time Series Data. IEEE Transactions on Visualization and Computer Graphics, 18(12):2669–2678, 2012.

[SGAS16] Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit. ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor. IEEE Transactions on Visualization and Computer Graphics, 22(12):2594–2607, 2016.

[SGG+14] Marc Streit, Samuel Gratzl, Michael Gillhofer, Andreas Mayr, Andreas Mitterecker, and Sepp Hochreiter. Furby: Fuzzy Force-Directed Bicluster Visualization. BMC Bioinformatics, 15(Suppl 6):S4, 2014.

[SGKS15] Holger Stitz, Samuel Gratzl, Michael Krieger, and Marc Streit. CloudGazer: A Divide-and-Conquer Approach for Monitoring and Optimizing Cloud-Based Networks. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis ’15), pages 175–182. IEEE, 2015.

[SH08] Amit P. Sawant and Christopher G. Healey. Visualizing multidimensional query results using animation. In Electronic Imaging 2008, page 680904, 2008.

[SH14] Arvind Satyanarayan and Jeffrey Heer. Authoring Narrative Visualizations with Ellipsis. Computer Graphics Forum, 33(3):361–370, 2014.

[Shn96] Ben Shneiderman. The Eyes Have It: A Task by Data Type Taxonomy for Information Visualizations. In Proceedings of the IEEE Symposium on Visual Languages (VL ’96), pages 336–343, 1996.

[SKKS08] Marc Streit, Michael Kalkusch, Karl Kashofer, and Dieter Schmalstieg. Navigation and Exploration of Interconnected Pathways. Computer Graphics Forum (EuroVis ’08), 27(3):951–958, 2008.

[SLG+14] Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister, Peter J. Park, and Nils Gehlenborg. Guided visual exploration of genomic stratifications in cancer. Nature Methods, 11(9):884–885, 2014.

[SS05] Jinwook Seo and Ben Shneiderman. A rank-by-feature framework for interactive exploration of multidimensional data. Information Visualization, 4(2):96–113, 2005.

[SSL+12] Marc Streit, Hans-Jörg Schulz, Alexander Lex, Dieter Schmalstieg, and Heidrun Schumann. Model-Driven Design for the Visual Analysis of Heterogeneous Data. IEEE Transactions on Visualization and Computer Graphics, 18(6):998–1010, 2012.

[SSTR93] Manojit Sarkar, Scott S. Snibbe, Oren J. Tversky, and Steven P. Reiss. Stretching the rubber sheet: A metaphor for viewing large layouts on small screens. In Proceedings of the ACM Symposium on User Interface Software and Technology (UIST ’93), pages 81–91. ACM, 1993.

[STM+05] Aravind Subramanian, Pablo Tamayo, Vamsi K. Mootha, Sayan Mukherjee, Benjamin L. Ebert, Michael A. Gillette, Amanda Paulovich, Scott L. Pomeroy, Todd R. Golub, Eric S. Lander, and Jill P. Mesirov. Gene set enrichment analysis: A knowledge-based approach for interpreting genome-wide expression profiles. Proceedings of the National Academy of Sciences of the United States of America, 102(43):15545–15550, October 2005.

[SW08] Yedendra Babu Shrinivasan and Jarke J. van Wijk. Supporting the analytical reasoning process in information visualization. In Proceedings of the ACM SIGCHI Conference on Human Factors in Computing Systems (CHI ’08), pages 1237–1246. ACM, 2008.

[SWL+11] Markus Steinberger, Manuela Waldner, Alexander Lex, Marc Streit, and Dieter Schmalstieg. Context-Preserving Visual Links. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’11), 17(12):2249–2258, 2011.

[The13] The Cancer Genome Atlas Research Network. Comprehensive molecular characterization of clear cell renal cell carcinoma. Nature, 499(7456):43–49, July 2013.

[Tim12] Times Higher Education. Times Higher Education 100 Under 50, 2012.

[TMF+12] A. Tatu, F. Maaß, I. Färber, E. Bertini, T. Schreck, T. Seidl, and D. Keim. Subspace search and visualization to make sense of alternative clusterings in high-dimensional data. In Proceedings of the IEEE Conference on Visual Analytics Science and Technology (VAST ’12), pages 63–72. IEEE, October 2012.

[TMH+13] Tuan Zea Tan, Qing Hao Miow, Ruby Yun-Ju Huang, Meng Kang Wong, Jieru Ye, Jieying Amelia Lau, Meng Chu Wu, Luqman Hakim Bin Abdul Hadi, Richie Soong, Mahesh Choolani, Ben Davidson, Jahn M. Nesland, Ling-Zhi Wang, Noriomi Matsumura, Masaki Mandai, Ikuo Konishi, Boon-Cher Goh, Jeffrey T. Chang, Jean Paul Thiery, and Seiichi Mori. Functional genomics identifies five distinct molecular subtypes with clinical relevance and pathways for growth control in epithelial ovarian cancer. EMBO Molecular Medicine, 5(7):983–998, July 2013.

[Tuf83] Edward Tufte. The Visual Display of Quantitative Information. Graphics Press, Cheshire, CT, USA, 2nd edition, 1983.

[Tuf95] Edward Tufte. Envisioning Information. Graphics Press, Cheshire, CT, USA, 5th edition, 1995.

[vdEvW13] Stef van den Elzen and Jarke J. van Wijk. Small Multiples, Large Singles: A New Approach for Visual Data Exploration. Computer Graphics Forum (EuroVis ’13), 32(3pt2):191–200, 2013.

[Vea10] Roel G. W. Verhaak et al. Integrated Genomic Analysis Identifies Clinically Relevant Subtypes of Glioblastoma Characterized by Abnormalities in PDGFRA, IDH1, EGFR, and NF1. Cancer Cell, 17(1):98–110, 2010.

[VM12] Christophe Viau and Michael J. McGuffin. ConnectedCharts: Explicit Visualization of Relationships between Data Graphics. Computer Graphics Forum (EuroVis ’12), 31(3pt4):1285–1294, 2012.

[WAG06] L. Wilkinson, A. Anand, and R. Grossman. High-Dimensional Visual Analytics: Interactive Exploration Guided by Pairwise Views of Point Distributions. IEEE Transactions on Visualization and Computer Graphics, 12(6):1363–1372, November 2006.

[WGK10] Matthew Ward, Georges Grinstein, and Daniel A. Keim. Interactive Data Visualization: Foundations, Techniques, and Applications. A.K. Peters, Natick, MA, USA, 2010.

[WH07] Michael Wohlfart and Helwig Hauser. Story Telling for Presentation in Volume Visualization. In Proceedings of the Symposium on Visualization (EuroVis ’07), pages 91–98. Eurographics Association, 2007.

[Wil05] Leland Wilkinson. The Grammar of Graphics. Springer, 2nd edition, 2005.

[YGX+09] Xiaoru Yuan, Peihong Guo, He Xiao, Hong Zhou, and Huamin Qu. Scattering Points in Parallel Coordinates. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’09), 15(6):1001–1008, 2009.

A | Phovea Applications

Several applications have already been developed with Phovea. The following selection covers applications that are directly related to this thesis or to which the author contributed. A full list of public applications can be found at http://caleydoapp.org.

A.1. Caleydo LineUp and LineUp.js

Figure A.1.: Screenshot of Caleydo LineUp. Caleydo LineUp is a Phovea-based application that uses the independent LineUp.js library for multi-attribute rankings. In this example, a comparison of the Legatum Prosperity Index over multiple years is shown.

Website: http://lineup.caleydo.org
Source Code: http://github.com/Caleydo/lineup, http://github.com/Caleydo/lineupjs
Public Version: http://lineup.caleydoapp.org

Caleydo LineUp, shown in Figure A.1, is a web application for exploring multi-attribute rankings. It uses LineUp.js, a JavaScript implementation of the visualization technique introduced in Chapter 4. Users can explore a set of preloaded datasets and upload their own. The resulting visual analysis can be shared by downloading the result or by uploading the data to GitHub Gist (http://gist.github.com); the latter produces a shareable URL.
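To make the ranking setup concrete, the following minimal sketch assembles a small multi-attribute ranking with LineUp.js in the browser. It assumes the builder API of the current lineupjs npm package, which may differ from the version used at the time of writing; the data records are invented for illustration.

```ts
import {builder, buildStringColumn, buildNumberColumn, buildCategoricalColumn} from 'lineupjs';
import 'lineupjs/build/LineUpJS.css';

// Invented example records; any array of homogeneous objects can be ranked.
const countries = [
  {country: 'Norway', economy: 79.5, education: 88.1, region: 'Europe'},
  {country: 'Austria', economy: 74.2, education: 85.3, region: 'Europe'},
  {country: 'Japan', economy: 72.8, education: 86.0, region: 'Asia'},
];

// One column definition per attribute; the numeric domains let LineUp render
// normalized score bars that can later be combined into weighted stacked columns.
const lineup = builder(countries)
  .column(buildStringColumn('country'))
  .column(buildNumberColumn('economy', [0, 100]))
  .column(buildNumberColumn('education', [0, 100]))
  .column(buildCategoricalColumn('region'))
  .deriveColors()
  .build(document.body);
```

Interactively combining the numeric columns into a weighted stacked column then yields the multi-attribute ranking behavior described in Chapter 4.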

A.2. Caleydo Gapminder with CLUE

Figure A.2.: Screenshot of Caleydo Gapminder

Website: http://clue.caleydo.org
Source Code: http://github.com/Caleydo/gapminder, http://github.com/phovea/phovea_clue
Public Version: http://gapminder.caleydoapp.org

Caleydo Gapminder is a visual analytics tool inspired by the original Gapminder (http://gapminder.org). The application is used as a running example in Chapter 7. In addition to the original Gapminder functionality, it integrates the presented CLUE model. A detailed usage scenario can be found in Section 7.4.1.

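At its core, the CLUE integration means that every user interaction in the Gapminder example is captured as an action in a provenance graph whose states can later be revisited and assembled into a Vistory. The following self-contained TypeScript sketch illustrates only that capture-and-replay idea; the type and method names are hypothetical and do not reflect the actual phovea_clue API.

```ts
// Hypothetical sketch of CLUE-style provenance capture (not the phovea_clue API).
type State = {year: number; xAttribute: string; yAttribute: string};

interface Action {
  label: string;            // human-readable, reusable when authoring a story
  apply(s: State): State;   // forward execution of the user interaction
}

class ProvenanceGraph {
  private readonly states: State[] = [];
  private readonly actions: Action[] = [];

  constructor(initial: State) {
    this.states.push(initial);
  }

  // Each interaction is stored together with the state it produced, so any
  // point of the exploration can be revisited later.
  record(action: Action): State {
    const next = action.apply(this.current());
    this.actions.push(action);
    this.states.push(next);
    return next;
  }

  current(): State {
    return this.states[this.states.length - 1];
  }

  // Jumping to a recorded state is the basis for replaying an analysis step.
  jumpTo(index: number): State {
    return this.states[index];
  }
}

// Example: capture a year change in the Gapminder-style scatterplot.
const graph = new ProvenanceGraph({year: 1960, xAttribute: 'gdp', yAttribute: 'lifeExpectancy'});
graph.record({label: 'Change year to 2007', apply: (s) => ({...s, year: 2007})});
```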

A.3. StratomeX.js with CLUE

Figure A.3.: Screenshot of Caleydo StratomeX

Website: http://stratomex.caleydo.org
Source Code: http://github.com/Caleydo/stratomex_js, http://github.com/phovea/phovea_clue
Public Version: http://stratomex.caleydoapp.org

StratomeX.js is an implementation of the StratomeX [LSS+12] visualization technique, extended with the Guided StratomeX approach introduced in Chapter 5 and with CLUE functionality. An introduction to the visualization technique can be found in Section 3.1.

A.4. Caleydo Pathfinder

Figure A.4.: Screenshot of Caleydo Pathfinder [PGS+16]

Website: http://pathfinder.caleydo.org
Source Code: http://github.com/Caleydo/pathfinder, http://github.com/Caleydo/pathfinder_ccle, http://github.com/Caleydo/pathfinder_graph
Public Version: http://pathfinder.caleydoapp.org

Pathfinder [PGS+16] is a visualization technique for path-based tasks in large heterogeneous networks. It uses a Neo4j (http://neo4j.com) database for storing the graph. While this is not a core technique of this thesis, the author contributed to the implementation of the prototype.

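To give an impression of how such path-based tasks translate into database queries, the following sketch runs a bounded shortest-path query from TypeScript using the official neo4j-driver package. The connection details, the Gene node label, the name property, and the path-length bound are hypothetical placeholders for whatever graph schema is actually loaded.

```ts
import neo4j from 'neo4j-driver';

// Query a bounded-length shortest path between two nodes, the elementary
// operation behind path-based exploration in a heterogeneous network.
async function findPath(from: string, to: string): Promise<void> {
  const driver = neo4j.driver('bolt://localhost:7687', neo4j.auth.basic('neo4j', 'password'));
  const session = driver.session();
  try {
    const result = await session.run(
      'MATCH p = shortestPath((a:Gene {name: $from})-[*..5]-(b:Gene {name: $to})) RETURN p',
      {from, to}
    );
    for (const record of result.records) {
      console.log(record.get('p')); // the path object, ready to be visualized
    }
  } finally {
    await session.close();
    await driver.close();
  }
}

findPath('TP53', 'EGFR').catch(console.error);
```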

Curriculum Vitae

Personal Data
Name: Dipl.-Ing. Samuel Gratzl, BSc
E-Mail: samuel [email protected]
Born: February 5, 1986 in Linz, Austria
Nationality: Austria

Research Interests
Information Visualization • Visualization on the Web • Biological Data Visualization • Visual Analytics and Knowledge Discovery • Software Design

Professional
since 11/2016: Co-founder and CTO of datavisyn GmbH
11/2016 – 01/2017: Lecturer at Hagenberg University of Applied Sciences, Austria
11/2016 – 01/2017: Lecturer at Salzburg University of Applied Sciences, Austria
since 06/2016: Project Assistant at the Institute for Computer Graphics, JKU, Linz
02/2016 – 03/2016: Teaching Assistant for the Visualization Class at Imperial College London, London, UK
01/2015 – 04/2015: Research Fellow at the School of Engineering & Applied Sciences, Harvard University, Boston, MA, USA
02/2015 – 05/2015: Teaching Assistant for the Visualization Class (CS171) at the School of Engineering & Applied Sciences, Harvard University, Boston, MA, USA
11/2012 – 06/2016: Predoctoral Associate at the Institute for Computer Graphics, JKU, Linz
03/2011 – 07/2011: Teaching Assistant at the Institute for Computer Graphics for Computer Graphics and Mixed Reality, JKU, Linz
03/2010 – 07/2010: Teaching Assistant at the Institute for Computer Graphics for Computer Graphics and Mixed Reality, JKU, Linz
07/2007 – 12/2012: MathConsult GmbH (Linz, Austria); part-time software engineer with a focus on web development and build management
09/2006 – 12/2012: Lernfamilie (Linz, Austria); part-time webmaster

Education
since 10/2012: Doctoral program in Computer Science at Johannes Kepler University Linz. Supervision: Ass.-Prof. Dipl.-Ing. Dr.techn. Marc Streit
10/2012: Master's degree (Dipl.-Ing.) from Johannes Kepler University Linz with highest distinction. Thesis: MAPS - a mobile automatic puzzle solver. Supervision: Univ.-Prof. Dr. Gerhard Widmer and a.Univ.-Prof. Dr. Josef Scharinger
04/2010 – 10/2012: Master's studies in Pervasive Computing at Johannes Kepler University Linz
04/2010: Bachelor's degree from Johannes Kepler University Linz with highest distinction. Thesis: Conception and Realization of a Deployment Process for the UnRiskFactory project. Supervision: Univ.-Prof. Dr. Dr. h.c. Hanspeter Mössenböck, in cooperation with MathConsult GmbH
10/2006 – 04/2010: Bachelor's studies in Computer Science at Johannes Kepler University Linz

Awards and Scholarships
2016: Honorable Mention Best Paper Award, EG/VGTC Conference on Visualization (EuroVis ’16)
2015: Honorable Mention Best Poster Award, IEEE Information Visualization (InfoVis ’15)
2015: Human Technology Interface Award of the State of Styria
2015: Dissertation Scholarship of the State of Upper Austria
2014: Marshallplan Scholarship for Research at Harvard University
2014: Honorable Mention Best Paper Award, IEEE Information Visualization (InfoVis ’14)
2013: Best Paper Award, IEEE Information Visualization (InfoVis ’13)
2007 – 2010, 2012: Award for excellent performance as a student (Leistungsstipendium), granted by the Faculty of Engineering and Natural Sciences, Johannes Kepler University Linz

Grants
2015 – 2018: TourGuide - Navigation System for Capturing and Analyzing Complex Clinical Data. Funded by the State of Upper Austria, Innovatives OÖ 2020, Grant no. 851460, 400k EUR.
2015 – 2018: VisOnFire - Visual Analysis of Large and Heterogeneous Scientific Workflows for Analytical Provenance. Funded by the Austrian Science Fund (FWF), Grant no. P27975-NBL, 348k EUR.
2015 – 2018: Development of Multi-Dimensional Genomic Data Visualization and Data Mining Techniques. Funded by Boehringer Ingelheim Regional Center Vienna, 210k EUR.
2013 – 2016: PIPES-VS-DAMS - Privacy Preserving Visual Dynamic Network Analysis for Advanced System Monitoring on Multiple Scales. Funded by the Austrian Research Promotion Agency (FFG), IKT der Zukunft program, Grant no. 840232, 410k EUR.

Top International Collaboration Partners
• Prof. Hanspeter Pfister, Ph.D., School of Engineering & Applied Sciences, Harvard University, Boston, MA, USA
• Nils Gehlenborg, Ph.D., Broad Institute of MIT and Harvard, Boston, MA, USA
• Alexander Lex, Ph.D., University of Utah, Salt Lake City, UT, USA

Publications

Peer-reviewed Journal Publications

[1] Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Nicola Cosgrove, and Marc Streit. From Visual Exploration to Storytelling and Back Again. Computer Graphics Forum (EuroVis ’16), 2016. To appear.
[2] Christian Partl, Samuel Gratzl, Marc Streit, Anne Mai Wassermann, Hanspeter Pfister, Dieter Schmalstieg, and Alexander Lex. Pathfinder: Visual Analysis of Paths in Graphs. Computer Graphics Forum (EuroVis ’16), 2016. To appear.
[3] Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit. ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor. IEEE Transactions on Visualization and Computer Graphics, 2016. Accepted with minor revision.
[4] Marc Streit, Alexander Lex, Samuel Gratzl, Christian Partl, Dieter Schmalstieg, Hanspeter Pfister, Peter J. Park, and Nils Gehlenborg. Guided visual exploration of genomic stratifications in cancer. Nature Methods, 11(9):884–885, 2014. PMC4196637.
[5] Samuel Gratzl, Nils Gehlenborg, Alexander Lex, Hanspeter Pfister, and Marc Streit. Domino: Extracting, comparing, and manipulating subsets across multiple tabular datasets. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’14), 20(12):2023–2032, 2014.
[6] Thomas Mühlbacher, Harald Piringer, Samuel Gratzl, Michael Sedlmair, and Marc Streit. Opening the black box: Strategies for increased user involvement in existing algorithm implementations. IEEE Transactions on Visualization and Computer Graphics (VAST ’14), 20(12):1643–1652, 2014.
[7] Marc Streit, Samuel Gratzl, Michael Gillhofer, Andreas Mayr, Andreas Mitterecker, and Sepp Hochreiter. Furby: Fuzzy force-directed bicluster visualization. BMC Bioinformatics, 15(Suppl 6):S4, 2014.
[8] Samuel Gratzl, Alexander Lex, Nils Gehlenborg, Hanspeter Pfister, and Marc Streit. LineUp: Visual analysis of multi-attribute rankings. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2277–2286, 2013.
[9] Alexander Lex, Christian Partl, Denis Kalkofen, Marc Streit, Samuel Gratzl, Anne Mai Wassermann, Dieter Schmalstieg, and Hanspeter Pfister. Entourage: Visualizing relationships between biological pathways using contextual subsets. IEEE Transactions on Visualization and Computer Graphics (InfoVis ’13), 19(12):2536–2545, 2013.

Peer-reviewed Conference and Workshop Publications

[1] Holger Stitz, Samuel Gratzl, Michael Krieger, and Marc Streit. CloudGazer: A Divide-and-Conquer Approach for Monitoring and Optimizing Cloud-Based Networks. In Proceedings of the IEEE Pacific Visualization Symposium (PacificVis ’15), pages 175–182. IEEE, 2015.

Posters

[1] Samuel Gratzl, Nils Gehlenborg, Alexander Lex, Hendrik Strobelt, Christian Partl, and Marc Streit. Caleydo Web: An Integrated Visual Analysis Platform for Biomedical Data. In Poster Compendium of the IEEE Conference on Information Visualization (InfoVis ’15). IEEE, 2015.

[2] Stefan Luger, Holger Stitz, Samuel Gratzl, Nils Gehlenborg, and Marc Streit. Interactive Visualization of Provenance Graphs for Reproducible Biomedical Research. In Poster Compendium of the IEEE Conference on Information Visualization (InfoVis ’15). IEEE, 2015.
[3] Christian Partl, Samuel Gratzl, Marc Streit, Hanspeter Pfister, Dieter Schmalstieg, and Alexander Lex. Pathfinder: Visual Analysis of Paths in Heterogeneous Graphs. In Poster Compendium of the IEEE Symposium on Visualization in Data Science (VDS ’15). IEEE, 2015.

[4] Holger Stitz, Samuel Gratzl, Wolfgang Aigner, and Marc Streit. ThermalPlot: Visualizing Multi-Attribute Time-Series Data Using a Thermal Metaphor. In Poster Compendium of the IEEE Conference on Information Visualization (InfoVis ’15). IEEE, 2015.
[5] Holger Stitz, Samuel Gratzl, Stefan Luger, Nils Gehlenborg, and Marc Streit. Transparent layering for visualizing dynamic graphs using the flip book metaphor. In Poster Compendium of the IEEE VIS Conference. IEEE Computer Society Press, 2014.

Linz, March 2017

Sworn declaration

I certify that this research thesis is the result of my own work, except as acknowledged, and has not been submitted, either in part or whole, for a degree at this or any other University.

The present thesis is identical to the document that has been submitted electronically.

Individual chapters of this cumulative thesis have been published as international conference articles and journal papers (refer to [GLG+13, GGL+14, GGL+15, GLG+16, SLG+14] for further details).

Linz, March 2017

Eidesstattliche Erklärung

Ich erkläre an Eides statt, dass ich die vorliegende Dissertation selbstständig und ohne fremde Hilfe verfasst, andere als die angegebenen Quellen und Hilfsmittel nicht benutzt bzw. die wörtlich oder sinngemäß entnommenen Stellen als solche kenntlich gemacht habe.

Die vorliegende Dissertation ist mit dem elektronisch übermittelten Textdokument identisch.

Einzelne Kapitel dieser kumulativen Dissertation wurden als Arbeiten auf internationalen Konferenzen und in Journalen publiziert. Die entsprechenden Artikel sind unter [GLG+13, GGL+14, GGL+15, GLG+16, SLG+14] im Literaturverzeichnis gelistet.

Linz, März 2017