Giving Raw Data a Chance to Talk: A Demonstration of Exploratory Visual Analytics with a Pediatric Research Database Using Microsoft Live Labs Pivot to Promote Cohort Discovery, Research, and Quality Assessment

by Teeradache Viangteeravat, PhD, and Naga Satya V. Rao Nagisetty, MS

Abstract

Secondary use of large and open data sets provides researchers with an opportunity to address high-impact questions that would otherwise be prohibitively expensive and time consuming to study. Despite the availability of data, generating hypotheses from huge data sets is often challenging, and the lack of complex analysis of data might lead to weak hypotheses. To overcome these issues and to assist researchers in building hypotheses from raw data, we are working on a visual and analytical platform called PRD Pivot. PRD Pivot is a de-identified pediatric research database designed to make secondary use of rich data sources, such as the electronic health record (EHR). The development of visual analytics using Microsoft Live Labs Pivot makes the process of data elaboration, information gathering, knowledge generation, and complex information exploration transparent to tool users and provides researchers with the ability to sort and filter by various criteria, which can lead to strong, novel hypotheses.

Keywords: clinical research; translational research; visual analytics; research data warehouse; medical informatics; biomedical informatics

Introduction

Using a clinical research database to facilitate potential cohort discovery and recruit patients for possible future studies is not a new concept, but forming hypotheses from data sets consisting of hundreds to thousands of variables and analyzing them in an intuitive way is a very challenging and complex process.
Visual analytics can facilitate the discourse between the user and the data by providing the opportunity for visual interaction with the data in a way that can support analytical reasoning and the exploration of data from multiple perspectives. Visual analytics not only permits users to detect expected events, such as those that might be predicted by models, but also helps users discover the unexpected: surprising anomalies, changes, patterns, and relationships that can then be examined and assessed to develop new insights. In this article, we demonstrate exploratory analysis techniques using Microsoft Live Labs Pivot technology,1 a visual analytics tool that offers a fresh way to visually browse and arrange massive amounts of data (and images) online. As we show, it can be used to classify data by characteristics, such as in demographic, geographic, and neuroimaging classifications.

Perspectives in Health Information Management, Winter 2014

Literature Review

Visual analytics enhances the concept of information visualization and can be seen as an integrated approach combining visualization, human factors, and data analysis.2 The goal of visual analytics is to permit people to draw conclusions that lead to better decisions by visually representing information in a way that allows direct interaction with the data and can provide new insights. The synergy among computation, visual representation, and interactive thinking supports intensive analysis by harnessing the human visual system to support information collection, organization, and analysis, that is, the process of making sense of information.
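As a rough illustration of the direct interaction described above, the filtering and per-facet counting that a faceted browser exposes can be sketched in a few lines of Python. This is only a sketch under stated assumptions: the records, the field names (`sex`, `age_group`, `region`), and the helper functions are hypothetical and invented for illustration. They do not reflect the PRD schema or any Microsoft Live Labs Pivot API.

```python
from collections import Counter

# Hypothetical, made-up records for illustration only (not PRD data).
records = [
    {"id": 1, "sex": "F", "age_group": "0-4", "region": "West"},
    {"id": 2, "sex": "M", "age_group": "0-4", "region": "East"},
    {"id": 3, "sex": "F", "age_group": "5-9", "region": "West"},
    {"id": 4, "sex": "F", "age_group": "10-14", "region": "East"},
    {"id": 5, "sex": "M", "age_group": "5-9", "region": "East"},
]


def filter_by(rows, **facets):
    """Keep only the rows whose values match every given facet criterion."""
    return [r for r in rows if all(r.get(k) == v for k, v in facets.items())]


def facet_counts(rows, facet):
    """Summarize rows by counting how many fall under each value of one facet."""
    return Counter(r[facet] for r in rows)


def drill_down(rows, facet, **facets):
    """Filter first, then summarize the remaining rows by another facet."""
    return facet_counts(filter_by(rows, **facets), facet)


# Summary view: how many records per region.
by_region = facet_counts(records, "region")

# Drill-down: restrict to one sex, then break the subset down by region.
female_by_region = drill_down(records, "region", sex="F")
```

Here `by_region` gives the summary view, and `female_by_region` narrows it to a subset. An interactive tool performs the same two steps each time the user selects a facet value.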
Visual analytics is a multidisciplinary field3–5 that combines the methods and strengths of various research areas, including human-computer interaction, cognitive and perceptual science, decision science, information visualization, scientific visualization, geospatial visualization and analytics, databases, data mining and management, statistics, knowledge discovery and representation, and graphics and rendering. It takes advantage of humans’ ability to visually process large amounts of information at once, allowing them to apply analytical reasoning and assess, plan, and make decisions. The benefits of visual data exploration over automatic data mining techniques that use statistics or machine learning are as follows:

1. Visual analytics can easily deal with extremely heterogeneous and noisy data; it is intuitive and does not require understanding complex mathematical or statistical algorithms or parameters, and it is of great value when little is known about the data.
2. It can be used to analyze problems and find effective and efficient solutions that might elude either a machine or a human working alone.6

Geospatial visual analytics is a specialized subtype of visual analytics that supports spatial analysis and decision making through interactive visual interfaces, such as maps and other visual artifacts.7 Good online resources for learning about geospatial visual analytics include GeoAnalytics.net,8 Web GIS in Practice IX,9 and a tutorial provided by the Commission on GeoVisualization of the International Cartographic Association.10 A number of software applications and tools that can be useful in various geospatial visual analytics are offered by the GeoVISTA Center at Pennsylvania State University.11 Examples of human health, surveillance/emergency management, and epidemiology-related geospatial visual analytics applications include a web-based data system for infectious disease surveillance and management that uses movable timelines and line-list querying, in addition to other tools for aggregating and stratifying data.12 Google Public Data Explorer is a powerful visualization tool for exploring, visualizing, and sharing data in a Gapminder-like manner.13, 41 Data sets from providers such as World Bank and the US Centers for Disease Control and Prevention (CDC), including data sets that are directly related to human health, such as infectious disease outbreaks, sexually transmitted diseases, mortality, and cancer, are available to explore in Google Public Data Explorer.14

With the recent expansion of web technologies and increased network performance,15 it has become feasible to deliver massive image collections to translational researchers and clinician-scientists, who can analyze, interpret, and possibly even make diagnoses from these distributed, networked image collections. Given the recent advancements in web service technologies, a basic component to be considered when developing distributed image portals for viewing massive image collections is the ability to efficiently interact with and effectively search large amounts of data to answer multidimensional analytical queries, along with the ability to augment the data with pertinent experiential knowledge. Several centers and the National Institutes of Health (NIH) have invested heavily in individual image-data storage and retrieval systems.
The Biomedical Informatics Research Network (BIRN) system extracts or retrieves and then transmits images from a source,16 while the cancer Biomedical Informatics Grid (caBIG) managed oncology and radiology images from multiple sources through its web servers.17 Notable projects using the National Biomedical Imaging Archive (NBIA) database are the Reference Image Database to Evaluate Response (RIDER),18 the Lung Image Database Consortium (LIDC),19 and the Virtual Colonoscopy Collection.20 The NBIA is an online image repository tool that aims to improve the use of imaging to increase the efficiency of cancer detection, diagnosis, and therapeutic response, and to improve clinical decision support.21 Waxholm Space is a conceptual and physical atlas space developed by the International Neuroinformatics Coordinating Facility to serve as a framework for registering and spatially relating neuroanatomical and physiological data, as well as to facilitate data sharing in neuroscience.
Currently, researchers can use Waxholm Space to query the spatial location of their own images and retrieve structure names, gene expression, and other data associated with the user-defined point of interest in resources such as the Allen Brain Atlas.22 The Allen Brain Atlas, a genome-wide image database collection, uses an interactive, web-based platform to present a comprehensive online resource for the exploration of mouse and human brain research.23 BrainMaps is an NIH-funded project developed to serve as an online, interactive digital atlas of massive, high-resolution scanned brain structure images for research and didactic purposes.24 The Mouse Brain Library and the WebQTL databases provide huge collections of mouse brain structure data for studies of function behavior and genetic control.25, 26

Here, we demonstrate the addition of a visual analytics layer called PRD Pivot to our clinical research database using Microsoft Live Labs Pivot technology,27 a free tool that offers a novel way to examine and arrange huge amounts of clinical data online. This added layer enables data visualization and the ability to drill down (moving from summary to detailed data) by filtering and sorting information in electronic databases, leading to the discovery of patterns and relationships that would otherwise not be apparent. The visual analytics layer would obviously serve as a research tool for users