©Copyright 2018 Kanit Wongsuphasawat

Augmenting Exploratory Data Analysis with Visualization Recommendation

Kanit Wongsuphasawat

A dissertation submitted in partial fulfillment of the requirements for the degree of

Doctor of Philosophy

University of Washington

2018

Reading Committee:

Jeffrey Heer, Chair

Bill Howe

Jock Mackinlay

Program Authorized to Offer Degree:

Computer Science & Engineering

University of Washington

Abstract

Augmenting Exploratory Data Analysis with Visualization Recommendation

Kanit Wongsuphasawat

Chair of the Supervisory Committee:

Professor Jeffrey Heer

Paul G. Allen School of Computer Science & Engineering

Exploratory data analysis is one of the key activities for understanding and discovering new insights from data. As exploratory data analysis can involve both open-ended exploration and focused question answering, analysis tools should facilitate both exploration breadth and analysis depth. However, existing data exploration tools typically require manual specification, which can be tedious and prevent analysts from rapidly exploring different aspects of the data. Moreover, analysts may be blindsided by their own cognitive biases and prematurely fixate on specific questions or hypotheses. Without discipline and time, analysts may overlook important insights in the data, such as potentially confounding factors and data quality issues, and produce inaccurate results in their analyses.

To help analysts perform rapid and systematic data exploration, this dissertation presents the design of mixed-initiative systems that complement manual chart specification with chart recommendation. To better understand the practice and challenges of exploratory data analysis, we first conduct an interview study with 18 data analysts. From the interview data, we characterize the goals, process, and challenges of exploratory data analysis. We then identify design opportunities for exploratory analysis tools. One major opportunity is facilitating rapid and systematic exploration with automation and guidance. The rest of the dissertation addresses this opportunity by contributing a stack of systems to augment exploratory analysis tools with chart recommendation.

At the foundation of this stack, we introduce new formal languages for chart specification and recommendation. The Vega-Lite visualization grammar provides a representation for specifying and reasoning about charts. Building on Vega-Lite, the CompassQL query language combines partial chart specification with recommendation directives to provide a generalizable framework for chart recommendation via queries over the space of visualizations.

Based on these foundations, we use an iterative design process to develop and study new recommendation-powered visual data exploration tools. Voyager enables data exploration via browsing of recommended charts, while allowing users to steer the recommendations by selecting data fields and transformations. Our user study, which compares Voyager with a traditional chart authoring tool, indicates the complementary benefits of manual authoring and recommendation browsing. Inspired by the study results, Voyager 2 blends manual and automated chart authoring in a single tool to facilitate rapid and systematic data exploration while preserving users’ flexibility to directly author a broad range of charts.

All of these systems have been released as open-source projects and adopted by both research and professional data science communities.

Table of Contents

List of Figures

Chapter 1: Introduction

1.1 Contributions

1.2 Outline

1.3 Prior Publications and Authorship

Chapter 2: Goals, Process, and Challenges of