Interactive Program Distillation
Andrew Head
Electrical Engineering and Computer Sciences
University of California at Berkeley
Technical Report No. UCB/EECS-2020-48
http://www2.eecs.berkeley.edu/Pubs/TechRpts/2020/EECS-2020-48.html
May 15, 2020

Copyright © 2020, by the author(s). All rights reserved.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission.

Interactive Program Distillation
by
Andrew Head

A dissertation submitted in partial satisfaction of the requirements for the degree of Doctor of Philosophy in Computer Science in the Graduate Division of the University of California, Berkeley

Committee in charge:
Associate Professor Björn Hartmann, Co-chair
Professor Marti A. Hearst, Co-chair
Professor Koushik Sen
Assistant Professor Joshua Blumenstock

Spring 2020

Interactive Program Distillation
Copyright 2020
by
Andrew Head

Abstract

Interactive Program Distillation
by
Andrew Head
Doctor of Philosophy in Computer Science
University of California, Berkeley
Associate Professor Björn Hartmann, Co-chair
Professor Marti A. Hearst, Co-chair

From snippets to tutorials, programmers rely on sample programs to learn and get work done. The process of creating sample programs, however, can be demanding, limiting the dissemination of programming knowledge. To enhance this process, we introduce the concept of program distillation, methods for its implementation, and usability studies verifying its power. Program distillation is the tool-assisted transformation of existing programs into simpler ones, where key ideas are emphasized, and cruft has been removed.
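As a rough illustration of one program-analysis technique that supports this kind of transformation, the sketch below performs backward slicing: it keeps only the code that a chosen result depends on. It is a minimal sketch, not the implementation of any tool described here. It assumes, hypothetically, that each notebook cell's defined and used variable names are already known (the `Cell` class and `backward_slice` function are illustrative names), and it ignores in-place mutation and other side effects that a real slicer has to model.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Cell:
    """Hypothetical notebook cell with precomputed variable defs and uses."""
    source: str
    defs: set[str] = field(default_factory=set)
    uses: set[str] = field(default_factory=set)


def backward_slice(cells: list[Cell], target: int) -> list[int]:
    """Return the indices of cells that the target cell transitively depends on.

    Walks the execution history backward from the target and keeps a cell
    only if it defines a variable that some already-kept cell still needs.
    Assumes defs/uses are exact; mutation and side effects are ignored.
    """
    needed = set(cells[target].uses)
    kept = [target]
    for i in range(target - 1, -1, -1):
        cell = cells[i]
        if cell.defs & needed:
            kept.append(i)
            # This (most recent) definition satisfies the need; the cell's
            # own inputs become new needs. A cell like `x = x + 1` removes
            # and then re-adds `x`, so an earlier definition is still kept.
            needed -= cell.defs
            needed |= cell.uses
    return sorted(kept)


if __name__ == "__main__":
    cells = [
        Cell("import pandas as pd", defs={"pd"}),
        Cell("df = pd.read_csv('data.csv')", defs={"df"}, uses={"pd"}),
        Cell("df.head()", uses={"df"}),
        Cell("clean = df.dropna()", defs={"clean"}, uses={"df"}),
        Cell("scratch = 42", defs={"scratch"}),
        Cell("clean.plot()", uses={"clean"}),
    ]
    print(backward_slice(cells, 5))  # → [0, 1, 3, 5]
```

Walking backward lets the most recent definition of each variable satisfy a need, so exploratory cells such as `df.head()` and `scratch = 42` drop out of the slice, as do older, overwritten definitions that nothing still depends on.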
Three interactive tools are introduced for distilling code snippets, notebooks, and tutorials. Each tool contributes novel interactions grounded in proven program analysis techniques. CodeScoop helps programmers extract snippets from existing code through interactive program slicing and simplification. Code gathering tools let a programmer extract subsets of cells from a computational notebook that reproduce key results. And Torii provides a live programming experience for creating output-rich multi-step tutorials. Studies with users reveal that these tools satisfy important needs, support efficient sample program creation, and provide a level of expressiveness not yet available in today’s standard tools.

To Anna, whose distilled knowledge would fill many dissertations, each of them worth reading.

Table of Contents

Table of Contents
List of Figures
List of Tables
Preface
Acknowledgments
1 Introduction
  Purpose and thesis statement
  An overview of this dissertation
    Summary of contributions
    Research methodology
  Statement of prior publication
2 Background: The design of sample programs
  Terms
  How do programmers read programs?
    Reading order
    Building mental models of programs
    Program design choices and their impact on readability
  How are sample programs used?
    Why programmers use sample programs
    The process of finding and using samples
  What makes a sample program effective?
    Code snippet design
    Tutorial design
  How do authors distill sample programs?
    The quality of sample programs today
  Summary
3 Related work
  Tools for authoring sample programs
    Automated generation of sample programs
    Literate programming
    Multi-stage sample authoring
  Other tools that could support program distillation
    Efficient code selection
    Cleaning programs
    Linked edits to programs, documentation, and outputs
    Automated program explanation
  A design space for program distillation tools
    This dissertation in the design space
4 Snippet distillation: Mixed-initiative code selection and simplification
  Motivation
  Formative study
    Method
    Results
  Design motivations
  A demo of CodeScoop
    Prologue: An unexpectedly useful programming pattern
    First steps: Initial text selections
    Mixed-initiative dialogue: Completing the example
  Implementation
    Code extraction with the “Flag-Suggest-Resolve” workflow
    Detecting errors and relevant code
    Suggesting fixes and code additions
    Applying fixes to the scoop
    Generating an example program from the “scoop” data structure
    Implementation specifics and limitations
  In-lab usability study
    Method
    Results
    Conclusions
  Limitations and extensions
5 Notebook distillation: Cleaning messy computational notebooks
  Motivation
  Design motivations
  A demo of code gathering tools
    Prologue: A proliferation of cells
    Finding the code that produces a result
    Removing old and distracting analysis code
    Reviewing versions of a result and the code that produced them
    Cleaning finished analysis code
    Exporting analysis code to a standalone script
  Implementation
    Collecting and slicing an execution log
  In-lab usability study
    Method
    Results
    Conclusions
  Limitations and extensions
6 Tutorial distillation: Flexible sequencing of snippets
  Motivation
  Formative study I: Interviews with tutorial authors
    Method
    Results
  Formative study II: Content analysis of two-hundred tutorials
    Method
    Results
  A demo of Torii
    Propagating edits from snippets to source programs
    Propagating edits from code to outputs
    Splitting, reordering, and copying code
    Reviewing a simulated reader’s code
    Making localized changes to the code
    Distributing augmented tutorials
  In-lab usability study
    Method
    Results
    Conclusions
  Limitations and extensions
7 Conclusions
  Summary of findings
    Claim I. Four interactive functions
    Claim II. Implementation with proven program analysis techniques
    Claim III. Effective and flexible user experience
  Remaining challenges and future directions
    Mixed-initiative program synthesis
    Authoring tools for explorable tutorials
    Natural language generation
    The distillation of scientific discourse and beyond
  Closing remarks: Humans, compilers, and creativity
Bibliography

List of Figures

0.1 A snippet from the TeX program
1.1 An intricate, hand-crafted programming tutorial
1.2 Interactive program distillation tools
2.1 Four stages of program reading
3.1 Classic techniques for presenting programs
3.2 A workflow for extracting sample programs from existing programs
3.3 An automatically-generated sample program
3.4 A flow diagram of a sample usage of a mobile app
3.5 A section of a WEB program and the document generated from it
3.6 A schematic of a computational notebook
3.7 Types of messes in computational notebooks
3.8 A guided tour of a program
3.9 Interactive assistance for repairing sample programs
3.10 Linked edits of source code clones
3.11 A design space of distillation tools, explored
4.1 Extracting example code from existing code with CodeScoop
4.2 Tool recommendations for improving example extraction
4.3 A workflow for iterative correction of incorrect example code
4.4 Suggesting fixes and code that complete a