Program Synthesis for Empowering End Users and Stress-Testing Compilers
Program Synthesis for Empowering End Users and Stress-Testing Compilers

By
VU MINH LE
B.Eng. (University of Technology, Vietnam National University) 2006
M.Sc. (University of California, Davis) 2011

DISSERTATION
Submitted in partial satisfaction of the requirements for the degree of
DOCTOR OF PHILOSOPHY
in
Computer Science
in the
OFFICE OF GRADUATE STUDIES
of the
UNIVERSITY OF CALIFORNIA
DAVIS

Approved:
Zhendong Su, Chair
Sumit Gulwani
Premkumar Devanbu
Committee in Charge
2015

Copyright © 2015 by Vu Minh Le
All rights reserved.

Vu Minh Le
September 2015
Computer Science

Program Synthesis for Empowering End Users and Stress-Testing Compilers

Abstract

Since 2012, the number of people with access to computing devices has exploded. This trend is due to these devices’ falling prices and their increased functionality and programmability. Two challenges arise from this trend: (1) how to assist the increasing number of device owners, most of whom are non-programmers yet hope to take full advantage of their devices, and (2) how to make critical software, which is relied upon by other software running on these devices, more reliable.

Our vision is that program synthesis is the solution to these challenges. This dissertation describes various program synthesis techniques that synthesize programs from natural language, input/output examples, and existing programs to tackle these challenges.

We present SmartSynth, a natural language interface that synthesizes smartphone programs from natural language descriptions. SmartSynth enables users to automate their repetitive tasks by leveraging various phone sensors. It is the first system that combines the advances in both natural language processing (NLP) and program synthesis: it uses NLP techniques to parse a given command, and applies program synthesis techniques to resolve parsing ambiguities.
We have adapted and extended SmartSynth’s algorithms to integrate it into TouchDevelop, a popular touch-based programming environment for end users.

We develop FlashExtract, a system that extracts data from semi-structured documents (such as text files, webpages, and spreadsheets) using examples. Natural language does not work well in this domain because the tasks are complicated and users usually do not know how to perform them. In FlashExtract, users only need to highlight some sample regions to be extracted, and the system will learn a program to select similar regions automatically. They can also provide nested examples to extract structured, hierarchical data. FlashExtract has been shipped as the cmdlet ConvertFrom-String of PowerShell in Microsoft Windows 10 and as the Custom Fields feature in Microsoft Azure Operational Insights.

While it is important to make programming accessible to end users, it is also vital to improve the correctness of critical software because it impacts other software products. We introduce Equivalence Modulo Inputs (EMI), a novel, general methodology to synthesize valid compiler test programs from existing test cases. Given a test program and a set of its inputs, we profile the program’s execution over the inputs. We then generate new test variants by randomly removing code that is unexecuted under the provided inputs. The expectation is that these new variants should behave exactly the same as the original program under the same inputs; any observed discrepancy indicates a compiler bug. Our technique is simple to realize yet very effective. In total, we have reported more than 400 bugs in GCC and LLVM, most of which have already been fixed. Many compiler vendors have adopted EMI to test their compilers.

We also believe that cross-disciplinary solutions can increase the impact of program synthesis techniques. We therefore further explore program synthesis in the following three dimensions.
First, we introduce two novel user interaction models to help users resolve ambiguities in programming-by-example systems like FlashExtract. One model allows users to effectively navigate among the large set of programs consistent with the provided examples. The other model uses active learning to ask questions to help differentiate the outputs of these programs. Second, we present a guided, advanced mutation strategy based on Bayesian optimization to synthesize EMI variants more effectively. Our improved technique supports both code deletions and insertions in the unexecuted regions, and uses Markov Chain Monte Carlo (MCMC) optimization to guide the synthesis process. Finally, we apply program synthesis to find bugs in a new domain: the link-time optimizer components in compilers.

To Minhon and Koala.

CONTENTS

List of Figures
List of Tables
Acknowledgments
1 Introduction
2 SmartSynth: Synthesizing Smartphone Programs from Natural Language
  2.1 Motivating Example
  2.2 Automation Script Language
    2.2.1 Language Features
    2.2.2 Essential Components
  2.3 Synthesis Algorithm
    2.3.1 Component Discovery
    2.3.2 Script Discovery
    2.3.3 SmartSynth
  2.4 Evaluation
    2.4.1 Setup
    2.4.2 Experimental Results
    2.4.3 Limitations
3 FlashExtract: Synthesizing Data-Extraction Program from Examples
  3.1 Motivating Examples
  3.2 User Interaction Model
    3.2.1 Output Schema
    3.2.2 Regions
    3.2.3 Schema Extraction Program
    3.2.4 Example-based User Interaction
  3.3 Inductive Synthesis Framework
    3.3.1 Data Extraction DSLs
    3.3.2 Core Algebra for Constructing Data Extraction DSLs
    3.3.3 Modular Synthesis Algorithms
    3.3.4 Correctness
  3.4 Instantiations
    3.4.1 Text Instantiation
    3.4.2 Webpage Instantiation
    3.4.3 Spreadsheet Instantiation
  3.5 Evaluation
    3.5.1 Setup
    3.5.2 Experimental Results
4 EMI: Synthesizing Compiler Test Programs from Existing Programs
  4.1 Illustrating Examples
  4.2 Equivalence Modulo Inputs
    4.2.1 Definition
    4.2.2 Differential Testing with EMI Variants
  4.3 Implementation of Orion
    4.3.1 Extracting Coverage Information
    4.3.2 Generating EMI Variants
  4.4 Evaluation
    4.4.1 Testing Setup
    4.4.2 Quantitative Results
    4.4.3 Assorted Bug Samples Found by Orion
    4.4.4 Remarks
5 Other Dimensions of Program Synthesis
  5.1 New User Interaction Models: Program Navigation and Conversational
    5.1.1 FlashProg’s User Interface
    5.1.2 Illustrative Scenario
    5.1.3 Implementation
    5.1.4 Evaluation
  5.2 New Technique: Guided Program Synthesis
    5.2.1 Illustrative Examples
    5.2.2 Background on Markov Chain Monte Carlo
    5.2.3 MCMC Bug Finding
    5.2.4 Implementation
    5.2.5 Evaluation
  5.3 New Domain: Stress-Testing Link-Time Optimizers
    5.3.1 Illustrative Examples
    5.3.2 Design and Realization
    5.3.3 Evaluation
6 Related Work
  6.1 Natural Language Understanding
  6.2 Program Synthesis
  6.3 Data Extraction
  6.4 Interactive User Interfaces
  6.5 Compiler Testing and Verification
  6.6 Markov Chain Monte Carlo Sampling
7 Conclusion and Future Work

LIST OF FIGURES

2.1 SmartSynth’s system architecture.
2.2 A visual program for Example 1 written in AppInventor.
2.3 The graph showing all possible dataflow relations among components identified by the Component Discovery algorithms.
2.4 The script for Example 1.
2.5 The syntax of the automation language. x, i, r, and l refer to a temporary variable, an argument of the script, a return value, and a literal, respectively. The essential components identified by the NLP techniques are underlined.
2.6 A region in the description that cannot be mapped to any component. SmartSynth reports its phrase to the user to clarify.
2.7 The result of mapping phrases to components under increasingly sophisticated configurations.
2.8 The number of dataflow relations detected by the rule-based algorithm for various tasks grouped by relation number.
2.9 The number of descriptions whose dataflow relations are entirely detected by the rule-based algorithm (total 487) vs. those that require completion by the completion algorithm (total 153), grouped by the number of relations.
3.1 Extracting data from a text file using FlashExtract.
3.2 Extracting data from a Google Scholar webpage.
3.3 Extracting data from a semi-structured spreadsheet.
3.4 The language of schema for extracted data.
3.5 Semantics of Fill.
3.6 The syntax of Ltext, the DSL for extracting text files.
3.7 The syntax of Lweb, the DSL for extracting webpages. Definitions of p and rr are similar to those in Figure 3.6.
3.8 The syntax of Lsps, the DSL for extracting spreadsheets.
3.9 Average number of examples (solid/white bars represent positive/negative instances) across various fields for each document.
3.10 Average learning time of the last interaction across various fields for each document.
4.1 Orion transforms 4.1a to 4.1b and uncovers a miscompilation in Clang.
4.2 Reduced version of the code in Figure 4.1b for bug reporting. (http://llvm.org/bugs/show_bug.cgi?id=14972)
4.3 GCC miscompiles this program to an infinite loop instead of immediately terminating with no output. (http://gcc.gnu.org/bugzilla/show_bug.cgi?id=58731)
4.4 Affected versions of GCC and LLVM (lighter bars: confirmed bugs; darker bars: fixed bugs).
4.5 Affected optimization levels of GCC and LLVM (lighter bars: confirmed bugs; darker bars: fixed bugs).
4.6 Affected components of GCC and LLVM (lighter bars: confirmed bugs; darker bars: fixed bugs).