
A Machine Learning Framework for Programming by Example

Aditya Krishna Menon, [email protected]
University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093

Omer Tamuz, [email protected]
Faculty of Mathematics and Computer Science, The Weizmann Institute of Science, Rehovot, Israel

Sumit Gulwani, [email protected]
Microsoft Research, One Microsoft Way, Redmond, WA 98052

Butler Lampson, [email protected]
Microsoft Research, One Memorial Drive, Cambridge, MA 02142

Adam Tauman Kalai, [email protected]
Microsoft Research, One Memorial Drive, Cambridge, MA 02142

Proceedings of the 30th International Conference on Machine Learning, Atlanta, Georgia, USA, 2013. JMLR: W&CP volume 28. Copyright 2013 by the author(s).

Abstract

Learning programs is a timely and interesting challenge. In Programming by Example (PBE), a system attempts to infer a program from input and output examples alone, by searching for a composition of some set of base functions. We show how machine learning can be used to speed up this seemingly hopeless search problem, by learning weights that relate textual features describing the provided input-output examples to plausible sub-components of a program. This generic learning framework lets us address problems beyond the scope of earlier PBE systems. Experiments on a prototype implementation show that learning improves search and ranking on a variety of text processing tasks found on help forums.

1. Introduction

An interesting challenge is that of learning programs, a problem with very different characteristics to the more well-studied goal of learning real-valued functions (regression). Our practical motivation is Programming by Example (PBE) (Lieberman, 2001; Cypher et al., 1993), where an end user provides a machine with examples of a task she wishes to perform, and the machine infers a program to accomplish this. The study of PBE is timely; the latest version of Microsoft Excel ships with "Flash Fill," a PBE algorithm for simple string manipulation in spreadsheets, such as splitting and concatenating strings (Gulwani, 2011). We are interested in using learning for PBE to enable richer classes of programs such as, for example, text-processing tasks that might normally be accomplished in a programming language such as PERL or AWK.

Learning programs poses two key challenges. First, users give very few examples per task. Hence, one must impose a strong bias, learned across multiple tasks. In particular, say for task $t$ the user provides $n_t$ examples of pairs of strings $(\bar{x}_1^{(t)}, \bar{y}_1^{(t)}), \ldots, (\bar{x}_{n_t}^{(t)}, \bar{y}_{n_t}^{(t)})$ that demonstrate the task (in our application we will generally take $n_t = 1$). There are infinitely many consistent functions $f$ such that $f(\bar{x}_i^{(t)}) = \bar{y}_i^{(t)}$. Hence, one requires a good ranking over such $f$, learned across the multiple tasks. Simply choosing the shortest consistent program can be improved upon using learning. For one, the shortest program mapping $\bar{x}$ to $\bar{y}$ may very well be the trivial constant function $f(x) = \bar{y}$.

Second, and perhaps even more pressing, is searching over arbitrary compositions of functions for consistent candidate functions. In many cases, finding any (nontrivial) consistent function can be a challenge, let alone the "best" under some ranking. For any sufficiently rich set of functions, this search problem defies current search techniques, such as convex optimization, dynamic programming, or those used in Flash Fill. This is because program representations are wildly unstable: a small change to a program can completely change the output. Hence, local heuristics that rely on progress in terms of some metric, such as edit distance, will be trapped in local minima.

We introduce a learning-based framework to address both the search and ranking problems. Of particular interest, machine learning speeds up search (inference). This is unlike earlier work on string processing using PBE, which restricted the types of programs that could be searched through so that efficient search would be possible using so-called version space algebras (Lau et al., 2000). It is also unlike work in structured learning that uses dynamic programming for efficient search. Instead, the types of programs we can handle are more general than those of earlier systems such as SMARTedit (Lau et al., 2000), LAPIS (Miller, 2002), Flash Fill (Gulwani, 2011), and others (Nix, 1985; Witten & Mo, 1993). In addition, our approach is more extensible and broadly applicable than earlier approaches, though it is not as efficient as the version space algebras previously used.

How can one improve on brute-force search for combining functions from a general library? In general, it seems impossible to speed up search if all one can tell is whether a program correctly maps example $\bar{x}$ to $\bar{y}$. The key idea in our approach is to augment the library with certain telling textual features on example pairs. These features suggest which functions are more likely to be involved in the program. As a simple example beyond the scope of earlier PBE systems, consider sorting a list of names by last name. Specifically, say the user gives just one example pair: $\bar{x}$ consists of a list of four or five names, one per line, where each line is in the form FirstName LastName, and $\bar{y}$ is the same list sorted by last name. One feature of $(\bar{x}, \bar{y})$ is that the lines are permutations of one another, which is a clue that the sorting function may be useful in the desired program. No such clue is perfectly reliable. However, we learn reliability weights of these clues, which dramatically speeds up search and inference. In more complex examples, such as that in Figure 1, which as a program tree has 10 elements, clues dramatically speed up search.

The contributions of this work are:

• Proposing a framework for learning general programs that speeds up search (inference) and is also used for ranking. Previous work addressed ranking (Liang et al., 2010) and used restricted classes of programs that could be efficiently searched but not extended.
• Proposing the use of features relating input and output examples in PBE. Previous work did not use example features: Liang et al. (2010) use features of the target programs in the training set.
• Advancing the state of the art in PBE by introducing a general, extensible framework that can be applied to tasks beyond the scope of earlier systems (as discussed shortly). Indeed, while the learning component is discussed in the context of text processing, the approach could possibly be adapted for different domains.
• Demonstrating the effectiveness of this approach with experiments on a prototype system.

To clarify matters, we step through a concrete example of our system's operation.

1.1. Example of our system's operation

Imagine a user has a long list of names with some repeated entries (say, the Oscar winners for Best Actor), and would like to create a list of the unique names, each annotated with its number of occurrences. Following the PBE paradigm, in our system the user illustrates the operation by providing an example, which is an input-output pair of strings. Figure 1 shows one possible such pair, which uses a subset of the full list (in particular, the winners from '91–'95) the user possesses.

    Anthony Hopkins          Anthony Hopkins (1)
    Al Pacino                Al Pacino (1)
    Tom Hanks         →      Tom Hanks (2)
    Tom Hanks                Nicolas Cage (1)
    Nicolas Cage

Figure 1. Input-output example for the desired task.

One way to perform the above transformation is to first generate an intermediate list where each element of the input list is appended with its occurrence count, which would look like ["Anthony Hopkins (1)", "Al Pacino (1)", "Tom Hanks (2)", "Tom Hanks (2)", "Nicolas Cage (1)"], and then remove duplicates. The corresponding program $f(\cdot)$ may be expressed as the composition

    f(x) = dedup(concatLists(x, " ", concatLists("(", count(x, x), ")")))    (1)

The argument x here represents the list of input lines that the user wishes to process, which may be much larger than the input provided in the example. We assume here a base language comprising (among others) a function dedup that removes duplicates from a list, concatLists that concatenates lists of strings elementwise, implicitly expanding singleton arguments, and count that finds the number of occurrences of the elements of one list in another.

While conceptually simple, this example is out of scope for existing text-processing PBE systems. Most systems support a restricted, pre-defined set of functions that does not include natural tasks like removing duplicates; for example, Gulwani (2011) only supports functions that operate on a line-by-line basis. These systems perform inference with search routines that are hand-coded for their supported functionality, and are thus not easily extensible. (Even if an exception could be made for specific examples like the one above, there are countless other text-processing applications we would like to solve.)

Notice that certain textual features can help bias our search.

Table 1. Example of grammar rules generated for the task in Figure 1.

    Production                           Probability
    P     → join(LIST, DELIM)            1
    LIST  → split(x, DELIM)              0.3
    LIST  → concatList(CAT, CAT, CAT)    0.1
    LIST  → concatList("(", CAT, ")")    0.2
    LIST  → dedup(LIST)                  0.2
    LIST  → count(LIST, LIST)            0.2
    CAT   → LIST                         0.7
    CAT   → DELIM                        0.3
    DELIM → "\n"                         0.5
    DELIM → ""                           0.3
    DELIM → "("                          0.1
    DELIM → ")"                          0.1

Table 1 shows a snippet of the grammar our prototype system generates, which is based on a large library of about 100 features and clues. With the full grammar, a naive brute-force search over compositions takes 30 seconds to find the right solution to the example of Figure 1, whereas with learning the search terminates in just 0.5 seconds.
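To make the permutation clue from the introduction concrete, a feature of this kind can be computed in a few lines. The sketch below is our own illustration (the function name and toy example are not from the paper): it fires when the output's lines are a rearrangement of the input's lines, suggesting that a sorting function belongs in the program.

```python
def lines_are_permutations(x, y):
    """Clue: the output's lines are a permutation of the input's lines,
    hinting that a sorting (or reordering) function may be involved."""
    return sorted(x.splitlines()) == sorted(y.splitlines())

# Toy example pair: names sorted by last name (Hanks < Hopkins < Pacino).
x = "Tom Hanks\nAl Pacino\nAnthony Hopkins"
y = "Tom Hanks\nAnthony Hopkins\nAl Pacino"
print(lines_are_permutations(x, y))  # True
```

As the paper notes, no such clue is perfectly reliable: a shuffled output would also trigger it, which is why reliability weights are learned rather than hand-set.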
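The semantics of the composition in Eq. (1) can be made concrete with a small executable rendering of the three base functions. The implementations below are our own guesses at the intended behavior (the paper gives only prose descriptions); in particular, the broadcasting of singleton arguments in concatLists follows the phrase "implicitly expanding singleton arguments".

```python
def dedup(lst):
    # Remove duplicates, preserving first-occurrence order.
    seen, out = set(), []
    for s in lst:
        if s not in seen:
            seen.add(s)
            out.append(s)
    return out

def concatLists(*args):
    # Elementwise string concatenation; bare strings are treated as
    # singletons and broadcast to the length of the list arguments.
    n = max((len(a) for a in args if isinstance(a, list)), default=1)
    expand = lambda a: a if isinstance(a, list) else [a] * n
    return ["".join(parts) for parts in zip(*map(expand, args))]

def count(a, b):
    # For each element of list a, its number of occurrences in list b.
    return [str(b.count(s)) for s in a]

def f(x):
    # Eq. (1): annotate each line with its count, then drop duplicates.
    return dedup(concatLists(x, " ", concatLists("(", count(x, x), ")")))

names = ["Anthony Hopkins", "Al Pacino", "Tom Hanks", "Tom Hanks", "Nicolas Cage"]
print(f(names))
# ['Anthony Hopkins (1)', 'Al Pacino (1)', 'Tom Hanks (2)', 'Nicolas Cage (1)']
```

Running f on the five input lines of Figure 1 reproduces the four output lines, matching the intermediate-list derivation described above.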
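The probabilities in Table 1 can bias enumeration toward likely programs by expanding high-probability productions first. The sketch below is our own illustration, not the paper's actual search procedure: it runs a best-first expansion over a toy fragment of the grammar, scoring each partial program by the product of the probabilities of the productions used to derive it.

```python
import heapq
import itertools

# Toy fragment of Table 1's weighted grammar (probabilities from the
# table; the start symbol and search strategy here are illustrative).
GRAMMAR = {
    "LIST":  [("split(x, DELIM)", 0.3), ("dedup(LIST)", 0.2),
              ("count(LIST, LIST)", 0.2)],
    "DELIM": [('"\\n"', 0.5), ('""', 0.3)],
}

def leftmost_nonterminal(prog):
    # Return (index, name) of the first nonterminal in prog, or None.
    hits = [(prog.find(nt), nt) for nt in GRAMMAR if nt in prog]
    return min(hits) if hits else None

def enumerate_programs(start="LIST", limit=3):
    # Best-first search: always expand the most probable partial program.
    tie = itertools.count()  # tiebreaker for deterministic heap order
    heap = [(-1.0, next(tie), start)]
    complete = []
    while heap and len(complete) < limit:
        neg_p, _, prog = heapq.heappop(heap)
        hit = leftmost_nonterminal(prog)
        if hit is None:          # no nonterminals left: a full program
            complete.append(prog)
            continue
        i, nt = hit
        for rhs, p in GRAMMAR[nt]:
            new_prog = prog[:i] + rhs + prog[i + len(nt):]
            heapq.heappush(heap, (neg_p * p, next(tie), new_prog))
    return complete

print(enumerate_programs("LIST", 3))  # most probable programs first
```

A learned grammar concentrates probability mass on productions suggested by the example's textual features, so consistent programs like Eq. (1) surface far earlier than under uniform brute-force enumeration.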