Inductive Programming Meets the Real World
Total Page:16
File Type:pdf, Size:1020Kb
Inductive Programming Meets the Real World Sumit Gulwani José Hernández-Orallo Emanuel Kitzelmann Stephen H. Muggleton Ute Schmid Benjamin Zorn Since most end users lack programming skills they often and recently celebrated the 20th anniversary [34] of its in- spend considerable time and effort performing tedious and ception as an annual series of international conferences. repetitive tasks such as capitalizing a column of names man- Over the last decade Inductive Programming has attracted ually. Inductive Programming has a long research tradition a series of international workshops. Recent surveys [7, 19, and recent developments demonstrate it can liberate users 14] reflect the wide variety of implementations and applica- from many tasks of this kind. tions in this area. In the domain of end-user programming, programming by Key insights demonstration approaches were proposed which support the learning of small routines from observing the input behavior • Real-world applications emerge with spreadsheet of users [5, 25, 23]. The above-mentioned Microsoft Ex- tools, scripting, and intelligent program tutors. cel's sub-system Flash Fill provides an impressive illustra- • Learning from few examples is possible because users tion that program synthesis methods developed in IP can and systems share the same background knowledge. be successfully applied to gain more flexibility and power • Search is guided by domain-specific languages and for end-user programming [9, 11]. Further applications are the use of higher-order knowledge. being realized for other desktop applications (for example, PowerShell scripting in the ConvertFrom-String cmdlet [2]) Much of the world's population use computers for every- as well as for special-purpose devices such as home robots day tasks, but most fail to benefit from the power of com- and smartphones. putation due to their inability to program. Most crucially, In this paper, several of these current applications are users often have to perform repetitive actions manually be- presented. We contrast the specific characteristics of IP with cause they are not able to use the macro languages which those of typical machine learning approaches and we show are available for many application programs. Recently, a how IP is related to cognitive models of human inductive first mass-market product was presented in the form of the learning. We finally discuss recent techniques |such as use Flash Fill feature in Microsoft Excel 2013. Flash Fill al- of domain-specific languages and meta-level learning| that lows end users to automatically generate string processing widen the scope and power of IP and discuss new challenges. programs for spreadsheets from one or more user-provided examples. Flash Fill is able to learn a large variety of quite complex programs from only a few examples because of in- 1. REAL-WORLD APPLICATIONS corporation of inductive programming methods. Originally, IP was applied to synthesizing functional or Inductive Programming (IP) is an inter-disciplinary do- logic programs for general purpose tasks such as manipulat- main of research in computer science, artificial intelligence, ing data structures (e.g., sorting or reversing a list). These and cognitive science that studies the automatic synthesis of investigations showed that small programs could be synthe- computer programs from examples and background knowl- sized from a few input/output examples. The recent IT rev- edge. IP developed from research on inductive program syn- olution has created real-world opportunities for such tech- thesis, now called inductive functional programming (IFP), niques. Most of today's large number of computer users and from inductive inference techniques using logic, nowa- are non-programmers and are limited to being passive con- days termed inductive logic programming (ILP). IFP ad- sumers of the software that is made available to them. IP dresses the synthesis of recursive functional programs gener- can empower such users to more effectively leverage comput- alized from regularities detected in (traces of) input/output ers for automating their daily repetitive tasks. We discuss examples [42, 20] using generate-and-test approaches based below some such opportunities, especially in the areas of on evolutionary [35, 28, 36] or systematic [17, 29] search End-user Programming and Education. or data-driven analytical approaches [39, 6, 18, 11, 37, 24]. Its development is complementary to efforts in synthesizing 1.1 End-User Programming programs from complete specifications using deductive and End users of computational devices often need to create formal methods [8]. small (and perhaps one-off) scripts to automate repetitive ILP originated from research on induction in a logical tasks. These users can easily specify their intent using ex- framework [40, 31] with influence from artificial intelligence, amples, making IP a great fit. For instance, consider the machine learning and relational databases. It is a mature domain of data manipulation. Documents of various types, area with its own theory, implementations, and applications such as text/log files, spreadsheets, and webpages, offer their creators great flexibility in storing and organizing hierarchi- cal data by combining presentation and formatting with the underlying data model. However, this makes it extremely hard to extract the underlying data for common tasks such as data processing, querying, altering the presentation view, or transforming data to another storage format. Existing programmatic solutions to manipulating data (such as Excel macro language, regular expression libraries inside Perl/Python, and JQuery library for Javascript) have three key limitations. First, the solutions are domain-specific and require expertise in different technologies for different docu- ment types. Second, they require understanding of the entire underlying document structure including the data fields the end-user is not interested in (some of which may not even be visible in the presentation layer of the document). Third, and most significantly, they require knowledge of program- ming. As a result, users have to resort to manual copy-paste, which is both time consuming and error prone. Figure 2: Flash Fill [9]: An Excel 2013 feature that automates repetitive string transformations using exam- ples. Once the user performs one instance of the de- sired transformation (row 2, col. B) and proceeds to trans- forming another instance (row 3, col. B), Flash Fill learns Ana Trujillo a program Concatenate(ToLower(Substring(v,WordToken,1)), 357 21th Place SE \ ", ToLower(SubString(v,WordToken,2))), which extracts the Redmond, WA (757) 555-1634 Label 1 Label 2 Label 3 first two words in input string v (col. A), converts them Ana Trujillo Redmond (757) 555-1634 Antonio Moreno Antonio Moreno Renton (411) 555-2786 to lowercase, and concatenates them separated by a space 515 93th Lane Thomas Hardy Seattle (412) 555-5719 Renton, WA Christina Berglund Redmond (443) 555-6774 character, to automate the repetitive task. (411) 555-2786 Hanna Moos Puyallup (376) 555-2462 Frederique Citeaux Redmond (689) 555-2770 Thomas Hardy 742 17th Street NE (b) Seattle, WA Example sin A 1 + cos A (412) 555-5719 Problem + = 2 csc A 1 + cos A sin A Christina Berglund PairSeq SS ::= LinesMap(λx:Pair(Pos(x;p1);Pos(x;p2));LS) + 475 22th Lane Redmond, WA j StartSeqMap(λx:Pair(x; Pos(R0[x:]; p));PS) (443) 555-6774 Generalized T1A 1 ± T3A LineSeq LS ::= FilterInt(init; iter; BLS) Problem + = 2 T5A 1 ± T2A T4A Hanna Moos BoolLineSeq BLS ::= FilterBool(b; split(R0; `nn')) 785 45th Street NE Template PositionSeq PS ::= LinesMap(λx:Pos(x; p); LS) Puyallup, WA + where Ti 2 fcos; sin; tan; cot; sec; cscg (376) 555-2462 j FilterInt(init; iter; PosSeq(R0; rr)) Frederique Citeaux Pred b ::= λx:{Starts,Ends}With(r; x) j λx:Contains(r; k; x) cos A 1 − sin A 308 66th Place (c) + = 2 tan A Redmond, WA 1 − sin A cos A (689) 555-2770 1 − sin A cos A cos A 1 + sin A (a) + = 2 sec A 1 + sin A cos A Figure 1: FlashExtract [24]: A framework for extracting New cot A 1 + csc A data from documents of various kinds such as text files and Similar + = 2 sec A web pages using examples. Once the user highlights one or Problems 1 + csc A cot A two examples of each field in the textfile in (a), FlashEx- tan A 1 + sec A + = 2 csc A tract extracts more such instances and arranges them in a 1 + sec A tan A structured format in the table in (b). This is enabled by syn- sin A 1 − cos A + = 2 cot A thesis of a program in the domain-specific language (DSL) 1 − cos A sin A in (c) that is consistent with the examples in (a) followed by Figure 3: Problem generation for algebraic proof problems execution of that program on the textfile in (a). involving identities over analytic functions. A given problem is generalized into a template and valid instantiations are found by testing on random values for free variables. Inductive synthesis can help out with a variety of data ma- 1.2 Computer-aided Education nipulation tasks. These include: (a) Extracting data from Human learning and communication is often structured semi-structured documents including text files, web pages, around examples |be it a student trying to understand or and spreadsheets [24] (as in Fig. 1). (b) Transformation master a certain concept using examples, or be it a teacher of atomic data types such as strings [9] (as in Fig. 2) or trying to understand a student's misconceptions or provide numbers. Transformation of composite data types such as feedback using example behaviors. Example-based reason- tables [11] and XML [36]. (c) Formatting data [37]. Com- ing techniques developed in the inductive synthesis commu- bining these technologies in a pipeline of extraction, trans- nity can help automate several repetitive and structured formation, and formatting can allow end users to perform tasks in education including problem generation, solution sophisticated data manipulation tasks.