Inductive Programming Meets the Real World
Total Page:16
File Type:pdf, Size:1020Kb
Inductive Programming Meets the Real World Sumit Gulwani José Hernández-Orallo Emanuel Kitzelmann Microsoft Corporation, Universitat Politècnica de Adam-Josef-Cüppers [email protected] València, Commercial College, [email protected] [email protected] Stephen H. Muggleton Ute Schmid Benjamin Zorn Imperial College London, UK. University of Bamberg, Microsoft Corporation, [email protected] Germany. [email protected] [email protected] Since most end users lack programming skills they often thesis, now called inductive functional programming (IFP), spend considerable time and effort performing tedious and and from inductive inference techniques using logic, nowa- repetitive tasks such as capitalizing a column of names man- days termed inductive logic programming (ILP). IFP ad- ually. Inductive Programming has a long research tradition dresses the synthesis of recursive functional programs gener- and recent developments demonstrate it can liberate users alized from regularities detected in (traces of) input/output from many tasks of this kind. examples [42, 20] using generate-and-test approaches based on evolutionary [35, 28, 36] or systematic [17, 29] search Key insights or data-driven analytical approaches [39, 6, 18, 11, 37, 24]. Its development is complementary to efforts in synthesizing • Supporting end-users to automate complex and programs from complete specifications using deductive and repetitive tasks using computers is a big challenge formal methods [8]. for which novel technological breakthroughs are de- ILP originated from research on induction in a logical manded. framework [40, 31] with influence from artificial intelligence, • The integration of inductive programing techniques machine learning and relational databases. It is a mature in software applications can support users by learning area with its own theory, implementations, and applications domain specific programs from observing interactions and recently celebrated the 20th anniversary [34] of its in- of the user with the system. ception as an annual series of international conferences. • Inductive programming is being transferred to real- Over the last decade Inductive Programming has attracted world applications such as spreadsheet tools, script- a series of international workshops. Recent surveys [7, 19, ing, and intelligent program tutors. 14] reflect the wide variety of implementations and applica- • In contrast to standard machine learning, in induc- tions in this area. tive programming learning from few examples is pos- In the domain of end-user programming, programming by sible because users and systems share the same back- demonstration approaches were proposed which support the ground knowledge. learning of small routines from observing the input behavior • The efficient induction of small but conceptually of users [5, 25, 23]. The above-mentioned Microsoft Ex- complex programs becomes possible because search cel's sub-system Flash Fill provides an impressive illustra- is guided by domain-specific languages and the use tion that program synthesis methods developed in IP can of higher-order knowledge. be successfully applied to gain more flexibility and power for end-user programming [9, 11]. Further applications are Much of the world's population use computers for every- being realized for other desktop applications (for example, day tasks, but most fail to benefit from the power of com- PowerShell scripting in the ConvertFrom-String cmdlet [2]) putation due to their inability to program. Most crucially, as well as for special-purpose devices such as home robots users often have to perform repetitive actions manually be- and smartphones. cause they are not able to use the macro languages which In this paper, several of these current applications are are available for many application programs. Recently, a presented. We contrast the specific characteristics of IP with first mass-market product was presented in the form of the those of typical machine learning approaches and we show Flash Fill feature in Microsoft Excel 2013. Flash Fill al- how IP is related to cognitive models of human inductive lows end users to automatically generate string processing learning. We finally discuss recent techniques |such as use programs for spreadsheets from one or more user-provided of domain-specific languages and meta-level learning| that examples. Flash Fill is able to learn a large variety of quite widen the scope and power of IP and discuss new challenges. complex programs from only a few examples because of in- corporation of inductive programming methods. Inductive Programming (IP) is an inter-disciplinary do- 1. REAL-WORLD APPLICATIONS main of research in computer science, artificial intelligence, Originally, IP was applied to synthesizing functional or and cognitive science that studies the automatic synthesis of logic programs for general purpose tasks such as manipulat- computer programs from examples and background knowl- ing data structures (e.g., sorting or reversing a list). These edge. IP developed from research on inductive program syn- investigations showed that small programs could be synthe- sized from a few input/output examples. The recent IT rev- Inductive synthesis can help out with a variety of data ma- olution has created real-world opportunities for such tech- nipulation tasks. These include: (a) Extracting data from niques. Most of today's large number of computer users semi-structured documents including text files, web pages, are non-programmers and are limited to being passive con- and spreadsheets [24] (as in Fig. 1). (b) Transformation sumers of the software that is made available to them. IP of atomic data types such as strings [9] (as in Fig. 2) or can empower such users to more effectively leverage comput- numbers. Transformation of composite data types such as ers for automating their daily repetitive tasks. We discuss tables [11] and XML [36]. (c) Formatting data [37]. Com- below some such opportunities, especially in the areas of bining these technologies in a pipeline of extraction, trans- End-user Programming and Education. formation, and formatting can allow end users to perform sophisticated data manipulation tasks. 1.1 End-User Programming End users of computational devices often need to create small (and perhaps one-off) scripts to automate repetitive tasks. These users can easily specify their intent using ex- amples, making IP a great fit. For instance, consider the domain of data manipulation. Documents of various types, such as text/log files, spreadsheets, and webpages, offer their creators great flexibility in storing and organizing hierarchi- cal data by combining presentation and formatting with the underlying data model. However, this makes it extremely hard to extract the underlying data for common tasks such as data processing, querying, altering the presentation view, or transforming data to another storage format. Existing programmatic solutions to manipulating data (such as Excel macro language, regular expression libraries inside Perl/Python, and JQuery library for Javascript) have three key limitations. First, the solutions are domain-specific and require expertise in different technologies for different docu- ment types. Second, they require understanding of the entire Figure 2: Flash Fill [9]: An Excel 2013 feature that underlying document structure including the data fields the automates repetitive string transformations using exam- end-user is not interested in (some of which may not even ples. Once the user performs one instance of the de- be visible in the presentation layer of the document). Third, sired transformation (row 2, col. B) and proceeds to trans- and most significantly, they require knowledge of program- forming another instance (row 3, col. B), Flash Fill learns ming. As a result, users have to resort to manually perform- a program Concatenate(ToLower(Substring(v,WordToken,1)), ing repetitive tasks, which is both time consuming and error \ ", ToLower(SubString(v,WordToken,2))), which extracts the prone. first two words in input string v (col. A), converts them to lowercase, and concatenates them separated by a space Ana Trujillo character, to automate the repetitive task. 357 21th Place SE Redmond, WA (757) 555-1634 Label 1 Label 2 Label 3 Ana Trujillo Redmond (757) 555-1634 Antonio Moreno Antonio Moreno Renton (411) 555-2786 515 93th Lane Thomas Hardy Seattle (412) 555-5719 Example sin A 1 + cos A Renton, WA Christina Berglund Redmond (443) 555-6774 Problem + = 2 csc A (411) 555-2786 Hanna Moos Puyallup (376) 555-2462 1 + cos A sin A Frederique Citeaux Redmond (689) 555-2770 + Thomas Hardy (b) 742 17th Street NE Generalized T A 1 ± T A Seattle, WA 1 + 3 = 2 T A (412) 555-5719 Problem 1 ± T A T A 5 Template 2 4 Christina Berglund PairSeq SS ::= LinesMap(λx:Pair(Pos(x;p1);Pos(x;p2));LS) 475 22th Lane + where Ti 2 fcos; sin; tan; cot; sec; cscg Redmond, WA j StartSeqMap(λx:Pair(x; Pos(R0[x:]; p));PS) (443) 555-6774 LineSeq LS ::= FilterInt(init; iter; BLS) cos A 1 − sin A Hanna Moos BoolLineSeq BLS ::= FilterBool(b; split(R0; `nn')) + = 2 tan A 785 45th Street NE 1 − sin A cos A Puyallup, WA PositionSeq PS ::= LinesMap(λx:Pos(x; p); LS) 1 − sin A cos A (376) 555-2462 j FilterInt(init; iter; PosSeq(R0; rr)) cos A 1 + sin A + = 2 sec A Frederique Citeaux Pred b ::= λx:{Starts,Ends}With(r; x) j λx:Contains(r; k; x) 1 + sin A cos A 308 66th Place New Redmond, WA (c) cot A 1 + csc A (689) 555-2770 Similar + = 2 sec A (a) Problems 1 + csc A cot A tan A 1 + sec A Figure 1: FlashExtract [24]: A framework for extracting + = 2 csc A data from documents of various kinds such as text files and 1 + sec A tan A sin A 1 − cos A web pages using examples. Once the user highlights one or + = 2 cot A two examples of each field in the textfile in (a), FlashEx- 1 − cos A sin A tract extracts more such instances and arranges them in a Figure 3: Problem generation for algebraic proof problems structured format in the table in (b).