Language to Code: Learning Semantic Parsers for If-This-Then-That Recipes

Chris Quirk (Microsoft Research, Redmond, WA, USA) — [email protected]
Raymond Mooney* (UT Austin, Austin, TX, USA) — [email protected]
Michel Galley (Microsoft Research, Redmond, WA, USA) — [email protected]

* Work performed while visiting Microsoft Research.

Abstract

Using natural language to write programs is a touchstone problem for computational linguistics. We present an approach that learns to map natural-language descriptions of simple "if-then" rules to executable code. By training and testing on a large corpus of naturally-occurring programs (called "recipes") and their natural language descriptions, we demonstrate the ability to effectively map language to code. We compare a number of semantic parsing approaches on the highly noisy training data collected from ordinary users, and find that loosely synchronous systems perform best.

1 Introduction

The ability to program computers using natural language would clearly allow novice users to more effectively utilize modern information technology. Work in semantic parsing has explored mapping natural language to formal domain-specific programming languages such as database queries (Woods, 1977; Zelle and Mooney, 1996; Berant et al., 2013), commands to robots (Kate et al., 2005), operating systems (Branavan et al., 2009), smartphones (Le et al., 2013), and spreadsheets (Gulwani and Marron, 2014). Developing such language-to-code translators has generally required specific dedicated efforts to manually construct parsers or large corpora of suitable training examples.

An interesting subset of the possible program space is if-then "recipes," simple rules that allow users to control many aspects of their digital life including smart devices. Automatically parsing these recipes represents a step toward complex natural language programming, moving beyond single commands toward compositional statements with control flow.

Several services, such as IFTTT, allow users to create simple programs with "triggers" and "actions." For example, one can program their Phillips Hue light bulbs to flash red and blue when the Cubs hit a home run. A somewhat complicated GUI allows users to construct these recipes based on a set of information "channels." These channels represent many types of information. Weather, news, and financial services have provided constant updates through web services. Home automation sensors and controllers such as motion detectors, thermostats, location sensors, garage door openers, etc. are also available. Users can then describe the recipes they have constructed in natural language and publish them.

Our goal is to build semantic parsers that allow users to describe recipes in natural language and have them automatically mapped to executable code. We have collected 114,408 recipe-description pairs from the http://ifttt.com website. Because users often provided short or incomplete English descriptions, the resulting data is extremely noisy for the task of training a semantic parser. Therefore, we have constructed semantic-parser learners that utilize and adapt ideas from several previous approaches (Kate and Mooney, 2006; Wong and Mooney, 2006) to learn an effective interpreter from such noisy training data. We present results on our collected IFTTT corpus demonstrating that our best approach produces more accurate programs than several competing baselines. By exploiting such "found data" on the web, semantic parsers for natural-language programming can potentially be developed with minimal effort.


2 Background

We take an approach to semantic parsing that directly exploits the formal grammar of the target meaning representation language, in our case IFTTT recipes. Given supervised training data in the form of natural-language sentences each paired with their corresponding IFTTT recipe, we learn to introduce productions from the formal-language grammar into the derivation of the target program based on expressions in the natural-language input. This approach originated with the SILT system (Kate et al., 2005) and was further developed in the WASP (Wong and Mooney, 2006; Wong and Mooney, 2007b) and KRISP (Kate and Mooney, 2006) systems.

WASP casts semantic parsing as a syntax-based statistical machine translation (SMT) task, where a synchronous context-free grammar (SCFG) (Wu, 1997; Chiang, 2005; Galley et al., 2006) is used to model the translation of natural language into a formal meaning representation. It uses statistical models developed for syntax-based SMT for lexical learning and parse disambiguation. Productions in the formal-language grammar are used to construct synchronous rules that simultaneously model the generation of the natural language. WASP was subsequently "inverted" to use the same synchronous grammar to generate natural language from the formal language (Wong and Mooney, 2007a).

KRISP uses classifiers trained using a Support-Vector Machine (SVM) to introduce productions in the derivation of the formal translation. The productions of the formal-language grammar are treated like semantic concepts to be recognized from natural-language expressions. For each production, an SVM classifier is trained using a string subsequence kernel (Lodhi et al., 2002). Each classifier can then estimate the probability that a given natural-language substring introduces a production into the derivation of the target representation. During semantic parsing, these classifiers are employed to estimate probabilities on different substrings of the sentence to compositionally build the most probable meaning representation for the sentence. Unlike WASP, whose synchronous grammar needs to be able to directly parse the input, KRISP's approach to "soft matching" productions allows it to produce a parse for any input sentence. Consequently, KRISP was shown to be much more robust to noisy training data than previous approaches to semantic parsing (Kate and Mooney, 2006).

Since our "found data" for IFTTT is extremely noisy, we have taken an approach similar to KRISP; however, we use a probabilistic log-linear text classifier rather than an SVM to recognize productions.

This method of assembling well-formed programs guided by a natural language query bears some resemblance to Keyword Programming (Little and Miller, 2007). In that approach, users enter natural language queries in the middle of an existing program; this query drives a search for programs that are relevant to the query and fit within the surrounding program. However, the function used to score derivations is a simple matching heuristic relying on the overlap between query terms and program identifiers. Our approach uses machine learning to build a correspondence between queries and recipes based on parallel data.

There is also a large body of work applying Combinatory Categorial Grammars to semantic parsing, starting with Zettlemoyer and Collins (2005). Depending on the set of combinators used, this approach can capture more expressive languages than synchronous context-free MT. In practice, however, synchronous MT systems have competitive accuracy scores (Andreas et al., 2013). Therefore, we have not yet evaluated CCG on this task.

3 If-this-then-that recipes

The recipes considered in this paper are diverse and powerful despite being simple in structure. Each recipe always contains exactly one trigger and one action. Whenever the conditions of the trigger are satisfied, the action is performed. The resulting recipes can perform tasks such as home automation ("turn on my lights when I arrive home"), home security ("text me if the door opens"), organization ("add receipt emails to a spreadsheet"), and much more ("remind me to drink water if I've been at a bar for more than two hours"). Triggers and actions are drawn from a wide range of channels that must be activated by each user. These channels can represent many entities and services, including devices (such as Android devices or WeMo light switches) and knowledge sources (such as ESPN or Gmail). Each channel exposes a set of functions for both trigger and action.

Several services such as IFTTT, Tasker, and Llama allow users to author if-this-then-that recipes. IFTTT is unique in that it hosts a large set of recipes along with descriptions and other metadata. Users of this site construct recipes using a GUI interface to select the trigger, action, and the parameters for both trigger and action.

After the recipe is authored, the user must provide a description and optional set of notes for this recipe and publish the recipe. Other users can browse and use these published recipes; if a user particularly likes a recipe, they can mark it as a favorite.

As of January 2015, we found 114,408 recipes on http://ifttt.com. Among the available recipes we encountered a total of 160 channels. In total, we found 552 trigger functions from 128 of those channels, and 229 action functions from 99 channels, for a total of 781 functions. Each recipe includes a number of pieces of information: description,[1] note, author, number of uses, etc. 99.98% of the entries have a description, and 35% contain a note. Based on availability, we focused primarily on the description, though there are cases where the note is a more explicit representation of program intent.

[1] The IFTTT site refers to this as "title".

The recipes at http://ifttt.com are represented as HTML forms, with combo boxes, inline maps, and other HTML UI components allowing end users to select functions and their parameters. This is convenient for end users, but difficult for automated approaches. We constructed a formal grammar of possible program structures, and from each HTML form we extracted an abstract syntax tree (AST) conforming to this grammar. We model this as a context-free grammar, though this assumption is violated in some cases. Consider the program in Figure 1, where some of the parameters used by the action are provided by the trigger.

This data could be used in a variety of ways. Recipes could be suggested to users based on their activities or interests, for instance, or one could train a natural language generation system to give a readable description of code.

In this paper, the paired natural language descriptions and abstract syntax trees serve as training data for semantic parsing. Given a description, a system must produce the AST for an IFTTT recipe. We note in passing that the data was constructed in the opposite direction: users first implemented the recipe and then provided a description afterwards. Ideal data for our application would instead start with the description and construct the recipe based on this description. Yet the data is unusually large and diverse, making it interesting training data for mapping natural language to code.

4 Program synthesis methods

We consider a number of methods to map the natural language description of a problem into its formal program representation.

4.1 Program retrieval

One natural baseline is retrieval. Multiple users could potentially have similar needs and therefore author similar or even identical programs. Given a novel description, we can search for the closest description in a table of program-description pairs, and return the associated program. We explored several text-similarity metrics, and found that string edit distance over the unmodified character sequence achieved best performance on the development set. As the corpus of program-description pairs becomes larger, this baseline should increase in quality and coverage.
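For concreteness, this baseline can be summarized in a few lines. The sketch below uses Levenshtein distance over raw character sequences, which matches the description above; the specific tie-breaking and data layout are illustrative assumptions rather than details stated in the paper.

```python
def edit_distance(a, b):
    # Standard Levenshtein distance over unmodified character sequences.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def retrieve(description, training_pairs):
    # training_pairs: list of (description, program AST) from the training set.
    # Return the program whose training description is closest to the query.
    return min(training_pairs,
               key=lambda pair: edit_distance(description, pair[0]))[1]
```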

4.2 Machine Translation

The downside to retrieval is that it cannot generalize. Phrase-based SMT systems (Och et al., 1999; Koehn et al., 2003) can be seen as an incremental step beyond retrieval: they segment the training data and attempt to match and assemble those segments at runtime. If the phrase length is unbounded, retrieval is almost a special case: it could return whole programs from the training data when the description matches exactly. In addition, they can find subprograms that are relevant to portions of the input, and assemble those subprograms into whole programs.

As a baseline, we adopt a recent approach (Andreas et al., 2013) that casts semantic parsing as phrasal translation. First, the ASTs are converted into flat sequences of code tokens using a pre-order left-to-right traversal. The tokens are annotated with their arity, which is sufficient to reconstruct the tree given a well-formed sequence of tokens using a simple stack algorithm. Given this parallel corpus of language and code tokens, we train a conventional statistical machine translation system that is similar in structure and performance to Moses (Koehn et al., 2007). We gather the k-best translations, retaining the first such output that can be successfully converted into a well-formed program according to the formal grammar. Integration of the well-formedness constraint into decoding would likely produce better translations, but would require more modifications to the MT system.
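To make the linearization concrete, here is a small sketch of the arity-annotated pre-order traversal and its stack-based inverse. The Node class and the "label/arity" token format are illustrative assumptions, not the paper's actual serialization.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def linearize(node):
    # Pre-order, left-to-right traversal; each token carries its arity.
    tokens = [f"{node.label}/{len(node.children)}"]
    for child in node.children:
        tokens.extend(linearize(child))
    return tokens

def delinearize(tokens):
    # Rebuild the tree with a stack: each token opens a node that
    # still expects `arity` children.
    root = None
    stack = []  # list of (node, number of children still expected)
    for tok in tokens:
        label, arity = tok.rsplit("/", 1)
        node = Node(label)
        if stack:
            parent, remaining = stack[-1]
            parent.children.append(node)
            if remaining == 1:
                stack.pop()
            else:
                stack[-1] = (parent, remaining - 1)
        else:
            root = node
        if int(arity) > 0:
            stack.append((node, int(arity)))
    return root
```

In practice one would also check that the token sequence is consumed exactly and the stack ends empty before accepting an MT hypothesis as a well-formed program.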

[Figure 1, rendered here as text: IF trigger THEN action. (A) Channels: Android Phone Call (trigger), Google Drive (action). (B) Functions: Any phone call missed (trigger), Add row to spreadsheet (action). (C) Parameters: Spreadsheet name, Formatted row, and Drive folder path, with field values including "IFTTT Android", "missed", and the trigger-supplied fields {{OccurredAt}}, {{FromNumber}}, and {{ContactName}}. Recipe description: "Archive your missed calls from Android to Google Drive".]

Figure 1: Example recipe with description, with nodes corresponding to (a) Channels, (b) Functions, and (c) Parameters indicated with specific boxes. Note how some of the fields in braces, such as OccurredAt, depend on the trigger.

Approaches to semantic parsing inspired by machine translation have proven effective when the data is very parallel. In the IFTTT dataset, however, the available pairs are not particularly clean. Word alignment quality suffers, and production extraction suffers in turn. Descriptions in this corpus are often quite telegraphic (e.g., "Instagram to Facebook") or express unnecessary pieces of information, or are downright unintelligible ("2Mrl14"). Approaches that rely heavily on lexicalized information and assume a one-to-one correspondence between source and target (at the phrase, if not the word level) struggle in this setting.

4.3 Generation without alignment

An alternate approach is to treat the source language as context and a general direction, rather than a hard constraint. The target derivation can be produced primarily according to the formal grammar while guided by features from the source language. For each production in the formal grammar, we can train a binary classifier intended to predict whether that production should be present in the derivation. This classifier uses general features of the source sentence. Note how this allows productions to be inferred based on context: although a description might never explicitly say that a production is necessary, the surrounding context might strongly imply it.

We assign probabilities to derivations by looking at each production independently. A derivation either uses or does not use each production. For each production used in the derivation, we multiply by the probability of its inclusion. Likewise for each production not used in the derivation, we multiply by one minus the probability of its inclusion.

Let G = (V, Σ, R, S) be the formal grammar with non-terminals V, terminal vocabulary Σ, productions R, and start symbol S. E represents a source sentence, and D a formal derivation tree for that sentence. R(D) is the set of productions in that derivation. The score of a derivation is the following product:

    P(D | E) = ∏_{r ∈ R(D)} P(r | E) · ∏_{r ∈ R \ R(D)} P(¬r | E)

The binary classifiers are log-linear models over features, F, of the input string: P(r | E) ∝ exp(θ_r^⊤ F(E)).

4.3.1 Training

For each production, we train a binary classifier predicting its presence or absence. Given a training set of parallel descriptions and programs, we create |R| binary classifier training sets, one for each classifier. We currently use a small set of simple features: word unigrams and bigrams, and character trigrams.
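As a concrete illustration of these per-production classifiers, the sketch below extracts the stated features (word unigrams and bigrams, character trigrams) and scores a candidate derivation with the product above in log space. The use of scikit-learn's LogisticRegression and the helper names are assumptions made for exposition, not the authors' implementation.

```python
import math
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def features(description):
    # Word unigrams/bigrams and character trigrams, as listed in Section 4.3.1.
    words = description.lower().split()
    feats = {}
    for w in words:
        feats[f"uni={w}"] = 1.0
    for a, b in zip(words, words[1:]):
        feats[f"bi={a}_{b}"] = 1.0
    for i in range(len(description) - 2):
        feats[f"tri={description[i:i+3]}"] = 1.0
    return feats

def train_production_classifiers(pairs, all_productions):
    # pairs: list of (description, set of productions in the gold AST).
    vec = DictVectorizer()
    X = vec.fit_transform([features(d) for d, _ in pairs])
    classifiers = {}
    for r in all_productions:
        y = [r in prods for _, prods in pairs]
        if len(set(y)) > 1:  # need both positive and negative examples
            classifiers[r] = LogisticRegression(max_iter=1000).fit(X, y)
    return vec, classifiers

def derivation_log_prob(description, derivation_productions, vec, classifiers):
    # log P(D|E) = sum over used productions of log P(r|E)
    #            + sum over unused productions of log(1 - P(r|E)).
    x = vec.transform([features(description)])
    total = 0.0
    for r, clf in classifiers.items():
        p = clf.predict_proba(x)[0, list(clf.classes_).index(True)]
        p = min(max(p, 1e-6), 1.0 - 1e-6)  # guard against log(0)
        total += math.log(p) if r in derivation_productions else math.log(1.0 - p)
    return total
```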

4.3.2 Inference

When presented with a novel utterance, E, our system must find the best code corresponding to that utterance. We use a top-down, left-to-right generation strategy, where each search node contains a stack of symbols yet to be expanded and a log probability. The initial node is ⟨[S], 0⟩, and a node is complete when its stack of non-terminals is empty. Given a search node with a non-terminal as its first symbol on the stack, we expand with any production for that symbol, putting its yield onto the stack and updating the node cost to include its derivation score:

    ⟨[X, α], p⟩  ⇒  ⟨[β, α], p + log P(X → β | E)⟩    if (X → β) ∈ R

If the first stack item is a terminal, it is scanned:

    ⟨[a, α], p⟩  ⇒  ⟨[α], p⟩    if a ∈ Σ

Using these inference rules, we utilize a simple greedy approach that only accounts for the productions included in the derivation. To account for the negative ∏_{r ∈ R \ R(D)} P(¬r | E) factors, we use a beam search, and rerank the n-best final outcomes from this search based on the probability of all productions that are not included. Partial derivations are grouped into beams according to the number of productions in that derivation.

4.4 Loosely synchronous generation

The above method learns distributions over productions given the input, but treats the sentence as an undifferentiated bag of linguistic features. The syntax of the source sentence is not leveraged at all, nor is any correspondence between the language syntax and the program structure used. Often the pairs are not in sufficient correspondence to suggest synchronous approaches, but some loose correspondence to maintain at least a notion of coverage could be helpful.

We pursue an approach similar to KRISP (Kate and Mooney, 2006), with several differences. First, rather than a string kernel SVM, we use a log-linear model with character and word n-gram features.[2] Second, we allow the model to consider both span-internal features and contextual features.

[2] We have a preference for log-linear models given their robustness to hyperparameter settings, ease of optimization, and flexible incorporation of features. An SVM trained with similar features should have similar performance, though.

This approach explicitly models the correspondence between nodes in the code side and tokens in the language. Unlike standard MT systems, word alignment is not used as a hard constraint. Instead, this phrasal correspondence is induced as part of model training.

We define a semantic derivation D of a natural language sentence E as a program AST where each production in the AST is augmented with a span. The substrings covered by the children of a production must not overlap, and the substring covered by the parent must be the concatenation of the substrings covered by the children. Figure 2 shows a sample semantic derivation.

[Figure 2, rendered here as text: the derivation for the description "Call me if the Cubs score" (tokens 1-6). IF[1-6] dominates TRIGGER[3-6] and ACTION[1-2]; TRIGGER[3-6] expands to ESPN[3-6], then New in-game update[3-6], with parameter Chicago Cubs[5-5]; ACTION[1-2] expands to Phone call[1-2], then Call my phone[1-2].]

Figure 2: An example training pair with its semantic derivation. Note the correspondence between formal language and natural language denoted with indices and spans.

The core components of KRISP are string-kernel classifiers P(r, i..j | E) denoting the probability that a production r in the AST covers the span of words i..j in the sentence E. Here, i < j are positions in the sentence indicating the span of tokens most relevant to this production. In other words, the substring E[i..j] denotes the production r with probability P(r, i..j | E). The probability of a semantic derivation D is defined as follows:

    P(D | E) = ∏_{(r, i..j) ∈ D} P(r, i..j | E)

That is, we assume that each production is independent of all others, and is conditioned only on the string to which it is aligned. This can be seen as a refinement of the above production classification approach using a notion of correspondence.

Rather than using string kernels, we use logistic regression classifiers with word unigram, word bigram, and character trigram features. Unlike KRISP, we include features from both inside and outside the substring. Consider the production "Phone call → Call my phone" with span 1-2 from Figure 2. Word unigram features indicate that "call" and "me" are inside the span; the remaining words are outside the span. Word bigram features indicate that "call me" is inside the span, "me if" is on the boundary of the span, and all remaining bigrams are outside the span.
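The sketch below shows one way to realize such span-conditioned features with inside/outside/boundary n-grams, mirroring the "Phone call → Call my phone" example; the feature names and exact templates are illustrative assumptions rather than the paper's feature set.

```python
def span_features(tokens, i, j):
    # Features for a candidate span [i, j) over the token list:
    # unigrams and bigrams marked as inside, outside, or on the span boundary,
    # plus character trigrams of the covered substring.
    feats = {}
    for k, w in enumerate(tokens):
        side = "in" if i <= k < j else "out"
        feats[f"uni_{side}={w}"] = 1.0
    for k, (a, b) in enumerate(zip(tokens, tokens[1:])):
        if i <= k and k + 1 < j:
            side = "in"          # both tokens inside the span
        elif k + 1 < i or k >= j:
            side = "out"         # both tokens outside the span
        else:
            side = "boundary"    # bigram straddles the span edge
        feats[f"bi_{side}={a}_{b}"] = 1.0
    covered = " ".join(tokens[i:j])
    for k in range(len(covered) - 2):
        feats[f"tri={covered[k:k+3]}"] = 1.0
    return feats
```

A logistic-regression classifier per production can then be trained on these features exactly as in the Section 4.3 sketch, and the derivation score becomes the product of P(r, i..j | E) over the spans in the derivation.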

882 to the set of positive instances. Otherwise, it is Language Code added to the set of negative instances, and the best Recipes 77,495 77,495 derivation constrained to match the gold standard Train Tokens 527,368 1,776,010 AST is found and added to the positive instances. Vocabulary 58,102 140,871 Given this revised training data, the classifiers are retrained. After each pass through the training data, Recipes 5,171 5,171 we evaluate the current model on the development Dev Tokens 37,541 110,074 set. This procedure is repeated until development- Vocabulary 7,741 14,804 set performance begins to fall. Recipes 4,294 4,294 4.4.2 Inference Test Tokens 28,214 94,367 Vocabulary 6,782 13,969 To find the most probable derivation according to the grammar, KRISP uses a variation on Earley parsing. This is similar to the inference method Table 1: Statistics of the data after cleaning and separating into training, development, and test sets. In each case, the from Section 4.3.2, but each item now additionally number of recipes, tokens (including punctuation, etc.) and maintains a position and a span. Inference proceeds vocabulary size are included. left-to-right through the source string. The natural language may present information in a different overfitting to the linguistic style of a particular au- order than the formal language, so all permutations thor. Table 1 presents summary statistics for the of rules are considered during inference. resulting data. We found this inference procedure to be quite Although certain trigger-action pairs occur much slow for larger data sets, especially because wide more often than others, the recipes in this data beams were needed to prevent search failure. To are quite diverse. The top 10 trigger-action pairs speed up inference, we used scores from the account for 14% of the recipes; the top 100 account position-independent classifiers as completion-cost for 37%; the top 1000 account for 72%. estimates. The completion-cost estimate for a given sym- 5.1 Metrics bol is defined recursively. Terminals have a cost of To evaluate system performance, several different zero. Productions have a completion cost of the log measures are employed. Ideally a system would probability of the production given the sentence, output exactly the correct abstract syntax tree. One plus the completion cost of all non-terminal sym- measure is to count the number of exact matches, bols. The completion cost for a non-terminal is though almost all methods receive a score of 0.4 the max cost of any production rooted in that non- Alternatively, we can look at the AST as a set of terminal. Computing this cost requires traversing productions, computing balanced F-measure. This all productions in the grammar for each sentence. is a much more forgiving measure, giving partial Given a partial hypothesis, we use exact scores credit for partially correct results, though it has the for the left-corner subtree that has been fully con- caveat that all errors are counted equally. structed, and completion estimates for all the sym- Correctly assigning the trigger and action is the bols and productions whose left and right spans are most important, especially because some of the pa- not yet fully instantiated. rameter values are tailored for particular users. For example, “turn off my lights when I leave home” 5 Experimental Evaluation requires a “home” location, which varies for each Next we evaluate the accuracy of these approaches. user. 
Therefore, we also measure accuracy at iden- The 114,408 recipes described in Section 3 were tifying the correct trigger and action, both at the first cleaned and tokenized. We kept only one channel and function level. recipe per unique description, after mapping to low- 5.2 Human comparison ercase and normalizing punctuation.3 Finally the recipes were split by author, randomly assigning One remaining difficulty is that multiple programs each to training, development, or test, to prevent may be equally correct. Some descriptions are very difficult to interpret, even for humans. Second, 3We found many recipes with the same description, likely copies of some initial recipe made by different users. We 4Retrieval gets an exact match 3.7% of the time, likely due selected one representative using a deterministic heuristic. to near-duplicates from copied recipes.

5.2 Human comparison

One remaining difficulty is that multiple programs may be equally correct. Some descriptions are very difficult to interpret, even for humans. Second, multiple channels may provide similar functionality: both Phillips Hue and WeMo channels provide the ability to turn on lights. Even a well-authored description may not clarify which channel should be used. Finally, many descriptions are underspecified. For instance, the description "notify me if it rains" does not specify whether the user should receive an Android notification, an iOS notification, an email, or an SMS. This is difficult to capture with an automatic metric.

To address the prevalence and impact of underspecification and ambiguity in descriptions, we asked humans to perform a very similar task. Human annotators on Amazon Mechanical Turk ("turkers") were presented with recipe descriptions and asked to identify the correct channel and function (but not parameters). Turkers received careful instructions and several sample description-recipe pairs, then were asked to specify the best recipe for each input. We requested they try their best to find an action and a trigger even when presented with vague or ambiguous descriptions, but they could tag inputs as 'unintelligible' if they were unable to make an educated guess. Turkers created recipes only for English descriptions, applying the label 'non-English' otherwise. Five recipes were gathered for each description. The resulting recipes are not exactly gold, as the turkers have limited training at the task. However, we imposed stringent qualification requirements to control the annotation quality.[5]

[5] Turkers must have 95% HIT approval rating and be native speakers of English (as an approximation of the latter, we required Turkers be from the U.S.). Manual inspection of annotation on a control set drawn from the training data ensured there was no apparent spam.

Our workers were in fair agreement with one another and the gold standard, producing high quality annotation at wages calibrated to local minimum wage. We measure turker agreement with Krippendorff's α (Krippendorff, 1980), which is a statistical measure of agreement between any number of coders. Unlike Cohen's κ (Cohen, 1960), the α statistic does not require that coders be the same for each unit of analysis. This property is particularly desirable in our case, since turkers generally differ across HITs. A value of α = 1 indicates perfect agreement, while α ≤ 0 suggests the absence of agreement or systematic disagreement. Agreement measures on the Mechanical Turk data are shown in Table 2.

                            Trigger            Action
                            C       C+F        C       C+F
  # of categories           128     552        99      229
  All                       .592    .492       .596    .532
  Intelligible English      .687    .528       .731    .627

Table 2: Annotator agreement as measured by Krippendorff's α coefficient (Krippendorff, 1980). Agreement is measured on either channel (C) or channel and function (C+F), and on either the full test set (4,294 recipes) or its English and intelligible subset (2,262 recipes).

This shows encouraging levels of agreement for both the trigger and the action, especially considering the large number of categories. Krippendorff (1980) advocates a 0.67 cutoff to allow a "tentative conclusion" of agreement, and turkers are relatively close to that level for both trigger and action channels. However, it is important to note that the coding scheme used by turkers is not mutually exclusive, as several triggers and actions (e.g., "SMS" vs. "Android SMS" actions) accomplish similar effects. Thus, our levels of agreement are likely to be greater than suggested by measures in the table. Finally, we also measured agreement on the English and intelligible subset of the data, as we found that confusion between the two labels "non-English" and "unintelligible" was relatively high. As shown in the table, this substantially increased levels of agreement, up to the point where α for both trigger and action channels are above the 0.67 cutoff drawing a tentative conclusion of agreement.

5.3 Systems and baselines

The retrieval method searches for the closest description in the training data based on character string-edit-distance and returns the recipe for that training program. The phrasal method uses phrase-based machine translation to generate candidate outputs, searching the resulting n-best candidates for the first well-formed recipe. After exploring multiple word alignment approaches, we found that an unsupervised feature-rich method (Berg-Kirkpatrick et al., 2010) worked best, leveraging features of string similarity between the description and the code. We ran MERT on the development data to tune parameters. We used a phrasal decoder with performance similar to Moses. The synchronous grammar method, a recreation of WASP, uses the same word alignment as above, but extracts synchronous grammar rules from the parallel data (Wong and Mooney, 2006). The classifier approach described in Section 4.3 is independent of word alignment. Finally, the posclass approach from Section 4.4 derives its own derivation structure from the data.
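For reference, the agreement statistic reported in Table 2 can be computed for nominal labels as below. This is a generic sketch of Krippendorff's α (one minus observed over expected disagreement, computed from a coincidence matrix), not the authors' evaluation script, and the data layout (one dict of coder → label per unit) is an assumed convention.

```python
from collections import Counter, defaultdict

def krippendorff_alpha_nominal(units):
    # units: list of dicts mapping coder id -> label for one unit of analysis;
    # coders may differ from unit to unit, and units with <2 labels are skipped.
    coincidences = defaultdict(float)   # (label_a, label_b) -> weight
    for unit in units:
        labels = list(unit.values())
        m = len(labels)
        if m < 2:
            continue
        for a in range(m):
            for b in range(m):
                if a != b:
                    coincidences[(labels[a], labels[b])] += 1.0 / (m - 1)
    totals = Counter()
    for (a, _), w in coincidences.items():
        totals[a] += w
    n = sum(totals.values())
    observed = sum(w for (a, b), w in coincidences.items() if a != b)
    expected = sum(totals[a] * totals[b]
                   for a in totals for b in totals if a != b) / (n - 1)
    return 1.0 - observed / expected if expected else 1.0
```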

The human annotations are used to establish the mturk human-performance baseline by taking the majority selection of the trigger and action over 5 HITs for each description and comparing the result to the gold standard. The oracleturk human-performance baseline shows how often at least one of the turkers agreed with the gold standard.

In addition, we evaluated all systems on a subset of the test data where at least three human-generated recipes agreed with the gold standard. This subset represents those programs that are easily reproducible by human workers. A good method should strive to achieve 100% accuracy on this set, and we should perhaps not be overly concerned about the remaining examples where humans disagree about the correct interpretation.

5.4 Results and discussion

Table 3 summarizes the main evaluation results. Most of the measures are in concordance.

                  Channel   +Func   Prod F1
  (a) All: 4,294 recipes
  retrieval         28.2     19.3     40.8
  phrasal           17.3     10.0     34.8
  sync              16.2      9.5     34.9
  classifier        46.3     33.0     47.3
  posclass          47.4     34.5     48.0
  mturk             33.4     22.6     n/a
  oracleturk        48.8     37.8     n/a

  (b) Omit non-English: 3,741 recipes
  retrieval         28.9     20.2     41.7
  phrasal           19.3     11.3     35.3
  sync              18.1     10.6     35.1
  classifier        48.8     35.2     48.4
  posclass          50.0     36.9     49.3
  mturk             38.4     26.0     n/a
  oracleturk        56.0     43.5     n/a

  (c) Omit non-English & unintelligible: 2,262 recipes
  retrieval         36.8     25.4     49.0
  phrasal           27.8     16.4     39.9
  sync              26.7     15.5     37.6
  classifier        64.8     47.2     56.5
  posclass          67.2     50.4     57.7
  mturk             59.0     41.5     n/a
  oracleturk        86.2     59.4     n/a

  (d) ≥ 3 turkers agree with gold: 758 recipes
  retrieval         43.3     32.3     56.2
  phrasal           37.2     23.5     45.5
  sync              36.5     24.1     42.8
  classifier        79.3     66.2     65.0
  posclass          81.4     71.0     66.5
  mturk            100.0    100.0     n/a
  oracleturk       100.0    100.0     n/a

Table 3: Evaluation results. The first column measures how often the channels are selected correctly for both trigger and action (e.g. Android Phone Call and Google Drive in Figure 1). The next column measures how often both the channel and function are correctly selected for both trigger and action (e.g. Android Phone Call::Any phone call missed and Google Drive::Add row to spreadsheet). The last column shows balanced F-measure against the gold tree over all productions in the proposed derivation, from the root production down to the lowest parameter. We show results on (a) the full test data; (b) omitting descriptions marked as non-English by a majority of the crowdsourced workers; (c) omitting descriptions marked as either non-English or unintelligible by the crowd; and (d) only recipes where at least three of five workers agreed with the gold standard.

Interestingly, retrieval outperforms the phrasal MT baseline. With a sufficiently long phrase limit, phrasal MT approaches retrieval, but with a few crucial differences. First, phrasal requires an exact match of some substring of the input to some substring of the training data, where retrieval can skip over words. Second, the phrases are heavily dependent on word alignment; we find the word alignment techniques struggle with the noisy IFTTT descriptions. Sync performs similarly to phrasal. The underspecified descriptions challenge assumptions in synchronous grammars: much of the target structure is implied rather than stated.

In contrast, the classification method performs quite well. Some productions may be very likely given a prior alone, or may be inferred given other productions and the need for a well-formed derivation. Augmenting this information with positional information as in posclass can help with the attribution problem. Consider the input "Download Facebook Photos you're tagged in to [...]": we would like the token "Facebook" to invoke only the trigger, not the action. We believe further gains could come from better modeling of the correspondence between derivation and natural language.

We find that semantic parsing systems have accuracy nearly as high or even higher than turkers in certain conditions. There are several reasons for this. First, many of the channels overlap in functionality (Gmail vs. email, or Android SMS vs. SMS); likewise functions may be very closely related (Post a tweet vs. Post a tweet with an image). All the systems with access to thousands of training pairs are at a strong advantage; they can, for instance, more effectively break such ties by learning a prior over which channels are more likely. Turkers, on the other hand, have neither specific training at this job nor a background corpus and more frequently disagree with the gold standard. Second, there are a number of non-English and unintelligible descriptions. Although the turkers were asked to skip these sentences, the machine-learning systems may still correctly predict the channel and action, since the training set also contains non-English and cryptic descriptions. For the cases where humans agree with each other and with the gold standard, the best automated system (posclass) does fairly well, getting 81% channel and 71% function accuracy.

Table 4 has some sample outputs from the posclass system, showing both examples where the system is effective and where it struggles to find the intended interpretation.

  (a) INPUT:  Park in garage when snow tomorrow
      IFTTT:  Weather : Tomorrow's forecast calls for ⇒ SMS : Send me an SMS
      OUTPUT: Weather : Tomorrow's forecast calls for ⇒ SMS : Send me an SMS

  (b) INPUT:  Suas fotos do instagr.am salvas no dropbox
              [Portuguese: "Your instagr.am photos saved to Dropbox"]
      IFTTT:  Instagram : Any new photo by you ⇒ Dropbox : Add file from URL
      OUTPUT: Instagram : Any new photo by you ⇒ Dropbox : Add file from URL

  (c) INPUT:  Foursquare check-in archive
      IFTTT:  Foursquare : Any new check-in ⇒ Evernote : Create a note
      OUTPUT: Foursquare : Any new check-in ⇒ Google Drive : Add row to spreadsheet

  (d) INPUT:  if i post something on blogger it will post it to wordpress
      IFTTT:  Blogger : Any new post ⇒ WordPress : Create a post
      OUTPUT: Feed : New feed item ⇒ Blogger : Create a post

  (e) INPUT:  Endless loop!
      IFTTT:  Gmail : New email in inbox from ⇒ Gmail : Send an email
      OUTPUT: SMS : Send IFTTT any SMS ⇒ Philips hue : Turn on color loop

Table 4: Example output from the posclass system. For each input instance, we show the original query, the recipe originally authored through IFTTT, and our system output. Instance (a) demonstrates a case where the correct program is produced even though the input is rather tricky. Even the Portuguese query of (b) is correctly predicted, though keywords help here. In instance (c), the query is underspecified, and the system predicts that archiving should be done in Google Drive rather than Evernote. Instance (d) shows how we sometimes confuse the trigger and action. Certain queries, such as (e), would require very deep inference: the IFTTT recipe sets up an endless email loop, where our system assembles a strange interpretation based on keyword match.

6 Conclusions

The primary goal of this paper is to highlight a new application and dataset for semantic parsing. Although if-this-then-that recipes have a limited structure, many potential recipes are possible. This is a small step toward broad program synthesis from natural language, but is driven by real user data for modern hi-tech applications. To encourage further exploration, we are releasing the URLs of recipes along with turker annotations at http://research.microsoft.com/lang2code/.

The best performing results came from a loosely synchronous approach. We believe this is a very promising direction: most work inspired by parsing or machine translation has assumed a strong connection between the description and the operable semantic representation. In practical situations, however, many elements of the semantic representation may only be implied by the description, rather than explicitly stated. As we tackle domains with greater complexity, identifying implied but necessary information will be even more important.

Underspecified descriptions open up new interface possibilities as well. This paper considered only single-turn interactions, where the user describes a request and the system responds with an interpretation. An important next step would be to engage the user in an interactive dialogue to confirm and refine the user's intent and develop a fully-functional correct program.

Acknowledgments

The authors would like to thank William Dolan and the anonymous reviewers for their helpful advice and suggestions.

References

Jacob Andreas, Andreas Vlachos, and Stephen Clark. 2013. Semantic parsing as machine translation. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pages 47–52, Sofia, Bulgaria, August. Association for Computational Linguistics.

Jonathan Berant, Andrew Chou, Roy Frostig, and Percy Liang. 2013. Semantic parsing on Freebase from question-answer pairs. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP-13).

Taylor Berg-Kirkpatrick, Alexandre Bouchard-Côté, John DeNero, and Dan Klein. 2010. Painless unsupervised learning with features. In Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics, pages 582–590, Los Angeles, California, June. Association for Computational Linguistics.

S.R.K. Branavan, Harr Chen, Luke S. Zettlemoyer, and Regina Barzilay. 2009. Reinforcement learning for mapping instructions to actions. In Joint Conference of the 47th Annual Meeting of the Association for Computational Linguistics and the 4th International Joint Conference on Natural Language Processing of the Asian Federation of Natural Language Processing (ACL-IJCNLP), Singapore.

David Chiang. 2005. A hierarchical phrase-based model for statistical machine translation. In Proceedings of the 43rd Annual Meeting of the Association for Computational Linguistics (ACL), pages 263–270, Ann Arbor, MI.

J. Cohen. 1960. A coefficient of agreement for nominal scales. Educational and Psychological Measurement, 20(1):37–46.

Michel Galley, Jonathan Graehl, Kevin Knight, Daniel Marcu, Steve DeNeefe, Wei Wang, and Ignacio Thayer. 2006. Scalable inference and training of context-rich syntactic translation models. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics (COLING/ACL), pages 961–968, Sydney, Australia, July.

Sumit Gulwani and Mark Marron. 2014. NLyze: Interactive programming by natural language for spreadsheet data analysis and manipulation. In SIGMOD.

Rohit J. Kate and Raymond J. Mooney. 2006. Using string-kernels for learning semantic parsers. In Proceedings of the 21st International Conference on Computational Linguistics and 44th Annual Meeting of the Association for Computational Linguistics, pages 913–920, Sydney, Australia, July. Association for Computational Linguistics.

R. J. Kate, Y. W. Wong, and R. J. Mooney. 2005. Learning to transform natural to formal languages. In Proceedings of the Twentieth National Conference on Artificial Intelligence (AAAI-05), pages 1062–1068, Pittsburgh, PA, July.

Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology - Volume 1, NAACL '03, pages 48–54, Stroudsburg, PA, USA. Association for Computational Linguistics.

Philipp Koehn, Hieu Hoang, Alexandra Birch, Chris Callison-Burch, Marcello Federico, Nicola Bertoldi, Brooke Cowan, Wade Shen, Christine Moran, Richard Zens, Chris Dyer, Ondrej Bojar, Alexandra Constantin, and Evan Herbst. 2007. Moses: Open source toolkit for statistical machine translation. In Annual Meeting of the Association for Computational Linguistics (ACL), demonstration session, Prague, Czech Republic, June.

Klaus Krippendorff. 1980. Content Analysis: An Introduction to its Methodology. Sage Publications, Beverly Hills, CA.

Vu Le, Sumit Gulwani, and Zhendong Su. 2013. SmartSynth: Synthesizing smartphone automation scripts from natural language. In MobiSys.

Greg Little and Robert C. Miller. 2007. Keyword programming in Java. In Proceedings of the Twenty-second IEEE/ACM International Conference on Automated Software Engineering, ASE '07, pages 84–93, New York, NY, USA. ACM.

Huma Lodhi, Craig Saunders, John Shawe-Taylor, Nello Cristianini, and Chris Watkins. 2002. Text classification using string kernels. Journal of Machine Learning Research, 2:419–444.

Franz Josef Och, Christoph Tillmann, and Hermann Ney. 1999. Improved alignment models for statistical machine translation. In Proceedings of the Conference on Empirical Methods in Natural Language Processing and Very Large Corpora, pages 20–28, University of Maryland, College Park, MD, June.

Yuk Wah Wong and Raymond Mooney. 2006. Learning for semantic parsing with statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Main Conference, pages 439–446, New York City, USA, June. Association for Computational Linguistics.

Yuk Wah Wong and Raymond J. Mooney. 2007a. Generation by inverting a semantic parser that uses statistical machine translation. In Proceedings of Human Language Technologies: The Conference of the North American Chapter of the Association for Computational Linguistics (NAACL-HLT), pages 172–179, Rochester, NY.

Yuk Wah Wong and Raymond J. Mooney. 2007b. Learning synchronous grammars for semantic parsing with lambda calculus. In Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL), pages 960–967, Prague, Czech Republic, June.

William A. Woods. 1977. Lunar rocks in natural English: Explorations in natural language question answering. In Antonio Zampoli, editor, Linguistic Structures Processing. Elsevier North-Holland, New York.

Dekai Wu. 1997. Stochastic inversion transduction grammars and bilingual parsing of parallel corpora. Computational Linguistics, 23(3):377–403.

John M. Zelle and Raymond J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the Thirteenth National Conference on Artificial Intelligence (AAAI-96), pages 1050–1055, Portland, OR, August.

Luke S. Zettlemoyer and Michael Collins. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In Proceedings of the 21st Conference on Uncertainty in AI, pages 658–666.
