Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

Giovanni Campagna*, Silei Xu*, Mehrad Moradshahi
Computer Science Department, Stanford University, Stanford, CA, USA
gcampagn@cs.stanford.edu, silei@cs.stanford.edu, mehrad@cs.stanford.edu

Richard Socher
Salesforce, Inc., Palo Alto, CA, USA
rsocher@salesforce.com

Monica S. Lam
Computer Science Department, Stanford University, Stanford, CA, USA
lam@cs.stanford.edu

*Equal contribution

Abstract

To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort.

We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser.

We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves a 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control.

CCS Concepts • Human-centered computing → Personal digital assistants; • Computing methodologies → Natural language processing; • Software and its engineering → Context specific languages.

Keywords virtual assistants, semantic parsing, training data generation, data augmentation, data engineering

ACM Reference Format:
Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, and Monica S. Lam. 2019. Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19), June 22–26, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3314221.3314594

© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM. This is the author's version of the work. It is posted here for your personal use. Not for redistribution. The definitive Version of Record was published in Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19), June 22–26, 2019, Phoenix, AZ, USA, https://doi.org/10.1145/3314221.3314594.

Figure 1. An example of translating and executing compound virtual assistant commands.
User input: "Get a cat picture and post it on Facebook with caption funny cat."
ThingTalk:
    now => @com.thecatapi.get()
        => @com.facebook.post_picture(picture_url = picture_url, caption = "funny cat");
The program is executed against the Thingpedia skills to produce the execution result.
1 Introduction

Personal virtual assistants provide users with a natural language interface to a wide variety of web services and IoT devices. Not only must they understand primitive commands across many domains, but they must also understand the composition of these commands to perform entire tasks. State-of-the-art virtual assistants are based on semantic parsing, a machine learning algorithm that converts natural language to a semantic representation in a formal language. The breadth of the virtual assistant interface makes it particularly challenging to design the semantic representation. Furthermore, there is no existing corpus of natural language commands to train the neural model for new capabilities. This paper advocates using a Virtual Assistant Programming Language (VAPL) to capture the formal semantics of the virtual assistant capability. We also present Genie, a toolkit for creating a semantic parser for new virtual assistant capabilities that can be used to bootstrap real data acquisition.

1.1 Virtual Assistant Programming Languages

Previous semantic parsing work, including commercial assistants, typically translates natural language into an intermediate representation that matches the semantics of the sentences closely [4, 30, 33, 44, 51]. For example, the Alexa Meaning Representation Language [30, 44] is associated with a closed ontology of 20 domains, each manually tuned for accuracy. Semantically equivalent sentences have different representations, requiring complex and expensive manual annotation by experts, who must know the details of the formalism and the associated ontology. The ontology also limits the scope of the available commands, as every parameter must be an entity in the ontology (a person, a location, etc.) and cannot be free-form text.

Our approach is to represent the capability of the virtual assistant fully and formally as a VAPL; we use a deep-learning semantic parser to translate natural language into VAPL code, which can be executed directly by the assistant. Thus, the assistant's full capability is exposed to the neural network, eliminating the need for, and the inefficiency of, an intermediate representation. The VAPL code can also be converted back into a canonical natural language sentence to confirm the program before execution. Furthermore, new capabilities can be supported by extending the VAPL.

The ThingTalk language designed for the open-source Almond virtual assistant is an example of a VAPL [8]. ThingTalk has one construct with three clauses: when some event happens, get some data, and perform some action, each of which can be predicated. This construct combines primitives from the extensible runtime skill library, Thingpedia, currently consisting of over 250 APIs to Internet services and IoT devices. Despite its lean syntax, ThingTalk is expressive: it is a superset of what can be expressed with IFTTT, which has crowdsourced more than 250,000 unique compound commands [56]. Fig. 1 shows how a natural-language sentence can be translated into a ThingTalk program, using the services in Thingpedia.

However, the original ThingTalk was not amenable to natural language translation, and no usable semantic parser had been developed. In attempting to create an effective semantic parser for ThingTalk, we discovered important design principles for VAPLs, such as matching non-developers' mental model and keeping the semantics of components orthogonal. Also, VAPL programs must have a (unique) canonical form so the result of the neural network can be checked for correctness easily. We applied these principles to overhaul and extend the design of ThingTalk. Unless noted otherwise, we use ThingTalk to refer to the new design in the rest of the paper.

1.2 Training Data Acquisition

Virtual assistant development is labor-intensive, with Alexa boasting a workforce of 10,000 employees [36]. Obtaining training data for the semantic parser is one of the challenging tasks. How do we get training data before deployment? How can we reduce the cost of annotating usage data? Wang et al. [57] propose a solution for acquiring training data for the task of question answering over simple domains. They use a syntax-driven approach to create a canonical sentence for each formal program, ask crowdsourced workers to paraphrase the canonical sentences to make them more natural, then use the paraphrases to train a machine learning model that matches input sentences against possible canonical sentences. Wang et al.'s approach designs each domain ontology individually, and each domain is small enough that all possible logical forms can be enumerated up to a certain depth.

This approach was used in the original ThingTalk semantic parser and has been shown to be inadequate [8]. It is infeasible to collect paraphrases for all the sentences supported by a VAPL. Virtual assistants have powerful constructs to connect many diverse domains, and their capability scales superlinearly with the addition of APIs. Even with our small Thingpedia, ThingTalk supports hundreds of thousands of distinct programs. Also, it is not possible to generate just one canonical natural language sentence that can be understood across different domains. Crowdworkers often paraphrase sentences incorrectly or just make minor modifications to the original sentences.

Our approach is to design an NL-template language to help developers data-engineer a good training set. This language lets developers capture common ways in which VAPL programs are expressed in natural language. The NL-templates are used to synthesize pairs of natural language sentences and their corresponding VAPL code; a minimal sketch of this synthesis appears at the end of this section. A sample of such sentences is paraphrased by crowdsourced workers to make them more natural. The paraphrases further inform more useful templates, which in turn derive more diverse sentences for paraphrasing. This iterative process increases the cost-effectiveness of paraphrasing.

Whereas the traditional approach is to train only with paraphrase data, we are the first to add synthesized
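To make the template-driven synthesis concrete, the following Python sketch shows the idea in miniature. It is an illustration only, not Genie's actual template language or tooling: the template table, the placeholder values, and the @org.example.diary skill are hypothetical, and only the cat-picture program is taken from Figure 1.

import itertools
import random

# Hypothetical NL-templates: each pairs a natural-language skeleton with a
# ThingTalk skeleton. This is a toy stand-in, not Genie's template language.
# The first program mirrors Figure 1; the @org.example.diary skill is invented.
TEMPLATES = [
    ("get a cat picture and post it on facebook with caption {caption}",
     'now => @com.thecatapi.get() '
     '=> @com.facebook.post_picture(picture_url = picture_url, '
     'caption = "{caption}");'),
    ("write {caption} in my diary",
     'now => @org.example.diary.write(entry = "{caption}");'),
]

# A small table of placeholder values used to instantiate the skeletons.
VALUES = {"caption": ["funny cat", "hello world", "my new puppy"]}

def synthesize():
    """Expand every template with every combination of placeholder values,
    yielding (sentence, ThingTalk program) training pairs."""
    for nl_skeleton, tt_skeleton in TEMPLATES:
        names = [n for n in VALUES if "{" + n + "}" in nl_skeleton]
        for combo in itertools.product(*(VALUES[n] for n in names)):
            binding = dict(zip(names, combo))
            yield nl_skeleton.format(**binding), tt_skeleton.format(**binding)

pairs = list(synthesize())

# Train on all synthesized pairs; send only a small random sample to crowd
# workers for paraphrasing. The paraphrases suggest new templates, which are
# added to TEMPLATES for the next iteration of the loop described above.
for sentence, program in random.sample(pairs, k=min(3, len(pairs))):
    print(sentence, "->", program)

In Genie, developers write such templates against the full VAPL rather than a fixed list of programs, and a sample of the synthesized sentences is sent out for crowdsourced paraphrasing, closing the iterative loop described in Section 1.2.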