Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands

Giovanni Campagna∗, Silei Xu∗, Mehrad Moradshahi
Computer Science Department, Stanford University, Stanford, CA, USA

Richard Socher
Salesforce, Inc., Palo Alto, CA, USA

Monica S. Lam
Computer Science Department, Stanford University, Stanford, CA, USA

∗ Equal contribution

Abstract

To understand diverse natural language commands, virtual assistants today are trained with numerous labor-intensive, manually annotated sentences. This paper presents a methodology and the Genie toolkit that can handle new compound commands with significantly less manual effort.

We advocate formalizing the capability of virtual assistants with a Virtual Assistant Programming Language (VAPL) and using a neural semantic parser to translate natural language into VAPL code. Genie needs only a small realistic set of input sentences for validating the neural model. Developers write templates to synthesize data; Genie uses crowdsourced paraphrases and data augmentation, along with the synthesized data, to train a semantic parser.

We also propose design principles that make VAPL languages amenable to natural language translation. We apply these principles to revise ThingTalk, the language used by the Almond virtual assistant. We use Genie to build the first semantic parser that can support compound virtual assistant commands with unquoted free-form parameters. Genie achieves a 62% accuracy on realistic user inputs. We demonstrate Genie's generality by showing a 19% and 31% improvement over the previous state of the art on a music skill, aggregate functions, and access control.

CCS Concepts • Human-centered computing → Personal digital assistants; • Computing methodologies → Natural language processing; • Software and its engineering → Context specific languages.

Keywords virtual assistants, semantic parsing, training data generation, data augmentation, data engineering

ACM Reference Format:
Giovanni Campagna, Silei Xu, Mehrad Moradshahi, Richard Socher, and Monica S. Lam. 2019. Genie: A Generator of Natural Language Semantic Parsers for Virtual Assistant Commands. In Proceedings of the 40th ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '19), June 22–26, 2019, Phoenix, AZ, USA. ACM, New York, NY, USA, 17 pages. https://doi.org/10.1145/3314221.3314594

[Figure 1 placeholder. Using Thingpedia, the user input "Get a cat picture and post it on Facebook with caption funny cat" is translated into the ThingTalk program now => @com.thecatapi.get() => @com.facebook.post_picture(picture_url = picture_url, caption = "funny cat"), which is then executed to produce the result.]

Figure 1. An example of translating and executing compound virtual assistant commands.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
PLDI '19, June 22–26, 2019, Phoenix, AZ, USA
© 2019 Copyright held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-6712-7/19/06.

1 Introduction

Personal virtual assistants provide users with a natural language interface to a wide variety of web services and IoT devices. Not only must they understand primitive commands across many domains, but they must also understand the composition of these commands to perform entire tasks. State-of-the-art virtual assistants are based on semantic parsing, a machine learning algorithm that converts natural language to a semantic representation in a formal language.

The breadth of the virtual assistant interface makes it particularly challenging to design the semantic representation. Furthermore, there is no existing corpus of natural language commands to train the neural model for new capabilities. This paper advocates using a Virtual Assistant Programming Language (VAPL) to capture the formal semantics of the virtual assistant capability. We also present Genie, a toolkit for creating a semantic parser for new virtual assistant capabilities that can be used to bootstrap real data acquisition.

1.1 Virtual Assistant Programming Languages

Previous semantic parsing work, including commercial assistants, typically translates natural language into an intermediate representation that matches the semantics of the sentences closely [4, 30, 33, 44, 51]. For example, the Alexa Meaning Representation Language [30, 44] is associated with a closed ontology of 20 domains, each manually tuned for accuracy. Semantically equivalent sentences have different representations, requiring complex and expensive manual annotation by experts, who must know the details of the formalism and associated ontology. The ontology also limits the scope of the available commands, as every parameter must be an entity in the ontology (a person, a location, etc.) and cannot be free-form text.

Our approach is to represent the capability of the virtual assistant fully and formally as a VAPL; we use a deep-learning semantic parser to translate natural language into VAPL code, which can directly be executed by the assistant. Thus, the assistant's full capability is exposed to the neural network, eliminating the need and inefficiency of an intermediate representation. The VAPL code can also be converted back into a canonical natural language sentence to confirm the program before execution. Furthermore, new capabilities can be supported by extending the VAPL.

The ThingTalk language designed for the open-source Almond virtual assistant is an example of a VAPL [8]. ThingTalk has one construct which has three clauses: when some event happens, get some data, and perform some action, each of which can be predicated. This construct combines primitives from the extensible runtime skill library, Thingpedia, currently consisting of over 250 APIs to Internet services and IoT devices. Despite its lean syntax, ThingTalk is expressive. It is a [...] which has crowdsourced more than 250,000 unique compound commands [56]. Fig. 1 shows how a natural-language sentence can be translated into a ThingTalk program, using the services in Thingpedia.

However, the original ThingTalk was not amenable to natural language translation, and no usable semantic parser has been developed. In attempting to create an effective semantic parser for ThingTalk, we discovered important design principles for VAPL, such as matching the non-developers' mental model and keeping the semantics of components orthogonal. Also, VAPL programs must have a (unique) canonical form so the result of the neural network can be checked for correctness easily. We applied these principles to overhaul and extend the design of ThingTalk. Unless noted otherwise, we use ThingTalk to refer to the new design in the rest of the paper.

1.2 Training Data Acquisition

Virtual assistant development is labor-intensive, with Alexa boasting a workforce of 10,000 employees [36]. Obtaining training data for the semantic parser is one of the challenging tasks. How do we get training data before deployment? How can we reduce the cost of annotating usage data? Wang et al. [57] propose a solution to acquire training data for the task of question answering over simple domains. They use a syntax-driven approach to create a canonical sentence for each formal program, ask crowdsourced workers to paraphrase canonical sentences to make them more natural, then use the paraphrases to train a machine learning model that can match input sentences against possible canonical sentences. Wang et al.'s approach designs each domain ontology individually, and each domain is small enough that all possible logical forms can be enumerated up to a certain depth.

This approach was used in the original ThingTalk semantic parser and has been shown to be inadequate [8]. It is infeasible to collect paraphrases for all the sentences supported by a VAPL language. Virtual assistants have powerful constructs to connect many diverse domains, and their capability scales superlinearly with the addition of APIs. Even with our small Thingpedia, ThingTalk supports hundreds of thousands of distinct programs. Also, it is not possible to generate just one canonical natural language that can be understood across different domains. Crowdworkers often paraphrase sentences incorrectly or just make minor modifications to original sentences.

Our approach is to design an NL-template language to help developers data-engineer a good training set. This language lets developers capture common ways in which VAPL programs are expressed in natural language. The NL-templates are used to synthesize pairs of natural language sentences and their corresponding VAPL code. A sample of such sentences is paraphrased by crowdsourced workers to make them more natural. The paraphrases further inform more useful