Learning Executable Semantic Parsers for Natural Language Understanding

Learning Executable Semantic Parsers for Natural Language Understanding Percy Liang Computer Science Department Stanford University Stanford, CA [email protected] ABSTRACT cle focuses on semantic parsing, an area within the field of For building question answering systems and natural lan- natural language processing (NLP), which has been growing guage interfaces, semantic parsing has emerged as an impor- over the last decade. Semantic parsers map input utter- tant and powerful paradigm. Semantic parsers map natural ances into semantic representations called logical forms that language into logical forms, the classic representation for support this form of reasoning. For example, the first utter- many important linguistic phenomena. The modern twist ance above would map onto the logical form max(primes \ is that we are interested in learning semantic parsers from (−∞; 10)). We can think of the logical form as a program data, which introduces a new layer of statistical and com- that is executed to yield the desired behavior (e.g., answer- putational issues. This article lays out the components of a ing 7). The second utterance would map onto a database statistical semantic parser, highlighting the key challenges. query; the third, onto an invocation of a calendar API. We will see that semantic parsing is a rich fusion of the logical and the statistical world, and that this fusion will play an Semantic parsing is rooted in formal semantics, pioneered by integral role in the future of natural language understanding logician Richard Montague [25], who famously argued that systems. there is \no important theoretical difference between natural languages and the artificial languages of logicians." Semantic parsing, by residing in the practical realm, is more exposed Categories and Subject Descriptors to the differences between natural language and logic, but I.2.7 [Artificial Intelligence]: Natural Language Process- it inherits two general insights from formal semantics: The ing|Language parsing and understanding first idea is model theory, which states that expressions (e.g., primes) are mere symbols which only obtain their meaning 1. INTRODUCTION or denotation (e.g., f2; 3; 5;::: g) by executing the expression A long-standing goal of artificial intelligence (AI) is to build with respect to a model, or in our terminology, a context. systems capable of understanding natural language. To fo- This property allows us to factor out the understanding of cus the notion of \understanding" a bit, let us say that the language (semantic parsing) from world knowledge (execu- system must produce an appropriate action upon receiving tion). Indeed, one can understand the utterance \What is an input utterance from a human. For example: the largest prime less than 10?" without actually computing the answer. The second idea is compositionality, a principle Context: knowledge of mathematics often attributed to Gottlob Frege, which states that the de- Utterance: What is the largest prime less than 10? notation of an expression is defined recursively in terms of Action: 7 the denotations of its subexpressions. For example, primes denotes the set of primes, (−∞; 10) denotes the set of num- Context: knowledge of geography bers smaller than 10, and so primes \ (−∞; 10) denotes the Utterance: What is the tallest mountain in Europe? intersection of those two sets. This compositionality is what Action: Mt. Elbrus allows us to have a succinct characterization of meaning for Context: user's calendar a combinatorial range of possible utterances. Utterance: Cancel all my meetings after 4pm tomorrow. Action: (removes meetings from calendar) Early systems. Logical forms have played a foundational role in natural language understanding systems since their We are interested in utterances such as the ones above, genesis in the 1960s. Early examples included LUNAR, a which require deep understanding and reasoning. This arti- natural language interface into a database about moon rocks [34], and SHRDLU, a system that could both answer ques- tions and perform actions in a toy blocks world environment [32]. For their time, these systems were significant achieve- ments. They were able to handle fairly complex linguistic phenomena and integrate syntax, semantics, and reasoning in an end-to-end application. For example, SHRDLU was able to process \Find a block which is taller than the one you are holding and put it into the box." However, as the systems were based on hand-crafted rules, it became increasingly difficult to generalize beyond the narrow domains and handle x: What is the largest prime less than 10? y: 7 the intricacies of general language. c: primes : f2; 3; 5; 7; 11;::: g Rise of machine learning. In the early 1990s, influenced by the successes of statistical techniques in the neighbor- ing speech recognition community, the field of NLP under- Figure 1: A natural language understanding prob- went a statistical revolution. Machine learning offered a new lem where the goal is to map an utterance x in a paradigm: Collect examples of the desired input-output be- context c to an action y. havior and then fit a statistical model to these examples. The simplicity of this paradigm coupled with the increase in data and computation allowed machine learning to prevail. [29, 2], identifying objects in a scene [23, 15], converting natural language to regular expressions [17], and many others. What fell out of favor was not only rule-based methods, but also the natural language understanding problems. In Outlook. Today, the semantic parsing community is a vi- the statistical NLP era, much of the community's atten- brant field, but it is still young and grappling with the com- tion turned to tasks|documentation classification, part-of- plexities of the natural language understanding problem. Se- speech tagging, and syntactic parsing|which fell short of mantic parsing poses three types of challenges: full end-to-end understanding. Even question answering systems relied less on understanding and more on a shallower • Linguistic: How should we represent the semantics of analysis coupled with a large collection of unstructured text natural language and construct it compositionally from documents [10], typified by the TREC competitions. the natural language? Statistical semantic parsing. The spirit of deep under- • Statistical: How can we learn semantic parsers from standing was kept alive by researchers in statistical semantic weak supervision and generalize well to new examples? parsing [36, 24, 33, 37, 19]. A variety of different semantic • Computational: How do we efficiently search over the representations and learning algorithms were employed, but combinatorially large space of possible logical forms? all of these approaches relied on having a labeled dataset of natural language utterances paired with annotated logical forms, for example: The rest of the paper is structured as follows: We first present a general framework for semantic parsing, intro- Utterance: What is the largest prime less than 10? ducing the key components (Section 2). The framework is Logical form: max(primes \ (−∞; 10)) pleasantly modular, with different choices of the components corresponding to existing semantic parsers in the literature Weak supervision. Over the last few years, two exciting (Section 3). We describe the datasets (Section 4) and then developments have really spurred interest in semantic pars- conclude (Section 5). ing. The first is reducing the amount of supervision from annotated logical forms to answers [13, 21]: 2. FRAMEWORK Utterance: What is the largest prime less than 10? Natural language understanding problem. In this ar- Action: 7 ticle, we focus on the following natural language understanding problem: Given an utterance x in a context c, output the This form of supervision is much easier to obtain via crowd- desired action y. Figure 1 shows the setup for a question sourcing. Although the logical forms are not observed, they answering application, in which case x is a question, c is a are still modeled as latent variables, which must be inferred knowledge base, and y is the answer. In a robotics applica- from the answer. This results in a more difficult learning tion, x is a command, c represents the robot's environment, problem, but [21] showed that it is possible to solve it with- and y is the desired sequence of actions to be carried by the out degrading accuracy. robot [29]. To build such a system, assume that we are given n a set of n examples f(xi; ci; yi)gi=1. We would like to use Scaling up. The second development is the scaling up of these examples to train a model that can generalize to new semantic parsers to more complex domains. Previous se- unseen utterances and contexts. mantic parsers had only been trained on limited domains such as US geography, but the creation of broad-coverage Semantic parsing components. This article focuses on knowledge bases such as Freebase [8] set the stage for a new a statistical semantic parsing approach to the above prob- generation of semantic parsers for question answering. Ini- lem, where the key is to posit an intermediate logical form tial systems required annotated logical forms [11], but soon, z that connects x and y. Specifically, z captures the seman- systems became trainable from answers [4, 18, 5]. Semantic tics of the utterance x, and it also executes to the action y parsers have even been extended beyond fixed knowledge (in the context of c). In our running example, z would be bases to semi-structured tables [26]. With the ability to max(primes \ (−∞; 10)). Our semantic parsing framework learn semantic parsers from question-answer pairs, it is easy consists of the following five components (see Figure 2): to collect datasets via crowdsourcing. As a result, semantic parsing datasets have grown by an order of magnitude. 1. Executor: computes the denotation (action) y = z c J K In addition, semantic parsers have been applied to a number given a logical form z and context c.

Load more