Learning Dependency-Based Compositional Semantics
Percy Liang∗ University of California, Berkeley
Michael I. Jordan∗∗ University of California, Berkeley
Dan Klein† University of California, Berkeley

Suppose we want to build a system that answers a natural language question by representing its semantics as a logical form and computing the answer given a structured database of facts. The core part of such a system is the semantic parser that maps questions to logical forms. Semantic parsers are typically trained from examples of questions annotated with their target logical forms, but this type of annotation is expensive. Our goal is to instead learn a semantic parser from question–answer pairs, where the logical form is modeled as a latent variable. We develop a new semantic formalism, dependency-based compositional semantics (DCS), and define a log-linear distribution over DCS logical forms. The model parameters are estimated using a simple procedure that alternates between beam search and numerical optimization. On two standard semantic parsing benchmarks, we show that our system obtains accuracies comparable even to state-of-the-art systems that do require annotated logical forms.

∗ Computer Science Division, University of California, Berkeley, CA 94720, USA. E-mail: [email protected].
∗∗ Computer Science Division and Department of Statistics, University of California, Berkeley, CA 94720, USA. E-mail: [email protected].
† Computer Science Division, University of California, Berkeley, CA 94720, USA. E-mail: [email protected].

Submission received: 12 September 2011; revised submission received: 19 February 2012; accepted for publication: 18 April 2012. doi:10.1162/COLI_a_00127

No rights reserved. This work was authored as part of the Contributor's official duties as an Employee of the United States Government and is therefore a work of the United States Government. In accordance with 17 U.S.C. 105, no copyright protection is available for such works under U.S. law.

Computational Linguistics, Volume 39, Number 2

1. Introduction

One of the major challenges in natural language processing (NLP) is building systems that both handle complex linguistic phenomena and require minimal human effort. The difficulty of achieving both criteria is particularly evident in training semantic parsers, where annotating linguistic expressions with their associated logical forms is expensive but, until recently, seemingly unavoidable. Advances in learning latent-variable models, however, have made it possible to progressively reduce the amount of supervision required for various semantics-related tasks (Zettlemoyer and Collins 2005; Branavan et al. 2009; Liang, Jordan, and Klein 2009; Clarke et al. 2010; Artzi and Zettlemoyer 2011; Goldwasser et al. 2011).

In this article, we develop new techniques to learn accurate semantic parsers from even weaker supervision. We demonstrate our techniques on the concrete task of building a system to answer questions given a structured database of facts; see Figure 1 for an example in the domain of U.S. geography. This problem of building natural language interfaces to databases (NLIDBs) has a long history in NLP, starting from the early days of artificial intelligence with systems such as LUNAR (Woods, Kaplan, and Webber 1972), CHAT-80 (Warren and Pereira 1982), and many others (see Androutsopoulos, Ritchie, and Thanisch [1995] for an overview). We believe NLIDBs provide an appropriate starting point for semantic parsing because they lead directly to practical systems, and they allow us to temporarily sidestep intractable philosophical questions on how to represent meaning in general. Early NLIDBs were quite successful in their respective limited domains, but because these systems were constructed from manually built rules, they became difficult to scale up, both to other domains and to more complex utterances.
In response, against the backdrop of a statistical revolution in NLP during the 1990s, researchers began to build systems that could learn from examples, with the hope of overcoming the limitations of rule-based methods. One of the earliest statistical efforts was the CHILL system (Zelle and Mooney 1996), which learned a shift-reduce semantic parser. Since then, there has been a healthy line of work yielding increasingly more accurate semantic parsers by using new semantic representations and machine learning techniques (Miller et al. 1996; Zelle and Mooney 1996; Tang and Mooney 2001; Ge and Mooney 2005; Kate, Wong, and Mooney 2005; Zettlemoyer and Collins 2005; Kate and Mooney 2006; Wong and Mooney 2006; Kate and Mooney 2007; Wong and Mooney 2007; Zettlemoyer and Collins 2007; Kwiatkowski et al. 2010, 2011).

Although statistical methods provided advantages such as robustness and portability, their application in semantic parsing achieved only limited success. One of the main obstacles was that these methods depended crucially on having examples of utterances paired with logical forms, which require substantial human effort to obtain. Furthermore, the annotators must be proficient in some formal language, which drastically reduces the size of the annotator pool, dampening any hope of acquiring enough data to fulfill the vision of learning highly accurate systems.

In response to these concerns, researchers have recently begun to explore the possibility of learning a semantic parser without any annotated logical forms (Clarke et al. 2010; Artzi and Zettlemoyer 2011; Goldwasser et al. 2011; Liang, Jordan, and Klein 2011). It is in this vein that we develop our present work.

Figure 1: The concrete objective: a system that answers natural language questions given a structured database of facts. An example is shown in the domain of U.S. geography.

Figure 2: Our statistical methodology consists of two steps: (i) semantic parsing (p(z | x; θ)): an utterance x is mapped to a logical form z by drawing from a log-linear distribution parametrized by a vector θ; and (ii) evaluation ([[z]]w): the logical form z is evaluated with respect to the world w (database of facts) to deterministically produce an answer y. The figure also shows an example configuration of the variables around the graphical model. Logical forms z are represented as labeled trees. During learning, we are given w and (x, y) pairs (shaded nodes) and try to infer the latent logical forms z and parameters θ.

Specifically, given a set of (x, y) example pairs, where x is an utterance (e.g., a question) and y is the corresponding answer, we wish to learn a mapping from x to y. What makes this mapping particularly interesting is that it passes through a latent logical form z, which is necessary to capture the semantic complexities of natural language. Also note that whereas the logical form z was the end goal in much of the earlier work on semantic parsing, for us it is just an intermediate variable: a means towards an end. Figure 2 shows the graphical model that captures the learning setting we just described: the question x, answer y, and world/database w are all observed. We want to infer the logical forms z and the parameters θ of the semantic parser, which are unknown quantities. Although liberating ourselves from annotated logical forms reduces cost, it does increase the difficulty of the learning problem.
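As a toy illustration of the two steps described in Figure 2, the sketch below pairs a hand-written log-linear scorer with a deterministic evaluator over a miniature geography database. All names, features, and weights here are hypothetical simplifications introduced for illustration; the actual system parses utterances into DCS trees under learned parameters θ.

```python
import math

# Step (ii), evaluation [[z]]_w: the world w is a database of facts.
# In this toy, a logical form z is just a (predicate, argument) pair.
WORLD = {"capital": {"California": "Sacramento", "Oregon": "Salem"}}

def evaluate(z, world):
    """Deterministically evaluate logical form z against world w."""
    predicate, argument = z
    return world[predicate][argument]

# Step (i), semantic parsing p(z | x; theta): score candidate logical
# forms with a log-linear model (theta is hand-set here, learned in reality).
def features(x, z):
    predicate, argument = z
    return {("pred", predicate): float(predicate in x),
            ("arg", argument): float(argument in x)}

def parse_distribution(x, candidates, theta):
    scores = [sum(theta.get(f, 0.0) * v for f, v in features(x, z).items())
              for z in candidates]
    norm = sum(math.exp(s) for s in scores)
    return {z: math.exp(s) / norm for z, s in zip(candidates, scores)}

x = "what is the capital of California"
candidates = [("capital", "California"), ("capital", "Oregon")]
theta = {("pred", "capital"): 1.0, ("arg", "California"): 1.0,
         ("arg", "Oregon"): 1.0}

p = parse_distribution(x, candidates, theta)
z_best = max(p, key=p.get)   # most probable logical form under p(z | x; theta)
y = evaluate(z_best, WORLD)  # -> "Sacramento"
```

Note that only the parsing step is probabilistic; once z is chosen, the answer y follows deterministically from the database, which is what makes y usable as an indirect training signal for z.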
The core challenge here is program induction: on each example (x, y), we need to efficiently search over the exponential space of possible logical forms (programs) z and find ones that produce the target answer y, a computationally daunting task. There is also a statistical challenge: how do we parametrize the mapping from utterance x to logical form z so that it can be learned from only the indirect signal y? To address these two challenges, we must first discuss the issue of semantic representation. There are two basic questions here: (i) what should the formal language for the logical forms z be, and (ii) what are the compositional mechanisms for constructing those logical forms?

The semantic parsing literature has considered many different formal languages for representing logical forms, including SQL (Giordani and Moschitti 2009), Prolog (Zelle and Mooney 1996; Tang and Mooney 2001), a simple functional query language called FunQL (Kate, Wong, and Mooney 2005), and lambda calculus (Zettlemoyer and Collins 2005), to name just a few. The construction mechanisms are equally diverse, including synchronous grammars (Wong and Mooney 2007), hybrid trees (Lu et al. 2008), Combinatory Categorial Grammars (CCG) (Zettlemoyer and Collins 2005), and shift-reduce derivations (Zelle and Mooney 1996). It is worth pointing out that the choice of formal language and the choice of construction mechanism are more orthogonal than is often assumed: the former is concerned with what the logical forms look like; the latter, with how to generate a set of possible logical forms compositionally given an utterance. (How to score these logical forms is yet another dimension.) Existing systems are rarely based on the joint design of the formal language and the construction mechanism; one or the other is often chosen for convenience from existing implementations.
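To make the program-induction challenge above concrete, here is a minimal sketch of the search step: enumerate candidate logical forms, prune to a beam under the current model score, and keep only the forms whose denotation matches the observed answer y. The candidate grid, the overlap scorer, and the beam size are illustrative stand-ins, not the paper's DCS construction mechanism.

```python
# Illustrative toy world: which states border which.
WORLD = {"border": {"California": {"Oregon", "Nevada", "Arizona"},
                    "Oregon": {"California", "Nevada", "Washington", "Idaho"}}}

def evaluate(z, world):
    """Deterministically evaluate logical form z = (predicate, argument)."""
    predicate, argument = z
    return world[predicate][argument]

def overlap_score(x, z):
    # Stand-in for the log-linear score under the current parameters theta.
    return sum(1.0 for token in z if token in x)

def consistent_forms(x, y, world, score, beam_size=10):
    # In general the space of programs is exponential; this toy grid simply
    # enumerates every (predicate, argument) pair in the database.
    candidates = [(p, a) for p in world for a in world[p]]
    # Beam pruning keeps the search tractable at the cost of completeness.
    beam = sorted(candidates, key=lambda z: score(x, z), reverse=True)[:beam_size]
    # Indirect supervision: keep forms whose denotation equals the answer y.
    return [z for z in beam if evaluate(z, world) == y]

x = "which states border California"
y = {"Oregon", "Nevada", "Arizona"}
good = consistent_forms(x, y, WORLD, overlap_score)  # -> [("border", "California")]
```

The consistent forms found this way serve as the positive signal for updating θ, in the spirit of the procedure from the abstract that alternates beam search with numerical optimization.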
For example, Prolog and SQL have often been chosen as formal languages for convenience in end applications, but they were not designed for representing the semantics of natural language, and, as a result, the construction mechanism that bridges the gap between natural