Semantic Parsing on Freebase from Question-Answer Pairs
Jonathan Berant, Andrew Chou, Roy Frostig, Percy Liang
Computer Science Department, Stanford University
{joberant,pliang}@cs.stanford.edu  {rf,andrewchou}@cs.stanford.edu

Abstract

In this paper, we train a semantic parser that scales up to Freebase. Instead of relying on annotated logical forms, which is especially expensive to obtain at large scale, we learn from question-answer pairs. The main challenge in this setting is narrowing down the huge number of possible logical predicates for a given question. We tackle this problem in two ways: First, we build a coarse mapping from phrases to predicates using a knowledge base and a large text corpus. Second, we use a bridging operation to generate additional predicates based on neighboring predicates. On the dataset of Cai and Yates (2013), despite not having annotated logical forms, our system outperforms their state-of-the-art parser. Additionally, we collected a more realistic and challenging dataset of question-answer pairs, on which our system improves over a natural baseline.

Figure 1: Our task is to map questions to answers via latent logical forms. To narrow down the space of logical predicates, we use (i) a coarse alignment based on Freebase and a text corpus and (ii) a bridging operation that generates predicates compatible with neighboring predicates.

1 Introduction

We focus on the problem of semantic parsing natural language utterances into logical forms that can be executed to produce denotations. Traditional semantic parsers (Zelle and Mooney, 1996; Zettlemoyer and Collins, 2005; Wong and Mooney, 2007; Kwiatkowski et al., 2010) have two limitations: (i) they require annotated logical forms as supervision, and (ii) they operate in limited domains with a small number of logical predicates. Recent developments aim to lift these limitations, either by reducing the amount of supervision (Clarke et al., 2010; Liang et al., 2011; Goldwasser et al., 2011; Artzi and Zettlemoyer, 2011) or by increasing the number of logical predicates (Cai and Yates, 2013). The goal of this paper is to do both: learn a semantic parser without annotated logical forms that scales to the large number of predicates on Freebase.

At the lexical level, a major challenge in semantic parsing is mapping natural language phrases (e.g., “attend”) to logical predicates (e.g., Education). While limited-domain semantic parsers are able to learn the lexicon from per-example supervision (Kwiatkowski et al., 2011; Liang et al., 2011), at large scale they have inadequate coverage (Cai and Yates, 2013). Previous work on semantic parsing on Freebase uses a combination of manual rules (Yahya et al., 2012; Unger et al., 2012), distant supervision (Krishnamurthy and Mitchell, 2012), and schema matching (Cai and Yates, 2013). We use a large amount of web text and a knowledge base to build a coarse alignment between phrases and predicates—an approach similar in spirit to Cai and Yates (2013).

However, this alignment only allows us to generate a subset of the desired predicates. Aligning light verbs (e.g., “go”) and prepositions is not very informative due to polysemy, and rare predicates (e.g., “cover price”) are difficult to cover even given a large corpus. To improve coverage, we propose a new bridging operation that generates predicates based on adjacent predicates rather than on words.

At the compositional level, a semantic parser must combine the predicates into a coherent logical form. Previous work based on CCG requires manually specifying combination rules (Krishnamurthy and Mitchell, 2012) or inducing the rules from annotated logical forms (Kwiatkowski et al., 2010; Cai and Yates, 2013). We instead define a few simple composition rules which over-generate, and then use model features to simulate soft rules and categories. In particular, we use POS tag features and features on the denotations of the predicted logical forms.

We experimented with two question answering datasets on Freebase. First, on the dataset of Cai and Yates (2013), we showed that our system outperforms their state-of-the-art system 62% to 59%, despite using no annotated logical forms. Second, we collected a new realistic dataset of questions by performing a breadth-first search using the Google Suggest API; these questions are then answered by Amazon Mechanical Turk workers. Although this dataset is much more challenging and noisy, we are still able to achieve 31.4% accuracy, a 4.5% absolute improvement over a natural baseline. Both datasets, as well as the source code for SEMPRE, our semantic parser, are publicly released and can be downloaded from http://nlp.stanford.edu/software/sempre/.

2 Setup

Problem Statement  Our task is as follows: Given (i) a knowledge base K, and (ii) a training set of question-answer pairs {(x_i, y_i)}_{i=1}^n, output a semantic parser that maps new questions x to answers y via latent logical forms z and the knowledge base K.

2.1 Knowledge base

Let E denote a set of entities (e.g., BarackObama), and let P denote a set of properties (e.g., PlaceOfBirth). A knowledge base K is a set of assertions (e1, p, e2) ∈ E × P × E (e.g., (BarackObama, PlaceOfBirth, Honolulu)). We use the Freebase knowledge base (Google, 2013), which has 41M non-numeric entities, 19K properties, and 596M assertions.[1]

[1] In this paper, we condense Freebase names for readability (/people/person becomes Person).
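To make this data model concrete, the following minimal Python sketch (ours, not from the paper) represents a toy knowledge base as a set of assertions (e1, p, e2) and indexes them by property; the entity and property names are illustrative stand-ins for condensed Freebase identifiers.

```python
# A toy knowledge base K: a set of assertions (e1, p, e2) in E x P x E.
from collections import defaultdict

K = {
    ("BarackObama", "PlaceOfBirth", "Honolulu"),
    ("BarackObama", "Profession", "Politician"),
    ("Honolulu", "ContainedBy", "Hawaii"),
}

# Index assertions by property so that a binary's denotation
# {(e1, e2) : (e1, p, e2) in K} can be retrieved directly.
by_property = defaultdict(set)
for e1, p, e2 in K:
    by_property[p].add((e1, e2))

print(by_property["PlaceOfBirth"])  # {('BarackObama', 'Honolulu')}
```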
2.2 Logical forms

To query the knowledge base, we use a logical language called Lambda Dependency-Based Compositional Semantics (λ-DCS)—see Liang (2013) for details. For the purposes of this paper, we use a restricted subset called simple λ-DCS, which we will define below for the sake of completeness.

The chief motivation of λ-DCS is to produce logical forms that are simpler than lambda calculus forms. For example, λx.∃a.p1(x, a) ∧ ∃b.p2(a, b) ∧ p3(b, e) is expressed compactly in λ-DCS as p1.p2.p3.e. Like DCS (Liang et al., 2011), λ-DCS makes existential quantification implicit, thereby reducing the number of variables. Variables are only used for anaphora and building composite binary predicates; these do not appear in simple λ-DCS.

Each logical form in simple λ-DCS is either a unary (which denotes a subset of E) or a binary (which denotes a subset of E × E). The basic λ-DCS logical forms z and their denotations ⟦z⟧_K are defined recursively as follows:

• Unary base case: If e ∈ E is an entity (e.g., Seattle), then e is a unary logical form with ⟦e⟧_K = {e}.
• Binary base case: If p ∈ P is a property (e.g., PlaceOfBirth), then p is a binary logical form with ⟦p⟧_K = {(e1, e2) : (e1, p, e2) ∈ K}.[2]
• Join: If b is a binary and u is a unary, then b.u (e.g., PlaceOfBirth.Seattle) is a unary denoting a join and project: ⟦b.u⟧_K = {e1 ∈ E : ∃e2. (e1, e2) ∈ ⟦b⟧_K ∧ e2 ∈ ⟦u⟧_K}.
• Intersection: If u1 and u2 are both unaries, then u1 ⊓ u2 (e.g., Profession.Scientist ⊓ PlaceOfBirth.Seattle) denotes set intersection: ⟦u1 ⊓ u2⟧_K = ⟦u1⟧_K ∩ ⟦u2⟧_K.
• Aggregation: If u is a unary, then count(u) denotes the cardinality: ⟦count(u)⟧_K = {|⟦u⟧_K|}.

[2] Binaries can also be built out of lambda abstractions (e.g., λx.Performance.Actor.x), but as these constructions are not central to this paper, we defer to Liang (2013).

As a final example, “number of dramas starring Tom Cruise” in lambda calculus would be represented as count(λx.Genre(x, Drama) ∧ ∃y.Performance(x, y) ∧ Actor(y, TomCruise)); in λ-DCS, it is simply count(Genre.Drama ⊓ Performance.Actor.TomCruise).

It is useful to think of the knowledge base K as a directed graph in which entities are nodes and properties are labels on the edges. Then simple λ-DCS unary logical forms are tree-like graph patterns which pick out a subset of the nodes.
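The recursive denotation rules above translate almost line-for-line into code. The following sketch (ours, not the paper's SEMPRE implementation) evaluates simple λ-DCS logical forms, encoded as nested tuples, against the toy `by_property` index from the previous sketch:

```python
# Evaluating simple λ-DCS against the toy knowledge base defined earlier.
# Logical forms are nested tuples: ("entity", e), ("property", p),
# ("join", b, u), ("and", u1, u2), ("count", u).

def unary_denotation(z, by_property):
    """Compute ⟦z⟧_K for a unary logical form: a set of entities."""
    op = z[0]
    if op == "entity":                 # unary base case: ⟦e⟧ = {e}
        return {z[1]}
    if op == "join":                   # ⟦b.u⟧ = {e1 : ∃e2. (e1,e2) ∈ ⟦b⟧ ∧ e2 ∈ ⟦u⟧}
        pairs = binary_denotation(z[1], by_property)
        u = unary_denotation(z[2], by_property)
        return {e1 for (e1, e2) in pairs if e2 in u}
    if op == "and":                    # ⟦u1 ⊓ u2⟧ = ⟦u1⟧ ∩ ⟦u2⟧
        return (unary_denotation(z[1], by_property)
                & unary_denotation(z[2], by_property))
    if op == "count":                  # ⟦count(u)⟧ = {|⟦u⟧|}
        return {len(unary_denotation(z[1], by_property))}
    raise ValueError(f"unknown unary form: {op}")

def binary_denotation(z, by_property):
    """Compute ⟦p⟧_K for a binary: a set of entity pairs."""
    assert z[0] == "property"          # only the binary base case in simple λ-DCS
    return by_property[z[1]]

# PlaceOfBirth.Honolulu ⊓ Profession.Politician
z = ("and",
     ("join", ("property", "PlaceOfBirth"), ("entity", "Honolulu")),
     ("join", ("property", "Profession"), ("entity", "Politician")))
print(unary_denotation(z, by_property))  # {'BarackObama'}
```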
Figure 2: An example of a derivation d of the utterance “Where was Obama born?” and its sub-derivations, each labeled with a composition rule (in blue) and a logical form (in red). The derivation skips the words “was” and “?”.

2.3 Framework

Given an utterance x, our semantic parser constructs a distribution over possible derivations D(x). Each derivation d ∈ D(x) is a tree specifying the application of a set of combination rules that culminates in the logical form d.z at the root of the tree—see Figure 2 for an example. This construction easily over-generates. We instead rely on features and learning to guide us away from the bad derivations.

Modeling  Following Zettlemoyer and Collins (2005) and Liang et al. (2011), we define a discriminative log-linear model over derivations d ∈ D(x) given utterances x:

p_θ(d | x) = exp{φ(x, d)ᵀθ} / Σ_{d′ ∈ D(x)} exp{φ(x, d′)ᵀθ},

where φ(x, d) is a feature vector extracted from the utterance and the derivation, and θ ∈ ℝ^b is the vector of parameters to be learned. As our training data consists only of question-answer pairs (x_i, y_i), we maximize the log-likelihood of the correct answers.
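For illustration, here is a minimal numpy sketch of this model (ours, not the paper's code): given a matrix whose rows are the feature vectors φ(x, d) of the candidate derivations in D(x), the distribution p_θ(· | x) is a softmax of the scores φ(x, d)ᵀθ. The feature values and θ below are made up.

```python
import numpy as np

def derivation_distribution(Phi: np.ndarray, theta: np.ndarray) -> np.ndarray:
    """Rows of Phi are φ(x, d) for d in D(x); returns p_θ(· | x)."""
    scores = Phi @ theta          # φ(x, d)ᵀθ for every candidate derivation
    scores -= scores.max()        # subtract the max to stabilize the exponentials
    probs = np.exp(scores)
    return probs / probs.sum()

# Three hypothetical derivations described by b = 4 features each.
Phi = np.array([[1.0, 0.0, 2.0, 0.0],
                [0.0, 1.0, 0.0, 1.0],
                [1.0, 1.0, 1.0, 1.0]])
theta = np.array([0.5, -0.2, 1.0, 0.3])
print(derivation_distribution(Phi, theta))  # sums to 1 over D(x)
```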