Proceedings of the Thirty-First AAAI Conference on Artificial Intelligence (AAAI-17)

Open-Vocabulary Semantic Parsing with both Distributional Statistics and Formal Knowledge

Matt Gardner, Jayant Krishnamurthy
Allen Institute for Artificial Intelligence
Seattle, Washington, USA
{mattg,jayantk}@allenai.org

Abstract

Traditional semantic parsers map language onto compositional, executable queries in a fixed schema. This mapping allows them to effectively leverage the information contained in large, formal knowledge bases (KBs, e.g., Freebase) to answer questions, but it is also fundamentally limiting—these semantic parsers can only assign meaning to language that falls within the KB's manually-produced schema. Recently proposed methods for open vocabulary semantic parsing overcome this limitation by learning execution models for arbitrary language, essentially using a text corpus as a kind of knowledge base. However, all prior approaches to open vocabulary semantic parsing replace a formal KB with textual information, making no use of the KB in their models. We show how to combine the disparate representations used by these two approaches, presenting for the first time a semantic parser that (1) produces compositional, executable representations of language, (2) can successfully leverage the information contained in both a formal KB and a large corpus, and (3) is not limited to the schema of the underlying KB. We demonstrate significantly improved performance over state-of-the-art baselines on an open-domain natural language task.

1 Introduction

Semantic parsing is the task of mapping a phrase in natural language onto a formal query in some fixed schema, which can then be executed against a knowledge base (KB) (Zelle and Mooney 1996; Zettlemoyer and Collins 2005). For example, the phrase "Who is the president of the United States?" might be mapped onto the query λ(x)./GOVERNMENT/PRESIDENT_OF(x, USA), which, when executed against Freebase (Bollacker et al. 2008), returns BARACK OBAMA. By mapping phrases to executable statements, semantic parsers can leverage large, curated sources of knowledge to answer questions (Berant et al. 2013).

This benefit comes with an inherent limitation, however—semantic parsers can only produce executable statements within their manually-produced schema. There is no query against Freebase that can answer questions like "Who are the Democratic front-runners in the US election?", as Freebase does not encode information about front-runners. Semantic parsers trained for Freebase fail on these kinds of questions.

To overcome this limitation, recent work has proposed methods for open vocabulary semantic parsing, which replace a formal KB with a probabilistic database learned from a text corpus. In these methods, language is mapped onto queries with predicates derived directly from the text itself (Lewis and Steedman 2013; Krishnamurthy and Mitchell 2015). For instance, the question above might be mapped to λ(x).president_of(x, USA). This query is not executable against any KB, however, and so open vocabulary semantic parsers must learn execution models for the predicates found in the text. They do this with a distributional approach similar to word embedding methods, giving them broad coverage, but lacking access to the large, curated KBs available to traditional semantic parsers.

Prior work in semantic parsing, then, has either had direct access to the information in a knowledge base, or broad coverage over all of natural language using the information in a large corpus, but not both.

In this work, we show how to combine these two approaches by incorporating KB information into open vocabulary semantic parsing models. Our key insight is that formal KB queries can be converted into features that can be added to the learned execution models of open vocabulary semantic parsers. This conversion allows open vocabulary models to use the KB fact /GOVERNMENT/PRESIDENT_OF(BARACK OBAMA, USA) when scoring president_of(BARACK OBAMA, USA), without requiring the model to map the language onto a single formal statement. Crucially, this featurization also allows the model to use these KB facts even when they only provide partial information about the language being modeled. For example, knowing that an entity is a POLITICIAN is very helpful information for deciding whether that entity is a front-runner. Our approach, outlined in Figure 1, effectively learns the meaning of a word as a distributional vector plus a weighted combination of Freebase queries, a considerably more expressive representation than those used by prior work.

While this combination is the main contribution of our work, we also present some small improvements that allow open vocabulary semantic parsing models to make better use of KB information when it is available: improving the logical forms generated by the semantic parser, and employing a simple technique from related work for generating candidate entities from the KB.

[Figure 1 walks through the model on an example. The input text "Italian architect" is parsed to the logical form λx.architect(x) ∧ architect_N/N(ITALY, x). Candidate entities are scored by the probability that they are in the denotation: p(architect(PALLADIO)) × p(architect_N/N(ITALY, PALLADIO)) = 0.79 for PALLADIO, versus 0.01 for OBAMA. Each category or relation model pairs a distributional vector θ with weights ω over selected Freebase queries (e.g., for architect: TYPE:ARCHITECT → .52, TYPE:DESIGNER → .32, NATIONALITY:ITALY → .20; for architect_N/N: /PERSON/NATIONALITY⁻¹ → .29, /STRUCTURE/ARCHITECT → .11, /PERSON/ETHNICITY⁻¹ → .05). Each entity or entity pair has a distributional vector φ and a binary feature vector ψ marking which of the selected queries return it. A predicate instance is scored as p = σ(θᵀφ + ωᵀψ).]

Figure 1: Overview of the components of our model. Given an input text, we use a CCG parser and an entity linker to produce a logical form with predicates derived from the text (shown in italics). For each predicate, we learn a distributional vector θ, as well as weights ω associated with a set of selected Freebase queries. For each entity and entity pair, we learn a distributional vector φ, and we extract a binary feature vector ψ from Freebase, indicating whether each entity or entity pair is in the set returned by the selected Freebase queries. These models are combined to assign probabilities to candidate entities.

We demonstrate our approach on the task of answering open-domain fill-in-the-blank natural language questions. By giving open vocabulary semantic parsers direct access to KB information, we improve mean average precision on this task by over 120%.

2 Open vocabulary semantic parsing

In this section, we briefly describe the current state-of-the-art model for open vocabulary semantic parsing, introduced by Krishnamurthy and Mitchell (2015). Instead of mapping text to Freebase queries, as done by a traditional semantic parser, their method parses text to a surface logical form with predicates derived directly from the words in the text (see Figure 1). Next, a distribution over denotations for each predicate is learned using a matrix factorization approach similar to that of Riedel et al. (2013). This distribution is concisely represented using a probabilistic database, which also enables efficient probabilistic execution of logical form queries.

The matrix factorization has two sets of parameters: each category or relation has a learned k-dimensional embedding θ, and each entity or entity pair has a learned k-dimensional embedding φ. The probability assigned to a category instance c(e) or relation instance r(e1, e2) is given by:

p(c(e)) = σ(θ_c^T φ_e)
p(r(e1, e2)) = σ(θ_r^T φ_(e1,e2))

The probability of a predicate instance is the sigmoided inner product of the corresponding predicate and entity embeddings. Predicates with nearby embeddings will have similar distributions over the entities in their denotation. The parameters θ and φ are learned using a query ranking objective that optimizes them to rank entities observed in the denotation of a logical form above unobserved entities. Given the trained predicate and entity parameters, the system is capable of efficiently computing the marginal probability that an entity is an element of a logical form's denotation using approximate inference algorithms for probabilistic databases.

The model presented in this section is purely distributional, with predicate and entity models that draw only on co-occurrence information found in a corpus. In the following sections, we show how to augment this model with information contained in large, curated KBs such as Freebase.
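To make the scoring rule above concrete, here is a minimal sketch of the distributional model's truth probability for a category instance. The embeddings are random stand-ins for the trained parameters, and all names are illustrative:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Toy stand-ins for the learned k-dimensional embeddings; in the real
# model these are trained with the query ranking objective above.
k = 4
rng = np.random.default_rng(0)
theta = {"architect": rng.normal(size=k)}   # predicate embeddings
phi = {"PALLADIO": rng.normal(size=k)}      # entity embeddings

def p_category(c, e):
    """p(c(e)) = sigma(theta_c^T phi_e), the sigmoided inner product."""
    return sigmoid(theta[c] @ phi[e])

print(p_category("architect", "PALLADIO"))
```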

3 Converting Freebase queries to features

Our key insight is that the executable queries used by traditional semantic parsers can be converted into features that provide KB information to the execution models of open vocabulary semantic parsers. Here we show how this is done.

Traditional semantic parsers map words onto distributions over executable queries, select one to execute, and return sets of entities or entity pairs from a KB as a result. Instead of executing a single query, we can simply execute all possible queries and use an entity's (or entity pair's) membership in each set as a feature in our predicate models.

There are two problems with this approach: (1) the set of all possible queries is intractably large, so we need a mechanism similar to a semantic parser's lexicon to select a small set of queries for each word; and (2) executing hundreds or thousands of queries at runtime for each predicate and entity is not computationally tractable. To solve these problems, we use a graph-based technique called subgraph feature extraction (SFE) (Gardner and Mitchell 2015).

3.1 Subgraph feature extraction

SFE is a technique for generating feature matrices over node pairs in graphs with labeled edges. When the graph corresponds to a formal KB such as Freebase, the features generated by SFE are isomorphic to statements in the KB schema (Gardner 2015). This means that we can use SFE to generate a feature vector for each entity (or entity pair) which succinctly captures the set of all statements¹ in whose denotations the entity (or entity pair) appears. Using this feature vector as part of the semantic parser's entity models solves problem (2) above, and performing feature selection for each predicate solves problem (1).

¹ In a restricted class, which contains Horn clauses and a few other things; see Gardner (2015) for more details.

Some example features extracted by SFE are shown in Figure 2. For entity pairs, these features include the sequence of edges (or paths) connecting the nodes corresponding to the entity pair. For entities, these features include the set of paths connected to the node, optionally including the node at the end of the path. Note the correspondence between these features and Freebase queries: the path DESIGNED→LOCATED_IN can be executed as a query against Freebase, returning a set of (architect, location) entity pairs, where the architect designed a structure in the location. (PALLADIO, ITALY) is one such entity pair, so this pair has a feature value of 1 for this query.

[Figure 2 shows a subset of the Freebase graph connecting PALLADIO, ITALY, and VILLA CAPRA through NATIONALITY, DESIGNED, LOCATED_IN, COUNTRY, and TYPE edges. Example features between PALLADIO and ITALY: NATIONALITY; DESIGNED→LOCATED_IN. Example features for PALLADIO: NATIONALITY; NATIONALITY:ITALY; TYPE:ARCHITECT; DESIGNED→LOCATED_IN; DESIGNED→LOCATED_IN:ITALY.]

Figure 2: A subset of the Freebase graph, and some example extracted features. The actual Freebase relations and entity identifiers used are modified here to aid readability.

3.2 Feature selection

The feature vectors produced by SFE contain tens of millions of possible formal statements. Out of these tens of millions of formal statements, only a handful represent relevant Freebase queries for any particular predicate. We therefore select a small number of statements to consider for each learned predicate in the open vocabulary semantic parser.

We select features by first summing the entity and entity pair feature vectors seen with each predicate in the training data. For example, the phrase "Italian architect Andrea Palladio" is considered a positive training example for the predicate instances architect(PALLADIO) and architect_N/N(ITALY, PALLADIO). We add the feature vectors for PALLADIO and (ITALY, PALLADIO) to the feature counts for the predicates architect and architect_N/N, respectively. This gives a set of counts COUNT(π), COUNT(f), and COUNT(π ∧ f) for each predicate π and feature f. The features are then ranked by PMI for each predicate by computing COUNT(π ∧ f) / (COUNT(π) × COUNT(f)). After removing low-frequency features, we pick the k = 100 features with the highest PMI values for each predicate to use in our model.
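A minimal sketch of this PMI-based ranking, assuming the SFE feature sets for each entity (or entity pair) have been precomputed; the min_count cutoff is an illustrative stand-in for the low-frequency filter, whose exact threshold the text does not specify:

```python
from collections import Counter

def select_features(training_examples, entity_features, k=100, min_count=5):
    """Rank Freebase-derived SFE features for each predicate by PMI.

    training_examples: iterable of (predicate, entity_or_pair) instances.
    entity_features: maps each entity (or entity pair) to its set of
        precomputed SFE features.
    """
    pred_count = Counter()   # COUNT(pi)
    feat_count = Counter()   # COUNT(f)
    joint_count = Counter()  # COUNT(pi ^ f)
    for pred, entity in training_examples:
        pred_count[pred] += 1
        for f in entity_features.get(entity, ()):
            feat_count[f] += 1
            joint_count[pred, f] += 1
    selected = {}
    for pred in pred_count:
        # PMI ranking: COUNT(pi ^ f) / (COUNT(pi) * COUNT(f)).
        scored = sorted(
            ((joint_count[pred, f] / (pred_count[pred] * feat_count[f]), f)
             for f in feat_count
             if joint_count[pred, f] >= min_count),
            reverse=True)
        selected[pred] = [f for _, f in scored[:k]]
    return selected
```

In the Palladio example above, the feature sets of PALLADIO and (ITALY, PALLADIO) would be counted toward architect and architect_N/N, respectively, before ranking.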
4 Combined predicate models

Here we present our approach to incorporating KB information into open vocabulary semantic parsers. Having described how we use SFE to generate features corresponding to statements in a formal schema, adding these features to the models described in Section 2 is straightforward.

We saw in Section 2 that open vocabulary semantic parsers learn distributional vectors for each category, relation, entity and entity pair. We augment these vectors with the feature vectors described in Section 3. Each category and relation receives a weight ω for each selected Freebase query, and each entity and entity pair has an associated feature vector ψ. The truth probability of a category instance c(e) or relation instance r(e1, e2) is thus given by:

p(c(e)) = σ(θ_c^T φ_e + ω_c^T ψ_c(e))
p(r(e1, e2)) = σ(θ_r^T φ_(e1,e2) + ω_r^T ψ_r(e1, e2))

In these equations, θ and φ are learned predicate and entity embeddings, as described in Section 2. The second term in the sum represents our new features and their learned weights. ψ_c(e) and ψ_r(e1, e2) are SFE feature vectors for each entity and entity pair; a different set of features is chosen for each predicate c and r, as described in Section 3.2. ω_c and ω_r are learned weights for these features.

In our model, there are now three sets of parameters to be learned: (1) θ, low-dimensional distributional vectors trained for each predicate; (2) φ, low-dimensional distributional vectors trained for each entity and entity pair; and (3) ω, weights associated with the selected formal SFE features for each predicate. All of these parameters are optimized jointly, using the same method described in Section 2.

Note here that each SFE feature corresponds to a query over the formal schema, defining a set of entities (or entity pairs). The associated feature weight measures the likelihood that an entity in this set is also in the denotation of the surface predicate. Our models include many such features for each surface predicate, effectively mapping each surface predicate onto a weighted combination of Freebase queries.
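Continuing the earlier sketch, the combined score simply adds the weighted Freebase feature term to the distributional inner product. The numbers below loosely echo the Figure 1 example but are illustrative only:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def p_combined(theta_c, phi_e, omega_c, psi_ce):
    """p(c(e)) = sigma(theta_c^T phi_e + omega_c^T psi_c(e))."""
    return sigmoid(theta_c @ phi_e + omega_c @ psi_ce)

# Toy distributional embeddings for the predicate and entity.
theta_c = np.array([0.2, -0.6])
phi_e = np.array([0.15, -0.8])
# Weights over three selected Freebase queries for this predicate, and a
# binary vector marking which of those queries return the entity.
omega_c = np.array([0.52, 0.32, 0.20])
psi_ce = np.array([1.0, 1.0, 0.0])
print(p_combined(theta_c, phi_e, omega_c, psi_ce))
```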

5 Making full use of KB information

In addition to improving predicate models, as just described, adding KB information to open vocabulary semantic parsers suggests two other simple improvements: (1) using more specific logical forms, and (2) generating candidate entities from the KB.

5.1 Logical form generation

Krishnamurthy and Mitchell (2015) generate logical forms from natural language statements by computing a syntactic CCG parse, then applying a collection of rules to produce logical forms. However, their logical form analyses do not model noun-mediated relations well. For example, given the phrase "Italian architect Andrea Palladio," their system's logical form would include the relation N/N(ITALY, PALLADIO). Here, the N/N predicate represents a generic noun modifier relation; however, this relation is too vague for the predicate model to accurately learn its denotation. A similar problem occurs with prepositions and possessives, e.g., it is similarly hard to learn the denotation of the predicate of.

Our system improves the analysis of noun-mediated relations by simply including the noun in the predicate name. In the architect example above, our system produces the relation architect_N/N. It does this by concatenating all intervening noun modifiers between two entity mentions and including them in the predicate name; for example, "Illinois attorney general Lisa Madigan" produces the predicate attorney_general_N/N. We similarly improve the analyses of prepositions and possessives to include the head noun. For example, "Barack Obama, president of the U.S." produces the predicate instance president_of(BARACK OBAMA, U.S.), and "Rome, Italy's capital" produces the predicate 's_capital. This process generates more specific predicates that more closely align with the KB facts that we make available to the predicate models.

5.2 Candidate entity generation

A key benefit of our predicate models is that they are able to assign scores to entity pairs that were never seen in the training data. Distributional models have no learned vectors for these entity pairs and therefore assume p(r(e1, e2)) = 0 for unseen entity pairs (e1, e2). This limits the recall of these models when applied to question answering, as entity pairs will not have been observed for many correct, but rare entity answers. In contrast, because our models have access to a large KB, the formal component of the model can always give a score to any entity pair in the KB. This allows our model to considerably improve question answering performance on rare entities.

It would be computationally intractable to consider all Freebase entities as answers to queries, and so we use a simple candidate entity generation technique to consider only a small set of likely entities for a given query. We first find all entities in the query, and consider as candidates any entity that has either been seen at training time with a query entity or is directly connected to a query entity in Freebase.² This candidate entity generation is common practice for recent question answering models over Freebase (Yih et al. 2015), though, for the reasons stated above, it has not been used previously in open vocabulary semantic parsing models.

² Or connected by a mediator node, which is how Freebase represents relations with more than two arguments.
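A minimal sketch of this candidate generation, assuming precomputed maps from each entity to its training-time co-occurrences and to its Freebase neighbors; the limit is illustrative, not specified by the text:

```python
def candidate_entities(query_entities, cooccurring, kb_neighbors, limit=100):
    """Propose a small candidate set for a query, per Section 5.2.

    cooccurring: maps an entity to entities seen with it at training time.
    kb_neighbors: maps an entity to its directly connected (or
        mediator-connected) neighbors in Freebase.
    """
    candidates, seen = [], set(query_entities)
    for q in query_entities:
        for e in list(cooccurring.get(q, ())) + list(kb_neighbors.get(q, ())):
            if e not in seen:
                seen.add(e)
                candidates.append(e)
    return candidates[:limit]
```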
6 Evaluation

We evaluate our open-vocabulary semantic parser on a fill-in-the-blank natural language query task. Each test example is a natural language phrase containing at least two Freebase entities, one of which is held out. The system must propose a ranked list of Freebase entities to fill in the blank left by the held out entity, and the predicted entities are then judged manually for correctness. We compare our proposed models, which combine distributional and formal elements, with a purely distributional baseline from prior work. All of the data and code used in these experiments is available at http://github.com/allenai/open_vocab_semparse.

6.1 Data

Much recent work on semantic parsing has been evaluated using the WebQuestions dataset (Berant et al. 2013). This dataset is not suitable for evaluating our model because it was filtered to only questions that are mappable to Freebase queries. In contrast, our focus is on language that is not directly mappable to Freebase. We thus use the dataset introduced by Krishnamurthy and Mitchell (2015), which consists of the ClueWeb09 web corpus³ along with Google's FACC entity linking of that corpus to Freebase (Gabrilovich, Ringgaard, and Subramanya 2013). For training data, 3 million webpages from this corpus were processed with a CCG parser to produce logical forms (Krishnamurthy and Mitchell 2014). This produced 2.1m predicate instances involving 142k entity pairs and 184k entities. After removing infrequently-seen predicates (seen fewer than 6 times), there were 25k categories and 4.2k relations.

³ http://www.lemurproject.org/clueweb09.php

We also used the test set created by Krishnamurthy and Mitchell, which contains 220 queries generated in the same fashion as the training data from a separate section of ClueWeb. However, as they did not release a development set with their data, we used this set as a development set. For a final evaluation, we generated another, similar test set from a different held out section of ClueWeb, in the same fashion as done by Krishnamurthy and Mitchell. This final test set contains 307 queries.

6.2 Models

We compare three models in our experiments: (1) the distributional model of Krishnamurthy and Mitchell, described in Section 2, which is the current state-of-the-art method for open vocabulary semantic parsing; (2) a formal model (new to this work), where the distributional parameters θ and φ in Section 4 are fixed at zero; and (3) the combined model described in Section 4 (also new to this work). In each of these models, we used vectors of size 300 for all embeddings. Except where noted, all experiments use our modified logical forms (Section 5.1) and our entity proposal mechanism (Section 5.2). We do not compare against any traditional semantic parsers, as more than half of the questions in our dataset are not answerable by Freebase queries, and so are out of scope for those parsers (Krishnamurthy and Mitchell 2015).
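The three variants differ only in which terms of the Section 4 score they retain; a small illustrative sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def score(theta, phi, omega, psi, model="combined"):
    """Score a predicate instance under the three Section 6.2 variants.

    The distributional model keeps only the embedding term; the formal
    model fixes theta and phi at zero, leaving only the Freebase feature
    term; the combined model sums both.
    """
    if model == "distributional":
        return sigmoid(theta @ phi)
    if model == "formal":
        return sigmoid(omega @ psi)
    return sigmoid(theta @ phi + omega @ psi)
```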

6.3 Methodology

Given a fill-in-the-blank query such as "Italian architect ___", each system produces a ranked list of 100 candidate entities. To compare the output of the systems, we follow a pooled evaluation protocol commonly used in relation extraction and information retrieval (West et al. 2014; Riedel et al. 2013). We take the top 30 predictions from each system and manually annotate whether they are correct, and use those annotations to compute the average precision (AP) and reciprocal rank (RR) of each system on the query. Average precision is defined as (1/m) Σ_{k=1}^{m} Prec(k) × Correct(k), where Prec(k) is the precision at rank k, Correct(k) is an indicator function for whether the kth answer is correct, and m is the number of returned answers (up to 100 in this evaluation). AP is equivalent to calculating the area under a precision-recall curve. Reciprocal rank is computed by first finding the rank r of the first correct prediction made by a system. Reciprocal rank is then 1/r, ranging from 1 (if the first prediction is correct) to 0 (if there is no correct answer returned). In the tables below we report mean average precision (MAP) and mean reciprocal rank (MRR), averaged over all of the queries in the test set. We also report a weighted version of MAP, where the AP of each query is scaled by the number of annotated correct answers to the query (shown as W-MAP in the tables for space considerations).
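A direct transcription of these per-query metrics, assuming each system's ranked predictions are given as booleans; following the text as stated, AP normalizes by the number of returned answers m:

```python
def average_precision(correct, max_rank=100):
    """AP = (1/m) * sum_k Prec(k) * Correct(k), with m the number of
    returned answers, per the Section 6.3 definition."""
    correct = correct[:max_rank]
    hits, total = 0, 0.0
    for k, is_correct in enumerate(correct, start=1):
        if is_correct:
            hits += 1
            total += hits / k  # Prec(k) contributes only at correct ranks
    return total / len(correct) if correct else 0.0

def reciprocal_rank(correct):
    """1/r for the first correct prediction; 0 if none is correct."""
    for k, is_correct in enumerate(correct, start=1):
        if is_correct:
            return 1.0 / k
    return 0.0
```

MAP and MRR average these values over all queries; W-MAP scales each query's AP by its number of annotated correct answers before averaging.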
6.4 Results

We first show the effect of the new logical forms introduced in Section 5.1. As can be seen in Table 1, with our improved logical forms, all models are better able to capture the semantics of language. This improvement is most pronounced in the formal models, which have more capacity to get specific features from Freebase with the new logical forms. As our logical forms give all models better performance, the remaining experiments we present all use these logical forms.

Table 1: Improvement in mean average precision when using our logical forms on the development set.

Model                       K&M's LFs   Our LFs   Delta
Distributional (K&M 2015)   .269        .284      +.015
Formal                      .231        .276      +.045
Combined                    .313        .335      +.022

We next show the improvement gained by using the simple candidate entity generation outlined in Section 5.2. By simply appending the list of connected entities in Freebase to the end of the rankings returned by the distributional model, MAP improves by 40% (see Table 2). The connectedness of an entity pair in Freebase is very informative, especially for rare entities that are not seen together during training.

Table 2: Improvement to the distributional model when using our candidate entity generation.

Model                       MAP    W-MAP   MRR
Distributional (K&M 2015)   .163   .163    .288
With Freebase candidates    .229   .275    .312
Relative improvement        40%    69%     8%

Table 3 shows a comparison between the semantic parsing models on the development set. As can be seen, the combined model significantly improves performance over prior work, giving a relative gain in weighted MAP of 29%.

Table 3: Development set results for our fill-in-the-blank task. The combined model significantly improves MAP over prior work.

Model                       MAP    W-MAP   MRR
Distributional (K&M 2015)   .284   .371    .379
Formal                      .276   .469    .334
Combined                    .335   .477    .429
Relative improvement        18%    29%     13%

Table 4 shows that these improvements are consistent on the final test set, as well. The performance improvement seen by the combined model is actually larger on this set, with gains on our metrics ranging from 50% to 87%.

Table 4: Final test set results for our fill-in-the-blank task. The combined model improves over prior work by 50–87% on our metrics. These improvements over the baseline are after the baseline has been improved by the methods developed in this paper, shown in Table 1 and Table 2. The cumulative effect of the methods presented in this work is an improvement of over 120% in MAP.

Model                       MAP    W-MAP   MRR
Distributional (K&M 2015)   .229   .275    .312
Formal                      .355   .495    .419
Combined                    .370   .513    .469
Relative improvement        62%    87%     50%

On both of these datasets, the difference in MAP between the combined model and the distributional model is statistically significant (by a paired permutation test, p < 0.05). The differences between the combined model and the formal model, and between the formal model and the distributional model, are not statistically significant, as each method has certain kinds of queries that it performs well on. Only the combined model is able to consistently outperform the distributional model on all kinds of queries.

6.5 Discussion

Our model tends to outperform the distributional model on queries containing predicates with exact or partial correlates in Freebase. For example, our model obtains nearly perfect average precision on the queries "French newspaper ___" and "Israeli prime minister ___," both of which can be exactly expressed in Freebase. The top features for newspaper(x) all indicate that x has type NEWSPAPER in Freebase, and the top features for newspaper_N/N(x, y) indicate that y is a newspaper, and that x is either the circulation area of y or the language of y.

The model also performs well on queries with partial Freebase correlates, such as "Microsoft head honcho ___", "The United States, ___'s closest ally", and "Patriots linebacker ___," although with somewhat lower average precision. The high weight features in these cases tend to provide useful hints, even though there is no direct correlate; for example, the model learns that "honchos" are people, and that they tend to be CEOs and film producers.

There are also some areas where our model can be improved. First, in some cases, the edge sequence features used by the model are not expressive enough to identify the correct relation in Freebase. An example of this problem is the "linebacker" example above, where the features for linebacker_N/N can capture which athletes play for which teams, but not the positions of those athletes. Second, our model can under-perform on predicates with no close mapping to Freebase. An example where this problem occurs is the query "___ is a NASA mission." Third, there remains room to further improve the logical forms produced by the semantic parser, specifically for multi-word expressions. One problem occurs with multi-word noun modifiers, e.g., "Vice president Al Gore" is mapped to vice(AL GORE) ∧ president(AL GORE). Another problem is that there is no back-off with multi-word relations. For example, the predicate head_honcho_N/N was never seen in the training data, so it is replaced with unknown; however, it would be better to replace it with honcho_N/N, which was seen in the training data. Finally, although using connected entities in Freebase as additional candidates during inference is helpful, it often over- or under-generates candidates. A more tailored, per-query search process could improve performance.

7 Related work

There is an extensive literature on building semantic parsers to answer questions against a KB (Zettlemoyer and Collins 2005; Berant et al. 2013; Krishnamurthy and Mitchell 2012; Li et al. 2015). Some of this work has used surface (or ungrounded) logical forms as an intermediate representation, similar to our work (Kwiatkowski et al. 2013; Reddy, Lapata, and Steedman 2014; Yih et al. 2015; Reddy et al. 2016). The main difference between our work and these techniques is that they map surface logical forms to a single executable Freebase query, while we learn execution models for the surface logical forms directly, using a weighted combination of Freebase queries as part of the model. None of these prior works can assign meaning to language that is not directly representable in the KB schema.

Choi, Kwiatkowski, and Zettlemoyer (2015) presented a system that performs a semantic parse of open-domain text, recognizing when a predicate cannot be mapped to Freebase. However, while they recognize when a predicate is not mappable to Freebase, they do not attempt to learn execution models for those predicates, nor can they answer questions using those predicates.

Yao and Van Durme (2014) and Dong et al. (2015) proposed question answering models that use similar features to those used in this work. However, they did not produce semantic parses of language, instead using methods that are non-compositional and do not permit complex queries.

Finally, learning probabilistic databases in an open vocabulary semantic parser has a strong connection with KB completion. In addition to SFE (Gardner and Mitchell 2015), our work draws on work on embedding the entities and relations in a KB (Riedel et al. 2013; Nickel, Tresp, and Kriegel 2011; Bordes et al. 2013; Nickel, Jiang, and Tresp 2014; Toutanova et al. 2015), as well as work on graph-based methods for reasoning with KBs (Lao and Cohen 2010; Gardner et al. 2014; Neelakantan, Roth, and McCallum 2015; Bornea and Barker 2015). Our combination of embedding methods with graph-based methods in this paper is suggestive of how one could combine the two in methods for KB completion. Initial work exploring this direction has already been done by Toutanova and Chen (2015).

8 Conclusion

Prior work in semantic parsing has either leveraged large knowledge bases to answer questions, or used distributional techniques to gain broad coverage over all of natural language. In this paper, we have shown how to gain both of these benefits by converting the queries generated by traditional semantic parsers into features which are then used in open vocabulary semantic parsing models. We presented a technique to do this conversion in a way that is scalable using graph-based feature extraction methods. Our combined model achieved relative gains of over 50% in mean average precision and mean reciprocal rank versus a purely distributional approach. We also introduced a better mapping from surface text to logical forms, and a simple method for using a KB to find candidate entities during inference. Taken together, the methods introduced in this paper improved mean average precision on our task from .163 to .370, a 127% relative improvement over prior work.

This work suggests a new direction for semantic parsing research. Existing semantic parsers map language to a single KB query, an approach that successfully leverages a KB's predicate instances, but is fundamentally limited by its schema. In contrast, our approach maps language to a weighted combination of queries plus a distributional component; this approach is capable of representing a much broader class of concepts while still using the KB when it is helpful. Furthermore, it is capable of using the KB even when the meaning of the language cannot be exactly represented by a KB predicate, which is a common occurrence. We believe that this kind of approach could significantly expand the applicability of semantic parsing techniques to more complex domains where the assumptions of traditional techniques are too limiting. We are actively exploring applying these techniques to science question answering (Clark, Harrison, and Balasubramanian 2013), for example, where existing KBs provide only partial coverage of the questions.

References

Berant, J.; Chou, A.; Frostig, R.; and Liang, P. 2013. Semantic parsing on Freebase from question-answer pairs. In EMNLP, 1533–1544.

Bollacker, K.; Evans, C.; Paritosh, P.; Sturge, T.; and Taylor, J. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proceedings of SIGMOD.

Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; and Yakhnenko, O. 2013. Translating embeddings for modeling multi-relational data. In Advances in Neural Information Processing Systems, 2787–2795.

Bornea, M., and Barker, K. 2015. Relational path mining in structured knowledge. In Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, 14:1–14:7. New York, NY, USA: ACM.

Choi, E.; Kwiatkowski, T.; and Zettlemoyer, L. 2015. Scalable semantic parsing with partial ontologies. In Proceedings of ACL.

Clark, P.; Harrison, P.; and Balasubramanian, N. 2013. A study of the knowledge base requirements for passing an elementary science test. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, 37–42. ACM.

Dong, L.; Wei, F.; Zhou, M.; and Xu, K. 2015. Question answering over Freebase with multi-column convolutional neural networks. In ACL.

Gabrilovich, E.; Ringgaard, M.; and Subramanya, A. 2013. FACC1: Freebase annotation of ClueWeb corpora, version 1 (release date 2013-06-26, format version 1, correction level 0). http://lemurproject.org/clueweb09/FACC1/

Gardner, M., and Mitchell, T. 2015. Efficient and expressive knowledge base completion using subgraph feature extraction. In Proceedings of EMNLP. Association for Computational Linguistics.

Gardner, M.; Talukdar, P.; Krishnamurthy, J.; and Mitchell, T. 2014. Incorporating vector space similarity in random walk inference over knowledge bases. In Proceedings of EMNLP. Association for Computational Linguistics.

Gardner, M. 2015. Reading and Reasoning with Knowledge Graphs. Ph.D. Dissertation, Carnegie Mellon University.

Krishnamurthy, J., and Mitchell, T. M. 2012. Weakly supervised training of semantic parsers. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning, 754–765. Association for Computational Linguistics.

Krishnamurthy, J., and Mitchell, T. M. 2014. Joint syntactic and semantic parsing with combinatory categorial grammar. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (ACL).

Krishnamurthy, J., and Mitchell, T. M. 2015. Learning a compositional semantics for Freebase with an open predicate vocabulary. Transactions of the Association for Computational Linguistics 3:257–270.

Kwiatkowski, T.; Choi, E.; Artzi, Y.; and Zettlemoyer, L. 2013. Scaling semantic parsers with on-the-fly ontology matching. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.

Lao, N., and Cohen, W. W. 2010. Fast query execution for retrieval models based on path-constrained random walks. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 881–888. ACM.

Lewis, M., and Steedman, M. 2013. Combined distributional and logical semantics. TACL 1:179–192.

Li, J.; Zhu, M.; Lu, W.; and Zhou, G. 2015. Improving semantic parsing with enriched synchronous context-free grammar. In EMNLP.

Neelakantan, A.; Roth, B.; and McCallum, A. 2015. Compositional vector space models for knowledge base completion. In ACL 2015. Association for Computational Linguistics.

Nickel, M.; Jiang, X.; and Tresp, V. 2014. Reducing the rank in relational factorization models by including observable patterns. In Advances in Neural Information Processing Systems, 1179–1187.

Nickel, M.; Tresp, V.; and Kriegel, H.-P. 2011. A three-way model for collective learning on multi-relational data. In Proceedings of the 28th International Conference on Machine Learning (ICML-11), 809–816.

Reddy, S.; Täckström, O.; Collins, M.; Kwiatkowski, T.; Das, D.; Steedman, M.; and Lapata, M. 2016. Transforming dependency structures to logical forms for semantic parsing. TACL 4:127–140.

Reddy, S.; Lapata, M.; and Steedman, M. 2014. Large-scale semantic parsing without question-answer pairs. TACL 2:377–392.

Riedel, S.; Yao, L.; McCallum, A.; and Marlin, B. M. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of NAACL-HLT.

Toutanova, K., and Chen, D. 2015. Observed versus latent features for knowledge base and text inference. In ACL Workshop on Continuous Vector Space Models and Their Compositionality.

Toutanova, K.; Chen, D.; Pantel, P.; Poon, H.; Choudhury, P.; and Gamon, M. 2015. Representing text for joint embedding of text and knowledge bases. In EMNLP.

West, R.; Gabrilovich, E.; Murphy, K.; Sun, S.; Gupta, R.; and Lin, D. 2014. Knowledge base completion via search-based question answering. In WWW.

Yao, X., and Van Durme, B. 2014. Information extraction over structured data: Question answering with Freebase. In ACL.

Yih, W.; Chang, M.-W.; He, X.; and Gao, J. 2015. Semantic parsing via staged query graph generation: Question answering with knowledge base. In ACL.

Zelle, J. M., and Mooney, R. J. 1996. Learning to parse database queries using inductive logic programming. In AAAI.

Zettlemoyer, L. S., and Collins, M. 2005. Learning to map sentences to logical form: Structured classification with probabilistic categorial grammars. In UAI.
