
Combined Distributional and Logical Semantics

Mike Lewis
School of Informatics
University of Edinburgh
Edinburgh, EH8 9AB, UK
[email protected]

Mark Steedman
School of Informatics
University of Edinburgh
Edinburgh, EH8 9AB, UK
[email protected]

Transactions of the Association for Computational Linguistics, 1 (2013) 179–192. Action Editor: Johan Bos. Submitted 1/2013; Revised 3/2013; Published 5/2013. © 2013 Association for Computational Linguistics.

Abstract

We introduce a new approach to semantics which combines the benefits of distributional and formal logical semantics. Distributional models have been successful in modelling the meanings of content words, but logical semantics is necessary to adequately represent many function words. We follow formal semantics in mapping language to logical representations, but differ in that the relational constants used are induced by offline distributional clustering at the level of predicate-argument structure. Our clustering algorithm is highly scalable, allowing us to run on corpora the size of Gigaword. Different senses of a word are disambiguated based on their induced types. We outperform a variety of existing approaches on a wide-coverage question answering task, and demonstrate the ability to make complex multi-sentence inferences involving quantifiers on the FraCaS suite.

1 Introduction

Mapping natural language to meaning representations is a central challenge of NLP. There has been much recent progress in unsupervised distributional semantics, in which the meaning of a word is induced based on its usage in large corpora. This approach is useful for a range of key applications including question answering and relation extraction (Lin and Pantel, 2001; Poon and Domingos, 2009; Yao et al., 2011). Because such a semantics can be automatically induced, it escapes the limitation of depending on relations from hand-built training data, knowledge bases or ontologies, which have proved of limited use in capturing the huge variety of meanings that can be expressed in language.

However, it has largely developed in isolation from the formal semantics literature. Whilst distributional semantics has been effective in modelling the meanings of content words such as nouns and verbs, it is less clear that it can be applied to the meanings of function words. Semantic operators, such as determiners, negation, conjunctions, modals, tense, mood, aspect, and plurals are ubiquitous in natural language, and are crucial for high performance on many practical applications—but current distributional models struggle to capture even simple examples. Conversely, computational models of formal semantics have shown low recall on practical applications, owing to their reliance on ontologies such as WordNet (Miller, 1995) to model the meanings of content words (Bobrow et al., 2007; Bos and Markert, 2005).

For example, consider what is needed to answer a question like Did Google buy YouTube? from the following sentences:

1. Google purchased YouTube
2. Google's acquisition of YouTube
3. Google acquired every company
4. YouTube may be sold to Google
5. Google will buy YouTube or Microsoft
6. Google didn't take over YouTube

All of these require knowledge of lexical semantics (e.g. that buy and purchase are synonyms), but some also need interpretation of quantifiers, negatives, modals and disjunction. It seems unlikely that distributional or formal approaches can accomplish the task alone.



We propose a method for mapping natural language to first-order logic representations capable of capturing the meanings of function words such as every, not and or, but which also uses distributional statistics to model the meaning of content words. Our approach differs from standard formal semantics in that the non-logical symbols used in the logical form are cluster identifiers. Where standard semantic formalisms would map the verb write to a write′ symbol, we map it to a cluster identifier such as relation37, which the synonym author may also map to. This mapping is learnt by offline clustering.

Unlike previous distributional approaches, we perform clustering at the level of predicate-argument structure, rather than syntactic dependency structure. This means that we abstract away from many syntactic differences that are not present in the semantics, such as conjunctions, passives, relative clauses, and long-range dependencies. This significantly reduces sparsity, so we have fewer predicates to cluster and more observations for each.

Of course, many practical inferences rely heavily on background knowledge about the world—such knowledge falls outside the scope of this work.

  Every : NP↑/N : λpλq.∀x[p(x) ⇒ q(x)]
  dog : N : λx.dog′(x)
  barks : S\NP : λx.bark′(x)
  Every dog : NP↑ : λq.∀x[dog′(x) ⇒ q(x)]          (>)
  Every dog barks : S : ∀x[dog′(x) ⇒ bark′(x)]     (>)

Figure 1: A standard derivation using CCG. The NP↑ notation means that the noun phrase is type-raised, taking the verb-phrase as an argument—so is an abbreviation of S/(S\NP). This is necessary in part to support a correct semantics for quantifiers.

  Input sentence:              Shakespeare wrote Macbeth
  Initial semantic analysis:   write_{arg0,arg1}(shakespeare, macbeth)
  Entity typing:               write_{arg0:PER,arg1:BOOK}(shakespeare:PER, macbeth:BOOK)
  Distributional semantics:    relation37(shakespeare:PER, macbeth:BOOK)

Figure 2: Layers used in our model.

2 Background

Our approach is based on Combinatory Categorial Grammar (CCG; Steedman, 2000), a strongly lexicalised theory of language in which lexical entries for words contain all language-specific information. The lexical entry for each word contains a syntactic category, which determines which categories the word may combine with, and a semantic interpretation, which defines the compositional semantics. For example, the lexicon may contain the entry:

  write ⊢ (S\NP)/NP : λyλx.write′(x,y)

Crucially, there is a transparent interface between the syntactic category and the semantics. For example the entry above defines the transitive verb syntactically as a function mapping two noun-phrases to a sentence, and semantically as a binary relation between its two argument entities. This means that it is relatively straightforward to deterministically map parser output to a logical form, as in the Boxer system (Bos, 2008). This form of semantics captures the underlying predicate-argument structure, but fails to license many important inferences—as, for example, write and author do not map to the same predicate.

In addition to the lexicon, there is a small set of binary combinators and unary rules, which have a syntactic and semantic interpretation. Figure 1 gives an example CCG derivation.

3 Overview of Approach

We attempt to learn a CCG lexicon which maps equivalent words onto the same logical form—for example learning entries such as:

  author ⊢ N/PP[of] : λxλy.relation37(x,y)
  write ⊢ (S\NP)/NP : λxλy.relation37(x,y)

The only change to the standard CCG derivation is that the symbols used in the logical form are arbitrary relation identifiers. We learn these by first mapping to a deterministic logical form (using predicates such as author′ and write′), using a process similar to Boxer, and then clustering predicates based on their arguments. This lexicon can then be used to parse new sentences, and integrates seamlessly with CCG theories of formal semantics.
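To make the target representation concrete, the following is a minimal sketch (illustrative only, not the system's actual data structures) of a lexicon in which equivalent words share a relation identifier:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class LexicalEntry:
    """A CCG lexical entry: a word, its syntactic category, and the
    relation symbol (cluster identifier) used in its logical form."""
    word: str
    category: str   # e.g. "(S\\NP)/NP" or "N/PP[of]"
    relation: str   # cluster identifier standing in for the lambda-term body

# After clustering, write and author map to the same relation:
lexicon = [
    LexicalEntry("write",  "(S\\NP)/NP", "relation37"),
    LexicalEntry("author", "N/PP[of]",   "relation37"),
]
```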


Typing predicates—for example, determining that writing is a relation between people and books—has become standard in relation clustering (Schoenmackers et al., 2010; Berant et al., 2011; Yao et al., 2012). We demonstrate how to build a typing model into the CCG derivation, by subcategorizing all terms representing entities in the logical form with a more detailed type. These types are also induced from text, as explained in Section 5, but for convenience we describe them with human-readable labels, such as PER, LOC and BOOK.

A key advantage of typing is that it allows us to model ambiguous predicates. Following Berant et al. (2011), we assume that different type signatures of the same predicate have different meanings, but given a type signature a predicate is unambiguous. For example a different lexical entry for the verb born is used in the contexts Obama was born in Hawaii and Obama was born in 1961, reflecting a distinction in the semantics that is not obvious in the syntax¹. Typing also greatly improves the efficiency of clustering, as we only need to compare predicates with the same type during clustering (for example, we do not have to consider clustering a predicate between people and places with predicates between people and dates).

In this work, we focus on inducing binary relations. Many existing approaches have shown how to produce good clusterings of (non-event) nouns (Brown et al., 1992), any of which could be simply integrated into our semantics—but relation clustering remains an open problem (see Section 9).

N-ary relations are binarized, by creating a binary relation between each pair of arguments. For example, for the sentence Russia sold Alaska to the United States, the system creates three binary relations—corresponding to sellToSomeone(Russia, Alaska), buyFromSomeone(US, Alaska), sellSomethingTo(Russia, US). This transformation does not exactly preserve meaning, but still captures the most important relations. Note that this allows us to compare semantic relations across different syntactic types—for example, both transitive verbs and argument-taking nouns can be seen as expressing binary semantic relations between entities.

Figure 2 shows the layers used in our model.

4 Initial Semantic Analysis

The initial semantic analysis maps parser output onto a logical form, in a similar way to Boxer. The semantic formalism is based on Steedman (2012).

The first step is syntactic parsing. We use the C&C parser (Clark and Curran, 2004), trained on CCGBank (Hockenmaier and Steedman, 2007), using the refined version of Honnibal et al. (2010) which brings the syntax closer to the predicate-argument structure. An automatic post-processing step makes a number of minor changes to the parser output, which converts the grammar into one more suitable for our semantics. PP (prepositional phrase) and PR (phrasal verb complement) categories are sub-categorised with the relevant preposition. Noun compounds with the same MUC named-entity type (Chinchor and Robinson, 1997) are merged into a single non-compositional node² (we otherwise ignore named-entity types). All argument NPs and PPs are type-raised, allowing us to represent quantifiers. All prepositional phrases are treated as core arguments (i.e. given the category PP, not adjunct categories like (N\N)/NP or ((S\NP)\(S\NP))/NP), as it is difficult for the parser to distinguish arguments and adjuncts.

Initial semantic lexical entries for almost all words can be generated automatically from the syntactic category and POS tag (obtained from the parser), as the syntactic category captures the underlying predicate-argument structure. We use a Davidsonian-style analysis of arguments (Davidson, 1967), which we binarize by creating a separate predicate for each pair of arguments of a word. These predicates are labelled with the lemma of the head word and a Propbank-style argument key (Kingsbury and Palmer, 2002), e.g. arg0, argIn. We distinguish noun and verb predicates based on POS tag—so, for example, we have different predicates for effect as a noun or verb.

¹Whilst this assumption is very useful, it does not always hold—for example, the genitive in Shakespeare's book is ambiguous between ownership and authorship relations even given the types of the arguments.

²For example, this allows us to give Barack Obama the semantics λx.barack_obama(x) instead of λx.barack(x) ∧ obama(x), which is more convenient for collecting distributional statistics.
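The pairwise binarization can be sketched in a few lines. This is an illustrative reconstruction under assumed naming conventions, not the authors' code:

```python
from itertools import combinations

def binarize(lemma, args):
    """Split an n-ary predicate-argument structure into one binary
    predicate per pair of arguments, labelled with the head lemma
    and Propbank-style argument keys.

    args: list of (arg_key, entity) pairs.
    """
    return [(f"{lemma}.{k1},{k2}", e1, e2)
            for (k1, e1), (k2, e2) in combinations(args, 2)]

# "Russia sold Alaska to the United States" yields three binary relations:
print(binarize("sell", [("arg0", "russia"), ("arg1", "alaska"), ("argTo", "us")]))
# [('sell.arg0,arg1', 'russia', 'alaska'),
#  ('sell.arg0,argTo', 'russia', 'us'),
#  ('sell.arg1,argTo', 'alaska', 'us')]
```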


  Word     Category         Semantics
  Automatic:
  author   N/PP[of]         λxλy.author_{arg0,argOf}(y,x)
  write    (S\NP)/NP        λxλy.write_{arg0,arg1}(y,x)
  Manual:
  every    NP↑/N            λpλq.∀x[p(x) → q(x)]
  not      (S\NP)/(S\NP)    λpλx.¬p(x)

Figure 3: Example initial lexical entries.

This algorithm can be overridden with manual lexical entries for specific closed-class function words. Whilst it may be possible to learn these from data, our approach is pragmatic as there are relatively few such words, and the complex logical forms required would be difficult to induce from distributional statistics. We add a small number of lexical entries for words such as negatives (no, not etc.), and quantifiers (numbers, each, every, all, etc.). Some example initial lexical entries are shown in Figure 3.

5 Entity Typing Model

Our entity-typing model assigns types to nouns, which is useful for disambiguating polysemous predicates. Our approach is similar to O'Seaghdha (2010) in that we aim to cluster entities based on the noun and unary predicates applied to them (it is simple to convert from the binary predicates to unary predicates). For example, we want the pair (born_{argIn}, 1961) to map to a DAT type, and (born_{argIn}, Hawaii) to map to a LOC type. This is non-trivial, as both the predicates and arguments can be ambiguous between multiple types—but topic models offer a good solution (described below).

5.1 Model

We assume that the type of each argument of a predicate depends only on the predicate and argument, although Ritter et al. (2010) demonstrate an advantage of modelling the joint probability of the types of multiple arguments of the same predicate. We use the standard Latent Dirichlet Allocation model (Blei et al., 2003), which performs comparably to more complex models proposed in O'Seaghdha (2010).

In topic-modelling terminology, we construct a document for each unary predicate (e.g. born_{argIn}), based on all of its argument entities (words). We assume that these arguments are drawn from a small number of types (topics), such as PER, DAT or LOC³. Each type j has a multinomial distribution φ_j over arguments (for example, a LOC type is more likely to generate Hawaii than 1961). Each unary predicate i has a multinomial distribution θ_i over topics, so the born_{argIn} predicate will normally generate a DAT or LOC type. Sparse Dirichlet priors α and β on the multinomials bias the distributions to be peaky. The parameters are estimated by Gibbs sampling, using the Mallet implementation (McCallum, 2002).

The generative story to create the data is:

  For every type k:
    Draw the p(arg|k) distribution φ_k from Dir(β)
  For every unary predicate i:
    Draw the p(type|i) distribution θ_i from Dir(α)
    For every argument j:
      Draw a type z_{ij} from Mult(θ_i)
      Draw an argument w_{ij} from Mult(φ_{z_{ij}})

³Types are induced from the text, but we give human-readable labels here for convenience.
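The generative story above can be made concrete as forward sampling. The following sketch uses assumed dimensions and is purely illustrative; the actual parameters are estimated with Mallet's Gibbs sampler, not sampled like this:

```python
import numpy as np

rng = np.random.default_rng(0)

n_types, n_preds, vocab_size = 15, 50, 1000   # assumed sizes
alpha, beta = 0.1, 0.01                       # sparse Dirichlet priors

# phi[k]: distribution over argument words for type k
phi = rng.dirichlet([beta] * vocab_size, size=n_types)
# theta[i]: distribution over types for unary predicate i
theta = rng.dirichlet([alpha] * n_types, size=n_preds)

def generate_argument(i):
    """Sample one argument of unary predicate i: draw a type, then a word."""
    z = rng.choice(n_types, p=theta[i])     # type z_ij ~ Mult(theta_i)
    w = rng.choice(vocab_size, p=phi[z])    # word w_ij ~ Mult(phi_z)
    return z, w
```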

5.2 Typing in Logical Form

In the logical form, all constants and variables representing entities x can be assigned a distribution over types p_x(t) using the type model. An initial type distribution is applied in the lexicon, using the φ distributions for the types of nouns, and the θ_i distributions for the types of arguments of binary predicates (inverted using Bayes' rule). Then at each β-reduction in the derivation, we update the distributions of the types to be the product of the type distributions of the terms being reduced.

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00219 by guest on 03 October 2021 file a suit

(S NP)/NP NP↑ DOC =0.5 \ CLOTHES = 0.6 LEGAL=0.4 PER = 0.7 LEGAL = 0.3 λy: λx: ORG = 0.2 . filearg0,arg1(x,y) λ p. y: [suit0(y) p(y)] CLOTHES=0.01 ... ∃ DOC =0.001 ∧ ( ... )   ( ... ) < S NP LEGAL\= 0.94 PER = 0.7 CLOTHES = 0.05 λx: ORG = 0.2 y: )[suit0(y) filearg0,arg1(x,y)] ... ∃ DOC = 0.004 ∧   ( ... ) Figure 4: Using the type model for disambiguation in the derivation of file a suit. Type distributions are shown after the declarations. Both suit and the of file are lexically ambiguous between different types, but after the β-reduction only one interpretation is likely. If the verb were wear, a different interpretation would be preferred.

If two terms x and y combine to a term z, the combined type distribution is:

  p_z(t) = p_x(t) p_y(t) / Σ_{t′} p_x(t′) p_y(t′)

For example, in wore a suit and file a suit, the variable representing suit may be lexically ambiguous between CLOTHES and LEGAL types, but the variables representing the objects of wear and file will have type preferences that allow us to choose the correct type when the terms combine. Figure 4 shows an example derivation using the type model for disambiguation⁴.

⁴Our implementation follows Steedman (2012) in using Generalized Skolem Terms rather than existential quantifiers, in order to capture quantifier scope alternations monotonically, but we omit these from the example to avoid introducing new notation.
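The update is a renormalized pointwise product, as in this small sketch (type labels are illustrative; a real implementation would need to handle distributions with no overlapping types):

```python
def combine_types(p_x, p_y):
    """Combine two type distributions at a beta-reduction."""
    product = {t: p_x.get(t, 0.0) * p_y.get(t, 0.0)
               for t in set(p_x) | set(p_y)}
    z = sum(product.values())   # assumes at least one type overlaps
    return {t: p / z for t, p in product.items() if p > 0.0}

# The object of "file" prefers LEGAL readings; "suit" is ambiguous:
file_obj = {"LEGAL": 0.4, "DOC": 0.5, "CLOTHES": 0.01}
suit     = {"CLOTHES": 0.6, "LEGAL": 0.3, "DOC": 0.001}
print(combine_types(file_obj, suit))   # LEGAL ~ 0.95, CLOTHES ~ 0.05
```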

6 Distributional Relation Clustering Model

The typed binary predicates can be grouped into clusters, each of which represents a distinct semantic relation. Note that because we cluster typed predicates, born_{arg0:PER,argIn:LOC} and born_{arg0:PER,argIn:DAT} can be clustered separately.

6.1 Corpus statistics

Typed binary predicates are clustered based on the expected number of times they hold between each argument-pair in the corpus. This means we create a single vector of argument-pair counts for each predicate (not a separate vector for each argument). For example, the vector for the typed predicate write_{arg0:PER,arg1:BOOK} may contain non-zero counts for entity-pairs such as (Shakespeare, Macbeth), (Dickens, Oliver Twist) and (Rowling, Harry Potter). The entity-pair counts for author_{arg0:PER,argOf:BOOK} may be similar, on the assumption that both are samples from the same underlying semantic relation.

To find the expected number of occurrences of argument-pairs for typed binary predicates in a corpus, we first apply the type model to the derivation of each sentence, as described in Section 5.2. This outputs untyped binary predicates, with distributions over the types of their arguments. The type of the predicate must match the type of its arguments, so the type distribution of a binary predicate is simply the joint distribution of the two argument type distributions.

For example, if the arguments in a born_{arg0,argIn}(obama, hawaii) derivation have the respective type distributions (PER=0.9, LOC=0.1) and (LOC=0.7, DAT=0.3), the distribution over binary typed predicates is (born_{arg0:PER,argIn:LOC}=0.63, born_{arg0:PER,argIn:DAT}=0.27, etc.). The expected counts for (obama, hawaii) in the vectors for born_{arg0:PER,argIn:LOC} and born_{arg0:PER,argIn:DAT} are then incremented by these probabilities.
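The expected-count bookkeeping can be sketched as follows (predicate naming is illustrative):

```python
from collections import defaultdict

# counts[typed_predicate][(entity1, entity2)] -> expected count
counts = defaultdict(lambda: defaultdict(float))

def add_observation(pred, e1, e2, p_t1, p_t2):
    """Spread one observed argument-pair of `pred` over its typed variants,
    weighted by the joint of the two argument type distributions."""
    for t1, w1 in p_t1.items():
        for t2, w2 in p_t2.items():
            counts[f"{pred}[{t1},{t2}]"][(e1, e2)] += w1 * w2

add_observation("born.arg0,argIn", "obama", "hawaii",
                {"PER": 0.9, "LOC": 0.1}, {"LOC": 0.7, "DAT": 0.3})
# born.arg0,argIn[PER,LOC] is incremented by 0.63, [PER,DAT] by 0.27
```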


6.2 Clustering

Many algorithms have been proposed for clustering predicates based on their arguments (Poon and Domingos, 2009; Yao et al., 2012). The number of relations in the corpus is unbounded, so the clustering algorithm should be non-parametric. It is also important that it remains tractable for very large numbers of predicates and arguments, in order to give us a greater coverage of language than can be achieved by hand-built ontologies.

We cluster the typed predicate vectors using the Chinese Whispers algorithm (Biemann, 2006)—although somewhat ad-hoc, it is both non-parametric and highly scalable⁵. This has previously been used for noun-clustering by Fountain and Lapata (2011), who argue it is a cognitively plausible model for language acquisition. The collection of predicates and arguments is converted into a graph with one node per predicate, and edge weights representing the similarity between predicates. Predicates with different types have zero similarity; otherwise similarity is computed as the cosine-similarity of the tf-idf vectors of argument-pairs. We prune nodes occurring fewer than 20 times, edges with weights less than 10⁻³, and a short list of stop predicates.

The algorithm proceeds as follows:

1. Each predicate p is assigned to a different semantic relation r_p
2. Iterate over the predicates p in a random order
3. Set r_p = argmax_r Σ_{p′} sim(p, p′) · 1[r = r_{p′}], where sim(p, p′) is the distributional similarity between p and p′, and 1[r = r′] is 1 iff r = r′ and 0 otherwise
4. Repeat (2.) to convergence

⁵We also experimented with a Dirichlet Process Mixture Model (Neal, 2000), but even with the efficient A* search algorithms introduced by Daumé III (2007), the cost of inference was found to be prohibitively high when run at large scale.
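A minimal sketch of the Chinese Whispers updates above (illustrative; the real system additionally applies the typing constraint, tf-idf weighting and the pruning described above):

```python
import random
from collections import defaultdict

def chinese_whispers(nodes, weight, iterations=20, seed=0):
    """Cluster graph nodes by iterated label propagation.

    nodes:  list of predicate identifiers
    weight: dict mapping (p, q) pairs to a symmetric edge weight
            (absent or zero for predicates with different types)
    """
    rng = random.Random(seed)
    label = {p: p for p in nodes}      # 1. every predicate is its own relation
    for _ in range(iterations):        # 4. repeat to (approximate) convergence
        order = nodes[:]
        rng.shuffle(order)             # 2. visit predicates in random order
        for p in order:
            votes = defaultdict(float)
            for q in nodes:
                if q != p:
                    votes[label[q]] += weight.get((p, q), 0.0)
            if votes:                  # 3. adopt the best-supported label
                label[p] = max(votes, key=votes.get)
    return label
```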

7 Semantic Parsing using Relation Clusters

The final phase is to use our relation clusters in the lexical entries of the CCG semantic derivation. This is slightly complicated by the fact that our predicates are lexically ambiguous between all the possible types they could take, and hence the relations they could express. For example, the system cannot tell whether born in is expressing a birthplace or birthdate relation until later in the derivation, when it combines with its arguments. However, all the possible logical forms are identical except for the symbols used, which means we can produce a packed logical form capturing the full distribution over logical forms. To do this, we make the predicate a function from argument types to relations.

For each word, we first take the lexical semantic definition produced by the algorithm in Section 4. For binary predicates in this definition (which will be untyped), we perform a deterministic lookup in the cluster model learned in Section 6, using all possible corresponding typed predicates. This allows us to represent the binary predicates as packed predicates: functions from argument types to relations. For example, if the clustering maps born_{arg0:PER,argIn:LOC} to rel49 ("birthplace") and born_{arg0:PER,argIn:DAT} to rel53 ("birthdate"), our lexicon contains the following packed lexical entry (type-distributions on the variables are suppressed):

  born ⊢ (S\NP)/PP[in] : λyλx. { (x:PER, y:LOC) ⇒ rel49(x,y) ; (x:PER, y:DAT) ⇒ rel53(x,y) }

The distributions over argument types then imply a distribution over relations. For example, if the packed predicate for born_{arg0,argIn} is applied to arguments Obama and Hawaii, with respective type distributions (PER=0.9, LOC=0.1) and (LOC=0.7, DAT=0.3)⁶, the distribution over relations will be (rel49=0.63, rel53=0.27, etc.).

If 1961 has a type-distribution (LOC=0.1, DAT=0.9), the output packed logical form for Obama was born in Hawaii in 1961 will be:

  {rel49=0.63, rel53=0.27, ...}(ob, hw) ∧ {rel49=0.09, rel53=0.81, ...}(ob, 1961)

The probability of a given logical form can be read from this packed logical form.

⁶These distributions are composed from the type-distributions for both the predicate and argument, as explained in Section 5.
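A sketch of how a packed predicate resolves to a distribution over relations, using the cluster identifiers of the running example (code illustrative):

```python
def relation_distribution(packed, p_t1, p_t2):
    """packed: dict mapping (type1, type2) signatures to relation ids.
    Returns the distribution over relations implied by the argument types."""
    dist = {}
    for (t1, t2), rel in packed.items():
        dist[rel] = dist.get(rel, 0.0) + p_t1.get(t1, 0.0) * p_t2.get(t2, 0.0)
    return dist

born = {("PER", "LOC"): "rel49",   # "birthplace"
        ("PER", "DAT"): "rel53"}   # "birthdate"
print(relation_distribution(born,
                            {"PER": 0.9, "LOC": 0.1},    # Obama
                            {"LOC": 0.7, "DAT": 0.3}))   # Hawaii
# {'rel49': 0.63, 'rel53': 0.27}  (remaining mass falls on other signatures)
```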

8 Experiments

Our approach aims to offer a strong model of both formal and lexical semantics. We perform two evaluations, aiming to target each of these separately, but using the same semantic representations in each.

We train our system on Gigaword (Graff et al., 2003), which contains around 4 billion words of newswire. The type-model is trained using 15 types⁷, and 5,000 iterations of Gibbs sampling (using the distributions from the final sample).

⁷This number was chosen by examination of models trained with different numbers of types. The algorithm produces semantically coherent clusters for much larger numbers of types, but many of these are fine-grained categories of people, which introduces sparsity in the relation clustering.

  Type   Top Words
  1      suspect, assailant, fugitive, accomplice
  2      author, singer, actress, actor, dad
  5      city, area, country, region, town, capital
  8      subsidiary, automaker, airline, Co., GM
  10     musical, thriller, sequel, special

Table 1: Most probable terms in some clusters induced by the Type Model.

Table 1 shows some example types. The relation clustering uses only proper nouns, to improve precision (sparsity problems are partly offset by the large input corpus). Aside from parsing, the pipeline takes around a day to run using 12 cores.

8.1 Question Answering Experiments

As yet, there is no standard way of evaluating lexical semantics. Existing tasks like Recognising Textual Entailment (RTE; Dagan et al., 2006) rely heavily on background knowledge, which is beyond the scope of this work. Intrinsic evaluations of entailment relations have low inter-annotator agreement (Szpektor et al., 2007), due to the difficulty of evaluating relations out of context.

Our evaluation is based on that performed by Poon and Domingos (2009). We automatically construct a set of questions by sampling from text, and then evaluate how many answers can be found in a different corpus. From dependency-parsed newswire, we sample either X ←nsubj– verb –dobj→ Y, X ←nsubj– verb –pobj→ Y or X ←nsubj– be –dobj→ noun –pobj→ Y patterns, where X and Y are proper nouns and the verb is not on a list of stop verbs, and deterministically convert these to questions (see the sketch below). For example, from Google bought YouTube we create the questions What did Google buy? and What bought YouTube?. The task is to find proper-noun answers to these questions in a different corpus, which are then evaluated by human annotators based on the sentence the answer was retrieved from⁸. Systems can return multiple answers to the same question (e.g. What did Google buy? may have many valid answers), and all of these contribute to the result. As none of the systems model tense or temporal semantics, annotators were instructed to annotate answers as correct if they were true at any time. This approach means we evaluate on relations in proportion to corpus frequency. We sample 1000 questions from the New York Times subset of Gigaword from 2010, and search for answers in the New York Times from 2009.

We evaluate the following approaches:

• CCG-Baseline: the logical form produced by our CCG derivation, without the clustering.
• CCG-WordNet: the CCG logical form, plus WordNet as a model of lexical semantics.
• CCG-Distributional: the logical form including the type model and relation clusters.
• Relational LDA: an LDA-based model for clustering dependency paths (Yao et al., 2011). We train on the New York Times subset of Gigaword⁹, using their setup of 50 iterations with 100 relation types.
• Reverb: a sophisticated Open Information Extraction system (Fader et al., 2011).

Unsupervised Semantic Parsing (USP; Poon and Domingos, 2009; Poon and Domingos, 2010; Titov and Klementiev, 2011) would be another obvious baseline. However, memory requirements mean it is not possible to run at this scale (our system is trained on 4 orders of magnitude more data than the USP evaluation). Yao et al. (2011) found it had comparable performance to Relational LDA.

For the CCG models, rather than performing full first-order inference on a large corpus, we simply test whether the question predicate subsumes a candidate answer predicate, and whether the arguments match¹⁰.
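The pattern-to-question conversion described above is deterministic; an illustrative sketch (the real system works over dependency parses, and verb inflections are assumed to be given):

```python
def make_questions(x, verb_past, verb_base, y):
    """From an 'X verb Y' match such as 'Google bought YouTube',
    generate the two questions used in the evaluation."""
    return [f"What did {x} {verb_base}?",   # queries the object
            f"What {verb_past} {y}?"]       # queries the subject

print(make_questions("Google", "bought", "buy", "YouTube"))
# ['What did Google buy?', 'What bought YouTube?']
```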

⁸Common nouns are filtered automatically. To focus on evaluating the semantics, annotators ignored garbled sentences due to errors pre-processing the corpus (these are excluded from the results). We also automatically exclude weekday and month answers, which are overwhelmingly syntax errors for all systems—e.g. treating Tuesday as an object in Obama announced Tuesday that...

⁹This is around 35% of Gigaword, and was the largest scale possible on our resources.

¹⁰We do this as it is much more efficient than full first-order theorem-proving. We could in principle make additional inferences with theorem-proving, such as answering What did Google buy? from Google bought the largest video website and YouTube is the largest video website.


In the case of CCG-Distributional, we calculate the probability that the two packed predicates are in the same cluster, marginalizing over their argument types (see the sketch below). Answers are ranked by this probability. For CCG-WordNet, we check if the question predicate is a hypernym of the candidate answer predicate (using any WordNet sense of either term).

  System                     Answers   Correct
  Relational-LDA             7046      11.6%
  Reverb                     180       89.4%
  CCG-Baseline               203       95.8%
  CCG-WordNet                211       94.8%
  CCG-Distributional@250     250       94.1%
  CCG-Distributional@500     500       82.0%

Table 2: Results on the wide-coverage Question Answering task. CCG-Distributional ranks question/answer pairs by confidence—@250 means we evaluate the top 250 of these. It is not possible to give a recall figure, as the total number of correct answers in the corpus is unknown.

Results are shown in Table 2. Relational-LDA induces many meaningful clusters, but predicates must be assigned to one of 100 relations, so results are dominated by large, noisy clusters (it is not possible to take the N-best answers, as the cluster assignments do not have a confidence score). The CCG-Baseline errors are mainly caused by parser errors, or relations in the scope of non-factive operators. CCG-WordNet adds few answers to CCG-Baseline, reflecting the limitations of hand-built ontologies. CCG-Distributional substantially improves recall over the other approaches whilst retaining good precision, demonstrating that we have learnt a powerful model of lexical semantics. Table 3 shows some correctly answered questions. The system improves over the baseline by mapping expressions such as merge with and acquisition of to the same relation cluster. Many of the errors are caused by conflating predicates where the entailment only holds in one direction, such as was elected to with ran for. Hierarchical clustering could be used to address this.
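The ranking score for CCG-Distributional can be sketched by marginalizing the two relation distributions (building on relation_distribution above; values illustrative):

```python
def same_cluster_probability(q_dist, a_dist):
    """Probability that the question predicate and a candidate answer
    predicate denote the same relation, marginalizing over types."""
    return sum(p * a_dist.get(rel, 0.0) for rel, p in q_dist.items())

q = {"rel49": 0.63, "rel53": 0.27}    # question predicate
a = {"rel49": 0.80, "rel53": 0.05}    # candidate answer predicate
print(same_cluster_probability(q, a)) # 0.5175
```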
8.2 Experiments on the FraCaS Suite

We are also interested in evaluating our approach as a model of formal semantics—demonstrating that it is possible to integrate the formal semantics of Steedman (2012) with our distributional clusters. The FraCaS suite (Cooper et al., 1996)¹¹ contains a hand-built set of entailment problems designed to be challenging in terms of formal semantics. We use Section 1, which contains 74 problems requiring an interpretation of quantifiers¹². They do not require any knowledge of lexical semantics, meaning we can evaluate the formal component of our system in isolation. However, we use the same representations as in our previous experiment, even though the clusters provide no benefit on this task. Figure 5 gives an example problem.

The only previous work we are aware of on this dataset is by MacCartney and Manning (2007). This approach learns the monotonicity properties of words from a hand-built training set, and uses this to transform a sentence into a polarity-annotated string. The system then aims to transform the premise string into the hypothesis. Positively polarized words can be replaced with less specific ones (e.g. by deleting adjuncts), whereas negatively polarized words can be replaced with more specific ones (e.g. by adding adjuncts). Whilst this is high-precision and often useful, this logic is unable to perform inferences with multiple premise sentences (in contrast to our first-order logic).

Development consists of adding entries to our lexicon for quantifiers. For simplicity, we treat multi-word quantifiers like at least a few as multi-word expressions—although a more compositional analysis may be possible. Following MacCartney and Manning (2007), we do not use held-out data—each problem is designed to test a different issue, so it is not possible to generalize from one subset of the suite to another. As we are interested in evaluating the semantics, not the parser, we manually supply gold-standard lexical categories for sentences with parser errors (any syntactic mistake causes incorrect semantics). Our derivations produce a distribution over logical forms—we license the inference if it holds in any interpretation with non-zero probability. We use the Prover9 (McCune, 2005) theorem prover for inference, returning yes if the premise implies the hypothesis, no if it implies the negation of the hypothesis, and unknown otherwise.

¹¹We use the version converted to machine-readable format by MacCartney and Manning (2007).

¹²Excluding 6 problems without a defined solution.
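As an illustration of the inference step, the following sketch runs Prover9 on a first-order rendering of the Figure 5 problem. The predicate names are our own paraphrase, not the system's induced symbols, and the code assumes a prover9 binary on the PATH; a full implementation would also attempt to prove the negated hypothesis, returning no on success and unknown if neither proof is found:

```python
import subprocess
import tempfile

PROBLEM = """\
formulas(assumptions).
  all x (european(x) -> rightToLiveIn(x, europe)).
  all x (european(x) -> person(x)).
  all x (person(x) & rightToLiveIn(x, europe) -> travelFreelyIn(x, europe)).
end_of_list.

formulas(goals).
  all x (european(x) -> travelFreelyIn(x, europe)).
end_of_list.
"""

def proves(problem):
    """Return True if Prover9 finds a proof of the goal."""
    with tempfile.NamedTemporaryFile("w", suffix=".in", delete=False) as f:
        f.write(problem)
        path = f.name
    result = subprocess.run(["prover9", "-f", path],
                            capture_output=True, text=True)
    return "THEOREM PROVED" in result.stdout

print(proves(PROBLEM))   # True: the example resolves to "yes"
```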


  Question                      Answer      Sentence
  What did Delta merge with?    Northwest   The 747 freighters came with Delta's acquisition of Northwest
  What spoke with Hu Jintao?    Obama       Obama conveyed his respect for the Dalai Lama to China's president Hu Jintao during their first meeting...
  What arrived in Colorado?     Zazi        Zazi flew back to Colorado...
  What ran for Congress?        Young       ...Young was elected to Congress in 1972

Table 3: Example questions correctly answered by CCG-Distributional.

Premises:
  Every European has the right to live in Europe.
  Every European is a person.
  Every person who has the right to live in Europe can travel freely within Europe.
Hypothesis:
  Every European can travel freely within Europe.
Solution: Yes

Figure 5: Example problem from the FraCaS suite.

  System                      Single Premise   Multiple Premises
  MacCartney & Manning 07     84%              –
  MacCartney & Manning 08     98%              –
  CCG-Dist (parser syntax)    70%              50%
  CCG-Dist (gold syntax)      89%              80%

Table 4: Accuracy on Section 1 of the FraCaS suite. Problems are divided into those with one premise sentence (44) and those with multiple premises (30).

Results are shown in Table 4. Our system improves on previous work by making multi-sentence inferences. Causes of errors include missing a distinct lexical entry for plural the, only taking existential interpretations of bare plurals, failing to interpret mass-noun determiners such as a lot of, and not providing a good semantics for non-monotone determiners such as most. We believe these problems will be surmountable with more work. Almost all errors are due to incorrectly predicting unknown—the system makes just one error on yes or no predictions (with or without gold syntax). This suggests that making first-order logic inferences in applications will not harm precision. We are less robust than MacCartney and Manning (2007) to syntax errors but, conversely, we are able to attempt more of the problems (i.e. those with multi-sentence premises). Other approaches based on distributional semantics seem unable to tackle any of these problems, as they do not represent quantifiers or negation.

9 Related Work

Much work on semantics has taken place in a supervised setting—for example the GeoQuery (Zelle and Mooney, 1996) and ATIS (Dahl et al., 1994) semantic parsing tasks. This approach makes sense for generating queries for a specific database, but means the semantic representations do not generalize to other datasets. There have been several attempts to annotate larger corpora with semantics—such as Ontonotes (Hovy et al., 2006) or the Groningen Meaning Bank (Basile et al., 2012). These typically map words onto senses in ontologies such as WordNet, VerbNet (Kipper et al., 2000) and FrameNet (Baker et al., 1998). However, limitations of these ontologies mean that they do not support inferences such as X is the author of Y → X wrote Y.

Given the difficulty of annotating large amounts of text with semantics, various approaches have attempted to learn meaning without annotated text. Distant Supervision approaches leverage existing knowledge bases, such as Freebase (Bollacker et al., 2008), to learn semantics (Mintz et al., 2009; Krishnamurthy and Mitchell, 2012). Dependency-based Compositional Semantics (Liang et al., 2011) learns the meaning of questions by using their answers as supervision—but this appears to be specific to question parsing. Such approaches can only learn the pre-specified relations in the knowledge base.

The approaches discussed so far in this section have all attempted to map language onto some pre-specified set of relations.


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00219 by guest on 03 October 2021 specified set of relations. Various attempts have use distributional statistics to determine the proba- been made to instead induce relations from text by bility that a WordNet-derived inference rule is valid clustering predicates based on their arguments. For in a given context. Our approach differs in that example, Yao et al. (2011) propose a series of LDA- we learn inference rules not present in WordNet. based models which cluster relations between en- Our lexical semantics is integrated into the lexicon, tities based on a variety of lexical, syntactic and rather than being implemented as additional infer- semantic features. Unsupervised Semantic Pars- ence rules, meaning that inference is more efficient, ing (Poon and Domingos, 2009) recursively clusters as equivalent statements have the same logical form. fragments of dependency trees based on their argu- Natural Logic (MacCartney and Manning, 2007) ments. Although USP is an elegant model, it is too offers an interesting alternative to symbolic , computationally expensive to run on large corpora. and has been shown to be able to capture complex It is also based on frame semantics, so does not clus- logical inferences by simply identifying the scope of ter equivalent predicates with different frames. To negation in text. This approach achieves similar pre- our knowledge, our work is the first such approach cision and much higher recall than Boxer on the RTE to be integrated within a linguistic theory supporting task. Their approach also suffers from such limita- formal semantics for logical operators. tions as only being able to make inferences between Vector space models represent words by vectors two sentences. It is also sensitive to , so based on co-occurrence counts. Recent work has cannot make inferences such as Shakespeare wrote tackled the problem of composing these matrices Macbeth = Macbeth was written by Shakespeare. ⇒ to build up the semantics of phrases or sentences (Mitchell and Lapata, 2008). Another strand (Co- 10 Conclusions and Future Work ecke et al., 2010; Grefenstette et al., 2011) has This is the first work we are aware of that combines a shown how to represent meanings as tensors, whose distributionally induced lexicon with formal seman- order depends on the syntactic category, allowing tics. Experiments suggest our approach compares an elegant correspondence between syntactic and favourably with existing work in both areas. semantic types. Socher et al. (2012) train a com- Many potential areas for improvement remain. position function using a neural network—however Hierachical clustering would allow us to capture their method requires annotated data. It is also not hypernym relations, rather than the synonyms cap- obvious how to represent logical relations such as tured by our flat clustering. There is much potential quantification in vector-space models. Baroni et al. for integrating existing hand-built resources, such (2012) make progress towards this by learning a as Ontonotes and WordNet, to improve the accu- classifier that can recognise entailments such as all racy of clustering. There are cases where the ex- dogs = some dogs, but this remains some way ⇒ isting CCGBank grammar does not match the re- from the power of first-order theorem proving of the quired predicate-argument structure—for example kind required by the problem in Figure 5. in the case of light verbs. 
It may be possible to re- An alternative strand of research has attempted bank CCGBank, in a way similar to Honnibal et al. to build computational models of linguistic theories (2010), to improve it on this point. based on formal compositional semantics, such as the CCG-based Boxer (Bos, 2008) and the LFG- Acknowledgements based XLE (Bobrow et al., 2007). Such approaches convert parser output into formal semantic repre- We thank Christos Christodoulopoulos, Tejaswini sentations, and have demonstrated some ability to Deoskar, Mark Granroth-Wilding, Ewan Klein, Ka- model complex phenomena such as negation. For trin Erk, Johan Bos and the anonymous reviewers lexical semantics, they typically compile lexical re- for their helpful comments, and Limin Yao for shar- sources such as VerbNet and WordNet into inference ing code. This work was funded by ERC Advanced rules—but still achieve only low recall on open- Fellowship 249520 GRAMPLUS and IP EC-FP7- domain tasks, such as RTE, mostly due to the low 270273 Xperience. coverage of such resources. Garrette et al. (2011)


Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00219 by guest on 03 October 2021 Proceedings, Research in , pages 277–286. College Publications. C.F. Baker, C.J. Fillmore, and J.B. Lowe. 1998. The P.F. Brown, P.V. Desouza, R.L. Mercer, V.J.D. Pietra, and berkeley project. In Proceedings of the J.C. Lai. 1992. Class-based n-gram models of natural 36th Annual Meeting of the Association for Computa- language. Computational linguistics, 18(4):467–479. tional Linguistics and 17th International Conference on Computational Linguistics-Volume 1, pages 86–90. N. Chinchor and P. Robinson. 1997. Muc-7 named entity Association for Computational Linguistics. task definition. In Proceedings of the 7th Conference on Message Understanding. M. Baroni, R. Bernardi, N.Q. Do, and C. Shan. 2012. Entailment above the word level in distributional se- Stephen Clark and James R. Curran. 2004. Parsing the mantics. In Proceedings of EACL, pages 23–32. Cite- WSJ using CCG and log-linear models. In Proceed- seer. ings of the 42nd Annual Meeting on Association for Computational Linguistics V. Basile, J. Bos, K. Evang, and N. Venhuizen. 2012. , ACL ’04. Association for Developing a large semantically annotated corpus. In Computational Linguistics. Proceedings of the Eighth International Conference Bob Coecke, Mehrnoosh Sadrzadeh, and Stephen Clark. on Language Resources and Evaluation (LREC12). To 2010. Mathematical foundations for a compositional appear. distributional model of meaning. Linguistic Analysis: Jonathan Berant, Ido Dagan, and Jacob Goldberger. A Festschrift for Joachim Lambek, 36(1-4):345–384. 2011. Global learning of typed entailment rules. In Robin Cooper, Dick Crouch, Jan Van Eijck, Chris Fox, Proceedings of the 49th Annual Meeting of the Asso- Johan Van Genabith, Jan Jaspars, , David ciation for Computational Linguistics: Human Lan- Milward, Manfred Pinkal, Massimo Poesio, et al. guage Technologies - Volume 1, HLT ’11, pages 610– 1996. Using the framework. FraCaS Deliverable D, 619. Association for Computational Linguistics. 16. C. Biemann. 2006. Chinese whispers: an efficient graph Ido Dagan, O. Glickman, and B. Magnini. 2006. The clustering algorithm and its application to natural lan- PASCAL recognising challenge. guage processing problems. In Proceedings of the Machine Learning Challenges. Evaluating Predictive First Workshop on Graph Based Methods for Natural Uncertainty, Visual Object Classification, and Recog- Language Processing, pages 73–80. Association for nising Tectual Entailment, pages 177–190. Computational Linguistics. D.A. Dahl, M. Bates, M. Brown, W. Fisher, K. Hunicke- D.M. Blei, A.Y. Ng, and M.I. Jordan. 2003. Latent Smith, D. Pallett, C. Pao, A. Rudnicky, and dirichlet allocation. the Journal of machine Learning E. Shriberg. 1994. Expanding the scope of the ATIS research, 3:993–1022. task: The ATIS-3 corpus. In Proceedings of the work- D. G. Bobrow, C. Condoravdi, R. Crouch, V. De Paiva, shop on Human Language Technology, pages 43–48. L. Karttunen, T. H. King, R. Nairn, L. Price, and Association for Computational Linguistics. A. Zaenen. 2007. Precision-focused textual inference. Hal Daume´ III. 2007. Fast search for dirichlet process In Proceedings of the ACL-PASCAL Workshop on Tex- mixture models. In Proceedings of the Eleventh In- tual Entailment and Paraphrasing, RTE ’07, pages ternational Conference on Artificial Intelligence and 16–21. Association for Computational Linguistics. Statistics (AIStats), San Juan, Puerto Rico. 
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim D. Davidson. 1967. 6. the logical form of action sen- Sturge, and Jamie Taylor. 2008. Freebase: a col- tences. Essays on actions and events, 1(9):105–149. laboratively created graph database for structuring hu- Anthony Fader, Stephen Soderland, and Oren Etzioni. man knowledge. In Proceedings of the 2008 ACM 2011. Identifying relations for open information SIGMOD international conference on Management of extraction. In Proceedings of the Conference on data, SIGMOD ’08, pages 1247–1250, New York, NY, Empirical Methods in Natural Language Processing, USA. ACM. EMNLP ’11, pages 1535–1545. Association for Com- J. Bos and K. Markert. 2005. Recognising textual en- putational Linguistics. tailment with logical inference. In Proceedings of T. Fountain and M. Lapata. 2011. Incremental models of the conference on Human Language Technology and natural language category acquisition. In Proceedings Empirical Methods in Natural Language Processing, of the 32st Annual Conference of the pages 628–635. Association for Computational Lin- Society. guistics. D. Garrette, K. Erk, and R. Mooney. 2011. Integrating Johan Bos. 2008. Wide-coverage semantic analysis with logical representations with probabilistic information boxer. In Johan Bos and Rodolfo Delmonte, editors, using markov logic. In Proceedings of the Ninth In- Semantics in . STEP 2008 Conference ternational Conference on Computational Semantics,

189

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00219 by guest on 03 October 2021 pages 105–114. Association for Computational Lin- Andrew Kachites McCallum. 2002. Mal- guistics. let: A machine learning for language toolkit. D. Graff, J. Kong, K. Chen, and K. Maeda. 2003. English http://mallet.cs.umass.edu. gigaword. Linguistic Data Consortium, Philadelphia. W. McCune. 2005. Prover9 and Mace4. http://cs.unm.edu/˜mccune/mace4/ Edward Grefenstette, Mehrnoosh Sadrzadeh, Stephen . Clark, Bob Coecke, and Stephen Pulman. 2011. Con- G.A. Miller. 1995. Wordnet: a lexical database for en- crete sentence spaces for compositional distributional glish. of the ACM, 38(11):39–41. models of meaning. Computational Semantics IWCS M. Mintz, S. Bills, R. Snow, and D. Jurafsky. 2009. Dis- 2011, page 125. tant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the Julia Hockenmaier and Mark Steedman. 2007. CCG- 47th Annual Meeting of the ACL and the 4th Interna- bank: a corpus of CCG derivations and dependency tional Joint Conference on Natural Language Process- structures extracted from the penn . Compu- ing of the AFNLP: Volume 2-Volume 2, pages 1003– tational Linguistics, 33(3):355–396. 1011. Association for Computational Linguistics. M. Honnibal, J.R. Curran, and J. Bos. 2010. Rebanking J. Mitchell and M. Lapata. 2008. Vector-based models of CCGbank for improved NP interpretation. In Proceed- semantic composition. proceedings of ACL-08: HLT, ings of the 48th Annual Meeting of the Association for pages 236–244. Computational Linguistics, pages 207–215. Associa- R.M. Neal. 2000. Markov chain sampling methods for tion for Computational Linguistics. dirichlet process mixture models. Journal of compu- E. Hovy, M. Marcus, M. Palmer, L. Ramshaw, and tational and graphical statistics, 9(2):249–265. R. Weischedel. 2006. Ontonotes: the 90% solution. D.O. O’Seaghdha. 2010. Latent variable models of se- In Proceedings of the Human Language Technology lectional preference. In Proceedings of the 48th An- Conference of the NAACL, Companion Volume: Short nual Meeting of the Association for Computational Papers, pages 57–60. Association for Computational Linguistics, pages 435–444. Association for Compu- Linguistics. tational Linguistics. P. Kingsbury and M. Palmer. 2002. From treebank to Hoifung Poon and Pedro Domingos. 2009. Unsuper- . In Proceedings of the 3rd International vised semantic parsing. In Proceedings of the 2009 Conference on Language Resources and Evaluation Conference on Empirical Methods in Natural Lan- (LREC-2002), pages 1989–1993. Citeseer. guage Processing: Volume 1 - Volume 1, EMNLP ’09, K. Kipper, H.T. Dang, and M. Palmer. 2000. Class-based pages 1–10. Association for Computational Linguis- construction of a verb lexicon. In Proceedings of the tics. National Conference on Artificial Intelligence, pages Hoifung Poon and Pedro Domingos. 2010. Unsuper- 691–696. Menlo Park, CA; Cambridge, MA; London; vised ontology induction from text. In Proceedings of AAAI Press; MIT Press; 1999. the 48th Annual Meeting of the Association for Com- putational Linguistics, ACL ’10, pages 296–305. As- Jayant Krishnamurthy and Tom M. Mitchell. 2012. sociation for Computational Linguistics. Weakly supervised training of semantic parsers. In A. Ritter, O. Etzioni, et al. 2010. A latent dirichlet allo- Proceedings of the 2012 Joint Conference on Empir- cation method for selectional preferences. 
In Proceed- ical Methods in Natural Language Processing and ings of the 48th Annual Meeting of the Association for Computational Natural Language Learning, EMNLP- Computational Linguistics, pages 424–434. Associa- CoNLL ’12, pages 754–765. Association for Compu- tion for Computational Linguistics. tational Linguistics. Stefan Schoenmackers, Oren Etzioni, Daniel S. Weld, P. Liang, M.I. Jordan, and D. Klein. 2011. Learning and Jesse Davis. 2010. Learning first-order horn dependency-based compositional semantics. In Proc. clauses from web text. In Proceedings of the 2010 Association for Computational Linguistics (ACL). Conference on Empirical Methods in Natural Lan- Dekang Lin and Patrick Pantel. 2001. DIRT - Discovery guage Processing, EMNLP ’10, pages 1088–1098. of Inference Rules from Text. In In Proceedings of the Association for Computational Linguistics. ACM SIGKDD Conference on Knowledge Discovery R. Socher, B. Huval, C.D. Manning, and A.Y. Ng. 2012. and , pages 323–328. Semantic compositionality through recursive matrix- Bill MacCartney and Christopher D. Manning. 2007. vector spaces. In Proceedings of the 2012 Joint Natural logic for textual inference. In Proceedings Conference on Empirical Methods in Natural Lan- of the ACL-PASCAL Workshop on Textual Entailment guage Processing and Computational Natural Lan- and Paraphrasing, RTE ’07, pages 193–200. Associa- guage Learning, pages 1201–1211. Association for tion for Computational Linguistics. Computational Linguistics.

190

Downloaded from http://www.mitpressjournals.org/doi/pdf/10.1162/tacl_a_00219 by guest on 03 October 2021 Mark Steedman. 2000. The Syntactic Process. MIT Press. Mark Steedman. 2012. Taking Scope: The Natural Se- mantics of Quantifiers. MIT Press. Idan Szpektor, Eyal Shnarch, and Ido Dagan. 2007. Instance-based evaluation of entailment rule acquisi- tion. In In Proceedings of ACL 2007, volume 45, page 456. Ivan Titov and Alexandre Klementiev. 2011. A bayesian model for unsupervised semantic parsing. In Proceed- ings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Tech- nologies, pages 1445–1455, Portland, Oregon, USA, June. Association for Computational Linguistics. Limin Yao, Aria Haghighi, Sebastian Riedel, and Andrew McCallum. 2011. Structured relation discovery using generative models. In Proceedings of the Conference on Empirical Methods in Natural Language Process- ing, EMNLP ’11, pages 1456–1466. Association for Computational Linguistics. Limin Yao, Sebastian Riedel, and Andrew McCallum. 2012. Unsupervised relation discovery with sense dis- ambiguation. In ACL (1), pages 712–720. J.M. Zelle and R.J. Mooney. 1996. Learning to parse database queries using inductive logic programming. In Proceedings of the National Conference on Artifi- cial Intelligence, pages 1050–1055.
