Montague Meets Markov: Deep Semantics with Probabilistic Logical Form
Islam Beltagy†, Cuong Chau†, Gemma Boleda‡, Dan Garrette†, Katrin Erk‡, Raymond Mooney†
†Department of Computer Science, ‡Department of Linguistics
The University of Texas at Austin, Austin, Texas 78712
{beltagy,ckcuong,dhg,mooney}@cs.utexas.edu, [email protected], [email protected]

Abstract

We combine logical and distributional representations of natural language meaning by transforming distributional similarity judgments into weighted inference rules using Markov Logic Networks (MLNs). We show that this framework supports both judging sentence similarity and recognizing textual entailment by appropriately adapting the MLN implementation of logical connectives. We also show that distributional phrase similarity, used as textual inference rules created on the fly, improves its performance.

Figure 1: Turning distributional similarity into a weighted inference rule. The similarity score sim(hamster, gerbil) = w between the context vectors for "hamster" and "gerbil" becomes the weighted clause ∀x. hamster(x) → gerbil(x) | f(w).

1 Introduction

Tasks in natural language semantics are very diverse and pose different requirements on the underlying formalism for representing meaning. Some tasks require a detailed representation of the structure of complex sentences. Some tasks require the ability to recognize near-paraphrases or degrees of similarity between sentences. Some tasks require logical inference, either exact or approximate. Often it is necessary to handle ambiguity and vagueness in meaning. Finally, we frequently want to be able to learn relevant knowledge automatically from corpus data.

There is no single representation for natural language meaning at this time that fulfills all requirements. But there are representations that meet some of the criteria. Logic-based representations (Montague, 1970; Kamp and Reyle, 1993) provide an expressive and flexible formalism to express even complex propositions, and they come with standardized inference mechanisms. Distributional models (Turney and Pantel, 2010) use contextual similarity to predict semantic similarity of words and phrases (Landauer and Dumais, 1997; Mitchell and Lapata, 2010), and to model polysemy (Schütze, 1998; Erk and Padó, 2008; Thater et al., 2010). This suggests that distributional models and logic-based representations of natural language meaning are complementary in their strengths (Grefenstette and Sadrzadeh, 2011; Garrette et al., 2011), which encourages developing new techniques to combine them.

Garrette et al. (2011; 2013) propose a framework for combining logic and distributional models in which logical form is the primary meaning representation. Distributional similarity between pairs of words is converted into weighted inference rules that are added to the logical form, as illustrated in Figure 1. Finally, Markov Logic Networks (MLNs) (Richardson and Domingos, 2006) are used to perform weighted inference on the resulting knowledge base. However, they only employed single-word distributional similarity rules, and only evaluated on a small set of short, hand-crafted test sentences.
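To make the rule construction in Figure 1 concrete, the sketch below turns a cosine similarity between two toy context vectors into a weighted MLN-style clause. This is our illustration rather than code from the paper: the vectors are invented, the clause syntax follows the common "weight formula" convention of MLN tools such as Alchemy, and using the raw similarity as the rule weight merely stands in for the unspecified mapping f(w).

    import numpy as np

    def cosine(u, v):
        # Cosine similarity between two distributional (context-count) vectors.
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

    def similarity_rule(pred1, pred2, sim):
        # Render the similarity as a weighted clause string:
        # "<weight> pred1(x) => pred2(x)".  The weight should really be f(sim);
        # the raw similarity is used here only as a placeholder.
        return f"{sim:.3f} {pred1}(x) => {pred2}(x)"

    # Invented context vectors for "hamster" and "gerbil".
    hamster = np.array([3.0, 0.0, 5.0, 1.0])
    gerbil = np.array([2.0, 1.0, 4.0, 0.0])

    w = cosine(hamster, gerbil)
    print(similarity_rule("hamster", "gerbil", w))
    # prints roughly: 0.959 hamster(x) => gerbil(x)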
In this paper, we extend Garrette et al.'s approach and adapt it to handle two existing semantic tasks: recognizing textual entailment (RTE) and semantic textual similarity (STS). We show how this single semantic framework using probabilistic logical form in Markov logic can be adapted to support both of these important tasks. This is possible because MLNs constitute a flexible programming language based on probabilistic logic (Domingos and Lowd, 2009) that can be easily adapted to support multiple types of linguistically useful inference.

At the word and short phrase level, our approach models entailment through "distributional" similarity (Figure 1). If X and Y occur in similar contexts, we assume that they describe similar entities and thus there is some degree of entailment between them. At the sentence level, however, we hold that a stricter, logic-based view of entailment is beneficial, and we even model sentence similarity (in STS) as entailment.

There are two main innovations in the formalism that make it possible for us to work with naturally occurring corpus data. First, we use more expressive distributional inference rules based on the similarity of phrases rather than just individual words. In comparison to existing methods for creating textual inference rules (Lin and Pantel, 2001b; Szpektor and Dagan, 2008), these rules are computed on the fly as needed, rather than pre-compiled. Second, we use more flexible probabilistic combinations of evidence in order to compute degrees of sentence similarity for STS and to help compensate for parser errors. We replace deterministic conjunction by an average combiner, which encodes causal independence (Natarajan et al., 2010).

We show that our framework is able to handle both sentence similarity (STS) and textual entailment (RTE) by making some simple adaptations to the MLN when switching between tasks. The framework achieves reasonable results on both tasks. On STS, we obtain a correlation of r = 0.66 with full logic, r = 0.73 in a system with weakened variable binding, and r = 0.85 in an ensemble model. On RTE-1 we obtain an accuracy of 0.57. We show that the distributional inference rules benefit both tasks and that more flexible probabilistic combinations of evidence are crucial for STS. Although other approaches could be adapted to handle both RTE and STS, we do not know of any other methods that have been explicitly tested on both problems.

2 Related work

Distributional semantics

Distributional models define the semantic relatedness of words as the similarity of vectors representing the contexts in which they occur (Landauer and Dumais, 1997; Lund and Burgess, 1996). Recently, such models have also been used to represent the meaning of larger phrases. The simplest models compute a phrase vector by adding the vectors for the individual words (Landauer and Dumais, 1997) or by a component-wise product of word vectors (Mitchell and Lapata, 2008; Mitchell and Lapata, 2010). Other approaches, in the emerging area of distributional compositional semantics, use more complex methods that compute phrase vectors from word vectors and tensors (Baroni and Zamparelli, 2010; Grefenstette and Sadrzadeh, 2011).
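As a minimal illustration of the additive and component-wise (multiplicative) composition models just mentioned, using invented three-dimensional word vectors:

    import numpy as np

    # Invented word vectors; real models would use high-dimensional
    # context-count or prediction-based vectors.
    red = np.array([0.8, 0.1, 0.3])
    car = np.array([0.2, 0.9, 0.4])

    additive = red + car        # additive model: vector sum
    multiplicative = red * car  # multiplicative model: component-wise product

    print(additive)        # approx. [1.0, 1.0, 0.7]
    print(multiplicative)  # approx. [0.16, 0.09, 0.12]

Phrase vectors built this way can be compared with the same similarity measures used for word vectors, which is the kind of phrase similarity that the on-the-fly inference rules described above rely on.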
Wide-coverage logic-based semantics

Boxer (Bos, 2008) is a software package for wide-coverage semantic analysis that produces logical forms using Discourse Representation Structures (Kamp and Reyle, 1993). It builds on the C&C CCG parser (Clark and Curran, 2004).

Markov Logic

In order to combine logical and probabilistic information, we draw on existing work in Statistical Relational AI (Getoor and Taskar, 2007). Specifically, we utilize Markov Logic Networks (MLNs) (Domingos and Lowd, 2009), which employ weighted formulas in first-order logic to compactly encode complex undirected probabilistic graphical models. MLNs are well suited for our approach since they provide an elegant framework for assigning weights to first-order logical rules, combining a diverse set of inference rules, and performing sound probabilistic inference.

An MLN consists of a set of weighted first-order clauses. It provides a way of softening first-order logic by allowing situations in which not all clauses are satisfied. More specifically, MLNs provide a well-founded probability distribution across possible worlds by specifying that the probability of a world increases exponentially with the total weight of the logical clauses that it satisfies. While there are methods for learning MLN weights directly from training data, the appropriate training data is lacking here, so our approach uses weights computed using distributional semantics. We use the open-source software package Alchemy (Kok et al., 2005) for MLN inference, which allows computing the probability of a query literal given a set of weighted clauses as background knowledge and evidence.

Tasks: RTE and STS

Recognizing Textual Entailment (RTE) is the task of determining whether one natural language text, the premise, implies another, the hypothesis. Consider (1) below.

(1) p: Oracle had fought to keep the forms from being released
    h: Oracle released a confidential document

Here, h is not entailed. RTE directly tests whether a system can construct semantic representations that allow it to draw correct inferences. Of existing RTE [...]

[...] assigned. Raina et al. (2005) use this framework for RTE, deriving inference costs from WordNet similarity and properties of the syntactic parse. Garrette et al. (2011; 2013) proposed an approach to RTE that uses MLNs to combine traditional logical representations with distributional information in order to support probabilistic textual inference. This approach can be viewed as a bridge between Bos and Markert's (2005) purely logical approach, which relies on hard logical rules and theorem proving, and distributional approaches, which support graded similarity between concepts but have no notion of logical operators or entailment.

There are also other methods that combine distributional and structured representations. Stern et al. (2011) conceptualize textual entailment as tree rewriting of syntactic graphs, where some rewriting rules are distributional inference rules. Socher et al. (2011) recognize paraphrases using a "tree of vectors," a phrase structure tree in which each constituent is associated with a vector, and overall sentence similarity is computed by a classifier [...]
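For concreteness, the statement in the Markov Logic paragraphs above, that a world's probability grows exponentially with the total weight of the clauses it satisfies, corresponds to the standard MLN log-linear model of Richardson and Domingos (2006); the equation is reproduced here as background rather than taken from this excerpt:

    P(X = x) = \frac{1}{Z} \exp\Big( \sum_i w_i \, n_i(x) \Big),
    \qquad
    Z = \sum_{x'} \exp\Big( \sum_i w_i \, n_i(x') \Big)

where w_i is the weight of clause i, n_i(x) is the number of true groundings of clause i in world x, and Z normalizes over all possible worlds. Alchemy's inference then reports quantities such as the probability of a query literal under this distribution, conditioned on the supplied evidence.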