Commonsense Knowledge Base Completion

Xiang Li∗‡   Aynaz Taheri†   Lifu Tu‡   Kevin Gimpel‡
∗University of Chicago, Chicago, IL 60637, USA
†University of Illinois at Chicago, Chicago, IL 60607, USA
‡Toyota Technological Institute at Chicago, Chicago, IL 60637, USA
[email protected], [email protected], {lifu,kgimpel}@ttic.edu

Abstract

We enrich a curated resource of commonsense knowledge by formulating the problem as one of knowledge base completion (KBC). Most work in KBC focuses on knowledge bases like Freebase that relate entities drawn from a fixed set. However, the tuples in ConceptNet (Speer and Havasi, 2012) define relations between an unbounded set of phrases. We develop neural network models for scoring tuples on arbitrary phrases and evaluate them by their ability to distinguish true held-out tuples from false ones. We find strong performance from a bilinear model using a simple additive architecture to model phrases. We manually evaluate our trained model's ability to assign quality scores to novel tuples, finding that it can propose tuples at the same quality level as medium-confidence tuples from ConceptNet.

relation            right term                conf.
MOTIVATEDBYGOAL     relax                     3.3
USEDFOR             relaxation                2.6
MOTIVATEDBYGOAL     your muscle be sore       2.3
HASPREREQUISITE     go to spa                 2.0
CAUSES              get pruny skin            1.6
HASPREREQUISITE     change into swim suit     1.6

Table 1: ConceptNet tuples with left term "soak in hotspring"; final column is confidence score.

1 Introduction

Many ambiguities in natural language processing (NLP) can be resolved by using knowledge of various forms. Our focus is on the type of knowledge that is often referred to as "commonsense" or "background" knowledge. This knowledge is rarely expressed explicitly in textual corpora (Gordon and Van Durme, 2013). Some researchers have developed techniques for inferring this knowledge from patterns in raw text (Gordon, 2014; Angeli and Manning, 2014), while others have developed curated resources of commonsense knowledge via manual annotation (Lenat and Guha, 1989; Speer and Havasi, 2012) or games with a purpose (von Ahn et al., 2006).

Curated resources typically have high precision but suffer from a lack of coverage. For certain resources, researchers have developed methods to automatically increase coverage by inferring missing entries. These methods are commonly categorized under the heading of knowledge base completion (KBC). KBC is widely studied for knowledge bases like Freebase (Bollacker et al., 2008), which contain large sets of entities and relations among them (Mintz et al., 2009; Nickel et al., 2011; Riedel et al., 2013; West et al., 2014), including recent work using neural networks (Socher et al., 2013; Yang et al., 2014).

We improve the coverage of commonsense resources by formulating the problem as one of knowledge base completion. We focus on a particular curated commonsense resource called ConceptNet (Speer and Havasi, 2012). ConceptNet contains tuples consisting of a left term, a relation, and a right term. The relations come from a fixed set. While terms in Freebase tuples are entities, ConceptNet terms can be arbitrary phrases. Some examples are shown in Table 1. An NLP application may wish to query ConceptNet for information about soaking in a hotspring, but may use different words from those contained in the ConceptNet tuples. Our goal is to do on-the-fly knowledge base completion so that queries can be answered robustly without requiring the precise linguistic forms contained in ConceptNet.

To do this, we develop neural network models to embed terms and provide scores to arbitrary tuples. We train them on ConceptNet tuples and evaluate them by their ability to distinguish true and false held-out tuples.
We consider several functional architectures, comparing two composition functions for embedding terms and two functions for converting term embeddings into tuple scores. We find that all architectures are able to outperform several baselines and reach similar performance on classifying held-out tuples.

We also experiment with several training objectives for KBC, finding that a simple cross entropy objective with randomly-generated negative examples performs best while also being fastest.

We manually evaluate our trained model's ability to assign quality scores to novel tuples, finding that it can propose tuples at the same quality level as medium-confidence tuples from ConceptNet. We release all of our resources, including our ConceptNet KBC task data, large sets of randomly-generated tuples scored with our model, training code, and pretrained models with code for calculating the confidence of novel tuples.[1]

[1] Available at http://ttic.uchicago.edu/~kgimpel/commonsense.html.

2 Related Work

Our methods are similar to past work on KBC (Mintz et al., 2009; Nickel et al., 2011; Lao et al., 2011; Nickel et al., 2012; Riedel et al., 2013; Gardner et al., 2014; West et al., 2014), particularly methods based on distributed representations and neural networks (Socher et al., 2013; Bordes et al., 2013; Bordes et al., 2014a; Bordes et al., 2014b; Yang et al., 2014; Neelakantan et al., 2015; Gu et al., 2015; Toutanova et al., 2015). Most prior work predicts new relational links between terms drawn from a fixed set. In a notable exception, Neelakantan and Chang (2015) add new entities to KBs using external resources along with properties of the KB itself. Relatedly, Yao et al. (2013) induce an unbounded set of entity categories and associate them with entities in KBs.

Our goals are similar to those of the AnalogySpace method (Speer et al., 2008), which uses matrix factorization to improve coverage of ConceptNet. However, AnalogySpace can only return a confidence score for a pair of terms drawn from the training set. Our models can assign scores to tuples that contain novel terms (as long as they consist of words in our vocabulary).

Though we use ConceptNet, similar techniques can be applied to other curated resources like WordNet (Miller, 1995) and FrameNet (Baker et al., 1998). For WordNet, tuples can contain lexical entries that are linked via synset relations (e.g., "hypernym"). WordNet contains many multiword entries (e.g., "cold sweat"), which can be modeled compositionally by our term models; alternatively, entire glosses could be used as terms. To expand frame relationships in FrameNet, tuples can draw relations from the frame relation types (e.g., "is causative of") and terms can be frame lexical units or their definitions.

Several researchers have used commonsense knowledge to improve language technologies, including sentiment analysis (Cambria et al., 2012; Agarwal et al., 2015), semantic similarity (Caro et al., 2015), and speech recognition (Lieberman et al., 2005). Our hope is that our models can enable many other NLP applications to benefit from commonsense knowledge.

Our work is most similar to that of Angeli and Manning (2013). They also developed methods to assess the plausibility of new facts based on a training set of facts, considering commonsense data from ConceptNet in one of their settings. Like us, they can handle an unbounded set of terms by using (simple) composition functions for novel terms, which is rare among work in KBC.
One key difference is that their best method requires iterating over the KB at test time, which can be computationally expensive with large KBs. Our models do not require iterating over the training set. We compare to several baselines inspired by their work, and we additionally evaluate our model's ability to score novel tuples derived from both ConceptNet and Wikipedia.

Several researchers have developed techniques for discovering commonsense knowledge from text (Gordon et al., 2010; Gordon and Schubert, 2012; Gordon, 2014; Angeli and Manning, 2014). Open information extraction systems like REVERB (Fader et al., 2011) and NELL (Carlson et al., 2010) find tuples with arbitrary terms and relations from raw text. In contrast, we start with a set of commonsense facts to use for training, though our methods could be applied to the output of these or other extraction systems.

3 Models

Our goal is to represent commonsense knowledge such that it can be used for NLP tasks. We assume this knowledge is given in the form of tuples ⟨t1, R, t2⟩, where t1 is the left term, t2 is the right term, and R is a (directed) relation that exists between the terms. Examples are shown in Table 1.[2]

[2] These examples are from the Open Mind Common Sense (OMCS) part of ConceptNet version 5 (Speer and Havasi, 2012). In our experiments below, we only use OMCS tuples.

Given a set of tuples, our goal is to develop a parametric model that can provide a confidence score for new, unseen tuples. That is, we want to design and train models that define a function score(t1, R, t2) that provides a quality score for an arbitrary tuple ⟨t1, R, t2⟩. These models will be evaluated by their ability to distinguish true held-out tuples from false ones.

We describe two model families for scoring tuples. We assume that we have embeddings for words and define models that use these word embeddings to score tuples. So our models are limited to tuples in which terms consist of words in the vocabulary, though future work could consider character-based architectures for open-vocabulary modeling (Huang et al., 2013; Ling et al., 2015).

3.1 Bilinear Models

We first consider bilinear models, since they have been found useful for KBC in past work (Nickel et al., 2011; Jenatton et al., 2012; García-Durán et al., 2014; Yang et al., 2014). A bilinear model has the following form for a tuple ⟨t1, R, t2⟩:

    v1^T M_R v2

where v1 ∈ R^r is the (column) vector representing t1, v2 ∈ R^r is the vector for t2, and M_R ∈ R^{r×r} is the parameter matrix for relation R.

To convert terms t1 and t2 into term vectors v1 and v2, we consider two possibilities: word averaging and a bidirectional long short-term memory (LSTM) recurrent neural network (Hochreiter and Schmidhuber, 1997). This provides us with two models: Bilinear AVG and Bilinear LSTM.

One downside of this architecture is that as the length of the term vectors grows, the size of the relation matrices grows quadratically. This can slow down training while requiring more data to learn the large numbers of parameters in the matrices. To address this, we include an additional nonlinear transformation of each term:

    u_i = a(W^(B) v_i + b^(B))

where a is a nonlinear activation function (tuned among ReLU, tanh, and logistic sigmoid) and where we have introduced additional parameters W^(B) and b^(B). This gives us the following model:

    score_bilinear(t1, R, t2) = u1^T M_R u2

When using the LSTM, we tune the decision about how to produce the final term vectors to pass to the bilinear model, including possibly using the final vectors from each direction and the output of max or average pooling. We use the same LSTM parameters for each term.
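As a concrete illustration of the Bilinear AVG scoring function (not the released implementation), the NumPy sketch below runs one forward pass. The toy vocabulary, the tanh activation, the dimensionalities, and the random parameter values are placeholder assumptions; in practice the parameters are learned as described in Section 4.

```python
import numpy as np

rng = np.random.default_rng(0)
r, d_word = 4, 6                          # placeholder sizes (the paper tunes r)
vocab = {w: rng.normal(size=d_word)       # toy word embeddings
         for w in "soak in a hotspring get pruny skin".split()}
M = {"CAUSES": rng.normal(size=(r, r))}   # one relation matrix M_R per relation

# parameters of the nonlinear term transformation u_i = a(W_B v_i + b_B)
W_B, b_B = rng.normal(size=(r, d_word)), np.zeros(r)

def term_vector(term):
    """Bilinear AVG term model: average the embeddings of the words in the term."""
    return np.mean([vocab[w] for w in term.split()], axis=0)

def score_bilinear(t1, rel, t2):
    """score(t1, R, t2) = u1^T M_R u2, with u_i = tanh(W_B v_i + b_B)."""
    u1 = np.tanh(W_B @ term_vector(t1) + b_B)
    u2 = np.tanh(W_B @ term_vector(t2) + b_B)
    return float(u1 @ M[rel] @ u2)

print(score_bilinear("soak in a hotspring", "CAUSES", "get pruny skin"))
```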
3.2 Deep Neural Network Models

Our second family of models is based on deep neural networks (DNNs). While bilinear models have been shown to work well for KBC, their functional form makes restrictions about how terms can interact. DNNs make no such restrictions.

As above, we define two models, one based on using word averaging for the term model (DNN AVG) and one based on LSTMs (DNN LSTM). For the DNN AVG model, we obtain the term vectors v1 and v2 by averaging word vectors in the respective terms. We then concatenate v1, v2, and a relation vector v_R to form the input of the DNN, denoted v_in. The DNN uses a single hidden layer:

    u = a(W^(D1) v_in + b^(D1))
    score_DNN(t1, R, t2) = W^(D2) u + b^(D2)        (1)

where a is again a (tuned) nonlinear activation function. The size of the hidden vector u is tuned, but the output dimensionality (the number of rows in W^(D2) and b^(D2)) is fixed to 1. We do not use a nonlinear activation for the final layer since our goal is to output a scalar score.

For the DNN LSTM model, we first create a single vector for the two terms using an LSTM. That is, we concatenate t1, a delimiter token, and t2 to create a single word sequence. We use a bidirectional LSTM to convert this word sequence to a vector, again possibly using pooling (the decision is tuned; details below). We concatenate the output of this bidirectional LSTM with the relation vector v_R to create the DNN input vector v_in, then use Eq. 1 to obtain a score. We found this to work better than separately using an LSTM on each term. We cannot try this for the Bilinear LSTM model since its functional form separates the two term vectors.

The relation vectors v_R are learned in addition to the DNN parameters W^(D1), W^(D2), b^(D1), b^(D2), and the LSTM parameters (in the case of the DNN LSTM model). Also, word embedding parameters are updated in all settings.
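The following sketch illustrates Eq. 1 for the DNN AVG model under the same caveats as before: toy embeddings, placeholder sizes, and random parameters stand in for learned ones.

```python
import numpy as np

rng = np.random.default_rng(1)
d_word, d_rel, g = 6, 5, 8           # placeholder sizes; g is the hidden layer size
vocab = {w: rng.normal(size=d_word) for w in "bus public transportation".split()}
rel_vecs = {"ISA": rng.normal(size=d_rel)}    # learned relation vectors v_R

d_in = 2 * d_word + d_rel            # input is the concatenation [v1; v2; v_R]
W_D1, b_D1 = rng.normal(size=(g, d_in)), np.zeros(g)
W_D2, b_D2 = rng.normal(size=(1, g)), np.zeros(1)

def avg(term):
    """Average the embeddings of the words in the term."""
    return np.mean([vocab[w] for w in term.split()], axis=0)

def score_dnn(t1, rel, t2):
    """Eq. 1: one ReLU hidden layer over [v1; v2; v_R], linear scalar output."""
    v_in = np.concatenate([avg(t1), avg(t2), rel_vecs[rel]])
    u = np.maximum(0.0, W_D1 @ v_in + b_D1)
    return float((W_D2 @ u + b_D2)[0])

print(score_dnn("bus", "ISA", "public transportation"))
```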

4 Training

Given a tuple training set T, we train our models using two different loss functions: hinge loss and a binary cross entropy function. Both rely on ways of generating negative examples (Section 4.3). Both also use regularization (Section 4.4).

4.1 Hinge Loss

Given a training tuple τ = ⟨t1, R, t2⟩, the hinge loss seeks to make the score of τ larger than the score of negative examples by a margin of at least γ. This corresponds to minimizing the following loss, summed over all examples τ ∈ T:

    loss_hinge(τ) = max{0, γ − score(τ) + score(τ_neg(t1))}
                  + max{0, γ − score(τ) + score(τ_neg(R))}
                  + max{0, γ − score(τ) + score(τ_neg(t2))}

where τ_neg(t1) is the negative example obtained by replacing t1 in τ with some other t1', and τ_neg(R) and τ_neg(t2) are defined analogously for the relation and right term. We describe how we generate these negative examples in Section 4.3 below.

4.2 Binary Cross Entropy

Though we only have true tuples in our training set, we can create a binary classification problem by assigning a label of 1 to training tuples and a label of 0 to negative examples. Then we can minimize cross entropy (CE) as is common when using neural networks for classification. To generate negative examples, we consider the methods described in Section 4.3 below. We also need to convert our models' scores into probabilities, which we do by using a logistic sigmoid σ on score. We denote the label as ℓ, where the label is 1 if the tuple is from the training set and 0 if it is a negative example. Then the loss is defined:

    loss_CE(τ, ℓ) = −ℓ log σ(score(τ)) − (1 − ℓ) log(1 − σ(score(τ)))

When using this loss, we generate three negative examples for each positive example (one for swapping each component of the tuple, as in the hinge loss). For a mini-batch of size β, there are β positive examples and 3β negative examples used for training. The loss is summed over these 4β examples yielded by each mini-batch.
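The two objectives reduce to a few lines once tuple scores are available. The sketch below computes both losses for a single positive tuple and its three negative examples; the scores themselves are toy numbers.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def hinge_loss(pos_score, neg_scores, gamma=1.0):
    """Hinge loss for one tuple: margin violations against its three negatives
    (one each for swapping t1, R, and t2)."""
    return sum(max(0.0, gamma - pos_score + n) for n in neg_scores)

def cross_entropy_loss(pos_score, neg_scores):
    """Binary CE: label 1 for the training tuple, label 0 for its negatives;
    scores are mapped to probabilities with a logistic sigmoid."""
    loss = -np.log(sigmoid(pos_score))
    loss += sum(-np.log(1.0 - sigmoid(n)) for n in neg_scores)
    return float(loss)

# toy scores for one positive tuple and its three negative examples
print(hinge_loss(2.0, [0.5, -0.3, 1.8]))          # 0.0 + 0.0 + 0.8 = 0.8
print(cross_entropy_loss(2.0, [0.5, -0.3, 1.8]))
```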

4.3 Negative Examples

For the loss functions above, we need ways of automatically generating negative examples. For efficiency, we consider using the current mini-batch only, as our models are trained using optimization on mini-batches. We consider the following three strategies to construct negative examples. Each strategy constructs three negative examples for each positive example τ: one by replacing t1, one by replacing R, and one by replacing t2.

Random sampling. We create the three negative examples for τ by replacing each component with its counterpart in a randomly-chosen tuple in the same mini-batch.

Max sampling. We create the three negative examples for τ by replacing each component with its counterpart in some other tuple in the mini-batch, choosing the substitution to maximize the score of the resulting negative example. For example, when swapping out t1 in τ = ⟨t1, R, t2⟩, we choose the substitution t1' as follows:

    t1' = argmax_{t : ⟨t, R', t2'⟩ ∈ µ \ {τ}} score(t, R, t2)

where µ is the current mini-batch of tuples. We perform the analogous procedure for R and t2.

Mix sampling. This is a mixture of the above, using random sampling 50% of the time and max sampling the remaining 50% of the time.

4.4 Regularization

We use L2 regularization. For the DNN models, we add the penalty term λ||θ||^2 to the losses, where λ is the regularization coefficient and θ contains all other parameters. However, for the bilinear models we regularize the relation matrices M_R toward the identity matrix instead of all zeroes, adding the following to the loss:

    λ1 ||θ||^2 + λ2 Σ_R ||M_R − I_r||^2

where I_r is the r × r identity matrix, the summation is performed over all relations R, and θ represents all other parameters.
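The sketch below illustrates random sampling and max sampling over a toy mini-batch; the dummy scorer stands in for a trained model. Mix sampling simply chooses between the two strategies with probability 0.5 for each example.

```python
import random

def negatives_random(batch, i, rng=random.Random(0)):
    """Random sampling: swap each field of batch[i] with the corresponding field
    of a randomly chosen other tuple in the same mini-batch."""
    t1, rel, t2 = batch[i]
    others = batch[:i] + batch[i + 1:]
    return [(rng.choice(others)[0], rel, t2),
            (t1, rng.choice(others)[1], t2),
            (t1, rel, rng.choice(others)[2])]

def negatives_max(batch, i, score):
    """Max sampling: for each field, pick the in-batch substitution that gives
    the highest-scoring negative example."""
    t1, rel, t2 = batch[i]
    others = batch[:i] + batch[i + 1:]
    return [max(((o[0], rel, t2) for o in others), key=lambda x: score(*x)),
            max(((t1, o[1], t2) for o in others), key=lambda x: score(*x)),
            max(((t1, rel, o[2]) for o in others), key=lambda x: score(*x))]

batch = [("soak in hotspring", "CAUSES", "get pruny skin"),
         ("soak in hotspring", "MOTIVATEDBYGOAL", "relax"),
         ("bus", "ISA", "public transportation")]
print(negatives_random(batch, 0))
print(negatives_max(batch, 0, score=lambda a, b, c: len(a) + len(b) + len(c)))  # dummy scorer
```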

5 Experimental Setup

We now evaluate our tuple models. We measure whether our models can distinguish true and false tuples by training a model on a large set of tuples and testing on a held-out set.

5.1 Task Design

The tuples are obtained from the Open Mind Common Sense (OMCS) entries in the ConceptNet 5 dataset (Speer and Havasi, 2012). They are sorted by a confidence score. The most confident 1200 tuples were reserved for creating our test set (TEST). The next most confident 600 tuples (i.e., those numbered 1201–1800) were used to build a development set (DEV1) and the next most confident 600 (those numbered 1801–2400) were used to build a second development set (DEV2).

For each set S (S ∈ {DEV1, DEV2, TEST}), for each tuple τ ∈ S, we created a negative example and added it to S. So each set doubled in size. To create a negative example from τ ∈ S, we randomly swapped one of the components of τ with another tuple τ' ∈ S. One third of the time we swapped t1 in τ for t1' in τ', one third of the time we swapped their R's, and the remaining third of the time we swapped their t2's. Thus, distinguishing positive and negative examples in this task is similar to the objectives optimized during training. Each of DEV1 and DEV2 has 1200 tuples (600 positive examples and 600 negative examples), while TEST has 2400 tuples (1200 positive and 1200 negative). For training data, we selected 100,000 tuples from the remaining tuples (numbered 2401 and beyond).

The task is to separate the true and false tuples in our test set. That is, the labels are 1 for true tuples and 0 for false tuples. Given a model for scoring tuples, we select a threshold by maximizing accuracy on DEV1 and report accuracies on DEV2. This is akin to learning the bias feature weight (using DEV1) of a linear classifier that uses our model's score as its only feature. We tuned several choices, including word embeddings, hyperparameter values, and training objectives, on DEV2 and report final performance on TEST. One annotator (a native English speaker) attempted the same classification task on a sample of 100 tuples from DEV2 and achieved an accuracy of 95%. We release these datasets to the community so that others can work on this same task.

5.2 Word Embeddings

Our tuple models rely on initial word embeddings. To help our models better capture the commonsense knowledge in ConceptNet, we generated word embedding training data using the OMCS sentences underlying our training tuples (we excluded the top 2400 tuples, which were used for creating DEV1, DEV2, and TEST). We created training data by merging the information in the tuples and their OMCS sentences. Our goal was to combine the grammatical context of the OMCS sentences with the words in the actual terms, so as to ensure that we learn embeddings for the words in the terms. We also insert the relations into the OMCS sentences so that we can learn embeddings for the relations themselves.

We describe the procedure by example and also release our generated data for ease of replication. The tuple ⟨soak in a hotspring, CAUSES, get pruny skin⟩ was automatically extracted/normalized (by the ConceptNet developers) from the OMCS sentence "The effect of [soaking in a hotspring] is [getting pruny skin]", where brackets surround terms. We replace the bracketed portions with their corresponding terms and insert the relation between them: "The effect of soak in a hotspring CAUSES get pruny skin". We do this for all training tuples.[3]

[3] For reversed relations, indicated by an asterisk in the OMCS sentences, we swap t1 and t2 in the tuple.

We used the word2vec (Mikolov et al., 2013) toolkit to train skip-gram word embeddings on this data. We trained for 20 iterations, using a dimensionality of 200 and a window size of 5. We refer to these as "CN-trained" embeddings for the remainder of this paper. Similar approaches have been used to learn embeddings for particular downstream tasks, e.g., dependency parsing (Bansal et al., 2014). We use our CN-trained embeddings within baseline methods and also to provide the initial word embeddings of our models. For all of our models, we update the initial word embeddings during learning.

In the baseline methods described below, we compare our CN-trained embeddings to pretrained word embeddings. We use the GloVe (Pennington et al., 2014) embeddings trained on 840 billion tokens of Common Crawl web text and the PARAGRAM-SimLex embeddings of Wieting et al. (2015), which were tuned to have strong performance on the SimLex-999 task (Hill et al., 2015).
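The sentence-merging step of Section 5.2 can be sketched as a single substitution. The version below reproduces the example above by replacing everything between (and including) the two bracketed spans with "term1 RELATION term2"; the released data-generation procedure may differ in its details.

```python
import re

def embedding_sentence(omcs_sentence, t1, rel, t2):
    """Build word2vec training text from an OMCS sentence and its tuple: the
    bracketed spans are replaced by the normalized terms, with the relation
    inserted between them (the words between the two spans are dropped, which
    matches the paper's example; this is a simplifying assumption)."""
    return re.sub(r"\[.*?\].*\[.*?\]", "{} {} {}".format(t1, rel, t2),
                  omcs_sentence, count=1)

sent = "The effect of [soaking in a hotspring] is [getting pruny skin]"
print(embedding_sentence(sent, "soak in a hotspring", "CAUSES", "get pruny skin"))
# -> The effect of soak in a hotspring CAUSES get pruny skin
```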
5.3 Baselines

We consider three baselines inspired by those of Angeli and Manning (2013):

• Similar Fact Count (Count): For each tuple τ = ⟨t1, R, t2⟩ in the evaluation set, we count the number of similar tuples in the training set. A training tuple τ' = ⟨t1', R', t2'⟩ is considered "similar" to τ if R = R', one of the terms matches exactly, and the other term has the same head word. That is, (R = R') ∧ (t1 = t1') ∧ (head(t2) = head(t2')), or (R = R') ∧ (t2 = t2') ∧ (head(t1) = head(t1')). The head word for a term was obtained by running the Stanford Parser (Klein and Manning, 2003) on the term. This baseline does not use word embeddings.

• Argument Similarity (ArgSim): This baseline computes the cosine similarity of the vectors for t1 and t2, ignoring the relation. Vectors for t1 and t2 are obtained by word averaging.

• Max Similarity (MaxSim): For tuple τ in an evaluation set, this baseline outputs the maximum similarity between τ and any tuple in the training set. The similarity is computed by concatenating the vectors for t1, R, and t2, then computing cosine similarity. As in ArgSim, we obtain vectors for terms by averaging their words. We only consider R when using our CN-trained embeddings since they contain embeddings for the relations. When using GloVe and PARAGRAM embeddings for this baseline, we simply use the two term vectors (still constructed via averaging the words in each term).

We chose these baselines because they can all handle unbounded term sets but differ in their other requirements. ArgSim and MaxSim use word embeddings while Count does not. Count and MaxSim require iterating over the training set during inference while ArgSim does not. For each baseline, we tuned a threshold on DEV1 to maximize classification accuracy, then tested on DEV2 and TEST.
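For concreteness, here is a sketch of the two embedding-based baselines; random vectors stand in for the CN-trained embeddings (which also include vectors for the relations). The Count baseline is omitted since it requires a parser to extract head words.

```python
import numpy as np

rng = np.random.default_rng(2)
words = "bus public transportation transit soak in hotspring relax ISA MOTIVATEDBYGOAL".split()
emb = {w: rng.normal(size=5) for w in words}   # stand-ins for CN-trained embeddings

def avg(term):
    return np.mean([emb[w] for w in term.split()], axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def arg_sim(t1, rel, t2):
    """ArgSim: cosine between the averaged term vectors; the relation is ignored."""
    return cosine(avg(t1), avg(t2))

def max_sim(query, training_tuples):
    """MaxSim: max cosine between the query and any training tuple, where a tuple
    is represented as the concatenation [v_t1; v_R; v_t2]."""
    rep = lambda t1, rel, t2: np.concatenate([avg(t1), emb[rel], avg(t2)])
    q = rep(*query)
    return max(cosine(q, rep(*t)) for t in training_tuples)

train = [("bus", "ISA", "public transportation"),
         ("soak in hotspring", "MOTIVATEDBYGOAL", "relax")]
print(arg_sim("bus", "ISA", "public transit"))
print(max_sim(("bus", "ISA", "public transit"), train))
```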
5.4 Training and Tuning

We used AdaGrad (Duchi et al., 2011) for optimization, training for 20 epochs through the training tuples. We separately tuned hyperparameters for each model and training objective. We tuned the following hyperparameters: the relation matrix size r for the bilinear models (also the length of the transformed term vectors, denoted u1 and u2 above), the activation a, the hidden layer size g for the DNN models, the relation vector length d for the DNN models, the LSTM hidden vector size h for models with LSTMs, the mini-batch size β, the regularization parameters λ, λ1, and λ2, and the AdaGrad learning rate α.

All tuning used early stopping: periodically during training, we used the current model to find the optimal threshold on DEV1 and evaluated on DEV2 (a simple sketch of this threshold search follows the list of tuned values below). Due to computational limitations, we were unable to perform thorough grid searches for all hyperparameters. We combined limited grid searches with greedy hyperparameter tuning based on the regions of values that were most promising.

For the Bilinear LSTM and DNN LSTM, we did hyperparameter tuning by training on the full training set of 100,000 tuples for 20 epochs, computing DEV2 accuracy once per epoch. For the averaging models, we tuned by training on a subset of 1000 tuples with β = 200 for 20 epochs; the averaging models showed more stable results and did not require the full training set for tuning. Below are the tuned hyperparameter values:

• Bilinear AVG: for CE: r = 150, a = tanh, β = 200, α = 0.01, λ1 = λ2 = 0.001. For hinge loss: same values as above except α = 0.005.

• Bilinear LSTM: for CE: r = 50, a = ReLU, h = 200, β = 800, α = 0.02, and λ1 = λ2 = 0.00001. To obtain vectors from the term bidirectional LSTMs, we used max pooling. For hinge loss: r = 50, a = tanh, h = 200, β = 400, α = 0.007, λ1 = 0.00001, and λ2 = 0.01. To obtain vectors from the term bidirectional LSTMs, we used the concatenation of average pooling and the final hidden vectors in each direction. For each sampling method and loss function, α was tuned by grid search with the others fixed to the above values.

• DNN AVG: for both losses: a = ReLU, d = 200, g = 1000, β = 600, α = 0.01, λ = 0.001.

• DNN LSTM: for both losses: a = ReLU, d = 200, bidirectional LSTM hidden layer size h = 200, hidden layer dimension g = 800, β = 400, α = 0.005, and λ = 0.00005. To get vectors from the term LSTMs, we used max pooling.
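As referenced above, the threshold search used for the baselines, for early stopping, and for final evaluation can be sketched as a sweep over candidate thresholds on DEV1; the scores and labels below are toy values.

```python
def tune_threshold(scores, labels):
    """Pick the decision threshold that maximizes accuracy on DEV1; tuples with
    score >= threshold are then classified as true on DEV2/TEST."""
    best_t, best_acc = None, -1.0
    for t in sorted(set(scores)):
        acc = sum((s >= t) == bool(y) for s, y in zip(scores, labels)) / len(labels)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t, best_acc

dev1_scores = [0.9, 0.7, 0.2, 0.4, 0.8, 0.1]
dev1_labels = [1, 1, 0, 0, 1, 0]
print(tune_threshold(dev1_scores, dev1_labels))   # (0.7, 1.0)
```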
that pretrained GloVe and PARAGRAM embed- The first is that most terms are short, with an aver- dings perform comparably, but both are outper- age term length of 2.3 words in our training tu- formed by our ConceptNet-trained embeddings. ples. An LSTM may not be needed to capture We use the latter for the remaining experiments in long-distance properties. The second reason may this paper. be due to hyperparameter tuning. Recall that we used a greedy search for optimal hyperparameter Training Comparison. Table 3 shows the re- values; we found that models with LSTMs take sults of our models with the two loss functions and more time per epoch, more epochs to converge, three sampling strategies. We find that the binary and exhibit more hyperparameter sensitivity com- cross entropy loss with random sampling performs pared to models based on averaging. This may best across models. We note that our conclusion have contributed to inferior hyperparameter values differs from some prior work that found max or for the LSTM models. mix sampling to be better than random (Wieting et al., 2016). We suspect that this difference may We also trained the Bilinear AVG model on a stem from characteristics of the ConceptNet train- larger training set (row labeled “Bilinear AVG + ing data. It may often be the case that the max- data”). We note that the ConceptNet tuples typ- scoring negative example in the mini-batch is ac- ically contain unconjugated forms; we sought to tually a true fact, due to the generic nature of the use both conjugated and unconjugated words. We facts expressed. began with a larger training set of 300,000 tuples Table 4 shows a runtime comparison of the from ConceptNet, then augmented them to include losses and sampling strategies.4 We find random conjugated word forms as in the following exam- sampling to be orders of magnitude faster than the ple. For the tuple hsoak in a hotspring, CAUSES, others while also performing the best. get pruny skini obtained from the OMCS sentence “The effect of [soaking in a hotspring] is [get- Final Results. Our final results are shown in Ta- ting pruny skin]”, we generated an additional tuple ble 5. We show the DEV2 and TEST accuracies for hsoaking in a hotspring, CAUSES, getting pruny our baselines and for the best configuration (tuned skini. We thus created twice as many training tu- on DEV2) for each model. All models outperform ples. The results with this larger training set im- all baselines handily. Our models perform simi- proved from 91.7 to 92.5 on the test set. We re- larly, with the Bilinear models and the DNN AVG lease this final model to the research community. model all exceeding 90% on both DEV2 and TEST. We note that the strongest baseline, MaxSim, is We note that the AVG models performed a nonparametric model that requires iterating over strongly compared to those that used LSTMs for all training tuples to provide a score to new tuples. 4These experiments were performed using 2 threads on a This is a serious bottleneck for use in NLP appli- 3.40-GHz Intel Core i7-3770 CPU with 8 cores. cations that may need to issue large numbers of Bilinear AVG Bilinear LSTM DNN AVG DNN LSTM CE hinge CE hinge CE hinge CE hinge random 90 84 91 83 91 87 88 57 mix 90 83 90 87 90 78 82 63 max 86 75 65 66 61 52 56 52

          Bilinear AVG    Bilinear LSTM    DNN AVG      DNN LSTM
          CE    hinge     CE    hinge      CE    hinge  CE    hinge
random    90    84        91    83         91    87     88    57
mix       90    83        90    87         90    78     82    63
max       86    75        65    66         61    52     56    52

Table 3: Accuracies (%) on DEV2 of models trained with two loss functions (cross entropy (CE) and hinge) and three sampling strategies (random, mix, and max). The best accuracy for each model is shown in bold. Cross entropy with random sampling is best across models and is also fastest (see Table 4).

7 Generating and Scoring Novel Tuples

We now measure our model's ability to score novel tuples generated automatically from ConceptNet and Wikipedia. We first describe simple procedures to generate candidate tuples from these two datasets. We then score the tuples using our MaxSim baseline and the trained Bilinear AVG model.[5] We evaluate the highly-scoring tuples using a small-scale manual evaluation.

[5] For the results in this section, we used the Bilinear AVG model that achieved 91.7 on TEST rather than the one augmented with additional data.

The DNN AVG and Bilinear AVG models reached the highest TEST accuracies in our evaluation, though in preliminary experiments we found that the Bilinear AVG model appeared to perform better when scoring the novel tuples described below. We suspect this is because the DNN function class has more flexibility than the bilinear one. When scoring novel tuples, many of which may be highly noisy, it appears that the constrained structure of the Bilinear AVG model makes it more robust to the noise.

7.1 Generating Tuples From ConceptNet

In order to get new tuples, we automatically modify existing ConceptNet tuples. We take an existing tuple and randomly change one of the three fields (t1, t2, or R), ensuring that the result is not a tuple existing in ConceptNet. We then score these tuples using MaxSim and the Bilinear AVG model and analyze the results.
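A minimal sketch of this generate-and-score loop follows; the candidate pool, the novelty check against a small tuple list rather than all of ConceptNet, and the dummy scorer are all illustrative assumptions.

```python
import random

def novel_tuples(existing, n, score, rng=random.Random(3)):
    """Generate candidates by randomly changing one field of an existing tuple
    (borrowing the replacement from another tuple), keep only results not
    already in the given tuple set, and rank them with the model's score."""
    existing = list(existing)
    pool = set(existing)
    candidates = set()
    while len(candidates) < n:
        t1, rel, t2 = rng.choice(existing)
        donor = rng.choice(existing)
        field = rng.randrange(3)
        new = (donor[0], rel, t2) if field == 0 else \
              (t1, donor[1], t2) if field == 1 else (t1, rel, donor[2])
        if new not in pool:
            candidates.add(new)
    return sorted(candidates, key=score, reverse=True)

tuples = [("bus", "ISA", "public transportation"),
          ("soak in hotspring", "CAUSES", "get pruny skin"),
          ("soak in hotspring", "MOTIVATEDBYGOAL", "relax")]
print(novel_tuples(tuples, 3, score=lambda t: len(t[0]) + len(t[2])))  # dummy scorer
```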
7.2 Generating Tuples from Wikipedia

We also propose a simple method to extract candidate tuples from raw text. We first run the Stanford part-of-speech (POS) tagger (Toutanova et al., 2003) on the terms in our ConceptNet training tuples. We enumerate the 50 most frequent term pair tag sequences for each relation. We do limited manual filtering of the frequent tag sequences, namely removing the sequences "DT NN NN" and "DT JJ NN" for the ISA relation. We do this in order to reduce noise in the extracted tuples. To focus on finding nontrivial tuples, for each relation we retain the top 15 POS tag sequences in which t1 or t2 has at least two words.

We then run the tagger on sentences from English Wikipedia. We extract word sequence pairs corresponding to the relation POS tag sequence pairs, requiring that there be a gap of at least one word between the two terms. We then remove word sequence pairs in which one term is solely one of the following words: be, the, always, there, has, due, however. We also remove tuples containing words that are not in the vocabulary of our ConceptNet-trained embeddings. We require that one term does not include the other term. We create tuples consisting of the two terms and all possible relations that occur with the POS sequences of those two terms. Finally, we remove tuples that exactly match our ConceptNet training tuples.

We use our trained Bilinear AVG model to score these tuples. We extract term pairs that occur within the same sentence because we hope that these will have higher precision than if we were to pair together arbitrary terms. Some example tuples for t1 = bus are shown in Table 6.

t1, R, t2                            score
bus, ISA, public transportation      0.95
bus, ISA, public transit             0.90
bus, ISA, mass transit               0.79
bus, ATLOCATION, downtown area       0.98
bus, ATLOCATION, subway station      0.98
bus, ATLOCATION, city center         0.94
bus, CAPABLEOF, low cost             0.72
bus, CAPABLEOF, local service        0.65
bus, CAPABLEOF, train service        0.63

Table 6: Top Wikipedia tuples for 3 relations with t1 = bus, scored by the Bilinear AVG model.

7.3 Manual Analysis of Novel Tuples

To evaluate our models on newly generated tuples, we rank them using different models and manually score the high-ranking tuples for quality. We first randomly sampled 3000 tuples from each set of novel tuples. We do so due to the time requirements of the MaxSim baseline, which requires iterating through the entire training set for each candidate tuple. We score these sampled tuples using MaxSim and the Bilinear AVG model and rank them by their scores. The top 100 tuples under each ranking were given to an annotator who is a native English speaker. The annotator assigned a quality score to each tuple, using the same 0-4 annotation scheme as Speer et al. (2010): 0 ("Doesn't make sense"), 1 ("Not true"), 2 ("Opinion/Don't know"), 3 ("Sometimes true"), and 4 ("Generally true"). We report the average quality score across each set of 100 tuples.

tuples                                          quality
high-confidence CN tuples                       3.68
medium-confidence CN tuples                     3.14
novel CN tuples, ranked by MaxSim               2.74
novel CN tuples, ranked by Bilinear AVG         3.20
novel Wiki tuples, ranked by Bilinear AVG       2.78

Table 7: Average quality scores from manual evaluation of novel tuples. Each row corresponds to a different set of tuples. See text for details.

The results are shown in Table 7. To calibrate the scores, we also gave two samples of ConceptNet (CN) tuples to the annotator: a sample of 100 high-confidence tuples (first row) and a sample of 100 medium-confidence tuples (second row). We find the high-confidence tuples to be of high quality, recording an average of 3.68, though the medium-confidence tuples drop to 3.14.

The next two rows show the quality scores of the MaxSim baseline and the Bilinear AVG model. The latter outperforms the baseline and matches the quality of the medium-confidence ConceptNet tuples.

After nine years of primary school, students can go to the high school or to an educational institution.
  t1, R, t2                              score
  school, HASPROPERTY, educational       0.89
  school, ISA, educational institution   0.80
  school, ISA, institution               0.78
  school, HASPROPERTY, high              0.77
  high school, ISA, institution          0.71

On March 14, 1964, Ruby was convicted of murder with malice, for which he received a death sentence.
  t1, R, t2                              score
  murder, CAUSES, death                  1.00 *
  murder, CAUSES, death sentence         0.86
  murder, HASSUBEVENT, death             0.84
  murder, CAPABLEOF, death               0.51

Table 8: Top ranked tuples extracted from two example sentences and scored by the Bilinear AVG model. * = contained in ConceptNet.
Since our novel tuples are not contained in ConceptNet, this result suggests that our model can be used to add medium-confidence tuples to ConceptNet.

The novel Wikipedia tuples (the top 100 tuples ranked by the Bilinear AVG model) had a lower quality score (2.78), but this is to be expected due to the difference in domain. Since Wikipedia contains a wide variety of text, we found the novel tuples to be noisier than those from ConceptNet. Still, we are encouraged that on average the tuples are judged to be close to "sometimes true."

7.4 Text Analysis with Commonsense Tuples

We note that our method of tuple extraction and scoring could be used as an aid in applications that require sentence understanding. Two example sentences are shown in Table 8, along with the top tuples extracted and scored using our method. The tuples capture general knowledge about phrases contained in the sentence, rather than necessarily indicating what the sentence means. This procedure could provide relevant commonsense knowledge for a downstream application that seeks to understand the sentence. We leave further investigation of this idea to future work.

8 Conclusion

We proposed methods to augment curated commonsense resources using techniques from knowledge base completion. By scoring novel tuples, we showed how we can increase the applicability of the knowledge contained in ConceptNet. In future work, we will explore how to use our model to improve downstream NLP tasks, and consider applying our methods to other knowledge bases. We have released all of our resources (code, data, and trained models) to the research community.[6]

[6] Available at http://ttic.uchicago.edu/~kgimpel/commonsense.html.

Acknowledgments

We thank the anonymous reviewers, John Wieting, and Luke Zettlemoyer.

References

Basant Agarwal, Namita Mittal, Pooja Bansal, and Sonal Garg. 2015. Sentiment analysis using common-sense and context information. Computational Intelligence and Neuroscience.
Gabor Angeli and Christopher Manning. 2013. Philosophers are mortal: Inferring the truth of unseen facts. In Proc. of CoNLL.
Gabor Angeli and Christopher D. Manning. 2014. NaturalLI: Natural logic inference for common sense reasoning. In Proc. of EMNLP.
Collin F. Baker, Charles J. Fillmore, and John B. Lowe. 1998. The Berkeley FrameNet project. In Proc. of COLING.
Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2014. Tailoring continuous word representations for dependency parsing. In Proc. of ACL.
Kurt Bollacker, Colin Evans, Praveen Paritosh, Tim Sturge, and Jamie Taylor. 2008. Freebase: a collaboratively created graph database for structuring human knowledge. In Proc. of the ACM SIGMOD International Conference on Management of Data.
Antoine Bordes, Nicolas Usunier, Alberto Garcia-Duran, Jason Weston, and Oksana Yakhnenko. 2013. Translating embeddings for modeling multi-relational data. In Advances in NIPS.
Antoine Bordes, Sumit Chopra, and Jason Weston. 2014a. Question answering with subgraph embeddings. In Proc. of EMNLP.
Antoine Bordes, Xavier Glorot, Jason Weston, and Yoshua Bengio. 2014b. A semantic matching energy function for learning with multi-relational data. Machine Learning, 94(2).
Erik Cambria, Daniel Olsher, and Kenneth Kwok. 2012. Sentic activation: A two-level affective common sense reasoning framework. In Proc. of AAAI.
Andrew Carlson, Justin Betteridge, Bryan Kisiel, Burr Settles, Estevam R. Hruschka Jr., and Tom M. Mitchell. 2010. Toward an architecture for never-ending language learning. In Proc. of AAAI.
Luigi Di Caro, Alice Ruggeri, Loredana Cupi, and Guido Boella. 2015. Common-sense knowledge for natural language understanding: Experiments in unsupervised and supervised settings. In Proc. of AI*IA.
John Duchi, Elad Hazan, and Yoram Singer. 2011. Adaptive subgradient methods for online learning and stochastic optimization. JMLR.
Anthony Fader, Stephen Soderland, and Oren Etzioni. 2011. Identifying relations for open information extraction. In Proc. of EMNLP.
Alberto García-Durán, Antoine Bordes, and Nicolas Usunier. 2014. Effective blending of two and three-way interactions for modeling multi-relational data. In Machine Learning and Knowledge Discovery in Databases.
Matt Gardner, Partha Pratim Talukdar, Jayant Krishnamurthy, and Tom Mitchell. 2014. Incorporating vector space similarity in random walk inference over knowledge bases. In Proc. of EMNLP.
Jonathan Gordon and Lenhart K. Schubert. 2012. Using textual patterns to learn expected event frequencies. In Proc. of the Joint Workshop on Automatic Knowledge Base Construction and Web-scale Knowledge Extraction.
Jonathan Gordon and Benjamin Van Durme. 2013. Reporting bias and knowledge acquisition. In Proc. of the Workshop on Automated Knowledge Base Construction.
Jonathan Gordon, Benjamin Van Durme, and Lenhart K. Schubert. 2010. Learning from the web: Extracting general world knowledge from noisy text. In Collaboratively-Built Knowledge Sources and AI.
Jonathan Gordon. 2014. Inferential Commonsense Knowledge from Text. Ph.D. thesis, University of Rochester.
Kelvin Gu, John Miller, and Percy Liang. 2015. Traversing knowledge graphs in vector space. In Proc. of EMNLP.
Felix Hill, Roi Reichart, and Anna Korhonen. 2015. SimLex-999: Evaluating semantic models with (genuine) similarity estimation. Computational Linguistics.
Sepp Hochreiter and Jürgen Schmidhuber. 1997. Long short-term memory. Neural Computation.
Po-Sen Huang, Xiaodong He, Jianfeng Gao, Li Deng, Alex Acero, and Larry Heck. 2013. Learning deep structured semantic models for web search using clickthrough data. In Proc. of CIKM.
Rodolphe Jenatton, Nicolas L. Roux, Antoine Bordes, and Guillaume R. Obozinski. 2012. A latent factor model for highly multi-relational data. In Advances in NIPS.
Dan Klein and Christopher D. Manning. 2003. Accurate unlexicalized parsing. In Proc. of ACL.
Ni Lao, Tom Mitchell, and William W. Cohen. 2011. Random walk inference and learning in a large scale knowledge base. In Proc. of EMNLP.
Douglas B. Lenat and Ramanathan V. Guha. 1989. Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project. Addison-Wesley Longman Publishing Co., Inc.
Henry Lieberman, Alexander Faaborg, Waseem Daher, and José Espinosa. 2005. How to wreck a nice beach you sing calm incense. In Proc. of IUI.
Wang Ling, Chris Dyer, Alan W. Black, Isabel Trancoso, Ramon Fermandez, Silvio Amir, Luis Marujo, and Tiago Luis. 2015. Finding function in form: Compositional character models for open vocabulary word representation. In Proc. of EMNLP.
Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S. Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in NIPS.
George A. Miller. 1995. WordNet: a lexical database for English. Communications of the ACM, 38(11).
Mike Mintz, Steven Bills, Rion Snow, and Daniel Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proc. of ACL.
Arvind Neelakantan and Ming-Wei Chang. 2015. Inferring missing entity type instances for knowledge base completion: New dataset and methods. In Proc. of NAACL.
Arvind Neelakantan, Benjamin Roth, and Andrew McCallum. 2015. Compositional vector space models for knowledge base completion. In Proc. of ACL.
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2011. A three-way model for collective learning on multi-relational data. In Proc. of ICML.
Maximilian Nickel, Volker Tresp, and Hans-Peter Kriegel. 2012. Factorizing YAGO: Scalable machine learning for linked data. In Proc. of WWW.
Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global vectors for word representation. In Proc. of EMNLP.
Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M. Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proc. of NAACL-HLT.
Richard Socher, Danqi Chen, Christopher D. Manning, and Andrew Ng. 2013. Reasoning with neural tensor networks for knowledge base completion. In Advances in NIPS.
Robert Speer and Catherine Havasi. 2012. Representing general relational knowledge in ConceptNet 5. In Proc. of LREC.
Robert Speer, Catherine Havasi, and Henry Lieberman. 2008. AnalogySpace: Reducing the dimensionality of common sense knowledge. In Proc. of AAAI.
Robert Speer, Catherine Havasi, and Harshit Surana. 2010. Using verbosity: Common sense data from games with a purpose. In Proc. of FLAIRS.
Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proc. of NAACL-HLT.
Kristina Toutanova, Danqi Chen, Patrick Pantel, Hoifung Poon, Pallavi Choudhury, and Michael Gamon. 2015. Representing text for joint embedding of text and knowledge bases. In Proc. of EMNLP.
Luis von Ahn, Mihir Kedia, and Manuel Blum. 2006. Verbosity: a game for collecting common-sense facts. In Proc. of CHI.
Robert West, Evgeniy Gabrilovich, Kevin Murphy, Shaohua Sun, Rahul Gupta, and Dekang Lin. 2014. Knowledge base completion via search-based question answering. In Proc. of WWW.
John Wieting, Mohit Bansal, Kevin Gimpel, Karen Livescu, and Dan Roth. 2015. From paraphrase database to compositional paraphrase model and back. Transactions of the ACL.
John Wieting, Mohit Bansal, Kevin Gimpel, and Karen Livescu. 2016. Towards universal paraphrastic sentence embeddings. In Proc. of ICLR.
Bishan Yang, Wen-tau Yih, Xiaodong He, Jianfeng Gao, and Li Deng. 2014. Embedding entities and relations for learning and inference in knowledge bases. arXiv preprint arXiv:1412.6575.
Limin Yao, Sebastian Riedel, and Andrew McCallum. 2013. Universal schema for entity type prediction. In Proc. of the Workshop on Automated Knowledge Base Construction.