Discovering Implicit Knowledge with Unary Relations

Michael Glass
IBM Research AI, Knowledge Induction and Reasoning
[email protected]

Alfio Gliozzo
IBM Research AI, Knowledge Induction and Reasoning
[email protected]

Abstract

State-of-the-art relation extraction approaches are only able to recognize relationships between mentions of entity arguments stated explicitly in the text and typically localized to the same sentence. However, the vast majority of relations are either implicit or not sententially localized. This is a major problem for Knowledge Base Population, severely limiting recall. In this paper we propose a new methodology to identify relations between two entities, consisting of detecting a very large number of unary relations and using them to infer the missing entities. We describe a deep learning architecture able to learn thousands of such relations very efficiently by using a common deep learning based representation. Our approach largely outperforms state-of-the-art relation extraction technology on a newly introduced web-scale knowledge base population benchmark, which we release to the research community.

1 Introduction

Knowledge Base Population (KBP) from text is the problem of extracting relations between entities with respect to a given schema, usually defined by a set of types and relations. The facts added to the KB are triples, consisting of two entities connected by a relation. Although providing explicit provenance for the triples is often a subgoal in KBP, we focus on the case where correct triples are gathered from text without necessarily annotating any particular text with a relation. Humans perform very well on the task of understanding relations in text. For example, if the target relation is presidentOf, anyone is able to detect an occurrence of this relation between the entities TRUMP and UNITED STATES in both the sentences "Trump issued a presidential memorandum for the United States" and "The Houston Astros will visit President Donald Trump and the White House on Monday". However, the first example expresses an explicit relation between the two entities, while the second states the same relation implicitly and requires some background knowledge and inference to identify it properly. In fact, the entity UNITED STATES is not even mentioned explicitly in the text, and it is up to the reader to recall that US presidents live in the White House, and therefore people visiting it are visiting the US president.

Very often, relations expressed in text are implicit. This is reflected in the low recall of current KBP relation extraction methods, which are mostly based on recognizing lexical-syntactic connections between two entities within the same sentence. State-of-the-art systems achieve very low performance, close to 16.6% F1, as shown in the latest TAC-KBP evaluation campaigns and in the open KBP evaluation benchmark¹. Existing approaches to dealing with implicit information, such as textual entailment, depend on unsolved problems like inducing entailment rules from text.

¹ https://kbpo.stanford.edu

In this paper, we address the problem of identifying implicit relations in text using a radically different approach: reducing the problem of identifying binary relations to a much larger set of simpler unary relation extraction problems. For example, to build a Knowledge Base (KB) about presidents in the G8 countries, the presidentOf relation can be expanded to presidentOf:UNITED STATES, presidentOf:GERMANY, presidentOf:JAPAN, and so on.
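To make the decomposition concrete, here is a minimal sketch (ours, not from the paper) of expanding binary triples into unary relation labels; the entities and relation names are illustrative.

```python
from collections import defaultdict

def to_unary(triples):
    """Expand binary triples into unary relation labels by fixing the
    second argument: (s, r, o) becomes the label 'r:o' on entity s."""
    labels = defaultdict(set)
    for subj, rel, obj in triples:
        labels[subj].add(f"{rel}:{obj}")   # e.g. 'presidentOf:GERMANY'
    return labels

# Illustrative triples, not taken from the paper's benchmark.
labels = to_unary([("ANGELA MERKEL", "presidentOf", "GERMANY"),
                   ("DONALD TRUMP", "presidentOf", "UNITED STATES")])
assert labels["ANGELA MERKEL"] == {"presidentOf:GERMANY"}
```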

For all these unary relations, we train a multi-class (and in other cases, multi-label) classifier from all the available training data. This classifier takes textual evidence where only one entity is identified (e.g. ANGELA MERKEL) and predicts a confidence score for each unary relation. In this way, ANGELA MERKEL will be assigned to the unary relation presidentOf:GERMANY, which in turn generates the triple ⟨ANGELA MERKEL presidentOf GERMANY⟩.

To implement this idea, we explore the use of knowledge-level supervision, sometimes called distant supervision, to train a deep learning based approach. The training data in this approach is a knowledge base and an unannotated corpus. A pre-existing Entity Detection and Linking system first identifies and links mentions of entities in the corpus. For each entity, the system gathers its context set: the contexts (e.g. sentences or token windows) where it is mentioned. The context set forms the textual evidence for a multi-class, multi-label deep network. The final layer of the network is a vector of unary relation predictions, and the intermediate layers are shared. This architecture allows us to efficiently train thousands of unary relations while reusing the feature representations in the intermediate layers across relations as a form of transfer learning. The predictions of this network represent the probability of the input entity belonging to each unary relation.
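The following sketch shows, under assumed input formats, how this distant supervision setup could be assembled: sentences from an entity-linked corpus are grouped into context sets and paired with multi-label targets. The function and argument names are ours, not the authors' data structures.

```python
def build_training_examples(linked_sentences, entity_labels, relations):
    """linked_sentences: (sentence_text, entities_mentioned) pairs, as an
    Entity Detection and Linking step might produce (format assumed).
    entity_labels: entity -> set of unary relation labels from the KB.
    Returns (entity, context set, multi-label target vector) triples."""
    rel_index = {r: i for i, r in enumerate(relations)}
    context_sets = {}
    for text, entities in linked_sentences:
        for entity in entities:
            context_sets.setdefault(entity, []).append(text)
    examples = []
    for entity, contexts in context_sets.items():
        target = [0.0] * len(relations)            # absent label = negative
        for label in entity_labels.get(entity, ()):
            if label in rel_index:
                target[rel_index[label]] = 1.0
        examples.append((entity, contexts, target))
    return examples
```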
To demonstrate the effectiveness of our approach, we developed a new KBP benchmark consisting of extracting unseen DBpedia triples from the text of a web crawl, using a portion of DBpedia to train the model. As part of the contributions of this paper, we release the benchmark to the research community, providing the software needed to generate it from Common Crawl and DBpedia as an open source project².

² https://github.com/IBM/cc-dbp

As a baseline, we adapt a state-of-the-art deep learning based approach for relation extraction (Lin et al., 2016). Our experiments clearly show that using unary relations to generate new triples greatly complements traditional binary approaches. An analysis of the data shows that our approach is able to capture implicit information from textual mentions and to highlight the reasons why the assignments have been made.

The paper is structured as follows. In section 2 we describe the state of the art in distantly supervised KBP methodologies, with a focus on knowledge induction applications. Section 3 introduces the use of unary relations for KBP, and section 4 outlines the process for producing and training them. Section 5 describes a deep learning architecture able to recognize unary relations from textual evidence. In section 6 we describe the benchmark for evaluation. Section 7 provides an extensive evaluation of unary relations and a saliency map exploration of what the deep learning model has learned. Section 8 concludes the paper, highlighting research directions for future work.

2 Related Work

Binary relation extraction using distant supervision has a long history (Surdeanu et al., 2012; Mintz et al., 2009). Mentions of entities from the knowledge base are located in text. When two entities are mentioned in the same sentence, that sentence becomes part of the evidence for the relation (if any) between those entities. The set of sentences mentioning an entity pair is used in a machine learning model to predict how the entities are related, if at all.

Deep learning has been applied to binary relation extraction. CNN-based (Zeng et al., 2014), LSTM-based (Xu et al., 2015), attention-based (Wang et al., 2016) and compositional embedding based (Gormley et al., 2015) models have been trained successfully using a sentence as the unit of context. Recently, cross-sentence approaches have been explored by building paths connecting the two identified arguments through related entities (Peng et al., 2017; Zeng et al., 2016). These approaches are limited by requiring both entities to be mentioned in a textual context. The context aggregation approaches of state-of-the-art neural models, max-pooling (Zeng et al., 2015) and attention (Lin et al., 2016), do not consider that different contexts may contribute to the prediction in different ways. Instead, the context pooling only determines the degree of a sentence's contribution to the relation prediction.

TAC-KBP is a long-running challenge for knowledge base population. Effective systems in these competitions combine many approaches, such as rule-based relation extraction, directly supervised linear and neural network extractors, distantly supervised neural network models (Zhang et al., 2016) and tensor factorization approaches to relation prediction.

Compositional Universal Schema is an approach based on combining the matrix factorization approach of universal schema (Riedel et al., 2013) with representations of textual relations produced by an LSTM (Chang et al., 2016). The rows of the universal schema matrix are entity pairs, and a row will only be supported by a textual relation if the pair occurs in a sentence together.

Other approaches to relational knowledge induction have used distributed representations for words or entities, with a model predicting the relation between two terms based on their semantic vectors (Drozd et al., 2016). This enables the discovery of relations between terms that do not co-occur in the same sentence. However, the distributed representation of the entities is developed from the corpus without any ability to focus on the relations of interest. One example of such work is LexNET, which developed a model using the distributional word vectors of two terms to predict lexical relations between them (DSh). The term vectors are concatenated and used as input to a single hidden layer neural network. Unlike our approach to unary relations, the term vectors are produced by a standard relation-independent model of the term's contexts, such as word2vec (Mikolov et al., 2013).

Unary relations can be considered similar to types. Work on ontology population has considered the general distribution of a term in text to predict its type (Cimiano and Völker, 2005). Like the method of DSh, this does not customize the representation of an entity to a set of target relations.

3 Unary vs Binary Relations

The basic idea presented in this paper is that, in many cases, relation extraction problems can be reduced to sets of simpler, inter-related unary relation extraction problems. This is possible by providing a specific value for one of the two arguments, transforming the relation into a set of categories. For example, the livesIn relation between persons and countries can be decomposed into 195 relations (one per country), including livesIn:UNITED STATES, livesIn:CANADA, and so on. The argument that is combined with the binary relation to produce the unary relation is called the fixed argument, while the other argument is the filler argument. The KB extension of a unary relation is the set of all filler arguments in the KB, and the corpus extension is the subset of the KB extension that occurs in the corpus.

A requisite for a unary relation is that in the training KB there should exist many triples that share a relation and a particular entity as one argument, thus providing enough training data for each unary classifier. Therefore, in the example above, we will likely not be able to generate predicates for all 195 countries, because some of them will either not occur at all in the training data or will be very infrequent. However, even in cases where arguments tend to follow a long tail distribution, it makes sense to generate unary predicates for the most frequent ones.

[Figure 1: Minimum Corpus Extension to Number of Unary Relations — a log-log plot of the number of unary relations (1 to 1,000,000) against the corpus extension threshold (10,000 down to 1)]

Figure 1 shows the relationship between the threshold on the size of the corpus extension of a unary relation and the number of different unary relations that can be found in our dataset. The relationship is approximately linear on a log-log scale. There are 26 unary relations with a corpus extension of at least 10,000. These relations include:

• hasLocation:UNITED STATES
• background:GROUP OR BAND
• kingdom:ANIMAL
• language:ENGLISH LANGUAGE

Lowering the threshold to 100 yields 8,711 unary relations, and there are close to 1M unary relations with more than 10 entities.

In a traditional binary KBP task, a triple has a relevant context set if the two entities occur together at least once in the corpus, where the notion of 'together' is typically intra-sentential (within a single sentence). In KBP based on unary relations, a triple ⟨FILLER rel FIXED⟩ has a relevant context set if the unary relation rel:FIXED has the filler argument in its corpus extension, i.e. the filler occurs in the corpus.
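A minimal sketch of how candidate unary relations could be enumerated by corpus extension, assuming the KB triples and the set of corpus-mentioned entities are available as plain Python collections; it treats each (filler, relation, fixed) triple as distinct.

```python
from collections import Counter

def unary_relations_by_corpus_extension(triples, mentioned, threshold=100):
    """For each candidate unary relation rel:FIXED, count how many of its
    KB filler arguments actually occur in the corpus (its corpus
    extension), and keep the candidates at or above the threshold."""
    extension = Counter()
    for filler, rel, fixed in triples:
        if filler in mentioned:               # filler occurs in the corpus
            extension[f"{rel}:{fixed}"] += 1
    return {name: n for name, n in extension.items() if n >= threshold}
```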

4 Training and Using Unary Relation Classifiers

A unary relation extraction system is a multi-class, multi-label classifier that takes an entity as input and returns its probability as a slot filler for each relation. In this paper, we represent each entity by the set of contexts (sentences, in our experiments) where its mentions have been located; we call these context sets.

The process of training and applying a KBP system using unary relations is outlined step by step below.

• Build a set of unary relations that have a corpus extension above some threshold.
• Locate the entities from the knowledge graph in text.
• Create a context set for each entity from all the sentences that mention the entity.
• Label the context set with the unary relations (if any) for the entity. The negatives for each unary relation are all the entities for which that unary relation is not true.
• Train a model to determine the unary relations for any given entity from its context set.
• Apply the model to all the entities in the corpus, including those that do not exist in the knowledge graph.
• Convert the extracted unary relations back to binary relations and add them to the knowledge graph as new edges (a sketch of this step follows below). Any new entities are added to the knowledge graph as new nodes.

Although unary relations could also be viewed as types, we argue that it is preferable to consider them as relations. For example, if the unary relation livesIn:UNITED STATES is represented as the type US-PERSON, it has no structured relationship to the type US-COMPANY (basedIn:UNITED STATES). So the inference rule that companies tend to employ people who live in the countries they are based in (⟨company employs person⟩ ∧ ⟨company based in country⟩ ⇒ ⟨person lives in country⟩) is not representable.

Both approaches are limited in different important respects. KBP with unary relations can only produce triples when fixing a relation and argument provides a relatively large corpus extension. Triples such as ⟨BARACK OBAMA spouse MICHELLE OBAMA⟩ cannot be extracted in this way, since neither Barack nor Michelle Obama has a large set of spouses. The limitation of binary relation extraction is that the arguments must occur together; but for many triples, such as those relating to a person's occupation, a film's genre or a company's product type, the second argument is often not given explicitly.

In both cases, a relevant context set is a necessary but not sufficient condition for extracting the triple from text, since the context set may not express (even implicitly) the relation. Figure 2 shows the number of triples in our dataset that have a relevant context set with unary relations exclusively, with binary relations exclusively, and with both. The corpus extension threshold for the unary relations is 100.

[Figure 2: Triples with Relevant Context Sets Per Relation Style — uniquely unary: 566,990; uniquely binary: 199,515; both: 2,783,357]
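As an illustration of the conversion step referenced in the last bullet above, a hypothetical sketch that turns one entity's unary predictions back into binary triples; the names and the confidence threshold are ours.

```python
def predictions_to_triples(entity, relation_names, scores, threshold=0.5):
    """Convert unary predictions for one entity back to binary triples.
    `scores` is the network's per-relation confidence vector."""
    triples = []
    for name, score in zip(relation_names, scores):
        if score >= threshold:
            rel, fixed = name.split(":", 1)    # e.g. 'presidentOf:GERMANY'
            triples.append((entity, rel, fixed, score))
    return triples

# predictions_to_triples("ANGELA MERKEL", ["presidentOf:GERMANY"], [0.97])
# -> [("ANGELA MERKEL", "presidentOf", "GERMANY", 0.97)]
```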

[Figure 3: Unary Relational Knowledge Induction Architecture Overview — a corpus and domain knowledge feed Entity Detection and Linking, entity context sets are assembled, and a unary deep network produces new knowledge triples]

A closer look at the generated training data provides insight into the value of unary relations for distant supervision. Below are example binary contexts relating an organization to a country. Some contexts where two entities occur together (relevant contexts) will imply a relation between them, while others will not. In the first context, Philippines and Eagle Cement are not textually related, while in the second, Dyna Management Services is explicitly stated to be located in Bermuda.

    The company competes with Holcim Philippines, the local unit of Swiss company LafargeHolcim, and Eagle Cement, a company backed by diversified local conglomerate San Miguel which is aggressively expanding into infrastructure.

    ... said Richmond, who is vice president of Dyna Management Services, a Bermuda-based insurance management company.

On the other hand, there are many triples that have no relevant context set using binary extraction, but can be supported with unary extraction. JB Hi-Fi is a company located in Australia (unary relation hasLocation:AUSTRALIA). Although "JB Hi-Fi" never occurs together with "Australia" in our corpus, we can gather implicit textual evidence for this relation from its unary relation context sets. Furthermore, even in cases where there is a relevant binary context set, the contexts may not provide enough (or any) textual support for the relation, while the unary context sets might.

    Woolworths, Coles owner Wesfarmers, JB Hi-Fi and Harvey Norman were also trading higher.

    JB Hi-Fi in talks to buy The Good Guys

    In equities news, protective glove and condom maker Ansell and JB Hi-Fi are slated to post half year results, while Bitcoin group is expected to list on ASX.

The key indicators are "ASX", which is an Australian stock exchange, and the other Australian businesses mentioned, such as Woolworths, The Good Guys, Ansell and Bitcoin group. There is no strict logical entailment indicating that JB Hi-Fi is located in Australia; instead there is textual evidence that makes it probable.

5 Architecture for Unary Relations

Figure 3 illustrates the overall architecture. First, an Entity Detection and Linking system identifies occurrences in text of entities that are or should be in the knowledge base. Second, the contexts (here we use a sentence as the unit of context) for each entity are gathered into an entity context set. This context set provides all the sentences that contain a mention of a particular entity and is the textual evidence for which triples are true for the entity. Third, the context set is fed into a deep neural network, shown in Figure 4. The output of the network is a set of predicted triples that can be added to the knowledge base.

Figure 4 shows the architecture of the deep learning model for unary relation based KBP. From an entity context set, each sentence is projected into a vector space using a piecewise convolutional neural network (Zeng et al., 2015). The sentence vectors are then aggregated using a Network-in-Network layer (NiN) (Lin et al., 2013).

The sentence-to-vector portion of the neural architecture begins by looking up the words in a word embedding table. The word embeddings are initialized with word2vec (Mikolov et al., 2013) and updated during training. The position of each word relative to the entity is also looked up in a position embedding table. Each word vector is concatenated with its position vector to produce the word representation vector. A piecewise max-pooled convolution (PCNN) is applied over the resulting sentence matrix, with the pieces defined by the position of the entity argument: before the entity, the entity, and after the entity. A fully connected layer then produces the sentence vector representation.
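A minimal PyTorch sketch of the sentence-to-vector stage just described; the paper does not name a framework, so the layer and argument names are ours, and the mask-based piecewise pooling is one plausible realization.

```python
import torch
import torch.nn as nn

class SentenceToVector(nn.Module):
    """Word and position embeddings are concatenated, a width-3 convolution
    is applied, and max-pooling is done piecewise over the three segments
    (before the entity, the entity, after the entity). Dimensions follow
    Table 1."""

    def __init__(self, vocab_size, n_positions, word_dim=50, pos_dim=5,
                 filters=1000, width=3, sent_dim=400):
        super().__init__()
        self.word_emb = nn.Embedding(vocab_size, word_dim)  # word2vec-initialized in the paper
        self.pos_emb = nn.Embedding(n_positions, pos_dim)   # relative positions, offset to be >= 0
        self.conv = nn.Conv1d(word_dim + pos_dim, filters, width, padding=width // 2)
        self.fc = nn.Linear(3 * filters, sent_dim)

    def forward(self, tokens, positions, piece_masks):
        # tokens, positions: (batch, seq); piece_masks: (batch, 3, seq) floats in {0, 1}
        x = torch.cat([self.word_emb(tokens), self.pos_emb(positions)], dim=-1)
        h = torch.tanh(self.conv(x.transpose(1, 2)))        # (batch, filters, seq)
        pooled = [(h + (piece_masks[:, p:p + 1] - 1.0) * 1e9).max(dim=2).values
                  for p in range(3)]                        # piecewise max-pooling
        return torch.tanh(self.fc(torch.cat(pooled, dim=1)))  # sentence vector
```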

[Figure 4: Deep Learning Architecture for Unary Relations — a sentence such as "…co-founded Allen & Shariff in 1993…" (relative positions −1 0 0 0 1 2) passes through the sentence-to-vector network; sentence vectors are then aggregated]

This is a refinement of the Neural Relation Extraction (NRE) (Lin et al., 2016) approach to sentence-to-vector mapping. The presence of only a single argument simply reduces the two position encoding vectors to one; the fully connected layer over the PCNN is an addition.

The sentence vector aggregation portion of the neural architecture uses a Network-in-Network over the sentence vectors. Network-in-Network (NiN) (Lin et al., 2013) is an approach applying 1x1 CNNs, originally in image processing; the width-1 CNN we use for mention aggregation is an adaptation to a set of sentence vectors. The result is max-pooled and put through a fully connected layer to produce the score for each unary relation. Unlike the maximum aggregation used in many previous works on binary relation extraction (Riedel et al., 2010; Zeng et al., 2015), the evidence from many contexts can be combined to produce a prediction. Unlike the attention-based pooling also used previously for binary relation extraction (Lin et al., 2016), the different contexts can contribute to different aspects, not just to different degrees. For example, a prediction that a city is in France might depend on the conjunction of several facets of textual evidence linking the city to the French language, the Euro, and Norman history.

In contrast, the common maximum aggregation approach is to move the final prediction layer into the sentence-to-vector modules and then aggregate by max-pooling the sentence-level predictions. This aggregation strategy means that only the sentence most strongly indicating the relation contributes to its prediction. We measured the impact of the Network-in-Network sentence vector aggregation approach on the validation set. Relative to Network-in-Network aggregation and using the same hyperparameters, a maximum aggregation strategy gets two percent lower precision at one thousand: 66.55% compared to 68.49%.

There are 790 unary relations with at least one thousand positives in our benchmark. To speed the training, we divided these into eight sets of approximately 100 relations each and trained the models for them in parallel. Unary relations based on the same binary relation were grouped together to share useful learned representations. The resulting split also put similar numbers of positive examples in the training set for each model.

Training continued until no improvement was found on the validation set; this occurred at between five and nine epochs. All eight models were trained with the hyperparameters in Table 1. Dropout was applied on the penultimate layer, the max-pooled NiN.

Based on validation set performance, we found that when larger numbers of relations are trained together, the NiN filters and sentence vector dimension must be increased.
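A companion PyTorch sketch of the NiN aggregation described above, again with names and details of our own choosing; it scores unary relations from the set of sentence vectors produced by the previous sketch.

```python
import torch
import torch.nn as nn

class NiNAggregator(nn.Module):
    """Width-1 convolution over the set of sentence vectors, max-pooled
    across the context set, then a fully connected layer scoring every
    unary relation. Unlike max-pooling per-sentence predictions, different
    sentences can supply different facets of evidence before the final
    decision."""

    def __init__(self, sent_dim=400, nin_filters=400, n_relations=100):
        super().__init__()
        self.nin = nn.Conv1d(sent_dim, nin_filters, kernel_size=1)
        self.drop = nn.Dropout(0.5)          # penultimate layer, as in the paper
        self.out = nn.Linear(nin_filters, n_relations)

    def forward(self, sent_vectors):
        # sent_vectors: (batch, n_sentences, sent_dim), from SentenceToVector
        h = torch.relu(self.nin(sent_vectors.transpose(1, 2)))
        pooled = h.max(dim=2).values         # combine evidence across contexts
        return self.out(self.drop(pooled))   # per-relation logits (sigmoid for probabilities)
```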

Hyperparameter       Value
word embedding       50
position embedding   5
PCNN filters         1000
PCNN filter width    3
sentence vector      400
NiN filters          400
dropout              0.5
learning rate        0.003
decay multiplier     0.95
batch size           16
optimizer            SGD

Table 1: Hyperparameters used

Of all the hyperparameters, training time is most sensitive to the number of PCNN filters, since these are applied to every sentence in a context set. We found major improvements moving from the 230 filters used for NRE to 1000 filters, but little or no improvement from increases beyond that.

6 Benchmark

Large KBs and corpora are needed to train KBP systems in order to collect enough mentions for each relation. However, most of the existing Knowledge Base Population tasks are either small in size (e.g. NYT-FB (Riedel et al., 2010) and TAC-KBP³) or focused on title-oriented documents, which are not available for most domains (e.g. WikiReading (Hewlett et al., 2016)). Therefore, we needed to create a new web-scale knowledge base population benchmark, which we call CC-DBP⁴. It combines the text of Common Crawl⁵ with the triples from 298 frequent relations in DBpedia (Auer et al., 2007). Mentions of DBpedia entities are located in text by gazetteer matching of the preferred label. We use the October 2017 Common Crawl and the most recent (2016-10) version of DBpedia, in both cases limited to English.

³ https://tac.nist.gov/
⁴ https://github.com/IBM/cc-dbp
⁵ http://commoncrawl.org

We divided the entity pairs into training, validation and test sets with an 80%, 10%, 10% split. All triples for a given entity pair are in one of the three splits. This split increases the challenge, since many relations could be used to predict others (such as birthPlace implying nationality). The task is to generate new triples for each relation and rank them according to their probability. We show the precision / recall curves and focus on the relative area under the curves to evaluate the quality of different systems.
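A sketch of how the 80%/10%/10% split by entity pair could be made deterministic; the hash-based assignment is our assumption, not the authors' stated procedure.

```python
import hashlib

def split_for(subject, obj):
    """Assign all triples of an entity pair to the same split, 80/10/10."""
    digest = hashlib.md5(f"{subject}|{obj}".encode("utf-8")).digest()
    bucket = digest[0] % 10               # 0-9, stable across runs
    if bucket < 8:
        return "train"
    return "validation" if bucket == 8 else "test"
```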
Figure 5 shows the distribution of triples with relevant unary context sets per relation type. The relations giving rise to the most triples are high-level relations such as hasLocation, a super-relation comprising the sub-relations country, state, city, headquarter, hometown, birthPlace, deathPlace, and others. Interestingly, there are 165 years with enough people born in them to produce unary relations. While these will all have at least 100 relevant context sets, the context sets typically do not contain textual evidence for any birth year. Perhaps most importantly, there are a large number of diverse relations that are suitable for a unary KBP approach. This indicates the broad applicability of our method.

To test what improvement can be found by incorporating unary relations into KBP, we combine the output of a state-of-the-art binary relation extraction system with our unary relation extraction system. For binary relation extraction, we use a slightly altered version of the PCNN model from NRE (Lin et al., 2016), with the addition of a fully connected layer for each sentence representation before the max-pooled aggregation over relation predictions. We found this refinement to perform slightly better on NYT-FB (Riedel et al., 2010), a standard dataset for distantly supervised relation extraction.

The binary and unary systems are trained from their relevant context sets to predict the triples in train. The validation set is used to tune hyperparameters and choose a stopping point for training. We combine the output of the two systems by, for each triple, taking the highest confidence from either system.
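The combination rule just described is simple enough to state directly; a sketch, with dict-based score tables assumed.

```python
def combine_systems(binary_scores, unary_scores):
    """For each candidate triple, keep the higher confidence of the two
    systems. Scores map (subject, relation, object) to a probability."""
    combined = dict(binary_scores)
    for triple, score in unary_scores.items():
        combined[triple] = max(score, combined.get(triple, 0.0))
    # Rank triples by descending confidence for precision/recall curves.
    return sorted(combined.items(), key=lambda kv: -kv[1])
```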

[Figure 5: Distribution of Unary Relation Counts — relation names sized by triple count, including hasLocation, isClassifiedBy, birthPlace, almaMater, formerTeam, deathPlace, publisher, activeYearsStartYear, activeYearsEndYear, industry, location, draftYear, areaCode, hometown, leaderTitle, battle, instrument, award, kingdom:ANIMAL, computingPlatform, literaryGenre]

7 Evaluation

Figure 6 shows the precision-recall curves for unary only, binary only, and the combined system. The unary and binary systems alone achieve similar performance, but they are effective on very different triples. This is shown by the large gains from combining these complementary approaches. For example, at 0.5 precision, the combined approach has more than double the recall of binary alone (15,750 vs 7,400), representing over 100% relative improvement.

The recall is given as a triple count rather than a percentage. Traditional attempts to measure the recall of KBP systems use the set of all triples explicitly stated in text as the denominator of recall. This is unsuitable for evaluating our approach, because the system is able to make probabilistic predictions based on implicit and partial textual evidence, thus producing correct triples outside the classic recall basis.

[Figure 6: Precision Recall Curves for KBP — precision (0 to 0.9) against recall count (0 to 50,000) for Unary, Binary, and Binary & Unary]

7.1 Saliency Maps

To gain some insight into how the unary KBP system is able to extract implicit knowledge, we turn to saliency maps (Simonyan et al., 2014). By finding the derivative of a particular prediction with respect to the inputs, we can discover a locally linear approximation of how much each part of the input contributed (Zeiler and Fergus, 2014).
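A sketch of this saliency computation, assuming a model composed like the earlier sketches; the forward_from_embeddings hook, which runs the network from precomputed word embeddings, is hypothetical.

```python
import torch

def token_saliency(model, tokens, positions, piece_masks, relation_idx):
    """Differentiate one unary relation's score with respect to the word
    embeddings and reduce it to one magnitude per token."""
    emb = model.word_emb(tokens)                   # (batch, seq, word_dim)
    emb.retain_grad()                              # keep the gradient of a non-leaf tensor
    scores = model.forward_from_embeddings(emb, positions, piece_masks)
    scores[:, relation_idx].sum().backward()       # derivative of one prediction
    return emb.grad.norm(dim=-1)                   # saliency per token
```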

Cold Lake Provincial Park (Alberta, Canada) is mentioned in two sentences in the Common Crawl English text. The unary relational knowledge induction system predicts hasLocation:CANADA with the highest confidence (over 90%). Both sentences contribute to the decision. We see high weight from words including "cold", "provincial" and "french". A handful of countries have "provincial parks", including Argentina, Belgium, South Africa and Canada. Belgium and Canada have substantial French speaking populations, and Canada has by far the coldest climate.

    • located within 10 minutes of cold lake with quick access to OOV ridge ski hill , cold lake provincial park and french bay .

    • welcome to cold lake provincial park on average 4.00 pages are viewed each , by the estimated 959 daily visitors .

Rock Kills Kid is a band mentioned twice in the corpus. From this context set, the relation background:GROUP OR BAND is predicted with high confidence. The fact that "Kid" occurs in the name of the entity seems to be important in identifying it as a musical group. The first sentence also draws focus to the band's connection to rock and pop, while the second sentence seems to recognize the band - song (year) pattern as well as the comparison to Duran Duran.

    • the latest stylish pop synth band is rock kills kid .

    • rock kills kid - are you nervous ? ( 2006 ) who ever thought duran duran would become so influential ?

The Japanese singer-songwriter Masaki Haruna, aka Klaha, is mentioned twice in the corpus. From this context set, the relation background:SOLO SINGER is predicted with high confidence. The first sentence clearly establishes the connection to music, while the second indicates that Klaha is a solo artist. The conjunction of these two facets, accomplished through the context vector aggregation using NiN, permits the conclusion of SOLO SINGER.

    • tvk music chat interview klaha parade .

    • klaha tvk music chat OOV red scarf interview the tv - k folks did after klaha went solo .

8 Conclusions

In this paper we presented a new methodology to identify relations between entities in text. Our approach, focusing on unary relations, can greatly improve the recall of automatic construction and updating of knowledge bases by making use of implicit and partial textual markers. Our method is extremely effective and complements existing binary relation extraction methods for KBP very nicely.

This is just the first step in our wider research program on KBP, whose goal is to improve recall by identifying implicit information from texts. First of all, we plan to explore the use of more advanced forms of entity detection and linking, including propagating features from the EDL system forward for both unary and binary deep models. In addition, we plan to exploit unary and binary relations as sources of evidence to bootstrap a probabilistic reasoning approach, with the goal of leveraging constraints from the KB schema such as domain, range and taxonomies. We will also integrate the new triples gathered from textual evidence with new triples predicted from existing KB relationships by knowledge base completion.

References

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, and Zachary Ives. 2007. DBpedia: A nucleus for a web of open data. In 6th International Semantic Web Conference, Busan, Korea. Springer, pages 11–15.

H Chang, M Abdurrahman, Ao Liu, J Tian-Zheng Wei, Aaron Traylor, Ajay Nagesh, Nicholas Monath, Patrick Verga, Emma Strubell, and Andrew McCallum. 2016. Extracting multilingual relations under limited resources: TAC 2016 cold-start KB construction and slot-filling using compositional universal schema. In Proceedings of TAC.

Philipp Cimiano and Johanna Völker. 2005. Towards large-scale, open-domain and ontology-based named entity classification. In Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP).

Aleksandr Drozd, Anna Gladkova, and Satoshi Matsuoka. 2016. Word embeddings, analogies, and machine learning: Beyond king - man + woman = queen. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics. pages 3519–3530.

Matthew R Gormley, Mo Yu, and Mark Dredze. 2015. Improved relation extraction with feature-rich compositional embedding models. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pages 1774–1784.

Daniel Hewlett, Alexandre Lacoste, Llion Jones, Illia Polosukhin, Andrew Fandrianto, Jay Han, Matthew Kelcey, and David Berthelot. 2016. WikiReading: A novel large-scale language understanding task over Wikipedia. In Proceedings of the Conference on Association for Computational Linguistics.

Min Lin, Qiang Chen, and Shuicheng Yan. 2013. Network in network. arXiv preprint arXiv:1312.4400.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of ACL.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems 26. Curran Associates, Inc., pages 3111–3119.

Mike Mintz, Steven Bills, Rion Snow, and Dan Jurafsky. 2009. Distant supervision for relation extraction without labeled data. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP: Volume 2. Association for Computational Linguistics, Stroudsburg, PA, USA, ACL '09, pages 1003–1011.

Nanyun Peng, Hoifung Poon, Chris Quirk, Kristina Toutanova, and Wen-tau Yih. 2017. Cross-sentence n-ary relation extraction with graph LSTMs. Transactions of the Association for Computational Linguistics 5:101–115.

Sebastian Riedel, Limin Yao, and Andrew McCallum. 2010. Modeling relations and their mentions without labeled text. Machine Learning and Knowledge Discovery in Databases, pages 148–163.

Sebastian Riedel, Limin Yao, Andrew McCallum, and Benjamin M Marlin. 2013. Relation extraction with matrix factorization and universal schemas. In Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies. pages 74–84.

Karen Simonyan, Andrea Vedaldi, and Andrew Zisserman. 2014. Deep inside convolutional networks: Visualising image classification models and saliency maps. In ICLR Workshop.

Mihai Surdeanu, Julie Tibshirani, Ramesh Nallapati, and Christopher D Manning. 2012. Multi-instance multi-label learning for relation extraction. In Proceedings of the 2012 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning. Association for Computational Linguistics, pages 455–465.

Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016. Relation classification via multi-level attention CNNs. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). volume 1, pages 1298–1307.

Yan Xu, Lili Mou, Ge Li, Yunchuan Chen, Hao Peng, and Zhi Jin. 2015. Classifying relations via long short term memory networks along shortest dependency paths. In Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing. pages 1785–1794.

Matthew D Zeiler and Rob Fergus. 2014. Visualizing and understanding convolutional networks. In European Conference on Computer Vision. Springer, pages 818–833.

Daojian Zeng, Kang Liu, Yubo Chen, and Jun Zhao. 2015. Distant supervision for relation extraction via piecewise convolutional neural networks. In EMNLP. pages 1753–1762.

Daojian Zeng, Kang Liu, Siwei Lai, Guangyou Zhou, and Jun Zhao. 2014. Relation classification via convolutional deep neural network. In Proceedings of COLING 2014, the 25th International Conference on Computational Linguistics: Technical Papers. pages 2335–2344.

Wenyuan Zeng, Yankai Lin, Zhiyuan Liu, and Maosong Sun. 2016. Incorporating relation paths in neural relation extraction. In CoRR.

Yuhao Zhang, Arun Chaganty, Ashwin Paranjape, Danqi Chen, Jason Bolton, Peng Qi, and Christopher D Manning. 2016. Stanford at TAC KBP 2016: Sealing pipeline leaks and understanding Chinese. In Proceedings of TAC.
