Neural Relation Extraction for Knowledge Base Enrichment

Neural Relation Extraction for Knowledge Base Enrichment Bayu Distiawan Trisedya1, Gerhard Weikum2, Jianzhong Qi1, Rui Zhang1∗ 1 The University of Melbourne, Australia 2 Max Planck Institute for Informatics, Saarland Informatics Campus, Germany fbtrisedya@student, jianzhong.qi@, [email protected] [email protected] Abstract Input sentence: "New York University is a private university in Manhattan." We study relation extraction for knowledge Unsupervised approach output: base (KB) enrichment. Specifically, we aim hNYU, is, private universityi to extract entities and their relationships from hNYU, is private university in, Manhattani sentences in the form of triples and map the Supervised approach output: elements of the extracted triples to an existing hNYU, instance of, Private Universityi hNYU, located in, Manhattani KB in an end-to-end manner. Previous stud- Canonicalized output: ies focus on the extraction itself and rely on hQ49210, P31, Q902104i Named Entity Disambiguation (NED) to map hQ49210, P131, Q11299i triples into the KB space. This way, NED errors may cause extraction errors that affect the Table 1: Relation extraction example. overall precision and recall. To address this problem, we propose an end-to-end relation Previous studies work on embedding-based extraction model for KB enrichment based on model (Nguyen et al., 2018; Wang et al., 2015) a neural encoder-decoder model. We collect high-quality training data by distant supervi- and entity alignment model (Chen et al., 2017; sion with co-reference resolution and para- Sun et al., 2017; Trisedya et al., 2019) to en- phrase detection. We propose an n-gram based rich a knowledge base. Following the success of attention model that captures multi-word en- the sequence-to-sequence architecture (Bahdanau tity names in a sentence. Our model employs et al., 2015) for generating sentences from struc- jointly learned word and entity embeddings to tured data (Marcheggiani and Perez-Beltrachini, support named entity disambiguation. Finally, 2018; Trisedya et al., 2018), we employ this ar- our model uses a modified beam search and a triple classifier to help generate high-quality chitecture to do the opposite, which is extracting triples. Our model outperforms state-of-the- triples from a sentence. art baselines by 15:51% and 8:38% in terms of In this paper, we study how to enrich a KB by F1 score on two real-world datasets. relation exaction from textual sources. Specif- ically, we aim to extract triples in the form of 1 Introduction hh; r; ti, where h is a head entity, t is a tail entity, and r is a relationship between the enti- Knowledge bases (KBs), often in the form of ties. Importantly, as KBs typically have much knowledge graphs (KGs), have become essential better coverage on entities than on relationships, resources in many tasks including Q&A systems, we assume that h and t are existing entities in recommender system, and natural language gener- a KB, r is a predicate that falls in a prede- ation. Large KBs such as DBpedia (Auer et al., fined set of predicates we are interested in, but ¨ 2007), Wikidata (Vrandecic and Krotzsch, 2014) the relationship hh; r; ti does not exist in the KB and Yago (Suchanek et al., 2007) contain millions yet. We aim to find more relationships between of facts about entities, which are represented in the h and t and add them to the KB. For exam- form of subject-predicate-object triples. However, ple, from the first extracted triples in Table1 we these KBs are far from complete and mandate con- may recognize two entities "NYU" (abbreviation tinuous enrichment and curation. of New York University) and "Private ∗Rui Zhang is the corresponding author. University", which already exist in the KB; also the predicate "instance of" is in the added to the KB. set of predefined predicates we are interested in, We aim to integrate the extraction and the but the relationship of hNYU, instance of, canonicalization tasks by proposing an end- Private Universityi does not exist in the to-end neural learning model to jointly extract KB. We aim to add this relationship to our KB. triples from sentences and map them into This is the typical situation for KB enrichment an existing KB. Our method is based on the (as opposed to constructing a KB from scratch or encoder-decoder framework (Cho et al., 2014) performing relation extraction for other purposes, by treating the task as a translation of a sentence such as Q&A or summarization). into a sequence of elements of triples. For the KB enrichment mandates that the entities and example in Table1, our model aims to translate relationships of the extracted triples are canonical- "New York University is a private ized by mapping them to their proper entity and university in Manhattan" into a se- predicate IDs in a KB. Table1 illustrates an ex- quence of IDs "Q49210 P31 Q902104 ample of triples extracted from a sentence. The Q49210 P131 Q11299", from which we can entities and predicate of the first extracted triple, derive two triples to be added to the KB. including NYU, instance of, and Private A standard encoder-decoder model with atten- University, are mapped to their unique IDs tion (Bahdanau et al., 2015) is, however, unable Q49210, P31, and Q902104, respectively, to to capture the multi-word entity names and ver- comply with the semantic space of the KB. bal or noun phrases that denote predicates. To Previous studies on relation extraction have address this problem, we propose a novel form employed both unsupervised and supervised ap- of n-gram based attention that computes the n- proaches. Unsupervised approaches typically start gram combination of attention weight to capture with a small set of manually defined extraction the verbal or noun phrase context that comple- patterns to detect entity names and phrases about ments the word level attention of the standard at- relationships in an input text. This paradigm is tention model. Our model thus can better cap- known as Open Information Extraction (Open IE) ture the multi-word context of entities and rela- (Banko et al., 2007; Corro and Gemulla, 2013; tionships. Our model harnesses pre-trained word Gashteovski et al., 2017). In this line of ap- and entity embeddings that are jointly learned with proaches, both entities and predicates are captured skip gram (Mikolov et al., 2013) and TransE (Bor- in their surface forms without canonicalization. des et al., 2013). The advantages of our jointly Supervised approaches train statistical and neural learned embeddings are twofold. First, the em- models for inferring the relationship between two beddings capture the relationship between words known entities in a sentence (Mintz et al., 2009; and entities, which is essential for named entity Riedel et al., 2010, 2013; Zeng et al., 2015; Lin disambiguation. Second, the entity embeddings et al., 2016). Most of these studies employ a pre- preserve the relationships between entities, which processing step to recognize the entities. Only help to build a highly accurate classifier to filter few studies have fully integrated the mapping of the invalid extracted triples. To cope with the lack extracted triples onto uniquely identified KB en- of fully labeled training data, we adapt distant su- tities by using logical reasoning on the existing pervision to generate aligned pairs of sentence and KB to disambiguate the extracted entities (e.g., triple as the training data. We augment the process (Suchanek et al., 2009; Sa et al., 2017)). with co-reference resolution (Clark and Manning, 2016) and dictionary-based paraphrase detection Most existing methods thus entail the need for (Ganitkevitch et al., 2013; Grycner and Weikum, Named Entity Disambiguation (NED) (cf. the sur- 2016). The co-reference resolution helps extract vey by Shen et al.(2015)) as a separate process- sentences with implicit entity names, which en- ing step. In addition, the mapping of relationship larges the set of candidate sentences to be aligned phrases onto KB predicates necessitates another with existing triples in a KB. The paraphrase de- mapping step, typically aided by paraphrase dic- tection helps filter sentences that do not express tionaries. This two-stage architecture is inherently any relationships between entities. prone to error propagation across its two stages: NED errors may cause extraction errors (and vice The main contributions of this paper are: versa) that lead to inaccurate relationships being • We propose an end-to-end model for extracting and canonicalizing triples to enrich a KB. related to ours is Neural Open IE (Cui et al., 2018), The model reduces error propagation between which proposed an encoder-decoder with attention relation extraction and NED, which existing model to extract triples. However, this work is not approaches are prone to. geared for extracting relations of canonicalized en- • We propose an n-gram based attention model tities. Another line of studies use neural learning to effectively map the multi-word mentions of for semantic role labeling (He et al., 2018), but the entities and their relationships into uniquely goal here is to recognize the predicate-argument identified entities and predicates. We propose structure of a single input sentence – as opposed joint learning of word and entity embeddings to extracting relations from a corpus. to capture the relationship between words and All of these methods generate triples where the entities for named entity disambiguation. We head and tail entities and the predicate stay in further propose a modified beam search and a their surface forms. Therefore, different names triple classifier to generate high-quality triples. and phrases for the same entities result in multiple • We evaluate the proposed model over two triples, which would pollute the KG if added this real-world datasets.

Load more