Arxiv:2106.08657V1 [Cs.CL] 16 Jun 2021
Total Page:16
File Type:pdf, Size:1020Kb
EIDER: Evidence-enhanced Document-level Relation Extraction Yiqing Xie, Jiaming Shen, Sha Li, Yuning Mao, Jiawei Han University of Illinois at Urbana-Champaign, IL, USA {xyiqing2, js2, shal2, yuningm2, hanj}@illinois.edu Abstract h: Hero of the Day t: the United States r: [country of origin] Ground truth evidence: [1,10] Predicted evidence: [1,10] Document-level relation extraction (DocRE) Original document as input: [1] Load is the sixth studio album by the American heavy metal band Metallica , released on June 4, aims at extracting the semantic relations 1996 by Elektra Records in the United States and by Vertigo among entity pairs in a document. In DocRE, Records internationally. … [9] It was certified 5×platinum by the a subset of the sentences in a document, called Recording Industry Association of America ( RIAA ) for shipping five million copies in the United States. [10] Four singles—"Until It the evidence sentences, might be sufficient for Sleeps", "Hero of the Day", "Mama Said" , and "King Nothing"— predicting the relation between a specific en- were released as part of the marketing campaign for the album. tity pair. To make better use of the evidence Prediction results (logits): NA: 17.63 country of origin: 14.79 sentences, in this paper, we propose a three- Predicted evidence as input: [1] Load is … in the United States and by Vertigo Records internationally. [10] Four singles—"Until It stage evidence-enhanced DocRE framework Sleeps", "Hero of the Day", … for the album. consisting of joint relation and evidence ex- Prediction results (logits): country of origin: 18.31 NA: 13.45 traction, evidence-centered relation extraction Final prediction result of our model: country of origin (RE), and fusion of extraction results. We first jointly train an RE model with a sim- Figure 1: A test sample in the DocRED dataset (Yao ple and memory-efficient evidence extraction et al., 2019), where the ith sentence in the document model. Then, we construct pseudo documents is marked with [i] at the start. Our model correctly based on the extracted evidence sentences and predicts [1,10] as evidence, and if we only use the ex- run the RE model again. Finally, we fuse the tracted evidence as input, the model can predict the re- extraction results of the first two stages us- lation “country of origin” correctly. ing a blending layer and make a final predic- tion. Extensive experiments show that our pro- posed framework achieves state-of-the-art per- level relation extraction (DocRE) (Quirk and Poon, formance on the DocRED dataset, outperform- 2017; Peng et al., 2017; Gupta et al., 2019). ing the second best method by 0.76/0.82 Ign Although multiple sentences might be necessary F1/F1. In particular, our method significantly to infer a relation, the sentences in a document are improves the performance on inter-sentence re- not equally important for each entity pair and some lations by 1.23 Inter F1. sentences could be irrelevant for the relation pre- diction. We refer to the minimal set of sentences 1 Introduction required to infer a relation as evidence sentences Relation extraction (RE) is the task of extracting (Huang et al., 2021). In the example shown in Fig- ure1, the 1st and 10th sentences serve as evidence arXiv:2106.08657v1 [cs.CL] 16 Jun 2021 semantic relations among entities within a given text, which is a critical step in information extrac- sentences to the “country of origin” relation be- tion and has abundant applications (Yu et al., 2017; tween “Hero of the Day” and “the United States”. Shi et al., 2019; Trisedya et al., 2019). Prior stud- The 1st sentence indicates that Load is originated ies mostly focus on sentence-level RE, where the from the United States, and the 10th indicates Hero two entities of interest co-occur in the same sen- of the Day is a song of Load. Hence, Hero of the tence and it is assumed that their relation can be Day is also from the United States. On the other derived from the sentence (Zeng et al., 2015; Cai hand, although the 9th sentence also mentions “the et al., 2016). However, this assumption does not United States”, it is irrelevant to the relation pre- always hold and some relations between entities diction. Sometimes including such irrelevant sen- can only be inferred given multiple sentences as the tences in the input might introduce noise to the context. As a result, recent studies have been mov- model and be more detrimental than beneficial. ing towards the more realistic setting of document- In light of the observations above, we propose two approaches to make better use of evidence sen- and make another set of predictions based on the tences. The first is to jointly extract relations and pseudo document. In the last stage, we fuse the evidence. Intuitively, both tasks should focus on predictions based on the original document and the the information relevant to the current entity pair, pseudo document using a blending layer (Wolpert, such as the underlined “Load” and “the album” in 1992). In this way, EIDER puts more attention to the 10th sentence of Figure1. This suggests that the important sentences extracted in the first stage, the two tasks have commonality and can provide ad- while still having access to the whole document to ditional training signals for each other. Huang et al. avoid information loss. (2020) trains these two tasks in a multi-task learn- Contributions. (1) We propose a memory- ing manner. However, their model makes one pre- efficient multi-task learning DocRE framework for diction for every <sentence, relation, entity, entity> joint relation and evidence extraction, which only tuple, which requires massive GPU memory and requires around 14% additional memory compared long training time. Our method adopts a much sim- to a single RE model alone. (2) We refine the pler model structure, which only predicts for each model inference stage by using a blending layer to (relation, entity, entity) tuple and can be trained on fuse the prediction results from the original docu- a single consumer GPU. ment and the extracted evidence, which allows the The second approach is to conduct evidence- model to focus more on the important sentences centered relation extraction with the evidence sen- with no information loss. (3) We conduct exten- tences as model input. In the extreme case, if there sive experiments to demonstrate the effectiveness is only one sentence related to the relation predic- of our method, which achieves state-of-the-art per- tion, we can make predictions solely based on this formance on the large-scale DocRED dataset. sentence and reduce the problem to sentence-level relation extraction. One concurrent work (Huang 2 Problem Formulation et al., 2021) shows the effectiveness of replacing the original documents with sentences extracted Given a document d comprised of N sentences by hand-crafted rules. However, the sentences ex- N fstgt=1 and a set of entities feig appearing in d, the tracted by rules are not perfect. Solely relying task of document-level relation extraction (DocRE) on heuristically extracted sentences may result in is to predict the relations between all entity pairs S information loss and harm model performance in (eh; et) from a pre-defined relation set R fNAg. certain cases. Instead, our evidence is obtained by We refer to eh and et as the head entity and tail en- a dedicated evidence extraction model. In addition, tity, respectively. An entity ei may appear multiple we fuse the prediction results of both the original times in document d, where we denote its corre- document and extracted evidence to avoid informa- i sponding mentions as fmjg. A relation r 2 R tion loss, and demonstrate that the two sources of between (eh; et) exists if it is expressed by any predictions are complementary to each other. pair of their mentions, and otherwise labeled as Specifically, in this paper, we propose an NA. For each entity pair (eh; et) that possesses a 1 evidence-enhanced RE framework, named EIDER, non-NA relation, we define its evidence sentences K which automatically extracts evidence and effec- Vh;t = fsvi gi=1 as the subset of sentences in the tively leverages the extracted evidence to improve document that are sufficient for human annotators the performance of DocRE in three stages. In the to infer the relation. first stage, we train a relation extraction model and an evidence extraction model in a multi-task learn- 3 Methodology ing manner. We adopt localized context pooling (Zhou et al., 2021) in both models, which enhances The framework of EIDER consists of three stages: the entity embedding with additional context rel- joint relation and evidence extraction (Sec. 3.1), evant to the current entity pair. To reduce mem- evidence-centered relation extraction (Sec. 3.2) ory usage and training time, we use the same sen- and fusion of extraction results (Sec. 3.3). An tence representation for each relation and only train illustration of our framework is shown in Figure2. the evidence extraction model on entity pairs with at least one relation. In the second stage, we re- 1We use “evidence sentence” and “evidence” interchange- gard the extracted evidence as a pseudo document ably through the paper. Predicted relation from Orig doc: NA Predicted evidence: [1, 4] Final predicted relation: Country & Located in (Learned threshold: -0.28) Relation Classifier Evidence Classifier Sent embs Head emb Blending Layer … [1] … Context emb … [2] … … … Pred Scores from Orig doc Pred Scores from Pseudo doc Tail emb … [4] Country: -1.05 Country: 3.76 Weights Located in: -1.82 Located in: 2.70 … … Citizenship: -11.53 Citizenship: -5.54 … … Paul Desmarais is a Canadian businessman … Desmarais was born in Ontario … Attention to head & tail Canadian Ontario After convergence Trained model Pre-trained Encoder Original Document: [1] Paul Desmarais is a Canadian businessman in his Pseudo Document: [1] Paul Desmarais is a hometown of Montreal .