Neural Relation Extraction Within and Across Sentence Boundaries
Pankaj Gupta1,2, Subburam Rajaram1, Bernt Andrassy1, Hinrich Schütze2, Thomas Runkler1
1Corporate Technology, Machine-Intelligence (MIC-DE), Siemens AG Munich, Germany
2CIS, University of Munich (LMU) Munich, Germany
{pankaj.gupta, subburam.rajaram}@siemens.com | [email protected]

arXiv:1810.05102v2 [cs.CL] 14 Jan 2019
Copyright © 2019, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Past work in relation extraction mostly focuses on binary relations between entity pairs within a single sentence. Recently, the NLP community has gained interest in relation extraction for entity pairs spanning multiple sentences. In this paper, we propose a novel architecture for this task: inter-sentential dependency-based neural networks (iDepNN). iDepNN models the shortest and augmented dependency paths via recurrent and recursive neural networks to extract relationships within (intra-) and across (inter-) sentence boundaries. Compared to SVM and neural network baselines, iDepNN is more robust to false positives in relationships spanning sentences. We evaluate our models on four datasets from the newswire (MUC6) and medical (BioNLP shared task) domains, achieve state-of-the-art performance, and show a better balance in precision and recall for inter-sentential relationships. We perform better than the 11 teams participating in the BioNLP shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the winning team. We also release the cross-sentence annotations for MUC6.

Introduction

The task of relation extraction (RE) aims to identify the semantic relationship between a pair of nominals or entities e1 and e2 in a given sentence S. Due to the rapid growth in information, it plays a vital role in knowledge extraction from unstructured texts and serves as an intermediate step in a variety of NLP applications in the newswire, web and high-valued biomedicine (Bahcall 2015) domains. Consequently, there has been increasing interest in relation extraction, particularly in augmenting existing knowledge bases.

Progress in relation extraction is exciting; however, most prior work (Zhang et al. 2006; Kambhatla 2004; Vu et al. 2016a; Gupta, Schütze, and Andrassy 2016) is limited to single sentences, i.e., intra-sentential relationships, and ignores relations in entity pairs spanning sentence boundaries, i.e., inter-sentential ones. Thus, there is a need to move beyond single sentences and devise methods to extract relationships spanning sentences. For instance, consider the sentences:

Paul Allen has started a company and named [Vern Raburn]e1 its President. The company, to be called [Paul Allen Group]e2, will be based in Bellevue, Washington.

The two sentences together convey the fact that the entity e1 is associated with e2, which cannot be inferred from either sentence alone. Such missed relations hurt system performance, leading to poor recall. But precision is equally important; e.g., in the high-valued biomedicine domain, significant inter-sentential relationships must be extracted, especially in medicine, which aims toward accurate diagnostic testing and precise treatment, where extraction errors can have severe negative consequences. In this work, we present a neural network (NN) based approach to precisely extract relationships within and across sentence boundaries, and show a better balance in precision and recall with an improved F1.

Previous work on cross-sentence relation extraction used coreferences to access entities that occur in a different sentence (Gerber and Chai 2010; Yoshikawa et al. 2011), without modeling inter-sentential relational patterns. Swampillai and Stevenson (2011) described an SVM-based approach to both intra- and inter-sentential relations. Recently, Quirk and Poon (2016) applied distant supervision to cross-sentence relation extraction of entities using a binary logistic regression (non-neural-network based) classifier, and Peng et al. (2017) applied sophisticated graph long short-term memory networks to cross-sentence n-ary relation extraction. However, the task remains challenging due to the need for coreference resolution, the noisy text between entity pairs spanning multiple sentences, and the lack of labeled corpora.

Bunescu and Mooney (2005), Nguyen, Matsuo, and Ishizuka (2007) and Mintz et al. (2009) have shown that the shortest dependency path (SDP) between two entities in a dependency graph and the dependency subtrees are the most useful dependency features in relation classification. Further, Liu et al. (2015) developed these ideas using Recursive Neural Networks (RecNNs; Socher et al. 2014) and combined the two components in a precise structure called the Augmented Dependency Path (ADP), where each word on an SDP is attached to a dependency subtree; however, their approach is limited to single sentences. In this paper, we draw on these methods to extend the shortest dependency path across sentence boundaries and effectively combine it with dependency subtrees in NNs that can capture the semantic representation of the structure and boost relation extraction spanning sentences.

The contributions are: (1) Introduce a novel dependency-based neural architecture, named inter-sentential
Dependency-based Neural Network (iDepNN), to extract relations within and across sentence boundaries by modeling shortest and augmented dependency paths in a combined structure of bidirectional RNNs (biRNNs) and RecNNs. (2) Evaluate different linguistic features on four datasets from the newswire and medical domains, and report an improved performance in relations spanning the sentence boundary. We show amplified precision due to robustness towards false positives, and a better balance in precision and recall. We perform better than the 11 teams participating in the BioNLP shared task 2016 and achieve a gain of 5.2% (0.587 vs 0.558) in F1 over the winning team. (3) Release relation annotations for the MUC6 dataset for intra- and inter-sentential relationships. Code, data and supplementary material are available at https://github.com/pgcool/Cross-sentence-Relation-Extraction-iDepNN.

Figure 1: Left: Sentences and their dependency graphs. Right: Inter-sentential Shortest Dependency Path (iSDP) across the sentence boundary. The roots of adjacent sentences are connected by NEXTS.

Methodology

Inter-sentential Dependency-Based Neural Networks (iDepNN)

Dependency-based neural networks (DepNN) (Bunescu and Mooney 2005; Liu et al. 2015) have been investigated for relation extraction between entity pairs limited to single sentences, using the dependency information to explore the semantic connection between two entities. In this work, we introduce iDepNN, the inter-sentential Dependency-based Neural Network, an NN that models relationships between entity pairs spanning sentences, i.e., inter-sentential relationships within a document. We refer to the iDepNN that only models the shortest dependency path (SDP) spanning the sentence boundary as iDepNN-SDP, and to the iDepNN that models augmented dependency paths (ADPs) as iDepNN-ADP; see below. biRNNs (bidirectional RNNs, Schuster and Paliwal (1997)) and RecNNs (recursive NNs, Socher et al. (2012)) are the backbone of iDepNN.

Modeling Inter-sentential Shortest Dependency Path (iDepNN-SDP): We compute the inter-sentential Shortest Dependency Path (iSDP) between entities spanning sentence boundaries for a relation. To do so, we obtain the dependency parse tree for each sentence using the Stanford parser; the root node of the parse tree of a sentence is connected to the root of the subsequent tree, leading to the shortest path from one entity to another across sentences.

Figure 1 (Left) shows dependency graphs for the example sentences, where the two entities e1 and e2 appear in nearby sentences and exhibit a relationship. Figure 1 (Right) illustrates that the dependency trees of the two adjacent sentences and their roots are connected by NEXTS to form an iSDP, an inter-Sentential Dependency Path (highlighted in gray), between the two entities. The shortest path spanning the sentence boundary is seen as a sequence of words between the two entities. Figure 2 shows how a biRNN (Schuster and Paliwal 1997; Vu et al. 2016b) uses the iSDP to detect the relation between e1 and e2, positioned one sentence apart.

Modeling Inter-sentential Dependency Subtrees: To effectively represent words on the shortest dependency path within and across the sentence boundary, we model dependency subtrees assuming that each word w can be seen as the word itself and its children on the dependency subtree. The notion of representing words using subtree vectors within the dependency neural network (DepNN) is similar to Liu et al. (2015); however, our proposed structures are based on iSDPs and ADPs that span sentences.

To represent each word w on the subtree, its word embedding vector x_w ∈ R^d and subtree representation c_w ∈ R^{d'} are concatenated to form its final representation p_w ∈ R^{d+d'}. We use 200-dimensional pretrained GloVe embeddings (Pennington, Socher, and Manning 2014). The subtree representation of a word is computed through recursive transformations of the representations of its children words. A RecNN is used to construct the subtree embedding c_w, traversing bottom-up from its leaf words to the root for entities spanning sentence boundaries, as shown in Figure 2. For a word which is a leaf node, i.e., one that does not have a subtree, we set its subtree representation as c_LEAF. Figure 2 illustrates how subtree-based word representations are constructed via the iSDP. Each word is associated with a dependency relation r, e.g., r = dobj, during the bottom-up construction of the subtree, using a per-relation transformation matrix W_r ∈ R^{d'×(d+d')}.
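The iSDP construction described above — link the roots of adjacent dependency trees with a NEXTS edge, then take the shortest path between the two entities — can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the dependency edges below are hand-copied from the Figure 1 example (token names simplified), whereas in practice they would come from a parser, and BFS stands in for a graph library's shortest-path routine.

```python
from collections import deque

# Undirected dependency edges for the two example sentences (simplified from Figure 1).
sent1_edges = [("started", "Allen"), ("started", "company"), ("started", "named"),
               ("named", "Raburn"), ("named", "President")]
sent2_edges = [("based", "called"), ("called", "Group"), ("based", "will"),
               ("based", "Bellevue")]

def build_graph(edge_lists, roots):
    """Build an adjacency map; connect roots of consecutive sentences via NEXTS."""
    adj = {}
    def add(u, v):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    for edges in edge_lists:
        for u, v in edges:
            add(u, v)
    for r1, r2 in zip(roots, roots[1:]):
        add(r1, r2)  # the NEXTS link between adjacent sentence roots
    return adj

def shortest_path(adj, src, dst):
    """Plain BFS shortest path (all edges have weight 1)."""
    prev, frontier, seen = {src: None}, deque([src]), {src}
    while frontier:
        u = frontier.popleft()
        if u == dst:
            path = []
            while u is not None:
                path.append(u)
                u = prev[u]
            return path[::-1]
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                prev[v] = u
                frontier.append(v)
    return None

adj = build_graph([sent1_edges, sent2_edges], roots=["started", "based"])
isdp = shortest_path(adj, "Raburn", "Group")  # e1 ... e2 across the boundary
print(isdp)  # ['Raburn', 'named', 'started', 'based', 'called', 'Group']
```

The NEXTS edge is what makes the path exist at all: without it, the two dependency trees are disconnected components and no intra-graph path joins e1 to e2.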
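The biRNN over the iSDP word sequence can likewise be sketched with NumPy. This is a toy forward pass only, assuming random weights, tanh activations, and small dimensions; the paper's model uses trained parameters, 200-dimensional GloVe inputs, and (as a simplification here) this sketch even shares one weight set across both directions.

```python
import numpy as np

rng = np.random.default_rng(0)
d, h = 8, 6      # toy embedding and hidden sizes (the paper uses d = 200)
seq_len = 6      # number of words on the iSDP

X = rng.normal(size=(seq_len, d))           # stand-in word embeddings for the iSDP words
Wx = rng.normal(scale=0.1, size=(h, d))     # input weights (shared by both directions here)
Wh = rng.normal(scale=0.1, size=(h, h))     # recurrent weights
b = np.zeros(h)

def rnn_pass(inputs):
    """One unidirectional RNN pass; returns the hidden state at every position."""
    hs, h_t = [], np.zeros(h)
    for x_t in inputs:
        h_t = np.tanh(Wx @ x_t + Wh @ h_t + b)
        hs.append(h_t)
    return np.stack(hs)

fwd = rnn_pass(X)              # left-to-right over the iSDP
bwd = rnn_pass(X[::-1])[::-1]  # right-to-left, re-aligned to positions
H = np.concatenate([fwd, bwd], axis=1)  # biRNN states, shape (seq_len, 2*h)
print(H.shape)
```

Reading the path in both directions gives each position context from both entities, which is the point of using a biRNN rather than a unidirectional one here.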
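The bottom-up subtree composition — c_w built recursively from children via per-relation matrices W_r, then p_w = [x_w; c_w] — can be sketched as follows. Again a hedged toy: the tree fragment, relation set, random weights, and tanh as the nonlinearity are all illustrative assumptions, not the trained model.

```python
import numpy as np

rng = np.random.default_rng(1)
d, d_sub = 8, 4  # toy sizes: word-embedding dim d and subtree dim d' (paper: d = 200)

# Hypothetical dependency-subtree fragment: word -> [(child, relation), ...]
tree = {"named": [("Raburn", "dobj"), ("President", "iobj")],
        "Raburn": [], "President": []}
emb = {w: rng.normal(size=d) for w in tree}  # stand-in word vectors
W_rel = {r: rng.normal(scale=0.1, size=(d_sub, d + d_sub))
         for r in ("dobj", "iobj")}          # one matrix W_r per dependency relation r
b = np.zeros(d_sub)
c_leaf = np.zeros(d_sub)                     # shared c_LEAF vector for leaf words

def subtree_vec(word):
    """Bottom-up subtree embedding c_w; each child contributes W_r @ p_child."""
    kids = tree[word]
    if not kids:
        return c_leaf
    total = np.zeros(d_sub)
    for child, rel in kids:
        p_child = np.concatenate([emb[child], subtree_vec(child)])  # p = [x; c]
        total += W_rel[rel] @ p_child
    return np.tanh(total + b)

c_w = subtree_vec("named")
p_w = np.concatenate([emb["named"], c_w])  # final representation, in R^{d+d'}
print(p_w.shape)
```

Note how the recursion grounds out at leaves via c_LEAF, so every word on the iSDP gets a fixed-size representation regardless of its subtree's depth.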