Span Model for Open Information Extraction on Accurate Corpus
Junlang Zhan (1,2,3), Hai Zhao (1,2,3,*)
1 Department of Computer Science and Engineering, Shanghai Jiao Tong University
2 Key Laboratory of Shanghai Education Commission for Intelligent Interaction and Cognitive Engineering, Shanghai Jiao Tong University, Shanghai, China
3 MoE Key Lab of Artificial Intelligence, AI Institute, Shanghai Jiao Tong University
[email protected], [email protected]

arXiv:1901.10879v6 [cs.CL] 21 Nov 2019

* Corresponding author. This paper was partially supported by the National Key Research and Development Program of China (No. 2017YFB0304100) and Key Projects of the National Natural Science Foundation of China (U1836222 and 61733011). Copyright (c) 2020, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Abstract

Open Information Extraction (Open IE) is a challenging task, especially due to its brittle data basis. Most Open IE systems have to be trained on automatically built corpora and evaluated on inaccurate test sets. In this work, we first alleviate this difficulty from both sides of training and test sets. For the former, we propose an improved model design to more sufficiently exploit the training dataset. For the latter, we present an accurately re-annotated benchmark test set (Re-OIE2016) based on a series of linguistic observations and analyses. Then, we introduce a span model in place of the previously adopted sequence labeling formulation for n-ary Open IE. Our newly introduced model achieves new state-of-the-art performance on both benchmark evaluation datasets.

Sentence:   Repeat customers can purchase luxury items at reduced prices.
Extraction: (A0: repeat customers; can purchase; A1: luxury items; A2: at reduced prices)

Table 1: An example of Open IE in a sentence. The extraction consists of a predicate (in bold) and a list of arguments, separated by semicolons.

Introduction

Open Information Extraction (Open IE) aims to generate a structured representation of information from natural language text in the form of verbs (or verbal phrases) and their arguments. An Open IE system is usually domain-independent and does not rely on a pre-defined ontology schema. An example of Open IE is shown in Table 1. Open IE has been widely applied in many downstream tasks such as textual entailment and question answering (Fader, Zettlemoyer, and Etzioni 2014). Some existing Open IE systems are listed in Table 2. Most of these systems are built in an unsupervised manner, except that TextRunner and OLLIE are built in a self-supervised manner.

To perform Open IE in supervised learning and to leverage the advantages of neural networks, (Stanovsky et al. 2018) created the first annotated corpus by an automatic translation from Question-Answer Driven Semantic Role Labeling (QA-SRL) annotations (He, Lewis, and Zettlemoyer 2015) and Question-Answer Meaning Representation (QAMR) (Michael et al. 2018). Based on this corpus, they developed an Open IE system, RnnOIE, using a bidirectional long short-term memory (BiLSTM) labeler and a BIO tagging scheme, which built the first supervised learning model for Open IE. (Cui, Wei, and Zhou 2018) constructed a large but noisy annotated corpus by using OpenIE4 to perform extractions on Wikipedia, keeping only tuples with high confidence scores. They also built an Open IE system using a neural sequence-to-sequence model and a copy mechanism to generate extractions. However, this system can only perform binary extractions.

In this work, we develop an Open IE system by adapting a modified span selection model which has been applied in Semantic Role Labeling (SRL) (Ouchi, Shindo, and Matsumoto 2018; He et al. 2018b; Li et al. 2019; Zhao, Zhang, and Kit 2013; Zhao, Chen, and Kit 2009; Li et al. 2018b; Cai et al. 2018), coreference resolution (Shou and Zhao 2012; Zhang, Wu, and Zhao 2012) and syntactic parsing (Zhang, Zhao, and Qin 2016; Ma and Zhao 2012; Li, Zhao, and Parnow 2020; Zhou and Zhao 2019; Zhao et al. 2009; Li et al. 2018a). The advantage of a span model is that span-level features can be sufficiently exploited, which is hard to achieve in token-based sequence labeling models. As far as we know, this is the first attempt to apply a span model to Open IE.

As previous Open IE systems were mostly built on a loose data basis, we intend to strengthen that basis from both the training and the test side. We constructed a large training corpus following the method of (Cui, Wei, and Zhou 2018). The differences between our construction method and theirs are two-fold. First, our corpus is constructed for n-ary extraction instead of binary extraction. Second, previous methods keep only extractions with high confidence scores for training. However, the confidence scores estimated by OpenIE4 are not so reliable, and we have found that some extractions with low confidence scores are also useful. We thus propose a method designed to leverage such extractions.

At the evaluation stage, the benchmark constructed by (Stanovsky and Dagan 2016) is widely used as the test set for Open IE performance evaluation. However, since this corpus was automatically constructed from QA-SRL, we observe that it still contains incorrect extractions. To reduce the noise and obtain a more accurate evaluation of Open IE systems, we manually relabeled this corpus. Experiments show that our system outperforms previous Open IE systems on both the old benchmark and our relabeled benchmark.

Name                                                   | Base Work | Main Features
TextRunner (Yates et al. 2007)                         | -         | The first Open IE system, using a self-supervised learning approach.
ReVerb (Fader, Soderland, and Etzioni 2011)            | -         | Leveraging POS tag patterns.
OLLIE (Mausam et al. 2012)                             | ReVerb    | Learning extraction patterns.
ClausIE (Del Corro and Gemulla 2013)                   | -         | Decomposing a sentence into clauses.
Stanford (Angeli, Johnson Premkumar, and Manning 2015) | -         | Splitting a sentence into utterances.
OpenIE4 (Mausam 2016)                                  | OLLIE     | Adding RELNOUN (Pal and Mausam 2016), SRLIE (Christensen et al. 2011).
OpenIE5                                                | OpenIE4   | Adding CALMIE (Saha and Mausam 2018), BONIE (Saha, Pal, and Mausam 2017).
PropS (Stanovsky et al. 2016)                          | -         | Using conversion rules from dependency trees.
MinIE (Gashteovski, Gemulla, and Del Corro 2017)       | ClausIE   | Mitigating overspecific extractions.
Graphene (Cetto et al. 2018)                           | -         | Parsing a sentence into core facts and accompanying contexts.
RnnOIE (Stanovsky et al. 2018)                         | -         | BiLSTM and BIO tagging.
Seq2seq OIE (Cui, Wei, and Zhou 2018)                  | -         | Seq2seq model and copy mechanism.

Table 2: A summary of existing Open IE systems.

Model

Our system, named SpanOIE, consists of two parts. The first part is the predicate module, which finds predicate spans in a sentence. The second part is the argument module, which takes a sentence and a predicate span as input and outputs argument spans for that predicate.

Problem Definition

We formulize open information extraction as a span selection task. A span (i, j) is a piece of a sentence beginning with word i and ending with word j. The goal is to select the correct span for each Open IE role label. Given a sentence W = w_1, ..., w_n, our model predicts a set of predicate-argument relations Y ⊆ S × P × L, where S = \{(w_i, ..., w_j) \mid 1 \le i \le j \le n\} is the set of all spans in W, P is the set of predicate spans, which is a subset of S, and L = \{A0, A1, A2, A3\} is the label set of Open IE.[1] For each label l, our model selects the span (i, j) with the highest score:

    \arg\max_{(i', j') \in S} \mathrm{SCORE}_l(i', j'), \quad l \in L    (1)

where

    \mathrm{SCORE}_l(i, j) = P_\theta(i, j \mid l) = \frac{\exp(\phi_\theta(i, j, l))}{\sum_{(i', j') \in S} \exp(\phi_\theta(i', j', l))}    (2)

and \phi_\theta is a trainable scoring function with parameters \theta. To train the parameters \theta, for each sample X and its gold structure Y^* in the training set, we minimize the cross-entropy loss:

    \ell_\theta(X, Y^*) = \sum_{(i, j, l) \in Y^*} -\log P_\theta(i, j \mid l)    (3)

Note that some labels may not appear in the given sentence. In this case, we define a NULL span and train the model to select the NULL span for such labels.

[1] According to (Stanovsky et al. 2018), this label set covers more than 96% of sentences.

Spans Candidates Selection

The main drawback of the span model is that too many spans generated from a sentence cause excessive computational cost and hurt the performance of the model. The number of all possible spans for a sentence of length T is T(T+1)/2, so the number of spans grows quickly with the length of the sentence. To reduce the number of spans, we propose three constraints for span candidates:

- Maximum length constraint: During training, we only keep argument spans of fewer than 10 words and predicate spans of fewer than 5 words. At inference stage, we remove this constraint.

- No overlapping constraint: We keep only the spans which do not overlap with the given predicate span.

- Syntactic constraint: This constraint applies only to spans longer than one word. We keep only the spans in which every word either is the syntactic parent of another word in the same span, or has its own parent in the same span. For the example shown in Figure 1, the span [luxury items at] violates this syntactic constraint because the word "at" is not the parent of any word in the same span.

Inference

For inference, we use span-consistent greedy search (Ouchi, Shindo, and Matsumoto 2018).

Predicate Inference: A predicate span that is completely included in another predicate span in the same sentence is dropped. For example, given the sentence [James wants to sell his company], the two spans [wants to sell] and [to sell] are both selected as predicates; we keep only the former.
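The span selection objective of Eqs. (1)-(3) can be sketched in a few lines of plain Python. This is only an illustration of the math, not the paper's implementation: it assumes the span scores \phi_\theta(i, j, l) have already been produced by some encoder, and the helper names (`span_probabilities`, `select_span`, `cross_entropy_loss`) are hypothetical.

```python
import math

def span_probabilities(phi):
    """Softmax over candidate span scores for one label (Eq. 2)."""
    m = max(phi)
    exps = [math.exp(s - m) for s in phi]  # shift by the max for numerical stability
    z = sum(exps)
    return [e / z for e in exps]

def select_span(phi, spans):
    """Eq. 1: return the candidate span with the highest score for a label."""
    best = max(range(len(phi)), key=lambda k: phi[k])
    return spans[best]

def cross_entropy_loss(phi_per_label, gold_index):
    """Eq. 3: sum of -log P_theta(gold span | label) over the gold structure."""
    loss = 0.0
    for label, gold in gold_index.items():
        probs = span_probabilities(phi_per_label[label])
        loss += -math.log(probs[gold])
    return loss
```

In a real system the per-span scores would be differentiable (e.g. produced by a BiLSTM encoder), so minimizing this loss trains \phi_\theta end to end.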
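Candidate generation under the maximum-length and no-overlapping constraints can be sketched as follows. Spans are 0-indexed, inclusive pairs (i, j); the function name and signature are illustrative, not from the paper.

```python
def enumerate_spans(n_tokens, predicate, max_arg_len=10):
    """Enumerate candidate argument spans (i, j) for a sentence of n_tokens words.

    Applies the maximum length constraint (span size strictly less than
    max_arg_len, per the paper's "less than 10 words" rule) and the
    no-overlapping constraint against the given predicate span (p_i, p_j).
    """
    p_i, p_j = predicate
    candidates = []
    for i in range(n_tokens):
        for j in range(i, n_tokens):
            if j - i + 1 >= max_arg_len:
                break  # maximum length constraint
            if not (j < p_i or i > p_j):
                continue  # span overlaps the predicate span
            candidates.append((i, j))
    return candidates
```

Without the constraints, a sentence of length T yields T(T+1)/2 candidates, which is what makes pruning necessary.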
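The syntactic constraint can be checked directly on a dependency parse. Below is a minimal sketch in which `heads[t]` gives the index of token t's syntactic parent (-1 for the root); the example head array is a hypothetical parse of the Table 1 sentence ("Repeat customers can purchase luxury items at reduced prices"), not one produced by the paper's parser.

```python
def satisfies_syntactic_constraint(span, heads):
    """Syntactic constraint for multi-word spans: every word in the span
    must either be the parent of another word in the span, or have its
    own parent inside the span. Single-word spans are exempt."""
    i, j = span
    if i == j:
        return True
    inside = set(range(i, j + 1))
    for t in inside:
        parent_inside = heads[t] in inside
        is_parent_of_inside = any(heads[u] == t for u in inside if u != t)
        if not (parent_inside or is_parent_of_inside):
            return False
    return True
```

With tokens indexed 0-8 and heads = [1, 3, 3, -1, 5, 3, 3, 8, 6], the span (4, 6) ("luxury items at") fails because "at" neither heads nor is headed by anything in the span, mirroring the paper's example, while (4, 5) ("luxury items") passes.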
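The predicate-inference rule (drop any predicate span fully contained in another) is a simple filter; this sketch uses a hypothetical function name and assumes predicates are inclusive (start, end) index pairs.

```python
def drop_nested_predicates(predicates):
    """Keep only predicate spans that are not completely contained
    in a different predicate span of the same sentence."""
    kept = []
    for (i, j) in predicates:
        contained = any(a <= i and j <= b and (a, b) != (i, j)
                        for (a, b) in predicates)
        if not contained:
            kept.append((i, j))
    return kept
```

For the sentence [James wants to sell his company] with predicate spans [wants to sell] = (1, 3) and [to sell] = (2, 3), only (1, 3) survives, matching the paper's example.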