More Data, More Relations, More Context and More Openness: A Review and Outlook for Relation Extraction

Xu Han1∗, Tianyu Gao2∗, Yankai Lin3∗, Hao Peng1, Yaoliang Yang1, Chaojun Xiao1, Zhiyuan Liu1†, Peng Li3, Maosong Sun1, Jie Zhou3
1Department of Computer Science and Technology, Tsinghua University, Beijing, China
2Princeton University, Princeton, NJ, USA
3Pattern Recognition Center, WeChat AI, Tencent Inc., China
[email protected], [email protected], [email protected]

Abstract

Relational facts are an important component of human knowledge, which are hidden in vast amounts of text. In order to extract these facts from text, people have been working on relation extraction (RE) for years. From early pattern matching to current neural networks, existing RE methods have achieved significant progress. Yet with the explosion of Web text and the emergence of new relations, human knowledge is increasing drastically, and we thus require "more" from RE: a more powerful RE system that can robustly utilize more data, efficiently learn more relations, easily handle more complicated context, and flexibly generalize to more open domains. In this paper, we look back at existing RE methods, analyze key challenges we are facing nowadays, and show promising directions towards more powerful RE. We hope our view can advance this field and inspire more efforts in the community.1

∗ indicates equal contribution
† Corresponding author e-mail: [email protected]
1 Most of the papers mentioned in this work are collected into the following paper list: https://github.com/thunlp/NREPapers.

1 Introduction

Relational facts organize knowledge of the world in a triplet format. These structured facts form an important part of human knowledge and are explicitly or implicitly hidden in text. For example, "Steve Jobs co-founded Apple" indicates the fact (Apple Inc., founded by, Steve Jobs), and we can also infer the fact (USA, contains, New York) from "Hamilton made its debut in New York, USA". As these structured facts could benefit downstream applications, e.g., knowledge graph completion (Bordes et al., 2013; Wang et al., 2014), search engines (Xiong et al., 2017; Schlichtkrull et al., 2018) and question answering (Bordes et al., 2014; Dong et al., 2015), many efforts have been devoted to researching relation extraction (RE), which aims at extracting relational facts from plain text. More specifically, after identifying entity mentions (e.g., USA and New York) in text, the main goal of RE is to classify relations (e.g., contains) between these entity mentions from their context.

The pioneering explorations of RE lie in statistical approaches, such as pattern mining (Huffman, 1995; Califf and Mooney, 1997), feature-based methods (Kambhatla, 2004) and graphical models (Roth and Yih, 2002). Recently, with the development of deep learning, neural models have been widely adopted for RE (Zeng et al., 2014; Zhang et al., 2015) and have achieved superior results. These RE methods have bridged the gap between unstructured text and structured knowledge, and have shown their effectiveness on several public benchmarks.

Despite the success of existing RE methods, most of them still work in a simplified setting. These methods mainly focus on training models with large amounts of human annotations to classify two given entities within one sentence into pre-defined relations. However, the real world is much more complicated than this simple setting: (1) collecting high-quality human annotations is expensive and time-consuming, (2) many long-tail relations cannot provide large amounts of training examples, (3) most facts are expressed by long context consisting of multiple sentences, and moreover (4) using a pre-defined relation set to cover relations with open-ended growth is difficult. Hence, to build an effective and robust RE system for real-world deployment, there are still more complex scenarios to be further investigated.

In this paper, we review existing RE methods (Section 2) as well as the latest RE explorations (Section 3) targeting more complex RE scenarios. Those feasible approaches leading to better RE abilities still require further efforts, and here we summarize them into four directions:

(1) Utilizing More Data (Section 3.1). Supervised RE methods heavily rely on expensive human annotations, while distant supervision (Mintz et al., 2009) introduces more auto-labeled data to alleviate this issue. Yet distant methods bring noisy examples and only utilize single sentences mentioning entity pairs, which significantly weakens extraction performance. Designing schemas to obtain high-quality and high-coverage data for training robust RE models remains a problem to be explored.

(2) Performing More Efficient Learning (Section 3.2). Many long-tail relations only provide a handful of training examples. However, it is hard for conventional RE methods to generalize relation patterns from limited examples as humans do. Therefore, developing efficient learning schemas to make better use of limited or few-shot examples is a potential research direction.

(3) Handling More Complicated Context (Section 3.3). Many relational facts are expressed in complicated context (e.g., multiple sentences or even documents), while most existing RE models focus on extracting intra-sentence relations. To cover those complex facts, it is valuable to investigate RE in more complicated context.

(4) Orienting More Open Domains (Section 3.4). New relations emerge every day from different domains in the real world, and thus it is hard to cover all of them by hand. However, conventional RE frameworks are generally designed for pre-defined relations. Therefore, how to automatically detect undefined relations in open domains remains an open problem.

Besides introducing promising directions, we also point out two key challenges for existing methods: (1) learning from text or names (Section 4.1) and (2) datasets towards special interests (Section 4.2). We hope that all these contents can encourage the community to make further exploration and breakthroughs towards better RE.

2 Background and Existing Work

Information extraction (IE) aims at extracting structural information from unstructured text, and is an important field in natural language processing (NLP). Relation extraction (RE), as an important task in IE, particularly focuses on extracting relations between entities. A complete relation extraction system consists of a named entity recognizer to identify named entities (e.g., people, organizations, locations) from text, an entity linker to link entities to existing knowledge graphs (KGs; necessary when using relation extraction for knowledge graph completion), and a relational classifier to determine relations between entities given their context.

Among these steps, identifying the relation is the most crucial and difficult task, since it requires models to understand the semantics of the context well. Hence, RE generally focuses on the classification part, which is also known as relation classification. As shown in Figure 1, a typical RE setting is that, given a sentence with two marked entities, models need to classify the sentence into one of the pre-defined relations.2

[Figure 1: An example of RE. Given two entities and one sentence mentioning them (here, "Tim Cook is Apple's current CEO."), RE models classify the relation between them within a pre-defined relation set (e.g., CEO: 0.89, founder: 0.05, ..., place of birth: 0.01).]

In this section, we introduce the development of RE methods following the typical supervised setting, from early pattern-based methods and statistical approaches to recent neural models.

2 Sometimes there is a special class in the relation set indicating that the sentence does not express any pre-specified relation (usually named N/A).
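To make this typical supervised setting concrete, the following minimal Python sketch shows how a sentence-level RE instance and a classifier interface can be organized. It is illustrative only: the relation labels, class names, and the keyword heuristic are assumptions for the example, not a description of any particular system from the literature.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Pre-defined relation set; "N/A" marks sentences expressing no listed relation
# (cf. footnote 2). The specific labels are illustrative assumptions.
RELATIONS = ["founder", "CEO", "place_of_birth", "contains", "N/A"]

@dataclass
class REInstance:
    """One sentence-level RE example: a sentence with two marked entity mentions."""
    tokens: List[str]      # tokenized sentence
    head: Tuple[int, int]  # [start, end) token span of the head entity
    tail: Tuple[int, int]  # [start, end) token span of the tail entity

def classify(instance: REInstance) -> str:
    """Toy relation classifier: a keyword heuristic standing in for a trained model.

    Real systems replace this with a statistical or neural classifier that scores
    every label in RELATIONS and returns the highest-scoring one (as in Figure 1).
    """
    context = " ".join(instance.tokens).lower()
    if "ceo" in context:
        return "CEO"
    if "founded" in context or "co-founded" in context:
        return "founder"
    return "N/A"

if __name__ == "__main__":
    example = REInstance(
        tokens="Tim Cook is Apple 's current CEO .".split(),
        head=(0, 2),  # "Tim Cook"
        tail=(3, 4),  # "Apple"
    )
    print(classify(example))  # -> "CEO"
```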
2.1 Pattern Extraction Models

The pioneering methods use sentence analysis tools to identify syntactic elements in text, and then automatically construct pattern rules from these elements (Soderland et al., 1995; Kim and Moldovan, 1995; Huffman, 1995; Califf and Mooney, 1997). In order to extract patterns with better coverage and accuracy, later work involves larger corpora (Carlson et al., 2010), more formats of patterns (Nakashole et al., 2012; Jiang et al., 2017), and more efficient ways of extraction (Zheng et al., 2019). As automatically constructed patterns may contain mistakes, most of the above methods require further examination by human experts, which is the main limitation of pattern-based models.
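As an illustration of this paradigm, the sketch below hand-codes two surface patterns and applies them to raw sentences. It is a minimal, assumption-laden example: real pattern-based systems induce such rules automatically from parsed corpora, and the templates and relation names here are invented for the illustration.

```python
import re

# Hand-written surface patterns of the kind early systems induced from syntactic
# analyses; each maps a template to a relation label. Both the templates and the
# relation names below are illustrative assumptions.
PATTERNS = [
    (re.compile(r"(?P<head>[A-Z][\w .]*?) was founded by (?P<tail>[A-Z][\w .]*)"), "founded_by"),
    (re.compile(r"(?P<head>[A-Z][\w .]*?) is located in (?P<tail>[A-Z][\w .]*)"), "located_in"),
]

def extract(sentence: str):
    """Return (head, relation, tail) triples for every pattern matching the sentence."""
    triples = []
    for pattern, relation in PATTERNS:
        for match in pattern.finditer(sentence):
            triples.append((match.group("head").strip(),
                            relation,
                            match.group("tail").strip()))
    return triples

if __name__ == "__main__":
    print(extract("Apple was founded by Steve Jobs."))
    # e.g. [('Apple', 'founded_by', 'Steve Jobs.')] -- note the stray period:
    # such brittleness is why automatically constructed patterns still need
    # checking by human experts.
```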
2.2 Statistical Relation Extraction Models

Compared to pattern rules, statistical methods bring better coverage and require less human effort. Thus, statistical relation extraction (SRE) has been extensively studied.

[Figure 2: The performance (F1 score, %) of state-of-the-art RE models in different years, from SRE before 2013 to now, on the widely-used SemEval dataset.]

One typical SRE approach is feature-based methods (Kambhatla, 2004; Zhou et al., 2005; Jiang and Zhai, 2007; Nguyen et al., 2007), which design lexical, syntactic and semantic features for entity pairs and their corresponding context, and then feed these features into relation classifiers. Due to the wide use of support vector machines (SVMs), kernel-based methods have also been widely explored; these design kernel functions for SVMs to measure the similarities between relation representations and textual instances (Culotta