Adjacency List Oriented Relational Fact Extraction via Adaptive Multi-task Learning

Fubang Zhao1*, Zhuoren Jiang2*†, Yangyang Kang1, Changlong Sun1, Xiaozhong Liu3
1Alibaba Group, Hangzhou, China
2School of Public Affairs, Zhejiang University, Hangzhou, China
3School of Informatics, Computing and Engineering, IUB, Bloomington, USA
[email protected], [email protected], [email protected], [email protected], [email protected]

arXiv:2106.01559v1 [cs.CL] 3 Jun 2021

* These two authors contributed equally to this research.
† Zhuoren Jiang is the corresponding author.

Abstract

Relational fact extraction aims to extract semantic triplets from unstructured text. In this work, we show that all relational fact extraction models can be organized according to a graph-oriented analytical perspective. An efficient model, aDjacency lIst oRiented rElational faCT (DIRECT), is proposed based on this analytical framework. To alleviate the challenges of error propagation and sub-task loss equilibrium, DIRECT employs a novel adaptive multi-task learning strategy with dynamic sub-task loss balancing. Extensive experiments are conducted on two benchmark datasets, and the results show that the proposed model outperforms a series of state-of-the-art (SoTA) models for relational triplet extraction.

Figure 1: Example of exploring the relational fact extraction task from the perspective of directed graph representation method as output data structure.

1 Introduction

Relational fact extraction, as an essential NLP task, plays an increasingly important role in knowledge graph construction (Han et al., 2019; Distiawan et al., 2019). It aims to extract relational triplets from text. A relational triplet takes the form (subject, relation, object), or (s, r, o) (Zeng et al., 2019). While various prior models have been proposed for relational fact extraction, few of them analyze this task from the perspective of output data structure.

As shown in Figure 1, relational fact extraction can be characterized as a directed graph construction task, where graph representation flexibility and heterogeneity bring additional benefits. In practice, there are three common ways to represent graphs (Gross and Yellen, 2005):

Edge List is utilized to predict a sequence of triplets (edges). The recent sequence-to-sequence based models, such as NovelTagging (Zheng et al., 2017), CopyRE (Zeng et al., 2018), CopyRL (Zeng et al., 2019), and PNDec (Nayak and Ng, 2020), fall into this category. An edge list is a simple and space-efficient way to represent a graph (Arifuzzaman and Khan, 2015). However, it has three problems. First, the triplet overlapping problem (Zeng et al., 2018): for instance, as shown in Figure 1, for the triplets (Obama, nationality, USA) and (Obama, president of, USA), there are two types of relations between "Obama" and "USA". If the model only generates one sequence from the text (Zheng et al., 2017), it may fail to identify multiple relations between entities. Second, to overcome the triplet overlapping problem, the model may have to extract triplet elements repeatedly (Zeng et al., 2018), which increases the extraction cost. Third, there can be an ordering problem (Zeng et al., 2019): for multiple triplets, the extraction order can influence model performance.

Adjacency Matrices are used to predict matrices that represent exactly which entities (vertices) have semantic relations (edges) between them. Most early works, which take a pipeline approach (Zelenko et al., 2003; Zhou et al., 2005), belong to this category. These models first recognize all entities in the text and then perform relation classification for each entity pair. The subsequent neural network-based models (Bekoulis et al., 2018; Dai et al., 2019), which attempt to extract entities and relations jointly, can also be classified into this category. Compared to an edge list, adjacency matrices have better relation (edge) searching efficiency (Arifuzzaman and Khan, 2015). Furthermore, adjacency-matrices-oriented models are able to cover the different overlapping cases (Zeng et al., 2018) of the relational fact extraction task. But the space cost of this approach can be expensive: in most cases, the output matrices are very sparse. For instance, for a sentence with n tokens and m kinds of relations, the output space is n · n · m, which can be costly in terms of graph representation efficiency. This phenomenon is also illustrated in Figure 1.

Adjacency List is designed to predict an array of linked lists that serves as a representation of a graph. As depicted in Figure 1, in the adjacency list, each vertex v (key) points to a list (value) containing all other vertices connected to v by several edges. The adjacency list is a hybrid graph representation between the edge list and adjacency matrices (Gross and Yellen, 2005), which can balance space and searching efficiency[1]. Due to the structural characteristic of the adjacency list, this type of model usually adopts a cascade fashion to identify subject, object, and relation sequentially. For instance, the recent state-of-the-art model CasRel (Wei et al., 2020) can be considered an exemplar. It utilizes a two-step framework to recognize the possible object(s) of a given subject under a specific relation. However, CasRel is not fully adjacency list oriented: in the first step, it uses the subject as the key; while in the second step, it predicts (relation, object) pairs using an adjacency matrix representation.

Despite its considerable potential, the cascade fashion of adjacency-list-oriented models may cause problems of sub-task error propagation (Shen et al., 2019), i.e., errors from ancestor sub-tasks may accumulate and threaten downstream ones, and sub-tasks can hardly share supervision signals. Multi-task learning (Caruana, 1997) can alleviate this problem; however, the sub-task loss balancing problem (Chen et al., 2018; Sener and Koltun, 2018) could compromise its performance.

Based on this analysis from the perspective of output data structure, we propose a novel solution, the aDjacency lIst oRiented rElational faCT extraction model (DIRECT), with the following advantages:

• For efficiency, DIRECT is a fully adjacency list oriented model, which consists of a shared BERT encoder, Pointer-Network-based subject and object extractors, and a relation classification module. In Section 3.4, we provide a detailed comparative analysis[2] to demonstrate the efficiency of the proposed method.

• From the performance viewpoint, to address the sub-task error propagation and sub-task loss balancing problems, DIRECT employs a novel adaptive multi-task learning strategy with a dynamic sub-task loss balancing approach. In Sections 3.2 and 3.3, the empirical experimental results demonstrate that DIRECT achieves state-of-the-art performance on the relational fact extraction task, and that the adaptive multi-task learning strategy plays a positive role in improving task performance.

The major contributions of this paper can be summarized as follows:

1. We reframe the relational fact extraction problem by leveraging an analytical framework of graph-oriented output structure. To the best of our knowledge, this is a pioneering investigation into the output data structure of relational fact extraction.

2. We propose a novel solution, DIRECT[3], which is a fully adjacency list oriented model with a novel adaptive multi-task learning strategy.

3. Through extensive experiments on two benchmark datasets[3], we demonstrate the efficiency and efficacy of DIRECT. The proposed DIRECT outperforms the state-of-the-art baseline models.

[1] More detailed complexity analyses of different graph representations are provided in Appendix section 6.3.
[2] Theoretical representation efficiency analyses of graph representative models are described in Appendix section 6.4.
[3] To help other scholars reproduce the experiment outcome, we will release the code and datasets via GitHub: https://github.com/fyubang/direct-ie.

2 The DIRECT Framework

In this section, we introduce the framework of the proposed DIRECT model, which includes a shared BERT encoder and three output layers: subject extraction, object extraction, and relation classification. As shown in Figure 2, DIRECT is fully adjacency list oriented. The input sentence is first fed into the subject extraction module to extract all subjects. Then each extracted subject is concatenated with the sentence and fed into the object extraction module to extract all objects, which forms a set of subject-object pairs. Finally, each subject-object pair is concatenated with the sentence and fed into the relation classification module to get the relations between them. To balance the weights of the sub-task losses and improve global task performance, the three modules share the BERT encoder layer and are trained with an adaptive multi-task learning strategy.

Figure 2: An overview of the proposed DIRECT framework.

2.1 Shared BERT Encoder

In the DIRECT framework, the encoder is used to extract semantic features from the inputs for the three modules. As aforementioned, we employ BERT (Devlin et al., 2019) as the shared encoder to make use of its pre-trained knowledge and attention mechanism.

2.2 Subject and Object Extraction

The subject and object extraction modules are motivated by the Pointer-Network (Vinyals et al., 2015) architecture, which is widely used in the Machine Reading Comprehension (MRC) (Rajpurkar et al., 2016) task. Different from the MRC task, which only needs to extract a single span, subject and object extraction needs to extract multiple spans. Therefore, in the training phase, we replace the softmax function with the sigmoid function as the activation function of the output layer, and replace cross entropy (CE) (Goodfellow et al., 2016) with binary cross entropy (BCE) (Luc et al., 2016) as the loss function. Specifically, we perform independent binary classifications for each token twice, to indicate whether the current token is the start or the end of a span. The probability of a token being a start or end is as follows:
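In the standard pointer-network formulation with sigmoid activations described above, these per-token probabilities are typically computed from the shared encoder's output; a plausible form (the weight and bias symbols here are illustrative, not necessarily the paper's exact notation) is:

```latex
p_i^{\mathrm{start}} = \sigma\!\left(\mathbf{w}_{\mathrm{start}} \cdot \mathbf{h}_i + b_{\mathrm{start}}\right), \qquad
p_i^{\mathrm{end}} = \sigma\!\left(\mathbf{w}_{\mathrm{end}} \cdot \mathbf{h}_i + b_{\mathrm{end}}\right)
```

where $\mathbf{h}_i$ is the shared BERT encoder's representation of the $i$-th token and $\sigma$ is the sigmoid function, so each token receives two independent binary decisions (start and end).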
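Because the extractors use independent per-token sigmoid outputs rather than a single softmax, several spans can exceed the threshold at once. The sketch below shows one way to decode multiple spans from start/end probability vectors; the pairing heuristic (match each start to the nearest following end) is an assumption for illustration, not necessarily the paper's exact decoding rule.

```python
# Decode multiple spans from per-token start/end probabilities, as produced
# by the sigmoid output layers described in Section 2.2.

def decode_spans(start_probs, end_probs, threshold=0.5):
    """Return (start, end) token-index pairs for all detected spans."""
    starts = [i for i, p in enumerate(start_probs) if p >= threshold]
    ends = [i for i, p in enumerate(end_probs) if p >= threshold]
    spans = []
    for s in starts:
        # Heuristic: pair each start with the nearest end at or after it.
        candidates = [e for e in ends if e >= s]
        if candidates:
            spans.append((s, min(candidates)))
    return spans

# Two spans: tokens 1-2 and token 5 alone.
start_p = [0.1, 0.9, 0.2, 0.1, 0.1, 0.8]
end_p   = [0.1, 0.1, 0.9, 0.1, 0.1, 0.7]
print(decode_spans(start_p, end_p))  # [(1, 2), (5, 5)]
```

With a softmax over tokens, only one start and one end could be selected; the sigmoid-per-token scheme is what makes multi-span extraction possible.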
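The cascade inference flow of Section 2 (subjects, then objects conditioned on each subject, then relations for each subject-object pair) can be sketched as follows. The three `extract_*` functions stand in for the BERT-based modules; their names and the toy string-matching rules inside them are hypothetical placeholders, not the paper's implementation.

```python
# A minimal sketch of DIRECT's adjacency-list-oriented cascade:
# subjects -> (subject + sentence) -> objects -> (pair + sentence) -> relations.

def extract_subjects(sentence):
    # Placeholder for the Pointer-Network subject extractor.
    return ["Obama"] if "Obama" in sentence else []

def extract_objects(subject, sentence):
    # Placeholder: in DIRECT the subject is concatenated with the
    # sentence so object extraction is conditioned on it.
    return ["USA"] if "USA" in sentence else []

def classify_relations(subject, obj, sentence):
    # Placeholder for the relation classification module; a pair may
    # hold several relations (the overlapping-triplet case).
    return ["nationality", "president_of"]

def extract_facts(sentence):
    """Run the three modules in cascade and collect (s, r, o) triplets."""
    facts = []
    for s in extract_subjects(sentence):
        for o in extract_objects(s, sentence):
            for r in classify_relations(s, o, sentence):
                facts.append((s, r, o))
    return facts

print(extract_facts("Obama was the president of the USA."))
```

Note how a single subject-object pair naturally yields multiple triplets, which is how the cascade covers the triplet overlapping cases discussed in the Introduction.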
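For concreteness, the three graph representations contrasted in the Introduction can be instantiated on the paper's Figure 1 example (the Obama/USA triplets); the data structures below are illustrative Python equivalents, not the models' actual outputs.

```python
# Three ways to represent the same relational facts (Figure 1 example).
triplets = [
    ("Obama", "nationality", "USA"),
    ("Obama", "president_of", "USA"),
]

# 1) Edge list: a flat sequence of (subject, relation, object) edges.
edge_list = list(triplets)

# 2) Adjacency matrices: one |V| x |V| boolean matrix per relation type;
#    for n tokens and m relations the output space is n * n * m, mostly sparse.
vertices = ["Obama", "USA"]
relations = ["nationality", "president_of"]
v_idx = {v: i for i, v in enumerate(vertices)}
r_idx = {r: i for i, r in enumerate(relations)}
adj = [[[False] * len(vertices) for _ in vertices] for _ in relations]
for s, r, o in triplets:
    adj[r_idx[r]][v_idx[s]][v_idx[o]] = True

# 3) Adjacency list: each subject (key) maps to its (relation, object) pairs.
adj_list = {}
for s, r, o in triplets:
    adj_list.setdefault(s, []).append((r, o))

print(adj_list)  # {'Obama': [('nationality', 'USA'), ('president_of', 'USA')]}
```

The adjacency list stores both overlapping relations under one key without the repeated subject entries of the edge list or the sparse n · n · m tensor of the matrix form, which is the space/search trade-off the paper exploits.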