arXiv:1905.05529v4 [cs.CL] 4 Sep 2019

Entity-Relation Extraction as Multi-turn Question Answering

Xiaoya Li¹*, Fan Yin¹*, Zijun Sun¹, Xiayu Li¹, Arianna Yuan¹,², Duo Chai¹, Mingxin Zhou¹ and Jiwei Li¹
¹ Shannon.AI
² Computer Science Department, Stanford University
{xiaoya_li, fan_yin, zijun_sun, xiayu_li, arianna_yuan, duo_chai, mingxin_zhou, jiwei_li}@shannonai.com

* indicates equal contribution. To appear in ACL 2019.

Abstract

In this paper, we propose a new paradigm for the task of entity-relation extraction. We cast the task as a multi-turn question answering problem, i.e., the extraction of entities and relations is transformed to the task of identifying answer spans from the context. This multi-turn QA formalization comes with several key advantages: firstly, the question query encodes important information for the entity/relation class we want to identify; secondly, QA provides a natural way of jointly modeling entity and relation; and thirdly, it allows us to exploit the well developed machine reading comprehension (MRC) models. Experiments on the ACE and the CoNLL04 corpora demonstrate that the proposed paradigm significantly outperforms previous best models. We are able to obtain state-of-the-art results on all of the ACE04, ACE05 and CoNLL04 datasets, increasing the SOTA results on the three datasets to 49.4 (+1.0), 60.2 (+0.6) and 68.9 (+1.1), respectively. Additionally, we construct a newly developed dataset RESUME in Chinese, which requires multi-step reasoning to construct entity dependencies, as opposed to the single-step dependency extraction in the triplet extraction of previous datasets. The proposed multi-turn QA model also achieves the best performance on the RESUME dataset.

1 Introduction

Identifying entities and their relations is the prerequisite of extracting structured knowledge from unstructured raw texts, which has received growing interest these years. Given a chunk of natural language text, the goal of entity-relation extraction is to transform it to a structural knowledge base. For example, given the following text:

In 2002, Musk founded SpaceX, an aerospace manufacturer and space transport services Company, of which he is CEO and lead designer. In 2003, he helped fund Tesla, Inc., an electric vehicle and solar panel manufacturer, and became its CEO and product architect. In 2006, he inspired the creation of SolarCity, a solar energy services Company, and operates as its chairman. In 2016, he co-founded Neuralink, a Company focused on developing brain-computer interfaces, and founded The Boring Company, an infrastructure and tunnel-construction Company.

We need to extract four different types of entities, i.e., Person, Company, Time and Position, and three types of relations, FOUND, FOUNDING-TIME and SERVING-ROLE. The text is to be transformed into a structural dataset shown in Table 1.

Person   Corp                 Time   Position
Musk     SpaceX               2002   CEO
Musk     Tesla                2003   CEO & product architect
Musk     SolarCity            2006   chairman
Musk     Neuralink            2016   CEO
Musk     The Boring Company   2016   -

Table 1: An illustration of an extracted structural table.
Most existing models approach this task by extracting a list of triples from the text, i.e., REL(e1, e2), which denotes that relation REL holds between entity e1 and entity e2. Previous models fall into two major categories: the pipelined approach, which first uses tagging models to identify entities, and then uses relation extraction models to identify the relation between each entity pair; and the joint approach, which combines the entity model and the relation model through different strategies, such as constraints or parameter sharing.

There are several key issues with current approaches, both in terms of the task formalization and the algorithm. At the formalization level, the REL(e1, e2) triplet structure is not enough to fully express the data structure behind the text. Take the Musk case as an example: there is a hierarchical dependency between the tags. The extraction of Time depends on Position, since a Person can hold multiple Positions in a Company during different Time periods. The extraction of Position also depends on Company, since a Person can work for multiple companies. At the algorithm level, for most existing relation extraction models (Miwa and Bansal, 2016; Wang et al., 2016a; Ye et al., 2016), the input to the model is a raw sentence with two marked mentions, and the output is whether a relation holds between the two mentions. As pointed out in Wang et al. (2016a); Zeng et al. (2018), it is hard for neural models to capture all the lexical, semantic and syntactic cues in this formalization, especially when (1) entities are far away; (2) one entity is involved in multiple triplets; or (3) relation spans have overlaps³.

³ e.g., in the text ABCD, (A, C) is a pair and (B, D) is a pair.

In the paper, we propose a new paradigm to handle the task of entity-relation extraction. We formalize the task as a multi-turn question answering task: each entity type and relation type is characterized by a question answering template, and entities and relations are extracted by answering template questions. Answers are text spans, extracted using the now standard machine reading comprehension (MRC) framework: predicting answer spans given context (Seo et al., 2016; Wang and Jiang, 2016; Xiong et al., 2017; Wang et al., 2016b). To extract structural data like Table 1, the model needs to answer the following questions sequentially:

• Q: who is mentioned in the text? A: Musk;
• Q: which companies did Musk work for? A: SpaceX, Tesla, SolarCity, Neuralink and The Boring Company;
• Q: when did Musk join SpaceX? A: 2002;
• Q: what was Musk's Position in SpaceX? A: CEO.

Treating the entity-relation extraction task as a multi-turn QA task has the following key advantages: (1) the multi-turn QA setting provides an elegant way to capture the hierarchical dependency of tags. As the multi-turn QA proceeds, we progressively obtain the entities we need for the next turn. This is closely akin to the multi-turn slot filling dialogue system (Williams and Young, 2005; Lemon et al., 2006); (2) the question query encodes important prior information for the relation class we want to identify. This informativeness can potentially solve issues that existing relation extraction models fail to solve, such as distantly-separated entity pairs, relation span overlap, etc.; (3) the QA framework provides a natural way to simultaneously extract entities and relations: most MRC models support outputting special NONE tokens, indicating that there is no answer to the question. Through this, the original two tasks, entity extraction and relation extraction, can be merged into a single QA task: a relation holds if the returned answer to the question corresponding to that relation is not NONE, and this returned answer is the entity that we wish to extract.

In this paper, we show that the proposed paradigm, which transforms the entity-relation extraction task to a multi-turn QA task, introduces significant performance boost over existing systems. It achieves state-of-the-art (SOTA) performance on the ACE and the CoNLL04 datasets. The tasks on these datasets are formalized as triplet extraction problems, in which two turns of QA suffice.
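To make the turn structure listed above concrete, the following is a minimal sketch of the question sequence in Python. It is illustrative only: the `ask` callable and `extract_table` name are our assumptions, standing in for whatever MRC model answers the template questions (returning an empty list when the model would output NONE); they are not code from the paper.

```python
# A minimal sketch of the multi-turn question sequence from the Musk example.
# `ask(question, text)` is an assumed interface to an MRC model that returns
# a list of answer spans (empty when the model would output NONE).

def extract_table(text, ask):
    rows = []
    # Turn 1: head entities (Person).
    for person in ask("who is mentioned in the text?", text):
        # Turn 2: Companies related to the Person.
        for company in ask(f"which companies did {person} work for?", text):
            # Turns 3-4: Time and Position depend on both earlier answers.
            times = ask(f"when did {person} join {company}?", text)
            roles = ask(f"what was {person}'s position in {company}?", text)
            rows.append((person, company,
                         times[0] if times else None,
                         roles[0] if roles else None))
    return rows
```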
We thus build a more complicated and more difficult dataset called RESUME, which requires extracting biographical information of individuals from raw texts. The construction of a structural knowledge base from RESUME requires four or five turns of QA. We also show that this multi-turn QA setting can easily integrate reinforcement learning (just as in multi-turn dialog systems) to gain additional performance boost.

The rest of this paper is organized as follows: Section 2 details related work. We describe the dataset and setting in Section 3, the proposed model in Section 4, and experimental results in Section 5. We conclude this paper in Section 6.

2 Related Work

2.1 Extracting Entities and Relations

Many earlier entity-relation extraction systems are pipelined (Zelenko et al., 2003; Miwa et al., 2009; Chan and Roth, 2011; Lin et al., 2016): an entity extraction model first identifies entities of interest and a relation extraction model then constructs relations between the extracted entities. Although pipelined systems have the flexibility of integrating different data sources and learning algorithms, they suffer significantly from error propagation.

To tackle this issue, joint learning models have been proposed. Earlier joint learning approaches connect the two models through various dependencies, including constraints solved by integer linear programming (Yang and Cardie, 2013; Roth and Yih, 2007), card-pyramid parsing (Kate and Mooney, 2010), and global probabilistic graphical models (Yu and Lam, 2010; Singh et al., 2013). In later studies, Li and Ji (2014) extract entity mentions and relations using a structured perceptron with efficient beam-search, which is significantly more efficient and less time-consuming than constraint-based approaches. Miwa and Sasaki (2014); Gupta et al.
(2016); Zhang et al. (2017) proposed the table-filling approach, which provides an opportunity to incorporate more sophisticated features and algorithms into the model, such as search orders in decoding and global features. Neural network models have been widely used in the literature as well. Miwa and Bansal (2016) introduced an end-to-end approach that extracts entities and their relations using neural network models with shared parameters, i.e., extracting entities using a neural tagging model and extracting relations using a neural multi-class classification model based on tree LSTMs (Tai et al., 2015). Wang et al. (2016a) extract relations using multi-level attention CNNs. Zeng et al. (2018) proposed a new framework that uses sequence-to-sequence models to generate entity-relation triples, naturally combining entity detection and relation detection.

Another way to bind the entity and the relation extraction models is to use reinforcement learning or Minimum Risk Training, in which the training signals are given based on the joint decision by the two models. Sun et al. (2018) optimized a global loss function to jointly train the two models under the framework of Minimum Risk Training. Takanobu et al. (2018) used hierarchical reinforcement learning to extract entities and relations in a hierarchical manner.

2.2 Machine Reading Comprehension

Main-stream MRC models (Seo et al., 2016; Wang and Jiang, 2016; Xiong et al., 2017; Wang et al., 2016b) extract text spans in passages given queries. Text span extraction can be simplified to two multi-class classification tasks, i.e., predicting the starting and the ending positions of the answer. A similar strategy can be extended to multi-passage MRC (Joshi et al., 2017; Dunn et al., 2017), where the answer needs to be selected from multiple passages. Multi-passage MRC tasks can be easily simplified to single-passage MRC tasks by concatenating passages (Shen et al., 2017; Wang et al., 2017b). Wang et al. (2017a) first rank the passages and then run single-passage MRC on the selected passage. Tan et al. (2017) train the passage ranking model jointly with the reading comprehension model. Pretraining methods like BERT (Devlin et al., 2018) or ELMo (Peters et al., 2018) have proved to be extremely helpful in MRC tasks.

There has been a tendency of casting non-QA NLP tasks as QA tasks (McCann et al., 2018). Our work is highly inspired by Levy et al. (2017). Levy et al. (2017) and McCann et al. (2018) focus on identifying the relation between two pre-defined entities, and the authors formalize the task of relation extraction as a single-turn QA task. In the current paper we study a more complicated scenario, where hierarchical tag dependency needs to be modeled and the single-turn QA approach no longer suffices. We show that our multi-turn QA method is able to solve this challenge and obtain new state-of-the-art results.

3 Datasets and Tasks

3.1 ACE04, ACE05 and CoNLL04

We use ACE04, ACE05 and CoNLL04 (Roth and Yih, 2004), the widely used entity-relation extraction benchmarks, for evaluation. ACE04 defines 7 entity types, including Person (PER), Organization (ORG), Geographical Entities (GPE), Location (LOC), Facility (FAC), Weapon (WEA) and Vehicle (VEH). For each pair of entities, it defines 7 relation categories, including Physical (PHYS), Person-Social (PER-SOC), Employment-Organization (EMP-ORG), Agent-Artifact (ART), PER/ORG Affiliation (OTHER-AFF), GPE-Affiliation (GPE-AFF) and Discourse (DISC). ACE05 was built upon ACE04. It kept the PER-SOC, ART and GPE-AFF categories from ACE04, but split PHYS into PHYS and a new relation category PART-WHOLE. It also deleted DISC and merged EMP-ORG and OTHER-AFF into a new category EMP-ORG. As for CoNLL04, it defines four entity types (LOC, ORG, PER and OTHERS) and five relation categories (LOCATED_IN, WORK_FOR, ORGBASED_IN, LIVE_IN and KILL).

For ACE04 and ACE05, we followed the training/dev/test split in Li and Ji (2014) and Miwa and Bansal (2016)⁴.
For the CoNLL04 dataset, we followed Miwa and Sasaki (2014).

⁴ https://github.com/tticoin/LSTM-ER/.

3.2 RESUME: A newly constructed dataset

           Total #   Average # per passage
Person     961       1.09
Company    1988      2.13
Position   2687      1.33
Time       1275      1.01

Table 2: Statistics for the RESUME dataset.

The ACE and the CoNLL04 datasets are intended for triplet extraction, and two turns of QA are sufficient to extract the triplet (one turn for head-entities and another for joint extraction of tail-entities and relations). These datasets do not involve hierarchical entity relations as in our previous Musk example, which are prevalent in real-life applications.

Therefore, we construct a new dataset called RESUME. We extracted 841 paragraphs from chapters describing management teams in IPO prospectuses. Each paragraph describes some work history of an executive. We wish to extract the structural data from the resume. The dataset is in Chinese. The following shows an example:

郑强先生,本公司监事,1973年出生,中国国籍,无境外永久居留权。1995年,毕业于南京大学经济管理专业;1995年至1998年,就职于江苏常州公路运输有限公司,任主办会计;1998年至2000年,就职于越秀会计师事务所,任项目经理;2000年至2010年,就职于国富浩华会计师事务所有限公司广东分所,历任项目经理、部门经理、合伙人及副主任会计师;2010年至2011年,就职于广东中科招商创业投资管理有限责任公司,任副总经理;2011年至今,任广东中广投资管理有限公司董事、总经理;2016年至今,任湛江中广创业投资有限公司董事、总经理;2016年3月至今,担任本公司监事。

Mr. Zheng Qiang, a supervisor of the Company. He was born in 1973. His nationality is Chinese, with no permanent residency abroad. He graduated from Nanjing University with a major in economic management in 1995. From 1995 to 1998, he worked for Jiangsu Changzhou Road Transportation Co., Ltd. as an organizer of accounting. From 1998 to 2000, he worked as a project manager in Yuexiu Certified Public Accountants. From 2000 to 2010, he worked in the Guangdong branch of Guofu Haohua Certified Public Accountants Co., Ltd., and served as a project manager, department manager, partner and deputy chief accountant. From 2010 to 2011, he worked for Guangdong Zhongke Investment Venture Capital Management Co., Ltd.
as a deputy general manager; since 2011, he has served as the director and general manager of Guangdong Zhongguang Investment Management Co., Ltd.; since 2016, he has served as director and general manager of Zhanjiang Zhongguang Venture Capital Co., Ltd.; since March 2016, he has served as the supervisor of the Company.

We identify four types of entities: Person (the name of the executive), Company (the company that the executive works/worked for), Position (the position that he/she holds/held) and Time (the time period that the executive occupies/occupied that position). It is worth noting that one person can work for different companies during different periods of time, and that one person can hold different positions in different periods of time for the same company.

We recruited crowdworkers to fill the slots in Table 1. Each passage is labeled by two different crowdworkers. If labels from the two annotators disagree, one or more annotators were asked to label the sentence and a majority vote was taken as the final decision. Since the wording of the text is usually very explicit and formal, the inter-agreement between annotators is very high, achieving a value of 93.5% for all slots. Some statistics of the dataset are shown in Table 2. We randomly split the dataset into training (80%), validation (10%) and test (10%) sets.

4 Model

4.1 System Overview

The overview of the algorithm is shown in Algorithm 1. The algorithm contains two stages:

(1) The head-entity extraction stage (lines 4-9): each episode of multi-turn QA is triggered by an entity. To extract this starting entity, we transform each entity type to a question using EntityQuesTemplates (line 4), and the entity e is extracted by answering the question (line 5). If the system outputs the special NONE token, it means that s does not contain any entity of that type.

(2) The relation and the tail-entity extraction stage (lines 10-24): ChainOfRelTemplates defines a chain of relations, the order of which we need to follow to run multi-turn QA. The reason is that the extraction of some entities depends on the extraction of others.

Relation Type | head-e | tail-e | Natural Language Question | Template Question
GEN-AFF       | FAC    | GPE    | find a geo-political entity that connects to XXX | XXX; has affiliation; geo-political entity
PART-WHOLE    | FAC    | FAC    | find a facility that geographically relates to XXX | XXX; part whole; facility
PART-WHOLE    | FAC    | GPE    | find a geo-political entity that geographically relates to XXX | XXX; part whole; geo-political entity
PART-WHOLE    | FAC    | VEH    | find a vehicle that belongs to XXX | XXX; part whole; vehicle
PHYS          | FAC    | FAC    | find a facility near XXX? | XXX; physical; facility
ART           | GPE    | FAC    | find a facility which is made by XXX | XXX; agent artifact; facility
ART           | GPE    | VEH    | find a vehicle which is owned or used by XXX | XXX; agent artifact; vehicle
ART           | GPE    | WEA    | find a weapon which is owned or used by XXX | XXX; agent artifact; weapon
ORG-AFF       | GPE    | ORG    | find an organization which is invested by XXX | XXX; organization affiliation; organization
PART-WHOLE    | GPE    | GPE    | find a geo-political entity which is controlled by XXX | XXX; part whole; geo-political entity
PART-WHOLE    | GPE    | LOC    | find a location geographically related to XXX | XXX; part whole; location

Table 3: Some of the question templates for different relation types in ACE.

Q1 Person:   who is mentioned in the text? A: e1
Q2 Company:  which companies did e1 work for? A: e2
Q3 Position: what was e1's position in e2? A: e3
Q4 Time:     during which period did e1 work for e2 as e3? A: e4

Table 4: Question templates for the RESUME dataset.

For example, in the RESUME dataset, the Position held by an executive relies on the Company he works for. The extraction of the Time entity likewise relies on the extraction of both the Company and the Position. The extraction order is manually pre-defined. ChainOfRelTemplates also defines the template for each relation. Each template contains slots to be filled. To generate a question (line 14), we insert previously extracted entity/entities into the slot/slots of a template. The relation REL and the tail-entity e are then jointly extracted by answering the generated question (line 15). A returned NONE token indicates that there is no answer in the given sentence.

It is worth noting that entities extracted in the head-entity extraction stage may not all be head-entities. In the subsequent relation and tail-entity extraction stage, entities extracted in the first stage are initially assumed to be head-entities and are fed to the templates to generate questions. If an entity e extracted in the first stage is indeed a head-entity of a relation, the QA model will extract the tail-entity by answering the corresponding question. Otherwise, the answer will be NONE and is thus ignored.

For the ACE04, ACE05 and CoNLL04 datasets, only two QA turns are needed. ChainOfRelTemplates thus only contains chains of length 1. For RESUME, we need to extract 4 entities, so ChainOfRelTemplates contains chains of length 3.

Input: sentence s, EntityQuesTemplates, ChainOfRelTemplates
Output: a list of lists (table) M = []
 1:
 2: M ← ∅
 3: HeadEntList ← ∅
 4: for entity_question in EntityQuesTemplates do
 5:     e1 = Extract_Answer(entity_question, s)
 6:     if e1 ≠ NONE do
 7:         HeadEntList = HeadEntList + {e1}
 8:     endif
 9: end for
10: for head_entity in HeadEntList do
11:     ent_list = [head_entity]
12:     for [rel, rel_temp] in ChainOfRelTemplates do
13:         for (rel, rel_temp) in List of [rel, rel_temp] do
14:             q = GenQues(rel_temp, rel, ent_list)
15:             e = Extract_Answer(q, s)
16:             if e ≠ NONE
17:                 ent_list = ent_list + e
18:             endif
19:         end for
20:     end for
21:     if len(ent_list) = len([rel, rel_temp])
22:         M = M + ent_list
23:     endif
24: end for
25: return M

Algorithm 1: Transforming the entity-relation extraction task to a multi-turn QA task.
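For readers who prefer executable form, the following is a Python transcription of Algorithm 1. It is a sketch under stated assumptions: `extract_answer` and `gen_ques` are assumed wrappers around the MRC model and the question templates, and none of the names come from a released implementation.

```python
# A Python sketch of Algorithm 1. `extract_answer(question, s)` and
# `gen_ques(rel_temp, rel, ent_list)` are assumed wrappers around the MRC
# model and the question templates; NONE is the model's no-answer output.
NONE = None

def multi_turn_qa(s, entity_ques_templates, chain_of_rel_templates,
                  extract_answer, gen_ques):
    M = []                                     # the output table
    head_ent_list = []
    # Stage 1 (lines 4-9): head-entity extraction.
    for entity_question in entity_ques_templates:
        e1 = extract_answer(entity_question, s)
        if e1 is not NONE:
            head_ent_list.append(e1)
    # Stage 2 (lines 10-24): relation and tail-entity extraction, following
    # the pre-defined chain of (relation, template) pairs.
    for head_entity in head_ent_list:
        ent_list = [head_entity]
        for rel, rel_temp in chain_of_rel_templates:
            q = gen_ques(rel_temp, rel, ent_list)    # line 14
            e = extract_answer(q, s)                 # line 15
            if e is not NONE:
                ent_list.append(e)
        # Keep only fully-filled rows: the head entity plus one answer
        # per turn of the chain.
        if len(ent_list) == len(chain_of_rel_templates) + 1:
            M.append(ent_list)
    return M
```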
4.2 Generating Questions using Templates

Each entity type is associated with a type-specific question generated by the templates. There are two ways to generate questions based on templates: natural language questions or pseudo-questions. A pseudo-question is not necessarily grammatical. For example, the natural language question for the Facility type could be "Which facility is mentioned in the text?", while the pseudo-question could just be "entity: facility".

At the relation and tail-entity joint extraction stage, a question is generated by combining a relation-specific template with the extracted head-entity. The question could be either a natural language question or a pseudo-question. Examples are shown in Table 3 and Table 4.

4.3 Extracting Answer Spans via MRC

Various MRC models have been proposed, such as BiDAF (Seo et al., 2016) and QANet (Yu et al., 2018). In the standard MRC setting, given a question Q = {q_1, q_2, ..., q_{N_q}}, where N_q denotes the number of words in Q, and a context C = {c_1, c_2, ..., c_{N_c}}, where N_c denotes the number of words in C, we need to predict the answer span. For the QA framework, we use BERT (Devlin et al., 2018) as a backbone. BERT performs bidirectional language model pretraining on large-scale datasets using transformers (Vaswani et al., 2017) and achieves SOTA results on MRC datasets like SQuAD (Rajpurkar et al., 2016). To align with the BERT framework, the question Q and the context C are combined by concatenating the list [CLS, Q, SEP, C, SEP], where CLS and SEP are special tokens, Q is the tokenized question and C is the context. The representation of each context token is obtained using multi-layer transformers.

Traditional MRC models (Wang and Jiang, 2016; Xiong et al., 2017) predict the starting and ending indices by applying two softmax layers to the context tokens. This softmax-based span extraction strategy only fits single-answer extraction tasks, but not ours, since one sentence/passage in our setting might contain multiple answers. To tackle this issue, we formalize the task as a query-based tagging problem (Lafferty et al., 2001; Huang et al., 2015; Ma and Hovy, 2016). Specifically, we predict a BMEO (beginning, inside, ending and outside) label for each token in the context given the query. The representation of each word is fed to a softmax layer to output a BMEO label. One can think of this as transforming two N-class classification tasks of predicting the starting and the ending indices (where N denotes the length of the sentence) into N 5-class classification tasks⁵.

⁵ For some of the relations that we are interested in, their corresponding questions have single answers. We tried the strategy of predicting the starting and the ending index and found the results no different from the ones in the QA-based tagging setting.

Training and Test At training time, we jointly train the objectives for the two stages:

L = (1 − λ) L(head-entity) + λ L(tail-entity, rel)    (1)

λ ∈ [0, 1] is the parameter controlling the trade-off between the two objectives. Its value is tuned on the validation set. Both models are initialized using the standard BERT model and share parameters during training. At test time, head-entities and tail-entities are extracted separately based on the two objectives.
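As a deliberately minimal sketch of this setup, the code below implements a query-based BMEO tagging head on top of a BERT encoder and the joint objective of Eq. (1), using PyTorch and the `transformers` library. The backbone name, the 4-label BMEO inventory, and the -100 ignore index are our illustrative assumptions, not released details of the paper's system.

```python
# A sketch of the query-based tagging model: BERT encodes
# [CLS] question [SEP] context [SEP] and a linear layer emits one
# BMEO label per token.
import torch.nn as nn
from transformers import BertModel

class QueryBasedTagger(nn.Module):
    NUM_LABELS = 4  # B, M, E, O (the paper phrases this as a 5-class decision)

    def __init__(self, backbone="bert-base-chinese"):  # assumed checkpoint
        super().__init__()
        self.bert = BertModel.from_pretrained(backbone)
        self.head = nn.Linear(self.bert.config.hidden_size, self.NUM_LABELS)

    def forward(self, input_ids, attention_mask, token_type_ids):
        out = self.bert(input_ids=input_ids,
                        attention_mask=attention_mask,
                        token_type_ids=token_type_ids)
        return self.head(out.last_hidden_state)  # (batch, seq_len, 4) logits

def joint_loss(head_logits, head_labels, tail_logits, tail_labels, lam):
    """Eq. (1): L = (1 - lambda) L(head-entity) + lambda L(tail-entity, rel).
    Question and special-token positions carry label -100 and are ignored."""
    ce = nn.CrossEntropyLoss(ignore_index=-100)
    l_head = ce(head_logits.flatten(0, 1), head_labels.flatten())
    l_tail = ce(tail_logits.flatten(0, 1), tail_labels.flatten())
    return (1 - lam) * l_head + lam * l_tail
```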

4.4 Reinforcement Learning

Note that in our setting, the extracted answer from one turn not only affects its own accuracy, but also determines how a question will be constructed for the downstream turns, which in turn affects later accuracies. We use reinforcement learning to tackle this, which has proved to be successful in multi-turn dialogue generation (Mrkšić et al., 2015; Li et al., 2016; Wen et al., 2016), a task that has the same challenge as ours.

Action and Policy In an RL setting, we need to define action and policy. In the multi-turn QA setting, the action is selecting a text span in each turn. The policy defines the probability of selecting a certain span given the question and the context. As the algorithm relies on the BMEO tagging output, the probability of selecting a certain span {w_1, w_2, ..., w_n} is the joint probability of w_1 being assigned to B (beginning), w_2, ..., w_{n−1} being assigned to M (inside) and w_n being assigned to E (end), written as follows:

p(y(w_1, ..., w_n) = answer | question, s) = p(w_1 = B) × p(w_n = E) × ∏_{i ∈ [2, n−1]} p(w_i = M)    (2)

Reward For a given sentence s, we use the number of correctly retrieved triples as the reward. We use the REINFORCE algorithm (Williams, 1992), a kind of policy gradient method, to find the optimal policy, which maximizes the expected reward E_π[R(w)]. The expectation is approximated by sampling from the policy π, and the gradient is computed using the likelihood ratio:

∇E(θ) ≈ [R(w) − b] ∇ log π(y(w) | question, s)    (3)

where b denotes a baseline value. For each turn in the multi-turn QA setting, getting an answer correct leads to a reward of +1. The final reward is the accumulative reward of all turns. The baseline value is set to the average of all previous rewards. We do not initialize policy networks from scratch, but use the pre-trained head-entity and tail-entity extraction models described in the previous section. We also use the experience replay strategy (Mnih et al., 2015): for each batch, half of the examples are simulated and the other half are randomly selected from previously generated examples.

For the RESUME dataset, we use the strategy of curriculum learning (Bengio et al., 2009), i.e., we gradually increase the number of turns from 2 to 4 at training.
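The sketch below spells out how Eqs. (2) and (3) could be computed from the tagger's output: the span log-probability is summed from per-token BMEO log-softmax scores, and a REINFORCE step scales it by the baseline-adjusted reward. The label-index convention, the per-turn episode layout, and the baseline bookkeeping are our illustrative assumptions.

```python
# A sketch of the policy (Eq. (2)) and its REINFORCE update (Eq. (3)).
# `tag_log_probs` is a (seq_len, 4) tensor of log-softmax BMEO scores;
# the index convention B=0, M=1, E=2, O=3 is an assumption.
import torch

B, M, E = 0, 1, 2

def span_log_prob(tag_log_probs, start, end):
    """Eq. (2): log p(span) = log p(w_start = B) + log p(w_end = E)
    plus the sum of log p(w_i = M) over interior tokens."""
    lp = tag_log_probs[start, B] + tag_log_probs[end, E]
    return lp + tag_log_probs[start + 1:end, M].sum()

def reinforce_step(optimizer, turn_log_probs, reward, baseline):
    """Eq. (3): ascend (R(w) - b) * grad log pi(y(w) | question, s).
    `turn_log_probs` holds one span log-probability per turn; `reward`
    is the accumulative reward over the episode and `baseline` the
    running average of previous rewards."""
    loss = -(reward - baseline) * torch.stack(turn_log_probs).sum()
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```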
5 Experimental Results

5.1 Results on RESUME

Answers are extracted in the order of Person (first turn), Company (second turn), Position (third turn) and Time (fourth turn), and the extraction of each answer depends on those prior to it.

For baselines, we first implement a joint model in which entity extraction and relation extraction are trained together (denoted by tagging+relation). As in Zheng et al. (2017), entities are extracted using BERT tagging models, and relations are extracted by applying a CNN to representations output by BERT transformers.

Existing baselines which involve entity and relation identification stages (either pipelined or joint) are well suited for triplet extraction, but not really tailored to our setting, because in the third and fourth turns we need more information than just two entities to decide the relation. For instance, to extract Position we need both Person and Company, and to extract Time we need Person, Company and Position. This is akin to a dependency parsing task, but at the tag level rather than the word level (Dozat and Manning, 2016; Chen and Manning, 2014). We thus propose the following baseline, which modifies the previous entity+relation strategy to entity+dependency, denoted by tagging+dependency. We use the BERT tagging model to assign tagging labels to each word, and modify the current SOTA dependency parsing model Biaffine (Dozat and Manning, 2016) to construct dependencies between tags. The Biaffine dependency model and the entity-extraction model are jointly trained.

Results are presented in Table 5. As can be seen, the tagging+dependency model outperforms the tagging+relation model. The proposed multi-turn QA model performs the best, with RL adding an additional performance boost. Notably, for Person extraction, which only requires single-turn QA, the multi-turn QA+RL model performs the same as the multi-turn QA model; this is also the case for tagging+relation and tagging+dependency.

5.2 Results on ACE04, ACE05 and CoNLL04

For ACE04, ACE05 and CoNLL04, only two turns of QA are required. For evaluation, we report micro-F1 scores, precision and recall on entities and relations (Tables 6, 7 and 8), as in Li and Ji (2014); Miwa and Bansal (2016); Katiyar and Cardie (2017); Zhang et al. (2017). For ACE04, the proposed multi-turn QA model outperforms the previous SOTA by +1.8% for entity extraction and +1.0% for relation extraction. For ACE05, the proposed multi-turn QA model outperforms the previous SOTA by +1.2% for entity extraction and +0.6% for relation extraction. For CoNLL04, the proposed multi-turn QA model leads to a +1.1% gain on relation F1.

6 Ablation Studies

6.1 Effect of Question Generation Strategy

In this subsection, we compare the effects of natural language questions and pseudo-questions. Results are shown in Table 9. We can see that natural language questions lead to a strict F1 improvement across all datasets. This is because natural language questions provide more fine-grained semantic information and can help entity/relation extraction. By contrast, pseudo-questions provide very coarse-grained, ambiguous and implicit hints of entity and relation types, which might even confuse the model.
           multi-turn QA     multi-turn QA+RL  tagging+dependency  tagging+relation
           p    r    f       p    r    f       p    r    f         p    r    f
Person     98.1 99.0 98.6    98.1 99.0 98.6    97.0 97.2 97.1      97.0 97.2 97.1
Company    82.3 87.6 84.9    83.3 87.8 85.5    81.4 87.3 84.2      81.0 86.2 83.5
Position   97.1 98.5 97.8    97.3 98.9 98.1    96.3 98.0 97.0      94.4 97.8 96.0
Time       96.6 98.8 97.7    97.0 98.9 97.9    95.2 96.3 95.7      94.0 95.9 94.9
all        91.0 93.2 92.1    91.6 93.5 92.5    90.0 91.7 90.8      88.2 91.5 89.8

Table 5: Results for different models on the RESUME dataset.

Models                      EntityP  EntityR  EntityF  RelationP  RelationR  RelationF
Li and Ji (2014)            83.5     76.2     79.7     60.8       36.1       45.3
Miwa and Bansal (2016)      80.8     82.9     81.8     48.7       48.1       48.4
Katiyar and Cardie (2017)   81.2     78.1     79.6     46.4       45.3       45.7
Bekoulis et al. (2018)      -        -        81.6     -          -          47.5
Multi-turn QA               84.4     82.9     83.6     50.1       48.7       49.4 (+1.0)

Table 6: Results of different models on the ACE04 test set. Results for pipelined methods are omitted since they consistently underperform joint models (see Li and Ji (2014) for details).

Models                      EntityP  EntityR  EntityF  RelationP  RelationR  RelationF
Li and Ji (2014)            85.2     76.9     80.8     65.4       39.8       49.5
Miwa and Bansal (2016)      82.9     83.9     83.4     57.2       54.0       55.6
Katiyar and Cardie (2017)   84.0     81.3     82.6     55.5       51.8       53.6
Zhang et al. (2017)         -        -        83.5     -          -          57.5
Sun et al. (2018)           83.9     83.2     83.6     64.9       55.1       59.6
Multi-turn QA               84.7     84.9     84.8     64.8       56.2       60.2 (+0.6)

Table 7: Results of different models on the ACE05 test set. Results for pipelined methods are omitted since they consistently underperform joint models (see Li and Ji (2014) for details).

Models                      EntityP  EntityR  EntityF  RelationP  RelationR  RelationF
Miwa and Sasaki (2014)      -        -        80.7     -          -          61.0
Zhang et al. (2017)         -        -        85.6     -          -          67.8
Bekoulis et al. (2018)      -        -        83.6     -          -          62.0
Multi-turn QA               89.0     86.6     87.8     69.2       68.2       68.9 (+1.1)

Table 8: Comparison of the proposed method with the previous models on the CoNLL04 dataset. Precision and recall values of baseline models were not reported in the previous papers.

RESUME
Model       Overall P  Overall R  Overall F
Pseudo Q    90.2       92.3       91.2
Natural Q   91.0       93.2       92.1

ACE04
Model       EP    ER    EF    RP    RR    RF
Pseudo Q    83.7  81.3  82.5  49.4  47.2  48.3
Natural Q   84.4  82.9  83.6  50.1  48.7  49.4

ACE05
Model       EP    ER    EF    RP    RR    RF
Pseudo Q    83.6  84.7  84.2  60.4  55.9  58.1
Natural Q   84.7  84.9  84.8  64.8  56.2  60.2

CoNLL04
Model       EP    ER    EF    RP    RR    RF
Pseudo Q    87.4  86.4  86.9  68.2  67.4  67.8
Natural Q   89.0  86.6  87.8  69.6  68.2  68.9

Table 9: Comparing the effect of natural language questions with pseudo-questions.

6.2 Effect of Joint Training

In this paper, we decompose the entity-relation extraction task into two subtasks: a multi-answer task for head-entity extraction and a single-answer task for joint relation and tail-entity extraction. We jointly train the two models with parameters shared. The parameter λ controls the trade-off between the two subtasks:

L = (1 − λ) L(head-entity) + λ L(tail-entity)    (4)

Results regarding different values of λ on the ACE05 dataset are given as follows:

λ         EntityF1  RelationF1
λ = 0     85.0      55.1
λ = 0.1   84.8      55.4
λ = 0.2   85.2      56.2
λ = 0.3   84.8      56.4
λ = 0.4   84.6      57.9
λ = 0.5   84.8      58.3
λ = 0.6   84.6      58.9
λ = 0.7   84.8      60.2
λ = 0.8   83.9      58.7
λ = 0.9   82.7      58.3
λ = 1.0   81.9      57.8

When λ is set to 0, the system is essentially only trained on the head-entity prediction task. Interestingly, λ = 0 does not lead to the best entity-extraction performance. This demonstrates that the second-stage relation extraction actually helps the first-stage entity extraction, which again confirms the necessity of considering these two subtasks together. For the relation extraction task, the best performance is obtained when λ is set to 0.7.

6.3 Case Study

Table 10 compares outputs from the proposed multi-turn QA model with those of the previous SOTA MRT model (Sun et al., 2018). In the first example, MRT is not able to identify the relation between john scottsdale and iraq because the two entities are too far apart, but the proposed QA model is able to handle this issue. In the second example, the sentence contains two pairs of the same relation. The MRT model has a hard time handling this situation: it is not able to locate the ship entity or the associated relation, whereas the multi-turn QA model handles the case correctly.

EXAMPLE 1   [john scottsdale] PER: PHYS-1 is on the front lines in [iraq] GPE: PHYS-1.
MRT         [john scottsdale] PER is on the front lines in [iraq] GPE.
MULTI-QA    [john scottsdale] PER: PHYS-1 is on the front lines in [iraq] GPE: PHYS-1.

EXAMPLE 2   The [men] PER: ART-1 held on the sinking [vessel] VEH: ART-1 until the [passenger] PER: ART-2 [ship] VEH: ART-2 was able to reach them.
MRT         The [men] PER: ART-1 held on the sinking [vessel] VEH: ART-1 until the [passenger] PER ship was able to reach them.
MULTI-QA    The [men] PER: ART-1 held on the sinking [vessel] VEH: ART-1 until the [passenger] PER: ART-2 [ship] VEH: ART-2 was able to reach them.

Table 10: Comparing the multi-turn QA model with MRT (Sun et al., 2018).

7 Conclusion

In this paper, we propose a multi-turn question answering paradigm for the task of entity-relation extraction. We achieve new state-of-the-art results on 3 benchmark datasets. We also construct a new entity-relation extraction dataset that requires hierarchical relation reasoning, on which the proposed model achieves the best performance.

References

Giannis Bekoulis, Johannes Deleu, Thomas Demeester, and Chris Develder. 2018. Adversarial training for multi-context joint entity and relation extraction. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2830–2836. Association for Computational Linguistics.

Yoshua Bengio, Jérôme Louradour, Ronan Collobert, and Jason Weston. 2009. Curriculum learning. In Proceedings of the 26th Annual International Conference on Machine Learning, pages 41–48. ACM.

Yee Seng Chan and Dan Roth. 2011. Exploiting syntactico-semantic structures for relation extraction. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 551–560. Association for Computational Linguistics.

Danqi Chen and Christopher Manning. 2014. A fast and accurate dependency parser using neural networks. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 740–750.

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. 2018. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805.

Timothy Dozat and Christopher D. Manning. 2016. Deep biaffine attention for neural dependency parsing. arXiv preprint arXiv:1611.01734.
Matthew Dunn, Levent Sagun, Mike Higgins, V. Ugur Guney, Volkan Cirik, and Kyunghyun Cho. 2017. SearchQA: A new Q&A dataset augmented with context from a search engine. arXiv preprint arXiv:1704.05179.

Pankaj Gupta, Hinrich Schütze, and Bernt Andrassy. 2016. Table filling multi-task recurrent neural network for joint entity and relation extraction. In Proceedings of COLING 2016, the 26th International Conference on Computational Linguistics: Technical Papers, pages 2537–2547.

Zhiheng Huang, Wei Xu, and Kai Yu. 2015. Bidirectional LSTM-CRF models for sequence tagging. arXiv preprint arXiv:1508.01991.

Mandar Joshi, Eunsol Choi, Daniel S. Weld, and Luke Zettlemoyer. 2017. TriviaQA: A large scale distantly supervised challenge dataset for reading comprehension. arXiv preprint arXiv:1705.03551.

Rohit J. Kate and Raymond J. Mooney. 2010. Joint entity and relation extraction using card-pyramid parsing. In Proceedings of the Fourteenth Conference on Computational Natural Language Learning, pages 203–212. Association for Computational Linguistics.

Arzoo Katiyar and Claire Cardie. 2017. Going out on a limb: Joint extraction of entity mentions and relations without dependency trees. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 917–928. Association for Computational Linguistics.

John Lafferty, Andrew McCallum, and Fernando C. N. Pereira. 2001. Conditional random fields: Probabilistic models for segmenting and labeling sequence data.

Oliver Lemon, Kallirroi Georgila, James Henderson, and Matthew Stuttle. 2006. An ISU dialogue system exhibiting reinforcement learning of dialogue policies: Generic slot-filling in the TALK in-car system. In Proceedings of the Eleventh Conference of the European Chapter of the Association for Computational Linguistics: Posters & Demonstrations, pages 119–122. Association for Computational Linguistics.

Omer Levy, Minjoon Seo, Eunsol Choi, and Luke Zettlemoyer. 2017. Zero-shot relation extraction via reading comprehension. arXiv preprint arXiv:1706.04115.

Jiwei Li, Alexander H. Miller, Sumit Chopra, Marc'Aurelio Ranzato, and Jason Weston. 2016. Dialogue learning with human-in-the-loop. arXiv preprint arXiv:1611.09823.

Qi Li and Heng Ji. 2014. Incremental joint extraction of entity mentions and relations. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 402–412.

Yankai Lin, Shiqi Shen, Zhiyuan Liu, Huanbo Luan, and Maosong Sun. 2016. Neural relation extraction with selective attention over instances. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 2124–2133.

Xuezhe Ma and Eduard Hovy. 2016. End-to-end sequence labeling via bi-directional LSTM-CNNs-CRF. arXiv preprint arXiv:1603.01354.

Bryan McCann, Nitish Shirish Keskar, Caiming Xiong, and Richard Socher. 2018. The natural language decathlon: Multitask learning as question answering. arXiv preprint arXiv:1806.08730.

Makoto Miwa and Mohit Bansal. 2016. End-to-end relation extraction using LSTMs on sequences and tree structures. arXiv preprint arXiv:1601.00770.

Makoto Miwa, Rune Sætre, Yusuke Miyao, and Jun'ichi Tsujii. 2009. A rich feature vector for protein-protein interaction extraction from multiple corpora. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing: Volume 1, pages 121–130. Association for Computational Linguistics.

Makoto Miwa and Yutaka Sasaki. 2014. Modeling joint entity and relation extraction with table representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1858–1869.

Volodymyr Mnih, Koray Kavukcuoglu, David Silver, Andrei A. Rusu, Joel Veness, Marc G. Bellemare, Alex Graves, Martin Riedmiller, Andreas K. Fidjeland, Georg Ostrovski, et al. 2015. Human-level control through deep reinforcement learning. Nature, 518(7540):529.

Nikola Mrkšić, Diarmuid Ó Séaghdha, Blaise Thomson, Milica Gašić, Pei-Hao Su, David Vandyke, Tsung-Hsien Wen, and Steve Young. 2015. Multi-domain dialog state tracking using recurrent neural networks. arXiv preprint arXiv:1506.07190.

Matthew E. Peters, Mark Neumann, Mohit Iyyer, Matt Gardner, Christopher Clark, Kenton Lee, and Luke Zettlemoyer. 2018. Deep contextualized word representations. arXiv preprint arXiv:1802.05365.

Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, and Percy Liang. 2016. SQuAD: 100,000+ questions for machine comprehension of text. arXiv preprint arXiv:1606.05250.

Dan Roth and Wen-tau Yih. 2004. A linear programming formulation for global inference in natural language tasks. Technical report, University of Illinois at Urbana-Champaign, Department of Computer Science.

Dan Roth and Wen-tau Yih. 2007. Global inference for entity and relation identification via a linear programming formulation. Introduction to Statistical Relational Learning, pages 553–580.

Minjoon Seo, Aniruddha Kembhavi, Ali Farhadi, and Hannaneh Hajishirzi. 2016. Bidirectional attention flow for machine comprehension. arXiv preprint arXiv:1611.01603.
Yelong Shen, Po-Sen Huang, Jianfeng Gao, and Weizhu Chen. 2017. ReasoNet: Learning to stop reading in machine comprehension. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1047–1055. ACM.

Sameer Singh, Sebastian Riedel, Brian Martin, Jiaping Zheng, and Andrew McCallum. 2013. Joint inference of entities, relations, and coreference. In Proceedings of the 2013 Workshop on Automated Knowledge Base Construction, pages 1–6. ACM.

Changzhi Sun, Yuanbin Wu, Man Lan, Shiliang Sun, Wenting Wang, Kuang-Chih Lee, and Kewen Wu. 2018. Extracting entities and relations with joint minimum risk training. In Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, pages 2256–2265.

Kai Sheng Tai, Richard Socher, and Christopher D. Manning. 2015. Improved semantic representations from tree-structured long short-term memory networks. arXiv preprint arXiv:1503.00075.

Ryuichi Takanobu, Tianyang Zhang, Jiexi Liu, and Minlie Huang. 2018. A hierarchical framework for relation extraction with reinforcement learning. arXiv preprint arXiv:1811.03925.

Chuanqi Tan, Furu Wei, Nan Yang, Bowen Du, Weifeng Lv, and Ming Zhou. 2017. S-Net: From answer extraction to answer generation for machine reading comprehension. arXiv preprint arXiv:1706.04815.

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, and Illia Polosukhin. 2017. Attention is all you need. In Advances in Neural Information Processing Systems, pages 5998–6008.

Linlin Wang, Zhu Cao, Gerard de Melo, and Zhiyuan Liu. 2016a. Relation classification via multi-level attention CNNs.

Shuohang Wang and Jing Jiang. 2016. Machine comprehension using match-LSTM and answer pointer. arXiv preprint arXiv:1608.07905.
Shuohang Wang, Mo Yu, Jing Jiang, Wei Zhang, Xiaoxiao Guo, Shiyu Chang, Zhiguo Wang, Tim Klinger, Gerald Tesauro, and Murray Campbell. 2017a. Evidence aggregation for answer re-ranking in open-domain question answering. arXiv preprint arXiv:1711.05116.

Wenhui Wang, Nan Yang, Furu Wei, Baobao Chang, and Ming Zhou. 2017b. Gated self-matching networks for reading comprehension and question answering. In Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 189–198.

Zhiguo Wang, Haitao Mi, Wael Hamza, and Radu Florian. 2016b. Multi-perspective context matching for machine comprehension. arXiv preprint arXiv:1612.04211.

Tsung-Hsien Wen, David Vandyke, Nikola Mrkšić, Milica Gašić, Lina M. Rojas-Barahona, Pei-Hao Su, Stefan Ultes, and Steve Young. 2016. A network-based end-to-end trainable task-oriented dialogue system. arXiv preprint arXiv:1604.04562.

Jason D. Williams and Steve Young. 2005. Scaling up POMDPs for dialog management: The "Summary POMDP" method. In IEEE Workshop on Automatic Speech Recognition and Understanding, 2005, pages 177–182. IEEE.

Ronald J. Williams. 1992. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8(3-4):229–256.

Caiming Xiong, Victor Zhong, and Richard Socher. 2017. DCN+: Mixed objective and deep residual coattention for question answering. arXiv preprint arXiv:1711.00106.

Bishan Yang and Claire Cardie. 2013. Joint inference for fine-grained opinion extraction. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 1640–1649.

Hai Ye, Wenhan Chao, Zhunchen Luo, and Zhoujun Li. 2016. Jointly extracting relations with class ties via effective deep ranking. arXiv preprint arXiv:1612.07602.

Adams Wei Yu, David Dohan, Minh-Thang Luong, Rui Zhao, Kai Chen, Mohammad Norouzi, and Quoc V. Le. 2018. QANet: Combining local convolution with global self-attention for reading comprehension. arXiv preprint arXiv:1804.09541.

Xiaofeng Yu and Wai Lam. 2010. Jointly identifying entities and extracting relations in encyclopedia text via a graphical model approach. In Proceedings of the 23rd International Conference on Computational Linguistics: Posters, pages 1399–1407. Association for Computational Linguistics.

Dmitry Zelenko, Chinatsu Aone, and Anthony Richardella. 2003. Kernel methods for relation extraction. Journal of Machine Learning Research, 3(Feb):1083–1106.

Xiangrong Zeng, Daojian Zeng, Shizhu He, Kang Liu, and Jun Zhao. 2018. Extracting relational facts by an end-to-end neural model with copy mechanism. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), volume 1, pages 506–514.

Meishan Zhang, Yue Zhang, and Guohong Fu. 2017. End-to-end neural relation extraction with global optimization. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 1730–1740.

Suncong Zheng, Yuexing Hao, Dongyuan Lu, Hongyun Bao, Jiaming Xu, Hongwei Hao, and Bo Xu. 2017. Joint entity and relation extraction based on a hybrid neural network. Neurocomputing, 257:59–66.