Improving Entity Linking by Modeling Latent Relations Between Mentions

Improving Entity Linking by Modeling Latent Relations between Mentions Phong Le1 and Ivan Titov1;2 1University of Edinburgh 2University of Amsterdam fple,[email protected] Abstract sides coreference, there are many other relations between entities which constrain or favor certain Entity linking involves aligning textual alignment configurations. For example, consider mentions of named entities to their corre- relation participant in in Figure 1: if “World Cup” sponding entries in a knowledge base. En- is aligned to the entity FIFA WORLD CUP then tity linking systems often exploit relations we expect the second “England” to refer to a foot- between textual mentions in a document ball team rather than a basketball one. (e.g., coreference) to decide if the linking NEL methods typically consider only corefer- decisions are compatible. Unlike previous ence, relying either on off-the-shelf systems or approaches, which relied on supervised some simple heuristics (Lazic et al., 2015), and systems or heuristics to predict these rela- exploit them in a pipeline fashion, though some tions, we treat relations as latent variables (e.g., Cheng and Roth (2013); Ren et al. (2017)) in our neural entity-linking model. We in- additionally exploit a range of syntactic-semantic duce the relations without any supervision relations such as apposition and possessives. An- while optimizing the entity-linking sys- other line of work ignores relations altogether and tem in an end-to-end fashion. Our multi- models the predicted sequence of KB entities as a relational model achieves the best reported bag (Globerson et al., 2016; Yamada et al., 2016; scores on the standard benchmark (AIDA- Ganea and Hofmann, 2017). Though they are able CoNLL) and substantially outperforms its to capture some degree of coherence (e.g., pref- relation-agnostic version. Its training also erence towards entities from the same general do- converges much faster, suggesting that the main) and are generally empirically successful, the injected structural bias helps to explain underlying assumption is too coarse. For example, regularities in the training data. they would favor assigning all the occurrences of “England” in Figure 1 to the same entity. 1 Introduction We hypothesize that relations useful for NEL Named entity linking (NEL) is the task of as- can be induced without (or only with little) domain signing entity mentions in a text to corresponding expertise. In order to prove this, we encode rela- entries in a knowledge base (KB). For example, tions as latent variables and induce them by opti- consider Figure 1 where a mention “World Cup” mizing the entity-linking model in an end-to-end refers to a KB entity FIFA WORLD CUP. NEL fashion. In this way, relations between mentions is often regarded as crucial for natural language in documents will be induced in such a way as to understanding and commonly used as preprocess- be beneficial for NEL. As with other recent ap- ing for tasks such as information extraction (Hoff- proaches to NEL (Yamada et al., 2017; Ganea and mann et al., 2011) and question answering (Yih Hofmann, 2017), we rely on representation learn- et al., 2015). ing and learn embeddings of mentions, contexts Potential assignments of mentions to entities are and relations. This further reduces the amount regulated by semantic and discourse constraints. of human expertise required to construct the sys- For example, the second and third occurrences of tem and, in principle, may make it more portable mention “England” in Figure 1 are coreferent and across languages and domains. thus should be assigned to the same entity. Be- Our multi-relational neural model achieves an 1595 Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Long Papers), pages 1595–1604 Melbourne, Australia, July 15 - 20, 2018. c 2018 Association for Computational Linguistics England England West_Germany England_national_ England_national Germany_national_ FIFA_World_Cup football_team _football_team football_team FIBA_Basketball_ England_national_ England_national Germany_national_ World_Cup basketball_team _basketball_team basketball_team ... ... ... ... World Cup 1966 was held in England …. England won… The final saw England beat West Germany . located_in coreference beat participant_in Figure 1: Example for NEL, linking each mention to an entity in a KB (e.g. “World Cup” to FIFA WORLD CUP rather than FIBA BASKETBALL WORLD CUP). Note that the first and the second “England” are in different relations to “World Cup”. improvement of 0.85% F1 over the best re- 2.2 Local and global models ported scores on the standard AIDA-CoNLL Local models rely only on local contexts of men- dataset (Ganea and Hofmann, 2017). Substan- tions and completely ignore interdependencies be- tial improvements over the relation-agnostic ver- tween the linking decisions in the document (these sion show that the induced relations are indeed interdependencies are usually referred to as coher- beneficial for NEL. Surprisingly its training also ence). Let ci be a local context of mention mi and converges much faster: training of the full model Ψ(ei; ci) be a local score function. A local model requires ten times shorter wall-clock time than then tackles the problem by searching for what is needed for estimating the simpler relation- ∗ agnostic version. This may suggest that the in- ei = arg max Ψ(ei; ci) (1) 2 jected structural bias helps to explain regularities ei Ci in the training data, making the optimization task for each i 2 f1; :::; ng (Bunescu and Pas¸ca, 2006; easier. We qualitatively examine induced rela- Lazic et al., 2015; Yamada et al., 2017). tions. Though we do not observe direct counter- A global model, besides using local context parts of linguistic relations, we, for example, see within Ψ(ei; ci), takes into account entity co- that some of the induced relations are closely re- herency. It is captured by a coherence score func- lated to coreference whereas others encode forms tion Φ(E; D): of semantic relatedness between the mentions. Xn ∗ E = arg max Ψ(ei; ci) + Φ(E; D) 2 × × E C1 ::: Cn i=1 2 Background and Related work where E = (e1; :::; en). The coherence score function, in the simplest form, is a sum over 2.1 Named entity linking problem all pairwise scores Φ(ei; ej;D) (Ratinov et al., 2011; Huang et al., 2015; Chisholm and Hachey, Formally, given a document D containing a list of 2015; Ganea et al., 2016; Guo and Barbosa, 2016; mentions m ; :::; m , an entity linker assigns to 1 n Globerson et al., 2016; Yamada et al., 2016), re- each m an KB entity e or predicts that there is no i i sulting in: corresponding entry in the KB (i.e., ei = NILL). Xn Because a KB can be very large, it is stan- ∗ E = arg max Ψ(ei; ci)+ dard to use an heuristic to choose potential can- E2C ×:::×Cn 1 Xi=1 didates, eliminating options which are highly un- Φ(e ; e ;D) (2) likely. This preprocessing step is called candidate i j i=6 j selection. The task of a statistical model is thus re- duced to choosing the best option among a smaller A disadvantage of global models is that exact list of candidates Ci = (ei1; :::; eili ). In what fol- decoding (Equation 2) is NP-hard (Wainwright lows, we will discuss two classes of approaches et al., 2008). Ganea and Hofmann (2017) over- tackling this problem: local and global modeling. come this using loopy belief propagation (LBP), 1596 an approximate inference method based on mes- At the other extreme, feature engineering is al- sage passing (Murphy et al., 1999). Globerson most completely replaced by representation learn- et al. (2016) propose a star model which approxi- ing. These approaches rely on pretrained embed- mates the decoding problem in Equation 2 by ap- dings of words (Mikolov et al., 2013; Penning- proximately decomposing it into n decoding prob- ton et al., 2014) and entities (He et al., 2013; Ya- lems, one per each ei. mada et al., 2017; Ganea and Hofmann, 2017) and often do not use virtually any other hand-crafted 2.3 Related work features. Ganea and Hofmann (2017) showed Our work focuses on modeling pairwise score that such an approach can yield SOTA accuracy functions Φ and is related to previous approaches on a standard benchmark (AIDA-CoNLL dataset). in the two following aspects. Their local and pairwise score functions are T Relations between mentions Ψ(ei; ci) = ei Bf(ci) A relation widely used by NEL systems is corefer- 1 T Φ(ei; ej;D) = − ei Rej (3) ence: two mentions are coreferent if they refer to n 1 the same entity. Though, as we discussed in Sec- e ; e 2 Rd tion 1, other linguistic relations constrain entity as- where i j are the embeddings of entity e ; e B; R 2 Rd×d signments, only a few approaches (e.g., Cheng and i j, are diagonal matrices. The f(c ) Roth (2013); Ren et al. (2017)), exploit any rela- mapping i applies an attention mechanism to c tions other than coreference. We believe that the context words in i to obtain a feature representa- f(c ) 2 Rd) reason for this is that predicting and selecting rel- tions of context ( i . evant (often semantic) relations is in itself a chal- Note that the global component (the pairwise lenging problem. scores) is agnostic to any relations between entities or even to their ordering: it models e ; :::; e In Cheng and Roth (2013), relations between 1 n simply as a bag of entities. Our work is in line with mentions are extracted using a labor-intensive ap- Ganea and Hofmann (2017) in the sense that fea- proach, requiring a set of hand-crafted rules and a ture engineering plays no role in computing local KB containing relations between entities.

Load more