Entity Linking via Dual and Cross-Attention Encoders

Oshin Agarwal∗ (University of Pennsylvania) and Daniel M. Bikel (Google Research)

∗ Work done during internship at Google.

Abstract

Entity Linking has two main open areas of research: (1) generating candidate entities without using alias tables and (2) generating more contextual representations for both mentions and entities. Recently, a solution has been proposed for the former: a dual-encoder entity retrieval system (Gillick et al., 2019) that learns mention and entity representations in the same space and performs linking by selecting the nearest entity to the mention in this space. In this work, we use this retrieval system solely for generating candidate entities. We then rerank the entities using a cross-attention encoder over the target mention and each of the candidate entities. Whereas a dual encoder approach forces all information to be contained in the small, fixed set of vector dimensions used to represent mentions and entities, a cross-attention model allows for the use of detailed information (read: features) from the entirety of each ⟨mention, context, candidate entity⟩ tuple. We experiment with features used in the reranker, including different ways of incorporating document-level context. We achieve state-of-the-art results on the TACKBP-2010 dataset, with 92.05% accuracy. Furthermore, we show how the rescoring model generalizes well when trained on the larger CoNLL-2003 dataset and evaluated on TACKBP-2010.

1 Introduction

Entity linking is the task of finding the unique referring entity in a knowledge base for a mention span in text. For example, Union City in the sentence “The Bay Area Transit Centre in Union City is under construction” refers to the Wikipedia entity Union City, California. Entity linking is typically performed in two steps: generating candidate entities from the knowledge base and then selecting the most likely entity from these candidates. Often, priors and alias tables (a.k.a. candidate tables) are used to generate the set of candidate entities, and work on entity linking has focused on either generating better alias tables or reranking the set of candidate entities generated from alias tables.

Alias tables, however, suffer from many drawbacks. They are based on prior probabilities of a mention referring to an entity, computed from occurrence counts of mention strings. As a result, they are heavily biased towards the most common entity and do not take into account complex features such as mention context and entity description. Such features can help not only in better disambiguation to existing entities but also when new entities are added to the knowledge base, without the need for retraining. If candidate generation is based on dense representations instead of a list of candidate “entity ids”, the system will generalize better, even in a zero-shot setting. Lastly, alias tables are often not readily available for many domains, so an approach that is free of alias tables will be useful for such domains. Recently, Gillick et al. (2019) used a dual encoder system, known as DEER, to learn entity and mention representations in the same space and perform end-to-end entity linking using nearest neighbor search, without the use of alias tables. This model obtained accuracy competitive with state-of-the-art entity linking systems on the TACKBP-2010 dataset. In this paper, we use this model as a candidate generator by retrieving the top few nearest neighbors.
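Viewed end to end, the proposed system is a retrieve-then-rerank pipeline. The sketch below only illustrates that control flow; the Mention and Candidate types and the generate_candidates and rerank callables are hypothetical stand-ins for the components described in Section 3, not the authors' implementation.

    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Mention:
        span: str      # surface form, e.g. "Union City"
        sentence: str  # the sentence containing the mention

    @dataclass
    class Candidate:
        entity_id: str  # e.g. "Union City, California"
        score: float

    def link_entity(mention: Mention,
                    generate_candidates: Callable[[Mention, int], List[Candidate]],
                    rerank: Callable[[Mention, List[Candidate]], List[Candidate]],
                    k: int = 10) -> str:
        # Step 1: dual-encoder retrieval of the k nearest entities.
        candidates = generate_candidates(mention, k)
        # Step 2: cross-attention rescoring of each (mention, candidate) pair.
        rescored = rerank(mention, candidates)
        # The highest-scoring candidate is the linked entity.
        return max(rescored, key=lambda c: c.score).entity_id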
Another challenge in entity linking is the use of more complex contextual representations for mentions as well as entities. While a dual encoder learns separate fixed representations for entities and mentions in context, we may often want to look at different pieces of information depending on the entity and mention in question. In the above example on Union City, if the candidate entity were New York City, then the names serve as sufficient evidence. If the candidate entity were Union City, New Jersey though, we need to look at the full sentence with the mention for additional clues. Moreover, depending on the entity, we may even need to look at different parts of the sentence or even the document. On the entity side, we may want to look at the relevant parts of the entity description instead of just the title. Say we have a mention Asia Cup and two candidate entities 2018 Asia Cup and 2016 Asia Cup. Entity descriptions give us the additional information that the 2018 Asia Cup was held in the UAE and the 2016 Asia Cup was held in Bangladesh. Depending on whether the mention context has the year or the location, we would want the entity representation to capture the corresponding relevant information. For this reason, we re-rank the candidate entities using BERT (Devlin et al., 2019) as a cross-attention encoder. Cross-attention gives the opportunity to choose relevant context selectively, depending on the specific mention and entity in question and the available features.

In our approach, the candidate generator uses the lighter representation of GloVe (Pennington et al., 2014) with fewer features, as it is trained on the full knowledge base; the reranker uses the more complex representation of BERT and additional features such as document context, as it needs to operate only on a small set of candidates and much less data overall. This method gives improved accuracy on both the TACKBP-2010 and CoNLL-2003 datasets. Furthermore, we show that the reranker trained on the larger of the two datasets, CoNLL-2003, generalizes well on both datasets. We also present an ablation study and a qualitative analysis to determine the impact of the various entity and mention features used in the reranker. Lastly, we discuss the challenges faced in the task of entity linking in making fair comparisons to prior work and the need to establish standards for a more meaningful comparison of different systems.

[Figure 1: Dual encoder that generates candidate entities for the given mention. Green denotes the mention features and blue denotes the entity features. Text encoders over the mention span, left context, right context and sentence feed a compound mention encoder; text encoders over the entity title and description and a sparse encoder over entity categories feed a compound entity encoder; the two representations are compared via cosine similarity.]

2 Task Setup

Entity linking involves finding the unique entity in a knowledge base that a mention in text refers to. Given a knowledge base $KB = \{e_1, e_2, \ldots, e_n\}$ and a mention in text $m$, the goal is to find the unique entity $e_m \in KB$ that $m$ refers to. Typically, a two-step procedure is used: (1) candidate generation and (2) reranking candidates. In the first step, possible candidate entities are shortlisted: $CE = \{e_1, e_2, \ldots, e_k\}$ such that $e_j \in KB$ for $1 \leq j \leq k$. $CE$ may or may not contain $e_m$. Finally, the entities in $CE$ are re-ranked so that $e_m$ gets selected.

For example, given the mention Union City in the sentence “The Bay Area Transit Centre in Union City is under construction”, a candidate generator would first generate possible candidate entities from the knowledge base, such as Union City, New Jersey and Union City, California. The reranker would then aim to select one entity out of these candidates, in this case Union City, California.

3 Model

In this work, we follow a two-step procedure as well: (1) candidate generation and (2) reranking candidates. We use the dual encoder architecture of Gillick et al. (2019) to generate a small set of candidate entities, taking the top $k$ retrieved entities as candidates for linking.
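As a concrete illustration of this retrieval step, the following is a minimal sketch of top-k nearest-neighbor candidate generation over precomputed embeddings. It assumes NumPy and uses random vectors in place of the trained DEER encoder outputs; the function names are ours, not the paper's.

    import numpy as np

    def normalize(v: np.ndarray) -> np.ndarray:
        # L2-normalize so that a dot product equals cosine similarity.
        return v / np.linalg.norm(v, axis=-1, keepdims=True)

    def generate_candidates(mention_vec: np.ndarray,
                            entity_matrix: np.ndarray,
                            k: int = 10) -> list:
        # mention_vec: (d,) vector from the mention encoder.
        # entity_matrix: (n, d) precomputed entity embeddings.
        scores = normalize(entity_matrix) @ normalize(mention_vec)
        # argpartition finds the top k in O(n); then sort only those k.
        top_k = np.argpartition(-scores, k)[:k]
        return top_k[np.argsort(-scores[top_k])].tolist()

    # Example with random stand-ins for the trained encoders' outputs.
    rng = np.random.default_rng(0)
    entity_embeddings = rng.normal(size=(1000, 300))  # 1000 entities, d=300
    mention_embedding = rng.normal(size=300)
    print(generate_candidates(mention_embedding, entity_embeddings, k=5))

Because the entity embeddings do not depend on the mention, they can be computed once for the entire knowledge base and indexed, so that retrieval at linking time reduces to a (possibly approximate) nearest-neighbor lookup.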
We rerank these candidates by binary classification, using BERT as a cross-attention encoder to generate the probability of each candidate being the true link, and select the one with the highest probability. We first briefly describe the dual encoder candidate generator, followed by the details of the cross-attention encoder for reranking. The full system architecture is illustrated in Figures 1 and 2, along with an example.

3.1 Candidate Generator

We use the dual encoder model of Gillick et al. (2019) to generate candidates for each mention. The model learns representations for entities and mentions in the same space by maximizing the cosine similarity between the two during training. The mention encoder consists of multiple feed-forward networks encoding the mention span, the left context (5 words), the right context (5 words) and the sentence with the mention masked. The entity encoder also consists of multiple feed-forward networks encoding the entity title, description (first paragraph) and user-specified categories. Entity categories are input as sparse vectors.

[Figure 2: Cross-attention encoder that reranks the candidate entities. Each candidate is paired with the mention and processed one at a time. Green denotes the mention features and blue denotes the entity features. The figure shows the DEER mention encoder embedding the sentence “The Bay Area Transit Centre in Union City is under construction” and the DEER entity encoder embedding Wikipedia entities, with nearby entities such as Union City, California and Union City, New Jersey retrieved as candidates.]

For each pair of mention span and candidate entity, we learn a joint representation of the pair using BERT as a cross-attention encoder. This representation is then classified as true link or not. At test time, the final entity is selected from the candidates as a post-processing step.
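This reranking step can be sketched with a standard BERT sentence-pair classifier. The snippet below assumes the HuggingFace transformers library, which the paper does not prescribe, and omits the additional features (e.g., document context) that the actual reranker consumes; the classification head is untrained here, so its scores are arbitrary until fine-tuned on gold mention-entity pairs.

    import torch
    from transformers import BertForSequenceClassification, BertTokenizer

    tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
    # Two labels: candidate is / is not the true link. The head here is
    # untrained; in the paper's setting it would be trained on gold links.
    model = BertForSequenceClassification.from_pretrained(
        "bert-base-uncased", num_labels=2)
    model.eval()

    def rerank(mention_context: str, candidate_texts: list) -> int:
        # Score each (mention, candidate) pair and return the index of the
        # candidate with the highest probability of being the true link.
        probs = []
        for entity_text in candidate_texts:
            # Sentence-pair input: BERT cross-attends over mention and
            # entity features jointly, unlike the dual encoder.
            inputs = tokenizer(mention_context, entity_text,
                               truncation=True, return_tensors="pt")
            with torch.no_grad():
                logits = model(**inputs).logits
            probs.append(torch.softmax(logits, dim=-1)[0, 1].item())
        return max(range(len(probs)), key=probs.__getitem__)

    sentence = "The Bay Area Transit Centre in Union City is under construction"
    candidates = [
        "Union City, California. City in the San Francisco Bay Area.",
        "Union City, New Jersey. City in Hudson County, New Jersey.",
    ]
    print(rerank(sentence, candidates))

Because both sequences are encoded together, every BERT layer can weigh entity-description tokens directly against mention-context tokens, which is precisely the selectivity motivated in the introduction.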