Mixing Context Granularities for Improved Entity Linking on Question Answering Data Across Entity Categories
Daniil Sorokin and Iryna Gurevych
Ubiquitous Knowledge Processing Lab (UKP) and Research Training Group AIPHES
Department of Computer Science, Technische Universität Darmstadt
www.ukp.tu-darmstadt.de
{sorokin|gurevych}@ukp.informatik.tu-darmstadt.de

Abstract

The first stage of every knowledge base question answering approach is to link entities in the input question. We investigate entity linking in the context of a question answering task and present a jointly optimized neural architecture for entity mention detection and entity disambiguation that models the surrounding context on different levels of granularity.

We use the Wikidata knowledge base and available question answering datasets to create benchmarks for entity linking on question answering data. Our approach outperforms the previous state-of-the-art system on this data, resulting in an average 8% improvement of the final score. We further demonstrate that our model delivers a strong performance across different entity categories.

[Figure 1: An example question from a QA dataset ("what are taylor swift's albums ?") that shows the correct entity mentions (Taylor Swift, Q462; album, Q24951125, related via INSTANCE OF and PERFORMER) and their relationship with the correct answer to the question (Red, 1989, etc.). Qxxx stands for a knowledge base identifier.]

1 Introduction

Knowledge base question answering (QA) requires a precise modeling of the question semantics through the entities and relations available in the knowledge base (KB) in order to retrieve the correct answer. The first stage for every QA approach is entity linking (EL), that is, the identification of entity mentions in the question and linking them to entities in the KB. In Figure 1, two entity mentions are detected and linked to their knowledge base referents. This step is crucial for QA since the correct answer must be connected via some path over the KB to the entities mentioned in the question.

The state-of-the-art QA systems usually rely on off-the-shelf EL systems to extract entities from the question (Yih et al., 2015). Multiple EL systems are freely available and can be readily applied for question answering (e.g. DBPedia Spotlight¹, AIDA²). However, these systems have certain drawbacks in the QA setting: they are targeted at long well-formed documents, such as news texts, and are less suited for the typically short and noisy question data. Other EL systems focus on noisy data (e.g. S-MART, Yang and Chang, 2015), but are not openly available and hence limited in their usage and application. Multiple error analyses of QA systems point to entity linking as a major external source of error (Berant and Liang, 2014; Reddy et al., 2014; Yih et al., 2015).

¹ http://www.dbpedia-spotlight.org
² https://www.mpi-inf.mpg.de/yago-naga/aida/

The QA datasets are normally collected from the web and contain very noisy and diverse data (Berant et al., 2013), which poses a number of challenges for EL. First, many common features used in EL systems, such as capitalization, are not meaningful on noisy data. Moreover, a question is a short text snippet that does not contain the broader context that is helpful for entity disambiguation. The QA data also features many entities of various categories and differs in this respect from the Twitter datasets that are often used to evaluate EL systems.
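To make the task input and output concrete, the following minimal sketch shows the expected linking result for the question in Figure 1. The LinkedEntity structure, field names and tokenization are illustrative assumptions rather than the interface of our system; the knowledge base identifiers are the ones shown in the figure.

```python
# Illustrative sketch only: a hypothetical data structure for the EL
# output on the Figure 1 question (KB identifiers as given in the figure).
from dataclasses import dataclass

@dataclass
class LinkedEntity:
    mention: str        # token n-gram detected in the question
    token_span: tuple   # (start, end) token offsets, end exclusive
    kb_id: str          # Wikidata identifier of the linked entity

tokens = "what are taylor swift 's albums ?".split()
expected_links = [
    LinkedEntity("taylor swift 's", (2, 5), "Q462"),  # Taylor Swift
    LinkedEntity("albums", (5, 6), "Q24951125"),      # album
]
```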
In this paper, we present an approach that tackles the challenges listed above: we perform entity mention detection and entity disambiguation jointly in a single neural model that makes the whole process end-to-end differentiable. This ensures that any token n-gram can be considered as a potential entity mention, which is important to be able to link entities of different categories, such as movie titles and organization names.

To overcome the noise in the data, we automatically learn features over a set of contexts of different granularity levels. Each level of granularity is handled by a separate component of the model. A token-level component extracts higher-level features from the whole question context, whereas a character-level component builds lower-level features for the candidate n-gram. Simultaneously, we extract features from the knowledge base context of the candidate entity: character-level features are extracted for the entity label and higher-level features are produced based on the entities surrounding the candidate entity in the knowledge graph. This information is aggregated and used to predict whether the n-gram is an entity mention and to what entity it should be linked.
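The sketch below illustrates this design in simplified form: candidate n-grams are enumerated exhaustively, and each candidate-entity pair is scored from the context representations described above. It is an illustrative PyTorch-style approximation under assumed encoder choices and dimensions, not the exact architecture of our model.

```python
# A simplified sketch of the idea, not our exact architecture: every
# token n-gram is a mention candidate, and each candidate-entity pair is
# scored from context representations of different granularity. Encoder
# choices and dimensions are placeholder assumptions.
import torch
import torch.nn as nn

def ngram_candidates(tokens, max_len=4):
    """Enumerate all token n-grams as potential entity mentions."""
    return [(i, j) for i in range(len(tokens))
            for j in range(i + 1, min(i + max_len, len(tokens)) + 1)]

class MixedGranularityScorer(nn.Module):
    def __init__(self, dim=64):
        super().__init__()
        # Token-level component: higher-level features from the whole
        # question context.
        self.question_enc = nn.GRU(dim, dim, batch_first=True)
        # Character-level component: lower-level features for the
        # candidate n-gram and for the candidate entity's KB label.
        self.char_enc = nn.GRU(dim, dim, batch_first=True)
        # KB context component: pooled embeddings of the entities that
        # surround the candidate entity in the knowledge graph.
        self.neighbor_proj = nn.Linear(dim, dim)
        # Aggregation: predict whether the n-gram is a mention of the
        # candidate entity.
        self.out = nn.Sequential(nn.Linear(4 * dim, dim), nn.ReLU(),
                                 nn.Linear(dim, 1))

    def forward(self, question_tok, mention_chars, label_chars, neighbors):
        _, q = self.question_enc(question_tok)         # question context
        _, m = self.char_enc(mention_chars)            # candidate n-gram
        _, l = self.char_enc(label_chars)              # entity label
        n = self.neighbor_proj(neighbors.mean(dim=1))  # KB neighborhood
        feats = torch.cat([q[-1], m[-1], l[-1], n], dim=-1)
        return self.out(feats).squeeze(-1)             # linking score

# Toy usage with random tensors standing in for learned embeddings:
tokens = "what are taylor swift 's albums ?".split()
spans = ngram_candidates(tokens)            # all n-grams up to length 4
scorer = MixedGranularityScorer()
score = scorer(torch.randn(1, 7, 64),       # 7 question token embeddings
               torch.randn(1, 14, 64),      # n-gram character embeddings
               torch.randn(1, 12, 64),      # entity label characters
               torch.randn(1, 5, 64))       # 5 neighboring KB entities
```

In the full model, all components are optimized jointly, so mention detection and entity disambiguation share the same training signal.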
Contributions The two main contributions of our work are:

(i) We construct two datasets to evaluate EL for QA and present a set of strong baselines: the existing EL systems that were used as a building block for QA before, and a model that uses manual features from the previous work on noisy data.

(ii) We design and implement an entity linking system that models contexts of variable granularity to detect and disambiguate entity mentions. To the best of our knowledge, we are the first to present a unified end-to-end neural model for entity linking on noisy data that operates on different context levels and does not rely on manual features. Our architecture addresses the challenges of entity linking on question answering data and outperforms state-of-the-art EL systems.

Code and Datasets Our system can be applied to any QA dataset. The complete code as well as the scripts that produce the evaluation data can be found here: https://github.com/UKPLab/starsem2018-entity-linking.

2 Motivation and Related Work

Several benchmarks exist for EL on Wikipedia texts and news articles, such as ACE (Bentivogli et al., 2010) and CoNLL-YAGO (Hoffart et al., 2011). These datasets contain multi-sentence documents and largely cover three types of entities: Location, Person and Organization. These types are commonly recognized by named entity recognition systems, such as the Stanford NER Tool (Manning et al., 2014). Therefore, in this scenario an EL system can focus solely on entity disambiguation.

In recent years, EL on Twitter data has emerged as a branch of entity linking research. In particular, EL on tweets was the central task of the NEEL shared task from 2014 to 2016 (Rizzo et al., 2017). Tweets share some of the challenges with QA data: in both cases the input data is short and noisy. On the other hand, the two domains differ significantly with respect to the entity types covered. The data for the NEEL shared task was annotated with 7 broad entity categories, which besides Location, Organization and Person include Fictional Characters, Events, Products (such as electronic devices or works of art) and Things (abstract objects). Figure 2 shows the distribution of entity categories in the training set from the NEEL 2014 competition. One can see on the diagram that the distribution is heavily skewed towards 3 categories: Location, Person and Organization.

[Figure 2: Distribution of entity categories (ratio of each entity type) in the NEEL 2014, WebQSP and GraphQuestions datasets. Categories: event, person, thing, location, product, organization, fictional character, professions etc.]

Figure 2 also shows the entity categories present in two QA datasets. The distribution over the categories is more diverse in this case. The WebQuestions dataset includes the Fictional Character and Thing categories, which are almost absent from the NEEL dataset. A more even distribution can be observed in the GraphQuestions dataset, which features many Events, Fictional Characters and Professions. This means that a successful system for EL on question data needs to be able to recognize and link all categories of entities. Thus, we aim to show that comprehensive modeling of different context levels will result in better generalization and performance across various entity categories.

Existing Solutions The early machine learning approaches to EL focused on long well-formed documents (Bunescu and Pasca, 2006; Cucerzan, 2007; Han and Sun, 2012; Francis-Landau et al., 2016). These systems usually rely on an off-the-shelf named entity recognizer to extract entity mentions [...] on QA data. Guo et al. (2013b) have created a new dataset of around 1500 tweets and suggested a Structured SVM approach that handled mention detection and entity disambiguation together. Chang et al. (2014) describe the winning system of the NEEL 2014 competition on EL for short texts: the system adapts a joint approach similar to Guo et al. (2013b), but uses the MART gradient boosting algorithm instead of the SVM and extends the feature set. The current state-of-the-art system for EL on noisy data is S-MART (Yang and Chang, 2015), which extends the approach of Chang et al. (2014) to make structured predictions. The same group has subsequently applied S-MART to extract entities for a QA system (Yih et al., 2015).

Unfortunately, the described EL systems for short texts are not available as stand-alone tools. Consequently, modern QA approaches mostly rely on off-the-shelf entity linkers that were designed for other domains.