Aggregated Semantic Matching for Short Text Entity Linking
Feng Nie1,∗ Shuyan Zhou2,∗ Jing Liu3,∗ Jinpeng Wang4, Chin-Yew Lin4, Rong Pan1∗
1Sun Yat-Sen University 2Harbin Institute of Technology 3Baidu Inc. 4Microsoft Research Asia
{fengniesysu, shyanzhou}@gmail.com, [email protected], {jinpwa, cyl}@microsoft.com, [email protected]

Abstract

The task of entity linking aims to identify concepts mentioned in text fragments and link them to a reference knowledge base. Entity linking in long text has been well studied in previous work. However, short text entity linking is more challenging, since the texts are noisy and less coherent. To better utilize the local information provided in short texts, we propose a novel neural network framework, Aggregated Semantic Matching (ASM), in which two different aspects of semantic information between the local context and the candidate entity are captured via representation-based and interaction-based neural semantic matching models, and the two matching signals then work jointly for disambiguation with a rank aggregation mechanism. Our evaluation shows that the proposed model outperforms the state of the art on public tweet datasets.

Tweet: The vile #Trump humanity raises its gentle face in Canada ... chapeau to #Trudeau
Candidates: Donald Trump, Trump (card games), ...

Table 1: An illustration of short text entity linking, with the mention Trump underlined.

1 Introduction

The task of entity linking aims to link a mention that appears in a piece of text to an entry (i.e., an entity) in a knowledge base. For example, as shown in Table 1, given the mention Trump in a tweet, it should be linked to the entity Donald Trump^1 in Wikipedia. Recent research has shown that entity linking can help better understand the text of a document (Schuhmacher and Ponzetto, 2014) and benefits several tasks, including named entity recognition (Luo et al.) and information retrieval (Xiong et al., 2017b). The research on entity linking mainly considers two types of documents: long text (e.g., news articles and web documents) and short text (e.g., tweets). In this paper, we focus on short text, particularly tweet entity linking.

One of the major challenges in the entity linking task is ambiguity, where an entity mention can denote multiple entities in a knowledge base. As shown in Table 1, the mention Trump can refer to the U.S. president Donald Trump and also to the card term Trump (card games). Many recent approaches for long text entity linking take advantage of global context, which captures the coherence among the mapped entities for a set of related mentions in a single document (Cucerzan, 2007; Han et al., 2011; Globerson et al., 2016; Heinzerling et al., 2017). However, short texts like tweets are often concise and less coherent, and lack the information such global methods need. In the NEEL dataset (Weller et al., 2016), there are only 3.4 mentions in each tweet on average. Several studies (Liu et al., 2013; Huang et al., 2014) investigate collective tweet entity linking by pre-collecting and considering multiple tweets simultaneously. However, multiple texts are not always available for collection, and the process is time-consuming. Thus, we argue that an efficient entity disambiguation method which requires only a single short text (e.g., a tweet) and can well utilize local contexts is better suited to real-world applications.

In this paper, we investigate entity disambiguation in a setting where only local information is available. Recent neural approaches have shown their superiority in capturing rich semantic similarities from mention contexts and entity contents. Sun et al. (2015) and Francis-Landau et al. (2016) proposed using convolutional neural networks (CNN) with a Siamese (symmetric) architecture to capture the similarity between texts. These approaches can be viewed as representation-focused semantic matching models. The representation-focused model first builds a representation for a single text (e.g., a context or an entity description) with a neural network, and then conducts matching between the abstract representations of the two pieces of text. Even though such models capture distinguishable information from both the mention and the entity side, some concrete matching signals are lost (e.g., exact match), since the matching between the two texts happens only after their individual abstract representations have been obtained. To enhance the representation-focused models, inspired by recent advances in information retrieval (Lu and Li, 2013; Guo et al., 2016; Xiong et al., 2017a), we propose using an interaction-focused approach to capture the concrete matching signals. The interaction-focused method builds local interactions (e.g., cosine similarity) between two pieces of text, and then uses neural networks to learn the final matching score based on these local interactions.

The representation- and interaction-focused approaches capture abstract- and concrete-level matching signals respectively; they can complement each other if designed appropriately. One straightforward way to combine multiple semantic matching signals is to apply a linear regression layer that learns a static weight for each matching signal (Francis-Landau et al., 2016). However, we observe that the importance of different signals varies case by case. For example, as shown in Table 1, the context word Canada is the most important word for the disambiguation of Trudeau; in this case, the concrete-level matching signal is required. In contrast, for the tweet "#StarWars #theForceAwakens #StarWarsForceAwakens @StarWars", @StarWars is linked to the entity Star Wars^2. In this case, the whole tweet describes the same topic "Star Wars", so the abstract-level semantic matching signal is helpful. To address this issue, we propose using a rank aggregation method to dynamically combine multiple semantic matching signals for disambiguation.

In summary, we focus on entity disambiguation that leverages only the local information. Specifically, we propose using both a representation-focused model and an interaction-focused model for semantic matching and view them as complementary to each other. To overcome the issue of static weights in linear regression, we apply rank aggregation to combine the multiple semantic matching signals captured by the two neural models on multiple text pairs. We conduct extensive experiments to examine the effectiveness of our proposed approach, ASM, on both the NEEL dataset and the MSR tweet entity linking (MSR-TEL for short) dataset.

∗ The corresponding author is Rong Pan. This work was done when the first and second authors were interns and the third author was an employee at Microsoft Research Asia.
1 https://en.wikipedia.org/wiki/Donald_Trump
2 https://en.wikipedia.org/wiki/Star_Wars

Proceedings of the 22nd Conference on Computational Natural Language Learning (CoNLL 2018), pages 476-485, Brussels, Belgium, October 31 - November 1, 2018. (c) 2018 Association for Computational Linguistics.

2 Background

2.1 Notations

Given a tweet t, it contains a set of identified queries Q = (q1, ..., qn). Each query q in a tweet t consists of m and ctx, where m denotes an entity mention and ctx denotes the context of the mention, i.e., a piece of text surrounding m in the tweet t. An entity is an unambiguous page (e.g., Donald Trump) in a referent knowledge base (KB). Each entity e consists of ttl and desc, where ttl denotes the title of e and desc denotes the description of e (e.g., the article defining e).

2.2 An Overview of the Linking System

Typically, an entity linking system consists of three components: mention detection, candidate generation and entity disambiguation. In this section, we briefly present existing solutions for the first two components. In the next section, we introduce our proposed aggregated semantic matching for entity disambiguation.

2.2.1 Mention Detection

Given a tweet t with a sequence of words w1, ..., wn, our goal is to identify the possible entity mentions in the tweet t. Specifically, every word wi in the tweet t requires a label indicating whether it is an entity mention word or not. Therefore, we view it as a traditional named entity recognition (NER) problem and use the BIO tagging schema. Given the tweet t, we aim to assign labels y = (y1, ..., yn) to the words in the tweet t:

    yi = B  if wi is the begin word of a mention;
         I  if wi is a non-begin word of a mention;
         O  if wi is not a mention word.

In our implementation, we apply an LSTM-CRF based NER tagging model which automatically learns contextual features for sequence tagging via recurrent neural networks (Lample et al., 2016).

2.2.2 Candidate Generation

Given a mention m, we use several heuristic rules to generate candidate entities, similar to (Bunescu and Pasca, 2006; Huang et al., 2014; Sun et al., 2015).

3 Model Overview

Figure 1: An overview of aggregated semantic matching for entity disambiguation. (Pipeline: Tweet Data and Knowledge Base feed Mention Detection and Candidate Generation; Semantic Matching applies a Convolutional Neural Network with Max-Pooling and a Neural Relevance Model with Kernel-Pooling; Rank Aggregation produces the Linking Results.)

[...] matching signals captured by the two neural models on four text pairs.

3.1 Semantic Matching

Formally, given two texts T1 and T2, the semantic similarity of the two texts is measured as a score produced by a matching function based on the representation of each text:

    match(T1, T2) = F(Φ(T1), Φ(T2))    (1)

where Φ is a function to learn the text representation, and F is the matching function based on the interaction between the representations. Existing neural semantic matching models can be categorized into two types: (a) the representation-focused model, which uses a complex representation learning function and a relatively simple matching function; (b) the interaction-focused model, which usually takes a
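The rank aggregation idea from the introduction can be illustrated generically. The paper's own aggregation mechanism is not detailed in this excerpt; the sketch below uses a plain Borda count, a standard rank aggregation method, purely as an illustration, and the function name and candidate lists are hypothetical.

```python
def borda_aggregate(rankings):
    """Combine several candidate rankings (best first) by Borda count:
    a candidate at position p in a ranking of length n earns n - p points."""
    scores = {}
    for ranking in rankings:
        n = len(ranking)
        for pos, cand in enumerate(ranking):
            scores[cand] = scores.get(cand, 0) + (n - pos)
    # Higher total score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda c: (-scores[c], c))

# Two matching models may rank candidate entities differently; aggregation
# lets agreement across signals decide the final linking order.
print(borda_aggregate([["Donald Trump", "Trump (card games)"],
                       ["Donald Trump", "Trump (card games)"]]))
```

Unlike a linear regression layer with static weights, rank-based aggregation only consumes each signal's relative ordering, so no single signal's score scale dominates.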
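The BIO scheme used for mention detection maps each word to B, I, or O; decoding a tag sequence back into mention spans can be sketched as below. The LSTM-CRF tagger itself is not reproduced, and the function name is illustrative.

```python
def decode_bio(words, tags):
    """Recover mention spans from per-word BIO tags."""
    mentions, current = [], []
    for word, tag in zip(words, tags):
        if tag == "B":  # begin word: close any open mention, start a new one
            if current:
                mentions.append(" ".join(current))
            current = [word]
        elif tag == "I" and current:  # non-begin word: extend the open mention
            current.append(word)
        else:  # O (or a stray I): close any open mention
            if current:
                mentions.append(" ".join(current))
            current = []
    if current:
        mentions.append(" ".join(current))
    return mentions

print(decode_bio(["chapeau", "to", "#Trudeau"], ["O", "O", "B"]))  # ["#Trudeau"]
```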
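The two instantiations of Eq. (1) can be contrasted with a toy sketch: a representation-focused matcher pools each text into a single vector before one cosine comparison, while an interaction-focused matcher scores word-by-word local interactions first and pools afterwards. The vectors stand in for learned embeddings, and the pooling choices are simplifications for illustration, not the paper's CNN or kernel-pooling models.

```python
import math

def _cos(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / (norm + 1e-9)

def representation_match(E1, E2):
    # Representation-focused: Phi mean-pools each text's word vectors into
    # one abstract vector, then F is a single cosine between the two.
    v1 = [sum(col) / len(E1) for col in zip(*E1)]
    v2 = [sum(col) / len(E2) for col in zip(*E2)]
    return _cos(v1, v2)

def interaction_match(E1, E2):
    # Interaction-focused: build local interactions (cosine for every word
    # pair), then pool them -- max over E2, averaged over E1 (a crude
    # stand-in for kernel pooling).
    return sum(max(_cos(u, v) for v in E2) for u in E1) / len(E1)
```

Note how an exact word match yields a full-strength local interaction here; this is precisely the concrete signal that can be blurred away once both texts are collapsed into abstract vectors.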
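Section 2.2.2 mentions heuristic candidate generation without listing the rules. A common baseline, shown here only as an assumption, is a lookup in an alias dictionary built from KB titles, redirects, and anchor texts; the dictionary entries and function name below are purely illustrative.

```python
# Hypothetical alias dictionary; real systems derive one from Wikipedia
# titles, redirect pages, and hyperlink anchor texts.
ALIAS_DICT = {
    "trump": ["Donald Trump", "Trump (card games)"],
    "trudeau": ["Justin Trudeau", "Pierre Trudeau"],
}

def generate_candidates(mention):
    # Normalize the surface form: strip hashtag/handle markers, lowercase.
    key = mention.lstrip("#@").lower()
    return ALIAS_DICT.get(key, [])

print(generate_candidates("#Trump"))  # ["Donald Trump", "Trump (card games)"]
```

The candidates returned for each mention are what the semantic matching stage must then disambiguate against the mention's local context.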