Improving Entity Linking through Semantic Reinforced Entity Embeddings

Feng Hou1,2, Ruili Wang1,2,∗, Jun He3, Yi Zhou4,2
1 School of Computer and Information Engineering, Zhejiang Gongshang University, Hangzhou, China
2 School of Natural and Computational Sciences, Massey University, New Zealand
3 School of Information Communication, National University of Defense Technology, China
4 Shanghai Research Center for Brain Science and Brain-Inspired Intelligence, Zhangjiang Lab, China
{f.hou, ruili.wang}@massey.ac.nz, [email protected], hejun [email protected], [email protected]
∗ Corresponding author

Abstract

Entity embeddings, which represent different aspects of each entity with a single vector like word embeddings, are a key component of neural entity linking models. Existing entity embeddings are learned from canonical Wikipedia articles and local contexts surrounding target entities. Such entity embeddings are effective, but too distinctive for linking models to learn contextual commonality. We propose a simple yet effective method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce the distinctiveness and facilitate the learning of contextual commonality. FGS2EE first uses the embeddings of semantic type words to generate semantic embeddings, and then combines them with existing entity embeddings through linear aggregation. Extensive experiments show the effectiveness of such embeddings. Based on our entity embeddings, we achieved new state-of-the-art performance on entity linking.

[Figure 1: Entity linking with embedded fine-grained semantic types. In the sentence "Appearing before Congress, Mr Mueller said he had not exonerated the president of obstruction of justice.", the mention "Congress" is linked to United States Congress [legislature, government, u.s.] rather than Congress of the Council of Europe [european, assembly], and "Mr Mueller" is linked to Robert Mueller [american, lawyer, government, official] rather than Greg Mueller [german, canadian, poker, player].]

1 Introduction

Entity Linking (EL), or Named Entity Disambiguation (NED), is the task of automatically resolving the ambiguity of entity mentions in natural language by linking them to concrete entities in a Knowledge Base (KB). For example, in Figure 1, the mentions "Congress" and "Mr. Mueller" are linked to their corresponding Wikipedia entries, respectively.

Neural entity linking models use local and global scores to rank and select a set of entities for the mentions in a document. Entity embeddings are critical for the local and global score functions. However, the current entity embeddings (Ganea and Hofmann, 2017) encode too many details of entities and are thus too distinctive for linking models to learn contextual commonality.

We hypothesize that fine-grained semantic types of entities can let the linking models learn contextual commonality about semantic relatedness. For example, rugby-related documents would contain entities of the types rugby player and rugby team. If a linking model learns the contextual commonality of rugby-related entities, it can correctly select entities of similar types using similar contextual information.

In this paper, we propose a method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce the distinctiveness and facilitate the learning of contextual commonality. FGS2EE uses the word embeddings of semantic words that represent the hallmarks of entities (e.g., writer, carmaker) to generate semantic embeddings. We find that training converges faster when using semantic reinforced entity embeddings.

Our proposed FGS2EE consists of four steps:

(i) creating a dictionary of fine-grained semantic words; (ii) extracting semantic type words from each entity's Wikipedia article; (iii) generating a semantic embedding for each entity; (iv) combining the semantic embeddings with existing embeddings through linear aggregation.

2 Background and Related Work

2.1 Local and Global Score for Entity Linking

The local score Ψ(e_i, c_i) (Ganea and Hofmann, 2017) measures the relevance of the entity candidates of each mention independently:

Ψ(e_i, c_i) = e_i^⊤ B f(c_i)

where e_i ∈ R^d is the embedding of candidate entity e_i, B ∈ R^{d×d} is a diagonal matrix, and f(c_i) ∈ R^d is a feature representation of the local context c_i surrounding mention m_i.

In addition to the local score, the global score adds a pairwise score Φ(e_i, e_j, D) to take the coherence of the entities in document D into account:

Φ(e_i, e_j, D) = (1 / (n − 1)) e_i^⊤ C e_j

where e_i, e_j ∈ R^d are the embeddings of entities e_i and e_j, which are candidates for mentions m_i and m_j respectively, and C ∈ R^{d×d} is a diagonal matrix. The pairwise score of Le and Titov (2018) considers K relations between entities:

Φ(e_i, e_j, D) = Σ_{k=1}^{K} α_{ijk} e_i^⊤ R_k e_j

where α_{ijk} is the weight for relation k, and R_k is a diagonal matrix measuring relation k between the two entities.
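To make the score forms concrete, here is a minimal NumPy sketch of the local score and the K-relation pairwise score above, with the diagonal matrices B and R_k stored as vectors; the dimensions, weights and random inputs are illustrative placeholders, not the trained parameters of any of the cited models.

```python
import numpy as np

d, K = 300, 3  # embedding size and number of relations (illustrative values)
rng = np.random.default_rng(0)

# Diagonal matrices are stored as d-dimensional vectors.
B = rng.normal(size=d)            # diagonal of the local-score bilinear form
R = rng.normal(size=(K, d))       # one diagonal matrix per relation
alpha_ijk = np.full(K, 1.0 / K)   # relation weights for one mention pair

def local_score(e_i: np.ndarray, f_ci: np.ndarray) -> float:
    """Psi(e_i, c_i) = e_i^T B f(c_i) with diagonal B."""
    return float(np.sum(e_i * B * f_ci))

def pairwise_score(e_i: np.ndarray, e_j: np.ndarray) -> float:
    """Phi(e_i, e_j, D) = sum_k alpha_ijk * e_i^T R_k e_j with diagonal R_k."""
    return float(np.sum(alpha_ijk * np.sum(e_i * R * e_j, axis=1)))

e_i, e_j, f_ci = rng.normal(size=(3, d))  # random stand-ins for real embeddings
print(local_score(e_i, f_ci), pairwise_score(e_i, e_j))
```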
2.2 Related Work

Our research focuses on improving the vector representations of entities through fine-grained semantic types. Related topics are as follows.

Entity Embeddings. Similar to word embeddings, entity embeddings are vector representations of entities. The methods of Yamada et al. (2016), Fang et al. (2016) and Zwicklbauer et al. (2016) use data about entity-entity co-occurrences to learn entity embeddings, and often suffer from the sparsity of co-occurrence statistics. Ganea and Hofmann (2017) learned entity embeddings using words from canonical Wikipedia articles and the local context surrounding anchor links. They used the word vectors (Mikolov et al., 2013) of positive words and random negative words as input to the learning objective. Thus their entity embeddings are aligned with the Word2Vec word embeddings.

Fine-grained Entity Typing. Fine-grained entity typing is the task of classifying entities into fine-grained types (Ling and Weld, 2012) or ultra fine-grained semantic labels (Choi et al., 2018). Bhowmik and de Melo (2018) used a memory-based network to generate a short description of an entity; e.g., "Roger Federer" is described as "Swiss tennis player". In this paper, we heuristically extract fine-grained semantic types from the first sentence of Wikipedia articles.

Embeddings Aggregation. Our research is closely related to work on aggregating and evaluating the information content of embeddings from different sources (e.g., polysemous words have multiple sense embeddings), and on the fusion of multiple data sources (Wang et al., 2018). Arora et al. (2018) hypothesize that the global embedding of a word is a linear combination of its sense embeddings, and showed that the senses can be recovered through sparse coding. Mu et al. (2017) showed that senses and word embeddings are linearly related and that sense sub-spaces tend to intersect over a line. Yaghoobzadeh et al. (2019) probe the aggregated word embeddings of polysemous words for semantic classes. They created a WIKI-PSE corpus, where word and semantic class pairs are annotated using Wikipedia anchor links; e.g., "apple" has two semantic classes: food and organization. A separate embedding for each semantic class was learned based on the WIKI-PSE corpus. They found that the linearly aggregated embeddings of polysemous words represent their semantic classes well.

The most similar work is that of Gupta et al. (2017), but there are many differences: (i) they use the FIGER (Ling and Weld, 2012) type taxonomy, which contains 112 manually curated types organized into 2 levels, whereas we employ over 3,000 vocabulary words as types and treat them as a flat list; (ii) they mapped Freebase types to FIGER types, but this mapping is less credible, as noted by Gillick et al. (2014), whereas we extract type words directly from Wikipedia articles, which is more reliable; (iii) their entity vectors and type vectors are learned jointly on a limited corpus, whereas ours are linear aggregations of existing entity vectors and word vectors learned from a large corpus; such fine-grained semantic word embeddings are helpful for capturing informative context.

2.3 Motivation

Coarse-grained semantic types (e.g., person) have been used for candidate selection (Ganea and Hofmann, 2017). We observe that fine-grained semantic words appear frequently as appositions (e.g., Defense contractor Raytheon), coreferences (e.g., the company) or anonymous mentions (e.g., American defense firms). These fine-grained types of entities can help capture the local contexts and relations of entities.

Some of these semantic words have been used for learning entity embeddings, but they are diluted by other unimportant or noisy words. We reinforce entity embeddings with such fine-grained semantic types.

3 Extracting Fine-grained Semantic Types

We first create a dictionary of fine-grained semantic types, and then extract fine-grained types for each entity.

3.1 Semantic Type Dictionary

We select words that can encode the hallmarks of individual entities. The desiderata are as follows:

• profession/subject, e.g., footballer, soprano, biology, rugby.
• title, e.g., president, ceo, head, director.
• industry/genre, e.g., carmaker, manufacturer, defense contractor, hip hop.
• geospatial, e.g., canada, asian, australian.
• ideology/religion, e.g., communism, buddhism.
• miscellaneous, e.g., book, film, tv, ship, language.

We extract noun frequencies from the first sentence of each entity in the Wikipedia dump. Then some seed words are manually selected from the frequent nouns. We use word similarity to extend these seed words and finally obtain a dictionary of 3,227 fine-grained semantic words. Specifically, we use spaCy to compute the similarity between words. For each seed word, we find the top 100 similar words that also appear in Wikipedia articles. We then manually select semantic words from these extended words.
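As a rough illustration of how such a dictionary could be assembled, the sketch below counts nouns in the first sentences of Wikipedia articles and extends seed words by vector similarity. The spaCy model name, the cut-offs, and the helper names (frequent_nouns, extend_seeds) are assumptions for illustration, and the final manual filtering step is not shown.

```python
from collections import Counter
import numpy as np
import spacy

nlp = spacy.load("en_core_web_md")  # any spaCy model that ships word vectors

def frequent_nouns(first_sentences, top_n=5000):
    """Count nouns in the first sentence of each Wikipedia article."""
    counts = Counter()
    for doc in nlp.pipe(first_sentences):
        counts.update(t.lemma_.lower() for t in doc if t.pos_ in ("NOUN", "PROPN"))
    return [w for w, _ in counts.most_common(top_n)]

def extend_seeds(seeds, candidates, top_k=100):
    """For each seed word, rank candidate words by cosine similarity of their vectors."""
    vecs = {w: nlp.vocab[w].vector for w in candidates if nlp.vocab[w].has_vector}
    words = list(vecs)
    mat = np.stack([vecs[w] for w in words])
    mat = mat / (np.linalg.norm(mat, axis=1, keepdims=True) + 1e-8)
    extended = {}
    for seed in seeds:
        sv = nlp.vocab[seed].vector
        sv = sv / (np.linalg.norm(sv) + 1e-8)
        order = np.argsort(-(mat @ sv))[:top_k]
        extended[seed] = [words[i] for i in order]
    return extended  # candidates are then filtered manually into the final dictionary
```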
3.2 Extracting Semantic Types

For each entity, we extract at most 11 dictionary words (phrases) from its Wikipedia article. For example, "Robert Mueller" in Figure 1 will be typed as [american, lawyer, government, official, director].

3.3 Remapping Semantic Words

For some semantic words (e.g., conchologist) or semantic phrases (e.g., rugby league), there are no word embeddings available for generating the semantic entity embeddings. We remap these semantic words to semantically similar words that are more common. For example, conchologist is remapped to zoologist, and rugby league is remapped to rugby.

4 FGS2EE: Injecting Fine-Grained Semantic Information into Entity Embeddings

FGS2EE first uses the semantic words of each entity to generate semantic entity embeddings, and then combines them with existing entity embeddings to generate semantic reinforced entity embeddings.

4.1 Semantic Entity Embeddings

Based on the semantic words of each entity, we can produce a semantic entity embedding. We treat each semantic word as a sense of the entity. The embedding of each sense is represented by the Word2Vec embedding of the semantic word.¹ Suppose we only consider T semantic words for each entity, and the set of semantic words of entity e is denoted as S_e. Then the semantic entity embedding e^s of entity e is generated as follows:

e^s = (1/T) Σ_{i=1}^{T} e_{w_i}    (1)

where w_i ∈ S_e is the i-th semantic word and e_{w_i} is the Word2Vec embedding of semantic word w_i. If |S_e| < T, then T = |S_e|.

¹ https://code.google.com/archive/p/word2vec/
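The following sketch puts Sections 3.2-3.3 and Equation (1) together: extract up to T dictionary words from an entity's first sentence, remap words that have no Word2Vec vector, and average the remaining vectors. The file path, the naive substring matching, and the helper names are illustrative assumptions rather than the released implementation.

```python
import numpy as np
from gensim.models import KeyedVectors

# GoogleNews Word2Vec vectors, as referenced in footnote 1 (path is a placeholder).
w2v = KeyedVectors.load_word2vec_format("GoogleNews-vectors-negative300.bin", binary=True)

REMAP = {"conchologist": "zoologist", "rugby league": "rugby"}  # Section 3.3 examples

def semantic_words(first_sentence, type_dict, T=11):
    """Pick up to T dictionary words (phrases) found in the entity's first sentence."""
    text = first_sentence.lower()
    hits = [w for w in type_dict if w in text]  # naive substring match, for illustration
    return hits[:T]

def semantic_entity_embedding(words):
    """Equation (1): average of the Word2Vec vectors of the entity's semantic words."""
    vecs = []
    for w in words:
        w = REMAP.get(w, w)          # remap rare words/phrases to more common ones
        if w in w2v:
            vecs.append(w2v[w])
    if not vecs:                     # no usable semantic word: fall back to a zero vector
        return np.zeros(w2v.vector_size, dtype=np.float32)
    return np.mean(vecs, axis=0)
```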

Entity Embeddings | Linking Methods | AIDA-B | MSNBC | AQUAINT | ACE2004 | CWEB | WIKI | Avg

Wikipedia
- | (Milne and Witten, 2008) | - | 78 | 85 | 81 | 64.1 | 81.7 | 77.96
- | (Ratinov et al., 2011) | - | 75 | 83 | 82 | 56.2 | 67.2 | 72.68
- | (Hoffart et al., 2011) | - | 79 | 56 | 80 | 58.6 | 63 | 67.32
- | (Cheng and Roth, 2013) | - | 90 | 90 | 86 | 67.5 | 73.4 | 81.38
- | (Chisholm and Hachey, 2015) | 84.9 | - | - | - | - | - | -

Wiki + Unlabelled documents
- | (Lazic et al., 2015) | 86.4 | - | - | - | - | - | -
(Ganea and Hofmann, 2017) | (Le and Titov, 2019) | 89.66±0.16 | 92.2±0.2 | 90.7±0.2 | 88.1±0.0 | 78.2±0.2 | 81.7±0.1 | 86.18
T = 6, α = 0.1 | (Le and Titov, 2019) | 89.58±0.2 | 92.3±0.1 | 90.93±0.2 | 87.88±0.17 | 78.47±0.11 | 81.71±0.21 | 86.26
T = 11, α = 0.2 | (Le and Titov, 2019) | 89.23±0.31 | 92.15±0.24 | 91.22±0.18 | 88.02±0.15 | 78.29±0.17 | 81.92±0.36 | 86.32

Wiki + Extra supervision
- | (Chisholm and Hachey, 2015) | 88.7 | - | - | - | - | - | -

Fully-supervised (Wiki + AIDA train)
- | (Guo and Barbosa, 2016) | 89.0 | 92 | 87 | 88 | 77 | 84.5 | 85.7
- | (Globerson et al., 2016) | 91.0 | - | - | - | - | - | -
(Yamada et al., 2016) | (Yamada et al., 2016) | 91.5 | - | - | - | - | - | -
(Ganea and Hofmann, 2017) | (Ganea and Hofmann, 2017) | 92.22±0.14 | 93.7±0.1 | 88.5±0.4 | 88.5±0.3 | 77.9±0.1 | 77.5±0.1 | 85.22
(Ganea and Hofmann, 2017) | (Le and Titov, 2018) | 93.07±0.27 | 93.9±0.2 | 88.3±0.6 | 89.9±0.8 | 77.5±0.1 | 78.0±0.1 | 85.5
(Ganea and Hofmann, 2017) | DCA (Yang et al., 2019) | 93.73±0.2 | 93.80±0.0 | 88.25±0.4 | 90.14±0.0 | 75.59±0.3 | 78.84±0.2 | 85.32
T = 6, α = 0.1 | (Le and Titov, 2018) | 92.29±0.21 | 94.1±0.24 | 88.0±0.38 | 90.14±0.32 | 77.23±0.18 | 77.16±0.43 | 85.33
T = 11, α = 0.2 | (Le and Titov, 2018) | 92.63±0.14 | 94.26±0.17 | 88.47±0.23 | 90.7±0.28 | 77.41±0.21 | 77.66±0.23 | 85.7

Table 1: F1 scores on six test sets. The last column is the average of F1 scores on the five out-domain test sets.

4.2 Semantic Reinforced Entity Embeddings

We create a semantic reinforced embedding for each entity by linearly aggregating the semantic entity embeddings and the Word2Vec-style entity embeddings of Ganea and Hofmann (2017) (hereafter referred to as "Wikitext entity embeddings").

Our semantic entity embeddings tend to be homogeneous. If we simply averaged them with the Wikitext embeddings, the aggregated embeddings would be homogeneous too, and the entity linking model would not be able to distinguish between similar candidates. Our semantic reinforced entity embedding is therefore a weighted sum of the semantic entity embedding and the Wikitext entity embedding, similar to Yaghoobzadeh et al. (2019). We use a parameter α to control the weight of the semantic entity embedding, so that the aggregated (semantic reinforced) entity embeddings achieve a trade-off between homogeneity and heterogeneity:

e^a = (1 − α) e^w + α e^s    (2)

where e^w is the Wikitext entity embedding of entity e.
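A minimal sketch of Equation (2) applied to whole embedding matrices; the matrix layout and function name are assumptions, and in practice the rows must stay aligned with the entity vocabulary expected by the linking models.

```python
import numpy as np

def semantic_reinforced(wikitext_emb: np.ndarray, semantic_emb: np.ndarray,
                        alpha: float = 0.2) -> np.ndarray:
    """Equation (2): e_a = (1 - alpha) * e_w + alpha * e_s, applied row by row.

    Both matrices are (num_entities, d) and aligned on the same entity ids;
    alpha trades off heterogeneity (Wikitext) against homogeneity (semantic).
    """
    assert wikitext_emb.shape == semantic_emb.shape
    return (1.0 - alpha) * wikitext_emb + alpha * semantic_emb

# e.g., the T = 11, alpha = 0.2 configuration reported in Table 1:
# reinforced = semantic_reinforced(e_wiki, e_sem, alpha=0.2)
```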

5 Experiments

5.1 Datasets and Evaluation Metric

We use the Wikipedia dump 20190401 to extract the fine-grained semantic type dictionary and the semantic types of entities. We use the Wikitext entity embeddings shared by Le and Titov (2018, 2019). For entity linking corpora, we use the datasets shared by Ganea and Hofmann (2017) and Le and Titov (2018, 2019).

We use the standard micro F1-score as the evaluation metric. Our data and source code are publicly available on GitHub.²

² https://github.com/fhou80/EntEmb/

5.2 Experimental Settings

The parameters T in Equation (1) and α in Equation (2) are critical for the effectiveness of our semantic reinforced entity embeddings. We generate two sets of entity embeddings with two combinations of parameters: T = 6, α = 0.1 and T = 11, α = 0.2.

To test the effectiveness of our semantic reinforced entity embeddings, we use the publicly available entity linking models mulrel (Le and Titov, 2018) (ment-norm, K = 3) and wnel (Le and Titov, 2019). We do not optimize their entity linking code; we just replace the entity embeddings with our semantic reinforced entity embeddings.

Similar to Ganea and Hofmann (2017) and Le and Titov (2018, 2019), we run our system 5 times for each combination of entity embeddings and linking model, and report the mean and 95% confidence interval of the micro F1 score.
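For the reporting protocol just described (5 runs per configuration, mean and 95% confidence interval of micro F1), here is a small sketch using a normal-approximation interval; this is one common convention and may differ from the exact procedure used in the cited papers, and the run scores shown are made up purely for illustration.

```python
import numpy as np

def mean_and_ci95(scores):
    """Mean and 95% confidence half-width over repeated runs (normal approximation)."""
    scores = np.asarray(scores, dtype=float)
    mean = scores.mean()
    sem = scores.std(ddof=1) / np.sqrt(len(scores))
    return mean, 1.96 * sem

# e.g., micro F1 from 5 runs of one embedding/model combination (made-up numbers)
runs = [92.55, 92.71, 92.60, 92.68, 92.61]
m, hw = mean_and_ci95(runs)
print(f"{m:.2f} ± {hw:.2f}")
```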

[Figure 2: Learning curves (model loss vs. epoch) of mulrel (Le and Titov, 2018) using two different sets of entity embeddings: semantic reinforced and Wikitext.]

5.3 Results

The results on the six test sets are shown in Table 1. For the mulrel model, our entity embeddings (T = 11, α = 0.2) improve performance drastically on MSNBC, ACE2004 and the average of the out-domain test sets. Note that CWEB and WIKI are believed to be less reliable (Ganea and Hofmann, 2017). For the wnel model, both of our sets of entity embeddings are more effective on four of the five out-domain test sets and on the average.

Our entity embeddings are better than those of Ganea and Hofmann (2017) when tested on the mulrel (Le and Titov, 2018) (ment-norm, K = 3) and wnel (Le and Titov, 2019) entity linking models. Ganea and Hofmann (2017) showed that their entity embeddings are better than those of Yamada et al. (2016) using the entity relatedness metrics.

One notable property of our semantic reinforced entity embeddings is that training with them converges much faster than training with the Wikitext entity embeddings, as shown in Figure 2. One reasonable explanation is that the fine-grained semantic information lets the linking models capture the commonality of semantic relatedness between entities and contexts, and hence facilitates training.

The properties of the two sets of entity embeddings are visually manifested in Figure 3. Our semantic reinforced entity embeddings draw entities of similar types closer, and entities of different types further apart. For example, our semantic reinforced embeddings of "John F. Kennedy University" and "Harvard University" are closer than the Wikitext embeddings, while our embeddings of "John F. Kennedy International Airport" and "John F. Kennedy" are further apart. We believe this property contributes to the faster convergence.

[Figure 3: T-SNE visualization of two sets of entity embeddings. Suffix "_wiki" denotes the Wikitext entity embeddings, while suffix "_sri" denotes the semantic reinforced entity embeddings (T = 11, α = 0.2). Entities shown include Harvard University, John F. Kennedy University, John F. Kennedy, John F. Kennedy International Airport, Arkansas International Airport, George H. W. Bush, George W. Bush, Bill Clinton, Barack Obama and Donald Trump.]

6 Conclusion

In this paper, we presented a simple yet effective method, FGS2EE, to inject fine-grained semantic information into entity embeddings to reduce the distinctiveness and facilitate the learning of contextual commonality. FGS2EE first uses the word embeddings of semantic type words to generate semantic embeddings, and then combines them with existing entity embeddings through linear aggregation. Our entity embeddings draw entities of similar types closer, while entities of different types are drawn further apart; this facilitates the learning of semantic commonalities about entity-context and entity-entity relations. We have achieved new state-of-the-art performance using our entity embeddings.

For future work, we plan to extract fine-grained semantic types from unlabelled documents and use the relatedness between the fine-grained types and contexts as distant supervision for entity linking.

References

Sanjeev Arora, Yuanzhi Li, Yingyu Liang, Tengyu Ma, and Andrej Risteski. 2018. Linear algebraic structure of word senses, with applications to polysemy. Transactions of the Association for Computational Linguistics, 6:483–495.

Rajarshi Bhowmik and Gerard de Melo. 2018. Generating fine-grained open vocabulary entity type descriptions. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics.

Xiao Cheng and Dan Roth. 2013. Relational inference for wikification. In Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing, pages 1787–1796.

Andrew Chisholm and Ben Hachey. 2015. Entity disambiguation with web links. Transactions of the Association for Computational Linguistics, 3:145–156.

Eunsol Choi, Omer Levy, Yejin Choi, and Luke Zettlemoyer. 2018. Ultra-fine entity typing. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics, pages 87–96. Association for Computational Linguistics.

Daniel Gillick, Nevena Lazic, Kuzman Ganchev, Jesse Kirchner, and David Huynh. 2014. Context-dependent fine-grained entity type tagging. arXiv preprint arXiv:1412.1820.

Wei Fang, Jianwen Zhang, Dilin Wang, Zheng Chen, and Ming Li. 2016. Entity disambiguation by knowledge and text jointly embedding. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 260–269, Berlin, Germany. Association for Computational Linguistics.

Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep joint entity disambiguation with local neural attention. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2619–2629, Copenhagen, Denmark. Association for Computational Linguistics.

Amir Globerson, Nevena Lazic, Soumen Chakrabarti, Amarnag Subramanya, Michael Ringaard, and Fernando Pereira. 2016. Collective entity resolution with multi-focal attention. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 621–631.

Zhaochen Guo and Denilson Barbosa. 2016. Robust named entity disambiguation with random walks. Semantic Web (Preprint), 9(4):459–479.

Nitish Gupta, Sameer Singh, and Dan Roth. 2017. Entity linking via joint encoding of types, descriptions, and context. In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing, pages 2681–2690, Copenhagen, Denmark. Association for Computational Linguistics.

Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing, pages 782–792. Association for Computational Linguistics.

Nevena Lazic, Amarnag Subramanya, Michael Ringgaard, and Fernando Pereira. 2015. Plato: A selective context model for entity resolution. Transactions of the Association for Computational Linguistics, 3:503–515.

Phong Le and Ivan Titov. 2018. Improving entity linking by modeling latent relations between mentions. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1595–1604, Melbourne, Australia. Association for Computational Linguistics.

Phong Le and Ivan Titov. 2019. Boosting entity linking performance by leveraging unlabeled documents. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 1935–1945, Florence, Italy. Association for Computational Linguistics.

Xiao Ling and Daniel S. Weld. 2012. Fine-grained entity recognition. In Proceedings of Association for the Advancement of Artificial Intelligence.

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Distributed representations of words and phrases and their compositionality. In Advances in Neural Information Processing Systems, pages 3111–3119.

David Milne and Ian H. Witten. 2008. Learning to link with wikipedia. In Proceedings of the 17th ACM Conference on Information and Knowledge Management, pages 509–518. ACM.

Jiaqi Mu, Suma Bhat, and Pramod Viswanath. 2017. Geometry of polysemy. In Proceedings of the 5th International Conference on Learning Representations.

Lev Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and global algorithms for disambiguation to wikipedia. In Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies - Volume 1, pages 1375–1384. Association for Computational Linguistics.

Ruili Wang, Wanting Ji, Mingzhe Liu, Xun Wang, Jian Weng, Song Deng, Suying Gao, and Chang-an Yuan. 2018. Review on mining data from multiple data sources. Pattern Recognition Letters, 109:120–128.

Yadollah Yaghoobzadeh, Katharina Kann, T. J. Hazen, Eneko Agirre, and Hinrich Schütze. 2019. Probing for semantic classes: Diagnosing the meaning content of word embeddings. In Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, pages 5740–5753, Florence, Italy. Association for Computational Linguistics.

Ikuya Yamada, Hiroyuki Shindo, Hideaki Takeda, and Yoshiyasu Takefuji. 2016. Joint learning of the embedding of words and entities for named entity disambiguation. In Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, pages 250–259, Berlin, Germany. Association for Computational Linguistics.

Xiyuan Yang, Xiaotao Gu, Sheng Lin, Siliang Tang, Yueting Zhuang, Fei Wu, Zhigang Chen, Guoping Hu, and Xiang Ren. 2019. Learning dynamic context augmentation for global entity linking. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), pages 271–281, Hong Kong, China. Association for Computational Linguistics.

Stefan Zwicklbauer, Christin Seifert, and Michael Granitzer. 2016. Robust and collective entity disambiguation through semantic embeddings. In Proceedings of the 39th International ACM SIGIR Conference, pages 425–434. ACM.
