Global Entity Disambiguation with Pretrained Contextualized Embeddings of Words and Entities

Ikuya Yamada 1,4   Koki Washio 2,4   Hiroyuki Shindo 3,4   Yuji Matsumoto 4
[email protected]   [email protected]   [email protected]   [email protected]
1 Studio Ousia, Tokyo, Japan
2 The University of Tokyo, Tokyo, Japan
3 Nara Institute of Science and Technology, Nara, Japan
4 RIKEN AIP, Tokyo, Japan

Abstract

We propose a new global entity disambiguation (ED) model based on contextualized embeddings of words and entities. Our model is based on a bidirectional transformer encoder (i.e., BERT) and produces contextualized embeddings for words and entities in the input text. The model is trained using a new masked entity prediction task that aims to train the model by predicting randomly masked entities in entity-annotated texts obtained from Wikipedia. We further extend the model by solving ED as a sequential decision task to capture global contextual information. We evaluate our model using six standard ED datasets and achieve new state-of-the-art results on all but one dataset.

…the model by predicting randomly masked entities based on words and non-masked entities. We train the model using texts and their entity annotations retrieved from Wikipedia.

Furthermore, we introduce a simple extension to the inference step of the model to capture global contextual information. Specifically, similar to the approach used in past work (Fang et al., 2019; Yang et al., 2019), we address ED as a sequential decision task that disambiguates mentions one by one, and uses words and already disambiguated entities to disambiguate new mentions.

We evaluate the proposed model using six standard ED datasets and achieve new state-of-the-art results on all but one dataset.
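To make the masked entity prediction objective concrete, the following is a minimal PyTorch sketch of the idea: words and entity annotations form one input sequence to a transformer encoder, some entity inputs are randomly replaced with a special [MASK] entity, and the model is trained to recover their original ids. All names here (ToyEDModel, the vocabulary sizes, MASK_ENTITY) are illustrative assumptions, not the paper's actual implementation.

```python
# Toy sketch of masked entity prediction; sizes and names are assumptions.
import torch
import torch.nn as nn

WORD_VOCAB, ENTITY_VOCAB, DIM = 1000, 500, 64
MASK_ENTITY = 0  # reserved id for the special [MASK] entity (assumption)

class ToyEDModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.word_emb = nn.Embedding(WORD_VOCAB, DIM)
        self.entity_emb = nn.Embedding(ENTITY_VOCAB, DIM)
        layer = nn.TransformerEncoderLayer(DIM, nhead=4, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=2)
        self.entity_head = nn.Linear(DIM, ENTITY_VOCAB)  # scores entity ids

    def forward(self, word_ids, entity_ids):
        # Words and entities share one sequence, so self-attention lets a
        # masked entity attend to words and to non-masked entities.
        x = torch.cat([self.word_emb(word_ids), self.entity_emb(entity_ids)], dim=1)
        h = self.encoder(x)
        entity_h = h[:, word_ids.size(1):]  # hidden states at entity positions
        return self.entity_head(entity_h)

model = ToyEDModel()
word_ids = torch.randint(1, WORD_VOCAB, (2, 16))     # a toy annotated text
entity_ids = torch.randint(1, ENTITY_VOCAB, (2, 4))  # its entity annotations

# Randomly mask entities and train the model to recover their original ids.
mask = torch.rand(entity_ids.shape) < 0.3
mask[0, 0] = True  # ensure at least one entity is masked in this toy batch
inputs = entity_ids.masked_fill(mask, MASK_ENTITY)
logits = model(word_ids, inputs)
loss = nn.functional.cross_entropy(logits[mask], entity_ids[mask])
loss.backward()
```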
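The sequential inference extension can be sketched in the same toy setting, reusing ToyEDModel and MASK_ENTITY from the sketch above: all mentions start masked, and at each step one prediction is committed so that later decisions can condition on already disambiguated entities. Committing the most confident mention first is an illustrative choice here, not necessarily the paper's exact decision order.

```python
# Hedged sketch of sequential (global) ED inference; the confidence-ordered
# loop is an assumption made for illustration.
import torch

@torch.no_grad()
def sequential_disambiguate(model, word_ids, num_mentions):
    entity_ids = torch.full((1, num_mentions), MASK_ENTITY, dtype=torch.long)
    resolved = torch.zeros(num_mentions, dtype=torch.bool)
    for _ in range(num_mentions):
        probs = model(word_ids, entity_ids).softmax(dim=-1)[0]
        probs[:, MASK_ENTITY] = 0.0              # never predict [MASK] itself
        best_prob, best_entity = probs.max(dim=-1)  # per-mention top entity
        best_prob[resolved] = -1.0               # skip already-resolved mentions
        m = int(best_prob.argmax())              # most confident open mention
        entity_ids[0, m] = best_entity[m]        # commit its entity
        resolved[m] = True
    return entity_ids[0]

predicted = sequential_disambiguate(model, word_ids[:1], num_mentions=4)
```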