Building a Corpus for Japanese Wikification with Fine-Grained Entity Classes
Davaajav Jargalsaikhan, Naoaki Okazaki, Koji Matsuda, Kentaro Inui
Tohoku University, Japan
{davaajav, okazaki, matsuda, inui}@ecei.tohoku.ac.jp

Abstract

In this research, we build a Wikification corpus for advancing Japanese Entity Linking. This corpus consists of 340 Japanese newspaper articles with 25,675 entity mentions. All entity mentions are labeled with fine-grained semantic classes (200 classes), and 19,121 mentions were successfully linked to Japanese Wikipedia articles. Even with the fine-grained semantic classes, we found it hard to define the target of entity linking annotations and to utilize the fine-grained semantic classes to improve the accuracy of entity linking.

1 Introduction

Entity linking (EL) recognizes mentions in a text and associates them with their corresponding entries in a knowledge base (KB), for example, [...]

EL is useful for various NLP tasks, e.g., Question-Answering (Khalid et al., 2008), Information Retrieval (Blanco et al., 2015), Knowledge Base Population (Dredze et al., 2010), and Co-Reference Resolution (Hajishirzi et al., 2013). There are about a dozen datasets targeting EL in English, including the UIUC datasets (ACE, MSNBC) (Ratinov et al., 2011), the AIDA datasets (Hoffart et al., 2011), and the TAC-KBP datasets (2009–2012 datasets) (McNamee and Dang, 2009).

Ling et al. (2015) discussed various challenges in EL. They argued that the existing datasets are inconsistent with each other. For instance, TAC-KBP targets only mentions belonging to the PERSON, LOCATION, and ORGANIZATION classes. Although these entity classes may be dominant in articles, other tasks may require information on natural phenomena, product names, and institution names.