arXiv:2006.01969v1 [cs.IR] 2 Jun 2020

REL: An Entity Linker Standing on the Shoulders of Giants

Johannes M. van Hulst (Radboud University, [email protected]), Faegheh Hasibi (Radboud University, [email protected]), Koen Dercksen (Radboud University, [email protected]), Krisztian Balog (University of Stavanger, [email protected]), Arjen P. de Vries (Radboud University, [email protected])

ABSTRACT
Entity linking is a standard component in modern retrieval systems that is often performed by third-party toolkits. Despite the plethora of open source options, it is difficult to find a single system that has a modular architecture where certain components may be replaced, does not depend on external sources, can easily be updated to newer Wikipedia versions, and, most importantly, has state-of-the-art performance. The REL system presented in this paper aims to fill that gap. Building on state-of-the-art neural components from natural language processing research, it is provided as a Python package as well as a web API. We also report on an experimental comparison against both well-established systems and the current state-of-the-art on standard entity linking benchmarks.

KEYWORDS
Entity Linking Toolkit; Entity Disambiguation; NER

ACM Reference format:
Johannes M. van Hulst, Faegheh Hasibi, Koen Dercksen, Krisztian Balog, and Arjen P. de Vries. 2020. REL: An Entity Linker Standing on the Shoulders of Giants. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR '20), July 25–30, 2020, Virtual Event, China. 4 pages. ACM ISBN 978-1-4503-8016-4/20/07. DOI: 10.1145/3397271.3401416

1 INTRODUCTION
Entity linking (EL) refers to the task of recognizing mentions of specific entities in text and assigning unique identifiers to them from an underlying knowledge repository [2]. The problems of entity recognition and disambiguation have traditionally been studied in the natural language processing (NLP) community. It was also them who first recognized the utility of a large scale knowledge repository to disambiguate against [4, 6]. This line of work has quickly been followed up by information retrieval (IR) researchers [17, 18]. Over the past years, entity linking has become a standard component in modern retrieval systems, and has been leveraged in a range of tasks, including document ranking [26], entity retrieval [11], knowledge base population [3], and query recommendation [23]. Since entity linking is not the main focus of these works, it is commonly performed by some third-party toolkit, with the resulting annotations being utilized in downstream processing. Some of the most prominent toolkits used for this purpose include DBpedia Spotlight [16], TAGME [7], WAT [21], and FEL [19].

Existing toolkits fall short in a number of areas. Some are unmaintained [19]; others are meant for long text and inefficient for short text [12]; some rely on external sources like web search engines [5]. Typically, they are shipped with a specific Wikipedia version that has become dated, causing difficulties when attempting to update to a recent Wikipedia [5, 21]. The lack of speed (throughput) is another issue that often remains unaddressed. Moreover, none of the default open source entity linkers incorporate the recent progress made in the NLP community on neural network-based approaches [14].

With this work, we aim to remedy all of these problems and close that gap by introducing an efficient, up-to-date entity linker that has a modular architecture to ease, e.g., updates of external resources like Wikipedia. We present REL¹ (which stands for Radboud Entity Linker), an open source toolkit for entity linking. REL stands on the shoulders of giants: it is an ensemble of multiple methods from the state-of-the-art natural language processing research. REL has been developed with the following design considerations:

• Use state-of-the-art approaches for entity disambiguation (ED) [8, 15] and named entity recognition (NER) [1], ensuring it is on par with the state-of-the-art on end-to-end entity linking [14].
• Use a modular architecture with mention detection (using a NER approach) and entity disambiguation components. Specifically, separating mention detection from entity disambiguation enables us to choose a NER method appropriate for the context in which entity linking is employed (i.e., optimizing for recall vs. sufficient throughput).
• Design for sufficient throughput; we report 700 ms for an average document of 300 words. Notably, most of the time is used for NER, which could be changed to a more efficient option.
• Use a lightweight solution that can be deployed on an average laptop/desktop machine; it does not need much RAM and, importantly, it does not need a GPU.
• Train on a recent Wikipedia dump (2019-07) and ensure easy updates to new Wikipedia versions (all necessary scripts included).

REL is available at http://tiny.cc/RadboudEL under a MIT license, can be deployed as a Python package, or used via a restful API.

¹ REL in Dutch means mayhem, interference, or disturbance; and, it is easily recognized to abbreviate 'relatie' (relation in English).

2 ENTITY LINKING IN REL
In this section, we present the entity linking method underlying REL. We follow a standard entity linking pipeline architecture [2], consisting of three components: (i) mention detection, (ii) candidate selection, and (iii) entity disambiguation.

2.1 Mention Detection
In the mention detection step, we aim to detect all text spans that can be linked to entities. These text spans, referred to as mentions, are obtained by employing a Named Entity Recognition (NER) tool. NER taggers detect entity mentions in text and annotate them with (coarse-grained) entity types [2]. We employ Flair [1], a state-of-the-art NER tagger based on contextualized embeddings. Flair takes the input to be a sequence of characters and passes it to a bidirectional character-level neural language model to generate a contextual string embedding for each word. These embeddings are then utilized in a sequence labeling module to generate tags for NER. Using a NER method for mention detection enables us to strike a balance between precision and recall. Another approach, which may result in high recall, is matching all n-grams (up to a certain n) in the input text against a rich dictionary of entity names [2, 10]. In REL, the mention detection component can easily be replaced by another NER tagger such as spaCy² or by a dictionary-based approach.

2.2 Candidate Selection
For each text span detected as a mention, we select up to k1 + k2 (=7) candidate entities (following [8]). The k1 (=4) candidate entities are selected from the top ranked entities based on the mention-entity prior p(e|m), for a given entity e and a mention m. To compute this prior, we sum up hyperlink counts from Wikipedia and from the CrossWikis corpus [25] to estimate the probability P_Wiki(e|m). A uniform probability P_YAGO(e|m) is also extracted from the YAGO dictionary [13]. These two probabilities are combined into the final P(e|m) prior as min(1, P_Wiki(e|m) + P_YAGO(e|m)) [8].

The other k2 (=3) candidate entities are chosen based on their similarity to the context of the mention. This similarity score is obtained by e^T Σ_{w∈c} w, where c is the n-word (n = 50) context surrounding mention m, and w and e are word and entity vectors. This score is computed for the k (=30) entities with the highest P(e|m) prior, and the top-k2 entities are added to the list of candidate entities [8]. In REL, we use Wikipedia2Vec word and entity embeddings [28] to estimate the similarity between an entity and a mention's local context. Wikipedia2Vec jointly learns word and entity embeddings from Wikipedia text and link structure, and is available as an open source library [27]. The hyper-parameters k1, k2, k, and n are set based on the recommended values in [8, 15].

2.3 Entity Disambiguation
In the entity disambiguation step, we link mentions to their corresponding entities in the knowledge base (here: Wikipedia). Entity disambiguation in REL is based on the Ment-norm method proposed by Le and Titov [15]. Given an input document D, the entity linking decisions are made by combining local compatibility (which includes prior importance and contextual similarity) and coherence with the other entity linking decisions in the document:

    E* = argmax_{E ∈ C_1 × ... × C_n} Σ_{i=1}^{n} ψ(e_i, c_i) + Σ_{i≠j} ϕ(e_i, e_j, D),    (1)

where C_i denotes the set of candidate entities for mention m_i and E = {e_1, .., e_n}. The local compatibility score between entity e_i and its local context c_i is computed by the function ψ(e_i, c_i) as defined in [8], and the coherence between all entity linking decisions is captured by the function ϕ(e_i, e_j, D). Le and Titov [15] compute the ϕ function by incorporating relations between mentions of a document. Assuming K latent relations, ϕ is calculated as:

    ϕ(e_i, e_j, D) = Σ_{k=1}^{K} α_{ijk} e_i^T R_k e_j,    (2)

where e_i, e_j ∈ R^d are the embeddings of entities e_i, e_j (using the same embeddings as in the candidate selection step), R_k is a diagonal matrix, and α_{ijk} is a normalized score defined as:

    α_{ijk} = (1 / Z_{ijk}) exp{ f(m_i, c_i)^T D_k f(m_j, c_j) / √d },    (3)

where D_k ∈ R^{d×d} is a diagonal matrix, and function f is a single-layer neural network that maps mention m_i and its context c_i to a d-dimensional vector. Z_{ijk} is a normalization factor over j′ and is computed as:

    Z_{ijk} = Σ_{j′=1}^{n} exp{ f(m_i, c_i)^T D_k f(m_{j′}, c_{j′}) / √d }.    (4)

The optimization of Eq. (1) is performed using max-product loopy belief propagation (LBP), and the final score for an entity of a mention is obtained by a two-layer neural network that combines P(e|m) with the max-marginal probability of an entity for a given document. The training of the model, referred to as the ED model henceforth, is performed using a max-margin loss. To estimate posterior probabilities of the linked entities, we fit a logistic function over the final scores obtained by the neural model [22].

3 IMPLEMENTATION AND USAGE
Next, we describe the implementation details and usage of REL.

3.1 Implementation Details
Memory and GPU usage. One of the design requirements of REL is being lightweight, such that it can be deployed on an average machine. To minimize memory requirements, we store Wikipedia2Vec entity and word embeddings, GloVe embeddings, and an index of pre-computed P(e|m) values (i.e., a surface form dictionary) in a SQLite3³ database. Using SQLite, we are able to minimize memory usage for our API to 1.8GB if the user chooses to not preload embeddings. REL also does not require a GPU during inference. The neural model used for entity disambiguation is a feed-forward network and does not require heavy CPU/GPU usage. Training of Wikipedia2Vec embeddings, however, requires high memory and is done more efficiently using a GPU.
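The disk-backed surface form dictionary described above can be illustrated with Python's built-in sqlite3 module. This is a minimal sketch of the idea only; the table name, columns, and example rows are hypothetical, not REL's actual schema:

```python
import sqlite3

# Hypothetical schema for a surface-form dictionary: each row maps a
# mention string to a candidate entity with its precomputed p(e|m) prior.
con = sqlite3.connect(":memory:")  # a real deployment would use an on-disk file
con.execute("CREATE TABLE pem (mention TEXT, entity TEXT, prior REAL)")
con.executemany("INSERT INTO pem VALUES (?, ?, ?)", [
    ("dinamo", "FC_Dinamo_Bucuresti", 0.7),
    ("dinamo", "Dinamo_Zagreb", 0.2),
    ("dinamo", "Dinamo_Moscow", 0.05),
])
con.execute("CREATE INDEX idx_mention ON pem (mention)")

def top_candidates(mention, k1=4):
    # Only the rows for one mention are read from the database, so the
    # full dictionary never has to be loaded into RAM.
    return con.execute(
        "SELECT entity, prior FROM pem WHERE mention = ? "
        "ORDER BY prior DESC LIMIT ?", (mention, k1)).fetchall()

print(top_candidates("dinamo"))
# -> [('FC_Dinamo_Bucuresti', 0.7), ('Dinamo_Zagreb', 0.2), ('Dinamo_Moscow', 0.05)]
```

The indexed lookup is what keeps memory usage low: each query touches only one mention's rows, at the cost of a small per-lookup disk read.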

² https://spacy.io/
³ https://www.sqlite.org/index.html
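The two candidate selection routes of Section 2.2, the min(1, P_Wiki + P_YAGO) prior and the e^T Σ_{w∈c} w context similarity, can be sketched in a few lines of NumPy. All data structures and names here are illustrative, not REL's actual code:

```python
import numpy as np

def pem_prior(wiki_counts, yago_entities):
    """min(1, P_Wiki(e|m) + P_YAGO(e|m)) for one mention (Section 2.2).
    wiki_counts: {entity: hyperlink count} from Wikipedia + CrossWikis;
    yago_entities: set of entities the YAGO dictionary lists for the mention."""
    total = sum(wiki_counts.values())
    p_wiki = {e: c / total for e, c in wiki_counts.items()} if total else {}
    # P_YAGO(e|m) is uniform over the YAGO entries for the mention
    p_yago = {e: 1 / len(yago_entities) for e in yago_entities} if yago_entities else {}
    entities = set(p_wiki) | set(p_yago)
    return {e: min(1.0, p_wiki.get(e, 0.0) + p_yago.get(e, 0.0)) for e in entities}

def context_candidates(entity_vecs, word_vecs, context, k2=3):
    """Top-k2 entities by e^T sum_{w in c} w over the context window."""
    c = np.sum([word_vecs[w] for w in context if w in word_vecs], axis=0)
    scores = {e: float(v @ c) for e, v in entity_vecs.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k2]
```

In REL the entity and word vectors would come from Wikipedia2Vec, and `context_candidates` would only be run over the k (=30) entities with the highest prior.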

Figure 1: Example API input and output for entity linking.

INPUT:  {"text": "Belgrade 1996-08-30 Result in an international basketball tournament on Friday : Red Star ( Yugoslavia ) beat Dinamo ( Russia ) 92-90 ( halftime 47-47 )."}
OUTPUT: [
  [0,   8,  'Belgrade',   'Belgrade',            0.91, 0.98, 'LOC'],
  [80,  8,  'Red Star',   'KK_Crvena_zvezda',    0.36, 0.99, 'ORG'],
  [91,  10, 'Yugoslavia', 'Yugoslavia',          0.8,  0.99, 'LOC'],
  [109, 6,  'Dinamo',     'FC_Dinamo_Bucuresti', 0.7,  0.99, 'ORG'],
  [118, 6,  'Russia',     'Russia',              0.85, 0.99, 'LOC']
]

Table 1: EL strong matching results on the GERBIL platform.

System            | Metric   | AIDA-B | MSNBC | OKE-2015 | OKE-2016 | N3-Reuters-128 | N3-RSS-500 | Derczynski | KORE50
DBpedia Spotlight | Macro F1 |   52.0 |  42.4 |     42.0 |     41.4 |           21.5 |       26.7 |       33.7 |   29.4
DBpedia Spotlight | Micro F1 |   57.8 |  40.6 |     44.4 |     43.1 |           24.8 |       27.2 |       32.2 |   34.9
WAT               | Macro F1 |   70.8 |  62.6 |     53.2 |     51.8 |           45.0 |       45.3 |       44.4 |   37.3
WAT               | Micro F1 |   73.0 |  64.5 |     56.4 |     53.9 |           49.2 |       42.3 |       38.0 |   49.6
SOTA NLP          | Macro F1 |   82.6 |  73.0 |     56.6 |     47.8 |           45.4 |       43.8 |       43.2 |   26.2
SOTA NLP          | Micro F1 |   82.4 |  72.4 |     61.9 |     52.7 |           50.3 |       38.2 |       34.1 |   35.2
REL (2014)        | Macro F1 |   81.3 |  73.2 |     61.5 |     57.5 |           46.8 |       35.9 |       38.1 |   60.1
REL (2014)        | Micro F1 |   83.3 |  74.4 |     64.8 |     58.8 |           49.7 |       34.3 |       41.2 |   61.6
REL (2019)        | Macro F1 |   78.6 |  71.1 |     61.8 |     57.4 |           45.7 |       36.2 |       38.0 |   50.1
REL (2019)        | Micro F1 |   80.5 |  72.4 |     63.1 |     58.3 |           49.9 |       35.0 |       41.1 |   50.7

Table 2: ED results on the GERBIL platform.

System            | Metric   | AIDA-B | MSNBC | OKE-2015 | OKE-2016 | N3-Reuters-128 | N3-RSS-500 | Derczynski | KORE50
DBpedia Spotlight | Macro F1 |   53.7 |  43.6 |     30.4 |     43.0 |           41.8 |       42.6 |       50.3 |   48.7
DBpedia Spotlight | Micro F1 |   56.1 |  42.1 |     35.8 |     43.1 |           43.4 |       34.6 |       43.3 |   52.3
WAT               | Macro F1 |   79.8 |  79.7 |     62.2 |      0.0 |           59.2 |       62.8 |       70.4 |   52.4
WAT               | Micro F1 |   80.5 |  78.8 |     64.9 |      0.0 |           63.1 |       63.9 |       69.5 |   62.2
SOTA NLP          | Macro F1 |   83.8 |  88.5 |     73.2 |     76.7 |           63.4 |       66.6 |       65.3 |   52.4
SOTA NLP          | Micro F1 |   83.0 |  86.2 |     74.0 |     78.1 |           67.3 |       68.6 |       65.4 |   60.8
REL (2014)        | Macro F1 |   85.5 |  89.6 |     65.5 |     72.0 |           59.8 |       61.0 |       61.9 |   61.9
REL (2014)        | Micro F1 |   86.6 |  88.5 |     65.8 |     72.2 |           64.9 |       62.8 |       62.1 |   64.6
REL (2019)        | Macro F1 |   82.9 |  86.3 |     64.0 |     67.0 |           58.2 |       61.7 |       62.3 |   54.4
REL (2019)        | Micro F1 |   84.0 |  85.8 |     64.3 |     67.3 |           64.9 |       64.1 |       62.0 |   54.0

REL components. REL has a modular architecture, with separate components for mention detection, entity disambiguation, and the generation of the P(e|m) index. The mention detection component is based on the Flair package⁴ and can easily be replaced by another mention detection approach. The disambiguation component is implemented using PyTorch and based on the source code of [15].⁵ The generation of the P(e|m) index is based on the source code of [8] and involves the processing of Wikipedia, the CrossWikis corpus, and YAGO. Any of these may be either removed completely or replaced by different corpora, using the resulting P(e|m) index in the package instead.

ED Training. For the entity disambiguation method, we used the AIDA-train dataset for training and AIDA-A for validation. We use the Adam optimizer and reduce the learning rate from 10^-3 to 10^-4 once the F1-score on the validation set reaches 0.88 (following [15]).

Embeddings. The entity and word embeddings used for selecting candidate entities are trained on a Wikipedia 2019-07 dump using the Wikipedia2Vec package.⁶ Following [9], we set the min-entity-count parameter to zero and used the Wikipedia link graph during training. For the entity disambiguation model, we used GloVe embeddings [20] as suggested in [15].

3.2 Usage
REL can be used as a Python package deployed on a local machine, or as a service, via a restful API.

To use REL as a package, our GitHub repository contains step-by-step tutorials on how to perform end-to-end entity linking, and on how to (re-)train the ED model. We provide scripts and instructions for deploying REL using a new Wikipedia dump; this helps REL users to keep up-to-date with emerging entities in Wikipedia, and enables researchers to deploy REL for any specific Wikipedia version that is required for a downstream task.

The API is publicly available. Given an input text, depicted in Fig. 1 (Top), the API returns a list of mentions, each with (i) the start position and length of the mention, (ii) the mention itself, (iii) the linked entity, (iv) the confidence score of ED, and (v) the confidence score and type of entity from the mention detection step (if available); see Fig. 1 (Bottom). Alternatively, a user can use the API for entity disambiguation only, by submitting an input text and a list of spans (specified with start position and length).

4 EVALUATION
We compare REL with a state-of-the-art end-to-end entity linking system [14], referred to as SOTA NLP, and two popular well-established entity linking systems: (i) DBpedia Spotlight [16] and (ii) WAT [21], the updated version of TAGME [7]. We report the results for two versions of our system. The first one, denoted as REL (2014), is based on the original implementation of [15] for ED. It uses Wikipedia 2014 as the reference knowledge base and employs entity embeddings provided by [8] for candidate selection. The second version of our system, denoted as REL (2019), is based on Wikipedia 2019-07 and uses Wikipedia2Vec embeddings; cf. Section 3.

We use the GERBIL platform [24] for evaluation, and report micro and macro InKB F1 scores for both EL and ED. Table 1 shows the strong matching results for EL, where strong refers to the requirement of exactly predicting the gold mention boundaries. We

⁴ https://github.com/flairNLP/flair
⁵ https://github.com/lephong/mulrel-nel
⁶ https://wikipedia2vec.github.io/wikipedia2vec
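Each row of the API output in Fig. 1 is a positional list; it can be unpacked into named fields for downstream use. A small sketch (the Annotation class is ours, not part of REL):

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    start: int      # (i)   start position of the mention in the input text
    length: int     # (i)   length of the mention
    mention: str    # (ii)  the mention itself
    entity: str     # (iii) the linked entity
    ed_conf: float  # (iv)  confidence score of entity disambiguation
    md_conf: float  # (v)   confidence score from mention detection
    tag: str        # (v)   coarse entity type, if available

# One row of the OUTPUT list from Fig. 1:
row = [109, 6, "Dinamo", "FC_Dinamo_Bucuresti", 0.7, 0.99, "ORG"]
ann = Annotation(*row)
print(ann.entity)  # -> FC_Dinamo_Bucuresti
```

The field order mirrors items (i)–(v) of the API description, so a full response can be parsed as `[Annotation(*r) for r in response]`.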

Table 3: Local ED results (Micro F1) as reported in [15].

System          | AIDA-B | ACE2004 | AQUAINT | CLUEWEB | MSNBC | Wikipedia
MulRel-NEL [15] |   93.1 |    89.9 |    88.3 |    77.5 |  93.9 |      78.0
REL (2014)      |   92.8 |    89.7 |    87.4 |    77.6 |  93.5 |      78.7
REL (2019)      |   89.4 |    85.3 |    84.1 |    71.9 |  90.7 |      73.1

Table 4: Efficiency of REL (in seconds) for 50 documents from AIDA-B with > 200 words, i.e., 323 (±105) words and 42 (±19) mentions per document.

            | Time MD     | Time ED
With GPU    | 0.44 ± 0.22 | 0.24 ± 0.08
Without GPU | 2.41 ± 1.24 | 0.18 ± 0.09

we plan to train REL on a large corpus of annotated queries and make it available for the task of entity linking in queries as well.

REFERENCES
[1] Alan Akbik, Duncan Blythe, and Roland Vollgraf. 2018. Contextual String Embeddings for Sequence Labeling. In Proc. of COLING '18. 1638–1649.
[2] Krisztian Balog. 2018. Entity-Oriented Search. The Information Retrieval Series, Vol. 39. Springer.
[3] Krisztian Balog, Heri Ramampiaro, Naimdjon Takhirov, and Kjetil Nørvåg. 2013. Multi-step Classification Approaches to Cumulative Citation Recommendation. In Proc. of OAIR '13. 121–128.
[4] Razvan Bunescu and Marius Paşca. 2006. Using Encyclopedic Knowledge for Named Entity Disambiguation. In Proc. of EACL '06. 9–16.
[5] Marco Cornolti, Paolo Ferragina, Massimiliano Ciaramita, Stefan Rüd, and Hinrich Schütze. 2018. SMAPH: A Piggyback Approach for Entity-Linking in Web Queries. ACM Trans. Inf. Syst. 37, 1 (2018).
[6] Silviu Cucerzan. 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proc. of EMNLP-CoNLL '07. 708–716.
[7] Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Proc. of CIKM '10. 1625–1628.
[8] Octavian-Eugen Ganea and Thomas Hofmann. 2017. Deep Joint Entity Disambiguation with Local Neural Attention. In Proc. of EMNLP '17. 2619–2629.
[9] Emma Gerritse, Faegheh Hasibi, and Arjen P. de Vries. Graph-Embedding Empowered Entity Retr