Entity Linking meets Word Sense Disambiguation: a Unified Approach

Andrea Moro, Alessandro Raganato, Roberto Navigli
Dipartimento di Informatica, Sapienza Università di Roma
Viale Regina Elena 295, 00161 Roma, Italy
{moro,navigli}@di.uniroma1.it, [email protected]

Transactions of the Association for Computational Linguistics, 2 (2014) 231–244. Action Editor: Noah Smith. Submitted 9/2013; Revised 1/2014; Published 5/2014. © 2014 Association for Computational Linguistics.

Abstract

Entity Linking (EL) and Word Sense Disambiguation (WSD) both address the lexical ambiguity of language. But while the two tasks are pretty similar, they differ in a fundamental respect: in EL the textual mention can be linked to a named entity which may or may not contain the exact mention, while in WSD there is a perfect match between the word form (better, its lemma) and a suitable word sense. In this paper we present Babelfy, a unified graph-based approach to EL and WSD based on a loose identification of candidate meanings coupled with a densest subgraph heuristic which selects high-coherence semantic interpretations. Our experiments show state-of-the-art performances on both tasks on 6 different datasets, including a multilingual setting. Babelfy is online at http://babelfy.org

1 Introduction

The automatic understanding of the meaning of text has been a major goal of research in computational linguistics and related areas for several decades, with ambitious challenges, such as Machine Reading (Etzioni et al., 2006) and the quest for knowledge (Schubert, 2006). Word Sense Disambiguation (WSD) (Navigli, 2009; Navigli, 2012) is a historical task aimed at assigning meanings to single-word and multi-word occurrences within text, a task which is more alive than ever in the research community.

Recently, the collaborative creation of large semi-structured resources, such as Wikipedia, and knowledge resources built from them (Hovy et al., 2013), such as BabelNet (Navigli and Ponzetto, 2012a), DBpedia (Auer et al., 2007) and YAGO2 (Hoffart et al., 2013), has favoured the emergence of new tasks, such as Entity Linking (EL) (Rao et al., 2013), and opened up new possibilities for tasks such as Named Entity Disambiguation (NED) and Wikification. The aim of EL is to discover mentions of entities within a text and to link them to the most suitable entry in a reference knowledge base. However, in contrast to WSD, a mention may be partial while still being unambiguous thanks to the context. For instance, consider the following sentence:

(1) Thomas and Mario are strikers playing in Munich.

This example makes it clear how intertwined the two tasks of WSD and EL are. In fact, on the one hand, striker and play are polysemous words which can be disambiguated by selecting the game/soccer playing senses of the two words in a dictionary; on the other hand, Thomas and Mario are partial mentions which have to be linked to the appropriate entries of a knowledge base, that is, Thomas Müller and Mario Gomez, two well-known soccer players.

The two main differences between WSD and EL lie, on the one hand, in the kind of inventory used, i.e., dictionary vs. encyclopedia, and, on the other hand, in the assumption that the mention is complete or potentially partial. Notwithstanding these differences, the tasks are similar in nature, in that they both involve the disambiguation of textual fragments according to a reference inventory. However, the research community has so far tackled the two tasks separately, often duplicating efforts and solutions.


In contrast to this trend, research in knowledge acquisition is now heading towards the seamless integration of encyclopedic and lexicographic knowledge into structured language resources (Hovy et al., 2013), and the main representative of this new direction is undoubtedly BabelNet (Navigli and Ponzetto, 2012a). Given such structured language resources it seems natural to suppose that they might provide a common ground for the two tasks of WSD and EL.

More precisely, in this paper we explore the hypothesis that the lexicographic knowledge used in WSD is also useful for tackling the EL task, and, vice versa, that the encyclopedic information utilized in EL helps disambiguate nominal mentions in a WSD setting. We propose Babelfy, a novel, unified graph-based approach to WSD and EL, which performs two main steps: i) it exploits random walks with restart, and triangles as a support for reweighting the edges of a large multilingual semantic network; ii) it uses a densest subgraph heuristic on the available semantic interpretations of the input text to perform a joint disambiguation with both concepts and named entities. Our experiments show the benefits of our synergistic approach on six gold-standard datasets.

2 Related Work

2.1 Word Sense Disambiguation

Word Sense Disambiguation (WSD) is the task of choosing the right sense for a word within a given context. Typical approaches for this task can be classified as i) supervised, ii) knowledge-based, and iii) unsupervised.
However, supervised approaches require huge amounts of annotated data (Zhong and Ng, 2010; Shen et al., 2013; Pilehvar and Navigli, 2014), an effort which cannot easily be repeated for new domains and languages, while unsupervised ones suffer from data sparsity and an intrinsic difficulty in their evaluation (Agirre et al., 2006; Brody and Lapata, 2009; Manandhar et al., 2010; Van de Cruys and Apidianaki, 2011; Di Marco and Navigli, 2013). On the other hand, knowledge-based approaches are able to obtain good performance using readily-available structured knowledge (Agirre et al., 2010; Guo and Diab, 2010; Ponzetto and Navigli, 2010; Miller et al., 2012; Agirre et al., 2014). Some of these approaches marginally take into account the structural properties of the knowledge base (Mihalcea, 2005). Other approaches, instead, leverage the structural properties of the knowledge base by exploiting centrality and connectivity measures (Sinha and Mihalcea, 2007; Tsatsaronis et al., 2007; Agirre and Soroa, 2009; Navigli and Lapata, 2010).

One of the key steps of many knowledge-based WSD algorithms is the creation of a graph representing the semantic interpretations of the input text. Two main strategies to build this graph have been proposed: i) exploiting the direct connections, i.e., edges, between the considered sense candidates; ii) populating the graph according to (shortest) paths between them. In our approach we manage to unify these two strategies by automatically creating edges between sense candidates performing Random Walk with Restart (Tong et al., 2006).

The recent upsurge of interest in multilinguality has led to the development of cross-lingual and multilingual approaches to WSD (Lefever and Hoste, 2010; Lefever and Hoste, 2013; Navigli et al., 2013). Multilinguality has been exploited in different ways, e.g., by using parallel corpora to build multilingual contexts (Guo and Diab, 2010; Banea and Mihalcea, 2011; Lefever et al., 2011) or by means of ensemble methods which exploit complementary sense evidence from translations in different languages (Navigli and Ponzetto, 2012b). In this work, we present a novel exploitation of the structural properties of a multilingual semantic network.

2.2 Entity Linking

Entity Linking (Erbs et al., 2011; Rao et al., 2013; Cornolti et al., 2013) encompasses a set of similar tasks, which include Named Entity Disambiguation (NED), that is the task of linking entity mentions in a text to a knowledge base (Bunescu and Pasca, 2006; Cucerzan, 2007), and Wikification, i.e., the automatic annotation of text by linking its relevant fragments of text to the appropriate Wikipedia articles. Mihalcea and Csomai (2007) were the first to tackle the Wikification task. In their approach they disambiguate each word in a sentence independently by exploiting the context in which it occurs. However, this approach is local in that it lacks a collective notion of coherence between the selected Wikipedia pages. To overcome this problem, Cucerzan (2007) introduced a global approach based on the simultaneous disambiguation of all the terms in a text and the use of lexical context to disambiguate the mentions.

To maximize the semantic agreement, Milne and Witten (2008) introduced the analysis of the semantic relations between the candidate senses and the unambiguous context, i.e., words with a single sense candidate. However, the performance of this algorithm depends heavily on the number of links incident to the target senses and on the availability of unambiguous words within the input text. To overcome this issue a novel class of approaches has been proposed (Kulkarni et al., 2009; Ratinov et al., 2011; Hoffart et al., 2011) that exploit global and local features. However, these systems either rely on a difficult NP-hard formalization of the problem which is infeasible for long text, or exploit popularity measures which are domain-dependent. In contrast, we show that the semantic network structure can be leveraged to obtain state-of-the-art performance by synergistically disambiguating both word senses and named entities at the same time.

Recently, the explosion of on-line social networking services, such as Twitter and Facebook, has contributed to the development of new methods for the efficient disambiguation of short texts (Ferragina and Scaiella, 2010; Hoffart et al., 2012; Böhm et al., 2012). Thanks to a loose candidate identification technique coupled with a densest subgraph heuristic, we show that our approach is particularly suited for short and highly ambiguous text disambiguation.

2.3 The Best of Two Worlds

Our main goal is to bring together the two worlds of WSD and EL. On the one hand, this implies relaxing the constraint of a perfect association between mentions and meanings, which is, instead, assumed in WSD. On the other hand, this relaxation leads to the inherent difficulty of encoding a full-fledged sense inventory for EL. Our solution to this problem is to keep the set of candidate meanings for a given mention as open as possible (see Section 6), so as to enable high recall in linking partial mentions, while providing an effective method for handling this high ambiguity (see Section 7).

A key assumption of our work is that the lexicographic knowledge used in WSD is also useful for tackling the EL task, and vice versa the encyclopedic information utilized in EL helps disambiguate nominal mentions in a WSD setting. We enable the joint treatment of concepts and named entities by enforcing high coherence in our semantic interpretations.
3 WSD and Entity Linking Together

Task. Our task is to disambiguate and link all nominal and named entity mentions occurring within a text. The linking task is performed by associating each mention with the most suitable entry of a given knowledge base (mentions which are not contained in the reference knowledge base are not taken into account).

We point out that our definition is unconstrained in terms of what to link, i.e., unlike Wikification and WSD, we can link overlapping fragments of text. For instance, given the text fragment Major League Soccer, we identify and disambiguate several different nominal and entity mentions: Major League Soccer, major league, league and soccer. In contrast to EL, we link not only named entity mentions, such as Major League Soccer, but also nominal mentions, e.g., major league, to their corresponding meanings in the knowledge base.

Babelfy. We provide a unified approach to WSD and entity linking in three steps:

1. Given a lexicalized semantic network, we associate with each vertex, i.e., either concept or named entity, a semantic signature, that is, a set of related vertices (Section 5). This is a preliminary step which needs to be performed only once, independently of the input text.

2. Given a text, we extract all the linkable fragments from this text and, for each of them, list the possible meanings according to the semantic network (Section 6).

3. We create a graph-based semantic interpretation of the whole text by linking the candidate meanings of the extracted fragments using the previously-computed semantic signatures. We then extract a dense subgraph of this representation and select the best candidate meaning for each fragment (Section 7).

4 Semantic Network

Our approach requires the availability of a wide-coverage semantic network which encodes structural and lexical information both of an encyclopedic and of a lexicographic kind. Although in principle any semantic network with these properties could be utilized, in our work we used the BabelNet 1.1.1 semantic network (http://babelnet.org) (Navigli and Ponzetto, 2012a) since it is the largest multilingual knowledge base, obtained from the automatic seamless integration of Wikipedia (http://www.wikipedia.org) and WordNet (Fellbaum, 1998). We consider BabelNet as a directed multigraph which contains both concepts and named entities as its vertices and a multiset of semantic relations as its edges. We leverage the multilingual lexicalizations of the vertices of BabelNet to identify mentions in the input text. For example, the entity FC Bayern Munich can be lexicalized in different languages, e.g., F.C. Bayern de Múnich in Spanish, Die Roten in English and Bayern München in German, among others. As regards semantic relations, the only information we use is that of the end points, i.e., vertices, that these relations connect, while neglecting the relation type.

5 Building Semantic Signatures

One of the major issues affecting both manually-curated and automatically constructed semantic networks is data sparsity. For instance, we calculated that the average number of incident edges is roughly 10 in WordNet, 50 in BabelNet and 80 in YAGO2, to mention a few. Although automatically-built resources typically provide larger amounts of edges, two issues have to be taken into account: concepts which should be related might not be directly connected despite being structurally close within the network, and, vice versa, weakly-related or even unrelated concepts can be erroneously connected by an edge. For instance, in BabelNet we do not have an edge between playmaker and Thomas Müller, while we have an incorrect edge connecting FC Bayern Munich and Yellow Submarine (song). However, this crisp notion of relatedness can be overcome by exploiting the global structure of the semantic network, thereby obtaining a more precise and higher-coverage measure of relatedness. We address this issue in two steps: first, we provide a structural weighting of the network's edges; second, for each vertex we create a set of related vertices using random walks with restart.
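To make this data model concrete, the toy structures below (ours, purely illustrative; every identifier and surface form is invented and this is not BabelNet's actual storage format) capture the two pieces of information the approach relies on: multilingual lexicalizations attached to each vertex, and untyped directed edges between vertices, together with an inverted index from surface strings to vertices that later supports mention identification.

```python
# Toy stand-in for a lexicalized semantic network. Vertices (concepts or named
# entities) carry multilingual lexicalizations; edges are plain ordered pairs,
# since the relation type is ignored. All identifiers here are invented.
lexicalizations = {
    "FC_Bayern_Munich": {"en": ["FC Bayern Munich", "Die Roten"],
                         "es": ["F.C. Bayern de Munich"],
                         "de": ["Bayern Munchen"]},
    "striker_n": {"en": ["striker", "forward"]},
}
edges = [
    ("FC_Bayern_Munich", "striker_n"),
    ("striker_n", "FC_Bayern_Munich"),
]

# Inverted index from lowercased surface forms to vertices, used for the
# candidate identification step described in Section 6.
surface_to_vertices = {}
for vertex, by_language in lexicalizations.items():
    for forms in by_language.values():
        for form in forms:
            surface_to_vertices.setdefault(form.lower(), set()).add(vertex)

print(surface_to_vertices["striker"])   # {'striker_n'}
```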
Structural weighting. Our first objective is to assign higher weights to edges which are involved in more densely connected areas of the directed network. To this end, inspired by the local clustering coefficient measure (Watts and Strogatz, 1998) and its recent success in Word Sense Induction (Di Marco and Navigli, 2013), we use directed triangles, i.e., directed cycles of length 3, and weight each edge (v, v') by the number of directed triangles it occurs in:

  weight(v, v') := |{(v, v', v'') : (v, v'), (v', v''), (v'', v) ∈ E}| + 1        (1)

We add one to each weight to ensure the highest degree of reachability in the network.

Random Walk with Restart. Our goal is to create a semantic signature (i.e., a set of highly related vertices) for each concept and named entity of the semantic network. To do this, we perform a Random Walk with Restart (RWR) (Tong et al., 2006), that is, a stochastic process that starts from an initial vertex of the graph (RWR can be used with an initial set of vertices; however, in this paper we use a single initial vertex) and then, for a fixed number n of steps or until convergence, explores the graph by choosing the next vertex within the current neighborhood or by restarting from the initial vertex with a given, fixed restart probability α. For each edge (v, v') in the network, we model the conditional probability P(v'|v) as the normalized weight of the edge:

  P(v'|v) = weight(v, v') / Σ_{v'' ∈ V} weight(v, v'')

where V is the set of vertices of the semantic network and weight(v, v') is the function defined in Equation 1. We then run the RWR from each vertex v of the semantic network for a fixed number n of steps (we show in Algorithm 1 our RWR pseudocode). We keep track of the encountered vertices using the map counts, i.e., we increase the counter associated with vertex v' in counts every time we hit v' during a RWR started from v (see line 11). As a result, we obtain a frequency distribution over the whole set of concepts and entities. To eliminate weakly-related vertices we keep only those items that were hit at least η times (see lines 16–18). Finally, we save the remaining vertices in the set semSign_v which is the semantic signature of v (see line 19).
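For illustration, the following minimal Python sketch (ours, using a toy adjacency-dictionary graph rather than the actual BabelNet data) computes the triangle-based weights of Equation 1 and the normalized transition probabilities P(v'|v) defined above.

```python
from collections import defaultdict

def triangle_weights(adj):
    """Weight each directed edge (v, v2) by the number of directed 3-cycles
    v -> v2 -> v3 -> v it occurs in, plus one (Equation 1)."""
    weights = {}
    for v, succs in adj.items():
        for v2 in succs:
            triangles = sum(1 for v3 in adj.get(v2, ()) if v in adj.get(v3, ()))
            weights[(v, v2)] = triangles + 1
    return weights

def transition_probabilities(adj, weights):
    """Normalize the outgoing edge weights of each vertex to obtain P(v2|v)."""
    probs = defaultdict(dict)
    for v, succs in adj.items():
        total = sum(weights[(v, v2)] for v2 in succs)
        for v2 in succs:
            probs[v][v2] = weights[(v, v2)] / total
    return probs

# Toy graph: a directed 3-cycle plus one extra edge.
adj = {"a": {"b", "d"}, "b": {"c"}, "c": {"a"}, "d": set()}
w = triangle_weights(adj)          # edge (a, b) gets weight 2, edge (a, d) gets 1
p = transition_probabilities(adj, w)
print(w[("a", "b")], round(p["a"]["b"], 2))   # 2 0.67
```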

Algorithm 1 Random walk with restart.
 1: input: v, the starting vertex;
           α, the restart probability;
           n, the number of steps to be executed;
           P, the transition probabilities;
           η, the frequency threshold.
 2: output: semSign_v, set of related vertices for v.
 3: function RWR(v, α, n, P, η)
 4:   v' := v
 5:   counts := new Map<Synset, Integer>
 6:   while n > 0 do
 7:     if random() > α then
 8:       given the transition probabilities P(·|v') of v',
 9:         choose a random neighbor v''
10:       v' := v''
11:       counts[v'] ++
12:     else
13:       restart the walk
14:       v' := v
15:     n := n − 1
16:   for each v' in counts.keys() do
17:     if counts[v'] < η then
18:       remove v' from counts.keys()
19:   return semSign_v = counts.keys()

The creation of our set of semantic signatures, one for each vertex in the semantic network, is a preliminary step carried out once only before starting processing any input text. We now turn to the candidate identification and disambiguation steps.
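A compact Python rendering of Algorithm 1 is given below; it is a sketch under our own simplifying assumptions (transition probabilities stored as nested dictionaries, pseudo-random sampling with a fixed seed, a tiny invented graph), not the implementation used for the experiments.

```python
import random

def rwr_signature(v, probs, alpha, n_steps, eta, seed=0):
    """Random Walk with Restart from vertex v (cf. Algorithm 1).

    probs[u][w] holds P(w|u); as in the pseudocode, with probability alpha
    the walk restarts from v, otherwise it follows a random outgoing edge.
    Vertices hit fewer than eta times are discarded."""
    rng = random.Random(seed)
    counts = {}
    current = v
    for _ in range(n_steps):
        neighbours = probs.get(current)
        if neighbours and rng.random() > alpha:
            nxt = rng.choices(list(neighbours), weights=list(neighbours.values()))[0]
            counts[nxt] = counts.get(nxt, 0) + 1
            current = nxt
        else:
            current = v                      # restart from the initial vertex
    return {u for u, c in counts.items() if c >= eta}

# Toy usage: one semantic signature per vertex of a tiny weighted graph.
probs = {"a": {"b": 0.67, "d": 0.33}, "b": {"c": 1.0}, "c": {"a": 1.0}, "d": {}}
signatures = {u: rwr_signature(u, probs, alpha=0.85, n_steps=10000, eta=5) for u in probs}
print(sorted(signatures["a"]))
```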
6 Candidate Identification

Given a text as input, we apply part-of-speech tagging and identify the set F of all the textual fragments, i.e., all the sequences of words of maximum length five, which contain at least one noun and that are substrings of lexicalizations in BabelNet, i.e., those fragments that can potentially be linked to an entry in BabelNet. For each textual fragment f ∈ F, i.e., a single- or multi-word expression of the input text, we look up the semantic network for candidate meanings, i.e., vertices that contain f or, only for named entities, a superstring of f as their lexicalization. For instance, for sentence (1) in the introduction, we identify the following textual fragments: Thomas, Mario, strikers, Munich. This output is obtained thanks to our loose candidate identification routine, i.e., based on superstring matching instead of exact matching, which, for instance, enables us to recognize the right candidate Mario Gomez for the mention Mario even if this named entity does not have Mario as one of its lexicalizations (for an analysis of the impact of this routine against the exact matching approach see the discussion in Section 9).

Moreover, as we stated in Section 3, we allow overlapping fragments, e.g., for major league we recognize league and major league. We denote with cand(f) the set of all the candidate meanings of fragment f. For instance, for the noun league we have that cand(league) contains among others the sport word sense and the TV series named entity.

7 Candidate Disambiguation

Semantic interpretation graph. After the identification of fragments (F) and their candidate meanings (cand(·)), we create a directed graph G_I = (V_I, E_I) of the semantic interpretations of the input text. We show the pseudocode in Algorithm 2. V_I contains all the candidate meanings of all fragments, that is, V_I := {(v, f) : v ∈ cand(f), f ∈ F}, where f is a fragment of the input text and v is a candidate Babel synset that has a lexicalization which is equal to or is a superstring of f (see lines 4–8). The set of edges E_I connects related meanings and is populated as follows: we add an edge from (v, f) to (v', f') if and only if f ≠ f' and v' ∈ semSign_v (see lines 9–11). In other words, we connect two candidate meanings of different fragments if one is in the semantic signature of the other. For instance, we add an edge between (Mario Gomez, Mario) and (Thomas Müller, Thomas), while we do not add one between (Mario Gomez, Mario) and (Mario Basler, Mario) since these are two candidate meanings of the same fragment, i.e., Mario. In Figure 1, we show an excerpt of our graph for sentence (1).
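To make the construction of V_I and E_I (lines 4–11 of Algorithm 2) concrete, the sketch below builds the interpretation graph from a toy sense inventory and toy semantic signatures; the superstring-based candidate lookup is a deliberately simplified stand-in for the part-of-speech-driven fragment extraction of Section 6, and every sense identifier is invented.

```python
def candidates(fragment, inventory):
    """cand(f): senses whose lexicalizations equal f or, for named
    entities, contain f as a substring (loose superstring matching)."""
    result = set()
    for sense, (lexes, is_entity) in inventory.items():
        for lex in lexes:
            if lex == fragment or (is_entity and fragment in lex):
                result.add(sense)
    return result

def interpretation_graph(fragments, inventory, signatures):
    """Build V_I and E_I: connect (v, f) -> (v2, f2) when f != f2 and
    v2 is in the semantic signature of v."""
    vertices = {(v, f) for f in fragments for v in candidates(f, inventory)}
    edges = {((v, f), (v2, f2))
             for (v, f) in vertices for (v2, f2) in vertices
             if f != f2 and v2 in signatures.get(v, set())}
    return vertices, edges

# Toy inventory: sense -> (lexicalizations, is_named_entity), plus toy signatures.
inventory = {
    "Thomas_Mueller": ({"Thomas Mueller"}, True),
    "Mario_Gomez": ({"Mario Gomez"}, True),
    "Mario_(game)": ({"Mario"}, True),
    "striker_n": ({"striker"}, False),
}
signatures = {
    "Thomas_Mueller": {"Mario_Gomez", "striker_n"},
    "Mario_Gomez": {"Thomas_Mueller", "striker_n"},
    "striker_n": {"Thomas_Mueller", "Mario_Gomez"},
    "Mario_(game)": set(),
}
V, E = interpretation_graph(["Thomas", "Mario", "striker"], inventory, signatures)
print(len(V), len(E))   # -> 4 6
```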

[Figure 1 appears here. Its vertices are the (candidate, fragment) pairs (Tomás Milián, Thomas), (Mario Adorf, Mario), (Thomas Müller, Thomas), (Mario Basler, Mario), (Mario Gomez, Mario), (forward, striker), (striker, striker), (Munich, Munich) and (FC Bayern Munich, Munich); the caption follows below.]

Figure 1: An excerpt of the semantic interpretation graph automatically built for the sentence Thomas and Mario are strikers playing in Munich (the edges connecting the correct meanings are in bold).

Algorithm 2 Candidate Disambiguation.
 1: input: F, the fragments in the input text;
           semSign, the semantic signatures;
           µ, ambiguity level to be reached;
           cand, fragments to candidate meanings.
 2: output: selected, disambiguated fragments.
 3: function DISAMB(F, semSign, µ, cand)
 4:   V_I := ∅; E_I := ∅
 5:   G_I := (V_I, E_I)
 6:   for each fragment f ∈ F do
 7:     for each candidate v ∈ cand(f) do
 8:       V_I := V_I ∪ {(v, f)}
 9:   for each ((v, f), (v', f')) ∈ V_I × V_I do
10:     if f ≠ f' and v' ∈ semSign_v then
11:       E_I := E_I ∪ {((v, f), (v', f'))}
12:   G_I* := DENSSUB(F, cand, G_I, µ)
13:   selected := new Map<String, Synset>
14:   for each f ∈ F s.t. ∃(v, f) ∈ V_I* do
15:     cand*(f) := {v : (v, f) ∈ V_I*}
16:     v* := arg max_{v ∈ cand*(f)} score((v, f))
17:     if score((v*, f)) ≥ θ then
18:       selected(f) := v*
19:   return selected

At this point we have a graph-based representation of all the possible interpretations of the input text. In order to drastically reduce the degree of ambiguity while keeping the interpretation coherence as high as possible, we apply a novel densest subgraph heuristic (see line 12), whose description we defer to the next paragraph. The result is a subgraph which contains those semantic interpretations that are most coherent to each other. However, this subgraph might still contain multiple interpretations for the same fragment, and even unambiguous fragments which are not correct. Therefore, the final step is the selection of the most suitable candidate meaning for each fragment f given a threshold θ to discard semantically unrelated candidate meanings. We score each meaning v ∈ cand(f) with its normalized weighted degree in the densest subgraph (we denote with deg(v) the overall number of incoming and outgoing edges, i.e., deg(v) := deg⁺(v) + deg⁻(v)):

  score((v, f)) = w_(v,f) · deg((v, f)) / Σ_{v' ∈ cand(f)} w_(v',f) · deg((v', f))        (2)

where w_(v,f) is the fraction of fragments the candidate meaning v connects to:

  w_(v,f) = |{f' ∈ F : ∃ v' s.t. ((v, f), (v', f')) ∈ E_I or ((v', f'), (v, f)) ∈ E_I}| / (|F| − 1)

The rationale behind this scoring function is to take into account both the semantic coherence, using a graph centrality measure among the candidate meanings, and the lexical coherence, in terms of the number of fragments a candidate relates to.

Finally, we link each f to the highest ranking candidate meaning v* if score((v*, f)) ≥ θ, where θ is a fixed threshold (see lines 14–18 of Algorithm 2). For instance, in sentence (1) and for the fragment Mario we select Mario Gomez as our final candidate meaning and link it to the fragment.
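A direct transcription of Equation 2, under our assumption that the (already densified) graph is available as a set of directed edges between (candidate, fragment) pairs, might look as follows; the senses and fragments are invented toy data.

```python
def degree(node, edges):
    """deg((v, f)): incoming plus outgoing edges of the pair."""
    return sum(1 for a, b in edges if a == node or b == node)

def fragment_fraction(node, edges, num_fragments):
    """w_(v,f): fraction of the other fragments that (v, f) is connected to."""
    _, f = node
    reached = {b[1] for a, b in edges if a == node} | {a[1] for a, b in edges if b == node}
    reached.discard(f)
    return len(reached) / (num_fragments - 1)

def score(node, candidates_of_fragment, edges, num_fragments):
    """Equation 2: normalized weighted degree within the dense subgraph."""
    num = fragment_fraction(node, edges, num_fragments) * degree(node, edges)
    den = sum(fragment_fraction((v, node[1]), edges, num_fragments)
              * degree((v, node[1]), edges)
              for v in candidates_of_fragment)
    return num / den if den else 0.0

# Toy dense subgraph over three fragments (hypothetical senses).
edges = {(("Mario_Gomez", "Mario"), ("Thomas_Mueller", "Thomas")),
         (("Thomas_Mueller", "Thomas"), ("Mario_Gomez", "Mario")),
         (("striker_n", "striker"), ("Mario_Gomez", "Mario")),
         (("striker_n", "striker"), ("Thomas_Mueller", "Thomas"))}
cands_mario = {"Mario_Gomez", "Mario_(game)"}
print(round(score(("Mario_Gomez", "Mario"), cands_mario, edges, 3), 2))  # 1.0
```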

Linking by densest subgraph. We now illustrate our novel densest subgraph heuristic, used in line 12 of Algorithm 2, for reducing the level of ambiguity of the initial semantic interpretation graph G_I. The main idea here is that the most suitable meanings of each text fragment will belong to the densest area of the graph. For instance, in Figure 1 the (candidate, fragment) pairs (Thomas Müller, Thomas), (Mario Gomez, Mario), (striker, striker) and (FC Bayern Munich, Munich) form a dense subgraph supporting their relevance for sentence (1).

The problem of identifying the densest subgraph of size at least k is NP-hard (Feige et al., 1999). Therefore, we define a heuristic for k-partite graphs inspired by a 2-approximation greedy algorithm for arbitrary graphs (Charikar, 2000; Khuller and Saha, 2009). Our adapted strategy for selecting a dense subgraph of G_I is based on the iterative removal of low-coherence vertices, i.e., fragment interpretations. We show the pseudocode in Algorithm 3.

Algorithm 3 Densest Subgraph.
 1: input: F, the set of all fragments in the input text;
           cand, from fragments to candidate meanings;
           G_I^(0), the full semantic interpretation graph;
           µ, ambiguity level to be reached.
 2: output: G_I*, a dense subgraph.
 3: function DENSSUB(F, cand, G_I^(0), µ)
 4:   t := 0
 5:   G_I* := G_I^(0)
 6:   while true do
 7:     f_max := arg max_{f ∈ F} |{v : ∃(v, f) ∈ V_I^(t)}|
 8:     if |{v : ∃(v, f_max) ∈ V_I^(t)}| ≤ µ then
 9:       break
10:     v_min := arg min_{v ∈ cand(f_max)} score((v, f_max))
11:     V_I^(t+1) := V_I^(t) \ {(v_min, f_max)}
12:     E_I^(t+1) := E_I^(t) ∩ (V_I^(t+1) × V_I^(t+1))
13:     G_I^(t+1) := (V_I^(t+1), E_I^(t+1))
14:     if avgdeg(G_I^(t+1)) > avgdeg(G_I*) then
15:       G_I* := G_I^(t+1)
16:     t := t + 1
17:   return G_I*

We start with the initial graph G_I^(0) at step t = 0 (see line 5). For each step t (lines 7–16), first, we identify the most ambiguous fragment f_max, i.e., the one with the maximum number of candidate meanings in the graph (see line 7). Next, we discard the weakest interpretation of the current fragment f_max. To do so, we determine the lexical and semantic coherence of each candidate meaning (v, f_max) using Formula 2 (see line 10). We then remove from our graph G_I^(t) the lowest-coherence vertex (v_min, f_max), i.e., the one whose score is minimum (see lines 11–13). For instance, in Figure 1, f_max is the fragment Mario and we have: score((Mario Gomez, Mario)) ∝ 3/3 · 5 = 5, score((Mario Basler, Mario)) ∝ 1/3 · 1 ≈ 0.33 and score((Mario Adorf, Mario)) ∝ 2/3 · 2 ≈ 1.33, so we remove (Mario Basler, Mario) from the graph since its score is minimum. We then move to the next step, i.e., we set t := t + 1 (see line 16) and repeat the low-coherence removal step. We stop when the number of remaining candidates for each fragment is below a threshold µ, i.e., |{v : ∃(v, f) ∈ V_I^(t)}| ≤ µ ∀f ∈ F (see lines 8–9). During each iteration step t we compute the average degree of the current graph G_I^(t), i.e., avgdeg(G_I^(t)) = 2|E_I^(t)| / |V_I^(t)|. Finally, we select as the densest subgraph of the initial semantic interpretation graph G_I the graph G_I* that maximizes the average degree (see lines 14–15).
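The control flow of Algorithm 3 can be rendered compactly as follows. This is a sketch under our own assumptions: the graph is a set of (sense, fragment) pairs plus directed edges, the scorer is pluggable, and the toy run substitutes the plain degree for the score of Equation 2.

```python
def densest_subgraph(vertices, edges, mu, scorer):
    """Greedy heuristic of Algorithm 3: repeatedly drop the lowest-scoring
    interpretation of the most ambiguous fragment, keeping the snapshot
    with the highest average degree."""
    def avg_degree(V, E):
        return 2 * len(E) / len(V) if V else 0.0

    best_V, best_E = set(vertices), set(edges)
    V, E = set(vertices), set(edges)
    while True:
        by_fragment = {}
        for v, f in V:
            by_fragment.setdefault(f, []).append((v, f))
        f_max, cands = max(by_fragment.items(), key=lambda kv: len(kv[1]))
        if len(cands) <= mu:            # ambiguity level reached: stop
            break
        worst = min(cands, key=lambda node: scorer(node, V, E))
        V.discard(worst)
        E = {(a, b) for a, b in E if a != worst and b != worst}
        if avg_degree(V, E) > avg_degree(best_V, best_E):
            best_V, best_E = set(V), set(E)
    return best_V, best_E

# Toy run with the degree used as a stand-in for the score of Equation 2.
def degree_scorer(node, V, E):
    return sum(1 for a, b in E if a == node or b == node)

V = {("s1", "f1"), ("s2", "f1"), ("s3", "f2")}
E = {(("s1", "f1"), ("s3", "f2")), (("s3", "f2"), ("s1", "f1"))}
bV, bE = densest_subgraph(V, E, mu=1, scorer=degree_scorer)
print(sorted(bV))   # ("s2", "f1") is pruned; the coherent pair remains
```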
8 Experimental Setup

Datasets. We carried out our experiments on six datasets, four for WSD and two for EL:

• The SemEval-2013 task 12 dataset for multilingual WSD (Navigli et al., 2013), which consists of 13 documents in different domains, available in 5 languages. For each language, all noun occurrences were annotated using BabelNet, thereby providing Wikipedia and WordNet annotations wherever applicable. The number of mentions to be disambiguated roughly ranges from 1K to 2K per language in the different setups.

• The SemEval-2007 task 7 dataset for coarse-grained English all-words WSD (Navigli et al., 2007). We take into account only nominal mentions, obtaining a dataset containing 1107 nouns to be disambiguated using WordNet.

• The SemEval-2007 task 17 dataset for fine-grained English all-words WSD (Pradhan et al., 2007). We considered only nominal mentions, resulting in 158 nouns annotated with WordNet synsets.

• The Senseval-3 dataset for English all-words WSD (Snyder and Palmer, 2004), which contains 899 nouns to be disambiguated using WordNet.

• KORE50 (Hoffart et al., 2012), which consists of 50 short English sentences (mean length of 14 words) with a total number of 144 mentions manually annotated using YAGO2, for which a Wikipedia mapping is available. This dataset was built with the idea of testing against a high level of ambiguity for the EL task.

• AIDA-CoNLL (Hoffart et al., 2011), which consists of 1392 English articles, for a total of roughly 35K named entity mentions annotated with YAGO concepts separated in development, training and test sets. (We used AIDA-CoNLL as it is the most recent and largest available dataset for EL (Hachey et al., 2013); the TAC KBP datasets are available only to participants.)

We exploited the POS tags already available in the SemEval and Senseval datasets, while we used the Stanford POS tagger (Toutanova et al., 2003) for the English sentences in the last two datasets.

Parameters. We fixed the parameters of RWR (Section 5) to the values α = .85, η = 100 and n = 1M which maximize F1 on a manually created tuning set made up of 10 gold-standard semantic signatures. We tuned our two disambiguation parameters µ = 10 and θ = 0.8 by optimizing F1 on the trial dataset of the SemEval-2013 task on multilingual WSD (Navigli et al., 2013). We used the same parameters on all the other WSD datasets. As for EL, we used the training part of AIDA-CoNLL (Hoffart et al., 2011) to set µ = 5 and θ = 0.0.

8.1 Systems

Multilingual WSD. We evaluated our system on the SemEval-2013 task 12 by comparing it with the participating systems:

• UMCC-DLSI (Gutiérrez et al., 2013), a state-of-the-art Personalized PageRank-based approach that exploits the integration of different sources of knowledge, such as WordNet Domains/Affect (Strapparava and Valitutti, 2004), SUMO (Zouaq et al., 2009) and the eXtended WordNet (Mihalcea and Moldovan, 2001);

• DAEBAK! (Manion and Sainudiin, 2013), which performs WSD on the basis of peripheral diversity within subgraphs of BabelNet;

• GETALP (Schwab et al., 2013), which uses an Ant Colony Optimization technique together with the classical measure of Lesk (1986).

We also compared with UKB w2w (Agirre and Soroa, 2009), a state-of-the-art approach for knowledge-based WSD, based on Personalized PageRank (Haveliwala, 2002). We used the same mapping from words to senses that we used in our approach, default parameters (./ukb wsd -D dict.txt -K kb.bin --ppr w2w ctx.txt) and BabelNet as the input graph.
Moreover, we compared our system with IMS (Zhong and Ng, 2010), a state-of-the-art supervised English WSD system which uses an SVM trained on sense-annotated corpora, such as SemCor (Miller et al., 1993) and DSO (Ng and Lee, 1996), among others. We used the IMS model out-of-the-box with Most Frequent Sense (MFS) as backoff routine since the model obtained using the task trial data performed worse.

We followed the original task formulation and evaluated the synsets in three different settings, i.e., when using BabelNet senses, Wikipedia senses and WordNet senses, thanks to BabelNet being a superset of the other two inventories. We ran our system on a document-by-document basis, i.e., disambiguating each document at once, so as to test its effectiveness on long coherent texts. Performance was calculated in terms of F1 score. We also compared the systems with the MFS baseline computed for the three inventories (Navigli et al., 2013).

Coarse-grained WSD. For the SemEval-2007 task 7 we compared our system with the two top-ranked approaches, i.e., NUS-PT (Chan et al., 2007) and UoR-SSI (Navigli, 2008), which respectively exploited parallel texts and enriched semantic paths in a semantic network, the previously described UKB w2w system (we report the results as given by Agirre et al. (2014)), a knowledge-based WSD approach (Ponzetto and Navigli, 2010) which exploits an automatic extension of WordNet, and, as baseline, the MFS.

Fine-grained WSD. For the remaining fine-grained WSD datasets, i.e., Senseval-3 and SemEval-2007 task 17, we compared our approach with the previously described state-of-the-art systems UKB and IMS, and, as baseline, the MFS.

KORE50 and AIDA-CoNLL. For the KORE50 and AIDA-CoNLL datasets we compared our system with six approaches, including state-of-the-art ones (Hoffart et al., 2012; Cornolti et al., 2013):

• MW, i.e., the Normalized Google Distance as defined by Milne and Witten (2008);

• KPCS (Hoffart et al., 2012), which calculates a Mutual Information weighted vector of keyphrases for each candidate and then uses the cosine similarity to obtain candidates' scores;

• KORE and its variants KORE-LSH-G and KORE-LSH-F (Hoffart et al., 2012), based on similarity measures that exploit the overlap between phrases associated with the considered entities (KORE) and a hashing technique to reduce the space needed by the keyphrases associated with the entities (LSH-G, LSH-F);

• Tagme 2.0 (Ferragina and Scaiella, 2012), which uses the relatedness measure defined by Milne and Witten (2008) weighted with the commonness of a sense together with the keyphraseness measure defined by Mihalcea and Csomai (2007) to exploit the context around the target word (we used the out-of-the-box RESTful API available at http://tagme.di.unipi.it);

• Illinois Wikifier (Cheng and Roth, 2013), which combines local features, such as commonness and TF-IDF between mentions and Wikipedia pages, with global coherence features based on Wikipedia links and relational inference (we used the out-of-the-box Java API available from http://cogcomp.cs.illinois.edu/page/download view/Wikifier);

• DBpedia Spotlight (Mendes et al., 2011), which uses LingPipe's string matching algorithm implementation together with a weighted cosine similarity measure to recognize and disambiguate mentions (we used the 2011 version of DBpedia Spotlight, via the out-of-the-box RESTful API available at http://spotlight.dbpedia.org, as it obtains better scores on the considered datasets in comparison to the new version (Daiber et al., 2013)).

We also compared with UKB w2w, introduced above. Note that we could not use supervised systems, as the training data of AIDA-CoNLL covers less than half of the mentions used in the testing part and less than 10% of the entities considered in KORE50. To enable a fair comparison, we ran our system by restricting the BabelNet sense inventory of the target mentions to the English Wikipedia. As is customary in the literature, we calculated the systems' accuracy for both Entity Linking datasets.

                         Sens3   Sem07 |                          SemEval-2013
System                     WN      WN  | EN-WN  EN-Wiki  EN-BN | FR-Wiki  FR-BN | DE-Wiki  DE-BN | IT-Wiki  IT-BN | ES-Wiki  ES-BN
Babelfy                   68.3    62.7    65.9    87.4    69.2     71.6   ?56.9     81.6    69.4     84.3    66.6     83.8    69.5
IMS                       71.2    63.3    65.7      –       –        –       –        –       –        –       –        –       –
UKB w2w                  ?65.3   ?56.0    61.3      –     60.8       –     60.8       –     66.2       –     67.3       –     70.0
UMCC-DLSI                   –       –     64.7    54.8    68.5    ?60.5    60.5    ?58.1    62.8    ?58.3    65.8    ?61.0    71.0
DAEBAK!                     –       –       –       –     60.4       –     53.8       –     59.1       –    ?61.3       –     60.0
GETALP-BN                   –       –     51.4      –     58.3       –     48.3       –     52.3       –     52.8       –     57.8
MFS                       70.3    65.8   ?63.0   ?80.3   ?66.5     69.4    45.3     83.1   ?67.4     82.3    57.5     82.4   ?64.4
Babelfy unif. weights     67.0    65.2    65.0    87.0    68.5     71.9    57.2     81.2    69.8     83.7    66.8     83.8    70.8
Babelfy w/o dens. sub.    68.3    63.3    65.4    87.3    68.7     71.6    57.0     81.7    69.1     84.4    66.5     83.9    69.5
Babelfy only concepts     68.2    62.7    65.5    83.0    68.7     70.2    56.6     79.3    69.3     83.0    66.3     84.0    69.7
Babelfy on sentences      66.0    65.2    63.5    84.0    67.1     70.7    53.6     82.3    68.1     83.8    64.2     83.5    68.7

(WN = WordNet, Wiki = Wikipedia, BN = BabelNet; EN/FR/DE/IT/ES denote the five SemEval-2013 languages.)

Table 1: F1 scores (percentages) of the participating systems of SemEval-2013 task 12 together with MFS, UKB w2w, IMS, our system and its ablated versions on the Senseval-3, SemEval-2007 task 17 and SemEval-2013 datasets. The first system which has a statistically significant difference from the top system is marked with ? (χ2, p < 0.05).

9 Results

Multilingual WSD. In Table 1 we show the F1 performance on the SemEval-2013 task 12 for the three setups: WordNet, Wikipedia and BabelNet. Using BabelNet we surpass all systems on English and German and obtain performance comparable with the best systems on two other languages (UKB on Italian and UMCC-DLSI on Spanish). Using the WordNet sense inventory, our results are on a par with the best system, i.e., IMS. On Wikipedia our results range between 71.6% (French) and 87.4% F1 (English), i.e., more than 10 points higher than the current state of the art (UMCC-DLSI) in all 5 languages. As for the MFS baseline, which is known to be very competitive in WSD (Navigli, 2009), we beat it in all setups except for German on Wikipedia. Interestingly, we surpass the WordNet MFS by 2.9 points, a significant result for a knowledge-based system (see also (Pilehvar and Navigli, 2014)).

Coarse- and fine-grained WSD. In Table 2, we show the results of the systems on the SemEval-2007 coarse-grained WSD dataset. As can be seen, we obtain the second best result after Ponzetto and Navigli (2010). In Table 1 (first two columns), we show the results of IMS and UKB on the Senseval-3 and SemEval-2007 task 17 datasets. We rank second on both datasets after IMS. However, the differences are not statistically significant. Moreover, Agirre et al. (2014, Table 5) note that using WordNet 3.0, instead of 1.7 or 2.1, to annotate these datasets can cause a more than one percent drop in performance.

System                         F1
(Ponzetto and Navigli, 2010)   85.5
Babelfy                        84.6
UoR-SSI                        84.1
UKB w2w                        83.6
NUS-PT                        ?82.3
MFS                            77.4
Babelfy unif. weights          85.7
Babelfy w/o dens. sub.         84.9
Babelfy only concepts          85.3
Babelfy on sentences           82.3

Table 2: F1 score (percentages) on the SemEval-2007 task 7. The first system which has a statistically significant difference from the top system is marked with ? (χ2, p < 0.05).

System                   KORE50   CoNLL
Babelfy                    71.5    82.1
KORE-LSH-G                 64.6    81.8
KORE                       63.9   ?80.7
MW                        ?57.6    82.3
Tagme                      56.3    70.1
KPCS                       55.6    82.2
KORE-LSH-F                 53.2    81.2
UKB w2w (on BabelNet)      52.1    71.8
Illinois Wikifier          41.7    72.4
DBpedia Spotlight          35.4    34.0
Babelfy unif. weights      69.4    81.7
Babelfy w/o dens. sub.     62.5    78.1
Babelfy only NE            68.1    78.8

Table 3: Accuracy (percentages) of state-of-the-art EL systems and our system on KORE50 and AIDA-CoNLL. The first system with a statistically significant difference from the top system is marked with ? (χ2, p < 0.05).

Entity Linking. In Table 3 we show the results on the two Entity Linking datasets, i.e., KORE50 and AIDA-CoNLL. Our system outperforms all other approaches, with KORE-LSH-G getting closest, and Tagme and Wikifier lagging behind on the KORE50 dataset. For the AIDA-CoNLL dataset we obtain the third best performance after MW and KPCS, however the difference is not statistically significant. We note the low performance of DBpedia Spotlight which, even if it achieves almost 100% precision on the identified mentions on both datasets, suffers from low recall due to its candidate identification step, confirming previous evaluations (Derczynski et al., 2013; Hakimov et al., 2012; Ludwig and Sack, 2011). This problem becomes even more accentuated in the latest version of this system (Daiber et al., 2013). Finally, UKB using BabelNet obtains low performance on EL, i.e., 19.4–10.5 points below the state of the art. This result is discussed below.

Discussion. The results obtained by UKB show that the high performance of our unified approach to EL and WSD is not just a mere artifact of the use of a rich multilingual semantic network, that is, BabelNet. In other words, it is not true that any graph-based algorithm could be applied to perform both EL and WSD at the same time equally well.
This also shows that BabelNet by itself is not sufficient for achieving high performances for both tasks and that, instead, an appropriate processing of the structural and lexical information of the semantic network is needed. A manual analysis revealed that the main cause of error for UKB in the EL setup stems from its inability to enforce high coherence, e.g., by jointly disambiguating all the words, which is instead needed when considering the high level of ambiguity that we have in our semantic interpretation graph (Cucerzan, 2007). For instance, for sentence (1) in the introduction, UKB disambiguates Thomas as a cricket player and Mario as the popular video game rather than the two well-known soccer players, and Munich as the German city, rather than the soccer team in which they play. Our approach, instead, by enforcing highly coherent semantic interpretations, correctly identifies all the soccer-related entities.

In order to determine the need of our loose candidate identification heuristic (see Section 6), we compared the percentage of times a candidate set contains the correct entity against that obtained by an exact string matching between the mention and the sense inventory. On KORE50, our heuristic retrieves the correct entity 98.6% of the time vs. 42.4% when exact matching is used. This demonstrates the inadequacy of exact matching for EL, and the need for a comprehensive sense inventory, as is done in our approach.

We also performed different ablation tests by experimenting with the following variants of our system (reported at the bottom of Tables 1, 2 and 3):

• Babelfy using uniform distribution during the RWR to obtain the concepts' semantic signatures; this test assesses the impact of our weighting and edge creation strategy.

• Babelfy without performing the densest subgraph heuristic, i.e., when line 12 in Algorithm 2 is G_I* = G_I, so as to verify the impact of identifying the most coherent interpretations.

• Babelfy applied to the BabelNet subgraph induced by the entire set of named entity vertices, for the EL task, and that induced by word senses only, for the WSD task; this test aims to stress the impact of our unified approach.

• Babelfy applied on sentences instead of on whole documents.

The component which has a smaller impact on the performance is our triangle-based weighting scheme. The main exception is on the smallest dataset, i.e., SemEval-2007 task 17, for which this version attains an improvement of 2.5 percentage points.

Babelfy without the densest subgraph algorithm is the version which attains the lowest performances on the EL task, with a 9% performance drop on the KORE50 dataset, showing the need for a specially designed approach to cope with the high level of ambiguity that is encountered on this task. On the other hand, in the WSD datasets this version attains almost the same results as the full version, due to the lower number of candidate word senses.

Babelfy applied on sentences instead of on whole documents shows a lower performance, confirming the significance of higher semantic coherence on whole documents (notwithstanding the two exceptions on the SemEval-2007 task 17 and on the SemEval-2013 German Wikipedia datasets).

Finally, the version in which we restrict our system to named entities only (for EL) and concepts only (for WSD) consistently obtains lower results (notwithstanding the three exceptions on the Spanish SemEval-2013 task 12 using BabelNet and Wikipedia, and on the SemEval-2007 coarse-grained task). This highlights the benefit of our joint use of lexicographic and encyclopedic structured knowledge, on each of the two tasks. The 3.4% performance drop attained on KORE50 is of particular interest, since this dataset aims at testing performance on highly ambiguous mentions within short sentences. This indicates that the semantic analysis of small contexts can be improved by leveraging the coherence between concepts and named entities.
10 Conclusion

In this paper we presented Babelfy, a novel, integrated approach to Entity Linking and Word Sense Disambiguation, available at http://babelfy.org. Our joint solution is based on three key steps: i) the automatic creation of semantic signatures, i.e., related concepts and named entities, for each node in the reference semantic network; ii) the unconstrained identification of candidate meanings for all possible textual fragments; iii) linking based on a high-coherence densest subgraph algorithm. We used BabelNet 1.1.1 as our multilingual semantic network.

Our graph-based approach exploits the semantic network structure to its advantage: two key features of BabelNet, that is, its multilinguality and its integration of lexicographic and encyclopedic knowledge, make it possible to run our general, unified approach on the two tasks of Entity Linking and WSD in any of the languages covered by the semantic network. However, we also demonstrated that BabelNet in itself does not lead to state-of-the-art accuracy on both tasks, even when used in conjunction with a high-performance graph-based algorithm like Personalized PageRank. This shows the need for our novel unified approach to EL and WSD.

At the core of our approach lies the effective treatment of the high degree of ambiguity of partial textual mentions by means of a 2-approximation algorithm for the densest subgraph problem, which enables us to output a semantic interpretation of the input text with drastically reduced ambiguity, as was previously done with SSI (Navigli, 2008).

Our experiments on six gold-standard datasets show the state-of-the-art performance of our approach, as well as its robustness across languages. Our evaluation also demonstrates that our approach fares well both on long texts, such as those of the WSD tasks, and short and highly-ambiguous sentences, such as the ones in KORE50. Finally, ablation tests and further analysis demonstrate that each component of our system is needed to contribute state-of-the-art performances on both EL and WSD.

As future work, we plan to use Babelfy for information extraction, where semantics is taking the lead (Moro and Navigli, 2013), and for the validation of semantic annotations (Vannella et al., 2014).

Acknowledgments

The authors gratefully acknowledge the support of the ERC Starting Grant MultiJEDI No. 259234.

References

Eneko Agirre and Aitor Soroa. 2009. Personalizing PageRank for Word Sense Disambiguation. In Proc. of EACL, pages 33–41.

Eneko Agirre, David Martínez, Oier López de Lacalle, and Aitor Soroa. 2006. Two graph-based algorithms for state-of-the-art WSD. In Proc. of EMNLP, pages 585–593.

Eneko Agirre, Aitor Soroa, and Mark Stevenson. 2010. Graph-based Word Sense Disambiguation of biomedical documents. Bioinformatics, 26(22):2889–2896.

Eneko Agirre, Oier Lopez de Lacalle, and Aitor Soroa. 2014. Random Walks for Knowledge-Based Word Sense Disambiguation. Computational Linguistics, 40(1):57–84.

Sören Auer, Christian Bizer, Georgi Kobilarov, Jens Lehmann, Richard Cyganiak, and Zachary Ives. 2007. DBpedia: A Nucleus for a Web of Open Data. In Proc. of ISWC/ASWC, pages 722–735.

Carmen Banea and Rada Mihalcea. 2011. Word Sense Disambiguation with multilingual features. In Proc. of IWCS, pages 25–34.

Christoph Böhm, Gerard de Melo, Felix Naumann, and Gerhard Weikum. 2012. LINDA: distributed web-of-data-scale entity matching. In Proc. of CIKM, pages 2104–2108.

Samuel Brody and Mirella Lapata. 2009. Bayesian Word Sense Induction. In Proc. of EACL, pages 103–111.

Razvan C. Bunescu and Marius Pasca. 2006. Using encyclopedic knowledge for named entity disambiguation. In Proc. of EACL, pages 9–16.

Yee Seng Chan, Hwee Tou Ng, and Zhi Zhong. 2007. NUS-PT: Exploiting Parallel Texts for Word Sense Disambiguation in the English All-Words Tasks. In Proc. of SemEval-2007, pages 253–256.

Moses Charikar. 2000. Greedy approximation algorithms for finding dense components in a graph. In Proc. of APPROX, pages 84–95.

Xiao Cheng and Dan Roth. 2013. Relational Inference for Wikification. In Proc. of EMNLP, pages 1787–1796.

Marco Cornolti, Paolo Ferragina, and Massimiliano Ciaramita. 2013. A framework for benchmarking entity-annotation systems. In Proc. of WWW, pages 249–260.
Silviu Cucerzan. 2007. Large-Scale Named Entity Disambiguation Based on Wikipedia Data. In Proc. of EMNLP-CoNLL, pages 708–716.

Joachim Daiber, Max Jakob, Chris Hokamp, and Pablo N. Mendes. 2013. Improving efficiency and accuracy in multilingual entity extraction. In Proc. of I-Semantics, pages 121–124.

Leon Derczynski, Diana Maynard, Niraj Aswani, and Kalina Bontcheva. 2013. Microblog-genre noise and impact on semantic annotation accuracy. In Proc. of Hypertext, pages 21–30.

Antonio Di Marco and Roberto Navigli. 2013. Clustering and Diversifying Web Search Results with Graph-Based Word Sense Induction. Computational Linguistics, 39(3):709–754.

Nicolai Erbs, Torsten Zesch, and Iryna Gurevych. 2011. Link discovery: A comprehensive analysis. In Proc. of ICSC, pages 83–86.

Oren Etzioni, Michele Banko, and Michael J. Cafarella. 2006. Machine Reading. In Proc. of AAAI, pages 1517–1519.

Uriel Feige, Guy Kortsarz, and David Peleg. 1999. The dense k-subgraph problem. Algorithmica, 29:2001.

Christiane Fellbaum. 1998. WordNet: An Electronic Lexical Database. MIT Press.

Paolo Ferragina and Ugo Scaiella. 2010. TAGME: On-the-fly Annotation of Short Text Fragments (by Wikipedia Entities). In Proc. of CIKM, pages 1625–1628.

Paolo Ferragina and Ugo Scaiella. 2012. Fast and Accurate Annotation of Short Texts with Wikipedia Pages. IEEE Software, 29(1):70–75.

Weiwei Guo and Mona T. Diab. 2010. Combining Orthogonal Monolingual and Multilingual Sources of Evidence for All Words WSD. In Proc. of ACL, pages 1542–1551.

Yoan Gutiérrez, Yenier Castañeda, Andy González, Rainel Estrada, Dennys D. Piug, Jose I. Abreu, Roger Pérez, Antonio Fernández Orquín, Andrés Montoyo, Rafael Muñoz, and Franc Camara. 2013. UMCC DLSI: Reinforcing a Ranking Algorithm with Sense Frequencies and Multidimensional Semantic Resources to solve Multilingual Word Sense Disambiguation. In Proc. of SemEval-2013, pages 241–249.

Ben Hachey, Will Radford, Joel Nothman, Matthew Honnibal, and James R. Curran. 2013. Evaluating Entity Linking with Wikipedia. Artificial Intelligence, 194:130–150.

Sherzod Hakimov, Salih Atilay Oto, and Erdogan Dogdu. 2012. Named entity recognition and disambiguation using linked data and graph-based centrality scoring. In Proc. of SWIM, pages 4:1–4:7.

Taher H. Haveliwala. 2002. Topic-sensitive PageRank. In Proc. of WWW, pages 517–526.

Johannes Hoffart, Mohamed Amir Yosef, Ilaria Bordino, Hagen Fürstenau, Manfred Pinkal, Marc Spaniol, Bilyana Taneva, Stefan Thater, and Gerhard Weikum. 2011. Robust disambiguation of named entities in text. In Proc. of EMNLP, pages 782–792.

Johannes Hoffart, Stephan Seufert, Dat Ba Nguyen, Martin Theobald, and Gerhard Weikum. 2012. KORE: keyphrase overlap relatedness for entity disambiguation. In Proc. of CIKM, pages 545–554.

Johannes Hoffart, Fabian M. Suchanek, Klaus Berberich, and Gerhard Weikum. 2013. YAGO2: A spatially and temporally enhanced knowledge base from Wikipedia. Artificial Intelligence, 194:28–61.

Eduard H. Hovy, Roberto Navigli, and Simone P. Ponzetto. 2013. Collaboratively built semi-structured content and Artificial Intelligence: The story so far. Artificial Intelligence, 194:2–27.

Samir Khuller and Barna Saha. 2009. On finding dense subgraphs. In Proc. of ICALP, pages 597–608.

Sayali Kulkarni, Amit Singh, Ganesh Ramakrishnan, and Soumen Chakrabarti. 2009. Collective Annotation of Wikipedia Entities in Web Text. In Proc. of KDD, pages 457–466.

Els Lefever and Véronique Hoste. 2010. SemEval-2010 task 3: Cross-lingual Word Sense Disambiguation. In Proc. of SemEval-2010, pages 15–20.

Els Lefever and Véronique Hoste. 2013. SemEval-2013 Task 10: Cross-lingual Word Sense Disambiguation. In Proc. of SemEval-2013, pages 158–166.

Els Lefever, Véronique Hoste, and Martine De Cock. 2011. Parasense or how to use parallel corpora for Word Sense Disambiguation. In Proc. of ACL-HLT, pages 317–322.

Michael E. Lesk. 1986. Automatic Sense Disambiguation Using Machine Readable Dictionaries: How to Tell a Pine Cone from an Ice Cream Cone. In Proc. of the International Conference on Systems Documentation, pages 24–26.

Nadine Ludwig and Harald Sack. 2011. Named entity recognition for user-generated tags. In Proc. of DEXA, pages 177–181.

Suresh Manandhar, Ioannis P. Klapaftis, Dmitriy Dligach, and Sameer S. Pradhan. 2010. SemEval-2010 task 14: Word sense induction & disambiguation. In Proc. of SemEval-2010, pages 63–68.

Steve L. Manion and Raazesh Sainudiin. 2013. DAEBAK!: Peripheral Diversity for Multilingual Word Sense Disambiguation. In Proc. of SemEval-2013, pages 250–254.

Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. 2011. DBpedia spotlight: shedding light on the web of documents. In Proc. of I-Semantics, pages 1–8.
Rada Mihalcea and Andras Csomai. 2007. Wikify!: linking documents to encyclopedic knowledge. In Proc. of CIKM, pages 233–242.

Rada Mihalcea and Dan I. Moldovan. 2001. Extended WordNet: Progress report. In Proc. of NAACL Workshop on WordNet and Other Lexical Resources, pages 95–100.

Rada Mihalcea. 2005. Unsupervised large-vocabulary word sense disambiguation with graph-based algorithms for sequence data labeling. In Proc. of HLT/EMNLP, pages 411–418.

George A. Miller, Claudia Leacock, Randee Tengi, and Ross T. Bunker. 1993. A semantic concordance. In Proc. of HLT, pages 303–308.

Tristan Miller, Chris Biemann, Torsten Zesch, and Iryna Gurevych. 2012. Using Distributional Similarity for Lexical Expansion in Knowledge-based Word Sense Disambiguation. In Proc. of COLING, pages 1781–1796.

David Milne and Ian H. Witten. 2008. Learning to link with Wikipedia. In Proc. of CIKM, pages 509–518.

Andrea Moro and Roberto Navigli. 2013. Integrating Syntactic and Semantic Analysis into the Open Information Extraction Paradigm. In Proc. of IJCAI, pages 2148–2154.

Roberto Navigli and Mirella Lapata. 2010. An Experimental Study of Graph Connectivity for Unsupervised Word Sense Disambiguation. TPAMI, 32(4):678–692.

Roberto Navigli and Simone Paolo Ponzetto. 2012a. BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network. Artificial Intelligence, 193:217–250.

Roberto Navigli and Simone Paolo Ponzetto. 2012b. Joining forces pays off: Multilingual Joint Word Sense Disambiguation. In Proc. of EMNLP, pages 1399–1410.

Roberto Navigli, Kenneth C. Litkowski, and Orin Hargraves. 2007. SemEval-2007 Task 07: Coarse-Grained English All-Words Task. In Proc. of SemEval-2007, pages 30–35.

Roberto Navigli, David Jurgens, and Daniele Vannella. 2013. SemEval-2013 Task 12: Multilingual Word Sense Disambiguation. In Proc. of SemEval-2013, pages 222–231.

Roberto Navigli. 2008. A structural approach to the automatic adjudication of word sense disagreements. Natural Language Engineering, 14(4):293–310.

Roberto Navigli. 2009. Word Sense Disambiguation: A survey. ACM Computing Surveys, 41(2):1–69.

Roberto Navigli. 2012. A Quick Tour of Word Sense Disambiguation, Induction and Related Approaches. In Proc. of SOFSEM, pages 115–129.

Hwee Tou Ng and Hian Beng Lee. 1996. Integrating multiple knowledge sources to disambiguate word sense: An exemplar-based approach. In Proc. of ACL, pages 40–47.

Mohammad Taher Pilehvar and Roberto Navigli. 2014. A Large-scale Pseudoword-based Evaluation Framework for State-of-the-Art Word Sense Disambiguation. Computational Linguistics.

Simone P. Ponzetto and Roberto Navigli. 2010. Knowledge-rich Word Sense Disambiguation rivaling supervised systems. In Proc. of ACL, pages 1522–1531.

Sameer S. Pradhan, Edward Loper, Dmitriy Dligach, and Martha Palmer. 2007. SemEval-2007 task 17: English lexical sample, SRL and all words. In Proc. of SemEval-2007, pages 87–92.

Delip Rao, Paul McNamee, and Mark Dredze. 2013. Entity Linking: Finding Extracted Entities in a Knowledge Base. In Multi-source, Multilingual Information Extraction and Summarization, Theory and Applications of Natural Language Processing, pages 93–115. Springer Berlin Heidelberg.

Lev-Arie Ratinov, Dan Roth, Doug Downey, and Mike Anderson. 2011. Local and Global Algorithms for Disambiguation to Wikipedia. In Proc. of ACL, pages 1375–1384.

Lenhart K. Schubert. 2006. Turing's dream and the knowledge challenge. In Proc. of NCAI, pages 1534–1538.

Didier Schwab, Andon Tchechmedjiev, Jérôme Goulian, Mohammad Nasiruddin, Gilles Sérasset, and Hervé Blanchon. 2013. GETALP System: Propagation of a Lesk Measure through an Ant Colony Algorithm. In Proc. of SemEval-2013, pages 232–240.

Hui Shen, Razvan Bunescu, and Rada Mihalcea. 2013. Coarse to Fine Grained Sense Disambiguation in Wikipedia. In Proc. of *SEM, pages 22–31.

Ravi Sinha and Rada Mihalcea. 2007. Unsupervised Graph-based Word Sense Disambiguation Using Measures of Word Semantic Similarity. In Proc. of ICSC, pages 363–369.

Benjamin Snyder and Martha Palmer. 2004. The English all-words task. In Proc. of Senseval-3, pages 41–43.

Carlo Strapparava and Alessandro Valitutti. 2004. WordNet Affect: an Affective Extension of WordNet. In Proc. of LREC, pages 1083–1086.

Hanghang Tong, Christos Faloutsos, and Jia-Yu Pan. 2006. Fast Random Walk with Restart and Its Applications. In Proc. of ICDM, pages 613–622.

Kristina Toutanova, Dan Klein, Christopher D. Manning, and Yoram Singer. 2003. Feature-rich part-of-speech tagging with a cyclic dependency network. In Proc. of NAACL-HLT, pages 173–180.

George Tsatsaronis, Michalis Vazirgiannis, and Ion Androutsopoulos. 2007. Word Sense Disambiguation with Spreading Activation Networks Generated from Thesauri. In Proc. of IJCAI, pages 1725–1730.

Tim Van de Cruys and Marianna Apidianaki. 2011. Latent Semantic Word Sense Induction and Disambiguation. In Proc. of ACL, pages 1476–1485.

Daniele Vannella, David Jurgens, Daniele Scarfini, Domenico Toscani, and Roberto Navigli. 2014. Validating and Extending Semantic Knowledge Bases using Video Games with a Purpose. In Proc. of ACL.

Duncan J. Watts and Steven H. Strogatz. 1998. Collective dynamics of 'small-world' networks. Nature, 393(6684):409–10.

Zhi Zhong and Hwee Tou Ng. 2010. It Makes Sense: A Wide-Coverage Word Sense Disambiguation System for Free Text. In Proc. of ACL (Demo), pages 78–83.

Amal Zouaq, Michel Gagnon, and Benoit Ozell. 2009. A SUMO-based Semantic Analysis for Knowledge Extraction. In Proc. of LTC.
