SemEval-2013 Task 12: Multilingual Word Sense Disambiguation

Roberto Navigli, David Jurgens and Daniele Vannella
Dipartimento di Informatica
Sapienza Università di Roma
Viale Regina Elena, 295 – 00161 Roma, Italy
{navigli,jurgens,vannella}@di.uniroma1.it

Abstract

This paper presents the SemEval-2013 task on multilingual Word Sense Disambiguation. We describe our experience in producing a multilingual sense-annotated corpus for the task. The corpus is tagged with BabelNet 1.1.1, a freely-available multilingual encyclopedic dictionary and, as a byproduct, with the WordNet 3.0 and Wikipedia sense inventories. We present and analyze the results of participating systems, and discuss future directions.

1 Introduction

Word Sense Disambiguation (WSD), the task of automatically assigning predefined meanings to words occurring in context, is a fundamental task in computational lexical semantics (Navigli, 2009; Navigli, 2012). Several Senseval and SemEval tasks have been organized in the past to study the performance and limits of disambiguation systems and, even more importantly, disambiguation settings. While an ad hoc sense inventory was originally chosen for the first Senseval edition (Kilgarriff, 1998; Kilgarriff and Palmer, 2000), later tasks (Edmonds and Cotton, 2001; Snyder and Palmer, 2004; Mihalcea et al., 2004) focused on WordNet (Miller et al., 1990; Fellbaum, 1998) as a sense inventory. In 2007 the issue of the fine sense granularity of WordNet was addressed in two different SemEval disambiguation tasks, leading to the beneficial creation of coarser-grained sense inventories from WordNet itself (Navigli et al., 2007) and from OntoNotes (Pradhan et al., 2007).

In recent years, with the exponential growth of the Web and, consequently, the increase of non-English-speaking surfers, we have witnessed an upsurge of interest in multilinguality. SemEval-2010 tasks on cross-lingual Word Sense Disambiguation (Lefever and Hoste, 2010) and cross-lingual lexical substitution (Mihalcea et al., 2010) were organized. While these tasks addressed the multilingual aspect of sense-level text understanding, they departed from the traditional WSD paradigm, i.e., the automatic assignment of senses from an existing inventory, and instead focused on lexical substitution (McCarthy and Navigli, 2009). The main factor hampering traditional WSD from going multilingual was the lack of a freely-available large-scale multilingual dictionary.

The recent availability of huge collaboratively-built repositories of knowledge such as Wikipedia has enabled the automated creation of large-scale lexical knowledge resources (Hovy et al., 2013). Over the past few years, a wide-coverage multilingual "encyclopedic" dictionary, called BabelNet (http://babelnet.org), has been developed (Navigli and Ponzetto, 2012a). BabelNet brings together WordNet and Wikipedia and provides a multilingual sense inventory that currently covers 6 languages. We therefore decided to put the BabelNet 1.1.1 sense inventory to the test and organize a traditional Word Sense Disambiguation task on a given English test set translated into 4 other languages (namely, French, German, Spanish and Italian). Not only does BabelNet enable multilinguality, but it also provides coverage for both lexicographic (e.g., apple as a fruit) and encyclopedic meanings (e.g., Apple Inc. as a company). In this paper we describe our task and disambiguation dataset and report on the system results.

2 Task Setup

The task required participating systems to annotate nouns in a test corpus with the most appropriate sense from the BabelNet sense inventory or, alternatively, from two main subsets of it, namely the WordNet or Wikipedia sense inventories. In contrast to previous all-words WSD tasks, we did not focus on the other three open classes (i.e., verbs, adjectives and adverbs), since BabelNet does not currently provide non-English coverage for them.

2.1 Test Corpus

The test set consisted of 13 articles obtained from the datasets available from the 2010, 2011 and 2012 editions of the Workshop on Statistical Machine Translation (WSMT; http://www.statmt.org/wmt12/). The articles cover different domains, ranging from sports to financial news. The same article was available in 4 different languages (English, French, German and Spanish). In order to cover Italian, an Italian native speaker manually translated each article from English into Italian, with the support of an English mother-tongue advisor. In Table 1 we show, for each language, the number of words of running text from the 13 articles, together with the number of annotated multiword expressions and named entities.

2.2 Sense Inventories

2.2.1 BabelNet inventory

To semantically annotate all the single- and multi-word expressions, as well as the named entities, occurring in our test corpus, we used BabelNet 1.1.1 (Navigli and Ponzetto, 2012a). BabelNet is a multilingual "encyclopedic dictionary" and a semantic network currently covering 6 languages, namely English, Catalan, French, German, Italian and Spanish. BabelNet is obtained as a result of a novel integration and enrichment methodology: the resource is created by linking the largest multilingual Web encyclopedia, i.e., Wikipedia, to the most popular computational lexicon, i.e., WordNet 3.0. The integration is performed via an automatic mapping and by filling in lexical gaps in resource-poor languages with the aid of Machine Translation (MT). Its lexicon includes lemmas which denote both lexicographic meanings (e.g., balloon) and encyclopedic ones (e.g., Montgolfier brothers).

The basic meaning unit in BabelNet is the Babel synset, modeled after the WordNet synset (Miller et al., 1990; Fellbaum, 1998). A Babel synset is a set of synonyms which express a concept in different languages. For instance, { Globus aerostàtic_CA, Balloon_EN, Aérostation_FR, Ballon_DE, Pallone aerostatico_IT, ..., Globo aerostático_ES } is the Babel synset for the balloon aerostat, where the language of each synonym is provided as a subscript label. Thanks to their multilingual nature, we were able to use Babel synsets as interlingual concept tags for nouns occurring within text written in any of the covered languages.
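To make this structure concrete, the following minimal Python sketch models a Babel synset as a set of language-labelled synonyms, using the balloon example above. The class, field names and identifier are our own illustration and do not correspond to the actual BabelNet API or ID scheme.

from dataclasses import dataclass

@dataclass(frozen=True)
class BabelSynset:
    # Hypothetical, simplified model of a Babel synset: one concept,
    # expressed by synonyms labelled with their language code.
    synset_id: str   # placeholder identifier, not a real BabelNet ID
    lemmas: dict     # language code -> list of synonyms

    def lemmas_for(self, lang):
        # Synonyms expressing this concept in the given language.
        return self.lemmas.get(lang, [])

# The balloon-aerostat Babel synset from the text, usable as an
# interlingual concept tag across the covered languages.
balloon = BabelSynset(
    synset_id="bn:balloon-example",
    lemmas={
        "CA": ["Globus aerostàtic"],   "EN": ["Balloon"],
        "FR": ["Aérostation"],         "DE": ["Ballon"],
        "IT": ["Pallone aerostatico"], "ES": ["Globo aerostático"],
    },
)

print(balloon.lemmas_for("IT"))   # ['Pallone aerostatico']

Matching a noun in, say, the Italian text against lemmas_for("IT") is what allows a single synset to tag the same concept in any of the covered languages.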
2.2.2 WordNet and Wikipedia inventories

Since BabelNet 1.1.1 is a superset of the WordNet 3.0 and Wikipedia sense inventories (for version 1.1.1 we used the English Wikipedia database dump from October 1, 2012), once text is annotated with Babel synsets it turns out to be annotated also according to either WordNet or Wikipedia, or both. In fact, in order to induce the WordNet annotations, one can restrict to those lexical items annotated with Babel synsets which contain WordNet senses for the target lemma; similarly, for Wikipedia, we restrict to those items tagged with Babel synsets which contain Wikipedia pages for the target lemma.
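The projection just described amounts to a filtering step. The sketch below is a rough Python illustration; the Annotation record and its fields are our own assumptions, not the task's actual data format. It keeps only the instances whose Babel synset contains a WordNet sense, or a Wikipedia page, for the target lemma.

from dataclasses import dataclass

@dataclass
class Annotation:
    # Hypothetical sense-annotated instance from the test corpus.
    lemma: str
    babel_synset_id: str
    wordnet_senses: frozenset = frozenset()    # WordNet senses in the synset, if any
    wikipedia_pages: frozenset = frozenset()   # Wikipedia pages in the synset, if any

def project_to_wordnet(annotations):
    # Restrict to items whose Babel synset contains a WordNet sense
    # for the target lemma: these constitute the WordNet annotations.
    return [a for a in annotations if a.wordnet_senses]

def project_to_wikipedia(annotations):
    # Analogous restriction for the Wikipedia sense inventory.
    return [a for a in annotations if a.wikipedia_pages]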
2.3 BabelNet sense inventory validation

Because BabelNet is an automatic integration of WordNet and Wikipedia, the resulting Babel synsets may contain WordNet and Wikipedia entries about different meanings of the same lemma. The underlying cause is a wrong mapping between the two original resources. For instance, in BabelNet 1.1 the WordNet synset { arsenic, As, atomic number 33 } was mapped to the Wikipedia page AS (ROMAN COIN), and therefore the same Babel synset mixed the two meanings.

In order to avoid an inconsistent semantic tagging of text, we decided to manually check all the mappings in BabelNet 1.1 between Wikipedia pages and WordNet senses involving lemmas in our English test set for the task. Overall, we identified 8,306 synsets for 978 lemmas to be manually checked. We recruited 8 annotators in our research group and assigned each lemma to two annotators. Each annotator was instructed to check each Babel synset and determine whether any of the following three operations was needed:

• Delete a mapping and separate the WordNet sense from the Wikipedia page (as in the arsenic vs. AS (ROMAN COIN) example above);

• Add a mapping between a WordNet sense and a Wikipedia page (formerly available as two separate Babel synsets);

• Merge two Babel synsets which express the same concept.

After disagreement adjudication carried out by the first author, the number of delete, add and merge

            Instances  Single words  Multiword expr.  Named entities  Senses/instance  Senses/lemma
BabelNet
  English      1931        1604            127             200             1.02            1.09
  French       1656        1389             89             176             1.05            1.15
  German       1467        1267             21             176             1.00            1.05
  Italian      1706        1454            211              41             1.22            1.27
  Spanish      1481        1103            129             249             1.15            1.19
Wikipedia
  English      1242         945            102             195             1.15            1.16
  French       1039         790             72             175             1.18            1.14
  German       1156         957             21             176             1.07            1.08
  Italian       995         869             85              41             1.20            1.18
  Spanish      1103         758            107             248             1.11            1.10
WordNet
  English      1644        1502             85              57             1.01            1.10

Table 1: Statistics for the sense annotations of the test set. "Senses/instance" and "Senses/lemma" give the mean number of senses per instance and per lemma, respectively.

2.4 Sense Annotation

To ensure high-quality annotations, the annotation process was completed in three phases. Because BabelNet is a superset of both the WordNet and Wikipedia sense inventories, all annotators used the BabelNet 1.1.1 sense inventory for their respective language; these BabelNet annotations were then projected onto WordNet and Wikipedia senses. Annotation was performed by one native speaker each for English, French, German and Spanish and, for Italian, by two native speakers who annotated different subsets of the corpus.

In the first phase, each annotator was instructed to inspect each instance and check that (1) the lemma was tagged with the correct part of speech, (2) lemmas were correctly annotated as named entities or multiword expressions, and (3) the meaning of the instance's lemma had an associated sense in BabelNet. Based on these criteria, annotators removed dozens of instances from the original data.
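As a rough illustration of this first-phase filtering, the sketch below encodes the three checks over a hypothetical Instance record. The first two checks stand in for manual annotator judgments that cannot actually be automated, and all names here are our own, not part of the task's tooling.

from dataclasses import dataclass

@dataclass
class Instance:
    # Hypothetical corpus instance under review in the first phase.
    lemma: str
    pos: str
    pos_is_correct: bool    # annotator judgment for check (1)
    span_is_correct: bool   # annotator judgment for check (2): NE / MWE boundaries

def first_phase_filter(instances, babelnet_senses):
    # Keep only instances passing all three checks; the rest are removed
    # from the data, as described above. 'babelnet_senses' maps a
    # (lemma, pos) pair to the candidate senses known to BabelNet.
    kept = []
    for inst in instances:
        has_sense = bool(babelnet_senses.get((inst.lemma, inst.pos)))   # check (3)
        if inst.pos_is_correct and inst.span_is_correct and has_sense:
            kept.append(inst)
    return kept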