A Bottom-Up Comparative Study of Eurowordnet and Wordnet 3.0 Lexical and Semantic Relations
Total Page:16
File Type:pdf, Size:1020Kb
A Bottom-up Comparative Study of EuroWordNet and WordNet 3.0 Lexical and Semantic Relations Maria Teresa Pazienzaa, Armando Stellatoa, Alexandra Tudoracheab a) AI Research Group, Dept. of Computer Science, b) Dept. of Cybernetics, Statistics and Economic Systems and Production University of Rome, Informatics, Academy of Economic Studies Bucharest Tor Vergata Calea Dorobanţilor 15-17, 010552, Via del Politecnico 1, 00133 Rome, Italy Bucharest, Romania {pazienza,stellato,tudorache}@info.uniroma2.it [email protected] Abstract The paper presents a comparative study of semantic and lexical relations defined and adopted in WordNet and EuroWordNet. This document describes the experimental observations achieved through the analysis of data from different WordNet versions and EuroWordNet distributions for different languages, during the development of JMWNL (Java Multilingual WordNet Library), an extensible multilingual library for accessing WordNet-like resources in different languages and formats. The goal of this work was to realize an operative mapping between the relations defined in the two lexical resources and to unify library access and content navigation methods for both WordNet and EuroWordNet. The analysis focused on similarities, differences, semantic overlaps or inclusions, factual misinterpretations and inconsistencies between the intended and practical use of each single relation defined in these two linguistic resources. The paper details with examples the produced mapping, discussing required operations which implied merging, extending or simply keeping separate the examined relations. This document describes the experimental observations 1. Introduction achieved through the analysis of data from different We introduce a comparative study of semantic and WordNet versions and EuroWordNet distributions for lexical relations defined and adopted in two renowned different languages, during the development of JMWNL. lexical databases: WordNet (Miller, Beckwith, Fellbaum, The paper focuses on relations mapping, cross part-of- Gross, & Miller, 1993; Fellbaum, 1998) and speech relations and the partial mapping of Fuzzynym EuroWordNet (Vossen, 1998). The study was conducted relations like Derivationally Related Form. during the development of the Java Multi WordNet Library (JMWNL), an extensible multilingual library for 2. WordNet and EuroWordNet accessing WordNet-like resources in different languages WordNet is a large lexical database of English. Nouns, and formats, based on John Didion’s JWNL library verbs, adjectives and adverbs are grouped into sets of (http://sourceforge.net/projects/jwordnet). The analysis cognitive synonyms (synsets), each expressing a distinct and comparison between the two resources was carried concept. WordNet “Synsets are interlinked by means of out in the pre-design stage of development of the above conceptual-semantic and lexical relations” (About library. The aim was to realize an operative mapping WordNet, 2006). Its European counterpart, between the relations defined in the two lexical resources EuroWordNet, is a multilingual database with wordnets and to unify library access and content navigation for several European languages (Dutch, Italian, Spanish, methods for both WordNet and EuroWordNet. The work German, French, Czech and Estonian). The wordnets are was conducted bottom-up by analyzing the raw data and structured in the same way as the American wordnet for several examples either from English WordNet and English in terms of synsets with basic semantic relations different language EuroWordNet (EWN from now on) between them (Vossen, EuroWordNet Web Abstract, resources. The analysis focused on similarities, 2001). differences, semantic overlaps or inclusions, factual EuroWordNet is based on the 1998 WordNet version 1.5 misinterpretations and inconsistencies between the (Fellbaum, 1998) and contains more (and different) intended and practical use of each single relation defined relations than current English WordNet (version 3.0 - in these two linguistic resources. 2007), so a one- to-one relations mapping is not The mapping between the relations defined in the two achievable. resources required a two layered investigation. In most Although the structure of various wordnets is similar and cases it sufficed to establish a template level mapping consistent, the relations defined in each version are not like telling that has_hyperonym is equivalent – at least in identical and moreover some wordnets are richer in the intentions of the lexicographers – to hypernym. This relations than others. way, in theory, all instantiated relationships based on these two relations could be interchangeable. In other 3. Overall Mapping Statistics cases, (even partial) mappings between apparently Our analysis revealed that only some EuroWordNet different relations emerged by looking at a vast quantity relations could be mapped directly with WordNet of sample data and studying cross-linguistic similarities. relations. Other EuroWordNet relations had to be added. EuroWordNet Has Holo Madeof Relation EuroWordNet Has Mero Madeof Relation 0 @2817@ WORD_MEANING 0 @4493@ WORD_MEANING 1 PART_OF_SPEECH "n" 1 PART_OF_SPEECH "n" 1 VARIANTS 1 VARIANTS 2 LITERAL "seta" 2 LITERAL "steccato" 3 SENSE 2 3 SENSE 1 3 STATUS new 3 STATUS new 3 EXTERNAL_INFO 3 EXTERNAL_INFO 4 SOURCE_ID 1 4 SOURCE_ID 1 5 TEXT_KEY "0-0b" 5 TEXT_KEY "0-1a" 1 INTERNAL_LINKS 1 INTERNAL_LINKS 2 RELATION "has_hyperonym" 2 RELATION "has_mero_madeof" 3 TARGET_CONCEPT 3 TARGET_CONCEPT 4 PART_OF_SPEECH "n" 4 PART_OF_SPEECH "n" 4 LITERAL "tessuto" 4 LITERAL "palo" 5 SENSE 1 5 SENSE 1 2 RELATION "has_holo_madeof" 1 EQ_LINKS 3 TARGET_CONCEPT 2 EQ_RELATION "eq_synonym" 4 PART_OF_SPEECH "n" 3 TARGET_ILI 4 LITERAL "abito" 4 PART_OF_SPEECH "n" 5 SENSE 1 4 WORDNET_OFFSET 2963587 Figure 1: EuroWordNet Has_Holo_Madeof and Has_Mero_Madeof Relations English WordNet has in total 46 relations defined on all (words appearing in the same synset are, by definition, parts of speech while EuroWordNet has more than one synonyms), while, in the case of antonymy, EWN tight hundred relations. During our integration for building the and loose antonyms both collapse in the ANTONYM JMWNL library we could map directly 32 EuroWordNet definition in WordNet (see section 4.4). relations and we had to add 17 new relations that were present in EuroWordNet from which 10 are defined 4.2. Meronymy and Holonymy across multiple parts of speech (XPOS). We created In EuroWordNet HAS_MERO/HAS_HOLO with all their these 10 new relations using the defined WordNet variants express respectively HOLONYM/MERONYM relation pointer symbols adding an “x” character as relations. prefix to express the cross relationship. More specifically, in EuroWordNet, HAS MERO MADEOF We found in total 21 WordNet relations that are not and HAS HOLO MADEOF relations have a partial overlap present in EuroWordNet and that couldn’t be mapped with both SUBSTANCE MERONYM/HOLONYM and PART even partially. MERONYM/HOLONYM. In Figure 1 are shown two examples of HAS MERO 4. Comparative results and observations MADEOF and HAS HOLO MADEOF relations. The first As previously mentioned, it is not possible to build a example is the overlap with SUBSTANCE HOLONYM: complete 1-to-1 mapping between WordNet and “abito” (suit) is made of “seta” (silk). EuroWordNet lexical and semantic relations. However a The second example shows the overlap of HAS MERO correct and complete record of all lexical and semantic MADEOF with PART MERONYM: “palo” (pole) is relations is indispensable for building multilingual part meronym of “steccato” (wooden fence). applications that use available wordnets as their lexical In Table 1 are presented both HAS MERO PART and HAS databases. Moreover, it is necessary to establish HOLO MEMBER relations in parallel starting from the relationships between their models to create the grounds same base concept: “tree”. Looking at this example we for working consistently across different languages. could conclude that there are not big differences between This section will show the differences between WordNet definitions of the relations for such base concepts. and EuroWordNet identifications of lexical and semantic relations focusing on the most important EuroWordNet English Base relations and their correspondent WordNet ones. Relation Italian French Transla Word tion 4.1. Synonymy and Antonymy frutto N/A fruit Unlike in WordNet, EuroWordNet distinguishes between corteccia souche bark tight and loose synonymy and antonymy relationships, Albero foglia N/A leaf (It) Has mero branche introducing two relations respectively called NEAR ramo branch SYNONYM and NEAR ANTONYM.: for example, in Italian Arbre part d'arbre EuroWordNet the word "nemico" (“enemy”, in English) (Fr) tronco tronc trunk is NEAR_ANTONYM of "alleato" (Eng. “allay”), while in Tree cima cime flower original WordNet “enemy” is only direct ANTONYM of (En) radice N/A root Has holo “friend”. bosco bois wood These two relations can be mapped directly as SIMILAR member and in WordNet. The tight version of TO ANTONYM Table 1: Multilingual Holonymy and Meronymy Relations synonymy is implicit in the WordNet definition of synset 2 RELATION "xpos_fuzzynym" EuroWordNet Fuzzynym Relation 3 TARGET_CONCEPT 4 PART_OF_SPEECH "n" 0 @317@ WORD_MEANING 4 LITERAL "classicismo" 1 PART_OF_SPEECH "a" 5 SENSE 2 1 VARIANTS 2 RELATION "has_hyperonym" 2 LITERAL "classico" 3 TARGET_CONCEPT 3 SENSE 1 4 PART_OF_SPEECH "a" 3 DEFINITION "attinente al classicismo 4 LITERAL "relativo"