Using a Lexical Semantic Network for the Ontology Building Nadia Bebeshina-Clairet, Sylvie Despres, Mathieu Lafourcade

Using a Lexical Semantic Network for the Ontology Building Nadia Bebeshina-Clairet, Sylvie Despres, Mathieu Lafourcade To cite this version: Nadia Bebeshina-Clairet, Sylvie Despres, Mathieu Lafourcade. Using a Lexical Semantic Network for the Ontology Building. RANLP: Recent Advances in Natural Language Processing, Sep 2019, Varna, Bulgaria. 12th International Conference Recent Advances in Natural Language Processing, 2019. hal-02269128 HAL Id: hal-02269128 https://hal.archives-ouvertes.fr/hal-02269128 Submitted on 22 Aug 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Using a Lexical Semantic Network for the Ontology Building Nadia Bebeshina-Clairet Sylvie Despres Mathieu Lafourcade LIRMM / LIMICS LIMICS LIRMM 74 rue Marcel Cachin 860 rue de St Priest 93017 Bobigny 34095 Montpellier Abstract cepts) shared by a community of agents. A concept corresponds to a set of individuals sharing sim- In the present article, we focus on mining lexical se- ilar characteristics and may or may not be lexical- mantic knowledge resources for the ontology build- ized. Thus, the ontology labels cannot be polyse- ing and enhancement. The ontology building is of- mous. The strength of ontologies is in their for- ten a descending and manual process where the high mal consistency. The weaknesses are linked to their level concepts are defined, then detailed by human coverage, size (as stated in (Raad and Cruz, 2015), experts. We explore a way of using a multilingual ”large ontologies usually cause serious scalability or monolingual lexical semantic network to make problems”), and human effort needed for their build- evolve or localize an existing ontology with a lim- ing. The potential of the LSNs is linked to the large ited human effort. amount of explicit semantic information they con- tain. However, a filtering process is needed to dis- 1 Introduction criminate irrelevant information (polysemy, noise). Today’s termino-ontological resources are increas- 2 State of the art ingly rich in terms of data they rely upon. The sci- entific community works intensively on data acqui- The opportunity of ontology construction empow- sition for the ontology building. For instance, the 1 ered by the use of Natural Language Processing NeOn project has been set up to provide a method- (NLP) techniques and tools has been explored for ology for the ontology engineering by integrating more than 20 years. Among the achievements, one preexisting knowledge resources into an ontology can distinguish the tools which take into account the building process. The NeOn methodology contains difference between the lexical term and the ontol- a consistent framework for modular ontology build- ogy concept (differentiated tools) and those that do ing as well as for setting up ontology networks. Here not make such distinction. Differentiated tools and we focus on exploiting Lexical Semantic Networks methods suggest extracting the terminological units (LSNs) to enrich an ontology or accompany the from texts and organizing them as a network using ontology building process. We assume that LSNs a set of hierarchical and equivalence relation types. represent knowledge as it is expressed through the Such network guides the ontology expert through human language whereas ontologies provide a for- the conceptualisation and ontology building pro- mal description (specification) of a conceptualiza- cess. Such process relies on an intermediary struc- tion (concepts and relationships between those con- ture, a termino-ontology (for instance, the Termi- 1http://neon-project.org/nw/About_NeOn.html nae suite, (Szulman, 2012)). Undifferentiated tools use some statistical information to suggest the can- The RezoJDM (Lafourcade, 2007) is a lexical se- didate concepts. They exploit such methods as for- mantic network (LSN) for French built using crowd- mal concept analysis (Mondary, 2011) or knowledge sourcing methods and, in particular, games with a based methods (for instance, TextToOnto2). ”Lex- purpose (GWAPS) such as JeuxDeMots3 and addi- ical ontologies” (Abu Helou et al., 2014) are suc- tional games . This commons sense network has cessfully used for ontology building. Numerous ap- been built since 2007. It is a directed, typed and proaches targeted at high level ontology or informa- weighted graph. At the time of our writing, Rezo- tion retrieval ontology based on general knowledge JDM contains 2.7 millions of terms that are modeled (such as (Marciniak, 2013)) rely on PWN. Others as nodes of the graph and 240 millions of relations use PWN and domain specific semantic lexicons for (arcs). forming the concepts (Turcato et al., 2000). Many The MLSN (Bebeshina-Clairet, 2019) is a multilin- other ontology learning techniques use distributional gual LSN (it covers French, English, Spanish, and semantics to learn lightweight ontologies, for exam- Russian) with an interlingual pivot built for the cui- ple, (Wong, 2009). sine and nutrition domain. This network is inspired In the framework of corpora based approaches to by the RezoJDM in terms of its model. Structurally the ontology building such as described in (Kietz speaking, the MLSN is a directed, typed, and val- et al., 2000), the idea of notable (salient, relevant) uated graph. It contains k sub-graphs correspond- element or relevant piece of knowledge (RPK) has ing to each of the k languages it covers and a spe- been introduced. It corresponds either to the fre- cific sub-graph which fulfills the role of the interlin- quent terms appearing in a corpus and to the tacit gual pivot. Similar to the RezoJDM, we call terms knowledge contained in texts. Such tacit knowl- the nodes of the MLSN and relations - its typed, edge corresponds to the semantic relationships (sub- weighed, and directed arcs. The MLSN nodes may somption relationship and other specialized relation- correspond to one of the following types : lexi- ships). Their presence in texts may take the form of cal items (garlic), interlingual items (pertaining ”indices”. In contrast, the explicit elements may re- to the interlingual pivot and also called covering veal the presence of concepts. The main drawback terms), relational items (i.e. relationship reifica- such definition of RPKs is that it relies on the fea- tions such as salad[r has part]garlic), and category tures defined or recorded for a particular language. items parts of speech or other morpho-syntactic fea- In addition, statistical criteria are often preferred and tures (i.e. Noun:AccusativeCase). it is difficult to qualify such RPKs from the semantic As it has been difficult to set up the pivot using point of view and in a language independent man- a multilingual embedding (joining multiple spaces, ner. In this paper, we will detail the experiments we one per language) as well as to avoid pairwise align- conducted to provide a new definition of the RPKs ment based on combinatorial criteria, the pivot has based on the structured lexical semantic information been started as a natural one using the English edi- and describe the way so defined RPKs can be used tion of DBNary (Serasset´ , 2014). It incrementally to help the top down ontology building process. evolves to become interlingual. The pivot evolution relies on sense-based alignments between the lan- 3 Ressources guages of the MLSN and aims at taking into account the difference of sense ”granularity” in different lan- In the present section, we will describe the resources guages. For example, as stewin English can be trans- we used in our experiments. lated as into French as pot-au-feu and ragoutˆ . It 2https://sourceforge.net/p/texttoonto/wiki/ 3http://www.jeuxdemots.org reflects the conceptualization discrepancy as ragoutˆ details and exemplifies the annotation scheme. refers to saute´ the ingredients and then add water whereas pot-au-feu refers to boiling them together in a larger amount of water than used for the ragoutˆ . In the MLSN pivot we have the interlingual term corresponding to stew (which covers the English term) with two hyponyms corresponding to 0rench terms. The alignments are progressively obtained through external resources or by inference. Thus it can be Figure 1: MLSN: relationship annotation scheme. considered as a union of word senses lexicalized or identified in the languages covered by the MLSN. l l . Even though we assume the pivot as being inter- The labels s and t correspond to the language (sub- lingual, it is still close to a natural one. A relation graph) labels. At the time of our writing, the MLSN contains 821 781 nodes and 2 231 197 arcs. It covers r 2 R is a sextuplet r =< s; t; type; v; ls; lt > where s and t correspond respectively to the source 4 languages : English, French, Russian, and Span- and the target term of the relation. The relation ish. type is a typical relation type. It may model differ- MIAM (Despres` , 2016)5 is a modular termino- ent features such as taxonomic and part-whole re- ontology for the digital cooking. It provides knowl- lations (r isa, r hypo, r has part, r matter, r holo), edge necessary for the elaboration of general nutri- possible predicate-argument relations (typical object tional suggestions. The knowledge model of this r object, location r location, instrument r instr of an ontology gathers expert knowledge on food, food action), ”modifier” relations (typical characteristic transformation, cooking actions, relevant dishes that r carac, typical manner r manner) and more4. The reflect french culinary tradition, recipes necessary relationship valuation v corresponds to the charac- to cook such dishes.

Using a Lexical Semantic Network for the Ontology Building Nadia Bebeshina-Clairet, Sylvie Despres, Mathieu Lafourcade

Lexical Resource Reconciliation in the Xerox Linguistic Environment

Wordnet As an Ontology for Generation Valerio Basile

Modeling and Encoding Traditional Wordlists for Machine Applications

TEI and the Documentation of Mixtepec-Mixtec Jack Bowers

Unification of Multiple Treebanks and Testing Them with Statistical Parser with Support of Large Corpus As a Lexical Resource

Instructions for ACL-2010 Proceedings

The Interplay Between Lexical Resources and Natural Language Processing

Exploring Complex Vowels As Phrase Break Correlates in a Corpus of English Speech with Proposel, a Prosody and POS English Lexicon

Methods for Efficient Ontology Lexicalization for Non-Indo

Maurice Gross' Grammar Lexicon and Natural Language Processing

Using Relevant Domains Resource for Word Sense Disambiguation

Inferences for Lexical Semantic Resource Building with Less Supervision