
Semantic Web 1 (2010) 1–9
IOS Press
1570-0844/10/$27.50 © 2010 – IOS Press and the authors. All rights reserved
F. Giunchiglia et al. / S-Match: an open source framework

S-Match: an open source framework for matching lightweight ontologies

Editor(s): Jie Tang, Tsinghua University Beijing, China
Solicited review(s): Ming Mao, SAP Research, Palo Alto, CA, U.S.A.; Wei Hu, Nanjing University, China; Shenghui Wang, Vrije Universiteit Amsterdam, The Netherlands
Open review(s): Prateek Jain, Kno.e.sis Center, Wright State University, Dayton, OH, U.S.A.

Fausto Giunchiglia (a), Aliaksandr Autayeu (a) and Juan Pane (a)
(a) DISI, via Sommarive, 14, 38123 Trento, Italy
E-mail: {name.surname}@disi.unitn.it

Abstract. Achieving automatic interoperability among systems with diverse data structures and languages expressing different viewpoints is a goal that has been difficult to accomplish. This paper describes S-Match, an open source semantic matching framework that tackles the semantic interoperability problem by transforming several data structures such as business catalogs, web directories, conceptual models and web services descriptions into lightweight ontologies and establishing semantic correspondences between them. The framework is the first open source semantic matching project that includes three different algorithms tailored for specific domains and provides an extensible API for developing new algorithms, including the possibility to plug in specific background knowledge according to the characteristics of each application domain.

Keywords: data integration, semantic matching, lightweight ontologies, open source framework

1. Introduction

Interoperability among different viewpoints and languages which use different terminology and where knowledge can be expressed in diverse forms is a difficult problem. With the advent of the Web and the consequential information explosion, the problem has been further emphasized. People face these concrete problems when retrieving, disambiguating and integrating information coming from a wide variety of sources. Many of these sources of information can be represented using lightweight ontologies, which provide the formal representation upon which it is possible to reason automatically about hierarchical structures such as classifications, database schemas, business catalogs, and file system directories, among others.

Semantic matching constitutes a fundamental technique which applies in many areas such as resource discovery, data integration, data migration, query translation, peer to peer networks, agent communication, schema and ontology merging. Semantic matching is a type of ontology matching technique that relies on semantic information encoded in lightweight ontologies to identify nodes that are semantically related. It operates on graph-like structures and has been proposed as a valid solution to the semantic heterogeneity problem, namely managing the diversity in knowledge [1].

S-Match (1) is an open source semantic matching framework that provides several semantic matching algorithms and facilities for the development of new ones. It includes components for transforming tree-like structures into lightweight ontologies, where each node label in the tree is translated into a propositional description logic (DL) formula, which univocally codifies the meaning of the node.

(1) http://s-match.org/

S-Match contains the implementation of the basic semantic matching, the minimal semantic matching, and the structure preserving semantic matching (SPSM) algorithms. The basic semantic matching algorithm is a general purpose matching algorithm, very customizable and suitable for many applications. The minimal semantic matching algorithm exploits additional knowledge encoded in the structure of the input and is capable of producing both the minimal and the maximal mapping. SPSM is a type of semantic matching producing a similarity score and a mapping preserving structural properties: (i) one-to-one correspondences between semantically related nodes; (ii) functions are matched to functions and variables to variables.

The key contributions of the S-Match framework are:

i) a working open source implementation of semantic matching algorithms;
ii) several interfaces ranging from an easy to use Graphical User Interface (GUI) and Command Line Interface (CLI) to an Application Program Interface (API). They suit different purposes varying from running a quick and easy experiment to embedding S-Match in other projects;
iii) the implementation of three different versions of semantic matching algorithms, each suitable for different purposes, and the flexibility for integrating new algorithms and linguistic oracles tailored to other domains;
iv) an open architecture extensible to work with different data formats with a ready-to-run implementation of the basic formats.

The rest of the paper is organized as follows: Section 2 introduces lightweight ontologies and enumerates several data structures that can be transformed into them; Section 3 gives an overview of the semantic matching algorithm and the different versions that are supported by S-Match; Section 4 presents the general architecture of the framework and the macro-level components; Section 5 gives an introduction to different interfaces available in the framework. Finally, Section 6 provides a summary of the open source project that hosts the framework.

2. Lightweight ontologies

Classification structures such as taxonomies, business catalogs (2), web directories (3) and user directories in the file system, among others, are perhaps the most natural tools used by humans to organize information content. The information is hierarchically arranged under topic nodes moving from general ones to more specific ones as we go deeper in the hierarchy. This attitude is known in Knowledge Organization as the get-specific principle [2].

(2) unspsc.org, eclass-online.com
(3) dmoz.org, dir.yahoo.com

Fig. 1. Two example course catalogs to be matched

The information in the classification is normally described using natural language labels (see Figure 1), which have proven to be very effective in manual tasks (for example, for manual indexing and manually navigating the tree). However, these natural language labels present limitations when one tries to automate reasoning over them, for instance for automatic indexing, search and semantic matching or when dealing with multiple languages.

Translating the classifications containing natural language labels into their formal counterpart, i.e., lightweight ontologies, is a fundamental step toward being able to automatically work with them. Following the approach described in [2] and exploiting dedicated Natural Language Processing (NLP) techniques tuned to short phrases [3], each node label can be translated into an unambiguous formal expression, i.e., into a propositional Description Logic (DL) expression. As a result, lightweight ontologies, or formal classifications, are tree-like structures where each node label is a language-independent propositional DL formula codifying the meaning of the node. Taking into account its context (namely the path from the root node), each node formula is subsumed by the formula of the node above [4]. As a consequence, the backbone structure of a lightweight ontology is represented by subsumption relations between nodes, i.e., “the extension of a concept of a child node is a subset of the extension of the concept of the parent node” [5].

Giunchiglia et al. show in [6], [4] and [7] how lightweight ontologies can be used to automate important tasks, in particular to favor interoperability among different knowledge organization systems. For example, [6] shows how data and conceptual models such as database schemes, object oriented schemes, XML schemes and concept hierarchies can be converted into graph-like structures that can be used as input to semantic matching. In [7] it has been demonstrated how lightweight ontologies can also be used for representing web services and therefore automate the web service composition task. This shows that lightweight ontologies, while being simple structures, are powerful enough to encode several types of models ranging from data and classification models to service descriptions, reducing the complexity of the semantic interoperability problem (in many cases) to that of matching two lightweight ontologies.

3. Semantic matching

Semantic matching is a type of ontology matching [8] technique that relies on semantic information encoded in lightweight ontologies [2] to identify nodes that are semantically related. A considerable amount of research has been done in this field, which can be seen in extensive surveys [8,9,10], papers, to cite a few [...]

3.1. The basic algorithm

The basic semantic matching algorithm was introduced in [6] and later extended in [1]. The key intuition of semantic matching is to find semantic relations, in the form of equivalence (=), less general (⊑), more general (⊒) and disjointness (⊥), between the meanings (concepts) that the nodes of the lightweight ontologies represent, and not only the labels [1]. This is done using a four-step approach, namely:

– Step 1: Compute the concepts of label, CLs;
– Step 2: Compute the concepts at node, CNs;
– Step 3: Compute relations between the concepts of label;
– Step 4: Compute relations between the concepts at node;

In the first step, the natural language labels of the nodes are [...]