Semantic Web 1 (2010) 1–9 1 IOS Press

S-Match: an open source framework for matching lightweight ontologies

Editor(s): Jie Tang, Tsinghua University Beijing, China sequential information explosion, the problem seems Solicited review(s): Ming Mao, SAP Research, Palo Alto, CA, to be emphasized. People face these concrete problems U.S.A.; Wei Hu, Nanjing University, China; Shenghui Wang, Vrije when retrieving, disambiguating and integrating infor- Universiteit Amsterdam, The Netherlands Open review(s): Prateek Jain, Kno.e.sis Center, Wright State Uni- mation coming from a wide variety of sources. Many versity, Dayton, OH, U.S.A. of these sources of information can be represented us- ing lightweight ontologies, which provide the formal representation upon which it is possible to reason auto- a a Fausto Giunchiglia , Aliaksandr Autayeu and matically about hierarchical structures such as classifi- Juan Pane a cations, schemas, business catalogs, and file a DISI, via Sommarive, 14, 38123 Trento, Italy system directories, among others. E-mail: {name.surname}@disi.unitn.it Semantic matching constitutes a fundamental tech- nique which applies in many areas such as resource discovery, , data migration, query translation, peer to peer networks, agent communica- tion, schema and ontology merging. Semantic match- Abstract. ing is a type of ontology matching technique that relies Achieving automatic interoperability among systems with on semantic information encoded in lightweight on- diverse data structures and languages expressing different tologies to identify nodes that are semantically related. viewpoints is a goal that has been difficult to accomplish. It operates on graph-like structures and has been pro- This paper describes S-Match, an open source semantic matching framework that tackles the semantic interoperabil- posed as a valid solution to the ity problem by transforming several data structures such as problem, namely managing the diversity in knowledge business catalogs, web directories, conceptual models and [1]. web services descriptions into lightweight ontologies and S-Match1 is an open source semantic matching establishing semantic correspondences between them. The framework that provides several semantic matching framework is the first open source semantic matching project algorithms and facilities for the development of new that includes three different algorithms tailored for specific ones. It includes components for transforming tree- domains and provides an extensible API for developing new like structures into lightweight ontologies, where each algorithms, including possibility to plug-in specific back- node label in the tree is translated into propositional ground knowledge according to the characteristics of each application domain. (DL) formula, which univocally cod- ifies the meaning of the node. Keywords: data integration, semantic matching, lightweight S-Match contains the implementation of the ba- ontologies, open source framework sic semantic matching, the minimal semantic match- ing, and the structure preserving semantic matching (SPSM) algorithms. The basic semantic matching al- 1. Introduction gorithm is a general purpose matching algorithm, very customizable and suitable for many applications. Min- Interoperability among different viewpoints and lan- imal semantic matching algorithm exploits additional guages which use different terminology and where knowledge encoded in the structure of the input and knowledge can be expressed in diverse forms is a diffi- cult problem. With the advent of the Web and the con- 1http://s-match.org/

1570-0844/10/$27.50 c 2010 – IOS Press and the authors. All rights reserved 2 F. Giunchiglia et al. / S-Match: an open source framework is capable of producing minimal mapping and maxi- mal mapping. SPSM is a type of semantic matching producing a similarity score and a mapping preserving structural properties: (i) one-to-one correspondences between semantically related nodes; (ii) functions are matched to functions and variables to variables. The key contributions of the S-Match framework are: Fig. 1. Two example course catalogs to be matched

i) working open source implementation of semantic tude is known in Knowledge Organization as the get- matching algorithms; specific principle [2]. ii) several interfaces ranging from easy to use Graph- The information in the classification is normally de- ical User Interface (GUI) and Command Line In- scribed using natural language labels (see Figure 1), terface (CLI) to Application Program Interface which has proven to be very effective in manual tasks (API). They suit different purposes varying from (for example, for manual indexing and manually nav- running a quick and easy experiment to embed- igating the tree). However, these natural language la- ding S-Match in other projects; bels present limitations when one tries to automate rea- iii) the implementation of three different versions of soning over them, for instance for automatic indexing, semantic matching algorithms, each suitable for search and semantic matching or when dealing with different purposes, and the flexibility for integrat- multiple languages. ing new algorithms and linguistic oracles tailored Translating the classifications containing natural of other domains; language labels into their formal counterpart, i.e., iv) an open architecture extensible to work with dif- lightweight ontologies, is a fundamental step toward ferent data formats with a ready-to-run implemen- being able to automatically work with them. Follow- tation of the basic formats. ing the approach described in [2] and exploiting dedi- cated Natural Language Processing (NLP) techniques The rest of the paper is organized as follows: Sec- tuned to short phrases [3], each node label can be trans- tion 2 introduces lightweight ontologies and enumer- lated into an unambiguous formal expression, i.e., into ates several data structures that can be transformed a propositional Description Logic (DL) expression. As into them; Section 3 gives an overview of the seman- a result, lightweight ontologies, or formal classifica- tic matching algorithm and the different versions that tions, are tree-like structures where each node label is a are supported by S-Match; Section 4 presents the gen- language-independent propositional DL formula cod- eral architecture of the framework and the macro-level ifying the meaning of the node. Taking into account components; Section 5 gives an introduction to differ- its context (namely the path from the root node), each ent interfaces available in the framework. Finally, Sec- node formula is subsumed by the formula of the node tion 6 provides a summary of the open source project above [4]. As a consequence, the backbone structure hosting the open source framework. of a lightweight ontology is represented by subsump- tion relations between nodes, i.e., “the extension of a concept of a child node is a subset of the extension of 2. Lightweight ontologies the concept of the parent node” [5]. Giunchiglia et al. show in [6], [4] and [7] how Classification structures such as , busi- lightweight ontologies can be used to automate impor- 2 3 ness catalogs , web directories and user directories tant tasks, in particular to favor interoperability among in the file system, among others, are perhaps the most different knowledge organization systems. For exam- natural tools used by humans to organize information ple, [6] shows how data and conceptual models such content. The information is hierarchically arranged un- as database schemes, object oriented schemes, XML der topic nodes moving from general ones to more spe- schemes and concept hierarchies can be converted into cific ones as we go deeper in the hierarchy. This atti- graph-like structures that can be used as input of the Semantic Matching. In [7] it has been demonstrated 2unspsc.org, eclass-online.com how lightweight ontologies can also be used for repre- 3dmoz.org, dir.yahoo.com senting web services and therefore automate the web F. Giunchiglia et al. / S-Match: an open source framework 3 service composition task. This shows that Lightweight 3.1. The basic algorithm Ontologies, while being simple structures, are power- ful enough to encode several types of models ranging The basic semantic matching algorithm was intro- from data and classification models to service descrip- duced in [6] and later extended in [1]. The key intuition tions, reducing the complexity of the semantic interop- of Semantic Matching is to find semantic relations, in erability problem (in many cases) to that of matching the form of equivalence (=), less general (v), more two lightweight ontologies. general (w) and disjointness (⊥), between the mean- ings (concepts) that the nodes of the lightweight on- tologies represent, and not only the labels [1]. This is done using a four steps approach, namely: 3. Semantic matching – Step 1: Compute the concepts of label, CLs; – Step 2: Compute the concepts at node, CN s; Semantic matching is a type of ontology matching – Step 3: Compute relations between the concepts [8] technique that relies on semantic information en- of label; coded in lightweight ontologies [2] to identify nodes – Step 4: Compute relations between the concepts that are semantically related. A considerable amount at node; of research has been done in this field, which can be In the first step, the natural language labels of the seen in extensive surveys [8,9,10], papers, to cite a few nodes are analyzed with the aid of a linguistic oracle [11,12], and in systems, such as Falcon4, COMA++5, 6 7 (e.g., WordNet) in order to extract the intended mean- Similarity Flooding , HMatch and others. ing. The intended meaning is a complex concept, made S-Match algorithm is an example of semantic match- of atomic concepts, which roughly correspond to in- ing operator. Given any two graph-like structures, like dividual words. The output of this step is a language classifications, database or XML schemas and ontolo- independent Description Logic (DL) formula that en- gies, matching is an operator that identifies those nodes codes the meaning of the label. The main objective of in the two structures which semantically correspond this step is to unequivocally encode the intended mean- to one another. For example, applied to file systems ing of the label, thus eliminating possible ambiguities it can identify that a folder labeled “car” is seman- introduced by the natural language such as, but not tically equivalent to another folder “automobile” be- only, homonyms and . If we consider the ex- cause they are synonyms in English. This information ample in Figure 1, we need to disambiguate the word can be taken from a background knowledge, e.g., a “History”. This is a trivial task for a human, which linguistic resource such as WordNet [13]. would know that “History” is being used in the sense Figure 1 shows two University course catalogs. This of humanistic discipline as opposed to the past times is a typical example of data integration where we need sense, but this task has proven to be difficult to auto- to match these course catalogs in case of a transfer of mate. As SENSEVAL competitions show, this problem a student from one University to another, where the re- is noted for particularly difficult to beat simple base- line approaches. For example, a 4% improvement over ceiving university has to decide which courses to rec- a baseline is considered good [14]. ognize from the former University. There are different In the second step, we compute the meaning of the ways the set of mapping elements can be returned, ac- node considering its position in the tree, namely, the cording to the specifics of the problem where the se- path of the particular node to the root. For example, mantic matching approach is being applied. The fol- in the node “History” of Figure 1 by having computed lowing sub-sections introduce different versions of the the CL in the previous step, we know exactly which semantic matching algorithm while Section 4.8 pro- “History” we are referring to. In this step we restrict vides more details on how these versions are imple- the extension of the concept to that of only “Courses” mented in S-Match. about “History”, and not everything about “History”. During the third step, we compute the relations be- tween the concepts of label of two input trees. This 4ws.nju.edu.cn/falcon-ao 5dbs.uni-leipzig.de/Research/coma.html is done with the aid of two kinds of element level 6www-db.stanford.edu/~melnik/mm/sfa/ matchers: semantic and syntactic. Semantic element 7islab.dico.unimi.it/hmatch/ level matchers rely on linguistic oracles to find seman- 4 F. Giunchiglia et al. / S-Match: an open source framework

Fig. 2. Result of the basic semantic matching algorithm. Fig. 3. Result of the minimal semantic matching algorithm. tic correspondences between the concepts of label. For the Minimal Semantic Matching algorithm. This algo- example, it would discover that the concepts for “car” rithm exploits the above mentioned properties of the and “automobile” are synonyms. Instead, syntactic el- inputs to reduce matching time, the algorithm depen- ement level matchers, such as N-Gram and Edit Dis- dency on the background knowledge and to output the tance are used if no semantic relation could be found minimal set of possible mapping elements. The min- by the semantic element level matcher. The output of imality property is especially desirable if the output this step is a matrix of relations between all atomic mapping is to be further processed by humans. concepts encountered in the nodes of both lightweight The minimal set of possible mapping elements be- ontologies given as inputs. This constitutes the theory tween two Lightweight Ontologies is such that: and axioms which will be used in the following step i) all the (redundant) mapping elements can be com- (see [1] for complete details). puted from the minimal set, In the fourth step, we compute the semantic re- ii) none of the mapping elements can be dropped lations (=, w, v, ⊥) between the concepts at node without loosing property i). (CN s). By relying on the axioms built on the previ- ous step, the problem is reformulated as a proposi- This minimal set of possible mapping elements always tional satisfiability (SAT) problem between each pair exists and it is unique [4]. Another important property of nodes of the two input lightweight ontologies. This of the minimal mappings algorithm is that it is possi- problem is then solved by a sound and complete SAT ble to compute the maximum set of mapping elements engine. based on the minimal mappings set. This computation Steps 1 and 2 need to be done only once (prefer- can be done by inferring all redundant mapping ele- ably off-line) for each input tree. Then, each time the ments from the minimal set [4], therefore, significantly tree needs to be matched, these steps can be skipped reducing the number of node matching operations to be and the resulting lightweight ontologies can be used performed (see step four of the basic algorithm in Sec- directly. Steps 3 and 4 can only be done at runtime, tion 3.1), which otherwise consume considerable time since they require both input lightweight ontologies in for solving the propositional satisfiable (SAT) prob- order to compute the mapping elements. lems. The output of the Semantic Matching is a set of A graphical representation of the minimal mappings i j mapping elements in the form hN1,N2 ,Ri, namely a set can be seen in Figure 3. Note how the set of set of semantic correspondences between the nodes in mappings is drastically reduced in comparison to the the two lightweight ontologies given as input. Where set of mappings returned by the “default” semantic i N1 is the i-th node of the first lightweight ontology, matcher and shown in Figure 2. This reduced set is j N2 is the j-th node of the second lightweight ontology, more human-readable and in general corresponds to and R is a semantic relation in (=, w, v, ⊥). Figure 2 what a person expects to see as the result of the seman- shows a graphical representation of a set of mapping tic matcher. elements. If we analyze the mappings in Figure 2 for the “Courses” root node in the left tree, we can see that 3.2. Minimal Semantic Matching “Courses” is equal to “Course” (the root node of the right tree), but is more general than all the children Considering the hierarchical nature of inputs (the nodes of the “Course” root node. By comparing these lightweight ontologies) and taking into account the mappings to the mapping in Figure 3 (only the = re- subsumption relation existing between the set-theoretic lation between the roots) we can see how the minimal interpretations of input labels [4], we outline here version collapses all the other mapping elements, be- F. Giunchiglia et al. / S-Match: an open source framework 5

ii) The leaf node “brand” in the left tree is mapped to the leaf node “brand” in the right tree, and simi- larly with the leaf node “color” in the left tree and the node “colour” in the right tree - the leaf node is mapped to the node on the same level, that is, to a leaf node.

Fig. 4. Result of Structure Preserving Semantic Matching (SPSM). cause they can be inferred from the equivalence rela- 4. The Framework tion. In general, if two nodes are semantically equal, all the children nodes will be more specific compared 4.1. Algorithms to the parent (see [4] for the theoretical basis). Currently S-Match contains implementations of the 3.3. Structure Preserving Semantic Matching (SPSM) basic semantic matching algorithm [1], as well as min- imal semantic matching algorithm [4] and structure In many cases it is desirable to match structurally preserving semantic matching algorithm [7]. The ba- identical elements of both the source and the target sic algorithm is a general purpose matching algorithm, parts of the input. This is specially the case when com- very customizable and suitable for many applications. paring signatures of functions such as web service de- The minimal semantic matching algorithm produces scriptions or APIs or database schemas. Structure Pre- minimal and maximal mapping elements sets. The serving Semantic Matching (SPSM) [7] is a variant minimal set is well suited for manual evaluations, as of the basic semantic matching algorithm. SPSM can it contains “compressed” information and saves ex- be useful for facilitating the process of automatic web perts’ time. The maximum mapping elements set con- services composition, returning a set of possible map- tains all possible links and is well suited for consump- pings between the functions and their parameters. tion by applications which are not aware of seman- SPSM computes the set of mapping elements pre- tics of lightweight ontologies. The structure preserving serving the following structural properties of the in- semantic matching (SPSM) algorithm is an algorithm puts: well suited for matching API and database schemas. i) only one mapping element per node is returned. It matches the inputs distinguishing between structural This is required in order to match only one param- elements such as functions and variables. eter (or function) in the first input tree, to only one parameter (or function) in the second input tree. 4.2. Architecture ii) leaf nodes are matched to leaf nodes and internal nodes are matched to internal nodes. The ratio- S-Match is developed to have a modular architec- nale behind this property is that a leaf node rep- ture that allows easy extension and plug-in of ad-hoc resents a parameter of a function, and an internal components for specific tasks. Figure 5 shows the main node corresponds to a function. This way a param- components of the S-Match architecture, and a refer- eter which contains a value will not be confused ence to the four steps of the Semantic Matching algo- with a function. rithm outlined in Section 3. Figure 4 shows the output of SPSM when match- ing two simple database schemas, consisting of one ta- 4.3. Loaders ble each: table “auto” with columns “brand”, “name” and “color” on the left and table “car” with columns The Loaders package contains loaders from vari- “year”, “brand” and “colour” on the right. Observing ous formats. This component loads the files contain- the results of SPSM in this example we can see that the ing the tree-like structures from tab-indented formats set of structural properties is preserved: and XML formats. The provided IContextLoader i) The root node “auto” in the left tree has only one interface allows extra loaders to be added for formats mapping to the node “car” in the right tree on the such as RDF and OWL, which alternatively can be ac- same level, that is, root. cessed using S-Match-AlignAPI integration. 6 F. Giunchiglia et al. / S-Match: an open source framework

4.6. Oracles

The Oracles package provides access to linguis- tic knowledge and background knowledge. This com- ponent contains linguistic oracles, which provide ac- cess to linguistic knowledge, such as base forms and senses and sense matchers, which find relations be- tween word senses handed out by linguistic oracle. On one hand, linguistic knowledge (e.g. the knowledge one needs to translate natural language into proposi- tional description logics) and background knowledge (e.g. the knowledge one needs to match “car” to “au- tomobile” or to know that “apple” is less general than “fruit”) are separate and can be provided by different components. On the other hand, in practice, they are often provided by a single component handling out of a dictionary, such as WordNet [13], base forms as well as word senses and relations between those senses.

4.7. Deciders

The Deciders package provides access to sound and complete satisfiability reasoners, as well as other logic- related services, such as conversion of an arbitrary log- Fig. 5. S-Match architecture ical formula into its conjunctive normal form. The ser- vices of the package are used by classifiers during the 4.4. Preprocessors construction of the concepts at node and by the struc- ture level matchers during the matching process. The Preprocessors package contains components, 4.8. Matchers providing translation of natural language , such as classification labels, ontology class names and The Matchers package is organized as two sub pack- library subject headings into lightweight ontologies. ages, one for the Element level matchers, and one for These components use linguistic information provided the Structure level matchers. Oracles by the components from the package to ex- The Element level matchers are used to compute the tract atomic concepts from labels of node. They also relations between the labels of node (the third step of use the knowledge of the language, such as peculiari- the Semantic Matching algorithm). The package is di- ties of the syntax of subject headings or features of the vided in two sub-packages, one for semantic matchers, structure of classification labels to construct complex which use the linguistic Oracles to compute semantic concepts out of atomic ones. These components output relations between concepts, and the syntactic match- the concepts of label (CLs). ers which are used if no semantic relation could be ex- tracted by the semantic matcher. Currently S-Match in- 4.5. Classifiers cludes Wordnet-based syntactic gloss matchers, and a set of eight string-based matchers (see Table 1). The Classifiers package constructs the concepts at The Structure level matchers in Figure 5 are used node (CN s) using the information stored in the tree to compute semantic relations between the concepts of plus the concepts of the labels of each node. The out- node. For this purpose, after transforming the problem put of this step is a set of DL formulas representing into a propositional satisfiability (SAT) problem, they the concepts of each node. These formulas represent rely on the Deciders package that provides sound and the Lightweight Ontology (see Section 2) for the input complete SAT engines. The output of this step is a set trees. of mapping elements that contains semantic relation F. Giunchiglia et al. / S-Match: an open source framework 7

Table 1 ture is preserved. This package also contains imple- S-Match element level matchers mentation of the utility filters such RandomSample- Type Name MappingFilter which obtains a random sample from the mapping, for example, for evaluation, or Re- Sense-based WordnetMatcher, WNHierarchy tainRelationsMappingFilter, which retains String-based Prefix, Suffix, EditDistance, NGram only the relations of interest from a complete mapping. Gloss-based WNGloss, WNExtendedGLoss 4.10. Renderers between the nodes. The core parts of matching algo- rithms are implemented by structure level matchers. Finally, the output step is concerned with the for- The DefaultTreeMatcher implements the ba- matting of the mapping elements to be suitable for sic semantic matching algorithm by matching two each particular case. The Renderers package takes care trees in a very simple manner. It takes all nodes of the of saving the results (and the computed Lightweight source tree and sequentially matches them to all nodes Ontology) in the appropriate format for future use. of the target tree. Currently, the Lightweight Ontology can be saved in The OptimizedStageTreeMatcher imple- an XML format, which can be loaded by the Loaders ments the minimal semantic matching algorithm by package, to avoid repeating the first two steps more splitting the matching task into four steps, according to than once. The set of mapping elements can currently the relations and the partial order between them. First, be saved in plain text file (see Section 5.4 for more de- it walks simultaneously two trees and searches for dis- tails). joint relation between the nodes. If found, the subtrees of the two nodes in question could be skipped. Then it repeats the operation searching for less and more 5. The interfaces generality, skipping appropriate subtrees in case a re- lation is found. Finally, it searches for the presence of S-Match provides 3 different interfaces for execut- both less and more general relations between the same ing the matching algorithms, managing the inputs, the nodes and establishes an equivalence relation between outputs and configuring the framework. These inter- them instead. faces are: Alternatively, the minimal mapping can be obtained from a normal mapping by using a minimal filter, as 1. Java API: useful for extending the functionality explained in the Section 4.9. of the framework by, for example, implement- The SPSMTreeMatcher uses the results of the ing specific Loaders and Renderers, replacing the DefaultTreeMatcher to building a graph to com- linguistic Oracles and implementing specific ele- pute the minimal number of edit distance required to ment and structure level Matchers. transform one lightweight ontology into the other. This 2. Command Line Interface: useful for managing number of operations is then used to compute a seman- S-Match as a configurable black box algorithm, tic similarity score as defined in [7]. Once the score is which parameters can be fine tuned via command computed, the resulting mapping elements that comply line parameters and configuration files. with the structural properties presented in Section 3.3 3. Graphical User Interface: useful for testing pur- is computed by applying the SPSM filter, as explained poses where human experts need to quickly as- in the Section 4.9. sess the resulting mapping elements.

4.9. Filters 5.1. Java API

The Filters package provides components which fil- The Java API provided by S-Match can be partic- ter the mapping. For example, the RedundantMap- ularly useful for extending the basic matchers to in- pingFilterEQ filters what can be considered re- clude, for example, geospatial matchers (to check that dundant mappings between nodes [4] to return a min- a street is part of a city) or temporal matchers (to check imal set of mapping results (for example, to be shown that December 31st is equal to New Year’s eve). Many in a User Interface). The SPSMMappingFilter fil- commonly used S-Match services are exposed via ters a normal mapping into a mapping where the struc- the IMatchManager interface. The MatchManager 8 F. Giunchiglia et al. / S-Match: an open source framework

– filter: filters the mapping elements by load- ... 1: IMatchManager mm = MatchManager.getInstance(); ing a specific mapping element set, filtering it and 2: Properties config = new Properties(); 3: config.load(new FileInputStream("s-match-Tab2XML.properties")); then saving it into an output file. This can be used 4: mm.setProperties(config); to further process the results from a particular ver- 5: IContext s = mm.loadContext("../test-data/cw/c.txt"); 6: IContext t = mm.loadContext("../test-data/cw/w.txt"); sion of semantic matching algorithm (Section 3). 7: mm.offline(s); 8: mm.offline(t); For example, to expand the minimal set of map- 9: IContextMapping result = mm.online(s, t); 10: mm.renderMapping(result, "../test-data/cw/result.txt"); ping elements into a maximum set by computing ... the redundant mapping elements. – convert: transforms between I/O formats by Fig. 6. S-Match API example. using different implementations of Loaders and Renderers. This is useful, for example, to trans- class is an implementation of this interface and can be form tab-indented trees into AlignAPI format and used as shown in Figure 6. vice versa. wntoflat Once the MatchManager is instantiated (line 1), – : converts the WordNet dictionary the following step is to load the configuration file (lines into internal binary format. This format can then be loaded into memory to speed up element level 2-4). Once the framework is configured, the source matching process. and target trees need to be loaded (lines 5-6) and con- verted into lightweight ontologies by preprocessing Each of these commands can be customized by them (lines 7-8) before matching (line 9). Finally, the specifying a properties file which configures the com- mappings can be rendered into an output file (line 10). ponents and the run-time environment. When this The example code shown in Figure 6 is included in the property file is not specified, a default configuration open source distribution as SMatchAPIDemo 8. file9 available in the open source distribution is used. Furthermore, a tuple of can be 5.2. Command line interface specified multiple times in the command line over- riding the loaded value from the properties file. This S-Match command line interface provides facili- is particularly useful at times when one or two out ties for configuring the framework by specifying sev- of many options in the configuration file need to be eral components such as the configuration files, com- changed, for example, when conducting a series of ex- mands to run and their arguments and options. Cur- periments where several threshold values need to be rently the following commands are available in the S- tested. Match CLI: 5.3. Graphical User Interface – offline: transforms the input tree-like struc- tures into lightweight ontologies by performing S-Match can also be used with a simple Graphical the offline preprocessing (first and second steps User Interface (GUI). Figure 7 shows the GUI with of Section 3.1) and rendering the results for fu- two example course catalogs and the resulting Minimal ture reuse. This is useful for transforming the Mapping loaded. The set of mapping elements com- trees only once, and then avoiding this overhead puted by the framework can be easily analyzed by a each subsequent time the same tree needs to be human using the GUI in Figure 7 in contrast to the set matched. of mapping elements rendered as output using any for- – online: computes the mapping elements (third mat as shown in Figure 9. and fourth steps of Section 3.1) between lightweight ontologies and writes the results into an output 5.4. I/O formats file. This performs the online and output steps of Figure 5. The command is decoupled from the S-Match provides generic interfaces for dealing previous offline step in order to avoid preprocess- with the input and output of tree-like structures, ing overheads and allowing better performance Lightweight Ontologies and mapping elements. While when running experiments. currently implementing a set of Loaders and Render-

8demos/smatchapi/SMatchAPIDemo.java 9../conf/s-match.properties F. Giunchiglia et al. / S-Match: an open source framework 9

\Courses = \Course \Courses\College of Arts and Sciences = \Course\College of Arts and Sciences ... \Courses\College of Arts and Sciences\History = \Course\College of Arts and Sciences\History ... \Courses\College of Arts and Sciences\History\Latin America History < \Course\College of Arts and Sciences\History\History of Americas ...

Fig. 9. Example output in the plain text format.

Figure 9 shows an extract (4 out of 43) from the set of mapping elements shown in Figure 2. S-Match integration into the Alignment API (Alig- Fig. 7. S-Match GUI with course catalogs and a mapping loaded nAPI)10 allows S-Match users to access more input and output formats and allows AlignAPI users to use S-Match matching capabilities. AlignAPI is being Courses widely used for benchmarking purposes for the last 6 College of Arsta and Sciences years in the Evaluation Initiative Earth and Athmosferic Science (OAEI)11. History Latin America History America History Ancient European History 6. The open source distribution Computer Science S-Match12 is the open source implementation of the Semantic Matching framework outlined in Section 3 Fig. 8. Example input in tab-indented format. and defined and evaluated in [1] at the date of writ- ing this paper, and it has been released under GNU ers, the framework can be easily extended by imple- Library or Lesser General Public License (LGPL). S- menting the IContextLoader and IContext- Match provides implementations of the basic semantic Renderer interfaces. matching algorithm [1], the minimal semantic match- The Loaders package includes the implementation ing algorithm [4], and the Structure Preserving Se- of loaders for the tree-like structures and mapping el- mantic Matching (SPSM) [7]. It provides necessary ements. Currently, S-Match supports loading tree-like implementation for transforming tree-like structures, structures in tab indented format. Figure 8 shows an such as Classifications, Web and file system directo- example tree from Figure 1 in tab indented format. ries and Web services descriptions, among others, into Given that steps 1 and 2 need to be performed Lightweight Ontologies (see Section 2). It also pro- only once per tree, S-Match also provides the facili- vides a Graphical User Interface for selecting the in- ties (in the Renderers package) for saving the result- puts, running the matching and visualizing the results, ing lightweight ontology from the second step in XML as well as the command line API (Section 5). It pro- format. Consequently, one can load the Lightweight vides a Java library13 that enables other projects to ex- Ontology directly from the XML file (using the Load- ploit semantic matching capabilities. ers package) and save time and resources that are crit- The latest version of S-Match can be found at the 14 ical for testing purposes and when working with big download page of the site and the complete docu- Lightweight Ontologies. mentation including the “Getting started” guide, the Current Renderers and Loaders for the mappings el- manual, the publications and the presentations can be 15 ements consists of the implementation of a plain text found at the documentation page . file format, containing one line per mapping element in i j i 10http://alignapi.gforge.inria.fr/ the form hN1→R→N2 , i. Where N1 is the path from the root to the particular node in one tree, → is a tab 11http://oaei.ontologymatching.org/ 12http://s-match.org/ character, R is the semantic relation in (= equivalence, 13 j s-match.jar > more general, < less general, ! disjointness) and N2 14http://s-match.org/download.html is the path from the root to the node in the second tree. 15http://s-match.org/documentation.html 10 F. Giunchiglia et al. / S-Match: an open source framework

The datasets used for testing the performance of the [2] Fausto Giunchiglia, Maurizio Marchese, and Ilya Zaihrayeu. framework have also been released. The TaxMe2 [15] Encoding classifications into lightweight ontologies. In Jour- dataset with annotations and reference mapping ele- nal on Data VIII, pages 57–81. 2007. 16 [3] Ilya Zaihrayeu, Lei Sun, Fausto Giunchiglia, Wei Pan, Qi Ju, ments can be found at the datasets page . This dataset Mingmin Chi, and Xuanjing Huang. From web directories to has been used since 2005 in the yearly Directory Track ontologies: Natural language processing challenges. In The of the Ontology Alignment Evaluation Initiative (see , pages 623–636. 2007. [16,17] for latest editions) associated with the ISWC [4] Fausto Giunchiglia, Vincenzo Maltese, and Aliaksandr Au- Ontology Matching Workshop, where it has shown tayeu. Computing minimal mappings. In Proceedings of the over the past years to be robust with a good discrim- 4th Workshop on Ontology matching at ISWC. 2009. [5] Fausto Giunchiglia and Ilya Zaihrayeu. Encyclopedia of ination ability, i.e., different sets of correspondences Database Systems, chapter Lightweight Ontologies. Springer, are still hard to find for different systems. For exam- June 2009. ple, S-Match achieves precision, recall and f-measure [6] Fausto Giunchiglia and Pavel Shvaiko. Semantic matching. of 46%, 30% and 36% respectively, and compares well Knowl. Eng. Rev., 18(3):265–280, 2003. with 8 other tested systems [15]. [7] Fausto Giunchiglia, Fiona McNeill, Mikalai Yatskevich, Juan Pane, Paolo Besana, and Pavel Shvaiko. Approximate structure-preserving semantic matching. In Proceedings of the 7. Conclusion OTM 2008 Confederated International Conferences. ODBASE 2008, pages 1217–1234, Moterrey, Mexico, 2008. Springer- Verlag. This paper presents S-Match, an open source se- [8] Jérôme Euzenat and Pavel Shvaiko. Ontology Matching. mantic matching framework. S-Match includes the im- Springer-Verlag New York, Inc., 2007. plementation of three versions of semantic match- [9] Pavel Shvaiko and Jérôme Euzenat. Ten challenges for ontol- ing algorithms, designed for different application do- ogy matching. In Robert Meersman and Zahir Tari, editors, mains. The framework allows new algorithms and OTM Conferences (2), volume 5332 of Lecture Notes in Com- puter Science, pages 1164–1182. Springer, 2008. background knowledge to be included when specific [10] Namyoun Choi, Il-Yeol Song, and Hyoil Han. A survey on matchers or linguistic information is required. The ontology mapping. SIGMOD Record, 35(3):34–41, 2006. graphical user interface allows users to easily interpret [11] David Aumueller, Hong Hai Do, Sabine Massmann, and Er- the results, while the programmatic API provides great hard Rahm. Schema and ontology matching with COMA++. flexibility for exploiting the matching algorithms from In Fatma Özcan, editor, SIGMOD Conference, pages 906–908. other systems. ACM, 2005. [12] Wei Hu and Yuzhong Qu. Falcon-AO: A practical ontology We have outlined how lightweight ontologies, while matching system. J. Web Sem., 6(3):237–239, 2008. being simple and intuitive structures, can be used [13] George A. Miller. WordNet: a lexical database for english. to hold many knowledge organization systems. This Communications of the ACM, 38(11):39–41, 1995. shows that S-Match, while being designed to work [14] Rada Mihalcea and Andras Csomai. SenseLearner: Word with lightweight ontologies, is of much importance be- Sense Disambiguation for All Words in Unrestricted Text. In 43rd Annual Meeting of the Association for Computational cause it covers a great number of information struc- , Proceedings of the Conference, 25–30 June 2005, tures. University of Michigan, USA. The Association for Computa- We are currently working on making more datasets tional Linguistics, 2005. available and extending the programmatic API to in- [15] Fausto Giunchiglia, Mikalai Yatskevich, Paolo Avesani, and clude a web service layer that can be used to execute Pavel Shvaiko. A large dataset for the evaluation of ontology matching systems. The Knowledge Engineering Review Jour- matching online at the S-Match project site or as enter- nal, 24:137–157, 2008. prise semantic matching services. We are also working [16] Caterina Caracciolo, Jérôme Euzenat, Laura Hollink, Ryutaro on enriching the supported input formats by including Ichise, Antoine Isaac, Véronique Malaisé, Christian Meilicke, other standards such as the Ontology Alignment API, Juan Pane, Pavel Shvaiko, Heiner Stuckenschmidt, Ondrej among others. Sváb-Zamazal, and Vojtech Svátek. Results of the ontology alignment evaluation initiative 2008. In Pavel Shvaiko, Jérôme Euzenat, Fausto Giunchiglia, and Heiner Stuckenschmidt, ed- References itors, Proceedings of the 3rd ISWC international workshop on Ontology Matching, Karlsruhe (DE), 2008. [1] Fausto Giunchiglia, Mikalai Yatskevich, and Pavel Shvaiko. [17] Jérôme Euzenat, Alfio Ferrara, Laura Hollink, Antoine Isaac, Semantic matching: Algorithms and implementation. In Jour- Cliff Joslyn, Véronique Malaisé, Christian Meilicke, Andriy nal on Data Semantics IX, pages 1–38. 2007. Nikolov, Juan Pane, Marta Sabou, François Scharffe, Pavel Shvaiko, Vassilis Spiliopoulos, Heiner Stuckenschmidt, On- drej Sváb-Zamazal, Vojtech Svátek, Cássia Trojahn dos San- 16http://s-match.org/datasets.html tos, George Vouros, and Shenghui Wang. Results of the on- F. Giunchiglia et al. / S-Match: an open source framework 11 tology alignment evaluation initiative 2009. In Pavel Shvaiko, national Workshop on Ontology Matching (OM-2009), collo- Jérôme Euzenat, Fausto Giunchiglia, Heiner Stuckenschmidt, cated with ISWC-2009, pages 73–126, Fairfax (VA US), 2009. Natasha Noy, and Arnon Rosenthal, editors, In Proc. 4th Inter-