The Aberdeen University Ontology Reuse Stack

Edward Thomas, Derek Sleeman, Jeff Z. Pan, Quentin Reul and Joey Lam
Department of Computing Science, University of Aberdeen
Aberdeen, AB24 3FX
Contact emails: {d.sleeman,jeff.z.pan}@abdn.ac.uk

Abstract

This paper describes a set of tools which allow the large number of ontologies available on the Semantic Web to be discovered and reused for other applications (both in the Semantic Web and by the larger community). Firstly, these tools address the problem of finding an existing ontology with given characteristics. Secondly, we discuss tools which allow the knowledge engineer to detect and repair errors which occur in the ontology's vocabulary (CleOn) or are related to logical coherency/inconsistency (RepairTab). Once a knowledge engineer is satisfied that the ontology is lexically and semantically consistent, he/she can extend the ontology if the application requires that. Once the extension has been done, it would again be prudent to use those same tools to check that the resulting ontology is consistent.

Copyright © 2008, Association for the Advancement of Artificial Intelligence (www.aaai.org). All rights reserved.

Introduction

This paper describes a set of tools which allow the large number of ontologies available on the Semantic Web to be discovered and reused for other applications (both in the Semantic Web and by the larger knowledge engineering community). By using these tools, instead of creating their own ontologies from scratch, people could reuse (parts of) existing relevant ones.

Firstly, these tools address the problem of finding an existing ontology with given characteristics. Secondly, we discuss tools which allow the knowledge engineer to detect and repair errors which occur in the ontology's vocabulary (CleOn) or logical coherency/inconsistency (RepairTab). Once a knowledge engineer is satisfied that the ontology is lexically and semantically consistent, he/she can extend the ontology if the application requires that. Once the extension has been done, it would again be prudent to use those same tools to check that the resulting ontology is consistent.

ONTOSEARCH2 (Pan, Thomas, & Sleeman 2006) is a tool designed to allow end users of ontologies (knowledge engineers, domain experts, or software agents) to discover appropriate ontologies on the Semantic Web. It maintains a large repository of ontologies which have been found from various sources, and allows structured queries to be performed on individual ontologies, or on the entire repository, using SPARQL or a simplified keyword based interface.

ONTOSEARCH2 creates a semantic approximation (Pan & Thomas 2007) of an OWL DL ontology in the tractable language DL-Lite. DL-Lite has a characteristic performance for conjunctive queries, and because semantic approximation guarantees that no incorrect axioms are present in the approximation with respect to the original ontology, ONTOSEARCH2 is sound for all queries; further, it is both sound and complete for all queries which do not include non-distinguished variables. This approach allows us to perform structural queries over very large repositories while still maintaining good scalability and performance.

The CleOn¹ system (Sleeman & Reul 2006) semi-automatically creates a lexically coherent ontology when provided with an OWL ontology and the corresponding source of taxonomic information. Currently, we assess taxonomic relationships by processing the linguistic information inherent in concept labels. This is achieved by assessing the lexical adequacy of a link between a parent and a child concept based on their lexical paths. To date these lexical paths have been extracted from sources such as WordNet 2.0 and a mechanical engineering thesaurus.

¹ The approach was formerly known as CleanOnto, but was renamed to avoid unnecessary confusion with OntoClean.

We have developed a graph-based approach, implemented in ReTAX++ (Lam, Sleeman, & Vasconcelos 2005), to help knowledge engineers browse ontologies and resolve logical inconsistencies. In fact, this has been implemented as the RepairTab plug-in for Protégé. With the help of a reasoner, the system detects and highlights the inconsistent concepts. We have implemented graph based algorithms to detect which relationships among concepts cause the inconsistencies, and provide options for the user to correct them. If an incomplete or inconsistent ontology is imported, a number of ontological fragments may still be formed. In this case, we aim to suggest to the user the best concept candidate to integrate the several fragments.

How these three tools interoperate to find and refine ontologies is illustrated in Figure 1. More details of these three systems are given in the following three sections, and the paper finishes with an overall conclusion.

[Figure 1 diagram. Recoverable labels: Ontologies on the (label truncated); Requirements; ONTOSEARCH2 (find candidate ontologies); Candidate Ontology; Changes; CleOn (repair taxonomic inconsistencies); RepairTab (repair logical inconsistencies); Refined Ontology; Other KE applications.]

Figure 1: The ontology toolkit

ONTOSEARCH2²

Searching for relevant ontologies is one of the key tasks to enable ontology reuse. Now that the W3C ontology language OWL has become the de facto standard for ontologies and semantic data, there are progressively more ontology libraries online, such as the DAML Ontology Library and the Protégé Ontologies Library. While the Web makes an increasing number of ontologies widely available for applications, how to discover ontologies becomes a more challenging issue.

² http://www.ontosearch.org/

Currently it is difficult to find ontologies suitable for a particular purpose. Semantic Web search engines like Swoogle (Ding et al. 2005) and OntoKhoj (Patel et al. 2003) allow ontologies to be searched using keywords, but further refinement of the search criteria based on semantic entailments is not possible. For example, if one wants to search for ontologies in which Professor is a sub-class of Client, using a keyword based approach is not satisfactory, as all ontologies that contain the classes Professor and Client would be returned, whether or not the subsumption relationship holds between them. In this paper, we present an ontology search engine, called ONTOSEARCH2, which provides three approaches to searching ontologies semantically, namely: a keyword-based search tool, a search tool based on query answering and a search tool based on fuzzy query answering. Detailed examples of the three searches are presented later in this section.

ONTOSEARCH2 (Pan, Thomas, & Sleeman 2006) has two principal components, namely an ontology repository and a query engine. It stores approximations of OWL ontologies in DL-Lite, and allows queries to be executed over all or part of this repository using SPARQL (Prud'hommeaux & Seaborne 2006). By using a DL-Lite approximation, ONTOSEARCH2 is significantly faster than other comparable tools which perform full OWL DL entailment (up to two orders of magnitude faster on larger datasets (Pan & Thomas 2007)).

In Figure 2 we can see the structure of the system. Source ontologies are first loaded into a DL reasoner (Pellet), and are checked for consistency before being classified. The resulting class hierarchy is stored in the repository, where it is represented as a DL-Lite ABox against a meta-TBox which defines the meta structure of all OWL DL ontologies. The ontology is then approximated using the semantic approximation technique presented in (Pan & Thomas 2007). Ontology metadata is analysed and converted to a set of fuzzy property instances, relating ontological structures to representations of the keywords found in the metadata, with a weight based on the semantic strength of the relationship. Conjunctive queries can be performed over the repository, each query being expanded by the PerfectRef algorithm in DL-Lite to a set of SQL queries. These SQL queries are then executed on the relational database engine which manages the repository.

[Figure 2 diagram. Recoverable labels: Source Ontology; Pellet OWL Reasoner; Check Consistency; Classify; Class Hierarchy; TBox Store; ABox Store; Metadata; Metadata Analyser; Fuzzy Property instances; Approximation Algorithm; Entailment Set; Entailments; ONTOSEARCH2 Repository; (Fuzzy) Conjunctive Query; (Fuzzy) PerfectRef Implementation; SQL Queries; RDB Query Engine; Queries; Query Results.]

Figure 2: ONTOSEARCH2 structure diagram.

We have reformulated the idea of knowledge compilation (Selman & Kautz 1996) as semantic approximation of OWL DL ontologies. The idea of knowledge compilation is simple: users write statements in an expressive representation language and these statements are compiled into a restricted language that allows efficient inference. In this way, users do not have to use a less expressive language which might be too limited for practical applications. This approach guarantees soundness and completeness for all queries that do not include non-distinguished variables. By using this technique to reduce the complexity of query answering, we can perform conjunctive queries against large knowledge bases.

By treating the combined TBoxes of the ontologies in the repository as a single ABox for a meta-ontology representing the underlying structure of DL-Lite, we can allow users to perform searches for structural patterns across multiple ontologies. In tests we have found the performance of the ONTOSEARCH2 query engine to exceed that of other knowledge base systems (Pan, Thomas, & Sleeman 2006).

After the semantic approximation has been stored, the ontology is examined for metadata. This is stored as a mapping between the objects in the ontology (OWL ontologies, classes, properties, and instances) and keywords that appear in the metadata attached to that object (currently, the label and comment properties, and the URL of the object). How the keyword is related to the object is used to give a weighting for each mapping: for example, a keyword occurring in a label is given the highest weighting, and a keyword in a comment is given the lowest. In addition, keywords can be inherited from parents of an object (super classes or super properties); instances inherit the keywords of their classes, and all objects in an ontology will inherit the metadata of the ontology itself. In all cases, the greater the semantic distance between two objects which have an inheritance relationship, the lower the weighting is for the keywords inherited. Punctuation is removed from keywords before they are added to the metadata repository. If a keyword is less than three characters long or is a common word such as "and", "the", or "some", it is discarded.

From this fuzzy ontology, we find objects within the repository which match the requirements of the original query. The weightings are used to specify minimum weights required for each term in the query, or they are used to rank the results by relevance to the initial terms.

Queries can be made through a simple keyword based search form, or can be submitted as SPARQL queries, optionally containing fuzzy extensions that can specify the degree of confidence required for each term in the query. Keyword based queries are expanded into fuzzy SPARQL queries, so all searches use the same internal process. The most basic search is for a set of keywords, where the results will list ontologies containing all the keywords. The query can be made more specific by adding search directives to the query:

1. TBox Searching. Restrictions on the search query can specify that a particular keyword should only be matched against a Class or a Property, but not against instance values in the repository. This is done by prefixing the keyword with class:, so the query class:red wine would match the keyword "red" in class definitions only, and match the keyword "wine" across the entire ontology. Similarly, a keyword can be restricted to only occur within property definitions within the ontology by prefixing the keyword with property:. For queries where all keywords should only match class and/or property definitions, the directives pragma:Class and/or pragma:Property can be added to the query.

2. ABox Searching. To restrict a search term to only match within ABox (or instance) data, it can be prefixed with instance:. Similarly, the directive pragma:Instance will direct the search engine to search instance data only for every keyword. If used with both pragma:Class and pragma:Property, the search will exhibit its default behaviour, searching the entire contents of the repository (Class and Property definitions as well as instance data).

3. Other Search Directives. By adding the search directive pragma:Resource, the search engine will find the object within each ontology which best matches the search terms. It will also cause the results to be listed as individual resources rather than as ontologies. Therefore, if an ontology contains a single class or instance which has a very high match for the keywords, but which as an ontology has a low score, this class will be displayed above other matches. The default behaviour is for the whole ontology with the highest sum of its objects' scores to be returned first.

Search results can be ranked as entire ontologies, or as individual objects within each ontology (by using the pragma:Resource directive in a query). In the first instance, we sum the total weightings for each object/keyword pairing that matched in an ontology. This total is used to sort the results, with the highest total being returned at the top of the rankings. By ranking results based on the semantic significance of the matches themselves, rather than by tracking the number of links to the ontology (Patel et al. 2003; Ding et al. 2005), we are able to find ontologies which are the closest match for a particular query regardless of their popularity.

In the case that results are to be returned as individual objects, the total sum of object/keyword weightings for each object in the repository is used to determine rank. This is used to return a list of all the different objects which matched the search terms.

Examples

Query-based Search Example. This search uses the query answering facilities of ONTOSEARCH2 to find a class which has a particular label, and which is a subclass of a class with a different label.

The search is specified in SPARQL. The query is for a class with a label of "Chablis" which is a subclass of a class with a label "Wine". The query used is shown in Listing 1.

    SELECT ?X WHERE {
      ?X rdfs:label "Chablis" .
      ?X rdfs:subClassOf ?Y .
      ?Y rdfs:label "Wine" .
    }

Listing 1: SPARQL Query

This is a standard SPARQL query which searches for some class with an rdfs:label of exactly "Wine" which has some subclass with an rdfs:label of exactly "Chablis". This query may return classes which are not direct subclasses because of the DL-Lite semantics which underlie the query engine in ONTOSEARCH2; a search engine that used RDF semantics would not match any indirect subclasses.

Fuzzy Query-based Search Example. This search uses the fuzzy query engine in ONTOSEARCH2 to find a class which is a subclass of a different class, where both classes have particular metadata associated with them, with a certain level of confidence.

The search is specified using SPARQL with additional fuzzy values included as comments. This query is for a class with a metadata keyword of "Chablis" with a degree of confidence ≥ 0.7 which is a subclass of a class with a keyword of "Wine" with a confidence of ≥ 0.5. The query is shown in Listing 2.

    SELECT ?X WHERE {
      ?X os2:hasKeyword "Chablis" . #FT# 0.7
      ?X rdfs:subClassOf ?Y .
      ?Y os2:hasKeyword "Wine" . #FT# 0.5
    }

Listing 2: fuzzy SPARQL Query

The property os2:hasKeyword is a fuzzy property in the ONTOSEARCH2 fuzzy metadata ontology which associates objects with keywords. For details on such fuzzy queries and related query answering algorithms, see (Pan et al. 2008).

ONTOSEARCH2: Future Work

We have presented three methods for semantically searching ontologies. By offering different methods of querying for data, we can allow users to choose the best tool available for their level of expertise and their requirements. Future work is centered around improving the metrics used to give different weightings. Additionally, we plan to investigate the use of machine learning techniques to evaluate the success of the weightings currently being used by comparing the ontologies users eventually select for their application, the position in the results which that ontology held, and the weightings used to generate the results. To help users find the best ontology or resource, we are working on an ontology browser integrated with the search engine which will allow ontologies to be explored graphically, with matching resources highlighted, on the results page.

Another aim is to expand the metadata captured to include other recognised sources of metadata such as Dublin Core (DCMI 2003) and to eventually index all datatype properties present in an ontology as potential metadata.

CleOn³

Given the central role of ontologies in the Semantic Web, it is highly desirable that all ontologies are assessed for logical consistency and against specific criteria of the particular application. Further, ontology evaluation should ensure that the information available in the ontology is complete and consistent with respect to the domain of interest. This evaluation should be performed during the ontology development process (i.e. ontology building) as well as during the ontology maintenance process (e.g. ontology evolution⁴).

³ http://www.csd.abdn.ac.uk/~qreul/CleOn.html
⁴ Ontology evolution is the systematic application of changes to an ontology while maintaining the consistency of the ontology and all its dependent artifacts.

Although DL reasoners (e.g. FaCT++ (Tsarkov & Horrocks 2006) and Pellet (Sirin et al. 2007)) can determine the logical consistency of an ontology, they are unable to evaluate whether the vocabulary used for class labels is coherent with the domain represented. Additionally, Volker et al. (Volker, Hitzler, & Cimiano 2007) suggest that the terms chosen as class names in ontologies (e.g. OWL axioms) convey some of the meaning (semantics) of these axioms to the human reader.

Let us consider a team of knowledge engineers developing an ontology about the geography domain. The class labeled with the term country has been used by different knowledge engineers as a subclass of both 'Location' and 'Social entity' (Figure 3). However, this representation describes two distinct aspects of the domain through the same class (i.e. 'Country'). On the one hand, the term country defines the social and political aspects of countries (e.g. government). On the other hand, the term is used to describe the territory occupied by a country. In information retrieval systems, this ontology would cause many inappropriate results to be returned and would require the user to extract the information relevant to his/her search.

Figure 3: Example ontology.

The OntoClean methodology (Guarino & Welty 2000) assigns meta-properties, namely Unity, Identity, Rigidity and Dependence, to describe relevant aspects of the intended interpretation of the properties, classes, and relations that make up the ontology. The evaluation of the taxonomic structure is dictated by the constraints imposed on the different meta-properties. The notion of identity is based on intuitions about how objects interact with the world around them. OntoClean proposes to distinguish between properties that supply their "own" identity criteria and those that inherit their identity criteria from subsuming properties. In general, a country represents different social aspects and hence carries its own identity, whereas a geographical region inherits the identity of its location (e.g. fauna and flora). Therefore, OntoClean suggests adding a new class (e.g. 'Geographical region') to represent the physical aspects related to countries. However, Volker et al. (Volker, Vrandecic, & Sure 2005) have shown that this methodology was very time consuming and that the agreement among knowledge engineers was low (only 38%). Moreover, they found that knowledge engineers based the assignment of meta-properties on the examples provided by Guarino and Welty rather than on their formal definition, thus suggesting that it would be difficult to apply OntoClean to highly specialised domain ontologies.

As a result, we have developed an alternative approach, called CleOn (Sleeman & Reul 2006; Reul, Sleeman, & Fowler 2008), which detects lexical incoherencies in ontological structures given a lexical resource (e.g. thesauri). A thesaurus generally classifies linguistic terms by themes or topics. Often, this classification is achieved through the hypernym relation, which specifies that if a term A is a hypernym of a term B then A is more general than B.

Our overall approach contains three phases. We first extract a lexical path for each class in the ontology. For this example, we have extracted these paths from WordNet (Fellbaum 1998) as it is a general purpose thesaurus and provides the type of information that we require. A class is considered lexically satisfiable only if an exact match for its class label is found in the thesaurus. For example, social entity is not a term in WordNet and therefore the concept is lexically unsatisfiable and its lexical path is the empty set. Once an exact match for the class label has been found in the thesaurus, the lexical path of the class is created by adding its matching term in the thesaurus, then its hypernym, then the hypernym of its hypernym, and so on until the root term of the thesaurus is encountered. For example, an exact match for 'Location' is found in the thesaurus. As a result, its lexical path, denoted LP('Location'), contains the sequence ≺location, physical object, physical entity, entity≻.

Secondly, we determine the adequacy of every taxonomic relationship in the ontology. There are several reasons why a link might be considered to be inadequate. Initially, every taxonomic relationship containing one or more lexically unsatisfiable classes is removed from the ontology. For example, 'Country' ⊑ 'Social entity' is removed from the ontology as 'Social entity' is lexically unsatisfiable. Then, CleOn inspects the adequacy of every remaining taxonomic relationship in the ontology with regard to the lexical paths of their constituents. A concept name 'A' is adequately subsumed by a concept name 'B' if one of the sequences in the lexical path of 'B' is contained in one of the sequences in the lexical path of 'A', where 'A' is the child node, and 'B' corresponds to the parent node. Examining the taxonomic relationship between 'Location' (parent node) and 'Country' (child node), we retrieve the lexical paths for both these concepts as described previously:

• LP(Location) [Parent node]: {≺location, physical object, physical entity, entity≻}
• LP(Country) [Child node]: {≺country, administrative district, district, region, location, physical object, physical entity, entity≻}

As all the elements of the parent path are included in the child path, we consider this link to be adequate. The removal of all inadequate links results in a skeleton tree, which we refer to as Tree-0. Furthermore, this process creates "detached" subtrees and/or orphan nodes⁵.

⁵ An orphan node is a subtree with a single node.

Finally, CleOn places these orphan nodes and "detached" subtrees back onto Tree-0 so as to create a lexically coherent ontology (Definition 1). When there are several nodes at which an orphan element or detached subtree can be placed, the user is prompted to choose among the potential taxonomic relationships. It is important to note that we do not wish to imply that a taxonomic relationship is wrong. Instead, we suggest that a taxonomic relationship is inadequate given a particular conceptualisation and a thesaurus.

Definition 1 (Lexically Coherent Ontology). An ontology O is lexically coherent if all taxonomic relationships in O are adequate.

In Sleeman and Reul (Sleeman & Reul 2006), we introduced the CleOn system, which, when provided with an OWL ontology (Bechhofer et al. 2004) and a thesaurus (e.g. WordNet) in the appropriate format, collaboratively creates a lexically coherent ontology. The application is written in Java and uses the Jena API⁶ to process and manipulate OWL ontologies. Furthermore, we described several possible improvements resulting in four different modes, namely conservative mode, include head mode, inclusive mode and interactive selection of word sense mode. The conservative mode is the system default mode; it applies our approach to OWL ontologies and only considers the first sense in WordNet when creating the different lexical paths. Whereas the conservative mode determines the lexical satisfiability of a concept name based on exact match, the include head mode also searches for the head of the noun phrase when no prior match is found. The inclusive mode includes the intermediary nodes found in the lexical path of the child node but not present in the lexical path of the parent node as part of the Creating a lexically coherent ontology step of the algorithm. Finally, the interactive selection of word sense mode allows the user to choose the intended meaning when an ambiguous term is found in the thesaurus.

⁶ http://jena.sourceforge.net/ontology/index.html

In Reul et al. (Reul, Sleeman, & Fowler 2008), we report the results of an experiment in which we used our methodology to evaluate a domain dependent ontology. The evaluation demonstrated that our method does not solely rely on WordNet, but relies on a thesaurus which is appropriate for the domain. In this case, we had to create this thesaurus as none was available for the engineering domain. Moreover, we showed that our method was consistently simpler to apply than the OntoClean methodology, which is a very encouraging result. The precision and recall metrics (Dellschaft & Staab 2006) demonstrated that the discordance was in the lexical layer rather than in the concept hierarchy. Therefore, the precision and recall metrics suggest that the terminology used in industry is somewhat inconsistent with the terminology used by mechanical engineering academics.

CleOn: Future Work

We realise that there is still room for improvements to both the approach and the system. Firstly, the Creating a lexically coherent ontology phase proposes several potential super-concepts when adding subtrees to Tree-0 based on the terms contained in the lexical path of a subtree. However, it could be argued that terms occurring in every lexical path convey less information than terms occurring only once. Resnik (Resnik 1995) proposes the information content of a concept, which is calculated as the negative log likelihood of a concept occurring in a repository. The implementation of this assumption would reduce the number of choices offered as part of the Creating a lexically coherent ontology step of the algorithm. Therefore, the system would only suggest the root term of the thesaurus if no other alternative is available.

Secondly, the current approach evaluates General Concept Inclusions (GCIs) between two concept names (e.g. 'Country' ⊑ 'Location'). However, many OWL DL ontologies are composed of GCIs between a concept name and a concept description (e.g. conjunction). In further work, we plan to use a DL reasoner (e.g. FaCT++ (Tsarkov & Horrocks 2006)) to extract the implicit knowledge from GCIs composed of concept descriptions. Moreover, we plan to also consider a wider range of OWL DL constructs, such as owl:unionOf and owl:equivalentClass.

Finally, the CleOn methodology detects lexical incoherencies in these ontological structures given a thesaurus (e.g. WordNet). A significant issue is how one might acquire appropriate thesauri; one approach would be to use ONTOSEARCH2 (Pan, Thomas, & Sleeman 2006) to search the web for them; the second and probably more practical approach is to encourage domain experts to develop them. In any event, we would hope that these thesauri would be expressed in a standard formalism such as SKOS (Miles & Perez-Aguera 2007). If the second route is followed, editors will be needed to develop these thesauri, and we believe that the use of information retrieval techniques would facilitate this process.

RepairTab⁷

Inconsistencies in OWL ontologies can easily occur, and it is a challenging task for ontology modelers, especially for non-expert ontology modelers, to resolve such inconsistencies. Therefore, we proposed an ontology debugging tool, called 'RepairTab', that can provide guidance on (1) selecting the axioms to modify, and (2) how to minimise the impact of proposed modifications on the ontology.

⁷ http://www.csd.abdn.ac.uk/~slam/RepairTab/

Ranking Axioms

Existing ontology debugging approaches (Schlobach & Cornet 2003; Wang et al. 2005; Meyer et al. 2006) can identify the problematic axioms for unsatisfiable concepts. After pinpointing these problematic axioms, the next step is to resolve the error by removing or modifying one of the pinpointed axioms. It is difficult for tools to make specific suggestions for how to resolve problems, because it requires an understanding of the meaning of the set of classes. Only a few existing approaches provide support for resolving inconsistencies in ontologies. Tools such as SWOOP (Kalyanpur et al. 2006) that provide this functionality are quite limited in their approach; the problematic axioms are ranked in order of importance. We illustrate their strategies with the following example.

Example 1. An ontology contains the following axioms:
(1) Cows ⊑ Animals
(2) Sheep ⊑ Animals
(3) Lions ⊑ ∃eat.(Cows ⊓ Sheep)
(4) Tigers ⊑ ∃eat.(Cows ⊓ Sheep)
(5) Cows ⊑ ¬Sheep

Axioms (3) and (5) cause Lions to be unsatisfiable; axioms (4) and (5) cause Tigers to be unsatisfiable. In this case, axiom (5) causes two unsatisfiable concepts; its removal can resolve two unsatisfiabilities. Thus, axiom (5) is regarded as less important than axioms (3) and (4). Moreover, removing axiom (5) is a change which preserves more of the information in the ontology, as otherwise two axioms would have to be removed (axioms (3) and (4)). By removing axiom (5) we can retain the information about Lions and Tigers. Existing tools use these heuristics, i.e., the number of unsatisfiabilities caused by the axiom, and the information lost by a modification. Hence, the suggestion to the user is to remove or modify axiom (5).

Intuitively, an axiom whose removal causes more information to be lost is regarded as more important, and should be preserved. However, it does not necessarily follow that such suggestions are the best results for all ontologies. A human ontology engineer may well propose a better solution in particular cases. For example, an ontologist may believe that sibling concepts are usually disjoint with each other. That means the disjointness of Cows and Sheep should be kept in the above example. This heuristic has an opposite result to the factor of information lost. Moreover, Rector et al. (Rector et al. 2004) enumerate a number of common error patterns made by non-expert users in modeling OWL ontologies. For example, differences between the linguistic and logical usage of 'and' and 'or' often cause confusion. Non-expert users might interpret axiom (3) as 'lions eat sheep and cows'; it actually means 'lions eat something which is both a cow and a sheep'. By using the common error patterns, we are able to evaluate axioms (3) and (4) as likely to be erroneous.

It is very difficult for tools to determine the optimal way to resolve such errors (semi-)automatically. Many tools simply leave it up to the user to determine how to resolve such errors. We believe that it will be useful to analyse the heuristics ontology engineers use to solve these kinds of problems, and to incorporate these heuristics in an ontology management tool. We firstly acquired a number of heuristics used by ontology engineers from an empirical study (see (Lam 2007) for more details). We then combined our acquired heuristics with those already incorporated in existing debugging tools and divided them into the following categories:

1. Impact of removal on the ontology: two heuristics are presented to analyse the loss of entailments due to removing axioms (i.e., arity and number of lost entailments)
2. Knowledge of ontology structure: five novel heuristics are proposed to analyse:
   (a) disjointness between siblings,
   (b) disjointness between non-siblings,
   (c) sibling patterns,
   (d) length of paths in a concept hierarchy, and
   (e) depth of concepts in a concept hierarchy
3. Linguistic heuristic: one heuristic is presented to analyse the similarities of the names of the concepts

From the results of the empirical study, we learnt that some subjects (the ontology engineers) took common OWL modeling errors into account when resolving the inconsistencies [...] "common error heuristics" and "merging heuristics", then conducted a usability study with non-expert users. The results of our usability study (Lam 2007) show that (1) with heuristic support, the users were guided towards the axioms which would be the best to modify, and they achieved correct modifications in a shorter time; (2) the heuristics could increase the users' confidence in the modifications which they make; (3) the common error heuristics were useful for dealing with ontologies with common mistakes; and (4) the merging heuristics were useful for dealing with merged ontologies.

Impact of Changes

Furthermore, whenever (parts of) an axiom is removed, it is easy for the knowledge engineer to unintentionally remove implicit information from the ontology. In order to minimise the impact on the ontology, it is important to calculate the lost entailments (e.g., subsumption relationships) of named concepts which occur due to the removal of (parts of) axioms. We use the following example to illustrate our approach.

Example 2. For a bird-penguin example, two axioms cause Penguin to be unsatisfiable:
(1) Bird ⊑ Animal ⊓ CanFly
(2) Penguin ⊑ Bird ⊓ ¬CanFly

To resolve this problem, if we remove the part Bird ⊑ CanFly from axiom (1), then the implicit information Eagle ⊑ CanFly will be lost (see Figure 4 on the left-hand side). If we remove the part Penguin ⊑ Bird from axiom (2), then the implicit information Penguin ⊑ Animal will be lost (see Figure 4 on the right-hand side).

To minimise the loss of information, when Bird ⊑ CanFly is removed from axiom (1), the modeler should be notified that the information Eagle ⊑ CanFly should be added back to the ontology. When Penguin ⊑ Bird is removed from axiom (2), the modeler should be notified that the information Penguin ⊑ Animal should be added back to the ontology.

Whenever parts of an axiom or a whole axiom are removed, it frequently happens that intended entailments are lost. In order to minimise the impact on the ontology, we analyse the lost information (e.g., concept subsumption) of concepts due to the removal of (parts of) axioms. We propose a fine-grained approach which allows us to identify changes which are helpful in that they restore lost information due to removal of axioms, and those that are harmful in that they cause additional unsatisfiability.

Example 3. Continuing the above example, when Bird ⊑ CanFly from axiom (1) is removed, the helpful change is Eagle ⊑ CanFly; when Penguin ⊑ Bird from axiom (2) is removed, the helpful change is Penguin ⊑ Animal. Harmful changes to replace the part CanFly in axiom (1) are ¬Animal, Penguin, and Eagle. This is because if the part CanFly in axiom (1) is replaced by ¬Animal, then Bird will become unsatisfiable.
If that part is replaced by Penguin, in ontologies; some subjects took the process of constructing then Penguin is defined as a type of Penguin. If it is re- ontologies into account, such as integrating/merging ontolo- placed by Eagle, then Penguin is defined as a type of Ea- gies. We therefore, clustered the heuristics into two sets: gle. Animal CanFly Animal CanFly We thank Dr. Wamberto Vasconcelos for helpful discussions of aspects of this work.

? ? References Bechhofer, S.; van Harmelen, F.; Hendler, J.; Horrocks, I.; McGuinness, D. L.; Patel-Schneider, P. F.; and Stein, L. A. Bird Bird 2004. OWL Web Ontology Language Reference. W3C Recommendation, World Wide Web Consortium. ? ? DCMI. 2003. Dublin Core Metadata Element Set, Ver- Eagle Eagle sion 1.1: Reference Description. DCMI Recommenda- tion, URLhttp://dublincore.org/documents/ Penguin Penguin dces/. Dellschaft, K., and Staab, S. 2006. On How to Perform a Figure 4: (Left-hand side) Option one: removing ‘birds can Gold Standard Based Evaluation of Ontology Learning. In fly’, the implicit information ‘eagles can fly’ is lost (dashed Proceedings of the 5th International Semantic Web Confer- arrow). (Right-hand side) Option two: removing ‘penguins ence (ISWC 2006), 228–241. are birds’, the implicit information ‘penguins are animals’ is Ding, L.; Pan, R.; Finin, T.; Joshi, A.; Peng, Y.; and Kolari, lost. (NB: Deletion is indicated by a on the appropriate × P. 2005. Finding and Ranking Knowledge on the Semantic link) Web. In Proceedings of the 4th International Semantic Web Conference, LNCS 3729, 156–170. Springer. We conducted a usability evaluation to test its effective- Fellbaum, C. 1998. WordNet: An Electronic Lexical Data- ness by comparing RepairTab with existing ontology de- base. MIT Press. bugging tools (Lam et al. 2007). The results show that Guarino, N., and Welty, C. 2000. Ontological Analysis of (1) the fine-grained approach greatly facilitated the subjects Taxonomic Relationships. In Proceedings of the 19th Inter- understanding of the reasons for concepts unsatisfiability, national Conference on Conceptual Modeling (ER 2000), and (2) the subjects found the helpful changes very useful 210–224. to minimise the impact of changes on the ontologies; but the Kalyanpur, A.; Parsia, B.; Sirin, E.; and Cuenca-Grau, B. harmful changes were less useful for them. For evaluating 2006. 
Repairing Unsatisfiable Concepts in OWL Ontolo- the runtime and memory performance of the proposed algo- gies. In Proceedings of the 3rd European Semantic Web rithms, we use a number of existing ontologies to evaluate Conference (ESWC2006), 170–184. the efficiency. The results demonstrate that our algorithms Lam, J. S. C.; Pan, J. Z.; Sleeman, D.; and Vasconcelos, provide acceptable performance when used with real world W. 2007. A Fine-Grained Approach to Resolving Unsatis- ontologies (Lam 2007). fiable Ontologies. Journal on Data Semantics 10. RepairTab: Future Work Lam, J.; Sleeman, D.; and Vasconcelos, W. 2005. Re- TAX++: a Tool for Browsing and Revising Ontologies. In Our debugging approach combines heuristic and formal ap- osters & Demonstration Proceedings of ISWC-05 (Galway, proaches for dealing with inconsistencies in ontologies. It Ireland). ISWC Conference Directorate. shows that the approach of acquiring human heuristics and Lam, J. S. C. 2007. Methods for Resolving Inconsistencies encoding them in a tool is viable for ontology debugging. It in Ontologies. Ph.D. Dissertation, University of Aberdeen. also shows that the fine-grained approach for indicating lost Meyer, T.; Lee, K.; Booth, R.; and Pan, J. Z. 2006. Find- entailments and recommending modifications is a powerful ing maximally satisfiable terminologies for the description novel approach. logic ALC. In Proceedings of the 21st National Conference on Artificial Intelligence (AAAI-06). Conclusion Miles, A., and Perez-Aguera, J. 2007. SKOS: Simple How these three tools interoperate to locate and refine on- knowledge organisation for the web. Cataloging and Clas- tologies, is illustrated in Figure 1 and will soon be the sub- sification Quarterly 43(3-4):69–83. ject of various useability studies. The newly introduced Pan, J. Z., and Thomas, E. 2007. Approximating OWL-DL W3C standards have made the interoperation of such sys- Ontologies. In Proc. of the 22nd National Conference on tems viable. 
Further, the use of tools such as CleOn and Artificial Intelligence (AAAI-07). To appear. RepairTab enhance and refine the set of ontologies available Pan, J. Z.; Stamou, G.; Stoilos, G.; Taylor, S.; and Thomas, on the Semantic Web. E. 2008. Scalable Querying Services over Fuzzy Ontolo- gies. In Proc. of the Seventeenth International World Wide Acknowledgment Web Conference (WWW 2008). To appear. Support from the AKT project (GR/N15764/01), the EC Pan, J. Z.; Thomas, E.; and Sleeman, D. 2006. ON- Knowledge Web project (IST-2004-507842) and DTI/Rolls- TOSEARCH2: Searching and Querying Web Ontologies. Royce IPAS project (DTI project No. TP/2/IC/6/I/10292). In Proc. of WWW/Internet 2006, 211–218. Patel, C.; Supekar, K.; Lee, Y.; and Park, E. K. 2003. OntoKhoj: a semantic web portal for ontology searching, ranking and classification. In WIDM ’03: Proceedings of the 5th ACM international workshop on Web information and data management, 58–61. New York, NY, USA: ACM Press. Prud’hommeaux, E., and Seaborne, A. 2006. SPARQL query language for RDF. W3C Working Draft, http://www.w3.org/TR/rdf-sparql-query/. Rector, A.; Drummond, N.; Horridge, M.; Rogers, J.; Knublauch, H.; Stevens, R.; Wang, H.; and Wroe, C. 2004. OWL Pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In Proceedings of the European Conference on Knowledge Acquistion, 63–81. Resnik, P. 1995. Using Information Content to Evalu- ate Semantic Similarity in a Taxonomy. In Proceedings of the 14th International Joint Conference on Artificial Intel- ligence, 448–453. Reul, Q.; Sleeman, D.; and Fowler, D. 2008. CleOn: Res- olution of lexically incoherent concepts in an engineering ontology. Technical report, University of Aberdeen. To be published. Schlobach, S., and Cornet, R. 2003. Non-standard reason- ing services for the debugging of description logic termi- nologies. In 8th International Joint Conference on Artifi- cial Intelligence (IJCAI’03). Selman, B., and Kautz, H. 1996. 
Knowledge Compilation and Theory Approximation. Journal of the ACM (JACM) 43(2):193–224. Sirin, E.; Parsia, B.; Grau, B. C.; Kalyanpur, A.; and Katz, Y. 2007. Pellet: A practical OWL-DL reasoner. Web Se- mantics: Science, Services and Agents on the World Wide Web 5(2):51–53. Sleeman, D., and Reul, Q. 2006. CleanONTO: Evaluating taxonomic relationships in ontologies. In Proceedings of 4th International EON Workshop on Evaluation of Ontolo- gies for the Web. Tsarkov, D., and Horrocks, I. 2006. FaCT++ description logic reasoner: System description. In Proceedings of the 3d International Joint Conference on Automated Reason- ing (IJCAR 2006), 292–297. Volker, J.; Hitzler, P.; and Cimiano, P. 2007. Acquisition of OWL DL axioms from lexical resources. In Proceedings of the 4th European Semantic Web Conference (ESWC’07), 670–685. Volker, J.; Vrandecic, D.; and Sure, Y. 2005. Automatic evaluation of ontologies (AEON). In Proceedings of the 4th International Semantic Web Conference (ISWC2005), 716–731. Wang, H.; Horridge, M.; Rector, A.; Drummond, N.; and Seidenberg, J. 2005. Debugging OWL-DL Ontologies: A heuristic approach. In 4th Internation Semantic Web Con- ference.
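
Illustrative sketch for Examples 2 and 3

The lost-entailment analysis of Examples 2 and 3 can be illustrated with a small Python sketch. It is not the RepairTab implementation: it considers only atomic subsumptions between named concepts and computes entailments by transitive closure rather than by a description logic reasoner. The function names (closure, helpful_changes), the encoding of axioms as (sub, super) pairs, and the told axiom Eagle ⊑ Bird (suggested by Figure 4 but not listed among the axioms) are assumptions made for this illustration. A lost entailment is reported as a helpful change only if it does not contradict an asserted negation such as Penguin ⊑ ¬CanFly.

```python
from itertools import product


def closure(pairs):
    """Return the transitive closure of a set of (sub, super) subsumptions."""
    closed = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closed), repeat=2):
            if b == c and a != d and (a, d) not in closed:
                closed.add((a, d))
                changed = True
    return closed


def helpful_changes(told, negated, removed):
    """Entailments lost by deleting `removed` that are safe to add back.

    told    -- asserted atomic subsumptions, as (sub, super) pairs
    negated -- pairs (sub, super) for which sub is asserted NOT to be a super
    removed -- the asserted subsumption that the modeler deletes
    """
    before = closure(told)
    after = closure(told - {removed})
    lost = before - after - {removed}
    # Re-adding a pair that contradicts a negation would restore the clash.
    return {pair for pair in lost if pair not in negated}


# Hypothetical encoding of Example 2, split into atomic parts.
TOLD = {
    ("Bird", "Animal"),    # part of axiom (1)
    ("Bird", "CanFly"),    # part of axiom (1)
    ("Penguin", "Bird"),   # part of axiom (2)
    ("Eagle", "Bird"),     # assumed from Figure 4, not stated as an axiom
}
NEGATED = {("Penguin", "CanFly")}  # part of axiom (2): Penguin is not CanFly

if __name__ == "__main__":
    for removed in [("Bird", "CanFly"), ("Penguin", "Bird")]:
        print(removed, sorted(helpful_changes(TOLD, NEGATED, removed)))
```

Under these assumptions, deleting Bird ⊑ CanFly yields Eagle ⊑ CanFly as the only helpful change, and deleting Penguin ⊑ Bird yields Penguin ⊑ Animal, in line with Example 3. Detecting harmful changes would require a satisfiability check and is outside the scope of this sketch.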