Evaluating Ontology Cleaning
Christopher Welty, Ruchi Mahindru, and Jennifer Chu-Carroll
IBM T.J. Watson Research Center
19 Skyline Dr.
Hawthorne, NY 10532
{welty,jencc}@us.ibm.com, [email protected]

Abstract

Ontology as a discipline of Computer Science has made many claims about its usefulness; however, to date there has been very little evaluation of those claims. We present the results of an experiment using a hybrid search system with a significant knowledge-based component to measure, using precision and recall, the impact of improving the quality of an ontology on overall performance. We demonstrate that improving the ontology using OntoClean (Guarino and Welty, 2002) does positively impact performance, and that having knowledge of the search domain is more effective than domain-knowledge-free search techniques such as link analysis.

Content Areas: Ontologies

Copyright © 2004, American Association for Artificial Intelligence (www.aaai.org). All rights reserved.
KNOWLEDGE REPRESENTATION & REASONING 311

Introduction

Ontologies have been proposed as a solution to numerous problems in areas such as search, semantic integration, agents, and human-computer interaction. The premise is usually that a clear, high-quality ontology can act as an interlingua in which mappings between systems can be unambiguously expressed, or in which understanding of a domain can be gained by a system or shared by users (Smith and Welty, 2001). For the most part, this claim has not been realized in practice, and one reason for this failure is that there was little agreement on just what makes an ontology "clear" or "high quality." One recent development in ontology research has been the specification of a formal methodology for ontological analysis, OntoClean (Guarino and Welty, 2002), that addresses the problem of defining just what "high quality" means for ontologies. Following this definition and approach, a high-quality foundational ontology, Dolce, is being developed (Gangemi, et al, 2002).

While OntoClean appears to be widely accepted in the scientific community, and OntologyWorks, a small company providing database integration services, has a proprietary analysis tool based on OntoClean (www.ontologyworks.com), there is very little evidence that it can have impact on knowledge-based systems. In fact, there appears to be a significant obstacle in understanding the methodology, and even without this "learning curve", significant manual effort must be expended to employ the methodology to develop actual "clean" ontologies. Furthermore, there has been no clear argument that such an expenditure will pay for itself in the long run. Indeed, "Why does it matter?" has been the most frequent criticism of the OntoClean approach.

We report here on a series of experiments using a hybrid search system with a significant knowledge-based component to test the impact of improving the quality of ontologies on system performance. The use of search as a test system provides a well-understood framework for empirical evaluation, and gives an excellent opportunity to address the "Why does it matter?" question.

Preliminary results of this experiment were presented earlier in a workshop (Welty, et al, 2003). In this paper we present the final data for the full experiment, plus a more detailed analysis of the results.

Background

The field of ontology has been sorely lacking in formal empirical analysis in general. This deficiency has been recognized, however, and a few examples are starting to appear. Ablation analysis of ontologies at U.T. Austin (Fan, et al., 2003) is one example, in which elements of an ontology used by a search system that relied on recognizing noun compounds were systematically removed to determine which levels of specificity were most relevant to the results. There have also been several evaluations of the impact of structured knowledge (loosely construed as ontologies) on IR and search systems in general.

Most closely related to this work is the work of Clark, et al, at Boeing (2000), in which a search system was enhanced by employing an ontology-like artifact (a thesaurus of terms with more meaningful structure than a flat list of keywords). This work showed that the precision and recall performance of a retrieval system can be significantly increased by adding this kind of information. It is important to note that while Clark, et al, did discuss a process for improving the quality of the ontology, they did not formally evaluate the impact of the improvement. Furthermore, Kanisa, Inc. (previously AskJeeves Enterprise Solutions – www.kanisa.com) has based their business on providing domain-specific knowledge-enhanced search, and has been turning a profit since 1Q 2002 (Ulicny, 2003).

Similar evaluations of the impact of ontologies on search-based systems have been done in the question-answering community. Moldovan and Mihalcea (2000) use a significantly enhanced version of WordNet to drastically improve question-answering performance, and other groups, including Woods, et al (2000), Hovy, et al (2001), and Prager, et al (2001), have reported similar results. Again, as with the Boeing work, these groups report positively on the impact of adding an ontology to a search system, but make no attempt to determine whether a good-quality ontology would improve performance more. In fact, within the IR and QA communities, WordNet is the most common ontology-like artifact to employ, and previous work has shown that WordNet viewed as an ontology is not of particularly high quality (Oltramari, et al, 2002).

System Overview

The RISQUE system is an evolution of the system reported in (Chu-Carroll, et al, 2002). This system provides a natural-language front end to a conventional search engine, but uses clues in the natural-language question, a knowledge base of industry terms, and knowledge of the web site structure (see below) to construct an advanced search query using the full expressiveness of the search engine query language. The search is limited to a corporate web site; in this case our knowledge is of the ibm.com buy and support pages for the ThinkPad™ and NetVista™ product lines.

The main components of RISQUE include a parser, the terminology knowledge base, rules for question types, a hub-page finder, a query formulation component, and the search engine.

The parser is a slot-grammar parser (McCord, 1980) that must be seeded with multi-word industry-specific terms, so that, e.g., "disk drive" will be parsed as a compound noun rather than a head noun with a pre-modifier. These multi-word terms come from the knowledge base.

From the grammatical structure of the question, we extract the primary verb phrase and the noun phrases. The verb phrase information is the main evidence used to fire rules for recognizing question types, which themselves depend on the web site structure. The ibm.com web site, like many enterprise sites, is divided into a section for support and a section for sales. This gives RISQUE its two most basic question types, "buy" and "support".

The hub-page finder is a system of declarative rules that takes the noun phrases from the question and determines whether they correspond to products listed in the terminology, and if so finds the most appropriate "hub page" or "comparison page" for that product. For example, many questions about IBM ThinkPads can be answered with information on the "ThinkPad home page" on the IBM site. Directing users to these key pages is often the quickest path to an answer. Hub and comparison pages are described in the next section. The rules are broken into two parts: one set is derived directly from the knowledge base and includes moreImportantThan relationships and the taxonomy; the second includes rules expressing relationships between linguistically derived information and the hub pages. For example, if the question contains a superlative, as in "What is the fastest ThinkPad?", the rules indicate that the ThinkPad comparison page should be returned.

The system iteratively generates complex queries using knowledge of web site structure. The query may include URL restrictions, such as "only consider pages with 'support' in the URL for support questions", or "exclude pages with research.ibm.com in the URL for buy questions". The query will also make use of boolean connectives: disjunction to support synonym expansion, and conjunction of noun phrase terms. If a query does not return enough hits, the query formulation component will relax the query according to a number of heuristics, such as dropping the least important noun phrase. Knowledge of which terms are more important than others, based on manual analysis of the web site, is included in the knowledge base. RISQUE iteratively formulates search queries and accumulates results until a pre-determined number of result pages is obtained.

The RISQUE system was tested and trained with a set of questions made up by team members. For the initial experiment described in (Welty, et al, 2003), the system was evaluated with questions made up by a domain expert from outside the group. Finally, the system was evaluated with roughly 200 questions collected from students and professors at CCNY.
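The formulate-and-relax loop described above can be sketched as follows. This is a minimal illustration under stated assumptions, not RISQUE's actual implementation: the query syntax, the toy search function, and all names (formulate_query, search_with_relaxation, the importance map) are hypothetical.

```python
# Hypothetical sketch of RISQUE-style iterative query formulation and
# relaxation; the query syntax and all names are illustrative assumptions.

def formulate_query(noun_phrases, synonyms, question_type):
    """Conjoin noun phrases, expanding each into a disjunction of
    synonyms, and add a URL restriction based on the question type."""
    clauses = []
    for np in noun_phrases:
        terms = [np] + synonyms.get(np, [])
        clauses.append("(" + " OR ".join(f'"{t}"' for t in terms) + ")")
    query = " AND ".join(clauses)
    if question_type == "support":
        query += " url:support"             # only consider support pages
    elif question_type == "buy":
        query += " -url:research.ibm.com"   # exclude research pages
    return query

def search_with_relaxation(noun_phrases, synonyms, question_type,
                           importance, search, min_hits=10):
    """Accumulate results, relaxing the query by dropping the least
    important noun phrase whenever it returns too few hits."""
    results = []
    nps = sorted(noun_phrases, key=importance.get, reverse=True)
    while nps and len(results) < min_hits:
        query = formulate_query(nps, synonyms, question_type)
        results.extend(h for h in search(query) if h not in results)
        nps = nps[:-1]  # relaxation heuristic: drop least important NP
    return results
```

The importance ordering used for relaxation would come from the knowledge base's moreImportantThan information rather than a flat numeric map as assumed here.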
Role of Ontology

The central terminology of RISQUE is an ontology-like knowledge base of industry terms arranged in a taxonomy according to specificity. In addition to the taxonomy, the knowledge base includes important information used by the system:

Hub page: Most terms at the top level of the taxonomy have a corresponding "hub page" – a

mentioned in the same question, such as "What CD drive goes with my ThinkPad T23?", we consider the CD drive to be the more important term in the question. The more important term in the question will have its hub page returned in a higher position, and the less important term may, in some circumstances, not have its hub page appear at all. In addition, the less important term will be dropped first during query relaxation. The moreImportantThan relation is considered to be transitive, and is also inherited down the taxonomy.
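A transitive, taxonomy-inherited moreImportantThan check could be implemented as sketched below. This is an assumption-laden sketch, not the RISQUE rule system: the dict-based encoding of the taxonomy (child-to-parents) and of the asserted relation, and the function names, are all hypothetical.

```python
# Hypothetical sketch: checking moreImportantThan with transitivity and
# inheritance down a taxonomy. Data structures are illustrative, not RISQUE's.

def is_more_important(a, b, more_important, parents):
    """True if a moreImportantThan b, where the asserted relation
    (more_important: term -> list of less important terms) is closed
    transitively, and terms inherit assertions from their taxonomic
    ancestors (parents: term -> list of parent terms)."""
    def ancestors_and_self(term):
        seen, stack = set(), [term]
        while stack:
            t = stack.pop()
            if t not in seen:
                seen.add(t)
                stack.extend(parents.get(t, []))
        return seen

    # Inheritance down the taxonomy: an assertion about any ancestor
    # of a (or b) applies to a (or b) itself.
    froms = ancestors_and_self(a)
    tos = ancestors_and_self(b)
    # Transitivity: search the asserted relation as a directed graph.
    frontier, visited = set(froms), set()
    while frontier:
        x = frontier.pop()
        if x in visited:
            continue
        visited.add(x)
        for y in more_important.get(x, []):
            if y in tos:
                return True
            frontier.add(y)
    return False
```

For example, a single assertion that accessories are moreImportantThan computers would then make a CD drive more important than a ThinkPad T23, matching the inheritance behavior described above.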