Federated Ontology Search Vasco Calais Pedro CMU-LTI-09-010
Total Page:16
File Type:pdf, Size:1020Kb
Federated Ontology Search Vasco Calais Pedro CMU-LTI-09-010 Language Technologies Institute School of Computer Science Carnegie Mellon University 5000 Forbes Ave. Pittsburgh, PA 15213 www.lti.cs.cmu.edu Thesis Committee: Jaime Carbonell, Chair Eric Nyberg Robert Frederking Eduard Hovy, Information Sciences Institute Submitted in partial fulfillment of the requirements for the degree Doctor of Philosophy In Language and Information Technologies Copyright © 2009 Vasco Calais Pedro For my grandmother, Avó Helena, I am sorry I wasn’t there Abstract An Ontology can be defined as a formal representation of a set of concepts within a domain and the relationships between those concepts. The development of the semantic web initiative is rapidly increasing the number of publicly available ontologies. In such a distributed environment, complex applications often need to handle multiple ontologies in order to provide adequate domain coverage. Surprisingly, there is a lack of adequate frameworks for enabling the use of multiple ontologies transparently while abstracting the particular ontological structures used by that framework. Given that any ontology represents the views of its author or authors, using multiple ontologies requires us to deal with several significant challenges, some stemming from the nature of knowledge itself, such as cases of polysemy or homography, and some stemming from the structures that we choose to represent such knowledge with. The focus of this thesis is to explore a set of techniques that will allow us to overcome some of the challenges found when using multiple ontologies, thus making progress in the creation of a functional information access platform for structured sources. In this thesis we try to address the question “How do we integrate and use information contained in a set of heterogeneous ontologies?” This question is becoming crucial as the world gets more connected. As the amount of information grows, so does the amount of resources that encode domain knowledge in different forms. At the same time, as we deal with problems that span a large number of domains, such as search, question answering and semantic analysis, the need for concurrent access to multiple ontologies becomes ever more self-evident. We model our approach on the general framework proposed in Federated Search and transpose the set of problems to the ontological domain. We start by illustrating the variety of available ontologies and focusing on automatic ontology creation. We proceed to describe the use of pro- active ontology selection to guide the ontology selection process at the query level. We then propose a set of operators for ontological search centered on the user’s information needs and expand the basic operator set through composition of basic operators. We address the problem of graph merging within Ontology Search and propose a scoring metric for results that is based in the proposed operator set. Finally, we present results of using the Federated Ontology Search (FOS) engine in a set of task based evaluations. Our task set is comprised of type checking for question answering, concept coverage, concept disambiguation and content recommendation. We show that in all of the evaluated tasks, the FOS system outperforms the use of individual ontologies. 1 INTRODUCTION ............................................................................................................................... 6 1.1 MOTIVATING EXAMPLE ................................................................................................................... 12 1.1.1 Question Answering ............................................................................................................... 12 1.1.2 Example 1 .............................................................................................................................. 12 1.2 BASIC DEFINITIONS ......................................................................................................................... 20 1.2.1 Ontologies and Graphs .......................................................................................................... 20 1.3 CHALLENGES TO BE ADDRESSED ...................................................................................................... 22 1.4 SUMMARY OF CONTRIBUTIONS ........................................................................................................ 22 1.5 EVALUATION OF RESULTS ................................................................................................................ 23 1.6 OVERVIEW OF THESIS ORGANIZATION.............................................................................................. 24 1.7 SUMMARY........................................................................................................................................ 24 2 BACKGROUND AND RELATED WORK .................................................................................... 26 2.1 ONTOLOGY SELECTION .................................................................................................................... 26 2.2 WORD SENSE DISAMBIGUATION ...................................................................................................... 27 2.3 RESULT MAPPING AND MERGING .................................................................................................... 29 3 AUTOMATIC ONTOLOGY CREATION ..................................................................................... 32 3.1 AUTOMATIC CREATION OF ONTOLOGIES FROM SEMI STRUCTURED RESOURCES ............................. 32 3.2 RELATED WORK .............................................................................................................................. 33 3.3 THE WIKIPEDIA STRUCTURE ............................................................................................................ 34 3.4 GENERAL METHODOLOGY ............................................................................................................... 35 3.5 FEATURE EXTRACTION .................................................................................................................... 35 3.6 GENERATING LABELED DATA ......................................................................................................... 36 3.6.1 List Based Labeling ............................................................................................................... 37 3.6.2 Context Based Labeling ......................................................................................................... 37 3.7 DIRECTIONAL FEEDBACK EDGE LABELING ALGORITHM ................................................................. 39 3.8 UMLS AND OKINET......................................................................................................................... 42 3.9 EXPERIMENTAL SETTING ................................................................................................................. 43 3.10 RESULTS ..................................................................................................................................... 43 4 ONTOLOGY SELECTION .............................................................................................................. 47 4.1 CHARACTERIZATION OF ONTOLOGICAL RESOURCES ....................................................................... 47 4.2 ONTOLOGICAL REPRESENTATION LANGUAGES ............................................................................... 52 4.2.1 Knowledge Interchange Format (KIF) .................................................................................. 52 4.2.2 Resource Description Framework (RDF).............................................................................. 52 4.2.3 Web Ontology Language (OWL) ........................................................................................... 53 4.2.4 CycL ...................................................................................................................................... 54 4.3 PROACTIVE ONTOLOGY SELECTION ................................................................................................. 55 4.3.1 Cooperative Ontology Selection ............................................................................................ 55 4.3.2 Uncooperative Ontology Selection ........................................................................................ 56 5 ONTOLOGICAL SEARCH ............................................................................................................. 62 5.1 ONTOLOGY QUERY LANGUAGES ..................................................................................................... 62 5.1.1 Simple Protocol and RDF Query Language (SPARQL) ........................................................ 62 5.1.2 Metaweb Query Language (MQL)......................................................................................... 63 5.1.3 User Centric Approach to Ontology Query Languages ........................................................ 64 5.2 PROPOSED QUERY APPROACH ......................................................................................................... 67 5.2.1 Operators............................................................................................................................... 68 5.2.2 Query ..................................................................................................................................... 80 5.2.3 Query Results .......................................................................................................................