Environments, Methodologies and Languages for Supporting Users in Building a Chemical Ontology
A dissertation submitted to The University of Manchester for the degree of MSc Bioinformatics in the Faculty of Life Sciences

September 2005

Louisa Casely-Hayford

TABLE OF CONTENTS

TABLE OF CONTENTS
LIST OF TABLES
LIST OF FIGURES
ABSTRACT
DECLARATION
COPYRIGHT STATEMENT
ACKNOWLEDGEMENTS

1 Introduction
  1.1 Ontologies
  1.2 The Role of Ontologies in the CCLRC Data Portal
  1.3 Objectives
  1.4 Structure of Dissertation

2 Background and Literature review
  2.1 The Council for the Central Laboratory of the Research Councils
  2.2 What is an Ontology?
  2.3 Types of Ontologies and their uses
  2.4 Building an Ontology
  2.5 Methodologies, Languages and Editing environments
  2.6 The role of ontologies in Semantic Web (SW) Portals
  2.7 Topic Maps

3 Methodologies, tools and Languages for building ontologies
  3.1 Methodologies for building ontologies
  3.2 Ontology Implementation Languages
  3.3 Classic Ontology Specification Languages
  3.4 Web-Based Ontology Specification Languages
  3.5 Ontology Development tools
  3.6 Newer Generation Ontology development tools
  3.7 Graphical Ontology Editors

4 Choosing an Ontology Language and Editing Environment
  4.1 Criteria for evaluating Ontology Editors
  4.2 Summary of Results of Evaluation of editors
  4.3 Editing the Chemical Ontology Graphically
  4.4 Evaluation of OntoTrack
  4.5 Evaluation of Growl
  4.6 Choosing an Ontology Language
  4.7 Evaluation of Ontology languages
  4.8 Summary of Results of Evaluation

5 Other methods of building controlled taxonomies, vocabularies and thesauri
  5.1 Topic Maps and Simple Knowledge Organisation Systems (SKOS)
  5.2 Deployment of Ontologies
  5.3 Ontology Management Systems

6 Converting a Topic Map into an Ontology
  6.1 Topic Map of Chemistry
  6.2 Building the Chemical ontology
  6.3 Problems encountered
  6.4 What is the added value of an ontology compared to a Topic Map?

7 Conclusion
  7.1 Future work

8 References

9 Appendix

Word count: 18,856

LIST OF TABLES

Table 3.1 A comparison between the methodologies
Table 3.2 General Description of Ontology Development Tools
Table 4.1 Summary of the results of evaluation of tools
Table 4.2 Interoperability of Tools
Table 4.3 Summary of results of the evaluation of ontology languages

LIST OF FIGURES

Figure 2.1 Amigo
Figure 2.2 Snapshot of Protégé OWL Interface
Figure 2.3 Editing in Protégé OWL
Figure 2.4 Illustration of a skeletal methodology and life cycle
Figure 2.5 Ontology Life cycle
Figure 2.6 A topic Map
Figure 3.1 Classification of Ontology Implementation Languages
Figure 3.2 Illustration of W3C recommended stack of ontology markup language
Figure 3.3 Relationship between methodologies, languages and tools
Figure 4.1 Snapshot of DAG-edit
Figure 4.2 SWOOP GUI
Figure 4.3 Chimaera
Figure 4.4 Protégé Interface
Figure 4.5 OiLed interface
Figure 4.6 Snapshot of OntoEdit interface
Figure 4.7 Snapshot of WebODE
Figure 4.8 Growl Interface
Figure 4.9 Owl viz tab in Protégé OWL interface
Figure 6.1 List of Topics present in Topic Map
Figure 6.2 Topic Map Before Conversion
Figure 6.3 An illustration of associations between Topics
Figure 6.4 The Chemical Ontology

ABSTRACT

Ontologies are widely used by different communities for several purposes. With the advent of the Semantic Web, ontologies are becoming increasingly popular amongst members of the scientific community because they provide a powerful way to formally express the nature of a domain or subject area. By defining shared and common domain theories, ontologies help both people and machines to communicate concisely, which promotes knowledge reuse and integration. During the process of building an ontology, several questions arise about the methodologies, tools and languages that should be used.

The Council for the Central Laboratory of the Research Councils (CCLRC) is developing a Data Portal to store and retrieve experimental data from across the spectrum of the sciences. Ontologies are a key part of this effort, as they provide a common indexing mechanism for these data. This dissertation, which forms part of that wider effort, therefore reviews how simple tools and techniques can be used to gather ontological information from a community as a form of consensus building.

After looking at a number of tools, Protégé and the Web Ontology Language (OWL) were chosen as the best combination for building both heavyweight and lightweight ontologies. Using these tools, a Topic Map of Chemistry was converted into an ontology.
Due to their lack of formal semantics, SKOS (Simple Knowledge Organisation Systems) and Topic Maps proved unsuitable for building heavyweight ontologies.

DECLARATION

No portion of the work referred to in the dissertation has been submitted in support of an application for another degree or qualification of this or any other university or other institution of learning.

COPYRIGHT STATEMENT

(1) Copyright in text of this dissertation rests with the author. Copies (by any process) either in full, or of extracts, may be made only in accordance with instructions given by the author. Details may be obtained from the appropriate Graduate Office. This page must form part of any such copies made. Further copies (by any process) of copies made in accordance with such instructions may not be made without the permission (in writing) of the author.

(2) The ownership of any intellectual property rights which may be described in this dissertation is vested in the University of Manchester, subject to any prior agreement to the contrary, and may not be made available for use by third parties without the written permission of the University, which will prescribe the terms and conditions of any such agreement.

(3) Further information on the conditions under which disclosures and exploitation may take place is available from the Head of the School of Electrical and Electronic Engineering (or the Vice-President and Dean of the Faculty of Life Sciences for Faculty of Life Sciences’ candidates).

ACKNOWLEDGEMENTS

Many thanks to the head of the CCLRC e-Science Data Management group, Kerstin Kleese Van Dam, for her support and guidance throughout the duration of the project. Special thanks to Mr. Shoaib Sufi, the Deputy Group Leader of the CCLRC e-Science Data Management