
Anne Cregan

Weaving the Semantic Web: Contributions and Insights

Doctoral Thesis submitted in partial fulfilment of the requirements for the award of PhD from the University of New South Wales

September 2008

Research reported in this thesis has been partially financed by NICTA (http://www.nicta.com.au). NICTA is funded by the Australian Government’s Department of Communications, Information Technology and the Arts, and the Australian Research Council through Backing Australia’s Ability and the ICT Centre of Excellence program. It is supported by its members the Australian National University, University of NSW, ACT Government, NSW Government and affiliate partner University of Sydney.

Abstract

The Semantic Web aims to make the meaning of data on the web explicit and machine processable. Harking back to Leibniz in its vision, it imagines a world of interlinked information that computers ‘understand’ and ‘know’ how to process based on its meaning. Spearheaded by the World Wide Web Consortium, OWL and RDF form the core of the current technical offerings. RDF has successfully enabled the construction of virtually unlimited webs of data, whilst OWL gives the ability to express complex relationships between RDF data triples. However, the formal semantics of these languages limit themselves to that aspect of meaning that can be captured by mechanical inference rules, leaving many open questions as to other aspects of meaning and how they might be made machine processable. The Semantic Web has faced a number of problems that are addressed by the included publications. Its germination within academia and logical semantics has seen it struggle to become familiar, accessible and implementable for the general IT population, so an overview of semantic technologies is provided. Faced with competing ‘semantic’ languages, such as the ISO’s Topic Map standards, a method for building ISO-compliant Topic Maps in the OWL DL language has been provided, enabling them to take advantage of the more mature OWL language and tools. Supplementation with rules is needed to deal with many real-world scenarios and this is explored as a practical exercise. The available syntaxes for OWL have hindered domain experts in ontology building, so a natural language syntax for OWL designed for use by non-logicians is offered and compared with similar offerings. In recent years, the proliferation of ontologies has resulted in far more than are needed in any given domain space, so a mechanism is proposed to facilitate the reuse of existing ontologies by giving contextual information and leveraging social factors to encourage wider adoption of common ontologies and achieve interoperability.
Lastly, the question of meaning is addressed in relation to the need to define one’s terms and to ground one’s symbols by anchoring them effectively, ultimately providing the foundation for evolving a ‘Pragmatic Web’ of action.

Official Statements

Copyright Statement

I hereby grant the University of New South Wales or its agents the right to archive and to make available my thesis or dissertation in whole or part in the University libraries in all forms of media, now or hereafter known, subject to the provisions of the Copyright Act 1968. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. I also authorise University Microfilms to use the 350 word abstract of my thesis in Dissertation Abstracts International (this is applicable to doctoral theses only). I have either used no substantial portions of copyright material in my thesis or I have obtained permission to use copyright material; where permission has not been granted I have applied/will apply for a partial restriction of the digital copy of my thesis or dissertation.

Signature: Date:

Authenticity Statement

I certify that the Library deposit digital copy is a direct equivalent of the final officially approved version of my thesis. No emendation of content has occurred and if there are any minor variations in formatting, they are the result of the conversion to digital format.

Signature: Date:

Originality Statement

I hereby declare that this submission is my own work and to the best of my knowledge it contains no materials previously published or written by another person, or substantial proportions of material which have been accepted for the award of any other degree or diploma at UNSW or any other educational institution, except where due acknowledgment is made in the thesis. Any contribution made to the research by others, with whom I have worked at UNSW or elsewhere, is explicitly acknowledged in the thesis. I also declare that the intellectual content of this thesis is the product of my own work, except to the extent that assistance from others in the project’s design and conception or in style, presentation and linguistic expression is acknowledged.

Signature: Date:

Preface

The Semantic Web is a fascinating enterprise with the aim of inter-linking data on a massive scale and machine processing it based on its semantics. It is a mammoth effort, now involving thousands of people, of which I am merely one. Whilst much has been achieved, some of the questions regarding meaning and making it machine processable are tantalizingly unresolved. Our generation has a great opportunity to address this question and to decide how it will be tackled. This thesis details some of the contributions and insights I, as one of those thousands of people, have offered to the overall process I like to refer to as ‘Weaving the Semantic Web’, an expression that tips its hat to World Wide Web and Semantic Web inventor Sir Tim Berners-Lee’s famous book titled ‘Weaving the Web’. As a UNSW student sponsored by NICTA, Australia’s Centre of Excellence for Information and Communications Technology, I have embraced NICTA’s values of use-inspired research and close engagement with real-world users of technology. The publications included in this thesis thus reflect a focus on delivering what has been or is needed to make the Semantic Web as usable as possible, as well as attempting to deliver on its vision of machine processable meaning. Some of the included publications have helped the Semantic Web along its evolutionary path to get where it is now, some are at the forefront of its current evolution, whilst others are very long-sighted and examine what is needed to reach the ultimate goal of machines that ‘understand’ information and ‘know’ how to process it according to its meaning. Such an effort involves many activities, of which only some results are visible as the publications included in this thesis: others result in community building, application building, education and outreach, and material such as Working Group reports and recommendations.
I have been involved in many of these activities and would like to thank all those who have accompanied me on this exciting journey, particularly my colleagues at NICTA, and fellow members of the W3C’s OWL Working Group and Reasoning on the Web Incubator Group (UR-W3). I would also like to thank my supervisors Emeritus Professor Norman Foo, Dr Thomas Meyer and Dr Maurice Pagnucco, my family and particularly James for their invaluable support throughout the time I have been engaged in this work.

Anne Cregan, September 2008.

Structure of the Thesis

As the Semantic Web is a very dynamic field of activity and my work has focused on active areas of research and development, it is very difficult to offer a complete doctoral thesis that is current at the time of submission. Therefore, in accordance with the regulations of UNSW, I have opted to submit this thesis as a series of publications. Each of these has been researched and published or accepted for publication within the period of my PhD candidature at UNSW, and has been subjected to a peer review process prior to publication. The publications are grouped into chapters according to commonality of content. Each publication is preceded by details of where and when it was published and the author’s personal contribution to each publication, acknowledging the contribution of others where the candidate is not the sole author. The personal contribution percentage was arrived at by asking each co-author to estimate my personal contribution to the publication, and averaging their estimates. The copyright permission of each co-author to reproduce the publication was also obtained. The thesis contains an introductory chapter, Chapter 1, that introduces the Semantic Web, places the published works in the context of problems facing the Semantic Web, and summarizes each publication’s content and contribution. The publications themselves are presented in Chapters 2 through 7 in their published formats, overlaid with pagination within this thesis. In summary:

• Chapter 2 Overview of Semantic Technologies is an overview of semantic technologies, which explains the field to mainstream information technologists in order to make it more accessible and to encourage wider adoption of the technologies. In the context of this thesis, it provides background information that is relevant for setting the scene for the following chapters, which are essentially independent of each other.

• Chapter 3 Integrating Topic Maps into the Semantic Web details work conducted to map the ISO’s Topic Map standards into the W3C’s Semantic Web stack of technologies, so that it might take advantage of the formal semantics and tools associated with the latter.

• Chapter 4 Adding Rules to OWL Ontologies describes joint work conducted to explore the need for, and use of, rules in conjunction with OWL ontologies, editors and reasoners.

• Chapter 5 Controlled Natural Language Syntaxes for OWL describes joint work on designing a controlled natural language syntax for OWL 2. The initial work on Sydney OWL Syntax was presented at OWLED in 2007, where a CNL task force formed to explore various alternatives, reported in the second paper of this chapter.

• Chapter 6 Encouraging Ontology Reuse details joint work on the issue of ontology reuse, its importance in semantic interoperability, and plans for a tool which would encourage reuse by leveraging social factors and networking tools.

• Chapter 7 Foundational Issues in Meaning examines the issue of meaning and definition at a fundamental level, and its importance for Semantic Web technologies.

Following these chapters, a concluding chapter, Chapter 8, summarises the overall conclusions of the published works, articulates the contributions made to the Semantic Web, and examines the extent to which the publications addressed the problems raised in the introductory chapter. It also outlines the remaining issues and makes recommendations for future work.

Contents

1 Introduction
1.1 Leibniz’s Dream in the 21st Century
1.2 The Semantic Web Vision
1.3 The Semantic Web Effort
1.3.1 W3C Semantic Web technology stack
1.3.2 Additional Items
1.3.3 Semantic Web Tools and Applications
1.4 A Web of Data
1.4.1 RDF Graphs
1.4.2 Interlinking for Semantic Interoperability
1.4.3 The Linking Open Data Project
1.5 Describing Data using Ontology Languages
1.5.1 What is an Ontology?
1.5.2 RDF/OWL Ontologies
1.5.3 RDF-Schema
1.5.4 RDF-Schema Vocabulary
1.5.5 RDF-Schema Design Choices
1.5.6 Web Ontology Language (OWL)
1.5.7 OWL Vocabulary
1.5.8 Differences between Sublanguages of OWL
1.5.9 OWL 2 Extensions
1.5.10 OWL Design Choices
1.6 Notations
1.6.1 RDF Abstract Syntax
1.6.2 RDF Transfer Syntax and OWL Exchange Syntax
1.6.3 OWL Abstract Syntax
1.6.4 Non-Normative Syntaxes
1.7 Formal Semantics
1.7.1 Reasoning Services
1.7.2 Model Theoretic Semantics
1.7.3 Meaning and Formal Semantics
1.8 Beyond Model Theory
1.9 Problems Facing the Semantic Web
1.9.1 Adoption and Accessibility
1.9.2 Competing Standards for Building Semantic Structures
1.9.3 Rules and the limits of Ontology Languages
1.9.4 Readable Syntax
1.9.5 Interoperability of Ontologies
1.9.6 Making Meaning Machine Processable
1.10 Included Publications
1.10.1 Chapter 2: Overview of Semantic Technologies
1.10.2 Chapter 3: Integrating Topic Maps into the Semantic Web
1.10.3 Chapter 4: Adding Rules to OWL DL Ontologies
1.10.4 Chapter 5: Natural Language Syntax for OWL


1.10.5 Chapter 6: Encouraging Ontology Reuse
1.10.6 Chapter 7: Foundational Issues in Meaning

2 Overview of Semantic Technologies
2.1 Overview of Semantic Technologies

3 Integrating Topic Maps into the Semantic Web
3.1 Building Topic Maps in OWL-DL
3.2 An OWL DL Construction for the ISO Topic Map Data Model

4 Adding Rules to OWL DL Ontologies
4.1 Pushing the limits of OWL, Rules & Protégé
4.2 Exploring OWL & Rules

5 Controlled Natural Language Syntaxes for OWL
5.1 Sydney OWL Syntax
5.2 A Comparison of three CNLs for OWL 1.1

6 Encouraging Ontology Reuse
6.1 n2mate

7 Foundational Issues in Meaning
7.1 Towards a Science of Definition
7.2 Symbol Grounding for the Semantic Web

8 Conclusion
8.1 Overview of Semantic Technologies
8.2 Competing Standards
8.3 Rules and the Limits of Ontology Languages
8.4 Readable Syntax
8.5 Interoperability of Ontologies
8.6 Making Meaning Machine Processable
8.6.1 Further Analysis
8.6.2 Building Situation Awareness from Sensor Input
8.6.3 A Strategy for Symbol Grounding
8.7 Summation

List of Figures

1.1 Berners-Lee and Swick’s current Semantic Web Technology Stack or ‘layer cake’
1.2 Example of an RDF Triple
1.3 RDF Graphs may be interlinked to form a semantic structure interlinking disparate datasets. Image courtesy of Ivan Herman, Semantic Web Activity Lead
1.4 The Linking Open Data dataset cloud, representing an RDF graph of over two billion RDF triples interlinked by around 3 million RDF links, as at October 2007
1.5 RDF/XML syntax shown in the SWOOP ontology editor
1.6 OWL DL syntax shown in the Protégé ontology editor
1.7 OWL Abstract Syntax shown in the SWOOP ontology editor
1.8 OWL syntax shown in the SWOOP ontology editor
1.9 OWL Manchester Syntax shown in the Protégé ontology editor
1.10 Lars Marius Garshol’s diagram of ISO Topic Map standards vs W3C standards, 2005
1.11 Vocabularies exist within social and business contexts
1.12 Sample n2mate interface screen
1.13 Semiotic Triangle - Ogden and Richards’ 1923 version

8.1 SAIL Prototype System Architecture

1 Introduction

1.1 Leibniz’s Dream in the 21st Century

The dream of representing all human knowledge and reasoning with it may be traced back at least as far as the great German polymath Gottfried Leibniz (1646-1716), who dreamed of a universal language which could represent any concept, combined with a theoretical logical calculation framework for the purpose of reasoning. In 1677, in his Preface to the General Science [82], Leibniz observed:

“It is obvious that if we could find characters or signs suited for expressing all our thoughts as clearly and as exactly as arithmetic expresses numbers or geometry expresses lines, we could do in all matters insofar as they are subject to reasoning all that we can do in arithmetic and geometry. For all investigations which depend on reasoning would be carried out by transposing these characters and by a species of calculus.”

Leibniz identified two components as necessary for the realization of this dream (as recorded by Hintikka [69]):

1. a characteristica universalis or lingua characteristica, which was to be a universal language of human thought, and would have a symbolic structure directly reflecting the structure of the world of human thought. This ‘alphabet of human thought’ would be a universal and formal language able to express mathematical, scientific, and metaphysical concepts.

2. a calculus ratiocinator, which was a method of symbolic calculation which would mirror the processes of human reasoning. This was to provide a framework for universal logical calculation, where any difference of opinion could be resolved by recourse to calculation.

Using the characteristica universalis and calculus ratiocinator as instruments, Leibniz believed it would be possible to construct an encyclopedia that would be the key to all the sciences and a compendium of all human knowledge, and hoped to complete such a project. However, whilst his lifetime achievements were nothing short of monumental, he unfortunately never achieved this dream. In a March 1706 letter to the Electress Sophia of Hanover, he wrote:

“It is true that I once planned a new method of calculation proper to subjects having nothing in common with mathematics, and if this manner of calculation were put into practice, all reasoning, even analogical ones, would be carried out in a


mathematical way. Then modest intellects could, with diligence and good will, not accompany but at least follow greater ones. For one could always say ‘let us calculate’ and judge properly, insofar as reason and the data can furnish us the means to do so. But I do not know whether I will ever be able to execute such a project, one requiring more than one hand, and it would even seem that humanity is not yet sufficiently mature to pretend to the advantages to which this method could lead.” Translation of a passage reproduced in [26], p. 118, 1706.

Whilst unachieved in his lifetime, Leibniz’s grand vision has continued to inspire others through the centuries. Hintikka argues that Leibniz’s vision inspired two strands of research tradition, corresponding to the two components identified [69]. He argues that the ‘algebraic school’, represented by Boole, Peirce, and Schröder, were inspired by the calculus ratiocinator and sought to develop mathematical techniques for capturing forms of reasoning, whilst the characteristica universalis was the primary aim of Frege’s Begriffsschrift [43]. However, Frege’s approach was later criticized by Husserl and Tarski who argued that a lingua characteristica cannot be purely formal. This criticism seems appropriate in light of the characteristica universalis being intended to capture the world of human thought: unless it can be shown that human thought operates as a formal system, the approach does indeed seem to require some means to bridge between whatever the language of human thought is on the one hand, and the logical formalism on the other. Leibniz himself appreciated the difficulty of constructing a suitable characteristica universalis, but remarked “I think that some selected men could finish the matter in five years” [98], p. 224. Unfortunately however, Leibniz never described any operational details or methods for attacking the project. The very idea has even been derided by some philosophers as an absurd fantasy: “Leibniz’s views about the systematic character of all knowledge are linked with his plans for a universal symbolism, a Characteristica Universalis. This was to be a calculus which would cover all thought, and replace controversy by calculation. The ideal now seems absurdly optimistic...” [119], p. ix. The challenge of creating a language which can capture any human concept whilst simultaneously supporting formal reasoning is indeed a formidable one.
It must answer deep questions about the nature of the world and how the mind represents it, as well as the nature of (correct) reasoning, and then grapple with the question of how to connect the two so that appropriate and accurate logical calculations may be made. Today, the semantic web [16] is facing this problem. Using ontologies both as a means of capturing conceptualizations, and as a framework for reasoning by virtue of their formal semantics, it echoes Leibniz’s notions of being able both to capture any concept and to apply symbolic calculation to provide sound reasoning and the calculation of truth. Assisted by the Semantic Web’s interlinking techniques, the digital age now provides the means to implement the formal methods Leibniz inspired on an unprecedented scale across vast distributed arrays of data, and potentially deliver the results of reasoning processes virtually instantaneously. Where Leibniz imagined creating an encyclopedia that would be the work of a few learned men, the Semantic Web can take advantage of the conceptual knowledge of multitudes of men and women. And today, we may hope that humanity, or at least a significant portion of it, is now mature enough to appreciate the advantages such an advance can bring. However, the unresolved questions raised by the lingua characteristica continue to loom. Whilst the formal semantics of ontologies and their associated inferencing procedures are firmly rooted in the algebraic tradition, the ontologies themselves usually seek to represent some domain of interest that ventures outside the realm of formal systems and into the conceptual and physical worlds: that is, they seek to describe concepts in the minds of human beings, as well as concrete things in the world that exist independently not only of formal systems, but also of language and human concepts.
The question of bridging between the world of formal methods and the worlds of cognitive processes and physical things seems to be one that the Semantic Web must, by its very nature, address. The underlying challenges are addressed in depth in the later sections on formal semantics at Sections 1.7 and 1.8 and foundational issues in meaning at Section 1.10.6 and Chapter 7.

1.2 The Semantic Web Vision

In inventing the World Wide Web, Sir Tim Berners-Lee was seeking to implement a mechanism that would be both a collaborative framework for sharing knowledge, and would also be machine-understandable so that “the networks, operating systems and commands... become invisible, and leave us with an intuitive interface as directly as possible to the information” [13]. The World Wide Web which took off so dramatically in the 1990s provided the ability to share electronic information on an unprecedented scale, using the simple mechanisms of hypertext and unique resource identifiers on top of the existing Transmission Control Protocol/Internet Protocol (TCP/IP). Notably, its main achievement was the ability to display information to humans, rather than the ability for machines to process that information. However, Berners-Lee’s original plan had also envisaged that the Web should have semantics that would make the information shared on it machine processable. His original ‘Enquire’ program created at CERN, on which his plan for the web was based, had

“assumed that every page was about something. When you created a new page it made you say what sort of thing it was: a person, a piece of machinery, a group, a program, a concept, etc. Not only that, when you created a link between two nodes, it would prompt you to fill in the relationship between the two things or people. For example, the relationships were defined as ‘A is part of B’ or ‘A made B’.” [13].

In order to make the World Wide Web a web of well-defined, machine-processable information, Berners-Lee proposed a need for what he called a ‘Semantic Web’ in the late 1990s.

“I have a dream for the Web [in which] machines become capable of analyzing all the data on the Web - the content, links, and transactions between people and computers. A ‘Semantic Web’, which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines.... The ‘intelligent agents’ people have touted for ages will finally materialize.” Tim Berners-Lee, ‘Weaving the Web’, p. 169, 1999 [15].

The Scientific American article of 2001 [16], which launched the Semantic Web vision on a wider public, described a utopian future in which unified data would be utilized by intelligent agents and accessed via any desktop or handheld device to perform tasks like finding medical practitioners that met certain criteria and automatically booking medical appointments at the most appropriate locations and times. In this vision, machines had the ability to process information according to its meaning, and the article even referred to computers being able to ‘understand’ information and ‘know’ how to process it according to its meaning. In this article, the idea of defining meaning and making it machine processable was paramount, with Berners-Lee, Hendler and Lassila stating that “the Semantic Web is an extension of the current web in which information is given well-defined meaning, better enabling computers and people to work in co-operation” [16]. The World Wide Web Consortium (W3C) leads the co-ordination of Semantic Web activities worldwide, and its current Semantic Web Activity Statement [154] explains the Semantic Web as a vision as follows:

“The Web can reach its full potential only if it becomes a place where data can be shared and processed by automated tools as well as by people. For the Web to scale, tomorrow’s programs must be able to share and process data even when these programs have been designed totally independently. The Semantic Web is a vision: the idea of having data on the web defined and linked in a way that it can be used by machines not just for display purposes, but for automation, integration and reuse of data across various applications.”

As the Semantic Web technologies have developed, the Semantic Web is no longer only a vision, but also refers to the technical recommendations published by the W3C and the various applications that implement those recommendations. The December 2007 Scientific American article [39] which revisited and updated the progress on the 2001 vision [16], emphasised this aspect of the Semantic Web. Keeping the notion of understanding and meaning in the picture, it described the Semantic Web as:

“A set of formats and languages that find and analyze data on the World Wide Web, allowing consumers and businesses to understand all kinds of useful online information.”

Today in 2008, the World Wide Web Consortium (W3C) describes the Semantic Web on its home page as follows [157]:

“The Semantic Web is about two things. It is about common formats for integration and combination of data drawn from diverse sources, where the original Web mainly concentrated on the interchange of documents. It is also about language for recording how the data relates to real world objects. That allows a person, or a machine, to start off in one database, and then move through an unending set of databases which are connected not by wires but by being about the same thing.”

Clearly the idea of creating a coherent view of the world that machines can traverse and use is an important part of the Semantic Web vision. The theme of making the meaning of information explicit and machine processable, so that machines ‘understand’ and ‘know’ what to do with it, is also a recurrent one in the Semantic Web vision. It is reasonable then to assume that the published W3C recommendations and the applications that adhere to them are, at least to some extent, intending to deliver this or at least to move towards it as a goal. The following sections present the Semantic Web technologies that have been delivered to date, as well as some of the W3C’s future plans, and then consider to what extent these are actually able to deliver this envisaged ‘web of meaning’.

1.3 The Semantic Web Effort

The W3C is leading the Semantic Web implementation by providing language and technology recommendations that build on existing W3C-endorsed World Wide Web standards. Other organizations, both academic and corporate, are providing tools such as editors and reasoners that implement the W3C recommendations. Semantic Web applications use ontologies and reasoners in conjunction with traditional web programming technologies and databases.

1.3.1 W3C Semantic Web technology stack

As the W3C’s Semantic Web Activity statement [154] describes, the principal technologies of the Semantic Web fit into a set of layered specifications, with the core components being RDF and OWL. As shown in Berners-Lee and Swick’s current Semantic Web ‘layer cake’ diagram [17] in Figure 1.1, the W3C has sought to implement a stack of technologies and standards, each building on and extending the achievements of the previous layer. Broadly speaking, the lower layers (up to RDF) are completed and stable; the middle layers (OWL and SPARQL) have finalized initial, implementable offerings but are still undergoing evolution and extension (RIF is currently under active development); whilst the higher and rightmost layers (Unifying Logic, Proof, Trust and Crypto) are in an exploratory phase where suitable approaches are being identified. A description of each element of the stack and its current status follows.

• Unicode: a standardized character set and encoding for computing purposes, allowing most characters of the world’s writing systems to be represented. Status: Unicode is developed by the Unicode Consortium in conjunction with the International Organization for Standardization and its character set is captured in ISO/IEC 10646: the Universal Character Set. It is commonly used across many computing platforms and is not specific to the Semantic Web.

• URI: Uniform Resource Identifier. A string of characters used to uniquely identify or name a resource. The resource may be anything at all: an information resource, a real world thing like a person, an idea or anything else. If it is an information resource, the URI can usually be dereferenced to obtain the actual resource through a protocol such as HTTP or FTP. Status: URIs are one of the basic mechanisms underpinning the World Wide Web. The current generic URI syntax specification is RFC 3986 / STD 66 (2005).

• XML: Extensible Markup Language. A general-purpose specification for creating custom markup languages by allowing users to define and structure their own elements. XML may be used to impose both a syntax and a structure to which conforming documents must adhere. It facilitates the sharing of structured data across information systems. Status: XML was formalized via the W3C in the late 1990s and the current W3C XML Recommendation is dated August 2006 [20]. It is used across many computing applications and is not specific to the Semantic Web.

• RDF: Resource Description Framework. Allows the user to make statements about resources in the form of subject-predicate-object expressions, called ‘triples’ in RDF terminology. These directed triples combine to form an RDF graph, described

Figure 1.1 Berners-Lee and Swick’s current Semantic Web Technology Stack or ‘layer cake’

in detail in Section 1.4.1. The standard exchange syntax is XML. Status: RDF first became a W3C recommendation in 1999 [94]; the current revision is dated 10 February 2004 [34].
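To make the triple model concrete, the following sketch encodes a small RDF graph as plain Python tuples. It is purely illustrative: the namespace and resource names are invented for the example, and a real application would use a dedicated RDF library rather than raw tuples.

```python
# Illustrative only: a tiny RDF graph encoded as plain Python tuples.
# The namespace and resource names below are invented for the example.

EX = "http://example.org/"  # hypothetical namespace prefix

# Each RDF statement is a (subject, predicate, object) triple.
triples = {
    (EX + "AnneCregan",  EX + "authorOf", EX + "Thesis2008"),
    (EX + "Thesis2008",  EX + "topic",    EX + "SemanticWeb"),
    (EX + "SemanticWeb", EX + "buildsOn", EX + "WorldWideWeb"),
}

def objects(graph, subject, predicate):
    """Return all objects of triples with the given subject and predicate."""
    return {o for (s, p, o) in graph if s == subject and p == predicate}

# The triples form a directed graph that can be traversed edge by edge:
# from the thesis resource, follow the 'topic' edge.
topic = objects(triples, EX + "Thesis2008", EX + "topic")
```

Because every node is named by a URI, two graphs built independently can be merged simply by taking the union of their triple sets, which is the basis of the interlinking discussed in Section 1.4.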

• RDF-S: RDF-Schema. RDF’s Vocabulary Description Language, RDF-Schema is an extensible knowledge representation language, providing basic elements such as classes and properties intended to structure RDF resources. It is described in detail at Sections 1.5.3 and 1.5.4. Status: The first version was published by W3C in April 1998, and the final W3C recommendation, introducing class and subclass relations and domain and range on properties was released in February 2004 [21].
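The class and subclass machinery just described supports a simple but useful form of entailment: an instance of a class is also an instance of every superclass. A minimal sketch of this propagation, using invented class names and plain tuples in place of real RDF terms:

```python
# Illustrative only: the core RDFS entailment that an instance of a class
# is also an instance of every superclass. Class names are invented.

subclass_of = {            # pairs meaning (subclass, superclass)
    ("Dog", "Mammal"),
    ("Mammal", "Animal"),
}
types = {("Fido", "Dog")}  # pairs meaning (instance, class)

def entail_types(types, subclass_of):
    """Propagate type assertions along subclass links to a fixpoint."""
    inferred = set(types)
    changed = True
    while changed:
        changed = False
        for (inst, cls) in list(inferred):
            for (sub, sup) in subclass_of:
                if sub == cls and (inst, sup) not in inferred:
                    inferred.add((inst, sup))
                    changed = True
    return inferred

all_types = entail_types(types, subclass_of)
# Fido is now inferred to be a Mammal and an Animal as well as a Dog.
```

Real RDFS reasoners perform this and similar propagations (for example over property domains and ranges) directly on the triple representation.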

• SPARQL: SPARQL Protocol and RDF Query Language. SPARQL is a language for querying RDF graphs. It is designed to query collections of RDF triples either natively or via middleware, and provides similar querying functionality for an RDF data store as SQL provides for relational databases. Output may be represented either as RDF graphs or as result sets. A SPARQL endpoint enables users to query a knowledge base via the SPARQL language, and may be used by humans, software or Web Services. Status: SPARQL became a W3C Recommendation on 15th January 2008 [132].
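The core of SPARQL query evaluation is matching triple patterns, in which variables may stand in any position, against a set of RDF triples. The following minimal sketch illustrates the idea over an in-memory set of triples; the `ex:` identifiers and the `match` function are illustrative only and are not part of any real SPARQL engine.

```python
# Illustrative sketch of SPARQL-style triple pattern matching.
# Variables are strings beginning with '?'; all other terms must match exactly.

triples = {
    ("ex:anne", "ex:firstname", "Anne"),
    ("ex:anne", "ex:worksAt", "ex:nicta"),
    ("ex:bob",  "ex:worksAt", "ex:nicta"),
}

def match(pattern, triples):
    """Yield a dict of variable bindings for each triple matching the pattern."""
    for triple in triples:
        binding = {}
        for p, t in zip(pattern, triple):
            if p.startswith("?"):
                binding[p] = t          # bind the variable to this term
            elif p != t:
                break                   # constant term fails to match
        else:
            yield binding

# Analogue of: SELECT ?who WHERE { ?who ex:worksAt ex:nicta }
results = sorted(b["?who"] for b in match(("?who", "ex:worksAt", "ex:nicta"), triples))
print(results)  # ['ex:anne', 'ex:bob']
```

A full SPARQL engine additionally joins bindings across multiple patterns, but each basic graph pattern is resolved in essentially this way.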

• OWL: Web Ontology Language. An ontology language for the Web which extends the core RDF-S components. OWL has three flavours: Lite, DL (Description Logic) and Full, each adding further expressivity. OWL is described in detail at Section 1.5.6. Status: The original OWL specifications became a W3C Recommendation on 10 February 2004 [31, 102, 122, 144]. A W3C OWL Working Group is currently working on further extensions to the OWL language, originally referred to as OWL 1.1 [121] and currently as OWL 2 [108].

• RIF: Rule Interchange Format. The W3C’s RIF Working Group [156] is chartered to produce a core rule language plus extensions which together allow rules to be translated between rule languages and thus transferred between rule systems. Rather than creating a new rule language, the focus is on creating a standard interchange format for existing rule languages such as the Semantic Web Rule Language (SWRL) [76]. Status: This work is currently in active development: six RIF working documents were issued for comment by the W3C in August 2008.

• SWRL: Semantic Web Rule Language. SWRL [76] is not shown explicitly in the Semantic Web stack but is included in this list as it is the most commonly used Semantic Web rule language. It utilises Horn clauses: essentially, the intended meaning of a SWRL rule is that whenever all the antecedent conditions in the clause are true, the consequent condition must also hold. SWRL’s abstract syntax extends that of OWL, and the formal meaning of OWL ontologies with SWRL rules is given as an extension of OWL’s model-theoretic semantics. For compatibility with RDF and OWL, SWRL also uses URIs for resource identification. SWRL contains built-ins for comparisons, mathematical operations, Boolean values, strings, date and time, URIs and lists. Status: SWRL became a W3C member submission in May 2004.
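The Horn-clause semantics of SWRL rules can be sketched as follows: whenever every antecedent triple pattern matches the known facts, the consequent is asserted. The rule, predicates and individuals below (hasParent, hasBrother, hasUncle) are hypothetical examples, not part of the SWRL specification itself.

```python
# Sketch of a single SWRL-style Horn rule applied by forward chaining:
#   hasParent(?x, ?y) AND hasBrother(?y, ?z)  ->  hasUncle(?x, ?z)

facts = {
    ("ex:anne",  "ex:hasParent",  "ex:carol"),
    ("ex:carol", "ex:hasBrother", "ex:dave"),
}

def apply_uncle_rule(facts):
    """Return the set of consequent triples entailed by the rule."""
    inferred = set()
    for (x, p1, y) in facts:
        if p1 != "ex:hasParent":
            continue
        for (y2, p2, z) in facts:
            # Both antecedents match with a shared binding for ?y
            if p2 == "ex:hasBrother" and y2 == y:
                inferred.add((x, "ex:hasUncle", z))
    return inferred

print(apply_uncle_rule(facts))  # {('ex:anne', 'ex:hasUncle', 'ex:dave')}
```

A real SWRL reasoner iterates such rule applications to a fixed point, and integrates the inferred triples with OWL entailment.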

Whilst the lower layers are well-defined, little has been formally written about the upper layers and ‘Crypto’. Thus these are the author’s interpretations based on her own W3C involvement.

• Unifying Logic The logical underpinning or formal semantics of RDF, RDF-S, OWL, SPARQL and RIF that allows them to be used in conjunction with inference engines. This semantics allows them to interoperate on shared data and produce consistent results. As expert John Sowa put it in a 2007 email to Pat Hayes which copied in the ontolog-forum [147], “Unifying Logic is the framework that includes the others as subsets: RDF, RDF-S, Rule RIF, OWL, and SPARQL. Each of these subsets is tailored for a specific kind of and/or a specific range of uses. What unifies them is the common model-theoretic semantics. That semantics enables all of them to interoperate on shared data and produce consistent results.”

• Proof Formal semantics gives the ability to provide proofs for the inferences made on the Semantic Web. In addition, it is desirable not only to give a formal proof, but also to provide a human-understandable justification for the inference, such as that given by SWOOP [86].

• Trust Providing proof and justification for inferences made gives users the confidence to trust the results of machine processing. Trust is essentially a social and subjective human phenomenon, merely supported by the technologies. However, the work of the W3C Protocol for Web Description Resources (POWDER) Working Group described at Section 1.3.2 is relevant to both Proof and Trust for its ability to provide verifiable assertions about Web resources.

• Crypto The inclusion of this element in the Semantic Web stack is simply an acknowledgement of the importance of cryptography for ensuring the privacy and security of data shared and distributed via the web.

• User Interface and Applications This recent (late 2006) inclusion to the Semantic Web technology stack is an acknowledgement of the importance of the user interface. Whilst the end user will see the user interface and applications, the semantic technologies that power these and make them appear to ‘understand’ the user and provide machine processing of information are ‘under the hood’ and thus should essentially be transparent to the user.

1.3.2 Additional Items

The W3C Semantic Web Activity statement [154] includes some additional activities and specifications of interest outside those shown explicitly in the Semantic Web layer cake diagram, including those of the GRDDL, POWDER and SKOS working groups, summarized below.

• GRDDL: (Gleaning Resource Descriptions from Dialects of Languages) is a mechanism for embedding extractable RDF in XML and XHTML documents. Using the GRDDL specification [33], one may use markup based on existing standards to declare that an XML document includes Resource Description Framework (RDF) data and link to algorithms (typically represented in XSLT) for extracting this data from the document. It thus provides a useful bridge between RDF and existing web standards. GRDDL became a W3C Recommendation on 11 September 2007.

• POWDER: (Protocol for Web Description Resources) As its charter explains [151], this W3C working group’s mission is to develop a mechanism through which structured metadata (“Description Resources”) can be authenticated and applied to groups of Web resources. This mechanism will allow retrieval of the description resources without retrieval of the resources they describe, thus providing the ability to make verifiable assertions about groups of web resources. Being in RDF, a Description Resource can be joined to other RDF data and thus utilized by RDF-based searches and other services.

• SKOS: (Simple Knowledge Organization System) [152] is developing standards for representing the basic structure and content of concept schemes such as thesauri, classification schemes, subject heading lists, and other types of controlled vocabulary in RDF. Many of the knowledge organization systems mentioned share a similar structure, and this commonality is described by the SKOS data model, which describes and documents the concepts used in such schemes, making them explicit and shareable via the Web. SKOS provides language for capturing concepts, definitions and (linguistic) semantic relations between terms such as broader or narrower. The SKOS data model is formally defined as an OWL Full ontology.

1.3.3 Semantic Web Tools and Applications

Semantic Web tools provide functionality for building, using and mapping ontologies. Some of the key areas are:

• Editors to assist in ontology building. These provide a graphical user interface for building and displaying ontologies. Usually they work in conjunction with reasoners for consistency checking. Some leading examples include the free, open source Protégé [41] editor developed by the Stanford Center for Biomedical Informatics Research, and the commercial product TopBraid Composer [149], amongst many others.

• Programming Environments for building Semantic Web applications include Jena [130] and Sesame [1] for Java development and many others across a wide range of programming languages including Python, C, C++, C#, JavaScript, PHP, Lisp, Perl, Ruby and Haskell.

• Reasoners to implement the formal semantics of the ontology language and provide various reasoning services including consistency checking, inferencing, etc. Leading examples include RacerPro [60], FaCT++ [114], Pellet [96], and KAON2 [42].

• Mapping and Alignment Ontologies often need to be mapped together: tools can assist in identifying the classes and individuals to be mapped, typically via a graphical interface. Leading examples include MAFRA [143].

• Querying tools provide support for query building and execution over ontologies, including provision of SPARQL endpoints e.g. the Talis Platform [56].

• Browsers traverse RDF graphs and retrieve data directly by dereferencing URIs. Leading examples include Disco [11] (primarily server side) and Tabulator [37] (client side).

• RDF Data Stores As Semantic Web applications may involve many thousands or millions of RDF triples, database technology is needed to store and access these on demand. Sesame [1] provides an open source RDF database with support for RDF Schema inferencing and querying, whilst Oracle has started to offer RDF data store support from version 10gR2 onwards [25].

Early in its development, Semantic Web applications tended to be small prototypes built in academic environments. The yearly Semantic Web Challenge [52] has charted the evolution of Semantic Web applications into large-scale real-life software. The 2007 Scientific American article [39] which reported on the overall progress of the Semantic Web mentions a number of leading real-world examples of applied Semantic Web technology. It covers Semantic Web tools provided on Science Commons for sharing of scientific data, which are used by Pfizer and others for drug discovery, and public health problem detection via a system called SAPPHIRE, which integrates and uses data from local health care providers, hospitals, environmental protection agencies and scientific literature using Semantic Web technologies. Currently Semantic Web technologies are at the exploratory and early adopter phases, rather than being in mainstream use. However, a shift towards mainstream adoption may soon occur, as more traditional technologists become aware of these technologies and start to integrate them into corporate and government frameworks, and researchers join new commercial enterprises to apply semantic technologies in commercial and public domains.

Figure 1.2 Example of an RDF Triple

1.4 A Web of Data

As discussed above, the semantic web is envisaged to be a web of data that has well-defined meaning. In order to become a web it requires some kind of mechanism for interlinking data. RDF [101] provides this mechanism, enabling data in the form of URIs, data values and blank nodes to be structured into a directed graph of RDF triples, referred to as an ‘RDF Graph’.

1.4.1 RDF Graphs

The key idea of RDF [101] is that it structures data into directed triples that may be combined to form a directed graph. The triples correspond to Subject-Predicate-Object assertions, and the resulting graph may be traversed by RDF browsers such as Tabulator [37] or Disco [11], and may be queried using the SPARQL [132] Query Language. An RDF graph may also be serialized as XML [34]. An RDF triple contains three components:

1. the subject, which is an RDF URI reference or a blank node

2. the predicate, which points from subject to object, and is an RDF URI reference

3. the object, which is an RDF URI reference, a literal or a blank node

RDF triples correspond to assertions that connect subjects to objects via predicates, also referred to as properties. As shown in the example at Figure 1.2, an RDF triple may assert that the resource identified by www.nicta.com.au/people/cregana (Subject) has :firstname (Predicate) of “Anne” (Object). An RDF triple is conventionally written in the order subject, predicate, object. As each element of the triple may be a URI, RDF enables distributed information to be woven together. An RDF graph is simply a set of RDF triples. The set of nodes of an RDF graph is the set of subjects and objects of triples in the graph, and its edges are the predicates of those triples.
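These definitions can be illustrated directly in code: a graph is a set of (subject, predicate, object) tuples, from which the node and edge sets fall out immediately. The triples below extend the Figure 1.2 example with a second, hypothetical triple for illustration.

```python
# An RDF graph modelled as a set of (subject, predicate, object) triples.
# Nodes are the subjects and objects; edges are the predicates.

graph = {
    ("www.nicta.com.au/people/cregana", ":firstname", "Anne"),
    ("www.nicta.com.au/people/cregana", ":worksAt", "www.nicta.com.au"),
}

nodes = {s for (s, p, o) in graph} | {o for (s, p, o) in graph}
edges = {p for (s, p, o) in graph}

print(len(nodes), len(edges))  # 3 2
```

Note that the literal “Anne” counts as a node, exactly as in the formal definition above.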

Literals

A literal in the object position is simply a data value. XML Schema (xsd) Datatypes [18] may be used if a datatype for the literal is required.

Figure 1.3 RDF Graphs may be interlinked to form a semantic structure interlinking disparate datasets. Image courtesy of Ivan Herman, Semantic Web Activity Lead.

Blank Nodes

Blank nodes are used in the object position when an extra node is needed for the purpose of structuring the data model but there is no need for the node to be externally accessible, and thus it does not need to be named and its identity may be left blank.

Reification

RDF also has vocabulary for reification which permits an RDF triple to be reified as a resource in its own right, enabling other RDF triples to make assertions about it.
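Reification can be sketched as follows: the statement itself is given a URI and described via rdf:subject, rdf:predicate and rdf:object, after which further triples can refer to it. The identifiers ex:stmt1 and ex:assertedBy below are hypothetical.

```python
# Sketch of RDF reification: a triple becomes a resource in its own right,
# so other triples can make assertions about it.

statement = ("ex:anne", "ex:worksAt", "ex:nicta")

reified = {
    ("ex:stmt1", "rdf:type",      "rdf:Statement"),
    ("ex:stmt1", "rdf:subject",   statement[0]),
    ("ex:stmt1", "rdf:predicate", statement[1]),
    ("ex:stmt1", "rdf:object",    statement[2]),
    # A further assertion about the statement itself:
    ("ex:stmt1", "ex:assertedBy", "ex:hr_department"),
}
print(len(reified))  # 5
```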

1.4.2 Interlinking for Semantic Interoperability

RDF graphs may be interlinked on common nodes to form a larger RDF graph, as shown at Figure 1.3. Such a structure provides semantic interoperability between the interlinked datasets, as the structure created may then be queried and processed as a whole. Exposing data held in distinct data stores through the use of RDF provides the ability to visualize, query and reason across the combined datasets regardless of origin.
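Because RDF graphs are simply sets of triples, interlinking two graphs that share a URI amounts to set union, after which queries can traverse across the join. The two small graphs and all identifiers below are hypothetical illustrations.

```python
# Two RDF graphs sharing the common node ex:anne merge by set union,
# and a query can then traverse across the originally separate datasets.

staff_graph = {
    ("ex:anne", "ex:worksAt", "ex:nicta"),
}
publications_graph = {
    ("ex:thesis1", "ex:author", "ex:anne"),
    ("ex:thesis1", "ex:title", "Weaving the Semantic Web"),
}

combined = staff_graph | publications_graph

# Traverse across the join: where does the author of ex:thesis1 work?
author = next(o for (s, p, o) in combined
              if s == "ex:thesis1" and p == "ex:author")
employer = next(o for (s, p, o) in combined
                if s == author and p == "ex:worksAt")
print(employer)  # ex:nicta
```

The Linking Open Data cloud described below is this same operation carried out at the scale of billions of triples.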

Figure 1.4 The Linking Open Data dataset cloud, representing an RDF graph of over two billion RDF triples interlinked by around 3 million RDF links, as at October 2007

1.4.3 The Linking Open Data Project

Recently a W3C Semantic Web Education and Outreach (SWEO) Community Project called ‘Linking Open Data’ [36] has created what is essentially an open data commons in the form of an RDF graph. By publishing open data as RDF triples and linking on common URIs, an RDF graph has been created that consists of over two billion RDF triples interlinked by around 3 million RDF links (as at October 2007). The dataset cloud, shown at Figure 1.4, represents the largest RDF graph created to date. The construction of this shared, interlinked RDF graph may be taken as incontrovertible evidence that the ‘Semantic Web’ as a web of data on a massive scale is no longer just an idea, but is an actual artefact with a real and tangible existence.

1.5 Describing Data using Ontology Languages

Whilst RDF [101] provides the ability to construct directed RDF graphs, the ability to describe data is provided by ontology languages which include RDF-Schema [21] and the various flavours of the Web Ontology Language OWL [144].

1.5.1 What is an Ontology?

The word ‘ontology’ is borrowed from philosophy, where it refers to the study of what exists. Gruber’s 1993 definition [58] that “an ontology is a specification of a conceptualization” is the one most often cited in IT and AI contexts. According to Noy and McGuinness [113], who have been closely involved in both the evolution of OWL [102, 144] and of the ontology building tool Protégé [41], an ontology defines a common vocabulary for researchers who need to share information in a domain, and includes machine-interpretable definitions of basic concepts in the domain and relations among them. In detail, Noy and McGuinness state “an ontology is a formal explicit description of concepts in a domain of discourse (classes (sometimes called concepts)), properties of each concept describing various features and attributes of the concept (slots (sometimes called roles or properties)), and restrictions on slots (facets (sometimes called role restrictions)). An ontology together with a set of individual instances of classes constitutes a knowledge base.”

1.5.2 RDF/OWL Ontologies

Embracing Noy and McGuinness’s notion of ontology in the context of RDF/RDF-S and OWL, an ontology may be considered to be a structure that describes individuals, classes, properties and the relationships between them. Intuitively, the basic entities of an ontology may be introduced as follows:

1. Individuals. These are the basic entities of the universe of discourse. Typically, these exist independently of the knowledge structure, and are named rather than defined.

2. Classes. These are intended to group individuals who have something in common. Classes are often also referred to as Concepts, particularly within the Description Logic tradition.

3. Properties. These may be thought of as attributes of individuals, that connect them to other individuals or to data values. At the class level, these may be restricted in various ways to reflect the entities being described.

1.5.3 RDF-Schema

As the W3C’s RDF Schema Recommendation [21] explains, whilst RDF properties represent relationships between resources, they provide no mechanisms for describing these properties, or the relationships between these properties and other resources. This is the role of the RDF vocabulary description language, RDF Schema [21], which is a semantic extension of RDF that provides mechanisms for describing groups of related resources and the relationships between these resources. Rather than providing a vocabulary of specific descriptive properties such as ‘author’, RDF Schema specifies mechanisms that may be used to name and describe properties and the classes of resource they describe in general. RDF Schema thus provides vocabulary descriptions written in RDF that may be used to determine the characteristics of other resources. For instance, RDF Schema permits the specification of domains and ranges of RDF properties, by virtue of creating the constructs rdfs:domain and rdfs:range for use in specifying the domains and ranges of specific user-defined RDF properties (instances of rdf:Property).

1.5.4 RDF-Schema Vocabulary

In the following discussion, the prefix rdf: indicates use of the namespace http://www.w3.org/1999/02/22-rdf-syntax-ns# and the prefix rdfs: indicates use of the namespace http://www.w3.org/2000/01/rdf-schema#. Following the RDF Schema specification [21], RDF Schema is considered to include constructs such as rdf:type that were originally introduced by the 1999 RDF Model and Syntax specification and are in the rdf: namespace. ‘Appendix A: RDF Schema as RDF/XML’ of the W3C’s February 2004 RDF-Schema Recommendation [21] is taken as indicative of the vocabulary included in RDF-Schema.

Resources

The most inclusive construct of RDF Schema is rdfs:Resource. All things described by RDF are considered to be resources, and are instances of the class rdfs:Resource, which may be considered to be the class that includes everything. Note that resources are intended to encompass anything at all that users may wish to refer to, and should be considered to encompass real world objects and human conceptualizations as well as information resources. URIs provide the mechanism for indicating which resource is being referred to.

Classes, SubClasses and Instances

RDF Schema introduces rdfs:Class, the class of classes. Using this construct in combination with rdf:type it becomes possible to assert that a given RDF resource is an instance of an RDF class. The RDF property rdfs:subClassOf then permits RDF classes to be arranged in a subsumption hierarchy. Note that all other classes are a subclass of the class rdfs:Resource, whilst rdfs:Resource is itself an instance of rdfs:Class. The formal semantics of such assertions has ramifications that are discussed in the following Section 1.5.5.
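The intended effect of rdf:type combined with rdfs:subClassOf is that class membership propagates up the subsumption hierarchy. The sketch below computes this propagation to a fixed point; the ex: classes and individual are hypothetical examples.

```python
# Sketch of the RDFS subclass entailment: if an individual has rdf:type C,
# and C rdfs:subClassOf D, then the individual also has rdf:type D.

subclass_of = {
    ("ex:Student", "ex:Person"),
    ("ex:Person", "rdfs:Resource"),
}
types = {("ex:anne", "ex:Student")}

def entailed_types(ind, types, subclass_of):
    """Return all classes the individual belongs to, directly or by subsumption."""
    result = {c for (i, c) in types if i == ind}
    changed = True
    while changed:                      # iterate to a fixed point
        changed = False
        for (sub, sup) in subclass_of:
            if sub in result and sup not in result:
                result.add(sup)
                changed = True
    return result

print(sorted(entailed_types("ex:anne", types, subclass_of)))
# ['ex:Person', 'ex:Student', 'rdfs:Resource']
```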

Properties and SubProperties

rdf:Property is the class of RDF properties, a subclass of rdfs:Resource. rdfs:subPropertyOf enables one to assert that the pairs connected by a particular RDF property are a subset of the extension of pairs connected by the parent property.

Domains and Ranges

The RDF-S domain and range mechanisms rdfs:domain and rdfs:range enable specification of the classes that are the domain and range of an RDF property. Intuitively, these may be considered to constrain the property to connect only pairs from these classes; formally, the RDF-S semantics treats them as licensing inferences about the types of the subjects and objects the property connects.
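This inferential reading of rdfs:domain and rdfs:range can be sketched as follows: using a property lets one infer rdf:type assertions about its subject and object. The ex: property and classes are hypothetical examples.

```python
# Sketch of rdfs:domain / rdfs:range entailment: a triple using a property
# entails type assertions for its subject (domain) and object (range).

domain = {"ex:worksAt": "ex:Person"}
range_ = {"ex:worksAt": "ex:Organisation"}

def infer_types(triple):
    """Return the rdf:type triples entailed by domain/range declarations."""
    s, p, o = triple
    inferred = set()
    if p in domain:
        inferred.add((s, "rdf:type", domain[p]))
    if p in range_:
        inferred.add((o, "rdf:type", range_[p]))
    return inferred

print(sorted(infer_types(("ex:anne", "ex:worksAt", "ex:nicta"))))
# [('ex:anne', 'rdf:type', 'ex:Person'), ('ex:nicta', 'rdf:type', 'ex:Organisation')]
```

Note that nothing is rejected: unlike a database schema constraint, an unexpected triple simply yields new type inferences.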

Other RDF Schema vocabulary RDF Schema also provides vocabulary for:

• specifying lists and containers. This includes rdf:List and rdfs:Container with its associated subclasses rdf:Bag, rdf:Seq and rdf:Alt.

• reifying RDF statements so they may themselves be treated as RDF resources. This is enabled by the RDF class rdf:Statement, and RDF properties rdf:subject, rdf:predicate, rdf:object.

• annotation purposes, that is, for humans to refer to. This includes the RDF properties rdfs:label, rdfs:comment, rdfs:seeAlso and rdfs:isDefinedBy.

1.5.5 RDF-Schema Design Choices

Whilst the RDF vocabulary description language class and property system is similar to the type systems of object-oriented programming languages such as Java [54], it differs from many such systems in that instead of defining a class in terms of the properties its instances may have, it describes properties in terms of the classes of resource to which they apply. For example, the rdfs:subClassOf property described above is characterized by having both its domain and range set to rdfs:Class, whilst rdfs:Class is not defined in terms of the properties it may have. This design choice was made in order to enable adding to the definition of existing resources after their creation. One of the architectural principles of both the Web and the Semantic Web is openness: to different viewpoints and to the addition of more material. This design choice embraces the ‘Open World’ principle: when an RDF-Schema class is created, there is no assumption that all its properties are specified, so anyone may add additional properties to it.

1.5.6 Web Ontology Language (OWL)

The Web Ontology Language OWL [31, 102, 122, 144] provides additional vocabulary for describing data. It facilitates greater machine interpretability of Web content than that supported by XML, RDF, and RDF Schema (RDF-S), by providing a formal semantics for this additional vocabulary [122].

OWL Ontologies

An OWL ontology contains a sequence of axioms, annotations and facts. Annotations on OWL ontologies may be used to record human-readable information associated with an ontology, including authorship and imports references to other ontologies. However, the main logical content of an OWL ontology is carried by its axioms and facts. These provide information about the classes, properties, and individuals of the ontology. As will be described below, classes, properties, individuals and the relationships between them make up the core content of an ontology.

OWL Ontologies and Meaning

OWL Ontologies are also intended to be a way of representing meaning explicitly. Quoting from the OWL Web Ontology Language Overview [102]:

“OWL can be used to explicitly represent the meaning of terms in vocabu- laries and the relationships between those terms. This representation of terms and their interrelationships is called an ontology. OWL has more facilities for expressing meaning and semantics than XML, RDF, and RDF-S, and thus OWL goes beyond these languages in its ability to represent machine interpretable con- tent on the Web. OWL is a revision of the DAML+OIL web ontology language incorporating lessons learned from the design and application of DAML+OIL.” Chapter 1 Introduction

OWL Ontologies as Virtual Structures

OWL ontologies may import other ontologies. owl:imports is an annotation property that has the additional effect of importing the target ontology. Additionally, every element of an ontology other than literals and blank nodes is identified by a URI, and may therefore reside anywhere. Whilst an ontology usually has its own namespace, it may include classes, properties and individuals from anywhere on the web. Therefore, an ontology should be considered to be a virtual structure that connects various pre-existing imported ontologies and referenced URIs with newly created material.

OWL Sublanguages

The 2004 OWL Specifications [31, 102, 122, 144] encompassed three sub-languages or ‘flavours’ of OWL that provide increasing levels of expressivity: OWL Lite, OWL DL, and OWL Full. Each flavour of OWL fully includes the previous one: every OWL Lite ontology is an OWL DL ontology, and every OWL DL ontology is an OWL Full ontology, whilst the converses do not hold. However, whilst OWL Lite and OWL DL are logically decidable, OWL Full is not.

• OWL Lite was designed to support users needing only a classification hierarchy and simple constraints, providing a quick migration path for existing artefacts such as thesauri. Whilst supporting cardinality constraints, these are limited to values of 0 or 1.

• OWL DL was designed to be in the ‘sweet spot’, maximizing expressiveness whilst retaining decidability. DL stands for Description Logic, which provides the formal semantics for both OWL DL and OWL Lite. Description Logics [5] are decidable fragments of First Order Logic, with their own specific notation. OWL DL includes all the OWL language constructs but limits their use: for example, cardinality restrictions may not be placed upon properties which are declared to be transitive.

• OWL Full is based on a different semantics from OWL Lite or OWL DL, and was designed to preserve compatibility with RDF Schema by including all the RDF Schema vocabulary as well as all the OWL vocabulary. OWL Full is thus far more expressive than OWL DL: for example, in OWL Full a class can be treated simultaneously as a collection of individuals and as an individual in its own right, whilst this is not permitted in OWL DL. Due to its greater expressiveness, reasoning in OWL Full is undecidable.

Relationship between OWL and RDF/RDF-Schema

Whilst OWL goes beyond RDF and RDF-Schema in its expressivity, it should be noted that only OWL Full includes the entire RDF Schema vocabulary. Thus whilst OWL Full can be viewed as an extension of RDF, OWL Lite and OWL DL must be viewed as extensions of a restricted view of RDF. Therefore, whilst every valid RDF document is a valid OWL Full ontology, it is not necessarily a valid OWL Lite or OWL DL ontology. RDF/RDF-Schema allow higher-order and non-extensional classes, whilst these are not allowed by either OWL Lite or OWL DL. For instance, RDF/RDF-Schema allows a resource to be both an individual and a class, and to be an instance of itself, whilst OWL DL and OWL Lite require a strict separation of classes, properties and individuals, and thus do not utilize all the RDF/RDF-Schema vocabulary. Specific details of the restrictions and vocabulary for each sublanguage are given at Section 1.5.8.

Reasoning over OWL Ontologies

The reason for these restrictions is to guarantee decidable reasoning over OWL Lite and OWL DL ontologies. OWL DL has a Description Logic formal semantics that gives it decidable reasoning properties that OWL Full lacks. OWL Lite also has a Description Logic formal semantics, with a lower computational complexity than OWL DL, so reasoning is typically faster. Roughly speaking, OWL Lite and OWL DL correspond respectively to the description logics known as SHIF(D) and SHOIN(D), with some limitation on how datatypes are treated [74]. Reasoning support for OWL Full is very limited and, generally speaking, the use of OWL Full is not advisable if reasoning services are required.

1.5.7 OWL Vocabulary

Compared to RDF/RDF-Schema, the OWL vocabulary provides additional capability for constructing class definitions and for restricting properties, as well as stating equality and inequality between classes, properties and individuals respectively. The discussion of vocabulary below includes all the OWL DL vocabulary. OWL DL and OWL Full both include all the OWL vocabulary (but as explained above, OWL DL has certain restrictions relative to OWL Full and does not include the full RDF/RDF-S vocabulary), whilst OWL Lite is a strict subset of OWL DL. Specific details of the restrictions and vocabulary for each sublanguage of OWL are given at Section 1.5.8.

OWL Classes and Individuals

According to the OWL Web Ontology Language Overview [102], an OWL class is intended to define a group of individuals that belong together because they share some properties. An OWL class is created and named by using owl:Class and setting rdf:ID to a text string corresponding to the chosen class name, e.g. <owl:Class rdf:ID="Person"/> creates an OWL class called “Person”. An individual is asserted to be an instance of an OWL class using rdf:type. Every OWL individual must belong to at least one class, but may belong to more than one class (note: in OWL Full exceptions apply). OWL introduces a most general class named owl:Thing, which is the class of all individuals and a superclass of all OWL classes, and a class named owl:Nothing, which is the class that has no instances and is a subclass of all OWL classes. Thus every OWL class is a subclass of owl:Thing and a superclass of owl:Nothing. OWL uses rdfs:subClassOf to define class hierarchies. An OWL class may be a subclass of multiple OWL classes. As OWL individuals may be instances of more than one class, and classes may be subclasses of multiple classes, an OWL ontology is a graph rather than a strict tree hierarchy.

OWL Class Descriptions

OWL provides the ability to define classes using what is referred to as a ‘Class Description’. The mechanisms available for defining classes in OWL using Class Descriptions are the following:

1. Enumeration. A class may be defined to be exactly a set of enumerated individuals, using owl:oneOf. For example, a class named Month may be defined as exactly the twelve individuals January, February, March, ..., December.

2. Boolean combinations of other classes. A class may be defined to be a Boolean union, intersection or complement of other classes, using owl:unionOf, owl:intersectionOf and owl:complementOf.

3. Restriction. Restrictions may be used to define a class by defining local constraints on properties. Restrictions use owl:allValuesFrom and owl:someValuesFrom to limit permissible values. This may be done over either data values, via owl:DatatypeProperty, or individuals, via owl:ObjectProperty. Restrictions may also specify how many values are allowed, using owl:cardinality, owl:minCardinality and owl:maxCardinality.

4. A combination of the above.
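Because OWL classes denote sets of individuals, the enumeration and Boolean mechanisms above have a direct set-theoretic reading. The sketch below models class extensions as Python sets over a small hypothetical universe; all class and individual names are illustrative.

```python
# Set-theoretic sketch of OWL class descriptions: class extensions as sets.

universe = {"Jan", "Feb", "Alice", "Bob"}   # all individuals in the domain

month  = {"Jan", "Feb"}                     # owl:oneOf enumeration
person = {"Alice", "Bob"}                   # a named class
adult  = {"Alice"}                          # another named class

union        = month | person               # owl:unionOf
intersection = person & adult               # owl:intersectionOf
complement   = universe - month             # owl:complementOf, relative to the universe

print(sorted(intersection), sorted(complement))
# ['Alice'] ['Alice', 'Bob']
```

An OWL reasoner works with intensional descriptions rather than explicit sets, but the entailed memberships coincide with these set operations.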

OWL Class Axioms

Class Descriptions may form part of class axioms that specify whether the description is complete or partial. A complete modality indicates that the class is exactly equivalent to the given description, whereas a partial modality indicates the class is a subclass of the given description. Other class axioms include:

1. owl:EnumeratedClass, which makes a class consist of exactly a set of enumerated individuals.

2. owl:DisjointClasses which lists classes required to be pairwise disjoint.

3. rdfs:subClassOf which asserts one class to be a subclass of another class.

4. owl:equivalentClass which states that two or more classes are equivalent. This is generally used in mapping classes from different source ontologies.

OWL Properties

In terms of logical axioms, the two key kinds of OWL properties are datatype properties (owl:DatatypeProperty), which connect individuals to data values, and object properties (owl:ObjectProperty), which connect individuals to other individuals. The OWL vocabulary also includes ontology properties (owl:OntologyProperty), which are used to describe OWL ontologies, and annotation properties (owl:AnnotationProperty), which enable any aspect of the ontology to be annotated with information intended for humans.

OWL Property Axioms

Property restrictions may be used within class descriptions to define OWL classes as described above. There are also a number of other axioms which allow OWL properties (either object or datatype properties) to:

1. be arranged in subsumption hierarchies, using rdfs:subPropertyOf.

2. be declared to be equivalent, using owl:equivalentProperty. As with owl:equivalentClass, this is generally used in ontology mapping.

3. be declared to be functional i.e. to have no more than one target value or individual.

4. be declared to have a specific class or class description as an rdfs:domain.

5. be declared to have a specific class (in the case of object properties) or data range (in the case of datatype properties) as an rdfs:range.

Additionally, there are a number of axioms applicable only to object properties, which enable one to assert that an OWL object property is:

1. the inverse of another object property, using owl:inverseOf.

2. a symmetric property (owl:SymmetricProperty), that is, if it holds between individuals A and B, it also holds between individuals B and A.

3. an inverse functional property (owl:InverseFunctionalProperty), that is, each target individual is the value of the property for no more than one domain individual.

4. a transitive property (owl:TransitiveProperty), that is, if it holds between individuals A and B, and between individuals B and C, then it also holds between individuals A and C.
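These object property axioms could be written in Turtle as (all names hypothetical):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/ontology#> .

ex:hasParent   owl:inverseOf ex:hasChild .        # inverse
ex:hasSibling  a owl:SymmetricProperty .          # symmetric
ex:hasAncestor a owl:TransitiveProperty .         # transitive
ex:isCapitalOf a owl:InverseFunctionalProperty .  # each country has at most one capital
```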

OWL Facts

Facts give the ability to assert information about OWL individuals. There are two kinds of OWL facts:

1. those that assert information about a particular individual, including the classes that the individual belongs to (asserted using rdf:type) and the values of the individual's object and datatype properties, using the property constructs described previously.

2. those that make individuals the same or distinct. Since OWL makes no unique name assumption, two identifiers may possibly refer to the same individual. This may be asserted explicitly using owl:sameAs, whilst owl:differentFrom may be used to assert that two individual identifiers refer to distinct individuals. Additionally, owl:AllDifferent allows a list of any number of individual identifiers to be asserted to refer to mutually distinct individuals.
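Both kinds of fact can be sketched in Turtle (individual and property names hypothetical):

```turtle
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/ontology#> .

# Facts about a particular individual
ex:Alice rdf:type ex:Person ;     # class membership
    ex:hasAge 32 ;                # datatype property value
    ex:hasChild ex:Bob .          # object property value

# Identity assertions
ex:Bob owl:sameAs ex:Robert .
ex:Alice owl:differentFrom ex:Bob .
[] rdf:type owl:AllDifferent ;
    owl:distinctMembers ( ex:Alice ex:Carol ex:Dan ) .
```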

1.5.8 Differences between Sublanguages of OWL

Both OWL DL and OWL Full use the complete OWL vocabulary described above, but OWL DL is subject to some restrictions that do not apply in OWL Full, due to the exclusion of some RDF/RDF-Schema features. OWL Lite has additional restrictions compared to OWL DL and uses only a subset of the OWL vocabulary.

Incremental Differences between OWL Full and OWL DL

Unlike OWL Full, OWL DL requires that:

• Every URI used as a class name must be explicitly asserted to be of type owl:Class, not rdfs:Class.

• Similarly, properties must be either owl:DatatypeProperty or owl:ObjectProperty rather than rdf:Property. (Note: OWL also allows for ontology and annotation properties.)

• Every OWL DL individual must be asserted to belong to at least one class (even if only owl:Thing).

Chapter 1 Introduction

• Strict separation of types is required: the URIs used for classes, properties and individuals must be mutually disjoint. An OWL class may not be a property or individual; an OWL property may not be a class or individual; and an OWL individual may not be a class or property. OWL Full thus allows classes to be used as individuals, whereas OWL DL does not. This also implies that in OWL DL restrictions cannot be applied to the language elements of OWL itself, whereas this is allowed in OWL Full.
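Under these restrictions, an OWL DL document types every name explicitly, along the following lines (names hypothetical):

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix ex:  <http://example.org/ontology#> .

ex:Animal    a owl:Class .              # explicit class typing
ex:hasParent a owl:ObjectProperty .     # explicit property typing
ex:Felix     a owl:Thing , ex:Animal .  # individual with at least one class
```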

Incremental Differences between OWL DL and OWL Lite

OWL Lite obeys all the restrictions of OWL DL laid out above, and additionally restricts OWL DL in the following ways:

1. Whereas OWL DL permits a class to be described as an enumeration of individuals, OWL Lite does not.

2. In OWL DL, a property can be required to have a specific individual as a value using hasValue; OWL Lite does not permit this.

3. OWL DL permits classes to be stated to be disjoint; OWL Lite does not.

4. OWL DL permits classes to be described as arbitrary Boolean combinations and restrictions, whereas OWL Lite does not.

5. Whilst OWL Lite still permits the use of cardinality vocabulary, the cardinalities themselves are restricted to 0 or 1, whilst OWL DL permits the use of arbitrary non-negative integers.

6. In many constructs, OWL Lite restricts the syntax to use single class names rather than the arbitrarily complex class descriptions allowed by OWL DL.

1.5.9 OWL 2 Extensions

A W3C OWL Working Group is currently developing extensions to the OWL language, to be released as OWL 2. OWL 2 [108] is based on an OWL 1.1 member submission dated 19 December 2006 [121] and, with the exception of syntax, is planned to be fully backwards compatible with the 2004 OWL Recommendations, offering a small but useful set of additional features that have been requested by users and for which effective reasoning algorithms are now available. It should be noted that what was referred to as OWL 1.1 in 2006 and 2007 has undergone a name change and is now referred to as OWL 2. New features planned for inclusion in OWL 2 [108, 121] are:

• extra syntactic sugar to make some commonly used patterns easier to state. This includes DisjointUnion, to define a description as the disjoint union of a set of descriptions, and NegativeObjectPropertyAssertion and NegativeDataPropertyAssertion, for use in stating negative property membership assertions.

• additional property and qualified cardinality constructors. OWL 2 moves from the SHOIN Description Logic underlying OWL DL to the SROIQ Description Logic, for which reasoning algorithms are now available. Consequently, OWL 2 provides extra Description Logic expressive power for qualified cardinality restrictions, property chain inclusion axioms (provided there are no cyclic inclusions), and (for non-complex properties) local reflexivity restrictions, as well as reflexive, irreflexive, symmetric, asymmetric and disjoint properties.

• extended datatype support, to allow user-defined datatypes.

• simple metamodelling, referred to as ‘punning’, which allows a name to be used simultaneously as any or all of an individual, class or property. Whilst they may share a name, the underlying individual, class and/or property are carefully preserved as distinct entities that do not impinge on each other, in order to avoid the computational problems encountered when a strict separation of types is not enforced, as is the case in RDF/RDF-Schema and OWL Full.

• extended annotations.

• a functional-style syntax, which is easier to read and linearize. Whilst similar to the OWL 1.0 Abstract Syntax, it is not fully backwards compatible with it.
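Two of these new features can be sketched in Turtle using the OWL 2 RDF mapping (all class and property names hypothetical):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix ex:   <http://example.org/ontology#> .

# Property chain inclusion: a parent's brother is an uncle
ex:hasUncle owl:propertyChainAxiom ( ex:hasParent ex:hasBrother ) .

# Qualified cardinality restriction: a bicycle has exactly two parts that are wheels
ex:Bicycle rdfs:subClassOf [
    a owl:Restriction ;
    owl:onProperty ex:hasPart ;
    owl:qualifiedCardinality "2"^^xsd:nonNegativeInteger ;
    owl:onClass ex:Wheel
] .
```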

1.5.10 OWL Design Choices

Unlike previous knowledge representation systems, OWL has been designed for knowledge representation over the distributed environment of the World Wide Web. Two key design choices made by OWL that assist it to operate in this environment are its open world assumption and its no unique name assumption.

Open World Assumption

OWL assumes that we are operating in an open world; that is, that we never have complete knowledge and it is always possible that more material may be added. The implications of this assumption are explored in two of the included papers, as explained in Section 1.10.3.

No Unique Name Assumption

This assumption relates to identity between individuals in an ontology. OWL does not assume that two different names for individuals necessarily refer to different individuals. Under certain circumstances, reasoning may infer that they are in fact the same individual. OWL provides the owl:sameAs, owl:differentFrom and owl:AllDifferent constructs to make explicit assertions about equality and inequality of individual identity.

1.6 Notations

1.6.1 RDF Abstract Syntax

RDF has a normative abstract syntax [91], which is the set of RDF triples, also known as the RDF graph. As described previously, each RDF triple consists of a Subject, which may be an RDF URI reference or blank node; a Predicate, which may only be an RDF URI reference; and an Object, which may be an RDF URI reference, literal or blank node. An RDF URI reference is simply a Unicode string that does not contain any control characters, and produces a valid URI under the specified Unicode encoding [91]. Two RDF URI references may then be determined to be equal if and only if their Unicode strings match character by character.
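The triple structure can be illustrated in Turtle (the ex: namespace and all names are hypothetical):

```turtle
@prefix ex: <http://example.org/> .

# Subject    Predicate   Object
ex:Alice     ex:knows    ex:Bob .       # URI references throughout
ex:Alice     ex:name     "Alice" .      # literal as object
ex:Alice     ex:address  _:a .          # blank node as object
_:a          ex:city     "Sydney" .     # blank node as subject
```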

Figure 1.5 RDF/XML syntax shown in the SWOOP ontology editor

1.6.2 RDF Transfer Syntax and OWL Exchange Syntax

Additionally, RDF has a transfer syntax called RDF/XML [34], which is an XML format. In order to be encoded in XML, the nodes and arcs of the RDF graph need to be represented in XML terms as element names, attribute names, element contents and attribute values. The normative exchange syntax for OWL is also RDF/XML. The OWL Language Reference [31] describes all OWL's modelling primitives in the RDF/XML exchange syntax; these primitives were detailed in Section 1.5.7 on the OWL vocabulary. An example of RDF/XML syntax is shown at Figure 1.5.

1.6.3 OWL Abstract Syntax

OWL was developed as a vocabulary extension of RDF, and an OWL ontology is also represented as an RDF graph as above. OWL simply assigns an additional meaning to certain RDF triples, as specified by the OWL Semantics and Abstract Syntax document [122]. OWL DL's abstract syntax has a frame-like style and details the facts and axioms allowed by OWL. (Note that OWL 2 has opted for a functional-style syntax that is easier to linearize [108].) The semantics is model-theoretic, and relates the abstract syntax directly to a standard model theory, specifying vocabulary and interpretation. The document also defines a normative mapping from the OWL Abstract Syntax to RDF graphs. An example of OWL Abstract Syntax is shown at Figure 1.7.

Figure 1.6 OWL DL syntax shown in the Protégé ontology editor

1.6.4 Non-Normative Syntaxes

Logical Syntax

Additionally, OWL may be expressed in a DL-style syntax. Utilising its Description Logic semantics, it may be expressed as logical axioms. This style may be used for creating axioms in Protégé [41], for example, as shown at Figure 1.6.

Compact Forms

As the normative RDF/XML exchange syntax is rather verbose, a number of more compact alternatives have been devised [9], including Notation 3 (N3), an assertion and logic language that is a superset of RDF, and N-Triples, a subset of N3 for RDF. More recently, Turtle (Terse RDF Triple Language) [10] has defined a textual syntax for RDF with some compatibility with N-Triples and Notation 3, allowing RDF graphs to be written in a compact and natural text form, with abbreviations for common usage patterns and datatypes. An example of Turtle syntax is shown at Figure 1.8.
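Turtle's abbreviations can be sketched briefly (names hypothetical):

```turtle
@prefix ex: <http://example.org/> .

# ';' repeats the subject, ',' repeats subject and predicate,
# and 'a' abbreviates rdf:type
ex:Alice a ex:Person ;
    ex:knows ex:Bob , ex:Carol ;
    ex:age 32 .
```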

Figure 1.7 OWL Abstract Syntax shown in the SWOOP ontology editor

Manchester Syntax

Another approach that provides both compactness and readability is the human-friendly Manchester syntax [72], which takes the approach of substituting logical operators with English words - for example, the operators ∃ and ∀ are replaced by the words “some” and “only”. (Note that quantification is only used in OWL in the context of property restriction, so there is no need to represent statements such as “All men are mortal”. If that were not the case, “all” would be a more appropriate translation for ∀ than “only”.) This approach enables non-logicians to more easily write and understand OWL axioms, making ontology construction and use more accessible for domain experts. An example of OWL Manchester Syntax is shown at Figure 1.9.

Controlled Natural Language Syntaxes

This approach is being developed further to encompass a syntax that may be written and read entirely in English. The author's contributions to this effort constitute part of this thesis, and are covered by two included papers in Chapter 5 as explained at Section 1.10.4.

Figure 1.8 OWL Turtle syntax shown in the SWOOP ontology editor

Graphical Notations

There is also some work on developing a graphical syntax for OWL based on UML. The Object Management Group's work on an Ontology Definition MetaModel [55] includes UML profiles for RDF and OWL, which provide a standard graphical notation for RDF vocabulary and OWL ontology development using UML tools.

1.7 Formal Semantics

The chief utility of a formal semantic theory is simply to ensure logical inferences are correct. The nature of the things being represented by the language of the formal system has no bearing on the formal semantics, the latter being simply a means to guarantee that true premises produce true conclusions. Only the truth or falsity of a statement and its logical components is considered relevant, whilst what is actually asserted about the world is considered extraneous.

Two approaches are commonly used for specifying the formal semantics of a system of logical inference. The model-theoretic approach equates logical consequence with truth preservation in models: that is, every model where the premises are true is a model where the conclusions are true. In this approach, inferential rules are correct if they are truth-preserving over models.

In contrast, the proof-theoretic approach takes the inference rules as primary. In this approach, for instance, the Modus Ponens rule of inference (inferring B from A and ‘If A then B’) is taken to constitute the definition of the ‘if ... then’ expression: it is simply the expression which permits inferences of this form to be made.

Figure 1.9 OWL Manchester Syntax shown in the Protégé ontology editor

The semantics of RDF, RDF-Schema and OWL are given as a standard model theory [35, 122]. Note that the original 1999 RDF specifications [94] had no formal semantics: these were added later by the W3C's RDF Core Working Group and appear in the 2004 recommendations [35] for compatibility with OWL. As described above in Section 1.5, the OWL language variants OWL Lite, OWL DL and the coming OWL 2 correspond to specific Description Logics. The ability to reason over OWL ontologies written in OWL Lite, OWL DL and OWL 2 is enabled by the reasoning algorithms that have been developed for the respective corresponding description logics.

1.7.1 Reasoning Services

The formal semantics of semantic web ontology languages enable automated reasoners to process the web of data captured in RDF graphs. Whilst OWL 2 is under development, OWL DL currently provides the most reasoning capability. Reasoning should not be confused with initial parsing of an ontology to ensure it is syntactically correct, or to determine which ‘species’ of OWL sublanguage it conforms to. Specifically, the following reasoning services are currently available for OWL DL ontologies:

1. Consistency checking. This ensures that the ontology does not contain any logical contradictions. Generally speaking, an ontology is considered to be consistent if there is some interpretation that satisfies each ontology, fact and axiom in the collection [122].

2. Concept satisfiability. This reasoning service determines whether it is possible for a class (concept in DL terminology) to have any instances, that is, whether defining an instance of the class would cause the entire ontology to be inconsistent. If so, the class is said to be unsatisfiable; otherwise it is considered satisfiable.

3. Classification. This service computes the subclass relations between every named class to create a complete class hierarchy. This may be useful to retrieve all subclasses of a class.

4. Realization. This service finds the most specific class(es) that an individual belongs to, i.e. the lowest in the subsumption hierarchy. Usually realization is performed after classification, since direct types (class memberships) are provided with respect to the class hierarchy.

5. Inferencing on individuals. Compute all the instances of a class, or all the types for a given individual; compute the role fillers of an individual. If the ‘No Unique Name Assumption’ is supported, compute whether a given individual is necessarily the same as a differently named individual.

6. Rule Processing. Ontologies are commonly used in conjunction with rule languages such as SWRL [76] that have a model-theoretic semantics compatible with OWL, and enable rules to be added to ontologies and processed as a complete logical structure.

7. Querying. Query languages such as SPARQL [132] permit construction of queries over RDF graphs to extract information on request, as well as the ability to insert new information.

These reasoning services are currently provided by Description Logic reasoners such as Pellet [96], RacerPro [60], FaCT++ [114] and KAON2 [42].
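As an illustration of the concept satisfiability service above, a reasoner would flag the following class as unsatisfiable (a sketch; all names hypothetical):

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ontology#> .

ex:Dog owl:disjointWith ex:Cat .

# ex:CatDog is subsumed by two disjoint classes, so it can have no instances:
# a reasoner reports it unsatisfiable
ex:CatDog rdfs:subClassOf ex:Dog , ex:Cat .

# Asserting an instance would make the whole ontology inconsistent:
# ex:Rex rdf:type ex:CatDog .
```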

1.7.2 Model Theoretic Semantics

The reasoning services described above are realizable due to the underlying model-theoretic semantics provided for RDF and OWL [35, 122]. Model theory involves a model and an interpretation function that enables the truth or falsity of well-formed statements of a language to be determined. As OWL language variants other than OWL Full are based on decidable logics, the truth or falsity of any well-formed OWL statement may always be determined.

A model for a language is simply an ordered pair ⟨A, I⟩, where A is a set and I is an interpretation function that maps all terms of some language L to A, such that any well-formed statement of the language may be determined to be true or false under the interpretation. We then say that an interpretation I is a model for a set of well-formed statements S of L iff every such statement is true under I. A is the universe that the language refers to, and the model is both the universe and the interpretation function.

Model theory primarily concerns itself with finding valid inference procedures, that is, those inference procedures that preserve truth under all interpretations. Note, however, that whilst model-theoretic semantics includes a “universe”, this is simply a set of symbols, and does not involve any specification of the nature of the things being described. The interpretation function simply maps one set of symbols to another without any deep consideration of what is being represented. In fact, as the RDF Semantics specification [35] describes, model theory actively tries to be both metaphysically and ontologically neutral:

“The chief utility of a formal semantic theory is not to provide any deep analysis of the nature of the things being described by the language or to suggest any particular processing model, but rather to provide a technical way to determine when inference processes are valid, i.e. when they preserve truth.”

Thus the key reason to have a model-theoretic semantics is purely to guarantee that inferences are sound; that is, that only true statements may be inferred from true statements.
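The model-theoretic notions above admit a compact formulation (this is the standard textbook notation, not the exact notation of the RDF or OWL semantics documents; L is a language, S a set of statements, φ a statement):

```latex
% A model pairs a universe A with an interpretation function I
\mathcal{M} = \langle A, I \rangle, \qquad I : \mathrm{Terms}(L) \to A

% I is a model of a set of statements S iff every member is true under I
I \models S \iff \forall s \in S,\ I \models s

% Logical consequence (the soundness target): truth is preserved over all models
S \models \varphi \iff \text{for every } I \text{ such that } I \models S,\ \text{also } I \models \varphi
```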

1.7.3 Meaning and Formal Semantics

Other than the ability to create a web of data, the key contribution of the semantic web and its ontology languages is purported to be their ability to represent meaning and to make it machine processable, as enabled via the formal semantics of the languages. Whilst the model theory and corresponding reasoning services provided are certainly useful, their relationship to meaning in the sense that human beings understand it, and as the vision of the Semantic Web uses it, is not immediately clear.

Certainly meaning is related to inference and the determination of truth, but intuitively there is more to it than that. Whilst a formal system is useful for deducing truth from truth, establishing the initial truth of statements that describe the world must, one would assume, be determined with reference to the world itself. As well as inferring consequences, meaning involves the aspect of reference: that of establishing what it is that we are talking about. This aspect comes through clearly when the Semantic Web is described as connecting data based on it being about the same thing, as was quoted in Section 1.2 as part of W3C's Semantic Web vision. Yet the only means provided to determine this is the URIs, blank nodes and literals allowed by RDF statements, and it is not entirely clear how these are intended to represent the world at large.

Furthermore, many types of reasoning commonly used by human beings are not represented: the kind of reasoning that is supported is primarily syllogistic. The usefulness of syllogistic reasoning for real-world applications has been brought into question by writers such as Clay Shirky [142].
However, unless the Semantic Web is willing to compromise on the truth of its inferences (as humans typically do in their everyday reasoning, by using stereotypes and making assumptions and generalizations), it must necessarily stay within the bounds that formal logics set regarding the state of the art for decidable and sound inferencing techniques. (Extensions for dealing with probabilistic and uncertain knowledge are, however, being explored, as described in Section 8.1.) The RDF Semantics document [35] gives some background on the nature of ‘meaning’ captured by the RDF and OWL specifications. It states:

“Exactly what is considered to be the ’meaning’ of an assertion in RDF or RDFS in some broad sense may depend on many factors, including social conventions, comments in natural language or links to other content-bearing documents. Much of this meaning will be inaccessible to machine processing and is mentioned here only to emphasize that the formal semantics described in this document is not intended to provide a full analysis of ’meaning’ in this broad sense; that would be a large research topic. The semantics given here restricts itself to a formal notion of meaning which could be characterized as the part that is common to all other accounts of meaning, and can be captured in mechanical inference rules.”

This confirms that despite the various notions of meaning at a broader level used in the Semantic Web vision, the actual implementations to date restrict themselves to that part of the notion of meaning that may be captured by mechanical, truth-preserving inference rules. These rules are based on the underlying model-theoretic semantics specified in the RDF and OWL Semantics documents [35, 122]. Under this notion of meaning, the meaning of an RDF graph may be considered to be simply the assurance of the truth of the statements made by its triples.

1.8 Beyond Model Theory

An investigation of the history of the semantic web specifications indicates that the intended specification of the Semantic Web's semantics was not always so limited in scope. A 2004 paper entitled ‘Meaning and the Semantic Web’ [120] by Parsia and Patel-Schneider, two of the leading Semantic Web designers and theoreticians, explains that in the process of preparing the 2004 RDF specifications there was originally more consideration of machine accessible meaning than the model-theoretic semantics reflected in the current documents. The initial view was that this machine accessible meaning was to be based on the meaning of names (URI references in the semantic web), and the meaning of a name/URI was to be determined by the owner of the name/URI, if one existed, and provided by documents supplied by the owner that would provide a definition of the name/URI. However, during the process of finalizing the RDF specifications, pressure was applied against this view of meaning, as it did not allow for divergence of meaning, and thus for differences of opinion, violating one of the basic principles of the web.

The alternative proposal put forward in the paper suggested that the meaning of an occurrence of a URI should be determined in a local sense, by its context, that is, the (semantic web) “document in which it appears, plus other documents explicitly mentioned in constructs like the OWL importing mechanism”.

However, a brief reflection indicates that neither suggestion is adequate on its own to make meaning machine processable. If the owner were to provide a definition of the name, then presumably it would either be in natural language text, in which case it would assist humans to disambiguate but would not be machine processable, or it would be provided in some machine processable way, but clearly outside of the specifications and formal semantics offered to date, in which case we must consider how and on what basis this is to be done.
In addition, the question of determining who has the authority to specify the meaning of a name goes to the very heart of what meaning is taken to be: is it the intention of the person who creates or owns the name that determines its meaning, or is it the way the name is used by the community at large?

The alternative proposal is designed primarily to accommodate differences of opinion as to what a name means by considering the context in which the name is used. Recognizing that a name/URI may have multiple senses, this gives humans a means to identify which of its senses they intend by indicating the context in which it occurs, usually pinpointing its documented use by some community of practice. Presumably this binds the user to using the name/URI as specified by those documents, their background theory and external intuitions.

Whilst this method may be adequate to indicate which sense of a name/URI is being committed to, and whether this agrees or disagrees with another user's use of the term, it still has definite shortcomings. As the authors freely admit, this method is insufficient to capture either the meaning intended by the document writers, or to adequately support the processing of meaning for the purpose of software agents, thus this notion of meaning may need to be augmented. The question of making meaning machine processable appears to still be open. This question is a vital one, as it provides the foundation for a ‘Pragmatic Web’ where machines can take practical real-world action based on the outcomes of mechanical inference, without risk of error due to ambiguity and misinterpretation of the underlying semantics.

1.9 Problems Facing the Semantic Web

Having presented the Semantic Web vision, technologies and approach, I now present a number of problems that are, or have been, obstructing its path, and that I have sought to address in the included publications. These are not necessarily the only problems faced by the Semantic Web, but they are certainly of considerable significance. In keeping with NICTA's focus on use-inspired research, these problems were identified in 2004-2007, largely through interactions with potential users, and have motivated and guided the work effort presented in the included publications.

1.9.1 Adoption and Accessibility

To date, the semantic web has not enjoyed a wide level of adoption, despite the ontology languages now being mature enough to support it. The evidence for this is the lack of widespread use of RDF and OWL in mainstream web applications. From engaging with industry and government through my NICTA sponsorship, I suggest there are a number of reasons for this.

The first is the issue of awareness and understanding within the general IT community. The Semantic Web standards have been developed largely in an academic environment, and the specifications are somewhat inaccessible to the usual technologist who does not have a background in AI or formal logics. Mainstream technologists have not really understood what the standards are for, what ontologies are, what a formal semantics is, and so on.

The second reason is the issue of implementation. From a mainstream technologist's point of view, it has not been at all clear how to go about implementing a semantic web application. Whilst the semantic web is described as an extension of the existing web, in practice the W3C's recommendations have not been well integrated with the existing web. For instance, it was not clear how to embed semantic information in an existing web page. Whilst GRDDL [33] now provides this ability, it is not yet well understood or adopted, and the tools to implement semantic web applications on a large scale have lacked maturity.

The third issue is the perceived gap between the vision of the Semantic Web and the reality of what can be done with what has been delivered to date. Applying the existing Semantic Web technologies will not give you an agent that can schedule your medical appointments, as was touted in the original 2001 Scientific American article [16]. To do such a thing requires semantic interoperability on a large scale, and integration with scheduling tools and agent technology.
A common criticism of the semantic web is that it is just a pipe dream, and cannot be delivered with the existing technologies. Unfortunately, this view tends to undermine the reality of what can actually be achieved with what has been delivered. Clearly, there has been a need to make the technologies more accessible to the IT community at large, starting with providing an understanding of what they are, how they work, how to implement them, and what they can actually do. This should be couched in the language of business technology, and include strong arguments in business terms as to why it is worth doing.

Figure 1.10 Lars Marius Garshol’s diagram of ISO Topic Map standards vs W3C standards, 2005

1.9.2 Competing Standards for Building Semantic Structures

The W3C standards are not the only set of international standards seeking to build semantic structures connecting information resources: there is also a set of standards under the ISO umbrella, which may be referred to as the ‘Topic Map Standards’ [100], created independently and in parallel with the W3C approach. When the two development camps started to become aware of each other around 1999, there was some confusion about the relationship and potential compatibility between the two approaches.

The Topic Map approach was developed with a slightly different focus than the W3C standards: Topic Maps were intended to be akin to an electronic version of the index typically found at the back of a book, connecting topics to their occurrences using associations, with URIs in place of page numbers. Topics could also be connected to published subject identifiers (analogous to library Subject Headings), in order for the user to clearly identify what the topic was about by citing an external index. Topic Maps were therefore primarily concerned with allowing users to make connections that would assist in finding resources about specific topics or subjects, and in viewing maps showing topics and subjects covered by resources of interest. There were a number of tools to display and search Topic Maps, including Ontopia's Omnigator [117].

However, Topic Maps had an XML syntax [126] and a plan to create a constraint and query language, as shown in Figure 1.10. Considered as a whole, the Topic Map approach appeared to be paralleling and competing with the W3C approach. The W3C set up an RDF/Topic Map task group in 2004 to address interoperability issues, but the way forward was unclear, and the task group did not include OWL within its considerations [125].
For mainstream technologists, the question of whether to adopt Topic Maps or OWL was confusing, and for some resulted in a ‘wait and see’ attitude, with an understandable reluctance to back the wrong contender and potentially have to backtrack.

1.9.3 Rules and the Limits of Ontology Languages

A third problem faced by the Semantic Web has to do with the issue of ontology languages and their fit with rule languages. In practice, many classification tasks require ontologies to be supplemented with rules. Whilst this was technically understood at the logical level, there has been a need for a practical illustration of where this limit lies and how rules can be used in tandem with OWL ontologies in practice. It was also important to illustrate the impact of OWL design features, particularly the Open World Assumption. Additionally, it was important to explore the extent to which tools could adequately support ontologies supplemented with rules, and to encourage tool development where necessary.

1.9.4 Readable Syntax

Up until 2005, the syntaxes for RDF/OWL were XML-based [31, 34] and logic-based, as explained in Section 1.6, and whilst compact forms such as N3 [9] and Turtle [10] existed, they were still difficult for non-technicians to read. When developing domain ontologies, it is important to capture domain knowledge accurately, and typically this knowledge is sourced from domain experts who are not necessarily well-versed in formal logics. Without a readable syntax, domain experts needed to work closely with logicians and technicians to build a domain ontology, and were poorly placed to check its accuracy unless they were trained as logicians.

The advent of Manchester syntax [72] in late 2006, supported by Protégé [41], introduced English keywords as alternatives to logical symbols. This made it far easier for domain experts to author ontologies themselves, and to be confident that the axioms being created were as they intended. A syntax for OWL entirely in English would extend this readability further. However, it was important to ensure that logical precision was not compromised by such a syntax. Natural language comes with expectations and understandings inherent from its use as a natural language, and these may compete with the needs of a logical language, whereas OWL's meaning relates purely to logical inferencing rules.

1.9.5 Interoperability of Ontologies

In order for applications to use data from different sources seamlessly, there needs to be semantic interoperability between the ontologies/vocabularies used to mark up that data. Effectively, this means either using the same ontology, or mapping the ontologies together. However, the task of intermapping n ontologies requires O(n²) mappings if each ontology is to be mapped to every other ontology. Clearly this is rather onerous, as whilst ontology mapping can be assisted by tools, it cannot currently be fully automated (current precision results plateau at around 60% and recall about 80% [38]) and therefore requires human mediation. For an O(n²) problem where n is large, this is obviously impracticable. One of the approaches to tackling this problem has been the use of upper or top-level ontologies such as WonderWeb's DOLCE: a Descriptive Ontology for Linguistic and Cognitive Engineering [47], the Suggested Upper Merged Ontology (SUMO) [111] or Onto-Med's General Formal Ontology (GFO) [67], which attempt to identify and describe those concepts that are common across domains, such as time, space, processes, qualities, objects and so on. By committing to a common upper ontology, domain ontologies may then specialize these concepts further, achieving interoperability through being related to a common core. This approach has potential, but relies on wide commitment to a common upper-level ontology, which to date has not been realized.
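The scaling argument above can be made concrete with a small sketch (illustrative only, not from the thesis): full pairwise intermapping of n ontologies needs a number of mappings that grows quadratically, whereas mapping each ontology once to a shared upper ontology grows only linearly.

```python
# Toy illustration of the interoperability arithmetic discussed above.
# Pairwise intermapping of n ontologies is O(n^2); committing to a single
# common upper ontology reduces the mapping effort to O(n).

def pairwise_mappings(n: int) -> int:
    """Mappings needed to link every ontology to every other one."""
    return n * (n - 1) // 2  # O(n^2)

def hub_mappings(n: int) -> int:
    """Mappings needed if every ontology maps once to a common upper ontology."""
    return n  # O(n)

for n in (5, 50, 500):
    print(n, pairwise_mappings(n), hub_mappings(n))
# e.g. for n = 500: 124,750 pairwise mappings versus 500 hub mappings
```

Given that each mapping requires human mediation, the gap between the two columns is what makes the pairwise approach impracticable for large n.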

In practice, as explained in Section 1.10.5, many ontologies in the same domain space co-exist without being intermapped, so wide-scale interoperability is not achieved. This limits the potential for web agents repurposing data, a factor that is key to the semantic web vision. The simplest way to address the interoperability problem is to encourage widespread adoption of the same ontology, or to limit the number of commonly adopted ontologies to some very low number that is manageable for mapping purposes. A major contributing factor to the plethora of ontologies describing the same domain space is the difficulty in determining whether there is a pre-existing ontology that meets the current need. When faced with the need for an ontology in a particular domain space, a designer is currently more likely to build a new ontology from scratch than to re-use an existing one, simply due to the difficulty of ensuring a pre-existing ontology is, and will continue to be, suited to their purposes. Therefore, giving better information about ontologies, their context and Quality of Service information may be of considerable assistance in giving designers sufficient confidence to adopt existing ontologies. Whilst the problem can be technically assisted, social components are at the heart of the trust issue and need to be tapped effectively to provide a viable solution.

1.9.6 Making Meaning Machine Processable

A second possibility worth considering is the long-term potential for removing the human from the process of ontology mapping and mediation, so that whilst the problem is still O(n²), it is able to be fully automated. Informally, human mediation is required because some aspect of the ontology author's intention is unclear or unspecified in the ontology(ies) being mapped, and it is therefore necessary to refer to annotations or some other extra-logical documentation, or to consult the author directly. The clarification process takes place at the level of human comprehension, and the ontology(ies) are then mapped by humans according to this understanding. However, perhaps it is indeed possible for the ontology author to make available in the authoring process all the information that is necessary for mediation, and to do so in such a way that it becomes machine processable and there is no need for further human intervention. Such a process would require the author to specify precisely what terms mean, and correspondingly would require suitable constructs for capturing this meaning.

But how should one go about defining terms in order to make every aspect of one's intention clear? How does one establish exactly what a term is referring to in an unequivocal and machine processable manner? Such questions directly address the gap between the semantic web vision and the current reality, and open up the deeper issues inherent in it. What does it mean to make meaning machine processable? What should meaning be considered to be for the purposes of the semantic web? How do we balance the shifting nature of what words mean in a natural language with the need to make meaning precise and machine processable? How do we deal with the limitations of being confined within a symbolic system when what we want to refer to is clearly outside of that system? If meaning is grounded in embodiment and physical experience, as some argue e.g. [81], is there any hope for making it accessible to machine processes? At this level, we are not only considering the needs of the Semantic Web, but questions of broad applicability to Artificial Intelligence in general. However, the Semantic Web, if it is to move closer to its stated vision, seems to demand some kind of answer to them. Thus these questions need to be carefully considered if we are to hope to find a path forward.

Chapter 1 Introduction

1.10 Included Publications

The core of the thesis is presented as a series of previously published papers that address the problems raised in the previous section, and have thereby, I hope, made some contribution to the Semantic Web’s evolution. The included publications are grouped into chapters thematically, corresponding to the problem being addressed. In summary:

• Chapter 2 Overview of Semantic Technologies is an overview of semantic technologies, which explains the field to mainstream information technologists in order to make it more accessible and to encourage wider adoption of the technologies. In the context of this thesis, it provides background information that sets the scene for the following chapters, which are essentially independent of each other.

• Chapter 3 Integrating Topic Maps into the Semantic Web details work conducted to map the ISO's Topic Map standards into the W3C's Semantic Web stack of technologies, so that they might take advantage of the formal semantics and tools associated with the latter.

• Chapter 4 Adding Rules to OWL Ontologies describes joint work conducted to explore the need for, and use of, rules in conjunction with OWL ontologies, editors and reasoners.

• Chapter 5 Controlled Natural Language Syntaxes for OWL describes joint work on designing a controlled natural language syntax for OWL 2. The initial work on Sydney OWL Syntax was presented at OWLED in 2007, where a CNL task force formed to explore various alternatives, reported in the second paper of this chapter.

• Chapter 6 Encouraging Ontology Reuse details joint work on the issue of ontology reuse, its importance in semantic interoperability, and plans for a tool which would encourage reuse by leveraging social factors and networking tools.

• Chapter 7 Foundational Issues in Meaning examines the issue of meaning at a fundamental level, and its importance for Semantic Web technologies.

The inclusions, context, contribution and content of each of the six chapters containing included publications are summarized below. A concluding chapter, Chapter 8, summarizes the conclusions of each publication and concludes the thesis overall.

1.10.1 Chapter 2: Overview of Semantic Technologies

Included Publications:

2.1 A. Cregan. Overview of Semantic Technologies. In P. Rittgen (Ed), Handbook of Ontologies for Business Interactions, IGI Global, 2007.

Personal Contribution: 100 %

Context:

Editor Peter Rittgen invited the contribution of this peer-reviewed book chapter, which was written over 2006 and 2007 and published in the Handbook of Ontologies for Business Interactions in 2007. In the context of this thesis, this chapter provides additional background material describing the area of semantic technologies in general, and raises a number of important issues, some of which are addressed by later inclusions.

Contribution:

The chapter proposal was written in July 2006, when there was a strong need for an accessible overview of semantic technologies, explaining what they were, what they could do, and describing the methods employed. Even though at this time the W3C's OWL Recommendations [31, 102, 122, 144] had been in place for over two years, there was little awareness or use of the approach amongst the broader technical community. The Semantic Web technologies had been designed largely within research environments, had few real-life applications that a broader public were likely to encounter, and were generally perceived as somewhat unfathomable by the typical corporate technologist or IT manager. The chapter was designed to fill this gap, being both an overview for technologists, giving a suitable conceptual framework for reading more detailed material, and an executive summary for technology and business managers. The emphasis is on introducing semantic technologies in an accessible way, relating them to more familiar IT technologies, explaining the additional value they offer, and giving tangible illustrations of it.

Content:

The chapter introduces the vision of the Semantic Web [15] as one that holds great promise for dealing with the modern plague of digital information overload [99]. Semantic technologies are introduced as a new wave of computing [110] that goes beyond data exchange standards such as XML [20] to the semantic level, to enable one system to make use of the information in another system more easily and cheaply. It explains the various levels of representation (symbolic, syntactic and semantic) and the value of making the semantic level of information explicit and available for machine processing. Semantic technologies provide the means to create virtual information structures and enable data interoperability. Once interoperability is achieved, it is possible to provide more powerful and flexible information services and transactions that search, query and reason over multiple information stores linked together based on common meaning. The key strategies are explained to be, firstly, the decoupling of data from applications, so that data is more easily redistributed and reused, and secondly, the use of metadata tags organized into logical structures called ontologies, which enable data stores to be described conceptually, mapped together to provide interoperability, and reasoned over via the formal semantics that ontology languages provide. Ontologies [57] describe the individuals, concepts and relationships that are relevant for conceptualizing some real-world domain. Unlike traditional data modelling methods, they have a more expressive language for describing concepts and relations, and a formal semantics that gives inference rules for checking the consistency of knowledge bases and conducting automated reasoning. The applications and benefits that semantic technologies offer are described as being information integration and interoperability, intelligent search and services (which may be specified using OWL-S [22]), model-driven applications, and intelligent reasoning.
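The idea that a formal semantics "gives inference rules" can be sketched in miniature (a toy illustration only, not a real RDF/OWL stack): triples are held as plain tuples, and two RDFS-style rules (transitivity of subclass, and type propagation along subclass) are applied to a fixpoint. The class and individual names (Dog, Mammal, Fido, etc.) are invented for the example.

```python
# Toy sketch of rule-based inference over RDF-style triples.
# Assumed/invented data; real systems use RDF graphs and DL reasoners.

triples = {
    ("Dog", "subClassOf", "Mammal"),
    ("Mammal", "subClassOf", "Animal"),
    ("Fido", "type", "Dog"),
}

def closure(triples):
    """Apply two RDFS-style rules until no new triples appear:
    (A subClassOf B) & (B subClassOf C) => (A subClassOf C)
    (x type A)       & (A subClassOf B) => (x type B)
    """
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        new = set()
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b == c and p1 == "subClassOf" and p2 == "subClassOf":
                    new.add((a, "subClassOf", d))
                if b == c and p1 == "type" and p2 == "subClassOf":
                    new.add((a, "type", d))
        if not new <= inferred:
            inferred |= new
            changed = True
    return inferred

facts = closure(triples)
print(("Fido", "type", "Animal") in facts)  # True: a purely mechanical consequence
```

The point of the sketch is that the conclusion "Fido is an Animal" is never stated; it follows mechanically from the asserted triples plus the inference rules, which is precisely the aspect of meaning that ontology languages make machine processable.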
Ultimately semantic technologies should enable computing to become more adaptive and autonomous.

A case study is presented that was conducted by TopBraid consultants for the US Federal Government [71], which formalized the US Federal Enterprise Architecture Reference Models as OWL ontologies so that various implementations could be checked for conformance and logical consistency, and implementations across government agencies could be inter-linked, providing whole-of-government visibility [3]. This enabled report generation and query answering about agencies and their use of technologies, products and processes to be conducted across an overall model (RDF graph) that interlinked the available information across all the agencies.

The different aspects of ontologies are discussed, explaining the difference between ontology languages, the ontology structures that use them to capture knowledge about specific domains, and the specific ontology content that may be placed into ontology structures to capture information about specific individuals and their relations. The chapter also explains the OWL ontology language constructs [144] in some detail, as well as the components of the Semantic Web technology stack [17] and their status, whilst clarifying the relationship between semantic technologies generally and the Semantic Web [14, 16] as their implementation with regard to the world wide web. It explains the capability for importing, merging and aligning ontologies at run-time to produce virtual information structures, the benefits of using reasoning engines, tool support for viewing, editing, querying and aligning ontologies, and the use of rules and annotation.

OWL ontologies are compared to other data structures and models that various technologists might be familiar with, including traditional databases, Unified Modelling Language (UML), taxonomies and expert systems, explaining that ontologies generally leverage additional value by making their semantics machine-executable.

Lastly, the chapter considers issues and challenges. These include:

1. the need for data owners to semantically mark up their data, and the lack of automated and other techniques to ease the process;

2. better tools and techniques for supporting large-scale semantic applications;

3. better support to enable non-logicians to participate in ontology building, particularly via the use of (controlled) natural language syntaxes [28];

4. better standards and methods for resolving meaning without recourse to human intervention. One possible solution identified was the use of jointly developed ontologies as common standards within a domain, reducing the need for pairwise ontology mapping simply by commitment to a common ontology. The second approach mentioned was the need for a better underlying semantic theory encompassing a broader view of semantics, including cognitive aspects of meaning [27];

5. better methods for dealing with incomplete, uncertain and probabilistic data, for all those commonly encountered real-world situations where statements do not neatly map to a boolean truth value;

6. appropriate methods for dealing with proof, trust and security over semantically interlinked virtual data structures. There is a need for appropriate measures to handle who can access what data and at what level, and for intelligent agents/reasoners to be able to account for and justify their actions.

Some of these issues and challenges relate to later contributions within this thesis. In particular,

• With regard to Item 3, work has been progressing towards a common controlled natural language syntax for OWL 2 with the author as an active participant, and this is detailed in Section 1.10.4 and Chapter 5.

• With regard to Item 4, methods to encourage the adoption of common standards are addressed by Section 1.10.5 and corresponding Chapter 6, whilst Section 1.10.6 and Chapter 7 address the need to understand meaning more deeply and uncover an underlying semantic theory.

The issues and challenges raised are updated and reassessed in the concluding chapter at Section 8.1.

1.10.2 Chapter 3: Integrating Topic Maps into the Semantic Web

Included Publications:

3.1 A. Cregan. Building Topic Maps in OWL-DL. In Proceedings of Extreme Markup Languages, 2005. Available online at http://www.idealliance.org/papers/extreme/proceedings//html/2005/

Personal Contribution: 100 %

3.2 A. Cregan. An OWL-DL construction for the ISO Topic Map Data Model (presented to the ISO Topic Map committee (ISO/IEC JTC 1/SC 34, Working Group 3) and published online by XML Cover Pages at http://xml.coverpages.org/topicMaps.html).

Personal Contribution: 100 %

Context:

NICTA, Australia's newly established national Centre of Excellence for Information and Communications Technology, held an outreach workshop in late 2004 to discuss Topic Maps with Australian industry and government organizations, as part of its mission to identify and engage in use-inspired research. Some Australian companies had approached NICTA for advice and recommendations on using the ISO Topic Map standards [100], and their relationship to the W3C Semantic Web standards: OWL had become a W3C recommendation in early 2004 [31, 102, 122, 144], whilst the XML Topic Maps (XTM) 1.0 specification had been published in 2001 [126]. As I was a NICTA-sponsored student, my supervisors asked me to investigate this issue and determine the viability of a mapping between Topic Maps and OWL ontologies.

A draft Topic Map Data Model (TMDM) [51] had been published by the ISO group responsible for the Topic Map standards (ISO/IEC JTC 1/SC 34, Working Group 3 - see notes below) in January 2005. Based on its specification, a method for building Topic Maps in OWL DL was found to be feasible, and in February 2005 a draft of the proposed OWL DL model (an included publication at Section 3.2) was submitted to the relevant ISO committee for consideration, and was tabled for discussion at the official ISO SC34 meetings held in Amsterdam in May 2005. At that meeting the proposed OWL DL ontology was presented, and was judged by those present, including TMDM authors Garshol and Moore, to be a faithful representation of the Topic Map Data Model in OWL DL, implemented at an object level. As an outcome of the meeting, WG members were formally requested to provide detailed feedback in order to ensure the accuracy of the representation (Reference: http://www.jtc1sc34.org/repository/0625.htm).

It was followed by an invited paper at the Extreme Markup Conference in August 2005 which explains the approach more fully, included at Section 3.1.

Notes on ISO SC34 WG3

The material below is sourced from the official ISO SC34 home page at http://www.itscj.ipsj.or.jp/sc34/:

“ISO/IEC JTC 1/SC 34 is the international standardization subcommittee for Document Description and Processing Languages standards and technical reports related to structured markup languages (specifically the Standard Generalized Markup Language (SGML) and the Extensible Markup Language (XML)) in the areas of information description, processing and association. It has three Working Groups, and Working Group 3 (WG 3) addresses Information Association.

SC 34/WG 3 is responsible for producing standard architectures for information management and interchange based on ISO 8879, Standard Generalized Markup Language and related standards, including, but not limited to:

• Information linking and addressing
• Time-based information management
• Representation of knowledge structures, indexes, etc.
• Management of behavioral components in interactive documents”

Contribution:

When I was first asked to conduct this work, there were two Topic Map syntaxes published by the ISO: XTM 1.0 in XML [126] and another in HyTM. These syntaxes had non-trivial differences, and the original ISO Topic Map standards [100] did not explain the relationship between them, or give any indication of an underlying formal semantics. My initial attempts at mapping could only interpret the Topic Map standards and produce a suitable semantics; however, this produced multiple possible OWL ontologies. The publication of the Topic Map Data Model [51], in the form of an Entity-Relationship model, in January 2005 gave a means to disambiguate between the possible models. Valid Topic Maps could be defined as those Topic Maps that conformed to the constraints specified by the TMDM, and the TMDM Entity-Relationship model thus clearly provided a basis for building an OWL DL ontology that implemented the specified constraints. As the TMDM is a data model, whilst OWL is a language for specifying such models, this appeared to be a natural fit, although it has since been criticized for mapping at an inappropriate level, as explained at Section 8.2. The TMDM was therefore built as an OWL DL ontology, and was made publicly available for Topic Map authors to import and populate by adding instances, thus effectively building their own topic maps in OWL DL that were TMDM conformant. The benefit of using OWL DL as the modelling language was the ability to take advantage of OWL's formal semantics and accompanying reasoners to ensure any topic map built was conformant to the TMDM, as well as giving access to all of OWL's tools and querying capabilities, which were considerably more mature than those for Topic Maps.
At the time, the ISO Topic Map group were also embarking on specifying a Topic Map Constraint Language [107] and Query Language [50], and the use of OWL as the native language effectively circumvented the need for these. As Topic Maps could now be authored in OWL DL, it was hoped that the existing Topic Map tools would make their functionality available to Topic Maps written in OWL DL by providing users with the ability to import them directly, a relatively trivial task.

Content:

Both the ISO's Topic Map Standards and the W3C's Semantic Web Recommendations provide the means to construct meta-level semantic maps describing relationships between information resources. Developed independently, attempts at interoperability between the original Topic Map standard and RDF [101] prior to 2005 proved challenging: the W3C set up an RDF/Topic Map task group in late 2004 to address interoperability issues between RDF and the Topic Map standards, but the way forward was unclear, and the task group did not include OWL within its considerations [125]. (Note: in any case, this task group was conducted in parallel with my own initial Topic Map mapping to OWL DL, and my original proposal was submitted for consideration prior to the task force releasing its findings.) Whilst the ISO Topic Map working group continued to work on its ultimate Topic Map Reference Model [32], its drafting of an explicit Topic Map Data Model (TMDM) [51] early in 2005, combined with the advent of the W3C's more expressive Web Ontology Language (OWL) Recommendations in 2004 [31, 102, 122, 144], provided the possibility of authoring TMDM-conforming Topic Maps directly in OWL. Recognizing that any Entity-Relationship model may be represented as an OWL DL ontology, an ontology was built to the TMDM's specification. Written early in 2005, the two included papers present a construction of the TMDM model as an OWL DL ontology. The presented ‘TMDM Ontology’ is a construction in OWL DL of the Topic Map elements specified by the TMDM, making them available for use by Topic Map authors as a basis for building TMDM-compliant Topic Maps directly in OWL DL. Using OWL DL as the language for Topic Map authoring gives users access to OWL DL's formal semantics, constraint expressivity, and suite of tools such as Protégé [41], which includes an API and capability for ontology visualisation, querying, and automated constraint checking and reasoning using Description Logic reasoners such as Racer [60].
The use of XML Schema Datatypes [18] provides datatypes where required for data values. Using the offered OWL DL “TMDM ontology” as a basis for building Topic Maps thereby ensures that they will be TMDM conformant. In contrast, whilst Topic Maps at that time had tools geared for exploration and visualization, e.g. Omnigator [117], a planned constraint language (TMCL) and querying language (TMQL) specifically for Topic Maps were still at a very early stage of development, and thus this functionality was unavailable through existing Topic Map tools. By using OWL DL as the language for Topic Map authoring, one gains access to the querying and constraint capabilities associated with OWL DL, without requiring the use of any Topic Map language, Topic Map Constraint Language or Topic Map Query Language. By virtue of being TMDM-compliant, it is a straightforward matter to write a simple translation to convert Topic Maps written in OWL DL to other TMDM-based Topic Map authoring languages if required, thus providing users with access to Topic Map engines and tools. Illustrating by example, a Topic Map written in OWL DL using Protégé is shown, highlighting the constraint and querying abilities provided, which clearly satisfy the key requirements set by the ISO for a Topic Map Constraint Language and Query Language. One outstanding issue regarding the interpretation of Typing is identified, and options for its resolution are discussed.
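The import-and-populate pattern described above can be caricatured in a few lines (a schematic sketch with invented topic and association names, not the actual TMDM ontology): a topic map instance is populated with topics and associations, and a cardinality-style constraint is then checked. In the OWL DL version this check falls out of the formal semantics via a Description Logic reasoner; here it is simulated with an explicit closed-world Python check.

```python
# Schematic sketch (invented data): populating a topic-map-like structure and
# checking a TMDM-style constraint, namely that every association has at least
# one role and every role player is a declared topic.

topic_map = {
    "topics": {"puccini", "tosca"},
    "associations": [
        {"type": "composed-by",
         "roles": [("composer", "puccini"), ("work", "tosca")]},
    ],
}

def conforms(tm) -> bool:
    """Every association has >= 1 role, and every role player is a known topic."""
    for assoc in tm["associations"]:
        if not assoc["roles"]:
            return False
        for _role_type, player in assoc["roles"]:
            if player not in tm["topics"]:
                return False
    return True

print(conforms(topic_map))  # True
```

The appeal of the OWL DL construction was precisely that such conformance checks did not need to be hand-coded per constraint: they came for free from the ontology's axioms and a standard reasoner.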

1.10.3 Chapter 4: Adding Rules to OWL DL Ontologies

Inclusions:

4.1 A. Cregan, M. Mochol, D. Vrandecic, and S. Bechhofer. Pushing the limits of OWL, rules and Protégé - a simple example. In B. Cuenca Grau, I. Horrocks, B. Parsia and P. Patel-Schneider (Eds.): Proceedings of the OWLED 2005 Workshop on OWL: Experiences and Directions, Galway, Ireland, November 11-12, 2005, CEUR-WS, Vol. 188, 2005.

Personal Contribution: 35 % (estimated by co-authors)

4.2 M. Mochol, A. Cregan, D. Vrandecic, S. Bechhofer. Exploring OWL and Rules - A Simple Teaching Case. International Journal of Teaching and Case Studies (IJTCS), Special Issue: Teaching Semantic Web: Integration with CS/IS curriculum and Teaching Case Studies, to appear.

Personal Contribution: 30 % (estimated by co-authors)

Context:

This exploratory work was conducted at the European Semantic Web Summer School 2005, in the context of a mini-project undertaken by four summer school students (myself, Malgorzata Mochol, Denny Vrandecic and Antoine Zimmerman) and summer school tutor Sean Bechhofer. At that time I was interested in gaining a greater understanding of the use of rules in conjunction with ontologies. Having decided on the topic, I invited the other students to join the team, and chose Bechhofer as the tutor due to his involvement in designing the OWL language and his high level of expertise.

Following the success of the mini-project (it was the winner of the summer school mini-project competition), the work was written up as a paper for the inaugural OWLED workshop held in December 2005. Due to its relevance for instruction, the editor of the International Journal of Teaching and Case Studies subsequently invited the authors to submit it as a case study for a special issue titled ‘Teaching Semantic Web’.

Contribution:

The paper used a simple example OWL ontology to show the difficulties and challenges of implementing some intuitively simple classification criteria using OWL DL and SWRL rules. The starting point for the work was to explore the boundary where it becomes necessary to add rules to an OWL ontology in order to accomplish a classification. Whilst the work is not novel in a theoretical sense, it provides an accessible account of the topic, including representation choices and their impact, the limits of OWL and how to supplement it with rules, the impact of the Open World Assumption, and various implementation difficulties. As a teaching resource, it adds a useful contribution to the very limited amount of teaching material on Semantic Web technologies then available [73, 101, 133]. It highlights some of the difficulties commonly encountered in modelling using OWL, such as needing to backtrack in the ontology design, difficulty constructing suitable axioms in a straightforward way, and generating unanticipated results in reasoning. It also pointed out where the OWL language could benefit from extensions such as property chaining and qualified cardinality restrictions, which are now being included in OWL 2 [108], and the shortcomings of the tools available at the time. Its key contributions are instruction in the art of ontology building, and identification of difficulties with implementing OWL and rules at both the language level and the tool level that are informative for those working on developing OWL, SWRL and associated tools and reasoners.

Content:

The work is set in the context of the Semantic Web [66] vision, which is to extend the web with machine-understandable data leveraged by applications to perform tasks automatically, based on the core technologies of RDF [34], OWL [31, 144] and a rule language, for which SWRL [76] is considered a prototype. The starting point for the work was to explore the boundary where it was necessary to add rules on top of an OWL ontology in order to accomplish intuitively simple classification tasks. This is pursued using Protégé [112] in combination with the Racer automated reasoner [59] and the SWRL plugin. Whilst Protégé does not include a reasoner, it may be used in conjunction with any reasoner that uses the DIG (Description Logic Implementation Group) interface [6]. SWRL provides Horn-like rules for both OWL DL and OWL Lite, includes a high-level syntax for representing these rules [2], and is more powerful than either OWL DL or Horn rules alone [75].

An exercise was designed that involved classifying a number of Student groups as a GoodGroup if they satisfied all of a number of intuitively simple criteria, and as a BadGroup otherwise. Initially a simple OWL ontology was constructed as a base. The first condition:

1. Groups should have either 4 or 5 students as members

was easily modelled in OWL using minimum and maximum cardinality constraints on an object property (hasMember) that connected the class Group to the class Student. The second condition:

2. Groups should have at least one member of each gender

was also able to be modelled in OWL alone, but with more difficulty. Firstly, we noted that we could not model it simply as a minimum cardinality, since it involved values from more than one class: both Man and Woman. Our need for hasMember to have a minimum cardinality for values from each of the subclasses Man and Woman requires the use of qualified cardinality restrictions, which are unsupported by OWL DL but are now included in the OWL 2 proposals [108, 121]. We also considered using existential restrictions on GoodGroup, but had trouble formulating the condition in Protégé. We then attempted to look at the situation in reverse and specifically define groups corresponding to the two ways the criterion could be broken, namely by having all women or all men, making these subclasses of BadGroup. However, in applying classification we noted an unforeseen consequence: due to the Open World Assumption, a group with four male members was not classified as a BadGroup, due to the possibility of there being a fifth female member without breaking the cardinality restriction set in Condition 1. One option was to rework our base ontology and define each group as an enumerated class composed of specific students, but this was unsupported by reasoners as it required nominals [70], and seemed unintuitive to us in any case. Instead we defined two new classes, BigGroup and SmallGroup, corresponding to having four or five members, and overlaid our conditions on these. This was sufficient to capture any four- or five-member same-gender group as a BadGroup.
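Conditions 1 and 2 can be restated as a closed-world sketch (invented member data; only the condition names follow the paper). The contrast with OWL is the instructive part: this Python check happily concludes that a four-man group fails Condition 2, whereas an OWL reasoner, working under the Open World Assumption, cannot draw that conclusion, since an unstated fifth female member might still exist without violating the cardinality bound of Condition 1.

```python
# Closed-world sketch of Conditions 1 and 2 from the GoodGroup exercise.
# An OWL DL reasoner would NOT classify the all-male group below as bad:
# under the Open World Assumption, a fifth (female) member may simply be
# unstated. Python, like databases, is closed-world, so the check succeeds.

def meets_conditions_1_and_2(members):
    """members: list of (name, gender) pairs."""
    size_ok = 4 <= len(members) <= 5            # Condition 1: 4 or 5 members
    genders = {gender for _name, gender in members}
    mixed = {"male", "female"} <= genders        # Condition 2: both genders
    return size_ok and mixed

group = [("ann", "female"), ("bob", "male"), ("cal", "male"), ("dee", "male")]
print(meets_conditions_1_and_2(group))      # True

all_male = [(name, "male") for name in ("bob", "cal", "dan", "ed")]
print(meets_conditions_1_and_2(all_male))   # False: a closed-world conclusion
```

This gap between intuitive closed-world reading and open-world semantics is exactly what forced the reworking of the ontology described above.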

We proceeded to consider the next three conditions: 3. Groups should have members of all different nationalities. 4. Groups should have members from all different institutions. 5. Groups should have a tutor who is the favourite tutor of all the group’s students. which required property chaining: that is, consideration of more than one property at the same time. Whilst OWL permits chaining of properties, it does not support making assertions about the equality of the objects at the end of two different properties/ property chains. Therefore it was necessary to add rules in order to implement these criteria. We were able to formulate SWRL rules as Horn rules for BadGroup. At this point we had all the rules needed to correctly classify every BadGroup.We considered this an exhaustive list, so we defined GoodGroup as any group not classified as BadGroup. However, when we applied this definition, we found the Open World Assump- tion prevented us from successfully classifying the GoodGroups, as our definition required negation as failure. As OWL’s semantics adopt the Open World Assumption, a statement cannot be assumed true simply because its negation cannot be proven [77]. We needed to start again and reformulate the rules positively for GoodGroup as a con- junction of all the criteria stated positively. In order to do this we needed to create new classes corresponding to each condition. It was possible to define a MixedGenderGroup in OWL fairly easily by using existential restrictions to define it as a conjunction of Group, existence of hasMember value Male, existence of hasMember with value Female.How- ever, the rules corresponding to Conditions 3, 4 and 5 proved to be extremely long: for instance, Conditions 3 and 4 had to include all the pairwise comparisons to ensure each member was different. 
We also found that there was no reasoner that could easily be plugged in to the Windows-based Protégé environment we were working in to operate on both OWL and SWRL, and therefore planned subsequent work using the reasoners Hoolet [8] and KAON2 [42, 78]. We also noted that, regrettably, the current DIG interface [6] does not specify how to exchange and reason over rules, so even though both the editor and the reasoner being used provide support for rules, they could not be used together seamlessly via the editor's interface. This problem has been addressed in a proposal to extend the DIG interface appropriately [7], but we were forced to pursue other solutions, such as using a file-based ontology exchange. We also considered the rule engine Jess, as Jess and a DIG-enabled reasoner can be used together to reason over a rule-enhanced OWL DL ontology, as described in [53]. Ultimately we were able to successfully build an ontology with rules that in theory would successfully classify GoodGroup and BadGroup. Our experience highlighted some of the difficulties in modelling using OWL, illustrated several cases where it was necessary to supplement the ontology with SWRL rules in order to achieve the desired classification, and pinpointed some areas that could be addressed in language development and tool support.

1.10.4 Chapter 5: Natural Language Syntax for OWL

Inclusions:

5.1 A. Cregan, R. Schwitter and T. Meyer. Sydney OWL Syntax - towards a Controlled Natural Language Syntax for OWL 1.1, in C. Golbreich, A. Kalyanpur, B. Parsia (Eds.): Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions, Innsbruck, Austria, 6-7 June 2007, CEUR-WS, Vol. 258, 2007.

Personal Contribution: 80 % (estimated by co-authors)

5.2 R. Schwitter, K. Kaljurand, A. Cregan, C. Dolbear, G. Hart, A Comparison of three Controlled Natural Languages for OWL 1.1, presented at OWLED, OWL Experiences and Directions, 4th International Workshop, Washington, USA, 1-2 April, 2008.

Personal Contribution: 32.5 % (estimated by co-authors)

Context: At the second OWLED, held in November 2006, it was widely recognized by attendees that there was a need to extend the approach taken by Manchester OWL Syntax in substituting English words for logical operators, and to design a syntax for OWL which would enable ontologies to be constructed and read entirely as English sentences. This would be an important step in giving non-logicians the ability to understand and author OWL ontologies, and thus to make semantic technologies accessible to the greater community. On return to Sydney from attending OWLED, I invited Rolf Schwitter and Thomas Meyer to work with me to design such a syntax: Schwitter is an expert in machine-processable Controlled Natural Language, whilst Meyer was my PhD co-supervisor from 2004 through mid-2007 and an expert on Description Logics. As the group was based in Sydney, we decided on the name Sydney OWL Syntax. The first paper is a discussion paper describing the resulting Sydney OWL Syntax, covering its scope, design, and examples of Sydney OWL Syntax in use, and including a call to action for others to provide feedback and collaborate on agreeing on a CNL syntax for OWL 1.1 (now known as OWL 2). Following presentation of the paper and accompanying poster at the subsequent third OWLED, held in May 2007, it was discovered that two other parallel efforts were also working on some form of controlled natural language syntax for OWL 2: Rabbit and ACE. An OWLED CNL task force including members from each effort was formed for the purpose of working jointly towards a proposal for a common CNL syntax for OWL 2. The second paper compares the three syntaxes SOS, Rabbit and ACE, and was presented at the fourth OWLED, held in April 2008.

Contribution: The first paper in this chapter describes a proposal for a new syntax, Sydney OWL Syntax, that can be used to write and read OWL ontologies in Controlled Natural Language (CNL): a well-defined subset of the English language with a limited grammar and lexicon. The proposed Sydney OWL Syntax is a complete syntax enabling two-way translation and generation of grammatically correct full English sentences to and from every element of the OWL 2 functional syntax. It also offers insights into many important considerations for designing a CNL syntax for OWL. Discussion of this syntax and two comparators led to the formation of the OWLED CNL task force, tasked with agreeing on a common CNL syntax for OWL. The second paper offers a comparison of the three syntaxes in preparation for making recommendations for going forward.

Content: The first paper in this chapter describes a proposal for a new syntax that can be used to write and read OWL ontologies in Controlled Natural Language (CNL): a well-defined subset of the English language with a limited grammar and lexicon. Following OWL reaching W3C Recommendation status in February 2004, a variety of notations became available for editing ontologies through tools such as Protégé [41] and SWOOP [86], including the official RDF/XML exchange syntax [34], N-triples [9] and subsequently Turtle [10] and OWL Abstract Syntax [122]. Following the lead of Manchester OWL Syntax [72] in making OWL more accessible for non-logicians, and building on the previous success of Schwitter's PENG (Processable English) [137], the proposed Sydney OWL Syntax enables two-way translation and generation of grammatically correct full English sentences to and from OWL 1.1 functional syntax. PENG [137] translates specification texts into first order logic via discourse representation structures [87] and can be used for web page annotation [140]. It uses predictive interface techniques [138] to aid the writing process. However, PENG was not designed for bidirectionality or for Description Logics. Whilst informed by PENG, Sydney OWL Syntax was designed specifically for DL rather than FOL expressivity, and with bidirectionality in mind. The first paper identifies that a bidirectional mapping between a subset of OWL DL and Attempto Controlled English (ACE) was presented in a 2006 paper by Kaljurand and Fuchs [85] using a discourse representation structure as interlingua, but immediately subsequent work prepared for OWLED 2007 focussed only on one direction: verbalizing OWL DL [84]. With regard to bidirectionality, it is noted that Schwitter and Tilbrook [139] had previously shown that it can be achieved in a direct way using axiom schemas without need for an interlingua, and this informs the approach taken by Sydney OWL Syntax.
Sydney OWL Syntax, used in conjunction with OWL tools, was designed to facilitate ontology construction and editing by enabling authors to write an OWL ontology in a defined subset of English. This improves readability and understanding of OWL statements or whole ontologies, by enabling them to be read as English sentences. By providing the option of an intuitive, easy-to-use English syntax which requires no specialized knowledge, it was hoped that the broader community would be far more likely to develop and benefit from Semantic Web applications. Sydney OWL Syntax was scoped to be compatible with OWL 1.1, now referred to as OWL 2 [108, 121], and its functional-style syntax. It covers the entire OWL language and offers bi-directional translation without loss of information. Its design goals are to support non-logicians in building high quality OWL ontologies, provide English translation of OWL ontologies, take a modular approach that allows for future extensions of the OWL language, and be sufficiently detailed and precise for implementation purposes. The paper discusses a number of design choices that were faced and documents the decisions made. A key choice is that between sounding natural and staying close to the original OWL; in general, Sydney OWL Syntax (SOS) opts for tight binding with OWL 2 so that there is no loss of precision, whilst attempting to make expressions as natural as possible within that constraint. SOS offers one and only one translation for each OWL statement, and each statement is translated as a unit, without reference to any other statement or background linguistic knowledge. Although somewhat unnatural, it makes limited use of variables and explicit OWL constructs where necessary. In the second paper, presented at the following OWLED, held in April 2008, the members of the task force compare the three controlled natural languages: Attempto Controlled English (ACE), Ordnance Survey's Rabbit, and Sydney OWL Syntax (SOS).
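The axiom-schema approach to lossless two-way translation can be sketched in miniature. The following is not the Sydney OWL Syntax grammar itself but an illustrative toy covering a single schema (SubClassOf), with invented patterns, to show how pairing each OWL functional-syntax form with exactly one English sentence template yields a round trip without loss of information:

```python
import re

# One schema: the OWL functional-syntax axiom form and its paired
# English sentence template (patterns invented for illustration).
OWL_PATTERN = re.compile(r"SubClassOf\(:(\w+) :(\w+)\)")
CNL_PATTERN = re.compile(r"Every (\w+) is a (\w+)\.")

def owl_to_cnl(axiom: str) -> str:
    """Verbalize a SubClassOf axiom as a controlled English sentence."""
    m = OWL_PATTERN.fullmatch(axiom)
    sub, sup = m.group(1), m.group(2)
    return f"Every {sub.lower()} is a {sup.lower()}."

def cnl_to_owl(sentence: str) -> str:
    """Parse the controlled English sentence back into the axiom."""
    m = CNL_PATTERN.fullmatch(sentence)
    sub, sup = m.group(1), m.group(2)
    return f"SubClassOf(:{sub.capitalize()} :{sup.capitalize()})"

axiom = "SubClassOf(:Student :Person)"
sentence = owl_to_cnl(axiom)
assert cnl_to_owl(sentence) == axiom  # the round trip is lossless
```

Because each statement is translated as a unit against its own schema, no background linguistic knowledge or cross-statement context is needed, which mirrors the one-statement-one-translation design decision described above.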
It identifies the need for a ‘pedantic but explicit’ paraphrase language to counter the common problems users encounter when working with OWL DL [133] and notes that paraphrasing for representing OWL in a more natural way has been suggested previously [68, 80, 103]. Controlled Natural Language (CNL) assists in reducing the ambiguity and complexity of full natural language [90], and existing CNLs include Attempto Controlled English (ACE) [45], PENG [137], Controlled English [146] and Boeing's Computer Processable Language [23].

User testing has shown that CNLs can offer improvements over standard OWL syntaxes [46, 61, 88]. Apart from Lite natural language [12] and CLOnE [46], the three key CNL-based approaches to authoring OWL ontologies are ACE [44], Rabbit [63] and SOS [29]. Kaljurand's PhD thesis [83] describes a bidirectional mapping of OWL 2 (without data properties) and a fragment of ACE, which is being used in the experimental ontology editors ACE View and AceWiki [92], whilst Rabbit was developed by Ordnance Survey for authoring ontologies using a domain expert-centric approach [64], with a Protégé plugin currently in development [24] and the GATE tool [30] being used to convert Rabbit into OWL. The three syntaxes are compared with reference to a common ontology, Ordnance Survey's ‘Buildings and Places’ ontology [118]. The paper discusses a number of requirements for creating a workable OWL-compatible CNL. It summarises the similarities and differences of the three CNLs and makes some preliminary recommendations for an OWL-compatible CNL. Further work remains to be conducted by the Task Force to agree on a common CNL syntax for OWL 2.

1.10.5 Chapter 6: Encouraging Ontology Reuse

Inclusions:

6.1 D. Peterson, A. Cregan, R. Atkinson, J. Brisbin. n2Mate: Exploiting social capital to create a standards-rich Semantic Web. In Proceedings of Linked Data on the Web (LDOW2008), 2008.

Personal Contribution: 31 % (estimated by co-authors)

Context: This paper was prepared jointly with three co-authors from industry who contacted me following a talk on Semantic Technologies I gave to Australian government and industry in November 2007. They shared my insight into the need to encourage interoperability by making it easier for users to commit to common ontologies and wanted to help facilitate this, especially in the Australian government space. It was presented in April 2008 at the Linked Data on the Web workshop, co-chaired by Sir Tim Berners-Lee.

Contribution: One of the cornerstones of the semantic web is the need to interlink ontologies, vocabularies and URIs to create a dense semantic network. A sparse network lacks interoperability, and makes it difficult for the semantic web to realize its full potential. There are several ways to attempt to increase interoperability. The publication included in this Chapter outlines the various approaches and focuses on one in particular: encouraging ontology re-use. The paper explains the barriers to ontology re-use and proposes a method and tool that leverages social networks to encourage ontology re-use, and thus achieve greater interoperability.

Content: Semantic Interoperability is achieved by commitment to common ontologies or by connecting ontologies via matching and alignment methods. However, intermapping n ontologies is an n² problem, and whilst ontology mapping can be assisted by tools, it cannot currently be fully automated (current precision results plateau at around 60%, whilst recall results plateau at around 80% [38]) and therefore requires human mediation. There appears to be an intrinsic upper limit to automated matching, due to the current lack of techniques and constructs to get all the meaning out of people's heads and into the ontologies we are working with. Almost invariably the matching needs to be completed and checked by the humans who built the ontologies, negotiating with each other or referring to other sources of information about the ontologies. With an n² problem where n is large, this becomes unviable. As an illustration of the sparsity of connections, the Linking Open Data (LOD) project [19, 36], referred to at Section 1.4.3, holds datasets that currently comprise over 2 billion triples but reveal only about 3 million links [65], so overall the graph is very sparsely interconnected. One way to tackle the problem is to attempt to provide techniques and constructs that fully support the specification of meaning in a machine processable way, so that such negotiations can be fully automated and do not require human mediation or intervention. However, there is a great deal of foundational work required before such technology is likely to be possible. The publications included at Chapter 7 open up the questions that need to be addressed by such an endeavour. Even in the event of such techniques and constructs being devised, there is still the question of applying them correctly in practice. The remaining possibility is to provide interoperability simply via commitment to the same ontologies.
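The scale of the pairwise-mapping burden can be made concrete. Since every pair of ontologies potentially needs a human-mediated alignment, the number of mappings is the number of distinct pairs, n(n-1)/2, and consensus on a few common ontologies attacks the problem by shrinking n:

```python
from math import comb

def mappings_needed(n: int) -> int:
    """Number of distinct ontology pairs requiring alignment:
    n choose 2, i.e. n*(n-1)/2, which grows quadratically in n."""
    return comb(n, 2)

# One hundred competing ontologies in a domain space imply thousands of
# pairwise alignments; consensus on five implies only a handful.
print(mappings_needed(100))  # 4950
print(mappings_needed(5))    # 10
```

Reducing n from 100 to 5 cuts the human-mediated alignment work by roughly a factor of 500, which is the quantitative intuition behind the consensus-building approach described below.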
Building consensus about which ontologies to use in a particular information space is a pragmatic approach which reduces the potential number of ontologies to be mapped to a manageable number. Thus the n² mapping problem is addressed simply by reducing n. The key to the effort is to provide sufficient information and trust about the ontologies so that the relevant parties will commit to them and reuse them. Currently, as can be seen by searching Swoogle or examining Australian Government documents, one can find hundreds if not thousands of attempts to describe the same or very similar concept spaces with competing standards, vocabularies and ontologies. The wealth of choice makes it very difficult for those seeking to locate, apply or distinguish between existing standards, and faced with the need for an ontology describing a particular conceptual space, it can often be easier to build one's own ontology from scratch than to attempt to navigate the existing plethora of standards, vocabularies and ontologies. However, this simply adds to the problem, adding yet another ‘standard’ for the next person to consider. Our theory is that it is the lack of socially-sensitized processes highlighting who is using what and why that has led to the current unmanageable plethora of vocabularies, where it is far easier to build one's own vocabulary than try to find a suitable, reliable existing one. Each vocabulary has its own associated attributes to do with why it was developed, what purposes it is best suited for, and how accurate and reliable it is at both a content and technical level. However, most of this information is opaque to the wider community of users. As shown in Figure 1.11, standards, vocabularies and ontologies do not exist in a vacuum, but form an intrinsic part of social and business contexts where they are related to people, projects and organizations.
As Snowden points out [145], human knowledge is contextual, triggered by circumstance and need, and best drawn out through social interaction. Our argument is that there is considerable value in the development of an online facility that exposes and leverages the relationships that provide the context in which standards are developed, and provides a central space for listing vocabulary and ontology resources with their associated authority, governance and quality of service attributes. We propose to present this in a visual form and provide pivotable search facilities to enhance recognition

Figure 1.11 Vocabularies exist within social and business contexts

and comprehension. A sample interface screen is shown at Figure 1.12.

The facility, dubbed ‘n2mate’, provides a focal point where discourse communities can make authority claims, rate vocabularies on various parameters, register their commitment to or usage of particular vocabularies, and provide feedback on their experiences. It is designed to be a novel exploitation of social networking software to provide a lightweight, flexible platform for testing the efficacy of leveraging social networks to link existing registers and ‘seed’ an information space focusing on the use of standards in online information management.

Implementation options for ‘n2mate’ include the use of existing tools and datasets, such as traditional registry technology provided by ebXML registry, and automated harvesting of the relationships into a triple-store, perhaps supported by Sesame [1], with visualization and facet search through tools such as Gnizr [97] and Solr [131]. Trust and governance will be aided by the W3C's Protocol for Web Description Resources (POWDER) [151], once available. In addition to collecting information from users, a number of existing web services and registers will be used to seed and populate the register.

The paper uses examples from the Australian context to provide clear illustration of the central arguments. Information harvested via the ‘n2mate’ facility provides the necessary context for potential users to identify, choose and commit to the most appropriate and robust vocabularies and ontologies in their sphere of activity, and it is expected that the selection en masse will coalesce around a few of the most solid and useful vocabularies. These emerging ‘de facto’ consensus standards will then form a stable semantic platform for content representation and interlinked knowledge.

Figure 1.12 Sample n2mate interface screen

1.10.6 Chapter 7: Foundational Issues in Meaning

Inclusions:

7.1 A. Cregan. Towards a Science of Definition. In Proc. Australasian Ontology Workshop (AOW05), Sydney, Australia, CRPIT, Vol. 58, T. Meyer and M. A. Orgun (Eds.), ACS, pp. 75-82, 2005.

Personal Contribution: 100 %

7.2 A. Cregan. Symbol Grounding for the Semantic Web. In Proceedings of the European Semantic Web Conference (ESWC 2007), pp. 429-442, 2007.

Personal Contribution: 100 %

Context: The papers in this chapter deal with fundamental issues in definition, meaning and symbol grounding that are raised by the Semantic Web.

Contribution: As explained at Section 1.7, the current approach to meaning taken by RDF and OWL is that of model-theoretic semantics, but this has been insufficient to make meaning fully machine processable, or to determine reference unequivocally. In Section 1.9, it was stated that in order to make meaning truly machine processable, it would be necessary for the author to specify precisely what terms mean, which would correspondingly require suitable constructs for capturing this meaning. The question of how one should go about defining terms is addressed by the first publication included in this chapter. The second publication addresses the symbol grounding problem and explains its importance for the Semantic Web.

Content: The vision of the Semantic Web is to provide machine processable meaning for intelligent applications. Whilst knowledge representation structures like ontologies have well-developed formalisms such as the OWL language, the issue of determining or specifying exactly what it is that they represent is still not well understood. However, it is crucial for validation, merging and alignment. We cannot possibly hope to judge the accuracy or applicability of a representation structure without a clear specification of what it is intended to represent. This being the case, we must either accept that our representations will have a limited applicability and lifespan, or develop methods by which we can define our terms in a robust and standardized way. The first paper of this section proposes that what is needed is a methodology for grounding the elements of ontologies: that is, a means to objectively establish what they are representing. Currently, outputs of the process of validating, aligning or merging ontologies have to be checked by domain experts in order to verify their correctness as representations. This process goes beyond checking the internal properties of the representation itself, for example its logical properties such as consistency. It also requires checking that what the ontology is stating about the world is an accurate representation of actual states of affairs in the world. (Here we permit ‘the world’ to encompass anything we want to represent, whether physical, conceptual or social.) The need for these kinds of checks by domain experts indicates that the ontology or ontologies alone do not contain everything that is needed for such tasks to proceed purely by machine processing and guarantee correct results. What is missing is a complete and unambiguous method for specifying what a term represents, so that it is no longer necessary to include human beings in the process.
Whilst it remains necessary to defer to humans to negotiate meaning, the process of achieving semantic interoperability is severely hamstrung. The publication of Chapter 6 explained the n² problem and proposed a tool to assist in making the task manageable simply by reducing the number of ontologies in common use, and therefore the number of mappings humans are required to do. In contrast, the publications in Chapter 7 tackle the question from another angle and explore the potential for releasing humans from the process altogether, once the representation is in place. Such an undertaking takes us far beyond the current Semantic Web specifications and is to be considered speculative. The first paper of the section, entitled ‘Towards a Science of Definition’, investigates the nature of the representation relationship, with a view to uncovering principles for relating a term unambiguously and unequivocally to the thing that it represents. It is often claimed that definition is an art rather than a science, e.g. [148], and the paper seeks to challenge this notion and investigate to what extent it can be done scientifically. The investigation draws on philosopher Richard Robinson's 1950 analysis of definition [135] in order to shed some light on the matter and begin to draw out some useful distinctions. Firstly, Robinson notes that definition can be used to relate words with words, words with things and things with things, and concludes that only the relationship between words and things should be considered to be definition: definition is the activity of giving information about what a symbol is referring to or denoting. This concurs with the question we are exploring and the notion of Symbol Grounding which is expounded in the second paper of this chapter. Secondly, Robinson distinguishes between lexical definition and stipulative definition.
Lexical definition, commonly used by dictionaries, records how words relate to things in the context of language use. These social agreements belong to a particular time and community, and are pliable. In contrast, stipulative definition is the act of setting up a relationship between a symbol and a referent (the thing the sign stands for). We note that lexical definition is word-driven, and the referent attached to any particular word tends to shift over time and between usage communities. In contrast, stipulative definition seems to be driven by the referent: we attempt to circumscribe it precisely and then assign a word to it. Thirdly, Robinson offers a number of methods for relating words and things:

1. Synonyms Uses a word-word correspondence to access an existing word-thing relationship.

2. Analysis Describes the thing being defined and relies on the listener to be able to mentally construct the description.

3. Synthesis Describes where the thing being defined sits in relation to other things within some known system or organization.

4. Implication Uses other methods in conjunction with use of the word being defined; the definition is implicit rather than explicit.

5. Denotation Mentions known examples of the thing being defined, leaving the listener to abstract the connotation.

6. Ostension Unlike the previous five methods, this method does not use words alone but draws the listener’s attention to some actual thing or non-verbal representation of a thing, for instance by pointing to it.

7. Regularity Whilst the six methods above are suitable for names, where a name has a fixed correspondence to a thing, other words are not names and need to be defined by rule.

Whilst it has been argued that some concepts like ‘beauty’ or ‘good’ are intrinsically undefinable, this is due to the difficulty of determining their structure and essential nature, these being unanalyzable, rather than to any inability to establish what thing the word refers to. Unanalyzable concepts may well have primary importance as motivators of human behaviour, as George Kelly's personal construct psychology [89] would attest, but unanalyzable is not the same as indefinable. The paper argues that there are four fundamental mechanisms underlying Robinson's methods of definition, all of which are mental in nature. Each of Robinson's methods is aimed at a listener who interprets the speaker's definitional activity. It is necessary to point out initially that what Robinson refers to as things are in fact concepts of things: that is, the listener's mental representations of things. Thus instead of using Robinson's word-thing pairs, it is more appropriate to frame definitional activity as a Peircean semiotic triangle between (using Ogden and Richards' 1923 terminology [115, 123]) (a) symbol, (b) thought or reference and (c) referent, as shown in Figure 1.13.

Figure 1.13 Semiotic Triangle - Ogden and Richards' 1923 version

According to the semiotic triangle, the symbol represents the referent as mediated by the mental thought or reference, and thus there is not necessarily any observable or direct relationship between the sign and the referent. The paper argues that Robinson's methods of definition are in fact connecting the symbol with the listener's thought or reference of the referent, rather than linking the symbol directly with the referent. Therefore it is appropriate to delineate the mental mechanisms whereby the listener is able to create connections between symbols and mental representations. We propose that the listener developmentally acquires and uses four mental mechanisms to identify and create the appropriate mental representation within what we refer to as their conceptual landscape, inspired by Gärdenfors' notion of conceptual spaces [48]. The four mental mechanisms whereby definitions are interpreted are proposed to be:

1. Example: associating words with recurring clusters of sensory stimuli in the mental landscape. This process enables language learning to commence, by associating symbols to mental constructs corresponding to elements of bodily experience.

2. Semantic Relation: abstraction is used to learn semantic relations such as ’between’ and ’above’. Learned initially in relation to the physical (sensory) environment, they can then be applied to conceptual landscapes.

3. Analysis: once the conceptual landscape is sufficiently developed, concepts can be applied to each other to extend the conceptual landscape into purely conceptual domains such as mathematics.

4. Rules: Within the conceptual landscape, particular territories can be marked out using language and the concepts it corresponds to.

The process relies to a certain extent on the notion of commonality in the conceptual landscapes of humans, as supported by the evidence of common perceptual and cognitive apparatus and thus similar underlying conceptual spaces [48], Rosch's work establishing basic level categories corresponding to sensori-motor handling [136], Johnson's experiential gestalt ‘image schema’ [81], Piaget's common stages of cognitive development in infants and children [127], Wierzbicka's identification of common ‘semantic primes’ across languages [150], and C. K. Ogden's ‘Basic English’ [116]: an exercise in successfully reducing English to a functional core of under a thousand words for instructional purposes. Lakoff and Johnson's work ‘Metaphors We Live By’ [93] expounds a mechanism whereby concepts learned in relation to the physical domain are applied to conceptual domains. The second paper included in this chapter, entitled ‘Symbol Grounding for the Semantic Web’, extends the arguments advanced in the first paper. It argues that whilst the objective of the Semantic Web is to make information more easily shared and applied by making its meaning explicit [15], current Semantic Web technologies lack an adequate notion of meaning, and consequently, ontologies are logical castles in the air without firm foundations. It undertakes an analysis of meaning that draws the distinction between meaning as entailment and meaning as designation. Whilst the Semantic Web has well-developed notions of logical entailment, as underpinned by its model-theoretic semantics, it lacks a methodology for designation: that is, for connecting the symbols it uses to their referents. The paper explores designation and draws a distinction between its two aspects: denotation and connotation, a distinction introduced by the philosopher John Stuart Mill [104].
The denotation of a term is all the individuals to which it may correctly be applied, whilst its connotation gives the attributes by which the term is defined. The Semantic Web, to date, has based its notion of meaning on model theory and so is essentially denotationally based at the level of its logical primitives. In contrast, it is argued that it is important to establish a correspondence between primitives and the domain that is connotationally based, and this is thus an extra-logical consideration. The challenge of establishing a relationship between symbol and referent is explained to be what is known in AI circles as the ‘Symbol Grounding Problem’. Harnad's 1990 paper [62] uses Searle's famous Chinese Room scenario [141] to expose the inadequacy of defining symbols in terms of other symbols, asking the reader to imagine trying to learn Chinese without any pre-existing knowledge of the language, using only a Chinese-Chinese dictionary. Without some way to ‘ground’ the symbols to something outside of the symbol system, one is caught on a symbol/symbol merry-go-round and is never able to come to a halt on what anything means. The example Chinese dictionary provides a parallel to the ontologies used on the Semantic Web: whilst they may have intricate interconnections, without some mechanism to connect the logical structure to the world they are ungrounded. Having a structure that supports logical entailment is then of dubious benefit, as there is no guarantee that logically sound results will correspond to accurate descriptions of the domain. This principle is illustrated by Narens' work on meaningfulness in Mathematical Psychology [109]. In the context of the Semantic Web, the resolution of semantic conflicts generally needs to make reference to the real-world dimensions and entities being represented.
The paper uses the analysis of types of semantic conflicts from Pollock and Hodgson [129] as illustration, giving the resolution method for each and explaining why and how symbol grounding and the use of meaning are relevant for executing the resolution. Currently such resolutions need to be undertaken by domain experts and cannot be automated; it is hoped that examining these patterns in greater depth will help uncover suitable approaches that would allow such resolutions to be automated. The paper also considers the mapping constructs currently provided by OWL, explaining that they are limited to specifying that two or more classes, properties or individuals are the same or different. Such identity or difference needs to be determined outside of OWL, and although tools and heuristics can support the process, there is no formal basis for a machine to determine this. The paper argues that the use of class extensions and URIs is inadequate for the following reasons. Noting the distinction between intensional and extensional class definitions, it is clear that simply having the same extension is inadequate to show that two classes (or properties, when considering pairs) are the same under a connotative (intensional) notion of meaning. Furthermore, class membership can be inherently fuzzy [48], for example, in determining the class of all red objects, and a complete specification of meaning clearly needs to be able to support graded class membership, as well as be able to determine whether a previously unseen instance qualifies for class membership or not. It is commonly argued that reference may be adequately established for the Semantic Web by the use of URIs. However, with regard to establishing reference, two classes, individuals or properties with the same URI cannot necessarily be concluded to be representing the same referent, if what is being represented is outside the information system.
There would need to be some one-to-one correspondence in place between URIs and referents to make this workable. However, where what is being represented is an information resource that can be dereferenced (and compared directly with another dereferenced URI if necessary), the sameness or difference of the referent clearly can be established by use of the URI.
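The gap between the two notions of meaning can be made concrete with a small sketch. The Python toy below uses the classic coextensive pair ‘creature with a heart’ and ‘creature with a kidney’ (all data invented for illustration): the two class definitions agree on every observed individual, yet a previously unseen instance separates them, showing that extensional equality does not establish connotative sameness.

```python
# Sketch: same denotation (extension), different connotation (intension).
# All individuals and attributes here are invented for illustration.

animals = {
    "dog":   {"has_heart": True,  "has_kidney": True},
    "cat":   {"has_heart": True,  "has_kidney": True},
    "robot": {"has_heart": False, "has_kidney": False},
}

def creatures_with(attribute):
    """Extension of a class defined by a single attribute (its intension)."""
    return {name for name, props in animals.items() if props[attribute]}

heart_class = creatures_with("has_heart")     # intension: having a heart
kidney_class = creatures_with("has_kidney")   # intension: having a kidney

# Extensionally identical over this domain of individuals ...
same_extension = heart_class == kidney_class

# ... but a previously unseen individual reveals that the intensions differ:
newcomer = {"has_heart": True, "has_kidney": False}
classified_same = (newcomer["has_heart"] == newcomer["has_kidney"])
```

Under a purely extensional (model-theoretic) notion, the two classes above are indistinguishable; it is only the connotative definitions that separate them, which is precisely the extra-logical information the paper argues the Semantic Web lacks.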

2 Overview of Semantic Technologies

2.1 Overview of Semantic Technologies

Title of Publication: Overview of Semantic Technologies
Type of Publication: Book Chapter
Appears In Book: Handbook of Ontologies for Business Interaction
Editor: Peter Rittgen
Publisher: IGI Global
Publication Date: 2007
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan
Personal Contribution: 100%

Copyright Statement:

Permission to reproduce this publication has been obtained from the copyright holder, IGI Global.


Chapter I Overview of Semantic Technologies

Anne M. Cregan
National ICT Australia
University of New South Wales, Australia

ABSTRACT

Semantic technologies are a new wave of computing, using explicit representation of meaning to enable data interoperability and more powerful and flexible information services and transactions. At the core of semantic technologies are ontologies, which capture meaning explicitly and may be used to manipulate and reason over information via its semantics. Unlike traditional data schemas or models, ontologies are capable of representing far more complex relations, may be linked directly to the data they describe, and have a formal logical semantics, facilitating automated deductive reasoning. This chapter introduces the vision of semantic technologies, and provides an overview of the approach and the techniques developed to date. It provides both an executive summary and an orienting framework for reading more technical material.

INTRODUCING THE VISION

I have a dream for the Web [in which computers] become capable of analysing all the data on the Web—the content, links, and transactions between people and computers. A “Semantic Web,” which should make this possible, has yet to emerge, but when it does, the day-to-day mechanisms of trade, bureaucracy and our daily lives will be handled by machines talking to machines. The “intelligent agents” people have touted for ages will finally materialize. (Berners-Lee & Fischetti, 1999, p. 169)

Technology visionaries like Sir Tim Berners-Lee, the inventor of the World Wide Web, have long dreamed of such a seamless information technology platform (Berners-Lee & Fischetti, 1999) to support distributed business and government and personal interactions, as well as other information-based activities like research, learning, and entertainment. The benefits

Copyright © 2008, IGI Global, distributing in print or electronic forms without written permission of IGI Global is prohibited.


of sharing and using knowledge seamlessly, globally, and on demand hold great promise for the future of economics, government, health, the environment, and all areas of human life. Semantic technologies, which are designed to process information at the level of its meaning, hold the key for delivering this vision.

The amount of worldwide digital data generated annually is now measured in exabytes (10^18 bytes) (Lyman & Varian, 2003), providing access to unprecedented amounts of information. While methods and technologies to store data and retrieve it reliably and securely over distributed environments are well-developed and generally highly effective, the ready availability of vast amounts of data is, in itself, not enough. Each data store is designed within its own organization or business unit for a specific purpose, and the resulting vocabularies, data formats, data structures, data value relationships, and application processing vary considerably from one system to another. Faced with information overload and a spectrum of incompatibility, most organizations are experiencing a constant struggle to find, assemble, and reconcile even a portion of the potentially relevant and useful data, even within the enterprise itself, and the potential benefits of leveraging the knowledge implicit in this data are largely untapped.

HARNESSING SEMANTICS

Typically, each IT system reflects the unique missions, work flows, and vocabularies of its own organization. Differences in syntax, structure, and the concepts used for representation prevent the interoperability of information across systems and organizations. Whilst middleware and data exchange standards like XML (Bray, Paoli, Sperberg-McQueen, Maler, & Yergeau, 2006) address some of the problems, they provide only a partial solution. The main obstacle in achieving efficient and seamless system integration is the lack of effective methods for capturing, resolving, and using meaning, a field referred to as “semantics.”

To date, information processing has been primarily at the syntactic or symbol-processing level, whilst the semantic level—the level of the meaning of the information—has been relatively inaccessible to machine processes. The knowledge of exactly what the data means resides in the mind of the database architect, system designer, or business analyst, or, if made explicit, in a document or diagram produced by these people. Such documentation is not in an executable form, and without a direct function in the live system it quickly becomes out of date. On the other side of the coin, the understanding of the needs and wants of the information consumer resides in their mind, and traditionally there has been no way for them to represent this directly or to match their needs with the system.

Semantic Technologies are a new wave of computing (Niemann, Morris, Riofrio, & Carnes, 2005) that enable one system to make use of the information resident in another system, without making fundamental changes to the systems themselves or to the way the organization operates. In the same way that a universal power adaptor enables an Australian appliance to be plugged into a power point in Europe, the U.S., or Asia without the need to change the local power grid, semantic technologies enable semantic interoperability for IT systems with different data structures, formats, and vocabularies, without changing the core systems themselves. By providing more effective ways to connect systems, applications, and data, greater capabilities like intelligent search, automated reasoning, intelligent agents, and adaptive computing become possible, and the potential to leverage existing information for far greater benefits becomes realizable.

Semantic technologies provide the capability to handle information on the basis of its meaning, or semantics. The core idea of semantic technologies is to use logical languages to make the structure and meaning of data explicit, and to attach this information directly to the data, so that at run-time, automated procedures can determine whether and how to align information across systems. By enabling this “semantic interoperability” across systems, a linked virtual data structure is created, where the relevant data can be searched, queried, and reasoned over across multiple native data stores based on its common meaning.


KEY STRATEGIES OF SEMANTIC TECHNOLOGIES

The goals of semantic technologies are twofold: firstly, to make distributed, disparate data sources semantically interoperable so that data can be retrieved and aligned automatically and dynamically on demand, and secondly, to provide techniques and tools to enable machines to intelligently search, query, reason, and act over that data.

Semantic technologies capitalize on the availability of data in sharable, processable electronic form. Some of these forms (e.g., databases and XML documents) contain structured data, and some contain less structured or unstructured data (e.g., text documents and Web pages). Semantic technologies can work with data in any form, providing it can be directly electronically linked into an ontology through some form of unique identifier. Ontologies are explicit, machine-readable specifications of the structure and meaning of data concepts, enabling automated processes to map and reconcile the data into a conceptually cohesive whole for searching and intelligent processing over the virtual data store created.

The key strategies used by semantic technologies are:

• Tagging physical data with metadata describing the data. Metadata is unlimited, in the sense that it can describe anything about the data. Additionally, because it links directly to the data it is about, the tag provides a handle for data identification and retrieval.
• Metadata tags are organized into logical structures called ontologies, which capture the logical and conceptual relationships between the tags, and provide a semantic map overarching the data.
• Aligning and mapping ontologies produces a semantic map over all the data sources, creating semantic interoperability, and providing the possibility for coordinated and seamless searching, querying, and processing over the virtual data structure.
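The first two strategies can be sketched concretely. In the Python toy below (all records, tags, and concept names invented for illustration), a metadata tag serves as the retrieval handle for each record, and a tiny ontology over the tags lets a broader concept retrieve data tagged with narrower ones.

```python
# Sketch of strategies 1 and 2: tag data with metadata, organize tags into
# an ontology. All identifiers and concepts are invented for illustration.

# 1. Physical data, each record linked to a metadata tag via an identifier.
records = {
    "urn:rec:1": {"tag": "Invoice",  "payload": "..."},
    "urn:rec:2": {"tag": "Receipt",  "payload": "..."},
    "urn:rec:3": {"tag": "Contract", "payload": "..."},
}

# 2. Tags organized into a small ontology: narrower concept -> broader one.
ontology = {
    "Invoice": "FinancialDocument",
    "Receipt": "FinancialDocument",
    "FinancialDocument": "Document",
    "Contract": "Document",
}

def broader_tags(tag):
    """Walk up the ontology, collecting the tag and all broader concepts."""
    tags = [tag]
    while tag in ontology:
        tag = ontology[tag]
        tags.append(tag)
    return tags

def retrieve(concept):
    """The tag is a handle: fetch records by a concept or any narrower one."""
    return sorted(rid for rid, rec in records.items()
                  if concept in broader_tags(rec["tag"]))

financial = retrieve("FinancialDocument")   # finds invoices and receipts
```

The ontology acts as the “semantic map overarching the data”: querying for the broader concept `FinancialDocument` retrieves records tagged only with the narrower `Invoice` and `Receipt` concepts.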

Figure 1. Semantic technologies overview


• As ontologies are underpinned by formal logics, they support automated reasoning over the amassed data. Semantic interoperability thus provides a basis for semantic brokers and semantic services. Intelligent agents then may compose these services to perform more complex tasks on behalf of the user.

As shown in Figure 1, each level builds successively upon the previous one, while emphasizing the decoupling of data from applications for greater reuse and modularity.

Data

Rather than replacing existing database technology, semantic technologies allow data to continue to physically reside in its native environment, while providing improved access to the data via a conceptual virtual layer. By making the meaning of the data explicit, it may be harvested more easily for new uses. The data simply needs to be linked to a metadata tag via some form of unique identifier. For World Wide Web resources, unique resource identifiers (URIs) perform this function.

Metadata

Metadata is data about data. XML, for instance, is a common standard used to attach metadata tags to raw data. Metadata can be used to capture anything at all about data: its format, syntax, structure, semantics, pragmatics, or any other relevant aspect. Metadata about format and syntax can be used to guide processes which physically link and retrieve the data, while metadata about the data’s structure and meaning can guide semantic alignment of the data. Pragmatic metadata can be used to capture information about how the data can be used in action. Semantic Technologies represent metadata in a form suited to logical manipulation, and are thus a tool which may be applied in any or all of these scenarios. As ontologies support the co-existence of multiple kinds of metadata over the same data, there is no limit to the kind or amount of metadata that may be used to describe and organize the same information. Complex relationships within and between the various metadata can be harnessed and used for knowledge processing.

Ontologies

Ontologies are the key component of semantic technologies, whether for the Semantic Web or other applications. The word was borrowed from philosophy, but as applied to Semantic Technologies, it is commonly defined as “the specification of a conceptualization” (Gruber, 1993), and may be thought of as an explicit conceptual model representing some domain of interest.

Ontologies organize metadata tags, capturing the logical and conceptual relationships between them, and electronically linking each tag directly to the data or resource it represents. Typically ontologies describe the individuals, concepts, and relationships that are relevant for conceptualizing some real-world domain. The kinds of knowledge they capture include:

• The concepts of the domain, and relations between the concepts such as broader, narrower, and disjoint. These set up the basic terminology of the domain.
• Properties that relate concepts to each other and to data fields, specifying the nature of the relationship, constraints on the relationship, and ranges for data values.
• Assertions or facts about individuals in the domain; for example, that a particular individual is an instance of a particular concept.

Ontologies are closely related to existing data modeling methodologies, but enable more explicit, richer descriptions, with more emphasis on the multiplicity of relationships and on precise formulation of logical constraints. One of the key principles of semantic technologies is to decouple information from applications, so that it can be redistributed and


re-used by other applications, both inside and outside the enterprise. Whilst current methodologies implicitly reference logical relationships, ontologies capture these explicitly, decouple them from the application, and make them available for machine processing.

For instance, a coded application procedure may make use of the programmer’s knowledge about the way years, months, weeks, days, and hours are related in order to process temporal data, without actually making this knowledge explicit in a way that can be reused by other applications, or redeployed for unforeseen purposes. In contrast, an ontology captures such knowledge explicitly, removing the need for it to be coded in at the application level, making the knowledge available for automated reasoning, and supporting reuse by other applications. As ontologies are the key enabler for semantic technologies, they are examined in depth in the section titled “Exploring Ontologies.”

Semantic Interoperability

Mapping and aligning ontologies provides a cohesive semantic view of multiple data sources, enabling searching, querying, and reasoning across them as though they were a single data store. Mechanisms for one ontology to import and use another at run time are provided, as well as tools for the semantic alignment of ontologies. Aligned ontologies are connected via explicit mapping of the entities in one ontology via semantic relationships to entities in the other ontology. Such alignment can be human-mediated or semi-automated, using heuristics and matching algorithms.

Semantic Brokers and Services

Semantic brokers and services take advantage of semantic data interoperability to provide intelligent search and other reasoning-based services over the interlinked data. The use of ontologies supports model-driven applications to access and process executable models of the domain.

Intelligent Agents

Finally, intelligent agents can use semantic brokers to find and compose services to undertake complex tasks on behalf of users. The modularity of data, logic, and application supports the composition and redeployment of each element for new and innovative uses.

APPLICATIONS AND BENEFITS

The innovations that semantic technologies offer simplify the process of achieving interoperability between data sources, paving the way for vastly improved searching, querying, and reasoning over the amassed data.

The semantic interoperability community of practice (SiCoP) forecasts that in the near term, semantic technologies will deliver the capabilities of information integration and interoperability, intelligent search, and semantic Web services, and in the longer term, will deliver model-driven applications, adaptive autonomic computing, and intelligent reasoning (Niemann et al., 2005). Each of these applications brings its own specific benefits.

Information Integration and Interoperability

Typically, an organization needs to work with and reconcile multiple data sources, including disparate systems within the enterprise or between different organizational systems in the supply chain, across an industry, between government organizations, or on the Web. The ability to seamlessly integrate these into a cohesive whole for search, querying, retrieval, and reasoning is clearly of great benefit. When business units or parts of the supply chain are not currently connected, or when a corporate merger takes place, the ability to connect data at a virtual semantic level, rather than having to physically merge it, is a powerful means to expedite operational efficiency and effectiveness.
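The explicit entity mappings just described can be sketched in a few lines of Python. The two vocabularies, the alignment table, and the data below are invented for illustration: once entities in one ontology are mapped to entities in the other, a single query spans both native stores as though they were one.

```python
# Sketch of ontology alignment: a human-mediated (or semi-automated) mapping
# between two vocabularies enables one query over both sources.
# All vocabularies and data here are invented for illustration.

# Two sources describing staff using different ontologies.
source_a = [{"type": "a:Employee",    "label": "Mary Smith"}]
source_b = [{"type": "b:StaffMember", "label": "John Smith"}]

# Explicit alignment: entities of ontology b mapped to entities of ontology a.
alignment = {"b:StaffMember": "a:Employee"}

def normalize(entity_type):
    """Map any entity onto the shared (aligned) vocabulary."""
    return alignment.get(entity_type, entity_type)

def query(entity_type):
    """Query across both sources as though they were a single data store."""
    return sorted(r["label"] for r in source_a + source_b
                  if normalize(r["type"]) == normalize(entity_type))

staff = query("a:Employee")   # retrieves matching records from both stores
```

Note that the same result is returned whichever vocabulary the query is phrased in, which is the practical content of “semantic interoperability” at the data level.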


Semantic technologies reduce the cost and effort involved in integrating or aligning heterogeneous data sources by obviating the need for system architects to perform pairwise mappings between each and every system in order to achieve interoperability. Using semantic technologies to appropriately represent the data at a meta-level, each system may only need to be mapped once to achieve interoperability with other systems. Having data mediation take place at the meta-level, based on explicit logical representation and automated inferencing, thus reduces the time and cost involved in achieving system interoperability.

Intelligent Search

Most search capabilities are currently based on string matching, and make limited use of conceptual relations like synonyms and contextual information, let alone more complex semantic relations. As a result, most searches deliver only a subset of the available relevant data and a large amount of irrelevant data. Sifting through search results to find the accurate “hits” is time-consuming and tedious. Additionally, the relevant data sources may not present the data in a way that is easily processable for the intended usage, so significant effort is needed to adapt the data into the right format, structure, terminology, measurement units, currency, and so on. In contrast, intelligent search, combined with interoperability capabilities, enables searches to be concept-driven. Context-sensitive, preference-driven searching can be performed across combined data sources to find all the relevant information, and, additionally, the approach supports flexibility in specifying the form results should take, with retrieved data automatically transforming accordingly.

Semantic Web Services

The World Wide Web has an existing suite of Web service standards based on universal description discovery and integration (UDDI), Web services description language (WSDL), and simple object access protocol (SOAP), and Semantic Web services combine these with a Web ontology language service specification (OWL-S), according to the OWL Services Coalition (2004). OWL-S provides a core set of constructs for describing the properties and capabilities of a Web service. Itself an ontology, it enables better discovery of relevant Web services and a more flexible framework for composing and using them.

Model-Driven Applications

Model-driven applications enable software applications to directly access and process actionable models of a domain. These models explicitly capture the entities, logical relationships, and business rules of the domain. By separating logic and business rules from applications, they can be maintained centrally and explicitly. When domain logic such as business rules changes, the domain model can be updated with the new logic and the change automatically flows through to the relevant applications. This enables software developers to produce software applications to support and execute business processes more quickly and easily. Less code maintenance is required and the business is able to be more responsive to changes in the business environment.

Adaptive and Autonomic Computing

Adaptive and autonomic computing capabilities enable applications to diagnose and forecast system problems and system administration needs. The use of self-diagnostics and support for complex systems planning are helpful for system administrators, and allow them to maintain reliable systems with less cost and effort involved.

Intelligent Reasoning

The ability to reason effectively over a virtual data store using meaning directly is clearly very powerful, and creates all kinds of new possibilities in terms of what machines can do for us and for business. Ontologies currently provide a formal logical semantics to enable automated reasoning over the amassed data. Whilst


traditional databases already have an underlying logical base supporting their existing functionality, which ensures, for instance, that querying is sound and complete, ontologies enable more sophisticated reasoning techniques to be used while still ensuring the fidelity of the results. Some of these techniques require greater processing power than databases typically use, and work is proceeding on optimization algorithms to ensure response time is not compromised. In the future, as reasoning tasks become more intelligent, techniques for probabilistic and other kinds of reasoning are likely to appear. Safeguards are required to ensure reasoning is conducted correctly and transparently, with the ability to provide accountability, including human-understandable explanations and justifications for any conclusions reached.

CASE STUDY: U.S. FEDERAL GOVERNMENT

Semantic Technologies have extensive applications for interoperability, integration, capability reuse, accountability, and policy governance within and across government agencies (Hodgson & Allemang, 2006). The U.S. Federal Government has made use of semantic technologies to improve cost-effectiveness and service quality across the U.S. Federal Government agencies, as described in detail in Allemang, Hodgson, and Polikoff (2005) and outlined below.

Federal Enterprise Architecture Reference Models

In the U.S. in 2004, a federal enterprise architecture (FEA) designed by the U.S. Office of Management and Budget (OMB) to facilitate cross-agency analysis and identify duplicative investments, gaps, and opportunities across U.S. Federal Agencies was released. The framework for providing these benefits comprised five reference models relating to performance, business, services, technology, and data. These models were conceived by researching and assembling the current practices of the various government agencies, with a view to each agency aligning its architecture to a common reference model, enabling architects in other agencies to more readily understand architectural components and to identify possibilities for collaboration and re-use. The specific goals of the FEA architecture framework included:

• Elimination of investments in redundant IT capabilities, business processes, and capital assets
• Identification of common business functions across agencies and reuse of business processes, data, and IT components for time and cost savings
• A simpler way for agencies to determine whether potential new IT investments were duplicating efforts of other agencies, eliminating unnecessary expenditure
• A means for agencies to evolve the FEA business reference model as their needs and situations changed

These goals had the potential to save the U.S. Federal Government many millions of dollars annually, while significantly improving the quality and effectiveness of government services.

Formalizing the FEA Reference Models Using Ontologies

The reference models of the FEA were written in natural language and presented as PDF files. Whilst they could be read by anyone, the alignment process could not be implemented or verified without significant subjective interpretation, which is prone to ambiguity and errors. Creating formal representations of the reference models provides objective criteria for conformance. However, the reference models were exceedingly complex and could not be represented by simple lists and hierarchies. Luckily, by early 2004, the OWL (Web ontology language) standard (Smith, Welty, & McGuinness, 2004) had been formally recommended by the World Wide Web Consortium (W3C), the Web’s governing body. By using OWL ontologies,


it was possible to fully capture the models formally, ensure the conformance and logical consistency of implementations, and provide a basis for combining the implementations of different agencies into a unified whole.

The work of creating the ontologies was performed by TopQuadrant Consultants over a 3-month period. By creating a set of OWL ontologies to cover the five reference models, plus bridging and reference ontologies, they were able to create an ontology-based system to support an automated advisor to answer questions such as:

• Who is using which business systems to do what?
• Who is using what technologies and products to do what?
• What systems and business processes will be affected if we upgrade a software package?
• What technologies are supporting a given business process?
• Where are components being re-used or where could they be re-used?
• What are the technology choices for a needed component?
• How is our agency architecture aligned with the FEA?

An ontology graph was produced, which captured the rich relationships connecting the concepts stated across the five FEA reference models. These relationships provided a basis for understanding and reasoning over the overall model. Some of the resulting benefits included:

• Answering the listed questions through use of model querying and automated reasoning. For instance, automated graph traversal reasoning was used to infer “line-of-sight” between different enterprise entities.
• Context-specific information: a “capabilities advisor”, using a semantic engine to advise different stakeholders on the capabilities available or in development to support the FEA and the U.S. President’s e-government initiatives, was able to provide project-specific guidance for preparing business cases, ensuring project compliance with the FEA, knowledge of related initiatives and possible duplication, and candidate federal, state, and local partners for the project.
• Ability to dynamically generate cross-reference tables showing multidimensional agency relationships and capabilities, through use of a “model-browser” directly linked to the relevant data, ensuring an up-to-date view over all information gathered directly from the information source.

EXPLORING ONTOLOGIES

Broadly speaking, an ontology is any specification of a conceptualization, and, in this broad sense, can include virtually any kind of model or representation, including taxonomies, entity-relationship diagrams, flowcharts, and so on. In recent years, ontologies have drawn from the disciplines of artificial intelligence, particularly knowledge representation and reasoning, and formal logics, evolving the ability to represent more complex relationships supported by an underlying formal semantics. This section explores those capabilities. The Semantic Web ontology language (OWL) is currently the most well-developed language for building ontologies, and the examples and descriptions used may be taken to reflect OWL unless stated otherwise. Please note, however, that OWL is not confined to use on the Web: being XML-based it may be implemented as widely as XML itself.

Expressing Knowledge

The typical constructs used by ontologies include classes (also known as concepts), instances (or individuals), and properties (or relations), which have a complex set of possible roles, interrelationships, and constraints. Instances correspond to individual things that have associated properties, whilst classes are various


Figure 2. Typical ontology constructs


groupings over those things, and properties are the connections between them. Figure 2 shows an example of an ontology illustrating these notions.

• Classes contain instances; for example, the class Female contains the specific individual Mary Smith.
• Classes are typically related to each other by subclass relations, meaning that the instances in one class are a subset of another; for example, Male is a subclass of Human. Subclasses inherit the properties of all their superclasses; for example, if Engineer is a subclass of Technical Profession, and Technical Profession is a subclass of Occupation, then Engineer inherits all the properties of both Technical Profession and Occupation. This entails that instances of subclasses are automatically classified as instances of the classes above; for example, if John Smith is a Male, he is automatically an instance of the class Human also, inheriting any properties of Human.
• There can be distinct sets of class-subclass hierarchies that overlap; that is, ontologies are not just a tree (hierarchy) but a graph. For instance, Engineer can be a subclass of both Technical Profession and of Person.
• Classes may be disjoint from each other, that is, have no instances in common. For example, the class Person may have subclasses Male and Female defined to be disjoint from each other, so that no Person may be an instance of both Male and Female. A set of subclasses may also give complete coverage of the class they belong to—for instance, it can be specified that the two classes Male and Female completely cover the class Person, so that every Person must be an instance of either Male or Female; there can be no Person who is neither Male nor Female.
• Classes may have properties which connect them to specific literal values or individuals; for example, a Person may have a specific age which is a non-negative integer, and have a specific relationship to other individuals. For example, a Person can be a familial relative of another Person. While the property is defined on the class, note that it applies to the individuals in the class, rather than to the class itself—that is, it is each individual Person who has an age value, not the class Person itself.
• Properties may have specific domains and ranges. For example, “husband of” is a property with domain Male and range Female. This means that the Husband Of property may only apply to an individual who is an instance of the class Male and may only connect that individual to an individual who is an instance of the class Female.
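The subclass inheritance and disjointness behaviour described in these bullets can be sketched in a few lines of Python, reusing the chapter’s class names. This is an illustrative toy, not an OWL reasoner: it computes the transitive closure of subclass relations and checks inferred types against declared disjointness.

```python
# Toy classifier: instances of a subclass are automatically instances of
# every superclass, and disjointness can be checked over the inferred types.
# Class names follow the chapter's examples; the code itself is illustrative.

subclass_of = {
    "Male": ["Person"],
    "Female": ["Person"],
    "Engineer": ["TechnicalProfession", "Person"],  # multiple superclasses
    "TechnicalProfession": ["Occupation"],
}
disjoint = {("Male", "Female")}

def superclasses(cls):
    """cls plus all classes above it (transitive closure of subclass-of)."""
    result = {cls}
    for parent in subclass_of.get(cls, []):
        result |= superclasses(parent)
    return result

def classify(asserted_types):
    """Infer every class an individual belongs to from its asserted types."""
    inferred = set()
    for cls in asserted_types:
        inferred |= superclasses(cls)
    return inferred

def consistent(types):
    """No pair of the individual's inferred types may be declared disjoint."""
    return not any(a in types and b in types for a, b in disjoint)

john = classify({"Male", "Engineer"})   # also a Person and an Occupation
```

Asserting that John is a Male and an Engineer automatically classifies him as a Person and an Occupation; asserting Male and Female together would be flagged as inconsistent with the disjointness declaration.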

Chapter 2 Overview of Semantic Technologies


Figure 3. Example ontology


• Cardinalities and other property characteristics like subproperty of, transitivity, being the inverse of another property, and so on may be specified. For instance, Has Age is given a cardinality of exactly one: a Person has exactly one age. Has Husband would have a cardinality of maximum one: a Female may have no more than one husband, but may have no husband. Has Wife is the inverse property of Husband Of: if a certain Male is the husband of a Female, then that Female is the wife of the Male. Has Husband is also a subproperty of Familial Relative Of, and thus inherits from and specializes this property.

In building an ontology, there are potentially many design decisions to be made in choosing how to represent the domain, to ensure the ontology will best suit the stated purpose. More than one model may be considered to be "correct", but usually some designs will provide the desired functionality more readily than others. As experience and understanding develops, ontology engineering is emerging as a research area and profession in its own right.

Components

Ontologies may be understood in terms of language, structure and content components. While closely intertwined, each component performs a separate and distinct function.
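Two of the property characteristics bulleted above, a maximum-cardinality constraint and subproperty inheritance, can be sketched in plain Python (illustrative names only, not a real OWL API):

```python
# One asserted fact, mirroring the Has Husband example in the text.
facts = {("MarySmith", "has_husband", "JohnSmith")}
# has_husband specializes familial_relative_of (a subproperty, as in the text).
subproperty_of = {"has_husband": "familial_relative_of"}

def with_superproperties(triples):
    """Every (s, p, o) also holds for each superproperty of p."""
    out = set(triples)
    for s, p, o in triples:
        while p in subproperty_of:
            p = subproperty_of[p]
            out.add((s, p, o))
    return out

def max_cardinality_ok(triples, prop, limit=1):
    """e.g. has_husband has maximum cardinality one: no more than one husband."""
    counts = {}
    for s, p, o in triples:
        if p == prop:
            counts[s] = counts.get(s, 0) + 1
    return all(n <= limit for n in counts.values())

inferred = with_superproperties(facts)
assert ("MarySmith", "familial_relative_of", "JohnSmith") in inferred
assert max_cardinality_ok(inferred, "has_husband")
# A second husband for Mary would violate the cardinality constraint:
assert not max_cardinality_ok(
    inferred | {("MarySmith", "has_husband", "BobJones")}, "has_husband")
```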



Figure 4. Ontology components

Ontology Language

The ontology language provides the fundamental modeling constructs for building specific ontologies. It includes language constructs relating to classes, properties, instances, and other formal constructs reflecting the various interrelationships and constraints these may have. Grammatical rules specify how these may be combined. Every ontology written in the ontology language uses these constructs, independently of the domain being modelled.

The particular language used is chosen by the ontology builder for its ease of use, expressivity, logical properties, and tool support. OWL is a very expressive ontology language, based on a kind of formal logic known as description logics. Several subspecies or "flavors" of OWL with different expressive and logical properties are available (OWL-Full, OWL-DL, and OWL-Lite). OWL essentially extends the constructs of the resource description framework (RDF) and RDF-schema.

Ontology Model Structure

Using the ontology language, a model is built to represent information about a domain of interest. This structure is like a template or stencil, specifying the concepts and the logical relationships and constraints they must satisfy in every specific case. For instance, the example in Figure 3 defines classes Person, Male, and Female and their relationships. In description logics, this part of the ontology is referred to as the T-box, as it is where the terminology is defined.

Ontology model structure is usually static in real-time processing (excepting the provisions for automated merging and importation between ontologies), but ontology authors or owners may choose to adapt and extend it as often as they wish, usually in a way that is backwards compatible with previous versions of the ontology, unless the entire conceptualization is radically changed. Ontologies for business are likely to be reasonably shallow and relatively static, whereas ones describing intricate research domains



like medical science may need progressive clarifications, extensions, and revisions as the underlying understanding of the research area evolves and the conceptual model changes.

Ontology Content

Ontology content pertains to the object level of the ontology, corresponding to specific facts, individuals, and data values populating the ontology model structure. In the example in Figure 3, this would include specific individuals like John Smith, his gender, age, and relationships. In description logics this is referred to as the A-box, as it is where assertions are made. Ontology content reflects and conforms to the ontology model structure; for example, when John Smith is asserted to be Male, he is automatically a Person, because Male has been defined as a subclass of Person at the structural (terminological) level.

Depending on the tools that are used to construct and edit the ontology, in some cases it will not be possible to insert inconsistent or nonconformant data, and in others errors will automatically be identified. For instance, an attempt to assert that John Smith is both Male and Female will either not be permitted or will be flagged as an error, if the ontological structure has specified that these two classes must be disjoint and, therefore, can have no instances in common.

In some ontology languages, the structure and content levels are not kept separate: for instance, in OWL-Full, a class may also be an instance, while OWL-DL and OWL-Lite do not allow this. While this gives more freedom of expression, it has ramifications for inferencing capabilities, as the underlying logic is no longer tractable. In contrast, the OWL-DL and OWL-Lite flavors of OWL are well-behaved in every respect.

Features of Ontologies

Virtual Structures

An ontology is a virtual conceptual structure over distributed physical resources, dynamically linking multiple data sources. Elements of the ontology language, model structure, and content can physically reside anywhere: the ontology language may reside in a W3C namespace, linked in by its URI, the ontology structure can live on an analyst's desktop in another namespace, and the data can live in a corporate database. For instance, John Smith's age may reside in a human resources database, while the ontology contains a unique identifier providing a direct link to this data. All that is needed is a way of uniquely specifying the address/location of the data or resource, via a URI or some other mechanism. This approach ensures that data can be maintained centrally and applications always access the current information.

Ability to Import, Merge, and Align at Run-time

Additionally, ontologies can import and build on other ontologies, providing the ability to reuse and extend ontologies. This can occur at the design phase, but can also occur at run-time, merged based on matching URIs or identifiers: if two data resources have the same URI, they are assumed to be the same, and a combined ontology structure is generated on this basis. There are also language constructs within ontology languages to explicitly specify that one information resource is the same as another, even though they may have different identifiers, for example, synonymous concepts in different ontologies.

Connection to Formal Logics

Typically, ontology languages are designed to have what is called a "formal semantics," which gives inference rules for drawing valid conclusions from an existing knowledge base. This ensures that starting from a knowledge base that includes only propositions that are true, and following only the specified rules of inference to deduce more propositions, will be guaranteed to generate only statements which are also deductively true. Under certain circumstances, this process can also be guaranteed to produce every possible logical



conclusion (RDF and the OWL-Lite and OWL-DL subspecies of OWL have this property).

OWL has a description logic formal semantics, where description logics are a well-behaved fragment of first-order logic. The ontology engineer has some choice in selecting a suitable underlying description logic via the language used to build the ontology. This choice should optimize the relative expressiveness and inferencing capabilities, as in any formal logic there is a tradeoff between logical expressivity (freedom in what can be stated) and decidability (the ability to determine, using a terminating procedure, whether a proposition is or is not a valid inference).

Inferencing with Reasoning Engines

Inferencing over ontologies based on formal logics can be automated using reasoning engines, the implications being:

• Two or more ontologies can be joined automatically to create a logically consistent, combined, interconnected framework. Given the nodes to be matched, the ontologies organize themselves to form a cohesive whole that is logically consistent across all the component ontologies.
• Implicit knowledge can be generated automatically from the knowledge representation; that is, it is possible to generate statements which must be true, given what has been asserted, even though they have not themselves been explicitly asserted. Because ontologies have an underlying logical basis, they can be used with automated reasoners to derive new information which is implicit in the ontology but not explicitly represented. From the example in Figure 3, it can be determined that if John Smith is the husband of Mary Smith, then Mary Smith is the wife of John Smith, because husband of and wife of are inverse properties; thus it is a logical necessity. Similarly, if Mary Smith is the wife of John Smith, she is not anyone else's wife because of the cardinality constraint on the wife of property, and furthermore, John Smith must be Male due to the range constraint.
• Combining multiple ontologies into a consistent logical framework gives the potential to generate logical conclusions that could not be made from any component ontology alone.
• The data described by ontologies may be queried in a large variety of different and novel ways. Automated inference procedures can deduce the answers, even though they are not specifically asserted in the knowledge base, simply because they can be logically derived from the ontology's specification of logical relationships and constraints. This optimizes re-use, and enables data to be used in novel ways unanticipated by the data owners or even by the ontology designers.

Rules

It is a matter for debate whether rules are part of or separate from ontologies; the answer depends on the specific language used and the logical constructs it supports. Generally speaking, rules can be used alongside ontologies as a cohesive whole, providing additional logical constraints above and beyond what is represented in the ontology as logical axioms. This can be useful for making business rules explicit, providing the ability to apply different sets of business rules to the same ontology and its data.

Tools

Ontologies need to be supported with tools for viewing, editing, querying, and alignment. Because ontologies are often inherently complex, ontology editors often provide multiple views over the ontology. Display may be text-based, graphical, or a combination of both, and editors will usually offer a variety of views from different perspectives, for example, by properties, classes, individuals, and so on.

Annotation

Ontology languages provide the means to annotate elements of an ontology with descriptive text, directly linked to those elements.
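The reasoning-engine behaviour discussed in this section (implicit statements following from inverse properties and domain/range constraints) can be sketched as a toy forward-chaining loop. This is a plain-Python illustration under assumed rules, not real OWL semantics or any reasoner's API:

```python
# One asserted triple; everything else below is derived by rule application.
triples = {("JohnSmith", "husband_of", "MarySmith")}
inverse = {"husband_of": "wife_of"}
domain_range = {"husband_of": ("Male", "Female")}  # (domain, range) of the property

def step(kb):
    """Apply each rule once to every triple, returning the enlarged knowledge base."""
    new = set(kb)
    for s, p, o in kb:
        if p in inverse:
            new.add((o, inverse[p], s))     # inverse-property rule
        if p in domain_range:
            dom, rng = domain_range[p]
            new.add((s, "type", dom))       # subject classified via the domain
            new.add((o, "type", rng))       # object classified via the range
    return new

# Chain forward to a fixpoint: stop when no rule adds anything new.
while True:
    nxt = step(triples)
    if nxt == triples:
        break
    triples = nxt

assert ("MarySmith", "wife_of", "JohnSmith") in triples   # implicit, now explicit
assert ("JohnSmith", "type", "Male") in triples           # forced by the constraint
assert ("MarySmith", "type", "Female") in triples
```

Production reasoners implement far more sophisticated (tableau-based) procedures, but the principle is the same: statements not asserted anywhere are nevertheless guaranteed true by the ontology's constraints.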



Figure 5. Screenshots from the Protégé ontology editing tool developed by Stanford Medical Informatics (Free, open source software available at http://protege.stanford.edu/)



Ontologies vs. Other Data Structures and Models

Ontologies are generally compatible with other methods, and rather than replacing them, leverage their value. A brief comparison of the key differences between ontologies and other technologies follows.

Ontologies vs. Data Models

While a data model may be the outcome of a conceptual analysis and provides the design for a database, the data model itself is not directly linked to the data, whereas an ontology is an explicit map of data, directly linked to the data. Updating a data model, such as an entity-relationship diagram, does not automatically generate new knowledge about instance data or adapt the way the data links to other data sources, but updating an ontology can potentially do so. Data models are not as expressive as ontologies: ontology languages are richer and treat relationships as "first-class" constructs. Data models can usually be constructed as simple ontologies in a straightforward manner.

Ontologies vs. Unified Modeling Language (UML)

UML is a specific modeling language, and compared to current ontology languages, it provides more constructs because it is intended not only for data modeling, but for modeling processes, use cases, and so on. However, it is not linked directly or dynamically to the data and does not support automated reasoning. Work is proceeding on a formal semantics for UML, a UML standard notation for ontologies, and executable capabilities, and it is likely that a convergence between ontologies and UML will be reached at some point. In the short term, a tool for importing UML data models directly into an ontology editor is certainly feasible.

Ontologies vs. Databases

While data can be stored within an ontology, ontology tools are generally not optimized to do this, and for performance reasons it is not advisable unless working with a very small data set. Some database vendors such as Oracle now support some aspects of semantic technologies in conjunction with more traditional database technology. Rather than replacing databases, ontologies are best used in conjunction with them, providing a conceptual virtual view over the data, enabling interoperability and leveraging its value.

Ontologies vs. Taxonomies

Taxonomies and ontologies are related, but whereas a taxonomy has a tree structure, ontologies have a graph structure due to their ability to support multiple inheritance and to link via properties. Also, ontologies have a much richer ability to capture relationships than the basic taxonomical "is-a" relation. Taxonomies may be viewed as very simple ontologies.

Ontologies vs. Expert Systems

Ontologies may be considered as weak expert systems, in the sense that they make knowledge explicit, can use rules, and support deductive capabilities. However, ontologies are currently more geared to capturing knowledge than to decision making per se. Intelligent agents using ontologies may, in the future, incorporate some of the capabilities envisioned for expert systems.

THE SEMANTIC WEB

Semantic Technologies and the Semantic Web

While "semantic technologies" is an umbrella term encompassing all those technologies that seek to explicitly specify, harness, and exploit meaning for automated processing, the Semantic Web is a specific application of this idea to the World Wide Web. The World Wide Web Consortium (W3C), in collaborative association with many researchers and other organizations, has developed a suite of complementary technologies



which together comprise the "Semantic Web." In one sense, the Semantic Web is narrower than semantic technologies generally, as not all semantic technologies necessarily make use of the W3C-endorsed recommendations. However, in scope, the Semantic Web is broader and probably more challenging than any other application, as it potentially interlinks data across the breadth and depth of the entire Web, and needs to handle the constant ebb and flow of available data sources across an unlimited and constantly evolving subject domain. The Semantic Web thus has to be more robust, or less brittle, than any other application of semantic technologies, as any brittleness is likely to produce cracks very quickly in such a demanding environment.

Idea

"The Semantic Web is an extension of the existing Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation" (Berners-Lee, Hendler, & Lassila, 2001).

The Semantic Web coalesced as a specific vision for the World Wide Web in 1998, initiated by Berners-Lee himself (Berners-Lee, 1998). However, many of the principles on which semantic technologies are based pre-date the Web itself, coming from diverse areas such as artificial intelligence, formal logics, database theory, information modeling, and library science. The Semantic Web has been an effective catalyst to crystallize the efforts of many research and industry groups into a cohesive and coordinated effort, and currently represents the most highly developed and complete approach for delivering semantic technologies. The Semantic Web suite of standards is not confined for use only on the Web: it is equally applicable to enterprise systems for organizing internal data, or across private data networks coordinating information between multiple participants.

Figure 6. Semantic Web “layer cake” (Berners-Lee & Swick, 2006)



Table 1. Description and status of Semantic Web ‘layer cake’ elements, as of August 2007

Element Description

Unicode The basic character set encoding (pre-existing). Status: Operational

URI Universal Resource Locator: Provides a mechanism for uniquely identifying and locating current and future resources on the Web. Status: Operational

XML Extensible MarkUp Language: XML provides a syntax for structuring data and tagging it, without specifying or constraining the structure or tags.

XML Schema is a language for restricting the structure of XML documents. Status: Operational

RDF Resource Description Framework and RDF-S RDF Schema: RDF is a simple data model for referring to objects (known in RDF as resources) and specifying how they are related. An RDF-based model can be represented in XML syntax. RDF Schema is a vocabulary for describing properties and classes of RDF resources, with a semantics for generalization hierarchies of such properties and classes. Status: Operational; significant database vendor support implemented

OWL Web Ontology Language: OWL adds more vocabulary for describing properties and classes, such as relations between classes (e.g., disjointness), cardinality (e.g., "exactly one"), equality, richer typing of properties, characteristics of properties (e.g., symmetry), and enumerated classes. Status: Operational; further extensions in development

RIF Rule Interchange Format: Certain kinds of logical constraints cannot be implemented by OWL alone. Rule languages provide a means to implement these and are potentially very useful for encoding business rules. A RIF working group is currently active at W3C, developing a framework for rule interchange. Status: Candidates are under consideration, including SWRL and Rule-ML

SPARQL RDF Query Language: Provides the ability to query RDF. Similar in nature to SQL, but SPARQL allows a query to consist of triple patterns (for RDF triples), conjunctions, disjunctions, and optional patterns. Status: Candidate Recommendation

Unifying Logic A logical framework providing Formal Semantics for inferencing. OWL currently has a Description Logic basis. Status: DL Formal Semantics for OWL (2004); Horn Logics proposed for RIF; continuing evolution.

Proof Logical conclusions by themselves are not convincing. This layer provides justification of inferences made, giving logical grounds for inferences. Status: In development

Trust Once a basis of logic and proof is set up, it leads to an environment of trust for conducting transactions. Status: A social variable, to be engendered by the technologies in development, especially Proof and Crypto

Crypto Supports privacy and security. Status: In development

User Interface and Applications: Provide the semantic technology to the user through appropriate user interfaces and applications. The W3C has emphasised the need for more well-designed UIs to encourage the spread of semantic technologies. Status: Mechanisms to embed RDF in existing Web pages are in development.
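The SPARQL entry in Table 1 can be illustrated with a toy triple-pattern matcher. This is a rough plain-Python sketch of what a conjunctive SPARQL query does (match patterns containing variables against a triple store and join the bindings); the store contents and the "?x" variable convention are invented for illustration and this is not SPARQL syntax or a real query engine:

```python
# A tiny RDF-like triple store.
store = {
    ("JohnSmith", "husband_of", "MarySmith"),
    ("JohnSmith", "type", "Male"),
    ("MarySmith", "type", "Female"),
}

def match(pattern, triple, bindings):
    """Unify one pattern against one triple, extending the variable bindings."""
    b = dict(bindings)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):                    # a variable
            if p_term in b and b[p_term] != t_term:
                return None                           # conflicting binding
            b[p_term] = t_term
        elif p_term != t_term:                        # a constant that must match
            return None
    return b

def query(patterns):
    """Conjunctive query, in the spirit of
    SELECT * WHERE { ?x husband_of ?y . ?y type ?c }"""
    results = [{}]
    for pat in patterns:
        results = [b2 for b in results for t in store
                   if (b2 := match(pat, t, b)) is not None]
    return results

rows = query([("?x", "husband_of", "?y"), ("?y", "type", "?c")])
assert rows == [{"?x": "JohnSmith", "?y": "MarySmith", "?c": "Female"}]
```

The answer is not stored as a single fact anywhere; it emerges from joining the two patterns, which is the essence of querying data through an ontology.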



Relationship to the World Wide Web

The Web's governing body, the World Wide Web Consortium (W3C), believes the Web can only reach its full potential when data can be shared, processed, and used by automated tools as well as people, and furthermore can be used by programs that have been designed independently of each other and of the original data sources. The Semantic Web does not replace the existing Web, but builds on it, enabling better interoperability and further capabilities. While the existing Web focuses on uniquely identifying resources (URIs), displaying information using HTML, and publishing documents online, the Semantic Web focuses on data and interlinking it, ultimately supporting intelligent Web services/agents.

Semantic Web Components

The Semantic Web is comprised of several layers, summarized by Tim Berners-Lee's now famous "Semantic Web Layer Cake" shown in Figure 6, which has been revised several times since it first appeared. The lower layers are already well-established Web standards used in the existing Web (URI, Unicode, XML), while the higher layers are specific to the Semantic Web and build on the platform provided by the existing technologies. Table 1 contains a brief description of each element and its current status. Full details may be found by following the links at the W3C's Web site (www.w3.org).

ISSUES AND CHALLENGES

Semantic technologies range from being emergent to being quite well-developed and mature. Many of the key issues are social, rather than technological. The exposition in this chapter has concentrated on the vision of semantic technologies. While key technical components have been delivered, widespread adoption requires addressing several issues.

Large-Scale Semantic Markup of Existing Data

Semantic technologies rely on data owners to semantically mark up their data and, to date, there is no way to automate the process. There is an element of critical mass here: if only a few sources are marked up, not as much value is delivered. However, the benefits of sharing data effectively among even a few sources can be quite considerable. Whether it becomes universal remains to be seen, but key players in the IT industry are starting to embrace semantic technologies. The successful and widespread adoption of bottom-up tagging in Web 2.0 applications such as Flickr and del.icio.us has shown the potential of the approach and the willingness of participants to do tagging to support virtual communities. While existing tagging is essentially unorganized, semantic technologies can provide structure, logic, and reasoning to make tagging far more powerful.

Large-Scale Data Manipulation and Querying

Tools and techniques for supporting large-scale applications need further development. While Oracle and other vendors currently support RDF triple stores, further integration with existing database technologies is needed. Some semantic techniques and algorithms need further optimization to ensure computing resources can adequately support them.

Ontology Building by Non-Experts

A new initiative known as Sydney OWL Syntax helps non-logicians to build ontologies by offering the option of using a simple English syntax for building and reading ontologies, instead of having to use formal logical or XML-based notations (Cregan, Schwitter, & Meyer, 2007). This is to be supported by a guided interface.



Standards and Methods for Resolving Meaning

Interoperability relies on representing data explicitly and mapping it to other representations. However, it is not always obvious whether it is appropriate for the elements of different ontologies to be mapped to each other, and the constructs currently available in ontology languages for mapping them are somewhat limited. The ideal scenario would include a more descriptive mapping for data transformations and mediation without human intervention. One approach is to gather stakeholders to jointly develop ontologies for a domain, for use as a common standard. It is not imperative that everyone use the standard, only that they map their own ontology to it as a kind of "lingua franca". This avoids the need for pairwise mappings of every ontology needed for interoperability, as each can simply be mapped once to the common standard.

On the other hand, in order to truly automate the resolution of meaning in a way that is dynamic and adaptable, it is necessary to have a solid understanding of the underlying theory of semantics: not just formal semantics, but cognitive semantics, situated meaning, the identification of semantic primitives, and symbol grounding strategies. Work on upper-level ontologies fits in this space, as does the author's own work on symbol grounding for the Semantic Web (Cregan, 2007).

Dealing with Incomplete, Uncertain, and Probabilistic Data

Real-world data tends to be imperfect and is not always suited to deductive reasoning. A W3C incubator group [URW3] has formed to investigate bridging the gap between the reasoning capabilities currently provided by Semantic Web technologies and what is needed to deal effectively with incomplete, uncertain, and probabilistic data.

Proof, Trust, and Security

The capability of accessing and dynamically linking data must be balanced with appropriate measures to handle who can access what data at what level, ensuring privacy, security, and the protection of digital rights. Challenges also include ensuring that intelligent agents will be accountable for their actions and able to provide useful and understandable explanations of the chain of reasoning underlying their decisions.

CONCLUSION

While there are issues and challenges to be addressed and further developments in the pipeline, semantic technologies are already sufficiently developed to be applied in real-world scenarios, as shown by the case study, to achieve interoperability and other benefits. Looking to the future, semantic technologies hold great promise for delivering more intelligent information services, enabling much more effective support for finding, analyzing, and using knowledge, hopefully leading to the emergence of what might be called "pragmatic technologies" which use this knowledge as a basis for effective, informed, automated action.

ACKNOWLEDGMENT

NICTA is funded by the Australian Government's Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia's Ability and the ICT Centre of Excellence program. It is supported by its members the Australian National University, University of NSW, ACT Government, NSW Government, and affiliate partner University of Sydney.



REFERENCES

Allemang, D., Hodgson, R., & Polikoff, I. (2005). Federal reference model ontologies (FEA-RMO), version 1.1. TopQuadrant. Retrieved June 6, 2007, from http://www.topquadrant.com/documents/TQFEARMO.pdf

Berners-Lee, T. (1998, October). Semantic Web roadmap (W3C Draft 14). Retrieved June 6, 2007, from http://www.w3.org/DesignIssues/Semantic.html

Berners-Lee, T., & Fischetti, M. (1999). Weaving the Web. San Francisco: Harper.

Berners-Lee, T., Hendler, J., & Lassila, O. (2001, May). The Semantic Web. Scientific American.

Berners-Lee, T., & Swick, R. (2006). Semantic Web development final technical report (Tech. Rep. No. AFRL-IF-RS-TR-2006-294). Retrieved from http://www.scribd.com/doc/122258/semantic-web-development-2006

Bray, T., Paoli, J., Sperberg-McQueen, C.M., Maler, E., & Yergeau, F. (Eds.). (2006, August 16). Extensible markup language (XML) 1.0 (4th ed.) (W3C Recommendation). Retrieved June 6, 2007, from http://www.w3.org/XML/Core/#Publications

Cregan, A.M. (2007). Symbol grounding for the Semantic Web. In Proceedings of ESWC07 (pp. 429-442). Retrieved June 6, 2007, from http://www.eswc2007.org/

Cregan, A.M., Schwitter, S., & Meyer, T. (2007). Sydney OWL syntax - Towards a controlled natural language syntax for OWL 1.1. Retrieved June 6, 2007, from http://www.ics.mq.edu.au/~rolfs/sos

Gruber, T.R. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2), 199-220.

Hodgson, R., & Allemang, D. (2006). Semantic technologies for e-government (SiCoP Working document). Retrieved June 6, 2007, from colab.cim3.net/file/work/SICoP/2006-02-09/Proposals/RHodgson.pdf

Lyman, P., & Varian, H.R. (2003). How much information 2003? Retrieved June 6, 2007, from http://www.sims.berkeley.edu/how-much-info-2003

Niemann, B., Morris, R.F., Riofrio, H.J., & Carnes, E. (Eds.). (2005). Introducing semantic technologies & the semantic Web (SICoP White Paper Series Module 1). Retrieved June 6, 2007, from http://colab.cim3.net/file/work/SICoP/WhitePaper/

OWL Services Coalition (2004). OWL-S: Semantic markup for Web services (W3C Draft). Retrieved June 6, 2007, from http://www.daml.org/services/owl-s/1.0/owl-s.pdf

Smith, M.K., Welty, C., & McGuinness, D.L. (Eds.). (2004, February 10). OWL Web ontology language guide (W3C Recommendation). Retrieved June 6, 2007, from http://www.w3.org/TR/2004/REC-owl-guide-20040210/. Latest version available at http://www.w3.org/TR/owl-guide/

3 Integrating Topic Maps into the Semantic Web

3.1 Building Topic Maps in OWL-DL

Title of Publication: Building Topic Maps in OWL-DL
Type of Publication: Conference Paper
Appears In: Proceedings of Extreme Markup Languages 2005
Publication Details: Published Online, www.idealliance.org/papers/extreme/proceedings/
Publication Date: 2005
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan
Personal Contribution: 100%


Extreme Markup Languages 2005® Montréal, Québec August 1-5, 2005

Building Topic Maps in OWL-DL

Anne Cregan University of New South Wales

Abstract

Both ISO's Topic Map Standards and the W3C's Semantic Web Recommendations provide the means to construct meta-level semantic maps describing relationships between information resources. Developed independently, attempts at interoperability between the original Topic Map standard and RDF have proved challenging. However, ISO 13250's drafting of an explicit Topic Map Data Model (TMDM) early in 2005, combined with the advent of the W3C's more expressive Web Ontology Language (OWL) Recommendations in 2004, together now provide the possibility for authoring TMDM-conforming Topic Maps directly in OWL. OWL provides the ability to express the TMDM constraints explicitly and to ensure that Topic Maps authored in OWL conform to the TMDM. This paper presents a construction of the TMDM model as an OWL-DL ontology. This "TMDM Ontology" is a construction in OWL of the Topic Map concepts modelled by the TMDM, making them available for use by Topic Map authors as a basis for building TMDM-compliant Topic Maps directly in OWL. Using OWL-DL as the language for Topic Map authoring gives users access to OWL's formal semantics, constraint expressivity, and suite of tools such as Protégé, which includes an API and capability for ontology visualisation, querying, and automated constraint checking and reasoning using Description Logic reasoners. The approach described does not require the use of any Topic Map language, or Topic Map Constraint or Query language, although a simple algorithm could translate the OWL-authored Topic Maps directly into other TMDM-based Topic Map authoring languages if required, providing access to Topic Map engines and tools. Illustrating by example, a Topic Map written in OWL-DL using Protégé is shown, highlighting the constraint and querying abilities provided, which overlap many of the requirements set by the ISO for a Topic Map Constraint Language and Query Language.
One outstanding issue regarding the interpretation of Typing is identified, and options for its resolution are discussed.

Building Topic Maps in OWL-DL

Table of Contents

1 Motivation
2 Scope of the Approach
3 Background
3.1 The ISO's Topic Map Tradition
3.1.1 Topic Map Origins
3.1.2 Incomplete Specification of Topic Maps
3.1.3 A Formal Data Model for Topic Maps
3.1.4 The TMDM is not the last word on Topic Maps
3.2 The Semantic Web Initiative
3.2.1 Berners-Lee's Vision
3.2.2 Formal Semantics
4 Interoperability Issues
4.1 Topic Maps and RDF
4.1.1 Interoperability Challenges
4.2 TMDM and OWL
4.2.1 New Interoperability Possibilities
4.2.2 TMDM to OWL?
4.2.3 OWL to TMDM?
5 TMDM to OWL Mapping Investigation
5.1 The TMDM side
5.1.1 What does the TMDM give us?
5.1.2 But wait, lurking in the back of the TMDM closet...
5.2 The OWL side
5.2.1 Flavours of OWL
5.2.2 Which flavour of OWL tastes best to make Topic Maps?
5.2.3 What can be asserted in OWL-DL?
5.2.4 OWL-DL can represent E-R models!
5.3 Using OWL-DL to build the TMDM
5.3.1 The Approach In a Nutshell
5.3.2 Why would Topic Map authors want to use OWL-DL to build TMs?
6 Current Status
6.1 Work conducted to date
6.2 Functionality Implemented and Outstanding
6.3 Intention
7 Designing and Building the TMDM Ontology
7.1 The Design Phase
7.1.1 Design Choices
7.1.2 Identifying Individuals
7.1.3 Filling in the rest of the Model
7.1.4 A note about the use of Topics for Typing other Topic Map Objects
7.2 The Ontology Model
7.3 Additional Constraints Implemented
7.3.1 Internal and External Occurrences

7.3.2 At least one Locator for each Topic...... 10
7.3.3 Variant Name Scope must be a true Superset of Topic Name Scope...... 10
7.4 Implementation...... 11
8 Examples...... 11
8.1 Topics and Subjects...... 11
8.2 Occurrences...... 13
8.3 Topic Names and Variant Names...... 14
8.4 Associations...... 15
9 OWL capabilities for TM authoring...... 17
9.1 Consistency Checks and Automatic Inferencing...... 17
9.2 Additional Constraints...... 17
9.3 Querying...... 17
9.4 Visualisation...... 20
10 Outstanding Issues...... 22
10.1 The OWL-Full approach...... 22
10.2 The "Quick Fix" Workaround...... 22
10.3 The Extra Java code approach...... 22
10.3.1 Using Object Properties...... 22
10.3.2 Generating Instances of Association and Association Roles...... 23
11 Conclusions...... 23
12 Appendix A...... 23
Bibliography...... 28
The Author...... 29

Building Topic Maps in OWL-DL

Anne Cregan

§ 1 Motivation

Anne Cregan, the author of this paper, is a PhD student enrolled at the University of New South Wales, and endorsed by the National Information and Communications Technology Australia (NICTA) Centre of Excellence. Part of NICTA's mandate is to assist Australian industry to achieve best practice in the use of IT. In mid-2004, Australian industry representatives from the software and analytics industries requested NICTA's assistance in understanding the relationship between ISO Topic Maps and the W3C's Semantic Web Recommendations, and in exploring possibilities for interoperability. As part of this effort, Anne was asked to investigate the possibility of defining a mapping between the two standards. Being within NICTA's Knowledge Representation and Reasoning Program, which has a core capability in the use of formal logics for reasoning, she was encouraged to consider formal semantics in assessing possible solutions. This effort fits within the broader NICTA-wide priority challenge "From Data to Knowledge".

§ 2 Scope of the Approach

This paper presents a mapping and construction of the TMDM model as an OWL-DL ontology, as a proof-of-concept of the viability of building Topic Maps based on the TMDM directly in OWL-DL. This approach gives TM authors access to OWL-DL's capabilities and tools for expressing constraints, performing constraint checking and automated reasoning, and for querying and visualising ontologies. There is significant overlap with the capabilities currently provided by Topic Map engines, and with those envisaged for the Topic Map Constraint and Query Languages in development. The approach presented herein should not be considered a final or definitive solution for TMDM/OWL interoperability. The TMDM itself is still officially at Draft status and in the process of being finalised.
One aspect of the specified required functionality of the TMDM is not fully implemented in this construction (see Section 10 Outstanding Issues), and the remainder has been subjected only to rudimentary testing. Furthermore, this approach has no status as a Standard or Recommendation with either the ISO or the W3C, although both bodies have indicated receptiveness to the concept and a willingness to work on interoperability; the ISO Topic Map Working Group has kindly reviewed an earlier proposal outlining the approach, and has indicated its willingness to continue providing feedback on OWL representations for the TMDM. By virtue of being implemented as an OWL-DL ontology, a Description Logic formal semantics is provided, but the specifics of that semantics are not detailed herein; interested parties should consult the relevant OWL documentation. Whilst OWL and its tools provide querying and constraint capabilities which significantly overlap with the requirements specified for a Topic Map Query Language and Constraint Language, it is not claimed that they necessarily meet all such requirements. The TMDM ontology and examples as built using Protégé are freely available at http://www.cse.unsw.edu.au/~annec/ for anyone who may wish to download them for closer inspection and evaluation, and feedback via email to [email protected] is welcomed.

§ 3 Background

3.1 The ISO's Topic Map Tradition

3.1.1 Topic Map Origins

Topic Maps are an ISO/IEC standard [TM] (ISO 13250-1) for mapping both web and real-world information resources, by reifying real-world resources as "subjects", and creating "topic" constructs to capture their characteristics and relationships with other topics and subjects. Dubbed the "GPS of the information universe", they are akin to an electronic back-of-book index, facilitating information navigation and retrieval in on-line environments. In 2001, the first two standard syntaxes for Topic Maps were published by the ISO: XTM 1.0 in XML [XTM1.0], and HyTM.

Extreme Markup Languages 2005® page 1 Chapter 3 Integrating Topic Maps into the Semantic Web Cregan

3.1.2 Incomplete Specification of Topic Maps

Although there were non-trivial differences between the two Topic Map syntaxes, the original standard did not explain the relationship between them, nor did it make the underlying data model explicit. Additionally, both syntaxes failed to specify what Topic Map implementations were required to do in certain situations, and neither supported constraints such as: "A person must be born in a place" or "A person must have at least one name". Requirements subsequently drafted by the ISO Topic Map Working Group (WG3) for a Topic Map Constraint Language [TMCL] and a Topic Map Query Language [TMQL] both required a clear description of how Topic Map constructs are to be evaluated.

3.1.3 A Formal Data Model for Topic Maps

To address these issues, WG3 commenced work on a Topic Map Data Model in May 2001, culminating to date in the official Draft ISO 13250-2 TMDM [TMDM] published in January 2005, which clarifies the representation and gives Topic Maps a formal data model for the first time, in Entity-Relationship model form. The TMDM document is currently (as at June 2005) at draft status, but close to its full completed form, whilst work on the specification of TMCL and TMQL standards, based on the Draft TMDM document, is progressing rapidly.

3.1.4 The TMDM is not the last word on Topic Maps

The reader is asked to note that within the ISO Topic Map Working Group, the TMDM is not considered to be definitive of all forms of Topic Maps as described by the ISO 13250-1 Standard; the question of distilling the essence of the Topic Map paradigm and giving it a formal representation is being worked on at a more abstract level, with the Topic Map Reference Model [TMRM] effort leading the charge. However, as this work is still maturing, this paper limits its scope to the consideration of Topic Maps as they are described by the TMDM.

3.2 The Semantic Web Initiative

3.2.1 Berners-Lee's Vision

The Semantic Web is the vision of Tim Berners-Lee, and is a World-Wide Web Consortium (W3C)-led initiative with the goal of providing technologies and standards for semantic markup of online information resources, enabling improved web navigation and supporting intelligent web services. It is an XML-based technology, using Resource Description Framework [RDF] and Web Ontology Language [OWL] layers superimposed on XML to provide more expressive representations of the characteristics of, and relationships between, logical entities. Further layers and functionalities are planned as part of the overall vision.

3.2.2 Formal Semantics

The original RDF Recommendations did not specify a formal semantics, but later work has provided this, and OWL has built on it further. OWL-DL, a formal W3C Recommendation finalised in 2004, is a subset of the full OWL language which has a Description Logic (DL) equivalent semantics, providing decidable reasoning. Tools to support the generation and use of OWL-DL ontologies are well-developed, notably including Stanford University's Protégé [Protege] package, which provides a GUI that generates OWL code and gives access to modules for ontology visualisation, querying, consistency checking and automated reasoning.

§ 4 Interoperability Issues

4.1 Topic Maps and RDF

4.1.1 Interoperability Challenges

The ISO and W3C initiatives were originally developed independently, simply through lack of awareness of each other's work. On discovery of this around 2001/2002, many felt that as both the Topic Map and Semantic Web standards provide the means to construct meta-level maps of information entities, there must be synergies to be exploited. However, previous attempts at achieving interoperability between Topic Maps and RDF have been problematic for the following reasons, amongst others:

• Neither Topic Maps nor RDF had stable formalized data models at the time
• Topic Maps use n-ary relations to express associations between topics, whilst RDF is composed of triples which express binary relations.


• Topic Maps use two kinds of locators, direct and indirect, whilst RDF has only one kind of locator mechanism.
• The concept of scoping, which is fundamental to the Topic Map paradigm, has no obvious equivalent in RDF.
• The use of Topics as Types within Topic Maps has been problematic: should Topics be interpreted as classes, instances, or both? How does one retain the flexibility that the Topic Map paradigm offers in this respect, where any Topic can take on a multiplicity of uses, whilst still providing an unambiguous formal model for representation?

It has proved to be a complex and non-trivial issue which does not easily lend itself to a general solution appropriate for every case. A full analysis is available in a survey of RDF/Topic Map Interoperability Proposals [RDFTM], recently published by the W3C's RDF/Topic Maps Interoperability Task Force. This task force, which is part of the W3C's Best Practice initiative, includes representatives from both the ISO Topic Map Working Group and the W3C, and is currently working to develop a standard for interoperability between RDF and Topic Maps.

4.2 TMDM and OWL

4.2.1 New Interoperability Possibilities

Many of the problems in achieving interoperability between RDF and Topic Maps appear to stem from RDF being insufficiently expressive to capture the full semantics of TM constructs in a straightforward way. The lack of a formal TM specification prior to the TMDM coming into being has also been a considerable hindrance. So the release of an official ISO Draft of the TMDM, combined with the advent of the more expressive OWL language as an official W3C Recommendation, is potentially of huge benefit for interoperability efforts, and invites an investigation of the fit between the two.

4.2.2 TMDM to OWL?

The TMDM is fundamentally a data model, whilst OWL is a language. It therefore seems natural to investigate whether the model defined by the TMDM can be represented and implemented using the semantics provided by the OWL language. That investigation is the primary focus of this work. The mapping produced is an Object Mapping as opposed to a Semantic Mapping, in that it takes the letter of the TMDM specification as gospel and represents it fully in OWL, producing a result where core TMDM constructs are mapped not to core OWL constructs, but to an OWL ontology built using them. Topic Maps are then authored in OWL simply by importing the TMDM ontology and populating it with instances. If constructed in this way, such Topic Maps are quite intuitive to build, and are directly translatable to other Topic Map languages based on the TMDM. Although not covered here, there is also the possibility of another OWL representation for Topic Maps, also being considered by the author: one which does not adhere to the letter of the TMDM but produces the desired functionality using a simpler, more natural OWL ontology.
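The authoring pattern just described, importing the TMDM ontology and populating it with instances, might look roughly as follows in Turtle notation. This is only an illustrative sketch: the namespace URIs and the topics named here are invented for the example and are not part of the TMDM specification.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix tmdm: <http://example.org/tmdm#> .
@prefix :     <http://example.org/mytopicmap#> .

# The user's Topic Map is itself an OWL ontology that imports
# the TMDM ontology defining Topic, Association, Occurrence, etc.
<http://example.org/mytopicmap> a owl:Ontology ;
    owl:imports <http://example.org/tmdm> .

# Authoring then consists simply of populating the imported classes:
:puccini  a tmdm:Topic .
:turandot a tmdm:Topic .
```

Because the instances are typed against the imported TMDM classes, the constraints defined in the TMDM ontology apply to them automatically whenever the combined ontology is checked by a DL reasoner.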

4.2.3 OWL to TMDM?

The question of whether ontologies written in OWL in a more general sense can be mapped to Topic Map languages or models based on the TMDM is also very worthy of consideration, but unfortunately cannot yet be fully investigated. It would require a complete specification of the TMCL, as OWL clearly provides the means to express constraints that are beyond the scope of the TMDM's expressivity, and fall squarely within the ken of the associated TMCL standard. The TMCL specification is taking shape rapidly, whilst the TMRM in development may also inform the issue, but such an investigation is still premature at the time of writing.

§ 5 TMDM to OWL Mapping Investigation

5.1 The TMDM side

5.1.1 What does the TMDM give us?

The drafted TMDM offers some significant benefits over previous Topic Map specifications. The majority of it is represented as Entity-Relationship diagrams with explanatory text, and the clear delineation of Topic Map constructs into Entities, Relationships and Attributes is a significant and most welcome contribution. The directions and cardinalities of relationships between Topic Map entities are clearly


represented. Type and parent relationships are stated separately and explicitly. The different kinds of relationships Topics may have with Subjects are clearly represented. Data attributes are given for the logical entities, which include values, data types and locators such as URI references. The draft also introduces two additional constraints to ensure the integrity of Topic Maps based on the model. For ease of reference, all the E-R diagrams provided by the TMDM specification plus the two additional constraints have been collated into a single diagram shown at Figure 1. Note that "Topic Map Object" can be any of the other Topic Map Objects shown (Association, Occurrence, Topic, etc).

Figure 1

The TMDM Model, showing collated E-R diagram and two additional constraints

5.1.2 But wait, lurking in the back of the TMDM closet

However, lurking in the last section of the TMDM specification (at least in the current draft), under the rather innocuous title of "Published Subjects", are two of the most fundamental Topic Map constructs: "Type-Instance" and "SuperType-SubType". In this section they are specified to be special instances of Associations with specific types between role-playing Topics which play special instances of Association Roles with specific types. The types are grounded by being set to specified Published Subject Indicators for "Type-Instance" and "SuperType-SubType". This is obviously much heavier and


more complex than standard ontological Class-Instance and SuperClass-SubClass relations. In terms of the Data Model given up to this point, they can be interpreted simply as special instances of its constructs; rather unwieldy and convoluted, to be sure, but not particularly problematic. However, unlike any other instances of Associations in the TMDM, instances of these special types of Associations are required to have the functionality of ontological Class-Instance and SuperClass-SubClass relations: instances of "SuperType-SubType" Associations must be transitive, and instances of "Type-Instance" Associations, although not transitive in themselves, must ensure that any Instances of a Type are also Instances of any SuperTypes of that Type. That is, if A is an Instance of Type B, and B is a SubType of SuperType C, then A must be an Instance of C also. On consideration, this seems to entail that Topic Map processing is required to create new instances of Associations and Association Roles on the fly in order to capture the flow-on effects implied here. Whilst this is rather inconvenient, it is not necessarily insurmountable for an OWL construction if a little extra Java code is attached. This issue is put aside for the moment and revisited in detail in Section 10 Outstanding Issues.

5.2 The OWL side

OWL is the most recently developed, and most expressive, layer of the W3C's Semantic Web Recommendations to date. It is a Web Ontology Language, designed for the representation of classes containing individuals which have relations with each other and data attributes. It also supports the expression of constraints on these entities.

5.2.1 Flavours of OWL

There are currently three "flavours" of OWL which vary in their level of expressivity. OWL-Lite has the least expressivity and is a subset of OWL-DL. Both OWL-Lite and OWL-DL have a Description Logic equivalent semantics. Description Logics are a decidable fragment of First Order Logic. Being decidable means that it is possible to ask whether certain things are the case and be guaranteed an answer in a finite number of steps. For Description Logics, the kinds of things that can be requested are:

• computing the inferred superclasses of a class
• determining whether or not a class is consistent (a class is inconsistent if it cannot possibly have any instances)
• deciding whether or not one class is a subclass of another class

Automated Description Logic reasoners are available to perform these inferencing services and others for OWL-DL and OWL-Lite. The full OWL language, OWL-Full, supports additional expressivity beyond OWL-DL and OWL-Lite, such as allowing a thing to be both an individual and a class at the same time, but sacrifices the decidability of reasoning. This is largely due to the ability OWL-Full has to create self-referential loops: a class is allowed to contain itself as an instance. OWL-Full does not have a Description Logic semantics, and while it is theoretically still possible to perform many inferencing tasks effectively on OWL-Full, currently the majority of OWL tools and reasoners do not support OWL-Full, whilst they provide good support for the OWL-DL and OWL-Lite sublanguages.
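As an illustration of the consistency-checking service, the following sketch (the class names are invented for the example) defines a class that a DL reasoner will flag as inconsistent, because it is declared a subclass of two disjoint classes and so can never have an instance:

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix :     <http://example.org/demo#> .

:Topic a owl:Class .
:Association a owl:Class ;
    owl:disjointWith :Topic .

# TopicAssociation is a subclass of two disjoint classes,
# so no individual can ever belong to it: it is inconsistent,
# and a reasoner such as RACER will report this automatically.
:TopicAssociation a owl:Class ;
    rdfs:subClassOf :Topic , :Association .
```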

5.2.2 Which flavour of OWL tastes best to make Topic Maps?

Whilst OWL-Full is a more natural medium for expressing the notion that Topics may be both Classes and Instances, if one's aim is to build a practical, functioning ontology with full support for querying and constraint checking, at this time OWL-DL is the preferred OWL sublanguage to use, as it gives the highest level of expressivity whilst still being decidable, having a well-defined semantics and being well-supported by OWL tools. However, OWL-DL may not always be the flavour of choice in future. Since the main purpose of Topic Maps is to enable humans to find things, rather than machines to reason about them, and OWL-Full's looser expressivity is a better fit for this, it may not ultimately be an issue that OWL-Full is undecidable, as long as there are functioning tools to provide the kind of consistency checks and querying needed by Topic Maps. For effective navigation, most of this needs only to be done at a local scale - that is, looking only at constructs within a local area of the TM, and how they are related to their neighbouring constructs. It should be possible to do this in most cases without coming up against the issue of undecidability. As an


analogy, consider one of M. C. Escher's "strange-loop" drawings, such as "Ascending and Descending", which may be found at http://www.mcescher.com/indexuk.html: if you look at only a localised part of the picture it looks perfectly sensible, even though when you look at the whole picture you realise there's an impossible loop. Because of the way the human mind works, it may be argued that it is perfectly reasonable to implement such loops in Topic Maps, because they reflect the way humans think about information, and having loops in Topic Maps doesn't stop them from being useful maps showing where things are and how they relate to each other. In fact, the loop can be considered to be a valid part of the map!

5.2.3 What can be asserted in OWL-DL?

In OWL-DL, Classes and Individuals must be kept separate: no thing can be both a Class and an Instance of a Class. Classes may be related via a SuperClass-SubClass hierarchy, and may also be declared to be disjoint from each other, that is, to have no overlapping Instances. Individuals may be asserted to be Instances of Classes, and may be related to each other via OWL's Object Properties. They may also be given data attributes, using OWL's Datatype Properties. Both Object and Datatype Properties may be constrained on domain, range, and cardinality, and given various characteristics such as transitivity and symmetry. Properties may be constrained not only globally, but with respect to specific Classes. OWL Object Properties are one-directional, but may be linked to a corresponding property in the opposite direction using OWL's built-in "inverseOf" relation construct.
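The kinds of assertion just listed can be sketched in Turtle notation as follows. All of the names here are hypothetical, chosen only to illustrate each construct; they are not drawn from the TMDM ontology.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix :     <http://example.org/demo#> .

# Classes, a SuperClass-SubClass hierarchy, and disjointness
:Person a owl:Class .
:Composer a owl:Class ;
    rdfs:subClassOf :Person .
:Place a owl:Class ;
    owl:disjointWith :Person .

# An Object Property with domain, range and an inverse
:bornIn a owl:ObjectProperty ;
    rdfs:domain :Person ;
    rdfs:range  :Place ;
    owl:inverseOf :birthplaceOf .

# A Datatype Property giving Individuals a data attribute
:name a owl:DatatypeProperty ;
    rdfs:domain :Person ;
    rdfs:range  xsd:string .

# Individuals asserted as Instances, related and attributed
:puccini a :Composer ;
    :bornIn :lucca ;
    :name "Giacomo Puccini" .
:lucca a :Place .
```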

5.2.4 OWL-DL can represent E-R models!

The commonality between OWL-DL's expressiveness and standard Entity-Relationship models renders OWL capable of representing a simple Entity-Relationship model by the following method:

• Entities are mapped to OWL Classes
• Instances of Entities are mapped to OWL Individuals belonging to those OWL Classes
• Relationships between Entities are mapped to OWL Object Properties linking OWL Classes. The relationship's cardinality may be captured within the Object Property, as may its directionality, through domain and range specification. To represent bi-directional relations, an additional Object Property in the opposite direction is created and declared to be an inverse property of the original.
• Attributes of Entities are mapped to OWL Datatype Properties linking OWL Classes to Data Value Ranges. Attribute Values of the Instances of Entities correspond to specific values from the Data Value Ranges for individuals within the OWL Class that corresponds to the Entity. These Properties may also have explicit cardinality.

It is not claimed that such a mapping is the only possible way one might map an E-R diagram to OWL-DL; merely that it is a reasonable, natural and straightforward approach. The question of mapping UML (the means to represent E-R diagrams) to OWL is a research issue in its own right.

5.3 Using OWL-DL to build the TMDM

5.3.1 The Approach In a Nutshell

The method above gives a straightforward algorithm for mapping simple E-R diagrams to OWL-DL. As the TMDM is largely specified as such a simple E-R model (refer to Figure 1), those parts of it can be translated directly into OWL-DL using the method given above. The TMDM also has a couple of extra constraints to do with cardinality of relations, and whilst these take a little more thought, they may also be implemented in OWL-DL. With the TMDM constructs now constructed in OWL-DL, Topic Map authors may then build their own Topic Maps in OWL-DL simply by importing the OWL ontology that defines these constructs, and populating it with instances.
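Applied to a small fragment of the model, the E-R mapping method above produces OWL along the following lines. The class and property names echo the kind of construct shown in Figure 2, but the URIs and exact names are illustrative assumptions, not the published ontology.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix tmdm: <http://example.org/tmdm#> .

# Entities become OWL Classes
tmdm:Topic a owl:Class .
tmdm:Occurrence a owl:Class .

# A Topic-to-Occurrence relationship becomes an Object Property;
# its direction is captured by domain and range, and the reverse
# direction is supplied as a declared inverse property
tmdm:hasOccurrence a owl:ObjectProperty ;
    rdfs:domain tmdm:Topic ;
    rdfs:range  tmdm:Occurrence ;
    owl:inverseOf tmdm:occurrenceOf .
tmdm:occurrenceOf a owl:ObjectProperty ;
    rdfs:domain tmdm:Occurrence ;
    rdfs:range  tmdm:Topic .

# A cardinality restriction on the class captures a constraint
# such as "every Occurrence belongs to exactly one Topic"
tmdm:Occurrence rdfs:subClassOf
    [ a owl:Restriction ;
      owl:onProperty tmdm:occurrenceOf ;
      owl:cardinality "1"^^xsd:nonNegativeInteger ] .
```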

5.3.2 Why would Topic Map authors want to use OWL-DL to build TMs?

The advantage of using OWL-DL for Topic Map authoring based on the TMDM ontology is that the resulting Topic Maps are guaranteed to conform with the TMDM, as all the conditions for conformance are implemented within the TMDM ontology and thus passed on to the Topic Map ontologies based on it. The author can also use OWL's expressivity to express additional constraints on their Topic Maps in a way that closely corresponds with the desired functionality of a TMCL, and can use the existing, freely available OWL tools to query their Topic Maps in a way that closely corresponds with TMQL functionality. By virtue of using OWL-DL, the Topic Map author gains the benefits of:


• Access to a well-developed suite of free, open-source APIs, visualisation, querying and reasoning tools: for instance, Protégé combined with the RACER [RACER] reasoner provides the means for both querying Topic Maps and enforcing constraints, meeting many of the ISO's requirements for a Topic Map Query Language and a Topic Map Constraint Language
• A clearly defined Description Logic formal semantics for the resulting Topic Maps. Semantic Web reasoning services and Intelligent Agent services are likely to become increasingly sophisticated and capable, and having a formal semantics helps open the door to these.

Whilst OWL tools do not currently include an engine specifically for the navigation of Topic Maps, the resulting Topic Maps would be directly translatable to any other TMDM-based Topic Map languages which become available, as the representation is equivalent. If the method is widely accepted, it is likely that a means to import Topic Maps written in OWL to commercially available Topic Map engines would be made available, providing TM authors with all the advantages of both worlds. By the same token, it would also be possible to port Topic Maps written in other TMDM-based TM languages into OWL.

§ 6 Current Status

6.1 Work conducted to date

Using the method described above, the TMDM E-R model shown at Figure 1 was translated into an OWL-DL ontology summarised at Figure 2, and implemented using Protégé. Some sample Topic Maps were built as OWL-DL ontologies in Protégé by importing the TMDM ontology to define the Topic Map constructs and populating it with instances. The construction in Protégé is able to successfully utilise Protégé's querying plugin and RACER reasoning plugin for constraint checking, as well as the other plugins that Protégé provides, including those for ontology visualisation. The OWL-DL construction for the TMDM was submitted to the ISO TMDM authors for review and comment, and discussed with the WG3 at meetings held in May 2005. They agreed it was indeed a faithful representation of the TMDM, implemented at an object level.

6.2 Functionality Implemented and Outstanding

Note that whilst the TMDM's "Type-Instance" and "Supertype-Subtype" relationships as described in Section 7 of the TMDM Draft document may be fully asserted as instances of the ontology built here, the flow-on functionality is not currently implemented: that is, the transitivity of instances of Supertype-Subtype Associations, and the upward inheritance of Instances within Type-Instance Associations to the SuperTypes of a Type expressed in Supertype-Subtype relations. Some approaches for implementing this functionality are discussed in Section 10 Outstanding Issues. With this exception, all other desired functionality required by the TMDM is believed to be captured in the ontology presented in the following section.

6.3 Intention

If the outstanding issue can be resolved satisfactorily, and the resulting OWL syntax accepted by the relevant Standards Bodies as a valid syntax for the TMDM, it would then be made available online at an appropriate location, for Topic Map authors to import into OWL.
As described, providing the TMDM constructs in OWL-DL enables users to author Topic Maps in OWL-DL which are TMDM-compliant. This method for Topic Map construction has no reliance on XTM or any other TM languages: no XTM references are necessary, either in the TMDM ontology or in the user-defined Topic Map ontologies. It may reasonably be expected, though, that if an OWL syntax were accepted as a standard for authoring Topic Maps, then Topic Map engines and tools would develop the ability to import Topic Maps written in this syntax, as they are doing with RDF.

§ 7 Designing and Building the TMDM Ontology

The following sections describe the design and construction of the TMDM as an OWL-DL ontology in detail.

7.1 The Design Phase

The method used corresponds to that described above for mapping Entity-Relationship Models into OWL-DL, but is here explained at a micro level with specific reference to the TMDM model shown at Figure 1. Visually comparing Figure 1 and Figure 2 should also aid the reader's understanding.


7.1.1 Design Choices

As with the construction of any OWL ontology, some design choices must be made. For reasons of implementability discussed above, it is desirable, at least currently, to stay within the expressivity of OWL-DL rather than to use OWL-Full, and to enable this, the TMDM ontology needs to maintain a clear distinction between Classes and Individuals.

7.1.2 Identifying Individuals

The first task is to identify the nature of the individuals in user Topic Maps, keeping in mind that whilst OWL's Object and Datatype Properties are defined using Classes as their domains and ranges, the properties ultimately link the Individuals within those Classes. That is, if we want to be able to assert that something has Object and Datatype Properties with particular values, then it needs to be an Individual, not a Class. Accordingly, the candidates for Individuals are taken to be the individual Topics, Occurrences, Associations and so on that Topic Map authors would ultimately want to define relationships between, and characteristics for.

7.1.3 Filling in the rest of the Model

Having decided that these are to be the Individuals, the classes "Topic", "Occurrence", "Association" and so on are created to contain them. These are subsequently used as domains and ranges to define OWL Object Properties between individuals, providing the general framework defining which groups of Individuals may be related to which other groups of Individuals, and in what way (cardinality of relations, etc). The Topic Map author will ultimately import these Classes and Properties and instantiate them with her own Topics, Associations, linkages between them and so on to build her own Topic Maps. The attributes of each entity in Figure 1 (source locator, etc) are translated to OWL Datatype Properties belonging to Individuals within the OWL Classes. Constraints may be implemented as restrictions on the OWL classes.

7.1.4 A note about the use of Topics for Typing other Topic Map Objects

A distinguishing feature of Topic Maps is that Topics are used as Types for other Topic Map Objects, whilst also being used in all the ways that regular Topics are. Normally in an ontology, a type would be represented as an OWL Class, and Individuals would be typed by being asserted to belong to the Class. However, if this were to be done here, Topics would need to be represented as both OWL Classes and OWL Individuals, as they need to be able to be Types whilst still having Object and Datatype Property values in the way individuals do. This would require the use of OWL-Full. However, as OWL-DL provides the expressivity to define complex relations between individuals using Object Properties, it is preferable to represent Types as individual Topics, and to represent the "type" of Topic Map Objects not through class membership, but as an Object Property between the Topic Map Object and the Topic which types it. The Typing of Topics themselves is a more complex issue, addressed in Section 10 Outstanding Issues.

7.2 The Ontology Model

The OWL-DL model for the TMDM used for construction is presented at Figure 2. As there is not yet a standard graphical notation for representing OWL ontologies, RDF graph notation has been extended: Classes are represented as ovals; the nesting of an oval within a larger oval indicates a subclass-superclass relation; arrows represent object properties, labelled with the Object Property's name and cardinality, and indicating direction by pointing from Domain to Range. To avoid cluttering the diagram, Datatype Properties and complex constraints are omitted, SuperProperty constructions are not shown, and not all Inverse Properties are shown. Note that there are no Individuals in the TMDM ontology: these are defined in the process of Topic Map authoring.
The TMDM ontology can be considered to be a template or mould of exactly the right dimensions for TM authors to fill up with instances, turning out perfectly formed Topic Maps.
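The design choice of Section 7.1.4, representing the type of a Topic Map Object as an Object Property pointing at a Topic rather than as OWL class membership, can be sketched as follows. The property name `hasType` and the individuals shown are illustrative assumptions, not the ontology's published vocabulary.

```turtle
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix tmdm: <http://example.org/tmdm#> .
@prefix :     <http://example.org/mytopicmap#> .

# A hypothetical typing property linking Occurrences to Topics
tmdm:hasType a owl:ObjectProperty ;
    rdfs:domain tmdm:Occurrence ;
    rdfs:range  tmdm:Topic .

# "date-of-birth" is an ordinary Topic (an OWL Individual),
# yet it also serves as the type of an Occurrence; because the
# typing is a property between Individuals rather than class
# membership, everything stays within OWL-DL.
:dateOfBirth a tmdm:Topic .
:puccini_dob a tmdm:Occurrence ;
    tmdm:hasType :dateOfBirth .
```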

page 8 Extreme Markup Languages 2005® 91 Building Topic Maps in OWL-DL

Figure 2

The TMDM Ontology Model, showing Classes and Object Properties

Note the correspondences between Figures 1 and 2: each entity (box) in Figure 1 has been translated to a class (oval) in Figure 2. Relationships between the entities in Figure 1 are captured as Object Properties in Figure 2. Where there is a bi-directional relation in Figure 1, two relations have been given in Figure 2, related as inverses. For additional ease of use, many additional inverse relations have also been defined. Logically, these are already implied by the TMDM, and by making the inverse forms available, users are given more flexibility to write their own Topic Map ontologies in OWL in the most convenient way, as they can start from either end.

7.3 Additional Constraints Implemented

The Ontology described above and shown in Figure 2 was built in OWL-DL using Stanford University's Protégé package for Ontology Engineering. In order to achieve all the required functionality specified by the TMDM, some additional constraints were needed that are worthy of further commentary. These are described below.

7.3.1 Internal and External Occurrences

Although Topics may have any number of Occurrences, each Occurrence may have only one value, which may be either a string (indicating an internal Occurrence) or a URI (indicating an external Occurrence). Correspondingly, two different Datatype Properties were attached to Occurrences:

• "occurrenceTextString", which maps to a string value, used for asserting internal occurrences, and
• "occurrenceTextLocatedAt", which maps to a URI value, used for asserting external occurrences.

By using the appropriate Datatype Property, one may distinguish between internal and external Occurrences and map each to the appropriate Data Type. However, both are defined as subproperties of an "occurrenceText" Datatype Property, which is defined as a functional property (that is, it may only map to one Data Value), ensuring that there is only ever one Data Value (which may be either a string or a URI) for each instance of the class Occurrence. A similar situation applies for Variants, dealt with using a "variantName" superproperty for the "variantNameString" and "variantNameLocatedAt" Datatype Properties.
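A sketch of how this construction might look in OWL RDF/XML, assuming the property names above and with namespace entities declared in the ontology header:

```xml
<!-- The functional superproperty guarantees at most one value per Occurrence;
     the two subproperties distinguish internal (string) from external (URI) values. -->
<owl:DatatypeProperty rdf:ID="occurrenceText">
  <rdf:type rdf:resource="&owl;FunctionalProperty"/>
  <rdfs:domain rdf:resource="#Occurrence"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:ID="occurrenceTextString">
  <rdfs:subPropertyOf rdf:resource="#occurrenceText"/>
  <rdfs:range rdf:resource="&xsd;string"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:ID="occurrenceTextLocatedAt">
  <rdfs:subPropertyOf rdf:resource="#occurrenceText"/>
  <rdfs:range rdf:resource="&xsd;anyURI"/>
</owl:DatatypeProperty>
```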

7.3.2 At least one Locator for each Topic

According to the mapping described previously, the three locators relevant to Topics were mapped as Datatype Properties with domains as follows:

• SourceLocator: Topic Map Object
• SubjectIdentifier: Topic
• SubjectIndicator: Topic

According to the constraint specified by the TMDM, it is necessary to ensure that for all Topics, at least one of these locators has a value. However, the non-empty locator can be of any of the three kinds available for Topics. To achieve this functionality, a superproperty Datatype Property called "Locator", containing these three Datatype Properties, was created. Then a restriction was placed on the class of Topics with respect to the "Locator" Datatype Property, requiring it to have a minimum cardinality of 1. This ensures that for all Topics, at least one of the three locator sub-Properties MUST be non-empty. As the restriction only applies to the class "Topic", not to all classes in the domains of the Properties, it does not affect other TM objects using the "SourceLocator" Datatype Property.
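A sketch of this construction follows; the class name "TopicMapObject" for the domain of "SourceLocator" is illustrative, and the exact serialisation may differ from the published ontology:

```xml
<!-- Three locator subproperties under a common "Locator" superproperty. -->
<owl:DatatypeProperty rdf:ID="Locator"/>
<owl:DatatypeProperty rdf:ID="SourceLocator">
  <rdfs:subPropertyOf rdf:resource="#Locator"/>
  <rdfs:domain rdf:resource="#TopicMapObject"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:ID="SubjectIdentifier">
  <rdfs:subPropertyOf rdf:resource="#Locator"/>
  <rdfs:domain rdf:resource="#Topic"/>
</owl:DatatypeProperty>
<owl:DatatypeProperty rdf:ID="SubjectIndicator">
  <rdfs:subPropertyOf rdf:resource="#Locator"/>
  <rdfs:domain rdf:resource="#Topic"/>
</owl:DatatypeProperty>
<!-- The restriction applies to "Topic" only, so other Topic Map Objects
     using "SourceLocator" are unaffected. -->
<owl:Class rdf:about="#Topic">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#Locator"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```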

7.3.3 Variant Name Scope must be a true Superset of Topic Name Scope

The TMDM places the following constraint on Variant Scopes: the set of scopes of each Variant of a Topic Name must be a true superset of the set of scopes of its parent Topic Name. This can be interpreted to mean that a Variant name must inherit all the scopes of its parent Topic Name, plus have at least one of its own in addition. According to the methods previously described, an Object Property "isVariantOf" was defined with domain "Variant" and range "TopicName", and with a cardinality of 1, as a Variant must belong to exactly one parent Topic Name. Additionally, the "hasScope" Object Property, of unlimited cardinality, relates both Variant and Topic Name to the class "Scope": a subclass of "Topic" containing those Topics being used as Scopes. By making "isVariantOf" a subproperty of "hasScope", and including "TopicName" in the range of "hasScope", a Variant may be considered to be scoped by its parent Topic Name. By then making the "hasScope" property transitive, the Variant inherits the scopes of its parent Topic Name. To ensure that a Variant will have at least one scope relating it directly to the class "Scope", another subproperty of "hasScope" is defined with domain "Variant" and range "Scope", and a minimum cardinality of 1 is placed on this property. Voila! To tidy up, as only Variants are allowed to be scoped by Topic Names, and other Topic Map Objects are not, we limit the "hasScope" Property on the other classes which use it to only allow values from the class "Scope", and not from "TopicName".
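The property arrangement just described might be sketched as follows; "VariantScope" is the direct Variant-to-Scope subproperty referred to later in the examples, and the serialisation details are illustrative:

```xml
<!-- "hasScope" is transitive; "isVariantOf" is a subproperty of it, so a
     Variant inherits the scopes of its parent Topic Name. -->
<owl:TransitiveProperty rdf:ID="hasScope"/>
<owl:ObjectProperty rdf:ID="isVariantOf">
  <rdfs:subPropertyOf rdf:resource="#hasScope"/>
  <rdfs:domain rdf:resource="#Variant"/>
  <rdfs:range rdf:resource="#TopicName"/>
</owl:ObjectProperty>
<!-- Direct scoping of Variants, required to have at least one value. -->
<owl:ObjectProperty rdf:ID="VariantScope">
  <rdfs:subPropertyOf rdf:resource="#hasScope"/>
  <rdfs:domain rdf:resource="#Variant"/>
  <rdfs:range rdf:resource="#Scope"/>
</owl:ObjectProperty>
<owl:Class rdf:about="#Variant">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#VariantScope"/>
      <owl:minCardinality rdf:datatype="&xsd;nonNegativeInteger">1</owl:minCardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```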


7.4 Implementation

The OWL-DL code generated by Protege for the TMDM ontology described above is included at the end of this paper and is also available at the author's website, along with the Protege ontology itself and examples.

§ 8 Examples

In this section, the OWL TMDM constructs defined above are used to author an example Topic Map in OWL-DL which describes an Italian Opera domain. The example ontology's header (not shown) sets up namespaces, makes the standard XML, RDF and OWL inclusions, and imports the TMDM Ontology described above. Use of the prefix "tm" in the OWL code shown below indicates that a construct is sourced from the imported TMDM ontology. The names used as identifiers for instances of Topic Map elements are completely arbitrary and up to the author to choose, the only requirement being that they must be unique within the authored Topic Map ontology.

8.1 Topics and Subjects

Say that within the Italian Opera Topic Map we would like to create a Topic called "Puccini" to represent the composer Giacomo Puccini. This is done simply by:

• creating an instance of the class "Topic" from the TMDM ontology,
• calling it "Puccini",
• and giving its Subject Identifier Property an appropriate value to identify the subject that this Topic identifies: in this case the URL of an online encyclopedia article about the composer Giacomo Puccini, http://en.wikipedia.org/wiki/Puccini.

The corresponding OWL code is shown below:
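A reconstruction of the listing along the lines described, assuming the "tm" prefix for the imported TMDM ontology; Protege's generated serialisation may differ in detail:

```xml
<tm:Topic rdf:ID="Puccini">
  <!-- Subject Identifier: the online article identifying the subject. -->
  <tm:SubjectIdentifier rdf:datatype="&xsd;anyURI"
      >http://en.wikipedia.org/wiki/Puccini</tm:SubjectIdentifier>
</tm:Topic>
```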

If using a GUI OWL tool such as Protégé, the OWL code is generated automatically by undertaking the actions described above. Protege generates an input form based on the Properties defined for the Individual, and the user can simply assign values to the Properties by selecting from previously defined entities, or entering new data, using this form. A Protege screenshot showing values being assigned to the "Puccini" Topic is shown below.


Figure 3


Protege Screenshot: Entering Values for the Topic "Puccini" in the Italian Opera Topic Map

Processing Points to Note:

• The value given to the Topic's "SubjectIdentifier" property gives the Topic's "Locator" SuperProperty a value, thus satisfying the requirement that "Locator" has a minimum cardinality of 1. When running consistency checks, an error would be issued for every Topic that does not have at least one value for "Locator".
• As the Topic is now defined, it can be used later in the OWL ontology by using the ID "Puccini" to refer to it (rdf:about="Puccini").

8.2 Occurrences

We would now like to assert two occurrences for the topic "Puccini" in our Italian Opera Topic Map:

• Puccini's date of birth, which is a string, and
• a website where Puccini's operas can be found, which is a URL.

This is done by:

1. Firstly, asserting that the Topic “Puccini” has an Occurrence called “Puccini_Date_Of_Birth”, where:

• The "type" property of the Occurrence is an instance of the class “Occurrence Type”,called “Date_of_Birth” • The "occurrenceTextString" property of the Occurrence is the string “1858-12-22” (Puccini’s birthdate) 2. A second assertion that the Topic “Puccini” has another Occurrence called “Puccini_Web_Site”, where:

• The "type" property of the Occurrence is an instance of the class “Occurrence Type”,called “Web_Site” • The "occurrenceLocatedAt" property of the Occurrence is the URL “http://www.puccini.it/” The corresponding OWL code is:

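A reconstruction of the two assertions, assuming the "tm" prefix and the property names defined in Section 7.3.1; the exact generated code may differ:

```xml
<tm:Topic rdf:about="#Puccini">
  <tm:hasOccurrence>
    <tm:Occurrence rdf:ID="Puccini_Date_Of_Birth">
      <tm:type><tm:OccurrenceType rdf:ID="Date_of_Birth"/></tm:type>
      <!-- Internal occurrence: a string value. -->
      <tm:occurrenceTextString rdf:datatype="&xsd;string"
          >1858-12-22</tm:occurrenceTextString>
    </tm:Occurrence>
  </tm:hasOccurrence>
  <tm:hasOccurrence>
    <tm:Occurrence rdf:ID="Puccini_Web_Site">
      <tm:type><tm:OccurrenceType rdf:ID="Web_Site"/></tm:type>
      <!-- External occurrence: a URI value. -->
      <tm:occurrenceTextLocatedAt rdf:datatype="&xsd;anyURI"
          >http://www.puccini.it/</tm:occurrenceTextLocatedAt>
    </tm:Occurrence>
  </tm:hasOccurrence>
</tm:Topic>
```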

Processing Points to Note:

• Consistency checks ensure that each occurrence has only one data value for "occurrenceText", although it can be either a string (when using "occurrenceTextString" subproperty) or a URI (when using "occurrenceTextLocatedAt" subproperty).


• Conversely, we may assert as many Occurrences as we like for the Topic, or none at all, since there is no cardinality restriction. Each such assertion links the instance of Topic to a separate, newly created instance of Occurrence.
• Instances of "OccurrenceType" are generated in the code above within the "type" declaration for each Occurrence. Since OccurrenceType is a subclass of Topic, these are automatically also Topics, and once created, may be referred to and used in all the ways that Topics are, and inherit all the constraints and properties that Topics have.
• An Occurrence is only allowed to have at most one "type" property, and it must be from the "OccurrenceType" subclass of Topic. If we try to give the Occurrence more than one value for its type, or to link it to any other class, the consistency checks will fail.
• Because of the way the inverse properties are set up, every "hasOccurrence" asserted between a Topic and an Occurrence automatically generates an "isOccurrenceOf" property between the Occurrence and Topic in the opposite direction.

8.3 Topic Names and Variant Names

Say we would now like to give the "Puccini" Topic a Topic Name (also "Puccini") with scope of "Normal_name", and an associated Variant Name which has the full name "Giacomo Puccini" with scope of "Long_name". To do this, we would simply:

• create an instance of TopicName called "Puccini_Topic_Name" with topicNameString set to "Puccini" and Scope of "Normal_name".
• assert that this Topic Name has a Variant using the "hasVariant" Object Property, and create an instance of Variant which has "variantNameString" of "Giacomo Puccini", and Scope of "Long_name".

The corresponding OWL code is:

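A reconstruction of the listing; the Variant's identifier "Puccini_Variant" is hypothetical, as any name unique within the authored ontology would do:

```xml
<tm:TopicName rdf:ID="Puccini_Topic_Name">
  <tm:belongsToTopic rdf:resource="#Puccini"/>
  <tm:topicNameString rdf:datatype="&xsd;string">Puccini</tm:topicNameString>
  <tm:hasScope><tm:Scope rdf:ID="Normal_name"/></tm:hasScope>
  <tm:hasVariant>
    <!-- "Puccini_Variant" is a hypothetical identifier chosen for illustration. -->
    <tm:Variant rdf:ID="Puccini_Variant">
      <tm:variantNameString rdf:datatype="&xsd;string"
          >Giacomo Puccini</tm:variantNameString>
      <!-- Direct scoping of the Variant via the VariantScope subproperty. -->
      <tm:VariantScope><tm:Scope rdf:ID="Long_name"/></tm:VariantScope>
    </tm:Variant>
  </tm:hasVariant>
</tm:TopicName>
```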

Processing Points to Note:

• When "belongsToTopic" is declared between an instance of Topic Name and an instance of Topic, the inverse property "hasTopicName" automatically links the Topic and the Topic Name in the opposite direction.
• A Topic Name is only allowed to belong to exactly one parent Topic, so if we tried to assert any more "belongsToTopic" properties linking it to other Topics, or if we did not assert any links to a parent Topic, the consistency checks would produce an error.
• Conversely, as there is no limit to the number of Topic Names a Topic may have, we are free to assert as many values of "hasTopicName" for each Topic as we like, or none at all.
• When "hasVariant" is declared between an instance of Topic Name and an instance of Variant, the inverse property "isVariantOf" automatically links the Variant and the Topic Name in the opposite direction.

page 14 Extreme Markup Languages 2005® 97 Building Topic Maps in OWL-DL

• A Variant is only allowed to belong to exactly one parent TopicName, so if we tried to assert any more "isVariantOf" properties linking it to other TopicNames, or if we did not assert any links to a parent TopicName, the consistency checks would produce an error.
• Conversely, as there is no limit to the number of Variants a Topic Name may have, we are free to assert as many values of "hasVariant" for each Topic Name as we like, or none at all.
• Instances of the class "Scope" are generated within the code above within the "hasScope" declarations. Since Scope is a subclass of Topic, each instance of Scope is automatically a Topic, and once created, may be referred to and used in all the ways that Topics are, and inherits all the constraints and properties that Topics have.
• Through the transitivity of the "hasScope" property, which is a superproperty of "isVariantOf", the Variant inherits all the scope elements of its parent Topic Name. After inferencing processes are completed, the Variant will be linked to Scopes which include both "Long_name" and any scopes belonging to "Puccini_Topic_Name", in this case "Normal_name". Thus the Variant will fulfil the constraint of having a true superset of the scope of its parent Topic Name. If we did not use the VariantScope property to link the Variant to at least one instance of Scope, this consistency check would fail.
• Although not shown in this example, TopicNames may be linked via the "type" property to at most one instance of TopicNameType. If we attempt to link via the "type" property more than once, or to anything other than a TopicNameType, the consistency checks will fail. Variants are not allowed to have any type at all, and thus the "type" property does not include the class "Variant" as part of its domain. If we were to try to declare a "type" for an instance of "Variant", consistency checks would indicate an error.
When using a GUI such as Protégé, the input forms do not provide an option to select a "type" for an instance of Variant, as this class is not within the domain of the property.

8.4 Associations

Say we would now like to assert an association between the topics Puccini, Giacosa and the opera Tosca, to indicate that Puccini and Giacosa wrote Tosca, with Puccini being the composer, Giacosa the librettist and Tosca the Opera. This is done in the following way:

• Create an instance of the class "Association" and call it "Puccini_and_Giacosa_Wrote_Tosca"
• Using the "type" Object Property, link this instance of "Association" to an instance of "AssociationType" called "Written_By"
• Using the "hasRole" Object Property, link the instance of "Association" to three newly created instances of "AssociationRole":

1. Firstly, to an AssociationRole called "Librettist_is_Giacosa", which indicates that the Topic "Giacosa" plays the Role of "Librettist" in this association. This is achieved by linking the "type" property of the AssociationRole instance to an instance of AssociationRoleType called "Librettist", and the "playedBy" Property to a newly created instance of Topic called "Giacosa".
2. Secondly, to an AssociationRole called "Composer_is_Puccini", which indicates that the Topic "Puccini" plays the Role of "Composer" in this association. This is achieved by linking the "type" property of the AssociationRole instance to an instance of AssociationRoleType called "Composer", and the "playedBy" Property to the previously created Topic instance called "Puccini".
3. Thirdly, to an AssociationRole called "Opera_is_Tosca", which indicates that the Topic "Tosca" plays the Role of "Opera" in this association. This is achieved by linking the "type" property of the AssociationRole instance to an instance of AssociationRoleType called "Opera", and the "playedBy" Property to a newly created instance of Topic called "Tosca".

The corresponding OWL code is shown below:
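A reconstruction of the listing following the three steps above; the generated code may differ in ordering and detail:

```xml
<tm:Association rdf:ID="Puccini_and_Giacosa_Wrote_Tosca">
  <tm:type><tm:AssociationType rdf:ID="Written_By"/></tm:type>
  <tm:hasRole>
    <tm:AssociationRole rdf:ID="Librettist_is_Giacosa">
      <tm:type><tm:AssociationRoleType rdf:ID="Librettist"/></tm:type>
      <!-- Newly created Topic for the role player. -->
      <tm:playedBy><tm:Topic rdf:ID="Giacosa"/></tm:playedBy>
    </tm:AssociationRole>
  </tm:hasRole>
  <tm:hasRole>
    <tm:AssociationRole rdf:ID="Composer_is_Puccini">
      <tm:type><tm:AssociationRoleType rdf:ID="Composer"/></tm:type>
      <!-- Refers to the previously created Topic "Puccini". -->
      <tm:playedBy rdf:resource="#Puccini"/>
    </tm:AssociationRole>
  </tm:hasRole>
  <tm:hasRole>
    <tm:AssociationRole rdf:ID="Opera_is_Tosca">
      <tm:type><tm:AssociationRoleType rdf:ID="Opera"/></tm:type>
      <tm:playedBy><tm:Topic rdf:ID="Tosca"/></tm:playedBy>
    </tm:AssociationRole>
  </tm:hasRole>
</tm:Association>
```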


Processing Points to Note:

• When "hasRole" is declared between an instance of Association and an instance of AssociationRole, the inverse property "roleInAssociation" automatically links the instances of AssociationRole and Association in the opposite direction.
• An AssociationRole is only allowed to belong to exactly one parent Association, so if we tried to assert any more "roleInAssociation" properties linking it to other Associations, or if we did not assert any links to a parent Association, the consistency checks would produce an error.
• Conversely, Associations must have at least one AssociationRole, so if we did not declare at least one link between this instance of Association and an instance of AssociationRole using either "hasRole" or its inverse somewhere within the ontology, consistency checks would fail.
• When "playedBy" is declared between an instance of AssociationRole and an instance of Topic, the inverse property "playsRole" automatically links the Topic and the AssociationRole in the opposite direction.
• "playedBy" must be linked to exactly one Topic, and consistency checks will fail if this is not the case.
• Conversely, Topics may be linked to any number of AssociationRoles via the "playsRole" property.
• Each Association and AssociationRole may be linked by the "type" property to at most one instance of AssociationType and AssociationRoleType respectively. If we attempt to link via the "type" property more than once, or to anything other than an instance of the appropriate class, the consistency checks will fail. When using a GUI such as Protégé, the input forms only allow a "type" value for an instance of Association or AssociationRole to be selected from the appropriate class.
• Guess what? We forgot to give at least one value of Locator for our new Topics "Giacosa" and "Tosca". We will generate an error indicating this when we run the consistency checking.


§ 9 OWL capabilities for TM authoring

9.1 Consistency Checks and Automatic Inferencing

As described throughout the examples above, by using a reasoner such as RACER with the Protege application, consistency checks and automatic inferencing to ensure TMDM compliance are available. These are activated simply by selecting the appropriate services from the Protege menu.

9.2 Additional Constraints

The TMDM ontology is set up in such a way as to enable the user to easily add the kinds of constraints described in the use cases of the TMCL requirements. For instance: "Topic T can be used for typing Occurrences, not for typing anything else". Topic T should be declared as an instance of the "OccurrenceType" subclass. Occurrences are only allowed to get values for the "type" property from this subclass, and no other Topic Map construct is allowed to get its "type" from this subclass. To rule out the possibility that any Topic might be an instance of two such subclasses at the same time, the user can add an axiom forcing the SubClasses within the "Type" subclass of "Topic" to be disjoint from each other. "Occurrence of type O can only be used within scope S": The user should create a subclass of Occurrence which has its "type" property set to an instance of OccurrenceType representing "O", and its "hasScope" property mapped to an instance of Scope representing "S". Instances added to this subclass of Occurrence will automatically be of type "O" and have Scope "S". If the intention is that this is the only value of Scope these Occurrences may have, then the "hasScope" property for the created subclass should have its cardinality set to 1, to prevent it being linked to other Scopes via the "hasScope" property. It should be mentioned that many of the TMCL Use Cases relate to "Topics of Type T ...": these are not discussed here due to the issue discussed in Section 10, Outstanding Issues.
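The second constraint might be sketched as a subclass of Occurrence using hasValue restrictions; the identifiers "O" and "S" stand for the relevant OccurrenceType and Scope instances, and the subclass name is illustrative:

```xml
<owl:Class rdf:ID="Occurrence_Type_O_Scope_S">
  <rdfs:subClassOf rdf:resource="#Occurrence"/>
  <!-- All instances of this subclass automatically have type "O"... -->
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#type"/>
      <owl:hasValue rdf:resource="#O"/>
    </owl:Restriction>
  </rdfs:subClassOf>
  <!-- ...and Scope "S". -->
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#hasScope"/>
      <owl:hasValue rdf:resource="#S"/>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```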
It is worth noting, though, that if the TMDM model were adapted to allow Topics to have types in the same way that Occurrences, Associations and so on have types, these cases would be able to be treated in a similar manner to the above examples. Putting aside the above issue, the kinds of constraints required by the TMCL would not cause the authored Topic Map to go from OWL-DL to OWL-Full, although the user is not prevented from using the full available OWL buffet of constraint expressivity to add their own more creative constraints, which might cause their ontology to move into OWL-Full territory. To assist, Protege has a function which will tell the user which OWL sublanguage the ontology is currently using, so that if one desires to stay within OWL-DL, one may check that this is the case when adding constraints. It is also likely that creative constraints would be at least partially lost when porting the Topic Map ontology to a Topic Map application based on a TMDM/TMCL platform, as there would be no way to translate them into TMCL.

9.3 Querying

Topic Maps authored using the TMDM ontology can be queried in Protege using a special Querying Plugin Tool which has a GUI frontend. Complex queries can be constructed by nesting simpler ones, and may make use of OR and AND logic through the "Match Any" and "Match All" buttons provided. The format of simple queries in Protege is:

[Class] [Slot] contains [Individual/Data value]

The choices presented are those that are validly available - that is, if one chooses the [Class] to be "Topic", then the choices for [Slot] are only those properties where "Topic" is within the domain of the Property. For instance, for the Italian Opera Topic Map above, one can construct the following three simple queries:

1. Retrieve all instances of class [Topic] where slot [playsRole] contains [Association Role X]
2. Retrieve all instances of class [AssociationRole] where slot [roleInAssociation] contains [Association Y]
3. Retrieve all instances of class [Association] where slot [type] contains [Association Type Y]

and then nest them, the third one being innermost. By setting the value of AssociationType to "written_by" in the innermost query, the query finds all Topics which play Roles in Associations of Type "written_by". Based on the Italian Opera Topic Map ontology described in the previous section, the result would be the Topics "Puccini", "Giacosa" and "Tosca", because they all play roles in the only Association of type "written_by" (namely the Association "Puccini_and_Giacosa_Wrote_Tosca"). A screenshot of this query being run in Protege is shown below:


Figure 4


Querying in Protege Using Querying Plugin

9.4 Visualisation

Protege provides several plugins for visualisation of ontologies, for instance Jambalaya. It provides various views of the ontology, and one may navigate through the Ontology by clicking on the Classes and Instances one wishes to view. The user may choose for views to be filtered to only contain certain constructs, and to be presented in different ways, e.g. radially, as a tree structure, etc. A sample screenshot produced by Protege's Jambalaya plugin is shown below, showing only one of the smorgasbord of possible views:


Figure 5


Visualising the TMDM Ontology Using Protege's Jambalaya Plugin

§ 10 Outstanding Issues

Up to this point we have implemented a solution which appears to successfully support Topic Map authors in building TMDM-compliant Topic Maps as OWL ontologies. However, having put aside the issue of the functionality of "Type-Instance" and "SuperType-SubType" Associations as specified in the TMDM earlier, we must now return to see what our options are to get over the line in implementing ALL the required TMDM functionality. Within the context of the approach presented in this paper, the choices to complete the functionality seem to break into three camps, which are described below.

10.1 The OWL-Full approach

In this approach, we do not implement Type-Instance and SuperType-SubType as instances of Associations, but instead use OWL's builtin instanceOf and subclassOf relationships to express these relations in a normal ontological manner. This requires the use of OWL-Full, since any topic may be treated as either a class or an instance in these relationships. (Even if we could separate topics into classes and instances with no overlap, there is another reason for going beyond OWL-DL: we need to be able to assert characteristics of topics which are acting as classes without those characteristics being inherited by the topics which are their instances. For example, we want a topic's subject identifier to belong to it only, and NOT to all the instances it is being used to type. However, there is no means to assert Data or Object Properties on classes, only on the individuals contained within them; thus any property asserted of a class must be inherited by every individual in the class.) We dispense with the TMDM notion of building associations and roles with specific types to represent these relationships, and consider the "Type-Instance" and "SuperType-SubType" relations sufficiently grounded through being builtin OWL constructs.
In theory this produces all the required functionality; in practice, however, we now lack tools which support our ontology well enough to do all the consistency checking and querying we would like. Better support for OWL-Full is on the W3C's agenda, and if we revisit the issue in future we may find that the tools we need have become available. We live with the fact that our Topic Maps may be undecidable, i.e. that sometimes our queries don't terminate, or that not all consistency checking can give conclusive results. Since we are creating self-referential loops in this solution, this seems only fair.

10.2 The "Quick Fix" Workaround

As with the previous approach, we drop the notion of these relations being special Associations, but here Topics remain individuals at all times so that they can have all the Object and Data Properties they need. To support the Type behaviour, the user may create Class hierarchies within the Class "Topic", in the normal OWL manner, using OWL's InstanceOf and SubClassOf builtin properties. These classes may then simply be connected to the corresponding Topics which we want to have this Typing behaviour via a specially created ObjectProperty for this purpose. It is a little ugly and creates an extra step for the user in querying and constraint processing, but should mimic the desired functionality, and is within OWL-DL.

10.3 The Extra Java code approach

In this approach, we use a Protege plugin such as JESS to write extra Java code to produce the required functionality. For authoring, this would then have to be downloaded by the TM author along with the TMDM ontology, and run in a similar way to the running of consistency checking and automatic inferencing (effectively it defines additional automatic inferencing). The functionality is produced by creating the flow-on relationships required, which are then available for subsequent functions such as querying and visualisation.
Within this approach there are two paths, depending on how close we want to stay to the TMDM.

10.3.1 Using Object Properties

For this path, as with the approaches above, we dispense with the TMDM notion of building associations and roles with specific types to represent these relationships. We define the "Type-Instance" and "SuperType-SubType" relations not by using OWL built-in constructs, but as special Object Properties


between Topics, in a similar way to the way Types are defined for other Topic Map objects in the previous sections. We remain in OWL-DL, as all Topics are always Individuals. Taking this approach, we are able to use the built-in transitivity property within OWL to get the required behaviour for the "SuperType-SubType" Object Property, and just need a little extra code to ensure that Instances of Types get passed upwards to their SuperTypes.
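A sketch of this path follows; the property names are illustrative, not taken from the TMDM. Transitivity gives the required SuperType-SubType behaviour, while passing Instances of Types upwards to SuperTypes needs the extra code described:

```xml
<!-- Illustrative names: Type-Instance and SuperType-SubType as Object
     Properties between Topics, keeping all Topics as individuals (OWL-DL). -->
<owl:ObjectProperty rdf:ID="instanceOfType">
  <rdfs:domain rdf:resource="#Topic"/>
  <rdfs:range rdf:resource="#Topic"/>
</owl:ObjectProperty>
<owl:TransitiveProperty rdf:ID="subTypeOf">
  <rdfs:domain rdf:resource="#Topic"/>
  <rdfs:range rdf:resource="#Topic"/>
</owl:TransitiveProperty>
```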

10.3.2 Generating Instances of Association and Association Roles

On this path, we stay with the letter of the TMDM specification and express "Type-Instance" and "SuperType-SubType" relations as special instances of Associations, as the TMDM states, then use extra Java code as above to exhibit the required behaviour. This means creating extra instances of Associations and Association Roles on the fly to accommodate the flow-on relations. The TMDM ontology as described in previous sections can easily accommodate the representation through the use of subclasses in combination with constraints, and the definition of special instances of Topics for the types. Constraints are used to ensure there are exactly two Roles of two particular Types in each instance of an Association which has its type set to one of these special Association Types. Special subclasses are created to ensure the Subject Identifiers for the Types of the Roles and Associations are set appropriately. The full construction in Protege is available at the author's website. The additional Java code is then required to produce the extra instances of Associations and Association Roles which make the SuperType-SubType Association transitive, and to ensure that Instances of Types in "Type-Instance" Associations get passed upwards to any Topics playing the role of "SuperType" in instances of "SuperType-SubType" Associations where the original Topic is playing the role of "SubType".

§ 11 Conclusions

This paper presents a proposal for an OWL-DL formalisation of the ISO draft standard Topic Map Data Model. It illustrates the fit between OWL and the TMDM, and the potential for the use of OWL-DL as a medium for representing the TMDM constructs, providing the basis for end-users to author Topic Maps in OWL-DL.
The use of OWL-DL and its associated tools for constructing Topic Maps provides significant advantages over previous Topic Map representations in terms of explicit specification, formal semantics, constraint checking and querying capabilities. An outstanding issue relating to the implementation of Type-Instance and SuperType-SubType relations as specified in the TMDM was identified, and some approaches for implementing the required functionality were outlined.

§ 12 Appendix A

TMDM Ontology: OWL-DL code generated by Protege


Bibliography [OWL] Mike Dean, Guus Schreiber, editors , OWL Web Ontology Language Reference, W3C Recommendation, 10 February 2004 http://www.w3.org/TR/2004/REC-owl-ref-20040210/ [Protege] Stanford University The Protege Ontology Editor and Knowledge Acquisition System, Open Source Ontology Editor protege.stanford.edu/ [RACER] Volker Haarslev, Ralf Moller, Michael Wessel, RACER, Semantic Middleware for Industrial Projects based on RDF/OWL http://www.sts.tu-harburg.de/~r.f.moeller/racer/ [RDF] Dave Beckett, editor, RDF/XML Syntax Specification (Revised), W3C Recommendation, 10 February 2004 http://www.w3.org/TR/rdf-syntax-grammar/ [RDFTM] Steve Pepper, Fabio Vitali, editors, RDFTM: Survey of Interoperability Proposals, W3C RDF/ Topic Maps Inter-operability Task Force publication, Editors Draft, 24 February, 2005 http:// www.w3.org/2001/sw/BestPractices/RDFTM/survey-2005-02-24 [TM] M. Biezunski, M. Bryan, S. Newcomb, editors, ISO/IEC 13250, Topic Maps (Second Edition)ISO 13250 International Standard, 22 May 2002http://www.y12.doe.gov/sgml/sc34/document/0322.htm/ [TMCL] Graham Moore, Mary Nishikawa, Dmitry Bogachev, Topic Map Constraint Language (TMCL) Requirements and Use Cases ISO 13250 Editors Draft, 16 October, 2004 http://www.jtc1sc34.org/ repository/0548.htm [TMDM] Lars Marius Garshol, Graham Moore, Topic Maps - Data Model ISO 13250 Final Committee Draft, 10 January, 2005 http://www.isotopicmaps.org/sam/sam-model/ [TMQL] Lars Marius Garshol, Robert Barta, TMQL RequirementsISO 13250 Draft, 7 November, 2003 http://www.y12.doe.gov/sgml/sc34/document/0448.htm


[TMRM] Patrick Durusau, Steve Newcomb, editors, Topic Maps - Reference Model, ISO 13250 Committee Draft, 12 February 2005. http://www.isotopicmaps.org/TMRM/
[XTM1.0] Steve Pepper, Graham Moore, editors, XML Topic Maps (XTM) 1.0, TopicMaps.Org Specification, 6 August 2001. http://www.topicmaps.org/xtm/1.0/

The Author Anne Cregan University of New South Wales, Knowledge Representation and Reasoning Program, National Information and Communications Technology Australia (NICTA) Centre of Excellence and Laboratory, School of Computer Science and Engineering Sydney NSW Australia 2052 [email protected] http://www.cse.unsw.edu.au/~annec/ Anne Cregan is a PhD student in NICTA's Knowledge Representation and Reasoning Program. Prior to PhD enrolment she worked as a Technology and Business Consultant, Data Miner and Marketing Modeller. She was General Manager of two Internet-related companies in 1998-2000. She has a first class honours degree from the University of Sydney in Cognitive Psychology and Human Intelligence.

Extreme Markup Languages 2005® Montréal, Québec, August 1-5, 2005 This paper was formatted from XML source via XSL by Mulberry Technologies, Inc.


3.2 An OWL DL Construction for the ISO Topic Map Data Model

Title of Publication: An OWL DL Construction for the ISO Topic Map Data Model
Presented to: ISO/IEC JTC 1/SC 34, Working Group 3, Amsterdam, May 2005
Published: Online at http://xml.coverpages.org/CreganTMs-OWL200505.pdf
Peer Reviewed: By the ISO working group - see Section 1.10.2
Contributing Author(s): Anne Cregan
Personal Contribution: 100%

DRAFT ONLY 16/05/05 An OWL DL construction for the ISO Topic Map Data Model

Anne Cregan [email protected]

Knowledge Representation and Reasoning Program, Artificial Intelligence Group
National Information and Communications Technology Australia (NICTA) Centre of Excellence
School of Computer Science and Engineering, University of New South Wales, AUSTRALIA

Abstract

Both Topic Maps and the W3C Semantic Web technologies are meta-level semantic maps describing relationships between information resources. Previous attempts at interoperability between XTM Topic Maps and RDF have proved problematic. The ISO's drafting of an explicit Topic Map Data Model [TMDM 05], combined with the advent of the W3C's XML and RDF-based Description Logic-equivalent Web Ontology Language [OWLDL 04], now provides the means for the construction of an unambiguous semantic model to represent Topic Maps, in a form that is equivalent to a Description Logic representation.

This paper describes the construction of the proposed TMDM ISO Topic Map Standard in OWL DL (Description Logic equivalent) form. The construction is claimed to exactly match the features of the proposed TMDM. The intention is that the topic map constructs described herein, once officially published on the world-wide web, may be used by Topic Map authors to construct their Topic Maps in OWL DL.

The advantage of OWL DL Topic Map construction over XTM, the existing XML-based DTD standard, is that OWL DL allows many constraints to be explicitly stated. OWL DL's suite of tools, although currently still somewhat immature, will provide the means for both querying and enforcing constraints. This goes a long way towards fulfilling the requirements for a Topic Map Query Language (TMQL) and Constraint Language (TMCL), which the Topic Map Community may choose to expend effort on extending. Additionally, OWL DL has a clearly defined formal semantics (Description Logic ref).

1. Introduction

1.1. The Topic Map Tradition

Topic Maps are an ISO/IEC standard [TM 02] for mapping web and "real-world" information resources, by reifying real-world resources as "subjects", and creating "topic" constructs to capture their characteristics and relationships with other topics and subjects. Dubbed the "GPS of the information universe", they are akin to an electronic "back-of-book" index, supporting information navigation and retrieval in on-line environments. XML Topic Maps, or XTM [XTM 01], is the foremost accepted representation of the ISO standard, developed as XML DTDs by the TopicMaps.Org consortium. Recent ISO/IEC work has concentrated on defining Topic Maps' intended semantics, resulting in the drafting of a Topic Map Data Model [TMDM 05].

1.2. The Semantic Web

The Semantic Web is a W3C-led initiative with the goal of providing technologies and standards for the semantic markup of information resources, thus enabling improved web navigation and supporting intelligent web services. Like XTM, it is also an XML-based technology, with Resource Description Framework (RDF) and Web Ontology Language (OWL) layers superimposed on XML to provide more expressive representations of the characteristics of, and relationships between, logical entities. OWL DL, a formal W3C recommendation finalised in 2004, is a subset of OWL Full which provides a description logic equivalent semantics for OWL ontologies.

1.3. The Interoperability Goal

Although the Topic Map standards have been developed independently of the W3C's Semantic Web initiatives, it has long been felt that, as both Topic Maps and the Semantic Web have the same goal of providing meta-level maps of information entities, there must be synergies to be exploited. This paper describes how OWL DL may be used to create a Topic Map ontology whose constructs match those in the draft Topic Map Data Model. The OWL DL Topic Map ontology described herein:
• Provides the Topic Map Constructs to allow the user to create their own OWL DL Topic Maps, using the constructs in the proposed TMDM.
• Enables the user to draw on OWL's querying and constraint tools and capabilities to support validity and consistency checking, which both enforces the intended semantics of the TMDM and allows the user to implement additional constraints for their own user-defined ontologies, by using the constructs provided in OWL DL.
• Is Description Logic equivalent, and enables the construction of user-defined Topic Maps which have a formal Description Logic semantics.


DRAFT ONLY 16/05/05 The conversion of existing XTM topic maps to OWL DL one direction only, they may be linked to the correspond- is a separate issue, addressed in the author’s previous ding property in the opposite direction by the use of the work but not covered herein. “inverse” relation construct.

2. Topic Map Data Model 3. Construction of OWL DL model 2.1. Motivation In 2001, two ISO 13250 standard syntaxes for Topic 3.1. Design Choices Maps were established, one in HyTM and another in In order to construct the TMDM in OWL DL, some de- XML [TM 02]. However, the standard did not explain sign choices need to be made. how the two syntaxes related to each other, and did not make the common underlying data model explicit. There Firstly, we need to decide what the final individuals in are non-trivial differences between the syntaxes, and both Topic maps will be, recalling that whilst Object and Data fail to specify what implementations are to do in a number Properties are defined with classes as their domains and of situations. Additionally, neither standard supports con- ranges, they actually exist between the individuals within straints of the form “A person must be born in a place”, those classes. Accordingly, the candidates for individuals “A person must have a least one name” etc. are taken to be those individual Topics, Occurrences, As- sociations and so forth that users would ultimately want to A draft of requirements for both a Topic Map Constraint define within their Topic Maps. Language and Query Language have since been created, and both require a clear description of how Topic Map The constructs Topic, Occurrence and Association and so constructs are to be evaluated. Accordingly, the ISO forth should therefore be defined as classes which will Topic map team have addressed this with the Topic Map contain those individuals. We may then define the rela- Data Model work that commenced in May 2001, and so tions between these and other Topic map constructs as far has culminated in the current official draft ISO 13250- Object Relations, with domains and ranges being appro- 2 Topic Maps Data Model [TMDM 05] published in priately defined classes. The user will ultimately instanti- January 2005. 
ate these relations in their own ontologies, but our OWL DL ontology will provide the general framework which 2.2. The Topic Map Data Model (TMDM) defines which entities may be related to which other enti- For ease of reference the TMDM specification has been ties, in what ways they relate (cardinality etc), and the compiled by the author and all aspects of the model are attributes they may have. shown graphically at Figure 1. (Note that “Topic Map Object” can be any of the other Topic Map Objects shown We also explain here why we have opted to define Topics (Association, Occurrence, TopicName, etc)1). as individuals, even though they are used as types. Nor- mally in an OWL ontology, a type would be represented In the author’s opinion, the proposed TMDM makes sig- as a class of individuals. However, as the TMDM re- nificant improvements to the previous Topic Map Data quires that types are also themselves Topics which may be Model implied, but never explicitly stated, by the XTM used in all the ways that regular Topics are, we must opt DTD. It clearly states the directions and cardinalities of to reflect type relations as Object Relations between indi- relationships, and introduces two constraints. It gives vidual Topic Map Objects and the individual Topics data attributes for its logical entities, which include loca- which are their types. tors such as URI references. It also states type and parent relationships separately and explicitly. As the “Type” and “Scope” of a Topic Map Object are filled by Topics, we have defined these as specific sub- 2.3. Fit with OWL DL classes of “Topic”. We have preserved the ability to sepa- The fit between the TMDM and OWL DL is a good one. rate Association Types, Occurrence Types, Topic Name OWL DL has been designed for the representation of Types and Association Role Types, as these groups are classes containing individuals, which may be related via likely to have large non-overlapping components. 
The Object Properties, and may also have attributes repre- ability to separate these constructs will prove useful for sented by Data Properties. OWL enables the building on implementing some of the desired TMCL requirements, ontologies with class-subclass relations, additionally al- as well as for supporting the user in managing user- lowing both Object and Data Properties to be constrained defined types within their Topic Maps. We have not, on Domain, Range, Cardinality and other properties such however, opted to preserve any separation of Scope sub- as transitivity. Although OWL properties are defined in classes according to the kind of TM Object relating to it, as these as expected to differ little between TM Objects, 1 Although technically, if a Topic reifies another Topic, they as typically Scope relates to the division of subject do- will be forced to merge into one Topic in the merge processing mains more generally.


3.2. The OWL DL Model

The proposed OWL DL model is presented pictorially at Figure 2. As the author is not currently aware of any standard graphical notation for representing OWL or other ontologies beyond RDF graphs, she has devised her own notation as a natural extension of RDF notation. Classes are represented as ovals, and the nesting of an oval within a larger oval indicates a subclass-superclass relation. Arrows represent object properties, labelled with the Object Property's name and cardinalities, and indicating direction by pointing from the Domain to the Range. To avoid cluttering the diagram, data properties are omitted in Figure 2, as are constraints.

Each entity (box) in Figure 1 has been translated to an oval (class) within Figure 2. For instance, the oval "Topic" denotes an OWL class which will contain individual Topics as instances. As explained previously, subclasses are created for the various Types and for Scope. Although shown not overlapping, we have opted not to define them as disjoint, so the individuals therein may in fact be common to multiple subclasses if the user wishes, i.e. the same Topic could be used for both an Association Type and an Occurrence Type, as well as a Scope. This could be prevented by additional OWL code to define these subclasses as disjoint.

Relationships between the entities in Figure 1 are captured as Object Properties in Figure 2. Where there is a bi-directional relation in Figure 1, two relations have been given in Figure 2, and related as inverses. For additional ease of use, many additional inverse relations have also been defined. Logically, these are already implied by the TMDM, and will ultimately allow the user more convenience in writing code for their own ontologies.

The attributes of each entity in Figure 1 (source locator, etc) will translate to OWL Data Properties of the individuals within the OWL classes (not shown at Figure 2). Constraints will be implemented as restrictions on OWL classes. Full details are given in the following section.

4. Construction of OWL DL ontology

This section gives the full OWL construction corresponding to Figure 2, plus Data Properties and Constraints, thus fully capturing every aspect of the proposed TMDM as detailed in Figure 1. The intention is that the code here defined, once finalized and approved by the relevant Standards Bodies, will be the definitive representation of the Topic Map Data Model in OWL DL syntax. It will be published as an online document, and included by import in all User-Defined Topic Maps written in OWL which wish to use the Topic Map Data Model constructs. Note that no XTM references are necessary, either in the TMDM ontology or in user-defined Topic Map ontologies.

Section 5 works through some examples showing how a user-defined Topic Map would reference and use the OWL DL constructs defined in Section 4.

4.1. OWL document Header

After the standard XML, RDF and OWL inclusions are made, this ontology is named "Topic Map Data Model Ontology". (Note: indicative name only, subject to ISO approval.)

4.2. Classes

Firstly, OWL classes are created for each TMDM Entity. Scope and types are defined as subclasses of Topic; Type also has further subclasses within it. Individuals to be declared in user-defined OWL TM documents will belong to at least one of the classes defined; for instance, a user-defined topic called "mytopic" would be declared as an individual of the Topic class. Within this section, further restrictions are added onto these classes via constraints on the OWL properties related to them.
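The OWL listings at this point did not survive the text extraction. The following RDF/XML fragment is an illustrative sketch only, not the official listing: it shows what class declarations for the TMDM entities and a user-defined individual may look like, assuming the usual rdf/rdfs/owl namespace declarations and using the entity names from this paper.

```xml
<!-- Illustrative sketch: OWL classes for the TMDM entities -->
<owl:Class rdf:ID="Topic"/>
<owl:Class rdf:ID="TopicName"/>
<owl:Class rdf:ID="Variant"/>
<owl:Class rdf:ID="Occurrence"/>
<owl:Class rdf:ID="Association"/>
<owl:Class rdf:ID="AssociationRole"/>
<owl:Class rdf:ID="TopicMap"/>

<!-- Scope and Type are subclasses of Topic; Type has further subclasses -->
<owl:Class rdf:ID="Scope">
  <rdfs:subClassOf rdf:resource="#Topic"/>
</owl:Class>
<owl:Class rdf:ID="Type">
  <rdfs:subClassOf rdf:resource="#Topic"/>
</owl:Class>
<owl:Class rdf:ID="OccurrenceType">
  <rdfs:subClassOf rdf:resource="#Type"/>
</owl:Class>

<!-- A user-defined topic "mytopic" is declared as an individual of Topic -->
<Topic rdf:ID="mytopic"/>
```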



4.3. Object Properties

This section defines the relationships between individuals within the Topic Map classes. As OWL properties are directional, inverses are also defined to give the complete construction. By making these logically equivalent forms available, users are given more flexibility to write their own Topic Map ontologies in the most convenient way.

4.3.1 scope

Occurrence, TopicName, Variant and Association may all be related to any number of Scopes.² Thus we define a Property "scope" which has all these classes as its domain and Scope as its range. There is no need to add any cardinality restriction.

² Note the OWL convention that Classes commence with a capital letter, whereas properties do not, so "Scope" is a class, whereas "scope" is a property.
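The original listing for the "scope" property was lost in extraction; a sketch of such a property, under the assumption of standard namespace declarations, might read:

```xml
<!-- Illustrative sketch: "scope" relates Occurrence, TopicName, Variant
     and Association individuals to any number of Scope topics -->
<owl:ObjectProperty rdf:ID="scope">
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#Occurrence"/>
        <owl:Class rdf:about="#TopicName"/>
        <owl:Class rdf:about="#Variant"/>
        <owl:Class rdf:about="#Association"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range rdf:resource="#Scope"/>
</owl:ObjectProperty>
```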



4.3.2 type

Individuals within the classes TopicName, Occurrence, Association and AssociationRole may each have 0 or 1 types. Accordingly, "type" is defined as a functional property from individuals within the union of these classes to the "Type" class.

Variant Scope Constraint: NOTE: There is a constraint on Variant Scope as follows: the value of the scope of each individual variant item must be a true superset of the value of the scope of its parent topic name. //** TO BE ADVISED: the best way to implement this is still under consideration by the author **//
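A sketch of the functional "type" property described above, assuming standard namespace and entity declarations (including the &owl; entity), might look like:

```xml
<!-- Illustrative sketch: "type" is functional (at most one value),
     from the union of the four classes to the Type class -->
<owl:ObjectProperty rdf:ID="type">
  <rdf:type rdf:resource="&owl;FunctionalProperty"/>
  <rdfs:domain>
    <owl:Class>
      <owl:unionOf rdf:parseType="Collection">
        <owl:Class rdf:about="#TopicName"/>
        <owl:Class rdf:about="#Occurrence"/>
        <owl:Class rdf:about="#Association"/>
        <owl:Class rdf:about="#AssociationRole"/>
      </owl:unionOf>
    </owl:Class>
  </rdfs:domain>
  <rdfs:range rdf:resource="#Type"/>
</owl:ObjectProperty>
```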



A functional property relates each item in its domain to one and only one item in its range. Thus if a Topic Map individual has a user-defined type property, it is able to be related to only one individual from the Type class. It may also have zero types in a user-defined Topic Map if no type property is declared for it.

Additional restrictions are set to force each relevant Topic Map Object's type to come from the appropriate kind within the "Type" class.

4.3.3 parent

Each TopicName must have exactly one parent Topic. This is enforced via a cardinality restriction on the property relating TopicName to Topic.

Conversely, an individual Topic may have any number of TopicNames. This is defined as an inverse property of the "belongsToTopic" property defined above, giving users the option to define Topic/TopicName relations using either Topic or TopicName as the starting point, as is most convenient.

Each Occurrence has exactly one parent Topic, enforced via a Cardinality constraint on the "occurrenceOfTopic" Object Property. Conversely, a Topic may have any number of Occurrences.

Each Variant has exactly one parent TopicName, enforced via a cardinality restriction. Conversely, a TopicName may have any number of Variants, captured by the inverse property "hasVariant".
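The cardinality listings here were lost in extraction; only fragments such as `"&xmls;nonNegativeInteger">1` survive. A sketch of the TopicName-to-Topic case, using the property names mentioned in the text and assuming standard entity declarations, might read:

```xml
<!-- Illustrative sketch: each TopicName belongs to exactly one Topic -->
<owl:ObjectProperty rdf:ID="belongsToTopic">
  <rdfs:domain rdf:resource="#TopicName"/>
  <rdfs:range rdf:resource="#Topic"/>
</owl:ObjectProperty>

<!-- Inverse, usable from the Topic side -->
<owl:ObjectProperty rdf:ID="hasTopicName">
  <owl:inverseOf rdf:resource="#belongsToTopic"/>
</owl:ObjectProperty>

<!-- Cardinality restriction enforcing exactly one parent Topic -->
<owl:Class rdf:about="#TopicName">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#belongsToTopic"/>
      <owl:cardinality rdf:datatype="&xmls;nonNegativeInteger"
        >1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```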



Each AssociationRole must belong to exactly one Association, enforced via a Cardinality constraint on the "roleInAssociation" Object Property. Conversely, each Association must have at least one AssociationRole, set up as a minimum cardinality restriction on the inverse property "hasRole".

4.3.4 Topics Playing Association Roles

Every AssociationRole must be played by exactly one Topic. Conversely, each Topic may play any number of AssociationRoles, captured by "playsRole", defined as an inverse property of "playedBy".

4.3.5 Reification

Any Topic may be used to reify only one Topic Map Object.

4.3.6 Parent Topic Map

Lastly, object properties are created to capture the notion that each Topic and Association belongs to exactly one parent TopicMap. This concludes the definition of OWL object properties.

4.4. Data Properties

This section defines the relationships between individuals within the Topic Map classes and data values of specified datatypes. The standard XML Schema Part 2: Datatypes [XMLS2 04] are used (denoted "&xmls;" here). As per the TMDM specification, the locators are given as
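As with the other listings, the original code here was lost; an illustrative sketch of the playedBy/playsRole pair and its cardinality restriction, under the same assumptions as above, might read:

```xml
<!-- Illustrative sketch: every AssociationRole is played by exactly one
     Topic; playsRole is the inverse, usable from the Topic side -->
<owl:ObjectProperty rdf:ID="playedBy">
  <rdfs:domain rdf:resource="#AssociationRole"/>
  <rdfs:range rdf:resource="#Topic"/>
</owl:ObjectProperty>

<owl:ObjectProperty rdf:ID="playsRole">
  <owl:inverseOf rdf:resource="#playedBy"/>
</owl:ObjectProperty>

<owl:Class rdf:about="#AssociationRole">
  <rdfs:subClassOf>
    <owl:Restriction>
      <owl:onProperty rdf:resource="#playedBy"/>
      <owl:cardinality rdf:datatype="&xmls;nonNegativeInteger"
        >1</owl:cardinality>
    </owl:Restriction>
  </rdfs:subClassOf>
</owl:Class>
```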


being of type "string", but the option of forcing them to be URI references by setting datatype = "&xmls;anyURI" should probably be considered, as this is the standard for locating resources on the Semantic Web.

4.4.1 baseLocator

A TopicMap may have a base locator that is a string.

As there is a constraint over the three Topic attributes (see also 4.4.6), we create a data property "Locator" that will contain the three data properties sourceLocator, subjectLocator, and subjectIdentifier as subproperties. Every topic map object may have any number of source locators, which are URIs.

4.4.4 subjectLocator

A Topic may have any number of subject Locators of type anyURI (all required to point to the same "thing", but enforcing this is outside the scope of the TMDM).

4.4.5 subjectIdentifier

A Topic may have any number of subject Identifiers.

4.4.6 Topic Locator Constraint

The TMDM specifies the following constraint on Topic: at least one of the three attributes subjectLocator, subjectIdentifier and sourceLocator must be present. This is enforced in OWL by creating a minimum cardinality restriction on class Topic with regard to the Locator property defined earlier. As "Locator" encompasses the other three properties bound by the constraint, the restriction ensures at least one of them must be present.

4.4.7 topicNameString

(The datatype is given by the Property's range.)

Variant and Occurrence may have either:
• Name or text (respectively) as a string value, OR
• a locator where the relevant name or text resides.
The TMDM distinguishes between the two cases based on datatype; however, as this is not implementable in OWL without extra processing, we have opted to separate the two cases into different Data Properties.
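The data property listings were also lost in extraction; a sketch of the "Locator" subproperty arrangement described above (an illustrative reconstruction, not the official listing) might read:

```xml
<!-- Illustrative sketch: locator data properties on Topic, grouped under
     "Locator" so the 4.4.6 minimum-cardinality constraint can use it -->
<owl:DatatypeProperty rdf:ID="Locator"/>

<owl:DatatypeProperty rdf:ID="subjectLocator">
  <rdfs:subPropertyOf rdf:resource="#Locator"/>
  <rdfs:domain rdf:resource="#Topic"/>
  <rdfs:range rdf:resource="&xmls;anyURI"/>
</owl:DatatypeProperty>

<owl:DatatypeProperty rdf:ID="subjectIdentifier">
  <rdfs:subPropertyOf rdf:resource="#Locator"/>
  <rdfs:domain rdf:resource="#Topic"/>
  <rdfs:range rdf:resource="&xmls;anyURI"/>
</owl:DatatypeProperty>
```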



5. Examples

In this section, we use the OWL TMDM ontology as defined above. The "hamlet" example from the XTM 1.0 specification was used as the inspiration, and the information presented herein corresponds very closely with it. Assume that the Topic Map Data Model Ontology laid out in Section 4 resides at "http://somelocation/tmdm".

5.1. User TM document Header

Set up a "tm" namespace for the TMDM, make the standard XML, RDF and OWL inclusions, import the "Topic Map Data Model Ontology", and name the user ontology ("An example user-defined Topic Map for Shakespeare, constructed in OWL DL using the TMDM OWL DL constructs — Shakespeare Topic Map Ontology"). The ontology can contain more than one Topic Map, so we name the individual Topic Map. Our user-defined Topics and Associations will belong to this Topic Map. Note the use of the prefix "tm", which indicates the construct is from the imported TMDM ontology.

5.2. Topics, Topic Names, Types, Occurrences

Say we would like to create a topic "hamlet" which has:
• a topic name string: "Hamlet, Prince of Denmark"
• an occurrence of type "plain_text_format" which resides at the Gutenberg.org ftp site.

Firstly, we create a Topic "hamlet", and state that it belongs to the "Shakespeare_Topic_Map" TopicMap. Then we create an individual TopicName, which has a topicNameString of "Hamlet, Prince of Denmark", and state that this TopicName belongs to the Topic "hamlet". Note that we could have opted to define the TopicName first, and then define the Topic, connecting it to the TopicName using the "hasTopicName" Object Property. This would give an exactly equivalent construction.

We then create the OccurrenceType "plain_text_format". Since OccurrenceType has been defined in the TMDM ontology as a subclass of Topic, all OccurrenceTypes are automatically Topics. We state that it belongs to our Shakespeare TopicMap. Then we create an individual Occurrence and give it a type of "plain_text_format". Since in the TMDM ontology we stated that when the domain of the "type" Object Property is "Occurrence" the range must be "OccurrenceType", OWL consistency checks will produce an error if this is not the case.

Using the occurrenceOfTopic ObjectProperty, we state that this Occurrence is of the Topic "hamlet". Lastly, we use occurrenceTextIsLocated to give the URI location where the full text of Hamlet is located. Note that this must be a valid URI or an error will be produced.

5.3. Locators

OWL consistency checks will still, however, produce an error on the above code, because we have neglected to specify at least one locator for our Topics. A locator such as subjectIdentifier can be set as follows (similarly for baseLocator, sourceLocator and subjectLocator): here we have set the subjectIdentifier for a Topic "dk" to "http://www.topicmaps.org/xtm/1.0/country.xtm#DK".
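The user-level OWL code for this example did not survive extraction. The fragment below is an illustrative sketch of the steps described in 5.2, assuming "tm" is bound to the imported TMDM ontology; the property name parentTopicMap and the Gutenberg URI are hypothetical, as the text names the notions but not the identifiers.

```xml
<!-- Illustrative sketch of the "hamlet" example -->
<tm:TopicMap rdf:ID="Shakespeare_Topic_Map"/>

<tm:Topic rdf:ID="hamlet">
  <tm:parentTopicMap rdf:resource="#Shakespeare_Topic_Map"/>
</tm:Topic>

<tm:TopicName rdf:ID="hamletTopicName">
  <tm:topicNameString rdf:datatype="&xmls;string"
    >Hamlet, Prince of Denmark</tm:topicNameString>
  <tm:belongsToTopic rdf:resource="#hamlet"/>
</tm:TopicName>

<!-- OccurrenceType is a subclass of Topic, so this individual is a Topic -->
<tm:OccurrenceType rdf:ID="plain_text_format"/>

<tm:Occurrence rdf:ID="hamletTextOccurrence">
  <tm:type rdf:resource="#plain_text_format"/>
  <tm:occurrenceOfTopic rdf:resource="#hamlet"/>
  <!-- URI is illustrative; the text says only "the Gutenberg.org ftp site" -->
  <tm:occurrenceTextIsLocated rdf:datatype="&xmls;anyURI"
    >ftp://ftp.gutenberg.org/</tm:occurrenceTextIsLocated>
</tm:Occurrence>
```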


5.4. Associations and Association Roles

Suppose we would now like to capture the authorship relationship between the work "hamlet", the author "William Shakespeare" and the type of work "play". To do this, we define an Association which has three association roles for the work, author and type of work. Firstly, we set these up as Association Role Types, then create the Association which has roles of these types. Assuming that we have already set up the topics "hamlet", "William Shakespeare" and "play", we simply slot these in to the relevant AssociationRole using the Object Property playedBy.

Consistency checks enforce the following OWL restrictions:
• Association Roles can only be played by Topics
• An Association must have at least one Role
• A Role must belong to exactly one Association
• Types of Association Roles must come from the "AssociationRoleType" class (subclass of Topic)

Note that in this example, we have opted not to give our Association an associationType property, but we could easily do so in the same manner as it is done for each AssociationRole type, and OWL would check that the type given to the Association belonged to the AssociationType class (subclass of Topic).

5.5. Scope

Say we would like to give the Topic "hamlet" a different topicNameString depending on whether we are referring to "Hamlet" the play, or "Hamlet" the character. We do this by creating Scope individuals for "play" and for "character" and assigning the respective scopes to TopicNames which have different topic name strings ("The Tragedy of Hamlet, Prince of Denmark" and "Hamlet"). Scopes are automatically Topics, as Scope is a subclass of Topic in the TMDM ontology.

Note that both TopicNames belong to the same Topic. If we wanted to refer to "hamlet" the geographical township, this would refer to a different subject, and should therefore be created as a different Topic. (We probably would put it in a different Topic Map too.) Note that scope can be used with Associations, AssociationRoles and Occurrences also.

5.6. Variants

Say we would like to give our "hamletTopicName" TopicName from the previous "scope" example at 5.5 some Variant names for display under different circumstances. Note the constraint on Variant Scope as follows: the value of the scope of each individual variant item must be a true superset of the value of the scope of its parent topic name.



So any variants of "hamletTopicName" must have at least all the scopes that apply to "hamletTopicName", plus at least one more. Note in the example the scope "play" is added to each Variant, as this is the scope of "hamletTopicName". Additionally, we specify the scopes which define the parameters within which the Variant is to be used.

6. Conclusions and Future Work

This paper has presented a proposal for an OWL DL formalisation of the ISO drafted standard Topic Map Data Model. The constructs defined herein provide the basis for building user-defined Topic Maps in OWL DL or OWL Full.

The author believes that the use of this formalism and associated tools will meet the majority of the requirements for a Topic Map Query Language (TMQL) and a Topic Map Constraint Language (TMCL). Some additional extensions to the OWL tools may be needed to meet all the requirements specified, but the use of the OWL platform and existing tools as a base should significantly reduce the workload required.

The author intends in immediate future work to analyse the extent to which OWL DL and associated tools may satisfy the TMQL and TMCL use cases specified.

References

[OWLDL 04] OWL Web Ontology Language Reference, World-Wide Web Consortium (W3C) Recommendation, 10 February 2004, available at http://www.w3.org/TR/2004/REC-owl-ref-20040210/
[TMDM 05] Topic Maps - Part 2: Data Model, Draft ISO/IEC International Standard (ISO/IEC JTC 1/SC34 ISO/IEC FCD 13250-2), 10 January 2005, available at http://www.isotopicmaps.org/sam/sam-model/
[TM 02] Topic Maps, Second Edition, ISO/IEC International Standard (ISO/IEC 13250), 19 May 2002, available at 3250-2nd-ed-v2.pdf
[XTM 01] XML Topic Maps (XTM) 1.0, TopicMaps.Org Specification, 06 August 2001, available at http://www.topicmaps.org/xtm/1.0/xtm1-20010806.html
[XMLS2 04] XML Schema Part 2: Datatypes Second Edition, World-Wide Web Consortium (W3C) Recommendation, 28 October 2004, available at http://www.w3.org/TR/2004/REC-xmlschema-2-20041028/

DISCLAIMER: Please note that this document is currently at DRAFT stage only, and is NOT an official proposal. The purpose of this document is to illustrate the fit between OWL and the TMDM, and the potential for the use of OWL DL as a standard for Topic Map implementation. After discussion with the ISO/IEC, it may or may not be revised and put forward as an official proposal for an OWL standard for Topic Map implementation. At the very least, revisions are likely to be needed before such an event might take place.


4 Adding Rules to OWL DL Ontologies

4.1 Pushing the limits of OWL, Rules & Protégé

Title of Publication: Pushing the limits of OWL, rules and Protégé - a simple example
Type of Publication: Workshop Paper
Appears In: Proceedings of the OWLED2005 Workshop on OWL, Experiences and Directions, Galway, Ireland, November 11-12, 2005, CEUR-WS, Vol. 188, 2005.
Publication Date: 2005
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan, Malgorzata Mochol, Denny Vrandečić, Sean Bechhofer
Personal Contribution: 35% (estimated by co-authors)

Pushing the limits of OWL, Rules and Protégé
A simple example

Anne Cregan1,2, Malgorzata Mochol3, Denny Vrandečić4, and Sean Bechhofer5

1 University of New South Wales, Australia, [email protected]
2 National ICT Australia (NICTA)
3 Freie Universität Berlin, Germany, [email protected]
4 AIFB, Universität Karlsruhe (TH), Germany, [email protected]
5 University of Manchester, UK, [email protected]

Abstract. The Semantic Web brings powerful languages for creating models, based on Description Logics and Rules. These languages are used within ontology engineering tools and applied in strong and efficient reasoners. Working within the context of a Summer School, we explored these technologies by creating a small and easy to understand example, working firstly with OWL alone, and introducing SWRL rules as needed, to automatically classify a number of instances according to some intuitively simple criteria. We present the example OWL ontology, SWRL rules, and the issues and problems we encountered. Our experience highlights the capabilities and limitations of OWL and of SWRL, not in theoretical but in practical terms, and points to the need for better tool support, but is primarily a lesson in "traps for young players" in terms of formulating classifications, that we hope will be instructive for Semantic Web students following in our footsteps, and their tutors.

1 Introduction

The Semantic Web [4] vision is to extend the Web with machine-understandable data, forming a global distributed knowledge store which applications may leverage to perform tasks automatically. The base technologies for realizing this vision are: Uniform Resource Identifiers (URIs) as a global identification mechanism for resources; the Resource Description Framework (RDF) as a basic data model, together with its XML-based serialisation syntax for publishing data on the Web [16]; the Web Ontology Language (OWL), which extends RDF with more expressive knowledge representation [15, 14]; and a yet to be defined rule language, for which the Semantic Web Rule Language (SWRL) may be considered a prototype [7]. The development of the Semantic Web is a joint effort of scientific and business institutions around the globe, led by the World Wide Web Consortium (W3C). This paper describes the experiences of a group of students (Cregan, Mochol, Vrandečić) and their tutor (Bechhofer) in a mini-project conducted at the Third European Summer School on Ontological Engineering and the Semantic Web (SSSW05, http://babage.dia.fi.upm.es/sssw05/) held in Spain in July 2005. The students, all PhD students at

their respective institutions, formed a project group with the aim of discover- ing the limits of OWL in relation to the use of rules. Specifically, we wanted to discover what tasks are beyond the expressivity of OWL and require the use of rules, and what additional capabilities do rules provide? We used Prot´eg´e ontology software[12], the Racer automated reasoner[3] and SWRL in our in- vestigation: SWRL is based on a combination of the OWL DL and OWL Lite sublanguages of OWL together with the Unary/Binary Datalog RuleML sublan- guages of the Rule Markup Language. To facilitate the investigation, we devised a simple example problem which we regard as a realistic, though small, use case for Semantic Web technologies. The paper describes the challenges that arose and our learning process in applying various conditions to our example domain. We hope that our experience will be helpful both as a teaching resource, adding to the (currently) limited amount of tutorial material available for teaching Semantic Web technologies to students (see [6, 11, 13]), and also, by highlighting the problems we experienced, will help tool implementers and standards bodies to identify issues inherent in their ap- proaches, and initiate discussion of possible solutions. The material discussed is available on http://www.aifb.uni-karlsruhe.de/WBS/dvr/rove.Itmayalso be useful as an early test case for reasoners or editors implementing SWRL. The paper is structured as follows: Firstly, we specify the general scenario of our use case in section 2. Then we describe the ontology we constructed, and present some limitations of the expressivity of OWL in section 3, and of SWRL in section 4. Section 5 deals with problems we experienced using the tools, and outlines some directions for future work. Lastly in section 6, we summarise our learning experience, the issues we encountered and their significance.

2 Scenario

All participating students at SSSW05 were asked to form a group of four or five students to conduct a mini-project. At the outset, the organizers asked the students to make the groups as mixed as possible, to ensure everyone had a diverse collaborative experience contrasting with their usual work/study activities. The organizers also impressed on the students that having fun was a very important part of the activity (no doubt as it enhances learning!). Every student belonged to exactly one group. Each group had a unique name, and was led by exactly one summer school tutor. Every tutor led at least one group, and some led more than one group. All participants in the mini-project were either tutors or students, and no-one was both a student and a tutor. Due to our chosen area of investigation we named our mini-project group "ROVE" (Rules for Ontology Validation Effort). For the ROVE project, we chose a simple and readily accessible domain for applying ontology representation and rules: the student groups taking part in the Summer School mini-projects. In order to discover the limits of OWL's abilities, and the capabilities provided by adding rules, we attempted to formally define and implement the informally stated conditions the organizers had placed on group formation.

2.1 Conditions

We decided that the following conditions reflected desirable group formations:

Condition #1: Groups should have either 4 or 5 members
Condition #2: Groups should have at least one member of each gender
Condition #3: Group members are of all different nationalities
Condition #4: Group members are from all different institutions
Condition #5: Groups should have fun. We decided it would be fun if the tutor of the group were the favourite of all the students in the group, so we asked all students to nominate which tutor was most attractive (see sec 4).

Our goal was to formalize these conditions and use a classifier to automatically categorize all groups as either a GoodGroup - one that fulfills all conditions, or a BadGroup - one which does not satisfy one or more of the stated conditions:

Group is a GoodGroup iff Cond1 ∧ Cond2 ∧ Cond3 ∧ Cond4 ∧ Cond5
Group is a BadGroup iff ¬Cond1 ∨ ¬Cond2 ∨ ¬Cond3 ∨ ¬Cond4 ∨ ¬Cond5
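Read closed-world (assuming the group data is complete), the five conditions and the GoodGroup definition amount to a simple boolean conjunction. A minimal Python sketch, using a hypothetical dict-based encoding of members (the paper itself works in OWL/SWRL, where the open-world semantics makes this far less direct):

```python
from itertools import combinations

def is_good_group(members, tutor):
    """Closed-world check of Conditions #1-#5 for a single group.

    `members` is a list of dicts with keys 'gender', 'nationality',
    'institution' and 'attracted_to'; `tutor` names the group's tutor.
    This encoding is hypothetical -- the paper models it as an ontology.
    """
    cond1 = len(members) in (4, 5)                            # 4 or 5 members
    cond2 = len({m["gender"] for m in members}) == 2          # both genders present
    cond3 = all(a["nationality"] != b["nationality"]          # all different nationalities
                for a, b in combinations(members, 2))
    cond4 = all(a["institution"] != b["institution"]          # all different institutions
                for a, b in combinations(members, 2))
    cond5 = all(m["attracted_to"] == tutor for m in members)  # tutor is everyone's favourite
    return cond1 and cond2 and cond3 and cond4 and cond5      # GoodGroup iff all five hold
```

A group failing any single condition is a BadGroup, mirroring the disjunction above; note that this check relies on knowing the member list is complete, which OWL's open-world semantics does not grant.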

3 ROVE Ontology

3.1 Classes and Instances

Initially we built a simple ontology (see Fig. 1) containing disjoint classes Person, Country, Institution, and Group. Person was divided into disjoint subclasses Tutor and Student exactly partitioning the class. Person was also divided by gender into disjoint subclasses Man and Woman, also as an exact partition.

Fig. 1. ROVE-ontology

3.2 Properties

Object Properties were then set up as follows: each person has a Nationality (hasNationality), and works at an Institution (worksAt). Each Group is led by a Tutor (ledBy) and has members (hasMember), only from the class Student. Each Student is a memberOf exactly one Group, and has a favorite Tutor (attractedTo). Data Properties related Persons and Groups to name strings.
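The classes and properties above can be mirrored directly in a conventional object model; the following Python dataclass sketch reuses the ontology's own names, though the layout itself is only illustrative:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Person:
    name: str            # Data Property: name string
    nationality: str     # hasNationality -> Country
    institution: str     # worksAt -> Institution
    gender: str          # the Man/Woman partition, as a flag here

@dataclass
class Tutor(Person):
    pass

@dataclass
class Student(Person):
    attracted_to: str = ""   # attractedTo -> favourite Tutor

@dataclass
class Group:
    name: str                # Data Property: name string
    led_by: Tutor            # ledBy -> exactly one Tutor
    members: List[Student] = field(default_factory=list)  # hasMember (Students only)
```

Unlike the OWL version, nothing here enforces the disjointness or exact-partition axioms; those constraints are exactly what the ontology language adds.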

3.3 Asserting Conditions using OWL

Using this ontology, we then tried to formalise and implement as many of our stated conditions as possible using only OWL expressivity.

Condition #1: Groups should have either 4 or 5 members

This condition was easily implemented by setting minimum and maximum cardinalities on the property hasMember which related Groups to Students:

Group ⊑ ≥4 hasMember ⊓ ≤5 hasMember

As all the groups satisfied this condition and cardinality conditions are easily formulated within OWL, we simply asserted this condition on Group.

Condition #2: Groups should have at least one member of each gender

This condition could not be modelled in OWL in the same way as Condition #1, as it requires hasMember to have a minimum cardinality for values from each of the subclasses Man and Woman. We used existential restrictions to accomplish the task, by defining GoodGroup as a subclass of Group where hasMember has (owl:)someValuesFrom Man and (owl:)someValuesFrom Woman. However, we wanted to capture that this condition was only one of those to be satisfied by a good group, so we did not want to make it a sufficient condition. Yet we also wanted to ensure that all members of the group must be either male or female, and there was no way to do this using a necessary condition. We then approached the problem in reverse by specifying conditions for being a bad group rather than a good group. Note that not satisfying any one of our five conditions is sufficient for classification as a bad group. Protégé supports explicit specification of either 'Necessary', or 'Necessary and Sufficient' asserted conditions (see Figure) through an interface box, but if one has conditions which are Sufficient without being Necessary, as in this case, this can only be implemented by adding a subClass. We felt it would be more intuitive for Protégé to support the assertion of subsumption axioms through the interface box also. We introduced two new concepts to represent groups with all male members or all female members, ManGroup and WomanGroup, where:

WomanGroup ≡ Group ⊓ ∀hasMember.Woman
ManGroup ≡ Group ⊓ ∀hasMember.Man

and, creating BadGroup as a subclass of Group, it was then simple to implement these two sufficient conditions for being a "bad group" by making WomanGroup and ManGroup subclasses of BadGroup:

WomanGroup ⊑ BadGroup
ManGroup ⊑ BadGroup

However, having set this up, we noticed in automatic classification that one of the groups was not classified as a BadGroup, although it consisted of four male students. This was due to OWL's Open World Assumption: the group, having four male members, could still potentially have a fifth member who may be female, without breaking the cardinality restriction, thus the reasoner could not classify the group as a BadGroup. To solve this problem we considered several alternatives. We had to dismiss the possibility of defining groups by enumeration, as this included nominals and was not supported by the available reasoners [5]. Had we pursued this path, we would have needed to rework our ontology to model group membership as a class instead of a relation, and this did not seem intuitive to us in any case. Instead, we decided to state the size of the groups explicitly by creating two new concepts: BigGroup (a group with exactly 5 members) and SmallGroup (a group with exactly 4 members) as disjoint subclasses of Group, and then assert our other conditions on these concepts:

BigGroup ≡ Group ⊓ ≥5 hasMember ⊓ ≤5 hasMember
SmallGroup ≡ Group ⊓ ≥4 hasMember ⊓ ≤4 hasMember
Group ≡ SmallGroup ⊔ BigGroup
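The open-world effect described above can be imitated in ordinary code by treating the member list as possibly incomplete: four known male members prove nothing until the group's size is also pinned down, which is exactly what the SmallGroup/BigGroup assertions achieve. A hypothetical sketch (not OWL semantics proper, just the shape of the argument):

```python
def provably_all_male(known_genders, asserted_size=None):
    """Return True only if the group is *provably* all male.

    Open-world reading: the known members may not be all the members.
    Only when an asserted group size equals the number of known members
    is the membership effectively closed, making the inference sound.
    """
    all_known_male = all(g == "M" for g in known_genders)
    membership_closed = asserted_size == len(known_genders)
    return all_known_male and membership_closed
```

With four known males and no size assertion, a fifth (possibly female) member could still exist, so the group cannot be classified as a ManGroup, and hence not as a BadGroup; asserting SmallGroup (exactly 4 members) closes the membership and the classification goes through.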

Conditions #3-#5 These conditions required consideration of more than one property at a time: for example, both a student's nationality and group membership. Whilst OWL has a relatively rich set of class constructors, expressivity for properties is much weaker. Whilst OWL permits chaining of properties, it does not support making assertions about the equality of the objects at the end of two different properties/property chains. Since conditions #3-#5 require precisely this kind of assertion, it was not possible to formulate them using only OWL.

4 ROVE Rules

In the next step, we explored the use of rules to implement our conditions, using the Semantic Web Rule Language (SWRL) [7] plugin to Protégé with the ROVE ontology described above. Rules are constructed in the form of an implication between an antecedent (body) and a consequent (head): whenever the conditions specified in the antecedent hold, then the conditions specified in the consequent must also hold. In SWRL rules, one may assert equivalences as well as implications. SWRL provides Horn-like rules for both OWL DL and OWL Lite, includes a high-level syntax for representing these rules [1] and is more powerful than either OWL DL or Horn rules alone [8].

Condition #3: Groups should have members of all different nationalities

To classify bad groups in regard to this condition we constructed a rule "SameNationalitiesRule" which states that if a group has any two members with the same nationality it is a BadGroup. Specifically, if group g has member s, and member s has nationality n, and group g has another member x who is different from member s, and member x also has nationality n, then g is a BadGroup. In SWRL notation:

hasMember(?g, ?s) ∧ hasNationality(?s, ?n) ∧
hasMember(?g, ?x) ∧ hasNationality(?x, ?n) ∧
differentFrom(?s, ?x) → BadGroup(?g)
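Read closed-world, the body of SameNationalitiesRule is just a pairwise test over a group's members; a small illustrative Python version (the (name, nationality) pair encoding is hypothetical):

```python
from itertools import combinations

def violates_nationality_condition(members):
    """True if some two distinct members share a nationality,
    i.e. the antecedent of SameNationalitiesRule fires.

    `members` is a list of (name, nationality) pairs; combinations()
    only yields distinct pairs, standing in for differentFrom(?s, ?x).
    """
    return any(nat_s == nat_x
               for (s, nat_s), (x, nat_x) in combinations(members, 2))
```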

Condition #4: Groups should have members from all different institutions

The same approach applies as was used for different nationalities:

hasMember(?g, ?s) ∧ worksAt(?s, ?i) ∧
hasMember(?g, ?x) ∧ worksAt(?x, ?i) ∧
differentFrom(?s, ?x) → BadGroup(?g)

Condition #5: Groups should have fun

To stress-test OWL's abilities to represent and reason with compositions of properties, and situations where multiple properties connected the classes, we devised a complex criterion for a "fun group": a fun group is one where all the students in the group are attracted to the tutor leading their group. We also had the ulterior motive of providing amusement for both tutors and students when the project work was presented on the last day of the summer school. Each student was asked which tutor they were most attracted to: if the student asked for clarification as to what "attracted to" meant, they were told that they may interpret it as they wished. As the school had more male tutors than female (5 as opposed to 2) and more male students than female (35 to 20), then assuming a traditional interpretation of "attraction" there is likely to be a bias in the data, favoring the student body's attraction to the female tutors, at least in terms of raw numbers. However, while the results were a source of much amusement for the participants (the ROVE presentation included poll results for "Most attractive Summer School tutor"), they were inconsequential for the real purpose of the exercise (although perhaps not to the tutors!). We formulated the rule to capture that any group that has a member who is attracted to someone other than the group's tutor is a bad group:

hasMember(?g, ?s) ∧ attractedTo(?s, ?t) ∧ hasTutor(?g, ?x) ∧
differentFrom(?t, ?x) → BadGroup(?g)

Incidentally, our other ulterior motive in formulating all the conditions was to surreptitiously ensure that the ROVE group was the only group that satisfied all conditions, and we are pleased to say that we successfully achieved this.

4.1 OWL: Not Bad does not equal Good

At this point we thought we had captured all our conditions adequately using OWL and SWRL, and would be able to automatically classify the groups as good or bad. Whilst we were able to classify BadGroups, we had no way to do the converse: that is, to automatically classify GoodGroups. Stating that GoodGroups are all groups that are not bad groups will not work: not being classified as a bad group only indicates that the group's status as a good or bad group is unknown. There is no way to specify that the list of criteria is exhaustive and use negation as failure, as OWL's semantics adopt the open world assumption: a statement cannot be assumed true just because its negation cannot be proven [9]. Therefore, in order to classify GoodGroups as such, we needed to reformulate all the prior conditions in a positive form, thereby setting up necessary and sufficient conditions for being a GoodGroup. By positive form, we mean a formulation designed to satisfy the condition, rather than to violate it. For instance, with the nationality condition (Condition #3), the positive form ensures all members of the Group have different nationalities, whereas the negative form simply tests whether any two members have the same nationality. We found it particularly challenging to grasp the asymmetric implications of using the positive and negative forms of rules, as intuitively we were only attempting to view the situation from the opposite side. We felt that a symmetric approach would be easier to use, as could be provided by tools, for example a wizard creating closures over rules automatically. Perhaps such a pattern should be expressible in SWRL itself, in order to increase interoperability.
By formulating positive rules for our conditions, we were able to define GoodGroups as exactly those Groups which satisfied all the rules:

GoodGroup ≡ MixedGenderGroup ⊓ InternationalGroup ⊓ InterInstitutionalGroup ⊓ FunGroup

This is done by creating an OWL axiom using class descriptors defined in SWRL. However, formulating the rules in positive form in SWRL was extremely onerous for most of our conditions.
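Viewed extensionally, the GoodGroup definition is simply a set intersection of the four positively defined classes; a toy Python illustration, with hypothetical group names and class extensions:

```python
# Hypothetical extensions of the four positively defined classes.
mixed_gender        = {"ROVE", "GroupB", "GroupC"}
international       = {"ROVE", "GroupC"}
inter_institutional = {"ROVE", "GroupB"}
fun                 = {"ROVE"}

# GoodGroup = intersection: only groups in all four classes qualify.
good_groups = mixed_gender & international & inter_institutional & fun
```

In this toy data only ROVE survives the intersection, matching the (carefully engineered) outcome the authors report.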

Condition #2: Groups should have at least one member of each gender

Creating a MixedGenderGroup was easy: we were able to take the pure-OWL approach previously described in section 3.3:

MixedGenderGroup ≡ Group ⊓ ∃hasMember.Man ⊓ ∃hasMember.Woman

Condition #3: Groups should have members of all different nationalities

To define this condition positively, we constructed an InternationalGroup rule containing as antecedent a pairwise comparison of the nationalities of every member, to test whether they were all different. There need to be two forms of the rule, for BigGroup and SmallGroup, as 5-member groups require 5 different nationalities, not 4. Since every binary combination has to be considered, 4-member groups require only 6 pairwise comparisons, whilst 5-member groups require 10 (n(n−1)/2 comparisons for n members). To show the complexity of such a rule we have shown it below for a 3-member group only:

SmallGroup(?x)
    [if x is a small group]
∧ hasMember(?x, ?a) ∧ hasMember(?x, ?b) ∧ hasMember(?x, ?c)
    [and group x has members a, b and c: group x has 3 members]
∧ differentFrom(?a, ?b) ∧ differentFrom(?a, ?c) ∧ differentFrom(?b, ?c)
    [and members a, b and c are pairwise different: group x has 3 different members]
∧ hasNationality(?a, ?e) ∧ hasNationality(?b, ?f) ∧ hasNationality(?c, ?g)
    [and each group member has a nationality: no information yet as to whether the nationalities are different or the same]
∧ differentFrom(?e, ?f) ∧ differentFrom(?e, ?g) ∧ differentFrom(?f, ?g)
    [and nationalities e, f and g are pairwise different: each of the 3 group members has a different nationality]
→ GoodGroup(?x)
    [then group x is a good group]

Such a rule is long and unwieldy to write with the SWRL plugin: we lamented the lack of pre-defined predicates in SWRL, as a simple "all-different" predicate to enter the arguments and have SWRL take care of the pairwise comparisons would have been a great aid.

Condition #4: Groups should have members from all different institutions

Analogously to the above, InterInstitutionalGroup is another huge rule, requiring separate forms for 4- and 5-member groups.
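The pre-defined predicate the authors wished for is trivial in a general-purpose language, and the quadratic growth of the hand-written pairwise expansion is easy to quantify (illustrative Python):

```python
from itertools import combinations

def all_different(*values):
    """The 'all-different' built-in missing from SWRL:
    true iff every pair of arguments differs."""
    return all(a != b for a, b in combinations(values, 2))

def pairwise_atoms(n):
    """Number of differentFrom atoms a hand-written SWRL rule
    needs for n values: n(n-1)/2."""
    return n * (n - 1) // 2
```

A 4-member group needs 6 differentFrom atoms per compared property and a 5-member group needs 10, which is why the SmallGroup and BigGroup rule variants blow up so quickly.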

Condition #5 : Groups should have fun

Although the intuition behind the definition of a FunGroup was easy - a group where all its members are attracted to the tutor leading the group - formalizing the rule in positive form again proved to be an extremely tedious task, resulting in a huge and hard to maintain rule.

5 Tool support

The first and most important problem was the lack of an appropriate reasoner that could be plugged into a rule-enriched ontology created with Protégé. Racer supported reasoning within OWL, but using rules in automatic classification obviously requires a reasoning engine which is able to operate on both OWL ontologies and SWRL rules, and none were available for the Windows-based Summer School environment. However, we plan to use such reasoners, like Hoolet [2] or KAON2 [10], and describe the results on the ROVE website.

Further problems related to the ontology engineering environment, where we identified two issues with Protégé and one with SWRL:

Issue #1: entering data proved very tedious when making instances of a superclass belong to an orthogonal subclass partition: in our example, Person was partitioned into Student/Tutor subclasses as well as Man/Woman, and when entering the second superclass in Protégé we could not simply select a number of instances and assign them to a class, but had to edit each instance individually.

Issue #2: the tabs containing "necessary" and "necessary & sufficient" conditions left us wondering why there is no simple "sufficient" tab in Protégé as well, to create subsumption axioms, as described in section 3.

Issue #3: as described in section 4.1, entering SWRL rules to create a closure for a class description with regard to a superclass was very tedious and error-prone. Even in our simple use case, the rules quickly become very large and therefore difficult to edit, let alone maintain and extend. A real-life example is likely to produce rules that are not manageable manually, so creating (and especially maintaining) such closures automatically needs to be made possible.

6 Conclusions

In this simple example of our Summer School mini-project, we showed the difficulties and challenges of implementing some intuitively very simple class constraints. The key learning experience was the limitations of the expressivity of OWL. By going beyond OWL and using SWRL, we were able to formulate several further conditions using rules, but found it challenging to fully understand the implications of the Open World Assumption and the lack of Negation as Failure. Without the input of our tutor, we might easily have fallen into the trap of believing we had captured all our conditions, when in actual fact we had not. Constructing and editing long rules in SWRL to address this was particularly difficult, and we strongly encourage developers to introduce better support for rule construction using templates and built-in predicates. We hope our experience will provide a lesson for Semantic Web students following in our footsteps, and will provide insights for Semantic Web tool developers. The ROVE material is available in different stages at http://www.aifb.uni-karlsruhe.de/WBS/dvr/rove for use in training and tool testing.

Acknowledgements We thank Antoine Zimmermann, who also was a member of the ROVE team, Enrico Motta and Asuncion Gomez-Perez, the organizers of the SSSW2005, and all the students, tutors and speakers. Congratulations to Natasha Fridman-Noy who was the Summer School's 'most attractive tutor'. Yes Asun, now it's time to go to the bus. Research reported in this paper has been partially financed by NICTA (www.nicta.com.au), the EU project SEKT (www.sekt-project.com), the NoE KnowledgeWeb (knowledgeweb.semanticweb.org), and the German Ministry of Research (BMBF) project Knowledge Nets (nbi.inf.fu-berlin.de/research/wissensnetze) of the InterVal (interval.hu-berlin.de) Berlin Research Centre for the Internet Economy (www.internetoekonomie.net).

References

1. R. Aggarwal. Semantic web services languages and technologies: Comparison and discussion. LSDIS Lab, University of Georgia, 2004. citeseer.ist.psu.edu/700682.html.
2. S. Bechhofer. Hoolet. Department of Computer Science, University of Manchester, 2004. http://owl.man.ac.uk/hoolet/.
3. V. Haarslev and R. Möller. Racer: An OWL reasoning agent for the semantic web. In Proc. of the International Workshop on Applications, Products and Services of Web-based Support Systems, in conjunction with the 2003 IEEE/WIC International Conference on Web Intelligence, Halifax, Canada, Oct 13, pages 91-95, 2003.
4. J. Hendler, T. Berners-Lee, and E. Miller. Integrating applications on the semantic web. Journal of the Institute of Electronic Engineers of Japan, pages 676-680, 2002.
5. J. Hladik. Reasoning about nominals with FaCT and RACER. In Proc. of the 2003 International Workshop on Description Logics (DL2003), CEUR-WS, 2003.
6. M. Horridge, H. Knublach, A. Rector, R. Stevens, and C. Wroe. A practical guide to building OWL ontologies using the Protégé-OWL plugin and CO-ODE tools, 2004. University of Manchester.
7. I. Horrocks, P. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. SWRL: A semantic web rule language combining OWL and RuleML. DARPA DAML Program, 2003. http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/.
8. I. Horrocks, P. F. Patel-Schneider, S. Bechhofer, and D. Tsarkov. OWL rules: A proposal and prototype implementation. Conditionally accepted for publication, Journal of Web Semantics, 2005.
9. I. Horrocks, P. F. Patel-Schneider, and F. van Harmelen. From SHIQ and RDF to OWL: The making of a web ontology language. J. of Web Semantics, 1(1):7-26, 2003.
10. U. Hustadt, B. Motik, and U. Sattler. Reducing SHIQ− description logic to disjunctive datalog programs. In D. Dubois, C. Welty, and M.-A. Williams, editors, Proceedings of KR2004, pages 152-162. AAAI Press, 2004.
11. F. Manola and E. Miller. Resource Description Framework (RDF) Primer. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/rdf-primer/.
12. N. Noy, R. Fergerson, and M. Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. In R. Dieng and O. Corby, editors, Proc. of the 12th EKAW, volume 1937 of LNAI, pages 17-32. Springer, 2000.
13. A. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang, and C. Wroe. OWL pizzas: Practical experience of teaching OWL-DL: Common errors & common patterns. In E. Motta, N. R. Shadbolt, and A. Stutt, editors, Proc. of the 14th EKAW, volume 3257 of LNCS, pages 63-81. Springer, 2004.
14. M. K. Smith, C. Welty, and D. McGuinness. OWL Web Ontology Language Guide, 2004. W3C Recommendation, http://www.w3.org/TR/owl-guide/.
15. W3C. OWL Web Ontology Language Reference, 2004. http://www.w3.org/TR/owl-ref/.
16. W3C. RDF/XML Syntax Specification (Revised), 2004. http://www.w3.org/TR/rdf-syntax-grammar/.

4.2 Exploring OWL & Rules

Title of Publication: Exploring OWL and Rules - A Simple Teaching Case
Type of Publication: Journal Article
Accepted for Publication In: International Journal of Teaching and Case Studies (IJTCS), Special Issue: Teaching Semantic Web: Integration with CS/IS curriculum and Teaching Case Studies. Editor: Miltiadis Lytras
Publisher: Inderscience
Peer Reviewed: Yes
Contributing Author(s): Malgorzata Mochol, Anne Cregan, Denny Vrandecic, Sean Bechhofer
Personal Contribution: 30% (estimated by co-authors)

International Journal on Teaching and Case studies, Vol. x, No. x, xxxx 1

Exploring OWL and Rules - A Simple Teaching Case

Malgorzata Mochol 1, Anne Cregan 2,3, Denny Vrandečić 4, and Sean Bechhofer 5

1 Freie Universität Berlin, AG Netzbasierte Informationssysteme, Takustr. 9, 14195 Berlin, Germany, [email protected]
2 University of New South Wales, Sydney, NSW 2052, Australia
3 National ICT Australia (NICTA), 223 Anzac Parade, Kensington NSW 2052, Australia, [email protected]
4 Universität Karlsruhe (TH), Institut AIFB, Englerstrasse 11, 76131 Karlsruhe, Germany, [email protected]
5 University of Manchester, Oxford Road, Manchester M13 9PL, UK, [email protected]

Abstract: This case study explores current Semantic Web technologies through creating a small ontology using an Ontology Editor, and applying Axioms, Reasoning and Rules to automatically classify data in the ontology, according to some intuitively simple pre-set criteria. The study includes designing and building a suitable ontology to represent a simple knowledge domain using an ontology editor, representing Logical Constraints using constructs available within the OWL-DL language, and then illustrating that some of the classification criteria set for the task cannot be represented within OWL and require logical rules to be added using a rule language to achieve the desired functionality. An automated reasoner is used for classification, and the impact of the Open World Assumption on classification results is carefully examined. The case supports hands-on exercises using current Semantic Web Tools and Languages to represent a simple knowledge domain, giving exposition of the underlying logical formalisms.

Keywords: Semantic Web, ontology, OWL, rules, reasoner, SWRL

Reference to this paper should be made as follows: Mochol, M., Cregan, A., Vrandečić, D., and Bechhofer, S. 'Exploring OWL and Rules - A Simple Teaching Case', International Journal on Teaching and Case Studies, Vol. x, No. x, pp.xxx-xxx.

Biographical Notes:
Malgorzata Mochol is a Graduate Research Assistant at the Institute for Computer Science of the Free University Berlin, Group of Networked Information Systems, where she works primarily on ontology matching and the application of Semantic Web technologies in e-business. She earned her degree in Computer Science at the Technical University Berlin in 2003.
Anne Cregan is a PhD Student with the Knowledge Representation and Reasoning Group of the Artificial Intelligence Lab in the School of Computer Science and Engineering at the University of New South Wales, Australia. She is also sponsored by the National Centre of Excellence for Information and Communication Technologies, Australia (NICTA).
Denny Vrandečić is a Researcher at the AIFB, working primarily on collaborative ontology engineering and evaluation. He received his Master in Computer Science and Philosophy at the University of Stuttgart.
Sean Bechhofer is a Lecturer in the School of Computer Science of the University of Manchester. He graduated in Mathematics from the University of Bristol in 1988 and has worked in Manchester since 1992, primarily in the area of tools and infrastructure to support the Semantic Web.

1 SUMMARY OF EDUCATIONAL ASPECTS OF THIS PAPER

This case study provides practical teaching material for introducing undergraduate students to the essential aspects of Semantic Web languages, tools and reasoning. It provides subject matter and resources to give students experience in designing, building and reasoning with ontologies, with emphasis on giving a concrete context for students to explore the interaction between ontologies as defined by the W3C's Web Ontology Language (OWL)a, rules as defined by rule languages such as the Semantic Web Rule Language (SWRL)b, and the impact of their underlying logical formalisms (e.g. OWL's Open World Assumption), in relation to achieving desired outcomes in automated classification using Description Logic Reasoning. Currently available Semantic Web editors, tools and reasoners which may be applied to the task, and the authors' experiences with them, are described and compared. The study is intended to be used in conjunction with the materials provided online at http://km.aifb.uni-karlsruhe.de/projects/rove.

2 INTRODUCTION

This case study provides practical teaching materials for introducing undergraduate students to Semantic Web Languages, Tools and Reasoning. It provides:

• a description of a scenario and task which requires the use of Semantic Web technologies to achieve an automated classification according to set criteria

• a summary of currently available Semantic Web tools which may be used

• a possible solution to the task

• problems likely to be encountered

• links to other resources

• teaching guidelines

All tools, materials and resources referred to are available online through the ROVE website at http://km.aifb.uni-karlsruhe.de/projects/rove.

a http://www.w3.org/TR/owl-features
b http://www.w3.org/Submission/SWRL


2.1 Motivation

The key questions motivating this case study are:

• What are the limits of OWL’s expressivity?

• When does OWL need to be supplemented with rules to provide additional functionality?

• What additional capabilities do rules provide?

It is not intended to answer these questions in a formal way - such answers are readily available - but rather to allow the students to test what can and what cannot be expressed with a certain language with regard to a specific use case, and thus increase their understanding of the limitations of the languages.

2.2 Learning Outcomes

The material provided by this study gives practical hands-on experience in:

• Designing an ontology to represent a simple domain

• Using ontology editors (e.g. Protégé, SWOOP) to build a simple ontology and populate it

• Adding axioms to the ontology to capture logical properties of the domain

• Using OWL-DL and understanding how its constructs represent ontologies and their logical constraints

• Visualising and querying ontologies using tools

• Using automated reasoners (e.g. Racer, FaCT, Pellet) to perform automated classification tasks

• Investigating the limits of the OWL ontology language: finding out what it can and cannot express

• Supplementing ontologies with rules in a rule language such as SWRL to provide additional expressivity

• Investigating the behaviour of Description Logics and automated reasoners

• Understanding the logical implications of OWL's "Open World Assumption"

The case study and questions posed provide an exposé of current Semantic Web technologies and tools, and executing the task will provide students with a tangible illustration of some of the more elusive aspects of Description Logics.


2.3 Using this Teaching Case

Practical exercises based on this case study would work well as a precursor to, or in parallel with, a more technical formal treatment of the Semantic Web field. The case study itself is self-contained and does not require previous knowledge of Semantic Web technologies, although it is advanced enough to provide interest and challenges for those students who may have some previous experience in building and using ontologies. Some previous experience in formulating Necessary & Sufficient conditions would be advisable, although practical exercises based on this case study may be adapted to suit the level of the audience, from a step-by-step tutorial with tools provided, to a research-oriented task which simply sets out the required classification criteria and encourages students to find and explore the various tools which are available to achieve the desired results.

2.4 Background

The material contained in this case study originated as a mini-project within the 3rd European Summer School on Ontological Engineering and the Semantic Web (SSSW05), held in Spain in July 2005. The mini-project, named "ROVE" (Rules for Ontology Validation Effort), was initiated by a group of post-graduate students to explore the limits of the functionality of the Web Ontology Language (OWL), and the additional functionality which could be achieved by adding rules to OWL ontologies. The authors of this case study are three of the four post-graduate students of the "ROVE" group (Cregan, Mochol and Vrandečić), all PhD research students at their respective institutions, and the group's tutor (Bechhofer), who has played a key role in authoring Semantic Web languages and tools for some time. The task was originally conducted using tools readily available at the time: Protégé with the OWL-DL and SWRL plug-ins, used in conjunction with RACER for automated reasoning. A thorough description of the insights gained is given in [4].

2.5 Potential Variations and Local Adaptations

Instance Population and Conditions: The ontology presented captures information about people and groupings of people, and the task set is to automatically classify each group of people according to whether it meets certain intuitively simple criteria imposed on group membership and structure. The knowledge domain is therefore readily understood, and the task easily grasped, so that students can concentrate fully on exploring the representation itself. The case was originally constructed with instance data provided by the students and tutors at the summer school attended by the authors, and pre-existing populated ontologies (partially anonymized for public availability), in the various stages corresponding to the amendments described in the paper, are available at the ROVE website http://km.aifb.uni-karlsruhe.de/projects/rove. However, for best results the students should be allowed to build their own ontologies from scratch, to give an appreciation of the design decisions involved. The case study can easily be modified for local use by populating it with details of the student group being trained, which is probably more interesting for the students and connects them better to the given tasks. In this case, the criteria set on group

Exploring OWL and Rules - A Simple Teaching Case

membership will usually need to be adapted to suit the students (e.g. using course enrolment instead of nationality). Creativity is of course encouraged: there are any number of possible conditions for testing which would be both instructive and potentially entertaining.

Tools: The case study refers to currently available tools and languages (OWL, SWRL, Protégé, etc.), but as this is a developing field with tools and languages rapidly evolving, the exploration of new tools and standards as they become available is encouraged. The described task and variants on it could readily be conducted with any available ontology editor using some ontology language, rules, and an automated reasoner. In the original task, the authors experienced numerous technical problems using the available tools (further described in Section 6), and we believe this case study provides a useful base case for tool testing [c].

2.6 Organization of the Case Study

The case study is organized as follows: Section 3 describes the task set for the exercise, including the background scenario, the conditions for testing, and some key considerations in approaching the task. Section 4 contains a brief description of some available tools (editors and reasoners) which may be used to conduct the task. This is followed by the specification of a possible solution (Section 5), which shows step by step how to satisfy the requirements of the task. The steps include building an ontology to cover as many of the stated conditions as possible within OWL, showing that it is not possible to capture all the requirements within the ontology itself, and then adding rules to capture the remaining conditions. The classification results highlight the impact of OWL's Open World Assumption, and a clear exposition of all the rules needed to complete the task gives a good insight into reasoning with Description Logics. Section 6 describes some practical problems likely to be encountered in using the tools and completing the task, and provides some suggestions for dealing with these. Throughout the case study, reference is made to various resources including reasoners, editors and languages, and Section 7 provides a list of these, sorted by category, with download links and documentation for each resource. The teaching guidelines are presented in Section 8, followed by a brief conclusion closing the case study in Section 9.

3 DESCRIPTION OF THE TASK

3.1 Scenario

The 3rd European Summer School on Ontological Engineering and the Semantic Web (SSSW05) was held in Spain for a week in July 2005. The student group at SSSW05 was made up of some 60 post-graduate students, both male and female, having many different nationalities and originating from many different educational institutions. At the outset, all participating students at SSSW05 were asked to form themselves into groups of four or five students to conduct a mini-project of their own choice. The summer school organizers asked the students to form groups as diverse as possible, in terms of having mixtures of

[c] Check the ROVE website http://km.aifb.uni-karlsruhe.de/projects/rove for links to tools and other materials.


nationalities, institutions and genders in each group. This was to provide a collaborative experience contrasting with usual work and study activities. The organizers also impressed on the students that having fun was a very important part of the activity (no doubt as it enhances learning!).

3.2 Goals & Requirements

The authors, joined by Antoine Zimmerman, formed a project group with the intention of investigating the use of rules in conjunction with OWL. Sean Bechhofer, owing to his expertise in the area, was invited to be the group's tutor. Reflecting its chosen area of research, the mini-project group was named "ROVE" (Rules for Ontology Validation Effort). For the ROVE project, we chose a simple and readily accessible domain for applying ontology representation and rules: the student groups taking part in the summer school mini-projects. In order to discover the limits of OWL's abilities, and the capabilities provided by adding rules, we attempted to formally define and implement the informally stated conditions the organizers had placed on group formation.

We decided that the following conditions reflected desirable group formations:

Condition #1: Every group should have either 4 or 5 members.
Condition #2: Every group should have at least one member of each gender.
Condition #3: Members of a group should all be of different nationalities.
Condition #4: Members of a group should all be from different institutions.
Condition #5: Groups should have fun. We decided it would be fun if the tutor of the group were the favourite of all the students in the group, so we asked all students to nominate which tutor was most attractive to them.

Our goal was to formalize these conditions and use a classifier to automatically categorize all groups as either a GoodGroup - one that fulfills all the stated conditions - or a BadGroup - one which does not satisfy one or more of the stated conditions. Stated logically:

Group is a GoodGroup iff Cond1 ∧ Cond2 ∧ Cond3 ∧ Cond4 ∧ Cond5

Group is a BadGroup iff ¬Cond1 ∨¬Cond2 ∨¬Cond3 ∨¬Cond4 ∨¬Cond5
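To make the classification target concrete before diving into OWL, here is a minimal closed-world sketch in plain Python. The dictionary-based data model and field names are illustrative assumptions, not part of the ROVE ontology, and, as Section 5.3 shows, an OWL reasoner does not behave this way: it cannot conclude BadGroup merely from the failure of GoodGroup.

```python
# Closed-world sketch of the intended classification (hypothetical data model).
# Each member is a dict with gender, nationality, institution and favourite_tutor.
def is_good_group(members, tutor):
    genders = {m["gender"] for m in members}
    nationalities = [m["nationality"] for m in members]
    institutions = [m["institution"] for m in members]
    return (
        4 <= len(members) <= 5                                    # Condition #1
        and {"male", "female"} <= genders                         # Condition #2
        and len(set(nationalities)) == len(members)               # Condition #3
        and len(set(institutions)) == len(members)                # Condition #4
        and all(m["favourite_tutor"] == tutor for m in members)   # Condition #5
    )
```

In this sketch a BadGroup is simply any group for which `is_good_group` returns False - a negation-as-failure step which, as discussed later, OWL itself does not license.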

3.3 Ontology Design and Construction

In building an ontology there are potentially many design decisions to be made to ensure the ontology will best suit the stated purpose. There is not necessarily a "right" and "wrong" way to do it, but usually some designs will provide the desired functionality more readily than others. The fundamental ontology constructs provided by OWL are classes, instances and properties, where: • classes contain instances, e.g. the class Person contains the specific individual Mary.


• classes may have subclasses which inherit their characteristics, e.g. Person can be subclassed as Male or Female, where any individual male or female belongs to the class Person as well as to the appropriate gender subclass;

• classes may themselves be subclasses of superclasses, e.g. the class Person could belong to a superclass Mammal;

• instances may have properties which connect them to specific values (DataProperties) or individuals (ObjectProperties), e.g. a person may have a specific age (data) and may stand in a specific relationship to other individuals, e.g. a person is the child of another person;

• property relationships may have logical conditions imposed on them, e.g. the property "child of" from the previous example has an assigned domain (the class Person) and range (also the class Person), restricting the classes of individuals related by the property. Other logical axioms such as cardinality and transitivity may be imposed, and subproperties may also be defined.

We began by constructing a simple ontology to contain data about the students, tutors and project groups. Designing an ontology commences with identifying the key domain entities to be modelled, and selecting the most appropriate representations for these within the modelling language (classes, properties, instances, etc.). For instance, in the domain being modelled, a person's gender could potentially be represented as subclasses of the class Person, or as a data property mapping each instance of Person to a value of Male or Female, or as an object property, should the designer choose to model gender itself as a class having instances Male and Female. Such design choices set the logical foundations for ontology construction, so it is important to choose wisely at the outset. In designing our ontology, we determined firstly that we should build a class "Person" whose instances would correspond to individual people.
As our desired classifications depended on characteristics of individual people within groups, we needed to capture each person's gender, nationality, educational institution, membership of a mini-project group, status as a student or tutor, and, if a student, which tutor was their favourite. In determining the best representation, we were guided by both the classification goals and the logical properties of the domain to be captured:

• All participants in the mini-project were individual persons
• All participants were either tutors or students
• No-one was both a student and a tutor
• All participants were either male or female
• No-one was both male and female
• Every student belonged to exactly one group
• Each group had a unique name
• Each group was led by exactly one summer school tutor
• Every tutor led at least one group, and some led more than one group

Some possible representations which adequately capture these conditions are suggested in Section 5, following a consideration of some of the Semantic Web tools available for approaching the task.


4 AVAILABLE TOOLS

One major advantage of Semantic Web technologies is their standardization on interchange formats and thus their interoperability. In principle it should not matter which tools are used for editing the ontologies and later for reasoning with them, as long as they use an OWL representation. This advantage (and its practical limitations) can be demonstrated in class by allowing the students to choose tools of their own. In this section we discuss some available and popular tools that can be used. We are aware that several further tools exist but cannot provide an exhaustive list here; furthermore, as this is a rapidly evolving field, additional tools are regularly becoming available. Download links and references to additional resources regarding these tools are given in Section 7. All the tools briefly described here are in active development at the time of writing, and it is recommended to use a recent version of each.

4.1 Ontology Editors

Protégé is currently the best known ontology development environment and was used in the original development of the ROVE ontology. It has been in active development by the SMI (Stanford Medical Informatics) group at Stanford University since the early 1990s. A number of Semantic Web plug-ins have been developed specifically for use with Protégé. In particular, an OWL plug-in available for use with Protégé produces ontologies in the OWL language. This plug-in was developed largely by the CO-ODE project in Manchester (http://www.co-ode.org). The SWRL plug-in for Protégé supports the use of rules. The OWL and SWRL plug-ins provide enough expressivity to state all required axioms. The user interface of Protégé, although intuitive, is not based directly on OWL, so teaching OWL with Protégé requires consideration of how Protégé translates into the OWL language and, conversely, how constructs within OWL are represented in Protégé. For example, the notion of "Necessary & Sufficient conditions" in the Protégé interface is translated to semantically equivalent OWL subsumption axioms, which may not appear obvious at first glance. Protégé itself does not include a reasoner but may be used with any external reasoner which supports the DIG [2] (Description Logic Implementation Group) interface. Protégé has an extensive range of plug-ins for visualization and querying of ontologies, and exploration of these is encouraged.

SWOOP is an ontology editor based on a browser-inspired interface, developed by the MINDSWAP group of the University of Maryland. By default, SWOOP ships with the Pellet reasoner, and is integrated more tightly with it than is possible via the DIG interface. It provides useful features like explanations (cf. Fig. 1), which are especially helpful for learners. At the time of writing, the last stable release of SWOOP (Version 2.2) is more than a year old and does not offer support for rules. However, SWOOP is in active development and far more advanced releases are available, though users are warned that these may be unstable.

Both editors are freely available for download.


Figure 1: SWOOP editor

4.2 Reasoners

It is currently standard practice for ontology editors to use the DIG interface to enable connection to a reasoner of the user's choice. The following reasoners all offer a DIG interface, and can thus be combined with the editors as desired. However, let us first note a regrettable limitation of the current DIG interface [2]: it does not specify how to exchange and reason over rules. Thus even though both the editor and the reasoner being used may provide support for rules, they cannot be used together seamlessly via the editor's interface. This problem is addressed in a proposal to extend the DIG interface appropriately [3], but for now other solutions must be applied, such as file-based ontology exchange.

RACER, also known as RacerPro, is a commercial reasoner developed by Racer Systems. It is currently available free for educational and scientific purposes, and requires registration. Registration may take a while, so make sure to have the licenses available before class. Although RacerPro provides almost complete support for OWL-DL, it still has some limitations: (i) individuals in class expressions (so-called nominals) are only approximated (although at this time RacerPro is the only optimized OWL system that supports reasoning about individuals); (ii) it cannot currently process user-defined datatypes given as external XML Schema specifications (although all the required datatypes of OWL-DL are properly supported); and (iii) RacerPro 1.9 does not employ the Unique Name Assumption by default, in line with OWL-DL semantics (under UNA it would not be possible to state that two different URIs denote the same individual); however, UNA can be enabled globally to maximise efficiency. None of these limitations are issues for the exercises of this case study. Version 1.9 of RacerPro offers support and an integrated development environment


for reasoning over ontologies enhanced with SWRL rules.

KAON2 is a reasoner developed jointly by FZI, the University of Karlsruhe, and the University of Manchester. It is freely available for scientific and academic purposes. It implements the OWL-DL standard without nominals (which are not needed within the tasks described here). It features SWRL reasoning with so-called DL-safe rules, which is sufficient for the given tasks.

Pellet is developed by the MINDSWAP group of the University of Maryland. It accompanies the SWOOP editor and, although tightly integrated into it, can also be used as a standalone reasoner. It implements the full OWL-DL standard and also the DL-safe set of SWRL rules. Although the Pellet reasoner comes with a warning that it is not optimized for speed, the ROVE ontology does seem to be small enough to support interactive use with reasonable response times.

Hoolet is an implementation of an OWL-DL reasoner, developed by the University of Manchester, that uses a first-order prover. It consists of a graphical front end that allows loading of ontologies and rule sets, along with a reasoner. The prototype provides a useful tool, but only for small examples, and is for Linux only.

Jess is a rule engine. [5] describes how Jess and a DIG-enabled reasoner can be used together in order to reason over rule-enhanced OWL-DL. Thus the shortcoming of the current DIG interface can be avoided.

5 POSSIBLE SOLUTION

After the short introduction, the brief description of some editors and reasoners, and the description of the requirements and outcomes of the exercise, it is time to present a possible answer to the given problem. Before searching for a solution, however, the task should be split into small subproblems which allow the students to better understand the approach, its goals and the final solution. In particular, the students should recognize the limitations of OWL's abilities and the capabilities provided by adding rules. To this end, in the first step the students build an ontology which describes the grouping problem in as much detail as possible. In the second step, the open questions which could not be covered by the ontology are defined using rules.

5.1 Building an Ontology

The ontology requires the representation of each student and tutor taking part in the mini-project activity, the groups themselves, and attributes including: (i) for persons: nationality, associated institution, and gender; (ii) for groups: group membership (of students) and group leadership (of tutors). Some of these terms can be defined as disjoint concepts within an ontology describing the summer school domain, taking into account the conditions described in Section 3: Person, Country, Institution, and Group (see Fig. 2). Furthermore, Person is divided, on the one hand, into the disjoint subclasses Tutor and Student, partitioning the class completely, and on the other hand, by gender, into the disjoint subclasses Man and Woman, also as a complete partition. To describe a person in the context of the summer school, the person's name, nationality, and the institution at which the person works need to be defined by


Figure 2: ROVE ontology

building the (object and data) properties hasName, hasNationality and worksAt, respectively. Furthermore, each student is a member of exactly one group (memberOf) and has exactly one favourite tutor (attractedTo). Additionally, the summer school project groups have names (hasLongName, hasShortName), are led by a tutor (ledBy) and have members (hasMember) only from the class Student. To find the limitations of OWL, the students should first try to formalise and implement as many requirements (cf. Section 3) as possible using the developed ontology, staying within the framework of OWL-DL.

Condition #1: Groups should have either 4 or 5 members

This condition can be easily implemented by setting minimum and maximum cardinalities on the property hasMember, which relates Groups to Students:

Group ⊑ ≥4 hasMember ⊓ ≤5 hasMember

With the ROVE data, all the groups satisfy this condition and cardinality conditions are easily formulated within OWL.

Condition #2: Groups should have at least one member of each gender

As seen above, to satisfy Condition #1 the relationship between a Group and each Student may be represented with the property hasMember (inverse to memberOf) and cardinality restrictions imposed on it. This method cannot be applied to Condition #2, since it would require hasMember to have a minimum cardinality for values from each of the subclasses Man and Woman (so-called qualified cardinality restrictions, which are not available in OWL-DL, but are included in the proposal for OWL 1.1 [6]). To cope with this problem one can use existential restrictions to accomplish the task, by defining GoodGroup as a subclass of Group where hasMember has (owl:)someValuesFrom Man and


(owl:)someValuesFrom Woman. However, it must be captured that this condition is only one of those to be satisfied by a good group (cf. Sec. 3.2), so it must not be a sufficient condition for a good group. To face this issue, one can approach the problem in reverse, by specifying conditions for being a bad group (BadGroup) rather than a good group [d]. In order to state a sufficient condition [e] for a class BadGroup, a class ManGroup, which is a group with male members only, must be introduced:

ManGroup ≡ Group ⊓ ∀hasMember.Man

The same applies for WomanGroup:

WomanGroup ≡ Group ⊓ ∀hasMember.Woman

The class BadGroup is defined as a subclass of Group. It is now simple to implement these two sufficient conditions for being a "bad group" by making WomanGroup and ManGroup subclasses of BadGroup. This means that every ManGroup and every WomanGroup is also necessarily a BadGroup, thus rendering the necessary & sufficient conditions of the subclasses sufficient conditions of the superclass.

WomanGroup ⊑ BadGroup
ManGroup ⊑ BadGroup

However, having set this up, with the ROVE data the automatic classification will not classify one of the groups as a BadGroup, although it consists of four male students. This is due to OWL's Open World Assumption: the group, having four male members, could still potentially have a fifth member who may be female, without breaking the cardinality restriction, so the reasoner cannot classify the group as a BadGroup. There are several ways to handle this problem: one could define groups by enumeration (but note the restrictions many reasoners have regarding nominals [7]), or the size of the groups could be stated explicitly by creating two new concepts: BigGroup (a group with exactly 5 members) and SmallGroup (a group with exactly 4 members) as disjoint subclasses of Group (other conditions will be asserted on these concepts):

BigGroup ≡ Group ⊓ ≥5 hasMember ⊓ ≤5 hasMember
SmallGroup ≡ Group ⊓ ≥4 hasMember ⊓ ≤4 hasMember
Group ≡ SmallGroup ⊔ BigGroup
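The contrast between the two readings can be sketched in plain Python (the data model is illustrative, and the second function is only a rough analogue of reasoner behaviour, not an implementation of OWL semantics): a closed-world check happily classifies four known male members as a ManGroup, whereas the open-world analogue withholds the conclusion until the group's size is pinned down, e.g. via the disjoint SmallGroup/BigGroup classes above.

```python
# Closed-world check: "every asserted member is male". This quantifies only
# over the members we happen to know about -- precisely the inference an OWL
# reasoner refuses to make under the Open World Assumption.
def is_man_group_closed_world(members):
    return all(m["gender"] == "male" for m in members)

# Rough analogue of the OWL-style fix: the universal conclusion is only safe
# once the group's size has been closed off (e.g. asserting it is a SmallGroup
# with exactly 4 members), so no unseen extra member can exist.
def is_man_group_owl_style(members, asserted_exact_size):
    return len(members) == asserted_exact_size and is_man_group_closed_world(members)
```

With four asserted male members, the closed-world check succeeds immediately, while the open-world analogue succeeds only when the asserted group size is exactly four.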

Conditions #3-#5

These conditions require consideration of more than one property at a time, for example both a student's nationality and group membership. Whilst OWL has a relatively rich set of class constructors, its expressivity for properties is much weaker. Although OWL permits chaining of properties, it does not support making assertions about the equality of the objects at the end of two different properties or property chains. Since Conditions #3-#5 require precisely this kind of assertion, it is not possible to formulate them using OWL alone. Of the five requirements defined in Section 3, only two can be implemented using OWL. Here we summarize the experiences the students should have by now:

[d] Note that not satisfying any one of the five conditions is sufficient for classification as a bad group.
[e] Note that Protégé supports explicit specification of either "Necessary" or "Necessary & Sufficient" asserted conditions through the interface, but not conditions which are "Sufficient" without being "Necessary".


• the requirement on each group to have 4 or 5 members can be expressed using cardinality on the hasMember property for groups

• sufficient conditions for a BadGroup can be stated by defining the additional concepts WomanGroup and ManGroup

• requirements #3-#5 are not expressible in OWL, as they require comparing the objects at the end of property chains.

5.2 Defining Rules

As not all the requirements can be implemented using OWL alone, in the next step the students investigate solutions using rules. For this purpose, the use of the Semantic Web Rule Language (SWRL) [9] (available via a plug-in to Protégé, or natively in SWOOP) within the ROVE ontology described above is explored. Rules are constructed in the form of an implication between an antecedent (body) and a consequent (head): whenever the conditions specified in the antecedent hold, the conditions specified in the consequent must also hold. SWRL provides Horn-like rules for both OWL-DL and OWL Lite, includes a high-level syntax for representing these rules [1], and is more powerful than either OWL-DL or Horn rules alone [8]. In SWRL one may assert equivalences as well as implications.

The first step towards applying SWRL to the group scenario is the definition of Conditions #3 and #4, which reflect that mini-project groups should have members of all different nationalities and from all different institutions, i.e. no two members of a group should have the same nationality, or be from the same institution.

Condition #3 - groups should have members of all different nationalities

At this point it would be intuitive to come up with a definition of a group in which all members are of different nationalities - InternationalGroup (which, in turn, would be another necessary condition for a GoodGroup). Nevertheless, correctly defining such a group is a challenging task, since one has to define different rules for different sizes of groups: one rule for groups with four members, and another for those with five. This results in a high number of necessary comparisons between the nationalities of the group members. Since every pairwise combination has to be considered, the rule for 4-member groups takes into account 6 different comparisons, whilst that for 5-member groups needs 10 (in general n(n−1)/2 comparisons for n members, thus growing quadratically). To overcome this problem, instead of detecting good groups, the students should construct a rule which detects bad groups. That is, a rule is needed which states that if a group has any two members with the same nationality, it is a BadGroup ("SameNationalitiesRule"):

• Natural language: if the group has two members with the same nationality it is a BadGroup.

• Explanation of the SWRL notation in natural language: If group g has member m1, and member m1 has nationality n, and group g has another member m2 who is different from member m1, and member m2 also has nationality n (i.e. members m1 and m2 have the same nationality n), then group g is a BadGroup.


• SWRL notation:

hasMember(?g,?m1) ∧ hasNationality(?m1, ?n)∧ hasMember(?g,?m2) ∧ hasNationality(?m2, ?n)∧ differentFrom(?m1, ?m2) → BadGroup(?g)
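The rule translates almost directly into a pairwise check; the following Python sketch (hypothetical data model, standing in for the reasoner) also makes the quadratic growth of the positive formulation explicit.

```python
from itertools import combinations

# "SameNationalitiesRule" as a pairwise check: any two distinct members
# sharing a nationality make the group a BadGroup.
def violates_nationality_condition(members):
    return any(m1["nationality"] == m2["nationality"]
               for m1, m2 in combinations(members, 2))

# Number of pairwise comparisons the positive rule would have to spell out:
# n(n-1)/2 for n members (6 for four members, 10 for five).
def comparisons_needed(n):
    return n * (n - 1) // 2
```

The same pattern, swapping the nationality field for the institution field, covers Condition #4.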

Condition #4 - groups should have members from all different institutions, that is, no two members of the group should be from the same institution

The same approach applies to different institutions as was used for different nationalities:

hasMember(?g,?m1) ∧ worksAt(?m1, ?n)∧ hasMember(?g,?m2) ∧ worksAt(?m2, ?n)∧ differentFrom(?m1, ?m2) → BadGroup(?g)

Condition #5: Groups should have fun

The criterion for a "fun group" was chosen in order to learn more about OWL's and SWRL's abilities to represent and reason with compositions of properties, and with situations where multiple properties connect the classes. The definition is: a fun group is one where all the students in the group are attracted to the tutor leading their group. A further motive here was to provide some amusement for the students. The following rule provides a formalization of the criterion:

hasMember(?g,?m) ∧ attractedTo(?m,?t1) ∧ ledBy(?g,?t2) ∧ differentFrom(?t1,?t2) → BadGroup(?g)

which captures that any group that has a member who is attracted to someone other than the group's tutor is a bad group.
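In procedural terms the rule compares the ends of two property chains: group → member → favourite tutor on one side, and group → leading tutor on the other. A minimal Python sketch (hypothetical data model) of the same test:

```python
# Condition #5 as a comparison of two property chains (illustrative data
# model): follow group -> member -> favourite_tutor and compare with the
# tutor actually leading the group; any mismatch flags the group as bad.
def violates_fun_condition(members, group_tutor):
    return any(m["favourite_tutor"] != group_tutor for m in members)
```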

5.3 OWL: Not Bad does not equal Good

At this point the students may think they have captured all the requirements adequately using OWL and rules, as they are now able to automatically classify all bad groups. However, it is not yet possible to classify good groups! Stating that good groups are all groups that are not bad groups relies on "negation as failure", which is not supported by OWL. Thus whilst automatic classification may be used to identify bad groups, it is not able to identify good groups simply on the grounds of their not being bad. Not being classified as a bad group only indicates that the group's status as a good or bad group is unknown: there is no way to specify that the list of criteria for bad groups is exhaustive. In order to classify GoodGroups as such, one needs to reformulate all the prior requirements as "positive rules", defining sufficient conditions for classification as a GoodGroup. The positive form of a rule often becomes long and unwieldy, as discussed for InternationalGroup. The same applies for InterInstitutionalGroup [f], giving another huge rule with more than 20 conjunctions in its body. A rule to define a FunGroup is also needed. Although the intuition behind the definition of a FunGroup is easy - a group whose members are all attracted to the very tutor leading the group -

[f] InterInstitutionalGroup - a group where all members are from different institutions

formalizing the rule is again an extremely tedious task, again leading to a huge and hard-to-maintain rule. Many students will find it particularly challenging to fully understand the implications of the asymmetric structure of the positive and negative forms of the rules, as intuitively they are only attempting to view the situation from the opposite side. On the other hand, a MixedGenderGroup is much more easily described (in fact, this is possible in pure OWL again):

MixedGenderGroup ≡ Group ⊓ ∃hasMember.Man ⊓ ∃hasMember.Woman

Finally, having formalized the requirements as rules and being able to classify whether a given group passes each rule, a GoodGroup can be defined as the intersection of all groups meeting each single requirement. Here an OWL axiom can be created, using class descriptors actually defined in the SWRL part of the ontology.

GoodGroup ≡ MixedGenderGroup ⊓ InternationalGroup ⊓ InterInstitutionalGroup ⊓ FunGroup
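Continuing the plain-Python analogy (hypothetical data model, closed-world by construction), the final axiom corresponds to intersecting the per-condition sets of groups:

```python
# Final classification as a set intersection, mirroring
# GoodGroup = MixedGenderGroup ∩ InternationalGroup
#             ∩ InterInstitutionalGroup ∩ FunGroup.
# `groups` maps a group name to (members, tutor); the data model is illustrative.
def good_groups(groups):
    def all_distinct(members, key):
        return len({m[key] for m in members}) == len(members)

    mixed = {name for name, (ms, _) in groups.items()
             if {"male", "female"} <= {m["gender"] for m in ms}}
    international = {name for name, (ms, _) in groups.items()
                     if all_distinct(ms, "nationality")}
    inter_institutional = {name for name, (ms, _) in groups.items()
                           if all_distinct(ms, "institution")}
    fun = {name for name, (ms, tutor) in groups.items()
           if all(m["favourite_tutor"] == tutor for m in ms)}
    return mixed & international & inter_institutional & fun
```

Note that the intersection is only adequate because each per-condition set was computed positively, which is exactly why the unwieldy positive rules above are needed on the OWL/SWRL side.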

6 PROBLEMS

In Section 4 some technical problems that may be encountered with the current tools have already been mentioned. Besides such problems (which will make the students aware of the current state of the art in Semantic Web technology), they may also run into issues such as performance problems with the reasoners. This allows the teacher to explain why some constructs, or combinations of constructs, show undesirable properties for reasoning, and how they can be avoided. Beyond these practical problems, this section points out an issue students have faced in understanding the conceptualization and semantics of the ROVE ontology, and in trying to achieve the goals outlined in Section 3. The main problem for most students seems to be understanding the Open World Assumption and its consequences. This is easily demonstrated while solving Condition #2 above: although the student has already defined that a ManGroup is a group where all members are male, there are still groups with only male members which are not classified as a ManGroup. This happens because, for a group with 4 members and no explicit specification that the group does not have a fifth member, the reasoner assumes that a hypothetical fifth member could be female. This result is usually a surprise to the students, who by this point are likely to believe they have adequately covered all the stated conditions. In our experience, the struggle to understand the needed formalization and to capture the sufficient conditions for GoodGroup turns out to be a great opportunity for focussed discussions and real insights into how Description Logics work. As we experienced ourselves, the tutor can easily help with a few well-placed hints, and still allow the students to come to most of the solution themselves, thus deepening the learning experience.

7 RESOURCES FOR TEACHERS AND STUDENTS

This section offers a list of links with further resources regarding the topics covered in this paper. Instead of typing these links by hand, you can also go to the ROVE homepage at http://km.aifb.uni-karlsruhe.de/projects/rove and find an up-to-date electronic list of the given resources.

7.1 Reasoners

Hoolet - implementation of an OWL-DL reasoner that uses a first order prover

• download: http://owl.man.ac.uk/hoolet/

RACER/RacerPro - the first OWL Reasoner

• documentation: http://www.sts.tu-harburg.de/~r.f.moeller/racer/
• download: http://www.racer-systems.com/de/index.phtml?lang

KAON 2 - an OWL reasoner

• download: http://kaon2.semanticweb.org/
• publications: http://www.aifb.uni-karlsruhe.de/Publikationen/showPublikationenProjekt?id_db=62
• command line tools to work with OWL ontologies: http://owltools.ontoware.org

Pellet - an open-source Java based OWL-DL reasoner

• documentation: http://www.mindswap.org/2003/pellet/
• download: http://www.mindswap.org/2003/pellet/download.shtml

FaCT++ - OWL-DL reasoner

• download: http://owl.man.ac.uk/factplusplus/

7.2 Ontology Editors

Protégé - an ontology and knowledge-base editor

• documentation: http://protege.stanford.edu/
• download: http://protege.stanford.edu/plugins/owl/index.html
• tutorial: http://www.co-ode.org/resources/tutorials/ProtegeOWLT pdf

Swoop - a Hypermedia-based Featherweight OWL Ontology Editor

• download: http://www.mindswap.org/2004/SWOOP/

Exploring OWL and Rules - A Simple Teaching Case

7.3 Language specifications

OWL - Web Ontology Language

• documentation: http://www.w3.org/TR/owl-features/
• http://www.w3schools.com/rdf/rdf_owl.asp
• http://www.cs.vu.nl/~frankh/postscript/OntoHandbook03OWL.pdf
• experiences with teaching OWL [10]

SWRL - Semantic Web Rules Language

• current proposal: http://www.w3.org/Submission/SWRL/
• XML cover pages on SWRL: http://xml.coverpages.org/ni2004-05-21- html

8 TEACHING GUIDELINES

8.1 Teaching Items

The case study can be used as supporting material in undergraduate teaching of the basics of the Semantic Web field. It highlights a practical situation involving the interaction between an ontology defined in the Web Ontology Language (OWL) and rules, as well as the issue of the Open World Assumption. Since it is an easy-to-understand scenario with clearly defined requirements, the teaching case can be used with students who are not yet familiar with Semantic Web technologies (although some basic knowledge and experience in building ontologies is very helpful) as well as with students who have some previous knowledge in this field. Depending on the level of knowledge, we recommend a different number and duration of lesson items:

Students with no experience - As the students have no previous knowledge about building ontologies, a breakdown of the case into two subproblems is recommended:

1. In the first teaching item the students should concentrate on building an ontology which satisfies the first two requirements (Conditions #1 & #2). They should learn about concepts, data and object properties, individuals, etc., and work with ontology editors like Protégé.

2. In the second item, since the students have already built and worked with an ontology, the notion of rules and their development should be introduced. At this point the students can start to analyze the remaining conditions (Conditions #3-#5) and try to formalize them, at first using only OWL-DL and then the rules (in particular SWRL). The main issue within this item, apart from the rules, is the Open World Assumption.

Students with some experience - If the case study is used with students who already have some basic (general) knowledge about ontologies, or even some experience in developing their own ontologies, they can analyze all the given requirements from the beginning. This means the two abovementioned teaching items can be integrated into one.

After one or two teaching items, respectively, the students should be able to work with OWL ontologies, define rules (in SWRL), and explain the problem of the Open World Assumption. If time allows, it is recommended to let the students formalize their own class instead of taking the summer school ontology as the ground data. It is usually more fun to work on data the students can relate to. The fastest way to approach this is to allow each student, or group of students, to formalize their own data, and then merge the data.

8.2 System Requirements

Most of the tools are written in Java and can thus run on several different operating systems. It is suggested to test the chosen applications with the local installation of Java, as some of the suggested tools require Java 5. Currently available Semantic Web tools have usually been developed within research institutions and have not been optimized with regard to performance. Reasoners especially may make heavy demands on both memory and processor time. Also, some advanced features of the ontology development environments, like the explanation module shown in Figure 1, may require considerable resources. For these reasons, the system requirements and response times will depend heavily on the chosen tools and the size of the final ontology. Our experience with these exercises has shown that most commonly available hardware can deal adequately with ontologies of the size used in the examples presented here (i.e. less than 100 instances and only a small number of classes and properties). However, it is recommended that the teacher run some preliminary tests using the local hardware and chosen tools prior to running tutorials, to ensure that the tools will work in the local environment with reasonable response times.

9 CONCLUSION

This paper gave a brief overview of a Semantic Web teaching case presenting, by means of an easy-to-understand example, some limitations of OWL-DL and first steps in using SWRL rules. To facilitate practical exercises, some currently available ontology editors and additional Semantic Web tools, which may be used to construct and work with ontologies and rules to complete the described task, were described. The experiences with ontology development and the usage of different tools enabled the authors to highlight some commonly-experienced problems and limitations of the available technologies. In the context of the simple summer school scenario, the difficult issue of the “Open World Assumption” was presented in a way which should be readily accessible to students and may encourage their interest in the underlying formalism. In conclusion, the authors hope that this teaching case will serve to assist “young players” and their guides in the practical navigation of Semantic Web technologies whilst avoiding some common traps and pitfalls.


Acknowledgements

We would like to thank Antoine Zimmermann, who was a member of the original ROVE mini-project team at SSSW2005; the organizers of the SSSW2005: Enrico Motta and Asunción Gómez-Pérez; and all the students, tutors and speakers who took part. Research reported in this paper has been partially financed by NICTA (http://www.nicta.com.au), the EU project SEKT (http://www.sekt-project.com), the NoE KnowledgeWeb (http://knowledgeweb.semanticweb.org), and the German Ministry of Research (BMBF) project Knowledge Nets (http://wissensnetze.ag-nbi.de), which is part of the InterVal (http://interval.hu-berlin.de) Berlin Research Centre for the Internet Economy (http://www.internetoekonomie.net).

References and Notes

1 R. Aggarwal. Semantic Web Services Languages and Technologies: Comparison and Discussion. LSDIS Lab, University of Georgia, 2004. http://citeseer.ist.psu.edu/700682.html.
2 S. Bechhofer, I. Horrocks, P. F. Patel-Schneider, and S. Tessaris. A proposal for a Description Logic interface. In Proc. of the Description Logic Workshop (DL'99), pages 33–36, 1999. CEUR Workshop Proceedings, http://ceur-ws.org/Vol-22/.
3 S. Bechhofer, T. Liebig, M. Luther, O. Noppens, P. F. Patel-Schneider, B. Suntisrivaraporn, A.-Y. Turhan, and T. Weithöner. DIG 2.0 – Towards a Flexible Interface for Description Logic Reasoners. In B. Cuenca Grau, P. Hitzler, C. Shankey, and E. Wallace, editors, Proc. of the 2nd Workshop OWL Experiences and Directions 2006, 2006.
4 A. Cregan, M. Mochol, D. Vrandečić, and S. Bechhofer. Pushing the limits of OWL, rules and Protégé - a simple example. In B. Cuenca Grau, I. Horrocks, B. Parsia, and P. Patel-Schneider, editors, OWL: Experiences and Directions, Galway, Ireland, 2005.
5 C. Golbreich and I. Atsutoshi. Combining SWRL rules and OWL ontologies with Protégé OWL plugin, Jess, and Racer. In R. Fergerson and N. Noy, editors, Proceedings of the 7th International Protégé Conference, 2004.
6 B. Cuenca Grau et al. The OWL 1.1 extension to the W3C OWL Web Ontology Language. University of Manchester, 2006.
7 J. Hladik. Reasoning about nominals with FaCT and RACER. In Proc. of the 2003 International Workshop on Description Logics (DL2003), CEUR-WS, 2003.
8 I. Horrocks, P. F. Patel-Schneider, S. Bechhofer, and D. Tsarkov. OWL Rules: A Proposal and Prototype Implementation. Conditionally accepted for publication, Journal of Web Semantics, 2005.
9 I. Horrocks, P. F. Patel-Schneider, H. Boley, S. Tabet, B. Grosof, and M. Dean. SWRL: A Semantic Web Rule Language Combining OWL and RuleML. DARPA DAML Program, 2003. http://www.w3.org/Submission/2004/SUBM-SWRL-20040521/.
10 A. Rector, N. Drummond, M. Horridge, J. Rogers, H. Knublauch, R. Stevens, H. Wang, and C. Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In E. Motta, N. R. Shadbolt, and A. Stutt, editors, Proc. of the 14th EKAW, volume 3257 of LNCS, pages 63–81. Springer, 2004.

5 Controlled Natural Language Syntaxes for OWL

5.1 Sydney OWL Syntax

Title of Publication: Sydney OWL Syntax - towards a Controlled Natural Language Syntax for OWL 1.1
Type of Publication: Workshop Paper
Appears In: C. Golbreich, A. Kalyanpur, B. Parsia, editors, Proceedings of the OWLED 2007 Workshop on OWL: Experiences and Directions, Innsbruck, Austria, 6-7 June 2007, CEUR-WS, Vol. 258, 2007.
Publication Date: 2007
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan, Rolf Schwitter, Thomas Meyer
Personal Contribution: 80% (estimated by co-authors)

Sydney OWL Syntax - towards a Controlled Natural Language Syntax for OWL 1.1

Anne Cregan1,2, Rolf Schwitter3, and Thomas Meyer1,2

1 NICTA, [Anne.Cregan,Thomas.Meyer]@nicta.com.au
2 University of New South Wales, Australia
3 Macquarie University, [email protected]

Abstract. This paper describes a proposed new syntax that can be used to write and read OWL ontologies in Controlled Natural Language (CNL): a well-defined subset of the English language. Following the lead of Manchester OWL Syntax in making OWL more accessible for non-logicians, and building on the previous success of Schwitter’s PENG (Processable English), the proposed Sydney OWL Syntax enables two-way translation and generation of grammatically correct full English sentences to and from OWL 1.1 functional syntax. Used in conjunction with OWL tools, it is designed to facilitate ontology construction and editing by enabling authors to write an OWL ontology in a defined subset of English. It also improves readability and understanding of OWL statements or whole ontologies, by enabling them to be read as English sentences. It is hoped that by providing the option of an intuitive, easy to use English syntax which requires no specialized knowledge, the broader community will be far more likely to develop and benefit from Semantic Web applications. This paper is a discussion paper covering the scope, design, and examples of Sydney OWL Syntax in use, and the authors invite feedback on all aspects of the proposal via email to [email protected]. Working drafts of the full specification are available at http://www.ics.mq.edu.au/~rolfs/sos.

1 Introduction

Following OWL reaching official W3C recommendation status, a variety of notations for OWL class, property and individual descriptions and axioms became available through various tools, most notably Protégé [8] and SWOOP [6]. As noted in [3], these ranged from the officially recommended RDF/XML exchange syntax [2], through to a Description Logic style syntax, with Turtle/N-Triples [1] and the OWL Abstract Syntax [9] somewhere between the two extremes of verbosity and the specialized logical notation known as “squiggles” to non-logicians. The experience of experts such as the Manchester Group in delivering OWL tutorials and workshops for domain experts identified that for the vast majority of non-logicians, none of the existing OWL syntaxes were suitable for writing class expressions and other types of axioms: they were either too verbose, or else the logical notation was intimidating and inconvenient to use. Manchester OWL Syntax [3] addressed these problems by providing an alternative syntax designed to be concise, without DL symbols, and quick and easy to read and write.

Manchester OWL Syntax has had substantial success and is reported to be the preferred syntax for non-logicians [3]. Discussion following its presentation at OWLED 2006 identified the potential, as a future goal for OWL, to extend the approach even further, to provide a syntax representing OWL in full English sentences. With a view to building on the previous success of Schwitter’s Processable English (PENG) [11], which translates Controlled English to first-order logic, Cregan formed a small working group comprising the three Sydney-based authors (Cregan, Schwitter and Meyer) to design such a syntax. The resulting proposed Sydney OWL Syntax is presented herein, and feedback is invited from all interested parties. As there are many design decisions to be made in such an undertaking, a large part of the paper is devoted to covering the design choices identified, and giving the rationale for the choices made. The syntax itself is presented via examples throughout the paper, but as space does not permit the inclusion of the emerging specification, readers should also consult the documentation available at http://www.ics.mq.edu.au/~rolfs/sos.

2 Background

2.1 Manchester OWL Syntax

Manchester OWL Syntax [3] is largely based on the German DL syntax and shares its compactness. Its key differentiating features are the replacement of special logical symbols such as ∃, ∀ and ¬ with the more intuitive keywords some, only, and not; the use of infix rather than prefix notation for keywords used in restrictions, preventing a misreading of class expressions found to be common amongst non-logicians; and the introduction of keywords such as ValuePartition facilitating common ontology design patterns. Manchester OWL Syntax has been reported to be well-received by non-logicians [3] and is the default syntax for Protégé-OWL and the commercially released OWL ontology editor TopBraid Composer 1.
In general, non-logicians have found it easier to grasp, remember and use than DL syntax. Although needing some training to re-align their natural interpretation of keywords to the correct OWL/DL interpretation, it successfully lowered the barrier for reading and interpreting ontologies. Limitations: Although able to represent complete ontologies, Manchester OWL Syntax has been primarily designed for presenting and editing class expressions via tools, and representation/tool support for property and individual expressions seems to have had less focus. In addition, whilst certainly lowering the barrier, a syntax closer to English, with semantics matching a natural English interpretation, could potentially remove it altogether.
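As an illustration of the keyword replacement described above, the same class expression can be written in DL notation and in Manchester OWL Syntax (the class and property names here are our own, chosen purely for illustration):

DL notation:  Person ⊓ ∃hasPet.Dog ⊓ ¬Child
Manchester:   Person and (hasPet some Dog) and (not Child)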

2.2 PENG (Processable ENGlish)

PENG (Processable ENGlish) [11] is a machine-oriented controlled natural language (CNL) designed for writing unambiguous and precise specification texts for knowledge representation. Whilst easily understood by speakers of the base language, it has the same formal properties as an underlying formal logic language

and thus is machine-processable. It can be used, for example, for annotating web pages with machine-processable information [12]. PENG covers a strict subset of standard English, and is precisely defined by a controlled grammar and lexicon. Specification texts written in PENG are incrementally parsed using a unification-based phrase structure grammar, and translated into first-order logic via discourse representation structures [7]. Standard first-order logic reasoning services are applied for reasoning tasks including consistency and informativity checking, and question answering. As a brief example, the following sentences are written in PENG:

1. If X is a research programmer then X is a programmer.
2. Bill Smith is a research programmer who works at the CLT.
3. Who is a programmer and works at the CLT?

Sentence (1) describes a subclass relationship, sentence (2) asserts factual knowledge about a domain, and sentence (3) is used to query the terminological and factual knowledge expressed in (1) and (2). Standard first-order logic (FOL) query processing returns the answer Bill Smith. The writing process of PENG is facilitated by predictive interface techniques: after the author enters a word form, the authoring tool displays look-ahead information indicating the available choices for the next word form, ensuring adherence to the lexicon and grammar. The author does not need to learn or remember the rules of the controlled natural language as these are taken care of by the authoring tool. Limitations: The grammar of PENG is first-order equivalent and therefore more expressive than OWL 1.1. It is informed by FOL rather than DL considerations. In addition, the grammar has not been designed with bidirectionality in mind: PENG sentences are translated into FOL but not from FOL backwards into PENG. For these reasons, Sydney OWL Syntax, whilst informed by the learnings and experience of PENG, has essentially been designed from scratch.

1 http://www.topbraidcomposer.com/
With regard to bidirectionality, Kaljurand and Fuchs [4] have presented a bidirectional mapping between a subset of OWL DL and Attempto Controlled English using a discourse representation structure as interlingua, but in recent work [5] they focus on one direction only: the verbalisation of OWL DL. Schwitter and Tilbrook [13] previously showed that there is no need for an interlingua and that bidirectionality can be achieved in a direct way using axiom schemas.
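For concreteness, the three PENG sentences in Section 2.2 would correspond to first-order formulas along the following lines (a sketch; the predicate names are our own, not taken from the PENG lexicon):

(1) ∀x (ResearchProgrammer(x) → Programmer(x))
(2) ResearchProgrammer(billSmith) ∧ worksAt(billSmith, clt)
(3) ?x. Programmer(x) ∧ worksAt(x, clt)

Formula (1) is the subclass axiom, (2) the factual assertion, and (3) the query, answered by finding a binding for x.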

3 Scope

Sydney OWL Syntax has been scoped as follows:

1. OWL 1.1 compatible
Unlike the 2004 OWL recommendation, which uses a frame-like syntax convenient for manipulating ontologies by hand:

ObjectProperty(hasAncestor domain(person) range(person))

the emerging OWL 1.1 [10] has a functional-style syntax which breaks such axioms apart and makes them easier to manipulate programmatically:

ObjectPropertyDomain(hasAncestor person)
ObjectPropertyRange(hasAncestor person)

Sydney OWL Syntax takes OWL 1.1 functional syntax as the normative form for expressing OWL ontologies and the base form for translations. Combining readability and processability, it expresses the same information as:

If X has Y as an ancestor then X is a person.
If X has Y as an ancestor then Y is a person.

See Section 6 for considerations of conciseness in the design.

2. Coverage of the entire OWL language
Anything that can be expressed in OWL 1.1 may be expressed in Sydney OWL Syntax. It provides complete coverage of all axioms and assertions that may be made in OWL 1.1, for example subproperty relations:

If X has Y as a parent then X has Y as an ancestor.

and property chains (= role composition):

If X owns Y and Y has Z as a part then X owns Z.

3. Two-way translation
Any OWL 1.1 ontology may be represented in Sydney OWL Syntax and conversely, ontologies constructed in Sydney OWL Syntax can be fully represented in any other OWL 1.1 syntax, without loss of information. The writing of ontologies in Sydney OWL Syntax is to be supported with interactive functionality such as look-ahead information, to assist the user and enforce syntactic validity.

4 Design goals

The key design goals of Sydney OWL Syntax are:

1. Support non-logicians to build quality OWL ontologies
Support domain experts and analysts, particularly those without a logical background, to write good quality OWL ontologies. We assume that users are literate in English, and have at least an average ability to use a computer and think and express themselves logically in the normal sense of the word, but no specific knowledge of any formal notation is assumed.

2. Provide English translations of OWL ontologies
Provide English translations of OWL ontologies which can be read and understood by English-speaking persons, without the need to refer to any other ontology syntax or representation. As with any ontology syntax, it is the responsibility of the author(s) to choose sensible and appropriate names for user-defined classes and properties.

3. Modularity for future flavours of OWL
As OWL is an evolving language, and it is likely that new flavours of OWL corresponding to various formal logics will emerge, one of the design goals is a modular approach which facilitates contracting and expanding the syntax in correspondence with the logical operators to be included. For instance, words such as must, may and cannot are not currently used, as they correspond to notions of permissibility and obligatoriness used by deontic logics. At some stage OWL may have a flavour based on a deontic logic, so these words are kept in reserve for that scenario.

4. Implementable by OWL tools
Provide a specification which is sufficiently detailed and precise for implementation in ontology tools, as an alternative syntax to Manchester Syntax, OWL Abstract Syntax, and/or the other existing syntaxes. We note however that as OWL 1.1 functional syntax is not fully backwards-compatible with previous OWL syntaxes, the same applies for Sydney OWL Syntax.

5 Design choices

Whilst developing the syntax, several design decisions were encountered and choices made. We believe these decisions are ones which would be encountered by any effort to translate between a formal and a controlled natural language, and give the rationale for the choices made for Sydney OWL Syntax.

5.1 Naturalness versus closeness to OWL

A key decision was how natural we wanted the language to be. We observed a fundamental tradeoff between naturalness and closeness to OWL: on the one hand, the language could be more natural, but would lose its binding to OWL and thus become ambiguous and open to interpretation. This would seem to defeat the purpose of building an ontology, as it is expressly for explicit logical representation of a domain. On the other hand, one can bind very tightly to OWL, but this can result in some unnatural sounding English expressions, as there is often no exact or at least succinct equivalent in English for an OWL construction. For example,

hasFather is a FunctionalObjectProperty.

does not sound like a natural English expression, as firstly, it is an artefact of the ontology itself, and secondly, it uses abstract terms that are unknown to non-specialists. In contrast, Sydney OWL Syntax uses the terms of the application domain to convey the meaning without the need for any opaque encoding:

If X has Y as a father then Y is the only father of X.

In general we have opted towards tight binding with OWL 1.1 functional syntax whilst endeavouring to make the expressions as natural as possible.

1. One or many CNL translations?
In natural language there are many ways to say the same thing - did we want to try to support all or a collection of them in translating a given OWL statement? For example, for the expression SubClassOf(male, person) should we support both If X is a male then X is a person and Every male is a person, or allow one and only one CNL representation? We decided that for the first cut of Sydney OWL Syntax, there would be only one.
Design choice: OWL syntax corresponds uniquely to Sydney OWL Syntax. That is, there is only one Sydney OWL Syntax form for each OWL form, chosen to maximise succinctness and precision.

However, we appreciate the potential usefulness of supporting different modes of natural language expression for ontology construction purposes. For example, we have chosen to represent disjointness with the succinct mutually exclusive, but for clarification purposes it may be helpful to offer an expanded CNL translation such as female or male is the case but not female and male. One option is that such modes could be handled through the interface without formally being part of the syntax. We also note that uniqueness of form refers to the syntactical form of the OWL statement, not its logical status - in some cases two OWL expressions may be logically equivalent but use different syntax, for example:

DisjointUnion(person male female)

and

DisjointClasses(male female)
EquivalentClasses(person ObjectUnionOf(male female))

In this case, each syntactically distinct OWL expression corresponds to its own Sydney OWL Syntax equivalent:

The class person is equivalent to male or female, and male and female are mutually exclusive.

and

The classes male and female are mutually exclusive. The class person is fully defined as anything that is a male or a female.

2. How explicit should the OWL constructs be?
One of the fundamental design decisions we faced was whether or not to talk about the ontology constructs themselves in the syntax. For instance, would a class axiom like subClassOf(male, person) translate to something like There is a class called male which is a subset of a class called person, or to a statement about the domain itself, like Every male is a person? In the latter case, some OWL statements, such as class male, would have no translation at all in CNL, as they are artefacts of the modelling process and don’t assert anything about the domain itself.
Design choice: We opted to have limited explicit references to OWL constructs like classes and properties. As a consequence, some OWL axioms are not translated at all, but the knowledge is captured in Sydney OWL Syntax implicitly. Using a parsing process, any implicit concept in Sydney OWL Syntax can be unpacked into a corresponding OWL axiom. E.g., the translation of Every male is a person back to OWL produces a class male declaration and a subset axiom. Overall, all information from OWL is captured in Sydney OWL Syntax and, given a Sydney OWL Syntax translation, the original OWL statements can be regenerated.
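The round trip just described can be sketched in functional-style syntax: translating Every male is a person back to OWL would yield something like the following (a sketch, assuming male has not previously been declared):

Declaration(Class(male))
SubClassOf(male person)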

3. Correspondence between OWL constructs and CNL constructs
To facilitate modularity in respect of the addition or removal of logical operators and constructions, we have carefully chosen grammar and lexicon to correspond tightly with the underlying logic, the aim being to implement as much modularity as possible within the boundaries of using natural grammar. For instance, the word only in

If X has Y as a son then Y is the son of only X.

is reserved for use in expressing functional or inverse functional properties, and not in any other context. By virtue of this tight binding, a person with familiarity with both OWL and Sydney OWL Syntax can read an ontology represented in Sydney OWL Syntax and recognise the OWL constructs via the words and phrases used.
Design choice: Where possible, each OWL construct has its own distinct natural language keyword or phrase.

4. Use of linguistic and other background knowledge
Anaphoric reference: In natural language, it is common to use pronouns and definite noun phrases to refer to entities introduced in previous statements. Note that to use the pronoun “he” requires previous knowledge of the referent being male. In an OWL context, such references require logical processing of other statements. In OWL ontologies statements are not necessarily in any order, so the entire ontology would need to be parsed.
Number agreement: Linguistic background knowledge is also commonly used in natural language. For instance, the knowledge that the correct plural of mouse is mice is necessary to refer to Three blind mice instead of Three blind mouses. Adding an “s” to create plurals is a useful default rule but not always correct. However, we can use morphological rules and a list of exceptions as a best approximation.
Design choice: Each OWL statement is translated as a unit, without reference to any other statement in the ontology, or any other background or linguistic knowledge. Processing and using knowledge from outside the OWL statement vastly compounds the complexity of processing, and thus has been avoided at the expense of providing anaphoric reference and safe number agreement.
5. Use of variables
Whilst not a design preference, we found that some OWL statements could not be expressed clearly in CNL without using variables. If you need convincing, try as a test to express succinctly and unambiguously in English, without using variables, the example involving role composition in Section 3: If X owns Y and Y has Z as a part then X owns Z. One option we considered is to use phrases such as “something” and “something else” as pseudo-variables. But then one ends up with howlers such as the following: If something owns something else and that something has another thing as a part then the original something owns that something.
Design choice: We decided to minimise the use of variables but found it impossible to do without them completely.

5.2 Complex constructs

Design choice: Complex class definitions are supported through an approach which allows nesting of expressions to any level. We plan to support expressions which use nesting up to three levels, for example:

The class old lady is partly defined as anything that has only cats as a pet and has some animal as a pet or has only gardeners as a lover.

5.3 Extra language support for user-defined terms

In building ontologies it is very common to use has and is combined with some other word or phrase when naming properties, e.g. hasAge, isMotherOf, etc.
Design choice: Sydney OWL Syntax supports special processing of property names involving has and is and their grammatical variants, provided camel case is used, e.g. isMotherOf not ismotherof. This provides a more natural translation.

5.4 Definitions

Constructing correct definitions is challenging in OWL, since authors often fail to make a definition complete rather than partial. To address this problem we use the two markers fully defined as and partly defined as to indicate the logical status. For example, the following statement:

The class adult is fully defined as any person that has at least 20 as an age.

claims that the concept adult is fully defined by a set of necessary and sufficient conditions. The translation of this statement results in the following functional-style syntax representation:

EquivalentClasses(adult
    ObjectIntersectionOf(Person
        DataAllValuesFrom(hasAge
            DatatypeRestriction(Datatype(xsd:nonNegativeInteger)
                owl:minInclusive "20"^^xsd:int))))

6 Design consequences

6.1 Tight binding to functional-style syntax

In general, the OWL 1.1 functional syntax requires more statements to express the same thing than the previous frame-like notation. The consequence of tight binding to the former is that Sydney OWL Syntax also has more statements. For instance, using the original OWL frame-like notation as a starting point, it would have been easier to translate the example given in Section 3 into one sentence rather than two: If X has Y as an ancestor then X is a person and Y is a person.

6.2 Bidirectionality and context-sensitive grammar

Sydney OWL Syntax is bidirectional, thus each statement translates into OWL functional-style syntax and vice versa, with the exception of statements of explicit OWL constructs, which have no Sydney OWL Syntax translation. An elegant way to achieve bidirectionality is to use a definite clause grammar and generate the output format during the parsing process [13]. In general, bidirectional translation requires a context-sensitive grammar. This may be illustrated as follows: If X has Y as a parent then Y has X as a child. expresses an inverse relationship between two properties. Note that in the antecedent, the grammar needs to store the variable X in the subject position and the variable Y in the object position, whereas in the consequent their positions must be switched, otherwise we have a subproperty relationship. Additionally, the grammar has to provide a mechanism to absorb the auxiliary verb has and the prepositional objects parent and child into an OWL property name. To achieve bidirectionality, Sydney OWL Syntax will use a context-sensitive grammar which can store the required elements and employ an axiom schema which is instantiated during parsing: InverseObjectProperties(Prefix1:Property1 Prefix2:Property2). For this example the schema looks as follows after parsing: InverseObjectProperties(a:[has,parent], a:[has,child]) and this output can easily be transformed into the final format: InverseObjectProperties(a:hasParent a:hasChild). In the ideal case the same grammar should accept this output and generate the original input sentence.

7 Conclusion and future work

Above we have set out the scope, design goals, decisions and choices informing the emerging Sydney OWL Syntax specification. The authors invite feedback on every aspect of the proposal, via email to [email protected].
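The final transformation step, merging the stored elements [has,parent] and [has,child] into OWL property names, can be sketched as follows. This is an illustrative reconstruction, not the actual grammar code, and the function names are hypothetical:

```python
def absorb(words):
    """Merge verbalised words such as ['has', 'parent'] into a
    camel-case OWL property name, e.g. 'hasParent'."""
    head, *rest = words
    return head + ''.join(w.capitalize() for w in rest)

def instantiate_inverse(prefix1, words1, prefix2, words2):
    """Instantiate the InverseObjectProperties axiom schema from the
    elements the grammar stored during parsing."""
    return (f'InverseObjectProperties('
            f'{prefix1}:{absorb(words1)} {prefix2}:{absorb(words2)})')

# instantiate_inverse('a', ['has', 'parent'], 'a', ['has', 'child'])
#   -> 'InverseObjectProperties(a:hasParent a:hasChild)'
```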
In parallel with collecting feedback from interested parties and moving towards a stable specification, we plan to start work on a demonstrator. In conclusion we note some features we envisage for tool interfaces. The writing of an ontology in Sydney Syntax is to be supported by a predictive text editor which generates look-ahead information while a specification text is written [12, 14]. Thus the user does not need to learn the rules of the Sydney Syntax explicitly, since the writing process is guided by the text editor. Such a text editor will be able to be used either in TBox mode, to express terminological axioms, or in ABox mode, to assert factual information about a specific domain. Once a set of terminological axioms has been specified, the resulting user-defined terminology can be used in ABox mode to specify instance data. From the terminological information available in the ontology, the text editor becomes “ontology-aware”, harvesting TBox input to generate new look-ahead information guiding the writing process in ABox mode.

Acknowledgements

Research reported in this paper has been partially financed by the Macquarie University Centre for Language Technology (http://www.clt.mq.edu.au). We thank Phillip Quinn and Matthew Horridge for their assistance in producing examples of OWL 1.1 functional syntax, and the members of the [email protected] mailing list for their contributions. NICTA is funded by the Australian Government’s Department of Communications, Information Technology and the Arts and the Australian Research Council through Backing Australia’s Ability and the ICT Centre of Excellence program. It is supported by its members the Australian National University, University of NSW, ACT Government, NSW Government and affiliate partner University of Sydney.

References

1. D. Beckett. New syntaxes for RDF. Technical Report, Institute for Learning and Research Technology, Bristol, 2004.
2. D. Beckett. RDF/XML syntax specification (revised). W3C Recommendation, 10 February 2004. http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/.
3. M. Horridge, N. Drummond, J. Goodwin, A. Rector, R. Stevens, and H. H. Wang. The Manchester OWL syntax. In Proc. of the 2006 OWL Experiences and Directions Workshop (OWL-ED2006), 2006. Available at http://owl-workshop.man.ac.uk/acceptedLong/.
4. K. Kaljurand and N. E. Fuchs. Bidirectional mapping between OWL DL and Attempto Controlled English. In LNCS 4187, pages 179–189, 2006.
5. K. Kaljurand and N. E. Fuchs. Verbalizing OWL in Attempto Controlled English. In Proceedings of OWLED07, 2007.
6. A. Kalyanpur, B. Parsia, B. Cuenca-Grau, and J. Hendler. Swoop: A ‘web’ ontology editing browser. Journal of Web Semantics, 4(2), 2005.
7. H. Kamp and U. Reyle. From Discourse to Logic. Dordrecht: Kluwer, 1993.
8. N. Noy, R. Fergerson, and M. Musen. The knowledge model of Protégé-2000: Combining interoperability and flexibility. In R. Dieng and O. Corby, editors, Proc. of the 12th EKAW, volume 1937 of LNAI, pages 17–32, Juan-les-Pins, France, 2000. Springer.
9. P. F. Patel-Schneider, P. Hayes, and I. Horrocks. OWL web ontology language, semantics and abstract syntax. W3C Recommendation, 10 February 2004. http://www.w3.org/TR/owl-semantics/.
10. P. F. Patel-Schneider and I. Horrocks. OWL 1.1 web ontology language overview, 2006.
11. R. Schwitter. English as a formal specification language. In Proceedings of the Thirteenth International Workshop on Database and Expert Systems Applications (DEXA 2002), pages 228–232, 2002.
12. R. Schwitter and M. Tilbrook. Annotating websites with machine-processable information in controlled natural language. In Advances in Ontologies 2006, Proceedings of the Second Australasian Ontology Workshop (AOW 2006), pages 75–84, 2006.
13. R. Schwitter and M. Tilbrook. Let’s talk in description logic via controlled natural language. In Proceedings of the Third International Workshop on Logic and Engineering of Natural Language Semantics (LENLS2006), pages 193–207, 2006.
14. C. W. Thompson, P. Pazandak, and H. R. Tennant. Talk to your semantic web. Volume 9, pages 75–79, 2005.

5.2 A Comparison of three CNLs for OWL 1.1

Title of Publication: A Comparison of three Controlled Natural Languages for OWL 1.1
Type of Publication: Workshop Paper
Appears In: Kendall Clark and Peter F. Patel-Schneider, editors, Proceedings of OWL: Experiences and Directions (OWLED 2008 DC), 4th International Workshop, Washington, USA, 1-2 April, 2008
Publication Date: 2008
Peer Reviewed: Yes
Contributing Author(s): Rolf Schwitter, Kaarel Kaljurand, Anne Cregan, Catherine Dolbear, Glen Hart
Personal Contribution: 32.5% (average of estimates by co-authors)

A Comparison of three Controlled Natural Languages for OWL 1.1

Rolf Schwitter1, Kaarel Kaljurand2, Anne Cregan3, Catherine Dolbear4, and Glen Hart4

1 Macquarie University & NICTA, Australia – [email protected]
2 University of Zurich, Switzerland – [email protected]
3 NICTA, Australia – [email protected]
4 Ordnance Survey, Southampton, UK – {Catherine.Dolbear|Glen.Hart}@ordnancesurvey.co.uk

Abstract. At OWLED2007 a task force was formed to work towards a common Controlled Natural Language Syntax for OWL 1.1. In this paper members of the task force compare three controlled natural languages (CNLs) — Attempto Controlled English (ACE), Ordnance Survey Rabbit (Rabbit), and Sydney OWL Syntax (SOS) — that have been designed to express the logical content of OWL 1.1 ontologies. The common goal of these three languages is to make OWL ontologies accessible to people with no training in formal logics. We briefly introduce these three CNLs and discuss a number of requirements for an OWL-compatible CNL that have emerged from the present work. We then summarise the similarities and differences of the three CNLs and make some preliminary recommendations for an OWL-compatible CNL.

1 Introduction

The mathematical nature of description logics makes it difficult for non-logicians such as domain experts to understand and author OWL-based ontologies. This forms a significant impediment to ontology creation and reuse. If domain experts’ knowledge is to be represented and verified, an easily understandable syntax for writing ontologies is needed. [22] lists the problems that users encounter when working with OWL DL and identifies the need for a ‘pedantic but explicit’ paraphrase language. This need was partially met by Manchester syntax [14], which paraphrased the logical symbols with English glosses and improved domain experts’ understanding and ability to author ontologies. In 2007 three new offerings appeared that enabled OWL ontologies to be rendered in English paraphrases: Attempto Controlled English (ACE), Ordnance Survey’s Rabbit (Rabbit), and Sydney OWL Syntax (SOS). The purpose of such OWL syntaxes is not to replace the graphical user interface generally used for ontology building, although these syntaxes can be used in this way if a text-based approach is desired. Instead, they complement the GUI by enabling the author (= domain specialist or knowledge engineer) to understand and write the most appropriate axioms, as well as providing a means to output the built ontology as a readable piece of text for sharing with others interested in the domain knowledge that the ontology captures.

With three new Controlled Natural Language (CNL) syntax alternatives represented at OWLED2007, it was decided to create a task force including members from each effort for the purpose of comparing these approaches and working towards a common Controlled Natural Language. This paper is written by key members of the task force. It compares the ACE, Rabbit and SOS controlled English syntaxes for OWL 1.1 using concrete examples, discusses similarities and differences between the renderings, and makes some initial recommendations.

2 Controlled Natural Languages for the Semantic Web

A controlled natural language is an engineered subset of a natural language with explicit constraints on grammar, lexicon, and style. These constraints usually have the form of writing rules and help to reduce both ambiguity and complexity of full natural language [18]. Over the last decade, a number of controlled natural languages have been designed and used for writing software specifications, for supporting the knowledge acquisition process, and for knowledge representation — among them Attempto Controlled English [8], PENG Processable English [23], Common Logic Controlled English [25], and Boeing’s Computer-Processable Language [3]. Since the early days of the Semantic Web, simple teaching languages (for example Notation 3) have been used that are equivalent to RDF in its XML syntax, but easier to ‘scribble’ when getting started [2]. There are other languages [13,20,15] that have been suggested in order to represent OWL in a more natural way. However, the major shortcoming of these approaches is that they lack any formal check that the resulting expressions are unambiguous. In this sense, a better approach is based on controlled natural languages that typically have a formal language semantics and come with a parser that can convert the statements into the OWL representation so that the natural language version becomes the primary human interpretable representation. ACE, Rabbit, and SOS are three controlled natural languages that have been designed to be used as interface languages to OWL ontologies. Apart from these languages there exist other CNL-based approaches to authoring OWL ontologies [1,9] but we will not further discuss these languages.

2.1 ACE, Rabbit, and SOS

ACE is a subset of English designed to provide domain specialists with an expressive knowledge representation language that is easy to learn, read and write [7]. ACE is defined by a small number of construction rules that define its syntax and a small number of interpretation rules that disambiguate constructs that in full English might be ambiguous.

In [16], a bidirectional mapping of a fragment of ACE to OWL 1.1 (without data properties) is described. This mapping captures all semantically different OWL constructs as different ACE sentences, but often there are many possibilities for expressing the same OWL axiom. For example, all sentences in

John likes no man that owns a car. No man that owns a car is liked by John. Every man that owns a car is not liked by John. If a man owns a car then it is false that John likes the man.

map to the same OWL SubClassOf -axiom. On the other hand, the mapping does not differentiate between all syntactic forms that OWL offers, i.e. syntactically different OWL constructs can end up the same in ACE (given that they are semantically equivalent). This mapping has been fully implemented and is being used in experimental ontology editors ACE View [16] and AceWiki [19].

Rabbit is a controlled natural language developed by Ordnance Survey with the help of domain experts for the purpose of authoring ontologies [11]. It has so far been used by domain experts to develop two medium-scale ontologies containing about 600 concepts for ‘Buildings and Places’ and Hydrology, using most of the expressivity of OWL 1.1 (namely ALCOQ and SHOIQ respectively for the ontologies). This practical implementation experience has enabled Ordnance Survey to tailor the design of the CNL, concentrating on those constructs and models of knowledge that are frequently required by ontology authors, or where the authors most commonly make errors. Rabbit was developed as part of a wider methodology for authoring ontologies using a domain expert-centric approach [12]. A Protégé 4 plugin is currently being developed in cooperation with the University of Leeds [4] to implement the Ordnance Survey methodology. This allows domain experts to author ontologies in Rabbit. The GATE natural language processing tool [6] is being used to implement a backend to the tool to convert Rabbit into OWL. The fundamental principles underlying the design of Rabbit are: (a) to allow the domain expert, with the aid of a knowledge engineer, to express their knowledge as easily and simply as possible and in as much detail as necessary; (b) to have a well defined grammar and be sufficiently formal to enable those aspects that can be expressed as OWL to be systematically translatable and to enable other non-DL based applications to access this knowledge.

SOS is a controlled natural language that has been designed from scratch to fulfill the requirements of a modern high-level interface language to OWL 1.1 [5]. The key design goals are: (a) supporting non-logicians to write OWL ontologies in a well-defined subset of English, and (b) expressing existing ontologies in the same subset of English. SOS uses the terms of the application domain plus some other terms to convey the meaning of the information. SOS enforces a one-to-one mapping between controlled natural language and OWL Functional-Style Syntax (FSS). That means SOS does not allow the same thing to be said in different ways. Furthermore, the language uses only limited references to OWL constructs like classes and properties. SOS uses only very little linguistic knowledge in order to deal with plural forms (e.g. ‘confluences’) and compound constructions (e.g. ‘has ... as a part’). A particularly interesting feature of SOS is the use of variables – as known from high school math textbooks – which enables the expression of certain axioms in a very compact and natural way. To support the writing of definitions, the language provides specific constructs (‘fully defined as’ and ‘partly defined as’) that indicate the logical status of a definition. In principle, SOS supports nesting of expressions to any level, but deep nesting results in structures which are difficult for people to understand. Therefore, it is recommended that authors limit the depth of nesting to three levels, using an authoring tool (similar to [24]). In order to achieve bidirectional translations between SOS and OWL FSS, experiments were conducted with techniques which allow us to generate formulas in OWL FSS notation during the parsing process.
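An authoring tool could enforce the recommended nesting limit with a simple depth check over the FSS string. The sketch below is illustrative only; it counts parenthesis depth as a rough proxy for expression nesting:

```python
def nesting_depth(fss: str) -> int:
    """Maximum parenthesis nesting depth of an OWL functional-style
    syntax expression, used here as a rough readability measure."""
    depth = deepest = 0
    for ch in fss:
        if ch == '(':
            depth += 1
            deepest = max(deepest, depth)
        elif ch == ')':
            depth -= 1
    return deepest

# nesting_depth('SubClassOf(bourne stream)') -> 1
```

A tool might warn the author whenever nesting_depth of a generated axiom exceeds three.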

2.2 Requirements, design choices, scope

Several requirements and design choices for an OWL-compatible controlled natural language (CNL OWL) have emerged from the work done on ACE, Rabbit, and SOS. A controlled natural language for OWL should offer authors who write, modify or view OWL ontologies improved usability over the existing OWL syntaxes. This improved usability is gained by defining a fragment of English and its precise mapping into OWL in such a way that the mapping preserves the intended meaning of the English constructs. There are two main requirements that are in slight conflict with each other — the need to see OWL as a fragment of English (semantically), and the need to cope with OWL’s design in order to make a straightforward mapping to and from OWL possible. Firstly, we try to define a language which is a subset of English and does not use any formal notations. In places where this requirement conflicts with the requirement to provide a straightforward translation between CNL and OWL, we may tolerate minor formal-looking additions like variables for anaphoric reference, brackets and indentation for grouping, etc. The results of user evaluation will need to decide the exact balance. The syntax of CNL OWL should be defined by a closed class of function words, an open class of content words, and a small set of grammar rules presented using linguistic notions like ‘phrase’, ‘subject’, and ‘negation’. A limited amount of morphological variation is supported, e.g. ‘mouse’ and ‘mice’ have the same lemma. The description of CNL OWL should not be significantly longer than the descriptions of other OWL syntaxes. Secondly, the designed language and its associated translation programs should support a two-way mapping to a standard OWL syntax, for which we have chosen the OWL 1.1 Functional-Style Syntax5 (FSS).
Related to this point are two questions: whether the CNL should allow OWL axioms to be expressed in alternative ways, offering more flexibility for the author, and whether the language should allow several OWL axioms to be represented as one CNL sentence, increasing compactness. While the focus of our work is on writing OWL ontologies in CNL, providing access to existing OWL ontologies and viewing entailed axioms in CNL is also important. We have decided to cover all of OWL 1.1 without extra-logical features like annotations. As a first step we ignore data properties and namespaces, as those are hard to express in natural language alone and would require including more formal-looking notations.

3 Comparison

This section examines a set of OWL 1.1 axioms and their renderings in ACE, Rabbit, and SOS, discussing the similarities and differences between the respective approaches. The axioms originate from a domain ontology for ‘Buildings and Places’ authored by domain experts at Ordnance Survey [21]. The full ontology contained over 600 concepts; we have used a subset6 that covers all different axiom types of OWL 1.1 except one, where we have constructed an artificial case.

OWL AsymmetricObjectProperty(ObjectProperty(is-larger-than))
ACE If something X is larger than something Y then Y is not larger than X.
RAB The relationship "is larger than" is asymmetric.
SOS If X is larger than Y then Y is not larger than X.

There are two key differences between these renderings: firstly, SOS and ACE use variables, whilst Rabbit does not. Secondly, Rabbit speaks on a meta-level whereas SOS and ACE speak on the object level: that is, Rabbit speaks about the ontology and the nature of its properties, whilst SOS and ACE attempt to frame the phrasing as a statement about things in the domain. The meta-level versus object-level difference is a recurring one throughout the examples and a key design choice to be addressed. While in ACE each variable is introduced as an apposition to the indefinite pronoun ‘something’, SOS does not do this and is thus less verbose.

OWL SubClassOf(OWLClass(river-stretch), ObjectMaxCardinality(2, ObjectProperty(has-part), OWLClass(confluence)))
ACE Every river-stretch has-part at most 2 confluences.
RAB Every River Stretch has part at most two confluences.
SOS Every river stretch has at most 2 confluences as a part.

5 http://www.w3.org/2007/OWL/wiki/Syntax
6 http://code.google.com/p/owl1-1/downloads/list

All three syntaxes present ‘confluence’ in its plural form ‘confluences’; this requires linguistic knowledge. Differences between the syntaxes reflect different choices in presenting the ‘has-part’ predicate. Rabbit has opted to use upper case to indicate class names, whilst SOS and ACE do not. This makes it easier for the author to recognise which part is the class name, but looks unnatural when read as an English sentence. Unlike ACE and Rabbit, SOS breaks the ‘has part’ predicate apart and nests the cardinality (‘at most 2’) within it. ACE and Rabbit keep this predicate in one piece but ACE adds a hyphen.

OWL SubClassOf(OWLClass(factory), ObjectSomeValuesFrom(ObjectProperty(has-part), ObjectIntersectionOf([ObjectSomeValuesFrom(ObjectProperty(has-purpose), OWLClass(manufacturing)), OWLClass(building)])))
ACE For every factory its part is a building whose purpose is a manufacturing.
RAB Every Factory has a part Building that has Purpose Manufacturing.
SOS Every factory has a building as a part that has a manufacturing as a purpose.

The use of ‘a manufacturing’ in SOS and ACE is unnatural. This is due to the initial authoring choice by the domain experts at Ordnance Survey to nominalise all processes and use only a small set of properties (e.g. ‘has-purpose’, ‘applies-to’) in the ontology. An interesting alternative is to use transitive verbs (e.g. ‘manufactures something’) instead of nominalisations (e.g. ‘has-purpose manufacturing’) in order to describe processes. Note that the use of simple transitive verbs can also avoid other unnatural renderings (e.g. ‘comprise’ instead of ‘has-part’).

OWL EquivalentClasses([OWLClass(petrol-station), OWLClass(gas-station)])
ACE Every petrol-station is a gas-station. Every gas-station is a petrol-station.
RAB Petrol Station and Gas Station are equivalent.
SOS The classes petrol station and gas station are equivalent.

In this example, SOS uses the meta-level by referring explicitly to classes, whilst ACE and Rabbit use the object level. ACE’s approach produces a sentence for each pair of equivalent classes, which will be unwieldy to process when going from text to OWL. Rabbit’s statement is ambiguous as it is not entirely clear what the nature of the meta-level predicate ‘equivalent’ is (although the presence of capitalisation may help the reader conclude it is the classes themselves).

OWL SubClassOf(OWLClass(bourne), OWLClass(stream))
ACE Every bourne is a stream.
RAB Every Bourne is a kind of Stream.
SOS Every bourne is a stream.

SOS and ACE produce exactly the same minimal ‘is a’ rendering, whilst Rabbit uses the construct ‘is a kind of’. All three syntaxes use an explicit universal quantifier ‘every’ rather than the indefinite article ‘a’ or the definite article ‘the’.

OWL SubClassOf(ObjectSomeValuesFrom(ObjectProperty(has-part), OWLClass(water)), ObjectSomeValuesFrom(ObjectProperty(contain), OWLClass(water)))
ACE Everything whose part is a water contains a water.
RAB Everything that has a Part that contains some Water will also contain some Water.
SOS Everything that has some water as a part contains some water.

These examples illustrate that mass nouns are difficult to handle without additional linguistic knowledge. Note also that Rabbit uses the construction ‘will also’ which may be interpreted as having a temporal reading, whilst ACE and SOS have been careful to avoid temporal constructions, as they are not intended in the underlying OWL constructs.

OWL DifferentIndividuals([Individual(Scotland), Individual(England)])
ACE Scotland is not England.
RAB England and Scotland are different things.
SOS Scotland and England are different individuals.

Here, ACE uses negation more explicitly (‘is not’) compared to Rabbit and SOS, which both use the expression ‘different individuals’. Rabbit makes the choice of referring to England and Scotland as different ‘things’ whereas SOS refers to different ‘individuals’.

OWL SubObjectPropertyOf(SubObjectPropertyChain([ObjectProperty(has-part), ObjectProperty(contain)]), ObjectProperty(contain))
ACE If something X has-part something that contains something Y then X contains Y.
RAB Everything that has a Part that contains something will also contain that thing.
SOS If X contains Y and Y has Z as a part then X contains Z.

Both SOS and ACE are based on an ‘If...then’ construction whereas Rabbit’s rendering uses a more complex construction and avoids using variables.

OWL EquivalentClasses([OWLClass(source), ObjectIntersectionOf([ObjectUnionOf([OWLClass(spring), OWLClass(wetland)]), ObjectSomeValuesFrom(ObjectProperty(feed), ObjectUnionOf([OWLClass(river), OWLClass(stream)]))])])
ACE Every source is a spring or is a wetland, and feeds something that is a river or that is a stream. Everything that is a spring or that is a wetland, and that feeds something that is a river or that is a stream is a source.
RAB Every Source is defined as: Every Source is a kind of Spring or Wetland; Every Source feeds a River or a Stream.
SOS The classes source and spring or wetland that feed some river or some stream are equivalent.

SOS refers to classes explicitly whereas ACE does not. ACE uses multiple clauses and stays completely on the object level. Rabbit uses the ‘is defined as’ construction and a series of clauses separated by semicolons to structure the complex statement, but this works only in the case of intersection, not union.

4 User Testing

Different forms of user testing [10,9] present evidence supporting our argument that controlled natural languages can offer improvements over standard OWL syntax. This was found to compare favourably with OWL as represented by the Protégé ontology editor, although no distinction was made between evaluation of the software tool which encapsulates the language and testing of users’ comprehension of the language itself. [17]’s user testing also confirms that natural language interfaces are useful, in this case, for querying the semantic web.

Ordnance Survey has initiated a programme of user testing of Rabbit to evaluate how easy Rabbit is to understand. In the first phase of user testing, 31 sentences were shown to 223 participants (geography undergraduates), asking them to choose one of a selection of answers explaining what each Rabbit sentence meant. The answer choices were created to indicate why participants were getting the answer wrong. The order was randomised to ensure there was no bias. Similarly, the subject of the ontology was an imaginary insect, chosen to ensure the participants would have minimal background knowledge. Thirteen of the sentences were answered correctly by 75% or more of participants, with a large group near the 75% acceptance mark. These sentences were deemed sufficiently understandable by most participants. They include the structures using ‘exactly’, ‘at least’, ‘at most’, ‘1 or more of A or B or C’ (to indicate non-exclusive or), ‘eats is a relationship’, and ‘only A or B or nothing’ (to indicate the universal quantifier). ‘is an instance of’ wasn’t well understood, nor was the structure ‘is a kind of’, although it was unclear whether this was due to Rabbit’s original use of the indefinite article to start the sentence. Comprehension of reflexivity, irreflexivity, asymmetry, transitivity and inverses was tested, using the same ‘if...then’ structure employed by SOS and ACE, with mixed results.
Asymmetry, reflexivity and irreflexivity were understood, while transitivity and inverses were not. This might be because it was not always clear whether users really understood that these characteristics applied to the relationships on a global scale, or if they assumed that they were only valid at a local level when dealing with the connection between the two concepts in the supplied example. This kind of issue needs further testing (with a control group), along with validation of the CNL against the Manchester Syntax, which is being addressed in our second phase of testing, currently underway.

5 Discussion and Conclusions

Although there are clearly differences between the three CNLs, there is considerable overlap between them and therefore much common ground to build on.

There are four principal areas of difference. The first, least important and most easily resolvable, concerns style. For example, ACE chooses to hyphenate noun phrases: river-stretch, whereas Rabbit and SOS allow River Stretch and river stretch (the capitalisation in Rabbit being another minor difference).

Secondly, there are differences in approach to expressing certain constructs. This is most apparent in examples where the natural English form assumes the reader will understand the meaning of a phrase from context. So where in English a speaker might say ‘a river has a bank’, all three CNLs have found the need to be explicit about the interpretation of ‘has’. ACE and Rabbit both opt for ‘has-part/has part’ whereas SOS chooses to place the phrase ‘as a part’ at the end of the clause.

Probably the biggest area of difference is where the CNLs represent mathematical constraints such as transitivity. Here there is really no good solution, and here the approaches are most different. Rabbit’s approach has been to assume that no solution will really work, and so requires the reader to be educated in the meaning of such constructs or be aided by a tool. SOS and ACE both try variations on the theme of explain-through-example and tool support.

Lastly, while Rabbit explicitly endorses cooperation between domain experts and knowledge engineers, ACE does not and tries to eliminate knowledge engineers altogether, whereas SOS is neutral on this question. We conclude that there is sufficient commonality between the three CNLs described here to provide a good base from which to proceed. Looking to the future, it is our intention to systematically resolve the differences that exist, guided, where possible, by user testing.

Acknowledgment

The authors of Rabbit would like to thank Martina Johnson for her assistance in preparing and analysing the human subject tests. This research on ACE has been funded by the EC and SER within the 6th Framework Program project REWERSE number 506779 (cf. http://rewerse.net). All authors would like to thank Norbert E. Fuchs for useful comments on a previous version of this paper, and Rolf would like to thank Norbert for hosting him while on sabbatical. Special thanks go to three anonymous reviewers of OWLED2008 DC for their useful comments.

References

1. Raffaella Bernardi, Diego Calvanese, and Camilo Thorne. Lite Natural Language. In IWCS-7, 2007.
2. Tim Berners-Lee. Notation 3 — A readable language for data on the Web, 1998. http://www.w3.org/DesignIssues/Notation3.html.
3. Peter Clark, Philip Harrison, Thomas Jenkins, John Thompson, and Richard H. Wojcik. Acquiring and Using World Knowledge Using a Restricted Subset of English. In FLAIRS 2005, pages 506–511, 2005.
4. Confluence project, 2007. http://www.comp.leeds.ac.uk/confluence/.
5. Anne Cregan, Rolf Schwitter, and Thomas Meyer. Sydney OWL Syntax — towards a Controlled Natural Language Syntax for OWL 1.1. In OWLED 2007, 2007.
6. Hamish Cunningham, Diana Maynard, Kalina Bontcheva, and Valentin Tablan. GATE: A framework and graphical development environment for robust NLP tools and applications. In Proceedings of the 40th Anniversary Meeting of the ACL, 2002.
7. Norbert E. Fuchs, Kaarel Kaljurand, and Gerold Schneider. Attempto Controlled English Meets the Challenges of Knowledge Representation, Reasoning, Interoperability and User Interfaces. In FLAIRS 2006, 2006.
8. Norbert E. Fuchs, Uta Schwertel, and Rolf Schwitter. Attempto Controlled English — Not Just Another Logic Specification Language. In LOPSTR’98, 1999.
9. Adam Funk, Valentin Tablan, Kalina Bontcheva, Hamish Cunningham, Brian Davis, and Siegfried Handschuh. CLOnE: Controlled Language for Ontology Editing. In ISWC 2007, 2007.
10. Christian Halaschek-Wiener, Jennifer Golbeck, Bijan Parsia, Vladimir Kolovski, and Jim Hendler. Image browsing and natural language paraphrases of semantic web annotations. In SWAMM Workshop, Edinburgh, Scotland, 2006.
11. Glen Hart, Catherine Dolbear, and John Goodwin. Lege Feliciter: Using Structured English to represent a Topographic Hydrology Ontology. In OWLED 2007, 2007.
12. Glen Hart, Catherine Dolbear, John Goodwin, and Katalin Kovacs. Domain Ontology Development. Technical report, Ordnance Survey, 2007.
13. Daniel Hewlett, Aditya Kalyanpur, Vladimir Kolovski, and Chris Halaschek-Wiener. Effective Natural Language Paraphrasing of Ontologies on the Semantic Web. In End User Semantic Web Interaction Workshop (ISWC 2005), 2005.
14. Matthew Horridge, Nick Drummond, John Goodwin, Alan Rector, Robert Stevens, and Hai H. Wang. The Manchester OWL Syntax. In OWLED 2006, 2006.
15. Mustafa Jarrar, Maria Keet, and Paolo Dongilli. Multilingual verbalization of ORM conceptual models and axiomatized ontologies. Technical report, Vrije Universiteit Brussel, February 2006.
16. Kaarel Kaljurand. Attempto Controlled English as a Semantic Web Language. PhD thesis, Faculty of Mathematics and Computer Science, University of Tartu, 2007.
17. Esther Kaufmann, Abraham Bernstein, and Lorenz Fischer. NLP-Reduce: A “naïve” but Domain-independent Natural Language Interface for Querying Ontologies. In ESWC 2007, 2007.
18. R. I. Kittredge. Sublanguages and controlled languages. Oxford University Press, 2003.
19. Tobias Kuhn. AceWiki: A Natural and Expressive Semantic Wiki. In Semantic Web User Interaction at CHI 2008: Exploring HCI Challenges, 2008.
20. Chris Mellish and Xiantang Sun. Natural Language Directed Inference in the Presentation of Ontologies. In ENLG, Aberdeen, Scotland, August 8–10th 2005.
21. Ordnance Survey. Buildings and Places, 2008. http://www.ordnancesurvey.co.uk/ontology/v1/BuildingsAndPlaces.owl.
22. Alan L. Rector, Nick Drummond, Matthew Horridge, Jeremy Rogers, Holger Knublauch, Robert Stevens, Hai Wang, and Chris Wroe. OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. In EKAW 2004, 2004.
23. Rolf Schwitter. English as a Formal Specification Language. In DEXA 2002, 2002.
24. Rolf Schwitter, Anna Ljungberg, and David Hood. ECOLE — A Look-ahead Editor for a Controlled Language. In EAMT-CLAW03, pages 141–150, 2003.
25. John F. Sowa. Common Logic Controlled English. Technical report, draft of 24 February 2004. http://www.jfsowa.com/clce/specs.htm.

6 Encouraging Ontology Reuse

6.1 n2Mate

Title of Publication: n2Mate: Exploiting social capital to create a standards-rich semantic network
Type of Publication: Workshop Paper
Appears In: Proceedings of the Linked Data on the Web workshop (LDOW2008) at WWW2008, Beijing, China, April 22, 2008.
Publication Date: 2008
Peer Reviewed: Yes
Contributing Author(s): David Peterson, Anne Cregan, Robert Atkinson, John Brisbin
Personal Contribution: 31% (average of estimates by co-authors)


n2Mate: Exploiting social capital to create a standards-rich semantic network

David Peterson, BoaB interactive, 2/84 Denham St., Townsville, QLD Australia 4810, +61 7 4724 2933, [email protected]
Anne Cregan, National ICT Australia, 223 Anzac Parade, Kensington, NSW Australia 2052, +61 2 8306 0458, [email protected]
Rob Atkinson, CSIRO Land & Water, Lucas Heights Research Laboratories, Private Mail Bag 7, Bangor, NSW 2234, Australia, [email protected]
John Brisbin, BoaB interactive, 2/84 Denham St., Townsville, QLD Australia 4810, +61 7 4724 2933, [email protected]

ABSTRACT

A significant boost on the path towards a web of linked, open data is the establishment and promotion of common semantic resources, including ontologies and other operationalised vocabularies, and their instance data. Without consensus on these, we are hamstrung by the famous "n-squared" mapping problem. In addition, each vocabulary has its own associated attributes to do with why it was developed, what purposes it is best suited for, and how accurate and reliable it is at both a content and technical level, but most of this information is opaque to the general community.

Our theory is that it is the lack of socially-sensitised processes highlighting who is using what and why that has led to the current unmanageable plethora of vocabularies, where it is far easier to build your own vocabulary than to try to find a suitable, reliable existing one.

We therefore suggest that there is considerable value in the development of an online facility that provides a space listing vocabulary and ontology resources with their associated authority, governance and quality-of-service attributes. Presenting this in a visual form and providing pivotable search facilities enhances recognition and comprehension.

Additionally, and critically, the facility provides a focal point where discourse communities can make authority claims, rate vocabularies on various parameters, register their commitment to or usage of particular vocabularies, and provide feedback on their experiences. Through social interaction, we expect the most solid and useful vocabularies to emerge and form a stable semantic platform for content representation and interlinked knowledge. Our strategy is to become sufficiently enmeshed in the native information habits of people and their derivative institutions to reveal and collect their standards-seeking needs and activities with a minimum of effort on their part.

This paper describes a pilot facility testing the theory above. Dubbed "n2Mate", it is a novel exploitation of social networking software to provide a lightweight and flexible platform for testing the efficacy of leveraging social networks to link existing registers and 'seed' an information space focussing on the use of standards in online information management. The paper uses examples from the Australian context to provide clear illustration of the central arguments.

Keywords: Registers, vocabularies, standards, linking density, RDF graph, social networking, knowledge re-use, n2Mate, n-squared

1. Social and technical context

The current emergence of a data web has re-focussed our attention on standards. To be truly effective, the semantic web needs to evolve towards a minimum number of ontologies, highly re-used and densely interlinked, rather than a sparse network with minimal interoperability.

1.1 The standard problem with standards

The project to link open data can be realised through explicit declarations by one data source in relation to another. These "hard" linkages provide a high degree of certainty, but make data maintenance exponentially difficult as the number of hard linkages grows.

Standards, understood as nodes of agreed meaning, provide a more scalable approach to data linking. By agreeing to use the same term to describe similar ideas in our different data, we establish an implicit (semantic) linkage between our data. The project to conceive, negotiate, and promote standards, however, has proven to be even more difficult than the maintenance of hard linkages.

It is often noted, with some irony, that the great thing about standards is that there are so many to choose from... and if you can't find one you like, you can always create your own. While these sentiments provide excellent platforms for pub-based oratory, the realities are not so easily dismissed. Application designers, knowledge seekers, and agencies with a mandate to interoperate are all too familiar with the significant resource drains that occur when standards are hard to locate, difficult to apply, or confusing to distinguish between.

Standard vocabularies and data definitions have been quietly multiplying in traditional media since ancient Sumer (ca. …, Cuneiform), but in more recent times the Semantic Web has inspired a hyperbolic growth in contributions to the standards project. For instance, a search in Swoogle on the word "address" returns 12,834 semantic web documents; on "book" it returns 19,601 (at 2008-01-24). For someone seeking to exercise the efficiencies of knowledge reuse, this wealth of choice is simply overwhelming and self-defeating. The current state of affairs reveals semantic fragmentation, not … and knowledge creation.

Even within a narrow domain like the Australian government, there is a wealth of terminologies and metadata "standards" available for government agencies to consider. It is not clear if a whole-of-government survey of standards has ever been undertaken, but informal observation suggests that there are hundreds of attempts to describe very similar concept spaces.
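The "n-squared" mapping problem invoked above can be made concrete with a two-line calculation: pairwise mappings among n vocabularies grow quadratically, while mapping every vocabulary to a single agreed standard grows only linearly. A minimal sketch (the figure of 50 vocabularies is illustrative, not from the paper):

```python
def pairwise_mappings(n):
    """Mapping sets needed if every vocabulary maps directly to every other."""
    return n * (n - 1) // 2

def hub_mappings(n):
    """Mapping sets needed if every vocabulary maps to one agreed standard."""
    return n

# 50 vocabularies in one domain: 1225 pairwise mapping sets,
# versus 50 mappings to a single common standard.
print(pairwise_mappings(50), hub_mappings(50))
```

This is the quantitative case for convergence on a small number of highly re-used ontologies rather than ad hoc pairwise interlinking.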

1.2 Does anyone have a wheel like mine?

People have been trying to standardise themselves in one way or another for quite some time. The most obvious benefit of this instinct toward standardisation is communication efficiency, a direct input to the rate of knowledge creation. By speaking the same language, we can communicate and collaborate far more effectively. Yet the barriers to standardisation appear to take on new forms as fast as we evolve knowledge.

In our present age the benefits of information interoperability are now well understood, if only through their absence. Most people and institutions involved in project scoping, information product development, and online service provision clearly grasp the power of knowledge re-use and the cost efficiencies of standards-based interoperation. This assertion is supported by the existence of an entire government department whose mandate is to promote effective and efficient information sharing, governance structures, tools, methods and re-usable technical components across the Australian Government.

The Australian Government Information Management Office (AGIMO) published a Government Architecture Reference Model that discusses "...a repository of architectural artefacts (including standards, guidelines, designs and solutions) that may be utilised by agencies to deliver an increasing range of Whole of Government services."

In practice, however, we find that the task of identifying and verifying the suitability of existing artefacts is simply too time-consuming. As a consequence, there are a great many ontologies and informal vocabularies used by a very limited number of organisations or agencies, with a great sparsity of intermappings between them, even though there is a very large amount of crossover in terms of content.

More globally, the Linking Open Data (LOD) project [1] holds datasets that currently comprise over 2 billion triples but only about 3 million links (SWEO, 2007), so overall the graph is very sparsely interconnected [2].

In many ways the current situation is akin to a train network that has millions of stations (nodes) covering the same area (knowledge domains) but with a great sparsity of tracks (mappings) between stations, and hardly any trains and passengers (services, publishers, agents, users) running on the vast majority of them. Our experience with efficient rail networks shows that we want to reach a necessary minimum of stations interconnected with an optimised number of tracks, and attract a maximum number of trains to utilise the infrastructure. This obviously gives us a far more robust and useful semantic network to traverse.

In related research, it should be possible to show how the density of interconnectedness in the RDF graph improves the efficiency of machine process operation without producing a debilitating level of ambiguity. We would argue that the degree of interconnectedness implemented between ontologies can be taken as a proxy indicator of interoperability across the knowledge domain.

1.3 Scalable register networks

As we have argued, there are many technical standards and common policies in use across a wide range of government activities, but the very number of such activities and standards is in itself posing a significant challenge. AGIMO and others have a role in promoting the use of common approaches, but it is increasingly difficult to track which standards apply to which set of problems.

In general, there is an issue about the scalability of any approach for improving interconnectedness. We believe that the most promising strategy is to utilise registers to hold metadata about standards and their implementation, including records of organisations, projects, standards, and controlled vocabularies (and associated people and roles). A network of such registers, coupled through normal web services mechanisms, has the potential to form a semantic fabric that addresses the business-level needs of people and institutions. Whilst this is potentially a vast undertaking, the bulk of target information already exists, and there are already a great many people actively tasked with identifying, using and promoting standards. These people are likely to be receptive to an effort such as n2Mate.

A network of registers, supported by a "register of registers", addresses the most important questions: who is doing what, which standards are relevant, who can I talk to, what is the governance model for these artefacts, and how trustworthy is the source. Through a richly populated network of registers, these become questions any organisation can rapidly address, and in doing so can promote commonality of approach within and amongst various discourse communities.

1.4 Socially-sensitive metadata

One of the dark secrets of the machine-based knowledge project is the enormous loss of content as we move from people's minds to their documents and datasets. David Snowden, amongst many others, has pointed to the impossibility of "collecting" knowledge from people without providing a meaningful context: "Human knowledge is deeply contextual, it is triggered by circumstance and need, and is revealed in action. ... To ask someone what he or she knows is to ask a meaningless question in a meaningless context. Tacit knowledge ... comes about when our skilled performance is punctuated in new ways through social interaction" [3].

A socially-sensitised strategy provides the meaningful context and familiar atmosphere that people require before they can (or will) reveal their knowledge in a useful way. We suggest there is a cluster of persistent problems in complex information spaces that can be socially characterised as follows:

Who and what:
- Owner: Who owns it?
- Creation: Who created it?
- Maintenance: Who is responsible for maintaining it?
- Domain: Which domains is it relevant to? This will include a number of different ways of considering domains.
- Usage: Who uses it?
- Endorsement: Who endorses it? This will include various parameters and a rating system.
- Processes: What business, government or other processes is it used in? What role does it play?
- Governance: Who is in charge of it? Who has formally agreed to support, maintain, and implement it?
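The "who and what" attributes above can be captured as a minimal register record. The field names and the sample entry below are illustrative assumptions only, not the n2Mate schema:

```python
from dataclasses import dataclass, field

# Minimal register record for a semantic artefact, carrying the
# socially-sensitised metadata discussed above. All field names and
# sample values are hypothetical, not taken from the n2Mate pilot.
@dataclass
class RegisterEntry:
    uri: str                    # identifier of the artefact
    owner: str                  # Owner: who owns it?
    creator: str                # Creation: who created it?
    maintainer: str             # Maintenance: who maintains it?
    governance: str             # Governance: who is in charge of it?
    domains: list = field(default_factory=list)      # relevant domains
    used_by: list = field(default_factory=list)      # Usage
    endorsed_by: list = field(default_factory=list)  # Endorsement

entry = RegisterEntry(
    uri="http://example.org/vocab/reef-regions",  # placeholder URI
    owner="GBRMPA", creator="GBRMPA", maintainer="GBRMPA",
    governance="agency mandate",
    domains=["marine science"], used_by=["ReefSurvey"], endorsed_by=["AGIMO"],
)
print(entry.owner, entry.domains)
```

A network of registers populated with records of this shape is what would let an organisation ask "who owns it, who uses it, who endorses it" without a manual survey.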

Quality of Service Parameters:
- Provenance: What guarantees are there that the information is accurate and verified?
- Currency: How often is it updated? What guarantees are there that it is up to date?
- Availability: What guarantees are there regarding the availability of the vocab, dereferencing considerations?

Other Considerations:
- How does it relate to other standards in the space?
- User experiences

2. Social architectures and semantic networks

The principal social platform techniques we seek to exploit include:
- Popularity Rankings: the number of times a standards artefact is referenced (implemented).
- Authority Badges: a mechanism to advertise an authority claim over a standards artefact.
- Related to ("Friends of a Standard" (FOAS)): linkages from standards artefacts to their cohort of implementers.
- Trust ratings: showing satisfaction with the custodian of a standards artefact.
- Hero worship: most interlinked, most trusted, most useful.

Each of these techniques has corresponding interface strategies that provide a powerful social platform in which people (and institutional roles) can operate quite naturally. Each of these techniques also forms a search facet that can be traversed with high-efficiency faceted search and browsing tools.

2.1 Use Case

A simple use case will help us set the stage for describing the technical architecture proposed. A researcher is preparing her research plan on a section of the Great Barrier Reef. Although she is an experienced marine scientist, she is new to the GBR and to her host research facility. She suspects she should be using:
- standard naming conventions for the GBR regions;
- standard identifications for the particular reefs;
- standard data sampling techniques appropriate to the Australian tropics;
- standard data formats, enumerators, and vocabularies in her datasets;
- standard citations of agencies, programmes, and people referenced in her work;
- standard metadata fields and vocabularies to describe her research output;
- standard project management practice in reporting on her project's progress.

In the absence of a useful standards locator, it is not likely that she will achieve a high standard of conformance to the norms of her discourse community. In the absence of a socially-sensitised register space, it is not likely her discourse community is actively sharing their experience and wisdom with standards.

2.2 Instance Data

The facility needs to be designed around a sufficient minimum of predicates that embody the "business logic" of the facility and establish the semantic armature we require for inferencing. We propose the following [shows predicate] as a starting point:
- Organisations are [responsible for] people, projects, standards, and vocabularies
- People are [associated with] Projects
- Projects are [implemented by] Standards
- Standards are [expressed with] vocabularies
- Trust or utility of Standards are [ranked by] People

Using these indicative predicates as a starting point, we can answer a matrix of discovery questions through faceted visualisation. In each search operation, the user can rotate to a facet of interest to continue the discovery process.
- I know someone like me [PersonName] > What projects are they associated with?
- Those projects are like mine [ProjectName] > What standards are used in them?
- Those standards are of interest [StandardName] > How can I decide which one is most appropriate for me?

The logic described here is possible because we have imposed a limited set of predicate types. These types are native to the n2Mate facility. To take advantage of existing social networks that utilise other predicate types, Semantic Web vocabularies such as SIOC and FOAF will be used.

The facility will also consider structured lists of resources, like a list of country names available from the same address, to itself be a kind of register. For instance, many applications need a list of every valid country name for users to input their address information. The ability to reference an external source that is authoritative, accurate, up-to-date, and reliably available and dereferenceable reduces the need for application maintenance.

The metadata held in these registers can be typed according to existing conceptualisations. For example, the National Data Network draws on ideas from the Metadata Open Forum to classify their metadata as: Discovery metadata; Quality metadata; and Definitional metadata.

We note that the semantic register network can also list web services in addition to typical standards artefacts such as ontologies and vocabularies. We intend to specifically tune this facility to the needs of government and community agencies that have a mandate to participate in the creation and maintenance of highly effective approaches to service improvement.

3. Implementation options

A demonstrator version of n2Mate can be established using readily available tools and datasets, so that a more detailed critique can be pursued with a minimum of upfront overhead. In this section we discuss some of the more promising approaches.
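The indicative predicate set proposed in §2.2, and the pivoting discovery walk it enables, can be sketched as a toy in-memory triple store. Every entity name below (Alice, ReefSurvey, ISO19115, and so on) is a hypothetical example, not data from the n2Mate pilot:

```python
# Toy in-memory triple store over the indicative n2Mate predicates.
# All subjects and objects here are invented placeholder names.
triples = {
    ("Alice", "associated_with", "ReefSurvey"),
    ("Bob", "associated_with", "ReefSurvey"),
    ("GBRMPA", "responsible_for", "ReefSurvey"),
    ("ReefSurvey", "implemented_by", "ISO19115"),
    ("ReefSurvey", "implemented_by", "DarwinCore"),
    ("ISO19115", "expressed_with", "GeoVocab"),
}

def objects(subject, predicate):
    """Return all objects o such that (subject, predicate, o) is asserted."""
    return {o for (s, p, o) in triples if s == subject and p == predicate}

# Pivot 1: "I know someone like me > what projects are they associated with?"
projects = objects("Alice", "associated_with")

# Pivot 2: "Those projects are like mine > what standards are used in them?"
standards = set().union(*(objects(p, "implemented_by") for p in projects))

print(projects)
print(standards)
```

In the architecture discussed above, a Sesame triple store queried via SPARQL would play the role of this dictionary, and each pivot step corresponds to rotating to a new search facet.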

3.1 Key components

The registration process, and maintaining a network of linked objects, is the function of traditional registry technologies, such as ebXML Registry. Navigating and efficiently querying the contents and relationships is not well supported by this environment. It is proposed to automate the harvesting of object relationships from the "Register of Registers" into a triple-store. This is the same pattern found in data-mining, where transactional database content is restructured into generalised query-oriented structures. For our purposes, automated discovery of patterns is not the focus: fast, efficient visual presentation is essential. Users will be parsing through extensive data structures, and may need to propose and refine their discovery logic in quick, exploratory sorties.

Triple-store: Sesame
Sesame could provide the backend triple store, graph manipulation, RDF inferencing, and remote SPARQL endpoint access.

Policy layer: PLING
The development of robust approaches to policy negotiation is currently being driven by a W3C discussion group. The n2Mate project could field-test various strategies for handling issues of personal privacy, information reuse, and access control.

Trust and Governance: POWDER
POWDER is the W3C's Protocol for Web Description Resources, currently in development. Governance is related to the idea of trust. In the context of this project, we want to explore two aspects of governance:
1. How to make it easy for agencies who have a mandate to be an authority for some asset to discharge their duty in an efficient and useful way.
2. How to provide users with a suite of trust measures that will allow them to evaluate the qualities of a particular asset in relation to their needs.
POWDER seeks to develop a mechanism through which structured metadata can be authenticated and applied to groups of web resources. POWDER provides us with a means to both retrieve information about a block of Web Resources and authenticate that this information may be attributed to the owners of the information.

Visualisation and facet search: Gnizr + Solr
We want a tool that thinks natively in URIs and triples. Gnizr is an open source front end that handles user account management, bookmarking, tagging, and … Every object stored by Gnizr is a bookmark (URI), and the folksonomy tag interface is SKOS-enabled. Solr is an open source enterprise search server based on the Lucene Java search library, with XML/HTTP and JSON APIs, hit highlighting, faceted search, caching, replication, and a web administration interface. Solr could be used to facet the data into searchable and browseable components. For example, if users are interested in what ontologies Sun Microsystems is using, they select Sun from the 'Who is Using' facet. The other facets instantly re-order and re-number themselves, and the user is free to further refine the results by selecting additional facets. Faceted search visualisation can be negotiated through cluster maps (eg, Aduna) with a high degree of efficiency.

Semantic interpretation: MOAT
MOAT (Meaning of a Tag) could serve as the basis for giving extended quality of information to free-form tagging. This will allow users of the bookmarking system to have the flexibility of folksonomy and the interlinked structure of the Semantic Web. The added benefit is that MOAT is a distributed system and can tap into other servers to give extended meaning to free-form tags.

Figure 1: n2Mate Conceptual architecture

3.2 Testing the system with existing resources

There are already many semantically rich registers implicit in the operations of government, including the identifiers of government agencies, registers of company names, standards recognised by Standards Australia, legislation and regulations, and management areas for land, water, soils, health, etc. This represents a wealth of entities about which assertions can be made, to create a semantically rich environment.

Semantic Web data can be roughly broken down into 3 levels [2]:
1. Vocabulary / Ontology

2. Individual occurrences of those terms and actual instances of non-information resources
3. The links that tie the vocabularies to their occurrences

All three of these need to be captured with adequate provenance data to bootstrap n2Mate. The following web services can be utilised to populate/update information as well as add important metadata to the Register of Registers component of n2Mate:
- … : A gateway to the Semantic Web, focusing on: semantic data quality; relations between ontologies; access to semantic data registers.
- Talis Schema Cache: Cross-linked and navigable index of ontologies and vocabularies.
- Swoogle: Search engine for Semantic Web artefacts.
- Sindice: Indexes the RDF web and pulls out the triples. From there it essentially creates a reverse lookup.
- Falcons: Currently indexing 34,566,728 objects (2008-02-01). Provides bi-directional resource linking.
- Ping the Semantic Web: Archives the location of recently created/updated, web-accessible RDF.

3.3 Data harvesting and processing

n2Mate can leverage existing search engine services, such as those listed above, to collect data instances from target registers and sources. Many of these have or are developing APIs that facilitate direct access to their collections and service points. Where well-formed registers and artefact collections exist already, n2Mate could establish harvesting relationships (presumably through appropriate API arrangements). OWL files, RDF data dumps, and SPARQL endpoints could be pointed to the n2Mate system for automated data fetching and processing.

Additionally, trust algorithms would be created from graph inferencing, metadata and social data to further guide the prospective n2Mate user, allowing them to more quickly determine the best artefact to use in their situation. This will be an evolving process that will occur over time as the quality of data and user interactions flows back and forth.

4. CONCLUSION

The unique aspect of this proposal is that it leverages the hidden formal and informal knowledge networks created by existing business processes, and marries this information with social networking models to provide a useful way of organising and navigating the wealth of available information. It uses the community of people using vocabularies to empower others, starting with the places where agreements already exist.

n2Mate provides a tool that encourages the use of standardised artefacts by exposing existing registers, leveraging social networks, and building a central reference point for users that will assist them to identify relevant semantic assets for their needs, choose amongst them, and feel confident about their utilisation.

Further research into the strategy proposed should provide contributions to related projects, such as the development of:
- A lightweight mechanism revealing the state of interconnectedness in and between discourse communities.
- A bridging space between government, business, community, academia and science knowledge assets to enhance broadscale interoperability.
- A genetic algorithm to breed, select, and hybridise various standards artefacts such as ontologies, services, and trust authorities.

In conclusion, we suggest that there is currently a significant level of inefficiency in the applied domain of project scoping, information product development, and online service provision due to the inadequacy and irrelevance of existing knowledge. We further suggest that a promising solution strategy involves using the power of social networks, coupled with semantic discovery and visualisation tools, to create a socially-sensitised semantic network of standards registers.

5. REFERENCES

5.1 Citations
[1] C. Bizer, T. Heath, D. Ayers, and Y. Raimond. Interlinking Open Data on the Web (Poster). In 4th European Semantic Web Conference (ESWC2007), pages 802-815, 2007.
[2] M. Hausenblas, W. Halb, Y. Raimond, and T. Heath. What is the Size of the Semantic Web? Metrics for Measuring the Giant Global Graph, 2007.
[3] Snowden, Dave. Information vs Knowledge. http://www.rkrk.net.au/index.php/Information_Vs_Knowledge

5.2 Resources
Aduna: http://www.aduna-software.org
AGIMO, GovDex: http://www.agimo.gov.au/services/GovDex
Falcons: http://iws.seu.edu.cn/services/falcons/
Gnizr: http://www.gnizr.com
Metadata Open Forum: http://metadataopenforum.org/
MOAT: http://moat-project.org/
National Data Network: http://www.nationaldatanetwork.org/
Ping the Semantic Web: http://pingthesemanticweb.com/
Policy Language Interest Group: http://www.w3.org/Policy/pling/
POWDER: http://www.w3.org/2007/powder/
Sesame: http://sourceforge.net/projects/sesame/
Sindice: http://sindice.com
SIOC: http://sioc-project.org/
Solr: http://lucene.apache.org/solr/
SWEO IG: http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
Swoogle: http://swoogle.umbc.edu
TALIS Cache: http://schemacache.test.talis.com/
Watson: http://watson.kmi.open.ac.uk/Overview.html

5.3 Special thanks
Renato Iannella provided background thinking on the Policy Aware Web. Alan Ruttenberg of the Science Commons and Tom Heath of the Linking Open Data project provided encouragement and wisdom from their broad experience. Steve Matheson from the Australian Bureau of Statistics corroborated our intuition that social platforms could play an important role in standards adoption.

7 Foundational Issues in Meaning

7.1 Towards a Science of Definition

Title of Publication: Towards a Science of Definition
Type of Publication: Workshop Paper
Appears In: Proceedings of the Australasian Ontology Workshop (AOW05), Sydney, Australia. CRPIT, 58. Meyer, T. and Orgun, M. A., Eds., ACS. 75-82, 2005
Publication Date: 2005
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan
Personal Contribution: 100%


Towards a Science of Definition

Anne M. Cregan 1,2
1 Artificial Intelligence Group, School of Computer Science and Engineering, University of New South Wales
2 Knowledge Representation and Reasoning Program, Kensington Laboratory, Sydney, National ICT Australia

223 Anzac Parade, Kensington, NSW 2052, Australia
[email protected]

Chapter 7 Foundational Issues in Meaning

Abstract

The vision of the Semantic Web is to provide machine-processable meaning for intelligent applications. Whilst knowledge representation structures like ontologies now have well-developed formalisms, the issue of determining or specifying exactly what it is that they represent is still not well-understood. However, it is crucial for validation, merging and alignment, as we cannot possibly hope to judge the accuracy or applicability of a representation structure without a clear specification of what it is intended to represent. This being the case, we must either accept that our representations will have a limited applicability and lifespan, or develop methods by which we can define our terms in a robust and standardized way. Building on philosopher Richard Robinson's analysis, it is argued that 'definition' is in fact the isolation of territories within conceptual landscapes, using four mechanisms: example, semantic relation, analysis, and rule. These mechanisms are related to cognitive processes like abstraction and categorization. We speculate that there is a common semantic ground which forms the initial basis for symbol grounding, and is then extended through cognitive mechanisms. Some starting points for identifying common semantic ground and points of divergence from it are suggested.

Keywords: Semantic Web, Ontology, Definition.

1 Why do we need a Science of Definition?

A key consideration in constructing ontologies, or indeed any form of knowledge representation, is to be clear about what it is, exactly, that the elements of the representation are to represent, and to be able to communicate this precisely. Without this, any claim that the structure is a representation at all is tenuous at best. Yet, whilst formal knowledge structures and ontology languages are now well-developed in terms of their ability to express the relationships between and attributes of their elements, and to apply reasoning algorithms on this basis, there is currently no accepted methodology for "grounding" the elements themselves: that is, for objectively establishing exactly what the elements of the ontology are representing. However, when attempting to validate, align or merge formalisms, this information is crucial. We cannot hope to determine whether any one ontology, let alone an alignment or merging of more than one ontology, is correct, without judging whether the referring relations are correct, in the sense of matching the intentions of the authors of the ontology(ies).

But how are such intentions to be clearly stated? As there is currently no clear process for this kind of specification, such a judgement usually requires human intervention, often in the form of a dialogue or negotiation between the various parties responsible for each ontology. As is to be expected, the exercise is often labour-intensive, and it may be quite difficult to establish exactly what is meant by a term if the underlying mind-sets are divergent.

Yet the vision of the semantic web is to make meaning machine-accessible and processable. Therefore, we need to establish a method by which we can formally state the meanings of terms, so that no further human intervention is required. Such a methodology would constitute a "science of definition". However, the principles by which one can relate a term, whether a symbol or a word from natural language, unambiguously and unequivocally to a thing, have not yet been established. Or, stated another way, there is not yet a sufficient understanding of the representation relationship. In fact, many are unsure whether it is even possible to establish such principles. It is often claimed (eg Thomas 2005) that definition is an art, not a science, and surveying the field up to this point, one has to agree. But if this is always to be the case, we are looking at a very challenging road ahead for the semantic web, if multiple parties must engage in personal negotiations every time interoperability of metadata needs to be achieved.

The purpose of this paper, then, is to investigate whether there can in fact be a science of definition, and if so, on what principles it would be based. The paper is structured as follows: Section 2 explores what it means to define something, and the different methods by which this may be done, drawing heavily on philosopher Richard Robinson's analysis (1950). Section 3 argues that Robinson's methods isolate territories within conceptual landscapes, using the mechanisms of example, semantic relation, analysis, and rule, and relates these to underlying cognitive processes like abstraction and categorization. It claims that common semantic ground forms an initial basis for symbol grounding, which is later extended through cognitive mechanisms. Starting points for investigating this common semantic ground and the points of divergence from it are suggested. Conclusions as to the plausibility of a science of definition are drawn in Section 4.

Copyright © 2005, Australian Computer Society, Inc. This paper appeared at the Australasian Ontology Workshop (AOW2005), Sydney, Australia. Conferences in Research and Practice in Information Technology (CRPIT), Vol. 58. T. Meyer and M. Orgun, Eds. Reproduction for academic, not-for-profit purposes permitted provided this text is included.
2 Dimensions of Definition

It is commonly accepted that in setting up any formal system, such as an ontology, one must carefully define one's terms. Therefore, let us firstly examine the concept of definition itself, to see what it has to offer us, and alternately, how it might mislead us in our endeavour.

The concept of definition has had a long and tortured history dating back to the ancient Greeks, and in its long history, many strands have become tangled together, so that 'definition' has come to mean many things to many people. In untangling the various strands, the work of Oxford philosopher Richard Robinson (1950) is particularly enlightening, and this section draws heavily on Robinson's analysis to isolate the dimensions of definition and draw attention to some salient points.

Robinson recognized that the word 'definition' had been used by many people for many different things, often in ways which were inconsistent, even by the same person. From an analysis of usages of the term, he distilled several key distinctions, which we will refer to here as dimensions.

2.1 Dimension A: What is Being Related?

The first distinction drawn by Robinson is that 'definition' has been used in attempts to relate completely different kinds of entities. He found that 'definition' has been used variously to describe relations between:

A1. Words and other Words
Eg "Let i be the square root of minus 1."

A2. Words and Things
Eg "'Red' is the colour of my shirt."

A3. Things and Things
Eg "E = mc2"

Robinson refers to A1 and A2 as 'Nominal Definition' and to A3 as 'Real Definition'. Although he refers to 'words' in the categories above, he intends this to include all written and spoken symbols. By analysing the three kinds of relation, Robinson concludes that what is attempted under the guise of 'definition' is a fundamentally different activity in each of the three cases. Whilst all three activities are valuable and useful, Robinson ultimately concludes that only A2 should lay claim to being called "definition".

Robinson's reasoning may be summarised as follows: a relation between words and other words (A1) is a substitution relation which may give brevity and convenience, but makes no logical contribution: it could be omitted and make no difference to the representation or logical structure, merely rendering it more verbose and therefore more difficult to understand.

On the other hand, a relation between things and things (A3) is the main means by which scientific understanding is gained: its very purpose is to make logical contributions. Robinson refers to A3 as 'real definition', but his ultimate conclusion is that it should not be referred to as "definition" at all, because it is in fact the analysis, synthesis and improvement of concepts. That is, it is the activities of finding and describing structure within concepts, finding and describing concepts as parts of larger structures, and improving the fidelity of these finding and describing activities. It encompasses the purpose of scientific endeavour very well, but does not tell us what we mean by something. For example (mine, not Robinson's), whilst scientific theory may constantly revise and improve our understanding of the nature of light, it does not tell us what is meant by 'light': this is something that is established before the scientific investigations begin. If we establish that light corresponds to a certain band of wavelengths in the electromagnetic spectrum, this is certainly useful to know, but does not define what light is: further down the track we might hypothesise that light is energetic subatomic particles called 'photons' and not just a wave, and yet we are still talking about exactly the same entity 'light'.

Thus in Robinson's 'real definition', we are not defining a thing, but attempting to describe its relationships with other things. Whilst we may create a formal system with the aim of capturing logical or empirical relations between things, we must necessarily in such a system use symbols to represent these things. Indicating what the symbols refer to extends outside the system itself to the domain of things through Word-Thing relations (A2). Robinson's conclusion is that only relations between words and things (A2) give information about what a symbol is referring to, or denoting. He recommends that only this activity should be considered to be "definition". He then goes on to discuss further ways in which Word-Thing relations may be distinguished.

2.2 Dimension B: (Word:Thing) Are we Reporting an Existing Use of a Word, or Establishing a New Usage?

Robinson's analysis of "Word-Thing" (A2) definition revealed two variants:

B1. Lexical: the reporting of an existing or past actual usage of a term eg "'compound fracture' is a medical term meaning a broken bone which punctures the skin"

B2. Stipulative: the specification of how a term is to be used in future in a particular context eg "where for the purpose of our analysis 'grue' means 'green before the year 2020 and blue after the year 2020'"

Lexical definitions seek to pinpoint what was meant by some word to someone at some time. These definitions reflect the way language is used: the definitions may be intuitive or systematic, but often are not, as they have an organic, evolutionary aspect to them as word meanings shift and adapt over time.
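A stipulative (B2) definition lends itself directly to a mechanical reading: the stipulation fixes a precise rule of use that a machine could apply without further human intervention. As a purely illustrative sketch (not part of the original analysis; the function name is ours), the 'grue' stipulation might be rendered as:

```python
# Illustrative sketch only: a stipulative (B2) definition rendered as code.
# Within this context, 'grue' is stipulated to mean green before the year
# 2020 and blue thereafter; the stipulation binds future use of the term.
def grue_colour(year: int) -> str:
    """Return the colour that 'grue' denotes in the given year."""
    return "green" if year < 2020 else "blue"

assert grue_colour(2019) == "green"
assert grue_colour(2021) == "blue"
```

The point of the sketch is that a stipulation, unlike a lexical report, is fully explicit: nothing about past usage needs to be consulted to apply it.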
Dictionaries may be considered to be compilations of lexical definitions, as they are intended to assist listeners/readers and speakers/writers in using the language being defined. Obviously when contemporary dictionaries are published, there is an assumption that the entries will continue to be accurate at least for the foreseeable future, but lexical definition should always be recognised as being in the context of who used the term and when: it is an historical report. Meanings shift over time, words come in and out of usage, whole languages come and go. Dictionary definitions of words are not god-given or absolute: they are a social agreement at a particular time and place amongst particular people, and thus belong to history rather than science.

However, in particular fields such as mathematics, or where a high level of precision is needed, stipulative definition may be used to specify that a certain term will be used only to mean a certain precise thing within a given context. This may be a broadening or narrowing of an existing sense of a term; a completely new usage of an existing term; or may involve the creation of a totally new term. It is a specification of how the author intends to use the term in future within a particular work or context, and binds the author to making good on that intention.

Whilst Robinson makes it clear that he considers 'definition' to be a relation between Words and Things, there is an interesting point to be made that Robinson does not identify explicitly: do we define words, or things? It seems that lexical definition takes an existing word and attempts to circumscribe the things to which it applies, whilst stipulative definition circumscribes some area within the domain of things and assigns it to a particular word. Robinson gives his exposition of methods of definition (see Section 2.3) independently of this distinction: he does not differentiate between methods for defining words and methods for defining things. Each of Robinson's methods has been shown diagrammatically in an effort to illustrate this point, and we will later argue for a theory of definition in which mapping in both directions is essential.

2.3 Dimension C: (Word:Thing) Methods of Definition

Robinson argues that all words are definable, in the sense of it being possible, at least theoretically, to establish what they indicate(d) to someone at some time. In fact, for something to qualify as a word at all, it must have had some meaning to someone at some time, otherwise it would be only a noise or a meaningless sequence of letters. Thus, in Robinson's view, there are no indefinable words.

Let us briefly cover Robinson's arguments against indefinability. Firstly, some have argued that since dictionaries merely relate words to other words, certain words must be indefinable: there must be some indefinable core vocabulary. Robinson points out, however, that there are non-verbal methods of definition, such as the ostensive method, and that by including such methods all words are definable. Using these methods it is possible to go beyond the lexicon, and we will argue that these non-verbal methods are essential for the grounding of language and critical for the initial stages of (first) language learning.

Secondly, it has been argued that some particular concepts of a philosophical nature, like "beauty" or "good", are intrinsically indefinable. Robinson's response is that this is not claimed because we cannot establish what they refer to, but because we encounter difficulties when attempting to understand their structure and essential nature, which may (perhaps) be unanalysable. Thus, such a claim confuses 'real definition' with 'nominal definition', but unanalysable does not equal indefinable. Although one may be unable to establish valid Thing:Thing (A3) relations for such concepts, we may certainly establish what the word refers to by using a Word:Thing (A2) method of definition. (It is my personal belief that such unanalysable concepts are of fundamental importance in understanding the ultimate motivators of human behaviour, in a manner akin to George Kelly's personal construct psychology (1955).)

In all, Robinson identified seven methods by which Word:Thing "definition" could be established. He did not claim the division was "exhaustive, exclusive, or the only useful one": each method is simply a means of communicating to a learner what the word refers to, ie indicating what the word means. Not all methods are applicable in all cases, and in some cases a combination of methods might be needed. The seven methods of Word:Thing definition Robinson described follow.

C1. The Method of Synonyms

This method gives the meaning of a word by giving the learner a synonym for the word, with which she is already familiar eg "'Chien' means 'dog'".

Figure 1: The Method of Synonyms

Whilst the method requires a Word-Word correspondence (A1), Robinson includes it as a Word-Thing method (A2), as the synonym is being used to identify a thing.

C2. The Method of Analysis

This method consists of referring the learner to a thing by giving an analysis of it: eg "An 'octagon' is a polygon with eight sides". If the learner knows the meanings of the other words, and is able to construct the concept in the way intended by the speaker, the learner will then know both what constitutes an octagon, and what the word 'octagon' refers to.

Figure 2: The Method of Analysis

This method thus has the advantage of imparting knowledge of the thing via the analysis, at the same time as giving the referring relation for the word. However, it is useless in cases where analysis is not applicable, for instance for words referring to particulars. A particular, such as a person, cannot be defined by a list of specific characteristics, no matter how long, as the list could always logically apply to some other particular.

C3. The Method of Synthesis

This method identifies what a word refers to by indicating the relation of the thing indicated to other things eg "'orange' is the colour in between yellow and red".

Figure 3: The Method of Synthesis

The thing is indicated by saying where it can be found, or what causes it: it is assigned its place in a system of relations and synthesised as a part of a whole with other things. Robinson notes that this method is indicative only; it is like saying "John is the tallest man in the room" to identify which is John. He therefore seems to imply that in order to be applicable, this method requires a background context, which seems to involve some intended frame of reference with a particular structure. Robinson also notes that the method enables great precision in indicating things where the two previous methods would fail, and is particularly useful for primary modalities of sense like 'green', 'soft', 'sweet' and 'middle C'.

C4. The Implicative Method

This method defines a word by putting it in a context that defines its sense: one may determine its meaning via the meanings of the other words eg "A square has two diagonals, each of which divides the square into two right-angled isosceles triangles".

Figure 4: The Implicative Method

If one knows the meanings of all the other words, the meaning of 'diagonal' is implied. In this method the word being defined is used rather than mentioned. Consequently, unlike the other methods, this method is not based on an equivalence relation. Usually, however, the statement could be reworded into another method of definition which is based on an equivalence relation.

C5. The Denotative Method

This method defines a word by mentioning known examples of things the word applies to eg "A 'bird' is things like swans, robins, geese and hens."

Figure 5: The Denotative Method

Sometimes a thing is exactly a finite collection of examples, so a denotative definition is complete: for instance, this is the case for the connectives in symbolic logic, which are wholly defined by their truth tables. Usually, though, the list is incomplete and the learner is left to deduce an underlying connotation: that is, to take the necessary steps to abstract the concept, so that a previously unseen example could be classified. It would seem that such definitions would often require an iterative process of trial and error involving revision, broadening, narrowing etc, in order to ascertain the exact denotation of the word. In this example, the learner might initially hypothesise, for instance, that a bird is anything that has wings, and subsequently revise this notion when informed that a bat is not a bird.

C6. The Ostensive Method

Unlike the five methods above, this method does not rely on words alone but indicates a thing by drawing attention to it in some way, for instance by pointing to an actual thing, drawing the thing, or using demonstrative words either in context or in the absence of the physical object eg "'Geese' are the kind of bird we saw yesterday at the lake".

Figure 6: The Ostensive Method

In other respects, the ostensive method resembles the denotative method as it works through giving examples.
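The observation under C5 that the logical connectives are wholly defined by their truth tables can be made concrete: a complete denotative definition is a finite lookup table, whereas an incomplete one (like 'bird') leaves the learner to abstract a connotation and revise it against counterexamples. A minimal illustrative sketch (the names are ours, not Robinson's):

```python
# Illustrative sketch: a denotative definition that happens to be complete.
# The connective 'and' is exhausted by the four rows of its truth table,
# so listing the examples *is* the definition.
AND = {
    (True, True): True,
    (True, False): False,
    (False, True): False,
    (False, False): False,
}

def conjoin(p: bool, q: bool) -> bool:
    """Apply 'and' purely by looking up the denotative definition."""
    return AND[(p, q)]

# By contrast, a denotative definition of 'bird' by examples is incomplete:
# the learner must abstract a connotation, and revise it when told that a
# bat, though winged, falls outside the territory.
bird_examples = {"swan", "robin", "goose", "hen"}

assert conjoin(True, True) is True
assert conjoin(True, False) is False
```

The lookup table needs no abstraction step at all; the example set for 'bird' cannot classify an unseen case without one.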

C7. The Regular Method

Robinson points out that the six methods above are suitable for names, where a name is "a word appointed to mean always some one and the same thing, whether a particular or a general thing". However, some words are not names: for example, demonstrative words like 'this', 'I', 'yesterday'; conjunctions and logical operators like 'and', 'not', 'but', 'who'; and words whose instances do not have any common quality, but all have a particular relationship in common eg "Swiss citizenship". In these cases the regular method is used: the meaning of these words is given by rule eg "'I' is to be used by each utterer to indicate himself". In practice, the rules may be either given explicitly, or deduced by the learner through exposure to examples.

3 Charting the Conceptual Landscape

We will argue that underlying Robinson's methods of definition are four fundamental mechanisms which are mental in nature. These are: example, analysis, semantic relation to some already grounded thing, and rule. Firstly, however, let us clarify whether Robinson's methods of definition actually relate words with things, or with our concepts of things.

3.1 Things or Concepts?

Robinson's methods of definition were stated to be Word-Thing (A2) relations. Although Robinson did not note it, by working through some examples as presented in the previous section, it seems clear that the mapping can go either way: some methods map from Words to Things; others from Things to Words; and some seem to do both at the same time.

However, are the Word-Thing relations really mapping words to and from the domain of things, or are they actually mapping words to and from the domain of concepts of things? If we can establish our claim that Robinson's methods are underpinned by the four mechanisms of example, analysis, semantic relation to some already grounded thing, and rule, then clearly the latter three are mental operations and must therefore be mapping words and concepts. In the case of example, if the thing is not physically present, then the learner must be relating a word to some mental placeholder that is standing for the thing. On the other hand, if the thing is physically present, there must still be some mental placeholder if the mapping with the word is to last when the thing is no longer physically present.

Therefore, we claim that Robinson's methods relate words not to things, but to concepts of things. We will now proceed to illustrate that his methods of definition are in fact instruments for isolating or marking out a piece of a conceptual landscape using the four mechanisms. Some of the mechanisms also involve mental operations to construct and transform conceptual landscapes prior to such an isolation.

3.2 Definition as the isolation of areas in the conceptual landscape

Our argument is as follows. Firstly, let us recognize that there is no relationship between symbols and things, other than that mediated by us humans. A tree is only a 'tree' if humans make it so. The symbol-making process involves having some mental representation of a thing, and giving it a name of our choosing. The name then has 'meaning' by virtue of its connection to the conceptual landscape. By setting up the same mappings between names and mental representations in others, we set up a basis for communicating the contents of individual mental landscapes: once the mapping between word and conceptual landscape is set up in the mind of the listener, the word may be used by a speaker to evoke the corresponding mental territory in the mind of the listener. Once evoked by a word, the mental territory provides a frame of reference for processing other information provided by the speaker. In this way, the process can be bootstrapped: once some initial mappings are achieved, they may be used in constructing others.

The method gets off the ground initially because most human mental landscapes are similar enough prior to the commencement of language learning that we can identify common landmarks, and use them as reference points. Consequently, language learning is the process of associating symbolic markers (words) with areas in the conceptual landscape. The process commences in early childhood, where language learning draws heavily on the ostensive method: conceptual representations of things directly experienced are mapped to words, under the guidance of a language speaker, usually a parent. By tracing the stages of language learning, we are able to identify four mechanisms and explain why it is that Robinson's methods of definition are effective.

3.2.1 Examples

• Acquiring rudimentary abstraction
• Acquiring symbolisation

The process of language learning commences by relating example stimuli to symbols. Initially, the conceptual landscape arises naturally through interactions with the world, and is directly related to physical sensations. The infant perceives a collection of colours, lines, sounds and smells which arise in clusters, and learns through sensori-motor experience that these clusters are associated with particular physical objects.

The infant then learns to recognise components of the clusters which provide a similar sensory experience eg redness. By identifying and isolating common elements of the naturally arising conceptual landscape, the infant begins to develop a rudimentary ability to abstract from experience. Through the association of example stimuli with spoken words, via a process of trial, error and correction, the infant can learn that 'swan' is a particular cluster of sensory stimuli, and that 'red' is a specific component present in several sensory stimuli.


In this way, words become associated with mental constructs, and the infant learns that symbols may be used to denote an element of their conceptual landscape: the ability to symbolize is acquired. Having experienced success in identifying and using particular components of sensory stimuli, like colours, these dimensions become reinforced within the conceptual landscape, and the infant gives them more attention and weight in future.

3.2.2 Semantic Relations

• Extending rudimentary abstraction to acquire the ability to recognise and use semantic relations
• Acquiring categorisation

The basic mechanism of abstraction learnt earlier is then used to extend the conceptual landscape, and language itself is a major enabler in this process. The naturally arising conceptual features now marked out and associated with words are used to build higher order mental constructs, further extending the conceptual landscape.

For instance, the ability to abstract common features from various sensory situations facilitates the learning of basic semantic relations like 'between', 'above' etc. The child develops the ability to recognise semantic relations and use them as links between different mental territories. Through experience, trial and error, more sophisticated mental territories are able to be marked out based on such semantic relations, and along the way, words become the markers and signposts of these territories.

The ability to understand and use semantic relations leads to the ability to deduce rudimentary rules which support the categorisation of previously unseen examples. For instance, the denotative bird example requires the marking out of a new mental territory - 'bird' - to contain existing territories already associated with words which were mapped to sensory clusters: 'swan', 'hen' etc. Children learn by experience that in such cases there is usually some reason to group the things together, perhaps the possession of some common feature(s), like having wings. Note that at this stage, categorization is based not on just one dimension of sensory experience like 'redness', but on an emergent quality based on several dimensions eg 'has wings', so semantic relations are required.

3.2.3 Analysis

• Acquiring completely abstract concepts
• Using semantic relations within abstract domains

As the conceptual landscape is built up, it is possible to move away from physical stimuli altogether and use the previously abstracted concepts as raw material to build new, totally abstract concepts, with language assisting the process. This is the essence of Robinson's analysis method, which gives a method for building a new construct based on existing ones. In the examples for 'octagon' and 'diagonal', we are given instructions for building these mental constructs by using the existing mental constructs 'polygon' and 'square'. In this way, we are able to move step by step away from the naturally arising domain of sensory stimuli into purely conceptual domains like mathematics.

Robinson's synthesis method works because the mental landscape for some underlying concept has already been marked out, and we can use our knowledge of its features to isolate or build a new construct within it or relative to it, using semantic relations. This may be either pre- or post-analysis. In the 'orange' example, the word 'between' indicates not just the presence of a semantic relation, but the existence of an underlying idea with structure: in this case, a continuum or scale by which colours are organised: we must draw on our conceptual representation of colour and how it works to isolate a mental construct within it.

Thus analysis requires the use of semantic relations within conceptual domains that have already been mapped out. Note, conversely, that Robinson's synonym method uses an established syntactic relation to a conceptual territory to evoke the mental territory and map a new symbol to it.

3.2.4 Rules

• Acquiring the ability to circumscribe conceptual territories, and use them to classify new examples

The rule method is a method for marking out a piece of an existing conceptual territory, which may be at any level of abstraction: the rules give borders that define the territory, which may then be applied explicitly to determine whether a mental construct falls inside the conceptual territory or outside it.

3.3 Common Semantic Ground

We have shown that words are intermediaries for communicating mental landscapes, and are effective because the underlying landscapes can be marked out by reference to common features, or what we might call 'common semantic ground'. The argument assumes that initially there are common reference points which are naturally arising, enabling the process to commence so that we can all acquire basic concepts like 'redness' and 'swan'. It seems reasonable to postulate this, as without some commonality of experience, there would be no basis on which to develop communication at all, and perhaps, one might argue, no reason to communicate. In the example discussed, we were able to pair a set of stimuli with a word ('swan'), and to isolate a particular element of several sets of stimuli with a word ('red'). This does not necessarily mean that everyone has exactly the same subjective experience of these stimuli; however, everyone has to have the same kinds of mappings in order for this to work.

For instance, if I experience that 'orange' is, on some dimension of my sensory experience, between the concepts I have learnt as 'red' and 'yellow', but you experience it as being between the concepts 'hot' and 'cold', then we will be unable to understand each other. But if you experience it as between 'red' and 'yellow', it would not matter whether your subjective experience of these concepts was like my subjective experience of 'hot' and 'cold', as long as there was some homomorphic mapping between our experiences, such that the internal semantic relations hold in both cases.
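The homomorphism requirement can be illustrated mechanically: what matters is that the internal semantic relation (here, betweenness on a colour scale) is preserved across individuals, not that the scales themselves coincide. A hypothetical sketch (the orderings and names are ours, purely for illustration):

```python
# Illustrative sketch: two speakers order the same colour stimuli on
# private scales. Communication requires only that the betweenness
# relation is preserved (a homomorphic mapping), not that the scales
# or subjective experiences are identical.

def between(scale, a, x, b):
    """True if x lies between a and b on the given ordered scale."""
    i, j, k = scale.index(a), scale.index(x), scale.index(b)
    return min(i, k) < j < max(i, k)

me  = ["red", "orange", "yellow"]   # my ordering of the stimuli
you = ["yellow", "orange", "red"]   # reversed, but order-preserving

# 'Orange is between red and yellow' holds for both of us, so we can
# understand each other despite possibly different inner experiences.
assert between(me, "red", "orange", "yellow")
assert between(you, "red", "orange", "yellow")
```

Were your scale instead keyed to an unrelated dimension (say, 'hot' to 'cold'), the betweenness relation would fail to transfer and the sketch's assertions would not hold for both speakers.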
Thus the theory requires some predictable, common kind of relation between the stimuli and the nature of the conceptual landscape they evoke across individuals, but does not require that each individual has exactly the same subjective experience.

The issue of homomorphic semantic relations becomes clearer when we consider more complex phenomena, like events and higher order constructs. With events, for instance, if you perceive time as a linear sequence moving in a particular direction and I perceive it as a jumbled sequence with no particular direction, then our conceptual landscapes cannot be homomorphic, and we will not be able to understand each other: words like 'before' and 'after' will make sense to you but not to me, or at least not in the same way, as I will not be able to abstract the common feature which marks these out within my conceptual landscape. Consequently, I will also be unable to use past, present or future tense in language in any consistent way.

In extremely abstract scenarios, like law or politics, it is clear that without agreed common ground, accurate communication is impossible. In this case, the common ground is not naturally arising but is almost entirely constructed: clearly a legal or political system does not arise as a natural part of the conceptual landscape, but is based on many abstractions and analytic constructs. Even to define 'guilt' and 'innocence' requires a great deal of pre-existing mental architecture, and given exactly the same information, does not lead to predictable categorisation of the guilty and the innocent by different individuals (eg different jurors), because things like value judgements and the relative importance of various factors, which may be weighted differently by different individuals, come into play.

So conceptual landscapes clearly do have significant individual and cultural differences, according to the particular ways the culture or the individual has built up higher order constructs. But at the same time, there is clearly a basic level where there is common ground, otherwise we would be unable to communicate, and totally unable to create highly abstract shared domains like political and legal systems.

Therefore, in order to establish a science of definition, we must firstly establish what and where the common ground is, and alternately, where and how the points of departure from it arise. In this way, we can establish a basis for marking out conceptual landscapes in a systematic way, which will support our endeavour of establishing robust methods for definition.

As a first pass at identifying a shared conceptual landscape at the most primitive level, let us consider what could be common to all peoples of all cultures. In the endeavour of uncovering the nature of the underlying common conceptual landscape, we suggest the following avenues may be fruitful for investigation:

a. Human perceptual and cognitive apparatus:
There is commonality in the way we perceive sensory stimuli, as evidenced by the ability to identify specific deficits such as colour blindness or hearing loss in individuals, and the ability to identify a common underlying topography with which we can explain different mappings of perceptual words: for instance, colour terms in different languages. Gärdenfors' theory of conceptual spaces (2000) identified the structure of shared spaces underlying perception of colour and taste, for instance.

b. "Basic level" categories:
Rosch's work (1976) showed that there is a level of basic categories of objects determined by overall gestalt perception without distinct feature analysis. These basic level categories are predicated on common methods of sensori-motor handling.

c. Challenges and experiences of surviving in the physical environment:
Johnson (1987) showed that there are common patterns of organisation arising from the physical environment which he called 'image schemas'. These schemas, which include frame-like templates for experiential gestalts such as 'container' and 'path', arise directly from physical experience and the need to operate effectively within it, and might be postulated to be common to all peoples.

d. Early stages of cognitive development:
Piaget (1972), for instance, was able to identify several common stages of normal cognitive development. Each stage is characterised by the distinctions and realisations the child is able to make: eg differentiating self from objects; realizing things continue to exist when no longer present ('object permanence'), etc. The cognitive stages pass from sensori-motor through to formal operational. The earlier stages, through to concrete operational, which is usually achieved by age 12, may be postulated to be common, whilst the highest stage (formal operational) is characterized by purely abstract thinking and is likely to be divergent.

e. Words common to all languages:
If the same words arise ubiquitously across languages, they would seem to mark out some common conceptual territory. Anna Wierzbicka's work on Semantic Primes (1996) identifies 61 such words, including 'I', 'you', 'this' and basic notions of time and space, which she has found to be present in all the languages she has investigated.

f. The "semantic core" of English:
In the 1930s, CK Ogden was able to identify a core vocabulary of 800 words of English by which, he claimed, anything of a practical everyday nature could be communicated. Ogden made the reduction by eliminating verbs in favour of basic operations like 'have', 'do' and 'be', and reducing nouns by means of a dimensional analysis he called the 'panoptic method': a word like 'puppy' could be removed as it could be expressed with 'young' on the age dimension in combination with 'dog'. In all, Ogden identified twenty semantic dimensions.

3.4 Higher Order Constructs and Points of Departure

In ‘Metaphors We Live By’, Lakoff & Johnson provide a convincing argument that higher level understandings and abstractions develop as metaphors from lower ones. For instance, an image schema like a path, learnt in relation to the physical domain, may then be applied to an abstract, non-physical domain such as planning to achieve a goal. Investigation of the abstraction process, and the foundations on which higher order mental constructs are built, may assist in identifying the points of divergence in conceptual landscapes.

Some divergences may be due to lack of differentiation, or to underlying value judgements. On the first count, for instance, there is evidence that autistics do not differentiate between animate and inanimate objects, and thus do not develop the folk concepts of belief and desire which are used to understand behaviour, so they never develop this part of the conceptual landscape. With more abstraction there are more possible differentiations, and if people lack the ability or interest to make these, particular abstract domains, like higher mathematics, will be inaccessible. On the second count, some ideologies, value judgements or social beliefs like ‘all men are created equal’ may be pivotal in constructing the mental landscape, and conceptual landscapes may diverge from these kinds of ontological commitments.

It is worth noting that each individual’s understanding of the world is necessarily built on certain assumptions: without them we cannot possibly hope to build many of our most common concepts. We should not apologize for or hide this, but openly acknowledge and identify them. However, we should also attempt to use them sparingly. Particularly when attempting stipulative definition, which involves building and marking out new conceptual territories, we should attempt to use as few ontological commitments as possible: we certainly want to be able to represent mental activities such as beliefs, hypotheses and value judgements at the appropriate level, but should be careful not to build these in at the foundations.

4 Conclusion

In conclusion, let us review whether a science of definition is achievable in light of our discussion. Development of a ‘science of definition’ would require an understanding of common semantic ground and the ways in which divergences from it are made. We have made some progress in delineating the ways in which the conceptual landscape is naturally structured, and the means by which abstract constructs may be overlaid or built within it. Taking a cognitive viewpoint, we have suggested some paths for investigation to determine where common semantic ground may be located and how and where divergences from it arise. We have mentioned some avenues, including empirical studies, which may be of use in inferring the elements and structure of the conceptual landscape. By putting together the various strands, it may well be possible to describe an underlying shared conceptual space which would provide a viable basis for robust methods of definition.

5 Acknowledgements

This work is part of the NICTA-wide priority challenge “From Data to Knowledge”. I would like to thank my supervisors for allowing me the freedom to pursue the ideas presented in this paper. I would also like to thank James Nicholson, without whom none of this would be possible, and my brother John Cregan and soon-to-be sister-in-law Nadja for their love and support.

6 References

Gärdenfors, P. (2000): Conceptual Spaces: The Geometry of Thought. Cambridge, Massachusetts: MIT Press.

Johnson, M. (1987): The Body in the Mind: The Bodily Basis of Meaning, Imagination and Reason. Chicago: University of Chicago Press.

Kelly, G.A. (1955): The Psychology of Personal Constructs. New York: Norton.

Lakoff, G., & Johnson, M. (1980): Metaphors We Live By. Chicago: University of Chicago Press.

Ogden, C.K. (1930): Basic English: A General Introduction with Rules and Grammar. London: Kegan Paul, Trench, Trubner & Co., Ltd.

Piaget, J. (1972): The Principles of Genetic Epistemology. New York: Basic Books.

Robinson, R. (1950): Definition. London: Oxford University Press.

Rosch, E., Mervis, C.B., Gray, W.D., Johnson, D.M., & Boyes-Braem, P. (1976): Basic objects in natural categories. Cognitive Psychology, 8: 382–439.

Thomas, S. (2005): The Importance of Being Earnest about Definitions. BTW 2005, 560–577. Available at http://www.informatik.uni-trier.de/~ley/db/conf/btw/btw2005.html#Thomas05

Wierzbicka, A. (1996): Semantics: Primes and Universals. Oxford: Oxford University Press.

7.2 Symbol Grounding for the Semantic Web

Title of Publication: Symbol Grounding for the Semantic Web
Type of Publication: Conference Paper
Appears In: Proceedings of the European Semantic Web Conference, ESWC2007: 429-442 (2007)
Publication Date: 2007
Peer Reviewed: Yes
Contributing Author(s): Anne Cregan
Personal Contribution: 100%

Symbol Grounding for the Semantic Web

Anne M. Cregan1,2

1 National ICT Australia (NICTA)
2 CSE, University of New South Wales, Australia
[email protected]

Abstract. A true semantic web of data requires dynamic, real-time interoperability between disparate data sources, developed by different organizations in different ways, each for their own specific purposes. Ontology languages provide a means to relate data items to each other in logically well-defined ways, producing complex logical structures with an underlying formal semantics. Whilst these structures have a logical formal semantics, they lack a pragmatic semantics linking them in a systematic and unambiguous way to the real world entities they represent. Thus they are intricate "castles in the air", which may certainly have pathways built to link them together, but lack the solid foundations required for robust real-time dynamic interoperability between structures not mapped to each other in the design stage. Current ontology interoperability strategies lack a meaning-based arbitrator, and depend instead on human mediation or heuristic approaches. This paper introduces the symbol grounding problem, explains its relevance for the Semantic Web, illustrates how inappropriate correspondence between symbol and referent can result in logically valid but meaningless inferences, examines some of the shortcomings of the current approach in dealing effectively at the level of meaning, and concludes with some ideas for identifying effective grounding strategies.

Keywords: , Semantic Interoperability, Semantic Web, Symbol Grounding

1. Introduction

The purpose of the World Wide Web is to share and leverage information. But information is only ultimately useful if it produces some result in the real world, either in the physical environment, or in someone’s state of understanding. Raw unprocessed data is not very helpful in this regard, as it requires significant human effort, and the application of implicit human knowledge to understand it and process it appropriately to produce tangible benefits. It is generally agreed that machines should be doing more of the work of turning data into knowledge in a way that supports the production of results for human benefit. The purpose of the Semantic Web is to address this; its stated objective being to make information more easily shared and applied, by making its meaning explicit [1]. The implicit assumption is that once meaning is represented explicitly, machines will be able to align and process

data according to its meaning, thus turning it into knowledge, and supporting web services and intelligent agents to produce real-world results on our behalf.

However, this implicit assumption has not yet been thoroughly investigated. To date, information processing has been based on a symbolic processing paradigm, and to process information at a semantic level requires a fundamental paradigm shift. New methodologies, processes, and criteria for judging success are needed. Many of the techniques for aligning or reconciling meaning are already known from programming, but not at a mature level where meaning is made explicit and machine processing does the rest: it requires a human being to analyze the meaning and devise and implement appropriate code to do the necessary transformations.

How are we to start making inroads into this new semantic territory? As an initial step, taking a good look at the really hard questions should help focus the effort, and provide foundations for this new information processing paradigm.

These hard questions include but are not limited to the following:

1. What is meaning?
2. What do we need to do to make meaning explicit?
3. What is the appropriate way to process meaning?
4. How can we judge whether we have been successful in representing and processing meaning at a semantic level?
5. Will the current Semantic Web approach, based on the Web Ontology Language OWL [10], produce the right kind of representations, and support the right kinds of processes, to achieve the results being sought, or is a key component of the solution missing?

Spanning from the very philosophical to the very practical is necessary because the issue of meaning is a fundamental philosophical issue, whilst the goals of the Semantic Web are very practical. Because an ontology is by its very nature a “specification of a conceptualization” [3], creating ontologies involves bridging between the realm of IT/Engineering and the realm of Cognitive Science/Philosophy. It is hoped that such an investigation can uncover the foundations for such a bridge, providing a basis not only for the Semantic Web but for the Pragmatic Web it will ultimately support.

Organization

The paper is organized as follows:

− Section 1 introduces the challenge being undertaken.
− Section 2 relates meaning to both entailment and designation, looks at symbolization and introduces the symbol grounding problem.
− Section 3 explains why symbol grounding is relevant for the Semantic Web in its aim to achieve dynamic real-time interoperability, and why extensional approaches and URIs are not sufficient in themselves to provide adequate symbol grounding.
− Section 4 considers next steps in identifying suitable symbol grounding strategies for the Semantic Web and concludes.

2. Meaning and Symbol Grounding

What is meaning? The greatest philosophers and thinkers have considered this question for the last several thousand years, but as yet there seems to be no definitive answer. What are the implications for the Semantic Web, which is being built around the keystone of making meaning explicit and machine-processable? Is it ever really going to get off the ground, or perhaps do so initially but quickly collapse under its own weight for lack of good foundations? It seems somewhat foolhardy to attempt to devise explicit well-defined procedures for operating at the level of meaning, without attempting to lay good foundations by stating what meaning is taken to be.

Whilst a conclusive answer to the question is unlikely (isn’t that what makes a good philosophical question after all?) and Semantic Web researchers are, generally speaking, practical people who want results in reasonable timeframes and certainly don’t want to get bogged down in the vagaries of philosophy, I believe that as part of the construction of Semantic Web technologies, for purely practical reasons, there should be some attempt to state what we take meaning to be for the purposes of the Semantic Web.

A clear conception of meaning for the purposes of the Semantic Web should, at the very least, assist researchers in devising appropriate and precise procedures and methods for making meaning explicit, which then has the flow-on effect of supporting practitioners to build appropriate semantic models representing their respective domains, and will make such models better suited for interoperability. It also provides a theoretical basis for semantically processing the information captured by such models.

Whilst many modeling errors have been identified and are well understood e.g. [8], there is still quite a spectrum of “correct” models available for modeling any given domain. The ontology builder has considerable discretion in making design choices. Are some of the resulting models better than others? Intuitively the answer is yes, and depends on the intended function of the ontology. However, we are still seeking a more precise understanding of the nature of this dependence, and at the moment there is no one clear guiding methodology for building domain models. Whilst there are obviously several factors at play, the model’s effectiveness in making meaning explicit should certainly be considered a key criterion.

Beyond this, an analysis of meaning also offers insights into the overall Semantic Web approach and whether it will ultimately be able to deliver on its promises. Capturing meaning is clearly a fundamental component, but are the current suite of Semantic Web standards and technologies adequate to the task of capturing machine- processable meaning to produce the outcomes being sought, or will they ultimately fall short? If we want to ultimately build a “Pragmatic Web”, that delivers tangible real-world benefits, we need to make sure the foundations are firm enough to support this. Chapter 7 Foundational Issues in Meaning

2.1. What is Meaning?

Without getting too bogged down in philosophy, let’s take a practical approach to home in on what meaning is, by identifying what it is that we really want when we ask the meaning of something. In everyday life, we generally don’t ask the meaning of concrete things like a chair, or a train, or a person, or a pet. Such things have no meaning: they just are. We ask the meaning of actions and events, policies and such like, in which case we are generally trying to identify the relevant entailments, or we ask the meaning of symbols, in which case we want to know what they designate, or stand for. When we ask about meaning, we are usually asking for one of two things: either for entailment, or for designation.

2.2. Entailment and Designation

Entailment: What are the logical consequences of some action, event or state? Examples:

− If I take this promotion, does it mean I will be able to afford the house?
− If Serena wins this point, does that mean she wins the match?
− If my business is registered as a public company, does that mean we are required to have annual audits conducted?

Designation: What is being referred to? What does the symbol symbolize? Examples:

− What’s the meaning of “verisimility”?
− What does that sign mean?
− What do you mean by giving me that wink?
− What does the green line on the graph mean?

Designation gives the referent being represented by some kind of symbol: a word, a street sign, a gesture, a line on a graph. It uses symbols to point to something; a convenience originating from the need to identify and communicate something that does not have a local physical existence, is abstract, or is an internal state and not directly accessible. Designation is the back end of symbolization: it establishes the referent, or what the symbols symbolize.

Symbolization

Note that there are (at least) two related senses of symbolization. In the first sense, a recognizable concrete thing is used to stand for a more abstract intangible thing e.g. a dove is used to symbolize peace. It is usually chosen as a symbol because it has some kind of real-world historical or mythological relationship with the abstract thing, or evokes it through some other kind of mental or perceptual association, or it can simply be a matter of convention. In this sense, a non-verbal meaning relation is pre-established and the symbolization makes use of it to evoke the intended referent. This should not be confused with symbolization as used in this setting, which is being referred to as “designation” for the purposes of clarity within this paper.

In our setting, symbolization refers to the scenario where a mark, character, sound, avatar or some such arbitrary thing is used to designate some physical or conceptual thing. In this case, the symbol is an arbitrary physical token, designed by humans specifically for the purpose of representation, and does not usually have a meaning in and of itself (Although in the case of avatars, some recognizable topographical resemblance may exist, and thus their form may be argued not to be completely arbitrary). In this kind of symbolization (designation), the essential question is how the relationship between an arbitrary symbol and its intended referent is to be established. This question has been identified in Artificial Intelligence Research as the “Symbol Grounding Problem”.

2.3. Denotation and Connotation

Designation itself has two aspects: denotation and connotation, a distinction introduced by J.S. Mill [5]. To illustrate by example, the denotation of a term such as ‘woman’ refers to all the individuals to which it may correctly be applied, whilst the connotation consists of the attributes by which the term is defined, e.g. being human, adult and female. Connotation determines denotation and, for J.S. Mill, constitutes meaning, whereas terms like proper names, e.g. ‘Mary’, which have denotation if there is someone so called, are taken to lack meaning because they have no connotation: no attributes define ‘Mary’.

2.4. Relevance to the Semantic Web

Both entailment and designation have relevance for the Semantic Web: entailment relating to what can be concluded from what is already known, and designation relates to establishing the connection between symbols in a formal system and what they represent. There is already a very significant body of work around entailment for the Semantic Web [10], based on description logics providing an underlying formal semantics for the various flavours of OWL.

However, designation has had less attention to date. OWL’s formal semantics have a set-theoretic basis, where a set (‘concept’ or ‘class’ in DLs) is essentially defined by its extension - clearly a denotational approach. However, meaning based on denotation is less than adequate for the needs of the Semantic Web, as will be explained below. The consequence is that the entailment parts of the Semantic Web have no theoretical basis for anchoring to anything in the real world, and are thus floating castles in the air.

To explain: an OWL ontology is made up of a set of logical axioms, themselves composed of primitive objects, predicates and operators, combined via formation rules into well-formed formulae. Unless some kind of faithful and appropriate correspondence is established between the primitives and whatever they are intended to represent outside the formal logical system, any entailment produced by the system will not result in reliable conclusions that correspond to the actual state of affairs in

the real-world domain of interest. Establishing a correspondence between the primitives (which are effectively just symbols or symbol strings once they are inside the logical system), and the domain is an extra-logical consideration. The question of how the relationship between the symbol and the referent is to be established has been identified in Artificial Intelligence Research as the “Symbol Grounding Problem”.

2.5. The Symbol Grounding Problem

The Symbol Grounding Problem, as described, for instance, by Harnad [4] relates to the inadequacy of defining symbols using only other symbols, as is commonly done in a dictionary or a formal logical system. In his exposition, Harnad takes Searle’s [9] famous Chinese Room scenario, originally used by Searle to illustrate the difference between mechanical symbol manipulation, which merely simulates mind, and a true understanding of intrinsic meaning, which necessarily involves processing at the semantic level.

The scenario involves a machine hidden inside a room, which is given a set of Chinese language inputs and produces a set of Chinese language outputs. Searle points out that a machine using only symbolic manipulation to match a list of pre- defined inputs with a list of pre-defined outputs may be capable of simulating conversation with a Chinese speaker well enough to pass the Turing test. However, Searle argues, such a machine cannot be said to understand Chinese in any sense, any more than a human who uses such a list to produce statements in Chinese can be said to understand Chinese. Searle ultimately concludes that meaning is in the head, not in the symbols, and furthermore that cognition cannot be just symbol manipulation, as it clearly requires some activity to take place at the semantic level.

Harnad puts an alternate spin on Searle’s Chinese Room scenario, asking the reader to imagine having to learn Chinese as a second language, where the only source of information available is a Chinese/Chinese dictionary. He observes that “The trip through the dictionary would amount to a merry-go-round, passing endlessly from one meaningless symbol or symbol-string (the definientes) to another (the definienda), never coming to a halt on what anything meant.” He then presents a second variant, where one has to learn Chinese as a first language, and again the only source of information available is a Chinese/Chinese dictionary. He argues that if the first variant is difficult, then the second must be impossible, relating it to the task faced by a purely symbolic model of the mind, and asking “How can you ever get off the symbol/symbol merry-go-round? How is symbol meaning to be grounded in something other than just more meaningless symbols? This is the symbol grounding problem.”
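Harnad's merry-go-round can be made concrete with a toy dictionary in which every definiens is itself just another entry in the same dictionary. The following is an illustrative sketch only: the entries are invented placeholder strings, not examples from Harnad's text, and the point is simply that following definitions never halts on anything non-symbolic.

```python
# Toy model of a symbol-only dictionary: every word is defined solely
# in terms of other words in the same dictionary. The entries are
# invented placeholders, not real Chinese.
dictionary = {
    "shan": ["gao", "di"],
    "gao":  ["shan", "da"],
    "di":   ["shan"],
    "da":   ["gao"],
}

def trace_definitions(symbol, steps=6):
    """Follow the first-listed definiens repeatedly, recording the path."""
    path = [symbol]
    for _ in range(steps):
        symbol = dictionary[symbol][0]   # look up yet another symbol
        path.append(symbol)
    return path

# The walk never "comes to a halt on what anything means": every stop
# on the path is just another dictionary symbol.
path = trace_definitions("shan", steps=6)
print(" -> ".join(path))
assert all(s in dictionary for s in path)
```

However long the walk is made, the invariant asserted on the last line holds: the learner only ever reaches more symbols, which is precisely the grounding problem.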

How indeed, are we to get off the Symbol/Symbol merry-go-round? Firstly though, let us consider in detail how the symbol grounding problem is relevant for the Semantic Web.

211

3. Why the Semantic Web needs Symbol Grounding

The ultimate vision of the Semantic Web is a web of data connected by meaning which is machine processable. The idea is to get meaning out of the technologists’ and domain experts’ heads, and into some explicit, machine processable representation which defines how to link it up appropriately, in real-time, without reference to human mediators. But the current Semantic Web building blocks are a long way from achieving this vision. Let’s take a look at why this is.

In building an ontology, the designer chooses terms for classes, instances and properties, and builds axioms/structure linking them. The terms are usually chosen for their meaning in some natural or domain-specific language. Additional annotations may explain the meaning of the term, using more natural and/or domain-specific language. But natural language is notoriously ambiguous and slippery. Its symbols/semantic units are imperfectly grounded, as we will explain in the following section. And whilst domain-specific terminology may be unambiguous within the domain, it is not necessarily unambiguous when linking across domains.

If the basic terms used for ontologies are ambiguous, then having a well-defined structure that supports entailment is of dubious benefit. The structure by itself is not the meaning: as discussed, meaning requires both logical structure for the purposes of entailment, and grounding for the purpose of establishing correspondence between the domain and the logical structure. Only then can entailments made by virtue of the logical structure be guaranteed to be an accurate reflection of the real-world state. Garbage in, garbage out, as the old saying goes.

3.1. Meaningfulness

As an example of this principle, there is a considerable body of work e.g. [6] in Mathematical Psychology around determining which kinds of variables can be subjected to which kinds of mathematical operations, in order to produce only meaningful results and avoid meaningless conclusions. Considerable effort has gone into investigating “meaningfulness” to avoid the inappropriate use of statistics. In a classic example, the school football team are assigned numbers to wear on their football jerseys. This numerical assignment is simply to give each football player a unique label for the purpose of identification. However, this assignment does not support taking the average of those numbers, and asserting that this average reflects some meaningful property of the football team. This is because the numbers have no numerical properties attached to them - they are just labels, and could equally well be any other arbitrary symbol (letters of the alphabet, pictures of animals) - the only important factor is that each player is designated by a unique symbol. The underlying variable being represented (identity) is not a quantitative variable, so any mathematical inferences derived from the football jersey numbers are simply meaningless. This is not the case for a quantitative variable like the heights of the football players, where it is perfectly appropriate to represent heights as real numbers and calculate average and standard deviations in the height of this population.
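The jersey-number example can be sketched in a few lines of code. The roster and all values below are invented for illustration; the point is that the same arithmetic is computable over both variables, but only meaningful for the quantitative one.

```python
from statistics import mean, stdev

# Hypothetical roster. Jersey numbers are nominal labels: the assignment
# is arbitrary, so arithmetic over them is computable but meaningless.
jersey_numbers = {"Ali": 7, "Ben": 23, "Cy": 4, "Dan": 10}

# Heights in cm lie on a quantitative (ratio) scale: here the same
# arithmetic reflects a real property of the population.
heights_cm = {"Ali": 172.0, "Ben": 180.0, "Cy": 168.0, "Dan": 176.0}

meaningless_avg = mean(jersey_numbers.values())   # 11.0, but relabelling
# the players with letters or animal pictures would destroy this "result"

meaningful_avg = mean(heights_cm.values())        # 174.0 cm
height_spread  = stdev(heights_cm.values())       # ~5.16 cm

print(meaningless_avg, meaningful_avg, round(height_spread, 2))
```

Nothing in the number system flags the first average as invalid; the meaningfulness constraint lives entirely in the grounding of the variable, outside the formalism.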

Note that the “meaningfulness” criterion is not needed because of any problem with numbers themselves, or with mathematical reasoning. The problem is that the real-world dimension being represented does not have the same properties as the chosen representation. The representation is richer and more structured than what is being represented, and thus permits reasoning and inferences which have no correspondence with the real world. Inferences which are perfectly valid inside the representational system are thus meaningless when we attempt to map them back to the domain of reference. As a formal logical system without appropriate grounding strategies to connect it to the real world, the Semantic Web faces a similar problem.

3.2. Semantic Interoperability Problems

Pollock and Hodgson’s analysis of types of semantic conflicts [7] identified eleven kinds of semantic level clashes: DataType, Labeling, Aggregation, Generalization, Value Representation, Impedance Mismatch, Naming, Scaling & Unit, Confounding, Domain, and Integrity. This analysis has been adapted and re-organized to fit the Semantic Web and the focus of the current investigation.

[Table: Pollock and Hodgson’s eleven types of semantic conflict (DataType, Labeling, Aggregation, Generalization, Value Representation, Impedance Mismatch, Naming, Scaling & Unit, Confounding, Domain, and Integrity); the original table is not recoverable from the source text.]

As semantic conflicts, every one of these problems requires reference to the meaning level of the information for its resolution. The following table looks at the resolution method for each class of semantic conflict and indicates how Symbol Grounding and Meaning is relevant in each case.

[Table: resolution method for each class of semantic conflict, indicating the role of symbol grounding and meaning in each case; the original table is not recoverable from the source text.]

3.3. Current Support for Semantic Interoperability Conflict Resolution

As the analysis of the previous section shows, symbol grounding and meaning are at the heart of these interoperability problems. Resolving these kinds of problems is a common occurrence in mapping and aligning ontologies as they exist today. Whilst tools and heuristics are available to humans to assist the process, it is a problem essentially being addressed by human beings, rather than machines. Somehow, the underlying meaning needed to resolve these interoperability problems is not being explicitly represented in the Semantic Web, and/or there are not sufficient tools and techniques available for resolving it through automated processing. The remainder of this section makes an initial pass at identifying where the shortcomings are, starting with specifics of the OWL language and broadening out from there.

Mapping Constructs provided by OWL

OWL provides only very limited constructs for mapping ontologies, and essentially none for transformations. The OWL language has four constructs specifically for use in mapping ontologies for the purposes of merging. These constructs are intended for use when two or more ontologies have been built independently of each other and later need to be merged or linked, i.e. they are being mapped after the initial design phase has taken place. They are:

owl:equivalentClass asserts that two or more classes have exactly the same instances

owl:equivalentProperty asserts that two or more properties are equivalent

owl:SameIndividual asserts that two or more individuals are identical

owl:DifferentIndividuals asserts that two or more individuals are mutually distinct

Whilst these constructs provide the means to specify that two or more classes, properties or individuals are equivalent / identical, or that two or more individuals are different, they are only useful once the equivalence, identity or difference is determined. This determination is outside of OWL, and the Semantic Web technologies do not provide any formal basis for a machine to determine this without human guidance (albeit supported by tools and heuristics). The use of class extensions and URIs is insufficient, as explained below.

Extensive vs Intensive Class Definitions

Some may argue that if two classes contain the same set of individuals, they must be the same, and machine processing can determine this. However, having the same extension does not necessarily prove that two classes are the same: they may happen to contain the same individuals, but have different intensive meanings: that is, the criteria for membership of the class are different. Philosophically, this is the denotation vs connotation distinction.

For example, the extension of the members of the school basketball team in a Sports Ontology and the extension of the school’s Grade A students in an Academic Ontology may conceivably, at some point in time, consist of exactly the same set of individuals, and machine processing may determine on the basis of these equal extensions that the two classes must be equivalent, and map them using owl:equivalentClass.

The possible ramifications of such an ill-advised mapping are obvious: if a student drops his grade, he will find he is no longer classified as a member of the basketball team. If she drops out of the basketball team, she will no longer be classified as a Grade A student. If a new student is added to the class of Grade A students, he will find himself automatically in the basketball team. A new member of the basketball team will find she is automatically classified as a Grade A student.

Clearly, the extensive approach to class definition and mapping is inadequate, especially when changes in the variables used for classification occur, or new, unclassified instances are encountered, simply because the classification criteria are not adequately captured. A complete specification of meaning needs to support a decision procedure which determines whether a previously unseen instance qualifies for membership of the class or not, by making the membership criteria explicit. We also note that it is the nature of some classes to have fuzzy boundaries [2] (e.g. the colour red), and support may be needed for graded class membership in such cases.
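The basketball/Grade A scenario above can be sketched as follows. The student records and predicate names are hypothetical; the sketch shows how two intensionally different classes can coincide extensionally until a previously unseen instance arrives.

```python
# Hypothetical student records: grade and team membership are independent
# attributes, but happen to coincide for everyone enrolled so far.
students = {
    "Ana": {"grade": "A", "on_team": True},
    "Bo":  {"grade": "A", "on_team": True},
}

# Intensional definitions: each class is a membership criterion (predicate).
def grade_a(s):
    return s["grade"] == "A"          # Academic Ontology criterion

def basketball(s):
    return s["on_team"]               # Sports Ontology criterion

def extension(pred):
    """The denotation of a class: the individuals satisfying its criterion."""
    return {name for name, s in students.items() if pred(s)}

# Equal extensions invite a mistaken owl:equivalentClass mapping...
assert extension(grade_a) == extension(basketball)    # both {"Ana", "Bo"}

# ...but a previously unseen instance exposes the differing intensions.
students["Cam"] = {"grade": "A", "on_team": False}
print(extension(grade_a) == extension(basketball))    # prints False
```

Only the predicates, not the momentary extensions, capture the classification criteria; a mapping derived from the extensions alone silently breaks as soon as the data changes.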

Representing Mappable Differences/Transformations

OWL also lacks the means to specify semantic differences and the transformations needed in a way that supports interoperability. Programming code can work around this, but must be written for each specific situation, straying from the Semantic Web ideal of explicit representation enabling automated processing at the level of meaning.

Chapter 7 Foundational Issues in Meaning

As a simple example, a temperature in Fahrenheit and a temperature in Celsius can easily be converted either way via a simple arithmetic equation. The underlying measurement scales are interoperable, but there is no support for representing such a relationship within an ontology. Assuming that both ontologies represent temperature as a DataProperty with a real number value, the numerical values of the respective properties differ, but have a well-defined arithmetic conversion. However, OWL provides no mechanism for specifying the scale, the properties of the scale, or the transformation between scales. Meaning that is available to the designers, and that could be made explicit when building the respective ontologies to support interoperability via machine processing, currently cannot be represented within the ontology.
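The machinery OWL cannot express is a one-line arithmetic function in any programming language; a minimal sketch:

```python
# The well-defined, exactly invertible conversion between the two
# temperature scales, which OWL itself has no way to represent.

def fahrenheit_to_celsius(f):
    return (f - 32) * 5 / 9

def celsius_to_fahrenheit(c):
    return c * 9 / 5 + 32

# The scales are interoperable precisely because the mapping is invertible:
assert fahrenheit_to_celsius(212) == 100.0   # boiling point of water
assert celsius_to_fahrenheit(0) == 32.0      # freezing point of water
assert abs(fahrenheit_to_celsius(celsius_to_fahrenheit(37.0)) - 37.0) < 1e-9
```

The designers know this relationship at build time; the point of the chapter is that there is no standard way to attach it to the two DataProperties so that a machine can apply it.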

Independent and Dependent Ontologies

The mapping constructs in OWL for mapping independently designed ontologies were considered above. However, because OWL ontologies are built from URIs, the components can reside anywhere. The principle of composability in OWL means that when an ontology is being built or revised, the designer can freely use any constructs from any other available ontology. This results in one ontology having logical dependence on another. Interoperability is an issue for both dependent and independent ontologies, and some of the broader interoperability concerns are addressed below.

Dependent Ontologies: If a base ontology changes, other dependent ontologies are affected. This is potentially much more serious than just a “broken link”, because it can change the inferences made. In the World Wide Web, broken links and web page changes are not critical: they are simply a dead end, and one can always look for information elsewhere. However, when an external URI is an essential part of a logical structure, a change or deletion can have serious real-world consequences, such as an incorrect classification as an illegal alien.

Independent Ontologies: In the case where two independent ontologies need to be aligned or mapped and have no common constructs other than the ontology language itself, currently the options for resolution are either heuristic approaches with varying success rates, or the human designers of the respective ontologies can communicate with each other, or check other sources, to establish the meanings of terms and devise an appropriate mapping. The problem is not only that achieving interoperability across the Semantic Web by pairwise mapping is potentially of order N², but that it is an N² human-to-human problem. The resulting reward-to-effort ratio gives pause for reflection.
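The scaling argument can be made concrete with a back-of-envelope sketch: pairwise mappings between N independently designed ontologies grow quadratically, whereas mapping each ontology once to a common ontology grows linearly.

```python
# Mappings needed for N ontologies: pairwise vs via one common ontology.

def pairwise_mappings(n):
    return n * (n - 1) // 2       # one mapping per unordered pair: O(N^2)

def common_ontology_mappings(n):
    return n                      # one mapping per ontology: O(N)

assert pairwise_mappings(10) == 45
assert pairwise_mappings(1000) == 499500      # vs just 1000 via a common ontology
assert common_ontology_mappings(1000) == 1000
```

Since each pairwise mapping is a human-to-human negotiation, the quadratic term is measured in person-hours, not machine cycles, which is what makes the reward-to-effort ratio so unattractive.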

3.4. Why using URIs is not a sufficient grounding strategy

A commonly encountered argument is that Uniform Resource Identifiers (URIs) can be used for any disambiguation needed for the Semantic Web. This is Sir Tim Berners-Lee’s own view (conversation with author, November 2006). After all, anything can have a URI, so why not simply use that as a unique identifier? If two concepts, individuals or properties have the same URI, they must therefore be the same. Problem solved!

Whilst this approach has its merits, it is not sufficient in itself to resolve all kinds of semantic interoperability problems. Granted, it can identify cases where two ontologies are referring to the same thing, but it cannot identify what that thing is, or establish that either ontology is representing or processing it appropriately based on its meaning. This is because the URI has no grounding mechanism to connect it to anything outside the information system: a textual description residing at the URI, or within the URI itself, is natural language and thus subject to ambiguity and vagueness.

The exception, of course, is when the thing being referred to is, exactly, the information that resides at the URI. For instance, an identifier for a particular tax law can be grounded to a URI that contains the exact text of the tax law, and thus there is no need to go further. But if the URI is intended to reference a real world physical thing, like a person or a building, or something else outside the information system itself, it needs a symbol grounding strategy.

A further consideration, for an information system on the intended scale of the Semantic Web, is who is going to check that every URI maps to one and only one real-world referent? No doubt many things will have more than one URI, in which case we still have the human problem discussed earlier of determining that the URIs have the same referent and mapping them, which obviously cannot be resolved using the URIs themselves. On the flip side, there will no doubt be many real-world things that have no corresponding URI, so the system is incomplete.

4. Next Steps and Conclusions

If Searle is right, cognition cannot be reduced to symbol manipulation. Semantic processing therefore requires an understanding of cognition in regard to meaning. The next steps underway in this line of research are the investigation of a wide range of grounded symbol systems, such as Musical Notation, Cartography, Chess Notation, Circuit Diagrams, Barcodes and even Knitting Patterns. These are being analyzed to determine the grounding strategies used, and how and why they are effective or ineffective. The analysis will identify the kinds of grounding strategies available and determine appropriate criteria for assessing them, and it is hoped it will provide a theoretical basis for constructing Symbol Grounding strategies for the Semantic Web. Following this, the question of devising appropriate processing and procedures to produce meaningful results will be addressed.

In conclusion, this paper has put forward some of the hard questions the Semantic Web needs to answer, examined some of the pitfalls that may occur if they are not addressed, and explained the relevance of the symbol grounding problem for the kinds of semantic interoperability issues commonly encountered. Some insights from measurement theory in Mathematical Psychology were briefly covered to illustrate how inappropriate correspondence between symbol and referent can result in logically valid but meaningless inference. Some of the shortcomings of the current Semantic Web technologies in dealing effectively at the level of meaning were investigated. The arguments that set extensions and URIs can provide an appropriate basis for grounding the Semantic Web were considered and found wanting. Finally, next steps for identifying effective grounding strategies and doing meaning-level processing were briefly discussed.

Acknowledgments. To my infant nephew, Gaelen Raphael Bonenfant, for providing a beautiful example of how the grounding process gets started.

Research reported in this paper has been partially financed by NICTA (http://www.nicta.com.au). NICTA is funded by the Australian Government's Department of Communications, Information Technology and the Arts, and the Australian Research Council through Backing Australia's Ability and the ICT Centre of Excellence program. It is supported by its members the Australian National University, University of NSW, ACT Government, NSW Government and affiliate partner University of Sydney.

5. References

1. Berners-Lee, T. & Fischetti, M. (1999). Weaving the Web. San Francisco: Harper.
2. Gärdenfors, P. (2000). Conceptual Spaces. MIT Press, Cambridge MA.
3. Gruber, T.R. (1993). A translation approach to portable ontologies. Knowledge Acquisition, 5(2), 199-220.
4. Harnad, S. (1990). The Symbol Grounding Problem. Physica D, 42: 335-346.
5. Mill, J.S. (1843). A System of Logic. London.
6. Narens, L.E. (2002). All you ever wanted to know about meaningfulness. Volume in the Scientific Psychology Series, Lawrence Erlbaum Associates, Mahwah, NJ.
7. Pollock, J.T. & Hodgson, R. (2004). Adaptive Information: Improving Business through Semantic Interoperability, Grid Computing, and Enterprise Integration. Wiley Series in Systems Engineering Management, John Wiley & Sons, Inc.
8. Rector, A., Drummond, N., Horridge, M., Rogers, J., Knublauch, H., Stevens, R., Wang, H. & Wroe, C. (2004). OWL Pizzas: Practical Experience of Teaching OWL-DL: Common Errors & Common Patterns. EKAW 2004 Proceedings, pp. 63-81. Available at www.co-ode.org/resources/papers/ekaw2004.pdf
9. Searle, J. (1980). Minds, brains and programs. Behavioral and Brain Sciences, 3: 417-457.
10. Smith, M.K., Welty, C. & McGuinness, D.L. (eds) (2004). OWL Web Ontology Language Guide, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/TR/2004/REC-owl-guide-20040210/. Latest version available at http://www.w3.org/TR/owl-guide/

8 Conclusion

In this final chapter, the conclusions of each publication are summarized, the contributions are reassessed with regard to the problems they addressed and the current state of development, and outstanding issues and future work required are highlighted.

8.1 Overview of Semantic Technologies

The “Overview of Semantic Technologies” publication presented in Chapter 2 concluded that whilst there are still open issues and challenges, and work still in the development phase, semantic technologies are already sufficiently developed to be applied in real-world scenarios, to produce interoperability and other benefits. This was shown by the included FEA case study. Since publication, the Semantic Web stack of technologies has progressed, with SPARQL [132] becoming a W3C Recommendation in January 2008, and work continuing via W3C working groups on OWL 2 [108] and the Rule Interchange Format [156].

The conclusion also pointed to the potential for semantic technologies to produce intelligent information services for finding, analyzing and using knowledge, and suggested that ultimately this would provide a basis for effective, informed, automated action. This is borne out by the progress made by the examples covered at Section 1.3.3 [39, 52], albeit acknowledging further gains yet to be realized.

A number of issues for semantic technologies were raised by the paper, and these are now reassessed in light of recent developments since the time of writing.

The first issue raised the need for data owners to semantically mark up their data, and the lack of automated and other techniques to ease the process. The W3C’s GRDDL (‘Gleaning Resource Descriptions from Dialects of Languages’) [33] eases the transition by enabling RDF to be embedded in XML and XHTML documents, and to be extracted from them using XSLT. This makes it possible to add semantic content to existing web documents, and provides better integration with the existing web protocols and standards. GRDDL became a W3C Recommendation in September 2007. The potential to do automated semantic markup has taken quite a large step forward with the advent of Reuters’ Open Calais [134]. Open Calais provides a web service that automatically creates rich semantic metadata for user-submitted content. Using natural language processing, machine learning and other methods, Calais recognizes and tags Named Entities, Facts and Events. Whilst it is unlikely to do markup quite as accurately as a human, it provides the ability to tag vast amounts of content with reasonable accuracy without human intervention, or alternatively may be used as a first pass by a human conducting semantic markup and ontology building, to speed the process.

The second issue raised the need for better tools and techniques for supporting large-scale semantic applications. In particular, there is a need to use traditional database technology in conjunction with ontologies, so that one may conduct semantic search and querying guided by ontologies but executed by database technology. However, the complexity of ontologies in relation to traditional database schemas requires a novel approach to fitting ontologies and databases together to achieve the best of both worlds. The work of Calvanese et al. on query rewriting [128], and of Franconi et al. on the ICOM tool [40], are promising avenues for providing as much as possible of the expressivity of ontologies for conceptual modelling, in combination with the performance achieved by traditional data warehouse technology for very large data sets. Enrico Franconi is currently visiting NICTA, and the author is discussing possible joint work with him on developing this approach.

The third issue related to the need to better support non-logicians in ontology building, particularly via the use of (controlled) natural language syntaxes. My own joint work on designing a Controlled Natural Language syntax for OWL addresses this need, and was covered at Chapter 5.

The fourth issue was the need for better standards and methods for resolving meaning without recourse to human intervention. One possible solution identified was the use of jointly developed ontologies as common standards within a domain, reducing the need for pairwise ontology mapping simply by commitment to a common ontology. Commitment to common ontologies may be encouraged via the approach described in Chapter 6 and the proposed n2mate tool. The second approach mentioned was the need for a better underlying semantic theory, encompassing a broader view of semantics that includes cognitive aspects of meaning. The publications included at Chapter 7 address this, and further comments on the approach follow at Section 8.6.
The fifth issue was the need for better methods for dealing with incomplete, uncertain and probabilistic data, for all those commonly encountered real-world situations where statements do not neatly map to a boolean truth value. In this regard, we note firstly that work on probabilistic extensions to OWL has been progressing, and Pellet [96] now offers an extension called “Pronto” that enables probabilistic knowledge representation and reasoning in OWL ontologies. Additionally, a W3C incubator group called ‘Uncertainty Reasoning for the World Wide Web XG’ (of which the author was a member) has been exploring representing and reasoning with uncertain information on the World Wide Web, and published its initial report at the end of March 2008 [155].

The final issue was the need for appropriate methods for dealing with proof, trust and security over semantically interlinked virtual data structures. Appropriate measures are needed to control who can access what data and at what level, and for intelligent agents/reasoners to be able to account for and justify their actions. The W3C’s Policy Languages Interest Group (PLING) [153] has formed to address these kinds of issues, as well as those of digital rights more broadly. The author’s NICTA colleague, Renato Iannella, is the PLING chair, and the author has been involved in some preliminary discussions with him about the creation of projects within NICTA which may assist in addressing these issues.

8.2 Competing Standards

One of the problems raised at Section 1.9 was that of competing standards for semantic representation, particularly those offered by the ISO under the Topic Map umbrella, as opposed to the W3C’s Semantic Web Recommendations.

The publications in Chapter 3 presented a method for integrating the Topic Map standards into the Semantic Web by proposing an OWL-DL formalization of the ISO draft standard Topic Map Data Model [51], illustrating the fit between OWL and the TMDM, and the potential for the use of OWL-DL as a medium for representing the TMDM constructs. The proposed formalism provided the basis for end-users to author Topic Maps in OWL-DL by populating the constructed OWL DL ontology with their own instance data. An outstanding issue relating to the implementation of Type-Instance and SuperType-SubType relations as specified in the TMDM was identified, and three approaches for implementing the required functionality were outlined: firstly, the use of OWL Full (not recommended); secondly, creating extra object properties to connect classes to classes to emulate the typing behaviour required; or thirdly, adding additional Java code. The conclusion stated that the use of OWL DL and its associated tools for constructing Topic Maps provides significant advantages over previous Topic Map representations in terms of explicit specification, formal semantics, constraint checking and querying capabilities.

It was noted (see Section 1.10.2) that in May 2005 the relevant ISO Working Group, including TMDM authors Lars Marius Garshol and Graham Moore, accepted that the proposed approach was a faithful mapping of the TMDM at the object level. However, Garshol, who is very active in authoring the Topic Maps suite of standards and tools, has since raised this as a criticism of the approach, noting in a blog post in 2007 [49] that the approach is essentially an ‘object mapping’ rather than a ‘semantic mapping’, and that this is in fact the wrong approach to take.
The distinction between mapping at the object level versus mapping at the semantic level features in the W3C RDF/Topic Maps Interoperability Task Force’s Survey of RDF/Topic Maps Interoperability Proposals [125], and originates from Graham Moore’s 2001 proposal [106], where it is couched as “mapping the model” vs “modelling the model”, the key difference being that the former is “semantic” whilst the latter “uses each standard as a tool for describing other models”. Essentially, an object mapping uses the language of one standard to model the other, whilst a semantic mapping states equivalences between the constructs of the two languages. For example (mine, not Moore’s), an object mapping might model the TM construct of ‘Topic’ as an OWL class named ‘Topic’ (as my TMDM OWL ontology did), whilst a semantic mapping might state that the TM construct ‘Topic’ is equivalent to the OWL construct ‘Class’. When considering a semantic mapping between Topic Maps and RDF, one requires a model-to-model mapping that identifies those entities and relationships that are in fact the same in both models. Quoting Moore, “The question we look to answer in this paper is ‘Is the nature of the relationships and the identified entities the same’. If we can show this to be true then we will be able to have a common model that can be accessed as a Topic Map or as a RDF Model. To make this clearer, if we have a knowledge tool X we would expect to be able to import some XTM syntax, some RDF syntax and then run either a RDF or TMQL query in the space of tool X and expect sensible results back across the harmonized model.” The Survey of RDF/Topic Maps Interoperability Proposals recognizes the relative merits of semantic mapping over object mapping, and recommends that it be pursued as the basis of any recommended approach to translation.
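The object/semantic distinction can be sketched in a few lines. The prefixed names below are illustrative only (they are not the thesis ontology), and the triples are written simply as (subject, predicate, object) tuples:

```python
# Sketch: the same Topic Maps construct handled by an object mapping and by
# a semantic mapping. Prefixed names are illustrative, not a real ontology.

def object_map_topic(topic_id):
    # Object mapping: OWL is used as a modelling language for Topic Maps,
    # so 'Topic' is an ordinary OWL class and each topic is an individual.
    return [(topic_id, "rdf:type", "tmdm:Topic")]

def semantic_map_topic(topic_id):
    # Semantic mapping: the TM construct 'Topic' is declared equivalent to
    # the OWL construct 'Class', so each topic becomes an OWL class itself.
    return [("tm:Topic", "owl:equivalentClass", "owl:Class"),
            (topic_id, "rdf:type", "owl:Class")]

# The two approaches yield structurally different RDF for the same topic:
assert object_map_topic("ex:Puccini") != semantic_map_topic("ex:Puccini")
```

Under the object mapping a reasoner sees topics as instances of a modelled class; under the semantic mapping it would treat every topic as a class in the target language, which is why the two approaches support quite different downstream querying.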
However, it also recognizes that for a semantic mapping there are a number of outstanding issues in deciding how to interpret a number of aspects of Topic Maps (identity, non-binary associations and roles are mentioned), and that there are also issues when the source model contains constructs that are not directly mappable into the target paradigm. This is clearly the case when attempting to map OWL into the Topic Map paradigm, as OWL is a far more expressive language, with many constructs that have no equivalents in the Topic Map language.

The question of disambiguating aspects of the models for intermapping has been tackled by the OMG in its Ontology Definition Metamodel [55], which includes RDF, OWL and Topic Maps, as well as other modelling languages including Common Logic, and uses metamodels underpinned by the Meta Object Facility (MOF) to describe each. This is essentially a semantic mapping using the common language of MOF to express equivalences between models. In order to determine the appropriate metamodels, the ODM authors found it necessary to work closely with the designers of each language to capture their intentions accurately.

To answer Garshol’s criticism, I would argue that whilst a semantic mapping approach is preferable in theory, given the actual situation at hand, there were definite merits in the object mapping approach taken by my contributions. A semantic mapping is certainly preferable given that one has adequate meta-level information about the modelling languages with which to do it (such as the metamodels crafted as part of the ODM effort), but at that time such information was unavailable. For a semantic mapping one must also assume that there are suitable utilities in place so that, once mapped, the respective languages become interoperable: then one can, for instance, as Moore requires, perform a TMQL query over an RDF model and get sensible results (assuming that this is at all feasible given the nature of the mapping; it certainly cannot be taken as given). Such utilities were not in place; furthermore, in doing a semantic mapping between OWL and Topic Maps, one must face the fact that OWL is far more expressive, and thus the mapping can only ever be partial. Garshol himself comments in the same blog post that TMCL is the appropriate language to compare with OWL for a semantic mapping, rather than the Topic Map language and data model. However, TMCL was in early development in 2005 and is still unfinalized as at August 2008.
If one is in the situation of having no available metamodel, nor adequate meta-level information to perform a semantic mapping, nor suitable utilities for performing such tasks, and furthermore one language is far more expressive than the other and has considerably more mature tools and capabilities, then an object mapping to the more expressive, mature language can provide considerable value by making its tools and capabilities available to the other, as my approach did. My approach compensated for the lack of TMCL and TMQL standards and tools by using OWL instead. With OWL being a far more expressive language, it was appropriate to map Topic Maps into OWL, but of limited value to attempt to map OWL into Topic Maps or to attempt a semantic mapping, as much of OWL’s expressivity would necessarily be lost.

In concluding, it is worth noting that as at September 2008, both the TMQL and TMCL specifications are still unfinalized, so the possibility of performing a TMQL query over an RDF model is still only theoretical, whereas the possibility of using OWL/RDF query languages to query a topic map implemented as an OWL-DL ontology has been available since the time of my offering at the start of 2005. Therefore, we believe it is reasonable to argue that the contribution was of some utility.

8.3 Rules and the Limits of Ontology Languages

In Section 1.9, it was pointed out that many classification tasks require ontologies to be supplemented with rules, thus the fit between OWL and rule languages and tools is an important issue. Whilst understood at a logical level, there has been a need for practical testing and illustration of how rules could be used in tandem with OWL ontologies in practice, including exploration of the impact of OWL design features such as the Open World Assumption, inspection of the extent to which tools can adequately support ontologies supplemented with rules, and encouragement of additional language features and tool development where necessary.

The work described at Section 1.10.3 and included in Chapter 4 addressed these needs, using a simple example OWL DL ontology to show the difficulties and challenges of implementing some intuitively simple classification tasks using OWL, SWRL and the Protégé environment. The aim was to classify a number of Student groups as GoodGroup if they satisfied all of a number of intuitively simple criteria, and as BadGroup otherwise. Whilst we experienced implementation difficulties, ultimately we were able to successfully build an OWL DL ontology with SWRL rules that in theory would successfully classify all GoodGroups and BadGroups. The publications showed the limitations of the expressivity of OWL, and illustrated several cases where it was necessary to supplement the ontology with SWRL rules in order to achieve the desired classification. The first paper was presented at OWLED in 2005, where possible extensions for OWL DL were first discussed, and it should be noted that some of the limits of OWL DL raised in the paper are being addressed in the design of OWL 2 extensions. In particular, OWL 2 supports qualified cardinality restrictions, which would support capturing Condition 2 (groups should have at least one male member and at least one female member) more easily and intuitively.
OWL 2 also includes property chain inclusion axioms, which make it possible to implement more classification criteria in OWL without the need to use a rule language.

Reasoning support for rules has also advanced substantially. Currently, most of the available OWL reasoners except FaCT / FaCT++ [114] offer support for rules, with SWRL being the usual rule language supported, although support for RuleML (Rule Markup Language) [79] is also now commonplace. (Note: SWRL combines sublanguages of OWL and RuleML [76].) SWRL is supported by the reasoners Hoolet [8], Pellet [96], KAON2 [42], and Racer, also known as RacerPro [60]. SweetRules [95] now provides an open source integrated set of tools for semantic web rules and ontologies, offering support for both SWRL and RuleML, whilst Bossam [105] is a RETE-based rule engine with native support for reasoning over OWL, SWRL and RuleML. Additionally, it should be noted that the W3C’s RIF (Rule Interchange Format) working group [156] is chartered to produce a core rule language plus extensions which together allow rules to be translated between rule languages and thus transferred between rule systems. This will provide greater interoperability in the use of rules and reasoners.

Although the languages and tools have evolved, the overall experience described in the papers remains pertinent. It illustrates that ontology building is not a straightforward process, and requires a deep understanding of the underlying logical structure. Without the input of our tutor, we might easily have fallen into the trap of believing we had captured all our conditions when in actual fact we had not. In a real-world situation this could have potentially disastrous consequences.

We documented several experiences that are commonplace when working with ontologies:

1. the need to backtrack in the ontology design when one is unable to easily capture desired axioms;

2. unexpected results in classification, due to inadequate understanding of the logical properties of the ontology;

3. the need to think very carefully about the logical ramifications of the axioms and the OWL design principles, particularly the Open World Assumption;

4. the need to create classes and properties as intermediate steps in order to express the desired axioms;

5. difficulty in constructing the desired axioms using the tools and syntaxes available;

6. the need for extremely thorough testing to ensure that inferencing behaves as intended.
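To make Condition 2 concrete, here is a minimal procedural sketch of the check under a closed-world reading. The group and gender data are invented for illustration; the paper's actual implementation used OWL and SWRL, not procedural code:

```python
# Sketch of Condition 2 (at least one male and at least one female member),
# checked procedurally over explicit data. In OWL 2 this corresponds to
# qualified cardinality restrictions. Names and data are illustrative.

def classify_group(members, gender):
    """Return 'GoodGroup' if the group has >=1 male and >=1 female member."""
    males = sum(1 for m in members if gender.get(m) == "male")
    females = sum(1 for m in members if gender.get(m) == "female")
    return "GoodGroup" if males >= 1 and females >= 1 else "BadGroup"

gender = {"Ann": "female", "Ben": "male", "Cal": "male"}
assert classify_group({"Ann", "Ben"}, gender) == "GoodGroup"
assert classify_group({"Ben", "Cal"}, gender) == "BadGroup"
```

Note the contrast with the Open World Assumption discussed below: this procedure concludes BadGroup from the mere absence of a recorded female member, whereas a DL reasoner could not, since an unrecorded member might still exist.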

This illustrates that ontology building is best undertaken by qualified ontology builders who have a deep understanding of the ontology axioms and any rules, and their logical consequences. Ontology building is not well suited to naive users or to casual collaboration: if one builds, modifies or adds to an existing ontology, unexpected results may ensue unless one has a deep understanding of the ontology as a logical structure.

OWL DL (and subsequently OWL 2) has been designed with an Open World Assumption, in line with the general web policy that the web is an open environment, so there is always the potential for more material to be added and we should never assume we have complete knowledge. This means that negation as failure is not supported. Whilst this allows an open environment in which more material can always be added, in practice it can be rather unwieldy to work with. As the papers illustrate, one must capture one's conditions positively rather than negatively: that is, in terms of what they are, rather than what they are not. In our experience this tended to require very complex axioms, although the conditions themselves were intuitively quite simple.

The absence of negation as failure can also be somewhat impractical for many use cases. When considering applications that use ontologies, one can imagine many situations where negation as failure is needed, and one would need to code around the reasoning processes in order to provide useful results to the user. A common scenario would be semantic search, where a user sets a number of criteria and the application attempts to satisfy these criteria in returning search results. If none can be found, the user presumably would not want the application to hang indefinitely because of the possibility that at some time something that fits the criteria might appear.
The application should be able to take the lack of a positive result from the reasoning processes as an indication that there is currently no useful solution, and might inform the user of this and request that the user relax their criteria, or perhaps relax the criteria by some pre-arranged method, embark on a revised search automatically, and inform the user of the closest available alternatives.

The experiences also highlight the importance of Proof and Trust, which are part of the overall plan for Semantic Web technologies as discussed at Section 1.3. Given that it is quite easy to obtain unanticipated results from inferencing processes even when one has built the ontology oneself, it seems paramount that users and applications that utilize ontologies are provided with some quality assurance as to their inferencing behaviour. Note that it is not enough that the results are logically sound: if the axioms are not set up correctly, then soundness is irrelevant and perhaps even misleading. What is needed is some assurance that the representation is appropriate, and that the inferencing behaviour adequately captures or reflects the designers' and users' intentions and intuitions about how it should behave.
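The fallback behaviour described here can be sketched as follows. The search criteria and the relaxation policy (drop the most recently added criterion) are hypothetical, chosen only to show the closed-world pattern at the application layer:

```python
# Sketch: treat "no positive result" as "no current solution" and apply a
# pre-arranged relaxation, rather than waiting for a match to appear.
# Criteria and data are illustrative.

def search(items, criteria):
    return [x for x in items if all(x.get(k) == v for k, v in criteria.items())]

def search_with_relaxation(items, criteria):
    results = search(items, criteria)
    if results:                       # positive result: return it as-is
        return results, criteria
    relaxed = dict(criteria)
    relaxed.popitem()                 # pre-arranged policy: drop last criterion
    return search(items, relaxed), relaxed

hotels = [{"city": "Sydney", "pool": False},
          {"city": "Sydney", "pool": True}]

# No hotel in Perth with a pool; the city criterion is dropped and the
# closest available alternative is returned instead.
results, used = search_with_relaxation(hotels, {"pool": True, "city": "Perth"})
assert used == {"pool": True}
assert results == [{"city": "Sydney", "pool": True}]
```

The point is that negation as failure lives in the application code, not the reasoner: the application decides, on its own closed-world terms, that the absence of an answer means there is currently no answer.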

8.4 Readable Syntax

One of the issues identified for the Semantic Web in Section 1.9 was the need for a readable syntax so that domain experts and others could more easily and accurately author OWL ontologies. It was pointed out that it was important to ensure that logical precision was not compromised in the design of such a syntax: natural language comes with expectations and understandings inherent from its use as a natural language, and these may compete with the needs of a logical language, where precision is paramount.

The first publication included at Chapter 5 set out the scope, design goals, decisions and choices informing a proposed Controlled Natural Language syntax for OWL dubbed ‘Sydney OWL Syntax’. This syntax was presented at OWLED 2007, where it became apparent there were three parallel efforts on designing a CNL syntax for OWL, and a task force was set up to work towards agreeing on a common CNL syntax for OWL 2. The task force compared the three, and the comparison was published in the second paper included at Chapter 5. This comparison showed that although there are clearly differences between the three CNLs, there is considerable overlap between them and therefore much common ground to build on. The four principal areas of difference identified were:

1. Style. For example, ACE chooses to hyphenate noun phrases: river-stretch, whereas Rabbit and SOS allow River Stretch and river stretch respectively (the capitalization in Rabbit being another minor difference). However, these stylistic differences were considered easily resolvable and thus least important.

2. Phraseology. The three syntaxes take slightly different approaches in expressing certain constructs. For instance, when expressing part-whole relationships, ACE and Rabbit both opt for has-part/has part, whereas SOS uses ‘has’ in conjunction with the phrase ‘as a part’ at the end of the clause.

3. Mathematical Constraints. Probably the biggest area of difference is how the CNLs represent mathematical constraints such as transitivity. Rabbit’s approach proceeds on the meta-level by talking about the construct explicitly, e.g. ‘The relationship “is larger than” is asymmetric’, whereas SOS and ACE use variables in order to speak on the object level, e.g. ‘If X is larger than Y then Y is not larger than X.’ In general, SOS has a less verbose form than ACE.

4. Design Principles. Rabbit is designed to allow the domain expert to work in co-operation with a knowledge engineer, and thus does not attempt to remove notions such as ‘asymmetric’ from the syntax. ACE supports the user in expressing themselves in multiple ways, but sacrifices a one-to-one mapping between OWL and ACE: different ACE sentences can map to the same OWL axiom, and conversely, different OWL axioms can be represented by the same ACE statement. SOS enshrines the notion of a one-to-one mapping between CNL and OWL syntax, and attempts to give as much naturalness of expression as possible without sacrificing logical precision.

The paper concluded that there is sufficient commonality between the three CNLs described to provide a good base from which to proceed. Many of the differences are best resolved by proceeding empirically: that is, by conducting user testing which offers statements in different syntaxes and tests which ones give users the best comprehension.

In conclusion, this exercise has some challenging territory to overcome on several fronts. One is that the precise meaning of an OWL axiom is defined by the underlying logical semantics, so it can be very challenging to provide comprehension for users who do not understand logic. All one can do is express the axiom in different ways in natural language and hope the idea comes across. However, users who are not familiar with formal logics may simply lack the mental architecture to fully understand the axiom, in whatever syntax it is written. In this case, whilst the syntax may not necessarily be at fault, the empirical results will not show user comprehension. However, at least we may expect that this factor would not disadvantage one syntax over another: they should all be equally affected.
Also, the exercise walks a fine line between stipulative and lexical meaning: whilst logical semantics are specified stipulatively, the meaning of natural language arises lexically, from familiarity and use. The SOS approach attempts to define exactly one controlled natural language equivalent for every OWL statement, with the precise meaning of both being given by the OWL statement. Whilst the OWL statement thus constitutes the stipulative meaning, the SOS authors have attempted to delve into the English language and deliver an equivalent that both sounds natural and precisely corresponds to the OWL axiom, so that readers who do not understand the OWL axiom can use the English version instead to understand what the axiom means. However, any natural English statement derives its meaning primarily through lexical definition: that is, by how it and its constituents have been used in the readers’ experience. The natural meaning thus arises from a source other than the underlying logical semantics, and it can be difficult to balance these two aspects. In designing SOS, we attempted to use our own understanding of both OWL and natural English to come up with the closest possible match between the stipulative definition OWL provides, and a lexically-based English equivalent. However, lexical meaning, by its very nature, is neither fixed nor precise, so the match can only ever be approximate. Attempting to give readers who lack the ability to read a stipulative definition access to it through lexically defined language has obvious utility, but also necessarily comes with the lexical ‘baggage’ that the words of the controlled natural language carry. The difference between stipulative and lexical definition is treated in depth by the first paper included in Chapter 7.

8.5 Interoperability of Ontologies

Section 1.9 pointed out the need for semantic interoperability in order for applications to use data from different sources seamlessly. As explained at Section 1.10.5, the most straightforward way to go about this is by encouraging reuse of existing ontologies, rather than adding to the plethora of ontologies needing to be intermapped to achieve interoperability. The publication included at Chapter 6 proposed a tool called ‘n2mate’ to encourage vocabulary and ontology re-use. The unique aspect of this proposal was to leverage the otherwise hidden formal and informal knowledge networks created by existing business processes by marrying them with social networking models and tools. This provides a means to expose, organize and navigate the available information about existing knowledge artefacts, including pre-existing social and business agreements underlying them. The proposed n2mate tool provides a central reference point to collect such information and user opinions, in order to provide potential vocabulary and ontology users with enough reassurance and confidence to identify, choose between and commit to the use of existing knowledge artefacts instead of feeling compelled, through ignorance, to reinvent the wheel. Furthermore, it is expected that research into the strategy proposed would provide contributions to related projects, such as the development of:

• A lightweight mechanism revealing the state of interconnectedness in and between discourse communities.

• A bridging space between government, business, community, academia and science knowledge assets to enhance broadscale interoperability.

• A genetic algorithm to breed, select, and hybridize various standards artefacts such as ontologies, services, and trust authorities.

In short, we noted that there were significant inefficiencies in project scoping, the development of information products and online service provision, and that this is largely attributable to the inadequacies of existing knowledge registers. In contrast, we believe that use of the proposed tool would enable consensus building and lead to the emergence of a relatively small number of high quality ontologies in each domain being endorsed by the community as the ones to use. For instance, factors such as the endorsement of one ontology by a particularly large or influential player could have a powerful flow-on effect in brokering these de facto standards. We would then expect to see take-up of the most endorsed ontologies solidify and a community of practice coalesce around them. It is likely that the communities concerned would make the effort to intermap the most frequently used ontologies, producing a platform of semantic interoperability for their domain. Moving forward with tool building is dependent on obtaining funding, and unfortunately such funding has not yet been identified. A number of avenues are being considered, including partnership with large players in the relevant domains and in search generally, and with bodies that have a mandate to produce vocabulary standards, such as the Australian Government Information Management Office (AGIMO) in the Australian Government space.

8.6 Making Meaning Machine Processable

Section 1.9 noted that in order for the Semantic Web to move closer to the vision of truly machine processable meaning, it needs to examine a number of questions to do with how one should go about defining one’s terms to make their meaning completely transparent, how one should determine the referent of a term, and what kind of constructs might be needed to capture such notions. The papers included at Chapter 7 addressed these questions, albeit leaving much to be answered by future work. The first paper, which examined the potential for a science of definition, concluded that such an enterprise would require an understanding of common semantic ground as it exists in the human conceptual landscape, and the ways in which divergences from it are made. The paper delineated some of the ways in which the conceptual landscape is naturally structured, and the means by which abstract constructs may be overlaid or built within it. Taking a cognitive viewpoint, it suggested some paths for investigating where common semantic ground may be located and how and where divergences from it arise. It also mentioned some avenues, including empirical studies, which may be of use in inferring the elements and structure of the conceptual landscape. It concluded that by putting together the various strands, it may well be possible to describe an underlying shared conceptual space which would provide a viable basis for robust methods of definition.

The second paper put forward some of the hard questions the Semantic Web needs to answer with regards to meaning, examined some of the pitfalls that may occur if they are not addressed, and explained the relevance of the symbol grounding problem for the kinds of semantic interoperability issues commonly encountered. Some insights from measurement theory in Mathematical Psychology were briefly covered to illustrate how inappropriate correspondence between symbol and referent can result in logically valid but meaningless inference.
Some of the shortcomings of the current Semantic Web technologies in dealing effectively at the level of meaning were investigated. The arguments that set extensions and URIs can provide an appropriate basis for grounding the Semantic Web were considered and found wanting. Finally, next steps for identifying effective grounding strategies and doing meaning-level processing were briefly discussed. These included the investigation of a number of grounded symbol systems, such as Musical Notation, Cartography, Chess Notation, Circuit Diagrams, Barcodes and even Knitting Patterns, with a view to analyzing these to determine the grounding strategies used, how and why they are effective or ineffective, identifying the kinds of grounding strategies available, and determining appropriate criteria for assessing them. It was hoped that this would provide a theoretical basis for constructing Symbol Grounding strategies for the Semantic Web, and would inform the question of devising appropriate processing and procedures to produce meaningful results.

8.6.1 Further Analysis

Whilst this analysis is far from complete, it is progressing and has produced the following initial insights:

1. Each of these representational systems represents a limited number of dimensions of human experience: for example, musical notation (excluding textual notes such as ‘Adagio’) represents the pitch and timing of sound. Chess notation represents chess pieces and their positions on the board in series as players take turns. Barcodes represent organizations and products and refer to registration schemes that record official identifiers for organizations and products. There is usually a very clear relationship between each dimension and some aspect of the representation: e.g. the vertical position of notes in musical notation has a one-to-one correspondence with tones of each pitch that are allowed by the musical system. Therefore the relationship between sign and signified is not purely arbitrary, but is in a clear correspondence with the referent (signified), possibly in a way that reflects the underlying structure of the system.

2. The Peircean distinction between modes of signs is informative ([123], 1.564). Peirce outlined three fundamental semiotic modes reflecting the arbitrariness of the relationship of the sign to the signified:

(a) In the symbol/symbolic mode, the signifier does not resemble the signified but is purely arbitrary or conventional, e.g. the use of red as the signal for stop. This is the mode discussed previously in relation to the Symbol Grounding problem.

(b) In the icon/iconic mode, the signifier is perceived as resembling the signified in some way, e.g. looking, sounding or feeling like it, so that it evokes some of the same qualities in a sensory way. This is the case with scale models or maps, cartoons, imitative gestures, sound effects, or onomatopoeia in speech. This mode is also commonly exploited in metaphor, e.g. “a weighty argument”, “transparent motives”, as has been expounded by Lakoff and Johnson in their seminal work “Metaphors We Live By”, relating the use of metaphor to embodiment as mediated by image schema.

(c) In the index/indexical mode, the signifier is not arbitrary but is directly connected to the signified either physically or causally, through a link which can either be observed or inferred. Examples include ‘natural signs’ such as smoke indicating a fire, measuring instruments such as weathercocks, thermometers and rain gauges, and recordings such as photographs and films. Whilst the symbol grounding problem relates to the symbolic mode, the iconic and indexical modes clearly have a more grounded relationship between sign and signified.

3. In the iconic mode, some systems map one or more dimensions into a spatial representation which is an analog of the physical one. The simplest example is standard cartography, where length of the area being represented is represented as length on the map but is simply scaled down. However, length can be used to represent any quantitative dimension, such as temperature or time. Such representations usually preserve the inherent quantitative structure: for instance, if we are representing elapsed time spatially, and x is shown as twice as long as y in the spatial representation, this reflects that the corresponding elapsed time x took twice as long temporally speaking as elapsed time y. Spatial representation can also be logarithmic, and may preserve orderings or relations but lose quantitative structure, such as is the case with most subway maps.

4. Some representation systems describe a phenomenon that has some underlying structure. For instance, Western musical notation assumes the use of a musical system predicated on octaves and tone/semitone scales that are themselves dependent on the use of harmonics in sound waves. This structure may be quite important in the interpretation of the representation.

5. Some representational systems are reliant on accompanying processes. For instance, retail store barcodes are reliant on two registration processes: the first one sets up a correspondence between organizations and organization codes, whilst in the second, organizations catalogue their own products and assign an additional identifier.

6. Different representational systems are set up for different purposes and should therefore be interpreted in different ways. For instance, they may be:

(a) Instructional. Examples include musical notation and knitting patterns, as musical notation may be interpreted as instructions for reproducing a piece of music physically, and knitting patterns as instructions for producing a physical garment.

(b) Recordings. Notations can also be used to record events; for example, chess notation usually records the actual moves played in a game.

(c) Pragmatic. In some instances, an exchange of notation reflects an activity in itself. For example, chess games can be conducted by correspondence by the exchange of moves in chess notation. This is possible because a game of chess is primarily a conceptual phenomenon rather than a physical one: whilst a board can be used to indicate each player’s intentions, it can be done equally well with a representation that assumes the structure of the board and pieces.

(d) Identificational. A barcode provides an entry point into an identification system that is dependent on registration processes whereby organizations and products are put into one-to-one correspondence with registration codes that may then be encoded as barcodes. In this example the barcode is physically present as a label, connecting symbol directly to referent. Reading the barcode provides access to the mediating thought/reference, in this case the identificational system.
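The two-stage registration behind retail barcodes can be sketched in a few lines. The following is purely illustrative: all codes and names are invented, and the fixed 7/5 split of the digits is a simplification (real EAN-13 codes use variable-length company prefixes and a check digit).

```python
# Hypothetical sketch of the two registration processes behind a retail barcode.
# Registry 1 maps organization prefixes to organizations; Registry 2 is each
# organization's own catalogue assigning product codes within its prefix.

ORG_REGISTRY = {"9312345": "Acme Foods Pty Ltd"}           # prefix -> organization
PRODUCT_CATALOGUES = {
    "9312345": {"00017": "Instant Coffee 200g"},           # org's own product codes
}

def resolve_barcode(code):
    """Resolve a barcode string to (organization, product) via the two registries."""
    prefix, product_code = code[:7], code[7:12]
    org = ORG_REGISTRY.get(prefix)
    product = PRODUCT_CATALOGUES.get(prefix, {}).get(product_code)
    return org, product

print(resolve_barcode("931234500017"))
# -> ('Acme Foods Pty Ltd', 'Instant Coffee 200g')
```

The point of the sketch is that the barcode symbol is meaningless in isolation: its referent is fixed only by the accompanying registration processes, exactly as noted in item 5 above.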

8.6.2 Building Situation Awareness from Sensor Input

The three semiotic modes described by Peirce are illuminating in terms of identifying a way to relate symbols, or rather signs, more directly to the physical and causal world, and thereby bootstrap into a fully symbolic mode. Recently I have worked full time with a team of researchers on a 12 month pilot project, concluding July 2008, that illustrates the principle and provides some insight into how this bootstrapping might occur. This project, entitled ‘Situation Awareness by Inference and Logic’ (SAIL) [4], was a joint project between NICTA and the Australian Defence Science and Technology Organisation (DSTO). It involved processing data from sensors in order to build up a symbolic representation of a situation of interest, with the general objective of identifying threats or potential threats in a military scenario involving vehicles of various allegiances moving in time and space and exhibiting certain behaviours and intentions. Ultimately the system was able to issue alerts automatically when pre-defined situations or factors of interest arose, and could be queried by operators using a Controlled Natural Language interface. The pilot project resulted in a fully functional prototype SAIL system, with architecture as shown at Figure 8.1.

[Figure 8.1: SAIL Prototype System Architecture. Layers from bottom to top: sensor data streams SDi and a GIS database feed a Data Aggregation layer (Rules/E-KRHyper); this populates successive ABoxes in a Semantic Analysis layer (Description Logics/RacerPro) loaded with CNL background knowledge (scenario/intelligence); a control program and reasoner interface connects CNL assertion, query and alert-generation modules (LTL/BA), with CNL formalization and generation handling CNL queries, answers and alerts.]

The architecture includes a ‘Data Aggregation layer’ that processes input data streams providing time-stamped sensor data SDi about vehicles such as aircraft, ships, etc., moving in the environment, giving their location, speed, acceleration and so on. Using a First Order Logic reasoner, E-KRHyper [124], this sensor data is processed using first order logic rules that aggregate the data, using numerical calculations where necessary, and combining it with background knowledge including GIS information and information about vehicles, facilities and capabilities to build primitive concepts that are then passed as primitives to the ‘Semantic Analysis layer’. This layer is powered by the RacerPro reasoner [60] and is able to issue alerts and be queried using Controlled Natural Language (CNL) modules. The work conducted provides some interesting insights into the symbol grounding problem, as it deals with a case where, instead of needing to deal with subjective human experience, as was attempted in Towards a Science of Definition, we were able to work with a more explicit analogue of it, provided in the objective form of sensor data streams.
By virtue of this, instead of using mental processes to build concepts in a subjective, conceptual landscape, we were able to construct and execute the analogous explicit machine processable rules in FOL to convert sensor data into explicitly defined primitive concepts. These concepts then provided the base for a Semantic Analysis layer that defined non-primitive concepts in terms of these primitives. This layer uses description logic to define non-primitive concepts using the primitive concepts passed to it by the data aggregation processes. High-level concepts and roles (unary and binary relations, respectively) are introduced by means of DL axioms that define them in terms of the primitive concepts populated by the data aggregation layer and also other high-level concepts defined in the Semantic layer. For example, there is a first order logic rule in the Data Aggregation layer that identifies a ‘take off’ event when an object first appears in the sensor data, and its detected location is computed (by reference to GIS data) to be ‘in air’ (as opposed to ‘on land’ or ‘under water’). Firing the rule creates a new ‘take off’ event with a unique event ID, timestamped with the current time. This newly created instance of ‘take off’ event is then passed to the ABox of the Semantic Layer. The semantic layer uses the ‘take off’ event as a primitive concept that is a constituent in the construction of DL definitions for non-primitive concepts such as exhibiting ‘aggressive behaviour’ toward enemy targets. The primitive concepts used in the ontology in this project therefore have a direct relationship to sensor data streams by virtue of the Data Aggregation rules. Whilst the sensor data streams are presented to the data aggregation layer of the system in a symbolic form, they result from an indexical mode where they reflect measured values of an object on a number of physical dimensions. 
They may therefore be considered to be an explicit and machine processable analogue of perceptual data streams. We argue that since it is possible to show a direct, rule-based relationship to sensor data, the primitives which are given by such rules are grounded in experience, in that they essentially ‘bottom out’ to sensory experience, and we need go no further to determine what they mean. Given the sensor data streams and the rule-based definitions as executed by the FOL rules, the primitive concept is both explicit and machine processable: the primitive concept is stipulatively defined to be the rule that is executed at the data aggregation layer. It should be possible to resolve any disagreement about what the concept refers to by recourse to this definition. Furthermore, since it is an explicit rule, it should also be possible to negotiate the similarity or difference between it and other concepts by machine processing that is set up to reference and compare the rules underpinning the respective concepts.
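To make the shape of such a data-aggregation rule concrete, the ‘take off’ example can be restated as a sketch in Python. This is purely illustrative: the actual SAIL system expressed the rule in first order logic executed by E-KRHyper, and every name here (SensorDatum, aggregate_take_off_events, the altitude threshold standing in for the GIS ‘in air’ test) is a hypothetical simplification, not the project’s code.

```python
from dataclasses import dataclass

@dataclass
class SensorDatum:
    """One time-stamped sensor reading (a stand-in for an element of a stream SDi)."""
    object_id: str
    timestamp: int
    altitude_m: float  # in the real system, location is checked against GIS data

def aggregate_take_off_events(stream):
    """Data-aggregation rule sketch: emit a 'take off' primitive-concept instance
    the first time an object appears in the stream while located 'in air'."""
    seen = set()
    events = []
    for datum in stream:
        first_appearance = datum.object_id not in seen
        seen.add(datum.object_id)
        in_air = datum.altitude_m > 0.0  # crude proxy for the GIS 'in air' test
        if first_appearance and in_air:
            # New instance with a unique event ID and timestamp, ready to be
            # passed to the ABox of the Semantic Analysis layer.
            events.append({"concept": "TakeOffEvent",
                           "event_id": f"takeoff-{datum.object_id}-{datum.timestamp}",
                           "object": datum.object_id,
                           "time": datum.timestamp})
    return events

stream = [SensorDatum("uav1", 100, 350.0),   # first seen, in air -> take off
          SensorDatum("ship7", 100, 0.0),    # first seen, not in air
          SensorDatum("uav1", 101, 420.0)]   # already seen
print(aggregate_take_off_events(stream))
```

The key property the sketch preserves is that the rule itself is the stipulative definition of the primitive: what ‘take off’ means is exactly what the rule computes over the stream.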

8.6.3 A Strategy for Symbol Grounding

It is suggested that this form of strategy is appropriate for use in symbol grounding, by the following method:

1. Identify an explicit, measurable analog for sensory data. It should make use of physical dimensions. For example, colours might be captured by a data stream that gives wavelength values.

2. Define primitive concepts in terms of rules over these data streams. The rules must be explicit and complete. Rules may make use of both logical and numerical operators.

3. Each primitive concept should be assigned an associated symbol at Step 2, so that the definition of the symbol is then stipulatively the rule used.
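The three steps above can be sketched as a minimal example, grounding colour symbols to a wavelength stream. The wavelength bands are approximate visible-spectrum conventions, not authoritative values, and the names are invented for illustration.

```python
# Step 1: the data stream supplies wavelength readings in nanometres, an
#         objective analogue of colour sensation.
# Step 2: each primitive concept is an explicit, complete rule over that stream.
# Step 3: the symbol's definition is stipulatively the rule assigned to it.

PRIMITIVES = {
    # symbol -> grounding rule over the wavelength stream (approximate bands)
    "red":   lambda nm: 620.0 <= nm <= 750.0,
    "green": lambda nm: 495.0 <= nm <= 570.0,
    "blue":  lambda nm: 450.0 <= nm <= 495.0,
}

def classify(nm):
    """Return every primitive symbol whose grounding rule the reading satisfies."""
    return [symbol for symbol, rule in PRIMITIVES.items() if rule(nm)]

print(classify(700.0))  # a 700 nm reading grounds out to ['red']
```

Because each rule is explicit and machine processable, two agents disagreeing about ‘red’ can, in principle, resolve the disagreement by comparing rules rather than negotiating in natural language.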

Clearly this method is appropriate for those concepts that are directly related to sensory data, such as ‘red’ or ‘hot’, where there are measurable physical variables that are objective analogues of subjective sensory experience. For example, colours perceived by humans have been related to the measurable physical variables of wavelength, saturation and brightness, and empirical science has shown that varying these dimensions will result in a perceivable difference in colour [48]. A space defined by these dimensions may therefore be taken to be an explicit analogue for a subjective perceptual space such as those described by Gärdenfors [48]. This is not to argue that concepts reduce to the sensory inputs they generate - that would be far too simplistic. However, if a concept has meaning, it is argued that it is by virtue of some noticeable impact at some level of human experience, and therefore there is a path to connect it to the tangible.

Applying this method becomes more challenging as one moves away from sensory-based concepts to more abstract concepts like ‘Tuesday’ or ‘justice’: in these cases there is a need to include reference to functionality, social processes and other processes. There is also undeniably a need to perform logical and other operations over the raw inputs. In order to give an idea of how these kinds of concepts might be grounded, let us consider the case of ‘Tuesday’. To ground the days of the week requires several steps. Firstly, one identifies the essential thing to which the symbol is grounded. Temporal concepts may be found to be grounded in physical events or processes; in this case, the relevant event is the rotation of the earth in its orbit around the sun and on its own axis over time.
(For the sake of simplicity, let us leave aside for the moment the fact that the SI unit of time, the second, is now grounded to periods of radiation of the cesium-133 atom - whilst the method to be described can accommodate this, it makes the explication unnecessarily complex.) The arc of the earth’s movement provides the grounding for ‘year’ and ‘day’ respectively: a ‘year’ being a full rotation of the earth around the sun and a ‘day’ being a full rotation of the earth on its own axis. These physical processes are known to occur through observation of physical stimuli in combination with rational thought - we will not attempt to justify this further here, except to note that such physical processes are empirically verifiable and exist independently of our own descriptions or definitions of them. The essential point for a grounding to be appropriate is that it establishes an objective and unambiguous means for resolving ambiguity. It is important to note that physical processes are not an appropriate form of grounding for all concepts; the appropriate grounding is simply the base level beyond which there is no need to clarify a notion any further in order to avoid ambiguity. More work is needed to fully understand the nature and forms of grounding in all their variety. Secondly, one relates the physical process to a symbol system: in this case the Gregorian calendar performs this function, as it is a symbolic system organised around the essential notions of ‘year’ and ‘day’, grounded to physical processes as described. The calendar itself has its own internal system of organisation, which relies on naming, sequence, numbering (which itself is a naming/sequence symbol system with particular properties), subdivision and aggregation. By using such operations any aspect of the calendar (symbol system) can be reduced to one or both of the core grounded concepts, ‘year’ and ‘day’. Through empirical observation and counting, it is known that a year consists of some 365 days.
These are aggregated into groups subdividing the year. Note that two subdivisions operate in parallel: the division of the year into months and the division into weeks. Months have a defined number of days each and are named ‘January’, ‘February’ and so on. The names are defined to occur in a given sequence, again corresponding to arcs in the sequential movement of the earth in its orbit around the sun - the movement of the earth in February 2008 is a continuation of the arc of movement of January 2008. The other system of subdividing years is weeks, which are defined to include precisely seven days. Unlike months, weeks within a year are not named, but may be numbered in sequence, again corresponding to the sequential movement of the earth in its solar orbit. The seven days within a week are named ‘Monday’, ‘Tuesday’ and so on, and the names are defined to occur in a given sequence - ‘Tuesday’ immediately follows ‘Monday’. One may move from the general level to specific years and dates by bedding down temporo-spatial points where a particular year or day starts and ends, for instance. Sequential numbers are allocated to represent sequential years, ‘2008’ being assigned to the current year for instance, whilst of course specifying a date makes use of year, month and day. Thus in summary, to ground the concept ‘Tuesday’ requires a combination of factors:

1. identification of the underlying physical process, itself established through empirical means;

2. a symbolic system - the Gregorian calendar - which has a clear correspondence between its own base symbols ‘day’ and ‘year’ and the key aspects of the physical process - this establishes grounded primitives, or what we might call ‘semantic primitives’. The system includes a full specification of its own internal structure in terms of the primitives; and

3. a set of operations to name and give internal structure to the various other elements of the representation system. For the Gregorian calendar, these include naming, sequencing, numbering, subdivision and aggregation, but others may be needed for other representational systems. Using these operations, ‘week’, ‘month’, ‘January’ and ‘Tuesday’ are defined.

An interesting point to note is the variety of ways in which symbols are assigned in this example, with each level of definition building upon the previous one:

1. The symbols ‘year’ and ‘day’ each assign a symbol directly to an aspect of the physical process;

2. The symbols ‘month’ and ‘week’ are defined in terms of subdivisions and aggregations of ‘year’ and ‘day’;

3. The symbols ‘January’ and ‘Tuesday’ are assigned by sequencing the months within a year and the days within a week, and assigning names accordingly.

4. Whilst the above may all be defined using relativities, a system of absolute reference is needed to specify specific years and dates. This may be done by giving a specific time/space referent where the arc of movement starts.

To complete the example, consider the case of another calendar system in addition to the Gregorian, which is also grounded to the movement of the earth, but has its own independent system of symbols with their own internal relationships. If the grounding and relationships within the representation system are made explicit, machine processing can feasibly be used to determine the semantic relationships between the symbols of the two systems, by reference to the relationships of the underlying arcs of movement the symbols refer to. The grounding provides a fully specified and unambiguous means to determine the referent of a symbol from either system, and thus the semantic relationship between symbols of the two systems. Without a known and specified grounding, the equivalence or relationship of symbols between the two systems must be determined by human negotiation. Note, however, that it is sufficient to establish relationships between the primitives of the two systems, the other symbols then being described by the respective internal definitions. Examples of other grounded systems include latitude/longitude as a system for establishing locations on the surface of the earth; musical notation as a system for reproducing sound frequency and timing; and family trees as a representation of genetic relationships between individuals, the core primitive being parent/child inheritance.
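A small, runnable illustration of translating between two calendar systems via a shared grounding: the Gregorian calendar date and the ISO week-numbering calendar can both be reduced to a count of days, here Python’s proleptic-Gregorian day ordinal as a convenient mechanized stand-in for the grounded ‘day’ primitive. The function names are invented for the sketch; only the datetime library calls are standard.

```python
from datetime import date

def gregorian_to_ground(year, month, day):
    """Reduce a Gregorian date to the shared grounding: a count of days."""
    return date(year, month, day).toordinal()

def ground_to_iso_week(ordinal):
    """Express the same grounded day in the ISO week calendar: (year, week, weekday)."""
    iso = date.fromordinal(ordinal).isocalendar()
    return (iso[0], iso[1], iso[2])

# 'Tuesday' in the ISO system is simply weekday number 2 of some week.
ordinal = gregorian_to_ground(2008, 9, 2)
print(ground_to_iso_week(ordinal))  # -> (2008, 36, 2): week 36, weekday 2, a Tuesday
```

Because both symbol systems bottom out to the same day count, the translation requires no human negotiation: it suffices to relate each system’s primitives to the grounding, exactly as argued above.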

8.7 Summation

In summary, this thesis has introduced the Semantic Web and presented it both as a vision and as the actual technologies produced and planned to date. The notion of meaning as it has so far been implemented in the Semantic Web languages OWL and RDF was explained as being based on model theoretic semantics, leaving the door open to consider other aspects of meaning and how they might be harnessed. A number of problems facing the Semantic Web were expounded, including accessibility of the concepts, competing standards for building semantic structures, adding rules to OWL ontologies, a readable syntax for OWL ontologies, interoperability between ontologies, and making meaning machine processable. The included publications offered a number of contributions addressing these problems. Chapter 2 offered an accessible introduction to Semantic Technologies and provided additional background material. In Chapter 3, a method for building topic maps in OWL DL that were conformant to the ISO’s Topic Map data model was presented. Although the method has been criticized for being at the object level, it provides some utility in that it gives access to the OWL DL semantics and tools. The need to add rules to OWL ontologies was explored in Chapter 4, illustrating first-hand the impact of some of OWL’s design choices, particularly the open world assumption. The additional expressivity of OWL 2 addresses a number of limitations that were identified by this work. The work also highlighted the need for ontology building to be undertaken with a deep level of understanding of the ontology as a logical structure. In Chapter 5, we identified the need for a more readable syntax for OWL, assessed the design choices for creating a CNL syntax for OWL, presented Sydney OWL Syntax and ultimately compared it with two other CNL efforts, ACE and Rabbit. More work remains to be done to agree on a common CNL syntax for OWL 2.
The issue of semantic interoperability was considered in Chapter 6, where the n2mate tool was proposed as a simple mechanism for addressing it: encouraging ontology reuse by tapping into contextual information and social networks. This work remains unimplemented pending funding. Lastly, Chapter 7 considered some of the deeper issues concerning meaning and making it machine processable, offering insights into the nature of definition and into the Symbol Grounding Problem and its importance for the Semantic Web. A method for tackling the latter was suggested in Section 8.6, involving the use of rules operating over objective analogues of sensory experience; this approach was successfully executed by the author as part of a team working on the SAIL project at NICTA. Further extensions of the method for moving beyond sensory input were considered and an example expounded.

Much work still remains to be done, especially on the vital question of how to provide a truly machine-processable notion of meaning. Resolving this question is crucial to providing a suitable foundation for what will ultimately be a ‘Pragmatic Web’, in which machines can be trusted to take practical real-world actions based on the outcomes of mechanical inference, without fear of error due to semantic ambiguity and misinterpretation.

In closing, it is hoped that the investigations undertaken in this thesis will provide value and insight to those who continue to contribute to Semantic Web technologies, and embark on the great journey towards realising the modern vision of Leibniz’s dream.
