University of Southampton Research Repository Eprints Soton
Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders.

When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given, e.g.

AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination

http://eprints.soton.ac.uk

UNIVERSITY OF SOUTHAMPTON
FACULTY OF PHYSICAL AND APPLIED SCIENCES
School of Electronics and Computer Science

Archaeology and the Semantic Web
by Leif Isaksen

Thesis for the degree of Doctor of Philosophy
December 2011

UNIVERSITY OF SOUTHAMPTON
ABSTRACT
FACULTY OF PHYSICAL AND APPLIED SCIENCES
SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE
Doctor of Philosophy
ARCHAEOLOGY AND THE SEMANTIC WEB
by Leif Isaksen

This thesis explores the application of Semantic Web technologies to the discipline of Archaeology. Part One (Chapters 1–3) offers a discussion of historical developments in this field. It begins with a general comparison of the supposed benefits of semantic technologies and notes that they partially align with the needs of archaeologists. This is followed by a literature review which identifies two different perspectives on the Semantic Web: Mixed-Source Knowledge Representation (MSKR), which focuses on data interoperability between closed systems, and Linked Open Data (LOD), which connects decentralized, open resources.
Part One concludes with a survey of 40 Cultural Heritage projects that have used semantic technologies and finds that they are indeed divided between these two visions.

Part Two (Chapters 4–7) uses a case study, Roman Port Networks, to explore ways of facilitating MSKR. Chapter 4 describes a simple ontology and vocabulary framework, by means of which independently produced digital datasets pertaining to amphora finds at Roman harbour sites can be combined. The following chapters describe two entirely different approaches to converting legacy data to an ontology-compliant semantic format. The first, TRANSLATION, uses a 'Wizard'-style toolkit. The second, Introducing Semantics, is a wiki-based cookbook. Both methods are evaluated and found to be technically capable but socially impractical. The final chapter argues that the reason for this impracticality is the small-to-medium scale typical of MSKR projects. This does not allow for sufficient analytical return on the high level of investment required of project partners to convert and work with data in a new and unfamiliar format. It further argues that the scale at which such investment pays off is only likely to arise in an open and decentralized data landscape. Thus, for Archaeology to benefit from semantic technologies would require a severe sociological shift from current practice towards openness and decentralization. Whether such a shift is either desirable or feasible is raised as a topic for future work.

To Jessica Rose Ogden, who was there from the beginning.
Contents

List of Figures
List of Tables
Declaration of Authorship
Acknowledgements
Nomenclature

1 Introduction
  1.1 Preamble
  1.2 The Semantic Web
    1.2.1 Potential Benefits
    1.2.2 Potential Limitations
  1.3 Archaeology
  1.4 Semantic Technologies and Archaeology
  1.5 Research Aims and Contribution
    1.5.1 Additional Outputs
  1.6 Chapter Summary

2 The Semantic Web and Cultural Heritage
  2.1 Overview
  2.2 A Brief History of the Semantic Web
    2.2.1 The Semantic Web Era (2001-2005)
    2.2.2 The Linked Data Era (post-2005)
  2.3 The Semantic Web in Cultural Heritage
    2.3.1 The CIDOC CRM
    2.3.2 Galleries, Libraries, Archives and Museums
    2.3.3 Archaeology
  2.4 Conclusions

3 Survey: Evaluating Semantic Technologies for Data Publication in Cultural Heritage
  3.1 Overview
  3.2 Method
    3.2.1 Participants
    3.2.2 Procedure
    3.2.3 Bias and Limits of Scope
  3.3 Findings
    3.3.1 The Projects
    3.3.2 Intentions
    3.3.3 Semantic Technologies Employed
    3.3.4 Data Conversion
    3.3.5 Access and Consumption
    3.3.6 No Linked Open Data?
    3.3.7 Respondents' Comments
  3.4 Conclusions

4 MSKR in Archaeology: Building a Framework
  4.1 Overview
  4.2 Case Study: Roman Port Networks
  4.3 Requirements
  4.4 Infrastructure
    4.4.1 Domain Ontology
    4.4.2 Thesauri
    4.4.3 Hosting and Maintenance
  4.5 Conclusions

5 TRANSLATION: A Toolkit for Generating RDF
  5.1 Overview
  5.2 Stage 1: Mapping
    5.2.1 Configuration File
    5.2.2 Architecture
    5.2.3 Configuration details
    5.2.4 Schema Mapping
    5.2.5 URI Minting
    5.2.6 URI Search
    5.2.7 Literal Standardization
  5.3 Stage 2: Export
  5.4 Stage 3: Representation
  5.5 Evaluation
  5.6 Conclusions

6 Introducing Semantics: A Semantic Cookbook
  6.1 Overview
    6.1.1 Introducing Semantics
    6.1.2 Infrastructure
    6.1.3 Layout
  6.2 Semantic Level 1: Literal Standardization
    6.2.1 Overview and Aims
    6.2.2 Recipes
    6.2.3 Visualization and Analysis
  6.3 Semantic Level 2: Introducing URIs
    6.3.1 Overview and Aims
    6.3.2 Recipes
    6.3.3 Visualisation and Analysis
  6.4 Semantic Level 3: Introducing RDF
    6.4.1 Overview and Aims
    6.4.2 Recipes
    6.4.3 Visualisation and Analysis
  6.5 Evaluation
  6.6 Conclusions

7 Discussion, Conclusions and Future Work
  7.1 Review
  7.2 Discussion
  7.3 Conclusions
    7.3.1 Openness
    7.3.2 Decentralization
    7.3.3 Linked Open Data
  7.4 Contributions and Future Work

A Semantic Web Conference Papers
B STCH Survey: Participant Information Sheet
C STCH Survey: Consent Form
D STCH Survey: Questionnaire
E STCH Survey: Participants
F STCH Survey: Data
G Roman Port Networks: Partners
H Introducing Semantics Questionnaire

Glossary
Bibliography

List of Figures

2.1 The Semantic Web Layer Cake in 2000
2.2 The Semantic Web Layer Cake in 2007
2.3 The LOD cloud diagram in 2007 (Cyganiak and Jentzsch, 2007)
2.4 The LOD cloud diagram in 2011 (Cyganiak and Jentzsch, 2011)
2.5 Google Trends: Search volume index by year for the terms 'semantic web', 'linked data', 'web of data' and 'web 3.0' (2004-2011)
2.6 Google Trends: Search volume index by year for the terms 'semantic web' and 'web 2.0' (2004-2011)
4.1 A Dressel 20 Roman amphora (Keay and Williams, 2005)
4.2 ArchVocab Excavation Ontology diagram
5.1 UML Package Diagram of the Data Inspector Wizard
5.2 Data Inspector Wizard: Basic configuration information
5.3 Data Inspector Wizard: Ontology-to-column schema mapping
5.4 Data Inspector Wizard: URIs for instance data
5.5 Data Inspector Wizard: Mapping excavations to GeoNames URIs
5.6 Data