Reference and Meaning on the Web
Total Page:16
File Type:pdf, Size:1020Kb
The Open World: Reference and Meaning on the Web Harry Halpin Doctor of Philosophy Institute for Communicating and Collaborative Systems School of Informatics University of Edinburgh 2008 Abstract This thesis examines the question: What does a URI mean? Or in other words, what does a URI identify and refer to? This question is of utmost importance for the creation and re-use of URIs on the Semantic Web. An philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information. A terminology, based in widely-known philo- sophical distinctions, is employed to define precisely what is meant by information, language, representation, and digitality. These terms are then employed to create a foundational ontology and principles of Web architecture, which can be used to de- limit the boundaries of the Web. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge repre- sentation. However, the classical philosophical problems of identity, reference, and meaning that traditionally caused problems for knowledge representation are inherited by the Semantic Web. Three main positions are inspected: the logicist position, as exemplified by formal semantics and a neo-Russellian descriptivist theory of names, the direct reference position, as exemplified by Putnam and Kripke’s causal theory of reference, and a neo-Wittgensteinian position that views the Semantic Web as yet an- other public language. After identifying the neo-Wittgensteinian viewpoint as the most promising, a solution of using people’s everyday use of search engines is proposed as a proper neo-Wittgensteinian way to determine the meaning of URIs via sense identi- fication and disambiguation. This solution is then tested on a corpus of search terms from a real-world major search engine query log, and these terms are used to retrieve a corpus of Semantic Web URIs. Human browsing of hypertext web-pages is then used as a feature set in order to disambiguate and identify the “sense” of these URIs via a variety of machine-learning techniques. The results show that @@. Future work for the Semantic Web that follows from our argument is detailed, both of a technical and conceptual nature. iii Acknowledgements @@To be filled in when needed iv Declaration I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. (Harry Halpin) v Table of Contents 1 Introduction 1 1.1 Motivation................................ 2 1.2 Hypothesis ............................... 2 1.3 Scope .................................. 5 1.4 NotationalConventions. 7 1.5 Summary ................................ 7 2 The Significance of the Web 9 2.1 TheOriginsoftheWeb ......................... 11 2.2 TheMan-MachineSymbiosisProject . 12 2.3 TheInternet............................... 13 2.4 TheModernWorldWideWeb . 16 3 Philosophical Prolegomenon 19 3.1 TheWebandtheWorld......................... 20 3.2 Information,Encoding,andContent . 23 3.3 Language,Interpretation,andModels . .. 29 3.4 PurposesandProperFunctions . 36 3.5 Representations ............................. 39 3.6 Digitality ................................ 45 3.7 TheExtendedMindThesisontheWeb. 49 4 The Principles of Web Architecture 53 4.1 FoundationalTerminologyoftheWeb . 58 4.1.1 Protocols ............................ 58 4.1.2 UniformResourceIdentifiers. 63 4.1.3 ResourcesandWebRepresentations . 67 4.1.4 RepresentationalStateTransfer . 74 vii 4.2 ThePrinciplesofWebArchitecture. 78 4.2.1 PrincipleofUniversality . 79 4.2.2 PrincipleofLinking . 86 4.2.3 PrincipleofSelf-Description . 90 4.2.4 TheOpenWorldPrinciple . 94 4.2.5 PrincipleofLeastPower . 98 4.3 Conclusions............................... 101 5 The Semantic Web 103 5.1 ABriefHistoryofKnowledgeRepresentation . 104 5.2 TheResourceDescriptionFramework(RDF) . 109 5.2.1 RDFandthePrincipleofUniversality . 109 5.2.2 RDFandthePrincipleofLinking . 110 5.2.3 RDFandthePrincipleofSelf-Description . 113 5.2.4 RDFandtheOpenWorldPrinciple . 115 5.2.5 RDFandthePrincipleofLeastPower . 118 5.3 TheSemanticWeb: Good-OldfashionedAIRedux? . 123 6 The Identity Crisis 127 6.1 WhatDoURIsreferto?. 127 6.2 The Logicist Position and the Descriptivist Theory of Reference . 133 6.2.1 LogicalPositivism . 133 6.2.2 Tarki’sFormalSemantics. 136 6.2.3 InDefenseofAmbiguity . 138 6.2.4 LogicismUnboundontheSemanticWeb . 142 6.3 The Direct Reference Position and The Causal Theory of Reference . 145 6.3.1 Kripke’sCausalTheoryofProperNames . 145 6.3.2 Putnam’sTheoryofNaturalKinds . 146 6.3.3 DirectReferenceontheWeb . 147 6.3.4 The 303 Redirection Solution to the Identity Crisis . .... 149 6.3.5 TheSecond-GenerationSemanticWeb . 154 6.4 Critiques of the Causal and Descriptivist Theory of Reference . 156 6.4.1 Radical Translation and the Failure of the Direct Reference Position............................. 156 6.4.2 Radical Interpretation and the Failure of the LogicistPosition 159 6.4.3 TheReturnofFrege’sSense . 164 viii A Glossary of Terms 167 Bibliography 177 ix Chapter 1 Introduction To imagine a language means to imagine a form of life. Ludwig Wittgenstein (1953) The World Wide Web is without a doubt one of the most significant computational phenomena yet. Although it is impossible to tell what the future holds, it is obvious that the Web has already been massively adopted, for it is the Web more than even the word processor has made computing a necessity in everyday life. The sheer size of the Web makes empirical work difficult, although through careful sampling large amounts of progress has been made on characterizing the Web descriptively (Barabsi et al., 2000). Still, there are some questions that cannot be answered without having the entire Web available (Abiteboul and Vianu, 1997). Furthermore, there are some questions that cannot be answered without a theoretical understanding of the Web, in particular, the design of new Web standards. Although the Web is impressive as a practical success story, there has been little in the way of developing a theoretical framework to understand what, if anything, is different about the Web from the stand- point of long-standing questions of meaning and reference in philosophy. While this situation may have been tolerable so far, serving as no real barrier to the further growth of the Web, with the development of the Semantic Web, a next generation of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation,” these philosophical questions come to the forefront, and only a practical solution to them can help the Semantic Web repeat the success of the hypertext Web (Berners-Lee et al., 2001). 1 1.1 Motivation Despite the hyperbole, there is little doubt that the Semantic Web faces gloomy prospects. At first inspection, the Semantic Web appears to be a close cousin both in spirit and in execution to another intellectual project, which we dub “classical artificial intelli- gence,” (also known as “Good-Old Fashioned AI”) an ambitious project whose progress has been relatively glacial and whose assumptions have been found to be cognitively questionable (Clark, 1997). The initial bet of the Semantic Web was that somehow the Web part of the Semantic Web would somehow overcome whatever problems the Semantic Web inherited from classical artificial intelligence (Halpin, 2004). However, progress on the Semantic Web has also been relatively slow over the last near-decade, and it appears both new techniques and large amounts of data have not yet caused the Semantic Web to repeat the phenomenal success the hypertext Web. The first problem that is self-evident to anyone who actually attempts to deploy any ‘real-world’ data on the Semantic Web is that there is little guidance on how to name data using URIs, and what digital information to serve from these URIs (Bizer et al., 2007). For long, this question was unanswered, and recently has only been cryp- tically answered (Sauermann and Cygniak, 2008). The second self-evident problem that occurs to anyone actually trying to do data integration on the Semantic Web is the different people create different URIs for the same thing. The essential bet of the Semantic Web is that decentralized agents will come to agreement on using the same URI to name a thing, including things that aren’t accessible on the Web, like people, places, and abstract concepts. Yet there is virtually no ability to even find URIs for things on the Semantic Web. Currently, each application creates its own URIs, and rarely are URIs re-used outside a particular application domain, repeating the localism of classical artificial intelligence. So many things either have no URIs or far too many. With the recent explosion of Semantic Web data on the Web given by the ‘Linked Data’ initiative, this overabundance of co-referential URIs have become even more pressing, and there is no principled way to determine when a new URI should be created (Bizer et al., 2008). 1.2 Hypothesis The scientific hypothesisof this thesis must be stated in a two-fold fashion, both to state the problem and then propose a solution. To state the problem, the Semantic Web is an extension of language, one that can defined by its conformance to the principles of Web architecture, but nonetheless inherits the problems regarding reference