Reference and Meaning on the Web

Total Page:16

File Type:pdf, Size:1020Kb

Reference and Meaning on the Web The Open World: Reference and Meaning on the Web Harry Halpin Doctor of Philosophy Institute for Communicating and Collaborative Systems School of Informatics University of Edinburgh 2008 Abstract This thesis examines the question: What does a URI mean? Or in other words, what does a URI identify and refer to? This question is of utmost importance for the creation and re-use of URIs on the Semantic Web. An philosophical and historical introduction to the Web explains the primary purpose of the Web as a universal information space for naming and accessing information. A terminology, based in widely-known philo- sophical distinctions, is employed to define precisely what is meant by information, language, representation, and digitality. These terms are then employed to create a foundational ontology and principles of Web architecture, which can be used to de- limit the boundaries of the Web. From this perspective, the Semantic Web is then viewed as the application of the principles of Web architecture to knowledge repre- sentation. However, the classical philosophical problems of identity, reference, and meaning that traditionally caused problems for knowledge representation are inherited by the Semantic Web. Three main positions are inspected: the logicist position, as exemplified by formal semantics and a neo-Russellian descriptivist theory of names, the direct reference position, as exemplified by Putnam and Kripke’s causal theory of reference, and a neo-Wittgensteinian position that views the Semantic Web as yet an- other public language. After identifying the neo-Wittgensteinian viewpoint as the most promising, a solution of using people’s everyday use of search engines is proposed as a proper neo-Wittgensteinian way to determine the meaning of URIs via sense identi- fication and disambiguation. This solution is then tested on a corpus of search terms from a real-world major search engine query log, and these terms are used to retrieve a corpus of Semantic Web URIs. Human browsing of hypertext web-pages is then used as a feature set in order to disambiguate and identify the “sense” of these URIs via a variety of machine-learning techniques. The results show that @@. Future work for the Semantic Web that follows from our argument is detailed, both of a technical and conceptual nature. iii Acknowledgements @@To be filled in when needed iv Declaration I declare that this thesis was composed by myself, that the work contained herein is my own except where explicitly stated otherwise in the text, and that this work has not been submitted for any other degree or professional qualification except as specified. (Harry Halpin) v Table of Contents 1 Introduction 1 1.1 Motivation................................ 2 1.2 Hypothesis ............................... 2 1.3 Scope .................................. 5 1.4 NotationalConventions. 7 1.5 Summary ................................ 7 2 The Significance of the Web 9 2.1 TheOriginsoftheWeb ......................... 11 2.2 TheMan-MachineSymbiosisProject . 12 2.3 TheInternet............................... 13 2.4 TheModernWorldWideWeb . 16 3 Philosophical Prolegomenon 19 3.1 TheWebandtheWorld......................... 20 3.2 Information,Encoding,andContent . 23 3.3 Language,Interpretation,andModels . .. 29 3.4 PurposesandProperFunctions . 36 3.5 Representations ............................. 39 3.6 Digitality ................................ 45 3.7 TheExtendedMindThesisontheWeb. 49 4 The Principles of Web Architecture 53 4.1 FoundationalTerminologyoftheWeb . 58 4.1.1 Protocols ............................ 58 4.1.2 UniformResourceIdentifiers. 63 4.1.3 ResourcesandWebRepresentations . 67 4.1.4 RepresentationalStateTransfer . 74 vii 4.2 ThePrinciplesofWebArchitecture. 78 4.2.1 PrincipleofUniversality . 79 4.2.2 PrincipleofLinking . 86 4.2.3 PrincipleofSelf-Description . 90 4.2.4 TheOpenWorldPrinciple . 94 4.2.5 PrincipleofLeastPower . 98 4.3 Conclusions............................... 101 5 The Semantic Web 103 5.1 ABriefHistoryofKnowledgeRepresentation . 104 5.2 TheResourceDescriptionFramework(RDF) . 109 5.2.1 RDFandthePrincipleofUniversality . 109 5.2.2 RDFandthePrincipleofLinking . 110 5.2.3 RDFandthePrincipleofSelf-Description . 113 5.2.4 RDFandtheOpenWorldPrinciple . 115 5.2.5 RDFandthePrincipleofLeastPower . 118 5.3 TheSemanticWeb: Good-OldfashionedAIRedux? . 123 6 The Identity Crisis 127 6.1 WhatDoURIsreferto?. 127 6.2 The Logicist Position and the Descriptivist Theory of Reference . 133 6.2.1 LogicalPositivism . 133 6.2.2 Tarki’sFormalSemantics. 136 6.2.3 InDefenseofAmbiguity . 138 6.2.4 LogicismUnboundontheSemanticWeb . 142 6.3 The Direct Reference Position and The Causal Theory of Reference . 145 6.3.1 Kripke’sCausalTheoryofProperNames . 145 6.3.2 Putnam’sTheoryofNaturalKinds . 146 6.3.3 DirectReferenceontheWeb . 147 6.3.4 The 303 Redirection Solution to the Identity Crisis . .... 149 6.3.5 TheSecond-GenerationSemanticWeb . 154 6.4 Critiques of the Causal and Descriptivist Theory of Reference . 156 6.4.1 Radical Translation and the Failure of the Direct Reference Position............................. 156 6.4.2 Radical Interpretation and the Failure of the LogicistPosition 159 6.4.3 TheReturnofFrege’sSense . 164 viii A Glossary of Terms 167 Bibliography 177 ix Chapter 1 Introduction To imagine a language means to imagine a form of life. Ludwig Wittgenstein (1953) The World Wide Web is without a doubt one of the most significant computational phenomena yet. Although it is impossible to tell what the future holds, it is obvious that the Web has already been massively adopted, for it is the Web more than even the word processor has made computing a necessity in everyday life. The sheer size of the Web makes empirical work difficult, although through careful sampling large amounts of progress has been made on characterizing the Web descriptively (Barabsi et al., 2000). Still, there are some questions that cannot be answered without having the entire Web available (Abiteboul and Vianu, 1997). Furthermore, there are some questions that cannot be answered without a theoretical understanding of the Web, in particular, the design of new Web standards. Although the Web is impressive as a practical success story, there has been little in the way of developing a theoretical framework to understand what, if anything, is different about the Web from the stand- point of long-standing questions of meaning and reference in philosophy. While this situation may have been tolerable so far, serving as no real barrier to the further growth of the Web, with the development of the Semantic Web, a next generation of the Web “in which information is given well-defined meaning, better enabling computers and people to work in cooperation,” these philosophical questions come to the forefront, and only a practical solution to them can help the Semantic Web repeat the success of the hypertext Web (Berners-Lee et al., 2001). 1 1.1 Motivation Despite the hyperbole, there is little doubt that the Semantic Web faces gloomy prospects. At first inspection, the Semantic Web appears to be a close cousin both in spirit and in execution to another intellectual project, which we dub “classical artificial intelli- gence,” (also known as “Good-Old Fashioned AI”) an ambitious project whose progress has been relatively glacial and whose assumptions have been found to be cognitively questionable (Clark, 1997). The initial bet of the Semantic Web was that somehow the Web part of the Semantic Web would somehow overcome whatever problems the Semantic Web inherited from classical artificial intelligence (Halpin, 2004). However, progress on the Semantic Web has also been relatively slow over the last near-decade, and it appears both new techniques and large amounts of data have not yet caused the Semantic Web to repeat the phenomenal success the hypertext Web. The first problem that is self-evident to anyone who actually attempts to deploy any ‘real-world’ data on the Semantic Web is that there is little guidance on how to name data using URIs, and what digital information to serve from these URIs (Bizer et al., 2007). For long, this question was unanswered, and recently has only been cryp- tically answered (Sauermann and Cygniak, 2008). The second self-evident problem that occurs to anyone actually trying to do data integration on the Semantic Web is the different people create different URIs for the same thing. The essential bet of the Semantic Web is that decentralized agents will come to agreement on using the same URI to name a thing, including things that aren’t accessible on the Web, like people, places, and abstract concepts. Yet there is virtually no ability to even find URIs for things on the Semantic Web. Currently, each application creates its own URIs, and rarely are URIs re-used outside a particular application domain, repeating the localism of classical artificial intelligence. So many things either have no URIs or far too many. With the recent explosion of Semantic Web data on the Web given by the ‘Linked Data’ initiative, this overabundance of co-referential URIs have become even more pressing, and there is no principled way to determine when a new URI should be created (Bizer et al., 2008). 1.2 Hypothesis The scientific hypothesisof this thesis must be stated in a two-fold fashion, both to state the problem and then propose a solution. To state the problem, the Semantic Web is an extension of language, one that can defined by its conformance to the principles of Web architecture, but nonetheless inherits the problems regarding reference
Recommended publications
  • V a Lida T in G R D F Da
    Series ISSN: 2160-4711 LABRA GAYO • ET AL GAYO LABRA Series Editors: Ying Ding, Indiana University Paul Groth, Elsevier Labs Validating RDF Data Jose Emilio Labra Gayo, University of Oviedo Eric Prud’hommeaux, W3C/MIT and Micelio Iovka Boneva, University of Lille Dimitris Kontokostas, University of Leipzig VALIDATING RDF DATA This book describes two technologies for RDF validation: Shape Expressions (ShEx) and Shapes Constraint Language (SHACL), the rationales for their designs, a comparison of the two, and some example applications. RDF and Linked Data have broad applicability across many fields, from aircraft manufacturing to zoology. Requirements for detecting bad data differ across communities, fields, and tasks, but nearly all involve some form of data validation. This book introduces data validation and describes its practical use in day-to-day data exchange. The Semantic Web offers a bold, new take on how to organize, distribute, index, and share data. Using Web addresses (URIs) as identifiers for data elements enables the construction of distributed databases on a global scale. Like the Web, the Semantic Web is heralded as an information revolution, and also like the Web, it is encumbered by data quality issues. The quality of Semantic Web data is compromised by the lack of resources for data curation, for maintenance, and for developing globally applicable data models. At the enterprise scale, these problems have conventional solutions. Master data management provides an enterprise-wide vocabulary, while constraint languages capture and enforce data structures. Filling a need long recognized by Semantic Web users, shapes languages provide models and vocabularies for expressing such structural constraints.
    [Show full text]
  • Security Applications of Formal Language Theory
    Dartmouth College Dartmouth Digital Commons Computer Science Technical Reports Computer Science 11-25-2011 Security Applications of Formal Language Theory Len Sassaman Dartmouth College Meredith L. Patterson Dartmouth College Sergey Bratus Dartmouth College Michael E. Locasto Dartmouth College Anna Shubina Dartmouth College Follow this and additional works at: https://digitalcommons.dartmouth.edu/cs_tr Part of the Computer Sciences Commons Dartmouth Digital Commons Citation Sassaman, Len; Patterson, Meredith L.; Bratus, Sergey; Locasto, Michael E.; and Shubina, Anna, "Security Applications of Formal Language Theory" (2011). Computer Science Technical Report TR2011-709. https://digitalcommons.dartmouth.edu/cs_tr/335 This Technical Report is brought to you for free and open access by the Computer Science at Dartmouth Digital Commons. It has been accepted for inclusion in Computer Science Technical Reports by an authorized administrator of Dartmouth Digital Commons. For more information, please contact [email protected]. Security Applications of Formal Language Theory Dartmouth Computer Science Technical Report TR2011-709 Len Sassaman, Meredith L. Patterson, Sergey Bratus, Michael E. Locasto, Anna Shubina November 25, 2011 Abstract We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development.
    [Show full text]
  • Bibliography of Erik Wilde
    dretbiblio dretbiblio Erik Wilde's Bibliography References [1] AFIPS Fall Joint Computer Conference, San Francisco, California, December 1968. [2] Seventeenth IEEE Conference on Computer Communication Networks, Washington, D.C., 1978. [3] ACM SIGACT-SIGMOD Symposium on Principles of Database Systems, Los Angeles, Cal- ifornia, March 1982. ACM Press. [4] First Conference on Computer-Supported Cooperative Work, 1986. [5] 1987 ACM Conference on Hypertext, Chapel Hill, North Carolina, November 1987. ACM Press. [6] 18th IEEE International Symposium on Fault-Tolerant Computing, Tokyo, Japan, 1988. IEEE Computer Society Press. [7] Conference on Computer-Supported Cooperative Work, Portland, Oregon, 1988. ACM Press. [8] Conference on Office Information Systems, Palo Alto, California, March 1988. [9] 1989 ACM Conference on Hypertext, Pittsburgh, Pennsylvania, November 1989. ACM Press. [10] UNIX | The Legend Evolves. Summer 1990 UKUUG Conference, Buntingford, UK, 1990. UKUUG. [11] Fourth ACM Symposium on User Interface Software and Technology, Hilton Head, South Carolina, November 1991. [12] GLOBECOM'91 Conference, Phoenix, Arizona, 1991. IEEE Computer Society Press. [13] IEEE INFOCOM '91 Conference on Computer Communications, Bal Harbour, Florida, 1991. IEEE Computer Society Press. [14] IEEE International Conference on Communications, Denver, Colorado, June 1991. [15] International Workshop on CSCW, Berlin, Germany, April 1991. [16] Third ACM Conference on Hypertext, San Antonio, Texas, December 1991. ACM Press. [17] 11th Symposium on Reliable Distributed Systems, Houston, Texas, 1992. IEEE Computer Society Press. [18] 3rd Joint European Networking Conference, Innsbruck, Austria, May 1992. [19] Fourth ACM Conference on Hypertext, Milano, Italy, November 1992. ACM Press. [20] GLOBECOM'92 Conference, Orlando, Florida, December 1992. IEEE Computer Society Press. http://github.com/dret/biblio (August 29, 2018) 1 dretbiblio [21] IEEE INFOCOM '92 Conference on Computer Communications, Florence, Italy, 1992.
    [Show full text]
  • Helsinki University of Technology
    Aalto University School of Science and Technology Faculty of Electronics, Communications and Automation Degree Programme of Communications Engineering Markku Pekka Mikael Laine XFormsDB—An XForms-Based Framework for Simplifying Web Application Development Master’s Thesis Helsinki, Finland, January 20, 2010 Supervisor: Professor Petri Vuorimaa, D.Sc. (Tech.) Instructor: Mikko Honkala, D.Sc. (Tech.) Aalto University ABSTRACT OF THE School of Science and Technology MASTER’S THESIS Faculty of Electronics, Communications and Automation Degree Programme of Communications Engineering Author: Markku Pekka Mikael Laine Title: XFormsDB—An XForms-Based Framework for Simplifying Web Application Development Number of pages: xx + 161 Date: January 20, 2010 Language: English Professorship: Interactive Digital Media and Contents Production Code: T-111 Supervisor: Professor Petri Vuorimaa, D.Sc. (Tech.) Instructor: Mikko Honkala, D.Sc. (Tech.) The nature of the World Wide Web is constantly changing to meet the increasing demands of its users. While this trend towards more useful interactive services and applications has improved the utility and the user experience of the Web, it has also made the development of Web applications much more complex. The main objective of this Thesis was to study how Web application development could be simplified by means of declarative programming. An extension that seamlessly integrates common server-side functionalities to the XForms markup language is proposed and its feasibility and capabilities are validated with a proof- of-concept implementation, called the XFormsDB framework, and two sample Web applications. The results show that useful, highly interactive multi-user Web applications can be authored quickly and easily in a single document and under a single programming model using the XFormsDB framework.
    [Show full text]
  • Linked Research on the Decentralised Web
    Linked Research on the Decentralised Web Dissertation zur Erlangung des Doktorgrades (Dr. rer. nat.) der Mathematisch-Naturwissenschaftlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn vorgelegt von Sarven Capadisli aus Istanbul, Turkey Bonn, 2019-07-29 Angefertigt mit Genehmigung der Mathematisch-Naturwissenschaftlichen Fakultät der Rheinischen Friedrich-Wilhelms-Universität Bonn 1. Gutachter: Prof. Dr. Sören Auer 2. Gutachter: Dr. Herbert Van de Sompel Tag der Promotion 2020-03-03 Erscheinungsjahr 2020 Abstract This thesis is about research communication in the context of the Web. I analyse literature which reveals how researchers are making use of Web technologies for knowledge dissemination, as well as how individuals are disempowered by the centralisation of certain systems, such as academic publishing platforms and social media. I share my findings on the feasibility of a decentralised and interoperable information space where researchers can control their identifiers whilst fulfilling the core functions of scientific communication: registration, awareness, certification, and archiving. The contemporary research communication paradigm operates under a diverse set of sociotechnical constraints, which influence how units of research information and personal data are created and exchanged. Economic forces and non-interoperable system designs mean that researcher identifiers and research contributions are largely shaped and controlled by third-party entities; participation requires the use of proprietary systems. From a technical standpoint, this thesis takes a deep look at semantic structure of research artifacts, and how they can be stored, linked and shared in a way that is controlled by individual researchers, or delegated to trusted parties. Further, I find that the ecosystem was lacking a technical Web standard able to fulfill the awareness function of research communication.
    [Show full text]
  • Security Applications of Formal Language Theory Dartmouth Computer Science Technical Report TR2011-709
    Security Applications of Formal Language Theory Dartmouth Computer Science Technical Report TR2011-709 Len Sassaman, Meredith L. Patterson, Sergey Bratus, Michael E. Locasto, Anna Shubina November 25, 2011 Abstract We present an approach to improving the security of complex, composed systems based on formal language theory, and show how this approach leads to advances in input validation, security modeling, attack surface reduction, and ultimately, software design and programming methodology. We cite examples based on real-world security flaws in common protocols representing different classes of protocol complexity. We also introduce a formalization of an exploit development technique, the parse tree differential attack, made possible by our conception of the role of formal grammars in security. These insights make possible future advances in software auditing techniques applicable to static and dynamic binary analysis, fuzzing, and general reverse-engineering and exploit development. Our work provides a foundation for verifying critical implementation components with considerably less burden to developers than is offered by the current state of the art. It additionally offers a rich basis for further exploration in the areas of offensive analysis and, conversely, automated defense tools and techniques. This report is divided into two parts. In Part I we address the formalisms and their applications; in Part II we discuss the general implications and recommendations for protocol and software design that follow from our formal analysis. 1 Introduction Composition is the primary engineering means of complex system construction. No matter what other engineering approaches or design patterns are applied, the economic reality is that a complex computing system will ultimately be pulled together from components made by different people and groups of people.
    [Show full text]
  • 818 Index Σ 6 Α-Β Pruning
    Index 6 active tape .............................................................. 266 - pruning ............................................................ 787 ACT-V ................................................................... 774 (L) ............................................................... 223, 228 Ada ........................................................................ 668 * 6 adder ...................................................................... 800 adjacency matrix ............................................ 437, 567 0 ….. ................................................................... 589 -automaton .................................................... 83, 144 Adleman, Leonard ......................................... 430, 723 admissibility H ................................. 321, 325, 328, 347, 349, 354 of a heuristic function ............................... 548, 791 H ................................................................ 350, 363 of a search algorithm ......................................... 549 k 426 age of universe ....................................... 419, 433, 799 L 7 agent -loop ......................................................... 54, 60, 136 intelligent .......................... 703, 759, 769, 773, 790 M 333 agreement in natural languages ............. 256, 398, 750 n …............................................................... 541, 571 Aho, Alfred .................................................... 144, 263 P 476 AI See artificial intelligence -recursive function ......................................
    [Show full text]
  • Implementing Web Applications Using Xquery
    University of Konstanz Department of Computer and Information Science Master esis for the degree Master of Science (M.Sc.) in Information Engineering Implementing Web Applications Using XQuery XML from Front to Back Michael Seiferle / Konstanz, March , Referee: Prof. Dr. Marc H. Scholl ⁿ Referee: Prof. Dr. Marcel Waldvogel Supervisors: Alexander Holupirek & Dr. Christian Grün Z (D) Entwickler sehen sich heute immer häuger und mit einer immer grösser werdenden Anzahl an XML-Daten konfrontiert. Es liegt auf der Hand, dass diese mit dafür konzipierten Datenbanksystemen verwaltet werden. Innerhalb eines modernen XML- Datenbank-Management-Systems (XML-DBMS) kann XML effizient gespeichert werden und es stehen domänenspezische Sprachen, wie XQuery, zur Weiterverarbeitung zur Verfügung. Gleich- wohl wird die Konzentration auf die Datenhaltung allein den Herausforderung nicht gerecht; In- formationen in Anwendungen „zum Leben zu erwecken“ ist mindestens genauso wichtig. Die vor- liegende Arbeit untersucht Chancen und Probleme der Webapplikationsentwicklung in einer rein auf XML-Technologie beruhenden Systemarchitektur. Wir beschäigen uns mit der Frage, ob sich die Entwicklung grundlegend vereinfachen lässt, indem man konzeptuellen Ballast, den moderne Web-Frameworks wie Ruby on Rails oder CakePHP mit sich bringen, in weiten Teilen obsolet macht. Der reine XML-Stack bringt hierzu die notwendigen Voraussetzungen mit: einheitliche Programmierparadigmen und die durchgängige Verwendung ei- nes Datenmodells in allen Schichten der Applikation. Zur Klärung der Frage beschreiben wir die Konzeption von BX W, einem XQuery-getriebenen Anwendungsserver. BX W ist eine auf der XML-Datenbank BX basierende Technologie- studie, die es erlaubt Web-Applikationen allein unter Verwendung von WC Standards zu realisie- ren. Ergebnis ist ein leichtgewichtiges Anwendungsframework mit dessen Hilfe Expertensysteme, wie beispielsweise ein Online Public Access Catalogue (OPAC), implementiert werden können.
    [Show full text]