University of Southampton Research Repository Eprints Soton
Total Page:16
File Type:pdf, Size:1020Kb
University of Southampton Research Repository ePrints Soton Copyright © and Moral Rights for this thesis are retained by the author and/or other copyright owners. A copy can be downloaded for personal non-commercial research or study, without prior permission or charge. This thesis cannot be reproduced or quoted extensively from without first obtaining permission in writing from the copyright holder/s. The content must not be changed in any way or sold commercially in any format or medium without the formal permission of the copyright holders. When referring to this work, full bibliographic details including the author, title, awarding institution and date of the thesis must be given e.g. AUTHOR (year of submission) "Full thesis title", University of Southampton, name of the University School or Department, PhD Thesis, pagination http://eprints.soton.ac.uk UNIVERSITY OF SOUTHAMPTON Trusted Collaboration in Distributed Software Development by Ellis Rowland Watkins A thesis submitted in partial fulfillment for the degree of Doctor of Philosophy in the Faculty of Engineering, Science and Mathematics School of Electronics and Computer Science June 2007 UNIVERSITY OF SOUTHAMPTON ABSTRACT FACULTY OF ENGINEERING, SCIENCE AND MATHEMATICS SCHOOL OF ELECTRONICS AND COMPUTER SCIENCE Doctor of Philosophy by Ellis Rowland Watkins Distributed systems have moved from application-specific, bespoke and mutually incompatible network protocols to open standards based on TCP/IP, HTTP, and SGML - the foundations of the World Wide Web (WWW). The emergence of the WWW has brought about a revolution in computer resource discovery and exploitation across organisational boundaries. Examples of this can be seen with recent advances in Security and Service Orientated Architectures such as Web Services and Grid middleware. Expansion of the WWW has seen the development of the Semantic Web, a layer on top of the WWW where content is enriched and made interoperable through standards such as RDF and OWL. Our work in these fields has brought together different ideas to further the ad- vancement of version control; the Semantic Web, Service Orientated Architectures, strong cryptography and the highly dynamic and collaborative WikiWikiWeb. Our online collaborative tool takes advantage of Description Logics, Named Graphs, digital signatures and Grids technologies, to improve collaboration for software engineers working in distributed software development, using semantic knowledge federation and inference rules. Such a system goes well beyond any current version control technology and demonstrates the value and future potential of Semantic Web technologies over traditional Relational Database Management Systems and overly expressive logics such as Prolog. Contents Acknowledgements xv 1 Introduction 1 1.1 Distributed Systems and the Birth of the Web ............ 1 1.1.1 The Semantic Web ....................... 2 1.2 Research Motivation .......................... 2 1.2.1 Server Integrity ......................... 3 1.2.2 Audit Logs ........................... 3 1.2.3 Management ........................... 4 1.3 Research Statement ........................... 5 1.4 Thesis Structure ............................. 9 1.5 Declaration ............................... 10 2 Background 11 2.1 The WikiWikiWeb: Lightweight Collaboration ............ 11 2.2 Technology ............................... 13 2.2.1 Relational Database Management Systems (RDBMS) .... 13 2.2.2 Version Control Systems .................... 14 2.2.2.1 Concurrent Versioning System ............ 14 2.2.2.2 Subversion ...................... 14 2.2.2.3 Git ........................... 15 2.2.2.4 GNU Arch ...................... 15 2.2.2.5 Darcs ......................... 16 2.2.3 Service Orientated Architecture and Web Services ...... 16 2.2.4 The Grid ............................ 18 2.2.4.1 Globus Toolkit .................... 18 2.2.4.2 Grid Services ..................... 19 WSRF.net ........................ 19 MS.NETGrid Project .................. 19 Globus Toolkit 4.x ................... 19 2.2.4.3 Sun Microsystems’ Grid Engine ........... 20 2.2.4.4 Legion ......................... 20 2.2.5 EC IST Framework Grids ................... 20 2.2.5.1 GRIA ......................... 20 2.2.5.2 gLite .......................... 21 v vi CONTENTS 2.2.6 China Grid ........................... 21 2.2.7 UK e-Science Projects ..................... 22 2.2.8 Public Key Infrastructure ................... 22 2.2.8.1 Digital Signatures .................. 23 2.3 Provenance Frameworks ........................ 23 2.3.1 Data Provenance ........................ 24 2.3.2 Provenance in Service Oriented Architectures ........ 24 2.3.3 Knowledge Provenance ..................... 25 2.3.4 Provenance Mechanisms .................... 26 2.3.4.1 RDF Reification ................... 26 2.3.4.2 Quads ......................... 27 2.3.4.3 Contexts ....................... 27 2.3.4.4 RDFX ......................... 28 2.3.4.5 Named Graphs .................... 28 2.3.4.6 RDF Molecules .................... 29 2.4 Logic Frameworks ............................ 29 2.4.1 Description Logics ....................... 29 2.4.1.1 Class Expression Language ............. 29 2.4.1.2 The TBox ....................... 31 2.4.1.3 The ABox ....................... 31 2.4.1.4 The RBox ....................... 31 2.4.1.5 Concrete Datatypes ................. 32 2.4.2 Semantic Inferences ....................... 32 2.4.2.1 Types of Inferences .................. 33 Backward Chaining ................... 33 Forward Chaining .................... 34 2.4.2.2 Popular Inference Engines .............. 34 2.4.3 Frame Logics .......................... 34 2.4.4 Open and Closed World Semantics .............. 35 2.4.5 Monotonicity .......................... 36 2.5 World Wide Web Technologies ..................... 37 2.5.1 URIs ............................... 37 2.5.2 Description Frameworks .................... 37 2.5.2.1 Resource Description Framework .......... 37 2.5.2.2 Topic Maps ...................... 38 2.5.3 Ontologies ............................ 39 2.5.3.1 RDFS ......................... 39 Class/SubClass declaration ............... 39 Instances ......................... 39 Properties (relations) .................. 40 Multiple Inheritance .................. 40 2.5.3.2 OWL ......................... 40 2.5.3.3 OWL DL ....................... 40 2.5.3.4 OWL Lite ....................... 41 CONTENTS vii 2.6 Summary ................................ 41 3 Analysis 43 3.1 Distributed Collaborative Software Development Case Studies ... 43 3.1.1 FLOSS .............................. 43 3.1.2 EC IST Grid Collaboration .................. 45 3.1.3 EC IST Collaboration with FLOSS .............. 46 3.2 RDBMS Approach ........................... 47 3.2.1 Benefits ............................. 48 3.2.1.1 Interoperability .................... 48 3.2.1.2 Performance and Scalability ............. 48 3.2.2 Issues .............................. 49 3.2.2.1 Federation ....................... 49 3.2.2.2 Trust Management .................. 50 3.2.2.3 Interoperability .................... 51 3.3 Semantic Web Approach ........................ 52 3.3.1 Benefits ............................. 52 3.3.1.1 Federation ....................... 52 3.3.1.2 Trust Management .................. 53 3.3.2 Issues .............................. 53 3.3.2.1 Performance and Scalability ............. 54 3.3.2.2 Provenance Mechanisms ............... 54 RDF Reification Issues ................. 54 MSG and RDF Molecule Issues ............ 55 Semantic Interoperability Issues ............ 56 The Two Towers of the Semantic Web ......... 56 3.4 Digital Signatures ............................ 58 3.4.1 Digital Signatures and the Semantic Web ........... 59 3.4.1.1 Canonicalisation Issues ................ 59 Blank Nodes ....................... 59 3.4.1.2 Semantic Issues .................... 62 3.4.1.3 Serialisation Issues .................. 62 3.5 Querying Semantic Version Control .................. 62 3.5.1 FLOSS Questions ........................ 63 3.5.2 IST Project Questions ..................... 64 3.6 Summary ................................ 66 4 Design and Implementation 67 4.1 Ontology Design Overview ....................... 67 4.1.1 Requirements .......................... 68 4.1.2 Document Provenance ..................... 68 4.1.2.1 Document Class ................... 70 4.1.2.2 Wikipage Class .................... 70 4.1.2.3 Person Class ..................... 71 viii CONTENTS 4.1.3 Other Ontologies ........................ 71 4.1.3.1 Friend of a Friend .................. 72 4.1.3.2 Description of a Project ............... 74 4.1.3.3 Dublin Core Metadata Initiative .......... 74 4.1.3.4 Simple Java Ontology ................ 75 4.1.4 Construction .......................... 76 4.2 Provenance Mechanism ......................... 77 4.2.1 Named Graphs ......................... 77 4.3 Modelling Version Control with DP .................. 78 4.4 Security ................................. 79 4.4.1 Signing RDF Graphs ...................... 79 4.4.1.1 Carroll’s algorithm vs. nauty ............ 81 4.4.1.2 Conservative Canonicalisation ............ 82 4.5 Open Issues ............................... 83 4.5.1 NG Management ........................ 83 4.5.1.1 Signature Management ................ 83 4.5.2 Ontology Decomposition .................... 84 4.6 Implementation - An Online Collaborative Tool ........... 85 4.6.1 Motivation ............................ 85 4.6.2 Architecture ........................... 86 4.6.2.1 Client Side ...................... 87 4.6.2.2