A Framework to Support Developers in the Integration and Application of Linked and Open Data

Total Page:16

File Type:pdf, Size:1020Kb

A Framework to Support Developers in the Integration and Application of Linked and Open Data This copy of the thesis has been supplied on condition that anyone who consults it is understood to recognize that its copyright rests with its author and that no quota- tion from the thesis and no information derived from it may be published without the author’s prior consent. A FRAMEWORK TO SUPPORT DEVELOPERS IN THE INTEGRATION AND APPLICATION OF LINKED AND OPEN DATA by TIMM HEUSS A thesis submitted to Plymouth University in partial fulfilment of the degree of DOCTOR OF PHILOSOPHY Faculty of Science and Engineering April 2016 Abstract Timm Heuss: A Framework to Support Developers in the Integration and Application of Linked and Open Data In the last years, the number of freely available Linked and Open Data datasets has multiplied into tens of thousands. The numbers of applications taking advantage of these, however, have not. Thus, large portions of potentially valuable data remain unexploited and are inaccessible for lay users. Therefore the upfront investment in releasing data in the first place is hard to justify. The lack of applications needs to be addressed in order not to undermine efforts put into Linked and Open Data. In existing research, strong indicators can be found that the dearth of applications is due to a lack of pragmatic, working architectures supporting these applications and guiding developers. In this thesis, a new architecture for the integration and application of Linked and Open Data is presented. Fundamental design decisions are backed up by two studies: firstly, based on real-world Linked and Open Data samples, characteristic properties are identified. A key finding is the fact that large amounts of structured data display tabular structures, do not use clear licensing and involve multiple different file formats. Secondly, following on from that study, a comparison of storage choices in relevant query scenarios is made. It includes the de-facto standard storage choice in this domain, triples stores, as well as relational and NoSQL approaches. Results show significant performance deficiencies of some technologies in certain scenarios. Consequently, when integrating Linked and Open Data in scenarios with application-specific entities, the first choice of storage is relational databases. Combining these findings and related best practices of existing research, a prototype framework is implemented using Java 8 and Hibernate. As a proof-of-concept it is employed in an existing Linked and Open Data integration project. Thereby, it is shown that a best practice architectural component is introduced successfully, while development effort to implement specific program code can be simplified. Thus, the present work provides an important foundation for the development of se- mantic applications based on Linked and Open Data and potentially leads to a broader adoption of such applications. 3 Table of Contents Abstract 3 List of Figures 14 List of Tables 16 List of Listings 17 Acknowledgements 19 Author’s Declaration 21 Abbreviations 25 1 Introduction 29 1.1 Topics and Motivation . 30 1.1.1 World Wide Web . 30 1.1.2 Semantic Web . 32 1.1.3 Open Data . 33 1.1.4 Linked Data . 34 1.1.5 Term Usage in the present Work . 36 1.1.6 Research Project Mediaplatform . 36 1.2 Problem Statement . 40 1.3 Research Question . 45 1.4 Outline of Thesis . 46 2 Background and Literature Review 47 2.1 Reference Architectures . 48 2.1.1 Semantic Web Layered Cake, 2000-2006 . 48 2.1.2 Comprehensive, Functional, Layered Architecture (CFL), 2008 . 49 2.1.3 5-Star LOD Maturity Model, 2006 . 50 2.1.4 Linked Data Application Architectural Patterns, 2011 . 51 5 TABLE OF CONTENTS 2.2 Characteristics of Linked Data and RDF . 52 2.2.1 Uniformity, Referencibility, Timeliness . 52 2.2.2 RDF is Entity-Agnostic . 53 2.2.3 Not Consistently Open, Not Consistently Linked . 53 2.2.4 Quality Issues . 54 2.2.5 Completeness . 54 2.2.6 Simple Constructs . 55 2.2.7 JSON-LD and SDF . 55 2.2.8 Pay-as-you-go Data Integration . 56 2.2.9 Data Portal Analysis . 57 2.3 Storage . 57 2.3.1 Typical Triple Stores . 57 2.3.2 NoSQL and Big Data . 58 2.3.3 Translation Layers . 58 2.4 Storage Benchmarks . 59 2.4.1 Lehigh University Benchmark (LUBM) . 59 2.4.2 SPARQL Performance Benchmark (SP2Bench) . 60 2.4.3 Berlin SPARQL Benchmark (BSBM) . 60 2.4.4 DBpedia SPARQL Benchmark (DBPSB) . 61 2.4.5 Linked Data Benchmark Council Semantic Publishing Benchmark (LDBC SPB) . 62 2.5 Conceptual Data Integration . 62 2.5.1 Knowledge Discovery in Databases Process . 62 2.5.2 Data Warehouse Architecture . 64 2.5.3 News Production Content Value Chain . 65 2.5.4 Multimedia / Metadata Life-Cycles . 66 2.5.5 Linked Media Workflow . 67 2.5.6 Linked Data Value Chain . 67 2.6 Data Integration Frameworks . 68 2.6.1 Open PHACTS . 69 2.6.2 WissKI . 70 2.6.3 OntoBroker . 72 2.6.4 LOD2 Stack . 73 6 TABLE OF CONTENTS 2.6.5 Catmandu . 74 2.6.6 Apache Marmotta . 75 2.7 Linked Data Applications . 76 2.7.1 Building Linked Data Applications . 76 2.7.2 Search Engines . 77 2.7.3 BBC . 77 2.7.4 Wolters Kluwer’s LOD2 Stack Application . 78 2.7.5 KDE Nepomuk . 80 2.7.6 Cultural Heritage . 81 2.7.7 Natural Language Processing . 81 2.8 Development Paradigms . 82 2.8.1 Speed . 82 2.8.2 Data Binding . 83 2.8.3 Polyglot Persistence . 83 2.8.4 New Web Development Approaches . 84 2.9 Summary of Findings . 85 2.9.1 Entity-Agnostic RDF versus Developer APIs . 85 2.9.2 Storage in Triple Stores . 86 2.9.3 RDF-centred Architectures versus Application-specific Architec- tures . 88 2.9.4 Essential Steps Not Covered by Existing Architectures . 89 2.9.5 Warehousing Data . 90 2.10 Conclusions . 91 3 Research Approach 93 3.1 Research Methodology . 94 3.1.1 Field Study . 94 3.1.2 Dynamic Analysis . 95 3.1.3 Architecture Analysis and Informed Argument . 95 3.1.4 Case Study . 95 3.2 Research Design . 96 3.2.1 Field Study on Data at CKAN-based repositories . 97 3.2.2 Benchmark of Storage Technologies . 106 7 TABLE OF CONTENTS 3.2.3 Architecture of a Linked and Open Data Integration Chain . 114 3.2.4 Proof of concept Framework Application . 118 4 Framework for the Integration of Linked and Open Data 121 4.1 Field Study On Linked and Open Data at Datahub.io . 122 4.1.1 Common Formats . 123 4.1.2 Popular Formats . 124 4.1.3 Licences . 125 4.1.4 Ages . 125 4.1.5 In-Depth RDF Analysis Results . 126 4.1.6 Summary of the Findings . 132 4.2 Storage Evaluation for a Linked and Open Data Warehouse . 134 4.2.1 Selection of Databases . 135 4.2.2 Installation . 136 4.2.3 Setup . 139 4.2.4 Storage Decisions . 139 4.2.5 MARC versus RDF . 140 4.2.6 Overview of Measurement Results . 141 4.2.7 Stability of Operation . 146 4.2.8 Loading Times . 146 4.2.9 Entity Retrieval . 148 4.2.10 Caching . 150 4.2.11 Conditional Table Scan . 150 4.2.12 Aggregation . 152 4.2.13 Graph Scenarios . 154 4.2.14 Change Operations . 155 4.2.15 Summary of the Findings . 157 4.3 Development of a Linked and Open Data Integration Framework . 159 4.3.1 Comparison of Existing Approaches . 160 4.3.2 Gap Analysis . 165 4.3.3 Requirements . 166 4.3.4 The LODicity Framework . 167 4.3.5 Design Decisions . 175 8 TABLE OF CONTENTS 4.3.6 Main Components . 177 4.3.7 Execution Process . ..
Recommended publications
  • From UKMARC to MARC 21: a Guide to the Differences
    Changing the record A concise guide to the differences between the UKMARC and MARC 21 bibliographic formats By R.W. Hill Bibliographic research officer, British Library The British Library Bibliographic Policy & Liaison National Bibliographic Service Boston Spa, Wetherby West Yorkshire LS23 7BQ ©The British Library 2002 1 Contents Introduction 4 1 MARC 21 and UKMARC: an overview 1.1 A side-by-side comparison 5 1.2 MARC standards 6 1.3 Related formats 6 1.4 A note on the background 6 2 Structure and components of the MARC record 2.1 Fields 7 2.2 Indicators 7 2.3 Subfields 7 2.4 Control subfields 8 2.5 Levels 8 2.6 Character set 8 2.7 Record leader and directory 8 A note on the method of comparison 10 3 Name headings 3.1 Personal name headings 11 3.2 Corporate name headings 12 3.3 Meeting/conference name headings 12 4 Title information 4.1 Uniform titles 13 4.2 Collective titles 14 4.3 Title and statement of responsibility 14 4.4 Part titles 14 4.5 Key titles 16 4.6 Variant and related titles 16 5 Edition and imprint 5.1 Edition statement 17 5.2 Cartographic mathematical data 17 5.3 Computer file characteristics 17 5.4 Publication, distribution and manufacture 17 5.5 Projected publication date 18 6 Physical description and related details 6.1 Physical description 19 6.2 Price and availability 19 6.3 Sequential designation of serials 19 6.4 Other descriptive fields 19 7 Series statements 7.1 Series statements in title added entry form 20 7.2 Series statements not in title added entry form 20 8 Notes 8.1 Principles for defining notes 21 8.2
    [Show full text]
  • Domain Landscape Review and Data Value Chain Definition
    Ref. Ares(2018)2305100 - 30/04/2018 H2020 - INDUSTRIAL LEADERSHIP - Information and Communication Technologies (ICT) ICT-14-2016-2017: Big Data PPP: cross-sectorial and cross-lingual data integration and experimentation ICARUS: “Aviation-driven Data Value Chain for Diversified Global and Local Operations” D1.1 – Domain Landscape Review and Data Value Chain Definition Workpackage: WP1 – ICARUS Data Value Chain Elaboration Authors: University of Cyprus (UCY), Suite5, UBITECH, CINECA, AIA, PACE, ISI, CELLOCK, TXT, OAG Status: Final Classification: Public Date: 30/04/2018 Version: 1.00 Disclaimer: The ICARUS project is co-funded by the Horizon 2020 Programme of the European Union. The information and views set out in this publication are those of the author(s) and do not necessarily reflect the official opinion of the European Communities. Neither the European Union institutions and bodies nor any person acting on their behalf may be held responsible for the use which may be made of the information contained therein. © Copyright in this document remains vested with the ICARUS Partners. D1.1 - Domain Landscape Review and Data Value Chain Definition ICARUS Project Profile Grant Agreement No.: 780792 Acronym: ICARUS Aviation-driven Data Value Chain for Diversified Global and Local Title: Operations URL: http://www.icarus2020.aero Start Date: 01/01/2018 Duration: 36 months Partners UBITECH (UBITECH) Greece ENGINEERING - INGEGNERIA INFORMATICA SPA (ENG) Italy PACE Aerospace Engineering and Information Technology GmbH (PACE) Germany SUITE5 DATA
    [Show full text]
  • Qualifikationsprofil #10309
    QUALIFIKATIONSPROFIL #10309 ALLGEMEINE DATEN Geburtsjahr: 1972 Ausbildung: Abitur Diplom, Informatik, (TU, Kaiserslautern) Fremdsprachen: Englisch, Französisch Spezialgebiete: Kubernetes KENNTNISSE Tools Active Directory Apache Application Case CATIA CVS Eclipse Exchange Framework GUI Innovator ITIL J2EE JMS LDAP Lotus Notes make MS Exchange MS Outlook MS-Exchange MS-Office MS-Visual Studio NetBeans OSGI RACF SAS sendmail Together Turbine UML VMWare .NET ADS ANT ASP ASP.NET Flash GEnie IDES Image Intellij IDEA IPC Jackson JBOSS Lex MS-Visio ODBC Oracle Application Server OWL PGP SPSS SQS TesserAct Tivoli Toolbook Total Transform Visio Weblogic WebSphere YACC Tätigkeiten Administration Analyse Beratung Design Dokumentation KI Konzeption Optimierung Support Vertrieb Sprachen Ajax Basic C C# C++ Cobol Delphi ETL Fortran Java JavaScript Natural Perl PHP PL/I PL-SQL Python SAL Smalltalk SQL ABAP Atlas Clips Delta FOCUS HTML Nomad Pascal SPL Spring TAL XML Detaillierte Komponenten AM BI FS-BA MDM PDM PM BW CO FI LO PP Datenbanken Approach IBM Microsoft Object Store Oracle Progress Sybase DMS ISAM JDBC mySQL DC/Netzwerke ATM DDS Gateway HBCI Hub Internet Intranet OpenSSL SSL VPN Asynchronous CISCO Router DNS DSL Firewall Gateways HTTP RFC Router Samba Sockets Switches Finance Business Intelligence Excel Konsolidierung Management Projektleiter Reporting Testing Wertpapiere Einkauf CAD Systeme CATIA V5 sonstige Hardware Digital HP PC Scanner Siemens Spark Teradata Bus FileNet NeXT SUN Switching Tools, Methoden Docker Go Kubernetes Rational RUP
    [Show full text]
  • Descriptive Metadata Guidelines for RLG Cultural Materials I Many Thanks Also to These Individuals Who Reviewed the Final Draft of the Document
    ������������������������������� �������������������������� �������� ����������������������������������� ��������������������������������� ��������������������������������������� ���������������������������������������������������� ������������������������������������������������� � ���������������������������������������������� ������������������������������������������������ ����������������������������������������������������������� ������������������������������������������������������� ���������������������������������������������������� �� ���������������������������������������������� ������������������������������������������� �������������������� ������������������� ���������������������������� ��� ���������������������������������������� ����������� ACKNOWLEDGMENTS Many thanks to the members of the RLG Cultural Materials Alliance—Description Advisory Group for their participation in developing these guidelines: Ardie Bausenbach Library of Congress Karim Boughida Getty Research Institute Terry Catapano Columbia University Mary W. Elings Bancroft Library University of California, Berkeley Michael Fox Minnesota Historical Society Richard Rinehart Berkeley Art Museum & Pacific Film Archive University of California, Berkeley Elizabeth Shaw Aziza Technology Associates, LLC Neil Thomson Natural History Museum (UK) Layna White San Francisco Museum of Modern Art Günter Waibel RLG staff liaison Thanks also to RLG staff: Joan Aliprand Arnold Arcolio Ricky Erway Fae Hamilton Descriptive Metadata Guidelines for RLG Cultural Materials i Many
    [Show full text]
  • DACS (Describing Archives: a Content Standard)
    SAA-Approved Standard DE S CRI DESCRIBING B ARCHIVES ING ARCHIVE A CONTENT STANDARD SECOND EDITION DESCRIBING Revised and updated, Describing Archives: A Content Standard (DACS) facilitates S : consistent, appropriate, and self-explanatory description of archival materials and A CON creators of archival materials. This new edition reflects the growing convergence among archival, museum, and library standards; aligns DACS with the descriptive ARCHIVES standards developed and supported by the International Council on Archives; and TE A CONTENT STANDARD provides guidance on the creation of archival authority records. DACS can be applied N to all types of material at all levels of description, and the rules are designed for use by T S SECOND EDITION any type of descriptive output, including MARC 21, Encoded Archival Description TA (EAD), and Encoded Archival Context (EAC). N The second edition consists of two parts: “Describing Archival Materials” and DARD “Archival Authority Records.” Separate sections discuss levels of description and the importance of access points to the retrieval of descriptions. Appendices feature a list of companion standards and crosswalks to ISAD(G), ISAAR(CPF), MARC 21, EAD, S EAC, and Resource Description and Access (RDA). Also included is an index. ECON D E D ITION BROWSE ARCHIVES TITLES AT WWW.ARCHIVISTS.ORG/CATALOG D E S C R I B I N G ARCHIVES ______________________________________________________________________________________________________________________________ A CONTENT STANDARD SECOND EDITION Chicago THE SOCIETY OF AMERICAN ARCHIVISTS www.archivists.org © 2013 by the Society of American Archivists All rights reserved. Printed in the United States of America Describing Archives: A Content Standard, Second Edition (DACS) was officially adopted as a standard by the Council of the Society of American Archivists in January 2013, following review by the SAA Standards Committee, its Technical Subcommittee for Describing Archives: A Content Standard, and the general archival community.
    [Show full text]
  • RDA and MARC 21 Development Report to EURIG 2020 RDA Satellite Meeting, 15Th September, 2020
    RDA and MARC 21 Development Report to EURIG 2020 RDA Satellite meeting, 15th September, 2020 Thurstan Young, British Library Overview • MARC 21 Background • MARC 21 Mappings in RDA Toolkit • MARC RDA Working Group www.bl.uk 2 MARC 21 Background (What is it?) • Machine Readable Cataloging 21 • Set of digital formats for the description of items catalogued by the library community • International standard for the transmission of bibliographic data • Widely used across North America, Europe and Australia www.bl.uk 3 MARC 21 Background (History) • Created in 1999 as a result of harmonizing USMARC and CAN/MARC • Adopted by the British Library in 2004 as a replacement to UKMARC www.bl.uk 4 MARC 21 Background (Structure / Content) • Field designations: three digit numeric code (from 001-999) identifies each field in a record • File structure: generally stored and transmitted as binary files, consisting of several MARC records • Content: encodes information about bibliographic resources and related entities www.bl.uk 5 MARC 21 Background (Formats) • Authority • Bibliographic • Holdings • Classification • Community www.bl.uk 6 MARC 21 Background (Development) • Discussion Papers, Proposals, Fast Track Changes • Updates issued twice a year • Sixty day embargo period to allow for implementation www.bl.uk 7 MARC 21 Background (Administration 1) • Library of Congress Network Development and MARC Standards Office • MARC Steering Group • MARC Advisory Committee www.bl.uk 8 MARC 21 Background (Administration 2) • MARC Steering Group • Library of Congress • Library and Archives Canada • British Library • Deutsche Nationalbibliothek www.bl.uk 9 MARC 21 Background (Administration 3) • MARC Advisory Committee • MARC Steering Group • Biblioteca Nacional de España • National Library of Australia • Various U.S.
    [Show full text]
  • Tags for Identifying Languages File:///C:/W3/International/Draft-Langtags/Draft-Phillips-Lan
    Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan... Network Working Group A. Phillips, Ed. TOC Internet-Draft webMethods, Inc. Expires: October 7, 2004 M. Davis IBM April 8, 2004 Tags for Identifying Languages draft-phillips-langtags-02 Status of this Memo This document is an Internet-Draft and is in full conformance with all provisions of Section 10 of RFC2026. Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts. Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as "work in progress." The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt. The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html. This Internet-Draft will expire on October 7, 2004. Copyright Notice Copyright (C) The Internet Society (2004). All Rights Reserved. Abstract This document describes a language tag for use in cases where it is desired to indicate the language used in an information object, how to register values for use in this language tag, and a construct for matching such language tags, including user defined extensions for private interchange. Table of Contents 1. Introduction 2. The Language Tag 2.1 Syntax 2.2 Language Tag Sources 2.2.1 Pre-Existing RFC3066 Registrations 1 of 20 08/04/2004 11:03 Tags for Identifying Languages file:///C:/w3/International/draft-langtags/draft-phillips-lan..
    [Show full text]
  • Source Code Patterns of Cross Site Scripting in PHP Open Source Projects
    Source Code Patterns of Cross Site Scripting in PHP Open Source Projects Felix Schuckert12, Max Hildner1, Basel Katt2 ,∗ and Hanno Langweg12 1 HTWG Konstanz, Department of Computer Science, Konstanz, Baden-W¨urttemberg, Germany [email protected] [email protected] [email protected] 2 Department of Information Security and Communication Technology, Faculty of Information Technology and Electrical Engineering, NTNU, Norwegian University of Science and Technology, Gjøvik, Norway [email protected] Abstract To get a better understanding of Cross Site Scripting vulnerabilities, we investigated 50 randomly selected CVE reports which are related to open source projects. The vulnerable and patched source code was manually reviewed to find out what kind of source code patterns were used. Source code pattern categories were found for sources, concatenations, sinks, HTML context and fixes. Our resulting categories are compared to categories from CWE. A source code sample which might have led developers to believe that the data was already sanitized is described in detail. For the different HTML context categories, the necessary Cross Site Scripting prevention mechanisms are described. 1 Introduction Cross Site Scripting (XSS) is on the fourth place in Common Weakness Enumeration (CWE) top 25 2011 [3] and on the seventh place in Open Wep Application Security Project (OWASP) top 10 2017 [4]. Accordingly, Cross Site Scripting is still a common issue in web security. To discover the reason why the same vulnerabilities are still occurring, we investigated the vulnerable and patched source code from open source projects. Similar methods, functions and operations are grouped together and are called source code patterns.
    [Show full text]
  • Postgresql Flyer
    PostgreSQL - English Usage Examples Further Information Development system PostgreSQL (2nd Edition), Korry Douglas, Sams Publishing, ISBN: 0672327562 A small system just for developing, running on any supported platform (Unix, Linux, Mac OS, Windows). Beginning Databases with PostgreSQL:From Novice to This system does not need much system resources. Professional, Second Edition, Neil Matthew, Apress, The result can be exported and used in the production ISBN: 1590594789 PostgreSQL system. PostgreSQL Developer's Handbook, Ewald Geschwinde, Sams Publishing, ISBN 0672322609 Beginning PHP and PostgreSQL 8, W. Jason Gilmore, Small to mid-level database server Apress, ISBN 1590595475 A small to mid-level database server has just small PHP and PostgreSQL Advanced Web Programming, hardware requirements. PostgreSQL is not running ex- Ewald Geschwinde and Robert Treat, Sams Publishing, clusive on this system but shares the resources with ISBN 0672323826 other services. A webserver (Blog, CMS) with a data- base backend is a good example. PostgreSQL homepage: www.postgresql.org pgAdmin III: http://www.pgadmin.org Large database server PgFoundry: http://pgfoundry.org phpPgAdmin: http://phppgadmin.sourceforge.net A large database server has extensive hardware re- PostGIS: postgis.refractions.net quirements and is usually dedicated to a single appli- cation or project. PostgreSQL can use the full power Slony: slony.info of the hardware without the need to share resources. PostgreSQL 8.3 What is PostgreSQL? PostgreSQL 8.3, released in early 2008, includes a record PostgreSQL is an object-relational database management number of new and improved features which will greatly system (ORDBMS). It is freely available and usable with- enhance PostgreSQL for application designers, database out licensing fee.
    [Show full text]
  • General Introduction General Introduction
    General Introduction General Introduction The MARC 21 formats are widely used standards for the representation and exchange of authority, bibliographic, classification, community information, and holdings data in machine-readable form. They consist of a family of five coordinated formats: MARC 21 Format for Authority Data; MARC 21 Format for Bibliographic Data; MARC 21 Format for Classification Data; MARC 21 Format for Community Information; and MARC 21 Format for Holdings Data. Each of these MARC formats is published separately online to provide detailed field descriptions, guidelines for applying the defined content designation (with examples), and identification of conventions to be used to insure input consistency. The MARC 21 Concise Formats provide in a single publication a quick reference guide to the content designators defined in the Bibliographic, Authority, and Holdings formats. It provides a concise description of each field, each character position of the fixed-length data element fields, and of the defined indicators in the variable data fields. Descriptions of subfield codes and coded values are given only when their names may not be sufficiently descriptive. Examples are included for each field. COMPONENTS OF A MARC 21 RECORD MARC format characteristics that are common to all of the formats are described in this general introduction. Information specific only to certain record types is given in the introduction to the MARC format to which it relates. A MARC record is composed of three elements: the record structure, the content designation, and the data content of the record. The record structure is an implementation of the American National Standard for Information Interchange (ANSI/NISO Z39.2) and its ISO equivalent ISO 2709.
    [Show full text]
  • A Framework for Working with Cross-Application Social Tagging Data
    TECHNISCHE UNIVERSITÄT MÜNCHEN FAKULTÜT FÜR INFORMATIK Forschungs- und Lehreinheit XI Angewandte Informatik / Kooperative Systeme A Framework for Working with Cross-Application Social Tagging Data Walter Christian Kammergruber Vollständiger Abdruck der von der Fakultät für Informatik der Technischen Universität München zur Erlangung des akademischen Grades eines Doktors der Naturwissenschaften (Dr. rer. nat.) genehmigten Dissertation. Vorsitzender: Univ.-Prof. Dr. Helmut Krcmar Prüfer der Dissertation: 1. Univ.-Prof. Dr. Johann Schlichter 2. Univ.-Prof. Dr. Florian Matthes Die Dissertation wurde am 26.06.2014 bei der Technischen Universität München eingere- icht und durch die Fakultät für Informatik am 26.11.2014 angenommen. Zusammenfassung Mit dem zunehmenden Erfolg des Web 2.0 wurde und wird Social-Tagging immer beliebter, und es wurde zu einem wichtigen Puzzle-Stück dieses Phänomens. Im Unterschied zu ausgefeilteren Methoden um Ressourcen zu organisieren, wie beispielsweise Taxonomien und Ontologien, ist Social-Tagging einfach einzusetzen und zu verstehen. Bedingt durch die Einfachheit finden sich keine expliziten und formalen Strukturen vor. Das Fehlen von Struktur führt zu Problemen beim Wiederaufinden von Informationen, da beispielsweise Mehrdeutigkeiten in Suchanfragen nicht aufgelöst werden können. Zum Beispiel kann ein Tag „dog“ (im Englischen) für des Menschen bester Freund stehen, aber auch für das Lieblingsessen mancher Personen, einem Hot Dog. Ein Bild einer Katze kann mit„angora cat“, „cat“, „mammal“, „animal“oder „creature“getagged sein. Die Art der Tags hängt sehr stark vom individuellen Nutzer ab. Weiterhin sind Social-Tagging-Daten auf verschiedene Applikationen verteilt. Ein gemeinsamer Mediator ist nicht vorhanden. Beispielsweise kann ein Nutzer auf vielen verschiedenen Applikationen Entitäten taggen. Für das Internet kann das Flickr, Delicious, Twitter, Facebook and viele mehr sein.
    [Show full text]
  • Integration of Web Apis and Linked Data Using SPARQL Micro-Services
    Integration of Web APIs and Linked Data Using SPARQL Micro-Services - Application to Biodiversity Use Cases Franck Michel, Catherine Faron Zucker, Olivier Gargominy, Fabien Gandon To cite this version: Franck Michel, Catherine Faron Zucker, Olivier Gargominy, Fabien Gandon. Integration of Web APIs and Linked Data Using SPARQL Micro-Services - Application to Biodiversity Use Cases. Information, MDPI, 2018, Special Issue ”Semantics for Big Data Integration”, 9 (12), 10.3390/info9120310. hal- 01947589 HAL Id: hal-01947589 https://hal.archives-ouvertes.fr/hal-01947589 Submitted on 7 Dec 2018 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Distributed under a Creative Commons Attribution| 4.0 International License information Article Integration of Web APIs and Linked Data Using SPARQL Micro-Services—Application to Biodiversity Use Cases † Franck Michel 1,*, Catherine Faron Zucker 1, Olivier Gargominy 2 and Fabien Gandon 1 1 I3S laboratory, University Côte d’Azur, CNRS, Inria, 930 route des Colles, 06903 Sophia Antipolis, France; [email protected] (C.F.Z.); [email protected] (F.G.) 2 Muséum National d’Histoire Naturelle, 36 rue Geoffroy Saint-Hilaire, 75005 Paris, France; [email protected] (O.G.) * Correspondence: [email protected] (F.M.) † This paper is an extended version of our paper published in Michel F., Faron Zucker C.
    [Show full text]