
Mashup-based Linked Data Integration

DISSERTATION

zur Erlangung des akademischen Grades

Doktor der Technischen Wissenschaften

eingereicht von

Tuan-Dat Trinh
Matrikelnummer 1229761

an der Fakultät für Informatik der Technischen Universität Wien

Betreuung: O.Univ.Prof. Dipl.-Ing. Dr.techn. A Min Tjoa, Mag. Dr. Elmar Kiesling

Diese Dissertation haben begutachtet:

O.Univ.Prof. Dipl.-Ing. Dr.techn. A Min Tjoa, Prof. Dr. Hamideh Afsarmanesh

Wien, 3. März 2016 Tuan-Dat Trinh


Mashup-based Linked Data Integration

DISSERTATION

submitted in partial fulfillment of the requirements for the degree of

Doktor der Technischen Wissenschaften

by

Tuan-Dat Trinh
Registration Number 1229761

to the Faculty of Informatics at the Vienna University of Technology

Advisor: O.Univ.Prof. Dipl.-Ing. Dr.techn. A Min Tjoa, Mag. Dr. Elmar Kiesling

The dissertation has been reviewed by:

O.Univ.Prof. Dipl.-Ing. Dr.techn. A Min Tjoa, Prof. Dr. Hamideh Afsarmanesh

Vienna, 3rd March, 2016 Tuan-Dat Trinh


Erklärung zur Verfassung der Arbeit

Tuan-Dat Trinh Donaufelderstrasse 54/3120, 1210 Wien

Hiermit erkläre ich, dass ich diese Arbeit selbständig verfasst habe, dass ich die verwendeten Quellen und Hilfsmittel vollständig angegeben habe und dass ich die Stellen der Arbeit – einschließlich Tabellen, Karten und Abbildungen –, die anderen Werken oder dem Internet im Wortlaut oder dem Sinn nach entnommen sind, auf jeden Fall unter Angabe der Quelle als Entlehnung kenntlich gemacht habe.

Wien, 3. März 2016 Tuan-Dat Trinh


Acknowledgements

I would like to take this opportunity to express my sincere thanks to those who helped me in one way or another in conducting the research and writing this thesis.

I want to express my heartfelt thanks to my advisor, O.Univ.Prof. Dipl.-Ing. Dr.techn. A Min Tjoa. The work presented in this thesis would not have been possible without his great support and advice. I also want to thank Mag. Dr. Elmar Kiesling and Mag. Dipl.-Ing. Dr. Amin Anjomshoaa for serving on my Ph.D. committee and for their insightful feedback, comments, and suggestions. I sincerely thank them for providing me with the opportunity to participate in the Linked Data Lab, which enabled me to get to know a brand new field and to acquire a great deal of knowledge and skills.

I would like to express my sincere thanks to my family for their endless support during my three-year stay at TU Wien. From the beginning of my studies, they kept encouraging and supporting me whenever I felt frustrated with my course work, my research, or my life. Without their love and support over the years, I would not have come to where I am.

I thank my friends in the Linked Data Lab: Ba-Lam Do, Peter Wetz, Peb Ruswono Aryan, and Bernhard Ortner, for the stimulating discussions, the insightful comments, and all the fun we have had in the last three years.

Last but not least, I would like to thank all the people who have helped and supported me in the best way during these years in Austria. All in all, thank you everyone for your help, support, and love.


Kurzfassung

Das so genannte Web der Daten wächst mit erstaunlicher Geschwindigkeit. Mittlerweile ist eine große Menge an Datenquellen, APIs, Services, und Datenvisualisierungen öffentlich verfügbar. Dennoch ist es nach wie vor eine große Herausforderung, komplexe Informationsbedürfnisse von Anwendern mittels Integration und Verarbeitung von Daten aus heterogenen Quellen zu befriedigen.

Zur Lösung dieser Probleme gab es in den letzten Jahren viele Forschungsarbeiten, die sich mit mashup-basierter Datenintegration beschäftigten. Mashups ermöglichen Anwendern, Daten und Services zu integrieren und rekombinieren, um so eigenständig Rich Web Applications zu erstellen. Trotzdem ist es für Anwender ohne technische Expertise schwierig, solche Mashups effizient und effektiv zu entwerfen.

Um dieses Problem zu lösen, stellen wir eine Methode vor, die es ermöglicht, Mashups zu entwerfen, die heterogene Daten auf automatische, kollaborative, und verteilte Weise integrieren. Wir folgen dabei dem Visual Programming Paradigma basierend auf drei Grundsätzen: Offenheit, Vernetzung und Wiederverwendbarkeit. Die Methode fußt auf Semantic Web Technologien und dem Konzept der Linked Widgets, d.h., Web Widgets, die mit einem semantischen Modell ausgestattet sind. Linked Widgets sind konzipiert, um Herausforderungen der Datenintegration wie folgt zu bewältigen: (i) Die Wiederverwendbarkeit von Datenverarbeitungs-Prozessen wird unterstützt, (ii) Datenintegration mittels einfacher Operationen wird erleichtert, (iii) Es wird Anwendern ermöglicht, relevante Datenquellen auf Basis ihres Kontexts zu erforschen, (iv) Heterogenität von Daten wird überwunden und (v) automatische Datenintegration erleichtert.

Diese Arbeit stellt ein neues Konzept von semantischen, kollaborativen und verteilten Mashups vor. Den Prinzipien des Semantic Web folgend, können die damit erstellten ad-hoc Datenintegrations-Applikationen heterogene Daten verschiedener Teilnehmer gleichzeitig verarbeiten und kombinieren. Dabei können die Daten von so verschiedenen Geräten wie Sensoren, eingebetteten Systemen, Handys, Desktop Computern oder Web Servern stammen. Diese Vorgehensweise benötigt keine Server Infrastruktur, um Daten hochzuladen, vielmehr behalten die Teilnehmer Kontrolle über ihre Daten und stellen nur ein minimales Subset anderen Anwendern zur Verfügung. Verteilte Mashups operieren persistent und sind somit ideal für Anwendungsfälle wie Echtzeit-Monitoring oder Verarbeitung von Datenströmen.


Abstract

The web of data is growing at a staggering pace. A large number of data sources, APIs, services, and data visualizations are publicly available. Satisfying users’ complex information needs by integrating and processing data from disparate sources, however, remains challenging.

In recent years, a large stream of research into mashup-based data integration has emerged. These mashups foster combination and reuse of data and services and thereby have the potential for rapid creation of rich web applications. Nonetheless, users lacking technical expertise still face enormous barriers when trying to develop such mashups efficiently and effectively.

To address this issue, we introduce an approach to compose mashups that integrate heterogeneous data sources in an automatic, collaborative, and distributed manner. We follow a visual programming paradigm and aim for three guiding principles: openness, connectedness, and reusability. The approach is based on semantic web technologies and the concept of Linked Widgets, i.e., web widgets backed by a semantic model. Linked Widgets are designed to effectively tackle data integration challenges by (i) fostering reusability of data processing tasks, (ii) easing data integration via simple operations, (iii) allowing users to explore relevant data sources with regard to their context, (iv) tackling data heterogeneity, and (v) facilitating automatic data integration.

This thesis introduces a new model of semantic, collaborative, and distributed mashups. Following semantic web principles for data integration, these ad-hoc mashup-based data integration applications can simultaneously process and combine heterogeneous data contributed by multiple stakeholders. The data can come from various devices such as sensors, embedded devices, mobile phones, desktops, or web servers. The approach does not require server infrastructure to upload data, but rather allows each stakeholder to keep control over their data and expose only relevant subsets to the collaborative group. Distributed mashups can run persistently in the background and are hence ideal for real-time data monitoring or data streaming use cases.


Preface

Major parts of this thesis are taken from our contributions [1, 2, 3, 4, 5, 6, 7, 8] published over the past three years. The author of this thesis is also the lead author of these papers.

Contents

Kurzfassung
Abstract
Preface
Contents
List of Figures
List of Tables
List of Algorithms

1 Introduction
    1.1 Motivation
    1.2 Problem Description
    1.3 Research Questions
    1.4 Main Contributions
    1.5 Thesis Outline

2 Background
    2.1 Semantic Web
    2.2 Semantic Data Integration
    2.3 Open Data and Linked Data
    2.4 Mashups
    2.5 State of the Art in Mashup-based Data Integration
    2.6 Research Gap

3 Conceptual Framework
    3.1 Modular Approach for Data Integration
    3.2 Architecture
    3.3 Client and Server Linked Widgets
    3.4 Semantic Model for Linked Widgets
    3.5 Mashup Construction
    3.6 Mashup Execution Protocols
    3.7 Hybrid Mashup Patterns
    3.8 Mashup Encapsulation
    3.9 Automatic Data Integration
    3.10 Tag-based Automatic Mashup Composition

4 Computational Experiments
    4.1 Terminal Matching
    4.2 Automatic Mashup Composition

5 Prototype Implementation of the Framework
    5.1 Architectural Design Considerations
    5.2 Key Components
    5.3 Implementation Details and Lessons Learned
    5.4 Example Applications in the Geospatial Context
    5.5 Hybrid Mashup Example Use Cases

6 Related Work
    6.1 Widget-based Mashups
    6.2 Semantic Mashups
    6.3 Embedded, Mobile, and Pervasive Mashups
    6.4 Collaborative Mashups
    6.5 Automatic Mashups
    6.6 Natural Language-supported Mashups
    6.7 Programming-by-demonstration Mashups

7 Conclusions and Future Work
    7.1 Conclusions
    7.2 Future Work

Appendices
    A Running Example of the Automatic Mashup Composition Algorithm
    B Box Plots of the Terminal Matching Experiment
    C Box Plots of the Automatic Mashup Composition Experiment

Bibliography

List of Figures

2.1 Example of an RDF graph describing a person
2.2 Generic semantic data integration model
2.3 Converted triple graph of dataset A
2.4 Converted triple graph of dataset B
2.5 Integrated graph of datasets A and B
2.6 LOD cloud as of 2014
2.7 Growth of top ten web API categories in Programmable Web since 2009
2.8 Mashup classification dimensions
2.9 Example of the AccuWeather web widget
2.10 Example desktop widget (or “gadget”) from Windows 7

3.1 Modular approach for open data integration
3.2 Linked Widgets framework architecture
3.3 Client widget components
3.4 Linked Widgets ontology
3.5 Semantic model of Linked Widgets
3.6 Semantic model of the POI Search widget
3.7 Human-readable serialization of the POI Search widget model
3.8 Interfaces between Linked Widgets and the mashup coordinator
3.10 Local mashup communication protocol
3.11 Remote mashup communication protocol
3.12 Collaborative mashup pattern
3.13 Delegating widget
3.14 Persistent mashup pattern
3.15 Distributed mashup pattern
3.16 Streaming mashup pattern
3.17 Nested mashup
3.18 Terminal matching
3.19 Example graph corresponding to a set of widgets
3.20 Processing flow of the tag-based composition module
3.21 Meta-tagging model of Linked Widgets
3.22 Meta-tagging annotation of the POI Search widget
3.23 Tag completeness checking
3.24 Example of tag-based automatic mashup composition

4.1 Experimental design of terminal matching
4.2 Sample input and output models of synthetic widgets
4.3 Terminal matching: total CPU time as a function of number of widgets
4.4 Terminal matching: total CPU time as a function of complexity of widget model
4.5 Terminal matching: total CPU time as a function of input-output matching rate
4.6 Terminal matching: memory consumption
4.7 Example widget collection
4.8 Experimental design of automatic mashup composition
4.9 Growth in the number of mashups
4.10 Automatic mashup composition: total CPU time and memory consumption for increasing input-output matching rate
4.11 Automatic mashup composition: total CPU time as a function of number of data widgets
4.12 Automatic mashup composition: total CPU time as a function of number of processing widgets
4.13 Automatic mashup composition: total CPU time as a function of number of visualization widgets
4.14 Automatic mashup composition: memory consumption

5.1 Web-based collaborative mashup editor
5.2 Visual model defined for the POI Search widget
5.3 Widget search that defines the input and output semantic models
5.4 Geospatial widget collection
5.5 Use case 1: display famous Places detected from text on the map
5.6 Use case 2: display nearby Restaurants and Banks on the map
5.7 Use case 3: pick a park to visit at the weekend
5.8 Use case 4: transform geospatial data into Linked Open Data resources
5.9 Combinations of available widgets
5.10 Vienna and Bethlehem weather comparison
5.11 Collaborative mashup example for sales data integration

B.1 Terminal matching: CPU utilization as a function of number of widgets
B.2 Terminal matching: total CPU time as a function of number of widgets
B.3 Terminal matching: CPU utilization as a function of complexity of widget model
B.4 Terminal matching: total CPU time as a function of complexity of widget model
B.5 Terminal matching: CPU utilization as a function of input-output matching rate
B.6 Terminal matching: total CPU time as a function of input-output matching rate
C.1 Automatic mashup composition: CPU utilization and total CPU time for increasing input-output matching rate
C.2 Automatic mashup composition: CPU utilization and total CPU time for increasing number of data widgets
C.3 Automatic mashup composition: CPU utilization and total CPU time for increasing number of processing widgets
C.4 Automatic mashup composition: CPU utilization and total CPU time for increasing number of visualization widgets

List of Tables

2.1 Book table from dataset A
2.2 Author table from dataset A
2.3 Publisher table from dataset A
2.4 Book table from dataset B
2.5 Author table from dataset B
2.6 Mashup tools and frameworks

3.1 Classes of the Linked Widgets ontology
3.2 Properties of the Linked Widgets ontology
3.3 Description of mashup communication methods

5.1 List of used libraries and tools

7.1 Answers to research questions
A.1 Running example of the automatic mashup composition algorithm


List of Algorithms

1 Automatic mashup composition
2 Tag-based automatic mashup composition


List of Abbreviations

API Application Programming Interface

CSS Cascading Style Sheets

CSV Comma Separated Values

EUD End-User Development

GWT Google Web Toolkit

HTML HyperText Markup Language

HTTP Hypertext Transfer Protocol

ISBN International Standard Book Number

JSON JavaScript Object Notation

JSON-LD JavaScript Object Notation for Linked Data

LOD Linked Open Data

OWL Web Ontology Language

PDF Portable Document Format

POI Point of Interest

RDF Resource Description Framework

RDFS Resource Description Framework Schema

REST Representational State Transfer

RSS Rich Site Summary

SOA Service-Oriented Architecture

SOAP Simple Object Access Protocol

SPARQL Simple Protocol and RDF Query Language

TCM Tag-based Composition Module

UI User Interface

URI Uniform Resource Identifier

URL Uniform Resource Locator

UUID Universally Unique Identifier

W3C World Wide Web Consortium

WAMP Web Application Messaging Protocol

XML Extensible Markup Language

XSLT Extensible Stylesheet Language Transformations

List of Symbols

C The configuration model of a widget

EE The external edge set of the graph constructed from a given widget set

E The edge set of the graph constructed from a given widget set

G The graph constructed from a given widget set

IE The internal edge set of the graph constructed from a given widget set

I The finite set of graph-based input models of a widget

M The number of matched output terminals (vertices) for an input terminal (vertex)

N_d The total number of data widgets in a given widget collection

N_p The total number of processing widgets in a given widget collection

N_v The total number of visualization widgets in a given widget collection

N The total number of widgets in a given widget collection

O The graph-based output model of a widget

P The processing function of a widget

R The rate of successful matches between input and output models of all widgets in a given widget collection

V The vertex set of the graph constructed from a given widget set

W A given set of widgets

X The complexity of the input/output model of a widget

w A widget of a given widget set


CHAPTER 1 Introduction

1.1 Motivation

In recent years, organizations and governments have made large volumes of open data [9] available on the web; the data covers a wide range of topics such as economy, currency, geography, entertainment, weather, and transportation. Publishers frequently release their data under a license that allows anyone to use, reuse, and redistribute it. Such open data, due to its abundant availability, plays an increasingly important role in everyday life and can be used for various purposes [10]. It allows interested stakeholders to analyze the data, put it in a new context, gain insights, and create innovative services.

Open data has the potential to support informed decisions; however, collecting relevant data from various sources and extracting useful information from it has not become easier as the quantity of data made available by organizations and governments has grown. These providers publish their datasets without paying regard to how their data may be used or how it may be combined with data provided by others [11]. Open data is still largely a collection of raw, isolated, and heterogeneous datasets, and making effective use of it remains a major challenge [12, 13].

A lot of research has been conducted to facilitate data integration [14] by adding semantics [15, 16] to open data. Linked Data – which refers to “a set of best practices for publishing and connecting structured data on the Web so that it can be interlinked and become more useful” [17] – has been adopted and applied by many data publishers [18]. At the very beginning, there were only twelve published datasets, but the so-called Linked Open Data cloud grew rapidly to more than 3,400 datasets with around 85 billion triples by 2015¹. To be able to manipulate Linked Data, however, users are required to have knowledge of semantic web technologies as well as the Simple Protocol and RDF Query Language (SPARQL).

1http://stats.lod2.eu/ (accessed Nov. 01, 2015)

Although Linked Data is well-structured and includes millions of links among dispersed datasets, integrating Linked Data is still a challenging issue due to its inherently distributed and inconsistent nature [19]. As of 2015, more than 2,900 vocabularies² are used in the Linked Open Data cloud. This shows that many data publishers develop their own vocabularies rather than reusing already existing ones as the best practices would suggest [20]. Linked Data is based on the Uniform Resource Identifier (URI) scheme to globally identify resources by a single identifier; but in reality, many URIs are used for the same concept or entity. Commercial or governmental organizations are unlikely to make use of external URIs from other data sources, because such URIs can potentially change or disappear [19].

To sum up, due to ongoing efforts of data publishers and researchers, vast amounts of open data and Linked Open Data are available on the web. The value of this data would dramatically increase if users were able to integrate it. However, they do not know where to find relevant data sources, or they typically do not have the means and skills to collect and process the data. Our motivation is to support non-expert users in obtaining, integrating, and visualizing data from different datasets to gain insights and/or make decisions.

Additionally, we conceive data exploration, integration, and analysis as a collaborative process, as this creates strong potential for both simple ad-hoc data sharing and sophisticated data-driven decision support. Many data integration and analysis tasks are collaborative by nature [21]; they often require the sharing of data held privately by various stakeholders, an – often geographically dispersed – team with a broad skill set, and agreement on a common interpretation in order to arrive at relevant insights.

1.2 Problem Description

For users to benefit from the value of openly available data, we need to tackle a number of data integration challenges: (i) Data heterogeneity makes it difficult to integrate different kinds of data that come in various formats such as Comma Separated Values (CSV), Extensible Markup Language (XML), JavaScript Object Notation (JSON), Resource Description Framework (RDF), or JavaScript Object Notation for Linked Data (JSON-LD) and are spread among various storage infrastructures (e.g., databases, files, cloud, personal computers, mobile phones); (ii) The tedious manual data integration processes that users perform to collect, clean, enrich, integrate, and visualize data are typically neither reproducible nor reusable; (iii) There is a lack of support for exploration, as users often rely on available domain-specific applications that do not allow for the integration of arbitrary data sources; (iv) There is a lack of means for the identification of relevant data sources and of meaningful ways to automatically integrate them.

Gathering data from multiple sources and performing data analysis, integration, and visualization tasks is hence a cumbersome process. End users cannot yet tap the full potential of (linked) open data, but rather have to rely on applications built by others.

2http://stats.lod2.eu/vocabularies (accessed Nov. 01, 2015)

Most existing applications – with the exception of dedicated query and data integration tools – make use of only a single open data source or a limited number of particular sources. As a consequence, open data is almost exclusively processed by custom applications tailored to specific use cases or domains and remains inaccessible to end users with more general information needs. This situation is not consistent with the vision of the future web [22, 23], which promised to facilitate easy discovery, sharing, and reuse of highly interconnected datasets across applications.

1.3 Research Questions

The research question addressed in this thesis is: How can non-expert users be enabled to explore and integrate heterogeneous data sources?

We require that users are able to easily add or remove data sources to facilitate new data integration use cases. To this end, on the one hand, we need a generic and versatile data integration framework that is not tailored towards particular datasets and data formats. On the other hand, the framework must be simple and include appropriate supporting mechanisms to help non-expert users overcome the technical barriers of complex data integration. Hence, we separate the research question into three sub-questions as follows:

Research Question 1. How is it possible to support non-expert users in addressing data heterogeneity?

Research Question 2. How can non-expert users be enabled to collaboratively integrate data?

Research Question 3. How is it possible to automate the data exploration and integration process?

1.4 Main Contributions

In this thesis we demonstrate that concepts of the semantic web and mashups can be combined to facilitate data integration for non-expert users in a flexible and efficient manner. We separate complex data integration tasks into reusable modular functions, which are encapsulated in high-level user interface blocks, i.e., the so-called Linked Widgets. Based on that, users lacking programming skills can visually link widgets to create mashup-based data integration applications. We lift non-semantic data to a semantic level on-the-fly and add explicit semantics to the input and output data of Linked Widgets. Thus we enable users to link disparate data sources, address data heterogeneity, and enrich data from one source with data from other sources to foster new insights. Semantic web principles allow us to ease and simplify the data integration process for users, through the back-end semantic model of Linked Widgets and mechanisms that leverage its semantics to enable automatic data exploration and integration. We introduce a matching algorithm that automatically discovers valid connections between Linked Widgets.

This algorithm provides the foundation for the advanced composition algorithm that creates meaningful mashups from a given set of Linked Widgets. Finally, we develop a tag-based composition method that allows users to compose mashups by specifying them in structured text.

An innovative aspect of our work is the new model of semantic, distributed, and collaborative mashups. There is already a body of work related to semantic and collaborative mashups; however, to the best of our knowledge, there is no research on mashups assembled from components that are distributed among different nodes (e.g., sensors, embedded devices, mobile phones, desktops, servers) to collect and integrate data. To this end, mashup applications can be composed of both client and server Linked Widgets. Client widgets are executed in the local context of a web browser environment. Server widgets can be executed as native applications on various platforms, including personal computers, cloud servers, mobile devices, or embedded systems. Server widgets may be used to contribute data from the node they are deployed on to one or multiple mashups. They may also be used to utilize the computing resources of that node to continuously process data in the background. This architecture allows stakeholders to expose their private data selectively by contributing server widgets as functional black boxes. It efficiently facilitates collaborative ad-hoc data integration involving multiple stakeholders that contribute data and computing resources.

We have implemented our concepts in a prototype platform, which is available at http://linkedwidgets.org. The data, including the mashups and the semantic models of all widgets, is published to the Linked Open Data cloud; it can be accessed via the SPARQL endpoint at http://ogd.ifs.tuwien.ac.at/sparql.

1.5 Thesis Outline

This thesis is organized as follows: In Chapter 2, we present the necessary research background on the semantic web, open data, Linked Open Data, semantic data integration, and mashups. We then present the state of the art in mashup-based data integration and highlight the research gaps in end-user Linked Data integration.

In Chapter 3, we present the main contribution of our research. We first introduce a modular approach for open data integration that facilitates collaborative work among data publishers, developers, and end-user communities. We then design the architecture of a conceptual mashup-based data integration framework. Next, we present detailed information on Linked Widgets, which constitute the basic element of the framework; addressing Research Question 1, Linked Widgets always lift data to a semantic level. We then outline how server and client Linked Widgets interact by means of our local, remote, and hybrid protocols. Based on these protocols, we discuss hybrid mashup patterns (i.e., collaborative, persistent, distributed, streaming, and complex mashups) to address Research Question 2. Finally, we present our mechanisms that leverage the semantic model of Linked Widgets to (i) validate the links between widgets and discover all widgets that can provide data to or consume data from the input and output terminals of a widget, (ii) automatically compose mashups from a given set of widgets, and (iii) create auto-composed mashups from structured text. These mechanisms provide the answer to Research Question 3.

Chapter 4 presents the results of our computational experiments on the terminal matching and automatic mashup composition algorithms.

In Chapter 5, we illustrate the prototype implementation of the conceptual framework. We discuss the challenging implementation considerations we faced while developing the prototype; these lessons can be useful for developers building applications on top of (linked) open data. We then introduce key components of the prototype and demonstrate its capabilities in the context of geospatial data. The prototype allows non-expert users to explore spatial data from various sources (i.e., DBpedia, LinkedGeoData, GeoNames, Wunderground, the Nobel Laureate dataset, Flickr, Event Media, and a range of statistical SPARQL endpoints) in heterogeneous formats. By composing a mashup of available Linked Widgets, users can integrate data in a flexible and effective manner. Finally, we introduce two practical use cases of hybrid mashups.

In Chapter 6, we provide detailed information on a number of mashup tools and frameworks related to our work. With regard to the topic of our research, we categorize the mashup frameworks into widget-based mashups, semantic mashups, embedded, mobile, and pervasive mashups, collaborative mashups, automatic mashups, natural language-supported mashups, and programming-by-demonstration mashups.

Chapter 7 summarizes the results of this thesis and outlines directions for future research.


CHAPTER 2 Background

In this chapter, we present the background knowledge in the context of our research. We first provide a brief introduction to the semantic web and its potential for data integration in Section 2.1, and illustrate semantic data integration with an example in Section 2.2. We then highlight the growth of open data and Linked Open Data in Section 2.3. Next, in Section 2.4, we present the mashup approach, which helps non-expert users without programming skills overcome technological barriers and work with these data sources. We outline the state of the art in end-user mashup-based data integration by reviewing a number of surveys in Section 2.5. Based on that, Section 2.6 discusses the research gap and specifies our approach to bridge the gap between end users and data integration.

2.1 Semantic Web

The web has considerably changed the way we disseminate, access, and retrieve information. It contains a vast amount of information, but the original structural information is typically not available; much of the data on the web is delivered in the form of web pages, i.e., HyperText Markup Language (HTML) documents generated from databases. Most web content today is designed for human consumption. Typically, users make use of keyword-based search engines such as Yahoo¹, Bing², and Google³ to satisfy their information needs. These keyword-based engines seek keywords in web pages rather than extracting meaning from the documents themselves. Consequently, the search results have high recall but low precision [16]. Because the results are single web pages, if the required information is spread over multiple documents, users must first initiate a number of queries to collect the relevant documents, and then manually extract the partial information and put it together [16].

1https://search.yahoo.com/ (accessed Nov. 01, 2015)
2https://www.bing.com/ (accessed Nov. 01, 2015)
3https://www.google.com (accessed Nov. 01, 2015)

The semantic web, which is [24] “the extension of the World Wide Web that enables people to share content beyond the boundaries of applications and websites”, can address these limitations. The term comprises techniques that are expected to [16] “dramatically improve the current World Wide Web and its use”. “The Semantic Web is not a separate Web but an extension of the current one, in which information is given well-defined meaning, better enabling computers and people to work in cooperation” [25]. The semantic web provides an easy way to disseminate, locate, access, share, exchange, reuse, aggregate, and integrate information. It is based on standard formats such as RDF⁴, Resource Description Framework Schema (RDFS)⁵, and Web Ontology Language (OWL)⁶ to foster the interchange of data.

2.1.1 RDF

RDF [16] is a graph-based data model for describing resources and their relationships on the web. It is inspired by the linking structure of the web and uses URIs (Uniform Resource Identifiers) to name things and their mutual relationships. The data is represented as a set of triples, each consisting of a subject, a predicate, and an object in the form of <subject, predicate, object>. RDF is commonly described as a directed, labeled graph: the subject is the source of an edge, the predicate is its label, and the object is its target. The subject and predicate of a triple are always URIs, but the object can be either a URI or a literal. URIs that appear as the subject in one triple can also be the object in another triple. A literal is an atomic value such as a number, date, or string; it can be either a “plain” literal (without type) or a “typed” literal, whose type is defined using the XML datatypes⁷.

An example of an RDF graph is shown in Figure 2.1. The graph comprises four triples. One of them specifies that “Joe Smith”, who is identified by the URI http://www.example.org/~joe/contact.rdf#joesmith, is an instance of the Person class defined in the foaf⁸ ontology. The remaining triples define his given name, his family name, and his homepage. The RDF/XML serialization of the graph is presented in Listing 2.1. RDF/XML is a normative syntax for serializing RDF; however, other serialization formats such as Notation 3 (N3)⁹, TURTLE¹⁰, N-TRIPLES¹¹, and N-QUADS¹² are used as well. These formats are more compact than RDF/XML.

2.1.2 RDFS

RDF describes resources (i.e., subjects, predicates, and objects) with classes, properties, and values.

4http://www.w3.org/TR/rdf11-primer/ (accessed Nov. 01, 2015)
5http://www.w3.org/TR/rdf-schema/ (accessed Nov. 01, 2015)
6http://www.w3.org/TR/owl-primer/ (accessed Nov. 01, 2015)
7http://www.w3.org/TR/xmlschema-2/ (accessed Nov. 01, 2015)
8http://xmlns.com/foaf/0.1 (accessed Nov. 01, 2015)
9http://www.w3.org/DesignIssues/Notation3.html (accessed Nov. 01, 2015)
10http://www.w3.org/TR/turtle/ (accessed Nov. 01, 2015)
11http://www.w3.org/2001/sw/RDFCore/ntriples/ (accessed Nov. 01, 2015)
12http://sw.deri.org/2008/07/n-quads/ (accessed Nov. 01, 2015)


Figure 2.1: Example of an RDF graph describing a person

Listing 2.1: The RDF/XML serialization of an RDF graph

<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person rdf:about="http://www.example.org/~joe/contact.rdf#joesmith">
    <foaf:givenname>Joe</foaf:givenname>
    <foaf:family_name>Smith</foaf:family_name>
    <foaf:homepage rdf:resource="http://www.example.org/~joe/"/>
  </foaf:Person>
</rdf:RDF>
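As an illustration of the compactness claim, the same four triples can be written in TURTLE as follows (a sketch; the prefix declaration is ours):

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

<http://www.example.org/~joe/contact.rdf#joesmith>
    a foaf:Person ;                         # the rdf:type triple
    foaf:givenname "Joe" ;
    foaf:family_name "Smith" ;
    foaf:homepage <http://www.example.org/~joe/> .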

To define particular classes and properties for an application, we can use RDF Schema (RDFS), which is a semantic extension of the basic RDF vocabulary. RDF Schema does not provide specific classes and properties, but rather a framework to describe such resources. The classes and properties in RDF Schema are similar to the respective concepts in object-oriented programming languages [16]. A class is defined as a sub-class of another class using the rdfs:subClassOf property; a property is defined as a sub-property of another one using the rdfs:subPropertyOf property. rdfs:domain and rdfs:range are used to define the domain and range of a property, respectively. This allows us to “specify which properties apply to which kinds of objects and what values they can take, and describe the relationships between objects” [16].

For example, we can use RDFS to define the two classes Person and Dog as sub-classes of the Animal class. The Animal class has the hasChild property, and the Person class is linked to the Dog class via the hasDog property. Three instances of these classes, i.e., a father, a son, and a dog, are then instantiated, and the defined vocabulary is used to specify their relationships. The N3 serialization of all of these statements is shown in Listing 2.2.

Listing 2.2: Sample usage of RDFS vocabulary

# The empty-prefix namespace below is an example URI; the original was lost in extraction.
@prefix : <http://www.example.org/ns#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Dog rdfs:subClassOf :Animal .
:Person rdfs:subClassOf :Animal .

:hasChild rdfs:range :Animal ;
    rdfs:domain :Animal .

:hasDog rdfs:range :Dog ;
    rdfs:domain :Person .

:Max a :Dog .
:Abel a :Person .
:Adam a :Person ;
    :hasChild :Abel ;
    :hasDog :Max .
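Note that the schema declarations in Listing 2.2 are not merely documentation: under RDFS entailment, a reasoner derives additional triples from the domain, range, and sub-class statements. A sketch of what would be inferred from the instance data above:

# From ":Adam :hasDog :Max" and the domain/range of :hasDog:
:Adam a :Person .    # via rdfs:domain (also asserted explicitly above)
:Max a :Dog .        # via rdfs:range (also asserted explicitly above)
# rdfs:subClassOf propagates instance membership upwards:
:Adam a :Animal .    # since :Person rdfs:subClassOf :Animal
:Max a :Animal .     # since :Dog rdfs:subClassOf :Animal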

2.1.3 OWL

The expressivity of RDF and RDFS is very limited [16]; RDF is limited to binary ground predicates, while RDFS is limited to class and property hierarchies. OWL is “a semantic web language designed to represent rich and complex knowledge about things, groups of things, and relations between things”¹³. It is built on top of RDF and has a larger vocabulary and a stronger syntax than RDF. OWL comprises three sub-languages, i.e., OWL Lite (which offers simple constraints and a classification hierarchy), OWL DL (which offers maximum expressiveness while retaining computational completeness), and OWL Full (which offers maximum expressiveness with free RDF syntax but no computational guarantees). OWL documents are called OWL ontologies; every legal OWL Lite ontology is a legal OWL DL ontology, and every legal OWL DL ontology is a legal OWL Full ontology [16]. OWL 2¹⁴ is the latest version of OWL. It has a similar structure to OWL 1 and adds a number of new features such as (i) richer datatypes and data ranges, (ii) property chains, and (iii) asymmetric, reflexive, and disjoint properties.

Classes in OWL are defined using an owl:Class element; owl:Class is a sub-class of rdfs:Class. There are two predefined classes: (i) owl:Thing, which is the most general class and contains everything, and (ii) owl:Nothing, which is the empty class. Every class is a super-class of owl:Nothing and a sub-class of owl:Thing. Equivalence of classes is defined through the use of the owl:equivalentClass element.

13http://www.w3.org/TR/owl-primer/#Introduction (accessed Nov. 01, 2015)
14https://www.w3.org/TR/owl2-overview/ (accessed Nov. 01, 2015)

Listing 2.3: Sample SPARQL query

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?homepage ?age
WHERE {
  ?x foaf:givenname ?name .
  OPTIONAL { ?x foaf:homepage ?homepage . }
  ?x foaf:age ?age .
  FILTER (?age < 30)
}

It is possible to define a new class as a boolean combination (i.e., union, intersection, complement) of other classes by using the owl:unionOf, owl:intersectionOf, and owl:complementOf elements.

There are two types of properties in OWL: (i) object properties, which relate objects to other objects, and (ii) datatype properties, which relate objects to datatype values. The owl:inverseOf element is used to specify the relationship between a property and its inverse. Equivalence of properties can be defined by an owl:equivalentProperty element. owl:TransitiveProperty and owl:SymmetricProperty are used to specify that an OWL property is transitive or symmetric, respectively; owl:FunctionalProperty can be used to define a property that has at most one value for each object.
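To make these elements concrete, the following Turtle sketch applies them to the family vocabulary of Listing 2.2; the added property names are ours, chosen purely for illustration:

@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix : <http://www.example.org/ns#> .

:hasParent owl:inverseOf :hasChild .         # x :hasChild y entails y :hasParent x
:hasAncestor a owl:TransitiveProperty .      # x :hasAncestor y and y :hasAncestor z entail x :hasAncestor z
:isMarriedTo a owl:SymmetricProperty .       # x :isMarriedTo y entails y :isMarriedTo x
:hasBirthMother a owl:FunctionalProperty .   # every subject has at most one value
:ownsDog owl:equivalentProperty :hasDog .    # hypothetical equivalence of two properties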

2.1.4 SPARQL

SPARQL [15] can be used to retrieve and manipulate data stored in RDF format. It is a semantic query language and a protocol that exploits the triple-based structure of RDF to perform graph pattern matching and RDF triple assertion. There are other RDF query languages, such as RDQL¹⁵ (RDF Data Query Language) and SeRQL¹⁶ (Sesame RDF Query Language), but SPARQL is the World Wide Web Consortium (W3C) Recommendation. SPARQL supports four different query forms: (i) SELECT, which “returns all, or a subset of, the variables bound in a query pattern match”, (ii) CONSTRUCT, which “returns an RDF graph constructed by substituting variables in a set of triple templates”, (iii) ASK, which “returns a boolean indicating whether a query pattern matches or not”, and (iv) DESCRIBE, which “returns an RDF graph that describes the resources found”.¹⁷ The TURTLE syntax is used to express the RDF graphs in the matching part of a SPARQL query.

Listing 2.3 illustrates an example of a SELECT query. The first line defines a namespace prefix so that the WHERE clause can use it to express the RDF graph to be matched. The WHERE clause can include OPTIONAL triples, which are evaluated when they are present but do not make the matching fail when they are not.

15http://www.w3.org/Submission/RDQL/ (accessed Nov. 01, 2015)
16https://www.w3.org/2001/sw/wiki/SeRQL (accessed Nov. 01, 2015)
17http://www.w3.org/TR/rdf-sparql-query/ (accessed Nov. 01, 2015)

The second line is the SELECT clause, where three identifiers are specified; each begins with a question mark. The query as a whole retrieves the name, homepage, and age of all persons in a data source. Constraints can be placed on values using a FILTER clause. FILTER (?age < 30) is an example of a numeric value restriction, specifying that the age of the persons concerned must be less than 30. As an example of a string value restriction, FILTER regex(?name, “Peter”) can be used to return persons whose name contains the string “Peter”. The query results can be controlled using the following keywords, with similar meaning to SQL [15]: (i) ORDER BY, which establishes the order of the results, (ii) DISTINCT, which eliminates duplicate results, (iii) OFFSET, which makes the results start after a specified number of solutions, and (iv) LIMIT, which puts an upper bound on the number of results returned.
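Combined, these result modifiers could be used as follows; a sketch that varies Listing 2.3 (the foaf prefix is as before):

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?name ?age
WHERE {
  ?x foaf:givenname ?name .
  ?x foaf:age ?age .
  FILTER (?age < 30)
}
ORDER BY DESC(?age)   # oldest of the matching persons first
OFFSET 10             # skip the first ten solutions
LIMIT 10              # return at most ten solutions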

2.2 Semantic Data Integration

In this section, we present a semantic data integration model and an example that illustrates the potential of the semantic web for data integration.

2.2.1 Semantic Data Integration Model

Semantic web technologies are useful for integrating heterogeneous data sources; a generic data integration model [26] is illustrated in Figure 2.2. It consists of three main steps: (i) Data in various formats (i.e., XML, JSON, or CSV) is converted into RDF. (ii) Based on the triple graph model, a SPARQL engine is used to collect, sort, consolidate, and aggregate the data and finally convert it to XML format. (iii) An Extensible Stylesheet Language Transformations (XSLT)¹⁸ engine converts the XML data to HTML and generates the final report or presentation of the integrated data; a sketch of such a stylesheet is shown below.

In this model, non-RDF datasets are converted into RDF because, due to the triple graph model and the SPARQL language, it is easy to add new nodes to RDF graphs, link the nodes of different graphs to each other, and execute queries over multiple graphs. Disparate sets of RDF data are thus much easier to combine than disparate sets of data in other common formats. Moreover, as more and more datasets become publicly available in the Linked Open Data cloud (cf. Section 2.3), we can use this external data to enhance our locally stored data and make it richer and more interesting.

A main drawback of the model is that it requires the data integrator to have technical and programming skills. Even though many semi-automatic and automatic RDF and SPARQL tools for data integration have been developed, she should still be familiar with the RDF and SPARQL languages. Moreover, despite the simplicity of the model, implementing it in a particular use case is not straightforward and takes a lot of time and effort. This approach is hence not applicable for non-expert users that lack technical expertise. To address this issue, as presented in Chapter 3, we modularize and encapsulate the tasks of collecting, processing, integrating, and visualizing data into high-level user interface blocks, i.e., the so-called Linked Widgets. By simply connecting these blocks, end users can create their applications without an understanding of the intricate details of the queries and transformations executed behind the scenes.

18http://www.w3.org/TR/xslt (accessed Nov. 01, 2015)
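As an illustration of step (iii), a minimal XSLT stylesheet along the following lines could render SPARQL SELECT results (returned in the W3C SPARQL Query Results XML Format, whose namespace is bound to the sr prefix here) as an HTML table. This is our own sketch, not a stylesheet prescribed by the model:

<?xml version="1.0"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:sr="http://www.w3.org/2005/sparql-results#">
  <!-- Turn every query solution into one table row. -->
  <xsl:template match="/">
    <html>
      <body>
        <table>
          <xsl:for-each select="//sr:result">
            <tr>
              <!-- One cell per variable binding in the solution. -->
              <xsl:for-each select="sr:binding">
                <td><xsl:value-of select="."/></td>
              </xsl:for-each>
            </tr>
          </xsl:for-each>
        </table>
      </body>
    </html>
  </xsl:template>
</xsl:stylesheet>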

12 { ... } ; ; ; ;

Spreadsheet XML JSON CSV

RDF Converter

RDF

SPARQL Engine

XML

XSLT Engine

HTML Report

Figure 2.2: Generic semantic data integration model

ID                  Author  Title             Publisher  Year
ISBN 0-00-651409-X  id_a1   The Glass Palace  id_p1      2000

Table 2.1: Book table from dataset A

ID     Name          Home page
id_a1  Amitav Ghosh  http://www.amitavghosh.com/

Table 2.2: Author table from dataset A

ID     Publisher Name  City
id_p1  Harper Collins  London

Table 2.3: Publisher table from dataset A

2.2.2 Semantic Data Integration Example

To illustrate the data integration model and the usefulness of the RDF triple graph, consider the following bookstore example, which is adapted from a presentation¹⁹ by Ivan Herman. In the example, there are two originally tabular datasets, A and B. Dataset A is stored in a MySQL database consisting of three tables, i.e., a book table (cf. Table 2.1), an author table (cf. Table 2.2), and a publisher table (cf. Table 2.3). This dataset is converted to the set of RDF triples shown in Figure 2.3.

Another bookstore dataset, B, is stored in an Excel spreadsheet. It consists of a detailed book table and an author table (cf. Tables 2.4 and 2.5, respectively). Dataset B does not contain publisher information; it describes the book that is translated from the original book in dataset A. Dataset B is converted to the triple graph shown in Figure 2.4.

Ideally, the converted graphs of A and B should use the same vocabularies that are commonly used by the semantic web community; to this end, the Linked Open Vocabularies web site²⁰ lists and visualizes the popularity of more than 500 vocabularies. In our example, however, the two graphs use their own vocabularies. In spite of such inconsistency, it is still easy to integrate the two datasets by specifying owl:sameAs relationships between identical resources (i.e., properties and instances).

19http://www.w3.org/People/Ivan/CorePresentations/IntroThroughExample/ (accessed Nov. 01, 2015)
20http://lov.okfn.org/dataset/lov/ (accessed Nov. 01, 2015)

ID               Titre¹                  Auteur²  Traducteur³  Original
ISBN 2020386682  Le Palais des miroirs⁴  i_abc    i_rst        ISBN 0-00-651409-X

¹Title  ²Author  ³Translator  ⁴The Glass Palace

Table 2.4: Book table from dataset B

ID     Name
i_abc  Amitav Ghosh
i_rst  Christiane Besse

Table 2.5: Author table from dataset B

The identical nodes (which represent identical instances) in the two graphs are merged into a single node; the merged node inherits all links of the two nodes. Figure 2.5 shows a triple graph which is the integration of datasets A and B and DBpedia. To construct the graph, the original book is first linked to the translated one based on their International Standard Book Number (ISBN). The relations between the subjects and objects of our triples “act as glue” that combines separate data sources. The owl:sameAs property can be used to specify that the a:author property is the same as the b:auteur property. Finally, extra information on the authors is collected from the DBpedia SPARQL endpoint²¹ and added to the integrated graph.

After the two datasets are integrated, users of dataset B can pose queries such as “What is the title of the original book?” or “What is the home page of the original author?”. Such information is not available in dataset B, but can be retrieved automatically once it is merged with dataset A. The queries are performed on the integrated graph using the SPARQL language.

To conclude this section, different datasets that (i) are disseminated on the web, (ii) are in different formats (database dump, spreadsheet, etc.), and (iii) use different vocabularies can be combined. Identical URIs (the ISBN in our case) are used to combine the data. By transforming data into RDF graphs and linking them with Linked Open Data (LOD) datasets such as DBpedia, Geonames²², Freebase²³, etc., completely new ways to execute queries over a number of data repositories become possible; the sketch below recaps the linking step in Turtle and SPARQL.
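In the following sketch, the property names (a:author, a:homepage, b:original, b:auteur) are those appearing in Figures 2.3–2.5, while the namespace URIs behind the a: and b: prefixes and the host in the book URIs are ours and purely illustrative (the original hosts are elided in the figures):

@prefix a:   <http://example.org/vocabA#> .
@prefix b:   <http://example.org/vocabB#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .

# The ISBN-based URIs glue the graphs: the translation points to the original book.
<http://example.org/isbn/2020386682> b:original <http://example.org/isbn/000651409X> .
# The two author properties are declared identical.
a:author owl:sameAs b:auteur .

A question asked from the perspective of dataset B, such as “What is the home page of the original author?”, then becomes a single query over the merged graph (assuming the owl:sameAs statement has been materialized, so that a:author links are available on both books):

PREFIX a: <http://example.org/vocabA#>
PREFIX b: <http://example.org/vocabB#>
SELECT ?homepage
WHERE {
  <http://example.org/isbn/2020386682> b:original ?book .
  ?book a:author ?author .
  ?author a:homepage ?homepage .
}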

21http://dbpedia.org/sparql (accessed Nov. 01, 2015)
22http://www.geonames.org/ (accessed Nov. 01, 2015)
23http://www.freebase.com/ (accessed Nov. 01, 2015)

Figure 2.3: Converted triple graph of dataset A

Figure 2.4: Converted triple graph of dataset B

Figure 2.5: Integrated graph of datasets A and B

2.3 Open Data and Linked Data

In recent years, organizations and governments have made large volumes of Open Data available on the web. This allows interested stakeholders to analyze the data, put it in a new context, gain insights, and create innovative services. Publishers frequently release the data under a license that allows anyone to use, reuse, and redistribute it. The inventor of the web and also the initiator of Linked Data, Tim Berners-Lee, suggested a 5-star deployment scheme for Open Data:²⁴

1. The first level (which is rated one star) is to make data available on the web under an open license.
2. The second level is to make the data available as structured data.
3. The third level is to use non-proprietary formats (e.g., CSV instead of Excel).
4. The fourth level is to make use of URIs to denote things, so that people or applications can dereference them and get more information.
5. The fifth level is to link the data to other data sources to provide context.

Linked Data is a set of guidelines for publishing structured data. The metadata is connected and enriched so that applications can find different representations of the same content stored in multiple sources and, finally, establish links between related resources. It is fundamentally about helping the web to transition from a web of documents to a data web. An increasing number of data publishers (e.g., The New York Times²⁵, the BBC²⁶, Thomson Reuters²⁷, the Library of Congress²⁸, BestBuy²⁹, Getty³⁰, the UK government³¹, the US government³²) have adopted Linked Data practices and published a large number of datasets.

According to the formal definition of Linked Data [17], the term “describes a method of publishing structured data so that it can be interlinked and become more useful. It builds upon standard Web technologies such as Hypertext Transfer Protocol (HTTP) and URIs, but rather than using them to serve web pages for human readers, it extends them to share information in a way that can be read automatically by computers. This enables data from different sources to be connected and queried.”

The publication of Linked Data on the web is based on the following four principles [17]; an example URI lookup illustrating them is sketched at the end of this section:

1. “Use URIs as names for things.
2. Use HTTP URIs so that people can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF, SPARQL).

24http://5stardata.info/ (accessed Nov. 01, 2015)
25http://data.nytimes.com/ (accessed Nov. 01, 2015)
26http://www.bbc.co.uk/things/ (accessed Nov. 01, 2015)
27https://permid.org/ (accessed Nov. 01, 2015)
28http://id.loc.gov/ (accessed Nov. 01, 2015)
29https://developer.bestbuy.com/ (accessed Nov. 01, 2015)
30http://www.getty.edu/research/tools/vocabularies/lod/ (accessed Nov. 01, 2015)
31https://data.gov.uk/ (accessed Nov. 01, 2015)
32https://usopendata.org/ (accessed Nov. 01, 2015)

4. Include links to other URIs, so that they can discover more things.”

Due to these simple principles and basic technologies, Linked Data has been adopted and applied in the LOD project – a community project supported by the W3C – to facilitate the publication of open datasets as RDF. As a result of ongoing efforts, the “Web of Linked Data” today comprises billions of RDF triples and millions of RDF links between datasets. More specifically, at the very beginning there were only twelve published datasets, but the so-called LOD cloud grew rapidly to 928 datasets with 62 billion triples by 2014³³. The current statistics³⁴ are illustrated in the well-known Linked Open Data cloud diagram shown in Figure 2.6. The diagram is generated automatically using the CKAN Application Programming Interface (API)³⁵. The size of a circle corresponds to the number of triples in the respective dataset. If at least 50 links exist between two datasets, there is an arrow between them; its thickness corresponds to the total number of links.

The datasets published in the LOD cloud now include data on a wide variety of topics (e.g., music, movies, radio and television programs, books, scientific publications, reviews, proteins, diseases, genes, medicine and clinical trials, people, geographic locations, statistical and census data, companies, and many more). All of these datasets are publicly and openly available on the web in standard and interoperable formats. This opens up novel opportunities for the next generation of web-based applications; we can aggregate data from different providers as well as integrate fragmentary information from separate sources and hence achieve a complementary and more complete view.
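As an illustration of principles 2–4, any Linked Data URI can be looked up with plain HTTP content negotiation. The exchange below is a sketch using a DBpedia resource URI; the redirect target shown is illustrative and the exact URL may differ:

GET /resource/Vienna HTTP/1.1
Host: dbpedia.org
Accept: text/turtle

HTTP/1.1 303 See Other
Location: http://dbpedia.org/data/Vienna.ttl

Following the redirect returns an RDF description of the resource, including links to further URIs that a client can dereference in turn.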

2.4 Mashups

The term “mashup” originally comes from the music industry [27], where it denotes a new song made up from the vocal and instrumental tracks of existing songs. Analogously, a mashup (application) is a lightweight web-based application that combines data from multiple sources into a single integrated graphical interface [28]. Mashups are typically based on the web architecture (i.e., HTTP, Representational State Transfer (REST)) and web 2.0 technologies (e.g., AJAX, Rich Site Summary (RSS), JSON), which allow for the fast creation of rich web applications.

There are many other, extended definitions of mashups. Daniel and Matera [29] state that mashups need not necessarily live in the web environment and define a mashup as “a composite application developed starting from reusable data, application logic, and/or user interfaces typically, but not mandatorily sourced from the Web”. Yee [30], on the other hand, focuses on data integration as the key value of mashups and defines mashups as web sites or web applications that “seamlessly combine content from more than one source into an integrated experience”. Situational applications, rapid development, and the focus on end users are considered central concepts of mashups as well [31].

33http://stats.lod2.eu (accessed Nov. 01, 2015)
34http://lod-cloud.net/ (accessed Nov. 01, 2015)
35http://docs.ckan.org/en/latest/api/ (accessed Nov. 01, 2015)


36http://www.programmableweb.com/category/customer-service (accessed Nov. 01, 2015)
37http://www.programmableweb.com/category/stocks/mashup (accessed Nov. 01, 2015)
38http://www.programmableweb.com/category/news-services (accessed Nov. 01, 2015)
39https://foursquare.com/ (accessed Nov. 01, 2015)
40https://twitter.com (accessed Nov. 01, 2015)
41http://www.apatar.com/ (accessed Nov. 01, 2015)
42http://www.fluidops.com/en/portfolio/information_workbench/ (accessed Nov. 01, 2015)
43http://www.husky.fer.hr/ (accessed Nov. 01, 2015)
44http://mdc.jackbe.com/enterprise-mashup (accessed Nov. 01, 2015)

mashups are designed to reduce the cost and development time of web applications. By leveraging existing APIs, web services, and visualization technologies, the development time of mashups can be measured in hours or days, rather than in weeks or months [47]. Programmable Web45 is a repository that provides the latest information about internet-based APIs and mashups. Figure 2.7 shows the growth of the top ten web API categories in Programmable Web since 2009. Mapping is now the most popular category with more than four thousand available APIs. Mashups can add new value to available data and APIs in an ad-hoc manner; however, the following issues remain challenging: (i) First, mashups rely heavily on one or multiple third parties; if the API providers stop their services, the mashup becomes immediately unavailable as well. (ii) To use a company’s web service, developers always have to check the “terms of service”. Many providers charge usage fees based on the number of calls made to their servers; most of them require developers to ask for permission before using the APIs for commercial purposes. (iii) Because mashups can add new value to data, new issues such as “who owns the integrated data” and “who can profit from the mashup” arise.

2.4.1 Mashup Types

Daniel and Matera [29] propose a cube-based mashup classification as illustrated in Figure 2.8. The three perspectives are composition, domain, and environment.
1. The composition perspective classifies mashups based on the layer where the mashup integration is performed; in the typical three-tier architecture, applications are divided and developed in three separate layers, i.e., a data layer, a logic layer, and a presentation layer [48]. This development model makes it easier to implement and maintain different parts of the applications, as these parts are independent yet interoperable. Possible types of mashups in this perspective are data mashups, logic mashups, User Interface (UI) mashups, and hybrid mashups that span multiple layers of the application stack.
2. The domain perspective classifies mashups based on the functionality that the mashups provide, or in other words, the purpose of the mashups. Categorized mashup domains include mapping, social, tools, mobile, photos, financial, etc.46
3. The environment perspective classifies mashups based on their deployment context. Web mashups and enterprise mashups are the two types of mashups in this perspective. Web mashups (also known as consumer mashups) are intended for internet users. They focus on functionality rather than on the non-functional requirements (e.g., security, compliance with laws and regulations) imposed in the enterprise environment.

45http://www.programmableweb.com/ (accessed Nov. 01, 2015)
46http://www.programmableweb.com/category (accessed Nov. 01, 2015)

Figure 2.7: Growth of the top ten web API categories in Programmable Web since 2009 (bar chart with values for 2009, 2013, and 2015 across the categories Social, Enterprise, E-commerce, Science, Payment, Financial, Mapping, Government, Messaging, and Telephony; source: http://www.programmableweb.com/api-research, accessed Nov. 01, 2015)

Figure 2.8: Mashup classification dimensions [29] (a cube with the axes Composition: Data/Logic/UI; Environment: Web/Enterprise; Domain: Social/Mapping/Mobile/...)

2.4.2 Mashup Components

A mashup component is the atomic part that mashups are composed of. Daniel and Matera [29] define a mashup component as “any piece of data, application logic, and/or user interface that can be reused and that is accessible either locally or remotely”. Components can be built on top of very different technologies, from Simple Object Access Protocol (SOAP) web services and RESTful services to web APIs, and from RSS feeds to UI widgets. Mashup components are very heterogeneous with regard to their behavior within a mashup (e.g., sequential, asynchronous, or synchronous interactions) as well as the access methods through which they are published (e.g., protocols). In order to lower the barriers for general users by automating mashup development, we need an abstract model of mashup components that can address such heterogeneity. Mashup components can be created by wrapping existing resources. A wrapper can be simple and easy to implement if the resource provider already provides all necessary functionality; it can be more complex if the resource provider exposes data only and developers themselves have to process and visualize the data. Components can be classified into three categories that cover the three layers of an application: (i) data components, which provide access to data; (ii) logic components, which provide access to business logic or functionality; and (iii) user interface components, which present and visualize the data.
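To make the notion of such an abstract model more tangible, the following is a minimal sketch of a possible component descriptor; the field names and the example components are illustrative assumptions made for this sketch, not a model defined in this thesis.

```javascript
// Illustrative descriptor for a mashup component; all field names are
// assumptions made for this sketch.
function MashupComponent(spec) {
  this.name = spec.name;             // human-readable identifier
  this.category = spec.category;     // "data" | "logic" | "ui"
  this.access = spec.access;         // e.g., "REST", "SOAP", "RSS", "widget"
  this.inputs = spec.inputs || [];   // input terminals (data components: none)
  this.output = spec.output || null; // output terminal, if any
}

// A data component wrapping an RSS feed, and a UI component consuming it.
var feed = new MashupComponent({
  name: "NewsFeed", category: "data", access: "RSS",
  output: { type: "application/rss+xml" }
});
var headlines = new MashupComponent({
  name: "HeadlineList", category: "ui", access: "widget",
  inputs: [{ type: "application/rss+xml" }]
});
```

A descriptor of this kind captures exactly the heterogeneity dimensions discussed above: what a component does (category), how it is reached (access method), and how it can be wired to others (terminals).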

2.4.3 Web Widgets

A web widget is a piece of dynamic content that can be easily embedded into a web page. Web widgets can encapsulate one or more APIs. An example of a web widget is shown in Figure 2.9.

Figure 2.9: Example of the AccuWeather web widget

Such web widgets can be placed in a blog to provide readers with packaged and intuitive information on common topics (e.g., weather information, blog statistics, recent news, or stock quotes) in a single interface. Different vendors use different terms for widgets: badges, gadgets, flakes, snippets, minis, blocks, modules, and capsules [49]. Widgets can be as simple as a single HTML fragment; they can be more complex if they are written in a server-side programming language (e.g., Java, .NET, PHP). Some of them are “mashable” widgets, which can pass and receive events, so that they can be combined to compose a mashup. Because they run in the web browser environment, web widgets are independent of the operating system. There are many online widget catalogs, such as Deitel47 and Widgipedia48, which provide extensive collections of widgets and gadgets for a variety of platforms. W3C defines web widgets as “end-user’s conceptualization of an interactive single purpose application for displaying and/or updating local data or data on the web, packaged in a way to allow a single download and installation on a user’s machine or mobile

47http://www.deitel.com/ResourceCenters/Web20/Widgets/tabid/1993/Default.aspx (accessed Nov. 01, 2015)
48http://www.widgipedia.com/ (accessed Nov. 01, 2015)

device. A widget may run as a standalone application (meaning it can run outside of a web browser), or may be embedded into a web document”49. W3C also standardizes the packaging format and configuration for web widgets.50 Web widgets are different from desktop or mobile widgets: the latter run as native applications, while the former must be rendered within a web browser. Desktop and mobile widgets can not only present useful information at a glance but also allow for quick access to frequently used tools or applications. For instance, we can use desktop widgets to automatically display a slide show of the pictures on our computer, or to check continuously updated headlines (cf. Figure 2.10).

Figure 2.10: Example desktop widget (or “gadget”) from Windows 7

Web widgets are used as mashup components in many frameworks (cf. Section 6.1). They typically respond to or act upon data events and/or user interactions. They foster reusability because a single widget can be reused in multiple mashups; each mashup is one of numerous possible combinations of widgets.
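As an illustration of how “mashable” widgets might pass and receive events, the following is a minimal publish/subscribe sketch connecting two widgets; the bus object and topic names are assumptions made for this example, not a standard widget API.

```javascript
// Minimal event bus through which "mashable" widgets can exchange events;
// the API shown here is an illustrative assumption.
var bus = {
  handlers: {},
  subscribe: function (topic, fn) {
    (this.handlers[topic] = this.handlers[topic] || []).push(fn);
  },
  publish: function (topic, payload) {
    (this.handlers[topic] || []).forEach(function (fn) { fn(payload); });
  }
};

// A map widget reacts whenever another widget publishes a selected location.
bus.subscribe("location-selected", function (loc) {
  console.log("Map widget centers on", loc.lat, loc.long);
});
bus.publish("location-selected", { lat: 48.199, long: 16.37 });
```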

2.4.4 Semantic Mashups

It is not easy to build mashups without any supporting tool or service, because developers need to locate relevant data sources and APIs, understand their structure as well as their semantics, and finally combine all components into a single interface. Semantic mashups can ease this effort and foster API discovery and integration. A semantic mashup is “a data mashup using RDF(S) as data model and SPARQL services to implement the behavior” [50]. It is the combination of two visions of the future web, i.e., the semantic web and web 2.0 [51]. On the one hand, the vision of the semantic web is to give information well-defined meaning and “better enable computers and people to work in cooperation” [25]. As illustrated in Section 2.2, if data is represented in a semantic format, we can use SPARQL to query over the triple graph models and easily integrate

49http://dev.w3.org/2006/waf/widgets-land/ (accessed Nov. 01, 2015) 50http://www.w3.org/TR/widgets/ (accessed Nov. 01, 2015)

multiple data sources [52, 53]. On the other hand, the vision of web 2.0 is to turn users from “content consumers” into “content providers”, which can be partly fulfilled via mashup concepts and technologies [54, 55]. Research that focuses on semantic mashups will be presented in Section 6.2 as related work.
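To illustrate the data-access step of such a semantic mashup, the sketch below queries a public SPARQL endpoint from the browser; the DBpedia endpoint and the query are examples chosen for this sketch only.

```javascript
// Sketch: collecting RDF data for a semantic mashup via SPARQL.
// The endpoint and query are illustrative examples.
var endpoint = "https://dbpedia.org/sparql";
var query =
  "PREFIX dbo: <http://dbpedia.org/ontology/> " +
  "SELECT ?city ?population WHERE { " +
  "  ?city a dbo:City ; dbo:populationTotal ?population . " +
  "} LIMIT 10";

fetch(endpoint + "?query=" + encodeURIComponent(query), {
  headers: { "Accept": "application/sparql-results+json" }
}).then(function (response) {
  return response.json();
}).then(function (json) {
  // Each binding is one result row; a mashup component could now merge
  // these rows with data from other sources.
  json.results.bindings.forEach(function (b) {
    console.log(b.city.value, b.population.value);
  });
});
```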

2.4.5 End-User Mashup Development

In the traditional software development process [56], we typically start by specifying user requirements. Based on the requirements, we design and implement the overall system, and finally we receive the users’ feedback to modify, maintain, and upgrade the software. These steps can be performed differently in a variety of software models (e.g., the Waterfall, Incremental, Spiral, or Agile model). It is not easy to develop software that satisfies all user requirements. First, we can hardly identify user requirements precisely, as they are diverse and complex; many users work on tasks that rapidly change on a monthly, weekly, or daily basis. Moreover, it is difficult to meet all requirements when developers have limited domain knowledge [57]. To address these issues, the end-user approach to developing application software aims to emancipate end users from their dependency on custom applications tailored to specific use cases or domains. The approach is based on user-driven innovation, where a service provider equips users with innovation toolkits so that users themselves can build a small software product for their personal and situational purposes. Making use of these toolkits, users are capable of carrying out the entire development process to realize their own ideas with a limited up-front learning time investment [58]. They are also able to continuously adapt and upgrade the systems to their needs. Lieberman et al. [57] define End-User Development (EUD) as a set of “methods, techniques, and tools to allow users of software systems, who are acting as non-professional software developers, at some point to create, modify or extend a software artifact”. In particular, based on traditional Human-Computer Interaction principles [59], EUD focuses on the capability of systems to empower users to create their own applications, or at least to allow them to design and customize the functionality as well as the user interface of software. This is necessary because end users themselves know their context and requirements better than anyone else. An EUD system should be flexible enough to support users in performing the tasks required to develop their own software. It should be easy to understand, to use, to learn, and to teach. By providing simple controls for people without formal programming training, EUD shows considerable potential to make computers more useful in various contexts. It can effectively reduce the time to complete a project [57]. EUD software systems are becoming more popular, powerful, and easier to use. The most widely adopted EUD tool is the spreadsheet, which can efficiently manipulate tabular data. Spreadsheets are most commonly used in a business context; many companies require their employees to have good spreadsheet skills. End-User Mashup Development is a branch of End-User Development; its flexibility and task variability are promising for users to create innovative solutions. From the EUD perspective, a mashup appears as an “end user driven recombination of web-based

data and functionality” [60]. Due to the growing number of publicly available services and APIs on the web, users who want to quickly develop new software can locate relevant high-level services, “glue” them together, and create web user interfaces rather than start from scratch as in the traditional programming model. The former method involves fewer technical activities than the latter and can be designed to fit the restricted capabilities of end users. Furthermore, because mashups are based on web technologies, they facilitate sharing. Cappiello et al. [61] identify two scenarios in which end users are involved in developing mashups:
1. To quickly deliver ad-hoc applications, mashup templates (i.e., frequently requested applications) are first created by expert developers. This means end users are not directly involved in the mashup development process; however, they can modify mashup parameters or adapt the mashup to their context. This approach is not flexible; there is not much room for users’ creativity.
2. Expert developers implement and deploy a framework that allows anybody to create their own mashups. In this scenario, end users are directly involved in the creation of their mashups. Exemplary approaches are spreadsheets, programming by demonstration, and visual programming.

2.5 State of the Art in Mashup-based Data Integration

To facilitate data integration, researchers have been developing mashup-based tools and frameworks for years. Many of them are geared towards end users and allow them to efficiently create applications by connecting simple and lightweight components. This section presents various mashup surveys from different perspectives.

Fischer et al. [62] introduce three mashup development approaches: manual, semi-automatic, and automatic mashup creation. They categorize mashup frameworks as follows:
1. programming paradigm (e.g., IBM WebSphere sMash51 and BungeeConnect52),
2. scripting languages (e.g., Web Mashup Scripting Language (WMSL) [63], Dynamic Fusion of Web Data [64]),
3. spreadsheets (e.g., Extensio Excel Extender53),
4. wiring paradigm (e.g., Yahoo! Pipes [33], Marmite [38]),
5. programming by demonstration (e.g., Karma [65] and the Potluck mashup tool [66]),
6. and automatic creation of mashups (e.g., Composite Application Framework [67] and Service Composition Framework [68]).
The authors identify three user types of such tools: casual users, power users, and developers. A casual user has basic skills; she is able to surf the web and uses only the functionality of the browser. A power user cannot program, but has detailed

51http://www.ibm.com/developerworks/ibm/zero/ (accessed Nov. 01, 2015)
52http://www.bungeelabs.com/ (accessed Nov. 01, 2015)
53http://www.extensio.com/index.html (accessed Nov. 01, 2015)

knowledge of a particular tool or a set of tools. By contrast, a developer is skilled at programming and can work directly with web technologies and web APIs. The authors give an overview of the introduced tools in the six categories and identify their limitations. The “automatic creation of mashups” approach is ideal for non-programmers; however, its problem lies in the high risk of composing mashups that are irrelevant to the given requirements. By contrast, the other approaches create a barrier for casual and power users, because these users do not possess the skills required by the respective tools to create mashups.

Grammel and Storey [60] analyze six mashup tools (i.e., Microsoft Popfly [34], Yahoo! Pipes [33], IBM Mashup Center54, Google Mashup Editor [35], Serena Mashup Composer55, Intel Mash Maker [44]) from the end-user development perspective. The features of the tools are compared across six criteria, i.e., levels of abstraction, learning support, community support, searchability, UI design, and software engineering techniques. The authors define three levels of abstraction – high, intermediate, and low – to measure how much computer programming knowledge is required. On the high level, it is easy for users to learn and use the tool, but they are restricted in parameterizing and reusing mashups. By contrast, the low level offers users great flexibility. The authors hence suggest combining different levels of abstraction in a tool, because this allows users to select the level relevant to their skills and goals; it also fosters reusability of both low-level and high-level software components. Learning support helps users to get used to the mashup environment of the tool. Common means are a help system, a tutorial, or a video screen capture. The authors discuss the possibility of using context-specific suggestions; for example, a tool can recommend suitable mashup components to users based on their current mashup structure. The authors note that community features are essential to the success of an end-user-development mashup tool. These features include capabilities to share mashups and to collaboratively categorize them (e.g., tagging and rating). A discussion forum and a social network system are proposed as venues for users to exchange their experience and their created mashups. Text-based search is a common method to find and locate mashups or mashup elements. Users can sort the results based on the rating of these artifacts. The authors suggest using the structural information of a mashup; it helps users to easily find example mashups that use one or several specified mashup components. UI design is the feature that supports users in creating appropriate UIs for their mashups; it determines the usability of the mashups. The authors outline three mechanisms, i.e., automatically generating the UI, selecting and customizing the UI, and UI composition. Finally, software engineering techniques comprise debugging, version control, and testing. Even though support for these techniques is very important in EUD research [69], it is still limited. Mashups can introduce new security threats, especially in the enterprise environment. Debugging support can help users to avoid incorrect

54www.ibm.com/software/info/mashup-center/ (accessed Nov. 01, 2015)
55http://www.serena.com/index.php/en/ (accessed Nov. 01, 2015)

and unsafe mashups.

Hendrik et al. [70] survey semantic mashup frameworks from the perspective of knowledge workers, who usually carry out data analyses but are not familiar with the technical details of data integration and big data. They consider semantic mashups as an extension of traditional mashup approaches, where semantic web technologies facilitate building mashups for novice users. Similar applications of the semantic web can be seen in the semantic service approach, which can automate service discovery and service composition. Semantic web services target IT experts. By contrast, semantic mashups are intended for non-expert users, who are incapable of working with raw data and services, but still want to create situational applications on top of existing resources. The authors define three requirements that a mashup tool should fulfill: an ideal mashup tool should be general, powerful, and yet simple; it should cover various application domains and be able to tackle the complex logic of arbitrary problems, but should still be easy for novice users. Based on these requirements, they classify Black Swan Events [71] and Super Stream Collider [39] as data analytics tools, and DERI Pipes [40], MashQL [41], and Information Workbench56 as generic tools. Whereas data analytics tools focus on specific domains and make use of their own data processing patterns, generic tools offer generic functions that can be used to solve various problems.

Aghaee and Pautasso [72] provide a detailed overview of mashup approaches. Similar to other surveys, the authors classify existing mashup tools based on the end-user programming technique they use: spreadsheets (e.g., Mashroom [43], Husky57), programming by demonstration (Intel Mash Maker [44], Vegemite [45]), domain-specific languages (e.g., Swashup [73]), visual programming (e.g., Yahoo! Pipes [33], Presto58), and model-based automation [67, 74]. To enable end users to compose and develop mashups in a lightweight manner, simplicity and usability take priority over quality and completeness. However, developing mashups still requires significant technical skills. The authors discuss open research challenges for end-user mashup development as follows:
1. Simplicity and expressive power tradeoff. To be easy to use, most current mashup tools focus on simplicity. The tools hence are not powerful enough to create complex mashups that involve various data sources in a complicated workflow.
2. Mashup component heterogeneity. The challenge is to facilitate mashup composition based on an abstraction of heterogeneous mashup components such as web APIs, web data sources, and web widgets.
3. Mashup composition techniques. The challenge is to fully support three levels of mashup development, namely process integration, data integration, and user

56http://www.fluidops.com/en/portfolio/information_workbench/ (accessed Nov. 01, 2015)
57http://www.husky.fer.hr/ (accessed Nov. 01, 2015)
58http://mdc.jackbe.com/enterprise-mashup (accessed Nov. 01, 2015)

interface integration [47].
4. Mashup evolution. The challenge is to support users in maintaining a mashup if either the mashup components change or the mashup needs to be re-engineered due to updated user requirements.
5. Online communities. Online communities and social networks can be used to foster the sharing of mashups as well as the discussion of technical problems. Moreover, the communities should be able to develop their mashups in a collaborative process.

Mashup components are heterogeneous and need to be well modeled to lower the barriers of mashup development for end users. Aghaee and Pautasso [75] propose a framework to evaluate mashup platforms with regard to their support for heterogeneous mashup components. The framework consists of six dimensions:
1. Discovery. Initially, users search for relevant components that they may need before starting to build a mashup. A mashup tool should model the mashup components in a manner that facilitates this discovery task; it can choose among three approaches, i.e., semantic discovery, syntactic discovery, or keyword discovery.
2. Input/output data type. Components in a mashup interact in a workflow; they pass output to and receive input from each other. The component model hence needs to capture these input and output models. Their types can be either primitive types (e.g., String, Integer, Boolean) or Multipurpose Internet Mail Extensions (MIME) types (e.g., XML, JSON, RSS, or any standard format found on the web).
3. Access method. This dimension specifies the method used to access mashup components. Such methods are heterogeneous and can be categorized into language-dependent, protocol-based, database, and non-standard.
4. Recursion. If a created mashup can recursively become a mashup component that constitutes a new mashup, this considerably reduces the effort to develop mashups and fosters reusability.
5. Output. A mashup component can return functional output, which is delivered as a service, data output, or visual output.
6. Behavior. During mashup execution, the behavior of a mashup component can be either event-based or task-based. A task-based component is passive; it executes only when called and, given an input, provides an output. By contrast, an event-based component is more active: it is activated when a specific event is fired, and can hence trigger a sequence of tasks defined in other components.
Based on these dimensions, the authors evaluate a number of mashup frameworks and identify shortcomings of these approaches. Among these shortcomings is the lack of support for (i) event-based behavior, (ii) component discovery features, and (iii) language-dependent mashup components.

Another survey in mashup literature [76] reviews six different approaches and identifies potential areas of improvement and future research. For instance, the authors suggest

that context-specific suggestions could support learning how to build and find mashups. Regarding user interface improvements, they note that design mechanisms such as automatic mashup generation, which provide starting points to end users, would enhance usability drastically. The authors propose using the model of six types of barriers in end-user programming systems [77] to identify typical issues users encounter when composing mashups. It helps researchers to conduct user studies and locate the areas of end-user mashup development that need to be improved first.

Di Lorenzo et al. [78] analyze the strengths and weaknesses of popular mashup tools, i.e., Damia [32], Yahoo! Pipes [33], Microsoft Popfly [34], Google Mashup Editor [35], Exhibit [36], Apatar59, and MashMaker [37], with respect to data integration. The evaluation is carried out along eight dimensions covering various aspects of data integration: (i) data format and access, (ii) internal data model, (iii) data mapping, (iv) data flow operators, (v) data refresh, (vi) mashup output, (vii) extensibility, and (viii) sharing. A mashup tool is designed to handle and manage data available on the web. A disadvantage is that desktop data cannot be accessed and used in the same way as web data. This is a serious limitation because users typically have a lot of data on their desktops that they do not intend to publish on the web merely to provide input data sources for their mashups. The authors hence emphasize the need to support local data processing on users’ desktops. The authors summarize several observations at the end of their survey: (i) Most of the tools use an internal XML data model. (ii) The available operators for data manipulation and data integration are still limited; they are designed mainly around the main goal of a tool (e.g., a visualization-oriented or data-processing-oriented tool). (iii) The majority of tools do not allow for the reuse of composed mashups. (iv) Cooperation between tools is impossible or limited. All of the evaluated tools are server-side applications, meaning that both the mashups and the data involved are hosted on the server of the application provider. This may result in problems due to communication overload when a mashup creates too many requests to the servers. Most importantly, the survey claims that each tool requires a considerable level of programming effort from the user to build a mashup, even though the tools are supposed to target “non-expert” users.

Daniel et al. [79] discuss the concept of process mashups by introducing three dimensions, i.e., multi-user support, multi-page navigation, and workflow support. Multi-user support enables users to collaboratively and concurrently build the same mashup. Multi-page navigation means organizing mashup components into a navigation structure that users explore via hyperlinks. Finally, workflow support allows users to define a control and data flow. From this, the authors classify mashups into eight classes, based on combinations of these dimensions:

59http://www.apatar.com/ (accessed Nov. 01, 2015)

1. simple mashups, which support a single user, a single page, and no workflow (e.g., mashArt [46], Yahoo! Pipes [33], MashMaker [37]);
2. multi-page mashups, which support a single user, multiple pages, and no workflow (e.g., EzWeb [80]);
3. guided mashups, which support a single user, a single page, and control flow (no tool exists);
4. page flow mashups, which support a single user, multiple pages, and control flow (e.g., ServFace Builder [81]);
5. shared page mashups, which support multiple users, a single page, and no workflow (no tool exists);
6. shared space mashups, which support multiple users, multiple pages, and no workflow (e.g., IBM Mashup Center60);
7. cooperative mashups, which support multiple users, a single page, and workflow (e.g., Gravity61);
8. process mashups, which support multiple users, multiple pages, and workflow (e.g., MarcoFlow [82]).

Blichmann et al. [83] present their vision of collaborative mashups by specifying three challenges:
1. to develop a mechanism for the unified handling of (non-)collaborative components, which means all components should be connectable and synchronizable despite their varying implementations or built-in support for collaboration;
2. to synchronize differently implemented components with identical functionality, which allows users to select their preferred components and devices;
3. and to support fine-grained sharing of mashup composition parts, which allows for unstructured collaboration.
They then present their preliminary solutions within their CRUISe platform. The platform uses the Semantic Mashup Component Description Language (SMCDL) [84] to uniformly describe mashup components. It currently supports neither component synchronization nor the sharing of individual parts of a mashup.

Endres-Niggemeyer [85] introduces mashup applications for each of her mashup classifications, i.e., media mashups, ubiquitous mashups, DBpedia mashups, web of things mashups, mathematical mashups, speech mashups, urban mashups, travel mashups, end-user mashups, and semantic mashups. Most applications are limited to specific domains. Intelligent Book Mashup [86], RDF Book Mashup [87], Black Swan Events [71], and DERI Pipes [40] are introduced as semantic mashup applications. As ontologies and mashups are respectively the pillars of the semantic web and web 2.0 [51], these applications strive to improve mashups by combining them with the inherent semantics of the data. They collect and mash up the data, lift it to a semantic level, perform reasoning,

60www.ibm.com/software/info/mashup-center/ (accessed Nov. 01, 2015)
61http://www.sdn.sap.com/irj/scn/weblogs?blog=/pub/wlg/17826 (accessed Nov. 01, 2015)

and return appropriate data to users. The author also lists a number of available languages and standards to model and describe mashups:
1. Enterprise Mashup Markup Language – EMML62
2. Mashup Component Description Language – MCDL [88]
3. Web Mashup Scripting Language – WMSL [63]
4. Universal model based on MetaObject Facility standards [89]
5. Universal model of components and composition [46]
6. ResEval Mash [42, 90] with a domain-specific description language
7. UML2 model for a set of integrated mashups [91]
For end-user mashup development, the author identifies four possible types of mashup interfaces. The interface can be based on a flow-chart model, follow a spreadsheet approach, use a tree pattern, or be integrated into a browser as a plug-in. Mashups are becoming increasingly popular because of attractive properties such as simplicity, loose coupling, reuse of data resources, and support for user activity and creativity.

2.6 Research Gap

To sum up, various mashup-based data integration tools and frameworks have been introduced by developers, researchers, and practitioners. Some of them have been discontinued; many of them are domain-specific. Based on the surveys introduced above [60, 62, 72, 76, 79], mashup tools and frameworks can be classified as shown in Table 2.6.

Mashup tools: Classification

Damia [32], Apatar63, Marmite [38]: Wiring paradigm [62]
Presto64: Wiring paradigm [62], Visual programming [72]
Microsoft Popfly [34]: Wiring paradigm [62], Information mashups [60, 76]
Yahoo! Pipes [33]: Wiring paradigm [62], Information mashups [60, 76], Visual programming [72], Simple mashups [79]
IBM Mashup Center65: Wiring paradigm [62], Information mashups [60, 76], Visual programming [72], Shared space mashups [79]
Google Mashup Editor [35]: Information mashups [60, 76], Scripting languages [62]
Serena Mashup Composer66: Process mashups [60, 76]

62http://www.openmashup.org/omadocs/v1.0/index.html (accessed Nov. 01, 2015)
63http://www.apatar.com/ (accessed Nov. 01, 2015)
64http://mdc.jackbe.com/enterprise-mashup (accessed Nov. 01, 2015)
65www.ibm.com/software/info/mashup-center/ (accessed Nov. 01, 2015)
66http://www.serena.com/index.php/en/ (accessed Nov. 01, 2015)

Intel MashMaker [44]: Process mashups [60, 76], Programming by demonstration [62], Simple mashups [79]
Black Swan [71], Super Stream Collider [39]: Data analytics mashups [70]
Information Workbench67, DERI Pipes [40], MashQL [41]: Generic mashups [70]
Husky68, Mashroom [43]: Spreadsheets [72]
Vegemite [45], Karma [65]: Programming by demonstration [72]
ServFace [81]: Visual programming [72], Page flow mashups [79]

Table 2.6: Mashup tools and frameworks

A limited number of frameworks, such as Super Stream Collider [39], DERI Pipes [40], and MashQL [41], aim at semantic data processing. Those frameworks, however, do not leverage semantic web technologies to facilitate automatic data integration for non-expert users [70], nor do they provide mechanisms to integrate semantic with non-semantic data. Thus, our first objective is to focus on semantic mashups and overcome such limitations. We address the challenge of designing semantic models for mashup components and leveraging the semantics to foster mashup-based automatic data exploration and integration. The input data of mashup components can be in different formats (e.g., CSV, XML, JSON, or RDF), but the semantic models impose the semantic format on the output data and hence can tackle data heterogeneity. The surveys show that it is difficult for non-expert users to make use of mashup frameworks to compose mashups and integrate data. On the one hand, a high-level and problem-oriented framework is easier to use than a low-level one, because it does not require users to be familiar with special technological and programming concepts to perform data integration tasks. On the other hand, we need a generic framework that can tackle the increasing number of heterogeneous web resources rather than be tailored towards specific problems and resources. These design objectives lead to a trade-off [72]. A versatile mashup framework typically consists of a large number of predefined components; users generally have a clear idea of what they are trying to achieve, but they do not know which components they need and how to correctly combine them in order to reach their goal. This “simplicity and expressive power” trade-off [72] is a challenging issue that we need to address in this research. Two of the surveys [60, 72] discuss the importance of communities for the success of end-user development tools. An open and collaborative model – which ties together three stakeholders (i.e., data publishers, developers, and end users) – enables each stakeholder

67http://www.fluidops.com/en/portfolio/information_workbench/ (accessed Nov. 01, 2015) 68http://www.husky.fer.hr/ (accessed Nov. 01, 2015)

to contribute and share their work with the open data community. Based on available data sources provided by different data publishers, developers are encouraged to create and deploy mashup components that are free to use or reuse. By combining such components, users can create mashup applications to obtain, process, and visualize open data in a dynamic and creative manner. Finally, the applications can be shared or reconfigured within the user communities to foster reusability. However, to the best of our knowledge, current research focuses on user communities only and allows them to share, comment on, or rank their mashups. There is no mashup-based data integration framework that facilitates collaborative work among users, data publishers, and developers and encourages widespread use of (linked) open data. The survey by Di Lorenzo et al. [78] poses the challenge of integrating data that is stored on different devices and not yet available on the web. In the literature so far, there is no research on mashups assembled from components that are distributed among different nodes (e.g., sensors, embedded devices, mobile phones, desktops, servers) to collect and integrate data. Such distributed mashups can facilitate collaborative data integration in which each stakeholder contributes their data and computing resources to the shared processing flow.

CHAPTER 3

Conceptual Framework

This chapter1 presents the main contribution of our research, i.e., a conceptual framework for exploring and integrating heterogeneous data sources. The framework is based on the concept of Linked Widgets, which are semantic modules that allow non-expert users to access, integrate, and visualize data in a creative, collaborative, distributed, and automatic manner. The chapter is organized as follows: In Section 3.1, we present the modular approach for open data integration, which facilitates collaborative work among the data publisher, developer, and end-user communities. Section 3.2 discusses the architecture of our conceptual framework. Section 3.3 provides detailed information on client and server Linked Widgets. The semantic model of Linked Widgets is discussed in Section 3.4. We present the message protocols used to construct and execute mashups in Section 3.5 and Section 3.6, respectively. Section 3.7 introduces five mashup patterns (i.e., the collaborative, persistent, distributed, streaming, and complex mashup patterns) that can be used for various use cases. Section 3.8 presents the mechanism to encapsulate a mashup into a new widget to foster reusability. We develop algorithms for automatic data exploration and data integration via Linked Widgets in Section 3.9. Finally, in Section 3.10, we develop a tag-based composition mechanism that enables non-expert users to compose mashups by specifying them in structured text.

3.1 Modular Approach for Data Integration

The modular approach illustrated in Figure 3.1 combines Web Services and Service-Oriented Architecture (SOA) [92] concepts. Whereas services target developers, we transform them into blocks aimed at end users. To utilize data from different (linked) open data sources, publishers and developers can create three types of blocks: (i) data blocks,

1Parts of this chapter also appear in [1, 3]. The author of this thesis is also the lead author of these papers.

which collect data from one or multiple datasets; (ii) processing blocks, which process and combine data in different ways through enrichment, transformation, and aggregation; and (iii) visualization blocks, which display the final data. These blocks are organized into three layers, i.e., a data layer, a business layer, and a presentation layer. End users can use these blocks to compose open-data-consuming applications in many ways, creating a value chain between open data, developers, and end users. This model enhances the reusability of the developers’ work; it stimulates end users’ creativity in building dynamic applications; and it inherits various benefits of SOA and component-based software approaches.

Figure 3.1: Modular approach for open data integration (data publisher and developer communities define data, processing, and visualization blocks as well as scenarios and compositions over (linked) open data; end-user communities compose, edit, publish, and share the resulting (linked) open data consuming applications)

Blocks allow non-expert users to build ad-hoc applications rapidly. Each block can receive input from other blocks to process; it returns output that, in turn, can serve as input for another block. To end users, blocks are “black boxes” with adjustable parameters that control the execution function. Similar to functional programming or web services, blocks can have multiple input terminals, but only a single output terminal. Two mandatory components for a complete application are data blocks and visualization blocks; processing blocks are optional. A data block does not have any inputs; it collects data from an arbitrary data source to provide input for processing and visualization blocks. Visualization blocks display the output data of processing or data blocks. Within an application, more than one visualization block can be used, because there can be multiple ways to display the same data. To compose an application, users simply select appropriate blocks and connect the

output of one block to the input of another. The model checks whether the semantics of the particular input and output data models match and only allows connections between compatible terminals. There are four types of links: (i) links between data blocks and processing blocks, (ii) links between data blocks and visualization blocks, (iii) links between two processing blocks, and (iv) links between processing blocks and visualization blocks. We identify three basic data processing operations: enrichment, transformation, and aggregation. Corresponding to the three operations, there are three sub-types of processing blocks:
1. If blocks get additional data from other datasets and add it to the input, they are called enrichment processing blocks (a minimal sketch follows below). An example is a block in which its input – a list of Locations, i.e., [Locations(lat, long, description)] – is enriched by sample images for each Location, i.e., [Locations(lat, long, description, [Image])].
2. Transformation processing blocks transform input instances of an RDF class into output instances of a different RDF class based on the relation between the two classes. Hence, those blocks are a crucial element that enables users to leverage the key ideas of LOD, i.e., making use of links between resources from one or more LOD datasets. For example, a block may receive Locations and output MusicEvents organized at a place nearby.
3. The final type, aggregation processing blocks, aggregates data from multiple input terminals. Therefore, blocks of this type always have at least two input terminals representing different entities of similar classes.
To sum up, the major advantages of the block concept are (i) the flexibility in building applications, from selecting data sources, through applying various processing operations, to visualizing the final data in multiple ways; and (ii) the effective use of links, i.e., inner links and outer links, between LOD datasets.
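The enrichment example from the first item above could be sketched as follows; fetchImagesFor is a hypothetical image-lookup service introduced only for this illustration.

```javascript
// Sketch of an enrichment processing block that adds sample images to each
// input Location; fetchImagesFor() is a hypothetical lookup service.
function enrichLocations(locations) {
  return Promise.all(locations.map(function (loc) {
    return fetchImagesFor(loc.lat, loc.long).then(function (images) {
      // [Locations(lat, long, description)] becomes
      // [Locations(lat, long, description, [Image])]
      return {
        lat: loc.lat,
        long: loc.long,
        description: loc.description,
        images: images
      };
    });
  }));
}
```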

3.2 Architecture

We define a number of design objectives as follows: First, blocks need to run on heterogeneous platforms and communicate with each other; the cost of developing blocks should be low; anyone should be able to contribute blocks; and it should be easy to share complete applications composed from disparate blocks. Using modern web technologies, we decided to implement each block as a web widget (cf. Section 2.4.3). Figure 3.2 illustrates the Linked Widgets framework architecture. It serves three major groups of stakeholders: developers, mashup creators, and mashup users. Linked Widgets (or blocks) constitute the basic elements of the framework; they extend the concept of standard web widgets2 with a semantic model. This semantic model, which follows the Linked Data principles [17], is used to annotate the input and output of widgets as well as the widgets’ relations to each other.

2http://www.w3.org/TR/widgets/ (accessed Nov. 01, 2015)

Figure 3.2: Linked Widgets framework architecture (raw data – CSV, XML, RDF, JSON – from databases, cloud services, and Linked Open Data is consumed by data, processing, and visualization widgets; these run as client widgets or as server widgets – e.g., in Java/C/PHP or on Android/iOS – on personal web servers, mobile devices, and other computers; framework modules include the widget annotator, semantic widget search, terminal matching, automatic mashup composition, tag-based automatic composition, the web-based collaborative mashup editor, the mashup execution coordinator, and mashup publication)

A Linked Widget is similar to a web service in that it has multiple inputs and a single output. Important distinctions, however, include the following: (i) widgets have a user interface and are hence easier to handle than web services; (ii) widgets are more versatile than web services, e.g., they can visualize data; (iii) widgets that are combined properly can collaborate automatically via predefined communication protocols, whereas web service composition requires additional technical work, e.g., conforming parameters; (iv) connected widgets can interact with each other in both directions, whereas web service communication is always sequential and unidirectional; (v) widgets can be deployed on various devices and in various environments, whereas web services typically run on a server; and finally, (vi) Linked Widgets are associated with a semantic model. To be able to connect widgets with each other, they have input and/or output terminals. Connecting an input terminal of one widget with an output terminal of another means that the former widget accepts and processes the output data of the latter as its input data. We use JSON-LD3 for data transmission between widgets. Linked Widgets can tackle the challenge of data heterogeneity. They standardize data and lift arbitrary data sources to a semantic level. Thus, the framework can integrate raw data in CSV, XML, JSON, or HTML format; furthermore, data can be collected from databases, cloud/API services, or the Linked Open Data cloud. Depending on their execution mode, widgets may be classified as client or server widgets. From a functional point of view, we can furthermore categorize Linked Widgets into (i) data widgets, (ii) processing widgets, and (iii) visualization widgets. Those widgets are highly reusable and can be parametrized per mashup. Developers can contribute client widgets to the framework, run server widgets on their own infrastructure, or provide them for deployment on personal computers, mobile phones, or other devices. They use the widget annotator module to semantically annotate a widget’s input and output models and to provide provenance and license information. The annotation is stored as Linked Data and is used by several modules of the framework. The core of the data integration architecture is a web-based collaborative mashup editor. Multiple mashup creators (or data integrators) and mashup users can compose mashups simultaneously and collaboratively. This also allows users to create mashups that integrate private data with publicly available data sources. Using the semantic widget search, mashup creators locate and group available widgets of different developers into collections that are relevant for a particular application domain. They then connect those widgets and build mashups. Because combining widgets does not require any particular technical skills, average users can become mashup creators. Moreover, as mashup users, they can adapt the widget parameters of existing mashups to their needs before executing them. We classify mashups into three types: (i) local mashups, which consist exclusively of client widgets; (ii) hybrid mashups, which make use of both client and server widgets; and (iii) distributed mashups, which consist entirely of server widgets, except for the final visualization widget(s).
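For illustration, a JSON-LD payload traveling between two widget terminals might look like the following; the context and property names are assumptions made for this example, not the framework’s actual message schema.

```javascript
// Example JSON-LD object as it might be passed between widget terminals;
// the vocabulary mapping below is an illustrative assumption.
var message = {
  "@context": {
    "schema": "http://schema.org/",
    "lat": "schema:latitude",
    "long": "schema:longitude",
    "description": "schema:description"
  },
  "@type": "schema:Place",
  "lat": 48.1987,
  "long": 16.3695,
  "description": "TU Wien, Karlsplatz 13"
};
```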

3http://www.w3.org/TR/json-ld/ (accessed Nov. 01, 2015)

A local mashup does not use any resources of the framework server, because it is executed completely inside the client browser. This implies that intermediate and final data are lost once the web browser is closed. In contrast, widgets in distributed mashups are executed remotely as persistent applications; their output can hence be accessed at any time. Hybrid and distributed mashups can be executed in a distributed manner, which may involve multiple nodes, each of which executes individual server widgets. This is highly useful, for instance, for streaming data use cases where data must be collected and processed continuously. The mashup execution coordinator is a critical component that enables widgets to cooperate. It links client and server widgets that are executed in different environments, e.g., browsers, Android or iOS smartphones, personal computers, or web servers. For each type of mashup, the coordination mechanisms differ in order to conserve computing resources. Details of the protocols and mechanisms are discussed in Section 3.6. When mashup creators build a mashup, a common task is to connect an input terminal of one widget to an output terminal of another widget. To enforce valid connections (i.e., ensure that the output terminal can provide all data required at the input terminal), creators can use the terminal matching module (cf. Section 3.9.1); the module validates connections using the semantic model and helps creators to speed up the mashup creation process. The automatic mashup composition module (cf. Section 3.9.2) is designed to automatically compose a complete mashup from a widget, or a complete branch that consumes or provides data for a specific output or input terminal. “Complete” in this context means that all terminals must be wired, i.e., have a valid connection. Built on top of the automatic mashup composition module, the tag-based automatic composition module (cf. Section 3.10) allows users to compose mashups by specifying them in structured text. Mashup users can publish mashups on their websites by means of the mashup publication module. A published mashup shows the final visualization widget only and hides all previous data processing steps from the viewer. The mashup itself can also be saved as a new data widget using this module. This encourages users’ creativity by allowing them to reuse mashups without the need for programming skills.
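To give an intuition of the terminal matching idea, the naive sketch below treats each terminal’s data model as a flat set of required and provided RDF classes; the actual module (Section 3.9.1) reasons over the full semantic models, so this is an illustrative simplification only.

```javascript
// Naive terminal-matching sketch: a connection is considered valid if the
// output terminal provides every RDF class the input terminal requires.
// Real matching (Section 3.9.1) operates on the widgets' semantic models.
function isValidConnection(outputTerminal, inputTerminal) {
  return inputTerminal.requiredClasses.every(function (cls) {
    return outputTerminal.providedClasses.indexOf(cls) !== -1;
  });
}

var out = { providedClasses: ["schema:Place", "schema:GeoCoordinates"] };
var inp = { requiredClasses: ["schema:Place"] };
console.log(isValidConnection(out, inp)); // true
```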

3.3 Client and Server Linked Widgets

Developers can implement Linked Widgets as either client or server widgets. The following two sections provide details on each of those types.

3.3.1 Client Widgets

Client widgets are executed on the client side, i.e., they use client memory and processor resources; data is collected and processed on the fly in the browser. The server hosting the client widgets is not necessarily the framework server, i.e., a mashup may combine widgets hosted on various servers. This makes the framework flexible and allows external parties to host widgets on their own infrastructure.

Figure 3.3: Client widget components (a widget item – widget name and widget URI/URL – is dragged and dropped from the widget list into the mashup editor; the instantiated widget consists of input and output terminals, a user interface implemented by developers, and an automatically generated core interface)

Each client widget consists of (i) a semantic model, (ii) an execution function, (iii) input and/or output terminals, (iv) a core widget interface, which is automatically generated for users to control the widget, e.g., to run, cache, view output data, resize, or destroy the widget (cf. Figure 3.3), and (v) its custom user interface programmed by developers. The execution function transforms the received input into an output according to the parameters specified in the interface. To implement a client widget, developers create a user interface in an arbitrary web language and then follow three steps: (i) inject a JavaScript file4 to facilitate cooperation with other widgets, (ii) define the input and/or output configuration, and (iii) implement a JavaScript function run(data) that is invoked when the widget is executed in a mashup. If a widget has no input, then the corresponding data object is null. Otherwise, upon execution of a widget, the output data from all relevant widgets is collected to build the data object, which is passed to run. A client widget is instantiated when users drag and drop a widget item from the widget list into the mashup editor. We associate each widget with an identifier to differentiate

4http://linkedwidgets.org/widgets/WidgetHub.js (accessed Nov. 01, 2015)

widgets used in a mashup. A widget item consists of the widget name and the URI of the widget. If the widget is already annotated and published, we can retrieve its Uniform Resource Locator (URL) from its URI; later, to execute advanced functions such as terminal matching and automatic mashup composition, we use the URI to retrieve the widget’s model and other metadata. On the other hand, if the widget is still in the development/testing stage, we use its URL as follows: we first create the core widget interface with an HTML iframe inside. After that, we use the widget URL to load its source code into the iframe and read the input/output configuration to create the corresponding input/output terminals. At this stage, the framework displays a complete widget interface as shown in Figure 3.3. We decided to decouple the input/output configuration from the widget annotation to simplify the widget development, testing, and maintenance processes. This allows a widget to operate even before it is semantically annotated; however, users cannot use advanced functions as long as the semantic annotations are not complete.
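The three implementation steps described above might look as follows in a minimal client widget; the configuration format and the exact contract of run(data) are simplified assumptions, since WidgetHub.js defines the actual API.

```javascript
// Minimal client-widget sketch. Step (i): the widget page injects
// WidgetHub.js (see footnote 4). Step (ii): declare the input/output
// configuration; the exact format shown here is an assumption.
var config = {
  inputs: [{ name: "locations" }],      // input terminals
  output: { name: "enrichedLocations" } // the single output terminal
};

// Step (iii): implement run(data), invoked when the widget executes in a
// mashup. `data` aggregates the outputs of all connected predecessor
// widgets, or is null if the widget has no input terminals.
function run(data) {
  var locations = data ? data.locations : [];
  // Transform the received input according to the widget's parameters
  // and return the result for successor widgets to consume.
  return locations.map(function (loc) {
    return { lat: loc.lat, long: loc.long, description: loc.description };
  });
}
```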

3.3.2 Server Widgets

Client widgets are easy to develop and necessary for a lightweight and scalable mashup framework. However, their capabilities are restricted by the web browser execution environment. A web browser is inadequate for hosting mashups that process data from embedded devices or subsystems that are not yet available on the web. Furthermore, web browsers are not designed for heavy data processing tasks. Finally, as soon as a user closes the browser, the mashup output data can no longer be accessed. To overcome these limitations, server widgets shift the execution function from the browser environment to standalone application environments. These server widgets consist of two main parts: (i) a user web interface, which is the same as for client widgets, but without the run function, and (ii) a remote executor. The client interface allows users to set up parameters and control the remote executor. When users drag and drop a server widget into the mashup editor panel, a client user interface of the server widget is instantiated. Next, a connection channel between this client interface and the remote executor of the server widget is set up. It is therefore necessary to keep the remote executor running persistently and reachable via the internet. For each server widget, we have only a single remote executor, but potentially many client user interfaces, one for each instance of the widget that is created for various mashups. When an instance of a server widget is executed, the client interface sends its parameters to the remote executor, which, in turn, creates a widget job to process the data received from the predecessor widgets. Details of the communication protocol between client and server widgets are covered in Section 3.6. Server widgets introduce the concept of distributed mashups – a type of ad-hoc application whose processing tasks are executed in a distributed manner on multiple devices. Such mashups are particularly useful for streaming or real-time data processing applications. Users can close the browser at any time while the backend performs data collection and processing tasks. This distinguishes our framework from previous approaches such

as Damia [32], Yahoo! Pipes [33], Microsoft Popfly [34], Google Mashup Editor [35], Exhibit [36], Apatar5, MashMaker [37], Marmite [38], Super Stream Collider [39], DERI Pipes [40], MashQL [41], Information Workbench6, ResEval Mash [42], Mashroom [43], Husky7, Intel Mash Maker [44], Vegemite [45], Presto8, and mashArt [46]. Server widgets hence provide the following benefits: (i) They can act as data connectors that obtain and provide data from different services, devices, or systems for a mashup; (ii) they facilitate collaborative use cases where each participant contributes data or processing widgets to a shared mashup; (iii) because their computing tasks are performed within the hosting devices, similar to client widgets, they reduce the framework server load; (iv) they can run persistently in the background to collect or process data for data monitoring or data streaming applications; (v) server widgets deployed on powerful servers are capable of processing large volumes of data over extended time periods. There are two sub-types of server widgets, i.e., server data widgets and server processing widgets; at present, we always implement visualization widgets as client widgets. Although we can deploy (server) processing widgets on any kind of device, servers are the most suitable targets, as the widgets need to be consistently online; this also allows offloading computationally intensive data processing tasks from mobile devices. Mobile devices, however, can be the environment for (server) data widgets; they can, for instance, collect and provide data from mobile devices for a mashup. For example, smartphones can act as sensors that periodically provide GPS data, footsteps, temperature readings, etc.

3.4 Semantic Model for Linked Widgets

As the Linked Widgets framework is based on semantic web technologies, creating an ontology is a requisite step after we have identified the architecture and the requirements of the framework. It enables us to add semantics (cf. Section 2.1) to the model of Linked Widgets, which aims to facilitate automatic operations such as automatic terminal matching and automatic mashup composition. It also enhances the interoperability between the Linked Widgets framework and other applications; for example, third party applications can access widget annotations as Linked Open Data to implement custom composition algorithms for particular data domains.
In this section, we define ontology requirements and design concepts and relations for the semantic model of Linked Widgets. To this end, we follow the ontology development guideline proposed by Noy and McGuinness [93]. An ontology is "a formal and explicit specification of a shared conceptualization" [94]. It [95] "defines (specifies) the concepts, relationships, and other distinctions that are relevant for modeling a domain; the specification takes the form of the definitions of representational vocabulary (classes, relations, and so forth), which provide meanings for the vocabulary and formal constraints on its coherent use".

5http://www.apatar.com/ (accessed Nov. 01, 2015)
6http://www.fluidops.com/en/portfolio/information_workbench/ (accessed Nov. 01, 2015)
7http://www.husky.fer.hr/ (accessed Nov. 01, 2015)
8http://mdc.jackbe.com/enterprise-mashup (accessed Nov. 01, 2015)

3.4.1 Definition of Requirements

As a typical step that should be performed before designing an ontology [93], we identify the requirements of the ontology for the Linked Widgets model in the data integration context as follows:

R1 – Reuse of existing ontologies This is a common requirement for ontologies to enhance the interoperability among separate systems. To this end, if a concept or a relation that needs to be included in the Linked Widgets model already exists in some Linked Open Data ontology, we reuse it rather than defining a new one.

R2 – Reusability and extensibility Users of the ontology should be able to easily reuse the defined vocabulary; they can reuse a part of the ontology, clarify a particular concept by adding one or more attributes, or extend the whole ontology in a more general context. For example, they can create a new type of widget, or annotate the parameters defined in the user interface of widgets to add the detailed functionality of widgets to the semantic model.

R3 – Lightweight We decide to follow a lightweight approach in order to facilitate our automatic data integration algorithms (i.e., the automatic data model matching and automatic mashup composition algorithms presented in Sections 3.9.1 and 3.9.2, respectively). "Lightweight" in this context means that we use as few SPARQL queries as possible for frequent tasks such as loading and constructing the full input (output) data model of a Linked Widget; this task is performed whenever we dereference a widget URI. To this end, we use only a small number of concepts and relations, and do not specify complex constraints on them; we mainly focus on the Linked Widget data model. We therefore choose the OWL Lite language (cf. Section 2.1.3) to represent our ontology.

R4 – Change management Because many new APIs and data sources are continuously introduced, developers may modify their widgets' interfaces and functionalities. Such modifications may be accompanied by changes in the respective Linked Widgets model (e.g., changes in the input and output data models), which cause existing mashups to stop working properly. To address this issue, the ontology needs to specify provenance information so that we can track and appropriately react to widget upgrades (cf. Section 3.4.2). For example, as soon as a widget is modified, we can send an alert to the creators of all mashups that make use of the widget; the mashups can then be re-edited to work properly again.

R5 – Interoperability with LOD vocabulary Widgets have input and output terminals that can receive and process data extracted from various LOD datasets. We require that the data models of these input and output terminals be open, i.e., we can reuse the LOD vocabulary in the input and output data models of Linked Widgets.

R6 – Connectivity The LOD cloud contains many incoming and outgoing links among LOD datasets. These links are the basis for combining separate sources published by different data publishers. We require the Linked Widgets model to be capable of representing such links. To this end, we make use of the links between nodes in the input and output tree models (cf. Section 3.4.3).

R7 – Support for automatic widget discovery and composition The ontology needs to annotate widgets (mashups) in such a manner that we are capable of locating the relevant widgets (mashups) in a particular context; for example, in addition to conventional search methods based on keywords, categories, and tags, we should be able to search for a widget based on the data types of its input and output data, or the relationship between them. Moreover, the annotated information must be rich enough to allow for automatic input-output data model matching as well as the automatic mashup composition algorithm.

3.4.2 Basic Concepts and Relations

The basic concepts and relations of the Linked Widgets ontology are illustrated in Figure 3.4 and listed in Tables 3.1 and 3.2. Two other main parts of the ontology, i.e., the Linked Widget data model and the Linked Widget meta-tagging model, are presented in Sections 3.4.3 and 3.10.3, respectively; while the Linked Widget data model focuses on the expression of widget output and input data models to allow for widget discovery and automatic mashup composition, the Linked Widget meta-tagging model offers extra tag-based metadata and aims at mashup composition from structured text.
The central concepts of the ontology are lw:Widget and lw:Mashup; they are the most important resources of the Linked Widgets framework. Each has a name, a description, one or more licenses, and an address, which are respectively specified by the schema:name, dct:description, dct:license, and dct:identifier properties. The address is a URL that we can use to access the user interface of the widget or the mashup.
We use the owl:allValuesFrom and owl:oneOf properties to specify restrictions on the ranges of the lw:environment and lw:type properties (which are properties of the lw:Widget class). As a result, on the one hand, a lw:Widget must be either a lw:clientWidget or a lw:serverWidget; on the other hand, it must be a lw:dataWidget, a lw:processingWidget, or a lw:visualizationWidget. To describe additional information such as the URL of a data source used in a data widget, the reference to a special algorithm implemented in a processing widget, or the visualization library used in a visualization widget, we make use of the lw:referenceSource property. Finally, using the dct:subject property, we define the categories (e.g., Social, Financial, Science, Government, Enterprise) a widget falls into.
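To illustrate these annotations, the following Turtle fragment sketches how a widget could be described using the concepts above; the widget URI and all literal values are hypothetical and serve illustration purposes only.

@prefix lw:     <http://linkedwidgets.org/ontologies/> .
@prefix dct:    <http://purl.org/dc/terms/> .
@prefix schema: <http://schema.org/> .

# Hypothetical widget description; the URI and all literal values are illustrative only.
lw:exampleWidget
    a                  lw:Widget ;
    schema:name        "Vienna Weather Stations" ;
    dct:description    "Provides current data from weather stations in Vienna." ;
    dct:license        <http://creativecommons.org/licenses/by/4.0/> ;
    dct:identifier     "http://linkedwidgets.org/widgets/exampleWidget" ;
    lw:environment     lw:clientWidget ;
    lw:type            lw:dataWidget ;
    lw:referenceSource "http://www.wunderground.com/" ;
    dct:subject        "Science" .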

[Figure 3.4 depicts the Linked Widgets ontology: the classes lw:Widget, lw:WidgetCollection, lw:Mashup, lw:MashupWidget, lw:WidgetModel, lw:Developer, and lw:MashupCreator; the properties connecting them (e.g., lw:hasWidget, lw:hasCollection, lw:hasWidgetModel, lw:hasInput, lw:hasOutput, lw:hasMetaAnnotation, lw:mashupConfiguration); the restrictions on lw:environment (lw:clientWidget, lw:serverWidget) and lw:type (lw:dataWidget, lw:processingWidget, lw:visualizationWidget); and the links to the PROV-O vocabulary (prov:Activity, prov:wasGeneratedBy, prov:wasAssociatedWith, prov:wasDerivedFrom, prov:wasAttributedTo) and to foaf:Person. Namespace prefixes: lw http://linkedwidgets.org/ontologies/, prov http://www.w3.org/ns/prov#, dct http://purl.org/dc/terms/, foaf http://xmlns.com/foaf/0.1/, owl http://www.w3.org/2002/07/owl#, schema http://schema.org/]

Figure 3.4: Linked Widgets ontology

Class – Description
lw:Widget – a "mashable" widget, which can pass or receive events so that multiple widgets can be combined to build up a mashup
lw:WidgetCollection – a group of widgets that are in the same domain
lw:Mashup – a mashup of widgets
lw:MashupWidget – a widget derived from a mashup to foster reusability
lw:WidgetModel – range of the lw:hasWidgetModel property to associate a widget with its semantic model
lw:Developer – a developer, who creates and adds widgets to the framework
lw:MashupCreator – the creator of a mashup

Table 3.1: Classes of the Linked Widgets ontology

Property – Description
lw:referenceSource – the link to additional information of a widget such as the data source, the reference to an algorithm, or the visualization library used in the widget
lw:environment – the execution mode of a widget; widgets can be categorized into client or server widgets
lw:type – the functionality of a widget; widgets can be categorized into data, processing, or visualization widgets
lw:hasWidgetModel – the semantic model of a widget
lw:hasWidget – a list of widgets included in a mashup or a collection
lw:hasCollection – a list of widget collections used in a mashup
lw:mashupConfiguration – the JSON configuration of a mashup, such as the saved parameters and the position and size of each widget

Table 3.2: Properties of the Linked Widgets ontology

Widgets that are in the same domain or share the same features are grouped into a lw:WidgetCollection. A lw:WidgetCollection is linked with a lw:Mashup via the lw:hasCollection property. All widgets used in a lw:Mashup must be in the widget collections of the mashup; the used widgets are listed by means of the lw:hasWidget property.
In order to trace changes of both lw:Widget and lw:Mashup, we use the Provenance ontology9. The ontology describes a set of classes, properties, and restrictions that can represent provenance information in a wide range of applications and domains (e.g., open information systems, science applications, news, or law). Detailed provenance can be easily created or edited to express the origin of data and the full revision history of data.
We link a lw:Widget with a prov:Activity and link the prov:Activity with a lw:Developer (which is a sub-class of the foaf:Person class) through the prov:wasGeneratedBy and the prov:wasAssociatedWith properties, respectively. A new version of a lw:Widget is linked with the previous version through the prov:wasDerivedFrom property. This overall structure enables us to use SPARQL queries to load the information of the very first to the latest version of a lw:Widget. In a similar manner, we link a lw:Mashup with a prov:Activity and link the prov:Activity with a lw:MashupCreator to annotate the provenance information of the mashup.
Two out of three stakeholders (i.e., lw:Developer and lw:MashupCreator) of the framework are modeled in the ontology. We do not model end users (who are the third stakeholder) or store their profiles, because they are only allowed to edit or run existing mashups in the framework. To save a personal mashup, they need to register an account and become a lw:MashupCreator.
The lw:mashupConfiguration property associates a mashup with a mashup configuration, which is a JSON string. The configuration consists of the saved parameters, the position and size of each widget, as well as the links between input and output terminals. We do not semantically annotate such information because it is not semantically relevant for the mashup; moreover, the simple structure of the configuration allows for simple and fast mashup loading.
To facilitate reusability, we enable a mashup creator to derive a lw:MashupWidget from a lw:Mashup (cf. Section 3.8). The lw:MashupWidget class is a sub-class of the lw:Widget class. Similar to what we have done with the lw:Widget and lw:Mashup classes, we use the Provenance ontology to annotate different versions of a lw:MashupWidget. This allows us to automatically detect which lw:MashupWidgets are affected as soon as a new version of a particular lw:Widget or lw:Mashup is released.

9http://www.w3.org/TR/prov-o/ (accessed Nov. 01, 2015)
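For illustration, a SPARQL query along the following lines could retrieve the version history of a widget together with the responsible developers; the widget URI lw:exampleWidget is a placeholder, and the query assumes the PROV-O modeling described above.

PREFIX lw:   <http://linkedwidgets.org/ontologies/>
PREFIX prov: <http://www.w3.org/ns/prov#>

# Walk the version chain of a (placeholder) widget via prov:wasDerivedFrom
# and retrieve the developer associated with each version's activity.
SELECT ?version ?developer
WHERE {
  lw:exampleWidget prov:wasDerivedFrom* ?version .
  ?version prov:wasGeneratedBy ?activity .
  ?activity prov:wasAssociatedWith ?developer .
}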

3.4.3 Linked Widget Data Model

We semantically annotate the input and/or output data models of Linked Widgets. These semantic I/O models are essential for the subsequent search and composition processes. Furthermore, they are crucial for the effective sharing of widgets. For example, even when the number of available widgets is limited (e.g., 43 for Yahoo! Pipes [33] and 300+ for Microsoft Popfly10), finding the appropriate widgets needed to build a particular mashup is a difficult task. Many existing mashup frameworks employ a text-based approach for widget search, which is typically ambiguous and hence not particularly helpful for exploring and locating widgets.
Client and server Linked Widgets have different execution environments, but they share the same model. Due to similarities with web services, we initially considered describing Linked Widgets semantically using SAWSDL [96], OWL-S11, or WSMO12. These languages are well-suited for the formal specification of interfaces, such as Linked Widgets' input and output terminals. They do not, however, allow us to establish a well-defined semantic relation between input and output data; they capture the functional semantics and focus on input and output parameters. As a precondition for advanced widget exploration and automatic mashup composition algorithms, an explicit semantic link between input and output terminals is necessary. To this end, several formal annotation methods using a graph-based model have been developed in the literature [97, 98, 99].
To specify the semantic model of Linked Widgets, we adopt and adapt the Karma model [98]. The idea of the model is to reuse the SWRL vocabulary13 to define the semantic relation between two nodes in the input and output graphs. Because the model is more complex than necessary for representing the links among nodes of the input and output graphs, we decided to simplify it.
Figure 3.5 illustrates the semantic data model of Linked Widgets. Conceptually, both the input and the output data models are trees. The root and intermediate nodes are single objects or arrays of objects, whose dimension is specified by the lw:hasArrayDimension property. Each object can have multiple properties. Data properties have leaf nodes with primitive values; object properties consist of sub-trees.
The input and output data models defined in Karma are simply graphs. As graph matching is much more complex than tree matching [100, 101], we decided to transform the Karma graph model into a hybrid graph and tree model. To this end, we identify two types of links: (i) parent-child links, and (ii) links between two arbitrarily related nodes. The former is represented by a single triple, whereas the latter is represented by three triples using the SWRL vocabulary (cf. Figure 3.5). The Linked Widgets model hence is a hybrid of tree and graph structure, which is capable of representing rich information as a graph while facilitating model matching as a tree.
Figure 3.6 illustrates the model of the Point of Interest (POI) Search widget that will be used in our sample geospatial data integration use cases (see Section 5.4.4). The widget takes an array of arbitrary objects containing the wgs84:location property as input. The range of the property is the Point class with two literal properties, i.e., lat and long. The widget output is an array of GeoNames14 features satisfying the distance filter specified in the POI Search widget. To specify that an input/output is an array of objects, we use the literal property hasArrayDimension (0: single element; n > 0: n-dimensional array). Because the input of POI Search is an "arbitrary" object, we use the owl:Thing class to represent it in the data model.
Using the lw:hasSampleData property, we associate the input and output terminals with descriptions that represent their full tree structures in JSON. An example of such a description is shown in Listing 3.1. The JSON description does not include the relationships among nodes. It allows us to reconstruct the whole tree model instantly and fosters model matching as presented in Section 3.9.1.

10http://en.wikipedia.org/wiki/Microsoft_Popfly (accessed Nov. 01, 2015)
11http://www.w3.org/Submission/OWL-S/ (accessed Nov. 01, 2015)
12http://www.w3.org/Submission/WSMO/ (accessed Nov. 01, 2015)

13http://www.w3.org/Submission/SWRL/ (accessed Nov. 01, 2015)
14http://www.geonames.org/ (accessed Nov. 01, 2015)

[Figure 3.5 shows the semantic model of Linked Widgets: a lw:Widget is associated via lw:hasWidgetModel with a lw:WidgetModel, which links via lw:hasInput and lw:hasOutput to input and output terminals. Each terminal has a name (lw:hasName), sample data (lw:hasSampleData), and a data model (lw:hasDataModel) with an array dimension (lw:hasArrayDimension); the data models are trees of LOD classes and LOD properties with literal leaves, and links between two arbitrarily related nodes are expressed by swrl:IndividualPropertyAtom resources (attached via lw:hasAtom) with swrl:propertyPredicate, swrl:argument1, and swrl:argument2. Namespace prefixes: lw http://linkedwidgets.org/ontologies/, swrl http://www.w3.org/2003/11/swrl/, xsd http://www.w3.org/2001/XMLSchema#]

Figure 3.5: Semantic model of Linked Widgets

[Figure 3.6 shows the semantic model of the POI Search widget: lw:poiSearch is linked via lw:hasWidgetModel to lw:poiSearchModel, whose input data model is a one-dimensional array of owl:Thing instances with a wgs84:location pointing to a wgs84:Point (wgs84:lat, wgs84:long as xsd:float), and whose output data model is a one-dimensional array of geo:Feature instances with a foaf:name (xsd:string) and an analogous wgs84:location; a swrl:IndividualPropertyAtom with swrl:propertyPredicate geo:nearby connects the input owl:Thing (swrl:argument1) with the output geo:Feature (swrl:argument2). Namespace prefixes: lw http://linkedwidgets.org/ontologies/, wgs84 http://www.w3.org/2003/01/geo/wgs84_pos#, swrl http://www.w3.org/2003/11/swrl/, xsd http://www.w3.org/2001/XMLSchema#, foaf http://xmlns.com/foaf/0.1/, geo http://www.geonames.org/ontology#]

Figure 3.6: Semantic model of the POI Search widget

The point, location, lat, and long terms are available in different vocabularies; due to its widespread use, we chose wgs84. The widget annotator module (cf. Section 3.2), which supports developers in annotating widgets, interactively recommends frequently used terms of the most popular vocabularies to developers. This eases the annotation process and fosters consistency by reducing the use of different terms to describe the same concepts.
Because we explicitly model the geo:nearby relation between the two instances owl:thing and geo:feature (see Figure 3.6), we know that the output feature is near the input location. If we had used SAWSDL, OWL-S, or WSMO, we would not have been able to specify this input-output relation and hence could not distinguish between this and other possible relations between two locations.
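The following Turtle fragment sketches the core of this model along the lines of Figure 3.6; the node identifiers in the default namespace are illustrative, and the literal leaf nodes are abbreviated for brevity.

@prefix :      <http://example.org/poiSearch#> .
@prefix lw:    <http://linkedwidgets.org/ontologies/> .
@prefix swrl:  <http://www.w3.org/2003/11/swrl/> .
@prefix wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> .
@prefix geo:   <http://www.geonames.org/ontology#> .
@prefix owl:   <http://www.w3.org/2002/07/owl#> .
@prefix foaf:  <http://xmlns.com/foaf/0.1/> .

# Sketch following Figure 3.6; node names in the default namespace are illustrative.
lw:poiSearch lw:hasWidgetModel lw:poiSearchModel .

lw:poiSearchModel
    lw:hasInput  [ lw:hasName "input" ;  lw:hasDataModel :inputThing ] ;
    lw:hasOutput [ lw:hasName "output" ; lw:hasDataModel :outputFeature ] ;
    lw:hasAtom   :nearbyAtom .

:inputThing a owl:Thing ;
    lw:hasArrayDimension 1 ;
    wgs84:location [ a wgs84:Point ] .    # with wgs84:lat and wgs84:long leaves

:outputFeature a geo:Feature ;
    lw:hasArrayDimension 1 ;
    foaf:name "string" ;
    wgs84:location [ a wgs84:Point ] .    # with wgs84:lat and wgs84:long leaves

# A link between two arbitrarily related nodes: three triples using SWRL.
:nearbyAtom a swrl:IndividualPropertyAtom ;
    swrl:propertyPredicate geo:nearby ;
    swrl:argument1 :inputThing ;
    swrl:argument2 :outputFeature .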

Listing 3.1: Sample JSON structure associated with the input and output models of the POI Search widget

input = {
  "input": {
    "@context": {
      "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
      "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
      "location": "http://www.w3.org/2003/01/geo/wgs84_pos#location"
    },
    "@graph": [
      {
        "@id": "http://example.com.sampleId",
        "@type": "http://www.w3.org/2002/07/owl#Thing",
        "location": {
          "@id": "http://example.com.sampleId",
          "@type": "http://www.w3.org/2003/01/geo/wgs84_pos#Point",
          "lat": "float",
          "long": "float"
        }
      }
    ]
  }
}

output = {
  "@context": {
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
    "location": "http://www.w3.org/2003/01/geo/wgs84_pos#location"
  },
  "@graph": [
    {
      "@id": "http://example.com.sampleId",
      "@type": "http://www.geonames.org/ontology#Feature",
      "location": {
        "@id": "http://example.com.sampleId",
        "@type": "http://www.w3.org/2003/01/geo/wgs84_pos#Point",
        "lat": "float",
        "long": "float"
      }
    }
  ]
}

All widget models are published as LOD and can be accessed via the graph http://linkedwidgets.org of the SPARQL endpoint at http://ogd.ifs.tuwien.ac.at/sparql. Figure 3.7 depicts the human-readable serialization of the POI Search widget model.

With SPARQL queries, we can find a widget that receives or outputs an object containing, for instance, geographic information. This is done by means of the semantic widget search method presented in Section 5.2.3. Moreover, starting from an arbitrary terminal, e.g., the input terminal of the POI Search widget, we can find all output terminals that can be connected to it using the semantic terminal matching method (cf. Section 3.9.1).
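For example, a query of roughly the following form, run against the endpoint above, could retrieve all widgets whose output model provides a wgs84:location property; the exact graph shapes follow the model described in Section 3.4.3.

PREFIX lw:    <http://linkedwidgets.org/ontologies/>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>

# Find all widgets whose output data model contains a wgs84:location property.
SELECT DISTINCT ?widget
FROM <http://linkedwidgets.org>
WHERE {
  ?widget lw:hasWidgetModel ?model .
  ?model lw:hasOutput ?output .
  ?output lw:hasDataModel ?dataModel .
  ?dataModel wgs84:location ?point .
}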

Figure 3.7: Human-readable serialization of the POI Search widget model

3.5 Mashup Construction

Technically, client widgets and the client interfaces of server widgets are HTML iframes. Such iframes can trigger events, which contain messages. These messages are then consumed by other iframes that have registered a listener for these events. Thus we can implement a library to ease the communication between widgets. The library enables either the mashup execution coordinator (a critical component that enables widgets running in different environments to cooperate) or widgets to register remote procedures to be called by each other. Because web browsers do not allow for a direct call from one widget to another, the coordinator plays an important role in facilitating the information exchange between widgets.
We design the communication interfaces between Linked Widgets and the coordinator as shown in Figure 3.8 and explained in Table 3.3. The interfaces contribute not only to the mashup construction process but also to the mashup communication protocols presented in the next section.
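The following JavaScript sketch illustrates the underlying mechanism based on the HTML5 postMessage API; the helper names (registerProcedure, call) are illustrative and do not denote the framework's actual library functions.

// Minimal sketch of iframe-based communication between widgets and the
// coordinator via the HTML5 postMessage API. All names are illustrative.
var procedures = {};

// Register a procedure that other frames may invoke remotely.
function registerProcedure(name, fn) {
  procedures[name] = fn;
}

// Dispatch incoming messages to the matching registered procedure.
window.addEventListener("message", function (event) {
  var msg = JSON.parse(event.data);
  if (procedures[msg.method]) {
    procedures[msg.method].apply(null, msg.args);
  }
});

// Invoke a procedure in another frame, e.g., a widget calling the
// coordinator's output method with its identifier and result data.
function call(targetWindow, method, args) {
  targetWindow.postMessage(JSON.stringify({ method: method, args: args }), "*");
}

// Example: a widget returns its output to the coordinator (parent frame):
// call(window.parent, "output", [widgetId, JSON.stringify(outputData)]);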

[Figure 3.8 shows the interfaces as a class diagram: the Coordinator exposes configure(String jsonStr), getSubscriber(String widgetId), output(String widgetId, String jsonStr), and run(String widgetId); the AbstractWidget exposes setup(String widgetId), getParameter(), and setParameter(String jsonStr); its subclass ClientWidget adds setupInput(String jsonStr), receiveInput(String jsonStr), runIfPossible(), clearInputData(), and getOutput(); ServerWidgetUI is the second subclass of AbstractWidget.]

Figure 3.8: Interfaces between Linked Widgets and the mashup coordinator

To build up a mashup, users drag and drop widgets to the mashup panel. Because widgets are associated with URLs, we can create the respective widget pages (which are HTML iframes). When a widget page is completely loaded, the mashup coordinator calls the setup method of the widget and sets an automatically generated identifier for the widget; the identifier must be unique to differentiate the widgets used.
After receiving its assigned identifier, the widget calls the configure method of the coordinator. Parameters passed to the method are (i) the widget's input/output configuration and (ii) the automatically detected HTML inputs (and their current values) of the widget page. Based on the former information, we can graphically create and display the widget's input/output terminals in the widget interface. The latter information is temporarily stored by the coordinator.

Name – Description
configure – sends the configuration of a widget (e.g., input and output terminals, and the widget's default size) to the coordinator
getSubscriber – asks the coordinator which widgets are waiting for the output of a widget
run – tells the coordinator that a widget wants to run
output – returns the output of a widget to the coordinator
setup – sets an identifier for a widget
setupInput – initializes the input data of a client widget, based on the current wires among widgets
getParameter – asks a client widget or the client interface of a server widget for their current parameter settings
setParameter – sets values of the parameters in a client widget or the client interface of a server widget; the method is used when we load a saved mashup
receiveInput – sends input data to a client widget; the input data is taken from the output data of the preceding widget of the concerned client widget in the mashup flow
getOutput – asks a client widget for its current output data
clearInputData – clears the input data object of a client widget
runIfPossible – tells a client widget to run if it has no input terminals, or if all of its input terminals have already received data

Table 3.3: Description of mashup communication methods

Once the mashup is saved, the current values of the HTML inputs are sent to the coordinator. The coordinator can hence compare the initial values with the saved values of every single field of all widgets. It then detects and saves the changed fields into the mashup configuration. Consequently, when we load a saved mashup, rather than requiring users to re-enter the mashup parameters, we can automatically set the saved parameters back into the widgets, using the setParameter method of the Linked Widgets.
As soon as widgets have been instantiated, we can link (or unlink) an input terminal of a widget to the output terminal of another widget. Each time such a link is established (or removed), the input object of the former widget is (re)initialized. The input data of a widget is dynamic; it is generated based on the number of input terminals of the widget, as well as the number of links from each input terminal to other output terminals.
For example, assume that a widget A has two input terminals, namely "input1" and "input2". While the terminal "input1" is currently linked with two other output terminals, the terminal "input2" is linked with only one output terminal. The respective input object of A is then initialized with three empty sub-objects as shown in Listing 3.2. When the mashup is executed, the data received from the output terminals is filled into these empty sub-objects as illustrated in Listing 3.3. The structure allows us to count the number of input data objects the widget has received from other widgets. If it equals the total number of required input data objects, we can execute the widget.
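A minimal JavaScript sketch of this readiness check, assuming the input object layout of Listings 3.2 and 3.3 (the function names are illustrative):

// Readiness check for a widget's input object (cf. Listings 3.2 and 3.3);
// helper names are illustrative.
function isFilled(slot) {
  return Object.keys(slot).length > 0; // an empty {} placeholder is not yet filled
}

function canRun(input) {
  return Object.keys(input).every(function (terminal) {
    var slot = input[terminal];
    // A terminal linked to several output terminals holds an array of sub-objects.
    return Array.isArray(slot) ? slot.every(isFilled) : isFilled(slot);
  });
}

// canRun({ "input1": [{}, {}], "input2": {} })  // false (Listing 3.2)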

3.6 Mashup Execution Protocols

This section discusses the communication protocols between widgets used during the execution of a mashup. We design three protocols for the three types of mashups, i.e., local, hybrid, and distributed mashups. The protocols aim to facilitate efficient communication between independently developed widgets executed on various devices and to minimize the framework server load.

3.6.1 Local protocol

Local mashups consist entirely of locally executed client widgets that communicate within the client's web browser. As an example of how the protocol facilitates communication at runtime, consider a mashup with three widgets A→B→C. To construct the mashup, we (i) drag and drop the three widgets A, B, and C to the mashup panel, and (ii) link the output terminals of widgets A and B to the input terminals of widgets B and C, respectively.
Figure 3.10 shows all messages transferred between the coordinator and the three widgets. The communication takes place entirely in the client's browser. Because widgets do not know of each other, they have to communicate with the coordinator to obtain their tasks. The first eight messages are delivered during the mashup construction process; they are the prerequisite preparation for the mashup execution.
The mashup execution is triggered as soon as we run widget C. Typically, the coordinator will clear the input data objects of the three widgets and tell them to execute. A is the only widget that can run in the first phase, because it requires no input data. As soon as its execution is completed, it sends the output to the coordinator, which, in turn, delivers the output to widget B. B can now run as it already has its input data. This process continues until the very last widget (i.e., widget C) receives its input and finishes its execution. Data are transferred between widgets in a successive process.
The local protocol is based on messages and events. This offers a powerful approach for various use cases with rich user interaction. For example, the control flow can be reversed as follows. Assume that A is the Map Pointer widget, in which we can define a point on a map. When an event is fired in A (e.g., users add or delete a point by clicking on the map), we need to update the final result presented in widget C. To this end, once A updates its output data, it sends an event to the coordinator by calling the output method, which, in turn, updates the input of widget B and informs it to perform the execution function again. Similarly, C runs and finally visualizes the updated data.

Listing (3.2) Empty JSON input object

input = {
  "input1": [{}, {}],
  "input2": {}
}

Listing (3.3) JSON input object when data is filled in

input = {
  "input1": [
    {
      "@context": {
        "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
        "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
      },
      "@type": "http://dbpedia.org/ontology/Place",
      "location": {
        "@type": "http://dbpedia.org/ontology/Point",
        "lat": 48.2045,
        "long": 16.3806
      }
    },
    {
      "@context": {
        "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
        "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
        "location": "http://www.w3.org/2003/01/geo/wgs84_pos#location"
      },
      "@id": "http://linkedgeodata.org/triplify/node1526007218",
      "label": "Alma Mahler-Werfel-Park",
      "@type": [
        "http://geovocab.org/spatial#Feature",
        "http://linkedgeodata.org/meta/Node",
        "http://linkedgeodata.org/ontology/Leisure",
        "http://linkedgeodata.org/ontology/Park"
      ],
      "location": {
        "@type": "http://www.w3.org/2003/01/geo/wgs84_pos#Point",
        "lat": 48.1966,
        "long": 16.3951
      }
    }
  ],
  "input2": {
    "@context": {
      "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
      "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long"
    },
    "@type": "http://dbpedia.org/ontology/Place",
    "location": {
      "@type": "http://dbpedia.org/ontology/Point",
      "lat": 48.2239,
      "long": 16.34375
    }
  }
}

[Figure 3.10 shows the local mashup communication protocol as a sequence diagram between the coordinator and widgets A, B, and C. Mashup construction: (1–6) setup and configure for each of A, B, and C; (7–8) setupInput for B and C. Mashup execution: (9–11) clearInputData for A, B, and C; (12–13) runIfPossible for A, B, and C; (14) A output; (15) B receiveInput; (16) B output; (17) C receiveInput; (18) C output.]

Figure 3.10: Local mashup communication protocol

The client mashup coordinator is executed locally; this implies that after the mashup has been constructed, the framework server is no longer needed, since the coordinator and the widgets – which both reside in the client browser – compute the tasks. Framework resources are only required to initiate mashups, but are not used during mashup execution. A client widget can use third-party services to collect and/or process data. This reduces server load and improves performance.

3.6.2 Remote protocol

In order to support distributed mashups, which consist of a local visualization widget and a number of remote widgets that may be distributed among nodes, a remote protocol is necessary.

Protocol Specification

For the remote mashup communication protocol, we use the publish/subscribe model and a coordination server. To explain the protocol, consider a sample mashup with four widgets (cf. Figure 3.11): (i) S1, a server data widget, which runs on a personal computer to get data from a file; (ii) S2, another server data widget, which runs on an Android phone to obtain its data, e.g., call logs; (iii) S3, a server processing widget, which runs on a web server; and (iv) C1, a client visualization widget.
To differentiate multiple instances of a server widget used in multiple mashups, we associate each mashup with a Universally Unique Identifier (UUID), e.g., id1 in our example. As soon as a user triggers the execution of a mashup, the framework collects the parameters for all server widgets and forwards them to the coordinator (Step 1 in Figure 3.11). These parameters are (i) the list of URIs of the widgets used; (ii) the configuration of the mashup, i.e., all connections between an output terminal of a widget and an input terminal of another widget; (iii) the parameters set by the user in the widget user interfaces (as parameter, value pairs).
Next, the coordinator sends run requests to the remote executors of S1, S2, and S3 (Step 2). Details on these executors are provided in Section 3.3.2. Each request contains widget parameters and an identifier of the event the widget should subscribe to. Based on this information, each executor can instantiate a widget job (Step 3). In our example, we have three such jobs, i.e., S1^id1, S2^id1, and S3^id1, for the mashup identified by the UUID id1. Next, S3^id1 needs to subscribe to the output events of S1^id1 and S2^id1, because S3 requires the output data of S1 and S2 as its input data. Similarly, C1 subscribes to the output event of S3^id1 (Step 4).
Jobs S1^id1 and S2^id1 are executed immediately with the parameters sent in the earlier requests, because S1 and S2 do not need input data from any other widget (Step 5).

[Figure 3.11 shows the remote mashup communication protocol: remote executors S1 (personal computer), S2 (Android phone), and S3 (web server) host widget jobs, while the mashup editor with client widget C1 runs in the browser; all communicate via the mashup execution coordinator. Steps:
1. The editor sends the mashup configuration and widget parameters.
2. The coordinator requests S1, S2, and S3 to initialize widget instances for mashup id1.
3.1–3.3 S1, S2, and S3 initialize the S1^id1, S2^id1, and S3^id1 instances.
4.1 S3^id1 subscribes to the S1^id1 and S2^id1 output events.
4.2 C1 subscribes to the S3^id1 output event.
5.1–5.2 S1^id1 and S2^id1 run and publish their output events.
6.1–6.2 The coordinator forwards the S1^id1 and S2^id1 output events to S3^id1, which then has the outputs of S1^id1 and S2^id1.
7. S3^id1 collects all input; it runs and publishes the S3^id1 output event.
8. The coordinator forwards the S3^id1 output event to C1, which then has the output of S3^id1 and visualizes it.]

Figure 3.11: Remote mashup communication protocol

When these jobs are finished, they publish output events to the coordinator. The coordinator then sends the output to the jobs that have subscribed to the respective output events, i.e., from S1^id1 and S2^id1 to S3^id1 (Step 6). As soon as S3^id1 has received its two inputs, it is executed and publishes an output event to the coordinator (Step 7). Finally, because C1 has subscribed to this event, it receives and visualizes the final data in the browser (Step 8).
This protocol ensures that a user can close the browser and reopen a mashup that is being executed remotely. Upon reopening a distributed mashup, the client visualization widget immediately requests and displays the current output data of its predecessor server widget(s). Furthermore, the visualization widget listens for output events and updates its display immediately whenever new data arrives.

Protocol Implementation Considerations

We have identified two major architectural options for implementing the remote mashup protocol: the first uses web services; the second builds on a WebSocket [102] infrastructure. We discuss each of these options in the following.
If we follow the web service approach, the coordinator is a collection of services that server widgets use for communication. A remote executor that runs on a web server can provide an execution service and an output service. We therefore have a bi-directional communication channel between the remote executor and the coordinator.
Based on that, the publish/subscribe model can be set up as follows: First, the coordinator calls the execution service of S1 and the job S1^id1 is performed. When this job has been completed, it sends the coordinator the token used to retrieve its output data. The coordinator, in turn, calls the execution service of S3 and sends this token as a parameter to S3^id1. This job uses the token to call the output service of S1 to load the data. Data, hence, is transferred between and processed inside the remote executors themselves. This reduces the server load, because the only task of the coordinator is to call the execution services. It neither performs any data processing tasks, nor does it interact with the intermediate widget output data.
However, the web service-based approach is ill-suited for server widgets running in environments such as mobile phones. We should not deploy web services on such devices because of their restricted computing resources and the connectivity requirements (e.g., we would need to open a port). This results in uni-directional communication channels. To simulate bi-directional connections and the publish/subscribe model, we would have to use polling or long-polling. However, this approach has unfavorable scaling characteristics and generates large amounts of unnecessary network traffic when a large number of server mashups are executed concurrently. Furthermore, the coordinator represents a bottleneck in this scenario.
The second potential implementation approach is based on WebSockets. This approach provides full-duplex communication channels over a single Transmission Control Protocol connection. Although the technology was conceived as a communication channel between web browsers and web servers, it can be natively implemented in various programming environments and on various devices.

Once a WebSocket connection is established, data frames can be sent back and forth between the client and the server in full-duplex mode. This eases the interaction between clients (e.g., browsers) and servers. A server can push content to the browser, and messages can be passed back and forth while the connection is kept open. As a result, we have a two-way communication channel and can easily implement the protocol. Following the WebSocket approach, the remote executors are essentially WebSocket clients, and the coordinator is a WebSocket server.
A potential disadvantage of this architecture is that remote executors cannot directly transmit data to each other on their own; all data needs to be passed through the WebSocket server. The coordinator therefore is the bottleneck of the architecture. To tackle this problem, we can deploy multiple WebSocket servers and make use of load balancing methods [103]. Furthermore, server widgets that run on a web server may return a URL associated with a token rather than the complete data to the coordinator; the coordinator then forwards the URL to subsequent widgets, which may use it to download the output data.
Overall, we opted for the WebSocket approach due to its advantages, which include (i) lower latency compared to HTTP connections, (ii) a lower amount of data transferred, and (iii) the wide range of supported languages, which provides the basis for various computing environments for server widgets.
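A remote executor could be structured along the lines of the following JavaScript sketch; the message types, field names, and coordinator URL are illustrative assumptions rather than the actual wire format.

// Sketch of a remote executor as a WebSocket client; the coordinator URL,
// message types, and field names are illustrative assumptions.
var socket = new WebSocket("wss://coordinator.example.org/executors");
var jobs = {}; // one widget job per mashup UUID

socket.onmessage = function (event) {
  var msg = JSON.parse(event.data);
  if (msg.type === "run") {
    // Instantiate a widget job for this mashup and execute it with the
    // parameters and input data carried by the request.
    jobs[msg.mashupId] = createJob(msg.parameters);
    var output = jobs[msg.mashupId].run(msg.input);
    // Publish the output event; the coordinator forwards it to all
    // widgets that subscribed to this job's output.
    socket.send(JSON.stringify({ type: "output", mashupId: msg.mashupId, data: output }));
  }
};

function createJob(parameters) {
  return {
    run: function (input) {
      // ... widget-specific data collection or processing ...
      return {};
    }
  };
}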

3.6.3 Hybrid protocol

Hybrid mashups are an extension of distributed mashups. They do not require that all data and processing widgets be server widgets, i.e., client widgets can be used anywhere in a hybrid mashup. The communication protocol for hybrid mashups is similar to the remote protocol. If the predecessor of a client widget is a server widget, the client widget needs to subscribe to the output data event of the server widget. If its successor is a server widget, then whenever the client widget returns output data to the coordinator, the coordinator publishes an output event so that the server widget can receive the output data.

3.7 Hybrid Mashup Patterns

This section presents five combination patterns of client widgets and server widgets. Each pattern is useful in particular data integration scenarios.

3.7.1 Collaborative Mashups

Definition

A collaborative mashup is a type of mashup application that is created and/or operated by more than one user at the same time.

[Figure 3.12 shows the collaborative mashup pattern: widgets W1 and W2, operated by different users, feed a shared server widget W; further users operate the mashup simultaneously.]

Figure 3.12: Collaborative mashup pattern

Use Case

Collaborative mashups are useful for groups of users. As soon as they agree on a data processing flow defined in a mashup, each participant can update her input data at any time so that all can immediately see the live presentation of the combined data. Tedious and repetitive manual data integration processes (e.g., data cleansing, data uploading, change notification) are encapsulated in widgets.
Consider, for example, the simple task of scheduling a meeting between users whose calendars are spread among computers, mobile phones, cloud services, etc. A widget-based collaborative workflow would allow participants to selectively contribute their calendars using server widgets such as locally executed apps or cloud-based calendar widgets. They could then simply merge their calendar widgets in a collaborative mashup to identify available timeslots.

Pattern

The Linked Widgets combination pattern for collaborative mashups is presented in Figure 3.12. It involves at least one server widget (e.g., W) and two users; the two users are responsible for their two widgets (e.g., W1 and W2), which can be either client or server widgets.
To execute the collaborative mashup, in the first step, an arbitrary participant triggers the run action in the visualization widget. This also triggers the execution of all preceding widgets (except W1 and W2), one after the other. The server widget W, however, cannot run yet because it still waits for the output data of W1 and W2. In the next step, the two users set the values of the widget parameters and run their widgets. As soon as both widgets return output data, the server widget W performs the data integration task and returns the result to its succeeding widget.

Figure 3.13: Delegating widget

According to the mashup execution protocol presented in Section 3.6, whenever W1 or W2 submits new output data, the final result is recalculated and immediately presented to every participant in the synchronized visualization widget.
A participant can leave the collaborative mashup at any time. To this end, she removes her widget from the mashup so that the input data of the server widget W is reset and her private data is removed from the mashup. Conversely, a new participant can easily join the collaborative mashup by entering the collaborative token to load the mashup and adding her private widget into the collaborative mashup editor.
Moreover, we provide each participant with the delegating widget (cf. Figure 3.13), which can be persistently connected with an instance of a server widget. When we run this delegating widget, it subscribes to the "returning output" event of the server widget instance whose token is specified in the input box. The delegating widget acts as an agent for the server widget instance, meaning that as soon as the server widget returns its new output data, the delegating widget receives the data and immediately returns the same result.
The delegating widget serves collaborative mashups as follows: (i) It first allows a participant to prepare her own data for a collaborative mashup. Rather than exposing a large volume of data, she can extract only the relevant part of the data and perform pre-processing tasks in a private mashup. She then contributes the data of that mashup to the collaborative group by using a delegating widget. This not only makes data integration more secure (as her data may contain sensitive parts that should be removed) but also speeds up the execution of the whole mashup (as the irrelevant part of the data is already removed in her private mashup). The private mashup is not a part of the collaborative mashup, but its final output data is used in the collaborative mashup via the delegating widget. (ii) The delegating widget allows a participant to hide a branch of the collaborative mashup. The Linked Widgets framework enables anybody to develop her own widget; she can then contribute it to the public community or keep it private. A private widget can be used in a collaborative mashup; it is visible to everyone in the shared mashup screen. The hiding feature is hence useful if a participant does not want to expose a private widget used in the collaborative mashup to others. Moreover, it simplifies the overall mashup; each participant should only see the relevant part of the mashup rather than the parts that she can ignore or cannot control.

[Figure 3.14 shows the persistent mashup pattern: a server widget that runs in the cloud feeds a client widget.]

Figure 3.14: Persistent mashup pattern

3.7.2 Persistent Mashups

Definition

A persistent mashup is a type of mashup application that can continuously run in the background and maintain its status and intermediate data.

Use Case

A persistent mashup can be used for data integration tasks that are typically time-consuming (e.g., statistical analysis, data analysis, and data reporting). To this end, the mashup creator composes a mashup, submits her input data, and does not have to attend to it any more. At any time, she can reopen the mashup to check the current status and result. As she does not host and run the mashup on her own device, she can manage a mashup that performs heavy calculation tasks even with a slow client device. While waiting for the calculation, she can turn off her device or do other work.

Pattern

Figure 3.14 shows the Linked Widgets combination pattern for persistent mashups. It contains a server widget placed before a visualization widget, which is a client widget. Because the server widget performs the processing tasks on the respective server rather than in the browser environment, it can persistently maintain the calculation and the intermediate mashup data. As soon as the mashup is reopened, the client widget requests the latest output data from the server widget and visualizes it. The browser hence acts as a front-end tool that shows up-to-date data from the back-end processing.

[Figure 3.15 shows the distributed mashup pattern: server widgets hosted on Devices I, II, and III feed a client widget on Device IV.]

Figure 3.15: Distributed mashup pattern

3.7.3 Distributed Mashups

Definition

A distributed mashup is a type of mashup application in which the involved widgets are hosted on distributed nodes and devices.

Use Case

Distributed mashups can, first, facilitate data integration of sensors and embedded devices. The data collection tasks run pervasively in server widgets on distributed nodes. To facilitate data integration, those server widgets can clean, formalize, and convert the data into a semantic format before sending it to a central node (which is also a server widget) where the data is aggregated before finally being visualized in a client widget. The distributed model typically involves three types of nodes: (i) embedded devices (which provide input data), (ii) a powerful server (which processes data), and (iii) a personal device such as a mobile phone, tablet, or laptop for visualization.
Distributed mashups, moreover, allow us to integrate data from different devices without the necessity of uploading the data into a data center. To enrich the data, we can flexibly integrate our local data with open data and LOD datasets by adding widgets to the mashup.

Pattern

Figure 3.15 depicts the widget combination pattern of distributed mashups. The involved server widgets can be placed at arbitrary positions in the mashup. Based on the programming languages available on the hosting device (e.g., Java, Python, Erlang, C++, etc.), we implement the respective versions of the server widgets. By adding a widget to (or removing it from) a mashup, we add the corresponding node to (or remove it from) the ad-hoc architecture. The device must be connected to the internet so that the widgets can communicate via the remote protocol.

[Figure 3.16 shows the streaming mashup pattern: a widget continuously runs and returns output data to a client widget.]

Figure 3.16: Streaming mashup pattern

3.7.4 Streaming Mashups

Definition

A streaming mashup is a type of mashup application in which data flows continuously from one widget to others in the mashup.

Use Case

Streaming mashups can be used for data monitoring use cases. While currently integrated data is presented to users, new data is constantly generated, delivered, and aggregated for presentation. Due to the variety of real-time and streaming data sources available on the web (e.g., weather, public transportation, stock quotes) and the ease of customizing mashups, everyone can build her own application to process daily data and support decision making. For example, she can compose a mashup that collects the temperature, the pressure, and the wind speed at different places (e.g., her home, her work place) every minute, based on the weather conditions of the nearest Wunderground stations (out of 140,000 stations all over the world). She can then visualize the aggregated data in a chart, which is updated every minute. This example is presented in Section 5.5.1.

Pattern

The Linked Widgets streaming mashup pattern is illustrated in Figure 3.16. At least one widget continuously runs and returns its output data. We use the term streaming widget for such a widget, which can be either a client or a server widget. A streaming client widget can be used, but the streaming data flow of the mashup stops once we close the browser; a streaming server widget should be used if we intend to make our streaming mashup persistent.
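Conceptually, a streaming client widget re-runs its execution function on a timer and pushes every new result to the coordinator, as in the following illustrative JavaScript sketch (which assumes the iframe communication helper call() from the sketch in Section 3.5; all names are hypothetical):

// Sketch of a streaming client widget: re-run every minute and publish
// the new output via the coordinator's output method (names illustrative).
var widgetId; // assigned by the coordinator via the setup method

setInterval(function () {
  fetchCurrentConditions(function (data) {
    call(window.parent, "output", [widgetId, JSON.stringify(data)]);
  });
}, 60 * 1000);

function fetchCurrentConditions(callback) {
  // ... query a weather service, e.g., the nearest Wunderground station ...
  callback({ /* temperature, pressure, wind speed */ });
}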

3.7.5 Complex mashups

The purpose of classifying hybrid mashup patterns is only to clarify and emphasize different aspects of mashups; there is no clear boundary between collaborative, persistent, distributed, and streaming mashups. We can combine these types in various ways. For example, we can construct mashups that continuously integrate streaming data from distributed sensors of multiple stakeholders.

3.8 Mashup Encapsulation

The basic idea of our approach is to foster reuse of the functionality encapsulated inside Linked Widgets. Moreover, we enable users to reuse a created mashup as a new widget without programming. Using both client and server Linked Widgets, there are two mechanisms:
1. Making use of the delegating widget (cf. Section 3.7.1 and Figure 3.13), we can use the output of an arbitrary server widget of an existing mashup as the data source for a new mashup. This not only serves collaborative use cases well but also simplifies complex mashup composition.
2. We can compose a mashup (which can be a client mashup, server mashup, or hybrid mashup) and save it as a new widget. We call this special type of widget a "mashup widget". A mashup widget operates in the same way as an ordinary widget does. The mashup widget also has an associated URI and a semantic model (which is the same as the model of the last widget in the mashup flow).
To define the user interface of the mashup widget, in the first step, all HTML inputs (and their labels) of every widget used in the mashup are automatically detected. Next, the default values and new labels for the detected fields can be set; unnecessary fields can be hidden. These configurations are associated with the annotation of the mashup widget, which is finally published as Linked Data. When a mashup widget is dragged and dropped into the mashup panel, the widget configurations are loaded to generate the mashup widget's interface.
As soon as the mashup (which contains the mashup widget) has been edited and is executed, the communication between ordinary widgets and the mashup widget proceeds as usual (cf. Section 3.6). To this end, we define the execution function of a mashup widget as the whole process of executing the mashup from which the mashup widget is derived. This allows us to have multi-level nested mashups. Figure 3.17 illustrates an example that uses a mashup widget. The execution of the mashup widget starts with the execution of W1 and W2 and ends with the execution of W4. Because W4 (the last widget of the internal mashup) is a visualization widget, the output data of the mashup widget is the input data of W4; if W4 were a processing widget, the output data of the mashup widget would be the output data of W4.

3.9 Automatic Data Integration

Today, a large and fast-growing number of ready-to-use open datasets and services are available. For an individual application or a single developer, however, it is difficult to make use of these resources; we need a collaborative environment to harness the power of the community.

[Figure 3.17 shows a nested mashup: widgets W1, W2, W3, and W4 form an internal mashup that is encapsulated as a mashup widget, which is then combined with widgets W5, W6, and W7 in an outer mashup.]

Figure 3.17: Nested mashup

The framework is not tailored towards particular datasets, but integrates data from arbitrary sources. Available widgets can also be combined with widgets that may become available in the future; currently available data can be merged with future data without modifying the existing applications. More importantly, the framework facilitates data integration in a quick and flexible manner: we can simply add or remove data sources to support new use cases. In the following two subsections, we outline how the framework supports data integrators in performing data exploration and automatic data integration tasks.

3.9.1 Semantic Terminal Matching

Because widgets collect, process, or visualize data from a source, we can consider widgets as representations of the data itself. Once a widget has been added to a mashup, semantic terminal matching allows the user to explore additional widgets that are relevant in the given context. To this end, we query for widgets that can be connected to the input and output terminal(s) of a given widget. For instance, we can determine that we are able to connect the output terminal of POI Search with the input terminal of Weather Forecast and hence integrate geospatial data from LinkedGeoData with weather data from Wunderground.
Let i and o denote the root nodes of the input and output tree models (cf. Section 3.4.3), respectively. There are three preconditions for matching input and output models:
1. the RDF classes of i and o must be identical, or the RDF class of i must be a sub-class of that of o;
2. every child of i must correspond to a child of o (i.e., the set of properties required by the input must be a subset of the properties provided by the output);
3. recursively, the data model of each input object property must match the data model of the corresponding output object property.

Listing 3.4: A SPARQL query for terminal matching

PREFIX lw: <http://linkedwidgets.org/ontologies/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT DISTINCT ?iTerminalName ?iWidget ?iJSONData ?oJSONData
WHERE {
  lw:poiSearch lw:hasWidgetModel ?oWModel .
  ?oWModel lw:hasOutput [ lw:hasName "output"^^xsd:string ;
                          lw:hasDataModel ?oDataModel ] .
  ?oDataModel a ?type .
  ?oDataModel lw:hasArrayDimension ?dimension .
  ?oDataModel lw:hasSampleData ?oJSONData .

  ?iWidget lw:hasWidgetModel ?iWModel .
  ?iWModel lw:hasInput [ lw:hasName ?iTerminalName ;
                         lw:hasDataModel ?iDataModel ] .
  ?iDataModel a [ rdfs:subClassOf ?type ] .
  ?iDataModel lw:hasArrayDimension ?dimension .
  ?iDataModel lw:hasSampleData ?iJSONData .

  FILTER NOT EXISTS {
    ?oDataModel ?property ?oValue .
    FILTER NOT EXISTS { ?iDataModel ?property ?iValue . }
  }
}

The process of terminal matching comprises two steps: preliminary matching and full matching (cf. Figure 3.18). Preliminary matching checks the first two requirements via a single SPARQL query (cf. Listing 3.4). The query, moreover, allows us to retrieve the JSON sample data associated with the input and output tree data models. Full matching can hence check the third requirement by comparing the JSON sample data of the input with that of the output. This is efficient because we do not have to recursively execute a large number of SPARQL queries to build the input and output trees, which would be time-consuming and inefficient. This pragmatic approach, however, does not allow for semantic processing during the full matching; for example, because we cannot use ontology alignment techniques to match the class (property) of an input node with that of an output node, the classes and properties must be exactly the same in a successful match.

POI Search output = {
  "@context": {
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
    "location": "http://www.w3.org/2003/01/geo/wgs84_pos#location"
  },
  "@graph": [{
    "@id": "http://example.com.sampleId",
    "@type": "http://www.geonames.org/ontology#Feature",
    "location": {
      "@id": "http://example.com.sampleId",
      "@type": "http://www.w3.org/2003/01/geo/wgs84_pos#Point",
      "lat": "float",
      "long": "float"
    }
  }]
}

Weather Forecast input = {
  "@context": {
    "lat": "http://www.w3.org/2003/01/geo/wgs84_pos#lat",
    "long": "http://www.w3.org/2003/01/geo/wgs84_pos#long",
    "location": "http://www.w3.org/2003/01/geo/wgs84_pos#location"
  },
  "@graph": [{
    "@id": "http://example.com.sampleId",
    "@type": "http://www.w3.org/2002/07/owl#Thing",
    "location": {
      "@id": "http://example.com.sampleId",
      "@type": "http://www.w3.org/2003/01/geo/wgs84_pos#Point",
      "lat": "float",
      "long": "float"
    }
  }]
}

Figure 3.18: Terminal matching (step 1: preliminary matching via a SPARQL query; step 2: full matching via JSON data matching between the output data model and the input data model)

1. The output class geo:Feature (http://www.geonames.org/ontology#Feature) is a subclass of the input class owl:Thing; both the output and the input are 1-dimensional arrays.
2. The output property set, {wgs84:location} (http://www.w3.org/2003/01/geo/wgs84_pos#location), is identical to the input property set.
3. The input JSON data fits the output JSON data, as the values of the key "location" in the two objects are the same.
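For illustration, a minimal sketch of this containment check on the JSON sample data (assuming the org.json library and plain JSON objects; a simplification of the actual string-comparison-based alignment):

// Recursively checks that the input sample JSON "fits" the output sample JSON:
// every property required by the input must be provided by the output, and
// nested object properties must match recursively. Illustrative simplification.
import org.json.JSONObject;

public class FullMatching {
    static boolean fits(JSONObject input, JSONObject output) {
        for (String key : input.keySet()) {
            if (!output.has(key)) return false;      // required property missing
            Object iVal = input.get(key);
            if (iVal instanceof JSONObject) {        // nested object property
                if (!(output.get(key) instanceof JSONObject)) return false;
                if (!fits((JSONObject) iVal, (JSONObject) output.get(key))) return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        JSONObject out = new JSONObject(
            "{\"@type\":\"geo:Point\",\"lat\":\"float\",\"long\":\"float\"}");
        JSONObject in  = new JSONObject("{\"lat\":\"float\",\"long\":\"float\"}");
        System.out.println(fits(in, out)); // true: the input fits the output
    }
}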

3.9.2 Automatic Mashup Composition

A mashup essentially represents an integration of data from multiple sources; automatic mashup composition can hence be seen as a type of automatic data integration. Despite the similarities between mashup composition and service composition, we can automate the former more easily than the latter because every widget is associated with a semantic model. Leveraging the terminal matching mechanism, we know which widgets can be connected. This allows us to design an algorithm that determines all possible ways to integrate data based on a specified list of data sources and related Linked Widgets. To design such an algorithm, we first formalize Linked Widgets and related concepts. Let W denote the set of available widgets.

W = {w_1, ..., w_m} (3.1)

Each widget w ∈ W is represented by a quadruple (I, P, O, C) with a finite set of input data models I for n input terminals, an output data model O, a configuration model C where all widget parameters are defined, and a processing function P. We have not yet added C to the semantic model of Linked Widgets and consider this as future work.

I = {I_1, ..., I_n} (3.2)

P: I_1 × ... × I_n × C → O (3.3)

Whereas processing widgets require that all elements are non-empty (i.e., I, P, O, C ≠ ∅), data widgets have no input (i.e., I = ∅) and visualization widgets have no output (i.e., P, O = ∅). Next, n_k denotes the number of input terminals of widget w_k ∈ W. O* is the set of output terminals of all widgets in W, where o^k denotes the output terminal of w_k; I* is the set of input terminals of all widgets in W.

O* = {o^1, o^2, ..., o^m} (3.4)

I* = {i_1^1, i_2^1, ..., i_{n_1}^1} ∪ ... ∪ {i_1^m, i_2^m, ..., i_{n_m}^m} (3.5)

Let W′ ⊆ W denote the set of widgets used in a complete mashup composed from W; complete in this context means that all terminals must be wired. A complete mashup is then a pair (W′, SE), where SE is a set of edges; each edge has the form (o^l, i_k^q), representing the link between the output terminal of w_l and input terminal k of w_q. We require o^l ∈ O*′ and i_k^q ∈ I*′ (where O*′ and I*′ denote the output and input terminals of the widgets in W′) such that l ≠ q, because the output terminal of a widget cannot be connected to its own input terminal. Furthermore, for each i_k^q ∈ I*′, there is exactly one corresponding pair in SE; in other words, each input terminal must be connected to exactly one output terminal. To specify the automatic mashup algorithm, we perform a pre-processing step that constructs a weighted directed graph G = (V, E) for the set of available widgets W. The vertex set V consists of all input and output terminals of all widgets.

V = O* ∪ I* (3.6)

E = IE ∪ EE (3.7)

IE = {(o^m, i_1^m, 0), ..., (o^m, i_{n_m}^m, 0) | ∀w_m ∈ W} (3.8)

EE = {(o^l, i_k^q, 1) | l ≠ q, ∀w_l, w_q ∈ W such that O^l matches I_k^q} (3.9)

The edge set E is the union of IE and EE, where IE is the set of internal edges and EE is the set of external edges. Internal edges connect the input and output terminals of the same widget; external edges represent valid connections between the output and input terminals of two widgets. To differentiate between these two types of edges, the weight of internal edges is zero, whereas the weight of external edges is one. We use the terminal matching algorithm to populate EE. The direction of external edges is from the input terminal to the output terminal; the direction of internal edges is from the output terminal to the input terminal. Due to this directionality rule, widget connections are equivalent to paths in the graph; e.g., a connection between w_1 and w_2 can be represented by the path o^2 → i_1^2 → o^1 (where o^k and i_m^k are the output terminal and the input terminal number m of widget w_k, respectively). Figure 3.19 shows an example graph of seven widgets. A minimal sketch of this graph construction is shown below the following list. We define a mashup branch as a part of a mashup that consumes (provides) data for a specific output (input) terminal. Based on that definition, we propose three variants of the auto-composition algorithm for a given set W of available widgets:

1. list all complete mashups that contain a particular widget w_m ∈ W,
2. construct all possible complete mashup branches for a particular input or output terminal of w_m ∈ W, and
3. construct all complete branches for all input terminals of w_m ∈ W.
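A minimal sketch of the pre-processing step that builds this weighted directed graph (data structures and names are illustrative assumptions, not the framework's actual code):

// Builds the graph G = (V, E): vertices are terminals; internal edges
// (output -> input, weight 0) stay within one widget, external edges
// (input -> output, weight 1) connect matching terminals of two widgets.
import java.util.*;

public class MashupGraph {
    static final class Edge {
        final String from, to; final int weight;
        Edge(String from, String to, int weight) { this.from = from; this.to = to; this.weight = weight; }
    }

    final Set<String> vertices = new HashSet<>();
    final List<Edge> edges = new ArrayList<>();

    // inputs: widget id -> its input terminal ids (empty list for data widgets);
    // outputs: widget id -> its output terminal id (absent for visualization widgets)
    void build(Map<String, List<String>> inputs, Map<String, String> outputs) {
        for (Map.Entry<String, List<String>> w : inputs.entrySet()) {
            vertices.addAll(w.getValue());
            String out = outputs.get(w.getKey());
            if (out == null) continue;                 // visualization widget: no output
            vertices.add(out);
            for (String in : w.getValue())
                edges.add(new Edge(out, in, 0));       // internal edge, weight 0
        }
        for (Map.Entry<String, List<String>> wq : inputs.entrySet())
            for (String in : wq.getValue())
                for (Map.Entry<String, String> wl : outputs.entrySet())
                    if (!wl.getKey().equals(wq.getKey()) && matches(in, wl.getValue()))
                        edges.add(new Edge(in, wl.getValue(), 1)); // external edge, weight 1
    }

    // placeholder for the terminal matching of Section 3.9.1
    boolean matches(String inputTerminal, String outputTerminal) { return false; }
}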

Figure 3.19: Example graph corresponding to a set of seven widgets

Algorithm 1 specifies the first variant, which is, without loss of generality, limited to mashups that use no more than a single instance of each widget (if we allow composed mashups to use a particular widget n > 1 times, we replicate all of its vertices n times and adapt the algorithm to avoid duplicate mashups). The auto-composition of mashups is a graph search problem. The input is the pre-constructed graph and a starting widget; we then need to identify all mashups that contain this widget. This means we have to find complete paths starting from all output and input terminals of the widget such that all involved input (output) terminals are wired with other output (input) terminals. The search algorithm traverses the graph in a depth-first manner. We build up branches for input and output terminals in backward and forward search directions, respectively. sbEdges and sfEdges are two linked lists that store all possible edges during the backward and forward search; uiVertices saves all unvisited input vertices of visited widgets; mashup stores all mashup edges during the construction process; isDead is a flag that indicates dead ends, i.e., when we cannot build up a complete mashup from the current selections. The algorithm starts by storing all edges ending with the output terminal o^m into sfEdges. Next, we keep going forward until we find the input terminal of a visualization widget.


Algorithm 1 Automatic mashup composition
Input: G(V, E), i_1^m, ..., i_k^m, o^m
Output: The set of complete mashups
Global variables: sbEdges, sfEdges, uiVertices, mashup, isDead
 1: procedure autoCompose(G(V, E), i_1^m, ..., i_k^m, o^m)
 2:   isDead ← false
 3:   uiVertices ← i_1^m, ..., i_k^m
 4:   sfEdges ← edgesEndWith(o^m)
 5:   while sbEdges is not empty or sfEdges is not empty do
 6:     if sbEdges is not empty then
 7:       e ← last edge of sbEdges
 8:       backtrack(startVertex(e))
 9:       goBackward(endVertex(e))
10:     else
11:       e ← last edge of sfEdges
12:       backtrack(endVertex(e))
13:       goForward(startVertex(e))
14:     end if
15:     if isDead is false then
16:       visitAllVertices
17:     else
18:       isDead ← false
19:     end if
20:   end while
21: end procedure

22: procedure goForward(Vertex v)
23:   Mark v as visited
24:   if we cannot go forward from v then
25:     isDead ← true
26:   else if v is input k of widget q (v ≡ i_k^q) then
27:     Add (o^q, i_k^q) to mashup            ▷ o^q: output of widget q
28:     goForward(o^q)
29:   else if v is output then
30:     S ← {e ∈ E | e = (u, v) such that u is unvisited}
31:     Add S to sfEdges to save all possibilities
32:     for e ∈ S do
33:       Remove e from sfEdges and add e to mashup
34:       goForward(startVertex(e))
35:       if isDead is true then
36:         backtrack(v)
37:         isDead ← false
38:       end if
39:     end for
40:   end if
41: end procedure

42: procedure goBackward(Vertex v)
43:   Mark v as visited
44:   if we cannot go backward from v then
45:     isDead ← true
46:   else if v is output of widget q (v ≡ o^q) then
47:     I^q ← {i_1^q, ..., i_{n_q}^q}            ▷ all inputs of widget q
48:     Add I^q to uiVertices and add {(v, i) | i ∈ I^q} to mashup
49:   else if v is input then
50:     S ← {e ∈ E | e = (v, u) such that u is unvisited}
51:     Add S to sbEdges to save all possibilities
52:     for e ∈ S do
53:       Remove e from sbEdges and add e to mashup
54:       goBackward(endVertex(e))
55:       if isDead is true then
56:         backtrack(v)
57:         isDead ← false
58:       end if
59:     end for
60:   end if
61: end procedure

62: procedure visitAllVertices
63:   for v ∈ uiVertices do
64:     if v is not visited then
65:       goBackward(v)
66:     end if
67:     if isDead is true then
68:       break
69:     end if
70:   end for
71:   if isDead is false then
72:     one complete mashup is found, save it
73:   end if
74: end procedure

75: procedure backtrack(Vertex v)
76:   Remove all edges added to mashup since we visited vertex v
77:   Clear the visited marks of the corresponding vertices
78: end procedure

We then call visitAllVertices to form mashup branches for all input terminals of visited widgets. For instance, assume that we need to compose complete mashups of w_2 (cf. Figure 3.19). Starting with o^2, we go forward to i_2^3 and o^3. From o^3, we can go either to i_1^5 or i_1^6. We try to go to i_1^5 first and temporarily finish going forward there, because i_1^5 is the input of a visualization widget. Next, we have to find connections for the two input terminals i_1^2 and i_1^3. For i_1^3, we go backward to o^7, i_1^7 and get stuck there. We backtrack on i_1^3, then go to o^4 and finish going backward from i_1^3. Next, we go backward from i_1^2 to o^1. Now we have a complete mashup of w_1, w_2, w_3, w_4, w_5; the mashup consists of six edges, i.e., (i_2^3, o^2), (o^3, i_1^3), (o^3, i_2^3), (i_1^5, o^3), (i_1^3, o^4), and (i_1^2, o^1). To search for more mashups, we backtrack on o^3 and try to go to i_1^6, and then get stuck at o^6. The algorithm ends here because both sfEdges and sbEdges are empty; we have no more options to try. The detailed steps and the values of uiVertices, sfEdges, sbEdges, and mashup in each step are presented in Appendix A.

3.10 Tag-based Automatic Mashup Composition

Automatic mashup composition is aimed at users who are already familiar with the mashup composition environment but need support in finding relevant combinations of widgets. Its main drawback is that users must have all widgets they need at hand. To simplify mashup development and ease data integration, we design a Tag-based Composition Module (TCM). The TCM targets novices who have little or no experience in mashup development; they do not know which widgets they need to compose a mashup, but they can specify the data (e.g., Points of Interest, Weather, or Transportation data) they want to collect and process. The TCM automatically locates relevant widgets working with such data and composes mashups from those widgets. The processing flow of the TCM is illustrated in Figure 3.20. Users enter text and interactively select appropriate tags from LOD resources. The TCM can then identify the mashup context and discover related widgets. Finally, it composes mashups over the detected widgets and displays them to users. We describe this process in the following.

3.10.1 Defining Mashup Context

There are several approaches (e.g., using a subset of natural language, using predefined components, or using a dedicated mashup language) to enable users to define the requirements or the context of their mashups (cf. Section 6.6). In our research, we follow a light-weight approach that makes use of our semantic Linked Widget models and the automatic mashup composition.

Figure 3.20: Processing flow of the tag-based composition module (1. define mashup context: linking text to the Linked Open Data; 2. locate related widgets: querying the linked data of widgets and mapping to the semantic models of Linked Widgets; 3. adapt and execute the auto-composition algorithm; 4. rank and return composed mashups: querying the linked data resources of mashups)

We require that the TCM is responsive, i.e., that it can compose mashups based on the users' input text in (near) real-time. To this end, we combine named entity recognition with resource tagging approaches. Users map their chosen words to available widgets through LOD resources. The mapping is based on tagging; this means free text is tagged with resources from the LOD cloud. These tags can then be used to identify corresponding widgets based on their data model. To reduce sources of errors, we use auto-completion to suggest LOD resources for tagging. Users enter a sequence of characters into an input field; once they are confident that the typed word defines the context of their information need, they invoke the TCM by pressing Ctrl + Space. From the selected word, the TCM generates a query on the LOD cloud. The entities and vocabularies (i.e., concepts and relations) discovered through this query are then returned to the users, who choose the desired resources to tag their word with. They can re-tag a word (or tag a new one) by selecting the word and pressing Ctrl + Space. When annotating widgets, we require developers to tie their widgets to LOD resources. From these annotations and the users' tags, we can perform a mapping to locate widgets related to the user context. The resource-tagging approach removes ambiguities and is therefore preferable over regular text-tagging approaches; each resource is identified by a URI and is hence unique. The approach can also leverage the value of the owl:sameAs property: even if the resource tagged to a widget (by developers) differs from the resource tagged to a mashup context (by users), we can still precisely match the widget to the context if the resources are linked via owl:sameAs. The tagging approach is a simplification of the dialog-based approach: users interactively select their tags; they do not have to answer any questions (which may be vague or difficult to understand) to define their context.

3.10.2 Tagging Techniques

We combine two tagging techniques, i.e., vocabulary tagging and entity tagging. The former is for locating widgets with respect to the concepts and relations defined in their

semantic widget models. The latter is for restricting the context of mashups; it hence filters the set of possible widgets (which are discovered via vocabulary tagging) needed to compose users' mashups. We offer two options for users to enter text: (i) They can use syntax text to describe the mashup, following two patterns: "Find A1" and "A1 r1 A2", where A1 and A2 are ontology concepts and r1 is the relation between them. Practical examples are: "Find Park", "Find Swimming Pool", and "Park near Swimming Pool". (ii) Free text is another option: users can type arbitrary words in arbitrary order, e.g., "A1 A2 A3 r1 r2", and tie these words to LOD vocabularies. The technique of tying the Ai and ri to LOD vocabularies is called vocabulary tagging. If syntax text is used, the TCM queries the Linked Data of widgets to locate widgets that output objects of class A1 for the pattern "Find A1". For the pattern "A1 r1 A2", it locates widgets whose input consists of objects of class A1 and whose output consists of objects of class A2; moreover, the input and output objects must be linked via the property r1. If users follow the free-text style, the TCM simply finds widgets whose output or input consists of objects of class Ai, or whose models contain the relation ri (i.e., widget – hasIndividualAtom – relation). The free-text option is easier to use, but less precise than the syntax text mechanism. To associate widgets and mashups with a context, we make use of entity tagging along four dimensions, i.e., where, when, who, and what. These dimensions refer to a place, a time, a person, and an object, respectively. We can apply multiple dimensions to tag a widget or mashup. To this end, we need a common dataset for users and developers to tag against; DBpedia [104], which may be considered the central hub of the LOD cloud, is a suitable option. It extracts structured information from Wikipedia and then stores and represents it as Linked Data. DBpedia supports 125 languages. As of June 2015, the English version describes 4.58 million things; it contains 1.45 million persons and 0.74 million places available for the "who" and "where" dimensions. Because these two classes (i.e., Person and Place) are the most frequently used classes in DBpedia, "who" and "where" are typically more common than "when" and "what". We can reuse the publicly available DBpedia Spotlight service [105] for named entity recognition of DBpedia knowledge. When annotating widgets, developers tie them to DBpedia entities in order to define the widget context; for instance, a widget that returns Nobel Prizes could be tied to http://dbpedia.org/resource/Alfred_Nobel and http://dbpedia.org/resource/Nobel_Prize for the "who" and "what" dimensions, respectively; a widget that returns POIs in Vienna would be tied to the Vienna entity in DBpedia. Matching user tagging entities with widget tagging entities makes the mashup results more precise, as illustrated in the example of Section 3.10.6.
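For illustration, a minimal sketch of calling a DBpedia Spotlight endpoint from Java (the endpoint URL and parameters are assumptions based on the public Spotlight REST service, not the framework's actual integration):

// Sends free text to a DBpedia Spotlight annotation endpoint and returns the
// raw JSON response, which contains DBpedia resource URIs for recognized entities.
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class SpotlightClient {
    public static String annotate(String text) throws Exception {
        String url = "https://api.dbpedia-spotlight.org/en/annotate?text="
                + URLEncoder.encode(text, "UTF-8") + "&confidence=0.4";
        HttpURLConnection con = (HttpURLConnection) new URL(url).openConnection();
        con.setRequestProperty("Accept", "application/json"); // ask for JSON output
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(con.getInputStream(), "UTF-8"))) {
            for (String line; (line = in.readLine()) != null; ) sb.append(line);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(annotate("Park near Swimming Pool in Vienna"));
    }
}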

3.10.3 Linked Widget Meta-Tagging Model

To implement the tagging techniques presented in the previous section, we add meta-tagging information (cf. Figure 3.21) to the Linked Widgets model.

Figure 3.21: Meta-tagging model of Linked Widgets (a lw:Widget is linked via lw:hasWidgetModel to its lw:WidgetModel, which is linked via lw:hasMetaAnnotation to a lw:MetaAnnotation; the lw:whoContext, lw:whatContext, lw:whenContext, lw:whereContext, lw:hasMetaInput, lw:hasMetaOutput, and lw:hasMetaRelation properties each connect the lw:MetaAnnotation to n LOD resources; lw: http://linkedwidgets.org/ontologies/)

Each widget model is linked to a lw:MetaAnnotation via the lw:hasMetaAnnotation property. The lw:whoContext, lw:whatContext, lw:whenContext, and lw:whereContext properties realize the entity tagging technique; they associate the lw:MetaAnnotation with various LOD resources. For each property, the relationship between the lw:MetaAnnotation and LOD resources is 1 : n. Similarly, vocabulary tagging is implemented using the lw:hasMetaInput, lw:hasMetaOutput, and lw:hasMetaRelation properties. As an example, consider the meta-tagging model of the POI Search widget illustrated in Figure 3.22. The widget returns all points of interest in Vienna from its geospatial input, which is an arbitrary object associated with the wgs84:location property. The widget meta-model is hence tied to the dbpedia:Vienna (http://dbpedia.org/resource/Vienna) resource via the lw:whereContext property. Because there is no restriction on the who, what, and when dimensions, all of the respective properties link the widget meta-model to the owl:Thing resource. Finally, we can add further meta-tagging annotations of the widget's input, output, and relation to the widget model. To this end, we link the widget meta-model to the dbpedia:Location, dbpedia:Place, dbpedia:Venue, dbpedia:Point, geo:Feature, and geo:nearby resources.


Figure 3.22: Meta-tagging annotation of the POI Search widget (lw:POISearch is linked via lw:hasWidgetModel to lw:modelPOISearch, which has the annotation lw:metaAnnotationPOISearch; the annotation is tied to dbpedia:Vienna via lw:whereContext, to owl:Thing via lw:whoContext, lw:whatContext, and lw:whenContext, and to the dbpedia:Location, dbpedia:Place, dbpedia:Venue, dbpedia:Point, geo:Feature, and geo:nearby resources via the lw:hasMetaInput, lw:hasMetaOutput, and lw:hasMetaRelation properties; lw: http://linkedwidgets.org/ontologies/, dbpedia: http://dbpedia.org/resource/, owl: http://www.w3.org/2002/07/owl#, geo: http://www.geonames.org/ontology#)
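For illustration, a minimal Apache Jena sketch of how such an annotation could be constructed programmatically (the assignment of resources to properties follows Figure 3.22 and is partly an assumption; the framework's actual code may differ):

// Builds the meta-tagging annotation of the POI Search widget as an RDF model.
import org.apache.jena.rdf.model.Model;
import org.apache.jena.rdf.model.ModelFactory;
import org.apache.jena.rdf.model.Resource;

public class MetaTaggingExample {
    static final String LW  = "http://linkedwidgets.org/ontologies/";
    static final String DBP = "http://dbpedia.org/resource/";
    static final String OWL = "http://www.w3.org/2002/07/owl#";
    static final String GEO = "http://www.geonames.org/ontology#";

    public static void main(String[] args) {
        Model m = ModelFactory.createDefaultModel();
        Resource annotation = m.createResource(LW + "metaAnnotationPOISearch");
        // entity tagging: restricted to Vienna on the "where" dimension only
        annotation.addProperty(m.createProperty(LW, "whereContext"), m.createResource(DBP + "Vienna"));
        annotation.addProperty(m.createProperty(LW, "whoContext"),  m.createResource(OWL + "Thing"));
        annotation.addProperty(m.createProperty(LW, "whatContext"), m.createResource(OWL + "Thing"));
        annotation.addProperty(m.createProperty(LW, "whenContext"), m.createResource(OWL + "Thing"));
        // vocabulary tagging: input, output, and relation resources
        annotation.addProperty(m.createProperty(LW, "hasMetaInput"),    m.createResource(DBP + "Location"));
        annotation.addProperty(m.createProperty(LW, "hasMetaOutput"),   m.createResource(GEO + "Feature"));
        annotation.addProperty(m.createProperty(LW, "hasMetaRelation"), m.createResource(GEO + "nearby"));
        // attach the annotation to the widget model
        m.createResource(LW + "modelPOISearch")
         .addProperty(m.createProperty(LW, "hasMetaAnnotation"), annotation);
        m.write(System.out, "TURTLE");
    }
}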

3.10.4 Locating Widgets Based on the Mashup Context and the Meta-tagging Model

Based on the user-defined mashup context and the Linked Data repository of widget meta-tagging models, we need to search for relevant widgets and build up the input of the automatic mashup composition algorithm. To this end, (i) we first use a number of SPARQL queries to locate widgets for each tagged relation in the input field. (ii) Similarly, we locate widgets that can output instances of the tagged concepts; those widgets are responsible for providing the necessary input data for the composed mashups. They can be either data widgets or transformation widgets; the latter is a special type of processing widget that transforms input instances into output instances of a different RDF class. Because transformation widgets have input terminals, to be able to complete the mashups, we need a further processing step that searches for additional data widgets that can provide input data for the transformation widgets; this is done via terminal matching. An example SPARQL query to locate widgets that can provide information about Swimming Pools in Vienna is shown in Listing 3.5; it puts two restrictions on the widgets, on the

lw:hasMetaOutput and lw:whereContext properties.

Listing 3.5: A SPARQL query to search for widgets that can provide information about Swimming Pools in Vienna

PREFIX lw: <http://linkedwidgets.org/ontologies/>
PREFIX dp: <http://dbpedia.org/resource/>
PREFIX schema: <http://schema.org/>
PREFIX dct: <http://purl.org/dc/terms/>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT DISTINCT ?w ?adr ?n ?t
WHERE {
  ?w lw:hasWidgetModel [lw:hasMetaAnnotation ?m].
  ?w schema:name ?n.
  ?w lw:type ?t.
  ?w dct:identifier ?adr.
  ?m lw:hasMetaOutput dp:Swimming_pool.
  ?m lw:whoContext owl:Thing.
  { ?m lw:whereContext dp:Vienna. } UNION { ?m lw:whereContext owl:Thing. }
  ?m lw:whatContext owl:Thing.
  ?m lw:whenContext owl:Thing.
}

We use the UNION operator for the lw:whereContext restriction because appropriate widgets are those that can output Swimming Pools either for Vienna only or for every city in the world. Finally, for each widget discovered by the tagging techniques, we use terminal matching to find possible visualization widgets. If users follow the syntax text style, only widgets that can visualize objects of class Ai (which appears in a "Find Ai" pattern) are kept. To sum up, we have now built a temporary widget collection that includes:

1. RW – widgets whose model contains the tagged relations,
2. DW – data widgets that can output instances of the tagged concepts,
3. TW – transformation widgets and ADW – their additional data widgets that should be combined together to output instances of the tagged concepts, and
4. VW – visualization widgets that can visualize the output of the discovered widgets.
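For illustration, queries such as Listing 3.5 could be executed with the Apache Jena API as in the following minimal sketch (model loading and names are illustrative assumptions):

// Runs a SPARQL SELECT query against a local Jena model of widget descriptions
// and prints each matching widget's URI and name.
import org.apache.jena.query.Query;
import org.apache.jena.query.QueryExecution;
import org.apache.jena.query.QueryExecutionFactory;
import org.apache.jena.query.QueryFactory;
import org.apache.jena.query.QuerySolution;
import org.apache.jena.query.ResultSet;
import org.apache.jena.rdf.model.Model;

public class WidgetLocator {
    // model: a Jena Model holding the semantic models of all widgets
    public static void findWidgets(Model model, String sparql) {
        Query query = QueryFactory.create(sparql);
        try (QueryExecution qe = QueryExecutionFactory.create(query, model)) {
            ResultSet results = qe.execSelect();
            while (results.hasNext()) {
                QuerySolution row = results.next();
                System.out.println(row.getResource("w") + " | " + row.getLiteral("n"));
            }
        }
    }
}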

3.10.5 Composing and Ranking Mashups

We define T as the set of all user-defined tags.

T = {t_1, ..., t_n} (3.10)

t_i can be either a relation tag or a concept tag. Let W_i be the set of widgets located for the tag t_i. Next, C is the set of all located widgets.

C = RW ∪ DW ∪ TW ∪ ADW ∪ VW (3.11)

Figure 3.23: Tag completeness checking (six widgets w_1 ... w_6 used in a mashup are matched against the five sets of widgets W_1 ... W_5 located for five tags t_1 ... t_5)

Based on the constructed widget collection C, we execute the tag-based mashup composition as specified in Algorithm 2, which relies on Algorithm 1. In the first step, we construct the graph G(V, E) for the collection. Next, we need to determine the starting terminals to prepare the remaining input of Algorithm 1. Because the composed mashups must contain at least one widget of every single set W_i discovered for each tag t_i (this ensures that all user tags are considered), the starting terminal can be the output terminal of any widget w ∈ W_i. As any w ∈ W_i can be part of our mashups, to be able to compose every possible mashup, we have to start with all of these output terminals, one after another. Therefore, to decrease the number of calls to Algorithm 1 and hence increase the performance of the algorithm, we choose the tag t_k such that the size of W_k is minimal. Because a located transformation widget should always be linked with its respective additional data widget, we mark the input terminals of the former and the output terminal of the latter as visited; this can considerably improve the efficiency of the automatic composition algorithm. We later need to add the links between these two special types of widgets into the composed mashups (Lines 15–19). For each composed mashup, we need to perform further processing to ensure that all tags are considered (Line 13). The requirement is illustrated in Figure 3.23. Suppose we have five tags (t_1 ... t_5), five respectively located widget sets (W_1 ... W_5), and a mashup that consists of six widgets (w_1 ... w_6). Each widget w_i can be an element of one or multiple widget sets (e.g., w_1 belongs to both W_2 and W_4). The mashup is valid only if we can find a matching in the graph such that the number of edges in the matching is five. A matching in a graph is "a set of pairwise non-adjacent edges; that is, no two edges share a common vertex" [106]. In our example, the mashup is valid due to the matching {(w_1, W_2), (w_2, W_1), (w_3, W_5), (w_4, W_3), (w_5, W_4)}.

Algorithm 2 Tag-based automatic mashup composition
Input:
  RW, DW, (TW, ADW), VW
  T = {t_1, ..., t_n} ← the set of all defined tags
  W = {W_1, ..., W_n} ← the map that matches each tag t_i with the respective set of located widgets W_i
Output: The set of complete mashups
 1: procedure compose(RW, DW, (TW, ADW), VW, T, W)
 2:   C ← RW ∪ DW ∪ TW ∪ ADW ∪ VW
 3:   G = (V, E) ← the constructed graph for the widget collection C
 4:   mashups ← a list that stores all composed mashups
 5:   t_k ← a tag of T such that the size of W_k is minimal
 6:   For every widget w ∈ TW, mark all of its input terminals as visited
 7:   For every widget w ∈ ADW, mark its output terminal as visited
 8:   for w ∈ W_k do
 9:     o ← the output terminal of w
10:     tmpMashups ← autoCompose(G, o)            ▷ cf. Algorithm 1
11:     for mashup ∈ tmpMashups do
12:       W′ ← the set of widgets used in mashup
13:       completed ← check if for every single tag t_i ∈ T, there exists at least one widget w′ ∈ W′ such that w′ ∈ W_i
14:       if completed is true then
15:         for w′ ∈ W′ do
16:           if w′ is a transformation widget then
17:             Add the respective data widget of w′ to the mashup, and connect the output of the data widget to the input of w′
18:           end if
19:         end for
20:         Add mashup to mashups
21:       end if
22:     end for
23:   end for
24:   Check and remove duplicated mashups            ▷ Post-processing
25:   Rank mashups
26:   return mashups
27: end procedure

The tag completeness check thus turns out to be a maximum-cardinality matching problem: the largest possible number of edges of a matching should be equal to the number of defined tags. We can hence use the blossom algorithm [107] introduced by Jack Edmonds to remove irrelevant composed mashups. Finally, we check for and remove duplicated mashups. Because the algorithm can result in a large number of composed mashups, it is useful to rank the results as a final step. The ranking score of a mashup is calculated as the average score of all input and output links between member widgets; the score of a link is the total number of occurrences of that link in all existing mashups of the framework. Mashups with a high ranking score appear at the top of the list returned to users. Because the framework is open for anyone to compose mashups, the longer the framework operates, the more precise the ranking algorithm will become.
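Because the widgets and the tags form a bipartite graph here, this check can also be implemented with a standard augmenting-path bipartite matching rather than the general blossom algorithm; a minimal illustrative sketch (not the framework's actual implementation):

// Tag completeness as maximum bipartite matching: widget j can "cover" tag i
// if widget j belongs to the located set W_i. The mashup is valid iff every
// tag can be covered by a distinct widget of the mashup.
import java.util.Arrays;

public class TagCompleteness {
    // covers[i][j] == true iff widget j is an element of the widget set of tag i
    static boolean isComplete(boolean[][] covers, int numTags, int numWidgets) {
        int[] matchedTag = new int[numWidgets];          // widget -> tag (or -1)
        Arrays.fill(matchedTag, -1);
        for (int tag = 0; tag < numTags; tag++) {
            if (!augment(tag, covers, matchedTag, new boolean[numWidgets]))
                return false;                            // some tag stays uncovered
        }
        return true;
    }

    static boolean augment(int tag, boolean[][] covers, int[] matchedTag, boolean[] seen) {
        for (int w = 0; w < matchedTag.length; w++) {
            if (covers[tag][w] && !seen[w]) {
                seen[w] = true;
                // take widget w if it is free, or re-route the tag currently using it
                if (matchedTag[w] == -1 || augment(matchedTag[w], covers, matchedTag, seen)) {
                    matchedTag[w] = tag;
                    return true;
                }
            }
        }
        return false;
    }
}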

3.10.6 Example

To illustrate our approach, consider a small set of widgets: (i) Map Pointer – a data widget that returns one or multiple points defined by users on the map, (ii) Vienna POI – a data widget that returns different types of POIs located in Vienna, (iii) Weather Condition – a processing widget that adds weather conditions collected from Wunderground (http://wunderground.org/) to each of its input locations, (iv) Flickr Image – a processing widget that adds sample Flickr images for each of its input locations, (v) Air Quality – a processing widget that adds an air quality index value for each of its input locations, (vi) Geo Merger – a processing widget that combines two input arrays of locations based on a distance constraint, (vii) POI Search – a processing widget that queries LinkedGeoData (http://linkedgeodata.org/) for POIs near the input locations, and (viii) Map Viewer – a visualization widget that visualizes locations and associated information (i.e., descriptions, images, charts).

As an example, consider the following query: "Park near Swimming Pool", with Park, near, and Swimming Pool tagged with http://dbpedia.org/resource/Park, http://www.geonames.org/ontology#nearby, and http://dbpedia.org/resource/Swimming_pool, respectively. The TCM performs the resource matching task and discovers the Geo Merger and POI Search widgets for the relation near. There are two widgets (i.e., Vienna POI and POI Search) that can output Parks and Swimming Pools. Suppose that the user has not yet used entity tagging to restrict the mashup context. This means that the Parks and Swimming Pools that the user is searching for can be located in any city; thus the Vienna POI widget is irrelevant. At this point, we have Geo Merger and POI Search. Because so far we have processing widgets only, we use terminal matching to discover additional data widgets (i.e., Vienna POI and Map Pointer) to provide input data; otherwise we cannot compose a complete mashup. Since the Vienna POI is still irrelevant, we keep only the Map Pointer widget. Finally, we take the Map Viewer for visualization and have four widgets to compose mashups.


Figure 3.24: Example of tag-based automatic mashup composition (Mashup 1: Vienna POI → POI Search → Map Viewer; Mashup 2: two Map Pointer → POI Search branches merged by Geo Merger → Map Viewer; Mashup 3: two Vienna POI branches merged by Geo Merger → Map Viewer; Mashup 4: Vienna POI and Map Pointer → POI Search merged by Geo Merger → Map Viewer)

Assume the user now specifies a context such as "in Vienna" (http://dbpedia.org/resource/Vienna) for the where dimension of her desired mashup. Five relevant widgets are identified: Vienna POI, Map Pointer, Geo Merger, POI Search, and Map Viewer. From this set, the TCM automatically composes four meaningful mashups, as shown in Figure 3.24. The fourth mashup is displayed in full detail; it is used to integrate data from Google Maps, Vienna Open Government Data, and LinkedGeoData. There are 29 pairs of parks and nearby swimming pools shown on the map.


CHAPTER 4

Computational Experiments

In this chapter, we report on the results of the computational evaluation of the terminal matching (cf. Section 4.1) and automatic mashup composition algorithms (cf. Section 4.2), in order to study the performance characteristics of these algorithms and compare the experimental results with the theoretical model. We have not carried out computational experiments on the tag-based automatic mashup composition, because that algorithm relies heavily on the automatic mashup composition algorithm, and we would need a real deployment of the framework with a large number of mashups to rank the results. Producing synthetic input data for this algorithm is a challenging issue, and we leave it for future work.

Testing environment To conduct the experiments, we use a 64-bit Windows 7 Enterprise computer with an Intel(R) Core(TM) i5-3470 CPU @ 3.20 GHz processor and 8.00 GB of DDR3 RAM.

4.1 Terminal Matching

4.1.1 Goal of the Experiments

The process of terminal matching comprises two steps, i.e., preliminary matching and full matching. Pérez et al. [108] show that the evaluation of general SPARQL queries is PSPACE-complete (and NP-complete for restricted fragments). Because preliminary matching is based on a SPARQL query (cf. Listing 3.4) executed in the Apache Jena (https://jena.apache.org/) engine, it is an exponential time and polynomial space algorithm. Assume that after performing the preliminary matching step, we obtain n candidate models that need to be exactly matched with the querying model. To this end, in the full matching, we align every property and class of each model with that of the querying model, using string comparison (cf. Section 3.9.1).


Figure 4.1: Experimental design of terminal matching (parameters N, X, and R; the preliminary matching and full matching steps are measured with respect to CPU utilization, memory consumption, and total CPU time)

Full matching is a polynomial time and space algorithm. The purpose of our experiments is to analyze the runtime characteristics and resource requirements of the terminal matching algorithm and to compare the experimental results with the theoretical complexity.

4.1.2 Experimental Setup

Factors that affect the performance of the terminal matching algorithm are: (i) the total number N of available widgets, (ii) the rate R of successful matches between input and output models, and (iii) the complexity X (defined in the following) of the input and output models of widgets. The complexity of an input/output tree model (cf. Section 3.4.3) can be specified by the depth of the tree, the node branching factor (i.e., the number of children of a node), and/or the total number of involved nodes and edges. In our experiments, we define X (a natural number) as the complexity of a tree if and only if (i) the height of the tree is X, and (ii) the root node has exactly X children. Moreover, to avoid excessive expansion of the tree, the branching factor of every node but the root is one.

As illustrated in Figure 4.1, we evaluate the effect of each parameter (i.e., N, X, and R) on the terminal matching algorithm with respect to (i) CPU utilization, (ii) total CPU time, and (iii) memory consumption. Because the amount of memory consumed by the algorithm continuously changes during execution, we observe the total amount of used memory every 0.1 second. We analyze the CPU utilization and the total CPU time of the preliminary matching and the full matching separately and can hence evaluate the contribution of each step to the overall process. For memory consumption, however, we have to combine the two steps and observe the memory consumed by the whole process, because they share the same allocated memory in the same thread. To evaluate the effect of X and R on the performance of the algorithm, we require that (i) every synthetic input/output model shares the same complexity X, and (ii) for a given input (output) model, there are exactly M other matched output (input) models, where M = N × R.

Figure 4.2: Sample input and output models of synthetic widgets (an original output model O and two sample input models derived from it by deleting one node each)

These requirements ensure that all widgets are equal and that we can perform the terminal matching algorithm on an arbitrary widget input/output model without loss of generality. During our terminal matching experiments, rather than working directly with widgets, we match their terminal models only. A model can belong to a widget of an arbitrary type without affecting the algorithm; thus we do not have to differentiate between data, processing, and visualization widgets and can synthetically generate processing widgets only. Based on the discussion above, we generate a synthetic test set of widgets, specified by a triple (N, X, R), as follows:

1. We do not generate widgets separately, but generate a set of (M + 1) processing widgets simultaneously. Each input (output) model of a widget of this set will match exactly the M output (input) models of the M other widgets. To this end, we first generate an original output model with complexity X and take it as the output model of all (M + 1) widgets; while constructing the output tree model, we use different classes and properties for each node and edge. Next, we derive (M + 1) input models from the output model by deleting a random node in a random branch. This deletion method ensures that each derived input tree model matches the output tree model. Figure 4.2 illustrates a sample output model O and two input models derived from it, i.e., I1 and I2, which are generated by

deleting the nodes C11 and C17, respectively.

2. To generate N widgets, we perform the above step n times, where n = N/(M + 1). Because the output model generated in each iteration step uses completely different classes and properties, an input model derived in one iteration step never matches the output model generated in any other iteration step. This means the input model of a widget matches exactly the M output models of the M widgets generated in the same iteration step.

3. When semantically annotating the generated widgets and their input and output models, we also add extra information (e.g., the name and the address of the widgets) to make the synthetic dataset more realistic. Finally, we store the data into a local file that is reloaded in the experiments to evaluate the algorithm. To this end, we use the Jena library (https://jena.apache.org/) for semantic processing.

Parameter Values To carry out the experiments, we need to decide on the values of the triple (N, X, R) and generate the respective widget sets. To this end, we take a snapshot of the current prototype implementation of the framework (cf. Chapter 5); as of November 01, 2015, there are 43 widgets, the average complexity of their input/output models is around 5, and the model-matching rate is about 20%. As the RAM of the testing PC is limited to 8 GB, we do not generate sets of more than 10,000 widgets; the complexity of the input/output tree model X is limited to 20 and the matching rate R is limited to 0.5. The semantic models of all synthetic widgets generated for (N, X, R) = (10000, 20, 0.5) already comprise more than 12 million triples. We use the following values for the three parameters:

N ∈ {1000, 2000, 4000, 6000, 8000, 10000} (4.1)

X ∈ {1, 5, 10, 15, 20} (4.2)

R ∈ {0.1, 0.2, 0.3, 0.4, 0.5} (4.3)

We conduct three experiments, namely the (i) N-experiment, (ii) X-experiment, and (iii) R-experiment. In each experiment, we change the value of one parameter while keeping the other parameters fixed; for example, in the N-experiment, we change the value of N and use fixed values of X and R. To this end, we use (8000, 10, 0.2) as the standard values of (N, X, R): 8,000 widgets is a reasonable number, not so large that the respective semantic models would exhaust the memory of the testing PC, and we assume that 10 and 0.2 are the average values of X and R, respectively. For each value of the (N, X, R) triple, we run the algorithm ten times and calculate the median to ensure the reliability of the experiment.

4.1.3 Results

The final results are presented as follows:


1. N-experiment: Figure 4.3 presents the total CPU time of the preliminary matching and the full matching for each N ∈ {1000, 2000, 4000, 6000, 8000, 10000}; Figure 4.6a visualizes the memory consumed by the whole process during the execution time.
2. X-experiment: Figure 4.4 presents the total CPU time of the preliminary matching and the full matching for each X ∈ {1, 5, 10, 15, 20}; Figure 4.6b visualizes the memory consumed by the whole process during the execution time.
3. R-experiment: Figure 4.5 presents the total CPU time of the preliminary matching and the full matching for each R ∈ {0.1, 0.2, 0.3, 0.4, 0.5}; Figure 4.6c visualizes the memory consumed by the whole process during the execution time.

The box plots of the experiment data are presented in Appendix B. Based on these results, we draw a number of conclusions in the following.

CPU utilization In every experiment, the algorithm always uses approximately 100% of one CPU core. As this yields no further insight, we provide the detailed box plots in Appendix B.

Total CPU time The total CPU time of the full matching is roughly a hundred times smaller than that of the preliminary matching in every experiment. This is because the preliminary matching has to process a large amount of data to obtain the list of potentially matching models for a given terminal; the full matching then only performs further processing on these models to confirm the match. When we increase the number of widgets from 1,000 to 10,000, we witness exponential growth in the total CPU time of the preliminary matching and linear growth in the total CPU time of the full matching (cf. Figure 4.3). Conversely, we observe linear growth in the preliminary matching's total CPU time and polynomial growth in the full matching's total CPU time when we increase the complexity of the widget model from 1 to 20 (cf. Figure 4.4). Meanwhile, the total CPU time of both steps rises linearly when we increase the input-output matching rate from 0.1 to 0.5 (cf. Figure 4.5). With a large number of widgets, the algorithm can take a long time to complete the task (e.g., approximately 200 seconds for N = 8000, X = 10, R = 0.5, as shown in Figure 4.5). To overcome this issue, we can use a caching mechanism: we associate each terminal with a list of matched terminals; each time a new widget is added to the framework, we run the algorithm for the input (output) terminal of that widget, save the result, and update the matching lists of all other terminals. As this matching data may be large, we can store it in secondary storage.
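A minimal sketch of such a cache (illustrative names and structures, not the framework's actual implementation):

// Each terminal keeps the set of terminals it matches; the cache is updated
// incrementally whenever a new widget is registered with the framework.
import java.util.*;

public class TerminalMatchCache {
    // terminal URI -> URIs of matched terminals
    private final Map<String, Set<String>> matches = new HashMap<>();

    /** Called once when a new widget is added; newTerminals are its input and
     *  output terminals. Runs the matching algorithm once per new terminal. */
    public void register(Collection<String> newTerminals) {
        for (String t : newTerminals) {
            Set<String> found = matchesOf(t);        // run terminal matching once
            matches.put(t, found);
            for (String other : found)               // update existing entries
                matches.computeIfAbsent(other, k -> new HashSet<>()).add(t);
        }
    }

    public Set<String> lookup(String terminal) {
        return matches.getOrDefault(terminal, Collections.emptySet());
    }

    // placeholder for the preliminary + full matching of Section 3.9.1
    private Set<String> matchesOf(String terminal) {
        return new HashSet<>();
    }
}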

Memory consumption In all experiments, the consumed memory periodically increases and decreases between two thresholds during the execution time. When we increase the number of widgets or the complexity of the widget model, the upper threshold increases steadily (cf. Figures 4.6a and 4.6b). Meanwhile, as we increase the input-output matching rate, the threshold remains roughly constant (cf. Figure 4.6c).

Figure 4.3: Terminal matching: total CPU time as a function of the number of widgets (matching rate R = 0.2 and model complexity X = 10); (a) preliminary matching, (b) full matching

Figure 4.4: Terminal matching: total CPU time as a function of the complexity of the widget model (number of widgets N = 8000 and matching rate R = 0.2); (a) preliminary matching, (b) full matching

Figure 4.5: Terminal matching: total CPU time as a function of the input-output matching rate (number of widgets N = 8000 and model complexity X = 10); (a) preliminary matching, (b) full matching

Figure 4.6: Terminal matching: memory consumption; (a) for increasing number of widgets (R = 0.2, X = 10), (b) for increasing complexity of the widget model (N = 8000, R = 0.2), (c) for increasing input-output matching rate (N = 8000, X = 10)

To sum up, N contributes most to the total CPU time of the algorithm; N and X have a significant effect on the memory consumption. The experiments show results consistent with the theoretical model (i.e., the preliminary matching is NP and the full matching is P). To improve the algorithm, we need to focus on the preliminary matching, because its total CPU time is roughly a hundred times longer than that of the full matching in every experiment.

4.2 Automatic Mashup Composition

4.2.1 Goal of the Experiments

To evaluate the complexity of the automatic mashup composition, consider a widget collection that consists of one data widget, one visualization widget, and n processing widgets. In this collection, we can link the data widget to all processing widgets, and link an arbitrary processing widget to the visualization widget as well as to any other processing widget. Thus the total number of links is n + n + n(n − 1) = n(n + 1). An example is illustrated in Figure 4.7, where n = 3. The total number N of mashups that can be composed for this widget collection is calculated in Equation 4.4: mashups are categorized into n groups, where group k comprises mashups that use exactly k processing widgets and hence contains P(k, n) mashups, with P(k, n) the number of k-permutations of n (cf. Figure 4.7).

N = P(1, n) + P(2, n) + ... + P(n, n), where P(k, n) = n!/(n − k)! (4.4)

For example, for n = 3 (cf. Figure 4.7), N = 3 + 6 + 6 = 15. As P(n, n) = n!, the total number of composed mashups is greater than the factorial of the size of the input data. Discovering, storing, and ranking all of these mashups is hence non-trivial and requires an exponential time and space algorithm. In the worst case, Algorithm 1 introduced in Section 3.9.2 is an EXPTIME and EXPSPACE algorithm. In general, its performance depends on (i) N^d, the number of data widgets, (ii) N^p, the number of processing widgets, (iii) N^v, the number of visualization widgets, and (iv) R, the input-output matching rate (which determines the density of the constructed graph). The purpose of our experiments is to analyze the effect of each parameter (i.e., N^d, N^p, N^v, and R) on the runtime characteristics and resource requirements of the algorithm (cf. Figure 4.8).

4.2.2 Experimental Setup

To generate the synthetic graph that serves as input to the automatic mashup composition, we first generate a list of N^d data widgets, N^p processing widgets, and N^v visualization widgets. Next, we create one input vertex and one output vertex for each visualization widget and data widget, respectively. Even though a processing widget can have multiple input terminals, in practice we usually create processing widgets with only one or two input terminals, because an input terminal can accept multiple links from other output terminals.

Figure 4.7: Example widget collection (d: data widget, v: visualization widget, p_i: processing widget; for n = 3 processing widgets, the groups contain P(1, 3), P(2, 3), and P(3, 3) mashups, where P(k, n) is the number of k-permutations of n)

Figure 4.8: Experimental design of automatic mashup composition (parameters N^d, N^p, N^v, and R; measures: CPU utilization, memory consumption, and total CPU time)

To this end, in a random manner, we create one or two input vertices for each processing widget; we then create an output vertex and link it with the input vertex(es) using the internal edge(s) (cf. Section 3.9.2). In the next step, we establish the external links between input and output vertices of different widgets, with respect to the parameter R. Because visualization widgets do not have any output vertex, the total number of output vertices is (N^p + N^d). To achieve the rate R, for each generated input vertex, we randomly choose M out of the (N^p + N^d) output vertices, where M = (N^p + N^d) × R. Each output vertex has the same probability of being included among the M chosen vertices.

Figure 4.9: Growth in the number of mashups (N^d = 5, N^p = 10, N^v = 5); as the number of output vertices that match each input vertex grows from 1 to 11, the number of composed mashups grows from 1, 15, 80, 487, 4,632, 10,064, 45,774, 91,695, and 217,466 to 456,875 and 1,159,108

For example, assume we need to take 3 out of the 5 elements {o1, o2, o3, o4, o5}: if we take {o5, o3, o1} in the first turn, the two remaining elements (i.e., o2 and o4) take priority in the second turn. As we need one more element, we randomly select one of o5, o3, or o1. Suppose we select o3 and obtain {o2, o4, o3} for the second turn; we are then only allowed to take 3 elements out of {o1, o2, o4, o5} in the third turn. To this end, we use the Colt project (https://dst.lbl.gov/ACSSoftware/colt/), which provides a set of libraries for technical computing in Java. It allows us to generate a deterministic and reproducible series of random numbers based on a seed, using a large variety of probability distributions (e.g., Poisson, Gamma, Hyperbolic, or Uniform). We conduct four experiments to evaluate the effect of each parameter (i.e., N^d, N^p, N^v, or R) on the algorithm, with respect to CPU utilization, total CPU time, and memory consumption. In each experiment (e.g., the R-experiment), we change the value of one parameter (e.g., R) and hold the other parameters constant.
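A minimal sketch of this balanced selection scheme (using java.util.Random for brevity, whereas the experiments themselves use Colt's reproducible generators; all names are illustrative):

// Selects m distinct output vertices per turn, preferring the vertices chosen
// least often so far, with random tie-breaking among equally used vertices.
import java.util.*;

public class BalancedSelector {
    private final Random rnd = new Random(42);          // fixed seed: reproducible
    private final Map<String, Integer> usage = new HashMap<>();

    public List<String> select(List<String> vertices, int m) {
        List<String> pool = new ArrayList<>(vertices);
        Collections.shuffle(pool, rnd);                 // random tie-breaking
        pool.sort(Comparator.comparingInt(v -> usage.getOrDefault(v, 0)));
        List<String> chosen = new ArrayList<>(pool.subList(0, m));
        for (String v : chosen) usage.merge(v, 1, Integer::sum);
        return chosen;
    }

    public static void main(String[] args) {
        BalancedSelector s = new BalancedSelector();
        List<String> outputs = Arrays.asList("o1", "o2", "o3", "o4", "o5");
        System.out.println(s.select(outputs, 3));       // e.g., [o5, o3, o1]
        System.out.println(s.select(outputs, 3));       // the two unused vertices take priority
    }
}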

Parameter Values Given even a small set of widgets, the number of composed mashups can be very large. Figure 4.9 illustrates the exponential growth in the number of composed mashups for a set of 5 data widgets, 10 processing widgets, and 5 visualization widgets as we increase the number of matched output vertices per input vertex. Our assumption is that the number of processing widgets is typically twice the number of data widgets (or visualization widgets); thus we choose (5, 10, 5) as the values of (N^d, N^p, N^v) in the R-experiment. This small set of widgets allows us to observe the behavior of the algorithm until it has completed, even if the input-output matching rate is very high.


Figure 4.10: Automatic mashup composition: total CPU time and memory consumption for increasing input-output matching rate (N^d = 5, N^p = 10, N^v = 5); (a) total CPU time, (b) memory consumption

Meanwhile, in the N^d, N^p, and N^v experiments, because the running time is too long, we only observe the algorithm until 20,000 mashups have been composed. It is reasonable to stop the experiments there, because 20,000 mashups are already far too many for users to try manually. In these experiments, the standard values of the four parameters are: N^d = 50, N^p = 100, N^v = 50, R = 0.2.

4.2.3 Results

In the following, we present the results of our four experiments and draw a number of conclusions. The box plots of the experiment data are presented in Appendix C.

R-experiment: Figure 4.10 shows the total CPU time and the memory consumption of the algorithm over an increasing number M (M = (N^p + N^d) × R = 15 × R) of matched output vertices per input vertex.

N^d, N^p, and N^v experiments: Figures 4.11, 4.12, and 4.13 illustrate the total CPU time of the algorithm over an increasing number of data, processing, and visualization widgets, respectively; the total CPU time is measured until 20,000 mashups have been composed. Figure 4.14 shows the memory consumption in the three experiments; the memory is observed every 0.1 second.

Figure 4.11: Automatic mashup composition: total CPU time (measured until 20,000 mashups have been composed) as a function of the number of data widgets (N^p = 100, N^v = 50, R = 0.2)

Figure 4.12: Automatic mashup composition: total CPU time (measured until 20,000 mashups have been composed) as a function of the number of processing widgets (N^d = 50, N^v = 50, R = 0.2)

Figure 4.13: Automatic mashup composition: total CPU time (measured until 20,000 mashups have been composed) as a function of the number of visualization widgets (N^d = 50, N^p = 100, R = 0.2)

Figure 4.14: Automatic mashup composition: memory consumption (observed until 20,000 mashups have been composed); (a) for increasing number of data widgets (N^p = 100, N^v = 50, R = 0.2), (b) for increasing number of processing widgets (N^d = 50, N^v = 50, R = 0.2), (c) for increasing number of visualization widgets (N^d = 50, N^p = 100, R = 0.2)

CPU utilization The CPU utilization in every experiment is around 100% most of the time. As this yields no further insight, we provide the detailed box plots in Appendix C.

Total CPU time The R-experiment shows that our automatic mashup composition algorithm is an exponential time algorithm (cf. Figure 4.10). On the other hand, as we increase the number of data, processing, or visualization widgets, the time required to obtain 20,000 results increases linearly. The line in the N^p experiment is much steeper than those in the N^d and N^v experiments, because we have to spend more time searching along wrong routes and backtracking afterwards.

Memory consumption In each experiment (e.g., the N^p-experiment), when we change the respective parameter (e.g., N^p = 20, 60, or 100), the shape of the memory consumption over time looks quite similar (cf. Figure 4.14). The memory mainly serves to store the sfEdges and sbEdges lists (cf. Algorithm 1); these lists can be empty or contain a large number of elements during the execution time. As a result, the consumed memory can increase to 2,500 MB or drop down to nearly 0 MB, as seen in the graphs.

To sum up, until 20,000 mashups have been composed, the four parameters N^d, N^p, N^v, and R contribute equally to the memory consumption of the algorithm; on the other hand, N^p and R have a major effect on the total CPU time. As expected, the results confirm that the automatic mashup composition is an exponential time algorithm.

Summary of the Experiments The experimental results show that automatic terminal matching and automatic mashup composition can be used in practice with regard to their total CPU time. We can further improve their performance by using caching mechanisms. If the widget matching rate is low, automatic mashup composition produces positive results with only a small number of mashups returned, making it easy for a user to locate the mashup she needs. However, as the widget matching rate increases, the number of composed mashups grows exponentially. Even though all mashups are syntactically correct, because all links among widgets in each mashup are valid, many mashups can be meaningless. To address this issue, we can use heuristic search rather than depth-first search in the automatic mashup composition, which is a graph search problem (cf. Section 3.9.2): from a terminal (represented by a vertex in the graph), we choose the next terminal to visit in such a way that the link between them has the highest score; the score of a link is the total number of occurrences of that link in all existing mashups of the framework (see the sketch below). We consider this heuristic technique future work for improving the algorithm, because it requires a real deployment of the framework with a large number of mashups.
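To make the proposed heuristic concrete, the following minimal Java sketch selects the next terminal to visit by link score. Terminal, Link, and linkUsageCounts are hypothetical placeholders rather than the framework's actual types; in practice, the usage counts would be derived from the mashup repository.

    import java.util.Comparator;
    import java.util.List;
    import java.util.Map;

    // Sketch of the proposed link-scoring heuristic. Terminal and Link
    // stand for the graph vertices and edges of the composition algorithm;
    // linkUsageCounts would be computed from all stored mashups.
    final class HeuristicTerminalChooser {

        /** Score of a link = number of occurrences in existing mashups. */
        private final Map<Link, Integer> linkUsageCounts;

        HeuristicTerminalChooser(Map<Link, Integer> linkUsageCounts) {
            this.linkUsageCounts = linkUsageCounts;
        }

        /** Among all valid outgoing links, pick the most frequently used one. */
        Link chooseNext(Terminal current, List<Link> candidateLinks) {
            return candidateLinks.stream()
                    .max(Comparator.comparingInt(
                            (Link l) -> linkUsageCounts.getOrDefault(l, 0)))
                    .orElse(null); // no candidate left: backtrack
        }
    }

    // Placeholder types for widget terminals and terminal-to-terminal links.
    final class Terminal { /* ... */ }
    final class Link { /* ... */ }

With this selection rule, frequently reused links are explored first, so meaningful mashups tend to be composed before rarely used, potentially meaningless combinations.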

CHAPTER 5 Prototype Implementation of the Framework

In this chapter1, we present the prototype implementation of the conceptual framework, which is available at http://linkedwidgets.org. We first discuss architectural design considerations in Section 5.1. We outline the key components of the prototype in Section 5.2. We then provide implementation details and reflect on the lessons learned while implementing the framework in Section 5.3. Next, in Section 5.4, we show example applications in the geospatial context. Finally, we illustrate the use of hybrid mashups by means of two practical use cases in Section 5.5.

5.1 Architectural Design Considerations

When defining the architecture for our mashup framework, we set out to follow three essential design principles: (i) openness, (ii) connectedness, and (iii) reusability. First, a framework that aims to foster the widespread use of open data cannot tap its full potential unless it is itself open. It should follow an open architecture that enables arbitrary developers and end users to contribute and share their work with the open data community. An example of the benefits of openness is LinkedGeoData,2 which uses information collected by the OpenStreetMap3 project and makes it available as an RDF knowledge base according to the Linked Data principles [17]. It allows users to directly edit displayed resources in the map view, which simplifies the process of extending data quantity and improving data quality. Similarly, we encourage developers to implement and add new Linked Widgets to the framework to enable new combinations of open data

1Parts of this chapter also appear in [2, 3, 4, 6]. The author of this thesis is also the lead author of these papers. 2http://linkedgeodata.org/About (accessed Nov. 01, 2015) 3http://www.openstreetmap.org/ (accessed Nov. 01, 2015)

sources. Even end users can create a new widget from composed mashup applications without any programming and contribute it to the community. Next, we design the architecture around the idea of connectedness, which is implemented via two concepts, i.e., data connection and functionality connection. Because there are many related open data sources, applications should not restrict themselves to a small number of data sources. Instead, combining two or more data sources and enriching the data with additional value creates exciting opportunities. From the openness principle, we derive the functionality connection and the reusability principle. Anyone can contribute new functionality to an application; however, these contributions should not be isolated from each other. It should be possible to connect and reuse them in an effective and efficient manner. Our mashup framework supports reuse in four ways: (i) to compose mashups and integrate data, users can creatively combine Linked Widgets from different developers; (ii) they can reuse mashups from others, but change the parameters of the constituent widgets; (iii) they can reuse a mashup as a new widget; and (iv) based on available widgets, developers can implement new widgets to support new use cases. Another architectural design consideration for applications that integrate data from multiple sources is to decide where the data processing task should be performed. The alternatives are to either perform it locally in the client application or remotely on the server side. Because a server hosting high-quality datasets can easily become overloaded with too many client requests, we should make use of client resources whenever possible. An example is Linked Data Fragments [109], in which the client itself executes complex SPARQL queries after receiving the data fragments corresponding to its triple patterns from the server. In the Linked Widgets framework, data processing can be done on the fly in the client's browser. Moreover, to tackle additional types of data, i.e., stream data, real-time data, big data, and data requiring long-running processing, we consider widgets as a user interface on the client, while the actual data retrieval and processing tasks could instead be performed on the server (cf. Section 3.3.2).

5.2 Key Components

The prototype implementation of the framework provides (i) a tool that supports developers in creating and annotating widgets, (ii) a tool for users to locate relevant widgets, and (iii) a drag-and-drop collaborative mashup editor. We describe each tool in the following.

5.2.1 Collaborative Mashup Editor The mashup editor (cf. Figure 5.1) is the core component of the framework; it allows users to collaboratively compose, publish, and share their mashup applications. From a selected widget collection on the left-hand side, users drag and drop a widget item into the editor to create an instance of that widget. Users can then wire the input of a widget to the output of another one and thus build up a data-processing flow.


Figure 5.1: Web-based collaborative mashup editor

Once a widget has been added to a mashup, semantic terminal matching allows users to explore additional widgets that are relevant in the given context. Terminal matching is available in the user interface via a click on the question mark symbol that appears when hovering over a terminal. Making use of the widget semantic models, we query for widgets that can be connected to the input and output terminal(s) of a given widget. The automatic mashup composition module is also included in the editor. It can use all widgets in the current collection and discover all possible ways to construct complete mashup branches starting from any input or output terminal of an arbitrary widget. Multiple users can collaboratively edit the same mashup. Each mashup being edited is assigned a UUID, which a user can send to other people; this UUID can be used to access and collaboratively create and edit the mashup. All operations, such as adding/removing a widget to/from the mashup, connecting two widgets, or resizing a widget, are propagated and synchronized in all editors sharing the same UUID. We make use of the Web Application Messaging Protocol (WAMP) WebSocket framework with its publish/subscribe model to implement this feature (see the sketch below). The combination of server widgets and the collaborative editor has great potential for collaborative data integration across various devices. Each collaborator can use their own widget collection that may contain private server widgets. These private widgets are black boxes to others and work as data collectors; they can, for example, selectively provide users' private data from their mobile phones, databases, cloud services, etc. for a shared data integration task. This approach gives users full control over the output data of their private widgets. As soon as they stop a widget, their data is no longer shared.
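The following minimal Java sketch illustrates the topic-per-UUID idea behind the synchronization. PubSubSession is a hypothetical thin wrapper, not the actual WAMP client API, and the operation encoding is illustrative only.

    import com.google.gson.JsonObject;

    // Sketch of propagating editor operations over WAMP publish/subscribe.
    // PubSubSession is a hypothetical wrapper around the real WAMP client;
    // only the topic-per-mashup idea is essential.
    final class MashupSync {

        interface PubSubSession {
            void publish(String topic, JsonObject payload);
            void subscribe(String topic, java.util.function.Consumer<JsonObject> handler);
        }

        private final PubSubSession session;
        private final String topic; // one topic per shared mashup, named by its UUID

        MashupSync(PubSubSession session, String mashupUuid) {
            this.session = session;
            this.topic = "mashup." + mashupUuid;
            // Every editor sharing the UUID receives all operations.
            session.subscribe(topic, this::applyRemoteOperation);
        }

        /** Called locally when the user adds, removes, connects, or resizes a widget. */
        void broadcast(String operation, String widgetId) {
            JsonObject op = new JsonObject();
            op.addProperty("op", operation);   // e.g. "addWidget", "connect", "resize"
            op.addProperty("widget", widgetId);
            session.publish(topic, op);
        }

        private void applyRemoteOperation(JsonObject op) {
            // Replay the operation in the local editor view (omitted).
        }
    }

Because every editor both publishes its own operations and subscribes to the same topic, all views sharing the UUID converge without any editor-to-editor connections.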

5.2.2 Linked Widget Annotator The widget annotator allows developers to create and annotate widgets correctly and efficiently. Developers simply drag, drop, and configure three components called Widget Model, Object, and Relation to visually define their widget models. Figure 5.2 illustrates the definition of a semantic model for the POI Search widget (cf. Figure 3.6). In the Widget Model, we declare the input and output terminals of the widget. Their data models – arrays of Thing and Feature objects – are defined in Object components. We request the wgs84:location property from the Thing input and define its domain using another Object component. Similarly, we define the properties for the output of the widget. Finally, we make use of a Relation component to specify the geo:nearby relation between the input Thing and the output Feature objects. After that, the system automatically generates the OWL description file for the model as well as the corresponding HTML widget file. The HTML file represents either the source code of a client widget or the client user interface of a server widget. It includes the injected JavaScript code snippet required for the widget communication protocol and sample JSON-LD input/output of the widget according to the defined model. Based on this, developers can implement the processing function of a client widget or the remote executor of a server widget, which receives input from preceding widgets and returns output to succeeding ones.

Figure 5.2: Visual model defined for the POI Search widget

Finally, as soon as they have deployed their widgets, developers can submit their work to the framework, where it will be listed and can be reused with other available widgets; in particular, widget annotations are published into the LOD repository of widgets, which can be accessed via a SPARQL endpoint4.
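To illustrate the kind of sample JSON-LD input the annotator can generate from such a model, the following sketch builds a minimal POI Search input with gson, the JSON library used in our implementation. The @context URL and the exact key names are illustrative assumptions, not the generator's actual output.

    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;

    // Illustrative sketch of a sample JSON-LD input for the POI Search
    // widget: an array of Things carrying a wgs84:location, matching the
    // model described above. The @context URL is a placeholder.
    public class SampleInputBuilder {
        public static void main(String[] args) {
            JsonObject point = new JsonObject();
            point.addProperty("@type", "wgs84:Point");
            point.addProperty("wgs84:lat", 48.1986);
            point.addProperty("wgs84:long", 16.3685);

            JsonObject thing = new JsonObject();
            thing.addProperty("@type", "Thing");
            thing.add("wgs84:location", point);

            JsonArray input = new JsonArray();
            input.add(thing);

            JsonObject doc = new JsonObject();
            doc.addProperty("@context", "http://example.org/context.jsonld"); // placeholder
            doc.add("@graph", input);

            System.out.println(doc);
        }
    }

A developer implementing the widget's processing function can run against such a generated sample before the widget is ever wired into a real mashup.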

5.2.3 Semantic Widget Search In line with the growth of open data sources, the number of available widgets can also be expected to grow rapidly. To ensure that users can still find widgets on the framework, we provide a semantic search feature in addition to conventional search methods based on keywords, categories, tags, etc. This and the terminal matching module are two effective data exploration tools. Because the widgets' RDF metadata is openly available via the SPARQL endpoint, third parties can also develop their own widget-search tools. The search we provide is similar to the annotator tool, but it is simpler and directed at end users. By defining model constraints for input/output, users can, for example, find widgets that receive and return objects associated with location properties, as shown in Figure 5.3. Based on the defined model, a SPARQL query (cf. Listing 5.1) is generated and executed, and a list of matching widgets is returned to users. Users can use short names for classes and properties in their model. Moreover, we provide a tolerant search. Tolerant search uses ontology alignment methods [110] to return widgets whose models are similar to the specification. To this end, based on the LOD vocabulary statistics offered by the Linked Open Vocabularies web site5, we collect a number of common ontologies and make use of the Alignment API6 to identify equivalent LOD concepts and properties. This allows us to modify the SPARQL query so that widgets whose models use concepts or properties similar to the ones specified in the user-defined search model can also be located (see the sketch below).
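A minimal sketch of one possible rewriting step follows. It naively replaces each property in the generated query with a SPARQL 1.1 property path alternation over its aligned equivalents; the string-based rewriting and the example alignment map are simplifying assumptions, not the framework's actual implementation.

    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    // Sketch of tolerant-search query relaxation: each property is replaced
    // by an alternation over the properties the alignment step reported as
    // equivalent. Each list is assumed to contain the original property
    // plus its aligned equivalents, e.g.
    //   "wgs84:location" -> ["wgs84:location", "schema:location"]
    final class TolerantQueryRewriter {

        private final Map<String, List<String>> equivalents;

        TolerantQueryRewriter(Map<String, List<String>> equivalents) {
            this.equivalents = equivalents;
        }

        String rewrite(String sparql) {
            String result = sparql;
            for (Map.Entry<String, List<String>> e : equivalents.entrySet()) {
                // e.g. "wgs84:location" becomes "(wgs84:location|schema:location)"
                String alternation = e.getValue().stream()
                        .collect(Collectors.joining("|", "(", ")"));
                result = result.replace(e.getKey(), alternation);
            }
            return result;
        }
    }

A production rewriter would of course operate on the parsed query rather than on raw strings, but the alternation idea stays the same.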

5.3 Implementation Details and Lessons Learned

Table 5.1 lists the libraries and tools used to implement the Linked Widgets framework. As presented in Section 3.6.2, we decided to follow the WebSocket approach and implement the server widget's remote executor as a WebSocket client and the mashup execution coordinator as a WebSocket server. To this end, we make use of WAMP7. WAMP is an open standard WebSocket subprotocol that provides two application messaging patterns, i.e., remote procedure calls and publish/subscribe. It has client implementations for many programming languages, such as JavaScript, Java, Python, Erlang, C++, C#, Objective-C (for iOS), and PHP. As a result, it allows us to develop

4http://ogd.ifs.tuwien.ac.at/sparql (accessed Nov. 01, 2015) 5http://lov.okfn.org/dataset/lov/ (accessed Nov. 01, 2015) 6http://alignapi.gforge.inria.fr/ (accessed Nov. 01, 2015) 7http://wamp.ws/ (accessed Nov. 01, 2015)

Figure 5.3: Widget search that defines the input and output semantic models

Purpose | Language, library, and tool | URL
Integrated development environment | Eclipse Luna | https://www.eclipse.org/
Main programming language | Java |
Web server | Apache Tomcat 7.0.65 | http://tomcat.apache.org/
Ontology language | OWL Lite | http://www.w3.org/TR/owl-features/
Ontology editor | Protégé 4.3 | http://protege.stanford.edu/
JavaScript framework for interactive web applications | GWT 1.9.15 | http://www.gwtproject.org/
WebSocket protocol | WAMP 2.0 | http://wamp-proto.org/
JavaScript implementation of a WAMP client (used to implement client widgets) | Autobahn | http://autobahn.ws/js/
Java implementation of a WAMP client (used to implement server widgets) | Jawampa 0.4.1 | https://github.com/Matthias247/jawampa
Java implementation of a WAMP server (used to implement the mashup execution coordinator) | Jawampa 0.4.1 | https://github.com/Matthias247/jawampa
Web services | RESTEasy 3.0.5 | http://resteasy.jboss.org/
Widget input and output data format | JSON-LD | http://json-ld.org/
JSON data processing | gson 2.2.4 | https://github.com/google/gson
Semantic processing | Apache Jena 4.2.3 | https://jena.apache.org/

Table 5.1: List of used libraries and tools

Listing 5.1: A SPARQL query to search for widgets based on a defined model

PREFIX lw: <...>
PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#>

SELECT DISTINCT ?widget
WHERE {
  ?widget lw:hasWidgetModel ?widgetModel .
  ?widgetModel a lw:WidgetModel .
  ?widgetModel lw:hasInput [ lw:hasDataModel ?iDataModel ] .
  ?iDataModel wgs84:location [
    a wgs84:Point ;
    wgs84:lat [] ;
    wgs84:long [] ;
  ] .
  ?widgetModel lw:hasOutput [ lw:hasDataModel ?oDataModel ] .
  ?oDataModel wgs84:location [
    a wgs84:Point ;
    wgs84:lat [] ;
    wgs84:long [] ;
  ] .
}

server widgets for various computing environments. We currently use Autobahn as the JavaScript implementation of a WAMP client, and Jawampa as the Java implementation of both a WAMP client and a WAMP server. As the main programming language is Java, we use Apache Jena for semantic processing and gson for JSON data processing. Eclipse Luna, Apache Tomcat, and Protégé serve as the integrated development environment, web server, and ontology editor, respectively. In the following, we discuss lessons learned while implementing the prototype of the framework, which can be useful for developers of applications on top of open data.

5.3.1 Frameworks for Interactive Open Data Web Applications

Web platforms can provide an ideal environment for creating accessible, sharable, extensible, and maintainable applications. Interactive web applications typically rely on JavaScript libraries to provide rich graphical user interfaces on the client side (e.g., drag and drop). Choosing an appropriate library is crucial and may considerably reduce development effort and improve results. This section provides guidelines for novice developers of open data web applications. There are many free and open source JavaScript libraries and frameworks.

For the implementation of our Linked Widgets framework, we evaluated (i) YUI8 – a free, open source JavaScript and Cascading Style Sheets (CSS) library for building rich interactive web applications, (ii) WireIt9 – an open source JavaScript library to create web graph editors for dataflow applications, (iii) Sencha Ext JS10 – a JavaScript framework with a Model-View-Controller architecture and modern widgets, (iv) GWT11 – a development toolkit for building complex browser-based applications, and (v) SmartGWT12 – a GWT-based framework featuring a rich palette of UI elements. Before deciding on a library, developers have to address a number of questions. First, how much development time is available; is the result supposed to be a prototype or a ready-to-use product? Next, are there any requirements regarding compatibility with devices and browsers (e.g., touch devices and different browsers and versions)? What is the maximum allowable size for the loaded web resources? Which UI elements will be used in the application? Finally, developers have to read the documentation of candidate libraries/frameworks to select the most suitable one for their application. If minimizing the size of the necessary web resources (i.e., images, CSS, and JavaScript code loaded for executing the application) is an important consideration, then YUI and GWT are good options, because they allow developers to select only the modules they would like to use on their page instead of the whole library. Other frameworks, e.g., SmartGWT, will cause the client browser to load around 4 MB in total, even for a single, simple feature. After carefully evaluating the alternatives, we found that GWT meets most of our requirements. Essentially, GWT allows web developers to create and maintain complex JavaScript front-end applications in Java. In addition to its basic user interface elements, which can be inherited and extended easily, a large number of advanced elements contributed by the GWT community are available. Developers write their application in Java, which is then compiled into optimized JavaScript by the GWT Java-to-JavaScript compiler. The compiler itself ensures that web applications run on different browsers. Finally, GWT helps developers not only to rapidly develop prototype applications, but also makes it straightforward to turn them into complete products. With other frameworks, extending already supported UI elements with new features, e.g., maximizing/minimizing a window, requires developers to read, understand, and extend the complicated and intricate JavaScript and CSS code of the library, which is difficult and time consuming. GWT easily supports such tasks, because its implementation language is Java – an object-oriented programming language in which the concept of inheritance is much clearer than in JavaScript, which is an object-based language only.

8https://yuilibrary.com/ (accessed Nov. 01, 2015) 9http://neyric.github.io/wireit/docs/ (accessed Nov. 01, 2015) 10http://www.sencha.com/products/extjs/ (accessed Nov. 01, 2015) 11http://www.gwtproject.org/ (accessed Nov. 01, 2015) 12http://code.google.com/p/smartgwt/ (accessed Nov. 01, 2015)

5.3.2 Data Format for Lightweight Semantic Applications In lightweight semantic applications that involve on-the-fly data processing and data transmission, it is important to choose an appropriate data format. A good decision saves a considerable amount of resources, i.e., CPU power, memory, and time. We evaluated RDF, OWL, XML, JSON, and JSON-LD as potential data formats for our mashup framework. Among those, we found that JSON-LD, which "combines the simplicity, power, and web ubiquity of JSON with the concepts of Linked Data" [111], is most appropriate for our needs. Since January 2014, it has been an official web standard recommended by the W3C. Compared to RDF, JSON-LD is more human-readable and takes less memory to represent the same information. Additionally, in simple cases of Linked Widget interaction, where the output data model of a widget fits the input data model of another widget exactly (i.e., they have exactly the same structure or the output is a subset of the input), widgets can, thanks to the JSON format, directly receive data from others without further processing (see the sketch below). In more complex cases, the output of a widget needs to be modified to be compatible with the input of another widget. In these cases, JSON-LD enables the framework to create a SPARQL query to perform this additional data adaptation task.
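The following sketch illustrates the simple "exact fit" case under a deliberately simplified compatibility test: it checks raw JSON keys rather than the semantic models. Key names follow the wgs84 vocabulary used throughout this thesis.

    import com.google.gson.JsonObject;
    import com.google.gson.JsonParser;

    // Simplified sketch of the "exact fit" case: if the producing widget's
    // JSON-LD output already contains every key the consuming widget's
    // input model requires, the data can be handed over without any
    // adaptation step. Real matching works on the semantic models.
    public class DirectHandover {
        public static void main(String[] args) {
            String producedJsonLd = "{\"@type\":\"Thing\","
                    + "\"wgs84:lat\":48.2,\"wgs84:long\":16.37}";
            JsonObject output = new JsonParser()
                    .parse(producedJsonLd).getAsJsonObject();

            String[] requiredKeys = {"wgs84:lat", "wgs84:long"};
            boolean fits = true;
            for (String key : requiredKeys) {
                fits &= output.has(key);
            }

            if (fits) {
                // Pass the JSON object on unchanged.
                System.out.println("Direct handover: " + output);
            } else {
                // Otherwise the framework would generate a SPARQL query
                // to adapt the output to the expected input model.
                System.out.println("Adaptation needed");
            }
        }
    }

The point of the sketch is the saved work: no serialization to RDF, no query, no transformation; the object is simply forwarded.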

5.4 Example Applications in the Geospatial Context

Geospatial data plays an increasingly important role in planning and decision-making processes across a broad range of industries and information sectors. The quantity and variety of spatial data is increasing rapidly, and there is an abundance of opportunities to integrate it with other sources. The Linked Widgets mashup framework can address data heterogeneity and allows for data exploration and integration in a flexible and automated manner. In this section, we demonstrate the capabilities of the framework to facilitate integration and reuse of open data sources in the context of geospatial data. The framework is also applicable to a variety of other domains, such as environmental, streaming, statistical, or smart city data.

5.4.1 Introduction to Geospatial Data Geospatial data is increasingly becoming an integral part of our everyday lives. The amount and variety of such data available, as well as the adoption of services that make use of it, is increasing rapidly. For instance, Google, Yahoo, and Apple have each developed online GIS systems; various governments, e.g., the United Kingdom, Ireland, and Austria, have published transportation network data, including, for example, the locations of bus stops or metro stations. Location-based services such as Foursquare or Yelp also rely heavily on geographic data and expose their functionalities to developers via APIs. Flickr, as another example, provides APIs to search for photos taken near a specific location. As a result, we can today locate interesting holiday spots all over the world from our desk. To plan a trip, we retrieve images and travel descriptions via search engines; we

may also obtain statistical information such as economic indicators, crime rates, traffic accidents, etc., to identify the best places. Next, we check the weather conditions and history to find the best time for a trip. Finally, we search for points of interest to see along our journey. This example illustrates that geospatial information plays an important role in linking and combining data through shared locations [112]. It can be a powerful "glue" to integrate information across domains [113]. Realizing this vision, however, is still difficult because data is typically heterogeneous and not well organized. A lot of research has been conducted to facilitate integration by adding semantics to geospatial data. As of January 2015, the LOD cloud diagram13 includes 24 published geographical datasets. GeoNames14 is a central hub of those datasets. It contains more than 10 million geographic names and 9 million unique features. Another dataset, LinkedGeoData15, enriches the web of data with geospatial information. It publishes POI data from OpenStreetMap in RDF format. Furthermore, many other LOD sources are linked to these geographical datasets. DBpedia16 also includes a lot of geographic data, as is highlighted by the fact that the Place class is the second most frequently used class in DBpedia (the most widely used class is Person). Out of 4.58 million things described in the English DBpedia version, 735,000 are places. As a consequence, there are abundant opportunities to integrate the massive amounts of available spatial and non-spatial data in innovative ways. To facilitate such integration, however, we need to tackle a number of challenges in the design of an integration framework: (i) to support widespread reuse of geospatial data, it should not be restricted to particular datasets or problem domains; (ii) it should tackle the problem of data heterogeneity, i.e., connect a geospatial data source with other sources of geospatial or arbitrary data (e.g., weather data, image data, statistics, or transportation data), irrespective of the formats used (e.g., CSV, XML, JSON, RDF, JSON-LD); (iii) it should cope with a variety of LOD vocabularies, i.e., different URIs for the same term/resource; (iv) it should foster reusability of data processing tasks, e.g., to collect, clean, enrich, integrate, and visualize data; (v) it should allow users to dynamically explore, link, and integrate multiple data sources via simple operations; and (vi) it should deliver data integration applications to end users and allow them to explore, use, and share these applications. To demonstrate how the Linked Widgets framework can address these challenges, we consider a number of data sources and the respective widgets in the next sections.

5.4.2 Data Sources Based on geographic objects with latitude and longitude properties, datasets in the LOD cloud can be queried dynamically and linked to each other. We use five LOD datasets and three other data sources in our examples:

13http://lod-cloud.net/ (accessed Nov. 01, 2015) 14http://www.geonames.org// (accessed Nov. 01, 2015) 15http://linkedgeodata.org/ (accessed Nov. 01, 2015) 16http://dbpedia.org/ (accessed Nov. 01, 2015)

1. http://linkedgeodata.org – Publishes data collected by the OpenStreetMap project as RDF data.
2. http://www.geonames.org – Contains more than 10 million geographical names and over 8 million unique features, including 2.8 million populated places and 5.5 million alternate names. GeoNames data is linked to DBpedia and other RDF datasets.
3. http://spotlight.dbpedia.org – A service that looks for approximately 3.5 million things of unknown or 320 known types in text and links them to their globally unique identifiers in DBpedia.
4. http://eventmedia.eurecom.fr – A LOD dataset composed of events and media descriptions associated with these events, obtained from three large public event directories, i.e., last.fm, eventful, and upcoming.
5. http://data.nobelprize.org/snorql – Contains information about who has been awarded the Nobel Prize, when, in what prize category, and with what motivation, as well as basic information about the laureates.
6. https://flickr.com – An image and video hosting website. It provides a free API to access 5 billion photos with valuable metadata such as tags, geolocation, etc.
7. http://www.wunderground.com/ – A network of more than 140,000 personal weather stations that allows for accurate weather forecasts based on actual weather data points. It exposes APIs for developers to access the current weather conditions of every station as well as the weather forecast for every city in the world; the APIs are free for up to 500 calls per day.
8. http://map.google.com – Offers satellite imagery, street maps, and street view perspectives, which can be accessed through its API services.
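As an illustration of how such a source can be consumed programmatically, the following hedged sketch queries the LinkedGeoData SPARQL endpoint with Apache Jena, the library used in our implementation. The lgdo:Park class and the availability of the public endpoint are assumptions; a real widget would also parameterize the POI type and location.

    import org.apache.jena.query.QueryExecution;
    import org.apache.jena.query.QueryExecutionFactory;
    import org.apache.jena.query.QuerySolution;
    import org.apache.jena.query.ResultSet;

    // Sketch: fetch a handful of POIs with coordinates from LinkedGeoData.
    public class LinkedGeoDataQuery {
        public static void main(String[] args) {
            String query =
                "PREFIX lgdo: <http://linkedgeodata.org/ontology/> "
              + "PREFIX wgs84: <http://www.w3.org/2003/01/geo/wgs84_pos#> "
              + "SELECT ?poi ?lat ?long WHERE { "
              + "  ?poi a lgdo:Park ; wgs84:lat ?lat ; wgs84:long ?long . "
              + "} LIMIT 10";

            QueryExecution qe = QueryExecutionFactory.sparqlService(
                    "http://linkedgeodata.org/sparql", query);
            try {
                ResultSet results = qe.execSelect();
                while (results.hasNext()) {
                    QuerySolution row = results.next();
                    System.out.println(row.get("poi") + " @ "
                            + row.get("lat") + ", " + row.get("long"));
                }
            } finally {
                qe.close(); // always release the HTTP connection
            }
        }
    }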

5.4.3 Sample Geospatial Widget Collection

In our example use cases, we use twelve widgets (cf. Figure 5.4) organized into three layers, i.e., the data, processing, and presentation layers.

Figure 5.4: Geospatial widget collection

1. Map Pointer – a data widget. Users can define a point on a map. The point's latitude and longitude are then returned as output.
2. Text Annotator – a data widget. This widget detects a list of Places in a text, using the DBpedia Spotlight service.
3. Point of Interest Search (POI Search) – a processing widget. This widget leverages the LinkedGeoData repository to find semantically encoded POIs. Users can influence the output by providing parameters: they can select the type of POI and the radius of retrieved POIs with respect to the incoming location, which is any kind of object with wgs84:lat and wgs84:long attributes.
4. Music Event Search – a processing widget. This widget receives any kind of objects with wgs84:lat and wgs84:long attributes and transforms them into music events that were (or will be) organized nearby.
5. Flickr Geo Image Search – a processing widget. Using the Flickr Image Search API17, this widget enriches location data with images.
6. Weather Forecast – a processing widget. This widget enriches the input locations with the Wunderground weather forecasts.
7. Weather Conditions – a processing widget. This widget enriches the input locations with the weather conditions of the nearest Wunderground station.
8. City Detection – a processing widget. This widget uses the GeoNames service to find the Cities that the input objects belong to and looks up extra information via DBpedia, e.g., area or population.
9. Nobel Laureate – a processing widget. This widget takes Cities as input and returns a list of Laureates born in those Cities.
10. Geo Merge – a processing widget. This widget merges two or more lists of point data into a single list of pairs based on their distance. Users can specify a minimum and a maximum distance between points. The Geo Merge widget therefore serves two purposes, i.e., merging two inputs into one output and filtering based on distance constraints.
11. Map Viewer – a visualization widget. This widget displays points on a map. It is typically used to display the final results of a geospatial mashup.
12. URI Browser – a visualization widget. This widget shows input objects one by one together with each object's URI-dereferenced page.

Detailed information about the widgets and their annotated input and output models can be found in our prototype implementation18.

5.4.4 Sample Data Integration Use Cases There are many ways to compose useful applications from the twelve widgets selected for our examples. Assume, e.g., that we have a touristic text introducing different beautiful spots in a city. The application depicted in Figure 5.5 then presents an overview of those places: next to detailed information from DBpedia, the locations and images are displayed on the map. From a point specified by a user on the map, the combination in Figure 5.6 finds all pairs of nearby restaurants and banks that are less than 100 meters apart from each other. As a more complex data integration use case, the example depicted in Figure 5.7 illustrates how a mashup can be used to choose a suitable park to spend time at on the weekend. We start from a point on the map (e.g., our home location) and use the POI Search widget to retrieve nearby parks via a SPARQL query to the LinkedGeoData server. Next, we add weather forecasts from Wunderground19 using the Weather Forecast and Weather Conditions widgets. We specify our preferred time period in the Weather Forecast widget and select the weather conditions we are interested in (e.g., temperature or pressure) in the Weather Conditions widget.

17https://flickr.com/services/api/ (accessed Nov. 01, 2015) 18http://linkedwidgets.org (accessed Nov. 01, 2015) 19http://www.wunderground.com/ (accessed Nov. 01, 2015)

Figure 5.5: Use case 1: display famous Places detected from text on the map

Figure 5.6: Use case 2: display nearby Restaurants and Banks on the map

For each park in the input, the Weather Conditions widget obtains measurements from the nearest station out of the 140,000 Wunderground stations all over the world. We also add Flickr images to the parks based on their location, using the Flickr Geo Image Search widget. Finally, we present the collected information in the generic Map Viewer visualization widget. This widget accepts any kind of input with associated geographic information and is hence useful in many spatial applications. It leverages the semantics of the input to appropriately visualize the data: location input data with wgs84:lat and wgs84:long properties is displayed as pins on the map; if the foaf:depiction property of an input is set, the respective images are shown in an information window. In particular, if the value of an input property is an instance of the W3C cube http://purl.org/linked-data/cube#DataSet class, the Map Viewer will automatically analyze the data and visualize it in a chart (a simplified sketch of this dispatch follows below). In this mashup, we display charts of the weather forecast for each park.
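A minimal Java sketch of this dispatch logic might look as follows. The rendering methods are placeholders, the type test is simplified to a direct @type comparison on the property value, and none of this is the Map Viewer's actual source code.

    import com.google.gson.JsonObject;

    // Sketch of semantics-driven rendering in a generic viewer: the widget
    // inspects which properties an input item carries and picks the
    // appropriate visualization for each.
    final class MapViewerDispatch {

        void render(JsonObject item) {
            if (item.has("wgs84:lat") && item.has("wgs84:long")) {
                double lat = item.get("wgs84:lat").getAsDouble();
                double lon = item.get("wgs84:long").getAsDouble();
                addPin(lat, lon); // show the item as a pin on the map
            }
            if (item.has("foaf:depiction")) {
                // show the referenced image in the pin's information window
                showImage(item.get("foaf:depiction").getAsString());
            }
            if (isCubeDataSet(item)) {
                drawChart(item); // analyze the qb:DataSet and plot it as a chart
            }
        }

        private boolean isCubeDataSet(JsonObject value) {
            return value.has("@type") && value.get("@type").getAsString()
                    .equals("http://purl.org/linked-data/cube#DataSet");
        }

        private void addPin(double lat, double lon) { /* ... */ }
        private void showImage(String url) { /* ... */ }
        private void drawChart(JsonObject dataset) { /* ... */ }
    }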

Figure 5.7: Use case 3: pick a park to visit at the weekend

The validity of all connections between widgets is enforced by the framework based on the underlying semantic model. In this case, the mashup is valid because all widgets only require inputs that are instances of wgs84:Point with wgs84:lat and wgs84:long values. The final example (Figure 5.8) makes use of the relations between locations and other LOD resources. It collects and displays various types of information that are in some way related to a particular location. The Text Annotator widget uses the DBpedia Spotlight service to retrieve a list of recognized places from the input text. At this point, we split the data flow to further process the detected places. In the first flow, we use the Music Event Search widget to send a SPARQL query to EventMedia20 to get nearby music events and display them on the Map Viewer. The second flow applies the City Detection widget, queries GeoNames21, and retrieves the DBpedia city entities corresponding to the locations. These entities are then passed on as input to the Nobel Laureate widget, which queries the Nobel Prize dataset22 to obtain all Nobel Laureates born in the detected cities. The final data is Linked Open Data with dereferenceable URIs, so we can browse it in the generic URI Browser widget. As shown in Figure 5.9, there are more ways to combine the example widgets. We can create new applications by exchanging data or visualization widgets for one another. Moreover, one or more processing widgets can be added between a data and a visualization widget. By linking the widgets, data from different LOD datasets is connected. Furthermore, with different parameter values inside a widget, new use cases can be created. In Figure 5.6, we can replace "Bank" by "Park" and add one more instance of both POI Search and Geo Merge to find combinations of restaurants, parks, and book shops near each other.

5.5 Hybrid Mashup Example Use Cases

In the following, we illustrate the use of hybrid mashups by means of two practical use cases.

5.5.1 Streaming and Persistent Mashups The example depicted in Figure 5.10 illustrates a practical use case of streaming and persistent mashups. The mashup consists of two client widgets (i.e., the Map Pointer and Chart Viewer widgets) and two server widgets (i.e., the Weather Observation and Cube Merger widgets). Using the two instances of the Map Pointer widget, we first obtain the latitude and longitude of two arbitrary locations in the world (e.g., Vienna, Austria and Bethlehem, Pennsylvania). Next, the two instances of the Weather Observation widget initiate two back-end services that collect the temperature at the input locations every ten minutes.

20http://eventmedia.eurecom.fr/ (accessed Nov. 01, 2015) 21http://www.geonames.org/ (accessed Nov. 01, 2015) 22http://data.nobelprize.org/ (accessed Nov. 01, 2015)

Figure 5.8: Use case 4: transform geospatial data into Linked Open Data resources

Figure 5.9: Combinations of available widgets

Figure 5.10: Vienna and Bethlehem weather comparison

The temperature is taken from the nearest station (out of the 140,000 Wunderground stations all over the world). As soon as these two instances retrieve new temperature data, they send it in the form of a W3C cube dataset (which, in our example, has two dimensions, i.e., location and observation time, and one measure, i.e., temperature) to the Cube Merger widget. The Cube Merger widget aggregates this new data with the data it has collected since the mashup started. Finally, the Chart Viewer analyzes the aggregated W3C cube dataset and visualizes the final result. Because the Weather Observation widget continuously returns new output data every ten minutes, and because both the Weather Observation and Cube Merger widgets are server widgets, the mashup is a streaming and persistent mashup. The web page can be closed; the mashup still runs in the back-end and presents the complete set of observations whenever the web page is opened again.
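The following sketch shows what a single such observation could look like when assembled with gson. Apart from the qb: Observation type, the property names (ex:location, ex:observedTime, ex:temperature) are illustrative placeholders, not the widget's actual vocabulary.

    import com.google.gson.JsonObject;

    // Sketch of one temperature observation as the Weather Observation
    // widget could emit it: a W3C data cube observation with two dimensions
    // (location, time) and one measure (temperature).
    public class ObservationBuilder {
        public static void main(String[] args) {
            JsonObject obs = new JsonObject();
            obs.addProperty("@type", "http://purl.org/linked-data/cube#Observation");
            obs.addProperty("ex:location", "Vienna, Austria");      // dimension 1
            obs.addProperty("ex:observedTime", "2015-11-01T12:00"); // dimension 2
            obs.addProperty("ex:temperature", 9.5);                 // measure (°C)
            System.out.println(obs);
            // The Cube Merger widget appends such observations to the
            // dataset it has accumulated since the mashup started.
        }
    }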

5.5.2 Collaborative Mashups For the second example, consider the need to integrate data from several Excel and Google spreadsheets in an enterprise context. The typical process to achieve this goal is to download all of them, copy and delete columns, and create formulas to aggregate the data. These tedious tasks may take a lot of time and have to be repeated whenever the source data changes. A mashup that accomplishes such a task collaboratively is illustrated in Figure 5.11. It combines and visualizes sales data for a series of retail points of sale (e.g., ice cream stores). We have two types of spreadsheets: (i) a point of sale (POS) spreadsheet that contains each POS's id, name, latitude, longitude, city, and country; and (ii) three sales spreadsheets, each listing the number of items per category sold per day at one point of sale. Whereas the point of sale spreadsheet is on Google Drive, the three sales spreadsheets of POS A, B, and C are stored on the personal computers of the local branch managers, who update the data every day by adding new rows to the spreadsheets. To integrate the data, the branch and headquarters managers can collaboratively build a single shared mashup using our web-based mashup editor as follows. They first use the Google Sheet widget to load the POS spreadsheet from Google Drive and convert it into datasets that follow the W3C data cube vocabulary23. Next, each branch manager contributes their respective Excel Sheet widget (which is a server widget) to load and convert their sales data into a cube dataset. To this end, each of them installs the Excel Sheet server widget as a standalone application on their device; we hence have multiple remote executors of this server widget. To differentiate between these deployments of the server widget, so that a client user interface can connect to its respective counterpart (e.g., the client interface of POS A connects to A's Excel Sheet server widget), each deployment is associated with a unique token. Figure 5.11 shows the mashup as seen by the manager of POS A; she has to enter the token of her Excel Sheet server widget's deployment into her client interface of the widget. The tokens of the other shops are filled out by the respective branch managers.

23http://www.w3.org/TR/vocab-data-cube/ (accessed Nov. 01, 2015)

Figure 5.11: Collaborative mashup example for sales data integration

To prepare the data for visualization, the datasets are merged in the Spread Sheet Merger widget and passed through a Filter and an Aggregation widget. The C3 Chart widget visualizes the final data in a chart. The options inside the Filter and Aggregation widgets are generated automatically based on the data. They can be manipulated by all collaborators concurrently. Their view of the mashup is synchronized, i.e., changes immediately become visible to all participants. By changing the automatically generated options inside the Filter and Aggregation widgets, various analyses can be performed easily. For example, collaborators can "compare all-time sales of all POS", "compare sales of all POS in 2014", "compare sales of fruit, milk, and chocolate items of POS A in 2014", "compare aggregate sales in different countries or cities", or "compare sales of fruit items in different cities in 2014", as shown in Figure 5.11. This is a hybrid and collaborative mashup in which we use four server widgets (i.e., three Excel Sheet widgets and a Spread Sheet Merger widget) and four client widgets. The data collecting tasks are performed on multiple devices. Because the Spread Sheet Merger widget is a server widget, we can access its output data at any time. Points of sale can easily be added to or removed from the mashup. In this example, the spreadsheet data is converted into W3C cube datasets. To this end, the RDF Mapping Language24 or similar languages could be used; currently, however, the Excel Sheet widget simply identifies the first and the last column of a table as the dimension and the measure of the cube dataset, respectively (see the sketch below). Based on the semantic format of the cube data, generic widgets such as the Filter, Aggregation, and C3 Chart widgets can process and visualize the data automatically. Moreover, the semantic format can facilitate data integration; for example, we can enrich the sales data with weather statistics to analyze the effect of weather conditions on ice cream sales.
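A minimal sketch of this first-column/last-column rule, under the assumption of an in-memory string table, might look as follows; the key names and example rows are illustrative only.

    import com.google.gson.JsonArray;
    import com.google.gson.JsonObject;

    // Sketch of the conversion rule stated above: the first column of a
    // table becomes the dimension and the last column the measure of the
    // resulting cube dataset. Rows are plain string arrays for brevity.
    public class SheetToCube {
        public static void main(String[] args) {
            String[] header = {"day", "category", "itemsSold"};
            String[][] rows = {
                {"2014-06-01", "fruit", "120"},
                {"2014-06-01", "milk", "80"},
            };

            JsonArray observations = new JsonArray();
            for (String[] row : rows) {
                JsonObject obs = new JsonObject();
                obs.addProperty("@type",
                        "http://purl.org/linked-data/cube#Observation");
                obs.addProperty(header[0], row[0]);               // dimension
                obs.addProperty(header[header.length - 1],
                        Double.parseDouble(row[row.length - 1])); // measure
                observations.add(obs);
            }
            System.out.println(observations);
        }
    }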

24http://rml.io/ (accessed Nov. 01, 2015)

CHAPTER 6 Related Work

In this chapter1, we examine a number of selected mashup tools. With regard to the topic of this thesis, we categorize the tools into six main groups and present (i) widget-based mashups in Section 6.1, (ii) semantic mashups in Section 6.2, (iii) embedded mashups in Section 6.3, (iv) collaborative mashups in Section 6.4, (v) automatic mashups in Section 6.5, and (vi) natural language-supported mashups in Section 6.6. Finally, because the Linked Widgets framework does not follow the programming-by-demonstration approach, we only present Vegemite as an exemplification of this approach in Section 6.7.

6.1 Widget-based Mashups

6.1.1 Yahoo! Pipes Yahoo! Pipes [33] is one of the most well-known widget-based mashup tools and has inspired the Linked Widgets framework as well as many other mashup frameworks. It is an interactive web-based editor2 that uses the visual dataflow approach to enable users to combine pre-configured modules into a mashup. Yue [114] conducted a pilot experiment with Yahoo! Pipes and describes its positive results: students, who were the subjects of the experiment, found it useful and interesting; they gained expertise in mashup development quickly, without any major difficulty. Yahoo! Pipes provides 53 modules organized into nine categories: 1. Data Sources modules, which collect data available on the web for further processing in the mashup; 2. User Input modules, which allow users to define parameters for their pipes; 3. Operator modules, which transform and filter the data flowing through the pipe; 4. URL modules, which manipulate URLs;

1Parts of this chapter also appear in [1, 2, 3, 4, 5, 6, 7, 8]. The author of this thesis is also the lead author of these papers. 2https://pipes.yahoo.com/pipes/ (accessed Nov. 01, 2015)

5. String modules, which support users in manipulating and combining text strings; 6. Date modules, which define and format dates; 7. Location modules, which can convert text to geographic locations; 8. Number modules, which provide basic arithmetic operations; 9. Deprecated modules, which are not recommended for use. A module has one or multiple terminals, which are depicted as small circles in the interface. Users create a mashup (also known as a pipe) by dragging available modules onto the editor canvas and linking their terminals. The pipe can aggregate and transform data from various web services, pages, and feeds. Pipes can be reused as new operators in other pipes. A pipe's final results are displayed in feeds, lists, or on maps, depending on the type of the output data. Yahoo! Pipes makes use of the visual programming technique to hide the complexity of developing a mashup; however, it requires users to be familiar with programming concepts such as for loops, if conditions, and regular expressions. A further limitation of Yahoo! Pipes is that it focuses on web feeds (i.e., RSS, Atom, and RDF feeds) and hence is capable of representing news items only. Its main idea is to create new pages by aggregating RSS feeds from various sources; it cannot manipulate other kinds of data in XML or RDF formats.

6.1.2 Apache Rave

Apache Rave3 is a web and social mashup engine for data integration. It combines pre-existing projects, i.e., Apache Shindig4 – an OpenSocial5 container to host OpenSocial applications – and Apache Wookie6 – a Java-based server application to upload and deploy widgets. Apache Rave, as an extendable, lightweight, and open-standards-based platform, intends to host and integrate OpenSocial features with W3C web widgets7. It can provide collaborative and context-aware building blocks for multi-site-oriented, content-driven, and social-network-oriented websites, services, and platforms. Apache Rave implements many crucial features that a social mashup platform should support. It allows for secure and single sign-on access control, which enables users to easily personalize and share their web content via widgets and mashups. It can make use of both OpenSocial and Wookie widgets to construct a widget repository with life-cycle management (e.g., to install, update, or remove widgets) and extended metadata (e.g., the categories, comments, or ratings of widgets).

3https://rave.apache.org/ (accessed Nov. 01, 2015) 4https://shindig.apache.org/ (accessed Nov. 01, 2015) 5https://opensocial.atlassian.net/wiki/display/OSD/Specs (accessed Nov. 01, 2015) 6http://wookie.apache.org/ (accessed Nov. 01, 2015) 7http://www.w3.org/TR/widgets/ (accessed Nov. 01, 2015)

6.1.3 Presto Presto8 is an enterprise application and mashup platform. It covers the entire application life cycle, i.e., preparing mashable data; aggregating, transforming, and integrating data in mashups; and exposing the final result in various visualizations that are accessible from desktop or mobile devices. Presto provides users with a Presto Hub through which they can turn a wide variety of internal, external, historical, transactional, and real-time sources into mashable artifacts. These artifacts can be shared, tagged, rated, or commented on within user communities. The data can be collected from applications, systems, databases, or office documents via multiple types of Presto Add-Ons. Presto is capable of working with large datasets; it offers support for data streaming and in-memory management. Presto supports real-time access to data; it does not permanently save data to any type of device. Basic user authentication, secure connections and certificates, as well as single sign-on solutions are all supported, which allows for secure access to both information sources and created mashups. Presto empowers users with various useful tools, i.e., the App Maker, App Editor, Mashboard, Wires, and Mashup Editor. (i) The App Maker is a visual wizard to build basic applications from mashable artifacts without programming; based on the data format, users can choose one or multiple views such as a table, a map, or different kinds of charts. (ii) The App Editor is a web-based tool for uploading and updating custom applications. Custom applications can use common web resources such as HTML, JavaScript, and CSS. The editor is aimed at developers only, allowing them to easily share or package their work. (iii) The Mashboard is a visual, drag-and-drop designer to combine multiple applications or views in a dashboard. A wide variety of layouts such as simple columns, tables, drill-downs, desktop layouts, multiple tabs, or multiple pages are available. (iv) Wires is a web-based, drag-and-drop tool to create visual mashups from available built-in and power-packed action blocks. (v) Finally, the Mashup Editor is a web-based tool for developers to write and design mashups in the Enterprise Mashup Markup Language (EMML) – a mashup language specified by the Open Mashup Alliance9.

6.2 Semantic Mashups

Super Stream Collider (SSC) [39], MashQL [41], and DERI Pipes [40] are three platforms aimed at semantic data processing. SSC consumes live stream data only, whereas MashQL allows users to easily create SPARQL queries by means of its custom query-by-diagram language. MashQL cannot aggregate data from different sources, and its output visualization only supports text and table formats. DERI Pipes requires users to be familiar with semantic web technologies, SPARQL queries, and programming to perform semantic data processing tasks over different data sources.

8http://mdc.jackbe.com/enterprise-mashup/content/about-presto (accessed Nov. 01, 2015) 9http://www.synteractive.com/News-Events/Pages/Open-Mashup-Alliance.aspx (accessed Nov. 01, 2015)

Similar limitations apply to a Linked Data Integration Framework [115], a semantics-enabled mashup of existing Web APIs [116], and a web-based method that integrates static and dynamic sources for Linked Data-consuming applications [117]. In other words, these are data integration frameworks aimed at developers, not end users. They can be encapsulated in the processing widgets of our model to facilitate new usage scenarios. The contributions [85] and [118] discuss useful semantic mashup systems, e.g., [87] and [71], but these lack a systematic approach covering all LOD datasets. Each system can effectively exploit only one or several datasets; they cannot be extended to more LOD datasets and do not utilize LOD interconnections. Yokohama Art Spot [119], for example, is a web mashup application that offers information on art in Yokohama by consuming three LOD datasets (LODAC Museum, Yokohama Art LOD, and PinQA) and is hence a domain-specific solution. In general, there has so far been limited research on allowing end users to consume data in a dynamic way. Typically, users are expected to use semantic browsers to explore data and collect information by themselves. Alahmari et al. [120] evaluate fourteen semantic browsers (such as Sigma, Marbles, Disco, etc.) with respect to the consumption of structured Linked Data. These browsers do not offer means to combine data from different sources.

6.2.1 Intelligent Book Intelligent Book [86] is an application for mashing up Amazon and Half.com books. When a user searches for a book, a request is sent to these book stores. The result is stored in an ontology. Based on the profile of the user, with her interests and preferences, the application can apply its rules and rank the results. It uses Pellet for reasoning.

6.2.2 RDF Book RDF Book [87] intends to publish book information from multiple web APIs (e.g., Google, Yahoo, Amazon, and eBay) as semantic data. Books, bookstores, authors, reviews, and purchase offers each have dereferenceable URIs with RDF descriptions. To dereference a URI, the mashup sends queries to the APIs, assembles the query results, and returns the description to the client in RDF/XML syntax. The majority of the vocabularies derive from Dublin Core10 and FOAF11. The description contains outgoing owl:sameAs links from the book authors to paper authors in the DBLP12 database. To generate an outgoing link, the application sends a query to the DBLP SPARQL endpoint13 and matches the full name of the author against the query result. It also contains about 9,000 incoming links from DBpedia; DBpedia automatically generates these links based on the ISBN numbers found in Wikipedia.

10http://dublincore.org/documents/dces/ (accessed Nov. 01, 2015) 11http://xmlns.com/foaf/0.1/ (accessed Nov. 01, 2015) 12http://dblp.uni-trier.de/ (accessed Nov. 01, 2015) 13http://dblp.l3s.de/d2r/sparql/ (accessed Nov. 01, 2015)

6.2.3 Black Swan Black Swan [71] allows users to find out about correlations between socioeconomic developments and certain historical events. For instance, we can check how a war impacts income per capita. Black Swan uses data from international organizations such as the World Bank14 and the IMF15 and from many other projects, including DBpedia, Freebase16, NOAA17, Correlates of War18, EM-DAT19, and BBC Timeline20. It can automatically detect outliers in global statistics together with the appropriately related events. In the Black Swan system, an event is specified by a date, location, title, and event category; a statistical data point is specified by its numerical value, year, location, and type of statistic. The data is extracted from online data sources in three steps: (i) parsing, (ii) schema matching, and (iii) data cleansing. The system contains a number of flexible parsers to handle both structured (e.g., CSV, HTML/XML, RDF) and unstructured (i.e., plain text) formats. It uses Apache Tika21 to parse structured sources and collects more than 43,000 distinct events. The statistical data comprises 1,250,000 individual values from 200 countries over a time span of 200 years. Black Swan combines information on events with statistical data to discover patterns of the former's influence on the latter. It decreases the number of candidate events that may contribute to an outlier in a statistic. Black Swan uses association rule mining to discover dependencies among variables in a large database.

6.2.4 DERI Pipes DERI Pipes (or Semantic Web Pipes) [40] allows for the fast development of data-intensive applications that manipulate RDF data. It decomposes data integration and data processing tasks into frequently used operators. These operators foster reusability; they aim to alleviate the repetition of cumbersome, error-prone, and resource-intensive processing tasks typical of semantic systems. A pipe, which is a set of operator instances, supports two classical workflow patterns, i.e., split and merge. Accordingly, the two most important operators that can be used in a pipe are merge and split. The input of the former is a number of RDF graphs represented in RDF/XML, N322, or Turtle23 format; its output is a merged RDF graph of the input graphs. The split operator is designed to connect a single output to an arbitrary number of inputs of other operators. The available operators are:

14http://data.worldbank.org/ (accessed Nov. 01, 2015) 15https://www.imf.org/external/data.htm (accessed Nov. 01, 2015) 16http://www.freebase.com (accessed Nov. 01, 2015) 17http://www.ngdc.noaa.gov (accessed Nov. 01, 2015) 18http://www.correlatesofwar.org (accessed Nov. 01, 2015) 19http://www.emdat.be (accessed Nov. 01, 2015) 20http://news.bbc.co.uk/2/hi/europe/country_profiles/ (accessed Nov. 01, 2015) 21http://tika.apache.org/ (accessed Nov. 01, 2015) 22http://www.w3.org/TeamSubmission/n3/ (accessed Nov. 01, 2015) 23http://www.w3.org/TeamSubmission/turtle/ (accessed Nov. 01, 2015)

1. the getRDF and getXML operators, which take a URL as input and convert the respective resource into RDF and XML formats; 2. the XSLT operator, which performs an XSL transformation on a particular XML input file; 3. the SPARQL CONSTRUCT and SELECT operators, which can extract relevant parts of a large graph, perform basic ontology mapping, or align two separate graphs; 4. the RDFS and OWL operators, which can apply RDFS or OWL inference rules to the input graph. All operators have exactly one output and one or multiple inputs. The output of an operator can link to the single input of another operator; such links between inputs and outputs must be acyclic. Input data should be string text, RDF, or other XML formats. DERI Pipes invites developers to contribute their work to an open source project24. Semantic web developers can implement new operators and add them to the project. The contribution model of DERI Pipes hence differs from that of the Linked Widgets framework: our framework enables developers to freely develop and deploy widgets on their own infrastructure, whereas DERI operators must be in the same deployment package to be able to cooperate with each other. To create a DERI pipe, users make use of the pipe editor, a graphical web-based application that supports drag-and-drop and wiring operations. DERI Pipes intends to enable users to perform semantic data processing tasks over different RDF data sources. However, potential users need knowledge of RDF, SPARQL query results, and XML or HTML formats, and of how to process them algorithmically, to make effective use of the provided operators. A created pipe can be serialized and stored in XML format; this XML configuration can be loaded back into the pipe editor. To execute a pipe, a server-side execution engine coordinates the processing tasks defined in each operator of the pipe. All remote sources are fetched into the in-memory triple stores of the engine. The engine allows for concurrent execution, as each operator can be executed in a separate task. To avoid repeatedly fetching remote resources and hence enhance performance, the engine makes use of a caching mechanism. A saved pipe is available via a stable URL and hence can be reused as an operator in other pipes through a simple HTTP call. Pipes are based on common web standards and can be deployed on any Java web application server. Because of these two features, the DERI Pipes creators note that pipes can run on a single machine or be distributed among a number of nodes. The Linked Widgets framework also enables these features. Moreover, the base operators (which are Linked Widgets) can be distributed among nodes thanks to our WebSocket-based remote protocol; by contrast, base DERI Pipes operators must reside in the same web application.

24 http://sourceforge.net/projects/semanticwebpipe/ (accessed Nov. 01, 2015)
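To illustrate the role of the merge and SPARQL CONSTRUCT operators concretely, the following Python sketch performs equivalent operations with rdflib; it does not use DERI Pipes' own XML pipe syntax, and the source URLs and vocabulary IRIs are placeholders.

    from rdflib import Graph

    # Merge operator: the output is the union of several input RDF graphs.
    merged = Graph()
    for url in ["http://example.org/countries.ttl",   # placeholder sources
                "http://example.org/disasters.ttl"]:
        merged.parse(url, format="turtle")

    # SPARQL CONSTRUCT operator: extract the relevant part of the merged
    # graph and map it onto a target vocabulary (basic ontology mapping).
    query = """
    PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
    PREFIX ex:   <http://example.org/schema#>
    CONSTRUCT { ?s ex:name ?label }
    WHERE     { ?s rdfs:label ?label }
    """
    mapped = merged.query(query).graph  # the constructed (mapped) RDF graph
    print(mapped.serialize(format="turtle"))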

6.2.5 Super Stream Collider

Super Stream Collider [39] is a web-based platform to develop mashups of semantically annotated Linked Stream and Linked Data resources. It currently supports a number of live data sources, namely Linked Stream Middleware sensors, Twitter streams, DBpedia, and Sindice data sources. Super Stream Collider consists of a drag-and-drop mashup construction tool, a visual SPARQL/CQELS editor, and a visualization tool. It provides a set of streaming operators, which are categorized into three classes:

1. Data acquisition operators, which use either a pull-based or a push-based mechanism to collect data from sources and gateways.
2. Stream processing operators, which use a declarative language (e.g., CQELS) to specify stream processing functionality.
3. Streaming operators, which stream the final output of a mashup to the consuming applications.

Each operator receives n input streams and returns exactly one output stream. While the input format is quite arbitrary, the output format is always RDF. Super Stream Collider mashups run in a cloud computing infrastructure, which dynamically allocates execution containers to execute every single operator in the deployed mashup. A composed and published mashup can be reused as a data source or an operator; it is assigned a unique WebSocket URL through which third-party applications can access the streaming data.
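The operator signature described above (n input streams, exactly one RDF output stream) can be illustrated with a minimal Python sketch; this is a conceptual illustration with simplified triples, not Super Stream Collider's actual API.

    import itertools
    from typing import Iterable, Iterator

    Triple = tuple  # simplified stand-in for an RDF triple (s, p, o)

    def lift(item: dict) -> Triple:
        # Stream processing step: lift an arbitrary input item to RDF;
        # a real operator would apply a declarative CQELS query instead.
        return (item["s"], item["p"], item["o"])

    def streaming_operator(*inputs: Iterable[dict]) -> Iterator[Triple]:
        # n input streams in, exactly one RDF output stream out.
        # itertools.chain drains the inputs in order; a real engine
        # would interleave items as they arrive.
        for item in itertools.chain(*inputs):
            yield lift(item)

    sensor_stream = [{"s": "sensor1", "p": "hasTemp", "o": "21.5"}]
    tweet_stream = [{"s": "tweet1", "p": "hasText", "o": "hello"}]
    for triple in streaming_operator(sensor_stream, tweet_stream):
        print(triple)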

6.2.6 MashQL

MashQL [41] is a query-by-diagram language and tool that supports people in developing data mashups diagrammatically. The idea is to generalize Web 2.0 mashups and consider the internet as a database, where a mashup is seen as a query. MashQL generalizes and extends query formulation approaches such as query-by-form, query-by-example, and query-by-filter. A MashQL query is represented as a tree. The root of the tree is the query subject. Each branch is a query restriction, which specifies a filtering rule on the query subject. Users hence can query and mash up web resources by building such a tree in an interactive and intuitive process. The tree, i.e., the MashQL query, is translated into and executed as SPARQL queries in the background by means of a MashQL-to-SPARQL translator written in Java. MashQL uses Oracle's SPARQL [121] rather than the W3C SPARQL standard, as the former inherits SQL functionality such as grouping and aggregation functions. To formulate a MashQL query, users can use a web-based editor; the UI of the editor is inspired by Yahoo! Pipes. The editor executes background queries to dynamically generate drop-down lists from which users can select and express their queries. For example, when users select a dataset in the RDF input module, they can specify the query subject

25 http://superstreamcollider.org/ (accessed Nov. 01, 2015)
26 http://open-platforms.eu/library/deri-lsm/ (accessed Nov. 01, 2015)
27 http://sina.birzeit.edu/mashql/ (accessed Nov. 01, 2015)

by choosing one item in the automatically generated list of all classes and instances of the input source. From the chosen subject, users select one of its properties and define relevant restrictions based on the domain and range of the selected property. To create a query, they hence do not need to understand the schema, the structure, or the technical details of the RDF data sources. A MashQL pipe can be serialized into textual content based on the MashQL markup language. The language is based on XML and consists of three sections:

1. Header, which defines the input sources, prefixes, and other metadata of the query;
2. Body, which mainly specifies the query conditions;
3. Footer, which specifies result modifiers, ordering preferences, and output styles.

MashQL currently targets RDF data sources, using the SPARQL language. It can be extended to query XML documents or relational databases once an XSL stylesheet that translates MashQL markup into XQuery and/or SQL is developed.
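To make the query-by-diagram idea concrete, the following Python sketch translates a toy MashQL-style tree (a query subject plus property restrictions) into a SPARQL query; the actual MashQL-to-SPARQL translator is written in Java, targets Oracle's SPARQL dialect, and covers far more of the language, so the IRIs and structure here are illustrative only.

    # A toy query tree: the root is the query subject, each branch is a
    # (property, restriction) pair that filters the subject.
    tree = {
        "subject": "?city",
        "type": "http://dbpedia.org/ontology/City",
        "branches": [
            ("http://dbpedia.org/ontology/country",
             "<http://dbpedia.org/resource/Austria>"),
            ("http://dbpedia.org/ontology/populationTotal", "?population"),
        ],
    }

    def to_sparql(tree: dict) -> str:
        s = tree["subject"]
        patterns = [f"{s} a <{tree['type']}> ."]
        patterns += [f"{s} <{prop}> {value} ."
                     for prop, value in tree["branches"]]
        return "SELECT * WHERE {\n  " + "\n  ".join(patterns) + "\n}"

    print(to_sparql(tree))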

6.3 Embedded, Mobile, and Pervasive Mashups

Salminen et al. [122] focus on mashups for embedded devices. They introduce two environments that allow users to compose mashups in a procedural and a declarative fashion, respectively. Both aim at context-aware mashups on embedded devices and can use data from these devices as input for mashups. Similarly, Mikkonen and Salminen [123] present a runtime environment for embedded devices. It is able to combine data from the web and from device peripherals and liberates applications from the restrictions of web browsers. The environment is implemented on top of Qt [124], a cross-platform application framework that offers an extensive set of APIs on various embedded devices. Chang et al. [125] present a development system for mobile mashups. They design three development tools for PCs, mobile pads, and smartphones.

1. The PC tool targets experts and offers full functionality for mashup development; it consists of two main modules, a block builder and a webapp builder, with which experts can create blocks and mobile web applications. The block builder contains multiple editors for the different parts of a block (i.e., a metadata editor for the block's general information, an HTML editor for the block's user interface, and a JavaScript editor for the block's execution logic). Blocks can be shared and modified among users. To create a mobile application on top of available blocks, users edit the layout and the workflow of the application, using the layout editor and the workflow editor, respectively.
2. The pad tool is similar to the PC tool, with UIs simplified for the limited hardware and user interactions; it does not support block creation.
3. The smartphone tool is mainly aimed at non-expert users, with considerably reduced functionality and simplified UIs. These users cannot directly edit the layout or the workflow of the application, but use templates, i.e., sample mobile mashup applications. To make a mashup, users search for and select a template and customize it by replacing particular parts of its constituent blocks or changing their attributes.

The runtime environment for mashup execution is a web browser, either a desktop browser (e.g., Chrome, Safari, Firefox) or a custom web browser for Android and iOS devices. In the latter case, the mashup can access special functions of the mobile device such as the camera, compass, microphone, gallery, or contacts.

Corvetta et al. [126] introduce CAMUS (Context-Aware Mobile mashUpS), a framework to design context-aware mobile or web-based applications that collect and integrate heterogeneous sources (e.g., data and services) in a context-aware fashion. To hide the complexity, CAMUS makes use of high-level visual abstractions [127] for context and mashup modeling. It then uses generative techniques to transform the model into running code on various devices. The structure and content of the created mashup are hence decided at run time and can be flexibly and personally adapted to users' devices, needs, and situations of use.

Ma et al. [128] propose the “Brick-based and State-driven Mobile Service Composition” (BSMSC) framework as a mobile mashup approach. The framework serves three groups of stakeholders as follows:

1. Composite RESTful Service Developer (CRSD), who develops composite RESTful services and provides the respective models by means of JSON descriptions,
2. Service Brick Developer (SBD), who develops Android-fragment-based service bricks on top of the composite RESTful services,
3. Composite Service Brick Developer (CSBD) – the end user of the framework – who uses a visual tool to choose service bricks and design her own brick-based mobile application.

A service brick is a UI component that hides the back-end composite service from end users. Service bricks are divided into two types: (i) web-based service bricks, which are created using HTML, CSS, and JavaScript, and (ii) native service bricks, which are created using Android native objects. The latter are able to access client-side local resources such as sensor data or the address book.

Service bricks can send events to and receive events from the event engine of the framework. As the engine matches and forwards events to the relevant service bricks, it offers a mechanism to connect the front-end service bricks and the back-end RESTful services in a state-driven fashion (i.e., a session maintains and manages resources and states shared among the composed services). To this end, the framework builds on the service composition engine JOpera [129]. Because the output of a service needs to be transformed into valid JSON before it can serve as input to another service, the framework contains a service mediator as an intermediate layer between JOpera and the service bricks. It offers four major functionalities when orchestrating services: (i) One-Shot Service Flow Execution, (ii) Stateful Service Flow Execution, (iii) Service Substitution, and (iv) Format Wrapping for Service Output.
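A minimal Python sketch of this state-driven event mechanism is given below; the class and event names are hypothetical, and the real BSMSC engine, which builds on JOpera, is considerably more elaborate.

    from collections import defaultdict
    from typing import Callable

    class EventEngine:
        """Matches events to subscribed service bricks and keeps a session
        of resources and states shared among the composed services."""

        def __init__(self) -> None:
            self.subscribers: dict = defaultdict(list)
            self.session: dict = {}  # shared state (stateful service flow)

        def subscribe(self, event_type: str, handler: Callable) -> None:
            self.subscribers[event_type].append(handler)

        def publish(self, event_type: str, payload: dict) -> None:
            # Forward the event to every brick subscribed to its type;
            # each handler can read and update the shared session.
            for handler in self.subscribers[event_type]:
                handler(payload, self.session)

    engine = EventEngine()
    engine.subscribe("weather.requested",
                     lambda payload, session: session.update(city=payload["city"]))
    engine.publish("weather.requested", {"city": "Paris"})
    print(engine.session)  # {'city': 'Paris'}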

6.4 Collaborative Mashups

6.4.1 PEUDOM

PEUDOM [130] is designed as a platform for multi-device and collaborative mashups. It allows users to create components on top of REST services and combine these components to build mashups. Similar to the Linked Widgets framework, it provides a live collaboration paradigm. A major difference, however, is that the data processing tasks in PEUDOM cannot be assigned to different devices. These tasks are REST services that are called in the corresponding components implemented for different devices. PEUDOM hence can be seen as a service composition platform for end users on different devices. PEUDOM consists of two major environments, i.e., the Component Editor, in which users can register their REST services with the platform, and the Mashup Dashboard, a web environment to create new mashups or modify existing ones. PEUDOM uses an event-driven publish-subscribe paradigm to coordinate the various implementations of components on different devices. PEUDOM follows a lightweight client-side execution approach. REST services encapsulated in components are called and processed on the client side, while the server takes care of synchronous and asynchronous communication and manages the resources shared by multiple users.

6.4.2 MoSaiC

MoSaiC [131] is a web-based mashup tool and platform for collaborative document engineering. It models enterprise documents as mashups, i.e., situational applications of content provision, transformation, and publication. MoSaiC allows a team to consolidate content from various sources; the team can collect and share data in the course of collaborative work. An example use case comes from scientific publishing: Alice, the initiator, first creates a research paper mashup by dragging document elements (i.e., abstract, section, text, figure, or bibliography) into the mashup canvas. She associates each element with a meaningful task description. Alice can define further behavioral rules, such as that all sections need to be delivered three days before the deadline. Alice then adds two services to the mashup: (i) a layout format service that automatically generates a document from the mashup, and (ii) a submission service that uploads the created document to the relevant server. Finally, Alice adds some people from her research team to the paper mashup and assigns them particular tasks. In the following weeks, as the paper evolves, the team can make changes to the initial mashup. They can add new content to their assigned sections, as well as see the work of others. The event-based reminder rule defined above triggers a notification to the responsible authors if it discovers that a content element is not in the "final" state on time. As soon as all elements are completed, the whole mashup is sent to the

layout service, and finally, a document is created and submitted via the submission service.

6.4.3 ContextGrid

ContextGrid [132] is a contextual mashup-based collaborative browsing platform. The idea is to bring knowledge-sharing services to internet users; drawing on various open APIs, ContextGrid integrates heterogeneous information and assists users in collaboratively navigating the web, i.e., interacting with the web to find relevant information. Each user has a bookmark set, which is represented in her personal ontology. A user context is then defined by her ontology and her current web page. ContextGrid matches and identifies the semantic relationships between the user's current web page and her set of bookmarks, and hence can recognize the context. A user can obtain browsing information from another user if they are in similar contexts and share the same preferences. They can exchange knowledge, resources, and experiences (e.g., heuristics and know-how) while searching for information. For example, three users are related if one visits the Vienna Wikipedia page, one locates Vienna on a Google map, and the other browses Flickr photos taken in Vienna. To this end, ContextGrid develops a model that first recognizes the current context of each user and then compares those contexts to detect similarities. Finally, ContextGrid makes use of open-API-based mashups to integrate various types of web resources and recommend relevant resources to users based on their contexts.
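A minimal sketch of the context-matching idea follows, under the simplifying assumption that a context can be reduced to a set of topic identifiers drawn from the user's personal ontology and current web page; ContextGrid's actual model is richer.

    def context(bookmark_topics: set, current_page_topics: set) -> set:
        # A user's context: topics from her personal (bookmark) ontology
        # combined with the topics of the page she is currently viewing.
        return bookmark_topics | current_page_topics

    def similarity(ctx_a: set, ctx_b: set) -> float:
        # Jaccard overlap as a simple stand-in for semantic relatedness.
        return len(ctx_a & ctx_b) / len(ctx_a | ctx_b)

    alice = context({"ex:Vienna", "ex:Travel"}, {"ex:Vienna"})  # Wikipedia page
    bob = context({"ex:Maps"}, {"ex:Vienna"})                   # Google map
    print(similarity(alice, bob))  # ~0.33: similar enough to share resources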

6.4.4 Apache Rave Extension

Hertel et al. [133] introduce an extension of the open-source mashup platform Apache Rave that adds support for real-time collaboration among users when composing user interface mashups (or widget-based mashups). As users can exchange knowledge and experiences, the collaboration feature is expected to make mashup development more social and encourage users to be more productive. The challenge is to design a synchronization algorithm and a conflict-resolution mechanism. To this end, the extension makes use of the Operational Transformation algorithm [134], a concurrency control mechanism that can maintain the consistency of mashup models. To resolve conflicting operations, it transforms them against previously applied concurrent operations so that they satisfy the consistency model. The extension defines six operations for mashup synchronization:

1. "Add Widget", which adds a new widget to the mashup under construction,
2. "Move Widget", which moves a widget to another position on the mashup canvas,
3. "Remove Widget", which removes a particular widget from the mashup,
4. "Change View Mode", which sets the display mode of an Apache Rave widget,

28 http://rave.apache.org/ (accessed Nov. 01, 2015)

5. "Replace Widget Property", which replaces the current value of a property with a new value,
6. "Change Connection Setting", which changes the connections between widgets.

When users interact with the Apache Rave platform, these operations are sent to the server. The server performs the necessary transformations if there are concurrent operations. It then updates its global mashup configuration and propagates the operations to the other collaborators.
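The following Python sketch shows one transformation case in the spirit of Operational Transformation: a "Move Widget" operation is transformed against a concurrent "Remove Widget" operation on the same widget. It is a deliberately simplified illustration, not the extension's actual algorithm.

    from dataclasses import dataclass
    from typing import Optional

    @dataclass
    class Op:
        kind: str                      # e.g., "move_widget", "remove_widget"
        widget_id: str
        position: Optional[int] = None

    def transform(incoming: Op, applied: Op) -> Optional[Op]:
        """Transform an incoming operation against a concurrently applied
        one so that the resulting mashup model stays consistent."""
        # Moving a widget that a collaborator concurrently removed
        # would violate the consistency model, so the move is dropped.
        if (applied.kind == "remove_widget"
                and incoming.kind == "move_widget"
                and incoming.widget_id == applied.widget_id):
            return None
        return incoming  # no conflict: apply the operation unchanged

    print(transform(Op("move_widget", "w1", position=3),
                    Op("remove_widget", "w1")))  # None: the move is discarded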

6.5 Automatic Mashups

Due to the similarity of mashups to web services, we consider the rich stream of research on web service composition closely related. Rao and Su [135] review research on automatic web service composition techniques from both the workflow and the AI planning research communities. Dustdar and Schreiner [136] present different composition strategies based on existing platforms and frameworks. They argue that – amongst others – specifications, interactions, and non-functional attributes need to be considered. Fischer et al. [137] present an evolutionary algorithm to generate ad-hoc mashups from semantic web services. The information retrieved from the invoked services is transformed into a semantic representation. Feng et al. [138] propose a service-oriented approach to integrate open data services. They semantically model data services and transform the integration problem into a service composition problem. SMART [139] is a platform for mashing up REST web services. It matches users' query keywords with REST web service descriptions in its ontology and allows users to build personalized mashups. Common issues of service mashups lie in (i) the complexity of services from the users' perspective, and (ii) the high risk of generating irrelevant mashups due to uninformative service descriptions. We hence encapsulate web services and other web resources into Linked Widgets and associate our widgets with a semantic model to facilitate automatic processing. More specifically, considerable research effort is directed at automatic mashup composition. Wong and Hong [38] present Marmite, which uses syntactic similarity, such as comparing output and input data types, to couple widgets. Goal-driven approaches such as MARIO (Mashup Automation with Runtime Orchestration and Invocation), introduced by Riabov et al. [140], automatically deduce compositions based on user-defined goals. Ngu et al. [141] propose a semantic annotation technique to find sets of functionally equivalent components that can be combined into composite applications. Hybrid approaches go one step further and combine goal-driven and semantics-driven techniques. Elmeleegy et al. [142] present MashupAdvisor, which recommends composition goals based on a user's current composition context. Deutch et al. [143] introduce so-called glue patterns, which are utilized after the user selects desired components. These glue patterns are used to auto-complete partial mashups. Cappiello et al. [61] propose to incorporate quality considerations when identifying adequate components for mashups. They introduce an assisted composition process that does not require explicit models. Instead, quality becomes a key indicator for recommending complete mashups to users. Our composition

technique uses a well-defined semantic model, which needs to be specified before widgets are executed; this technique always ensures compatibility between widgets and therefore guarantees successful mashup execution. Adding goal-based characteristics to our current approach may be useful for limiting the result space and respecting users' information needs. Bai et al. [144] introduce mashlets (i.e., online resources including data, functions, and presentations) and combine them into new applications according to user goals. They follow a goal decomposition and refinement approach to bridge the gap between low-level mashlets and high-level user goals. Rodríguez et al. [145] implement a Firefox plugin that recommends useful composition patterns to end users during mashup development in Yahoo! Pipes. The composition knowledge is extracted from a repository of existing mashups. Similar work can be found in [146, 147]. The former observes that mashups developed by different users typically share common characteristics. It exploits these similarities to predict potential mashup compositions, given a user's mashup specification. The latter is a hybrid recommendation approach, which leverages existing compositions and component descriptions to provide continuous development support for end users. In the following, we review two platforms that allow for automatic mashup composition.

OMELETTE

Roy Chowdhury et al. [148] introduce a mashup system called OMELETTE. To support users in designing mashups, it follows a hybrid approach that combines goal-oriented solutions [149, 150] and pattern-based development [141, 151]. The former aims to automatically develop mashups that satisfy user-specified goals, while the latter recommends composition patterns to auto-complete part of a mashup. OMELETTE identifies three main mashup challenges. (i) Users do not know which widgets they need for their intended mashup. (ii) Provided that they are aware of the necessary widgets, they still cannot check whether these widgets are available. (iii) Users typically are not skilled enough to define the data and control flow of the mashup. To address these challenges, it extends the widget-based mashup platform Apache Rave with two modules, the Automatic Composition Engine (ACE) and the Pattern Recommender (PR). ACE provides users with an interactive, dialog-based agent that allows them to specify the goal of their mashups. The knowledge base consists of a domain ontology and a set of functions that define the agent's behavior. There are four important functions:

1. Question Building, which is language-aware, to establish a conversation flow with users;
2. Evidence Collection, to collect user responses and build a context model on top of the domain ontology;
3. Widget Selection, to select widgets by matching the collected evidence against the widget registry;

29 http://www.ict-omelette.eu/home (accessed Nov. 01, 2015)

4. Workspace Configuration, to set up parameters for the workspace and the involved widgets.

PR is designed to support users in extending mashups created by ACE and in composing mashups from scratch. It extracts patterns from the knowledge base of all created mashup models. Widget co-occurrence and multi-widget are the two supported patterns. During the composition process, PR reacts to user operations such as adding, selecting, or deleting a widget. It identifies the operation, calculates and generates ranked recommendations, and returns them to the user.
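The widget co-occurrence pattern can be illustrated with a small Python sketch that mines pair frequencies from a hypothetical repository of mashup models and recommends frequent companions of a newly added widget; PR's actual pattern mining is more sophisticated.

    from collections import Counter
    from itertools import combinations

    # Hypothetical repository of mashup models, each a set of widget names.
    repository = [
        {"map", "poi_search", "weather"},
        {"map", "poi_search"},
        {"map", "weather"},
    ]

    def recommend(widget: str, repo: list, k: int = 2) -> list:
        pairs = Counter()
        for mashup in repo:
            pairs.update(frozenset(p) for p in combinations(sorted(mashup), 2))
        # Rank the widgets that most often co-occur with the given one.
        scores = Counter({next(iter(pair - {widget})): count
                          for pair, count in pairs.items() if widget in pair})
        return [w for w, _ in scores.most_common(k)]

    print(recommend("map", repository))  # e.g., ['poi_search', 'weather']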

CRUISe

CRUISe [84, 152] is a platform for the dynamic composition and execution of adaptive composite applications. Its implementation is based on the Eclipse Modeling Framework, where application models can be serialized into the XML Metadata Interchange (XMI) format and thus become exchangeable and tool-independent. CRUISe uses services as input to compose a web-based user interface of service-oriented composite components. The basic structure and interface of these components are specified in component models. A component model is characterized by three parameters: (i) configuration, which stores a set of component states specified as key-value pairs, (ii) event, which is issued by a component to inform others that its state has changed, and (iii) operation, which is a method of a component triggered by events. CRUISe identifies three component types, i.e., User Interface Components, Logic Components, and Service Components. CRUISe follows a semantic approach to represent its component models. To this end, it defines the Semantic Mashup Component Description Language (SMCDL), which builds on the principles of SAWSDL [96] and WSMO-Lite [153]. The annotation of components is categorized into three types: (i) data semantics, which map properties and parameters of events and operations to semantic concepts, (ii) functional semantics, which combine operations, events, and the component as a whole to define the overall functionality, and (iii) non-functional semantics, which describe metadata (e.g., author, price, license) of a component. Based on the defined component models and starting from a particular mashup template (which is defined using SMCDL), CRUISe can automatically select and integrate components at runtime, using a semantic discovery and ranking algorithm. The algorithm is based on the semantic compatibility between components; it does not consider the compliance between component templates and implementations. Therefore, CRUISe uses semantic mediation to tackle the heterogeneity between dynamically integrated components and composition models. When necessary, the mediation can (i) rename operations, events, parameters, and properties of components, (ii) rearrange parameters of outgoing and incoming events, or (iii) perform a transformation to enforce data compatibility.

To sum up and to contrast the related work with the work discussed in this thesis, we would like to highlight that our contribution consists in (i) an automatic mashup

30 http://www.eclipse.org/modeling/emf/ (accessed Nov. 01, 2015)

composition algorithm to simplify mashup development, and (ii) an interactive tagging module to compose mashups from text. The foundation of both approaches is the semantic model of widgets. Whereas related work heavily relies on composition knowledge to recommend widget connections, we can proactively discover all possible connections by means of terminal matching and compose all mashups from a given set of widgets.
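As a rough conceptual sketch of terminal matching (the actual mechanism operates on RDF graph models and is specified in Section 3.9.1), one can think of each terminal model as a set of (class, property) requirements; an output terminal can then feed an input terminal whenever it provides at least everything the input requires. The models below are placeholders.

    # Terminal models abstracted as sets of (class, property) pairs.
    output_model = {("schema:Place", "schema:name"),
                    ("schema:Place", "geo:lat"),
                    ("schema:Place", "geo:long")}
    input_model = {("schema:Place", "geo:lat"),
                   ("schema:Place", "geo:long")}

    def terminals_match(output_model: set, input_model: set) -> bool:
        # The output may provide more than the input needs, never less.
        return input_model <= output_model

    print(terminals_match(output_model, input_model))  # True: link is valid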

6.6 Natural Language-supported Mashups

There are several approaches that enable users to compose mashups by specifying them in text:

1. Users type arbitrary sentences or questions in natural language. The text is then processed to compose mashups for them. This is very simple and easy to use; however, because of the variability of natural language, the input text is likely to be ambiguous. It is thus difficult to build precise mashups for users. An example of a similar approach can be found in AutoSPARQL [154]. Using active supervised machine learning to generate SPARQL queries, it can answer natural language queries over RDF knowledge bases.
2. To address ambiguity, a subset of natural language can be used. This means that the syntax and vocabulary are restricted to specific patterns.
3. Some authors design and use a dedicated mashup language that can be based on natural language (e.g., [155]). This, however, creates another barrier for end users, as they have to learn the syntax of the new language.
4. Finally, dialog-based approaches (cf. [148]) are possible. We ask users questions; based on the answers, we can ask further questions to narrow down the context step by step.

In the following, we review a number of frameworks that support automatic mashup composition based on natural language.

6.6.1 NaturalMash

NaturalMash [156] is an end-user mashup development tool. It combines different techniques such as live programming, programming by demonstration, natural language programming, and What You See Is What You Get. Building on available web APIs such as LinkedIn, eBay, Last.fm, Google Maps, BBC, Facebook, and YouTube, NaturalMash enables users to enter a simple description of their desired mashup and returns an automatically composed mashup to them. Its interface contains a text field for describing the mashup integration logic and a visual field, a WYSIWYG interface to design and preview the mashup being created. As an example (cf. [156]), users can type "Find songs titled mashup. When an item is selected, search YouTube videos about title. When a video item is selected, post video link to my wall" in the text field. The language used is a subset of English with a restricted vocabulary and grammar. Alternatively, users can drag and drop available APIs into the visual field and design their mashup.

To this end, for each web API, NaturalMash creates an abstract component that describes the functionality of the API by a Task and an Event. A Task is defined as a "passive atomic operation" that takes input data, processes it, and returns output. An Event is defined as an "active source of control" that describes a condition; if the condition is satisfied, it issues a message associated with parameters. In the example description above, NaturalMash uses the Task behavior of the Last.fm API to search for songs titled "mashup" and return a table of songs. An Event can then be the click action on a song. In the next step, NaturalMash creates a natural language representation of the API; all Tasks and Events are associated with particular natural language descriptions, namely imperative sentences and causal sentences, respectively. To reduce sources of error, the text field supports advanced features such as autocomplete text suggestions and restricting the possible input characters. NaturalMash is a live mashup tool; it can compile and run textual mashup descriptions. The descriptions are sent to a compilation pipeline and transformed into web service compositions, which are executed by the JOpera engine [129] (i.e., a service composition tool that provides a visual language to define the control and data flows of service processes). The compilation consists of six steps: (i) Natural Language Parsing, (ii) Constrained Natural Language Parsing, (iii) API Binding, (iv) Data Flow Resolution, (v) Intermediate Model, and (vi) Emitter.
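A minimal Python sketch of the Task/Event abstraction is given below; the classes and the wrapped API behavior are hypothetical and merely illustrate the two roles NaturalMash assigns to components.

    from typing import Callable

    class Task:
        """A 'passive atomic operation': takes input, processes it,
        and returns output."""
        def __init__(self, run: Callable[[dict], dict]) -> None:
            self.run = run

    class Event:
        """An 'active source of control': fires a message when its
        condition holds."""
        def __init__(self, condition: Callable[[dict], bool]) -> None:
            self.condition = condition

        def fire(self, message: dict, handler: Callable[[dict], None]) -> None:
            if self.condition(message):
                handler(message)

    # Hypothetical Last.fm-style search wrapped as a Task, plus a click Event.
    search_songs = Task(lambda msg: {"songs": ["song titled " + msg["query"]]})
    song_clicked = Event(lambda msg: msg.get("action") == "click")

    result = search_songs.run({"query": "mashup"})
    song_clicked.fire({"action": "click", "title": result["songs"][0]},
                      lambda msg: print("search YouTube for:", msg["title"]))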

6.6.2 Natural Mashups

Natural Mashups [157] is a web-based service composition tool that enables users to create composite services on the fly from a descriptive request expressed in a restricted form of natural language. The request is based on the vocabulary and syntax patterns of the components in the user's service catalog. This means users need to be aware of the available services, or their requests cannot be understood. Natural Mashups supports two languages (i.e., English and French) and combines text and speech to specify a request. Similar to NaturalMash, for each supported web service, Natural Mashups creates an abstract service that is associated with a natural language annotation. The annotation consists of two elements, i.e., activation terms and a rules part. The former specifies the terms that, when matched, cause the natural language parser to activate the respective service. The latter defines a set of patterns that can be recognized for the service and the mechanism to obtain the parameters from the request. To compile a textual request into a composite mashup, Natural Mashups analyzes the text using the annotations of the abstract services; the result is a list of matched abstract services and resolved arguments. From that, it creates a sequence of service calls in an orchestration script. The script is finally translated into executable code for different target frameworks. The default framework is the SPATEL engine [158], a SOA-oriented framework that uses UML interfaces and state machines to represent services' operations and behaviors. Alternatively, the code can be sent to the Open Mashups tool [159] to build a mashup, which can be automatically transformed into an iPhone application.

For the example request "Send to mum Paris weather translated in Spanish" (cf. [157]), Natural Mashups creates an orchestration of four services: (i) resolving the nickname (i.e., "mum") in the user's address book, (ii) accessing the weather forecast for Paris, (iii) translating the given text into Spanish, and (iv) sending an SMS or an email. To help users formulate textual descriptions that Natural Mashups can understand, the tool makes use of both text auto-completion and contextual help menus.
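The sketch below mimics this annotation-driven parsing in Python with hypothetical activation terms and rules; Natural Mashups' actual annotations, pattern language, and orchestration scripts differ.

    import re

    # Hypothetical abstract-service annotations: an activation/rule pattern
    # that, when matched, activates the service and yields its arguments.
    SERVICES = [
        ("address_book", re.compile(r"to (?P<nickname>\w+)")),
        ("weather",      re.compile(r"(?P<city>\w+) weather")),
        ("translator",   re.compile(r"translated in (?P<language>\w+)")),
        ("messenger",    re.compile(r"^Send")),
    ]

    def parse_request(text: str) -> list:
        calls = []
        for service, rule in SERVICES:
            match = rule.search(text)
            if match:  # the rule matched, so this service takes part
                calls.append((service, match.groupdict()))
        return calls

    print(parse_request("Send to mum Paris weather translated in Spanish"))
    # [('address_book', {'nickname': 'mum'}), ('weather', {'city': 'Paris'}),
    #  ('translator', {'language': 'Spanish'}), ('messenger', {})]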

6.6.3 EnglishMash

Aghaee and Pautasso [155] design EnglishMash, a mashup composition environment based on their controlled natural language that supports users in developing mashups. The typical drawback of this approach is that users are required to acquire special skills, i.e., knowledge of component capabilities, algorithmic thinking, and the language syntax. To address this issue, the environment interactively responds to user actions, e.g., by correcting their mistakes or displaying a live preview of their textual mashup description. EnglishMash associates its mashup components with natural language patterns. For example, the pattern for the Twitter search component is "search tweets at [coordinate: longitude, latitude]". It hence allows users to express in natural language a number of programming techniques such as conditional branches, event handling, and iteration. An example expression is "When the map is clicked, do as follows. Display a marker at the location, and search for tweets at the location. Finally, show the tweets on the table" (cf. [155]). EnglishMash requires users to have several basic skills: (i) to be able to build mashups, users need to be aware of all available components as well as their functionalities; (ii) users should have problem-solving skills, i.e., they must be able to think algorithmically to orchestrate available services and create added value; (iii) users should be able to restrict their English sentences so that they are understandable and executable by EnglishMash. The typical EnglishMash workflow is: (i) having in mind the goal of the mashup, (ii) identifying and adding the necessary mashup components to the mashup under construction, (iii) developing the logic of the mashup in natural language, and (iv) previewing the live execution of the mashup. During this process, users interactively receive feedback on syntax and runtime errors; they are also supported by auto-completion by means of a drop-down menu containing suggestions.

6.7 Programming-by-demonstration Mashups

Since the Linked Widgets framework does not follow the programming-by-demonstration approach, in this section we only present Vegemite as an example. Vegemite [45] extends the CoScripter web automation tool (an end-user programming system presented in [160]) with a spreadsheet environment. It

uses programming-by-demonstration techniques to enable users to collect and process information from various web sites and populate a table. Starting with an empty table (the so-called Vegetable), users can manually add new data or copy data from other spreadsheet sources. Moreover, they can extract data from a web page and add it to the table using the direct-manipulation tool. By performing a series of actions on the table and the web page, users can demonstrate and instruct the manipulation tool to add further rows and columns to the Vegetable. To this end, user actions are recorded as scripts, which can be re-executed at any time to complete or refresh the data in the Vegetable. Vegemite scripts can not only collect data but also allow for low-level web actions such as filling values into HTML forms or clicking on HTML links. Based on the values collected in the Vegetable, users can subsequently perform further web actions, e.g., sending an email that contains data from the table. Vegemite is implemented as a Mozilla Firefox add-on. It consists of two main components, (i) the Vegetable and (ii) the CoScripter engine. The former is implemented as a table editor, whereas the latter is a sidebar borrowed from CoScripter to create, edit, and save Vegemite scripts. When editing scripts, users can copy and paste operations. The scripts resemble English and are understandable to both humans and computers. A Vegetable is composed of tabular data and Vegemite scripts. Both are permanently stored on a central server so that users can access their data from any machine with Firefox and the add-on installed.

Summary of Mashup Tools

To sum up, the most apparent differences between the Linked Widgets framework and similar approaches are as follows: (i) We present a high-level and problem-oriented data processing framework. Users do not have to be familiar with special technological and programming concepts, e.g., conditional statements or loops, to perform data integration tasks. They first define a goal, e.g., searching for POIs near a place, then discover appropriate widgets, and finally arrange them in a mashup. To ease this process, we organize widgets in domain-specific collections. Additionally, we provide keyword and semantic search features based on the widget model. (ii) We allow and encourage developers to contribute their widgets to the framework to extend the number of data sources it can process. (iii) We use semantic web technologies to model widgets and facilitate automatic data exploration and data integration via widgets. This imposes a semantic format on widget output and helps the framework to address data heterogeneity. (iv) We provide an environment for interactive, collaborative mashup creation and analysis. (v) We introduce and implement the concept of distributed mashups.

CHAPTER 7
Conclusions and Future Work

7.1 Conclusions

The web of data is growing at a staggering pace; a large number of data sources, APIs, services, and data visualizations are publicly available. In order to satisfy their information needs, users increasingly have the opportunity to interact with and integrate information from various sources. However, end users are not able to directly access, explore, and combine different sources due to a number of technological barriers: (i) Data is stored in heterogeneous formats, e.g., XML, CSV, JSON, RDF, HTML, and Portable Document Format (PDF); (ii) users do not know where to find required data sources; (iii) provided that they are aware of appropriate sources, they frequently do not have the means and skills to access them; and (iv) if users are capable of collecting raw data from various sources, they typically cannot perform necessary data processing and data integration tasks. In this thesis, we present our approach for dynamic and automatic exploration and integration of heterogeneous data sources for non-expert users. We provide the answers to the three research questions as explained in Table 7.1. To foster reusability and creativity, we modularize functionality into Linked Widgets that users can recombine in order to create new applications. We make use of both client and server computing resources to create a powerful, extensible and scalable data integration model. With server Linked Widgets, data processing tasks can be run persistently and be distributed among various devices. This is particularly useful for data streaming or data monitoring use cases. We divide Linked Widgets into three categories, i.e., data, processing, and visualization widgets which perform the data retrieval, data processing/data integration, and data presentation tasks, respectively. The internal mechanics of these complicated tasks are not visible to end users because they are encapsulated inside the widgets. Widgets have input and output terminals and can be easily linked to each other by end users. When combined, each widget can receive data and return its processed output data to another

widget. This mechanism allows users to compose data-centric applications out of available Linked Widgets. These ad-hoc mashup applications, in turn, are re-edited, used, and shared among end-user communities.

RQ 1 (Data Heterogeneity): We make use of Linked Widgets, which are web widgets backed by a semantic model, to lift data in different formats to a semantic level (cf. Sections 3.3 and 3.4.3).

RQ 2 (Collaborative Data Integration): We categorize Linked Widgets into client and server widgets. On the one hand, client widgets can collect data on the web and process it in a web browser environment. On the other hand, with server widgets, mashups can be assembled from components that are distributed among various devices. This facilitates collaborative data integration in which each stakeholder can contribute their data and computing resources to the shared data processing flow. To this end, we design local, remote, and hybrid protocols (cf. Section 3.6) for widget interaction and present mashup patterns for collaborative, persistent, distributed, and streaming use cases (cf. Section 3.7).

RQ 3 (Automatic Data Integration): We present our mechanisms to (i) validate the links between widgets and discover all widgets that can provide data to or consume data from the input and output terminals of a widget (cf. Section 3.9.1), (ii) automatically compose mashups from a given set of widgets (cf. Section 3.9.2), and (iii) create auto-composed mashups from structured text (cf. Section 3.10).

Table 7.1: Answers to research questions

Based on the concept of Linked Widgets, we design a conceptual framework that aims to (i) connect developers, data publishers, data integrators, and end users, (ii) provide universal and practical advantages without restrictions on domain or data sources, (iii) allow users to combine/integrate multiple (linked) open data sources and leverage their joint value, (iv) allow novice users to easily analyze, integrate, and visualize data, and (v) allow users to perform data processing tasks in a collaborative and distributed manner simultaneously on multiple devices. The framework is built upon semantic web technologies and its design follows three guiding principles: openness, connectedness, and reusability. Openness distinguishes our approach from similar ones and is the key to tackling disparate data sources. It means developers can implement and directly add their new widgets to the framework. Connectedness means users can combine data from different sources by connecting widgets of distinct developers. Finally, we foster reusability by allowing users to make use of the

same widget in a dynamic and creative manner to compose various applications. We leverage the power of semantic mashups to help novice users easily explore and integrate available web resources by means of widgets. We encapsulate semantic, graph-based models inside Linked Widgets and provide mechanisms to annotate their input and output. Widgets can obtain and process both semantic and non-semantic data. They lift non-semantic data to a semantic level and produce semantic output data according to their predefined models. Based on the widget models, we design novel mechanisms to ease the composition process. Terminal matching and semantic widget search are aimed at users who are already familiar with the composition environment but need support in finding appropriate widgets. Auto-composition targets novices who have little or no experience in mashup development and need to be guided through the composition process. Finally, the tag-based composition mechanism allows users to compose mashups from structured text and hence considerably simplifies the mashup development process. A prototype implementation of the framework is already available. We illustrate the feasibility and effectiveness of the framework through location-based use cases combining data from five LOD datasets and three other open data sources. These examples can easily be extended by adding additional blocks for new datasets. Users do not have to manually browse different URIs or write SPARQL queries to retrieve and aggregate data; our approach provides a common framework to integrate and visualize the desired information. It could also serve well as a rapid prototyping environment for data integration projects.
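As a simple illustration of this lifting step, the Python sketch below converts one non-semantic record into RDF according to a predefined model using rdflib; the vocabulary and URI scheme are placeholders, not the exact models used by our widgets.

    from rdflib import Graph, Literal, Namespace, URIRef

    SCHEMA = Namespace("http://schema.org/")

    def lift_record(record: dict) -> Graph:
        # Lift a non-semantic record (e.g., one CSV row) to RDF triples
        # that conform to the widget's predefined output model.
        g = Graph()
        place = URIRef("http://example.org/place/" + record["id"])
        g.add((place, SCHEMA["name"], Literal(record["name"])))
        g.add((place, SCHEMA["latitude"], Literal(float(record["lat"]))))
        g.add((place, SCHEMA["longitude"], Literal(float(record["lng"]))))
        return g

    row = {"id": "1", "name": "Karlsplatz", "lat": "48.2006", "lng": "16.3695"}
    print(lift_record(row).serialize(format="turtle"))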

7.2 Future Work

7.2.1 Mobile Mashup Editor

A prototype web-based mashup editor is already available online. Even though users can use this editor in a mobile browser to combine widgets, we plan to implement a separate editor designed especially for mobile devices. Together with server widgets, the mobile mashup editor will complete our mobile mashup environment.

7.2.2 Publish Mashup Result as Linked Data

Linked Widgets lift data in arbitrary formats to semantic data. Because distributed mashups can run persistently, we can access their semantic output data at any time. Publishing a distributed mashup thereby means publishing a data source which can again be consumed by other entities. To foster automatic data integration, we plan to transform output data into Linked Data. To this end, the framework assigns each resource of the output data a dereferenceable URI that can be looked up by people and user agents.

1 http://linkedwidgets.org/ (accessed Nov. 01, 2015)
2 http://linkedwidgets.org (accessed Nov. 01, 2015)

7.2.3 Automatic Mashup Composition

Future research will also aim to improve our automatic mashup composition algorithm by adding heuristics for the search and backtracking steps. Furthermore, we aim to enable developers (who can already access widget annotations as Linked Open Data) to implement custom composition algorithms for particular data domains and integrate them into the framework as plugins.

7.2.4 Linked Widgets Semantic Model

At present, we annotate only the input and output models of widgets; annotating parameters in the user interface of widgets has not been considered yet. This keeps the semantic model lightweight, but the detailed functionality of widgets is not well described. The Tag-based Composition Module hence cannot automatically set appropriate parameters for the member widgets of composed mashups. Another issue when modeling widgets is that some of them cannot be semantically annotated in a meaningful manner. Generic widgets that perform filter or aggregation tasks do not have fixed models; their interfaces and output depend on the input data, which the widget only becomes aware of at run time. Similarly, a data widget can output different data models, depending on the parameters defined in its interface. We intend to design a dynamic run-time model for such widgets. As soon as users drag such a widget into a mashup and parameterize it, its semantic model is propagated to facilitate automatic composition, even for widgets whose models vary at run time.

7.2.5 Widget Change Management

Another research challenge is to tackle widgets that evolve over time. At present, developers can change the widget source code at any time because it is decoupled from the framework. Changing the widget model, however, would break existing mashups and is hence problematic. To address this issue, we already store widget versions and use our widget provenance ontology to keep track of them. Future research will focus on leveraging such annotation to effectively manage and maintain the framework resources.

7.2.6 Natural Language-supported Mashups

We currently do not process untagged words entered by users in the Tag-based Composition Module; these strings can contain information that further clarifies the user's context and helps to make the composed mashup more precise. We plan to improve our approach and analyze the whole input sentence, at least for sentences that follow frequent and useful patterns. Moreover, if the mashup context given by users is not clear enough, or if no mashups can be automatically composed, we can follow the dialog-based approach: we ask the user, receive their answer, and adapt the context step by step.

7.2.7 Widget Collections for Various Domains

Finally, as we are still at a preliminary stage, we plan to create more widgets for different domains. This will allow users to work with different kinds of data sources (e.g., governmental, financial, or environmental data) and types of data (e.g., open, linked, or tabular data) without technical barriers. From a data perspective, the vision is to support users in their everyday decision-making. Another interesting direction for future research is the automatic creation of new widgets able to handle dynamic web data as an input source.


Appendix

A Running Example of the Automatic Mashup Composition Algorithm

In Table A.1, we present the detailed steps of the automatic mashup composition algorithm when composing the complete mashups of w2 in the graph of seven widgets illustrated in Figure 3.19.

For each step, the table lists the un-visited input vertices (uiVertices), the saved forward edges (sfEdges), the saved backward edges (sbEdges), the mashup under construction, and the isDead flag. The trace proceeds as follows:

1. Start and Initialize set up uiVertices, sfEdges, sbEdges, and an empty mashup for the root widget w2.
2. goForward(i₂³), goForward(o³), and goForward(i₁⁵) grow the mashup in the forward direction, adding the traversed edges and newly reachable input vertices.
3. goBackward(i₁³) and goBackward(o⁷) continue in the backward direction; goBackward(i₁⁷) reaches a DEAD state (isDead = true), so the algorithm backtracks and tries the next edge of sbEdges.
4. goBackward(o⁴), goBackward(i₁²), and goBackward(o¹) succeed, and a complete mashup is found, as all vertices of uiVertices have been visited.
5. The algorithm backtracks again and tries the next edge of sfEdges: goForward(i₁⁶) is followed by goForward(o⁶), which reaches another DEAD state.
6. The algorithm ends because both sbEdges and sfEdges are now empty.

Table A.1: Running example of the automatic mashup composition algorithm

B Box Plots of the Terminal Matching Experiment

In the following, we present the box plots of data from the terminal matching experiment.

1. N-experiment (X = 10 and R = 0.2): Figures B.1 and B.2, respectively, present the CPU utilization and the total CPU time of the preliminary matching and the full matching for each N ∈ {1000, 2000, 4000, 6000, 8000, 10000}.
2. X-experiment (N = 8000 and R = 0.2): Figures B.3 and B.4, respectively, present the CPU utilization and the total CPU time of the preliminary matching and the full matching for each X ∈ {1, 5, 10, 15, 20}.
3. R-experiment (N = 8000 and X = 10): Figures B.5 and B.6, respectively, present the CPU utilization and the total CPU time of the preliminary matching and the full matching for each R ∈ {0.1, 0.2, 0.3, 0.4, 0.5}.

Figure B.1: Terminal matching: CPU utilization as a function of the number of widgets (matching rate R = 0.2 and model complexity X = 10); box plots of (a) the CPU utilization of the preliminary matching and (b) the CPU utilization of the full matching.

Figure B.2: Terminal matching: total CPU time as a function of the number of widgets (matching rate R = 0.2 and model complexity X = 10); box plots of (a) the total CPU time of the preliminary matching and (b) the total CPU time of the full matching.

Figure B.3: Terminal matching: CPU utilization as a function of the complexity of the widget model (number of widgets N = 8000 and matching rate R = 0.2); box plots of (a) the CPU utilization of the preliminary matching and (b) the CPU utilization of the full matching.

Figure B.4: Terminal matching: total CPU time as a function of the complexity of the widget model (number of widgets N = 8000 and matching rate R = 0.2); box plots of (a) the total CPU time of the preliminary matching and (b) the total CPU time of the full matching.

Figure B.5: Terminal matching: CPU utilization as a function of the input-output matching rate (number of widgets N = 8000 and model complexity X = 10); box plots of (a) the CPU utilization of the preliminary matching and (b) the CPU utilization of the full matching.

Figure B.6: Terminal matching: total CPU time as a function of the input-output matching rate (number of widgets N = 8000 and model complexity X = 10); box plots of (a) the total CPU time of the preliminary matching and (b) the total CPU time of the full matching.

C Box Plots of the Automatic Mashup Composition Experiment

In the following, we present the box plots of data from the automatic mashup composition experiment. R-experiment: Figure C.1 shows the CPU utilization and the total CPU time of the algorithm over an increasing number of matched output vertices per input vertex. N_d, N_p, and N_v experiments: Figures C.2, C.3, and C.4 illustrate the CPU utilization and the total CPU time of the algorithm over an increasing number of data, processing, and visualization widgets, respectively. The total CPU time is measured until 20,000 mashups have been composed.

Figure C.1: Automatic mashup composition: (a) CPU utilization and (b) total CPU time for an increasing input-output matching rate, i.e., the number of output vertices that match each input vertex (N_d = 5, N_p = 10, N_v = 5); box plots.

Figure C.2: Automatic mashup composition: (a) CPU utilization and (b) total CPU time (observed until 20,000 mashups have been composed) for an increasing number of data widgets (N_p = 100, N_v = 50, R = 0.2); box plots.

Figure C.3: Automatic mashup composition: (a) CPU utilization and (b) total CPU time (observed until 20,000 mashups have been composed) for an increasing number of processing widgets (N_d = 50, N_v = 50, R = 0.2); box plots.

Figure C.4: Automatic mashup composition: (a) CPU utilization and (b) total CPU time (observed until 20,000 mashups have been composed) for an increasing number of visualization widgets (N_d = 50, N_p = 100, R = 0.2); box plots.


Bibliography

[1] Tuan-Dat Trinh, Ba-Lam Do, Peter Wetz, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. A Drag-and-block Approach for Linked Open Data Exploration. In Proceedings of the 5th International Workshop on Consuming Linked Data (COLD 2014) co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 20, 2014, 2014. URL http://ceur-ws.org/Vol-1264/cold2014_TrinhDWAKT.pdf.

[2] Tuan-Dat Trinh, Peter Wetz, Ba-Lam Do, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. A Web-based Platform for Dynamic Integration of Heterogeneous Data. In Proceedings of the 16th International Conference on Information Integration and Web-based Applications & Services, Hanoi, Vietnam, December 4-6, 2014, pages 253–261, 2014. doi: 10.1145/2684200.2684291. URL http://doi.acm.org/10. 1145/2684200.2684291.

[3] Tuan-Dat Trinh, Peter Wetz, Ba-Lam Do, Elmar Kiesling, and A Min Tjoa. Distributed Mashups: A Collaborative Approach to Data Integration. IJWIS, 11(3):370–396, 2015. doi: 10.1108/IJWIS-04-2015-0018. URL http://dx.doi. org/10.1108/IJWIS-04-2015-0018.

[4] Tuan-Dat Trinh, Peter Wetz, Ba-Lam Do, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. Implementing Linked Widgets: Lessons Learned for Linked Data Developers. In Proceedings of the ISWC Developers Workshop 2014, co-located with the 13th International Semantic Web Conference (ISWC 2014), Riva del Garda, Italy, October 19, 2014., pages 25–30, 2014. URL http://ceur-ws. org/Vol-1268/paper5.pdf.

[5] Tuan-Dat Trinh, Ba-Lam Do, Peter Wetz, Amin Anjomshoaa, and A Min Tjoa. Linked Widgets: An Approach to Exploit Open Government Data. In The 15th International Conference on Information Integration and Web-based Applications & Services, IIWAS ’13, Vienna, Austria, December 2-4, 2013, page 438, 2013. doi: 10.1145/2539150.2539252. URL http://doi.acm.org/10.1145/2539150. 2539252.

[6] Tuan-Dat Trinh, Peter Wetz, Ba-Lam Do, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. Linked Widgets Platform: Lowering the Barrier for Open Data

165 Exploration. In The Semantic Web: ESWC 2014 Satellite Events - ESWC 2014 Satellite Events, Anissaras, Crete, Greece, May 25-29, 2014, Revised Selected Papers, pages 171–182, 2014. doi: 10.1007/978-3-319-11955-7_14. URL http: //dx.doi.org/10.1007/978-3-319-11955-7_14. [7] Tuan-Dat Trinh, Ba-Lam Do, Peter Wetz, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. Open Mashup Platform - A Smart Data Exploration Environment. In Proceedings of the ISWC 2014 Posters & Demonstrations Track a track within the 13th International Semantic Web Conference, ISWC 2014, Riva del Garda, Italy, October 21, 2014., pages 53–56, 2014. URL http://ceur-ws.org/Vol-1272/ paper_45.pdf. [8] Tuan-Dat Trinh, Peter Wetz, Ba-Lam Do, Amin Anjomshoaa, Elmar Kiesling, and A Min Tjoa. Open Linked Widgets Mashup Platform. In Proceedings of the AI Mashup Challenge 2014 co-located with 11th Extended Semantic Web Conference (ESWC 2014), Crete, Greece, May 27, 2014., 2014. URL http://ceur-ws.org/ Vol-1200/paper2.pdf. [9] Joel Gurin. Open Data Now: The Secret to Hot Startups, Smart Investing, Savvy Marketing, and Fast Innovation. McGraw-Hill Education, 2014. ISBN 978-0-07- 182977-9. [10] Marijn Janssen, Yannis Charalabidis, and Anneke Zuiderwijk. Benefits, Adoption Barriers and Myths of Open Data and Open Government. Information Systems Management, 29(4):258–268, 2012. doi: 10.1080/10580530.2012.716740. URL http://dx.doi.org/10.1080/10580530.2012.716740. [11] Katrin Braunschweig, Julian Eberius, Maik Thiele, and Wolfgang Lehner. The State of Open Data - Limits of Current Open Data Platforms. In Proceedings of the 21st World Wide Web Conference 2012, Web Science Track at WWW’12, Lyon, France, April 16-20, 2012. ACM, 2012. ISBN 978-1-4503-1229-5. [12] Hannu Jaakkola, Timo Mäkinen, and Anna Eteläaho. Open Data: Opportunities and Challenges. In Proceedings of the 15th International Conference on Computer Systems and Technologies, CompSysTech ’14, pages 25–39, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2753-4. doi: 10.1145/2659532.2659594. URL http: //doi.acm.org/10.1145/2659532.2659594. [13] Norman W. Paton, Klitos Christodoulou, Alvaro A. A. Fernandes, Bijan Parsia, and Cornelia Hedeler. Pay-as-you-go Data Integration for Linked Data: Opportunities, Challenges and Architectures. In Proceedings of the 4th International Workshop on Semantic Web Information Management, SWIM ’12, pages 3:1–3:8, New York, NY, USA, 2012. ACM. ISBN 978-1-4503-1446-6. doi: 10.1145/2237867.2237870. URL http://doi.acm.org/10.1145/2237867.2237870. [14] An-Hai Doan, Alon Halevy, and Zachary Ives. Principles of Data Integration. Elsevier Science, 2012. ISBN 978-0-12-391479-8.

[15] John Hebeler. Semantic Web Programming. Wiley, Indianapolis, Ind., 2009. ISBN 978-0-470-41801-7.

[16] Grigoris Antoniou and Frank van Harmelen. A Semantic Web Primer. Cooperative Information Systems. MIT Press, Cambridge, Mass., 2004. ISBN 978-0-262-01210-2.

[17] Christian Bizer, Tom Heath, and Tim Berners-Lee. Linked Data - The Story So Far. International Journal on Semantic Web and Information Systems, 5(3):1–22, 2009. URL http://eprints.soton.ac.uk/271285/.

[18] Sören Auer, Volha Bryl, and Sebastian Tramp. Linked Open Data - Creating Knowledge Out of Interlinked Data: Results of the LOD2 Project. Lecture Notes in Computer Science. Springer International Publishing, 2014. ISBN 978-3-319-09846-3.

[19] Ian Millard, Hugh Glaser, Manuel Salvadores, and Nigel Shadbolt. Consuming Multiple Linked Data Sources: Challenges and Experiences. In Proceedings of the First International Workshop on Consuming Linked Data, Shanghai, China, November 8, 2010. URL http://ceur-ws.org/Vol-665/MillardEtAl_COLD2010.pdf.

[20] Andreas Harth, Katja Hose, and Ralf Schenkel. Linked Data Management. Emerging Directions in Database Systems and Applications. CRC Press, 2014. ISBN 978-1-4665-8241-5.

[21] Robert Lee McCann. Efficient Data Integration: Automation, Collaboration, and Relaxation. University of Illinois at Urbana-Champaign, 2007. ISBN 978-0-549-34103-1.

[22] Rosen Publishing Group. The Future of the Web. Scientific American Cutting-Edge Science. Rosen Publishing, 2007. ISBN 978-1-4042-0989-3.

[23] Tim Berners-Lee and Mark Fischetti. Weaving the Web: The Original Design and Ultimate Destiny of the World Wide Web by Its Inventor. Paw Prints, 2008. ISBN 978-1-4395-0036-1.

[24] Keith DeWeese and Dan Segal. Libraries and the Semantic Web: An Introduction to Its Applications and Opportunities for Libraries. Morgan & Claypool, 2014. ISBN 978-1-62705-196-5. URL http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=7007876.

[25] Tim Berners-Lee, James Hendler, and Ora Lassila. The Semantic Web. Scientific American, 284(5):28–37, 2001.

[26] Bob DuCharme. Integrate Disparate Data Sources with Semantic Web Technology. Technical report, TopQuadrant, September 2010. URL http://www.ibm.com/developerworks/library/x-disprdf/.

[27] San Murugesan. Understanding Web 2.0. IT Professional, 9(4):34–41, July 2007. ISSN 1520-9202. doi: 10.1109/MITP.2007.78. URL http://dx.doi.org/10.1109/MITP.2007.78.

[28] Darlene Fichter. What Is a Mashup? In Library Mashups: Exploring New Ways to Deliver Library Data. Information Today, Incorporated, 2010.

[29] Florian Daniel and Maristella Matera. Mashups: Concepts, Models and Architectures. Springer, Heidelberg, 2014. ISBN 978-3-642-55048-5.

[30] Raymond Yee. Pro Web 2.0 Mashups: Remixing Data and Web Services. Apress, 2008. ISBN 978-1-4302-0286-8.

[31] Anant Jhingran. Enterprise Information Mashups: Integrating Information, Simply. In Proceedings of the 32nd International Conference on Very Large Data Bases, VLDB '06, pages 3–4, Seoul, Korea, 2006. VLDB Endowment. URL http://dl.acm.org/citation.cfm?id=1182635.1164128.

[32] David E. Simmen, Mehmet Altinel, Volker Markl, Sriram Padmanabhan, and Ashutosh Singh. Damia: Data Mashups for Intranet Applications. In Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data, pages 1171–1182. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1376734.

[33] Mark Pruett. Yahoo! Pipes. O'Reilly, first edition, 2007. ISBN 978-0-596-51453-2.

[34] Eric Griffin. Foundations of Popfly: Rapid Mashup Development. Apress, 2008. ISBN 978-1-4302-0568-5.

[35] Tony Loton. Creating Google Mashups with the Google Mashup Editor. Lotontech Limited, 2008. ISBN 1-4404-5984-3.

[36] David F. Huynh, David R. Karger, and Robert C. Miller. Exhibit: Lightweight Structured Data Publishing. In Proceedings of the 16th International Conference on World Wide Web, pages 737–746. ACM, 2007. URL http://dl.acm.org/citation.cfm?id=1242672.

[37] Robert J. Ennals and Minos N. Garofalakis. MashMaker: Mashups for the Masses. In Proceedings of the 2007 ACM SIGMOD International Conference on Management of Data, pages 1116–1118. ACM, 2007. URL http://dl.acm.org/citation.cfm?id=1247626.

[38] Jeffrey Wong and Jason I. Hong. Making Mashups with Marmite: Towards End-User Programming for the Web. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, pages 1435–1444. ACM, 2007. URL http://dl.acm.org/citation.cfm?id=1240842.

[39] Mau Quoc Hoan Nguyen, Martin Serrano, Danh Le-Phuoc, and Manfred Hauswirth. Super Stream Collider - Linked Stream Mashups for Everyone. In Proceedings of the Semantic Web Challenge, co-located with ISWC 2012, Boston, US, 2012.

[40] Danh Le-Phuoc, Axel Polleres, Manfred Hauswirth, Giovanni Tummarello, and Christian Morbidoni. Rapid Prototyping of Semantic Mash-Ups Through Semantic Web Pipes. In Proceedings of the 18th International Conference on World Wide Web, pages 581–590. ACM, 2009. URL http://dl.acm.org/citation.cfm?id=1526788.

[41] Mustafa Jarrar and Marios D. Dikaiakos. MashQL: A Query-by-Diagram Topping SPARQL. In Proceedings of the 2nd International Workshop on Ontologies and Information Systems for the Semantic Web, pages 89–96. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1458499.

[42] Muhammad Imran, Stefano Soi, Felix Kling, Florian Daniel, Fabio Casati, and Maurizio Marchese. On the Systematic Development of Domain-specific Mashup Tools for End Users. In Web Engineering, pages 291–298. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-31753-8_22.

[43] Guiling Wang, Shaohua Yang, and Yanbo Han. Mashroom: End-user Mashup Programming Using Nested Tables. In Proceedings of the 18th International Conference on World Wide Web, WWW '09, pages 861–870, New York, NY, USA, 2009. ACM. ISBN 978-1-60558-487-4. doi: 10.1145/1526709.1526825. URL http://doi.acm.org/10.1145/1526709.1526825.

[44] Rob Ennals, Eric Brewer, Minos Garofalakis, Michael Shadle, and Prashant Gandhi. Intel Mash Maker: Join the Web. ACM SIGMOD Record, 36(4):27–33, 2007. URL http://dl.acm.org/citation.cfm?id=1361355.

[45] James Lin, Jeffrey Wong, Jeffrey Nichols, Allen Cypher, and Tessa A. Lau. End-user Programming of Mashups with Vegemite. In Proceedings of the 14th International Conference on Intelligent User Interfaces, pages 97–106. ACM, 2009. URL http://dl.acm.org/citation.cfm?id=1502667.

[46] Florian Daniel, Fabio Casati, Boualem Benatallah, and Ming-Chien Shan. Hosted Universal Composition: Models, Languages and Infrastructure in MashArt. In Conceptual Modeling - ER 2009, pages 428–443. Springer, 2009. URL http://link.springer.com/chapter/10.1007/978-3-642-04840-1_32.

[47] J. Jeffrey Hanson. Mashups: Strategies for the Modern Enterprise. Addison-Wesley Professional, 1st edition, 2009. ISBN 978-0-321-59181-4.

[48] Wolfgang Pree. Design Patterns for Object-oriented Software Development. ACM Press/Addison-Wesley Publishing Co., New York, NY, USA, 1995. ISBN 0-201-42294-8.

[49] Vincent Balat. Client-server Web Applications Widgets. In Proceedings of the 22nd International Conference on World Wide Web, WWW '13 Companion, pages 19–22, Republic and Canton of Geneva, Switzerland, 2013. International World Wide Web Conferences Steering Committee. ISBN 978-1-4503-2038-2. URL http://dl.acm.org/citation.cfm?id=2487788.2487795.

[50] Tom Heath and Christian Bizer. Linked Data: Evolving the Web into a Global Data Space. Synthesis Lectures on Web Engineering. Morgan & Claypool, 2011. ISBN 978-1-60845-430-3.

[51] Anupriya Ankolekar, Markus Krötzsch, Thanh Tran, and Denny Vrandecic. The Two Cultures: Mashing up Web 2.0 and the Semantic Web. In Proceedings of the 16th International Conference on World Wide Web. ACM Press, 2007.

[52] Amit Sheth, Kunal Verma, and Karthik Gomadam. Semantics to Energize the Full Services Spectrum. Communications of the ACM, 49(7):55–61, 2006. URL http://dl.acm.org/citation.cfm?id=1139949.

[53] Sheila A. McIlraith, Tran Cao Son, and Honglei Zeng. Semantic Web Services. IEEE Intelligent Systems, 16(2):46–53, 2001. URL http://www.computer.org/csdl/mags/ex/2001/02/x2046.pdf.

[54] Tim O'Reilly. What Is Web 2.0 - Design Patterns and Business Models for the Next Generation of Software. Technical report, University Library of Munich, Germany, 2005. URL http://papers.ssrn.com/sol3/papers.cfm?abstract_id=1008839.

[55] Luciano Floridi. Web 2.0 vs. the Semantic Web: A Philosophical Assessment. Episteme, 6(01):25–37, 2009. ISSN 1750-0117. doi: 10.3366/E174236000800052X. URL http://journals.cambridge.org/article_S1742360000001180.

[56] Roger Pressman. Software Engineering: A Practitioner's Approach. McGraw-Hill Education, 2010. ISBN 978-0-07-337597-7.

[57] Henry Lieberman, Fabio Paternò, and Volker Wulf. End User Development. Human-Computer Interaction Series. Springer Netherlands, 2006. ISBN 978-1-4020-5386-3.

[58] Bonnie A. Nardi. A Small Matter of Programming: Perspectives on End User Computing. MIT Press, Cambridge, MA, 1993. ISBN 978-0-262-28040-2.

[59] Claude Ghaoui. Encyclopedia of Human Computer Interaction. Idea Group Reference, 2005. ISBN 978-1-59140-798-0.

[60] Lars Grammel and Margaret-Anne Storey. An End User Perspective on Mashup Makers. Technical Report DCS-324-IR, University of Victoria, 2008.

[61] Cinzia Cappiello, Florian Daniel, Maristella Matera, Matteo Picozzi, and Michael Weiss. Enabling End User Development Through Mashups: Requirements, Abstractions and Innovation Toolkits. In End-User Development, pages 9–24. Springer, 2011. URL http://link.springer.com/chapter/10.1007/978-3-642-21530-8_3.

[62] Thomas Fischer, Fedor Bakalov, and Andreas Nauerz. An Overview of Current Approaches to Mashup Generation. In Proceedings of the International Workshop on Knowledge Services and Mashups (KSM09), pages 254–259. Citeseer, 2009. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.416.7684&rep=rep1&type=pdf#page=255.

[63] Marwan Sabbouh, Jeff Higginson, Salim Semy, and Danny Gagne. Web Mashup Scripting Language. In Proceedings of the 16th International Conference on World Wide Web, pages 1305–1306. ACM, 2007. URL http://dl.acm.org/citation.cfm?id=1242821.

[64] Erhard Rahm, Andreas Thor, and David Aumueller. Dynamic Fusion of Web Data. In Database and XML Technologies, volume 4704 of Lecture Notes in Computer Science, pages 14–16. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-75287-5. URL http://dx.doi.org/10.1007/978-3-540-75288-2_2.

[65] Rattapoom Tuchinda, Pedro Szekely, and Craig A. Knoblock. Building Mashups by Example. In Proceedings of the 13th International Conference on Intelligent User Interfaces, pages 139–148. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1378792.

[66] David F. Huynh, Robert C. Miller, and David R. Karger. Potluck: Data Mash-Up Tool for Casual Users. In The Semantic Web, volume 4825 of Lecture Notes in Computer Science, pages 239–252. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-76297-3. URL http://dx.doi.org/10.1007/978-3-540-76298-0_18.

[67] Michael Pierre Carlson, Anne H.H. Ngu, Rodion Podorozhny, and Liangzhao Zeng. Automatic Mash Up of Composite Applications. In Service-Oriented Computing - ICSOC 2008, pages 317–330. Springer, 2008. URL http://link.springer.com/chapter/10.1007/978-3-540-89652-4_25.

[68] Thomas Fischer, Fedor Bakalov, and Andreas Nauerz. Towards an Automatic Service Composition for Generation of User-Sensitive Mashups. In LWA, pages 14–16. Citeseer, 2008. URL http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.142.9191&rep=rep1&type=pdf#page=20.

[69] Margaret Burnett, Curtis Cook, and Gregg Rothermel. End-User Software Engineering. Communications of the ACM, 47(9):53–58, 2004. URL http://dl.acm.org/citation.cfm?id=1015889.

[70] Hendrik, Amin Anjomshoaa, and A Min Tjoa. Towards Semantic Mashup Tools for Big Data Analysis. In Information and Communication Technology, volume 8407 of Lecture Notes in Computer Science, pages 129–138. Springer Berlin Heidelberg, 2014. ISBN 978-3-642-55031-7. URL http://dx.doi.org/10.1007/978-3-642-55032-4_13.

[71] Johannes Lorey, Felix Naumann, Benedikt Forchhammer, Andrina Mascher, Peter Retzlaff, and Armin ZamaniFarahani. Black Swan: Augmenting Statistics with Event Data. In Proceedings of the 20th ACM Conference on Information and Knowledge Management (CIKM 2011), pages 2517–2520, Glasgow, United Kingdom, 2011. doi: 10.1145/2063576.2064007. URL http://doi.acm.org/10.1145/2063576.2064007.

[72] Saeed Aghaee and Cesare Pautasso. End-User Programming for Web Mashups: Open Research Challenges. In Proceedings of the 11th International Conference on Current Trends in Web Engineering, ICWE '11, pages 347–351, Paphos, Cyprus, 2012. Springer Berlin Heidelberg. ISBN 978-3-642-27996-6. doi: 10.1007/978-3-642-27997-3_38. URL http://dx.doi.org/10.1007/978-3-642-27997-3_38.

[73] E. Michael Maximilien, Hernan Wilkinson, Nirmit Desai, and Stefan Tai. A Domain-Specific Language for Web APIs and Services Mashups. In Service-Oriented Computing - ICSOC 2007, volume 4749 of Lecture Notes in Computer Science, pages 13–26. Springer Berlin Heidelberg, 2007. ISBN 978-3-540-74973-8. URL http://dx.doi.org/10.1007/978-3-540-74974-5_2.

[74] Fedor Bakalov, Birgitta König-Ries, Andreas Nauerz, and Martin Welsch. Ontology-Based Multidimensional Personalization Modeling for the Automatic Generation of Mashups in Next-Generation Portals. In First International Workshop on Ontologies in Interactive Systems, ONTORACT '08, pages 75–82. IEEE, September 2008. ISBN 978-0-7695-3542-5. doi: 10.1109/ONTORACT.2008.13. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4756198.

[75] Saeed Aghaee and Cesare Pautasso. An Evaluation of Mashup Tools Based on Support for Heterogeneous Mashup Components. In Current Trends in Web Engineering, pages 1–12. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-27997-3_1.

[76] Lars Grammel and Margaret-Anne Storey. A Survey of Mashup Development Environments. In The Smart Internet, pages 137–151. Springer, 2010. URL http://link.springer.com/chapter/10.1007/978-3-642-16599-3_10.

[77] Andrew Ko, Brad Myers, and Htet Aung. Six Learning Barriers in End-User Programming Systems. In Proceedings of the 2004 IEEE Symposium on Visual Languages - Human Centric Computing, VLHCC '04, pages 199–206, Washington, DC, USA, 2004. IEEE Computer Society. ISBN 0-7803-8696-5. doi: 10.1109/VLHCC.2004.47. URL http://dx.doi.org/10.1109/VLHCC.2004.47.

[78] Giusy Di Lorenzo, Hakim Hacid, Hye-young Paik, and Boualem Benatallah. Data Integration in Mashups. ACM SIGMOD Record, 38(1):59–66, 2009. URL http://dl.acm.org/citation.cfm?id=1558343.

[79] Florian Daniel, Agnes Koschmider, Tobias Nestler, Marcus Roy, and Abdallah Namoun. Toward Process Mashups: Key Ingredients and Open Research Challenges. In Proceedings of the 3rd and 4th International Workshop on Web APIs and Services Mashups, page 9. ACM, 2010. URL http://dl.acm.org/citation.cfm?id=1945008.

[80] David Lizcano, Javier Soriano, Marcos Reyes, and Juan J. Hierro. EzWeb/FAST: Reporting on a Successful Mashup-Based Solution for Developing and Deploying Composite Applications in the Upcoming "Ubiquitous SOA". In Mobile Ubiquitous Computing, Systems, Services and Technologies, UBICOMM '08, pages 488–495, Valencia, 2008. doi: 10.1109/UBICOMM.2008.61.

[81] Tobias Nestler, Marius Feldmann, Gerald Hübsch, André Preußner, and Uwe Jugel. The ServFace Builder - A WYSIWYG Approach for Building Service-Based Applications. In Web Engineering, volume 6189 of Lecture Notes in Computer Science, pages 498–501. Springer Berlin Heidelberg, 2010. ISBN 978-3-642-13910-9. URL http://dx.doi.org/10.1007/978-3-642-13911-6_37.

[82] Florian Daniel, Stefano Soi, Stefano Tranquillini, Fabio Casati, Chang Heng, and Li Yan. From People to Services to UI: Distributed Orchestration of User Interfaces. In Business Process Management, volume 6336 of Lecture Notes in Computer Science, pages 310–326. Springer Berlin Heidelberg, 2010. ISBN 978-3-642-15617-5. URL http://dx.doi.org/10.1007/978-3-642-15618-2_22.

[83] Gregor Blichmann, Carsten Radeck, and Klaus Meißner. Enabling End Users to Build Situational Collaborative Mashups at Runtime. In ICIW 2013, The Eighth International Conference on Internet and Web Applications and Services, pages 120–123, 2013. URL http://www.thinkmind.org/index.php?view=article&articleid=iciw_2013_5_40_20132.

[84] Stefan Pietschmann, Carsten Radeck, and Klaus Meißner. Semantics-Based Discovery, Selection and Mediation for Presentation-Oriented Mashups. In Proceedings of the 5th International Workshop on Web APIs and Service Mashups, page 7. ACM, 2011. URL http://dl.acm.org/citation.cfm?id=2076014.

[85] Brigitte Endres-Niggemeyer. Semantic Mashups. Springer Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-36402-0. URL http://link.springer.com/10.1007/978-3-642-36403-7.

[86] Aikaterini K. Kalou, Tzanetos Pomonis, Dimitrios A. Koutsomitropoulos, and Theodore S. Papatheodorou. Intelligent Book Mashup: Using Semantic Web Ontologies and Rules for User Personalisation. In Proceedings of the 4th IEEE International Conference on Semantic Computing (ICSC 2010), September 22-24, 2010, Carnegie Mellon University, Pittsburgh, PA, USA, pages 536–541, 2010. doi: 10.1109/ICSC.2010.78. URL http://dx.doi.org/10.1109/ICSC.2010.78.

[87] Christian Bizer, Richard Cyganiak, and Tobias Gauss. The RDF Book Mashup: From Web APIs to a Web of Data. In Proceedings of the ESWC'07 Workshop on Scripting for the Semantic Web, SFSW 2007, Innsbruck, Austria, May 30, 2007. URL http://ceur-ws.org/Vol-248/paper4.pdf.

[88] D. Gagne, M. Sabbouh, S. Bennett, and S. Powers. Using Data Semantics to Enable Automatic Composition of Web Services. In IEEE International Conference on Services Computing, SCC '06, pages 438–444, September 2006. doi: 10.1109/SCC.2006.112.

[89] Stefan Pietschmann, Vincent Tietz, Jan Reimann, Christian Liebing, Michèl Pohle, and Klaus Meißner. A Metamodel for Context-Aware Component-Based Mashup Applications. In Proceedings of the 12th International Conference on Information Integration and Web-based Applications & Services, pages 413–420. ACM, 2010. URL http://dl.acm.org/citation.cfm?id=1967551.

[90] Muhammad Imran, Felix Kling, Stefano Soi, Florian Daniel, Fabio Casati, and Maurizio Marchese. ResEval Mash: A Mashup Tool for Advanced Research Evaluation. In Proceedings of the 21st International Conference Companion on World Wide Web, pages 361–364. ACM, 2012. URL http://dl.acm.org/citation.cfm?id=2188049.

[91] Patrick Gaubatz and Uwe Zdun. UML2 Profile and Model-Driven Approach for Supporting System Integration and Adaptation of Web Data Mashups. In Current Trends in Web Engineering, pages 81–92. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-35623-0_9.

[92] Dirk Krafzig, Karl Banke, and Dirk Slama. Enterprise SOA: Service-Oriented Architecture Best Practices. Prentice Hall PTR, Upper Saddle River, NJ, USA, 2004. ISBN 0-13-146575-9. URL http://dl.acm.org/citation.cfm?id=1096077.

[93] Natalya F. Noy and Deborah L. McGuinness. Ontology Development 101: A Guide to Creating Your First Ontology. Technical report, Stanford University, 2001.

[94] Thomas R. Gruber. Toward Principles for the Design of Ontologies Used for Knowledge Sharing. International Journal of Human-Computer Studies, 43(5-6):907–928, December 1995. ISSN 1071-5819. doi: 10.1006/ijhc.1995.1081. URL http://dx.doi.org/10.1006/ijhc.1995.1081.

[95] Tom Gruber. Ontology. In Encyclopedia of Database Systems, pages 1963–1965. Springer US, 2009. ISBN 978-0-387-35544-3. URL http://dx.doi.org/10.1007/978-0-387-39940-9_1318.

[96] Jacek Kopecky, Tomas Vitvar, Carine Bournez, and Joel Farrell. SAWSDL: Semantic Annotations for WSDL and XML Schema. IEEE Internet Computing, 11(6):60–67, 2007. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=4376229.

[97] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. Rapidly Integrating Services into the Linked Data Cloud. In The Semantic Web - ISWC 2012, pages 559–574. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-35176-1_35.

[98] Mohsen Taheriyan, Craig A. Knoblock, Pedro Szekely, and José Luis Ambite. A Graph-Based Approach to Learn Semantic Descriptions of Data Sources. In The Semantic Web - ISWC 2013, pages 607–623. Springer, 2013. URL http://link.springer.com/chapter/10.1007/978-3-642-41335-3_38.

[99] Ruben Verborgh, Thomas Steiner, Davy Van Deursen, Rik Van de Walle, and Joaquim Gabarró Vallés. Efficient Runtime Service Discovery and Consumption with Hyperlinked RESTdesc. In Proceedings of the 7th International Conference on Next Generation Web Services Practices, pages 373–379, October 2011. doi: 10.1109/NWeSP.2011.6088208.

[100] Michael R. Garey and David S. Johnson. Computers and Intractability: A Guide to the Theory of NP-Completeness. W. H. Freeman & Co., New York, NY, USA, 1990. ISBN 0-7167-1045-5.

[101] Rakesh M. Verma and Steven W. Reyner. An Analysis of a Good Algorithm for the Subtree Problem, Correlated. SIAM Journal on Computing, 18(5):906–908, October 1989. ISSN 0097-5397. doi: 10.1137/0218062. URL http://dx.doi.org/10.1137/0218062.

[102] Vanessa Wang, Peter Moskovits, and Frank Salim. The Definitive Guide to HTML5 WebSocket. Apress, New York, NY, USA, 2013. ISBN 978-1-4302-4741-8.

[103] Tony Bourke. Server Load Balancing. O'Reilly Media, Incorporated, 2001. ISBN 978-0-596-00050-9.

[104] Jens Lehmann, Robert Isele, Max Jakob, Anja Jentzsch, Dimitris Kontokostas, Pablo N. Mendes, Sebastian Hellmann, Mohamed Morsey, Patrick van Kleef, Sören Auer, and Christian Bizer. DBpedia - A Large-Scale, Multilingual Knowledge Base Extracted from Wikipedia. Semantic Web, 6(2):167–195, 2015. doi: 10.3233/SW-140134. URL http://dx.doi.org/10.3233/SW-140134.

[105] Pablo N. Mendes, Max Jakob, Andrés García-Silva, and Christian Bizer. DBpedia Spotlight: Shedding Light on the Web of Documents. In Proceedings of the 7th International Conference on Semantic Systems, pages 1–8. ACM, 2011. URL http://dl.acm.org/citation.cfm?id=2063519.

[106] Douglas B. West. Introduction to Graph Theory. Prentice Hall, 2nd edition, September 2000. ISBN 0-13-014400-2.

[107] Jan Karel Lenstra, A. H. G. Rinnooy Kan, and Alexander Schrijver. History of Mathematical Programming: A Collection of Personal Reminiscences. CWI, 1991. ISBN 978-0-444-88818-1.

[108] Jorge Pérez, Marcelo Arenas, and Claudio Gutierrez. Semantics and Complexity of SPARQL. ACM Transactions on Database Systems, 34(3):16:1–16:45, September 2009. ISSN 0362-5915. doi: 10.1145/1567274.1567278. URL http://doi.acm.org/10.1145/1567274.1567278.

[109] Ruben Verborgh, Miel Vander Sande, Pieter Colpaert, Sam Coppens, Erik Mannens, and Rik Van de Walle. Web-Scale Querying through Linked Data Fragments. In Proceedings of the 7th Workshop on Linked Data on the Web, April 2014. URL http://events.linkeddata.org/ldow2014/papers/ldow2014_paper_04.pdf.

[110] Jérôme Euzenat and Pavel Shvaiko. Ontology Matching, Second Edition. Springer-Verlag Berlin Heidelberg, Berlin, Heidelberg, 2013. ISBN 978-3-642-38720-3.

[111] Manu Sporny, Dave Longley, Gregg Kellogg, Markus Lanthaler, and Niklas Lindström. JSON-LD 1.0, 2014. URL http://www.w3.org/TR/json-ld/.

[112] Glen Hart and Catherine Dolbear. Linked Data: A Geographic Perspective. CRC Press, 2013.

[113] Werner Kuhn, Tomi Kauppinen, and Krzysztof Janowicz. Linked Data - A Paradigm Shift for Geographic Information Science. In Geographic Information Science, volume 8728 of Lecture Notes in Computer Science, pages 173–186. Springer International Publishing, 2014. ISBN 978-3-319-11592-4. URL http://dx.doi.org/10.1007/978-3-319-11593-1_12.

[114] Kwok-Bun Yue. Experience on Mashup Development with End User Programming Environment. Journal of Information Systems Education, 21(1):111, 2010. URL http://prtl.uhcl.edu/portal/page/portal/SCE/COMPUTING_MATHMATICS_DIV/CM_Documents/YahooPipe_JISE_Paper_Published.pdf.

[115] Andreas Schultz, Andrea Matteini, Robert Isele, Christian Bizer, and Christian Becker. LDIF - Linked Data Integration Framework. In Proceedings of the Second International Workshop on Consuming Linked Data (COLD2011), Bonn, Germany, October 23, 2011. URL http://ceur-ws.org/Vol-782/SchultzEtAl_COLD2011.pdf.

[116] Devis Bianchini and Valeria De Antonellis. Linked Data Services and Semantics-Enabled Mashup. In Semantic Search over the Web, Data-Centric Systems and Applications, pages 283–307. Springer Berlin Heidelberg, 2012. ISBN 978-3-642-25007-1. URL http://dx.doi.org/10.1007/978-3-642-25008-8_11.

[117] Andreas Harth, Craig A. Knoblock, Steffen Stadtmüller, Rudi Studer, and Pedro Szekely. On-the-fly Integration of Static and Dynamic Linked Data. In Proceedings of the Fourth International Workshop on Consuming Linked Data, co-located with the 12th International Semantic Web Conference, 2013. URL http://planet-data.org/sites/default/files/publications/HarthEtAl_COLD2013.pdf.

[118] Tuan-Nhat Tran, Duy-Khanh Truong, Hanh-Huu Hoang, and Thanh-Manh Le. Linked Data Mashups: A Review on Technologies, Applications and Challenges. In Intelligent Information and Database Systems, volume 8398 of Lecture Notes in Computer Science, pages 253–262. Springer International Publishing, 2014. ISBN 978-3-319-05457-5. URL http://dx.doi.org/10.1007/978-3-319-05458-2_27.

[119] Fuyuko Matsumura, Iwao Kobayashi, Fumihiro Kato, Tetsuro Kamura, Ikki Ohmukai, and Hideaki Takeda. Producing and Consuming Linked Open Data on Art with a Local Community. In COLD, 2012. URL http://ceur-ws.org/Vol-905/MatsumuraEtAl_COLD2012.pdf.

[120] Fahad Alahmari, James A. Thom, Liam Magee, and Wilson Wong. Evaluating Semantic Browsers for Consuming Linked Data. In Proceedings of the Twenty-Third Australasian Database Conference - Volume 124, ADC '12, pages 89–98, Darlinghurst, Australia, 2012. Australian Computer Society, Inc. ISBN 978-1-921770-05-0. URL http://dl.acm.org/citation.cfm?id=2483739.2483751.

[121] Eugene Inseok Chong, Souripriya Das, George Eadon, and Jagannathan Srinivasan. An Efficient SQL-based RDF Querying Scheme. In Proceedings of the 31st International Conference on Very Large Data Bases, VLDB '05, pages 1216–1227, Trondheim, Norway, 2005. VLDB Endowment. ISBN 1-59593-154-6. URL http://dl.acm.org/citation.cfm?id=1083592.1083734.

[122] Arto Salminen and Tommi Mikkonen. Towards Pervasive Mashups in Embedded Devices: Comparing Procedural and Declarative Approach. International Journal of Communication Networks and Distributed Systems, 10(3):195–215, 2013. doi: 10.1504/IJCNDS.2013.053077. URL http://dx.doi.org/10.1504/IJCNDS.2013.053077.

[123] Tommi Mikkonen and Arto Salminen. Towards Pervasive Mashups in Embedded Devices. In IEEE 16th International Conference on Embedded and Real-Time Computing Systems and Applications (RTCSA), pages 35–42. IEEE, August 2010. ISBN 978-1-4244-8480-5. doi: 10.1109/RTCSA.2010.16. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5591281.

[124] Ray Rischpater and Daniel Zucker. Working with the Nokia Qt SDK. In Beginning Nokia Apps Development, pages 39–57. Apress, 2010. ISBN 978-1-4302-3177-6. URL http://dx.doi.org/10.1007/978-1-4302-3178-3_3.

[125] Yoon-Seop Chang, Seong-Ho Lee, Jae-Chul Kim, and Young-Jae Lim. Study on Mobile Mashup Webapp Development Tools for Different Devices and User Groups. In Information Networking (ICOIN), 2014 International Conference on, pages 433–438, February 2014. doi: 10.1109/ICOIN.2014.6799719.

[126] Fabio Corvetta, Maristella Matera, Riccardo Medana, Elisa Quintarelli, Vincenzo Rizzo, and Letizia Tanca. Designing and Developing Context-Aware Mobile Mashups: The CAMUS Approach. In Engineering the Web in the Big Data Era, volume 9114 of Lecture Notes in Computer Science, pages 651–654. Springer International Publishing, 2015. ISBN 978-3-319-19889-7. URL http://dx.doi.org/10.1007/978-3-319-19890-3_49.

[127] Carmelo Ardito, Maria Francesca Costabile, Giuseppe Desolda, Rosa Lanzilotti, Maristella Matera, Antonio Piccinno, and Matteo Picozzi. User-Driven Visual Composition of Service-Based Interactive Spaces. Journal of Visual Languages & Computing, 25(4):278–296, 2014. ISSN 1045-926X. doi: 10.1016/j.jvlc.2014.01.003. URL http://www.sciencedirect.com/science/article/pii/S1045926X14000299.

[128] Shang-Pin Ma, Yang-Sheng Ma, and Wen-Tin Lee. State-Driven and Brick-Based Mobile Mashup. In Mobile Services (MS), 2015 IEEE International Conference on, pages 190–196, June 2015. doi: 10.1109/MobServ.2015.35.

[129] Cesare Pautasso and Gustavo Alonso. The JOpera Visual Composition Language. Journal of Visual Languages & Computing, 16(1-2):119–152, February 2005. ISSN 1045-926X. doi: 10.1016/j.jvlc.2004.08.004. URL http://dx.doi.org/10.1016/j.jvlc.2004.08.004.

[130] Matteo Picozzi. End User Development of Multidevice and Collaborative Mashups. In CHItaly 2013 Doctoral Consortium, pages 55–65, Trento, 2013. Citeseer.

[131] Nelly Schuster, Raffael Stein, and Christian Zirpins. A Mashup Tool for Collaborative Engineering of Service-Oriented Enterprise Documents. In Information Systems Evolution, volume 72 of Lecture Notes in Business Information Processing, pages 166–173. Springer Berlin Heidelberg, 2011. ISBN 978-3-642-17721-7. URL http://dx.doi.org/10.1007/978-3-642-17722-4_12.

[132] Jason J. Jung. ContextGrid: A Contextual Mashup-Based Collaborative Browsing System. Information Systems Frontiers, 14(4):953–961, 2012. ISSN 1387-3326. doi: 10.1007/s10796-011-9315-z. URL http://dx.doi.org/10.1007/s10796-011-9315-z.

[133] Michael Hertel, Alexey Tschudnowsky, and Martin Gaedke. Conflict Resolution in Collaborative User Interface Mashups. In Engineering the Web in the Big Data Era, volume 9114 of Lecture Notes in Computer Science, pages 659–662. Springer International Publishing, 2015. ISBN 978-3-319-19889-7. URL http://dx.doi.org/10.1007/978-3-319-19890-3_51.

[134] Yi Xu, Chengzheng Sun, and Mo Li. Achieving Convergence in Operational Transformation: Conditions, Mechanisms and Systems. In Proceedings of the 17th ACM Conference on Computer Supported Cooperative Work & Social Computing, CSCW '14, pages 505–518, New York, NY, USA, 2014. ACM. ISBN 978-1-4503-2540-0. doi: 10.1145/2531602.2531629. URL http://doi.acm.org/10.1145/2531602.2531629.

[135] Jinghai Rao and Xiaomeng Su. A Survey of Automated Web Service Composition Methods. In Semantic Web Services and Web Process Composition, pages 43–54. Springer, 2005. URL http://link.springer.com/chapter/10.1007/978-3-540-30581-1_5.

[136] Schahram Dustdar and Wolfgang Schreiner. A Survey on Web Services Composition. International Journal of Web and Grid Services, 1(1):1–30, 2005. URL http://inderscience.metapress.com/index/473epwhagp6hbeba.pdf.

[137] Thomas Fischer, Fedor Bakalov, Birgitta König-Ries, Andreas Nauerz, and Martin Welsch. An Evolutionary Algorithm for Automatic Composition of Information-gathering Web Services in Mashups. In Seventh IEEE European Conference on Web Services, ECOWS '09, pages 39–48. IEEE, November 2009. ISBN 978-0-7695-3854-9. doi: 10.1109/ECOWS.2009.9. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5341673.

[138] Yuzhang Feng, Anitha Veeramani, and Rajaraman Kanagasabai. Enabling On-Demand Mashups of Open Data with Semantic Services. In 2012 International Conference on Parallel and Distributed Systems (ICPADS), pages 755–759, 2012. doi: 10.1109/ICPADS.2012.122.

[139] Rima Kilany Chamoun. Smart: Semantically Mashup REST Web Services. CoRR, abs/1311.3078, 2013. URL http://arxiv.org/abs/1311.3078.

[140] Anton V. Riabov, Eric Boillet, Mark D. Feblowitz, Zhen Liu, and Anand Ranganathan. Wishful Search: Interactive Composition of Data Mashups. In Proceedings of the 17th International Conference on World Wide Web, pages 775–784. ACM, 2008. URL http://dl.acm.org/citation.cfm?id=1367602.

[141] Anne H.H. Ngu, Michael P. Carlson, Quan Z. Sheng, and Hye-young Paik. Semantic-Based Mashup of Composite Applications. IEEE Transactions on Services Computing, 3(1):2–15, January 2010. ISSN 1939-1374. doi: 10.1109/TSC.2010.8. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=5432153.

[142] Hazem Elmeleegy, Anca Ivan, Rama Akkiraju, and Richard Goodwin. Mashup Advisor: A Recommendation Tool for Mashup Development. In IEEE International Conference on Web Services, ICWS '08, pages 337–344, Beijing, September 2008. IEEE. doi: 10.1109/ICWS.2008.128. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4670193.

[143] Daniel Deutch, Ohad Greenshpan, and Tova Milo. Navigating in Complex Mashed-up Applications. Proceedings of the VLDB Endowment, 3(1-2):320–329, 2010. URL http://dl.acm.org/citation.cfm?id=1920885.

[144] Lin Bai, Dan Ye, and Jun Wei. A Goal Decomposition Approach for Automatic Mashup Development. In Enterprise Interoperability, pages 20–33. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-33068-1_4.

[145] Carlos Rodríguez, Soudip Roy Chowdhury, Florian Daniel, Hamid R. Motahari Nezhad, and Fabio Casati. Assisted Mashup Development: On the Discovery and Recommendation of Mashup Composition Knowledge. In Web Services Foundations, pages 683–708. Springer New York, New York, NY, 2014. ISBN 978-1-4614-7517-0. URL http://link.springer.com/10.1007/978-1-4614-7518-7_27.

[146] Serge Abiteboul, Ohad Greenshpan, Tova Milo, and Neoklis Polyzotis. MatchUp: Autocompletion for Mashups. In IEEE International Conference on Data Engineering, ICDE '09, pages 1479–1482, Shanghai, March 2009. IEEE. ISBN 978-1-4244-3422-0. doi: 10.1109/ICDE.2009.47. URL http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4812552.

[147] Carsten Radeck, Alexander Lorz, Gregor Blichmann, and Klaus Meißner. Hybrid Recommendation of Composition Knowledge for End User Development of Mashups. In ICIW 2012, The Seventh International Conference on Internet and Web Applications and Services, pages 30–33, 2012. URL http://www.thinkmind.org/index.php?view=article&articleid=iciw_2012_2_10_20180.

[148] Soudip Roy Chowdhury, Olexiy Chudnovskyy, Matthias Niederhausen, Stefan Pietschmann, Paul Sharples, Florian Daniel, and Martin Gaedke. Complementary Assistance Mechanisms for End User Mashup Composition. In Proceedings of the 22nd International Conference on World Wide Web Companion, pages 269–272. International World Wide Web Conferences Steering Committee, 2013. URL http://dl.acm.org/citation.cfm?id=2487919.

[149] Matthias Henneberger, Bernd Heinrich, Florian Lautenbacher, and Bernhard Bauer. Semantic-Based Planning of Process Models. In Proceedings of the Multikonferenz Wirtschaftsinformatik (MKWI) 2008, pages 1677–1689, 2008.

[150] Vincent Tietz, Gregor Blichmann, Stefan Pietschmann, and Klaus Meißner. Task-Based Recommendation of Mashup Components. In Current Trends in Web Engineering, pages 25–36. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-27997-3_3.

[151] Ohad Greenshpan, Tova Milo, and Neoklis Polyzotis. Autocompletion for Mashups. Proceedings of the VLDB Endowment, 2(1):538–549, August 2009. ISSN 2150-8097. doi: 10.14778/1687627.1687689. URL http://dx.doi.org/10.14778/1687627.1687689.

[152] Stefan Pietschmann. A Model-Driven Development Process and Runtime Platform for Adaptive Composite Web Applications. International Journal on Advances in Internet Technology, 2(4):277–288, 2010. URL http://www.thinkmind.org/index.php?view=article&articleid=inttech_v2_n4_2009_2.

[153] Tomas Vitvar, Jacek Kopecky, Maciej Zaremba, and Dieter Fensel. WSMO-Lite: Lightweight Semantic Descriptions for Services on the Web. In Fifth European Conference on Web Services, ECOWS '07, pages 77–86, November 2007. doi: 10.1109/ECOWS.2007.30.

[154] Jens Lehmann and Lorenz Bühmann. AutoSPARQL: Let Users Query Your Knowledge Base. In The Semantic Web: Research and Applications, pages 63–79. Springer, 2011. URL http://link.springer.com/chapter/10.1007/978-3-642-21034-1_5.

[155] Saeed Aghaee and Cesare Pautasso. EnglishMash: Usability Design for a Natural Mashup Composition Environment. In Current Trends in Web Engineering, pages 109–120. Springer, 2012. URL http://link.springer.com/chapter/10.1007/978-3-642-35623-0_12.

[156] Saeed Aghaee, Cesare Pautasso, and Antonella De Angeli. Natural End-User Development of Web Mashups. In Visual Languages and Human-Centric Computing (VL/HCC), 2013 IEEE Symposium on, pages 111–118. IEEE, 2013. URL http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=6645253.

[157] Mariano Belaunde and Slim Ben Hassen. Service Mashups Using Natural Language and Context Awareness: A Pragmatic Architectural Design. In Enterprise Distributed Object Computing Conference Workshops (EDOCW), 2011 15th IEEE International, pages 404–411, Helsinki, August 2011.

[158] Mariano Belaunde and Paolo Falcarin. Realizing an MDA and SOA Marriage for the Development of Mobile Services. In Model Driven Architecture - Foundations and Applications, volume 5095 of Lecture Notes in Computer Science, pages 393–405. Springer Berlin Heidelberg, 2008. ISBN 978-3-540-69095-5. URL http://dx.doi.org/10.1007/978-3-540-69100-6_28.

[159] Fabrice Desre. Open Mashups: User Generated Applications for the Masses. Technical report, Orange Labs, 2009. URL http://www.archive.org/details/Cubicgarden-SANY0010a666-2.

[160] Greg Little, Tessa A. Lau, Allen Cypher, James Lin, Eben M. Haber, and Eser Kandogan. Koala: Capture, Share, Automate, Personalize Business Processes on the Web. In Proceedings of the SIGCHI Conference on Human Factors in Computing Systems, CHI '07, pages 943–946, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-593-9. doi: 10.1145/1240624.1240767. URL http://doi.acm.org/10.1145/1240624.1240767.
