Open Research Online
The Open University’s repository of research publications and other research outputs

Knowledge Components and Methods for Policy Propagation in Data Flows
Thesis

How to cite: Daga, Enrico (2018). Knowledge Components and Methods for Policy Propagation in Data Flows. PhD thesis, The Open University.
For guidance on citations see FAQs.

© 2017 The Author
https://creativecommons.org/licenses/by-nc-nd/4.0/
Version: Version of Record
Link(s) to article on publisher’s website: http://dx.doi.org/doi:10.5281/zenodo.1308710

Copyright and Moral Rights for the articles on this site are retained by the individual authors and/or other copyright owners. For more information on Open Research Online’s data policy on reuse of materials please consult the policies page.
oro.open.ac.uk

Knowledge Components and Methods for Policy Propagation in Data Flows
ENRICO DAGA
Knowledge Media Institute
Faculty of Science, Technology, Engineering and Mathematics
The Open University
This dissertation is submitted for the degree of Doctor of Philosophy
October 2017

Abstract

Data-oriented systems and applications are at the centre of current developments of the World Wide Web (WWW). On the Web of Data (WoD), information sources can be accessed and processed for many purposes. Users need to be aware of any licences or terms of use associated with the data sources they want to use. Conversely, publishers need support in assigning the appropriate policies alongside the data they distribute. In this work, we tackle the problem of policy propagation in data flows, an expression that refers to the way data is consumed, manipulated and produced within processes. We pose the question of what kind of components are required, and how they can be acquired, managed, and deployed, to support users in deciding which policies propagate to the output of a data-intensive system from the ones associated with its input.
We observe three scenarios: applications of the Semantic Web, workflow reuse in Open Science, and the exploitation of urban data in City Data Hubs. Starting from the analysis of Semantic Web applications, we propose a data-centric approach to semantically describe processes as data flows: the Datanode ontology, which comprises a hierarchy of the possible relations between data objects. By means of Policy Propagation Rules, it is possible to link data flow steps and policies derivable from semantic descriptions of data licences. We show how these components can be designed, how they can be effectively managed, and how to reason efficiently with them. In a second phase, the developed components are verified using a Smart City Data Hub as a case study, where we developed an end-to-end solution for policy propagation. Finally, we evaluate our approach and report on a user study aimed at assessing both the quality and the value of the proposed solution.

To Filippo and Pietro
(Dad wrote a book!)

Declaration

I hereby declare that, except where specific reference is made to the work of others, the contents of this dissertation are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other university.

Enrico Daga (October, 2017)

Acknowledgements

I can fairly say that this has been a long journey since I wrote my first line of code 13 years ago at the age of 27. The path leading me here has been full of serendipitous events, among which the most important was attending a language course at the Russian Institute of Culture in Rome. Another one is related to a newspaper advert for a training course for "Expert Web Developers", kindly cut out for me by a friend of mine, as I was unemployed at that time.
I will skip many others and only recall a conversation between myself and a young but already distinguished researcher during a meeting of the EU-funded NeOn project:

- "So, are you doing a PhD in Semantic Web?"
- "No. Anyway, I could not do one in Computer Science, I have a degree in Humanities. I am just a developer who likes Semantic Web."
- "Well, I am pretty sure you could do one with us at The Open University!"

First and foremost, I offer my sincerest gratitude to my supervisors, Professors Mathieu d’Aquin, Enrico Motta, and Aldo Gangemi, for being my guide during these PhD years and showing me the art of research whilst allowing me the freedom to work in my own way. I am very grateful to the reviewers Oscar Corcho (Universidad Politécnica de Madrid) and David Robertson (University of Edinburgh) for the constructive feedback and for having made my PhD viva a memorable experience, and to Trevor Collins, who had a key role in reducing my anxiety before and during that day.

Many excellent colleagues at the Knowledge Media Institute (KMI) at The Open University supported me through their valuable feedback. The superb Miriam Fernandez read the whole draft, returning precious feedback and challenging questions which prepared me for the review panel. Paul Mulholland and Francesco Osborne provided valuable feedback and helped me with solid suggestions on how to improve the work in Chapter 6. I have been rewarded by a friendly and cheerful group of colleagues and friends that supported me during these years: Alessandro Adamou, Luca Panziera, Ilaria Tiddi, Emanuele Bastianelli, Andrea Mannocci, Carlos Pedrinaci, Matteo Cancellieri, Alessandro Taffetani, Patrizia Paci, Jaisha Bruce, Lucas Anastasiou, Pinelopi Troullinou, Lara Piccolo, Martin Holsta, Giorgio Basile, Simon Cutjar, Tina Papathoma, Allan Third, ... and all the people of the Knowledge Media Institute of The Open University.
Thank you for making KMI the ideal place for a PhD, and for being so sensitive in supporting the cause of inclusiveness and diversity in these politically difficult times. A special acknowledgment goes to the Knowledge Media Instruments for being "The Best Band You Ever Heard in Your Life" (Cit.). Follow them on Facebook: https://www.facebook.com/kmiinstruments/.

I would never have thought of starting a career in research without having had the opportunity to collaborate with the Semantic Technology Laboratory (STLab) of the Institute of Cognitive Sciences and Technologies (ISTC) of CNR: Aldo Gangemi, Valentina Presutti, Andrea Nuzzolese and their first-class team of researchers and collaborators. I am forever grateful to Alberto Salvati for having believed in me at a time when I did not. Those years at the Ufficio Reti e Telecomunicazioni and later at the Ufficio Sistemi Informativi of the Italian National Research Council remain unforgettable. The strongest hugs go to Gianluca Troiani and Andrea Pompili, brothers in arms. I can fairly say I learned from the best.

A special thought goes to my family and friends in Italy: Andrea (Gino), Pietro, Alessio, Alessandro (Cuccioletto), Enrico (Abbruzzese), Massi and all the metal-heads of Decima¹. I want to thank my steady mum for being a constant in my life in many different ways (- "Mum! I am PhD!"; - "What does it mean?"), my brother Marcello (admirable perfectionist), and my sister Francesca and her beautiful family. I am grateful to my father Luigi Daga, whose example guides me, lifts me and inspires me daily. This work is dedicated to my fantastic kids, Filippo and Pietro, who fill my days with joy and purpose. I could not have done this without the support of my wife, Francesca. She is capable of making any fear and doubt disappear, by being there when it truly matters. My Russian is still very basic, though.

¹ https://goo.gl/maps/NUFGQbjc9jk

Contents

1 Introduction
  1.1 Motivation
    1.1.1 Semantic Web applications
    1.1.2 Open Science
    1.1.3 City Data Hubs
  1.2 Research questions
  1.3 Approach
  1.4 Hypotheses
  1.5 Research Methodology
  1.6 Contribution
  1.7 Publications
2 State of the Art
  2.1 The Web of data
  2.2 Knowledge Engineering
  2.3 Formal Concept Analysis (FCA)
  2.4 Semantic Web Technologies
    2.4.1 Resource Description Framework (RDF)
    2.4.2 Publishing and consuming Semantic Web data
    2.4.3 SPARQL Protocol and Query Language
    2.4.4 Reasoning with RDF data
  2.5 Data cataloguing on the Web
    2.5.1 Approaches to data cataloguing
    2.5.2 Tools for data cataloguing
    2.5.3 Towards supporting an exploitability assessment
  2.6 Licenses and policies: representation and reasoning
    2.6.1 Foundations
    2.6.2 Open Digital Rights Language (ODRL)
    2.6.3 Reasoning with licences on the Web
    2.6.4 Actions propagating policies: a missing link
  2.7 Representation of process knowledge
    2.7.1 Process Modelling in Software Engineering
    2.7.2 Process Modelling in Ontology Engineering
    2.7.3 Process modelling on the Web: from Service Oriented Architectures to Semantic Web Services
    2.7.4 Provenance
    2.7.5 Scientific workflows
    2.7.6 Conclusions: towards a data-centric model of processes
3 Semantic Representation of Data Flows
  3.1 Genesis of Datanode
    3.1.1 Selection & acquisition
    3.1.2 Use case abstraction
    3.1.3 Ontology design
    3.1.4 Testing
    3.1.5 Result
  3.2 The Datanode Ontology
    3.2.1 Metalevels
    3.2.2 Derivations
    3.2.3 Adjacencies
    3.2.4 Capabilities (overlapping and different)
    3.2.5 Shared interpretation
    3.2.6 Alignments with Semantic Web ontologies