Explaining Data Patterns using Knowledge from the Web of Data

Ilaria Tiddi

Knowledge Media Institute The Open University

This dissertation is submitted for the degree of Doctor of Philosophy

November 2016

“The moment you doubt whether you can fly, you cease for ever to be able to do it.” Peter Pan; or the Boy Who Wouldn’t Grow Up (1904) J. M. Barrie

To my parents

(who understood that sometimes it is better not to understand)

Declaration

I hereby declare that except where specific reference is made to the work of others, the contents of this dissertation are original and have not been submitted in whole or in part for consideration for any other degree or qualification in this, or any other university.

Ilaria Tiddi November 2016

Acknowledgements

I spent months trying to think how I would write these pages – we all know this is the section that will be read the most. In the end, it probably took me the same amount of time it took to write the thesis itself. I promised a friend that the first sentence of the Acknowledgements would be “Graphs are everywhere”, because graphs have been my Odi et amo in these years. Well, I could not keep the promise. I had to reconsider this sentence.

***

Patterns are everywhere. Thinking back right now, these 4 years have been all about patterns (for clarity: by “pattern”, I mean “something recurrent”). Patterns have been part of my work, because I (almost desperately) tried to explain them automatically. But they have also been part of my days, without me even realising it. Funnily, when I thought of whom I should thank, I found a very clear, recurrent pattern. I hope the reader will agree.

First and foremost, I need to thank my two supervisors, Mathieu d’Aquin and Enrico Motta, for their guidance, support and trust. I do not think that words can be sufficient to thank Mathieu for all the time he patiently dedicated to me. I am immensely grateful to him for all the times he let me into his office and listened to any kind of problem, complaint, fear or doubt I might have, and for all the supervisions we had in impossible places (even on public transport!) around the world. And I am even more thankful to Enrico, who never mistrusted us in the work we were doing and, most especially, made a bet on me a long time ago, trusting that I would be able to reach the end of this journey. Legends say they fought to decide who would be my first supervisor: I am sure my experience would have been the same either way. Many thanks should also go to my exaMiners, Frank van Harmelen and Harith Alani, for their valuable comments on my thesis, and for having made my Ph.D. dissertation an awesome and enjoyable experience that I will never forget. They might not remember this, but Harith was my mentor during my first conference and Frank gave the inspiring opening talk at the unforgettable SSSW2013. How could I not close my Ph.D. with them? Also, they deserve a big “thank you” for having bravely coped with the crazy bureaucracy of the OU.

I am just as thankful to the wonderful MeD-team, just for being what we are – a messy and noisy group without which I could have never started my Mondays. A special thank you is deserved by the best compagno di banco I ever had, Enrico D., and by Alessandro, for having been my coding-teachers on many occasions, for having dealt with a constantly complaining “I-am-not-a-Computer-Scientist” Ph.D. student, and for having spent hours with me every time I was messing up our servers. I need to thank KMi for being such a perfect place for studying, working, partying, playing music...basically, living (for the record, I definitely spent more time at my desk than in my room, and I do not regret it). I am proud to be part of KMi, and I will never stop saying that. My only regret is having met so many cool people who then left KMi. But I like to think of KMi as a family: we grow up together but then we need to let our friends go, knowing that distance does not really matter; the good memories that we shared will always keep us close (well, social networks do help, too). KMiers are only a part of the family I had these years, the MK-crowd. As some people know, Milton Keynes is a peculiar place to live, but this is the price for having such special and unique friends. It is quite hard to name everybody: every single friend has contributed to making me the person that I am today, and for that I am extremely thankful (I feel a bit sorry, too – more than just sometimes, I can be a very difficult person). I thank all the people that lived or orbited Perivale 46, for being friends and siblings; the football group, for having coped with my despotic “Sir Alex” manners; and all the other friends that were always there for helping, laughing, listening, drinking, partying, chilling, relaxing, travelling and much more.

I could not have survived my Ph.D. without coming back to my RoMe every now and then (everybody knows how much I am proud of being Roman, so I think it is nice to thank my hometown as a whole). I must have said that a million times now: without my mother backing my decisions up (well, most of them) none of this would have been possible. For that, no thanks will ever be enough. I must also thank my father, for being my first and biggest fan, and my brothers, for the special relationship that makes us able to cross paths in places such as a random airport without even organising it. And because “Rome, sweet home”, a great thank you also goes to all the friends that made my homecomings always warm and enjoyable.

Finally, and most importantly, a truly sincere, special thanks goes to my Εὐτέρπη, my Music muse, my Mountain (!) of these years. Manu, you are the M that walked these years with me day-by-day, that was always there (physically and remotely!), that celebrated my winning moments, that alleviated my (code) sufferings and dried my tears, forcing me to stand up and believe in myself when I did not want to. For that, and for all that will come, I thank you and I just add

the land at the end of our toes goes on, and on, and on...

Abstract

Knowledge Discovery (KD) is a long-established field aiming at developing methodologies to detect hidden patterns and regularities in large datasets, using techniques from a wide range of domains, such as statistics, machine learning, pattern recognition or data visualisation. In most real world contexts, the interpretation and explanation of the discovered patterns is left to human experts, whose work is to use their background knowledge to analyse, refine and make the patterns understandable for the intended purpose. Explaining patterns is therefore an intensive and time-consuming process, where parts of the knowledge can remain unrevealed, especially when the experts lack some of the required background knowledge. In this thesis, we investigate the hypothesis that such an interpretation process can be facilitated by introducing background knowledge from the Web of (Linked) Data. In the last decade, many areas started publishing and sharing their domain-specific knowledge in the form of structured data, with the objective of encouraging information sharing, reuse and discovery. With a constantly increasing amount of shared and connected knowledge, we thus assume that the process of explaining patterns can become easier, faster, and more automated. To demonstrate this, we developed Dedalo, a framework that automatically provides explanations for patterns of data using the background knowledge extracted from the Web of Data. We studied the elements required for a piece of information to be considered an explanation, identified the best strategies to automatically find the right piece of information in the Web of Data, and designed a process able to produce explanations for a given pattern using the background knowledge autonomously collected from the Web of Data. The final evaluation of Dedalo involved users within an empirical study based on a real-world scenario. We demonstrated that the explanation process is complex when one is not familiar with the domain at hand, but also that it can be considerably simplified when using the Web of Data as a source of background knowledge.

Keywords: Knowledge Discovery, Linked Data, Explanation, Background Knowledge, Pattern Interpretation

Table of Contents

List of Figures

List of Tables

I Introduction and State of the Art

1 Introduction
1.1 Problem Statement
1.2 Research Hypothesis
1.3 Research Questions
1.3.1 RQ1: Definition of an Explanation
1.3.2 RQ2: Detection of the Background Knowledge
1.3.3 RQ3: Generation of the Explanations
1.3.4 RQ4: Evaluation of the Explanations
1.4 Research Methodology
1.5 Approach and Contributions
1.5.1 Applicability
1.5.2 Dedalo at a Glance
1.5.3 Contributions of the Thesis
1.6 Structure of the Thesis
1.6.1 Structure
1.6.2 Publications
1.6.3 Datasets and Use-cases

2 State of the Art
2.1 A Cognitive Science Perspective on Explanations
2.1.1 Characterisations of Explanations

2.1.2 The Explanation Ontology
2.2 Research Context
2.2.1 The Knowledge Discovery Process
2.2.2 Graph Terminology and Fundamentals
2.2.3 Historical Overview of the Web of Data
2.3 Consuming Knowledge from the Web of Data
2.3.1 Resources
2.3.2 Methods
2.4 Towards Knowledge Discovery from the Web of Data
2.4.1 Managing Graphs
2.4.2 Mining Graphs
2.4.3 Mining the Web of Data
2.5 Summary and Discussion

II Looking for Pattern Explanations in the Web of Data

3 Generating Explanations through Manually Extracted Background Knowledge
3.1 Introduction
3.2 The Inductive Logic Programming Framework
3.2.1 General Setting
3.2.2 Generic Technique
3.2.3 A Practical Example
3.3 The ILP Approach to Generate Explanations
3.4 Experiments
3.4.1 Building the Training Examples
3.4.2 Building the Background Knowledge
3.4.3 Inducing Hypotheses
3.4.4 Discussion
3.5 Conclusions and Limitations

4 Generating Explanations through Automatically Extracted Background Knowledge
4.1 Introduction
4.2 Problem Formalisation
4.2.1 Assumptions

4.2.2 Formal Definitions
4.2.3 An Example
4.3 Automatic Discovery of Explanations
4.3.1 Challenges and Proposed Solutions
4.3.2 Description of the Process
4.3.3 Evaluation Measures
4.3.4 Final Algorithm
4.4 Experiments
4.4.1 Use-cases
4.4.2 Heuristics Comparison
4.4.3 Best Explanations
4.4.4 Time Evaluation
4.5 Conclusions and Limitations

5 Aggregating Explanations using Neural Networks
5.1 Introduction
5.2 Motivation and Challenges
5.2.1 Improving Atomic Rules
5.2.2 Rule Interestingness Measures
5.2.3 Neural Networks to Predict Combinations
5.3 Proposed Approach
5.3.1 A Neural Network Model to Predict Aggregations
5.3.2 Integrating the Model in Dedalo
5.4 Experiments
5.4.1 Comparing Strategies for Rule Aggregation
5.4.2 Results and Discussion
5.5 Conclusions and Limitations

6 Contextualising Explanations with the Web of Data
6.1 Introduction
6.2 Problem Statement
6.3 Learning Path Evaluation Functions through Genetic Programming
6.3.1 Genetic Programming Foundations
6.3.2 Preparatory Steps
6.3.3 Step-by-Step Run
6.4 Experiments

6.4.1 Experimental Setting
6.4.2 Results
6.5 Conclusion and Limitations

III Evaluation and Conclusion

7 Evaluating Dedalo with Google Trends
7.1 Introduction
7.2 First Empirical Study
7.2.1 Data Preparation
7.2.2 Evaluation Interface
7.2.3 Evaluation Measurements
7.2.4 Participant Details
7.2.5 User Agreement
7.2.6 Results, Discussion and Error Analysis
7.3 Second Empirical Study
7.3.1 Data Preparation
7.3.2 Evaluation Interface
7.3.3 Evaluation Measurements
7.3.4 User Agreement
7.3.5 Results, Discussion and Error Analysis
7.4 Final Discussion and Conclusions

8 Discussion and Conclusions
8.1 Introduction
8.2 Summary, Answers and Contributions
8.2.1 Definition of an Explanation
8.2.2 Detection of the Background Knowledge
8.2.3 Generation of the Explanations
8.2.4 Evaluation of the Explanations
8.3 Limitations and Future Work
8.3.1 Dedalo vs. the Explanation Completeness
8.3.2 Dedalo vs. the Knowledge Discovery Field
8.3.3 Dedalo vs. the Explanation Evaluation
8.3.4 Dedalo vs. the Linked Data Bias

8.4 Conclusions

References

Appendix A Datasets Details

Appendix B Linked Data Bias: Additional Results

List of Figures

1.1 Overview of Dedalo

2.1 “A Song of Ice and Fire” search trend
2.2 The Cognitive Science hexagon
2.3 The Explanation Ontology
2.4 Relevant disciplines
2.5 The Knowledge Discovery process
2.6 Web of Data access

3.1 Inductive Logic Programming, hypothesis accuracy
3.2 Linked Data navigation

4.1 “A Song of Ice and Fire” Linked Data graph
4.2 Dedalo’s iteration schema
4.3 Graph expansion, iteration 1
4.4 Graph expansion, iteration 2
4.5 Graph expansion, iteration 3
4.6 Graph expansion, iteration 4
4.7 Linked Data background knowledge, #KMi.A
4.8 Linked Data background knowledge, #Lit
4.9 Linked Data background knowledge, #OU.P
4.10 Path collection evaluation, #KMi.A
4.11 Path collection evaluation, #KMi.P
4.12 Path collection evaluation, #KMi.H
4.13 Dedalo’s time evaluation

5.1 “A Song of Ice and Fire” background knowledge
5.2 Neural Network results, #KMi.A

5.3 Neural Network results, #KMi.P
5.4 Neural Network results, #KMi.H

6.1 Explanation context, example 1
6.2 Explanation context, example 2
6.3 Genetic Programming, generation 0
6.4 Genetic Programming, generation 1
6.5 Genetic Programming, dataset
6.6 Genetic Programming, fittest functions

7.1 First Empirical Study, data preparation
7.2 First Empirical Study, trend plot
7.3 First Empirical Study, task 1
7.4 First Empirical Study, task 2
7.5 First Empirical Study, task 3
7.6 First Empirical Study, users’ cohesion
7.7 Communities of explanations, “A Song of Ice and Fire”
7.8 Communities of explanations, other examples
7.9 Communities of explanations, “Taylor”
7.10 Success and surprise rate
7.11 Second Empirical Study, path ranking

8.1 Linked Movie Database bias

A.1 Dataset and patterns, #KMi.A
A.2 Dataset and patterns, #Lit
A.3 Dataset and patterns, #OU.P
A.4 Dataset and patterns, #Tre
A.5 Dataset and patterns, #OU.S

List of Tables

2.2 Graph search strategies

3.1 Inductive Logic Programming, background knowledge
3.2 Inductive Logic Programming, evidence set
3.3 Inductive Logic Programming, Linked Data background knowledge
3.4 #Red and #KMi.H, cluster details
3.5 #Red, background knowledges
3.6 #Red, induced explanations
3.7 #KMi.H, induced explanations

4.1 #KMi.A, #KMi.P and #KMi.H, best explanations
4.2 #OU.P, explanation improvement
4.3 #OU.P, best explanations
4.4 #Lit, explanations with datatypes
4.5 #Tre, comparing F-Measure and Fuzzy F-Measure
4.6 #Lit, best explanations
4.7 #Tre, time evaluation

5.1 Example of atomic explanations
5.2 Neural Network features
5.3 Neural Network technical details
5.4 Neural Network experiment details
5.5 Comparing aggregation strategies

6.1 Genetic Programming function set
6.2 Genetic Programming control parameters
6.3 Genetic Programming cost-functions, first runs
6.4 Genetic Programming cost-functions, second runs

6.5 Genetic Programming, dataset evaluation

7.1 #Tre, trends
7.2 Users’ trend preferences
7.3 Users’ surprise evaluation
7.4 Path ranking
7.5 Path ranking user agreement
7.6 Dedalo’s ranking success
7.7 Dedalo’s pathfinding success

8.1 Linked Data bias

A.1 Dedalo datasets details
A.2 #Red, clustering process
A.3 Dataset and patterns, #KMi.A
A.4 Dataset and patterns, #KMi.P

B.1 Linked Data bias datasets

Part I

Introduction and State of the Art

Chapter 1

Introduction

This is the introductory chapter where we give an extensive overview of our thesis. Section 1.1 presents the general problem that motivates our work, along with its key concepts; Section 1.2 explains the research hypothesis that we want to validate; Section 1.3 details the general and specific research questions that we will be addressing in this work; Section 1.4 presents our research methodology and Section 1.5 outlines the approach we propose, as well as the specific contributions that we are bringing. Finally, Section 1.6 presents the structure of the thesis, some relevant publications, and an overview of the use-cases we have been using during our research.

1.1 Problem Statement

Knowledge Discovery (KD) is a long-established and interdisciplinary field focusing on developing methodologies to detect hidden regularities in large amounts of data [Fayyad et al., 1996]. Those regularities, commonly known as patterns, are statements describing interesting relationships among a subset of the analysed data. With the data flood phenomenon that we are experiencing nowadays, where digital data are being published without interruption, Knowledge Discovery becomes crucial to provide human analysts with the useful and explicit knowledge contained in the data. To reveal those patterns, the typical process in KD (sometimes also called KDD, standing for Knowledge Discovery in Databases) is to perform a sequence of operations, which can be summarised as:

• data pre-processing, where raw data are cleaned and pruned;

• data mining, where machine learning techniques are applied to reveal the patterns of interest;

• data post-processing, consisting in the interpretation, explanation and understanding of those patterns, so that they can be further exploited (a minimal sketch of these three steps is given below).
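To make these three steps concrete, the following minimal sketch shows where the interpretation problem sits. It assumes the scikit-learn library and a small synthetic dataset; none of the names or values below come from the thesis, they are purely illustrative.

```python
# A minimal sketch of the three KDD steps, assuming scikit-learn is installed.
# The data, the features and the number of clusters are purely illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# 1. Data pre-processing: clean and normalise the raw signals.
raw = np.array([[18.0, 3200.0], [21.0, 4100.0], [45.0, 90000.0],
                [47.0, 85000.0], [19.0, 3900.0], [50.0, 99000.0]])
X = StandardScaler().fit_transform(raw)

# 2. Data mining: apply a machine learning technique to reveal patterns.
labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)

# 3. Data post-processing: the patterns (here, cluster memberships) exist,
#    but explaining *why* items end up together is left to a human analyst;
#    this is the step the thesis aims to support with background knowledge.
for cluster_id in sorted(set(labels)):
    members = np.where(labels == cluster_id)[0].tolist()
    print(f"cluster {cluster_id}: items {members}")
```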

While much effort has been put in studying ways to automatically assist the experts in pre-processing and mining data, the post-processing step still mostly relies on manual analysis. Generally, domain experts use their own background knowledge to interpret the patterns discovered through data mining in order to explain (and possibly refine) them. First, this means that the interpretation process is intensive and time-consuming. Second, some patterns are likely to remain unexplained, especially when the interpretation requires expertise from different domains, and/or the experts lack some of the background knowledge that is necessary for the explanation. What we aim to investigate in this thesis is how to improve this data post-processing step, and especially how to automatically provide an explanation for data mining patterns.

The last decade has seen a huge amount of information being openly published in the form of a Web of Data. Domain-specific knowledge has been made available in machine-readable formats but, most importantly, has been connected (under the name of Linked Data1) with the objective of encouraging information sharing, reuse and interoperability. If the information contained in the Web of documents used to be hidden and hard to automatically detect, the rise of the Web of Data now makes it possible to move one step closer to the automatic discovery of knowledge.

Given such a constantly increasing amount of published and connected knowledge, our assumption is that explaining patterns can be easier, faster and more automated. Thanks to the practices relying on semantic technologies available nowadays, the Web of Data can be automatically accessed in order to find the knowledge that is needed for the pattern interpretation process, so that the effort that the experts have to put into it can be reduced. The hypothesis that we aim to demonstrate with our research is that the data post-processing step of Knowledge Discovery can be improved with the support from the Web of Data, because its connected knowledge can facilitate the automatic access to the background knowledge required to explain data patterns.

Below, we define the basic terminology and core concepts of our work.

Data. Raw signals either automatically or manually produced. They are factual (i.e. they rely on something specific) and unorganised.

1 http://linkeddata.org/

Patterns. Expressions providing a high-level description of a subset of data. According to [Fayyad et al., 1996], patterns are meant to be:

• non-trivial: they should not be mere observations, but should include some inference;

• understandable by humans (assuming they have the right background knowledge);

• novel: they should bring some sort of new information to the humans;

• (potentially) useful: they should be beneficial for users or for a predefined task.

Background Knowledge. The external knowledge, usually found in external sources (such as manuals or the experts’ mind), that gives a meaning to patterns. Background knowledge consists of statements that are not contextual to the patterns, but that make them novel and useful.

Knowledge. Patterns transformed into evidence by the addition of background knowledge. Knowledge is used to achieve tasks and make decisions. It is inferential and abstract.

1.2 Research Hypothesis

Before introducing our research hypothesis, we introduce a few scenarios, which will help to clarify our assumptions and objectives. These examples are part of the use-cases presented in Section 1.6.2. Let us imagine the following types of data patterns:

(A) Search rates of a specific term entered over a web search engine across time, such as, for instance, how many people have searched for the term “A Song of Ice and Fire”2 in the last 10 years. The trend shows that the term was searched with a very regular frequency, especially in the past 5 years.

(B) Groups of geographical points, representing the UK districts from which some Open University students come, formed according to the Faculty the students belong to. Clusters show that in certain regions some faculties attract more students than others, e.g. the Business&Law Faculty attracts more students than average from the districts around London.

(C) Communities of research papers (the Open University publications), grouped according to the semantic similarity of their words, where papers about the same topic have been clustered together.

2 A Song of Ice and Fire is an American series of fantasy novels written by George R.R. Martin, first published in 1996. Its popularity considerably increased after its TV adaptation, Game of Thrones, broadcast by HBO since 2011. At the time of writing, the sixth novel is being written, and the sixth season of the TV series has not yet been released.

Based on the above, we can make the following observations.

[Obs1] Data patterns obtained after a data mining process require an explanation. An explanation is in fact required to motivate in (A) why the popularity of “A Song of Ice and Fire” regularly increases or decreases, to explain in (B) why certain faculties attract more students in specific regions, or in (C) to find out which topic a specific group of documents relates to.

[Obs2] To find out the explanation, one needs to make some hypotheses aiming at interpreting the patterns. One could imagine that in (A) users become more interested in the term (associated with the novel) following some specific events, such as the release of the TV series “Game of Thrones”. Similarly, one can also hypothesise for (B) that some faculties are more attractive due to the facilities offered by the location, e.g. London is a financial hub, or for (C) that all the keywords of documents in a given group are related to Mathematics.

[Obs3] Those hypotheses can only be generated and validated with some background knowledge, which is generally part of the remit of an expert in the given domain. If one has never heard of the Game of Thrones TV series, or does not know the economic situation of London, or has never seen the keywords contained in the documents, none of the hypotheses for (A), (B) or (C) can be validated.

[Obs4] The Web of Data, promoting shared and automatically accessible knowledge, is also a source of background knowledge. The Web of Data contains information about events that can be related to a searched term (A); it also contains geographical information, which can reveal the facilities offered by cities or districts (B); and it contains linguistic information that can be used to derive that words contained in similar documents are part of the same topic (C).

For this reason, it appears possible that parts of the background knowledge required to explain a pattern are available through the Web of Data, and this can facilitate the interpretation process. Therefore, the main research hypothesis that we aim to demonstrate with this thesis is:

The process of pattern explanation can be improved by using background knowledge from the Web of (Linked) Data

We assume that the knowledge embedded in the Web of Data, which has been collected and published under the name of the Linked Data Cloud, can be harvested automatically, and that the extracted information can be used either to assist the experts with the knowledge they are missing to give a complete explanation, or to provide new knowledge to the non-experts. More generally, we aim to show whether and how the Web of Data can benefit the Knowledge Discovery process. It is important to remark that our hypothesis is not to prove that the Web of Data can be a replacement for Machine Learning, but rather that we can combine Machine Learning’s high-performance algorithms with semantic technologies to benefit the Knowledge Discovery process.

A common misunderstanding is to think that patterns are produced purely according to the way the problem has been modelled within the data mining process (during a step called feature selection). On the contrary, while patterns are indeed the output of data mining, we state that data points happen to be in the same pattern because they have common characteristics, which can often be explained by some external knowledge, outside of the problem definition. In our illustrating examples, data behave differently according to some criteria (respectively: the popularity score, the faculty choice and the text similarity) but the explanation for each of these behaviours comes from external factors, such as the ones that we have presented. Assuming the required knowledge is represented within the Web of Data, it is then possible to exploit it to produce explanations.

Finally, our hypothesis is not that the Web of Data can replace the experts, but rather that the cost of the interpretation process can be reduced. We intend to show that Linked Data are a support for humans, either because they provide additional knowledge whenever the domain expert’s knowledge is not sufficient, or because the user is not familiar with the domain.

1.3 Research Questions

Having defined our research hypothesis, we now define the main research question that we will investigate in this thesis.

How can we automatically explain patterns of data using background knowledge from the Web of Data?

To answer this, we articulated our research in a specific set of sub-questions, which are summarised as:

RQ1: What is an explanation? (Section 1.3.1)

RQ2: How to automatically find the required background knowledge in the Web of Data? (Section 1.3.2)
RQ3: Which process can be used to automatically generate explanations? (Section 1.3.3)
RQ4: How to assess the validity of such a process? (Section 1.3.4)

These sub-questions are presented in more detail below.

1.3.1 RQ1: Definition of an Explanation

If we wish to produce explanations for patterns, the first thing we need to do is to formally define (the concept of) an explanation. Is it a correlation between events? Or is it a cause-effect relationship? If not, what are the differences between them, and how do we recognise them? The idea here is that an explanation is more than a causal relationship between events, and that other factors might take part in the process. More specifically, explaining might not only mean finding the event that is correlated to another event (X caused Y), but also finding out why they have occurred together (X caused Y because of Z). The question that needs to be investigated is therefore what else might be needed for an explanation to be complete. We want to identify which important actors take part in an explanation, and what their roles and interactions are, so as to be able to declare that, given a phenomenon encoded into a pattern, we will be giving a complete explanation of it. Once those components are identified, we can then further reuse them as the building blocks upon which we can set up our automatic process. This problem has been addressed in Section 2.1 and an answer to these questions is to be found in Section 2.1.2.

1.3.2 RQ2: Detection of the Background Knowledge

If we aim at producing explanations for a specific pattern automatically, the next question is where to find information about it in the Web of Data, and how to find it in an automatic and generic way. If we consider exploiting the Linked Data Cloud, we find ourselves in front of a vast ocean of structured and cross-domain information. The challenge is then how to strategically find the right piece of background knowledge in there, without incurring expensive computational costs. The question can also be sub-articulated in two parts.

(a) Inter-dataset search problem. One problem is detecting the background knowledge at the dataset level: automatically finding the right datasets in the large choice offered by Linked Data might not be easy. Also, assuming Linked Data effectively contain the right datasets, where and how to automatically find them?

(b) Intra-dataset search problem. The second question is how to find the useful part of the knowledge for explaining a given pattern within a specific dataset. Not all the information represented therein might be useful to produce explanations. Which predicates do we need, and about which data? What are the criteria to use to decide that some predicates are the ones we are really interested in? Are the data in hand sufficient in their Linked Data form? (A minimal sketch of such an intra-dataset lookup is given below.)
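To give a concrete flavour of the intra-dataset problem, the sketch below dereferences a single Linked Data URI and counts the predicates that describe it; deciding which of those predicates matter for a given pattern is exactly the open question above. It assumes the rdflib Python library and uses a DBpedia resource as an arbitrary example; it is not the mechanism adopted by Dedalo, which is described in Chapter 4.

```python
# A minimal sketch of looking inside one Linked Data dataset by dereferencing
# a URI. Assumes the rdflib library and a publicly dereferenceable DBpedia URI;
# both are illustrative choices, not part of the thesis.
from collections import Counter
from rdflib import Graph

uri = "http://dbpedia.org/resource/A_Song_of_Ice_and_Fire"

g = Graph()
g.parse(uri)  # HTTP content negotiation retrieves an RDF description of the entity

# Count the predicates used to describe the entity: choosing which of these
# are actually relevant to a pattern is the intra-dataset search problem.
predicates = Counter(str(p) for s, p, o in g if str(s) == uri)
for predicate, count in predicates.most_common(10):
    print(count, predicate)
```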

Solving these issues is challenging from a double perspective: (i) if there is a lack of datasets, the data in the pattern might not be sufficiently described in Linked Data and the search for an explanation could be unsuccessful; (ii) because domains are dynamic, it might not be possible to find general interestingness criteria, since what is important for interpreting the data might be highly dependent on the context. We investigate these aspects in Chapter 4 and Chapter 6. Finally, this question can be integrated within the more general discussions on Linked Data accessibility and trust: How much knowledge can be accessed using the available technologies? How reliable is the information provided? Is there enough coverage of the domains of interest? A preliminary study of this issue can be found in Section 8.3.4.

1.3.3 RQ3: Generation of the Explanations

Once the background knowledge about the data has been found, the next question to answer is: how do we use it to explain the patterns? What kind of mechanisms can automatically generate explanations? And how do we produce them so that they are presentable to the users? Can we expect that explanations will effectively bring new knowledge? While the human process of pattern interpretation might trivially look like

observe data
↓
make hypotheses
↓
validate them with your background knowledge
↓
generate natural language explanations

automatically realising the same process is challenging. Automatic frameworks for logical inference and reasoning, which have been extensively studied in Artificial Intelligence, can be applied to generate explanations, but the biggest challenge is the data deluge, i.e. those systems might not be designed to deal with the amount of information available in the Web of Data, and we might therefore run into scalability problems. This question is mostly addressed in Chapter 3 and Chapter 4.

1.3.4 RQ4: Evaluation of the Explanations

Once the explanations have been generated, they are still considered hypotheses until they are verified. The last question we need to answer is about the assessment of their quality and significance. Are explanations good? How do we know? Which ones are more interesting, and why? Are interestingness measures enough, or do we need to take into account other aspects? And if so, how do we define them? This evaluation step can also be articulated in two sub-questions.

(a) Explanation evaluation. This consists in finding the criteria to assess the interestingness and the general quality of an explanation, so that we can decide that a hypothesis is a sufficient explanation. Answers to this question can be found in Chapter 4, 5 and 6.

(b) Method evaluation. How do we evaluate that our interpretation process based on the Web of Data is efficient, when compared to domain experts? This question is answered in Chapter 7.

The challenge for this part is that we might encounter a lack of background knowledge, if clear evidence for some of the generated hypotheses is missing, or a recursion issue, if a hypothesis requires a new piece of knowledge to be explained itself. These problems are also discussed in Chapter 6.

1.4 Research Methodology

To address the research questions presented above, our work focuses on designing and developing an automatic process that, given a pattern of data to be explained on one side, and the Web of Data on the other, is able to generate satisfactory explanations to the grouping of the items based on the knowledge automatically extracted from the Web of Data. To this end, our research is built upon the experimental methodology, commonly used in Computer Science to evaluate new solutions for a problem [Elio et al., 2011].

Our research starts with an extensive literature review, presented in Chapter 2. This covers approaches from the different fields that contribute to our research. Focusing first on the explanations, we investigate work in Cognitive Science to understand which components we need for a process producing explanations, and work in Knowledge Discovery to analyse how to generate them automatically. We then focus on the way to exploit the available knowledge in the Web of Data, by trying to understand which Semantic Web technologies and which Graph Theory algorithms and models can be integrated in our process.

The design of the prototype aiming at validating our research hypothesis is then realised in an iterative way, following the idea of [Denning, 1981] that experimentation can be used in a feedback loop to improve the system when results do not match expectations. In the following chapters (Chapter 3, 4, 5 and 6) we approach each of our sub-questions, propose a solution and use the experimental evaluation to assess whether the system is producing the expected results, which in our case consist of meaningful explanations for a pattern, generated from the Web of Data.

The final part of our methodology is then to assess the validity of our system in an empirical user study designed around a large real-world scenario. This is presented in Chapter 7. We evaluate the system by comparing the explanations generated automatically from the Web of Data with the ones provided by the set of users who took part in the study.

1.5 Approach and Contributions

The resulting system, which we called Dedalo, is an automatic framework that uses background knowledge automatically extracted from the Web of Data to derive explanations for data that are grouped into patterns according to some criteria. Below, we present the process in detail, namely by showing some real-world applications of our approach, an overall picture of the process, and the contributions that our approach brings.

1.5.1 Applicability

The approach that we propose could benefit many real-world domains, namely those where background knowledge plays a central role for the analysis of trends or common behaviours. For instance, Dedalo could be exploited for business purposes such as decision making or predictive analytics, by providing the experts with the Linked Data information that they might be missing to explain the regularities emerging from the raw data collected using data analytics methods. A practical application is the Google Trends scenario used throughout this thesis: namely, Linked Data could help in explaining the increased interest of the users towards some topics, which could be used to improve the user experience and profiling.

A second application is in educational fields such as Learning Analytics, where Dedalo could be helpful to accelerate the analysis of the learners’ behaviours. This would allow universities to improve the way they assist people’s learning, teachers to better support their students or improve their courses, and the staff to plan and make their decisions. Finally, Dedalo could be applied in medical contexts, by helping the experts in explaining patterns and anomalies requiring some external knowledge, e.g. the environmental changes affecting the spread of diseases. Of course, these are only a few examples of the way the explanation of patterns through background knowledge from Linked Data can be useful.

1.5.2 Dedalo at a Glance

Figure 1.1 presents an overview of Dedalo, with indications about the chapters of the thesis describing each part. As one can see, every step requires the integration of knowledge from the Web of Data, which is the core aspect within our process.

Figure 1.1 Overview of Dedalo according to the thesis narrative.

Hypothesis Generation. Assuming a pattern (any data grouped according to some criteria: clusters, association rules, sequence patterns and so on), this first step is to search the Linked Data space for information about the data contained in the pattern, and then to generate some correlated facts (alternatively called anterior events, hypotheses or candidate explanations throughout this work), which might be plausible explanations for it.

We combine here several techniques from Machine Learning, Graph Theory, Linked Data and Information Theory to iteratively explore portions of Linked Data on-the-fly, so that only the part of the information needed for the explanation is collected. By doing so, we avoid inconveniences such as dataset indexing or crawling, while keeping to our resolution of not introducing any a priori knowledge into the process.

Hypothesis Evaluation. In this step, we evaluate the unranked facts so that we can assess which ones are valid and may represent the pattern. We use Linked Data combined with techniques from Information Retrieval, Rule Mining and Cluster Analysis to define the interestingness criteria, giving the generated hypotheses a priority order. This step also includes the study of a Machine Learning model that predicts the likelihood of improving the quality of the hypotheses by combining several of them.
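As an intuition of what “giving the generated hypotheses a priority order” can look like, the toy sketch below ranks candidate explanations by how well the set of items they cover matches a cluster, using Shannon entropy and the F-measure as stand-in interestingness criteria. The candidate names and item sets are invented for illustration; the measures actually used by Dedalo are defined in Chapter 4.

```python
# A toy sketch of hypothesis evaluation: candidate explanations are ranked by
# how well the items they cover match the cluster to be explained. Entropy and
# F-measure are used here only as illustrative interestingness criteria.
from math import log2

items = set(range(20))          # all items in the dataset
cluster = set(range(8))         # the pattern (cluster) to be explained
candidates = {                  # hypothetical candidates -> items they cover
    "path_A -> value_x": set(range(6)) | {15},
    "path_B -> value_y": set(range(12)),
    "path_C -> value_z": {0, 1, 17, 18, 19},
}

def entropy(covered, universe):
    """Binary entropy of the split a candidate induces over the whole dataset."""
    p = len(covered) / len(universe)
    return 0.0 if p in (0.0, 1.0) else -(p * log2(p) + (1 - p) * log2(1 - p))

def f_measure(covered, target):
    """Harmonic mean of precision and recall of a candidate w.r.t. the cluster."""
    true_positives = len(covered & target)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(covered)
    recall = true_positives / len(target)
    return 2 * precision * recall / (precision + recall)

for name, covered in sorted(candidates.items(),
                            key=lambda kv: f_measure(kv[1], cluster),
                            reverse=True):
    print(f"{name}: F={f_measure(covered, cluster):.2f} "
          f"H={entropy(covered, items):.2f}")
```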

Hypothesis Validation. The final step consists in validating the ranked hypotheses so that they are turned into explanations that can be considered valuable knowledge. The validation process exploits Linked Data and applies techniques from Graph Theory and Machine Learning to identify the relationship between a pattern and a hypothesis generated from Linked Data that is correlated to it.

1.5.3 Contributions of the Thesis

This thesis aims at being a contribution to the process of automatically discovering knowledge and at reducing the gap between the Knowledge Discovery and Semantic Web communities. From the Semantic Web perspective, our main contribution is that we provide several solutions to efficiently manage the vastness of the Web of Data, and to easily detect the correct portion of information according to the needs of a situation. From a Knowledge Discovery perspective, we show how pattern interpretation in the KD process can be improved, both in time and in completeness, thanks to the use of semantic technologies. Specific contributions, further detailed in the corresponding chapters, are as follows:

• we present a survey on the definition of explanation from a Cognitive Science perspective, and we formalise it as a small ontology (Chapter 2);

• we show how the interconnected knowledge encoded within Linked Data can be used in an inductive process to generate meaningful explanations (Chapter 3);

• we reveal how URI dereferencing can be combined with graph search strategies that access Linked Data on-the-fly and remove the need for wide data crawling (Chapter 4 and Chapter 6);

• we show that the Entropy measure [Shannon, 2001] is a promising function to drive a heuristic search in Linked Data (Chapter 4);

• we present some metrics and methodologies to improve and predict the accuracy of the explanations (Chapter 4 and Chapter 5);

• we detect which factors in the Linked Data structure reveal the strongest relationships between entities, and enclose them in a cost-function to drive a blind search to find entity relationships (Chapter 6);

• we present a methodology to evaluate the process of automatically generating explanations with respect to human experts (Chapter 7);

• we show how to identify the bias that is introduced in the results when dealing with incomplete Linked Data (Chapter 8).

1.6 Structure of the Thesis

The following section gives some additional information about this work: the structure, the publications on which the chapters have been based, and an overview of the specific use-cases that we will be mentioning throughout the book.

1.6.1 Structure

The material of this thesis is distributed in individual parts and chapters as follows.

Part I : Introduction and State of the Art Besides the extensive introduction of Chapter 1, Chapter 2 provides the background for our thesis. We start by introducing a methodology to answer our first research question (how to formally define an explanation), and then we review the fields of Knowledge Discovery, Graph Theory and Semantic Web to identify which techniques can be chosen to answer our research questions. The literature review aims at revealing strengths and weaknesses of the current works, but also helps us to identify our research space and to highlight the novel aspects of our work.

Part II : Looking for Pattern Explanations in the Web of Data This part provides the details of our framework. In Chapter 3, we show how we can use the background knowledge extracted from Linked Data in a process that produces hypotheses (i.e. candidate explanations) for patterns, and present the preliminary studies focused on manually building such background knowledge. In Chapter 4, we describe how we can automatically obtain such background knowledge and create meaningful candidate explanations. In Chapter 5, we present a Neural-Network approach that improves the generated hypotheses by combining them. In Chapter 6, we finally show how to automatically validate the hypotheses and assess them as explanations, by identifying the Linked Data relation that correlates them with the pattern.

Part III : Evaluation and Discussion The final section presents the evaluation study that we have conducted with the users and the final discussions. In Chapter 7, we show the methodology that we chose to evaluate Dedalo and the details of the user study. Chapter 8 closes this thesis with some final remarks, including discussion of the results achieved by our approach, some limitations and future directions.

1.6.2 Publications

The chapters mentioned above are based on the following publications.

Chapter 2

B Tiddi, I., d’Aquin, M. and Motta, E. (2015) An Ontology Design Pattern to Define Ex- planations, The 8th International Conference on Knowledge Capture (K-CAP 2015), Palisades, New York, USA

Chapter 3

B Tiddi, I. (2013) Explaining data patterns using background knowledge from Linked Data, ISWC 2013 Doctoral Consortium, Sydney, Australia, Lara Aroyo and Natasha Noy

B Tiddi, I., d’Aquin, M. and Motta, E. (2013) Explaining Clusters with Inductive Logic Programming and Linked Data, Poster at ISWC 2013, Sydney, Australia, Eva Blomqvist and Tudor Groza

Chapter 4

B Tiddi, I., d’Aquin, M. and Motta, E. (2014) Dedalo: looking for Clusters’ Explanations in a Labyrinth of Linked Data, 11th Extended Semantic Web Conference, ESWC 2014, Crete (Best Student Paper Nominee)

B Tiddi, I., d’Aquin, M. and Motta, E. (2014) Walking Linked Data: A graph traversal approach to explain clusters, Workshop: Fifth International Workshop on Consuming Linked Data (COLD2014) at International Semantic Web Conference 2014 (ISWC 2014), Riva del Garda (Best Paper Award)

B Tiddi, I., d’Aquin, M. and Motta, E. (2015) Using Linked Data Traversal to Label Academic Communities, Workshop: SAVE-SD 2015 at 24th International World Wide Web Conference (WWW2015), Florence

B Tiddi, I., d’Aquin, M. and Motta, E. (2015) Data Patterns explained with Linked Data, Demo at European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML/PKDD 2015), Porto, Portugal

Chapter 5

B Tiddi, I., d’Aquin, M. and Motta, E. (2014) Using Neural Networks to aggregate Linked Data rules, 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW2014), Linköping, Sweden (Best Paper Nominee)

Chapter 6

B Tiddi, I., d’Aquin, M. and Motta, E. (2016) Learning to Assess Linked Data Relation- ships Using Genetic Programming, The 15th International Semantic Web Conference (ISWC2016), Kobe, Japan

Chapter 7

B Tiddi, I., d’Aquin, M. and Motta, E. (2015) Interpreting knowledge discovery patterns using Linked Data, Journal of Web Semantics (under review)

Chapter 8

B Tiddi, I., d’Aquin, M. and Motta, E. (2014) Quantifying the bias in data links, 19th International Conference on Knowledge Engineering and Knowledge Management (EKAW2014), Linköping, Sweden

1.6.3 Datasets and Use-cases

For the purpose of clarity, we briefly introduce the datasets and use-cases that we will exploit in the next chapters. With this, it is our intention to help the reader in following the forthcoming narration, where several datasets will be used alternatively. Moreover, with a full picture of the data in hand, we believe that it will be easier to show how our approach is indeed applicable to contexts and domains of various natures. We gave the use-cases an identifier and below provide some minimal information about them, especially which kind of explanation we have been looking for. More details such as how the datasets were formed are to be found in Appendix A.1.

#Red – The Reading Experiences. The use-case is built on some data from the Reading Experience Database3 ontology, a record of people’s reading experiences, including information about the reader, author and book involved in a reader’s experience, as well as its date and location. The dataset consists of almost 400 people clustered in 10 groups according to the topics of the books they have read. In this scenario, we are interested in explaining why people read similar things, e.g. they might have some common features such as gender, religion or nationality which make them interested in the same topics. The dataset has been used for the preliminary approach of Chapter 3.

#KMi.A – The Knowledge Media Institute co-authorships. This is a dataset of ca. 100 researchers from the Knowledge Media Institute (KMi) that have been clustered in 6 groups according to the papers they have co-authored. In this very restricted and well-understood scenario, we are interested in knowing what makes researchers be grouped together, e.g. it is likely that people who work on the same research topics are in the same cluster. The dataset has been used for the experiments of Chapter 4 and 5.

#KMi.P – The Knowledge Media Institute publications. Similarly, this dataset presents research papers from the KMi department, clustered according to the words in their abstracts. In this scenario, we are imagining that the reason for papers being together is that they are about the same topic. As above, the dataset is used in Chapter 4 and 5.

#KMi.H – The Book Borrowing observations. In this dataset, books borrowed by university students have been clustered according to the Faculty the students belong to. A likely explanation for the grouping of the books might be that they are related to the same topic(s). This is a relatively realistic and large use-case (close to 7,000 items), also used in Chapter 3, 4 and 5, which we have exploited to explore Linked Data knowledge from different datasets.

3 http://www.open.ac.uk/Arts/RED/index.html

#Lit – World Literacy Rate observations. The dataset consists of the set of world countries partitioned in 3 groups according to the gender literacy rate (i.e. where men are more educated than women, where women are more educated than men, and where the education average is equal). In this scenario, we look into explaining what the countries of a same group have in common. For instance, we can hypothesise that the countries with a low female education rate are the ones with the lowest human development. Experiments over this dataset are mostly in Chapter 4.

#OU.P – The Open University publications. Similar to #KMi.P but on a larger scale, this consists of a dataset of 1,200 English words extracted from around 17,000 Open University paper abstracts. Words have been clustered into 30 communities according to their semantic similarity. In this scenario, we are interested in explaining why words happen to be together, and this reason, which is likely to be “they are words under the same topic”, can also be used to automatically label research communities. Results for #OU.P are presented in Chapter 4.

#Tre – The Google Trends dataset. This dataset consists of 13 cases of Google Trends data4, graphically showing how often a particular term (a word, or a group of words) has been searched on the Web in the last 10 years. In this scenario we have only two groups, the moments in which the term was popular (high points on the timeline) and the moments in which it was not (low points). What we are looking into discovering is what makes a term increase its popularity with a specific regularity. Results for this dataset are partly shown in Chapter 4 and 6, but more extensively in Chapter 7.

#OU.S – The Student Enrolment dataset. This dataset consists of the Open University students clustered according to the UK district they come from. Here, it turns out that in certain regions some faculties attract more students than others, and we are looking into identifying the reason for that, possibly an eco-demographic explanation. The dataset is used in Chapter 8.

4 http://www.google.com/trends/

Chapter 2

State of the Art

This State of the Art chapter, presented in a slightly unconventional way, is articulated around two main axes. Since it is our aim to develop a system that generates explanations, we first study on a conceptual level what an explanation is and what its main components are, as an answer to our first research question (Section 1.3.1). Once we understand that, we can define the explanation model upon which we can build our system. Therefore, Section 2.1 is dedicated to surveying how works in Cognitive Science have defined an explanation, and subsequently to presenting the derived model. In the second part, we focus more practically on understanding which practices and techniques we need to include in our system, and therefore review the fields that we consider relevant to our research. We first introduce our research context by giving the fundamental notions of Knowledge Discovery, Graph Theory and the Web of Data (Section 2.2); we then explore the resources of the Web of Data and the methods to access them (Section 2.3); and we give an overview of how knowledge from structured data can be discovered (Section 2.4). Finally, we discuss the limits of existing approaches and identify which techniques we can exploit in our work (Section 2.5).

2.1 A Cognitive Science Perspective on Explanations

Let us begin with a practical example. What we want to understand is why people search for the term “A Song of Ice and Fire” more at specific times, as in Figure 2.1. If we are familiar with the context, one of the plausible explanations can be that people search more for “A Song of Ice and Fire” when a new season of the TV series Game of Thrones is released. Nevertheless, without the appropriate background knowledge (for instance, knowing what “A Song of Ice and Fire” or “Game of Thrones” are), one has no way to declare that the stated explanation is correct. This example makes us understand that

Figure 2.1 Google Trends result for the term “A Song of Ice and Fire”.

“explaining” does not only mean putting two events in a correlation (X co-occurred with Y), but also revealing their causal dependency based on the context in which they occur (X happened because of Y in some context C). The approach we aim to create has therefore to make sure that the automatically generated explanations involve the right components, but identifying them can be challenging. In order to find them, we studied the areas that most deal with the organisation and understanding of knowledge, commonly grouped under the term of Cognitive Science.

2.1.1 Characterisations of Explanations

Cognitive Science is defined by [Gardner, 2008] as the science of understanding the nature of the human mind. In this work, dated 1985, Cognitive Science is presented as a field involving a series of disciplines interacting in such a way that they can be represented as a “cognitive hexagon” (Figure 2.2). Interestingly, besides the pragmatic differences that those disciplines present, the way their experts see explanations is structurally the same. More precisely, an explanation is composed of the same four components (whose names change according to the discipline), which can be linguistically represented1 as:

“When an event M happens, then, due to a given set of circumstances F, the event N will also occur because of a given law Ξ”

1 The original definition appears in [Maaløe, 2007]

Figure 2.2 The cognitive hexagon, as in [Gardner, 2008]. Dotted lines are weaker connections between disciplines.

The rest of the section provides an overview of how explanations have been defined in these disciplines. It is worth mentioning that the structure of the survey presented hereafter is an organisational choice of our methodology, but many of the cited works, especially the contemporary ones, span multiple areas. Additionally, we took into consideration Computer Science and Sociology rather than Artificial Intelligence and Anthropology, since they present a more direct connection with the notion of explanation. Finally, we adopt the common nomenclature, i.e. the event N to be explained is referred to as explanandum/-nda (“that which is explained”), and the premises to the explanandum, i.e. the prior events M that make it happen, are defined as explanans/-tia (“that which does the explaining”).

Explanations in Philosophy

Planets are near M, what is near does not twinkle Ξ; therefore, planets do not twinkle N (context F: planets) – Aristotle

In Philosophy, defined as the “discipline of asking questions and checking answers” in [Gardner, 2008], an explanation is considered as the process where a set of initial elements (an event and some initial conditions) are described, deduced or put into a relation with an output phenomenon according to a set of (empirical or metaphysical) laws. The works of [Ruben, 1990; Strevens, 2006; Psillos, 2007] provide interesting surveys on how explanations have been defined by intellectuals from ancient times to the contemporary days. The authors report that the first discussions on what an explanation is appear already

among the Greek intellectuals. Thucydides2 defines explanations as a process where facts (or “indisputable data”, [ibid.]) are observed, evaluated based on our knowledge of the human nature and then compared in order to reach generalised principles of why some events occur. For Plato3, an explanation is an expression of knowledge using the logos, and is composed of the Forms, consisting of the abstractions of the entities we know (“the unchanging and unseen world”, [ibid.]), and the Facts, i.e. occurrences or states of affairs, which are the changing world we are used to seeing. Aristotle4 sees explanations as a deductive process aiming at finding out the cause of why something happened, and this cause can be either a thing’s matter, its form, its end, or its change-initiator (Aristotle’s doctrine of the four causes, as it was later called). In the modern age of determinism, causality becomes prominent in characterising explanations. To know what caused an event means to know why it happened, which in turn means understanding the universe determined by natural laws. Descartes and Leibniz describe explanations as the process of demonstrating the mechanical interactions between God (“the primary efficient cause of all things”), the world things (the secondary causes), the laws of nature, and some initial conditions. Newton rejects the God component and gives the laws of nature the central role. For him, explanations are the deductive process of finding the most general principles (the fundamental laws of nature) that account for the natural phenomena. The Empiricists then reject explanations that cannot be traced directly to experience. According to Hume, all causal knowledge stems from experience, and causation is just a regular succession of facts to be discovered by experience. Kant, reintroducing the metaphysical component into explanations, believes that knowledge starts with experience, but does not arise from it: it is shaped by the categories of the understanding and the forms of pure intuition (space and time). John Stuart Mill rejects this view and defines explanations as based only on the laws of nature and on deduction. A fact is explained by deducing its cause – that is, by stating the laws of which it is an instance. In more recent times, it is Carl Hempel who, revisiting and extending Mill’s work, starts the contemporary discussions around explanations. Explaining an event is to show how this event would have been expected to happen, taking into account the laws that govern its occurrence, as well as certain initial conditions. Formally speaking, a singular explanandum E is explained if and only if a description of E is the conclusion of a valid deductive argument, whose explanantia essentially involve a set C of initial or antecedent conditions and a law-like statement L [Hempel and Oppenheim, 1948]. L can consist either of statistical laws (in inductive-statistical explanations) or of laws of nature (in deductive-nomological explanations).

2 cf. History of the Peloponnesian War, 4th cent. BC. 3 cf. Phaedrus and Theaetetus, 370 BC ca. 4 cf. Posterior Analytics, 350 BC ca.

Salmon [Salmon, 1990] further extends the inductive-statistical explanation model, proposing the statistical-relevance model, where explanations have to capture statistically the dependencies between the explanandum and the explanans, and the causal model, which sees explanations as mechanical processes of interacting components. An interactive view is also adopted by the Interventionists [Woodward, 2003], for whom explanations are the process of manipulating the explanandum within a space of alternative possibilities to see if a relation holds when various other conditions change. The Unificationism approach of Friedman and Kitcher [Friedman, 1974; Kitcher, 1981] affirms that explaining a phenomenon is to see it as an instance of a broad pattern of similar phenomena. According to Friedman, explanations are based both on the scientific understanding of the world and on some assumptions (i.e. regularities in nature), which are the basic “unifiers”. Similarly, Kitcher believes that a number of explanatory patterns (or schemata) need to be unified in order to explain a fact.

Explanations in Psychology

According to psychological theories Ξ, Borderline Personality Disorder N can be explained by a childhood trauma M (context F: a person’s behaviour)

Psychology is the discipline that tries to understand the human (and animal) cognitive processes, i.e. the way they represent and process information [Wilson and Keil, 2001]. Explanations in this area are “an attempt to understand phenomena related to intelligent behaviours” [Bechtel and Wright, 2009] or, in other words, they consist in identifying which entities and/or activities produce regular changes from some start conditions to a termination condition. Many works in psychology criticise the traditional philosophical-scientific approaches to explanation since they are too focused on causality and subsumption under laws (deduction) [Hutten, 1956; Cummins, 2000; Bechtel and Wright, 2009]. Instead, psychological explanations accept over-determinism and dynamism, i.e. the antecedent conditions that explain a phenomenon can be more than one and do change over time. The explanantia are not laws of nature but human capacities that psychological methodologies have to discover and confirm. In the literature, psychological explanations have been defined as “models” providing a theory and a law, by means of which it is possible to describe a phenomenon and relate it to other similar phenomena, and possibly to allow a prediction, intended as a future occurrence of the phenomenon [Hutten, 1956; Cummins, 2000]. The authors of [Marraffa and Paternoster, 2012] have highlighted how the mechanical aspect within the explanation process plays a central role in recent psychology work. Phenomena are often explained by decomposing them into operations localised in parts (components) of the mechanism. Mechanisms have spatial and temporal organisations that explain how entities are organised into levels and carry out their activities. In other words, explaining a phenomenon means revealing its internal structure by defining the organisational levels that are responsible for producing it, then identifying how those levels relate to other levels, and finally building a model to explain similar phenomena. This idea of interactive levels can be found in the major contemporary psychology trends, i.e. the connectionist (bottom-up) approach [Darden and Maull, 1977; Marr and Vision, 1982; Bechtel and Wright, 2009], which first specifies the system’s components and then produces the explanation’s model, and the symbolic (top-down) one [Smolensky, 1988], which first identifies the phenomenon to be explained and then describes it according to the external symbols (syntactico-semantic rules or mathematical laws) manipulated by humans.

Explanations in Neuroscience

Neuroimaging Ξ has proven that humans can do math calculations N because some neurons actively respond to quantities M (context F: human capacities)

Neuroscience is the general study of the brain and the endocrine system [Friedenberg and Silverman, 2011]. Neuroscience explanations reject the concept of levels, as well as the synthetic a priori frameworks. Mentality is the central pivot between interpretation and rationality. Neuroscientists aim at understanding brains, i.e. providing a description of mental events at the implementational level [ibid.]. This requires assuming that beliefs are true and rational, which is in contrast with many philosophical views. Explaining, for neuroscience, means fitting a mental phenomenon into a broadly rational pattern. Explanations are answers to the what-if-things-had-been-different questions: they describe what will happen to some variables (effects) when one manipulates or intervenes on some others (causes) [Davidson, 1990; Campbell, 2008; Woodward, 2008]. They can be “upper level explanations”, when causes are macroscopic variables or environmental factors, and “lower level explanations”, if causes are more fine-grained (neural, genetic, or biochemical) mechanisms. Also, explanations do not require any law or sufficient condition, and are simply generalisations that describe relationships stable under some appropriate range of interventions. Therefore, a typical approach to explanations consists in detecting how the variables respond to the hypothetical experiments in which some interventions occur, and then establishing whether other variables under the same intervention respond in the same way. The relevance of explanations is assessed by their stability, i.e. some relationships are more stable than others under some interventions or under some background conditions.

Explanations in Computer Science

Three authors submit three different papers to the conference C; however, only the first one is accepted Ξ. The first author attends the conference M. Therefore, we induce that, to be able to attend the conference C, one has to have submitted a paper and have had it accepted N. (context F: conference submission)

Computer Science is the science dealing with computer programs as a medium to perform human operations. Artificial Intelligence (AI) has considered explanations in terms of inference and reasoning, and programs were the means to decipher connections between events, to make predictions and to represent some order in the universe. The process of generating explanations fits the two main reasoning patterns, abduction and induction, that Peirce describes in his Lectures on Pragmatism of 1903. According to Peirce, induction is the inference of a rule (major premise) starting from a case (minor premise) and its result (conclusion), while abduction is the inference of the case from a rule and a result [Fann, 1970]. Those models have often been put under the same umbrella-term of non-deductive, a posteriori inference, and Peirce himself in his later period saw abduction as only one phase of the process, in which hypotheses and ideas are generated, but then need to be evaluated by induction or deduction [Fann, 1970; Russell et al., 1995; Schurz, 2008]. Moving on from such discussions, AI approaches to generating explanations (see [Russell et al., 1995; Rothleder, 1996; Páez, 2009] for extensive surveys) show that the two inferences can be similarly modelled, so that one body of initial knowledge (divided into observations and prior knowledge) is mapped to another (the new knowledge), under the constraints of certain criteria, which reveal their connection and confirm the hypothesis. The explanation process consists of first analysing the observations, then defining tentative hypotheses and finally proving them empirically. Similarly to explanations in psychology, Artificial Intelligence explanations do not assume that a truth has been previously accepted in the prior knowledge. Nevertheless, in order to produce new knowledge, explanations need to be scientifically and empirically evaluated. Plausibility, usefulness and desirability are considered too subjective and are replaced by some formal constraints that the process has to respect – namely, efficiency (how easily it solves the problem), non-deducibility (the explanation should not be already deducible from the prior knowledge), consistency (it must be consistent with what is already known) and non-emptiness (it must introduce new knowledge). Some works attempting to merge the views from philosophy and Artificial Intelligence are surveyed in [Páez, 2009]. Explanations also play a central role in other areas of Computer Science such as Knowledge Management. Knowledge Management systems are used for intelligent tutoring or

decision-support, and explanations provide insight into how their reasoning is accomplished. Users are presented with solutions and explanations of a problem to be solved, relying on some domain knowledge (see referenced works in [Southwick, 1991; Wooley, 1998; Alavi and Leidner, 1999]). This domain knowledge, related to facts, procedures or events, is defined as “tacit” because it is already in the individuals’ minds. An inflow of new stimuli then triggers a cognitive process that produces explanations, which are the explicit knowledge communicated in symbolic forms. Explanations can be intended as “giving something a meaning or an interpretation to, or to make it understandable” [Wooley, 1998].

Explanations in Sociology

In Italy, it is hard for young people to live on their own Ξ. Since job opportunities are scarce M, young people tend to stay at their parents’ home until a later age N (context F: Italy’s social world)

Sociology is the discipline that focuses on the links between humans’ cognitive processes and their sociocultural world [Calhoun, 2002]. Weber and Durkheim, founders of the two major trends in sociology, consider the act of explaining as giving a meaning to and justifying social facts, intended as observable regularities in the behaviour of members of a society [Durkheim, 2014; Weber and Heydebrand, 1994]. In their perspective, explanation is a rational (empirical) observation of social behaviours. The author of [Carter, 1998] rather prefers the idea of “comparing events”, and defines an explanation as a comparison stating a similarity, a relationship, a connection between social phenomena. For [Brent et al., 2000], explanation is also a form of reasoning for prediction, and it has to take into account that the structure of the social world and the social behaviours constantly change in time and space. The general agreement of works in sociology is that explanations, based on social facts, have more pragmatic requirements with respect to scientific explanations. Formal constraints, such as mathematical formalisation or empirical falsifiability, leave space for descriptions and approximations of the complex structures of the social world [Clayton, 1989; Gorski, 2004]. Explanations might be weaker or stronger depending on the set of social impacts that the rules express. For example, they can be proverbs (everyday rules to express what one knows), social theorems (rules summing up the human experience) or paradoxes [Brent et al., 2000; Maaløe, 2007]. Some have also distinguished between social-anthropological and religious explanations, where “religious” means giving sense to phenomena using the symbols of a tradition. Explanations are declared to be valid if a phenomenon can also appear due to other circumstances, and logically exhaustive if they are generalisable to all logically possible combinations of conditions encountered [Brent et al., 2000; Maaløe, 2007].

Explanations in Linguistics

The expression *THE MY BOOK is not allowed in English N because the English grammar Ξ does not allow double determiners, and both THE and MY are determiners M (context F: English as a language)

Linguistics is the discipline that provides theories dealing with the nature of the human language. Linguistic explanations are descriptions focused on finding which rules of a language explain a linguistic fact [Hawkins, 1988]. Linguists do not aim at describing the processes but rather at explaining how language acquisition is possible, by identifying the grammatical patterns that occur in the human languages. Useful surveys on explanations in linguistics are to be found in [Hawkins, 1988; Culicover and Jackendoff, 2005; Haspelmath et al., 2005; Itkonen, 2013]. A central assumption in linguistic theories is that a language is a set of elements (words, morphemes, phonemes) that meet a set of well-formedness conditions, called language universals. Language universals are (language-specific) grammatical characteristics that constitute the grammar of the language. The major approaches – generative linguistics, which follows Chomsky’s theories, and functional linguistics, which is based on Greenberg’s school – have different views on the roles and functions of the components of linguistic explanations. In the former case, only the generalisation of an explanation is regarded as interesting, since it shows that linguistic constructions are derivable from the general regularities of a language (descriptive adequacy), but some of those regularities are innate and have to be accepted a priori (explanatory adequacy). Searching for an explanation takes the form of new constraints on the descriptive frameworks, which are able to describe the languages. This claim to innateness and the view of explanations as constrained descriptions cannot be accepted by functional linguistics, for which external elements and realities are essential to understand linguistic facts. Functionalism is more interested in language regularities than in descriptive frameworks. The rejection of a priori language universals does not mean that functionalists think that nothing in language structure is arbitrary, but rather that it is worth attempting to explain, wherever possible, language facts using external non-linguistic factors [Haspelmath et al., 2005].

2.1.2 The Explanation Ontology

The survey presented above shows how explanations are structured in a similar way across the Cognitive Science disciplines. Explanations include two events, an anterior event M and a posterior event N, a context F relating them and a law Ξ that governs their co-occurrence. Such a structure can be formally represented as a small ontology, shown in Figure 2.3, which we call the

Explanation Ontology. This ontology, representing the main components of an explanation as they should be produced by our automatic process, is described below.

Figure 2.3 The Explanation Ontology that we propose to model explanations.

(a) The class Explanation represents the explanation to be produced. The class has a few constraints represented as OWL class restrictions. In order to be complete, an explanation: (i) needs at least one antecedent event M, expressed as hasExplanans some part:Event; (ii) requires a posterior event N, expressed as hasExplanandum some part:Event; and (iii) has to happen in a context that relates the two events, expressed as hasCondition some situation:Situation.

(b) The class part:Event is used to represent the antecedent event (the explanans M) and the posterior event (the explanandum N) involved in the explanation. To promote reusability, we reuse the part:Event class from the Participation5 ontology design pattern [Gangemi and Presutti, 2009], which can be used to represent any binary relation between objects and events.

(c) The situation:Situation class from the Situation6 pattern, designed to “represent contexts or situations, and the things that have something in common, or are associated” [Gangemi and Mika, 2003], is used to represent the context F under which the two events occur. The OWL axiom in the blue box above the object property hasCondition shows what we can infer about the situation in which the explanation is happening. If there exists a situation in which an explanation is contextualised (through hasCondition), then both the explanans and the explanandum share this same situation (through situation:hasSetting), and are therefore in the same context.

(d) The dul:Theory class from the DOLCE+DnS Ultralite ontology7 is used to represent the governing law Ξ. We use the term theory with the idea that it best embraces the concepts defined in Cognitive Science (laws, human capacities, human experiences, universals, and so on), all of which consist of something having a binding force on the events under analysis.

5 http://ontologydesignpatterns.org/cp/owl/participation.owl
6 http://ontologydesignpatterns.org/cp/owl/situation.owl
7 http://www.ontologydesignpatterns.org/ont/dul/DUL.owl

(e) The dul:Agent class represents the entity producing the explanations. Since there is no agreement on how to “call” the act of explaining (e.g. deducing, inducing, inferring and describing have all been used in the literature), we have kept the generic object property label dul:isConceptualizedBy.
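To make the model concrete, here is a minimal sketch (in Python, with rdflib) of how a single explanation instance could be assembled for the “A Song of Ice and Fire” example of Section 2.1. The pattern namespaces follow the footnotes above (assuming a “#” separator); the expl: namespace, all individual names and the hasTheory property are illustrative assumptions, since the description above does not fix them.

from rdflib import Graph, Namespace, RDF

# Pattern namespaces from the footnotes above ('#' separator assumed);
# the expl: namespace for the Explanation Ontology is hypothetical.
EXPL = Namespace("http://example.org/explanation-ontology#")
PART = Namespace("http://ontologydesignpatterns.org/cp/owl/participation.owl#")
SIT = Namespace("http://ontologydesignpatterns.org/cp/owl/situation.owl#")
DUL = Namespace("http://www.ontologydesignpatterns.org/ont/dul/DUL.owl#")

g = Graph()
for prefix, ns in [("expl", EXPL), ("part", PART), ("sit", SIT), ("dul", DUL)]:
    g.bind(prefix, ns)

explanation = EXPL.explanation1        # the Explanation to be produced
explanans = EXPL.newSeasonRelease      # antecedent event M
explanandum = EXPL.searchPeak          # posterior event N
context = EXPL.webSearchBehaviour      # context F (a situation:Situation)
theory = EXPL.popularInterestTheory    # governing law/theory (Ξ)
agent = EXPL.explanationGenerator      # the dul:Agent producing the explanation

g.add((explanation, RDF.type, EXPL.Explanation))
g.add((explanans, RDF.type, PART.Event))
g.add((explanandum, RDF.type, PART.Event))
g.add((context, RDF.type, SIT.Situation))
g.add((theory, RDF.type, DUL.Theory))
g.add((agent, RDF.type, DUL.Agent))

g.add((explanation, EXPL.hasExplanans, explanans))
g.add((explanation, EXPL.hasExplanandum, explanandum))
g.add((explanation, EXPL.hasCondition, context))
g.add((explanation, EXPL.hasTheory, theory))          # property name assumed
g.add((explanation, DUL.isConceptualizedBy, agent))

# Inferences licensed by the axiom on hasCondition: both events share the situation.
g.add((explanans, SIT.hasSetting, context))
g.add((explanandum, SIT.hasSetting, context))

print(g.serialize(format="turtle"))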

2.2 Research Context

Having understood what an explanation is and designed a model for it, we can start reviewing works in the disciplines that most contribute to our research, in order to identify which techniques we can use, and to which extent. Our research starts at the crossing between Knowledge Discovery and the Web of Data, but also spans to a certain extent the field of Graph Theory and, as already pointed out, Cognitive Science. Since it might not be straightforward to see how all these disciplines contribute to our work, Figure 2.4 gives a graphical representation of our research context.

Figure 2.4 Contributions that the analysed disciplines can bring to our research.

It is important to emphasise that these areas are not the ones we are contributing to, but rather the research fields whose techniques and overviews have to be investigated to develop our work. While we have already presented how Cognitive Science helps us in clarifying what we need to know when talking about explanations, we believe that:

(a) Knowledge Discovery has to be explored to find which process or technique is the most suitable to generate explanations;

(b) data and standards for structured data management (storing, querying, retrieving, traversing and revealing) from the Web of Data have to be explored so that we know how to access the background knowledge to produce explanations;

(c) Graph Theory, which provides the fundamental notions and advanced algorithms for graph management, can be beneficial to an efficient exploration of the Web of Data.

In the rest of this section, we provide the fundamental notions in those three areas.

2.2.1 The Knowledge Discovery Process

The Knowledge Discovery process is a framework aiming at developing methods and techniques to detect hidden patterns and regularities in large amounts of data [Fayyad et al., 1996]. The process, shown in Figure 2.5, includes a set of sequential activities with their specific inputs and outputs, which themselves constitute different research areas with specific problems and challenges. These activities, iteratively performed according to the domain problem, are articulated in (i) data pre-processing, (ii) data mining and (iii) data post-processing. Multiple iterations presuppose that the discovered knowledge and the data selection improve at each iteration.

Figure 2.5 Chained activities in the Knowledge Discovery process as first appeared in [Fayyad et al., 1996].

Data pre-processing. Data pre-processing is a set of steps in which data are selected, cleaned, transformed, organised, and structured so that they can be processed. These operations include removing noise and outliers, detecting missing or unknown information and deciding how to deal with it. The idea behind pre-processing is that the results produced by a data mining algorithm can only be as good as the given input data. The background knowledge is assumed to be part of the input dataset.

Data mining. The data mining process takes as input the pre-processed data and produces as output some information, called nuggets or patterns, defined as “statements describing an interesting relationship among a subset of the data” [Fayyad et al., 1996]. At this stage, one has to choose which tasks to apply to the data (classification, summarisation, regression, clustering, and so on), decide which Machine Learning algorithm and parameters are appropriate, and finally learn the model that will extract interesting patterns, which may be represented as clusters, association rules, sequences or dependencies. Besides the technical choices, problems related both to the quantity (the number of patterns can be extremely large) and to the quality (not all of them are interesting) of the patterns have to be tackled.

Data post-processing. The last step, also called pattern interpretation, consists in understanding, validating and refining the discovered patterns so that a new iteration of the KD process can start (or the results can be presented to the user). As the interestingness of the patterns is not guaranteed, steps to be included in this process are pruning, summarising, grouping, and visualising. Also, the interpretation can be difficult since it might require knowledge from different domains. It would therefore require the human analyst to be sufficiently familiar with all of these domains.
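To make the three activities concrete, the following is a minimal sketch of one iteration of the process in Python; the toy data, the choice of K-Means as the mining algorithm and the quality measures used in the post-processing step are assumptions made purely for illustration.

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Toy input: rows are observations, columns are features (assumed for illustration).
rng = np.random.default_rng(42)
data = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(5, 1, (50, 2))])

# (i) Data pre-processing: drop incomplete observations and normalise the features.
data = data[~np.isnan(data).any(axis=1)]
scaled = StandardScaler().fit_transform(data)

# (ii) Data mining: learn a model that extracts patterns (here, two clusters).
model = KMeans(n_clusters=2, n_init=10, random_state=0).fit(scaled)
patterns = model.labels_

# (iii) Data post-processing: assess the patterns before starting a new iteration
# (or presenting them to the user).
print("cluster sizes:", np.bincount(patterns))
print("silhouette score:", silhouette_score(scaled, patterns))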

2.2.2 Graph Terminology and Fundamentals

Graphs are among the most studied mathematical data structures, and have found application in a wide range of disciplines – to name a few, Topological Mathematics, Computer Science, Electrical Engineering, Biology, Linguistics and Sociology [West et al., 2001; Washio and Motoda, 2003; Giblin, 2013]. The basic notions on graphs, which are relevant to our research, are reviewed below.

(a) The example shows the most generic type of graph, i.e. an ordered triple G = (V, E, ψ) consisting of a non-empty set V of vertices (or nodes), a set E of arcs (or edges), and an incidence function ψ which associates with each edge e an unordered pair of vertices, e = (v, w), which are said to be joined by e [Bondy and Murty, 1976].

(b) A graph allowing loops, i.e. in which there exists an edge e = (v, w) such that v = w, is called a multigraph. The graph in the example has 2 loops.

(c) A subgraph G′ = (V′, E′, ψ′) is a portion of a graph G, where V′ ⊆ V, E′ ⊆ E and ψ′ ⊆ ψ. The graph in the example is a subgraph of both (a) and (b).

(d) A labeled graph is a graph G = (V, E, A, α, β), where A is a set of labels that are assigned to vertices and edges through the labelling functions α and β. When these labels are numbers, as in the example, they are called weights, and the graph is called a weighted graph.

(e) A directed graph (or digraph) is an ordered triple G = (V, E, ψ) where the incidence function ψ associates with each arc an ordered pair of vertices v, w ∈ V. In that case, we call v the source and w the end (or tail).

(f) A path (or walk) is a finite sequence p = {v0, e1, v1, . . . , el, vl} of alternating vertices and edges such that, for 1 ≤ i ≤ l, the ends of ei are vi−1 and vi; l defines the length of p. Two vertices v, w ∈ V are said to be connected if there exists at least one path from v to w. The example shows a path of length 2 from v2 to v5 in (e).

Isomorphism. Two graphs G and G′ are isomorphic if there exists a bijective function f : V → V′ between the vertex sets V and V′ such that any two vertices v and w of G are adjacent in G if and only if f(v) and f(w) are adjacent in G′ (also called an “edge-preserving bijection”) [Chartrand, 2006]. In the case of labeled graphs, isomorphism is both edge-preserving and label-preserving.

Topology. The degree d(v) of a vertex v ∈ V is the number of edges incident to v, and is referred to as the vertex neighbourhood. If the graph is directed, d(v) is the sum of the number of incoming edges i(v) and the number of outgoing edges o(v). The density of a subgraph G′ ⊆ G is the ratio of the number of edges in G′ to the number of all possible edges among the vertices in G′.
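The sketch below illustrates some of these notions with the networkx Python library; the toy graph and its labels are assumptions made for illustration only.

import networkx as nx

# A small weighted (labeled), directed graph.
G = nx.DiGraph()
G.add_weighted_edges_from([("v1", "v2", 1.0), ("v2", "v3", 2.0),
                           ("v2", "v5", 0.5), ("v3", "v5", 1.5)])

# Degree of a vertex: incoming plus outgoing edges in the directed case.
print(G.degree("v2"), G.in_degree("v2"), G.out_degree("v2"))

# A subgraph induced by a subset of the vertices, and its density.
sub = G.subgraph(["v2", "v3", "v5"])
print(nx.density(sub))

# A path between two connected vertices; its length is the number of edges.
path = nx.shortest_path(G, "v2", "v5", weight="weight")
print(path, len(path) - 1)

# Isomorphism: a relabelled copy of the subgraph is edge-preservingly equivalent to it.
H = nx.relabel_nodes(sub.copy(), {"v2": "a", "v3": "b", "v5": "c"})
print(nx.is_isomorphic(sub, H))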

Graph Search Algorithms

An important class of graph algorithms aims at finding the shortest path between two vertices, where shortest is intended as the “cheapest” path, i.e. the one with the lowest cost in terms of cumulative edge weights. Strategies differ in informativeness (whether or not they use preference strategies, called heuristics), completeness (whether they are guaranteed to find a solution, if one exists), time complexity (how long they take), space complexity (how much memory they require) and optimality (whether, when several solutions exist, the one found first is the one with the lowest cost).

Uninformed search. The uninformed (also called blind or brute-force) search explores a graph without using any domain-specific knowledge. The simplest and most common strategy is Breadth-First Search (BFS), which iteratively expands the shallowest unexpanded node, so that all the nodes at depth d are expanded before the nodes at depth d + 1. The new nodes are put in a queue after all the previously found nodes. BFS is complete if the set of vertices is finite, and it is guaranteed to be optimal assuming the costs of the edges are all the same (because it will always find the shallowest one). However, time and space complexity are exponential with respect to the size of the graph. Uniform Cost Search (UCS) modifies BFS by keeping nodes in a priority queue according to their cost g(n), so that the one with the lowest cost is the first to be expanded. UCS is complete and optimal, but inefficient. UCS is logically equivalent to the very well-known Dijkstra algorithm (proposed independently by [Dijkstra, 1959] and [Whiting and Hillier, 1960]): they expand exactly the same nodes in exactly the same order. However, Dijkstra’s solution inserts all the nodes in the queue, which makes it computationally more expensive, while UCS lazily inserts the nodes during the search [Felner, 2011]. Depth-First Search (DFS) aims at expanding the deepest unexpanded node. The new nodes are always put at the front of the queue and, if the search hits a dead end, i.e. the node is not expandable, the search goes back and expands a node at a shallower level. It

is generally considered a better solution for denser spaces, but not for graphs with infinite depths. For this reason, DFS is not complete, and it is not optimal, since the path that is found is not guaranteed to be the shortest. Space complexity is linear but time complexity is exponential. Bi-directional Search (BiS) starts simultaneously from the initial and the goal node until the two searches meet in the middle. Bi-directional search has proven useful in terms of computational savings for those problems where the goal node is known, since it can considerably reduce the time and space complexity [Pohl, 1970; Dantzig, 1998]. However, it is not guaranteed to be optimal.

Informed Search. The idea behind informed (or heuristic) search is to use some additional knowledge about the problem as “hints” to help the search method find the most promising paths. One of the most popular heuristic search methods is the A* algorithm [Hart et al., 1968], as demonstrated by the vast literature related to it [Newell et al., 1959; Doran and Michie, 1966; Quinlan and Hunt, 1968; Nilsson, 1971]. In A*, nodes are queued according to the score assessed by f(n), defined as an estimate of the cost of the path from the start node to the goal node passing through the node n. f(n) is the sum of the two functions g(n), representing the actual cost from the start to n, and h(n), being the estimate of the cost from n to the goal node. A* is optimal only if h(n) is admissible, i.e. h(n) never overestimates the real cost of reaching the goal from n (for instance, when n is the goal, h(n) must be 0). In the worst case, time complexity is exponential. Also, since A* maintains all the unexpanded states in a queue, the space complexity is the same. Generally, identifying a good heuristic helps in finding optimal solutions in a reasonable time. A second possible alternative is beam search, which keeps only the best j options at each step.

Table 2.2 Search strategies characteristics. b is the branching factor, d is the depth of the shallowest goal state, m is the maximum depth of the graph.

Strategy   Informed   Complete   Optimal   Time complexity   Space complexity
BFS        no         yes        yes**     O(b^d)            O(b^d)
UCS        no         yes        yes       up to O(b^d)      up to O(b^d)
DFS        no         no         no        O(b^m)            O(b^m)
BiS        no         yes        no        O(b^(d/2))        O(b^(d/2))
A*         yes        yes        yes       up to O(b^m)      up to O(b^m)
GS         yes        no         no        up to O(b^m)      up to O(b^m)

** only if all edge costs are equal

Using the heuristic function alone results in a greedy best-first search (GS) [Miller and Myers, 1985; Ukkonen, 1985; Myers, 1986; Wu et al., 1990]. While it can sometimes vastly reduce computation times compared to other searches, the path optimality is not guaranteed, and it can get stuck in loops. Time and space complexity are exponential. The presented algorithms and their main characteristics are summarised in Table 2.2.
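To make the strategies concrete, below is a minimal sketch of A* over a weighted adjacency list; with the default zero heuristic (which is trivially admissible) it behaves exactly like Uniform Cost Search. The graph representation and the toy example are assumptions for illustration.

import heapq

def a_star(graph, start, goal, h=lambda n: 0.0):
    """A* search over a weighted adjacency dict {node: [(neighbour, cost), ...]}.
    With the default zero heuristic it behaves like Uniform Cost Search."""
    frontier = [(h(start), 0.0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0.0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)  # expand the node with lowest f(n)
        if node == goal:
            return path, g
        for neighbour, cost in graph.get(node, []):
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(frontier,
                               (new_g + h(neighbour), new_g, neighbour, path + [neighbour]))
    return None, float("inf")

# Toy graph; with h(n) = 0 this is plain UCS and returns (['v1', 'v2', 'v3', 'v5'], 3).
graph = {"v1": [("v2", 1), ("v3", 4)], "v2": [("v3", 1), ("v5", 5)], "v3": [("v5", 1)]}
print(a_star(graph, "v1", "v5"))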

Spreading Activation

Originated from psychological studies such as [Rumelhart and Norman, 1983], Spreading Activation (SA) is a technique to explore graphs currently adopted in different areas of Computer Science and particularly successful in Information Retrieval (IR) [Crestani, 1997]. The idea behind SA is that a graph (generally called a network in IR) can be explored using a sequence of iterations until halted by the user or by a termination condition. Each iteration consists of one or more pulses and a termination check. The sequence of actions which composes the pulse is what distinguishes SA from other complex models to explore graphs [ibid.]. A pulse is composed of the following three phases:

• pre-adjustment (optional): a control phase to avoid the activation from previous pulses;
• spreading: a phase in which activation passages weave from a node to the ones connected to it;
• post-adjustment (optional): a second control phase, as the pre-adjustment.

To avoid the activation spreading all over the network, the processing technique can be constrained using information such as the diverse significance of the edges (e.g. labels or weights), using some form of heuristics or inference rules. This technique is called Constrained Spreading Activation.
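A minimal, unconstrained version of the technique can be sketched as follows; the decay factor, the activation threshold and the fixed number of pulses stand in for the (pre-/post-)adjustment phases and the termination check, and are illustrative choices rather than part of a specific SA model.

import networkx as nx

def spreading_activation(graph, seeds, decay=0.8, threshold=0.1, max_pulses=5):
    """Minimal spreading activation over a weighted graph; seeds maps nodes to
    their initial activation values."""
    activation = dict(seeds)
    for _ in range(max_pulses):                    # termination check: fixed number of pulses
        pulse = {}
        for node, value in activation.items():     # spreading phase
            if value < threshold:                  # acts as a simple pre-adjustment constraint
                continue
            for neighbour in graph.neighbors(node):
                weight = graph[node][neighbour].get("weight", 1.0)
                pulse[neighbour] = pulse.get(neighbour, 0.0) + value * weight * decay
        for node, value in pulse.items():          # accumulate the activation of this pulse
            activation[node] = activation.get(node, 0.0) + value
    return activation

G = nx.Graph()
G.add_weighted_edges_from([("a", "b", 1.0), ("b", "c", 0.5), ("c", "d", 0.5)])
print(spreading_activation(G, {"a": 1.0}))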

2.2.3 Historical Overview of the Web of Data

As shown in Figure 2.4, the Web of Data contributes to our research by providing the background knowledge required to produce explanations. We give here a short overview of the basic notions.

Semi-structured Data. When the World Wide Web emerged and was promoted as the prime vehicle for disseminating information [Florescu et al., 1998], the database community started addressing a new challenge: the one of managing data at a large (Web) scale [Gutierrez, 2011]. In its original view, the Web was a collection of documents connected using hyperlinks,

represented using the HyperText Markup Language (HTML) [Berners-Lee and Connolly, 1995] and accessed using the Hypertext Transfer Protocol (HTTP) [Fielding et al., 1999], that computers could access and exchange in order to promote interoperability [Berners-Lee, 1996].

Structured Data. The traditional hypertext Web, meant to be mostly human-accessible, proved to be insufficiently expressive, since it left implicit the semantic relationships between data linked across documents [Bizer et al., 2009b; Rettinger et al., 2012]. The manifesto for a structured and semantically meaningful Web (the “Semantic Web”) proposed that machines should have a deeper understanding of the document contents, so that data access, organisation and management could be easier. Specifying the semantics of information on the Web was seen as a significant step towards automated data processing [Berners-Lee et al., 2001].

Linked Data. A set of best practices (the so-called Linked Data principles) for publishing and connecting structured data on the Web was presented by Tim Berners-Lee in 2006 [Berners-Lee, 2006]:

(1) Use URIs as names for things

(2) Use HTTP URIs so that people can look up those names

(3) Provide useful information, using the standards (RDF, SPARQL)

(4) Include links to other URIs, so that they can discover more things

In short, data providers are recommended to identify an entity through HTTP based on its unique identifier (Uniform Resource Identifier or URI) [Berners-Lee et al., 1998], which provides access to a structured representation of the entity. This data should be represented using the Resource Description Framework (RDF) model [Klyne and Carroll, 2004], which represents information in the form of ⟨subject, predicate, object⟩ triples, and should be accessible using the SPARQL language and protocols [Correndo et al., 2010; Feigenbaum et al., 2013]. Furthermore, data should also be linked to other sources, to facilitate automatic data discovery and reuse. Ever since these principles were presented, there has been much effort in both academia and industry to publish and connect datasets of multiple formats, sources and domains, so that fragmentary information could be aggregated into a multi-domain, shared knowledge, defined today as the Web of Data.
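As a concrete illustration of the triple model behind these principles, the sketch below uses rdflib to state two triples about an entity identified by an HTTP URI; the DBpedia URI is real, while the target of the outgoing link is a placeholder chosen for illustration.

from rdflib import Graph, URIRef, Literal
from rdflib.namespace import RDFS, OWL

g = Graph()
subject = URIRef("http://dbpedia.org/resource/A_Song_of_Ice_and_Fire")

# A <subject, predicate, object> triple with a literal object (principle 3).
g.add((subject, RDFS.label, Literal("A Song of Ice and Fire", lang="en")))

# A link to another dataset (principle 4); the target URI is a placeholder.
g.add((subject, OWL.sameAs, URIRef("http://example.org/dataset/ASongOfIceAndFire")))

print(g.serialize(format="nt"))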

Open Data. While those principles were design guidelines for data publishers, Tim Berners-Lee introduced in 2010 a 5-star dataset rating model to encourage data owners to adopt Linked Data8. The model included an open-licence principle, i.e. the first star is assigned if data are published under open licences such as PDDL9, ODC-by10 or CC011, while the other four are assigned based on conformance with the Linked Data principles. Nowadays many initiatives promote Open Data. To name a few, Open Knowledge12 (formerly the Open Knowledge Foundation) campaigns for creating collaborative networks of Open Data from different domains at the local level; the World Bank Open Data Initiative13 provides free and open access to more than 1,168 datasets describing countries’ development according to the wide range of World Bank socio-economical indicators; the US government constantly publishes new datasets within the Data.gov portal14; the Datahub15 contains 13,940 descriptions of data documents, 2,369 of which can be identified as containing Linked Data; and finally, the Linking Open Data (LOD) community, started in 2007 with a dozen datasets published under the Linked Data principles, became in a few years a large space containing hundreds of datasets16.

2.3 Consuming Knowledge from the Web of Data

The section presents an overview of the most common resources in the Web of Data, and the methods to access them.

2.3.1 Resources

A number of available resources, standards and best practices are nowadays established in the Web of Data community. We group them into (i) vocabularies, (ii) cross-domain Linked Data collections and (iii) indexes.

Dataset Vocabularies. Vocabularies help data consumers to have a meaningful view of the datasets they are using. The VoID schema17 is the standard vocabulary adopted to formally describe datasets in the Web of Data. It supports dataset metadata such as homepage, SPARQL endpoint, available protocols, and statistics about the resources (classes and properties, numbers of triples and so on). The VoID description has also been extended in the science domain within the Bio2RDF schema [Callahan et al., 2013]. However, it has been shown that the VoID vocabulary still struggles to be extensively adopted in the Linked Data community [Hogan et al., 2012]. Also, a current open issue relates to the discoverability of datasets adopting such a standard. Other established vocabularies for dataset descriptions include DCAT [Maali et al., 2014], the Prov-O ontology [Lebo et al., 2013], VOAF [Vandenbussche and Vatant, 2011] and Datanode [Daga et al., 2014]. To overcome the issue of using too many vocabularies, the Linked Open Vocabularies (LOV) initiative18 was promoted as an observatory of the Web of Data vocabularies, gathering the freshly proposed ones and showing interconnections, versioning history and maintenance policy.

8 http://5stardata.info/en/
9 http://opendatacommons.org/licenses/pddl/
10 http://opendatacommons.org/licenses/by/
11 http://creativecommons.org/publicdomain/zero/1.0/
12 https://okfn.org/
13 http://datacatalog.worldbank.org/
14 https://www.data.gov/
15 http://datahub.io/
16 Approximately 1,100 datasets as of October 2014 [Schmachtenberg et al., 2014].
17 http://www.w3.org/TR/void/

Cross-domain Linked Data Collections. As previously said, the LOD cloud is a large collection of datasets interlinked across domains. The central hub, as shown in the very well-known LOD diagram19, is DBpedia [Auer et al., 2007; Bizer et al., 2009a], an RDF dataset of general knowledge extracted from the semi-structured Wikipedia articles. Freebase [Bollacker et al., 2008] was also presented as a dataset of general knowledge, created from user inputs and existing RDF and Microformat datasets. Since DBpedia and Freebase are both automatically extracted, they contain a significant amount of inconsistencies, non-conformant data and syntactically incorrect triples. While attempts to improve DBpedia are constantly proposed (see [Paulheim and Gangemi, 2015] for an updated survey on systematic error identification), Freebase was discontinued by Google in March 2015. To date, the plan is to transfer Freebase to Wikidata [Vrandečić and Krötzsch, 2014] and release a new API to access the Google Knowledge Graph. Europeana [Haslhofer and Isaac, 2011] also provides aggregate data from specific content domains, and allows queries through customised APIs. The LOD cloud was also made available in 2012 in the context of the Billion Triple Challenge (BTC): the dataset consists of 1.4 billion triples, including large RDF datasets, RDFa and Microformats data from various domains.

Linked Data Indexes. Frameworks to explore Linked Data include Sindice [Oren et al., 2008] and Sig.ma [Tummarello et al., 2010], which crawl Linked Data resources (RDF, RDFa and Microformats) and allow access to data with custom user interfaces (note that

18 http://lov.okfn.org/dataset/lov/
19 http://linkeddatacatalog.dws.informatik.uni-mannheim.de/state/

Sindice was discontinued in 2014); the LODCache20, a collection of more than 50 billion Linked Data triples, which provides a SPARQL endpoint and dereferenceable URIs, but does not make data dumps available; DyLDO [Käfer et al., 2012], a platform to monitor the dynamics of 80,000 Linked Data documents retrieved through URI dereferencing (data dumps are published weekly as N-Quads files); the DataHub portal21, which gathers and archives data sources and their metadata schemes to allow data description and discovery; and the LOD Laundromat [Beek et al., 2014], which crawls the Linked Data Cloud and re-publishes any (cleaned) Linked Dataset in N-Triples or N-Quads format.

2.3.2 Methods

Figure 2.6 presents the different methods to access Web of Data resources.

Figure 2.6 Types of data access in the Web of Data.

Typically, client-side applications ask for data that they analyse, visualise or mine to discover new information. Data are provided by remote servers, which clients query (i.e. they ask to access the data stored there), and which in turn respond with the results once the query is executed. As shown, each type of data access has different specificities: for instance, the workload for the server and the client varies, and requests can be more specific or more generic, which also implies that different amounts of domain knowledge are required in the process.

Web APIs. Web APIs are the most popular approach for publishing and consuming open data on the Web. Based on REST [Fielding, 2000], they facilitate the interoperability and decoupling between services and applications, and generally offer a more granular access to data. The combination of Linked Data and Web APIs has recently seen growing interest in the research community (see the survey of [Pedrinaci and Domingue, 2010], as well as the various editions of the SALAD workshop22). Approaches allowing to perform CRUD operations on Linked Data resources include the Linked Data API23 (implemented by ELDA24 and Open PHACTS [Groth et al., 2014]), The Data Tank25, Restpark26, the SPARQL Graph Store Protocol [Ogbuji, 2011] and the Linked Data Platform [Speicher et al., 2014]. However, since custom specifications and formalisms are required to use those APIs, there is no established approach in the area. This problem has been highlighted by [Daga et al., 2015], who introduce the BASIL27 system as a mediator between SPARQL endpoints and applications: SPARQL queries are stored on the BASIL middleware, which then generates the corresponding API to be consumed through canonical HTTP GET requests.

20 http://lod.openlinksw.com/
21 http://datahub.io

Data Dumps. Data dumps consist of single-file representations of a dataset (or of part of it) fetched from the Web, which have to be locally deployed in order to speed up the query and search processes. Dumps are generally stored in triple stores, which also offer query interfaces to access data through the SPARQL language. Virtuoso [Erling and Mikhailov, 2010] and Jena [Grobe, 2009] are currently the most used triple stores. However, studies on data dynamics have shown that Linked Data sources are not appropriate for long-term caching or indexing [Hausenblas et al., 2009; Umbrich et al., 2010; Auer et al., 2012; Dividino et al., 2013, 2014] since they are subject to frequent changes. [Käfer et al., 2013] showed that almost 49.1% of the Linked Data cloud changes over a period of 29 weeks, and [Gottron and Gottron, 2014] also showed that the accuracy of indices built over Linked Data sources drops by 50% after only 10 weeks. To address this problem, [Dividino et al., 2015] recently proposed and evaluated different strategies for updating Linked Data datasets, by measuring the quality of the local data cache when considering iterative updates over a longer period of time, while [Nishioka and Scherp, 2015] use K-means to classify static and dynamic entities over Linked Data, by identifying temporal patterns with substantial changes. Data dumps also require time-consuming maintenance, involving manual support and expensive infrastructures. Datasets have to be fully loaded into a triple store, even if the amount of data required by the query is actually a much smaller part. Moreover, SPARQL endpoints are hosted at different physical locations, violating the data discovery principle of Linked Data.

22 Services and Applications over Linked APIs and data – http://salad2015.linked.services/
23 https://code.google.com/p/linked-data-api/
24 http://www.epimorphics.com/web/tools/elda.html
25 http://docs.thedatatank.com/4.3/spectql and http://docs.thedatatank.com/4.3/sparql
26 http://lmatteis.github.io/restpark/
27 http://basil.kmi.open.ac.uk/

URI Dereferencing. Dereferencing URIs is a practice in which a client application can access (“look up”) information about a specific entity using its URI and obtain a copy of the corresponding published document. Those documents are called subject pages, because triples are generally organised by subject (i.e. triples where the entity is the subject), and contain references to other entities, which can in turn be dereferenced. This process is generally called Link(ed Data) Traversal, and has the main advantage that it allows the serendipitous discovery of data in a follow-your-nose fashion, i.e. without or with very little domain knowledge. Few approaches make effective use of this technique; for instance, the author of [Hartig, 2011, 2013] has focused on integrating the traversal of data links into query execution (LTBQE – Link Traversal Based Query Execution). When executing a SPARQL query, only links that are relevant (identified via heuristic strategies) and available (some servers might stop responding) are traversed. The work of [Umbrich et al., 2014] presents a Linked Data query engine which implements and extends the LTBQE framework with features for optimisation and reasoning. The traversal approach is also used to build Linked Data knowledge bases for learning systems, as in [Tiddi et al., 2014b]. Pubby28 is a Linked Data interface that transforms non-dereferenceable URIs of a SPARQL endpoint into resources supporting REST and content negotiation. Among its limitations, Pubby only supports SPARQL endpoints, therefore ontologies which are not uploaded on a server with a triple store cannot be accessed, and it is limited to those endpoints that support the DESCRIBE operator. Virtuoso also provides optional URI dereferencing facilities. Publishing subject pages does not require much memory or processing effort: servers do not have to process complicated queries, while the applications do not have to access data that is not needed. While data access is much faster, however, executing non-unidirectional queries (e.g. predicate- or object-based queries) is practically infeasible.
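A minimal sketch of URI dereferencing and follow-your-nose traversal with rdflib is given below; it assumes network access and that the server (here DBpedia) answers the content negotiation with an RDF serialisation of the subject page.

from rdflib import Graph, URIRef

# Dereference the entity URI: the server returns its RDF "subject page".
entity = URIRef("http://dbpedia.org/resource/A_Song_of_Ice_and_Fire")
g = Graph().parse(entity)  # HTTP content negotiation asks for an RDF serialisation

# Follow-your-nose: collect the linked entities that could be dereferenced next.
links = {o for _, _, o in g.triples((entity, None, None)) if isinstance(o, URIRef)}
print(len(g), "triples;", len(links), "outgoing links")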

Triple Pattern Fragments. Triple Pattern Fragments (TPFs) are a novel, low-cost interface for RDF data providers that allows moving the cost of executing SPARQL queries to the Web-client side [Verborgh et al., 2014a,b]. The general idea behind TPFs is that some complex queries can significantly reduce the server’s availability and, in order to rebalance scalability, the important “fragments” of the query can be identified and executed by the clients (for instance, the FILTER operator is executed on the client side because TPFs only support triple patterns). Moreover, to further improve adaptability, TPFs implement a non-blocking iterator strategy based on the triples’ cardinality distribution (as in [Hartig, 2011]), so that the most complex parts of the query are executed first. Within less than two years, more than

28 http://wifo5-03.informatik.uni-mannheim.de/pubby/

600,000 datasets have already been published as TPFs [Beek et al., 2014]. Although TPFs provide a significant improvement regarding server availability, the query optimisation algorithm is still at an early stage, and the performance is affected by a delayed data transfer (i.e. queries are slower than on common SPARQL endpoints). To reduce the request execution time, [Acosta and Vidal, 2015] study how TPFs can be integrated with routing techniques, while [Van Herwegen et al., 2015] use a predictor to identify the least expensive query path.
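For illustration, a single triple-pattern request against a TPF server could look as follows; the fragment URL and the subject/predicate/object parameter names are assumptions based on the common DBpedia TPF deployment, since a real client is expected to discover them from the hypermedia controls returned by the server.

import requests

# One triple-pattern request; the joins and FILTER-like post-processing of a
# full SPARQL query would be performed on the client side.
FRAGMENT = "http://fragments.dbpedia.org/2016-04/en"  # assumed deployment URL
params = {"subject": "http://dbpedia.org/resource/A_Song_of_Ice_and_Fire"}
response = requests.get(FRAGMENT, params=params, headers={"Accept": "text/turtle"})
print(response.status_code)
print(response.text[:300])  # matching triples plus cardinality metadata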

Public endpoints. Public SPARQL endpoints are the Web interfaces used to access published datasets. While data are guaranteed to be up-to-date, the execution of the queries is performed entirely by the server, which can therefore suffer from overload when the demand is too high or the requests are too complex. This limitation on server availability has been extensively documented in [Buil-Aranda et al., 2013], which showed that public endpoints were down on average more than 1.5 days per month. As of the end of 2015, relying purely on public endpoints still remains impossible [Verborgh et al., 2014a], but it is possible to monitor their availability through the SPARQLES portal29 [Vandenbussche et al., 2015]. Originally, public endpoints operated independently, acting as centralised data warehouses. This quickly became an issue in the view of a Web of Data, and SPARQL query federation was promoted with two main ideas: (i) knowledge from distributed and heterogeneous datasets can be easily accessed and (ii) data fragmentation improves availability and query performance [Özsu and Valduriez, 2011]. A federated query is a query split into sub-queries, which are in turn sent to multiple endpoints, whose results are retrieved and then merged [Umbrich et al., 2014]. To decide how to assign a sub-query to the relevant endpoint, the SERVICE operator is used [Buil-Aranda et al., 2011]. Anapsid [Acosta et al., 2011], SPLENDID [Görlitz and Staab, 2011], DARQ [Quilitz and Leser, 2008] and FedX [Schwarte et al., 2011] are among the existing federated SPARQL engines guaranteeing efficient query execution and complete results. The performance of query federation is affected by ineffective and time-consuming query processing, as well as by endpoint result limitations (public endpoints generally return up to 10,000 results per query): some optimisation strategies have been presented by [Buil-Aranda et al., 2014; Ibragimov et al., 2015; Montoya et al., 2015].
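For illustration, the sketch below sends a simple query to the public DBpedia endpoint with the SPARQLWrapper Python library; as discussed above, the availability of the endpoint (and hence of the query result) is not guaranteed.

from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://dbpedia.org/sparql")  # public endpoint
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    PREFIX dbo: <http://dbpedia.org/ontology/>
    SELECT ?abstract WHERE {
        <http://dbpedia.org/resource/A_Song_of_Ice_and_Fire> dbo:abstract ?abstract .
        FILTER (lang(?abstract) = "en")
    } LIMIT 1
""")
results = sparql.query().convert()
for row in results["results"]["bindings"]:
    print(row["abstract"]["value"][:200])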

2.4 Towards Knowledge Discovery from the Web of Data

Graphs have been attracting attention in Knowledge Discovery for a long time. Their expressive power, as well as the complexity of data representation, access and processing, are only some of the challenges Knowledge Discovery approaches have to face when working on graph data [Levinson, 1985; Thompson and Langley, 1991; Conklin and Glasgow, 2014; Segen, 2014]. In this section, we are interested in studying whether graph techniques can be applied to discover knowledge from the Web of Data. We first analyse general techniques for managing and mining graphs, and then present specific techniques that have been applied in the context of the Web of Data – namely, techniques for mining the graph of Linked Data.

29 http://sparqles.ai.wu.ac.at/

2.4.1 Managing Graphs

For those domains where datasets are naturally modelled as networks of relationships (large-scale social networks, bioinformatics, the Semantic Web and so on), the need for efficiently storing data and executing queries is fundamental. Graph Management is a cross-disciplinary field that deals with providing (or improving) efficient graph databases. An extensive survey of the field is presented by [Angles and Gutierrez, 2008]: the authors first compare graph database models against other standards (relational, semi-structured, semantic and object-oriented models); they then define a set of important characteristics (data structures, query languages and integrity constraints), finally using them to evaluate the existing models. The work of [Dominguez-Sal et al., 2010] instead compares some of the most popular graph database implementations: (i) Neo4j30, a graph database that does not rely only on a relational layout of the data, but also on a network model storage (to store nodes, relationships and attributes); (ii) HypergraphDB [Iordanov, 2010], which stores graphs and hypergraph structures (abstract graphs, in which the edges are substituted by hyperedges connecting an arbitrary set of nodes); (iii) the already mentioned Jena [Grobe, 2009], which handles RDF graphs (note that, while other implementations exist, such as Sesame31 or AllegroGraph32, Jena currently remains the most used); and (iv) DEX [Martínez-Bazan et al., 2007], an efficient graph database implementation based on bitmap representations of the entities. Other works in Graph Management have focused on graph reachability queries based on: (i) executing queries to discover paths between nodes in large directed graphs [Agrawal et al., 1989; Jagadish, 1990; Cohen et al., 2003; Wang et al., 2006; Trißl and Leser, 2007; Chen et al., 2009]; (ii) graph matching [Ullmann, 1976; Larrosa and Valiente, 2002; Conte et al., 2004; Cordella et al., 2004]; and (iii) keyword search [Bhalotia et al., 2002; Kacholia et al., 2005; He et al., 2007].

30 http://dist.neo4j.org/neo-technology-introduction.pdf
31 http://www.openrdf.org/
32 http://franz.com/agraph/

From a Web of Data perspective, graph databases are powerful tools to store data and query them quickly. They provide a graph representation of the data (while RDF engines focus more on lists of triples), and are also optimised for graph traversal algorithms. However, graph databases lack an established standardised language, which makes the development of domain-agnostic applications more expensive; they do not provide any inferential support on the data, and they do not exploit the properties of RDF, which might result in inefficient pattern matching queries on RDF-based graphs.

2.4.2 Mining Graphs

Efficient algorithms have been proposed not only to store and query graph structures, but also to mine them to extract useful and hidden information, such as patterns, trends, classes and clusters. Graph mining focuses on searching for frequent subgraphs in either a single graph [Kuramochi and Karypis, 2001, 2005; Fiedler and Borgelt, 2007; Bringmann and Nijssen, 2008] or a set of graphs [Inokuchi et al., 2000; Kuramochi and Karypis, 2001; Vanetik et al., 2002]. Approaches are divided into five main categories, described below.

• Heuristic searches [Holder et al., 1994; Yoshida et al., 1994; Jonyer et al., 2002; Ketkar et al., 2005; Basse et al., 2010]. These approaches use canonical BFS, DFS or best-first strategies for direct matching (i.e. they do not simply compute the similarity of two graphs), and therefore tend to incur complexity issues.

• Inductive Logic Programming approaches [Dehaspe and Toivonen, 1999; Nijssen and Kok, 2001], based on Inductive Logic Programming, a field at the intersection between Machine Learning and Logic Programming [Muggleton et al., 1992; Lavrac and Dzeroski, 1994]. These approaches use a set of positive and negative subgraph examples to generate hypotheses justified upon some background knowledge extracted from the graph. Due to the large search space, they tend to have computational issues.

• Inductive database approaches [Imielinski and Mannila, 1996; De Raedt and Kramer, 2001]. These approaches first generate the possible subgraphs and relations and then store them in an inductive database that can be efficiently and interactively queried a posteriori. While detecting basic patterns is fast, these approaches fail on more complex operations because they require a large amount of memory. Moreover, in many cases the use of custom languages is necessary.

• Mathematics-based approaches [Nijssen and Kok, 2004; Maduko et al., 2008], mostly divided into A-priori-based methods [Inokuchi et al., 2000; Kuramochi and Karypis, 2001; Inokuchi et al., 2003] and pattern growth methods [Yan and Han, 2002; Huan et al., 2003; Yan et al., 2004]. These approaches exploit the mathematical structure of the graph to execute a complete search on it.

• Kernel function-based approaches [Kashima and Inokuchi, 2002; Kondor and Lafferty, 2002; Gärtner, 2003]. These approaches focus on using kernel functions to identify a similarity between two graphs.

Other more general approaches have focused on developing algorithms for the clustering and classification of nodes, labels or subgraphs (see the surveys of [Flake et al., 2004; Schaeffer, 2007; Aggarwal and Wang, 2010]), on solving the graph isomorphism problem (the NAUTY algorithm [McKay, 2007] is known to be the fastest algorithm for it), or on defining mining measures to apply on graphs (e.g. support, information entropy, information gain, gini-index or minimum description length) [Agrawal et al., 1994; Yoshida et al., 1994; Yan and Han, 2002]. These works rely on graph-specific characteristics, such as pattern matching, traversal and reachability, and can be useful to efficiently reveal information in a Web of Data context. However, they are known for being computationally expensive, and the risk is that they might be even less efficient when dealing with the large amount of information of the Web of Data.

2.4.3 Mining the Web of Data

The Web of Data gives Knowledge Discovery the possibility to improve its activities using the semantics embedded in its ontologies and datasets. Moreover, in the eyes of Knowledge Discovery experts, the Web of Data provides an opportunity to exploit an accessible, reliable and long-term data storage (and management) that can offer more flexibility with respect to their traditional, generally application-specific stores (e.g. MATLAB files or relational databases).

Knowledge Discovery frameworks. An important area of research comprises Knowledge Discovery frameworks that integrate semantic knowledge. We divide them according to the three steps of the KD process. In data pre-processing, semantic structures are exploited to improve the quality of the generated patterns. Approaches for data cleaning have relied on taxonomies [Srikant and Agrawal, 1995; Pohle, 2003; Tseng et al., 2005], full ontologies [Liu et al., 2004; Brisson et al., 2005; Escovar et al., 2006; Brisson and Collard, 2008; Józefowska et al., 2010; Nebot and Berlanga, 2012] and, more recently, Linked Data [Garriga et al., 2008; Zheng et al., 2009; Khan et al., 2010; Wang et al., 2011; Abedjan and Naumann, 2013]. These works use external semantic knowledge to improve their results, but have a different objective (processing data to be mined during the data mining step) and motivation, namely that better results in a KD process can be obtained if the semantics is used to refine the input data.

Data mining applications based on Linked Data cross-domain knowledge and heterogeneous sources have been largely studied and recently surveyed in [Ristoski and Paulheim, 2014]: most of the approaches consist of extracting RDF data in a supervised way (using SPARQL and federated queries) [Fanizzi et al., 2012; d’Amato et al., 2012; d’Aquin and Jay, 2013; Mencía et al., 2013; Ristoski, 2015] or in an unsupervised way (with a search limited to a fixed depth or neighbourhood) [Grimnes et al., 2008; Cheng et al., 2011; Paulheim and Fürnkranz, 2012; Zhang et al., 2012; Jones et al., 2014], and then transforming such data into feature vectors to which state-of-the-art machine learning algorithms can be directly applied. As in the previous case, since our intention is to avoid supervision and to reduce the constraints on exploiting the Web of Data, these approaches cannot be considered. Semantics can also help the refinement of the generated patterns, as a support for the analysis of results. This is the idea brought forward by works in the post-processing step, as well as by our work. We can distinguish here works of different types:

(1) Works that have been used as a support for the interpretation, enrichment and exploration of patterns. While these share the same goal as our research, they fail to automatically select the correct external knowledge, e.g. they either use hand-crafted domain ontologies [Stankovic et al., 2010; Russ et al., 2011; d’Aquin et al., 2012] or manually selected Linked Data [Mulwad et al., 2010; Zapilko et al., 2011; Paulheim, 2012; d’Aquin and Jay, 2013].

(2) Works to prune association rules or frequent itemsets [Marinica and Guillet, 2010; Anusha and Reddy, 2012; Nandhini et al., 2012; Ramesh et al., 2013]. In this case, the external knowledge is also manually selected. Moreover, these works are targeted at a specific audience, namely experts in biology or medical computer science. This simplifies the task of selecting the external knowledge, and also reduces the risks of dealing with inconsistent information in the large knowledge bases.

(3) Works supporting experts in data analytics. While [Brisson et al., 2005; Trajkovski et al., 2008] use ad-hoc domain knowledge, a small portion of pre-defined Linked Data sources is used by [Novak et al., 2009; Parent et al., 2013]. While our idea is to target a domain-agnostic audience, these works target a more domain-aware audience.

Machine Learning for the Web of Data. A second area has focused on integrating Machine Learning techniques with the Web of Data. For a long time, the Semantic Web looked at Machine Learning only as a tool to enrich or extend ontologies at the schema level (ontology construction, learning, matching and evolution [Maedche and Staab, 2004; Bloehdorn et al., 2006; Grobelnik and Mladenić, 2006; Euzenat et al., 2007]). For Machine Learning experts, however, the Semantic Web is a constant new challenge due to its size, the open-world assumption, and the noisy/incomplete data herein represented. One of the most active areas of research is Statistical Relational Learning (SRL), which uses inductive reasoning and logical inference applied to Semantic Web data to assign individuals to classes, predict features of instances, predict links, and cluster instances. Similarity-based methods [d’Amato et al., 2006; Janowicz, 2006; Janowicz et al., 2007; d’Amato et al., 2008], multi-layered networks, kernel machines [Fanizzi and d’Amato, 2006; Bloehdorn and Sure, 2007; Fanizzi et al., 2008; Bicer et al., 2011; Lösch et al., 2012], multivariate prediction models [Thor et al., 2011; Krompaß et al., 2015] and graphical models [Getoor, 2007] have been extensively studied in this context (see the survey of [Rettinger et al., 2009]). Another promising area is the recently-emerged Link Mining, which puts a strong emphasis on links as a medium to build predictive or descriptive models of Linked Data [Getoor and Diehl, 2005] for object ranking, group detection, classification, link prediction and sub-graph discovery. Graph analysis techniques have been exploited in the Semantic Web to study how the contents embedded in RDF graphs are interlinked. This is the focus of Link Structure Analysis, which aims at analysing the topology of RDF graphs, i.e. degree distribution, connectivity, link types or graph diameter. Approaches have carried out analyses at the ontology [Gil et al., 2004; Hoser et al., 2006; Theoharis et al., 2008; Zhang, 2008] or instance level [Hausenblas et al., 2008; Ge et al., 2010; Guéret et al., 2010; Joslyn et al., 2010]. These works agree with our vision that the Web of Data can be exploited because of its powerful characteristics, e.g. its openly accessible graph structure, the links between cross-domain knowledge, the support for inference or the semantic relationships between data. Although they present interesting solutions for the study and exploration of the graph of the Web of Data, they fail in two main aspects: first, the expert is still required to select the relevant information from the Web of Data, as well as to validate the obtained results; second, these are techniques meant to improve the Web of Data, rather than to improve the discovery of knowledge.

Navigation of the Web of Data. A third area of investigation is more graph-oriented, and sees in the Web of Data an opportunity to develop strategies that can deal with distributed, semantically meaningful and cross-domain graphs. Studies in the area of Query Languages for Graph Databases [Wood, 2012] have focused on developing or improving languages to navigate graphs by taking into account the semantic relations expressed by the paths [Abiteboul et al., 1997; Buneman et al., 2000; Kasneci et al., 2008; Harris et al., 2013]. The SPARQL language and its queries are also investigated in the Semantic Web community: SPARQL queries have been extended to the Web with basic graph patterns and operators [Bouquet et al., 2009; Harth and Speiser, 2012; Hartig, 2012; Umbrich et al., 2014], and languages for navigation and querying across distributed datasets have been proposed by [Fionda et al., 2012; Hartig and Pirrò, 2015]. Although distributed navigation is one of the common features these works have with ours, their main focus is purely the improvement of the efficiency of the SPARQL language. The idea of abstracting users from the representation of the data and allowing them to agnostically discover knowledge in a large number of Linked Data sources was also carried forward by approaches using Spreading Activation, to improve user queries [Freitas et al., 2011a,b], exploratory searches [Marie et al., 2013a,b] or Web-based reasoning [Akim et al., 2011]. The same works also present the drawbacks of using Spreading Activation at a scale as large as the Web of Data. Of particular interest, in this case, is the idea of using on-the-fly graph exploration, e.g. through dereferencing, without introducing any a priori knowledge. A body of work has also focused on applying pathfinding algorithms to distributed semantic graphs using information about their topology, following the approach pioneered by [Eliassi-Rad and Chow, 2005] on domain ontologies: [De Vocht et al., 2013] use A* on Linked Data, a limited DFS is used by [Schuhmacher and Ponzetto, 2014] and random walks are used by [Moore et al., 2012]. We consider the usage of graph algorithms a valid opportunity; however, these approaches introduce some a priori knowledge about the sources to be used, namely a small, pre-selected portion of Linked Data.

2.5 Summary and Discussion

Although there is no common agreement on the definition of an explanation, when reviewing the disciplines in Cognitive Science we showed that they have a similar way of structuring explanations. We gave an overview of such structures, including the elements, the interactions and the characteristics that explanations have according to each of the disciplines, and presented a general model that could unify those views.

Motivated by the idea of using the Web of Data as a source of knowledge, we have reviewed the most recent methods to store and access data from it. We have seen how most of the approaches rely on data indexing and introduce a priori knowledge to select the desired data sources, while very few use link traversal, a less computationally expensive technique which allows the serendipitous discovery of knowledge. In order to explore the Web of Data automatically, we reviewed Graph Theory techniques that offer scalable ways to manage and access graph-structured data. However, we have seen how most of these approaches are highly sensitive to scalability issues, which can be relieved by using the right heuristics to access only relevant portions of the data. Finally, we presented approaches where the knowledge from the Web of Data was used as a support for Knowledge Discovery. We showed how the knowledge from the Web of Data has been successfully used to improve results in both data pre-processing and data mining, while in data post-processing the expert’s knowledge is still required to interpret results. On the other hand, ontological knowledge has been used to induce hypotheses in Inductive Logic Programming-based frameworks, which shows potential for adapting such a technique to our scenario. Based on this, we can conclude that a successful approach to automatically generate explanations from the Web of Data should possess the following properties:

(1) it has to generate meaningful explanations, where meaningful means that they have to include the same components as the Cognitive Science model presented in Section 2.1;

(2) it has to explore the Web of Data on-the-fly, so that the computational costs are limited and only the necessary background knowledge is introduced;

(3) it has to induce explanations automatically from the Web of Data, without any assistance from a domain expert.

In our thesis, we will approach these constraints in the following way:

(1) complete explanations can be obtained by designing a process in which its sub-processes aim at finding each of the components of the Explanation Ontology;

(2) we can combine link traversal and graph searches to avoid the retrieval of unnecessary information from the Web of Data;

(3) we can apply the idea of Inductive Logic Programming, which induces candidate hypotheses to justify observations based on some background knowledge, to automatically generate candidate hypotheses and explain a specific pattern using background knowledge that is automatically extracted from the Web of Data.

Let us step back to the scenario of Section 2.1: we aim to automatically explain why “A Song of Ice and Fire” becomes popular on the web with a very specific regularity. In terms of the Explanation Ontology, such an observation constitutes the explanandum N. To obtain a complete explanation, our process needs to automatically identify the anterior event M, the context F and the theory. The first step to be achieved is the generation of candidate explanantia for the observation, i.e. some plausible events that have caused the fact that some dates belong to the pattern of “popular dates”. The Web of Data, which provides the background knowledge about the dates, can be accessed using link traversal combined with a heuristic search, with the aim of collecting information about events that have happened during those dates. As in Inductive Logic Programming, we can use induction to generate candidate explanantia M learnt from the set of positive and negative examples (popular and non-popular dates) and the background knowledge built from the Web of Data. For example, both “people search for A Song of Ice And Fire when a new Game of Thrones season is released” and “people search for A Song of Ice And Fire when an episode of the TV comedy series Veep is aired” are plausible, since they are both correlated to “A Song of Ice and Fire” according to Linked Data information. The challenges to be faced are the identification of a heuristic function for the traversal, as well as the assessment of the validity of the generated candidates. We address them in Chapter 4 and Chapter 5. If no context F is specified, there is then no way to distinguish whether the induced fact M and the explanandum N are co-occurring by coincidence (as in the case of “A Song of Ice and Fire” and the “Veep” TV series). The second step of the process is therefore to identify the context F that relates the anterior and posterior events, in order to remove implausible explanations. This can be achieved by using an uninformed search across Linked Data that identifies the contextual relationship between M and N, but it requires us to study the best way to assess such relationships in the graph of Linked Data. This is achieved in Chapter 6. Finally, for an explanation to be complete, the process has to identify the theory, i.e. the assumption behind the creation of the pattern. Ontology-level searches can be used to partly identify the theory (e.g. “A Song of Ice and Fire” is a novel, “Game of Thrones” is a TV series, and TV series can be based on novels); however, building a full theory requires knowledge from different domains, which might not be represented in the Web of Data (for instance, “people search over the Web for what they are interested in”). We study this aspect in Chapter 8. In the second part of this thesis we will present the challenges and the solutions we proposed for the first two processes, which we have implemented in Dedalo, a framework to automatically generate pattern explanations from the Web of Data. As for the last process, we will discuss its feasibility in the third part of the work.

Part II

Looking for Pattern Explanations in the Web of Data

Chapter 3

Generating Explanations through Manually Extracted Background Knowledge

This chapter presents our preliminary work, in which we focused on answering our third research question (Section 1.3.3), i.e. how to automatically generate explanations with background knowledge from the Web of Data. Section 3.1 shows how we identified Inductive Logic Programming as a valid framework to generate pattern explanations automatically; Section 3.2 presents the foundations of Inductive Logic Programming; Section 3.3 describes how we shaped our problem as an Inductive Logic Programming one, and Section 3.4 presents some preliminary results. Section 3.5 discusses the limitations of the approach and the next steps to undertake.

3.1 Introduction

The goal of our research is the creation of a system that automatically generates explanations for a given pattern using the background knowledge from the Web of Data. With such information we could, for instance, automatically assess that the popularity of the term “A Song of Ice and Fire” increases when a new “Game of Thrones” season is released. In this scenario, one of the first required steps is to find a process that automatically generates explanations. In the previous chapter, we have seen how frameworks based on inductive reasoning have been successfully applied to generate hypotheses based on some ontological knowledge. Our idea is therefore to study their feasibility in the context of the Web of Data and, more specifically, we explore the possibility of using the Inductive Logic Programming framework (ILP) [Muggleton et al., 1992; Muggleton and De Raedt, 1994] to automatically generate explanations from Linked Data. In Machine Learning, Inductive Learning systems start from some positive and negative evidence to learn general rules, also called “induced theories” [Lavrac and Dzeroski, 1994]. Emerging from both Machine Learning and Logic Programming [Lloyd, 2012], the field of Inductive Logic Programming introduced the idea that the learning process could be improved by adding some prior knowledge about the body of evidence. This additional feature has been widely recognised as one of the strongest points of ILP when compared to other forms of Inductive Learning [Lisi and Esposito, 2009]. Nevertheless, it has been highlighted that ILP frameworks fail in organising the background knowledge in a well-formed conceptual model. For this reason, fields such as Onto-Relational Learning and Statistical Relational Learning proposed to introduce ontologies in the background knowledge because, on the contrary, those were a more natural means to convey conceptual knowledge [Chandrasekaran et al., 1999]. Based on this idea, we study the integration of cross-disciplinary knowledge from the Web of Data in the background knowledge of an ILP framework in order to generate explanations for patterns. The aim of this chapter is to demonstrate that Inductive Logic Programming is a good candidate for automatically generating pattern explanations, and that Linked Data provide valid background knowledge for this purpose. To this end, we start from some data (the evidence) organised into patterns, and then use ILP to induce theories that explain why certain data points belong to a specific pattern. As already said, the background knowledge upon which the theories are built consists of information found in Linked Data. Through this process, we can show that meaningful explanations can be automatically obtained based on the knowledge provided by the Web of Data.

3.2 The Inductive Logic Programming Framework

Inductive Logic Programming is placed at the intersection between Machine Learning and Logic Programming. From Logic Programming, ILP inherited the knowledge representation framework, namely the use of first-order logic clauses to represent data. From Machine Learning, it inherited the learning mechanism, i.e. the derivation of rules based on positive and negative examples.

3.2.1 General Setting

The objective of ILP is to construct first-order logic clausal theories, called hypotheses, which are derived by reasoning upon a set of negative and positive examples, plus some additional background knowledge about them. The process is typically carried out using a search in a space of possible hypotheses. More precisely, the task of ILP is defined as:

• Given $\mathcal{E} = \mathcal{E}^+ \cup \mathcal{E}^-$, a set of training examples represented as ground facts, divided into positive examples $\mathcal{E}^+$ and negative examples $\mathcal{E}^-$;

• Given $\mathcal{B}$, some background knowledge about the examples $e \in \mathcal{E}$;

• Find a hypothesis $\mathcal{H}$, so that $\mathcal{H}$ is complete and consistent with respect to the background knowledge $\mathcal{B}$ and the examples in $\mathcal{E}$.

The sets $\mathcal{E}$, $\mathcal{B}$ and $\mathcal{H}$ are logic programs, i.e. they are sets of clauses with an atom as head and a set of atoms as body, in the form $h \leftarrow b_1, b_2, \ldots, b_i$. Also, the two sets of positive and negative examples usually contain only ground clauses (clauses with empty bodies). To check the completeness and consistency requirements, ILP uses a coverage function, which returns true if the example $e$ is satisfied by $\mathcal{H}$ with respect to $\mathcal{B}$. We note:

$covers(\mathcal{B}, \mathcal{H}, e) = true \;\text{ iff }\; \mathcal{H} \cup \mathcal{B} \models e$

meaning that $e$ is a logical consequence of $\mathcal{H} \cup \mathcal{B}$. Consequently, we say that:

• Completeness with respect to $\mathcal{B}$ is guaranteed when a hypothesis $\mathcal{H}$ covers all the positive examples $\mathcal{E}^+ \subseteq \mathcal{E}$, so that $covers(\mathcal{B}, \mathcal{H}, e) = true$, $\forall e \in \mathcal{E}^+$;

• Consistency with respect to $\mathcal{B}$ is guaranteed when a hypothesis $\mathcal{H}$ covers none of the negative examples, so that $\neg covers(\mathcal{B}, \mathcal{H}, e)$, $\forall e \in \mathcal{E}^-$.

These criteria then require that $\mathcal{H}$ and $\mathcal{E}^+$ agree on the examples that $\mathcal{H}$ covers: in other words, a hypothesis $\mathcal{H}$ acts as a classifier of examples that are tested against the oracle $\mathcal{E}^+$ (see Figure 3.1). Criteria to check the validity of a classifier are generally classification accuracy, transparency, statistical significance and information content [Lavrac and Dzeroski, 1994]. Of course, not all learning tasks can produce complete or consistent hypotheses: this means that ILP systems need to include a noise-handling mechanism that prevents overfitting by dealing with imperfect data, such as noisy training examples or missing, sparse or inexact values in the background knowledge.

Figure 3.1 Accuracy of a hypothesis based on the completeness and consistency criteria: (a) $\mathcal{H}$ complete, consistent; (b) $\mathcal{H}$ complete, inconsistent; (c) $\mathcal{H}$ incomplete, consistent; (d) $\mathcal{H}$ incomplete, inconsistent.
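To make the coverage-based criteria concrete, the minimal Python sketch below (an illustration added here, not part of the original chapter) checks completeness and consistency for a hypothesis that is modelled, for illustration only, as a boolean function with access to the background knowledge; the names covers, is_complete and is_consistent are hypothetical.

def covers(hypothesis, example):
    # True iff the example is satisfied by H with respect to B (here, H is
    # modelled as a boolean function that already encodes B).
    return hypothesis(example)

def is_complete(hypothesis, positives):
    # H must cover every positive example in E+.
    return all(covers(hypothesis, e) for e in positives)

def is_consistent(hypothesis, negatives):
    # H must cover none of the negative examples in E-.
    return not any(covers(hypothesis, e) for e in negatives)

# Toy usage with the "A Song of Ice and Fire" example of Section 3.2.3:
aired = {("2013-06-09", "GoT"), ("2014-06-15", "GoT"),
         ("2014-06-15", "HIMYM"), ("2014-03-31", "HIMYM")}
h = lambda date: (date, "GoT") in aired          # "a GoT season finale was aired"
print(is_complete(h, ["2013-06-09", "2014-06-15"]))    # True
print(is_consistent(h, ["2013-05-06", "2014-03-31"]))  # True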

3.2.2 Generic Technique

In Inductive Logic Programming, as well as in Inductive Learning, induction (or generalisation) is seen as a search problem, where a hypothesis has to be found in a partially ordered space of hypotheses [Mitchell, 1982]. This process requires three steps:

(i) Structuring. In a first instance, an ILP algorithm constructs an ordered lattice of all the possible hypotheses, ordered from the most general (the ones that cover more training examples) to the most specific (the ones that cover one training example only).

(ii) Searching. The ordered space is then searched using some refinement operators, which are functions computing the generalisation or specialisation of a clause, according to whether the search is performed in a bottom-up or top-down way, respectively.

(iii) Bounding. Finally, in order to reduce the computational complexity, some bias (e.g. in heuristically directing the search or in the language expressing the hypotheses) is defined to constrain and reduce the search in the space.

The generic ILP algorithm works as follows: candidate hypotheses are kept in a queue; hypotheses are then repeatedly removed from the queue and expanded using the refinement operators; finally, if they are valid according to the declared bias, the newly expanded hypotheses are added to the queue. This process continues until a stopping criterion is satisfied.
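The sketch below (ours, in Python) illustrates this generic loop; refine, satisfies_bias and stop_criterion are hypothetical placeholders for the refinement operators, the declared bias and the stopping condition, which depend on the concrete ILP system.

from collections import deque

def generic_ilp_search(initial_hypotheses, refine, satisfies_bias, stop_criterion):
    # Keep candidate hypotheses in a queue; repeatedly remove one, expand it with
    # the refinement operators, and re-queue the refinements allowed by the bias.
    queue = deque(initial_hypotheses)
    kept = []
    while queue and not stop_criterion(kept):
        hypothesis = queue.popleft()
        kept.append(hypothesis)
        for refined in refine(hypothesis):
            if satisfies_bias(refined):
                queue.append(refined)
    return kept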

3.2.3 A Practical Example

Reusing the example introduced earlier in this thesis, let us imagine that we want to automatically learn why during some dates (considered as the positive examples) people look for “A Song of Ice and Fire”, while on some other dates (the negative examples) people do not. We note the concept to be learnt as isPopular(X), with X being a date. Suppose we have some background knowledge about this problem, e.g. which TV series episodes have been aired on those dates, as in Table 3.1.

Table 3.1 Background knowledge $\mathcal{B}$. GoT and HIMYM are respectively the “Game of Thrones” and “How I Met Your Mother” TV series.

$\mathcal{B}$:
happenedOn(‘2013-06-09’,‘GoT-s03e10’)      happenedOn(‘2014-06-15’,‘GoT-s04e10’)
happenedOn(‘2014-06-15’,‘HIMYM-s08e23’)    happenedOn(‘2014-03-31’,‘HIMYM-s09e24’)
TVseries(‘GoT-s03e10’,‘GoT’)               TVseries(‘GoT-s04e10’,‘GoT’)
TVseries(‘HIMYM-s08e23’,‘HIMYM’)           TVseries(‘HIMYM-s09e24’,‘HIMYM’)
seasonFinale(‘GoT-s03e10’)                 seasonFinale(‘GoT-s04e10’)
seasonFinale(‘HIMYM-s08e23’)               seasonFinale(‘HIMYM-s09e24’)

Suppose also that we are given some positive and negative examples, i.e. dates in which “A Song of Ice and Fire” was popular or not, as in Table 3.2.

Table 3.2 Examples for isPopular(X).

$\mathcal{E}^+$                            $\mathcal{E}^-$
isPopular(‘2013-06-09’)                    isPopular(‘2013-05-06’)
isPopular(‘2014-06-15’)                    isPopular(‘2014-03-31’)

Believing $\mathcal{B}$, we can induce that “A Song of Ice and Fire” is popular on those dates in which a season finale of the TV series “Game of Thrones” has been aired. Therefore:

isPopular(X) ← happenedOn(X,Y) ∧ TVseries(Y,‘GoT’) ∧ seasonFinale(Y)

Note that $\mathcal{H}$ is complete with respect to $\mathcal{B}$, because all the examples in $\mathcal{E}^+$ are season finales and are part of the TV series “Game of Thrones"; and it is also consistent, because none of the negative examples is satisfied by $\mathcal{H}$.

3.3 The ILP Approach to Generate Explanations

Having presented the ILP framework, we now formulate our problem as an ILP program in the following way.

Given:

• a dataset $R = \{r_0, \ldots, r_m\}$, corresponding to the totality of training examples $\mathcal{E}$;

• a set of mutually disjoint clusters (patterns) $C = \{C_0, C_1, \ldots, C_n\}$, so that $C_i \cap C_j = \emptyset$, which are extracted from $R$ (i.e. $C_i \subseteq R$);

• a background knowledge $\mathcal{B}$ of ground clauses, extracted from Linked Data, that gives information about each item $r_i \in R$;

Find:

• for a chosen cluster $C_i$, some hypotheses $\mathcal{H}$ that explain, according to $\mathcal{B}$, why items $r_j$ belong to $C_i$ and not to the rest of $R$. Note that $C_i = \mathcal{E}^+$ and $(R \setminus C_i) = \mathcal{E}^-$. In this manner, $\mathcal{H}$ is in the form of

in_cluster(r_j) ← b_1 ∧ b_2 ∧ … ∧ b_m

where $r_j$ is an item of $C_i$. This, of course, assumes that the background knowledge $\mathcal{B}$ contains enough information about the items in $R$, so as to construct $\mathcal{H}$. To make sure of that, in a first stage the Linked Data knowledge that populates $\mathcal{B}$ is manually extracted.

Our process is then articulated as follows.

Initial Dataset. A set of items partitioned into clusters is provided (or created). Following the example of explaining a trend’s popularity, a dataset might be a group of dates such as $R$ = {‘2013-05-06’, ‘2013-06-09’, ‘2014-03-31’, ‘2014-06-15’}, partitioned into dates in which the web search rate for “A Song of Ice and Fire” was high, i.e. $\mathcal{E}^+$ = {‘2013-06-09’, ‘2014-06-15’}, and dates in which the rate was low, i.e. $\mathcal{E}^-$ = {‘2013-05-06’, ‘2014-03-31’}.

Background Knowledge Population. For each of the items in R, some triples involving them are extracted from some datasets in Linked Data and then encoded as first-order clauses, as in Table 3.3. RDF is-a relationships are encoded as clauses of arity 1, while instance relations are encoded as clauses of arity 2.

Table 3.3 $\mathcal{B}$ built based on Linked Data.

clusters             in_cluster(‘2013-06-09’)
RDF predicates       happenedOn(‘2013-06-09’,‘GoT-s03e10’)
RDF is-a relations   seasonFinale(‘GoT-s04e10’)

Hypothesis Generation. The ILP framework can then reason upon the examples and the background knowledge, so as to generate hypotheses in the form of

[+cov,-cov] in_cluster(X) ← happenedOn(X,Y) ∧ TVseries(Y,‘GoT’) ∧ seasonFinale(Y)

which is interpreted as: “items X are part of the cluster because they are dates in which an episode Y happened, and this episode is part of the TV series Game of Thrones, but is also a season finale”. +cov is the number of positive examples covered by the generated hypothesis $\mathcal{H}$, while -cov is the number of negative examples covered by $\mathcal{H}$.

Hypothesis Evaluation. The hypothesis evaluation is performed in this preliminary study using two common rule interestingness measures, the F-Measure and the Weighted Relative Accuracy (see definitions below). While the F-Measure is the typical Information Retrieval measure for accuracy, WRacc puts more emphasis on unusualness, i.e. the relative accuracy of $\mathcal{H}$, which can be beneficial to obtain valid hypotheses for patterns of small size. First, Precision and Recall are defined as:

$P = \frac{|\mathcal{H} \cap \mathcal{E}^+|}{|\mathcal{H}|}$ (3.1)        $R = \frac{|\mathcal{H} \cap \mathcal{E}^+|}{|\mathcal{E}^+|}$ (3.2)

where $|\mathcal{H}|$ is the total coverage of $\mathcal{H}$ and $|\mathcal{E}^+|$ is the total number of positive examples (the size of the cluster to be explained). The F-Measure is well known to be the harmonic mean of P and R:

$F = 2 \times \frac{P \times R}{P + R}$ (3.3)

where F is in the range [0, 1].

WRacc is a rule evaluation measure proposed by [Lavrač et al., 1999] as a trade-off between the relative accuracy of a rule, i.e. how many examples are well predicted compared to the ones that should be predicted, and its generality, i.e. how frequent the rule is, which is used as a “weight”. WRacc is in the range [-1, 1]. In our scenario, WRacc is defined as:

$WR_{acc} = \frac{|\mathcal{H}|}{|\mathcal{E}|} \left( \frac{|\mathcal{H} \cap \mathcal{E}^+|}{|\mathcal{H}|} - \frac{|\mathcal{E}^+|}{|\mathcal{E}|} \right)$ (3.4)

where $|\mathcal{E}|$ is the number of training examples (both positive and negative), and $|\mathcal{H}|$ and $|\mathcal{E}^+|$ are as defined above. The generated hypotheses are then ranked according to the chosen score, so that the ones with the highest scores can be considered the most representative for the cluster.
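As an illustration (ours, not part of the original text), Equations 3.1–3.4 can be computed directly from the coverage counts; the sketch assumes a hypothesis is given as the set of examples it covers.

def evaluate_hypothesis(covered, positives, negatives):
    # covered: set of examples covered by H; positives/negatives: E+ and E-.
    pos_covered = len(covered & positives)
    total = len(positives) + len(negatives)
    precision = pos_covered / len(covered) if covered else 0.0        # Eq. 3.1
    recall = pos_covered / len(positives) if positives else 0.0       # Eq. 3.2
    f_measure = (2 * precision * recall / (precision + recall)        # Eq. 3.3
                 if precision + recall else 0.0)
    wracc = (len(covered) / total) * (precision - len(positives) / total)  # Eq. 3.4
    return precision, recall, f_measure, wracc

# Toy usage: a hypothesis covering 3 items, 2 of which are in the cluster.
print(evaluate_hypothesis({"a", "b", "c"}, {"a", "b"}, {"d", "e", "f"}))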

3.4 Experiments

To test the approach described above, we considered two different case studies based on the #Red and the #KMi.H datasets. We recall that in the first case we have people grouped according to what they have read, and in the second case we have books grouped according to the course attended by the students that have borrowed the book. An overview of the clusters that we are trying to explain is in Table 3.4. For other information about the datasets, e.g. how they were obtained, we invite the reader to refer to Appendix A.1.

Table 3.4 Cluster details for #Red and #KMi.H.

#Red clusters (number of people and what they read)
26   Romanticism Walter Scott
15   18th century women writers
110  Jane Austen
68   Victorian novelists
4    Tasso epic poetry
16   Marx philosophy
38   Samuel Johnson
35   Shakespeare
17   Milton Christian writers
39   Scottish existentialism

#KMi.H clusters (number of borrowed books, and what they are about)
171  Information Systems
374  Computing
319  Mechanics
405  Mechanical Engineering
101  Computer games programming
99   Music Technology
282  Control systems
404  Electrical Engineering

As said in the introduction, the objective of this evaluation is to show that we can obtain meaningful pattern explanations with Inductive Logic Programming based on background knowledge from Linked Data. To do that, we study whether explanations improve in F-Measure or WRacc score when adding more Linked Data information to $\mathcal{B}$. The process consisted in building the training examples, building the background knowledge using Linked Data and, finally, generating the hypotheses.

3.4.1 Building the Training Examples

Given a dataset of items partitioned into mutually disjoint clusters, in this first step we built Prolog-like clauses such as in_cluster(X) for each item X in the dataset.

(a) case #Red:                       (b) case #KMi.H:
in_cluster(lord_byron).              in_cluster(bookXXX).
in_cluster(lady_byron).              in_cluster(bookYYY).
in_cluster(james_clunie).            in_cluster(bookZZZ).

As our final objective was to generate hypotheses $\mathcal{H}$ for each cluster (where each $\mathcal{H}_i$ was a plausible reason for the elements belonging to that cluster), the training examples were divided into $\mathcal{E}^+$ and $\mathcal{E}^-$ in a one-vs-all manner. For instance, we proceeded so that (i) $\mathcal{E}^+$ = (cluster-Austen) vs. $\mathcal{E}^-$ = (cluster-Milton ∨ cluster-Marx ∨ ...), then (ii) $\mathcal{E}^+$ = (cluster-Marx) vs. $\mathcal{E}^-$ = (cluster-Milton ∨ cluster-Austen ∨ ...), and so on.
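A minimal sketch (ours) of this one-vs-all construction, assuming the clusters are given as a dictionary from cluster label to set of items; the memberships shown are purely illustrative and are not the actual #Red clusters.

def one_vs_all_splits(clusters):
    # clusters: {cluster_label: set_of_items}. For each cluster, E+ is the
    # cluster itself and E- is the union of all the other clusters.
    splits = {}
    for label, items in clusters.items():
        negatives = set().union(*(c for l, c in clusters.items() if l != label))
        splits[label] = (set(items), negatives)
    return splits

splits = one_vs_all_splits({
    "cluster-Austen": {"lady_byron", "james_clunie"},
    "cluster-Marx": {"lord_byron"},
})
austen_pos, austen_neg = splits["cluster-Austen"]   # E+ and E- for the Austen cluster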

3.4.2 Building the Background Knowledge

In the second step, we manually explored one or more Linked Data datasets to extract additional information about the training examples, with the objective of building the background knowledge. The background knowledge $\mathcal{B}$ for #Red was retrieved using the RED ontology predicates concerning each reader. Predicates and type relationships were then transformed as:

⟨red:Lord_Byron red:originCountry red:England⟩  ⇒  originCountry(lord_byron,england).
⟨red:Lord_Byron rdf:type red:Reader⟩            ⇒  reader(lord_byron).
⟨red:England rdf:type red:Country⟩              ⇒  country(england).

In the same way, we manually retrieved additional information from Linked Data. We used the owl:sameAs property to navigate through other datasets, such as DBpedia, and to get new pieces of information using its SPARQL endpoint. Assuming the triple ⟨red:Lord_Byron owl:sameAs db:GeorgeByron⟩, we used the object value db:GeorgeByron to retrieve new statements, such as:

⟨db:GeorgeByron dbo:birthPlace db:England⟩
⟨db:GeorgeByron dbo:influenced db:EmilyBronte⟩

consequently adding them to $\mathcal{B}$ as new Prolog clauses. The external knowledge about the books in #KMi.H required more navigation. We used the bibo:isbn10 property as an equivalence statement to navigate to the British National Bibliography dataset2 (BNB). From there, we retrieved the topics associated to each book. Since topics in the BNB dataset are expressed with the Library of Congress Subject Headings3 (LCSH) vocabulary, we then used the LCSH endpoint, locally loaded, to retrieve the broader concepts of each topic, using the SKOS4 skos:broader property. The insight behind extending $\mathcal{B}$ with the LCSH taxonomy is that the subjects provided by the BNB dataset might in fact be too narrow (since they are specific to just a few books), while the broader terms could be representative of more books and could therefore facilitate the induction. An example of the navigation across datasets is shown in Figure 3.2.
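This kind of retrieval can be sketched as follows (our illustration, not the code used for the thesis), assuming the public DBpedia SPARQL endpoint and the SPARQLWrapper library; the clause-formatting helper and its output are hypothetical simplifications.

from SPARQLWrapper import SPARQLWrapper, JSON

def local_name(uri):
    # Keep only the last segment of a URI (after '/' or '#').
    return uri.rsplit("/", 1)[-1].rsplit("#", 1)[-1]

def dbpedia_clauses(resource_uri, limit=50):
    # Retrieve <resource ?p ?o> triples from DBpedia and encode each one as a
    # Prolog-like clause of arity 2 (a simplified illustration).
    sparql = SPARQLWrapper("https://dbpedia.org/sparql")
    sparql.setQuery(f"SELECT ?p ?o WHERE {{ <{resource_uri}> ?p ?o . }} LIMIT {limit}")
    sparql.setReturnFormat(JSON)
    subject = local_name(resource_uri).lower()
    rows = sparql.query().convert()["results"]["bindings"]
    return [f"{local_name(r['p']['value'])}({subject},{local_name(r['o']['value']).lower()})."
            for r in rows]

# e.g. dbpedia_clauses("http://dbpedia.org/resource/Lord_Byron") might yield
# clauses in the spirit of birthPlace(lord_byron,england).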

Figure 3.2 Linked Data navigation example. The different colours correspond to the #KMi.H (yellow), the BNB (pink) and the LCSH (green) datasets.

1 The BIBliographic Ontology (http://bibliontology.com/specification) is a vocabulary providing many concepts and properties for describing citations and bibliographic references.
2 http://bnb.data.bl.uk/sparql
3 http://id.loc.gov/authorities/subjects.html
4 http://www.w3.org/TR/swbp-skos-core-spec/

3.4.3 Inducing Hypotheses

In this step, we use the ILP Aleph system5 to generate hypotheses for each of the clusters.

We ran several experiments, each with a different background knowledge $\mathcal{B}$. Our assumption is that the more external information is added to $\mathcal{B}$, the more interesting hypotheses are likely to be found. We expect, for instance, that the background knowledge built with DBpedia and RED predicates will induce hypotheses that are more accurate than the ones induced with a background knowledge built on RED only. Similarly, we expect more interesting hypotheses from a $\mathcal{B}$ built with additional BNB and LCSH information. Table 3.5 shows an example of the different background knowledges that we built from #Red.

Table 3.5 Different $\mathcal{B}$s built with Linked Data statements in the #Red experiment.

Background knowledge                Clause example
B1  RED only                        originCountry(lord_byron,england)
B2  B1 ∪ dbo properties             influenced(lord_byron,emily_bronte)
B3  B2 ∪ dc:subject                 subject(lord_byron,bisexual_writers)
B4  B1 ∪ rdf:type                   politician(lord_byron)
B5  B4 ∪ rdf:subclassOf             person(lord_byron)
B6  B1 ∪ yago:class                 lgbtWritersFromUK(lord_byron)

Having generated all the input data, we ran Aleph to obtain hypotheses for each of the clusters. To facilitate readability, we use below the label of the cluster instead of the in_cluster(X) notation. The best hypotheses for #Red, which we scored with WRacc, are shown in Table 3.6.

Table 3.6 Explanations induced in the #Red experiments.

hypothesis                                                               WRacc
austen(X) :- religion(X,anglican)                                        0.025
austen(X) :- female(X)                                                   0.020
marx(X) :- religion(X,quaker)                                            0.007
milton(X) :- religion(X,anglican) ∧ male(X) ∧ country(X,england)         0.025

Table 3.7 presents some of the hypotheses we obtained with #KMi.H. Here, we compared the accuracy of explanations generated with background knowledge from the BNB dataset only (the ones with the clause topic(X) in the body), with the ones generated when enriching it with the LCSH broader concepts (clause broader(X)). As shown, in the latter case the hypotheses have a larger coverage.

5 http://www.cs.ox.ac.uk/activities/machlearn/Aleph/aleph.html

Table 3.7 Best generated hypotheses for #KMi.H, including their WRacc and F scores.

hypothesis                                                               WRacc     F
digital_media(X) :- topic(X,computer_games_design)                       0.001     0.139
digital_media(X) :- topic(X,computer_graphics)                           0.002     0.13
digital_media(X) :- broader(X,computers)                                 0.007     0.143

electronic_eng(X) :- topic(X,digital_control_systems)                    0.0005    0.02
electronic_eng(X) :- topic(X,systems_analysis)                           0.0005    0.03
electronic_eng(X) :- topic(X,automatic_control)                          0.006     0.19
electronic_eng(X) :- broader(X,engineering_instruments)                  0.012     0.34
electronic_eng(X) :- broader(X,algorithms)                               0.006     0.34
electronic_eng(X) :- broader(X,dynamics) ∧ broader(X,machine_theory)     0.011     0.33

mechanics(X) :- topic(X,project_management)                              0.002     0.074
mechanics(X) :- topic(X,production_management)                           0.005     0.169
mechanics(X) :- broader(X,technology)                                    0.003     0.121
mechanics(X) :- broader(X,industrial_management)                         0.007     0.208
mechanics(X) :- broader(X,management)                                    0.010     0.242

3.4.4 Discussion

The #Red experiments seem to present weak results in terms of evaluation score, i.e. a low WRacc for the generated hypotheses. It is important to keep in mind that this score introduces the weight of the true negative examples, which obviously greatly outnumber the small analysed cluster. With that said, the results for the Austen cluster are fairly strong when compared to the sample set (|R| is actually 1,230), and show how ILP can explain a pattern, e.g. “people read Jane Austen because they are Anglican women”. The #KMi.H experiment is certainly a simpler use-case in terms of the possible explanations to be found. Although the domain might seem trivial, it was our intention to prove that, when enough background knowledge about the examples is provided, it is possible to obtain improved results. The difference between the two datasets consists in the amount of positive and negative examples homogeneously described in the background knowledge (that is, a particular property concerns a good percentage of the example set). Such a larger coverage shows that both the WRacc and the F-Measure are indeed significantly improved.

Both experiments confirmed our intuition that the proposed approach could naturally combine different sources of background knowledge (i.e. different datasets) to produce explanations of patterns found in the data. For example, in #Red, the information about the gender of readers comes from the RED ontology, while the information about their religion comes from DBpedia. In #KMi.H, adding more background knowledge demonstrates that we can give more accurate explanations about the patterns (“1 book out of 3 in the Electronic Engineering cluster is about Machine Theory” rather than “1 book out of 5 is about Automatic Control”). The sensitivity of the relative accuracy is much more visible with the #KMi.H results, whose WRacc scores significantly increase when the background knowledge is extended. While this suggests that, by adding even more information, better explanations can be obtained, it also reveals that hypotheses for explanation have to be made more understandable. In #Red, we require some information to reveal the relations between the readers and their topic (what connects Jane Austen with Anglican women? Is Jane Austen Anglican too?). In the case of #KMi.H, this relation is already more visible to our eyes (the faculty of Engineering and the Engineering books are in fact about the same topic), but an automatic approach should be able to express it anyway, no matter how trivial such a relation is.

Finally, while WRacc has proven to be a good starting point, the F-Measure should be preferred because it can be more effective in “unclear situations”, i.e. with smaller clusters or when no obvious explanation is visible because of a lack of information.

3.5 Conclusions and Limitations

In this chapter, we have presented a preliminary idea about building a process that automatically generates explanations using background knowledge found in the Web of Data. We have shown in Section 3.1 how we identified induction as the reasoning process to obtain explanations for patterns, and how the framework of Inductive Logic Programming was a good candidate for this purpose. We continued with Section 3.2, where we introduced the general framework of ILP, and with Section 3.3, in which we designed our problem as an ILP one. Finally, we tested the approach in Section 3.4 on two use-cases, #Red and #KMi.H, presenting and discussing the obtained results. The results and final discussion have revealed the limitations of the current approach and consequently the next steps that we need to undertake. We present them below.

(1) Data is manually selected. Focusing on proving the validity of the Inductive Logic Programming approach, the background knowledge in this chapter has been semi-manually selected. More precisely, when expanding the background knowledge with information from other Linked Data datasets, we intentionally chose the properties that we believed would be forming interesting hypotheses. This mostly means that we drove the ILP process to induce the explanations that we wanted (for instance, we believed that religion could be influencing a reader’s book choice). With that said, our objective is to make this selection of background knowledge an automatic process, where knowledge in Linked Data is agnostically accessed. In order to detect what strongly connects the data in a pattern, the next step is to find a good way to automatically detect the useful background knowledge in Linked Data.

(2) Aleph is not scalable. Inductive Logic Programming has been shown to be poorly scalable when the background knowledge is large. This means that designing a process in which we first find the valid Linked Data information and then build the background knowledge to use Aleph is risky, since the computational complexity might be too high and the hypotheses might never be obtained. However, we have shown that by adding information to the background knowledge little by little, we were able to considerably improve the hypothesis accuracy. The next step to take in this direction is therefore to design an inductive process in which information from Linked Data is iteratively added until a satisfying explanation is found.

(3) Explanations are not complete. Finally, we have seen that hypotheses are still unclear to the reader, e.g. we do not really know what the connection between Jane Austen and Anglican women is. Rolling back to the Explanation Ontology of Section 2.1.2, we can now say that the ILP process allows us to find some plausible anterior events for a pattern. However, we are missing the context that relates an anterior and a posterior event, as well as the theory behind them. This means that our system has to include some Linked Data-based processes to automatically detect the contextual relation of two events, as well as the theory governing their occurrence.

Through the next chapters, we will see how we propose to solve the issues of manual selection, scalability and context definition. Regarding the automatic definition of a theory behind the explanations, we will comment on its feasibility in the third part of this work.

Chapter 4

Generating Explanations through Automatically Extracted Background Knowledge

In this chapter we show how we extended the Inductive Logic Programming idea into the framework we propose, which we called Dedalo. We designed it as a process to automatically find pattern explanations by iteratively building the required background knowledge from Linked Data. Dedalo is inspired by Inductive Logic Programming and integrates new features such as a heuristic greedy search and Linked Data Traversal. These allow us to answer our second research question (Section 1.3.2), i.e. how to find in the Web of Data the background knowledge that we need to generate explanations. After the introduction in Section 4.1 of the problems tackled in this chapter, we present Dedalo’s foundations in Section 4.2 and the designed approach in Section 4.3. In Section 4.4, we evaluate the performance of the process according to different criteria while, in Section 4.5, we conclude by discussing some of the limitations of the approach.

4.1 Introduction

In the previous chapter we have seen how Inductive Logic Programming is a promising solution if we aim to automatically explain patterns. With that said, ILP has shown two main drawbacks: first, it requires the background knowledge about the evidence to be manually selected, which means introducing a priori knowledge of the problem; second, it incurs considerable computational issues when the background knowledge is too large and rich. This means that adding the entire knowledge from Linked Data to the process is not feasible; in addition, most of this knowledge would certainly be irrelevant. Thus, it is necessary that we detect and select only the salient information that we need to build the background knowledge for the induction process. The solution we present here is to redesign the inductive process to make it not only more suitable for the Web of Data, but also able to automatically detect the relevant background knowledge. Our idea is based on two key aspects. First, we avoid scalability issues by increasing the background knowledge from Linked Data iteratively. This is achieved using a Link Traversal strategy, which uses the links between entities to blindly explore the graph of Linked Data. By “blindly”, we mean that we use URI dereferencing to discover new resources on-the-fly (which possibly belong to unknown data sources) that can serendipitously reveal new knowledge. Second, in order to know which is the right piece of information to be extracted, such Link Traversal is driven by a greedy search whose ultimate goal is to find relevant information about the items of the pattern that we want to explain. These two aspects are finally combined with an inductive reasoning process to find hypotheses that potentially explain a pattern. The resulting process, Dedalo, is able not only to automatically navigate throughout Linked Data, without knowing them in advance or in their entirety, but also to cleverly use this information to explain patterns of data. In terms of the Explanation Ontology that we previously presented, achieving this task allows Dedalo to find candidate anterior events for a specific observation (the pattern).

4.2 Problem Formalisation

In short, the process derives candidate explanations about a pattern based on information heuristically found in Linked Data. In an ILP fashion, Dedalo uses the items of the pattern that needs to be explained as the positive examples to learn from, while the items in the rest of the dataset are used as negative examples. As in ILP, Dedalo aims at finding the candidate explanation that covers a maximum number of positive examples and a minimum number of negative examples. Candidate explanations are derived by reasoning upon the background knowledge built using statements automatically extracted from Linked Data.

4.2.1 Assumptions

We based Dedalo on a few assumptions, which provided the basis for developing the process.

Linked Data are a graph of knowledge. The Linked Data hub is a directed and labelled (multi)graph of RDF entities and properties, which can be navigated on-the-go using URI dereferencing. This means that one can start from one entity and “walk” through a complete, possibly cross-dataset graph simply by following the RDF properties that link entities to each other.

Entities share Linked Data paths. In Linked Data, as in any other graph, entities can have common paths, which connect them to another entity. A path, in this case, is intended as a chain of contiguous RDF properties from one entity to another: the assumption here is that some of those “walks to a specific entity” will be more shared than others.

Paths are Linked Data explanations. The final insight is that if a path to a Linked Data entity is more commonly shared among the original entities belonging to the pattern we are trying to explain than among its negative examples, then both the path and the entity can be used as an explanation for the grouping of the positive examples.

4.2.2 Formal Definitions

We present here the basic terminology that we will use for the rest of our work. We invite the reader to refer to Figure 4.1 and Section 4.2.3 for an illustrative example of the problem.

ILP terminology

• $\mathcal{E}$ is a set of initial items, divided into a disjoint set of positive and negative examples, such that $\mathcal{E} = \{\mathcal{E}^+ \cup \mathcal{E}^-\}$;

• $\mathcal{E}^+ = \{e_0, \ldots, e_k\} \mid \mathcal{E}^+ \subseteq \mathcal{E}$ is the pattern that we want to explain, used as positive evidence;

• $\mathcal{E}^- = \{e_0, \ldots, e_n\}$ is the set of remaining items in $\mathcal{E}$, so that $\mathcal{E}^- = \mathcal{E} \setminus \mathcal{E}^+$. $\mathcal{E}^-$ can be composed of items from $K$ different patterns, i.e. $\mathcal{E}^- = \bigcup_{i \in K} \mathcal{E}_i$;

• $\mathcal{B}$ is the background knowledge from which we learn, composed of knowledge from Linked Data.

Graph terminology

• $G = \{V, E, L\}$ is any subgraph of the Linked Data Cloud, where $V$ is the set of RDF entities, $E$ the set of RDF properties and $L$ the set of labels that are known in $G$ at a given moment.

• $\vec{p_i} = \langle p_0.p_1 \ldots p_l \rangle$ is a path, i.e. a chain of sequential RDF properties, whose length is defined as $length(\vec{p_i}) = l$;

• $R_i \subseteq V$ is the set of entities that share $\vec{p_i}$, also called roots of $\vec{p_i}$;

• $V_i \subseteq V$ is the set of entities where $\vec{p_i}$ can terminate. For each $v_j \in V_i$, there exists a set $R_{i,j} \subseteq R_i$ of those roots that share $\vec{p_i}$ to $v_j$;

• $\varepsilon_{i,j} = \langle \vec{p_i} \cdot v_j \rangle$ is a candidate explanation (alternatively called hypothesis) shared by the items of $R_{i,j}$.

Problem Definition

Inspired by the ILP approach, we define our problem as follows. Given:

• the initial observations divided into one positive pattern and its counterexamples, as $\mathcal{E} = \{\mathcal{E}^+ \cup \mathcal{E}^-\}$;

• the background knowledge $\mathcal{B} = \{\varepsilon_{1,1}, \ldots, \varepsilon_{i,j}\}$, consisting of the space of all possible candidate explanations $\varepsilon_{i,j} = \langle \vec{p_i} \cdot v_j \rangle$ known in $G$ at a specific moment;

find the explanation $\varepsilon_{i,j} = \langle \vec{p_i} \cdot v_j \rangle$ that, at the same time, is:

• the most complete, i.e. it maximises the number of roots in $R_{i,j}$ that are also positive examples, i.e. $|R_{i,j} \cap \mathcal{E}^+| \approx |\mathcal{E}^+|$;

• the most consistent, i.e. it minimises the number of roots that are negative examples, i.e. $|R_{i,j} \cap \mathcal{E}^-| \approx \emptyset$;

across all the possible explanations in $\mathcal{B}$. We call this explanation $top(\mathcal{B})$.
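To fix ideas, a candidate explanation can be represented simply as a path paired with a terminating entity and the set of roots reaching it; the Python sketch below (ours, with hypothetical names) picks top(B) by maximising completeness and, as a tie-breaker, minimising the number of negative roots covered.

from dataclasses import dataclass, field

@dataclass
class CandidateExplanation:
    path: tuple                                 # chain of RDF properties p0.p1...pl
    value: str                                  # terminating entity v_j
    roots: set = field(default_factory=set)     # R_{i,j}: roots sharing the path to v_j

def top(explanations, positives, negatives):
    # Approximate the two constraints above: maximise |R ∩ E+|, minimise |R ∩ E-|.
    return max(explanations,
               key=lambda e: (len(e.roots & positives), -len(e.roots & negatives)))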

4.2.3 An Example

Let us take again the example of why a term such as “A Song of Ice and Fire” is popular at specific times. In Figure 4.1, G is an RDF graph that includes a set $\mathcal{E}$ of 5 entities, each of which corresponds to a week time frame in which some TV series episodes have been broadcast. Links from a week to a broadcast event are labelled ddl:linkedTo. From the evidence, we know that the first three weeks are the ones where more people were searching for “A Song of Ice and Fire”, and therefore they constitute $\mathcal{E}^+$, while in the last two the term did not have such a popular search rate, so they constitute the counterexamples $\mathcal{E}^-$. From the figure, one can see that some of the weeks share paths, whose characteristics are:

Figure 4.1 Graph example. Entities with a border belong to $\mathcal{E}^+$.

(1) $\vec{p_1}$ = ⟨ddl:linkedTo.rdf:type⟩
    R1 = {ddl:week2, ddl:week3, ddl:week4}
    V1 = {db:seasonFinale}

(2) $\vec{p_2}$ = ⟨ddl:linkedTo.dc:subject.skos:broader⟩
    R2 = {ddl:week1, ddl:week2, ddl:week3, ddl:week4, ddl:week5}
    V2 = {db:GameOfThrones-TVseries, db:HowIMetYourMother-TVseries}.

From those, the following candidate explanations $\varepsilon_{i,j}$ can be derived:

• From $\vec{p_1}$, we derive that there is a set of weeks linked to a season finale episode of a TV series, i.e. $\varepsilon_{1,1}$ = ⟨ddl:linkedTo.rdf:type · db:seasonFinale⟩; this is the only one we can derive, because $\vec{p_1}$ only has one possible value for V1;

• For $\vec{p_2}$ and v1 = db:GameOfThrones-TVseries, R2,1 = {ddl:week1, ddl:week2, ddl:week3}. We derive $\varepsilon_{2,1}$ = ⟨ddl:linkedTo.dc:subject.skos:broader · db:GameOfThrones-TVseries⟩;

• Similarly, for $\vec{p_2}$ and v2 = db:HowIMetYourMother-TVseries, R2,2 = {ddl:week4, ddl:week5}. We derive $\varepsilon_{2,2}$ = ⟨ddl:linkedTo.dc:subject.skos:broader · db:HowIMetYourMother-TVseries⟩.

According to the completeness and consistency constraints defined above, the best explanation is the one where the number of items in $\mathcal{E}^+$ is maximised and the number of items outside it is minimised. In this case, the one that best fits is of course $\vec{p_2}$ (with value db:GameOfThrones-TVseries, i.e. $\varepsilon_{2,1}$), since it is shared only by elements in $\mathcal{E}^+$.
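As a quick check of this example (our sketch, with the week identifiers abbreviated from ddl:weekN), the completeness and inconsistency counts of the three candidates can be compared directly:

positives = {"week1", "week2", "week3"}        # E+: popular weeks
negatives = {"week4", "week5"}                 # E-: counterexamples

candidates = {
    "e1,1 (linkedTo.rdf:type -> seasonFinale)":             {"week2", "week3", "week4"},
    "e2,1 (linkedTo.subject.broader -> GameOfThrones)":     {"week1", "week2", "week3"},
    "e2,2 (linkedTo.subject.broader -> HowIMetYourMother)": {"week4", "week5"},
}

for label, roots in candidates.items():
    print(label, "covers", len(roots & positives), "positives and",
          len(roots & negatives), "negatives")
# e2,1 covers all three positives and no negatives, so it is top(B) here.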

4.3 Automatic Discovery of Explanations

Once the process is formally defined, the second step is to focus on how to build it. The section first introduces the specific challenges we address, with their proposed solutions; then, it presents the approach step by step; finally, it shows the detailed algorithm we designed.

4.3.1 Challenges and Proposed Solutions

In automatically inducing explanations from Linked Data, we face two specific challenges, which are addressed as follows.

Challenge 1. If we index Linked Data in their entirety to create the background knowledge to reason upon, we will have to face scalability issues. For this reason, we propose to build the background knowledge iteratively, using Linked Data traversal.
As highlighted several times, one of the main advantages of Linked Data is the possibility of automatically exploring things by looking them up through dereferencing. Once dereferenced, entities reveal their connections to other entities that can be, in turn, further dereferenced. This means that the more entities are dereferenced, the more RDF statements can be collected, the bigger the background knowledge that Dedalo uses to induce hypotheses becomes and, of course, the higher the chances of finding a good one. The search for candidate explanations therefore needs to be iteratively repeated over a constantly increasing background knowledge $\mathcal{B}$, with the hope of finding new and (possibly) better candidate explanations.
Building $\mathcal{B}$ iteratively means exploring a graph $G$ starting from the initial set of items $\mathcal{E}$ that we have in hand, and then gradually traversing entities by following their links to other entities. In this way, no a priori knowledge is introduced in the process: there is no need to know which data sources exist in Linked Data, which ones should be used, or which properties should be selected. The graph exploration is simply carried out by following existing links between Linked Data entities. Such a traversal relies on the assumption that if data are connected (e.g. through owl:sameAs, skos:exactMatch, rdfs:seeAlso or vocabulary reuse), then data sources can be easily and naturally spanned to gather new, unknown knowledge, which we encode in $\mathcal{B}$ as new candidate explanations.

Challenge 2. Even if we explore Linked Data iteratively, we risk spending too much time (and consequently running into scalability problems again) if we do not find the right direction towards good explanations. For that, we can use a heuristic strategy to “greedily” navigate Linked Data in search of the best explanation.
Even within an iterative process, one of the problems we face is that the size of the graph increases rapidly and significantly because of the number of relations existing between Linked Data entities. This means that the graph exploration has to be driven heuristically, so that only the portion of Linked Data that is expected to be useful to explain a particular pattern is explored. A second problem is that even once we have chosen a direction to follow in the graph, we do not know whether the values that will be found will lead to good explanations. In other words, while we will not be able to determine the best explanation for a pattern (the global optimum), we should be able to easily identify very good ones (local optima). Similar problems have been effectively addressed with greedy algorithms: even without finding an optimal solution, a greedy search can yield very good solutions in a short time. A greedy search is a best-first search strategy that tries to minimise the cost to reach a goal state, by expanding first the node that is believed to be closest to it. With that in mind, what we propose is to explore Linked Data by relying on a greedy search, whose aim is to look in the graph for the node(s) $v_j$ that are likely to reveal the most appropriate explanations for the pattern. “Appropriate”, in our case, means that the completeness of the explanation is maximised and the number of counterexamples it covers is minimised. This estimation is based on the set of paths $\vec{p}$ known in $G$ at a specific iteration: at each step, the greedy algorithm will choose to expand the values $V_i$ of the path $\vec{p}_i$ that is judged most likely to reveal new, and better, explanations. As already indicated in Section 2.2.2, greedy algorithms are up to $O(b^m)$-complex in time and space; however, with a good heuristic function this complexity can be significantly reduced. In this scenario, another challenge for our process is to identify the strategy that most reduces these costs.
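The loop below sketches such a greedy, heuristic-driven traversal in Python. It assumes a dereference(entity) function returning (property, value) pairs and a heuristic(path) scoring function (for instance, the entropy measure introduced later in this chapter); both are placeholders rather than Dedalo's implementation, and the bookkeeping of root sets and the re-scoring of paths as their value sets grow are omitted for brevity.

import heapq

def greedy_explore(seed_items, dereference, heuristic, iterations=10):
    queue = []                      # max-heap (scores negated) of candidate paths
    values = {(): set(seed_items)}  # ending values per path; () is the empty path
    for _ in range(iterations):
        # pop(Q): expand the path currently judged the most promising
        path = heapq.heappop(queue)[1] if queue else ()
        for entity in values.get(path, ()):
            for prop, value in dereference(entity):
                new_path = path + (prop,)
                if new_path not in values:
                    values[new_path] = set()
                    heapq.heappush(queue, (-heuristic(new_path), new_path))
                values[new_path].add(value)
    return values   # paths and their ending values, from which explanations are built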

4.3.2 Description of the Process

The process that we propose repeats the steps shown in Figure 4.2 (we call the whole loop an iteration) until a termination condition is met. This condition is expressed by the user in terms of time spent, number of iterations achieved, or a specific score to be reached by the best explanation. An iteration then includes:

Figure 4.2 Schema of an iteration.

(1) Graph Expansion, where we dereference new entities to reveal new property-value couples;

(2) Path Collection, where we build new paths that can be followed in the following iterations;

(3) Explanation Discovery, where we create and evaluate new explanations.

At each iteration, the process collects new explanations and outputs the one with the highest score. All the explanations collected by the process up to that point are kept in a queue in a descending order based on their score. This means that, within a new iteration, two scenarios are possible: either we have found a better explanation, that covers more positive examples than the previous one and has therefore a better score, or no better hypotheses are found, and we keep the last iteration’s hypothesis as the best one. In other words, with more iterations, results can only increase in quality. Therefore, Dedalo can be considered an anytime process.

Graph Expansion. An iteration starts by dereferencing the set $V_i$ of ending values of a path $\vec{p}_i$ that has been chosen from a queue $Q$ of possible paths to follow. We call this operation $pop(Q)$. Initially, the graph $G$ is only composed of elements from $\mathcal{E}$, so $V = \mathcal{E}$ and $E = \emptyset$ (see Figure 4.3). Since $Q$ is empty, we dereference the elements of $\mathcal{E}$. When dereferencing the set of entities corresponding to the values in $V_i$, we obtain each of the (property, value) pairs linked to the entity. For instance, after the first iteration (Figure 4.4), we obtain pairs such as (ddl:linkedTo, db:GoT-s03e09) for ddl:week1, (ddl:linkedTo, db:GoT-s03e10) for ddl:week2, and so on. The value of each pair is added to $V$, while the property is added to $E$. Figure 4.5 and Figure 4.6 show examples of how $G$ is further expanded throughout the iterations.

Figure 4.3 G at iteration (1).

Figure 4.4 G at iteration (2).

Figure 4.5 G at iteration (3).

As hinted by the namespaces, values might or might not be part of the same dataset, and this only depends on the representation of the entity that has been dereferenced. In this example, we have shown how from a dataset of weeks the search spread to DBpedia simply by following entity links (the property ddl:linkedTo).
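As a concrete illustration, the look-up of an entity can be implemented with an off-the-shelf RDF library; the sketch below uses Python's rdflib and assumes the URI is dereferenceable and returns RDF (as DBpedia resources do), leaving error handling and caching aside.

from rdflib import Graph, URIRef

def expand(entity_uri):
    # Dereference a Linked Data URI and return its outgoing (property, value) pairs.
    g = Graph()
    g.parse(entity_uri)          # HTTP look-up of the entity
    subject = URIRef(entity_uri)
    return [(str(p), str(o)) for s, p, o in g if s == subject]

# Example (requires network access):
# pairs = expand("http://dbpedia.org/resource/Game_of_Thrones")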

Path Collection. Given a node $v_j \in V_i$, there exists a path $\vec{p}_i$ of length $l$ that has led to it (unless $v_j$ belongs to $\mathcal{E}$, in which case $l = 0$). For each of the (property, value) pairs revealed by the look-up, the property is concatenated to the $\vec{p}_i$ that led there, to generate new paths of length $l + 1$. Those are added to the queue $Q$ of paths, and the best one will be returned by $pop(Q)$ at the next iteration for further extending the graph. For instance, consider being in iteration (3) of Figure 4.5 and having reached the entity db:GoT-S04 from the path $\vec{p}_1 = \langle$ddl:linkedTo.dc:subject$\rangle$. By dereferencing db:GoT-S04, the new property skos:broader is extracted from the discovered pair (skos:broader, db:GameOfThrones-TVseries); thus, a new path $\vec{p}_2 = \langle$ddl:linkedTo.dc:subject.skos:broader$\rangle$ is built and added to $Q$. When dereferencing the other values of $V_1$, as for example db:HIMYM-S08, the path $\vec{p}_2$ already exists in $Q$, so the value of the pair (skos:broader, db:HowIMetYourMother) is added to the set $V_2$ of ending values for $\vec{p}_2$.

Figure 4.6 G at iteration (4).

Explanation Discovery. Before starting a new iteration, we build from each of the new paths the set of candidate explanations and then evaluate them¹. Explanations are added to $\mathcal{B}$ (the list of possible explanations known at that given iteration), and the one(s) with the highest score is saved as $top(\mathcal{B})$ for the current iteration.
To generate explanations, the paths $\vec{p}_i$ are chained to each of their ending values $v_j \in V_i$. We can obtain two types of explanations:

(1) " = p v , if the last property p of p is an object property, as in i,j h! i · ji l ! i " = ddl:linkedTo.rdf:type db:seasonFinale 1,1 h · i indicating that the popularity increases for those episodes that are at the end of a season;

(2) " = p v or " = p v , if p is a numbered datatype property, e.g. i,j h! i ·· ji i,j h! i ·· ji l " = ddl:linkedTo.airedIn 2014 3,1 h ·· i meaning that the popularity increases for the episodes aired before the year 2014.

¹ cf. the next subsection for the evaluation measures we used.

Note that the length of the path $\vec{p}_i$ in the explanations gives an insight into how much the graph has been traversed, i.e. how far the search has gone, and possibly how many iterations have been achieved². In our examples, the best explanation for iteration (2) of Figure 4.4 is $\varepsilon_{1,1} = \langle$ddl:linkedTo.rdf:type $\cdot$ db:seasonFinale$\rangle$, of length $l = 2$. At the following iteration, in Figure 4.5, when new explanations are built and evaluated, the best explanation detected is in fact $\varepsilon_{2,1} = \langle$ddl:linkedTo.dc:subject.skos:broader $\cdot$ db:GameOfThrones-TVseries$\rangle$ (length $l = 3$), since it represents more items in the pattern and fewer outside of it.

The iterative process here described is executed until a termination condition (time, number of iterations or a desired explanation score) is met. Next, we proceed by presenting the evaluation measures that we use to assess the validity of the hypotheses.

4.3.3 Evaluation Measures

To evaluate the generated explanations, we consider one of the three following possibilities.

Already presented in Section 3.3, the F-Measure (F ) is the harmonic mean of Precision (P ) and Recall (R), now redefined as:

R + R + P (" )=| i,j \E | (4.1) R(" )=| i,j \E | (4.2) i,j R i,j + | i,j| |E | The F-Measure evaluates a candidate explanation " = p v as 0

As from Section 3.3, we also consider the Weighted Relative Accuracy (WRacc), which we redefine as:

$$WR_{acc}(\varepsilon_{i,j}) = \frac{|R_{i,j}|}{|\mathcal{E}|} \times \left( \frac{|R_{i,j} \cap \mathcal{E}^+|}{|R_{i,j}|} - \frac{|\mathcal{E}^+|}{|\mathcal{E}|} \right) \qquad (4.3)$$

This means that $WR_{acc}$ also takes into consideration how big the pattern $\mathcal{E}^+$ is compared to the full dataset $\mathcal{E}$. As said, $-1 < WR_{acc}(\varepsilon_{i,j}) < 1$.
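As a worked illustration of equations (4.1)-(4.3), the snippet below computes the three scores from plain sets of item identifiers; it is a sketch for clarity, not Dedalo's code.

def f_measure(roots, positives):
    tp = len(roots & positives)
    precision = tp / len(roots) if roots else 0.0
    recall = tp / len(positives) if positives else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

def wracc(roots, positives, items):
    if not roots:
        return 0.0
    coverage = len(roots) / len(items)
    return coverage * (len(roots & positives) / len(roots) - len(positives) / len(items))

# Running example: weeks 1-3 form E+ and are exactly the roots of epsilon_{2,1}
items = {"w1", "w2", "w3", "w4", "w5"}
positives = {"w1", "w2", "w3"}
print(f_measure({"w1", "w2", "w3"}, positives))    # 1.0
print(wracc({"w1", "w2", "w3"}, positives, items)) # 0.24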

Fuzzy F-Measure (FFM) is a weighted variant of the F-Measure, which directly exploits the importance of items within the pattern they belong to.

² This is, however, highly dependent on the dataset.

As mentioned already in Chapter 3, both the WRacc and the F evaluations are designed for classification tasks entailing a binary relation between an item and the pattern the item does (or does not) belong to. We introduce the FFM measure with the idea that items count differently within patterns (cf. the concepts of cohesion, separation and silhouette in Cluster Analysis [Rousseeuw, 1987]), and that WRacc and F clearly ignore such a “weight”. For example, if ddl:week1 has a very high search rate for “A Song of Ice and Fire”, then the evaluation measure needs to favour the events (and connected information, e.g. db:GameOfThrones-TVseries) that happened that week, giving them a higher priority than events that happened in other weeks, in which the search rate is much lower. Such a priority is formalised by defining a weight for each of the items $e \in \mathcal{E}$ based on their distance $d_e$ from the centre of the pattern they belong to. This means that our process assumes that the items given as input come with a distance metric that is used to compute their distance $d_e$ to the centre of the cluster (e.g. Euclidean distance to the centroid, or absolute difference from the points' average in the case of one-dimensional data). This weight is then shaped depending on whether $e$ belongs to the pattern to be explained:

$$w^+_e = \frac{d_e}{\max\{d_i \mid i \in \mathcal{E}^+\}} \quad (e \in \mathcal{E}^+) \qquad (4.4)$$

or to the rest of the dataset $\mathcal{E} \setminus \mathcal{E}^+$:

$$w^-_e = \frac{d_e}{\min\{d_i \mid i \notin \mathcal{E}^+\}} \quad (e \notin \mathcal{E}^+) \qquad (4.5)$$

In this way, we make sure to favour both the items that are in $R_{i,j} \cap \mathcal{E}^+$ ($d_e$ is positive in this case) and the ones that erroneously appear in $\mathcal{E} \setminus \mathcal{E}^+$ ($d_e$ is negative). Then, for an explanation $\varepsilon_{i,j} = \langle \vec{p}_i \cdot v_j \rangle$ shared by $|R_{i,j}|$ roots, we define:

(1) Fuzzy True Positives (FTP), the sum of weights $w^+_e$ for all the root items of $R_{i,j}$ which appear in $\mathcal{E}^+$:

$$FTP = \sum_{e \in R_{i,j} \cap \mathcal{E}^+} w^+_e \qquad (4.6)$$

(2) Fuzzy False Positives (FFP), the sum of weights $w^-_e$ for all the root items of $R_{i,j}$ that are not in $\mathcal{E}^+$:

$$FFP = \sum_{e \in R_{i,j} \setminus \mathcal{E}^+} w^-_e \qquad (4.7)$$

(3) Fuzzy False Negatives (FFN), the sum of weights $w^+_e$ for all the items of $\mathcal{E}^+$ not in $R_{i,j}$:

$$FFN = \sum_{e \in \mathcal{E}^+ \setminus R_{i,j}} w^+_e \qquad (4.8)$$

Finally, the Fuzzy Precision (FP) and Fuzzy Recall (FR) are defined similarly to their usual definitions:

$$FP = \frac{FTP}{FTP + FFP} \quad (4.9) \qquad FR = \frac{FTP}{FTP + FFN} \quad (4.10)$$

and the Fuzzy F-Measure is defined as:

$$FFM = 2 \cdot \frac{FP \cdot FR}{FP + FR} \qquad (4.11)$$
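The fuzzy variant can be sketched in the same style: assuming every item comes with its distance d_e to the centre of the cluster it belongs to, the weights of equations (4.4)-(4.5) and the measure of (4.6)-(4.11) become the following (illustrative code, not Dedalo's implementation).

def fuzzy_weights(distances, positives):
    # Shape each distance into a weight, following (4.4) for E+ and (4.5) for the rest
    max_pos = max(distances[e] for e in positives)
    min_neg = min(d for e, d in distances.items() if e not in positives)
    return {e: (d / max_pos if e in positives else d / min_neg)
            for e, d in distances.items()}

def fuzzy_f_measure(roots, positives, weights):
    ftp = sum(weights[e] for e in roots & positives)       # (4.6)
    ffp = sum(weights[e] for e in roots - positives)       # (4.7)
    ffn = sum(weights[e] for e in positives - roots)       # (4.8)
    fp = ftp / (ftp + ffp) if ftp + ffp else 0.0           # (4.9)
    fr = ftp / (ftp + ffn) if ftp + ffn else 0.0           # (4.10)
    return 2 * fp * fr / (fp + fr) if fp + fr else 0.0     # (4.11)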

4.3.4 Final Algorithm

The final algorithm is shown in Algorithm 1 below.

Functions

• add(e_i, list) adds an element to a list;
• pop(list) gets and removes the best element from the list;
• top(list) gets the best element from the list;
• expand(v_i) dereferences the entity that labels a node v_i and returns a list of (property, value) pairs;
• concat(item_i, item_j) concatenates two unspecified items;
• values(p_i) returns $V_i$ for $\vec{p}_i$;
• roots(p_i) returns $R_i$ for $\vec{p}_i$;
• roots(p_i, v_j) returns $R_{i,j}$ for $\vec{p}_i$ and $v_j$;
• evaluate(e_i) evaluates a new explanation;
• log(e_i) outputs the best explanation top($\mathcal{B}$).

Algorithm 1 Dedalo’s complete algorithm stop = false . E.g. iterations or time limit V , Q, = list() . Empty lists for collections B while (not stop) do top_p = pop(Q) . The best path !p if top_p is null then . Initially, expand E V= E else . Then, get the best path’s ending values V=values(top_p) end if for value in V do . Graph Expansion predicates=expand(value) for (prop,item) in predicates do new_path = concat(top_p, prop) . Path Collection add(newPath, Q) explanation = concat(new_path, item) . Add new explanations evaluate(explanation) . Evaluation add(explanation, ) B end for end for top_explanation=top( ) B log(top_explanation) . Best explanation for that iteration end while . Start new iteration

4.4 Experiments

We divided this section into several evaluations. First, we aim at evaluating the best heuristic function for the greedy search: we evaluate different heuristic functions on the same datasets, to see which ones reach the highest score for the best candidate explanations in the shortest time. Second, we look at the type of explanations that can be reached throughout iterations. This helps us in assessing whether the iterative process is justified, how many meaningful explanations are found when compared with an expert's knowledge, and whether we indeed use knowledge across datasets as expected. We are also interested in the performance of the Fuzzy F-Measure when compared to the other evaluation measures. Finally, to assess the validity of the process in terms of computational costs, we conduct a quantitative evaluation based on time spent, paths and graph size.

4.4.1 Use-cases

Before showing the experiments in detail, we briefly describe the datasets that we have used. While the details of how those have been obtained are described in Appendix A.1, we are particularly interested in showing the reader the background knowledge $\mathcal{B}$ that is built after a fixed number of iterations in each use-case, so that the candidate explanations shown in the following sections will be clearer. Note that, for clarity, $\mathcal{B}$ is shown as a tree where the elements of $\mathcal{E}$ are at the top (the positive examples are marked with a circle); however, this is a simplified version, and the graph that was explored had of course a much more complicated structure. The use-cases we consider here are as follows.

#KMi.A – Already presented in Chapter 3, the dataset consists of 6 clusters of researchers grouped according to the publications that they have in common. An example of background knowledge is shown in Figure 4.7. Two clusters are used from #KMi.A: the “Semantic Web people” and the “Learning Analytics people”.

Figure 4.7 Extract from the background Knowledge for #KMi.A.

#Lit – The dataset consists of three clusters of countries grouped according to their male and female literacy rates: countries where women are more educated than men, countries where men are more educated than women, and countries where the education rate is approximately equal. In Figure 4.8, we show how we are trying to explain the cluster of countries where women are less educated.

#OU.P – It consists of 30 clusters of keywords extracted from research papers from the Open University, which have been clustered according to their co-occurrences. An example of the background knowledge used to explain a cluster about Philosophy is shown in Figure 4.9.

Figure 4.8 Extract from the background Knowledge for #Lit.

We also used #Tre, which has served as the running example throughout the chapter, and some clusters from #KMi.P and #KMi.H, which we have already seen in the experiments on ILP in Section 3.4.

Figure 4.9 Extract from the background Knowledge for #OU.P.

4.4.2 Heuristics Comparison

It is well known that greedy search is neither optimal nor complete, and can fall into dead-ends from which it is hard to roll back. Also, because we retain the graph nodes in memory (as greedy search does), the process can be computationally expensive due to its time and space complexity. However, with a good heuristic function, those issues can be considerably reduced. Our objective is then to minimise this complexity: to do so, we adapt some existing measures to identify the most effective one, where effective means a measure giving the best candidate explanation score in the shortest time. We invite the reader to use Figure 4.6 as a reference for the examples.

1. Shortest Path. The baseline to which we compare the other measures is the length of $\vec{p}_i$. Shortest Path $SP(\vec{p}_i)$ assumes that the best paths are the shortest: it counts the number $l$ of properties composing $\vec{p}_i$, favouring the shortest ones.

$$SP(\vec{p}_i) = \frac{1}{l} \qquad (4.12)$$

Ex. If $\vec{p}_1 = \langle$ddl:linkedTo$\rangle$ and $\vec{p}_2 = \langle$ddl:linkedTo.dc:subject$\rangle$, then $SP(\vec{p}_1) > SP(\vec{p}_2)$.

2. Path Frequency. Freq estimates the frequency of a path $\vec{p}_i$ in $G$ by considering the size of its root set $R_i$. It assumes that the most important paths are the most frequent.

$$Freq(\vec{p}_i) = \frac{|R_i|}{|\mathcal{E}|} \qquad (4.13)$$

Ex. With $\vec{p}_2$ as above, and $\vec{p}_3 = \langle$ddl:linkedTo.rdf:type$\rangle$, then $Freq(\vec{p}_2) > Freq(\vec{p}_3)$ because the number of roots for the latter is lower.

3. Pointwise Mutual Information. PMI is used in Information Theory and Statistics to measure the discrepancy of a pair of random variables $X$ and $Y$ given their joint distribution $P(X,Y)$ and individual distributions $P(X)$ and $P(Y)$. In our scenario, we measure the probability that $\vec{p}_i$ is shared by its roots compared to the positive examples $\mathcal{E}^+$.

$$PMI(\vec{p}_i) = \log\left(\frac{|R_i \cap \mathcal{E}^+|}{|\mathcal{E}| \times |R_i|}\right) \qquad (4.14)$$

Ex. With $\vec{p}_2$ and $\vec{p}_3$ as above, then $PMI(\vec{p}_2) > PMI(\vec{p}_3)$ because $\vec{p}_2$ has more roots belonging to $\mathcal{E}^+$ than $\vec{p}_3$.

4. Adapted TF-IDF. We adapted the well-known TF-IDF measure to evaluate the relevance of a path $\vec{p}_i$ (the term) in a given pattern $\mathcal{E}^+$, compared to the frequency of the path across the rest of the $|K|$ patterns (the documents) into which $\mathcal{E}$ is divided.

$$\text{TF-IDF}(\vec{p}_i) = \frac{|R_i \cap \mathcal{E}^+|}{|\mathcal{E}^+|} \times \log\frac{|K|}{|\{\mathcal{E}_j \in K \mid R_i \cap \mathcal{E}_j \neq \emptyset\}|} \qquad (4.15)$$

Ex. $\text{TF-IDF}(\vec{p}_3) > \text{TF-IDF}(\vec{p}_2)$ since $\vec{p}_3$ is less frequent and more specific to elements in $\mathcal{E}^+$ than $\vec{p}_2$.

5. Delta function. We developed a function comparing the number of values $V_i$ for a $\vec{p}_i$ with the number of patterns in the dataset. Delta ($\delta$) assumes that the best $\vec{p}_i$ is the one having one different end value $v_j \in V_i$ for each pattern in $\mathcal{E}$. A path's score is better the closer the cardinality of $V_i$ is to $|K|$, i.e. the number of patterns into which the dataset is split.

$$\delta(\vec{p}_i) = \frac{1}{1 + \big||V_i| - |K|\big|} \qquad (4.16)$$

Ex. Given the two patterns of popular and not popular weeks (so $|K| = 2$), $\vec{p}_4 = \langle$ddl:linkedTo.dc:subject.skos:broader$\rangle$ is the one getting a better score, because $|V_4| = 2$, so $\delta(\vec{p}_4) = 1$. On the other hand, $\delta$ disfavours $\vec{p}_1$, attributing it a lower score since its values are too sparse (i.e. $|V_1| = 5$).

6. Entropy. Entropy [Shannon, 2001] (denoted by the Greek letter eta, $H$) is an Information Theory measure analysing the performance of communication channels, and has been applied to network graphs in a broad range of disciplines [Dehmer and Mowshowitz, 2011; Mowshowitz and Dehmer, 2012]. Given a random process $X = \{x_0, x_1, \ldots, x_n\}$ with $n$ possible outcomes, the amount of uncertainty removed by equiprobable messages increases monotonically with the number of existing messages, meaning that the bigger $n$ is, the less information is gained (and the more uncertain $X$ is). Considering this, we used a naïve adaptation of Shannon's Entropy, in which the random process $X$ corresponds to $\vec{p}_i$, while its $n$ possible outcomes are the values $v_j \in V_i$.

$$H(\vec{p}_i) = -\sum_{j=1}^{|V_i|} \frac{|R_{i,j}|}{|\mathcal{E}|} \log\frac{|R_{i,j}|}{|\mathcal{E}|} \qquad (4.17)$$

Ex. There is less information gain with $\vec{p}_3$ because it only has one possible outcome. The gain of information is much higher with $\vec{p}_2$, whose number of values increases the path uncertainty. Therefore, $H(\vec{p}_2) > H(\vec{p}_3)$.

7. Conditional Entropy. Conditional Entropy measures the information gain of a random variable $X$ given the knowledge of a variable $Y$. In this scenario, $H(\vec{p}_i \mid \mathcal{E}^+)$ measures how much information gain $\vec{p}_i$ brings, given the items known in $\mathcal{E}^+$ (i.e. how specific $\vec{p}_i$ and its values are in $\mathcal{E}^+$).

$$H(\vec{p}_i \mid \mathcal{E}^+) = -\sum_{j=1}^{|V_i|} \frac{|R_{i,j} \cap \mathcal{E}^+|}{|\mathcal{E}|} \log\frac{|R_{i,j} \cap \mathcal{E}^+|}{|R_{i,j}|} \qquad (4.18)$$

With $\vec{p}_2$ and $\vec{p}_3$, then $H(\vec{p}_2) > H(\vec{p}_3)$, because $\vec{p}_2$ has more roots in $\mathcal{E}^+$ and more uncertainty than $\vec{p}_3$.
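For reference, the two entropy-based heuristics of equations (4.17) and (4.18) can be written compactly as below, assuming roots_by_value maps each ending value v_j of a path to its root set R_{i,j}; the minus sign follows the standard Shannon formulation, and the code is an illustrative sketch rather than Dedalo's implementation.

from math import log

def entropy(roots_by_value, items):
    n = len(items)
    return -sum((len(r) / n) * log(len(r) / n)
                for r in roots_by_value.values() if r)

def conditional_entropy(roots_by_value, positives, items):
    n = len(items)
    total = 0.0
    for r in roots_by_value.values():
        hits = len(r & positives)
        if hits:
            total -= (hits / n) * log(hits / len(r))
    return total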

Results. We compared the measures presented above when applying Dedalo's implementation of Algorithm 1 to the #KMi.A, #KMi.P and #KMi.H datasets. Our aim is to see which is the fastest at reaching the best explanation given a fixed number of iterations. For this evaluation, we use the WRacc score. In Figures 4.10, 4.11 and 4.12, the X axis represents the number of iterations the process was run for, and the Y axis represents the score of the best explanation top($\mathcal{B}$) found at that given iteration. As we explained, each improvement of the WRacc score means that a new explanation has been found. The experiments clearly show that Entropy outperforms the other measures. This means that we can use Entropy to reduce redundancy (i.e. to avoid following wrong paths) and to detect the best explanations in a short time. Conditional Entropy, which also shows a very good performance, is the second best performing measure in 5 out of 6 experiments. In Figure 4.12b, Conditional Entropy even finds better explanations for the pattern. The reason can probably be attributed to the fact that the items of that pattern had explanations that were more specific when compared to the rest of the dataset. As for the other measures, PMI, TF-IDF and $\delta$ have the worst performances, possibly because the use-cases are too homogeneously composed and each entity, regardless of which pattern it belonged to, had approximately the same properties. For instance, TF-IDF works relatively well in the case illustrated in Figure 4.10b, in which we were dealing with a more heterogeneous pattern of data. SP and Freq are good at finding candidate explanations in the first iterations, but then they plateau and take time before getting any improvement. In other words, they are able to find the correct path only if this appears first in the queue by chance. Shortest Path also seems to have a better performance on big patterns with items having smaller numbers of properties, as shown in Figures 4.12a and 4.12b. Given such a visible difference in performance between Entropy and the other measures, we can consider it the right candidate to use for our process. With that said, the possibility of combining it with other measures can be foreseen as plausible future work.

(a) Semantic Web people.

(b) Learning Analytics people.

Figure 4.10 #KMi.A results. The process was run for 20 iterations.

The experiments also showed an apparent phenomenon: the bigger the dataset, the lower the WRacc. This can probably be explained by the fact that it is harder to find strong candidate explanations in a larger population.

4.4.3 Best Explanations

Here, we present some analysis on the type of best explanations that we have found in different use-cases.

Explanation understandability. Examples of the best explanations top($\mathcal{B}$) for the datasets #KMi.A, #KMi.P and #KMi.H are presented in Table 4.1. The left column shows the size of the pattern and the label of the cluster, which we manually defined based on our knowledge of the KMi department.

(a) Semantic Web topic.

(b) Learning analytics topic.

Figure 4.11 #KMi.P results after 10 iterations.

Interestingly, the results demonstrate that our process is agnostic to the type of pattern (type of clustering process, size or domain), since candidate explanations comparable to the human-provided ones are obtained in each of the use-cases. In the first case of #KMi.A, the second explanation for the items' grouping is that the people who worked on Semantic Web-related themes are grouped together because they have all been part of a project whose director was someone himself working on the SmartProducts project³ (with a WRacc score of 0.128), which is more distant in $G$ than the first explanation (those people are associated with the Semantic Web topic, WRacc = 0.076).

We remind here that WRacc can go into negative values; therefore, our results can be considered acceptable.

³ http://www.smartproducts-project.eu/

(a) Music Technology.

(b) Theatre.

Figure 4.12 #KMi.H results for 15 iterations.

Also, lower WRacc scores are likely to happen when the cluster is very small compared to the rest of the dataset. What is interesting to notice is that such explanations could only be given by someone who knows the KMi department well enough to affirm that those people worked in projects under the same director. Typically, this is an example in which explanations are hidden and only an expert with the right background knowledge can provide them. We also remark how, by using the connections between datasets in the Linked Data cloud, we get better explanations.

Referring to the first example for #KMi.H: first, we get explanations from the British Library dataset (“books borrowed by students of the Music Technology faculty are about sound recording”, WRacc = 0.002); then, we extend the graph, reach the Library of Congress dataset and find a better explanation, such as “books borrowed by students of the Music Technology faculty are about a narrower topic of Physical Science” (WRacc = 0.005).

Table 4.1 Examples of explanations for #KMi.A, #KMi.P and #KMi.H.

#KMi.A – People working on                                                                        WRacc
Semantic Web (22)
  ε1,1 = ⟨tag:taggedWithTag · ou:SemanticWeb⟩                                                      0.076
  ε2,1 = ⟨org:hasMembership.ox:hasPrincipalInvestigator.org:hasMembership · ou:SmartProducts⟩      0.128
Learning Technology (23)
  ε1,1 = ⟨org:hasMembership · ou:open-sensemaking-communities⟩                                     0.073
  ε2,1 = ⟨org:hasMembership.ox:hasPrincipalInvestigator.org:hasMembership · ou:SocialLearn⟩        0.127

#KMi.P – Papers on                                                                                WRacc
Learning Analytics (601)
  ε1,1 = ⟨dc:creator.org:hasMembership · ou:StoryMakingProject⟩                                    0.038
  ε2,1 = ⟨dc:creator.org:hasMembership.ox:hasPrincipalInvestigator.tag:isRelatedTo · ou:LearningAnalytics⟩  0.042
Semantic Web (220)
  ε1,1 = ⟨dc:creator · ou:EnricoMotta⟩                                                             0.061
  ε2,1 = ⟨dc:creator.ntag:isRelatedTo · ou:SemanticWeb⟩                                            0.073

#KMi.H – Books borrowed by students from                                                          WRacc
Music Technology (335)
  ε1,1 = ⟨dc:subject · bnb:SoundsRecording⟩                                                        0.002
  ε2,1 = ⟨dc:creator.bnb:hasCreated.dc:subject · bnb:SoundsRecording⟩                              0.005
  ε3,1 = ⟨dc:creator.owl:sameAs.skos:broader.skos:broader.skos:broader · lcsh:PhysicalScience⟩     0.005
Theatre (919)
  ε1,1 = ⟨dc:subject · bn:EnglishDrama⟩                                                            0.004
  ε2,1 = ⟨dc:creator.owl:sameAs.skos:narrower · lcsh:EnsembleTheatre⟩                              0.007
  ε3,1 = ⟨dc:creator.bnb:hasCreated.dc:subject · bnb:EnglishDrama⟩                                 0.013

This shows that more accurate explanations can be found using Linked Data connections among datasets and domains, consistently with the findings in Chapter 3.

Explanation improvement. To have an idea of how explanations improve with iterations and the growth of the background knowledge, we use the data from the #OU.P use-case. Table 4.2 gives an overview of how explanations improve for three chosen groups of keywords (what the papers are about), by showing the best explanation at each iteration $i$, as well as its F-Measure and the size of $R_{i,j}$. Again, within a few iterations we automatically span from our dataset to DBpedia, and manage to build explanations that generalise the groups and explain why words appear together. Thanks to Entropy, we detect that paths based on the properties dc:subject and skos:broader are the most promising for climbing up the taxonomy of DBpedia concepts and for inducing strong candidate explanations.

Table 4.2 Examples of explanations for three clusters in #OU.P.

i  top(B)                                                                                  F
1  ⟨skos:relatedMatch · db:Century⟩                                                        0.08
2  ⟨skos:relatedMatch.dc:subject · dbc:Writing⟩                                            0.11
3  ⟨skos:relatedMatch.dc:subject.skos:broader · dbc:Problem_solving⟩                       0.13
4  ⟨skos:relatedMatch.dc:subject.skos:broader.skos:broader · dbc:Creativity⟩               0.18

i  top(B)                                                                                  F
1  ⟨skos:relatedMatch · db:Scale⟩                                                          0.03
2  ⟨skos:relatedMatch.dc:subject · dbc:ConceptsInPhysics⟩                                  0.10
3  ⟨skos:relatedMatch.dc:subject.skos:broader · dbc:Physics⟩                               0.14
4  ⟨skos:relatedMatch.dc:subject.skos:broader.skos:broader · dbc:FieldsOfMathematics⟩      0.18
5  ⟨skos:relatedMatch.dc:subject.skos:broader.skos:broader.skos:broader · dbc:Mathematics⟩ 0.21

i  top(B)                                                                                  F
1  ⟨skos:relatedMatch · db:Social⟩                                                         0.06
2  ⟨skos:relatedMatch.dc:subject · dbc:Concepts_in_Logics⟩                                 0.11
3  ⟨skos:relatedMatch.dc:subject.skos:broader · dbc:Logic⟩                                 0.15
4  ⟨skos:relatedMatch.dc:subject.skos:broader.skos:broader · dbc:Abstraction⟩              0.18

With this strategy, we can see how top($\mathcal{B}$) significantly improves in a short time (in the second example, we go from “3% of the words match the DBpedia concept db:Scale” to “more than 20% are words about subcategories of Mathematics”). Other examples of best explanations for #OU.P are given in Table 4.3.

Table 4.3 Explanations found by Dedalo after 5 iterations on #OU.P, their Precision (P), Recall (R) and F-Measure (F). For readability, the superscript on skos:broader indicates a chain of the property.

top(B)                                                              P     R     F
⟨skos:relatedMatch.dc:subject.skos:broader³ · dbc:Geology⟩          0.94  0.20  0.33
⟨skos:relatedMatch.dc:subject.skos:broader³ · dbc:Chemistry⟩        0.70  0.28  0.23
⟨skos:relatedMatch.dc:subject.skos:broader³ · dbc:Astronomy⟩        0.43  0.15  0.22
⟨skos:relatedMatch.dc:subject.skos:broader³ · dbc:Mathematics⟩      0.37  0.15  0.21
⟨skos:relatedMatch.dc:subject.skos:broader² · dbc:Creativity⟩       0.24  0.14  0.18
⟨skos:relatedMatch.dc:subject.skos:broader² · dbc:Abstraction⟩      0.17  0.19  0.18
⟨skos:relatedMatch.dc:subject.skos:broader² · dbc:Management⟩       0.29  0.12  0.17
⟨skos:relatedMatch.dc:subject.skos:broader³ · dbc:Leadership⟩       0.24  0.13  0.17

Explanations with datatypes. Examples of candidate explanations with datatype properties are obtained from the #Lit dataset and shown in Table 4.4. In those cases, given a numerical value $v_j \in V_i$, we create two alternative explanations: (1) $\varepsilon_{i,j} = \langle \vec{p}_i \cdot \leq v_j \rangle$ and (2) $\varepsilon_{i,j} = \langle \vec{p}_i \cdot \geq v_j \rangle$. Given the roots in $R_i$, we check whether the value each of them reaches through $\vec{p}_i$ is greater or smaller than the value $v_j$, and subsequently evaluate either (1) or (2) accordingly.

Let us consider the graph $G$ in Figure 4.8 as an example. For $\vec{p}_1 = \langle$owl:sameAs.dbp:gdpPppPerCapita$\rangle$, the entity uis:Ethiopia ends at the value $v_1 = 1{,}200$, uis:Somalia ends at the value $v_2 = 600$, and uis:India at $v_3 = 3{,}851$. Since creating two alternative explanations for each value can make the candidate explanation space larger, we eventually only keep the one with the best score with respect to the pattern $\mathcal{E}^+$, as Table 4.4 shows. Other explanations with datatype properties are shown in Table 4.6.

Table 4.4 Example of the production of explanations for numeric values on the #Lit use-case.

"i,j F owl:sameAs.dbp:gdpPppPerCapita 600 0.75 h ·· i owl:sameAs.dbp:gdpPppPerCapita 600 0.50 h ·· i owl:sameAs.dbp:gdpPppPerCapita 1200 0.57 h ·· i owl:sameAs.dbp:gdpPppPerCapita 1,200 0.80 h ·· i owl:sameAs.dbp:gdpPppPerCapita 3,851 0.33 h ·· i owl:sameAs.dbp:gdpPppPerCapita 3,851 1.00 h ·· i owl:sameAs.dbp:gdpPppPerCapita 49,802 0.00 h ·· i owl:sameAs.dbp:gdpPppPerCapita 49,802 0.75 h ·· i owl:sameAs.dbp:gdpPppPerCapita 36,728 0.00 h ·· i owl:sameAs.dbp:gdpPppPerCapita 36,728 0.85 h ·· i gdpPppPerCapita , the entity uis:Ethiopia ends to the value v =1, 200, uis:Somalia ends to i 1 the value v2 =600and uis:India to v3 =3, 851. Since creating two alternate explanations for each value can make the candidate explanation space larger, we eventually only keep the one with the best score with respect to the pattern +, as Table 4.4 shows. Other explanations E with datatype properties are shown in Table 4.6.

Fuzzy F-Measure Performance. We compare here explanations evaluated by the F-Measure with the ones evaluated by the Fuzzy F-Measure. As said, our intention is to show that items in a pattern have different importance, and that, when this is not taken into consideration, inaccurate explanations might be wrongly validated as the best ones. In Table 4.5, we show how the evaluations performed with the F-Measure and with the Fuzzy F-Measure differ. We manually chose some of the best explanations (at iteration 20) for four search terms of #Tre, and then compared the score each of the measures attributed to them, as well as their ranking in $\mathcal{B}$. As one can see, the Fuzzy F-Measure is more precise in ranking the correct candidate explanations, while the normal F-Measure, which tends to give the same importance to each item within the cluster, misevaluates the correct explanations, putting them much lower in the ranking.

4.4.4 Time Evaluation

Finally, we are interested in evaluating Dedalo's performance from a time perspective. For that, we use the three KMi datasets, #Lit and #Tre. Unless specified otherwise, all the experiments have been run on a 2.9 GHz Intel Core i7 machine with 8 GB of RAM.

Table 4.5 Chosen candidate explanations at iteration 20 in #Tre. The Fuzzy F-Measure score S(ff) and ranking R(ff) significantly improve over the ones of the normal F-Measure (S(fm) and R(fm)).

#Tre search term: Brazil                                                                 S(ff)  S(fm)  R(ff)  R(fm)
ε1,1 = ⟨ddl:linkedTo · dbp:2014FIFAWorldCup⟩                                             0.88   0.52   1      4
ε2,1 = ⟨ddl:linkedTo.dc:subject · dbc:FIFAWorldCupTournaments⟩                           0.87   0.64   2      1
ε2,2 = ⟨ddl:linkedTo.dc:subject · dbc:2014FIFAWorldCup⟩                                  0.82   0.50   4      6

#Tre search term: Obama                                                                  S(ff)  S(fm)  R(ff)  R(fm)
ε1,1 = ⟨ddl:linkedTo.dc:subject · dbc:UnitedStatesSenateElectionsInWestVirginia⟩         0.72   0.12   1      100+
ε1,2 = ⟨ddl:linkedTo.dc:subject · dbc:HistoryOfTheUS(1991-present)⟩                      0.71   0.16   5      100+
ε1,3 = ⟨ddl:linkedTo.dc:subject · dbc:USpresidentialElectionInWashington(state)⟩         0.66   0.20   6      100+

#Tre search term: A Song of Ice and Fire                                                 S(ff)  S(fm)  R(ff)  R(fm)
ε1,1 = ⟨ddl:linkedTo.dc:subject.skos:broader.skos:broader · dbc:ASongOfIceAndFire⟩       0.65   0.39   7      100+
ε2,1 = ⟨ddl:linkedTo.dc:subject.skos:broader · dbc:GameOfThrones(TVseries)⟩              0.65   0.39   7      100+
ε3,1 = ⟨ddl:linkedTo.dc:subject · dbc:GameOfThrones(TVseries)⟩                           0.65   0.38   7      100+

#Tre search term: Daniel Radcliffe                                                       S(ff)  S(fm)  R(ff)  R(fm)
ε1,1 = ⟨ddl:linkedTo.dc:subject.skos:broader · dbc:HarryPotter⟩                          0.35   0.20   18     100+
ε1,2 = ⟨ddl:linkedTo.dc:subject.skos:broader · dbc:21centuryBritishChildrenLiterature⟩   0.35   0.20   18     100+

In a first instance (see Figure 4.13), we compared the time each of the heuristic measures needed to reach the same explanation, i.e. the best one after a fixed number of iterations (20, 10 and 15 respectively for #KMi.A, #KMi.P and #KMi.H).

60"

50"

40" SP" Fq" 30" PMI" TF5IDF" 20" D" H" 10" CH"

0" KMi.A1" KMi.A2" KMi.P1" KMi.P2" KMi.H1" KMi.H2"

Figure 4.13 Time (in seconds) the heuristics need to reach the chosen $\varepsilon_{i,j}$.

In most of the examples, relative to the scale of the dataset, Entropy is among the fastest measures also in time, while Conditional Entropy appears slightly slower.

Secondly, we are interested in knowing how much time it takes to reach the same explanation a human would naturally give, how well it fits the pattern $\mathcal{E}^+$ and how far it is from the sources, as well as how big the graph is at the moment of the discovery. For that, we use the #Lit dataset, since it is more of a real-world scenario. This is a preliminary step for the broader evaluation of Chapter 7, in which we compare explanations obtained automatically with the ones given by users. Table 4.6 shows the results we obtained after 10 iterations for the main #Lit groups. Time is evaluated in seconds taken to reach the explanation top($\mathcal{B}$). In 10 iterations, the graph $G$ has 3,742,344 triples and the size of $Q$ is 671 paths.

Table 4.6 Best explanations found for #Lit, the time it has taken to find them, and their Precision, Recall and F-Measure.

top(B) for countries where men are more educated (|)                    P     R     F     Time
⟨skos:exactMatch.dbp:hdiRank ·· 126⟩                                    0.85  0.85  0.85  197”
⟨skos:exactMatch.dc:subject · dbc:LeastDevelopedCountries⟩              0.92  0.65  0.76  524”
⟨skos:exactMatch.dbp:gdpPppPerCapitaRank ·· 89⟩                         0.55  0.92  0.68  269”
⟨skos:exactMatch.dc:subject.skos:broader · dbc:CountriesInAfrica⟩       0.79  0.61  0.69  40”
⟨skos:exactMatch.dbp:populationEstimateRank ·· 76⟩                      0.49  0.96  0.62  201”
⟨skos:exactMatch.dbp:gdpPppRank ·· 10⟩                                  0.44  0.91  0.59  235”

top(B) for countries where women are more educated (~)                  P     R     F     Time
⟨skos:exactMatch.dbpedia:hdiRank ·· 119⟩                                0.64  0.64  0.64  198”
⟨skos:exactMatch.dbp:gdpPppRank ·· 56⟩                                  0.48  0.88  0.62  236”
⟨skos:exactMatch.dbp:populationEstimateRank ·· 128⟩                     0.67  0.50  0.57  203”
⟨skos:exactMatch.dbp:gdpPppPerCapitaRank ·· 107⟩                        0.48  0.76  0.56  267”
⟨skos:exactMatch.dbp:gdpPppPerCapitaRank ·· 100⟩                        0.47  0.64  0.54  267”
⟨skos:exactMatch.dc:subject.skos:broader · dbc:LatinAmericanCountries⟩  0.50  0.49  0.49  542”

To get the best explanation for the group |, the process requires less than 200”. The explanation shows that 88% of the countries in | are ranked more than 126 in the Human Development Index⁴ (HDI). Based on statistics on life expectancy, education and income, the HDI ranks countries from the most to the least developed one. The lower a country is in the rank, the less developed it is. Similarly, the best explanation for ~ is that 63.4% of its countries are among the 119 most developed countries. It is important to recall that such an explanation would not have been found without any reasoning upon numerical values. On the other hand, we remark how the process found object property-based explanations as good as the data property ones: for instance, the second best explanation for | is that 75% of the group is labelled in DBpedia as least developed countries (i.e. those countries share the path $\vec{p}_i = \langle$skos:exactMatch.dc:subject$\rangle$ to the common node dbc:LeastDevelopedCountries), which can be considered an alternative formulation of the HDI explanation. While we showed that good datatype explanations could be obtained in our process, in the rest of the work we will focus on object property explanations, because those can be exploited by the graph traversal.

⁴ http://en.wikipedia.org/wiki/Human_Development_Index

Finally, we use the #Tre use-case to evaluate the time Dedalo needs when dealing with much larger datasets.

Table 4.7 #Tre experiments details: trends are ordered by the time of the process, in the second column. The third column shows the number of explanations found after 20 iterations.

Searched term              Time        Num. ε_{i,j}
Daniel Radcliffe           3h30m04s    233,700
Brazil                     3h38m31s    104,092
A Song of Ice and Fire     5h48m10s    78,407
Obama                      6h44m42s    150,194
Dakar                      6h45m12s    32,784
Turkey                     6h47m43s    156,996
Hurricane                  6h57m48s    116,558
Rugby                      7h07m43s    184,870
Wembley                    7h15m57s    146,732
Italy                      7h20m13s    55,210
                           7h28m35s    399,072
Taylor                     9h50m07s    114,364
How I met your Mother      12h10m22s   17,131

Those experiments have been run on a 16-core Intel Xeon E5640 2.66 GHz machine with 28 GB of RAM operating under Linux Red Hat 6. Table 4.7 shows the time taken by each of the tests (labelled with the searched term), as well as the number of explanations that have eventually been found. While the time does not seem to be affected by the number of explanations evaluated, it remains of course influenced by the Link Traversal, which relies on the availability of external servers, as well as on the amount of data that is retrieved. In order to avoid the same information being fetched more than once, we included in Dedalo a caching system to access some of the data locally. We are aware that more effort could be put into reducing the time of the process; nevertheless, we consider it manageable for the current work, and leave the optimisation of this process to future work.

4.5 Conclusions and Limitations

In this chapter we have presented a process to automatically induce candidate explanations for a given pattern using Linked Data. We have shown how we based the process on the Inductive Logic Programming framework, which offers interesting features such as automatically inducing facts based on some background knowledge, but also how we had to adapt it to deal with a more complex problem space such as the one of the Web of Data. To face the scalability issues, we proposed to create a process that iteratively revises and increases the background knowledge for induction using new Linked Data statements extracted on-the-fly through Link Traversal. Moreover, to avoid wasting time in exploring parts of the graph that are not useful to explain a pattern, we proposed to use a greedy search based on entropy, which the experiments revealed to be the right heuristic to drive this search, as it is able to find accurate explanations in a small number of iterations. Finally, we showed how the accuracy of the candidate explanations could be improved by taking into account, during the evaluation of the explanations, the importance of the items within the pattern that we are aiming to explain.

The presented approach has limitations that open new issues. We present them below, followed by a recap of the issues that we have left open so far.

Explanations are atomic. In the ILP framework, $\mathcal{B}$ is used to build a lattice, in which statements are combined with each other and organised from the most general to the most specific ones (in terms of accuracy). In such a manner, ILP approaches make sure that the explanation that is finally derived is the most accurate, i.e. the one that covers the maximum number of positive examples and the least negative examples. In the process presented here, however, we have shown atomic explanations, composed of only one path and one ending value. While combining explanations might considerably increase the explanation accuracy, combining each one of them is not possible without falling into scalability issues. A step to take in this direction is to find a strategy that aggregates explanations only if their combination is likely to improve on the accuracy of the atomic candidate explanations. This challenge is the focus of the next chapter.

Statistical accuracy might not be enough. So far, we have only used predefined measures, such as WRacc, the F-Measure or the Fuzzy F-Measure. While they have proven to be effective in reaching good explanations, we do not know how much the results that we obtain depend on the information that is declared in Linked Data. What if we missed explanations because they were not declared in the dataset? Can we assume the information is homogeneously distributed across the data we look at? If there is a bias that might be introduced when adding more Linked Data knowledge, this has to be taken into account to properly evaluate our explanations. The challenge, discussed further in Chapter 8, consists in assessing this bias and possibly finding an evaluation measure able to take this aspect into consideration, so as to improve the accuracy of candidate explanations.

Explanations are (still) not complete. Since the use-cases in this chapter were closer to real-world scenarios, the candidate explanations produced were without doubt more human-understandable. With that said, if one were not familiar with some of the domains we discussed, it would not be possible to understand the explanations that were found, which would then remain unclear. In other words, we still cannot consider that we produce complete explanations with respect to the Explanation Ontology, since we are missing the context that connects the pattern with the explanation that Dedalo has found, and the theory governing such an explanation. As already mentioned, the former challenge is discussed in Chapter 6 and the latter in Chapter 8.

Chapter 5

Aggregating Explanations using Neural Networks

In this chapter we show how we improve Dedalo by making it able to generate combined candidate explanations. Because combining a large set of explanations is too expensive, we present how we introduced in the process a trained Neural Network that can determine the likelihood that two explanations, if aggregated, improve their accuracy with respect to a pattern. After an overview of the chapter in Section 5.1, we discuss the problem and challenges in Section 5.2, and present the Neural Network approach details in Section 5.3. Section 5.4 and Section 5.5 present respectively the experiments and the limitations of the proposed approach.

5.1 Introduction

In order to be fully comparable to the Inductive Logic Programming frameworks, the process as described in Chapter 4 is missing a major feature, that is, the faculty of producing explanations composed of aggregated statements. When looking at the results of Section 4.4, it is clear that we are dealing with large amounts of hypotheses that, although interesting and representative for each pattern, are “atomic” – composed in fact of a single path $\vec{p}_i$ and its ending value $v_j$. From the ILP frameworks, we know that better hypotheses can be obtained with aggregated facts, which in our case means that we should automatically combine Linked Data explanations¹. With that said, the complexity and scale of trying each possible combination in $\mathcal{B}$ is much too high, especially when the explanation set is very large.

¹ Note that in this chapter we will use combination and aggregation interchangeably.

Time and computational costs increase exponentially the more hypotheses we have in hand; besides, most of the combinations might simply not be interesting, i.e. they would not produce explanations that better represent the pattern. These quantitative and qualitative problems are well known in data mining, especially for those tasks dealing with association rules and frequent itemsets: to cope with them, results are generally processed a posteriori by human experts, whose role is to keep the interesting rules and filter out the unclear ones. To automatically assist the experts, many strategies have been proposed, ranging from using interestingness measures for rules [Geng and Hamilton, 2006] to including ontological knowledge [Marinica and Guillet, 2010; Anusha and Reddy, 2012; Nandhini et al., 2012; Ramesh et al., 2013]. With the aim of creating a process which does not rely on the experts, our challenge is to find a way to predict which explanations are worth aggregating, so as to save time and computation. In order to do so, it is first necessary to detect what knowledge about the atomic explanations is important for making predictions. This question is addressed in this chapter, where we present a Neural Network-based approach to predict whether two atomic hypotheses, if combined, will lead to a better pattern explanation. By “better”, we mean that the accuracy of the combination is higher than that of the atomic explanations on their own. We use statistical information about the candidate explanations (Precision, Recall and F-Measure) as indicators for a prediction. The trained Neural Network is then integrated in Dedalo as an automatic post-processing step.

Figure 5.1 Example of $\mathcal{B}$ for the “A Song of Ice and Fire” web popularity.

5.2 Motivation and Challenges

The section presents our motivation and challenges based on the toy example presented in Figure 5.1.

5.2.1 Improving Atomic Rules

Let us take again the “A Song of Ice and Fire” trend to illustrate the problem. Given the set of 5 weeks (where the first three correspond to +) and the background knowledge of E E B Figure 5.1, the explanations that we can obtain with Dedalo at the current moment are scored and ranked according to their F-Measure, as in Table 5.1.

Table 5.1 Atomic candidate explanations induced from Figure 5.1.

Explanation F " = ddl:linkedTo.dc:subject db:GoT-S03 0.8 1,1 h · i " = ddl:linkedTo.db:airedIn 2014 0.8 2,1 h · i " = ddl:linkedTo.rdf:type db:TVseriesEpisode 0.75 3,1 h · i " = ddl:linkedTo.dc:subject db:GoT-S04 0.5 1,2 h · i " = ddl:linkedTo.rdf:type db:SeasonFinale 0.4 3,2 h · i " = ddl:linkedTo.rdf:type db:HIMYM-S08 0 3,3 h · i

Based on that, we can derive the following:

some explanations, such as " , are frequent but not really meaningful with respect to the • 3,2 pattern +, as items outside it share it too; E some explanations are redundant, such as " and " that share the same roots; • 1,1 2,1 some explanations, such as " and " , if in a disjunction, would give a better explanation, • 1,1 1,2 since the union of their roots is more representative of + (the term is popular when either E the third or fourth season of the Game of Thrones TV series is on, i.e. " " =1); 1,1 _ 1,2 some, such as " and " , if in a conjunction, would give a better explanation, since • 1,1 3,1 the intersection of their roots is more representative of + (the term is popular during E TV series episodes that are from the third season of the Game of Thrones TV series, i.e. " " =1). 1,1 ^ 3,1 As said, our objective is to obtain the most representative explanation for a pattern, as Inductive Logic Programming frameworks would do. While in a simplified example such as the one above, trying each possible combination is not an issue, when dealing with thousands of explanations extracted from the Linked Data graph, the time and scale of the process increase exponentially. Trying each conjunction or disjunction becomes a strenuous process. Therefore, we require a process that is able to detect “promising combinations” or, in other words, a process that predicts which pairs of explanations is worth aggregating and which 102 | Aggregating Explanations using Neural Networks

In this scenario, the second issue is what makes a good prediction. What are the indicators for it? Is there any information about the atomic explanations that we induced that we can use to predict an aggregation?

5.2.2 Rule Interestingness Measures

In typical KDD subfields such as Association Rule Mining, Frequent Itemset Mining or Sequential Pattern Mining, researchers constantly try to produce new, scalable approaches to post-mine large sets of rules (or patterns). As said, the atomic explanations obtained by Dedalo can be compared to such patterns. Although numerous measures have been proposed to evaluate rule interestingness, there is no agreement on a formal definition of it. Standard measures include Accuracy in classification-oriented Predictive Induction, Precision and Recall in Information Retrieval, Sensitivity and Specificity in Medical Data Analysis, and Support and Confidence in Association Rule Learning. The very first part of our work consists in verifying whether the quality of the induced explanations is affected if we use different evaluation measures. For this purpose, we tested some probability-based objective interestingness measures presented in the literature. As expected, the results showed that the probability-based measures have common behaviours. Some measures are precision-like and favour explanations that apply in a high percentage of cases within the pattern (they are generally called rule reliability measures); some others are recall-like, and prefer comprehensive explanations that cover a large subset of the items in the dataset (rule generality measures); finally, some of them are F-Measure-like and give a good trade-off between generality and reliability. Below we show the summary of the measures surveyed in [Geng and Hamilton, 2006] that we tested.

1) Reliability. top($\mathcal{B}$) = ⟨ddl:linkedTo.dc:subject · db:GoT-S03⟩
Measures: Added Value, Confidence, Conviction, Information Gain, Leverage, Lift, Odd's Ratio, Precision, Prevalence, Yule's Q, Yule's Y, Two-Way Support.

2) Generality. top($\mathcal{B}$) = ⟨ddl:linkedTo.rdf:type · db:TVseriesEpisode⟩
Measures: Coverage, Recall, Relative Risk, Specificity, Support.

3) Trade-off. top($\mathcal{B}$) = ⟨ddl:linkedTo.rdf:type · db:TVseriesEpisode⟩

Measures: Accuracy, Correction, Cosine, F-Measure, Gini Index, Jaccard, Klosgen, Laplace, Linear Correlation Coefficient, Novelty, Weighted Relative Accuracy.

5.2.3 Neural Networks to Predict Combinations

The assumption we make is that the measures described above can be used to train a Neural Network model, whose aim is to predict the likelihood of a combination score improvement. Given a pair of explanations, statistically described by the aforementioned measures, the Neural Network can learn how to predict whether their disjunction or conjunction will bring an improvement (i.e. a combined explanation whose accuracy score is higher than the ones of the single atomic explanations) or not. Artificial Neural Networks (ANNs) are a well-established Machine Learning approach, imitating the learning mechanisms of brain neurons, to approximate real-valued, discrete-valued and vector-valued target functions [Michalski et al., 2013]. We believe that ANNs fit our problem for the following reasons:

• ANNs are connections of units producing one single output from a set of inputs. If we consider each measure as an input unit (a neuron) and a pair of explanations as input data, the desired output that we can obtain from a Neural Network is a boolean answer of whether the combination (conjunction or disjunction) of two given explanations is worth it (1) or not (0).

• ANNs can induce the dependencies of features from some training data to learn models that predict outputs for future values. Therefore, we can easily detect which measures are important for a combination of two explanations, and which ones are not.

• ANNs are data-driven and self-adaptive methods, which adjust their model when learning from the input data. This means that the ANN will iteratively use each pair description to gradually improve the prediction of a combination.

5.3 Proposed Approach

In this section, we first show how we train the Neural Network and then present how we integrate it in Dedalo as a post-processing step.

5.3.1 A Neural Network Model to Predict Aggregations

We built the Neural Network model with the following steps: (i) we chose the measures to use as features for learning; (ii) we created a dataset of training and test examples; (iii) we

trained and tested the model on them; (iv) we included the trained model in a process to combine explanations.

Feature Selection. The general idea we put forward is that there are unrevealed relationships between the features of a pair of explanations, and that these relationships can be the key to deciding whether the combination is promising (what we define as a “good prediction”) or not. To prove that, we experimented with different sets of measures as features, with the aim of finding the most efficient setting for the Neural Network training. Consistently with the analysis provided in Section 5.2.2 (as well as in [Geng and Hamilton, 2006]), we reduced the set of features to the three measures of Precision, Recall and F-Measure. Besides those “structural” features, the final set of features upon which we built the model also includes some “informative” features, namely the absolute differences between the structural features of the two explanations. They are presented in Table 5.2.

Table 5.2 Neural Network features. The input is a pair of explanations (ε_{i,j}, ε_{n,m}).

Type                  Feature      Description
Structural features   P1           Precision of ε_{i,j}
                      R1           Recall of ε_{i,j}
                      F1           F-Measure of ε_{i,j}
                      P2           Precision of ε_{n,m}
                      R2           Recall of ε_{n,m}
                      F2           F-Measure of ε_{n,m}
Informative features  |P1 – P2|    Difference between P1 and P2
                      |R1 – R2|    Difference between R1 and R2
                      |F1 – F2|    Difference between F1 and F2
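For concreteness, the sketch below shows how the nine features of Table 5.2 could be computed, assuming an atomic explanation is represented by the set of items it covers; the class and method names are illustrative and do not correspond to Dedalo's actual code.

import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch (not Dedalo's actual code): the nine features of Table 5.2
 *  for a pair of explanations, each represented by the set of items it covers. */
class PairFeatures {

    /** Precision: fraction of the items covered by the explanation that belong to the pattern. */
    static double precision(Set<String> covered, Set<String> pattern) {
        if (covered.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(covered);
        hit.retainAll(pattern);
        return (double) hit.size() / covered.size();
    }

    /** Recall: fraction of the pattern's items that the explanation covers. */
    static double recall(Set<String> covered, Set<String> pattern) {
        if (pattern.isEmpty()) return 0.0;
        Set<String> hit = new HashSet<>(pattern);
        hit.retainAll(covered);
        return (double) hit.size() / pattern.size();
    }

    static double fMeasure(double p, double r) {
        return (p + r == 0) ? 0.0 : 2 * p * r / (p + r);
    }

    /** Input vector [P1, R1, F1, P2, R2, F2, |P1-P2|, |R1-R2|, |F1-F2|] for the Neural Network. */
    static double[] features(Set<String> exp1, Set<String> exp2, Set<String> pattern) {
        double p1 = precision(exp1, pattern), r1 = recall(exp1, pattern), f1 = fMeasure(p1, r1);
        double p2 = precision(exp2, pattern), r2 = recall(exp2, pattern), f2 = fMeasure(p2, r2);
        return new double[] { p1, r1, f1, p2, r2, f2,
                Math.abs(p1 - p2), Math.abs(r1 - r2), Math.abs(f1 - f2) };
    }
}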

Training and Test Sets. In order to train the Neural Network model, we built the dataset using the explanations induced by Dedalo in Chapter 4. A dataset of 60,000 combinations was created automatically as follows:

• we randomly selected pairs of atomic explanations induced from 6 different sets of explanations, namely #KMi.A1, #KMi.A2, #KMi.P1, #KMi.P2, #KMi.H1 and #KMi.H2, which were already presented in the experiments of Chapter 3;
• we calculated the F-Measure of their disjunction and of their conjunction (for a total of two combinations per pair);
• we reported as a bi-dimensional vector of boolean answers whether the F-Measure of each new combination improved (1) or not (0) with respect to the F-Measure of the single explanations.

The resulting dataset was evenly partitioned, i.e. we obtained 15,000 combinations for each of the four possible outputs [1, 1], [1, 0], [0, 1], [0, 0] (where [1, 1] means that both the disjunction and the conjunction are worth doing, while [0, 0] means that neither of them is). This was finally divided into a training set and a test set of 30,000 combinations each.
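A minimal sketch of how the two boolean targets of a training example could be derived is given below, under the assumption (not stated explicitly above) that "improvement" means exceeding the better of the two single F-Measures; the helper names are hypothetical.

import java.util.HashSet;
import java.util.Set;

/** Illustrative sketch: deriving the bi-dimensional boolean target of a training pair.
 *  Assumes (as above) that an explanation is the set of items it covers, and that an
 *  improvement means exceeding the better of the two single F-Measures. */
class PairLabels {

    /** Returns [disjunctionImproves, conjunctionImproves] encoded as 1.0 / 0.0. */
    static double[] labels(Set<String> exp1, Set<String> exp2, Set<String> pattern) {
        double best = Math.max(f(exp1, pattern), f(exp2, pattern));

        Set<String> disjunction = new HashSet<>(exp1);   // items covered by exp1 OR exp2
        disjunction.addAll(exp2);
        Set<String> conjunction = new HashSet<>(exp1);   // items covered by exp1 AND exp2
        conjunction.retainAll(exp2);

        return new double[] {
                f(disjunction, pattern) > best ? 1.0 : 0.0,
                f(conjunction, pattern) > best ? 1.0 : 0.0
        };
    }

    /** F-Measure of a set of covered items against the pattern. */
    static double f(Set<String> covered, Set<String> pattern) {
        Set<String> hit = new HashSet<>(covered);
        hit.retainAll(pattern);
        double p = covered.isEmpty() ? 0 : (double) hit.size() / covered.size();
        double r = pattern.isEmpty() ? 0 : (double) hit.size() / pattern.size();
        return (p + r == 0) ? 0 : 2 * p * r / (p + r);
    }
}

Collecting 15,000 pairs for each of the four possible outputs with such a procedure yields the balanced 60,000-example dataset described above.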

Learning. The Neural Network model was generated using the Neuroph Java library2. Table 5.3 reports the technical characteristics of the model that resulted from the tuning we performed with the objective of minimising the Mean Squared Error (MSE).

Table 5.3 Technical details about the trained Neural Network.

Neural Network
  ANN model             Feedforward Multilayer Perceptron
  Learning rule         Resilient Backpropagation
  Activation function   Sigmoid function
  Input neurons         9
  Hidden neurons        12
  Output neurons        2

Learning
  Max iterations        3,000
  Learning rate         0.2
  Tuning set split      70%
  Validation set split  30%
  Max error             0.1

In the testing phase, the model reported an MSE of 0.24 and a Root-Mean-Squared Deviation3 of 0.49. This level of accuracy confirmed our intuition that using the measures as indicators for predicting the result of a combination is, to a certain extent, feasible, and that the Neural Network model is therefore applicable to our purposes. The next section describes the Neural Network-based process to aggregate candidate explanations.

5.3.2 Integrating the Model in Dedalo

Once the Neural Network model (NNet) is trained, our goal is to use it in a process for detecting, in a large set of candidate explanations, which ones are worth combining. We define a prediction value indicator p for each explanation pair (ε_{i,j}, ε_{n,m}) as:

p = nnet(ε_{i,j}, ε_{n,m}) × max(F(ε_{i,j}), F(ε_{n,m}))    (5.1)

2http://neuroph.sourceforge.net
3https://en.wikipedia.org/wiki/Root-mean-square_deviation

The indicator takes into account the prediction returned by the Neural Network and combines it with the highest F-Measure in the pair under analysis, i.e. max(F(ε_{i,j}), F(ε_{n,m})). In this manner, we favour pairs with a high probability of obtaining an improvement over a good F-Measure, rather than pairs with a very low F-Measure. As our objective is to obtain the best explanation of a pattern within the shortest time, the aggregation process also uses the explanation top(B) with the best F-Measure to start predicting the combinations. Given the set B of candidate explanations and top(B), each explanation is paired with top(B) to obtain its prediction score. If this is positive, we produce the actual conjunction or disjunction of the pair (combine(top(B), ε_{i,j}) returns either (top(B) ∨ ε_{i,j}) or (top(B) ∧ ε_{i,j}), or both) and add them to B.

Algorithm 2 Candidate Explanation Aggregation
stop ← false
B ← list()                                   ▷ List of atomic rules
while (not stop) do
    P ← list()                               ▷ List of predictions
    top_explanation ← top(B)
    for explanation in B do                  ▷ NNet prediction
        p ← nnet(top_explanation, explanation)
        if (p > 0) then
            add((top_explanation, explanation), P)   ▷ Add the pair if it is worth it
        end if
    end for
    for (exp1, exp2) in P do                 ▷ Combine promising pairs
        new_combination ← combine(exp1, exp2)        ▷ Evaluate combination
        add(new_combination, B)              ▷ Add to B
    end for
end while

The process, detailed in Algorithm 2, is repeated iteratively in order to improve the top(B) score, until a predefined stopping condition is met.
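The sketch below renders Algorithm 2 and Equation 5.1 in Java. It assumes a trained model wrapped by a hypothetical nnetScore() call and a combine() step producing the conjunction and/or disjunction of a pair; both are placeholders rather than Dedalo's actual implementation.

import java.util.ArrayList;
import java.util.List;

/** Sketch of Algorithm 2: aggregating candidate explanations guided by the trained model.
 *  Explanation, nnetScore() and combine() are placeholders, not Dedalo's actual structures. */
class NNetAggregation {

    interface Explanation { double fMeasure(); }

    /** Placeholder for the trained Neural Network: a prediction in [0, 1] for the pair,
     *  e.g. the strongest of the two output neurons. */
    static double nnetScore(Explanation a, Explanation b) { return 0.0; }

    /** Placeholder for the combination step: the conjunction and/or disjunction of a pair. */
    static List<Explanation> combine(Explanation a, Explanation b) { return new ArrayList<>(); }

    static void aggregate(List<Explanation> candidates, int maxIterations) {
        for (int it = 0; it < maxIterations; it++) {
            // top(B): the candidate explanation with the best F-Measure so far
            Explanation top = candidates.stream()
                    .max((a, b) -> Double.compare(a.fMeasure(), b.fMeasure())).orElseThrow();

            List<Explanation> combinations = new ArrayList<>();
            for (Explanation e : candidates) {
                if (e == top) continue;
                // Equation 5.1: NNet prediction weighted by the best F-Measure of the pair
                double p = nnetScore(top, e) * Math.max(top.fMeasure(), e.fMeasure());
                if (p > 0) {                 // NNet-0; NNet-50 would test p > 0.5 * top.fMeasure()
                    combinations.addAll(combine(top, e));
                }
            }
            candidates.addAll(combinations); // the evaluated combinations join B for the next round
        }
    }
}

In this form, switching between the NNet-0 and NNet-50 settings introduced in Section 5.4.1 only changes the threshold tested in the inner loop.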

5.4 Experiments

To test our approach, we compared our Neural Network-based process with some strategies that might be adopted in common rule aggregation processes. We aim at analysing two characteristics of each of them: (i) the speed in reaching a new (better) F-Measure, and (ii) its accuracy, understood as the level of improvement obtained compared to the previous best F-Measure.

5.4.1 Comparing Strategies for Rule Aggregation

Those strategies are presented below.

Random. Our baseline consists in selecting a random explanation to be combined with top(B). While this is certainly a fast process, our assumption is that the new combination will not be very accurate.

AllComb. The most naïve approach to explanation aggregation is to try each and every possible combination in B, and then to detect the one(s) with the highest score. This strategy should favour accuracy, but is likely to have a drawback in terms of speed.

Top100. A naïve approach could also assume that the best score improvement is to be found among the top 100 rules of the list, and therefore try every possible combination among them. While this strategy can be good at avoiding redundant or useless combinations, its accuracy is not guaranteed.

First. This strategy assumes that only the first rule will provide the best improvements when combined with one of the rules in B. Therefore, at each iteration, top(B) is combined with all the rules in B. Speed might not be guaranteed.

δ. Expecting computational and time problems, this strategy refines the previous one. Meaningless aggregations are filtered out by applying a threshold on the F-Measure of the aggregations to be added to B. We set the threshold to the highest score, i.e. the F-Measure of top(B) at each iteration. In other words, we keep only combinations that improve over the current best score.

Besides those strategies, we chose two different settings for the Neural Network-integrated approach.

NNet-0. Given a positive prediction for a pair, the pair is always combined. While this strategy's accuracy is high, it might not be fast.

NNet-50. Given a positive prediction for a pair, the pair is combined only if the prediction is higher than 50% of the highest score at the current iteration. This avoids creating meaningless combinations, favouring speed, but might result in reduced accuracy.

5.4.2 Results and Discussion

The strategies were tested on the #KMi.A, #KMi.P and #KMi.H datasets. We selected two sets of combinations for each of them, for a total of six experiments. Table 5.4 presents information about each of them: the size of the initial set |E+| of items grouped and the size |E| of the full dataset, the number of atomic explanations from which we started (|B|), the best explanation (its Precision, Recall and F-Measure scores) before the aggregation process, and the RAM and seconds allocated to run the experiments. Because our goal is to improve the explanation accuracy, the following results do not report the explanations themselves, but only their scores. The results in their entirety are publicly available online4, while some examples of combinations are given at the end of the section.

Table 5.4 Experiments set-up information.

Test       |E+|   |E|     |B|      P      R      F      RAM    time (sec.)
#KMi.A1    22     92      369      0.69   0.72   0.71   4G     60”
#KMi.A2    23     92      511      1.0    0.43   0.61   4G     60”
#KMi.P1    220    865     747      0.55   0.54   0.55   4G     75”
#KMi.P2    601    865     1,796    0.76   0.19   0.31   4G     160”
#KMi.H1    335    6,969   11,937   0.69   0.12   0.20   10G    2,500”
#KMi.H2    919    6,969   11,151   0.93   0.07   0.13   10G    3,000”

At each iteration, the best score among the set of combinations and the time spent are logged. The results are summarised in Figures 5.2, 5.3 and 5.4, where the X axis is the time elapsed since the beginning of the process (the speed criterion) and the Y axis is the score of the aggregated explanations (the accuracy criterion). In Figure 5.2a, First, δ and Top100 find the best score before the NNet-0 and the AllComb approaches, which can be explained by the small number of explanations to be combined. The test-case size also explains why the Random strategy does fairly well in Figure 5.2a. As soon as the set of candidate explanations grows, as in Figure 5.2b, the First, Top100 and Random strategies take much more time to find the best score, because they have to deal with many redundant combinations that could be avoided. Given the small size of the explanation sets in the #KMi.A test, there is no need to distinguish between NNet-0 and NNet-50. As the #KMi.P sets of explanations are medium-sized, Figure 5.3a still shows a good time/accuracy performance for AllComb, First and Top100. This example, however, reveals a limitation of Top100: after a certain number of combinations among the best explanations, no further score improvement can be obtained. This demonstrates that the best score increases are not to be found by combining only the explanations with the best scores.

4http://people.kmi.open.ac.uk/ilaria/experiments/nnets/

Figure 5.2 #KMi.A results: (a) #KMi.A1, Semantic Web researchers; (b) #KMi.A2, Learning Analytics researchers.

As soon as the number of combinations becomes larger (see Figure 5.3b), none of AllComb, First or δ is able to reach the end of the process: despite being fast and accurate in a first instance, they encounter memory issues (highlighted with a dot at the end of the lines in the figures). Meanwhile, the two Neural Network approaches show no such issue and both reach the end of the process. The benefits of the NNet-based approaches, in terms of the time required and explanation accuracy, are even more visible in Figure 5.4, where the number of explanations to be combined is much higher. NNet-0 and NNet-50 are the strategies that reach the best scores in the shortest time, without running into memory issues.

Figure 5.3 #KMi.P results: (a) #KMi.P1, papers about Semantic Web; (b) #KMi.P2, papers about Learning Analytics. In (a), AllComb, First and δ are almost as fast as NNet-0 because the explanation set size is still small enough. In (b), they cannot reach the end of the process.

As in the previous examples, AllComb, First and δ present an initial fast improvement but cannot run until the end. This means that those strategies are indeed accurate, but cannot realistically be applied without high computational effort. As for the comparison between NNet-0 and NNet-50, we did not experience any computational issues in our settings. NNet-0 is in general faster, probably because the filtering used in NNet-50 introduces an overhead at this scale. In the case of Figure 5.4b, the higher accuracy eventually reached by NNet-50 is probably to be attributed to the increasing amount of candidate explanations to be aggregated, which makes the filtering more useful.

Figure 5.4 #KMi.H results: (a) #KMi.H1, Music Technology borrowings; (b) #KMi.H2, Theatre borrowings. The memory issues of AllComb, First and δ are much more visible here, as is the better performance of the two Neural Network-based approaches.

In Table 5.5 we summarise the characteristics of each strategy in terms of speed, explanation accuracy and the computational effort required, where ✗✗ corresponds to “not performing at all” and ✓✓ means “fully performing”.

The results presented above show that learning how to predict an aggregation likelihood is not only feasible, but also that significant results (in terms of rule accuracy) can be achieved using very simple informative and structural information about the rules.

Table 5.5 Summary of speed, accuracy and computational efforts for the analysed strategies.

Strategy   Speed   Accuracy   Scalability
Random     ✓✓      ✗          ✓✓
AllComb    ✓       ✓✓         ✗✗
Top100     ✓       ✗✗         ✓
First      ✓       ✗✗         ✗✗
δ          ✓       ✗✗         ✗✗
NNet-0     ✓✓      ✓✓         ✓✓
NNet-50    ✓       ✓          ✓✓

While the proposed method is indeed valuable, we remarked that one of its limitations is the complexity added to the readability of the aggregated rules. For instance, an aggregated explanation obtained by the NNet-0 model for #KMi.A1 at the end of the process (60”) is the following:

ε_{1,1} = ⟨ou:hasMembership · ou:KMi⟩ ∨ ⟨ou:interestedIn · outag:language-and-speech-technology⟩ ∨ ⟨ou:interestedIn · outag:social-web⟩ ∨ ⟨ou:interestedIn · outag:uncertain-reasoning⟩ ∨ ⟨ou:hasMembership · ou:LuceroProject⟩ ∨ ⟨ou:hasMembership.ox:hasPrincipalInvestigator · ou:EnricoMotta⟩

which can be interpreted as “those researchers are clustered together because they either belong to the Knowledge Media Institute, or they are interested in Language and Speech Technologies, in the Social Web, or in Uncertain Reasoning, or they are part of the Lucero Project or of another project that was led by Enrico Motta”. Given this example, it is easy to foresee that explanations become even more complex as more rules are aggregated. For example, in the case of #KMi.H1 the explanation

ε_{2,1} = ⟨dc:subject · bnb:ElectronicMusic⟩ ∨ ⟨dc:subject · bnb:MotionPictureMusic⟩ ∨ ⟨dc:subject · bnb:VocalMusic⟩ ∨ ⟨dc:subject · bnb:PopularMusic⟩ ∨ ⟨dc:subject.skos:broader · lcsh:AudioSequencers⟩ ∨ ⟨dc:subject.skos:broader · lcsh:CCTV⟩

composed of 5 aggregated rules was obtained after 27”; however, the best explanation after 2,500” was composed of 46 atomic rules. This means that the aggregated explanations we obtain at the end of the Neural Network process are exceedingly complex, especially when compared to an explanation that an expert would give (see, for instance, the explanations given by the users during Dedalo's evaluation, cf. Chapter 7). Considering that our goal is to obtain explanations comparable to the ones humans give, we intentionally decided to tackle the next challenges in a different direction, focusing on identifying more complete (but atomic) explanations with respect to our Explanation Ontology. With that said, an interesting path to follow in the future would be to study ways of reducing the complexity added by the aggregation process by using, for instance, external knowledge sources. This is left for future work.

5.5 Conclusions and Limitations

This chapter presented how we improved Dedalo's process by making it able to produce explanations from aggregated Linked Data statements. We proposed an approach using an Artificial Neural Network model to predict whether two candidate explanations, if aggregated, can lead to a new explanation that better represents the pattern under observation. To evaluate our approach, we compared it with other rule combination strategies, which tend to fail in terms of speed, scalability or accuracy. The results show that the NNet-based strategies significantly outperform the others, because they reduce both time and computational effort. While the work achieved so far is a good step towards automatically explaining patterns with Linked Data, new open issues need to be tackled, as discussed below.

1. The Neural Network has to be integrated. The process we have presented is applied in a post-processing phase: a set of atomic Linked Data explanations is produced within a certain number of iterations, and only afterwards is the Neural Network process applied. It would be interesting to integrate the model directly within the Linked Data Traversal process, so that, given a set of new atomic explanations found in Linked Data, we can generate more complete (and aggregated) pattern explanations thanks to the Neural Network prediction. This would also be beneficial for pruning the set of Linked Data explanations during the process and for reducing their complexity.

2. The model feature set might be trivial. While we were able to use statistical measures to establish good predictions, there are other characteristics that could be used as new features to improve the model's accuracy. More specifically, paths and values have topological features in the graph of Linked Data that could make a prediction more accurate. This aspect is related to the challenge discussed in Chapter 6, where we attempt to find a contextual relationship between the pattern and a candidate explanation.

Besides those, we remind the reader that some issues from the previous chapters are still open.

3. The candidate explanations (atomic or aggregated) are still only statistically assessed, and they assume that no information is lacking in Linked Data.

4. We are still unable to automatically assess the context and the theory that complete a generated explanation.

Chapter 6

Contextualising Explanations with the Web of Data

This chapter contributes to answering our last research question (Section 1.3.4), i.e. how to assess the validity of the generated explanations. We focus on how to find the context that relates an induced explanation and a pattern in the Web of Data using a blind graph search. To do so, we need to learn a cost-function supporting the detection of the strongest relationships between Linked Data entities. The chapter is articulated as follows: Section 6.1 gives an overview; Section 6.2 presents the problem of finding strong relationships in detail; Section 6.3 presents the evolutionary algorithm that we propose to learn the best cost-functions, which are then presented and compared in Section 6.4. Finally, in Section 6.5 we discuss the approach's limitations before closing the second part of our work.

6.1 Introduction

The previous chapters focused on the induction of candidate explanations, by showing how to obtain them and how to improve them through aggregation. In terms of the Explanation Ontology (Section 2.1.2), we can say that, so far, we are able to automatically produce anterior events, but we still do not know which context relates them to the posterior event, i.e. the pattern. The goal of this chapter is to identify such a context using the Web of Data. In other words, we have to identify what the relationship between a pattern and a candidate explanation is. Identifying the relationship between entities, sometimes referred to as entity relatedness, is a well-known problem for a wide range of tasks, such as text mining and named-entity disambiguation in Natural Language Processing, or ontology population and query expansion in Semantic Web activities. One approach to identifying such a relation is to estimate it using metrics based on some background knowledge such as WordNet1, Wikipedia [Strube and Ponzetto, 2006; Gabrilovich and Markovitch, 2007; Witten and Milne, 2008; Yeh et al., 2009] or the Web [Chen et al., 2006; Sahami and Heilman, 2006; Bollegala et al., 2007; Cilibrasi and Vitanyi, 2007]. With those, we could for instance identify that there is a very strong relationship between the novel “A Song of Ice and Fire” and its writer George R. R. Martin, while the relationship between the novel and Kit Harington, who portrays the character of Jon Snow in the TV series, is much weaker. Although those measures succeed in quantitatively measuring the strength of a relationship, they fail in qualitatively expressing it: they can tell how much two entities are related, but not how. From a Web of Data perspective, entity relatedness can be identified in the graph of Linked Data as a path between two given entities, and graph search techniques for pathfinding can therefore be used to reveal which paths exist between them. In our specific case, we could take the value of a candidate explanation (e.g. db:GameOfThrones-TVseries) and a second value characterising the pattern (e.g. we could choose to map our search term to the DBpedia resource db:ASongOfIceAndFire(novel)) and find their relation, e.g. the path of Figure 6.1, which reveals that the two entities are connected through their broader category dbc:ASongOfIceAndFire(topic).

Figure 6.1 Example of Linked Data context between “A Song of Ice and Fire” and the TV series “Game of Thrones”.

As shown in Chapter 2, most of the existing pathfinding techniques in Linked Data use informed search methods, i.e. they pre-compute a portion of the graph (a limited number of datasets), and therefore violate the principle of incremental and serendipitous exploration of Linked Data through link traversal. If we aim at performing an efficient uninformed graph search, we need to find a cost-function that identifies the correct paths, i.e. the relationships that best represent the context relating two entities. Our assumption is that the structure of the Linked Data graph can be used by a cost-function to successfully evaluate paths. While one could intuitively think that the shortest paths reveal the strongest connections, this assumption does not hold within the Linked Data space, where entities of different datasets are connected by multiple paths of similar lengths. Our challenge, therefore, is to find which Linked Data structural information we need in order to design a cost-function that objectively assesses the value of a path. More specifically, we aim at discovering which topological and semantic features of the traversed nodes and properties can be used to reveal the strongest relationships between entities. The approach we propose in this chapter is a supervised method based on Genetic Programming, whose scope is to learn the path evaluation function to integrate in a Linked Data pathfinding process in order to identify an explanation's context. Our idea is that, starting from a randomly generated population of cost-functions created from a set of topological and semantic characteristics of the Linked Data graph, the evolutionary algorithm will reveal which functions best compare with human evaluations, and will show us what really matters for assessing strong relationships in the Web of Data. We compare and discuss the learnt cost-functions in our experiments, where we demonstrate not only that good results are achieved using basic topological features of the nodes of the paths as they are being traversed, but also that those results can be improved by introducing a very small amount of semantic knowledge, such as the vocabularies labelling the edges connecting the nodes.

1https://wordnet.princeton.edu/

6.2 Problem Statement

The context between a candidate explanation and a pattern can be identified using a uniform-cost search based on Link Traversal. We remind the reader from Section 2.2.2 that the uniform-cost search (ucs) is a best-first strategy where the cost-function g(n) chooses the next node to expand based on the cumulative cost of the edges from the start node to that node. As said, the search does not require holding the whole graph in memory, which makes it particularly suitable for large graphs, as in our case. Based on that, we designed a Linked Data pathfinding process consisting in a bi-directional uniform-cost search whose aim is to find the path p_i = ⟨n^l ... n^r⟩ that best represents the relation between two entities n^l and n^r, where the former corresponds to a Linked Data entity matching the pattern under observation, and the latter to the end value of an explanation ε_{i,j} = ⟨→p_i · v_j⟩. We use the p_i notation to distinguish the path expressing the context, which is a sequence of nodes and edges, from the path →p_i expressing a candidate explanation, which is in fact only a chain of edges. Note that, in a space such as Linked Data, where the number of entities retrieved at each node expansion grows exponentially, bi-directionality is needed to significantly reduce the search space, as well as the time and computational effort.

Let us assume that, in our running example about weeks of popularity of the term A Song of Ice and Fire, we have obtained an explanation such as “A Song of Ice and Fire is popular in those weeks when an episode of Game of Thrones is aired”. At this stage, we want to identify the strongest relationship between n^l_1 = db:ASongOfIceAndFire(novel), which we chose to represent the term searched by the users, and n^r_2 = db:GOT-TVseries(episodes), which was obtained from the induction process. The process takes as input n^l and n^r, and gives as output the path which best expresses their relationship, along with a score indicating its strength (see Figure 6.2).

Figure 6.2 An example with multiple paths between db:ASongOfIceAndFire(novel) and db:GOT-TVseries(episodes).

The Linked Data pathfinding process then consists in:

(1) Entity dereferencing. Entities are dereferenced to find all the entities they are linked to, as during the induction process. Contrary to it, however, we follow both the incoming and the outgoing RDF properties of an entity. In the example of Figure 6.2, both n_1 and n_2 are linked to 4 entities.

(2) Bi-directional search. Given the two nodes n^l and n^r, two uniform-cost searches ucs^l and ucs^r are performed simultaneously. Their objective is to iteratively build two search spaces, a right-directed one from n^l and a left-directed one from n^r, in order to find a common node n^c. Since bi-directionality does not guarantee optimality, e.g. db:ASongOfIceAndFire(topic) might be found before db:GOT-TVseries(season), we collect all the common nodes found until a certain number of iterations is reached.

(3) Path building. Given a common node n^c, we build the two subpaths p_j = ⟨n^j ... n^c⟩ with j ∈ {l, r}, and then merge them into the Linked Data path p = ⟨n^l ... n^c ... n^r⟩. Each path identifies a relationship between the two initial entities. The graph of Figure 6.2 represents all the paths existing between n_1 and n_2 after a few iterations.

(4) Path scoring. The cost of each path is evaluated as an aggregation (most often a sum, see Section 6.3.2 for alternatives) of the costs of the paths from n^l to n^c and from n^r to n^c. The path with the lowest cost, highlighted in the figure, is chosen as the strongest relationship between the input entities.
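The following sketch outlines how steps (1)–(4) could be implemented, assuming a dereference() call that returns the neighbours of an entity and an edgeCost() placeholder standing in for the learnt cost-function; it is a minimal illustration under these assumptions, not the actual Dedalo code.

import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;

/** Minimal sketch of the bi-directional uniform-cost search of steps (1)-(4).
 *  dereference() and edgeCost() stand in for Linked Data traversal and the learnt cost-function. */
class ContextSearch {

    /** Placeholder: follows the incoming and outgoing links of a dereferenced entity. */
    static List<String> dereference(String entity) { return List.of(); }

    /** Placeholder: cost of traversing an edge, as assigned by the cost-function g. */
    static double edgeCost(String from, String to) { return 1.0; }

    record State(double cost, List<String> path) {}

    /** Expands the cheapest frontier node of one search and records the best path to each entity. */
    static void step(PriorityQueue<State> frontier, Map<String, State> visited) {
        State s = frontier.poll();
        if (s == null) return;
        String last = s.path().get(s.path().size() - 1);
        for (String next : dereference(last)) {            // step (1): entity dereferencing
            double c = s.cost() + edgeCost(last, next);
            if (!visited.containsKey(next) || visited.get(next).cost() > c) {
                List<String> p = new ArrayList<>(s.path());
                p.add(next);
                State ns = new State(c, p);
                visited.put(next, ns);
                frontier.add(ns);
            }
        }
    }

    /** Runs the two searches (step 2) and merges the paths found at common nodes (steps 3-4). */
    static List<String> findContext(String nl, String nr, int maxIterations) {
        Comparator<State> byCost = Comparator.comparingDouble(State::cost);
        PriorityQueue<State> left = new PriorityQueue<>(byCost), right = new PriorityQueue<>(byCost);
        Map<String, State> seenL = new HashMap<>(), seenR = new HashMap<>();
        left.add(new State(0, List.of(nl)));  seenL.put(nl, left.peek());
        right.add(new State(0, List.of(nr))); seenR.put(nr, right.peek());

        List<String> best = null;
        double bestCost = Double.MAX_VALUE;
        for (int i = 0; i < maxIterations; i++) {
            step(left, seenL);
            step(right, seenR);
            for (String common : seenL.keySet()) {          // step (3): path building at common nodes
                if (!seenR.containsKey(common)) continue;
                double cost = seenL.get(common).cost() + seenR.get(common).cost(); // step (4): scoring
                if (cost < bestCost) {
                    List<String> merged = new ArrayList<>(seenL.get(common).path());
                    List<String> back = new ArrayList<>(seenR.get(common).path());
                    Collections.reverse(back);
                    merged.addAll(back.subList(1, back.size()));
                    bestCost = cost;
                    best = merged;
                }
            }
        }
        return best;   // the lowest-cost path n^l ... n^c ... n^r found so far
    }
}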

Challenge. From the process described above, it becomes clear that a good path evaluation function is necessary to choose among a set of alternative paths between two entities, and to avoid computational efforts or inconclusive searches. The question arising here is what the best strategy is to find the most representative relationships, and whether we can exploit the information in the Web of Data to guide the two searches in Linked Data in the right direction, so that they can quickly converge. In fact, when looking at the paths in Figure 6.2, an interesting observation can be made: the node corresponding to the entity db:GameOfThrones-TVseries is much less “popular” than other nodes, such as the ones labelled dbc:ASongOfIceAndFire(topic) or db:UnitedStates. This information could be used in the example above to automatically assess that the best relationship is the one we highlighted, which, although longer, better specifies the relation between n_1 and n_2. In other words, the structural features of the graph could be a good insight to drive our blind search in Linked Data. Given this, the challenge is: what makes a path important? Which are the topological or semantic features of a node or an edge that we should care about when deciding whether a path is better than another? To reformulate the problem: can we use the structure of a graph to assess relationship strengths?

Proposed Approach. Our proposition is to use a supervised Genetic Programming (GP) approach to identify the cost-function that performs best in ranking sets of alternative relationship paths. Our idea is that, starting from a random population of cost-functions created from a set of features corresponding to the structural characteristics of the graph, the evolutionary algorithm will learn the best cost-function based on a benchmark of human-evaluated relationship paths. The next section presents our attempt to use Genetic Programming to discover which features of the Web of Data matter in a graph search when assessing the strongest relationships between entities.

6.3 Learning Path Evaluation Functions through Genetic Programming

This section presents the Genetic Programming approach that we use to discover the cost-function upon which we build the process to identify explanation contexts.

6.3.1 Genetic Programming Foundations

Inspired by Darwin's theory of evolution, Genetic Programming (GP) is an Artificial Intelligence technique that aims at automatically solving problems in which the solution is not known in advance [Poli et al., 2008]. The general idea is to create a population of computer programs, the candidate solutions for a problem, which the Genetic Programming algorithm stochastically transforms (“evolves”) into new, possibly improved programs. This randomness is what makes GP successful at proposing novel and unexpected solutions to a given problem.

Algorithm 3 Generic GP Algorithm
Create a random population of programs based on the set of primitives
repeat
    Execute each program
    Assess with the fitness measure how good the program is
    Randomly select one or two programs
    Create a new individual from those using the genetic operations
    Add the new individual to the population
until an acceptable solution is found, or some stopping conditions are met
return the best program in the current population

Programs are generally represented as trees of primitive elements, where the internal nodes (generally operations) are called functions, while the leaf nodes (constants or variables) are called terminals. The process is sketched in Algorithm 3. An initial population is randomly generated using the predefined primitives. For each program, a fitness function measures how good the program is with respect to the problem. After that, a new population is created by adding programs generated with one of the three following operations:

i) reproduction, in which a new child program is generated by copying a randomly selected parent program;
ii) crossover, in which a child program is generated by combining randomly chosen parts from two randomly selected parent programs;
iii) mutation, in which a new child program is generated by randomly altering a randomly chosen part of a selected parent.

This process is iterated until a termination condition is met, typically either when a maximum number of generations is reached or when a satisfying, possibly optimal, solution (e.g. a desired fitness) is found. Along with the primitive set, the fitness function and the termination condition, a set of parameters such as the population size, the probabilities of performing the genetic operations, the selection methodology and the maximum size of the programs need to be decided in order to control the GP process. The GP framework can then be applied to find the cost-function that performs best in evaluating (and ranking) alternative paths between two Linked Data entities. One of the main advantages of Genetic Programming is that it comfortably deals with wide search spaces: this means that we can create a large population of possible cost-functions whose primitives are Linked Data structural features without worrying about scalability issues, and obtain a cost-function that is nearly optimal in ranking alternative Linked Data paths. Moreover, the execution times of a GP run are very short when compared to traditional classification methods, which makes the learning process more flexible, so that we can refine and improve parameters or primitives according to our needs. In the same way, this flexibility allows us to impose new constraints (or remove some) on the fitness function depending on the success or failure of the GP executions. Finally, and most importantly, GP is not a black-box method: its results are human-understandable, which means that we can interpret the found solution and identify the structural features of a Linked Data path that matter for a successful search.
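As an illustration of this tree representation (cf. Figure 6.3), the sketch below models programs as trees whose internal nodes are the arithmetic functions of the primitive set and whose leaves are constants or aggregated terminals, anticipating the primitives defined in the next section. The protected division and the guarded logarithm are common GP conventions assumed here, not details taken from the actual implementation.

import java.util.List;
import java.util.function.ToDoubleFunction;

/** Illustrative sketch of a GP program tree: internal nodes are the arithmetic functions of
 *  the primitive set, leaves are constants or aggregated terminals evaluated on a path. */
interface Program {
    double eval(List<String> path);
}

record Constant(double value) implements Program {
    public double eval(List<String> path) { return value; }
}

/** An aggregated terminal such as SUM.IN or MIN.NS, supplied as a function over the path. */
record AggregatedTerminal(ToDoubleFunction<List<String>> aggregatedWeight) implements Program {
    public double eval(List<String> path) { return aggregatedWeight.applyAsDouble(path); }
}

record BinaryOp(char op, Program left, Program right) implements Program {
    public double eval(List<String> path) {
        double l = left.eval(path), r = right.eval(path);
        return switch (op) {
            case '+' -> l + r;
            case '*' -> l * r;
            default  -> r == 0 ? 0 : l / r;   // protected division, a usual GP convention
        };
    }
}

record Log(Program child) implements Program {
    // guarded against non-positive arguments, again a conventional GP safeguard
    public double eval(List<String> path) { return Math.log(Math.max(child.eval(path), 1e-9)); }
}

Evaluating a cost-function on a candidate path then amounts to calling eval() on the root of such a tree.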

6.3.2 Preparatory Steps

The Genetic Programming process we show requires some preparatory steps before the actual run. For a better understanding, we invite the reader to use as a reference the graph of Figure 6.2 and the three following paths:

p1 =

p2 =

p3 = 122 | Contextualising Explanations with the Web of Data

Process. Let P_i = {p_1, ..., p_{|P_i|}} be a set of |P_i| alternative paths between two Linked Data entities, D = {P_1, ..., P_{|D|}} the set of |D| examples that have been ranked by humans, and G = {g_1, ..., g_{|G|}} a starting population of randomly generated cost-functions g_j. The Genetic Programming algorithm iteratively evolves the population into a new, possibly improved one, until a stopping condition is met. The evolution consists first in assigning each cost-function a fitness score, which in our case reflects how “good” a cost-function is at ranking paths compared to the human evaluators. For instance, assuming 3 users have agreed on ranking p_2@1, p_3@2 and p_1@3, the functions scoring the three paths in the same order will obtain the highest score. Then, reproduction, mutation and crossover are applied to some randomly chosen individuals, and the generated children are added to the new population. Once the new population reaches the same size as the current one, the current population is replaced by the evolved one, and a new generation starts.

Primitives. Terminals and functions are called the primitive set. A terminal can be:

1) a constant, i.e. a randomly chosen integer in the set Z = {0, ..., 1000};
2) a combination of an edge-weighting function w(e) (with e being the edge) and an aggregator a. We call this combination a.w an aggregated terminal.

Edge-weighting functions assign a weight to each edge of the path, based on the information of its starting node (the source). We define 10 edge-weighting functions, which we divide into topological and semantic terminals. Topological terminals focus on the plain structure of the Linked Data graph, and are as follows.

• Fixed Weight (1): the edge is assigned a score of 1. This is equivalent to performing a breadth-first search, where nodes are queued and explored in the order they are found.

• Indegree (IN): the edge is weighted according to the number of incoming links of its source. For instance, the edge db:birthPlace(db:GeorgeRRMartin, db:UnitedStates) of Figure 6.2 has a weight of 2, since the source db:GeorgeRRMartin has 2 incoming links. This feature is chosen to understand the importance of “authority” nodes, i.e. the ones with many incoming links.

• Outdegree (OU): the edge is weighted according to the number of outgoing links of its source, e.g. the weight in the previous example is 2. OU helps us study the importance of “hub” nodes that point to many other nodes.

• Degree (DG): the edge is weighted based on the degree of the source, i.e. the sum of IN and OU. In the previous example, DG would assign a score of 4.

• Conditional Degree (CD): the weight attributed to the edge depends on the RDF triple from which the edge was generated. Each edge e(u, v) is generated from a dereferenced RDF triple, either ‹u, e, v›, as in the case of db:birthPlace(db:GeorgeRRMartin, db:UnitedStates), or ‹v, e, u›, as for db:producer(db:GeorgeRRMartin, db:GOT-TVseries). The CD terminal returns either the indegree or the outdegree of the source, depending on whether the triple represents a back or a forward link. Therefore, CD would return 2 in the former case (the indegree of db:GeorgeRRMartin) and 2 in the latter case (its outdegree). The conditional degree analyses the importance of paths going through large hubs, which are common to many other paths.

We define as semantic terminals those features that are more specific to Linked Data than to generic graphs. For those, we first considered vocabulary usage, and then analysed the most frequent RDF properties, as provided by both Linked Open Vocabularies2 and LODStats3. Note that, since we rely on entity dereferencing to traverse Linked Data, we only considered the most frequent object properties.

• Namespace Variety (NS): the edge is weighted depending on the number of namespaces of its source node. For instance, the node db:GeorgeRRMartin has the two namespaces owl: and db: for its three links, while the node db:GOT-TVseries has the 3 namespaces dc:, db: and skos: for its 5 links. Namespace variety is intended to analyse the use of vocabularies when semantically describing an entity. Note that, while we initially considered incoming and outgoing namespaces separately, we did not find any substantial difference in the process, and eventually reduced the two terminals to one.

• Type Degree (TD): the edge weight depends on the number of rdf:type statements declared for the source entity. For example, db:ASongOfIceAndFire(novel) has a type degree of 1 but, assuming it was also declared as a skos:Concept, its score would be 2. TD focuses on the taxonomical importance of an entity, with the idea that the more generic a node is (i.e. the more classes the entity belongs to), the less informative the path might be. Since rdf:type is unidirectional, there is no need to distinguish between in- and outdegree.

• Topic Outdegree (SO): the edge weight is assigned by counting the number of outgoing edges labelled dc:subject or skos:broader of the starting node. The edge db:author(db:ASongOfIceAndFire(novel), db:GeorgeRRMartin) has a score of 2. The topic outdegree focuses on authority nodes in topic taxonomies (controlled vocabularies or classification codes).

• Topic Indegree (SI): similarly, the edge weight is assigned by counting the number of incoming dc:subject edges. The same edge has a score of 1 in this case. SI considers hub nodes in controlled vocabularies.

• Node Equality (SA): the edge is weighted according to how much its source is connected to external datasets, based on the number of links labelled owl:sameAs, skos:exactMatch or rdf:seeAlso. For instance, db:UnitedStates is connected to the corresponding entity gn:UnitedStates in Geonames4 so, according to the SA weight, the edge db:airedIn(db:UnitedStates, db:GOT-TVseries(episodes)) is scored 1. SA considers the importance of inter-dataset connections. Since those properties are bi-directional, we do not distinguish between in- and outdegree.

2http://lov.okfn.org/dataset/lov/terms
3http://lodstats.aksw.org/

Aggregators are functions that decide how to merge the edge weights across a path. We chose four aggregators:

• SUM: given a path p_i of length l, it returns the sum of the weights w(e) of the l edges of the path. For instance, with SUM.IN, the score of the path p_1 is 1 + 2 + 4 = 7, because db:ASongOfIceAndFire(novel) has 1 incoming link, db:GeorgeRRMartin has 2 incoming links and db:UnitedStates has 4.

• AVG: given a path p_i of length l, it returns the average weight across the path. In the same case as above, AVG.IN returns 2.33 for p_1.

• MIN: it returns the lowest w(e) across the edge weights of the path. For p_1, MIN.IN returns 1.

• MAX: it returns the highest w(e) across the edge weights of the path. For p_1, MAX.IN returns 4.

To generate an individual, the aggregated terminals are randomly combined through the GP function set. Table 6.1 shows the set of arithmetic operators that we chose.

Table 6.1 Function set for the GP process.

Addition         x + y    (binary operator)
Multiplication   x × y    (binary operator)
Division         x / y    (binary operator)
Logarithm        log(x)   (unary operator)

These functions help in assessing how much each cost-function has to take each terminal in its body into account; e.g. g_4 = SUM.1 + (1 / AVG.TD) is interpreted as a function acting almost as a breadth-first search, with a small added value from the average type degree.

4http://www.geonames.org/
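To make the aggregated terminals concrete, the sketch below shows how a terminal such as SUM.IN or MIN.NS could be evaluated over the sources of a path's edges, assuming the relevant per-node statistics (indegree, outdegree, namespace and type counts) have been collected while dereferencing; the names and data structures are illustrative.

import java.util.List;
import java.util.function.ToDoubleFunction;

/** Sketch: aggregated terminals as (aggregator, edge-weighting) pairs over the edge sources
 *  of a path. NodeStats stands in for the counts gathered while dereferencing each entity. */
class Terminals {

    record NodeStats(int indegree, int outdegree, int namespaces, int types) {}

    /** Placeholder: statistics of a node, as collected during traversal. */
    static NodeStats statsOf(String node) { return new NodeStats(0, 0, 0, 0); }

    // Edge-weighting functions: the weight of an edge from the information of its source node.
    static double inWeight(String source) { return statsOf(source).indegree(); }          // IN
    static double ouWeight(String source) { return statsOf(source).outdegree(); }         // OU
    static double dgWeight(String source) { return inWeight(source) + ouWeight(source); } // DG
    static double nsWeight(String source) { return statsOf(source).namespaces(); }        // NS

    // Aggregators: how the edge weights are merged across the path (one weight per edge source).
    static double sum(List<String> sources, ToDoubleFunction<String> w) {
        return sources.stream().mapToDouble(w).sum();
    }
    static double avg(List<String> sources, ToDoubleFunction<String> w) {
        return sources.stream().mapToDouble(w).average().orElse(0);
    }
    static double min(List<String> sources, ToDoubleFunction<String> w) {
        return sources.stream().mapToDouble(w).min().orElse(0);
    }
    static double max(List<String> sources, ToDoubleFunction<String> w) {
        return sources.stream().mapToDouble(w).max().orElse(0);
    }

    /** Example: SUM.IN over the edge sources of a path, as in the p1 example above. */
    static double sumIn(List<String> edgeSources) { return sum(edgeSources, Terminals::inWeight); }
}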

6.3.3 Step-by-Step Run

Once the preliminaries are defined, the GP process is executed as described below.

Initialisation. At first, the GP randomly creates a population of programs (the cost-functions). Figure 6.3 shows some random cost-functions that can be generated based on the primitives previously presented. The cost of a path is assessed by traversing the cost-function body recursively, starting from the root node and evaluating each node only when the values of all its children are known.

Figure 6.3 Cost-function examples at generation 0.

Fitness Evaluation. As said, the fitness measures how good a cost-function is at ranking the sets of alternative paths between entities, based on a gold standard dataset. The fitness is measured with the Normalised Discounted Cumulative Gain (nDCG), generally used in Information Retrieval to assess the quality of the rankings provided by web search engines based on the graded relevance of the returned documents5. The closer it gets to 1, the more the engine judges the documents to be as relevant as the human evaluators did. We apply the same idea by considering a path as a document, and therefore first evaluate the DCG for a path p_k at rank k as:

DCG(p_k) = rel_1 + Σ_{m=2}^{k} rel_m / log_2(m)    (6.1)

where rel_m is the relevance score given to p_m by the human evaluators. The normalised score nDCG(p_k) for a path is then assessed by comparing DCG(p_k) to its ideal score iDCG(p_k), given by the gold standard.

5https://www.kaggle.com/wiki/NormalizedDiscountedCumulativeGain

The function avg(P_i) then averages the nDCG(p_k) of each path in the set P_i, so as to obtain the performance of the function on the i-th pair:

avg(P_i) = ( Σ_{p_k ∈ P_i} nDCG(p_k) ) / |P_i|    (6.2)

The fitness of a function is then obtained by averaging avg(P_i) over all the |D| pairs of the dataset:

f(g_j) = ( Σ_{P_i ∈ D} avg(P_i) ) / |D|    (6.3)

We also add a penalty to avoid long and complex cost-functions, by comparing the length l of a function with an ideal length L. The fitness of a function is finally defined as

f_w(g_j) = f(g_j) − (w × (l − L)²)    (6.4)

where w is the penalty weight.
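A compact sketch of Equations 6.1–6.4 is given below; it assumes the relevance scores of a path set are available both in the order produced by the cost-function and in the ideal (human) order, and it is meant as an illustration of the fitness computation rather than the exact implementation.

import java.util.List;

/** Sketch of Equations 6.1-6.4: nDCG of the ranking produced by a cost-function, averaged
 *  over the paths of a set, then over the example pairs, minus the length penalty. */
class Fitness {

    /** Equation 6.1: DCG of the first k paths, given their relevance in the ranked order. */
    static double dcg(List<Double> relevanceInRankedOrder) {
        if (relevanceInRankedOrder.isEmpty()) return 0.0;
        double dcg = relevanceInRankedOrder.get(0);
        for (int m = 2; m <= relevanceInRankedOrder.size(); m++) {
            dcg += relevanceInRankedOrder.get(m - 1) / (Math.log(m) / Math.log(2));
        }
        return dcg;
    }

    /** Equation 6.2: nDCG at each rank k (function ranking vs. ideal ranking), averaged
     *  over the paths of the set P_i. */
    static double avgNdcg(List<Double> rankedByFunction, List<Double> idealOrder) {
        if (rankedByFunction.isEmpty()) return 0.0;
        double sum = 0;
        for (int k = 1; k <= rankedByFunction.size(); k++) {
            double ideal = dcg(idealOrder.subList(0, k));
            sum += ideal == 0 ? 0 : dcg(rankedByFunction.subList(0, k)) / ideal;
        }
        return sum / rankedByFunction.size();
    }

    /** Equation 6.4: f_w(g) = f(g) - w * (l - L)^2, where f(g) is the fitness of Equation 6.3,
     *  i.e. avgNdcg averaged over all the example pairs of the dataset. */
    static double weightedFitness(double fitness, int length, int idealLength, double w) {
        return fitness - w * Math.pow(length - idealLength, 2);
    }
}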

Selection, Crossover and Mutation. After assessing the fitness of each cost-function, the GP process stochastically selects some programs to be the parents of the next generation. For the parent selection, we used tournament selection, in which n random programs are selected from the current population and the one with the best fitness is chosen as a parent. Because we do not use a greedy selection strategy, programs with an inferior fitness still have a chance of being selected and of bringing their genetic material to the next generation. We then perform the three following operations.

1) Reproduction. Given a cost-function parent, a new individual is copied into the new generation without alterations. Consider the case in which g_1 has been selected in a 3-sized tournament with g_2 and g_3: g_1 is then copied into g_5 and passed to the next generation, as in Figure 6.4 (left).

2) Crossover. The crossover operation generates two children cost-functions by swapping two subtrees of the parents. In our example, let us assume that g_2 and g_3 are selected for crossover. Let us also assume that the log operator of g_2 and the × operator of g_4 are the nodes randomly picked for the crossover. The two subtrees are then swapped and the two children g_6 and g_7 are added to the new generation, as in Figure 6.4 (middle).

Figure 6.4 Cost-functions at generation 1, after one reproduction, two crossovers and one mutation.

3) Mutation. Mutation modifies a node (the “mutation point”) of the parent. Let us assume that the cost-function g_4 is selected as parent. One of its nodes is then randomly picked and mutated, and the new child is added to the next generation. We have designed different kinds of mutations, as follows:

3a) if the mutation point is a constant x, the node is mutated into a new constant y that is either higher or lower than x, within the range {x − 100, x + 100}. For instance:

g_8 = SUM.1 + (18/AVG.TD)        1 < (y = 18) < 101
g_9 = SUM.1 + (−18/AVG.TD)       −99 < (y = −18) < 1

3b) if the mutation point is an aggregated terminal, the node is mutated by either modifying its aggregator, or modifying its edge-weighting function, or replacing the full aggregated terminal with a new constant. For instance:

g_10 = SUM.1 + (1/MAX.TD)        new aggregator a
g_11 = SUM.1 + (1/AVG.IN)        new edge-weighting function w
g_12 = SUM.1 + (1/20)            new constant

3c) if the mutation point is a function, we randomly choose to replace the node either with a new terminal (a constant or an aggregated terminal) or with a new function, in which case we remove or add a child depending on whether the arity of the new function differs from the one of the mutation point. For example:

g_13 = SUM.1 + (1 × AVG.TD)                same arity
g_14 = MAX.OU + (SUM.1 + (1/AVG.TD))       added child
g_15 = log(1/AVG.TD)                       deleted child
g_16 = SUM.1 + MAX.NS                      new aggregated terminal
g_17 = SUM.1 + 20                          new constant

The mutated child is then passed to the new generation, as g_13 in Figure 6.4. Once the new generation has reached the same size as the previous one, the new individuals replace the old ones and a new generation can start. With a non-greedy strategy, a new generation might not necessarily mean that all the programs improve their fitness with respect to their parents; however, an overall fitness improvement is obtained throughout the generations, by recombining the genetic material of fit parents into new, superior individuals.

Training and Testing. Finally, in order to identify the best cost-function, we articulate the learning process as follows. First, we randomly split the dataset D into a training set and a test set. Then, we run the GP process on the training set and store a small set of the fittest individuals, i.e. the cost-functions that performed best in ranking paths, while the rest are discarded. Third, the surviving individuals are tested on the test set and, if their fitness is not consistent with the one obtained on the training set, we screen them out. This helps avoid overfitting and obtain more valid cost-functions. Finally, for each run, we keep the individual with the highest fitness on the overall dataset.

6.4 Experiments

The section introduces our experimental scenario, describing the dataset we built and the control parameters for the Genetic Programming learning process. Then, it presents the obtained results, including the discovered cost-function.

6.4.1 Experimental Setting

As previously mentioned, the fitness is assessed on a dataset composed of sets of alternative paths between random pairs of entities. In order to create more variety in the final dataset, so that the learnt functions would not be overfitted to a specific dataset, we used different types of entities, randomly extracted from different Linked Data sources, namely:

1) Events: 12,630 events (ranging from battles to sport to music events) from the YAGO dataset [Suchanek et al., 2007];

2) People: 8,185 people from the Virtual International Authority File (VIAF) dataset6;

3) Movies: 999 movies from the Linked Movie Database7;

4) Geodata: 1,174 world countries and capitals from Geonames and the UNESCO datasets.

6http://viaf.org/
7http://www.linkedmdb.org/

To make sure those entities were connected to other datasets, we used the DBpedia SPARQL endpoint as a pivot, i.e. we chose a desired ?_class (event, country, person, etc.) and the ?_dataset we wanted to retrieve entities from, and then ran the simple query:

select distinct ?same where {
  # select the DBpedia entities of the class we want
  ?entity a ?_class .
  # select each entity's sameAs
  ?entity owl:sameAs ?same .
  # keep only sameAs links towards the dataset we want
  FILTER(strStarts(str(?same), ?_dataset)) .
} ORDER BY RAND() # make sure we get random entities

By doing so, we made sure that the paths would span at least one additional dataset (DBpedia), therefore guaranteeing more variety.

Table 6.2 Control Parameters for the GP runs.

Population size               100 individuals
Max. generations              300
Termination condition         until the max. generation
Selection type                5-sized tournament
Crossover rate                0.65
Mutation rate                 0.15
Reproduction                  10% of population size
Elitism                       10% of population size
Penalty weight w              0.001
Ideal length L                3
Validation split              70%–30%
Best individuals for testing  5

Next, given a random pair, we ran a bi-directional breadth-first search limited to 30 cycles in order to find a set of paths between the two entities. We discarded the pairs for which no path was found. Eight judges were asked to evaluate each set, assigning each path a rel score between 2 (“highly informative”) and 0 (“not informative at all”); we discarded the pairs whose agreement was below 0.1 according to the Fleiss' kappa rating agreement8. An example of a path to be ranked, showing where the movie “The Skin Game” and the actress Dina Korzun are both based, is presented in Figure 6.5. The final dataset consisted of 100 pairs, whose paths were assigned a score corresponding to the average of the scores given by the users.

8https://en.wikipedia.org/wiki/Fleiss%27_kappa

Figure 6.5 A path example.

Finally, Table 6.2 presents the control parameters we used during the GP process. In order not to bias the generation process, we generated trees without a depth limit, and equally distributed the probability of choosing functions, constants and aggregated terminals.

6.4.2 Results

First, we present the results of different runs of the Genetic Programming learning process described in the previous section. Table 6.3 shows the unweighted fitness on the training set (f_tr) and on the test set (f_ts) of the best cost-functions learnt during 3 different runs (and therefore on different dataset cuts). The experiments are divided into GP runs with topological terminals only (T) and runs with both topological and semantic terminals (S).

Table 6.3 Best cost-functions for different runs.

Run   Fittest g_j                                    f_tr   f_ts
T1    log(log(MIN.CD × MIN.CD)) / MAX.CD             0.79   0.79
T2    log(MIN.CD) / (AVG.CD + 87)                    0.77   0.78
T3    MIN.CD × (MIN.CD / MAX.CD)                     0.78   0.72
S1    MIN.NS + (SUM.NS / log(log(SUM.SI)))           0.88   0.83
S2    MIN.NS + (MIN.CD / log(log(SUM.SI)))           0.88   0.86
S3    MIN.NS + (log(MAX.IN) / log(log(SUM.SI)))      0.87   0.86

The results first show that some terminals, i.e. the conditional degree CD, the namespace variety NS and the topic indegree SI, are recurrent across different runs, which indicates the stability of our learning process. Given the regularity of MIN.NS, we ran a third block of experiments (N) in which we used only the topological terminals and MIN.NS. The results of three runs are reported in Table 6.4.

Table 6.4 Best cost-functions with topological features and MIN.NS.

Run   Fittest g_j                                    f_tr   f_ts
N1    (log(MAX.NS / MAX.CD) / AVG.NS) + MIN.NS       0.82   0.81
N2    ((MIN.DG / SUM.CD) / SUM.OU) + MIN.NS          0.79   0.77
N3    MIN.NS / (log(MAX.CD) / AVG.NS)                0.83   0.75

While the results again confirm the importance of the MIN.NS aggregated terminal, we observe that both the T- and the N-functions, based mostly on topological features, have a lower performance compared to the S-functions, which also include semantic terminals. Nevertheless, the S-functions confirm the importance of MIN.CD. We then looked at the performance of the above functions on the full dataset. We compared them with measures found in the literature, namely RelFinder (RF [Heim et al., 2009]), RECAP [Pirrò, 2015] and the two measures presented by the Everything is Connected Engine (EICE [De Vocht et al., 2013]) and by Moore et al. (M&V [Moore et al., 2011]). Figure 6.6 presents a discrete plot of the avg(P_i) score (Y axis) that each of the functions obtained on the examples in D (X axis), in descending order of avg(P_i) (from the examples where the functions succeeded the most to the ones where they succeeded the least). Note that the continuous curve is only used to facilitate readability and the comparison of the functions.


Figure 6.6 avg(P_i) of the fittest functions on the full dataset, for each of the experiment blocks T, N and S.

We finally verified whether the cost-functions were affected by the type of entities in the dataset D. We removed, in turn, the pairs whose entities belonged to one of the Linked Data sources presented in Section 6.4.1, and then calculated the functions' fitness f(g_i) of Equation 6.3 on the filtered dataset. The results, presented in Table 6.5, confirm that the S-functions are consistent even across different datasets.

Discussion. What can be noticed from Figure 6.6 is the considerable difference between the existing approaches, which are based on ad-hoc information-theoretical measures, and the functions that were automatically learnt through Genetic Programming.

Table 6.5 Overall fitness f(g_i) of the functions across datasets. \D indicates which dataset's entities were removed; s indicates the size of the filtered dataset.

\D         s    RF     RECAP  EICE   M&V    T1     T2     T3     S1     S2     S3
Geonames   23   0.455  0.457  0.584  0.605  0.756  0.710  0.641  0.909  0.911  0.887
Yago       77   0.371  0.513  0.432  0.424  0.819  0.820  0.819  0.883  0.881  0.880
VIAF       69   0.378  0.492  0.418  0.414  0.832  0.828  0.801  0.880  0.888  0.880
Unesco     79   0.390  0.457  0.442  0.437  0.784  0.764  0.750  0.858  0.860  0.849
LMDB       73   0.439  0.382  0.438  0.435  0.740  0.728  0.705  0.846  0.847  0.843
∅ (tot)    100  0.399  0.481  0.449  0.446  0.784  0.763  0.749  0.871  0.873  0.865

Indeed, the combination of several topological characteristics sensibly improves a cost-function's performance, as demonstrated by the overall fitness f(g_i) of T1, T2 and T3 (see Table 6.5, last row). This means that the ranking the T-functions give for a set of paths is much more similar to the one of a human evaluator than the rankings attributed by the hand-crafted measures. The low performance of the existing measures suggests that they are not suitable for correctly evaluating paths that connect entities across several Linked Data datasets, such as the ones we collected in our experiments. A slight improvement can also be observed with the N-functions: their overall fitness f(g_i) improves roughly by 0.02–0.04 when compared to the T-functions. With that said, the figure clearly shows that adding some semantic information is the key to obtaining more precise results, as the overall fitnesses of the S-functions, f(S1) = 0.86, f(S2) = 0.88 and f(S3) = 0.87, demonstrate (i.e. a fitness improvement of ca. 0.09–0.11).

A function to assess Linked Data contexts. Finally, we analyse the S2 cost-function, which reports the best performance:

S_2 = \mathrm{MIN.NS} + \frac{\mathrm{MIN.CD}}{\log(\log(\mathrm{SUM.SI}))} \qquad (6.5)

and observe that the terminals here included are the same that we had already noted as being the most recurrent among the different runs of Tables 6.3 and 6.4. As can be seen from its shape, this function prioritises paths that:

• pass through nodes with rich node descriptions (the higher MIN.NS is, the more relevant the path is considered);

• do not include high-level entities (that have many incoming dc:subject/skos:broader links, since many other entities are also of the same category), since the higher SUM.SI is, the lower the path score is;

• include only specific entities (not hubs) for paths with a small number of topic categories. Indeed, because of the use of the double log function, the ratio between MIN.CD and log(log(SUM.SI)) is negative if SUM.SI < 10. However, the addend MIN.CD/log(log(SUM.SI)) becomes a positive factor when SUM.SI > 10.

In other words, the function prioritises specific paths (e.g. a movie and a person are based in the same region) over more general ones (e.g. a movie and a person are based in the same country). Consistently with the discussion at the beginning of the chapter, this function is then integrated into the Linked Data pathfinding process, which we can use to assess the context between a pattern and its candidate explanations.
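To make the behaviour of S2 concrete, the following minimal Python sketch evaluates Equation 6.5 for a path, assuming the aggregated terminals MIN.NS, MIN.CD and SUM.SI have already been computed, and assuming base-10 logarithms (which is consistent with the SUM.SI = 10 turning point discussed above):

import math

def s2(min_ns, min_cd, sum_si):
    # Equation 6.5: S2 = MIN.NS + MIN.CD / log(log(SUM.SI)).
    # The base-10 logarithm is an assumption: it makes the second addend
    # negative for SUM.SI < 10 and positive for SUM.SI > 10.
    if sum_si <= 1 or sum_si == 10:
        raise ValueError("log(log(SUM.SI)) is undefined for this value")
    return min_ns + min_cd / math.log10(math.log10(sum_si))

# A path through specific, well-described entities (low SUM.SI, high MIN.NS)
# is scored differently from a path through broad, hub-like categories.
print(s2(min_ns=40, min_cd=3, sum_si=5))    # second addend is negative
print(s2(min_ns=40, min_cd=3, sum_si=500))  # second addend is positive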

6.5 Conclusion and Limitations

In this chapter, we presented a supervised method based on Genetic Programming in which we learnt the cost-function to detect strong relationships between Linked Data entities, which in our case correspond to the contextual relationship between a pattern and a candidate explanation. With the assumption that the topological and semantic structure of Linked Data can be used in a blind search to identify the strongest connections, we used Genetic Programming to generate a population of cost-functions that was evolved iteratively, based on how well the individuals compared with human-evaluated training data. The results showed the validity of our idea that successful path evaluation functions are built using basic topological features of the nodes traversed by the paths, and that a little knowledge about the vocabularies of the properties connecting nodes in the explored graph is fundamental to obtain the best cost-functions. We analysed the obtained functions to detect which features are important in Linked Data to find the strongest entity relationships, and finally presented the cost-function that we learnt and that we will integrate in Dedalo's Linked Data pathfinding process. Below, we present the last limitations and open issues.

The Link Traversal is forward-constrained. Although we consider both backward and forward links for the uniform-cost search, dereferencing is clearly an obstacle to retrieving backlinks. Also, there are very few datasets that allow non-unidirectional queries with dereferencing. This means that improved cost-functions are likely to be found if using alternative discovery techniques for graphs (e.g. graph indexing), but this implies introducing a priori knowledge of the problem, which goes against the base principles set for this work.

Linked Data introduce bias. As mentioned already in several parts of our work, this issue became much more visible with this piece of work. The blind search highly relies on the information that is declared in Linked Data. A sensible bias exists towards some information (RDF properties, vocabularies, namespaces...), and this can affect the quality of our results, namely the shape of the cost-functions. For example, a cost-function relying on types and topics, which are largely declared in the datasets that we have used, might not be as performant on Linked Data datasets of more specific domains such as bioinformatics, which might have a completely different taxonomic shape. While one possibility is to extend the GP process to new datasets, in Chapter 8 we also discuss how to measure the bias introduced by data links.

Explanations cannot be complete. The previous point is also related to another important limitation of our approach: we cannot obtain the last component of an explanation (i.e. the theory) only by relying on the Web of Data. A deeper discussion about the problem is presented in Chapter 8.

Dedalo has to be evaluated. Although explanations are not fully complete, we believe that the work presented in this chapter makes a considerable step towards automatically generating meaningful explanations. To prove that, the next chapter focuses on evaluating Dedalo on both its aspects, i.e. the induction of explanations and the definition of a context, against human judgment.

Part III

Evaluation and Conclusion

Chapter 7

Evaluating Dedalo with Google Trends

This chapter presents the user evaluation study that we designed to answer the second part of our last research question (Section 1.3.4), namely how to assess the validity of the process we designed. More specifically, we evaluated Dedalo in its two main processes, the induction of explanations and the assessment of their contextual relationship, which were presented throughout Part II. A brief summary of the whole system and an introduction on what we aim to achieve with the user evaluation is given in Section 7.1. Then, we present in detail the two evaluation studies that we performed, respectively in Section 7.2 and Section 7.3. Finally, we discuss the conclusions from the chapter and the second part of this thesis in Section 7.4.

7.1 Introduction

Starting from the hypothesis that knowledge from the Web of Data could be used to automatically explain patterns of data, the second part of this thesis focused on presenting the proposed approach, Dedalo. We focused on showing how candidate explanations for a pattern could be generated by learning directly from the data in the pattern (or not in it) in a Machine Learning fashion, and by reasoning upon background knowledge manually extracted from Linked Data, within the Inductive Logic Programming framework (Chapter 3). Because in an ILP scenario manual effort was still required, we re-adapted some of the ILP features to design an automatic process which could exploit the Web of Data without incurring scalability issues (Chapter 4). The result was an anytime process to induce Linked Data explanations, in which the required background knowledge was iteratively collected using a Link Traversal strategy, which we exploited to avoid the introduction of any a priori knowledge. We also demonstrated that a heuristic-based graph search could be beneficial to drive such Linked Data traversal to the best explanations in the shortest time. We then focused on how to improve the induced explanations. First, we showed how better explanations could be obtained using a Neural Network trained model that predicts the likelihood of good aggregated explanations (Chapter 5). We also showed how their readability decreased with the number of aggregations performed, and therefore focused on how to automatically exploit the structural features of the Web of Data to provide us with contextual information, which enriches the generated explanations by revealing the connections between them and the pattern they are related to (Chapter 6).

Having designed Dedalo and demonstrated how it can perform in use-cases of different domains, our final objective is to assess the validity of what we proposed with respect to the users. In this chapter, we present the empirical evaluation of Dedalo, which we based on the most real-world scenario among our datasets, the Google Trends one (#Tre). We remind the reader that the dataset consists of the web search rates of some terms that showed an interesting behaviour (e.g. the term popularity increases and decreases regularly), divided into groups of popular and non-popular dates (most of the details on how the data were created can be found in Appendix A.1). Two empirical studies, conducted in December 2014 and June 2015 and advertised in both internal (the Knowledge Media Institute, the Open University) and public fora (mailing lists, social networks), were designed with the idea of evaluating Dedalo's two main processes, i.e. the induction of the candidate explanations and the definition of the most relevant context for a candidate explanation. In both cases, the evaluation required users to assess how good Dedalo's results were.

The next two sections describe the two empirical studies in detail, including the experimental setting (i.e. data preparation, evaluation interface design, tasks to achieve, evaluation measurements), the analysis of the participation in the study (participant details and user agreement, which can help us to interpret the results of the user study), and the final discussion about results, on both the achievements and the unsatisfactory performances. By doing so, we will be able to assess the validity of the approach that we propose in this thesis and, more specifically, to demonstrate that the Web of Data can be explored to identify explanations that reach a reasonable degree of agreement with the human evaluators. We conclude with some general considerations about this evaluation chapter, before moving to limitations, future work and conclusions in the next chapter.

7.2 First Empirical Study

As already mentioned, the first study was conducted to evaluate the explanations induced by Dedalo by comparing them with the ones provided by human experts.

7.2.1 Data Preparation

A very first step, whose results were already partly shown in Table 4.7 of Chapter 4, was to run Dedalo on each trend of #Tre and collect the set of induced Linked Data explanations. Trends were manually chosen because they showed an interesting behaviour (e.g. recurring at very specific moments of the year), and were selected on the basis of the existence of at least one explanation, as assessed by two human judges. This was done to avoid presenting the evaluators random or unclear trends, for which no valid explanation existed. Also, we intentionally diversified the trends' topic and popularity (e.g. popular movies/TV series, people, events) to see how Dedalo could cope with different examples1. We finally selected 30 trends, and created for each of them a dataset of 559 weeks (January 2004 to September 2014) and the corresponding search rate for that given trend during each week, as provided by the Google Trends API endpoint2.

The second step was to partition the weeks into a group of peaks and another group of non-peaks. A week was considered a "peak" if its search rate value was above a moving average (commonly used in time series), whose window size was fixed to 52 weeks. In other words, a given week was considered as a peak if the term, during that week, was searched significantly more with respect to its annual average. We cut the series into yearly subsets with the aim of detecting those peaks that would not be taken into account with a constant average: for instance, when the "Game of Thrones" TV series was not popular, the search rate for "A Song of Ice and Fire" increased only when a new book was released, but this rate is much smaller when compared to any search after the release of the TV series. Each week was then weighted, so that its weight w_i could be used when evaluating the explanations with the Fuzzy F-Measure. For each week, w_i was evaluated as the normalised distance between the week's search value and the moving average. To reduce noise in the cluster of peaks, we manually estimated a threshold related to the moving average, so that if the search value was higher than the moving average plus a certain proportion of it, then the week was considered as a peak. Figure 7.1 shows the peak group for "A Song of Ice and Fire", including the corresponding moving average.

1 We invite the reader to refer to Appendix A.1 for more details about which trends were initially selected.
2 https://www.google.com/trends/

Figure 7.1 Peaks for "A Song of Ice and Fire". The threshold that has been chosen is 1.5.
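The peak-detection step can be sketched as follows (a minimal Python/numpy sketch; the way the 1.5 threshold is applied to the moving average, the windowing of the 52-week average and the normalisation of the weights are assumptions, since these details are not fixed exactly above):

import numpy as np

def detect_peaks(rates, window=52, threshold=1.5):
    # rates: one Google Trends search rate per week, in chronological order.
    rates = np.asarray(rates, dtype=float)
    # Moving average over a 52-week window (computed here with a centred
    # convolution; the exact windowing used in the thesis may differ).
    moving_avg = np.convolve(rates, np.ones(window) / window, mode="same")
    # A week is a peak if its rate exceeds the moving average by the threshold factor.
    is_peak = rates > threshold * moving_avg
    # Weight w_i: distance to the moving average, normalised to [0, 1].
    dist = np.abs(rates - moving_avg)
    weights = dist / dist.max() if dist.max() > 0 else dist
    return is_peak, weights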

The set of weeks was then transformed into an RDF dataset. We created entities of type ddl:Week linked to the events that happened during that time. This was achieved by using a SPARQL query on DBpedia, where we selected any entity that had a declared date property between the beginning (?_startDate) and the end (?_weekEndDate) of the week:

select ?a where {
  ?a ?p ?date .
  filter(?date >= ?_startDate^^xsd:date && ?date <= ?_weekEndDate^^xsd:date)
}
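In practice, such a query can be issued per week against the public DBpedia endpoint, for instance with the SPARQLWrapper library (a sketch: the endpoint, the LIMIT and the unrestricted ?p are illustrative choices, and the actual setup may have restricted the date properties differently):

from SPARQLWrapper import SPARQLWrapper, JSON

QUERY = """
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>
SELECT DISTINCT ?a WHERE {
  ?a ?p ?date .
  FILTER(?date >= "%s"^^xsd:date && ?date <= "%s"^^xsd:date)
} LIMIT 1000
"""

def events_in_week(start_date, end_date, endpoint="https://dbpedia.org/sparql"):
    # Return DBpedia entities that have any date property within the given week.
    sparql = SPARQLWrapper(endpoint)
    sparql.setReturnFormat(JSON)
    sparql.setQuery(QUERY % (start_date, end_date))
    results = sparql.query().convert()
    return [b["a"]["value"] for b in results["results"]["bindings"]]

# e.g. uris = events_in_week("2014-01-06", "2014-01-12")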

The initial dataset of URIs is necessary for Dedalo to be able to start the Linked Data traversal, and the creation of such a synthetic dataset of weeks and events is only motivated by the fact that currently there is no similar data available in Linked Data. For each of the trends, Dedalo was given as input a file including several URIs corresponding to dates (both from the "popular dates" and the "non-popular dates" group), their weight w_i as defined by Equations 4.4 and 4.5, and a boolean indicator of whether they belonged to the group of popular dates or not. Tests were run for 20 iterations, i.e. 20 times Dedalo selected the best path p_i from the queue Q, dereferenced its ending values V_i and evaluated new explanations with the Fuzzy F-Measure. This measure was chosen mostly because it could produce better results on the #Tre dataset, as reported in Section 4.4. We then discarded 17 trends for which we could not find, in the resulting explanations provided by Dedalo, any plausible explanation according to our judgment. For instance, no possible explanation could be found for any of the trends related to tennis events, nor for Star Wars. We leave for the following sections the discussion on the possible reasons for such a deficiency. The scenario finally presented to the users included 13 trends, as listed in Table 7.1.

Table 7.1 Successful (✓) and unsuccessful (✗) trends chosen for the empirical study.

✓ A Song of Ice and Fire    ✗ How I met your Mother    ✗ Taylor
✓ Brazil                    ✓ Hurricane                ✓ Turkey
✗ Dakar                     ✓ Italy                    ✓ Wembley
✗ Daniel Radcliffe          ✓ Rugby                    ✓ Germany
✓ Obama

To avoid biasing our users, the trends selected for the empirical evaluation comprised a variety of results, including both cases where Dedalo's performance was really satisfactory (case ✓), i.e. the correct explanations were ranked among the best 10, and the ones that we considered as a failure (case ✗), since the explanations we were expecting were ranked lower than the 10th position.

7.2.2 Evaluation Interface

The obtained data were presented through a JavaScript-based interface, still available online3. Users were required to accomplish three tasks, to be repeated on each trend, but they were left free to choose the number of trends they wished to evaluate.

Figure 7.2 First Empirical Study: trend plot.

3 http://linkedu.eu/dedalo/eval/

Given a specific trend, the user was presented with a double chart (as shown in Figure 7.2), including a blue line chart of the time and popularity of the trend, corresponding to the normal Google Trends chart, and a scatterplot showing which dates we detected as peaks of popularity. The plot was obtained using the Google Chart API4. The user was then encouraged by the guidelines to think about the explanation(s) that might best represent the peaks, which he or she would further compare with the ones found by Dedalo.

Task 1. As indicated by the guideline “give your own explanations for this pattern”, the task for the user was to find at least one and a maximum of 5 explanations for the peaks that he or she was shown. “Finding an explanation” was intended as trying to find a general reason for the trend’s popularity or, in other words, which was the topic (or event) common to most of the peak dates.

Figure 7.3 Task 1: providing explanations.

We now define this set of user-defined explanations as U = {eu_1, ..., eu_k}, with 1 ≤ k ≤ 5; for "A Song of Ice and Fire", for instance, a user might have provided:

eu1: “Events related to Game of Thrones”

eu2: “Release of a new season of Game of Thrones”

eu3: “Red Wedding episode”

In order to emulate Dedalo’s inductive process, the user was asked to insert explanations

from the most generic to the most specific one. For example, the explanation eu2, which relates to each time a new Game of Thrones season is released, explains more than one peak and therefore comes before eu3, which is about the "Red Wedding" episode and relates to only one peak. The same applies to eu1, which covers a much larger number of peaks because it includes any event related to the TV series Game of Thrones (and not only the seasons). In order to study how intensive the explanation process could be when not being a domain expert, the user also had to indicate in the final box whether he or she relied on external knowledge to find the explanations.

4 https://developers.google.com/chart/

Task 2. The first 20 explanations found by Dedalo were then presented to the user, whose task was explained in the guidelines as "map your explanations to the ones generated by Dedalo". We call this set of Dedalo-defined explanations D = {ed_1, ..., ed_l}, where l = 20. To avoid biasing the user, and to facilitate his or her comprehension, the elements in D appeared in a random order and as natural language sentences. Explanations might or might not have been related to the given trend, and no evidence was given to the user about which ones could be correct or not. Some examples of explanations5 in D for "A Song of Ice and Fire" were:

ed1: Weeks including events whose topic is a subcategory of the category "Game of Thrones (TV series)"
ed2: Weeks including events whose topic is a subcategory of the category "Rugby in Argentina"
ed3: Weeks including events whose topic is the category "Game of Thrones (TV series) episodes"
ed4: Weeks including events whose topic is a subcategory of a subcategory of a subcategory of a subcategory of a subcategory of the category "History of Tonga by period"
ed5: Weeks including events whose topic is a subcategory of a subcategory of the category "A Song of Ice and Fire"

The user's task was to find and map the explanation(s) in D that best fitted the ones he or she gave in U. Given one eu_i, the user was allowed to map more than one of Dedalo's explanations ed_i; however, mapping several eu_i to one ed_i was not allowed. We call U′ ⊆ U the set of user-given explanations with at least one mapping in D, and D′ ⊆ D the subset of Dedalo-defined explanations with at least one mapping in U.

In the example of Figure 7.4, we mapped events about Game of Thrones (eu1) with both ed1 and ed3, which we considered being "topically" closer to the TV series than to the seasons (eu2) or to one specific episode (eu3). Those last two remained unmapped, since we believed that no similar explanations appeared in the list. Therefore, U′ = {eu1} and D′ = {ed1, ed3}.

5 Note that the bizarre formulation of the explanations provided by Dedalo is due to the fact that we automatically translate the induced Linked Data paths into natural language expressions. We use the agreement between users (see Section 7.2.5) to make sure that this does not affect the users too much when performing their evaluation.

Figure 7.4 Task 2: matching explanations.

Task 3. The user was shown again Dedalo's top 20 explanations, in the order they were ranked by the Fuzzy F-Measure, and was asked to "rank Dedalo's explanations by plausibility". The task consisted in choosing, among the explanations in D, the 5 that best explained the trend, and giving them a rank from the most representative (1) to the least representative (5). We call this set R = {ed_1, ..., ed_r}, with 1 ≤ r ≤ 5. In our running example, the ranking could be:

Figure 7.5 Task 3: ranking explanations.

1: (ed5) Weeks including events whose topic is a subcategory of a subcategory of the category “A Song of Ice and Fire” (the novel)

2: (ed1) Weeks including events whose topic is a subcategory of the category “Game of Thrones (TV series)”

3: (ed3) Weeks including events whose topic is the category “Game of Thrones (TV series) episodes”

Here, ed5 was prioritised because it is more general and better represents the group of peaks to explain: for instance, it comes before ed1, which does not relate to events about the books. In the case where no reasonable explanation was found, the user was allowed to leave the rank empty.

7.2.3 Evaluation Measurements

In order to measure Dedalo’s performance compared to the users (our gold standard), we defined some “indicators”, presented below.

Dedalo's Success. The first indicator, based on Task 1 and Task 2, estimates how good Dedalo was in inducing explanations according to the users. Dedalo's success DSX is defined based on the subset of explanations U′ for which the user found at least one mapping in D:

DSX = \frac{\sum_{i=1}^{j} |U'_i|}{j} \qquad (7.1)

That is, the total number of users' explanations eu_i that have been mapped to at least one explanation given by Dedalo, normalised by the number j of users evaluating that specific trend. DSX ranges from 0 to 5, with 0 being the worst case (Dedalo did not find any good explanation) and 5 being the best one (all the users' explanations were found). Assuming the example of Figure 7.3 with |U| = 3 and |U′| = 1, and a second user with |U′| = 2, then DSX = 1.5, i.e. Dedalo performed rather badly.

Dedalo's Surprise. As a complement, the results of Task 2 also tell us how wrong Dedalo was, i.e. how many of the users' explanations were not found. In this case, we count the set of explanations in U that were not mapped to at least one explanation of D. Dedalo's Surprise DSP is defined as follows:

DSP = \frac{\sum_{i=1}^{j} |U_i \setminus U'_i|}{j} \qquad (7.2)

with j being again the number of users participating in the evaluation of a trend. DSP also ranges from 0 to 5, this time 0 being the best case (Dedalo missed no explanation) and 5 being the worst (all the explanations were missed). Given the user of Figure 7.4 with |U \ U′| = 2, and a second user with |U \ U′| = 1, then DSP = 1.5 and therefore Dedalo performed moderately well.

User Surprise. With this indicator, we aim at quantifying how much the users did not know, by detecting the explanations that they could not think about (and were therefore surprised by). The User Surprise USP looks at how many relevant explanations were identified by Dedalo, but not by the user. To do so, USP compares D′ (Dedalo's explanations for which the user had found at least one mapping) and R (the set of Dedalo's explanations ranked by the user in Task 3) as:

USP = \frac{\sum_{i=1}^{j} |R_i \setminus D'_i|}{j} \qquad (7.3)

with j being the number of users. USP ranges from 0 to 5, 5 being the best case (many explanations were found by Dedalo but not by the user, who was therefore more "surprised") and 0 being the worst (none of Dedalo's explanations was new to the user, therefore no surprise at all). In the example of Figure 7.5, the explanation ed5 is in R (explanations found by Dedalo) but not in D′ (explanations found by the user). This means that we only thought about the TV series when looking for an explanation for the trend, but not about the book. While we did miss some knowledge, Dedalo did not, and that was indirectly admitted when the explanation ed5 was chosen and put in the ranking during Task 3.
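For clarity, the three indicators can be computed from per-user judgements as in the following sketch (the set names are hypothetical, and the two example users loosely reproduce the worked examples above):

def success_and_surprise(per_user):
    # per_user: one dict per evaluator of a trend, with
    #   "U":  explanations written in Task 1,
    #   "U1": subset of U mapped to at least one Dedalo explanation (U'),
    #   "D1": Dedalo explanations mapped in Task 2 (D'),
    #   "R":  Dedalo explanations ranked in Task 3.
    j = len(per_user)
    dsx = sum(len(u["U1"]) for u in per_user) / j            # Equation 7.1
    dsp = sum(len(u["U"] - u["U1"]) for u in per_user) / j   # Equation 7.2
    usp = sum(len(u["R"] - u["D1"]) for u in per_user) / j   # Equation 7.3
    return dsx, dsp, usp

users = [
    {"U": {"eu1", "eu2", "eu3"}, "U1": {"eu1"},
     "D1": {"ed1", "ed3"}, "R": {"ed5", "ed1", "ed3"}},
    {"U": {"eu1", "eu2", "eu3"}, "U1": {"eu1", "eu2"},
     "D1": {"ed1"}, "R": {"ed1"}},
]
print(success_and_surprise(users))  # (1.5, 1.5, 0.5)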

7.2.4 Participant Details

This first empirical study involved 83 users differing in age, education and background:

• gender: 25 females, 58 males;
• age: 10 users being less than 25 years old, 56 users between 25 and 34 years old, and 17 being 35 years old or more;
• job: 49 users from the academic domain and 34 from other sectors.

A specific overview of the participants per trend is given in Table 7.2. Besides showing a huge involvement of academic staff in the study (which is certainly due to the diffusion channels used to call for participants), as well as a marked gender imbalance among the participants, we can observe some expected behaviours in the way trends were chosen: in general, TV series and movies were the most popular, followed by countries known for sport, politics or cultural events. As we expected, the trends with the lowest popularity were the ones whose topic was either less popular or very specific.

Table 7.2 User preferences and some more details about age, gender and job.

Trend                     Tot.   <25   25–34   >35    F    M    Academic   Other
A Song of Ice and Fire      37     5      28      4    9   28         23      14
How I met your Mother       33     2      28      3    6   27         25       8
Brazil                      32     3      27      2   11   21         23       9
Daniel Radcliffe            27     1      21      5    8   19         21       6
Italy                       25     0      20      5    6   19         18       7
Obama                       24     0      19      5    7   17         18       6
Germany                     23     2      19      2    6   17         17       6
Hurricane                   21     0      16      5    4   17         16       5
Turkey                      21     2      16      3    6   15         14       7
Wembley                     21     0      18      3    5   16         16       5
Dakar                       19     2      14      3    4   15         14       5
Rugby                       17     1      14      2    2   15         12       5
Taylor                      14     0      13      1    2   12         11       3

In the former case we have trends such as "Rugby" and "Dakar", whose topics (rugby and motor rallies) are generally not as popular as football. In the latter, we have "Wembley", which refers to the Wembley stadium in London (so it is U.K.-specific), and "Taylor", which refers to the American singer Taylor Swift, whose popularity in Europe is not as high as in the U.S. (also, she might be more popular among people younger than the ones who were involved in the study). Alternatively, the scarce interest in this last trend could be attributed to its ambiguity, since "Taylor" can actually refer to either the singer Taylor Swift, or the actress Elizabeth Taylor, or any other person called Taylor. Gender also affected the choice of the trends: namely, women seemed to be more interested in movies and TV series than in sports. The age ranges do not bring any meaningful information about the preferences: very few people outside the range 25–34 were involved in the study, which is probably strictly related to the number of participants working in the academic domain.

7.2.5 User Agreement

Before analysing the system's results (Section 7.2.6), we first measure the degree of agreement among the participants, to ensure that there was a robust enough grounding for the evaluation. A homogeneous consensus means that the users know enough about a trend, and therefore we can rely on their judgement. However, if the user-provided explanations U vary too much, the users are not agreeing among each other, in which case we might also expect

Dedalo not to perform well either. Note that our evaluation assumes that the users were not affected by the readability of the explanations.

User Spread. Here, we evaluate the spread of the explanations given by the users. In order to assess it, we normalised the text of the explanations in U (using stopword removal, stemming and term frequency weighting) and represented users as vectors of words. We then used k-means clustering (with k = 1) to obtain the trend mean vector (i.e. the centroid), from which we could estimate the spread. Finally, we plotted all the points on a 2D space using multidimensional scaling and the Euclidean distance as the distance function between each user and the centroid. The more the points are spread, the more heterogeneous the explanations those users gave. Plots for some of the trends are shown in Figure 7.6.
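A minimal sketch of this computation with scikit-learn is given below (stemming is omitted, and the stop-word list, the MDS settings and the distance computation are simplifying assumptions):

import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.manifold import MDS

def user_spread(explanations, random_state=0):
    # explanations: one free-text explanation per user for a single trend.
    vectors = TfidfVectorizer(stop_words="english").fit_transform(explanations).toarray()
    centroid = vectors.mean(axis=0)                      # the k-means centroid for k = 1
    spread = np.linalg.norm(vectors - centroid, axis=1)  # per-user distance to the centroid
    coords = MDS(n_components=2, random_state=random_state).fit_transform(vectors)
    return coords, spread  # coords can be plotted as in Figure 7.6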

Figure 7.6 User spread according to the similarity of the explanations they provided, ordered by users' cohesion: (a) Turkey, (b) Taylor, (c) Obama, (d) Italy, (e) Brazil, (f) A Song of Ice and Fire.

What one can notice here is that there are cases where users agreed on the explanations they gave, resulting in a cluster of tightly grouped points close to the centroid (Figures 7.6a, 7.6b, 7.6c), and some others where the users' explanations were more varied, and therefore the points are more distant on the plot. In the case of Figure 7.6a, it is evident that the explanations found by the users for the trend "Turkey" are very homogeneous, since the majority of the points are very close to the centroid, and only a few are distant. In fact, a vast majority of the users used the two words thanksgiving and christmas, while users who are far from the centroid happened to have given unique explanations such as "Many new films were on the cinema". Similarly, we observe a general agreement on the explanations for "Taylor" (Figure 7.6b), where the most popular words are taylor and swift. "Obama" (Figure 7.6c) is also an example of a cohesive group: the most common words we identified are campaign, presidential and election, employed by the majority of the users, while further from the centroid we find users who gave unique explanations such as, for instance, "Nobel prize". In the second group, "Italy", "Brazil" and "A Song of Ice and Fire" (Figures 7.6d, 7.6e, 7.6f) are trends that show how users gave more miscellaneous explanations, resulting in clusters that are more relaxed (they also have a larger scale in the figures). The reason for that is likely to be the different nature of the peaks in the "popular dates" group: for instance, the peaks for "Italy" include different editions of the Football World Cup, but also L'Aquila earthquake and the Costa Concordia cruise disaster, and since they all belong to different topics, reaching a common agreement between users is harder (Figure 7.6d); similarly, the peaks for "Brazil" include the Football World Cups and the 2013 riots and demonstrations against the local government, while for "A Song of Ice and Fire" (Figure 7.6f) we have both peaks related to the book releases and to the TV series premieres.

Topic Variety. The variety of domains behind a set of explanations is another indicator of the user agreement, and can help in assessing whether Dedalo is more performant in cases of more homogeneous trends. For instance, if every user mentioned the Game of Thrones TV series and only one mentioned the Red Wedding episode, then we cannot really consider the latter as relevant for "A Song of Ice and Fire". To identify this variety, we built an undirected graph for each trend, in which the nodes, consisting of each explanation, were connected by an edge if they had words in common. This gave us an initial idea of which words were used more or less often. We then used the Louvain Community Detection algorithm implemented within the Gephi tool6 to create communities of explanations according to the topic they covered. Figure 7.7 shows the communities of topics detected for "A Song of Ice and Fire". In this network, the node size increases according to how much the node is connected to other explanations (i.e. its degree), while the intensity of an edge increases with the number of common words between the two explanations.
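The same word-overlap graph and community detection can be reproduced programmatically, for example with networkx (a sketch assuming networkx ≥ 2.8, whose Louvain implementation stands in for the Gephi one used here; explanations are assumed to be already normalised strings):

import itertools
import networkx as nx

def explanation_communities(explanations):
    # Nodes are explanations; an edge links two explanations sharing at least
    # one word, weighted by the number of shared words.
    bags = [set(e.split()) for e in explanations]
    G = nx.Graph()
    G.add_nodes_from(range(len(bags)))
    for i, j in itertools.combinations(range(len(bags)), 2):
        shared = bags[i] & bags[j]
        if shared:
            G.add_edge(i, j, weight=len(shared))
    # Louvain community detection over the weighted graph.
    return nx.community.louvain_communities(G, weight="weight", seed=0)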

6 https://gephi.github.io/

Figure 7.7 Communities of explanations for “A Song of Ice and Fire”.

The Network Partitioning algorithm is not only useful to detect the preferred words within the explanations, but also to identify which topics the user-given explanations covered. In fact, if only a few nodes float apart in the graph, this means that there are explanations that only few users chose, and they might not be related to any other explanation. This is the case of the small green node at the top of Figure 7.7: it corresponds to the explanation "ceremony award" that only one user gave. The rest of the network is rather densely connected, meaning that there was, in general, a homogeneous agreement on the trend's explanations. With that said, four communities (respectively in red, purple, blue and yellow) emerged. To know what they were about, we looked at the node labels and extracted a general topic: the yellow community was the only one related to the release of the books (book and novel are the most shared words), while the others, which are related to the TV series, represented different subsets of it: the blue community concerns single aired shows, the red one the seasons of the series, and the purple one the general TV series. It is worth mentioning here that those communities related to the same topic (with a different granularity) and matched some of the explanations found by Dedalo: for example, ed5 relates to the category "A Song of Ice and Fire" and corresponds to the yellow group; ed1 relates to the category "Game of Thrones (TV series)" and corresponds to the purple community; and ed3 relates to the category "Game of Thrones (TV series) episodes" and corresponds to both the blue and the red groups.

Figure 7.8 Communities of explanations for (a) "Brazil", (b) "Italy" and (c) "Germany".

Some other types of explanation communities are given in Figure 7.8. Figures 7.8a and 7.8b present the communities detected for "Brazil" and "Italy", for which Dedalo had given less clear results, probably because the peak group included events of different natures. In both cases, users generally agreed on the explanations, resulting in some densely connected communities; nevertheless, we observed that the communities corresponding to unique events (such as the cruise disaster or the riots) remained apart and less connected. As a different case, the trend "Germany" in Figure 7.8c was much more uniform in the explanations (mainly related to football events) given by the users, as shown by the three uniformly connected clusters.

Figure 7.9 Communities of explanations for an ambiguous trend, "Taylor".

Finally, for the case of "Taylor" in Figure 7.9, for which Dedalo was not able to find any proper explanation, the users' explanations resulted in the following communities: (i) the death of Elizabeth Taylor, (ii) the release of a Taylor Swift album and (iii) awards given to the singer. Since (i) and (ii) are not related (they do not even correspond to the same person), and since (i) and (iii) are specific to one single event, it is now more understandable that it was harder for Dedalo to find significant explanations.

7.2.6 Results, Discussion and Error Analysis

In this part, we used the success and surprise indicators defined in Section 7.2.3 to evaluate Dedalo’s performance.

On Dedalo’s knowledge. First, we analysed what and how much Dedalo “knew” (or did not). To do so, we compared DSX and DSP (Dedalo’s success and surprise), as in Figure 7.10. We finally distinguished 3 different groups of performance.

Figure 7.10 Dedalo’s Success Rate (DSX) and Surprise Rate (DSP).

A first group includes trends where Dedalo's success (in blue) was higher than its surprise (in red); see the trends "A Song of Ice and Fire", "Germany", "Hurricane", "Rugby" and "Turkey". What we understand from this is that users declared that Dedalo was able to find a high number of explanations (high DSX), while missing few (low DSP). In those cases, we considered Dedalo's performance to be satisfactory. A second group consists of trends with an acceptable performance, i.e. "Brazil", "Daniel Radcliffe", "Italy", "Obama" and "Wembley". Here, Dedalo's missed explanations (DSP) were more than the ones successfully found (or just as many; see "Obama"); however, the DSX was still reasonable. This means that users declared at the same time that many explanations were missed by Dedalo (high DSP), but that many others were actually found (high DSX). Such a performance is again motivated by the nature of the trend: some of the peaks relate to events that are too specific to be identified by induction, and this prevented Dedalo from finding explanations that the users, on their side, could find more easily. For example, Dedalo did find the generic explanations for both "Italy" and "Brazil", i.e. the ones related to the Football World Cup are indeed identified and ranked high; nevertheless, the unique events such as Italy's disasters or Brazil's riots could not be found. The last group, for which the DSX rate was really low, corresponds to the "negative examples" (cf. the cases marked ✗ in Table 7.1), for which Dedalo was able to find the right explanations, but not to rank them among the best 10.

On the users’ knowledge. In this case, we focused on what users knew and what they missed. Since we could not directly measure the surprise of the users, we compared the number of explanations that the users did not think about (USP), with the ones Dedalo had found (DSX), and estimated if and how much users did not know.

Again, the idea here was that if an explanation ed_i did not appear in D′ of Task 2, but did appear among the ranked explanations in R of Task 3, then this explanation was one the user did not think about. We also put USP and DSX in relation to the usage of external background knowledge, to see whether there was a correlation with the two indicators. Results are presented in Table 7.3. In general, we observed that there were always 1 or 2 explanations per trend that the users missed. More specifically, we distinguished four behaviours, (1) to (4), among the users, presented below.

Case (1). Trends where users knew the topic well enough not to need too much external knowledge, but still missed some explanations. In those cases, USP was higher than DSX, but the percentage of external knowledge was low, which confirmed that few users needed it. See, for instance, "Obama" (it was easy to relate Obama to the U.S. presidential elections, but not to the federal elections), "Brazil" (assuming one was not expert enough to know the exact times of the World Cup), and "Turkey", for which users probably thought about the country while the trend clearly refers to the searches for turkey recipes during Thanksgiving and Christmas.

Case (3). Popular trends the users knew about, for which external knowledge was only a support to confirm their explanations, and for which they missed very few explanations. This was the case of "A Song of Ice and Fire", "Italy", "Rugby" or "Daniel Radcliffe", whose USP was lower than DSX, and for which the usage of external knowledge was high.

Table 7.3 Comparison between the user surprise (USP) and Dedalo’s success (DSX). We also include the use of external knowledge (EK), measured as the percentage of users that filled the final box in Task 1 with a non-negative answer (i.e. they relied on some external knowledge to give their explanations).

Case   Trend                     % EK    USP    DSX
(2)    Taylor                    78.57   0.42   0.28
(3)    A Song of Ice and Fire    62.16   0.70   1.65
(3)    Rugby                     58.82   1.35   1.76
(3)    Italy                     52.00   0.52   1.12
(3)    Daniel Radcliffe          51.85   0.40   1.00
(4)    Wembley                   47.61   1.09   1.09
(2)    Dakar                     47.36   0.42   0.05
(1)    Obama                     45.83   1.79   1.37
(1)    Brazil                    43.75   1.34   1.09
(3)    How I met your Mother     36.36   0.36   0.51
(4)    Hurricane                 33.33   1.57   1.81
(1)    Turkey                    23.80   1.00   0.95
(4)    Germany                   21.73   0.73   1.21

Case (2). Trends, like "Taylor" or "Dakar", that the users did not know well enough, and for which their surprise was high (USP higher than DSX). Considering that in Dedalo's top 20 explanations neither Taylor Swift nor the Paris-Dakar rally were mentioned, we concluded that users might have misunderstood those trends and wrongly ranked some explanations in Task 3. The high percentage of external knowledge usage confirms this idea.

Case (4). Miscellaneous trends of average popularity, such as "Germany", "Hurricane" or "Wembley", for which USP is lower, but still balanced, when compared to DSX. The usage of external knowledge is low in those cases.

Error Interpretation and Possible Solutions. We finally analysed those cases in which Dedalo did not perform well, and found three major reasons. In the case of "How I met your Mother", the pattern was big and noisy: the number of peaks actually not related to the trend was too high, and this prevented Dedalo from inducing the correct explanation(s). This was therefore a pattern quality problem. The same explains why some of the #Tre trends, related either to too specific entities (people such as "Carla" or "Beppe", TV series such as "Sherlock") or to unique events ("Volcano", "AT&T", "Bitcoin", "Star Wars"7), were excluded from the user evaluation. While noisy patterns may be refined to obtain good explanations, specific events cannot be explained using an inductive process, and finding a solution for this goes beyond our research scope. In the case of "Dakar", Dedalo's explanations mainly mentioned the KFC restaurant chain. It turned out that a recurring U.S. event sponsored by KFC happens at the same time as the Paris-Dakar rally. To confirm this, we compared the "Dakar" trend with the one corresponding to this event8. The error here is considered to be due to a spurious correlation, and can be mitigated by filtering out explanations that do not have any contextual relationship with the pattern using, for instance, web-relatedness metrics (see Section 7.3.1 below). In the case of "Taylor", the lack of information within Linked Data about Taylor Swift, as well as the ambiguity of the word, made the derivation of a general explanation more difficult. Similarly, the tennis-related trends had to be discarded since the related events did not have a proper date declaration in DBpedia, and they could not be retrieved by the initial SPARQL query. Such a problem is considered a Linked Data quality problem, which is related to Linked Data bias as discussed in Chapter 8.

7 Only two out of the seven current Star Wars movies have been released after 2004, when Google Trends started the data collection.

7.3 Second Empirical Study

To evaluate Dedalo's performance in ranking contextual relationships between an explanation and a pattern using the cost-function of Equation 6.5, we conducted a second empirical study, in which we could compare Dedalo's results with the users' gold-standard rankings.

7.3.1 Data Preparation

To prepare the data for this second study, the first step was to remove the spurious correlations, i.e. those explanations which were not related to the pattern, but which were ranked at high positions anyway. For that, we used the Normalised Google Distance (NGD, or "Google similarity"), a measure of semantic distance between two terms x and y calculated based on f(x) and f(y), i.e. the number of web pages containing respectively x and y, and f(x, y), the number of web pages containing both x and y, as reported by the Google search engine [Cilibrasi and Vitanyi, 2007]. The idea is that terms with similar meanings (or from similar domains) are likely to co-occur more often in web pages, while terms with dissimilar meanings will co-occur less frequently. The lower the score, the more related the terms are. In our scenario, we used the NGD to measure how "distant" an explanation is from its pattern, and to discard it above a given threshold.

8 Just compare the trends "Dakar" and "Big Bash" on http://www.google.com/trends/

To calculate the NGD for an explanation ε_{i,j} = ⟨p_i · v_j⟩, we searched for how frequently its ending value v_j occurred with the trend on the Web. For example,

NGD_1("A Song of Ice and Fire", "Game of Thrones TV series") = 0.33
NGD_2("A Song of Ice and Fire", "Game of Thrones TV series episodes") = 0.55

means that the term "A Song of Ice and Fire" appears in more pages together with the term "Game of Thrones TV series" than with the term "Game of Thrones TV series episodes", and therefore the first two terms are more related. Finally, for each trend, we kept only the 50 most-related explanations according to the NGD. This already helped in filtering out a considerable amount of unrelated explanations.
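The filtering can be sketched as follows, using the standard NGD formulation of Cilibrasi and Vitanyi (the total number of indexed pages N is an assumed constant, and the hit counts f(x), f(y), f(x, y) are taken from the search engine):

import math

def ngd(fx, fy, fxy, n=5e10):
    # Normalised Google Distance: lower values mean more related terms.
    # fx, fy: pages containing x and y; fxy: pages containing both;
    # n: assumed total number of pages indexed by the search engine.
    if fxy == 0:
        return float("inf")
    lfx, lfy, lfxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(lfx, lfy) - lfxy) / (math.log(n) - min(lfx, lfy))

# For each trend, explanations are then sorted by ngd(...) and only the
# 50 lowest-scoring (most related) ones are kept.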

The second step was to find a set of alternative contexts between an explanation and its trend, so that users could rank them and we could evaluate the Linked Data pathfinding process. For that, we chose a DBpedia entity t_i corresponding to the trend at hand (in our case, t_1 = db:ASongOfIceAndFire(novel)) and paired it with the ending value v_j of each of the 50 explanations. We then ran a bi-directional breadth-first search in order to find the set of alternative Linked Data paths p_i connecting t_i and v_j (a minimal sketch of this search is given after Table 7.4). This search was limited to 30 cycles and, if no path was found, the pair was discarded. We finally obtained 27 pairs with their alternative paths, distributed per trend as in Table 7.4.

Table 7.4 Number of paired explanations per trend.

Trend                     pairs     Trend                     pairs
Hurricane                     7     Wembley                       2
Obama                         6     Germany                       1
A Song of Ice and Fire        3     How I met your Mother         1
Brazil                        2     Italy                         1
Daniel Radcliffe              2     Rugby                         1
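The path collection can be sketched as below (a simplified bi-directional breadth-first search over an abstract neighbours() function standing in for Linked Data dereferencing; it returns a single connecting path for brevity, whereas the actual process collects the set of alternative paths found within the 30-cycle budget):

from collections import deque

def bidirectional_path(source, target, neighbours, max_cycles=30):
    # neighbours(uri) is assumed to return the URIs adjacent to `uri`.
    if source == target:
        return [source]
    parent_s, parent_t = {source: None}, {target: None}
    front_s, front_t = deque([source]), deque([target])

    def build_path(meet):
        # Walk back to the source, then forward to the target.
        left, node = [], meet
        while node is not None:
            left.append(node)
            node = parent_s[node]
        left.reverse()
        node = parent_t[meet]
        while node is not None:
            left.append(node)
            node = parent_t[node]
        return left

    for _ in range(max_cycles):
        # Expand the smaller frontier, one full level per cycle.
        frontier, parents, other = (
            (front_s, parent_s, parent_t)
            if len(front_s) <= len(front_t)
            else (front_t, parent_t, parent_s)
        )
        for _ in range(len(frontier)):
            node = frontier.popleft()
            for nxt in neighbours(node):
                if nxt in parents:
                    continue
                parents[nxt] = node
                if nxt in other:
                    return build_path(nxt)
                frontier.append(nxt)
    return None  # the pair is discarded if the frontiers never meet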

7.3.2 Evaluation Interface

The second evaluation was built on a SurveyMonkey9 interface, in which users were required to rank the 27 sets P_i = {p_1, ..., p_n} of alternative Linked Data paths between a trend-explanation pair, from rank 1 (the most relevant connection) to rank n (the weakest relationship). Because we were mostly looking into assessing the strongest relationship, users were asked to focus on properly ranking the first three relationships only. Figure 7.11 shows how some paths were ranked for the pair db:ASongOfIceAndFire(novel)-dbc:GameOfThrones(TVseries). Again, Linked Data paths were automatically translated into natural language sentences to improve user readability.

9 https://www.surveymonkey.com

Figure 7.11 Second Empirical Study: ranking paths.

To obtain more homogeneity in the judgments and avoid the typical long-tail problem (i.e. few pairs judged by many, and many pairs ranked by few), we created 3 different surveys with 10 pairs each, so that once we had reached 10 participants for one survey (meaning that 10 judges had ranked the 10 pairs in it), we could close the survey and change the redirection link10 to a new survey.

7.3.3 Evaluation Measurements

During the second evaluation, we looked into assessing how relevant Dedalo’s ranked explanations were according to the users.

Ranking Success. In this case, as in Chapter 6, we used the Normalised Discounted Cumulative Gain to compare Dedalo's ranked explanations before and after the Web-based filtering, using the ranks provided by the users in Task 3 (first evaluation). The Cumulative Gain for a path is defined as in Equation 6.1; however, since users did not provide a rel_m score for the paths, we assessed it based on the rank m that the explanation had in the set R:

rel_m = |R| - m \qquad (7.4)

As in Chapter 6, we then normalised each path's CG based on the ideal ranks the users had given, and averaged them into avg(P_i) as in Equation 6.2. Finally, we estimated the RKS for a given trend as:

RKS = \frac{\sum_{i=1}^{j} avg(P_i)}{j} \qquad (7.5)

with j being the number of pairs for a given trend. The closer RKS is to 1, the better Dedalo was in ranking explanations.

10 http://linkedu.eu/dedalo/eval/relFinder

Pathfinding Success. We also used the Normalised Discounted Cumulative Gain to measure how good the Linked Data pathfinding cost-function of Equation 6.5 was in ranking contexts, compared to the rankings provided by the users during the second evaluation. Since users were asked in this case to focus on the first three strongest relationships, we defined the path relevance score rel_m assessed by the users as:

rel_m = 4 - m \qquad (7.6)

The Pathfinding Success is then evaluated on each trend as above:

PFS = \frac{\sum_{i=1}^{j} avg(P_i)}{j} \qquad (7.7)
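Both indicators boil down to averaging NDCG scores per trend; a sketch is given below (the log2 discount is the standard DCG formulation and is assumed here, since Equation 6.1 is not reproduced in this chapter):

import math

def ndcg(predicted, rel):
    # predicted: items in the order proposed by the system;
    # rel: item -> relevance derived from the users' ranks, e.g. |R| - m
    #      (Eq. 7.4) or 4 - m (Eq. 7.6); missing items count as 0.
    dcg = sum(rel.get(it, 0) / math.log2(pos + 2) for pos, it in enumerate(predicted))
    ideal = sorted(rel.values(), reverse=True)
    idcg = sum(r / math.log2(pos + 2) for pos, r in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

def trend_score(ndcg_per_pair):
    # RKS (Eq. 7.5) and PFS (Eq. 7.7): average of avg(P_i) over a trend's pairs.
    return sum(ndcg_per_pair) / len(ndcg_per_pair)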

7.3.4 User Agreement

The second evaluation was designed in order to assess the validity of Dedalo's rankings, and participants were not asked to provide personal details. As said, we limited the number of users per survey (3 surveys in total) to 10, for an overall participation of 30 people. We then used Fleiss' kappa to evaluate how much users agreed on ranking the sets of alternative paths in the second evaluation. In Table 7.5, we show the results collected, namely the pairs presented to the users, the number of alternative paths between a pair of entities, and the rating agreement of the 10 judges. Although there is no standardised interpretation of the kappa values, the higher they are, the stronger the users agreed with each other.
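Fleiss' kappa for one trend-explanation pair can be computed, for instance, with statsmodels (a sketch; treating the judges' ranks as categorical labels is an assumption about how the agreement was measured):

import numpy as np
from statsmodels.stats.inter_rater import aggregate_raters, fleiss_kappa

def rank_agreement(rank_matrix):
    # rank_matrix: one row per path, one column per judge; each cell is the
    # rank (1, 2, 3, ...) that the judge assigned to that path.
    counts, _ = aggregate_raters(np.asarray(rank_matrix))
    return fleiss_kappa(counts)

# e.g. three paths ranked by four judges
print(rank_agreement([[1, 1, 2, 1], [2, 2, 1, 2], [3, 3, 3, 3]]))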

11 Redirection from the db:Hurricane entity.
12 Note that the first entity refers to the novel, while the second to the topic within the DBpedia taxonomy.

Table 7.5 Pairs presented in the second evaluation, number of alternative paths |P_i| and Fleiss' kappa (k) agreement.

Pair                                                                      |P_i|       k
db:Brazil  dbc:FIFAWorldCupTournaments                                        7   0.471
db:TropicalCyclone11  dbc:AtlanticHurricaneSeasons                            6   0.433
db:Brazil  dbc:2014FIFAWorldCup                                               9   0.328
db:BarackObama  dbc:UnitedStatesPresidentialElectionsInDelaware               8   0.259
db:BarackObama  dbc:UnitedStatesPresidentialElectionsInMissouri               7   0.249
db:DanielRadcliffe  dbc:HarryPotter                                          10   0.246
db:TropicalCyclone  dbc:TropicalCycloneMeteorology                           14   0.243
db:Wembley  dbc:UEFAChampionsLeagueFinals                                    11   0.189
db:BarackObama  dbc:UnitedStatesPresidentialElectionsInWashington(state)      7   0.181
db:DanielRadcliffe  dbc:21st-centuryBritishChildren'sLiterature              12   0.178
db:TropicalCyclone  dbc:TropicalCyclonesByStrength                            4   0.175
db:BarackObama  db:HistoryOfTheUnitedStates(1991-present)                    20   0.172
db:BarackObama  dbc:UnitedStatesPresidentialElectionsInMassachusetts          6   0.166
db:BarackObama  dbc:UnitedStatesPresidentialElectionsInIllinois               8   0.163
db:ASongOfIceAndFire  dbc:GameOfThrones(TVseries)                             7   0.157
db:Wembley  dbc:FootballLeaguePlay-offs                                       3   0.142
db:ASongOfIceAndFire  dbc:GameOfThrones(TVseries)(episodes)                   5   0.129
db:Italy  dbc:AustraliaAtTheFIFAWorldCup                                     10   0.120
db:ASongOfIceAndFire  dbc:ASongOfIceAndFire12                                20   0.104
db:TropicalCyclone  dbc:HurricanesInFlorida                                   5   0.098
db:TropicalCyclone  dbc:HurricanesInTheDominicanRepublic                     17   0.072
db:Wembley  dbc:FootballLeaguePlay-offs-Finals                                5   0.037
db:TropicalCyclone  dbc:HurricanesInHaiti                                    17   0.034
db:Germany  dbc:FIFAWorldCupTournaments                                       4   0.029
db:HowIMetYourMother  dbc:TelevisionSeriesSetInThe2030s                      20  -0.008
db:TropicalCyclone  dbc:HurricanesInTheUnitedStatesByState                    7  -0.023
db:Rugby  dbc:InternationalRugbyBoardCompetitions                             4  -0.036

What we notice here is that better agreement was reached with more popular topics such as "Brazil", "Hurricane" or "Barack Obama", even when the number of paths |P_i| to rank was large. On the contrary, when the trend topic was less trivial, the more paths there were to be ranked, the less agreement was reached. The nature of the set of paths that was presented to the users also affects the agreement: using a breadth-first search, in fact, many of the paths were equally relevant. For instance, in the example of Figure 7.11, each of the paths ranked from 2 to 5 is related to one specific episode of Game of Thrones, and making a real difference between them is not an easy task.

7.3.5 Results, Discussion and Error Analysis

Here, we evaluate Dedalo's ranking and pathfinding success using the two indicators presented in Section 7.3.3.

On the success in ranking explanations. First, we analysed how Dedalo performed in ranking the explanations in Task 3 (see Section 7.2.2), by comparing the RKS score before and after the Web-based filtering of irrelevant explanations. We additionally compare the explanation rankings obtained with the Fuzzy F-Measure to those obtained with the F-Measure. Results are reported in Table 7.6.

Table 7.6 Dedalo’s ranking success RKS before (B) and after (A) Web-based filtering.

Trend                        Fuzzy F-Measure            F-Measure
                             RKS (B)   RKS (A)          RKS (B)   RKS (A)
A Song of Ice and Fire        0.271     0.738            0.249     0.684
Brazil                        0.956     0.979            0.608     0.806
Dakar                         0.907     0.986            0.693     0.781
Daniel Radcliffe              0.054     0.325            0.183     0.568
Germany                       0.946     0.980            0.952     0.999
How I met your Mother         0.289     1.000            0.310     1.000
Hurricane                     0.468     0.528            0.418     0.546
Italy                         0.892     0.929            0.208     0.224
Obama                         0.492     0.566            0.858     0.888
Rugby                         0.701     0.726            0.307     0.361
Taylor                        0.526     0.770            0.061     0.000
Turkey                        0.187     0.331            0.134     0.317
Wembley                       0.742     0.794            0.881     0.889

First, by comparing the F-Measure and Fuzzy F-Measure before the filtering (B), we observe that the F-Measure ranking success is lower than the Fuzzy F-Measure one in most of the cases, meaning that the best 20 explanations provided by the F-Measure were irrelevant when compared to the user-provided explanations. Those results are an additional confirmation that the Fuzzy F-Measure is required to obtain more precise explanations. As for the few exceptions (namely "Daniel Radcliffe", "How I met your Mother", "Obama" and "Wembley"), a plausible explanation might be the unrelated events that happened at the same time as the events actually related to the pattern (e.g. the Lacrosse Championship happening at the same time as the events at the Wembley stadium); this, along with a less clearly defined pattern, might have induced the Fuzzy F-Measure to introduce more noise. In other cases, e.g. for "Daniel Radcliffe", the user-based gold standard included misleading explanations, such as events related to the TV series Rome or to the category "Television series set in Italy". We then compared the results before (B) and after (A) the Web-relatedness filtering. In most of the cases, the Web-relatedness filtering helps in improving the relevance of the induced explanations. In those cases in which results were more ambiguous, such as "A Song of Ice and Fire" and "How I met your Mother", we observe that it considerably helped in reducing the noise, reaching a perfect agreement in the second case. As said in Section 7.2.6, both "Dakar" and "Taylor" were misunderstood by the users, and therefore those results are not to be taken into consideration.

On the success in ranking contexts. The final analysis consists in evaluating whether the pathfinding cost-function was good at assessing the strongest paths in the Google Trends scenario. In Table 7.7, we report the PFS of each trend.

Table 7.7 Pathfinding success on each trend.

Trend                      PFS      Trend                      PFS
Brazil                     1        Germany                    0.571
Daniel Radcliffe           1        Italy                      0.531
Rugby                      0.834    Obama                      0.388
Wembley                    0.604    Hurricane                  0.264
A Song of Ice and Fire     0.593    How I met your Mother      0

** Dakar, Taylor, Turkey: N/A

What we observe from here is that some trends were more successful than others, as in the case of "Brazil" and "Daniel Radcliffe". Here, the agreement between users and Dedalo was perfect (or almost so), which further demonstrates how the Web of Data structural features that we learnt using Genetic Programming are beneficial in a large and cross-domain graph search space such as the Google Trends scenario. Acceptable results are also obtained for "Wembley", "A Song of Ice and Fire", "Germany" and "Italy", which present a lower PFS score mostly because of the quality of the rankings presented to the users: as previously said, many alternative paths were equivalent and telling them apart can be hard. A similar explanation justifies the less satisfying performance on "Obama" and "Hurricane": as reported in Table 7.5, the number of paths to choose from was generally higher and, as a consequence, users tended to disagree much more. Similarly, since there was no user agreement on "How I met your Mother", the function was not able to properly rank the relationships. As for the trend "Rugby", for which the PFS score was surprisingly high when compared to its kappa user agreement, we believe that the structural equivalence of the relationships to be ranked led both Dedalo and the users to erroneous conclusions. Finally, no pairs for "Dakar", "Taylor" and "Turkey" could be found by the bi-directional search, due to missing information in Linked Data, and therefore evaluating those three trends in our setting was not possible.

7.4 Final Discussion and Conclusions

This chapter presented the final step of our work, in which we evaluated the proposed approach empirically in a user study where users were asked to assess the validity of Dedalo's results, namely the induced explanations and their contextual relationship with the pattern. We described how we designed the evaluation upon a real-world scenario, i.e. the popularity increase of certain web searches according to the Google Trends tool, and then thoroughly presented the two evaluations that we conducted. For each study, we presented how data, tasks and interfaces were prepared; we showed which aspects we intended to evaluate; we described the users' engagement; and finally we presented the results that we obtained, and discussed reasons for errors and misinterpretations. Rather than presenting the limitations, which we discuss in detail in Chapter 8, we present here the conclusions we could draw from the empirical study.

The first conclusion concerns the general process of explaining a pattern. We showed that an explanation cannot be easily achieved by someone who is not a domain expert: the majority of the users in the first study, in fact, employed external knowledge to support the explanations they had given, which also demonstrates that expertise alone is not sufficient to explain a multi-faceted pattern.

The second conclusion concerns the feasibility of generating candidate explanations automatically with the Web of Data. We showed that it is indeed possible to automatically induce explanations, provided that patterns are clear (in the case of the trends, recurrent) and enough information about them is available within Linked Data. In those cases, the Fuzzy F-Measure is the factor allowing our system to identify the right explanations, by giving more importance to data points that are more representative of the pattern.

The third conclusion is on the importance of the contextual information. We showed that the Fuzzy F-Measure is affected by coincidences with patterns of a more heterogeneous nature, and that Web-based filtering can be used to reduce such misunderstandings. On the other hand, we saw how contextual information has to play a role in the process of automatically generating explanations, because it further qualifies them, and we demonstrated how the cross-domain, connected knowledge of the Web of Data can be exploited for this purpose.

Generally, this chapter validated our hypothesis that we could use the knowledge from the Web of Data to automatically generate explanations that are, in many cases, as meaningful as the ones human evaluators would give, which makes us conclude that the approach presented in this thesis is, to a certain extent, feasible. In the following chapter, we will discuss the limitations of our approach and some future work.

Chapter 8

Discussion and Conclusions

In this final chapter, we present the conclusions to our work. After the introduction of Section 8.1, we summarise the approach in Section 8.2, highlighting how our research questions have been answered and what our contributions are. Section 8.3 then discusses the limitations of the approach, together with some extensions that we aim to investigate in the future. The chapter closes with Section 8.4, which offers final remarks and some lessons that we have learnt.

8.1 Introduction

In the previous chapter, we presented an extensive analysis of Dedalo, based on the empirical user study that was performed during the third year of our research. For that, we used a real-world scenario, in which the focus for the participants (both Dedalo and the users) was to explain the popularity of terms from Google Trends. In short, the empirical study consisted in showing users the explanations that we automatically produced with Dedalo, whose soundness was then validated against the users' judgement. One of our objectives was to show that the process of interpreting patterns is indeed strenuous and time-consuming for humans, because it requires knowledge that one might lack, or that might not be promptly accessible, especially when various domains are involved. We showed that it is possible to automatically retrieve knowledge from the Web of Data in order to produce explanations for the phenomena that were encoded into patterns. Secondly, we showed that those explanations are as meaningful as the ones given by humans. Finally, we demonstrated that the Web of Data acts as a support for the interpretation process, because it contains the knowledge that one might lack. This also supports our hypothesis that the structured knowledge and the techniques associated with the Web of Data can in fact improve the Knowledge Discovery process.

The scope of this chapter is to give the reader a conclusive overview of our research work. We analyse what we have done and how much of it, and we compare this with the objectives that we originally set. For that, we summarise the work achieved in our thesis and how the research questions that we initially identified were answered. From there, we discuss which limitations our approach presents; we are particularly interested in finding plausible reasons for them. Investigating those reasons also helps us in defining and proposing new solutions and extensions of our work, which we will undertake in the future. Some of them, as we will see, have already been partly investigated and have resulted in new publications.

8.2 Summary, Answers and Contributions

The goal of this thesis was to show that it is possible to automatically explain data patterns using background knowledge from the Web of Data. The main motivation behind that was to show how the interconnected and cross-domain knowledge of the Web of Data could be exploited to assist experts (and non-experts) in the process of explaining their results, and therefore to improve the process of discovering knowledge. Based on this main research hypothesis, i.e.

the process of pattern explanation can be improved by using background knowledge from the Web of (Linked) Data

our work focused on setting up a process in which we could obtain sound explanations for a pattern under observation, given some background knowledge derived from the Web of Data. This process, of course, had to be completely human-independent, both in producing the explanations and in finding the background knowledge it required. Multiple research questions arose from setting such an objective, namely:

• what do we mean by explanation? (Section 1.3.1)
• how do we automatically get the relevant knowledge from the Web of Data? (Section 1.3.2)
• how do we know which knowledge is required for an explanation? (Section 1.3.2)
• how do we automatically generate explanations? (Section 1.3.3)
• how do we know that our explanations are good? (Section 1.3.4)
• how do we know that our process is good? (Section 1.3.4)

On the basis of those research questions, we can now analyse the answers and contributions we brought.

8.2.1 Definition of an Explanation

Did we define what an explanation is?

Considering that the concept of explanation was at the core of our research hypothesis, the first question we focused on was how to formally define it. In fact, we needed to identify the elements acting in an explanation, so that the process we would build could know what to look for. The challenge here was to prove that we were not making assumptions about why a phenomenon happens, nor identifying some potentially correlated events whose validity could not be proved. Identifying those elements, and how they interact in an explanation, was therefore the key to stating that we were able to automatically produce meaningful and complete explanations for some phenomena. The challenge, then, became to find those elements and, once they were found, to use them to formally define an explanation.

The methodology we chose to answer this question was to conduct a survey in Cognitive Science (Section 2.1), where we studied what is important in an explanation according to the experts of the different disciplines (philosophy, psychology, neuroscience, computer science, sociology and linguistics). The result of the study was that the disciplines in Cognitive Science do, structurally, see explanations in the same way. Besides the contextual or lexical differences they can have, explanations are always composed of the same four components: an event (which we called anterior event or explanans) that happens first, a second event (posterior event or explanandum) which follows, a set of circumstances (or context) that links the two events, and a law (or theory) that governs their occurrence. We modelled those components in a small ontology that we called the Explanation Ontology, presented in detail in Section 2.1.2. This ontology, as well as the survey in Cognitive Science, represents our main contribution for this research question.

The formal model for an explanation revealed exactly which tasks we needed to complete in our process. Given a posterior phenomenon encoded in a pattern, our ultimate target was to create an automatic process which would use the knowledge from the Web of Data to derive the three other components: the anterior event, the context and the theory.
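To make the four components concrete, the snippet below sketches them as a plain data structure. This is only an illustrative, in-code analogue of the Explanation Ontology (which is an ontological model, not Python); the field names and the example values are our own, loosely based on one of the Google Trends patterns.

from dataclasses import dataclass

@dataclass
class Explanation:
    # The four components identified by the survey in Cognitive Science.
    anterior_event: str    # explanans: what happened first
    posterior_event: str   # explanandum: the observed pattern to explain
    context: str           # circumstances linking the two events
    theory: str            # general law governing their occurrence

example = Explanation(
    anterior_event="a new Game of Thrones season aired",
    posterior_event="searches for 'A Song of Ice and Fire' peaked",
    context="the TV series is an adaptation of the novels",
    theory="people search over the Web for what they are interested in",
)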

8.2.2 Detection of the Background Knowledge

Did we find a way to detect the background knowledge in the Web of Data?

The next question we targeted was how to detect, in the Web of Data, the background knowledge that was needed to derive the components of an explanation. This question

required us to investigate how to use the structured and cross-domain nature of Linked Data to reveal such knowledge. Being determined to use Linked Data, the major challenge for us was to avoid running into computational issues due to the large space of analysis. What was required was to focus on how to strategically find, in Linked Data, only the right piece of background knowledge. This research question raised two problems at the same time: on the one hand, automatically finding and choosing the right datasets in the large choice offered by Linked Data (the inter-dataset problem); on the other, finding within a selected dataset the piece of information required for the explanation (the intra-dataset problem).

The first contribution we brought here was to show that a combination of Linked Data Traversal and graph search strategies was the key to discovering knowledge with little computational effort. In both Chapter 4, focused on the detection of the anterior events participating in an explanation, and Chapter 6, focused on the detection of the context, we showed that using URI dereferencing and the native links between entities was the key to agnostically accessing datasets on-the-fly, to efficiently accessing only the necessary portion of data, and to avoiding data crawling and indexing, which would have implied, at the same time, introducing a priori knowledge about the problem and increasing the computational costs.

With that said, our second contribution for this question was to show which features of Linked Data help in finding the right piece of information. In Chapter 4, we showed that an ad-hoc adaptation of Shannon's Entropy was a successful solution for the detection of the right candidate anterior events. This was established through a comparative study of different graph search heuristics, where we showed that Entropy was the best measure for accessing the required information efficiently (in time and search space). Analogously, in Chapter 6, we showed that specific, richly described low-level entities were the important Linked Data structural characteristics to be taken into account in order to efficiently identify the context in which the pattern and its candidate explanans happen.
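As an illustration of the kind of heuristic described above, the sketch below scores each candidate property by the Shannon Entropy of the values it yields for the items of a pattern. The exact adaptation used in Chapter 4 is richer, and whether higher or lower entropy is preferred depends on that adaptation; the names and toy data here are ours.

import math
from collections import Counter

def shannon_entropy(values):
    # Entropy of the distribution of values reached by following one candidate property.
    counts = Counter(values)
    total = sum(counts.values())
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def best_property(candidate_values):
    # candidate_values: property -> values it yields for the pattern items.
    # Assumption for this sketch: prefer the property with the highest entropy.
    return max(candidate_values, key=lambda p: shannon_entropy(candidate_values[p]))

print(best_property({
    "dbo:birthPlace": ["db:Rome", "db:Milan", "db:Rome", "db:Turin"],
    "rdf:type":       ["dbo:Person", "dbo:Person", "dbo:Person", "dbo:Person"],
}))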

8.2.3 Generation of the Explanations

Did we find a way to automatically generate explanations using background knowledge from the Web of Data?

Once the right background knowledge was detected, the next problem was how to use it to automatically generate complete explanations. The particular challenge in this context was to emulate the human process of "generating understandable explanations using one's own background knowledge", without running into the computational issues that the data deluge would bring. Following the idea of using logical inference and reasoning, we showed that Inductive Logic Programming (ILP) was an interesting solution to the problem. The main insight here was to recognise that Inductive Logic Programming frameworks, which combine features from Machine Learning and Logic Programming, are able to automatically derive (first-order clausal) hypotheses by reasoning upon a set of positive and negative examples and some background knowledge about them.

The first contribution we brought was to show, in Chapter 3, how we could use ILP to generate pattern explanations with background knowledge from the Web of Data. With this scope in mind, we redesigned our problem as an ILP one, where: (1) the items in the pattern to explain played the role of the positive examples from which we would learn; (2) the hypotheses to be derived were the anterior events explaining the pattern; and (3) the background knowledge upon which we would base our reasoning was composed of Linked Data statements. Having shown the potential of ILP in generating potentially useful anterior events for a pattern, we proposed in Chapter 4 to extend and adapt the process to make it more suitable for Linked Data, whose size and complexity could not be handled by ILP frameworks, notoriously known for not being efficient or scalable. The approach we presented, defining Dedalo's initial design, was considered more adequate because it used an anytime process, in which the background knowledge was iteratively enlarged using the above-mentioned Link Traversal-based graph search, which allowed us to constantly find new and possibly better anterior events for a pattern. The design and implementation of this process represents our second contribution for this question.
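A toy sketch of this ILP-style framing is given below: pattern items act as positive examples, Linked Data statements as background knowledge, and a candidate anterior event is a (property, value) pair scored by how many positive and negative examples it covers. The triples and the coverage scoring are invented for illustration; Dedalo's actual evaluation measures are the (Fuzzy) F-Measure discussed in the next subsection.

# Background knowledge: Linked Data statements as (subject, property, object) triples.
background = [
    ("ex:item1", "dc:subject", "db:SemanticWeb"),
    ("ex:item2", "dc:subject", "db:SemanticWeb"),
    ("ex:item3", "dc:subject", "db:Databases"),
]
positives = {"ex:item1", "ex:item2"}   # items inside the pattern to explain
negatives = {"ex:item3"}               # items outside the pattern

def coverage(prop, value):
    # Coverage of the candidate hypothesis "items linked to `value` through `prop`",
    # i.e. a possible anterior event for the pattern.
    covered = {s for s, p, o in background if p == prop and o == value}
    return len(covered & positives), len(covered & negatives)

print(coverage("dc:subject", "db:SemanticWeb"))   # -> (2, 0): covers all positives, no negative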

8.2.4 Evaluation of the Explanations

Did we find a way to evaluate the explanations, and to evaluate the approach we propose?

The final question was how to make sure that the induced explanations could be validated and evaluated as valid for a pattern. The major challenge consisted in defining what makes an explanation valid and how to measure it. Here, the first part of our work focused on evaluating the generated explanations, and especially on finding the criteria to assess their interestingness. We began by using standard accuracy measures, i.e. the F-Measure and the Weighted Relative Accuracy, to statistically determine how much an induced anterior event is correlated with a pattern. To do so, we used the anterior event as a classifier that was tested on the ground truth of the pattern. Inspired by cluster analysis approaches, we then extended the F-Measure to a

weighted variant of it, which we called the "Fuzzy" F-Measure. This choice, intentionally made to prioritise the most influential data within the pattern, gave a first contribution to the problem: in fact, we demonstrated that better explanations could be found when taking into account that items influence patterns in different ways. As a second contribution to the problem, we proposed in Chapter 5 to improve the accuracy of explanations by aggregating different explanantia, and used a trained Neural Network to predict which aggregations were more convenient. Aiming to produce explanations that are more complete with respect to the Explanation Ontology, we finally focused on showing that a pattern and an induced Linked Data explanation are in the right context. We measured the pertinence of this context using the evaluation function learnt in Chapter 6. Our major contribution here was to reveal which features of the Web of Data topology have to be taken into account when measuring entity relatedness.

The second part of this problem concerned how to evaluate our approach against domain experts and users. For that, we designed an empirical user study based on a real-world scenario (the interpretation of Google Trends), where we evaluated Dedalo's results against human judgement. Chapter 7 presented how we articulated this study, in which users were required to provide their own explanations for a trend's popularity, as well as to rank contexts according to their strength; we then used these to assess the validity of Dedalo's results. Our results were finally measured in terms of how much knowledge we could automatically find, and how helpful an automatic process to produce explanations could be for non-experts in the relevant domain. The general evaluation showed that our approach was in fact able to identify explanations that could be as interesting as the ones of the human evaluators, and possibly to bring new knowledge beyond what non-expert users would already know.
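The sketch below illustrates the idea of such a weighted ("fuzzy") variant of the F-Measure: each pattern item carries a membership weight, and precision and recall are computed over those weights rather than over raw counts. The exact weighting scheme is the one defined in the earlier chapters; this simplified version and its names are our own.

def fuzzy_f_measure(covered, pattern_weights):
    # covered: set of items selected by a candidate explanation.
    # pattern_weights: item -> membership weight in [0, 1] within the pattern
    # (items outside the pattern implicitly have weight 0).
    tp = sum(pattern_weights.get(item, 0.0) for item in covered)
    precision = tp / len(covered) if covered else 0.0
    recall = tp / sum(pattern_weights.values()) if pattern_weights else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0

# An explanation covering the two most representative items scores higher than one
# covering a representative item plus an item outside the pattern.
weights = {"item1": 1.0, "item2": 0.8, "item3": 0.2}
print(fuzzy_f_measure({"item1", "item2"}, weights))   # 0.90
print(fuzzy_f_measure({"item1", "item4"}, weights))   # 0.50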

The work we presented demonstrated that it is possible to use the Web of Data as a source of background knowledge to explain data patterns, and it showed some of the possible solutions to achieve it. Of course, this does not come without limitations and possibilities of extension, which we will discuss in the following section.

8.3 Limitations and Future Work

In this section, we will discuss the major limitations of Dedalo. We divided them following the narrative of our thesis and, for each of them, we will present and discuss some ideas for future work that we could (or have already started to) undertake.

8.3.1 Dedalo vs. the Explanation Completeness

During our work, we often recalled that explanations, according to the Explanation Ontology defined in Section 2.1.2, had to be composed of four elements, namely two events, the context that relates them and the theory that makes those events happen. Given that, our work intentionally focused on automatically deriving candidate anterior events to explain patterns and, consequently, on identifying the context relating them. This means that it can easily be argued that Dedalo is not able to produce explanations that are complete with respect to the model we designed, because it is missing a process to automatically derive the law (theory), i.e. the general principle that governs the occurrence of the events. At the moment, deducing this theory is left to the users to whom the explanations are presented. For Dedalo to be complete, what is needed is therefore an automatic process to derive a theory, possibly using knowledge from the Web of Data.

To derive a theory behind a candidate explanation, ontologies are an interesting starting point. For example, the ontological schema of an extracted candidate explanation and of its context might reveal part of the theory, e.g. "A Song of Ice and Fire" is a novel, "Game of Thrones" is a TV series, and TV series are based on novels. However, assuming that all the required relationships are described somewhere in Linked Data might be too naïve. Moreover, the theory also includes the assumptions under which the pattern was created, such as "people search over the Web for what they are interested in". This sort of information might be found in top-level ontologies such as Cyc1, but it is difficult to expect that every assumption behind the creation of a pattern can be represented in common-sense knowledge bases. Contrary to the trend searches, behind which only one clear assumption lies, other patterns might be built upon more composite assumptions, which might be more difficult to find in the Web of Data. Also, given that we have listed different components that take part in a theory, the first question that should be answered is: "What else might be needed for a theory to be complete?". Finally, even assuming the whole theory has been found, its validation would require several patterns of the same type, which might not be available. Although automatically deriving a theory is indeed a valuable piece of future work that we considered, we believe that the efforts and challenges it would require might result in several years of research at different conceptual levels, and such research would go in a different direction from our current one.

With that said, a possibility for future work that we foresee is to start with the assumption that we will not be able to find in the Web of Data the general principles that we are looking

1 http://www.cyc.com/kb/

for, and that we can therefore only use Dedalo as a means to bring the new (missing) knowledge into it. This work opens a plethora of challenges: how to find the knowledge needed for a given theory; how to generate this knowledge; how to combine it with the knowledge (if any) that is already accessible in the Web of Data; how to evaluate the generated theories against the Web of Data; and how to make them useful for it. We believe, however, that such a work would contribute to creating continuity between the Web of Data, which is mostly data-driven, and ontological knowledge, which is more conceptual and upon which a theory needs to be built.

A second direction would be to focus on extending the Explanation Ontology. One could focus on refining the model by introducing the qualities that are necessary for an explanation to be complete according to the cognitive sciences, for instance validity, truth or novelty. This direction of work is particularly interesting because it would make the evaluation of candidate explanations a richer process, including both quantitative and qualitative aspects. New perspectives going beyond the disciplines in Cognitive Science might also be explored: for example, the model might include the audience that the explanation is targeting (e.g. experts and non-experts might in fact need different components, or a different vocabulary might be used) or its goal. This would mostly mean adapting the traversal approach, which at the moment is meant to be domain-agnostic, and possibly reconsidering some of the measures that drive it. Finally, we might foresee an evaluation of the model using several frameworks to automatically derive explanations.

8.3.2 Dedalo vs. the Knowledge Discovery Field

One of the main motivations behind this thesis was to promote Dedalo as a novel framework for discovering knowledge thanks to the "easily-accessible" information of the Web of Data. The general idea we carried forward was to show the Knowledge Discovery community how explanations could be induced with knowledge from the Web of Data, and how we could compare Dedalo's induction to the common reasoning approaches used in Logic Programming. While we repeatedly presented it as an ILP-inspired framework, Dedalo is clearly not ready to be compared to reasoning frameworks, and consequently to be soundly presented as a contribution to the long-standing Knowledge Discovery field. The main argument from the Logics perspective is that Dedalo relies on the graph structure of Linked Data, and does not resort to any form of inference, as ILP algorithms generally do. Also, in conformity with the Semantic Web's open-world assumption, Dedalo's inductive process only relies on the assumed-to-be-true statements found in Linked Data. This makes Knowledge Discovery experts, for whom true facts are only so if they can also be asserted based on some false evidence known in the knowledge base, more reluctant to accept Dedalo.

Given this point, in future work we intend to investigate a deep comparison between Dedalo and different machine learning techniques, and more precisely some Inductive Logic Programming ones, in order to identify the features that we are missing for Dedalo to be acceptable in a different community, but also to show that logic-based methods might not be effective in reasoning at Web scale (due to the abundance of uncertain and heterogeneous information). In accordance with what was discussed in Section 8.3.1, such a comparison might benefit from the inclusion of a process to derive the theory based upon reasoning over the ontological structure of the explanations. This would require us to thoroughly compare Dedalo and standard Machine Learning methods, in order to show how much more computational effort (in time, resources and knowledge accessibility) is required by those to derive the same results. Also, we could express Dedalo's output explanations in one of the Semantic Web standard rule languages, and compare them to common Knowledge Discovery patterns (e.g. association rules). Such a comparison would, on the one hand, give a sound justification to our choices, showing that Dedalo's target problem is not an ad-hoc one but is motivated by the nature of the Web of Data; on the other, it would also be a step forward in bringing the two communities (Knowledge Discovery and the Web of Data) closer to each other.

8.3.3 Dedalo vs. the Explanation Evaluation

Our work largely discussed the evaluation of Dedalo's explanations, which we eventually based on the statistical accuracy of the candidate event with respect to a pattern, combined with an evaluation function measuring the strength of the context between them. The issue that can be raised here is that the chosen evaluation strategy is highly sensitive, i.e. it can easily be affected by external factors which bring noisy results (as we have frequently seen). One of the main reasons for such sensitivity is that Dedalo was designed with the idea that explanations could be derived independently from the quality of the cluster, under the (naïve) assumption that datasets are naturally balanced. This means that the data in a pattern are considered equally important, and so are the patterns in the dataset. Also, Dedalo's strategy for inducing explanations relies on the assumption that a hard clustering was performed on the dataset, i.e. that data are divided into distinct patterns. Under these assumptions, the choice of the F-Measure seemed the most valuable. With that said, balance in real-world scenarios is rare, and we foresee the need to extend Dedalo's evaluation so as to take those aspects into account.

A possible way to improve the explanation evaluation is to focus on improving the F-Measure, which might not be adequate for our context as it is designed right now. This would mean weighting the explanation score based on factors such as the balance of the dataset (e.g. using the pattern sizes), so as to account for their natural differences, or assessing the statistical significance of explanations based on some distribution probabilities (e.g. how can we estimate discrepancies in the ground truth?). More fine-grained evaluation measures could also be considered, namely the Mean Average Precision or the R-Precision (which average the precision of a result over a set of different points), Accuracy (which also takes into account the True Negative rate of a classification), and cluster quality measurements such as the Sum of Squared Errors, Purity, Normalised Mutual Information or the Rand Index.

A second possible scenario, more Linked Data-oriented, is to improve the explanation evaluation by assessing the quality of the collected Linked Data, namely by assessing how trustworthy their provenance is. A large portion of the Linked Data community is focused on proposing methodologies and metrics to assess the quality of datasets in Linked Data, based on their accessibility, content, relevance and trust, and those could be used to filter out explanations which seem less relevant.

A final work that we foresee, in alignment with the discussions of Chapter 6, is to study and analyse the topology of the Linked Data Cloud more deeply, namely the datasets' aggregation, distribution and connectivity, so as to detect what makes some explanations better than others, and therefore to make a step forward towards understanding "what the represented data tell us", rather than understanding "how to use them".

8.3.4 Dedalo vs. the Linked Data Bias

The previous section also brings us to discuss the last limitation of our approach, which was already investigated in a preliminary study. While Linked Data quality measurements can help in assessing how trustworthy the information in the Web of Data is, it is arguable that Dedalo's results are also dependent on the amount of information declared therein. In fact, one of the major assumptions behind our approach is that the knowledge represented in the traversed Linked Data is naturally balanced and distributed: in other words, "we can trust the results we obtain, since no information is missing in there". This is a rather optimistic view, which was refuted in several parts of this work, namely when our results were affected because the knowledge in the traversed datasets was not evenly distributed. We call(ed) this issue the "Linked Data bias".

The original problem stemmed from the Open University student enrolment dataset #OU.S, for which we were looking at explaining why the Health and Social Care faculty was more popular around Manchester than in other places of the U.K. (see Figure A.5, Appendix A.1). The best explanation we could obtain was that the faculty was popular because it was located in the Northern Hemisphere, and the reason we found was that DBpedia had incomplete information regarding places and locations; such an uneven distribution was somehow aligned (to an extent) with the cluster, i.e. it was creating a bias in our results. In this scenario, an interesting area of study for future work is to be able to quantitatively and qualitatively identify such a bias, so as to make Dedalo, and in general any Linked Data application, bias-aware, i.e. able to compensate for the bias in the produced results.

Quantifying the bias. As a preliminary work, we studied and presented in [Tiddi et al., 2014a] a process to numerically assess the bias in linked datasets. The challenge we had to face here was that, in order to quantify a bias in a dataset, the distribution of its information should be compared to an unbiased (ground truth) one, which is however not available. What we proposed in this work, then, was to approximate such a comparison using the information carried through the dataset's connections. More specifically, we suggested measuring the bias of one dataset using its projection into a dataset connected to it through equivalence statements. The general idea was to compare the distribution of the property values of the entities in the dataset with that of the entities in its projection. The approach we proposed is briefly presented below.

1. Problem formalisation. Given two datasets A and B, we call B′ ⊆ B the projection of the dataset A in B, i.e. the set of entities in B that are linked to an entity in A through an equivalence statement (e.g. rdfs:seeAlso, skos:exactMatch, owl:sameAs). For example, in Figure 8.1, where we try to establish the bias in LMDB (A) using its projection in DBpedia

(B), we have B′ = {db:Stagecoach, db:TheGreatDictator, db:CityLights} and B = B′ ∪ {db:CaptainAmerica, db:X-Men, db:Godzilla}. In order to measure the bias in A, we need to compare the property distribution of the entities in B with the property distribution of the entities in B′. To identify these distributions, given a dataset D, we say:

• classes(D) is the set of the types of the entities in D.
• properties(c, D) is the set of properties that apply to instances of the class c in D.
• values(c, p, D) is the set of possible values (entities or literals) of the property p for the class c in D.

Figure 8.1 Identifying LMDB bias. B′ is the projection of LMDB in DBpedia (B).

To identify towards which information A is biased, we statistically compare the distributions of the values in values(c, p, B′) with the ones in values(c, p, B), for each of the properties p ∈ properties(c_i, B′) and for each of the classes c_i ∈ classes(B′). The expectation is that, when comparing B′ to B, if B reveals that A is biased (noted B ▷ A), then there will be a significant difference in some of the distributions of the values in B′ and B. If we take the simplified example of Figure 8.1, where we analyse the LMDB dataset using DBpedia (DBP ▷ LMDB), we have classes(B′) = {db:Movie}, because the entities e_i ∈ B′ are only movies, and consequently properties(db:Movie, B′) = {dc:subject, dbp:released}. We can see that all the entities of B′ are categorised as black and white movies, while the movies of B are also about superheroes. From comparing the distribution of values(db:Movie, dc:subject, B) = {db:BlackAndWhiteMovies, db:SuperHeroes} with that of values(db:Movie, dc:subject, B′) = {db:BlackAndWhiteMovies}, we can say that LMDB is biased towards black and white movies. Similarly, the average of the values of dbp:released in B (i.e. values(db:Movie, dbp:released, B) = {1931, 1939, 1940, 2011, 2014}) is higher than the average in values(db:Movie, dbp:released, B′) = {1931, 1939, 1940}, so we can say that LMDB is biased towards older movies.

2. Proposed Approach. Given this formalisation, we proposed a process to identify bias in datasets, consisting of five steps, described as follows.

First, we identify the projection B′ of the dataset A. To do that, we retrieve all the entities ent_B in B that have an equivalent entity ent_A in A, using the SPARQL query

select ?entA ?entB where { ?entA ?_equivalentProp ?entB }

with ?_equivalentProp being one of rdfs:seeAlso, owl:sameAs or skos:exactMatch. The entities ?entB form the set B′. Next, we build the information related to the entities in B′. More specifically, we retrieve the set of classes classes(B′) the entities in B′ are instances of and, for each of them, the set of properties properties(c, B′) and the set of values values(c, p, B′). In order to extract them, we use the SPARQL query

select ?c ?p ?value where { ?_ent a ?c. ?_ent ?p ?value. }

where ?_ent is an entity e_i ∈ B′. We filter out classes with a low frequency by setting a threshold of 5%, i.e. for a class c to be considered, at least 5% of the entities in B′ should be of type c. Using each of the obtained classes (_c) and properties (_p), we then extract the values values(_c, _p, B) in the full dataset B. This is achieved using a third SPARQL query

select ?value where { ?ent a ?_c. ?ent ?_p ?value } LIMIT 10000

We limit the values to 10,000 elements, considering this amount sufficient to demonstrate the bias. Also, because the distributions of properties having a very large number of values (e.g. inverse-functional properties, titles, IDs, labels, etc.) would not be comparable, we define a second threshold such that properties whose number of possible values is above it are discarded; this threshold has been set to 90% of the number of entities in B′. Thus, for each considered property p, we compare the two value sets values(c, p, B) and values(c, p, B′). Because what we have obtained at this stage are two populations of the same observation (the specific RDF property p), and we aim to establish whether there is a significant difference between their distributions, we use statistical t-tests to demonstrate (or deny) that "the property p for the class c is biased", i.e. that there is a statistically significant difference between values(c, p, B′) and values(c, p, B). In order to reveal towards which values the property p is biased, we extract the means of values(c, p, B) and values(c, p, B′) if the values of p are numerical; otherwise, we calculate which values have the largest absolute differences between the two datasets. Finally, given the set of properties and their estimated bias, we rank them according to their p-value (from 0.01 to 0.1), so as to reveal the most biased ones. Properties whose p-value is above 0.1 are discarded and considered non-biased.
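A hedged sketch of this comparison step is given below: it applies an unpaired t-test to numerical value sets and, for categorical values, reports the value with the largest difference in relative frequency. The function name, the use of scipy, and the simplified handling of categorical values are our assumptions; the published process tests and ranks the distributions in more detail.

from collections import Counter
from scipy import stats

def property_bias(values_B, values_B_prime):
    # Compare the values of one (class, property) pair in the full dataset B
    # with those in the projection B'.
    numeric = all(isinstance(v, (int, float)) for v in values_B + values_B_prime)
    if numeric:
        # Unpaired t-test; the two means indicate towards which values A is biased.
        _, p_value = stats.ttest_ind(values_B, values_B_prime, equal_var=False)
        evidence = (sum(values_B) / len(values_B),
                    sum(values_B_prime) / len(values_B_prime))
    else:
        # Simplification: report the value with the largest absolute difference in
        # relative frequency (the thesis also assesses significance in this case).
        f_B, f_Bp = Counter(values_B), Counter(values_B_prime)
        diff = {v: abs(f_Bp[v] / len(values_B_prime) - f_B[v] / len(values_B))
                for v in set(values_B) | set(values_B_prime)}
        p_value, evidence = None, max(diff, key=diff.get)
    return p_value, evidence

# dbp:released for db:Movie in B vs. B' (the toy example of Figure 8.1).
print(property_bias([1931, 1939, 1940, 2011, 2014], [1931, 1939, 1940]))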

3. Experiments. The process was trialled on a wide range of Linked Data datasets from the Data Hub, differing in size and domain. We invite the reader to refer to Appendix B for a more detailed view of those experiments, e.g. the dataset characteristics, size or time taken to compute the t-tests and the ranking of all the properties (Table B.1), and for some more results (Table B.2). Here, we limit ourselves to showing a few examples of biased datasets. We ordered the examples of Table 8.1 by their "degree of expectability", i.e. from the ones we mostly expected (marked (1)) to the ones we could not imagine (marked (4)).

Table 8.1 Some bias examples.

B      A        c                 p                  value                           p-value
(1)
DBP    NLFi     db:Place          dc:subject         db:CitiesAndTownsInFinland      p < 1.00 × 10^-15
GN     NYT      gn:Feature        gn:parentFeature   gn:US                           p < 1.00 × 10^-15
DBP    NYT      db:Agent          dbp:country        db:UnitedStates                 p < 3.87 × 10^-4
(2)
DBP    NLEs     db:MusicalArtist  dbp:birthPlace     db:                             p < 1.13 × 10^-13
DBP    NLEs     db:Writer         dbp:nationality    db:Spanish                      p < 4.64 × 10^-3
DBP    RED      db:Scientist      db:birthPlace      db:England                      p < 1.00 × 10^-15
DBP    RED      db:Scientist      db:birthPlace      db:Scotland                     p < 1.00 × 10^-15
DBP    RED      db:Writer         db:birthDate       avg: 1809, stdev: 96.21         p < 1.00 × 10^-15
(3)
UP     Bio2RDF  up:Protein        up:isolatedFrom    uptissue:Brain                  p < 9.29 × 10^-4
UP     BioPAX   up:Protein        up:isolatedFrom    uptissue:Brain                  p < 6.79 × 10^-4
UP     DgB      up:Protein        up:isolatedFrom    uptissue:Brain                  p < 1.33 × 10^-4
(4)
DBP    NLEs     db:Writer         db:genre           db:MisteryFiction               p < 4.28 × 10^-11
DBP    NLEs     db:MusicalArtist  dbp:instrument     db:Piano                        p < 2.73 × 10^-4
DBP    RED      db:Agent          db:genre           db:Novel, db:Poetry             p < 1.00 × 10^-15
DBP    RED      db:Agent          db:movement        db:Naturalism, db:Romantism     p < 1.00 × 10^-15
DBP    RED      db:Agent          db:deathCause      db:Suicide                      p < 1.00 × 10^-15

In the case (1) of the geographical datasets, for instance, results are not especially surprising: the Finnish National Library (NLFi) is biased towards places in Finland, because its projection in DBpedia (DBP) reveals that there is a significantly uneven distribution of the value db:CitiesAndTownsInFinland for the property dc:subject. Note that this bias is also confirmed by the latitude and longitude average values that we show in Table B.2, which roughly correspond to the coordinates of Finland. Similarly, the New York Times (NYT) is biased towards places and people of the U.S.; to reveal that, we used the projection of the NYT dataset in Geonames (GN) and in DBpedia. In the case (2) of more general-domain datasets, e.g. the National Library of Spain (NLEs), we were of course expecting some bias towards Spain, but it also turned out that the dataset focuses particularly on artists and writers; similarly, a comparison between the Reading Experience Database (RED) ontology and DBpedia revealed that RED mostly describes scientists from the UK and writers from the 19th century.

An interesting aspect is that detecting a bias also helps in understanding datasets one might not be familiar with, such as the datasets in the biomedical domain (3). For example, we discovered that UniProt (UP) is always biased towards cerebral tissues, no matter which projection we used (Bio2RDF, BioPAX or DrugBank, abbreviated DgB). As for the cases (4) of completely unexpected biases, NLEs is focused on musical artists that are pianists and on fiction writers, while the RED dataset is focused on novelists and poets belonging to the 19th-century literary movements who, eventually, committed suicide.

Compensating the Bias. Having proven that Linked Data introduce a bias, the future work and challenges consist in identifying strategies to make an application able to compensate for biased results. This might mean, for instance, that the F-Measure could be redesigned as a more bias-aware measure: e.g. we could rebalance the distribution of the values V_i for a given path p_i between the items in and outside the pattern, if we encounter a significant difference in the path's frequency.

8.4 Conclusions

The work presented in this thesis is the result of four years of research in which we faced a considerable number of issues to be solved. In the attempt to validate the research hypothesis behind this thesis, i.e. that we could exploit knowledge from the Web of Data to automatically explain patterns, we were brought to analyse problems and propose solutions spanning areas ranging from the more "conceptually-oriented" Cognitive Science, Knowledge Representation and Inference, to more "data-oriented" disciplines such as Artificial Intelligence, Machine Learning and Statistics. This research has demonstrated that the Web of Data can indeed facilitate the deployment of more semantically-aware and human-independent applications, and especially that some of the features in the original vision of the Semantic Web can and should be exploited for this reason. The results of Dedalo, far from being perfect, suggest nevertheless that an approach to blindly discover knowledge can be successfully applied to a wide range of domains, including the ones we have been playing with, and this is especially thanks to the cross-domain knowledge that is connected in the Web of Data. We believe that this is the most important message that our work should transmit.

Afterthoughts

Our achievements were not earned without sweat and tears. Here are some of the lessons that we have learnt while conducting this research.

No matter how much you try, data are biased. The bias in data is natural, and inevitable. It arises naturally when using data that somebody else has published, because the likelihood of the problem being seen in the same way is very low. And when accessing knowledge across domains (and datasets), this bias becomes even more pronounced. This means that the only solution for Semantic Web research is to be aware of the existence of a bias in data, and to create applications able to compensate for it.

Believe in Link Traversal. Research working on Linked Data mostly exploits datasets in a standalone way. Few areas make explicit use of the links between datasets, and even fewer traverse Linked Data for the purpose of automatically discovering knowledge. We showed that Link Traversal is a rather useful technique, because it relieves the computational costs, allows building domain-agnostic applications and, more importantly, enables the serendipitous discovery of new knowledge.

Keep It Simple and Stupid (and Context-aware). There is no need to create complex applications able to deal with any kind of data. During these years of research, we have manipulated and analysed a wide range of (messy) data and domains, and one of our major oversights was to look at the problems thinking that we needed a "global solution", something that would always work, while it would have been sufficient to understand the context in which we were moving. Data are nothing but flows of raw information, and context is what makes an application effective.

Linked Data are messy, and you need to deal with that. Considering that the Semantic Web was promoted as a means to access knowledge more easily than in the Web of Documents, it is easy to naïvely assume that the "only effort" required is to create our applications. Developing Dedalo as an application based on Linked Data was, however, a relatively easy task when compared to the amount of effort we had to put into dealing with Linked Data. In our research we learnt that, while much knowledge is indeed nowadays represented in Linked Data, the costs of cleaning data, re-organising them and dealing with exceptions, dead-ends or inaccessibility are still very high. We believe that much more effort deserves to be put into thinking about why (and how) we are publishing our data, rather than into proposing new powerful tools to speed up data consumption.

References

Abedjan, Z. and Naumann, F. (2013). Improving RDF data through Association Rule Mining. Datenbank-Spektrum, 13(2), 111–120.

Abiteboul, S., Quass, D., McHugh, J., Widom, J., and Wiener, J. L. (1997). The LOREL query language for semistructured data. International Journal on Digital Libraries, 1(1), 68–88.

Acosta, M. and Vidal, M.-E. (2015). Networks of Linked Data eddies: An adaptive Web query processing engine for RDF data. In The Semantic Web–ISWC 2015, pages 111–127. Springer.

Acosta, M., Vidal, M.-E., Lampo, T., Castillo, J., and Ruckhaus, E. (2011). Anapsid: An adaptive query processing engine for SPARQL endpoints. In The Semantic Web–ISWC 2011, pages 18–34. Springer.

Aggarwal, C. C. and Wang, H. (2010). Graph data management and mining: A survey of algorithms and applications. In Managing and Mining Graph Data, pages 13–68. Springer.

Agrawal, R., Borgida, A., and Jagadish, H. V. (1989). Efficient management of transitive relationships in large data and knowledge bases, volume 18. ACM.

Agrawal, R., Srikant, R., et al. (1994). Fast algorithms for mining association rules. In Proc. 20th int. conf. very large data bases, VLDB, volume 1215, pages 487–499.

Akim, N. M., Dix, A., Katifori, A., Lepouras, G., Shabir, N., and Vassilakis, C. (2011). Spreading activation for web scale reasoning: Promise and problems.

Alavi, M. and Leidner, D. E. (1999). Knowledge management systems: Issues, challenges, and benefits. Communications of the AIS, 1(2es), 1.

Angles, R. and Gutierrez, C. (2008). Survey of graph database models. ACM Computing Surveys (CSUR), 40(1), 1.

Anusha, P. and Reddy, G. S. (2012). Interactive postmining of association rules by validating ontologies. International Journal of Electronics and Computer Science Engineering.

Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., and Ives, Z. (2007). DBpedia: A nucleus for a web of Open Data. Springer.

Auer, S., Demter, J., Martin, M., and Lehmann, J. (2012). Lodstats – An extensible framework for high-performance dataset analytics. In Knowledge Engineering and Knowledge Management, pages 353–362. Springer.

Basse, A., Gandon, F., Mirbel, I., and Lo, M. (2010). DFS-based frequent graph pattern extraction to characterize the content of RDF triple stores. In Web Science Conference 2010 (WebSci10).

Bechtel, W. and Wright, C. (2009). What is psychological explanation. Routledge companion to the philosophy of psychology, pages 113–130.

Beek, W., Rietveld, L., Bazoobandi, H. R., Wielemaker, J., and Schlobach, S. (2014). LOD Laundromat: A uniform way of publishing other people’s dirty data. In The Semantic Web–ISWC 2014, pages 213–228. Springer.

Berners-Lee, T. (1996). WWW: Past, present, and future. Computer, 29(10), 69–77.

Berners-Lee, T. (2006). Linked Data. http://www.w3.org/DesignIssues/LinkedData.html.

Berners-Lee, T. and Connolly, D. (1995). Hypertext Markup Language-2.0.

Berners-Lee, T., Fielding, R., and Masinter, L. (1998). Uniform Resource Identifier (URI): Generic syntax.

Berners-Lee, T., Hendler, J., Lassila, O., et al. (2001). The Semantic Web. Scientific american, 284(5), 28–37.

Bhalotia, G., Hulgeri, A., Nakhe, C., Chakrabarti, S., and Sudarshan, S. (2002). Keyword searching and browsing in databases using banks. In Data Engineering, 2002. Proceedings. 18th International Conference on, pages 431–440. IEEE.

Bicer, V., Tran, T., and Gossen, A. (2011). Relational kernel machines for learning from graph-structured RDF data. In The Semantic Web: Research and Applications, pages 47–62. Springer.

Bizer, C., Lehmann, J., Kobilarov, G., Auer, S., Becker, C., Cyganiak, R., and Hellmann, S. (2009a). DBpedia – A crystallization point for the Web of Data. Web Semantics: science, services and agents on the world wide web, 7(3), 154–165.

Bizer, C., Heath, T., and Berners-Lee, T. (2009b). Linked Data – The story so far. International Journal on Semantic Web and Information Systems (IJSWIS), 5(3), 1–22.

Bloehdorn, S. and Sure, Y. (2007). Kernel methods for mining instance data in ontologies. Springer.

Bloehdorn, S., Haase, P., Sure, Y., and Voelker, J. (2006). Ontology Evolution. Semantic web technologies: trends and research in ontology-based systems, pages 51–70.

Bollacker, K., Evans, C., Paritosh, P., Sturge, T., and Taylor, J. (2008). Freebase: A collaboratively created graph database for structuring human knowledge. In Proceedings of the 2008 ACM SIGMOD international conference on Management of data, pages 1247–1250. ACM.

Bollegala, D., Matsuo, Y., and Ishizuka, M. (2007). Measuring semantic similarity between words using Web search engines. www, 7, 757–766.

Bondy, J. A. and Murty, U. S. R. (1976). Graph theory with applications, volume 290. Macmillan London.

Bouquet, P., Ghidini, C., and Serafini, L. (2009). Querying the Web of Data: A formal approach. In The Semantic Web, pages 291–305. Springer.

Brent, E., Thompson, A., and Vale, W. (2000). Sociology a computational approach to sociological explanations. Social Science Computer Review, 18(2), 223–235.

Bringmann, B. and Nijssen, S. (2008). What is frequent in a single graph? In Advances in Knowledge Discovery and Data Mining, pages 858–863. Springer.

Brisson, L. and Collard, M. (2008). How to semantically enhance a Data Mining process? In ICEIS, volume 19, pages 103–116. Springer.

Brisson, L., Collard, M., and Pasquier, N. (2005). Improving the Knowledge Discovery process using ontologies. In IEEE MCD’2005 international workshop on Mining Complex Data, pages 25–32.

Buil-Aranda, C., Arenas, M., and Corcho, O. (2011). Semantics and optimization of the SPARQL 1.1 federation extension. In The Semantic Web: Research and Applications, pages 1–15. Springer.

Buil-Aranda, C., Hogan, A., Umbrich, J., and Vandenbussche, P.-Y. (2013). SPARQL Web- querying infrastructure: Ready for action? In The Semantic Web–ISWC 2013, pages 277–293. Springer.

Buil-Aranda, C., Polleres, A., and Umbrich, J. (2014). Strategies for executing federated queries in SPARQL1. 1. In The Semantic Web–ISWC 2014, pages 390–405. Springer.

Buneman, P., Fernandez, M., and Suciu, D. (2000). UnQL: A query language and algebra for semistructured data based on structural recursion. The VLDB Journal – The International Journal on Very Large Data Bases, 9(1), 76–110.

Calhoun, C. (2002). Dictionary of the social sciences. Oxford University Press on Demand.

Callahan, A., Cruz-Toledo, J., Ansell, P., and Dumontier, M. (2013). Bio2RDF release 2: Improved coverage, interoperability and provenance of life science Linked Data. In The semantic web: semantics and big data, pages 200–212. Springer.

Campbell, J. (2008). Causation in psychiatry. Philosophical issues in psychiatry, pages 196–215.

Carter, J. R. (1998). Description is not explanation: A methodology of comparison. Method & Theory in the Study of Religion, 10(2), 133–148.

Chandrasekaran, B., Josephson, J. R., and Benjamins, V. R. (1999). What are ontologies, and why do we need them? IEEE Intelligent systems, (1), 20–26.

Chartrand, G. (2006). Introduction to graph theory. Tata McGraw-Hill Education.

Chen, C., Lin, C. X., Fredrikson, M., Christodorescu, M., Yan, X., and Han, J. (2009). Mining graph patterns efficiently via randomized summaries. Proceedings of the VLDB Endowment, 2(1), 742–753.

Chen, H.-H., Lin, M.-S., and Wei, Y.-C. (2006). Novel association measures using Web search with double checking. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th annual meeting of the Association for Computational Linguistics, pages 1009–1016. Association for Computational Linguistics.

Cheng, W., Kasneci, G., Graepel, T., Stern, D., and Herbrich, R. (2011). Automated feature generation from structured knowledge. In Proceedings of the 20th ACM international conference on Information and knowledge management, pages 1395–1404. ACM.

Cilibrasi, R. L. and Vitanyi, P. (2007). The Google similarity distance. Knowledge and Data Engineering, IEEE Transactions on, 19(3), 370–383.

Clayton, P. (1989). Explanation from physics to the philosophy of religion. International journal for philosophy of religion, 26(2), 89–108.

Cohen, E., Halperin, E., Kaplan, H., and Zwick, U. (2003). Reachability and distance queries via 2-hop labels. SIAM Journal on Computing, 32(5), 1338–1355.

Conklin, D. and Glasgow, J. (2014). Spatial analogy and subsumption. In Machine Learning: Proceedings of the Ninth International Conference ML (92), pages 111–116.

Conte, D., Foggia, P., Sansone, C., and Vento, M. (2004). Thirty years of graph matching in pattern recognition. International journal of pattern recognition and artificial intelligence, 18(03), 265–298.

Cordella, L. P., Foggia, P., Sansone, C., and Vento, M. (2004). A (sub)graph isomorphism algorithm for matching large graphs. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(10), 1367–1372.

Correndo, G., Salvadores, M., Millard, I., Glaser, H., and Shadbolt, N. (2010). SPARQL query rewriting for implementing data integration over Linked Data. In Proceedings of the 2010 EDBT/ICDT Workshops, page 4. ACM.

Crestani, F. (1997). Application of spreading activation techniques in information retrieval. Artificial Intelligence Review, 11(6), 453–482.

Culicover, P. W. and Jackendoff, R. (2005). Simpler syntax. Oxford University Press Oxford.

Cummins, R. (2000). “How does it work?” versus “What are the laws?”: Two conceptions of psychological explanation. Explanation and cognition, pages 117–144.

Daga, E., d’Aquin, M., Gangemi, A., and Motta, E. (2014). From datasets to datanodes: An ontology pattern for networks of data artifacts. Semantic Web Journal (submitted), http://semantic-web-journal.net/content/datasets-datanodes-ontology-pattern-networks-data-artifacts.

Daga, E., Panziera, L., and Pedrinaci, C. (2015). A BASILar approach for building Web APIs on top of SPARQL endpoints. In CEUR Workshop Proceedings, volume 1359, pages 22–32.

d’Amato, C., Fanizzi, N., and Esposito, F. (2006). A dissimilarity measure for ALC concept descriptions. In Proceedings of the 2006 ACM symposium on Applied computing, pages 1695–1699. ACM.

d’Amato, C., Staab, S., and Fanizzi, N. (2008). On the influence of Description Logics ontologies on conceptual similarity. In Knowledge Engineering: Practice and Patterns, pages 48–63. Springer.

d’Amato, C., Bryl, V., and Serafini, L. (2012). Data-driven logical reasoning. In URSW, pages 51–62. Citeseer.

Dantzig, G. B. (1998). Linear Programming and extensions. Princeton University Press.

d’Aquin, M. and Jay, N. (2013). Interpreting Data Mining results with Linked Data for Learning Analytics: Motivation, case study and directions. In Proceedings of the Third International Conference on Learning Analytics and Knowledge, pages 155–164. ACM.

d’Aquin, M., Kronberger, G., and del Carmen Suárez-Figueroa, M. (2012). Combining Data Mining and Ontology Engineering to enrich ontologies and Linked Data. In KNOW@LOD, pages 19–24.

Darden, L. and Maull, N. (1977). Interfield theories. Philosophy of Science, pages 43–64.

Davidson, D. (1990). The structure and content of truth. The journal of philosophy, pages 279–328.

De Raedt, L. and Kramer, S. (2001). The levelwise version space algorithm and its application to molecular fragment finding. In International Joint Conference on Artificial Intelligence, volume 17, pages 853–862. Citeseer.

De Vocht, L., Coppens, S., Verborgh, R., Vander Sande, M., Mannens, E., and Van de Walle, R. (2013). Discovering meaningful connections between resources in the Web of Data. In LDOW.

Dehaspe, L. and Toivonen, H. (1999). Discovery of frequent Datalog patterns. Data Mining and Knowledge Discovery, 3(1), 7–36.

Dehmer, M. and Mowshowitz, A. (2011). Generalized graph entropies. Complexity, 17(2), 45–50.

Denning, P. J. (1981). ACM president’s letter – Performance analysis: Experimental computer science at its best. Communications of the ACM, 24(11), 725–727.

Dijkstra, E. W. (1959). A note on two problems in connexion with graphs. Numerische Mathematik, 1(1), 269–271.

Dividino, R., Gottron, T., Scherp, A., and Gröner, G. (2014). From changes to dynamics: Dynamics analysis of Linked Open Data sources. In PROFILES’14: Proceedings of the Workshop on Dataset Profiling and Federated Search for Linked Data.

Dividino, R., Gottron, T., and Scherp, A. (2015). Strategies for efficiently keeping local Linked Open Data caches up-to-date. In International Semantic Web Conference. Springer.

Dividino, R. Q., Scherp, A., Gröner, G., and Gottron, T. (2013). Change-a-LOD: Does the schema on the Linked Data Cloud change or not? In COLD.

Dominguez-Sal, D., Urbón-Bayes, P., Giménez-Vañó, A., Gómez-Villamor, S., Martínez-Bazan, N., and Larriba-Pey, J.-L. (2010). Survey of graph database performance on the HPC scalable graph analysis benchmark. In Web-Age Information Management, pages 37–48. Springer.

Doran, J. E. and Michie, D. (1966). Experiments with the graph traverser program. In Proceedings of the Royal Society of London A: Mathematical, Physical and Engineering Sciences, volume 294, pages 235–259. The Royal Society.

Durkheim, E. (2014). The rules of sociological method, and selected texts on sociology and its method. Simon and Schuster.

Eliassi-Rad, T. and Chow, E. (2005). Using ontological information to accelerate path-finding in large semantic graphs: A probabilistic approach. In American Association for Artificial Intelligence.

Elio, R., Hoover, J., Nikolaidis, I., Salavatipour, M., Stewart, L., and Wong, K. (2011). About computing science research methodology.

Erling, O. and Mikhailov, I. (2010). Virtuoso: RDF support in a native RDBMS. Springer.

Escovar, E. L., Yaguinuma, C. A., and Biajiz, M. (2006). Using fuzzy ontologies to extend semantically similar Data Mining. In SBBD, pages 16–30.

Euzenat, J., Shvaiko, P., et al. (2007). Ontology Matching, volume 333. Springer.

Fanizzi, N. and d’Amato, C. (2006). A declarative kernel for ALC concept descriptions. In Foundations of Intelligent Systems, pages 322–331. Springer.

Fanizzi, N., d’Amato, C., and Esposito, F. (2008). Statistical learning for inductive query answering on OWL ontologies. Springer.

Fanizzi, N., d’Amato, C., and Esposito, F. (2012). Mining Linked Open Data through semi-supervised learning methods based on self-training. In Semantic Computing (ICSC), 2012 IEEE Sixth International Conference on, pages 277–284. IEEE.

Fann, K. T. (1970). Peirce’s theory of abduction. Springer Science & Business Media.

Fayyad, U., Piatetsky-Shapiro, G., and Smyth, P. (1996). From Data Mining to Knowledge Discovery in Databases. AI magazine, 17(3), 37.

Feigenbaum, L., Williams, G. T., Clark, K. G., and Torres, E. (2013). SPARQL 1.1 protocol. Recommendation, W3C, March.

Felner, A. (2011). Position paper: Dijkstra’s algorithm versus Uniform Cost Search or a case against Dijkstra’s algorithm. In Fourth Annual Symposium on Combinatorial Search.

Fiedler, M. and Borgelt, C. (2007). Support computation for mining frequent subgraphs in a single graph. In MLG. Citeseer.

Fielding, R., Gettys, J., Mogul, J., Frystyk, H., Masinter, L., Leach, P., and Berners-Lee, T. (1999). Hypertext Transfer Protocol–HTTP/1.1.

Fielding, R. T. (2000). Architectural styles and the design of network-based software architectures. Ph.D. thesis, University of California, Irvine.

Fionda, V., Gutierrez, C., and Pirró, G. (2012). Semantic navigation on the Web of Data: Specification of routes, Web fragments and actions. In Proceedings of the 21st international conference on World Wide Web, pages 281–290. ACM.

Flake, G. W., Tarjan, R. E., and Tsioutsiouliklis, K. (2004). Graph clustering and minimum cut trees. Internet Mathematics, 1(4), 385–408.

Florescu, D., Levy, A. Y., and Mendelzon, A. O. (1998). Database techniques for the World-Wide Web: A survey. SIGMOD record, 27(3), 59–74.

Freitas, A., Oliveira, J. G., O’Riain, S., Curry, E., and Da Silva, J. C. P. (2011a). Querying linked data using semantic relatedness: a vocabulary independent approach. In International Conference on Application of Natural Language to Information Systems, pages 40–51. Springer.

Freitas, A., Oliveira, J. G., Curry, E., O’Riain, S., and da Silva, J. C. P. (2011b). Treo: combining entity-search, spreading activation and semantic relatedness for querying linked data. In Proc. of 1st Workshop on Question Answering over Linked Data (QALD-1) at the 8th Extended Semantic Web Conference (ESWC 2011).

Friedenberg, J. and Silverman, G. (2011). Cognitive Science: An introduction to the study of mind. Sage.

Friedman, M. (1974). Explanation and scientific understanding. The Journal of Philosophy, pages 5–19.

Gabrilovich, E. and Markovitch, S. (2007). Computing semantic relatedness using Wikipedia-based Explicit Semantic Analysis. In IJCAI, volume 7, pages 1606–1611.

Gangemi, A. and Mika, P. (2003). Understanding the Semantic Web through descriptions and situations. In On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE, pages 689–706. Springer.

Gangemi, A. and Presutti, V. (2009). Ontology Design Patterns. In Handbook on ontologies, pages 221–243. Springer.

Gardner, H. (2008). The mind’s new science: A history of the cognitive revolution. Basic books.

Garriga, G. C., Ukkonen, A., and Mannila, H. (2008). Feature selection in taxonomies with applications to paleontology. In Discovery Science, pages 112–123. Springer.

Gärtner, T. (2003). A survey of kernels for structured data. ACM SIGKDD Explorations Newsletter, 5(1), 49–58.

Ge, W., Chen, J., Hu, W., and Qu, Y. (2010). Object link structure in the Semantic Web. In The Semantic Web: Research and Applications, pages 257–271. Springer.

Geng, L. and Hamilton, H. J. (2006). Interestingness measures for Data Mining: A survey. ACM Computing Surveys (CSUR), 38(3), 9.

Getoor, L. (2007). Introduction to Statistical Relational Learning. MIT Press.

Getoor, L. and Diehl, C. P. (2005). Link Mining: A survey. ACM SIGKDD Explorations Newsletter, 7(2), 3–12.

Giblin, P. (2013). Graphs, surfaces and homology: An introduction to algebraic topology. Springer Science & Business Media.

Gil, R., García, R., and Delgado, J. (2004). Measuring the Semantic Web. AIS SIGSEMIS Bulletin, 1(2), 69–72.

Görlitz, O. and Staab, S. (2011). Splendid: SPARQL endpoint federation exploiting VoID descriptions. COLD, 782.

Gorski, P. S. (2004). The poverty of deductivism: A constructive realist model of sociological explanation. Sociological Methodology, 34(1), 1–33.

Gottron, T. and Gottron, C. (2014). Perplexity of index models over evolving Linked Data. In The Semantic Web: Trends and Challenges, pages 161–175. Springer.

Grimnes, G. A., Edwards, P., and Preece, A. (2008). Instance-based clustering of Semantic Web resources. Springer.

Grobe, M. (2009). RDF, Jena, SPARQL and the Semantic Web. In Proceedings of the 37th annual ACM SIGUCCS fall conference: communication and collaboration, pages 131–138. ACM.

Grobelnik, M. and Mladenić, D. (2006). Knowledge Discovery for ontology construction. Wiley Online Library.

Groth, P., Loizou, A., Gray, A. J., Goble, C., Harland, L., and Pettifer, S. (2014). API-centric Linked Data integration: The open PHACTS discovery platform case study. Web Semantics: Science, Services and Agents on the World Wide Web, 29, 12–18.

Guéret, C., Groth, P., Van Harmelen, F., and Schlobach, S. (2010). Finding the Achilles heel of the Web of Data: Using network analysis for link-recommendation. The Semantic Web–ISWC 2010, pages 289–304.

Gutierrez, C. (2011). Modeling the Web of Data (introductory overview). In Reasoning Web. Semantic Technologies for the Web of Data, pages 416–444. Springer.

Harris, S., Seaborne, A., and Prud’hommeaux, E. (2013). SPARQL 1.1 Query Language. W3C Recommendation, 21.

Hart, P. E., Nilsson, N. J., and Raphael, B. (1968). A formal basis for the heuristic determination of minimum cost paths. Systems Science and Cybernetics, IEEE Transactions on, 4(2), 100–107.

Harth, A. and Speiser, S. (2012). On completeness classes for query evaluation on Linked Data. In AAAI. Citeseer.

Hartig, O. (2011). Zero-knowledge query planning for an iterator implementation of link traversal based query execution. In The Semantic Web: Research and Applications, pages 154–169. Springer.

Hartig, O. (2012). SPARQL for a Web of Linked Data: Semantics and computability. In The Semantic Web: Research and Applications, pages 8–23. Springer.

Hartig, O. (2013). An overview on execution strategies for Linked Data queries. Datenbank-Spektrum, 13(2), 89–99.

Hartig, O. and Pirrò, G. (2015). A context-based semantics for SPARQL property paths over the Web. In The Semantic Web. Latest Advances and New Domains, pages 71–87. Springer.

Haslhofer, B. and Isaac, A. (2011). data.europeana.eu: The Europeana Linked Open Data pilot. In International Conference on Dublin Core and Metadata Applications, pages 94–104.

Haspelmath, M., Bibiko, H.-J., Hagen, J., and Schmidt, C. (2005). The world atlas of language structures, volume 1. Oxford University Press Oxford.

Hausenblas, M., Halb, W., Raimond, Y., and Heath, T. (2008). What is the size of the Semantic Web? Proceedings of I-Semantics, pages 9–16.

Hausenblas, M., Halb, W., Raimond, Y., Feigenbaum, L., and Ayers, D. (2009). Scovo: Using statistics on the Web of Data. In The Semantic Web: Research and Applications, pages 708–722. Springer.

Hawkins, J. A. (1988). Explaining language universals. Blackwell.

He, H., Wang, H., Yang, J., and Yu, P. S. (2007). BLINKS: Ranked keyword searches on graphs. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 305–316. ACM.

Heim, P., Hellmann, S., Lehmann, J., Lohmann, S., and Stegemann, T. (2009). RelFinder: Revealing relationships in RDF knowledge bases. In International Conference on Semantic and Digital Media Technologies, pages 182–187. Springer.

Hempel, C. G. and Oppenheim, P. (1948). Studies in the logic of explanation. Philosophy of science, pages 135–175.

Hogan, A., Umbrich, J., Harth, A., Cyganiak, R., Polleres, A., and Decker, S. (2012). An empirical survey of Linked Data conformance. Web Semantics: Science, Services and Agents on the World Wide Web, 14, 14–44.

Holder, L. B., Cook, D. J., Djoko, S., et al. (1994). Substructure discovery in the SUBDUE system. In KDD workshop, pages 169–180.

Hoser, B., Hotho, A., Jäschke, R., Schmitz, C., and Stumme, G. (2006). Semantic network analysis of ontologies. Springer.

Huan, J., Wang, W., and Prins, J. (2003). Efficient mining of frequent subgraphs in the presence of isomorphism. In Data Mining, 2003. ICDM 2003. Third IEEE International Conference on, pages 549–552. IEEE.

Hutten, E. H. (1956). On explanation in psychology and in physics. The British Journal for the Philosophy of Science, 7(25), 73–85.

Ibragimov, D., Hose, K., Pedersen, T. B., and Zimányi, E. (2015). Processing aggregate queries in a federation of SPARQL endpoints. In The Semantic Web. Latest Advances and New Domains, pages 269–285. Springer.

Imielinski, T. and Mannila, H. (1996). A database perspective on Knowledge Discovery. Communications of the ACM, 39(11), 58–64.

Inokuchi, A., Washio, T., and Motoda, H. (2000). An apriori-based algorithm for mining frequent substructures from graph data. In Principles of Data Mining and Knowledge Discovery, pages 13–23. Springer.

Inokuchi, A., Washio, T., and Motoda, H. (2003). Complete mining of frequent patterns from graphs: Mining graph data. Machine Learning, 50(3), 321–354.

Iordanov, B. (2010). Hypergraphdb: A generalized graph database. In Web-Age information management, pages 25–36. Springer.

Itkonen, E. (2013). On explanation in linguistics. In article in the forum discussion of this issue: http://www.kabatek.de/energeia/E5discussion.

Jagadish, H. (1990). A compression technique to materialize transitive closure. ACM Transactions on Database Systems (TODS), 15(4), 558–598.

Janowicz, K. (2006). SIM-DL: Towards a semantic similarity measurement theory for the Description Logic ALCNR in geographic Information Retrieval. In On the Move to Meaningful Internet Systems 2006: OTM 2006 Workshops, pages 1681–1692. Springer.

Janowicz, K., Keßler, C., Schwarz, M., Wilkes, M., Panov, I., Espeter, M., and Bäumer, B. (2007). Algorithm, implementation and application of the SIM-DL similarity server. In GeoSpatial Semantics, pages 128–145. Springer.

Jones, J., Kuhn, W., Keßler, C., and Scheider, S. (2014). Making the Web of Data available via Web feature services. In Connecting a Digital Europe Through Location and Place, pages 341–361. Springer.

Jonyer, I., Holder, L., and Cook, D. (2002). Concept formation using graph grammars. In Proceedings of the KDD Workshop on Multi-Relational Data Mining, volume 2, pages 19–43.

Joslyn, C., Adolf, B., al Saffar, S., Feo, J., Goodman, E., Haglin, D., Mackey, G., and Mizell, D. (2010). High performance semantic factoring of Giga-scale semantic graph databases. Contribution to Semantic Web Challenge at ISWC.

Józefowska, J., Ławrynowicz, A., and Łukaszewski, T. (2010). The role of semantics in mining frequent patterns from knowledge bases in Description Logics with rules. Theory and Practice of Logic Programming, 10(03), 251–289.

Kacholia, V., Pandit, S., Chakrabarti, S., Sudarshan, S., Desai, R., and Karambelkar, H. (2005). Bidirectional expansion for keyword search on graph databases. In Proceedings of the 31st international conference on Very Large Data Bases, pages 505–516. VLDB Endowment.

Käfer, T., Umbrich, J., Hogan, A., and Polleres, A. (2012). Towards a dynamic Linked Data observatory. In LDOW at WWW.

Käfer, T., Abdelrahman, A., Umbrich, J., O’Byrne, P., and Hogan, A. (2013). Observing Linked Data dynamics. In The Semantic Web: Semantics and Big Data, pages 213–227. Springer.

Kashima, H. and Inokuchi, A. (2002). Kernels for graph classification. In ICDM Workshop on Active Mining, volume 2002. Citeseer.

Kasneci, G., Suchanek, F. M., Ifrim, G., Ramanath, M., and Weikum, G. (2008). NAGA: Searching and ranking knowledge. In Data Engineering, 2008. ICDE 2008. IEEE 24th International Conference on, pages 953–962. IEEE.

Ketkar, N. S., Holder, L. B., and Cook, D. J. (2005). SUBDUE: Compression-based frequent pattern discovery in graph data. In Proceedings of the 1st international workshop on open source Data Mining: frequent pattern mining implementations, pages 71–76. ACM.

Khan, M. A., Grimnes, G. A., and Dengel, A. (2010). Two pre-processing operators for improved learning from Semantic Web data. In First RapidMiner Community Meeting And Conference (RCOMM 2010), volume 20.

Nilsson, N. J. (1971). Problem-solving methods in Artificial Intelligence.

Kitcher, P. (1981). Explanatory unification. Philosophy of Science, pages 507–531.

Klyne, G. and Carroll, J. J. (2004). Resource Description Framework (RDF): Concepts and abstract syntax. W3C Recommendation, 2004. World Wide Web Consortium, http://w3c.org/TR/rdf-concepts.

Kondor, R. I. and Lafferty, J. (2002). Diffusion kernels on graphs and other discrete structures. In Proceedings of the 19th international conference on Machine Learning, pages 315–322.

Krompaß, D., Baier, S., and Tresp, V. (2015). Type-constrained representation learning in knowledge graphs. In The Semantic Web–ISWC 2015, pages 640–655. Springer.

Kuramochi, M. and Karypis, G. (2001). Frequent subgraph discovery. In Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on, pages 313–320. IEEE.

Kuramochi, M. and Karypis, G. (2005). Finding frequent patterns in a large sparse graph. Data Mining and Knowledge Discovery, 11(3), 243–271.

Larrosa, J. and Valiente, G. (2002). Constraint satisfaction algorithms for graph pattern matching. Mathematical Structures in Computer Science, 12(04), 403–422.

Lavrac, N. and Dzeroski, S. (1994). Inductive Logic Programming. In WLP, pages 146–160. Springer.

Lavrač, N., Flach, P., and Zupan, B. (1999). Rule evaluation measures: A unifying view. Springer.

Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D., Garijo, D., Soiland-Reyes, S., Zednik, S., and Zhao, J. (2013). PROV-O: The PROV Ontology. W3C Recommendation, 30.

Levinson, R. A. (1985). A self-organizing retrieval system for graphs. Ph.D. thesis, University of Texas at Austin.

Lisi, F. A. and Esposito, F. (2009). On ontologies as prior conceptual knowledge in Inductive Logic Programming. In Knowledge discovery enhanced with semantic and social information, pages 3–17. Springer.

Liu, J., Wang, W., and Yang, J. (2004). A framework for ontology-driven subspace clustering. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge Discovery and Data Mining, pages 623–628. ACM.

Lloyd, J. W. (2012). Foundations of Logic Programming. Springer Science & Business Media.

Lösch, U., Bloehdorn, S., and Rettinger, A. (2012). Graph kernels for RDF data. In The Semantic Web: Research and Applications, pages 134–148. Springer.

Maali, F., Erickson, J., and Archer, P. (2014). Data catalog vocabulary (DCAT). W3C Recommendation.

Maaløe, E. (2007). Modes of Explanation: From Rules to Emergence. Handelshøjskolen, Aarhus Universitet, Institut for Ledelse.

Maduko, A., Anyanwu, K., Sheth, A., and Schliekelman, P. (2008). Graph summaries for subgraph frequency estimation. The Semantic Web: Research and Applications, pages 508–523.

Maedche, A. and Staab, S. (2004). Ontology Learning. In Handbook on ontologies, pages 173–190. Springer.

Marie, N., Corby, O., Gandon, F., and Ribière, M. (2013a). Composite interests’ exploration thanks to on-the-fly linked data spreading activation. In Proceedings of the 24th ACM Conference on Hypertext and Social Media, pages 31–40. ACM.

Marie, N., Gandon, F., Ribière, M., and Rodio, F. (2013b). Discovery hub: on-the-fly linked data exploratory search. In Proceedings of the 9th International Conference on Semantic Systems, pages 17–24. ACM.

Marinica, C. and Guillet, F. (2010). Knowledge-based interactive postmining of association rules using ontologies. Knowledge and Data Engineering, IEEE Transactions on, 22(6), 784–797.

Marr, D. (1982). Vision: A computational investigation into the human representation and processing of visual information. W. H. Freeman and Company, San Francisco.

Marraffa, M. and Paternoster, A. (2012). Functions, levels, and mechanisms: Explanation in Cognitive Science and its problems. Theory & Psychology, page 0959354312451958.

Martínez-Bazan, N., Muntés-Mulero, V., Gómez-Villamor, S., Nin, J., Sánchez-Martínez, M.-A., and Larriba-Pey, J.-L. (2007). Dex: High-performance exploration on large graphs for Information Retrieval. In Proceedings of the sixteenth ACM conference on Conference on information and knowledge management, pages 573–582. ACM.

McKay, B. D. (2007). Nauty user’s guide (version 2.4). Computer Science Dept., Australian National University, pages 225–239.

Mencía, E. L., Holthausen, S., Schulz, A., and Janssen, F. (2013). Using Data Mining on Linked Open Data for analyzing e-procurement information. In Proceedings of the International Workshop on Data Mining on Linked Data, with Linked Data Mining Challenge collocated with the European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases, ECMLPKDD. Citeseer.

Michalski, R. S., Carbonell, J. G., and Mitchell, T. M. (2013). Machine Learning: An Artificial Intelligence approach. Springer Science & Business Media.

Miller, W. and Myers, E. W. (1985). A file comparison program. Software: Practice and Experience, 15(11), 1025–1040.

Mitchell, T. M. (1982). Generalization as search. Artificial intelligence, 18(2), 203–226.

Montoya, G., Skaf-Molli, H., Molli, P., and Vidal, M.-E. (2015). Federated SPARQL queries processing with replicated fragments. In The Semantic Web–ISWC 2015, pages 36–51. Springer.

Moore, J. L., Steinke, F., and Tresp, V. (2011). A novel metric for information retrieval in semantic networks. In Extended Semantic Web Conference, pages 65–79. Springer.

Moore, J. L., Steinke, F., and Tresp, V. (2012). A novel metric for Information Retrieval in semantic networks. In The Semantic Web: ESWC 2011 Workshops, pages 65–79. Springer.

Mowshowitz, A. and Dehmer, M. (2012). Entropy and the complexity of graphs revisited. Entropy, 14(3), 559–570.

Muggleton, S. and De Raedt, L. (1994). Inductive Logic Programming: Theory and methods. The Journal of Logic Programming, 19, 629–679.

Muggleton, S., Otero, R., and Tamaddoni-Nezhad, A. (1992). Inductive Logic Programming, volume 168. Springer.

Mulwad, V., Finin, T., Syed, Z., and Joshi, A. (2010). Using Linked Data to interpret tables. COLD, 665.

Myers, E. W. (1986). An O(ND) difference algorithm and its variations. Algorithmica, 1(1-4), 251–266.

Nandhini, M., Janani, M., and Sivanandham, S. (2012). Association Rule Mining using swarm intelligence and domain ontology. In Recent Trends In Information Technology (ICRTIT), 2012 International Conference on, pages 537–541. IEEE.

Nebot, V. and Berlanga, R. (2012). Finding association rules in Semantic Web data. Knowledge-Based Systems, 25(1), 51–62.

Newell, A., Shaw, J. C., and Simon, H. A. (1959). Report on a general problem-solving program. In IFIP Congress, pages 256–264.

Nijssen, S. and Kok, J. (2001). Faster association rules for multiple relations. In International Joint Conference on Artificial Intelligence, volume 17, pages 891–896. Citeseer.

Nijssen, S. and Kok, J. N. (2004). A quick start in frequent structure mining can make a difference. In Proceedings of the tenth ACM SIGKDD international conference on Knowledge Discovery and Data Mining, pages 647–652. ACM.

Nishioka, C. and Scherp, A. (2015). Temporal patterns and periodicity of entity dynamics in the Linked Open Data Cloud. In Proceedings of the 8th International Conference on Knowledge Capture, K-CAP 2015, pages 22:1–22:4, New York, NY, USA. ACM.

Novak, P. K., Vavpetic, A., Trajkovski, I., and Lavrac, N. (2009). Towards semantic Data Mining with G-SEGS. In Proceedings of the 11th International Multiconference Information Society (IS 2009), volume 20.

Ogbuji, C. (2011). SPARQL 1.1 graph store HTTP protocol. W3C Recommendation (March 21, 2013).

Oren, E., Delbru, R., Catasta, M., Cyganiak, R., Stenzhorn, H., and Tummarello, G. (2008). Sindice.com: A document-oriented lookup index for Open Linked Data. International Journal of Metadata, Semantics and Ontologies, 3(1), 37–52.

Özsu, M. T. and Valduriez, P. (2011). Principles of distributed database systems. Springer Science & Business Media.

Páez, A. (2009). Artificial explanations: The epistemological interpretation of explanation in AI. Synthese, 170(1), 131–146.

Parent, C., Spaccapietra, S., Renso, C., Andrienko, G., Andrienko, N., Bogorny, V., Damiani, M. L., Gkoulalas-Divanis, A., Macedo, J., Pelekis, N., et al. (2013). Semantic trajectories modeling and analysis. ACM Computing Surveys (CSUR), 45(4), 42.

Paulheim, H. (2012). Generating possible interpretations for statistics from Linked Open Data. In The Semantic Web: Research and Applications, pages 560–574. Springer.

Paulheim, H. and Fürnkranz, J. (2012). Unsupervised generation of Data Mining features from Linked Open Data. In Proceedings of the 2nd international conference on web intelligence, mining and semantics, page 31. ACM.

Paulheim, H. and Gangemi, A. (2015). Serving DBpedia with DOLCE – More than just adding a cherry on top. In The Semantic Web–ISWC 2015, pages 180–196. Springer.

Pedrinaci, C. and Domingue, J. (2010). Toward the next wave of services: Linked services for the Web of Data. J. ucs, 16(13), 1694–1719.

Pirrò, G. (2015). Explaining and suggesting relatedness in knowledge graphs. In International Semantic Web Conference, pages 622–639. Springer.

Pohl, I. (1970). Bi-directional search. IBM TJ Watson Research Center.

Pohle, C. (2003). Integrating and updating domain knowledge with Data Mining. In VLDB PhD Workshop.

Poli, R., Langdon, W. B., McPhee, N. F., and Koza, J. R. (2008). A field guide to Genetic Programming. Lulu.com.

Psillos, S. (2007). Past and contemporary perspectives on explanation. General philosophy of science. Focal issues, pages 97–173.

Quilitz, B. and Leser, U. (2008). Querying distributed RDF data sources with SPARQL. Springer.

Quinlan, J. R. and Hunt, E. (1968). A formal deductive problem-solving system. Journal of the ACM (JACM), 15(4), 625–646.

Ramesh, C. R., Ramana, K., Rao, K. R., and Sastry, C. (2013). Interactive post mining association rules using cost complexity pruning and ontologies KDD. International Journal of Computer Applications, 68(20).

Rettinger, A., Nickles, M., and Tresp, V. (2009). Statistical Relational Learning with formal ontologies. In Machine Learning and Knowledge Discovery in Databases, pages 286–301. Springer.

Rettinger, A., Lösch, U., Tresp, V., d’Amato, C., and Fanizzi, N. (2012). Mining the Semantic Web. Data Mining and Knowledge Discovery, 24(3), 613–662.

Ristoski, P. (2015). Towards Linked Open Data enabled Data Mining. In The Semantic Web. Latest Advances and New Domains, pages 772–782. Springer.

Ristoski, P. and Paulheim, H. (2014). A comparison of propositionalization strategies for creating features from Linked Open Data. Linked Data for Knowledge Discovery, page 6.

Rothleder, N. J. (1996). Induction: Inference and process.

Rousseeuw, P. J. (1987). Silhouettes: a graphical aid to the interpretation and validation of cluster analysis. Journal of computational and applied mathematics, 20, 53–65.

Ruben, D.-H. (1990). Explaining explanation. Cambridge Univ Press.

Rumelhart, D. E. and Norman, D. A. (1983). Representation in memory. Technical report, DTIC Document.

Russ, T. A., Ramakrishnan, C., Hovy, E. H., Bota, M., and Burns, G. A. (2011). Knowledge engineering tools for reasoning with scientific observations and interpretations: A neural connectivity use case. BMC bioinformatics, 12(1), 351.

Russell, S. and Norvig, P. (1995). Artificial Intelligence: A modern approach. Prentice-Hall, Englewood Cliffs.

Sahami, M. and Heilman, T. D. (2006). A Web-based kernel function for measuring the similarity of short text snippets. In Proceedings of the 15th international conference on World Wide Web, pages 377–386. ACM.

Salmon, W. C. (1990). Scientific explanation: Causation and unification. Critica: Revista Hispanoamericana de Filosofia, pages 3–23.

Schaeffer, S. E. (2007). Graph clustering. Computer Science Review, 1(1), 27–64.

Schmachtenberg, M., Bizer, C., and Paulheim, H. (2014). Adoption of the linked data best practices in different topical domains. In International Semantic Web Conference, pages 245–260. Springer.

Schuhmacher, M. and Ponzetto, S. P. (2014). Knowledge-based graph document modeling. In Proceedings of the 7th ACM international conference on Web Search and Data Mining, pages 543–552. ACM.

Schurz, G. (2008). Patterns of abduction. Synthese, 164(2), 201–234.

Schwarte, A., Haase, P., Hose, K., Schenkel, R., and Schmidt, M. (2011). Fedx: Optimization techniques for federated query processing on Linked Data. In The Semantic Web–ISWC 2011, pages 601–616. Springer.

Segen, J. (2014). Graph clustering and model learning by data compression. In Proceedings of the Machine Learning Conference, page 93.

Shannon, C. E. (2001). A mathematical theory of communication. ACM SIGMOBILE Mobile Computing and Communications Review, 5(1), 3–55.

Smolensky, P. (1988). On the proper treatment of connectionism. Behavioral and brain sciences, 11(01), 1–23.

Southwick, R. W. (1991). Explaining reasoning: An overview of explanation in knowledge-based systems. The knowledge engineering review, 6(01), 1–19.

Speicher, S., Arwe, J., and Malhotra, A. (2014). Linked Data Platform 1.0. Proposed Recommendation, W3C.

Srikant, R. and Agrawal, R. (1995). Mining generalized association rules. IBM Research Division.

Stankovic, M., Wagner, C., Jovanovic, J., and Laublet, P. (2010). Looking for experts? What can Linked Data do for you? In LDOW.

Strevens, M. (2006). Scientific explanation. Encyclopedia of Philosophy, second edition. DM Borchert (ed.). Detroit: Macmillan Reference USA.

Strube, M. and Ponzetto, S. P. (2006). WikiRelate! Computing semantic relatedness using Wikipedia. In AAAI, volume 6, pages 1419–1424.

Suchanek, F. M., Kasneci, G., and Weikum, G. (2007). YAGO: A core of semantic knowledge. In Proceedings of the 16th international conference on World Wide Web, pages 697–706. ACM.

Theoharis, Y., Tzitzikas, Y., Kotzinos, D., and Christophides, V. (2008). On graph features of Semantic Web schemas. Knowledge and Data Engineering, IEEE Transactions on, 20(5), 692–702.

Thompson, K. and Langley, P. (1991). Concept formation in structured domains. Concept formation: Knowledge and experience in unsupervised learning, pages 127–161.

Thor, A., Anderson, P., Raschid, L., Navlakha, S., Saha, B., Khuller, S., and Zhang, X.-N. (2011). Link prediction for annotation graphs using graph summarization. In The Semantic Web–ISWC 2011, pages 714–729. Springer.

Tiddi, I., d’Aquin, M., and Motta, E. (2014a). Quantifying the bias in data links. In Knowledge Engineering and Knowledge Management, pages 531–546. Springer.

Tiddi, I., d’Aquin, M., and Motta, E. (2014b). Walking Linked Data: A graph traversal approach to explain clusters. In Proceedings of the 5th International Conference on Consuming Linked Data-Volume 1264, pages 73–84. CEUR-WS. org.

Trajkovski, I., Lavrač, N., and Tolar, J. (2008). SEGS: Search for enriched gene sets in microarray data. Journal of biomedical informatics, 41(4), 588–601.

Trißl, S. and Leser, U. (2007). Fast and practical indexing and querying of very large graphs. In Proceedings of the 2007 ACM SIGMOD international conference on Management of data, pages 845–856. ACM.

Tseng, M.-C., Lin, W.-Y., and Jeng, R. (2005). Maintenance of generalized association rules under transaction update and taxonomy evolution. In Data Warehousing and Knowledge Discovery, pages 336–345. Springer.

Tummarello, G., Cyganiak, R., Catasta, M., Danielczyk, S., Delbru, R., and Decker, S. (2010). Sig.ma: Live views on the Web of Data. Web Semantics: Science, Services and Agents on the World Wide Web, 8(4), 355–364.

Ukkonen, E. (1985). Algorithms for approximate string matching. Information and control, 64(1), 100–118.

Ullmann, J. R. (1976). An algorithm for subgraph isomorphism. Journal of the ACM (JACM), 23(1), 31–42.

Umbrich, J., Hausenblas, M., Hogan, A., Polleres, A., and Decker, S. (2010). Towards dataset dynamics: Change frequency of Linked Open Data sources.

Umbrich, J., Hogan, A., Polleres, A., and Decker, S. (2014). Link traversal querying for a diverse Web of Data. Semantic Web Journal.

Van Herwegen, J., Verborgh, R., Mannens, E., and Van de Walle, R. (2015). Query execution optimization for clients of triple pattern fragments. In The Semantic Web. Latest Advances and New Domains, pages 302–318. Springer.

Vandenbussche, P.-Y. and Vatant, B. (2011). Metadata recommendations for Linked Open Data vocabularies. Version, 1, 2011–12.

Vandenbussche, P.-Y., Umbrich, J., Hogan, A., and Buil-Aranda, C. (2015). SPARQLES: Monitoring public SPARQL endpoints. Semantic Web Journal.

Vanetik, N., Gudes, E., and Shimony, S. E. (2002). Computing frequent graph patterns from semistructured data. In Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on, pages 458–465. IEEE.

Verborgh, R., Hartig, O., De Meester, B., Haesendonck, G., De Vocht, L., Vander Sande, M., Cyganiak, R., Colpaert, P., Mannens, E., and Van de Walle, R. (2014a). Querying datasets on the Web with high availability. In The Semantic Web–ISWC 2014, pages 180–196. Springer.

Verborgh, R., Vander Sande, M., Colpaert, P., Coppens, S., Mannens, E., and Van de Walle, R. (2014b). Web-scale querying through Linked Data fragments. In Proceedings of the 7th Workshop on Linked Data on the Web.

Vrandečić, D. and Krötzsch, M. (2014). Wikidata: A free collaborative knowledge base. Communications of the ACM, 57(10), 78–85.

Wang, C., Chen, H., Gu, P., and Yu, T. (2011). Relation discovery in medicine through semantic graph routing. In Electronic and Mechanical Engineering and Information Technology (EMEIT), 2011 International Conference on, volume 2, pages 640–644. IEEE.

Wang, H., He, H., Yang, J., Yu, P. S., and Yu, J. X. (2006). Dual labeling: Answering graph reachability queries in constant time. In Data Engineering, 2006. ICDE’06. Proceedings of the 22nd International Conference on, pages 75–75. IEEE.

Washio, T. and Motoda, H. (2003). State of the art of graph-based Data Mining. Acm Sigkdd Explorations Newsletter, 5(1), 59–68.

Weber, M. and Heydebrand, W. V. (1994). Sociological writings. Continuum International Publishing Group Ltd.

West, D. B. et al. (2001). Introduction to graph theory, volume 2. Prentice hall Upper Saddle River.

Whiting, P. and Hillier, J. (1960). A method for finding the shortest route through a road network. OR, pages 37–40.

Wilson, R. A. and Keil, F. C. (2001). The MIT Encyclopedia of the cognitive sciences. MIT press.

Witten, I. and Milne, D. (2008). An effective, low-cost measure of semantic relatedness obtained from Wikipedia links. In Proceedings of the AAAI Workshop on Wikipedia and Artificial Intelligence: an Evolving Synergy, AAAI Press, Chicago, USA, pages 25–30.

Wood, P. T. (2012). Query languages for graph databases. ACM SIGMOD Record, 41(1), 50–60.

Woodward, J. (2003). Making things happen: A theory of causal explanation. Oxford University Press.

Woodward, J. (2008). Cause and explanation in psychiatry. Philosophical issues in psychiatry. Explanation, phenomenology, and nosology, pages 132–184.

Wooley, B. A. (1998). Explanation component of software system. Crossroads, 5(1), 24–28.

Wu, S., Manber, U., Myers, G., and Miller, W. (1990). An O(NP) sequence comparison algorithm. Information Processing Letters, 35(6), 317–323.

Yan, X. and Han, J. (2002). gSpan: Graph-based substructure pattern mining. In Data Mining, 2002. ICDM 2002. Proceedings. 2002 IEEE International Conference on, pages 721–724. IEEE.

Yan, X., Yu, P. S., and Han, J. (2004). Graph indexing: A frequent structure-based approach. In Proceedings of the 2004 ACM SIGMOD international conference on Management of data, pages 335–346. ACM.

Yeh, E., Ramage, D., Manning, C. D., Agirre, E., and Soroa, A. (2009). WikiWalk: Random walks on Wikipedia for semantic relatedness. In Proceedings of the 2009 Workshop on Graph-based Methods for Natural Language Processing, pages 41–49. Association for Computational Linguistics.

Yoshida, K., Motoda, H., and Indurkhya, N. (1994). Graph-based induction as a unified learning framework. Applied Intelligence, 4(3), 297–316.

Zapilko, B., Harth, A., and Mathiak, B. (2011). Enriching and analysing statistics with Linked Open Data. In Proceedings of the NTTS Conference.

Zhang, H. (2008). The scale-free nature of Semantic Web ontology. In Proceedings of the 17th international conference on World Wide Web, pages 1047–1048. ACM.

Zhang, X., Zhao, C., Wang, P., and Zhou, F. (2012). Mining link patterns in Linked Data. In Web-Age Information Management, pages 83–94. Springer.

Zheng, Q., Chen, H., Yu, T., and Pan, G. (2009). Collaborative semantic association discovery from Linked Data. In Information Reuse & Integration, 2009. IRI’09. IEEE International Conference on, pages 394–399. IEEE.

Appendix A

Datasets Details

In this appendix, we present the details of the datasets that were used for the experiments throughout this work. Since our work focused on explaining patterns rather than on creating them, we intentionally left the details of how the datasets were obtained to this minor section of the thesis. Table A.1 summarises the datasets that are presented below: the size of the dataset |R|, the number of clusters |C|, the method we used to obtain the patterns, and how difficult (complex) we estimate it is to find the pattern explanations.

Table A.1 Detailed description of the datasets used for the experiments.

Dataset    |R|      |C|   Method                             Complexity
#Red       368      10    K-Means clustering                 3
#KMi.A     92       6     Network partitioning clustering    1
#KMi.P     865      6     X-Means clustering                 3
#KMi.H     6,969    10    K-Means clustering                 2
#Lit       152      3     Rate distance                      2
#OU.P      1,192    30    Network partitioning clustering    3
#Tre       559      2*    Distance from average clustering   3
#OU.S      380      4     K-Means clustering                 3

*per 13 trends

#Red – Readers from the Reading Experience Database. The Reading Experience Database1 is a record of people’s reading experiences, including metadata regarding the reader, author, and book involved in a reader’s experience, as well as its date and location. It consists of almost 400 people clustered in 10 groups according to the topics of the books they have read. What we were interested in finding out was what makes people read similar things. To obtain the patterns, we followed the process described below.

1 http://www.open.ac.uk/Arts/RED/index.html

1) we used the DBpedia SPARQL endpoint to retrieve the topic (revealed by the dc:subject property) of the books that were read in each experience.

2) we built a clustering matrix where each row was a reader of one experience, and each column one of the retrieved book topics. Each row was then a vector of 0s and 1s representing whether the reader had read a book of that given topic (1) or not (0). An example is shown in Table A.2.

Table A.2 Clustering matrix for #Red.

reader         authors’ categories                                      cluster general topic
               Women writers   English poets   Political philosophers
Lord Byron     {0, 1, 0, ...}                                           J. Milton
Lady Byron     {1, 0, 0, ...}                                           J. Austen
James Clunie   {0, 0, 1, ...}                                           Karl Marx

3) we clustered the readers using the built-in K-Means implementation of the Weka tool2 (a sketch of steps 2–3 is given below).

4) we labelled the clusters with the most representative topic of the group (last column in Table A.2), which we used as a reference when producing Dedalo’s explanations (e.g. “people read X because of...”, where X is the label of the cluster). The resulting cluster details and labels are illustrated in Table 3.4.

2 http://www.cs.waikato.ac.nz/ml/weka/
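Since the clustering itself is standard, the following is only a minimal sketch of steps 2–3, using pandas and scikit-learn in place of Weka’s K-Means; the (reader, topic) pairs are hypothetical stand-ins for the data actually retrieved from DBpedia.

import pandas as pd
from sklearn.cluster import KMeans

# (reader, topic) pairs as retrieved through dc:subject (illustrative values only)
pairs = [("Lord Byron", "English poets"),
         ("Lady Byron", "Women writers"),
         ("James Clunie", "Political philosophers")]
df = pd.DataFrame(pairs, columns=["reader", "topic"])

# binary reader x topic matrix: 1 if the reader read at least one book of that topic
matrix = pd.crosstab(df["reader"], df["topic"]).clip(upper=1)

# 10 clusters were used on the full set of 368 readers; 3 here for the toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(matrix)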

Figure A.1 #KMi.A. Each node is an author, and each colour is a group.

#KMi.A – The Knowledge Media Institute co-authorship. A dataset of 92 researchers from the Knowledge Media Institute that we clustered in 6 groups according to the papers they have written together. In this scenario, we were interested in explaining what makes researchers prefer some co-authors rather than others. Clusters were obtained as follows.

1) we selected from the Open University dataset the identifiers for the researchers3;

2) we built a graph where a researcher (a node) was connected to another only if the two researchers were co-authors;

3) we ran the Louvain Network Partitioning algorithm provided in Gephi, resulting in groups of co-authors as in Figure A.1.

4) we used our expertise (i.e. the knowledge about the department) to label the obtained clusters, as follows (a sketch of steps 2–3 is given after Table A.3):

Table A.3 #KMi.A: size of the cluster and attributed label.

Size   Label
19     People working on Semantic Web services
6      People working on Social Media Analysis
13     People working on Open Access
22     People working on Ontologies
23     People working on Learning Technologies
9      People working on Image Processing and Retrieval
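A minimal sketch of steps 2–3, using networkx and the python-louvain package as a stand-in for Gephi’s implementation of the Louvain algorithm; the co-authorship pairs below are hypothetical.

import networkx as nx
import community as community_louvain  # python-louvain package

# one edge for every pair of researchers who co-authored a paper (illustrative)
coauthorships = [("researcher/1", "researcher/2"),
                 ("researcher/2", "researcher/3"),
                 ("researcher/4", "researcher/5")]

G = nx.Graph()
G.add_edges_from(coauthorships)

# Louvain community detection: maps each researcher to the id of its group
partition = community_louvain.best_partition(G)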

#KMi.P – The Knowledge Media Institute publications. The dataset is similar to #KMi.A, except that here we clustered 865 research papers from the KMi department according to the content of their abstract. In this use-case we are looking into revealing the reasons why papers are similar. The dataset was obtained as follows:

1) we extracted the abstracts of each KMi paper using the Open University SPARQL endpoint4;

2) we cleaned the texts using standard natural language techniques including normalisation, stopwords removal, and stemming;

3) we created a matrix where each row was a paper and each column was one of the terms, whose value for the row was the TF-IDF weight of the given word for the given paper;

4) we ran the X-Means algorithm using the RapidMiner tool5 (a sketch of steps 3–4 is given after Table A.4);

5) we analysed the clusters and tried to define a general topic for them, as in Table A.4.

3 e.g. Ilaria Tiddi’s URI is http://data.open.ac.uk/page/person/cacb9bf9d6f500b32ecbd3751165bc53
4 http://data.open.ac.uk/sparql
5 https://rapidminer.com/

Table A.4 #KMi.P: size of the cluster and attributed label.

Size   Label
18     Papers on Image Processing and Retrieval
5      Papers on Digital Storytelling
15     Papers on Educational Narrative Content
601    Papers on Learning Technologies and Human Computer Interaction
6      Papers on Semantic Web Services
220    Papers on Semantic Web, Ontologies, Knowledge and Smartcities
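A sketch of steps 3–4, under the assumption that the abstracts are already cleaned. scikit-learn is used here; since it provides no X-Means implementation, K-Means with a fixed number of clusters stands in for RapidMiner’s X-Means (which estimates that number itself).

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans

# cleaned abstracts (illustrative strings)
abstracts = ["semantic web ontology linked data knowledge",
             "image retrieval processing visual features",
             "learning technologies students course design"]

# paper x term matrix, weighted by TF-IDF
X = TfidfVectorizer().fit_transform(abstracts)

# X-Means would pick the number of clusters automatically; 3 is used for the toy corpus
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)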

#KMi.H – The book borrowing observations. This is a dataset of 6,969 books that were borrowed by some university students, which we clustered according to the Faculty the borrowers belong to. The original observations were provided by the University of Huddersfield dataset, from which we could extract the borrowed books and the students’ faculties. We are interested in explaining what makes books be grouped together (e.g. because they are about the same topics). We built the dataset as described below:

1) we extracted the books from the British National Library SPARQL endpoint;

2) we created a matrix where each row was a book and each column was one of the university faculties, and whose value was the number of students from that faculty that borrowed the book;

3) we used K-Means to obtain the clusters, which we then labelled as in Table 3.4 (a sketch of steps 2–3 is given below).
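A minimal sketch of steps 2–3, assuming a hypothetical list of (book, faculty) borrowing records already extracted from the Huddersfield observations and the British National Library.

import pandas as pd
from sklearn.cluster import KMeans

# one record per borrowing event (illustrative values)
loans = pd.DataFrame([("Book A", "Health"), ("Book A", "Health"),
                      ("Book B", "Business"), ("Book C", "Computing")],
                     columns=["book", "faculty"])

# book x faculty matrix counting how many students of each faculty borrowed the book
counts = pd.crosstab(loans["book"], loans["faculty"])

# 10 clusters were used on the full set of 6,969 books; 3 here for the toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(counts)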

#Lit – World literacy rate observations. The dataset consists of world countries partitioned in 3 groups according to the gender literacy rate. The idea for this use-case is to reveal what countries with the same rate have in common. To build the dataset, we followed the steps described below.

Figure A.2 #Lit – World countries grouped by their literacy rate.

1) We used a SPARQL query on the UNESCO dataset endpoint6 to retrieve, for each country, the percentage of females and the percentage of males enrolled in secondary and tertiary education since the year 2000;

2) We calculated, for each country, the difference between the two obtained average rates;

3) We created three groups accordingly: one of countries where men were more educated than women, one where women were more educated than men, and one where the education level was similar (the absolute difference between the two percentages is less than 2%), as in Figure A.2. A sketch of steps 2–3 is given below.
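Steps 2–3 reduce to a thresholded difference between the two averages; the following is a minimal sketch, assuming hypothetical per-country average enrolment rates already retrieved from the UNESCO endpoint.

# country -> (average female rate, average male rate); illustrative values
rates = {"CountryA": (95.0, 90.0), "CountryB": (70.0, 80.0), "CountryC": (88.5, 89.0)}

groups = {"women more educated": [], "men more educated": [], "similar": []}
for country, (female, male) in rates.items():
    diff = female - male
    if abs(diff) < 2.0:                      # absolute gap below 2%: similar level
        groups["similar"].append(country)
    elif diff > 0:
        groups["women more educated"].append(country)
    else:
        groups["men more educated"].append(country)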

#OU.P – The Open University publications. This dataset consists of an extended version of #KMi.P, where 1,200 English words were extracted from ca. 17,000 paper abstracts from the Open University, as in #KMi.P. For this use-case, we looked at explaining “why” words appear together, which indicates the topic of the papers clustered together. Words have been clustered into 30 communities according to their semantic similarity, using the following process.

[Figure A.3: the 30 word communities, shown (a) before Dedalo’s labelling, as unlabelled groups of words, and (b) after Dedalo’s labelling, with labels such as Subfields of political science, Social sciences, Philosophy of language, Telecommunications engineering, Branches of biology, Geology, Astronomy, Materials science, Meteorites, Branches of psychology, Branches of geography and Mathematics.]

(a) Before Dedalo’s labelling. (b) After Dedalo’s labelling.

Figure A.3 The Open University community network before and after Dedalo’s labelling.

1) We pre-processed and cleaned the corpus of publication abstracts. We normalised the text to remove stopwords, numbers and punctuation, and reduced the words to their shortest possible raw form (stemming and stem completion). We also filtered out words whose length was too small (less than 3 characters).

2) We looked up DBpedia for entities corresponding to the remaining words. If no corresponding entity was found, the word was discarded.

6 http://uis.270a.info/sparql

3) We applied the Latent Semantic Analysis (LSA) technique to derive the contextual similarity of the words. We built a TF-IDF weighted term-document matrix in which each column was a unique word of the cleaned corpus and each row a document of the corpus. This matrix was then reduced into a lower dimensional space (the latent semantic space) using singular value decomposition. This dimensionality reduction collapses the space in such a way that words occurring in similar contexts appear with a greater (or lesser) estimated frequency.

4) We clustered words using the resulting space, obtaining 30 communities of words. We then represented the communities in a network, connecting each community according to the distance between their centroids. In Figure A.3, we show the communities before running Dedalo (when communities are still groups of words to be interpreted) and after. A sketch of steps 3–4 is given below.
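A minimal sketch of steps 3–4: a TF-IDF term-document matrix, truncated SVD as the latent semantic space, and K-Means over the resulting word vectors. scikit-learn is assumed and the corpus is illustrative; the real corpus is far larger and was clustered into 30 communities.

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.decomposition import TruncatedSVD
from sklearn.cluster import KMeans

# cleaned abstracts (illustrative strings)
docs = ["ontology semantic linked data web",
        "planet exoplanet astronomy disc survey",
        "students teachers classroom school learning"]

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)          # document x word matrix, TF-IDF weighted

svd = TruncatedSVD(n_components=2, random_state=0)
svd.fit(X)                                  # latent semantic space of the corpus
word_vectors = svd.components_.T            # one reduced-space vector per word

# 30 communities were used on the real vocabulary; 3 here for the toy corpus
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(word_vectors)
words = vectorizer.get_feature_names_out()  # labels[i] is the community of words[i]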

#Tre – The Google Trends dataset. This dataset consists of 13 cases of Google Trends data7, showing how often a particular search term (a word, or a group of words) has been entered in the last 10 years. In this scenario we have only two groups: the moments in which the term is popular (high in the graph) and the moments in which it is not, and we are looking to discover what makes a given term more popular at certain times than at others, with a specific regularity. To obtain the patterns, we proceeded as follows.

1) We retrieved some trends in a CSV format from the Google Trends page using a simple HTTP GET request. Each file included 559 weeks, from January 2004 to September 2014, and the search rate of the trend for each given week, normalised on the total number of searches8. We chose 30 trends divided into:

• People: Beppe (referring to Beppe Grillo, the Italian comedian and politician), Carla (referring to the French-Italian singer and former French Première Dame Carla Bruni), Daniel Radcliffe, Obama, Taylor (referring to the American singer Taylor Swift);
• Music: Biffy, Carla, Pearl Jam, Taylor;
• Places: Brazil, France, Germany, Italy, MK Bowl (the Milton Keynes concert venue), Turkey, Wembley;
• Movies/TV: A Song of Ice and Fire, How I Met Your Mother, Sherlock, Star Wars, Winter is coming;
• Sport events: Dakar, Roland (the tennis tournament Roland Garros), Rugby, Tennis, US Open, Wimbledon;

• Miscellaneous: AT&T, Bitcoin, Hurricane, Vaccine, Volcano.

2) We partitioned weeks into a group of peaks and a group of non-peaks. A week was considered a “peak” if its value was above the annual moving average (computed over the 52 weeks around that week) plus an estimated threshold (to reduce noise). Each week was then weighted using the normalised distance between the real search value and the moving average (a sketch of this peak detection is given after Figure A.4). Figure A.4 shows the peaks for the trends Brazil and Turkey.

7 http://www.google.com/trends/
8 More information at https://support.google.com/trends/?hl=en#topic=4365599

(a) Peaks of Web searches for Brazil.

(b) Peaks of Web searches for Turkey.

Figure A.4 #Tre patterns.
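A minimal sketch of the peak detection in step 2, assuming a hypothetical series of weekly search rates; the 52-week centred moving average and the noise threshold follow the description above, while the exact normalisation used for the weights is only approximated here.

import pandas as pd

# weekly search rates for one trend (illustrative values, 559 weeks)
searches = pd.Series(([20, 22, 21, 80, 23, 19, 25] * 80)[:559])

# moving average over the 52 weeks around each week
moving_avg = searches.rolling(window=52, center=True, min_periods=1).mean()

threshold = 5.0                                   # estimated threshold to reduce noise
is_peak = searches > (moving_avg + threshold)     # peak / non-peak partition

# weight of each week: normalised distance between the rate and the moving average
weights = (searches - moving_avg).abs() / moving_avg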

3) We transformed the set of weeks into an RDF dataset by creating entities of type ddl:Week that were linked to the events that happened during that week. For that, we used a SPARQL query on DBpedia, where we selected any entity that had a declared date property between the ?_startDate and the ?_finalDate of the week:

select ?event where {
  ?event ?happenedIn ?date .
  filter(?date >= "?_startDate"^^xsd:date && ?date <= "?_finalDate"^^xsd:date)
}

#OU.S – The Student Enrolment dataset. This dataset consists of observations about the Open University students and the courses they have enrolled in. Since some faculties attract more students than others (e.g. Figure A.5 shows that students enrolling in the Health and Social Care Faculty (left) are concentrated around places such as the Manchester area, while the Business and Law Faculty (right) attracts students from other places, such as the London districts), we are looking to identify a possible eco-demographic reason for that.

(a) Health and Social Care. (b) Business and Law Faculty.

Figure A.5 #OU.S patterns.

The dataset was built as follows:

1) From the Open University Linked Data dataset9, we used a SPARQL query to extract OU students enrolled between 2005 and 2011, their address and the courses they enrolled in. We obtained 1,003,435 enrolments;

2) We identified students by their postcodes, and aggregated them using districts from the Ordnance Survey dataset10. We obtained 380 districts;

3) To reduce the number of courses, we aggregated them using the topic they are related to in the OU dataset. We obtained 14 topics;

4) We used K-Means clustering to aggregate postcodes according to the topic of preference, and obtained 4 clusters (a sketch of steps 2–4 is given below).
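A minimal sketch of steps 2–4, assuming hypothetical enrolment records already reduced to (district, topic) pairs by the SPARQL extraction and the Ordnance Survey aggregation.

import pandas as pd
from sklearn.cluster import KMeans

# one row per enrolment, keyed by postcode district and course topic (illustrative)
enrolments = pd.DataFrame([("MK", "Health"), ("MK", "Health"),
                           ("M", "Business"), ("E", "Arts")],
                          columns=["district", "topic"])

# district x topic matrix of enrolment counts
profiles = pd.crosstab(enrolments["district"], enrolments["topic"])

# 4 clusters were used on the 380 districts; 3 here for the toy data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(profiles)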

9 http://data.open.ac.uk
10 http://data.ordnancesurvey.co.uk/

Appendix B

Linked Data Bias: Additional Results

In this Appendix we show the full experiments and results of the preliminary work conducted to quantify the bias in data links. The full work was published in [Tiddi et al., 2014a]. All the experiments were run on a 2.9 GHz Intel Core i7 machine with 8GB of RAM. First, Table B.1 shows the set of datasets for which we tried to identify the bias. Datasets were identified from the DataHub. For most of the datasets we chose, it was possible to calculate the bias in both directions. In cases where neither a SPARQL endpoint nor a data dump was available, we marked the dataset with a * and computed the bias only in the available direction. The table includes the dataset A for which we want to identify the bias, the dataset B that we used to project A, the size |B′| of the projection (in number of triples), and the time it took to run the full process. The results showed that, on average, detecting the bias is not an expensive task. This, of course, is highly dependent on the resources one has to deal with: for very big linksets (in number of linked entities) or rich graphs (in number of classes, properties and values), the computation was naturally slower. Typically, this was the case when we ran the process on DBpedia, for instance, and on large biomedical data sources.

Table B.1 Bias detection. Computational costs in seconds.

A                              B                              |B′|      time (s)
LinkedGeoData                  Linked Food                    38        0.04
data.open.ac.uk                Unistats                       82        0.11
DBpedia                        Finnish National Library       336       0.40
DBpedia                        DBTune                         882       0.65
DBpedia                        Hungarian National Library     500       0.74
New York Times                 Geonames                       1,787     0.74
DBpedia                        Food and Agriculture Org.      215       0.79
AGROVOC                        Food and Agriculture Org.      256       0.84
DBpedia                        Reading Experience Database    6,549     0.93
DBpedia                        New York Times                 9,123     1.33
Freebase                       New York Times                 10,302    1.45
VIAF*                          Spanish National Library       3,000     1.45
LOD-Ecosystems*                UniProt                        17,355    3.83
Linked Food                    LinkedGeoData                  38        8.35
EUROVOC*                       Arbeitsrecht                   247       9.43
DrugBank                       DBpedia                        1,481     9.65
Unistats                       data.open.ac.uk                82        11.92
LOD-Ecosystem                  GeoSpecies                     2,814     13.00
DBpedia                        Org. Economic Coop. and Dev.   223       17.97
Food and Agriculture Org.      AGROVOC                        256       21.03
DBpedia                        DrugBank                       1,481     32.39
UniProt                        DrugBank                       4,168     44.44
Finnish National Library       DBpedia                        336       65.66
Org. Economic Coop. and Dev.   DBpedia                        223       71.22
Eurostats*                     LinkedGeoData                  1,558     125.55
UniProt                        Bio2RDF                        18,997    127.21
BioPAX                         UniProt                        58,398    144.91
Food and Agriculture Org.      DBpedia                        256       146.90
Eurostats*                     Org. Economic Coop. and Dev.   3,488     301.57
DBTune                         DBpedia                        882       305.69
DBpedia                        SW Dog Food                    461       321.79
SW Dog Food                    DBpedia                        461       346.89
Geonames                       New York Times                 1,787     366.37
Hungarian National Library     DBpedia                        500       484.40
New York Times                 DBpedia                        9,123     575.75
DBpedia                        Spanish National Library       36,431    613.68
AGROVOC                        DBpedia                        11,014    657.60
Reading Experience Database    DBpedia                        6,549     682.11
LOD-Ecosystems*                DBpedia                        43,452    751.82
Spanish National Library       DBpedia                        36,431    827.21
Open Energy Info               DBpedia                        10,069    830.39
DBpedia                        Open Energy Info               10,069    834.12
Unesco                         Org. Eco. Coop. and Dev.       17,338    1,143.42
DBpedia                        AGROVOC                        11,014    1,270.46
Org. Economic Coop. and Dev.   Unesco                         17,338    1,565.79
DBpedia                        Linked Movie Database          13,758    1,579.36
New York Times                 Freebase                       10,302    1,587.00
Linked Movie Database          DBpedia                        13,758    1,908.11
GeoSpecies                     LOD-Ecosystem                  2,814     1,944.32
DrugBank                       UniProt                        4,168     4,606.75
BioPAX                         Entrez Gene                    36,006    5,253.63
Entrez Gene                    BioPAX                         36,006    58,752.77
Bio2RDF                        UniProt                        18,997    60,459.88
UniProt                        BioPAX                         58,398    67,405.53

Finally, Table B.2 presents some more examples of detected bias, ordered by expectedness (from 1 to 4), as in Chapter 8.

     B     A         c                  p                  value                         p-value
(1)  DBP   NLFi      db:Place           dc:subject         db:CitiesAndTownsInFinland    p < 1.00 × 10^-15
(1)  DBP   NLFi      db:Place           dbp:longd          avg: 24.6, stdev: 0.78        p < 1.00 × 10^-15
(1)  DBP   NLFi      db:Place           dbp:latd           avg: 40.5, stdev: 2.82        p < 1.00 × 10^-15
(1)  GN    NYT       gn:Feature         gn:parentFeature   gn:US                         p < 1.00 × 10^-15
(1)  GN    NYT       skos:Concept       gn:inCountry       gn:US; gn:Mexico              p < 3.34 × 10^-2
(1)  DBP   NYT       db:Agent           dbp:country        db:UnitedStates               p < 3.87 × 10^-4
(2)  DBP   NLEs      db:MusicalArtist   dbp:birthPlace     db:England                    p < 1.13 × 10^-13
(2)  DBP   NLEs      db:MusicalArtist   dbp:birthPlace     db:Spain                      p < 1.13 × 10^-13
(2)  DBP   NLEs      db:Writer          dbp:nationality    db:French                     p < 4.64 × 10^-3
(2)  DBP   NLEs      db:Writer          dbp:nationality    db:Spanish                    p < 4.64 × 10^-3
(2)  DBP   RED       db:Scientist       db:almaMater       db:CambridgeUniversity        p < 1.00 × 10^-15
(2)  DBP   RED       db:Scientist       db:almaMater       db:UniversityOfEdinburgh      p < 1.00 × 10^-15
(2)  DBP   RED       db:Scientist       db:birthPlace      db:England                    p < 1.00 × 10^-15
(2)  DBP   RED       db:Scientist       db:birthPlace      db:Scotland                   p < 1.00 × 10^-15
(2)  DBP   RED       db:Writer          db:birthDate       avg: 1809, stdev: 96.21       p < 1.00 × 10^-15
(2)  DBP   RED       db:Writer          db:deathDate       avg: 1871, stdev: 99.19       p < 1.00 × 10^-15
(2)  RED   DBP       db:Scientist       db:birthPlace      db:UnitedStates               p < 1.00 × 10^-15
(2)  RED   DBP       db:Writer          db:birthDate       avg: 1916, stdev: 40.43       p < 1.00 × 10^-15
(2)  RED   DBP       db:Scientist       db:nationality     db:UnitedStates               p < 4.28 × 10^-3
(3)  UP    Bio2RDF   up:Protein         up:isolatedFrom    uptissue:Brain                p < 9.29 × 10^-4
(3)  UP    Bio2RDF   up:Protein         up:isolatedFrom    uptissue:Cervical             p < 9.29 × 10^-4
(3)  UP    BioPAX    up:Protein         up:isolatedFrom    uptissue:Brain                p < 6.79 × 10^-4
(3)  UP    DgB       up:Protein         up:isolatedFrom    uptissue:Brain                p < 1.33 × 10^-4
(4)  DBP   NLEs      db:Writer          db:genre           db:MisteryFiction             p < 4.28 × 10^-11
(4)  DBP   NLEs      db:MusicalArtist   dbp:instrument     db:Piano                      p < 2.73 × 10^-4
(4)  DBP   RED       db:Agent           db:genre           db:Novel                      p < 1.00 × 10^-15
(4)  DBP   RED       db:Agent           db:genre           db:Poetry                     p < 1.00 × 10^-15
(4)  DBP   RED       db:Agent           db:movement        db:Romantism                  p < 1.00 × 10^-15
(4)  DBP   RED       db:Agent           db:movement        db:Naturalism                 p < 1.00 × 10^-15
(4)  DBP   RED       db:Agent           db:deathCause      db:Suicide                    p < 1.00 × 10^-15