Hampson, Cormac TCD-SCSS-PHD-2011-13.Pdf
AN EXPERT-SUPPORTED APPROACH TO DATA EXPLORATION

A thesis submitted to the University of Dublin, Trinity College for the degree of Doctor of Philosophy

Cormac Hampson
Knowledge and Data Engineering Group (KDEG)
School of Computer Science
Trinity College, Dublin
[email protected]

Supervised by Dr. Owen Conlan

Declaration

I, the undersigned, declare that this work has not been previously submitted as an exercise for a degree at this or any other University, and that, unless otherwise stated, it is entirely my own work.

__________________________
Cormac Hampson
May 2011

Permission to Lend or Copy

I, the undersigned, agree that the Trinity College Library may lend or copy this thesis upon request.

__________________________
Cormac Hampson
May 2011

Acknowledgements

Many people have provided support to me during the course of my Ph.D. studies. Firstly, I would like to thank my supervisor Dr. Owen Conlan for his guidance and many helpful contributions to this research over the years. I would also like to express my gratitude to my colleagues in KDEG who have been so generous with their time since I joined Trinity College. A special mention must go to Dr. Rachael Rafter and Dr. Alex O'Connor for their time and feedback while I was compiling this thesis. Furthermore, I would like to thank Meltem Gürel, Thomas Hengster, Virgile Brouard and Aonghus McGovern, whose work with various incarnations of SARA and SABer helped to shape their implementation greatly. Many thanks are also due to Peter Williams for helping to initiate this research path many moons ago.

I would also like to acknowledge my sponsors who funded this research: The Embark Initiative of the Irish Research Council for Science, Engineering and Technology, funded by the National Development Plan; and Science Foundation Ireland via grant 08/IN.1/I2103, Adaptive and adaptable media and services for dynamic personalisation and contextualisation (AMAS).
Finally, I would like to take this opportunity to express my gratitude to my family and friends for their encouragement throughout this entire process, and especially to Judy for her boundless patience and good humour.

Knowledge is knowing that a tomato is a fruit; wisdom is knowing not to put it in a fruit salad.

Abstract

Almost all information domains have witnessed a large increase in the amount of structured and semi-structured data available. However, there is still a lack of support for casual computer users who wish to create queries spanning multiple information sources. Until such support exists, the real benefits of having such a proliferation of metadata will not be realised by the general public.

This thesis proposes a novel expert-supported approach to data exploration that will help casual users interact with large content repositories. Specifically, this approach helps users leverage expert knowledge to discover relevant information and to draw correlations across separate data sources. These heterogeneous sources can be in various data formats, and are accessed by users in a consolidated fashion.

Both a framework and a set of models based on this approach have been designed and are implemented in a technical infrastructure called SARA (Semantic Attribute Reconciliation Architecture). An associated authoring tool called SABer (Semantic Attribute Builder), which works in tandem with SARA, has also been developed, and provides support for domain experts with no computer programming or data modelling experience to encode their expertise. Importantly, this means that SARA can be used in a broad range of domains, once rich metadata is available. How this expertise can then be tailored to an end-user's interpretation or context, in order to provide the user with more meaningful semantics, is another key issue tackled in this research.
In summary, this thesis presents a novel and generic knowledge access platform that serves as an intermediary between curators and consumers of data. It describes the expert-supported approach to data exploration, its accompanying framework and models, as well as the implementation of SARA and SABer. Furthermore, the validation of these systems and their underlying approach is performed through five distinct evaluations. These evaluations incorporate a variety of techniques, including user trials, performance tests, questionnaires and interviews, and involve experiments with both SABer and SARA, and the third-party applications that use them.

Contents

Declaration
Permission to Lend or Copy
Acknowledgements
Abstract
Contents
List of Figures
List of Tables
Abbreviations
1 Introduction
  1.1 Motivation
  1.2 Research Question
  1.3 Research Objectives
  1.4 Contribution
  1.5 Technical Approach
  1.6 Thesis Overview
2 State of the Art
  2.1 Introduction
  2.2 Complex Querying by Casual Users over Multiple Sources
    2.2.1 Search Computing Systems
    2.2.2 Parallax and Sparallax
    2.2.3 Explorator
    2.2.4 PowerAqua
    2.2.5 ORAKEL
    2.2.6 Semantic Web Portal
    2.2.7 MashArt
    2.2.8 Discussion
    2.2.9 Key Findings
  2.3 Domain Independent Tools to Support SME Encoding by Non-Technical Experts
    2.3.1 Konduit VQB
    2.3.2 SPARQLViz
    2.3.3 Potluck
    2.3.4 SpreadATOR
    2.3.5 Web Ontology Building System for Novice Users: A Step-by-Step Approach
    2.3.6 ROO
    2.3.7 Discussion
    2.3.8 Key Findings
  2.4 Conclusion
3 Design
  3.1 Introduction
  3.2 Influence from State of the Art
  3.3 Design Considerations