Abstract Conceptual Navigation (ACN)
Habilitation Thesis
University Rennes 1

Reconciling Expressivity and Usability in Information Access
from File Systems to the Semantic Web

Sébastien Ferré
Associate Professor
Team LIS, IRISA, University Rennes 1
[email protected]
http://www.irisa.fr/LIS/ferre/

Defended on November 6th, 2014 in front of the defense committee, composed of:

Olivier Pivert, Professor at Univ. Rennes 1 / president
Norbert E. Fuchs, Senior researcher at Univ. Zurich / referee
Marianne Huchard, Professor at Univ. Montpellier 2 / referee
Eero Hyvönen, Professor at Univ. Aalto / referee
Karell Bertet, Associate professor (HDR) at Univ. La Rochelle / examiner
Fabien Gandon, Researcher (HDR) at INRIA Valbonne / examiner

To Sterenn, Yann, and Eléonore

Foreword

Since the start of my PhD in 1999, I have worked on various problems, and produced a number of different contributions pertaining to several domains and research communities. This habilitation thesis is an opportunity for me to summarize them, and to open perspectives. However, that synthesis work is made difficult by the fact that the set of my research tracks looks more like a bush than like a tree. In fact, I see the research process as an evolution process where ideas, theories, software, and applications compete for survival, and are transformed by mutation and crossover. Some of my research tracks have gone extinct (e.g., logic functors), while others have flourished (e.g., Semantic Web). Some have appeared by mutation (e.g., guided edition from guided search), while others have merged (e.g., conceptual navigation and faceted search). At any time, there is not a single active research track, nor a "best" research track, but a population of diverse and complementary research tracks.

That evolution process makes it hard to synthesize the different research results into one "all-inclusive" result. First, merging different existing results in a coherent way is a research process in itself, and a never-ending process.
Second, the result of that merging would be a static snapshot hiding the evolution process, and hence hiding both the past and the future of the research activity, and that snapshot would soon become outdated. An alternative approach would be to write a history of research tracks and results, in chronological order. However, people generally do not want to revive the past, which tends to be boring, but rather to know the past in order to understand the present, and envision the future.

What I therefore chose to do is to write a synthesis of my research work over the last 15 years such that it is rooted in the present, and yet shows the evolution process from past to future. To this purpose, I "datamined" my different research results, and discovered a pattern that provides a unifying framework for a number of quite different results: Abstract Conceptual Navigation (ACN). Each result is presented as an instance of the ACN framework, without trying to reformulate the original result, only using ACN as a point of view over it. The presentation of successive results through the same framework makes it easier to compare them, and therefore to highlight progress over time. As a unifying framework, ACN helps to understand the specificity of my research work, and what drove it from file systems to the Semantic Web. The generality of ACN also leaves room for imagination, and helps suggest and structure a number of perspectives. ACN is not a "hook in the sky", and builds on strong information access paradigms: query languages, navigation structures, and interactive views. In fact, ACN is a proposal for a synthesis of those three information access paradigms.

This habilitation thesis is made of a synthesis chapter, and four selected papers published in international journals as appendices. The selected papers represent important milestones in my research work, and are shown exactly as they were published.
The synthesis is centered around Abstract Conceptual Navigation (ACN), and is made of three parts: state-of-the-art, main contributions, and perspectives. The state-of-the-art is an overview of existing approaches to information access for structured data. It tries to be complete at a coarse level of detail, but it ignores many details and specificities of existing approaches. This is justified, I think, by the objective to introduce and motivate ACN, which is itself at a high level of abstraction. A more detailed account of related work can be found in published papers. The ACN framework is then introduced, and used as a basis for the presentation of the main contributions of my work. Secondary contributions are associated with their closest main contribution, and only briefly presented. Finally, the main contributions are summarized, and ACN is used again as a basis for discussing a number of perspectives. In the synthesis, self-references are distinguished by a boldface font. Each appendix paper has its own reference list.

Even though I use the first-person singular in this foreword because I here express my point of view, most of my work is our work. First of all, all this work would not even exist without the original ideas of Olivier Ridoux in the late nineties, and the opportunity he gave me to work on them. Then, I was lucky to work with many colleagues and students, who largely contributed to the "population growth" of my research tracks. This is why the use of the first-person plural in the synthesis is not only for scientific tradition, but also an acknowledgement of their contributions to many results. I will not try to list all of them here, but the reader will see them all along the synthesis, and as co-authors in self-references.

Contents

Foreword
Synthesis
  1 Introduction
  2 State of the Art
    2.1 Query Languages
    2.2 Navigation Structures
    2.3 Interactive Views
    2.4 Summary and Comparison
  3 Main Contributions
    3.1 Abstract Conceptual Navigation (ACN)
    3.2 Logical Concept Analysis (Camelis)
    3.3 Cubes of Concepts (Abilis)
    3.4 Query-based Faceted Search (Sewelis)
    3.5 Updating Through Interaction (Utilis)
    3.6 Possible World Exploration (PEW)
    3.7 An Expressive CNL for the Semantic Web (Squall)
  4 Conclusion and Perspectives
    4.1 Theoretical Perspectives
    4.2 Applicative Perspectives
References
Index
Acronyms
A Introduction to Logical Information Systems (2004)
B Camelis: a Logical Information System to Organize and Browse a Collection of Documents (2009)
C Reconciling Faceted Search and Query Languages for the Semantic Web (2012)
D SQUALL: a Controlled Natural Language as Expressive as SPARQL 1.1 (2014)

Synthesis

1 Introduction

In many domains where information access plays a central role, there is a gap between expert users who can ask complex questions through formal query languages, and lay users who either are dependent on expert users, or must restrict themselves to asking simpler questions. In everyday practice, tools like folder trees, search engines, and spreadsheets are far more often used than databases and semantic knowledge bases (formal information systems), even though the latter tools offer much more expressivity and precision in search. In fact, formal information systems are so tedious to use that even expert users do not generally use them for their everyday activities. When a database is used, only a limited number of pre-defined queries are generally made available to end users through interfaces. We think this is because there are two bottlenecks in the use of formal information systems.
The first bottleneck is the input of data, which requires the formal representation of data. The second bottleneck is the expression of formal queries (e.g., in SQL). The common point between those two bottlenecks is the use of formal languages: an update language for the former, a query language for the latter. Because of the formal nature of those languages, there seems to be an inescapable trade-off between expressivity and usability in information systems. The objective of this synthesis is to present a number of results and perspectives that show that the expressivity of formal languages can be reconciled with the usability of widespread information systems. The final aim of this work is to empower people with the capability to produce, explore, and analyze their data in a formal way.

We focus on formal languages and systems because they are key to high expressivity and precision. Therefore we restrict the scope of our work to structured and semantic information, i.e., information representations that are amenable to (logical) reasoning. This includes relational databases (RDB) [Codd, 1970], spreadsheets, classification hierarchies, and the Semantic Web (SW) [Berners-Lee et al., 2001, Hitzler et al., 2009]. This a priori excludes unstructured texts, images, and other multimedia objects, as those would be seen as mere strings and bitmaps in our approach. However, a large number of works aim at automatically extracting structured and semantic information from such contents (e.g., natural language processing, clustering/classification, segmentation). A notable example is DBpedia [Lehmann et al., 2013], a Semantic Web version of Wikipedia that extracts and formally represents about 2 billion facts. More generally, the extension of the Web of documents with a Web of data, which includes the Semantic Web, means that more and more information is available in structured formats, ranging from simple CSV files to Linked Open Data (LOD).
Because of its structured and semantic nature, its openness, and its support by the W3C, the Semantic Web is the playground of choice for our work, even though similar results could have been obtained on the more established relational databases.