Knowledge Representation Meets Digital Libraries

Knowledge Representation meets Digital Libraries Enrico Franconi Department of Computer Science University of Manchester, UK [email protected] http://www.cs.man.ac.uk/franconi/ Abstract In this short paper, the basic ideas behind a project on the application of Knowledge Representation formalisms and technologies for Conceptual Modelling and Query Management are presented. It is argued that good Conceptual Modelling is required to support powerful information access through Intelligent Navigation,a fundamental task in Digital Libraries. 1 Introduction In recent years, data and knowledge base applications have progressively converged towards integrated technologies that try to overcome the limits of each single discipline. Research in Knowledge Representation (KR) orig- inally concentrated around formalisms that are typically tuned to deal with relatively small knowledge bases, but provide powerful deduction services, and the language to structure information is highly expressive. In contrast, Information Systems and Database research mainly dealt with efficient storage and retrieval with powerful query languages, and with sharing and displaying large amounts of (multimedia) documents. However, document representations were relatively simple and flat, and reasoning played only a minor role. This distinction between the requirements in Knowledge Representation and Databases is vanishing rapidly. On the one hand, to be useful in realistic applications, a modern Knowledge Representation system must be able to handle large data sets, and to provide expressive query languages. This suggests that techniques developed in the Database area could be useful for knowledge bases. On the other hand, the information stored in digital libraries is now very complex and with deep semantic structures, thus requiring more intelligent modelling languages and methodologies, and reasoning services on those complex representations to support design, management, and retrieval. Therefore, a great call for an integrated view of Knowledge Representation and Information Systems is emerging. Description Logics (DL) are a very promising research area in KR with applications in Databases. The main effort of the research in DL is in providing both theories and systems for expressing structured knowledge and for accessing and reasoning with it in a principled way. Recently, basic progress has been made by establish- ing the theoretical foundations for the effective use of DL in Information Systems [Calvanese et al., 1998c; Borgida, 1995]. This turns out to be of relevant importance also for the digital libraries field, since DL of- fer promising formalisms for solving several problems concerning Conceptual Data Modelling [Calvanese et al., 1998c], Intelligent Information Access [Stock et al., 1993; Bresciani and Franconi, 1996; Bresciani et al., 2000 ], and Information Integration [Calvanese et al., 1998a; Jarke et al., 2000]. Two major problems that arise while designing and managing an information system are conceptual modelling and information access. In the digital libraries context, these problems become more complex and challenging, and require new ideas, technologies, and tools to be solved. In this short paper, the basic ideas behind an EPSRC funded (the British Research Council) project on the application of Knowledge Representation technologies for Conceptual Modelling and Query Management are presented. The aim of the project is to devise and evaluate algorithms, methodologies, techniques, and interaction paradigms to build a tool for conceptual modelling and query management of complex data repositories, based on a framework with its formal foundations on DL. It is argued that good Conceptual Modelling is required to support powerful information access through Intelligent Navigation, a fundamental task in Digital Libraries. A tool supporting the design, the management, and the storage of conceptual schemas has been already de- livered. The toool is connected in the background with a Description Logic inference server, and it is based on a formal framework and a methodology for conceptual modelling borrowed from the most recent ideas in the De- scription Logic research. An extension of the tool for query management and intelligent navigation is on the way, this time connected in the background with a Description Logic inference server augmented with query reasoning capabilities. The query manager is based on the recent research result on query management paradigms which exploit the knowledge in the conceptual schema. This presentation has been divided into two Sections. In the first Section, the data model and the methodology for conceptual design will be introduced. In the second Section, the query management problem in the presence of the previously devised conceptual model will be considered: a global framework will be introduced, together with various basic tasks involved in intelligent information access and navigation. 2 Conceptual Modelling Good conceptual data models put their emphasis on the correct and semantically rich representation of complex properties and relations that may exist between documents. They should allow for an abstract representation of documents which resembles the way they are actually perceived and used in the real world, thus shortening (with respect to the more traditional data models) the semantic gap between the domain and its representation. Conceptual modelling deals with the question on how to describe in a declarative and reusable way the domain information of an application, its relevant vocabulary, and how to constrain the use the data, by understanding what can be drawn from it. Recently, a number of conceptual modelling languages has emerged as de-facto standard, in particular we mention Entity/Relationship (ER) for the relational data model, UML and ODMG for the object oriented data model, and XML, RDF an OIL for the semi-structured data model. Still, many such languages do not have a formal semantics based on logic, or reasoners built upon them to support the designer. Not surprisingly, conceptual modelling tasks have always been in the mainstream of KR research – see for example the research on Ontology representation and design – and can be considered now one of the main applications of KR languages and reasoning techniques. DL can be considered as an unifying formalism, since they allow the logical reconstruction and the extension of representational tools such as object-oriented data models (e.g., UML and ODMG), semantic data models (e.g., Entity/Relationship and ORM), frame-based ontology languages (e.g., OKBC, OIL and XOL) [Calvanese et al., 1998c; 1999; Fensel et al., 2000]. In addition, given the high complexity of the modelling task when complex documents are involved, there is the demand of more sophisticated and expressive languages than for normal databases. Again, DL research is very active in providing expressive languages for conceptual modelling (see, e.g., [Artale and Franconi, 1999; ; Franconi et al., 1999; 2000; Franconi and Sattler, 1999 ]). As to the basic conceptual modelling language, we have decided to adopt an extended ER (EER) conceptual data model. Basic elements of ER schemas are entities, denoting a set of objects called instances, and relationships, denoting a set of tuples made by the instances of the different entities involved in the relationship. Since the same entity can be involved in the same relationship more than once, participation of entities in relationships is repre- sented by means of ER-roles, to which a unique name is assigned. ER-roles can have cardinality constraints to limit the number instances of an entity involved in the relationship. Both entities and relationships can have attributes, i.e., properties whose value belong to some predefined domain – e.g., Integer, String. Additionally, the EER model includes taxonomic relations to state inclusion assertions between entities and between relationships, with the possibility to specify optional covering and disjointness constraints. The most interesting feature of the extended modelling language is the ability to completely define entities and relationships as views over other entities and relationships of the conceptual schema [Calvanese et al., 1998c]. The adopted view language is DÄÊ [Calvanese et al., 1998a], a Description Logic over unary and n-ary relationships. DÄÊ is an interesting decidable fragment of first order logic: among others, inclusion dependencies with DÄÊ views can express (a) unary inclusion dependencies, (b) typed inclusion dependencies without projec- tion, (c) existence dependencies, (d) exclusion dependencies, and (e) full key dependencies. An implementation of of the DÄÊ Description Logic already exist, and it has been developed within our group [Horrocks, 1999; Horrocks et al., 1999b]. The conceptual data model includes the modelling of multidimensional aggregations [Franconi and Sattler, 1999; Franconi et al., 1999]. That is, the conceptual data model is able to represent the structure of aggregated entities and of multiply hierarchically organised dimensions. In a much more radical way than in UML, aggregations become first class citizens of the representation language: it is possible to describe the components of aggregations, and the relationships that the properties of the components may have with the properties of the aggregation itself; it is possible to build aggregations out of other aggregations, i.e., it is possible for an aggregation to be explicitly built

Load more