CIDOC CRM In

A first-order logic expression of the CIDOC Conceptual Reference Model Carlo Meghini Istituto di Scienza e Tecnologie dell'Informazione, Consiglio Nazionale delle Ricerche, Pisa, Italy E-mail: [email protected] Martin Doerr Institute of Computer Science Foundation for Research and Technology - Hellas Heraklion, Crete, Greece E-mail: [email protected] Abstract: The CIDOC Conceptual Reference Model is a well-known conceptual modelling language for documenting cultural heritage artifacts, with a special attention to museum objects. CRM is an ISO standard since 2006 (ISO21127:2006) and renewed 2014 (ISO21127:2014). The CRM is specified in a semantic data modelling style and relies on consolidated notions for the representation of knowledge such as classes, properties, IsA hierachies, domain and range constraints and cardinality restrictions. However, the CRM still lacks a formal specification of its semantical and inferential apparatus. This lack makes it difficult to clearly define fundamental operations on a CRM knowledge base, such as querying or consistency checking, while preventing any investigation on the computational properties of the language. This paper provides such an apparatus by expressing the CRM as a first-order theory. It then provides a reduction of the theory to a datalog program, and shows how the program can be used to effectively query a knowledge base taking into account the logical consequences of the represented knowledge. Keywords: Ontology; Metadata; Logic Reference to this paper should be made as follows: Meghini, C. and Doerr, M. (xxxx) À first-order logic expression of the CIDOC Conceptual Reference Model', International Journal of Metadata, Semantics and Ontologies, Vol. x, No. x, pp.xxx{xxx. Biographical notes: Carlo Meghini is prime researcher at CNR-ISTI and the head of the Digital Libraries group in the NeMIS Lab of ISTI. His area of research includes Digital Libraries, Digital Narratives and the Semantic Web. He has been leading or participating in a series of international projects on digital libraries, digital preservation and research infrastructures for the humanities. He has been involved in the making of the European digital library Europeana, taking care of the scientific aspects of the project. He is member of the editorial board of the ACM Journal on Computers and Cultural Heritage. He has published more than 50 scientific refereed papers in international journals, books and conferences. Martin Doerr is Research Director at FORTH-ICS and since 2004 he is leading the Centre for Cultural Informatics, an activity of the Information Systems Lab. He has been leading or participating in a series of national and international projects for knowledge management, terminology management, cultural information systems, information integration systems and research infrastructures. His research concerns information integration by formal ontologies, scientific argumentation and e-science methods. He is leading the working group of ICOM/CIDOC (International Committee for Documentation of the International Council of Museums) which has developed ISO21127:2006, together with the respective ISO committees, a standard core ontology for the semantic interoperability of cultural heritage information. He is member of the editorial board of the journal Applied Ontology and the ACM Journal on Computers and Cultural Heritage. He has published more than 30 refereed papers and 6 book chapters. Copyright c 201X Inderscience Enterprises Ltd. Int. J. Signal and Imaging Systems Engineering, Vol. x, No. x, 201X 2 1 Introduction For these reasons, we have chosen to express CRM as an instance of the first-order logic L; defined in [5]. The CIDOC Conceptual Reference Model1 (hereafter This logic extends the classical predicate calculus with CRM) is a well-known conceptual modelling language a referential apparatus based on standard names that for documenting cultural heritage artifacts, with a adequately captures the treatment of individuals of the special attention to museum objects. CRM is an ISO CRM. Happily, this choice does not prevent to achieve standard since 2006 (ISO21127:2006) and renewed 2014 computational tractability. (ISO21127:2014). The logical formalization is expected to bring several advantages to the CRM, from a scientific point of view. The CRM is specified in a semantic data modelling In particular, the first-order logic expression of the model style and relies on consolidated notions for the may serve as a better communication medium with other representation of knowledge such as classes, properties, researchers, allowing for a better understanding of the IsA hierachies, domain and range constraints and CRM that can serve several purposes: cardinality restrictions. These notions are considered sufficiently clear and unambiguous both in the • appreciating the CRM's underlying ontological documentation and in the standardization communities. decisions; The CRM specification has been used several times for implementing the model, and some of these • enriching the CRM with knowledge representation implementations support large knowledge bases, such as mechanisms that are well understood and as such the one at the British Museum. The status of the CRM transferrable with no special effort, such as n-ary as a data definition language is therefore quite solid, from relations, tractable forms of negation, or simple a practical point of view. role constructors, just to mention a few; However, the CRM still lacks a formal specification of • extending the CRM to new domains or to aspects its semantical and inferential apparatus. This lack makes of knowledge representation that are not dealt with it difficult to clearly define fundamental operations on a by the current version. CRM knowledge base, such as querying or consistency checking, while preventing any investigation on the Finally, logic provides a language for expressing computational properties of the language. constraints that go beyond the representation machinery The lack of formalization of semantic data models of object-oriented modelling, and which may turn out to has been one of the two main motivations for be useful in more refined definitions of the CRM. Logic the development of Description Logics [1], the also provides two useful notions, such as consistency and other motivation being computational amenability. implication; consistency is important for the integrity of Description Logics are the theoretical counterpart of a CRM knowledge base, while implication is important the Ontology Web Language (OWL) [2], a distinguished for extracting knowledge from a CRM knowledge base. member of the Semantic web suite. This suite includes From a knowledge representation point of view, the also the Resource Description Framework (RDF), a present study can be viewed as an attempt at defining rather basic knowledge representation language endowed the knowledge level of a CRM knowledge base [6]. with a model-theoretic semantics [3]. RDF offers a The paper is structured as follows: we will start vocabulary for semantic modelling known as RDF by examining the expressive requirements of the CRM Schema [4]. It would seem natural, then, to resort to one (Section 2). Based on the results of this examination, of these languages to express the inferential apparatus we will define the first-order language LC for the logical of the CRM. There are several reasons why we will not expression of the CRM (Section 3). Next, we will follow this route. introduce the axioms that capture the CRM ontology First, both RDF and OWL have a syntax that (Section 4). We will then define the notion of knowledge is closer to that of implementation language rather base and query, thus concluding the logical formulation than to a specification language; for instance, they of the CRM (Section 5). The remaining part of the use International Resource Identifiers as non-logical paper (Section 6) is devoted to design a datalog- symbols, which are hard to read and unnecessary for our based implementation of a knowledge base. Section 7 purposes. Second, and more important, our study will concludes. show that none of these languages has the expressive power required by CRM: RDF Schema does not allow to capture property quantification; strong shortcuts are 2 Expressive Requirements of the CRM not expressible in OWL 2 DL, while weak shortcuts can be expressed in OWL 2 DL but with several In this Section, we examine the basic principles of limitations, as explained in Section 4.2.2. Third, it turns the CRM in order to obtain a clear understanding of out that the treatment of individuals at the basis of its expressive requirements. We start by outlining a the CRM cannot be captured by standard first-order general correspondence between semantic data models logic, and consequently by Description Logics, which are and logical languages. Subsequently, we will consider the contractions of standard first-order logic. specific features of the CRM. As a notational convention, Copyright c 201X Inderscience Enterprises Ltd. A first-order logic expression of the CIDOC CRM 3 CRM terms are written in Sans Serif, e.g. E1 CRM Entity, This language can be directly put in correspondence with while first-order symbols are written in italics, e.g. E1 the elements of semantic data models as follows: CRM Entity. • objects are named by the constant symbols of the 2.1 From the CRM to logic: first principles language, such as pisa; The correspondence between semantic data modeling • classes are named by unary predicate symbols, such and first-order

Load more