Digital Libraries © Springer-Verlag 1998

Int J Digit Libr (1998) 2: 20–37

Creating ontological metadata for digital library content and services

Peter C. Weinstein, William P. Birmingham

University of Michigan Digital Library, Artificial Intelligence Laboratory, University of Michigan, 1101 Beal Ave., Ann Arbor, MI 48109-2110, USA; E-mail: fpeterw, [email protected]

Abstract. We use formal ontologies to represent knowledge about digital library content and services. Formal ontologies define concepts with logic in a frame-inheritance structure. The expressiveness and precision of these structures support computational reasoning that can be used in important ways. This paper focuses on the creation of ontological metadata.

We create ontological content metadata by generating it from MARC (MAchine Readable Cataloging) data. MARC contains much information that is hard to exploit computationally. In particular, relationships between works are implicit in shared values and natural language notes. The conversion process involves specifying an ontological model, mapping MARC to the ontology, and reasoning about the data to create explicit links between works.

Service metadata will be supplied by providers who wish to participate fully in a digital library that is implemented as a decentralized multi-agent system. Agents advertise by describing their services in terms of ontologically defined concepts. We reason about these descriptions to organize them into subsumption taxonomies. Agents can then find the best available services to meet their needs by describing their needs, without requiring a priori knowledge of other agents. This infrastructure has demonstrated its usefulness in a multi-agent system organized as a computational economy.

Key words: Metadata – Ontology – Catalog structure – Automatic classification – Multi-agent systems

1 Introduction

A fundamental issue in building libraries is how to organize large amounts of information so that users can find what they need, when they need it. To this end, librarians have developed sophisticated classification schemes and cataloging rules for creating metadata. Metadata describes works contained in libraries. Metadata enhances works’ usefulness by providing a basis for search, and more generally, by identifying their intellectual and historical contexts.

Digital libraries will make greater demands on metadata than do traditional libraries. The quantity of information will be even vaster, and access will be provided with a large variety of services that help users define and satisfy their needs. Most of these services must be routinely provided without human assistance. Thus, it is essential that metadata used in digital libraries be amenable to computation. Digital libraries should be capable of reasoning about their contents to reformulate queries, customize services to the task and user, deduce new relations between works, and so forth. In short, the metadata must support inference.

We use metadata based on formal ontologies to support sound computational reasoning. Formal ontologies use logic to define concepts in relation to other concepts. We describe digital library works with instances of ontology concepts. We can reason about relations among attributes defined in the ontology, and about relations among the works themselves. We define services as ontology concepts, and can reason about these definitions. For example, we identify when one service is a specialization of another even when this relationship is not asserted explicitly in the definition. This paper focuses on the creation of ontological metadata for digital library content and services.

Formal ontology is an appropriate technique for modeling complex domains. Concept definitions can form webs of relations, rather than being limited to trees. Concepts can have many descriptive dimensions (attributes), may be partially described at any level of granularity (with any combination of dimensions), and may be viewed from many perspectives (accessed by different sequences of attribute values). For example, we can represent a song as being simultaneously music, linguistic expression, and possibly fiction, without needing to prioritize these characteristics with respect to each other. A user might search for an audio tool to hear some group of songs, or for songs that can be played with a certain audio tool. In an ontology, retrieval supports access from either perspective and at any level of granularity. In comparison, declarative formalisms with less expressiveness than ontologies, such as relational databases, force commitments to particular combinations and orderings of dimensions.

Our ontological model for content is centered around a hierarchy of five concepts that loosely describes the creation of work, and thus, how works are derived from other works. A CONCEPTION is an abstract work, an EXPRESSION adds description of the content, a MANIFESTATION adds publishing format, a MATERIALIZATION adds production format, and an INSTANCE has an address for a particular copy. In many respects this model articulates librarians’ traditional world-view: it borrows most heavily from the proposal by the International Federation of Library Associations (IFLA 1996).

To make the creation of ontological content metadata practical, we generate it from MARC (Network Development and MARC Standards Office 1994), thus leveraging the tremendous investment in that format. When two MARC records share data – for example, they might have the same uniform title and publisher – we may infer either that one work is derived from the other, or that both are derived from a common ancestor. MARC also includes many natural language notes with information about derivation. Computers can make inferences based on shared data and process natural language to interpret notes, but it is preferable not to do this while users are waiting. Generating ontological metadata from MARC is thus a form of preprocessing, in which relationships implicit in MARC are converted into explicit, labeled relations amenable to manipulation by computers.

We are not currently trying to develop specific ontological models of digital-library services; we believe it is premature to do so. Rather, we provide strong incentives for providers to describe their services ontologically as they become available. Our system, the University of Michigan Digital Library (UMDL) (Birmingham et al. 1994), has a decentralized, agent-based architecture (Durfee et al. 1998). A defining characteristic of this architecture is that agents form teams with other agents to solve problems. Agents choose to team with other agents – at least the first time they work together – on the basis of other agents’ ontological definitions of their services.

Figure 1 provides a high-level overview of agents cooperating to answer a user’s content query. The user agent asks a mediator agent to recommend one or more collection agents appropriate to the query. The mediator agent communicates with other mediator agents to help it make its recommendation. Each agent provides a specific service. For example, one agent might provide a thesaurus, another might know about topic hierarchies, and another might keep track of individual agent locations. The user agent then directly contacts the recommended collection agents to execute the query.

Fig. 1. Forming an agent team to satisfy a query (diagram of a user agent, mediator agents, and collection agents; steps labeled find collection, form team, execute query)

The difference between service metadata for users and agents is mostly in the degree of detail. In Fig. 1, the user is typically not aware of the need for a thesaurus. On another occasion, the same user might want to use a thesaurus directly. (Agent services, however, may be hidden from users.)

If we compare our approaches to the creation of content metadata and service metadata, we might ask whether we could create content metadata in a manner analogous to our approach for service metadata. Why not build infrastructure that encourages information providers to supply their own metadata? Potentially, other kinds of users – researchers, students, and so on – could also contribute to metadata knowledge bases as they use them. Computational reasoning would then be used to facilitate, edit, filter, manage, and apply user contributions. The role of the professional cataloging community would change correspondingly, to become increasingly focused on quality control and less on creating metadata. Indeed, we suspect that the ever-increasing flood of new information will eventually force changes in this direction. There are, however, institutional as well as technical obstacles to creating metadata in this way. For now, we consider this kind of knowledge sharing “future research”.

The remainder of this paper is structured as follows. Section 2 defines “formal ontology”, and describes its representation with description logic and the kinds of reasoning that are then available. Section 3 reports on our creation of ontological metadata for content. Section 3.1 presents the UMDL ontology of digital library content. Section 3.2 describes the generation of a knowledge base of metadata from a sample of MARC records. Section 4 describes the infrastructure that incorporates the creation of ontological service metadata into the growth of the system. Section 4.1 explains
