Where Are We Headed? RDA, BIBFRAME, and the FRBR Reference Model

The technical services world is in a state of chaotic transformation at this moment, and undoubtedly librarians are feeling the growing pains. Three major initiatives in the cataloging field are driving the revolution, and their adoption will mean big changes in the way that catalogers and metadata specialists approach their work. Given the many projects, models, and papers being disseminated and discussed within the library community, the question becomes, what will library cataloging and metadata creation look like in the next decade? Where are we headed? And what knowledge and skills will we need to function in this increasingly digital and mechanized world?

The metadata and data management initiatives currently being developed, both in and outside the library realm, are numerous, but three large projects now underway have the potential to be the most transformative for those providing metadata and cataloging services in . The first is the International Federation of Library Associations and Institutions (IFLA)’s Functional Requirements for Bibliographic Records (FRBR) Library Reference Model (FRBR-LRM) (Riva, Le Bœuf and Žumer, 2016), which seeks to harmonize the various FRBR models into one. The second is the ongoing development and reworking of Resource Description and Access (RDA), the set of rules used to standardize bibliographic description for worldwide access and sharing (RDA, 2010). The third is BIBFRAME, an encoding system that will probably, eventually, replace the current MAchine-Readable Cataloging (MARC) record as the fundamental repository for bibliographic data (Library of Congress Bibliographic Framework Transition Initiative, 2012a). These three initiatives are closely entwined and depend heavily on each other for implementation. The opportunities these new tools present are truly paradigm-shattering, while at the same time they are creating no small sense of unease and discomfort in the library world.

RDA and FRBR

In 1997, IFLA released a study, Functional Requirements for Bibliographic Records (FRBR), which identified four major user tasks (Find, Identify, Select, and Obtain) and developed an entity-relationship model to describe the bibliographic data users need to carry out those tasks (International Federation of Library Associations and Institutions, 1997). In an entity-relationship model, things (entities) are linked together and described by the relationships between them (Figure 1).

The model was deliberately based on techniques used to describe relational databases (IFLA 1997, p. 9) because of their structured approach to data, but it is also quite useful in designing database systems that can take advantage of Semantic Web applications and machine-readable data. The FRBR model for bibliographic data was completed in 1997 and amended through 2009. Because FRBR dealt mainly with bibliographic data and entities, two accompanying models were developed to describe creators and subjects, Functional Requirements for Authority Data, or FRAD (IFLA, 2008), and Functional Requirements for Subject Authority Data, or FRSAD (IFLA, 2011), each defining its own entities and the relationships between them.

In the FRBR model, the highest-level entity is the work. The work represents the intellectual product of a creator or creators, such as Antoine de Saint-Exupéry’s Le Petit Prince or Antonín Dvořák’s Cello Concerto in B Minor. Each of these works is expressed in different ways: Saint-Exupéry’s book can be expressed in French or in English, for example, or in versions by different translators. Dvořák’s concerto can be performed by Yo-Yo Ma or by Mstislav Rostropovich, or can exist in score form. These are different “expressions” of the work. Both work and expression are abstractions and do not exist as physical things in the real world. When the score is published or the piece is recorded, the work is manifested. This manifestation is what libraries catalog: a surrogate record for the manifestation of a work stands in for the work itself.

An item is a single exemplar of a manifestation: the copy held in a particular library, archive, museum, or repository. Each work also has a creator or creators related to it, as well as subjects. Figure 2 illustrates the WEMI (work, expression, manifestation, item) entities and the relationships between them. A work may have many expressions, but each expression realizes only one work. A manifestation may embody more than one expression (consider a bilingual edition of a novel, for example), and an expression may have many manifestations. A manifestation may be exemplified by many items, but an item exemplifies only one manifestation (IFLA, 1997, pp. 13-14).
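The cardinalities just listed can be expressed as a small data structure. The sketch below is a minimal Python illustration, not something taken from the FRBR report: the class names, attribute names, and the three helper functions (`realize`, `manifest`, `exemplify`) are hypothetical, and the bilingual edition, publisher, and library are invented for the example.

```python
from dataclasses import dataclass, field

# Minimal sketch of the FRBR WEMI entities and their cardinalities.
# Class and attribute names are illustrative assumptions, not IFLA terms.
# eq=False keeps identity comparison, so the back-references between
# entities never trigger recursive field-by-field comparisons.

@dataclass(eq=False)
class Work:
    title: str
    expressions: list = field(default_factory=list)  # one work, many expressions

@dataclass(eq=False)
class Expression:
    work: Work                 # each expression realizes exactly one work
    language: str
    manifestations: list = field(default_factory=list)

@dataclass(eq=False)
class Manifestation:
    expressions: list          # may embody one or more expressions
    publisher: str
    items: list = field(default_factory=list)

@dataclass(eq=False)
class Item:
    manifestation: Manifestation   # each item exemplifies exactly one manifestation
    holding_library: str


def realize(work: Work, language: str) -> Expression:
    """Create an expression of a work, linking it in both directions."""
    expr = Expression(work=work, language=language)
    work.expressions.append(expr)
    return expr


def manifest(expressions: list, publisher: str) -> Manifestation:
    """Create a manifestation that embodies one or more expressions."""
    m = Manifestation(expressions=list(expressions), publisher=publisher)
    for expr in expressions:
        expr.manifestations.append(m)
    return m


def exemplify(manifestation: Manifestation, library: str) -> Item:
    """Create a single copy (item) of a manifestation held by a library."""
    item = Item(manifestation=manifestation, holding_library=library)
    manifestation.items.append(item)
    return item


# Usage: a hypothetical bilingual edition of Le Petit Prince.
petit_prince = Work("Le Petit Prince")
french = realize(petit_prince, "French")
english = realize(petit_prince, "English")
bilingual = manifest([french, english], "Example Press")
library_copy = exemplify(bilingual, "Example Library")
```

Each helper records the link in both directions, so a program can move from a work down to a copy on the shelf, or from a copy back up to the work, following the relationships the model defines.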

In the early 2000s, the Joint Steering Committee, an international body of representatives from national libraries, recognized the need for cataloging standards that were more robust and responsive to the digital environment and the new digital materials being produced. The committee intended to update the then-current Anglo-American Cataloguing Rules, Second Edition (AACR2), which were seen as outdated and unresponsive to the new online environments in which librarians and users were operating. AACR2 had been developed from still earlier guidelines based on traditional cataloging suited to card catalogs and analog environments, and bibliographic records reflected these practices. Although cataloging instructions before RDA had many theoretical underpinnings, such as those laid out by Charles Cutter (1904), S.R. Ranganathan (1931), and Seymour Lubetzky (1953, 1960, 1969), there was no logical model underlying cataloging practice. AACR2 and other cataloging codes provided the standardization needed for sharing and disseminating bibliographic data, but they were based mostly on traditional practice rather than on an examination of which elements a bibliographic record actually required for users to find, identify, select, and obtain the material they needed. In 2005, after a comment period on an early draft of what was then called AACR3, it became clear to the Joint Steering Committee that a complete overhaul of the standards, rather than a simple reworking, was needed (Joint Steering Committee for the Development of RDA, 2005). AACR2 treated different types of bibliographic resources separately, requiring the cataloger to identify the kind of resource to be cataloged (volume, sound recording, electronic resource, etc.) and then to search separate chapters for the rules to describe it.

When RDA was released in 2010, it presented cataloging rules in a way that was entirely new to catalogers and librarians accustomed to AACR2, much to the consternation of some in the field. RDA was fully compatible with the FRBR model and with those parts of FRAD and FRSAD available at the time; its organization, however, was revolutionary. RDA uses the FRBR model to break bibliographic description down into the fundamental entities of work, expression, manifestation, and item. The type of resource being cataloged is no longer of primary interest, as RDA recognizes that there is no fundamental difference in the information needed to describe something, regardless of its form. All resources have a title, or can be given one. All resources have a date of production, publication, manufacture, or copyright, or can be described by the absence of one. All resources have some kind of extent or size, be it dimensions in centimeters, a number of online files, or minutes of film. By recognizing these “functional requirements” for description, RDA, guided by the FRBR model, condenses the extensive sets of individual rules that previously had to be given for each form of resource into a much more generalized set of instructions.
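To illustrate the idea that description no longer depends on resource type, here is a minimal Python sketch in which one set of elements (title, date, extent) describes a book, an online resource, and a film alike. The element names are simplified assumptions loosely modeled on RDA, not actual RDA element identifiers, and every title except The Grapes of Wrath is hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

# Sketch: one set of core elements describes any resource, whatever its
# format. Element names are simplified assumptions loosely modeled on RDA.

@dataclass(frozen=True)
class Description:
    title: str               # every resource has, or can be given, a title
    date: Optional[str]      # None when no date of any kind can be found
    extent: str              # pages, files, running time, dimensions, ...

    def summary(self) -> str:
        """One formatting rule for every kind of resource."""
        date = self.date if self.date is not None else "date unknown"
        return f"{self.title} ({date}); {self.extent}"


# The same class and the same summary logic serve a book, an online
# resource, and a film; the last two titles are hypothetical.
resources = [
    Description("The Grapes of Wrath", "1939", "1 volume"),
    Description("Example survey data", "2015", "3 online files"),
    Description("Example home movie", None, "12 minutes"),
]

for r in resources:
    print(r.summary())
```

Adding a new kind of resource means supplying values for the same elements, not writing a new chapter of rules.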

RDA’s FRBR-inspired organization was a fundamental shift away from the traditional methods of cataloging that had been in use for at least a century. Although RDA is fully compatible with International Standard Bibliographic Description (ISBD) punctuation and description (for example, the space-slash-space placed after a title and before the statement of responsibility, as in The Grapes of Wrath / John Steinbeck), it no longer requires them. RDA is therefore potentially useful to other cultural heritage institutions for their own resource description, since it does not require a particular punctuation scheme or method for presenting resource data. RDA is also, intentionally, not tied to MARC or any particular encoding format, allowing for the development of new systems for encoding bibliographic data that are both more user-friendly and able to leverage new developments in web applications, machine-actionable data, and the Semantic Web. The Library of Congress fully implemented RDA in March 2013, with Library and Archives Canada, the British Library, and the National Library of Australia adopting it at about the same time and the Deutsche Nationalbibliothek following later. Testing of RDA prior to implementation, however, demonstrated that much of its benefit would go unrealized as long as MARC remained the encoding format for bibliographic data.
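Because RDA separates what is recorded from how it is displayed, punctuation such as ISBD’s space-slash-space can be treated as a presentation choice. The sketch below assumes a record stored as plain data elements; the element names and both display functions are hypothetical, and it renders the same record two ways.

```python
# Sketch: data elements are stored without punctuation, and ISBD
# punctuation is applied only when a display is generated. Element names
# and both display functions are illustrative assumptions.

record = {
    "title": "The Grapes of Wrath",
    "statement_of_responsibility": "John Steinbeck",
}

def isbd_display(rec: dict) -> str:
    # ISBD: space-slash-space between title and statement of responsibility
    return f"{rec['title']} / {rec['statement_of_responsibility']}"

def labelled_display(rec: dict) -> str:
    # A punctuation-free display another institution might prefer
    return "\n".join([
        f"Title: {rec['title']}",
        f"Responsibility: {rec['statement_of_responsibility']}",
    ])

print(isbd_display(record))
print(labelled_display(record))
```

Neither display changes the stored data, so the choice of punctuation scheme can differ from one institution or interface to the next.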

The MARC Encoding Format

MARC, the data format used to record bibliographic information in machine-readable form, was developed to transfer card catalog information into a digital format. In fact, MARC was originally developed to convert data on Library of Congress cards into machine-readable form for printing and distribution to libraries that subscribed to the Library of Congress card service (Avram, 1975). Many fields in MARC reflect ISBD and are presented in a way that mirrors how data was recorded on cards in card catalogs. As technology improved and the internet became ubiquitous in the 21st century, the billions of MARC records representing the collective holdings of tens of thousands of libraries became “siloed” in the invisible web, the part of the internet that web crawlers and search engines cannot reach, because the MARC format is not compatible with internet protocols.

MARC encoding relies heavily on matching strings of text. Mirroring the original format of card catalogs, headings are filed in alphabetical order, and entries are determined by cataloging rules that may be unfamiliar to users, who must know to search for Carroll, Lewis rather than Lewis Carroll (Coyle, 2016). While front-end webpacs have been developed to handle such queries through keyword searching, library cataloging has continued to produce bibliographic records that depend on rules-based entries and alphabetical ordering, as in card catalogs, in no small part because of the persistence of the MARC record. And although MARC stands for “machine-readable” cataloging, the bibliographic data stored in MARC records relies heavily on human interpretation (Thomale, 2010). In this respect MARC resembles a textual markup language, such as XML, more than structured data. For example, in current MARC records a date can be recorded in a field as 2006, c2006, ©2006, [2006], and so on, depending on the cataloging rules being used. A person reading that piece of data understands that the manifestation was produced in 2006. A computer cannot understand the semantic rules underlying free-text MARC dates and can only parse the explicit string of characters given in the field.
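The date problem can be shown in a few lines of Python. The sketch assumes the four date strings above arrive as plain text; the `guess_year` heuristic is a hypothetical illustration of the guesswork a program is forced into, not a recommended parsing rule.

```python
import re

# The date forms mentioned above, as free-text strings in a MARC field.
variants = ["2006", "c2006", "©2006", "[2006]"]

# Exact string matching sees four different values, not one year.
distinct_strings = len(set(variants))

# A heuristic: take the first run of four digits. It recovers 2006 here,
# but it is guesswork and silently drops ranges and uncertainty.
FOUR_DIGITS = re.compile(r"\d{4}")

def guess_year(field_text: str):
    """Return the first four-digit number in the text, or None."""
    match = FOUR_DIGITS.search(field_text)
    return int(match.group()) if match else None

years = {guess_year(v) for v in variants}            # collapses to one year
ambiguous = guess_year("[between 2005 and 2007]")    # the range is lost
undated = guess_year("[date of publication not identified]")  # no year at all
```

Every variant needs its own rule, and a rule that works for one cataloging practice can silently misread another.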

Since the release of RDA, MARC has been modified to make its data more machine-friendly, through additional fields and indicators and through the use of machine-understandable or machine-actionable data. For example, dates may be recorded according to ISO 8601 as YYYY-MM-DD.
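By contrast, a date recorded in ISO 8601 form can be processed with standard tools and no knowledge of cataloging rules. This minimal sketch uses Python’s `datetime.date.fromisoformat`; the sample dates are hypothetical.

```python
from datetime import date

# Hypothetical dates recorded in ISO 8601 form (YYYY-MM-DD).
recorded = ["2006-03-15", "1998-11-02", "2006-01-09"]

# Each value parses unambiguously into a date object.
parsed = [date.fromisoformat(s) for s in recorded]

earliest = min(parsed)
from_2006 = [d for d in parsed if d.year == 2006]

# The format is fixed-width with the largest unit first, so sorting the
# raw strings gives the same order as sorting the parsed dates.
string_order = sorted(recorded)
date_order = [d.isoformat() for d in sorted(parsed)]
```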

However, the major problem remains. MARC records are records, not data, and rely on syntax and context to make sense to the user.

BIBFRAME

As early as 2002, librarians were recognizing the limitations of MARC and the need for something to replace it. That year Roy Tennant published a provocative article entitled “MARC must die!” (Tennant, 2002), which enumerated some of the inherent problems with the MARC format and suggested an extensible, XML-based system that would allow for flexibility, granularity of data, and the recognition of hierarchical data. In 2008, the Working Group on the Future of Bibliographic Control at the Library of Congress released its final report, which called for developing a more flexible, extensible metadata format, integrating library standards into the web environment, and generating web-based identifiers for the data elements and vocabularies the Library of Congress uses in bibliographic records (Library of Congress Working Group on the Future of Bibliographic Control, 2008). In May 2011, the Library of Congress announced the outlines of a bibliographic framework transition initiative (later BIBFRAME) to replace MARC, and laid out seven issues it would address, including experimenting with Semantic Web and linked data technologies, enabling users to navigate relationships between entities for more precise search and retrieval, and bringing existing metadata into the new system (LoC BIBFRAME, 2011). By May 2012 the Library of Congress had hired Zepheira to provide a model for a bibliographic framework that could translate existing MARC 21 data into a linked data infrastructure (LoC BIBFRAME, 2012b).

Linked data forms the backbone of the Semantic Web, a concept described by Tim Berners-Lee in which data is linked together to form a web of information: if you find something, you should be able to find other, related data. One way this can be done is with RDF (Resource Description Framework), a model for exchanging information on the web, and URIs (Uniform Resource Identifiers), strings of characters that identify things and allow data to be linked to and from on the web. In RDF, a URI names the relationship between two things, which are usually also expressed as URIs. This is called an RDF “triple” and takes the form subject/predicate/object. A bibliographic example would be “Charles Dickens is the author of Bleak House,” where “Charles Dickens” is the subject, “is the author of” is the predicate, and “Bleak House” is the object. In RDF form, each element of the statement is replaced with a URI that machines can process:

http://id.loc.gov/authorities/names/n78087607 [Charles Dickens]

http://id.loc.gov/vocabulary/relators/aut [is author of]

http://id.loc.gov/authorities/names/no2012013691 [Bleak House].
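The statement above can be written out in N-Triples, one of the simplest RDF serializations, in which each URI is enclosed in angle brackets and each triple ends with a period. The sketch below handles only URI terms (no literals, blank nodes, or escaping), and the helper names are hypothetical. It keeps the subject/predicate/object order used in the text; in published linked data, relator properties such as `aut` are typically applied with the resource as subject and the person as object.

```python
# Sketch: the Dickens statement written as one line of N-Triples.
# Only URI terms are handled; literals, blank nodes and escaping are
# out of scope for this sketch.

def to_ntriple(subject: str, predicate: str, obj: str) -> str:
    """Serialize a triple whose three terms are all URIs."""
    return f"<{subject}> <{predicate}> <{obj}> ."

def from_ntriple(line: str) -> tuple:
    """Naive inverse of to_ntriple: works only for URI-only triples."""
    body = line.strip()
    if body.endswith(" ."):
        body = body[:-2]
    return tuple(term.strip("<>") for term in body.split(" "))

dickens = "http://id.loc.gov/authorities/names/n78087607"
author_of = "http://id.loc.gov/vocabulary/relators/aut"
bleak_house = "http://id.loc.gov/authorities/names/no2012013691"

line = to_ntriple(dickens, author_of, bleak_house)
print(line)
```

Splitting the line back into its three parts needs no knowledge of cataloging rules, only of the serialization’s syntax.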

Although this form is not friendly to human eyes, it is fully parsable by computers. By using standard naming protocols, information can be shared and identified across different platforms, allowing for discovery and collaboration. FRBR’s entity-relationship model maps naturally onto this structure of RDF triples, which makes FRBR and RDA compatible with RDF. BIBFRAME is an RDF-based model whose data can be serialized in interchange formats such as RDF/XML, Turtle, or JSON-LD, allowing it to be stored and transported on the web. Although the BIBFRAME model does not align exactly with the FRBR work, expression, manifestation, and item entities, BIBFRAME resources can be mapped to FRBR entities, making the two compatible. The model itself consists of four classes, creative work, instance, authority, and annotation, as shown in Figure 3.

A BIBFRAME resource can be any one of these classes. A creative work is “a resource reflecting a conceptual essence of the cataloging resource” (LoC BIBFRAME 2012a, p. 8), equivalent to a FRBR work- or expression-level entity. An instance is a resource that is a concrete manifestation of a work (a book, map, recording, PDF document, etc.), roughly equivalent to a manifestation in the FRBR model. An authority is a person, corporate body, family, object, place, jurisdiction, topic, meeting, or time frame associated with a creative work and instance (LoC BIBFRAME 2014a). Annotations are resources that expand and enhance other BIBFRAME resources with additional information, such as cover art, reviews, holds and holdings information, or description.
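The rough correspondences described above can be sketched as a small graph. In this hypothetical Python example, triples are plain tuples written in prefixed shorthand, and the `ex:` resources are invented. `bf:instanceOf` is used as the link from an instance to its work; BIBFRAME provides such a property, but property names have changed between BIBFRAME versions, so check the current vocabulary. The mapping table simply restates the equivalences given in the text.

```python
from collections import defaultdict

# Shorthand triples (subject, predicate, object); ex: resources are hypothetical.
triples = [
    ("ex:bleakHouse", "rdf:type", "bf:Work"),
    ("ex:bleakHousePrint", "rdf:type", "bf:Instance"),
    ("ex:bleakHouseEbook", "rdf:type", "bf:Instance"),
    ("ex:bleakHousePrint", "bf:instanceOf", "ex:bleakHouse"),
    ("ex:bleakHouseEbook", "bf:instanceOf", "ex:bleakHouse"),
    ("ex:review1", "rdf:type", "bf:Annotation"),
]

# Rough FRBR counterparts of the four classes, as characterized in the text.
FRBR_EQUIVALENT = {
    "bf:Work": "work or expression",
    "bf:Instance": "manifestation",
    "bf:Authority": "person, corporate body, place, topic, etc.",
    "bf:Annotation": "no direct FRBR counterpart",
}

def instances_by_work(triples):
    """Group Instance resources under the Work each one points to."""
    groups = defaultdict(list)
    for s, p, o in triples:
        if p == "bf:instanceOf":
            groups[o].append(s)
    return dict(groups)

def frbr_view(triples):
    """Map each typed resource to its rough FRBR counterpart."""
    return {s: FRBR_EQUIVALENT[o] for s, p, o in triples if p == "rdf:type"}
```

Grouping instances under their work gives the same work-to-manifestation view a FRBR-based catalog would present.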

BIBFRAME uses RDF and text-based serializations to make bibliographic data compatible with web-based protocols. For the first time, the rich information held in library records can be broken out of the MARC silos that contain it and can be freely discovered and shared on the web. RDF syntax looks quite different from the MARC format that librarians are used to, but the elements of MARC records can be efficiently mapped to the BIBFRAME environment, which was one of the requirements the Working Group on the Future of Bibliographic Control identified in its report. It is important to remember that although the raw coding of an RDF-based BIBFRAME record is rather impenetrable to the human eye, display programming can present the data in these records in virtually any format desired. Raw MARC records are encoded as one long string of data, which is then manipulated in OCLC’s cataloging tools or in integrated library systems for ease of use. Public views in webpacs and online catalogs are further manipulated to present an attractive, easy-to-read record. In this respect RDF-based cataloging is no different.
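Display programming of the kind described here can be sketched briefly. The Python example below assumes data stored as (subject, property, value) triples with single-valued properties; the property names, labels, and `ex:grapes` identifier are hypothetical, not BIBFRAME terms. Two functions present the same stored data in different ways.

```python
# Sketch of display programming: data is stored as (subject, property, value)
# triples, and separate functions decide how to present it. Property names,
# labels and the "ex:" identifier are hypothetical; each property is
# assumed to have a single value.

triples = [
    ("ex:grapes", "title", "The Grapes of Wrath"),
    ("ex:grapes", "creator", "Steinbeck, John"),
    ("ex:grapes", "date", "1939"),
]

LABELS = {"title": "Title", "creator": "Author", "date": "Published"}

def values_for(subject, triples):
    """Collect the property values recorded for one resource."""
    return {prop: value for subj, prop, value in triples if subj == subject}

def catalog_display(subject, triples, order=("title", "creator", "date")):
    """A labelled public-catalog style display, in a configurable order."""
    values = values_for(subject, triples)
    return "\n".join(f"{LABELS[p]}: {values[p]}" for p in order if p in values)

def brief_citation(subject, triples):
    """The same data rendered as a one-line citation."""
    v = values_for(subject, triples)
    return f"{v['creator']}. {v['title']}. {v['date']}."

print(catalog_display("ex:grapes", triples))
print(brief_citation("ex:grapes", triples))
```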

Currently, the Library of Congress is working with a number of libraries and institutions, including the Deutsche Nationalbibliothek, the British Library, OCLC, and the National Library of Medicine, to develop and test BIBFRAME implementations. Tools and downloads are available from the BIBFRAME website for those interested in experimenting with the model and its vocabularies, creating bibliographic data in BIBFRAME, or converting MARC records to the BIBFRAME model (LoC BIBFRAME n.d.). A major issue for the further development and testing of the model is the migration of integrated library systems from MARC-record-based databases to ones that fully support BIBFRAME and RDF syntaxes. Because vendors are slow to change and the model itself is still under construction, it is expected to be some time before systems are in place that allow bibliographic data to be recorded in a BIBFRAME environment. Even then, the conversion of legacy MARC data to a new format is expected to take some time, and a period of multiple formats is likely. In the meantime, interested libraries can contribute to the development of BIBFRAME by participating in the Libhub Initiative (http://www.libhub.org/), a program Zepheira started in 2014. The initiative’s purpose is to publish library records in the BIBFRAME format, building up a core set of library data on the web (Zepheira 2014, FAQ). The goal is to allow users to find relevant resources on the web and be directed back to a .

Where are we headed?

Work on new initiatives and models for bibliographic data is ongoing. IFLA has begun a process to reconcile the FRBR, FRAD, and FRSAD models into one, referred to as the FRBR Library Reference Model (FRBR-LRM). It retains the underlying structure of the FRBR model while refining and consolidating the kinds and definitions of entities and user tasks. A preliminary report was published in 2015; the full report was published in late February 2016 and is available for review until May 1, 2016 (Riva, Le Bœuf, and Žumer, 2016). With the adoption of FRBR-LRM, RDA is expected to be further revised to harmonize with the new consolidated model. In addition, much more work will be done on developing the BIBFRAME environment for recording bibliographic data, which will also be compatible with both FRBR-LRM and RDA. So where are we headed, and what should we be doing to prepare for the new world of bibliographic data and web visibility and discovery?

Firstly, librarians have to stop thinking in terms of bibliographic records and start thinking in terms of data. Strings of text held in a flat record such as a MARC file have served the library community well for a long time, but, as described above, information in this format has clear limitations. Instead, we must start thinking of bibliographic information as data that is machine-actionable, that can be manipulated and shared across different databases and platforms, and that is not format-specific. It has been lamented that the rise of Google has lessened the use of library catalogs and discovery tools, but this is in great part due to the MARC format’s lack of interoperability in the online world. If we want our collections to be discoverable and used, we will have to move away from the MARC bibliographic "record" and the "flat file" way of thinking and embrace a more three-dimensional approach to data that allows for sharing and interoperability both within and outside the library community. This will include being willing to share the creation of metadata for library resources with vendors, publishers, and others. As Karen Coyle points out, “this means dropping the snobbery of ‘only libraries create quality data’” (Coyle 2007, p. 46), and being open to sharing our data with other constituencies.

Secondly, we have to have a basic understanding of linked data, RDF triples, RDF/XML, and other data interchange formats such as JSON, Turtle, and N3. Librarians have long complained that software and integrated library system (ILS) developers and vendors do not understand their needs, but the road goes both ways. It is incumbent upon the library community to have a solid foundation in the fundamentals of internet data formats, schemas, and Semantic Web applications in order to have productive conversations with software developers, IT professionals, and ILS designers. Librarians have the opportunity to shape and direct how the “catalog of the future” will look and operate, but we need some basic knowledge of how these systems work in order to partner effectively with ILS vendors.

This is not to say that all librarians need to learn how to code RDF triples in Turtle or XML. We do, however, need to understand the basics of data models and interchange formats to have intelligent conversations about the possibilities and limitations of metadata creation, storage, and retrieval. We also need to start piloting small linked data projects in our own collections, to develop expertise and share lessons learned.
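As a small taste of what “interchange format” means in practice, the sketch below writes the Dickens statement quoted earlier in two serializations, Turtle and JSON-LD, using only Python’s standard `json` module. The prefix labels `names` and `rel` are arbitrary choices, and a real project would normally use an RDF library rather than building strings by hand.

```python
import json

# Sketch: the Dickens statement quoted earlier, serialized two ways.
# The URIs come from the text; the prefix labels are arbitrary.
NAMES = "http://id.loc.gov/authorities/names/"
REL = "http://id.loc.gov/vocabulary/relators/"

subject = NAMES + "n78087607"      # Charles Dickens
predicate = REL + "aut"            # author relationship
obj = NAMES + "no2012013691"       # Bleak House

# Turtle: prefixes abbreviate the URIs; each statement ends with " ."
turtle = "\n".join([
    f"@prefix names: <{NAMES}> .",
    f"@prefix rel: <{REL}> .",
    "",
    "names:n78087607 rel:aut names:no2012013691 .",
])

# JSON-LD (expanded form): a node object whose key is the full property
# URI and whose value is a reference to another node.
jsonld = json.dumps({"@id": subject, predicate: [{"@id": obj}]}, indent=2)

print(turtle)
print(jsonld)
```

Both outputs carry the same triple; only the syntax differs, which is why data stored in one format can be exchanged in another.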

Thirdly, we have to start pushing our ILS vendors toward the transition to a new, post-MARC system. Vendors are going to be reluctant to put much time and effort into research and development of a new linked-data-compatible ILS until there is a firm commitment on the part of the library community to adopt BIBFRAME or another standard. Some vendors, such as SirsiDynix and Innovative Interfaces, have been partnering with Zepheira in the Libhub Initiative to produce tools that allow library records to be created in linked data formats and made visible on the web (Innovative 2015, SirsiDynix 2015). However, these programs are in early pilot stages and will take years to develop alongside BIBFRAME. In the meantime, we need to be talking to our vendors about their plans for developing systems that can produce linked data that can be freely shared in the wider Semantic Web community. If we do not push vendors to develop systems capable of managing linked data, our resources will remain forever siloed, and we will never be able to take advantage of the full potential of linked data and the Semantic Web.

Conclusion

As librarians, we have become used to change. We have also been staunch advocates for high-quality metadata. While the need to develop new technological expertise seems daunting, we must make that leap in order to move our bibliographic data out of its traditional silo and into the 21st-century web environment. It would be a great loss if the rich collection of library metadata built through decades of work were left to stagnate, unused, because of a reluctance to change. Besides, as Virginia Schilling writes, “despite the complexity, frustration, and general chaos involved in transitioning to a newer technology like linked data, it should be recognized that there really may be no choice in the matter” (Schilling 2012). As Google has shown us, we are not the only provider of metadata on the block, and it could be argued that we are no longer the best, either. We need to get on board or get out of the way.