
Dublin Core Analysis: An Examination of Two Collections of Cooking Ephemera

Introduction

The digital collections chosen for the term project are the Szathmary Recipe Pamphlet Digital Collection of the University of Iowa Libraries and the Historical Menu Collection of the University of Washington Libraries. Both collections use the Dublin Core schema and contain similar subject matter, i.e., ephemera about food. The collections are online but unfortunately do not provide access to the underlying XML records. The Szathmary Recipe Pamphlet collection comprises more than 4,000 recipe pamphlets, with a particular concentration on the period 1880-1930, when “industrialization gave rise to the modern food industry” (Szathmary Recipe Pamphlet Digital Collection, 2007). The digital collection, which contains approximately 120 images, serves as a representation of the full collection. The Historical Menu Collection represents menus from famous restaurants in Seattle and the Pacific Northwest region of the United States from 1889 to 2003. The total collection consists of 740 items; approximately 100 of those items have been digitized to serve as a representation of the collection.

Metadata

Metadata has been described by many as “data about data”. The creation of the web brought rapid growth in available scholarly resources, and such resources are made available so that they can be discovered and retrieved. Metadata facilitates effective resource discovery and organization through precise description and standardization, which is achieved by employing metadata schemes (i.e., formal structures) for organizing resources. Describing data is not enough; describing it accurately is necessary for it to be accessible to end users. According to Svenonius (2000), “metadata has long been widely employed by library and information professionals and scholarly communities in the organizing of information and the discovery of resources”. It is no surprise that the vastness of data and richness of resources call for a variety of metadata schemes, such as Dublin Core, MARC, TEI, EAD, VRA, and MODS. As both collections for this project employ the Dublin Core schema, a brief explanation of its structure is provided.

Dublin Core

Founded in 1995 in Dublin, Ohio, the Dublin Core Metadata Initiative (DCMI) focuses on creating a standardized set of terms for resource description. The Dublin Core Metadata Element Set (DCMES) comprises fifteen elements, or properties, that were developed to enable cross-domain description of information items. Dublin Core is one of the most widely used metadata schemes due to its flexibility and simplicity. Such flexibility is apparent in that each of the fifteen elements is both optional and repeatable. The elements may also be listed in any order, as no element is required to appear before another. The National Information Standards Organization (2007) highlights this flexibility further by stating that the “core” in Dublin Core “is because its elements are broad and generic, usable for describing a wide range of resources” (p. ii). Essentially, implementers may use whichever DC elements work best to describe the resource or item. To make an element’s meaning more specific, or to allow the item to be more aptly described, local metadata creation guidelines may be used. These guidelines or local practices introduce local data elements that give catalogers additional flexibility. While local guidelines may be helpful for implementers, they also pose consequences for interoperability. Another way to make the element meaning more specific is through qualified Dublin Core elements.
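The optional, repeatable, and unordered nature of the elements can be illustrated with a short sketch. The Python below builds a simple Dublin Core record using the standard DCMES namespace. The `record` wrapper element, the `build_record` helper, and all of the field values are assumptions made for illustration; they are not taken from either collection or from CONTENTdm.

```python
import xml.etree.ElementTree as ET

# Namespace of the fifteen-element Dublin Core Metadata Element Set.
DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)


def build_record(fields):
    """Build a simple DC record from (element, value) pairs.

    Elements may be omitted, repeated, and given in any order.
    The <record> wrapper is a hypothetical container for this sketch.
    """
    record = ET.Element("record")
    for element, value in fields:
        child = ET.SubElement(record, f"{{{DC_NS}}}{element}")
        child.text = value
    return record


# Hypothetical values: subject is repeated and appears before title,
# and most of the fifteen elements are simply left out.
pamphlet = build_record([
    ("subject", "Cookery"),
    ("title", "Example canning pamphlet"),
    ("subject", "Canning & preserving"),
    ("type", "Text"),
])

print(ET.tostring(pamphlet, encoding="unicode"))
```

The record is still a valid simple Dublin Core description even though it uses only three distinct elements, which is the flexibility the element set is designed to allow.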

The qualified Dublin Core elements exist to refine an element’s meaning and to aid interpretation of an element’s value. The DCMI highlights two classes of qualifiers: element refinement and encoding scheme. Both kinds of qualifier can be used with a variety of elements. Two elements that appear to use refinement most frequently are date and relation. For example, one might observe “created”, “valid”, or “issued” instead of date, and “replaces”, “requires”, or “has part” for relation. As for encoding schemes, “these schemes include controlled vocabularies and formal notations or parsing rules” (Dublin Core Qualifiers, 2012). Such qualifiers aid interpretation of the element value and are carried out through controlled vocabularies such as the Library of Congress Subject Headings (LCSH), the Thesaurus of Graphic Materials (LCTGM), and the Getty Art & Architecture Thesaurus (AAT).

As Dublin Core is one of the most widely used metadata schemes, several organizations use it for digital repository creation. CONTENTdm is an OCLC software product that creates metadata using the Dublin Core scheme and is used by “nearly 2,000 organizations worldwide” (CONTENTdm in action, 2012). Both the Szathmary Recipe Pamphlet Collection and the Historical Menu Collection were created with CONTENTdm.

• CONTENTdm digital collection of collections using DC scheme http://www.oclc.org/contentdm/collections/default.htm

Metadata Usage Examination


While the structure of the metadata records in the University of Iowa and University of Washington collections differs, both follow the Dublin Core usage guide (http://dublincore.org/documents/usageguide/) to some degree. The following table represents the usage of elements in each collection. It summarizes data collected from 80 records: 40 from the Szathmary Recipe Pamphlet Collection and 40 from the Historical Menu Collection. The records were generally chosen at random within each collection. Certain elements were repeated within a record, such as the subject element in the Szathmary Collection, which appeared 2-4 times in each record. These repetitions were counted individually, as the purpose of the data collection was to identify the number of instances of each element. Not surprisingly, the presence of elements differed somewhat from one record to the next. It should also be mentioned that not all of the element names in the collections matched the exact DCMES names. In those cases, the locally named elements were analyzed and, where they were similar to a DC element, counted as one of the 15 elements. For example, the Historical Menu Collection uses the term “location” and the Szathmary Collection uses the terms “geographical subject” and “chronological subject”, which correspond to the element coverage. This element may include “spatial or temporal topic of the resource, the spatial applicability of the resource, or the jurisdiction under which the resource is relevant” (DCMI Metadata Terms, 2012). The percentages are based on the number of times the element appeared in each collection.
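The counting procedure described above might be sketched as follows. The crosswalk contains only local labels that are named in this paper; the function names and the sample record are hypothetical, and the mapping is illustrative rather than a complete account of either collection’s fields.

```python
from collections import Counter

# Local field labels mentioned in this paper, mapped to the DC element
# they were counted under. Illustrative only, not a complete crosswalk.
CROSSWALK = {
    "location": "coverage",
    "geographical subject": "coverage",
    "chronological subject": "coverage",
    "topical subject": "subject",
}

DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}


def to_dc_element(label):
    """Return the DC element for a local field label, or None if unmapped."""
    # Drop a trailing vocabulary tag such as "(LCSH)" before the lookup.
    name = label.split("(")[0].strip().lower()
    if name in DC_ELEMENTS:
        return name
    return CROSSWALK.get(name)


def count_elements(records):
    """Count every instance of each DC element, repeats included."""
    counts = Counter()
    for record in records:
        for label in record:
            element = to_dc_element(label)
            if element is not None:
                counts[element] += 1
    return counts


# Hypothetical record, given as the list of field labels it contains.
sample = [[
    "Title", "Topical subject (LCSH)", "Topical subject (LCTGM)",
    "Geographical subject", "Category",
]]
print(count_elements(sample))
```

In this sketch a non-mapping field such as “category” is skipped rather than forced into one of the fifteen elements, and a repeated subject field adds to the tally each time it occurs.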

Element Usage Table

                Szathmary Recipe       Historical Menu        Both collections
                Pamphlet (n = 40)      (n = 40)               (n = 80)
Element         Total    % of 526      Total    % of 378      Total    % of 904
Title           40       7.60          40       10.58         80       8.85
Creator         7        1.33          -        -             7        0.77
Subject         120      22.81         36       9.52          156      17.26
Description     -        -             33       8.73          33       3.65
Publisher       40       7.60          -        -             40       4.42
Contributor     40       7.60          4        1.06          44       4.87
Date            39       7.41          36       9.52          75       8.30
Type            40       7.60          38       10.05         78       8.63
Format          40       7.60          38       10.05         78       8.63
Identifier      -        -             38       10.05         38       4.20
Source          40       7.60          38       10.05         78       8.63
Language        -        -             1        0.26          1        0.11
Relation        40       7.60          38       10.05         78       8.63
Coverage        40       7.60          38       10.05         78       8.63
Rights          40       7.60          -        -             40       4.42

Note: n is the number of records sampled. Percentages are each element’s share of the total element instances counted (526 for Szathmary, 378 for Historical Menu, 904 for both).
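Each percentage in the table is an element’s share of all element instances counted in a collection. As a minimal sketch of that arithmetic, the Python below recomputes the Szathmary column from the counts reported in the table; the function and variable names are hypothetical.

```python
# Element instance counts for the Szathmary sample, from the table.
szathmary_counts = {
    "title": 40, "creator": 7, "subject": 120, "publisher": 40,
    "contributor": 40, "date": 39, "type": 40, "format": 40,
    "source": 40, "relation": 40, "coverage": 40, "rights": 40,
}


def element_shares(counts):
    """Return each element's share of all instances, as a percentage."""
    total = sum(counts.values())
    return {
        element: round(100 * n / total, 2)
        for element, n in counts.items()
    }


shares = element_shares(szathmary_counts)
print(sum(szathmary_counts.values()))  # total element instances
print(shares["subject"])               # share taken by subject
```

Because subject is repeated within records, it takes a much larger share of instances than elements that appear once per record.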

It is apparent that, of the 15 elements, only about half are used consistently across both collections. The most commonly used element is subject, as it is repeated in each record of the Szathmary Collection. The other frequently used elements are title, date, type, format, source, relation, and coverage; each collection uses these elements consistently for resource description. Looking at the collections individually, there are elements used consistently in one collection and not the other. For example, the Szathmary Recipe Pamphlet Collection uses the element publisher in all of the records examined, whereas the Historical Menu Collection frequently uses the elements description and identifier. The least used elements were creator and language, with only 8 instances in total. Although not presented in the table, both collections frequently employed a non-mapping element titled “category”.

Overall, the high level of element usage in both collections can be viewed as positive and in compliance with the DC usage guide. Unfortunately, the element usage table provides information on the presence of elements but not on the content within them. For this, the structure of the content must be examined.

Controlled Vocabularies

Aside from proper element usage for a particular metadata scheme, controlled vocabulary is another significant factor in efficient resource retrieval and discovery. Controlled vocabulary is essential for establishing preferred terms for description and for achieving consistency in such description. Most importantly, controlled vocabulary creates harmony among terms, as it aims to connect the author’s or cataloger’s and the end user’s cognitive processes of classifying and describing objects. Because synonyms are controlled, users can retrieve the desired results through related concepts. For example, by standardizing cooking-related resources under the Library of Congress Subject Heading “cookery”, the user will find a wide range of results both by searching for the term and by using the term as a browsing aid. Quintarelli (2005) sums up the importance of classification nicely by stating, “As human beings, we need to know our relative position and the viable routes towards other places. In a physical world, we design and use maps, coordinates, graphs, diagrams, and signposts. Equivalent tools are needed to find our way into the virtual world” (Folksonomies: Power to the People). Without such tools and control, there remains a world of resources that will not be discovered. To test the Historical Menu and Szathmary Recipe Pamphlet collections for efficient retrieval, their controlled vocabulary usage was examined.

Controlled Vocabulary Examination

Given that the collections contain similar subject matter, i.e., ephemera about food, it was interesting to explore how each collection’s records used subject language for classification and description. Svenonius (2000) highlights that “subject language is used to depict what a document is about” and is “designed for the special purpose of retrieving information” (p. 128). To describe a document accurately and achieve consistent and efficient retrieval, a list of preferred terms, or a controlled vocabulary, should be present for each collection.

To identify the controlled vocabulary (or lack thereof), the records of both collections were examined with a focus on the subject element. As the subject element is “typically expressed as keywords or key phrases or classification codes that describe the topic of the resource” (Using Dublin Core – The Elements, 2012), this element has high usage. According to Park (2005), the subject element generally draws on a particular controlled vocabulary scheme or a combination of schemes. Taking Park’s findings into account, it was no surprise that one of the collections examined contained controlled vocabulary from the Library of Congress Subject Headings (LCSH), the Thesaurus of Graphic Materials (LCTGM), and the Getty Art & Architecture Thesaurus (AAT).

Looking at the subject terms for the Szathmary Recipe Pamphlet Digital Collection of the University of Iowa Libraries, it is apparent that the records contain vocabulary from LCSH, LCTGM, and AAT. Each subject element in a record names the controlled vocabulary used. For example, a record for a promotional pamphlet on canning and preserving has two topical subjects listed: Topical subject (LCTGM) and Topical subject (LCSH). Interestingly enough, there are also three type elements in the record: Type (DCMITYPE), Type (AAT), and Type (IMT). For the purpose of this analysis, the type element using AAT is considered a subject term, since the element uses vocabulary, as Svenonius (2000) declares, “to name its major concepts” (p. 129). All of the records consulted for this assignment contained the same terms for the element Type (AAT): cookbooks; promotional materials; recipes. The use of the AAT controlled vocabulary illustrates that “the preferred term can serve as the unique identifier for each collection of equivalent terms” (Morville & Rosenfeld, 2006). Therefore, the AAT controlled vocabulary tells the audience the three concepts that make up the collection.

For the Historical Menu Collection of the University of Washington Libraries, subject terms and the arrangement of records are documented in the finding aid for the collection. There are seven subject terms and two genre headings, all of which are terms from LCSH. When examining records in this collection, each subject element appears to have a genre heading and a subject term from the finding aid guidelines. This aid was quite helpful in learning about the cataloger’s overall process, something that was unclear for the University of Iowa collection.

Although both collections show attributes of controlled vocabularies, issues appear in the number of LCSH terms used per record. There is no standard for how many terms are needed or used to describe a document. In the Szathmary Recipe Pamphlet Digital Collection there are two pamphlets that deal with canning and preserving. One record has six terms for the topical subject (cookery, canning & preserving, fruit, vegetables, meat, fruit trees, vines), while the other has only two (cookery, canning & preserving). A search for material on canning would return both records. However, it is unclear why one record would lack the other subject terms if the pamphlet it describes also covers preserving fruit and vegetables. Neglecting to add the terms vegetables and fruit is perhaps not the greatest cataloging error, but it illustrates inconsistency in labeling resources.

Another issue encountered was the mixture of controlled and uncontrolled terms in the subject element of records in the Historical Menu Collection. For example, one record for a hotel menu had the subjects Kahlua Room, Hotel Windsor, Restaurants—Menus, and Hotels—Washington (State)—Seattle. Kahlua Room and Hotel Windsor do not appear in LCSH. A second record for a hotel menu has the subjects Menus, Cloud Room, and Restaurants—Washington (State)—Seattle. Comparing these subject terms, it is interesting that one record has Menus and the other has Restaurants—Menus, a narrower term in LCSH. It is unclear why these subject headings vary.
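A check for this kind of mixing can be sketched in a few lines of Python. The `CONTROLLED` set below is only a stand-in for a real LCSH lookup and is an assumption made for the example; the subject terms come from the first hotel menu record described above, with subdivisions written with double hyphens in the code.

```python
# Stand-in for an authority file lookup. These headings are treated as
# controlled for the sake of the example; a real check would consult LCSH.
CONTROLLED = {
    "Menus",
    "Restaurants--Menus",
    "Hotels--Washington (State)--Seattle",
    "Restaurants--Washington (State)--Seattle",
}


def split_subjects(terms, controlled=CONTROLLED):
    """Separate subject terms into controlled and uncontrolled lists."""
    found = [term for term in terms if term in controlled]
    missing = [term for term in terms if term not in controlled]
    return found, missing


# Subject terms from the first hotel menu record discussed above.
record_subjects = [
    "Kahlua Room",
    "Hotel Windsor",
    "Restaurants--Menus",
    "Hotels--Washington (State)--Seattle",
]

found, missing = split_subjects(record_subjects)
print("controlled:", found)
print("uncontrolled:", missing)
```

Running such a check over every record would give a rough count of how often free-text terms appear alongside controlled headings in the subject element.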

Both collections are strong in the sense that they use controlled vocabulary, although to varying degrees, to describe records. The Szathmary Recipe Pamphlet Digital Collection is beneficial in that it provides controlled vocabularies from AAT, LCSH, and LCTGM. Users are less likely to fail to retrieve a record because several searchable terms are available. On the other hand, the lack of standardization in subject language may cause confusion and interoperability issues.

Metadata Quality


Another aspect that affects interoperability is the quality of metadata. Park (2009) indicates, “the quality of metadata reflects the degree to which the metadata in question perform the core bibliographic functions of discovery, use, , currency, authenticity and administration” (p. 215). To measure the quality of metadata for both the Historical Menu collection and the Szathmary Recipe Pamphlet collection, records from each collection were evaluated. Record quality was assessed against the criteria of completeness, accuracy, and consistency.

Completeness

The criterion of completeness measures how well the elements in a record describe an object and how they connect that record to its local and parent collection. Completeness also takes into account a repository’s local metadata creation guides, which can explain why and how the elements are used. Since local guides were unavailable for both collections, the collections’ use of metadata elements was compared against the Dublin Core metadata standards. It is important to note that completeness does not mean that all fifteen metadata elements have to be used to describe an object; it means that enough elements are used to describe the object effectively. In neither collection was there a single record that implemented all of the DC elements. As mentioned previously, and as indicated in the table below, several elements occurred frequently in both collections: title, date, type, format, source, relation, and coverage. The least used elements were creator and language. While similar to the element usage table, this table counts the number of records in which an element was used rather than the number of times the element appeared.

Record Completeness Table

                Szathmary Recipe       Historical Menu        Both collections
                Pamphlet (n = 40)      (n = 40)               (n = 80)
Element         Total    % of 40       Total    % of 40       Total    % of 80
Title           40       100           40       100           80       100
Creator         7        17.5          -        -             7        8.75
Subject         40       100           36       90            76       95
Description     -        -             33       82.5          33       41.25
Publisher       40       100           -        -             40       50
Contributor     40       100           4        10            44       55
Date            39       97.5          36       90            75       93.8
Type            40       100           38       95            78       97.5
Format          40       100           38       95            78       97.5
Identifier      -        -             38       95            38       47.5
Source          40       100           38       95            78       97.5
Language        -        -             1        2.5           1        1.25
Relation        40       100           38       95            78       97.5
Coverage        40       100           38       95            78       97.5
Rights          40       100           -        -             40       50

Note: n is the number of records sampled. Total is the number of records in which the element was used; percentages are shares of the records sampled.
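The completeness figures differ from the usage figures only in what is counted: a record contributes at most once per element, no matter how many times the element repeats. A minimal Python sketch of that measure, using hypothetical records rather than data from either collection, might look like this:

```python
def completeness(records, element):
    """Percentage of records that use `element` at least once."""
    used = sum(1 for record in records if record.get(element))
    return 100 * used / len(records)


# Hypothetical records: each maps a DC element to its list of values.
records = [
    {"title": ["Menu A"], "subject": ["Menus", "Restaurants"]},
    {"title": ["Menu B"], "subject": []},
    {"title": ["Menu C"], "subject": ["Menus"], "language": ["English"]},
    {"title": ["Menu D"]},
]

for element in ("title", "subject", "language"):
    print(element, completeness(records, element))
```

An element that is present but empty is treated here as unused, which is a design choice of the sketch; the first record’s two subject values still count as a single use.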

Also observed were locally added elements mainly concerned with provenance details: contact information, digital reproduction information, and digitization specifications. Another locally added element in both collections was category, which did not use a standard vocabulary. Anomalies of element use between the collections included a high number of records using the publisher element (Szathmary Recipe Pamphlet Collection) and a high number of records using the identifier element (Historical Menu Collection).

In viewing both collections and their relationships to their local and parent collections, it is apparent that the Szathmary Recipe Pamphlet collection has a much stronger connection to, and justification of, its records. The content of each element is linked not only to the rest of the collection but to the entire University of Iowa repository. Also important is that the date element is used and is interoperable within its repository, whereas the Historical Menu Collection’s date element is not interoperable for users. The Szathmary collection also does a sound job of using several repeatable subject elements to describe the item accurately. Such subject elements include topical subject, corporate name subject, and chronological subject. The Historical Menu Collection employs the null-mapping element notes to describe the object, and the length and structure of these descriptions tend to vary from record to record. Additionally, the Historical Menu Collection uses the subject element, but to a much lesser degree. Performing a simple subject search in this collection is challenging, as much of the content in the metadata records is not connected to the collection itself or to the other collections in the University of Washington’s digital repository.

Accuracy and Consistency


The criteria of accuracy and consistency are concerned with how the records present information and whether the contents of the elements appear correct. For accuracy, the collections were checked for typographical errors and instances of semantic overlap in element use. In certain records of the Historical Menu collection, there appeared to be non-conforming names for places. For example, one record has the title “Red Carpet Dinner Menu (higher)” and the subject “Red Carpet”. It is unclear why the word “higher” is included in the title and why the name of the restaurant is then shortened to “Red Carpet”. Although such problems were less common in the Szathmary Recipe Pamphlet collection, there was a record where “Joseph Campbell co.” was the publisher but the corporate name subject was “Joseph Campbell Company (Camden, N.J.)”. In other records of this collection, the publisher name and corporate name subject are identical in format.

Another issue was the collections’ usage of the elements format and type. The Historical Menu collection completely misses the mark on employing either element and uses the labels object type and physical description instead. For object type the content is always “menu”, and for physical description it is “letterpress” followed by the specific size of the physical format, which is the only part that follows the DC usage guide. For the Szathmary collection, type is used appropriately and abides by the DCMI Type Vocabulary. Format is missing as a named element from the records, but it appears to be partly covered by an additional type element in each record, Type (IMT). The contents of this element abide by the recommended controlled vocabulary of Internet Media Types for defining computer media formats. It should also be noted that, in both the Historical Menu and Szathmary Recipe Pamphlet collections, the element relation appeared interchangeably under the labels digital collection, archival collection, and repository collection.

As previously mentioned in the section on controlled vocabulary, there was a great deal of inconsistency in controlled vocabulary use in the collections. There was a mixture of subject headings and free-text descriptions in the subject element for the Historical Menu collection; certain records had two LCSH terms and two free-text terms. For the Szathmary Recipe Pamphlet Collection, there appeared to be an abundance of controlled vocabulary terms in one record and only one or two in the next. While these terms were still searchable in the collection, it was concerning that records on similar subject matter would be described so inconsistently in the subject element.

Of the two collections, the Szathmary Recipe Pamphlet Collection achieves greater semantic interoperability. This collection uses elements that appropriately identify the objects and uses standard vocabularies that create connections to the rest of the digital repository. Many of the elements serve as links to other records in the collection’s holdings. Unfortunately for the Historical Menu Collection, fewer elements are used, and fewer are used properly. Also puzzling is the number of hypertext links in the records that fail to connect the object with the rest of the digital collections. In the notes section there are free-text sentences in which many words, such as “printed” and “orange”, are hyperlinked. These links greatly diminish interoperability, as they lead to completely different records with no apparent connection. As mentioned previously, each record in the Historical Menu collection has the locally added field category, which it uses for interoperability with the rest of the collections. When more than one term is provided, clicking on a term redirects the user to the main page for digital collections. If only one term is provided, it connects the user to other objects that fall under the category. The finding aid for the collection provides the names of three individuals responsible for processing the information. Interestingly enough, it appears that some information was processed in 2003 and later, by different individuals, in 2007. Perhaps this change, and the lack of an available metadata creation guide, led to the issues that arose when exploring the collection’s semantic interoperability.

Conclusion

As indicated by the data collection and analysis, both the Szathmary Recipe Pamphlet Collection and the Historical Menu Collection employ adequate metadata. However, several factors appeared to influence each collection’s level of metadata quality, semantic interoperability, and resource description. While both collections consistently used only about half of the 15 elements, the Szathmary Recipe Pamphlet Collection made better use of the elements through controlled vocabulary and by linking the controlled terms to other parts of its local and parent collection. The Historical Menu Collection achieves metadata quality only halfway: it has the attributes of successful metadata (controlled vocabulary, active linking) but fails to use them correctly. Most surprising was that the two small collections were similar in number of records and both used CONTENTdm for organization, yet still managed to differ. If a metadata creation guide had been available, the quality of both collections might have been better assessed, and the DC usage guide might have been better employed. To get a greater sense of the University of Iowa’s and University of Washington’s digital collections, the collections chosen for this term project could be examined against other collections in those digital repositories. If implementers choose to add records to either collection, it is to be hoped that local metadata guidelines will be made public and that closer attention will be paid to the Dublin Core elements and their meanings.

References

CONTENTdm in action. OCLC (2012). Retrieved November 25, 2012 from: http://www.oclc.org/contentdm/collections/


DCMI Metadata Terms. Dublin Core Metadata Initiative (2012). Retrieved November 25, 2012 from: http://dublincore.org/documents/dcmi-terms/

Szathmary Recipe Pamphlet Digital Collection. The University of Iowa Libraries (2007). Retrieved November 26, 2012 from: http://digital.lib.uiowa.edu/szathmary/

Using Dublin Core – Dublin Core Qualifiers. Dublin Core Metadata Initiative (2012). Retrieved November 25, 2012 from: http://dublincore.org/documents/usageguide/qualifiers.shtml

Using Dublin Core- The Elements. Dublin Core Metadata Initiative (2012). Retrieved November 3, 2012 from: http://dublincore.org/documents/usageguide/elements.shtml

Morville, P., & Rosenfeld, L. (2006). Information architecture for the World Wide Web (3rd ed.). Sebastopol, CA: O’Reilly Media, Inc.

National Information Standards Organization. (2001). The Dublin Core Metadata Element Set. ANSI/NISO Z39.85-2001. American National Standards Institute. Available at: http://www.niso.org/kst/reports/standards?step=2&gid=&project_key=9b7bffcd2daeca6198b4ee5a848f9beec2f600e5

Park, J. (2009). Metadata quality in digital repositories: A survey of the current state of the art. Special issue on Metadata and Open Access Repositories (Michael S. Babinec and Holly Mercer, Eds.). Cataloging and Classification Quarterly, 47(3/4).

Quintarelli, E. (2005). Folksonomies: Power to the people. Paper presented at the ISKO Italy-UniMIB meeting, Milan, June 24, 2005. Retrieved November 25, from: http://www.iskoi.org/doc/folksonomies.htm

Svenonius, E. (2000). The intellectual foundation of information organization. Cambridge, MA: MIT Press.

Appendix 1

URL for Historical Menu Collection: http://content.lib.washington.edu/menusweb/index.html


Sample Records from University of Washington Historical Menu Collection

Appendix 2

URL for Szathmary Recipe Pamphlet Collection: http://digital.lib.uiowa.edu/szathmary/

Sample Records from University of Iowa Szathmary Recipe Pamphlet Collection


• This paper/project/exam is entirely my own work.
• I have not quoted the words of any other person from a printed source or a website without indicating what has been quoted and providing an appropriate citation.
• I have not submitted this paper/project to satisfy the requirements of any other course.

Your Signature: Ria Capone
Date: 11/28/2012
