Image Embedded Metadata in Cultural Heritage Digital Collections on the Web: an Analytical Study
Journal of Arab Federation for Libraries & Information "Ialam"

Emad I. Saleh
Associate Professor
(1) Information Science Department, King Abdulaziz University, Jeddah, Saudi Arabia
(2) Library and Information Science Department, Helwan University, Egypt

Abstract
Information institutions used to create metadata describing the content and properties of information resources and package it separately from the resource itself. However, given today’s web environment, the increasing number of published digital objects and the need to share the metadata describing those objects, it has become necessary to embed metadata as an integral part of the objects. Embedding standard metadata in a file structure makes a file self-describing, helps identify the resource outside its home system, and supports the searchability, discoverability and management of those resources. The web abounds with sites for disseminating and sharing pictures, owing to their importance in all aspects of life; in 2012, for example, an estimated 250 million images were posted daily on Facebook. Hence, this study tests the following hypothesis: most digitally derived images of cultural resources are stripped of the metadata that identify them once they are published on websites. The study seeks to achieve a number of research objectives, including: analyzing images on heritage websites and identifying the metadata standards they include; defining the importance of the descriptive metadata contained in images for retrieval and archiving; and investigating the standards and tools needed to embed descriptive metadata in image files. The study adopts a descriptive, analytical methodology, using a checklist as the data collection tool.
The results show that a remarkably low percentage of the images analyzed contain metadata, that no links exist between the metadata embedded in the images and their metadata records or the pages of the websites analyzed, and that there is significant use of XMP to encode the metadata embedded within the images.

Keywords: Image Embedded Metadata; Cultural Heritage; Digital Collections; Image Metadata

1. Introduction
For some time now, the world has witnessed a growing trend to preserve cultural heritage and make it accessible to the public over the web. There have been several initiatives and international collaborations, such as the World Digital Library and Europeana, whose goal is to make digital cultural heritage collections widely available by targeting diverse user communities with varying levels of subject knowledge, for both leisure and educational purposes (Klimaszewski, 2016).

Information institutions used to create metadata describing the content and properties of information resources as entities separate from the resources themselves. Nowadays, however, given the web environment, the increasing number of published digital objects and the need to share the metadata describing those objects, it has become necessary to embed the metadata as an integral part of the objects. Embedding standard metadata in a file structure makes a file self-describing, helps identify the resource outside its home system, and improves the searchability, discoverability, interoperability and management of information objects.

The International Press Telecommunications Council (IPTC) stated that: “In the online world, there can be many copies of a single image or video file, and with millions of images or videos on the internet, metadata is essential for identification and copyright protection. We should ensure this metadata travels with the content as a digital label, and remains with it over its lifetime.
The metadata associated with an image or video can provide information about: copyright and copyright holders’ associated information, image content, search terms (keywords), technical details of the photography, rights restrictions for use of the image, etc.” (IPTC—International Press Telecommunications Council, 2015).

In 2008, the Stock Artists Alliance (SAA) launched a comprehensive “MetaSurvey” to investigate practices around metadata use and preservation among major stock image distributors. The SAA randomly sampled digital image files available on the websites of five distributors (Getty Images, Corbis, Jupiterimages, Masterfile and Alamy) to document the presence of metadata in both “thumbnails” and larger “preview” images. The team then tracked sample images to see what happened to embedded metadata as files were forwarded from distributors to multiple sub-distributors. Preliminary findings included the following: too many images in the licensing market lacked key identifying and content information (caption/descriptive information, color management information); there was little use of XMP or ICC profiles on the web; IPTC “Legacy” metadata was the most popular; few thumbnails had any metadata; slightly more than half of the sample in preview mode had IPTC/XMP metadata; most of the sample did not have XMP metadata; and the most popular IPTC metadata fields were credit, copyright notice, object name and caption. The survey attributed the missing metadata to internal workflows, server-side processing and special processes such as watermarking (Riecks, 2008; SAA, n.d.-a).

The objective of the Controlled Vocabulary Survey, which began in November 2009, is to find out whether social media websites and other image-sharing services preserve embedded image metadata after upload.
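At the file level, both surveys described above rest on one practical check: does a published image file still carry its embedded metadata? For JPEG files, that check means walking the marker segments in the file header, where EXIF and XMP are stored in APP1 segments, ICC profiles in APP2, and IPTC “Legacy” (IPTC-IIM) data inside a Photoshop APP13 segment. The Python sketch below illustrates this under stated assumptions: it handles JPEG only, stops at the start of the image data, detects IPTC-IIM heuristically by looking for Photoshop image resource 0x0404, and its function names (`jpeg_segments`, `embedded_metadata`, `strip_metadata`, `segment`) are hypothetical, not taken from any tool used in those surveys. `strip_metadata` is a simplified stand-in for a server-side pipeline that discards APPn segments during processing, and the sample bytes are synthetic rather than a decodable image.

```python
import struct

# Byte signatures that open the common metadata containers in JPEG APPn segments.
EXIF_SIG = b"Exif\x00\x00"                     # APP1: EXIF (camera/technical data)
XMP_SIG = b"http://ns.adobe.com/xap/1.0/\x00"  # APP1: XMP packet
ICC_SIG = b"ICC_PROFILE\x00"                   # APP2: ICC colour profile
PHOTOSHOP_SIG = b"Photoshop 3.0\x00"           # APP13: Photoshop resources (IPTC-IIM)


def jpeg_segments(data: bytes):
    """Yield (marker, start, payload, end) for each header segment before SOS/EOI."""
    if data[:2] != b"\xff\xd8":
        raise ValueError("not a JPEG file (missing SOI marker)")
    pos = 2
    while pos < len(data):
        start = pos
        if data[pos] != 0xFF:
            raise ValueError(f"expected a marker at offset {pos}")
        while pos < len(data) and data[pos] == 0xFF:  # skip 0xFF fill bytes
            pos += 1
        if pos >= len(data):
            break
        marker = data[pos]
        pos += 1
        if marker in (0xD9, 0xDA):  # EOI or SOS: no metadata segments after this
            break
        if marker == 0x01 or 0xD0 <= marker <= 0xD7:  # standalone markers, no length
            continue
        if pos + 2 > len(data):
            raise ValueError("truncated segment header")
        (length,) = struct.unpack(">H", data[pos:pos + 2])
        if length < 2:
            raise ValueError(f"invalid segment length at offset {pos}")
        yield marker, start, data[pos + 2:pos + length], pos + length
        pos += length


def embedded_metadata(data: bytes) -> set:
    """Return the names of the embedded metadata containers found in a JPEG."""
    found = set()
    for marker, _start, payload, _end in jpeg_segments(data):
        if marker == 0xE1 and payload.startswith(EXIF_SIG):
            found.add("EXIF")
        elif marker == 0xE1 and payload.startswith(XMP_SIG):
            found.add("XMP")
        elif marker == 0xE2 and payload.startswith(ICC_SIG):
            found.add("ICC")
        elif marker == 0xED and payload.startswith(PHOTOSHOP_SIG):
            # Heuristic: image resource 0x0404 ("8BIM" + ID) holds the IPTC-IIM record.
            if b"8BIM\x04\x04" in payload:
                found.add("IPTC-IIM")
    return found


def strip_metadata(data: bytes) -> bytes:
    """Copy a JPEG without its APP1-APP15 segments; APP0 (JFIF) and all else is kept."""
    out = bytearray()
    pos = 0
    for marker, start, _payload, end in jpeg_segments(data):
        if 0xE1 <= marker <= 0xEF:
            out += data[pos:start]  # keep everything up to this APPn segment
            pos = end               # ...and skip the segment itself
    out += data[pos:]
    return bytes(out)


def segment(marker: int, payload: bytes) -> bytes:
    """Build one length-prefixed JPEG segment (the length counts its own 2 bytes)."""
    return bytes([0xFF, marker]) + struct.pack(">H", len(payload) + 2) + payload


# Synthetic header-only sample (not a decodable image): JFIF + XMP + IPTC-IIM.
sample = (
    b"\xff\xd8"
    + segment(0xE0, b"JFIF\x00\x01\x01\x00\x00\x01\x00\x01\x00\x00")
    + segment(0xE1, XMP_SIG + b'<x:xmpmeta xmlns:x="adobe:ns:meta/"/>')
    + segment(0xED, PHOTOSHOP_SIG + b"8BIM\x04\x04\x00\x00" + struct.pack(">I", 0))
    + b"\xff\xd9"
)
print(sorted(embedded_metadata(sample)))                  # ['IPTC-IIM', 'XMP']
print(sorted(embedded_metadata(strip_metadata(sample))))  # []
```

Comparing the output for the same image before and after it passes through a publishing or resizing pipeline is one simple way to see whether that pipeline preserves embedded metadata; for real collections, an established tool such as ExifTool would be the more robust choice.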
Unfortunately, the Controlled Vocabulary Survey has produced no conclusions, only an updated spreadsheet made available on the web. However, preliminary survey data show that the amount and type of embedded metadata preserved in JPEG images online varies. Much seems to depend on the type of server-side software used and the type of image processing performed when resizing. Some services may claim they remove this information to decrease download time for those viewing the images (Controlled Vocabulary, n.d.).

Embedded metadata is contained within the structure of a digital file and can be descriptive, technical or administrative in nature. Reser (2010) distinguished between two types of embedded metadata: external data included in images for public use, which aims to maximize interoperability in order to convey content, source and restriction information to most users; and internal data used primarily for internal processes (“Embedded Metadata: Share, Deliver, Preserve,” 2010). The Madison Digital Image Database (MDID) is an example of a content delivery system that allows external metadata to be embedded in its exported image files.

Role of image embedded metadata
Information professionals can create additional layers of description, forming an external catalog as an integral part of the digital file, with which collection managers can strengthen the connection between the image and its data record. Technical and ownership data are often already embedded in image files. Embedded description also allows more flexible user access across platforms (“Embedded Metadata: Share, Deliver, Preserve,” 2010). Embedding descriptive data brings the following benefits:
- The connection between the image and its data record is strengthened. An image with embedded metadata becomes an access point to additional information outside the catalog record. Without embedded metadata, the identity of the image is lost and the user has no point of reference from which to begin research (Frisch, 2012).
- A file becomes self-describing, so that it can be identified and described outside its home system (Matthew Miller & Mullin, 2011).
- The value of an image increases, in terms of both potential licensing arrangements and the protection of culturally significant works. When descriptive, source and ownership metadata are embedded in image files, users can properly identify and attribute images from multiple sources, even if they have kept incomplete notes or renamed files (“Embedded Metadata: Share, Deliver, Preserve,” 2010).
- Access across platforms becomes more flexible, which enables people inside and outside an organization to work more efficiently, provides valuable data to the systems that preserve digital content, and can assist in disaster recovery (FADGI, 2015).
- It facilitates interoperability between image management systems and, by adding descriptive metadata to a digital image, allows the image to benefit from technologies that can harvest and extract that metadata (EMDaWG, 2010).
- Users are able to search and sort image files in a system browser or a photo organization application (Reser, 2012b).
- Without descriptive data, digital images will lose their cultural context. In this deluge of images, how will future users know what they are looking at? Embedding descriptive metadata in image files gives these images a chance to survive (Frisch, 2012).

2. Literature Review
A search of SAGE, LISA, Emerald, ScienceDirect and EBSCO using the terms “image embedded metadata” and “cultural heritage” yielded similar results. The literature reveals a preponderance of studies focusing on four major domains, namely:

Image embedded metadata implementation
Huiskes & Lew (2008) presented a collection for the MIR community comprising 25,000 images from the Flickr website, which are redistributable for research purposes and represent a real community of users,