Journal of Arab Federation for Libraries & Information "Ialam"

Image Embedded Metadata in Cultural Heritage Digital Collections on the Web: An Analytical Study

Emad I. Saleh Associate Professor (1) Information Science Department, King Abdulaziz University, Jeddah, Saudi Arabia (2) Library and Information Science Department, Helwan University, Egypt

Abstract Information institutions used to create metadata describing the content and properties of information resources and package it separately from the resource itself. However, given today’s web environment, the increasing number of published digital objects and the need to share the metadata describing those objects, it became necessary to embed metadata as an integral part of the objects. In fact, embedding standard metadata in a file structure makes a file self describing, helps identify the resource outside its home system, and supports the searchability, discoverability and management of those resources.

The web abounds with sites for disseminating and sharing pictures because of their importance in all aspects of life; in fact, in 2012 an estimated 250 million images were posted daily on . Hence this study focuses on verifying the following hypothesis: most digitally derived images of cultural resources are stripped of the metadata that identifies them once they are published on the web.

The study seeks to achieve a number of research objectives, including: analyzing the images of heritage websites and identifying the metadata standards they include; defining the importance of the descriptive metadata contained in images for retrieval and archiving; and investigating the standards and tools needed to embed descriptive metadata in image files. The study follows a descriptive, analytical methodology, using a checklist as the data collection tool. The findings show that a remarkably low percentage of the images analyzed contain metadata, that no links exist between image embedded metadata and its metadata record or the pages of the websites analyzed, and that there is significant usage of XMP to encode embedded metadata within the images.

Keywords: Image Embedded Metadata; Cultural Heritage; Digital Collections; Image Metadata


Emad, I. Saleh

1. Introduction

For some time now, the world has witnessed a growing trend to preserve cultural heritage and make it accessible for public users over the web. There have been several initiatives and international collaborations, such as the World Digital Library and Europeana, whose goal is to make digital cultural heritage collections widely available by targeting diverse user communities with varying levels of subject knowledge for both leisure and education purposes (Klimaszewski, 2016). Information institutions used to create metadata describing the content and properties of information resources as separate entities from the resource itself. Nowadays, however, given the web environment, the increasing number of published digital objects and the need to share the metadata describing those objects, it has become necessary to embed the metadata as an integral part of the objects. In fact, embedding standard metadata in a file structure makes a file self-describing, helps identify the resource outside its home system and improves the searchability, discoverability and interoperability management of information objects.

The International Press Telecommunications Council (IPTC) stated that: “In the online world, there can be many copies of a single image or video file, and with millions of images or videos on the internet, metadata is essential for identification and copyright protection. We should ensure this metadata travels with the content as a digital label, and remains with it over its lifetime. The metadata associated with an image or video can provide information about: copyright and copyright holders’ associated information, image content, search terms (keywords), technical details of the , rights restrictions for use of the image, etc.” (IPTC—International Press Telecommunications Council, 2015).

In 2008, the Stock Artists Alliance (SAA) launched a comprehensive “MetaSurvey” to investigate practices around metadata use and preservation among major stock image distributors. The SAA conducted a random sampling of files available on the websites of five distributors, which included , Corbis, Jupiterimages, Masterfile and , to document the presence of metadata in both “thumbnails” and larger “preview” images. The team then tracked sample images to see what happened to embedded metadata as files were forwarded from distributors to multiple sub-distributors. Preliminary findings included the following: too many images in the licensing market lacked key identifying and content information (caption/descriptive information, management information); there was little use of XMP or ICC profiles on the web; IPTC “Legacy” metadata was the most popular; few thumbnails had any metadata; slightly more than half of the sample in preview mode had IPTC/XMP; most of the sample did not have XMP metadata; and the most popular IPTC metadata fields were credit, copyright notice, object name and caption. The survey attributed the missing metadata to internal workflows, server-side processing and special processes, such as watermarking (Riecks, 2008; SAA, n.d.-a).

The objective of the Controlled Vocabulary Survey, which began in November 2009, was to find out whether websites and image-sharing services preserve embedded image metadata after upload. Unfortunately, there is no conclusion to this survey, except an updated spreadsheet made available on the web. However, based on preliminary survey data, the amount and type of embedded metadata preserved in JPEG images online varies. Much seems to depend on the type of server-side software used and the type of image processing performed when resizing. Some services may claim they remove this information to decrease download time for those viewing the images (Controlled Vocabulary, n.d.).

Embedded metadata is contained within the structure of a digital file and can be descriptive, technical, or administrative in nature. Reser (2010) (“Embedded Metadata: Share, Deliver, Preserve” 2010) distinguished between two types of embedded metadata: external data included in images for public use, which aims to maximize interoperability in order to convey content, source, and restriction information to most users; and internal data used primarily for internal processes. The Madison Digital Image Database (MDID) is an example of a content delivery system that allows external metadata to be embedded in its exported image files.

Role of image embedded metadata

Information professionals can create additional layers of description to form an external catalog as an integral part of a digital file with which collection managers can strengthen the connection between the image and its data record. These technical and ownership data are often found embedded. Embedded description also creates more flexibility for user access across platforms (“Embedded Metadata: Share, Deliver, Preserve” 2010). By choosing to embed descriptive data, the following benefits can be achieved:

- The connection between the image and its data record can be strengthened. An image with embedded metadata becomes an access point to additional information outside the catalog record. Without embedded metadata, the identity of the image is lost and the user has no point of reference to begin research (Frisch, 2012).
- A file becomes self-describing so that it can be identified and described outside of its home system (Matthew Miller & Mullin, 2011).
- The value of an image increases, in terms of both potential licensing arrangements and the protection of culturally significant works. When descriptive, source, and ownership metadata are embedded in image files, users can properly identify and attribute images from multiple sources, even if they have kept incomplete notes or renamed files (“Embedded Metadata: Share, Deliver, Preserve,” 2010).
- There is increased flexibility for accessing across platforms, which enables people in and outside an organization to work more efficiently, provides valuable data to the systems that preserve digital content, and can assist in disaster recovery (FADGI, 2015).
- It assists and facilitates interoperability between image management systems, and adds descriptive metadata to a digital image, allowing it to take advantage of technologies that can harvest and extract that metadata (EMDaWG, 2010).
- Users are able to search and sort image files in a system browser or a photo organization application (Reser, 2012b).
- Without descriptive data, digital images will lose their cultural context. In this deluge of images how will future users know what they are looking at? Embedding descriptive metadata in image files offers us the chance to survive (Frisch, 2012).

2. Literature Review

A search using the terms “Image embedded metadata” and “cultural heritage”, in SAGE, LISA, Emerald, Science Direct and EBSCO, yielded similar results. The literature reveals a preponderance of studies focusing on four major domains, namely:

Image embedded metadata implementation

Huiskes & Lew (2008) presented a collection for the MIR community comprising 25,000 images from the which are redistributable for research purposes and represent a real community of users, both in image content and image tags. In addition, the researchers discussed several challenges for benchmarking retrieval and classification methods. Christensen and Dunlop (2010) demonstrated the ongoing process of establishing core embedded metadata within the Smithsonian Institution through the work of the Smithsonian Embedded Metadata Group. The focus of the working group described within this paper is the creation of core embedded metadata fields for use in still images.

Sheryl Frisch (2012) illustrated the practice of embedding metadata into image files at CalPoly’s Art and Design Department’s Visual Resources Collection. She showed the evolution of embedded metadata from a method of applying descriptive label information to images, to its use in cataloging, and how the practice can lead to the discovery of digital resources.

Image embedded Metadata Schema & Standards

Using standards for embedded metadata, whether descriptive, technical, structural or administrative (i.e., IPTC and XMP), can aid in searchability, interoperability, provenance, rights management and data repurposing (Christensen & Dunlop, 2010). In , there are two types of embedded metadata: 1) technical metadata, which is usually embedded automatically by the equipment software that produces the file (i.e., equipment name, manufacturer, and capture date); and 2) descriptive metadata, which requires the manual input of additional information about the image of a library, museum or archive. This input could be automated to a certain degree, but may still require manual entry of information that is distinctive to the image (EMDaWG, 2010).

Steidl (2007) divided photo metadata into “fields” (user view) or “properties” (technical view). The “metadata field” or “metadata schema” standard defines the semantics of the fields, a label for the field in the user interface, a formal technical identifier for the property and a data type for the content of this property. The “metadata value” standard defines a set of sensible values for a specific use (= field), making a “controlled vocabulary” (CV), which is a vocabulary controlled by a body.

Baca (2003) gives an overview of descriptive metadata schemas for art and architecture, including categories for the description of works of art, object ID, and VRA core. The article also focuses on the menu of controlled vocabularies and classification systems needed to populate these metadata schemas, such as the Art & Architecture Thesaurus, ICONCLASS, and others. She warned against selecting an inappropriate schema for a specific type of resource or collection, which can do a substantial disservice to both the materials and their intended users.

Concerning the selection of metadata schemas, Reser (2012) discussed a vital question resulting from debates about embedding metadata in digital images, namely: which metadata should be embedded to describe image content, especially when the content is a work of art? This question does not concern technical metadata. In April 2010, the Embedded Metadata Working Group (EMDaWG) created guidelines to propose the minimum descriptive embedded metadata for digital images at the Smithsonian Institution. This document recommends a minimal core set of embedded metadata and discusses both the IPTC Information Interchange Model and Extensible Metadata Platform (XMP) (EMDaWG, 2010). The EMDaWG recommended ten elements, distributed into two categories: first, the required core set of metadata, which includes title, copyright notice, source and creator; and second, the suggested (but not required) metadata, which may include date, description, keywords, credit/provider, job identifier and headline (formally called caption). The Group recommended avoiding removing the embedded metadata once the digital surrogates are repurposed (i.e., for web use) (EMDaWG, 2010).
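The EMDaWG split between required and suggested elements can be illustrated with a short check. The following is a minimal sketch in Python, not part of the guidelines: the field names are taken from the two lists above, while the function name, the dictionary layout and the sample record are hypothetical and serve only as an illustration.

```python
# Field names follow the EMDaWG (2010) lists quoted in the text; the function
# and the record layout are illustrative assumptions, not part of the guidelines.
REQUIRED_CORE = ("title", "copyright notice", "source", "creator")
SUGGESTED = ("date", "description", "keywords", "credit/provider",
             "job identifier", "headline")


def check_core_set(metadata):
    """Return (missing required fields, missing suggested fields)."""
    # A field counts as present only if it carries a non-empty value.
    present = {name.strip().lower()
               for name, value in metadata.items()
               if str(value).strip()}
    missing_required = [f for f in REQUIRED_CORE if f not in present]
    missing_suggested = [f for f in SUGGESTED if f not in present]
    return missing_required, missing_suggested


# Hypothetical record: the copyright notice is present but empty.
example = {
    "Title": "Art Nouveau poster",
    "Creator": "Unknown",
    "Copyright Notice": "",
    "Keywords": "poster",
}
missing_required, missing_suggested = check_core_set(example)
print("Missing required:", missing_required)
print("Missing suggested:", missing_suggested)
```

Treating an empty value as missing is a design choice made here for illustration; it mirrors the idea that an element without content does not help identify the image.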

Image embedded Metadata Tools

Adobe Bridge can be used to allow the pre-cataloging of information to be recorded directly in the image file without altering live database records. The creation of custom panels in Adobe Photoshop CS4 allows administrators to incorporate multiple schemas (e.g., Dublin Core, VRA Core 4) into a single record for both internal and external uses (“Embedded Metadata: Share, Deliver, Preserve” 2010). Liu & Chen (2009) developed metadata storage and exchange through extended techniques of embedded metadata, in order to combine information about files with the files themselves and further extend and control other related information. An embedded metadata framework (EMF) was structured as a reference platform for the development of embedded metadata in the digital archive system.

The Embedded Metadata Explorer (EME), an online embedded metadata editor, was constructed as a part of a project by Matt Miller & Chris Mullin (2011) at Pratt SILS to explore contextually descriptive embedded metadata in images and other digital assets through the development of a digital archive system: a plug-in for the Omeka web publishing system (http://omeka.org/) that allows it to utilize embedded metadata. EME uses XMP metadata to embed and edit Dublin Core elements into a digital image, providing an overview of all the embedded metadata contained in an image.

The Visual Resources Collection at CalPoly has been embedding descriptive metadata since 2004. Three Adobe custom File Info panels (creator, work and image) were developed, which were soon merged into one CalPoly panel, in addition to export and import scripts in JavaScript (Frisch, 2012). Greg Reser (2012), metadata specialist at UC San Diego, developed the VRA panel under the auspices of the Embedded Metadata Working Group (EMwg) of the Data Standards Committee of the VRA, which follows the VRA core and offers a highly customizable user interface (Frisch, 2012).

Challenges & opportunities

Baca (2003) addressed challenges concerning controlling terminology or vocabularies, because there is no appropriate vocabulary tool suitable for all uses. Unless the metadata or data structure are populated with the appropriate values (terminology), the resource will be ineffectual and hard for users to find. Information resource producers and curators should select from the menu of the most appropriate vocabularies for describing and providing access points to their particular collections. Riecks (2007) discussed many obstacles to the adoption of image embedded metadata, including the inadvertent removal of original metadata due to untested workflow or proprietary software, the intentional removal by rewriting the copyright field to include their name or overwriting to match the internal system, and the failure to follow industry standards. He pointed out that we need industry-wide commitment to adopt metadata standards and best practices that have a consistent world-view approach. Johanna Bauman (2010) examined the challenges related to creating embedded metadata, including the technical limitations of the cataloger’s tools, the variety of viewing applications employed by users and the potential for the embedded information to become obsolete (“Embedded Metadata: Share, Deliver, Preserve” 2010). Reser (2012) discussed the many challenges facing the embedding of metadata for works of art: since they are cultural heritage works which are subject to research, information about them evolves over time.

From the above review, it can be concluded that: (1) most studies have focused on best practices and lessons learned from embedded metadata implementation; (2) there is a keen interest in the development of applications and panels; (3) the majority of the studies were published prior to 2012, with no sign of recent research activity to update our current knowledge and trends concerning the topic.

The Literature in Arabic

While there is an established body of literature in Arabic on metadata standards and schemas and their application in website description to increase findability, we have scarce knowledge about image embedded metadata, especially for cultural collections on the web. In an attempt to survey the body of research on image embedded metadata in the Arabic language, the researcher consulted two Arabic databases in the field: first, the El-Hadi database (AFLI, 2016), and second, the EduSearch database (Dar Almandumah, 2016). The two sources show that, so far, no studies in the Arabic language have examined the topic.


Therefore, this study is ground-breaking in that, to the best of the researcher’s knowledge, it is the first comprehensive study in the Arab world which aims to recognize the use of image embedded metadata within cultural heritage digital collections on the web. The findings of the present study may serve not only to encourage heritage digital collection providers to reconsider their metadata preservation practices and policies to enrich the content of embedded metadata, but also to raise awareness about the potential and value of embedded metadata in enhancing the findability and exchange of digital collections of cultural heritage.

3. Research Purpose and Methodology

The real issue is that without image attribution information (such as the creator/author field, the copyright notice field, or the provider), an image could be considered an “orphan work” once it leaves its website. This means that anyone downloading an image from a cultural collection or sharing it with others may not know the origin or context of the image without at least having access to some basic information remaining stored within the image file or a link to a metadata record residing at the preserving organization. Also, regarding image management applications, this kind of metadata could be extracted automatically and be useful for organizing, indexing and retrieving images. The present research initiative, which uses a field study to investigate the availability of embedded metadata within images of digital cultural collections, aims to provide empirical data to further the discussion of the potential value of image embedded metadata. It is designed to investigate and examine a proposed hypothesis that most digitally derived images of cultural resources are stripped of their metadata or metadata identification once they are placed on the web.

Research Questions

This paper seeks to address the following questions:

- Have cultural heritage digital collection portals or websites preserved embedded metadata within their images?
- Do any of the digital images have descriptive metadata (beyond system-generated metadata) already embedded in them?
- What metadata schemas are used by the image collections?
- What kind of metadata is being used or produced (i.e., descriptive, administrative, technical, etc.)?
- Are there any metadata linking the image to the organization holding the copyright or linking to the full metadata record’s (bibliographic record) URL or permanent identifiers?

Sample and data collection

1,000 images were selected randomly from 4 cultural portals and websites which aggregate digitized cultural collections of galleries, libraries, archives and museums. 397 images were excluded for invalid file formats or lack of a direct link (URL) to an image file. The final sample amounted to 603 images, distributed as follows:

Figure 1: Distribution of the study’s collection sample (Qatar Digital Library: 50, 8%; Europeana: 142, 24%; World Digital Library: 200, 33%; The Commons: 211, 35%)

Figure 2: Distribution of the study’s sample by institution type (Library: 53%; Museum: 27%; Others: 17%; Archive: 3%)

1. Europeana: Europeana Collections are divided into 22 topics; 18 of them are limited (specified) by country and four collections are not (they contain items from various countries). One of the collections, entitled “Art Nouveau posters”, was selected to be analyzed. The collection contained 536 images belonging to 20 institutions from 12 countries. After the 167 Czech Republic images were excluded because no direct links or downloadable images were available, 141 items were selected randomly (Table 1).

Table 1: Distribution of the Art Nouveau Poster Collection in the Europeana Portal

Country | Collection Items | Study Sample
Belgium | 218 | 40
Spain | 68 | 34
Croatia | 28 | 20
United Kingdom | 23 | 15
Austria | 13 | 13
Italy | 10 | 10
Europe | 3 | 3
The Netherlands | 3 | 3
France | 1 | 1
Norway | 1 | 1
Sweden | 1 | 1
Total | 536 | 141

Figure 3: Distribution of study sample of Europeana by institution type (Museum: 50%; Library: 22%; Others: 21%; Archive: 7%)

2. The Commons: Ten participating institutions were selected based on their reputation, while ensuring that the various countries were represented, with an emphasis on national libraries. The study intended to select at least the 10 most recently added images from each collection, which reflect recent practices regarding embedded metadata, but this number was exceeded in some collections to ensure data collection (Table 2).


Table 2: Distribution of Image Samples in The Commons by Institution

Institutions | Images | Sample
The Library of Congress | 26,462 | 40
British Library | 1,023,714 | 30
State Library of New South Wales | 2,912 | 23
National Library of Ireland | 1,775 | 14
National Library of Australia | 969 | 19
New York Public Library | 2,525 | 11
National Library of Scotland | 2,313 | 19
The Finnish Museum of Photography | 272 | 15
Swedish National Heritage Board | 1,642 | 25
National Library of Sweden | 165 | 15
Total | 1,034,512 | 211

3. World Digital Library (WDL): 45 images were selected, in addition to 145 items belonging to Arab institutions.

4. Arab Libraries:

- A preliminary investigation of the websites of three Arab national libraries (Egypt, Saudi Arabia and Qatar) showed that the Qatar Digital Library (QDL) is the only library that publishes a digital cultural collection on its website; therefore 50 items were selected as a sample.
- Although Bibliotheca Alexandrina is involved in many projects (i.e., Memory of Modern Egypt, Memory of the Suez Canal, etc.) to make cultural collections available on the web, it seems the library has followed the same policy and technique in presenting its digital collections, using photo albums with a tailored viewer to prevent image downloads; therefore its projects had to be excluded from this study, and the sample of its collection within the World Digital Library (WDL) was selected instead.
- Major Arab libraries and archives involved in cultural collection preservation initiatives and partnerships were surveyed, revealing that six institutions contributed their collections to the World Digital Library (Table 3).

Table 3: Distribution of Arab Collections in the WDL

Institutions | Item types (Books, Manuscripts, Journals, Maps) | Total | S.
Abu Dhabi Authority for Culture and Heritage | 10 | 10 | 10
Bibliotheca Alexandrina | 57, 10, 1 | 68 | 10
Iraqi National Library and Archives | 1509 | 1509 | 30
King Abdulaziz University | 4 | 4 | 4
National Library and Archives Egypt | 57 | 57 | 57
Qatar National Library | 125, 16, 34 | 175 | 34
Total | 182, 77, 1509, 20, 35 | 4823 | 145

Once the sample was developed, a 4-step data collection process took place: 1) each image was viewed and downloaded via its URL; 2) the image’s URL was used to examine image metadata via the web-based tool “The Embedded MetaData Explorer (EME)” (Matt Miller & Mullin, n.d.); 3) the image file was then re-examined using the Windows application “ExifToolGUI” (Hrastnik, n.d.) to verify the metadata schemas extracted by the previous tool; 4) finally, the metadata schema and the element name and value for each image were recorded, and any element name with an empty value (content) was omitted.
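Steps 2 to 4 can be approximated programmatically. The sketch below is only an illustration and is not the procedure used in the study, which relied on EME and ExifToolGUI: it assumes the XMP packet is stored as plain text inside the image file, searches the raw bytes for it, and lists the Dublin Core elements that carry a non-empty value. The function names and the synthetic sample bytes are hypothetical; only the Python standard library is used.

```python
import re
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def extract_xmp_packet(data):
    """Return the XMP packet found in raw image bytes as text, or None."""
    match = re.search(rb"<x:xmpmeta.*?</x:xmpmeta>", data, re.DOTALL)
    return match.group(0).decode("utf-8", errors="replace") if match else None


def dublin_core_elements(xmp):
    """Collect Dublin Core elements with non-empty values from an XMP packet."""
    found = {}
    for elem in ET.fromstring(xmp).iter():
        if not elem.tag.startswith("{" + DC_NS + "}"):
            continue
        name = elem.tag.split("}", 1)[1]
        # Values sit either in rdf:li items (rdf:Alt/Seq/Bag) or directly in the element.
        values = [li.text.strip() for li in elem.iter("{" + RDF_NS + "}li")
                  if li.text and li.text.strip()]
        if not values and elem.text and elem.text.strip():
            values = [elem.text.strip()]
        if values:  # elements with empty values are omitted, as in step 4
            found[name] = values
    return found


# Synthetic stand-in for a downloaded image file (not a real image).
SAMPLE_BYTES = (
    b"\xff\xd8"
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b"<dc:format>image/jpeg</dc:format>"
    b'<dc:rights><rdf:Alt><rdf:li xml:lang="x-default">'
    b"Example rights statement</rdf:li></rdf:Alt></dc:rights>"
    b"<dc:subject><rdf:Bag/></dc:subject>"
    b"</rdf:Description></rdf:RDF></x:xmpmeta>"
    b"\xff\xd9"
)

packet = extract_xmp_packet(SAMPLE_BYTES)
record = dublin_core_elements(packet) if packet else {}
print(record)
```

A byte-level search of this kind does not read EXIF or legacy IPTC-IIM blocks, so a dedicated tool such as those named in the text remains necessary for a full examination.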

4. Results and Discussion

Apart from the Dublin Core (DC) schema, a well-known, general-purpose standard used by many libraries for describing image collections, there are several standards dedicated to image metadata, namely:

- The IPTC Photo Metadata Standard: the most widely used standard to describe photos, especially among news agencies, stock photo agencies, , and related industries. The current version of the IPTC Standard consists of two schemas, the IPTC Core and IPTC Extension, which were built on the legacy of the original IPTC-IIM and utilized Adobe’s XMP technology as an enriched alternative to the IIM format (IPTC, 2016).
- The Visual Resources Association Core (VRA Core): a data standard first developed in 1996, used to describe works of visual culture as well as the images that document them. The VRA Core is hosted by the Network Development and MARC Standards Office of the Library of Congress (LC) in partnership with the Visual Resources Association. The current version (4.0) is expressed as an XML schema to support the interoperability and exchange of VRA Core records (Library of Congress, 2015).
- Categories for Describing Works of Art (CDWA and CDWA Lite): an extensive metadata schema for cataloguing and describing art, architecture, and other cultural works and related images. An XML encoding schema to describe core records for works of art and material culture based on CDWA and Cataloguing Cultural Objects (CCO) was developed in 2005, called CDWA Lite. CDWA Lite is intended for contribution to union catalogs and other repositories using the Open Archives Initiative (OAI) encoding and harvesting protocol (J. Paul Getty Trust, 2015; JISC, 2016).
- Extensible Metadata Platform (XMP): a specific type of extensible markup language and metadata platform developed by Adobe that employs the W3C standard 'RDF' (Resource Description Framework) data interchange model for encoding metadata, and utilizes existing schemas such as Dublin Core and IPTC (Adobe Systems Incorporated, 2016; SAA, n.d.-b).
- Exchangeable Image File Format (EXIF): the information captured by the camera at the point at which a picture is taken, e.g., compression applied, color information, speed, time and date, etc. EXIF is a technical metadata standard that can be written to and read from a still image and audio file. It was developed by JEITA (Japan Electronics and Information Technology Industries Association) to enable camera manufacturers to write technical data into digital images (e.g., camera settings). Although primarily used by digital cameras, some scanners will also write EXIF data. The latest release of the EXIF standard was in 2010 (JISC, 2016).

Surveying the core elements of the above schemas results in the following table, which maps the elements across the schemas.

Table 4: Dublin Core, IPTC, and VRA Element Mapping

DC | VRA Core | CDWA (core) | IPTC Core
Creator | Agent | Creator Description | Creator
Coverage | Cultural Context | --- | ---
Date | Date | Creation Date | Date Created
Description | Description | --- | Description
Description | Inscription | --- | ---
Contributor | Location | --- | ---
Format | Material | Materials/Techniques Description | ---
Format | Measurements | Dimensions Description | ---
Relation | Relation | --- | ---
Rights | Rights | --- | Copyright Notice
Coverage | Style/Period | --- | Date Created
Subject | Subject | Classification Term | Keywords
Subject | --- | Subject Indexing Terms | ---
Format | Technique | --- | ---
Title | Title | Title Text | Title or Headline
Type | Work type | Object/Work Type | ---
Coverage | --- | --- | City/State/Postal Code/Country
Identifier | --- | Repository Numbers | ---
Relation | --- | Related Textual References | ---
Elements listed without an equivalent in the other schemas: Record type, Source, State Edition, Textref
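A crosswalk such as Table 4 is commonly applied as a lookup table. The following minimal Python sketch shows how the Dublin Core to IPTC Core pairs could be used to relabel a record; it covers only pairs that can be read unambiguously from the table, and the function name and sample record are hypothetical.

```python
# Dublin Core -> IPTC Core pairs as read from Table 4. Elements without an
# IPTC Core equivalent in the table (e.g. Format, Type, Identifier) are left out.
DC_TO_IPTC = {
    "creator": "Creator",
    "date": "Date Created",
    "description": "Description",
    "rights": "Copyright Notice",
    "subject": "Keywords",
    "title": "Title or Headline",
}


def crosswalk(dc_record):
    """Relabel DC elements with IPTC Core field names; report unmapped elements."""
    mapped, unmapped = {}, []
    for element, value in dc_record.items():
        target = DC_TO_IPTC.get(element.lower())
        if target is None:
            unmapped.append(element)
        else:
            mapped[target] = value
    return mapped, unmapped


# Hypothetical record used only to exercise the lookup.
mapped, unmapped = crosswalk({"Rights": "Example rights statement",
                              "Format": "image/jpeg"})
print(mapped)
print(unmapped)
```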


An analysis of the data collected sheds light on the use of image embedded metadata by the cultural collections available on the web. Of the 603 images examined, 172 (28.5%) contained at least one descriptive metadata element, with the majority (60.7%) coming from The Commons collection. Although 28.5% of the images contained metadata, only 151 (25%) of the 603 images examined showed evidence of using the Dublin Core schema, and 114 (18.9%) of using IPTC or a variant of it. Table 5 contains a breakdown of the Dublin Core elements and IPTC fields (number used and the percentage of images in which they occurred). “Rights” is the element most frequently used (32.4%) in the Dublin Core schema. This is very close to the ratio of the ‘Copyright Notice’ field in the IPTC schema used within the images being studied; this may be due to the keen interest on the part of institutions to record statements about the rights or rights holders associated with those images. Unexpectedly, subject and keyword elements had a very low occurrence (3%) in image metadata, despite their importance for image retrieval, especially on the web.
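The headline percentages can be reproduced from the counts reported in the text. The short Python sketch below recomputes the final sample size and the three shares; the variable and function names are illustrative only.

```python
# Counts taken from the text: 1,000 images selected, 397 excluded.
SELECTED, EXCLUDED = 1000, 397
TOTAL_IMAGES = SELECTED - EXCLUDED

counts = {
    "at least one descriptive element": 172,
    "Dublin Core": 151,
    "IPTC": 114,
}


def share(count, total=TOTAL_IMAGES):
    """Percentage of the sample, rounded to one decimal place."""
    return round(100 * count / total, 1)


shares = {label: share(n) for label, n in counts.items()}
print(TOTAL_IMAGES)
for label, pct in shares.items():
    print(f"{label}: {pct}%")
```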

Table 5: Breakdown of the Dublin Core Elements and IPTC elements (Tags)

DC element | Images (No.) | Images (%) | IPTC field | Images (No.) | Images (%)
Creator | 40 | 26.5% | Keywords | 2 | 1.7%
Subject | 2 | 1.3% | Date created | 39 | 34.2%
Description | 35 | 23.2% | Byline | 64 | 56.1%
Format | 48 | 31.8% | City | 2 | 1.7%
Title | 41 | 27.1% | Headline | 21 | 18.4%
Rights | 49 | 32.4% | Credit | 21 | 18.4%
Type | 4 | 2.6% | Caption-Abstract | 52 | 45.6%
Date | 27 | 17.9% | Copyright Notice | 56 | 49.1%
Identifier | 27 | 17.9% | Source | 19 | 16.7%
 | 1 | 0.7% | Object Name | 7 | 6.1%
Creator Tool | 5 | 3.3% | | |
Total | 151 | | Total | 114 |

In Fig. 4 there is a clear trend toward using the DC schema by libraries. In addition, the DC schema sees its most frequent use by museums among the institutions of Europeana (89.6%); this may be because of its and their familiarity with it.

Figure 4: Use of Dublin Core (DC) and IPTC schemas by institution

The XMP platform was used by 14 (46.7%) institutions to encode DC metadata and by three (10%) to encode IPTC Core. The value for the “Creator Contact Information” element of the IPTC Core appeared in only five images (0.8%) with XMP encoding. Concerning technical and administrative metadata, 27.8% of the images in this study included EXIF elements, 24.4% XMP-MM (media management) elements, and 10% XMP-CRS (camera raw schema) elements.

4.1 Europeana

Europeana is a network for the cultural heritage sector, representing the world’s largest collection of Europe’s cultural heritage data as an aggregate of the digital collections of more than 3,000 institutions across Europe (libraries, archives and museums) to make them globally available (Europeana, 2015; “Why become a data provider?,” n.d.). Europeana began as a project called the European Digital Library Network (EDLnet), funded by the European Commission under its eContentplus programme. It aimed to build a prototype of a cross-border, cross-domain, user-centered service. The prototype was launched on 20 November 2008 and the project accomplished its objective of giving access to over 10 million digital objects in 2010 (“Europeana,” 2016). Table 6 shows that the collection’s use of the DC schema (34%) is close to that of the IPTC schema (29%); the majority were from the collection of the Museu Nacional d'Art de Catalunya (National Art Museum of Catalonia, Spain).


Table 6: Number of Europeana images with embedded metadata

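| Country | Institution | IPTC | XMP Basic | XMP DC | XMP Rights | XMP MM | XMP CRS | XMP IPTC-core | EXIF | DC | S. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Belgium | The Royal Institute for Cultural Heritage (KIK-IRPA) | - | 35 | 5 | 3 | 35 | 3 | - | 31 | - | 40 |
| Spain | Museu Nacional d'Art de Catalunya | 33 | 33 | 33 | 33 | 33 | 31 | 24 | 32 | - | 34 |
| Croatia | Muzej za umjetnost i obrt | 2 | 4 | 4 | - | 4 | 1 | - | 4 | - | 20 |
| UK | Victoria and Albert Museum | 4 | 2 | 2 | 2 | 2 | - | 1 | - | - | 15 |
| Austria | MAK - Österreichisches Museum für angewandte Kunst | 2 | - | - | - | - | - | - | - | 2 | 13 |
| Italy | Soprintendenza alla Galleria nazionale d'arte moderna e contemporanea | - | - | - | - | - | - | - | - | - | 4 |
| Italy | Biblioteca nazionale centrale di Roma | - | - | - | - | - | - | - | - | - | 5 |
| Italy | Archivi delle arti applicate italiane del XX secolo | - | - | - | - | - | - | - | - | - | 1 |
| Europe | European Library | - | - | - | - | - | - | - | - | - | 3 |
| Netherlands | Rijksmuseum | - | 1 | 1 | - | 1 | - | - | 1 | - | 1 |
| Netherlands | Gemeentearchief Roosendaal | - | - | - | - | - | - | - | - | - | 2 |
| France | Bibliothèque municipale de Lyon | - | - | - | - | - | - | - | - | - | 1 |
| Norway | Sverresborg Trøndelag Folkemuseum | - | 1 | 1 | - | 1 | 1 | - | 1 | - | 1 |
| Sweden | Röhsska Museum | - | - | - | - | - | - | - | - | - | 1 |
| Total | | 41 | 74 | 46 | 38 | 74 | 38 | 25 | 69 | 2 | 141 |
| % | | 29% | 52.5% | 32.6% | 27% | 52.5% | 27% | 27.7% | 49% | 1.4% | |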

Table 7 shows that the IPTC element most frequently used is “byline” (25.5%), which is equivalent to “creator” in the IPTC Core (IPTC - International Press Telecommunications Council, 2010), followed by “copyright notice” (24.1%) and “date created” (21.3%). Despite the weak presence of DC’s “creator” and “subject” elements (2.1%) within the collections, the Museu Nacional d'Art de Catalunya showed great interest in embedded metadata by using XMP to encode four DC elements (format, creator, rights, and description) along with the IPTC schema.


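Table 7: Europeana images embedded DC & IPTC elements

| Institution | DC Creator | DC Subject | IPTC Keywords | IPTC Date created | IPTC Byline | IPTC City | IPTC Headline | IPTC Credit | IPTC Caption-Abstract | IPTC Copyright Notice |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Museu Nacional d'Art de Catalunya | - | - | - | 24 | 33 | - | - | - | 8 | 33 |
| Muzej za umjetnost i obrt | - | - | - | - | - | - | - | - | 1 | - |
| Victoria and Albert Museum, UK | - | - | - | 4 | 1 | - | - | - | 4 | 1 |
| MAK - Österreichisches Museum für angewandte Kunst | 1 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | 2 | - |
| Total | 1 | 2 | 2 | 30 | 36 | 2 | 2 | 2 | 15 | 34 |
| % | 0.7% | 1.4% | 1.4% | 21.3% | 25.5% | 1.4% | 1.4% | 1.4% | 10.6% | 24.1% |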

What is interesting in the analysis of XMP encoding values is that the 'xmpRights: Usage Term' embedded in the images of the Royal Institute for Cultural Heritage (KIK-IRPA) contained a URL referring to a statement of ownership and usage rights for the resource. This is similar to the license URL in ‘xmpRights: Web Statement’, which was found in the images of the Victoria and Albert Museum.

Table 8: Europeana images embedded XMP encoded elements

| Institution | XMP-dc Format | XMP-dc Creator | XMP-dc Rights | XMP-dc Description | XMP-Rights Web Statement | XMP-IPTC-core Creator contact information |
| --- | --- | --- | --- | --- | --- | --- |
| The Royal Institute for Cultural Heritage (KIK-IRPA) | 5 | - | 3 | - | - | - |
| Museu Nacional d'Art de Catalunya | 33 | 33 | 32 | 8 | - | - |
| Muzej za umjetnost i obrt | 4 | - | - | - | - | - |
| Victoria and Albert Museum | 2 | 1 | 2 | 2 | 2 | 1 |
| Rijksmuseum | 1 | - | - | - | - | - |
| Sverresborg Trøndelag Folkemuseum | 1 | - | - | - | - | - |
| Total | 46 | 34 | 37 | 10 | 2 | 1 |
| % | 32.6% | 24.1% | 26.2% | 7.1% | 1.4% | 0.7% |


4.2 The Commons

The Commons was launched in January 2008, when Flickr released a pilot project in partnership with the Library of Congress. Cultural heritage institutions that join The Commons share images from their photographic collections that have no known copyright restrictions as a way to increase the general public’s awareness of these collections (LOC, n.d.). An analysis of the data in Table 9 shows that the DC schema was used for 103 (48.8%) of the collection’s total images, compared with 34.6% for the IPTC schema. The biggest users of the DC schema were the British Library (26.2%), the Swedish National Heritage Board (24.3%) and the National Library of Ireland (12.6%). The highest IPTC usage can likewise be attributed to the Swedish National Heritage Board (28%).

Table 9: Number of The Commons images with embedded metadata

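| Institution | IPTC | XMP Base | XMP DC | XMP Rights | XMP MM | XMP CRS | XMP IPTC-core | EXIF | DC | S. |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| The Library of Congress | - | - | - | - | - | - | - | - | - | 40 |
| British Library | - | - | 27 | - | - | - | - | 27 | 27 | 30 |
| State Library of New South Wales | 7 | 5 | 2 | - | 2 | 3 | 3 | 1 | 5 | 23 |
| National Library of Ireland | 5 | 13 | 13 | - | 13 | 9 | - | 13 | - | 14 |
| National Library of Australia | 19 | 3 | 2 | - | 3 | 1 | - | - | 2 | 19 |
| New York Public Library | - | - | - | - | - | - | - | - | - | 11 |
| National Library of Scotland | 2 | 9 | 9 | 1 | 8 | - | - | 9 | 6 | 19 |
| The Finnish Museum of Photography | 10 | 8 | 8 | - | 8 | 8 | - | 8 | - | 15 |
| Swedish National Heritage Board | 21 | 25 | 25 | - | 25 | - | - | 25 | 25 | 25 |
| National Library of Sweden | 9 | 14 | 14 | 2 | 14 | 1 | 1 | 15 | 14 | 15 |
| Total | 73 | 77 | 100 | 3 | 73 | 22 | 4 | 98 | 79 | 211 |
| % | 34.6% | 36.5% | 47.4% | 1.4% | 34.6% | 10.4% | 1.9% | 46.4% | 37.4% | |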
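The percentage row of Table 9 appears to be each column total divided by the total number of sampled images (211). The sketch below reproduces four of those figures under that assumption; the function name `share` and the dictionary keys are illustrative labels, not terms from the study.

```python
def share(count: int, sample_size: int) -> float:
    """Percentage of the sampled images, rounded to one decimal place."""
    return round(100 * count / sample_size, 1)


# Column totals taken from Table 9; 211 is the total number of images sampled.
table9_totals = {"IPTC": 73, "XMP Base": 77, "XMP DC": 100, "EXIF": 98}
sample_size = 211

shares = {name: share(n, sample_size) for name, n in table9_totals.items()}
print(shares)
```

The same calculation can be used to cross-check any other count and percentage pair reported in the tables.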

Table 10 shows that the DC element most frequently used is “rights” (23.2%), followed by “format” (22.7%), while “title” occurs less often (19.4%) in spite of its importance for resource identification. The British Library seems to have a keen interest in using DC elements, as 27 of its 30 images (90%) had four elements encoded with XMP. In addition, the British Library was the only institution to use the “identifier” element.


Table 10: The Commons images embedded DC elements

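| Institution | Creator | Description | Format | Title | Rights | Type | Date | Identifier |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| British Library | - | - | - | 27 | 27 | - | 27 | 27 |
| State Library of New South Wales | 3 | - | 2 | - | - | - | - | - |
| National Library of Australia | 1 | - | 2 | - | - | - | - | - |
| National Library of Scotland | - | - | 6 | - | - | - | - | - |
| Swedish National Heritage Board | 21 | 21 | 25 | - | 21 | - | - | - |
| National Library of Sweden | 14 | 14 | 13 | 14 | 1 | 4 | - | - |
| Total | 39 | 35 | 48 | 41 | 49 | 4 | 27 | 27 |
| % | 18.5% | 16.6% | 22.7% | 19.4% | 23.2% | 1.9% | 12.8% | 12.8% |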

The Swedish National Heritage Board and the National Library of Australia showed great interest in embedded metadata, using XMP to encode four DC elements along with the IPTC schema (Tables 11 and 12).

Table 11: The Commons images embedded IPTC elements

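| Institution | Date created | Byline | Headline | Credit | Source | Caption-Abstract | Copyright Notice | Object Name |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| National Library of Ireland | 5 | - | - | - | - | - | - | - |
| National Library of Australia | - | - | 19 | 19 | 19 | - | - | - |
| National Library of Scotland | 2 | - | - | - | - | - | - | - |
| The Finnish Museum of Photography | 1 | - | - | - | - | 9 | - | - |
| Swedish National Heritage Board | - | 21 | - | - | - | 21 | 21 | - |
| National Library of Sweden | 1 | 7 | - | - | - | 7 | 1 | 7 |
| Total | 9 | 28 | 19 | 19 | 19 | 37 | 22 | 7 |
| % | 4.3% | 13.3% | 9% | 9% | 9% | 17.5% | 10.4% | 6.6% |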

For the National Library of Australia, the IPTC source element has an image ID (i.e., nla.obj-140624777) which, when combined with the library website domain name, i.e., http://nla.gov.au/nla.obj-140624777, leads to an image metadata record page on the web; however, it could be more accessible if it were recorded in the elements as a full URL.
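The step of combining the embedded identifier with the library's domain can be sketched as follows. This is an illustration only: the helper name `to_full_url` is hypothetical, and the base address is taken from the example in the text rather than from any documented resolver.

```python
from urllib.parse import urljoin, urlparse


def to_full_url(value: str, base: str = "http://nla.gov.au/") -> str:
    """Return value unchanged if it is already an http(s) URL, else resolve it against base."""
    if urlparse(value).scheme in ("http", "https"):
        return value
    return urljoin(base, value)


record_url = to_full_url("nla.obj-140624777")
print(record_url)
```

Storing the resolved form in the element itself would spare a user who finds the image outside its home system from having to know which domain the bare identifier belongs to.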


Table 12: The Commons images embedded with XMP-Encoded Elements

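| Institution | XMP-dc Format | XMP-dc Creator | XMP-dc Rights | XMP-dc Description | XMP-dc Title | XMP-dc Type | Photographer | Date | Identifier | XMP-Rights Web Statement | XMP-IPTC-core Creator contact |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| British Library | - | - | 26 | - | 26 | - | - | 26 | 26 | - | - |
| State Library of New South Wales | 2 | - | - | - | - | - | - | - | - | - | 3 |
| National Library of Ireland | 13 | - | - | - | - | - | - | - | - | - | - |
| National Library of Australia | 2 | 1 | - | - | - | - | - | - | - | - | - |
| National Library of Scotland | 9 | - | 1 | - | - | - | - | - | - | 1 | - |
| The Finnish Museum of Photography | 8 | - | - | 3 | - | - | - | - | - | - | - |
| Swedish National Heritage Board | 25 | 21 | 21 | 21 | - | - | - | - | - | - | - |
| National Library of Sweden | 13 | 14 | 1 | 14 | 14 | 4 | 1 | - | - | 1 | 1 |
| Total | 72 | 36 | 49 | 38 | 40 | 4 | 1 | 26 | 26 | 2 | 4 |
| % | 34.1% | 17.1% | 23.2% | 18% | 19% | 2% | 0.5% | 12.3% | 12.3% | 0.9% | 2% |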

4.3 World Digital Library (WDL)

The World Digital Library (WDL) is a Library of Congress project whose main aim is to promote significant cultural content from all countries and cultures by making it available on the internet, free of charge and in multilingual format. The online version of the WDL was launched at UNESCO in April 2009 with content contributed by twenty-six institutions in 19 countries (Library of Congress, n.d.). Analysis of 50 images from the WDL with the study tools showed that none of them included embedded metadata. However, when those images were downloaded in PDF format through the WDL viewer program and the files were analyzed, it was found that 32 items (64%) had only one descriptive metadata element, ‘dc.subject’, containing a statement about the source of the downloadable file and referring to the WDL URL for more information about the item; 15 items (30%) had ‘dc.format’; four items (8%) had ‘dc.creator’, indicating “Library of Congress” as their creator; and two items (4%) had ‘dc.description’, whose content duplicated that of the subject element. All existing metadata was encoded using XMP.

4.4 Arab Countries

As mentioned earlier, a sample of 145 images belonging to six Arab institutions within the World Digital Library (see Table 3), in addition to 50 items from the Qatar Digital Library (QDL), was analyzed, and no embedded metadata was found within them. This could imply that these institutions adhered to the same technical guidelines as those established by the Library of Congress.

5. Conclusions

This study sheds light on the use of embedded image metadata by cultural collections available on the web. Results show that a remarkably low percentage of such images contain metadata elements. Even though the Library of Congress is involved in several initiatives and projects to preserve cultural heritage and make it accessible, such as the World Digital Library and The Commons, a metadata analysis of its images reveals that none of them contain embedded metadata elements. This implies that the Library of Congress followed a specific policy towards embedded metadata. By contrast, the National Library of Sweden and the British Library use the Dublin Core schema to describe their images. Although five images (0.8%) have a link to a statement of the ownership and usage rights for the images, the present study shows that there is no link between an image's embedded metadata and its metadata record or page on the websites studied. Also, there is significant usage of XMP to encode embedded metadata within images because of its ability to accommodate the requirements of the different metadata schemas.

This research faced a number of obstacles related to downloading and saving images. Arab sites were particularly problematic because few Arab images were available and downloadable on the websites of cultural institutions, and recording the results produced by the image analysis tools was not easy for the researcher. Despite this, the study brought up many questions in need of further investigation. More work needs to be done to address the challenges facing libraries and archives that deal with cultural collections and are interested in image embedded metadata, especially Arab national libraries.


References

Adobe Systems Incorporated. (2016). Extensible Metadata Platform (XMP). Retrieved from http://www.adobe.com/products/xmp.html

Baca, M. (2003). Practical Issues in Applying Metadata Schemas and Controlled Vocabularies to Cultural Heritage Information. Cataloging & Classification Quarterly, 36(3-4), 47–55. http://doi.org/10.1300/J104v36n03_05

Christensen, S., & Dunlop, D. (2010). The Case for Implementing Core Descriptive Embedded Metadata at the Smithsonian. In International Conference on Dublin Core and Metadata Applications (pp. 80–87). Retrieved from http://hdl.handle.net/10088/11123

Controlled Vocabulary. (n.d.). The Controlled Vocabulary Survey regarding the Preservation of Photo Metadata by Social Media Websites. Retrieved July 1, 2015, from http://www.controlledvocabulary.com/socialmedia/

Embedded Metadata: Share, Deliver, Preserve Organizer. (2010). VRA Bulletin, 37(3).

EMDaWG. (2010). Basic Guidelines for Minimal Descriptive Embedded Metadata in Digital Images. Retrieved from http://repository.si.edu/handle/10088/9719

Europeana. (2015). Europeana Strategy 2015-2020. Retrieved from http://strategy2020.europeana.eu/

Europeana. (2016). In Wikipedia. Retrieved from https://en.wikipedia.org/wiki/Europeana

FADGI. (2015). Guidelines: Minimal Descriptive Embedded Metadata in Digital Still Images. Retrieved September 1, 2016, from http://www.digitizationguidelines.gov/guidelines/digitize-core_embedded_metadata.html

Frisch, S. (2012). Embedded Metadata in Cultural Image Collections and Beyond: Embedding Metadata in Image Files at CalPoly, San Luis Obispo. Visual Resources Association Bulletin, 39(2), 1–15. Retrieved from http://online.vraweb.org/vrab/vol39/iss2/3

Hrastnik, B. (n.d.). ExifToolGUI: based on widely used ExifTool by Phil Harvey v9.99. Retrieved from http://u88.n24.queensu.ca/exiftool/

Huiskes, M. J., & Lew, M. S. (2008). The MIR flickr retrieval evaluation. Proceeding of the 1st ACM International Conference on Multimedia Information Retrieval MIR 08, 4(November), 39. http://doi.org/10.1145/1460096.1460104

IPTC. (2016). IPTC Photo Metadata Standard. Retrieved January 11, 2016, from https://iptc.org/standards/photo-metadata/iptc-standard/

IPTC - International Press Telecommunications Council. (2010). IPTC Core and Extension Guidelines. Retrieved from https://www.iptc.org/std/photometadata/documentation/GenericGuidelines/index.htm

IPTC - International Press Telecommunications Council. (2015). Embedded Metadata Manifesto. Retrieved July 15, 2015, from http://www.embeddedmetadata.org/

ISO. (2012). XMP: ISO 16684-1:2012, Graphic technology – Extensible metadata platform (XMP) specification – Part 1: Data model, serialization and core properties.

J. Paul Getty Trust. (2015). Categories for the Description of Works of Art. Retrieved from http://www.getty.edu/research/publications/electronic_publications/cdwa/introduction.html

JISC. (2016). Putting Things in Order: a Directory of Metadata Schemas and Related Standards. Retrieved from http://www.jiscdigitalmedia.ac.uk/guide/putting-things-in-order-links-to-metadata-schemas-and-related-standards

Klimaszewski, C. (2016). What makes you click? A Case Study of One User’s Experience of the Europeana.eu Portal. In International Workshop on Accessing Cultural Heritage at Scale (ACHS’16). Newark, NJ.

Library of Congress. (n.d.). About the World Digital Library. Retrieved January 6, 2016, from https://www.wdl.org/en/about/

Library of Congress. (2015). VRA Core Schemas and Documentation. Retrieved February 10, 2016, from https://www.loc.gov/standards/vracore/

Liu, C., & Chen, C. (2009). Archiving and Management of Digital Images Based on an Embedded Metadata Framework. International Conference on Dublin Core and Metadata Applications, 71–84. Retrieved from http://dcpapers.dublincore.org/pubs/article/view/951

LOC. (n.d.). Library of Congress Photos on Flickr. Retrieved March 16, 2016, from http://www.loc.gov/rr/print/flickr_pilot.html

Miller, M., & Mullin, C. (n.d.). The Embedded MetaData Explorer (EME). Retrieved from http://embedmydata.com

Miller, M., & Mullin, C. J. (2011). Towards Contextually Descriptive Embedded Metadata. In International Conference on Dublin Core and Metadata Applications (pp. 197–198). Retrieved from http://dcpapers.dublincore.org/pubs/article/view/3648

Reser, G. (2012a). Embedded Metadata and the do-it-yourselfer. In Annual Conference of the Visual Resources Association. Albuquerque. Retrieved from http://www.slideshare.net/VisResAssoc/vra-2012-embedded-metadata-embedded-metadata-and-the-doityourselfer

Reser, G. (2012b). What Not to Embed: is it better not to embed certain cultural heritage metadata in images? Visual Resources Association Bulletin, 39(1), 1–3. Retrieved from http://search.ebscohost.com/login.aspx?direct=true&db=lxh&AN=86972598&site=ehost-live

Riecks, D. (2007). Metadata: A Stock Photographer’s View. In International Photo Metadata Conference (pp. 1–23). Florence.

Riecks, D. (2008). Introducing the SAA Photo Metadata Project. In Second international Photo Metadata Conference. Julien, Malta. Retrieved from http://www.phmdc.org/programme2008.htm

SAA. (n.d.-a). META Insights: MetaSurvey. Retrieved July 1, 2015, from http://photometadata.org/META-Insights-MetaSurvey

SAA. (n.d.-b). META Resources Standards: XMP. Retrieved March 5, 2016, from http://www.photometadata.org/META-Resources-metadata-types-standards-XMP

Steidl, M. (2007). IPTC photo metadata standards and the art of making standards (pp. 1–6).

Why become a data provider? (n.d.). Retrieved January 4, 2016, from http://pro.europeana.eu/share-your-data/become-a-data-provider
