
Follow-Up Questions
ASERL Webinar: “Intro to Digital Preservation #2 -- Forbearing the Digital Dark Age: Capturing Metadata for Digital Objects”
Speaker = Chris Dietrich, National Park Service
Session Recording: https://vimeo.com/63669010
Speaker’s PPT: http://bit.ly/10PKvu8

UPDATED – May 23, 2013

Tools

1. Is photo watermarking available using Windows Explorer?

Microsoft Paint, which comes installed with Windows, provides basic (albeit inelegant) watermarking capabilities.

2. Do Microsoft tools capture basic metadata automatically, without user intervention?

Microsoft Office products capture very basic metadata automatically. The Author, Initials, and Company are captured automatically from a user’s Windows User Account settings. File system properties like File Size, Date, etc. are also automatically captured. The following Microsoft Knowledge Base articles provide details for each Microsoft Office product: http://office.microsoft.com/en-us/access-help/view-or-change-the-properties-for-an-office-file-HA010354245.aspx, http://office.microsoft.com/en-us/help/about-file-properties-HP003071721.aspx. Microsoft SharePoint can be configured to automatically capture metadata for items uploaded to libraries: http://office.microsoft.com/en-us/sharepoint-help/introduction-to-managed-metadata-HA102832521.aspx.

3. Can you recommend tools/services that leverage geospatial data that do not provide latitude & longitude information? For example, I want to plot a photo of “Mt Doom” on a map but have no coordinates….

Embedding geospatial coordinates in digital objects (often called “geotagging”) can be done with a number of tools. GPS Photo Link (http://www.geospatialexperts.com/gps-photo%20link.php) allows users to add coordinates to embedded metadata manually, or by selecting one or more photos and then clicking a point on a Bing Maps satellite image. Other software with similar capability includes ArcGIS from Environmental Systems Research Institute (www.esri.com), and Geotag, available from SourceForge (http://geotag.sourceforge.net/). There are many others!
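Coordinates picked from a map usually arrive as decimal degrees, while Exif stores latitude and longitude as degrees, minutes, and seconds plus a separate N/S or E/W reference. The following is a minimal sketch of that conversion only; the function name is hypothetical, and writing the values into a file would still require one of the tools above.

```python
def to_exif_gps(decimal_degrees, is_latitude):
    """Convert decimal degrees to (ref, (degrees, minutes, seconds))."""
    if is_latitude:
        ref = "N" if decimal_degrees >= 0 else "S"
    else:
        ref = "E" if decimal_degrees >= 0 else "W"
    # Work in whole seconds (rounded to 2 decimals) so that rounding
    # can never produce a value such as 59 minutes 60.0 seconds.
    total_seconds = round(abs(decimal_degrees) * 3600, 2)
    degrees = int(total_seconds // 3600)
    minutes = int((total_seconds % 3600) // 60)
    seconds = round(total_seconds % 60, 2)
    return ref, (degrees, minutes, seconds)


# Hypothetical coordinates chosen only to illustrate the conversion.
print(to_exif_gps(36.5, is_latitude=True))      # ('N', (36, 30, 0.0))
print(to_exif_gps(-121.25, is_latitude=False))  # ('W', (121, 15, 0.0))
```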

4. Can you recommend any open source tools?

It’s almost impossible to make recommendations without a good understanding of requirements. There are many open source tools for managing metadata for digital objects. A good place to find open source metadata tools is SourceForge (http://sourceforge.net/). Wikipedia has the following lists of photo software:

• Free photo software: http://en.wikipedia.org/wiki/Category:Free_photo_software
• Photo Software: http://en.wikipedia.org/wiki/Category:Photo_software

5. Can you provide screenshots of using at least a couple of the tools you described?

GPS Photo Link Start Page

GPS Photo Link Attribute Editor Tab

MS Windows JPEG Properties (metadata)

MS Windows JPEG Advanced Properties (metadata)

MS Word 2010 Document Properties (metadata) Panel (available from document editing view)

MS Word 2010 Advanced Properties (metadata)

Formats

6. For the NPS metadata standards, what’s the difference between “Access constraints” and “constraints information”?

Access Constraints describes the type of constraint on the object. Constraints Information provides for a free-text description of the constraint. For example:

Access Constraints: “Restrictions apply on use and/or reproduction (Sensitive material)”

Constraints Information: “Abandoned mineral features may pose safety hazards, be archeological sites, or be endangered species habitat.”

7. What is the DNG format?

DNG is an open RAW image format often referred to as a “digital negative”. DNG supports embedded metadata and generally provides somewhat smaller file sizes than other RAW formats. It has received mixed adoption since its introduction by Adobe Systems Inc. in 2004. Not all camera and software makers support the DNG format. An ISO standard based on DNG 1.3 is pending.

8. Which formats (or standards) are best for preservation, and why?

TIFF (for Tag (or Tagged) Image File Format) is widely adopted as the archival or master format for digital images. The following features of the TIFF format contribute to its desirability for preservation:

• Lossless – there is no loss of data
• Embedded metadata – supports metadata
• Longevity – TIFF has been in use for over 20 years
• Wide adoption – the most widely used format for preserving digital images
• Broad support – supported by most image viewers/editors and web browsers

TIFF resources:

• Adobe TIFF page: http://partners.adobe.com/public/developer/tiff/index.html
• Library of Congress Digital Preservation TIFF page: http://www.digitalpreservation.gov/formats/fdd/fdd000022.shtml

Metadata

9. Can you provide a specific example of embedded metadata?

Here is an NPS JPEG photo that has metadata embedded in the Exif header:

The photo has been watermarked and renamed using elements in the embedded metadata.

Here is a link to the photo online: https://www.dropbox.com/s/iumr2n3ynw1mu1d/MORU_20110714_tag.JPG?v=0mwng. You can open the photo to view all metadata using Opanda IExif, GPS Photo Link, or Photo Studio.

10. I’m concerned about the cost of creating embedded metadata (versus external metadata) – what’s the right balance of costs & level of effort to create and maintain these data?

Creating metadata, whether embedded or external, will have some cost. Unfortunately, there is no easy way to estimate costs. Choosing one method over the other will not necessarily result in savings. It will take a concerted effort to determine what resources (people, equipment, software, budget, etc.) are available now and into the future, and determine from that analysis the best approach for your organization.

Things to consider:

• The cost of software to embed metadata versus the cost of creating and maintaining an external metadata system
   o External metadata systems run the gamut from sidecar text files or spreadsheets stored in the same directory as the digital object (cheap), to digital asset management (DAM) systems/databases (can be expensive)
   o Embedded Exif metadata and Windows Explorer search/discovery is affordable and may be adequate for many organizations
• If you will be storing the metadata in a database, spreadsheet, XML, or text file:
   o How will you ensure that the digital objects and the external metadata do not get separated?
   o How will metadata get updated if something changes with the digital objects?
   o How will support for the external metadata system be provisioned for the long term?
   o Will you be able to digitally migrate the external metadata to newer formats/versions as needed to ensure future readability/access?
• If metadata will be embedded in the digital assets themselves:
   o How will the embedded metadata be made viewable by humans and/or data systems?
   o How will you store, discover, and retrieve objects using the embedded metadata?
   o Will you be able to digitally migrate the objects over time and still preserve the metadata?
• If both embedded and external metadata will be used:
   o How will metadata embedded in digital objects remain synchronized with the external metadata?
   o Will you be able to digitally migrate the objects over time and still preserve the metadata?
• Do you have staff dedicated to work on metadata and data management and/or can you contract for the work?
   o Initial digitizing, organizing, and metadata creation may be a short-term cost
   o Management and curation of digital objects is a long-term commitment of people and money
• Work on the highest priority/most valuable items first. This will pare down the size of the project (and costs) and will better ensure that the most important items get documented if your effort stalls or ends prematurely.
• If you plan to manage digital objects for the long term, your initial management system is likely to be replaced (in whole or part) every five to ten years. Do you have the resources to support that turnover?
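One of the questions above, how to keep digital objects and external metadata from getting separated, can be checked routinely with a small script. The sketch below assumes a hypothetical convention in which each image has a JSON sidecar with the same base name in the same folder (photo.jpg paired with photo.json); the suffix list and function name are illustrative, not an NPS practice.

```python
from pathlib import Path

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".tif", ".tiff", ".dng"}  # assumed set
SIDECAR_SUFFIX = ".json"  # assumed convention: photo.jpg -> photo.json


def find_separated(folder):
    """Return (objects without a sidecar, sidecars without an object)."""
    files = [p for p in Path(folder).iterdir() if p.is_file()]
    objects = {p.stem: p.name for p in files
               if p.suffix.lower() in IMAGE_SUFFIXES}
    sidecars = {p.stem: p.name for p in files
                if p.suffix.lower() == SIDECAR_SUFFIX}
    # Compare base names in both directions.
    missing_sidecar = sorted(objects[s] for s in objects.keys() - sidecars.keys())
    orphan_sidecar = sorted(sidecars[s] for s in sidecars.keys() - objects.keys())
    return missing_sidecar, orphan_sidecar
```

Running find_separated on a folder after every move, copy, or migration gives an early warning that a pair has been split, which is much cheaper than discovering it years later.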

11. Does embedded metadata affect the use and management of checksums?

I am not an expert in digital forensics so the following is my limited understanding. Checksums can be applied to any kind of file to determine if a file has changed or if two files are identical. If the checksum is for the file as a whole, then editing embedded metadata after applying the checksum will alter the checksum hash value, indicating that the file has been modified. Some digital object formats (e.g. Broadcast Wave files) allow checksums to be applied only to the data portion of the file, thus allowing modification of embedded metadata without affecting the checksum. I am not sure whether partial checksums are possible with JPEG and other file formats.

Here is a resource on this topic: Federal Agencies Digitization Guidelines Initiative: http://www.digitizationguidelines.gov/audio-visual/documents/md5.html

12. What is the importance of the different kinds of metadata in a digital preservation strategy?

• Descriptive – discovery of assets and discerning between similar assets (disambiguation)
• Technical (instrument) – analysis and usefulness of the asset for a particular use/consumer
• Administrative – management
• Structural – how is the asset(s) related to other assets? This affects asset storage, rendering, and access/sharing.
• Preservation – what is the asset’s history, ownership and disposition schedule?
• Rights – who can access an asset, when do use-terms expire, etc.

These categories are useful for understanding elements; they are not a strict, mutually exclusive taxonomy. Some categories can be considered subcategories of others, such as Rights being a subset of Preservation or Administrative. Categories aid in understanding why an element is being captured and how to use the element. Some elements may belong to multiple categories, for example Date Created:

• Technical – instrument date = the day/time the asset was created
• Descriptive – the date can aid consumers in discovery of and disambiguation among assets
• Rights – the date can help determine when use-terms expire for an asset
• Preservation – the date can be used to determine when to dispose of an asset

13. What do XMP and IPTC metadata look like?

See the answer to Question 15.

14. What are some examples of Dublin Core elements?

The Dublin Core Metadata Initiative (http://dublincore.org/) manages the Dublin Core metadata standard. The standard is designed to be simple and useful for documenting a wide variety of digital and non-digital objects.

Here are the 15 elements of the Simple Dublin Core standard:

1) Contributor
2) Coverage
3) Creator
4) Date
5) Description
6) Format
7) Identifier
8) Language
9) Publisher
10) Relation
11) Rights
12) Source
13) Subject
14) Title
15) Type

The entire element set can be viewed at: http://dublincore.org/documents/dces/
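To make the element set more concrete, here is a minimal sketch that writes a handful of Dublin Core elements as XML using the Dublin Core elements namespace (http://purl.org/dc/elements/1.1/). The wrapping "metadata" element, the function name, and all values are hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)  # serialize elements with the dc: prefix


def dublin_core_record(fields):
    """Build a <metadata> wrapper with one dc:* child per element/value pair."""
    root = ET.Element("metadata")  # assumed wrapper element name
    for name, value in fields.items():
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


# Hypothetical values for a single photo.
record = dublin_core_record({
    "title": "Sample photo title",
    "creator": "Sample Photographer",
    "date": "2011-07-14",
    "format": "image/jpeg",
    "rights": "Public domain",
})
print(record)
```

Only five of the fifteen elements are used here; in Simple Dublin Core every element is optional and repeatable, so a record can carry as few or as many as the material warrants.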

15. Of the EXIF, XMP, IPTC metadata, which is best for what purposes?

Exif (Exchangeable Image File Format) is designed to capture technical metadata taken from the camera or scanner. Exif also captures a few descriptive metadata elements. There are ‘unused’ hexadecimal addresses in JPEG file headers that can be used (as the NPS does) to capture additional metadata.

XMP (Extensible Metadata Platform) is another space in JPEG (and other) file headers used to capture mostly descriptive metadata. XMP is developed and supported by Adobe Systems Inc. and is supported by most Adobe software products. Some software by non-Adobe makers also supports the XMP standard.

IPTC (International Press Telecommunications Council) metadata is designed for use primarily with news and stock photos. It originally had its own separate header space, but has since been incorporated into the XMP space.

Here’s a graphic showing the relationship between these three metadata standards:

For more information on these embedded metadata standards see: http://www.metadataworkinggroup.com/specs/
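For readers wondering what XMP looks like in practice (Question 13): XMP is serialized as an XML (RDF) packet wrapped in an x:xmpmeta element, so it can often be located by scanning a file’s raw bytes. The sketch below does only that; the function name and the sample packet are hypothetical, and a production tool should parse the file structure properly rather than search for text.

```python
def extract_xmp_packet(data):
    """Return the first XMP packet found in raw file bytes, or None."""
    start = data.find(b"<x:xmpmeta")
    if start == -1:
        return None
    end_tag = b"</x:xmpmeta>"
    end = data.find(end_tag, start)
    if end == -1:
        return None
    return data[start:end + len(end_tag)].decode("utf-8", errors="replace")


# Hypothetical packet surrounded by stand-in bytes for the rest of a file.
SAMPLE_PACKET = (
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/">'
    b'<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    b'<rdf:Description rdf:about="" '
    b'xmlns:dc="http://purl.org/dc/elements/1.1/">'
    b'<dc:title><rdf:Alt><rdf:li xml:lang="x-default">Sample title'
    b'</rdf:li></rdf:Alt></dc:title>'
    b'</rdf:Description></rdf:RDF></x:xmpmeta>'
)
fake_file = b"\xff\xd8 image header bytes " + SAMPLE_PACKET + b" image data"
print(extract_xmp_packet(fake_file))
```

To try it on a real image, read the file in binary mode (open(path, "rb").read()) and pass the bytes to extract_xmp_packet. Note that the sample packet stores its title in a Dublin Core element (dc:title), which is common in XMP.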

16. What are some solutions to generating metadata automatically?

• SharePoint and other asset repository systems can be configured to automatically embed metadata as assets are uploaded and can be used to require metadata creation on upload/ingest.
• Exif – technical metadata is automatically generated by the camera/scanner. Geospatial metadata can be automatically generated by GPS-enabled cameras. Software (GPS Photo Link and others) can be used to embed metadata captured by a separate GPS unit.
• Operating systems – file managers built into computer operating systems (e.g. Windows Explorer) can batch edit embedded metadata and rename photos. Operating systems can also embed some user account information in digital assets.

17. Do you have any recommendations for tools that embed metadata in batch?

NOTE: The options listed here do not constitute endorsements by the National Park Service.

• Windows Explorer – batch rename items and update embedded metadata (aka properties).
• Adobe Lightroom/Bridge – rename multiple assets, manage batch renaming options, create metadata templates to apply metadata to multiple images.
• GPS Photo Link – batch import and edit embedded metadata, use metadata elements to rename and watermark photos, generate multiple outputs including marked-up images, reports, maps, tables, XML, and geospatial data files.

18. Who on your staff (what position) is entering your metadata, and how specific/detailed is it?

At the National Park Service, metadata creation is done by a wide variety of people working in different program areas, often as a collateral duty. The challenge for us is to develop Servicewide metadata guidance and standards and to get them adopted across well over 400 park and program units from American Samoa, to Alaska, to the Virgin Islands.

NPS Geographic Information Systems (GIS) practitioners are probably the most experienced at creating metadata, most of which is very technical and detailed about datums, projections, etc. Another experienced group is the NPS librarians, who have been creating metadata in the form of catalog records since before the term metadata came into widespread use.

Technical metadata (resolution/sampling rate, settings, date/time, etc.) are often captured by the instrument and embedded in the digital file. Recommended “required” metadata for most NPS digital objects includes the following:

1) Title – who or what is in the image
2) Image_Content_Place – where the image was taken
3) Image_or_Set_Create_Date – when the image was taken
4) NPS_Unit_Alpha_Code – (e.g. Park code) used for managing or filtering groups of records and/or for linking systems
5) Metadata_Access_Constraints – who may view the record/image (e.g. Public, NPS staff access only)
6) Constraints_Information – explanation of restrictions, if any, on access/use of the image or the metadata
7) Contact_Information => Contact Organization – who to contact for further information

Although these elements are considered Mandatory, it is understood that they may not always apply. For example, some images, such as those created for Servicewide programs, may not have an NPS_Unit_Alpha_Code. Constraints_Information provides a place to describe any and all restrictions ranging from copyright to security. However, if there are no restrictions on access to or use of an image, then the Constraints_Information element may be blank.

We recommend that staff take a pragmatic approach to completing mandatory metadata elements. If it applies, use it. If it does not apply, move on to the next one.
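The pragmatic approach described above can be sketched as a simple record check: flag required elements that are blank, but treat the two elements the text names as sometimes not applicable (NPS_Unit_Alpha_Code and Constraints_Information) as items to confirm rather than errors. This is an illustration only; the function name, the key "Contact_Organization", and the decision to limit the may-be-blank set to those two elements are assumptions, not NPS rules.

```python
REQUIRED_ELEMENTS = [
    "Title",
    "Image_Content_Place",
    "Image_or_Set_Create_Date",
    "NPS_Unit_Alpha_Code",
    "Metadata_Access_Constraints",
    "Constraints_Information",
    "Contact_Organization",  # assumed key for Contact_Information => Contact Organization
]

# Assumption: only the two elements cited as examples may be left blank.
MAY_BE_BLANK = {"NPS_Unit_Alpha_Code", "Constraints_Information"}


def review_record(record):
    """Return (blank elements to fill in, blank elements to confirm as not applicable)."""
    blank = [name for name in REQUIRED_ELEMENTS
             if not str(record.get(name, "")).strip()]
    must_fix = [name for name in blank if name not in MAY_BE_BLANK]
    confirm = [name for name in blank if name in MAY_BE_BLANK]
    return must_fix, confirm


# Hypothetical record for a Servicewide image with no restrictions.
sample = {
    "Title": "Sample title",
    "Image_Content_Place": "Sample place",
    "Image_or_Set_Create_Date": "2011-07-14",
    "Metadata_Access_Constraints": "Public",
}
print(review_record(sample))
```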

19. What’s the value of standards such as DC and CCO for both professional and amateur/public use?

Metadata standards are SO important. They provide a common framework for describing, understanding, and managing digital assets. By using recognized standards you signify your organization’s commitment to best-practice approaches to digital asset management. You also make it easier for the public to discover and use digital assets in your collection. Finally, it is likely that adopting or adapting existing standards will be much easier, cheaper, and result in fewer errors, than developing your own metadata standard. Here is a good resource for comparing and evaluating metadata standards: http://en.wikipedia.org/wiki/Metadata_standards#Available_metadata_standards

Other

20. We have trouble ensuring that photo release forms (on paper) are tracked with the corresponding photo. Any tips?

I don’t have any experience with a photo release form tracking system. I asked for input from others in the NPS and have yet to receive a reply. I will continue to follow up on this question and respond to the webinar moderators when I can.

21. Should we duplicate embedded metadata in separate files, in case the original files/data get corrupted? If yes, any suggested practices?

It’s always good practice to keep multiple copies of important digital data in different locations on different types of storage media. Whether the metadata is embedded or in sidecar files, keeping duplicate copies of the files in different locations is important. You can set up a mutual assistance offsite backup system with another office in a distant location with which you can exchange data on discs, external hard drives, magnetic tape, USB drives etc. This could be another office/unit in your organization, or even an external partner organization. The key is to have the exchange partner far enough away that a catastrophic event does not affect both locations. Catastrophic events aren’t necessarily large-scale natural disasters but can include things like power outages and network failures. You can set up a scheduled time to exchange data, or just exchange data as needed.

Another option is to use an online (aka cloud) backup service. Some services provide a limited amount of free storage and charge fees over a certain amount. Search the internet for the terms “cloud backup services” and “online backup services”.
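Whichever backup route you choose, it helps to confirm periodically that the copies still match the originals. The checksums discussed in Question 11 are one way to do that. The sketch below compares two folders file by file; SHA-256 is used here as one common choice (the FADGI resource cited under Question 11 discusses MD5), and the function names are hypothetical.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1024 * 1024):
    """Compute a whole-file SHA-256 checksum, reading in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def compare_copies(original_dir, backup_dir):
    """List files that are missing from, or differ in, the backup folder."""
    original_dir, backup_dir = Path(original_dir), Path(backup_dir)
    problems = []
    for source in sorted(p for p in original_dir.iterdir() if p.is_file()):
        copy = backup_dir / source.name
        if not copy.is_file():
            problems.append((source.name, "missing"))
        elif sha256_of(source) != sha256_of(copy):
            problems.append((source.name, "differs"))
    return problems
```

Because these are whole-file checksums, a file whose embedded metadata was edited after the backup was made will be reported as "differs" even if the image data is unchanged, which is the behavior described in the answer to Question 11.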

22. Should we keep all formats of an image (raw, , , etc.) for archival purposes?

That depends on your requirements and availability of storage. You will need to strike a balance between how much storage you can afford and how important it is to deliver large lossless image files like RAW and TIFF. Some digital asset management (DAM) systems will create lower resolution derivative/proxy files on demand so that you only have to store the archival files. To improve download performance, some DAMs create the proxies beforehand so users don’t wait for them to be created at download time. If you have plenty of storage and relatively low demand, you can store just the high-resolution files and manually create proxies when requested.

23. Can you give us recommendations for file naming conventions?

• Keep file names as simple as possible. Don’t try to use the file name as a substitute for good descriptive metadata (embedded or sidecar). Filenames can only be so long and they aren’t good metadata containers.
• Excessively long file naming conventions are less likely to be correctly or consistently applied.
• Spend time planning the file naming convention and get input from as many stakeholders as possible.
• Once you select a file naming convention stick with it. It’s OK to modify the convention if circumstances change or you forgot a key piece, but try to keep changes to a minimum.
• Here’s a link to an NPS file naming convention from 2006 which I believe is still in use: https://www.dropbox.com/s/6xxvh5dglpr4pii/MortensonD_2006_SWAN_FilenamingCheatSheet.?v=0rw-g
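A short, consistent convention is easiest to apply when the names are generated rather than typed. The sketch below composes a name from a unit code, a creation date, and a sequence number. The UNIT_YYYYMMDD_NNN pattern is hypothetical, loosely modeled on the example file name in Question 9, and is not the convention in the linked NPS cheat sheet; the four-letter unit code check is likewise an assumption.

```python
import re
from datetime import date


def build_filename(unit_code, created, sequence, extension="jpg"):
    """Compose UNIT_YYYYMMDD_NNN.ext from a few metadata values."""
    # Assumption for this sketch: unit codes are exactly four letters.
    if not re.fullmatch(r"[A-Za-z]{4}", unit_code):
        raise ValueError("unit code is assumed to be four letters")
    if sequence < 0:
        raise ValueError("sequence must not be negative")
    return (f"{unit_code.upper()}_{created:%Y%m%d}_"
            f"{sequence:03d}.{extension.lower()}")


print(build_filename("moru", date(2011, 7, 14), 7))  # MORU_20110714_007.jpg
```

Everything else that describes the photo (subject, place, rights) belongs in embedded or sidecar metadata rather than in the name.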

24. What is your suggested workflow for each of the following areas? Photo, text docs, audio

There are any number of potential workflows depending on organizational goals and priorities. The same basic principles apply to all digital objects/formats. Here are some suggestions:

Identify, Decide, Organize, and Make copies (IDOM): http://blogs.loc.gov/digitalpreservation/2011/10/four-easy-tips-for-preserving-your-digital-/ (be sure to read the comments following this article).

For guidance currently being developed in the National Park Service, here are the suggested steps:

1) Inventory: find out what assets you’ve got
2) Prioritize: delete low-value assets, keep high-value assets for next steps (also see response to Q 25 below)
3) Categorize: develop a subject taxonomy to tag and/or organize assets by whatever scheme makes sense
4) Describe: this means metadata!
5) Backup: create multiple copies of digital assets and keep in different physical locations
6) Archive: your institutional repository and/or the National Archives and Records Administration (NARA)
7) Share: within and outside your organization as appropriate

25. How do you decide what to keep (appraisal)?

• Triage: Delete low-value assets. Low-value criteria may include poor metadata availability, poor image/recording quality, inappropriate subject matter, redundant content, unknown location/subject, etc.
• For born-digital assets, triage begins at the device (camera, scanner, recorder, etc.).
• Questionable items can be placed in a “triage” folder for review by specific people or a triage review team.