State Library of North Carolina
Lisa Gregory
Jennifer Ricker

Photo, flickr, DaseinDesign

What is ?

Image, “The Scream” by Edvard Munch
Image, “Orange Marilyn 1962” by Andy Warhol

Options for making files accessible over time
• Emulation
  ▪ Original hardware & software
  ▪ System that mimics original hardware & software

Image, http://forum.xcitefun.net/amazing‐3d‐sidewalk‐art‐t40240.html

Options for making files accessible over time
• Migration
  ▪ Transformation to “stable” format upon ingest
  ▪ Transformation to “stable” format when current format reaches obsolescence

Image, http://blog.builddirect.com/industryinsights/a‐little‐better‐every‐day‐really‐adds‐up/

• Small library
• Small budget
• Little IT support

Image, http://www.dogs4dogs.com/shots.html

• Investigate what others were doing
• Inventory what we had
• Identify how others were transforming the type of files that we had
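The "inventory what we had" step can be approximated with a short script. The following is a minimal sketch, not the presenters' actual method: it assumes the files sit under a single directory tree and treats the file extension as a stand-in for the format, which is only a rough guide (an identification tool such as FITS would be more reliable). The function name `inventory_extensions` is hypothetical.

```python
from collections import Counter
from pathlib import Path


def inventory_extensions(root):
    """Count files under `root` by lowercased extension (hypothetical helper)."""
    counts = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under "(none)".
            counts[path.suffix.lower() or "(none)"] += 1
    return counts
```

Calling `inventory_extensions("some/folder").most_common()` returns the extensions ordered from most to least frequent, which gives a first list of formats to research.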

Image, http://blog.riskmanagers.us/?attachment_id=2765

• Determine the transformation path and tool
• Document our expected result
• Perform the transformation
• Evaluate the results

• Document-Like: CSS, DOC, DOCX, HTML, PDF, PPT, PUB, RTF, TXT
• Spreadsheets: XLS
• Geospatial: SHP, SHX, DBF
• Audio/Video: MOV, MP3, WAV
• Images and Structured Graphics: GIF, JPG, TIF (compressed), PSD, AI
• Web Archives: ARC

• FFmpeg
  • MP3 to WAV
  • MOV to AVI
• Inkscape
  • AI to SVG
• PLANETS Testbed
  • GIF, JPG, PSD to TIF
  • DOC & DOCX to ODT
  • CSS & HTML to TXT
  • PUB & RTF to PDF/A
• XENA
  • GIF, JPG, PSD to TIF
  • DOC & DOCX to ODT
  • PPT to ODP
  • XLS to ODF
  • CSS & HTML to TXT
  • PUB to PDF/A
• ArcMap/TerraGo
  • SHP, SHX, & DBF to GeoPDF (TerraGo) & Geospatial PDF (ArcMap)

Photo, flickr, Azzazello

• Free
• Open source
• Documented
• Supported
• Audit trail/reporting
• Easy to use (preferably with GUI…not command line)
• Versatile (transforms multiple formats, single or batch, etc.)

• No visual/auditory loss of content
• No loss of metadata
• Minimal degradation in quality (look & feel)
• Minimal degradation in structure (comprehensibility)
• Minimal degradation in interactivity (functionality)

Results

Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes

Audio/Video
.mov | .mpeg-2 + mxf wrapper | | Considerable degradation in video and audio quality.
 | file + bwf header | |

Results

Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes

Images and Structured Graphics
.ai | .svg | Y * * Y Y | Font formatting and color subtly different.

* Yes, but with some loss.

.ai (original)
.svg (transformation)

Results

Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes

Document-Like
.css | .txt | Y Y Y Y |
.doc (all versions) | .odt | Y Y Y Y | Word 95 files could not be converted.
.docx | .odt | Y Y Y Y |
.html | .txt | Y Y Y Y |
.pdf | .pdf/a | N n/a n/a N | Tool could not accommodate migrating .pdf to .pdf/a.
.pub | .pdf/a | N n/a n/a | Tool could not accommodate migrating .pub to .pdf/a.
.rtf | .pdf/a | n/a n/a | Tool could not accommodate migrating .rtf to .pdf/a.

Images and Structured Graphics
.psd | .tif (uncompressed) | Y N Y | Rendered, but no .psd functionality (layers, etc.) retained.
 | .tif (uncompressed) | Y Y Y Y |
.jpg | .tif (uncompressed) | Y Y N | File header's "modified date" was changed to experiment date.

Results

Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes

Document-Like
.css | .txt | Y Y Y Y |
.doc (all versions) | .odt | Y * Y Y | Tables & tabs did not render exactly in XENA; fine in OpenOffice.
.docx | .odt | Y * Y Y | Bullets did not render exactly in XENA; fine in OpenOffice.
.html | .txt | Y Y Y Y |
.pdf | .pdf/a | N n/a n/a N | Tool could not accommodate migrating .pdf to .pdf/a.
.pub | .pdf/a | n/a n/a | Tool could not accommodate migrating .pub to .pdf/a.

Images and Structured Graphics
.psd | .png | N N |
.gif | .png | Y * Y Y | Less crisp than the original.
.tif (uncompressed) or .jpg | .png | n/a n/a | Simply wraps in XML.

Spreadsheets
.xls | .odf | | Author, manager, company metadata lost.

* Yes, but with some loss.

.ppt (original)
.odp (transformation)

Results

Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes

Geospatial
.shp, .shx, .dbf | .pdf | * | ArcMap: Embedded metadata is not currently accessible in Adobe. Both tools: Most metadata is contained in a separate .xml file that the converting tool seems to ignore.

* Yes, but with some loss.

NOTE: ArcMap and TerraGo are both proprietary software tools.
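Results like these are easier to compare across tools when each test is recorded in the same structure. The sketch below is one possible way to do that and is not described in the presentation: the `MigrationResult` class, its field names, and the `needs_review` rule are all assumptions. The flag values (Y, N, *, n/a) and the two sample rows are taken from the results tables.

```python
from dataclasses import dataclass

# Flag values used in the results tables; "" means nothing was recorded.
VALID_FLAGS = {"Y", "N", "*", "n/a", ""}
FLAG_FIELDS = ("converted", "rendered_well", "metadata_retained", "acceptable")


@dataclass
class MigrationResult:
    """One row of a results table (hypothetical structure)."""
    original_format: str
    target_format: str
    converted: str = ""
    rendered_well: str = ""
    metadata_retained: str = ""
    acceptable: str = ""
    notes: str = ""

    def __post_init__(self):
        # Reject anything other than the flags used on the slides.
        for name in FLAG_FIELDS:
            value = getattr(self, name)
            if value not in VALID_FLAGS:
                raise ValueError(f"{name}: unexpected flag {value!r}")

    def needs_review(self):
        """Assumed rule: any criterion that is not a clean 'Y' needs a second look."""
        return any(getattr(self, name) != "Y" for name in FLAG_FIELDS)


results = [
    MigrationResult(".css", ".txt", "Y", "Y", "Y", "Y"),
    MigrationResult(".pdf", ".pdf/a", "N", "n/a", "n/a", "N",
                    "Tool could not accommodate migrating .pdf to .pdf/a."),
]
for row in results:
    status = "review" if row.needs_review() else "ok"
    print(row.original_format, "->", row.target_format, status)
```

Keeping the notes alongside the flags preserves the detail behind a "*" (yes, but with some loss), which a bare Y/N field would hide.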

• Challenges expected and found
  ▪ Proprietary + less widely used
  ▪ Complex, related files
  ▪ Layers
• Surprises
  • Audio-video formats have their own complexities
  • Frame rates, compression, and codecs, oh my!
• Surprises
  • PDF/Argh

1a
• 1b restrictions PLUS
• Defined document structure (tags)
• Better accessibility
• Metadata is required

1b
• Self-contained
• No external references
• Lowest level of compliance
• Digitized materials

Photo, etsy, icehousecrafts

• Good tools to have:
  ▪ FFmpeg
  ▪ FITS
  ▪ FLAC Frontend
  ▪ Ghostscript
  ▪ Inkscape
  ▪ MPEG Streamclip
  ▪ PLANETS Testbed (RIP?)
  ▪ XENA
• Free & open source has downsides
  • “Free” in upfront costs
  • Might be developed by single person, or by hundreds
  • Learning curve can be steep
  • Documentation can be confusing/nonexistent
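FFmpeg and Ghostscript are command-line tools, so wrapping a conversion in a small script is one way to make it repeatable and to capture error output. The sketch below is an illustration under assumptions, not the presenters' workflow. The function names are hypothetical, and the executable names depend on the installation (the Windows console build of Ghostscript used in this presentation is `gswin32c`). The Ghostscript switches are the ones given in this presentation; support for them varies between Ghostscript versions, and producing a file that validates as PDF/A generally takes more than these switches.

```python
import subprocess


def ffmpeg_command(source, target, executable="ffmpeg"):
    """Build an FFmpeg call; FFmpeg picks the output format from the target extension."""
    return [executable, "-i", source, target]


def ghostscript_pdfa_command(source, target, executable="gs"):
    """Build a Ghostscript call using the switches given in the presentation."""
    return [
        executable,
        "-dPDFA", "-dBATCH", "-dNOPAUSE", "-dNOOUTERSAVE", "-dUseCIEColor",
        "-sDEVICE=pdfwrite",
        f"-sOutputFile={target}",
        source,
    ]


def run(command):
    """Run a command and return (exit code, stderr) so failures can be logged."""
    completed = subprocess.run(command, capture_output=True, text=True)
    return completed.returncode, completed.stderr


# The file names below are placeholders for illustration only.
print(ffmpeg_command("interview.mp3", "interview.wav"))
print(ghostscript_pdfa_command("report.pdf", "report_pdfa.pdf"))
```

Passing the command as a list avoids shell quoting problems with file names that contain spaces. Looping over a folder and calling `run` for each file is one route to batch processing, with the returned exit codes and error text serving as a basic audit trail.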

• Can you rock the command line?
  • gswin32c -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sDEVICE#pdfwrite -sOutputFile=newfilename inputfilename
• Build in time for stops along the road
  • Tool installation
    ▪ You probably won’t have the ideal configuration
  • Troubleshooting
    ▪ You may not get great error feedback
  • General Googling for assistance
    ▪ You might not be able to rely on the software documentation for help

On-the-fly or scheduled bulk migration?
QA – what should we use/rely on?

QA – how much should we do?
How can we facilitate batch processing?
ARC to WARC?

• Overcoming challenges to production implementation
  • Usual culprits: staff time, resources, IT restrictions, programming skills
  • The existence of technologies and workflows is not the main problem
• Testing Archivematica (archivematica.org), by Artefactual Systems
• “Archivematica is a comprehensive digital preservation system. Archivematica uses a micro-services design pattern to provide an integrated suite of free and open-source tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model.”

OUR “TOOLS TO HAVE”
▪ FFmpeg
▪ FITS
▪ FLAC Frontend
▪ Ghostscript
▪ Inkscape
▪ MPEG Streamclip
▪ PLANETS Testbed (RIP?)
▪ XENA

ARCHIVEMATICA
• digiKam DNG Convertor
• Document Converter
• FFmpeg
• FITS
• Imagemagick
• Inkscape
• OpenOffice.org
• PyODConverter Daemon

• Formal workflow description
  • OAIS compliant
  • Multiple sources
  • Multiple stakeholders
  • On- and off-site storage
  • Tenuous IT capabilities
• At-risk files, at-riskier files
  • Older files
  • Older formats (Word 6, etc.)
  • Obsolete formats
  • Databases
  • More work on a/v formats

Jennifer Ricker, [email protected]
Lisa Gregory, [email protected]

Photo, flickr, HarshLight