State Library of North Carolina
Lisa Gregory, Jennifer Ricker
Photo, flickr, DaseinDesign

What is Digital Preservation?
Images: “The Scream” by Edvard Munch; “Orange Marilyn 1962” by Andy Warhol

Options for making files accessible over time
• Emulation
  ▪ Original hardware & software
  ▪ System that mimics original hardware & software
Image, http://forum.xcitefun.net/amazing-3d-sidewalk-art-t40240.html

Options for making files accessible over time
• Migration
  ▪ Transformation to “stable” format upon ingest
  ▪ Transformation to “stable” format when current format reaches obsolescence
Image, http://blog.builddirect.com/industryinsights/a-little-better-every-day-really-adds-up/

Small library
Small budget
Little IT support
Image, http://www.dogs4dogs.com/shots.html

Investigate what others were doing
Inventory what we had
Identify how others were transforming the type of files that we had
Image, http://blog.riskmanagers.us/?attachment_id=2765

Determine the transformation path and tool
Document our expected result
Perform the transformation
Evaluate the results

• Document-Like: CSS, DOC, DOCX, HTML, PDF, PPT, PUB, RTF, TXT
• Spreadsheets: XLS
• Geospatial: SHP, SHX, DBF
• Audio/Video: MOV, MP3, WAV
• Images and Structured Graphics: GIF, JPG, TIF (compressed), PSD, AI
• Web Archives: ARC

• FFmpeg
  ▪ MP3 to WAV
  ▪ MOV to AVI
• Inkscape
  ▪ AI to SVG
• PLANETS Testbed
  ▪ GIF, JPG, PSD to TIF
  ▪ DOC & DOCX to ODT
  ▪ CSS & HTML to TXT
  ▪ PUB & RTF to PDF/a
• XENA
  ▪ GIF, JPG, PSD to TIF
  ▪ DOC & DOCX to ODT
  ▪ PPT to ODP
  ▪ XLS to ODF
  ▪ CSS & HTML to TXT
  ▪ PUB to PDF/a
• ArcMap/TerraGo
  ▪ SHP, SHX, & DBF to GeoPDF (TerraGo) & Geospatial PDF (ArcMap)
Photo, flickr, Azzazello

• Free
• Open source
• Documented
• Supported
• Audit trail/reporting
• Easy to use (preferably with GUI…not command line)
• Versatile (transforms multiple formats, single or batch, etc.)

• No visual/auditory loss of content
• No loss of metadata
• Minimal degradation in quality (look & feel)
• Minimal degradation in structure (comprehensibility)
• Minimal degradation in interactivity (functionality)

Results

Audio/Video

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .mov | .mpeg-2 + mxf wrapper | | | | | Considerable degradation in video and audio quality. |
| .mp3 | .wav file + bwf header | | | | | |

Results

Images and Structured Graphics

| Original Format | Desired Migration Result | Converted Successfully? / Rendered Well? / Metadata Retained? / Considered Acceptable? | Notes |
|---|---|---|---|
| .ai | .svg | Y * * Y Y | Font formatting and color subtly different. |

* Yes, but with some loss.

Image: .ai (original), .svg (transformation)

Results

Document-Like

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .css | .txt | Y | Y | Y | Y | |
| .doc (all versions) | .odt | Y | Y | Y | Y | Word 95 files could not be converted. |
| .docx | .odt | Y | Y | Y | Y | |
| .html | .txt | Y | Y | Y | Y | |
| .pdf | .pdf/a | N | n/a | n/a | N | Tool could not accommodate migrating .pdf to .pdf/a. |
| .pub | .pdf/a | N | n/a | n/a | | Tool could not accommodate migrating .pub to .pdf/a. |
| .rtf | .pdf/a | | n/a | n/a | | Tool could not accommodate migrating .rtf to .pdf/a. |

Images and Structured Graphics

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .psd | .tif (uncompressed) | Y | N | Y | | Rendered, but no .psd functionality (layers, etc.) retained. |
| .gif | .tif (uncompressed) | Y | Y | Y | Y | |
| .jpg | .tif (uncompressed) | Y | Y | N | | File header's "modified date" was changed to experiment date. |

Results

Document-Like

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .css | .txt | Y | Y | Y | Y | |
| .doc (all versions) | .odt | Y | * | Y | Y | Tables & tabs did not render exactly in XENA; fine in OpenOffice. |
| .docx | .odt | Y | * | Y | Y | Bullets did not render exactly in XENA; fine in OpenOffice. |
| .html | .txt | Y | Y | Y | Y | |
| .pdf | .pdf/a | N | n/a | n/a | N | Tool could not accommodate migrating .pdf to .pdf/a. |
| .pub | .pdf/a | | n/a | n/a | | Tool could not accommodate migrating .pub to .pdf/a. |
Images and Structured Graphics

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .psd | .png | N | N | | | |
| .gif | .png | Y | * | Y | Y | Less crisp than the original. |
| .jpg | .tif (uncompressed) or .png | | n/a | n/a | | Simply wraps in XML. |

Spreadsheets

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .xls | .odf | | | | | Author, manager, company metadata lost. |

* Yes, but with some loss.

Image: .ppt (original), .odp (transformation)

Results

Geospatial

| Original Format | Desired Migration Result | Converted Successfully? | Rendered Well? | Metadata Retained? | Considered Acceptable? | Notes |
|---|---|---|---|---|---|---|
| .shp, .shx, .dbf | .pdf | * | | | | ArcMap: Embedded metadata is not currently accessible in Adobe. Both tools: Most metadata is contained in a separate .xml file that the converting tool seems to ignore. |

* Yes, but with some loss.

NOTE: ArcMap and TerraGo are both proprietary software tools.

Challenges expected and found
• Proprietary + less widely used
• Complex, related files
• Layers

Surprises
• Audio-video formats have their own complexities
• Frame rates, compression, and codecs, oh my!

Surprises
• PDF/Argh
  1a
  ▪ 1b restrictions PLUS
  ▪ Defined document structure (tags)
  ▪ Better accessibility
  1b
  ▪ Self-contained
  ▪ No external references
  ▪ Lowest level of compliance
  ▪ Digitized materials
  ▪ Metadata is required
Photo, etsy, icehousecrafts

Good tools to have:
  ▪ FFmpeg
  ▪ FITS
  ▪ FLAC Frontend
  ▪ Ghostscript
  ▪ Inkscape
  ▪ MPEG Streamclip
  ▪ PLANETS Testbed (RIP?)
  ▪ XENA

Free & open source has downsides
• “Free” in upfront costs
• Might be developed by single person, or by hundreds
• Learning curve can be steep
• Documentation can be confusing/nonexistent

Can you rock the command line?
• gswin32c -dPDFA -dBATCH -dNOPAUSE -dNOOUTERSAVE -dUseCIEColor -sDEVICE#pdfwrite -sOutputFile=newfilename inputfilename

Build in time for stops along the road
• Tool installation
  ▪ You probably won’t have the ideal configuration
• Troubleshooting
  ▪ You may not get great error feedback
• General Googling for assistance
  ▪ You might not be able to rely on the software documentation for help

On-the-fly or scheduled bulk migration?
QA – what should we use/rely on?
QA – how much should we do?
How can we facilitate batch processing?
ARC to WARC?

Overcoming challenges to production implementation
• Usual culprits: staff time, resources, IT restrictions, programming skills
• The existence of technologies and workflows is not the main problem

Testing Archivematica (archivematica.org), by Artefactual Systems
“Archivematica is a comprehensive digital preservation system. Archivematica uses a micro-services design pattern to provide an integrated suite of free and open-source tools that allows users to process digital objects from ingest to access in compliance with the ISO-OAIS functional model.”

OUR “TOOLS TO HAVE”
  ▪ FFmpeg
  ▪ FITS
  ▪ FLAC Frontend
  ▪ Ghostscript
  ▪ Inkscape
  ▪ MPEG Streamclip
  ▪ PLANETS Testbed (RIP?)
  ▪ XENA

ARCHIVEMATICA
  ▪ digiKam DNG Convertor
  ▪ Document Converter
  ▪ FFmpeg
  ▪ FITS
  ▪ ImageMagick
  ▪ Inkscape
  ▪ OpenOffice.org
  ▪ PyODConverter Daemon

Formal workflow description
• OAIS compliant
• Multiple sources
• Multiple stakeholders
• On- and off-site storage
• Tenuous IT capabilities

At-risk files / At-riskier files
• Older files
• Older formats (Word 6, etc.)
• Obsolete formats
• Databases
• More work on a/v formats

Jennifer Ricker
Lisa Gregory
Photo, flickr, HarshLight
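
Returning to the batch-processing question raised earlier in the deck, one low-risk approach is to generate the command lines for a whole folder first and review them before running anything. The sketch below is illustrative only and is not the presenters' workflow. The format pairs (MP3 to WAV, MOV to AVI) and the Ghostscript flags are copied from the slides; the function names `build_command` and `plan_batch`, the `_pdfa` output suffix, and the `migrated` folder are hypothetical. The Ghostscript flags are reproduced as shown in the deck and may not suit current Ghostscript releases, so any output would still need PDF/A validation.

```python
from collections import Counter
from pathlib import Path
from typing import List, Optional

# Format pairs taken from the deck's FFmpeg slide; the dictionary itself is illustrative.
FFMPEG_TARGETS = {".mp3": ".wav", ".mov": ".avi"}


def build_command(src: Path, out_dir: Path, gs_exe: str = "gswin32c") -> Optional[List[str]]:
    """Return an argument list for migrating src, or None if no path is defined."""
    ext = src.suffix.lower()
    if ext in FFMPEG_TARGETS:
        target = out_dir / (src.stem + FFMPEG_TARGETS[ext])
        return ["ffmpeg", "-i", str(src), str(target)]
    if ext == ".pdf":
        # Hypothetical naming choice so the original PDF is never overwritten.
        target = out_dir / (src.stem + "_pdfa.pdf")
        # Flags as shown in the deck; "=" is used here instead of "#"
        # because the arguments are passed as a list, not through a shell.
        return [gs_exe, "-dPDFA", "-dBATCH", "-dNOPAUSE", "-dNOOUTERSAVE",
                "-dUseCIEColor", "-sDEVICE=pdfwrite",
                f"-sOutputFile={target}", str(src)]
    return None


def plan_batch(files, out_dir):
    """Split a file list into planned commands and a count of unhandled extensions."""
    commands, skipped = [], Counter()
    for name in files:
        cmd = build_command(Path(name), Path(out_dir))
        if cmd is None:
            skipped[Path(name).suffix.lower()] += 1
        else:
            commands.append(cmd)
    return commands, skipped


if __name__ == "__main__":
    commands, skipped = plan_batch(
        ["talk.MP3", "clip.mov", "report.pdf", "map.shp"], "migrated")
    for cmd in commands:
        print(" ".join(cmd))  # review before running, e.g. with subprocess.run(cmd)
    print("No migration path defined for:", dict(skipped))
```

Keeping planning separate from execution means the planned commands can double as an audit trail, and the count of skipped extensions shows which formats still lack a migration path.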