SOLOMON R. GUGGENHEIM FOUNDATION

Electronic Records Management Start-Up Project
Electronic Records Processing Manual
Appendix A: SRGF Preservation and Access Formats
Updated May 9, 2014 – Prepared by Anthony Cocciolo, Electronic Records Consultant

The original file is preserved, and additional files are created for preservation and access. This table is based on the normalization table from Archivematica. [1]

Table columns: Relevant pilot; Media type; File formats; Preservation format(s); Access format(s); Archivist needs to manually normalize?; Notes

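Relevant pilot: Pilot 5 - MS Office Records
Media type: MS Office (2007 and greater): Word, PowerPoint
File formats: DOCX, PPTX
Preservation format(s): Original format
Access format(s): PDF
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs [2]
Notes: MS Office Open XML Standard files can be unzipped and viewed as XML.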

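Media type: MS Office (2007 and greater): Excel
File formats: XLSX
Preservation format(s): Original format
Access format(s): Original format
Archivist needs to manually normalize? No.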

Media type: MS Office (binary formats): Word & PowerPoint
File formats: DOC, PPT
Preservation format(s): DOCX, PPTX
Access format(s): PDF
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs
Notes: MS Office binary files are well documented; however, the open XML versions may be more sustainable because they are text-based. [3]

Media type: MS Office (binary formats): Excel
File formats: XLS
Preservation format(s): XLSX
Access format(s): XLSX
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs
Notes: Not normalized to PDF because spreadsheets can be tedious to view in this format.

[1] https://www.archivematica.org/wiki/Media_type_preservation_plans#Normalization
[2] Script available on: https://github.com/Guggenheim
[3] http://www.digitalpreservation.gov/formats/intro/specifications.shtml

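Media type: Plain text
File formats: TXT
Preservation format(s): Original format
Access format(s): Original format
Archivist needs to manually normalize? No.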

Media type: Portable Document Format
File formats: PDF
Preservation format(s): PDF/A
Access format(s): Original format
Archivist needs to manually normalize? No, automatically converted by Archivematica using Ghostscript. [4]
Notes: PDFs submitted should have fonts embedded; avoid the “Smallest file size” setting, which does not embed fonts. [5] InDesign and QuarkXPress files should come in as high-quality or press-quality PDFs.

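Media type: Raster images – Highly sustainable [6]
File formats: JPG, JPG2000, PNG, TIFF
Preservation format(s): Original format
Access format(s): JPG
Archivist needs to manually normalize? No, automatically converted by Archivematica using ImageMagick.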

Media type: Raster images – Questionable sustainability [7]
File formats: BMP, PCT, TGA, GIF, PSD [8]
Preservation format(s): Uncompressed TIFF
Access format(s): JPG
Archivist needs to manually normalize? No, automatically converted by Archivematica using ImageMagick.

[4] Using the behavior described here: https://www.archivematica.org/wiki/PDF_to_PDF/A_using_Ghostscript
[5] http://help.adobe.com/en_US/acrobat/X/pro/using/WSb2f1a50375cd48d3-1f36d19412ada208ceb-8000.html
[6] Based on Library of Congress’ Sustainability of Digital Formats – Still Images: http://www.digitalpreservation.gov/formats/fdd/still_fdd.shtml
[7] Ibid.

File formats: AI
Preservation format(s): PDF/A
Access format(s): PDF
Archivist needs to manually normalize? No, most Illustrator files have a PDF embedded within them, so it is not necessary to create a separate PDF. However, if a PDF is not embedded within the Illustrator file, then create a separate PDF.
Notes: The lack of a PDF being embedded within the InDesign file will be caught with FIDO format identification in Archivematica.

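Media type: Rich Text format
File formats: RTF
Preservation format(s): Original format
Access format(s): PDF
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs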

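Media type: Adobe InDesign
File formats: INDD, IND
Preservation format(s): PDF/A
Access format(s): PDF
Archivist needs to manually normalize? Yes, create PDF/A and PDF using InDesign and Acrobat.
Notes: InDesign files should generally not be submitted; prefer high-quality PDFs that will be auto-converted into PDF/A.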

Relevant pilot: Pilot 6 - Obscure file formats
Media type: AutoCAD [2D model / drawing]
File formats: DWG, DXF
Preservation format(s): Original format
Access format(s): PDF
Archivist needs to manually normalize? Yes, export to PDF in DWG viewer.
Notes: The DWG has been openly specified, and open is available for reading it. [9] Also, Autodesk makes a free DWG viewer – DWG TrueView. [10]

[8] Adobe files (Photoshop, Illustrator, InDesign, Flash) could face long-term preservation challenges because Adobe will only be making those products available via cloud subscription: http://news.cnet.com/8301-1001_3-57582735-92/adobe-kills-creative-suite-goes-subscription-only/ However, Adobe has made great strides in providing open documentation on many of their formats, which helps ameliorate the previously mentioned concern.

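Media type: Bentley Systems CAD format [2D model / drawing]
File formats: DGN
Preservation format(s): DWG
Access format(s): PDF
Archivist needs to manually normalize? Yes, convert to DWG and PDF using AutoCAD.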

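Media type: VectorWorks and MiniCad [2D model / drawing]
File formats: MCD, VWX
Preservation format(s): DWG [AutoCAD]
Access format(s): PDF
Archivist needs to manually normalize? Yes, export to DWG and PDF using VectorWorks.
Notes: Major category – 19,130 files in this format. Export to DWG from VectorWorks.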

Media type: Rhino [3D Models]
File formats: 3DM
Preservation format(s): Original format
Access format(s): Original format, PDF (acts like a preview)
Archivist needs to manually normalize? Yes. In Rhino, select Print and Save as PDF. Include the 3DM file and PDF file in the manually normalized access folder.
Notes: Major category – 1,111 files in this format. To preserve would need to preserve software and emulate if needed. [11]

Media type: Autodesk 3D Max [3D models]
File formats: 3DS
Preservation format(s): Original format
Access format(s): 3DM, PDF (acts like a preview)
Archivist needs to manually normalize? Yes. Export to Rhino 3dm using Autodesk 3D Max. Create a PDF preview, and include the 3DM and PDF file in the manually normalized access folder.
Notes: Minor format at SRGF. To preserve would need to preserve software and emulate if needed. [12] Or, convert to Rhino. [13]

[9] http://opendesign.com/files/guestdownloads/OpenDesign_Specification_for_.dwg_files. ; http://www.opendesign.com/guestfiles/teigha_viewer
[10] http://usa.autodesk.com/adsk/servlet/pc/index?id=6703438&siteID=123112
[11] http://www.ijdc.net/index.php/ijdc/article/viewFile/105/80

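Media type: Raw camera files/Digital Negative format**
File formats: 3FR, ARW, CR2, CRW, DCR, DNG, ERF, KDC, MRW, NEF, ORF, PEF, RAF, RAW, X3F
Preservation format(s): Uncompressed TIFF
Access format(s): JPEG
Archivist needs to manually normalize? No, automatically converted in Archivematica using ImageMagick and UFRaw.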

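Media type: General Vector images
File formats: EPS, SVG
Preservation format(s): SVG
Access format(s): PDF
Archivist needs to manually normalize? No, automatically converted in Archivematica using .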

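Relevant pilot: Pilot 7 – Obsolete file formats
Media type: WordPerfect files
File formats: WPD
Preservation format(s): DOCX
Access format(s): PDF
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs
Notes: Script leverages MS Word 2007+ for Windows to do conversion.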

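Media type: QuarkXPress
File formats: QXD
Preservation format(s): PDF/A
Access format(s): PDF
Archivist needs to manually normalize? Yes, export to PDF/A and PDF using InDesign and Acrobat. The fonts used need to be available on the system.
Notes: QXD files should generally not be submitted; prefer high-quality PDFs that will be auto-converted into PDF/A.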

[12] http://www.ijdc.net/index.php/ijdc/article/viewFile/105/80
[13] If converting to Rhino, would want to ensure that no information is lost in the process, such as textures.


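Media type: Lotus 1-2-3
File formats: WK4
Preservation format(s): XLSX
Access format(s): XLSX
Archivist needs to manually normalize? Yes, export using Gnumeric. [14]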

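Relevant pilot: Pilot 9 – Significant Email Correspondence
Media type: MS Outlook Personal Folders
File formats: PST
Preservation format(s): MBOX
Access format(s): MBOX (need viewer)
Archivist needs to manually normalize? No, uses readpst to create MBOX file.
Notes: A viewer for MBOX files is required, and can be installed on an archives computer.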

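Media type: MS Outlook Message
File formats: MSG
Preservation format(s): EML
Access format(s): PDF
Archivist needs to manually normalize? Yes, see Appendix C
Notes: EML file is text and MIME based and well-suited for long-term preservation. [15]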

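Media type: Novell Groupwise Email
File formats: MLM
Preservation format(s): DOCX
Access format(s): PDF
Archivist needs to manually normalize? Yes, use script created by consultant to normalize: pres&access.vbs
Notes: MLM files are WordPerfect 5.X files.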

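Media type: MS Outlook Express Message
File formats: EML
Preservation format(s): Original format
Access format(s): PDF
Archivist needs to manually normalize? Yes, see Appendix C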

Relevant pilot: Pilot 10 – Web archiving SRGM Websites
Media type: Web documents stored on Network
File formats: HTML, HTM, Misc. files used for web display (JPG, PNG, etc.)
Preservation format(s): Text based remain original format; other files converted as indicated in this table.
Access format(s): Text based remain original format; other files converted as indicated in this table.
Archivist needs to manually normalize? No.

[14] http://www.gnumeric.org/
[15] http://www.dpconline.org/component/docman/doc_download/739-dpctw11-01pdf

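Media type: Web documents retrieved from web archiving
File formats: HTML, HTM, Misc. files used for web display (JPG, PNG, etc.)
Preservation format(s): WARC
Access format(s): WARC
Archivist needs to manually normalize? No. Heritrix (retrieve) and Wayback Machine (view).
Notes: WARC viewers can be used for reading the WARC file.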

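Relevant pilot: Pilot 11 – Very large files (or Video/Audio media)
Media type: Audio
File formats: AC3, AIFF, MP3, WAV, WMA
Preservation format(s): WAVE (LPCM)
Access format(s): MP3
Archivist needs to manually normalize? No, automatically converted in Archivematica using FFmpeg.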

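Media type: Video
File formats: AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4, SWF, WMV, DV
Preservation format(s): FFV1/LPCM in MKV
Access format(s): MP4
Archivist needs to manually normalize? No, automatically converted in Archivematica using FFmpeg.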
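
For archivists who want to check a file against this table programmatically, the rows can be expressed as a lookup keyed by file extension. The following is only a minimal sketch in Python: the names Rule, RULES and lookup are hypothetical, only a handful of rows are transcribed, and it is not part of the consultant's pres&access.vbs script or of Archivematica.

```python
from pathlib import PurePath
from typing import NamedTuple, Optional

ORIGINAL = "Original format"


class Rule(NamedTuple):
    """One table row, reduced to three columns (hypothetical structure)."""
    preservation: str
    access: str
    manual: bool  # True when the archivist must normalize manually


# A few rows transcribed from the table above; the remaining rows
# would be added in the same way.
RULES = {
    "docx": Rule(ORIGINAL, "PDF", True),
    "xlsx": Rule(ORIGINAL, ORIGINAL, False),
    "doc": Rule("DOCX", "PDF", True),
    "xls": Rule("XLSX", "XLSX", True),
    "txt": Rule(ORIGINAL, ORIGINAL, False),
    "pdf": Rule("PDF/A", ORIGINAL, False),
    "bmp": Rule("Uncompressed TIFF", "JPG", False),
    "dgn": Rule("DWG", "PDF", True),
    "pst": Rule("MBOX", "MBOX", False),
    "wav": Rule("WAVE (LPCM)", "MP3", False),
    "mov": Rule("FFV1/LPCM in MKV", "MP4", False),
}


def lookup(filename: str) -> Optional[Rule]:
    """Return the rule for a file name, or None if its extension is not listed."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    return RULES.get(ext)


if __name__ == "__main__":
    for name in ("Budget_1998.XLS", "plan.dgn", "notes.unknown"):
        print(name, lookup(name))
```

Matching on the extension alone is a simplification; the table itself relies on format identification (for example FIDO in Archivematica), which inspects file contents rather than names.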