Williams College Archives and Special Collections Recommendations

Based on current best practices and recommendations in digital data curation, the Williams College Archives has set the following confidence levels for the longevity of commonly used file formats. Record creators are encouraged to consider these recommendations when deciding which file formats to use for records during their active lifecycle. The longer the anticipated active lifecycle, the greater the need to use formats ranked as medium or high confidence.

Media type: High Confidence Level, Medium Confidence Level, Low Confidence Level

Text
High confidence:
- Plain text (encoding: USASCII, UTF-8, UTF-16 with BOM)
- XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema and character encoding explicitly specified)
- PDF/A-1 (ISO 19005-1) (.pdf)
Medium confidence:
- Cascading Style Sheets (.)
- DTD (.dtd)
- Plain text (ISO 8859-1 encoding)
- PDF (.pdf) (embedded fonts)
- Rich Text Format 1.x (.rtf)
- HTML (include a DOCTYPE declaration)
- SGML (.sgml)
- Open Office (.sxw/.odt)
- OOXML (ISO/IEC DIS 29500) (.docx)
Low confidence:
- PDF (.) (encrypted)
- Word (.doc)
- WordPerfect (.wpd)
- DVI (.dvi)
- All other text formats not listed here

Raster Images
High confidence:
- TIFF (uncompressed)
- JPEG2000 (lossless) (.jp2)
- PNG (.png)
Medium confidence:
- BMP (.bmp)
- JPEG/JFIF (.jpg)
- JPEG2000 (lossy) (.jp2)
- TIFF (compressed)
- GIF (.)
- Digital Negative DNG (.dng)
Low confidence:
- MrSID (.sid)
- TIFF (in Planar format)
- FlashPix (.fpx)
- PhotoShop (.psd)
- RAW
- JPEG 2000 Part 2 (.jpf, .jpx)
- All other raster image formats not listed here

Vector
High confidence:
- SVG (no JavaScript binding) (.svg)
Medium confidence:
- Computer Graphics Metafile (CGM, WebCGM) (.cgm)
Low confidence:
- Encapsulated PostScript (EPS)
- Macromedia Flash (.)
- All other vector image formats not listed here

Audio
High confidence:
- AIFF (PCM) (.aif, .aiff)
- WAV (PCM) (.)
Medium confidence:
- SUN Audio (uncompressed) (.au)
- MIDI (.mid, .midi)
- (.ogg)
- Free Lossless Audio (.)
- Advanced Audio Coding (.mp4, .m4a, .aac)
- MP3 (MPEG-1/2, Layer 3) (.)
Low confidence:
- AIFC (compressed) (.aifc)
- NeXT SND (.snd)
- RealNetworks 'Real Audio' (.ra, .rm, .ram)
- Audio (.wma)
- Protected AAC (.m4p)
- WAV (compressed) (.wav)
- All other audio formats not listed here

Video
High confidence:
- Motion JPEG 2000 (ISO/IEC 15444-4) (.mj2)
- AVI (uncompressed, motion JPEG) (.avi)
- QuickTime Movie (uncompressed, motion JPEG) (.mov)
Medium confidence:
- Ogg (.ogg)
- MPEG-1, MPEG-2 (.mpg, .mpeg, wrapped in AVI, MOV)
- MPEG-4 (H.263, H.264) (.mp4, wrapped in AVI, MOV)
Low confidence:
- AVI (others) (.avi)
- QuickTime Movie (others) (.mov)
- RealNetworks 'Real ' (.rv)
- (.wmv)
- All other video formats not listed here

Spreadsheet/Database
High confidence:
- Comma Separated Values (.csv)
- Delimited Text (.txt)
- SQL DDL
Medium confidence:
- DBF (.dbf)
- OpenOffice (.sxc/.ods)
- OOXML (ISO/IEC DIS 29500) (.xlsx)
Low confidence:
- Excel (.xls)
- All other spreadsheet/database formats not listed here

Virtual Reality
High confidence:
- (.x3d)
Medium confidence:
- VRML (.wrl, .)
- U3D ( file format)
Low confidence:
- All other formats not listed here

Computer Programs
- Computer program source code (.c, .c++, .java, .js, .jsp, .php, .pl, etc.)
- Compiled / Executable files (EXE, .class, COM, DLL, BIN, DRV, OVL, SYS, PIF)

Presentation Files
Medium confidence:
- OpenOffice (.sxi/.odp)
- OOXML (ISO/IEC DIS 29500) (.pptx)
Low confidence:
- PowerPoint (.ppt)
- All other presentation formats not listed here
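
For record creators who want to check a folder of files against these recommendations, the lookup can be sketched in a few lines of Python. This is only an illustration, not a tool provided by the Archives: the names CONFIDENCE_BY_EXTENSION and confidence_level are hypothetical, and the mapping is deliberately partial. It leaves out extensions whose ranking depends on encoding, compression, or encryption (for example .txt, .pdf, .tif, .wav, .jp2, .avi, and .mov), because the extension alone cannot settle those cases.

```python
from pathlib import PurePath
from typing import Optional

# Partial, illustrative mapping taken from the recommendations above.
# Extensions whose confidence level depends on how the file was saved
# (encoding, compression, encryption) are intentionally left out.
CONFIDENCE_BY_EXTENSION = {
    ".csv": "high",
    ".png": "high",
    ".mj2": "high",
    ".odt": "medium",
    ".docx": "medium",
    ".rtf": "medium",
    ".jpg": "medium",
    ".dng": "medium",
    ".ods": "medium",
    ".xlsx": "medium",
    ".dbf": "medium",
    ".doc": "low",
    ".wpd": "low",
    ".psd": "low",
    ".sid": "low",
    ".xls": "low",
    ".ppt": "low",
}


def confidence_level(filename: str) -> Optional[str]:
    """Return 'high', 'medium', or 'low', or None if this sketch does not cover the extension."""
    extension = PurePath(filename).suffix.lower()
    return CONFIDENCE_BY_EXTENSION.get(extension)


if __name__ == "__main__":
    for name in ("minutes.doc", "minutes.docx", "budget.csv", "scan.tif"):
        print(name, "->", confidence_level(name))
```

A result of None means only that the sketch does not cover the extension; the recommendations above remain the reference for such files.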

Williams College Archives and Special Collections Preservation Plan

Once records have become inactive and are transferred to the Williams College Archives, the Archives’ primary preservation strategy is to normalize files to preservation and access formats upon ingest. The choice of access formats is based on the ubiquity of viewers for the file format. All preservation formats are open standards. Additionally, the choice of preservation format is based on community best practices, availability of open-source normalization tools, and an analysis of the significant characteristics of each media type. The College Archives also maintains the original format of all ingested files to support future migration and emulation preservation strategies as needed. The normalization chart below is based on international best practice standards and the default plan for Archivematica. The Archives monitors standard format registries and technological advances and adjusts the plan as needed.

Media type | Original file formats | Preservation format(s) | Access format(s) | Normalization tool
Audio | AC3, AIFF, MP3, WAV, WMA | WAVE (LPCM) | MP3 | FFmpeg
Portable Document Format | PDF | PDF/A | PDF or PDF/A | Ghostscript
Presentation files | PPT | ODF | PDF | OpenOffice
Images | BMP, GIF, JPG, JP2, PNG, PSD*, TIFF, TGA | Uncompressed TIFF | JPEG | ImageMagick
Raw camera files | NEF | DNG | JPEG | DigiKam DNG Converter
Spreadsheet/Database | XLS | ODF | Original format | Unoconv/OpenOffice
Plain text | TXT | Original format | Original format | None
Vector images | AI*, EPS*, SVG* | SVG | PDF | Inkscape
Video | AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4*, SWF, WMV | MPEG-2 | MPG | FFmpeg
Word processing files | DOC, WPD, RTF | ODF | PDF | OpenOffice