Williams College Archives and Special Collections File Format Recommendations
Based on current digital data curation best practices and recommendations, the Williams College Archives has set the following confidence levels for the longevity of commonly used file formats. Record creators are encouraged to take these recommendations into consideration when choosing the file formats in which to save records during their active lifecycle. The longer the active lifecycle is anticipated to be, the greater the need to use formats ranked as medium or high confidence.

Text
High Confidence Level
- Plain text (encoding: US-ASCII, UTF-8, UTF-16 with BOM)
- XML (includes XSD/XSL/XHTML, etc.; with included or accessible schema and character encoding explicitly specified)
- PDF/A-1 (ISO 19005-1) (.pdf)
Medium Confidence Level
- Cascading Style Sheets (.css)
- DTD (.dtd)
- Plain text (ISO 8859-1 encoding)
- PDF (.pdf) (embedded fonts)
- Rich Text Format 1.x (.rtf)
- HTML (include a DOCTYPE declaration)
- SGML (.sgml)
- OpenOffice (.sxw/.odt)
- OOXML (ISO/IEC DIS 29500) (.docx)
Low Confidence Level
- PDF (.pdf) (encrypted)
- Microsoft Word (.doc)
- WordPerfect (.wpd)
- DVI (.dvi)
- All other text formats not listed here

Raster Images
High Confidence Level
- TIFF (uncompressed)
- JPEG2000 (lossless) (.jp2)
- PNG
Medium Confidence Level
- BMP (.bmp)
- JPEG/JFIF (.jpg)
- JPEG2000 (lossy) (.jp2)
- TIFF (compressed)
- GIF (.gif)
- Digital Negative DNG (.dng)
- PNG (.png)
Low Confidence Level
- MrSID (.sid)
- TIFF (in Planar format)
- FlashPix (.fpx)
- PhotoShop (.psd)
- RAW
- JPEG 2000 Part 2 (*.jpf, .jpx)
- All other raster image formats not listed here

Vector Graphics
High Confidence Level
- SVG (no JavaScript binding) (.svg)
Medium Confidence Level
- Computer Graphic Metafile (CGM, WebCGM) (.cgm)
Low Confidence Level
- Encapsulated Postscript (EPS)
- Macromedia Flash (.swf)
- All other vector image formats not listed here

Audio
High Confidence Level
- AIFF (PCM) (.aif, .aiff)
- WAV (PCM) (.wav)
Medium Confidence Level
- SUN Audio (uncompressed) (.au)
- Standard MIDI (.mid, .midi)
- Ogg Vorbis (.ogg)
- Free Lossless Audio Codec (.flac)
- Advanced Audio Coding (.mp4, .m4a, .aac)
- MP3 (MPEG-1/2, Layer 3) (.mp3)
Low Confidence Level
- AIFC (compressed) (.aifc)
- NeXT SND (.snd)
- RealNetworks 'Real Audio' (.ra, .rm, .ram)
- Windows Media Audio (.wma)
- Protected AAC (.m4p)
- WAV (compressed) (.wav)
- All other audio formats not listed here

Video
High Confidence Level
- Motion JPEG 2000 (ISO/IEC 15444-4) (.mj2)
- AVI (uncompressed, motion JPEG) (.avi)
- QuickTime Movie (uncompressed, motion JPEG) (.mov)
Medium Confidence Level
- Ogg Theora (.ogg)
- MPEG-1, MPEG-2 (.mpg, .mpeg, wrapped in AVI, MOV)
- MPEG-4 (H.263, H.264) (.mp4, wrapped in AVI, MOV)
Low Confidence Level
- AVI (others) (.avi)
- QuickTime Movie (others) (.mov)
- RealNetworks 'Real Video' (.rv)
- Windows Media Video (.wmv)
- All other video formats not listed here

Spreadsheet/Database
High Confidence Level
- Comma Separated Values (.csv)
- Delimited Text (.txt)
- SQL DDL
Medium Confidence Level
- DBF (.dbf)
- OpenOffice (.sxc/.ods)
- OOXML (ISO/IEC DIS 29500) (.xlsx)
Low Confidence Level
- Excel (.xls)
- All other spreadsheet/database formats not listed here

Virtual Reality
High Confidence Level
- X3D (.x3d)
Medium Confidence Level
- VRML (.wrl, .vrml)
- U3D (Universal 3D file format)
Low Confidence Level
- All other virtual reality formats not listed here

Computer Programs
Medium Confidence Level
- Computer program source code (.c, .c++, .java, .js, .jsp, .php, .pl, etc.)
Low Confidence Level
- Compiled / Executable files (EXE, .class, COM, DLL, BIN, DRV, OVL, SYS, PIF)

Presentation Files
Medium Confidence Level
- OpenOffice (.sxi/.odp)
- OOXML (ISO/IEC DIS 29500) (.pptx)
Low Confidence Level
- PowerPoint (.ppt)
- All other presentation formats not listed here

Williams College Archives and Special Collections Media Type Preservation Plan

Once records have become inactive and are transferred to the Williams College Archives, the Archives’ primary preservation strategy is to normalize files to preservation and access formats upon ingest. The choice of access formats is based on the ubiquity of viewers for the file format. All preservation formats are open standards.
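Additionally, the choice of preservation format is based on community best practices, availability of open-source normalization tools, and an analysis of the significant characteristics of each media type. The College Archives also maintains the original format of all ingested files to support future migration and emulation preservation strategies as needed. The normalization chart below is based on international best practice standards and the default plan for Archivematica. The Archives monitors standard format registries and technological advances and adjusts the plan as needed.

Audio
- Original file formats: AC3, AIFF, MP3, WAV, WMA
- Preservation format(s): WAVE (LPCM)
- Access format(s): MP3
- Normalization tool: FFmpeg

Portable Document Format
- Original file formats: PDF
- Preservation format(s): PDF/A
- Access format(s): PDF or PDF/A
- Normalization tool: Ghostscript

Presentation files
- Original file formats: PPT
- Preservation format(s): ODF
- Access format(s): PDF
- Normalization tool: OpenOffice

Images
- Original file formats: BMP, GIF, JPG, JP2, PNG, PSD*, TIFF, TGA
- Preservation format(s): Uncompressed TIFF
- Access format(s): JPEG
- Normalization tool: ImageMagick

Raw camera files
- Original file formats: NEF
- Preservation format(s): DNG
- Access format(s): JPEG
- Normalization tool: DigiKam DNG Converter

Spreadsheet/Database
- Original file formats: XLS
- Preservation format(s): ODF
- Access format(s): Original format
- Normalization tool: Unoconv/OpenOffice

Plain text
- Original file formats: TXT
- Preservation format(s): Original format
- Access format(s): Original format
- Normalization tool: None

Vector images
- Original file formats: AI*, EPS*, SVG*
- Preservation format(s): SVG
- Access format(s): PDF
- Normalization tool: Inkscape

Video
- Original file formats: AVI, FLV, MOV, MPEG-1, MPEG-2, MPEG-4*, SWF, WMV
- Preservation format(s): MPEG-2
- Access format(s): MPG
- Normalization tool: FFmpeg

Word processing files
- Original file formats: DOC, WPD, RTF
- Preservation format(s): ODF
- Access format(s): PDF
- Normalization tool: OpenOffice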
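
For record creators or staff who want to check a file against the normalization chart programmatically, the chart can be expressed as a small lookup table. The following is a minimal sketch in Python, not the Archives' or Archivematica's actual implementation. The names `NormalizationRule`, `RULES`, and `rule_for` are hypothetical, and the mapping from format names to file extensions is an assumption, since the chart lists format names rather than extensions.

```python
from dataclasses import dataclass
from pathlib import Path
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class NormalizationRule:
    """One row of the normalization chart (hypothetical structure)."""
    media_type: str
    extensions: Tuple[str, ...]  # assumed extensions for the listed formats
    preservation: str
    access: str
    tool: str


# Rows transcribed from the chart; the extension lists are assumptions.
RULES = (
    NormalizationRule("Audio", (".ac3", ".aif", ".aiff", ".mp3", ".wav", ".wma"),
                      "WAVE (LPCM)", "MP3", "FFmpeg"),
    NormalizationRule("Portable Document Format", (".pdf",),
                      "PDF/A", "PDF or PDF/A", "Ghostscript"),
    NormalizationRule("Presentation files", (".ppt",),
                      "ODF", "PDF", "OpenOffice"),
    NormalizationRule("Images", (".bmp", ".gif", ".jpg", ".jp2", ".png", ".psd",
                                 ".tif", ".tiff", ".tga"),
                      "Uncompressed TIFF", "JPEG", "ImageMagick"),
    NormalizationRule("Raw camera files", (".nef",),
                      "DNG", "JPEG", "DigiKam DNG Converter"),
    NormalizationRule("Spreadsheet/Database", (".xls",),
                      "ODF", "Original format", "Unoconv/OpenOffice"),
    NormalizationRule("Plain text", (".txt",),
                      "Original format", "Original format", "None"),
    NormalizationRule("Vector images", (".ai", ".eps", ".svg"),
                      "SVG", "PDF", "Inkscape"),
    NormalizationRule("Video", (".avi", ".flv", ".mov", ".mpg", ".mpeg", ".mp4",
                                ".swf", ".wmv"),
                      "MPEG-2", "MPG", "FFmpeg"),
    NormalizationRule("Word processing files", (".doc", ".wpd", ".rtf"),
                      "ODF", "PDF", "OpenOffice"),
)

# Index the rules by extension for constant-time lookup.
_BY_EXTENSION: Dict[str, NormalizationRule] = {
    ext: rule for rule in RULES for ext in rule.extensions
}


def rule_for(filename: str) -> Optional[NormalizationRule]:
    """Return the chart row for a file, or None if its extension is not listed."""
    return _BY_EXTENSION.get(Path(filename).suffix.lower())


if __name__ == "__main__":
    for name in ("interview.WAV", "budget.xls", "notes.txt", "model.x3d"):
        rule = rule_for(name)
        if rule is None:
            print(f"{name}: not covered by the chart")
        else:
            print(f"{name}: preserve as {rule.preservation}, "
                  f"access as {rule.access}, tool: {rule.tool}")
```

A file whose extension is not in the table returns `None`, which reflects only that the chart has no row for it; how such files are handled is not stated in the plan.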