PRONOM in Practice


Creating File Format/System Signatures for Submission to PRONOM Technical Registry
iPRES 2018: The 15th International Conference on Digital Preservation
September 24th, 2018, 9am-12:30pm
Room 216, the Joseph B Martin Conference Centre
David Clipsham, Nick Krabbenhoeft, Shira Peltzman, Justin Simpson, & Carl Wilson

INTRODUCTIONS
Facilitators
David Clipsham - Digital Archives Systems Manager, PRONOM Lead, National Archives, UK
Nick Krabbenhoeft - Head of Digital Preservation, New York Public Library
Shira Peltzman - Digital Archivist, UCLA Library
Justin Simpson - Archivematica Technical Director, Artefactual Systems, Inc
Carl Wilson - Technical Lead, Open Preservation Foundation

Agenda
9:15 - 9:35 am: Introduction to file format signatures
    How are file formats identified, overview of PRONOM, case studies
9:35 - 10:35 am: Signature development process
    Reading bytestreams (why to do it, how to do it), creating signatures
[break]
10:50 - 11:45 am: Signature development process (cont’d) & case studies
    Testing signatures, submitting to PRONOM
[break]
12:00 - 12:30 pm: Advanced signature development & open signature development workshop
    Container signatures, finding samples, troubleshooting existing signatures

Introduction to file format signatures

Why file format signatures?
Relevance to digital preservation
● File format identification enables us to know what we’re dealing with
  ○ This happens early on in most workflows
  ○ The outcome of this process impacts downstream decision-making around activities like normalization for preservation and access
● File format identification tools are only as good as the file format signatures that have been developed by the community
  ○ The lack of a file format signature means that file identification cannot meaningfully take place
  ○ Executing tasks that should be straightforward, like disk image extraction and characterization, is sometimes difficult if not altogether impossible

PRONOM
Image from Flickr via kevandotorg

PRONOM
http://www.nationalarchives.gov.uk/PRONOM/Default.aspx
File format registry for digital preservation planning.
Developed in 2001 to meet the National Archives digital record preservation planning
● File format research: always ongoing, National Archives research guided primarily by UK Government needs. External contribution always welcome and encouraged
● File format identification signatures (for DROID originally)
● 1670 entries, aka PUIDs - PRONOM Unique Identifiers
● Format extensions, MIME/Media types, links to documentation

PRONOM Timeline
● 2001: Original internal version
● 2004: Opened up as externally browsable resource, aka PRONOM 3
● 2005: DROID launched alongside PRONOM 4; PUIDs introduced
● Ongoing: Continual PRONOM research and signature development

PRONOM Growth

PRONOM Contributors

PRONOM identification mechanisms
● Extension (.doc, .exe, .jpg)
● File format signature
  ○ Binary pattern matching
  ○ Created from elements of internal structure
  ○ May be simple ‘magic numbers’ - “CAFEBABE” for Java Class File
    ■ http://blog.nationalarchives.gov.uk/blog/cafed00ds-and-cafebabes/
  ○ May consist of complex patterns of variations, gaps and alternative values.
  ○ Driven by file format specification where possible
● Container signatures - formats made up of small files contained within a ‘ZIP’ or ‘OLE2’ wrapper (.doc, .xlsx, .odt, .epub)
  ○ http://openpreservation.org/blog/2016/01/07/droid-container-signature-files-what-they-are-and-how-to-create-them-a-template-and-an-example-or-few/

DROID Pattern Matching
● Scans a file’s internal bytes
● Compares against known signatures in signature file
● Returns a Hit! where it gets a match
● We’re aiming for certainty – there should be an extremely low chance that a file could be of a different type to the format that DROID identifies
● So, a signature needs to be strong enough, but doesn’t need to encode all of the characteristics of a format

Magic Numbers (AKA Signatures)
● A specified sequence of characters/bytes that must be present
● Usually at the start of the file (not always)
● Explicitly stated within the format specification:
  ○ Java Class file – https://docs.oracle.com/javase/specs/jvms/se7/html/jvms-4.html
    Hex 0xCAFEBABE
  ○ PNG - https://www.w3.org/TR/PNG-Structure.html
    Byte 0x89 followed by ASCII “PNG”, then hex 0x0D0A1A0A
  ○ Photoshop PSD - https://www.adobe.com/devnet-apps/photoshop/fileformatashtml
    ASCII “8BPS”

Inferred Signatures
● Sometimes formats may not have clearly defined signatures, but may have characteristics that must be present. These can be a good hook for a signature. This can get really complex!
● Gatan DM3: http://www.er-c.org/cbb/info/dmformat/#dm3
  00000003{4}000000(00|01){6}(14|15){2-258}25252525
● Stata DTA 113: http://www.stata.com/help.cgi?dta_113
  71(01|02)01{105}00
● ASP ASAX: https://msdn.microsoft.com/en-us/library/es4ac4ek(v=vs.85).aspx
  3C2540204170706C69636174696F6E20(436F6465426568696E64|436F6D70696C65724F7074696F6E73|4465736372697074696F6E|496E686572697473|4C616E6775616765)3D

Format ‘Subsets’
● Sometimes file formats may be ‘subsets’ or subtypes of other formats. Major examples are:
  ○ PDF/A – subtype of PDF (so is PDF/X)
  ○ DNG – subtype of TIFF (so is Nikon Raw NEF)
  ○ WAV – subtype of RIFF (so is AVI)
  ○ SVG – subtype of XML (so is GML)
We manage these relationships with ‘priorities’ – for example, PDF/A has priority over PDF because it contains a more specific element

Notes on Format Identification
● Not all files are automatically identifiable (based on how PRONOM currently works) – see Wireless Bitmap (.wbmp) – but an extension-only entry is better than nothing!
● A 4 byte (32 bit) sequence has a 1 in ~4 billion chance of a clash with truly random data – this is usually strong enough
● A text editor can be better for viewing XML based formats than a Hex Editor (although you’ll need the hex editor for creating the byte sequences)
● We’re not trying to characterise a format or validate that it is well formed, we’re just trying to give ourselves a reasonable degree of certainty about the outcome
● Files that ID as OLE2 or ZIP are probably container sigs (for later!)

Signature development process

Reading bytestreams
Format signature tools
● Binary and hexadecimal are number systems, Base 2 and Base 16 respectively. We usually work in Base 10 (decimal), i.e. 1, 2, 3 … 10. Binary for 144 is 10010000. In hex this is simply 0x90. Having fewer digits to work with makes larger numbers easier to read.
● Hex (hexadecimal) editors allow for manipulation of the fundamental binary data that constitutes a file
● OP Format Corpus: an openly-licensed corpus of sample files
● DROID/Siegfried/FIDO: tools to match files to PRONOM format signatures
● PRONOM Submission Utility: an online form to submit information about file formats for PRONOM

Tools
Hex Editors
A program that allows for manipulation of the fundamental binary data that constitutes a file
Also called a binary file editor
For more info see: https://en.wikipedia.org/wiki/Hex_editor

Hex editors
● Windows - HxD https://mh-nexus.de/en/hxd/
● OS X - HexFiend http://ridiculousfish.com/hexfiend/
● Linux - Bless https://apps.ubuntu.com/cat/applications/precise/bless/

Hex editors
Online options:
● http://binvis.io
● https://hexed.it/
● http://icebuddha.com/

Resources
Format specification documents
A document that describes the set of requirements necessary for a given file format
LoC’s Sustainability of Digital Formats is a good place to look for these: http://www.loc.gov/preservation/digital/formats/fdd/browse_list.shtml
GIF specification: https://www.w3.org/Graphics/GIF/spec-gif89a.txt

Hands-on: Examining sample files in a hex editor

Case study
Developing a simple signature
TZX Spectrum Tapes

The TZX Tape Format
Creating a signature
● A format for archiving ZX Spectrum programs
● Used with ZX emulation programs
● Large hobbyist community – lots of information available
● An audio stream of the tape data
● World of Spectrum Archive: 10,000s of examples - https://www.worldofspectrum.org

The TZX Tape Format
The Format Specification - http://www.worldofspectrum.org/TZXformat.html

Resources
PRONOM terms, basic syntax and data model
BOF = Beginning of File. EOF = End of File. Var = Variable (anywhere in the file)
Offset/Max Offset = Exact or positional range in which a signature starts
Wildcards:
  ?? = single wildcard byte, e.g. AB??C3
  * = 0-many wildcard bytes, e.g. BC*D4
  {n} = specific number of wildcard bytes, e.g. A2{5}F3
  {n-n} = range of wildcard bytes, e.g. 4D{0-12}E4
Byte range:
  [hh:hh] = single byte value between range, e.g. [00:FA]
Either/or:
  (hhhh|hhhh|hh) = either/any of these byte values, e.g. (0D|0A|0D0A)
Not:
  [!hh] = anything except this byte value, e.g. ABCD[!01]E1
https://www.nationalarchives.gov.uk/aboutapps/fileformat/pdf/automatic_format_identification.pdf

Tool
PRONOM Signature Development Utility
http://www.nationalarchives.gov.uk/pronom/sigdev/index.htm

Hands-on: Creating and editing a sample PRONOM signature

Break! Please be back at 10:50am

Signature development process cont’d

Tool
Format characterization tools
The process of file format characterization
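
Worked example: the signature syntax listed under "PRONOM terms, basic syntax and data model" maps fairly directly onto regular expressions over bytes. The Python sketch below is an illustration only and is not how DROID is implemented. The helper names pronom_to_regex and matches_at_bof are hypothetical, and the sketch assumes a signature anchored at BOF with offset 0; EOF anchoring, offsets, max offsets and priorities are left out.

```python
import re

# One alternative per syntax element from the slide.
TOKEN = re.compile(
    r"\{(?P<lo>\d+)(?:-(?P<hi>\d+))?\}"               # {n} or {n-n}: wildcard gap
    r"|\[(?P<r1>[0-9A-F]{2}):(?P<r2>[0-9A-F]{2})\]"   # [hh:hh]: byte range
    r"|\[!(?P<neg>[0-9A-F]{2})\]"                     # [!hh]: any byte except hh
    r"|(?P<any>\?\?)"                                 # ??: single wildcard byte
    r"|(?P<star>\*)"                                  # *: zero or more bytes
    r"|(?P<punct>[(|)])"                              # (a|b|c): alternatives
    r"|(?P<byte>[0-9A-F]{2})",                        # literal hex byte
    re.IGNORECASE,
)


def pronom_to_regex(signature):
    """Translate a PRONOM-style byte sequence into a compiled bytes regex."""
    parts = []
    pos = 0
    while pos < len(signature):
        m = TOKEN.match(signature, pos)
        if m is None:
            raise ValueError(f"unrecognised token at position {pos}")
        if m.group("lo") is not None:
            lo = m.group("lo")
            hi = m.group("hi") or lo          # {n} means exactly n bytes
            parts.append(".{%s,%s}" % (lo, hi))
        elif m.group("r1") is not None:
            parts.append("[\\x%s-\\x%s]" % (m.group("r1"), m.group("r2")))
        elif m.group("neg") is not None:
            parts.append("[^\\x%s]" % m.group("neg"))
        elif m.group("any"):
            parts.append(".")
        elif m.group("star"):
            parts.append(".*?")
        elif m.group("punct"):
            p = m.group("punct")
            parts.append("(?:" if p == "(" else p)
        else:
            parts.append("\\x" + m.group("byte"))
        pos = m.end()
    # DOTALL so that "." also matches the newline byte 0x0A.
    return re.compile("".join(parts).encode("ascii"), re.DOTALL)


def matches_at_bof(signature, data):
    """True if data starts with the signature (BOF, offset 0 assumed)."""
    return pronom_to_regex(signature).match(data) is not None


if __name__ == "__main__":
    java_class = bytes.fromhex("CAFEBABE00000034")
    print(matches_at_bof("CAFEBABE", java_class))

    # Stata DTA 113 signature from the Inferred Signatures slide,
    # tested against synthetic bytes rather than a real sample file.
    fake_dta = bytes([0x71, 0x02, 0x01]) + bytes(105) + b"\x00"
    print(matches_at_bof("71(01|02)01{105}00", fake_dta))
```

Translating to a regex keeps the sketch short, but it is only a convenient way to experiment with the syntax while reading a file in a hex editor. Signatures intended for submission should still be tested with DROID or the PRONOM Signature Development Utility.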