Digital Preservation File Formats and Characterisation

Christoph Becker, Hannes Kulovits
April 17, 2008
Vienna University of Technology
www.ifs.tuwien.ac.at/dp

Agenda

– File formats
  • Basics and issues
  • Exercise/experiments
– Characterisation
  • Identification and validation (DROID, JHOVE)
  • File format registries
  • Risk assessment of file formats
  • eXtensible Characterisation Languages (XCL)

Part 1: File Formats

Hannes Kulovits
Institut für Softwaretechnik und Interaktive Systeme, TU Wien
http://www.ifs.tuwien.ac.at/dp

Part of this presentation is based on slides by Prof. Manfred Thaller, DELOS Summer School 2007, Pisa.

Agenda

– Definition of File/File Format
– Representation
– Elements of a file format
– File and Preservation
– Challenges

What is a file/file format?

A file is nothing more than a sequence of bits. How to encode those bits is specified in a file format: a file format is a specification of how to interpret a bit stream.

A file format specifies
1. Whether the file is binary or ASCII
2. How information is organized
3. ...

Plain Text

The de facto standard for plain text is ASCII.
– Defines 7-bit codes (128 characters), usually stored one per 8-bit byte
– 8-bit extensions such as ISO 8859-1 allow a maximum of 256 different characters
– Includes
  • Latin letters (lower and upper case)
  • Arabic numerals
  • Punctuation marks
  • Standard symbols

Another important format is Unicode.
– Provides a unique code for each character
– Uses one or more bytes per character, depending on the encoding (for example UTF-8 or UTF-16)

Proprietary vs. Open

Proprietary
– Documentation mostly not available
– License and patent rules
– License agreements subject to change
– Restrictions for use and modifications may apply

Open
– Documentation available!
– Unlimited use
– No license fee
– Open for modifications
– No patent owners

File formats based on plain text

For example: XHTML 1.1

In HTML, plain text must obey certain rules:
– Use of tags
– Type sizes
– Color
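The plain-text slides above treat ASCII and Unicode as rules for interpreting bytes as characters. As a minimal sketch, Python's built-in codecs can show how the same text occupies a different number of bytes under different encodings, and how reading bytes with the wrong rule changes the result. The sample strings are illustrative and are not taken from the slides.

```python
# Illustrative sample text (an assumption, not from the slides).
ascii_text = "File"
unicode_text = "für"

# ASCII maps each character to a single byte with a value below 128.
ascii_bytes = ascii_text.encode("ascii")
print(list(ascii_bytes))

# "ü" has no ASCII code, so encoding it as ASCII fails.
try:
    unicode_text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# The same Unicode text needs a different number of bytes per encoding.
utf8_bytes = unicode_text.encode("utf-8")        # "ü" takes two bytes
utf16_bytes = unicode_text.encode("utf-16-le")   # two bytes per character here
latin1_bytes = unicode_text.encode("latin-1")    # 8-bit extension: one byte each
print(len(utf8_bytes), len(utf16_bytes), len(latin1_bytes))

# Interpreting the UTF-8 bytes as Latin-1 yields different characters.
misread = utf8_bytes.decode("latin-1")
print(misread)
```

The last step is the preservation problem in miniature: the bits are intact, but without knowing the encoding the text cannot be recovered reliably.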
Different types of File Formats

Different kinds of formats for different kinds of information [Rothenberg, 1995, Ensuring the Longevity of Digital Documents].

The official categorisation of file formats is the IANA MIME type:
– Text documents
– Databases
– Still and moving images
– Audio
– Multipart
– ...

Different types of File Formats (2)

– Three-character file extension of DOS and Windows (neither standardised nor unique)
– Unix "magic numbers"
– Macintosh data forks
– MIME type, also not unique

None of them is really satisfying.
– Better solution: PRONOM with the PRONOM Unique Identifier

An image

11111
10001
11011
11011
11011
11111

The bits alone do not say how to read them:
– Dimensions: 6 rows, 5 columns? Or 5 rows, 6 columns?
– Colours: 1 == blue, 0 == red? Or 1 == green, 0 == yellow?

Store (uncompressed), one value per pixel:

1,1,1,1,1,
1,0,0,0,1,
1,1,0,1,1,
1,1,0,1,1,
1,1,0,1,1,
1,1,1,1,1

Store (compressed, run-length encoded) as count,value pairs:

6,1,3,0,3,1,1,0,4,1,1,0,4,1,1,0,7,1

That is six 1s, three 0s, three 1s, one 0, four 1s, one 0, four 1s, one 0, seven 1s.

Store as processing instructions:

SetSize: 5 by 6
SetBackgroundColor: Blue
SetForegroundColor: Red
SetLetterHeight: 4
MoveTo: 3,5
DrawLetter: T

The instructions refer to a coordinate grid (column,row):

1,1 2,1 3,1 4,1 5,1
1,2 2,2 3,2 4,2 5,2
1,3 2,3 3,3 4,3 5,3
1,4 2,4 3,4 4,4 5,4
1,5 2,5 3,5 4,5 5,5
1,6 2,6 3,6 4,6 5,6

To interpret the uncompressed stream, a reader needs to know:
– Dimensions: 6 rows, 5 columns
– Colours: 1 == blue, 0 == red
– Storage: uncompressed
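The run-length encoding on the "Store" slides can be sketched as follows. The pixel rows and the resulting count,value sequence come from the slides; the function names `rle_encode` and `rle_decode` are hypothetical and chosen only for this illustration.

```python
from itertools import groupby

# The 6 x 5 image from the slides, row by row.
ROWS = ["11111", "10001", "11011", "11011", "11011", "11111"]

def rle_encode(bits):
    """Return a flat count, value, count, value, ... list for a bit sequence."""
    encoded = []
    for value, run in groupby(bits):
        encoded.extend([len(list(run)), value])
    return encoded

def rle_decode(encoded):
    """Expand a flat count, value, ... list back into the bit sequence."""
    bits = []
    for count, value in zip(encoded[0::2], encoded[1::2]):
        bits.extend([value] * count)
    return bits

# Uncompressed storage: one number per pixel, rows concatenated.
stream = [int(ch) for row in ROWS for ch in row]

compressed = rle_encode(stream)
print(compressed)

# Decoding restores the stream, but the width is still needed to rebuild rows.
restored = rle_decode(compressed)
width = 5
rebuilt_rows = ["".join(map(str, restored[i:i + width]))
                for i in range(0, len(restored), width)]
print(rebuilt_rows)
```

Runs continue across row boundaries here, which is why the first pair is 6,1 although the first row has only five pixels.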
Information that accompanies the image data (each implicit or explicit):
– <basic information>
– <rendering information>
– <storage information>
… and the data?

The data is stored either as a data stream

1,1,1,1,1,1,
0,0,0,1,1,1,
0,1,1,1,1,0,
1,1,1,1,0,1,
1,1,1,1,1,1

or as processing instructions

SetSize: 5 by 6
SetBackgroundColor: Blue
SetForegroundColor: Red
SetLetterHeight: 4
MoveTo: 3,5

File Format

– Basic Information: What to do?
– Rendering Information: How to do it?
– Storage Information: How to move it from persistent form to deployed form?
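To make the three kinds of information concrete, the sketch below writes the example image into a small container and reads it back. The layout is purely hypothetical and invented for this illustration (a `TOY1` signature, one byte each for width and height, two RGB palette entries, then one byte per pixel); it is not a format described in the slides.

```python
import struct

MAGIC = b"TOY1"  # hypothetical signature, invented for this sketch

ROWS = ["11111", "10001", "11011", "11011", "11011", "11111"]
PALETTE = {0: (255, 0, 0), 1: (0, 0, 255)}  # 0 == red, 1 == blue, as RGB

def write_image(rows, palette):
    """Serialise rows of '0'/'1' characters and a two-colour palette."""
    height, width = len(rows), len(rows[0])
    # Basic information: what kind of file this is and its dimensions.
    header = MAGIC + struct.pack("BB", width, height)
    # Rendering information: which colour each pixel value stands for.
    colours = b"".join(struct.pack("BBB", *palette[v]) for v in (0, 1))
    # The data: one byte per pixel, row by row. That choice (uncompressed,
    # row-major) is the storage information, implicit in this layout.
    pixels = bytes(int(ch) for row in rows for ch in row)
    return header + colours + pixels

def read_image(blob):
    """Parse a blob produced by write_image back into rows and palette."""
    if blob[:4] != MAGIC:
        raise ValueError("not a TOY1 file")
    width, height = struct.unpack("BB", blob[4:6])
    palette = {v: struct.unpack("BBB", blob[6 + 3 * v:9 + 3 * v]) for v in (0, 1)}
    pixels = blob[12:12 + width * height]
    rows = ["".join(str(p) for p in pixels[r * width:(r + 1) * width])
            for r in range(height)]
    return rows, palette

def regroup(rows, width):
    """Re-read the same pixel stream with a different assumed width."""
    stream = "".join(rows)
    return [stream[i:i + width] for i in range(0, len(stream), width)]

blob = write_image(ROWS, PALETTE)
decoded_rows, decoded_palette = read_image(blob)
print(decoded_rows)

# Without the stored width, the same 30 pixels could be read six per row.
print(regroup(ROWS, 6))
```

Reading the stream six values per row reproduces the data stream shown on the slide, which no longer looks like the original picture. This is why the dimensions have to be recorded somewhere, either in the file or in the format specification.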
