Digital Preservation File Formats and Characterisation

Christoph Becker, Hannes Kulovits
April 17, 2008
Vienna University of Technology
www.ifs.tuwien.ac.at/dp

Agenda

– Digital Preservation
– File formats
– Basics and issues
– Exercise/experiments
– File formats and characterisation
– Characterisation
– Identification and validation (DROID, JHove)
– File format registries
– Risk assessment of file formats
– eXtensible Characterisation Languages (XCL)

Part 1: File Formats

Hannes Kulovits
Institut für Softwaretechnik und Interaktive Systeme, TU Wien
http://www.ifs.tuwien.ac.at/dp
Part of this presentation is based on slides by Prof. Manfred Thaller, DELOS Summer School 2007, Pisa.

Agenda

– Definition of File/File Format
– Representation
– Elements of a file format
– File and Preservation Challenges

What is a file/file format?

A file is nothing more than a sequence of bits.
How to encode those bits is specified in a file format.
A file format is a specification of how to interpret a bit stream.

A file format specifies
1. Whether the file is binary or ASCII
2. How information is organized
3. ...

Plain Text

De facto standard for plain text is ASCII
– Uses 7 bits per character, usually stored in 8-bit bytes
– Maximum of 128 different characters possible (8-bit extensions such as ISO 8859-1 allow 256)
– Includes
  • Letters of the basic Latin alphabet (lower and upper case)
  • Arabic numerals
  • Punctuation marks
  • Standard symbols

Another important format is Unicode
– Provides a unique code point for each character
– Encodings such as UTF-8 and UTF-16 use one or more bytes per character

Proprietary vs. Open

Proprietary
– Documentation mostly not available
– License and patent rules
– License agreements subject to change
– Restrictions for use and modifications may apply

Open
– Documentation available!
– Unlimited use
– No license fee
– Open for modifications
– No patent owners

File formats based on plain text

For example: XHTML 1.1
In HTML, plain text must obey certain rules
– Use of tags
– Type sizes
– Color

Different types of File Formats

Different kinds of formats for different kinds of information
[Rothenberg, 1995, Ensuring the Longevity of Digital Documents]
Official categorisation of file formats is the IANA MIME type
– Text documents
– Databases
– Still and moving images
– Audio
– Multipart
– ...

Different types of File Formats (2)

– Three-character file extension of DOS and Windows. (Neither standardised nor unique.)
– Unix 'magic numbers'
– Macintosh data-forks
– MIME type, also not unique
None of them is really satisfying
– Better solution: PRONOM with Pronom Unique Identifier

An image

6 rows, 5 columns
1 == blue
0 == red

  11111
  10001
  11011
  11011
  11011
  11111

Same bits, different colours: 1 == green, 0 == yellow

An image

Store (uncompressed):

  1,1,1,1,1,
  1,0,0,0,1,
  1,1,0,1,1,
  1,1,0,1,1,
  1,1,0,1,1,
  1,1,1,1,1

Store (compressed, run length encoded):

  6,1,3,0,3,
  1,1,0,4,1,1,
  0,4,1,1,0,
  7,1

Each pair is count,value, read row by row: six 1s, three 0s, three 1s, one 0, four 1s, one 0, four 1s, one 0, seven 1s (30 pixels in total).

Store (as instructions):

  SetSize: 5 by 6
  SetBackgroundColor: Blue
  SetForegroundColor: Red
  SetLetterHeight: 4
  MoveTo: 3,5
  DrawLetter: T

Pixel coordinates (column,row):

  1,1  2,1  3,1  4,1  5,1
  1,2  2,2  3,2  4,2  5,2
  1,3  2,3  3,3  4,3  5,3
  1,4  2,4  3,4  4,4  5,4
  1,5  2,5  3,5  4,5  5,5
  1,6  2,6  3,6  4,6  5,6

An image

– Dimensions: 6 rows, 5 columns
– 1 == blue, 0 == red
– Uncompressed

An image

<basic information> (implicit / explicit)
<rendering information> (implicit / explicit)
<storage information> (implicit / explicit)
… and the data?
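The run length encoding on the image slides above can be reproduced with a short script. This is only an illustrative sketch: the function names `rle_encode` and `rle_decode` and the `ROWS` constant are made up for this example and do not come from the slides. It assumes the pixels are read row by row and stored as flat count,value pairs, which is what the numbers on the slide suggest.

```python
from itertools import groupby

# The 6 x 5 bitmap from the slides (1 == blue, 0 == red).
ROWS = ["11111", "10001", "11011", "11011", "11011", "11111"]


def rle_encode(bits):
    """Return a flat list of count,value pairs for a sequence of bits."""
    encoded = []
    for value, run in groupby(bits):
        encoded.extend([len(list(run)), value])
    return encoded


def rle_decode(encoded):
    """Expand a flat list of count,value pairs back into the bit sequence."""
    bits = []
    for count, value in zip(encoded[0::2], encoded[1::2]):
        bits.extend([value] * count)
    return bits


# Flatten the bitmap row by row, then compress it.
pixels = [int(ch) for row in ROWS for ch in row]
encoded = rle_encode(pixels)
print(encoded)
# [6, 1, 3, 0, 3, 1, 1, 0, 4, 1, 1, 0, 4, 1, 1, 0, 7, 1]
```

Note that decoding only yields the 30 bits; the dimensions (6 rows, 5 columns) and the colour mapping still have to be known from elsewhere, which is the point the following slides make about basic, rendering and storage information.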
An image

Data either as data stream

  1,1,1,1,1,1,
  0,0,0,1,1,1,
  0,1,1,1,1,0,
  1,1,1,1,0,1,
  1,1,1,1,1,1

or as processing instructions

  SetSize: 5 by 6
  SetBackgroundColor: Blue
  SetForegroundColor: Red
  SetLetterHeight: 4
  MoveTo: 3,5
  DrawLetter: T

File Format

Basic Information
– What to do?
Rendering Information
– How to do it?
Storage Information
– How to move it from persistent form to deployed form?
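To make the "processing instructions" variant concrete, the sketch below interprets the six instructions and produces the same bitmap as the data stream. The slides do not define what the instructions mean, so the semantics here are assumptions: `MoveTo: 3,5` is taken to be the column and row of the bottom of the letter, `SetLetterHeight: 4` the number of rows the letter spans, and the shape of the letter T (a vertical stem with a three-pixel bar on top) is hard-coded in a hypothetical helper `draw_t`.

```python
INSTRUCTIONS = """\
SetSize: 5 by 6
SetBackgroundColor: Blue
SetForegroundColor: Red
SetLetterHeight: 4
MoveTo: 3,5
DrawLetter: T
"""


def draw_t(grid, x, y, height):
    """Assumed glyph: stem in column x ending at row y, 3-pixel bar on top."""
    top = y - height + 1  # 1-based row that carries the bar
    for row in range(top, y + 1):
        grid[row - 1][x - 1] = 0
    for col in (x - 1, x, x + 1):
        grid[top - 1][col - 1] = 0


def render(instructions):
    """Interpret the instructions; return the bitmap rows and the colour map."""
    state = {}
    grid = []
    for line in instructions.splitlines():
        if not line.strip():
            continue
        name, _, arg = line.partition(":")
        name, arg = name.strip(), arg.strip()
        if name == "SetSize":
            width, height = (int(n) for n in arg.split(" by "))
            # 1 marks a background pixel, 0 a foreground pixel, as on the slides.
            grid = [[1] * width for _ in range(height)]
        elif name == "SetBackgroundColor":
            state["background"] = arg
        elif name == "SetForegroundColor":
            state["foreground"] = arg
        elif name == "SetLetterHeight":
            state["letter_height"] = int(arg)
        elif name == "MoveTo":
            state["x"], state["y"] = (int(n) for n in arg.split(","))
        elif name == "DrawLetter":
            if arg != "T":
                raise ValueError(f"only the letter T is sketched here, got {arg!r}")
            draw_t(grid, state["x"], state["y"], state["letter_height"])
        else:
            raise ValueError(f"unknown instruction: {name!r}")
    rows = ["".join(str(pixel) for pixel in row) for row in grid]
    colours = {1: state["background"], 0: state["foreground"]}
    return rows, colours


rows, colours = render(INSTRUCTIONS)
print("\n".join(rows))
print(colours)
```

The instruction form carries its basic and rendering information explicitly (size, colours), whereas the data stream leaves them implicit, so a reader of the stream has to learn them from the format specification instead.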