File Compression and Decompression in CloverETL


MASARYK UNIVERSITY
FACULTY OF INFORMATICS

File Compression and Decompression in CloverETL

BACHELOR'S THESIS

Sebastián Lazoň

Brno, spring 2014

Declaration

Hereby I declare that this paper is my original authorial work, which I have worked out on my own. All sources, references, and literature used or excerpted during the elaboration of this work are properly cited and listed in complete reference to the due source.

Sebastián Lazoň

Advisor: doc. RNDr. Tomáš Pitner, Ph.D.

Acknowledgement

I would like to express my thanks to Javlin's employees, especially Mgr. Jan Sedláček, for their time, assistance, and feedback on problems throughout the development of the project. I would also like to thank doc. RNDr. Tomáš Pitner, Ph.D. for his valuable advice on the text of the thesis.

Abstract

The aim of this thesis was to create a set of components for compression, decompression, and manipulation of compressed file archives in CloverETL. The first part of the thesis provides an overview of ETL processes, an introduction to CloverETL, the implemented archive formats, and the external libraries used; the second part covers the design, implementation, and testing of the developed components.

Keywords

Java, CloverETL, compression, decompression, ZIP, TAR, GZIP

Contents

1 Introduction
   1.1 Motivation
   1.2 Purpose
   1.3 Structure
2 ETL
   2.1 In general
       2.1.1 Extract
       2.1.2 Transform
       2.1.3 Load
   2.2 CloverETL
       2.2.1 Transformation graph
             2.2.1.1 Components
             2.2.1.2 Edges
             2.2.1.3 Sequences
             2.2.1.4 Lookup tables
3 Data compression
   3.1 In general
       3.1.1 Lossy
       3.1.2 Lossless
             3.1.2.1 ZIP
             3.1.2.2 TAR
             3.1.2.3 GZIP
             3.1.2.4 The DEFLATE algorithm
   3.2 In Java
       3.2.1 java.util.zip package
       3.2.2 Apache Commons Compress™
       3.2.3 TrueZIP
4 Analysis
   4.1 File operation components
       4.1.1 Common attributes of Compressed and File Operation
   4.2 Requirements
       4.2.1 Supported URIs
5 Design
   5.1 Project architecture
       5.1.1 Components
       5.1.2 CompressedFileManager
       5.1.3 CompressedOperationHandler
             5.1.3.1 Resolve
             5.1.3.2 List
             5.1.3.3 Delete
             5.1.3.4 Copy/Move
             5.1.3.5 Compress
             5.1.3.6 Decompress
             5.1.3.7 Other methods
       5.1.4 ArchiveInfo
       5.1.5 CompressedUtil
   5.2 Component attributes
       5.2.1 Input mapping
       5.2.2 Output mapping
       5.2.3 Error mapping
6 Implementation
   6.1 Used external libraries
   6.2 Integration with CloverETL
       6.2.1 Integration with Engine
       6.2.2 Integration with Designer
7 Testing and documentation
   7.1 Graph tests
   7.2 Unit tests
   7.3 Documentation
8 Conclusion
   8.1 Further extension of functionality
   8.2 Summary

1 Introduction

Information is an essential part of every enterprise, whether as the subject of its business or, after analysis, by providing a look at its functioning and helping with its management. First, the information stored in enterprise systems has to be extracted and processed. But once we realize that these data can be stored in different repositories, platforms, and applications, it becomes clear that a specialized tool is needed.
ETL, shorthand for extract, transform, load, represents the tools which provide this functionality. A properly designed ETL system extracts data from the source systems, enforces data quality and consistency, conforms data so that separate sources can be used together, and finally delivers data in a presentation-ready format. [2]

1.1 Motivation

Javlin's CloverETL is one of these tools. CloverETL is a group of multi-platform Java-based software products implementing ETL processes. It currently supports reading and writing of compressed data, but so far it has not been able to access and manipulate the content of a compressed archive. During data writing, it is often more efficient to create the files uncompressed and then compress them simultaneously.

1.2 Purpose

The purpose of this thesis is to create a set of components for compression, decompression, and manipulation of compressed file archives in CloverETL. The components' interface has to be similar to that of the existing components in the FileOperation category, which work with uncompressed files, and their future extension with new compression formats should be as simple as possible. The new component category, CompressedFileOperations, consists of these components:

ListCompressedFiles: provides a content listing of archives
DeleteCompressedFiles: removes entries from archives
CopyCompressedFiles: copies entries from one archive to another
MoveCompressedFiles: moves entries from one archive to another
CompressFiles: creates a new archive or adds files to an existing one
DecompressFiles: decompresses archive entries to a selected location

User and developer documentation are also part of the thesis.

1.3 Structure

The thesis is divided into eight chapters. The second is dedicated to an introduction to the field of ETL tools, a presentation of CloverETL, and an explanation of the basic principles of how it works. The third discusses compression methods and algorithms in general, compression support in Java, and the functionality provided by external libraries. The following parts describe the analysis and design of the components, their implementation, testing, and documentation. In the conclusion, options for further development of the created components' functionality are presented.

2 ETL

2.1 In general

The term ETL is an essential part of data warehousing¹ and denotes the process of extracting data from a data source, transforming it to fit operational needs, and loading it into a target location. After the data are collected from multiple sources (extraction), they are reformatted and cleansed for operational needs (transformation). Most of the numerous extraction and transformation tools also enable loading of the data into the target location, mostly a database, data warehouse, or data mart, where they can be analyzed to let developers create applications and support users' decisions. [2] Besides data warehousing and business intelligence, ETL tools can also be used to move data from one operational system to another. [11]

¹ A database used for reporting and data analysis.

2.1.1 Extract

The Extract step covers the data extraction from the source system and makes it accessible for further processing. Usually, data are retrieved from different source systems. These systems may use a different data organization or format, so the extraction must convert the data into a format suitable for transformation processing. [11] This process should use as few resources as possible, and it should be designed so that it does not negatively affect the source system in terms of performance, response time, or any kind of locking. [12]
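
To make the conversion step concrete, here is a minimal sketch in Java (not taken from CloverETL; the record type and class names are hypothetical) of an extract step that reads rows from a CSV source and normalizes them into a common in-memory format for the transform stage:

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical common record format shared by all source systems.
    record SourceRecord(List<String> fields) {}

    public class CsvExtractor {
        // Reads a CSV file and converts each line into the common record
        // format, so the transform stage does not depend on the source
        // system's original layout.
        public static List<SourceRecord> extract(Path csvFile) throws IOException {
            List<SourceRecord> records = new ArrayList<>();
            try (BufferedReader reader = Files.newBufferedReader(csvFile)) {
                String line;
                while ((line = reader.readLine()) != null) {
                    if (line.isBlank()) {
                        continue; // skip empty lines instead of emitting empty records
                    }
                    records.add(new SourceRecord(List.of(line.split(","))));
                }
            }
            return records;
        }
    }

A real extractor would additionally handle quoting, character encodings, and source-specific layouts; the point here is only that heterogeneous inputs are funneled into one format before transformation.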
2.1.2 Transform

The transform stage of an ETL process involves the application of a series of rules or functions to the extracted data. It includes validation of records and their rejection if they are not acceptable, as well as an integration part. While some data sources require very little or even no manipulation of data, others may require one or more transformations to meet the business and technical requirements of the target database. These transformations can include:

• conversion
• clearing of duplicates
• standardizing
• filtering and sorting
• translating
• looking up or verifying whether the data sources are inconsistent

A good ETL tool must enable building complex processes and extending its tool library so that custom user functions can be added. [11, 12]

2.1.3 Load

Loading is the last stage of the ETL process; it loads the extracted and transformed data into a target repository. Specialized proprietary technologies for effective and optimal data storage are often used. [11]

2.2 CloverETL

CloverETL is a family of multi-platform software products, created in Java, that implement ETL processes. It consists of these products: [7]

CloverETL Engine is the base member of the family. It is a run-time layer that executes transformation graphs created in CloverETL Designer. The Engine is a stand-alone Java library which can be embedded into other Java applications.

CloverETL Designer is a powerful Java-based standalone application for data extraction, transformation, and loading, built upon the extensible Eclipse platform. It allows users to create ETL transformations in a user-friendly way, either locally or remotely on a server via CloverETL Server.

CloverETL Server is fully integrated with the Designer and allows running ETL processes in a server environment, where scheduling, parallel execution of graphs, and load balancing can be achieved.

2.2.1 Transformation graph

A transformation graph is a directed acyclic graph that has to contain at least one node. Nodes represent components and are the most important part of the graph, while the edges connecting them behave as data channels. A few other elements can be found in a transformation graph, such as sequences, database connections, and lookup tables.

Each graph is also divided into a number of smaller units called phases. Every graph contains at least one phase, every node belongs to exactly one phase, and during graph execution the phases are executed sequentially.

2.2.1.1 Components

As mentioned before, components are the most important graph elements. Typically, each component executes a single data transformation. Most of the components have ports through which they can receive data and/or send the processed data out, and most of them work only when edges are connected to these ports. Each edge in a graph connected to some port must have metadata assigned to it.
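
As an illustration of the structure described above, the following sketch (with invented names, not the actual CloverETL engine API; edges and ports are omitted for brevity) models components grouped into phases that are executed in ascending order:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import java.util.TreeMap;

    // Hypothetical minimal model of a graph node: each component carries
    // a phase number and performs a single data transformation.
    interface Component {
        int phase();
        void execute();
    }

    class TransformationGraph {
        private final List<Component> nodes = new ArrayList<>();

        void add(Component node) {
            nodes.add(node);
        }

        // Runs the graph phase by phase: every node of phase 0 finishes
        // before any node of phase 1 starts, mirroring the sequential
        // execution of phases described above.
        void run() {
            Map<Integer, List<Component>> phases = new TreeMap<>();
            for (Component node : nodes) {
                phases.computeIfAbsent(node.phase(), p -> new ArrayList<>()).add(node);
            }
            for (List<Component> phase : phases.values()) {
                phase.forEach(Component::execute);
            }
        }
    }

The TreeMap keeps phase numbers sorted, so grouping nodes by phase and iterating over the map values yields exactly the sequential phase execution the engine requires.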