Lossless Compression: State of the Art

Many more variants

  • In our lessons we have seen some of the most common algorithms for lossless compression
  • The literature and real applications offer other algorithms and many more variants
  • Popular applications have proprietary encoding schemes

State of the art (2005)

  • Windows: .zip, .cab, .rar, .ace, .7z (7-Zip)
  • Linux: .gz (gzip), .bz2 (bzip2), .Z (compress)
  • Mac: .zip, .sit (StuffIt)

zip format - I

  • The ZIP file format was originally created by Phil Katz, founder of PKWARE
  • Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989
  • Katz had converted the compression routines of a previously available archival program, ARC, from C to optimized assembler code
  • He was sued for copyright infringement, and the case went against him

zip format - II

  • He then created his own file format, and the .zip format he designed compressed much more efficiently than .ARC
  • In the mid 1990s, as more new computers included graphical user interfaces, some authors released shareware compression programs with a GUI
  • The most famous, in the Windows environment, is WinZip (www.winzip.com)
  • The zip format uses a combination of the LZ77 algorithm and Huffman coding

zip format - III

  • In the late 1990s, various file managers started integrating support for the zip format into their user interface:
      • Windows Explorer (Windows Me, Windows XP)
      • Finder (Mac OS X)
      • Nautilus, the file manager used with GNOME
      • Konqueror, the file manager used with KDE
  • Today all major desktop environments include zip file support in their file managers
  • Typically, a zip file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as with any other folder
  • Compression is handled in a way that is largely transparent to the end user

RAR format

  • Developed by the Russian Eugene Roshal
  • Proprietary: the creator has released source code for decoding RAR archives under a licence that allows free distribution and modification, but forbids its use to build a compatible encoder (WinRAR - www.rarlab.com)
  • Usually slower than zip, but with better compression
  • Encryption
  • Solid archives
  • Extra redundancy for archive recovery

CAB format

  • Microsoft Windows native compressed archive format
  • Allows various compression methods; the most common is based on Lempel-Ziv compression and is very similar to the zip format

ACE format

  • Compression ratio is generally better than zip, but compression is slower
  • www.winace.com
  • It has some interesting features:
      • optional encryption
      • solid archives
      • ...

7-zip format - I

  • 7-Zip is an open source file archiver, predominantly for the Microsoft Windows operating system but also available for Linux
  • Command-line program or graphical user interface
  • 7-Zip is free software, distributed under the GNU LGPL license (www.7-zip.org)

7-zip format - II

  • By default, the program creates files in the 7z archive format (file extension .7z), using the LZMA algorithm for compression
  • LZMA is a variant of LZ77 that uses Markov chains
  • Like all the other archivers seen so far, it supports a wide variety of formats
  • It uses optimized zip routines that increase the compression ratio at the cost of some compression speed
  • It is highly customizable

gzip format

  • We have already seen it in some detail
  • Gzip (GNU zip) was created by Jean-loup Gailly and Mark Adler, and first released in 1992
  • Gzip is based on the deflate algorithm, which is a combination of LZ77 and Huffman coding
  • The deflate algorithm and the gzip file format were standardized as RFC 1951 and RFC 1952, respectively

bzip2 format

  • Open source data compression algorithm, developed by Julian Seward in 1996
  • Compression is better than gzip, although considerably slower
  • bzip2 uses the Burrows-Wheeler transform (BWT)
  • When a character string is transformed by the BWT, its characters are rearranged in a way that makes compression easier

Z format

  • Files compressed by the Unix command compress receive the file extension .Z
  • It uses an implementation of LZW
  • compress has fallen out of favor because of the Unisys and IBM patents covering the LZW algorithm it uses
  • For this reason gzip and bzip2 became more popular

StuffIt

  • Raymond Lau wrote StuffIt in the 1980s as a high school student
  • Files compressed by StuffIt typically have the filename extension .sitx or .sit
  • The StuffIt format is proprietary
  • Quite common in the Macintosh environment
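The slides name the algorithm families behind gzip (deflate), bzip2 (Burrows-Wheeler transform) and 7z (LZMA) without showing them at work. The following is a minimal sketch, not the code used by any of those tools: `bwt` and `inverse_bwt` are hypothetical helper names for a naive, quadratic-memory version of the transform, the `$` end marker is an assumption, and the sample input is purely illustrative. The size comparison relies only on Python's standard `zlib`, `bz2` and `lzma` modules.

```python
import bz2
import lzma
import zlib


def bwt(text, end="$"):
    """Naive Burrows-Wheeler transform: sort all rotations, keep the last column.

    `end` is an assumed end-of-string marker that must not occur in `text`.
    """
    if end in text:
        raise ValueError("end marker must not occur in the input")
    s = text + end
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end="$"):
    """Invert the transform by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the marker.
    original = next(row for row in table if row.endswith(end))
    return original[:-1]


def compressed_sizes(data):
    """Return the compressed size in bytes for three standard-library codecs."""
    return {
        "deflate (zlib)": len(zlib.compress(data, 9)),
        "bzip2 (bz2)": len(bz2.compress(data, 9)),
        "LZMA (lzma)": len(lzma.compress(data)),
    }


if __name__ == "__main__":
    print(bwt("banana"))               # annb$aa
    print(inverse_bwt(bwt("banana")))  # banana

    sample = b"abracadabra " * 200     # illustrative, highly repetitive input
    print("original:", len(sample))
    for name, size in compressed_sizes(sample).items():
        print(name, size)
```

In the transformed string the repeated letters end up next to each other ("nn", "aa"), which is the sense in which the BWT makes compression easier. The byte counts are only indicative: `zlib.compress` emits a zlib stream rather than a .gz file, and `lzma.compress` emits an .xz container rather than a .7z archive, so the numbers do not match what the archivers themselves would produce.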