Lossless Compression: State of the Art
Many more variants
- In our lessons we have seen some of the most common algorithms for lossless compression
- The literature and real applications offer other algorithms and many more variants
- Popular applications have proprietary encoding schemes

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

State of the art (2005)
- Windows: .zip, .cab, .RAR, .ACE, .7z (7-Zip)
- Linux: .gz (gzip), .bz2 (bzip2), .Z (compress)
- Mac: .zip, .sit (StuffIt)

zip format - I
- The ZIP file format was originally created by Phil Katz, founder of PKWARE
- Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989
- Katz had previously converted the compression routines of an existing archival program, ARC, from C to optimized assembler code
- He was sued for copyright infringement over this and lost

zip format - II
- He then created his own file format; the .zip format he designed compressed much more efficiently than .ARC
- In the mid 1990s, as more new computers shipped with graphical user interfaces, some authors released shareware compression programs with a GUI
- The most famous in the Windows environment is WinZip (www.winzip.com)
- The zip format uses a combination of the LZ77 algorithm and Huffman coding
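As a rough illustration of the LZ77 + Huffman combination named on the slide, the sketch below writes a small zip archive in memory with Python's standard zipfile module, using the DEFLATE method. The sample data, the member name sample.txt and the variable names are assumptions made up for this demo.

```python
import io
import zipfile

# Hypothetical sample: highly repetitive text, which LZ77-style
# back-references compress well.
data = b"lossless compression " * 200

# Write a zip archive into an in-memory buffer using DEFLATE
# (the LZ77 + Huffman method).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("sample.txt", data)

# Re-open the archive and read the member back.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("sample.txt")
    restored = archive.read("sample.txt")

print(info.file_size, "->", info.compress_size, "bytes")
```

Because the compression is lossless, the restored bytes are identical to the input. The size reduction depends entirely on how repetitive the data is.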

zip format - III
- In the late 1990s, various file managers started integrating support for the zip format into their user interface
  - Windows Explorer (Windows Me, Windows XP)
  - Finder (Mac OS X)
  - Nautilus, the file manager used with GNOME
  - Konqueror, the file manager used with KDE
- Today all major desktop environments include zip file support in their file managers
- Typically, a zip file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as any other folder
- Compression is handled in a way that is largely transparent to the end user

RAR format
- Developed by the Russian Eugene Roshal
- Proprietary (WinRAR - www.rarlab.com)
- The creator has released source code for decoding RAR archives, under a licence that allows free distribution and modification but forbids its use to build a compatible encoder
- Usually slower than zip, but with better compression
- Other features: encryption, solid archives, extra redundancy for archive recovery

CAB format
- Microsoft Windows native compressed archive format
- Allows various compression methods; the most common is based on Lempel-Ziv compression and is very similar to the zip format

ACE format
- Compression ratio is generally better than zip, but compression is slower
- www.winace.com
- It has some interesting features: optional encryption, solid archives, ...

7-zip format - I
- 7-Zip is an open source file archiver, predominantly for the Microsoft Windows operating system but also available for Linux
- Command line program or graphical user interface
- 7-Zip is free software, distributed under the GNU LGPL license (www.7-zip.org)

7-zip format - II
- By default, the program creates files in the 7z archive format (file extension .7z), using the LZMA algorithm for compression
- LZMA is a variant of LZ77 that uses Markov chains
- Like the other archivers we have seen, it supports a great variety of formats
- Uses optimized zip routines that increase the compression ratio at the cost of some compression speed
- It is highly customizable

gzip format
- We have already seen it in some detail
- Gzip (GNU zip) was created by Jean-loup Gailly and Mark Adler, and first released in 1992
- Gzip is based on the deflate algorithm, which is a combination of LZ77 and Huffman coding
- The deflate algorithm and the gzip file format were standardized as RFC 1951 and RFC 1952 respectively

bzip2 format
- Open source data compression algorithm, developed by Julian Seward in 1996
- Compression is better than gzip, but considerably slower
- bzip2 uses the Burrows-Wheeler transform (BWT)
- When a character string is transformed by the BWT, its characters are reordered (none are changed) so that identical characters tend to cluster together, which makes compression easier

Z format
- Files compressed by the Unix command compress receive the file extension .Z
- It uses an implementation of LZW
- compress has fallen out of favor because of the UNISYS and IBM patents covering the LZW algorithm it uses
- For this reason gzip and bzip2 became more popular

StuffIt
- Raymond Lau wrote StuffIt in the 1980s as a high school student
- Files compressed by StuffIt typically have the filename extension .sitx or .sit
- The StuffIt format is proprietary
- Quite common in the Macintosh environment
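To make the Burrows-Wheeler transform mentioned for bzip2 more concrete, here is a minimal textbook sketch. It is not how bzip2 implements the transform: the "$" end marker, the function names bwt and inverse_bwt, and the naive sort of all rotations are assumptions chosen for readability, and bzip2 follows the transform with further coding stages that are not shown here.

```python
def bwt(text, end_marker="$"):
    """Naive BWT: sort all rotations of the text, keep the last column."""
    if end_marker in text:
        raise ValueError("text must not contain the end marker")
    s = text + end_marker
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column, end_marker="$"):
    """Invert the naive BWT by repeatedly prepending the last column and sorting."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    # The original string is the row that ends with the end marker.
    original = next(row for row in table if row.endswith(end_marker))
    return original[:-1]


transformed = bwt("banana")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana
```

The output keeps exactly the same characters as the input, but the two n's end up adjacent, as do two of the a's. Runs like these are what later coding stages exploit.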