Lossless compression: state of the art
Many more variants
In our lessons we’ve seen some of the most common algorithms for lossless compression
The literature and real-world applications include other algorithms and many more variants
Popular applications have proprietary encoding schemes
Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005
State of the art (2005)
Windows: .zip, .cab, .rar, .ace, .7z (7-Zip)
Linux: .gz (gzip), .bz2 (bzip2), .Z (compress)
Mac: .zip, .sit (StuffIt)
zip format - I
The ZIP file format was originally created by Phil Katz, founder of PKWARE
Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989.
Katz had converted compression routines of a previously available archival program, ARC, from C to optimized assembler code
He was sued for copyright infringement by the authors of ARC, and the dispute ended with a settlement against him
zip format - II
Then he created his own file format, and the .zip format he designed was a much more efficient compression format than .ARC
In the mid 1990s, as more new computers included graphical user interfaces, some authors proposed shareware compression programs with a GUI
The most famous in the Windows environment is WinZip (www.winzip.com)
The zip format uses a combination of the LZ77 algorithm and Huffman coding
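The LZ77 plus Huffman combination used by zip is the deflate method. As a minimal sketch, assuming Python's standard `zipfile` module stands in for a zip archiver (the member name `lesson.txt` and the sample payload are made up for illustration), an archive can be written and read back in memory:

```python
import io
import zipfile

# Hypothetical payload: highly repetitive text, so deflate has a lot to gain.
payload = b"lossless compression " * 200

# Write a zip archive into an in-memory buffer using the deflate method.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("lesson.txt", payload)

# Reopen the same buffer for reading and inspect the stored member.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("lesson.txt")
    restored = archive.read("lesson.txt")

print(info.file_size, "bytes stored as", info.compress_size, "bytes")
```

The archive records, for each member, both the original size and the compressed size, together with the compression method that was used.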
zip format - III
In the late 1990s, various file managers started integrating support for the zip format into their user interfaces
Windows Explorer (Windows Me, Windows XP)
Finder (Mac OS X)
Nautilus file manager used with GNOME
Konqueror file manager used with KDE
Today all major desktop environments include zip file support in their file managers
Typically, a zip file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as any other folder
Compression is handled in a way that is largely transparent to the end user
RAR format
Developed by the Russian software engineer Eugene Roshal
proprietary
the creator has released source code for decoding RAR archives, under a licence that allows free distribution and modification, but forbids its use to build a compatible encoder (WinRAR - www.rarlab.com)
usually slower than zip, but with better compression
encryption
solid archives
extra redundancy for archive recovery
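A solid archive compresses its files as one continuous stream, so redundancy shared between files can be exploited. The sketch below only illustrates that idea: it uses Python's `zlib` (deflate) as a stand-in, not the RAR encoder, and the sample files are invented for the example.

```python
import zlib

# Hypothetical files: each one shares a long common header with the others.
files = [
    ("Common header text. " * 20 + f"unique body {i}\n").encode()
    for i in range(20)
]

# Non-solid: every file is compressed on its own.
separate_size = sum(len(zlib.compress(f)) for f in files)

# Solid: all files are concatenated and compressed as a single stream.
solid_size = len(zlib.compress(b"".join(files)))

print("separate:", separate_size, "bytes, solid:", solid_size, "bytes")
```

The price of the better ratio is that extracting one file from a solid archive may require decompressing the data that precedes it.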
CAB format
Microsoft Windows native compressed archive format
Allows various compression methods; the most common is based on Lempel-Ziv compression and is very similar to the zip format
ACE format
compression ratio is generally better than zip, but compression is slower
www.winace.com
it has some interesting features
optional encryption
solid archives
...
7-zip format - I
7-Zip is an open source file archiver predominantly for the Microsoft Windows operating system, but also for Linux
command line program or graphical user interface
7-Zip is free software, distributed under the GNU LGPL license (www.7-zip.org)
7-zip format - II
By default, the program creates files in the 7z archive format (with the file extension .7z) using the LZMA algorithm for compression
LZMA (Lempel-Ziv-Markov chain algorithm) is a variant of LZ77 that uses Markov chains
Like all the other archivers we have seen, it supports a wide variety of formats
Its optimized zip routines increase the compression ratio at the cost of some compression speed
it is highly customizable
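To try LZMA without installing 7-Zip, a small sketch can use Python's standard `lzma` module. Note the assumption: this module writes raw `.lzma` or `.xz` streams, not `.7z` archives, so it demonstrates the algorithm only, not the 7z container. The sample data is made up.

```python
import lzma

# Hypothetical sample data with plenty of repetition.
data = b"the quick brown fox jumps over the lazy dog\n" * 500

# FORMAT_ALONE selects the legacy .lzma container (LZMA1 algorithm).
packed = lzma.compress(data, format=lzma.FORMAT_ALONE)

# Decompression auto-detects the container format by default.
unpacked = lzma.decompress(packed)

print(len(data), "bytes ->", len(packed), "bytes")
```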
gzip - format
We’ve seen it in some detail
Gzip (GNU zip) was created by Jean-loup Gailly and Mark Adler, and first released in 1992
Gzip is based on the deflate algorithm, which is a combination of LZ77 and Huffman coding
The deflate algorithm and the gzip file format were specified in RFC 1951 and RFC 1952, respectively
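The split between the two RFCs can be seen from Python's standard library: `gzip` produces the RFC 1952 container, while `zlib` with a negative `wbits` value produces a bare RFC 1951 deflate stream. The following is a minimal sketch with invented sample data, not a description of how the gzip program is implemented.

```python
import gzip
import zlib

data = b"abracadabra " * 100

# gzip container (RFC 1952): header + deflate stream + CRC32/size trailer.
gz = gzip.compress(data)
magic = gz[:2]      # the gzip magic number, bytes 0x1f 0x8b
method = gz[2]      # compression method field, 8 means deflate

# Raw deflate stream (RFC 1951): negative wbits suppresses any header.
compressor = zlib.compressobj(level=9, method=zlib.DEFLATED, wbits=-15)
raw = compressor.compress(data) + compressor.flush()

# Both representations decode to the same original bytes.
from_gzip = gzip.decompress(gz)
from_raw = zlib.decompress(raw, -15)

print(len(data), "bytes; gzip:", len(gz), "raw deflate:", len(raw))
```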
bzip2 - format
Open source data compression algorithm and program, developed by Julian Seward in 1996
compression is better than gzip, but considerably slower
bzip2 uses the Burrows-Wheeler transform
When a character string is transformed by the BWT, its characters are rearranged so that identical characters tend to end up next to each other, which makes compression easier
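The transform can be illustrated with a naive textbook version that sorts all rotations of the input. This is only a sketch: the `$` end marker and the function names are assumptions made for the example, and bzip2 itself uses a far more efficient implementation followed by further coding stages.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("the sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(transformed: str, sentinel: str = "$") -> str:
    """Naive inverse: rebuild the sorted rotation table one column at a time."""
    n = len(transformed)
    table = [""] * n
    for _ in range(n):
        # Prepend the transformed column to every row, then sort again.
        table = sorted(transformed[i] + table[i] for i in range(n))
    # The original string is the row that ends with the sentinel.
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


print(bwt("banana"))               # annb$aa
print(inverse_bwt(bwt("banana")))  # banana
```

In the output `annb$aa` the two `n` characters and two of the `a` characters are adjacent, which is the kind of local regularity that later compression stages exploit.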
Z format
Files that are compressed by the Unix command compress receive the file extension .Z
It uses an implementation of LZW
compress fell out of favor because of the Unisys and IBM patents covering the LZW algorithm it uses
For this reason gzip and bzip2 became more popular
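As a reminder of how LZW works, here is a minimal sketch of the textbook algorithm. It is not the code of the Unix compress program: it emits a plain list of integer codes and omits the variable-width bit packing and dictionary reset that a real implementation needs. The function names are chosen for this example.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Textbook LZW: emit the code of the longest phrase already in the dictionary."""
    dictionary = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate
        else:
            codes.append(dictionary[current])
            dictionary[candidate] = next_code  # learn the new phrase
            next_code += 1
            current = bytes([byte])
    if current:
        codes.append(dictionary[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == next_code:
            # Special case: the phrase being defined is used immediately.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[next_code] = previous + entry[:1]
        next_code += 1
        previous = entry
    return b"".join(output)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "bytes ->", len(codes), "codes")
```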
StuffIt
Raymond Lau wrote StuffIt in the 1980s as a high school student
Files compressed by StuffIt typically have the filename extension .sitx or .sit
StuffIt format is proprietary
Quite common in the Macintosh environment