Lossless compression: state of the art

Many more variants

„ In our lessons we’ve seen some of the most common algorithms for lossless compression

„ Literature and applications present some other algorithms and many more variants

„ Popular applications have proprietary encoding schemes

Gabriele Monfardini - Corso di Basi di Dati Multimediali a.a. 2004-2005

State of the art (2005)

„ Windows ƒ .zip ƒ .cab ƒ .RAR ƒ .ACE ƒ .7z (7-Zip)

„ Unix/Linux ƒ .gz (gzip) ƒ .bz2 (bzip2) ƒ .Z (compress)

„ Mac ƒ .zip ƒ .sit (StuffIt)

zip format - I

„ The ZIP file format was originally created by Phil Katz, founder of PKWARE

„ Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989.

„ Katz had converted compression routines of a previously available archival program, ARC, from C to optimized assembler code

„ He was sued for copyright infringement, and the case went against him

zip format - II

„ Then he created his own file format, and the .zip format he designed was a much more efficient compression format than .ARC

„ In the mid 1990s, as more new computers included graphical user interfaces, some authors proposed shareware compression programs with a GUI

„ The most famous, in the Windows environment, is WinZip (www.winzip.com)

„ The zip format uses a combination of the LZ77 algorithm and Huffman coding
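The LZ77 plus Huffman combination mentioned above is what the Python standard library exposes as `zipfile.ZIP_DEFLATED`. The following is a minimal sketch, not PKZIP's own code: the helper name `zip_roundtrip` and the member name passed to it are made up for illustration. It writes one member into an in-memory archive and reads it back.

```python
import io
import zipfile


def zip_roundtrip(name: str, payload: bytes) -> tuple[int, bytes]:
    """Store payload in an in-memory zip archive using deflate, then read it back.

    Returns the compressed size recorded in the archive and the restored bytes.
    """
    buffer = io.BytesIO()
    # ZIP_DEFLATED selects the deflate method (LZ77 matching + Huffman coding).
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, payload)
    with zipfile.ZipFile(buffer) as archive:
        info = archive.getinfo(name)
        return info.compress_size, archive.read(name)


if __name__ == "__main__":
    data = b"abcabcabc" * 200
    size, restored = zip_roundtrip("demo.txt", data)
    print(len(data), "bytes ->", size, "bytes; lossless:", restored == data)
```

Repetitive input such as the one above compresses well because LZ77 replaces repeated substrings with back-references, and Huffman coding then shortens the most frequent symbols.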

zip format - III

„ In the late 1990s, various file managers started integrating support for the zip format into their user interface

„ Windows Explorer (Windows Me, Windows XP)

„ Finder (Mac OS X)

„ Nautilus file manager used with GNOME

„ Konqueror file manager used with KDE

„ Today all major desktop environments include zip file support in their file managers

„ Typically, a zip file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as any other folder

„ Compression is handled in a way that is largely transparent to the end user

RAR format

„ developed by the Russian developer Eugene Roshal

„ proprietary

„ the creator has released source code for decoding RAR archives, under a licence that allows free distribution and modification, but forbids its use to build a compatible encoder (WinRAR - www.rarlab.com)

„ usually slower than zip, but with better compression

„ encryption

„ solid archives

„ extra redundancy for archive recovery
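The benefit of a solid archive (all files compressed as one continuous stream, so redundancy between files can be exploited) can be illustrated with a rough sketch. This is not RAR's algorithm: it uses `zlib` from the Python standard library as a stand-in compressor, and the function name `separate_vs_solid` is hypothetical.

```python
import zlib


def separate_vs_solid(files: list[bytes]) -> tuple[int, int]:
    """Return (total size when each file is compressed alone,
    size when all files are compressed as a single stream)."""
    separate = sum(len(zlib.compress(content)) for content in files)
    solid = len(zlib.compress(b"".join(files)))
    return separate, solid


if __name__ == "__main__":
    # Twenty small, nearly identical files: the typical case where solid mode helps.
    files = [b"The quick brown fox jumps over the lazy dog. " * 3 + bytes([i])
             for i in range(20)]
    separate, solid = separate_vs_solid(files)
    print("separate:", separate, "bytes; solid:", solid, "bytes")
```

The trade-off is that extracting a single file from a solid archive generally requires decompressing the data that precedes it.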

CAB format

„ Microsoft Windows native compressed archive format

„ Allows various compression methods; the most common is based on Lempel-Ziv compression and is very similar to the zip format

ACE format

„ compression ratio is generally better than zip's, but compression is slower

„ www.winace.com

„ it has some interesting features

„ possibility of encryption

„ solid archives

„ ...

7-zip format - I

„ 7-Zip is an open source file archiver, predominantly for the Microsoft Windows operating system, but also for Linux

„ command line program or graphical user interface

„ 7-Zip is free software, distributed under the GNU LGPL license (www.7-zip.org)

7-zip format - II

„ By default, the program creates files in the 7z archive format (with the file extension .7z) using the LZMA algorithm for compression

„ LZMA (Lempel-Ziv-Markov chain algorithm) is a variant of LZ77 that uses Markov chains

„ Like all the other archivers seen so far, it supports a wide variety of formats

„ Its optimized zip routines increase the compression ratio at the cost of some compression speed

„ it is highly customizable
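LZMA can be tried out with the `lzma` module of the Python standard library. Note the assumption: by default that module wraps the LZMA stream in the .xz container, not in a .7z archive, so this sketch only illustrates the algorithm and does not produce 7-Zip files. The function name `compare_deflate_lzma` is made up for illustration.

```python
import lzma
import zlib


def compare_deflate_lzma(data: bytes) -> tuple[int, int]:
    """Return (size after deflate at maximum level, size after LZMA with default settings)."""
    deflated = zlib.compress(data, 9)
    lzma_blob = lzma.compress(data)
    # Both codecs are lossless: decompression must give back the input exactly.
    assert zlib.decompress(deflated) == data
    assert lzma.decompress(lzma_blob) == data
    return len(deflated), len(lzma_blob)


if __name__ == "__main__":
    sample = b"lossless compression: state of the art\n" * 2000
    deflate_size, lzma_size = compare_deflate_lzma(sample)
    print(len(sample), "bytes -> deflate:", deflate_size, "bytes, LZMA:", lzma_size, "bytes")
```

Which codec wins, and by how much, depends on the input; on very small inputs the container overhead can dominate.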

gzip - format

„ We have already seen it in some detail

„ Gzip (GNU zip) was created by Jean-loup Gailly and Mark Adler, and first released in 1992

„ Gzip is based on the deflate algorithm, which is a combination of LZ77 and Huffman coding

„ The deflate algorithm and the gzip file format were specified in RFC 1951 and RFC 1952, respectively
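The relationship between the two RFCs can be seen by inspecting a gzip stream: RFC 1952 defines a small wrapper whose first two bytes are the magic number 0x1f 0x8b, followed by a compression method byte, where 8 means deflate (RFC 1951). A minimal sketch using the Python standard library follows; the helper name `inspect_gzip` is hypothetical.

```python
import gzip
import zlib


def inspect_gzip(data: bytes) -> dict:
    """Compress data with gzip and report the leading header fields of the stream."""
    blob = gzip.compress(data)
    return {
        "magic": blob[:2],   # ID1, ID2 from RFC 1952
        "method": blob[2],   # CM field: 8 = deflate
        # wbits = 16 + MAX_WBITS tells zlib to expect a gzip wrapper around the deflate data.
        "restored": zlib.decompress(blob, wbits=16 + zlib.MAX_WBITS),
    }


if __name__ == "__main__":
    report = inspect_gzip(b"hello, gzip " * 100)
    print(report["magic"].hex(), report["method"], len(report["restored"]))
```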

bzip2 - format

„ Open source; developed by Julian Seward in 1996

„ Compression is better than gzip's, but considerably slower

„ bzip2 uses the Burrows-Wheeler transform

„ When a character string is transformed by the BWT, its characters are reordered so that identical characters tend to cluster into runs, which makes compression easier
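To make the transform concrete, here is a naive sketch of the BWT and its inverse. It is for illustration only: it sorts all rotations explicitly, which is far slower than what bzip2 does in practice, and the `$` end-of-string sentinel is an assumption of this sketch, not part of the bzip2 format.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform: last column of the sorted rotation table."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(row[-1] for row in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the rotation table column by column, then pick the row ending in the sentinel."""
    table = [""] * len(last_column)
    for _ in range(len(last_column)):
        table = sorted(c + row for c, row in zip(last_column, table))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


if __name__ == "__main__":
    transformed = bwt("banana")
    print(transformed)               # annb$aa
    print(inverse_bwt(transformed))  # banana
```

In the output `annb$aa` the two `n` characters and two of the `a` characters end up adjacent; on longer inputs such runs become much more pronounced, which is what the later coding stages exploit.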

Z format

„ Files that are compressed by the Unix command compress receive the file extension .Z

„ It uses an implementation of LZW

„ compress has fallen out of favor because of the Unisys and IBM patents covering the LZW algorithm it uses

„ For this reason gzip and bzip2 became more popular
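As a reminder of how LZW works, here is a textbook sketch of the encoder and decoder. It is not the on-disk .Z format: the real compress utility packs codes into variable-width bit fields and can reset its dictionary, both of which are omitted here, and the function names are made up for illustration.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Textbook LZW: emit the code of the longest known prefix, then learn prefix + next byte."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for value in data:
        candidate = current + bytes([value])
        if candidate in table:
            current = candidate
        else:
            codes.append(table[current])
            table[candidate] = len(table)  # next free code
            current = bytes([value])
    if current:
        codes.append(table[current])
    return codes


def lzw_decode(codes: list[int]) -> bytes:
    """Rebuild the same dictionary on the fly while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # The code refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


if __name__ == "__main__":
    message = b"TOBEORNOTTOBEORTOBEORNOT"
    codes = lzw_encode(message)
    print(len(message), "bytes ->", len(codes), "codes")
    print(lzw_decode(codes) == message)
```

The decoder never receives the dictionary: it reconstructs it from the code sequence alone, which is why no table has to be stored in the compressed file.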

StuffIt

„ Raymond Lau wrote StuffIt in the 1980s as a high school student

„ Files compressed by StuffIt typically have the filename extension .sitx or .sit

„ The StuffIt format is proprietary

„ Quite common in the Macintosh environment
