Lossless compression: state of the art

Many more variants

 In our lessons we've seen some of the most common algorithms for lossless compression

 The literature and real applications offer further algorithms and many more variants

 Popular applications have proprietary encoding schemes

2 State of the art (2005)

 Windows: .zip, .cab, .RAR, .ACE, .7z (7-Zip)

 Linux: .gz (gzip), .bz2 (bzip2), .Z (compress)

 Mac: .zip, .sit (StuffIt)

3 zip format - I

 The ZIP format was originally created by Phil Katz, founder of PKWARE

 Katz publicly released technical documentation on the ZIP file format, along with the first version of his PKZIP archiver, in January 1989.

 Katz had converted the compression routines of a previously available archival program, ARC, from C to optimized assembler code

 He was sued for copyright infringement by the authors of ARC, and the dispute was settled against him

4 zip format - II

 Then he created his own file format, and the .zip format he designed was a much more efficient compression format than .ARC

 In the mid 1990s, as more new computers included graphical user interfaces, some authors proposed compression programs with a GUI

 The most famous, in the Windows environment, is WinZip (www..com)

 The zip format uses a combination of the LZ77 algorithm and Huffman coding
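
As a minimal sketch of how a zip archive can be written and read back, the following uses Python's standard zipfile module with the deflate method (LZ77 plus Huffman coding). The member name notes.txt and the sample payload are hypothetical, chosen only for illustration; this is not PKZIP's or WinZip's own code.

```python
import io
import zipfile

# Hypothetical sample data: repetitive text compresses well with deflate.
payload = b"lossless compression " * 200

# Build the archive in memory instead of on disk.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("notes.txt", payload)

# Reopen the same bytes for reading and inspect the stored member.
with zipfile.ZipFile(buffer) as archive:
    info = archive.getinfo("notes.txt")
    restored = archive.read("notes.txt")

print("original:", info.file_size, "bytes, stored:", info.compress_size, "bytes")
```

Each member records its own compression method and sizes, which is what lets a file manager list and extract single files without unpacking the whole archive.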

5 zip format - III

 In the late 1990s, various file managers started integrating support for the zip format into their user interfaces

 Windows Explorer (Windows Me, Windows XP)

 Finder (Mac OS X)

 Nautilus file manager used with GNOME

 Konqueror file manager used with KDE

 Today all major desktop environments include zip file support in their file managers

 Typically, a zip file may be treated as a directory or folder, so that files are copied into and out of it in the same manner as any other folder

 Compression is handled in a way that is largely transparent to the end user

6 RAR format

 developed by the Russian developer Eugene Roshal

 proprietary

 the creator has released the source code for decoding RAR archives, under a licence that allows free distribution and modification but forbids its use to build a compatible encoder (WinRAR - www.rarlab.com)

 usually slower than zip, but with better compression

 encryption

 solid archives

 extra redundancy for archive recovery
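
The idea behind solid archives can be sketched as follows: instead of compressing every file on its own, the files are concatenated and compressed as a single stream, so redundancy shared between files is also exploited. The sketch uses zlib as a stand-in compressor and made-up file contents; it does not reproduce RAR's actual algorithm or container format.

```python
import zlib

# Hypothetical archive members that share a large common header.
header = b"Copyright notice and boilerplate shared by every file. " * 10
files = [header + f"body of file {i}".encode() for i in range(20)]

# Non-solid: every member is compressed independently.
separate_size = sum(len(zlib.compress(data)) for data in files)

# Solid: members are concatenated and compressed as one stream.
solid_size = len(zlib.compress(b"".join(files)))

print("separate:", separate_size, "bytes, solid:", solid_size, "bytes")
```

The price of the better ratio is that extracting one file from a solid archive generally requires decompressing the data that precedes it.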

7 CAB format

 Microsoft Windows native compressed archive format

 Allows various compression methods; the most common is based on Lempel-Ziv compression and is very similar to the one used by the zip format

8 ACE format

 the compression ratio is generally better than zip's, but compression is slower

 www..com

 it has some interesting features

 possibility of encryption

 solid archives

 ...

9 7-zip format - I

 7-Zip is an open source file archiver, predominantly for the Microsoft Windows operating system, but also for Linux

 command line program or graphical user interface

 7-Zip is free software, distributed under the GNU LGPL license (www.7-zip.org)

10 7-zip format - II

 By default, the program creates files in the 7z archive format (with the file extension .7z) using the LZMA algorithm for compression

 LZMA (Lempel-Ziv-Markov chain algorithm) is a variant of LZ77 that uses Markov chains

 Like all the other archivers seen so far, it supports a wide variety of formats

 Uses optimized zip routines that increase the compression ratio at the cost of some compression speed

 it is highly customizable
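
A minimal round trip with LZMA can be sketched with Python's standard lzma module. Note the assumption: by default this module writes the .xz container, not a .7z archive, so the example only illustrates the compression algorithm, not the 7z file format. The sample data is made up.

```python
import lzma

# Hypothetical sample data with a lot of repetition.
data = b"state of the art in lossless compression\n" * 500

# Default settings: LZMA2 filter inside an .xz container.
packed = lzma.compress(data)
restored = lzma.decompress(packed)

print("original:", len(data), "bytes, compressed:", len(packed), "bytes")
```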

11 gzip - format

 We've seen it in some detail

 Gzip (GNU zip) was created by Jean-loup Gailly and Mark Adler, and first released in 1992

 Gzip is based on the deflate algorithm, which is a combination of LZ77 and Huffman coding

 the deflate algorithm and the Gzip file format were standardized respectively as RFC 1951 and RFC 1952
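
The relation between the two RFCs can be sketched in a few lines: a gzip member (RFC 1952) is a small header and trailer wrapped around a raw deflate stream (RFC 1951). The sketch below uses Python's gzip and zlib modules on made-up data and assumes that gzip.compress writes no optional header fields, so the header is exactly 10 bytes long.

```python
import gzip
import struct
import zlib

# Hypothetical sample data.
data = b"gzip wraps a deflate stream. " * 100

member = gzip.compress(data)

# Header: magic bytes 0x1f 0x8b, then the compression method (8 = deflate).
magic = member[:2]
method = member[2]

# Trailer: CRC-32 and length of the uncompressed data, both little-endian.
crc, size = struct.unpack("<II", member[-8:])

# Assumption: no optional header fields, so bytes 10..-8 are raw deflate.
# A negative wbits value tells zlib to expect a stream without any wrapper.
raw_deflate = member[10:-8]
restored = zlib.decompress(raw_deflate, -15)

print(magic.hex(), method, size, restored == data)
```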

12 bzip2 - format

 open source algorithm developed by Julian Seward in 1996

 compression is better than gzip's, but considerably slower

 bzip2 uses the Burrows-Wheeler transform

 When a character string is transformed by the BWT, its characters are rearranged so that identical characters tend to end up next to each other, which makes compression easier
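
The transform can be sketched naively by sorting all rotations of the string and keeping the last column. This is an illustrative version only, far slower than what bzip2 does on its blocks; the sentinel character "$" is an assumption used here to mark the end of the string and make the transform invertible.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Naive Burrows-Wheeler transform: last column of the sorted rotations."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the input")
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def inverse_bwt(last_column: str, sentinel: str = "$") -> str:
    """Rebuild the sorted rotation table column by column, then pick the
    row that ends with the sentinel: that row is the original string."""
    n = len(last_column)
    table = [""] * n
    for _ in range(n):
        table = sorted(last_column[i] + table[i] for i in range(n))
    original = next(row for row in table if row.endswith(sentinel))
    return original[:-1]


transformed = bwt("banana")
print(transformed)               # annb$aa
print(inverse_bwt(transformed))  # banana
```

In the output the two n's and two of the a's are adjacent, which is the kind of clustering that later coding stages can exploit.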

13 Z format

 Files that are compressed by the Unix command compress receive the file extension .Z

 It uses an implementation of LZW

 compress has fallen out of favor because of the Unisys and IBM patents covering the LZW algorithm it uses

 For this reason gzip and bzip2 became more popular
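
The LZW scheme used by compress can be sketched as a dictionary that starts with all single bytes and grows by one entry each time an unseen sequence is met. This is a textbook version that returns a plain list of integer codes; it is an assumption-laden illustration and does not produce the variable-width bit packing of real .Z files.

```python
from typing import List


def lzw_compress(data: bytes) -> List[int]:
    """Encode data as a list of dictionary indices."""
    table = {bytes([i]): i for i in range(256)}
    current = b""
    codes = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in table:
            current = candidate              # keep extending the match
        else:
            codes.append(table[current])     # emit the longest known match
            table[candidate] = len(table)    # learn the new sequence
            current = bytes([byte])
    if current:
        codes.append(table[current])
    return codes


def lzw_decompress(codes: List[int]) -> bytes:
    """Rebuild the same dictionary while decoding."""
    if not codes:
        return b""
    table = {i: bytes([i]) for i in range(256)}
    previous = table[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in table:
            entry = table[code]
        elif code == len(table):
            # Code refers to the entry that is being defined right now.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        table[len(table)] = previous + entry[:1]
        previous = entry
    return b"".join(output)


message = b"TOBEORNOTTOBEORTOBEORNOT"
codes = lzw_compress(message)
print(len(message), "bytes ->", len(codes), "codes")
```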

14 StuffIt

 Raymond Lau wrote StuffIt in the 1980s as a high school student

 Files compressed by StuffIt typically have the filename extension .sitx or .sit

 The StuffIt format is proprietary

 Quite common in the Macintosh environment
