
LINUXUSER Command Line: gzip, bzip2, tar

gzip, bzip2 and tar EXPERT PACKING

A short command is all it takes to pack your data or extract it from an archive.

BY HEIKE JURZIK

Archiving provides many benefits: packed and compressed files occupy less space on your disk and require less bandwidth on the Internet. Linux has both GUI-based programs, such as File Roller, and command-line tools for creating and unpacking various archive types. This article examines some shell tools for archiving files and demonstrates the kind of expert packing that clever combinations of Linux commands offer the command line user.

Command Line

Although GUIs such as KDE or Gnome are useful for various tasks, if you intend to get the most out of your Linux machine, you will need to revert to the good old command line from time to time. Apart from that, you will probably be confronted with various scenarios where some working knowledge will be extremely useful in finding your way through the command line jungle.

Nicely Packed with “gzip”

The gzip (GNU Zip) program is the default packer on Linux. Gzip compresses individual files, but it does not create complete archives. In its simplest form, the gzip command looks like this:

gzip file

gzip replaces the original file with a compressed version, adding the .gz extension. File properties such as access and modification timestamps are retained by the packing process. If you prefer to use a different extension, you can set the -S (suffix) flag to specify your own instead. For example, the command

gzip -S .z image.bmp

creates a compressed file titled image.bmp.z.

The size of the compressed file depends on the distribution of identical strings in the original file. gzip will compress a file efficiently if it contains patterns that repeat frequently. gzip does not do a good job of compressing files that already use a compressed format, such as MP3 or JPEG. Listing 1 shows the difference in performance between a bitmap and a JPEG file: the bitmap shrinks from 2313894 to 9547 bytes, whereas the JPEG only drops from 169862 to 130524 bytes.

Listing 1: Compression Compared

$ ls -l
-rw-r--r-- 1 daisy daisy 2313894 Sep 3 22:47 screenie.bmp
-rw-r--r-- 1 daisy daisy 169862 Sep 5 12:41 screenie.jpg
$ gzip screenie*
$ ls -l
-rw-r--r-- 1 daisy daisy 9547 Sep 3 22:47 screenie.bmp.gz
-rw-r--r-- 1 daisy daisy 130524 Sep 5 12:41 screenie.jpg.gz

You can additionally specify a compression level between 1 and 9 to influence the compression: gzip -1 is quicker, and gzip -9 is slower but compresses more tightly. The default is -6. If you prefer to change the default, you can do so by setting the GZIP environment variable in the ~/.bashrc file:

export GZIP="-9"

A gzip file can be unpacked using either gunzip or gzip -d. If the tool discovers a file of the same name in the working directory, it prompts you to make sure that you know you are overwriting this file:

$ gunzip screenie.jpg.gz
gunzip: screenie.jpg already exists; do you wish to overwrite (y or n)?
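The effect shown in Listing 1 can be reproduced without risking real files. The following is a minimal sketch, assuming a POSIX shell with mktemp, head -c, and /dev/urandom available; the file names repetitive.txt and random.bin are hypothetical stand-ins for the bitmap and the JPEG, not files from the article.

```shell
#!/bin/sh
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# 1 MiB of one repeated byte: a stand-in for the highly repetitive bitmap.
head -c 1048576 /dev/zero | tr '\0' 'A' > repetitive.txt
# 1 MiB of random bytes: a stand-in for already-compressed data such as a JPEG.
head -c 1048576 /dev/urandom > random.bin

# -c writes the compressed data to stdout, so the originals stay in place.
gzip -c repetitive.txt > repetitive.txt.gz
gzip -c random.bin > random.bin.gz

# Print the size in bytes next to each file name.
for f in repetitive.txt repetitive.txt.gz random.bin random.bin.gz; do
    printf '%8d %s\n' "$(wc -c < "$f")" "$f"
done

# Round trip: unpack to stdout and compare with the original.
gzip -dc repetitive.txt.gz | cmp - repetitive.txt && echo "round trip OK"
```

The repetitive file should shrink to a tiny fraction of its size, while the random file stays at roughly its original size, which is the same pattern the bitmap and the JPEG show in Listing 1.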

88 ISSUE 61 DECEMBER 2005 WWW.LINUX-MAGAZINE.COM

If you answer n to the “already exists; do you wish to overwrite (y or n)?” prompt, gzip aborts the operation. If you object to the safety check, you can disable the check by setting the -f (for “force”) option. The parameter has an additional side effect: by default, gzip refuses to compress symlinks. But if you set the -f parameter before pointing gzip at a symlink, the tool will compress the file the link points to and assign the name of the symlink (plus the normal extension). When you unpack the file, this does not give you a symlink but a normal file (Figure 1).

There is no need to unpack compressed text files, such as the HOWTOs below /usr/share/doc/, before using the less or more pager to view them. Running gzip with the -d and -c options (-c sends the output to stdout), and piping the output to a pager, gives you a much quicker approach, for example:

gzip -dc /usr/share/doc/iptables/README..gz | less

You can abbreviate this even further using zless file.gz. This command is in fact a short script that has a similar result but with less typing.

Newer, Quicker, Better: bzip2

The bzip2 tool uses a different compression algorithm, and thus it compresses most files more effectively. Listing 2 shows gzip and bzip2 in comparison. Additionally, bzip2 has a recovery mode. On compressing, the tool splits the data into individual blocks; if a compressed file is damaged, it may still be possible to rescue the data in the undamaged blocks. To do this, run bzip2recover, which extracts the undamaged blocks so that you can unpack them.

Listing 2: gzip and bzip2 Compared

$ ls -l image.bmp*
-rw-r--r-- 1 daisy daisy 2313894 Sep 5 13:35 image.bmp
-rw-r--r-- 1 daisy daisy 2534 Sep 5 13:35 image.bmp.bz2
-rw-r--r-- 1 daisy daisy 9547 Sep 5 13:35 image.bmp.gz

Apart from a few minor differences, most bzip2 options are like their gzip counterparts. Again, you simply specify a filename to compress that file:

bzip2 file

The compressed file will have a .bz2 extension after the event, and it keeps the original file properties, just like compressing with gzip. bzip2 also has an option for keeping a copy of the original. To do this, simply specify the -k (for “keep”) parameter:

bzip2 -k file

(gzip only gained an equivalent -k option in version 1.6. To do the same thing with older gzip versions, you could output the compressed file to standard output, and then redirect the output to a new file: gzip -c file > file.gz. However, the compressed version of the file will not keep the properties of the original file in this case.)
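The two techniques from this part of the article, keeping the original while compressing and reading a compressed text file without unpacking it on disk, can be sketched as follows. This is an illustration under stated assumptions: howto.txt is a hypothetical sample file, the output is captured in a variable rather than piped to less so that the script runs non-interactively, and the bzip2 step only runs if bzip2 happens to be installed.

```shell
#!/bin/sh
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical sample file standing in for a HOWTO below /usr/share/doc/.
printf 'line %s of a sample HOWTO\n' 1 2 3 4 5 > howto.txt

# Keep the original: write the compressed data to stdout and redirect it.
gzip -c howto.txt > howto.txt.gz

# Read the compressed copy without unpacking it on disk. In an interactive
# shell you would pipe this into less; here the text is captured instead.
viewed=$(gzip -dc howto.txt.gz)
printf '%s\n' "$viewed"

# bzip2 -k keeps the original in a single step, if bzip2 is installed.
if command -v bzip2 >/dev/null 2>&1; then
    bzip2 -k howto.txt
    ls -l howto.txt howto.txt.bz2
fi
```

Because the redirect creates howto.txt.gz as a brand-new file, it carries the current time and default permissions rather than those of howto.txt, which is the drawback the parenthetical note above describes.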


Figure 1: Be careful how you compress symlinks. You can use the -f parameter to force this operation, but unpacking the file will not give you a symlink.

Just like with gzip, the -1 through -9 (default) parameters tell bzip2 what level of compression to use. To change the default, again set the BZIP2 environment variable by adding the following to ~/.bashrc:

export BZIP2="-6"

To unpack a compressed file, set bzip2’s -d flag, or use the special bunzip2 unpacking tool. Like gzip, bzip2 gives you some protection against inadvertently overwriting existing files, but in contrast to gzip, the tool does not give you freedom of choice. Instead it displays the following message and quits the operation:

$ bunzip2 peggy.jpg.bz2
bunzip2: Output file peggy.jpg already exists.

The -f parameter disables this behavior.

Into the Archive

The tar program stores multiple files in an archive. The program’s name (tar is short for “tape archiver”) indicates its original use for tape archive management. But tar can do more than just concatenate files; it has options for compressing archives directly using gzip or bzip2, and this is what gives you the common tar.gz and tar.bz2 archive names.

To put multiple files in an archive, enter

tar -cvf archive.tar file1 file2

for example. The -c, -v, and -f options, which are in shorthand in this command, tell tar to create the archive (“c” for “create”), tell us what is going on behind the scenes (“v” for “verbose”), and interpret the first argument (archive.tar) as the archive name (“f” for “file”).

If you notice that you have forgotten a file at a later date, you don’t have to redo the whole archive; instead you can simply append the file using the -r option:

tar -rf archive.tar file

Unpack an archive with the -x (“x” for “extract”) option:

tar -xvf archive.tar

To ensure that tar does not overwrite existing files with content from the archive, it is a good idea to unpack the archive in a temporary directory, or use -t instead of -x for a trial run, just to see which files and directories the archive contains.

You can handle complete directories and subdirectories with the same approach. Instead of specifying individual filenames, just give tar the name of the folder:

$ tar -cvf archive.tar test/
test/
test/screenie.bmp
test/link.bmp
test/new/
test/new/screenie.jpg
test/new/new/
...

It can make sense to exclude some directories from the archive, especially if you are using tar for backups. The --exclude parameter gives you this ability. The --rsh-command parameter is also useful in this context, as you can tell tar to use SSH to send the backup to another machine. The complete command looks like this:

tar -cvf user@host:/scratch/tmp/backup_$(date '+%Y_%m_%d').tar --rsh-command=/usr/bin/ssh --exclude=/proc /

This fairly lengthy command gives the following instructions: use SSH to create a backup copy in the /scratch/tmp/ directory on the host machine, basing the archive name on the backup_ prefix, the current date, and the .tar extension (for example, backup_2005_10_05.tar). This backup will include all directories from the root /, but without the /proc directory, which does not contain any real data.

Altogether Now!

As I mentioned previously, tar has a number of parameters for creating gzip or bzip2 compressed archives all at once. The syntax for the gzip variant of the tar command is:

tar -cvzf archive.tar.gz file(s)

If you prefer to use bzip2 instead of gzip with tar, replace the -z option in the preceding command with -j:

tar -cvjf archive.tar.bz2 file(s)

You can set the -z and -j parameters appropriately when unpacking the archive to save an extra step; for example, instead of

bunzip2 archive.tar.bz2
tar -xvf archive.tar

just say

tar -xvjf archive.tar.bz2

to invoke the -j option for unpacking a bzip2 archive.

THE AUTHOR

Heike Jurzik studied German, Computer Science and English at the University of Cologne, Germany. She discovered Linux in 1996 and has been fascinated with the scope of the Linux command line ever since. In her leisure time you might find Heike hanging out at Irish folk sessions or visiting Ireland.
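As a closing exercise, the tar workflow from “Into the Archive” and “Altogether Now!” can be tried locally in a scratch directory. The sketch below is only an illustration: the project/ tree, extra.txt, and the restore/ directory are hypothetical names, the remote SSH target is left out so the script runs anywhere, and the -C option (which tells tar where to unpack) is an addition that the article itself does not cover.

```shell
#!/bin/sh
# Work in a throwaway directory so no real data is touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Hypothetical directory tree to archive.
mkdir -p project/cache
echo "keep me" > project/notes.txt
echo "scratch data" > project/cache/tmp.dat
echo "added later" > extra.txt

# Archive name built from a prefix and the current date, as in the article.
archive="backup_$(date '+%Y_%m_%d').tar"

# Create the archive, leaving out the cache directory.
tar -cf "$archive" --exclude='project/cache' project
# Append a forgotten file; -r only works on uncompressed archives.
tar -rf "$archive" extra.txt
# Trial run: list the contents instead of extracting them.
tar -tf "$archive"

# Unpack into a separate directory so nothing existing is overwritten.
mkdir restore
tar -xf "$archive" -C restore

# gzip-compressed variant created in a single step.
tar -czf project.tar.gz project
```

Appending with -r is done before any compression here because tar cannot add files to an archive that has already been compressed with -z or -j.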
