International Journal of Advances in Electronics and Computer Science, ISSN: 2393-2835, Volume-4, Issue-2, Feb.-2017, http://iraj.in

A COMPARATIVE STUDY OF LOSSLESS COMPRESSION TECHNIQUES

1J P SATI, 2M J NIGAM

1,2 Indian Institute of Technology, Roorkee, India. E-mail: [email protected], [email protected]

Abstract- As we deal with more and more digital data, several compression techniques have been developed to meet the increasing need to store more data in less memory. Compression can save storage capacity, speed up file transfer, and decrease costs for storage hardware and network bandwidth. This paper provides a performance analysis of lossless compression techniques with respect to various parameters such as compression ratio, compression factor, saving percentage, and compression and de-compression time. It presents the relevant data about variations in these parameters and describes the possible causes for them. The simulation results are obtained in MATLAB R2009a. The paper focuses on the de-compression time and the reasons for the differences observed in the comparison.

Keywords- Run Length Encoding (RLE), Huffman, Arithmetic, Lempel-Ziv-Welch (LZW), Compression Ratio.

I. INTRODUCTION

Data compression is a technique that transforms data from one representation to another new (compressed) representation, which contains the same information but with the smallest possible size [13]. The size of the data is reduced by removing excessive or redundant information. The data is then stored or transmitted at reduced storage and/or communication costs. Compressing a file to half of its original size is equivalent to doubling the capacity of the storage medium. It may then become feasible to store the data at a higher level of the storage hierarchy and reduce the load on the input/output channels of the system.

Fig 1: Compression and de-compression process

There are two classes of compression techniques, lossless and lossy compression. In a lossless compression scheme, the reconstructed image is the same as the input image. Lossless image compression techniques first convert the image into a stream of pixel values; processing is then done on each single pixel. The first step includes prediction of the next image pixel value from the neighbourhood pixels. In the second stage, the difference between the predicted value and the actual intensity of the next pixel is coded using different encoding methods.

Lossy compression techniques provide a higher compression ratio than lossless compression. In this method, the compressed image is not the same as the original image; there is some amount of loss in the image. In lossy compression, much information can be simply discarded from image data, audio data or video data, and when the data are uncompressed they will still be of acceptable quality.

II. LOSSLESS COMPRESSION METHODS

Commonly used lossless compression techniques are Run Length Encoding (RLE), Huffman coding, Arithmetic coding and Lempel-Ziv-Welch (LZW) coding.

2.1 Run length coding
Run-length encoding (RLE) is a data compression algorithm that is supported by most file formats, such as TIFF, BMP, and PCX. RLE is suited to compressing any type of data regardless of its information content, but the content of the data will affect the compression ratio achieved by RLE. RLE is both easy to implement and quick to execute, making it a good alternative to either using a complex compression algorithm or leaving the image data uncompressed.

RLE works by reducing the physical size of a repeating string of characters. This repeating string, called a run, is typically encoded into two bytes. The first byte represents the number of characters in the run and is called the run count. In practice, an encoded run may contain 1 to 128 or 256 characters; the run count usually contains the number of characters minus one (a value in the range of 0 to 127 or 255). The second byte is the value of the character in the run, which is in the range of 0 to 255, and is called the run value.

A run of 15 A's would normally require 15 bytes to store:
AAAAAAAAAAAAAAA


The same string after RLE would require only two bytes: 15A. This compression technique is useful for monochrome images or images having the same background pixels. The implementation of run-length encoding was carried out in MATLAB R2009a. The steps for executing the code are as follows, and an illustrative sketch follows the list:
• Convert the image into a greyscale image.
• Read the greyscale image and rearrange the image data as a single row vector.
• Convert all intensity values to binary form and obtain a binary stream representation of the image.
• Count the consecutive 1's and 0's appearing in the sequence and store them as the run-length-encoded sequence.
• Reconstruct the original image.
• Calculate the compression ratio as the ratio of the original image size to the size of the run-length-encoded sequence.
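The paper's own implementation is in MATLAB R2009a and is not reproduced here. As a minimal illustrative sketch of the counting step above (not the authors' code; the function names and the list-of-bits representation are our assumptions), a run-length encoder and decoder over a bit sequence might look like this in Python:

# Illustrative sketch only: the paper's own code is in MATLAB R2009a.
# Encodes a flat sequence of bits as (symbol, run-length) pairs and back.

def rle_encode(bits):
    """Collapse consecutive repeats of 0/1 into (symbol, count) pairs."""
    runs = []
    prev, count = bits[0], 1
    for b in bits[1:]:
        if b == prev:
            count += 1
        else:
            runs.append((prev, count))
            prev, count = b, 1
    runs.append((prev, count))
    return runs

def rle_decode(runs):
    """Expand (symbol, count) pairs back into the original bit sequence."""
    bits = []
    for symbol, count in runs:
        bits.extend([symbol] * count)
    return bits

# Toy usage: a run of fifteen identical symbols collapses to a single
# pair, mirroring the paper's "15 bytes -> 15A" example.
stream = [1] * 15 + [0] * 5
encoded = rle_encode(stream)        # [(1, 15), (0, 5)]
assert rle_decode(encoded) == stream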
2.2 Huffman coding
Huffman coding is a variable-length coding technique in which codes are assigned to symbols based on their probabilities. Symbols are generated from the pixels of an image, and bits are assigned to them on the basis of the frequency of occurrence of the symbols: fewer bits are assigned to the symbols that occur more frequently, while more bits are assigned to the symbols that occur less frequently. In Huffman coding, the generated binary code of any symbol is not the prefix of the code of any other symbol [3] [5]. The implementation of Huffman coding was carried out in MATLAB R2009a. The steps for executing the code are as follows, with a sketch after the list:
• Convert the image into a greyscale image.
• Read the greyscale image and convert the array into a single row vector.
• From the greyscale image, form a Huffman encoding tree using the probabilities of the symbols in the image.
• Encode each symbol independently using the Huffman encoding tree.
• Reconstruct the original image by decompressing it using Huffman decoding.
• Calculate the compression ratio as the ratio of the original image size to the size of the Huffman-coded sequence.
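The paper does not list its MATLAB code, so the following Python sketch (our illustration; the names and data layout are assumptions) shows one standard way to build the Huffman code table from symbol frequencies with a min-heap, so that frequently occurring symbols receive shorter prefix-free codes:

# Illustrative sketch only (the paper uses MATLAB R2009a): building a
# Huffman code table from symbol frequencies with a min-heap.
import heapq
from collections import Counter

def huffman_code(symbols):
    """Return a prefix-free code {symbol: bitstring} from symbol counts."""
    counts = Counter(symbols)
    if len(counts) == 1:                      # degenerate one-symbol input
        return {next(iter(counts)): "0"}
    # Heap entries: (frequency, tie-breaker, {symbol: partial code}).
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)       # two least frequent subtrees
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + code for s, code in c1.items()}
        merged.update({s: "1" + code for s, code in c2.items()})
        heapq.heappush(heap, (f1 + f2, tie, merged))
        tie += 1
    return heap[0][2]

pixels = [0, 0, 0, 0, 0, 1, 1, 2]             # toy greyscale "image"
table = huffman_code(pixels)                  # frequent symbols get short codes
encoded = "".join(table[p] for p in pixels)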

2.3 Arithmetic encoding
Arithmetic coding is also a variable-length coding technique. In this technique, the entire sequence of symbols generated from the pixels is converted into a single floating-point number, also termed a binary fraction. In arithmetic coding, a tag is generated for the sequence which is to be encoded. This tag signifies the given binary fraction and becomes the unique binary code for the sequence. The unique binary code generated for a given sequence of a given length does not depend on the entire length of the sequence [1] [4] [10]. The implementation of arithmetic coding was carried out in MATLAB R2009a [12]. The steps for executing the code are as follows, with a sketch after the list:
• Convert the image into a greyscale image.
• Read the greyscale image and store all the intensity values as a single row vector.
• Convert the matrix into binary form and arrange all the bits in a binary stream representing the same image.
• Encode the entire stream using the arithmetic encoding algorithm.
• Calculate the compression ratio as the ratio of the original image size to the size of the arithmetic-coded sequence.
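Again, the MATLAB implementation is not shown in the paper. As a hedged illustration of the tag idea described above (our code, using floating-point arithmetic that is only reliable for short sequences; production coders use integer renormalisation), the interval-narrowing process for a binary source might look like:

# Illustrative sketch only: a floating-point "tag" version of arithmetic
# coding for short binary sequences.

def arithmetic_tag(bits, p0):
    """Shrink [low, high) once per input bit; return the interval midpoint."""
    low, high = 0.0, 1.0
    for b in bits:
        split = low + (high - low) * p0       # boundary between '0' and '1'
        if b == 0:
            high = split                      # '0' keeps the lower part
        else:
            low = split                       # '1' keeps the upper part
    return (low + high) / 2                   # any point inside identifies it

def arithmetic_decode(tag, p0, n):
    """Replay the same interval splits to recover n bits from the tag."""
    low, high, bits = 0.0, 1.0, []
    for _ in range(n):
        split = low + (high - low) * p0
        if tag < split:
            high = split
            bits.append(0)
        else:
            low = split
            bits.append(1)
    return bits

msg = [0, 0, 1, 0, 1, 0, 0, 0]
tag = arithmetic_tag(msg, p0=0.75)            # skewed source -> wide intervals
assert arithmetic_decode(tag, 0.75, len(msg)) == msg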

2.4 Lempel-Ziv-Welch (LZW) coding
The LZW compression algorithm is a dictionary-based algorithm. This means that instead of tabulating character counts and building trees (as in Huffman encoding), LZW encodes data by referencing a dictionary. It represents variable-length strings of symbols with fixed-length codes. The original version of this method was created by Lempel and Ziv in 1978 (LZ78) and was further refined by Welch in 1984, hence the LZW acronym.
Dictionary-based coding schemes are of two types, static and adaptive. In static dictionary-based coding, the dictionary size is fixed during the encoding and decoding processes, while in adaptive dictionary-based coding, the dictionary is updated and reset when it is completely filled. Since images are used as data, static coding suits the compression job with minimum delay [3] [6] [7]. The implementation of LZW encoding was carried out in MATLAB R2009a [6] [7]. The steps for executing the code are as follows, with a sketch after the list:
• Convert the image into a greyscale image.
• Read the image and arrange all the intensity values in a single row vector.
• Convert all the values to binary form and obtain a single-row binary representation.
• Initialize the dictionary with the basic symbols 1 and 0.
• Start encoding and decoding based on a search-and-find method: add any new word found to the dictionary and encode the sequence.
• If the dictionary is completely filled, continue using the same dictionary.
• Calculate the compression ratio as the ratio of the original image size to the size of the encoded sequence.
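A minimal Python sketch of the listed steps (our illustration, not the authors' MATLAB code) initialises the dictionary with the basic symbols '0' and '1', extends it with each new phrase, and stops growing it once full, as in the static scheme described above:

# Illustrative sketch only: LZW over a binary string with the dictionary
# initialised to the two basic symbols "0" and "1".

def lzw_encode(data, max_size=4096):
    """Emit dictionary indices for the longest phrases already seen."""
    dictionary = {"0": 0, "1": 1}
    phrase, out = "", []
    for ch in data:
        candidate = phrase + ch
        if candidate in dictionary:
            phrase = candidate                # keep extending the match
        else:
            out.append(dictionary[phrase])
            if len(dictionary) < max_size:    # static: stop growing when full
                dictionary[candidate] = len(dictionary)
            phrase = ch
    out.append(dictionary[phrase])
    return out

codes = lzw_encode("0001110001110000")        # indices into the phrase table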

III. EVALUATION AND COMPARISON

3.1 Performance Parameters
Depending on the nature of the application, there are various criteria to measure the performance of a compression algorithm. The following measurement parameters are used to evaluate the performance of lossless compression algorithms.

Compression ratio is the ratio between the size of the compressed file and the size of the source file:

    compression ratio = size after compression / size before compression

Table 1: Compression ratio

Compression factor is the inverse of the compression ratio, that is, the ratio between the size of the source file and the size of the compressed file:

    compression factor = size before compression / size after compression

Table 2: Compression factor

Saving percentage calculates the shrinkage of the source file as a percentage:

    saving percentage = 1 − (size after compression / size before compression)

Table 3: Saving percentage

Bits per pixel is the number of bits per pixel used in the compressed representation of the image.

    bits per pixel = 8 / compression ratio

Table 4: Bits per pixel
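As a small worked example of the four formulas above (our helper function, not from the paper; note that the experimental sections report the compression ratio the other way up, as original size over compressed size, so the plotted values exceed 1):

# Illustrative helper for the Section 3.1 parameters of an 8-bit image.
def compression_metrics(size_before, size_after):
    ratio = size_after / size_before          # compression ratio (as defined above)
    factor = size_before / size_after         # compression factor = 1 / ratio
    saving = 1.0 - size_after / size_before   # saving percentage, as a fraction
    bpp = 8.0 / factor                        # bits per pixel = 8 / (before/after)
    return ratio, factor, saving, bpp

# A 10000-byte source compressed to 4000 bytes:
print(compression_metrics(10000, 4000))       # (0.4, 2.5, 0.6, 3.2)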

Along with the above parameters, the compression and de-compression times are also used to measure effectiveness.

Compression and De-Compression Time
The times taken for compression and for decompression should be considered separately. For some applications, such as transferring compressed video data, the de-compression time is more important, while for other applications both compression and decompression times are equally important. If the compression and decompression times of an algorithm are small, or at an acceptable level, the algorithm is acceptable with respect to the time factor.

3.1.1 Results with real images
The following images, with sizes given in Table 1, are used for comparison.

Fig 2: Test images

As seen in Tables 1, 2, 3 and 4, the relative compression ratios, compression factors, saving percentages and bits per pixel are displayed, respectively, for each technique used for compression. Among them all, run length encoding shows the maximum compression ratio, but the run length algorithm simply works to reduce inter-pixel redundancy, which exists only when extreme shades are significant. Since most real-world images lack such dominance of shades, RLE is rarely used nowadays for lossless data compression. Considering the available data about the compression ratio, the Huffman encoding scheme is found to be optimum, since it solely works on reducing redundancy in the input data.


Though arithmetic encoding seems to generate results closest to Huffman encoding, it also considers inter-pixel redundancy, which reduces the compression factor. Lempel-Ziv-Welch encoding depends entirely on the dictionary size as the key factor in achieving greater compression ratios; thus, with smaller dictionary sizes, its compression results are inferior to those of the other compression techniques.

3.2 Comparison wrt. Compression ratios

The variations of compression ratio with respect to the probability of zero are obtained by generating random images of the same size, 50x50 pixels, while changing the probability of the symbol zero in steps of 0.1. The images shown in Fig 3 are used for this purpose.

Fig 3: Test images to compare the compression ratios

Fig 4: CR against probability of 0's for Run length coding

Fig 5: CR against probability of 0's for Huffman coding

Fig 6: CR against probability of 0's for Arithmetic coding

Fig 7: CR against probability of 0's for LZW coding

The graphs in Figs 4, 5, 6 and 7 show the variation in compression ratio with respect to the probability of zero in the image to be compressed. Across all the graphs, we find that, irrespective of the technique used for compression, the compression ratio increases as the probability of zero approaches either zero or one. The results show that the minimum value of the compression ratio occurs when the probability of zero is 0.5, irrespective of the method used. This can be understood from the standard entropy of binary data with respect to the probabilities of occurrence of the symbols. When the probabilities of zero and one are equal (each 0.5), the information content is maximum. As we move towards extreme probabilities, the redundancy in the information becomes more and more significant. Thus, with any lossless technique, the compression results are best when the symbol probabilities lie at either extreme, as the entropy sketch below illustrates.

The Huffman coding shows an almost linear increase and decrease in compression ratio as we move away from the centre probability, while the other methods show nonlinearity. This is because Huffman coding is based purely on modifying the information by assigning bits to the respective symbols, whereas the other techniques modify the data by counting repeating symbols, splitting a probability range, or building a dictionary, all of which are nonlinear operations. Therefore, the variations in compression ratio are nonlinear for the other techniques.
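The entropy argument can be made concrete. The following sketch (our illustration, not in the paper) evaluates the binary entropy function H(p), whose peak of 1 bit per symbol at p = 0.5 explains why every method's compression ratio bottoms out there:

# Our illustration: the binary entropy function underlying the U-shaped
# compression-ratio curves in Figs 4-7. H(p) peaks at 1 bit/symbol when
# p = 0.5 and falls to 0 at the extremes, which is why every lossless
# coder does worst on equiprobable data.
import math

def binary_entropy(p0):
    """Shannon entropy (bits/symbol) of a binary source with P(0) = p0."""
    if p0 in (0.0, 1.0):
        return 0.0
    p1 = 1.0 - p0
    return -p0 * math.log2(p0) - p1 * math.log2(p1)

for p0 in (0.1, 0.3, 0.5, 0.7, 0.9):
    # For an i.i.d. binary source, 1/H(p0) bounds the achievable
    # compression factor per input bit.
    print(f"P(0) = {p0:.1f}   H = {binary_entropy(p0):.3f} bits/symbol")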

3.3 Comparison of Compression time
To compare the compression times, random images of different sizes but the same probabilities of zero are taken. Three data sets are generated, with the probability of the symbol zero for images a), b) and c) in Fig 8 being 0.25, 0.5 and 0.75 respectively. Ten samples of each image are used, with the image size varying from 1 kB to 10 kB; a sketch of such a test harness follows.

Fig 8: Test images to compare the compression and de-compression time
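The paper's timing harness is in MATLAB; as a sketch of the procedure just described (our code; rle_encode refers to the earlier RLE sketch, and any of the other encoders could be substituted), one might generate random bit-images with a chosen probability of zero and average the encoding delay over ten samples per size:

# Our sketch of the test procedure (the paper's harness is in MATLAB):
# generate random binary images with a chosen probability of zero and
# time an encoder over sizes from 1 to 10 kilobytes.
import random
import time

def random_bits(n_bytes, p_zero):
    """n_bytes * 8 random bits with P(bit == 0) = p_zero."""
    return [0 if random.random() < p_zero else 1 for _ in range(n_bytes * 8)]

def time_encoder(encode, p_zero, sizes_kb=range(1, 11), samples=10):
    """Mean encoding delay (seconds) per image size, as in Figs 9-12."""
    delays = []
    for kb in sizes_kb:
        total = 0.0
        for _ in range(samples):
            bits = random_bits(kb * 1024, p_zero)
            start = time.perf_counter()
            encode(bits)
            total += time.perf_counter() - start
        delays.append(total / samples)
    return delays

# e.g. time_encoder(rle_encode, p_zero=0.25) using the earlier RLE sketch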


It is obvious that as the size of the image increases, the compression time also increases. However, the compression time profile changes if there is any variation in the probability of zero while the size variations are kept the same.

Fig 9: Compression time variations against size of image for Run length encoding

Fig 9 shows the compression time profile for the run length compression technique, in which the processing delay is independent of the probability of zero. The principle of run length encoding is simply counting the number of identical consecutive symbols (both 0's and 1's) in the sequence. Thus, whatever the probability of zero may be, the encoding process is unaffected by it, and hence no delay variations are observed with variations in the probability of zero for the same image size.

Fig 10: Compression time variations against size of image for Huffman encoding

Fig 10 shows the compression time profile for Huffman coding, where the compression time reduces as the probability of zero in the image is increased. For the Huffman compression technique, compression is basically a rearrangement of bits according to the content of the information. As per the Huffman encoding tree structure in MATLAB, the coding first performs the assignment of 0 and then the assignment of 1. Thus, the more zeroes there are, the more 1-assignments appear in the coding table, and the greater the delay for encoding the entire data.

Fig 11: Compression time variations against size of image for Arithmetic encoding

In the arithmetic coding of Fig 11, the entire probability range is first segmented according to the probability of the symbol to be fetched. This step is repeated until the end of the sequence, and the final value at the centre of the segment is treated as the encoded value. In the case of binary data, the arithmetic encoding process changes the current segment whenever there is a transition from 1 to 0 or from 0 to 1, which is a delaying process. If the probability of either symbol is low, the transitions between the symbols (0 and 1) are also few; therefore, the segment changes are less frequent and the delay is smaller. Hence, when the probability of zero is 0.5, the observed delay is maximum, as the transitions are at their maximum. The delay reduces as we move away from the equiprobable point.

Fig 12: Compression time variations against size of image for LZW encoding

The Lempel-Ziv-Welch (LZW) compression technique is based entirely on the formation of a dictionary, unlike the other, probability-dependent techniques. Thus, in Fig 12, it can be seen that the compression time is almost the same for any given image size, irrespective of the probability of the symbols.

3.4 Comparison of De-compression time
The de-compression time is calculated for the same images which were used for the compression time.


Fig 13: De-compression time variations against size of image for Run length encoding

Fig 13 shows the de-compression time profile for the run length compression technique, in which the processing delay is independent of the probability of zero. In de-compression, the encoded data is simply converted back into runs. Therefore, no delay variations are observed with variations in the probability of zero for the same image size.

Fig 14: De-compression time variations against size of image for Huffman method

Fig 14 shows the de-compression time profile for Huffman coding: as the number of zeros in an image increases, the 1-assignments in the coding table increase, and hence the delay for decoding the entire data is greater.

Fig 15: De-compression time variations against size of image for Arithmetic encoding

In the case of arithmetic coding, as shown in Fig 15, the de-compression time is very small compared to the compression time. When the probability of zero is 0.5, the observed delay is maximum, as the transitions are at their maximum; the delay reduces as we move away from the equiprobable point.

Fig 16: De-compression time variations against size of image for LZW encoding

The de-compression time for the Lempel-Ziv-Welch (LZW) compression technique, as shown in Fig 16, does not vary for any given image size, irrespective of the probability of the symbols.

CONCLUSION

An experimental comparison of different lossless compression algorithms for image data has been carried out. Several existing lossless compression methods were compared for their effectiveness. Considering the compression ratios, compression times and decompression times of all the algorithms, the following conclusions are drawn:
• The Huffman method is found to be better than the other techniques, since it follows an optimal method to remove redundancy from the given data.
• The compression ratio achieved is maximum when one of the symbols (either 0 or 1) has a much greater probability than the other in the data.
• The relative comparison of compression times shows that the RLE and LZW methods do not show any significant change in delay with a change in the probability of the symbols, while for the Huffman and arithmetic methods the symbol probabilities affect the compression time.
• Similar to the compression time analysis, the de-compression time for the Huffman and arithmetic methods varies with the symbol probabilities, whereas it does not show any significant change for the RLE and LZW methods with a change in the probability of the symbols.

REFERENCES

[1]. Dhananjay Patel, Vinayak Bhogan & Alan Janson, "Simulation and Comparison of Various Lossless Data Compression Techniques based on Compression Ratio and Processing Delay", International Journal of Computer Applications (0975-8887), Vol. 81, No. 14, November 2013.
[2]. Mohammed Al-laham & Ibrahiem M. M. El Emary, "Comparative Study Between Various Algorithms of Data Compression Techniques", Proceedings of the World Congress on Engineering and Computer Science 2007 (WCECS 2007), October 24-26, 2007, San Francisco, USA.


[3]. Sonal Dinesh Kumar, "A Study of Various Image Compression Techniques", Proceedings of COIT, RIMT Institute of Engineering and Technology, Pacific, 2000, pp. 799-803.
[4]. Amir Said, "Introduction to Arithmetic Coding - Theory and Practice", Imaging Systems Laboratory, HP Laboratories Palo Alto, HPL-2004-76, April 21, 2004.
[5]. Huffman D.A., "A Method for the Construction of Minimum-Redundancy Codes", Proceedings of the Institute of Radio Engineers, 40(9), pp. 1098-1101, September 1952.
[6]. Ziv J. and Lempel A., "A Universal Algorithm for Sequential Data Compression", IEEE Transactions on Information Theory, 23(3), pp. 337-342, May 1977.
[7]. Ziv J. and Lempel A., "Compression of Individual Sequences via Variable-Rate Coding", IEEE Transactions on Information Theory, 24(5), pp. 530-536, September 1978.
[8]. Subramanya A, "Image Compression Technique," IEEE Potentials, Vol. 20, Issue 1, pp. 19-23, Feb-March 2001.
[9]. David Jeff Jackson & Sidney Joel Hannah, "Comparative Analysis of Image Compression Techniques," System Theory 1993, Proceedings SSST '93, 25th Southeastern Symposium, pp. 513-517, 7-9 March 1993.
[10]. Khalid Sayood, "Introduction to Data Compression", 3rd Edition, San Francisco, CA, Morgan Kaufmann, 2000.
[11]. T. Bhaskara Reddy, Hema Suresh Yaragunti, S. Kiran, T. Anuradha, "A Novel Approach of Lossless Image Compression using Hashing and Huffman Coding", International Journal of Engineering Research & Technology (IJERT), ISSN: 2278-0181, Vol. 2, Issue 3, March 2013.
[12]. Paul G. Howard and Jeffrey Scott Vitter, "Arithmetic Coding for Data Compression", Proceedings of the IEEE, Vol. 82, No. 6, June 1994.
[13]. Amit Jain, Kamaljit I. Lakhtaria, Prateek Srivastava, "A Comparative Study of Lossless Compression Algorithm on Text Data", Proc. of Int. Conf. on Advances in Computer Science, AETACS, Elsevier, 2013.
[14]. S.R. Kodituwakku, U.S. Amarasinghe, "Comparison of Lossless Data Compression Algorithms for Text Data", Indian Journal of Computer Science and Engineering, Vol. 1, No. 4, pp. 416-425.



