Data Compression Techniques: A Comparative Study
International Journal of Applied Research & Studies, ISSN 2278 – 9480 (Review Paper)

Authors: 1 Pooja Jain*, 2 Zeeshan Khan, 3 Anurag Jain
Address for correspondence: 1, 2, 3 Dept. of Computer Science, Radharaman Institute of Technology and Science, Bhopal, India.
* Corresponding Author Email-Id: [email protected]

Abstract - Data compression is one of the many technologies that enable today's information revolution, and image compression is an application of data compression. Image compression is now essential for applications such as transmission and storage in databases. In this paper we review and discuss image compression, the need for compression, its principles, the classes of compression, and various image compression algorithms. The paper attempts to give a recipe for selecting one of the popular image compression algorithms based on the Wavelet, JPEG/DCT, VQ, and Fractal approaches. We review and discuss the advantages and disadvantages of these techniques and algorithms for compressing grayscale images, and give an experimental comparison on the commonly used 256×256 Lenna image and on one 400×400 fingerprint image.

Keywords - Data compression, Image compression, JPEG, DCT, VQ, Wavelet, Fractal.

I. INTRODUCTION

Data compression is one of the many technologies that enable today's information revolution. Lossless data compression is used to compact files or data into a smaller form. It is often used to package software before it is sent over the Internet or downloaded from a web site, to reduce the amount of time and bandwidth required to transmit the data. Lossless data compression has the constraint that when the data is uncompressed, it must be identical to the original data that was compressed. Graphics, audio, and video compression schemes such as JPEG, MP3, and MPEG, on the other hand, use lossy compression, which throws away some of the original data to compress the files even further.

Image compression is the application of data compression to digital images. In effect, the objective is to reduce redundancy in the image data in order to store or transmit the data in an efficient form. Uncompressed multimedia (graphics, audio and video) data requires considerable storage capacity and transmission bandwidth. Despite rapid progress in mass-storage density, processor speeds, and digital communication system performance, the demand for data storage capacity and data-transmission bandwidth continues to outstrip the capabilities of available technologies. The recent growth of data-intensive multimedia-based web applications has not only sustained the need for more efficient ways to encode signals and images but has made compression of such signals central to storage and communication technology.

II. Principles Behind Compression

A common characteristic of most images is that neighboring pixels are correlated and therefore contain redundant information. The foremost task is then to find a less correlated representation of the image. Two fundamental components of compression are redundancy reduction and irrelevancy reduction. Redundancy reduction aims at removing duplication from the signal source (image/video). Irrelevancy reduction omits parts of the signal that will not be noticed by the signal receiver, namely the Human Visual System (HVS). In general, three types of redundancy can be identified:

A. Coding Redundancy
A code is a system of symbols (letters, numbers, bits, and the like) used to represent a body of information or a set of events. Each piece of information or event is assigned a sequence of code symbols, called a code word. The number of symbols in each code word is its length. The 8-bit codes that are used to represent the intensities in most 2-D intensity arrays contain more bits than are needed to represent those intensities.
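As a concrete illustration of coding redundancy, the short Python sketch below (not part of the original paper; the four-level intensity histogram and the code words are hypothetical) compares the fixed 8-bit representation with a variable-length prefix code that gives the most probable intensities the shortest code words.

```python
# Illustrative sketch of coding redundancy for a hypothetical 4-level image.
# A fixed 8-bit code spends 8 bits on every pixel; a variable-length prefix
# code that matches code-word length to probability needs far fewer bits on
# average.

# Hypothetical probabilities of four intensity levels in an image.
probabilities = {0: 0.60, 64: 0.25, 128: 0.10, 255: 0.05}

# A possible variable-length (prefix-free) code for those levels.
variable_code = {0: "0", 64: "10", 128: "110", 255: "111"}

fixed_bits_per_pixel = 8
avg_variable_bits = sum(p * len(variable_code[level])
                        for level, p in probabilities.items())

print(f"fixed code      : {fixed_bits_per_pixel} bits/pixel")
print(f"variable code   : {avg_variable_bits:.2f} bits/pixel")
print(f"relative saving : {1 - avg_variable_bits / fixed_bits_per_pixel:.0%}")
```

For this made-up source the variable-length code averages about 1.55 bits per pixel, so roughly 6.5 of the 8 bits per pixel in the fixed-length representation are redundant.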
B. Spatial Redundancy and Temporal Redundancy
Because the pixels of most 2-D intensity arrays are correlated spatially, information is unnecessarily replicated in the representations of the correlated pixels. In a video sequence, temporally correlated pixels also duplicate information.

C. Irrelevant Information
Most 2-D intensity arrays contain information that is ignored by the human visual system and is extraneous to the intended use of the image. It is redundant in the sense that it is not used. Image compression research aims at reducing the number of bits needed to represent an image by removing the spatial and spectral redundancies as much as possible.

III. NEED OF COMPRESSION

Table 1 shows the qualitative transition from simple text to full-motion video data and the disk space, transmission bandwidth, and transmission time needed to store and transmit such uncompressed data.

Table 1: Multimedia data types and uncompressed storage space, transmission bandwidth, and transmission time required. The prefix kilo- denotes a factor of 1000 rather than 1024.

Multimedia Data           | Size / Duration | Bits/Pixel or Bits/Sample | Uncompressed Size (B = bytes) | Transmission Bandwidth (b = bits) | Transmission Time
A page of text            | 11" x 8.5"      | Varying resolution        | 4-8 KB                        | 32-64 Kb/page                     | 1.1-2.2 sec
Telephone-quality speech  | 10 sec          | 8 bps                     | 80 KB                         | 64 Kb/sec                         | 22.2 sec
Grayscale image           | 512 x 512       | 8 bpp                     | 262 KB                        | 2.1 Mb/image                      | 1 min 13 sec
Color image               | 512 x 512       | 24 bpp                    | 786 KB                        | 6.29 Mb/image                     | 3 min 39 sec
Medical image             | 2048 x 1680     | 12 bpp                    | 5.16 MB                       | 41.3 Mb/image                     | 23 min 54 sec
SHD image                 | 2048 x 2048     | 24 bpp                    | 12.58 MB                      | 100 Mb/image                      | 58 min 15 sec

The examples in Table 1 clearly illustrate the need for sufficient storage space, large transmission bandwidth, and long transmission time for image, audio, and video data. At the present state of technology, the only solution is to compress multimedia data before storage and transmission, and to decompress it at the receiver for playback. For example, with a compression ratio of 32:1, the space, bandwidth, and transmission time requirements can be reduced by a factor of 32, with acceptable quality.

IV. Different classes of compression techniques

Two ways of classifying compression techniques are mentioned here.

A. Lossless vs. Lossy compression
In lossless compression schemes, the reconstructed image, after compression, is numerically identical to the original image. However, lossless compression can only achieve a modest amount of compression. An image reconstructed following lossy compression contains degradation relative to the original; often this is because the compression scheme completely discards redundant information. However, lossy schemes are capable of achieving much higher compression, and under normal viewing conditions no visible loss is perceived (visually lossless).

B. Predictive vs. Transform coding
In predictive coding, information already sent or available is used to predict future values, and the difference is coded. Since this is done in the image or spatial domain, it is relatively simple to implement and is readily adapted to local image characteristics. Differential Pulse Code Modulation (DPCM) is one particular example of predictive coding. Transform coding, on the other hand, first transforms the image from its spatial-domain representation to a different type of representation using some well-known transform and then codes the transformed values (coefficients). This method provides greater data compression than predictive methods, although at the expense of greater computation.
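To make the predictive-coding idea concrete, here is a minimal first-order DPCM sketch in Python (an illustrative toy, not the paper's implementation; the sample row of pixel values is made up). Each sample is predicted by the previous sample and only the prediction error is stored, so the residuals cluster near zero and become cheap to entropy-code.

```python
# Toy first-order DPCM on a single row of 8-bit pixel values.
# The predictor is simply the previous sample; the residuals (prediction
# errors) cluster around zero, which a later entropy coder can exploit.

def dpcm_encode(samples):
    """Return the list of prediction errors for a 1-D signal."""
    errors, prediction = [], 0         # predict 0 for the first sample
    for s in samples:
        errors.append(s - prediction)  # code only the difference
        prediction = s                 # next prediction = current sample
    return errors

def dpcm_decode(errors):
    """Invert the encoder: accumulate the errors to recover the signal."""
    samples, prediction = [], 0
    for e in errors:
        value = prediction + e
        samples.append(value)
        prediction = value
    return samples

row = [100, 102, 103, 103, 105, 110, 111, 111]   # hypothetical pixel row
residuals = dpcm_encode(row)
print(residuals)                    # [100, 2, 1, 0, 2, 5, 1, 0]
assert dpcm_decode(residuals) == row             # lossless round trip
```

The lossless variant shown here reconstructs the row exactly; a lossy DPCM codec would additionally quantize the residuals before transmission, trading some reconstruction error for a lower bit rate.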
V. CONVENTIONAL UNIVERSAL DATA COMPRESSION

Most universal compression techniques collect statistical information to identify the highest-frequency patterns in the data. In this section some of the previous attempts to address the universal compression problem by utilising statistical approaches are reviewed.

A. Adaptive Compression Algorithm
Typically, data compression algorithms have been implemented through a two-pass approach. In the first pass, the algorithm collects information about the data to be compressed, such as the frequencies of characters or substrings. In the second pass, the actual encoding takes place. A fixed encoding scheme is required to ensure that the decoder is able to retrieve the original message. This approach has been improved through so-called adaptive compression, where the algorithm needs only one pass to compress the data [25]. The main idea of adaptive compression algorithms is that the encoding scheme changes as the data is being compressed: the encoding of the nth symbol is based on the characteristics of the data up to position n-1 [25]. A key advantage of adaptive compression is that it does not require the entire message to be loaded into memory before the compression process can start.

Adaptive Huffman coding [26] is a statistical lossless compression method in which the code is represented using a binary tree structure. The algorithm gives the most frequent characters in the file to be compressed shorter codes than the characters which are less frequent. Top nodes in the tree store the most frequent characters. To produce the compressed version of a file, the algorithm simply traverses the tree from the root node to the target character; at each step it emits 1 if it has moved to the left branch and 0 otherwise (a simplified sketch of this tree-traversal encoding is given at the end of this section). The Huffman tree creates a unique variable-length code for each character. The novelty of Adaptive Huffman compression is that nodes swap locations within the tree during compression, which allows the algorithm to adapt as the frequencies of the symbols change throughout the file.

Adaptive Arithmetic Coding (AC) is a statistical compression method that learns the distribution of the source during the compression process [23]. AC has historic significance as it was the best alternative to Huffman coding after a gap of 25 years [23]. Adaptive AC encodes the source message into a variable-length code in such a way that frequently used symbols receive fewer bits than rarely used ones.

Lempel and Ziv developed a universal compression system in 1977, known as LZ77 [28].
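The tree-traversal encoding described above can be illustrated with a deliberately simplified, static Huffman sketch in Python (hypothetical input and helper names; the adaptive rebalancing of the tree that gives Adaptive Huffman coding its name is omitted here). Code words are read off the path from the root, emitting 1 for a left branch and 0 otherwise, as in the description.

```python
# Simplified, static Huffman coding sketch; an adaptive variant would also
# update symbol counts and swap nodes while encoding.
import heapq
from collections import Counter

def build_codes(text):
    """Build a static Huffman tree from symbol frequencies; return {char: code}."""
    # Heap entries are (frequency, tie_breaker, subtree); a subtree is either
    # a single character (leaf) or a (left, right) pair (internal node).
    heap = [(freq, i, ch) for i, (ch, freq) in enumerate(Counter(text).items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)      # merge the two least-frequent
        f2, _, right = heapq.heappop(heap)     # subtrees into a new node
        heapq.heappush(heap, (f1 + f2, tie, (left, right)))
        tie += 1

    codes = {}
    def walk(node, path):
        if isinstance(node, tuple):            # internal node: recurse
            walk(node[0], path + "1")          # emit 1 for the left branch,
            walk(node[1], path + "0")          # 0 otherwise (as described above)
        else:
            codes[node] = path or "0"          # single-symbol input edge case
    walk(heap[0][2], "")
    return codes

text = "abracadabra"                           # hypothetical input string
codes = build_codes(text)
encoded = "".join(codes[ch] for ch in text)
print(codes)                                   # 'a' (most frequent) is shortest
print(len(encoded), "bits vs", 8 * len(text), "bits uncompressed")
```

For this sample string the most frequent character receives a one-bit code word and the message shrinks from 88 to 23 bits; an adaptive implementation would instead update the symbol counts and swap nodes after every encoded character rather than building the tree once up front.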