Lecture 17: Lempel-Ziv Coding and Summary

Total Page:16

File Type:pdf, Size:1020Kb

Lecture 17: Lempel-Ziv Coding and Summary COMP2610/6261 - Information Theory Lecture 17: Lempel-Ziv Coding and Summary ANU Logo UseMark Guidelines Reid and Aditya Menon Research School of Computer Science The ANU logo is a contemporary The Australian National University refection of our heritage. It clearly presents our name, our shield and our motto: First to learn the nature of things. To preserve the authenticity of our brand identity, there are rules that govern how our logo is used. Preferred logo Black version Preferred logo - horizontal logo September 30th, 2014 The preferred logo should be used on a white background. This version includes black text with the crest in Deep Gold in either PMS or CMYK. Black Where colour printing is not available, the black logo can be used on a white background. Reverse Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 1 / 12 The logo can be used white reversed out of a black background, or occasionally a neutral dark background. Deep Gold Black C30 M50 Y70 K40 C0 M0 Y0 K100 PMS Metallic 8620 PMS Process Black PMS 463 Reverse version Any application of the ANU logo on a coloured background is subject to approval by the Marketing Offce, contact [email protected] LOGO USE GUIDELINES 1 THE AUSTRALIAN NATIONAL UNIVERSITY 1 Lempel-Ziv Coding 2 Compression Review Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 2 / 12 Run-length coding using (count; symbol) saves 12 bits: 111 0 111 1 111 0 111 1 |{z} |{z} |{z} |{z} |{z} |{z} |{z} |{z} 7 a 7 b 7 a 7 b Makes no probabilistic assumptions about source. Doesn't always yield shorter strings: aa bb a b a ! 10 0 10 1 01 0 01 1 01 0 (7 to 15 bits) Misses other structure: \2 repetitions of (7 as and 7 bs)" Eliminating Repetition What is a simple, short binary description of the following string? aaaaaaa bbbbbbb aaaaaaa bbbbbbb | {z } | {z } | {z } | {z } 7as 7bs 7as 7bs A simple symbol code for fa; bg, C = f0; 1g, uses 28 bits Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 3 / 12 Makes no probabilistic assumptions about source. Doesn't always yield shorter strings: aa bb a b a ! 10 0 10 1 01 0 01 1 01 0 (7 to 15 bits) Misses other structure: \2 repetitions of (7 as and 7 bs)" Eliminating Repetition What is a simple, short binary description of the following string? aaaaaaa bbbbbbb aaaaaaa bbbbbbb | {z } | {z } | {z } | {z } 7as 7bs 7as 7bs A simple symbol code for fa; bg, C = f0; 1g, uses 28 bits Run-length coding using (count; symbol) saves 12 bits: 111 0 111 1 111 0 111 1 |{z} |{z} |{z} |{z} |{z} |{z} |{z} |{z} 7 a 7 b 7 a 7 b Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 3 / 12 Eliminating Repetition What is a simple, short binary description of the following string? aaaaaaa bbbbbbb aaaaaaa bbbbbbb | {z } | {z } | {z } | {z } 7as 7bs 7as 7bs A simple symbol code for fa; bg, C = f0; 1g, uses 28 bits Run-length coding using (count; symbol) saves 12 bits: 111 0 111 1 111 0 111 1 |{z} |{z} |{z} |{z} |{z} |{z} |{z} |{z} 7 a 7 b 7 a 7 b Makes no probabilistic assumptions about source. Doesn't always yield shorter strings: aa bb a b a ! 10 0 10 1 01 0 01 1 01 0 (7 to 15 bits) Misses other structure: \2 repetitions of (7 as and 7 bs)" Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 3 / 12 1 A new a 2 A new b 3 The same1 symbol as1 symbol ago 4 The same2 symbols as3 symbols ago 5 The same 10 symbols as5 symbols ago 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 2 A new b 3 The same1 symbol as1 symbol ago 4 The same2 symbols as3 symbols ago 5 The same 10 symbols as5 symbols ago 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 3 The same1 symbol as1 symbol ago 4 The same2 symbols as3 symbols ago 5 The same 10 symbols as5 symbols ago 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a 2 A new b Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 4 The same2 symbols as3 symbols ago 5 The same 10 symbols as5 symbols ago 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a 2 A new b 3 The same1 symbol as1 symbol ago Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 5 The same 10 symbols as5 symbols ago 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a 2 A new b 3 The same1 symbol as1 symbol ago 4 The same2 symbols as3 symbols ago Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 0001100100011011001011011001 ... Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a 2 A new b 3 The same1 symbol as1 symbol ago 4 The same2 symbols as3 symbols ago 5 The same 10 symbols as5 symbols ago Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 Looking for Repetition Consider a sequence that starts abbababbababbab ::: We can describe each new part in terms of what we have seen so far: 1 A new a (0; a) 2 A new b (0; b) 3 The same1 symbol as1 symbol ago (1 ; 1; 1) 4 The same2 symbols as3 symbols ago (1 ; 3; 2) 5 The same 10 symbols as5 symbols ago (1 ; 5; 10) 0001100100011011001011011001 ... Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 4 / 12 1 If s = then output (0; xn) and continue; otherwise 2 Find smallest 0 ≤ i < W such that t = xn−i ::: xn−i+jt|−1xn 3 Output (1; i; jsj) 1 xn next symbol from sequence (n is total symbols read) 2 Let t last W symbols of s xn 3 If t does not appear in xn−W ::: xn−1 4 Else s s xn 2 While input sequence has more symbols: Notes: The output is converted to binary. i is represented with dlog2 W e bits. The size output jsj can be larger than W . Not very effective compression for short input sequences. Run-length encoding is essentially LZ77 with W = 1. Lempel-Ziv: Sliding Window (LZ77) LZ77(Sequence x1x2 :::, Window size W > 0) 1 Initialise s Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 5 / 12 1 If s = then output (0; xn) and continue; otherwise 2 Find smallest 0 ≤ i < W such that t = xn−i ::: xn−i+jt|−1xn 3 Output (1; i; jsj) 1 xn next symbol from sequence (n is total symbols read) 2 Let t last W symbols of s xn 3 If t does not appear in xn−W ::: xn−1 4 Else s s xn Notes: The output is converted to binary. i is represented with dlog2 W e bits. The size output jsj can be larger than W . Not very effective compression for short input sequences. Run-length encoding is essentially LZ77 with W = 1. Lempel-Ziv: Sliding Window (LZ77) LZ77(Sequence x1x2 :::, Window size W > 0) 1 Initialise s 2 While input sequence has more symbols: Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 5 / 12 1 If s = then output (0; xn) and continue; otherwise 2 Find smallest 0 ≤ i < W such that t = xn−i ::: xn−i+jt|−1xn 3 Output (1; i; jsj) 2 Let t last W symbols of s xn 3 If t does not appear in xn−W ::: xn−1 4 Else s s xn Notes: The output is converted to binary. i is represented with dlog2 W e bits. The size output jsj can be larger than W . Not very effective compression for short input sequences. Run-length encoding is essentially LZ77 with W = 1. Lempel-Ziv: Sliding Window (LZ77) LZ77(Sequence x1x2 :::, Window size W > 0) 1 Initialise s 2 While input sequence has more symbols: 1 xn next symbol from sequence (n is total symbols read) Mark Reid and Aditya Menon (ANU) COMP2610/6261 - Information Theory September 30th, 2014 5 / 12 1 If s = then output (0; xn) and continue; otherwise 2 Find smallest 0 ≤ i < W such that t = xn−i ::: xn−i+jt|−1xn 3 Output (1; i; jsj) 3 If t does not appear in xn−W ::: xn−1 4 Else s s xn Notes: The output is converted to binary.
Recommended publications
  • Word-Based Text Compression
    WORD-BASED TEXT COMPRESSION Jan Platoš, Jiří Dvorský Department of Computer Science VŠB – Technical University of Ostrava, Czech Republic {jan.platos.fei, jiri.dvorsky}@vsb.cz ABSTRACT Today there are many universal compression algorithms, but in most cases is for specific data better using specific algorithm - JPEG for images, MPEG for movies, etc. For textual documents there are special methods based on PPM algorithm or methods with non-character access, e.g. word-based compression. In the past, several papers describing variants of word- based compression using Huffman encoding or LZW method were published. The subject of this paper is the description of a word-based compression variant based on the LZ77 algorithm. The LZ77 algorithm and its modifications are described in this paper. Moreover, various ways of sliding window implementation and various possibilities of output encoding are described, as well. This paper also includes the implementation of an experimental application, testing of its efficiency and finding the best combination of all parts of the LZ77 coder. This is done to achieve the best compression ratio. In conclusion there is comparison of this implemented application with other word-based compression programs and with other commonly used compression programs. Key Words: LZ77, word-based compression, text compression 1. Introduction Data compression is used more and more the text compression. In the past, word- in these days, because larger amount of based compression methods based on data require to be transferred or backed-up Huffman encoding, LZW or BWT were and capacity of media or speed of network tested. This paper describes word-based lines increase slowly.
    [Show full text]
  • The Basic Principles of Data Compression
    The Basic Principles of Data Compression Author: Conrad Chung, 2BrightSparks Introduction Internet users who download or upload files from/to the web, or use email to send or receive attachments will most likely have encountered files in compressed format. In this topic we will cover how compression works, the advantages and disadvantages of compression, as well as types of compression. What is Compression? Compression is the process of encoding data more efficiently to achieve a reduction in file size. One type of compression available is referred to as lossless compression. This means the compressed file will be restored exactly to its original state with no loss of data during the decompression process. This is essential to data compression as the file would be corrupted and unusable should data be lost. Another compression category which will not be covered in this article is “lossy” compression often used in multimedia files for music and images and where data is discarded. Lossless compression algorithms use statistic modeling techniques to reduce repetitive information in a file. Some of the methods may include removal of spacing characters, representing a string of repeated characters with a single character or replacing recurring characters with smaller bit sequences. Advantages/Disadvantages of Compression Compression of files offer many advantages. When compressed, the quantity of bits used to store the information is reduced. Files that are smaller in size will result in shorter transmission times when they are transferred on the Internet. Compressed files also take up less storage space. File compression can zip up several small files into a single file for more convenient email transmission.
    [Show full text]
  • Dictionary Based Compression for Images
    INTERNATIONAL JOURNAL OF COMPUTERS Issue 3, Volume 6, 2012 Dictionary Based Compression for Images Bruno Carpentieri Ziv and Lempel in [1]. Abstract—Lempel-Ziv methods were original introduced to By limiting what could enter the dictionary, LZ2 assures compress one-dimensional data (text, object codes, etc.) but recently that there is at most one instance for each possible pattern in they have been successfully used in image compression. the dictionary. Constantinescu and Storer in [6] introduced a single-pass vector Initially the dictionary is empty. The coding pass consists of quantization algorithm that, with no training or previous knowledge of the digital data was able to achieve better compression results with searching the dictionary for the longest entry that is a prefix of respect to the JPEG standard and had also important computational a string starting at the current coding position. advantages. The index of the match is transmitted to the decoder using We review some of our recent work on LZ-based, single pass, log N bits, where N is the current size of the dictionary. adaptive algorithms for the compression of digital images, taking into 2 account the theoretical optimality of these approach, and we A new pattern is introduced into the dictionary by experimentally analyze the behavior of this algorithm with respect to concatenating the current match with the next character that the local dictionary size and with respect to the compression of bi- has to be encoded. level images. The dictionary of LZ2 continues to grow throughout the coding process. Keywords—Image compression, textual substitution methods.
    [Show full text]
  • A Survey on Different Compression Techniques Algorithm for Data Compression Ihardik Jani, Iijeegar Trivedi IC
    International Journal of Advanced Research in ISSN : 2347 - 8446 (Online) Computer Science & Technology (IJARCST 2014) Vol. 2, Issue 3 (July - Sept. 2014) ISSN : 2347 - 9817 (Print) A Survey on Different Compression Techniques Algorithm for Data Compression IHardik Jani, IIJeegar Trivedi IC. U. Shah University, India IIS. P. University, India Abstract Compression is useful because it helps us to reduce the resources usage, such as data storage space or transmission capacity. Data Compression is the technique of representing information in a compacted form. The actual aim of data compression is to be reduced redundancy in stored or communicated data, as well as increasing effectively data density. The data compression has important tool for the areas of file storage and distributed systems. To desirable Storage space on disks is expensively so a file which occupies less disk space is “cheapest” than an uncompressed files. The main purpose of data compression is asymptotically optimum data storage for all resources. The field data compression algorithm can be divided into different ways: lossless data compression and optimum lossy data compression as well as storage areas. Basically there are so many Compression methods available, which have a long list. In this paper, reviews of different basic lossless data and lossy compression algorithms are considered. On the basis of these techniques researcher have tried to purpose a bit reduction algorithm used for compression of data which is based on number theory system and file differential technique. The statistical coding techniques the algorithms such as Shannon-Fano Coding, Huffman coding, Adaptive Huffman coding, Run Length Encoding and Arithmetic coding are considered.
    [Show full text]
  • Enhanced Data Reduction, Segmentation, and Spatial
    ENHANCED DATA REDUCTION, SEGMENTATION, AND SPATIAL MULTIPLEXING METHODS FOR HYPERSPECTRAL IMAGING LEANNA N. ERGIN Bachelor of Forensic Chemistry Ohio University June 2006 Submitted in partial fulfillment of the requirements for the degree of DOCTOR OF PHILOSOPHY IN BIOANALYTICAL CHEMISTRY at the CLEVELAND STATE UNIVERSITY July 19th, 2017 We hereby approve this dissertation for Leanna N. Ergin Candidate for the Doctor of Philosophy in Clinical-Bioanalytical Chemistry degree for the Department of Chemistry and CLEVELAND STATE UNIVERSITY College of Graduate Studies ________________________________________________ Dissertation Committee Chairperson, Dr. John F. Turner II ________________________________ Department/Date ________________________________________________ Dissertation Committee Member, Dr. David W. Ball ________________________________ Department/Date ________________________________________________ Dissertation Committee Member, Dr. Petru S. Fodor ________________________________ Department/Date ________________________________________________ Dissertation Committee Member, Dr. Xue-Long Sun ________________________________ Department/Date ________________________________________________ Dissertation Committee Member, Dr. Yan Xu ________________________________ Department/Date ________________________________________________ Dissertation Committee Member, Dr. Aimin Zhou ________________________________ Department/Date Date of Defense: July 19th, 2017 Dedicated to my husband, Can Ergin. ACKNOWLEDGEMENT I would like to thank my advisor,
    [Show full text]
  • Answers to Exercises
    Answers to Exercises A bird does not sing because he has an answer, he sings because he has a song. —Chinese Proverb Intro.1: abstemious, abstentious, adventitious, annelidous, arsenious, arterious, face- tious, sacrilegious. Intro.2: When a software house has a popular product they tend to come up with new versions. A user can update an old version to a new one, and the update usually comes as a compressed file on a floppy disk. Over time the updates get bigger and, at a certain point, an update may not fit on a single floppy. This is why good compression is important in the case of software updates. The time it takes to compress and decompress the update is unimportant since these operations are typically done just once. Recently, software makers have taken to providing updates over the Internet, but even in such cases it is important to have small files because of the download times involved. 1.1: (1) ask a question, (2) absolutely necessary, (3) advance warning, (4) boiling hot, (5) climb up, (6) close scrutiny, (7) exactly the same, (8) free gift, (9) hot water heater, (10) my personal opinion, (11) newborn baby, (12) postponed until later, (13) unexpected surprise, (14) unsolved mysteries. 1.2: A reasonable way to use them is to code the five most-common strings in the text. Because irreversible text compression is a special-purpose method, the user may know what strings are common in any particular text to be compressed. The user may specify five such strings to the encoder, and they should also be written at the start of the output stream, for the decoder’s use.
    [Show full text]
  • 1 Introduction
    HELSINKI UNIVERSITY OF TECHNOLOGY Faculty of Electronics, Communications, and Automation Department of Communications and Networking Le Wang Evaluation of Compression for Energy-aware Communication in Wireless Networks Master’s Thesis submitted in partial fulfillment of the requirements for the degree of Master of Science in Technology. Espoo, May 11, 2009 Supervisor: Professor Jukka Manner Instructor: Sebastian Siikavirta 2 HELSINKI UNIVERSITY OF TECHNOLOGY ABSTRACT OF MASTER’S THESIS Author: Le Wang Title: Evaluation of Compression for Energy-aware Communication in Wireless Networks Number of pages: 75 p. Date: 11th May 2009 Faculty: Faculty of Electronics, Communications, and Automation Department: Department of Communications and Networks Code: S-38 Supervisor: Professor Jukka Manner Instructor: Sebastian Siikavirta Abstract In accordance with the development of ICT-based communication, energy efficient communication in wireless networks is being required for reducing energy consumption, cutting down greenhouse emissions and improving business competitiveness. Due to significant energy consumption of transmitting data over wireless networks, data compression techniques can be used to trade the overhead of compression/decompression for less communication energy. Careless and blind compression in wireless networks not only causes an expansion of file sizes, but also wastes energy. This study aims to investigate the usages of data compression to reduce the energy consumption in a hand-held device. By con- ducting experiments as the methodologies, the impacts of transmission on energy consumption are explored on wireless interfaces. Then, 9 lossless compression algo- rithms are examined on popular Internet traffic in the view of compression ratio, speed and consumed energy. Additionally, energy consumption of uplink, downlink and overall system is investigated to achieve a comprehensive understanding of compression in wireless networks.
    [Show full text]
  • Error-Resilient Coding Tools in MPEG-4
    I Error-Resilient Coding Tools In MPEG-4 t By CHENG Shu Ling A THESIS SUBMITTED IN PARTIAL FULFILLMENT OF THE REQUIREMENTS FOR THE DEGREE OF THE MASTER OF PHILOSOPHY DIVISION OF INFORMATION ENGnvfEERING THE CHINESE UNIVERSITY OF HONG KONG July 1997 • A^^s^ p(:^S31^^^ 4c—^w -v^ UNIVERSITY JgJ ' %^^y^^ s�,s^^ %^r::^v€X '^<"nii^Ji.^i:^'5^^ Acknowledgement I would like to thank my supervisor Prof. Victor K. W. Wei for his invaluable guidance and support throughout my M. Phil, study. He has given to me many fruitful ideas and inspiration on my research. I have been the teaching assistant of his channel coding and multimedia coding class for three terms, during which I have gained a lot of knowledge and experience. Working with Prof. Wei is a precious experience for me; I learnt many ideas on work and research from him. I would also like to thank my fellow labmates, C.K. Lai, C.W. Lam, S.W. Ng and C.W. Yuen. They have assisted me in many ways on my duties and given a lot of insightful discussions on my studies. Special thanks to Lai and Lam for their technical support on my research. Last but not least, I want to say thank you to my dear friends: Bird, William, Johnny, Mok, Abak, Samson, Siu-man and Nelson. Thanks to all of you for the support and patience with me over the years. ii Abstract Error resiliency is becoming increasingly important with the rapid growth of the mobile systems. The channels in mobile communications are subject to fading, shadowing and interference and thus have a high error rate.
    [Show full text]
  • Distortionless Source Coding 2
    ECE 515 Information Theory Distortionless Source Coding 2 1 Huffman Coding • The length of Huffman codewords has to be an integer number of symbols, while the self- information of the source symbols is almost always a non-integer. • Thus the theoretical minimum message compression cannot always be achieved. • For a binary source with p(x1) = 0.1 and p(x2) = 0.9 – H(X) = .469 so the optimal average codeword length is .469 bits – Symbol x1 should be encoded to l1 = -log2(0.1) = 3.32 bits – Symbol x2 should be encoded to l2 = -log2(0.9) = .152 bits 2 Improving Huffman Coding • One way to overcome the redundancy limitation is to encode blocks of several symbols. In this way the per-symbol inefficiency is spread over an entire block. – N = 1: ζ = 46.9% N = 2: ζ = 72.7% N = 3: ζ = 80.0% • However, using blocks is difficult to implement as there is a block for every possible combination of symbols, so the number of blocks (and thus codewords) increases exponentially with their length. – The probability of each block must be computed. 3 Peter Elias (1923 – 2001) 4 Arithmetic Coding • Arithmetic coding bypasses the idea of replacing a source symbol (or groups of symbols) with a specific codeword. • Instead, a sequence of symbols is encoded to an interval in [0,1). • Useful when dealing with sources with small alphabets, such as binary sources, and alphabets with highly skewed probabilities. 5 Arithmetic Coding Applications • JPEG, MPEG-1, MPEG-2 – Huffman and arithmetic coding • JPEG2000, MPEG-4 – Arithmetic coding only • ZIP – prediction by partial matching (PPMd) algorithm • H.263, H.264 6 Arithmetic Coding • Lexicographic ordering j−1 • Cumulative probabilities Puji= ∑p( ) i=1 u11111 xx x P u2 xx 11 x 2 P 2 uKKNN xxKK x K P • The interval Pj to Pj+1 defines uj 7 Arithmetic Coding • A sequence of source symbols is represented by an interval in [0,1).
    [Show full text]
  • 13. Compression and Decompression
    13. Compression and Decompression Algorithms for data compression usually proceed as follows. They encode a text over some nite alphabet into a sequence of bits, hereby exploiting the fact that the letters of this alphabet occur with different frequencies. For instance, an e occurs more frequently than a q and will therefore be assigned a shorter codeword. The quality of the compression procedure is then measured in terms of the average codeword length. So the underlying model is probabilistic, namely we consider a nite alphabet and a probability distribution on this alphabet, where the probability distribution reects the (re- lative) frequencies of the letters. Such a pair an alphabet with a probability distribution is called a source. We shall rst introduce some basic facts from Information Theory. Most important is the notion of entropy, since the source entropy characterizes the achievable lower bounds for compressibility. The source model to be best understood, is the discrete memoryless source. Here the letters occur independently of each other in the text. The use of prex codes, in which no codeword is the beginning of another one, allows to compress the text down to the entropy of the source. We shall study this in detail. The lower bound is obtained via Kraft's inequality, the achievability is demonstrated by the use of Huffman codes, which can be shown to be optimal. There are some assumptions on the discrete memoryless source, which are not fullled in most practical situations. Firstly, usually this source model is not realistic, since the letters do not occur independently in the text.
    [Show full text]
  • Introduction to Data Compression, by Guy E. Blelloch
    Introduction to Data Compression∗ Guy E. Blelloch Computer Science Department Carnegie Mellon University blellochcs.cmu.edu January 31, 2013 Contents 1 Introduction 3 2 Information Theory 5 2.1 Entropy ........................................ 5 2.2 The Entropy of the English Language . ...... 6 2.3 Conditional Entropy and Markov Chains . ...... 7 3 Probability Coding 10 3.1 PrefixCodes...................................... 10 3.1.1 Relationship to Entropy . 11 3.2 HuffmanCodes .................................... 13 3.2.1 CombiningMessages............................. 15 3.2.2 Minimum Variance Huffman Codes . 15 3.3 ArithmeticCoding ................................. 16 3.3.1 Integer Implementation . 19 4 Applications of Probability Coding 22 4.1 Run-lengthCoding .................................. 25 4.2 Move-To-FrontCoding .............................. 26 4.3 ResidualCoding:JPEG-LS. 27 4.4 ContextCoding:JBIG ................................ 28 4.5 ContextCoding:PPM................................. 29 ∗This is an early draft of a chapter of a book I’m starting to write on “algorithms in the real world”. There are surely many mistakes, and please feel free to point them out. In general the Lossless compression part is more polished than the lossy compression part. Some of the text and figures in the Lossy Compression sections are from scribe notes taken by Ben Liblit at UC Berkeley. Thanks for many comments from students that helped improve the presentation. c 2000, 2001 Guy Blelloch 1 5 The Lempel-Ziv Algorithms 32 5.1 Lempel-Ziv 77 (Sliding Windows) . ...... 32 5.2 Lempel-Ziv-Welch ................................ 34 6 Other Lossless Compression 37 6.1 BurrowsWheeler ................................... 37 7 Lossy Compression Techniques 40 7.1 ScalarQuantization .............................. 41 7.2 VectorQuantization.............................. 41 7.3 TransformCoding.................................. 43 8 A Case Study: JPEG and MPEG 44 8.1 JPEG ........................................
    [Show full text]
  • Energy-Aware Lossless Data Compression
    Energy-Aware Lossless Data Compression KENNETH C. BARR and KRSTE ASANOVIC´ MIT Computer Science and Artificial Intelligence Laboratory Wireless transmission of a single bit can require over 1000 times more energy than a single 32-bit computation. It can therefore be beneficial to perform additional computation to reduce the number of bits transmitted. If the energy required to compress data is less than the energy required to send it, there is a net energy savings and an increase in battery life for portable computers. This article presents a study of the energy savings possible by losslessly compressing data prior to transmission. A variety of algorithms were measured on a StrongARM SA-110 processor. This work demonstrates that, with several typical compression algorithms, there is a actually a net energy increase when compression is applied before transmission. Reasons for this increase are explained and suggestions are made to avoid it. One such energy-aware suggestion is asymmetric compression, the use of one compression algorithm on the transmit side and a different algorithm for the receive path. By choosing the lowest-energy compressor and decompressor on the test platform, overall energy to send and receive data can be reduced by 11% compared with a well-chosen symmetric pair, or up to 57% over the default symmetric zlib scheme. Categories and Subject Descriptors: C.4 [Performance of Systems]—Performance attributes General Terms: Algorithms, Measurement, Performance Additional Key Words and Phrases: Compression, energy-aware, lossless, low-power, power-aware 1. INTRODUCTION Wireless communication is an essential component of mobile computing, but the energy required for transmission of a single bit has been measured to be over 1000 times greater than for a single 32-bit computation.
    [Show full text]