Entropy Coding

Video Codec Design, Iain E. G. Richardson. Copyright © 2002 John Wiley & Sons, Ltd. ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic)

8.1 INTRODUCTION

A video encoder contains two main functions: a source model that attempts to represent a video scene in a compact form that is easy to compress (usually an approximation of the original video information), and an entropy encoder that compresses the output of the model prior to storage and transmission. The source model is matched to the characteristics of the input data (images or video frames), whereas the entropy coder may use 'general-purpose' statistical compression techniques that are not specific to image and video coding. As with the functions described earlier (motion estimation and compensation, transform coding, quantisation), the design of an entropy CODEC is affected by a number of constraints, including:

1. Compression efficiency: the aim is to represent the source model output using as few bits as possible.

2. Computational efficiency: the design should be suitable for implementation on the chosen hardware or software platform.

3. Error robustness: if transmission errors are likely, the entropy CODEC should support recovery from errors and should (if possible) limit error propagation at the decoder (this constraint may conflict with (1) above).

In a typical transform-based video CODEC, the data to be encoded by the entropy CODEC falls into three main categories: transform coefficients (e.g. quantised DCT coefficients), motion vectors and 'side' information (headers, synchronisation markers, etc.). The method of coding side information depends on the standard. Motion vectors can often be represented compactly in a differential form due to the high correlation between vectors for neighbouring blocks or macroblocks. Transform coefficients can be represented efficiently with 'run-level' coding, exploiting the sparse nature of the DCT coefficient array.

An entropy encoder maps input symbols (for example, run-level coded coefficients) to a compressed data stream. It achieves compression by exploiting redundancy in the set of input symbols, representing frequently occurring symbols with a small number of bits and infrequently occurring symbols with a larger number of bits. The two most popular entropy encoding methods used in video coding standards are Huffman coding and arithmetic coding. Huffman coding (or 'modified' Huffman coding) represents each input symbol by a variable-length codeword containing an integral number of bits. It is relatively straightforward to implement, but cannot achieve optimal compression because of the restriction that each codeword must contain an integral number of bits. Arithmetic coding maps an input symbol into a fractional number of bits, enabling greater compression efficiency at the expense of higher complexity (depending on the implementation).
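The cost of integral-length codewords can be made concrete with a small experiment. The sketch below (standard-library Python only; the symbol set and probabilities are hypothetical, chosen purely for illustration) builds a Huffman code and compares its average codeword length with the source entropy, the limit that arithmetic coding can approach more closely.

```python
import heapq
import math

def huffman_code_lengths(probs):
    """Build a Huffman code for a probability map and return each symbol's codeword length."""
    # Each heap entry: (probability, tie-breaker, symbols contained in this subtree)
    heap = [(p, i, [s]) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    lengths = {s: 0 for s in probs}
    counter = len(heap)
    while len(heap) > 1:
        p1, _, syms1 = heapq.heappop(heap)
        p2, _, syms2 = heapq.heappop(heap)
        # Every symbol in the two merged subtrees gains one bit of depth
        for s in syms1 + syms2:
            lengths[s] += 1
        heapq.heappush(heap, (p1 + p2, counter, syms1 + syms2))
        counter += 1
    return lengths

# Hypothetical probabilities for a few run-level symbols (illustrative only)
probs = {'(0,1)': 0.5, '(0,-1)': 0.2, '(1,1)': 0.15, '(0,2)': 0.1, 'EOB': 0.05}

lengths = huffman_code_lengths(probs)
avg_len = sum(probs[s] * lengths[s] for s in probs)
entropy = -sum(p * math.log2(p) for p in probs.values())

print(f"Average Huffman codeword length: {avg_len:.3f} bits/symbol")
print(f"Source entropy:                  {entropy:.3f} bits/symbol")
```

For this distribution the Huffman code needs about 1.95 bits/symbol against an entropy of about 1.92 bits/symbol; the gap widens when one symbol has probability much greater than 0.5, since Huffman coding can never spend less than one bit on it.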
8.2 DATA SYMBOLS

8.2.1 Run-Level Coding

The output of the quantiser stage in a DCT-based video encoder is a block of quantised transform coefficients. The array of coefficients is likely to be sparse: if the image block has been efficiently decorrelated by the DCT, most of the quantised coefficients in a typical block are zero. Figure 8.1 shows a typical block of quantised coefficients from an MPEG-4 'intra' block. The structure of the quantised block is fairly typical: a few non-zero coefficients remain after quantisation, mainly clustered around DCT coefficient (0, 0). This is the 'DC' coefficient and is usually the most important coefficient to the appearance of the reconstructed image block.

[Figure 8.1: Block of quantised coefficients (intra-coding); the DC coefficient is at the top left.]

The block of coefficients shown in Figure 8.1 may be efficiently compressed as follows:

1. Reordering. The non-zero values are clustered around the top left of the 2-D array and this stage groups these non-zero values together.

2. Run-level coding. This stage attempts to find a more efficient representation for the large number of zeros (48 in this case).

3. Entropy coding. The entropy encoder attempts to reduce the redundancy of the data symbols.

Reordering

The optimum method of reordering the quantised data depends on the distribution of the non-zero coefficients. If the original image (or motion-compensated residual) data is evenly distributed in the horizontal and vertical directions (i.e. there is not a predominance of 'strong' image features in either direction), then the significant coefficients will also tend to be evenly distributed about the top left of the array (Figure 8.2(a)). In this case, a zigzag reordering pattern such as Figure 8.2(c) should group together the non-zero coefficients efficiently. However, in some cases an alternative pattern performs better. For example, a field of interlaced video tends to vary more rapidly in the vertical than in the horizontal direction (because it has been vertically subsampled). In this case the non-zero coefficients are likely to be 'skewed' as shown in Figure 8.2(b): they are clustered more to the left of the array (corresponding to basis functions with a strong vertical variation, see for example Figure 7.4). A modified reordering pattern such as Figure 8.2(d) should perform better at grouping the coefficients together.

[Figure 8.2: Typical data distributions and reordering patterns: (a) even distribution (typical coefficient map, frame coding); (b) field distribution (typical coefficient map, field coding); (c) zigzag; (d) modified zigzag.]

Run-level coding

The output of the reordering process is a linear array of quantised coefficients. Non-zero coefficients are mainly grouped together near the start of the array and the remaining values in the array are zero. Long sequences of identical values (zeros in this case) can be represented as a (run, level) code, where (run) indicates the number of zeros preceding a non-zero value and (level) indicates the sign and magnitude of the non-zero coefficient. The following example illustrates the reordering and run-level coding process.

Example

The block of coefficients in Figure 8.1 is reordered with the zigzag scan shown in Figure 8.2 and the reordered array is run-level coded.

Reordered array: [102, -33, 21, -3, -2, -3, -4, -3, 0, 2, 1, 0, 1, 0, -2, -1, -1, 0, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ...]

Run-level coded: (0, 102) (0, -33) (0, 21) (0, -3) (0, -2) (0, -3) (0, -4) (0, -3) (1, 2) (0, 1) (1, 1) (1, -2) (0, -1) (0, -1) (4, -2) (11, 1)
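The reordering and run-level steps can be sketched in a few lines of Python. The zigzag_order function below generates a conventional 8 × 8 zigzag scan order (assumed here to correspond to the pattern of Figure 8.2(c)), and run_level converts the reordered array from the worked example into (run, level) pairs; the function names are illustrative rather than taken from any standard.

```python
def zigzag_order(n=8):
    """Return the (row, col) visiting order of a conventional n x n zigzag scan."""
    order = []
    for s in range(2 * n - 1):            # s = row + col indexes each anti-diagonal
        diag = [(r, s - r) for r in range(n) if 0 <= s - r < n]
        # Alternate the traversal direction on successive anti-diagonals
        order.extend(reversed(diag) if s % 2 == 0 else diag)
    return order

def run_level(coeffs):
    """Convert a reordered coefficient array into (run, level) pairs.

    'run' counts the zeros preceding each non-zero 'level'; the final run of
    zeros is not coded here (an EOB symbol or 'last' flag would signal it)."""
    pairs, run = [], 0
    for coeff in coeffs:
        if coeff == 0:
            run += 1
        else:
            pairs.append((run, coeff))
            run = 0
    return pairs

# Reordered array from the worked example above (trailing zeros omitted)
reordered = [102, -33, 21, -3, -2, -3, -4, -3, 0, 2, 1, 0, 1, 0, -2, -1, -1,
             0, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1]

print(run_level(reordered))
# [(0, 102), (0, -33), (0, 21), ..., (0, -1), (0, -1), (4, -2), (11, 1)]
```

Note that the long final run of zeros never appears in the output; signalling where the non-zero data ends is exactly the job of the EOB symbol or 'last' flag described next.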
Two special cases need to be considered. Coefficient (0, 0) (the 'DC' coefficient) is important to the appearance of the reconstructed image block and has no preceding zeros. In an intra-coded block (i.e. coded without motion compensation), the DC coefficient is rarely zero and so is treated differently from other coefficients. In an H.263 CODEC, intra-DC coefficients are encoded with a fixed, relatively low quantiser setting (to preserve image quality) and without (run, level) coding. Baseline JPEG takes advantage of the property that neighbouring image blocks tend to have similar mean values (and hence similar DC coefficient values): each DC coefficient is encoded differentially from the previous DC coefficient.

The second special case is the final run of zeros in a block. Coefficient (7, 7) is usually zero, so a special case is needed to deal with the final run of zeros that has no terminating non-zero value. In H.261 and baseline JPEG, a special code symbol, 'end of block' or EOB, is inserted after the last (run, level) pair. This approach is known as 'two-dimensional' run-level coding, since each code represents just two values (run and level). The method does not perform well under high compression: in this case, many blocks contain only a DC coefficient and so the EOB codes make up a significant proportion of the coded bit stream. H.263 and MPEG-4 avoid this problem by encoding a flag along with each (run, level) pair. This 'last' flag signifies the final (run, level) pair in the block and indicates to the decoder that the rest of the block should be filled with zeros. Each code now represents three values (run, level, last), and so this method is known as 'three-dimensional' run-level-last coding.

8.2.2 Other Symbols

In addition to run-level coded coefficient data, a number of other values need to be coded and transmitted by the video encoder. These include the following.

Motion vectors

The vector displacement between the current and reference areas (e.g. macroblocks) is encoded along with each data unit. Motion vectors for neighbouring data units are often very similar, and this property may be used to reduce the amount of information that has to be encoded. In an H.261 CODEC, for example, the motion vector for each macroblock is predicted from the preceding macroblock: the difference between the current and previous vector is encoded and transmitted (instead of transmitting the vector itself). A more sophisticated prediction is formed during MPEG-4/H.263 coding: the vector for each macroblock (or block, if the optional advanced prediction mode is enabled) is predicted from up to three previously transmitted motion vectors. This helps to further reduce the transmitted information. These two methods of predicting the current motion vector are shown in Figure 8.3.

Example

Motion vector of current macroblock: x = +3.5, y = +2.0
Predicted motion vector from previous macroblock(s): x = +3.0, y = 0.0
Differential motion vector: dx = +0.5, dy = +2.0

[Figure 8.3: Motion vector prediction. H.261: predict the MV from the previous macroblock vector MV1. H.263/MPEG-4: predict the MV from three previous macroblock vectors MV1, MV2 and MV3.]

Quantisation parameter

In order to maintain a target bit rate, it is common for a video encoder to modify the quantisation parameter (scale factor or step size) during encoding.
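To make the motion vector prediction and differential coding of Section 8.2.2 concrete, here is a minimal sketch assuming H.263/MPEG-4-style median prediction from three neighbouring vectors; the median rule and the helper names are assumptions for illustration, not details taken from the text above.

```python
def predict_mv_h261(prev_mv):
    """H.261-style prediction: use the previous macroblock's vector as the predictor."""
    return prev_mv

def predict_mv_median(mv1, mv2, mv3):
    """H.263/MPEG-4-style prediction (assumed: component-wise median of three neighbours)."""
    median = lambda a, b, c: sorted([a, b, c])[1]
    return (median(mv1[0], mv2[0], mv3[0]), median(mv1[1], mv2[1], mv3[1]))

def differential_mv(current, predicted):
    """Encoder side: transmit only the difference between the current and predicted vector."""
    return (current[0] - predicted[0], current[1] - predicted[1])

# Worked example from the text: current MV (+3.5, +2.0), predictor (+3.0, 0.0)
current = (3.5, 2.0)
predicted = (3.0, 0.0)
dx, dy = differential_mv(current, predicted)
print(dx, dy)   # 0.5 2.0 -- only this small difference needs to be entropy coded

# Decoder side: rebuild the vector by adding the difference to the same predictor
reconstructed = (predicted[0] + dx, predicted[1] + dy)
assert reconstructed == current
```

Because neighbouring vectors are usually similar, the differential values cluster around zero and can be entropy coded very compactly.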