Lecture 3: Entropy and Arithmetic Coding

Data Compression Techniques, Part 1: Entropy Coding
Juha Kärkkäinen, 07.11.2017

Entropy

As we have seen, the codeword lengths $\ell_s$, $s \in \Sigma$, of a complete prefix code for an alphabet $\Sigma$ satisfy
$$\sum_{s \in \Sigma} 2^{-\ell_s} = 1.$$
Thus the prefix code defines a probability distribution over $\Sigma$:
$$P(s) = 2^{-\ell_s} \quad \text{for all } s \in \Sigma.$$
Conversely, we can convert probabilities into codeword lengths:
$$\ell_s = -\log P(s) \quad \text{for all } s \in \Sigma.$$
For an arbitrary distribution, these codeword lengths are generally not integral and do not represent an actual prefix code. However, the average code length derived from these lengths is a quantity called the entropy.

Definition. Let $P$ be a probability distribution over an alphabet $\Sigma$. The entropy of $P$ is
$$H(P) = -\sum_{s \in \Sigma} P(s) \log P(s).$$
The quantity $-\log P(s)$ is called the self-information of the symbol $s$.

The concept of (information theoretic) entropy was introduced in 1948 by Claude Shannon in his paper "A Mathematical Theory of Communication", which established the discipline of information theory. The following result is also from that paper.

Theorem (Noiseless Coding Theorem). Let $C$ be an optimal prefix code for an alphabet $\Sigma$ with the probability distribution $P$. Then
$$H(P) \le \sum_{s \in \Sigma} P(s)\,|C(s)| < H(P) + 1.$$

The same paper contains the Noisy-Channel Coding Theorem, which deals with coding in the presence of errors.

The entropy is an absolute lower bound on compressibility in the average sense. Because of integral code lengths, a prefix code may not quite match the lower bound, but the upper bound shows that prefix codes can get fairly close.

Example.

symbol                    a     e     i     o     u     y
probability               0.2   0.3   0.1   0.2   0.1   0.1
self-information          2.32  1.74  3.32  2.32  3.32  3.32
Huffman codeword length   2     2     3     2     4     4

$H(P) \approx 2.45$ while the average Huffman code length is 2.5.

We will later see how to get even closer to the entropy, but it is never possible to get below it.
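A small Python sketch of this computation (the Huffman codeword lengths are taken from the table above rather than computed, and the variable names are chosen here for illustration):

    import math

    # Vowel example: probabilities and Huffman codeword lengths from the table above.
    P = {'a': 0.2, 'e': 0.3, 'i': 0.1, 'o': 0.2, 'u': 0.1, 'y': 0.1}
    huffman_length = {'a': 2, 'e': 2, 'i': 3, 'o': 2, 'u': 4, 'y': 4}

    # Self-information -log2 P(s) of each symbol.
    for s in P:
        print(s, round(-math.log2(P[s]), 2))

    # Entropy H(P) versus the average Huffman code length.
    H = -sum(p * math.log2(p) for p in P.values())
    avg = sum(P[s] * huffman_length[s] for s in P)
    print(round(H, 2), round(avg, 2))   # 2.45 2.5

Running it reproduces $H(P) \approx 2.45$ against the average codeword length 2.5.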
We will next prove the Noiseless Coding Theorem, and for that we will need Gibbs' inequality.

Lemma (Gibbs' Inequality). For any two probability distributions $P$ and $Q$ over $\Sigma$,
$$-\sum_{s \in \Sigma} P(s) \log P(s) \le -\sum_{s \in \Sigma} P(s) \log Q(s),$$
with equality if and only if $P = Q$.

Proof. Since $\ln x \le x - 1$ for all $x > 0$,
$$\sum_{s \in \Sigma} P(s) \ln \frac{Q(s)}{P(s)} \le \sum_{s \in \Sigma} P(s)\left(\frac{Q(s)}{P(s)} - 1\right) = \sum_{s \in \Sigma} Q(s) - \sum_{s \in \Sigma} P(s) = 0.$$
Dividing both sides by $\ln 2$, we can change $\ln$ into $\log$. Then
$$\sum_{s \in \Sigma} P(s) \log Q(s) - \sum_{s \in \Sigma} P(s) \log P(s) = \sum_{s \in \Sigma} P(s) \log \frac{Q(s)}{P(s)} \le 0,$$
which proves the inequality. Since $\ln x = x - 1$ if and only if $x = 1$, all the inequalities above are equalities if and only if $P = Q$.

Proof of Noiseless Coding Theorem. Let us prove the upper bound first. For all $s \in \Sigma$, let $\ell_s = \lceil -\log P(s) \rceil$. Then
$$\sum_{s \in \Sigma} 2^{-\ell_s} \le \sum_{s \in \Sigma} 2^{\log P(s)} = \sum_{s \in \Sigma} P(s) = 1$$
and by Kraft's inequality, there exists a (possibly inoptimal and even redundant) prefix code with codeword length $\ell_s$ for each $s \in \Sigma$ (known as the Shannon code). The average code length of this code is
$$\sum_{s \in \Sigma} P(s)\,\ell_s = \sum_{s \in \Sigma} P(s) \lceil -\log P(s) \rceil < \sum_{s \in \Sigma} P(s)(-\log P(s) + 1) = H(P) + 1.$$

Now we prove the lower bound. Let $C$ be an optimal prefix code with codeword length $\ell_s$ for each $s \in \Sigma$. We can assume that $C$ is complete, i.e., $\sum_{s \in \Sigma} 2^{-\ell_s} = 1$. Define a distribution $Q$ by setting $Q(s) = 2^{-\ell_s}$. By Gibbs' inequality, the average code length of $C$ satisfies
$$\sum_{s \in \Sigma} P(s)\,\ell_s = \sum_{s \in \Sigma} P(s)\left(-\log(2^{-\ell_s})\right) = -\sum_{s \in \Sigma} P(s) \log Q(s) \ge H(P).$$

One can prove tighter upper bounds than $H(P) + 1$ on the average Huffman code length in terms of the maximum probability $p_{\max}$ of a single symbol:
$$\sum_{s \in \Sigma} P(s)\,|C(s)| \le \begin{cases} H(P) + p_{\max} + 0.086 & \text{when } p_{\max} < 0.5 \\ H(P) + p_{\max} & \text{when } p_{\max} \ge 0.5 \end{cases}$$
The case when $p_{\max}$ is close to 1 represents the worst case for prefix coding, because then the self-information $-\log p_{\max}$ is much less than 1, but a codeword can never be shorter than 1 bit.

Example.

symbol                    a      b
probability               0.99   0.01
self-information          0.014  6.64
Huffman codeword length   1      1

$H(P) \approx 0.081$ while the average Huffman code length is 1. The Huffman code is more than 10 times longer than the entropy.

One way to get closer to the entropy is to use a single codeword for multiple symbols.

Example.

symbol pair               aa      ab      ba      bb
probability               0.9801  0.0099  0.0099  0.0001
self-information          0.029   6.658   6.658   13.288
Huffman codeword length   1       2       3       3

$H(P) \approx 0.162$ while the average Huffman code length is 1.0299. We are now encoding two symbols with just over one bit on average.

The more symbols are encoded with one codeword, the closer to the entropy we get. Taking this approach to its logical extreme (encoding the whole message with a single codeword) leads to arithmetic coding.

Arithmetic coding

Let $\Sigma = \{s_1, s_2, \ldots, s_\sigma\}$ be an alphabet with an ordering $s_1 < s_2 < \cdots < s_\sigma$. Let $P$ be a probability distribution over $\Sigma$ and define the cumulative distribution $F(s_i)$, for all $s_i \in \Sigma$, as
$$F(s_i) = \sum_{j=1}^{i} P(s_j).$$
We associate a symbol $s_i$ with the interval $[F(s_i) - P(s_i),\, F(s_i))$.

Example.

symbol        a         b           c
probability   0.2       0.5         0.3
interval      [0, 0.2)  [0.2, 0.7)  [0.7, 1.0)

The intervals correspond to the dyadic intervals in the proof of Kraft's inequality, except that now we do not require the intervals to be dyadic.

We can extend the interval representation from symbols to strings. For any $n \ge 1$, define a probability distribution $P_n$ and a cumulative distribution $F_n$ over $\Sigma^n$, the set of strings of length $n$, as
$$P_n(X) = \prod_{i=1}^{n} P(x_i), \qquad F_n(X) = \sum_{Y \in \Sigma^n,\, Y \le X} P_n(Y)$$
for any $X = x_1 x_2 \ldots x_n \in \Sigma^n$, where $Y \le X$ uses the lexicographical ordering of strings. The interval for $X$ is $[F_n(X) - P_n(X),\, F_n(X))$.

The interval can be computed incrementally.

Input: string $X = x_1 x_2 \ldots x_n \in \Sigma^n$ and distributions $P$ and $F$ over $\Sigma$.
Output: interval $[l, r)$ for $X$.
(1) $[l, r) \leftarrow [0.0, 1.0)$
(2) for $i = 1$ to $n$ do
(3)     $r' \leftarrow F(x_i)$; $l' \leftarrow r' - P(x_i)$    // $[l', r')$ is the interval for $x_i$
(4)     $p \leftarrow r - l$; $r \leftarrow l + p \cdot r'$; $l \leftarrow l + p \cdot l'$
(5) return $[l, r)$

Example. Computing the interval for the string cab.

symbol        a         b           c
probability   0.2       0.5         0.3
interval      [0, 0.2)  [0.2, 0.7)  [0.7, 1.0)

Subdivide the interval for c.

string        ca           cb            cc
probability   0.06         0.15          0.09
interval      [0.7, 0.76)  [0.76, 0.91)  [0.91, 1.0)

Subdivide the interval for ca.

string        caa           cab             cac
probability   0.012         0.03            0.018
interval      [0.7, 0.712)  [0.712, 0.742)  [0.742, 0.76)

Output $[0.712, 0.742)$.
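The pseudocode above translates directly into Python. The sketch below (the function name source_interval is chosen here for illustration) uses exact rational arithmetic via fractions.Fraction to avoid floating-point rounding, and reproduces the interval $[0.712, 0.742)$ for cab:

    from fractions import Fraction

    # Symbol probabilities of the example (exact rationals to avoid rounding).
    P = {'a': Fraction(2, 10), 'b': Fraction(5, 10), 'c': Fraction(3, 10)}

    def source_interval(X, P):
        # Cumulative distribution F over the ordered alphabet.
        F, acc = {}, Fraction(0)
        for s in sorted(P):
            acc += P[s]
            F[s] = acc
        # Incremental interval computation, as in the pseudocode above.
        l, r = Fraction(0), Fraction(1)
        for x in X:
            r1 = F[x]                 # [l1, r1) is the interval of symbol x
            l1 = r1 - P[x]
            p = r - l
            l, r = l + p * l1, l + p * r1
        return l, r

    l, r = source_interval("cab", P)
    print(float(l), float(r))         # 0.712 0.742

The simultaneous assignment computes the new $l$ and $r$ from the old $l$, matching step (4) of the pseudocode.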
We can similarly associate an interval with each string over the code alphabet (which we assume to be binary here), but using equal probabilities. The interval for a binary string $B \in \{0, 1\}^\ell$ is
$$I(B) = \left[\frac{\mathrm{val}(B)}{2^\ell},\, \frac{\mathrm{val}(B) + 1}{2^\ell}\right),$$
where $\mathrm{val}(B)$ is the value of $B$ as a binary number.

- This is the same interval as in the proof of Kraft's inequality and is always dyadic.
- Another characterization of $I(B)$ is that it contains exactly all fractional binary numbers beginning with $B$. In fact, in the literature, arithmetic coding is often described using binary fractions instead of intervals.

Example. For $B = 101110$, $\mathrm{val}(B) = 46$ and thus
$$I(B) = \left[\frac{46}{64}, \frac{47}{64}\right) = [.71875,\, .734375) = [.101110,\, .101111).$$

The arithmetic coding procedure is:

source string $\mapsto$ source interval $\mapsto$ code interval $\mapsto$ code string

The missing step is the mapping from source intervals to code intervals. A good choice for the code interval is the longest dyadic interval that is completely contained in the source interval. This ensures that the code intervals cannot overlap when the source intervals do not overlap, and thus the code is a prefix code for $\Sigma^n$.

Example. The string cab is encoded as 101110:

cab $\mapsto$ [.712, .742) $\mapsto$ [.71875, .734375) = [.101110, .101111) $\mapsto$ 101110

The code is usually not optimal or even complete, because there are gaps between the code intervals. However, the length of the code interval for a source interval of length $p$ is always more than $p/4$. Thus the code length for a string $X$ is less than $-\log P_n(X) + 2$, and the average code length is less than $H(P_n) + 2$. Since $H(P_n) = nH(P)$, the average code length per symbol is less than $H(P) + 2/n$.
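The missing step can also be written out in code. The following Python sketch (the helper name encode_interval is chosen here for illustration, and the search simply tries code lengths k = 1, 2, ...) finds the longest dyadic interval contained in a given source interval and returns the corresponding bit string; applied to the interval of cab it produces 101110.

    from fractions import Fraction
    from math import ceil

    def encode_interval(l, r):
        # Shortest bit string B (longest dyadic interval I(B)) with I(B) inside [l, r).
        k = 1
        while True:
            v = ceil(l * 2**k)                 # smallest v with v / 2^k >= l
            if Fraction(v + 1, 2**k) <= r:     # does [v/2^k, (v+1)/2^k) fit in [l, r)?
                return format(v, f'0{k}b')     # v written with k binary digits
            k += 1                             # a dyadic interval longer than (r-l)/4 always fits,
                                               # so the loop terminates

    l, r = Fraction(712, 1000), Fraction(742, 1000)
    print(encode_interval(l, r))               # 101110

Because the code interval of a source interval of length $p$ is longer than $p/4$, the loop stops after roughly $-\log p + 2$ iterations.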