2 INFORMATION THEORY AND COMPRESSION

The previous chapter already contains several coding principles for compression: fax coding, instantaneous coding, the principle of entropy, and so on. We now carry out a deeper and more exhaustive survey of these principles, from the understanding of why compression is theoretically possible to examples of compression codes that are commonly used in computer science and in everyday life for storing and exchanging music and videos. The objective is to reduce both the transmission time of each message and the storage space it requires. For this purpose, one has to build codes that optimize the size of messages. Here we assume that the channel is not subject to perturbations (one says that encoding is noiseless); error handling will be studied in the last chapter. We are going to build encoding techniques, which enable one to choose efficient codes, as well as an important theory in order to quantify the information included in a message, to compute the minimum size of an encoding scheme and thus to determine the "value" of a given code. We first focus on lossless compression, that is, compression such that compression followed by decompression does not modify the original file. The first section describes the theoretical limits of lossless compression, and the second presents algorithms approaching those limits. The third section then shows how changes of representation can expand these presupposed boundaries, with applications to common codes.



Then, at the end of this chapter, we introduce several techniques for the compression of images or sounds that allow some loss on the visual or audio quality by modeling human perception.

Exercise 2.1 (It is Impossible to Compress ALL Files Without Loss)

1. How many distinct files of size exactly N bits are there?
2. How many distinct files of size strictly lower than N bits are there?
3. Conclude on the existence of a method that will reduce the size of any file.
Solution on page 295.

Exercise 2.2 (On the Scarcity of Files Compressible Without Loss)  Show that fewer than one N-bit file in a million, with N > 20, is compressible by more than 20 bits without loss of information. Solution on page 295.

2.1 INFORMATION THEORY

Information theory gives the mathematical framework for compression codes. Recall that an alphabet is a finite set on which messages are composed, containing the information to encode or the information already encoded. The set of letters of the source message (data compression is often called source coding) is the source alphabet, and the set of code letters is the code alphabet. For example, the Latin alphabet is the set of letters we are using to write this text, and {0, 1} is the alphabet used to write the messages that are transmitted through most of the numerical channels.

The set of finite strings over some alphabet V is denoted by V+, and the image of the source alphabet through the encoding function is a subset of V+ called the set of codewords, or sometimes also simply called the code, especially in information theory. Therefore, a code C over some alphabet V is a finite subset of V+. The code is composed of the basic elements from which messages are built. An element m in C is called a codeword. Its length is denoted by l(m). The arity of the code is the cardinal number of V. A code of arity 2 is said to be binary.

For example, C = {0, 10, 110} is a binary code of arity 2, over the alphabet V = {0, 1}.

2.1.1 Average Length of a Code

In all this section, for the sake of simplicity and because of their practical importance in telecommunications, we mainly focus on binary codes. Nevertheless, most of the following results can be applied to any code. As all codewords are not necessarily of the same size, one uses a measure dependent on the frequencies of appearance in order to evaluate the length of the messages that will encode the source. One recalls that a source of information is composed of an alphabet

$S$ and a probability distribution $\mathcal{P}$ over $S$. For a symbol $s_i$ in a source $\mathcal{S} = (S, \mathcal{P})$, $P(s_i)$ is the probability of occurrence of $s_i$.

Let $\mathcal{S} = (S, \mathcal{P})$ with $S = \{s_1, \dots, s_n\}$, and let $C$ be a code of $\mathcal{S}$, whose encoding function is $f$ ($C$ is the image of $S$ through $f$). The average length of the code $C$ is

$$l(C) = \sum_{i=1}^{n} l(f(s_i)) \, P(s_i).$$

Example 2.1  $S = \{a, b, c, d\}$, $\mathcal{P} = \left(\frac{1}{2}, \frac{1}{4}, \frac{1}{8}, \frac{1}{8}\right)$, $V = \{0, 1\}$.

If $C = \{f(a) = 00, f(b) = 01, f(c) = 10, f(d) = 11\}$, the average length of the scheme is 2.

If $C = \{f(a) = 0, f(b) = 10, f(c) = 110, f(d) = 1110\}$, then the average length of the scheme is $1 \cdot \frac{1}{2} + 2 \cdot \frac{1}{4} + 3 \cdot \frac{1}{8} + 4 \cdot \frac{1}{8} = 1.875$.

One uses the average length of an encoding scheme in order to measure its efficiency.
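As a quick check of these definitions, the following Python snippet (our own illustration, not part of the book) recomputes the average lengths of the two codes of Example 2.1.

```python
# Average length of a code: sum of len(codeword) * probability over all symbols.
def average_length(code, proba):
    return sum(len(word) * proba[s] for s, word in code.items())

proba = {'a': 1/2, 'b': 1/4, 'c': 1/8, 'd': 1/8}
fixed_length    = {'a': '00', 'b': '01', 'c': '10', 'd': '11'}
variable_length = {'a': '0', 'b': '10', 'c': '110', 'd': '1110'}

print(average_length(fixed_length, proba))     # 2.0
print(average_length(variable_length, proba))  # 1.875
```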

2.1.2 Entropy as a Measure of the Amount of Information

We are reaching the fundamental notions of information theory. Let us consider a source $\mathcal{S} = (S, \mathcal{P})$. One only knows the probability distribution of this source, and one wishes to measure quantitatively one's ignorance concerning the behavior of $\mathcal{S}$. For instance, this uncertainty is higher if the number of symbols in $S$ is large. It is low if the probability of occurrence of a symbol is close to 1, and it reaches its highest value if the distribution is uniform.

One uses the entropy of a source in order to measure the average amount of information issued by this source. For example, let us imagine a fair die whose value is only given by comparison with a number we are able to choose: how many questions are required in order to determine the value of the die? If one proceeds by dichotomy, it only takes $\lceil \log_2(6) \rceil = 3$ questions.

Now let us suppose that the die is unfair: the value 1 has a probability of 1 over 2, and each of the five other values has a probability of 1 over 10. If the first question is "is it 1?" then in half of the cases, this question is enough to determine the value of the die. For the other half, it will require three additional questions. Hence, the average number of questions required is

$$\frac{1}{2} \cdot 1 + \frac{1}{2} \cdot 4 = -\frac{1}{2}\log_2\left(\frac{1}{2}\right) + 5 \cdot \frac{1}{10}\left\lceil \log_2(10) \right\rceil = 2.5.$$

Actually, it is still possible to refine this result by noticing that three additional questions are not always required in order to determine the right value among 2, 3, 4, 5, 6: if dichotomy splits these five possibilities into the two groups 2, 3 and 4, 5, 6, then only two additional questions will be required in order to find 2 or 3. Only 5 and 6 will require three questions to be separated. For a large number of draws, it is still possible to improve this method if the questions do not always split the set the same way, for example, in 2, 3, 5 and 4, 6, so that two questions are alternatively required, and so on. By extending this reasoning, one shows that the average number of questions required for the five possibilities is equal to $\log_2(10)$.

Hence, the amount of information included in this throw of a die (and this generalizes easily to any source) is defined intuitively by this average number of questions. Formally, entropy is defined, for a source $(S, \mathcal{P})$, $\mathcal{P} = (p_1, \dots, p_n)$, as

$$H(\mathcal{S}) = \sum_{i=1}^{n} p_i \log_2\left(\frac{1}{p_i}\right).$$

This is a measure of the uncertainty attached to a probability law, which is always illustrated using the example of the die: one considers the random variable (source) generated by the throw of an n-face die. We have seen that there is more uncertainty in the result of this experiment if the die is fair than if it is unfair. This can be written in the following way: for all $p_1, \dots, p_n$,

$$H(p_1, \dots, p_n) \le H\left(\frac{1}{n}, \dots, \frac{1}{n}\right) = \log_2 n,$$

according to Property 1 in Chapter 1.
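To make the dice example concrete, here is a short Python sketch (ours) that computes the entropy of the fair and of the unfair die discussed above.

```python
from math import log2

# Entropy H(S) = sum of p_i * log2(1/p_i); zero-probability symbols contribute nothing.
def entropy(probabilities):
    return sum(p * log2(1 / p) for p in probabilities if p > 0)

fair_die   = [1/6] * 6
unfair_die = [1/2] + [1/10] * 5

print(entropy(fair_die))    # log2(6) ~ 2.585: maximal uncertainty
print(entropy(unfair_die))  # ~ 2.161: less uncertainty than the fair die
```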

2.1.3 Shannon’s Theorem This fundamental theorem of information theory is known as Shannon’s theorem or the theorem of noiseless encoding. First of all, we formulate the theorem when considering a source without memory.

Theorem 20  Let $\mathcal{S}$ be a source without memory of entropy $H(\mathcal{S})$. Any uniquely decodable code of $\mathcal{S}$ over an alphabet $V$ of size $q$ (i.e., $q = |V|$), with average length $l$, satisfies

$$l \ge \frac{H(\mathcal{S})}{\log_2 q}.$$

Moreover, there exists a uniquely decodable code of $\mathcal{S}$ over an alphabet of size $q$, with average length $l$, that satisfies

$$l < \frac{H(\mathcal{S})}{\log_2 q} + 1.$$

Proof. First part: Let $C = (c_1, \dots, c_n)$ be a code of $\mathcal{S}$, uniquely decodable, over some alphabet of size $q$, and let $(l_1, \dots, l_n)$ be the lengths of the words in $C$. If $K = \sum_{i=1}^{n} \frac{1}{q^{l_i}}$, then $K \le 1$ from Kraft's theorem (see page 69). Let $(q_1, \dots, q_n)$ be such that $q_i = \frac{q^{-l_i}}{K}$ for all $i = 1, \dots, n$. One has $q_i \in [0, 1]$ for all $i$, and $\sum_{i=1}^{n} q_i = 1$; thus $(q_1, \dots, q_n)$ is a probability distribution. Gibbs' lemma (see page 17) can be applied, and one obtains

$$\sum_{i=1}^{n} p_i \log_2 \frac{q^{-l_i}}{K p_i} \le 0;$$

in other words,

$$\sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} \le \sum_{i=1}^{n} p_i l_i \log_2 q + \log_2 K.$$

Yet, because $\log_2 K \le 0$, one has $H(\mathcal{S}) \le l \cdot \log_2 q$; hence the result.

Second part: Let $l_i = \left\lceil \log_q \frac{1}{p_i} \right\rceil$. As $\sum_{i=1}^{n} \frac{1}{q^{l_i}} \le 1$ (indeed, $q^{l_i} \ge \frac{1}{p_i}$), there exists a code of $\mathcal{S}$ over an alphabet of size $q$, uniquely decodable, with lengths of codewords equal to $(l_1, \dots, l_n)$. Its average length is $l = \sum_{i=1}^{n} p_i l_i$. Then, the property of the ceiling function gives us $\log_q \frac{1}{p_i} + 1 > l_i$ and, as a consequence,

$$\sum_{i=1}^{n} p_i \log_2 \frac{1}{p_i} > \sum_{i=1}^{n} p_i l_i \log_2 q - \log_2 q.$$

This can be written as $H(\mathcal{S}) > \log_2 q \left(\left(\sum_{i=1}^{n} p_i l_i\right) - 1\right) = \log_2 q \, (l - 1)$. This proves the theorem. ◽

One deduces the theorem for the $k$th extension of $\mathcal{S}$:

Theorem 21  Let $\mathcal{S}$ be a source without memory of entropy $H(\mathcal{S})$. Any uniquely decodable code of $\mathcal{S}^k$ over some alphabet of size $q$, with average length $l_k$, satisfies

$$\frac{l_k}{k} \ge \frac{H(\mathcal{S})}{\log_2 q}.$$

Moreover, there exists a uniquely decodable code of $\mathcal{S}^k$ over some alphabet of size $q$, with average length $l_k$, that satisfies

$$\frac{l_k}{k} < \frac{H(\mathcal{S})}{\log_2 q} + \frac{1}{k}.$$

Proof. The proof is immediate: according to Property 2 on page 21, one has $H(\mathcal{S}^k) = k \cdot H(\mathcal{S})$. ◽

For any stationary source, the theorem can be formulated as follows:

Theorem 22  For any stationary source $\mathcal{S}$ of entropy $H(\mathcal{S})$, there exists some uniquely decodable encoding process over an alphabet of size $q$, with an average length $l$ as close as one wants to its lower bound

$$\frac{H(\mathcal{S})}{\log_2(q)}.$$

In theory, it is then possible to find a code endlessly approaching this bound (which depends on the entropy of the source). However, in practice, if the process consists of encoding the words of an extension of the source, one is obviously limited by the number of these words ($|\mathcal{S}^k| = |\mathcal{S}|^k$, which might represent a large number of words). In the sequel, we see several encoding processes as well as their relation with this theoretical bound.

2.2 STATISTICAL ENCODING

Statistical encoding schemes use the frequency of each character of the source for compression and, by encoding the most frequent characters with the shortest codewords, approach the entropy of the source.

2.2.1 Huffman's Algorithm

This scheme enables one to find the best encoding for a source without memory $\mathcal{S} = (S, \mathcal{P})$. The code alphabet is $V$, of size $q$. For the optimality of the result, one must check that $q - 1$ divides $|S| - 1$ (in order to obtain a locally complete tree). Otherwise, one can easily add some symbols to $S$, with probability of occurrence equal to zero, until $q - 1$ divides $|S| - 1$. The related codewords (the longest ones) will not be used.

Algorithm 2.1 Description of Huffman's Algorithm
Using the source alphabet S, one builds a set of isolated nodes associated to the probabilities of $\mathcal{P}$ (Figure 2.1).

Figure 2.1 Huffman encoding, beginning of the algorithm: the isolated nodes s1, s2, ..., sn with their probabilities p1, p2, ..., pn

Let $p_{i_1}, \dots, p_{i_q}$ be the $q$ symbols of lowest probabilities. One builds a tree (on the same model as the Huffman tree), whose root is a new node associated with the probability $p_{i_1} + \dots + p_{i_q}$. The edges of this tree are incident to the nodes $p_{i_1}, \dots, p_{i_q}$. Figure 2.2 shows an example of this operation for $q = 2$.

Figure 2.2 Huffman's encoding, first step (q = 2): the two nodes si, sj of lowest probabilities pi, pj are attached to a new root of probability pi + pj, with edges labeled 1 and 0

One then restarts with the q lowest values among the nodes of highest level (the roots), until one obtains a single tree (at each step, the number of nodes of highest level is decreased by q − 1 elements), whose leaves are the elements of S. The associated codewords in this scheme are the words corresponding to the paths from the root to the leaves.

The idea of the method is given in Algorithm 2.1.

Example 2.2 (Source Coding) One wishes to encode the following source over V ={0, 1}:

Symbol   Probability
a        0.35
b        0.10
c        0.19
d        0.25
e        0.06
f        0.05

The successive steps of the algorithm are given in Figure 2.3. Then, one builds the following Huffman code:

Symbol   Codeword
a        11
b        010
c        00
d        10
e        0111
f        0110
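The following Python sketch (ours, binary case q = 2 only) mirrors the construction described in Algorithm 2.1 with a priority queue; ties may be broken differently than in Figure 2.3, so individual codewords can differ from the table above, but the codeword lengths, and hence the average length, remain optimal.

```python
import heapq

def huffman_code(proba):
    """Build a binary Huffman code for a dict {symbol: probability}."""
    # Each heap entry: (probability, tie-breaker, {symbol: partial codeword}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(proba.items())]
    heapq.heapify(heap)
    counter = len(heap)
    while len(heap) > 1:
        p0, _, code0 = heapq.heappop(heap)   # the two lowest probabilities
        p1, _, code1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in code0.items()}
        merged.update({s: '1' + w for s, w in code1.items()})
        heapq.heappush(heap, (p0 + p1, counter, merged))
        counter += 1
    return heap[0][2]

source = {'a': 0.35, 'b': 0.10, 'c': 0.19, 'd': 0.25, 'e': 0.06, 'f': 0.05}
code = huffman_code(source)
print(code)
print(sum(len(w) * source[s] for s, w in code.items()))  # average length ~ 2.32
```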

Exercise 2.3  This exercise introduces some theoretical elements concerning the efficiency of the code generated by the Huffman algorithm. Let $\mathcal{S} = (S, \mathcal{P})$ be the source with $S = (0, 1)$, $P(0) = 0.99$, and $P(1) = 0.01$.

1. Compute the entropy of $\mathcal{S}$.
2. Give the code generated by the Huffman algorithm over the third extension $\mathcal{S}^3$. What is its compression rate?
3. What can you say about the efficiency of the Huffman algorithm when comparing the compression rates obtained here with those of Exercise 1.33? Does it comply with Shannon's theorem?
Solution on page 295.

Exercise 2.4 (Heads or Tails to Play 421 Game) One wishes to play with a die, while having only a coin. Therefore, the exercise concerns encoding a six-face fair die with a two-face fair coin.

1. What is the entropy of a die?

Figure 2.3 Example of a construction of the Huffman code (successive steps: order the probabilities; construct the tree and order the values of highest level; repeat until a single tree remains; final tree)

2. Propose an encoding algorithm.
3. Compute the average length of this code.
4. Is this encoding optimal?
Solution on page 296.

2.2.1.1 Huffman's Algorithm is Optimal

Theorem 23  A code generated by the Huffman algorithm is optimal among all instantaneous codes of $\mathcal{S}$ over $V$.

Proof. In order to simplify the notation in this proof, we suppose that $q = 2$. However, the results can be generalized at each step. One knows that an instantaneous code can be represented with a Huffman tree. Let $A$ be the tree representing an optimal code, and let $H$ be the tree representing a code generated by the Huffman algorithm.

One notices that in $A$, there does not exist a node with a single child whose leaves contain codewords (indeed, such a node could be replaced with its child in order to obtain a better code). Moreover, if the respective probabilities of occurrence $p_1, p_2$ of two codewords $c_1, c_2$ in $A$ satisfy $p_1 < p_2$, then the respective depths $l_1, l_2$ of the leaves representing $c_1, c_2$ satisfy $l_1 \ge l_2$ (indeed, otherwise one exchanges $c_1$ and $c_2$ in the tree to get a better code). Thus, one can assume that $A$ represents an optimal code for which the two words of lowest probabilities are two "sister" leaves (they have the same father node).

Then, one reasons inductively on the number $n$ of leaves in $A$. For $n = 1$, the result is obvious. For any $n \ge 2$, one considers the two "sister" leaves corresponding to the words $c_1, c_2$ of lowest probabilities of occurrence $p_1, p_2$ in $A$. According to the Huffman construction principle, $c_1$ and $c_2$ are "sister" leaves in $H$. Then, one defines $H' = H \setminus \{c_1, c_2\}$. This is the Huffman encoding scheme for the code $C' = C \setminus \{c_1, c_2\} \cup \{c\}$, $c$ being a word of probability of occurrence $p_1 + p_2$. According to the induction hypothesis, $H'$ represents the best instantaneous code over $C'$; thus its average length is lower than that of $A' = A \setminus \{c_1, c_2\}$. Therefore, and according to the preliminary remarks, the average length of $H$ is lower than the average length of $A$. ◽

One must understand the true meaning of this theorem: it does not say that the Huffman algorithm is the best one for encoding information in all cases, but rather that, when fixing the model of a source $\mathcal{S}$ without memory over some alphabet $V$, there is no more efficient code having the prefix property. In practice (see the end of this chapter), one chooses a model for the source that is adapted to the code. In particular, one can obtain more efficient codes from the extensions of the source, as we can notice in the following example.

Let $\mathcal{S} = (S, \mathcal{P})$, $S = (s_1, s_2)$, $\mathcal{P} = (1/4, 3/4)$. The Huffman code for $\mathcal{S}$ gives immediately $s_1 \to 0$ and $s_2 \to 1$, and its average length is 1. The Huffman code for $\mathcal{S}^2 = (S^2, \mathcal{P}^2)$ gives

$$s_1 s_1 \to 010, \quad s_1 s_2 \to 011, \quad s_2 s_1 \to 00, \quad s_2 s_2 \to 1,$$

and its average length is $3 \cdot \frac{1}{16} + 3 \cdot \frac{3}{16} + 2 \cdot \frac{3}{16} + 1 \cdot \frac{9}{16} = \frac{27}{16} = 1.6875$. Therefore, the average length of this code is $l = 1.6875$ and, when compared to the code over $\mathcal{S}$ (codewords in $\mathcal{S}^2$ have length 2), $l = 1.6875 / 2 = 0.84375$ per source symbol, which is better than the code over the original source.

One can still improve this encoding scheme considering the source $\mathcal{S}^3$. It is also possible to refine the encoding with a better model for the source: often the occurrence of some symbol is not independent of the previous symbols issued by the source (e.g., in a text). In that case, the probabilities of occurrence are conditional and there exist some models (in particular the Markov model) that enable a better encoding. Nevertheless, such processes do not lead to infinite improvements. Entropy remains a threshold for the average length, under which there is no code.

Exercise 2.5  One wishes to encode successive throws (one assumes these are infinite) of an unfair die. The symbols of the source are denoted by (1, 2, 3, 4, 5, 6), and they follow the probability law of occurrence (0.12, 0.15, 0.16, 0.17, 0.18, 0.22).

1. What is the entropy of this source?
2. Propose a ternary code (over some alphabet with three digits) generated by the Huffman algorithm for this source. What is its average length? What minimum average length can we expect for such a ternary code?
3. Same question for a binary code.
Solution on page 296.

Exercise 2.6 (Sequel of the previous exercise)  The Huffman algorithm builds a code in which source words of fixed length are encoded with codewords of varying lengths. The organization of the memory in which the code is stored sometimes imposes a fixed length for codewords. Thus, one encodes sequences of digits of varying length from the source to meet this constraint. Tunstall codes are optimal codes following this principle. Here is the construction method for the case of a binary alphabet.

If the chosen length for codewords is k, one has to find the $2^k$ sequences of digits from the source one wishes to encode. At the beginning, the set of candidates is the set of words of length 1 (here, {1,2,3,4,5,6}). Let us consider, in this set, the most probable word (here, 6). One builds all possible sequences obtained by adding a digit to this word, and this word is replaced by these sequences in the set of candidates (here, one gets {1,2,3,4,5,61,62,63,64,65,66}). Then, one restarts this operation while the size of the set remains strictly lower than $2^k$ (one stops before rising above this value).

1. Over some binary alphabet, build the Tunstall code for the die, for codewords of length k = 4.

2. What is the codeword associated to the sequence "6664" with the Tunstall code? With the Huffman code?
3. For each codeword, compute the number of necessary bits of code per character of the source. Deduce the average length per bit of the source of the Tunstall code.
Solution on page 297.

2.2.2 Arithmetic Encoding

Arithmetic encoding is a statistical method that is often better than the Huffman encoding! Yet, we have seen that the Huffman code is optimal, so what kind of optimization is possible? In arithmetic encoding, each character can be encoded on a noninteger number of bits: entire strings of characters of various sizes are encoded on a single integer or a computer real number. For instance, if some character appears with a probability 0.9, the optimal size of the encoding of this character should be

$$\frac{H}{\log_2 Q} = \frac{9/10 \cdot \log_2(10/9) + 1/10 \cdot \log_2(10/1)}{\log_2 10}.$$

Thus the optimal encoding is around 0.14 bits per character, whereas an algorithm such as Huffman’s would certainly use an entire bit. Thus, we are going to see how arithmetic encoding enables one to do better.

2.2.2.1 Floating Point Arithmetics  The idea of arithmetic encoding is to encode the characters using intervals. The output of an arithmetic code is a simple real number between 0 and 1 that is built in the following way: to each symbol is associated a subdivision of the interval [0, 1] whose size is equal to its probability of occurrence. The order of the association does not matter, provided that it is the same for encoding and decoding. For example, a source and the associated intervals are given in the following table.

Symbol        a          b          c          d          e          f
Probability   0.1        0.1        0.1        0.2        0.4        0.1
Interval      [0,0.1)    [0.1,0.2)  [0.2,0.3)  [0.3,0.5)  [0.5,0.9)  [0.9,1)

A message will be encoded with a number chosen in the interval whose bounds contain the information of the message. For each character of the message, one refines the interval by allocating the subdivision corresponding to this character. For instance, if the current interval is [0.15, 0.19) and one encodes the character b, then one allocates the subdivision [0.1, 0.2) of [0.15, 0.19); hence, [0.15 + (0.19 − 0.15) ∗ 0.1 = 0.154, 0.15 + (0.19 − 0.15) ∗ 0.2 = 0.158). If the message one wishes to encode is "bebecafdead," then one obtains the real interval [0.15633504384, 0.15633504640). The steps in the calculation are presented in Table 2.1. Let us choose, for example, "0.15633504500."

TABLE 2.1 Arithmetic Encoding of "bebecafdead"

Symbol   Lower bound      Upper bound
b        0.1              0.2
e        0.15             0.19
b        0.154            0.158
e        0.1560           0.1576
c        0.15632          0.15648
a        0.156320         0.156336
f        0.1563344        0.1563360
d        0.15633488       0.15633520
e        0.156335040      0.156335168
a        0.156335040      0.1563350528
d        0.15633504384    0.15633504640

The encoding process is very simple, and it can be explained in broad outline in Algorithm 2.2.

Algorithm 2.2 Arithmetic Encoding
1: Set lowerBound ← 0.0
2: Set upperBound ← 1.0
3: While there are symbols to encode do
4:   C ← symbol to encode
5:   Let x, y be the bounds of the interval corresponding to C in Table 2.1
6:   size ← upperBound − lowerBound
7:   upperBound ← lowerBound + size ∗ y
8:   lowerBound ← lowerBound + size ∗ x
9: End While
10: return upperBound

For decoding, it is sufficient to locate the interval in which "0.15633504500" lies: namely, [0.1, 0.2). Thus, the first letter is a "b." Then, one has to consider the next interval by subtracting the lower bound and dividing by the size of the interval containing "b," namely, (0.15633504500 − 0.1) / 0.1 = 0.5633504500. This new number indicates that the next value is an "e." The sequel of the decoding process is presented in Table 2.2, and the program is described in Algorithm 2.3.

Remark 3  Note that the stopping criterion r ≠ 0.0 of Algorithm 2.3 is valid only if the input number is the lower bound of the interval obtained during encoding. If, instead, an arbitrary element of the interval is returned, as in the previous example, another stopping criterion must be provided to the decoding process. This could be, for instance,

• the exact number of symbols to be decoded (this could typically be given as a header in the compressed file).

TABLE 2.2 Arithmetic Decoding of "0.15633504500"

Real number      Interval     Symbol   Size
0.15633504500    [0.1,0.2)    b        0.1
0.5633504500     [0.5,0.9)    e        0.4
0.158376125      [0.1,0.2)    b        0.1
0.58376125       [0.5,0.9)    e        0.4
0.209403125      [0.2,0.3)    c        0.1
0.09403125       [0.0,0.1)    a        0.1
0.9403125        [0.9,1.0)    f        0.1
0.403125         [0.3,0.5)    d        0.2
0.515625         [0.5,0.9)    e        0.4
0.0390625        [0.0,0.1)    a        0.1
0.390625         [0.3,0.5)    d        0.2

Algorithm 2.3 Arithmetic Decoding
1: Let r be the number one wishes to decode
2: While r ≠ 0.0 do
3:   Let C be the symbol whose associated interval in Table 2.1 contains r
4:   Print C
5:   Let a, b be the bounds of the interval corresponding to C in Table 2.1
6:   size ← b − a
7:   r ← r − a
8:   r ← r / size
9: End While

• another possibility is to use a special character (such as EOF) added to the end of the message to be encoded. This special character could be assigned to the lowest probability.

In summary, encoding progressively reduces the interval proportionately to the probabilities of the characters. Decoding performs the inverse operation, increasing the size of this interval.
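As an illustration, here is a small Python sketch (ours) of Algorithms 2.2 and 2.3 with floating point arithmetic, using the intervals of the table above; the encoder returns the middle of the final interval and the decoder stops after a known number of symbols, one of the stopping criteria suggested in Remark 3. It is only a toy: real implementations use the integer variant described next to avoid precision issues.

```python
# Intervals of the example source (see Table 2.1).
INTERVALS = {'a': (0.0, 0.1), 'b': (0.1, 0.2), 'c': (0.2, 0.3),
             'd': (0.3, 0.5), 'e': (0.5, 0.9), 'f': (0.9, 1.0)}

def arithmetic_encode(message):
    low, high = 0.0, 1.0
    for symbol in message:
        x, y = INTERVALS[symbol]
        size = high - low
        low, high = low + size * x, low + size * y   # shrink to the symbol's subinterval
    return (low + high) / 2      # any value of the final interval identifies the message

def arithmetic_decode(r, nb_symbols):
    out = []
    for _ in range(nb_symbols):  # stopping criterion: known message length (Remark 3)
        for symbol, (x, y) in INTERVALS.items():
            if x <= r < y:
                out.append(symbol)
                r = (r - x) / (y - x)   # renormalize inside the symbol's subinterval
                break
    return ''.join(out)

code = arithmetic_encode("bebecafdead")   # ~0.156335045, inside the interval of Table 2.1
print(arithmetic_decode(code, 11))        # 'bebecafdead'
```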

2.2.2.2 Integer Arithmetics  The previous encoding depends on the size of the mantissa in the computer representation of real numbers; thus it might not be completely portable. Therefore, it is not used as such, with floating point arithmetic. Decimal digits are rather issued one by one, in a way that fits the number of bits of the computer word. Integer arithmetic is then more natural than floating point arithmetic: instead of a floating point interval [0,1), an integer interval of the type [00000, 99999] is used. Then, encoding is the same. For example, a first occurrence of "b" (using the frequencies in Table 2.1) gives [10000, 19999], then "e" would give [15000, 18999].

Then one can notice that, as soon as the most significant digit is identical in the two bounds of the interval, it does not change anymore. Thus it can be printed on the output and subtracted from the integer numbers representing the bounds. This enables one to manipulate only quite small numbers for the bounds. For the message "bebecafdead," the first occurrence of "b" gives [10000, 19999]; one prints "1" and subtracts it; hence, the interval becomes [00000, 99999]. Then, the "e" gives [50000, 89999]; the sequel is given in Table 2.3. The result is 156335043840. Decoding almost follows the same procedure, by reading a fixed number of digits at a time: at each step, one finds the interval (and thus the character) containing the current integer. Then, if the most significant digit is identical for both the current integer and the bounds, one shifts the whole set of values as presented in Table 2.4.

2.2.2.3 In Practice  This encoding scheme encounters problems if the two most significant digits do not become equal throughout the encoding. The size of the integers bounding the interval increases, but it cannot increase endlessly because of the limitations of any machine. One confronts this issue when getting an interval such as [59992, 60007]. After a few additional iterations, the interval converges toward [59999, 60000] and nothing else is printed! To overcome this problem, one has to compare the most significant digits and also the next ones if the former differ only by 1. Then, if the next ones are 9 and 0, one has to shift them and remember that the shift occurred at the second most significant digit.

TABLE 2.3 Integer Arithmetic Encoding of "bebecafdead"

Symbol    Lower bound   Upper bound   Output
b         10000         19999
Shift 1   00000         99999         1
e         50000         89999
b         54000         57999
Shift 5   40000         79999         5
e         60000         75999
c         63200         64799
Shift 6   32000         47999         6
a         32000         33599
Shift 3   20000         35999         3
f         34400         35999
Shift 3   44000         59999         3
d         48800         51999
e         50400         51679
Shift 5   04000         16799         5
a         04000         05279
Shift 0   40000         52799         0
d         43840         46399
End                                   43840

TABLE 2.4 Integer Arithmetic Decoding of "156335043840"

Shift   Integer   Lower bound   Upper bound   Output
        15633     >10000        <19999        b
1       56335     00000         99999
        56335     >50000        <89999        e
        56335     >54000        <57999        b
5       63350     40000         79999
        63350     >60000        <75999        e
        63350     >63200        <64799        c
6       33504     32000         47999
        33504     >32000        <33599        a
3       35043     20000         35999
        35043     >34400        <35999        f
3       50438     44000         59999
        50438     >48800        <51999        d
        50438     >50400        <51679        e
5       04384     04000         16799
        04384     >04000        <05279        a
0       43840     40000         52799
        43840     >43840        <46399        d

For example, [59123, 60456] is shifted into [51230, 64569]₁; the index 1 meaning that the shift occurred at the second digit. Then, when the most significant digits become equal (after k shifts of 0 and 9), one will have to print this number, followed by k 0's or k 9's. In decimal or hexadecimal format, one has to store an additional bit indicating whether the convergence is upwards or downwards; in binary format, this information can be immediately deduced.

Exercise 2.7  With the probabilities ("a": 4/10; "b": 2/10; "c": 4/10), the encoding of the string "bbbbba" is given as follows:

Symbol   Interval           Shift                             Output
b        [40000; 59999]
b        [48000; 51999]
b        [49600; 50399]     Shift 0 and 9: [46000; 53999]₁
b        [49200; 50799]₁    Shift 0 and 9: [42000; 57999]₂
b        [48400; 51599]₂
a        [48400; 49679]₂    Shift 4: [84000; 96799]           499, then 84000

Using the same probabilities, decode the string 49991680. Solution on page 297.

2.2.3 Adaptive Codes

The encoding schemes represented by the Huffman tree (when considering source extensions) or arithmetic encoding are statistical encodings working on the source model introduced in Chapter 1. Namely, they are based on the a priori knowledge of the frequency of the characters in the file. Moreover, these encoding schemes require a lookup table (or a tree) correlating the source words and the codewords in order to decompress. This table might become very large when considering source extensions.

Exercise 2.8 Let us consider an alphabet of four characters (“a”,“b”,“c”,“d”) of probability law (1/2,1/6,1/6,1/6).

1. Give the lookup table of the static Huffman algorithm for this source.
2. Assuming that characters a, b, c, and d are written in ASCII (8 bits), what is the minimum memory space required to store this table?
3. What would be the memory space required in order to store the lookup table of the extension of the source on three characters?
4. What is the size of the ASCII file containing the plaintext sequence "aaa aaa aaa bcd bcd bcd"?
5. Compress this sequence with the static Huffman code. Give the overall size of the compressed file. Is it interesting to use an extension of the source?
Solution on page 297.

In practice, there exist dynamic (on-the-fly) variants enabling one to avoid the precomputation of both frequencies and lookup tables. These variants are often used in compression. They use character frequencies, and they calculate them as the symbols occur.

2.2.3.1 Dynamic Huffman's Algorithm  The dynamic Huffman algorithm enables one to compress a stream on-the-fly, performing a single reading of the input; contrary to the static Huffman algorithm, this method avoids performing two scans of an input file (one for the computation of the frequencies and the other for the encoding). The frequency table is built as the reading of the file goes on; hence, the Huffman tree is modified each time one reads a character. The Unix "compact" command implements this dynamic algorithm.

2.2.3.2 Dynamic Compression  Compression is described in Algorithm 2.4. One assumes that the file to be encoded contains binary symbols, which are read on-the-fly in the form of blocks of k bits (k is often a parameter of the method); such a block is called a "character". At the initialization phase, one defines a symbolic character (denoted by @ in the sequel) initially encoded with a predefined symbol (e.g., the symbolic 257th character in the ASCII code). During the encoding, each time one meets a new unknown character, one encodes it on the output with the code of @ followed by the k bits of the new character. The new character is then introduced in the Huffman tree.

Algorithm 2.4 Dynamic Huffman Algorithm: Compression
1: Let nb(c) be the number of occurrences of character c
2: Initialize the Huffman tree (AH) with character @, nb(@) ← 1
3: While NOT end of the source do
4:   Read a character c from the source
5:   If it is the first occurrence of c then
6:     nb(c) ← 0
7:     nb(@) ← nb(@) + 1
8:     Print on the output the code of @ in AH, followed by c
9:   else
10:    Print the code of c in AH
11:  End If
12:  nb(c) ← nb(c) + 1
13:  Update AH with the new frequencies
14: End While

In order to build the Huffman tree and update it, one enumerates the number of occurrences of each character as well as the number of characters already read in the file; therefore, one knows, at each reading of a character, the frequency of each character from the beginning of the file to the current character; thus the frequencies are dynamically computed. After having written a code (either the code of @, or the code of an already known character, or the k uncompressed bits of a new character), one increases by 1 the number of occurrences of the character that has just been written. Taking the frequency modifications into account, one updates the Huffman tree at each step. Therefore, the tree exists for compression (and decompression), but there is no need to send it to the decoder.

There are several possible choices for the number of occurrences of @: in Algorithm 2.4, it is the number of distinct characters (this enables one to have few bits for @ at the beginning), but it is also possible to give it, for example, a constant value close to zero (in that case, the number of bits for @ evolves according to the depth of the Huffman tree, which lets the most frequent characters have the shortest codes).

2.2.3.3 Dynamic Decompression  Dynamic decompression is described in Algorithm 2.5. At the initialization, the decoder knows one single code, that of @ (e.g., 0). First it reads 0, which we suppose to be the code associated to @. It deduces that the next k bits contain a new character. It prints these k bits on its output and updates the Huffman tree, which already contains @, with this new character.

Notice that the coder and the decoder maintain their own Huffman trees, but they both use the same algorithm in order to update them from the occurrences (frequencies) of the characters that have already been read. Hence, the Huffman trees computed separately by the coder and the decoder are exactly the same.

Algorithm 2.5 Dynamic Huffman Algorithm: Decompression
1: Let nb(c) be the number of occurrences of character c
2: Initialize the Huffman tree (AH) with character @, nb(@) ← 1
3: While NOT end of coded message do
4:   Read a codeword c in the message (until reaching a leaf in AH)
5:   If c = @ then
6:     nb(@) ← nb(@) + 1
7:     Read in c the k bits of the message and print them on the output
8:     nb(c) ← 0
9:   else
10:    Print the character corresponding to c in AH
11:  End If
12:  nb(c) ← nb(c) + 1
13:  Update AH with the new frequencies
14: End While

Then, the decoder reads the next codeword and decodes it using its Huffman tree. If the codeword represents symbol @, it reads the k bits corresponding to a new character, writes them on its output, and introduces the new character in its Huffman tree (from now on, the new character is associated with some code). Otherwise, the codeword represents an already known character; using its Huffman tree, it recovers the k bits of the character associated with the codeword and writes them on its output. Then it increases by 1 the number of occurrences of the character it has just written (and of @ in case of a new character) and updates the Huffman tree.

This dynamic method is a little bit less efficient than the static method for estimating the frequencies. The encoded message is likely to be a little bit longer. However, it avoids the storage of both the tree and the frequency table, which often makes the final result shorter. This explains why it is used in practice in common compression utilities.
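To give the flavor of the method, here is a deliberately naive Python sketch (ours) of the compression side of Algorithm 2.4: instead of updating the Huffman tree incrementally, it rebuilds a deterministic code from the current counts at every step, which is functionally equivalent but much slower. A decoder would mirror it exactly, using the same build_code function to stay synchronized. The 8-bit escape encoding of new characters and the function names are our own choices.

```python
import heapq

def build_code(counts):
    """Deterministic binary Huffman code from the current occurrence counts."""
    if len(counts) == 1:                      # single known symbol: code it '0'
        return {next(iter(counts)): '0'}
    heap = [(n, s, {s: ''}) for s, n in sorted(counts.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        n0, k0, c0 = heapq.heappop(heap)      # the two lowest counts
        n1, k1, c1 = heapq.heappop(heap)
        code = {s: '0' + w for s, w in c0.items()}
        code.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (n0 + n1, min(k0, k1), code))
    return heap[0][2]

def dynamic_huffman_compress(text):
    counts = {'@': 1}                         # the escape symbol @ is known from the start
    output = []
    for c in text:                            # here a "character" is one text character (k = 8)
        code = build_code(counts)             # code matching the frequencies seen so far
        if c not in counts:
            output.append(code['@'] + format(ord(c), '08b'))   # code of @, then raw bits
            counts[c] = 0
            counts['@'] += 1
        else:
            output.append(code[c])
        counts[c] += 1
    return ''.join(output)

print(dynamic_huffman_compress("abracadabra"))
```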

Exercise 2.9  Once again, one considers the sequence "aaa aaa aaa bcd bcd bcd".

1. What would be the dynamic Huffman code of this sequence?
2. What would be the dynamic Huffman code of the extension on three characters of this sequence?
Solution on page 298.

2.2.3.4 Adaptive Arithmetic Encoding The general idea of the adaptive encoding that has just been developed for the Huffman algorithm is the following:

• At each step, the current code corresponds to the static code one would have obtained using the occurrences already known to compute the frequencies.
• After each step, the encoder updates its occurrence table with the new character it has just received and builds a new static code corresponding to this table. This must always be done in the same way, so that the decoder is able to do the same when its turn comes.

This idea can also be easily implemented for arithmetic encoding: the code is built on the same model as arithmetic encoding, except that the probability distribution is computed on-the-fly by the encoder, on the symbols that have already been handled. The decoder is able to perform the same computations and, hence, to remain synchronized. An adaptive arithmetic encoder works with floating point arithmetic: at the first iteration, the interval [0, 1] is divided into segments of the same size. Each time a symbol is received, the algorithm computes the new probability distribution and the segment of the symbol is divided into new segments. But the lengths of these new segments now correspond to the updated probabilities of the symbols. The more symbols are received, the closer the computed probabilities are to the real probabilities. As in the case of the dynamic Huffman encoding, there is no overhead due to the preliminary sending of the frequency table and, for variable sources, the encoder is able to dynamically adapt to the variations in the probabilities. Not only does arithmetic encoding often offer a better compression, but one should also notice that, although static Huffman implementations are generally faster than static arithmetic encoding implementations, it is the contrary in the adaptive case.

2.3 HEURISTICS OF ENTROPY REDUCTION

In practice, the Huffman algorithm (or its variants) is used coupled with other encoding processes. These processes are not always optimal in theory, but they make some reasonable assumptions on the form of the target files (i.e., the source model) in order to reduce the entropy or to allow the elimination of some information one considers to be inconsequential for the use of the data.

Here, we give three examples of entropy reduction. The principle is to turn a message into another one, using some reversible transformation, so that the new message has a lower entropy and thus is more efficiently compressible. Therefore, the whole technique is about using a preliminary code in charge of reducing the entropy before applying the compression.

2.3.1 Run-Length Encoding (RLE)

Although statistical encoding makes good use of the most frequent characters, it does not take their position in the message into account. If the same character often appears several times in a row, it might be useful to simply encode the number of times it appears. For instance, in order to transmit a fax page, statistical encoding will encode a 0 with a short codeword, but each 0 will be written, and this will make this code less efficient than the fax code presented in Chapter 1, especially for pages containing large white areas.

Run-length encoding (RLE) is an extension of the fax code. It reads each character one by one but, when at least three identical characters are met successively, it will rather print some special repetition code followed by the recurrent character and the number of repetitions. For example, using @ as the repetition character, the string "aabbbcddddd@ee" is encoded with "@a2@b3c@d5@@1@e2" (a little trick is necessary to make the code instantaneous if the special character is met).

Exercise 2.10

1. What would be the RLE code of the following image?

2. Is there a benefit compared to the fax code?
3. What would be the encoding on 5-bit blocks with 16 gray levels? And the benefit of such a code? Are we able to compress more than this with this system?
4. Same question with 255 gray levels (with at most 16 consecutive identical pixels).
5. Same questions when scanning the image in columns?
Solution on page 298.

In order to avoid problems related to the special repetition character (which is necessarily encoded as a repetition of length 1), modems use a variant of RLE called MNP5. If the same character is consecutively encountered three times or more, the idea is to display the three characters followed by some counter indicating the number of additional occurrences of this character. One must obviously specify a fixed number of bits allocated for this counter. If the counter reaches its maximum, the sequel is encoded with several blocks composed of the same three characters followed by a counter. For instance, the string "aabbbcddddd" is encoded as "aabbb0cddd2." In this case, if a string of n bytes contains m repetitions of average length L, the benefit of compression is (n − m(L − 4))/n if the counter is encoded on 1 byte. Therefore, this kind of encoding scheme is very useful, for example, in black and white images where pixels having the same color are often side by side. A statistical compression can obviously be performed later on the result of an RLE scheme.
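A minimal Python sketch (ours) of the MNP5-like variant just described; for readability the counter is emitted as a decimal character rather than a raw byte, which would of course be ambiguous if the source itself contained digits.

```python
def mnp5_encode(text, max_extra=255):
    """Run-length encoding, MNP5 style: after three identical characters,
    emit a counter giving the number of additional occurrences."""
    out, i = [], 0
    while i < len(text):
        c, run = text[i], 1
        while i + run < len(text) and text[i + run] == c:
            run += 1
        if run < 3:
            out.append(c * run)               # short runs are copied as-is
        else:
            extra = min(run - 3, max_extra)   # saturate the counter; the remainder
            out.append(c * 3 + str(extra))    # will start a new block
            run = 3 + extra
        i += run
    return ''.join(out)

print(mnp5_encode("aabbbcddddd"))   # 'aabbb0cddd2', as in the text
```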

2.3.1.1 The Fax Code (end) Let us go back to the fax code – the one we used at the beginning of our presentation of the foundations of coding (Chapter 1) – and let us complete it and give its whole compression principle. HEURISTICS OF ENTROPY REDUCTION 119

Let us recall that the scanned information is a set of black and white pixels on which one performs an RLE on lines of 1728 characters. Black and white pixels are alternated; thus the message one wishes to encode is a sequence of numbers between 1 and 1728. These numbers will be encoded with their translations into bits, according to the following principle. Suppose that we have a uniquely decodable code that contains the encoded value of all numbers between 0 and 63, as well as the encoded value of several numbers greater than 63, in such a way that for any number n greater than 63, there exists some number m in the table such that 0 ≤ n − m < 64. Therefore, n repetitions are represented with two successive codes: m repetitions and then n − m repetitions. Figure 2.4 presents some extracts of the tables giving the numbers encoding white pixels. Numbers between 0 and 63 are called terminating white codes and numbers from 64 upward are called make up white codes. The same tables exist for black pixels, with different encoded values, the whole composing a uniquely decodable code.

2.3.2 Move-to-Front

Another precomputation is possible if one turns an ASCII character into its value between 0 and 255 while modifying the order of the table on-the-fly. For instance, as soon as a character appears in the source, it is first encoded with its current value, then it is moved to the front of the list: from now on, it will be encoded with 0, all other characters being shifted by one unit. This "move-to-front" enables one to have more codes close to 0 than codes close to 255. Hence, entropy is modified.

For example, the string "aaaaffff" can be conceptualized as a source of entropy 1, if the table is (a,b,c,d,e,f). It is then encoded with "00005555." The entropy of the code (considered as a source) is 1. It is also the entropy of the source. However, the code after a move-to-front will be "00005000," of entropy $H = \frac{7}{8}\log_2\left(\frac{8}{7}\right) + \frac{1}{8}\log_2(8) \approx 0.55$. The code in itself is then compressible; this is what we call entropy reduction.

Terminating White Codes        |  Make Up White Codes
Code        Run                |  Code         Run
--------------------           |  --------------------
00110101    0                  |  11011        64
000111      1                  |  10010        128
0111        2                  |  010111       192
1000        3                  |  0110111      256
1011        4                  |  00110110     320
...         ...                |  ...          ...
00110010    61                 |  010011010    1600
00110011    62                 |  011000       1664
00110100    63                 |  010011011    1728

Figure 2.4 Fax code

Under these conditions, one is able to encode the result message, and thus the initial message, a lot more efficiently. A variant of this method is to move the character toward the front of the list by a fixed number of positions, instead of pushing it into the first position. Hence, the most frequent characters will be located around the head of the list. This idea prefigures the adaptive codes and the dictionary-based codes that are presented in the next sections.
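Move-to-front is short enough to be written in full; the following Python sketch (ours) encodes and decodes with an explicit table, reproducing the "aaaaffff" example above.

```python
def mtf_encode(text, alphabet):
    table, out = list(alphabet), []
    for c in text:
        i = table.index(c)
        out.append(i)
        table.insert(0, table.pop(i))   # move the character to the front of the table
    return out

def mtf_decode(indices, alphabet):
    table, out = list(alphabet), []
    for i in indices:
        c = table[i]
        out.append(c)
        table.insert(0, table.pop(i))
    return ''.join(out)

print(mtf_encode("aaaaffff", "abcdef"))                # [0, 0, 0, 0, 5, 0, 0, 0]
print(mtf_decode([0, 0, 0, 0, 5, 0, 0, 0], "abcdef"))  # 'aaaaffff'
```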

Exercise 2.11  Let A be an alphabet composed of the eight following symbols: A = ("a", "b", "c", "d", "m", "n", "o", "p").

1. One associates a number between 0 and 7 to each symbol, according to its position in the alphabet. What are the numbers representing "abcddcbamnopponm" and "abcdmnopabcdmnop"?
2. Encode the two previous strings using the "move-to-front" technique. What do you notice?
3. How many bits are necessary in order to encode the two first numbers, using the Huffman algorithm for example?
4. How many bits are necessary to encode the two numbers obtained after "move-to-front"? Compare the entropies.
5. What does the Huffman algorithm extended to two characters give for the last number? Then what is the size of the frequency table?
6. What are the characteristics of an algorithm performing "move-to-front" followed by some statistical code?
7. "Move-ahead-k" is a variant of "move-to-front" in which the character is pushed forward by only k positions instead of being pushed to the front of the stack. Encode the two previous strings with k = 1, then with k = 2, and compare the entropies.
Solution on page 299.

2.3.3 Burrows–Wheeler Transform (BWT)

The idea behind the heuristic of Michael Burrows and David Wheeler is to sort the characters of a string to make move-to-front and RLE as efficient as possible. Obviously, the problem is that it is impossible in general to recover the initial string from its sorted version! Therefore, the trick is to sort the string and send some intermediate string, with a better entropy than the initial string, that will also enable one to recover the initial string. For example, let us consider the string "COMPRESSED." It is a question of creating all possible shifts as presented in Figure 2.5 and then sorting the lines according to the alphabetic and lexicographic order. Therefore, the first column F of the new matrix is the sorted string of all letters in the source word. The last column is called L. Only the first and the last columns are written, because they are the only ones that matter for the code. In the figure and in the text, we put indices on the characters that are repeated; these indices simplify the visual understanding of the transform. However, they are not taken into account in the decoding algorithm: indeed, the order of the characters is necessarily preserved between L and F.

Figure 2.5 BWT on COMPRESSED: all circular shifts of the string, and the first column F and last column L after sorting the rows

Obviously, in order to compute F (the first column) and L (the last column), it is not necessary to store the whole shift matrix: a simple pointer running along the string is enough. Thus this first phase is quite simple. Nevertheless, if only F is sent, how do we compute the reverse transform? Here the solution is to send the string L instead of F: even if L is not sorted, it is still taken from the sorted version of an almost identical string, only shifted by one letter. Therefore, one can hope that this method preserves the properties of the sorting and that entropy will be reduced. The magic of decoding comes from the fact that the knowledge of this string, coupled with the knowledge of the primary index (the number of the line containing the first character of the initial string; in Figure 2.5, it is 4, considering numbers between 0 and 8), enables one to recover the initial string. It is enough to sort L in order to recover F. Then, one computes a transform vector H containing the correspondence of the indexes between L and F, namely, the index in L of each character taken in the sorted order in F. For Figure 2.5, this gives H = {4, 0, 8, 5, 3, 6, 2, 1, 7}, because C is in position 4 (the primary index) in L, then E₂ is in position 0 in L, and E₁ is in position 8, and so on. Then, one must notice that, putting L before F in columns, in each line two consecutive letters must necessarily be consecutive in the initial string: indeed, because of the shift, the last letter becomes the first one and the first one becomes the second one. This can also be explained by the fact that, for all j, L[H[j]] = F[j]. Then, it is enough to follow this sequence of couples of letters to recover the initial string, as explained in Algorithm 2.6.

Exercise 2.12 (BWT Code Analysis)

1. The last column L of the sorted matrix of the Burrows–Wheeler Transform (BWT) contains packets of identical characters; that is why L can be efficiently compressed. However, the first column F would be even more efficiently compressible if it was totally sorted. Why does one select L rather than F as the encoded message?

Algorithm 2.6 Inverse of BWT
1: One has at one's disposal L and the primary index, index
2: F ← Sort(L)
3: Compute the transform vector H such that L[H[j]] = F[j] for all j
4: For i from 0 to Size Of(L) do
5:   Print F[index]
6:   index ← H[index]
7: End For

Figure 2.6 The bzip2 compression chain: BWT, then Move-to-front, then RLE, then Huffman encoding

2. Let us consider the string S = "sssssssssh." Compute L and its move-to-front compression.
3. Practical implementation: BWT is efficient on long strings S of length n. In practice, it is, therefore, inconceivable to store the whole n × n permutation matrix. In fact, it is enough to order these permutations. For this, one needs to be able to compare two permutations.
(a) Give an index that enables one to designate and differentiate permutations of the initial string.
(b) Give an algorithm, compare_permutations, which, given as input a string S and two integers i and j, computes a Boolean function. This algorithm must decide whether the string shifted i times is before the string shifted j times, according to the order obtained by sorting all the permutations with the alphabetic order.
(c) Determine the memory space required in order to compute BWT.
(d) How does one compute L and the primary index from the permutation table?

Solution on page 299.

Therefore, in the output of the BWT, one obtains, in general, a string of lower entropy, well suited for a move-to-front followed by an RLE. The compression tool bzip2 detailed in Figure 2.6 uses this sequence of entropy reductions before performing the Huffman encoding. In Section 2.4.2, we see that this technique is among the most efficient ones nowadays.
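For small strings, both directions of the BWT fit in a few lines of Python; the sketch below (ours) builds the rotation matrix explicitly, which is exactly what Exercise 2.12 explains one would avoid in practice, and inverts it with the transform vector H of Algorithm 2.6. Depending on how ties between identical characters are ordered, L and the primary index may differ from Figure 2.5, but the round trip always restores the original string.

```python
def bwt(s):
    """Return (L, primary_index): last column of the sorted circular shifts
    and the row containing the original string."""
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    L = ''.join(row[-1] for row in rotations)
    return L, rotations.index(s)

def inverse_bwt(L, index):
    """Algorithm 2.6: F is L sorted; H[j] is the position in L of the character F[j],
    matching equal characters in order."""
    F = sorted(L)
    next_pos, H = {}, []
    for c in F:
        k = L.index(c, next_pos.get(c, 0))   # next unused occurrence of c in L
        H.append(k)
        next_pos[c] = k + 1
    out = []
    for _ in range(len(L)):
        out.append(F[index])
        index = H[index]
    return ''.join(out)

L, idx = bwt("COMPRESSED")
print(L, idx)
print(inverse_bwt(L, idx))   # 'COMPRESSED'
```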

2.4 COMMON COMPRESSION CODES

As for the compression tool bzip2, current implementations combine several compression algorithms. This enables one to take advantage of all these algorithms and minimize the number of cases of bad compression. Among the most widely used techniques, the dictionary-based techniques of Lempel and Ziv enable one to enumerate full blocks of characters.

2.4.1 Lempel–Ziv's Algorithm and Variants

The algorithm invented by Lempel and Ziv is a dictionary-based compression algorithm (also called compression by factor substitution); it consists of replacing a sequence of characters (called a factor) by a shorter code that is the index of this factor in some dictionary. The idea is to perform entropy reduction, not on a single character, but rather on full words. For example, in the sentence "a good guy is a good guy," one can dynamically associate numbers to the words already met; the code then becomes "a,good,guy,is,1,2,3." One must notice that, as for dynamic algorithms, dictionary-based methods only require a single reading of the file. There exist several variants of this algorithm; among them, Lempel-Ziv 1977 (LZ77) and Lempel-Ziv 1978 (LZ78) are freely available.

2.4.1.1 Lempel-Ziv 1977 (LZ77)  This algorithm (published by Lempel and Ziv in 1977) is the first dictionary-based algorithm. Being an efficient alternative to the Huffman algorithms, it relaunched research in compression. LZ77 is based on a window sliding along the text from left to right. This window is divided into two parts: the first part stands for the dictionary and the second part (the reading buffer) meets the text first. At the beginning, the window is placed so that the reading buffer is located on the text, and the dictionary is not. At each step, the algorithm looks in the dictionary for the longest factor that repeats itself at the beginning of the reading buffer; this factor is encoded with the triplet (i, j, c) where

• i is the distance between the beginning of the buffer and the position of the repetition in the dictionary;
• j is the length of the repetition;
• c is the first character of the buffer different from the corresponding character in the dictionary.

After the encoding of the repetition, the window slides by j + 1 characters to the right. In the case when no repetition is found in the dictionary, the character c that caused the difference is then encoded with (0, 0, c).

Exercise 2.13 (Encoding a Repetition Using LZ77)

1. Encode the sequence "abcdefabcdefabcdefabcdef" using LZ77.
2. Encode this sequence using LZ77 and a search window whose length is 3 bits.
Solution on page 300.
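Here is a toy LZ77 in Python (ours); the exact conventions for the distance i and for the window bounds vary between presentations, so this sketch only follows the description above approximately, and it makes no attempt at efficiency.

```python
def lz77_encode(text, window=16, lookahead=16):
    """Emit (distance, length, next_char) triplets."""
    out, pos = [], 0
    while pos < len(text):
        best_dist, best_len = 0, 0
        for i in range(max(0, pos - window), pos):   # candidate starts in the dictionary
            length = 0
            while (length < lookahead - 1 and pos + length < len(text) - 1
                   and text[i + length] == text[pos + length]):
                length += 1
            if length > best_len:
                best_dist, best_len = pos - i, length
        out.append((best_dist, best_len, text[pos + best_len]))
        pos += best_len + 1
    return out

def lz77_decode(triplets):
    text = []
    for dist, length, c in triplets:
        start = len(text) - dist
        for k in range(length):          # copying one by one allows overlapping matches
            text.append(text[start + k])
        text.append(c)
    return ''.join(text)

coded = lz77_encode("blablablah blablablah")
print(coded)
print(lz77_decode(coded))   # the original string
```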

One of the most well-known implementations of LZ77 is the Lempel–Ziv–Markov-chain Algorithm (LZMA) library, which combines, as expected, an LZ77 compression using a variable size (adaptive) dictionary, followed by an integer arithmetic encoding.

2.4.1.2 LZ78  This is an improvement of the latter that consists of replacing the sliding window with a pointer following the text and an independent dictionary in which one looks for the factors read by the pointer. Now let us suppose that, during the encoding, one reads the string sc, where s is a string at index n in the dictionary and c is a character such that the string sc is not in the dictionary. Then one writes the couple (n, c) on the output. The character c appended to factor number n gives a new factor, which is added to the dictionary to be used as a reference in the sequel of the text. Only two pieces of information are encoded, instead of three with LZ77.

2.4.1.3 Lempel-Ziv-Welch (LZW)  This is another variant of Lempel–Ziv, proposed by Terry Welch. LZW was patented by Unisys. It consists of encoding only the index n in the dictionary; moreover, the file is read bit by bit. If needed, an initial dictionary is used (e.g., an ASCII table). Compression, described by Algorithm 2.7, is then immediate: one replaces each group of characters already known with its code, and one adds to the dictionary a new group composed of this group and the next character.

Algorithm 2.7 LZW: compression
1: Let string ← ∅ be the current string
2: While NOT End Of Source do
3:   Read character c from the source
4:   If string|c is in the dictionary then
5:     string ← string|c
6:   else
7:     Print the code of string
8:     Add string|c in the dictionary
9:     string ← c
10:  End If
11: End While

Decompression is a little bit more tricky, because one has to treat a special case when the same string is reproduced two times consecutively. Algorithm 2.8 describes this mechanism: as for compression, one starts by using the same initial dictionary, which is filled in progressively, but with a delay, because one has to consider the next character. This delay becomes a problem if the same string is reproduced consecutively. Indeed, in this case, the new code is not yet known during decompression. Nevertheless, because it is the only case where one meets this kind of problem, one notices that the unknown group must start with the same string, and one is able to guess the value of the group. The following exercise illustrates this principle.

Exercise 2.14  Using the given hexadecimal ASCII table (7 bits) as the initial dictionary, compress and uncompress the string "BLEBLBLBA" with LZW.

Algorithm 2.8 LZW: decompression
1: Read 1 byte prec of the encoded message
2: While NOT End Of Encoded Message do
3:   Read 1 byte curr of the encoded message
4:   If curr is in the dictionary then
5:     string ← translation of curr in the dictionary
6:   else
7:     string ← translation of prec in the dictionary
8:     string ← string | (first character of string)
9:   End If
10:  Print string
11:  Let c be the first character of string
12:  Let word be the translation of prec in the dictionary
13:  Add word|c in the dictionary
14:  prec ← curr
15: End While

     0    1    2    3    4    5    6    7    8    9    A    B    C    D    E    F
0    NUL  SOH  STX  ETX  EOT  ENQ  ACK  BEL  BS   HT   LF   VT   FF   CR   SO   SI
1    DLE  DC1  DC2  DC3  DC4  NAK  SYN  ETB  CAN  EM   SUB  ESC  FS   GS   RS   US
2    SP   !    "    #    $    %    &    '    (    )    *    +    ,    -    .    /
3    0    1    2    3    4    5    6    7    8    9    :    ;    <    =    >    ?
4    @    A    B    C    D    E    F    G    H    I    J    K    L    M    N    O
5    P    Q    R    S    T    U    V    W    X    Y    Z    [    \    ]    ^    _
6    `    a    b    c    d    e    f    g    h    i    j    k    l    m    n    o
7    p    q    r    s    t    u    v    w    x    y    z    {    |    }    ~    DEL

Uncompress the result without considering the particular case described above. What do you notice? Solution on page 300.
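A compact Python version (ours) of Algorithms 2.7 and 2.8, assuming a 7-bit ASCII initial dictionary and leaving the output as a list of integer indices instead of packed bits; the demo string is chosen so that decompression hits the special case discussed above.

```python
def lzw_encode(text):
    dictionary = {chr(i): i for i in range(128)}       # initial 7-bit ASCII dictionary
    current, out = '', []
    for c in text:
        if current + c in dictionary:
            current += c
        else:
            out.append(dictionary[current])
            dictionary[current + c] = len(dictionary)  # new factor
            current = c
    if current:
        out.append(dictionary[current])
    return out

def lzw_decode(codes):
    dictionary = {i: chr(i) for i in range(128)}
    prev = dictionary[codes[0]]
    out = [prev]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        else:                               # special case: a factor repeated immediately,
            entry = prev + prev[0]          # its code is not in the dictionary yet
        out.append(entry)
        dictionary[len(dictionary)] = prev + entry[0]
        prev = entry
    return ''.join(out)

codes = lzw_encode("abababab")
print(codes)                # [97, 98, 128, 130, 98]; index 130 is emitted before
print(lzw_decode(codes))    # the decoder has defined it, yet 'abababab' is recovered
```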

Several variants have been proposed: for example, Lempel-Ziv-Miller-Wegman (LZMW), in which “string + next word” is added to the dictionary instead of only “string + next character,” or Lempel-Ziv All Prefixes (LZAP), in which all prefixes of “string + next word” are added to the dictionary.

The Unix command compress implements LZW. As for the gzip tool, it is based on a variant of LZ77; it uses two dynamic Huffman trees: one for the strings and the other for the distances between the occurrences. Distances between two occurrences of some string are bounded: when the distance becomes too large, one restarts the compression from the current character, independently of the compression that has already been performed before this character. This singularity implies that there is no structural difference between a single file compressed with gzip and a sequence of compressed files. Hence, let f1.gz and f2.gz be the results of the compression of two source files f1 and f2, and append these two files into a third one called f3.gz. Then, when uncompressing this file with gunzip f3.gz, one gets a result file whose content is identical to f1 appended with f2.
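This behavior is easy to check in practice, for instance with Python's gzip module, which also reads through concatenated members (a small experiment mirroring cat f1.gz f2.gz > f3.gz):

```python
import gzip

f1 = gzip.compress(b"abbarbarb")   # analogue of f1.gz
f2 = gzip.compress(b"babbarbaa")   # analogue of f2.gz
f3 = f1 + f2                       # analogue of: cat f1.gz f2.gz > f3.gz

# The decompressor reads the two members one after the other, so the result
# is the concatenation of the two source files.
assert gzip.decompress(f3) == b"abbarbarbbabbarbaa"
```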

Exercise 2.15 (Gzip)

1. Encode the string “abbarbarbbabbarbaa” with LZ77.
2. Cut the previous encoding into three strings (distances, lengths, and characters), and apply gzip, the dynamic Huffman encoding of @ being forced onto one single bit, 1. Besides, in gzip, the first @ is not forgotten.
3. Let us suppose that the previous string is divided into two pieces, “abbarbarb” and “babbarbaa,” in two separate files “f1” and “f2.” Give the content of “f1.gz” and “f2.gz,” the compressions of “f1” and “f2” with gzip.
4. One appends the two files “f1.gz” and “f2.gz” in a file “f3.gz” (e.g., using the Unix command cat f1.gz f2.gz > f3.gz), and then one uncompresses the file “f3.gz” with gunzip. What is the result? Deduce the interest of not forgetting the first @.
5. How does one add compression levels in gzip (e.g., gzip -1 xxx.txt or gzip -9 xxx.txt)?

Solution on page 301.

2.4.2 Comparisons of Compression Algorithms

In Table 2.5, we compare the compression rates and the encoding/decoding speeds of common tools on Unix/Linux, using the example of an e-mail file (i.e., a file containing texts, images, and binary attachments). The gzip and compress programs – which use variants of Lempel–Ziv – were described in the previous section. bzip2 – which uses an entropy reduction like the BWT before performing an entropic encoding – is described in Section 2.3.3.

TABLE 2.5 Comparison of the compression of an e-mail file (15.92 MB: text, images, programs, etc.) with several algorithms, on a PIV 1.5 GHz

Algorithm                  Compressed file (MB)   Rate (%)   Encoding (s)   Decoding (s)
7-Zip-4.42 (LZMA+?)               5.96             62.57        23.93          6.27
RAR-3.51 (?)                      6.20             61.07        14.51          0.46
rzip-2.1 -9 (LZ77+Go)             6.20             61.09         9.26          2.94
ppmd-9.1 (Predictive)             6.26             60.71        11.26         12.57
bzip2-1.0.3 (BWT)                 6.69             57.96         7.41          3.16
gzip-1.3.5 -9 (LZ77)              7.68             51.77         2.00          0.34
gzip-1.3.5 -2 (LZ77)              8.28             47.99         1.14          0.34
WinZip-9.0 (LZW+?)                7.72             51.55         5             5
compress (LZW)                    9.34             41.31         1.11          0.32
lzop-1.01 -9 (LZW+?)              9.35             41.32         6.73          0.09
lzop-1.01 -2 (LZW+?)             10.74             32.54         0.45          0.09
pack (Huffman)                   11.11             30.21         0.33          0.27

7-Zip uses the Lempel–Ziv–Markov-chain Algorithm (LZMA) variant of LZ77, associated with an arithmetic encoding and several additional heuristics depending on the kind of input file. WinZip also uses several variants depending on the type of the file and finishes with an LZW encoding. rzip is designed for large files and, therefore, uses LZ77 with a very large window. ppmd uses a Markov probabilistic model in order to predict the next character from the characters immediately preceding it. Then, lzop uses a collection of algorithms called Lempel-Ziv-Oberhumer (LZO), optimized for decompression speed: it favors long matches and long literal runs, so that it produces good results on highly redundant data and deals acceptably with poorly compressible data. Finally, pack implements the simple Huffman encoding.

As a conclusion, practical behavior reflects theory: LZ77 provides a better compression than LZW, but it is slower. Dictionary-based algorithms are more efficient than naive entropic encodings, but less efficient than an entropic encoding preceded by a good entropy-reduction heuristic. Common software usually tries several methods in order to find the best compression; although the compression rate is improved, the compression time usually suffers from this implementation choice.
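A similar, if rougher, comparison can be reproduced on any file with the codecs shipped in the Python standard library (zlib for the LZ77-based deflate, bz2 for the BWT-based compressor, lzma for LZMA); the file name below is only a placeholder:

```python
import bz2
import lzma
import time
import zlib

def benchmark(path):
    """Compare compressed sizes and compression times of three standard codecs."""
    with open(path, "rb") as f:
        data = f.read()
    codecs = {"zlib (LZ77 + Huffman)": zlib.compress,
              "bz2 (BWT)": bz2.compress,
              "lzma (LZMA)": lzma.compress}
    for name, compress in codecs.items():
        start = time.perf_counter()
        out = compress(data)
        elapsed = time.perf_counter() - start
        rate = 100 * (1 - len(out) / len(data))
        print(f"{name:22s} {len(out):>10d} bytes  {rate:5.1f} %  {elapsed:6.2f} s")

# benchmark("mailbox.mbox")   # hypothetical e-mail archive, in the spirit of Table 2.5
```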

2.4.3 GIF and PNG Formats for Image Compression

Common compact data formats take the nature of the data to encode into account. They differ depending on whether the data is a sound, an image, or a text. Among the image formats, the Graphics Interchange Format (GIF) is a graphic bitmap file format proposed by the CompuServe company. It is a compression format for a pixel image, that is, for an image described as a sequence of points (pixels) stored in an array; each pixel has a value for its color.

The compression principle is divided into two steps. First, the pixel colors (at the beginning, there are 16.8 million colors, encoded on 24 bits in Red/Green/Blue (RGB) mode) are restricted to a palette of 2–256 colors (2, 4, 8, 16, 32, 64, 128, or the default value of 256 colors). Hence, the color of each pixel is rounded to the closest color appearing in the palette. While keeping a large number of different colors with a 256-color palette, this already gives a compression factor of 3. Then, the sequence of pixel colors is compressed with the dynamic compression algorithm Lempel-Ziv-Welch (LZW). There exist two versions of this file format, developed in 1987 and 1989, respectively:

• GIF 87a enables one to have a progressive display (using interlacing) and animated images (animated GIFs), storing several images within the same file.
• GIF 89a – in addition to these functionalities – enables one to define a transparent color in the palette and to specify the delay for the animations.

As the LZW compression algorithm was patented by Unisys, all software publishers manipulating GIF images would have had to pay royalties to Unisys. This is one of the reasons why the Portable Network Graphics (PNG) format is increasingly used, to the disadvantage of the GIF format. The PNG format is similar to the GIF format, but it uses the LZ77 compression algorithm.
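Both formats are easy to experiment with, for example through the Pillow library (assumed to be installed; the file names are placeholders): the palette restriction is the quantize step, and the entropic compression is chosen by the output format.

```python
from PIL import Image   # Pillow, assumed installed

img = Image.open("photo.png").convert("RGB")   # 24-bit RGB source (hypothetical file)
paletted = img.quantize(colors=256)            # step 1: round every pixel to a 256-color palette
paletted.save("photo.gif")                     # step 2: GIF compresses the palette indices with LZW
paletted.save("photo_indexed.png")             # PNG uses an LZ77-based codec (deflate) instead
```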

2.5 LOSSY COMPRESSION

2.5.1 Deterioration of Information

We have seen that lossless compression does not enable one to compress below the threshold of entropy and that heuristics of entropy reduction enable one to lower this threshold by changing the model of the source. But none of these methods take into account the type of the file to compress and its purpose. Let us take the example of a digital picture. Lossless compressors will transmit the file without using the fact that it is actually an image, and the file will still be fully recoverable. But there might exist far more efficient formats for which, even if part of the information is lost, the difference with the original picture will never be visible. In that case, one will not be able to recover the complete initial file, but it will still be possible to visualize images with almost the same quality. This means that part of the quality encoded in the initial file is superfluous, because it does not induce any visible difference.

Thus, lossy compressors are specific to the type of data one wishes to transmit or store (mostly pictures, sounds, and videos), and they use this information in order to encode the visual or audible result of the file rather than the file itself. This type of compression is necessary for video formats, for example, in which the sizes of the data and of the related computations are very large. Lossy compressors make it possible to overcome the threshold of entropy and to obtain very high compression rates. Not only do they take the pure information into account, but they also consider the way it is perceived by human beings, by modeling retinal or audio persistence, for example. One is able to measure the efficiency of these compressors with respect to the compression rate, but the quality of the resulting pictures can only be evaluated by the experience of the users. Therefore, we present here the standard formats that have been well tried. Besides, for audiovisual information, another important criterion is coupled with the compression rate: the decompression speed. In particular, for sounds or videos, listening and visualization require very fast computations to read the compressed files. Thus, the complexity of the decompression algorithms is a constraint as important as the size of the compressed messages when dealing with audiovisual information.

2.5.2 Transformation of Audiovisual Information

Compact formats for storing pictures, sounds, or videos always use extended sources. They do not encode the file pixel by pixel; instead, they take each pixel and its neighborhood into account in order to capture global parameters such as contours, regularities, or intensity variations. These data are treated in the form of pseudo-periodic signals. For instance, if one part of an image is a soft gradation, the frequency of the signal is low, whereas the frequency of another part of the picture, in which colors and textures often vary, is high. This enables one to perform a very efficient encoding on low frequency zones. Besides, brutally deleting very high frequency zones is often unnoticeable and, hence, acceptable.

2.5.3 JPEG Format

The Joint Photographic Experts Group (JPEG) encoding format compresses static images with a loss of information. The encoding algorithm is complex and is performed in several steps. The main idea is that the colors of contiguous pixels in an image usually differ very little. Moreover, an image is a signal: instead of computing the values of the pixels, one computes the frequencies (with a Discrete Fourier Transform, or a Discrete Cosine Transform in the case of JPEG).

2.5.3.1 DCT-JPEG Indeed, one can consider an image as a matrix containing the color of each pixel; it appears that digital pictures are mostly composed of zones having low DCT frequencies when one considers the color luma (or luminance, i.e., the brightness of the image, in a Luma/Chroma encoding) rather than the Red/Green/Blue (RGB) mode. There exist several formulas to switch from the RGB mode to the Luminance/Chrominance (YUV) color space (Y for the luma, U and V for the color information), depending on the physical model of light. A classic formula finds the luma as a weighted sum of R, G, and B: $Y = w_r R + w_g G + w_b B$, where $w_r = 0.299$, $w_b = 0.114$, and $w_g = 1 - w_r - w_b$. Then U and V are computed as scaled differences between Y and the blue and red values, using the scales $u_m = 0.436$ and $v_m = 0.615$, following the linear transformation:

$$
\begin{pmatrix} Y \\ U \\ V \end{pmatrix}
=
\begin{pmatrix}
w_r & 1 - w_r - w_b & w_b \\
-\frac{u_m}{1-w_b}\, w_r & -\frac{u_m}{1-w_b}\,(1 - w_r - w_b) & u_m \\
v_m & -\frac{v_m}{1-w_r}\,(1 - w_r - w_b) & -\frac{v_m}{1-w_r}\, w_b
\end{pmatrix}
\cdot
\begin{pmatrix} R \\ G \\ B \end{pmatrix}.
$$

Therefore, the idea is to apply the DCT (see Section 1.4.2.4) to the luma matrix. However, the DCT will transform this matrix into another matrix containing the high frequencies in the bottom right corner and the low frequencies in the top left corner. Hence, the luma of Figure 2.7 is transformed by the DCT such that L → D:

$$
L = \begin{pmatrix}
76 & 76 & 76 & 255 & 255 & 255 & 255 & 255 \\
76 & 255 & 255 & 255 & 255 & 255 & 255 & 255 \\
76 & 76 & 149 & 149 & 149 & 255 & 255 & 255 \\
76 & 255 & 149 & 255 & 149 & 29 & 29 & 29 \\
76 & 255 & 149 & 255 & 149 & 29 & 255 & 255 \\
255 & 255 & 149 & 149 & 149 & 29 & 255 & 255 \\
255 & 255 & 255 & 255 & 255 & 29 & 255 & 255 \\
255 & 255 & 255 & 255 & 255 & 29 & 29 & 29
\end{pmatrix}
$$

Figure 2.7 The 8 × 8 RGB image with white background (here in gray-levels)

$$
D = \begin{pmatrix}
1474 & -3 & -38 & -128 & 49 & -8 & -141 & 11 \\
1 & -171 & -51 & 35 & -45 & -4 & 40 & -58 \\
93 & -2 & -43 & 45 & 26 & 0 & 38 & 24 \\
61 & -66 & 70 & -11 & 23 & 15 & -48 & 6 \\
-113 & 62 & -71 & -3 & 27 & -10 & -25 & -33 \\
-47 & 1 & -5 & 18 & 28 & 14 & 4 & 2 \\
-72 & -37 & 37 & 51 & 7 & 18 & 33 & -1 \\
41 & -100 & 21 & -9 & 2 & 27 & 7 & 7
\end{pmatrix}.
$$

Moreover, the human eye gives more importance to brightness than to colors. When keeping only the first terms of the luma DCT decomposition (the most important ones), one loses some information, but the picture remains recognizable. This operation is called quantization, and it is the only step with loss of information in the whole JPEG encoding scheme. The operation consists of rounding each coefficient to the nearest integer, $C_{i,j} = \left[\frac{DCT_{i,j}}{Q_{i,j}}\right]$, where Q is the following quantization matrix:

$$
Q = \begin{pmatrix}
16 & 11 & 10 & 16 & 24 & 40 & 51 & 61 \\
12 & 12 & 14 & 19 & 26 & 58 & 60 & 55 \\
14 & 13 & 16 & 24 & 40 & 57 & 69 & 56 \\
14 & 17 & 22 & 29 & 51 & 87 & 80 & 62 \\
18 & 22 & 37 & 56 & 68 & 109 & 103 & 77 \\
24 & 35 & 55 & 64 & 81 & 104 & 113 & 92 \\
49 & 64 & 78 & 87 & 103 & 121 & 120 & 101 \\
72 & 92 & 95 & 98 & 112 & 100 & 103 & 99
\end{pmatrix}
\qquad
C = \begin{pmatrix}
92 & -1 & -4 & -8 & 2 & -1 & -3 & 0 \\
0 & -15 & -4 & 1 & -2 & -1 & 0 & -2 \\
6 & -1 & -3 & 1 & 0 & 0 & 0 & 0 \\
4 & -4 & 3 & -1 & 0 & 0 & -1 & 0 \\
-7 & 2 & -2 & -1 & 0 & -1 & -1 & -1 \\
-2 & 0 & -1 & 0 & 0 & 0 & 0 & 0 \\
-2 & -1 & 0 & 0 & 0 & 0 & 0 & -1 \\
0 & -2 & 0 & -1 & 0 & 0 & 0 & 0
\end{pmatrix}.
$$

We then obtain the above matrix C. In this example, one notices that, after quantization, many values are identical and close to zero. Choosing a quantization matrix with values close to 1 (respectively, far from 1) increases (respectively, decreases) the level of detail that is preserved. Eventually, the picture is encoded as a sequence of numbers (a quantized DCT value followed by the number of consecutive pixels having the same value, according to the zigzag scan presented in Figure 2.8). Thanks to the many small values already noticed, one can hope that the entropy after this RLE has been considerably reduced. The JPEG algorithm then ends with a statistical encoding, such as Huffman or arithmetic coding (Figure 2.9).
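The lossy core of the scheme can be reproduced numerically, for instance with the sketch below (our illustration, using SciPy's orthonormal 2D DCT-II; applied to the matrix L of Figure 2.7 it should give back matrices very close to D and C above, small differences coming only from the rounding of the printed values):

```python
import numpy as np
from scipy.fft import dctn, idctn   # separable 2D DCT-II and its inverse

def rgb_to_yuv(r, g, b, wr=0.299, wb=0.114, um=0.436, vm=0.615):
    """Luma/chroma conversion of one pixel, with the constants of Section 2.5.3.1."""
    y = wr * r + (1 - wr - wb) * g + wb * b
    return y, um * (b - y) / (1 - wb), vm * (r - y) / (1 - wr)

# Standard JPEG luminance quantization matrix (the matrix Q above).
Q = np.array([[16, 11, 10, 16, 24, 40, 51, 61],
              [12, 12, 14, 19, 26, 58, 60, 55],
              [14, 13, 16, 24, 40, 57, 69, 56],
              [14, 17, 22, 29, 51, 87, 80, 62],
              [18, 22, 37, 56, 68, 109, 103, 77],
              [24, 35, 55, 64, 81, 104, 113, 92],
              [49, 64, 78, 87, 103, 121, 120, 101],
              [72, 92, 95, 98, 112, 100, 103, 99]])

def jpeg_quantize(luma_block):
    """DCT of an 8x8 luma block, then quantization: the only lossy step."""
    D = dctn(np.asarray(luma_block, dtype=float), norm="ortho")  # low frequencies top left
    C = np.rint(D / Q).astype(int)                               # round coefficient by coefficient
    return D, C

def jpeg_dequantize(C):
    """Approximate luma block recovered by a decoder from the quantized block C."""
    return np.rint(idctn(C * Q, norm="ortho")).astype(int)
```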

Exercise 2.16 (JPEG 3 × 3) Suppose that we apply the JPEG algorithm on smaller 3 × 3 blocks and that the luma value
$$
\begin{pmatrix} 27 & 45 & 18 \\ 36 & 27 & 36 \\ 36 & 54 & 27 \end{pmatrix}
$$
is computed on a given block.

1. Compute the DCT matrix of this block.
2. Apply the quantization $Q_{i,j} = 4^{-\max\{i,j\}}$, for $i = 0, \ldots, 2$ and $j = 0, \ldots, 2$.

Solution on page 301.

Figure 2.8 JPEG zigzag scan

[Figure 2.9: Image → 8 × 8 blocks → DCT → Quantization → RLE zigzag → Huffman or arithmetic coding]
Figure 2.9 JPEG compression

This is for the luma. For the chrominance, one can also use subsampling, as the human eye is less sensitive to the position and motion of color than to its luminance. The idea is to use the same U and V values for several consecutive pixels. A (J : a : b) subsampling represents a block of 2 × J pixels where the 2J luma values are all stored, but only a chroma values for the first row of J pixels and b chroma values for the second row of J pixels. For instance, (4 : 4 : 4) represents the full resolution, without any subsampling.
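Counting one U and one V value per chroma sample (an assumption of ours, not stated explicitly above) and 1 byte per value, the gain of a (J : a : b) scheme over the full resolution reduces to a one-line computation; the helper below is only a quick order-of-magnitude check under that convention:

```python
def subsampling_gain(J, a, b):
    """Size ratio between a full-resolution 2xJ block (Y, U, V for every pixel)
    and its (J:a:b) subsampled version (one U and one V per chroma sample)."""
    full = 3 * (2 * J)            # 2J pixels, 3 values each
    kept = 2 * J + 2 * (a + b)    # 2J luma values, plus (a + b) chroma samples
    return full / kept

# subsampling_gain(4, 4, 4) -> 1.0: (4:4:4) is indeed the full resolution
```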

Exercise 2.17 (Subsampling Formats) For the following subsampling formats, give the compression gain with respect to the full resolution and their horizontal and vertical resolutions: (4 : 2 : 2), (4 : 2 : 0), (4 : 1 : 1). Solution on page 301.

Notice that in the JPEG-2000 standard¹, which replaced the DCT-JPEG standard, the cosine transform and quantization are replaced by a wavelet transform where only some levels are preserved. The number of levels or bands one chooses to keep directly affects the compression rate one wishes to reach.

2.5.3.2 Watermarking JPEG Images A classical way of embedding some information into JPEG images is to use the lower bits of the quantized coefficients. Indeed, by construction, JPEG quantization modifies the luma in a way that induces little visual distortion. Therefore, a modification of the lower bits of the large coefficients in the C matrices should remain invisible. For instance, suppose that r is an acceptable distortion rate for the luma coefficients (i.e., coefficients modified by at most that rate will remain visually close to the original). Then any coefficient $C_{i,j}$ larger than 1 can embed up to $\lfloor \log_2(r \cdot |C_{i,j}|) \rfloor$ bits of additional information. The watermarking algorithm can be introduced within the JPEG compression in three steps (sketched in code after the list below):

1. Transform the original image into its quantized 8 × 8 matrices. Use the rate r to identify the coefficients capable of storing some additional information.
2. Set to zero the lower bits of the large enough coefficients of every C matrix. Then $\sum \lfloor \log_2(r \cdot |C_{i,j}|) \rfloor$ bits can be added to the image in the positions identified in the previous step (where i and j range over all the 8 × 8 C matrices comprising the image). These bits can be the result of a compression, potentially followed by an encryption and an error-correction coding, depending on the required property.
3. Terminate the JPEG compression with the lossless steps (the RLE zigzag and the statistical encoding).
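Steps 1 and 2 boil down to a capacity computation and a masking of lower bits; here is a sketch on a single quantized block (the helper names are ours, not part of any standard):

```python
from math import floor, log2

def embedding_capacity(C, r):
    """Number of hidden bits that the quantized block C can carry at distortion rate r."""
    bits = 0
    for row in C:
        for c in row:
            if abs(c) > 1 and r * abs(c) >= 2:   # large enough for at least one bit
                bits += floor(log2(r * abs(c)))
    return bits

def embed_in_coefficient(c, payload_bits):
    """Overwrite the len(payload_bits) lower bits of the coefficient c, keeping its sign."""
    k = len(payload_bits)
    if k == 0:
        return c
    sign = -1 if c < 0 else 1
    cleared = (abs(c) >> k) << k                 # step 2: set the k lower bits to zero
    return sign * (cleared | int("".join(map(str, payload_bits)), 2))
```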

The extraction of the watermark is then also performed during JPEG decompression: after the lossless decoding, the watermark is recovered by identifying the large enough coefficients with respect to r and extracting their lower bits. A classical steganalysis consists of comparing the entropy of the suspected image to the average entropy of similar images. Another approach uses the statistical tests of Section 1.3.7.4.
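Such an entropy comparison only requires the empirical Shannon entropy of the coefficients (or of their lower bits); a minimal helper:

```python
from collections import Counter
from math import log2

def empirical_entropy(symbols):
    """Shannon entropy, in bits per symbol, of an observed sequence."""
    counts = Counter(symbols)
    total = len(symbols)
    return -sum(n / total * log2(n / total) for n in counts.values())

# Lower bits that look too uniform (entropy close to 1 bit per bit), compared with
# similar innocent images, may betray an embedded compressed or encrypted payload.
```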

¹http://www.jpeg.org/jpeg2000

Exercise 2.18 (Steganalysis of Watermarked JPEG Images)

1. An image without any hidden data should have similar luma values for adjacent DCT coefficients. Design a way to build an expected distribution of the luma using this fact.
2. Which statistical test would you then use to discriminate an image with an embedded content?

Solution on page 302.

2.5.3.3 Secure JPEG-2000 (JPSEC) Part 8 of the JPEG-2000 standard provides tools to generate, consume, and exchange secure JPEG-2000 codestreams. This part is referred to as JPSEC and defines a framework that includes confidentiality, integrity, authentication, access control, and Digital Right Management (DRM). This is achieved using the various techniques presented in this book, such as encryption, electronic signatures, or watermarking. A first approach could be to treat the JPEG-2000 bitstream as an input for any of the security algorithms. The idea of JPSEC is instead that these security techniques can be performed during the JPEG compression, as was shown, for example, for the watermarking of Section 2.5.3.2. For instance, the JPSEC bitstream still retains the rich structures of the JPEG-2000 standard, such as tiles, layers, and resolutions.

The JPSEC syntax uses three types of security tools: template tools, registration authority tools, and user-defined tools, which define which protection methods are used. It also defines a Zone of Influence (ZOI) to describe on which part of the image a security tool is applied. The standard already proposes the following tools:

• A flexible access control scheme;
• A unified authentication framework;
• A simple packet-based encryption method;
• Wavelet and bitstream domain scrambling for conditional access control;
• Progressive access;
• Scalable authenticity;
• Secure scalable streaming.

For instance, multilevel access control could use several keys to decipher selected levels of detail in the JPEG image using the JPEG layers; partial encryption could reveal only parts of the image; and so on.

2.5.4 Motion Picture Experts Group (MPEG) Format

The MPEG format defines the compression of animated images. The encoding algorithm uses JPEG in order to encode each image, and it also takes into account the fact that two consecutive images in a video stream are very close. One of the singularities of the MPEG norms is motion compensation (the movement can be a zoom, for instance) from one image to the next. An MPEG-1 output contains four types of images: images in format I (JPEG); images P, encoded using differential encoding (one only encodes the differences between the image and the previous one; motion compensation enables one to have very close consecutive images); bidirectional images B, encoded using differential encoding with respect to the previous image and the next one (in case the next image is closer than the previous one); and, eventually, low-resolution images used for fast forward in videotape recorders. In practice, with P and B images, a new full I image every 10–15 images is enough.

As video streams consist of images and sounds, one must also be able to compress sounds. The MPEG group defined three formats: MPEG-1 Audio Layer I, II, and III, the latter being the well-known MP3 format. MPEG-1 is the combination of image and sound compression, coupled with a time-stamp synchronization and a system reference clock, as presented in Figure 2.10.

Since 1992, the MPEG format has evolved from MPEG-1 (320 × 240 pixels and a throughput of 1.5 Mbits/s) to MPEG-2 in 1996 (with four different resolutions, from 352 × 288 up to 1920 × 1152 pixels, plus an additional multiplexed stream for subtitles and languages: this format is currently used in DVDs), and then to MPEG-4 (a format for videoconferences and DivX, in which the video becomes an object-oriented scene), started in 1999.

Exercise 2.19 (PAL Video Format) PAL, Phase Alternating Line, is an analog color television coding system using images of 720 pixels per line and 576 lines. Its frame rate (number of images per second) is usually 25 images per second.

1. If each pixel uses a color encoding of 1 byte per color with a (4 : 2 : 0) subsampling (see Exercise 2.17), what would be the size of an uncompressed, 90-min long movie?
2. Suppose the same movie is compressed in a DivX format and fits onto an 800 MB CD-ROM. What is the compression gain?

Solution on page 302.

Sound is decomposed into three layers, MP1, MP2, and MP3, depending on the desired compression rate (i.e., the loss of information). In practice, compression is performed with a factor of 4 for MP1, 6–8 for MP2 (this format is used in Video CDs, for instance), and 10–12 for MP3, with respect to the analog signal.

[Figure: the audio signal goes through an audio encoder and the video signal through a video encoder; both streams are multiplexed with a clock into the MPEG-1 stream]
Figure 2.10 MPEG-1 compression

[Figure: sound → non-audible filtering → DCT, together with an FFT feeding a psychoacoustic model → non-uniform quantization with distortion control and rate control → Huffman coding → formatting with additional information]
Figure 2.11 MP3 sound compression

In Figure 2.11, we give an idea of the MP3 transform: first of all, the analog sound is filtered and modified by a Fourier transform (with loss) and, simultaneously, by a Discrete Cosine Transform. Then, the combination of these two transforms enables one to perform a quantization and to control the result in order to preserve only the main audible part. At the end of the algorithm, a statistical compression is obviously performed in order to reduce the file size and reach its final entropy.