Compressing Information, Kolmogorov Complexity, and Optimal Decompression
Compressing information

Almost everybody is now familiar with compressing/decompressing programs such as zip, gzip, etc. A compressing program can be applied to any file and produces the "compressed version" of that file. If we are lucky, the compressed version is much shorter than the original one. However, no information is lost: the decompression program can be applied to the compressed version to get the original file.

[Question: A software company advertises a compressing program and claims that this program can compress any sufficiently long file to at most 90% of its original size. Would you buy this program?]

How does compression work? A compression program tries to find some regularities in a file which allow it to give a description of the file that is shorter than the file itself; the decompression program reconstructs the file using this description.

Kolmogorov complexity

The Kolmogorov complexity may be roughly described as "the compressed size". However, there are some differences. The technical difference is that instead of files (which are usually byte sequences) we consider bit strings (sequences of zeros and ones). The principal difference is that in the framework of Kolmogorov complexity we have no compression algorithm and deal only with the decompression algorithm.

Here is the definition. Let U be any algorithm whose inputs and outputs are binary strings. Using U as a decompression algorithm, we define the complexity K_U(x) of a binary string x with respect to U as follows:

    K_U(x) = min { |y| : U(y) = x }

(here |y| denotes the length of a binary string y). In other words, the complexity of x is defined as the length of the shortest description of x if each binary string y is considered as a description of U(y).

Let us stress that U(y) may be defined not for all y's and that there are no restrictions on the time necessary to compute U(y). Let us mention also that for some U and x the set in the definition of K_U may be empty; we assume that min(∅) = +∞.
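The definition can be tried out on a small example. Below is a Python sketch; the toy decompressor, the function names, and the search bound are invented for illustration. It computes K_U(x) by brute force for a total toy decompressor. For an algorithm U that is not total (in particular, for the universal U constructed below) such a search need not terminate, since U(y) may run forever.

```python
from itertools import product

def toy_decompressor(y: str) -> str:
    """A toy *total* decompression algorithm: '' decodes to the empty string,
    and 'b' followed by the binary digits of k decodes to the bit b repeated
    k times (k = 1 if no digits follow)."""
    if y == "":
        return ""
    b, k_bits = y[0], y[1:]
    k = int(k_bits, 2) if k_bits else 1
    return b * k

def complexity_wrt(U, x: str, max_len: int = 12):
    """Brute-force K_U(x) = min{|y| : U(y) = x}: try all descriptions y in
    order of increasing length. Returns infinity if nothing up to max_len
    works (the set in the definition may be empty)."""
    for n in range(max_len + 1):
        for bits in product("01", repeat=n):
            if U("".join(bits)) == x:
                return n
    return float("inf")

print(complexity_wrt(toy_decompressor, "1" * 37))  # 7: description '1' + '100101'
print(complexity_wrt(toy_decompressor, "0101"))    # inf: this U only outputs runs
```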
Optimal decompression algorithm

The definition of K_U depends on U. For the trivial decompression algorithm U(y) = y we have K_U(x) = |x|. One can try to find better decompression algorithms, where "better" means "giving smaller complexities". However, the number of short descriptions is limited: there are fewer than 2^n strings of length less than n. Therefore, for any fixed decompression algorithm the number of words whose complexity is less than n does not exceed 2^n − 1. One may conclude that there is no "optimal" decompression algorithm, because we can assign short descriptions to some strings only by taking them away from other strings. However, Kolmogorov made a simple but crucial observation: there is an asymptotically optimal decompression algorithm.

Definition 1. An algorithm U is asymptotically not worse than an algorithm V if K_U(x) ≤ K_V(x) + C for some constant C and for all x.

Theorem 1. There exists a decompression algorithm U which is asymptotically not worse than any other algorithm V.

Such an algorithm is called asymptotically optimal. The complexity K_U with respect to an asymptotically optimal U is called Kolmogorov complexity. The Kolmogorov complexity of a string x is denoted by K(x). (We assume that some asymptotically optimal decompression algorithm is fixed.) Of course, Kolmogorov complexity is defined only up to an O(1) additive term.

The complexity K(x) can be interpreted as the amount of information in x or the "compressed size" of x.

The construction of an optimal decompression algorithm

The idea of the construction is used in the so-called "self-extracting archives". Assume that we want to send a compressed version of some file to our friend, but we are not sure he has the decompression program. What to do? Of course, we can send the program together with the compressed file. Or we can append the compressed file to the end of the program and get an executable file which will be applied to its own contents during the execution.

The same simple trick is used to construct a universal decompression algorithm U. Having an input string x, the algorithm U starts scanning x from left to right until it finds some program p written in a fixed programming language (say, Pascal) where programs are self-delimiting (so the end of the program can be determined uniquely). Then the rest of x is used as an input for p, and U(x) is defined as the output of p.

Why is U (asymptotically) optimal? Consider any other decompression algorithm V. Let v be a Pascal program which implements V. Then

    K_U(x) ≤ K_V(x) + |v|

for any string x. Indeed, if y is a V-compressed version of x (i.e., V(y) = x), then vy is a U-compressed version of x (i.e., U(vy) = x) which is only |v| bits longer.
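The construction can be sketched in Python. This is a toy sketch only: Python plays the role of the fixed programming language instead of Pascal, a doubled-bits code with the terminator 01 is one possible self-delimiting encoding, and the names self_delimit, U, decompress and v are invented for the example.

```python
def self_delimit(bits: str) -> str:
    """One possible self-delimiting code: double every bit, then append the
    terminator '01', so the end of the encoded block can be found uniquely."""
    return "".join(b + b for b in bits) + "01"

def read_self_delimited(x: str) -> tuple:
    """Scan x from the left, undoing self_delimit; return (decoded prefix, rest of x)."""
    decoded, i = [], 0
    while i + 1 < len(x):
        pair = x[i:i + 2]
        if pair == "01":                      # terminator reached
            return "".join(decoded), x[i + 2:]
        decoded.append(pair[0])               # '00' -> '0', '11' -> '1'
        i += 2
    raise ValueError("input contains no self-delimited program")

def to_bits(text: str) -> str:
    return "".join(f"{byte:08b}" for byte in text.encode())

def from_bits(bits: str) -> str:
    return bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8)).decode()

def U(x: str) -> str:
    """Universal decompressor (sketch): read a self-delimited program p (here,
    Python source defining decompress(y)) and run it on the rest of x."""
    program_bits, rest = read_self_delimited(x)
    namespace = {}
    exec(from_bits(program_bits), namespace)
    return namespace["decompress"](rest)

# A particular decompression algorithm V, given as source code (the "program v"):
v = "def decompress(y): return '1' * int(y, 2)"     # V(y) = the bit 1, repeated int(y) times

# If y is a V-description of x, then self_delimit(to_bits(v)) + y is a U-description of x.
y = "100101"                                        # V(y) = '1' * 37
print(U(self_delimit(to_bits(v)) + y) == "1" * 37)  # True
```

The U-description is longer than the V-description y only by the length of the encoded program, a constant that depends on V but not on x, which is exactly the overhead allowed in Definition 1.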
Basic properties of Kolmogorov complexity

(a) K(x) ≤ |x| + O(1).

(b) The number of x's such that K(x) ≤ n is equal to 2^n up to a bounded factor separated from zero.

(c) For any computable function f there exists a constant c such that

    K(f(x)) ≤ K(x) + c

(for any x such that f(x) is defined).

(d) Assume that for any natural n a finite set V_n containing not more than 2^n elements is given. Assume that the relation x ∈ V_n is enumerable, i.e., there is an algorithm which produces the (possibly infinite) list of all pairs (x, n) such that x ∈ V_n. Then there is a constant c such that all elements of V_n have complexity at most n + c (for any n).

(e) The "typical" binary string of length n has complexity close to n: there exists a constant c such that for any n more than 99% of all strings of length n have complexity in between n − c and n + c.

Proof. (a) The asymptotically optimal decompression algorithm U is not worse than the trivial decompression algorithm V(y) = y.

(b) The number of such x's does not exceed the number of their compressed versions, which is limited by the number of all binary strings of length not exceeding n, which is bounded by 2^(n+1). On the other hand, the number of x's such that K(x) ≤ n is not less than 2^(n−c) (here c is the constant from (a)), because all the words of length n − c have complexity not exceeding n.

(c) Let U be the optimal decompression algorithm used in the definition of K. Compare U with the decompression algorithm V : y ↦ f(U(y)):

    K_U(f(x)) ≤ K_V(f(x)) + O(1) ≤ K_U(x) + O(1)

(any U-compressed version of x is a V-compressed version of f(x)).

(d) We allocate strings of length n to be compressed versions of strings in V_n (when a new element of V_n appears during the enumeration, the first unused string of length n is allocated). This procedure provides a decompression algorithm W such that K_W(x) ≤ n for any x ∈ V_n.

(e) According to (a), all strings of length n (that is, 100% of them) have complexity not exceeding n + c for some c. It remains to mention that the number of strings whose complexity is less than n − c does not exceed the number of all strings of length n − c. Therefore, for c = 7 the fraction of strings having complexity less than n − c among all the strings of length n does not exceed 1%.

Problems

1. A decompression algorithm D is chosen in such a way that K_D(x) is even for any string x. Could D be optimal?

2. The same question if K_D(x) is a power of 2 for any x.

3. Let D be the optimal decompression algorithm. Does it guarantee that D(D(x)) is also an optimal decompression algorithm?

4. Let D_1, D_2, ... be a computable sequence of decompression algorithms. Prove that K(x) ≤ K_{D_i}(x) + 2 log i + O(1) for all i and x (the constant in O(1) does not depend on x and i).

5. Is it true that K(xy) ≤ K(x) + K(y) + O(1) for all x and y?

Algorithmic properties of K

Theorem 2. The complexity function K is not computable; moreover, any computable lower bound for K is bounded from above.

Proof. Assume that k is a computable lower bound for K which is not bounded from above. Then for any m we can find a string x such that k(x) > m, and hence K(x) > m.

Theorem 3. There exists a constant c such that all the theorems of type "K(x) > n" have n < c.

Indeed, assume that it is not true. Consider the following algorithm α: for a given integer k, generate all the theorems and look for a theorem of type K(x) > s for some x and some s greater than k. When such a theorem is found, x becomes the output α(k) of the algorithm. By our assumption, α(k) is defined for all k.
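The algorithm α can be sketched in Python. Everything below is hypothetical scaffolding, since no particular formal theory is fixed here: enumerate_theorems is a stand-in for an enumeration of all theorems, and the textual format of statements "K(x) > s" is invented for the example.

```python
import re
from typing import Iterator

def enumerate_theorems() -> Iterator[str]:
    """Stand-in for an enumeration of all theorems of a fixed formal theory,
    produced as plain strings. A real enumerator would run a proof search;
    here we yield two dummy statements so that the sketch can be executed."""
    yield 'K("0") <= 10'
    yield 'K("0110100110010110") > 3'

# Statements about complexity lower bounds are assumed to look like: K("bits") > s
LOWER_BOUND = re.compile(r'K\("([01]*)"\)\s*>\s*(\d+)')

def alpha(k: int) -> str:
    """The algorithm alpha from the proof of Theorem 3 (sketch): scan the
    enumerated theorems until one asserts K(x) > s for some s greater than k,
    and output that x."""
    for theorem in enumerate_theorems():
        m = LOWER_BOUND.fullmatch(theorem.strip())
        if m and int(m.group(2)) > k:
            return m.group(1)
    raise RuntimeError("no theorem of the form K(x) > s with s > k was found")

print(alpha(2))  # with the dummy theorems above, prints 0110100110010110
```

If α(k) were defined for all k, then (assuming the theory proves only true statements) α(k) would be a string of complexity greater than k that is described by the integer k together with a fixed program, i.e., by roughly log k + O(1) bits; for large k this is the contradiction the proof is aiming for.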