Application of Information Theory, Lecture 8: Kolmogorov Complexity and Other Entropy Measures (Handout Mode)

Iftach Haitner

Tel Aviv University.

December 16, 2014

Part I

Kolmogorov Complexity

Description length

- What is the description length of the following strings?
  1. 010101010101010101010101010101010101
  2. 011010100000100111100110011001111110
  3. 111010100110001100111100010101011111

- 1. Eighteen copies of 01
  2. The first 36 bits of the binary expansion of √2 − 1
  3. Looks random, but has 22 ones out of 36

- Berry's paradox: let s be "the smallest positive integer that cannot be described in twelve English words"

- The above is a definition of s using fewer than twelve English words...

- Solution: the word "described" in the definition of s above is not well defined

Kolmogorov complexity

- For a string x ∈ {0, 1}^∗, let K(x) be the length of the shortest C++ program (written in binary) that outputs x (on empty input)

- Now the term "described" is well defined.
- Why C++?

- All (complete) programming languages/computational models are essentially equivalent.
- Let K′(x) be the description length of x in another complete language; then |K(x) − K′(x)| ≤ const.

- What is K(x) for x = 0101...01 (n pairs of 01)?
- The program "for i = 1 to n: print 01" outputs x, as sketched below
- Hence K(x) ≤ log n + const

- This is considered to be small complexity. We typically ignore log n factors.
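As an illustration, a minimal C++ sketch of such a program: the loop bound n is hard-coded, so writing the program down costs roughly log n bits for n plus a constant-size skeleton.

    #include <iostream>

    int main() {
        const long long n = 18;            // hard-coded; writing n down costs about log n bits
        for (long long i = 0; i < n; ++i)  // the loop itself is a constant-size skeleton
            std::cout << "01";             // n = 18 reproduces string 1 from the earlier slide
        std::cout << '\n';
        return 0;
    }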

- What is K(x) for x being the first n digits of π?
- K(x) ≤ log n + const

More examples

- What is K(x) for x ∈ {0, 1}^n with k ones?
- Recall that (n choose k) ≤ 2^(n·h(k/n))
- Hence K(x) ≤ log n + n·h(k/n)
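- A quick numerical illustration of this bound (base-2 logarithms): for n = 1000 and k = 250, h(1/4) ≈ 0.811, so K(x) ≤ log 1000 + 1000 · 0.811 ≈ 821, well below the trivial bound of about 1000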

Bounds

- K(x) ≤ |x| + const

- Proof: the program "output x"

- Most sequences have high Kolmogorov complexity:
- At most 2^(n−1) (C++) programs of length ≤ n − 2
- 2^n strings of length n
- Hence, at least 1/2 of the n-bit strings have Kolmogorov complexity at least n − 1
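- In more detail: the number of binary programs of length at most n − 2 is 2^0 + 2^1 + ... + 2^(n−2) = 2^(n−1) − 1 < 2^(n−1), and each program outputs at most one string, so at most half of the 2^n strings of length n can have Kolmogorov complexity ≤ n − 2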

- In particular, a random n-bit string has Kolmogorov complexity ≈ n

Conditional Kolmogorov complexity

- K(x|y): the Kolmogorov complexity of x given y, i.e., the length of the shortest program that outputs x on input y

- Chain rule: K(x, y) ≈ K(y) + K(x|y)

H vs. K

H(X) speaks about a random variable X and K(x) about a string x, but:

- Both quantify the amount of uncertainty or randomness in an object

- Both measure the number of bits it takes to describe an object

- Another property: let X_1, ..., X_n be iid; then whp

K(X_1, ..., X_n) ≈ H(X_1, ..., X_n) = n · H(X_1)

- Proof: AEP

- Example: for n flips of a (0.7, 0.3) coin, whp we get a string x with K(x) ≈ n · h(0.3)
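- Numerically, h(0.3) = −0.3 · log 0.3 − 0.7 · log 0.7 ≈ 0.881 (base-2 logarithms), so for n = 1000 flips a typical outcome has K(x) ≈ 881, noticeably below the trivial bound of about 1000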

Universal compression

- A program of length K(x) that outputs x compresses x into K(x) bits of information.

- Example: the length of the human genome is 6 · 10^9 bits

- But the code is redundant

- The relevant quantity for measuring its information content is the Kolmogorov complexity of the code.

- No one knows its value...

Universal probability

K(x) = min_{p : p() = x} |p|, where p() is the output of the C++ program defined by p.

Definition 1
The universal probability of a string x is
P_U(x) = Σ_{p : p() = x} 2^(−|p|) = Pr_{p←{0,1}^∞}[p() = x]

- Namely, the probability that, if one picks a program at random, it prints x.

- Insensitive (up to a constant factor) to the computation model.

- Interpretation: P_U(x) is the probability that you observe x in nature.

- Computer as an intelligent amplifier

Theorem 2
∃ c > 0 such that 2^(−K(x)) ≤ P_U(x) ≤ c · 2^(−K(x)) for every x ∈ {0, 1}^∗.

- The interesting part is P_U(x) ≤ c · 2^(−K(x))

- Hence, for X ∼ P_U, it holds that |E[K(X)] − H(X)| ≤ c
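- One way to see this (base-2 logarithms): taking logs in Theorem 2 gives K(x) − log c ≤ − log P_U(x) ≤ K(x) for every x; taking expectations over X ∼ P_U and using H(X) = E[− log P_U(X)] yields 0 ≤ E[K(X)] − H(X) ≤ log c ≤ c (note c ≥ 1)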

Proving Theorem 2

- We need to find c > 0 such that K(x) ≤ log(1/P_U(x)) + c for every x ∈ {0, 1}^∗
- In other words, find a program that outputs x whose length is at most log(1/P_U(x)) + c

- Idea: the program chooses x's leaf in the Shannon code for P_U (in which x is at depth ⌈log(1/P_U(x))⌉)

- Problem: P_U is not computable

- Solution: compute better and better estimates of the tree for P_U, along with the "mapping" from tree nodes back to codewords.

Proving Theorem 2

- Initialize T to be the infinite binary tree.

Program 3 (M)
Enumerate over all programs in {0, 1}^∗: at round i, emulate the first i programs (one after the other) for i steps, and do:
If a program p outputs a string x and (∗, x, n(x)) ∉ T, place (p, x, n(x)) at an unused n(x)-depth node of T, for n(x) = ⌈log(1/P̂_U(x))⌉ + 1 and P̂_U(x) = Σ_{p′ : emulated p′ has output x} 2^(−|p′|).

- The program never gets stuck (it can always add the node).
  Proof: Let x ∈ {0, 1}^∗. At each point during the execution of M, Σ_{(p,x,·)∈T} 2^(−|p|) ≤ 2^(−K(x)). Since Σ_x 2^(−K(x)) ≤ 1, the claim follows from the Kraft inequality.
- ∀ x ∈ {0, 1}^∗: M adds a node (·, x, ·) to T at depth 2 + ⌈log(1/P_U(x))⌉.
  Proof: P̂_U(x) converges to P_U(x).
- For x ∈ {0, 1}^∗, let ℓ(x) be the location of its (2 + ⌈log(1/P_U(x))⌉)-depth node.

- Program for printing x: run M until it assigns the node at location ℓ(x), then output the string x stored at that node.

Applications

- (Another) proof that there are infinitely many primes.

- Assume there are finitely many primes p_1, ..., p_m

- Any length-n integer x can be written as x = Π_{i=1}^m p_i^(d_i)

- d_i ≤ n, hence the binary length of each d_i is ≤ log n

- Hence, K(x) ≤ m · log n + const

- But for most n-bit numbers K(x) ≥ n − 1
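- For large enough n, however, m · log n + const < n − 1, so the two bounds contradict each other; hence there must be infinitely many primes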

Computability of K

- Can we compute K(x)?

- Answer: No.

- Proof: assume K is computable by a program of length C

- Let s be the smallest positive integer s.t. K(s) > 2C + 10,000

- s can be computed by the following program (a C++ sketch appears below):
  1. x = 0
  2. While (K(x) ≤ 2C + 10,000): x++
  3. Output x

- Thus K(s) ≤ C + log C + log 10,000 + const < 2C + 10,000, contradicting K(s) > 2C + 10,000
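For concreteness, a minimal C++ sketch of the program above; it assumes a hypothetical routine K_of() that computes K, and the stub below is only a placeholder so the sketch compiles, since (as the proof shows) no such computable routine can exist.

    #include <cstdint>
    #include <iostream>

    // Hypothetical: assumed to compute K(x). The proof shows no such computable
    // function exists; this stub is only a placeholder so the sketch compiles.
    uint64_t K_of(uint64_t /*x*/) { return 0; }

    int main() {
        const uint64_t C = 1000;          // stand-in for the length of the (hypothetical) program computing K
        uint64_t x = 0;
        while (K_of(x) <= 2 * C + 10000)  // scan for the first integer of high complexity
            ++x;
        std::cout << x << '\n';           // a short program, yet it would output s with K(s) > 2C + 10000
        return 0;
    }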

- Berry's paradox, revisited:

- s: the smallest positive number with K(s) > 10,000

- This is not a paradox, since the description of s is not short.

Explicit large complexity strings

- Can we give an explicit example of a string x with large K(x)?

Theorem 4
∃ a constant C s.t. the theorem "K(x) ≥ C" cannot be proven (under any reasonable axiom system).

- For most strings K(x) > C + 1, but this cannot be proven even for a single string

- "K(x) ≥ C" is an example of a theorem that cannot be proven and, for most x's, cannot be disproved.

- Proof: for an integer C, define the program T_C:
  1. y = 0
  2. If y is a proof of the statement "K(x) > C" for some x, output x and halt
  3. y++; go to step 2

- |T_C| = log C + D, where D is a constant
- Take C such that C > log C + D

- If T_C stops and outputs x, then K(x) ≤ log C + D < C, a contradiction to the fact that ∃ a proof that K(x) > C.

Part II

Other Entropy Measures

Other entropy measures

Let X ∼ p be a random variable over a finite set 𝒳.

- Recall that the Shannon entropy of X is H(X) = Σ_{x∈𝒳} −p(x) · log p(x) = E_X[− log p(X)]

- The max entropy of X is H_0(X) = log |Supp(X)|

- The min entropy of X is H_∞(X) = min_{x∈𝒳} {− log p(x)} = − log max_{x∈𝒳} {p(x)}
- The collision probability of X is CP(X) = Σ_{x∈𝒳} p(x)^2, the probability of a collision when drawing two independent samples from X

- The collision entropy/Renyi entropy of X is H_2(X) = − log CP(X)
- For α ∈ ℕ, α ≠ 1: H_α(X) = (1/(1−α)) · log Σ_{i=1}^n p_i^α = (α/(1−α)) · log ||p||_α

- H_∞(X) ≤ H_2(X) ≤ H(X) ≤ H_0(X) (Jensen). Equality iff X is uniform over 𝒳.
- For instance, CP(X) ≤ Σ_x p(x) · max_{x′} p(x′) = max_{x′} p(x′). Hence, H_2(X) ≥ − log max_{x′} p(x′) = H_∞(X).

- H_2(X) ≤ 2 · H_∞(X)
- Proof: CP(X) ≥ (max_{x′} p(x′))^2. Hence, H_2(X) = − log CP(X) ≤ −2 · log max_{x′} p(x′) = 2 · H_∞(X)
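To make the definitions above concrete, here is a small C++ sketch that computes H, H_0, H_2, H_∞ and CP for an example probability vector (the distribution is an arbitrary choice for illustration).

    #include <algorithm>
    #include <cmath>
    #include <iostream>
    #include <vector>

    int main() {
        // Example distribution p = (1/2, 1/4, 1/8, 1/8) over a 4-element alphabet.
        std::vector<double> p = {0.5, 0.25, 0.125, 0.125};

        double H = 0, CP = 0, pmax = 0;
        int support = 0;
        for (double px : p) {
            if (px <= 0) continue;            // ignore zero-probability symbols
            H += -px * std::log2(px);         // Shannon entropy: E[-log p(X)]
            CP += px * px;                    // collision probability: sum of p(x)^2
            pmax = std::max(pmax, px);
            ++support;
        }
        double H0 = std::log2(support);       // max entropy: log |Supp(X)|
        double H2 = -std::log2(CP);           // collision (Renyi-2) entropy
        double Hinf = -std::log2(pmax);       // min entropy

        // For this p: H = 1.75, H0 = 2, H2 ~ 1.54, Hinf = 1, matching Hinf <= H2 <= H <= H0.
        std::cout << "H=" << H << " H0=" << H0 << " H2=" << H2 << " Hinf=" << Hinf << "\n";
        return 0;
    }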

Other entropy measures, cont.

- No simple chain rule.
- Let X = ⊥ w.p. 1/2 and uniform over {0, 1}^n otherwise, and let Y be the indicator for X = ⊥.

- H_∞(X|Y = 1) = 0 and H_∞(X|Y = 0) = n. But H_∞(X) = 1.
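- Indeed, the most likely value of X is ⊥, with probability 1/2 (every n-bit string has probability 2^(−(n+1))), so H_∞(X) = − log(1/2) = 1, even though conditioned on Y = 0 the min-entropy is n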

Section 1

Shannon to Min entropy

Shannon to Min entropy

Given a rv X ∼ p, let X^n denote n independent copies of X, and let p^n(x_1, ..., x_n) = Π_{i=1}^n p(x_i).

Lemma 5
Let X ∼ p and let ε > 0. Then Pr[− log p^n(X^n) ≤ n · (H(X) − ε)] < 2 · e^(−2ε²n).

Proof: (quantitative) AEP.

- A_{n,ε} := {x ∈ Supp(X^n) : 2^(−n(H(X)+ε)) ≤ p^n(x) ≤ 2^(−n(H(X)−ε))}
- − log p^n(x) ≥ n · (H(X) − ε) for any x ∈ A_{n,ε}

Proposition 6 (Hoeffding's inequality)
Let Z_1, ..., Z_n be iid over [0, 1] with expectation µ. Then Pr[|(1/n) Σ_{j=1}^n Z_j − µ| ≥ ε] ≤ 2 · e^(−2ε²n) for every ε > 0.

- Taking Z_i = log p(X_i), it follows that Pr[X^n ∉ A_{n,ε}] ≤ 2 · e^(−2ε²n)

Corollary 7
∃ a rv W that is (2 · e^(−2ε²n))-close to X^n and has H_∞(W) ≥ n(H(X) − ε).

Proof: W = X^n if X^n ∈ A_{n,ε}, and "well spread" outside Supp(X^n) otherwise.

Shannon to Min entropy, conditional version

Lemma 8
Let (X, Y) ∼ p and let ε > 0. Then Pr_{(x^n,y^n)←(X,Y)^n}[− log p_{X^n|Y^n}(x^n|y^n) ≤ n · (H(X|Y) − ε)] < 2 · e^(−2ε²n).

Proof: same proof, letting Z_i = log p_{X|Y}(X_i|Y_i)

Corollary 9

∃ a rv W over 𝒳^n × 𝒴^n that is (2 · e^(−2ε²n))-close to (X, Y)^n, such that
- SD(W_{Y^n}, Y^n) = 0, and
- H_∞(W | W_{Y^n} = y) ≥ n · (H(X|Y) − ε) for any y ∈ Supp(Y^n)

Proof:?

Section 2

Renyi-entropy to Uniform Distribution

Pairwise independent hashing

Definition 10 (pairwise independent function family)

A function family G = {g : D → R} is pairwise independent if, ∀ x ≠ x′ ∈ D and y, y′ ∈ R, it holds that Pr_{g←G}[g(x) = y ∧ g(x′) = y′] = (1/|R|)^2.

- Example: for D = {0, 1}^n and R = {0, 1}^m, let G = {(A, b) ∈ {0, 1}^(m×n) × {0, 1}^m} with (A, b)(x) = A · x + b, with arithmetic over GF(2) (a C++ sketch appears below).

- 2-universal families: Pr_{g←G}[g(x) = g(x′)] ≤ 1/|R| for every x ≠ x′.

- An example of a universal family that is not pairwise independent?

- Many-wise independent families

- We identify functions with their descriptions.

- An amazingly useful tool
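A minimal C++ sketch (C++20, for std::popcount) of the affine example from this slide, packing GF(2) vectors into 64-bit words; the helper names are mine and n, m are limited to 64 here.

    #include <bit>
    #include <cstdint>
    #include <iostream>
    #include <random>
    #include <vector>

    // The affine family g_{A,b}(x) = A*x + b over GF(2): x in {0,1}^n, output in {0,1}^m.
    // Row i of A and the vectors x, b are packed into the low bits of 64-bit words.
    struct AffineHash {
        std::vector<uint64_t> A;  // m rows, each an n-bit row vector
        uint64_t b = 0;           // m-bit shift vector

        uint64_t operator()(uint64_t x) const {
            uint64_t y = 0;
            for (std::size_t i = 0; i < A.size(); ++i) {
                uint64_t bit = std::popcount(A[i] & x) & 1u;  // <row_i, x> over GF(2) = parity of the AND
                y |= bit << i;
            }
            return y ^ b;  // adding b over GF(2) is XOR
        }
    };

    // Sample a uniform member of the family; its description is (A, b).
    AffineHash sample_affine_hash(unsigned n, unsigned m, std::mt19937_64& rng) {
        std::uniform_int_distribution<uint64_t> row(0, n == 64 ? ~0ULL : (1ULL << n) - 1);
        std::uniform_int_distribution<uint64_t> out(0, m == 64 ? ~0ULL : (1ULL << m) - 1);
        AffineHash g;
        for (unsigned i = 0; i < m; ++i) g.A.push_back(row(rng));
        g.b = out(rng);
        return g;
    }

    int main() {
        std::mt19937_64 rng(std::random_device{}());
        AffineHash g = sample_affine_hash(16, 8, rng);  // g : {0,1}^16 -> {0,1}^8
        std::cout << g(0x1234) << "\n";
        return 0;
    }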

Leftover hash lemma

Lemma 11 (leftover hash lemma)
Let X be a rv over {0, 1}^n with H_2(X) ≥ k, let G = {g : {0, 1}^n → {0, 1}^m} be 2-universal, and let G ← G. Then SD((G, G(X)), (G, ∼{0, 1}^m)) ≤ (1/2) · 2^((m−k)/2).

Extraction.

Lemma 12
Let p be a distribution over U with CP(p) ≤ (1+δ)/|U|; then SD(p, ∼U) ≤ √δ / 2.

Proof: Let q be the uniform distribution over U.

- ||p − q||_2^2 = Σ_{u∈U} (p(u) − q(u))^2 = ||p||_2^2 + ||q||_2^2 − 2⟨p, q⟩ = CP(p) − 1/|U| ≤ δ/|U|
- Chebyshev sum inequality: (Σ_{i=1}^n a_i)^2 ≤ n · Σ_{i=1}^n a_i^2
- Hence, ||p − q||_1^2 ≤ |U| · ||p − q||_2^2
- Thus, SD(p, q) = (1/2) · ||p − q||_1 ≤ √δ / 2.

To deduce Lemma 11, we notice that CP(G, G(X)) ≤ (1/|G|) · (2^(−k) + 2^(−m)) = (1 + 2^(m−k)) / |G × {0, 1}^m|
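The collision-probability bound above can be seen as follows (a sketch using 2-universality and H_2(X) ≥ k): for an independent copy (G′, X′) of (G, X),
CP(G, G(X)) = Pr[G = G′] · Pr_{g,X,X′}[g(X) = g(X′)] ≤ (1/|G|) · (Pr[X = X′] + Pr_g[g(X) = g(X′) | X ≠ X′]) ≤ (1/|G|) · (CP(X) + 2^(−m)) ≤ (1/|G|) · (2^(−k) + 2^(−m)).
Applying Lemma 12 with U = G × {0, 1}^m and δ = 2^(m−k) then gives SD((G, G(X)), (G, ∼{0, 1}^m)) ≤ √(2^(m−k)) / 2 = (1/2) · 2^((m−k)/2), which is exactly Lemma 11.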
