Application of Information Theory, Lecture 8
Kolmogorov Complexity and Other Entropy Measures (Handout Mode)
Iftach Haitner, Tel Aviv University
December 16, 2014
Part I: Kolmogorov Complexity

Description length

- What is the description length of the following strings?
  1. 010101010101010101010101010101010101
  2. 011010100000100111100110011001111110
  3. 111010100110001100111100010101011111
- Answers: (1) eighteen copies of 01; (2) the first 36 bits of the binary expansion of √2 − 1; (3) looks random, but has 22 ones out of 36.
- Berry's paradox: let s be "the smallest positive integer that cannot be described in twelve English words".
- The above is a definition of s in fewer than twelve English words...
- Solution: the word "described" in the definition of s is not well defined.

Kolmogorov complexity

- For a string x ∈ {0,1}*, let K(x) be the length of the shortest C++ program (written in binary) that outputs x (on empty input).
- Now the term "described" is well defined.
- Why C++? All (complete) programming languages and computational models are essentially equivalent: if K'(x) denotes the description length of x in another complete language, then |K(x) − K'(x)| ≤ const.
- What is K(x) for x = 0101...01 (n pairs of 01)?
- "For i = 1 to n: print 01", hence K(x) ≤ log n + const.
- This is considered small complexity; we typically ignore log n factors.
- What is K(x) for x being the first n digits of π?
- Again K(x) ≤ log n + const.

More examples

- What is K(x) for x ∈ {0,1}^n with k ones?
- Recall that C(n,k) ≤ 2^{n·h(k/n)}, where h is the binary entropy function.
- Hence K(x) ≤ log n + n·h(k/n) + const.

Bounds

- K(x) ≤ |x| + const. Proof: "output x".
- Most sequences have high Kolmogorov complexity:
  - there are at most 2^{n−1} (C++) programs of length ≤ n − 2,
  - but there are 2^n strings of length n,
  - hence at least half of the n-bit strings have Kolmogorov complexity at least n − 1.
- In particular, a random sequence has Kolmogorov complexity ≈ n.

Conditional Kolmogorov complexity

- K(x|y): the Kolmogorov complexity of x given y, i.e., the length of the shortest program that outputs x on input y.
- Chain rule: K(x,y) ≈ K(y) + K(x|y).

H vs. K

- H(X) speaks about a random variable X and K(x) about a fixed string x, but:
- both quantities measure the amount of uncertainty or randomness in an object;
- both measure the number of bits it takes to describe an object.
- Another connection: let X_1, ..., X_n be iid; then whp K(X_1, ..., X_n) ≈ H(X_1, ..., X_n) = n·H(X_1).
- Proof: AEP.
- Example: for coin flips with bias (0.7, 0.3), whp we get a string x with K(x) ≈ n·h(0.3).

Universal compression

- A program of length K(x) that outputs x compresses x into K(x) bits of information (see the compression sketch below).
- Example: the length of the human genome is about 6·10^9 bits.
- But the code is redundant.
- The relevant number for measuring the number of possible values is the Kolmogorov complexity of the code.
- No one knows its value...
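K(x) itself is uncomputable, but any general-purpose compressor gives a computable upper bound on description length in the same spirit. Below is a minimal sketch in Python, my addition rather than part of the handout (the string lengths, the zlib compressor, and the label names are illustrative choices): it compares a periodic string, a biased coin-flip string as in the H-vs-K example, and a uniformly random string.

```python
# A rough, computable stand-in for the ideas above: compressed length is an
# upper bound on description length, while the information content is a floor
# that no description method can beat on average.
import random
import zlib
from math import log2

def h(p: float) -> float:
    """Binary entropy in bits."""
    return -p * log2(p) - (1 - p) * log2(1 - p)

n = 100_000
random.seed(0)

strings = {
    "periodic 01 repeated": "01" * (n // 2),
    "biased coin Pr[1]=0.3": "".join("1" if random.random() < 0.3 else "0" for _ in range(n)),
    "uniform random bits": "".join(random.choice("01") for _ in range(n)),
}

# Information floors in bits: ~0 for the periodic string, ~n*h(0.3) for the
# biased one, ~n for the uniform one.  (The pseudorandom strings stand in for
# "typical" random strings; zlib cannot exploit their short seed.)
floors = {
    "periodic 01 repeated": 0.0,
    "biased coin Pr[1]=0.3": n * h(0.3),
    "uniform random bits": float(n),
}

for name, s in strings.items():
    compressed_bits = 8 * len(zlib.compress(s.encode(), 9))
    print(f"{name:22s} raw {8 * n} bits -> ~{compressed_bits} bits compressed "
          f"(floor ~{floors[name]:.0f} bits)")
```

The structured string compresses to a negligible number of bits, while the random-looking strings cannot be pushed below their information floors; this is the sense in which a shortest program for x is the ultimate compression of x.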
Universal probability

- K(x) = min_{p : p() = x} |p|, where p() denotes the output of the C++ program described by p.

Definition 1. The universal probability of a string x is
    P_U(x) = Σ_{p : p() = x} 2^{−|p|} = Pr_{p ← {0,1}^∞}[p() = x].

- Namely, the probability that a program picked at random prints x.
- Insensitive (up to a constant factor) to the computation model.
- Interpretation: P_U(x) is the probability that you observe x in nature.
- The computer as an "intelligence amplifier".

Theorem 2. There exists c > 0 such that 2^{−K(x)} ≤ P_U(x) ≤ c·2^{−K(x)} for every x ∈ {0,1}*.

- The interesting part is P_U(x) ≤ c·2^{−K(x)}.
- Hence, for X ∼ P_U, it holds that |E[K(X)] − H(X)| ≤ log c.

Proving Theorem 2

- We need to find c > 0 such that K(x) ≤ log(1/P_U(x)) + c for every x ∈ {0,1}*.
- In other words: find a program that outputs x whose length is at most log(1/P_U(x)) + c.
- Idea: the program points to x's leaf in the Shannon code for P_U (in which x sits at depth ⌈log(1/P_U(x))⌉).
- Problem: P_U is not computable.
- Solution: compute better and better estimates of the code tree for P_U, along with the "mapping" from tree nodes back to codewords.

Proving Theorem 2 (cont.)

- Initialize T to be the infinite binary tree.

Program 3 (M). Enumerate over all programs in {0,1}*: at round i, emulate each of the first i programs (one after the other) for i steps, and do: if a program p outputs a string x and (∗, x, n(x)) ∉ T, place (p, x, n(x)) at an unused depth-n(x) node of T, where n(x) = ⌈log(1/P̂_U(x))⌉ + 1 and P̂_U(x) = Σ_{emulated p' with output x} 2^{−|p'|}.

- The program never gets stuck (it can always add the node).
  Proof: for every x, the depths n(x) assigned to x strictly decrease over time (a node is added only when P̂_U(x) has grown), and each is at least ⌈log(1/P_U(x))⌉ + 1, so Σ_{(·,x,n)∈T} 2^{−n} < 2·2^{−(⌈log(1/P_U(x))⌉+1)} ≤ P_U(x). Summing over all x gives at most Σ_x P_U(x) ≤ 1, so by the Kraft inequality an unused node of the required depth always exists.
- For every x ∈ {0,1}*: M eventually adds a node (·, x, ·) to T at depth at most ⌈log(1/P_U(x))⌉ + 2.
  Proof: P̂_U(x) converges to P_U(x).
- For x ∈ {0,1}*, let ℓ(x) be the location of this node of depth at most ⌈log(1/P_U(x))⌉ + 2.
- Program for printing x: run M until it assigns the node at location ℓ(x), and output the string written there. Its length is |ℓ(x)| + const ≤ log(1/P_U(x)) + const, as required.

Applications

- (Another) proof that there are infinitely many primes.
- Assume there are finitely many primes p_1, ..., p_m.
- Then any n-bit integer x can be written as x = Π_{i=1}^m p_i^{d_i}.
- Each d_i ≤ n, hence each d_i can be written with ≤ log n bits.
- Hence K(x) ≤ m·log n + const.
- But for most n-bit numbers K(x) ≥ n − 1, a contradiction for large enough n (see the numeric check below).
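To see where the contradiction kicks in, here is a small numeric check, my addition rather than part of the handout (the hypothetical number of primes and the additive constant are illustrative values): the bound m·log n + const eventually falls below the n − 1 bits that most n-bit integers require.

```python
# Numeric check of the "finitely many primes" contradiction: under the
# (false) assumption of only m primes, every n-bit integer would be
# describable by m exponents of about log2(n) bits each, plus a constant.
import math

M_PRIMES = 1000   # hypothetical number of primes (assumption for illustration)
CONST = 100       # stand-in for the additive constant of the decoding program

def hypothetical_bound(n: int) -> int:
    return M_PRIMES * math.ceil(math.log2(n)) + CONST

for n in (1_000, 10_000, 100_000, 1_000_000):
    bound = hypothetical_bound(n)
    # Most n-bit strings need at least n - 1 bits (the counting argument),
    # so the assumption is refuted once the bound drops below n - 1.
    print(f"n = {n:>9}: bound = {bound:>6} bits, contradiction: {bound < n - 1}")
```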
Computability of K

- Can we compute K(x)?
- Answer: no.
- Proof: assume K is computable by a program of length C.
- Let s be the smallest positive integer s.t. K(s) > 2C + 10,000.
- s can be computed by the following program:
  1. x = 0
  2. While K(x) ≤ 2C + 10,000: x = x + 1
  3. Output x
- Thus K(s) ≤ C + log C + log 10,000 + const < 2C + 10,000, contradicting K(s) > 2C + 10,000.
- Berry's paradox, revisited: s, the smallest positive number with K(s) > 10,000.
- This is not a paradox, since the description of s is not short.

Explicit large-complexity strings

- Can we give an explicit example of a string x with large K(x)?

Theorem 4. There exists a constant C such that the statement "K(x) ≥ C" cannot be proven for any string x (under any reasonable axiom system).

- For most strings K(x) > C + 1, but this cannot be proven even for a single string.
- "K(x) ≥ C" is thus an example of a statement that cannot be proven, and for most x's it cannot be disproved either.
- Proof: for an integer C, define the program T_C:
  1. y = 0
  2. If y encodes a proof of a statement of the form "K(x) ≥ C", output x and halt
  3. y = y + 1; go to step 2
- |T_C| = log C + D, where D is a constant.
- Take C such that C > log C + D.
- If T_C halts and outputs x, then K(x) ≤ log C + D < C, contradicting the existence of a proof that K(x) ≥ C.

Part II: Other Entropy Measures

Other entropy measures

Let X ∼ p be a random variable over a set 𝒳.

- Recall that the Shannon entropy of X is H(X) = Σ_{x∈𝒳} −p(x)·log p(x) = E_X[−log p(X)].
- The max entropy of X is H_0(X) = log|Supp(X)|.
- The min entropy of X is H_∞(X) = min_{x∈𝒳} {−log p(x)} = −log max_{x∈𝒳} p(x).
- The collision probability of X is CP(X) = Σ_{x∈𝒳} p(x)^2, the probability of a collision when drawing two independent samples from X.
- The collision entropy (Rényi entropy) of X is H_2(X) = −log CP(X).
- More generally, for α ≥ 0 with α ≠ 1: H_α(X) = (1/(1−α))·log Σ_{i=1}^n p_i^α = (α/(1−α))·log ‖p‖_α.
- H_∞(X) ≤ H_2(X) ≤ H(X) ≤ H_0(X) (by Jensen; see the numerical check below), with equality iff X is uniform over its support.
- For instance, CP(X) = Σ_x p(x)^2 ≤ Σ_x p(x)·max_{x'} p(x') = max_{x'} p(x'), hence H_2(X) ≥ H_∞(X).
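The following sketch, added for illustration (the example distribution and the function names are my own choices, not from the handout), computes the measures above and checks the chain H_∞ ≤ H_2 ≤ H ≤ H_0 numerically.

```python
# Entropy measures for a finite distribution p, all in bits (log base 2).
from math import log2

def shannon_entropy(p):          # H(X) = -sum p(x) log p(x)
    return -sum(q * log2(q) for q in p if q > 0)

def max_entropy(p):              # H_0(X) = log |Supp(X)|
    return log2(sum(1 for q in p if q > 0))

def min_entropy(p):              # H_inf(X) = -log max_x p(x)
    return -log2(max(p))

def collision_entropy(p):        # H_2(X) = -log CP(X), CP(X) = sum p(x)^2
    return -log2(sum(q * q for q in p))

def renyi_entropy(p, alpha):     # H_alpha(X) for alpha >= 0, alpha != 1
    return log2(sum(q ** alpha for q in p if q > 0)) / (1 - alpha)

p = [0.5, 0.25, 0.125, 0.125]    # example distribution (illustrative choice)
values = [min_entropy(p), collision_entropy(p), shannon_entropy(p), max_entropy(p)]
print("H_inf, H_2, H, H_0 =", [round(v, 4) for v in values])

assert values == sorted(values)                                  # H_inf <= H_2 <= H <= H_0
assert abs(renyi_entropy(p, 2) - collision_entropy(p)) < 1e-12   # H_2 via the general formula
```

For this distribution the chain is strict: H_∞ = 1, H_2 ≈ 1.54, H = 1.75, H_0 = 2; the four values coincide exactly when the distribution is uniform over its support.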
