
This week's Topics

• Shannon's Work.
• Mathematical/Probabilistic Model of Communication.
• Definitions of Information, Entropy, Randomness.
• Noiseless Channel & Coding Theorem.
• Noisy Channel & Coding Theorem.
• Converses.
• Algorithmic challenges.
• Detour from Error-correcting codes?

Shannon's Framework (1948)

Three entities: Source, Channel, and Receiver.

• Source: Generates "message" - a sequence of bits/symbols - according to some "stochastic" process S.
• Communication Channel: Means of passing information from source to receiver. May introduce errors, where the error sequence is another stochastic process E.
• Receiver: Knows the processes S and E, but would like to know what sequence was generated by the source.
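To make the three entities concrete, here is a minimal sketch of my own (not from the notes), with an assumed memoryless p-biased source S, an error process E that flips each bit independently with probability q, and a receiver that simply observes the channel output.

```python
import random

# Minimal sketch of Shannon's three entities (p, q, n are assumed toy parameters).
random.seed(0)
p, q, n = 0.1, 0.02, 20

source_bits = [int(random.random() < p) for _ in range(n)]      # Source: stochastic process S
error_bits  = [int(random.random() < q) for _ in range(n)]      # Channel's error process E
received    = [s ^ e for s, e in zip(source_bits, error_bits)]  # what the Receiver sees

print("sent:    ", source_bits)
print("received:", received)
```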


Goals/Options

• Noiseless case: Channel precious commodity. Would like to optimize usage.
• Noisy case: Would like to recover message despite errors.
• Source can "Encode" information.
• Receiver can "Decode" information.

Theories are very general: We will describe very specific cases only!

Noiseless case: Example

• Channel transmits bits: 0 → 0, 1 → 1. 1 bit per unit of time.
• Source produces a sequence of independent bits: 0 with probability 1 − p and 1 with probability p.
• Question: Expected time to transmit n bits, generated by this source?
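As a numeric warm-up for this question (my own illustration, not part of the notes): sending the source bits verbatim costs n channel uses, while the theorem on the next slide says roughly H2(p) · n should suffice.

```python
import math

def h2(p):
    """Binary entropy H2(p) = -p log2 p - (1-p) log2(1-p), in bits."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, n = 0.1, 10_000
print("naive transmission time (send bits as-is):", n)
print("entropy benchmark H2(p)*n:                ", round(h2(p) * n))   # ~4690 for p = 0.1
```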

Noiseless Coding Theorem (for Example)

Let H2(p) = −(p log2 p + (1 − p) log2(1 − p)).

Noiseless Coding Theorem: Informally, expected time → H(p) · n as n → ∞.

Formally, for every ε > 0, there exists n0 s.t. for every n ≥ n0, ∃E : {0, 1}^n → {0, 1}^* and D : {0, 1}^* → {0, 1}^n s.t.

• For all x ∈ {0, 1}^n, D(E(x)) = x.
• E_x[|E(x)|] ≤ (H(p) + ε)n.

Proof: Exercise.

Entropy of a source

• Distribution D on a finite set S is D : S → [0, 1] with Σ_{x∈S} D(x) = 1.
• Entropy: H(D) = Σ_{x∈S} −D(x) log2 D(x).
• Entropy of the p-biased bit: H2(p).
• Entropy quantifies randomness in a distribution.
• Coding theorem: The entropy is the # of bits (amortized, in expectation) that suffices to specify a point of the probability space.
• Fundamental notion in probability/information theory.
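The proof is left as an exercise; as an empirical sanity check only (Huffman coding on blocks is my own illustration, not necessarily the construction intended by the exercise), one can verify that an optimal prefix code on m-bit blocks of the p-biased source approaches H2(p) bits per source bit as m grows.

```python
import heapq, itertools, math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def huffman_lengths(probs):
    """Codeword lengths of an optimal prefix (Huffman) code for the given probabilities."""
    heap = [(q, i, (i,)) for i, q in enumerate(probs)]   # (probability, tie-break, leaf indices)
    heapq.heapify(heap)
    lengths = [0] * len(probs)
    counter = len(probs)
    while len(heap) > 1:
        q1, _, leaves1 = heapq.heappop(heap)
        q2, _, leaves2 = heapq.heappop(heap)
        for leaf in leaves1 + leaves2:   # merging two subtrees adds one bit to every leaf below
            lengths[leaf] += 1
        heapq.heappush(heap, (q1 + q2, counter, leaves1 + leaves2))
        counter += 1
    return lengths

p, m = 0.1, 8                                   # source bias and block length (assumed)
blocks = list(itertools.product([0, 1], repeat=m))
probs = [p ** sum(b) * (1 - p) ** (m - sum(b)) for b in blocks]
lengths = huffman_lengths(probs)
bits_per_source_bit = sum(q * l for q, l in zip(probs, lengths)) / m
print(f"H2({p}) = {h2(p):.4f}")
print(f"Huffman on {m}-bit blocks: {bits_per_source_bit:.4f} expected bits per source bit")
```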


Binary Entropy Function H2(p)

• Plot H(p).
• Main significance?
  − Let B2(y, r) = {x ∈ {0, 1}^n | Δ(x, y) ≤ r} (n implied).
  − Let Vol2(r, n) = |B2(0, r)|.
  − Then Vol2(pn, n) = 2^{(H(p)+o(1))n}.

Noisy Case: Example

• Source produces 0/1 w.p. 1/2.
• Error channel (BSC_p): transmits 1 bit per unit of time, faithfully with probability 1 − p, and flips it with probability p.
• Goal: How many source bits can be transmitted in n time units?
  − Can permit some error in recovery.
  − Error probability during recovery should be close to zero.
• Prevailing belief: Can only transmit o(n) bits.
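The volume estimate Vol2(pn, n) = 2^{(H(p)+o(1))n} can be checked numerically; the quick sanity check below is my own, not from the notes.

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def vol2(r, n):
    """|B2(0, r)|: number of n-bit words at Hamming distance at most r from 0."""
    return sum(math.comb(n, i) for i in range(r + 1))

p = 0.1
for n in (100, 1000, 10000):
    exponent = math.log2(vol2(int(p * n), n)) / n
    print(f"n = {n:5d}: log2(Vol2(pn, n))/n = {exponent:.4f}   H2(p) = {h2(p):.4f}")
```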

Noisy Coding Theorem (for Example)

Theorem: (Informally) Can transmit (1 − H(p)) · n bits, with error probability going to zero exponentially fast.

(Formally) ∀ε > 0, ∃δ > 0 s.t. for all n: Let k = (1 − H(p + ε))n. Then ∃E : {0, 1}^k → {0, 1}^n and ∃D : {0, 1}^n → {0, 1}^k s.t.

Pr_{η,x}[D(E(x) + η) ≠ x] ≤ exp(−δn),

where x is chosen according to the source and η independently according to BSC_p.

The Encoding and Decoding Functions

• E chosen at random from all functions mapping {0, 1}^k → {0, 1}^n.
• D chosen to be the brute-force decoder: for every y, D(y) is the vector x that minimizes Δ(E(x), y).
• Far from constructive!!!
• But it's a proof of concept!
• Main lemma: For E, D as above, the probability of decoding failure is exponentially small, for any fixed message x.
• Power of the probabilistic method!
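The random E / brute-force D pair can be run directly at toy scale; the experiment below is my own sketch of the construction described above, at assumed parameters (n is far too small for the exponential bound to bite, but the mechanics are the same).

```python
import itertools, random

def bsc(word, p):
    """Pass a bit-tuple through BSC_p: flip each coordinate independently with probability p."""
    return tuple(b ^ (random.random() < p) for b in word)

def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

k, n, p, trials = 4, 16, 0.1, 2000              # assumed toy parameters
random.seed(0)

messages = list(itertools.product([0, 1], repeat=k))
E = {x: tuple(random.randrange(2) for _ in range(n)) for x in messages}   # random encoder

def D(y):                                        # brute-force nearest-codeword decoder
    return min(messages, key=lambda x: dist(E[x], y))

failures = sum(D(bsc(E[x], p)) != x
               for x in (random.choice(messages) for _ in range(trials)))
print(f"rate k/n = {k/n},  empirical Pr[D(E(x)+eta) != x] ~ {failures/trials:.3f}")
```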


Proof of Lemma

• Will fix x ∈ {0, 1}^k and E(x) first, pick η next, and then the rest of E last!

• η is Bad if it has weight more than (p + ε)n.

  Pr_η[η Bad] ≤ 2^{−δn} (Chernoff bounds).

• x′ is Bad for x, η if E(x′) ∈ B2(E(x) + η, (p + ε)n).

  Pr_{E(x′)}[x′ Bad for x, η] ≤ 2^{H(p+ε)n}/2^n.

• Pr_E[∃x′ Bad for x, η] ≤ 2^{k+H(p)·n−n}.

• If η is not Bad, and no x′ ≠ x is Bad for x, then D(E(x) + η) = x.

• Conclude that decoding fails with probability at most e^{−Ω(n)}, over the random choice of E, η (for every x, and so also if x is chosen at random).

• Conclude there exists E such that encoding and decoding lead to exponentially small error probability, provided k + H(p)·n ≪ n.
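For concreteness, the two failure events in the lemma can be evaluated at assumed parameters (p = 0.1, ε = 0.02, n = 1000); in this sketch of mine the exact binomial tail stands in for the Chernoff bound, and the exponent is the one written above.

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, eps, n = 0.1, 0.02, 1000                    # assumed parameters
k = int((1 - h2(p + eps)) * n)

# Pr[eta Bad] = Pr[Binomial(n, p) > (p + eps) n], computed exactly here instead of via Chernoff.
t = int((p + eps) * n)
tail = sum(math.comb(n, i) * p**i * (1 - p)**(n - i) for i in range(t + 1, n + 1))
print(f"Pr[eta Bad]       ~ {tail:.3e}")

# Union bound over the other codewords, with the exponent k + H(p) n - n from the slide.
exponent = k + h2(p) * n - n
print(f"Pr[some x' Bad]  <= 2^({exponent:.1f}) = {2.0 ** exponent:.3e}")
```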

Converse to Coding Theorems

• Shannon also showed his results to be tight.
• For the noisy case, 1 − H(p) is the best possible rate ...
• ... no matter what E, D are!
• How to prove this?
• Intuition: Say we transmit E(x). W.h.p. the # of erroneous bits is ≈ pn. In such a case, symmetry implies no one received vector is likely with probability more than 1/(n choose pn) ≈ 2^{−H(p)n}. To have error probability close to zero, at least 2^{H(p)n} received vectors must decode to x. But then we need 2^k ≤ 2^n/2^{H(p)n}.

Formal proof of the Converse

• η is Easy if its weight is ≤ (p − ε)n. Pr_η[η Easy] ≤ exp(−n). For any y of weight ≥ (p − ε)n, Pr[η = y] ≤ 2^{−H(p−ε)n}.
• For x ∈ {0, 1}^k let S_x ⊆ {0, 1}^n be {y | D(y) = x}. Have Σ_x |S_x| = 2^n.
• Pr[Decoding correctly]
  = 2^{−k} Σ_{x∈{0,1}^k} Σ_{y∈S_x} Pr[η = y − E(x)]
  ≤ Pr[η Easy] + 2^{−k} Σ_x Σ_{y∈S_x} Pr[η = y − E(x) | η Hard]
  ≤ exp(−n) + 2^{−k} · 2^n · 2^{−H(p−ε)n},
  which is exponentially small once k/n exceeds 1 − H(p − ε).
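The counting in the intuition can be made numeric (an assumed worked example of my own): a specific error pattern of weight pn has probability exactly p^{pn}(1 − p)^{(1−p)n} = 2^{−H(p)n}, so each message needs about 2^{H(p)n} received words in its decoding region, capping the rate at 1 − H(p).

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

p, n = 0.1, 1000
w = round(p * n)

# A fixed error pattern of weight pn has log-probability exactly -H2(p) n.
log2_prob = w * math.log2(p) + (n - w) * math.log2(1 - p)
print(f"log2 Pr[fixed weight-pn pattern] = {log2_prob:.1f},  -H2(p)*n = {-h2(p) * n:.1f}")

# Each decoding region S_x must then contain ~2^{H2(p) n} words, so 2^k <= 2^n / 2^{H2(p) n}:
print(f"rate cap: k/n <= 1 - H2(p) = {1 - h2(p):.3f}")
```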


Importance of Shannon's Framework

• Examples considered so far are the baby examples!
• Theory is wide and general.
• But, essentially probabilistic + "information-theoretic", not computational.
• For example, give an explicit E! Give an efficient D! Shannon's work does not.

More general source

• Allows for Markovian sources.
• Source described by a finite collection of states with a probability transition matrix.
• Each state corresponds to a fixed symbol of the output.
• Interesting example in the original paper: Markovian model of English. Computes the rate of English!
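For a finite-state Markov source, the entropy rate is the stationary average of the per-state transition entropies. Here is a small worked example with an assumed two-state chain (the states and probabilities are made up for illustration, not Shannon's English model).

```python
import math

# Assumed two-state Markov source: from state s, move to state t (and emit t's symbol)
# with probability P[(s, t)].
P = {('A', 'A'): 0.9, ('A', 'B'): 0.1,
     ('B', 'A'): 0.4, ('B', 'B'): 0.6}

# Stationary distribution (closed form for two states): pi(A) = P(B->A) / (P(A->B) + P(B->A)).
pi = {}
pi['A'] = P[('B', 'A')] / (P[('A', 'B')] + P[('B', 'A')])
pi['B'] = 1 - pi['A']

# Entropy rate = sum_s pi(s) * H(next symbol | state s), in bits per symbol.
rate = 0.0
for s in 'AB':
    for t in 'AB':
        q = P[(s, t)]
        if q > 0:
            rate += pi[s] * (-q * math.log2(q))
print(f"entropy rate of the source: {rate:.4f} bits/symbol")   # < 1 bit: the source is compressible
```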

More general models of error

• The i.i.d. case generally is a transition matrix from the input alphabet Σ to the output alphabet Γ. (Σ, Γ need not be finite! (Additive White Gaussian Channel.) Yet capacity might be finite.)
• Also allows for Markovian error models. May be captured by a state diagram, with each state having its own transition matrix from Σ to Γ.

General theorem

• Every source has a Rate (based on the entropy of the distribution it generates).
• Every channel has a Capacity.

Theorem: If Rate < Capacity, information transmission is feasible with error decreasing exponentially with the length of transmission. If Rate > Capacity, information transmission is not feasible.
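As a concrete instance of "every channel has a Capacity": for a discrete memoryless channel given by a transition matrix W, the capacity is the maximum of the mutual information I(X;Y) over input distributions. The sketch below is my own, using the BSC as the worked case; it evaluates I at the uniform input, which for the BSC attains the maximum 1 − H2(p).

```python
import math

def h2(p):
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def mutual_information(px, W):
    """I(X;Y) in bits for input distribution px and channel transition matrix W[x][y]."""
    py = [sum(px[x] * W[x][y] for x in range(len(px))) for y in range(len(W[0]))]
    return sum(px[x] * W[x][y] * math.log2(W[x][y] / py[y])
               for x in range(len(px)) for y in range(len(W[0]))
               if px[x] > 0 and W[x][y] > 0)

p = 0.1
bsc = [[1 - p, p], [p, 1 - p]]             # transition matrix of BSC_p
# Capacity = max over input distributions of I(X;Y); for the BSC the uniform input achieves it.
print(f"I(uniform input; BSC_p) = {mutual_information([0.5, 0.5], bsc):.4f}")
print(f"1 - H2(p)               = {1 - h2(p):.4f}")
```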


Contrast with Hamming

• Main goal of Shannon Theory:
  − Constructive (polytime/linear-time/etc.) E, D.
  − Maximize rate = k/n, where E : {0, 1}^k → {0, 1}^n.
  − While minimizing P_err = Pr_{x,η}[D(E(x) + η) ≠ x].

• Hamming theory:
  − Explicit description of {E(x)}_x.
  − No focus on E, D itself.
  − Maximize k/n and d/n, where d = min_{x1≠x2} {Δ(E(x1), E(x2))}.

• Interpretations: Shannon theory deals with probabilistic error, Hamming with adversarial error. Engineering need: closer to Shannon theory. However, Hamming theory provided solutions, since min. distance seemed easier to analyze than P_err.
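To make the two figures of merit concrete, here is a small experiment of my own on a [7,4] binary linear code in systematic form (the generator matrix is an assumed standard choice; the distance d is recomputed rather than taken on faith): Hamming's d/n is found by enumerating codewords, while Shannon's P_err is estimated over BSC_p with nearest-codeword decoding.

```python
import itertools, random

# Assumed generator matrix of a [7,4] binary linear code in systematic form.
G = [(1,0,0,0,0,1,1), (0,1,0,0,1,0,1), (0,0,1,0,1,1,0), (0,0,0,1,1,1,1)]

def encode(x):
    return tuple(sum(xi * gi for xi, gi in zip(x, col)) % 2 for col in zip(*G))

messages  = list(itertools.product([0, 1], repeat=4))
codewords = {x: encode(x) for x in messages}

# Hamming's figures of merit: rate k/n and relative distance d/n (minimum nonzero weight).
d = min(sum(c) for x, c in codewords.items() if any(x))
print(f"k/n = {4/7:.3f},  d/n = {d}/7")

# Shannon's figure of merit: P_err over BSC_p with brute-force nearest-codeword decoding.
def dist(u, v):
    return sum(a != b for a, b in zip(u, v))

p, trials, errs = 0.05, 5000, 0
random.seed(1)
for _ in range(trials):
    x = random.choice(messages)
    y = tuple(b ^ (random.random() < p) for b in codewords[x])    # BSC_p output
    errs += (min(messages, key=lambda m: dist(codewords[m], y)) != x)
print(f"empirical P_err over BSC_{p}: {errs / trials:.3f}")
```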
