Rate Distortion Theory

Lossy coding: the decoded signal is an approximation of the original.
Rate distortion theory is a theoretical discipline treating data compression from the viewpoint of information theory.
The results of rate distortion theory are obtained without consideration of a specific coding method.
The goal of rate distortion theory is to calculate the minimum transmission bit rate for a given distortion and source.
Below: example of a rate distortion function for a discrete iid source.

[Figure: rate distortion function R(D) for a discrete iid source, running from the point (D_min = 0, R = H(S)) down to (D = D_max, R = 0).]

Outline

Transmission System and Variables
Rate Distortion Function for Discrete Random Variables
- Definition: Source and Decoded Symbols, Coder/Decoder, Distortion, Rate
- Rate Distortion Function and Properties
- Rate Distortion Function for Binary IID Sources
Rate Distortion Function for Amplitude-Continuous Random Variables
- Definition: Source and Decoded Symbols, Coder/Decoder, Distortion, Rate
- Rate Distortion Function
- Shannon Lower Bound
- Rate Distortion Function for Memoryless Gaussian Sources
- Rate Distortion Function for Gaussian Sources with Memory

Transmission System and Variables

Transmission system

[Figure: transmission system Source → Coder → Decoder → Sink, with source signal S, decoded signal S′, distortion D, and bit-rate R.]

Derivation in two steps:
1. Define S, S′, coder/decoder, distortion D, and rate R
2. Establish a functional relationship between S, S′, D, and R
For two types of random variables:
1. Derivation for discrete random variables
2. Derivation for amplitude-continuous random variables (Gaussian, Laplacian, etc.)

Operational Rate Distortion Function

Encoder:
- Irreversible encoder mapping α : s → i
- Lossless mapping γ : i → b
Decoder:
- Lossless mapping γ⁻¹ : b → i
- Decoder mapping β : i → s′

Source code

Source code: Q = (α, β, γ)

N-dimensional block source code: Q_N = (α_N, β_N, γ_N)
Blocks of N consecutive input samples are independently coded.
Each block of input samples s^(N) = {s_0, ..., s_{N−1}} is mapped to a vector of K quantization indexes

i^(K) = α_N(s^(N))    (1)

The resulting vector of indexes i^(K) is converted into a bit sequence

b^(ℓ) = γ_N(i^(K)) = γ_N(α_N(s^(N)))    (2)

At the decoder side, the index vector is recovered

i^(K) = γ_N^{−1}(b^(ℓ)) = γ_N^{−1}(γ_N(i^(K)))    (3)

The index vector is mapped to a block of reconstructed samples s′^(N) = {s′_0, ..., s′_{N−1}}

s′^(N) = β_N(i^(K)) = β_N(α_N(s^(N)))    (4)
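As an illustration, the chain of mappings in Eqs. (1)-(4) can be realized by a toy block code. The following Python sketch is purely illustrative; the quantizer step size, the 3-bit index range, and all function names are assumptions, not part of the lecture material:

```python
import numpy as np

# Toy block source code Q_N = (alpha_N, beta_N, gamma_N): uniform scalar
# quantization of each sample (so K = N) with fixed-length 3-bit indexes.
DELTA, BITS = 0.5, 3

def alpha(s):                      # encoder mapping alpha_N: samples -> indexes, Eq. (1)
    return np.clip(np.round(np.asarray(s) / DELTA).astype(int) + 4, 0, 7)

def gamma(i):                      # lossless mapping gamma_N: indexes -> bits, Eq. (2)
    return "".join(format(k, f"0{BITS}b") for k in i)

def gamma_inv(b):                  # inverse mapping: bits -> indexes, Eq. (3)
    return [int(b[k:k + BITS], 2) for k in range(0, len(b), BITS)]

def beta(i):                       # decoder mapping beta_N: indexes -> samples, Eq. (4)
    return (np.asarray(i) - 4) * DELTA

s = np.array([0.1, -0.7, 1.3])
b = gamma(alpha(s))                # bit sequence b
s_rec = beta(gamma_inv(b))         # reconstructed block s'
print(b, s_rec)                    # 100011111 [ 0.  -0.5  1.5]
```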

Distortion I

This lecture: additive distortion measures

d_1(s, s′) ≥ 0  with equality if and only if s = s′    (5)

Examples:

Hamming distance:  d_1(s, s′) = 0 for s = s′, and 1 for s ≠ s′    (6)
Mean squared error:  d_1(s, s′) = (s − s′)²    (7)

Distortion for s^(N) = {s_0, s_1, ..., s_{N−1}} and s′^(N) = {s′_0, s′_1, ..., s′_{N−1}}:

d_N(s^(N), s′^(N)) = (1/N) Σ_{i=0}^{N−1} d_1(s_i, s′_i)    (8)
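These additive measures translate directly into code. A minimal Python sketch (the function names are my own) of the per-symbol distortions of Eqs. (6)-(7) and the block distortion of Eq. (8):

```python
import numpy as np

def hamming(s, s_rec):
    """Per-symbol Hamming distortion d_1(s, s') from Eq. (6)."""
    return (np.asarray(s) != np.asarray(s_rec)).astype(float)

def mse(s, s_rec):
    """Per-symbol squared error d_1(s, s') = (s - s')^2 from Eq. (7)."""
    return (np.asarray(s, dtype=float) - np.asarray(s_rec, dtype=float)) ** 2

def block_distortion(s, s_rec, d1):
    """Additive block distortion d_N of Eq. (8): average of d_1 over the block."""
    return float(np.mean(d1(s, s_rec)))

# Example: distortion of a block under both measures
s     = [0, 1, 1, 0, 1]
s_rec = [0, 1, 0, 0, 1]
print(block_distortion(s, s_rec, hamming))  # 0.2
print(block_distortion(s, s_rec, mse))      # 0.2 (same here, binary values)
```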

Distortion II

Stationary random process S = {S_n} and N-dimensional block source code Q_N = (α_N, β_N, γ_N):

δ(Q_N) = E{ d_N(S^(N), β_N(α_N(S^(N)))) }    (9)
       = ∫_{R^N} f(s) d_N(s, β_N(α_N(s))) ds    (10)

Arbitrary random process S = {S_n} and arbitrary code Q:

δ(Q) = lim_{N→∞} E{ d_N(S^(N), β_N(α_N(S^(N)))) }    (11)

Rate

Average number of bits per input symbol (| · | denotes the number of bits):

r_N(s^(N)) = (1/N) |γ_N(α_N(s^(N)))|  with  b^(ℓ) = γ_N(α_N(s^(N)))    (12)

Stationary random process S = {S_n} and N-dimensional block source code Q_N = (α_N, β_N, γ_N):

r(Q_N) = (1/N) E{ |γ_N(α_N(S^(N)))| }    (13)
       = (1/N) ∫_{R^N} f(s) |γ_N(α_N(s))| ds    (14)

Arbitrary random process S = {S_n} and arbitrary code Q:

r(Q) = lim_{N→∞} (1/N) E{ |γ_N(α_N(S^(N)))| }    (15)

Operational Rate Distortion Function

For a given source S, each code Q is associated with a rate distortion point (R, D).
A rate distortion point is achievable if there exists a code Q such that r(Q) ≤ R and δ(Q) ≤ D.
Operational rate distortion function R(D) and its inverse, the operational distortion rate function D(R):

R(D) = inf_{Q: δ(Q) ≤ D} r(Q)        D(R) = inf_{Q: r(Q) ≤ R} δ(Q)    (16)

Motivation of the Information Rate Distortion Function

The operational rate distortion function specifies a fundamental performance bound for lossy source coding.
It is difficult to evaluate

R(D) = inf_{Q: δ(Q) ≤ D} r(Q)    (17)

→ Information rate distortion function: introduced by Shannon in [Shannon, 1948, Shannon, 1959]
Obtain an expression of the rate distortion bound that involves only the distribution of the source, using mutual information.
Show that the information rate distortion function is achievable.

Example: Discrete Binary Source

Consider a discrete binary iid source with alphabet A = {0, 1} and

p(0) = p(1) = 0.5    (18)

Assume the distortion measure is the Hamming distance.
Coding scheme: 1 out of M symbols is not transmitted but guessed at the receiver side.
The rate for this experimental codec is

R_E = (M − 1)/M = 1 − 1/M  bits/symbol    (19)

The distortion is given as

D = (1/M) · (1/2 · 0 + 1/2 · 1) = 1/(2M)    (20)

Operational rate distortion function for the above code:

R_E(D) = 1 − 2D    (21)

Can we do better than the example codec?

R_E(D) is the rate distortion performance of the experimental coding scheme.
Rate distortion function R(D) for the same source:

[Figure: R_E(D) and the rate distortion function R(D), R in bits versus D; R(D) lies below R_E(D) for 0 < D < 0.5.]

Mutual Information for Discrete IID Sources

Entropy H(S) is a measure of the uncertainty about the random variable S.
Conditional entropy H(S|S′) is a measure of the uncertainty about the random variable S after observing the random variable S′.
Mutual information

I(S; S′) = H(S) − H(S|S′)    (22)

is a measure of the reduction of uncertainty about S due to the observation of S′, i.e., the average amount of information that S′ carries about S.
Mutual information for discrete random variables S ∈ A and S′ ∈ B:

I(S; S′) = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s|s′) / p(s) ]    (23)
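Eq. (23) can be evaluated numerically from a joint pmf. The following sketch is a hypothetical helper, not from the slides:

```python
import numpy as np

def mutual_information(p_joint):
    """I(S; S') in bits from a joint pmf p(s, s'), following Eq. (23).

    p_joint: 2-D array with p_joint[s, s_rec] = p(s, s'), summing to 1.
    """
    p_joint = np.asarray(p_joint, dtype=float)
    p_s = p_joint.sum(axis=1, keepdims=True)       # marginal p(s)
    p_s_rec = p_joint.sum(axis=0, keepdims=True)   # marginal p(s')
    mask = p_joint > 0                             # 0 log 0 = 0 convention
    ratio = p_joint[mask] / (p_s @ p_s_rec)[mask]  # p(s,s') / (p(s) p(s'))
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

# Binary symmetric example: p(s) uniform, s' flips s with probability 0.1
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(mutual_information(p))  # ≈ 1 - Hb(0.1) ≈ 0.531 bits
```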

Mutual Information for Discrete IID Sources and Rate

Mutual information rewritten using Bayes’ rule

I(S; S′) = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s|s′) / p(s) ]
         = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s, s′) / (p(s) p(s′)) ]
         = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s′|s) / p(s′) ]
         = H(S′) − H(S′|S)    (24)

Since H(S′|S) ≥ 0:

H(S′) ≥ I(S; S′)    (25)

Mutual Information for Discrete Sources and Rate

Consider N-dimensional random vectors S = (S_0, S_1, ..., S_{N−1})^T ∈ A^N and S′ = (S′_0, S′_1, ..., S′_{N−1})^T ∈ B^N:

I_N(S; S′) = H_N(S′) − H_N(S′|S) ≤ H_N(S′)    (26)   [since H_N(S′|S) ≥ 0]

Recall: entropy rate

H̄(S′) = lim_{N→∞} H_N(S′)/N    (27)

Rate vs. mutual information:

r(Q) ≥ lim_{N→∞} H_N(S′)/N ≥ lim_{N→∞} I_N(S; S′)/N    (28)

Description of a Codec for Discrete Sources Using Conditional Pmfs I

The statistical properties of a mapping s′ = β(α(s)) can be described by an N-th order conditional pmf g_N(s′|s).
For N > 1, g_N(s′|s) are multivariate conditional pmfs.
The pmfs g_N(s′|s) obtained by a deterministic mapping (codes) are a subset of the set of all conditional pmfs.

Example 1: Mapping s → s′ with s′ = ⌊s/∆⌋ · ∆

g_1(s′|s) = 1 for s′ = ⌊s/∆⌋ · ∆, and 0 otherwise

[Figure: g_1(s′|s) as a function of s′, a single unit probability mass at s′ = ⌊s/∆⌋ · ∆.]

Description of a Codec Using Conditional Pmfs II

Example 2: Mapping (s_n, s_{n+1}) → (s′_n, s′_{n+1}):

(s′_n, s′_{n+1}) = (1, 1) for s_n + s_{n+1} > 1;  (−1, −1) for s_n + s_{n+1} < −1;  (0, 0) otherwise

[Figure: the three decision regions in the (s_n, s_{n+1}) plane.]

The resulting first-order conditional pmf is

g_1(s′|s) = x(s) for s′ = −1;  y(s) for s′ = 0;  z(s) for s′ = 1;  0 otherwise

with x(s) + y(s) + z(s) = 1.

[Figure: g_1(s′|s) with probability masses at s′ = −1, 0, 1.]

Information Rate Distortion Function for Discrete Sources I

Let g_N(s′|s) be the N-th order conditional pmf of s′ ∈ B^N given s ∈ A^N.
Distortion:

δ_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s, s′) · d_N(s, s′)
         = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · d_N(s, s′)    (29)

Mutual information:

I_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s, s′) · log₂ [ p(s, s′) / (p(s) · p(s′)) ]
         = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / p(s′) ]    (30)

Information rate distortion function:

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)/N    (31)

Information Rate Distortion Function for Discrete Sources II

The class of conditional pmfs g_N^Q(s′|s) representing a codec Q = (α, β, γ) is a subset of all conditional pmfs g_N(s′|s):

I_N(g_N^Q) ≥ inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)    (32)

For a given maximum average distortion D, the information rate distortion function R^(I)(D) is the lower bound for the transmission bit-rate:

∀Q: δ(Q) ≤ D  ⟹  r(Q) ≥ R^(I)(D)    (33)

→ Fundamental source coding theorem
It can be shown [Cover and Thomas, 2006, p. 318] that for any ε > 0, there exists a code Q with δ(Q) ≤ D and r(Q) ≤ R^(I)(D) + ε.
The notation R(D) is used for both the operational rate distortion function and the information rate distortion function.

Distortion Rate Function for Discrete Sources

Distortion

δ_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · d_N(s, s′)    (34)

Mutual information:

I_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / p(s′) ]    (35)

The information distortion rate function is obtained by interchanging the roles of rate and distortion:

D(R) = lim_{N→∞} inf_{g_N: I_N(g_N)/N ≤ R} δ_N(g_N)    (36)

For a given maximum average rate R, the distortion rate function D(R) is the lower bound for the average distortion

∀Q: r(Q) ≤ R  ⟹  δ(Q) ≥ D(R)    (37)

Rate Distortion Function for Discrete IID Sources

The information rate distortion function was first derived by Shannon for iid sources [Shannon, 1948, Shannon, 1959].
For iid sources, the joint pmf is p(s) = ∏_{i=0}^{N−1} p(s_i), which yields

δ_N(g_N^Q) = δ_1(g_1^Q)  and  I_N(g_N^Q) = N · I_1(g_1^Q)    (38)

First order rate distortion function:

R_1^(I)(D) = inf_{g: δ_1(g) ≤ D} I_1(g)    (39)

First order distortion rate function

D_1(R) = inf_{g: I_1(g) ≤ R} δ_1(g)    (40)

Properties of R(D) for Discrete Sources and Additive Distortion Measures

[Figure: example of R(D) for a discrete iid source, from the point (D_min = 0, R = H(S)) to (D_max, R = 0).]

R(D) is a non-increasing and convex function of D.
There exists a value D_max such that

R(D) = 0  ∀ D ≥ D_max    (41)

For the MSE distortion measure, D_max is equal to the variance σ² of the source.
→ The minimum rate required for lossless transmission of a discrete source is equal to the entropy rate:

D_min = 0:  R(0) = H̄(S)    (42)

→ The fundamental bound for lossless coding is a special case of the fundamental bound for lossy coding.

Example: Discrete Binary IID Source I

Let a discrete binary iid source be given with probabilities p(0) = p and p(1) = 1 − p, and the Hamming distance as distortion measure.
Derive the information rate distortion function R(D) by
- deriving a lower bound
- showing that the lower bound is achievable
The maximum distortion is D(0) = min(p, 1 − p), since we can assume that all symbols have the value with the higher probability.
With ⊕ denoting modulo-2 addition (the XOR operator), derive a lower bound:

I(S; S′) = H(S) − H(S|S′)
         = H_b(p) − H(S ⊕ S′|S′)
         ≥ H_b(p) − H(S ⊕ S′)
         ≥ H_b(p) − H_b(D)    (43)

with H_b(x) = −x log₂ x − (1 − x) log₂(1 − x) being the binary entropy function and the distortion (Hamming distance) given by D = E{S ⊕ S′}

Example: Discrete Binary IID Source II

The derived lower bound can be achieved by the following choice:

p_{S′}(s′) = (p − D)/(1 − 2D)  for s′ = 0,   (1 − p − D)/(1 − 2D)  for s′ = 1    (44)

g_{S|S′}(s|s′) = 1 − D  for s = s′,   D  for s ≠ s′    (45)

It is a valid choice, since

p_S(s) = Σ_{s′=0}^{1} g_{S|S′}(s|s′) · p_{S′}(s′) = p  for s = 0,   1 − p  for s = 1    (46)

We obtain

δ_1(g) = Σ_{s,s′} p_{S′}(s′) · g_{S|S′}(s|s′) · d_1(s, s′) = D    (47)
I_1(g) = H(S) − H(S|S′) = H_b(p) − H_b(D)    (48)

Hence, the information rate distortion function R(D) is

R(D) = H_b(p) − H_b(D)  for 0 ≤ D ≤ min(p, 1 − p),   0  for D > min(p, 1 − p)    (49)
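Eq. (49) is easy to evaluate numerically. A small sketch (function names are assumptions) that also compares against the experimental codec R_E(D) = 1 − 2D from Eq. (21):

```python
import numpy as np

def Hb(x):
    """Binary entropy function Hb(x) in bits, with Hb(0) = Hb(1) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > 0) & (x < 1)
    xi = x[inside]
    out[inside] = -xi * np.log2(xi) - (1 - xi) * np.log2(1 - xi)
    return out

def rate_binary(D, p):
    """R(D) of Eq. (49) for a binary iid source with P(0) = p."""
    D = np.asarray(D, dtype=float)
    return np.where(D < min(p, 1 - p), Hb(np.full_like(D, p)) - Hb(D), 0.0)

# Compare with the experimental "guess 1 out of M symbols" codec, R_E(D) = 1 - 2D
D = np.linspace(0.0, 0.5, 6)
print(rate_binary(D, p=0.5))   # R(D) = 1 - Hb(D)
print(1 - 2 * D)               # R_E(D): strictly above R(D) for 0 < D < 0.5
```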

Example: R(D) for Different Discrete Binary IID Sources

R_E(D) is the rate distortion performance of the example coding scheme.
R(D) for a binary source for several values of p:

[Figure: R_E(D) and R(D), R in bits versus D, for p = 0.1, 0.2, 0.3, 0.5.]

Operational Rate Distortion Function for Continuous Sources

Source symbols: random process S with joint pdf f(s)
Decoded symbols: random process S′ with joint pmf p(s′)
→ For a finite rate, the decoded symbols must be discrete
For stationary processes S and block codes Q_N = (α_N, β_N, γ_N):

δ_N(Q_N) = ∫_{R^N} f(s) d_N(s, β_N(α_N(s))) ds    (50)

r_N(Q_N) = (1/N) ∫_{R^N} f(s) |γ_N(α_N(s))| ds    (51)

Distortion and rate of the operational rate distortion function:

δ(Q) = lim_{N→∞} δ_N(Q_N)        r(Q) = lim_{N→∞} r_N(Q_N)    (52)

Discretization of Continuous Random Variables

Approximation f_X^(∆) of the pdf f_X:

f_X^(∆)(x) = (1/∆) ∫_{x_i}^{x_{i+1}} f_X(x′) dx′   ∀x: x_i ≤ x < x_{i+1}    (53)

Pmf p_{X_∆} for the discrete random variable X_∆:

p_{X_∆}(x_i) = ∫_{x_i}^{x_{i+1}} f_X(x′) dx′ = f_X^(∆)(x_i) · ∆    (54)

Joint pmf of two discrete approximations X∆ and Y∆

p_{X_∆ Y_∆}(x_i, y_j) = f_{XY}^(∆)(x_i, y_j) · ∆²    (55)

Mutual Information for Continuous Random Variables

Mutual information for discrete random variables X_∆ ∈ A_{X_∆} and Y_∆ ∈ A_{Y_∆}:

I(X_∆; Y_∆) = Σ_{x_i∈A_{X_∆}} Σ_{y_j∈A_{Y_∆}} p_{X_∆Y_∆}(x_i, y_j) log₂ [ p_{X_∆Y_∆}(x_i, y_j) / (p_{X_∆}(x_i) p_{Y_∆}(y_j)) ]    (56)
            = Σ_{x_i} Σ_{y_j} f_{XY}^(∆)(x_i, y_j) · ∆² · log₂ [ f_{XY}^(∆)(x_i, y_j) / (f_X^(∆)(x_i) f_Y^(∆)(y_j)) ]

As ∆ approaches zero,

I(X; Y) = lim_{∆→0} I(X_∆; Y_∆)    (57)

the piecewise constant pdf approximations f_{XY}^(∆), f_X^(∆), and f_Y^(∆) approach the pdfs f_XY, f_X, and f_Y, and

I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy    (58)

Mutual Information for Random Vectors

Given S = (S_0, S_1, ..., S_{N−1})^T and S′ = (S′_0, S′_1, ..., S′_{N−1})^T (written below for generic random vectors X and Y):

I(X; Y) = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy
        = ∫_{R^N} ∫_{R^N} f_X(x) f_{Y|X}(y|x) log₂ [ f_{Y|X}(y|x) / f_Y(y) ] dx dy    (59)

Assume Y is a discrete random vector with alphabet A_Y^N; with δ(·) in the following equations denoting the Dirac delta function:

f_Y(y) = Σ_{a∈A_Y^N} δ(y − a) p_Y(a)    (60)
f_{Y|X}(y|x) = Σ_{a∈A_Y^N} δ(y − a) p_{Y|X}(a|x)    (61)

Rewriting the mutual information using the above pmfs:

I(X; Y) = ∫_{R^N} f_X(x) Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ [ p_{Y|X}(a|x) / p_Y(a) ] dx    (62)

Mutual Information for Random Vectors II

Mutual information assuming Y is a discrete process:

I(X; Y) = ∫_{R^N} f_X(x) Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ [ p_{Y|X}(a|x) / p_Y(a) ] dx    (63)

This can be rewritten as

I(X; Y) = H(Y) − ∫_{R^N} f_X(x) H(Y|X = x) dx    (64)

where H(Y) is the entropy of the discrete random vector Y and

H(Y|X = x) = − Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ p_{Y|X}(a|x)    (65)

is the conditional entropy of Y given the event {X = x}.
Since the conditional entropy H(Y|X = x) is always nonnegative, we have

I(X; Y) ≤ H(Y)    (66)

Description of a Codec for Continuous Processes Using Conditional Pdfs

As for discrete sources, the statistical properties of a mapping s′ = β(α(s)) can be described using a conditional pdf g_N(s′|s).
Example: Mapping s → s′ with s′ = ⌊s/∆⌋ · ∆:

g_1(s′|s) = δ(s′ − ⌊s/∆⌋ · ∆)

[Figure: g_1(s′|s), a Dirac impulse located at s′ = ⌊s/∆⌋ · ∆.]

For N > 1, g_N(s′|s) are multivariate conditional pdfs.
The pdfs g_N(s′|s) obtained by a deterministic mapping (codes) are a subset of the set of all conditional pdfs.

Information Rate Distortion Function for Continuous Sources II

Let g_N(s′|s) be the N-th order conditional pdf of s′ ∈ B^N given s ∈ R^N.
Distortion:

δ_N(g_N) = ∫_{R^N} ∫_{R^N} f_{SS′}(s, s′) · d_N(s, s′) ds ds′
         = ∫_{R^N} ∫_{R^N} f_S(s) · g_N(s′|s) · d_N(s, s′) ds ds′    (67)

Mutual information:

I_N(g_N) = ∫_{R^N} ∫_{R^N} f_S(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / f_{S′}(s′) ] ds ds′    (68)

with

f_{S′}(s′) = ∫_{R^N} f_S(s) · g_N(s′|s) ds    (69)

Information rate distortion function:

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)/N    (70)

Properties of R^(I)(D) for Continuous Sources and Additive Distortion Measures

Every code Q that yields a distortion δ(Q) less than or equal to any given value D for a source S is associated with a rate r(Q) that is greater than or equal to the information rate distortion function R(I)(D) for the source S

∀Q: δ(Q) ≤ D  ⟹  r(Q) ≥ R^(I)(D)    (71)

Example of R(D) for an amplitude-continuous source:

[Figure: R(D) in bits versus D/σ² for a continuous amplitude source.]

Properties are similar as for discrete sources, except that R(D) approaches infinity as D approaches 0.

Differential Entropy

Mutual information of two continuous random vectors X and Y:

I(X; Y) = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy
        = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_{X|Y}(x|y) / f_X(x) ] dx dy
        = − ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_X(x) dx dy + ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_{X|Y}(x|y) dx dy
        = h(X) − h(X|Y)    (72)

Differential entropy of X:

h(X) = E{ −log₂ f_X(X) } = − ∫_{R^N} f_X(x) log₂ f_X(x) dx    (73)

Conditional differential entropy:

h(X|Y) = E{ −log₂ f_{X|Y}(X|Y) } = − ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_{X|Y}(x|y) dx dy    (74)

Differential Entropy for a Uniform IID Source

For a continuous iid source S, the differential entropy is defined as

h(S) = E{ −log₂ f(S) } = − ∫ f(s) log₂ f(s) ds    (75)

h(S) for the uniform distribution f(s) = 1/A for −A/2 ≤ s ≤ A/2:

h(S) = − ∫_{−A/2}^{A/2} (1/A) log₂(1/A) ds = (log₂ A) (1/A) ∫_{−A/2}^{A/2} ds = log₂ A    (76)

Differential entropy can become negative (in contrast to discrete entropy).

[Figure: left, uniform pdfs f(s) = 1/A for A = 0.5, 1, 2; right, h(S) = log₂ A as a function of A, negative for A < 1.]

Differential Entropy for a Gaussian IID Source

Gaussian iid process:

f_S(s) = (1/√(2πσ²)) e^{−s²/(2σ²)}    (77)

Differential entropy:

h(S) = − ∫ f_S(s) log₂ f_S(s) ds
     = − ∫ f_S(s) ( −(s²/(2σ²)) log₂ e − log₂ √(2πσ²) ) ds
     = (log₂ e) · E{S²}/(2σ²) + (1/2) log₂(2πσ²)
     = (1/2) log₂ e + (1/2) log₂(2πσ²)
     = (1/2) log₂(2πeσ²)  bits    (78)
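Eqs. (76) and (78) can be checked numerically. In the sketch below, the function names and the integration grid are assumptions; it evaluates the closed forms and verifies the Gaussian case with a trapezoidal integral of −f log₂ f:

```python
import numpy as np

def h_uniform(A):
    """Differential entropy log2(A) of a uniform pdf on [-A/2, A/2], Eq. (76)."""
    return np.log2(A)

def h_gauss(var):
    """Differential entropy 0.5*log2(2*pi*e*var) of a Gaussian pdf, Eq. (78)."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def h_numeric(pdf, lo, hi, n=200001):
    """Numerical check: -integral of f log2 f over [lo, hi] (trapezoidal rule)."""
    s = np.linspace(lo, hi, n)
    f = pdf(s)
    integrand = np.where(f > 0, -f * np.log2(np.maximum(f, 1e-300)), 0.0)
    return np.trapz(integrand, s)

print(h_uniform(0.5))  # -1.0: differential entropy can be negative
print(h_gauss(1.0))    # ≈ 2.047 bits
print(h_numeric(lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi), -10, 10))  # ≈ 2.047
```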

N-th Order Differential Entropy

N-th order differential entropy:

h_N(S) = h(S^(N)) = h(S_0, ..., S_{N−1})    (79)

Differential entropy rate:

h̄(S) = lim_{N→∞} h_N(S)/N = lim_{N→∞} h(S_0, ..., S_{N−1})/N    (80)

N-th order pdf of a stationary Gaussian process:

f_G(s) = (1/((2π)^{N/2} |C_N|^{1/2})) e^{−(1/2)(s−μ_N)^T C_N^{−1}(s−μ_N)}    (81)

Differential entropy of the N-th order stationary Gaussian process:

h_N^(G)(S) = − ∫_{R^N} f_G(s) log₂ f_G(s) ds
           = (1/2) log₂((2π)^N |C_N|) + (log₂ e)/2 · ∫_{R^N} f_G(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds    (82)

Differential Entropy of the N-th Order Stationary Gaussian Process

N-th order random process: pdf f(s), mean μ_N, covariance matrix C_N. The quadratic-form integral in Eq. (82) evaluates to N:

∫_{R^N} f(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds
  = E{ (S − μ_N)^T C_N^{−1} (S − μ_N) }
  = E{ Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (S_i − μ_i) (C_N^{−1})_{i,j} (S_j − μ_j) }
  = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} E{ (S_i − μ_i)(S_j − μ_j) } (C_N^{−1})_{i,j}
  = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (C_N)_{j,i} (C_N^{−1})_{i,j}
  = Σ_{j=0}^{N−1} (C_N C_N^{−1})_{j,j}
  = N    (83)

Differential Entropy of the N-th Order Stationary Gaussian Process (continued)

Differential entropy of the N-th order stationary Gaussian process:

h_N^(G)(S) = (1/2) log₂((2π)^N |C_N|) + N · (log₂ e)/2
           = (1/2) log₂((2πe)^N |C_N|)    (84)

The N-th order differential entropy of any stationary non-Gaussian process is less than that of a stationary Gaussian process with the same covariance matrix:

h_N(S) = − ∫_{R^N} f(s) log₂ f(s) ds
       ≤ − ∫_{R^N} f(s) log₂ f_G(s) ds
       = (1/2) log₂((2π)^N |C_N|) + (log₂ e)/2 · ∫_{R^N} f(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds
       = h_N^(G)(S)    (85)

Eigendecomposition of the Covariance Matrix

The determinant |C_N| is the product of the eigenvalues ξ_i^(N) of the matrix C_N:

C_N = A_N Ξ_N A_N^T  →  |C_N| = |A_N| · |Ξ_N| · |A_N^T| = ∏_{i=0}^{N−1} ξ_i^(N)    (86)

(using |A_N| = |A_N^T| = 1)

A_N: matrix whose columns are built by the N unit-norm eigenvectors,

A_N = ( v_0^(N), v_1^(N), ..., v_{N−1}^(N) )    (87)

Ξ_N: diagonal matrix that contains the N eigenvalues of C_N on its main diagonal,

Ξ_N = diag( ξ_0^(N), ξ_1^(N), ..., ξ_{N−1}^(N) )    (88)

Maximum Differential Entropy

Inequality of arithmetic and geometric means:

( ∏_{i=0}^{N−1} x_i )^{1/N} ≤ (1/N) Σ_{i=0}^{N−1} x_i    (89)

with equality if and only if x_0 = x_1 = ... = x_{N−1} (the geometric mean is maximized).
Application to the product of eigenvalues:

|C_N| = ∏_{i=0}^{N−1} ξ_i ≤ ( (1/N) Σ_{i=0}^{N−1} ξ_i )^N = σ^{2N}    (90)

→ Equality holds if and only if all eigenvalues of C_N are the same, i.e., if and only if the random process is iid.
The N-th order differential entropy of a stationary process S with variance σ² is upper bounded by the differential entropy of a Gaussian iid process with the same variance:

h_N(S) ≤ (N/2) log₂(2πeσ²)    (91)

Shannon Lower Bound

R(D) is nicely defined, but its computation is actually not so simple.
The Shannon lower bound provides a lower bound for R(D):

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(S; S′)/N
         = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} [ h_N(S) − h_N(S|S′) ]/N
         = lim_{N→∞} h_N(S)/N − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S|S′)/N
         = h̄(S) − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S − S′|S′)/N    (92)

Shannon lower bound:

R_L(D) = h̄(S) − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S − S′)/N    (93)

Since conditioning may only reduce differential entropy, h_N(S − S′|S′) ≤ h_N(S − S′), and therefore

R(D) ≥ R_L(D)    (94)

A Gaussian iid process has maximum differential entropy:

h_N(Z) = (N/2) log₂(2πe σ_Z²)    (95)

Setting Z = S − S′ and D = σ_Z²:

h_N(S − S′) ≤ (N/2) log₂(2πe D)    (96)

(with equality for a Gaussian iid difference process)
Shannon lower bound for MSE distortion:

R_L(D) = h̄(S) − (1/2) log₂(2πeD)        D_L(R) = (1/(2πe)) · 2^{2h̄(S)} · 2^{−2R}    (97)

Shannon lower bound for MSE distortion and iid sources:

R_L(D) = h(S) − (1/2) log₂(2πeD)        D_L(R) = (1/(2πe)) · 2^{2h(S)} · 2^{−2R}    (98)

Shannon Lower Bound for Various IID Sources

R_L(D) and D_L(R) differ only in h(S) for the various distributions.
Uniform pdf:

h(S) = (1/2) log₂(12σ²)  →  D_L(R) = (6/(πe)) · σ² · 2^{−2R}  ,  6/(πe) ≈ 0.7    (99)

Laplacian pdf:

h(S) = (1/2) log₂(2e²σ²)  →  D_L(R) = (e/π) · σ² · 2^{−2R}  ,  e/π ≈ 0.865    (100)

Gaussian pdf:

h(S) = (1/2) log₂(2πeσ²)  →  D_L(R) = σ² · 2^{−2R}    (101)
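A small sketch comparing the three bounds, with the constants as derived in Eqs. (99)-(101); the function names are assumptions:

```python
import numpy as np

# Shannon lower bound D_L(R) = c * sigma^2 * 2^(-2R), Eqs. (99)-(101);
# the constant c depends on the source pdf only through h(S).
C = {"uniform": 6 / (np.pi * np.e),   # ≈ 0.703
     "laplacian": np.e / np.pi,       # ≈ 0.865
     "gaussian": 1.0}

def D_L(R, var=1.0, pdf="gaussian"):
    return C[pdf] * var * 2.0 ** (-2.0 * R)

def snr_db(R, pdf="gaussian", var=1.0):
    return 10 * np.log10(var / D_L(R, var, pdf))

for pdf in ("uniform", "laplacian", "gaussian"):
    print(pdf, round(snr_db(2.0, pdf), 2), "dB at 2 bits")
# The Gaussian source has the largest D_L (smallest SNR): hardest iid source.
```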

Asymptotic Tightness of the Shannon Lower Bound

The Shannon lower bound approaches the rate distortion function for small distortions (i.e., high rates):

lim_{D→0} [ R(D) − R_L(D) ] = 0    (102)

Comparison of D(R) with D_L(R) for the Laplacian iid source:

[Figure: D(R) and D_L(R) for the Laplacian iid source.]

R_L(D) for Various IID Sources

Shannon lower bound R_L(D) for
- the uniform iid process (red)
- the Laplacian iid process (green)
- the Gaussian iid process (blue)
Left: MSE vs. rate; right: SNR vs. rate, with

SNR = 10 log₁₀( σ² / MSE )    (103)

[Figure: left, MSE versus rate in bit/symbol; right, SNR in dB versus rate in bit/symbol, for the three iid sources.]

R_L(D) for Gaussian Sources with Memory

Differential entropy for Gaussian sources

h_N^(G)(S) = (1/2) log₂((2πe)^N |C_N|)    (104)

Shannon lower bound for MSE distortion:

R_L(D) = lim_{N→∞} h_N^(G)(S)/N − (1/2) log₂(2πeD)
       = lim_{N→∞} log₂((2πe)^N |C_N|)/(2N) − (1/2) log₂(2πeD)
       = (1/2) log₂(2πe) + lim_{N→∞} log₂|C_N|/(2N) − (1/2) log₂(2πeD)
       = lim_{N→∞} log₂|C_N|/(2N) − (1/2) log₂ D
       = lim_{N→∞} (1/(2N)) Σ_{i=0}^{N−1} log₂ ξ_i^(N) − (1/2) log₂ D    (105)

Grenander and Szegő's Theorem

Assume a zero-mean process: C_N = R_N.
Given the conditions:
- R_N is a sequence of Hermitian Toeplitz matrices with elements φ_k on the k-th diagonal
- the infimum Φ_inf = inf_ω Φ(ω) and supremum Φ_sup = sup_ω Φ(ω) of the Fourier series

  Φ(ω) = Σ_{k=−∞}^{∞} φ_k e^{−jωk}    (106)

  are finite
- the function G is continuous in the interval [Φ_inf, Φ_sup]
the following expression holds:

lim_{N→∞} (1/N) Σ_{i=0}^{N−1} G(ξ_i^(N)) = (1/(2π)) ∫_{−π}^{π} G(Φ(ω)) dω    (107)

where ξ_i^(N), for i = 0, 1, ..., N−1, denote the eigenvalues of the N-th matrix R_N.
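The theorem can be illustrated numerically with G = log₂ and the Gauss-Markov autocorrelation φ[k] = ρ^{|k|} used on the following slides; the matrix size and frequency grid below are arbitrary assumptions:

```python
import numpy as np

# Numerical check of Eq. (107) with G = log2, for the autocorrelation
# phi[k] = rho**|k| (sigma^2 = 1); assumptions: N = 512, uniform
# frequency grid for the right-hand side.
rho, N = 0.9, 512
R_N = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
xi = np.linalg.eigvalsh(R_N)          # eigenvalues of the Toeplitz matrix

lhs = np.mean(np.log2(xi))            # (1/N) sum of G(xi_i)

w = np.linspace(-np.pi, np.pi, 1 << 16)
Phi = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)
rhs = np.mean(np.log2(Phi))           # (1/2pi) integral of G(Phi(w))

print(lhs, rhs)   # both approach log2(1 - rho^2) ≈ -2.396 as N grows
```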

Power Spectral Density of a Gauss-Markov Process

Zero-mean Gauss-Markov process with |ρ| < 1:

S_n = Z_n + ρ · S_{n−1}    (108)

Auto-correlation function:

φ[k] = σ² · ρ^{|k|}    (109)

Using the relationship

Σ_{k=1}^{∞} a^k e^{−jkx} = a / (e^{jx} − a)    (110)

we obtain

Φ_SS(ω) = Σ_{k=−∞}^{∞} φ[k] · e^{−jωk}
        = σ² · Σ_{k=−∞}^{∞} ρ^{|k|} · e^{−jωk}
        = σ² · ( 1 + ρ/(e^{jω} − ρ) + ρ/(e^{−jω} − ρ) )
        = σ² · (1 − ρ²) / (1 − 2ρ cos ω + ρ²)    (111)

R_L(D) for Gauss-Markov Processes

Shannon lower bound for a zero-mean Gauss-Markov process with |ρ| < 1:

R_L(D) = (1/(4π)) ∫_{−π}^{π} log₂( Φ_SS(ω)/D ) dω
       = (1/(4π)) ∫_{−π}^{π} log₂( σ²(1 − ρ²)/D ) dω − (1/(4π)) ∫_{−π}^{π} log₂(1 − 2ρ cos ω + ρ²) dω
       = (1/2) log₂( σ²(1 − ρ²)/D )    (112)

(the second integral is zero), where we used

∫_0^π ln(a² − 2ab cos x + b²) dx = 2π ln a    (113)

Shannon lower bound for the information distortion rate function:

D_L(R) = (1 − ρ²) · σ² · 2^{−2R}    (114)

R(D) for Gaussian IID Sources I

Goal: derive the information rate distortion function for Gaussian iid sources.
For distortions D < σ², choose

f_{S′}(s′) = (1/√(2π(σ² − D))) · e^{−(s′−μ)²/(2(σ²−D))}    (115)
f_{Z|S′}(z|s′) = (1/√(2πD)) · e^{−z²/(2D)} = f_Z(z)    (116)

with

S = S′ + Z    (117)

The random variables S′ and Z are independent, since f_{Z|S′}(z|s′) = f_Z(z). Hence, the pdf f_S is given by the convolution of f_{S′} and f_Z.
The pdfs f_{S′}(s′) and f_Z(z) are possible choices, since the convolution of two Gaussians yields a Gaussian,

f_S(s) = f_{S′}(s′) ∗ f_Z(z)    (118)

with means and variances adding up:

μ = μ + 0  and  σ² = (σ² − D) + D    (119)

R(D) for Gaussian IID Sources II

The distortion is given by the variance of the difference process Z_n = S_n − S′_n:

δ_1(g_1) = E{ (S_n − S′_n)² } = E{ Z_n² } = D    (120)

Mutual information:

I_1(g_1) = h(S_n) − h(S_n|S′_n)    (121)
         = h(S_n) − h(S_n − S′_n|S′_n)    (122)
         = h(S_n) − h(Z_n|S′_n)    (123)
         = h(S_n) − h(Z_n)    (124)
         = (1/2) log₂(2πeσ²) − (1/2) log₂(2πeD)    (125)

R(D) = (1/2) log₂( σ²/D )    (126)

The information rate distortion function coincides with the Shannon lower bound for Gaussian iid sources.

R(D) for Gaussian IID Sources III

We assume a memoryless Gaussian source with variance σ².
For the memoryless Gaussian source, the Shannon lower bound coincides with the rate distortion function.
The information rate distortion function is given as

R(D) = (1/2) log₂( σ²/D )  for 0 ≤ D ≤ σ²,   0  for D > σ²    (127)

The information distortion rate function is given as

D(R) = σ² · 2^{−2R}    (128)

The signal-to-noise ratio (SNR) is given as

SNR(R) = 10 · log₁₀( σ²/D(R) ) = 10 · log₁₀ 2^{2R} ≈ 6R dB    (129)
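A quick numerical check of Eqs. (127)-(129), illustrating the ≈ 6 dB/bit rule; function names are assumptions:

```python
import numpy as np

def R_gauss(D, var):
    """Rate distortion function of Eq. (127) for a memoryless Gaussian source."""
    D = np.asarray(D, dtype=float)
    return np.where(D < var, 0.5 * np.log2(var / np.maximum(D, 1e-300)), 0.0)

def D_gauss(R, var):
    """Distortion rate function of Eq. (128)."""
    return var * 2.0 ** (-2.0 * np.asarray(R, dtype=float))

var = 1.0
for R in (1, 2, 3, 4):
    snr = 10 * np.log10(var / D_gauss(R, var))
    print(f"R = {R} bit: SNR = {snr:.2f} dB")   # ≈ 6.02 dB per bit, Eq. (129)
```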

R(D) for a Gaussian Source with Memory

Assume a stationary Gaussian random process with zero mean and pdf

f_S^(G)(s) = (1/((2π)^{N/2} |C_N|^{1/2})) e^{−(1/2)(s−μ_N)^T C_N^{−1}(s−μ_N)}    (130)

Eigendecomposition of the covariance matrix C_N:

C_N = A_N Ξ_N A_N^T    (131)

A_N: matrix whose columns are built by the N unit-norm eigenvectors,

A_N = ( v_0^(N), v_1^(N), ..., v_{N−1}^(N) )    (132)

Ξ_N: diagonal matrix that contains the N eigenvalues of C_N on its main diagonal,

Ξ_N = diag( ξ_0^(N), ξ_1^(N), ..., ξ_{N−1}^(N) )    (133)

Signal Space Rotation

Given the stationary Gaussian source {S_n}: construct a source {U_n} by decomposing {S_n} into vectors S of size N and applying the transform

U = A_N^{−1}(S − μ_N) = A_N^T (S − μ_N)    (134)

A linear transformation of a Gaussian random vector results in another Gaussian random vector.
The chosen transform yields independent random variables U_i:

f_U(u) = (1/((2π)^{N/2} |Ξ_N|^{1/2})) e^{−(1/2) u^T Ξ_N^{−1} u} = ∏_{i=0}^{N−1} (1/√(2πξ_i^(N))) e^{−u_i²/(2ξ_i^(N))}    (135)

Mean:

E{U} = A_N^T ( E{S} − μ_N ) = A_N^T ( μ_N − μ_N ) = 0    (136)

Covariance:

E{UU^T} = A_N^T E{ (S − μ_N)(S − μ_N)^T } A_N = A_N^T C_N A_N = Ξ_N    (137)

Distortion and Mutual Information

The inverse transform after compression is identical to the forward transform:

S′ = A_N U′ + μ_N    (138)

with

U − U′ = A_N^T (S − S′)  ↔  S − S′ = A_N (U − U′)    (139)

MSE distortion between any realization s of S and its reconstruction s′:

d_N(s; s′) = (1/N) Σ_{i=0}^{N−1} (s_i − s′_i)² = (1/N) (s − s′)^T (s − s′)
           = (1/N) (u − u′)^T A_N^T A_N (u − u′) = (1/N) (u − u′)^T (u − u′)
           = (1/N) Σ_{i=0}^{N−1} (u_i − u′_i)² = d_N(u; u′)    (140)

Since A_N^T A_N = I_N (the identity matrix):

I_N(S; S′) = I_N(U; U′)    (141)

Distortion Rate Function

Mutual information and average distortion, considering the independence of the components U_i:

I_N(g_N^Q) = Σ_{i=0}^{N−1} I_1(g_i^Q)  and  δ_N(g_N^Q) = (1/N) Σ_{i=0}^{N−1} δ_1(g_i^Q)    (142)

N-th order distortion rate function D_N(R):

D_N(R) = (1/N) Σ_{i=0}^{N−1} D_i(R_i)  with  R = (1/N) Σ_{i=0}^{N−1} R_i    (143)

D_i(R_i): first-order distortion rate function of a Gaussian iid process for component U_i,

D_i(R_i) = σ_i² · 2^{−2R_i} = ξ_i^(N) · 2^{−2R_i}    (144)

with ξ_i^(N) being the eigenvalues of C_N.
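The eigendecomposition facts used here, Eqs. (86) and (131), can be verified numerically for the covariance matrix of a Gauss-Markov source; the choice N = 8, ρ = 0.9 is an arbitrary assumption:

```python
import numpy as np

# Covariance matrix of a Gauss-Markov source, (C_N)_{i,j} = var * rho**|i-j|
N, var, rho = 8, 1.0, 0.9
C = var * rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

xi, A = np.linalg.eigh(C)          # eigenvalues xi_i, orthonormal eigenvectors A_N
print(np.allclose(A @ np.diag(xi) @ A.T, C))  # Eq. (131): C_N = A_N Xi_N A_N^T
print(np.prod(xi), np.linalg.det(C))          # Eq. (86): |C_N| = product of eigenvalues
```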

Optimal Bit Allocation

Task:

min_{R_0, R_1, ..., R_{N−1}} D_N(R) = (1/N) Σ_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i}  such that  R ≥ (1/N) Σ_{i=0}^{N−1} R_i

Comparison of different types of mean computations:

D_N(R) = (1/N) Σ_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i} ≥ ( ∏_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i} )^{1/N} = ( ∏_{i=0}^{N−1} ξ_i^(N) )^{1/N} · 2^{−2R} = ξ̃^(N) · 2^{−2R}

with ξ̃^(N) = ( ∏_{i=0}^{N−1} ξ_i^(N) )^{1/N} = |C_N|^{1/N} and ∏_{i=0}^{N−1} 2^{−2R_i} = 2^{−2R_0} · 2^{−2R_1} ··· 2^{−2R_{N−1}} = 2^{−Σ_{i=0}^{N−1} 2R_i} = 2^{−2NR}
The expression on the right-hand side of the above inequality is constant: equality is achieved when all terms ξ_i^(N) 2^{−2R_i} = ξ̃^(N) 2^{−2R}, i.e.,

R_i = R + (1/2) log₂( ξ_i^(N)/ξ̃^(N) ) = (1/2) log₂( ξ_i^(N)/(ξ̃^(N) 2^{−2R}) )

Parametric Formulation

So far, we have ignored that R_i cannot be less than 0:

R_i = (1/2) log₂( ξ_i^(N)/(ξ̃^(N) 2^{−2R}) ) ≥ 0  →  R_i = 0  if  ξ_i^(N) ≤ ξ̃^(N) 2^{−2R}    (145)

Introducing the parameter θ ≥ 0:

R_i = (1/2) log₂( ξ_i^(N)/θ ) for 0 ≤ θ ≤ ξ_i^(N), and 0 for θ ≥ ξ_i^(N);  D_i = θ for 0 ≤ θ ≤ ξ_i^(N), and ξ_i^(N) for θ ≥ ξ_i^(N)

or, equivalently,

R_i(θ) = max( 0, (1/2) log₂( ξ_i^(N)/θ ) )  and  D_i(θ) = min( ξ_i^(N), θ )    (146)

Parametric expressions:

D_N(θ) = (1/N) Σ_{i=0}^{N−1} D_i = (1/N) Σ_{i=0}^{N−1} min( ξ_i^(N), θ )    (147)
R_N(θ) = (1/N) Σ_{i=0}^{N−1} R_i = (1/N) Σ_{i=0}^{N−1} max( 0, (1/2) log₂( ξ_i^(N)/θ ) )    (148)

R^(I)(D) for N → ∞

Recall Grenander and Szegő's theorem for infinite Toeplitz matrices:

lim_{N→∞} (1/N) Σ_{i=0}^{N−1} G(ξ_i^(N)) = (1/(2π)) ∫_{−π}^{π} G(Φ(ω)) dω    (149)

Parametric expressions:

lim_{N→∞} D_N(θ) = lim_{N→∞} (1/N) Σ_{i=0}^{N−1} min( ξ_i^(N), θ )    (150)
lim_{N→∞} R_N(θ) = lim_{N→∞} (1/N) Σ_{i=0}^{N−1} max( 0, (1/2) log₂( ξ_i^(N)/θ ) )    (151)

With the spectral density function Φ_ss(ω):

D(θ) = (1/(2π)) ∫_{−π}^{π} min( Φ_ss(ω), θ ) dω
R(θ) = (1/(2π)) ∫_{−π}^{π} max( 0, (1/2) log₂( Φ_ss(ω)/θ ) ) dω    (152)

Illustration of Minimization Approach

"ss (#) reconstruction error spectrum

preserved spectrum "s s ($)

white noise " "

! no signal transmitted

Interpretation: at each frequency, the variance of the process as given by the spectral density function Φ_ss(ω) is compared to the parameter θ, which represents the mean squared error for that frequency component.

When Φ_ss(ω) is larger than θ, the rate (1/2) log₂( Φ_ss(ω)/θ ) is assigned; otherwise, zero rate is assigned to that frequency component.
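The parametric solution of Eq. (152) can be evaluated by sweeping θ. The following sketch approximates the integrals by means over a uniform frequency grid (grid size and the sampled PSD are assumptions) and checks the small-distortion regime against the closed form of Eq. (112):

```python
import numpy as np

def rd_point(psd, theta):
    """Parametric (R, D) of Eq. (152), with the integrals replaced by means
    over uniformly spaced samples of the power spectral density."""
    D = np.mean(np.minimum(psd, theta))
    R = np.mean(np.maximum(0.0, 0.5 * np.log2(psd / theta)))
    return R, D

def rd_curve(psd, num=50):
    """Sweep theta from max(psd) (R = 0) towards 0 (R -> infinity)."""
    thetas = np.geomspace(psd.max(), psd.max() * 1e-4, num)
    return np.array([rd_point(psd, t) for t in thetas])

# Gauss-Markov PSD of Eq. (111), sigma^2 = 1, rho = 0.9
w = np.linspace(-np.pi, np.pi, 4097)
rho, var = 0.9, 1.0
psd = var * (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)

curve = rd_curve(psd)
# For small D all rate terms are active and R(D) = 0.5*log2(var*(1-rho^2)/D)
R, D = curve[-1]
print(R, 0.5 * np.log2(var * (1 - rho**2) / D))  # nearly equal
```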

R(D) for a Gauss-Markov Process I

Calculate R(D) for a Gauss-Markov process with |ρ| < 1 and variance σ²:

S_n = Z_n + ρ · S_{n−1}    (153)

The auto-correlation function and spectral density function are given as

φ[k] = σ² ρ^{|k|}        Φ(ω) = Σ_{k=−∞}^{∞} φ[k] e^{−jkω} = σ²(1 − ρ²)/(1 − 2ρ cos ω + ρ²)    (154)

All rate terms in the parametric formulation of R(D) are positive when

D/σ² ≤ (1 − ρ)/(1 + ρ)    (155)

and R(D) can be written as

R(D) = (1/2) log₂( σ²(1 − ρ²)/D )    (156)
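A short numerical illustration of Eq. (156) and its inverse (the distortion rate function of Eq. (114)); the SNR gain over the iid Gaussian source is 10 log₁₀(1/(1 − ρ²)) dB. Parameter choices below are assumptions:

```python
import numpy as np

def D_gauss_markov(R, var, rho):
    """Distortion rate function (1 - rho^2) * var * 2**(-2R), Eq. (114),
    valid in the regime where all rate terms are positive."""
    return (1 - rho**2) * var * 2.0 ** (-2.0 * np.asarray(R, dtype=float))

var, R = 1.0, 2.0
for rho in (0.0, 0.5, 0.9):           # R = 2 > log2(1 + rho) in all cases
    snr = 10 * np.log10(var / D_gauss_markov(R, var, rho))
    print(f"rho = {rho}: SNR = {snr:.2f} dB at R = {R} bits")
# Memory (rho > 0) adds 10*log10(1/(1 - rho^2)) dB over the iid Gaussian source.
```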

R(D) for a Gauss-Markov Process II

The corresponding distortion rate function for R ≥ log₂(1 + ρ) is given by

D(R) = (1 − ρ²) · σ² · 2^{−2R}    (157)

This includes the result for Gaussian iid sources (ρ = 0).
Below: R(D) for Gauss-Markov sources for different values of ρ.

[Figure: SNR in dB versus R in bits for Gauss-Markov sources with ρ = 0, 0.5, 0.78, 0.9, 0.95, 0.99.]

Summary on Rate Distortion Theory

Rate distortion theory: minimum bit-rate for a given distortion.
The operational rate distortion function is difficult to compute; derive the information rate distortion function instead and show that the two coincide.
R(D) for discrete sources is a convex and continuous function, decreasing from (D = 0, R = H(S)) to (D = D_max, R = 0).
R(D) for amplitude-continuous sources: similar to R(D) for discrete sources, except that R(D) → ∞ for D → 0.
The Shannon lower bound R_L(D) can often be computed as an analytical lower bound for R(D).
R(D) for Gaussian iid sources and MSE: 6 dB/bit.
Any source other than the Gaussian iid source with equal variance requires fewer bits for the same mean squared error.
R(D) for Gaussian sources with memory and MSE:
- encode spectral components independently
- introduce white noise
- suppress small spectral components
