Rate Distortion Theory

Lossy coding: the decoded signal is an approximation of the original.
Rate distortion theory is a theoretical discipline treating data compression from the viewpoint of information theory.
The results of rate distortion theory are obtained without consideration of a specific coding method.
The goal of rate distortion theory is to calculate the minimum transmission bit rate for a given distortion and source.
Below: example of a rate distortion function for a discrete iid source.

[Figure: rate distortion function R(D) for a discrete iid source, running from the point (D_min = 0, R = H(S)) down to (D = D_max, R = 0).]

Outline

Transmission System and Variables
Rate Distortion Function for Discrete Random Variables
- Definition: Source and Decoded Symbols, Coder/Decoder, Distortion, Rate
- Rate Distortion Function and Properties
- Rate Distortion Function for Binary IID Sources
Rate Distortion Function for Amplitude-Continuous Random Variables
- Definition: Source and Decoded Symbols, Coder/Decoder, Distortion, Rate
- Rate Distortion Function
- Shannon Lower Bound
- Rate Distortion Function for Memoryless Gaussian Sources
- Rate Distortion Function for Gaussian Sources with Memory

Transmission System and Variables

Transmission system

[Figure: transmission system Source → Coder → Decoder → Sink, with source signal S, decoded signal S′, distortion D, and bit-rate R.]

Derivation in two steps:
1. Define S, S′, coder/decoder, distortion D, and rate R
2. Establish a functional relationship between S, S′, D, and R
For two types of random variables:
1. Derivation for discrete random variables
2. Derivation for amplitude-continuous random variables (Gaussian, Laplacian, etc.)

Operational Rate Distortion Function

Encoder:
- Irreversible encoder mapping α : s → i
- Lossless mapping γ : i → b
Decoder:
- Lossless mapping γ⁻¹ : b → i
- Decoder mapping β : i → s′

Source code

Source code: Q = (α, β, γ)

N-dimensional block source code: Q_N = (α_N, β_N, γ_N)
Blocks of N consecutive input samples are independently coded.
Each block of input samples s^(N) = {s_0, ..., s_{N−1}} is mapped to a vector of K quantization indexes

i^(K) = α_N(s^(N))    (1)

The resulting vector of indexes i^(K) is converted into a bit sequence

b^(ℓ) = γ_N(i^(K)) = γ_N(α_N(s^(N)))    (2)

At the decoder side, the index vector is recovered

i^(K) = γ_N^{−1}(b^(ℓ)) = γ_N^{−1}(γ_N(i^(K)))    (3)

The index vector is mapped to a block of reconstructed samples s′^(N) = {s′_0, ..., s′_{N−1}}

s′^(N) = β_N(i^(K)) = β_N(α_N(s^(N)))    (4)
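As an illustration, the chain of mappings in Eqs. (1)-(4) can be realized by a toy block code. The following Python sketch is purely illustrative; the quantizer step size, the 3-bit index range, and all function names are assumptions, not part of the lecture material:

```python
import numpy as np

# Toy block source code Q_N = (alpha_N, beta_N, gamma_N): uniform scalar
# quantization of each sample (so K = N) with fixed-length 3-bit indexes.
DELTA, BITS = 0.5, 3

def alpha(s):                      # encoder mapping alpha_N: samples -> indexes, Eq. (1)
    return np.clip(np.round(np.asarray(s) / DELTA).astype(int) + 4, 0, 7)

def gamma(i):                      # lossless mapping gamma_N: indexes -> bits, Eq. (2)
    return "".join(format(k, f"0{BITS}b") for k in i)

def gamma_inv(b):                  # inverse mapping: bits -> indexes, Eq. (3)
    return [int(b[k:k + BITS], 2) for k in range(0, len(b), BITS)]

def beta(i):                       # decoder mapping beta_N: indexes -> samples, Eq. (4)
    return (np.asarray(i) - 4) * DELTA

s = np.array([0.1, -0.7, 1.3])
b = gamma(alpha(s))                # bit sequence b
s_rec = beta(gamma_inv(b))         # reconstructed block s'
print(b, s_rec)                    # 100011111 [ 0.  -0.5  1.5]
```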

Distortion I

This lecture: additive distortion measures

d_1(s, s′) ≥ 0  with equality if and only if s = s′    (5)

Examples:

Hamming distance:  d_1(s, s′) = 0 for s = s′, and 1 for s ≠ s′    (6)
Mean squared error:  d_1(s, s′) = (s − s′)²    (7)

Distortion for s^(N) = {s_0, s_1, ..., s_{N−1}} and s′^(N) = {s′_0, s′_1, ..., s′_{N−1}}:

d_N(s^(N), s′^(N)) = (1/N) Σ_{i=0}^{N−1} d_1(s_i, s′_i)    (8)
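These additive measures translate directly into code. A minimal Python sketch (the function names are my own) of the per-symbol distortions of Eqs. (6)-(7) and the block distortion of Eq. (8):

```python
import numpy as np

def hamming(s, s_rec):
    """Per-symbol Hamming distortion d_1(s, s') from Eq. (6)."""
    return (np.asarray(s) != np.asarray(s_rec)).astype(float)

def mse(s, s_rec):
    """Per-symbol squared error d_1(s, s') = (s - s')^2 from Eq. (7)."""
    return (np.asarray(s, dtype=float) - np.asarray(s_rec, dtype=float)) ** 2

def block_distortion(s, s_rec, d1):
    """Additive block distortion d_N of Eq. (8): average of d_1 over the block."""
    return float(np.mean(d1(s, s_rec)))

# Example: distortion of a block under both measures
s     = [0, 1, 1, 0, 1]
s_rec = [0, 1, 0, 0, 1]
print(block_distortion(s, s_rec, hamming))  # 0.2
print(block_distortion(s, s_rec, mse))      # 0.2 (same here, binary values)
```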

Distortion II

Stationary random process S = {S_n} and N-dimensional block source code Q_N = (α_N, β_N, γ_N):

δ(Q_N) = E{ d_N(S^(N), β_N(α_N(S^(N)))) }    (9)
       = ∫_{R^N} f(s) d_N(s, β_N(α_N(s))) ds    (10)

Arbitrary random process S = {S_n} and arbitrary code Q:

δ(Q) = lim_{N→∞} E{ d_N(S^(N), β_N(α_N(S^(N)))) }    (11)

Rate

Average number of bits per input symbol (| · | denotes the number of bits):

r_N(s^(N)) = (1/N) |γ_N(α_N(s^(N)))|  with  b^(ℓ) = γ_N(α_N(s^(N)))    (12)

Stationary random process S = {S_n} and N-dimensional block source code Q_N = (α_N, β_N, γ_N):

r(Q_N) = (1/N) E{ |γ_N(α_N(S^(N)))| }    (13)
       = (1/N) ∫_{R^N} f(s) |γ_N(α_N(s))| ds    (14)

Arbitrary random process S = {S_n} and arbitrary code Q:

r(Q) = lim_{N→∞} (1/N) E{ |γ_N(α_N(S^(N)))| }    (15)

Operational Rate Distortion Function

For a given source S, each code Q is associated with a rate distortion point (R, D).
A rate distortion point is achievable if there exists a code Q such that r(Q) ≤ R and δ(Q) ≤ D.
Operational rate distortion function R(D) and its inverse, the operational distortion rate function D(R):

R(D) = inf_{Q: δ(Q) ≤ D} r(Q)        D(R) = inf_{Q: r(Q) ≤ R} δ(Q)    (16)

Motivation of the Information Rate Distortion Function

The operational rate distortion function specifies a fundamental performance bound for lossy source coding.
It is difficult to evaluate

R(D) = inf_{Q: δ(Q) ≤ D} r(Q)    (17)

→ Information rate distortion function: introduced by Shannon in [Shannon, 1948, Shannon, 1959]
Obtain an expression of the rate distortion bound that involves only the distribution of the source, using mutual information.
Show that the information rate distortion function is achievable.

Example: Discrete Binary Source

Consider a discrete binary iid source with alphabet A = {0, 1} and

p(0) = p(1) = 0.5    (18)

Assume the distortion measure is the Hamming distance.
Coding scheme: 1 out of M symbols is not transmitted but guessed at the receiver side.
The rate for this experimental codec is

R_E = (M − 1)/M = 1 − 1/M  bits/symbol    (19)

The distortion is given as

D = (1/M) · (1/2 · 0 + 1/2 · 1) = 1/(2M)    (20)

Operational rate distortion function for the above code:

R_E(D) = 1 − 2D    (21)

Can we do better than the example codec?

R_E(D) is the rate distortion performance of the experimental coding scheme.
Rate distortion function R(D) for the same source:

[Figure: R_E(D) and the rate distortion function R(D), R in bits versus D; R(D) lies below R_E(D) for 0 < D < 0.5.]

Mutual Information for Discrete IID Sources

Entropy H(S) is a measure of the uncertainty about the random variable S.
Conditional entropy H(S|S′) is a measure of the uncertainty about the random variable S after observing the random variable S′.
Mutual information

I(S; S′) = H(S) − H(S|S′)    (22)

is a measure of the reduction of uncertainty about S due to the observation of S′, i.e., the average amount of information that S′ carries about S.
Mutual information for discrete random variables S ∈ A and S′ ∈ B:

I(S; S′) = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s|s′) / p(s) ]    (23)
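Eq. (23) can be evaluated numerically from a joint pmf. The following sketch is a hypothetical helper, not from the slides:

```python
import numpy as np

def mutual_information(p_joint):
    """I(S; S') in bits from a joint pmf p(s, s'), following Eq. (23).

    p_joint: 2-D array with p_joint[s, s_rec] = p(s, s'), summing to 1.
    """
    p_joint = np.asarray(p_joint, dtype=float)
    p_s = p_joint.sum(axis=1, keepdims=True)       # marginal p(s)
    p_s_rec = p_joint.sum(axis=0, keepdims=True)   # marginal p(s')
    mask = p_joint > 0                             # 0 log 0 = 0 convention
    ratio = p_joint[mask] / (p_s @ p_s_rec)[mask]  # p(s,s') / (p(s) p(s'))
    return float(np.sum(p_joint[mask] * np.log2(ratio)))

# Binary symmetric example: p(s) uniform, s' flips s with probability 0.1
p = np.array([[0.45, 0.05],
              [0.05, 0.45]])
print(mutual_information(p))  # ≈ 1 - Hb(0.1) ≈ 0.531 bits
```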

Mutual Information for Discrete IID Sources and Rate

Mutual information rewritten using Bayes’ rule

I(S; S′) = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s|s′) / p(s) ]
         = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s, s′) / (p(s) p(s′)) ]
         = Σ_{s∈A} Σ_{s′∈B} p(s, s′) log₂ [ p(s′|s) / p(s′) ]
         = H(S′) − H(S′|S)    (24)

Since H(S′|S) ≥ 0:

H(S′) ≥ I(S; S′)    (25)

Mutual Information for Discrete Sources and Rate

Consider N-dimensional random vectors S = (S_0, S_1, ..., S_{N−1})^T ∈ A^N and S′ = (S′_0, S′_1, ..., S′_{N−1})^T ∈ B^N:

I_N(S; S′) = H_N(S′) − H_N(S′|S) ≤ H_N(S′)    (26)   [since H_N(S′|S) ≥ 0]

Recall: entropy rate

H̄(S′) = lim_{N→∞} H_N(S′)/N    (27)

Rate vs. mutual information:

r(Q) ≥ lim_{N→∞} H_N(S′)/N ≥ lim_{N→∞} I_N(S; S′)/N    (28)

Description of a Codec for Discrete Sources Using Conditional Pmfs I

The statistical properties of a mapping s′ = β(α(s)) can be described by an N-th order conditional pmf g_N(s′|s).
For N > 1, g_N(s′|s) are multivariate conditional pmfs.
The pmfs g_N(s′|s) obtained by a deterministic mapping (codes) are a subset of the set of all conditional pmfs.

Example 1: Mapping s → s′ with s′ = ⌊s/∆⌋ · ∆

g_1(s′|s) = 1 for s′ = ⌊s/∆⌋ · ∆, and 0 otherwise

[Figure: g_1(s′|s) as a function of s′, a single unit probability mass at s′ = ⌊s/∆⌋ · ∆.]

Description of a Codec Using Conditional Pmfs II

Example 2: Mapping (s_n, s_{n+1}) → (s′_n, s′_{n+1}):

(s′_n, s′_{n+1}) = (1, 1) for s_n + s_{n+1} > 1;  (−1, −1) for s_n + s_{n+1} < −1;  (0, 0) otherwise

[Figure: the three decision regions in the (s_n, s_{n+1}) plane.]

The resulting first-order conditional pmf is

g_1(s′|s) = x(s) for s′ = −1;  y(s) for s′ = 0;  z(s) for s′ = 1;  0 otherwise

with x(s) + y(s) + z(s) = 1.

[Figure: g_1(s′|s) with probability masses at s′ = −1, 0, 1.]

Information Rate Distortion Function for Discrete Sources I

Let g_N(s′|s) be the N-th order conditional pmf of s′ ∈ B^N given s ∈ A^N.
Distortion:

δ_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s, s′) · d_N(s, s′)
         = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · d_N(s, s′)    (29)

Mutual information:

I_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s, s′) · log₂ [ p(s, s′) / (p(s) · p(s′)) ]
         = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / p(s′) ]    (30)

Information rate distortion function:

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)/N    (31)

Information Rate Distortion Function for Discrete Sources II

The class of conditional pmfs g_N^Q(s′|s) representing a codec Q = (α, β, γ) is a subset of all conditional pmfs g_N(s′|s):

I_N(g_N^Q) ≥ inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)    (32)

For a given maximum average distortion D, the information rate distortion function R^(I)(D) is the lower bound for the transmission bit-rate:

∀Q: δ(Q) ≤ D  ⟹  r(Q) ≥ R^(I)(D)    (33)

→ Fundamental source coding theorem
It can be shown [Cover and Thomas, 2006, p. 318] that for any ε > 0, there exists a code Q with δ(Q) ≤ D and r(Q) ≤ R^(I)(D) + ε.
The notation R(D) is used for both the operational rate distortion function and the information rate distortion function.

Distortion Rate Function for Discrete Sources

Distortion

δ_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · d_N(s, s′)    (34)

Mutual information:

I_N(g_N) = Σ_{s∈A^N} Σ_{s′∈B^N} p(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / p(s′) ]    (35)

The information distortion rate function is obtained by interchanging the roles of rate and distortion:

D(R) = lim_{N→∞} inf_{g_N: I_N(g_N)/N ≤ R} δ_N(g_N)    (36)

For a given maximum average rate R, the distortion rate function D(R) is the lower bound for the average distortion

∀Q: r(Q) ≤ R  ⟹  δ(Q) ≥ D(R)    (37)

Rate Distortion Function for Discrete IID Sources

The information rate distortion function was first derived by Shannon for iid sources [Shannon, 1948, Shannon, 1959].
For iid sources, the joint pmf is p(s) = ∏_{i=0}^{N−1} p(s_i), which yields

δ_N(g_N^Q) = δ_1(g_1^Q)  and  I_N(g_N^Q) = N · I_1(g_1^Q)    (38)

First order rate distortion function:

R_1^(I)(D) = inf_{g: δ_1(g) ≤ D} I_1(g)    (39)

First order distortion rate function

D_1(R) = inf_{g: I_1(g) ≤ R} δ_1(g)    (40)

Properties of R(D) for Discrete Sources and Additive Distortion Measures

[Figure: example of R(D) for a discrete iid source, from the point (D_min = 0, R = H(S)) to (D_max, R = 0).]

R(D) is a non-increasing and convex function of D.
There exists a value D_max such that

R(D) = 0  ∀ D ≥ D_max    (41)

For the MSE distortion measure, D_max is equal to the variance σ² of the source.
→ The minimum rate required for lossless transmission of a discrete source is equal to the entropy rate:

D_min = 0:  R(0) = H̄(S)    (42)

→ The fundamental bound for lossless coding is a special case of the fundamental bound for lossy coding.

Example: Discrete Binary IID Source I

Let a discrete binary iid source be given with probabilities p(0) = p and p(1) = 1 − p, and the Hamming distance as distortion measure.
Derive the information rate distortion function R(D) by
- deriving a lower bound
- showing that the lower bound is achievable
The maximum distortion is D(0) = min(p, 1 − p), since we can assume that all symbols have the value with the higher probability.
With ⊕ denoting modulo-2 addition (the XOR operator), derive a lower bound:

I(S; S′) = H(S) − H(S|S′)
         = H_b(p) − H(S ⊕ S′|S′)
         ≥ H_b(p) − H(S ⊕ S′)
         ≥ H_b(p) − H_b(D)    (43)

with H_b(x) = −x log₂ x − (1 − x) log₂(1 − x) being the binary entropy function and the distortion (Hamming distance) given by D = E{S ⊕ S′}

Example: Discrete Binary IID Source II

The derived lower bound can be achieved by the following choice:

p_{S′}(s′) = (p − D)/(1 − 2D)  for s′ = 0,   (1 − p − D)/(1 − 2D)  for s′ = 1    (44)

g_{S|S′}(s|s′) = 1 − D  for s = s′,   D  for s ≠ s′    (45)

It is a valid choice, since

p_S(s) = Σ_{s′=0}^{1} g_{S|S′}(s|s′) · p_{S′}(s′) = p  for s = 0,   1 − p  for s = 1    (46)

We obtain

δ_1(g) = Σ_{s,s′} p_{S′}(s′) · g_{S|S′}(s|s′) · d_1(s, s′) = D    (47)
I_1(g) = H(S) − H(S|S′) = H_b(p) − H_b(D)    (48)

Hence, the information rate distortion function R(D) is

R(D) = H_b(p) − H_b(D)  for 0 ≤ D ≤ min(p, 1 − p),   0  for D > min(p, 1 − p)    (49)
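Eq. (49) is easy to evaluate numerically. A small sketch (function names are assumptions) that also compares against the experimental codec R_E(D) = 1 − 2D from Eq. (21):

```python
import numpy as np

def Hb(x):
    """Binary entropy function Hb(x) in bits, with Hb(0) = Hb(1) = 0."""
    x = np.asarray(x, dtype=float)
    out = np.zeros_like(x)
    inside = (x > 0) & (x < 1)
    xi = x[inside]
    out[inside] = -xi * np.log2(xi) - (1 - xi) * np.log2(1 - xi)
    return out

def rate_binary(D, p):
    """R(D) of Eq. (49) for a binary iid source with P(0) = p."""
    D = np.asarray(D, dtype=float)
    return np.where(D < min(p, 1 - p), Hb(np.full_like(D, p)) - Hb(D), 0.0)

# Compare with the experimental "guess 1 out of M symbols" codec, R_E(D) = 1 - 2D
D = np.linspace(0.0, 0.5, 6)
print(rate_binary(D, p=0.5))   # R(D) = 1 - Hb(D)
print(1 - 2 * D)               # R_E(D): strictly above R(D) for 0 < D < 0.5
```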

Example: R(D) for Different Discrete Binary IID Sources

R_E(D) is the rate distortion performance of the example coding scheme.
R(D) for a binary source for several values of p:

[Figure: R_E(D) and R(D), R in bits versus D, for p = 0.1, 0.2, 0.3, 0.5.]

Operational Rate Distortion Function for Continuous Sources

Source symbols: random process S with joint pdf f(s)
Decoded symbols: random process S′ with joint pmf p(s′)
→ For a finite rate, the decoded symbols must be discrete
For stationary processes S and block codes Q_N = (α_N, β_N, γ_N):

δ_N(Q_N) = ∫_{R^N} f(s) d_N(s, β_N(α_N(s))) ds    (50)

r_N(Q_N) = (1/N) ∫_{R^N} f(s) |γ_N(α_N(s))| ds    (51)

Distortion and rate of the operational rate distortion function:

δ(Q) = lim_{N→∞} δ_N(Q_N)        r(Q) = lim_{N→∞} r_N(Q_N)    (52)

Discretization of Continuous Random Variables

Approximation f_X^(∆) of the pdf f_X:

f_X^(∆)(x) = (1/∆) ∫_{x_i}^{x_{i+1}} f_X(x′) dx′   ∀x: x_i ≤ x < x_{i+1}    (53)

Pmf p_{X_∆} for the discrete random variable X_∆:

p_{X_∆}(x_i) = ∫_{x_i}^{x_{i+1}} f_X(x′) dx′ = f_X^(∆)(x_i) · ∆    (54)

Joint pmf of two discrete approximations X∆ and Y∆

p_{X_∆ Y_∆}(x_i, y_j) = f_{XY}^(∆)(x_i, y_j) · ∆²    (55)

Mutual Information for Continuous Random Variables

Mutual information for discrete random variables X_∆ ∈ A_{X_∆} and Y_∆ ∈ A_{Y_∆}:

I(X_∆; Y_∆) = Σ_{x_i∈A_{X_∆}} Σ_{y_j∈A_{Y_∆}} p_{X_∆Y_∆}(x_i, y_j) log₂ [ p_{X_∆Y_∆}(x_i, y_j) / (p_{X_∆}(x_i) p_{Y_∆}(y_j)) ]    (56)
            = Σ_{x_i} Σ_{y_j} f_{XY}^(∆)(x_i, y_j) · ∆² · log₂ [ f_{XY}^(∆)(x_i, y_j) / (f_X^(∆)(x_i) f_Y^(∆)(y_j)) ]

As ∆ approaches zero,

I(X; Y) = lim_{∆→0} I(X_∆; Y_∆)    (57)

the piecewise constant pdf approximations f_{XY}^(∆), f_X^(∆), and f_Y^(∆) approach the pdfs f_XY, f_X, and f_Y, and

I(X; Y) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy    (58)

Mutual Information for Random Vectors

Given S = (S_0, S_1, ..., S_{N−1})^T and S′ = (S′_0, S′_1, ..., S′_{N−1})^T (written below for generic random vectors X and Y):

I(X; Y) = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy
        = ∫_{R^N} ∫_{R^N} f_X(x) f_{Y|X}(y|x) log₂ [ f_{Y|X}(y|x) / f_Y(y) ] dx dy    (59)

Assume Y is a discrete random vector with alphabet A_Y^N; with δ(·) in the following equations denoting the Dirac delta function:

f_Y(y) = Σ_{a∈A_Y^N} δ(y − a) p_Y(a)    (60)
f_{Y|X}(y|x) = Σ_{a∈A_Y^N} δ(y − a) p_{Y|X}(a|x)    (61)

Rewriting the mutual information using the above pmfs:

I(X; Y) = ∫_{R^N} f_X(x) Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ [ p_{Y|X}(a|x) / p_Y(a) ] dx    (62)

Mutual Information for Random Vectors II

Mutual information assuming Y is a discrete process:

I(X; Y) = ∫_{R^N} f_X(x) Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ [ p_{Y|X}(a|x) / p_Y(a) ] dx    (63)

This can be rewritten as

I(X; Y) = H(Y) − ∫_{R^N} f_X(x) H(Y|X = x) dx    (64)

where H(Y) is the entropy of the discrete random vector Y and

H(Y|X = x) = − Σ_{a∈A_Y^N} p_{Y|X}(a|x) log₂ p_{Y|X}(a|x)    (65)

is the conditional entropy of Y given the event {X = x}.
Since the conditional entropy H(Y|X = x) is always nonnegative, we have

I(X; Y) ≤ H(Y)    (66)

Description of a Codec for Continuous Processes Using Conditional Pdfs

As for discrete sources, the statistical properties of a mapping s′ = β(α(s)) can be described using a conditional pdf g_N(s′|s).
Example: Mapping s → s′ with s′ = ⌊s/∆⌋ · ∆:

g_1(s′|s) = δ(s′ − ⌊s/∆⌋ · ∆)

[Figure: g_1(s′|s), a Dirac impulse located at s′ = ⌊s/∆⌋ · ∆.]

For N > 1, g_N(s′|s) are multivariate conditional pdfs.
The pdfs g_N(s′|s) obtained by a deterministic mapping (codes) are a subset of the set of all conditional pdfs.

Information Rate Distortion Function for Continuous Sources II

Let g_N(s′|s) be the N-th order conditional pdf of s′ ∈ B^N given s ∈ R^N.
Distortion:

δ_N(g_N) = ∫_{R^N} ∫_{R^N} f_{SS′}(s, s′) · d_N(s, s′) ds ds′
         = ∫_{R^N} ∫_{R^N} f_S(s) · g_N(s′|s) · d_N(s, s′) ds ds′    (67)

Mutual information:

I_N(g_N) = ∫_{R^N} ∫_{R^N} f_S(s) · g_N(s′|s) · log₂ [ g_N(s′|s) / f_{S′}(s′) ] ds ds′    (68)

with

f_{S′}(s′) = ∫_{R^N} f_S(s) · g_N(s′|s) ds    (69)

Information rate distortion function:

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(g_N)/N    (70)

Properties of R^(I)(D) for Continuous Sources and Additive Distortion Measures

Every code Q that yields a distortion δ(Q) less than or equal to any given value D for a source S is associated with a rate r(Q) that is greater than or equal to the information rate distortion function R(I)(D) for the source S

∀Q: δ(Q) ≤ D  ⟹  r(Q) ≥ R^(I)(D)    (71)

Example of R(D) for an amplitude-continuous source:

[Figure: R(D) in bits versus D/σ² for a continuous amplitude source.]

Properties are similar as for discrete sources, except that R(D) approaches infinity as D approaches 0.

Differential Entropy

Mutual information of two continuous random vectors X and Y:

I(X; Y) = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_XY(x, y) / (f_X(x) f_Y(y)) ] dx dy
        = ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ [ f_{X|Y}(x|y) / f_X(x) ] dx dy
        = − ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_X(x) dx dy + ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_{X|Y}(x|y) dx dy
        = h(X) − h(X|Y)    (72)

Differential entropy of X:

h(X) = E{ −log₂ f_X(X) } = − ∫_{R^N} f_X(x) log₂ f_X(x) dx    (73)

Conditional differential entropy:

h(X|Y) = E{ −log₂ f_{X|Y}(X|Y) } = − ∫_{R^N} ∫_{R^N} f_XY(x, y) log₂ f_{X|Y}(x|y) dx dy    (74)

Differential Entropy for a Uniform IID Source

For a continuous iid source S, the differential entropy is defined as

h(S) = E{ −log₂ f(S) } = − ∫ f(s) log₂ f(s) ds    (75)

h(S) for the uniform distribution f(s) = 1/A for −A/2 ≤ s ≤ A/2:

h(S) = − ∫_{−A/2}^{A/2} (1/A) log₂(1/A) ds = (log₂ A) (1/A) ∫_{−A/2}^{A/2} ds = log₂ A    (76)

Differential entropy can become negative (in contrast to discrete entropy).

[Figure: left, uniform pdfs f(s) = 1/A for A = 0.5, 1, 2; right, h(S) = log₂ A as a function of A, negative for A < 1.]

Differential Entropy for a Gaussian IID Source

Gaussian iid process:

f_S(s) = (1/√(2πσ²)) e^{−s²/(2σ²)}    (77)

Differential entropy:

h(S) = − ∫ f_S(s) log₂ f_S(s) ds
     = − ∫ f_S(s) ( −(s²/(2σ²)) log₂ e − log₂ √(2πσ²) ) ds
     = (log₂ e) · E{S²}/(2σ²) + (1/2) log₂(2πσ²)
     = (1/2) log₂ e + (1/2) log₂(2πσ²)
     = (1/2) log₂(2πeσ²)  bits    (78)
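Eqs. (76) and (78) can be checked numerically. In the sketch below, the function names and the integration grid are assumptions; it evaluates the closed forms and verifies the Gaussian case with a trapezoidal integral of −f log₂ f:

```python
import numpy as np

def h_uniform(A):
    """Differential entropy log2(A) of a uniform pdf on [-A/2, A/2], Eq. (76)."""
    return np.log2(A)

def h_gauss(var):
    """Differential entropy 0.5*log2(2*pi*e*var) of a Gaussian pdf, Eq. (78)."""
    return 0.5 * np.log2(2 * np.pi * np.e * var)

def h_numeric(pdf, lo, hi, n=200001):
    """Numerical check: -integral of f log2 f over [lo, hi] (trapezoidal rule)."""
    s = np.linspace(lo, hi, n)
    f = pdf(s)
    integrand = np.where(f > 0, -f * np.log2(np.maximum(f, 1e-300)), 0.0)
    return np.trapz(integrand, s)

print(h_uniform(0.5))  # -1.0: differential entropy can be negative
print(h_gauss(1.0))    # ≈ 2.047 bits
print(h_numeric(lambda s: np.exp(-s**2 / 2) / np.sqrt(2 * np.pi), -10, 10))  # ≈ 2.047
```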

N-th Order Differential Entropy

N-th order differential entropy:

h_N(S) = h(S^(N)) = h(S_0, ..., S_{N−1})    (79)

Differential entropy rate:

h̄(S) = lim_{N→∞} h_N(S)/N = lim_{N→∞} h(S_0, ..., S_{N−1})/N    (80)

N-th order pdf of a stationary Gaussian process:

f_G(s) = (1/((2π)^{N/2} |C_N|^{1/2})) e^{−(1/2)(s−μ_N)^T C_N^{−1}(s−μ_N)}    (81)

Differential entropy of the N-th order stationary Gaussian process:

h_N^(G)(S) = − ∫_{R^N} f_G(s) log₂ f_G(s) ds
           = (1/2) log₂((2π)^N |C_N|) + (log₂ e)/2 · ∫_{R^N} f_G(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds    (82)

Differential Entropy of the N-th Order Stationary Gaussian Process

N-th order random process: pdf f(s), mean μ_N, covariance matrix C_N. The quadratic-form integral in Eq. (82) evaluates to N:

∫_{R^N} f(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds
  = E{ (S − μ_N)^T C_N^{−1} (S − μ_N) }
  = E{ Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (S_i − μ_i) (C_N^{−1})_{i,j} (S_j − μ_j) }
  = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} E{ (S_i − μ_i)(S_j − μ_j) } (C_N^{−1})_{i,j}
  = Σ_{i=0}^{N−1} Σ_{j=0}^{N−1} (C_N)_{j,i} (C_N^{−1})_{i,j}
  = Σ_{j=0}^{N−1} (C_N C_N^{−1})_{j,j}
  = N    (83)

Differential Entropy of the N-th Order Stationary Gaussian Process (continued)

Differential entropy of the N-th order stationary Gaussian process:

h_N^(G)(S) = (1/2) log₂((2π)^N |C_N|) + N · (log₂ e)/2
           = (1/2) log₂((2πe)^N |C_N|)    (84)

The N-th order differential entropy of any stationary non-Gaussian process is less than that of a stationary Gaussian process with the same covariance matrix:

h_N(S) = − ∫_{R^N} f(s) log₂ f(s) ds
       ≤ − ∫_{R^N} f(s) log₂ f_G(s) ds
       = (1/2) log₂((2π)^N |C_N|) + (log₂ e)/2 · ∫_{R^N} f(s) (s−μ_N)^T C_N^{−1} (s−μ_N) ds
       = h_N^(G)(S)    (85)

Eigendecomposition of the Covariance Matrix

The determinant |C_N| is the product of the eigenvalues ξ_i^(N) of the matrix C_N:

C_N = A_N Ξ_N A_N^T  →  |C_N| = |A_N| · |Ξ_N| · |A_N^T| = ∏_{i=0}^{N−1} ξ_i^(N)    (86)

(using |A_N| = |A_N^T| = 1)

A_N: matrix whose columns are built by the N unit-norm eigenvectors,

A_N = ( v_0^(N), v_1^(N), ..., v_{N−1}^(N) )    (87)

Ξ_N: diagonal matrix that contains the N eigenvalues of C_N on its main diagonal,

Ξ_N = diag( ξ_0^(N), ξ_1^(N), ..., ξ_{N−1}^(N) )    (88)

Maximum Differential Entropy

Inequality of arithmetic and geometric means:

( ∏_{i=0}^{N−1} x_i )^{1/N} ≤ (1/N) Σ_{i=0}^{N−1} x_i    (89)

with equality if and only if x_0 = x_1 = ... = x_{N−1} (the geometric mean is maximized).
Application to the product of eigenvalues:

|C_N| = ∏_{i=0}^{N−1} ξ_i ≤ ( (1/N) Σ_{i=0}^{N−1} ξ_i )^N = σ^{2N}    (90)

→ Equality holds if and only if all eigenvalues of C_N are the same, i.e., if and only if the random process is iid.
The N-th order differential entropy of a stationary process S with variance σ² is upper bounded by the differential entropy of a Gaussian iid process with the same variance:

h_N(S) ≤ (N/2) log₂(2πeσ²)    (91)

Shannon Lower Bound

R(D) is nicely defined, but its computation is actually not so simple.
The Shannon lower bound provides a lower bound for R(D):

R^(I)(D) = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} I_N(S; S′)/N
         = lim_{N→∞} inf_{g_N: δ_N(g_N) ≤ D} [ h_N(S) − h_N(S|S′) ]/N
         = lim_{N→∞} h_N(S)/N − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S|S′)/N
         = h̄(S) − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S − S′|S′)/N    (92)

Shannon lower bound:

R_L(D) = h̄(S) − lim_{N→∞} sup_{g_N: δ_N(g_N) ≤ D} h_N(S − S′)/N    (93)

Since conditioning may only reduce differential entropy, h_N(S − S′|S′) ≤ h_N(S − S′), and therefore

R(D) ≥ R_L(D)    (94)

A Gaussian iid process has maximum differential entropy:

h_N(Z) = (N/2) log₂(2πe σ_Z²)    (95)

Setting Z = S − S′ and D = σ_Z²:

h_N(S − S′) ≤ (N/2) log₂(2πe D)    (96)

(with equality for a Gaussian iid difference process)
Shannon lower bound for MSE distortion:

R_L(D) = h̄(S) − (1/2) log₂(2πeD)        D_L(R) = (1/(2πe)) · 2^{2h̄(S)} · 2^{−2R}    (97)

Shannon lower bound for MSE distortion and iid sources:

R_L(D) = h(S) − (1/2) log₂(2πeD)        D_L(R) = (1/(2πe)) · 2^{2h(S)} · 2^{−2R}    (98)

Shannon Lower Bound for Various IID Sources

R_L(D) and D_L(R) differ only in h(S) for the various distributions.
Uniform pdf:

h(S) = (1/2) log₂(12σ²)  →  D_L(R) = (6/(πe)) · σ² · 2^{−2R}  ,  6/(πe) ≈ 0.7    (99)

Laplacian pdf:

h(S) = (1/2) log₂(2e²σ²)  →  D_L(R) = (e/π) · σ² · 2^{−2R}  ,  e/π ≈ 0.865    (100)

Gaussian pdf:

h(S) = (1/2) log₂(2πeσ²)  →  D_L(R) = σ² · 2^{−2R}    (101)
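A small sketch comparing the three bounds, with the constants as derived in Eqs. (99)-(101); the function names are assumptions:

```python
import numpy as np

# Shannon lower bound D_L(R) = c * sigma^2 * 2^(-2R), Eqs. (99)-(101);
# the constant c depends on the source pdf only through h(S).
C = {"uniform": 6 / (np.pi * np.e),   # ≈ 0.703
     "laplacian": np.e / np.pi,       # ≈ 0.865
     "gaussian": 1.0}

def D_L(R, var=1.0, pdf="gaussian"):
    return C[pdf] * var * 2.0 ** (-2.0 * R)

def snr_db(R, pdf="gaussian", var=1.0):
    return 10 * np.log10(var / D_L(R, var, pdf))

for pdf in ("uniform", "laplacian", "gaussian"):
    print(pdf, round(snr_db(2.0, pdf), 2), "dB at 2 bits")
# The Gaussian source has the largest D_L (smallest SNR): hardest iid source.
```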

Asymptotic Tightness of the Shannon Lower Bound

The Shannon lower bound approaches the rate distortion function for small distortions (i.e., high rates):

lim_{D→0} [ R(D) − R_L(D) ] = 0    (102)

Comparison of D(R) with D_L(R) for the Laplacian iid source:

[Figure: D(R) and D_L(R) for the Laplacian iid source.]

R_L(D) for Various IID Sources

Shannon lower bound R_L(D) for
- the uniform iid process (red)
- the Laplacian iid process (green)
- the Gaussian iid process (blue)
Left: MSE vs. rate; right: SNR vs. rate, with

SNR = 10 log₁₀( σ² / MSE )    (103)

[Figure: left, MSE versus rate in bit/symbol; right, SNR in dB versus rate in bit/symbol, for the three iid sources.]

R_L(D) for Gaussian Sources with Memory

Differential entropy for Gaussian sources

h_N^(G)(S) = (1/2) log₂((2πe)^N |C_N|)    (104)

Shannon lower bound for MSE distortion:

R_L(D) = lim_{N→∞} h_N^(G)(S)/N − (1/2) log₂(2πeD)
       = lim_{N→∞} log₂((2πe)^N |C_N|)/(2N) − (1/2) log₂(2πeD)
       = (1/2) log₂(2πe) + lim_{N→∞} log₂|C_N|/(2N) − (1/2) log₂(2πeD)
       = lim_{N→∞} log₂|C_N|/(2N) − (1/2) log₂ D
       = lim_{N→∞} (1/(2N)) Σ_{i=0}^{N−1} log₂ ξ_i^(N) − (1/2) log₂ D    (105)

Grenander and Szegő's Theorem

Assume a zero-mean process: C_N = R_N.
Given the conditions:
- R_N is a sequence of Hermitian Toeplitz matrices with elements φ_k on the k-th diagonal
- the infimum Φ_inf = inf_ω Φ(ω) and supremum Φ_sup = sup_ω Φ(ω) of the Fourier series

  Φ(ω) = Σ_{k=−∞}^{∞} φ_k e^{−jωk}    (106)

  are finite
- the function G is continuous in the interval [Φ_inf, Φ_sup]
the following expression holds:

lim_{N→∞} (1/N) Σ_{i=0}^{N−1} G(ξ_i^(N)) = (1/(2π)) ∫_{−π}^{π} G(Φ(ω)) dω    (107)

where ξ_i^(N), for i = 0, 1, ..., N−1, denote the eigenvalues of the N-th matrix R_N.
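The theorem can be illustrated numerically with G = log₂ and the Gauss-Markov autocorrelation φ[k] = ρ^{|k|} used on the following slides; the matrix size and frequency grid below are arbitrary assumptions:

```python
import numpy as np

# Numerical check of Eq. (107) with G = log2, for the autocorrelation
# phi[k] = rho**|k| (sigma^2 = 1); assumptions: N = 512, uniform
# frequency grid for the right-hand side.
rho, N = 0.9, 512
R_N = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
xi = np.linalg.eigvalsh(R_N)          # eigenvalues of the Toeplitz matrix

lhs = np.mean(np.log2(xi))            # (1/N) sum of G(xi_i)

w = np.linspace(-np.pi, np.pi, 1 << 16)
Phi = (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)
rhs = np.mean(np.log2(Phi))           # (1/2pi) integral of G(Phi(w))

print(lhs, rhs)   # both approach log2(1 - rho^2) ≈ -2.396 as N grows
```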

Power Spectral Density of a Gauss-Markov Process

Zero-mean Gauss-Markov process with |ρ| < 1:

S_n = Z_n + ρ · S_{n−1}    (108)

Auto-correlation function:

φ[k] = σ² · ρ^{|k|}    (109)

Using the relationship

Σ_{k=1}^{∞} a^k e^{−jkx} = a / (e^{jx} − a)    (110)

we obtain

Φ_SS(ω) = Σ_{k=−∞}^{∞} φ[k] · e^{−jωk}
        = σ² · Σ_{k=−∞}^{∞} ρ^{|k|} · e^{−jωk}
        = σ² · ( 1 + ρ/(e^{jω} − ρ) + ρ/(e^{−jω} − ρ) )
        = σ² · (1 − ρ²) / (1 − 2ρ cos ω + ρ²)    (111)

R_L(D) for Gauss-Markov Processes

Shannon lower bound for a zero-mean Gauss-Markov process with |ρ| < 1:

R_L(D) = (1/(4π)) ∫_{−π}^{π} log₂( Φ_SS(ω)/D ) dω
       = (1/(4π)) ∫_{−π}^{π} log₂( σ²(1 − ρ²)/D ) dω − (1/(4π)) ∫_{−π}^{π} log₂(1 − 2ρ cos ω + ρ²) dω
       = (1/2) log₂( σ²(1 − ρ²)/D )    (112)

(the second integral is zero), where we used

∫_0^π ln(a² − 2ab cos x + b²) dx = 2π ln a    (113)

Shannon lower bound for the information distortion rate function:

D_L(R) = (1 − ρ²) · σ² · 2^{−2R}    (114)

R(D) for Gaussian IID Sources I

Goal: derive the information rate distortion function for Gaussian iid sources.
For distortions D < σ², choose

f_{S′}(s′) = (1/√(2π(σ² − D))) · e^{−(s′−μ)²/(2(σ²−D))}    (115)
f_{Z|S′}(z|s′) = (1/√(2πD)) · e^{−z²/(2D)} = f_Z(z)    (116)

with

S = S′ + Z    (117)

The random variables S′ and Z are independent, since f_{Z|S′}(z|s′) = f_Z(z). Hence, the pdf f_S is given by the convolution of f_{S′} and f_Z.
The pdfs f_{S′}(s′) and f_Z(z) are possible choices, since the convolution of two Gaussians yields a Gaussian,

f_S(s) = f_{S′}(s′) ∗ f_Z(z)    (118)

with means and variances adding up:

μ = μ + 0  and  σ² = (σ² − D) + D    (119)

R(D) for Gaussian IID Sources II

The distortion is given by the variance of the difference process Z_n = S_n − S′_n:

δ_1(g_1) = E{ (S_n − S′_n)² } = E{ Z_n² } = D    (120)

Mutual information:

I_1(g_1) = h(S_n) − h(S_n|S′_n)    (121)
         = h(S_n) − h(S_n − S′_n|S′_n)    (122)
         = h(S_n) − h(Z_n|S′_n)    (123)
         = h(S_n) − h(Z_n)    (124)
         = (1/2) log₂(2πeσ²) − (1/2) log₂(2πeD)    (125)

R(D) = (1/2) log₂( σ²/D )    (126)

The information rate distortion function coincides with the Shannon lower bound for Gaussian iid sources.

R(D) for Gaussian IID Sources III

We assume a memoryless Gaussian source with variance σ².
For the memoryless Gaussian source, the Shannon lower bound coincides with the rate distortion function.
The information rate distortion function is given as

R(D) = (1/2) log₂( σ²/D )  for 0 ≤ D ≤ σ²,   0  for D > σ²    (127)

The information distortion rate function is given as

D(R) = σ² · 2^{−2R}    (128)

The signal-to-noise ratio (SNR) is given as

SNR(R) = 10 · log₁₀( σ²/D(R) ) = 10 · log₁₀ 2^{2R} ≈ 6R dB    (129)
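A quick numerical check of Eqs. (127)-(129), illustrating the ≈ 6 dB/bit rule; function names are assumptions:

```python
import numpy as np

def R_gauss(D, var):
    """Rate distortion function of Eq. (127) for a memoryless Gaussian source."""
    D = np.asarray(D, dtype=float)
    return np.where(D < var, 0.5 * np.log2(var / np.maximum(D, 1e-300)), 0.0)

def D_gauss(R, var):
    """Distortion rate function of Eq. (128)."""
    return var * 2.0 ** (-2.0 * np.asarray(R, dtype=float))

var = 1.0
for R in (1, 2, 3, 4):
    snr = 10 * np.log10(var / D_gauss(R, var))
    print(f"R = {R} bit: SNR = {snr:.2f} dB")   # ≈ 6.02 dB per bit, Eq. (129)
```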

R(D) for a Gaussian Source with Memory

Assume a stationary Gaussian random process with zero mean and pdf

f_S^(G)(s) = (1/((2π)^{N/2} |C_N|^{1/2})) e^{−(1/2)(s−μ_N)^T C_N^{−1}(s−μ_N)}    (130)

Eigendecomposition of the covariance matrix C_N:

C_N = A_N Ξ_N A_N^T    (131)

A_N: matrix whose columns are built by the N unit-norm eigenvectors,

A_N = ( v_0^(N), v_1^(N), ..., v_{N−1}^(N) )    (132)

Ξ_N: diagonal matrix that contains the N eigenvalues of C_N on its main diagonal,

Ξ_N = diag( ξ_0^(N), ξ_1^(N), ..., ξ_{N−1}^(N) )    (133)

Signal Space Rotation

Given the stationary Gaussian source {S_n}: construct a source {U_n} by decomposing {S_n} into vectors S of size N and applying the transform

U = A_N^{−1}(S − μ_N) = A_N^T (S − μ_N)    (134)

A linear transformation of a Gaussian random vector results in another Gaussian random vector.
The chosen transform yields independent random variables U_i:

f_U(u) = (1/((2π)^{N/2} |Ξ_N|^{1/2})) e^{−(1/2) u^T Ξ_N^{−1} u} = ∏_{i=0}^{N−1} (1/√(2πξ_i^(N))) e^{−u_i²/(2ξ_i^(N))}    (135)

Mean:

E{U} = A_N^T ( E{S} − μ_N ) = A_N^T ( μ_N − μ_N ) = 0    (136)

Covariance:

E{UU^T} = A_N^T E{ (S − μ_N)(S − μ_N)^T } A_N = A_N^T C_N A_N = Ξ_N    (137)

Distortion and Mutual Information

The inverse transform after compression is identical to the forward transform:

S′ = A_N U′ + μ_N    (138)

with

U − U′ = A_N^T (S − S′)  ↔  S − S′ = A_N (U − U′)    (139)

MSE distortion between any realization s of S and its reconstruction s′:

d_N(s; s′) = (1/N) Σ_{i=0}^{N−1} (s_i − s′_i)² = (1/N) (s − s′)^T (s − s′)
           = (1/N) (u − u′)^T A_N^T A_N (u − u′) = (1/N) (u − u′)^T (u − u′)
           = (1/N) Σ_{i=0}^{N−1} (u_i − u′_i)² = d_N(u; u′)    (140)

Since A_N^T A_N = I_N (the identity matrix):

I_N(S; S′) = I_N(U; U′)    (141)

Distortion Rate Function

Mutual information and average distortion, considering the independence of the components U_i:

I_N(g_N^Q) = Σ_{i=0}^{N−1} I_1(g_i^Q)  and  δ_N(g_N^Q) = (1/N) Σ_{i=0}^{N−1} δ_1(g_i^Q)    (142)

N-th order distortion rate function D_N(R):

D_N(R) = (1/N) Σ_{i=0}^{N−1} D_i(R_i)  with  R = (1/N) Σ_{i=0}^{N−1} R_i    (143)

D_i(R_i): first-order distortion rate function of a Gaussian iid process for component U_i,

D_i(R_i) = σ_i² · 2^{−2R_i} = ξ_i^(N) · 2^{−2R_i}    (144)

with ξ_i^(N) being the eigenvalues of C_N.
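The eigendecomposition facts used here, Eqs. (86) and (131), can be verified numerically for the covariance matrix of a Gauss-Markov source; the choice N = 8, ρ = 0.9 is an arbitrary assumption:

```python
import numpy as np

# Covariance matrix of a Gauss-Markov source, (C_N)_{i,j} = var * rho**|i-j|
N, var, rho = 8, 1.0, 0.9
C = var * rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

xi, A = np.linalg.eigh(C)          # eigenvalues xi_i, orthonormal eigenvectors A_N
print(np.allclose(A @ np.diag(xi) @ A.T, C))  # Eq. (131): C_N = A_N Xi_N A_N^T
print(np.prod(xi), np.linalg.det(C))          # Eq. (86): |C_N| = product of eigenvalues
```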

Optimal Bit Allocation

Task:

min_{R_0, R_1, ..., R_{N−1}} D_N(R) = (1/N) Σ_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i}  such that  R ≥ (1/N) Σ_{i=0}^{N−1} R_i

Comparison of different types of mean computations:

D_N(R) = (1/N) Σ_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i} ≥ ( ∏_{i=0}^{N−1} ξ_i^(N) 2^{−2R_i} )^{1/N} = ( ∏_{i=0}^{N−1} ξ_i^(N) )^{1/N} · 2^{−2R} = ξ̃^(N) · 2^{−2R}

with ξ̃^(N) = ( ∏_{i=0}^{N−1} ξ_i^(N) )^{1/N} = |C_N|^{1/N} and ∏_{i=0}^{N−1} 2^{−2R_i} = 2^{−2R_0} · 2^{−2R_1} ··· 2^{−2R_{N−1}} = 2^{−Σ_{i=0}^{N−1} 2R_i} = 2^{−2NR}
The expression on the right-hand side of the above inequality is constant: equality is achieved when all terms ξ_i^(N) 2^{−2R_i} = ξ̃^(N) 2^{−2R}, i.e.,

R_i = R + (1/2) log₂( ξ_i^(N)/ξ̃^(N) ) = (1/2) log₂( ξ_i^(N)/(ξ̃^(N) 2^{−2R}) )

Parametric Formulation

So far, we have ignored that R_i cannot be less than 0:

R_i = (1/2) log₂( ξ_i^(N)/(ξ̃^(N) 2^{−2R}) ) ≥ 0  →  R_i = 0  if  ξ_i^(N) ≤ ξ̃^(N) 2^{−2R}    (145)

Introducing the parameter θ ≥ 0:

R_i = (1/2) log₂( ξ_i^(N)/θ ) for 0 ≤ θ ≤ ξ_i^(N), and 0 for θ ≥ ξ_i^(N);  D_i = θ for 0 ≤ θ ≤ ξ_i^(N), and ξ_i^(N) for θ ≥ ξ_i^(N)

or, equivalently,

R_i(θ) = max( 0, (1/2) log₂( ξ_i^(N)/θ ) )  and  D_i(θ) = min( ξ_i^(N), θ )    (146)

Parametric expressions:

D_N(θ) = (1/N) Σ_{i=0}^{N−1} D_i = (1/N) Σ_{i=0}^{N−1} min( ξ_i^(N), θ )    (147)
R_N(θ) = (1/N) Σ_{i=0}^{N−1} R_i = (1/N) Σ_{i=0}^{N−1} max( 0, (1/2) log₂( ξ_i^(N)/θ ) )    (148)

R^(I)(D) for N → ∞

Recall Grenander and Szegő's theorem for infinite Toeplitz matrices:

lim_{N→∞} (1/N) Σ_{i=0}^{N−1} G(ξ_i^(N)) = (1/(2π)) ∫_{−π}^{π} G(Φ(ω)) dω    (149)

Parametric expressions:

lim_{N→∞} D_N(θ) = lim_{N→∞} (1/N) Σ_{i=0}^{N−1} min( ξ_i^(N), θ )    (150)
lim_{N→∞} R_N(θ) = lim_{N→∞} (1/N) Σ_{i=0}^{N−1} max( 0, (1/2) log₂( ξ_i^(N)/θ ) )    (151)

With the spectral density function Φ_ss(ω):

D(θ) = (1/(2π)) ∫_{−π}^{π} min( Φ_ss(ω), θ ) dω
R(θ) = (1/(2π)) ∫_{−π}^{π} max( 0, (1/2) log₂( Φ_ss(ω)/θ ) ) dω    (152)

Illustration of Minimization Approach

"ss (#) reconstruction error spectrum

preserved spectrum "s s ($)

white noise " "

! no signal transmitted

Interpretation: at each frequency, the variance of the process as given by the spectral density function Φ_ss(ω) is compared to the parameter θ, which represents the mean squared error for that frequency component.

When Φ_ss(ω) is larger than θ, the rate (1/2) log₂( Φ_ss(ω)/θ ) is assigned; otherwise, zero rate is assigned to that frequency component.
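The parametric solution of Eq. (152) can be evaluated by sweeping θ. The following sketch approximates the integrals by means over a uniform frequency grid (grid size and the sampled PSD are assumptions) and checks the small-distortion regime against the closed form of Eq. (112):

```python
import numpy as np

def rd_point(psd, theta):
    """Parametric (R, D) of Eq. (152), with the integrals replaced by means
    over uniformly spaced samples of the power spectral density."""
    D = np.mean(np.minimum(psd, theta))
    R = np.mean(np.maximum(0.0, 0.5 * np.log2(psd / theta)))
    return R, D

def rd_curve(psd, num=50):
    """Sweep theta from max(psd) (R = 0) towards 0 (R -> infinity)."""
    thetas = np.geomspace(psd.max(), psd.max() * 1e-4, num)
    return np.array([rd_point(psd, t) for t in thetas])

# Gauss-Markov PSD of Eq. (111), sigma^2 = 1, rho = 0.9
w = np.linspace(-np.pi, np.pi, 4097)
rho, var = 0.9, 1.0
psd = var * (1 - rho**2) / (1 - 2 * rho * np.cos(w) + rho**2)

curve = rd_curve(psd)
# For small D all rate terms are active and R(D) = 0.5*log2(var*(1-rho^2)/D)
R, D = curve[-1]
print(R, 0.5 * np.log2(var * (1 - rho**2) / D))  # nearly equal
```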

R(D) for a Gauss-Markov Process I

Calculate R(D) for a Gauss-Markov process with |ρ| < 1 and variance σ²:

S_n = Z_n + ρ · S_{n−1}    (153)

The auto-correlation function and spectral density function are given as

φ[k] = σ² ρ^{|k|}        Φ(ω) = Σ_{k=−∞}^{∞} φ[k] e^{−jkω} = σ²(1 − ρ²)/(1 − 2ρ cos ω + ρ²)    (154)

All rate terms in the parametric formulation of R(D) are positive when

D/σ² ≤ (1 − ρ)/(1 + ρ)    (155)

and R(D) can be written as

R(D) = (1/2) log₂( σ²(1 − ρ²)/D )    (156)
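A short numerical illustration of Eq. (156) and its inverse (the distortion rate function of Eq. (114)); the SNR gain over the iid Gaussian source is 10 log₁₀(1/(1 − ρ²)) dB. Parameter choices below are assumptions:

```python
import numpy as np

def D_gauss_markov(R, var, rho):
    """Distortion rate function (1 - rho^2) * var * 2**(-2R), Eq. (114),
    valid in the regime where all rate terms are positive."""
    return (1 - rho**2) * var * 2.0 ** (-2.0 * np.asarray(R, dtype=float))

var, R = 1.0, 2.0
for rho in (0.0, 0.5, 0.9):           # R = 2 > log2(1 + rho) in all cases
    snr = 10 * np.log10(var / D_gauss_markov(R, var, rho))
    print(f"rho = {rho}: SNR = {snr:.2f} dB at R = {R} bits")
# Memory (rho > 0) adds 10*log10(1/(1 - rho^2)) dB over the iid Gaussian source.
```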

R(D) for a Gauss-Markov Process II

The corresponding distortion rate function for R ≥ log₂(1 + ρ) is given by

D(R) = (1 − ρ²) · σ² · 2^{−2R}    (157)

This includes the result for Gaussian iid sources (ρ = 0).
Below: R(D) for Gauss-Markov sources for different values of ρ.

[Figure: SNR in dB versus R in bits for Gauss-Markov sources with ρ = 0, 0.5, 0.78, 0.9, 0.95, 0.99.]

Summary on Rate Distortion Theory

Rate distortion theory: minimum bit-rate for a given distortion.
The operational rate distortion function is difficult to compute; derive the information rate distortion function instead and show that the two coincide.
R(D) for discrete sources is a convex and continuous function, decreasing from (D = 0, R = H(S)) to (D = D_max, R = 0).
R(D) for amplitude-continuous sources: similar to R(D) for discrete sources, except that R(D) → ∞ for D → 0.
The Shannon lower bound R_L(D) can often be computed as an analytical lower bound for R(D).
R(D) for Gaussian iid sources and MSE: 6 dB/bit.
Any source other than the Gaussian iid source with equal variance requires fewer bits for the same mean squared error.
R(D) for Gaussian sources with memory and MSE:
- encode spectral components independently
- introduce white noise
- suppress small spectral components
