
Transform Coding

Another concept for partially exploiting the memory gain of vector quantization
Used in virtually all lossy image and video coding applications
Samples of the source s are grouped into vectors s of adjacent samples
Transform coding consists of the following steps:
1. Linear analysis transform A, converting source vectors s into transform coefficient vectors u = As
2. Scalar quantization of the transform coefficients u → u′
3. Linear synthesis transform B, converting quantized transform coefficient vectors u′ into decoded source vectors s′ = Bu′
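The three steps can be sketched in a few lines of NumPy; the 2×2 matrix and the quantizer step size below are illustrative assumptions, not part of the slides:

```python
import numpy as np

# Toy transform coder: 1) analysis transform, 2) uniform scalar quantization
# of the coefficients, 3) synthesis transform with B = A^{-1} = A^T.
def transform_code(s, A, step):
    u = A @ s                          # 1) u = A s
    u_q = step * np.round(u / step)    # 2) scalar quantization u -> u'
    return A.T @ u_q                   # 3) s' = B u'

A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
s = np.array([3.0, 4.0])
s_rec = transform_code(s, A, step=0.5)
# for an orthogonal transform the reconstruction error stays small,
# bounded by the quantizer step size
assert np.max(np.abs(s_rec - s)) <= 0.5
```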

[Figure: scatter plots of adjacent sample pairs (S0, S1) and of the corresponding transform coefficients (U0, U1) after a 2-d transform, a rotation by ϕ = 45°:

A = (  cos ϕ   sin ϕ
      −sin ϕ   cos ϕ ) ]

January 23, 2013

Overview

Structure of Transform Coding Systems
Motivation of Transform Coding
Orthogonal Block Transforms
Bit Allocation for Transform Coefficients
Karhunen Loève Transform (KLT)
Signal-Independent Transforms
Walsh-Hadamard Transform
Discrete Fourier Transform (DFT)
Discrete Cosine Transform (DCT)
2-d Transforms in Image and Video Coding
Entropy Coding of Transform Coefficients
Distribution of Transform Coefficients for Images
Modified Discrete Cosine Transform (MDCT)
Summary

Structure of Transform Coding Systems

[Figure: block diagram of a transform coding system: the analysis transform A maps s to the coefficients u0 … uN−1; the scalar quantizers Q0 … QN−1 map each un to u′n; the synthesis transform B maps the quantized coefficients to s′]

The synthesis transform is typically the inverse of the analysis transform

Separate scalar quantizer Qn for each transform coefficient un
Vector quantization of all bands or some of them is also possible, but:
  Transforms are designed to have a decorrelating effect ⇒ the memory gain of VQ is reduced
  The shape gain can be obtained by ECSQ
  The space-filling gain is left as a possible additional gain for VQ
The combination of a decorrelating transform, scalar quantization, and entropy coding is highly efficient in terms of both rate-distortion performance and complexity

Motivation of Transform Coding

Exploitation of statistical dependencies:
  Transforms are typically designed such that, for typical input signals, the signal energy is concentrated in a few transform coefficients
  Coding of a few significant coefficients and many zero-valued coefficients can be very efficient (e.g., using run-length coding)
  Scalar quantization is more effective in the transform domain
Efficient trade-off between coding efficiency and complexity:
  Vector quantization requires searching through a codebook for the best matching vector
  The combination of a transform and scalar quantization typically results in a substantial reduction in computational complexity
Suitable for quantization using perceptual criteria:
  In image and video coding, quantization in the transform domain typically leads to an improvement in subjective quality
  In speech and audio coding, frequency bands might be used to simulate the processing of the human ear
  Reduces perceptually irrelevant content

Transform Encoder and Decoder

[Figure: encoder, consisting of the analysis transform A, quantizer mappings α0 … αN−1 that map the coefficients u0 … uN−1 to indices i0 … iN−1, and the entropy coder γ producing the bitstream b; decoder, consisting of the entropy decoder γ⁻¹, inverse mappings β0 … βN−1 producing u′0 … u′N−1, and the synthesis transform B producing s′]

Linear Block Transforms

Linear Block Transform
Each component of the N-dimensional output vector represents a linear combination of the N components of the N-dimensional input vector
Can be written as a matrix multiplication

Analysis transform:   u = A · s   (1)
Synthesis transform:  s′ = B · u′   (2)

Vector interpretation: s′ is represented as a linear combination of the column vectors of B

s′ = Σ_{n=0}^{N−1} u′_n · b_n = u′_0 · b_0 + u′_1 · b_1 + ··· + u′_{N−1} · b_{N−1}   (3)

Linear Block Transforms (cont'd)

Perfect Reconstruction Property
Consider the case that no quantization is applied (u′ = u)
Optimal synthesis transform:   B = A⁻¹   (4)
The reconstructed samples are equal to the source samples

s′ = B u = B A s = A⁻¹ A s = s   (5)

Optimal Synthesis Transform (in the presence of quantization)
Optimality: minimum MSE distortion among all synthesis transforms
B = A⁻¹ is optimal if
  A is invertible and produces independent transform coefficients
  the component quantizers are centroidal quantizers
If the above conditions are not fulfilled, a synthesis transform B ≠ A⁻¹ may reduce the distortion

Orthogonal Block Transforms

Orthonormal Basis
An analysis transform A forms an orthonormal basis if
  its basis vectors (matrix rows) are orthogonal to each other
  its basis vectors have length 1
The corresponding transform is called an orthogonal transform
The transform matrices are called unitary matrices
Unitary matrices with real entries are called orthogonal matrices
Inverse of unitary matrices: conjugate transpose

A⁻¹ = A†   (for orthogonal matrices: A⁻¹ = Aᵀ)   (6)

Why are orthogonal transforms desirable?
  The MSE distortion can be minimized by independent scalar quantization of the transform coefficients
  Orthogonality of the basis vectors is sufficient: vector norms can be taken into account in the quantizer design

Properties of Orthogonal Block Transforms

Transform coding with an orthogonal transform and perfect reconstruction (B = A⁻¹ = A†) preserves the MSE distortion:

d_N(s, s′) = (1/N) (s − s′)† (s − s′)
           = (1/N) (A⁻¹ u − B u′)† (A⁻¹ u − B u′)
           = (1/N) (A† u − A† u′)† (A† u − A† u′)
           = (1/N) (u − u′)† A A⁻¹ (u − u′)
           = (1/N) (u − u′)† (u − u′)
           = d_N(u, u′)   (7)

Scalar quantization that minimizes the MSE in the transform domain also minimizes the MSE in the original signal space
For the special case of orthogonal matrices: (···)† = (···)ᵀ

Properties of Orthogonal Block Transforms (cont'd)

Covariance matrix of transform coefficients

C_UU = E{ (U − E{U}) (U − E{U})ᵀ }
     = E{ A (S − E{S}) (S − E{S})ᵀ Aᵀ }
     = A C_SS Aᵀ = A C_SS A⁻¹   (8)

Since the trace of a matrix is similarity-invariant,

tr(X) = tr(PXP −1), (9)

and the trace of an autocovariance matrix is the sum of the variances of the vector components, we have

(1/N) Σ_{i=0}^{N−1} σ_i² = σ_S²   (10)

The arithmetic mean of the variances of the transform coefficients is equal to the variance of the source

Geometrical Interpretation of Orthogonal Transforms

Inverse 2-d transform matrix (= transpose of the forward transform matrix):

B = ( b0  b1 ) = (1/√2) ( 1  1 ; 1  −1 ) = Aᵀ

Vector interpretation for a 2-d example:

s = u0 · b0 + u1 · b1
( s0 ; s1 ) = u0 · (1/√2) ( 1 ; 1 ) + u1 · (1/√2) ( 1 ; −1 )
( 4 ; 3 ) = 3.5 · ( 1 ; 1 ) + 0.5 · ( 1 ; −1 )

yielding the transform coefficients

u0 = √2 · 3.5,   u1 = √2 · 0.5

An orthogonal transform is a rotation from the signal coordinate system into the coordinate system of the basis functions

Transform Example N = 2

[Figure: scatter plots of adjacent samples (S0, S1) of a Gauss-Markov source for the correlation factors ρ = 0, 0.5, 0.9, 0.95 (top row) and of the corresponding coefficients (U0, U1) of the orthonormal 2-d transform (bottom row)]

Example for Waveforms (Gauss-Markov Source with ρ = 0.95)

[Figure: top: signal s[k]; middle: transform coefficient u0[k/2], also called the dc coefficient; bottom: transform coefficient u1[k/2], also called the ac coefficient]

The number of transform coefficients u0 is half the number of samples s
The number of transform coefficients u1 is half the number of samples s

Scalar Quantization in Transform Domain

Consider Transform Coding with Orthogonal Transforms

[Figure: quantization cells for direct coding (signal space), for transform coding in the transform domain, and for transform coding in signal space]

The quantization cells are hyper-rectangles as in scalar quantization, but rotated and aligned with the transform basis vectors
The number of quantization cells with appreciable probabilities is reduced ⇒ indicates improved coding efficiency for correlated sources

Bit Allocation for Transform Coefficients

Problem: Distribute R among the N transform coefficients such that the resulting distortion D is minimized

min D(R) = (1/N) Σ_{i=1}^{N} D_i(R_i)   subject to   (1/N) Σ_{i=1}^{N} R_i ≤ R   (11)

with D_i(R_i) being the operational distortion-rate functions of the scalar quantizers
Approach: minimize the Lagrangian cost function J = D + λR

∂/∂R_i ( Σ_{i=1}^{N} D_i(R_i) + λ Σ_{i=1}^{N} R_i ) = ∂D_i(R_i)/∂R_i + λ = 0   (12)

Solution: Pareto condition

∂D_i(R_i)/∂R_i = −λ = const   (13)

Move bits from coefficients with a small distortion reduction per bit to coefficients with a larger distortion reduction per bit

Similar to the bit allocation problem in discrete sets: min Σ (D_i + λ R_i)

Bit Allocation for Transform Coefficients (cont'd)

Operational distortion-rate function of scalar quantizers can be written as

D_i(R_i) = σ_i² · g_i(R_i)   (14)

It is justified to assume that

g_i(R_i) is a continuous strictly convex function
g_i(R_i) has a continuous strictly increasing derivative g_i′(R_i) with g_i′(∞) = 0
The Pareto condition becomes

−σ_i² · g_i′(R_i) = λ   (15)

If λ ≥ −σ_i² g_i′(0), the quantizer for u_i cannot be operated at the given slope
⇒ set the corresponding component rate to R_i = 0
Bit allocation rule:

R_i = 0                  if −σ_i² g_i′(0) ≤ λ
R_i = η_i( −λ/σ_i² )     if −σ_i² g_i′(0) > λ      (16)

where η_i(·) denotes the inverse of the derivative g_i′(·)

Approximation for Gaussian Sources

The transform coefficients also have a Gaussian distribution
Experimentally found approximation for entropy-constrained scalar quantization of Gaussian sources (a ≈ 0.952):

g(R) = (πe/(6a)) · ln( a · 2^{−2R} + 1 )   (17)

Use the parameter

θ = ( 3(a+1) / (πe ln 2) ) · λ   with   0 ≤ θ ≤ σ²_max   (18)

Bit allocation rule:

R_i(θ) = 0                                           if θ ≥ σ_i²
R_i(θ) = (1/2) log2( (σ_i²/θ)(a+1) − a )             if θ < σ_i²      (19)

Resulting component distortions:

D_i(θ) = σ_i²                                                       if θ ≥ σ_i²
D_i(θ) = −(ε² ln 2 / a) · σ_i² · log2( 1 − (θ/σ_i²) · a/(a+1) )     if θ < σ_i²      (20)

High-Rate Approximation

Assumption: High-rate approximation valid for all component quantizers High-rate approximation for distortion-rate function of component quantizers

D_i(R_i) = ε_i² · σ_i² · 2^{−2R_i}   (21)

where ε_i² depends on the transform coefficient distribution and the quantizer
Pareto condition:

∂D_i(R_i)/∂R_i = −2 ln 2 · ε_i² σ_i² 2^{−2R_i} = −2 ln 2 · D_i(R_i) = −λ = const   (22)

states that all quantizers are operated at the same distortion
Bit allocation rule:

R_i(D) = (1/2) log2( ε_i² σ_i² / D )   (23)

Overall operational rate-distortion function:

R(D) = (1/N) Σ_{i=0}^{N−1} R_i(D) = (1/(2N)) Σ_{i=0}^{N−1} log2( σ_i² ε_i² / D )   (24)
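The high-rate bit allocation rule can be sketched numerically; the variances below are made-up example values, and ε² = πe/6 is the Gaussian ECSQ factor used later in the slides:

```python
import numpy as np

# High-rate bit allocation R_i = 0.5 * log2(eps2 * sigma_i^2 / D):
# all quantizers run at the same distortion D, chosen so the rates average to R.
def allocate_rates(variances, R, eps2=np.pi * np.e / 6.0):
    v = np.asarray(variances, dtype=float)
    # from eq. (24): D = eps2 * geometric_mean(v) * 2^(-2R)
    D = eps2 * np.exp(np.log(v).mean()) * 2.0 ** (-2.0 * R)
    return 0.5 * np.log2(eps2 * v / D)

rates = allocate_rates([1.9, 0.1], R=3.0)
assert np.isclose(rates.mean(), 3.0)   # average rate equals the target rate
assert np.all(rates >= 0.0)            # valid here; at low rates clipping is needed
```

Note that this sketch assumes all component rates stay non-negative; the general low-rate case requires the clipping rule of eq. (16).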

High-Rate Approximation (cont'd)

Overall operational rate-distortion function:

R(D) = (1/(2N)) Σ_{i=0}^{N−1} log2( σ_i² ε_i² / D ) = (1/2) log2( ε̃² σ̃² / D )   (25)

with the geometric means

σ̃² = ( Π_{i=0}^{N−1} σ_i² )^{1/N}   and   ε̃² = ( Π_{i=0}^{N−1} ε_i² )^{1/N}   (26)

Overall distortion-rate function:

D(R) = ε̃² · σ̃² · 2^{−2R}   (27)
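The geometric mean can be computed directly; the ratio of the arithmetic to the geometric mean of the coefficient variances is exactly the high-rate transform coding gain G_T of eq. (30). A small check with the N = 2 example variances σ²(1 ± ρ):

```python
import numpy as np

# Ratio of arithmetic to geometric mean of the transform coefficient variances;
# at high rates this ratio is the transform coding gain G_T (eq. (30)).
def mean_ratio(variances):
    v = np.asarray(variances, dtype=float)
    return v.mean() / np.exp(np.log(v).mean())

rho = 0.9
gain = mean_ratio([1.0 + rho, 1.0 - rho])     # N = 2 example variances
assert np.isclose(gain, 1.0 / np.sqrt(1.0 - rho ** 2))   # = G_T for N = 2
```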

For Gaussian sources (the transform coefficients are also Gaussian) and entropy-constrained scalar quantizers, we have ε_i² = ε² = πe/6, yielding

D_G(R) = (πe/6) · σ̃² · 2^{−2R}   (28)

Transform Coding Gain at High Rates

The transform coding gain is the ratio of the distortion for direct scalar quantization and the distortion for transform coding:

G_T = ( ε_S² · σ_S² · 2^{−2R} ) / ( ε̃² · σ̃² · 2^{−2R} ) = ( ε_S² · σ_S² ) / ( ε̃² · σ̃² )   (29)

with

σ_S²: variance of the input signal
ε_S²: factor of the high-rate approximation for direct scalar quantization

High-rate transform coding gain for Gaussian sources

G_T = σ_S² / σ̃² = [ (1/N) Σ_{i=0}^{N−1} σ_i² ] / [ ( Π_{i=0}^{N−1} σ_i² )^{1/N} ]   (30)

Ratio of the arithmetic and geometric means of the transform coefficient variances
The high-rate transform coding gain for Gaussian sources is maximized if the geometric mean is minimized (⇒ Karhunen Loève Transform)

Example: Orthogonal Transformation with N = 2

Input vector and transform matrix

s = ( s_0 ; s_1 )   and   A = (1/√2) ( 1  1 ; 1  −1 )   (31)

Transformation:

u = ( u_0 ; u_1 ) = A · s = (1/√2) ( 1  1 ; 1  −1 ) ( s_0 ; s_1 )   (32)

Coefficients:

u_0 = (1/√2)(s_0 + s_1),   u_1 = (1/√2)(s_0 − s_1)   (33)

Inverse transformation:

A⁻¹ = Aᵀ = A = (1/√2) ( 1  1 ; 1  −1 )   (34)
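The 2×2 transform can be checked numerically: it is orthogonal, and it decorrelates a Gauss-Markov pair with correlation ρ (unit variance assumed here for simplicity), matching the variance and cross-correlation results derived next:

```python
import numpy as np

A = np.array([[1.0, 1.0], [1.0, -1.0]]) / np.sqrt(2.0)
rho = 0.9
R_ss = np.array([[1.0, rho], [rho, 1.0]])    # E{S S^T} for sigma_s^2 = 1

assert np.allclose(A @ A.T, np.eye(2))       # A^{-1} = A^T = A
R_uu = A @ R_ss @ A.T
# coefficient variances sigma_s^2 (1 + rho) and sigma_s^2 (1 - rho)
assert np.allclose(np.diag(R_uu), [1.0 + rho, 1.0 - rho])
assert abs(R_uu[0, 1]) < 1e-12               # coefficients are uncorrelated
```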

Example: Orthogonal Transformation with N = 2 (cont'd)

Variance of the transform coefficients:

σ_0² = E{U_0²} = (1/2) E{ (S_0 + S_1)² }
     = (1/2) ( E{S_0²} + E{S_1²} + 2 · E{S_0 S_1} )
     = (1/2) ( σ_s² + σ_s² + 2 σ_s² ρ ) = σ_s² (1 + ρ)   (35)

σ_1² = E{U_1²} = σ_s² (1 − ρ)   (36)

Cross-correlation of the transform coefficients:

E{U_0 U_1} = (1/2) E{ (S_0 + S_1) · (S_0 − S_1) }
           = (1/2) E{ S_0² − S_1² } = (1/2) ( σ_s² − σ_s² ) = 0   (37)

Transform coding gain for Gaussian sources (assuming optimal bit allocation):

G_T = σ_S² / √(σ_0² σ_1²) = 1 / √(1 − ρ²)   (38)

Example: Analysis of Transform Coding for N = 2

R-d cost before the transform: J⁽⁰⁾ = 2(D + λR)   (for 2 samples)
R-d cost after the transform:

J⁽¹⁾ = (D_0 + D_1) + λ(R_0 + R_1)   (for both transform coefficients)

Gain in r-d cost due to the transform at the same rate (R_0 + R_1 = 2R):

ΔJ = J⁽⁰⁾ − J⁽¹⁾ = 2D − D_0 − D_1   (39)

For Gaussian sources, input and output of the transform have Gaussian pdfs
With the operational distortion-rate function of an entropy-constrained scalar quantizer at high rates (D = ε² · σ² · 2^{−2R} with ε² = πe/6), we have

ΔJ = ε² σ_S² ( 2^{−2R+1} − (1+ρ) 2^{−2R_0} − (1−ρ) 2^{−2R_1} )   (40)

By eliminating R_1 using R_1 = 2R − R_0, we get

ΔJ = ε² σ_S² ( 2^{−2R+1} − (1+ρ) 2^{−2R_0} − (1−ρ) 2^{−2(2R−R_0)} )   (41)

Example: Analysis of Transform Coding for N = 2 (cont'd)

Gain in r-d cost due to the transform:

ΔJ = ε² σ_S² ( 2^{−2R+1} − (1+ρ) 2^{−2R_0} − (1−ρ) 2^{−2(2R−R_0)} )   (42)

To maximize the gain, we set

∂ΔJ/∂R_0 = 2 ln 2 · (1+ρ) 2^{−2R_0} − 2 ln 2 · (1−ρ) 2^{−4R+2R_0} = 0   (43)

yielding the bit allocation rule

R_0 = R + (1/2) log2 √( (1+ρ)/(1−ρ) )   (44)

The same expression is obtained by using the previously derived high-rate bit allocation rule

R_i = (1/2) log2( ε² σ_i² / D )   (45)

Operational high-rate distortion-rate function (Gaussian, ECSQ, N = 2):

D(R) = (πe/6) · √(1 − ρ²) · σ_S² · 2^{−2R}   (46)

Example: Experimental R-D curves for N = 2

[Figure: experimental distortion versus bit-rate for N = 2, showing the theoretical curve for TC1, experimental ECSQ results for TC1, the optimal bit allocation using the Pareto condition, and a cloud of R-D points obtained by combining the R-D values of TC1 through TC4]

Example: Experimental R-D curves for N = 4

[Figure: experimental distortion versus bit-rate for N = 4, showing the theoretical curve for TC1, experimental ECSQ results for TC1, the optimal bit allocation using the Pareto condition, and a cloud of R-D points obtained by combining the R-D values of TC1 through TC4]

General Bit Allocation for Transform Coefficients

For Gaussian sources, the following points need to be considered:
  High-rate approximations are not valid for low bit rates; better approximations should be used for low rates
  For low rates, the Pareto condition cannot be fulfilled for all transform coefficients, since the component rates R_i must not be less than 0
Solution:
  Use a generalized approximation of D_i(R_i) for the component quantizers
  Set the component rates R_i to zero for all transform coefficients for which the Pareto condition ∂D_i(R_i)/∂R_i = −λ cannot be fulfilled with R_i ≥ 0
  Distribute the rate among the remaining coefficients
For non-Gaussian sources, the following needs to be considered in addition:
  The transform coefficients have different (non-Gaussian) distributions (except for large transform sizes)
  Using the same quantizer design for all transform coefficients with D_i(R_i) = σ_i² g(R_i) is suboptimal

Linear Transform Examples

[Figure: basis vectors b0 … b7 of the WHT (Walsh-Hadamard Transform), DFT (Discrete Fourier Transform), DCT (Discrete Cosine Transform), and KLT (Karhunen Loève Transform)]

Karhunen Loève Transform (KLT)

Karhunen Loève Transform
Orthogonal transform that decorrelates the input vectors
The transform matrix depends on the source
Autocorrelation matrix of the input vectors s:

R_SS = E{ S Sᵀ }   (47)

Autocorrelation matrix of transform coefficient vectors u

R_UU = E{ U Uᵀ } = E{ (A S)(A S)ᵀ } = A E{ S Sᵀ } Aᵀ = A R_SS Aᵀ   (48)

By multiplying with A⁻¹ = Aᵀ from the left, we get

R_SS · Aᵀ = Aᵀ · R_UU   (49)

To get uncorrelated transform coefficients, we need a diagonal autocorrelation matrix R_UU for the transform coefficients

Karhunen Loève Transform (cont'd)

Expression for the autocorrelation matrices:

R_SS · Aᵀ = Aᵀ · R_UU   (50)

RUU is a diagonal matrix if the eigenvector equation

R_SS b_i = ξ_i b_i   (51)

is fulfilled for all basis vectors b_i (column vectors of Aᵀ, row vectors of A)
The transform matrix A decorrelates the input vectors if its rows are equal to the unit-norm eigenvectors v_i of R_SS:

A_KLT = [ v_0  v_1  ···  v_{N−1} ]ᵀ   (52)
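The construction can be sketched directly from the eigenvector equation; N and ρ below are example values for a Gauss-Markov source:

```python
import numpy as np

# KLT for a Gauss-Markov source: the rows of A are the unit-norm
# eigenvectors of the autocorrelation matrix R_SS (sigma_s^2 = 1).
N, rho = 4, 0.9
R_ss = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))

eigvals, eigvecs = np.linalg.eigh(R_ss)   # columns of eigvecs are the v_i
A_klt = eigvecs.T                         # rows of A_KLT are the eigenvectors

R_uu = A_klt @ R_ss @ A_klt.T
# R_UU is diagonal with the eigenvalues of R_SS on its main diagonal
assert np.allclose(R_uu, np.diag(eigvals), atol=1e-10)
```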

The resulting autocorrelation matrix R_UU is a diagonal matrix with the eigenvalues of R_SS on its main diagonal:

R_UU = diag( ξ_0, ξ_1, …, ξ_{N−1} )   (53)

Optimality of KLT for Gaussian Sources

Transform coding with an orthogonal N×N transform matrix A and B = Aᵀ
Scalar quantization using scaled quantizers:

D(R, A_k) = Σ_{i=0}^{N−1} σ_i²(A_k) · g(R_i)   (54)

with σ_i²(A_k) being the variance of the i-th transform coefficient and A_k being the transform matrix

Consider an arbitrary orthogonal transform matrix A_0 and an arbitrary bit allocation given by the vector r = [R_0, ···, R_{N−1}]ᵀ with Σ_{i=0}^{N−1} R_i = R
Starting with the arbitrary orthogonal matrix A_0, apply an iterative algorithm that generates a series of orthonormal transform matrices {A_k}, k = 1, 2, …

The iteration A_{k+1} = J_k A_k consists of a Jacobi rotation and a re-ordering ⇒ the transform matrix approaches a KLT matrix

It can be shown that for all A_k: D(R, A_{k+1}) ≤ D(R, A_k)
⇒ The KLT is the optimal transform for Gaussian sources (minimizes the MSE)

Asymptotic Rate Distortion Efficiency for KLT of Gaussian Sources at High Rates

The transform coefficient variances σ_i² are equal to the eigenvalues ξ_i of R_SS
High-rate approximation for a Gaussian source and optimal ECSQ:

D(R) = (πe/6) · σ̃² · 2^{−2R} = (πe/6) · ξ̃ · 2^{−2R}
     = (πe/6) · 2^{ (1/N) Σ_{i=0}^{N−1} log2 ξ_i } · 2^{−2R}   (55)

For N → ∞, we can apply Szegő's theorem for infinite Toeplitz matrices: if all eigenvalues ξ_i of an infinite autocorrelation matrix (N → ∞) are finite and G(ξ_i) is any continuous function over all eigenvalues,

lim_{N→∞} (1/N) Σ_{i=0}^{N−1} G(ξ_i) = (1/2π) ∫_{−π}^{π} G(Φ_SS(ω)) dω   (56)

Resulting distortion-rate function for a KLT of infinite size at high rates:

D_KLT^∞(R) = (πe/6) · 2^{ (1/2π) ∫_{−π}^{π} log2 Φ_SS(ω) dω } · 2^{−2R}   (57)

Asymptotic Rate Distortion Efficiency for KLT of Gaussian Sources at High Rates (cont'd)

Asymptotic distortion-rate function for KLT of infinite size for high rates

D_KLT^∞(R) = (πe/6) · 2^{ (1/2π) ∫_{−π}^{π} log2 Φ_SS(ω) dω } · 2^{−2R}   (58)

The information distortion-rate function (fundamental bound) is smaller by a factor ε² = πe/6:

D(R) = 2^{ (1/2π) ∫_{−π}^{π} log2 Φ_SS(ω) dω } · 2^{−2R}   (59)

Asymptotic transform gain (N → ∞) at high rates:

G_T^∞ = ε² σ_S² 2^{−2R} / D_KLT^∞(R) = [ (1/2π) ∫_{−π}^{π} Φ_SS(ω) dω ] / [ 2^{ (1/2π) ∫_{−π}^{π} log2 Φ_SS(ω) dω } ]   (60)

The asymptotic transform gain (N → ∞) at high rates is identical to the asymptotic prediction gain at high rates

KLT Transform Size Gain for Gauss-Markov at High Rates

Operational distortion-rate function for a KLT of size N, ECSQ, and optimal bit allocation for Gauss-Markov sources with correlation factor ρ:

D_N(R) = (πe/6) · σ_s² · (1 − ρ²)^{1−1/N} · 2^{−2R}   (61)

[Figure: 10 log10( D_N(R) / D_1(R) ) in dB versus transform size N for ρ = 0.9; the curve approaches the asymptotic gain G_T^∞ = 7.21 dB]

Distortion-Rate Functions for Gauss-Markov

Distortion-rate curves for coding a first-order Gauss-Markov source with correlation factor ρ = 0.9 and different transform sizes N

[Figure: SNR in dB versus bit rate for ECSQ+KLT with N = 2, 4, 8, 16, and N → ∞, compared with EC-Lloyd (no transform) and the distortion-rate function D(R); the gain approaches G_T^∞ = 7.21 dB, with a remaining space-filling gain of 1.53 dB to D(R)]

KLTs for Gauss-Markov Sources

[Figure: KLT basis vectors b0 … b7 (N = 8) for Gauss-Markov sources with ρ = 0.1, 0.5, 0.9, 0.95]

Walsh-Hadamard Transform

Very simple orthogonal transform (only additions and a final scaling)
For transform sizes N that are positive integer powers of 2:

A_N = (1/√2) ( A_{N/2}  A_{N/2} ; A_{N/2}  −A_{N/2} )   with   A_1 = [1]   (62)

Transform matrix for N = 8:

A_8 = (1/(2√2)) ·
  [  1  1  1  1  1  1  1  1
     1 −1  1 −1  1 −1  1 −1
     1  1 −1 −1  1  1 −1 −1
     1 −1 −1  1  1 −1 −1  1
     1  1  1  1 −1 −1 −1 −1
     1 −1  1 −1 −1  1 −1  1
     1  1 −1 −1 −1 −1  1  1
     1 −1 −1  1 −1  1  1 −1  ]   (63)

Piecewise-constant basis vectors
Image and video coding: produces subjectively disturbing artifacts when combined with strong quantization

Discrete Fourier Transform (DFT)

Discrete version of the Fourier transform
Forward transform:

u[k] = (1/√N) Σ_{n=0}^{N−1} s[n] e^{−j 2πkn/N}   (64)

Inverse transform:

s[n] = (1/√N) Σ_{k=0}^{N−1} u[k] e^{j 2πkn/N}   (65)

The DFT is an orthonormal transform (specified by a unitary transform matrix)
Produces complex transform coefficients
For real inputs, it obeys the symmetry u[k] = u*[N − k], so that N real samples are mapped onto N real values
The FFT is a fast algorithm for DFT computation that uses a sparse matrix factorization
Implies periodic signal extension: differences between the left and right signal boundaries reduce the rate of convergence of the Fourier series
Strong quantization ⇒ significant high-frequency artifacts

Discrete Fourier Transform vs. Discrete Cosine Transform

[Figure: (a) input time-domain signal; (b) time-domain replica in case of the DFT; (c) time-domain replica in case of the DCT-II]

Derivation of DCT

High-frequency DFT quantization error components can be reduced by introducing an implicit symmetry at the boundaries of the input signal and applying a DFT of approximately double length
Signal with mirror symmetry:

s*[n] = s[n − 1/2]           for 0 ≤ n < N
s*[n] = s[2N − n − 3/2]      for N ≤ n < 2N      (66)

Transform coefficients (orthonormal: divide u*[0] by √2):

u*[k] = (1/√(2N)) Σ_{n=0}^{2N−1} s*[n] e^{−j 2πkn/(2N)}
      = (1/√(2N)) Σ_{n=0}^{N−1} s[n − 1/2] ( e^{−j (π/N) kn} + e^{−j (π/N) k(2N−n−1)} )
      = (1/√(2N)) Σ_{n=0}^{N−1} s[n] ( e^{−j (π/N) k (n + 1/2)} + e^{j (π/N) k (n + 1/2)} )
      = √(2/N) Σ_{n=0}^{N−1} s[n] cos( (π/N) k (n + 1/2) )   (67)

Discrete Cosine Transform (DCT)

The implicit periodicity of the DFT leads to a loss of coding efficiency
This can be reduced by introducing mirror symmetry at the boundaries and applying a DFT of approximately double size
Due to the mirror symmetry, the imaginary sine terms are eliminated and only cosine terms remain
The most common DCT is the so-called DCT-II (mirror symmetry with sample repetition at both sides: n = −1/2)
The DCT and IDCT of Type II are given by

u[k] = α_k Σ_{n=0}^{N−1} s[n] · cos( (π/N) k (n + 1/2) )   (68)

s[n] = Σ_{k=0}^{N−1} α_k · u[k] · cos( (π/N) k (n + 1/2) )   (69)

where α_0 = √(1/N) and α_k = √(2/N) for k ≠ 0
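The DCT-II of eqs. (68)/(69) can be written as an orthonormal matrix and checked numerically (N = 8 is an example size):

```python
import numpy as np

# DCT-II basis as an orthonormal N x N matrix: row k holds
# alpha_k * cos((pi/N) * k * (n + 1/2)) for n = 0 .. N-1.
def dct2_matrix(N):
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    A = np.sqrt(2.0 / N) * np.cos(np.pi / N * k * (n + 0.5))
    A[0, :] = np.sqrt(1.0 / N)   # alpha_0 = sqrt(1/N); the cosine is 1 for k = 0
    return A

A = dct2_matrix(8)
assert np.allclose(A @ A.T, np.eye(8))   # orthonormal: the inverse is A^T
s = np.arange(8, dtype=float)
assert np.allclose(A.T @ (A @ s), s)     # eq. (69) inverts eq. (68)
```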

DCT vs. KLT

The correlation matrix of a first-order Markov process can be written as

R_SS = σ_S² ·
  [  1        ρ        ρ²       ···   ρ^{N−1}
     ρ        1        ρ        ···   ρ^{N−2}
     ⋮        ⋮        ⋮        ⋱     ⋮
     ρ^{N−1}  ρ^{N−2}  ρ^{N−3}  ···   1        ]   (70)

The DCT is a good approximation of the eigenvectors of R_SS
The DCT basis vectors approach the basis functions of the KLT for first-order Markov processes for ρ → 1
The DCT does not depend on the input signal
Fast algorithms exist for computing the forward and inverse transforms
Justification for the wide usage of the DCT (or integer approximations thereof) in image and video coding: JPEG, H.261, H.262/MPEG-2, H.263, MPEG-4, H.264/AVC, H.265/HEVC

KLT Convergence Towards DCT for ρ → 1

[Figure: basis vectors b0 … b7 of the KLT for ρ = 0.9 and of the DCT-II, together with the difference between the two transform matrices,

δ(ρ) = ||A_KLT(ρ) − A_DCT||_2^2

plotted over ρ; δ(ρ) decreases towards 0 for ρ → 1]

Bit Allocation for Audio Signals

o January 23, 2013 43 / 56 Transform Coding Bit Allocation for Audio Signals

Bit allocation based on the human response to sound signals: psychoacoustic models (PM)
After the transform, sets of frequencies are grouped together to form bands
For each band, the PM gives the maximum allowed quantization noise such that the distortion cannot be heard, i.e., the noise is masked
The goals of the encoder are:
  Enforcing the bit rate specified by the user
  Implementing the PM threshold
  Adding noise in less offensive places when there are not enough bits
As can be seen, the transform provides a means for eliminating not only redundancy but also irrelevancy

Transform Type

Coding efficiency for a speech signal [Zelinski and Noll, 1977]

Two-dimensional Transforms

2-D linear transform: the input image is represented as a linear combination of basis images
An orthonormal transform is separable and symmetric if the transform of a signal block s of size N × N can be expressed as

u = A · s · Aᵀ   (71)

where A is the transform matrix and u is the matrix of transform coefficients, both of size N × N
The inverse transform is

s = Aᵀ · u · A   (72)

Great practical importance: the transform requires 2 matrix multiplications of size N × N instead of one multiplication of a vector of size 1 × N² with a matrix of size N² × N²
Reduction of the complexity from O(N⁴) to O(N³)

DCT Example

[Figure: 16×16 image block (left) and the result of a column-wise 1-d DCT (right)]

The 1-d DCT is applied column-wise to the image block to obtain the column-wise DCT result
Notice the energy concentration in the first row (dc coefficients)

DCT Example (cont'd)

[Figure: column-wise DCT result (left) and the result of an additional row-wise 1-d DCT (right)]

For convenience, the column-wise DCT of the previous slide is repeated on the left side
The 1-d DCT is applied row-wise to the column-wise DCT result to obtain the final result
Notice the energy concentration in the first coefficient

Entropy Coding of Transform Coefficients

The AC coefficients are very likely equal to zero (for moderate quantization)
For 2-d transforms, the coefficients are ordered by a zig-zag (or similar) scan
Example for zig-zag scanning in case of a 2-d transform:

185   3   1   1  -3   2  -1   0
  1   1  -1   0  -1   0   0   1
  0   0   1   0  -1   0   0   0
  1   1   0  -1   0   0   0  -1
  0   0   1   0   0   0  -1   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0
  0   0   0   0   0   0   0   0

Huffman code for events {number of leading zeros, coefficient value} or events {end-of-block, number of leading zeros, coefficient value}
Arithmetic coding: for example, use the probabilities that a particular coefficient is unequal to zero when quantizing with a particular step size
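The zig-zag ordering itself is easy to generate by walking the anti-diagonals of the block, alternating direction; the first few indices below follow the familiar 8×8 pattern:

```python
# Zig-zag scan order for an N x N coefficient block, as used before
# run-length and entropy coding of the quantized coefficients.
def zigzag_indices(N):
    order = []
    for d in range(2 * N - 1):                # walk the anti-diagonals
        diag = [(i, d - i) for i in range(N) if 0 <= d - i < N]
        order.extend(diag if d % 2 else diag[::-1])
    return order

idx = zigzag_indices(8)
assert idx[:6] == [(0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2)]
assert len(idx) == 64 and len(set(idx)) == 64  # every coefficient visited once
```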

Analysis of DCT Coefficient Distributions for Images

Let s_{x,y} denote the pixel intensities in a block. The s_{x,y} are assumed to be identically distributed, but not necessarily Gaussian

Each DCT coefficient is a weighted summation of the s_{x,y}
By the central limit theorem, a weighted summation of identically distributed random variables can be well approximated as having a Gaussian distribution
Therefore, the DCT coefficients of such a block should be approximately Gaussian distributed:

f(u | σ²) ≈ (1/(√(2π) σ)) e^{−u²/(2σ²)}   (73)

In typical images, the variance of the blocks has approximately an exponential distribution:

f(σ²) ≈ λ e^{−λσ²}   (74)

One can show that the pdf of each transform coefficient then has approximately a Laplacian distribution:

f(u) ≈ √(λ/2) e^{−√(2λ) |u|}   (75)

Example Histograms of DCT Coefficients – Picture “Lena”

[Figure: histograms of the transform coefficients for the picture “Lena”]

Distribution of Variances

[Figure: histogram of the block variances σ² with the fitted exponential distribution f(σ²) ≈ λ e^{−λσ²}]

Calculation of Approximate Distribution of DCT Coefficients

The distribution of the DCT coefficients can be written as

f(u) = ∫_0^∞ f(u | σ²) · f(σ²) · dσ²   (76)

With the conditional pdf assumption

f(u | σ²) = (1/(√(2π) σ)) e^{−u²/(2σ²)}   (77)

we get a Laplacian distribution for the transform coefficients:

f(u) = ∫_0^∞ (1/(√(2π) σ)) e^{−u²/(2σ²)} · λ e^{−λσ²} · dσ²
     = λ √(2/π) ∫_0^∞ e^{−u²/(2σ²) − λσ²} · dσ
     = λ √(2/π) · (1/2) √(π/λ) e^{−2√(λ u²/2)} = √(λ/2) e^{−√(2λ) |u|}   (78)

Note (from an integral table): ∫_0^∞ e^{−a x² − b x^{−2}} dx = (1/2) √(π/a) e^{−2√(ab)}

Modified Discrete Cosine Transform (MDCT)

The MDCT was introduced by [Princen, Johnson, and Bradley 1987] to avoid block boundary artifacts
The MDCT is a lapped transform that maps 2N-dimensional data to N-dimensional data, F: R^{2N} → R^N
Forward transform:

u_k = Σ_{n=0}^{2N−1} s_n cos( (π/N) (n + 1/2 + N/2) (k + 1/2) )   (79)

Inverse transform:

s_n = (1/N) Σ_{k=0}^{N−1} u_k cos( (π/N) (n + 1/2 + N/2) (k + 1/2) )   (80)

Perfect reconstruction is achieved by adding the IMDCTs of subsequent overlapping blocks
Used extensively in audio coding: MP3, AAC, Dolby AC-3, etc.

MDCT Block Diagram

[Figure: MDCT block diagram: overlapping 2N-sample windows (hop size N) over frames k … k+3 are each mapped to N MDCT coefficients; the 2N-sample IMDCT outputs are overlap-added to reconstruct frames k+1 and k+2]

Summary on Transform Coding

Orthonormal transform: rotation of the coordinate system in signal space
Purpose of the transform: decorrelation, energy concentration ⇒ align quantization cells with the primary axes of the joint pdf
The KLT achieves optimum decorrelation, but is signal dependent and, hence, without a fast algorithm
The DCT shows reduced blocking artifacts compared to the DFT
For Gauss-Markov sources and ρ → 1: the DCT approaches the KLT
For Gaussian sources: bit allocation proportional to the logarithm of the variance (similar to bit allocation in discrete sets, D + λR)
For high rates: optimum bit allocation yields equal component distortions
A larger transform size increases the gain for Gauss-Markov sources
For picture coding: decorrelating transform + entropy-constrained quantization + zig-zag scan + entropy coding is widely used today (e.g., JPEG, MPEG-1/2/4, ITU-T H.261/2/3/4)
For pictures: the pdf of the transform coefficients is Laplacian because of the exponential distribution of the block variances
For audio coding: the MDCT is widely used