
ENEE425: Digital Signal Processing, Fall 2009

The Discrete Cosine Transform

Steve Tjoa, Dept. of Electrical and Computer Engineering, University of Maryland

November 19, 2009

1 Definitions

(Some of the variable names in this guide are different from the ones in Oppenheim and Schafer.)

• The discrete cosine transform (DCT) of a signal x[n] is

$$y^c[k] = \alpha[k] \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi(2n+1)k}{2N}\right)$$

where the scaling factor α[k] is equal to

$$\alpha[k] = \begin{cases} \sqrt{1/N}, & k = 0, \\ \sqrt{2/N}, & k \in \{1, 2, \ldots, N-1\}. \end{cases}$$

• Like the discrete Fourier transform (DFT), the DCT provides a decomposition of any discrete-time signal as a weighted sum of basis functions. In the DFT, the basis functions are complex exponentials. In the DCT, the basis functions are cosines.

• There are actually eight versions of the DCT: type-I through type-VIII. The version mentioned above, and the one used most often in practice, is the type-II DCT.

• The inverse discrete cosine transform (IDCT) of yc[k] is

$$x[n] = \sum_{k=0}^{N-1} \alpha[k]\, y^c[k] \cos\left(\frac{\pi(2n+1)k}{2N}\right).$$

• Define the vectors x and yc as

$$\mathbf{x} = \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix}, \qquad \mathbf{y}^c = \begin{bmatrix} y^c[0] \\ y^c[1] \\ \vdots \\ y^c[N-1] \end{bmatrix}.$$

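Define the DCT matrix C as

$$\mathbf{C} = \begin{bmatrix} c[0,0] & c[0,1] & \cdots & c[0,N-1] \\ c[1,0] & c[1,1] & \cdots & c[1,N-1] \\ \vdots & \vdots & \ddots & \vdots \\ c[N-1,0] & c[N-1,1] & \cdots & c[N-1,N-1] \end{bmatrix}$$

where $c[k, n] = \alpha[k] \cos\left(\frac{\pi(2n+1)k}{2N}\right)$. Then $\mathbf{y}^c = \mathbf{C}\mathbf{x}$.

• Exercise: Show that $\mathbf{C}^T \mathbf{C} = \mathbf{I}$. (If $\mathbf{C}^T \mathbf{C} = \mathbf{I}$, then C is an orthonormal matrix.)

• Since C is an orthonormal matrix,

$$\mathbf{C}^T \mathbf{y}^c = \mathbf{C}^T \mathbf{C} \mathbf{x} = \mathbf{x}.$$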

Therefore, the forward and inverse DCT can be concisely described as

$$\mathbf{y}^c = \mathbf{C}\mathbf{x}, \qquad \mathbf{x} = \mathbf{C}^T \mathbf{y}^c.$$
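The matrix form above can be checked numerically. The following is a minimal sketch using only the Python standard library; the helper names (`alpha`, `dct_matrix`, `matvec`, `transpose`, `matmul`) and the example signal are illustrative assumptions, not part of the original notes. It builds C from the definition of c[k, n], applies the forward and inverse transforms, and measures how far C^T C is from the identity.

```python
import math

def alpha(k, N):
    """Scaling factor alpha[k] from the DCT definition."""
    return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)

def dct_matrix(N):
    """N x N type-II DCT matrix with c[k, n] = alpha[k] cos(pi (2n+1) k / (2N))."""
    return [[alpha(k, N) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def matmul(A, B):
    """Matrix-matrix product A B."""
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

N = 8
C = dct_matrix(N)
x = [1.0, 2.0, 3.0, 4.0, 4.0, 3.0, 2.0, 1.0]  # arbitrary example signal (assumption)

yc = matvec(C, x)                 # forward DCT: yc = C x
x_rec = matvec(transpose(C), yc)  # inverse DCT: x = C^T yc

# Largest deviation of C^T C from the identity matrix.
CtC = matmul(transpose(C), C)
max_dev = max(abs(CtC[i][j] - (1.0 if i == j else 0.0))
              for i in range(N) for j in range(N))

print("max |C^T C - I|          =", max_dev)
print("max reconstruction error =", max(abs(a - b) for a, b in zip(x, x_rec)))
```

Both printed values should be at the level of floating-point round-off, which is consistent with (but does not replace) the exercise of proving C^T C = I.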

2 Properties

1. If x[n] is real for all n, then yc[k] is real for all k.

2. Parseval’s Relation: $\|\mathbf{x}\| = \|\mathbf{y}^c\|$, or equivalently, $\sum_n |x[n]|^2 = \sum_k |y^c[k]|^2$.

3. The DCT has excellent energy compaction for many real-world signals (e.g., signals with high correlation among neighboring samples).

Digression: Let $\mathbf{y}^s = \mathbf{S}\mathbf{x}$, where $\mathbf{y}^s$ is the discrete sine transform (DST) of x, and S is the DST matrix. (Properties 1 and 2 also hold for the DST. We will not properly introduce the DST because it is rarely used.) If, for most real-world signals x[n] and for any choice of K less than N − 1,

$$\sum_{k=0}^{K} |y^c[k]|^2 > \sum_{k=0}^{K} |y^s[k]|^2,$$

then the DCT has better energy compaction than the DST (generally speaking).

Fact: The DCT has better energy compaction than the DST.
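Parseval's relation and the energy-compaction comparison can be illustrated on a single smooth signal. The sketch below is only an illustration under stated assumptions: the notes do not define the DST matrix S, so an orthonormal type-I DST is assumed here, and the test signal (a slow ramp with an offset) and all function names are hypothetical choices. One example does not establish the general fact.

```python
import math

def dct_matrix(N):
    """Orthonormal type-II DCT matrix, as defined in Section 1."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]

def dst_matrix(N):
    """ASSUMPTION: orthonormal type-I DST, s[k, n] = sqrt(2/(N+1)) sin(pi (k+1)(n+1)/(N+1))."""
    s = math.sqrt(2.0 / (N + 1))
    return [[s * math.sin(math.pi * (k + 1) * (n + 1) / (N + 1))
             for n in range(N)] for k in range(N)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def partial_energy(y, K):
    """Sum of |y[k]|^2 for k = 0, ..., K."""
    return sum(c * c for c in y[:K + 1])

N = 16
x = [10.0 + n for n in range(N)]  # smooth, highly correlated test signal (assumption)

yc = matvec(dct_matrix(N), x)  # DCT coefficients
ys = matvec(dst_matrix(N), x)  # DST coefficients (assumed type-I)

total = partial_energy(x, N - 1)  # signal energy ||x||^2
print("energy of x, yc, ys:", total,
      partial_energy(yc, N - 1), partial_energy(ys, N - 1))

# Fraction of the total energy captured by the first K+1 coefficients.
for K in (0, 1, 3, 7):
    print(K, partial_energy(yc, K) / total, partial_energy(ys, K) / total)
```

The three energies printed first should agree up to round-off (Parseval), while the per-K fractions show how quickly each transform accumulates the signal energy for this particular input.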

3 Uses in Signal Compression

• Compression of a signal involves both truncation and quantization of its transform coefficients.

• The mean-squared error (MSE) between two signals of length N, $x[n]$ and $\hat{x}[n]$, is defined to be

$$\mathrm{MSE}(\mathbf{x}, \hat{\mathbf{x}}) = \frac{1}{N} \|\mathbf{x} - \hat{\mathbf{x}}\|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |x[n] - \hat{x}[n]|^2.$$

• The peak signal-to-noise ratio (PSNR) between $x[n]$ and $\hat{x}[n]$ is, in decibels,

$$\mathrm{PSNR}(\mathbf{x}, \hat{\mathbf{x}}) = 10 \log_{10} \frac{P^2}{\mathrm{MSE}(\mathbf{x}, \hat{\mathbf{x}})}$$

where P is the maximum possible value that $x[n]$ or $\hat{x}[n]$ can take. For example, in images, P = 255 because eight-bit pixel values are between 0 and 255.

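• Define

$$\hat{y}^c[k] = \begin{cases} y^c[k], & 0 \le k \le K, \\ 0, & K < k \le N-1, \end{cases} \qquad \hat{y}^s[k] = \begin{cases} y^s[k], & 0 \le k \le K, \\ 0, & K < k \le N-1. \end{cases}$$

(Exercise: Show that an equivalent way of comparing the energy compaction between the DCT and DST is $\|\hat{\mathbf{y}}^c\| > \|\hat{\mathbf{y}}^s\|$.) Now, define the reconstructed signals $\hat{\mathbf{x}}^c = \mathbf{C}^T \hat{\mathbf{y}}^c$ and $\hat{\mathbf{x}}^s = \mathbf{S}^T \hat{\mathbf{y}}^s$. Exercise: Use the energy compaction property to show that

$$\mathrm{MSE}(\mathbf{x}, \hat{\mathbf{x}}^c) < \mathrm{MSE}(\mathbf{x}, \hat{\mathbf{x}}^s),$$

i.e., $\hat{\mathbf{x}}^c$ retains more information about x than $\hat{\mathbf{x}}^s$. Equivalently,

$$\mathrm{PSNR}(\mathbf{x}, \hat{\mathbf{x}}^c) > \mathrm{PSNR}(\mathbf{x}, \hat{\mathbf{x}}^s).$$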

• When compressing signals, the DCT is rarely computed over the entire signal. Instead, the DCT is computed for several (possibly overlapping) shorter segments within the signal.

• Suppose that, for some integers k and l, Var(yc[k]) > Var(yc[l]). If more bits are devoted to representing yc[k] than yc[l], then the MSE is lower than in the case where more bits are devoted to representing yc[l] than yc[k]. In other words, high-variance coefficients should receive more bits.

• The compression ratio (CR) of a compression scheme is the original file size divided by the compressed file size. A good compression scheme has a high compression ratio.
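The truncation step and the two error measures in this section can be sketched as follows. This is a minimal illustration, not a compression scheme: quantization is omitted, and the eight sample values, the choice K = 2, and the function names are assumptions made for the example. Because C is orthonormal, the error energy in the signal domain equals the energy of the discarded coefficients, so the MSE can also be computed directly from yc[k] for k > K; the sketch prints both.

```python
import math

def dct_matrix(N):
    """Orthonormal type-II DCT matrix, as defined in Section 1."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def mse(x, x_hat):
    """Mean-squared error between two equal-length signals."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def psnr(x, x_hat, P=255.0):
    """Peak signal-to-noise ratio in dB; P = 255 assumes eight-bit sample values."""
    return 10.0 * math.log10(P * P / mse(x, x_hat))

x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]  # example 8-bit samples (assumption)
N = len(x)
C = dct_matrix(N)
yc = matvec(C, x)

K = 2  # keep coefficients 0..K, zero out the rest
yc_hat = [c if k <= K else 0.0 for k, c in enumerate(yc)]
x_hat = matvec(transpose(C), yc_hat)  # reconstructed signal

# MSE predicted from the discarded coefficients alone (follows from orthonormality of C).
discarded = sum(c * c for c in yc[K + 1:]) / N

print("MSE in signal domain    :", mse(x, x_hat))
print("MSE from discarded coefs:", discarded)
print("PSNR (dB)               :", psnr(x, x_hat))
```

The same relationship is what makes the exercise above work: whichever transform leaves less energy in the discarded coefficients gives the smaller MSE.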

4 Two-Dimensional DCT

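• The two-dimensional DCT of a two-dimensional signal x[m, n] is

$$\mathbf{Y}^c = \mathbf{C}\mathbf{X}\mathbf{C}^T$$

where X and $\mathbf{Y}^c$ are the matrices whose entries are x[m, n] and yc[k, l], respectively. The inverse two-dimensional DCT of yc[k, l] is

$$\mathbf{X} = \mathbf{C}^T \mathbf{Y}^c \mathbf{C}.$$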

• Interpretation: The 1D DCT provides a decomposition of x[n] as a weighted sum of basis vectors (i.e., cosines), where yc[k] defines the weights of each basis vector. Similarly, the 2D DCT decomposes x[m, n] as a weighted sum of basis images where yc[k, l] defines the weights of each basis image.
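The 2D formulas and the basis-image interpretation can be checked with a small example. The sketch below assumes a square N x N signal so that the same matrix C applies to rows and columns; the 4 x 4 test matrix and the helper names are hypothetical. It computes the forward and inverse 2D DCT and then rebuilds X as a weighted sum of basis images, where the basis image for index (k, l) is taken to be the outer product of rows k and l of C (this follows from X = C^T Yc C).

```python
import math

def dct_matrix(N):
    """Orthonormal type-II DCT matrix, as defined in Section 1."""
    def alpha(k):
        return math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
    return [[alpha(k) * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
             for n in range(N)] for k in range(N)]

def transpose(A):
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    Bt = transpose(B)
    return [[sum(a * b for a, b in zip(row, col)) for col in Bt] for row in A]

N = 4
C = dct_matrix(N)
Ct = transpose(C)

# Arbitrary N x N test signal x[m, n] (assumption).
X = [[float(m * N + n) for n in range(N)] for m in range(N)]

Yc = matmul(matmul(C, X), Ct)      # forward 2D DCT: Yc = C X C^T
X_rec = matmul(matmul(Ct, Yc), C)  # inverse 2D DCT: X = C^T Yc C

def basis_image(k, l):
    """Basis image for coefficient (k, l): entry [m][n] is c[k, m] * c[l, n]."""
    return [[C[k][m] * C[l][n] for n in range(N)] for m in range(N)]

# Rebuild X as the sum over (k, l) of yc[k, l] times the (k, l) basis image.
X_sum = [[0.0] * N for _ in range(N)]
for k in range(N):
    for l in range(N):
        B = basis_image(k, l)
        for m in range(N):
            for n in range(N):
                X_sum[m][n] += Yc[k][l] * B[m][n]

def max_abs_diff(A, B):
    return max(abs(a - b) for ra, rb in zip(A, B) for a, b in zip(ra, rb))

print("inverse 2D DCT error    :", max_abs_diff(X, X_rec))
print("basis-image sum error   :", max_abs_diff(X, X_sum))
```

Both errors should be at round-off level, which matches the reading of yc[k, l] as the weight of the (k, l) basis image.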
