ENEE425: Digital Signal Processing, Fall 2009
The Discrete Cosine Transform
Steve Tjoa Dept. of Electrical and Computer Engineering, University of Maryland
November 19, 2009
1 Definitions
(Some of the variable names in this guide are different from the ones in Oppenheim and Schafer.)
• The discrete cosine transform (DCT) of a signal x[n] is
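$$y^c[k] = \alpha[k] \sum_{n=0}^{N-1} x[n] \cos\left(\frac{\pi(2n+1)k}{2N}\right)$$
where the scaling factor $\alpha[k]$ is equal to
$$\alpha[k] = \begin{cases} \sqrt{1/N}, & k = 0, \\ \sqrt{2/N}, & k \in \{1, 2, \ldots, N-1\}. \end{cases}$$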
• Like the discrete Fourier transform (DFT), the DCT provides a decomposition of any discrete-time signal as a weighted sum of basis functions. In the DFT, the basis functions are complex exponentials. In the DCT, the basis functions are cosines.
• There are actually eight versions of the DCT: type-I through type-VIII. The version mentioned above, and the one used most often in practice, is the type-II DCT.
• The inverse discrete cosine transform (IDCT) of $y^c[k]$ is
$$x[n] = \sum_{k=0}^{N-1} \alpha[k]\, y^c[k] \cos\left(\frac{\pi(2n+1)k}{2N}\right).$$
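• Define the vectors $x$ and $y^c$ as
$$x = \begin{bmatrix} x[0] \\ x[1] \\ \vdots \\ x[N-1] \end{bmatrix}, \qquad y^c = \begin{bmatrix} y^c[0] \\ y^c[1] \\ \vdots \\ y^c[N-1] \end{bmatrix}.$$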
• Define the DCT matrix $C$ as
$$C = \begin{bmatrix} c[0,0] & c[0,1] & \cdots & c[0,N-1] \\ c[1,0] & c[1,1] & \cdots & c[1,N-1] \\ \vdots & \vdots & \ddots & \vdots \\ c[N-1,0] & c[N-1,1] & \cdots & c[N-1,N-1] \end{bmatrix}$$
where $c[k,n] = \alpha[k] \cos\left(\frac{\pi(2n+1)k}{2N}\right)$. Then $y^c = Cx$.
• Exercise: Show that $C^T C = I$. (If $C^T C = I$, then $C$ is an orthonormal matrix.)
• Since $C$ is an orthonormal matrix, then
$$C^T y^c = C^T C x = x.$$
Therefore, the forward and inverse DCT can be concisely described as
$$y^c = Cx, \qquad x = C^T y^c.$$
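The matrix form above can be sketched in a few lines of plain Python. This is only an illustrative sketch using the standard library; the helper names (`dct_matrix`, `matvec`, `transpose`, `dct`, `idct`) and the example signal are assumptions made here, not part of the handout. It builds C from the definition of c[k, n], checks numerically that C^T C is close to I (a numerical check, not a proof of the exercise), and confirms that the inverse recovers x.

```python
import math

def dct_matrix(N):
    """Type-II DCT matrix with c[k][n] = alpha[k] * cos(pi*(2n+1)*k / (2N))."""
    C = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        C.append([alpha * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N)])
    return C

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def dct(x):
    """Forward DCT: y = C x."""
    return matvec(dct_matrix(len(x)), x)

def idct(y):
    """Inverse DCT: x = C^T y."""
    return matvec(transpose(dct_matrix(len(y))), y)

# Arbitrary example signal (an assumption for illustration).
x = [1.0, 2.0, 3.0, 4.0]
y = dct(x)
x_rec = idct(y)

# Largest deviation of C^T C from the identity matrix.
C = dct_matrix(len(x))
Ct = transpose(C)
max_dev = max(
    abs(sum(Ct[i][k] * C[k][j] for k in range(len(x))) - (1.0 if i == j else 0.0))
    for i in range(len(x)) for j in range(len(x))
)
print("DCT coefficients:", y)
print("max |C^T C - I| entry:", max_dev)
print("reconstruction:", x_rec)
```

Building the full matrix costs O(N^2) operations per transform, which is fine for illustrating the definitions; practical implementations typically use fast algorithms instead.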
2 Properties
1. If $x[n]$ is real for all $n$, then $y^c[k]$ is real for all $k$.
2. Parseval's Relation: $\|x\| = \|y^c\|$, or equivalently, $\sum_n |x[n]|^2 = \sum_k |y^c[k]|^2$.
3. The DCT has excellent energy compaction for many real-world signals (e.g., signals with high correlation among neighboring samples).
Digression: Let $y^s = Sx$, where $y^s$ is the discrete sine transform (DST) of $x$, and $S$ is the DST matrix. (Properties 1 and 2 also hold for the DST. We will not properly introduce the DST because it is rarely used.) If, for most real-world signals $x[n]$ and for any choice of $K$ less than $N - 1$,
$$\sum_{k=0}^{K} |y^c[k]|^2 > \sum_{k=0}^{K} |y^s[k]|^2$$
then the DCT has better energy compaction than the DST (generally speaking).
Fact: The DCT has better energy compaction than the DST.
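Parseval's relation and energy compaction can be checked numerically. The sketch below is an illustration under stated assumptions: the function names and the test signal (a slow ramp, chosen as a stand-in for a signal with highly correlated neighboring samples) are hypothetical and do not come from the handout. It compares the energy of x with the energy of its DCT and reports the fraction of energy held by the first K + 1 coefficients. The DST is not defined in these notes, so no DST comparison is attempted.

```python
import math

def dct(x):
    """Type-II DCT computed directly from the definition."""
    N = len(x)
    y = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        y.append(alpha * sum(
            x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
            for n in range(N)))
    return y

def energy(v):
    """Sum of squared magnitudes of a real sequence."""
    return sum(a * a for a in v)

# Hypothetical smooth signal: a slow ramp of length 16.
N = 16
x = [0.5 * n + 1.0 for n in range(N)]
y = dct(x)

# Parseval: both energies should agree up to rounding error.
print("energy of x:", energy(x))
print("energy of y:", energy(y))

# Energy compaction: share of the total energy in coefficients 0..K.
K = 1
frac = energy(y[:K + 1]) / energy(y)
print("fraction of energy in first", K + 1, "coefficients:", frac)
```

For a signal with little correlation between neighboring samples (for example, white noise), the same fraction would be expected to be much closer to (K + 1)/N.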
3 Uses in Signal Compression
• Compression of a digital signal involves both truncation and quantization of its transform coefficients.
• The mean-squared error (MSE) between two signals of length $N$, $x[n]$ and $\hat{x}[n]$, is defined to be
$$\mathrm{MSE}(x, \hat{x}) = \frac{1}{N} \|x - \hat{x}\|^2 = \frac{1}{N} \sum_{n=0}^{N-1} |x[n] - \hat{x}[n]|^2.$$
• The peak signal-to-noise ratio (PSNR) between $x[n]$ and $\hat{x}[n]$ is, in decibels,
$$\mathrm{PSNR}(x, \hat{x}) = 10 \log_{10} \frac{P^2}{\mathrm{MSE}(x, \hat{x})}$$
where $P$ is the maximum possible value that $x[n]$ or $\hat{x}[n]$ can take. For example, in images, $P = 255$ because eight-bit pixel values are between 0 and 255.
• Define
$$\hat{y}^c[k] = \begin{cases} y^c[k], & 0 \le k \le K, \\ 0, & K < k \le N-1, \end{cases} \qquad \hat{y}^s[k] = \begin{cases} y^s[k], & 0 \le k \le K, \\ 0, & K < k \le N-1. \end{cases}$$
(Exercise: Show that an equivalent way of comparing the energy compaction between the DCT and DST is $\|\hat{y}^c\| > \|\hat{y}^s\|$.) Now, define the reconstructed signals $\hat{x}^c = C^T \hat{y}^c$ and $\hat{x}^s = S^T \hat{y}^s$. Exercise: Use the energy compaction property to show that
$$\mathrm{MSE}(x, \hat{x}^c) < \mathrm{MSE}(x, \hat{x}^s),$$
i.e., $\hat{x}^c$ retains more information about $x$ than $\hat{x}^s$. Equivalently,
$$\mathrm{PSNR}(x, \hat{x}^c) > \mathrm{PSNR}(x, \hat{x}^s).$$
• When compressing signals, the DCT is rarely computed over the entire signal. Instead, the DCT is computed for several (possibly overlapping) shorter segments within the signal.
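• Suppose that, for some integers $k$ and $l$, $\mathrm{Var}(y^c[k]) > \mathrm{Var}(y^c[l])$. If more bits are devoted to representing $y^c[k]$ than $y^c[l]$, then the MSE is lower than in the case where more bits are devoted to representing $y^c[l]$ than $y^c[k]$. In other words, coefficients with higher variance should receive more bits.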
• The compression ratio (CR) of a compression scheme is the original file size divided by the compressed file size. A good compression scheme has a high compression ratio.
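The truncation, MSE, and PSNR definitions in this section can be tried out on a short signal. The following is a rough sketch, not a compression scheme: it only truncates coefficients and performs no quantization. The helper names and the eight sample values are assumptions made for illustration. Because C is orthonormal, the MSE of the truncated reconstruction should equal the energy of the discarded coefficients divided by N, which the script prints side by side.

```python
import math

def dct_matrix(N):
    """Type-II DCT matrix with c[k][n] = alpha[k] * cos(pi*(2n+1)*k / (2N))."""
    C = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        C.append([alpha * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N)])
    return C

def matvec(A, v):
    """Matrix-vector product A v."""
    return [sum(a * b for a, b in zip(row, v)) for row in A]

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def mse(x, x_hat):
    """Mean-squared error between two equal-length real signals."""
    return sum((a - b) ** 2 for a, b in zip(x, x_hat)) / len(x)

def psnr(x, x_hat, peak):
    """PSNR in decibels; returns infinity when the signals are identical."""
    err = mse(x, x_hat)
    return math.inf if err == 0 else 10.0 * math.log10(peak ** 2 / err)

def truncate_and_reconstruct(x, K):
    """Keep DCT coefficients 0..K, zero the rest, and invert with C^T."""
    N = len(x)
    C = dct_matrix(N)
    y = matvec(C, x)
    y_hat = [y[k] if k <= K else 0.0 for k in range(N)]
    return matvec(transpose(C), y_hat), y

# Hypothetical eight-sample signal with 8-bit-style values (P = 255).
x = [52.0, 55.0, 61.0, 66.0, 70.0, 61.0, 64.0, 73.0]
K = 3
x_hat, y = truncate_and_reconstruct(x, K)

discarded = sum(v * v for v in y[K + 1:]) / len(x)
print("MSE of reconstruction:      ", mse(x, x_hat))
print("discarded energy divided by N:", discarded)
print("PSNR (dB):", psnr(x, x_hat, 255.0))
```

Lowering K discards more coefficients, so the MSE can only stay the same or grow and the PSNR can only stay the same or fall.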
4 Two-Dimensional DCT
• The two-dimensional DCT of a two-dimensional signal x[m, n] is
$$Y^c = C X C^T$$
and the inverse two-dimensional DCT of $y^c[k, l]$ is
$$X = C^T Y^c C.$$
• Interpretation: The 1D DCT provides a decomposition of $x[n]$ as a weighted sum of basis vectors (i.e., cosines), where $y^c[k]$ defines the weights of each basis vector. Similarly, the 2D DCT decomposes $x[m, n]$ as a weighted sum of basis images where $y^c[k, l]$ defines the weights of each basis image.
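The 2D forward and inverse transforms can be sketched directly from the two matrix products above. This sketch assumes a square N-by-N input so that one matrix C serves both rows and columns; that restriction, the helper names, and the example arrays are assumptions made here for illustration. It checks the round trip on a small array and shows that a constant array maps to a single nonzero coefficient, the weight of the constant basis image.

```python
import math

def dct_matrix(N):
    """Type-II DCT matrix with c[k][n] = alpha[k] * cos(pi*(2n+1)*k / (2N))."""
    C = []
    for k in range(N):
        alpha = math.sqrt(1.0 / N) if k == 0 else math.sqrt(2.0 / N)
        C.append([alpha * math.cos(math.pi * (2 * n + 1) * k / (2 * N))
                  for n in range(N)])
    return C

def transpose(A):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product A B for matrices stored as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def dct2(X):
    """2D DCT of a square array: Y = C X C^T."""
    C = dct_matrix(len(X))
    return matmul(matmul(C, X), transpose(C))

def idct2(Y):
    """Inverse 2D DCT of a square array: X = C^T Y C."""
    C = dct_matrix(len(Y))
    return matmul(matmul(transpose(C), Y), C)

# Hypothetical 4-by-4 example array.
X = [[float(4 * m + n) for n in range(4)] for m in range(4)]
Y = dct2(X)
X_rec = idct2(Y)
round_trip_error = max(abs(X[m][n] - X_rec[m][n])
                       for m in range(4) for n in range(4))
print("max round-trip error:", round_trip_error)

# A constant array has energy only in the [0, 0] coefficient.
ones = [[1.0] * 4 for _ in range(4)]
Y_ones = dct2(ones)
print("Y[0][0] for all-ones input:", Y_ones[0][0])
```

Writing the 2D transform as two matrix products reflects its separability: the 1D DCT is applied along one dimension and then along the other.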