
Lecture 13

Discrete Fourier Transforms (cont’d)

The Discrete Cosine Transform (DCT)

Here we briefly examine the DCT. It actually exists in several forms, one of which provides the basis of the standard "JPEG" compression method. First recall the standard definition of the DFT which is employed in this course. Let f ∈ ℝ^N denote a set of data points f[n], n = 0, 1, ⋯, N−1, with the understanding that f is part of an N-periodic sequence, i.e., f[n + N] = f[n], n ∈ ℤ. Then the discrete Fourier transform F = \mathcal{F} f is defined as follows,

F[k] = \sum_{n=0}^{N-1} f[n] \exp\left(-\frac{i 2\pi k n}{N}\right), \quad k = 0, 1, \cdots, N-1.   (1)

The inverse DFT is given by

f[n] = \frac{1}{N} \sum_{k=0}^{N-1} F[k] \exp\left(\frac{i 2\pi k n}{N}\right), \quad n = 0, 1, \cdots, N-1.   (2)

Recall from our discussion of the Fourier series representations of functions f(x) of a continuous variable −π ≤ x ≤ π, that the series actually defines the 2π-periodic extension of f(x) for all x ∈ ℝ. If f(−π) ≠ f(π), then the 2π-periodic extension of f(x) is not continuous at these points. As we saw, such a discontinuity introduces convergence problems at the endpoints, i.e., Gibbs phenomenon. One way to overcome such problems was to consider the function f(x) as defined over the half-interval [0, π]. It was also assumed to be an even function, i.e., f(−x) = f(x). The resulting 2π-periodic extension of f(x) is now continuous at ±π.

This same complication arises with data sequences. The N-point DFT assumes an N-point periodization of the data set f[n], n = 0, 1, ⋯, N−1, as mentioned earlier. This implies that f[0] = f[N], but it does not imply that f[N−1] is close to f[N], as sketched in the figure below. In such situations, there will be convergence problems near the endpoints. (And, as we have already seen, these problems manifest themselves over the entire data set, slowing down the overall convergence.) So the idea is to perform an even periodic extension of the N-point data set f[n], n = 0, 1, ⋯, N−1.

Before we get to the actual construction of such a data set, let us examine the implications for the form of the transform that will be associated with this set. Since it will be even, we shall not require the sine functions that comprise part of the complex exponential in the DFT. As such, we

[Figure: circles sketching the N-point data set f[n], flanked on both sides by its periodic extensions.]

N-point data set f[n] and the periodic extension f[n + N]= f[n] implicit in the discrete Fourier transform.

might expect that the transform will assume the form,

F[k] = \sum_{n=0}^{N-1} f[n] \cos\left(\frac{2\pi k n}{N}\right), \quad k = 0, 1, \cdots, N-1.   (3)

This is almost correct – one slight modification must be made. This would be the transform for N points over an interval centred at n = 0. But we really want this transform to correspond to a signal of N points lying only on one side of n = 0. Remember that these N points would then be "flipped" onto the other side to create the even data set. As such – modulo one other technical consideration to be addressed in a moment – the above formula should have N changed to 2N. The resulting transform,

F[k] = \sum_{n=0}^{N-1} f[n] \cos\left(\frac{\pi k n}{N}\right), \quad k = 0, 1, \cdots, N-1,   (4)

is known as a discrete cosine transform (DCT). Note the use of the article "a" again! This is actually one of several versions, and is commonly referred to as "DCT-I", i.e., DCT Version 1.

Let's now address the "technical consideration" referred to above. Consider the "flipping" of the N-point data set f[n], n = 0, 1, ⋯, N−1, to produce an even data set, f[n] = f[−n], as sketched in the figure below.

In this process, we have produced a data set of length 2N−1, and not 2N, since the data point f[0] has not been repeated. Essentially, we have "lost" one data point. This might not seem like a big deal, and it isn't in some considerations. But studies have shown that this transform is not ideal – convergence is improved if we preserve the point f[0] to produce a 2N-point data set. The question is, "How do we do it?" The answer is to "flip" the data set f[n], n = 0, 1, ⋯, N−1, with respect to the line n = −1/2, as sketched in the figure below.

[Figure: circles sketching the even extension of the N-point data set about n = 0.]

Even extension of the N-point data set f[n], obtained by inverting the set about f[0] so that f[−n] = f[n]. This is the basis of the "DCT-I" method.

The resulting 2N-point data set is now even with respect to n = −1/2, and may be periodically extended. Note that f[0] = f[−1], f[N−1] = f[N], etc. In other words, repetition of data values has been introduced at the points n = pN, p ∈ ℤ: f[pN] = f[pN−1].

[Figure: circles sketching the even extension of the N-point data set about the line n = −1/2.]

Even extension of the N-point data set f[n], obtained by inverting and copying the entire set f[n], including f[0]. The result is a 2N-point data set. This is the basis of the "DCT-II" method.

The remaining question is, "What will be the proper form of the associated discrete cosine transform for this data set?" First of all, because we are now working with a 2N-point data set – the set is 2N-periodic and not N-periodic – the N in the argument of the cosine function must be replaced by 2N. Second, because the data set is even with respect to n = −1/2, we must shift the n parameter in

Eq. (4) by one-half to the left, i.e., replace n with n + 1/2. The result,

F[k] = \sum_{n=0}^{N-1} f[n] \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right), \quad k = 0, 1, \cdots, N-1,   (5)

will define what we shall call Version 1.1 of the discrete cosine transform, or "DCT-I.I".

Before discussing this version of the DCT in more mathematical detail, we mention that a comparison of the two figures above indicates why the DCT-I.I method may perform better than DCT-I. The copying of the f[0] data value in DCT-I.I produces a kind of "flattening" of the resulting signal at n = −1 and n = 0, as opposed to the potential cusp produced by DCT-I.

And now for the mathematical details. First of all, we claim that the set of N-vectors u_k, k = 0, 1, 2, ⋯, N−1, with components defined as follows,

u_k[n] = \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right), \quad n = 0, 1, \cdots, N-1,   (6)

form an orthogonal set in ℝ^N. The first basis vector, k = 0, is an easy one:

u_0[n] = \cos(0) = 1, \quad \text{implying that} \quad u_0 = (1, 1, \cdots, 1).   (7)

It follows immediately that

\langle u_0, u_0 \rangle = \sum_{n=0}^{N-1} 1 = N.   (8)

For k ≠ 0, we have

\langle u_k, u_k \rangle = \sum_{n=0}^{N-1} \cos^2\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right)
= \sum_{n=0}^{N-1} \left[\frac{1}{2} + \frac{1}{2}\cos\left(\frac{(2n+1)\pi}{N}k\right)\right]
= \frac{N}{2}.   (9)

The fact that the sum of the discrete cosine functions is zero may be verified by expressing the cosine in terms of complex exponentials. The sums over both exponentials are finite geometric series which vanish, in the same way that they did for the discrete Fourier transform. Finally, for k ≠ l, we simply state that

\langle u_k, u_l \rangle = \sum_{n=0}^{N-1} \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right)\cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)l\right)

= \frac{1}{2}\sum_{n=0}^{N-1} \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)(k+l)\right) + \frac{1}{2}\sum_{n=0}^{N-1} \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)(k-l)\right) = 0.   (10)

Once again, the fact that each of the sums is zero may be shown by expressing the cosine functions in terms of complex exponentials.

From the above results, it follows that the family of N-vectors e_k, defined below, forms an orthonormal basis of ℝ^N:

e_k[n] = \lambda_k \sqrt{\frac{2}{N}} \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right), \quad n = 0, 1, \cdots, N-1,   (11)

where

\lambda_k = \begin{cases} \frac{1}{\sqrt{2}}, & k = 0, \\ 1, & k \neq 0. \end{cases}   (12)

The special normalization required for the cos(0) function reminds us of the situation with Fourier cosine series.

In the figure below are presented plots of the N = 8-point orthonormal functions e_k[n], k = 0, 1, ⋯, 7. These functions are rather special, since they form the basis of the JPEG compression standard.

Given any f ∈ ℝ^N, its expansion in the orthonormal basis e_k will be given by

f = \sum_{k=0}^{N-1} c_k e_k,   (13)

where the Fourier coefficients c_k are given by

c_k = \langle f, e_k \rangle = \sum_{n=0}^{N-1} f[n]\, \lambda_k \sqrt{\frac{2}{N}} \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right).   (14)

As in the case of the discrete Fourier transform, we consider the ck to define the discrete cosine transform (DCT) of f, i.e.,

F[k] = \lambda_k \sqrt{\frac{2}{N}} \sum_{n=0}^{N-1} f[n] \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right).   (15)

The inverse discrete cosine transform may be found by using the orthonormality of the e_k:

f[n] = \sqrt{\frac{2}{N}} \sum_{k=0}^{N-1} F[k]\, \lambda_k \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right).   (16)

[Figure: bar graphs of the eight basis vectors e_0[n], e_1[n], e_2[n], e_3[n] (top row) and e_4[n], e_5[n], e_6[n], e_7[n] (bottom row), each plotted against n = 0, 1, ⋯, 7.]

The N = 8-point DCT-II orthonormal functions e_k[n], k = 0, 1, ⋯, 7, plotted as bar graphs because of their discrete nature.

These two formulas define the so-called "DCT-II" discrete cosine transform that is employed in the JPEG compression standard. They also correspond to the dct and idct functions in MATLAB, i.e.,

F = dct(f) f = idct(F) .
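For readers who wish to check Eqs. (15) and (16) outside MATLAB, the same orthonormal DCT-II pair is available in Python as scipy.fft.dct/idct with type=2 and norm='ortho'. A minimal sketch (the test vector is arbitrary):

```python
import numpy as np
from scipy.fft import dct, idct

# Direct implementation of Eq. (15): orthonormal DCT-II with weights lambda_k.
def dct_ii(f):
    N = len(f)
    n = np.arange(N)
    lam = np.where(np.arange(N) == 0, 1/np.sqrt(2), 1.0)
    return np.array([lam[k]*np.sqrt(2/N)*np.sum(f*np.cos(np.pi/N*(n + 0.5)*k))
                     for k in range(N)])

f = np.array([1.0, 2.0, 1.0, -2.0])                      # arbitrary test vector
assert np.allclose(dct_ii(f), dct(f, type=2, norm='ortho'))
assert np.allclose(idct(dct(f, type=2, norm='ortho'), type=2, norm='ortho'), f)
```

The norm='ortho' option is what makes the scipy routine agree with the λ_k-weighted formulas above; without it, scipy uses a different scaling convention.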

In many signal processing books, however (see, for example, the book by Mallat), the practice is to remove the normalization term from the forward DCT, and then modify the inverse DCT accordingly. The result is as follows,

F[k] = \lambda_k \sum_{n=0}^{N-1} f[n] \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right),   (17)

with inverse

f[n] = \frac{2}{N} \sum_{k=0}^{N-1} F[k]\, \lambda_k \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right).   (18)

The normalization factors λ_k must be kept in both formulas, to differentiate the k = 0 case from the k ≠ 0 case.

Examples: N = 4. We use the 4-vectors examined in the section on the DFT. In what follows, we use the DCT-II transforms (which may be verified with MATLAB):

1. f = (1, 1, 1, 1). F = (2, 0, 0, 0).

2. g = (0, 1, 0, 1). G = (1, (1 − √2) cos(π/8), 0, −cos(π/8)) ≈ (1, −0.3827, 0, −0.9239).

3. h = (1, 2, 1, 2). H = (3, (1 − √2) cos(π/8), 0, −cos(π/8)).

Note that h = f + g, implying that H = F + G, because of the linearity of the DCT.

4. a = (1, 0, 1, 0). A = (1, (√2 − 1) cos(π/8), 0, cos(π/8)) ≈ (1, 0.3827, 0, 0.9239).

Note that a represents a left (or right) shift of the vector g in 2. The transforms G and A appear to be related.
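The N = 4 examples above can likewise be checked numerically. A sketch, with scipy's orthonormal DCT-II playing the role of MATLAB's dct (the closed-form values were computed by hand from Eq. (15)):

```python
import numpy as np
from scipy.fft import dct

c = np.cos(np.pi/8)
F = dct(np.array([1., 1., 1., 1.]), type=2, norm='ortho')
G = dct(np.array([0., 1., 0., 1.]), type=2, norm='ortho')
A = dct(np.array([1., 0., 1., 0.]), type=2, norm='ortho')
assert np.allclose(F, [2, 0, 0, 0])
assert np.allclose(G, [1, (1 - np.sqrt(2))*c, 0, -c])
assert np.allclose(A, [1, (np.sqrt(2) - 1)*c, 0, c])
# Linearity: the DCT of h = f + g is F + G.
assert np.allclose(F + G, dct(np.array([1., 2., 1., 2.]), type=2, norm='ortho'))
```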

Applications of DCT to signal processing

As mentioned several times above, the DCT is important in signal processing. Generally speaking, DCTs demonstrate much greater "energy compaction" than discrete Fourier transforms. By this we mean that more of the "energy" of the signal – the squared L² norm, measured by the sum of the squares of the coefficients – is contained in the lower frequency coefficients. (Another way of saying this is that the DCT coefficients decay more rapidly.) This is important in compression – a signal can generally be approximated to a prescribed degree of accuracy with fewer DCT coefficients than DFT coefficients.

We illustrate with a simple example. Consider the function

f(x) = e^{-x^2/10}\left[\sin(2x) + 2\cos(4x) + 0.4\sin(x)\sin(10x)\right], \quad 0 \le x \le 2\pi,   (19)

that was used in previous lectures. Once again, we consider an N = 256 sampling of this function over the interval [0, 2π]. The DFT and DCT of the signal f[n], n = 0, 1, ⋯, 255, are shown in the figure below. (The DFT appeared in an earlier lecture.)

The first notable difference between the plots is that the DCT does not have high-magnitude coefficients near N. This is because the DCT does not possess conjugate symmetry. As such, the high-frequency range for the DCT is k → N, and not k near N/2, as was the case for the DFT.

[Figure: two panels plotting |F(k)| (left, DFT) and |DCT(k)| (right) against k, 0 ≤ k ≤ 255.]

DFT and DCT spectra of the sampled signal f[n], n = 0, 1, ⋯, N−1, N = 256, obtained from the function f(x) = e^{−x²/10}[sin(2x) + 2 cos(4x) + 0.4 sin(x) sin(10x)], 0 ≤ x ≤ 2π. Left: magnitudes of the DFT coefficients. Right: magnitudes of the DCT coefficients. The DCT demonstrates much greater "energy compaction": most of the "energy" (squared L² norm) of the signal is contained in the first 23 DCT coefficients.

The second feature is that for k > 30, the DCT coefficients are virtually negligible. This indicates the energy compaction property mentioned earlier.
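This compaction can be quantified directly. The sketch below (Python, with numpy/scipy standing in for MATLAB) samples Eq. (19) at N = 256 points and compares the fraction of the signal energy captured when 30 coefficients are retained under each transform; Parseval's relations for the two transforms are noted in the comments:

```python
import numpy as np
from scipy.fft import fft, dct

N = 256
x = 2*np.pi*np.arange(N)/N
f = np.exp(-x**2/10)*(np.sin(2*x) + 2*np.cos(4*x) + 0.4*np.sin(x)*np.sin(10*x))

F = fft(f)                          # DFT: energy = sum |F[k]|^2 / N (Parseval)
C = dct(f, type=2, norm='ortho')    # orthonormal DCT-II: energy = sum C[k]^2

total = np.sum(f**2)
# Keep 30 coefficients each.  For the DFT, low frequencies sit at BOTH ends of
# the index range (conjugate symmetry), so take 15 from each end.
dft_kept = (np.sum(np.abs(F[:15])**2) + np.sum(np.abs(F[-15:])**2))/N
dct_kept = np.sum(C[:30]**2)
print(dct_kept/total, dft_kept/total)   # the DCT fraction is the larger one
```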

A simple experiment to compare DFT and DCT methods of denoising by thresholding

Let f_0 denote an N-point signal that is assumed to be "noiseless." If the DCT coefficients demonstrate greater "energy compaction", then for some integer k_0 > 0, we expect that the DCT coefficients F_0[k], k > k_0, are negligible. Now suppose that Gaussian noise 𝒩(0, σ) (zero-mean, variance σ²) is added to f_0 to produce a noisy signal f, i.e.,

f = f_0 + n,   (20)

where n is an N-vector whose entries are random numbers.

The DCT spectrum of pure noise – let’s call it Fn[k] – has the same characteristics as its DFT spectrum, i.e., statistical fluctuations, but no overall decay with k. From the linearity property of the DCT,

F[k] = F_0[k] + F_n[k],   (21)

it follows that for k > k_0, F[k] ≈ F_n[k]. In other words, the DCT coefficients F[k] for k > k_0 correspond to the noise, and carry no information from the signal f_0. As such, if we can remove these, we are removing some of the noise and not affecting the original signal.

This is more difficult with the discrete Fourier transform since, as we saw earlier, the DFT coefficients do decay in magnitude but do not become negligible.

To test this conjecture, we have applied the thresholding method to both DFT and DCT representations of the noisy signal f, obtained by adding Gaussian noise with zero mean and standard deviation σ = 0.1 to the N = 256 signal of Eq. (19). (This is the same noisy signal that was used in an earlier lecture to illustrate the denoising method for DFTs.) Recall the basis of the thresholding method: given a threshold ε > 0 and a transform F – either a DFT or a DCT – we discard all coefficients F[k] whose magnitudes lie below ε. This produces a modified transform F̃_ε which, when inverted, yields a modified signal f̃_ε which represents an approximation to the noiseless signal f_0.

For both representations, we have computed the L² errors ‖f_0 − f̃_ε‖ for threshold parameters ε ranging from 0 to 10.0. The results are shown in the plot below. For 0 ≤ ε ≤ 0.5, the two transforms yield virtually identical results, with almost no improvement in the error. For ε > 1, however, the respective errors begin to diverge dramatically, with the DCT method yielding lower errors and the DFT yielding higher errors. Between ε = 3 and ε = 4, the DCT thresholding method yields the lowest error – roughly one-half the error associated with the original noisy signal.

This simple experiment shows the advantage of working with the more compact DCT representation. A final comment: the "unsymmetric" form of the DCT was employed in these computations, where there is no factor in front of the forward DCT and a factor of 2/N in the inverse DCT, cf. Eqs. (17) and (18). In this way, the magnitudes of the DFT and DCT coefficients are comparable, so that it makes sense to compare the results for the two transforms at a common ε value.
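The experiment is easy to reproduce in outline. The sketch below uses scipy's unnormalized dct/idct pair in place of the lecture's "unsymmetric" DCT; scipy's convention carries an extra factor of 2 in the forward transform, so threshold values here are not directly comparable to the lecture's ε axis:

```python
import numpy as np
from scipy.fft import fft, ifft, dct, idct

rng = np.random.default_rng(0)
N = 256
x = 2*np.pi*np.arange(N)/N
f0 = np.exp(-x**2/10)*(np.sin(2*x) + 2*np.cos(4*x) + 0.4*np.sin(x)*np.sin(10*x))
f = f0 + rng.normal(0.0, 0.1, N)        # noisy signal, sigma = 0.1

def threshold_dct(f, eps):
    C = dct(f, type=2)                  # unnormalized forward DCT
    C[np.abs(C) < eps] = 0.0            # discard small-magnitude coefficients
    return idct(C, type=2)              # scipy's idct carries the inverse scaling

def threshold_dft(f, eps):
    F = fft(f)
    F[np.abs(F) < eps] = 0.0
    return np.real(ifft(F))

baseline = np.linalg.norm(f0 - f)       # L2 error of the raw noisy signal
errors = [np.linalg.norm(f0 - threshold_dct(f, e)) for e in np.linspace(0, 12, 25)]
print(baseline, min(errors))            # a well-chosen threshold beats the baseline
```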

[Figure: L² error versus threshold ε, 0 ≤ ε ≤ 10, for DFT thresholding (upper curve) and DCT thresholding (lower curve).]

Errors ‖f_0 − f̃_ε‖ vs. ε for thresholding of the DFT and DCT coefficients of the noisy signal f = f_0 + n, σ = 0.1.

DFTs of two-dimensional data sets

The DFT is easily extended to handle two-dimensional data sets, e.g., images, by using a tensor product form of the orthogonal basis elements. In what follows, we assume that the data set is an N × M array – for example, an array "f" of greyscale values of an image – denoted as follows,

f[n, m], \quad 0 \le n \le N-1, \quad 0 \le m \le M-1.   (22)

Recall that the vectors uk, defined as follows,

u_k[n] = \exp\left(\frac{i 2\pi k n}{N}\right),   (23)

formed an orthogonal basis of the space ℂ^N of complex-valued N-vectors. We now construct the following tensor product vectors u_{kl},

u_{kl}[n, m] = \exp\left(\frac{i 2\pi k n}{N}\right)\exp\left(\frac{i 2\pi l m}{M}\right), \quad 0 \le n \le N-1, \quad 0 \le m \le M-1.   (24)

This set of NM vectors forms an orthogonal basis of the space ℂ^N ⊗ ℂ^M ≅ ℂ^{NM}.

The “Version 3” Discrete Fourier Transform of f will be defined as

F[k, l] = \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} f[n, m] \exp\left(-\frac{i 2\pi k n}{N}\right)\exp\left(-\frac{i 2\pi l m}{M}\right).   (25)

And the inverse DFT associated with this DFT will be given by

f[n, m] = \frac{1}{NM}\sum_{k=0}^{N-1}\sum_{l=0}^{M-1} F[k, l] \exp\left(\frac{i 2\pi k n}{N}\right)\exp\left(\frac{i 2\pi l m}{M}\right).   (26)

(The Version 3 DFT does not have a normalization factor, implying that the inverse DFT must have the entire 1/(NM) factor.) As in the one-dimensional case, the above Version 3 forms of the DFT and IDFT are employed in MATLAB, and are denoted as follows,

F = fft2(f) f = ifft2(F) .
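As a sanity check on Eqs. (25) and (26), numpy's fft2/ifft2 (which follow the same unnormalized-forward, 1/(NM)-inverse convention) can be compared against the double sum directly. A sketch on a small random array:

```python
import numpy as np

rng = np.random.default_rng(1)
N, M = 4, 6
f = rng.standard_normal((N, M))

# Evaluate the double sum of Eq. (25) by brute force.
F = np.zeros((N, M), dtype=complex)
for k in range(N):
    for l in range(M):
        for n in range(N):
            for m in range(M):
                F[k, l] += f[n, m]*np.exp(-2j*np.pi*k*n/N - 2j*np.pi*l*m/M)

assert np.allclose(F, np.fft.fft2(f))            # matches the library routine
assert np.allclose(np.fft.ifft2(F).real, f)      # Eq. (26) recovers the data
```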

Recall that in the one-dimensional case, the high-frequency DFT coefficients F[k] of a real N-point signal lie in the region surrounding the midpoint k = N/2. As such, the low-frequency DFT coefficients, typically the ones with the largest magnitudes, lie in the areas near k = 0 and k = N−1. In the two-dimensional case, the DFT coefficients F[k, l] must now be arranged in a rectangular array in kl-space, as illustrated schematically in the figure below. The DFT coefficients with highest magnitudes will typically be those with low frequencies in both the k and l directions, which means the four corners of the rectangle, at which k and l are each near either 0 or their maximum values. DFT coefficients of higher magnitude will also be found along the edges of the rectangle, where the frequency is low in one of the two directions. The DFT coefficients with smallest magnitudes will be those corresponding to both high k and high l frequencies. This corresponds to the subregion of the rectangle near its midpoint [k, l] = [N/2, M/2], which has been shaded in the figure.

Since the 2D-DFT employs the complex exponential functions used for the 1D-DFT, one expects that the 2D-DFT coefficients F[k, l] associated with a real-valued 2D data set f[n, m] also satisfy a conjugate symmetry property. In fact, there will be more than one conjugate symmetry, because products of the complex exponentials are involved: we expect conjugate symmetry in both directions. We state the following results without proof, leaving the proofs as an exercise.

1. Case 1: Keeping l fixed, we have

F[N-k, l] = \overline{F[k, l]}.   (27)

Case 2: Keeping k fixed, we have

F[k, M-l] = \overline{F[k, l]}.   (28)

The above two results obviously imply that

F[N-k, l] = F[k, M-l].   (29)

[Figure: schematic N × M array of DFT coefficients in kl-space. Low-frequency regions lie at the four corners (k near 0 or N−1, l near 0 or M−1); the region of high frequency in both directions surrounds the midpoint (N/2, M/2).]

General behaviour of the magnitudes |F[k, l]| of two-dimensional DFT coefficients.

But they may also be combined to derive the result (Exercise)

F[k, l] = \overline{F[N-k, M-l]}.   (30)

The implication of these results is that the lower left quarter of the rectangle in the previous figure, i.e., the coefficients

F[k, l], \quad 0 \le k \le N/2 - 1, \quad 0 \le l \le M/2 - 1,   (31)

define all other coefficients F[k, l] in the rectangle. (Here, we have assumed, for simplicity, that N and M are even.) This lower left quarter contains

\frac{N}{2} \times \frac{M}{2} \ \text{complex numbers} = \frac{NM}{2} \ \text{real numbers}.   (32)

Thus, the dimensionality, or “number of degrees of freedom,” of the 2D-DFT matrix F [k, l] is NM, which is precisely the dimensionality of the 2D real-valued data set f[n,m].

We now illustrate with a practical example. The top left figure below is the original 512 × 512-pixel Boat test image, corresponding to N = M = 512. The top right figure is a plot of the magnitudes of the DFT coefficients of the Boat image. Note that the coefficients are displayed in the usual way that matrices are displayed: the top left element is (k, l) = (0, 0) and the bottom right element is (k, l) = (511, 511). In this and subsequent plots, the brightness of the plot is proportional to the magnitude of the DFT coefficient.


Left: Original Boat test image. Right: DFT spectrum of Boat image.

Perhaps the most noteworthy feature of the DFT figure is that it is almost entirely black! In fact, the very tiny regions with DFT coefficients of significant magnitude may not even be visible in this plot. A logarithmic transformation of these values will reduce the disparity in these magnitudes. We'll return to this idea below. First, however, we mention that it is usually more convenient to consider the region of the DFT spectrum around the zero-frequency point (k, l) = (0, 0), but in all directions. From the periodicity of the DFT, this will necessarily include the points around (k, l) = (N, M). The resulting "shifted" DFT spectrum is shown in the figure to the left below. In MATLAB, this shifting is performed by the following command,

G = fftshift(F) .

(Note the ease with which these operations can be applied, i.e., there is no need to write any code that produces the shift.) The center of this plot corresponds to frequency (0, 0). As a result, DFT coefficients of highest magnitude are concentrated near (0, 0). Once again, this region is tiny. A logarithmic scaling of the magnitudes produces the figure at the right. Clearly, there is more structure to be seen. But the interpretation of this structure is beyond the scope of this course.


Left: Shifted DFT magnitude spectrum of the Boat image. Right: DFT magnitude spectrum after a logarithmic scaling of the brightness, allowing more structure to be seen.
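In Python, numpy's fftshift plays the same role as MATLAB's fftshift. A sketch, with a stand-in random array in place of the Boat image and log(1 + |G|) as one common choice of logarithmic scaling:

```python
import numpy as np

img = np.random.default_rng(2).random((512, 512))   # stand-in for the Boat image
F = np.fft.fft2(img)
G = np.fft.fftshift(F)             # move the (0, 0) frequency to the array centre
display = np.log(1.0 + np.abs(G))  # log scaling reveals low-magnitude structure
assert G[256, 256] == F[0, 0]      # zero frequency now sits at the midpoint
```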

Some numerical experiments to show the effects of truncating DFT coefficients from images

Here we show the results of a few experiments in which DFT coefficients of an image are removed. In all experiments, we have used the Boat test image, a 512 × 512-pixel image. As such, N = M = 512.

Experiment No. 1: Removal of high frequency coefficients, or “low-pass filtering”

Our first experiment is to remove high frequency coefficients from the DFT coefficient table, as illustrated in the figure below. For a given n > 0, all high frequency coefficients in the shaded region of the 2D-DFT matrix of coefficients are removed. Since high frequency coefficients are removed and low frequency coefficients are preserved, this type of operation is often referred to as a "low-pass filter" in the signal and image processing literature.

When the resulting images are viewed on a display monitor, there is no appreciable deterioration of the image for n ≤ 100. (There are minor fluctuations in the brightness, but no degradation of the image structures.) At around n = 120, some "ringing" starts to appear near the edges, in particular the poles and wire of the image. As n increases from 120, the ringing becomes more and more pronounced, increasing in amplitude and distance away from the edges. At n around 200-220, the ringing produces a strong blurring of the image. At n = 240, the image is very blurred. In the figure below, the images corresponding to n = 160, 200, 220 and 240 are shown.

[Figure: DFT coefficient array with shaded horizontal and vertical strips of half-width n centred on l = N/2 and k = N/2, respectively.]

"Low-pass filter" employed in Experiment No. 1. For a given n > 0, all high frequency coefficients in the shaded region of the 2D-DFT are removed. In this case, N = 512.
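A sketch of this truncation in Python/numpy (the band half-width n and the stand-in test image are of course adjustable):

```python
import numpy as np

def lowpass_truncate(img, n):
    """Zero all DFT coefficients whose k or l index lies within n of the midpoint."""
    N, M = img.shape
    F = np.fft.fft2(img)
    F[N//2 - n:N//2 + n, :] = 0.0      # high k-frequency strip
    F[:, M//2 - n:M//2 + n] = 0.0      # high l-frequency strip
    return np.real(np.fft.ifft2(F))

# A purely low-frequency image passes through unchanged.
j = np.arange(64)
img = np.cos(2*np.pi*j/64)[None, :]*np.ones((64, 1))
assert np.allclose(lowpass_truncate(img, 10), img)
```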

Experiment No. 2: Removal of high frequency coefficients associated with only one direction

The next experiment is to remove only one of the two strips in the 2D-DFT table of the previous figure at a time. If we remove the horizontal strip, i.e., the high l-frequency coefficients, we expect that rapid vertical variations in the image will be degraded, producing ringing that will be seen in the form of horizontal waves. This effect is seen in the top image of the next figure.

On the other hand, if we remove the vertical strip, i.e., the high k-frequency coefficients, rapid horizontal variations in the image will be degraded, producing ringing that will be seen in the form of vertical waves. This effect is seen in the bottom image of the next figure.

Experiment No. 3: Removal of low frequency coefficients, or “high-pass filtering”

In this experiment, we remove low frequency coefficients associated with both directions, as shown in the figure below. These are generally the coefficients of highest magnitude of the image. Removing them affects the overall brightness of the image. As more and more coefficients are removed, regions of the original image that are "flat" become more "wavy". There is a kind of ringing about the edges, e.g., the poles and the boundaries of the boat, but the edges are not blurred. At n = 50, the primary edges of the image are still visible.
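One reading of this truncation can be sketched as follows; here the shaded region is taken to be the strips of width n along the edges of the coefficient array, where the k or l frequency is low (an assumption about the exact shape of the removed region):

```python
import numpy as np

def highpass_truncate(img, n):
    """Zero all DFT coefficients whose k or l index lies within n of 0 or the end."""
    F = np.fft.fft2(img)
    F[:n, :] = 0.0;  F[-n:, :] = 0.0   # low k-frequency strips
    F[:, :n] = 0.0;  F[:, -n:] = 0.0   # low l-frequency strips
    return np.real(np.fft.ifft2(F))

# Removing the low frequencies kills a constant (zero-frequency) image entirely.
assert np.allclose(highpass_truncate(np.ones((64, 64)), 2), 0.0)
```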

Experiment No. 1: Truncation of high frequencies

Top left: n = 160. Top right: n = 200. Bottom left: n = 220. Bottom right: n = 240.

Expt. No. 2: Removal of high frequencies in one direction only

Top: Removal of horizontal strip of high l-frequency coefficients from 2D-DFT table, n = 200. Bottom: Removal of vertical strip of high k-frequency coefficients from 2D-DFT table, n = 200.

[Figure: DFT coefficient array with shaded low-frequency strips of width n along its edges, marked at k, l = n and at N−1−n, M−1−n.]

"High-pass filter" employed in Experiment No. 3. For a given n > 0, all low-frequency coefficients in the shaded region of the 2D-DFT are removed. In this case, N = M = 512.


Experiment No. 3: Removal of low frequencies

Top left: n = 10. Top right: n = 20. Bottom left: n = 30. Bottom right: n = 50.

Lecture 14

Discrete Fourier transforms (cont’d)

2D-DCT transform and “JPEG”

Discrete cosine transforms for two-dimensional data sets may be defined in a manner analogous to the 2D-DFT discussed above. The first step is to construct a tensor product basis involving the one-dimensional discrete cosine functions discussed earlier. Because of its importance in image processing, we shall discuss only the "Version II" DCT. Recall that this version involves "half-shifted" discrete cosine functions in order to produce a 2N-periodic even extension of an N-point data set. In what follows, we shall assume that our 2D data set (image) is an N × M array. In this case, the orthogonal (unnormalized) 2D basis functions will have the form

u_{kl}[n, m] = \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right)\cos\left(\frac{\pi}{M}\left(m + \frac{1}{2}\right)l\right), \quad 0 \le n, k \le N-1, \quad 0 \le m, l \le M-1.   (34)

The 2D-DCT of an N × M array f will be defined as

F[k, l] = \lambda_k \lambda_l \frac{2}{\sqrt{NM}} \sum_{n=0}^{N-1}\sum_{m=0}^{M-1} f[n, m] \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right)\cos\left(\frac{\pi}{M}\left(m + \frac{1}{2}\right)l\right),   (35)

where

\lambda_k = \begin{cases} \frac{1}{\sqrt{2}}, & k = 0, \\ 1, & k \neq 0. \end{cases}   (36)

Note that we have included the normalization factors. And the inverse discrete cosine transform is given by

f[n, m] = \frac{2}{\sqrt{NM}} \sum_{k=0}^{N-1}\sum_{l=0}^{M-1} \lambda_k \lambda_l F[k, l] \cos\left(\frac{\pi}{N}\left(n + \frac{1}{2}\right)k\right)\cos\left(\frac{\pi}{M}\left(m + \frac{1}{2}\right)l\right).   (37)

As mentioned earlier, this "DCT-II" version of the discrete cosine transform is employed in the so-called JPEG compression standard. In MATLAB, it may be invoked using the following commands,

F = dct2(f) f = idct2(F).
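The same transform is available in Python as scipy.fft.dctn/idctn with type=2 and norm='ortho'. A sketch on a small test array; the mean-value identity in the last line follows from Eqs. (35)-(36):

```python
import numpy as np
from scipy.fft import dctn, idctn

f = np.arange(12.0).reshape(3, 4)            # small 3 x 4 test array
F = dctn(f, type=2, norm='ortho')            # 2D orthonormal DCT-II, as in Eq. (35)
assert np.allclose(idctn(F, type=2, norm='ortho'), f)   # Eq. (37) inverts it
assert np.allclose(F[0, 0], np.sqrt(3*4)*f.mean())      # F[0,0] = sqrt(NM) * mean(f)
```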

Here we simply mention that in the JPEG compression method (which produces the .jpg files with which you are all familiar), an image is partitioned into nonoverlapping 8 × 8-pixel blocks,

and the DCT-II transform is applied to each block separately. The next stage of the compression method is to "quantize" the DCT coefficients according to the desired compression factor – the higher the compression factor, the more information is discarded.

It is interesting to view the oscillatory nature of the DCT basis functions, as depicted in the figure below. (This image, 250px-Dctjpeg.png, was downloaded from Wikipedia.) We first consider the entire figure as representing an 8 × 8 matrix B with elements B_nm, 0 ≤ n, m ≤ 7. Each element B_nm is a square 8 × 8 array in which the values of the basis elements u_nm[k, l], 0 ≤ k, l ≤ 7, are plotted.

For example, the top left block is B_00, corresponding to the basis function u_00, which is constant.

As we move to the right, the blocks B_0m represent the basis functions u_0m, which are constant in the n (vertical) direction but oscillate in the m (horizontal) direction. The roles of vertical and horizontal oscillation are reversed if we move down the first column. The basis function of greatest oscillation is u_77, which is found in block B_77 in the lower right corner.

Pictorial representation of the 8 × 8 DCT basis functions u_nm, 0 ≤ n, m ≤ 7.
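These blocks can be generated directly from the cosine products of Eq. (34). A sketch (here the block subscript is the frequency pair, so that the (0,0) block is the constant one; no normalization factors are applied):

```python
import numpy as np

def dct_basis_block(n, m, N=8):
    """Unnormalized 8x8 DCT-II basis block: cos(pi*(k+1/2)*n/N) * cos(pi*(l+1/2)*m/N)."""
    k = np.arange(N)
    return np.outer(np.cos(np.pi/N*(k + 0.5)*n),
                    np.cos(np.pi/N*(k + 0.5)*m))

assert np.allclose(dct_basis_block(0, 0), 1.0)          # the B_00 block is constant
# Distinct blocks are orthogonal under the elementwise inner product.
assert abs(np.sum(dct_basis_block(1, 2)*dct_basis_block(3, 2))) < 1e-10
```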

In the JPEG compression method, it is the coefficients of higher frequencies, i.e., those in the lower right, that are going to be affected the most as the compression factor is raised.

One of the most noticeable effects of JPEG compression is the blockiness that is produced by high compression rates. This is a result of treating the 8 × 8-pixel blocks of the image independently: there is no guarantee of a smooth transition from one block to another. This "blockiness artifact" of the JPEG scheme is shown in the figure below for the case of the Boat test image. The figure shows JPEG-compressed images for four degrees of compression, ranging from "mild" to "high". In the JPEG

algorithm, compression is achieved by adjusting the so-called "quality factor", Q; Q = 100 corresponds to no compression and Q = 1 corresponds to full compression. Recall that the compression factor, or ratio, is the ratio of the size of the original (uncompressed) file (i.e., the amount of memory required to store it) to that of the compressed file. Based upon the filesizes of the original and compressed Boat files, the compression factors associated with each of the images are given in the caption.

Qualitatively, the image corresponding to a compression ratio of 12.5 is quite similar to the original (uncompressed) image, which is a testimony to the fact that there is a great deal of redundancy in an image, due primarily to correlation between neighbouring pixels. As the compression is increased, however, there is increased degradation of the image, as seen by the increased blockiness.

Later, if time permits, we'll see that wavelet-based methods, which do not work on independent pixel blocks, do not suffer from blockiness artifacts. That being said, other forms of degradation will be produced for sufficiently high compression rates.

A final comment on “transform coding”

Up to this point, we have spent a good deal of time examining (i) Fourier series and (ii) discrete Fourier transforms, showing how they can represent (i) functions and (ii) digital signals and images, respectively, and how they can be used to perform tasks such as denoising and compression. An important point that may have been missed in these discussions is concerned with digital signals and images: when you download a music track or an image, you are, in general, not downloading the actual digital signal, but rather the coefficients of a suitable basis representation of that signal/image.

In other words, when you download an image, you are not downloading the greyscale or red/green/blue values of each pixel – that would be an enormous amount of information to store. (A 512 × 512-pixel, 8-bits-per-pixel image would require about 0.25 megabytes, or 250 kilobytes. A colour image of this size would require about 0.75 megabytes, or 750 kB.) If the image you are downloading is in JPEG format (i.e., picture.jpg), then you are downloading a very efficiently coded and stored version of the DCT coefficients of that image. This is how the image is stored in your digital storage device – in coded form. This is why your image can be stored at less than 10% of its original size, e.g., 75 kB for a colour image, with no perceptible degradation. (The fact that you can "throw away" over 90% of the information in an image and not lose any visual quality is a testimony to the statement that images are "highly redundant," i.e., they contain a great deal of redundant information.)

JPEG compression of “Boat” image

Top left: Quality factor Q = 60, compression ratio CR = 5.8. Top right: Q = 20, CR = 12.4. Bottom left: Q = 5, CR = 40. Bottom right: Q = 1, CR = 72.6.

When you want to look at the image (or hear the music), a decoder is activated, which performs the inverse DCT operation (for images) and displays your image on the screen.
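The decode step described above – inverse-transforming stored coefficients back into samples – can be sketched in miniature. The following is an illustrative numpy sketch, not the actual JPEG pipeline (which also involves quantization tables and entropy coding); the helper `dct_matrix` and the toy data are our own constructions:

```python
import numpy as np

def dct_matrix(N):
    """Orthonormal DCT-II matrix: row k, column n."""
    k = np.arange(N)[:, None]
    n = np.arange(N)[None, :]
    C = np.sqrt(2.0 / N) * np.cos(np.pi * (2 * n + 1) * k / (2 * N))
    C[0, :] /= np.sqrt(2.0)        # rescale the k = 0 row so that C is orthogonal
    return C

N = 8                              # JPEG works on blocks of 8 samples per dimension
C = dct_matrix(N)
assert np.allclose(C @ C.T, np.eye(N))   # orthogonal, so the inverse DCT is C.T

# A smooth toy "row of pixels": its energy compacts into the low-index coefficients.
f = np.cos(np.linspace(0.0, np.pi, N)) + 0.1 * np.linspace(0.0, 1.0, N)
F = C @ f                          # "encode": forward DCT
F[N // 2:] = 0.0                   # crude "compression": drop the top half
f_rec = C.T @ F                    # "decode": inverse DCT, as the viewer does
print(np.max(np.abs(f - f_rec)))   # small, because smooth data is redundant
```

Storing only the retained coefficients (rather than all pixel values) is what makes the file smaller; the decoder simply reapplies C.T.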

Fourier transforms

Most, if not all, of you are familiar with Fourier transforms. That being said, it is important to step back and revisit some basics in order to see the big picture: the desire to perform a Fourier-based, i.e., frequency-content, analysis of functions on the entire real line, i.e., f : R → R. In what follows, we shall be concerned with functions in either of the spaces L¹(R) or L²(R). This, of course, implies that f(x) → 0 as x → ±∞. At first, this might seem to be a rather strict condition on f, but when you consider that, in practice, all signals have a beginning and an end – in other words, they have finite support – it is not too strict at all.

We start with the fact that the following set of complex-valued functions,

    u_n(x) = e^{inπx/L},   n ∈ {…, −2, −1, 0, 1, 2, …},   (38)

forms an orthogonal basis for the complex-valued space of functions L²[−L, L]. (You are, of course, familiar with the special case L = π, corresponding to the standard Fourier series.) The Fourier series expansion of a function f ∈ L²[−L, L] is given by

    f(x) = Σ_{n=−∞}^{∞} a_n e^{inπx/L},   (39)

where the equality is understood “in the L² sense.” The coefficients a_n are given by

    a_n = (1/(2L)) ∫_{−L}^{L} f(t) e^{−inπt/L} dt.   (40)

We mention here that Eq. (39) essentially represents an expansion of the function f(x) in terms of its frequency components, with the frequencies given by

    ω_n = nπ/L.   (41)
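Eqs. (39)–(40) can be checked numerically: approximate the coefficient integral by a Riemann sum and compare the truncated series against f at a point. This is a sketch only (numpy assumed; the triangular test function, L = 4, and the truncation at |n| ≤ 200 are arbitrary choices):

```python
import numpy as np

L = 4.0
t = np.linspace(-L, L, 40001)
dt = t[1] - t[0]
f = np.maximum(0.0, 1.0 - np.abs(t))     # triangular peak supported on [-1, 1]

x = 0.5                                   # evaluation point; f(x) = 0.5
partial = 0.0 + 0.0j
for n in range(-200, 201):
    a_n = np.sum(f * np.exp(-1j * n * np.pi * t / L)) * dt / (2 * L)  # Eq. (40)
    partial += a_n * np.exp(1j * n * np.pi * x / L)                   # Eq. (39)

print(abs(partial.real - 0.5))   # small: the truncated series is close to f(x)
assert abs(partial.imag) < 1e-9  # imaginary parts cancel since f is real and even
```

The coefficients here decay like 1/n² (f is continuous), so a few hundred terms already reproduce f(x) well.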

Let us now substitute Eq. (40) for the a_n into Eq. (39):

    f(x) = Σ_{n=−∞}^{∞} [ (1/(2L)) ∫_{−L}^{L} f(t) e^{−inπt/L} dt ] e^{inπx/L}
         = Σ_{n=−∞}^{∞} (1/(2L)) ∫_{−L}^{L} f(t) e^{inπ(x−t)/L} dt.   (42)

The idea is now to take the limit L → ∞ so that the support of our functions f, [−L, L], becomes the real line R. From Eq. (41) one might think that this implies that all frequencies ω_n → 0. Letting n → ∞, however, will still produce frequencies of arbitrarily large magnitude. More important, however, is that the spacing between consecutive frequencies will go to zero as L → ∞. This spacing is given by

    Δω = ω_{n+1} − ω_n = π/L.   (43)

From this relation, it follows that the factor 1/(2L) in Eq. (42) becomes

    1/(2L) = Δω/(2π).   (44)

Substitution of these expressions into (42) yields

    f(x) = (1/(2π)) Σ_{n=−∞}^{∞} [ ∫_{−L}^{L} f(t) e^{iω_n(x−t)} dt ] Δω
         = (1/(2π)) Σ_{n=−∞}^{∞} G(x, ω_n, L) Δω.   (45)

The sum in the final line may be viewed as the Riemann sum of the function

    G(x, ω, L) = ∫_{−L}^{L} f(t) e^{iω(x−t)} dt
               = e^{iωx} ∫_{−L}^{L} f(t) e^{−iωt} dt.   (46)

The sample points ω_n are equally spaced over the entire real line. As such, the Riemann sum approximates the integration over the variable ω on R. From Eq. (43), it follows that in the limit L → ∞, Δω → 0. We now claim (without rigorous proof) that the Riemann sum converges to the integral

    ∫_{−∞}^{∞} G(x, ω) dω,   (47)

where

    G(x, ω) = e^{iωx} ∫_{−∞}^{∞} f(t) e^{−iωt} dt.   (48)

(Note that x is fixed.) Substitution into Eq. (45) after taking the limit L → ∞ yields the following:

    f(x) = (1/(2π)) ∫_{−∞}^{∞} G(x, ω) dω
         = (1/(2π)) ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} f(t) e^{−iωt} dt ] e^{iωx} dω.   (49)

A few comments regarding this result:

1. The term e^{iωx} looks like a basis element – the summation over the index n, therefore over the discrete frequencies ω_n, has been replaced by an integration over ω.

2. The term in brackets looks like a complex inner product, i.e., ⟨f, e^{iωt}⟩, therefore a Fourier coefficient.

3. From these two observations, the RHS has the form of a continuous expansion.

One question remains: what to do with the factor 1/(2π) in front of the integral? The “symmetric approach” is to split this factor, i.e., 1/(2π) = (1/√(2π)) · (1/√(2π)), and rewrite the above integral as follows,

    f(x) = (1/√(2π)) ∫_{−∞}^{∞} [ (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−iωt} dt ] e^{iωx} dω.   (50)

The term in brackets, viewed as a “Fourier coefficient,” is called the Fourier transform of f and denoted as F(ω):

    F(ω) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−iωt} dt.   (51)

We say that

F(ω) is the Fourier transform (FT) of f(t), or

    F = 𝓕(f).

Eq. (50) may then be written as follows:

    f(x) = (1/√(2π)) ∫_{−∞}^{∞} F(ω) e^{iωx} dω.   (52)

The above equation defines the inverse Fourier transform (IFT) of F(ω), i.e., f = 𝓕⁻¹(F).

We shall adopt the above formulas, (51) and (52), for the Fourier transform and inverse Fourier transform, respectively, in this course.
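As a quick self-test of the adopted pair (51)–(52), one can use a Gaussian, whose transform under this symmetric convention is known in closed form: f(t) = e^{−t²/2} gives F(ω) = e^{−ω²/2}. The following is a numerical sketch (numpy assumed; the grid sizes are arbitrary):

```python
import numpy as np

t = np.linspace(-20.0, 20.0, 20001)
dt = t[1] - t[0]
f = np.exp(-t**2 / 2)    # Gaussian; decays so fast that truncating the
                         # integral at +/-20 is harmless

for w in [0.0, 1.0, 2.5]:
    # Riemann-sum approximation of Eq. (51)
    F_w = np.sum(f * np.exp(-1j * w * t)) * dt / np.sqrt(2 * np.pi)
    assert abs(F_w - np.exp(-w**2 / 2)) < 1e-8
print("Eq. (51) reproduces F(w) = exp(-w^2/2) at the sample frequencies")
```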

A note on other definitions of the Fourier transform

There is another formulation of the FT which, in fact, is the usual convention in the signal/image processing literature, as well as being the one adopted by the MATLAB programming language (among others). It involves leaving the whole factor 1/(2π) out in front of the inverse integral, rather than splitting it as in Eq. (50). The Fourier transform of f(x) is then defined as

    F(ω) = ∫_{−∞}^{∞} f(t) e^{−iωt} dt.   (53)

    f(x) = (1/(2π)) ∫_{−∞}^{∞} F(ω) e^{iωx} dω.   (54)

Note that the lack of a factor in front of the Fourier transform is consistent with the convention adopted for the discrete Fourier transform (Version III, which is also the “MATLAB” formula).

Important note: The definitions in Eqs. (53) and (54) were used in the AMATH 231 Course Notes.

A further note on this convention: It is convenient in the engineering literature not to use the angular frequency ω (radians/unit time), but rather to employ the “wavenumber” k, the number of cycles per unit time. (For example, we normally think of the range of human hearing to be something like 20–20,000 cycles per second and not its equivalent in radians per second.) These two frequencies are related as follows,

    ω = 2πk,   (55)

since there are 2π radians/cycle. The resulting Fourier transform is

    F(k) = ∫_{−∞}^{∞} f(t) e^{−i2πkt} dt.   (56)

From the change of variable ω = 2πk, we have dω = 2π dk, so that the inverse Fourier transform becomes

    f(x) = ∫_{−∞}^{∞} F(k) e^{i2πkx} dk.   (57)

Note that neither the FT nor the inverse FT has any factor in front of the integral, which is most convenient.

We now return to the FT and IFT as defined in Eqs. (51) and (52), respectively, for our course. Two noteworthy comments:

1. In the Fourier series representation for a function f(x) on [−L, L], cf. Eqs. (39) and (40), the coefficient a_n measures the component of f that oscillates at the frequency ω_n = nπ/L. The Fourier series is a summation over discrete frequencies ω_n.

2. In the Fourier transform representation for a function f(x) on (−∞, ∞), the coefficient F(ω) measures the component of f that oscillates at the frequency ω. The Fourier transform is an integration over continuous frequencies ω.

We now present a few simple examples to illustrate some basic points. More details on the actual calculations are to be found in the book by Boggess and Narcowich, pp. 94–99.

Examples:

1. The so-called “boxcar” function

    f(t) = 1 for −π ≤ t ≤ π, and 0 otherwise.   (58)

A plot is given at the left in the figure below.

The Fourier transform is computed to be

    F(ω) = (1/√(2π)) ∫_{−∞}^{∞} f(t) e^{−iωt} dt
         = (1/√(2π)) ∫_{−π}^{π} e^{−iωt} dt
         = √(2π) · sin(ωπ)/(ωπ)
         = √(2π) sinc(πω).   (59)

Here, we have used the mathematical definition of the “sinc” function (see note below)

    sinc(x) = sin(x)/x for x ≠ 0, and sinc(0) = 1.   (60)

A plot is shown at the right in the figure below.


Left: Plot of the “boxcar function” f(t), Example 1. Right: Its Fourier transform F(ω).

The boxcar function is piecewise constant, with a nonzero constant value over [−π, π]. (The constant value of zero elsewhere does not contribute to F(ω).) As such, one would expect that its largest frequency component is at ω = 0. That being said, one would also expect that the discontinuities at t = ±π would have to be accommodated, accounting for the high-frequency components.

Note: In the signal processing literature, the “sinc” function is defined as follows,

    sinc(x) = sin(πx)/(πx) for x ≠ 0, and sinc(0) = 1.   (61)

In this notation, the Fourier transform of the boxcar function would be F(ω) = √(2π) sinc(ω).
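Eq. (59) – √(2π) sinc(πω) in the mathematical convention, equivalently √(2π) sin(πω)/(πω) – can be spot-checked by approximating the integral with the trapezoid rule. This is a numerical sketch only (numpy assumed; the test frequencies are arbitrary):

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 200001)
dt = t[1] - t[0]

for w in [0.3, 1.7, 4.0]:
    g = np.exp(-1j * w * t)                          # boxcar is 1 on [-pi, pi]
    F_w = (np.sum(g) - 0.5 * (g[0] + g[-1])) * dt    # trapezoid rule
    F_w /= np.sqrt(2 * np.pi)                        # prefactor from Eq. (51)
    exact = np.sqrt(2 * np.pi) * np.sin(np.pi * w) / (np.pi * w)   # Eq. (59)
    assert abs(F_w - exact) < 1e-6
print("Eq. (59) confirmed at the sample frequencies")
```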

2. The function

    f(t) = cos 3t for −π ≤ t ≤ π, and 0 otherwise.   (62)

This may be viewed as a “clipped audio signal”, obtained by multiplying the function cos(3t), t ∈ R, by the boxcar function of Example 1.

Its Fourier transform is given by (Exercise)

    F(ω) = √(2/π) · ω sin(πω)/(9 − ω²).   (63)

Plots of f(t) and F(ω) are shown in the figure below. The largest frequency components of F(ω) are at ±3, as expected, since the original function has a cos(3t) component, at least over a finite time interval. (The fact that F(ω) is finite at ω = ±3 is also left as an exercise.)
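Eq. (63) – and the claim that F(ω) stays finite at ω = ±3 – can be spot-checked numerically (not a substitute for working the exercise analytically). At ω = 3 the quadrature reproduces the limiting value π/√(2π) = √(π/2). A sketch assuming numpy:

```python
import numpy as np

t = np.linspace(-np.pi, np.pi, 200001)
dt = t[1] - t[0]

def F_clipped_cos(w):
    """Trapezoid-rule quadrature of Eq. (51) for f(t) = cos(3t) on [-pi, pi]."""
    g = np.cos(3 * t) * np.exp(-1j * w * t)
    return (np.sum(g) - 0.5 * (g[0] + g[-1])) * dt / np.sqrt(2 * np.pi)

w = 1.4                                                   # arbitrary test point
exact = np.sqrt(2 / np.pi) * w * np.sin(np.pi * w) / (9 - w**2)   # Eq. (63)
assert abs(F_clipped_cos(w) - exact) < 1e-6

# At w = 3 the formula is 0/0, but the transform itself is finite:
assert abs(F_clipped_cos(3.0) - np.sqrt(np.pi / 2)) < 1e-6
print("F(3) is finite, approximately sqrt(pi/2)")
```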


Left: Plot of f(t), Example 2. (The vertical lines at the discontinuities t = ±π are artifacts of the plotting routine.) Right: Fourier transform F(ω).

3. The triangular peak function,

    f(t) = π + t for −π ≤ t ≤ 0,  π − t for 0 < t ≤ π,  and 0 otherwise.   (64)

Its Fourier transform is computed to be

    F(ω) = √(2/π) · (1 − cos(πω))/ω².   (65)


Left: Plot of f(t), Example 3. Right: Fourier transform F(ω).

There are some other noteworthy observations to be made regarding the above examples:

1. The Fourier transforms in Examples 1 and 2 decay as |F(ω)| = O(1/|ω|) as |ω| → ∞.

2. The Fourier transform in Example 3 decays as |F(ω)| = O(1/ω²) as |ω| → ∞.

As in the case of Fourier series coefficients – which represent a discrete Fourier transform – the faster decay rate of Example 3 is due to the higher degree of regularity of the function f(t): it is continuous at all t ∈ R, whereas the “boxcar function” in Example 1 is only piecewise continuous.
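The two decay rates can be read off numerically from the closed forms (59) and (65) by comparing half-integer frequencies, where |sin(πω)| = 1 and cos(πω) = 0, so the oscillatory factors drop out. A small numpy sketch (the frequencies 10.5 and 20.5 are arbitrary):

```python
import numpy as np

# Envelopes of Eqs. (59) and (65), written as sqrt(2/pi)*sin(pi*w)/w and
# sqrt(2/pi)*(1 - cos(pi*w))/w^2 respectively.
F_box = lambda w: abs(np.sqrt(2 / np.pi) * np.sin(np.pi * w) / w)       # Eq. (59)
F_tri = lambda w: np.sqrt(2 / np.pi) * (1 - np.cos(np.pi * w)) / w**2   # Eq. (65)

# Roughly doubling the frequency halves the boxcar transform (O(1/w))...
print(F_box(10.5) / F_box(20.5))   # ratio = 20.5/10.5, about 1.95
# ...but divides the triangle transform by about four (O(1/w^2)).
print(F_tri(10.5) / F_tri(20.5))   # ratio = (20.5/10.5)^2, about 3.81
```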

Lecture 15

This lecture, scheduled for Friday, February 8, 2013, was cancelled due to the snow storm. As such, only Lectures 13 and 14 will be posted this week.
