
Discrete Wavelet Transformations and Undergraduate Education

Catherine Bénéteau and Patrick J. Van Fleet

Wavelet theory was an immensely popular research area in the 1990s that synthesized ideas from mathematics, physics, electrical engineering, and computer science. In mathematics, the subject attracted researchers from areas such as real and complex analysis, statistics, and approximation theory, among others. Applications of wavelets abound today—perhaps the most significant contributions of wavelets can be found in signal processing and digital image compression. As the basic tenets of wavelet theory were established, they became part of graduate school courses and programs, but it is only in the last ten years that we have seen wavelets and their applications being introduced into the undergraduate curriculum.

The introduction of the topic to undergraduates is quite timely—most of the foundational questions posed by wavelet researchers have been answered, and several current applications of wavelets are firmly entrenched in areas of image processing.

In much the same way that wavelet theory is the confluence of several mathematical disciplines, we have discovered that the discrete wavelet transformation is an ideal topic for a modern undergraduate class—the derivation of the discrete wavelet transformation draws largely from calculus and linear algebra, provides a natural conduit to Fourier series and discrete convolution, and allows near-immediate access to current applications. Students learn about signal denoising, edge detection in digital images, and image compression, and use computer software in both the derivation and implementation of the discrete wavelet transformation. In the process of investigating applications, students learn how the application often drives the development of mathematical tools. Finally, the design of more advanced wavelet filters allows the students to gain experience "working in the transform domain" and provides motivation for several important concepts from an undergraduate real analysis class.
Several authors [7, 2, 8, 14] have written books in which they present the basic results of wavelet theory in a manner that is accessible to undergraduates. Others have authored books [15, 13, 1, 9] that include applications as part of their presentation.

In what follows, we outline the basic development of discrete wavelet transformations and discuss their connection with Fourier series, convolution, and filtering. In the process, we illustrate the application of discrete wavelet transformations to digital images. We next construct one pair of filters that are used by the JPEG2000 image compression standard. We conclude the paper by illustrating how wavelet filters are implemented by the JPEG2000 standard. Let us begin by discussing the simplest discrete wavelet transformation.

Catherine Bénéteau is professor of mathematics at the University of South Florida. Her email address is [email protected].

Patrick J. Van Fleet is professor of mathematics at the University of St. Thomas. His email address is [email protected].

This material is based upon work supported by the National Science Foundation under grant DUE-0717662.

656 Notices of the AMS Volume 58, Number 5

The Discrete Haar Wavelet Transformation

The discrete Haar wavelet transformation is the most elementary of all discrete wavelet transforms and serves as an excellent pedagogical tool for the development of more sophisticated discrete wavelet transformations and their application to digital signals or images.

We motivate this transformation as follows: Suppose we wish to transmit a list (vector) of N numbers, N even, to a friend via the Internet. To reduce transfer time, we decide to send only a length-N/2 approximation of the list. One way to form the approximation is to send pairwise averages of the numbers. For example, the vector [6, 12, 15, 15, 14, 12, 120, 116]^T can be approximated by the vector [9, 15, 13, 118]^T. Of course, it is impossible to determine the original N numbers from this approximation, but if, in addition, we transmit the N/2 averaged directed distances, then our friend could completely recover the original data. For our example, the averaged directed distances are [3, 0, −1, −2]^T.

Once the transformation is written in matrix form (see (1) below), its inverse is simple:

(2)  W̃_N^{−1} = 2W̃_N^T = 2[H̃_{N/2}^T | G̃_{N/2}^T].

What is the advantage of transforming x into a and d? The vector a gives us an approximation of the original vector x, while d allows us to identify large (or small) differences between the pairwise averages and the original data. Identifying large differences might be of interest if we were attempting to detect edges in a digital image, for example. On the other hand, we might be inclined to convert differences that are small in absolute value to zero—this action, assuming the original data were largely homogeneous, has the effect of producing a large number of zeros in the quantized differences vector d̃. In this case, the modified transform is more amenable to data coders in applications. Of course, we have no hope of recovering the original vector x from the modified transform, but depending on the application, the sacrifice of loss of resolution for storage size might be desirable.
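The pairwise averaging and differencing described above is easy to experiment with. Below is a minimal numpy sketch; the function name haar_step is ours, chosen only for illustration:

```python
import numpy as np

def haar_step(x):
    """One level of the Haar analysis described above: pairwise
    averages a and averaged directed distances d."""
    x = np.asarray(x, dtype=float)
    a = (x[0::2] + x[1::2]) / 2   # averages of each pair
    d = (x[1::2] - x[0::2]) / 2   # averaged directed distances
    return a, d

a, d = haar_step([6, 12, 15, 15, 14, 12, 120, 116])
print(a)   # averages: 9, 15, 13, 118
print(d)   # averaged directed distances: 3, 0, -1, -2
```

Note that each pair (a_k, d_k) determines the original pair exactly: x_{2k−1} = a_k − d_k and x_{2k} = a_k + d_k, which is why the recipient can recover the data.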
We define the one-dimensional discrete Haar wavelet transformation as the linear transformation

x → [ a ; d ],

where x ∈ R^N, N even, and the elements of a, d ∈ R^{N/2} are

a_k = (x_{2k−1} + x_{2k})/2,
d_k = (−x_{2k−1} + x_{2k})/2,

where k = 1, …, N/2. Given these averages and differences, we can recover the original data. In matrix form, the transformation can be expressed as

(1)  W̃_N =
[  1/2  1/2   0    0   ···   0    0
    0    0   1/2  1/2  ···   0    0
                  ···
    0    0    0    0   ···  1/2  1/2
  −1/2  1/2   0    0   ···   0    0
    0    0  −1/2  1/2  ···   0    0
                  ···
    0    0    0    0   ··· −1/2  1/2 ]  =  [ H̃_{N/2} ; G̃_{N/2} ].

The Two-Dimensional Discrete Haar Wavelet Transformation

If we view a grayscale digital image as an M × N matrix A whose entries are integers between 0 and 255 inclusive, where 0 represents black, 255 is white, and integers in between indicate varying degrees of intensities of gray, then application of the one-dimensional Haar transformation W̃_M acts on the columns of A. Each column is transformed to a column whose top half gives an approximation, or blur, of the original column and whose bottom half holds the differences or details in the data. We can process the rows of A as well by multiplying W̃_M A on the right by W̃_N^T. We define the two-dimensional discrete Haar wavelet transformation of the matrix A to be W̃_M A W̃_N^T. Of course, both M and N must be even integers. An image (stored in A), W̃_M A, and W̃_M A W̃_N^T are plotted in Figure 1.

We can better understand the image displayed in Figure 1(c) if we express the two-dimensional transformation in block form. Indeed, using (1) and (2), we see that

W̃_M A W̃_N^T = [ H̃_{M/2} ; G̃_{M/2} ] A [ H̃_{N/2}^T | G̃_{N/2}^T ]
             = [ H̃_{M/2} A H̃_{N/2}^T   H̃_{M/2} A G̃_{N/2}^T
                 G̃_{M/2} A H̃_{N/2}^T   G̃_{M/2} A G̃_{N/2}^T ]
             = [ B  V
                 H  D ].

It is a straightforward computation to see that if we partition A into 2 × 2 blocks A_{j,k}, 1 ≤ j ≤ M/2, 1 ≤ k ≤ N/2, then the (j, k) elements of B, V, H, and D are

Figure 1. A 320 × 512 digital grayscale image A (a), the product W̃_320 A (b), and the two-dimensional discrete Haar wavelet transformation W̃_320 A W̃_512^T (c).
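In code, the two-dimensional transform is just the one-dimensional matrix applied to the columns and then to the rows. A sketch assuming numpy (the helper name haar_matrix is ours):

```python
import numpy as np

def haar_matrix(n):
    """Build the n x n matrix W~_n of equation (1): averaging rows
    (H~_{n/2}) on top, differencing rows (G~_{n/2}) below; n even."""
    W = np.zeros((n, n))
    for k in range(n // 2):
        W[k, 2*k:2*k+2] = [0.5, 0.5]
        W[n//2 + k, 2*k:2*k+2] = [-0.5, 0.5]
    return W

A = np.arange(16.0).reshape(4, 4)   # a toy 4 x 4 "image"
W = haar_matrix(4)
T = W @ A @ W.T                     # two-dimensional transform
print(T[:2, :2])                    # blur B: the average of each 2 x 2 block of A
```

For the 4 × 4 ramp above, the blur block is [[2.5, 4.5], [10.5, 12.5]], exactly the averages of the four 2 × 2 blocks of A.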

constructed using the four values in A_{j,k}. Indeed, if

A_{j,k} = [ a  b
            c  d ],

then the (j, k) elements of B, V, H, and D are

(a + b + c + d)/4,   ((a + c) − (b + d))/4,   ((a + b) − (c + d))/4,   ((a + d) − (b + c))/4,

respectively. In this way we see that the elements of B are simply the averages of the elements in the blocks A_{j,k}, and the elements in V, H, and D are averaged differences in the vertical, horizontal, and diagonal directions, respectively.

If M and N are divisible by 2^p, then we can iterate the transformation a total of p times. Figure 1(c) shows one iteration of the discrete Haar wavelet transformation. We can perform a second iteration by applying the two-dimensional discrete Haar wavelet transformation to B. Figure 2 shows two and three iterations of the discrete Haar wavelet transformation applied to A. In applications such as image compression, the iterated transformation produces more details portions, and in the case in which the original data are largely homogeneous, we can expect a large number of near-zero values in the details portions of the iterated transform. These small values can be quantized to zero, and the data coder should be able to compress the modified transform in a highly efficient manner.

Figure 2. Multiple iterations of the discrete Haar wavelet transformation applied to the matrix A from Figure 1(a): (a) two iterations, (b) three iterations.

Alternatively, we can use the averages portion B in applications as well. Suppose we stored a four-iteration version of an image on a web server. Then

Figure 3. The wavelet transform of a digital image of Helaman Ferguson's Four Canoes is shown at the top left. The portion framed in blue is enlarged and transmitted. Next, the portion of the transform outlined in yellow is transmitted and, with B_4, inverted and enlarged to create the image at top right. This process is repeated for the green, red, and brown portions of the transform to progressively create a sequence of images whose resolution increases by a factor of two at each step. The original image, outlined in brown, is at bottom right.

B_4, the fourth-iteration averages portion of the transform, could serve as a thumbnail image for the original. If a user requests the original image, we could first transmit B_4 and then progressively send detail portions at each iterative level. The recipient can apply the inverse transform as detail portions are received to sequentially produce a higher-resolution version of the original image. Figure 3 illustrates the process.

Equation (2) tells us that the discrete Haar wavelet transformation is almost orthogonal. Indeed, if we define W_N = √2 W̃_N, then W_N^T = W_N^{−1}, and our transformation is orthogonal. The inverse of the transformation now has a simple form, which allows us to easily reconstruct the signal from its transformed version.

Relation to Discrete Convolution and Filters

We have motivated the construction of W_N by insisting that W_N transform the original data x into an averages portion a and details portion d. We could just as well develop W_N via discrete convolution.

Recall that if h and x are bi-infinite vectors, then the convolution of h with x is the bi-infinite vector y, denoted by y = h ∗ x, where the nth component, n ∈ Z, of the vector is

y_n = Σ_{k∈Z} h_k x_{n−k}.

If we assume that x_n = 0 for n < 1 and n > N, where N is an even integer, then we can form W_N x by:

• Computing u = h ∗ x, where the only nonzero values of h are h_0 = h_1 = √2/2.
• Computing v = g ∗ x, where the only nonzero values of g are g_0 = √2/2 and g_1 = −√2/2.
• Downsampling u and v by removing the odd components of each vector.
• Truncating the downsampled vectors appropriately to obtain a and d.

The vectors h and g are often called filters. In fact, h and g are special kinds of filters. Let x be a vector whose elements are x_k = 1, k ∈ Z, and suppose

y is a vector whose elements are y_k = (−1)^k, k ∈ Z. Then h ∗ x = √2 x and h ∗ y is the bi-infinite zero vector. Thus h is an example of a lowpass filter—it passes low-frequency signals through largely unchanged but attenuates the amplitudes of high-frequency data. In a similar manner, we see that g ∗ x is the bi-infinite zero vector and g ∗ y = √2 y. Here g is an example of a highpass filter—it allows high-frequency signals to pass largely unchanged but attenuates the amplitudes of low-frequency data.

Thus in filtering terms, the discrete Haar wavelet transform is constructed by applying a lowpass filter and a highpass filter to the input data, downsampling both results, and then appropriately truncating the downsampled vectors. We will learn that all discrete wavelet transforms are built from a lowpass (scaling) filter h and a highpass (wavelet) filter g. Fourier series are valuable tools for constructing these filters.
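These lowpass/highpass claims can be confirmed directly on finite slices of the constant and alternating vectors; a sketch assuming numpy:

```python
import numpy as np

s = np.sqrt(2) / 2
h = np.array([s, s])      # lowpass (scaling) filter
g = np.array([s, -s])     # highpass (wavelet) filter

x = np.ones(8)                                        # x_k = 1
y = np.array([(-1) ** k for k in range(8)], float)    # y_k = (-1)^k

hx = np.convolve(h, x)[1:-1]   # interior of h * x (edges suffer truncation)
gx = np.convolve(g, x)[1:-1]   # interior of g * x
gy = np.convolve(g, y)[1:-1]   # interior of g * y

print(np.allclose(hx, np.sqrt(2) * x[1:]))   # True: h * x = sqrt(2) x
print(np.allclose(gx, 0))                    # True: g * x = 0
print(np.allclose(gy, np.sqrt(2) * y[1:]))   # True: g * y = sqrt(2) y
```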

Fourier Series and Discrete Wavelet Transformations

A course on discrete wavelet transformations is an excellent venue for the introduction of Fourier series into the undergraduate curriculum. There are two main uses of Fourier series in such a course. We can determine whether a finite-length filter f = [f_0, …, f_L]^T is lowpass or highpass by plotting the modulus of the Fourier series

F(ω) = Σ_{k=0}^{L} f_k e^{ikω},

and, as we will see later, we can characterize a (bi)orthogonal discrete wavelet transformation by investigating an identity related to the Fourier series of the scaling filter(s).

For the discrete Haar wavelet transformation, the finite-length filters

h = [h_0, h_1]^T = [√2/2, √2/2]^T,
g = [g_0, g_1]^T = [√2/2, −√2/2]^T

give rise to the Fourier series

(3)  H(ω) = √2/2 + (√2/2)e^{iω} = √2 e^{iω/2} cos(ω/2)

and

(4)  G(ω) = √2/2 − (√2/2)e^{iω} = −√2 i e^{iω/2} sin(ω/2)

for ω ∈ R. To determine the nature of the filter, we plot |H(ω)| and |G(ω)|. Since |H(ω)| and |G(ω)| are both 2π-periodic, even functions, we plot them on the interval [0, π]. The functions are displayed in Figure 4.

Figure 4. The functions |H(ω)| (a) and |G(ω)| (b).

At the lowest frequency, ω = 0, |H(0)| = √2. At the highest frequency, ω = π, |H(π)| = 0. In the signal processing literature, these two conditions indicate that h is a lowpass filter. In a similar manner, |G(0)| = 0 and |G(π)| = √2 are conditions that imply that g is a highpass filter.

For ω ∈ R, we use (3) to write

(5)  |H(ω)|² + |H(ω+π)|² = 2cos²(ω/2) + 2cos²((ω+π)/2) = 2.

In a similar manner, we use (3) and (4) to obtain

(6)  |G(ω)|² + |G(ω+π)|² = 2

and

(7)  H(ω)Ḡ(ω) + H(ω+π)Ḡ(ω+π) = 0,

where the bar denotes complex conjugation. In the next section, we will see that the first step in a general strategy for finding longer, more sophisticated filters h and g is to find h so that (5) is satisfied and then make a judicious choice of g so that (6) and (7) hold as well. The additional lowpass conditions |H(0)| = √2 and |H(π)| = 0 with our choice of g ensure that the highpass conditions |G(0)| = 0, |G(π)| = √2 hold, and in this way, we see that all lowpass, highpass, and matrix orthogonality conditions placed on filters

h and g can be completely characterized using Fourier series.

Daubechies Orthogonal Scaling Filters

Suppose we apply the one-dimensional discrete Haar wavelet transformation to the vector v = [0, 0, 100, 100]^T. We obtain the transformed vector √2[0, 100 | 0, 0]^T. The differences portion of the transformed vector consists only of zeros, missing the large "edge" between 0 and 100 in v. The problem is that the Haar scaling/wavelet filter pair is too short. Let's consider longer filter pairs. These filters and the corresponding wavelets were discovered by Ingrid Daubechies (see, e.g., [4]).

We wish to construct an orthogonal matrix W_N that uses longer scaling and wavelet filters. It turns out that we need to consider even-length filters, so let's start with a scaling filter h = [h_0, h_1, h_2, h_3]^T and wavelet filter g = [g_0, g_1, g_2, g_3]^T. We insist that

(8)  |H(0)| = |h_0 + h_1 + h_2 + h_3| = √2,
(9)  H(π) = h_0 − h_1 + h_2 − h_3 = 0,

and

(10)  G(0) = g_0 + g_1 + g_2 + g_3 = 0,
(11)  |G(π)| = |g_0 − g_1 + g_2 − g_3| = √2,

so that the filters h, g are lowpass and highpass, respectively.

We also wish to retain the structure of the transformation by suitably truncating two downsampled convolution products. In this case, the longer lengths of our filters force us to "wrap" rows at the bottom of each block H_{N/2}, G_{N/2} of the matrix W_N. For example, in the case in which N = 8, W_N takes the form

(12)  W_8 =
[ h_3 h_2 h_1 h_0  0   0   0   0
   0   0  h_3 h_2 h_1 h_0  0   0
   0   0   0   0  h_3 h_2 h_1 h_0
  h_1 h_0  0   0   0   0  h_3 h_2
  g_3 g_2 g_1 g_0  0   0   0   0
   0   0  g_3 g_2 g_1 g_0  0   0
   0   0   0   0  g_3 g_2 g_1 g_0
  g_1 g_0  0   0   0   0  g_3 g_2 ].

The wrapping rows cause problems in applications since they inherently assume that the data are periodic, but on the other hand, they simplify our final requirement that W_N be an orthogonal matrix. In this case, we must have

(13)  h_0² + h_1² + h_2² + h_3² = 1,
(14)  h_0h_2 + h_1h_3 = 0.

It turns out that if real numbers h_0, …, h_3 satisfy (9), (13), and (14), then (8) holds. We will choose the positive square root so that

(15)  h_0 + h_1 + h_2 + h_3 = √2.

We now set g = [h_3, −h_2, h_1, −h_0]^T. It is straightforward to show that if (9), (13), (14), and (15) hold, then the highpass conditions (10), (11) are satisfied and

g_0² + g_1² + g_2² + g_3² = 1,
g_0g_2 + g_1g_3 = 0.

Moreover, we can show that

h_0g_0 + h_1g_1 + h_2g_2 + h_3g_3 = 0,
h_0g_2 + h_1g_3 = 0,
h_2g_0 + h_3g_1 = 0,

so that W_N is an orthogonal matrix. However, there are infinitely many solutions to the system

(16)  h_0² + h_1² + h_2² + h_3² = 1
      h_0h_2 + h_1h_3 = 0
      h_0 + h_1 + h_2 + h_3 = √2
      h_0 − h_1 + h_2 − h_3 = 0.

Here is where the Fourier series approach really makes an impact. Daubechies chose to impose the additional condition H′(π) = 0. This condition has the effect of "flattening" the graph of the modulus |H(ω)| even more at ω = π; thus, intuitively, the filter h does an even better job of attenuating the amplitudes of the high-frequency portions of the input data. If we expand the left-hand side of H′(π) = 0, we obtain the linear equation

(17)  h_1 − 2h_2 + 3h_3 = 0,

and we add (17) to our system (16).

It turns out that there are two real solutions to the system, one of which is a reflection of the other. The system can be solved by hand (see, e.g., [12]) to obtain the Daubechies four-term scaling filter:

h_0 = (1 + √3)/(4√2),   h_1 = (3 + √3)/(4√2),
h_2 = (3 − √3)/(4√2),   h_3 = (1 − √3)/(4√2).

The modulus of H(ω) is plotted in Figure 5. Compare this graph with that of Figure 4.

In order to construct longer even-length filters h = [h_0, …, h_L]^T, L odd, we must write down general orthogonality conditions and lowpass conditions. We can ensure the orthogonality of W_N using the following theorem.

Theorem 1. Suppose A(ω) = Σ_{k∈Z} a_k e^{ikω} and B(ω) = Σ_{k∈Z} b_k e^{ikω} are Fourier series. Then

(18)  |A(ω)|² + |A(ω+π)|² = 2

if and only if

(19)  Σ_{k∈Z} a_k ā_{k−2n} = δ_{0,n}

for all n ∈ Z.

While this derivation of the system is easy to understand, solving it numerically for large filter lengths can be difficult. It should be noted that Daubechies's classical derivation of these filters is centered around the construction of a trigonometric polynomial that leads to the solution of (18). Particular roots of this polynomial are then used to find a Fourier series H(ω) that solves (23). This process (see [5] or [9]) works well for filters of long length, but the mathematics required to understand it may be better suited for advanced undergraduate or beginning graduate students.
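For the length-four case, the solution can at least be verified by machine. The sketch below (numpy assumed) checks that the Daubechies filter satisfies (13), (14), (15), and (17), and that the matrix W_8 of (12) is orthogonal:

```python
import numpy as np

r3 = np.sqrt(3)
h = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])      # wavelet filter

assert np.isclose(h @ h, 1)                    # (13)
assert np.isclose(h[0]*h[2] + h[1]*h[3], 0)    # (14)
assert np.isclose(h.sum(), np.sqrt(2))         # (15)
assert np.isclose(h[1] - 2*h[2] + 3*h[3], 0)   # (17): H'(pi) = 0

# assemble W_8 of equation (12), wrapping rows included
W = np.zeros((8, 8))
for k in range(4):
    idx = [(2*k + j) % 8 for j in range(4)]
    W[k, idx] = h[::-1]          # scaling rows [h3 h2 h1 h0]
    W[4 + k, idx] = g[::-1]      # wavelet rows [g3 g2 g1 g0]
print(np.allclose(W @ W.T, np.eye(8)))   # True: W_8 is orthogonal
```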

Figure 5. |H(ω)| for the length-four Daubechies scaling filter.

Theorem 1 also asserts that

(20)  A(ω)B̄(ω) + A(ω+π)B̄(ω+π) = 0

holds if and only if

(21)  Σ_{k∈Z} a_k b̄_{k−2n} = 0

for all n ∈ Z.

The proof of this theorem is straightforward. Note that equations (19) and (21) guarantee the orthogonality of the matrix W_N.

Suppose H(ω) satisfies (18). If we take

(22)  G(ω) = −e^{iLω} H̄(ω+π),

then G(ω) satisfies (18), and H(ω), G(ω) satisfy (20). It is an easy exercise to show that the Fourier coefficients of G(ω) are g_k = (−1)^k h_{L−k}, k ∈ Z.

The condition H′(π) = 0 leads naturally to the following generalization: if we want to produce longer scaling filters h, then each time we increase the filter length by two, we require an additional derivative condition at π. For example, the Daubechies
scaling filter of length six is, in part, constructed by requiring the conditions H′(π) = 0 and H″(π) = 0. Each time an additional derivative condition is imposed at π, the modulus of the corresponding Fourier series becomes flatter at the highest frequency ω = π. So the general derivative conditions are

(23)  H^{(m)}(π) = 0,   m = 0, …, (L−1)/2.

We also add the lowpass condition

(24)  H(0) = Σ_{k=0}^{L} h_k = √2,

so that our general system is given by (19), (23), and (24). Daubechies proved that there are M = 2^{⌊(L+2)/4⌋} real-valued scaling filters h = [h_0, h_1, …, h_L]^T, L odd, that satisfy this system. Moreover, if we construct G(ω) using (22), then G(0) = 0 and |G(π)| = √2, as desired.

Orthogonal Transforms in Applications

Now that we have the machinery for constructing orthogonal filter pairs h, g of any even length, it is natural to consider the advantages and disadvantages of their implementation in applications.

Let's first consider filter length. Certainly, the time to compute W_N v is proportional to the length of h. On the other hand, it can be shown [4] that the derivative conditions (23) lead to filters g that annihilate (modulo wrapping rows) data obtained by uniformly sampling a polynomial of degree (L−1)/2.

In image compression we typically transform the image, then quantize the transform, and finally encode the result. It can be shown that if the transform is orthogonal, then the error incurred when quantizing the transform coefficients is exactly the same as the error between the original image and the compressed image.

There are disadvantages to using orthogonal transformations. In lossless image compression, the quantization step is omitted. In this case, we need to use a transformation that maps integers to integers. It is possible to modify orthogonal transforms (see [3]) and produce integer-to-integer mappings, but it is easier to modify the biorthogonal transformations described in the next section.

Unfortunately, the wrapping rows that appear in the W_N constructed from Daubechies scaling and wavelet filters pose a problem in many applications. In (12), there is one wrapping row in each block of the transform. Figure 6 illustrates the problem when the discrete wavelet transformation is applied to v ∈ R^16, where v_k = k, k = 1, …, 16. The wrapping rows cause the first two elements to be combined with the last two elements of v to form the last weighted average and last weighted difference. If the data are periodic, this makes perfect sense, but in many signal/image processing applications, the data are not periodic. The wrapping row problem, caused by truncating downsampled convolution products, is typical in applied mathematics.
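The claim about quantization error is a direct consequence of orthogonal matrices preserving the 2-norm, and a small experiment makes it concrete. A sketch using an illustrative rounding quantizer (our choice, not JPEG2000's):

```python
import numpy as np

s = np.sqrt(2) / 2
W = np.array([[ s,  s,  0,  0],     # normalized Haar matrix W_4
              [ 0,  0,  s,  s],
              [-s,  s,  0,  0],
              [ 0,  0, -s,  s]])
assert np.allclose(W @ W.T, np.eye(4))   # W_4 is orthogonal

x = np.array([6.0, 12.0, 15.0, 15.0])
q = np.round(W @ x)        # quantize the transform coefficients
xq = W.T @ q               # reconstruct from the quantized transform

# the two errors agree because ||W u|| = ||u|| for orthogonal W
print(np.linalg.norm(W @ x - q), np.linalg.norm(x - xq))
```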
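The wrapping-row artifact of Figure 6 can also be reproduced in a few lines: build W_16 from the Daubechies length-four filter and apply it to v_k = k. The sketch below (numpy assumed) shows that every interior difference vanishes exactly, while the wrapped one does not:

```python
import numpy as np

r3 = np.sqrt(3)
h = np.array([1 + r3, 3 + r3, 3 - r3, 1 - r3]) / (4 * np.sqrt(2))
g = np.array([h[3], -h[2], h[1], -h[0]])

N = 16
W = np.zeros((N, N))
for k in range(N // 2):
    idx = [(2*k + j) % N for j in range(4)]
    W[k, idx] = h[::-1]
    W[N//2 + k, idx] = g[::-1]

v = np.arange(1.0, N + 1)        # v_k = k: linear, decidedly not periodic
y = W @ v
print(np.allclose(y[N//2:-1], 0))   # True: interior differences vanish
print(y[-1])                        # the wrapped difference is far from zero
```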

Biorthogonal Scaling Filters

Fortunately, there is a way to deal with the wrapping row problem, and that is to develop filters that are symmetric.

An odd-length filter h is called symmetric if h_k = h_{−k}, while an even-length filter is called symmetric if h_k = h_{1−k}. Daubechies [5] proved that the only symmetric, finite-length, orthogonal filter is the Haar filter, and we have already discussed some of its limitations. How can we produce symmetric filters while preserving some of the desirable properties of orthogonal filters? These desirable properties are finite length (computational speed), orthogonality (ease of inversion), and the ability to produce a good approximation of the original data with the scaling filter. Since the inverse need only be computed once, it was Daubechies's idea to relinquish orthogonality and to construct instead a discrete biorthogonal wavelet transformation. The idea is to construct two sets of filters instead of one. In other words, we construct two wavelet transform matrices W̃_N and W_N so that W̃_N^{−1} = W_N^T. In block form, we have

W̃_N W_N^T = [ H̃_{N/2} ; G̃_{N/2} ] [ H_{N/2}^T | G_{N/2}^T ]
           = [ H̃_{N/2}H_{N/2}^T   H̃_{N/2}G_{N/2}^T
               G̃_{N/2}H_{N/2}^T   G̃_{N/2}G_{N/2}^T ]
           = [ I_{N/2}  0_{N/2}
               0_{N/2}  I_{N/2} ].

Figure 6. The discrete wavelet transform, constructed using the Daubechies length-four scaling filter, applied to v: (a) v, (b) W_16 v.

Let h̃ and h denote two filters, and suppose H(ω), H̃(ω) are the associated Fourier series. We require both filters to be lowpass, so we have H̃(0) = H(0) = √2 and H̃(π) = H(π) = 0. Instead of orthogonality, we insist that the filters satisfy the general biorthogonality condition

(25)  H̃(ω)H̄(ω) + H̃(ω+π)H̄(ω+π) = 2.

Although the filters are no longer orthogonal, they are close to orthogonal, and a proof very similar to that of Theorem 1 shows that (25) holds if and only if

(26)  Σ_{k∈Z} h̃_k h_{k−2n} = δ_{0,n},

where n ∈ Z. Equation (26) ensures, if we wrap rows, that H̃_{N/2}H_{N/2}^T = I_{N/2}. If we take

(27)  G̃(ω) = −e^{iω} H̄(ω+π),
(28)  G(ω) = −e^{iω} H̃̄(ω+π),

then we can easily verify that G̃(0) = G(0) = 0, G̃(π) = G(π) = √2, and

(29)  G̃(ω)Ḡ(ω) + G̃(ω+π)Ḡ(ω+π) = 2,
(30)  H̃(ω)Ḡ(ω) + H̃(ω+π)Ḡ(ω+π) = 0,
(31)  G̃(ω)H̄(ω) + G̃(ω+π)H̄(ω+π) = 0.

Equation (29) implies that G̃_{N/2}G_{N/2}^T = I_{N/2}, and (30), (31), in conjunction with (20) from Theorem 1, establish H̃_{N/2}G_{N/2}^T = G̃_{N/2}H_{N/2}^T = 0_{N/2}. We can also expand the Fourier series in (27) and (28) to obtain the wavelet filter coefficients

(32)  g̃_k = (−1)^k h_{1−k}   and   g_k = (−1)^k h̃_{1−k},

where 1−k ranges over the nonzero indices of h̃, h. Thus, if we solve (25), we can find all the filters we need to build W̃_N and W_N. Of course, (25) represents a quadratic system in terms of the scaling filter coefficients, and such systems are difficult to solve. Daubechies's solution to this problem was to simply pick the scaling filter h̃ and then solve the linear system (25) for the scaling filter h. Moreover, since we have given up orthogonality in this new construction, we can

insist that both scaling filters be symmetric. Let's look at an example.

Example 1. Suppose we want h̃ to be a symmetric, length-three, lowpass filter h̃ = [h̃_{−1}, h̃_0, h̃_1]^T = [h̃_1, h̃_0, h̃_1]^T. We seek two numbers h̃_0, h̃_1 so that H̃(0) = h̃_1 + h̃_0 + h̃_1 = √2 and H̃(π) = −h̃_1 + h̃_0 − h̃_1 = 0.¹ The filter h̃ we seek is

[h̃_{−1}, h̃_0, h̃_1]^T = (√2/4)[1, 2, 1]^T.

Figure 7. The modified biorthogonal wavelet transformation applied to the vector v from Figure 6(a).

To find a symmetric filter h, we must satisfy the biorthogonality condition (25), together with the lowpass constraints H(0) = √2 and H(π) = 0. It turns out that h must have an odd length of 5, 9, 13, …. We choose the symmetric filter h = [h_2, h_1, h_0, h_1, h_2]^T and use the lowpass condition H(π) = 0 in conjunction with (26) to obtain the system

h_0 − 2h_1 + 2h_2 = 0
h_0 + h_1 = √2
h_1 + 2h_2 = 0.

Solving this system gives h_2 = −√2/8, h_1 = √2/4, and h_0 = 3√2/4. We can next use (32) to find the coefficients of the wavelet filters g and g̃. We have

g̃_{−1} = √2/8       g_0 = √2/4
g̃_0 = √2/4          g_1 = −√2/2
g̃_1 = −3√2/4        g_2 = √2/4
g̃_2 = √2/4
g̃_3 = √2/8

Note that there is some symmetry in the highpass filters: g̃_k = g̃_{2−k} and g_k = g_{2−k}.

Here are examples of the biorthogonal transformation matrices with N = 8. We form the matrices by placing the 0th element in the upper left corner of each block, with positively indexed elements placed consecutively to the right and negatively indexed elements placed consecutively to the left, with wrapping if necessary. Factoring out √2/8 from each matrix, we have

W̃_8 = (√2/8) ×
[  4  2  0  0  0  0  0  2
   0  2  4  2  0  0  0  0
   0  0  0  2  4  2  0  0
   0  0  0  0  0  2  4  2
   2 −6  2  1  0  0  0  1
   0  1  2 −6  2  1  0  0
   0  0  0  1  2 −6  2  1
   2  1  0  0  0  1  2 −6 ]

and

W_8 = (√2/8) ×
[  6  2 −1  0  0  0 −1  2
  −1  2  6  2 −1  0  0  0
   0  0 −1  2  6  2 −1  0
  −1  0  0  0 −1  2  6  2
   2 −4  2  0  0  0  0  0
   0  0  2 −4  2  0  0  0
   0  0  0  0  2 −4  2  0
   2  0  0  0  0  0  2 −4 ].

It is straightforward to verify that W̃_8^{−1} = W_8^T.

Note that W̃_8 and W_8 still have wrapping rows, but we can exploit the symmetry of the scaling filters to construct a transformation that diminishes the adverse effects of the wrapping rows. The idea is quite simple, and we illustrate it now with an example.

Example 2. Consider again the vector v ∈ R^16 where v_k = k. We form the vector v_p ∈ R^30 where

v_p = [v_1, …, v_16 | v_15, v_14, …, v_2]^T.

We next compute W̃_30 v_p = [a_p ; d_p]. The periodic nature of v_p means that the weighted averages and differences formed by the wrapping rows use consecutive elements from v. For example, the first element of a_p is (√2/4)v_2 + (√2/2)v_1 + (√2/4)v_2 instead of the first element of W̃_16 v, which is (√2/4)v_16 + (√2/2)v_1 + (√2/4)v_2.
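The filter pair derived in Example 1 can be checked numerically against the biorthogonality condition (26); a sketch with both filters written on the common index grid −2, …, 2:

```python
import numpy as np

s2 = np.sqrt(2)
ht = s2 / 4 * np.array([0, 1, 2, 1, 0])     # h~ = (sqrt2/4)[1,2,1], zero-padded
h  = s2 / 8 * np.array([-1, 2, 6, 2, -1])   # h  = (sqrt2/8)[-1,2,6,2,-1]

vals = []
for n in (-1, 0, 1):
    vals.append(sum(ht[k] * h[k - 2*n]
                    for k in range(5) if 0 <= k - 2*n < 5))
print(np.allclose(vals, [0, 1, 0]))   # True: condition (26), delta_{0,n}
```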
The elements of the transform that do not involve wrapping rows are the same as the corresponding elements of W̃_16 v. We keep the first eight elements of a_p and d_p and concatenate them to form our transform. The result is plotted in Figure 7. Compare this result to that plotted in Figure 6(b).

¹Do you see how Pascal's triangle can be used to generate symmetric, lowpass filters?

We can certainly construct biorthogonal filter pairs of different lengths—we can either select h̃ and then set up and solve a linear system for h, or we can use the closed formulas for H̃(ω) and H(ω)

obtained by Daubechies in [6]. The biorthogonal filter pair from the above example is particularly interesting since it plays a significant role in the JPEG2000 lossless image compression standard. Next we present one more modification of the transformation from Example 2 to understand how the discrete biorthogonal wavelet transformation is used in the JPEG2000 image compression standard.

Discrete Wavelet Transformations and JPEG2000

In 1992 JPEG (Joint Photographic Experts Group) became an international standard for compressing digital images. In the late 1990s work began to improve the JPEG compression standard, and a new algorithm was developed. This algorithm uses the biorthogonal wavelet transform instead of the discrete cosine transform. Because of patent issues, JPEG2000 is not yet supported by popular web browsers, but it is likely that once the patent issues are resolved, JPEG2000 will replace JPEG as the most popular standard for image compression.

For lossless compression, the filters from Example 1 are rescaled and "reversed"—h is used to build W̃_N. For example, the modified transform for vectors of length 8 is

W̃_8 = (1/8) ×
[  6  2 −1  0  0  0 −1  2
  −1  2  6  2 −1  0  0  0
   0  0 −1  2  6  2 −1  0
  −1  0  0  0 −1  2  6  2
  −4  8 −4  0  0  0  0  0
   0  0 −4  8 −4  0  0  0
   0  0  0  0 −4  8 −4  0
  −4  0  0  0  0  0 −4  8 ].

We use the ideas of Example 2 to further modify the transformation so that it better handles output produced by wrapping rows. This modification is equivalent to applying W̃_8^p to v, where

(33)  W̃_8^p = (1/8) ×
[  6  4 −2  0  0  0  0  0
  −1  2  6  2 −1  0  0  0
   0  0 −1  2  6  2 −1  0
   0  0  0  0 −1  2  5  2
  −4  8 −4  0  0  0  0  0
   0  0 −4  8 −4  0  0  0
   0  0  0  0 −4  8 −4  0
   0  0  0  0  0  0 −8  8 ].

The basic JPEG2000 algorithm works as follows. Let us assume that we have a grayscale image A.
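It is instructive to apply the modified transform (33) to a concrete vector: on a linear ramp, the symmetric boundary handling leaves only one nonzero detail. A sketch with the matrix of (33) entered directly (numpy assumed):

```python
import numpy as np

Wp = np.array([
    [ 6,  4, -2,  0,  0,  0,  0,  0],
    [-1,  2,  6,  2, -1,  0,  0,  0],
    [ 0,  0, -1,  2,  6,  2, -1,  0],
    [ 0,  0,  0,  0, -1,  2,  5,  2],
    [-4,  8, -4,  0,  0,  0,  0,  0],
    [ 0,  0, -4,  8, -4,  0,  0,  0],
    [ 0,  0,  0,  0, -4,  8, -4,  0],
    [ 0,  0,  0,  0,  0,  0, -8,  8],
]) / 8.0

v = np.arange(1.0, 9.0)     # v_k = k
y = Wp @ v
print(y[:4])   # averages track the odd samples: 1, 3, 5, 7.25
print(y[4:])   # details on linear data: 0, 0, 0, 1 (only the boundary row survives)
```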
 − 2 − 2    We first subtract 128 from each entry of A. We  0 0 0 0 0 0 1 1   −  apply as many iterations of the modified biorthog-   This modified transform is still invertible, and onal wavelet transformation (see Example 2) as it now does a good job of handling edge effects. we wish (and as possible, depending on the di- However, the coding step of the JPEG2000 al- mensions of the image). The filter pair used in the gorithm works best if the transformation maps discrete biorthogonal wavelet transform depends integers to integers. How can we modify (33) for on whether we choose to perform lossy or loss- general size N so that it maps integers to integers? less compression. In the case of the former, the We could multiply the transformed data by 8, but transformed image is quantized with many ele- that increases the of the output, which is ments either reduced in magnitude or converted not desirable for coders. The algorithm used by to zero. Of course, once we quantize the trans- JPEG2000 for is called lifting formed image, we have no hope of recovering the and is due to Sweldens [11]. original image, but this loss of resolution is not as important as the coder’s ability to compress The Lifting Scheme the quantized transform. In lossless compression, The idea behind lifting is quite simple. For a nice no quantization is performed. In both cases, the tutorial, see [10], and more details are provided in next step is to code the data. JPEG2000 uses an the books [12, 8]. arithmetic coding algorithm known as Embedded To understand how lifting works, we apply (33) Block Coding with Optimized Truncation (EBCOT) Z8 ˜ p to v . The resulting product is W8 v to actually encode the (quantized) transformed ∈ = 1 1 3 1 1 v3 v2 v1 v2 v3 data. This last step is the actual compression step. 
− 8 + 4 + 4 + 4 − 8 1 1 3 1 1 v1 v2 v3 v4 v5 After transmission of the data, the original image  − 8 + 4 + 4 + 4 − 8  1 1 3 1 1 can be reconstructed by decoding, then applying v3 v4 v5 v6 v7  − 8 + 4 + 4 + 4 − 8   1 1 3 1 1  the inverse transformation as many times as iter- a  v5 v6 v7 v8 v7  (34)  − 8 + 4 + 4 + 4 − 8  , ated. With lossless compression, we can recover 1 1 d =  v1 v2 v3   − 2 + − 2  the image exactly.  1 1   v3 v4 v5   − 2 + − 2  One of the many advantages of JPEG2000 over 1 1  v5 v6 v7  JPEG is its ability to allow the user to perform the  − 2 + − 2   1 1   v7 v8 v7  lossless compression. The biorthogonal filter pair  − 2 + − 2  ˜   for this task is a modified version of the h and h where we have rewritten a1, a4, and d4 so that they from Example 1. The filter h is multiplied by √2/2 display the same symmetry as the other elements and h˜ is multiplied by √2. Moreover, the filters are in the product. Note that the center value of the
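The boundary-handled transform (33) is straightforward to prototype directly from the filter rules rather than from the matrix. The sketch below is ours, not the article's; the function name `wp8` and the use of exact rational arithmetic are our choices. It encodes the symmetric-extension conventions visible in (34), namely v₀ := v₂, v₋₁ := v₃, and v₉ := v₇:

```python
from fractions import Fraction as F

def wp8(v):
    """Apply the boundary-modified (5,3) biorthogonal transform of (33)
    to a length-8 vector v = (v1, ..., v8), returning (a1..a4, d1..d4)."""
    v = [F(x) for x in v]

    def s(i):
        # 1-based lookup with symmetric extension at both ends
        if i < 1:
            i = 2 - i          # v0 -> v2, v_{-1} -> v3
        elif i > 8:
            i = 16 - i         # v9 -> v7
        return v[i - 1]

    # lowpass ("average") half: filter (-1/8, 1/4, 3/4, 1/4, -1/8)
    # centered at the odd indices 1, 3, 5, 7
    a = [-F(1, 8) * s(c - 2) + F(1, 4) * s(c - 1) + F(3, 4) * s(c)
         + F(1, 4) * s(c + 1) - F(1, 8) * s(c + 2) for c in (1, 3, 5, 7)]
    # highpass ("detail") half: filter (-1/2, 1, -1/2)
    # centered at the even indices 2, 4, 6, 8
    d = [-F(1, 2) * s(c - 1) + s(c) - F(1, 2) * s(c + 1)
         for c in (2, 4, 6, 8)]
    return a + d
```

Applying `wp8` to the ramp (1, 2, ..., 8) returns (1, 3, 5, 29/4, 0, 0, 0, 1): the interior details vanish because the highpass filter annihilates linear data, and only the entries a₄ and d₄ feel the edge treatment.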

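Readers who want to check the invertibility claim computationally can rebuild the periodized matrix W̃₈ displayed above by wrapping the two filters modulo 8 and computing its determinant exactly. This is our own sketch; `wrapped_matrix` and `det` are hypothetical helper names, and exact rational arithmetic removes any floating-point doubt:

```python
from fractions import Fraction as F

def wrapped_matrix(n=8):
    """Build the periodized (5,3) transform matrix: the lowpass filter
    (-1/8, 1/4, 3/4, 1/4, -1/8) centered at odd indices in the top half,
    the highpass filter (-1/2, 1, -1/2) centered at even indices below,
    with all indices wrapped modulo n."""
    low = {-2: F(-1, 8), -1: F(1, 4), 0: F(3, 4), 1: F(1, 4), 2: F(-1, 8)}
    high = {-1: F(-1, 2), 0: F(1), 1: F(-1, 2)}
    rows = []
    for filt, centers in ((low, range(0, n, 2)), (high, range(1, n, 2))):
        for c in centers:
            row = [F(0)] * n
            for off, coeff in filt.items():
                row[(c + off) % n] += coeff
            rows.append(row)
    return rows

def det(m):
    """Exact determinant via Gaussian elimination over Fractions."""
    m = [row[:] for row in m]
    n, d = len(m), F(1)
    for j in range(n):
        p = next((i for i in range(j, n) if m[i][j] != 0), None)
        if p is None:
            return F(0)
        if p != j:
            m[j], m[p] = m[p], m[j]
            d = -d
        d *= m[j][j]
        for i in range(j + 1, n):
            r = m[i][j] / m[j][j]
            for k in range(j, n):
                m[i][k] -= r * m[j][k]
    return d
```

In our computation the determinant is ±1, consistent with the transform factoring into lifting steps, each of which preserves volume.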
If we form "even" and "odd" vectors

$$
\mathbf{e} = [e_1, e_2, e_3, e_4]^T = [v_2, v_4, v_6, v_8]^T,\qquad
\mathbf{o} = [o_1, o_2, o_3, o_4]^T = [v_1, v_3, v_5, v_7]^T,
$$

then

$$
(35)\qquad d_k = e_k - \frac12\,(o_{k+1} + o_k),
$$

where k = 1, 2, 3, 4 and o₅ := o₄. The elements of the top half of the transformation can then be lifted from d:

$$
(36)\qquad a_k = o_k + \frac14\,(d_k + d_{k-1}),
$$

where k = 1, 2, 3, 4 and d₀ := d₁. This method is computationally more efficient than fast matrix multiplication.

Moreover, we can easily modify this transformation so that it maps integers to integers. Instead of computing the elements of d using (35), we compute d*, whose elements are

$$
(37)\qquad d_k^{*} = e_k - \left\lfloor \frac12\,(o_k + o_{k+1}) \right\rfloor,
$$

and then compute a*, whose elements are

$$
(38)\qquad a_k^{*} = o_k + \left\lfloor \frac14\,(d_k^{*} + d_{k-1}^{*}) + \frac12 \right\rfloor.
$$

Adding 1/2 to the term above eliminates bias. Certainly, the elements of a* and d* are integer-valued and only slightly different from those given by (35), (36), respectively. Moreover, the process is completely invertible. Given d* and a*, we can certainly recover the values oₖ using (38), and then, given o and d*, we can use (37) to recover e.
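Equations (35) through (38) translate nearly line for line into code. The following sketch is ours rather than the article's (the names `lift`, `lift_int`, and `unlift_int` are invented); it implements the rational lifting steps, the integer-to-integer variant, and its exact inverse. The floor in (38) is realized as `(x + 2) // 4`, since ⌊x/4 + 1/2⌋ equals ⌊(x + 2)/4⌋ for integer x:

```python
def lift(v):
    """Forward lifting per (35)-(36) on a length-8 sequence v: split v
    into odd samples o and even samples e, predict the details d from o,
    then update o by d to obtain the averages a."""
    o, e = list(v[0::2]), list(v[1::2])   # o_k = v_{2k-1}, e_k = v_{2k}
    op = o + [o[-1]]                      # boundary rule: o_5 := o_4
    d = [e[k] - (op[k] + op[k + 1]) / 2 for k in range(4)]
    dp = [d[0]] + d                       # boundary rule: d_0 := d_1
    a = [o[k] + (dp[k] + dp[k + 1]) / 4 for k in range(4)]
    return a, d

def lift_int(v):
    """Integer-to-integer lifting per (37)-(38): the same two steps
    with each correction term floored."""
    o, e = list(v[0::2]), list(v[1::2])
    op = o + [o[-1]]
    d = [e[k] - (op[k] + op[k + 1]) // 2 for k in range(4)]
    dp = [d[0]] + d
    a = [o[k] + (dp[k] + dp[k + 1] + 2) // 4 for k in range(4)]
    return a, d

def unlift_int(a, d):
    """Invert (37)-(38): recover o from (38), then e from (37)."""
    dp = [d[0]] + d
    o = [a[k] - (dp[k] + dp[k + 1] + 2) // 4 for k in range(4)]
    op = o + [o[-1]]
    e = [d[k] + (op[k] + op[k + 1]) // 2 for k in range(4)]
    v = [0] * 8
    v[0::2], v[1::2] = o, e
    return v
```

For v = (1, ..., 8), `lift` reproduces the product (34): a = (1, 3, 5, 29/4) and d = (0, 0, 0, 1). The integer variant rounds a to (1, 3, 5, 7) with the same d, and `unlift_int` recovers v exactly.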
Concluding Remarks

In this article, we have provided a very brief introduction to discrete wavelet transformations and have shown how the topic effectively lends itself to an intriguing undergraduate applied mathematics course.

By introducing students to the discrete Haar wavelet transformation, we are able not only to construct a matrix representation of a tool used in several applications but also to provide students with a concrete example of how convolution and Fourier series can lead to a more general mathematical model for the construction of more sophisticated scaling and wavelet filters.

The Daubechies filters are better suited for applications but still suffer from the effects of truncating infinite convolution products. This problem leads to the development of biorthogonal scaling filters. Still more modification is needed to handle boundary effects, and students learn that these modifications are not only tractable but also lead to the lifting scheme. This computational method is not only faster than sparse matrix multiplication but can also be modified to produce the seemingly impossible: an invertible transformation that involves rounding! This final modification is an important tool used by the JPEG2000 image compression standard to perform lossless compression.

We have not discussed in this paper the classical approach to wavelet theory; the mathematics there is quite elegant (the books [2, 14, 9] are aimed at undergraduates). We believe, however, that an introduction to the discrete wavelet transformation, with immediate immersion in applications, is an effective way to introduce students to applications that are ubiquitous in today's digital age, to hone problem-solving skills, to promote and improve scientific computing, and to motivate concepts in upper-level undergraduate courses such as real and complex analysis.

References

[1] E. Aboufadel and S. Schlicker, Discovering Wavelets, Wiley–Interscience, Hoboken, NJ, 1999.
[2] A. Boggess and F. J. Narcowich, A First Course in Wavelets with Fourier Analysis, Prentice Hall, Upper Saddle River, NJ, 2001.
[3] A. Calderbank, I. Daubechies, W. Sweldens, and B.-L. Yeo, Wavelet transforms that map integers to integers, Appl. Comp. Harm. Anal. 5, 3 (1998), 332–369.
[4] I. Daubechies, Orthonormal bases of compactly supported wavelets, Comm. Pure Appl. Math. 41 (1988), 909–996.
[5] I. Daubechies, Ten Lectures on Wavelets, Society for Industrial and Applied Mathematics, Philadelphia, PA, 1992.
[6] I. Daubechies, Orthonormal bases of compactly supported wavelets II. Variations on a theme, SIAM J. Math. Anal. 24, 2 (1993), 499–519.
[7] M. W. Frazier, An Introduction to Wavelets Through Linear Algebra, Undergraduate Texts in Mathematics, Springer, New York, NY, 1999.
[8] A. Jensen and A. la Cour-Harbo, Ripples in Mathematics: The Discrete Wavelet Transform, Springer, New York, NY, 2001.
[9] D. K. Ruch and P. J. Van Fleet, Wavelet Theory: An Elementary Approach with Applications, Wiley–Interscience, Hoboken, NJ, 2009.
[10] W. Sweldens, Wavelets and the lifting scheme: A 5 minute tour, Z. Angew. Math. Mech. 76 (Suppl. 2) (1996), 41–44.
[11] W. Sweldens, The lifting scheme: A construction of second generation wavelets, SIAM J. Math. Anal. 29, 2 (1997), 511–546.
[12] P. J. Van Fleet, Discrete Wavelet Transformations: An Elementary Approach with Applications, Wiley–Interscience, Hoboken, NJ, 2008.
[13] J. S. Walker, A Primer on Wavelets and Their Scientific Applications, 2nd ed., Chapman & Hall/CRC Press, Boca Raton, FL, 2008.
[14] D. F. Walnut, An Introduction to Wavelet Analysis, Birkhäuser, Cambridge, MA, 2002.
[15] M. V. Wickerhauser, Adapted Wavelet Analysis from Theory to Software, A K Peters, Wellesley, MA, 1994.
