Optimally Sparse Image Representations using Shearlets

Glenn R. Easley Demetrio Labate Wang–Q Lim System Planning Corporation, Department of Mathematics, Department of Mathematics, 1000 Wilson Blvd North Carolina State University, Washington University, Arlington, VA 22209 USA Raleigh, NC 27695 USA St. Louis, Missouri 63130 USA

Abstract— It is now widely acknowledged that traditional has a simple mathematical construction, extends naturally to are not very effective in dealing with multidimensional higher dimensions and can be associated to a multiresolution signals containing distributed discontinuities. This paper presents analysis [12]. In addition, as we will show in this paper, this a new discrete multiscale directional representation called the Discrete Shearlet Transform. This approach, which is based on approach is amenable to a fast algorithmic implementation and the shearlet transform previously developed by the authors and is very competitive for image denoising. their colaborators, combines the power of multiscale methods The paper is organized as follows. In Section II we introduce with a unique ability to capture the geometry of multidimensional a two-dimensional continuous transform called the Continuous data and is optimally efficient in representing images containing Shearlet Trasform, which is well suited to locate discontinu- edges. Numerical experiments demonstrate that the Discrete Shearlet Transform is very competitive in denoising applications ities along edges. In Section III we show that the Discrete both in terms of performance and computational efficiency. Shearlet Transform, obtained by discretizing the corresponding continuous transform, provides optimal representation for a I.INTRODUCTION large class of two-dimensional signals. In Section IV, we de- The most useful feature of the transform is the scribe the algorithmic implementation of the Discrete Shearlet ability to deal with signals containing isolated point sin- Transform and in Section V we describe some applications for gularities. This fact, together with the availability of fast image denoising. discrete implementations, explains the spectacular success of II.THE CONTINUOUS SHEARLET TRANSFORM wavelets in a variety of signal processing applications. Indeed, 2 if a one-dimensional signal s(t), smooth away from point An affine family generated by ψ ∈ L (R) is a collection of discontinuities, is approximated using the best M-term wavelet functions of the form: expansion, then the rate of decay of the approximation error, −1/2 −1 {ψa,t(x) = a ψ(a x − t): a > 0, t ∈ R}. as a function of M, is optimal. In fact, it is significantly better 2 than the corresponding Fourier approximation error [1], [2]. ψ is called a continuous wavelet if, for all f ∈ L (R) Z Z However, despite their optimal approximation properties for ∞ ∞ da one-dimensional signals, traditional wavelet methods do not f(x) = hf, ψa,ti ψa,t(x) dt . 0 −∞ a perform as well with multidimensional data. Indeed, wavelets are very efficient in dealing with isolated point singularities The continuous wavelet transform of f is: only. In higher dimensions, other types of singularities are usu- Wf(a, t) = hf, ψa,ti, ally present or even dominant. Images, for example, typically contain sharp transitions such as edges. Since edges interact and the discrete wavelet transform is obtained by dicretizing extensively with the elements of the wavelet basis, “many” Wf(a, t) on an appropriate set [2]. wavelet coefficients are needed to accurately represent these The natural way of extending the theory of the continuous objects. A number of recent results have shown that a much wavelet transform to higher dimensions is by considering the more efficient representation of multidimensional data is ob- affine system −1/2 −1 n tained by better exploiting their geometric regularities. These {ψM,t(x) = | det M| ψ(M x − t): M ∈ G, t ∈ R }, various methods include contourlets [3], [4], complex wavelets 2 n [5] brushlets [6], ridgelets [7], curvelets [8], bandelets [9] and where ψ ∈ L (R ) and G is a subset of GLn(R), the group other schemes of “directional wavelets” [10]. of invertible n × n matrices. Similarly to the one-dimensional 2 n The authors and their collaborators have recently intro- case, ψ is called a continuous wavelet if, for all f ∈ L (R ) Z Z duced the shearlet representation [11], [12], which applies f(x) = hf, ψM,ti ψM,t(x) dt dλ(M), the framework of affine systems to capture very efficiently G Rn the geometry of multidimensional signals. As a result, this where λ(M) is a measure on G, and approach provides optimal approximation properties for a large class of two-dimensional images. The shearlet representation Wf(M, t) = hf, ψM,ti ˆ ˆ ∞ b ˆ 1 1 1 1 is the continuous wavelet transform of f [13]. with ψ1, ψ2 ∈ C (R), supp ψ1 ⊂ [− 2 , − 16 ] ∪ [ 16 , 2 ] and ˆ The traditional way of discretizing the continuous wavelet supp ψ2 ⊂ [−1, 1]. In addition, we assume that f ∈ L2(Rn) M ∈ G, t ∈ Rn transform of replaces X 1 with a discrete set Aj, j ∈ Z, k ∈ Zn. However, starting |ψˆ (2−2jω)|2 = 1 for |ω| ≥ , (4) 1 8 from the continuous wavelet transform other types of discrete j≥0 transforms can be deduced. Indeed, the shearlet transform, and, for each j ≥ 0, which will be described in the next section, is a obtained by discretizing the 2-dimensional continuous wavelet transform 2Xj −1 ˆ j 2 in a ‘non-traditional’ way. |ψ2(2 ω − `)| = 1 for |ω| ≤ 1. (5) For ψ ∈ L2(R2), consider the 2-dimensional affine system `=−2j 1 ˆ(0) − 2 −1 2 From these assumptions it follows that the functions ψ {ψast(x) = | det Mas| ψ(Mas x − t): t ∈ R ,Mas ∈ Γ}, j,`,k (1) (with j ≥ 0, −2j ≤ ` ≤ 2j − 1, k ∈ Z2) form a tiling of 1 ξ2 D0 = {(ξ1, ξ2): |ξ1| ≥ , | | ≤ 1}. This is illustrated in where Γ is the 2–parameter dilation group 8 ξ1 µ √ ¶ Figure 1(a). In very similar way, we construct a second set a a s + (1) ˆ(1) Γ = {Mas = √ :(a, s) ∈ R × R}. of discrete shearlets ψj,`,k(x) such that the set {ψj,`,k : j ≥ 0 a j j 2 0, −2 ≤ ` ≤ 2 − 1, k ∈ Z } is a tiling of D1 = {(ξ1, ξ2): 1 ξ1 2 2 ψ |ξ2| ≥ , | | ≤ 1} (see Figure 1(a)). Finally, let ϕ ∈ L (R ) We choose such that 8 ξ2 2 ξ be such that the set {ϕk(x) = ϕ(x − k): k ∈ Z } is a tight ˆ ˆ ˆ ˆ 2 2 1 1 2 ∨ ψ(ξ) = ψ(ξ1, ξ2) = ψ1(ξ1) ψ2( ), (2) for L ([− , ] ) . We deduce the following result. ξ 16 16 1 Theorem 3.1 ([12]): The collection: ˆ ∞ where ψ1 is a continuous wavelet for which ψ1 ∈ C (R) (d) j j 2 ˆ {ϕk, ψj,`,k : j ≥ 0, −2 ≤ ` ≤ 2 − 1, k ∈ Z , d = 0, 1} with supp ψ1 ⊂ [−2, 1/2] ∪ [1/2, 2] and ψ2 is chosen so that ˆ ∞ ˆ ˆ ψ2 ∈ C (R), supp ψ2 ⊂ [−1, 1], with ψ2 > 0 on (-1,1), and is a tight frame for L2(R2). kψ2k = 1. Under these assumptions, ψ is a continuous wavelet This indicates that the decomposition is invertible and the + 2 [14] and for a ∈ R , s ∈ R, and t ∈ R transformation is numerically well-conditioned.

Sf(a, s, t) = hf, ψasti ξ2 will be called the continuous shearlet transform of f ∈ L2(R). The elements of the affine system, which we shall call con- tinuous shearlets, are oriented waveforms whose orientation is 6∼ 2j ? controlled by the shear parameter s. They become increasingly elongated at fine scales (as a → 0). We refer to [14] for more ξ1  - details. ∼ 22j III.DISCRETE SHEARLET TRANSFORM By sampling the Continuous Shearlet Transform Sf(a, s, t) (a) (b) on an appropriate discrete set we obtain a discrete transform which is able to better deal with distributed discontinuities. Fig. 1. (a) The tiling of the frequency plane induced by the shearlets. The Observe that the matrix Mas can be factored as tiling of D0 is illustrated in solid line, the tiling of D1 is in dashed line. (b) µ √ ¶ µ ¶ µ ¶ The frequency support of a shearlet ψj,`,k satisfies parabolic scaling. a a s 1 s a 0 √ = √ . 0 a 0 1 0 a Details about this construction can be found in [12] and ` j [15]. Let us summarize here the main mathematical properties Thus, it will be “discretized” as Mj` = B A , where µ ¶ µ ¶ of shearlets: 1 1 4 0 B = ,A = • Shearlets are well localized. They are compactly sup- 0 1 0 2 ported in the frequency domain and have fast decay in are the shear matrix and the anisotropic dilation matrix, the spatial domain. ˆ respectively. Hence, the discrete shearlets are the functions • Shearlets satisfy parabolic scaling. Each element ψj,`,k is of the form supported on a pair of trapezoids, each one contained in a j 2j 3j box of size approximately 2 × 2 (see Figure 1(b)). In (0) 2 ` j ψj,`,k(x) = 2 ψ(B A x − k), (3) the spatial domain, each ψj,`,k is essentially supported on a box of size 2−j × 2−2j. Their supports become where µ ¶ increasingly thin as j → ∞. ˆ(0) ˆ(0) ˆ ˆ ξ2 • Shearlets exhibit highly directional sensitivity. The ele- ψ (ξ) = ψ (ξ1, ξ2) = ψ1(ξ1) ψ2 , ˆ ξ1 ments ψj,`,k are oriented along lines with slope given by −j j −` 2 . As a consequence, the corresponding elements and thus, fd [n1, n2] are the discrete samples of a function −j j bj ψj,`,k are oriented along lines with slope ` 2 . The fd (x1, x2), whose Fourier transform is fd (ξ1, ξ2). number of orientations doubles at each finer scale. In order to obtain the directional localization illustrated • Shearlets are spatially localized. For any fixed scale and in Figure 1, we will compute the DFT on the pseudo-polar orientation, the shearlets are obtained by translations on grid, and then apply a one-dimensional band-pass filter to the lattice Z2. the components of the signal with respect to this grid. More • Shearlets are optimally sparse: precisely, let us define the pseudo-polar coordinates (u, v) ∈ Theorem ([11, Thm. 1.1]): Let f be C2 away from R2 as follows: piecewise C2 curves, and let f S be the approximation N ξ2 (u, v) = (ξ1, ) if (ξ1, ξ2) ∈ D0 to f using the N largest coefficients in the shearlet ξ1 ξ1 expansion. Then (u, v) = (ξ2, ) if (ξ1, ξ2) ∈ D1 ξ2 S 2 −2 3 kf − fN k2 ≤ CN (log N) . Let us summarize the procedure illustrated by the scheme Thus the shearlets form a tight frame of well-localized wave- of Figure 2. forms, at various scales and directions, and are optimally sparse in representing images with edges. General Algorithm IV. ALGORITHMIC IMPLEMENTATION 0 It will be convenient to describe the collection of shearlets Define fa be the given N × N image and set N0 = N. described above in a way which is more suitable to derive For j = 1,...,L, do the following: its numerical implementation. We introduce a collection of 1. Apply the Laplacian Pyramid scheme to decompose (d) j−1 j smooth window functions Wj,` (ξ) localized on a pair of fa into a low-pass Nj−1/4 × Nj−1/4 image fa and a j trapezoids, as illustrated in Figure 1(a), satisfying high-pass Nj−1 × Nj−1image fd . bj 1 2j −1 2. Compute fd on a pseudo-polar grid and apply filtering X X j |W (d)(ξ , ξ )|2 = 1. b b˜ j,` 1 2 ψ2(ξ) along the angular direction to obtain f d. d=0 `=−2j ˜j 3. Invert to obtain fd . The discrete shearlet transform of f The algorithm runs in O(N 2 log N) operations. (d) Sf(j, `, k) = hf, ψj,`,ki can be computed by Z (d) −j −` 1 ˆ −2j 2πiξAd Bd k f Sf(j, `, k) = f(ξ) V (2 ξ) Wj,` (ξ) e dξ, d R2 (6) ˆ ˆ f where V (ξ1, ξ2) = ψ1(ξ1) χD0 (ξ1, ξ2) + ψ1(ξ2) χD1 (ξ1, ξ2). A. A Frequency-Domain Implementation N × N {f[n , n ]}N−1,N−1} Consider an image given by 1 2 n1,n2=0 . Its 2D Discrete Fourier Transform (DFT) is f 1 f 2 N−1 a d X n n 1 −2πi( 1 k + 1 k ) (4,4) fˆ[k , k ] = f[n , n ] e N 1 N 2 , 1 2 N 1 2 n1,n2=0 2 fa N N (4,4) where − 2 ≤ k1, k2 < 2 . First, to compute ˆ −2j −2j f(ξ1, ξ2) V (2 ξ1, 2 ξ2) (7) Fig. 2. The figure illustrates the succession of Laplacian pyramid and in the discrete domain, at the resolution level j, we apply directional filtering. the Laplacian pyramid algorithm [16]. This gives rise to the Observe that, in this implementation, we have a large multiscale partition illustrated in Figure 1 by decomposing j−1 flexibility in the choice of the frequency window function W . fa [n1, n2], 0 ≤ n1, n2 < Nj−1, into a low pass filtered j j−1 In addition, there is a fast time-domain implementation of the image fa [n1, n2], a quarter of the size of fa [n1, n2], and a j windowing process. For more details about these issues, we high pass filtered image f [n , n ]. Observe that the matrix d 1 2 refer to [15]. f j[n , n ] has size N × N , where N = 2−2jN, and a 1 2 j j j Figure 3 illustrates the two-level shearlet decomposition of f 0[n , n ] = f[n , n ] has size N × N. In particular, we have a 1 2 1 2 the “Cameraman” image. The first-level decomposition gener- bj ˆ −2j −2j fd (ξ1, ξ2) = f(ξ1, ξ2) V (2 ξ1, 2 ξ2) ates 4 subbands, and the second-level decomposition generates TABLE I

Noisy Curvelet NSCT Shear Peppers PSNR (dB) σ = 10 28.11 32.25 33.75 33.79 σ = 15 24.58 31.36 32.43 32.54 σ = 20 22.09 30.60 31.43 31.57 Elaine PSNR (dB) σ = 10 28.11 30.72 31.69 31.85 σ = 15 24.58 29.98 30.52 30.59 σ = 20 22.09 29.34 29.79 29.81

values of the noise (see Figures 4 and 5). For more competitive comparisons, we tested the scheme against the Curvelet based denoising scheme of [8] and the Nonsubsampled Contourlet Transform (NSCT) denoising scheme of [4] using 16, 16, 8, and 8 directions from finer to coarser scales. The performance of the shearlet approach relative to other Fig. 3. The top image is the original Cameraman image. The image below the top image contains the approximate shearlet coefficients. Images of the transforms is shown in Table I. It shows that the shearlet algo- detail shearlet coefficients are shown below this with an inverted grayscale rithm consistently matches or outperforms all the algorithms for better presentation. mentioned above. It is noticeable that the shearlet transform results exhibit less Gibbs-type residual artifacts than the other denoising methods. 8 subbands, corresponding to the different directional bands Further examples showing the good performance of the illustrated by the scheme in Figure 2. methods outlined in this work can be found in [15].

V. SUMMARY AND EXPERIMENTAL RESULTS REFERENCES The highly directional sensitivity of the shearlet transform [1] D. L. Donoho, M. Vetterli, R. A. DeVore, I. Daubechies, Data compres- sion and harmonic analysis, IEEE Trans. Inf. Th. 44 (1998) 2435–2476. and its optimal approximation properties will lead to improve- [2] S. Mallat, A Wavelet Tour of Signal Processing, Academic Press, San ments in many image processing applications. Diego, CA, 1998. To illustrate one of its potential uses, we have used the [3] M. N. Do, M. Vetterli, The contourlet transform: an efficient directional multiresolution image representation, IEEE Trans. Im. Proc. 14 ( 2005) shearlet transform to remove noise from images. Specifically, 2091–2106. suppose that for a given image f, we have u = f+², where ² is [4] A. L. Cunha, J. Zhou, M. N. Do, The nonsubsampled contourlet trans- Gaussian white noise with zero mean and standard deviation σ. form: Theory, design, and applications, in press. [5] N. Kingsbury, Complex wavelets for shift invariant analysis and filtering We recover the image f from the noisy data u by computing an of signals, Appl. Computat. Harmon. Anal. 10 (2001) 234-253. approximation f˜ of f obtained by applying a soft thresholding [6] R. R. Coifman, F. G. Meyer, Brushlets: a tool for directional image scheme in the subbands of the shearlet decomposition. We analysis and image compression, Appl. Comp. Harmon. Anal. 5 (1997) choose the threshold parameters τ = σ2 /σn as in [4], 147–187. i,j ²i,j i,j [7] E. J. Candes,` D. L. Donoho, Ridgelets: a key to higher–dimensional n where σi,j denotes the variance of the nth coefficient at the intermittency?, Phil. Trans. Royal Soc. London A 357 (1999), 2495–2509. ith shearing direction subband in the jth scale, and σ2 is the [8] J. L. Starck, E. J. Candes,` D. L. Donoho, The curvelet transform for ²i,j image denoising, IEEE Trans. Im. Proc., 11 (2002) 670-684. noise variance at scale j and shearing direction i. [9] E. Le Pennec, and S. Mallat, Sparse geometric image representations with The particular form of the shearlet transform we tested was bandelets, IEEE Trans. Image Process. 14 (2005) 423–438. the nonsubsampled Laplacian pyramid transform with several [10] J. P. Antoine, R. Murenzi, P. Vandergheynst, Directional wavelets revisited: Cauchy wavelets and symmetry detection in patterns, Appl. different combinations of the shearing filters. In particular, we Computat. Harmon. Anal. 6 (1999) 314-345. implemented the shearing on 4 scales of the Laplacian pyramid [11] K. Guo, D. Labate, Optimally Sparse Multidimensional Representation transform decomposition. The shearing filters of sizes 16×16, using Shearlets, preprint 2006. [12] K. Guo, W-Q. Lim, D. Labate, G. Weiss and E. Wilson, Wavelets with 16 × 16, 32 × 32, and 32 × 32 from finer to coarser were used composite dilations and their MRA properties, Appl. Computat. Harmon. with the number of shearing directions chosen to be 16, 16, Anal. 20 (2006) 231–249. 8, and 8. The shearing was done by using a Meyer wavelet [13] G. Weiss, and E. Wilson, The mathematical theory of wavelets, Proceed- ing of the NATO–ASI Meeting. Harmonic Analysis 2000. A Celebration. window. In addition to the shearlet reconstruction, a slight Kluwer Publisher, 2001. post-filtering has been applied to the reconstructed estimate [14] G. Kutyniok, and D. Labate, Resolution of the wavefront set using by using a stationary wavelet transform with a very small shearlets, preprint (2006). [15] G. R. Easley, D. Labate, W.Q Lim, Sparse Directional Image Represen- threshold parameter. tations using the Discrete Shearlet Transform, preprint (2006). We tested the denoising schemes using the pieces of the [16] P. J. Burt, E. H. Adelson, The Laplacian pyramid as a compact image “Peppers” and “Elaine” images for various standard deviation code, IEEE Trans. Commun. 31(4)(1983) 532-540. Fig. 4. Image denoising results of a piece of the “Peppers” with a standard diviation of 20. From top left, clockwise: Original image, noisy image (PSNR= 22.09 dB), Shearlet transform (PSNR=31.57 dB), and NSCT (PSNR= 31.43 dB)

Fig. 5. Image denoising results of a piece of the “Elaine” with a standard diviation of 20. From top left, clockwise: Original image, noisy image (PSNR= 22.09 dB), Shearlet transform (PSNR=29.81 dB), and NSCT (PSNR= 29.79 dB)