IMAGE RESTORATION USING GAUSSIAN SCALE MIXTURES IN THE WAVELET DOMAIN

Javier Portilla ∗ Eero Simoncelli †

Visual Information Processing Group
Dept. of Comp. Science and Artif. Intell.
Universidad de Granada
[email protected]

Center for Neural Science, and
Courant Inst. of Mathematical Sciences
New York University, NY 10003
[email protected]

∗ JP is supported by a "Ramón y Cajal" grant (Spanish Ministry of Science and Technology).
† EPS is supported by NSF CAREER grant MIP-9796040.

ABSTRACT

We describe a statistical model for images decomposed in an overcomplete wavelet pyramid. Each neighborhood of pyramid coefficients is modeled as the product of a Gaussian vector of known covariance and an independent hidden positive scalar random variable. We propose an efficient Bayesian estimator for the pyramid coefficients of an image degraded by linear distortion (e.g., blur) and additive Gaussian noise. We demonstrate the quality of our results in simulations over a wide range of blur and noise levels.

1. INTRODUCTION

Natural images have distinct features that allow the human visual system to detect the presence of distortion, and to extract the remaining information from the observation. Image restoration aims to construct an approximation sharing the relevant features still present in the corrupted image, but with the artifacts suppressed. In order to distinguish the artifacts from the signal, a good image model is essential. In this work we use a Bayesian framework for image restoration, based on a prior statistical model for natural images. We assume Gaussian noise of known covariance has been added after the image has been convolved with a known blurring kernel. Such a model of the distortion provides a reasonable approximation to real-world corruption sources, such as de-focus or photon and electron noise in cameras. In previous work, we have described a model for wavelet coefficients using scale mixtures of Gaussians [1, 2]. Related models have been developed by several groups [3, 4, 5, 6]. We have applied this model to estimate images in the presence of independent additive Gaussian noise of known covariance, following a suboptimal empirical Bayes strategy [2, 7]. Recently, we developed a direct Bayesian least squares denoising procedure [8]. In this paper, we extend this approach to include the full restoration problem.

2. IMAGE MODELING

Modeling the statistics of natural images is a difficult task, because of the high dimensionality of the signal and the higher-order statistical dependencies that are prevalent. Simplifying assumptions, such as homogeneity and locality, are usually applied. In the past decade it has become standard to begin by decomposing the image with a set of multi-scale band-pass oriented filters. This kind of representation is adapted to the approximate scale invariance of natural images, and it has been shown to decouple some high-order statistics. Another technique for simplifying the description of high-order statistics is to use a low-order local model (e.g., Gaussian) whose parameters (e.g., variance) are governed by a local hidden random variable. In this work we have combined these ideas into a Bayesian framework.

2.1. Image representation

As a preprocessing stage for image modeling, we use a fixed multi-scale multi-orientation linear decomposition. It is now generally agreed that the use of overcomplete representations is advantageous for restoration, in order to avoid aliasing artifacts that plague critically-sampled representations such as orthogonal wavelets [9]. A widely followed approach is to use orthogonal or biorthogonal basis functions, but without decimating the subbands [e.g., 10]. However, after the critical sampling constraint has been removed, significant improvement comes from using representations with higher redundancy and orientation selectivity [9, 11, 12]. For the current paper, we have used a particular variant of an overcomplete tight frame representation known as a "steerable pyramid" [9] (see [8] for details). In this paper we have used 5 scales and 4 orientations for the decomposition, for a total of 25 subbands. This number includes 4 × 5 band-pass subbands plus 4 oriented high-pass subbands and a non-oriented low-pass residual.
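As a concrete illustration of this decomposition step, here is a minimal sketch using the pyrtools Python package (a publicly available implementation of the steerable pyramid). The class and attribute names are quoted from memory and may differ between versions; note also that this standard pyramid has a single non-oriented high-pass residual, whereas the variant used in the paper has 4 oriented high-pass subbands (see [8]).

```python
# Minimal sketch, assuming the pyrtools package; API names may differ across versions.
import numpy as np
import pyrtools as pt

img = np.random.randn(256, 256)          # stand-in for a grayscale image

# 5 scales; order=3 gives 4 orientation bands per scale.
pyr = pt.pyramids.SteerablePyramidFreq(img, height=5, order=3)

# Coefficients are stored per subband: (scale, orientation) keys,
# plus 'residual_highpass' and 'residual_lowpass'.
for key, band in pyr.pyr_coeffs.items():
    print(key, band.shape)

# The transform is a tight frame, so the image is recovered by inverting the pyramid.
recon = pyr.recon_pyr()
print("max reconstruction error:", np.abs(img - recon).max())
```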
2.2. Gaussian scale mixtures in the wavelet domain

For each coefficient in the pyramid representation, we consider a neighborhood of coefficients, referring to the center coefficient as the reference coefficient of the neighborhood. The neighborhood may include coefficients from other subbands (i.e., corresponding to basis functions at nearby scales and orientations), as well as from the same subband. For this work we have used a 3 × 3 neighborhood around the reference coefficient, plus the parent coefficient (same orientation and position, next coarser scale), whenever it exists. Due to sampling differences at different scales, the coarser subband must be resampled at double rate in both dimensions to obtain the parent of each coefficient of the subband.

We use a Gaussian scale mixture (GSM) to model the coefficients within each local neighborhood whose reference coefficient belongs to a given subband. A random vector x is a Gaussian scale mixture [13] if it can be expressed as the product of a zero-mean Gaussian vector u and an independent positive scalar random variable √z:

x = √z u.    (1)
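Equation (1) is easy to simulate directly. The sketch below draws samples from a GSM vector and checks that its marginals are heavy-tailed; the log-normal choice for z is purely illustrative and is not part of the model.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10                          # neighborhood size (e.g., 3x3 plus parent)
Cuu = np.eye(N)                 # covariance of u (identity, for illustration only)
n_samples = 100_000

# Hidden multiplier z; a log-normal is used here purely for illustration.
z = np.exp(rng.normal(0.0, 1.0, size=n_samples))
z /= z.mean()                   # enforce E{z} = 1, so that Cxx = Cuu

u = rng.multivariate_normal(np.zeros(N), Cuu, size=n_samples)
x = np.sqrt(z)[:, None] * u     # eq. (1): x = sqrt(z) u

# GSM marginals are leptokurtotic: sample kurtosis well above the Gaussian value of 3.
xc = x[:, 0]
print("sample kurtosis:", np.mean(xc**4) / np.mean(xc**2)**2)
```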

The variable z is the multiplier. Vector x is an infinite mixture of Gaussian vectors, whose density is determined by the covariance matrix C_uu of u and the mixing density p_z(z):

p_x(x) = ∫ p(x|z) p_z(z) dz
       = ∫ exp(-x^T (z C_uu)^{-1} x / 2) / ((2π)^{N/2} |z C_uu|^{1/2}) p_z(z) dz,    (2)

where N is the dimensionality of x and u (the size of the neighborhood). Without loss of generality one can assume E{z} = 1, which implies C_uu = C_xx.

GSM densities are symmetric and zero-mean, and they have leptokurtotic marginal densities (i.e., heavier tails than a Gaussian). Another key property of the GSM model is that the density of x is Gaussian when conditioned on z. Also, the normalized vector x/√z is Gaussian. GSM vectors also present interesting joint statistics: the variance of a vector element conditioned on a neighbor scales roughly linearly with the square of the neighbor's value. These marginal and joint statistics of GSM distributions are qualitatively similar to those of neighboring coefficients of natural images in multi-scale, multi-orientation representations [14, 1].

2.3. Prior density for the multiplier

Our model requires a specification of the density of the multiplier, z. An alternative to estimating adaptive priors for the hidden multiplier (as in, e.g., [5, 7]) is to use a noninformative prior [15], which does not require fitting any parameters to the observation. We have applied Jeffrey's prior, which for a Gaussian variable with scale parameter √z yields [16]:

p_z(z) ∝ 1/z.    (3)

To avoid computational problems at the estimation stage, we have set the prior to zero in the interval [0, z_min), where z_min is a small positive number.¹ This solution is simple and efficient to implement. Surprisingly, it also performs better than maximum-likelihood-fitted non-parametric priors for the neighborhoods of each subband.²

¹ We have chosen z_min = 10^-9 in our implementation. Results remain almost unaffected for z_min ∈ [10^-17, 10^-5].
² Note that, when estimating parameters under an overcomplete representation, the set of least squares optimal solutions for the subbands is not least squares optimal for the whole image.

3. RESTORATION

We model the observed image as:

y(n, m) = x(n, m) ∗ h(n, m) + w(n, m),    (4)

where "∗" denotes convolution, x is the original image, h is a known linear kernel, and w is Gaussian noise of known power spectral density (PSD) P_w(u, v).

3.1. GSM with blur and noise

We follow the usual procedure for image restoration in the wavelet domain: (1) decompose the image y into pyramid subbands; (2) restore each subband, except for the low-pass residual; and (3) invert the pyramid transform. Translating (4) into our local GSM model, a vector y corresponding to a neighborhood of N observed coefficients around a reference coefficient, for each subband of the pyramid representation, can be expressed as:

y = Hx + w = √z Hu + w,    (5)

where u and w are zero-mean Gaussian vectors, and H is an N × M matrix expressing the convolution of N coefficients with a kernel h of size N_h (N_h >> N, typically).³ The density of the observed neighborhood vector conditioned on z is zero-mean Gaussian, with covariance C_{y|z} = z C_u'u' + C_ww:

p(y|z) = exp(-y^T (z C_u'u' + C_ww)^{-1} y / 2) / √((2π)^N |z C_u'u' + C_ww|),    (6)

where C_u'u' = H C_uu H^T is the N × N covariance matrix of u' = Hu. As the noise PSD P_w(u, v) is assumed known, C_ww is easily estimated by computing, for each subband, the sample covariance of the pyramid coefficients of the inverse Fourier transform of √P_w(u, v) (a delta for the case of white noise). Given C_ww, and taking the expectation of C_{y|z} over z, the filtered signal covariance C_u'u' can be computed from the observation covariance matrix C_yy: C_yy = E{z} C_u'u' + C_ww. Setting E{z} to 1 results in C_u'u' = C_x'x' = C_yy - C_ww. To ensure positive-definiteness, small negative eigenvalues are set to zero.

³ The kernel h must be cropped in the frequency domain according to the spatial resolution of the pyramid subband.
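The covariance bookkeeping described above is straightforward to prototype. Below is a minimal sketch (our own helper names, not the paper's code) of stacking neighborhood vectors for a subband and forming C_x'x' = C_yy - C_ww with the eigenvalue clipping just mentioned.

```python
import numpy as np

def neighborhoods(subband, parent=None):
    """Stack 3x3 neighborhoods (plus, optionally, the up-sampled parent coefficient)
    into row vectors.  Boundary handling and parent resampling are omitted here."""
    H, W = subband.shape
    rows = []
    for i in range(1, H - 1):
        for j in range(1, W - 1):
            v = subband[i - 1:i + 2, j - 1:j + 2].ravel()
            if parent is not None:
                v = np.append(v, parent[i, j])
            rows.append(v)
    return np.asarray(rows)

def signal_covariance(Cyy, Cww):
    """C_x'x' = Cyy - Cww, with small negative eigenvalues clipped to zero so the
    estimate stays positive semi-definite (Sec. 3.1)."""
    C = 0.5 * ((Cyy - Cww) + (Cyy - Cww).T)      # symmetrize against round-off
    w, V = np.linalg.eigh(C)
    return (V * np.maximum(w, 0.0)) @ V.T

# Usage sketch:
#   Y = neighborhoods(noisy_subband)       # coefficients of the observed image
#   W = neighborhoods(noise_subband)       # coefficients of the inverse FT of sqrt(Pw)
#   Cyy, Cww = np.cov(Y, rowvar=False), np.cov(W, rowvar=False)
#   Cxpxp = signal_covariance(Cyy, Cww)
```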
3.2. Bayes least squares estimator

For each neighborhood, we estimate x_c, the reference coefficient, from y, the set of noisy coefficients. The Bayes least squares (BLS) estimate is:

E{x_c|y} = ∫ x_c p(x_c|y) dx_c
         = ∫ ∫_0^∞ x_c p(x_c|y, z) p(z|y) dz dx_c
         = ∫_0^∞ p(z|y) E{x_c|y, z} dz.    (7)

Thus, the solution is an average of the least squares estimate of x_c when conditioned on z (the local Wiener solution), weighted by the posterior density of the multiplier, p(z|y). This integral can be computed numerically for each neighborhood of coefficients with a few (10 to 15) uniform samples in log z. We now describe each of these components.
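Schematically, the numerical version of (7) treats a small grid of z values, uniformly spaced in log z, as a discrete set of hypotheses: weight the local Wiener estimates by the normalized products p(y|z) p(z) and sum. The helper names below are placeholders; the concrete expressions for E{x_c|y, z} and p(y|z) are given in (11) and (15), and a fuller sketch follows Sec. 3.4.

```python
import numpy as np

def bls_estimate(y, z_grid, local_wiener, likelihood, prior):
    """Numerical sketch of eq. (7).

    local_wiener(y, z) -> E{xc | y, z}   (eq. (11))
    likelihood(y, z)   -> p(y | z)       (eq. (15))
    prior(z)           -> p(z), e.g. Jeffrey's prior 1/z   (eq. (3))
    Placeholder names, not the paper's code."""
    w = np.array([likelihood(y, z) * prior(z) for z in z_grid])
    w /= w.sum()                        # numerical normalization of p(z|y) on the grid
    est = np.array([local_wiener(y, z) for z in z_grid])
    return float(np.dot(w, est))

# e.g. 13 samples, uniformly spaced in log z (grid endpoints are illustrative)
z_grid = np.logspace(-9.0, 3.0, 13)
```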

3.3. Local Wiener estimate

The local linear estimate for the full neighborhood is the Wiener solution:

E{x|y, z} = z C_xx' (z C_x'x' + C_ww)^{-1} y,    (8)

where C_xx' = C_xx H^T is the M × N cross-covariance matrix of x and x' = Hx, i.e., of the coefficients from the original image and those from its blurred version. We explain below a method for estimating this matrix.

We can simplify the dependence of this expression on z by diagonalizing the matrix z C_x'x' + C_ww. Specifically, let S be the symmetric square root of the positive definite matrix C_ww (i.e., C_ww = S S^T), and {Q, Λ} the eigenvector/eigenvalue expansion of the matrix S^{-1} C_x'x' S^{-T}. Then:

z C_x'x' + C_ww = z C_x'x' + S S^T
                = S (z S^{-1} C_x'x' S^{-T} + I) S^T
                = S Q (z Λ + I) Q^T S^T.    (9)

This diagonalization does not depend on z, and thus only needs to be computed once for each subband. We can now simplify (8):

E{x|y, z} = z C_xx' S^{-T} Q (z Λ + I)^{-1} Q^T S^{-1} y
          = z M (z Λ + I)^{-1} v,    (10)

where M = C_xx' S^{-T} Q and v = Q^T S^{-1} y. Finally, we restrict the estimate to the reference coefficient:

E{x_c|y, z} = Σ_{n=1}^{N} z m_cn v_n / (z λ_n + 1),    (11)

where m_ij represents an element (i-th row, j-th column) of the matrix M, λ_n are the diagonal elements of Λ, v_n are the elements of v, and c is the index of the reference coefficient within the neighborhood vector.

3.3.1. Estimation of the cross-covariance matrix

We use the relation between the power spectrum of the observed image and that of the original image, P_y(u, v) = P_x(u, v) |H(u, v)|² + P_w(u, v), to first obtain an approximation of the power spectrum of the filtered image:

P_x'(u, v) ≃ [ |Y(u, v)|² ∗ G(u, v) - P_w(u, v) ]_+,    (12)

where G(u, v) is a Gaussian convolving window for removing sampling fluctuations. Then, we use:

P_x(u, v) ≃ P_x'(u, v) / max(|H(u, v)|², G_max^{-1}),    (13)

where G_max is the maximal allowed gain for the inverse filter. For the results shown in this paper (white noise case) we chose G_max = min(2400/σ_0², 180), with σ_0 the standard deviation of w.⁴ Now C_xx' can be estimated as the sample cross-covariance of the coefficients of the inverse Fourier transforms of √P_x(u, v) and √P_x'(u, v). The resulting estimate Ĉ_xx' can be significantly improved by modeling its bias using an N × N linear transform: E{Ĉ_xx'} ≃ C_xx' B. In order to estimate B, we assume that we would incur the same proportional bias when estimating C_x'x' from P_x'(u, v): E{Ĉ_x'x'} ≃ C_x'x' B = (C_yy - C_ww) B. Solving this for B^{-1} yields the following improved estimator:

C̃_xx' = Ĉ_xx' ( (C_yy - C_ww)^{-1} Ĉ_x'x' )^{-1}.    (14)

Note that particularizing (14) for the reference (c-th) row of C_xx' (N values) suffices for solving (11).

⁴ Other, more accurate estimates could be used instead.

3.4. Posterior distribution of the multiplier

We use Bayes' rule to compute p(z|y) from p(y|z) and p(z), normalizing numerically the integral of their product. Using the relationship in (9) and the definition of v, the conditional density p(y|z) defined in (6) simplifies to:

p(y|z) = exp( -(1/2) Σ_{n=1}^{N} v_n² / (z λ_n + 1) ) / √( (2π)^N |C_ww| Π_{n=1}^{N} (z λ_n + 1) ).    (15)
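Putting (9), (10), (11) and (15) together, the per-neighborhood computation can be sketched as follows (our own variable names; the numerical safeguards of a production implementation are omitted).

```python
import numpy as np

def restore_neighborhood(y, Cww, Cxpxp, cxx_row_c, z_grid):
    """Sketch of the per-neighborhood BLS-GSM step, following eqs. (9)-(11) and (15).

    y         : observed neighborhood vector (length N)
    Cww       : noise covariance C_ww
    Cxpxp     : filtered-signal covariance C_x'x' = C_yy - C_ww
    cxx_row_c : c-th row of the cross-covariance C_xx' (reference coefficient)
    z_grid    : samples of z, uniformly spaced in log z
    Variable names are ours, not the paper's code."""
    # Symmetric square root S of Cww (Cww = S S^T); S is symmetric, so S^{-T} = S^{-1}.
    d, V = np.linalg.eigh(Cww)
    S_inv = (V / np.sqrt(d)) @ V.T

    # Diagonalize S^{-1} C_x'x' S^{-T} = Q Lambda Q^T (eq. (9)); reusable across z.
    lam, Q = np.linalg.eigh(S_inv @ Cxpxp @ S_inv)

    m = cxx_row_c @ S_inv @ Q            # c-th row of M = C_xx' S^{-T} Q  (eq. (10))
    v = Q.T @ (S_inv @ y)                # v = Q^T S^{-1} y

    weights = np.empty(len(z_grid))
    wiener = np.empty(len(z_grid))
    for k, z in enumerate(z_grid):
        denom = z * lam + 1.0
        # p(y|z), dropping factors that cancel after normalization  (eq. (15))
        weights[k] = np.exp(-0.5 * np.sum(v**2 / denom)) / np.sqrt(np.prod(denom))
        # local Wiener estimate of the reference coefficient        (eq. (11))
        wiener[k] = np.sum(z * m * v / denom)

    # With samples uniform in log z, the change of variable brings a factor z that
    # cancels Jeffrey's prior 1/z, so the posterior weights reduce to the
    # normalized likelihoods (eq. (7)).
    weights /= weights.sum()
    return float(np.dot(weights, wiener))
```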

4. SIMULATIONS

We have applied our restoration algorithm to the Einstein image. For h we used a unity-volume circular low-pass Gaussian filter with standard deviation σ_b. We added zero-mean white Gaussian noise of variance σ_0². We compare to the Wiener linear restoration method (using (12) and (13)). Fig. 1 shows the results with two levels of noise and blur.

[Fig. 1. Columns, left to right: Observed, BLS-GSM, Wiener. Top: high noise (σ = 20) and low blur (σ_b = 0.2); PSNR values (dB): 22.1, 29.9, 28.3. Middle: high noise (σ = 20) and high blur (σ_b = 2.0): (21.5, 27.9, 27.0). Bottom: low noise (σ = 2) and high blur (σ_b = 2.0): (29.7, 32.5, 31.8).]
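A sketch of the degradation and the evaluation metric used here, assuming scipy's gaussian_filter as the unity-volume circular Gaussian blur; the boundary handling ('wrap') and the peak value used for PSNR are our assumptions, not stated in the paper.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def degrade(img, sigma_b, sigma_0, seed=0):
    """Degradation of eq. (4) as used in Sec. 4: isotropic Gaussian blur of
    std sigma_b (kernel sums to one) plus white Gaussian noise of std sigma_0."""
    rng = np.random.default_rng(seed)
    blurred = gaussian_filter(img.astype(float), sigma=sigma_b, mode='wrap')
    return blurred + rng.normal(0.0, sigma_0, img.shape)

def psnr(reference, estimate, peak=255.0):
    """PSNR in dB, as reported in Fig. 1 (peak value assumed to be 255)."""
    mse = np.mean((reference - estimate) ** 2)
    return 10.0 * np.log10(peak**2 / mse)
```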

5. REFERENCES

[1] M J Wainwright and E P Simoncelli, "Scale mixtures of Gaussians and the statistics of natural images," in Adv. Neural Information Processing Systems, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds., Cambridge, MA, May 2000, vol. 12, pp. 855–861, MIT Press.

[2] M Wainwright, E Simoncelli, and A Willsky, "Random cascades on wavelet trees and their use in modeling and analyzing natural imagery," Applied and Computational Harmonic Analysis, 2000.

[3] S M LoPresto, K Ramchandran, and M T Orchard, "Wavelet image coding based on a new generalized Gaussian mixture model," in Data Compression Conf., Snowbird, Utah, 1997.

[4] M S Crouse, R D Nowak, and R G Baraniuk, "Wavelet-based statistical signal processing using hidden Markov models," IEEE Trans. Signal Proc., vol. 46, pp. 886–902, April 1998.

[5] M K Mihçak, I Kozintsev, K Ramchandran, and P Moulin, "Low-complexity image denoising based on statistical modeling of wavelet coefficients," IEEE Signal Processing Letters, vol. 6, no. 12, pp. 300–303, December 1999.

[6] C Spence and L Parra, "Hierarchical image probability (HIP) model," in Adv. Neural Information Processing Systems, S. A. Solla, T. K. Leen, and K.-R. Müller, Eds., Cambridge, MA, May 2000, vol. 12, MIT Press.

[7] J Portilla, V Strela, M Wainwright, and E Simoncelli, "Adaptive Wiener denoising using a Gaussian scale mixture model in the wavelet domain," in Proc. 8th IEEE Int'l Conf. on Image Proc., Thessaloniki, Greece, 2001, pp. 37–40.

[8] J Portilla, V Strela, M Wainwright, and E P Simoncelli, "Image denoising using scale mixtures of Gaussians in the wavelet domain," IEEE Trans. Image Proc., in press, 2003.

[9] E P Simoncelli, W T Freeman, E H Adelson, and D J Heeger, "Shiftable multi-scale transforms," IEEE Trans. Information Theory, vol. 38, no. 2, pp. 587–607, March 1992.

[10] R R Coifman and D L Donoho, "Translation-invariant de-noising," in Wavelets and Statistics, A Antoniadis and G Oppenheim, Eds., Springer-Verlag Lecture Notes, San Diego, 1995.

[11] E P Simoncelli, "Bayesian denoising of visual images in the wavelet domain," in Bayesian Inference in Wavelet Based Models, P Müller and B Vidakovic, Eds., chapter 18, pp. 291–308, Springer-Verlag, New York, Spring 1999.

[12] J Starck, E J Candès, and D L Donoho, "The curvelet transform for image denoising," IEEE Trans. Image Proc., vol. 11, no. 6, pp. 670–684, June 2002.

[13] D Andrews and C Mallows, "Scale mixtures of normal distributions," J. Royal Stat. Soc., vol. 36, pp. 99–, 1974.

[14] E P Simoncelli, "Statistical models for images: Compression, restoration and synthesis," in 31st Asilomar Conf. on Signals, Systems and Computers, Pacific Grove, CA, November 1997, pp. 673–678, IEEE Computer Society.

[15] M Figueiredo and R Nowak, "Bayesian wavelet-based image estimation using non-informative priors," in SPIE Conf. on Mathematical Modeling, Bayesian Estimation, and Inverse Problems, Denver, CO, July 1999.

[16] J O Berger, "Prior information and subjective probability," in Statistical Decision Theory and Bayesian Analysis, Springer Series in Statistics, Springer-Verlag, 1990.