A Content-Aware Image Prior

Taeg Sang Cho†, Neel Joshi‡, C. Lawrence Zitnick‡, Sing Bing Kang‡, Richard Szeliski‡, William T. Freeman† † Massachusetts Institute of Technology, ‡ Microsoft Research

Abstract

In image restoration tasks, a heavy-tailed gradient distribution of natural images has been extensively exploited as an image prior. Most image restoration algorithms impose a sparse gradient prior on the whole image, reconstructing an image with piecewise smooth characteristics. While the sparse gradient prior removes ringing and noise artifacts, it also tends to remove mid-frequency textures, degrading the visual quality. We can attribute such degradations to imposing an incorrect image prior. The gradient profile in fractal-like textures, such as trees, is close to a Gaussian distribution, and small gradients from such regions are severely penalized by the sparse gradient prior.

To address this issue, we introduce an image restoration algorithm that adapts the image prior to the underlying texture. We adapt the prior to both low-level local structures as well as mid-level textural characteristics. Improvements in visual quality are demonstrated on deconvolution and denoising tasks.

[Figure 1: log-scale histograms of gradient magnitude for orthogonal and aligned gradients, one colored profile per masked image region; insets show local patches.]
Figure 1. The colored gradient profiles correspond to the regions with the same mask color. The steered gradient profile is spatially variant in most natural images. Therefore, the image prior should adapt to the image content. Insets illustrate how steered gradients adapt to local structures.

1. Introduction

Image enhancement algorithms resort to image priors to hallucinate information lost during the image capture. In recent years, image priors based on image gradient statistics have received much attention. Natural images often consist of smooth regions with abrupt edges, leading to a heavy-tailed gradient profile. We can parameterize heavy-tailed gradient statistics with a generalized Gaussian distribution or a mixture of Gaussians. Prior works hand-select parameters for the model distribution and fix them for the entire image, imposing the same image prior everywhere [10, 17, 23]. Unfortunately, different textures have different gradient statistics even within a single image; therefore, imposing a single image prior for the entire image is inappropriate (Figure 1).

We introduce an algorithm that adapts the image prior to both low-level local structures as well as mid-level texture cues, thereby imposing the correct prior for each texture.¹ Adapting the image prior to the image content improves the quality of restored images.

¹ Strictly speaking, an estimate of image statistics made after examining the image is no longer a "prior" probability. But the fitted gradient distributions play the same role as an image prior in image reconstruction equations, and we keep that terminology.

2. Related work

Image prior research revolves around finding a good image transform or basis functions such that the transformed image exhibits characteristics distinct from unnatural images. Transforms derived from signal processing have been exploited in the past, including the Fourier transform [12], the wavelet transform [28], the curvelet transform [6], and the contourlet transform [8].

Basis functions learned from natural images have also been introduced. Most techniques learn filters that lie in the null-space of the natural image manifold [20, 30, 31, 32]. Aharon et al. [1] learn a vocabulary from which a natural image is composed. However, none of these techniques adapt the basis functions to the image under analysis.

Edge-preserving smoothing operators do adapt to local structures while reconstructing images. The anisotropic diffusion operator [5] detects edges, and smoothes along edges but not across them. A similar idea appeared in a probabilistic framework called a Gaussian conditional random field [26]. A bilateral filter [27] is also closely related to anisotropic operators. Elad [9] and Barash [2] discuss relationships between edge-preserving operators.

Some image models adapt to edge orientations as well as magnitudes. Hammond et al. [13] present a Gaussian scale mixture model that captures the statistics of gradients adaptively steered in the dominant orientation in image patches. Roth et al. [21] present a random field model that adapts to oriented structures. Bennett et al. [3] and Joshi et al. [14] exploit a prior on color: an observation that there are two dominant colors in a small window.

Adapting the image prior to textural characteristics was investigated for gray-scale images consisting of a single texture [23]. Bishop et al. [4] present a variational image restoration framework that breaks an image into square blocks and adapts the image prior to each block independently (i.e. the image prior is fixed within the block). However, Bishop et al. [4] do not address issues with estimating the image prior at texture boundaries.

[Figure 2: Parzen-window density of fitted (γ, ln λ) pairs, one panel each for orthogonal and aligned gradients; insets show representative patches.]
Figure 2. The distribution of γ, ln(λ) for ∇_o x, ∇_a x in natural images. While the density is highest around γ = 0.5, the density tails off slowly, with significant density around γ = 2. We show, as insets, some patches from Figure 1 that are representative of different γ, ln(λ).

3. Image characteristics

We analyze the statistics of gradients adaptively steered in the dominant local orientation of an image x. Roth et al. [21] observe that the gradient profile of orthogonal gradients ∇_o x is typically of higher variance compared to that of aligned gradients ∇_a x, and propose imposing different priors on ∇_o x and ∇_a x. We show that different textures within the same image also have distinct gradient profiles.

We parameterize the gradient profile using a generalized Gaussian distribution:

    p(∇x) = ( γ λ^(1/γ) / (2 Γ(1/γ)) ) exp(−λ ‖∇x‖^γ)    (1)

where Γ is the Gamma function and γ, λ are the shape parameters: γ determines the peakiness and λ determines the width of the distribution. We assume that ∇_o x and ∇_a x are independent: p(∇_o x, ∇_a x) = p(∇_o x) p(∇_a x).
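The steering operator itself is not spelled out in closed form here; as a concrete reference, below is a minimal Python sketch (ours, not the authors'; NumPy/SciPy assumed, and the structure-tensor recipe is one common way to estimate the dominant orientation, following the spirit of the steering in Roth et al. [21]).

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def steered_gradients(img, sigma=1.5):
    """Per-pixel orthogonal/aligned gradients, steered to the dominant
    local orientation estimated from a smoothed structure tensor."""
    gy, gx = np.gradient(img)
    # Structure tensor entries, averaged over a local neighborhood
    jxx = gaussian_filter(gx * gx, sigma)
    jyy = gaussian_filter(gy * gy, sigma)
    jxy = gaussian_filter(gx * gy, sigma)
    # Orientation of the dominant eigenvector of [[jxx, jxy], [jxy, jyy]]
    theta = 0.5 * np.arctan2(2.0 * jxy, jxx - jyy)
    grad_o = np.cos(theta) * gx + np.sin(theta) * gy    # across edges
    grad_a = -np.sin(theta) * gx + np.cos(theta) * gy   # along edges
    return grad_o, grad_a
```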
3.1. Spatially variant gradient statistics

The local gradient statistics can be different from the global gradient statistics. Figure 1 shows the gradient statistics of the colored regions. Two phenomena are responsible for the spatially variant gradient statistics: the material and the viewing distance. For example, a building is noticeably more piecewise smooth than a gravel path due to material properties, whereas the same gravel path can exhibit different gradient statistics depending on the viewing distance. To account for the spatially variant gradient statistics, we propose adjusting the image prior locally.

3.2. The distribution of γ, λ in natural images

Different textures give rise to different gradient profiles and thus different γ, λ. We study the distribution of the shape parameters γ, λ in natural images. We sample ∼110,000 patches of size 41 × 41 from 500 high quality natural images, and fit the gradient profile of each patch to a generalized Gaussian distribution to associate the patch with γ, λ. We fit the distribution by searching for the γ, λ that minimize the Kullback-Leibler (KL) divergence between the empirical gradient distribution and the model distribution p, which is equivalent to minimizing the negative log-likelihood of the model distribution evaluated over the N gradient samples ∇x_i:

    [γ̃, λ̃] = argmin_{γ,λ} { −(1/N) Σ_{i=1}^{N} ln p(∇x_i) }    (2)

Figure 2 shows the Parzen-window fit to the sampled γ̃, ln(λ̃) for ∇_o x, ∇_a x. For orthogonal gradients ∇_o x, there exists a large cluster near γ = 0.5, ln(λ) = 2. This cluster corresponds to patches from a smooth region with abrupt edges or a region near texture boundaries. This observation supports the dead leaves image model – an image is a collage of overlapping instances [16, 19]. However, we also observe significant density even when γ is greater than 1. Samples near γ = 2 with large λ correspond to flat regions such as sky, and samples near γ = 2 with small λ correspond to fractal-like textures such as tree leaves or grass. We observe similar characteristics for aligned gradients ∇_a x as well. The distribution of shape parameters suggests that a significant portion of natural images is not piecewise smooth, which justifies adapting the image prior to the image content.

4. Adapting the image prior

The goal of this paper is to identify the correct image prior (i.e. the shape parameters of a generalized Gaussian distribution) for each pixel in the image. One way to identify the image prior is to fit gradients from every sliding window to a generalized Gaussian distribution (Eq 2), but the required computation would be overwhelming. We introduce a regression-based method to estimate the image prior.
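For reference, the per-patch fit of Eq 2 is a direct negative log-likelihood minimization. The sketch below (ours; the paper's reference list includes the Nelder-Mead simplex method [15], so we use that optimizer, with an initialization we chose for illustration) is the operation that would otherwise have to run in every sliding window.

```python
import numpy as np
from scipy.special import gammaln
from scipy.optimize import minimize

def ggd_negloglik(log_params, grads):
    """Mean negative log-likelihood of gradient samples under the
    generalized Gaussian of Eq 1, parameterized by (ln gamma, ln lambda)."""
    gamma, lam = np.exp(log_params)  # log-parameterized to keep both positive
    logp = (np.log(gamma) + np.log(lam) / gamma - np.log(2.0)
            - gammaln(1.0 / gamma) - lam * np.abs(grads) ** gamma)
    return -np.mean(logp)

def fit_ggd(grads):
    """Fit shape parameters (gamma, lambda) to one patch of gradients (Eq 2)."""
    res = minimize(ggd_negloglik, x0=np.log([0.8, 6.5]), args=(grads,),
                   method='Nelder-Mead')
    return np.exp(res.x)  # (gamma, lambda)
```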

4.1. Image model

Let y be an observed degraded image, k be a blur kernel (a point-spread function, or PSF), and x be a latent image. Image degradation is modeled as the process

    y = k ⊗ x + n    (3)

where ⊗ is a convolution operator and n is observation noise. The goal of a (non-blind) image restoration problem is to recover a clean image x from a degraded observation y, given the blur kernel k and the standard deviation of the noise η, both of which can be estimated through other techniques [10, 18].

We introduce a conditional random field (CRF) model to incorporate texture variations within the image restoration framework. Typically, a CRF restoration model can be expressed as follows:

    p(x|y, k, η) = (1/M) Π_i φ_y(x; y_i, k, η) φ_x(x)    (4)

where M is a partition function and i is a pixel index. φ_y is derived from the observation process in Eq 3, and φ_x from the assumed image prior:

    φ_y(x; y_i, k, η) ∝ exp( −(y_i − (k ⊗ x)_i)² / (2η²) )    (5)
    φ_x(x) ∝ exp( −λ ‖∇x‖^γ )    (6)

To model the spatially variant gradient statistics, we introduce an additional hidden variable z, called texture, to the conventional CRF model. z controls the shape parameters of the image prior:

    p(x, z|y, k, η) = (1/M) Π_i φ_y(x; y_i, k, η) φ_{x,z}(x, z)    (7)

where φ_{x,z}(x, z) ∝ exp( −λ(z) ‖∇x‖^{γ(z)} ). We model z as a continuous variable since the distribution of [γ, λ] is heavy-tailed and does not form tight clusters (Figure 2).

We maximize p(x|y, k, η) to estimate a clean image x̂. To do so, we use a Laplace approximation around the mode ẑ to approximate p(x|y, k, η):

    p(x|y, k, η) = ∫_z p(x, z|y, k, η) dz ≈ p(x, ẑ|y, k, η)    (8)

Section 4.2 discusses how we estimate ẑ for each pixel.

[Figure 3: scatter plots of the local variance and fourth moment of gradients, down-sampled deconvolved image vs. down-sampled original image.]
Figure 3. The local variance and fourth moments of gradients computed from the deconvolved, down-sampled image of Figure 1 are closely correlated with those of the down-sampled original image.

4.2. Estimating the texture ẑ

A notable characteristic of a zero-mean generalized Gaussian distribution is that the variance v and the fourth moment f completely determine the shape parameters [γ, λ] [24]:

    v = Γ(3/γ) / ( λ^(2/γ) Γ(1/γ) ),    f = Γ(5/γ) / ( λ^(4/γ) Γ(1/γ) )    (9)

To take advantage of these relationships, we define the local texture around a pixel i, ẑ_i, as a two-dimensional vector. The first dimension is the variance v_i of gradients in the neighborhood of pixel i, and the second dimension is the fourth moment f_i of gradients in that neighborhood:

    ẑ_i = [v_i(∇x), f_i(∇x)]    (10)

Qualitatively, the variance of gradients v_i(∇x) encodes the width of the distribution, and the fourth moment f_i(∇x) encodes its peakiness. Note that we can easily compute v_i, f_i by convolving the gradient image with a window that defines the neighborhood. We use a Gaussian window with a 4-pixel standard deviation.

Estimating the texture ẑ from the observation y. The texture ẑ should be estimated from the sharp image x we wish to reconstruct, but x is not available when estimating ẑ. We address this issue by estimating the texture ẑ from an image reconstructed using a spatially invariant image prior. We hand-select the spatially invariant prior with a weak gradient penalty so that textures are reasonably restored: [γ_o = 0.8, λ_o = 6.5], [γ_a = 0.6, λ_a = 6.5]. A caveat is that the fixed prior deconvolution may contaminate the gradient profile of the reconstructed image, which may induce texture estimation error. To reduce such deconvolution noise, we down-sample the deconvolved image by a factor of 2 in both dimensions before estimating the texture ẑ. The gradient profile of natural images is often scale invariant due to fractal properties of textures and piecewise smooth properties of surfaces [16, 19], whereas that of the deconvolution noise tends to be scale variant. Therefore, the texture ẑ estimated from the down-sampled deconvolved image better resembles the texture of the original sharp image.
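A minimal sketch of this step (ours, assuming NumPy/SciPy): the local moments of Eq 10 are Gaussian-blurred pointwise powers of the gradient image, and Eq 9 gives the moments implied by a given (γ, λ), useful for checking or inverting a fit.

```python
import numpy as np
from scipy.ndimage import gaussian_filter
from scipy.special import gamma as Gamma

def local_texture(grad, sigma=4.0):
    """Per-pixel texture feature z_i = [v_i, f_i] of Eq 10: variance and
    fourth moment of (zero-mean) gradients in a Gaussian window."""
    v = gaussian_filter(grad ** 2, sigma)  # local second moment ~ variance
    f = gaussian_filter(grad ** 4, sigma)  # local fourth moment
    return v, f

def ggd_moments(gam, lam):
    """Variance and fourth moment implied by shape parameters (Eq 9)."""
    v = Gamma(3.0 / gam) / (lam ** (2.0 / gam) * Gamma(1.0 / gam))
    f = Gamma(5.0 / gam) / (lam ** (4.0 / gam) * Gamma(1.0 / gam))
    return v, f
```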

4.3. Estimating the shape parameters γ, λ from ẑ

We could numerically invert Eq 9 to directly compute the shape parameters [γ, λ] from the variance and fourth moment estimates [24]. However, a numerical inversion is computationally expensive and is sensitive to noise. We instead use a kernel regressor that maps the log of the texture, ln(ẑ), to shape parameters [γ, ln(λ)].

The regressor should learn the mapping from the texture ẑ of the down-sampled deconvolved image to shape parameters, in order to account for any residual deconvolution noise in the estimated texture ẑ. Since the deconvolved image, and thus ẑ, depends on the blur kernel and the noise level, we would have to train regressors discriminatively for each degradation scenario, which is intractable. However, we empirically observe in Figure 3 that the variance and fourth moment of the deconvolved, down-sampled image are similar to those of the down-sampled original image. Therefore, we learn a single regressor that maps the variance and fourth moment of the down-sampled original image to the shape parameters, and use it to estimate the shape parameters from the down-sampled deconvolved image.

To learn the regression function, we sample 125,000 patches of size 17 × 17 pixels from 500 high quality natural images. We fit the gradient profile of each patch to a generalized Gaussian distribution, and associate each fit with the variance and fourth moment of gradients in the down-sampled version of each patch (9 × 9 pixels). We use the collected data to learn the mapping from [ln(v), ln(f)] to [γ, ln(λ)] using LibSVM [7], with 10-fold cross validation to avoid over-fitting. A sketch of this stage appears below, after Section 4.4.

Figure 5 shows the estimated shape parameters for orthogonal gradients of Figure 1. In the top row of Figure 5, the parameters are estimated from the image reconstructed from 5% noise and the blur in Figure 7. We observe that the estimated prior in the tree region is close to Gaussian (i.e. γ = 2 ∼ 3), whereas the estimated prior in the building region is sparse (i.e. γ < 1). The estimated shape parameters are similar to the parameters estimated from the down-sampled original image (the bottom row of Figure 5). This supports the claim that shape parameters estimated from a degraded input image are reasonably accurate.

[Figure 5: per-pixel γ and ln(λ) maps estimated from the deconvolved image (top) and from the original image (bottom).]
Figure 5. The shape parameters for orthogonal gradients, estimated from the down-sampled deconvolved image of Figure 1 (reconstructed from an image degraded with 5% noise and the blur in Figure 7) and the down-sampled original image. The shape parameters estimated from the deconvolved image are similar to those of the original image.

4.4. Handling texture boundaries

If multiple textures appear within a single window, the estimated shape prior can be inaccurate. Suppose we want to estimate the image prior for a 1-dimensional slice of an image (Figure 4(a)). Ideally, we should recover two regions with distinct shape parameters that abut each other via a thin band of shape parameters corresponding to an edge. However, the estimated image prior becomes "sparse" (i.e. small γ) near the texture boundary even if pixels do not correspond to an edge (the green curve in Figure 4(c)). The use of a finite-size window for computing v and f causes this issue.

To recover shape parameters near texture boundaries, we regularize the estimated shape parameters using a Gaussian conditional random field (GCRF) [26]. Essentially, we want to smooth shape parameters only near texture boundaries. A notable characteristic at texture boundaries is that the ẑ's estimated from two different window sizes tend to differ from each other: while a small window spans a homogeneous texture, a larger window could span two different textures, generating different ẑ's. We leverage this characteristic to smooth only near texture boundaries by defining the observation noise level in the GCRF as the mean variance of the variance v and of the fourth moment f estimated from windows of two different sizes (Gaussian windows with 2-pixel and 4-pixel standard deviations). If the variance of these estimates is large, the central pixel is likely to be near a texture boundary, so we smooth the shape parameter at the central pixel. Appendix A discusses the GCRF model in detail. Figure 4(c) shows the estimated shape parameters before and after regularization, along with the estimated GCRF observation noise. After regularization, two textures are separated by a small band of sparse image prior corresponding to an edge.

[Figure 4: (a) a 1-D image slice; (b) log-variance and log-fourth-moment profiles for 2-pixel and 4-pixel windows; (c) γ from regression, the GCRF observation noise, and the regularized γ.]
Figure 4. We regularize the estimated shape parameters using a GCRF such that the texture transition mostly occurs at the texture boundary. We model the observation noise in the GCRF as the variance of the variance and fourth moments estimated from two Gaussian windows with different standard deviations (2-pixel and 4-pixel), as shown in (b). This reduces the shape parameter estimation error at texture boundaries, as shown in (c) (compare the "Gamma from regression" and "Regularized gamma" curves).
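A minimal sketch of the regression stage of Section 4.3 (ours): we stand in scikit-learn's SVR, which wraps LibSVM, for the paper's LibSVM setup; the kernel choice and default hyperparameters here are illustrative, not the authors'.

```python
import numpy as np
from sklearn.svm import SVR

def train_shape_regressors(log_vf, gammas, log_lams):
    """Learn [ln v, ln f] -> [gamma, ln lambda] from per-patch GGD fits.
    log_vf: (n_patches, 2); gammas, log_lams: (n_patches,)."""
    reg_gamma = SVR(kernel='rbf').fit(log_vf, gammas)
    reg_loglam = SVR(kernel='rbf').fit(log_vf, log_lams)
    return reg_gamma, reg_loglam

def predict_shape(reg_gamma, reg_loglam, v, f):
    """Per-pixel shape parameter maps from the texture features of Eq 10."""
    X = np.stack([np.log(v).ravel(), np.log(f).ravel()], axis=1)
    gam = reg_gamma.predict(X).reshape(v.shape)
    lam = np.exp(reg_loglam.predict(X)).reshape(v.shape)
    return gam, lam
```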

4.5. Implementation details

We minimize the negative log-posterior to reconstruct a clean image x̂:

    x̂ = argmin_x { ‖y − k ⊗ x‖² / (2η²)
         + w Σ_{i=1}^{N} [ λ_o(ẑ_i) ‖∇_o x(i)‖^{γ_o(ẑ_i)} + λ_a(ẑ_i) ‖∇_a x(i)‖^{γ_a(ẑ_i)} ] }    (11)

where [γ_o, λ_o], [γ_a, λ_a] are the estimated parameters for orthogonal and aligned gradients, respectively, and w is a weighting term that controls the gradient penalty; w = 0.025 in all examples. We minimize Eq 11 using an iterative reweighted least squares (IRLS) algorithm [17, 25].

[Figure 6: per-image PSNR and SSIM plots for the 21 test images under the three deconvolution priors.]
Figure 6. Image deconvolution results: PSNR and SSIM. Mean PSNR: unsteered gradient prior – 26.45 dB, steered gradient prior – 26.33 dB, content-aware prior – 27.11 dB. Mean SSIM: unsteered gradient prior – 0.937, steered gradient prior – 0.940, content-aware prior – 0.951.
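A minimal IRLS sketch (ours) for a simplified version of Eq 11: it uses only axis-aligned first-order gradients with per-pixel (γ, λ) maps, whereas the paper uses steered first- and second-order filters. Each outer iteration rebuilds quadratic weights around the current gradients and solves the resulting least squares system with conjugate gradients. A synthetic observation per Eq 3 can be generated as y = fftconvolve(x_true, k, mode='same') + eta * np.random.randn(*x_true.shape).

```python
import numpy as np
from scipy.signal import fftconvolve
from scipy.sparse.linalg import LinearOperator, cg

def conv(img, f):
    """2-D linear convolution, cropped to the input size."""
    return fftconvolve(img, f, mode='same')

def irls_deconv(y, k, eta, gam, lam, w=0.025, n_irls=8, eps=1e-3):
    """IRLS for a simplified Eq 11: data term ||k*x - y||^2 / (2 eta^2) plus
    w * sum_i lam_i |grad x|_i^gam_i, axis-aligned first-order gradients only.
    gam, lam: per-pixel shape parameter maps, e.g. from the regressor."""
    filts = [np.array([[1.0, -1.0]]), np.array([[1.0], [-1.0]])]
    kT = k[::-1, ::-1]                     # flipped kernel: adjoint of k (up to boundary)
    x = y.copy()
    b = (conv(y, kT) / eta ** 2).ravel()   # constant right-hand side K^T y / eta^2
    for _ in range(n_irls):
        # Reweighting: lam*|d|^gam ~= u*d^2 with u = lam*gam*|d|^(gam-2)/2,
        # matched at the current gradient d (clamped away from zero by eps)
        U = [0.5 * lam * gam * np.maximum(np.abs(conv(x, f)), eps) ** (gam - 2.0)
             for f in filts]
        def matvec(v):
            v = v.reshape(y.shape)
            out = conv(conv(v, k), kT) / eta ** 2
            for f, u in zip(filts, U):
                out += 2.0 * w * conv(u * conv(v, f), f[::-1, ::-1])
            return out.ravel()
        A = LinearOperator((y.size, y.size), matvec=matvec)
        x = cg(A, b, x0=x.ravel(), maxiter=30)[0].reshape(y.shape)
    return x
```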
5. Experimental results

We evaluate the performance of the content-aware image prior for deblurring and denoising tasks. We compare our results to those reconstructed using a sparse unsteered gradient prior [17] and a sparse steered gradient prior [21], using peak signal-to-noise ratio (PSNR) and gray-scale structural similarity (SSIM) [29] as quality metrics. We augmented the steerable random fields of [21], which introduced denoising and image inpainting as applications, to perform deconvolution using the sparse steered gradient prior. In all experiments, we use the first order and the second order gradient filters [11]. These algorithms could be augmented with higher order gradient filters to improve reconstruction quality, but we do not consider that here. The test set consists of 21 high quality images downloaded from LabelMe [22] with enough texture variation within each image.

Non-blind deconvolution. The goal of non-blind deconvolution is to reconstruct a sharp image from a blurred, noisy image given a blur kernel and a noise level. We generate our test set by blurring images with the kernel shown in Figure 7, and adding 5% noise to the blurred images. Figure 6 shows the measured PSNR and SSIM for different deconvolution methods. The content-aware prior deconvolution method performs favorably compared to the competing methods, both in terms of PSNR and SSIM. The benefit of using a spatially variant prior is more pronounced for images with large textured regions; if the image consists primarily of piecewise smooth objects such as buildings, the difference between the content-aware image prior and the others is minor. Figure 7 compares the visual quality of images reconstructed using different priors. We observe that textured regions are best reconstructed using the content-aware image prior, illustrating the benefit of adapting the image prior to textures.

Denoising. The goal of denoising is to reconstruct a sharp image from a noisy observation given a noise level. We consider reconstructing clean images from degraded images at two noise levels: 5% and 10%. Figure 8 shows the measured PSNR and SSIM for the denoising task. When the noise level is low (5%), the content-aware prior reconstructs images with lower PSNR compared to competing methods. One explanation is that the content-aware prior may not remove all the noise in textured regions (such as trees), because the gradient statistics of noise are similar to those of the underlying texture. Such noise, however, does not disturb the visual quality of textured regions. The SSIM measure, which is better correlated with perceptual quality [29], shows that the content-aware image prior performs comparably to, or at worst slightly worse than, the other methods at a 5% noise level. It is worth noting that when the noise level is low, the observation term is strong, so reconstructed images do not depend heavily on the image prior. The top row of Figure 9 shows that at a 5% noise level, the reconstructed images are visually similar.

When the noise level is high (10%), SSIM clearly favors images reconstructed using the content-aware prior. In this case, the observation term is weak, so the image prior plays an important role in the quality of reconstructed images; the performance benefit from using the content-aware prior is therefore more pronounced.
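For completeness, a minimal sketch (ours, assuming scikit-image) of the two quality metrics used throughout this section:

```python
from skimage.metrics import peak_signal_noise_ratio, structural_similarity

def evaluate(original, restored, data_range=1.0):
    """PSNR and gray-scale SSIM [29] between an original and a restored image."""
    psnr = peak_signal_noise_ratio(original, restored, data_range=data_range)
    ssim = structural_similarity(original, restored, data_range=data_range)
    return psnr, ssim
```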

[Figure 7: cropped deconvolution results; panels show the blurry image, unsteered gradient prior, steered gradient prior, content-aware prior, and original image. PSNR/SSIM for the three reconstructions, in that order: 25.59 dB / 0.960, 25.48 dB / 0.962, 26.45 dB / 0.970.]
Figure 7. Adapting the image prior to textures leads to better reconstructions. The red box denotes the cropped region.

[Figure 8: per-image PSNR and SSIM plots for the 21 test images at 5% and 10% noise under the three denoising priors.]
Figure 8. Image denoising results: PSNR and SSIM. At 5% noise, mean PSNR: unsteered gradient prior – 32.53 dB, steered gradient prior – 32.74 dB, content-aware prior – 31.42 dB; mean SSIM: unsteered gradient prior – 0.984, steered gradient prior – 0.984, content-aware prior – 0.982. At 10% noise, mean PSNR: unsteered gradient prior – 28.54 dB, steered gradient prior – 28.43 dB, content-aware prior – 28.52 dB; mean SSIM: unsteered gradient prior – 0.950, steered gradient prior – 0.953, content-aware prior – 0.959.

The bottom row of Figure 9 shows denoising results at a 10% noise level, supporting our claim that the content-aware image prior generates more visually pleasing textures.

Figure 10 shows the result of deconvolving a real blurry image captured with a handheld camera. We estimate the blur kernel using the algorithm of Fergus et al. [10]. Again, textured regions are better reconstructed using our method.

5.1. User study

We conducted a user study on Amazon Mechanical Turk to compare the visual quality of the reconstructed images. We evaluated 5 randomly selected images for each degradation scenario considered above. Each user views two images: one reconstructed using the content-aware prior, and another reconstructed using either the unsteered gradient prior or the steered gradient prior. The user selects the visually pleasing one of the two, or selects the "There is no difference" option. We cropped each test image to 500 × 350 pixels to ensure that the users view both images without scrolling. We did not refer to the restored images while selecting crop regions, in order to minimize bias.

We gathered about 20 user opinions for each comparison. In Figure 11, we show the average user preference in each degradation scenario. Users did not have a particular preference when the degradation was small (e.g. 5% noise), but at a high image degradation level users clearly favored the content-aware image prior over the others.

[Figure 11: user preference fractions for the content-aware prior vs. the sparse unsteered gradient prior and vs. the sparse steered gradient prior, under 5% noise, 10% noise, and blur kernel degradations; bars span "unsteered/steered grad. prior", "no difference", and "content-aware prior".]
Figure 11. User study results. The region corresponds to the fraction of users that favored our reconstructions for each degradation scenario. At a low degradation level, users do not prefer one method over another, but as the level of degradation increases, users clearly favor the content-aware image prior.

[Figure 9: cropped denoising results at 5% noise (top row) and 10% noise (bottom row); panels show the noisy image, unsteered gradient prior, steered gradient prior, content-aware prior, and original image. PSNR/SSIM for the three reconstructions, in that order: 5% noise – 30.74 dB / 0.995, 30.85 dB / 0.995, 29.27 dB / 0.995; 10% noise – 29.49 dB / 0.957, 29.33 dB / 0.960, 29.41 dB / 0.965.]
Figure 9. Visual comparisons of denoised images. The red box denotes the cropped region. At the 5% noise level, while the PSNR of our result is lower than those of the competing algorithms, visually the difference is imperceptible. At the 10% noise level, the content-aware prior outperforms the others in terms of both PSNR and SSIM, and is more visually pleasing.

[Figure 10: cropped real-photo deconvolution results; panels show the blurry input image, sparse unsteered gradient prior, sparse steered gradient prior, and content-aware prior.]
Figure 10. The deconvolution of a blurred image taken with a hand-held camera. We estimate the blur kernel using Fergus et al. [10]. The red box denotes the cropped region. The textured region is better reconstructed using the content-aware image prior.

5.2. Discussions

A limitation of our algorithm, which is shared with algorithms using a conditional random field model with hidden variables [14, 21, 26], is that hidden variables, such as the magnitude and/or orientation of an edge, or the textureness of a region, are estimated from the degraded input image or from an image restored through other means. Any error from this preprocessing step induces error in the final result.

Another way to estimate a spatially variant prior is to segment the image into regions and assume a single prior within each segment. Unless we segment the image into many pieces, the estimated prior can be inaccurate. Also, the segmentation may inadvertently generate artificial boundaries in reconstructed images. Therefore, we estimate a distinct image prior for each pixel in the image.

6. Conclusion

We have explored the problem of estimating spatially variant gradient statistics in natural images, and exploited the estimated gradient statistics to adaptively restore different textural characteristics in image restoration tasks. We show that the content-aware image prior can restore piecewise smooth regions without over-smoothing textured regions, improving the visual quality of reconstructed images, as verified through user studies. Adapting to textural characteristics is especially important when the image degradation is significant.

Appendix A: Gaussian CRF model for [γ, λ] regularization

We regularize the regressor outputs [γ̃, λ̃] using a Gaussian conditional random field (GCRF). We maximize the following probability to estimate the regularized γ:

    P(γ; γ̃) ∝ Π_i Π_{j∈N(i)} ψ(γ̃_i|γ_i) Ψ(γ_i, γ_j)    (12)

where N(i) denotes the neighborhood of i, ψ is the observation model, and Ψ is the neighborhood potential:

    ψ(γ̃_i|γ_i) ∝ exp( −(γ̃_i − γ_i)² / (2σ_l²) )
    Ψ(γ_i, γ_j) ∝ exp( −(γ_i − γ_j)² / (2σ_n²(i, j)) )    (13)

We set σ_l and σ_n adaptively. We set the variance σ_n²(i, j) of the neighboring γ estimates γ_i, γ_j as σ_n²(i, j) = α(x(i) − x(j))², where x is the image and α = 0.01 controls how smooth the neighboring estimates should be. σ_n encourages discontinuity at strong edges of the image x [26]. The observation noise σ_l² is the average variance of the variance and fourth moment estimates (for two Gaussian windows with standard deviations of 2 pixels and 4 pixels). We use the same GCRF model to regularize ln(λ), with α = 0.001.
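A minimal sketch of this MAP estimate (ours): since Eqs 12-13 define a Gaussian posterior, a simple Jacobi-style fixed-point iteration, in which each pixel becomes the precision-weighted average of its noisy estimate and its four neighbors, converges to the regularized field. The wrap-around boundary handling via np.roll is a simplification we chose.

```python
import numpy as np

def gcrf_regularize(g_tilde, sigma_l2, img, alpha=0.01, n_iter=200):
    """MAP estimate of the GCRF in Eqs 12-13.
    g_tilde : noisy per-pixel estimates (e.g. gamma from the regressor)
    sigma_l2: per-pixel observation variance (two-window moment variance)
    img     : the image x; neighbor variance is alpha * (x(i) - x(j))^2."""
    g = g_tilde.copy()
    shifts = [(0, 1), (0, -1), (1, 0), (-1, 0)]  # 4-connected neighborhood
    for _ in range(n_iter):
        num = g_tilde / sigma_l2
        den = 1.0 / sigma_l2
        for dy, dx in shifts:
            g_nb = np.roll(g, (dy, dx), axis=(0, 1))
            x_nb = np.roll(img, (dy, dx), axis=(0, 1))
            # Edge-dependent precision: weak smoothing across strong edges
            prec = 1.0 / np.maximum(alpha * (img - x_nb) ** 2, 1e-6)
            num += prec * g_nb
            den += prec
        g = num / den
    return g
```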
Acknowledgment

This work was done while the first author was an intern at MSR. This research is partially funded by NGA NEGI-1582-04-0004, by ONR-MURI Grant N00014-06-1-0734, and by gifts from Microsoft, Google, and Adobe. The first author is also supported by the Samsung Scholarship Foundation.

References

[1] M. Aharon, M. Elad, and A. Bruckstein. The K-SVD: an algorithm for designing of overcomplete dictionaries for sparse representation. IEEE TSP, Nov. 2006.
[2] D. Barash. A fundamental relationship between bilateral filtering, adaptive smoothing, and the nonlinear diffusion equation. IEEE TPAMI, 24(6):844–847, June 2002.
[3] E. P. Bennett, M. Uyttendaele, C. L. Zitnick, R. Szeliski, and S. B. Kang. Video and image Bayesian demosaicing with a two color image prior. In ECCV, 2006.
[4] T. E. Bishop, R. Molina, and J. R. Hopgood. Nonstationary blind image restoration using variational methods. In IEEE ICIP, 2007.
[5] M. J. Black, G. Sapiro, D. H. Marimont, and D. Heeger. Robust anisotropic diffusion. IEEE TIP, Mar. 1998.
[6] E. J. Candes and D. L. Donoho. Curvelets – a surprisingly effective nonadaptive representation for objects with edges, 1999.
[7] C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines, 2001.
[8] M. N. Do and M. Vetterli. The contourlet transform: an efficient directional multiresolution image representation. IEEE TIP, Dec. 2005.
[9] M. Elad. On the origin of the bilateral filter and ways to improve it. IEEE TIP, Oct. 2002.
[10] R. Fergus, B. Singh, A. Hertzmann, S. Roweis, and W. T. Freeman. Removing camera shake from a single photograph. ACM TOG (SIGGRAPH), 2006.
[11] W. T. Freeman and E. H. Adelson. The design and use of steerable filters. IEEE TPAMI, 13(9), Sept. 1991.
[12] R. C. Gonzalez and R. E. Woods. Digital Image Processing. Prentice Hall, 2008.
[13] D. K. Hammond and E. P. Simoncelli. Image denoising with an orientation-adaptive Gaussian scale mixture model. In IEEE ICIP, 2006.
[14] N. Joshi, C. L. Zitnick, R. Szeliski, and D. Kriegman. Image deblurring and denoising using color priors. In IEEE CVPR, 2009.
[15] J. C. Lagarias, J. A. Reeds, M. H. Wright, and P. E. Wright. Convergence properties of the Nelder-Mead simplex method in low dimensions. SIAM Journal of Optimization, 1998.
[16] A. B. Lee, D. Mumford, and J. Huang. Occlusion models for natural images: a statistical study of a scale-invariant dead leaves model. IJCV, 41:35–59, 2001.
[17] A. Levin, R. Fergus, F. Durand, and W. T. Freeman. Image and depth from a conventional camera with a coded aperture. ACM TOG (SIGGRAPH), 2007.
[18] C. Liu, W. T. Freeman, R. Szeliski, and S. B. Kang. Noise estimation from a single image. In IEEE CVPR, 2006.
[19] G. Matheron. Random Sets and Integral Geometry. John Wiley and Sons, 1975.
[20] S. Roth and M. J. Black. Fields of experts: a framework for learning image priors. In IEEE CVPR, June 2005.
[21] S. Roth and M. J. Black. Steerable random fields. In IEEE ICCV, 2007.
[22] B. Russell, A. Torralba, K. Murphy, and W. T. Freeman. LabelMe: a database and web-based tool for image annotation. IJCV, 2008.
[23] S. S. Saquib, C. A. Bouman, and K. Sauer. ML parameter estimation for Markov random fields with applications to Bayesian tomography. IEEE TIP, 7(7):1029, 1998.
[24] E. P. Simoncelli and E. H. Adelson. Noise removal via Bayesian wavelet coring. In IEEE ICIP, 1996.
[25] C. V. Stewart. Robust parameter estimation in computer vision. SIAM Reviews, 41(3):513–537, Sept. 1999.
[26] M. F. Tappen, C. Liu, E. H. Adelson, and W. T. Freeman. Learning Gaussian conditional random fields for low-level vision. In IEEE CVPR, 2007.
[27] C. Tomasi and R. Manduchi. Bilateral filtering for gray and color images. In IEEE ICCV, 1998.
[28] M. Wainwright and E. P. Simoncelli. Scale mixtures of Gaussians and the statistics of natural images. In NIPS, 2000.
[29] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli. Image quality assessment: from error visibility to structural similarity. IEEE TIP, 2004.
[30] Y. Weiss and W. T. Freeman. What makes a good model of natural images? In IEEE CVPR, 2007.
[31] M. Welling, G. Hinton, and S. Osindero. Learning sparse topographic representations with products of Student-t distributions. In NIPS, 2002.
[32] S. C. Zhu, Y. Wu, and D. Mumford. Filters, random fields and maximum entropy (FRAME): towards a unified theory for texture modeling. IJCV, 1998.