
An Efficient Statistical Method for Image Noise Level Estimation

Guangyong Chen1, Fengyuan Zhu1, and Pheng Ann Heng1,2
1 Department of Computer Science and Engineering, The Chinese University of Hong Kong
2 Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

Abstract

In this paper, we address the problem of estimating the noise level from a single image contaminated by additive zero-mean Gaussian noise. We first provide a rigorous analysis of the statistical relationship between the noise variance and the eigenvalues of the covariance matrix of patches within an image, which shows that many state-of-the-art noise estimation methods underestimate the noise level of an image. To this end, we derive a new nonparametric algorithm for efficient noise level estimation, based on the observation that patches decomposed from a clean image often lie around a low-dimensional subspace. The performance of our method is guaranteed both theoretically and empirically. Specifically, our method outperforms existing state-of-the-art algorithms on noise level estimation with the least executing time in our experiments. We further demonstrate that the denoising algorithm BM3D achieves optimal performance using the noise variance estimated by our algorithm.

1. Introduction

The noise level is an important parameter to many algorithms in different areas of computer vision, including image denoising [3, 5, 6, 7, 16], optical flow [12, 26], image segmentation [1, 4] and super resolution [9]. However, in real-world situations the noise level of a particular image can be unavailable and must be estimated. So far, it remains a challenge to accurately estimate the noise level of different noisy images, especially those with rich textures. Therefore, a robust noise level estimation method is highly demanded.

One important noise model widely used in different computer vision problems, including image denoising, is the additive, independent and homogeneous Gaussian noise model, where "homogeneous" means that the noise variance is a constant for all pixels within an image and does not change with the position or color intensity of a pixel. The goal of noise level estimation is to estimate the unknown σ of the noise from a single observed noisy image.

The problem of estimating the noise level from a single image is fundamentally ill-posed. During the last decades, numerous noise estimation methods [2, 17, 13, 20, 24] have been proposed. However, all of these methods rely on the assumption that the processed image contains a sufficient amount of flat areas, which is not always the case in natural image processing. Recently, new algorithms with state-of-the-art performance have been proposed in [19, 23]. The authors of [19, 23] claim that these methods can accurately estimate the noise level of images without homogeneous areas. However, these methods suffer from two weaknesses. Firstly, as concluded in [19], the convergence and performance of selecting low-rank patches is not theoretically guaranteed and does not achieve high accuracy empirically. Secondly, as theoretically explained in Sec. 2.2 of this paper, both [19] and [23] underestimate the noise level of processed images, since they take the smallest eigenvalue of the covariance of the selected low-rank patches as their noise estimate.

To tackle these problems, we propose a new algorithm for noise level estimation. Our work is based on the observation that patches taken from a noiseless image often lie in a low-dimensional subspace, instead of being uniformly distributed across the ambient space. This property has been widely used in subspace clustering methods [8, 29]. The low-dimensional subspace can be learned by Principal Component Analysis (PCA) [14]. As analyzed in Sec. 2.1, the noise variance can be estimated from the eigenvalues of the redundant dimensions. In this way, the problem of noise level estimation is reformulated as the problem of selecting the redundant dimensions of PCA. This problem has been investigated as a model selection problem in the fields of statistics and machine learning, including [10, 15, 21, 28]. However, these methods focus on using fewer latent components to represent the observed signals.

Figure 1. An example illustrating that patches of a clean image lie in a low-dimensional subspace. (a) The clean image. (b) The noisy image with noise σ = 50. (c) The eigenvalues of the clean and noisy images, where the red curve represents the eigenvalues of the clean image and the blue curve represents the eigenvalues of the noisy image.

As a result, these methods tend to treat some signal components as noise, resulting in an overestimation of the noise variance. In this paper, we propose an effective method to solve this issue using the statistical property that the eigenvalues of the redundant dimensions are random variables following the same distribution, as demonstrated in Sec. 2.2. As proved in Sec. 2.3, the proposed method is expected to achieve an accurate noise estimate when the number of principal dimensions is smaller than a threshold.

Thus, the contributions of this paper can be summarized as follows:

• The statistical relationship between the noise level σ² and the eigenvalues of the covariance matrix of patches is first established in this paper.

• A nonparametric algorithm is proposed to estimate the noise level σ² from the eigenvalues in polynomial time, and its performance is theoretically guaranteed.

• As demonstrated empirically, our method is the most robust and achieves the best noise level estimation performance in most cases. Moreover, our method consumes the least executing time and is nearly 8 times faster than [19, 23].

The rest of the article is organized as follows. Based on the patch-based model, we propose our method in Sec. 2. In Sec. 3, we present a comparison of the experimental results with discussions. Finally, the conclusion and some future works are presented in Sec. 4.

2. Our Method: Theory and Implementation

An observable image I can be decomposed into a number of patches X_s = {x_t}_{t=1}^s, x_t ∈ R^{r×1}. Given a multi-channel image I of size M × N × c, X_s contains s = (M − d + 1)(N − d + 1) patches of size d × d × c, whose top-left corner positions are taken from the set {1, ..., M − d + 1} × {1, ..., N − d + 1}. To simplify the following calculations, all patches are further rearranged into vectors with r = cd² elements in this paper. Any vector x_t in the observable set X_s can be decomposed as

    x_t = x̂_t + e_t,    (1)

where x̂_t ∈ R^{r×1} is the corresponding noise-free image patch lying in the low-dimensional subspace, e_t ∈ R^{r×1} denotes the additive noise, and E(x̂_t^T e_t) = 0. As I is contaminated by Gaussian noise N(0, σ²) with zero mean and variance σ², e_t follows a multivariate Gaussian distribution N_r(0, σ²I) with mean 0 and covariance matrix σ²I. With this setting, estimating the noise level of an image from the set of patches X_s is equivalent to estimating the noise level σ² of the dataset X_s.

2.1. Eigenvalues

As illustrated in Fig. 1 (c), most eigenvalues of the clean image are 0, which confirms our previous discussion: patches taken from a clean image often lie in a single low-dimensional subspace. However, it can be observed that most eigenvalues of the noisy image scatter around the true noise variance (σ = 50) instead of being exactly 50. Thus, it is still difficult to obtain an accurate noise estimate from the eigenvalues directly. To investigate the relationship between the eigenvalues and the noise level comprehensively, we first formulate the calculation of the eigenvalues from another perspective.

Assume that the clean patches lie in an m-dimensional linear subspace, where m is a predefined positive integer with m ≪ r. Then equation (1) can be formulated as

    x_t = A y_t + e_t,    (2)

where A ∈ R^{r×m} denotes the dictionary matrix spanning the m-dimensional subspace with the constraint A^T A = I, and y_t ∈ R^{m×1} denotes the projection of x_t onto the subspace spanned by A. PCA has been widely used to infer the linear model described in equation (2), where A consists of the m eigenvectors associated with the m largest eigenvalues of the covariance matrix Σ_x = (1/s) Σ_{t=1}^s (x_t − µ)(x_t − µ)^T with µ = (1/s) Σ_{t=1}^s x_t.
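The patch model above is easy to reproduce numerically. The following is a minimal single-channel Python sketch (our own illustration, assuming NumPy; not the paper's MATLAB code): extracting all overlapping d × d patches of an image and inspecting the eigenvalues of their covariance matrix shows that a low-rank "clean" image produces only a handful of non-zero eigenvalues, while additive Gaussian noise lifts the remaining (redundant) eigenvalues to roughly σ².

```python
import numpy as np

def patch_eigenvalues(img, d=8):
    """Eigenvalues (descending) of the covariance of all overlapping
    d x d patches of a single-channel image (so r = d*d, c = 1)."""
    M, N = img.shape
    patches = np.array([img[i:i + d, j:j + d].ravel()
                        for i in range(M - d + 1)
                        for j in range(N - d + 1)])   # s x r matrix
    cov = np.cov(patches, rowvar=False, bias=True)    # Sigma_x
    return np.sort(np.linalg.eigvalsh(cov))[::-1]

# Rank-1 "clean" image: every patch vector lies on a 1-D affine line,
# so all but the leading eigenvalue are numerically zero.
clean = np.outer(np.linspace(0.0, 1.0, 64), np.ones(64))
lam_clean = patch_eigenvalues(clean)

# Adding N(0, 0.5^2) noise lifts the redundant eigenvalues to about 0.25.
rng = np.random.default_rng(0)
lam_noisy = patch_eigenvalues(clean + rng.normal(0.0, 0.5, clean.shape))
print(lam_clean[1] < 1e-8, float(np.median(lam_noisy[1:])))
```

The median of the trailing eigenvalues already sits near σ², which is the observation the rest of Sec. 2 formalizes.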

Given an additional matrix U ∈ R^{r×(r−m)}, we define a rotation matrix R = [A, U] which satisfies R^T R = I. R can be obtained efficiently by the eigen-decomposition of the covariance matrix Σ_x, just as in PCA. Thus, equation (2) can be rewritten as

    x_t = R [y_t; 0] + e_t,    (3)

where 0 denotes the column vector of size (r − m) × 1 with each element equal to 0. Multiplying both sides of equation (3) by R^T leads to the following equation:

    R^T x_t = [y_t; 0] + R^T e_t = [y_t + A^T e_t; U^T e_t].    (4)

It can be observed that the noise component e_t has been separated from the observable x_t by the rotation matrix R, and is concentrated in the (m+1)-th to r-th entries of the vector R^T x_t. Based on the properties of the Gaussian distribution, n_t = U^T e_t is a random vector following the Gaussian distribution N_{r−m}(0, σ²I).

As R consists of the eigenvectors of Σ_x, it satisfies

    Σ_x R = R Φ,    (5)

where Φ ∈ R^{r×r} is a diagonal matrix with the eigenvalues as its diagonal elements. Thus, the covariance matrix of R^T x_t can be calculated as

    (1/s) Σ_{t=1}^s (R^T x_t)(R^T x_t)^T = R^T [(1/s) Σ_{t=1}^s x_t x_t^T] R = R^T Σ_x R = diag(λ_1, λ_2, ..., λ_r),    (6)

which states that the i-th eigenvalue λ_i is also the variance calculated on the i-th dimension of the vector R^T x_t. From equation (4), we can see that the variances on the principal dimensions are expected to be larger than σ², while the variances calculated on the redundant dimensions are expected to be equal to σ². Given λ_1 ≥ λ_2 ≥ ... ≥ λ_r, we can partition S = {λ_i}_{i=1}^r into S_1 ∪ S_2, where S_1 = {λ_i}_{i=1}^m denotes the variances on the principal dimensions and S_2 = {λ_i}_{i=m+1}^r denotes the variances on the redundant dimensions.

Let n_t[i] denote the i-th element of the vector n_t = U^T e_t. The eigenvalues in the subset S_2 can then be written as

    λ_i = (1/s) Σ_{t=1}^s n_t[i]²,  ∀ i ∈ {m+1, m+2, ..., r},    (7)

where n_t[i] is a random variable following the Gaussian distribution N(0, σ²). As a function of random variables, each eigenvalue λ_i ∈ S_2 is itself a random variable and can be viewed as an estimate of the noise variance σ².

2.2. Variance of Finite Gaussian Variables

Both [19] and [23] take the minimum eigenvalue λ_r as the estimate of the noise variance σ². However, as shown in their experiments, their noise estimates are consistently smaller than the true noise variance σ². In this section, we give a brief theoretical analysis of this phenomenon.

Recalling from Sec. 2.1 that the eigenvalues of the redundant dimensions are variances of finitely many Gaussian variables, we first prove the following lemma, which shows that under certain conditions the eigenvalues of the redundant dimensions can be viewed as realizations of a particular Gaussian distribution.

Lemma 1. Given a set of random variables {n_t[i]}_{t=1}^s with each element independently following the Gaussian distribution N(0, σ²), the distribution of the noise estimate σ̂_i² = (1/s) Σ_{t=1}^s n_t[i]² converges to the distribution N(σ², 2σ⁴/s) when s is large.

Proof. Since n_t[i] is generated from the Gaussian distribution N(0, σ²), the random variable n_t[i]/σ follows the standard normal distribution N(0, 1). By the definition of the Chi-squared distribution, we get

    Σ_{t=1}^s (n_t[i]/σ)² ∼ χ_s².

Combined with equation (7), we obtain

    s σ̂_i² / σ² ∼ χ_s².

From the properties of the Chi-squared distribution,

    (χ_s² − s) / √(2s) →_d N(0, 1)    (8)

as s → ∞. Then we have

    √s (σ̂_i² − σ²) / (√2 σ²) →_d N(0, 1),

which implies that the noise estimate σ̂_i² also converges to a Gaussian distribution when s is large. Let w = √s (σ̂_i² − σ²)/(√2 σ²); then σ̂_i² = σ² + (√2 σ²/√s) w. Consequently, we can write the distribution of σ̂_i² as

    σ̂_i² ∼ N(σ², 2σ⁴/s).    (9)

Lemma 1 can be further demonstrated empirically by Monte Carlo simulation, where the sample variance is estimated 10⁶ times repeatedly. In each trial, s = 10³ samples are drawn from a normal distribution N(0, σ²) with σ² = 50².

Figure 2. The simulation results for the variance estimation, which show that the distribution of the variance can be well approximated by a Gaussian distribution.

Figure 3. The histogram of the eigenvalues of Fig. 1 (b). It can be observed that the eigenvalues follow a Gaussian distribution, despite several large outliers.

The histogram of the sample variance estimates over all trials is drawn in Fig. 2. The result coincides with Lemma 1 and shows that the histogram of the noise variance estimates σ̂_i² can be effectively approximated by a Gaussian distribution.

For the problem of noise estimation, the number of decomposed patches s = (M − d + 1)(N − d + 1) is typically large enough. With d = 8 by default, an observed image of resolution 512 × 512 yields s = (512 − 8 + 1)² = 255025 decomposed patches, which means that there are 255025 samples for estimating the eigenvalue of each redundant dimension. As illustrated in Fig. 2, 1000 samples are already enough for equation (8) to hold. Thus, from equation (7) together with Lemma 1, we can derive that the eigenvalues of the redundant dimensions {λ_i}_{i=m+1}^r are realizations of the Gaussian distribution N(σ², 2σ⁴/s).

Let Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−t²/2} dt denote the cumulative distribution function of the standard Gaussian distribution. Blom [25, 27] has proven the following theorem.

Theorem 1. Given n independent random variables following the Gaussian distribution N(µ, ν²), the expectation of the i-th smallest of them can be approximated by µ + Φ^{−1}((i − α)/(n − 2α + 1)) ν, where α is a constant (α ≈ 0.375).

2.3. Dimension Selection

As discussed earlier, a more accurate estimate can be obtained by taking all elements of S_2 into consideration, so that the problem of underestimating the noise variance can be avoided. Fig. 3 draws the histogram of the eigenvalues calculated from the noisy image of Fig. 1 (b), which consists of the eigenvalues of the redundant dimensions plus some upper outliers. However, the number of redundant dimensions is not known in advance. In this section, we propose a new algorithm for dimension selection, which is illustrated in Alg. 1. The number of principal dimensions is estimated by checking whether there are upper outliers in S_2 or not. Starting from the initial status S_1 = ∅ and S_2 = S, we remove the largest value in S_2 until there is no outlier in S_2 according to our criterion. A theoretical bound is provided to guarantee its performance, as stated in the following theorem.
Theorem 2. If all x_t ∈ X_s lie in an m-dimensional subspace and m satisfies the condition

    r − m > ((1 − 2α)δ(β) + α) / (1 − δ(β)),    (10)

where β = m/(r − m) and δ(β) = Φ(((1 + β)/β) Φ^{−1}((1 + β)/2)), then the expected mean value Eµ of the set S is larger than the median value of S when β > 0, and equals the median value of S when β = 0.

Proof. For the first statement, the expected mean µ of the set S can be bounded, using Theorem 1 for the largest of the r − m redundant eigenvalues, as

    Eµ = σ² + (1/r) Σ_{i=1}^m (Eλ_i − σ²)
       ≥ σ² + (β/(1 + β)) (Eλ_m − σ²)    (11)
       ≥ σ² + (β/(1 + β)) ν Φ^{−1}((r − m − α)/(r − m − 2α + 1)),

where ν = √2 σ²/√s. Let k = r − m denote the number of eigenvalues of the redundant dimensions. As proved in Lemma 1, there are k variables following the Gaussian distribution N(σ², ν²) in the dataset S. An arbitrary sample drawn from the Gaussian distribution N(σ², ν²) is smaller than the value Eµ with probability Φ((Eµ − σ²)/ν). Thus, in the dataset S, the expected number of samples smaller than Eµ is at least

    k Φ((Eµ − σ²)/ν) ≥ k Φ((β/(1 + β)) Φ^{−1}((k − α)/(k − 2α + 1))).    (12)

Based on the definition of Φ(x), we have 1 − δ(β) > 0. Thus, the given condition (10) is equivalent to

    (k − α)/(k − 2α + 1) > δ(β)
    ⟺ Φ^{−1}((k − α)/(k − 2α + 1)) > ((1 + β)/β) Φ^{−1}((1 + β)/2)
    ⟺ Φ((β/(1 + β)) Φ^{−1}((k − α)/(k − 2α + 1))) > (1 + β)/2
    ⟺ k Φ((β/(1 + β)) Φ^{−1}((k − α)/(k − 2α + 1))) > r/2,    (13)

where the last step uses r = k(1 + β). As Φ(x) is an increasing function, combining (12) and (13) shows that more than half of the r elements of S are expected to be smaller than Eµ; hence Eµ is larger than the median value of the set S when β > 0.

For the second statement, when β = 0 there exist no outliers in the dataset S; since N(σ², ν²) is a symmetric distribution, the mean value is expected to equal the median value of the dataset S.

In summary, we can decide whether there exist outliers in S by checking whether µ is its median value, provided that there are more than ((1 − 2α)δ(β) + α)/(1 − δ(β)) redundant dimensions in the dataset S whose eigenvalues follow the Gaussian distribution N(σ², ν²).

Figure 4. The blue line plots the function k = ((1 − 2α)δ(β) + α)/(1 − δ(β)) and the brown line plots the function k = 192(1 − β), indicating the relationship between k and β when the patch size is 8. The intersection point of the two curves is at 65%, which means that we only require 192(1 − β) = 67 elements following the Gaussian distribution N(σ², ν²) for an accurate estimation of the noise variance σ².

As illustrated in Fig. 4, the dimension of the subspace must be smaller than a threshold for the mean value to be estimated accurately. In this paper, the patch size is set to 8 by default, so the dimension r of X_s = {x_t}_{t=1}^s is 192. As shown in Fig. 4, our method is expected to recover the accurate mean value of the Gaussian distribution when the dimension of the subspace is smaller than m = 192 × (1 − β) = 67. During the procedure, the statements of Theorem 2 continue to hold, as the number of random variables k does not change, while β → 0 as the number of outliers m decreases, which leads to a lower requirement on the number k.

2.4. Implementation and Time Complexity

The core idea of our method is to obtain an accurate noise variance estimate by removing the eigenvalues of the principal dimensions S_1 from S. As shown in Alg. 1, we first decompose the observed image I into a set of patches with patch size d, and calculate the eigenvalues S = {λ_i}_{i=1}^r of the decomposed dataset X_s. Initialized with S_1 = ∅ and S_2 = S, our method uses the difference between the mean value µ and the median value ϕ of the subset S_2 to indicate whether there are outliers in S_2 or not. If µ ≠ ϕ, the largest value in S_2 is taken out and put into the subset S_1. This procedure stops once the condition µ = ϕ is achieved. A MATLAB version of the source code is included in the supplementary file.

According to the steps described in Alg. 1, we can analyze the time complexity of our method step by step. The complexity of generating the set X_s with s r-dimensional samples from the observed image I is O(sr). Calculating the mean vector µ and the covariance matrix Σ of the dataset X_s costs O(sr) and O(sr²) respectively. The eigen-decomposition of Σ is O(r³). Finally, the sorting process in

the 4th step consumes O(r²), and the checking procedure in steps 5-9 takes O(r²). Thus, the overall time complexity of our method is O(sr² + r³), which means that our algorithm is very fast and runs in polynomial time.

Algorithm 1 Estimating the Image Noise Level
Require: Observed image I ∈ R^{M×N×c}, patch size d.
1: Generate the dataset X_s = {x_t}_{t=1}^s, which contains s = (M − d + 1)(N − d + 1) patches with r = cd² elements each from the image I.
2: µ = (1/s) Σ_{t=1}^s x_t
3: Σ = (1/s) Σ_{t=1}^s (x_t − µ)(x_t − µ)^T
4: Calculate the eigenvalues {λ_i}_{i=1}^r of the covariance matrix Σ and order them as λ_1 ≥ λ_2 ≥ ... ≥ λ_r
5: for i = 1 : r do
6:     τ = (1/(r − i + 1)) Σ_{j=i}^r λ_j
7:     if τ is the median of the set {λ_j}_{j=i}^r then
8:         σ = √τ and break
9:     end if
10: end for
11: return the noise level estimate σ

3. Results and Discussions

To evaluate the performance of the proposed method, we apply it to two benchmark datasets: TID2008 [22] and the BSDS500 test set [1]. To further evaluate its performance on real noisy images, we also apply our method to 100 noisy images of a static scene captured by a commercial camera under low-light conditions. We compare its performance with that of two state-of-the-art methods [19, 23], whose source codes can be downloaded from their homepages 1 2. For a fair comparison, all methods are implemented in the environment of Matlab R2013a (Intel Core(TM) i7 CPU 920 2.67GHz × 4). For [19] and [23], we use the default parameters reported in their papers. As [19] estimates the noise level of each channel separately when handling color images, its final noise level estimate is obtained by averaging the estimates of the individual channels.

3.1. Parameter Configurations

As described in Alg. 1, the patch size d is the only free parameter that needs to be pre-specified. In this section, we mainly discuss the influence of d on the performance of our method.

For a patch size giving r = cd², there is a trade-off between the statistical significance of the result and the practical executing time. As there exist correlations between pixels in textured images, the patch size should be large enough to represent the texture patterns. Moreover, a larger patch size d leads to a higher admissible β value, as can be observed in Fig. 4, meaning that a larger patch size can handle a higher proportion of outliers. However, the patch size d cannot be arbitrarily large, since a larger patch size leads to a smaller sample size s of the generated dataset X_s = {x_t}_{t=1}^s, which contradicts the assumption made in Lemma 1 that s should be large. Moreover, as described previously, the time complexity of our method depends on the patch size through r = cd², so a larger patch size leads to a longer running time. In summary, a suitable patch size should be chosen with consideration of both the running speed and the assumption used in Lemma 1. In this paper, the patch size d is set to 8 by default for all experiments.
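For illustration, Alg. 1 can be sketched in Python as follows. This is a simplified single-channel reimplementation of ours (assuming NumPy), not the authors' MATLAB code; in particular, the exact-equality test "τ is the median" is replaced by the float-safe stopping rule mean ≤ median, since upper outliers pull the mean above the median:

```python
import numpy as np

def estimate_noise_level(img, d=8):
    """Estimate the noise standard deviation of a single-channel image
    by iteratively discarding the largest (principal) eigenvalues until
    the mean of the remaining ones no longer exceeds their median."""
    M, N = img.shape
    X = np.array([img[i:i + d, j:j + d].ravel()
                  for i in range(M - d + 1)
                  for j in range(N - d + 1)])                 # s x r, r = d*d
    cov = np.cov(X, rowvar=False, bias=True)
    lam = np.sort(np.linalg.eigvalsh(cov))[::-1]              # descending
    for i in range(lam.size):
        tau = lam[i:].mean()
        if tau <= np.median(lam[i:]):   # no upper outliers left in S_2
            return float(np.sqrt(max(tau, 0.0)))
    return float(np.sqrt(max(lam[-1], 0.0)))

rng = np.random.default_rng(0)
clean = np.outer(np.linspace(0.0, 255.0, 128), np.ones(128))  # rank-1 image
noisy = clean + rng.normal(0.0, 20.0, clean.shape)
print(estimate_noise_level(noisy))    # roughly 20
```

On this synthetic rank-1 image the estimate lands close to the true σ = 20, even though the smallest single eigenvalue would noticeably underestimate it.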
3.2. Noise Level Estimation Results

We add synthetic Gaussian noise with different known variances to each testing image, and estimate the noise level of the modified noisy image with the different algorithms. As discussed below, three measurements are used for the evaluation of performance. The effectiveness of our approach is finally evaluated by using the automatically estimated noise level as the input parameter of the BM3D denoising algorithm on both synthetic and real noisy images.

3.2.1 Performance Evaluation Criterion

Given an observed image I with noise level σ, each method can be considered as a function σ̂ = f(I), and the optimal estimator should minimize the expected square error (MSE), defined as

    E(f(I) − σ)².    (14)

No matter which method (function f) we use, we can decompose the mean square error (14) as

    MSE = Bias²(f(I)) + Std²(f(I)),    (15)

where Std(f(I)) = √(E[f(I) − E(f(I))]²) denotes the robustness of an estimator f(I) and Bias(f(I)) = |σ − E(f(I))| denotes its accuracy. Thus, in this paper, we utilize the following three measurements to evaluate the performance of the different estimators:

• Bias: evaluates the accuracy of an estimator.
• Std: evaluates the robustness of an estimator.
• √MSE: evaluates the overall performance of an estimator.

Note that a smaller Bias, Std or √MSE value means better performance.

1 http://www.ok.ctrl.titech.ac.jp/res/NLE/AWGNestimation.html
2 http://physics.medma.uni-heidelberg.de/cms/projects/132-pcanle
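As a concrete reading of equations (14)-(15), the three measurements can be computed from a batch of estimates as follows (a small Python sketch of ours, assuming NumPy; `estimator_scores` is a hypothetical helper name, not from the paper):

```python
import numpy as np

def estimator_scores(estimates, true_sigma):
    """Bias, Std and sqrt(MSE) of a set of noise-level estimates
    produced for the same ground-truth sigma."""
    est = np.asarray(estimates, dtype=float)
    bias = abs(true_sigma - est.mean())                  # accuracy
    std = est.std()                                      # robustness
    rmse = float(np.sqrt(np.mean((est - true_sigma) ** 2)))
    return bias, std, rmse

# Sanity check of the decomposition MSE = Bias^2 + Std^2.
bias, std, rmse = estimator_scores([9.5, 10.4, 10.1, 9.8], 10.0)
print(abs(rmse ** 2 - (bias ** 2 + std ** 2)) < 1e-12)   # True
```

The decomposition holds identically for any batch of estimates, which is why Bias and Std can be reported separately while √MSE summarizes both.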

Figure 5. The performance of the different methods on TID2008. Our method outperforms the other two algorithms in terms of both accuracy and robustness.

Method | min t(s) | mean t(s) | max t(s)
Proposed | 0.5409 | 0.5785 | 0.6435
Liu et al. | 3.5541 | 4.1901 | 4.9197
Pyatykh et al. | 3.3829 | 3.4462 | 5.0448

Table 1. Execution time of the different methods on TID2008. The proposed algorithm is the fastest.

3.2.2 Performance on TID2008

The TID2008 dataset has been widely used for the evaluation of full-reference image quality assessment metrics and contains 25 reference images without any compression. As described in [19, 23], the reference images of TID2008 still contain a small level of noise. Thus, the same experimental settings are applied in our experiments, where each estimation result is corrected by the equation σ_corr² = σ_est² − σ_ref², with σ_est² being the estimate given by each method and σ_ref² denoting the inherent noise contained in each reference image. To make a fair comparison, we take the σ_ref² calculated by [23] for each reference image.

As shown in Fig. 5, the proposed algorithm outperforms the other two algorithms in most cases by removing the upper outliers in the set of eigenvalues S. It can be observed that [23] outperforms our method when σ² = 25, but [23] is less robust in this situation, as shown in Fig. 5 (b). Note that the results of [19, 23] shown in Fig. 5 are consistent with the results reported in their original papers.

Furthermore, the average execution time of both [23] and [19] is nearly 4 seconds per image, as both of them need to select low-rank patches during their procedures. In comparison, the proposed method takes only about 0.5 seconds per image, which is almost 8 times faster than the previous methods. The specific executing time of each algorithm can be found in Tab. 1.

3.2.3 Performance on BSDS500

It has been argued that all images in the TID2008 dataset contain small or large homogeneous areas [23]. To further demonstrate the robustness of our method on images full of textures, we compare the performance of the different methods on a more challenging dataset, the 200 test images of BSDS500 [1]. Images in BSDS are acquired in different environments and contain rich features. Fig. 1 shows image #77062 from the test set of BSDS500, which contains rich texture information and makes it hard to identify homogeneous areas for noise estimation. To evaluate the performance of the different methods over a larger range of noise levels, we synthesize 5 noisy images with different noise levels from σ_n = 10 to 50 for each image. As illustrated in Fig. 6, our method significantly outperforms the other two algorithms under all three measurements. Moreover, we have compared our method with Gavish's work [10], and our method achieves better performance; e.g., on BSDS500 with σ_n = 50, the √MSE of Gavish's method is 1.612, while ours is 0.183.

3.2.4 Performance on a Real-world Dataset

Figure 7. (a) An example of the 100 images captured by a Nikon D5200. (b) An enlarged patch of this image. (c) The clean version of this patch.

We further evaluate the performance of our method on real-world noisy images. Following the experimental setting in [18], we capture 100 images of a static scene under low-light conditions with a Nikon D5200 (ISO 6400, exposure time 1/20 s and f/5), and the mean of the 100 images serves as the clean image. With the clean image and the 100 noisy images, the noise variance of each image can easily be estimated as the ground-truth variance; e.g., the ground-truth σ of Fig. 7 (a) is 5.9549. Then we apply both our method and the competing ones to each of the 100 images

Figure 6. The performance of the different methods on the test dataset of BSDS500. Our method outperforms the other two algorithms in terms of both accuracy and robustness.

Method Bias Std √MSE Proposed 0.1198 0.2234 0.2535 Liu et al. 3.3463 0.0957 3.3476 Pyatykh et al. 3.3919 0.0957 3.3932

Table 2. Performance of the different methods on the real-world dataset. The proposed algorithm significantly outperforms the others.

individually for noise estimation. Table 2 reports the experimental results: the proposed method significantly outperforms the other two. This shows that our method is well suited for practical noise estimation applications.

3.3. Image Denoising Results

Image denoising is a very important preliminary step for many computer vision methods. During the last decades, impressive improvements have been made in this area. However, the noise level σ is regarded as a known parameter in many algorithms with good performance. BM3D [6] is one such algorithm, requiring the noise level as an input parameter. In this section, we apply our noise level estimates in the application of BM3D image denoising.

All images in TID2008 and BSDS500 are used here to evaluate the effectiveness of using our noise level estimates as the input parameter of BM3D. As confirmed by Fig. 8 (a), the performance of BM3D with our estimated noise level is almost the same as that with the true noise level. Due to space limitations, we only use PSNR [11] to evaluate the denoising performance. A real-world denoising example using BM3D with our estimation result is demonstrated in Fig. 8 (b) and (c).

Figure 8. (a) compares the performance of BM3D between our estimated noise level and the true noise level. (b) and (c) show a real noisy image and the image denoised by BM3D + our method.

4. Conclusion and Future Works

In this paper, we propose an efficient method to estimate the noise variance from a single noisy image automatically, which is very important for many computer vision algorithms. The performance of this method is theoretically guaranteed. Furthermore, by comparing our approach with two state-of-the-art ones, we show that the accuracy of the proposed method is the best in most cases, with the least execution time.

There still exist several future works that we are interested in. In this paper, the noise of an image is assumed to be zero-mean additive Gaussian. However, the noise can be more complicated in real-world applications, which remains to be investigated.
Meanwhile, we are interested in developing efficient blind image denoising algorithms in the future.

Acknowledgments: This work was supported by a grant from the National Basic Research Program of China, 973 Program (Project no. 2015CB351706), and the Research Grants Council of Hong Kong (Project no. CUHK412513).

References

[1] P. Arbelaez, M. Maire, C. Fowlkes, and J. Malik. Contour detection and hierarchical image segmentation. IEEE Trans. Pattern Anal. Mach. Intell., 33(5):898–916, 2011.
[2] R. Bracho and A. C. Sanderson. Segmentation of images based on intensity gradient information. In CVPR, pages 19–23, 1985.
[3] A. Buades, B. Coll, and J.-M. Morel. A review of image denoising algorithms, with a new one. Multiscale Modeling & Simulation, 4(2):490–530, 2005.
[4] G. Chen, P.-A. Heng, and L. Xu. Projection-embedded BYY learning algorithm for Gaussian mixture-based clustering. In Applied Informatics, volume 1, pages 1–20. Springer, 2014.
[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising with block-matching and 3D filtering. In Electronic Imaging 2006, pages 606414–606414. International Society for Optics and Photonics, 2006.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Trans. Image Process., 16(8):2080–2095, 2007.
[7] M. Elad and M. Aharon. Image denoising via sparse and redundant representations over learned dictionaries. IEEE Trans. Image Process., 15(12):3736–3745, 2006.
[8] E. Elhamifar and R. Vidal. Sparse subspace clustering: Algorithm, theory, and applications. IEEE Trans. Pattern Anal. Mach. Intell., 35(11):2765–2781, 2013.
[9] W. T. Freeman, E. C. Pasztor, and O. T. Carmichael. Learning low-level vision. International Journal of Computer Vision, 40(1):25–47, 2000.
[10] M. Gavish and D. L. Donoho. The optimal hard threshold for singular values is 4/√3. IEEE Trans. Inf. Theory, 60(8):5040–5053, 2014.
[11] A. Hore and D. Ziou. Image quality metrics: PSNR vs. SSIM. In ICPR, pages 2366–2369, 2010.
[12] B. K. Horn and B. G. Schunck. Determining optical flow. In 1981 Technical Symposium East, pages 319–331. International Society for Optics and Photonics, 1981.
[13] J. Immerkaer. Fast noise variance estimation. Computer Vision and Image Understanding, 64(2):300–302, 1996.
[14] I. Jolliffe. Principal Component Analysis. Wiley Online Library, 2002.
[15] S. Kritchman and B. Nadler. Non-parametric detection of the number of signals: Hypothesis testing and random matrix theory. IEEE Trans. Signal Process., 57(10):3930–3941, 2009.
[16] M. Lebrun, A. Buades, and J.-M. Morel. A nonlocal Bayesian image denoising algorithm. SIAM Journal on Imaging Sciences, 6(3):1665–1688, 2013.
[17] J.-S. Lee. Refined filtering of image noise using local statistics. Computer Graphics and Image Processing, 15(4):380–389, 1981.
[18] C. Liu, W. T. Freeman, R. Szeliski, and S. B. Kang. Noise estimation from a single image. In CVPR, volume 1, pages 901–908, 2006.
[19] X. Liu, M. Tanaka, and M. Okutomi. Single-image noise level estimation for blind denoising. IEEE Trans. Image Process., 22(12):5226–5237, 2013.
[20] P. Meer, J. Jolion, and A. Rosenfeld. A fast parallel algorithm for blind estimation of noise variance. IEEE Trans. Pattern Anal. Mach. Intell., 12(2):216–223, 1990.
[21] S. Nakajima, R. Tomioka, M. Sugiyama, and S. D. Babacan. Perfect dimensionality recovery by variational Bayesian PCA. In Advances in Neural Information Processing Systems, pages 971–979, 2012.
[22] N. Ponomarenko, V. Lukin, A. Zelensky, K. Egiazarian, M. Carli, and F. Battisti. TID2008 - a database for evaluation of full-reference visual quality assessment metrics. Advances of Modern Radioelectronics, 10:30–45, 2009.
[23] S. Pyatykh, J. Hesser, and L. Zheng. Image noise level estimation by principal component analysis. IEEE Trans. Image Process., 22(2):687–699, 2013.
[24] K. Rank, M. Lendl, and R. Unbehauen. Estimation of image noise variance. IEE Proceedings - Vision, Image and Signal Processing, 146(2):80–84, 1999.
[25] J. Royston. Algorithm AS 177: Expected normal order statistics (exact and approximate). Applied Statistics, pages 161–165, 1982.
[26] H. Scharr and H. Spies. Accurate optical flow in noisy image sequences using flow adapted anisotropic diffusion. Signal Processing: Image Communication, 20(6):537–553, 2005.
[27] S. S. Shapiro and M. B. Wilk. An analysis of variance test for normality (complete samples). Biometrika, pages 591–611, 1965.
[28] M. O. Ulfarsson and V. Solo. Dimension estimation in noisy PCA with SURE and random matrix theory. IEEE Trans. Signal Process., 56(12):5804–5816, 2008.
[29] R. Vidal. A tutorial on subspace clustering. IEEE Signal Processing Magazine, 28(2):52–68, 2010.
