Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence (IJCAI-20)

Multichannel Image Denoising via Weighted Schatten p-norm Minimization

Xinjian Huang, Bo Du∗ and Weiwei Liu∗
School of Computer Science, Institute of Artificial Intelligence and National Engineering Research Center for Multimedia Software, Wuhan University, Wuhan, China
[email protected], [email protected], [email protected]

Abstract

The R, G and B channels of a color image generally have different noise statistical properties or noise strengths. It is thus problematic to apply grayscale image denoising algorithms directly to color image denoising. In this paper, based on the non-local self-similarity of an image and the different noise strengths across channels, we propose a Multi-Channel Weighted Schatten p-Norm Minimization (MCWSNM) model for RGB color image denoising. More specifically, considering a small local RGB patch in a noisy image, we first find its non-local similar cubic patches in a search window of appropriate size. These similar cubic patches are then vectorized and grouped to construct a noisy low-rank matrix, which can be recovered using the Schatten p-norm minimization framework. Moreover, a weight matrix is introduced to balance each channel's contribution to the final denoising result. The proposed MCWSNM can be solved via the alternating direction method of multipliers. The convergence property of the proposed method is also theoretically analyzed. Experiments conducted on both synthetic and real noisy color image datasets demonstrate highly competitive denoising performance, outperforming comparison algorithms, including several methods based on neural networks.

1 Introduction

Noise corruption is inevitable during the image acquisition process and may heavily degrade the visual quality of an acquired image. Image denoising is thus an essential preprocessing step in various image processing and computer vision tasks [Chatterjee and Milanfar, 2010; Ye et al., 2018; Wang et al., 2018b]; moreover, it is also an ideal test platform for evaluating image prior models and optimization methods [Roth and Black, 2005]. As a result, image denoising remains a challenging yet fundamental problem. Early denoising algorithms were mainly devised on the basis of filtering and transformation [Dabov et al., 2007b], such as the wavelet transform and the curvelet transform [Starck et al., 2002]. State-of-the-art denoising methods are mainly based on sparse representation [Dabov et al., 2007b], low-rank approximation [Gu et al., 2017; Xie et al., 2016], dictionary learning [Zhang and Aeron, 2016; Marsousi et al., 2014; Mairal et al., 2012], non-local self-similarity [Buades et al., 2005; Dong et al., 2013a; Hu et al., 2019] and neural networks [Liu et al., 2018; Zhang et al., 2017a; Zhang et al., 2017b; Zhang et al., 2018].

Current methods for RGB color image denoising can be categorized into three classes. The first involves applying grayscale image denoising algorithms to each channel in a channel-wise manner. However, these methods ignore the correlations between the R, G and B channels, meaning that unsatisfactory results may be obtained. The second is to transform the RGB color image into another color space [Dabov et al., 2007a]. This transform, however, may change the noise distribution of the original observation data and introduce artifacts. The third involves making full use of the correlation information across channels and conducting the denoising task on the R, G and B channels simultaneously [Zhang et al., 2017a].

Similar to the case of grayscale images, the noise in each channel can generally be regarded as additive Gaussian noise. However, the noise levels of the channels differ due to camera sensor characteristics and the imaging environment, such as fog, haze, illumination intensity, etc. Moreover, the on-board processing steps in camera pipelines may also introduce different noise strengths in different channels [Nam et al., 2016]. This reveals the difficulties and challenges faced by RGB color image denoising. Intuitively, if we apply a grayscale image denoising algorithm to color images in a band-wise manner without considering the mutual information and noise differences between channels, artifacts or false colors could be generated [Mairal et al., 2008]. An example of this is presented in Fig. 1. Here, we apply a representative low-rank-based method, WSNM [Xie et al., 2016], and our proposed method, MCWSNM (see Section 2), to the Kodak PhotoCD Dataset¹. It can be seen from the figure that WSNM retains a large amount of noise and, to some extent, introduces some artifacts. Thus, the issue of how noise differences between channels should be modeled is key to designing a good RGB color denoising algorithm.

∗Co-corresponding Author.
¹http://r0k.us/graphics/kodak/


Figure 1: Denoised results of "kodim07" in the Kodak PhotoCD Dataset with σr = 50, σg = 5 and σb = 20. WSNM performs in a band-wise manner; MCWSNM jointly processes the three channels and considers their noise differences simultaneously. ((a): Clean, (b): Noisy, (c): WSNM, (d): MCWSNM. It is best to zoom in on these images on-screen.)

One well-known color image denoising method is color block-matching 3D (CBM3D) [Dabov et al., 2007a]. This is a transform-domain method that may destroy the noise distribution across channels and introduce artifacts. The method in [Zhu et al., 2016] concatenates similar patches of the RGB channels into a long vector. However, this concatenation operation treats each channel equally and ignores the noise level differences across channels. For a long time, low-rank theory has been widely used in many fields, such as recommendation systems [Wang et al., 2018a], community search [Fang et al., 2020b; Fang et al., 2020a], multi-output tasks [Liu et al., 2019] and multi-label learning [Liu et al., 2017]. Recently, based on the non-local self-similarity of an image and the SVD, several low-rank-based denoising algorithms have been proposed, such as weighted nuclear norm minimization (WNNM) [Gu et al., 2017] and WSNM [Xie et al., 2016]. The nuclear norm and the Schatten p-norm [Xie et al., 2016] are two surrogates of the rank function [Xu et al., 2017a]. In WNNM and WSNM, singular values are assigned different weights using a weight vector. It has been proven that WNNM is a special case of WSNM and that WSNM outperforms WNNM. Multi-channel WNNM (MCWNNM) [Xu et al., 2017b] is an extension of WNNM from grayscale to color images that introduces a weight matrix to adjust each channel's contribution to the final result according to its noise level.

In this paper, based on the low-rank property of the non-local self-similar patches and the different noise strengths across channels, we propose a multi-channel weighted Schatten p-norm minimization (MCWSNM) model for RGB color image denoising. More specifically, considering a small local RGB patch in a noisy image, we first find its non-local similar cubic patches in a search window of appropriate size. These similar cubic patches are then vectorized and concatenated to construct a noisy low-rank matrix; Schatten p-norm minimization is then performed to recover the noise-free low-rank matrix. Moreover, in order to make full use of the redundant information across channels, a weight matrix is introduced to balance each channel's contribution to the final denoising result. This model can be solved via the alternating direction method of multipliers (ADMM) framework, and each variable can be updated with a closed-form solution. In addition, the convergence property of our proposed algorithm is also provided.

2 Our Proposed Method

2.1 Method Formulation

The image denoising task involves recovering a clean image x_c from its noisy observation y_c, where c ∈ {r, g, b} indexes the R, G and B channels. Here, we assume that the noisy image is corrupted by additive white Gaussian noise n_c with σ_c = (σ_r, σ_g, σ_b), where σ_r, σ_g, σ_b denote the noise standard deviations in the R, G and B channels, respectively. Mathematically, y_c = x_c + n_c. Low-rank-based denoising methods utilize the low-rank property of the grouped non-local similar patches; this grouping procedure is illustrated in Fig. 2.

Figure 2: Illustration of the grouped local patches.

Given a noisy RGB color image y_c, each local patch of size s × s × 3 is extracted and stretched into a vector y = (y_r^T, y_g^T, y_b^T)^T ∈ R^{3s²}, where y_r, y_g, y_b ∈ R^{s²} are the corresponding patches in the R, G and B channels. For each vectorized local patch y, we find its M most similar patches (including itself) by Euclidean distance in a search window of appropriate size around it. We then stack the M similar patch vectors column by column to obtain the noisy patch matrix Y = X + N ∈ R^{3s²×M}, where X and N are the corresponding clean and noise patch matrices, respectively. It is clear that Y is a low-rank matrix, and X can be recovered using a low-rank approximation algorithm.

As a nonconvex surrogate of the rank function, the weighted Schatten p-norm of a matrix Z ∈ R^{m×n} is defined as

||Z||_{w,Sp} = (Σ_{i=1}^{min{m,n}} w_i σ_i^p)^{1/p}, 0 < p < 1,

where w = (w_1, w_2, ..., w_{min{m,n}})^T is a non-negative weight vector and σ_i is the i-th singular value of Z. The weighted nuclear norm [Gu et al., 2017] is a special case of the weighted Schatten p-norm with power p = 1. It has been proven that the Schatten p-norm with 0 < p < 1 is a better approximation of the rank function than the nuclear norm [Zhang et al., 2013].

Intuitively, when denoising a noisy color image, if a channel is heavily polluted, its contribution to the final denoised result should be smaller, and vice versa. Thus, a weight matrix W determined by the noise strength of each channel is introduced. Based on the low-rank property of the non-local self-similarity, we propose the following multi-channel WSNM (MCWSNM) model to recover the local low-rank patch X:

min_X ||W(Y − X)||_F² + λ||X||_{w,Sp}^p,   (1)

where λ > 0 is a tradeoff parameter that balances the data fidelity term and the regularization term, W = diag(τ_r I, τ_g I, τ_b I) with I ∈ R^{s²×s²} the identity matrix, and (τ_r, τ_g, τ_b) = min{σ_r, σ_g, σ_b}/(σ_r, σ_g, σ_b).
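As a concrete sketch of these two ingredients (our own illustrative NumPy code, not part of the paper; the function names are ours), the weighted Schatten p-norm of a patch matrix and the channel weight matrix W can be computed as follows:

```python
import numpy as np

def weighted_schatten_p_norm(Z, w, p):
    """||Z||_{w,Sp} = (sum_i w_i * sigma_i^p)^(1/p) for 0 < p <= 1."""
    sigma = np.linalg.svd(Z, compute_uv=False)  # singular values, descending
    return float(np.sum(w * sigma**p)) ** (1.0 / p)

def channel_weight_matrix(sigmas, s):
    """W = diag(tau_r I, tau_g I, tau_b I) with tau_c = min(sigma)/sigma_c,
    where each identity block is s^2 x s^2 (s is the patch side length)."""
    sigmas = np.asarray(sigmas, dtype=float)
    taus = sigmas.min() / sigmas
    return np.diag(np.repeat(taus, s * s))
```

For σ = (40, 20, 30), the green channel (lowest noise) receives weight 1 while the red channel receives 0.5, so the fidelity term trusts the cleaner channels more.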


Algorithm 1: Solve min_X ||Y − X||_F² + λ||X||_{w,Sp}^p via GST
Input: Y, weights {w_i}_{i=1}^r, λ, p, K
1: Singular value decomposition Y = UΣV^T, Σ = diag{σ_1, σ_2, ..., σ_r};
2: for i = 1 : r do
3:   τ_p^GST(w_i, λ) = (2λw_i(1 − p))^{1/(2−p)} + λw_i p (2λw_i(1 − p))^{(p−1)/(2−p)};
4:   if |σ_i| ≤ τ_p^GST(w_i, λ) then
5:     δ_i = 0;
6:   else
7:     δ_i^{(0)} = |σ_i|;
8:     for k = 0, 1, ..., K do
9:       δ_i^{(k+1)} = |σ_i| − λw_i p (δ_i^{(k)})^{p−1};
10:    end for
11:    δ_i = sgn(σ_i) δ_i^{(k)};
12:  end if
13: end for
14: Δ = diag(δ_1, δ_2, ..., δ_r);
Output: X* = UΔV^T.

Algorithm 2: Solve MCWSNM via ADMM
Input: data Y, weight W, p, K1, µ > 1, tol > 0
1: Initialization: X_0 = Z_0 = 0, L_0 = 0, ρ_0 > 0, flag = False, k = 0;
2: while flag == False do
3:   Update X by (4);
4:   Update Z by solving (5);
5:   Update L: L_{k+1} = L_k + ρ_k(X_{k+1} − Z_{k+1});
6:   Update ρ: ρ_{k+1} = µρ_k;
7:   k = k + 1;
8:   if (convergence condition is satisfied) or (k ≥ K1) then
9:     flag = True;
10:  end if
11: end while
Output: X*.

2.2 Model Optimization

We apply the variable splitting method to the MCWSNM model (1). By introducing an auxiliary variable Z, model (1) can be reformulated as a linearly constrained optimization problem:

min_{X,Z} ||W(Y − X)||_F² + λ||Z||_{w,Sp}^p,  s.t. X = Z.   (2)

Because the two variables X and Z in problem (2) are separable, this problem can be solved using the ADMM framework. The augmented Lagrangian function of (2) is

L(X, Z, L, ρ) = ||W(Y − X)||_F² + λ||Z||_{w,Sp}^p + Tr(L^T(X − Z)) + (ρ/2)||X − Z||_F²,   (3)

where L is the augmented Lagrangian multiplier and ρ > 0 is the penalty parameter. By taking the derivatives of L with respect to X and Z and setting them to zero, the variables can be alternately updated as follows:

(1) X_{k+1} = (W^T W + (ρ_k/2) I)^{−1} (W^T W Y + (ρ_k/2) Z_k − (1/2) L_k).   (4)

(2) Z_{k+1} = arg min_Z (ρ_k/2) ||Z − (X_{k+1} + (1/ρ_k) L_k)||_F² + λ||Z||_{w,Sp}^p.   (5)

This is a weighted Schatten p-norm minimization problem. It was noted in [Xie et al., 2016] that if the weights follow a non-descending permutation, (5) can be equivalently transformed into independent nonconvex ℓ_p-norm subproblems, the global optima of which can be efficiently computed by the generalized soft-thresholding (GST) algorithm, summarized in Algorithm 1.

(3) L_{k+1} = L_k + ρ_k(X_{k+1} − Z_{k+1}).

(4) ρ_{k+1} = µρ_k, with µ > 1.

The above updating steps are repeated until either the convergence condition is satisfied or the number of iterations exceeds a preset K1. The ADMM algorithm is considered converged when 1) ||X_{k+1} − Z_{k+1}||_F ≤ tol, 2) ||X_{k+1} − X_k||_F ≤ tol and 3) ||Z_{k+1} − Z_k||_F ≤ tol are simultaneously satisfied, where tol is a small tolerance value. The complete updating procedure is summarized in Algorithm 2.

2.3 Convergence and Complexity

Although the proposed MCWSNM model (1) is nonconvex, the global optimum of each subproblem can be obtained thanks to the non-descending weight permutation required by GST. From Fig. 3, it can be seen that ||X_{k+1} − X_k||_2, ||Z_{k+1} − Z_k||_2 and ||X_{k+1} − Z_{k+1}||_2 simultaneously approach 0 during the iteration process. This curve is based on a synthetic experiment on the Kodak PhotoCD Dataset; the test image is "kodim01" (see the Experiments section for details).

Figure 3: The convergence curves of ||X_{k+1} − X_k||_2, ||Z_{k+1} − Z_k||_2 and ||X_{k+1} − Z_{k+1}||_2 for image "kodim01" in the Kodak Dataset.
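Putting the pieces together, the inner solver can be sketched as follows. This is our own illustrative implementation of Algorithm 1 (GST, applied per singular value with effective regularization λ/ρ_k in the Z-subproblem (5)) and Algorithm 2 (ADMM); variable names follow the paper, while the guard against over-shrinkage and all defaults are our assumptions, not the authors' reference code:

```python
import numpy as np

def gst(sigma, w, lam, p, K=4):
    """Generalized soft-thresholding for min_d 0.5*(d - sigma)^2 + lam*w*|d|^p."""
    tau = (2 * lam * w * (1 - p)) ** (1 / (2 - p)) \
        + lam * w * p * (2 * lam * w * (1 - p)) ** ((p - 1) / (2 - p))
    if abs(sigma) <= tau:
        return 0.0
    d = abs(sigma)
    for _ in range(K):                      # fixed-point iteration of Algorithm 1
        shrink = lam * w * p * d ** (p - 1)
        if shrink >= abs(sigma):            # guard: fully shrunk (our addition)
            return 0.0
        d = abs(sigma) - shrink
    return np.sign(sigma) * d

def mcwsnm_admm(Y, W, w, lam=0.6, p=0.999, rho=3.0, mu=1.05, K1=10, tol=1e-4):
    """ADMM sketch for min_X ||W(Y - X)||_F^2 + lam * ||X||_{w,Sp}^p."""
    X = np.zeros_like(Y); Z = np.zeros_like(Y); L = np.zeros_like(Y)
    WtW = W.T @ W
    I = np.eye(Y.shape[0])
    for _ in range(K1):
        X_prev, Z_prev = X, Z
        # (4): closed-form X-update
        X = np.linalg.solve(WtW + (rho / 2) * I, WtW @ Y + (rho / 2) * Z - L / 2)
        # (5): Z-update = weighted Schatten p-norm proximal step on singular values
        U, s, Vt = np.linalg.svd(X + L / rho, full_matrices=False)
        d = np.array([gst(si, wi, lam / rho, p) for si, wi in zip(s, w)])
        Z = (U * d) @ Vt
        L = L + rho * (X - Z)
        rho *= mu
        if max(np.linalg.norm(X - Z), np.linalg.norm(X - X_prev),
               np.linalg.norm(Z - Z_prev)) <= tol:
            break
    return X
```

With p close to 1 (as used in the experiments), GST behaves almost like soft-thresholding, so each singular value is shrunk by roughly λw_i/ρ_k.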


We next provide a theorem that theoretically guarantees the convergence of Algorithm 2.

Theorem 1. Assume that the weights w are sorted in non-descending order and that the parameter ρ_k is unbounded. Then the sequences {X_k}, {Z_k} and {L_k} generated by Algorithm 2 satisfy: (a) lim_{k→+∞} ||X_{k+1} − Z_{k+1}||_F = 0; (b) lim_{k→+∞} ||X_{k+1} − X_k||_F = 0; (c) lim_{k→+∞} ||Z_{k+1} − Z_k||_F = 0.

Proof. We first prove that the sequence {L_k} generated by Algorithm 2 is upper bounded. Let U_k Λ_k V_k^T denote the SVD of the matrix (1/ρ_k) L_k + X_{k+1} in the (k+1)-th iteration, where Λ_k is the diagonal singular value matrix. By using the GST algorithm for WSNM, we have Z_{k+1} = U_k Δ_k V_k^T, where Δ_k = diag(δ_k^1, δ_k^2, ..., δ_k^n) is the diagonal singular value matrix after the generalized soft-thresholding operation. Then:

||L_{k+1}||_F² = ||L_k + ρ_k(X_{k+1} − Z_{k+1})||_F²
= ρ_k² ||(1/ρ_k)L_k + X_{k+1} − Z_{k+1}||_F²
= ρ_k² ||U_k Λ_k V_k^T − U_k Δ_k V_k^T||_F²
= ρ_k² ||Λ_k − Δ_k||_F²
≤ ρ_k² Σ_{i=1}^r (w_i/ρ_k)² = Σ_{i=1}^r w_i².

Hence, the sequence {L_k} is upper bounded.

We then prove that the sequence of augmented Lagrangian values {L(X_{k+1}, Z_{k+1}, L_{k+1}, ρ_{k+1})} is also upper bounded. The inequality L(X_{k+1}, Z_{k+1}, L_k, ρ_k) ≤ L(X_k, Z_k, L_k, ρ_k) always holds, since X and Z are updated to the globally optimal solutions of their corresponding subproblems (steps 3 and 4 in Algorithm 2). Based on the updating rule of L, it follows that

L(X_{k+1}, Z_{k+1}, L_{k+1}, ρ_{k+1})
= ||W(Y − X_{k+1})||_F² + λ||Z_{k+1}||_{w,Sp}^p + ⟨L_{k+1}, X_{k+1} − Z_{k+1}⟩ + (ρ_{k+1}/2)||X_{k+1} − Z_{k+1}||_F²
= L(X_{k+1}, Z_{k+1}, L_k, ρ_k) + ⟨L_{k+1} − L_k, X_{k+1} − Z_{k+1}⟩ + ((ρ_{k+1} − ρ_k)/2)||X_{k+1} − Z_{k+1}||_F²
= L(X_{k+1}, Z_{k+1}, L_k, ρ_k) + ⟨L_{k+1} − L_k, (L_{k+1} − L_k)/ρ_k⟩ + ((ρ_{k+1} − ρ_k)/2)||(L_{k+1} − L_k)/ρ_k||_F²
= L(X_{k+1}, Z_{k+1}, L_k, ρ_k) + ((ρ_{k+1} + ρ_k)/(2ρ_k²))||L_{k+1} − L_k||_F².

Since {L_k} is upper bounded, the sequence {L_{k+1} − L_k} is also upper bounded. Denote by a an upper bound, i.e., ||L_{k+1} − L_k||_F ≤ a for all k ≥ 0. We therefore conclude that

L(X_{k+1}, Z_{k+1}, L_{k+1}, ρ_{k+1})
≤ L(X_{k+1}, Z_{k+1}, L_k, ρ_k) + ((ρ_{k+1} + ρ_k)/(2ρ_k²)) a²
≤ L(X_1, Z_1, L_0, ρ_0) + a² Σ_{k=0}^∞ (ρ_{k+1} + ρ_k)/(2ρ_k²)
= L(X_1, Z_1, L_0, ρ_0) + a² Σ_{k=0}^∞ (1 + µ)/(2ρ_0 µ^k)
≤ L(X_1, Z_1, L_0, ρ_0) + (a²/ρ_0) Σ_{k=0}^∞ 1/µ^{k−1}
< +∞.

Thus, {L(X_{k+1}, Z_{k+1}, L_{k+1}, ρ_{k+1})} is upper bounded.

We next prove that the sequences {X_k} and {Z_k} are upper bounded. Since {L(X_k, Z_k, L_k, ρ_k)} and {L_k} are upper bounded, and

||W(Y − X_k)||_F² + λ||Z_k||_{w,Sp}^p
= L(X_k, Z_k, L_{k−1}, ρ_{k−1}) − ⟨L_{k−1}, X_k − Z_k⟩ − (ρ_{k−1}/2)||X_k − Z_k||_F²
= L(X_k, Z_k, L_{k−1}, ρ_{k−1}) − ⟨L_{k−1}, (L_k − L_{k−1})/ρ_{k−1}⟩ − (ρ_{k−1}/2)||(L_k − L_{k−1})/ρ_{k−1}||_F²
= L(X_k, Z_k, L_{k−1}, ρ_{k−1}) + (||L_{k−1}||_F² − ||L_k||_F²)/(2ρ_{k−1}),

the sequences {W(Y − X_k)} and {Z_k} are upper bounded. Since L_{k+1} = L_k + ρ_k(X_{k+1} − Z_{k+1}), {X_k} is also upper bounded. Thus, there exists at least one accumulation point of {X_k, Z_k}. More specifically, we obtain

lim_{k→+∞} ||X_{k+1} − Z_{k+1}||_F = lim_{k→+∞} (1/ρ_k)||L_{k+1} − L_k||_F = 0,

and the accumulation point is a feasible solution of the objective function. Thus, statement (a) is proved.

Finally, we prove that the change of {X_k} and {Z_k} between adjacent iterations tends to 0. For X_{k+1} and X_k = (1/ρ_{k−1})(L_k − L_{k−1}) + Z_k, we have

lim_{k→∞} ||X_{k+1} − X_k||_F
= lim_{k→∞} ||(W^T W + (ρ_k/2)I)^{−1}(W^T W Y + (ρ_k/2)Z_k − (1/2)L_k) − (1/ρ_{k−1})(L_k − L_{k−1}) − Z_k||_F
= lim_{k→∞} ||(W^T W + (ρ_k/2)I)^{−1}(W^T W Y − W^T W Z_k − (1/2)L_k) − (1/ρ_{k−1})(L_k − L_{k−1})||_F
≤ lim_{k→∞} ||(W^T W + (ρ_k/2)I)^{−1}(W^T W Y − W^T W Z_k − (1/2)L_k)||_F + ||(1/ρ_{k−1})(L_k − L_{k−1})||_F
= 0.
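The summability step in the Lagrangian bound can be checked explicitly: with ρ_k = ρ_0 µ^k and µ > 1, the series is geometric,

```latex
\sum_{k=0}^{\infty}\frac{\rho_{k+1}+\rho_k}{2\rho_k^{2}}
  = \sum_{k=0}^{\infty}\frac{(1+\mu)\,\rho_k}{2\rho_k^{2}}
  = \frac{1+\mu}{2\rho_0}\sum_{k=0}^{\infty}\mu^{-k}
  = \frac{(1+\mu)\,\mu}{2\rho_0(\mu-1)} < +\infty,
```

so the accumulated increase of the augmented Lagrangian over all iterations is finite.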


Since Z_{k+1} = (1/ρ_k)L_k − (1/ρ_k)L_{k+1} + X_{k+1}, we conclude that

lim_{k→∞} ||Z_{k+1} − Z_k||_F
= lim_{k→∞} ||(1/ρ_k)L_k − (1/ρ_k)L_{k+1} + X_{k+1} − Z_k||_F
≤ lim_{k→∞} ||X_k − Z_k||_F + ||X_{k+1} − X_k||_F + ||(1/ρ_k)(L_k − L_{k+1})||_F
= 0.

This completes the proof.

The main computational cost of a single iteration of Algorithm 2 consists of two parts. The first part involves updating X, whose time complexity is O(max{s⁴M, M³}). The second part involves updating Z; the predominant costs of this update are the SVD and the calculation of X*, whose complexity is O(min{s⁴M, s²M²} + s²r⁴M). The costs of updating L and ρ can be ignored. Therefore, the total time complexity of our MCWSNM for solving problem (1) is about O(K₁M³).

3 Experiments

To illustrate the performance of the proposed MCWSNM, we evaluate it on synthetic and real color noisy images. We also compare the proposed method with several recently proposed methods, including NC [Lebrun et al., 2015], NCSR [Dong et al., 2013b], PGPD [Xu et al., 2015], MCWNNM [Xu et al., 2017b], DnCNN [Zhang et al., 2017a], FFDNet [Zhang et al., 2018] and IRCNN [Zhang et al., 2017b]. In more detail, NC is an online blind image denoising platform, NCSR and PGPD are designed for grayscale images, MCWNNM is a color image denoising method, and DnCNN, FFDNet and IRCNN are three methods based on convolutional neural networks. All the parameters of the comparison algorithms are either optimally assigned or chosen as described in the reference papers.

The noise level must be provided as a parameter to MCWSNM and most of the comparison methods. In the synthetic experiments, the noise levels (σr, σg, σb) in the R, G and B channels are assumed to be known. In the experiments involving real noise, we utilize the noise estimation method of [Chen et al., 2015] to estimate the noise level of each channel of the noisy image. We apply the grayscale denoising methods, NCSR and PGPD, to the color noisy images in a band-wise manner with the corresponding channel noise level.

3.1 Synthetic Noisy Color Image Experiments

The Kodak PhotoCD Dataset is first utilized in our synthetic experiments. It includes 24 color images, each of which is either 768 × 512 or 512 × 768 in size. The noisy images are generated by adding zero-mean Gaussian noise with σr = 40, σg = 20 and σb = 30. In MCWSNM, we set the local search window size of each patch as 20, the similar patch number M = 70, the patch size s = 6, K1 = 10, K = 4, c = 2√2, λ = 0.6, p = 0.999 and ρ = 3.

The line graph of the PSNR results for MCWSNM and the comparison methods is presented in Fig. 4. It can be seen from this graph that our proposed method outperforms the other competing methods in most cases. Moreover, Fig. 5 and Fig. 6 show the denoised results of "kodim1" and "kodim3" in the Kodak PhotoCD Dataset, respectively.

Figure 4: Line graph of the denoising results on the Kodak Dataset.

Figure 5: Denoised images of "kodim1" in the Kodak PhotoCD Dataset with σr = 40, σg = 20 and σb = 30. ((a): Original, (b): Noisy, (c): NC, (d): NCSR, (e): PGPD, (f): MCWNNM, (g): DnCNN, (h): FFDNet, (i): IRCNN, (j): MCWSNM).

Figure 6: Denoised images of cropped "kodim3" in the Kodak PhotoCD Dataset with σr = 40, σg = 20 and σb = 30.
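The synthetic corruption and the PSNR metric used throughout this section are straightforward to reproduce; the following sketch (our code, not the authors') adds independent zero-mean Gaussian noise per channel and evaluates PSNR on 8-bit images:

```python
import numpy as np

def add_channel_noise(img, sigmas=(40.0, 20.0, 30.0), seed=0):
    """Corrupt an H x W x 3 image with N(0, sigma_c^2) noise in each channel."""
    rng = np.random.default_rng(seed)
    noisy = img.astype(float).copy()
    for c, s in enumerate(sigmas):
        noisy[..., c] += rng.normal(0.0, s, size=img.shape[:2])
    return noisy

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```

Note that the three channels receive independent noise realizations with different standard deviations, which is exactly the setting that band-wise grayscale denoisers mishandle.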


Visual inspection reveals that MCWSNM obtains the best visual results and clear texture information. Color artifacts are generated by NCSR and PGPD, while NC over-smooths the image. Although MCWNNM, DnCNN, FFDNet and IRCNN can produce a clean image, some of the detailed information is lost compared with the original image.

3.2 Real Noisy Color Image Experiments

We evaluate MCWSNM on a real noisy color image dataset, CC [Nam et al., 2016]. This dataset includes 11 static scenes. The noisy images were captured in an indoor environment with different cameras and camera settings. For each scene, 500 images were taken using the same camera and camera settings. The mean image of each scene was then computed to generate the noise-free image, which is roughly regarded as the ground truth (GT). As the original images are very large, 60 cropped smaller images of size 512 × 512 were provided in [Nam et al., 2016]. In this paper, we select 10 of these cropped images (see Fig. 7) for the experiment.

For MCWSNM, we set the local search window size for each patch as 20, the similar patch number M = 60, the patch size s = 5, K1 = 10, K = 4, c = 2√2, λ = 0.6, p = 0.999 and ρ = 3. Because the GT images are provided, a quantitative assessment of each method is possible. PSNR results for the different cameras and camera settings are provided in Table 1. It can be seen from the table that the proposed MCWSNM obtains the highest PSNR value on most noisy images. Fig. 8 and Fig. 9 show the visual results on images #7 and #10 of the CC dataset, respectively. Compared with the other methods, MCWSNM achieves the best visual effect: it not only removes the noise completely but also preserves the detailed information effectively.

Figure 7: Ten noisy images (#1–#10) in the CC Dataset.

Figure 8: Denoised images of cropped #7 in the real CC Dataset ((a): GT, (b): Noisy, (c): NC, (d): NCSR, (e): PGPD, (f): MCWNNM, (g): DnCNN, (h): FFDNet, (i): IRCNN, (j): MCWSNM).

Figure 9: Denoised images of cropped image #10 in the real CC Dataset (same layout as Fig. 8).

Figure 10: The influence of changing p on the denoised results under different noise levels (σr, σg, σb) on the 24 images in the Kodak PhotoCD Dataset. Panels: (a) (10, 15, 20), (b) (30, 35, 40), (c) (40, 45, 50), (d) (50, 55, 60), (e) (60, 65, 70), (f) (80, 85, 90).
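In the real-noise setting, the per-channel σ must be estimated from the noisy image itself. The paper uses the estimator of [Chen et al., 2015]; as a simple stand-in, a median-absolute-deviation estimate on horizontal pixel differences (a classical robust trick, explicitly not the method of [Chen et al., 2015]) gives a rough per-channel noise level:

```python
import numpy as np

def estimate_channel_sigmas(noisy):
    """Rough per-channel noise std via MAD of horizontal differences.
    For i.i.d. Gaussian noise, diff = n1 - n2 ~ N(0, 2*sigma^2), so
    sigma ~= MAD(diff) / (0.6745 * sqrt(2))."""
    sigmas = []
    for c in range(noisy.shape[2]):
        d = np.diff(noisy[..., c], axis=1).ravel()
        mad = np.median(np.abs(d - np.median(d)))
        sigmas.append(mad / (0.6745 * np.sqrt(2.0)))
    return tuple(sigmas)
```

On textured images the differences also contain signal, so this overestimates σ; a dedicated estimator such as [Chen et al., 2015] is preferable in practice.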


Camera Settings        #   NC       NCSR     PGPD     MCWNNM   DnCNN    FFDNet   IRCNN    MCWSNM
Canon 5D, ISO=3200     1   38.5689  37.8693  36.8979  40.8160  40.8445  40.6137  41.3212  42.1242
                       2   35.0985  34.7144  33.8614  36.5158  35.7199  35.6380  36.0826  35.8209
Nikon D600, ISO=3200   3   36.8310  36.4562  35.8916  38.9327  38.9287  38.7748  39.3098  37.9467
                       4   38.6322  36.3762  35.9296  40.1002  39.3217  38.7881  40.4201  40.4434
Nikon D800, ISO=1600   5   38.2259  36.8040  36.2972  39.4494  39.8094  39.5719  39.9118  40.0471
                       6   39.0023  37.3847  36.4443  42.3815  40.6930  40.2738  42.8365  41.9094
Nikon D800, ISO=3200   7   36.5622  34.2501  33.9318  39.5804  36.7582  36.2557  38.9257  41.5596
                       8   35.9500  33.8988  33.5135  36.7548  36.0972  35.8297  38.2784  37.8922
Nikon D600, ISO=6400   9   33.2295  31.1022  30.7797  36.5241  32.1810  31.9962  33.4272  38.2520
                       10  32.2609  30.6858  30.4058  32.2105  31.6635  31.4748  32.7171  32.4324

Table 1: PSNR results (dB) on the real noisy color image CC Dataset.
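As a quick sanity check on Table 1, the per-method averages over the ten images can be computed directly from the reported numbers (values transcribed from Table 1; only the three strongest columns are shown here for brevity):

```python
# PSNR values (dB) per image, transcribed from Table 1.
table = {
    "MCWNNM": [40.8160, 36.5158, 38.9327, 40.1002, 39.4494,
               42.3815, 39.5804, 36.7548, 36.5241, 32.2105],
    "IRCNN":  [41.3212, 36.0826, 39.3098, 40.4201, 39.9118,
               42.8365, 38.9257, 38.2784, 33.4272, 32.7171],
    "MCWSNM": [42.1242, 35.8209, 37.9467, 40.4434, 40.0471,
               41.9094, 41.5596, 37.8922, 38.2520, 32.4324],
}
means = {method: sum(vals) / len(vals) for method, vals in table.items()}
```

On these ten images, MCWSNM attains the highest average PSNR of the three (about 38.84 dB, versus roughly 38.33 dB for MCWNNM and 38.32 dB for IRCNN).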

3.3 Analysis of Power p

It is necessary to analyze the most appropriate setting of the power p for different noise levels (σr, σg, σb). We utilize the 24 images in the Kodak PhotoCD Dataset to test the proposed MCWSNM with different p under different noise levels added to the R, G and B channels. The results are presented in Fig. 10. In each subfigure, the vertical coordinate represents the average PSNR value under a certain noise level, while the horizontal coordinate denotes the value of p, varying from 0.1 to 1 with interval 0.05. We use six levels in this test: σc = {(10, 15, 20), (30, 35, 40), ..., (60, 65, 70), (80, 85, 90)}. It is clear from the figure that at low and medium noise levels the best value of p is 1, while at high noise levels the best value of p is 0.1. This is mainly because, in the strong noise case (which means more rank components of the data are contaminated), highly ranked parts should be penalized heavily, while lower-ranked parts should be penalized less.

4 Conclusion

In this paper, based on the low-rank property of non-local self-similarity, we propose the MCWSNM method for color image denoising. For noisy color images, which generally exhibit different noise strengths in each band, a weight matrix determined by the noise level of each channel is introduced in order to balance each channel's contribution to the final estimation result. MCWSNM can be efficiently solved via the ADMM optimization framework. Theorem 1 theoretically establishes the convergence property of our proposed algorithm. Experiments on synthetic and real datasets demonstrate that the proposed method obtains satisfactory results on the color image denoising task.

Acknowledgments

This work was supported in part by the National Natural Science Foundation of China under Grants 61822113, 61976161 and 62041105, the Science and Technology Major Project of Hubei Province (Next-Generation AI Technologies) under Grant 2019AEA170, the Natural Science Foundation of Hubei Province under Grant 2018CFA050, and the Fundamental Research Funds for the Central Universities under Grants 413000092 and 413000082.

References

[Buades et al., 2005] Antoni Buades, Bartomeu Coll, and Jean-Michel Morel. A non-local algorithm for image denoising. In CVPR, pages 60–65, 2005.

[Chatterjee and Milanfar, 2010] Priyam Chatterjee and Peyman Milanfar. Is denoising dead? IEEE Transactions on Image Processing, 19(4):895–911, 2010.

[Chen et al., 2015] Guangyong Chen, Fengyuan Zhu, and Pheng-Ann Heng. An efficient statistical method for image noise level estimation. In ICCV, pages 447–485, 2015.

[Dabov et al., 2007a] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian. Color image denoising via sparse 3D collaborative filtering with grouping constraint in luminance-chrominance space. In ICIP, pages 313–316, 2007.

[Dabov et al., 2007b] Kostadin Dabov, Alessandro Foi, Vladimir Katkovnik, and Karen O. Egiazarian. Image denoising by sparse 3-D transform-domain collaborative filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.

[Dong et al., 2013a] Weisheng Dong, Guangming Shi, and Xin Li. Nonlocal image restoration with bilateral variance estimation: A low-rank approach. IEEE Transactions on Image Processing, 22(2):700–711, 2013.

[Dong et al., 2013b] Weisheng Dong, Lei Zhang, Guangming Shi, and Xin Li. Nonlocally centralized sparse representation for image restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.

[Fang et al., 2020a] Yixiang Fang, Xin Huang, Lu Qin, Ying Zhang, Wenjie Zhang, Reynold Cheng, and Xuemin Lin. A survey of community search over big graphs. The VLDB Journal, 29:353–392, 2020.

[Fang et al., 2020b] Yixiang Fang, Yixing Yang, Wenjie Zhang, Xuemin Lin, and Xin Gao. Effective and efficient community search over large heterogeneous information networks. Proceedings of the VLDB Endowment, 13(6):854–867, 2020.

[Gu et al., 2017] Shuhang Gu, Qi Xie, Deyu Meng, Wangmeng Zuo, Xiangchu Feng, and Lei Zhang. Weighted nuclear norm minimization and its applications to low level vision. International Journal of Computer Vision, 121(2):183–208, 2017.

[Hu et al., 2019] Zhentao Hu, Zhiqiang Huang, Xinjian Huang, Fulin Luo, and Renzhen Ye. An adaptive nonlocal Gaussian prior for hyperspectral image denoising. IEEE Geoscience and Remote Sensing Letters, 16(9):1487–1491, 2019.

[Lebrun et al., 2015] Marc Lebrun, Miguel Colom, and Jean-Michel Morel. The noise clinic: A blind image denoising algorithm. IPOL Journal, 5:1–54, 2015.

[Liu et al., 2017] Weiwei Liu, Ivor W. Tsang, and Klaus-Robert Müller. An easy-to-hard learning paradigm for multiple classes and multiple labels. Journal of Machine Learning Research, 18(94):1–38, 2017.

[Liu et al., 2018] Ding Liu, Bihan Wen, Xianming Liu, Zhangyang Wang, and Thomas S. Huang. When image denoising meets high-level vision tasks: A deep learning approach. In IJCAI, pages 842–848, 2018.

[Liu et al., 2019] Weiwei Liu, Donna Xu, Ivor W. Tsang, and Wenjie Zhang. Metric learning for multi-output tasks. IEEE Transactions on Pattern Analysis and Machine Intelligence, 42(2):408–422, 2019.

[Mairal et al., 2008] Julien Mairal, Michael Elad, and Guillermo Sapiro. Sparse representation for color image restoration. IEEE Transactions on Image Processing, 17(1):53–69, 2008.

[Mairal et al., 2012] Julien Mairal, Francis R. Bach, and Jean Ponce. Task-driven dictionary learning. IEEE Transactions on Pattern Analysis and Machine Intelligence, 34(4):791–804, 2012.

[Marsousi et al., 2014] Mahdi Marsousi, Kaveh Abhari, Paul Babyn, and Javad Alirezaie. An adaptive approach to learn overcomplete dictionaries with efficient numbers of elements. IEEE Transactions on Signal Processing, 62(12):3272–3283, 2014.

[Nam et al., 2016] Seonghyeon Nam, Youngbae Hwang, Yasuyuki Matsushita, and Seon Joo Kim. A holistic approach to cross-channel image noise modeling and its application to image denoising. In CVPR, pages 1683–1691, 2016.

[Roth and Black, 2005] Stefan Roth and Michael J. Black. Fields of experts: A framework for learning image priors. In CVPR, volume 2, pages 860–867, 2005.

[Starck et al., 2002] Jean-Luc Starck, Emmanuel J. Candès, and David L. Donoho. The curvelet transform for image denoising. IEEE Transactions on Image Processing, 11(6):670–684, 2002.

[Wang et al., 2018a] Zengmao Wang, Yuhong Guo, and Bo Du. Matrix completion with preference ranking for top-N recommendation. In IJCAI, pages 3585–3591, 2018.

[Wang et al., 2018b] Zheng Wang, Xiang Bai, Mang Ye, and Shin'ichi Satoh. Incremental deep hidden attribute learning. In ACM MM, pages 72–80, 2018.

[Xie et al., 2016] Yuan Xie, Shuhang Gu, Yan Liu, Wangmeng Zuo, Wensheng Zhang, and Lei Zhang. Weighted Schatten p-norm minimization for image denoising and background subtraction. IEEE Transactions on Image Processing, 25(10):4842–4857, 2016.

[Xu et al., 2015] Jun Xu, Lei Zhang, Wangmeng Zuo, David Zhang, and Xiangchu Feng. Patch group based nonlocal self-similarity prior learning for image denoising. In ICCV, pages 244–252, 2015.

[Xu et al., 2017a] Chen Xu, Zhouchen Lin, and Hongbin Zha. A unified convex surrogate for the Schatten-p norm. In AAAI, pages 926–932, 2017.

[Xu et al., 2017b] Jun Xu, Lei Zhang, David Zhang, and Xiangchu Feng. Multi-channel weighted nuclear norm minimization for real color image denoising. In ICCV, pages 1096–1104, 2017.

[Ye et al., 2018] Mang Ye, Zheng Wang, Xiangyuan Lan, and Pong C. Yuen. Visible thermal person re-identification via dual-constrained top-ranking. In IJCAI, pages 1092–1099, 2018.

[Zhang and Aeron, 2016] Zemin Zhang and Shuchin Aeron. Denoising and completion of 3D data via multidimensional dictionary learning. In IJCAI, pages 2371–2377, 2016.

[Zhang et al., 2013] Min Zhang, Zheng-Hai Huang, and Ying Zhang. Restricted p-isometry properties of nonconvex matrix recovery. IEEE Transactions on Information Theory, 59(7):4316–4323, 2013.

[Zhang et al., 2017a] Kai Zhang, Wangmeng Zuo, Yunjin Chen, Deyu Meng, and Lei Zhang. Beyond a Gaussian denoiser: Residual learning of deep CNN for image denoising. IEEE Transactions on Image Processing, 26(7):3142–3155, 2017.

[Zhang et al., 2017b] Kai Zhang, Wangmeng Zuo, Shuhang Gu, and Lei Zhang. Learning deep CNN denoiser prior for image restoration. In CVPR, pages 2808–2817, 2017.

[Zhang et al., 2018] Kai Zhang, Wangmeng Zuo, and Lei Zhang. FFDNet: Toward a fast and flexible solution for CNN-based image denoising. IEEE Transactions on Image Processing, 27(9):4608–4622, 2018.

[Zhu et al., 2016] Fengyuan Zhu, Guangyong Chen, and Pheng-Ann Heng. From noise modeling to blind image denoising. In CVPR, pages 420–429, 2016.
