Deep Gaussian Conditional Random Field Network: A Model-based Deep Network for Discriminative Denoising

Raviteja Vemulapalli
Center for Automation Research, UMIACS, University of Maryland, College Park

Oncel Tuzel, Ming-Yu Liu
Mitsubishi Electric Research Laboratories, Cambridge, MA

Abstract

We propose a novel end-to-end trainable deep network architecture for image denoising based on a Gaussian Conditional Random Field (GCRF) model. In contrast to the existing discriminative denoising methods that train a separate model for each individual noise level, the proposed deep network explicitly models the input noise variance and hence is capable of handling a range of noise levels. Our deep network, which we refer to as the deep GCRF network, consists of two sub-networks: (i) a parameter generation network that generates the pairwise potential parameters based on the noisy input image, and (ii) an inference network whose layers perform the computations involved in an iterative GCRF inference procedure. We train two deep GCRF networks (each network operates over a range of noise levels: one for low input noise levels and one for high input noise levels) discriminatively by maximizing the peak signal-to-noise ratio measure. Experiments on the Berkeley segmentation and PASCAL VOC datasets show that the proposed approach produces results on par with the state-of-the-art without training a separate network for each individual noise level.

1. Introduction

In the recent past, deep networks have been successfully used in various image processing and computer vision applications [3, 12, 34]. Their success can be attributed to several factors such as their ability to represent complex input-output relationships, the feed-forward nature of their inference (no need to solve an optimization problem during run time), the availability of large training datasets, etc. One of the positive aspects of deep networks is that fairly general architectures composed of fully-connected or convolutional layers have been shown to work reasonably well across a wide range of applications. However, these general architectures do not use problem domain knowledge, which could be very helpful in some applications.

For example, in the case of image denoising, it has been recently shown that conventional multilayer perceptrons (MLPs) are not very good at handling multiple levels of input noise [3]. When a single multilayer perceptron was trained to handle multiple input noise levels (by providing the noise variance as an additional input to the network), it produced inferior results compared to the state-of-the-art BM3D [6] approach. In contrast to this, the EPLL framework of [40], which is a model-based approach, has been shown to work well across a range of noise levels. These results suggest that we should work towards bringing deep networks and model-based approaches together. Motivated by this, in this work, we propose a new deep network architecture for image denoising based on a Gaussian conditional random field model. The proposed network explicitly models the input noise variance and hence is capable of handling a range of noise levels.

Gaussian Markov Random Fields (GMRFs) [29] are popular models for various structured inference tasks such as denoising, inpainting, super-resolution and depth estimation, as they model continuous quantities and can be efficiently solved using linear algebra routines. However, the performance of a GMRF model depends on the choice of pairwise potential functions. For example, in the case of image denoising, if the potential functions for neighboring pixels are homogeneous (i.e., identical everywhere), then the GMRF model can result in blurred edges and over-smoothed images. Therefore, to improve the performance of a GMRF model, the pairwise potential function parameters should be chosen according to the image being processed. A GMRF model that uses data-dependent potential function parameters is referred to as a Gaussian Conditional Random Field (GCRF) [35].

Image denoising using a GCRF model consists of two steps: a parameter selection step in which the potential function parameters are chosen based on the input image, and an inference step in which energy minimization is performed for the chosen parameters. In this work, we propose a novel model-based deep network architecture, which we refer to as the deep GCRF network, by converting both the parameter selection and inference steps into feed-forward networks.

Figure 1: The proposed deep GCRF network: Parameter generation network (PgNet) followed by inference network (InfNet). The PgNets in dotted boxes are the additional parameter generation networks introduced after each HQS iteration.

The proposed deep GCRF network consists of two sub-networks: a parameter generation network (PgNet) that generates appropriate potential function parameters based on the input image, and an inference network (InfNet) that performs energy minimization using the potential function parameters generated by the PgNet. Since directly generating the potential function parameters for an entire image is very difficult (as the number of pixels could be very large), we construct a full-image pairwise potential function indirectly by combining potential functions defined on image patches. If we use d × d patches, then our construction defines a graphical model in which each pixel is connected to its (2d − 1) × (2d − 1) spatial neighbors. This construction is motivated by the recent EPLL framework of [40]. Our PgNet directly operates on each d × d input image patch and chooses appropriate parameters for the corresponding potential function.

Though the energy minimizer can be obtained in closed form for a GCRF, it involves solving a linear system with the number of variables equal to the number of image pixels (usually of the order of 10^6). Solving such a large linear system could be computationally prohibitive, especially for dense graphs (each pixel is connected to 224 neighbors when 8 × 8 image patches are used). Hence, in this work, we use an iterative optimization approach based on Half Quadratic Splitting (HQS) [11, 19, 36, 40] for designing our inference network. Recently, this approach has been shown to work very well for image restoration tasks even with very few (5-6) iterations [40]. Our inference network consists of a new type of layer, which we refer to as the HQS layer, that performs the computations involved in one HQS iteration.

Combining the parameter generation and inference networks, we get our deep GCRF network shown in Figure 1. Note that using appropriate pairwise potential functions is crucial for the success of a GCRF. Since the PgNet operates on the noisy input image, it becomes increasingly difficult to generate good potential function parameters as the image noise increases. To address this issue, we introduce an additional PgNet after each HQS iteration, as shown in dotted boxes in Figure 1. Since we train this deep GCRF network discriminatively in an end-to-end fashion, even if the first PgNet fails to generate good potential function parameters, the later PgNets can learn to generate appropriate parameters based on partially restored images.

Contributions:
• We propose a new end-to-end trainable deep network architecture for image denoising based on a GCRF model. In contrast to the existing discriminative denoising methods that train a separate model for each individual noise level, the proposed network explicitly models the input noise variance and hence is capable of handling a range of noise levels.
• We propose a differentiable parameter generation network that generates the GCRF pairwise potential parameters based on the noisy input image.
• We unroll a half quadratic splitting-based iterative GCRF inference procedure into a deep network and train it jointly with our parameter generation network.
• We show that the proposed approach produces results on par with the state-of-the-art without training a separate network for each individual noise level.

2. Related Work

Gaussian CRF: GCRFs were first introduced in [35] by modeling the parameters of the conditional distribution of the output given the input as a function of the input image. The precision matrix associated with each image patch was modeled as a linear combination of twelve derivative filter-based matrices. The combination weights were chosen as a parametric function of the responses of the input image to a set of oriented edge and bar filters, and the parameters were learned using discriminative training. This GCRF model was extended to Regression Tree Fields (RTFs) in [18], where regression trees were used for selecting the parameters of Gaussians defined over image patches. These regression trees used responses of the input image to various hand-chosen filters for selecting an appropriate leaf node for each image patch. This RTF-based model was trained by iteratively growing the regression trees and optimizing the Gaussian parameters at the leaf nodes.

Recently, a cascade of RTFs [30] has also been used for image restoration tasks. In contrast to the RTF-based approaches, all the components of our network are differentiable, and hence it can be trained end-to-end using standard gradient-based techniques.

Recently, [31] proposed a cascade of shrinkage fields for image restoration tasks. They learned a separate filter bank and shrinkage function for each stage of their cascade using discriminative training. Though this model can also be seen as a cascade of GCRFs, the filter banks and shrinkage functions used in the cascade do not depend on the noisy input image during test time. In contrast to this, the pairwise potential functions used in our GCRF model are generated by our PgNets based on the noisy input image.

Our work is also related to the EPLL framework of [40], which decomposed the full-image Gaussian model into patch-based Gaussians, and used HQS iterations for GCRF inference. The main differences between EPLL and this work are the following: (i) We propose a new deep network architecture which combines HQS iterations with a differentiable parameter generation network. (ii) While EPLL chooses the potential parameters for each image patch as one of K possible matrices, we construct each potential parameter matrix as a convex combination of K base matrices. (iii) While EPLL learns the K possible potential parameter matrices in a generative fashion by fitting a Gaussian Mixture Model (GMM) to clean image patches, we learn the K base matrices in a discriminative fashion by end-to-end training of our deep network. As shown later in the experiments section, our discriminatively trained network clearly outperforms the generatively trained EPLL.

Denoising: Image denoising is one of the oldest problems in image processing, and various denoising algorithms have been proposed over the past several years. Some of the most popular algorithms include wavelet shrinkage [33], fields of experts [28], Gaussian scale mixtures [26], BM3D [6], non-linear diffusion process-based approaches [5, 15, 25], sparse coding-based approaches [7, 8, 9, 22], weighted nuclear norm minimization (WNNM) [14], and non-local Bayesian denoising [20]. Among these, BM3D is currently the most widely-used state-of-the-art denoising approach. It is a well-engineered algorithm that combines non-local patch statistics with collaborative filtering.

Denoising with neural networks: Recently, various deep neural network-based approaches have also been proposed for image denoising [1, 3, 17, 37, 38]. While [17] used a convolutional neural network (CNN), [3, 38] used multilayer perceptrons, and [1, 37] used stacked sparse denoising autoencoders (SSDA). Among these, the MLP of [3] has been shown to work very well, outperforming the BM3D approach. However, none of these deep networks explicitly model the input noise variance, and hence they are not good at handling multiple noise levels. In all these works, a different network was trained for each noise level.

Unfolding inference as a deep network: The proposed approach is also related to a class of algorithms that learn model parameters discriminatively by back-propagating the gradient through a fixed number of inference steps. In [2], the fields of experts [28] MRF model was discriminatively trained for image denoising by unfolding a fixed number of inference steps. In [27], message-passing inference machines were trained for structured prediction tasks by considering the message-passing based inference of a discrete graphical model as a sequence of predictors. In [13], a feed-forward sparse code predictor was trained by unfolding a coordinate descent based sparse coding inference algorithm. In [32, 39], deep CNNs and discrete graphical models were jointly trained by unfolding the discrete mean-field inference. In [16], a new kind of non-negative deep network was introduced by deep unfolding of a non-negative factorization model. Recently, [5] revisited the classical non-linear diffusion process [24] by modeling it using several parameterized linear filters and influence functions. The parameters of this diffusion process were learned discriminatively by back-propagating the gradient through a fixed number of diffusion process iterations. Though this diffusion process-based approach has been shown to work well for the task of image denoising, it uses a separate model for each noise level.

In this work, we design our inference network using HQS-based inference of a Gaussian CRF model, resulting in a different network architecture compared to the above unfolding works. In addition to this inference network, our deep GCRF network also consists of other sub-networks used for modeling the GCRF pairwise potentials.

Notations: We use boldface capital letters to denote matrices and boldface small letters to denote vectors. We use vec(A), A^⊤ and A^{−1} to denote the column vector representation, transpose and inverse of a matrix A, respectively. A ⪰ 0 means A is symmetric and positive semidefinite.

3. Gaussian Conditional Random Field

Let X be the given (noisy) input image and Y be the (clean) output image that needs to be inferred. Let X(i,j) and Y(i,j) represent the pixel (i,j) in images X and Y, respectively. In this work, we model the conditional probability density p(Y|X) as a Gaussian distribution given by p(Y|X) ∝ exp{−E(Y|X)}, where

$$E(\mathbf{Y}|\mathbf{X}) = \underbrace{\frac{1}{2\sigma^2}\sum_{ij}\left[\mathbf{Y}(i,j) - \mathbf{X}(i,j)\right]^2}_{:=\,E_d(\mathbf{Y}|\mathbf{X})} \;+\; \underbrace{\frac{1}{2}\,\mathrm{vec}(\mathbf{Y})^\top \mathbf{Q}(\mathbf{X})\,\mathrm{vec}(\mathbf{Y})}_{:=\,E_p(\mathbf{Y}|\mathbf{X})}. \quad (1)$$

Here, σ² is the input noise variance and Q(X) ⪰ 0 are the input-dependent parameters of the quadratic pairwise potential function E_p(Y|X) defined over the image Y. Note that if the pairwise potential parameters Q are constant, then this model can be interpreted as a generative model with E_d as the data term, E_p as the prior term and p(Y|X) as the posterior. Hence, our GCRF is a discriminative model inspired by a generative Gaussian model.
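To make (1) concrete, here is a minimal numpy sketch (our illustration, not code from the paper) that evaluates the energy for a toy image, assuming a given positive semidefinite matrix Q:

```python
import numpy as np

def gcrf_energy(Y, X, Q, sigma2):
    """Evaluate E(Y|X) from Eq. (1): data term E_d plus pairwise term E_p."""
    E_d = np.sum((Y - X) ** 2) / (2.0 * sigma2)
    y = Y.flatten(order='F')          # vec(Y): column-major stacking
    E_p = 0.5 * y @ Q @ y
    return E_d + E_p

# Toy usage: a 4x4 image with a random PSD matrix Q.
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 4))
Y = X.copy()
A = rng.normal(size=(16, 16))
Q = A @ A.T                           # PSD by construction
print(gcrf_energy(Y, X, Q, sigma2=0.1))
```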

3.1. Patch-based pairwise potential functions

Directly choosing the (positive semidefinite) pairwise potential parameters Q(X) for an entire image Y is very challenging since the number of pixels in an image could be of the order of 10^6. Hence, motivated by [40], we construct the (full-image) pairwise potential function E_p by combining patch-based pairwise potential functions.

Let x_ij and y_ij be d² × 1 column vectors representing the d × d patches centered on pixel (i,j) in images X and Y, respectively. Let x̄_ij = G x_ij and ȳ_ij = G y_ij be the mean-subtracted versions of the vectors x_ij and y_ij, respectively, where G = I − (1/d²) 1 1^⊤ is the mean subtraction matrix. Here, 1 is the d² × 1 vector of ones and I is the d² × d² identity matrix. Let

$$V(\bar{\mathbf{y}}_{ij} \mid \bar{\mathbf{x}}_{ij}) = \frac{1}{2}\,\bar{\mathbf{y}}_{ij}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \bar{\mathbf{y}}_{ij}, \quad \boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij}) \succeq 0, \quad (2)$$

be a quadratic pairwise potential function defined on the patch ȳ_ij, with Σ_ij(x̄_ij) being the corresponding (input) data-dependent parameters. Combining the patch-based potential functions at all the pixels, we get the following full-image pairwise potential function:

$$E_p(\mathbf{Y}|\mathbf{X}) = \frac{1}{d^2}\sum_{ij} V(\bar{\mathbf{y}}_{ij} \mid \bar{\mathbf{x}}_{ij}) = \frac{1}{2d^2}\sum_{ij} \mathbf{y}_{ij}^\top \mathbf{G}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \mathbf{G}\,\mathbf{y}_{ij}. \quad (3)$$

Note that since we are using all d × d image patches, each pixel appears in d² patches that are centered on its d × d neighbor pixels. In every patch, each pixel interacts with all the d² pixels in that patch. This effectively defines a graphical model of neighborhood size (2d − 1) × (2d − 1) on image Y.
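The mean subtraction matrix G is easy to instantiate and check numerically. The following sketch (illustrative, assuming d = 3) verifies that G removes the patch mean and is a symmetric projection, so G G^⊤ = G, which is consistent with the simplified forms used in the inference and backpropagation formulas later:

```python
import numpy as np

d = 3
n = d * d
G = np.eye(n) - np.ones((n, n)) / n    # mean subtraction matrix

x = np.arange(n, dtype=float)          # a toy vectorized 3x3 patch
x_bar = G @ x
print(np.isclose(x_bar.mean(), 0.0))   # True: the patch mean is removed

# G is a symmetric projection: G = G^T and G @ G = G, hence G G^T = G.
print(np.allclose(G, G.T), np.allclose(G @ G, G))
```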

3.2. Inference

Given the (input) data-dependent parameters {Σ_ij(x̄_ij)} of the pairwise potential function E_p(Y|X), Gaussian CRF inference solves the following optimization problem:

$$\mathbf{Y}^* = \operatorname*{argmin}_{\mathbf{Y}} \left\{ \frac{d^2}{\sigma^2}\sum_{ij}\left[\mathbf{Y}(i,j) - \mathbf{X}(i,j)\right]^2 + \sum_{ij} \mathbf{y}_{ij}^\top \mathbf{G}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \mathbf{G}\,\mathbf{y}_{ij} \right\}. \quad (4)$$

Note that the optimization problem (4) is an unconstrained quadratic program and hence can be solved in closed form. However, the closed form solution for Y requires solving a linear system of equations with the number of variables equal to the number of image pixels. Since solving such linear systems could be computationally prohibitive for large images, in this work, we use a half quadratic splitting-based iterative optimization method, which has been recently used in [40] for solving the above optimization problem. This approach allows for efficient optimization by introducing auxiliary variables.

Let z_ij be an auxiliary variable corresponding to the patch y_ij. In the half quadratic splitting method, the cost function in (4) is modified to

$$J(\mathbf{Y}, \{\mathbf{z}_{ij}\}, \beta) = \frac{d^2}{\sigma^2}\sum_{ij}\left[\mathbf{Y}(i,j) - \mathbf{X}(i,j)\right]^2 + \sum_{ij}\left\{ \beta\,\|\mathbf{y}_{ij} - \mathbf{z}_{ij}\|_2^2 + \mathbf{z}_{ij}^\top \mathbf{G}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \mathbf{G}\,\mathbf{z}_{ij} \right\}. \quad (5)$$

Note that as β → ∞, the patches {y_ij} are restricted to be equal to the auxiliary variables {z_ij}, and the solutions of (4) and (5) converge. For a fixed value of β, the cost function J can be minimized by alternately optimizing over Y and {z_ij}. If we fix Y, then the optimal z_ij is given by

$$f(\mathbf{y}_{ij}) = \operatorname*{argmin}_{\mathbf{z}_{ij}} \left\{ \mathbf{z}_{ij}^\top \mathbf{G}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \mathbf{G}\,\mathbf{z}_{ij} + \beta\,\|\mathbf{y}_{ij} - \mathbf{z}_{ij}\|_2^2 \right\} = \left(\mathbf{G}^\top \left(\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij})\right)^{-1} \mathbf{G} + \beta \mathbf{I}\right)^{-1} \beta\,\mathbf{y}_{ij} = \left[\mathbf{I} - \mathbf{G}^\top \left(\beta\,\boldsymbol{\Sigma}_{ij}(\bar{\mathbf{x}}_{ij}) + \mathbf{G}\mathbf{G}^\top\right)^{-1} \mathbf{G}\right] \mathbf{y}_{ij}. \quad (6)$$

The last equality in (6) follows from the Woodbury matrix identity. If we fix {z_ij}, then the optimal Y(i,j) is given by

$$g(\{\mathbf{z}_{ij}\}) = \operatorname*{argmin}_{\mathbf{Y}(i,j)} \left\{ \frac{d^2}{\sigma^2}\left[\mathbf{Y}(i,j) - \mathbf{X}(i,j)\right]^2 + \beta \sum_{p,q=-\lfloor\frac{d-1}{2}\rfloor}^{\lceil\frac{d-1}{2}\rceil} \left[\mathbf{Y}(i,j) - \mathbf{z}_{pq}(i,j)\right]^2 \right\} = \frac{\mathbf{X}(i,j)}{1 + \beta\sigma^2} + \frac{\beta\sigma^2}{(1 + \beta\sigma^2)\,d^2} \sum_{p,q=-\lfloor\frac{d-1}{2}\rfloor}^{\lceil\frac{d-1}{2}\rceil} \mathbf{z}_{pq}(i,j), \quad (7)$$

where ⌊·⌋ and ⌈·⌉ are the floor and ceil operators, respectively, and z_pq(i,j) is the intensity value of pixel (i,j) according to the auxiliary patch z_pq.

In the half quadratic splitting approach, the optimization steps (6) and (7) are repeated while increasing the value of β in each iteration. This iterative approach has been shown to work well in [40] for image restoration tasks even with few (5-6) iterations.
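The two HQS updates can be written compactly in code. The sketch below is our own illustration of (6) and (7), assuming the auxiliary patches are stored as an (H, W, d, d) array and using periodic boundary handling for simplicity; the paper does not specify this implementation:

```python
import numpy as np

def patch_update(y_ij, Sigma_ij, beta, G):
    # Eq. (6): z_ij = [I - G^T (beta * Sigma_ij + G G^T)^{-1} G] y_ij.
    n = G.shape[0]
    M = beta * Sigma_ij + G @ G.T
    return (np.eye(n) - G.T @ np.linalg.solve(M, G)) @ y_ij

def image_update(X, Z, beta, sigma2, d):
    # Eq. (7): fuse the noisy pixel with the d^2 auxiliary-patch estimates
    # covering it. Z[ci, cj] is the d*d auxiliary patch centered on pixel
    # (ci, cj); periodic boundaries are assumed here for brevity.
    H, W = X.shape
    f = (d - 1) // 2                       # floor((d - 1) / 2)
    acc = np.zeros_like(X)
    for a in range(-f, d - f):             # offsets -floor(.) .. ceil(.)
        for b in range(-f, d - f):
            # Estimate of pixel (i, j) provided by the patch centered at
            # (i - a, j - b); that pixel sits at patch index (a + f, b + f).
            acc += np.roll(Z[:, :, a + f, b + f], shift=(a, b), axis=(0, 1))
    w = beta * sigma2
    return X / (1.0 + w) + w * acc / ((1.0 + w) * d * d)
```

Alternating patch_update (over all patches) and image_update while increasing β according to a fixed schedule reproduces the HQS loop that the inference network unrolls.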

Figure 2: Parameter generation network: Mean-subtracted patches x̄_ij extracted from the input image X are used to compute the combination weights {γ^k_ij}, which are used for generating the pairwise potential parameters {Σ_ij}.

Figure 3: The inference network uses the pairwise potential parameters {Σ_ij(x̄_ij)} generated by the PgNet and performs T HQS iterations.

4. Deep Gaussian CRF network

As mentioned earlier, the proposed deep GCRF network consists of the following two components:

• Parameter generation network: This network takes the noisy image X as input and generates the parameters {Σ_ij(x̄_ij)} of the pairwise potential function E_p(Y|X).

• Inference network: This network performs Gaussian CRF inference using the pairwise potential parameters {Σ_ij(x̄_ij)} given by the parameter generation network.

4.1. Parameter generation network

We model the pairwise potential parameters {Σ_ij} as convex combinations of K symmetric positive semidefinite matrices Ψ_1, ..., Ψ_K:

$$\boldsymbol{\Sigma}_{ij} = \sum_k \gamma_{ij}^k \boldsymbol{\Psi}_k, \quad \gamma_{ij}^k \geq 0, \quad \sum_k \gamma_{ij}^k = 1. \quad (8)$$

The combination weights {γ^k_ij} are computed from the mean-subtracted input image patches {x̄_ij} using the following two-layer selection network:

Layer 1 - Quadratic layer: For k = 1, 2, ..., K,
$$s_{ij}^k = -\frac{1}{2}\,\bar{\mathbf{x}}_{ij}^\top \left(\mathbf{W}_k + \sigma^2 \mathbf{I}\right)^{-1} \bar{\mathbf{x}}_{ij} + b_k. \quad (9)$$

Layer 2 - Softmax layer: For k = 1, 2, ..., K,
$$\gamma_{ij}^k = e^{s_{ij}^k} \Big/ \sum_{p=1}^{K} e^{s_{ij}^p}. \quad (10)$$

Figure 2 shows the overall parameter generation network, which includes a patch extraction layer, a selection network and a combination layer. Here, {(W_k ⪰ 0, Ψ_k ⪰ 0, b_k)} are the network parameters, and σ² is the noise variance.

Our choice of the above quadratic selection function is motivated by the following two reasons: (i) Since the selection network operates on mean-subtracted patches, it should be symmetric, i.e., both x̄ and −x̄ should have the same combination weights {γ^k}. To achieve this, we compute each s^k as a quadratic function of x̄. (ii) Since we are computing the combination weights using the noisy image patches, the selection network should be robust to input noise. To achieve this, we include the input noise variance σ² in the computation of {s^k}. We choose the particular form (W_k + σ²I)^{−1} because in this case, we can (roughly) interpret the computation of {s^k} as evaluating Gaussian log likelihoods. If we interpret {W_k} as covariance matrices associated with clean image patches, then {W_k + σ²I} can be interpreted as covariance matrices associated with noisy image patches.
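A compact numpy sketch of the parameter generation path, equations (8)-(10), might look as follows. The function name and array layout are our assumptions, and the max-subtraction in the softmax is a standard numerical-stability detail not mentioned in the text:

```python
import numpy as np

def pgnet_parameters(x_bars, W, b, Psi, sigma2):
    """Eqs. (8)-(10): quadratic layer, softmax layer, combination layer.

    x_bars: (N, n) mean-subtracted patches (n = d*d); W: (K, n, n) PSD
    matrices; b: (K,) biases; Psi: (K, n, n) PSD base matrices.
    """
    N, n = x_bars.shape
    K = W.shape[0]
    s = np.empty((N, K))
    for k in range(K):
        A = W[k] + sigma2 * np.eye(n)
        # Eq. (9): s_k = -0.5 * x^T (W_k + sigma^2 I)^{-1} x + b_k.
        s[:, k] = -0.5 * np.sum(x_bars * np.linalg.solve(A, x_bars.T).T,
                                axis=1) + b[k]
    # Eq. (10): softmax over k gives the convex combination weights.
    s = s - s.max(axis=1, keepdims=True)        # numerical stability
    gamma = np.exp(s)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # Eq. (8): Sigma_ij = sum_k gamma_ij^k Psi_k.
    Sigma = np.einsum('nk,kab->nab', gamma, Psi)
    return Sigma, gamma
```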

4.2. Inference network

We use the half quadratic splitting method described in Section 3.2 to create our inference network. Each layer of the inference network, also referred to as an HQS layer, implements one half quadratic splitting iteration. Each HQS layer consists of the following two sub-layers:

• Patch inference layer (PI): This layer uses the current image estimate Y^t and computes the auxiliary patches {z_ij} using f(y_ij) given in (6).

• Image formation layer (IF): This layer uses the auxiliary patches {z_ij} given by the PI layer and computes the next image estimate Y^{t+1} using g({z_ij}) given in (7).

Let {β_1, β_2, ..., β_T} be the β schedule for half quadratic splitting. Then, our inference network consists of T HQS layers, as shown in Figure 3. Here, X is the input image with noise variance σ², and {Σ_ij(x̄_ij)} are the (data-dependent) pairwise potential parameters generated by the PgNet.
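Putting the pieces together, one forward pass through the full network (T PgNet and HQS layer pairs) could be sketched as below, reusing the illustrative helpers pgnet_parameters, patch_update and image_update from the earlier snippets; extract_patches and the periodic boundary handling are again our assumptions, not the paper's implementation:

```python
import numpy as np

def extract_patches(img, d):
    # All d*d patches with periodic boundaries, one row per pixel: row
    # i*W + j holds the vectorized patch centered on pixel (i, j).
    H, W = img.shape
    f = (d - 1) // 2
    shifted = [np.roll(img, shift=(f - a, f - b), axis=(0, 1))
               for a in range(d) for b in range(d)]
    return np.stack(shifted, axis=-1).reshape(H * W, d * d)

def deep_gcrf_forward(X, sigma2, params, betas, d):
    # One PgNet before each HQS layer (the dotted boxes in Figure 1).
    # params is a list of T tuples (W, b, Psi); betas is the beta schedule.
    n = d * d
    G = np.eye(n) - np.ones((n, n)) / n          # mean subtraction matrix
    Y = X.astype(float).copy()
    H, W_img = X.shape
    for (W, b, Psi), beta in zip(params, betas):
        # PgNet: potentials from the current (partially restored) estimate.
        x_bars = extract_patches(Y, d) @ G.T
        Sigma, _ = pgnet_parameters(x_bars, W, b, Psi, sigma2)
        # PI layer: auxiliary patch estimates, Eq. (6).
        y_patches = extract_patches(Y, d)
        Z = np.stack([patch_update(y, S, beta, G)
                      for y, S in zip(y_patches, Sigma)])
        # IF layer: next image estimate, Eq. (7).
        Y = image_update(X, Z.reshape(H, W_img, d, d), beta, sigma2, d)
    return Y
```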

Remark: Since our inference network implements a fixed number of HQS iterations, its output may not be optimal for (4). However, since we train our parameter generation and inference networks jointly in a discriminative fashion, the PgNet will learn to generate appropriate pairwise potential parameters such that the output after a fixed number of HQS iterations would be close to the desired output.

4.3. GCRF network

Combining the above parameter generation and inference networks, we get our full Gaussian CRF network with parameters {(W_k ⪰ 0, Ψ_k ⪰ 0, b_k)}. Note that this GCRF network has various new types of layers that use quadratic functions, matrix inversions and multiplicative interactions, which are quite different from the computations used in standard deep networks.

Additional PgNets: Note that using appropriate pairwise potential functions is crucial for the success of the GCRF. Since the parameter generation network operates on the noisy input image X, it is very difficult to generate good parameters at high noise levels (even after incorporating the noise variance σ² into the selection network). To overcome this issue, we introduce an additional PgNet after each HQS iteration (shown with dotted boxes in Figure 1). The rationale behind adding these additional PgNets is that even if the first PgNet fails to generate good parameters, the later PgNets could generate appropriate parameters using the partially restored images. Our final deep GCRF network consists of T PgNets and T HQS layers, as shown in Figure 1.

Training: We train the proposed deep GCRF network end-to-end in a discriminative fashion by maximizing the average PSNR measure. We use standard back-propagation to compute the gradient of the network parameters. Please refer to the appendix for the relevant derivative formulas. Note that we have a constrained optimization problem here because of the symmetry and positive semi-definiteness constraints on the network parameters {W_k} and {Ψ_k}. We convert this constrained problem into an unconstrained one by parametrizing W_k and Ψ_k as W_k = P_k P_k^⊤ and Ψ_k = R_k R_k^⊤, where P_k and R_k are lower triangular matrices, and use limited memory BFGS [21] for optimization. A sketch of this reparametrization and of the PSNR objective follows.
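A minimal numpy sketch (illustrative, not the authors' MATLAB code), assuming n × n matrices and images in the [0, 255] range:

```python
import numpy as np

def psd_from_factor(P):
    # W = P P^T is symmetric PSD for any matrix P; optimizing a lower-
    # triangular P therefore removes the explicit W >= 0 constraint.
    L = np.tril(P)
    return L @ L.T

def psnr(y_pred, y_true, peak=255.0):
    # The training objective is the average PSNR over the training images.
    mse = np.mean((y_pred - y_true) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)
```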
5. Experiments

In this section, we use the proposed deep GCRF network for image denoising. We trained our network using a dataset of 400 images (200 images from the BSD300 [23] training set and 200 images from the PASCAL VOC 2012 [10] dataset), and evaluated it using a dataset of 300 images (100 images from the BSD300 [23] test set and 200 images from the PASCAL VOC 2012 [10] dataset). For our experiments, we used white Gaussian noise of various standard deviations. For realistic evaluation, all the images were quantized to the [0-255] range after adding the noise. The noisy and clean images used for training and testing can be downloaded from http://ravitejav.weebly.com/gcrfdenosing.html. We use the standard PSNR measure for quantitative evaluation.

Though we use Gaussian noise, due to quantization (clipping to the 0-255 range), the noise characteristics deviate from being a Gaussian as the noise variance increases. To cope with this variation in noise characteristics, we trained two different networks, one for low input noise levels (σ ≤ 25, noise reasonably close to a Gaussian after quantization) and one for high input noise levels (25 < σ < 60, noise far from being a Gaussian after quantization)¹. For training the low noise network, we used σ = [8, 13, 18, 25], and for training the high noise network, we used σ = [30, 35, 40, 50]. Note that both networks were trained to handle a range of input noise levels. For testing, we varied σ from 10 to 60 in intervals of 5.

We performed experiments with two patch sizes (5 × 5 and 8 × 8)², and the number of matrices Ψ_k was chosen as 200. Following [40], we used six HQS iterations with β values given by (1/σ²)[1, 4, 8, 16, 32, 64]³. To avoid overfitting, we regularized the network by sharing the parameters {W_k, Ψ_k} across all PgNets. We initialized the network parameters using the parameters of a GMM learned on clean image patches.

Table 1 compares the proposed deep GCRF network with various image denoising approaches on the 300 test images. Here, DGCRF5 and DGCRF8 refer to the deep GCRF networks that use 5 × 5 and 8 × 8 patches, respectively. For each noise level, the top two PSNR values are shown in boldface style. Note that the CSF [31] and MLP [3] approaches train a different model for each noise level. Hence, for these approaches, we report the results only for those noise levels for which the corresponding authors have provided their trained models. As we can see, the proposed deep GCRF network clearly outperforms the ClusteringSR [7], EPLL [40], BM3D [6], NL-Bayes [20], NCSR [8] and CSF approaches on all noise levels, and the WNNM [14] approach on all noise levels except σ = 10 (where it performs equally well). Specifically, it produces significant improvement in PSNR compared to the ClusteringSR (0.29 - 1.24 dB), EPLL (0.24 - 1.18 dB), BM3D (0.18 - 0.83 dB), NL-Bayes (0.10 - 1.27 dB), NCSR (0.11 - 1.07 dB) and WNNM (up to 1.0 dB) approaches. The CSF approach of [31], which also uses GCRFs, performs worse (by 0.24 dB for σ = 25) than our deep network.

¹ When we tried training a single network for all noise levels, our training was mainly focusing on the high noise data.
² We did not go beyond 8 × 8 due to memory and computation issues. We believe that bigger patches could further improve the performance.
³ Optimizing the β values using a validation set may further improve our performance.

Test σ             10     15     20     25     30     35     40     45     50     55     60
ClusteringSR [7]  33.27  30.97  29.41  28.22  27.25  26.30  25.56  24.89  24.28  23.72  23.21
EPLL [40]         33.32  31.06  29.52  28.34  27.36  26.52  25.76  25.08  24.44  23.84  23.27
BM3D [6]          33.38  31.09  29.53  28.36  27.42  26.64  25.92  25.19  24.63  24.11  23.62
NL-Bayes [20]     33.46  31.11  29.63  28.41  27.42  26.57  25.76  25.05  24.39  23.77  23.18
NCSR [8]          33.45  31.20  29.56  28.39  27.45  26.32  25.59  24.94  24.35  23.85  23.38
WNNM [14]         33.57  31.28  29.70  28.50  27.51  26.67  25.92  25.22  24.60  24.01  23.45
CSF [31]            -      -      -    28.43    -      -      -      -      -      -      -
MLP [3]           33.43    -      -    28.68    -    27.13    -      -    25.33    -      -
DGCRF5            33.53  31.29  29.76  28.58  27.68  26.95  26.30  25.73  25.23  24.76  24.33
DGCRF8            33.56  31.35  29.84  28.67  27.80  27.08  26.44  25.88  25.38  24.90  24.45

Table 1: Comparison of various denoising approaches on 300 test images.

When compared with MLP [3], which is the state-of-the-art deep network-based denoising approach, we perform better for σ = 10 and 50, worse for σ = 35, and equally well for σ = 25. However, note that while [3] uses a different MLP for each specific noise level, we trained only two networks, each of which can handle a range of noise levels. In fact, our single low noise network is able to outperform the MLP trained for σ = 10 and perform as well as the MLP trained for σ = 25. This ability to handle a range of noise levels is one of the major benefits of the proposed deep network. Note that though we did not use the noise levels σ = 10, 15, 20, 45 during training, our networks perform very well for these σ. This shows that our networks are able to handle a range of noise levels rather than just fitting to the training σ. Also, our high noise network performs well for σ = 55 and 60 even though these values are out of its training range. This shows that the proposed model-based deep network can also generalize reasonably well for out-of-range noise levels.

We acknowledge that the comparisons in Table 1 may be biased since some of the competing methods are not designed for denoising quantized images. However, we believe that, for the denoising problem, using quantized images is a more realistic experimental setting than using unquantized images. Please refer to Table 2 for additional results on a benchmark dataset under the unquantized setting.

Figure 4: Sensitivity analysis of the MLP and the proposed approach. The noise levels for which the MLP was trained are indicated using a circular marker.

To analyze the sensitivity of the non-model-based MLP approach to deviations from the training noise, we evaluated it on noise levels that are slightly (±5) different from the training σ. The authors of [3] trained separate MLPs for σ = 10, 25, 35, 50 and 65. As reported in [3], training a single MLP to handle multiple noise levels gave inferior results. Figure 4 shows the improvement of the MLP approach over BM3D in terms of PSNR. For each noise level, we used the best performing model among σ = 10, 25, 35, 50, 65. As we can see, while the MLP approach does very well for the exact noise levels for which it was trained, it performs poorly if the test σ deviates from the training σ even by 5 units. This is a major limitation of the MLP approach since training a separate model for each individual noise level is not practical. In contrast to this, the proposed approach is able to cover a wide range of noise levels using just two networks.

Please note that the purpose of Figure 4 is not to compare the performance of our approach with the MLP on noise levels that were not used in MLP training, which would be an unfair comparison. The only purpose of this figure is to show that, although very powerful, a network trained for a specific noise level is very sensitive.

Apart from our test set of 300 images, we also evaluated our low noise DGCRF8 network on a smaller dataset of 68 images [28], which has been used in various existing works. Tables 2 and 3 compare the proposed deep GCRF network with various approaches on this dataset under the unquantized and quantized settings, respectively. For each noise level, the top two PSNR values are shown in boldface style. As we can see, the proposed approach outperforms all the other approaches except RTF5 [30] and MLP [3] under the quantized setting, and TRD [5] under the unquantized setting. However, note that while we use a single network for both σ = 15 and σ = 25, the MLP, TRD and RTF5 approaches trained their models specifically for individual noise levels.

Method              σ = 15   σ = 25
ARF [2]              30.70    28.20
LLSC [22]            31.27    28.70
EPLL [40]            31.19    28.68
opt-MRF [4]          31.18    28.66
ClusteringSR [7]     31.08    28.59
NCSR [8]             31.19    28.61
BM3D [6]             31.08    28.56
MLP [3]                -      28.85
WNNM [14]            31.37    28.83
CSF [31]             31.24    28.72
RTF5 [30]              -      28.75
TRD [5]              31.43    28.95
DGCRF8               31.43    28.89

Table 2: Comparison of various denoising approaches on 68 images (dataset of [28]) under the unquantized setting.

Method              σ = 15   σ = 25
LLSC [22]            31.09    28.24
EPLL [40]            31.11    28.46
opt-MRF [4]          31.06    28.40
ClusteringSR [7]     30.93    28.26
NCSR [8]             31.13    28.41
BM3D [6]             31.03    28.38
NL-Bayes [20]        31.06    28.43
MLP [3]                -      28.77
WNNM [14]            31.20    28.48
CSF [31]               -      28.53
RTF5 [30]              -      28.74
DGCRF8               31.36    28.73

Table 3: Comparison of various denoising approaches on 68 images (dataset of [28]) under the quantized setting.

Training time: We trained each DGCRF8 network for three weeks (around 250 limited memory BFGS iterations).

Denoising time: The proposed DGCRF8 network takes 4.4s for a 321 × 481 image on an NVIDIA Titan GPU using a MATLAB implementation.

6. Conclusions

In this work, we proposed a new end-to-end trainable deep network architecture for image denoising based on a Gaussian CRF model. The proposed network consists of a parameter generation network that generates appropriate potential function parameters based on the input image, and an inference network that performs approximate Gaussian CRF inference. Unlike the existing discriminative denoising approaches that train a separate model for each individual noise level, the proposed network can handle a range of noise levels as it explicitly models the input noise variance. We achieved results on par with the state-of-the-art by training two deep GCRF networks, one for low input noise levels and one for high input noise levels. In the future, we plan to use this network for other full-image inference tasks like super-resolution, depth estimation, etc.

Appendix

In this appendix, we show how to back-propagate the loss derivatives through the layers of our deep GCRF network. Please refer to the supplementary material for detailed derivations. Let L be the final loss function.

Backpropagation through the combination layer: Given the derivatives dL/dΣ_ij of the loss function L with respect to the pairwise potential parameters Σ_ij, we can compute the derivatives of L with respect to the combination weights γ^k_ij and the matrices Ψ_k using

$$\frac{dL}{d\gamma_{ij}^k} = \mathrm{trace}\left(\boldsymbol{\Psi}_k^\top \frac{dL}{d\boldsymbol{\Sigma}_{ij}}\right), \quad \frac{dL}{d\boldsymbol{\Psi}_k} = \sum_{ij} \gamma_{ij}^k \frac{dL}{d\boldsymbol{\Sigma}_{ij}}. \quad (11)$$

Backpropagation through the quadratic layer: Given the derivatives dL/ds^k_ij of the loss function L with respect to the quadratic layer output s^k_ij, we can compute the derivatives of L with respect to the selection network parameters (W_k, b_k) and the input patches x̄_ij using:

$$\frac{dL}{d\mathbf{W}_k} = \sum_{ij} \frac{dL}{ds_{ij}^k} \left(\mathbf{W}_k + \sigma^2 \mathbf{I}\right)^{-1} \frac{\bar{\mathbf{x}}_{ij}\bar{\mathbf{x}}_{ij}^\top}{2} \left(\mathbf{W}_k + \sigma^2 \mathbf{I}\right)^{-1},$$
$$\frac{dL}{db_k} = \sum_{ij} \frac{dL}{ds_{ij}^k}, \quad \frac{dL}{d\bar{\mathbf{x}}_{ij}} = -\sum_k \frac{dL}{ds_{ij}^k} \left(\mathbf{W}_k + \sigma^2 \mathbf{I}\right)^{-1} \bar{\mathbf{x}}_{ij}. \quad (12)$$

Backpropagation through the patch inference layer: Given the derivatives dL/dz_ij of the loss function L with respect to the output of a patch inference layer, we can compute the derivatives of L with respect to its input patches y_ij and the pairwise potential parameters Σ_ij using

$$\frac{dL}{d\mathbf{y}_{ij}} = \left[\mathbf{I} - \mathbf{G}^\top \left(\beta\boldsymbol{\Sigma}_{ij} + \mathbf{G}\right)^{-1} \mathbf{G}\right] \frac{dL}{d\mathbf{z}_{ij}},$$
$$\frac{dL}{d\boldsymbol{\Sigma}_{ij}} = \beta \left(\beta\boldsymbol{\Sigma}_{ij} + \mathbf{G}\right)^{-1} \mathbf{G}\, \frac{dL}{d\mathbf{z}_{ij}}\, \mathbf{y}_{ij}^\top \mathbf{G}^\top \left(\beta\boldsymbol{\Sigma}_{ij} + \mathbf{G}\right)^{-1}. \quad (13)$$

We skip the derivative formulas for other computations such as the softmax, extracting mean-subtracted patches from an image, the averaging in the image formation layer, etc., as they are standard operations.
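For concreteness, the following numpy sketch (our own illustration; shapes and names are assumed) computes the per-patch gradients of (12) and (13). It uses the fact that βΣ_ij + G is symmetric, since both Σ_ij and G are, so its inverse transpose equals its inverse:

```python
import numpy as np

def quadratic_layer_backward(dL_ds_k, x_bar, W_k, sigma2):
    # Eq. (12) for one patch and one k. A = W_k + sigma^2 I is symmetric,
    # so (W_k + sigma^2 I)^{-1} x appears in both the W_k and x gradients.
    n = x_bar.shape[0]
    A_inv_x = np.linalg.solve(W_k + sigma2 * np.eye(n), x_bar)
    dL_dW_k = dL_ds_k * 0.5 * np.outer(A_inv_x, A_inv_x)
    dL_db_k = dL_ds_k
    dL_dx = -dL_ds_k * A_inv_x
    return dL_dW_k, dL_db_k, dL_dx

def patch_inference_backward(dL_dz, y, Sigma, beta, G):
    # Eq. (13). M = beta * Sigma + G is symmetric, so M^{-T} = M^{-1}.
    M = beta * Sigma + G
    Minv_G = np.linalg.solve(M, G)               # M^{-1} G
    dL_dy = dL_dz - G.T @ (Minv_G @ dL_dz)       # [I - G^T M^{-1} G] dL/dz
    v = Minv_G @ dL_dz                           # M^{-1} G dL/dz
    u = Minv_G @ y                               # M^{-1} G y
    dL_dSigma = beta * np.outer(v, u)
    return dL_dy, dL_dSigma
```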

References

[1] F. Agostinelli, M. R. Anderson, and H. Lee. Adaptive Multi-Column Deep Neural Networks with Application to Robust Image Denoising. In NIPS, 2013.
[2] A. Barbu. Training an Active Random Field for Real-Time Image Denoising. IEEE Transactions on Image Processing, 18(11):2451–2462, 2009.
[3] H. C. Burger, C. J. Schuler, and S. Harmeling. Image Denoising: Can Plain Neural Networks Compete with BM3D? In CVPR, 2012.
[4] Y. Chen, T. Pock, R. Ranftl, and H. Bischof. Revisiting Loss-specific Training of Filter-based MRFs for Image Restoration. In GCPR, 2013.
[5] Y. Chen, W. Yu, and T. Pock. On Learning Optimized Reaction Diffusion Processes for Effective Image Restoration. In CVPR, 2015.
[6] K. Dabov, A. Foi, V. Katkovnik, and K. O. Egiazarian. Image Denoising by Sparse 3-D Transform-Domain Collaborative Filtering. IEEE Transactions on Image Processing, 16(8):2080–2095, 2007.
[7] W. Dong, X. Li, L. Zhang, and G. Shi. Sparsity-based Image Denoising via Dictionary Learning and Structural Clustering. In CVPR, 2011.
[8] W. Dong, L. Zhang, G. Shi, and X. Li. Nonlocally Centralized Sparse Representation for Image Restoration. IEEE Transactions on Image Processing, 22(4):1620–1630, 2013.
[9] M. Elad and M. Aharon. Image Denoising via Sparse and Redundant Representations Over Learned Dictionaries. IEEE Transactions on Image Processing, 15(12):3736–3745, 2006.
[10] M. Everingham, S. M. A. Eslami, L. V. Gool, C. K. I. Williams, J. M. Winn, and A. Zisserman. The Pascal Visual Object Classes Challenge: A Retrospective. International Journal of Computer Vision, 111(1):98–136, 2015.
[11] D. Geman and C. Yang. Nonlinear Image Recovery with Half-quadratic Regularization. IEEE Transactions on Image Processing, 5(7):932–946, 1995.
[12] R. B. Girshick, J. Donahue, T. Darrell, and J. Malik. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation. In CVPR, 2014.
[13] K. Gregor and Y. LeCun. Learning Fast Approximations of Sparse Coding. In ICML, 2010.
[14] S. Gu, L. Zhang, W. Zuo, and X. Feng. Weighted Nuclear Norm Minimization with Application to Image Denoising. In CVPR, 2014.
[15] M. R. Hajiaboli. An Anisotropic Fourth-Order Diffusion Filter for Image Noise Removal. International Journal of Computer Vision, 92(2):177–191, 2011.
[16] J. R. Hershey, J. L. Roux, and F. Weninger. Deep Unfolding: Model-based Inspiration of Novel Deep Architectures. CoRR, abs/1409.2574, 2014.
[17] V. Jain and H. S. Seung. Natural Image Denoising with Convolutional Networks. In NIPS, 2008.
[18] J. Jancsary, S. Nowozin, and C. Rother. Loss-specific Training of Non-parametric Image Restoration Models: A New State of the Art. In ECCV, 2012.
[19] D. Krishnan and R. Fergus. Fast Image Deconvolution using Hyper-Laplacian Priors. In NIPS, 2009.
[20] M. Lebrun, A. Buades, and J. M. Morel. A Nonlocal Bayesian Image Denoising Algorithm. SIAM Journal on Imaging Sciences, 6(3):1665–1688, 2013.
[21] D. C. Liu and J. Nocedal. On the Limited Memory BFGS Method for Large Scale Optimization. Mathematical Programming, 45(1-3):503–528, 1989.
[22] J. Mairal, F. R. Bach, J. Ponce, G. Sapiro, and A. Zisserman. Non-local Sparse Models for Image Restoration. In ICCV, 2009.
[23] D. Martin, C. Fowlkes, D. Tal, and J. Malik. A Database of Human Segmented Natural Images and its Application to Evaluating Segmentation Algorithms and Measuring Ecological Statistics. In ICCV, 2001.
[24] P. Perona and J. Malik. Scale-Space and Edge Detection Using Anisotropic Diffusion. IEEE Transactions on Pattern Analysis and Machine Intelligence, 12(7):629–639, 1990.
[25] G. Plonka and J. Ma. Nonlinear Regularized Reaction-Diffusion Filters for Denoising of Images With Textures. IEEE Transactions on Image Processing, 17(8):1283–1294, 2008.
[26] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli. Image Denoising Using Scale Mixtures of Gaussians in the Wavelet Domain. IEEE Transactions on Image Processing, 12(11):1338–1351, 2003.
[27] S. Ross, D. Munoz, M. Hebert, and J. A. Bagnell. Learning Message-passing Inference Machines for Structured Prediction. In CVPR, 2011.
[28] S. Roth and M. J. Black. Fields of Experts. International Journal of Computer Vision, 82(2):205–229, 2009.
[29] H. Rue and L. Held. Gaussian Markov Random Fields: Theory and Applications. Chapman & Hall, London, 2005.
[30] U. Schmidt, J. Jancsary, S. Nowozin, S. Roth, and C. Rother. Cascades of Regression Tree Fields for Image Restoration. IEEE Transactions on Pattern Analysis and Machine Intelligence, to appear, 2015.
[31] U. Schmidt and S. Roth. Shrinkage Fields for Effective Image Restoration. In CVPR, 2014.
[32] A. G. Schwing and R. Urtasun. Fully Connected Deep Structured Networks. CoRR, abs/1503.02351, 2015.
[33] E. P. Simoncelli and E. H. Adelson. Noise Removal via Bayesian Wavelet Coring. In ICIP, 1996.
[34] Y. Taigman, M. Yang, M. Ranzato, and L. Wolf. DeepFace: Closing the Gap to Human-Level Performance in Face Verification. In CVPR, 2014.
[35] M. F. Tappen, C. Liu, E. H. Adelson, and W. T. Freeman. Learning Gaussian Conditional Random Fields for Low-Level Vision. In CVPR, 2007.
[36] Y. Wang, J. Yang, W. Yin, and Y. Zhang. A New Alternating Minimization Algorithm for Total Variation Image Reconstruction. SIAM Journal on Imaging Sciences, 1(3):248–272, 2008.
[37] J. Xie, L. Xu, and E. Chen. Image Denoising and Inpainting with Deep Neural Networks. In NIPS, 2012.
[38] S. Zhang and E. Salari. Image Denoising Using a Neural Network-based Non-linear Filter in Wavelet Domain. In ICASSP, 2005.
[39] S. Zheng, S. Jayasumana, B. Romera-Paredes, V. Vineet, Z. Su, D. Du, C. Huang, and P. H. S. Torr. Conditional Random Fields as Recurrent Neural Networks. In ICCV, 2015.
[40] D. Zoran and Y. Weiss. From Learning Models of Natural Image Patches to Whole Image Restoration. In ICCV, 2011.
