Image Denoising of Gaussian and Poisson Noise Based on Thresholding

A dissertation submitted to the

Graduate School

of the University of Cincinnati

in partial fulfillment of the

requirements for the degree of

DOCTOR OF PHILOSOPHY (Ph.D.)

in the School of Electronic and Computing Systems

University of Cincinnati

Cincinnati OH 45221 USA

2013

by

Jin Quan

Bachelor of Science in Engineering, Tongji University

Shanghai, China, 2007

Committee chair: Dr. William G. Wee

Abstract

Noise in images is generally undesirable and disturbing. It plays a negative role in higher-level processing tasks such as image registration and segmentation.

Thus, image denoising becomes a fundamental step required for better image understanding and interpretation. In recent years, the wavelet transform has been extensively employed to suppress noise and has proven to be a successful tool that outperforms many conventional denoising filters thanks to its preferred properties. Therefore, in this dissertation, the wavelet transform is applied to develop our denoising strategy.

Basically, two generic scenarios occur during the acquisition of images. First, when the detected intensities on the image are sufficiently high, the noise can be suitably modeled as following an additive independent Gaussian distribution. Second, when only a few photons are detected, the observed image is usually modeled as a Poisson process and the intensities to be estimated are assumed to be the underlying Poisson parameters. In this dissertation, these two scenarios are discussed in Part I and Part II, respectively.

In Part I, we consider the reduction of the typical additive white Gaussian noise (AWGN).

Our driving principle is to decrease the upper bound of the error, restricted by the soft-thresholding strategy, between the investigated image and the noise-free image. Thus we develop a new context modeling method to group coefficients with similar statistics and construct a smoothed version of the noisy image prior to the actual denoising operation. Then, we propose an optimized soft-thresholding denoising function with parameters derived from a modification of a closed-form solution, which has a more flexible shape and is adaptively pointwise. Furthermore, we extend it to its overcomplete representation by employing the "cycle spinning" method, so that shift invariance is achieved, which leads to a boost in denoising performance. By combining these strategies, the denoising results in our experiments confirm that the approach is very competitive with some state-of-the-art denoising methods in terms of quantitative measurements and computational simplicity.

In Part II, a new denoising method for Poisson noise corrupted images is proposed which is based on the variance stabilizing transformation (VST) with a new inverse.

The VST is used to approximately convert the Poisson noisy image into a Gaussian distributed one, so that denoising methods aimed at Gaussian noise can be applied subsequently. The motivation for the improved inverse comes from a main drawback of conventional VSTs such as the Anscombe transformation: their efficiency degrades significantly when the pixel intensities of the observed images are very low, due to the biased errors generated by the inverse transformation. In order to correct the biased errors, we introduce a polynomial regression model based on weighted least squares as an alternative to its inverse. Moreover, we incorporate the wavelet thresholding strategy for Gaussian noise presented in Part I into the proposed method.

We also extend it to the overcomplete representation to suppress the pseudo-Gibbs phenomena and therefore gain additional denoising effects. Experimental analysis indicates that this method is very competitive.

Acknowledgments

I would like to express my deepest gratitude and most sincere appreciation to my advisor Dr. William G. Wee for all his continuous guidance, constant encouragement and precious support during my stay at the University of Cincinnati. In every sense, he was always a fruitful source of inspiration from which I tremendously benefited. I am also very grateful to the members of the dissertation committee: Dr. Chia Y. Han, Dr. Xuefu Zhou, Dr. Raj Bhatnagar and Dr. Ali Minai for kindly sharing their scientific knowledge, devoting priceless time and providing constructive advice and comments on my research. I learned a lot from enthusiastic discussions with them.

In addition, many thanks go to the former and current members of the Multimedia and Augmented Reality lab at the University of Cincinnati for their help, kindness and valuable suggestions.

Most of all, I owe my warmest thanks to my father Dr. Shuhai Quan and my mother Mrs. Yulan Ruan. Nothing could have been possible without their everlasting love, understanding and support.

Table of Contents

Abstract
Acknowledgments
List of Figures
List of Tables

1 Introduction
1.1 Image Denoising
1.2 Problem Statement
1.3 Research Scope
1.4 Contributions
1.5 Dissertation Organization

Part I Image Denoising for Additive Gaussian Noise

2 Background
2.1 Additive Gaussian Noise Model
2.2 Image Quality Evaluation
2.2.1 Objective Image Quality Evaluation
2.2.2 Subjective Image Quality Evaluation
2.3 Summary

3 Literature Review
3.1 Spatial Domain Approaches
3.1.1 Linear Filters
3.1.2 Nonlinear Filters
3.2 Transformed Domain Approaches
3.2.1 Fourier Transform Denoising
3.2.2 Wavelet Transform Denoising
3.2.3 Data-Adaptive Transform Denoising
3.3 Other Alternative Approaches
3.4 Summary

4 The Proposed Gaussian Denoising Method: CMWT
4.1 Motivation
4.2 Overview
4.3 Denoising Operation 1: Improved Context Modeling
4.4 Experimental Results of Denoising Operation 1
4.5 Denoising Operation 2: The Optimization of the Soft-Thresholding Function
4.6 Experimental Results of Denoising Operation 2
4.7 Experimental Results of Combining Two Denoising Operations
4.8 The Steps of the Proposed Denoising Method: CMWT
4.9 Summary

5 Expansion to the Overcomplete Representation
5.1 Overview of Applying Overcomplete Expansion
5.2 Overcomplete Expansion Procedure
5.3 Experimental Results for Overcomplete Expansion
5.4 Summary

Part II Image Denoising for Poisson Noise

6 Background on Poisson Denoising
6.1 Modeling of Low Intensity Images
6.2 Poisson Noise Model
6.3 Related Work
6.3.1 Variance Stabilization
6.3.2 Hypothesis Testing
6.3.3 Wavelet Filtering
6.3.4 Bayesian Based Approach
6.4 Anscombe Transformation and Its Inversions
6.5 Summary

7 The Proposed Poisson Denoising Method: CMWT-IAT
7.1 Investigation of the Biased Errors
7.2 New Inverse Transformation for the Anscombe Transformation
7.3 Combination of the Denoising Method CMWT for AWGN
7.4 Summary

8 Performance Evaluation
8.1 Comparisons with Two Conventional Inversions
8.2 Comparisons with SURE-LET Using the Proposed Inversion
8.3 Comparisons with State-of-the-Art Denoising Methods
8.4 Summary

9 Conclusion and Perspectives
9.1 Conclusion
9.2 Perspectives

Bibliography

List of Figures

1.1 (a) A noise-free Image Pepper, (b) A noisy version of it
3.1 Original Image Lena and its Fourier decomposition
3.2 One scale of wavelet decomposition
3.3 Some famous wavelets
3.4 Original Image Lena and its first level decomposition by using the db4 wavelet
3.5 Hard-thresholding function
3.6 Soft-thresholding function
3.7 Semisoft-thresholding function
4.1 A wavelet denoising flowchart
4.2 The parent-child relationship of a three level wavelet decomposition
4.3 Subband designations
4.4 Six standard testing images used in our experiments: (a) Lena, (b) Boat, (c) Goldhill, (d) Barbara, (e) Couple, (f) Man
4.5 Subbands of the 2D orthogonal wavelet transform
4.6 Sensitivity of the denoising function with respect to variations of T on Lena
4.7 Sensitivity of the denoising function with respect to variations of T on Boat
4.8 Visual comparison between the original soft-thresholding and the optimized soft-thresholding functions on Image Goldhill: (a) Original Goldhill, (b) Noisy version of noise level 30, (c) Denoised image by the original soft-thresholding function, (d) Denoised image by the optimized soft-thresholding function
4.9 Visual comparison between the original soft-thresholding and the optimized soft-thresholding functions on Image Barbara: (a) Original Barbara, (b) Noisy version of noise level 30, (c) Denoised image by the original soft-thresholding function, (d) Denoised image by the optimized soft-thresholding function
4.10 Comparison of PSNR of CMWT and SURE-LET [1] on Lena
4.11 Comparison of PSNR of CMWT and SURE-LET [1] on Goldhill
4.12 Comparison of PSNR of CMWT and SURE-LET [1] on Couple
4.13 Visual comparison on Image Lena: (a) Original image, (b) Noisy image of noise level 20, (c) Denoised image by SURE-LET [1], (d) Denoised image by CMWT
4.14 Visual comparison on Image Goldhill: (a) Original image, (b) Noisy image of noise level 20, (c) Denoised image by SURE-LET [1], (d) Denoised image by CMWT
4.15 Visual comparison on Image Couple: (a) Original image, (b) Noisy image of noise level 20, (c) Denoised image by SURE-LET [1], (d) Denoised image by CMWT
5.1 Relation between shifted times and PSNR of the denoised Image Lena
5.2 Relation between shifted times and PSNR of the denoised Image Goldhill
5.3 Comparison of PSNR (dB) with the 3 other most efficient methods on Lena
5.4 Comparison of PSNR (dB) with the 3 other most efficient methods on Goldhill
5.5 Comparison of PSNR (dB) with the 3 other most efficient methods on Couple
5.6 The Image Lena: (a) Noise-free image, (b) Noisy image with noise level of 50, (c) Denoised image using SURE-LET [62], (d) Denoised image using BLS-GSM [3], (e) Denoised image using BM3D [92], (f) Denoised image using CMWT-OE
5.7 The Image Boat: (a) Noise-free image, (b) Noisy image with noise level of 50, (c) Denoised image using SURE-LET [62], (d) Denoised image using BLS-GSM [3], (e) Denoised image using BM3D [92], (f) Denoised image using CMWT-OE
5.8 Wavelet domain images of Barbara at: (a) the first scale, (b) the second scale (scaled up by two), (c) the third scale (scaled up by four)
6.1 General Poisson noisy image denoising procedure
7.1 Variance of Poisson distributed data sets
7.2 Variance of transformed data sets
7.3 Biased errors between Poisson parameters and estimated means
7.4 Curve fitting for Poisson parameter under 10
7.5 Curve fitting for Poisson parameter from 10 to 30
8.1 (a) The original Image Boat at peak intensity 30, (b) Poisson noise corrupted image, (c) Image denoised with non-overcomplete SURE-LET [1] and the proposed inversion, (d) Image denoised with CMWT-IAT
8.2 (a) Part of the original Image Man at peak intensity 30, (b) Poisson noise corrupted image, (c) Image denoised with non-overcomplete SURE-LET [1] and the proposed inversion, (d) Image denoised with CMWT-IAT
8.3 (a) The original Image Goldhill at peak intensity 20, (b) Poisson noise corrupted image, (c) Image denoised with overcomplete SURE-LET [62] and the proposed inversion, (d) Image denoised with PURE-LET [6], (e) Image denoised with BM3D and their unbiased inversion [7], (f) Image denoised with our denoising method [109] and the proposed inversion
8.4 (a) The original Image Couple at peak intensity 10, (b) Poisson noise corrupted image, (c) Image denoised with overcomplete SURE-LET [62] and the proposed inversion, (d) Image denoised with PURE-LET [6], (e) Image denoised with BM3D and their unbiased inversion [7], (f) Image denoised with our denoising method [109] and the proposed inversion

List of Tables

3.1 Comparison of several most famous thresholding denoising methods (PSNR)
3.2 Comparison of the aforementioned image denoising methods (PSNR)
4.1 Table for selecting A at different subbands and estimated noise levels
4.2 Comparison of PSNR of 3 images from a reconstruction of Znew(i, j) and the input PSNR
4.3 Table for selecting Δa1 under different subbands and noise levels
4.4 Table for selecting Δa2 under different subbands and noise levels
4.5 Comparison of PSNR (dB) with soft-thresholding (non-redundant)
4.6 Comparison of PSNR (dB) with the method SURE-LET (non-redundant)
5.1 Comparison of several most famous thresholding denoising methods (PSNR)
8.1 PSNR (dB) comparison of the arithmetical, asymptotical inverse Anscombe transformations and the proposed inverse transformation
8.2 PSNR comparison of SURE-LET [1] and our denoising method [109] combined with the proposed inversion for different images and peak intensities
8.3 PSNR comparison of some of the best denoising methods for different images and peak intensities

Chapter 1

Introduction

1.1 Image Denoising

Images obtained from the real world are always mixed with noise, which derives from multiple sources. The imperfect instrument itself produces a certain amount of noise when the image is taken. When transforming the optical signal into a digital signal, the pixel value at a specific location depends on the number of photons the corresponding captor has received, so instability in the number of received photons causes noise. Moreover, during the image's amplification and transmission, additional perturbations can be introduced by electronic devices and transmission lines.

There are several different types of noise in digital images. For instance, shot noise is generated by the random way photons are emitted from a light source, especially when the light intensity is limited, and it is usually characterized by a Poisson distribution. Thermal noise, also known as dark current noise, is produced by thermal agitation of electrons at sensing sites and is highly dependent on the sensor's temperature and the exposure time. Images with impulsive noise, which is generally caused by malfunctioning elements in the camera sensors or timing errors in the data transmission process, have bright pixels in dark areas and dark pixels in bright areas. And quantization noise often occurs due to errors when an analog signal is converted to a number of discrete digital values.

Since noise seriously compromises the details of the image and hampers image understanding and image analysis in scientific and commercial applications (see Figure 1.1), image denoising is extensively required. Thus it is highly necessary to use an appropriate and efficient denoising approach to eliminate or reduce noise while keeping the important image features when pre-processing images.

Figure 1.1: (a) A noise-free Image Pepper, (b) A noisy version of it

1.2 Problem Statement

Image denoising attempts to recover a noise-free image by eliminating or reducing the noise on the observed image. This processing can be modeled as obtaining an optimal estimate of the unknown noise-free image from the available noise-corrupted image. A large body of scientific literature has focused on image denoising in the last decade, and there still exists a wide range of interest in the subject nowadays.

Although various algorithms and tools have been proposed, derived and improved, the problem is that many denoising techniques suffer from over-softening crucial image features as well as introducing artifacts. Thus the search for an efficient image denoising method is still a challenging task.

Besides, the amount of noise usually depends on the signal intensity. Practitioners often consider it to follow a statistical distribution. Generally, when the magnitude of the measured signal is sufficiently high, the noise is supposed to be independent of the original image that it corrupts, and modeled as an additive Gaussian random variable. On the other hand, when the magnitude of the observed signal is relatively low, it is often assumed to follow a Poisson distribution.

Thus, the general goal of this research is to design and implement an efficient image denoising method for Gaussian and Poisson noise, which can satisfy the following requirements:

• Competitive performance

The proposed algorithm should be competitive with other state-of-the-art denoising methods according to certain objective measurements, such as the Peak Signal-to-Noise Ratio (PSNR). It should also satisfy human visual assessment.

• Minimal human interaction

The human interaction should be minimized during the denoising process when ap- plying the proposed algorithm, in other words, the entire denoising process should be totally automatic.

• Low computational burden

The proposed algorithm should not require a very high computing capacity, a regular personal computer should satisfy the hardware requirement and be qualified to com- plete the whole process in a short period of time.

• Adequate reliability

The proposed algorithm should demonstrate consistent and repeatable experimental results regardless of the sources of images and how many times the denoising process is performed.

• Wide targeted application

The major application of the algorithm is to denoise natural images corrupted by additive Gaussian noise. Under certain conditions, it should also be applicable to the denoising of images corrupted by other types of noise, such as Poisson noise corrupted images, which are commonly obtained in astronomy and biomedical research.

1.3 Research Scope

Although ultrasound images are more likely to be corrupted by speckle noise and the statistical distribution of the noise found in MRI images is closer to Rician, most of the noise acquired during acquisition and transmission of natural images is assumed to be additive white Gaussian noise (AWGN). So, in particular, the scope of Part I of this research is the suppression of AWGN on natural images.

Meanwhile, the image type focused on in this research is grayscale, though we make occasional references to papers on color image denoising. The images used for experimental purposes are all standard grayscale natural testing images. These grayscale images contain 8-bit data, which means the brightness levels range from 0∼255.

In addition, in order to broaden the range of applications of our denoising method, in Part II of the dissertation, we extend the work of this research to the denoising of non Gaussian-corrupted images, in particular, the denoising of low intensity Poisson-corrupted images. Poisson-corrupted images are commonly acquired in biomedical imaging and astronomy research when the intensity of the light source is limited and the exposure time is short. Noise on these low intensity images is often modeled as Poisson distributed.

1.4 Contributions

In this dissertation, we address the image denoising problem with a focus on the removal of additive Gaussian noise and Poisson noise. By researching the wavelet transform and the soft-thresholding strategy, we develop a very competitive image denoising method. Here we present the original contributions of this dissertation, consisting of the following:

1. Many effective denoising methods involve sophisticated redundant transforms, which carry a heavy computational burden. Our proposed method is based on the orthogonal wavelet transform, which is simple and fast but retains many useful properties. It is interesting to see that, even with this non-redundant transform, our method is comparable to state-of-the-art denoising methods under the redundant wavelet transform framework.

2. A new statistical model in the wavelet domain is used to smooth the image and reduce the noise prior to thresholding. We fully exploit the parent-child relationship between wavelet coefficients and inspect the neighboring dependency. We also give a detailed discussion of how to construct the estimate for each coefficient by properly modifying the number of relevant coefficients involved with respect to the wavelet decomposition levels.

3. In order to achieve optimal performance, most existing denoising algorithms require the optimization of several parameters by solving a nonlinear system, whereas the parameters in our method are obtained in a practical manner from the noise model. We do not need to solve complicated optimization equations to get these parameters, but instead derive a modification of a closed-form solution based on an unbiased estimate of the MSE, which saves tremendous computing time.

4. The core principle driving us to develop the proposed denoising method is to minimize the mean-squared error (MSE). From a practical standpoint, although this criterion cannot be optimized in real applications due to the unavailability of the noise-free images, neither our method for additive Gaussian denoising nor its version for Poisson denoising requires any prior information about the noise-free images. Unlike many Bayesian based methods, we also do not assume a statistical model for the observed image.

5. We extend the denoising method to perform wavelet overcomplete expansion, which yields even fewer visual artifacts and better image quality. Although it costs extra computational time, if visual effect is the main concern in a specific application, this expansion is worth implementing.

1.5 Dissertation Organization

This dissertation mainly consists of two parts.

The first part of this dissertation generally deals with the problem of the noise removal of additive white Gaussian noise (AWGN) on 2D images.

In Chapter 2, we investigate the Gaussian noise model with its important features and properties. Common measurements applied to evaluate the performance and the efficiency of denoising approaches are included. Both objective and subjective image quality assessments are discussed.

In Chapter 3, we provide an in-depth and comprehensive literature review of the related work on Gaussian image denoising. Several influential approaches are described and compared to each other in details.

In Chapter 4, first, we review the motivation of reducing the upper bound of the soft-thresholding scheme so as to achieve the largest denoising effect, which leads to our two-operation denoising method. Then, a low-complexity but remarkably efficient denoising method is provided. This method incorporates an improved context modeling into the optimization of parameterized thresholding functions in the form of derivatives of Gaussian (DOG). We also present the efficiency of each denoising operation with numerical and visual results and combine them together as a two-operation process, comparing with a state-of-the-art image denoising method, SURE-LET [1], under the non-overcomplete wavelet transform.

In Chapter 5, we extend the approach to the overcomplete representation of the wavelet transform by using the cycle spinning strategy [2] in order to suppress the pseudo-Gibbs phenomena introduced by the standard orthogonal wavelet transform. After applying the cycle spinning strategy, we compare our method to the redundant versions of state-of-the-art wavelet techniques such as [3, 4] in terms of Peak Signal-to-Noise Ratio (PSNR), respectively, as well as a non-wavelet method (BM3D) [5].

The second part of this dissertation is dedicated to the problem of non-Gaussian noise reduction, in particular Poisson noise, on 2D images.

In Chapter 6, we discuss the necessity and the rationale of modeling low intensity images as Poisson distributed and give the mathematical model for Poisson noise. After this, an extensive literature survey of Poisson denoising research work is included. At the end of this chapter, the most widely used variance stabilizing transformation (VST), the Anscombe transformation, and its inversions are described in detail.

In Chapter 7, we apply the Anscombe transformation to reduce the Poisson signal dependence under the VST framework so that our previously developed denoising method for AWGN can be directly used. Moreover, to avoid the biased errors generated by its conventional inversions, we devise a new inversion aimed at correcting these errors. This new inversion is based on a piecewise polynomial regression model in the sense of weighted least squares.

In Chapter 8, we compare our proposed new inverse transformation with two conventional inverse transformations. We also evaluate its performance, combined with our denoising method for AWGN proposed in Part I, by comparing to some of the leading algorithms such as [1, 6, 7]. Numerical analysis and denoised images are presented, indicating it is as competitive as some state-of-the-art denoisers.

Finally in Chapter 9, the conclusion and some perspectives are presented.

Part I

Image Denoising for Additive Gaussian Noise

Chapter 2

Background

It is a challenging task to suppress disturbing noise while preserving important image features at the same time. In order to guarantee a successful denoising operation, an appropriate noise model should be set up prior to any further development. Hence, in this chapter, we investigate the commonly used Gaussian noise model with its basic characteristics. In addition, to assess the effects of a specific denoising method and compare it with other approaches, we introduce several popular criteria for performance evaluation.

2.1 Additive Gaussian Noise Model

As is well known, the pixel intensity value at a specific location on an image is highly related to the number of photons obtained by the corresponding captor during a fixed period of time. According to the central limit theorem, when the light source is stable, the number of photons received by a single captor fluctuates around its average. Thus, in many real applications, especially when the magnitude of the observed signal is relatively high, the noise n(i, j) can be reasonably assumed to be independent and Gaussian distributed with mean \mu and standard deviation \sigma. By assuming independence, we simply mean that pixel intensity values at different locations should be independent random variables. The 1D probability density function (PDF) of the Gaussian distribution is defined by

G(x) = \frac{1}{\sqrt{2\pi}\,\sigma} \exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right) \qquad (2.1)

The noise can also be modeled as additive, which means that each pixel intensity value in the noisy image is the sum of the underlying true intensity value and a random noise value following a Gaussian distribution. One can write

y(i) = x(i) + n(i) \qquad (2.2)

where y(i) is the observed value at pixel location i, x(i) is the true value, i.e., the value that would be obtained if one averaged the observations at that location over a very long period of time, and n(i) is the noise.
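As a concrete illustration of the model y(i) = x(i) + n(i), the following sketch corrupts a toy signal with zero-mean Gaussian noise. The helper name `add_awgn` and the flat-list representation of the image are assumptions made only for this example.

```python
import random

def add_awgn(image, sigma, seed=0):
    """Simulate y(i) = x(i) + n(i) with n(i) ~ N(0, sigma^2).

    `image` is a flat list of pixel intensities, standing in for a 2D image.
    A fixed seed keeps the simulated noise reproducible.
    """
    rng = random.Random(seed)
    return [x + rng.gauss(0.0, sigma) for x in image]

clean = [100.0, 120.0, 130.0, 90.0]
noisy = add_awgn(clean, sigma=10.0)
```

Note that the noise here is independent of the pixel values, in contrast to the signal-dependent Poisson model treated in Part II.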

2.2 Image Quality Evaluation

Choosing a proper way to evaluate the performance of denoising approaches is crucial, since it is directly connected to the validation of a denoising algorithm in real applications. Generally, there are two popular approaches to image quality assessment. We discuss them in the following.

We discuss them in the following.

2.2.1 Objective Image Quality Evaluation

The most intuitive criterion is to calculate the standard deviation of the noise on the denoised image and evaluate it in accordance with the image's smoothness. Other than this, there are the following main objective quality assessment measures.

• Mean squared error (MSE)

The mean squared error is widely used for quality measurement. It is a mathematical performance index that measures the similarity between the denoised image \hat{x} and the reference, i.e., the original noiseless image x. Its expression is defined by

\mathrm{MSE} = \frac{1}{N}\|\hat{x} - x\|^2 = \frac{1}{N}\sum_{i=1}^{N}(\hat{x}_i - x_i)^2 \qquad (2.3)

The lower MSE the denoised image has, the more successful the reconstruction is.

• Peak Signal to Noise Ratio (PSNR)

In image denoising applications, the MSE is often normalized by the square of the maximum value of the signal (most likely 255 in a regular 2D image with intensities 0∼255) and scaled in logarithmic form. Thus, another common measurement, the peak signal to noise ratio (PSNR), is derived as

\mathrm{PSNR} = 10 \log_{10} \frac{\max(x)^2}{\mathrm{MSE}} \qquad (2.4)
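The two measures above can be computed directly from their definitions. A minimal sketch in pure Python, with a flat list standing in for an image; the function names are illustrative:

```python
import math

def mse(x_hat, x):
    # Eq. (2.3): mean of the squared pixel differences.
    n = len(x)
    return sum((a - b) ** 2 for a, b in zip(x_hat, x)) / n

def psnr(x_hat, x, peak=255.0):
    # Eq. (2.4): 10 log10(peak^2 / MSE); undefined when MSE is zero.
    return 10.0 * math.log10(peak ** 2 / mse(x_hat, x))

ref      = [100.0, 120.0, 130.0, 90.0]
denoised = [102.0, 118.0, 131.0, 89.0]
# mse(denoised, ref) = (4 + 4 + 1 + 1) / 4 = 2.5
```

Because PSNR is a logarithm of an inverse MSE, a lower MSE always maps to a higher PSNR for a fixed peak value.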

• Structural Similarity Index (SSIM)

In order to better capture local signal specificities, the structural similarity index (SSIM) [8] has been designed. This performance index analyzes three different types of similarities between the denoised image \hat{x} and the original noiseless image x: structural, contrast and luminance similarities. By taking advantage of all of them, the formula for SSIM is as follows

\mathrm{SSIM}(x, \hat{x}) = \frac{(2\mu_x \mu_{\hat{x}} + c_1)(2\sigma_{x\hat{x}} + c_2)}{(\mu_x^2 + \mu_{\hat{x}}^2 + c_1)(\sigma_x^2 + \sigma_{\hat{x}}^2 + c_2)} \qquad (2.5)

where \mu_x and \mu_{\hat{x}} are local measures of the mean of the noiseless image and the denoised image; \sigma_x^2 and \sigma_{\hat{x}}^2 are local measures of their variances; \sigma_{x\hat{x}} is a local measure of the correlation between the two images; and c_1, c_2 are some predefined constants.

It is often shown that the denoised image scoring the highest SSIM value is not necessarily the one having the lowest MSE.
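For illustration, here is a single-window simplification of Eq. (2.5), in which the means, variances and covariance are computed over the whole signal rather than in local windows; the constants c1 and c2 use commonly quoted defaults for 8-bit data, which is an assumption for this sketch, not a value fixed by the text.

```python
def ssim_global(x, x_hat, c1=6.5025, c2=58.5225):
    # Single-window simplification of Eq. (2.5). Assumed defaults:
    # c1 = (0.01 * 255)^2 and c2 = (0.03 * 255)^2 for 8-bit data.
    n = len(x)
    mu_x = sum(x) / n
    mu_y = sum(x_hat) / n
    var_x = sum((a - mu_x) ** 2 for a in x) / n
    var_y = sum((b - mu_y) ** 2 for b in x_hat) / n
    cov = sum((a - mu_x) * (b - mu_y) for a, b in zip(x, x_hat)) / n
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    )
```

An identical pair of signals scores exactly 1, the maximum; any structural disagreement lowers the covariance term and thus the index.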

12 • Information Fidelity Criterion (IFC)

It is one useful representative of the many quality metrics correlating with human perception. It was proposed by Sheikh et al. in [9] based on natural scene statistics, where they apply a Gaussian scale mixture model under an information fidelity framework. They assess image quality by quantifying the statistical information shared between the original image and the denoised images. This criterion requires neither additional parameters nor training data. A larger IFC value indicates that the test image is more similar to the original image.

2.2.2 Subjective Image Quality Evaluation

Under certain conditions, such as when the original clean image is not available, which is very common in practice, it is impossible to exercise an objective assessment. In order to measure the image quality, the simplest way is to rely on human judgment. Rather than the amount of noise reduced, the human visual system is prone to focus more on the various artifacts, sharp edges and discontinuities exhibited in the denoised image [10].

Another generic criterion to subjectively evaluate the denoising quality is the so-called "method noise" coined in [11], where Buades et al. analyzed the noise guessed by the denoising algorithm. One example of the "method noise" can be defined as the difference between the noisy image and the image denoised by a specific denoising operator. This residual should behave as similarly to white noise as possible and possess the same statistics as the noise, while not representing the structure of the original image.
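The method-noise idea can be sketched in a few lines: denoise a signal with some operator and inspect the residual. The box filter below is only a toy stand-in for a real denoiser, chosen for this illustration.

```python
def box_denoise(signal, radius=1):
    # Toy 1D box-filter "denoiser", used only to illustrate the residual.
    out = []
    n = len(signal)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(signal[lo:hi]) / (hi - lo))
    return out

def method_noise(noisy, denoised):
    # The residual removed by the denoiser; for a good denoiser it should
    # look like structureless white noise, not like the image itself.
    return [y - d for y, d in zip(noisy, denoised)]

noisy = [10.0, 14.0, 9.0, 11.0, 16.0]
residual = method_noise(noisy, box_denoise(noisy))
```

If the residual shows edges or textures from the scene, the denoiser is removing signal rather than noise.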

Besides, some other subjective image quality evaluating methods such as [12, 13] have also been proposed, and they are usually limited to a particular application or a certain type of noisy corruption.

2.3 Summary

In practice, most denoising algorithms require prior knowledge of the statistics of the noise. Though different noise models can be assumed due to different sources of noise and detection devices, in this chapter and the forthcoming chapters of Part I, we tackle the problem of noise removal by modeling the noise as an additive, independent random process following a Gaussian distribution.

We also analyzed the two types of criteria used to assess the performance of image denoising approaches: objective quality assessment and subjective quality assessment.

Chapter 3

Literature Review

In this chapter, we provide a detailed review of the most popular and advanced research approaches in contemporary and some seminal literature on Gaussian image denoising.

Since there exists a tremendous amount of related work on the subject of image denoising, a comprehensive overview is beyond the scope of this dissertation.

Therefore, we focus on two main classes of approaches to the image denoising problem: spatial-domain denoising approaches and transform-domain denoising approaches. For each class, we discuss the basic principles and provide more details on some of the most popular approaches. Note that these classifications are not strict, due to the hybrid schemes some denoising methods apply.

3.1 Spatial Domain Approaches

Traditionally, spatial filters are employed in the spatial domain to remove noise from corrupted images. The denoising is performed directly on the observed pixel intensities, and the operation can be either pixel-wise or based on a relevant neighborhood.

These spatial filters can be further grouped into two classes: linear filters and nonlinear filters.

3.1.1 Linear Filters

The most famous linear filter is the traditional Wiener filter [14]. If the desired signal and noise statistics are known, it is an optimal estimator in terms of MSE. Wiener filtering provides a space-invariant linear estimation of images corrupted by additive noise.

Suppose that the observed noisy image y is given by

y = x + n    (3.1)

where x is the noise-free image and n is the noise with zero mean and variance σ². We suppose they are both independent second-order stationary random processes.

The denoising problem then becomes giving an estimate x̂ = w × y of x such that the mean squared error between x and x̂ is minimized, where w is a linear filter satisfying the following criterion:

w_opt = arg min_w E[ (1/N) Σ_{i,j} ‖w × y_{i,j} − x_{i,j}‖² ]    (3.2)

where E is the statistical expectation and N is the data size. However, due to the space-invariant property, the filter ignores the distinction between edges and smooth areas and thus suffers from over-blurring edges. Additionally, Wiener filtering relies remarkably on the power spectrum of the noise-free data; in other words, it only produces fine performance when the given data are relatively smooth.

Lee [15] proposed an algorithm as a spatially adaptive extension of the Wiener filter.

This algorithm is based on the assumption that the sample mean and variance of a pixel are equal to the local mean and variance of all pixels within a fixed neighborhood.

For the additive noise case, he assumes the mean and variance of the clean image can be estimated as the difference between the mean and variance of the noisy image and the mean and variance of the noise. The algorithm then focuses on deriving the mean and variance of each pixel from its neighborhood. It yields much better performance than the traditional Wiener filter, both in terms of MSE and of visual effects. However, since not only useful image features such as sharp edges and delicate details but also noise tend to lie in the high-frequency spectrum, linear filtering cannot effectively distinguish between them; in fact, it is prone to over-smoothing these features, so its application is limited nowadays.
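This local-statistics idea can be sketched in a few lines of NumPy. The window size and noise variance below are illustrative parameters, not Lee's original choices:

```python
import numpy as np

def lee_filter(y, win=3, noise_var=25.0):
    # Spatially adaptive Wiener (Lee) filter: estimate the local mean
    # and variance in a win x win neighborhood, then shrink each pixel
    # toward the local mean according to the estimated signal variance.
    r = win // 2
    padded = np.pad(y, r, mode="reflect")
    out = np.empty_like(y, dtype=float)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            patch = padded[i:i + win, j:j + win]
            mu = patch.mean()
            # Signal variance: local variance minus the noise variance.
            sig_var = max(patch.var() - noise_var, 0.0)
            gain = sig_var / (sig_var + noise_var)
            out[i, j] = mu + gain * (y[i, j] - mu)
    return out
```

In flat regions the estimated signal variance is near zero, so the output collapses to the local mean (strong smoothing); near edges the gain approaches one and the pixel is left nearly unchanged.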

3.1.2 Nonlinear Filters

The most popular nonlinear filter is the median filter, which simply replaces the intensity of a pixel by the median of the intensities in the neighborhood of this pixel.

Suppose y_{i,j} denotes the intensity of the pixel (i, j); the median filter can be defined as follows to estimate the corresponding pixel x̂_{i,j} of the uncorrupted image:

x̂_{i,j} = Med{ y_{s,t} : (s, t) ∈ S(i, j) }    (3.3)

where S(i, j) is the filtering window of pixel (i, j).

Since it does not require the statistical properties of the image, it is convenient to implement and apply. Meanwhile, it is effective for denoising because noise appears as isolated pixels and the number of such noisy pixels is relatively small, while the image is constituted by blocks of much larger numbers of pixels. So when smoothing the image, the median filter can preserve most of the edge information.

In practical applications, the filter window size is normally selected as 3 × 3 and then gradually increased to 5 × 5 and larger until a given performance index is achieved.

For objects with long, slowly changing contours, it is appropriate to adapt the square filter window to a rectangular one.
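Equation (3.3) translates directly into NumPy; the reflective border handling below is our assumption:

```python
import numpy as np

def median_filter(y, win=3):
    # Replace each pixel by the median of its win x win neighborhood
    # (Eq. 3.3); the window is mirrored at the image borders.
    r = win // 2
    padded = np.pad(y, r, mode="reflect")
    out = np.empty_like(y, dtype=float)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            out[i, j] = np.median(padded[i:i + win, j:j + win])
    return out
```

An isolated impulse is completely removed, since it is an extreme value inside every window that contains it.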

In addition, since its output is always a value among the inputs (it is known as a selective filter), the median filter is especially effective for impulsive noise, with less blurring than linear filters of similar size. However, it does not provide much improvement when dealing with AWGN.

Another simple nonlinear filter is the averaging filter. It replaces a pixel's intensity by the average of all the pixels' intensities within a square or rectangular filter window surrounding it. Though it is computationally simple, its main drawback is that the denoising effect is highly dependent on the size of the filter window: the larger the window, the more blurred the image.

Other than the median and averaging filters, many advanced nonlinear filters based on them have also been proposed. We introduce a few of the most famous among them.

• Order Statistic Filter

Bovik et al. proposed an order statistic filter (OSF) in [16], which combines properties of both the averaging and median filters. The output of this filter is given by a linear combination of the order statistics of the input sequence. They also explicitly derived an analytical expression for the optimal OSF coefficients.

• Weighted Median Filter

A weighted median filter is introduced in [17]. In this paper, Yang et al. derived a new expression for the output distribution and the output moments of weighted median filters. They applied this expression to evaluate the denoising capability of the proposed weighted median filter. They also justified that the weighted median filter can be easily obtained by solving a set of linear inequalities if certain structural constraints are met; otherwise nonlinear programming has to be used to get the optimal solution.

• Bilateral Filter

The basic idea underlying the bilateral filter [18] is to combine domain filtering and range filtering. When dealing with image discontinuities, besides implementing traditional domain filtering, range filtering is adopted, which groups pixels with similar intensity values together regardless of their spatial locations. Since the authors assumed that pixels occupying nearby spatial locations share some similarity and are more correlated than distant pixels, the filter replaces a pixel value with a normalized weighted average of similar pixel values in a local neighborhood. The weights of these nearby pixels are the products of the spatial-domain and range-domain weights. In this way, edges and discontinuities are preserved well while noise is averaged out.
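A minimal sketch of this two-weight idea (the window size and the two Gaussian widths are illustrative parameters):

```python
import numpy as np

def bilateral_filter(y, win=5, sigma_s=2.0, sigma_r=20.0):
    # Each output pixel is a normalized weighted average of its
    # neighbors; the weights are the product of a spatial Gaussian
    # (domain filtering) and an intensity Gaussian (range filtering).
    r = win // 2
    padded = np.pad(y, r, mode="reflect")
    # Precompute the spatial (domain) weights once.
    ax = np.arange(-r, r + 1)
    xx, yy = np.meshgrid(ax, ax)
    spatial = np.exp(-(xx**2 + yy**2) / (2.0 * sigma_s**2))
    out = np.empty_like(y, dtype=float)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            patch = padded[i:i + win, j:j + win]
            # Range weights: similar intensities get larger weights.
            rng = np.exp(-(patch - y[i, j])**2 / (2.0 * sigma_r**2))
            w = spatial * rng
            out[i, j] = (w * patch).sum() / w.sum()
    return out
```

Across an edge the range weights of the far-side pixels collapse toward zero, which is exactly why edges survive the averaging.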

• Tri-State Median Filter

In [19], Chen et al. proposed a noise detection scheme combining the standard median (SM) filter and the center-weighted median (CWM) filter. The principle of the noise detection is to compare the outputs of the SM and CWM filters with the original (central) pixel value to produce a tri-state decision. They used this scheme to determine whether a specific pixel is contaminated by noise before possibly replacing it with a new value.

• Relaxed Median Filter

In [20], a novel nonlinear filter called the relaxed median filter, based on the median filter, is proposed by Hamza et al. It uses lower and upper bounds to define a sublist inside the filtering window that contains the intensities that are not supposed to be filtered. When an input belongs to the sublist, it is not altered; otherwise the output is obtained by filtering with the standard median filter.
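One plausible reading of this rule, sketched with order-statistic bounds chosen purely for illustration (not necessarily those of [20]):

```python
import numpy as np

def relaxed_median(y, win=3, lower=2, upper=6):
    # If the central pixel lies between the 'lower'-th and 'upper'-th
    # order statistics of its window it is kept unchanged; otherwise
    # it is replaced by the standard median of the window.
    r = win // 2
    padded = np.pad(y, r, mode="reflect")
    out = np.empty_like(y, dtype=float)
    for i in range(y.shape[0]):
        for j in range(y.shape[1]):
            window = np.sort(padded[i:i + win, j:j + win].ravel())
            lo, hi = window[lower], window[upper]
            v = y[i, j]
            out[i, j] = v if lo <= v <= hi else np.median(window)
    return out
```

Pixels consistent with their neighborhood pass through untouched, so fine detail is better preserved than with the plain median filter, while outliers are still replaced.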

However, most spatial-domain denoising techniques blur the edges of images, since they smooth the singularities.

3.2 Transformed Domain Approaches

The motivation for transformed-domain processing is that, with an appropriate transform, most of the signal's energy can be concentrated into a small number of coefficients. Thus, if we keep the few large coefficients with high SNR while discarding the large number of small coefficients with low SNR, the signal can be effectively separated from the noise. The basic idea is to first convert the contaminated image into another domain using an appropriate transformation, then perform an effective denoising operation in the transformed space without affecting the real signal content, and finally convert back to the original image domain.

3.2.1 Fourier Transform Denoising

At the earliest stage, the Fourier transform was widely applied as a transform-domain denoising approach. Under the Fourier transform, using low-pass filters with an optimal cut-off frequency, a denoising effect can be achieved by suppressing high-frequency coefficients [21]. Since then, other transforms such as the Discrete Cosine Transform (DCT) [22] have been developed. The advantage of applying the DCT to denoising is that it assumes a mirror reflection of the signal at the boundaries instead of periodic repetition, so its spectrum is less subject to boundary effects when removing discontinuities on images.

However, the main drawback of the Fourier transform is that its basis functions are global, so it is considered very difficult, if not impossible, to determine whether a signal includes a particular frequency at a particular location in the physical domain (space for 2D images or time for 1D signals); please refer to Figure 3.1. Besides, these transforms are not only largely dependent on the cut-off frequency and filter model, but also comparatively time-consuming.
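The low-pass Fourier strategy can be sketched with NumPy's FFT; the cutoff below is an illustrative normalized frequency, not an optimal one:

```python
import numpy as np

def fft_lowpass(y, cutoff=0.15):
    # Zero out all Fourier coefficients whose normalized frequency
    # radius exceeds 'cutoff', then invert the transform.
    F = np.fft.fftshift(np.fft.fft2(y))
    h, w = y.shape
    fy = np.fft.fftshift(np.fft.fftfreq(h))
    fx = np.fft.fftshift(np.fft.fftfreq(w))
    radius = np.sqrt(fy[:, None]**2 + fx[None, :]**2)
    F[radius > cutoff] = 0.0
    return np.real(np.fft.ifft2(np.fft.ifftshift(F)))
```

Because the removed coefficients are global, any sharp edge in the image rings after reconstruction, which is precisely the localization problem discussed above.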

Figure 3.1: Original Image Lena and its Fourier decomposition

3.2.2 Wavelet Transform Denoising

Among various transform domain denoising approaches, wavelet transform [23, 24] is increasingly considered as a powerful tool for its outstanding denoising performance.

A wavelet is a brief wave-like oscillation whose amplitude begins at zero, increases, and decreases back to zero. It has small area, limited length and zero average, and it can analyze a signal in detail at different scales. To apply a wavelet transform, a particular wavelet, known as the mother wavelet, is chosen first.

Then it is translated and dilated to meet a given scale and locate a specific position, while its correlations with the analyzed signal are investigated. From an alternative point of view, wavelet analysis is a two-channel digital filter bank consisting of a lowpass and a highpass filter. The lowpass filtering yields an approximation of the signal at a given scale, whereas the highpass filtering yields the details that constitute the difference between two successive approximations.

The following figure illustrates the wavelet decomposition of one scale.

Figure 3.2: One scale of wavelet decomposition

The subband labels denote the order in which the highpass (H) and lowpass (L) filters are applied along the orientations of the input image. For instance, the label LH refers to the subband in which the coefficients are the output of the lowpass filter in the horizontal direction and the highpass filter in the vertical direction. The subbands LH, HL and HH are details, representing vertical, horizontal and diagonal details and structures, respectively. LL is the low-resolution residual and can be further split at coarser scales.

Some commonly used and famous wavelets, such as the Daubechies, Coiflet and Symlet wavelets [25], are shown in Figure 3.3. Figure 3.4 shows the first-level wavelet decomposition and reconstruction of the Image Lena, where white pixels represent large-magnitude coefficients and black indicates small magnitude.

Similar to Fourier analysis, wavelet analysis also has the wavelet series, the integral wavelet transform and the discrete wavelet transform. However, the substantial difference is that the Fourier transform only considers the one-to-one mapping between the time domain and the frequency domain: it represents a signal by a function of a single variable, either time or frequency. The wavelet transform instead uses a time-scaling function to analyze non-stationary signals, and by changing the scale it can effectively detect transient signals.

The wavelet transform has the following properties [26]:

• Multi-resolution

Since a series of the image's properties under different resolutions can be obtained after the wavelet transform, it provides a very good way to describe important image features such as edges, peaks and discontinuities, with balanced resolution at any time and frequency.

Figure 3.3: Some famous wavelets

• Decorrelation

The correlations between wavelet coefficients in a local region are almost negligible, while in the spatial domain the inter-pixel correlation is too significant to be ignored. This decorrelation ability of the wavelet transform makes pointwise operations effective.

• Representing information both in spatial and frequency domains

Instead of only localizing in the frequency domain as the standard Fourier transform does, the wavelet transform is localized in both the time and frequency domains, and each of its coefficients simultaneously represents the image's local information in both the spatial and frequency domains.

Figure 3.4: Original Image Lena and its first-level decomposition using the db4 wavelet

• Computational efficiency

Compared to the O(N log N) running time of the fast Fourier transform, the standard discrete wavelet transform takes only O(N) time. Therefore, the DWT is ideal for denoising prior to other high-level image processing.

• Energy compaction

After the discrete wavelet transform, the image's energy is preserved in only a few wavelet coefficients. In unstable regions such as edges and discontinuities, the wavelet coefficients tend to have large amplitudes and to be sparse, while the amplitudes of the wavelet coefficients within smooth regions tend to be close to zero. Therefore, this property can be utilized to adaptively select the thresholding value when denoising images based on the signal's energy.

Due to these obvious advantages, the wavelet transform has become a preferred image statistical modeling tool and has enjoyed tremendous popularity within the image denoising community since its advent.

Essentially, the denoising problem based on the wavelet transform is an approximation problem: to find, in the wavelet domain, the best approximation of the original signal according to some criterion. In fact, wavelet denoising is a combination of feature extraction and low-pass filtering.

The general denoising procedure in the wavelet domain can be described as follows:

1) Apply the discrete wavelet transform (DWT) to the input noisy image and obtain the empirical wavelet coefficients.

2) Apply a nonlinear operation under a proper denoising criterion to obtain the estimated coefficients.

3) Apply the inverse DWT to the estimated wavelet coefficients and obtain the reconstructed image.
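The three-step procedure can be sketched with a single-level 2-D Haar transform written from scratch (chosen here only to keep the example self-contained; any orthogonal wavelet would do, and the soft-thresholding rule anticipates Section 3.2.2):

```python
import numpy as np

def haar2d(x):
    # One level of the 2D Haar transform: pair columns, then rows.
    def step(a):
        lo = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2.0)
        hi = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2.0)
        return lo, hi
    L, H = step(x)
    LL, LH = step(L.T)
    HL, HH = step(H.T)
    return LL.T, LH.T, HL.T, HH.T

def ihaar2d(LL, LH, HL, HH):
    # Exact inverse of haar2d.
    def istep(lo, hi):
        a = np.empty((lo.shape[0], lo.shape[1] * 2))
        a[:, 0::2] = (lo + hi) / np.sqrt(2.0)
        a[:, 1::2] = (lo - hi) / np.sqrt(2.0)
        return a
    L = istep(LL.T, LH.T).T
    H = istep(HL.T, HH.T).T
    return istep(L, H)

def haar_denoise(img, thresh):
    # 1) DWT  2) threshold the detail subbands  3) inverse DWT.
    LL, LH, HL, HH = haar2d(img)
    soft = lambda c: np.sign(c) * np.maximum(np.abs(c) - thresh, 0.0)
    return ihaar2d(LL, soft(LH), soft(HL), soft(HH))
```

With a zero threshold the pipeline reconstructs the input exactly, which is a convenient sanity check of the transform pair.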

Note that the very first step in denoising requires the selection of a wavelet basis for the forward and inverse DWT. This selection plays an important role in the denoising performance: it is known that the sparser the wavelet coefficients are after the transform, the more the denoising operation is facilitated.

Wavelet Based Denoising Methods

Here we introduce some of the most popular wavelet based denoising methods.

1. Singularity detection denoising methods

Singularities often carry the most crucial information in signals. A singularity of a signal refers to a point where the signal breaks or where some derivative of it is not continuous. Singularity detection denoising methods possess the characteristics of edge preservation and translation invariance, do not produce spurious oscillations, and require very little a priori information about the signal.

In mathematics, the singularities of a signal can be measured by the Lipschitz exponent. Although the Fourier transform can give the details of a signal's regularity, it often fails to provide information about the location and statistical distribution of singularities. Meanwhile, since the wavelet transform has the characteristic of time-frequency localization, it can effectively analyze the singularities of signals and locate their positions.

In [27], Mallat et al. related the Lipschitz exponent to the local maxima of the wavelet coefficients, applied it to characterize the irregularities of discrete signals, and used the decay rate of the local maxima across scales to measure local singularities.

Negative Lipschitz exponents correspond to sharp irregularities where the wavelet transform modulus increases at fine scales. Therefore, if the modulus values of some maxima increase dramatically as the scale decreases, the corresponding Lipschitz exponent is negative, and such maxima are generally attributed to white noise.

This denoising method is especially useful when the image has a vast number of singularities and is contaminated by white noise. It reconstructs images from the detected edges, achieving good denoising performance under the mean-square-error criterion. However, its processing speed is slow, due to the complicated reconstruction of the wavelet coefficients.

In [28], Hsung et al. propose to apply both the interscale-ratio and interscale-difference conditions of the wavelet transform modulus sum (WTMS) to select the required wavelet coefficients. The irregular parts of the image can then be easily detected by calculating the WTMS of the noisy image inside the corresponding so-called "directional cone of influence". Their method has no problem of incorrectly removing regular parts of the signal, for it considers only the local regularity of the signal and uses only local information. Furthermore, since no a priori information about the signal or noise is required, the denoising performance is independent of the variation of noise within the signal.

2. Projection denoising methods

The basic principle of projection denoising is to project the noisy signal onto a gradually narrowing space in an iterative manner. It includes two main methods: the matching pursuit method [29] and the projection onto convex sets (POCS) method [30].

The matching pursuit method decomposes signals into a linear expansion of waveforms appropriately selected from a redundant dictionary of functions. It projects the noisy signal onto the space generated by a given wavelet family, then projects the residue onto the space, continuing iteratively until the residual error meets a desired precision. Matching pursuit is a greedy strategy: at each iteration it selects the waveform best adapted to approximate part of the signal. The authors also use a Gabor function dictionary to define a time-frequency transform and apply it to denoising.

The theory of POCS establishes that successive projections onto closed convex sets, starting from an arbitrary initial value, converge.

In [30], a projection onto convex sets method is applied to find sparse estimates that lie in the intersection of confidence sets, successively projecting the data onto these sets in multiple wavelet domains. However, its performance is highly sensitive to the initial starting point of the POCS iteration, because the convex sets contain the original noisy signal.

Based on the same POCS theory, in [31] Choi et al. developed a multiple-wavelet-basis denoising algorithm. The algorithm connects wavelet thresholding to a variational problem with Besov smoothness constraints. It first projects the image onto a Besov ball of appropriate radius defined in multiple wavelet domains, and then projects the Besov balls onto their intersection. In this way, an effective estimate combining the estimates from multiple wavelet domains is obtained.

3. Thresholding denoising methods

The most commonly used denoising strategy applied to wavelet coefficients is the thresholding strategy. Its most significant advantage is that, being a nonlinear operation, it can not only suppress the noise but also preserve important features such as edges, textures and discontinuities.

Natural images consist of non-overlapping smooth regions bounded by features such as edges. The wavelet transform creates a sparse representation of the input image due to its decorrelation and energy-compaction properties, which means most of the coefficient values are zero or very close to zero. Hence, wavelet coefficients can be coarsely classified into two classes according to their magnitudes. Coefficients of relatively large magnitude are generally dominated by the actual signal, such as edges and ridges, and carry most of the signal information, so they should be retained. Coefficients of relatively small magnitude are mainly due to noise and should be suppressed, and most coefficients fall into this class. Therefore, the idea of replacing the smallest, noisy coefficients by zero and performing an inverse wavelet transform on the resulting coefficients, so as to reconstruct the original signal with the essential signal information and less noise, is appealing.

Here we give a brief introduction to the hard and soft thresholding methods, a semisoft thresholding method derived from soft thresholding, and some other widely used wavelet thresholding methods. Note that although the implementation of the thresholding values varies extensively among methods (a threshold may be used globally for all coefficients, or multiple thresholds may be assigned to different subbands and scales of the transform), they share the same general procedure:

1) Apply the discrete wavelet transform (DWT) to the incoming image.

2) Set the thresholding values for the wavelet coefficients. The thresholds can be either subband-adaptive or universal.

3) Calculate the inverse discrete wavelet transform (IDWT) to obtain the estimated image.

• Hard-thresholding

The basic scheme of hard-thresholding is shown in the following figure, where λ is the thresholding value.

Figure 3.5: Hard-thresholding function

Hard-thresholding is a "keep-or-kill" procedure. The simple idea is to set an appropriate thresholding value: coefficients whose magnitudes are less than the threshold are cancelled, and coefficients with larger magnitudes are retained. The hard-thresholding function is defined as

x̂_{i,j} = δ(x_{i,j}, λ) = x_{i,j}  if |x_{i,j}| ≥ λ,  and 0 otherwise.    (3.4)

The philosophy of this method is that because white noise is spread out equally among all wavelet coefficients, small coefficients are more likely due to noise; the thresholding value is thus a trade-off between decimating crucial signal coefficients when the threshold is large and preserving excessive noise coefficients when it is small.

However, prominent artifacts and spurious oscillations, such as sharp spikes and the Gibbs phenomenon, are generated during denoising due to the failure to suppress medium-valued noisy wavelet coefficients.

• Soft-thresholding

On the other hand, the soft-thresholding strategy can overcome the intrinsic drawback of hard-thresholding. The basic scheme of soft-thresholding is shown in the following figure.

Figure 3.6: Soft-thresholding function

If the coefficient magnitude is larger than the threshold, the coefficient is shrunk by subtracting the threshold from its magnitude; if it is smaller, the coefficient is set to zero. The soft-thresholding function is

x̂_{i,j} = δ(x_{i,j}, λ) = sgn(x_{i,j})(|x_{i,j}| − λ)  if |x_{i,j}| ≥ λ,  and 0 otherwise,    (3.5)

where sgn is the sign function.

Since the amount of shrinking equals the thresholding value, the input-output plot becomes continuous.

In [32, 33], the soft-thresholding strategy was theoretically justified by Donoho and Johnstone. In these papers, they showed soft-thresholding to be asymptotically nearly optimal in a minimax mean-square-error sense over a variety of smoothness spaces, and derived a minimax square-error bound for soft-thresholding functions that depends on the data sample size and the level of additive Gaussian noise contamination. They also provided an expression for the well-known universal threshold for denoising images, also known as the VisuShrink [32] algorithm:

λ_UNIV = σ √(2 ln N)    (3.6)

where N is the signal length (for images, the number of pixels) and σ is the noise standard deviation.

It is useful for obtaining a starting value when nothing is known about the signal, and may provide a better estimate when the number of samples is relatively large.
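The hard rule (3.4), soft rule (3.5) and universal threshold (3.6) are easy to state in code; in the sketch below, sigma denotes the noise standard deviation:

```python
import numpy as np

def hard_threshold(c, lam):
    # Keep-or-kill rule (Eq. 3.4).
    return np.where(np.abs(c) >= lam, c, 0.0)

def soft_threshold(c, lam):
    # Shrink-toward-zero rule (Eq. 3.5); continuous at |c| = lam.
    return np.sign(c) * np.maximum(np.abs(c) - lam, 0.0)

def universal_threshold(sigma, n):
    # VisuShrink threshold (Eq. 3.6): lambda = sigma * sqrt(2 ln N).
    return sigma * np.sqrt(2.0 * np.log(n))
```

Note that soft thresholding biases every retained coefficient toward zero by lam, which is the price paid for the continuous input-output map.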

Unfortunately, since the universal threshold is derived under the restriction that the estimate depends strongly on the signal length, its value tends to be higher for larger N. VisuShrink may therefore over-smooth the noisy image and cause distortions such as edge blurring, so it lacks robustness. An improved semisoft-thresholding operator, called WaveShrink, is suggested in [34].

• Semisoft-thresholding

Figure 3.7 displays its thresholding function, where λ1 and λ2 are the lower and upper thresholds, respectively.


Figure 3.7: Semisoft-thresholding function

The following is its analytical expression:

x̂_{i,j} = δ(x_{i,j}, λ1, λ2) =
    0                                             if |x_{i,j}| ≤ λ1
    sgn(x_{i,j}) λ2 (|x_{i,j}| − λ1)/(λ2 − λ1)    if λ1 < |x_{i,j}| < λ2
    x_{i,j}                                       if |x_{i,j}| > λ2        (3.7)

Generally, it preserves the largest coefficients while transitioning smoothly from small, noisy coefficients to large, important ones. The upper threshold λ2 can be set to the aforementioned universal threshold σ√(2 ln N); similar values such as σ√(4 ln N) and σ√(6 ln N) are also acceptable and even converge to zero faster. Meanwhile, the selection of λ1 is highly signal-dependent: for signals with more non-smooth features such as sharp discontinuities, a smaller lower threshold is probably more suitable.

Note that in the particular case where λ1 = λ2, one can recover the hard-thresholding; and when λ2 →∞, it turns into soft-thresholding.
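A sketch of (3.7), which also lets us check the limiting behaviors just mentioned (it assumes λ2 > λ1, since λ1 = λ2 would make the middle branch degenerate):

```python
import numpy as np

def semisoft_threshold(c, lam1, lam2):
    # Eq. (3.7): zero below lam1, identity above lam2, and a linear
    # ramp in between. Requires lam2 > lam1.
    a = np.abs(c)
    mid = np.sign(c) * lam2 * (a - lam1) / (lam2 - lam1)
    return np.where(a <= lam1, 0.0, np.where(a >= lam2, c, mid))
```

As lam2 grows, the ramp's slope lam2/(lam2 − lam1) approaches one and the rule converges to soft thresholding with threshold lam1, matching the limit noted above.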

• Other thresholding strategies

Some other thresholding denoising approaches based on minimax optimality have also been proposed, e.g. [35, 36]. However, the minimax criterion does not measure the error between corrupted and clean data as precisely as the MSE does, so within soft-thresholding theory other strategies [1, 37–40] have emerged in search of better threshold selections. Here we introduce a few of the most remarkable works.

For example, SureShrink [40] was proposed by Donoho to compute subband-adaptive thresholds. It addresses the deficiency that arises when the wavelet coefficients are not sparse enough for the minimax theory to apply well. It is based on Stein's Unbiased Risk Estimator (SURE), which estimates the mean square error in an unbiased manner. In this method, each highpass subband is assigned an individual threshold,

which can be obtained by minimizing the following SURE(λ, ω):

λ = arg min_λ SURE(λ, ω) = arg min_λ [ σ²N + Σ_{i=0}^{N−1} min{ω_i², λ²} − 2σ² #{i : |ω_i| ≤ λ} ]    (3.8)

where N is the number of coefficients in the subband, # denotes the number of elements in a set, σ² is the noise variance, and ω is the set of wavelet coefficients in the subband.
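A direct transcription of (3.8), minimizing over the candidate set {|ω_i|} (a common choice; the original SureShrink also handles the very sparse case separately, which this sketch omits):

```python
import numpy as np

def sure_risk(lam, w, sigma):
    # Stein's unbiased risk estimate for soft thresholding (Eq. 3.8).
    n = w.size
    return (sigma**2 * n
            + np.minimum(w**2, lam**2).sum()
            - 2.0 * sigma**2 * np.count_nonzero(np.abs(w) <= lam))

def sure_threshold(w, sigma):
    # The minimizer of SURE is searched over the candidate set {|w_i|}.
    candidates = np.abs(w)
    risks = [sure_risk(lam, w, sigma) for lam in candidates]
    return candidates[int(np.argmin(risks))]
```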

The well-known BayesShrink [41] models wavelet coefficients with the generalized Gaussian distribution (GGD) and implements a subband-adaptive soft-thresholding strategy based on context modeling of the wavelet coefficients. In addition, the authors suggest the estimated threshold

T = σ_n² / σ_x    (3.9)

where σ_n² is the estimated noise variance and σ_x is the estimated standard deviation of the wavelet coefficients in each detail subband.

The noise variance is estimated by the median estimator applied to the diagonal detail coefficients of the finest subband. The standard deviation of the wavelet coefficients can be estimated as

σ_x = sqrt( max(Ỹ² − σ_n², 0) )    (3.10)

where

Ỹ² = (1/N_w) Σ_{n=1}^{N_w} Y_n²    (3.11)

In the above expression, Nw is the number of wavelet coefficients Yn whose contexts are within a pre-determined moving window.

The rationale for this threshold is intuitively appealing. When σ_n/σ_x ≪ 1, the signal is dominant, so the normalized threshold T/σ_n is set small in order to retain most of the signal while removing some of the noise; in contrast, when σ_n/σ_x ≫ 1, meaning the contamination is severe, T/σ_n is set large in order to reduce the noise.

In their algorithm, context modeling is applied in order to provide additional information, such as the existence of smooth and edge areas.
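Equations (3.9)-(3.11), plus the median-based noise estimate, can be sketched as follows; the pure-noise fallback (killing everything when σ_x = 0) is our assumption:

```python
import numpy as np

def bayes_shrink_threshold(subband, sigma_n):
    # Eqs. (3.9)-(3.11): T = sigma_n^2 / sigma_x with
    # sigma_x = sqrt(max(mean(Y^2) - sigma_n^2, 0)).
    mean_sq = np.mean(subband**2)
    sigma_x = np.sqrt(max(mean_sq - sigma_n**2, 0.0))
    if sigma_x == 0.0:
        # Subband looks like pure noise: threshold everything away.
        return float(np.abs(subband).max())
    return sigma_n**2 / sigma_x

def estimate_noise_sigma(hh):
    # Robust median estimator on the finest diagonal (HH) subband.
    return float(np.median(np.abs(hh)) / 0.6745)
```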

Abramovich et al. [38] presented an empirical Bayesian approach which combines the information of the empirical wavelet coefficients within a neighborhood to estimate block wavelet thresholding estimators. In their paper, a prior distribution on the wavelet coefficients is provided in order to capture the sparseness of the wavelet expansion. The thresholding function is then obtained by applying a certain Bayesian rule to the resulting posterior distribution of the wavelet coefficients.

Jansen et al. [37] provided an adaptive algorithm to select level-dependent thresholds for the wavelet coefficients in order to reduce noise. They used a generalized cross-validation (GCV) procedure to estimate the optimal threshold that minimizes the MSE with respect to the original data. The GCV threshold is given by

λ = arg min_λ [ (1/N) ‖ω − ω_δ‖² ] / (N_0/N)²    (3.12)

where N is the number of coefficients in the subband, N_0 is the number of coefficients replaced by zero in the shrinkage, and ω and ω_δ are the noisy coefficients before and after shrinkage, respectively.

Though GCV is not unbiased, the procedure does not require an estimate of the noise energy, and the resulting threshold is close to the ideal one; many other papers derive their wavelet thresholding functions based on the GCV criterion [42, 43].
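Equation (3.12) evaluated on a grid of candidate thresholds (the grid search is our illustration; [37] minimizes the criterion more cleverly):

```python
import numpy as np

def gcv_score(w, lam):
    # GCV for soft thresholding (Eq. 3.12):
    # GCV(lam) = (1/N)*||w - w_lam||^2 / (N0/N)^2,
    # where N0 is the number of coefficients set to zero.
    n = w.size
    w_lam = np.sign(w) * np.maximum(np.abs(w) - lam, 0.0)
    n0 = np.count_nonzero(w_lam == 0.0)
    if n0 == 0:
        return np.inf  # denominator vanishes: no coefficient killed
    return (np.sum((w - w_lam)**2) / n) / (n0 / n)**2

def gcv_threshold(w, grid):
    # Pick the candidate threshold with the smallest GCV score.
    scores = [gcv_score(w, lam) for lam in grid]
    return grid[int(np.argmin(scores))]
```

No noise-variance estimate appears anywhere, which is the practical appeal of the criterion noted above.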

Luisier et al. [1] proposed a state-of-the-art thresholding algorithm named SURE-LET. It directly expresses the denoising procedure as a linear combination of nonlinear basis processes with unknown weights. These weights are calculated via a linear system of equations obtained by minimizing the mean square error between the clean image and the denoised one, based on Stein's unbiased risk estimate (SURE). They also take full advantage of the interscale relationships of the wavelet coefficients by integrating

an interscale predictor. The integration of this predictor brings an additional PSNR improvement of 0.5 dB on average, and over 1 dB for Image Barbara. The principal advantage of this algorithm is that it does not need a particular statistical model for the wavelet coefficients but relies on the noisy data alone. They reported that under a non-redundant wavelet transform, this method outperforms the other best wavelet-based methods in terms of PSNR over a set of eight images under different degrees of additive noise contamination, with the exception of Image Barbara, where recurrent textures appear extensively.

• The comparison of various thresholding methods

This section compares several of the most well-known thresholding-based denoising methods by testing standard images corrupted with additive white Gaussian noise.

In Section 2.2, we gave the details of two basic approaches for assessing the denoising quality of an algorithm. Although subjective quality assessment can reveal obvious improvements such as regional smoothness and edge sharpness, it is only appropriate for observing very pronounced denoising effects.

Thus, here we apply the other approach, objective quality assessment, which quantifies the similarity between the original noise-free image x and the estimated image x̂ after denoising through a mathematical criterion. We adopt the popular peak signal-to-noise ratio (PSNR), defined in Equation (2.4), and use two well-known 512×512 gray-level images, Lena and Barbara. We test them under different noise levels (noise standard deviations) with several popular denoising methods: VisuShrink [32], SURE-LET [1],

SureShrink [40], and BayesShrink [41]. The results are summarized in Table 3.1. The highest PSNR values are displayed in bold.

It is obvious that VisuShrink is the least effective among the compared methods, essentially because it applies a universal threshold that is not adaptive, unlike the other methods.

Table 3.1: Comparison of several well-known thresholding denoising methods (PSNR, dB)

              σ = 10   σ = 20   σ = 30
Input PSNR     28.13    22.11    18.59
Lena
VisuShrink     28.96    26.48    25.32
SureShrink     33.30    30.18    28.46
BayesShrink    33.56    30.35    28.64
SURE-LET       34.56    31.37    29.56
Barbara
VisuShrink     24.93    22.87    22.15
SureShrink     30.34    26.02    24.43
BayesShrink    31.25    27.32    25.34
SURE-LET       32.18    27.98    25.83

Meanwhile, SureShrink and BayesShrink produce similar results on the Lena image and offer a substantial improvement over VisuShrink in terms of PSNR. For the Barbara image, BayesShrink outperforms SureShrink by approximately 1 dB of PSNR on average.

SURE-LET has the best denoising performance among these four algorithms because its thresholding is subband adaptive and does not depend on any particular statistical model of the noise-free data. Meanwhile, its computation takes only a couple of seconds for 512 × 512 images, which makes it very competitive.

4. Bayesian-based denoising methods

Due to the decorrelation ability of the wavelet transform, the noise-free data can normally be described explicitly, which is the basis for Bayesian denoising methods. Wavelet-based denoising can therefore also proceed by first exploiting the statistical properties of the wavelet coefficients with a probabilistic model, then minimizing the error between the noise-free data and the estimator, modeled as a Bayesian risk, and finally developing shrinkage functions accordingly. The most commonly used model for the statistical distribution of the wavelet coefficients in each detail subband is the Gaussian distribution [44], which is sufficiently accurate for practical use, and here we briefly introduce a general idea of how it works.

Based on the probability density functions (PDFs) of the noise and the signal, we can form the following estimate of the noise-free data:

\hat{x} = \arg\max_x \, p_{x|y}(x|y) \qquad (3.13)

where x denotes the true wavelet coefficients and y the noisy wavelet coefficients. They satisfy y = x + n, in which n is independent Gaussian noise. By Bayes' rule, the above estimate can be rewritten as

\hat{x} = \arg\max_x \,[\,p_{y|x}(y|x)\cdot p_x(x)\,] = \arg\max_x \,[\,p_n(y-x)\cdot p_x(x)\,] \qquad (3.14)

which is equivalent to

\hat{x} = \arg\max_x \,[\,\log p_n(y-x) + \log p_x(x)\,] \qquad (3.15)

Since we have assumed p_x to be a zero-mean Gaussian distribution with variance \sigma^2, the estimate readily becomes

\hat{x} = \frac{\sigma^2}{\sigma^2 + \sigma_n^2}\, y \qquad (3.16)

According to this formula, given the noise variance and the variance of the noise-free wavelet coefficients, the noise-free wavelet coefficients can be estimated.
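A minimal sketch of this estimator (our own illustration, not code from any cited paper): the signal variance is obtained from the observed variance via the additive-variance relation, and each noisy coefficient is attenuated by the Wiener-like factor of Equation (3.16):

```python
import numpy as np

def signal_variance(y, sigma_n2):
    """Estimate the noise-free variance from var(y) = sigma_x^2 + sigma_n^2."""
    return max(np.var(y) - sigma_n2, 0.0)

def bayes_gaussian_shrink(y, sigma_n2):
    """MAP estimate under a zero-mean Gaussian prior (Eq. 3.16):
    x_hat = sigma_x^2 / (sigma_x^2 + sigma_n^2) * y."""
    sigma_x2 = signal_variance(y, sigma_n2)
    return sigma_x2 / (sigma_x2 + sigma_n2) * y
```

In a real denoiser this shrinkage would be applied per subband, with the noise variance estimated separately.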

Besides the Gaussian distribution, many other probabilistic models have also been proposed: for instance, the generalized Gaussian distribution (GGD), also known as the generalized Laplacian distribution [45–48], Gaussian scale mixtures [3, 49, 50], Gaussian mixture fields [51], alpha-stable distributions [52, 53], the modified Gauss-Hermite (MGH) probability distribution [54], and Hidden Markov Models (HMM) [55].

Furthermore, to minimize the error between the noise-free data and the estimator, Bayesian risks such as the maximum a posteriori (MAP) [46, 56], minimum mean square error (MMSE) [47, 48, 57], and maximum likelihood [58, 59] estimators are applied.

Among the many denoising algorithms available now, we would like to put some emphasis on the following.

• BLS-GSM

Portilla et al. [3] estimate clean data from observed data by modeling the neighborhood of each wavelet coefficient, over adjacent locations and scales, as a Gaussian scale mixture (GSM): the product of a Gaussian vector and a hidden scalar multiplier, which can be expressed as

x = \sqrt{z}\, u \qquad (3.17)

where u is the zero-mean Gaussian vector and z is an independent scalar random variable known as the multiplier. The advantage of using a GSM to model the wavelet-domain statistics of natural images comes from its ability to capture both the long-tailed marginal distributions and the pairwise joint distributions of wavelet coefficients.
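The appeal of the GSM model in Equation (3.17) is easy to verify numerically: mixing Gaussian samples with a random scale produces the heavy-tailed (leptokurtic) marginals typical of wavelet coefficients. A small sketch (the exponential multiplier is our illustrative choice; [3] estimates z from the data):

```python
import numpy as np

def excess_kurtosis(v):
    """Excess kurtosis: 0 for a Gaussian, positive for heavy tails."""
    v = v - v.mean()
    return np.mean(v**4) / np.mean(v**2) ** 2 - 3.0

rng = np.random.default_rng(2)
u = rng.normal(0.0, 1.0, 200_000)      # zero-mean Gaussian component
z = rng.exponential(1.0, 200_000)      # hidden positive scalar multiplier
x = np.sqrt(z) * u                     # GSM sample, x = sqrt(z) * u  (Eq. 3.17)
```

With an exponential multiplier, the mixture is a Laplacian, whose excess kurtosis is 3, while the plain Gaussian component has excess kurtosis 0.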

In their method, the coefficients in inactive regions of the image are suppressed significantly. The chosen prior distribution is Jeffreys' prior. Their statistical model is built on an overcomplete tight frame known as the steerable pyramid, which is free of aliasing yet highly capable of distinguishing orientations. It also explicitly incorporates the covariance between coefficients within a neighborhood and additionally considers, for each coefficient, a neighbor at the same position in the coarser scale. Moreover, although the noise is assumed to be additive white Gaussian, the method remains effective for non-white Gaussian noise provided its covariance is known.

The shrinkage estimator applied is Bayes least squares (BLS). Their denoising method, known as BLS-GSM, is among the most effective wavelet-based approaches in terms of peak signal-to-noise ratio (PSNR). However, although its results are superior, since it is used with an overcomplete wavelet transform and a pyramid representation, it inevitably requires more time and computer memory during the denoising process.

• ProbShrink

Pizurica et al. [45] propose a new subband-adaptive shrinkage function called ProbShrink, which suppresses each wavelet coefficient based on its probability of representing important information, i.e., a significant noise-free component, which they call the “signal of interest”. In contrast to their previous works, in which the estimation of the required probabilities relies either on Markov Random Field (MRF) priors [60] or on empirical density estimation combined with fitting of the log-likelihood ratio [61], they propose a new approach that does not require a preliminary coefficient classification. It models all the necessary probabilities assuming a generalized Laplacian prior for the clean data.

They define this “signal of interest” as a noise-free coefficient component that exceeds a pre-determined threshold T. Further, they define

\mu = P(H_1)/P(H_0) \qquad (3.18)

and

\eta = f(Y\,|\,H_1)/f(Y\,|\,H_0) \qquad (3.19)

where Y is the observed data, H_1 is the hypothesis that a signal of interest exists in a particular coefficient while H_0 is the hypothesis that it does not, and f(Y|H_1) and f(Y|H_0) are the conditional densities of the noisy coefficients given these hypotheses. The shrinkage rule to estimate the noise-free data X can then be expressed as

\hat{X} = P(H_1|Y)\, Y = \frac{\mu\eta}{1+\mu\eta}\cdot Y \qquad (3.20)

where \mu\eta is the generalized likelihood ratio and P(H_1|Y) denotes the conditional probability that a wavelet coefficient contains a “signal of interest” given its observed value.

This indicates that two noisy coefficients are shrunk by the same amount if they share the same magnitude, regardless of their spatial position and neighborhood characteristics. In addition, the idea of soft-thresholding is present here: the smallest coefficients are heavily suppressed toward zero while the largest ones remain essentially unchanged.
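A sketch of the shrinkage rule in Equation (3.20). Purely for illustration, the conditional densities are modeled here as zero-mean Gaussians (noise-only under H0, signal-plus-noise under H1); the actual paper derives them from a generalized Laplacian prior:

```python
import numpy as np

def likelihood_ratio(y, sigma_n, sigma_s):
    """Illustrative eta = f(Y|H1)/f(Y|H0): both conditionals are modeled as
    zero-mean Gaussians, noise-only under H0 and signal-plus-noise under H1."""
    v0, v1 = sigma_n**2, sigma_n**2 + sigma_s**2
    f0 = np.exp(-y**2 / (2 * v0)) / np.sqrt(2 * np.pi * v0)
    f1 = np.exp(-y**2 / (2 * v1)) / np.sqrt(2 * np.pi * v1)
    return f1 / f0

def probshrink(y, mu, eta):
    """ProbShrink rule (Eq. 3.20): X_hat = mu*eta/(1 + mu*eta) * Y."""
    return mu * eta / (1.0 + mu * eta) * y
```

As expected from the soft-thresholding analogy, small-magnitude coefficients receive a shrinkage factor near zero while large-magnitude ones are kept almost intact.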

The advantages of the proposed approach are that it does not need any preliminary edge detection procedure, and that its simplicity of implementation yields faster computation than their previous MRF-based work. Experimental simulations have shown that their subband-adaptive shrinkage algorithm outperforms classical soft-thresholding approaches that use a single global threshold per subband.

It should also be mentioned that the authors have extended the algorithm to a spatially adaptive method. In this extension, the shrinkage rule is based on each coefficient's magnitude, a local measurement, and the global statistical characteristics of all the coefficients in a given subband. It demonstrates superior denoising performance compared with approaches based on MRF and Hidden Markov Tree (HMT) models.

• BiShrink

Sendur et al. [56] introduce four novel non-Gaussian bivariate distributions for the wavelet coefficients of natural images in order to model the interscale dependencies between wavelet coefficients and their parent coefficients. Nonlinear thresholding functions are then derived using a Bayesian estimation technique, the maximum a posteriori (MAP) estimator. The proposed bivariate probability density functions and their corresponding shrinkage functions consider only the dependency between a wavelet coefficient and its parent, not other dependencies such as those between a coefficient and its neighbors. Their nonlinear thresholding functions develop naturally from the concept of soft-thresholding. We briefly summarize the algorithm here.

After applying the wavelet transform, we have the model

Y = X + N \qquad (3.21)

where X is the wavelet coefficient of the noise-free image, which we need to estimate, and N is the noise.

Since N is assumed to be additive white Gaussian noise, it can be modeled as an independent random variable, and thus we have the following relationship between the variances:

\sigma_y^2 = \sigma^2 + \sigma_n^2 \qquad (3.22)

In their paper, one of the non-Gaussian bivariate probability density functions for a wavelet coefficient w_1 and its parent coefficient w_2 is

p_w(w) = \frac{3}{2\pi\sigma^2}\exp\!\left(-\frac{\sqrt{3}}{\sigma}\sqrt{w_1^2 + w_2^2}\right) \qquad (3.23)

where \sigma^2 is the marginal variance of the noise-free data, which depends on the coefficient index.

They estimate the variance of the observed wavelet coefficients using the neighborhood around each wavelet coefficient plus its parent's neighborhood in the coarser scale:

\hat{\sigma}_y^2 = \frac{1}{M}\sum_{Y_i \in N} Y_i^2 \qquad (3.24)

where M is the number of coefficients in the neighborhood N. The noise variance \sigma_n^2 is estimated from the finest-scale noisy wavelet coefficients using the classical robust median estimator:

\hat{\sigma}_n = \frac{\mathrm{median}(|Y_i|)}{0.6745}, \qquad Y_i \in \text{subband } HH_1 \qquad (3.25)
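Equation (3.25) is the widely used robust median (MAD) estimator, which is a one-line sketch in practice:

```python
import numpy as np

def estimate_noise_sigma(hh1):
    """Robust noise estimate from the finest diagonal subband HH1 (Eq. 3.25):
    sigma_n = median(|Y_i|) / 0.6745."""
    return np.median(np.abs(hh1)) / 0.6745
```

The constant 0.6745 is the median of the absolute value of a standard Gaussian, so the estimator is unbiased for pure Gaussian noise while being insensitive to a few large signal coefficients.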

Furthermore, they estimate the standard deviation of the noise-free wavelet coefficients as

\hat{\sigma} = \sqrt{\left(\hat{\sigma}_y^2 - \hat{\sigma}_n^2\right)_{+}} \qquad (3.26)

Once all of the above quantities are available, the MAP estimator for w_1 is given by

\hat{w}_1 = \frac{\left(\sqrt{Y_1^2 + Y_2^2} - \frac{\sqrt{3}\,\sigma_n^2}{\sigma}\right)_{+}}{\sqrt{Y_1^2 + Y_2^2}}\cdot Y_1 \qquad (3.27)

Note that the operation of their algorithm is point-wise, and they also apply the dual-tree CWT (DT-CWT) to obtain the wavelet coefficients in order to take advantage of its shift invariance and directional response, which we will introduce later.
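The MAP estimator of Equation (3.27) is a bivariate soft threshold; a vectorized sketch (our own, assuming the variance estimates of Equations (3.24)–(3.26) are already available):

```python
import numpy as np

def bishrink(y1, y2, sigma_n, sigma):
    """Bivariate shrinkage (Eq. 3.27): y1 is a noisy coefficient, y2 its
    parent, sigma the marginal std of the noise-free coefficient."""
    r = np.sqrt(y1**2 + y2**2)
    gain = np.maximum(r - np.sqrt(3.0) * sigma_n**2 / sigma, 0.0) / np.maximum(r, 1e-12)
    return gain * y1
```

A coefficient is zeroed only when the joint magnitude of child and parent falls below the threshold, so a weak coefficient with a strong parent survives; this is exactly the interscale dependency the bivariate prior encodes.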

• UWT SURE-LET

In [62], Blu and Luisier extend the SURE-LET principle coined in [1] to nonlinear processing performed in the undecimated wavelet transform (UWT) domain.

As in the authors' previous SURE-LET work on the orthonormal wavelet transform (OWT) [1], this method does not require any statistical model of the noise-free image.

The denoising operation F(Y) is generalized as a linear combination of pre-set elementary processes F_k(Y):

F(Y) = \sum_{k=1}^{K} a_k F_k(Y) \qquad (3.28)

The weights a_k are obtained by minimizing the SURE, which amounts to solving a series of linear equations. In addition, they propose a new subband-dependent denoising function, beyond the one used in their orthonormal version, that involves only two linear parameters and can be regarded as a smooth approximation of a hard threshold.
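The structure of Equation (3.28) can be illustrated with K = 2 elementary processes, the identity and a soft threshold. In this toy sketch (our own), the weights minimize the true MSE against the clean signal; the whole point of SURE-LET is that Stein's unbiased risk estimate allows the same linear system to be solved without access to the clean signal:

```python
import numpy as np

def let_denoise(y, x_clean, t):
    """Linear expansion of thresholds (Eq. 3.28) with K = 2 elementary
    processes: F1 = identity, F2 = soft threshold.  For illustration the
    weights a_k minimize the true MSE against the clean signal; SURE-LET
    replaces the clean signal by Stein's unbiased estimate of the risk."""
    F = np.stack([y, np.sign(y) * np.maximum(np.abs(y) - t, 0.0)], axis=1)
    a = np.linalg.lstsq(F, x_clean, rcond=None)[0]   # solve the linear system
    return F @ a
```

Because the identity process is in the span, the optimized combination can never do worse (in MSE) than the noisy input itself.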

Nevertheless, the most remarkable specificity of UWT SURE-LET is not only that the approach is based on redundant or non-orthonormal transforms, but that its optimization is performed in the image domain. They choose to do so because SURE minimization in the image domain is equivalent to SURE minimization in each individual subband only when the transformation is orthonormal; whenever the transform becomes redundant/undecimated or non-orthonormal, this equivalence no longer holds. Therefore, the estimated image is generated as the sum of a series of weighted images, each reconstructed from the corresponding subband processed by their proposed pointwise thresholding functions, plus the low-pass residual subband. These images are weighted by their corresponding SURE-optimized parameters obtained in the image domain.

Though the algorithm's orthonormal version provides the best denoising performance among methods applying the OWT, it should be acknowledged that UWT SURE-LET is not as competitive as other state-of-the-art algorithms using redundant transforms. The paper therefore suggests that introducing inter- and intrascale dependencies might improve the denoising performance significantly. It is still worth mentioning that this strategy is computationally very efficient, since minimizing the SURE requires only the solution of a series of linear equations.

• Denoising method using derotated complex coefficients

Miller et al. [49] estimate clean wavelet coefficients with two types of modeling: one for edge and ridge discontinuities and the other for the remaining areas. Both models are based on the GSM.

The wavelet coefficients used to represent structural features are obtained from a redundant, shift-invariant, oriented, and complex multiscale transform called the dual-tree complex wavelet transform (DT-CWT) [63]. The authors then further take advantage of the interscale phase relationships of the complex wavelet coefficients. Complex wavelets have been shown to be an effective representation for images, and an especially good basis where periodic textures appear regularly. Areas other than structural features are modeled by standard wavelet coefficients. Each detail subband is divided into several overlapping neighborhoods, and denoising operates on the central complex coefficient within each neighborhood. Which model to select for each neighborhood is determined by an adaptive Bayesian model selection framework. They implement this mechanism of switching between models because derotated wavelet coefficients are not appropriate for modeling image features that are not multiscale, as the authors also acknowledge in another of their papers [64]. Certain features, such as regular texture in specific areas, are more properly modeled by standard wavelet coefficients.

This strategy provides noticeable denoising improvement by suppressing ringing artifacts near feature discontinuities as well as sharpening edges. Their work outperforms previously published methods using overcomplete representations in terms of PSNR. In addition, the authors suggest considering the integration of the interscale relationships of the coefficients when defining the model probability.

• Comparison of the aforementioned image denoising methods

In this section, we use three well-known grayscale images, Peppers (256 × 256) and Lena and Barbara (512 × 512), and test them under three noise levels (noise standard deviations σ = 10, 20, 25). The denoising methods compared are ProbShrink [45], BLS-GSM [3], BiShrink [56], UWT SURE-LET [62], and the method applying derotated complex coefficients [64]. The results are summarized in Table 3.2, with the best results shown in boldface. Note that all of these methods apply redundant wavelet transforms.

Table 3.2: Comparison of the aforementioned image denoising methods (PSNR, dB)

                            σ = 10   σ = 20   σ = 25
Input PSNR                   28.13    22.11    20.17
Peppers
BLS-GSM                      33.77    30.31    29.21
ProbShrink                   33.90    30.30    29.23
BiShrink                     33.38    29.80    28.67
UWT SURE-LET                 34.00    30.53    29.40
derotated complex wavelets   34.22    30.79    29.69
Lena
BLS-GSM                      35.61    32.66    31.69
ProbShrink                   35.24    32.20    31.21
BiShrink                     35.34    32.40    31.40
UWT SURE-LET                 35.08    32.06    31.30
derotated complex wavelets   35.67    32.79    31.82
Barbara
BLS-GSM                      34.03    30.32    29.13
ProbShrink                   33.46    29.53    28.23
BiShrink                     33.35    29.80    28.61
UWT SURE-LET                 32.65    28.45    27.18
derotated complex wavelets   34.20    30.63    29.49

The table clearly indicates that the algorithm using derotated complex wavelets generates the best performance among the listed algorithms in terms of PSNR over all images and noise levels. The PSNR improvement is even larger for the Barbara image, which has large areas of periodic texture, indicating the efficiency of directionally selective transforms. However, this algorithm is quite time-consuming.

Note that UWT SURE-LET has the worst result for the Barbara image, because the method fails to capture the texture information this image carries. The authors therefore suggest selecting other transforms with more subbands, which might provide denoising performance better than the state-of-the-art algorithms.

However, this algorithm is very fast compared to other state-of-the-art algorithms.

Though Bayesian denoising approaches have gained tremendous popularity in the wavelet denoising community, we still need to stress that it is sometimes difficult to deduce an analytical solution for the shrinkage function if the statistical model of the noise-free signal is too complicated. Furthermore, eliminating noisy coefficients can be inefficient if this model fails to appropriately represent the statistics of the noise-free signal.

Denoising Methods Based on Shift-Invariant Wavelet Transform

Though the wavelet transform has been employed as a very powerful tool in image denoising, its most representative implementation, the critically sampled discrete wavelet transform (DWT), has a significant problem: shift variance, which is undesirable in many applications. Shift variance refers to the fact that there is no simple relationship between the wavelet coefficients of the original and the shifted signal, i.e., the transform is sensitive to spatial shifts of the input. In general, shift variance is introduced by the critical sub-sampling (down-sampling) in the DWT, in which every other wavelet coefficient at each decomposition level is discarded; as a result, small shifts in the input waveform cause large variations in the wavelet coefficients. From a frequency-response standpoint, down-sampling the subbands violates the Nyquist criterion, so frequency components near the cut-off frequency are aliased into the wrong subband.

In fact, as long as not all wavelet coefficients are used to perform the inverse DWT, shift variance is introduced into the reconstructed signal [65].

The basic motivation for applying shift-invariant representations when denoising images, instead of merely using the critically sampled coefficients, is the suppression of the obvious pseudo-Gibbs phenomena after denoising. These disturbing visual artifacts normally appear near edge and ridge discontinuities in images and are generally caused by the aforementioned shift variance.

Thus, obtaining shift-invariant wavelet transforms, also called overcomplete representations of the wavelet coefficients, and applying their coefficients in the denoising strategy has become increasingly popular in recent years. Generally, making the DWT shift invariant results in significantly better denoising performance, in terms of PSNR and MSE, compared with methods involving only the critically sampled wavelet representation, as in [1].

The most straightforward way to obtain shift invariance is to apply the undecimated discrete wavelet transform (UDWT), which omits the down-sampling in the forward transform and the up-sampling in the inverse transform.

Another route to shift invariance is to up-sample the low-pass filter at each level by inserting zeros between the filter's coefficients; the algorithme à trous [66] is its most typical representation. The output of this algorithm at each scale contains the same number of samples as the input, so it is an inherently overcomplete scheme, but it requires additional computational time and memory.

Shift invariance can also be achieved by the procedure introduced in [67], where Coifman and Donoho propose a general routine called “cycle spinning”. The noisy signal is spatially shifted, denoised, and shifted back, and the results are combined over a range of shifts. This achieves approximate shift invariance, so the artifacts produced by the standard DWT in the denoised image are diminished.
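A compact sketch of cycle spinning, using a single-level Haar transform with soft-thresholding as the base denoiser (our toy example; [67] applies the idea to full DWT denoisers):

```python
import numpy as np

def haar_denoise(y, t):
    """Single-level Haar DWT + soft-thresholding of the detail band.
    Assumes an even-length signal y."""
    a = (y[0::2] + y[1::2]) / np.sqrt(2)          # approximation coefficients
    d = (y[0::2] - y[1::2]) / np.sqrt(2)          # detail coefficients
    d = np.sign(d) * np.maximum(np.abs(d) - t, 0.0)
    out = np.empty_like(y)
    out[0::2] = (a + d) / np.sqrt(2)              # inverse transform
    out[1::2] = (a - d) / np.sqrt(2)
    return out

def cycle_spin_denoise(y, t, shifts=8):
    """Cycle spinning: average shift -> denoise -> unshift over several shifts."""
    acc = np.zeros_like(y)
    for s in range(shifts):
        acc += np.roll(haar_denoise(np.roll(y, s), t), -s)
    return acc / shifts
```

Averaging over shifts re-aligns the pseudo-Gibbs oscillations differently at each shift, so they largely cancel in the combined estimate.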

Another classical method, the stationary wavelet transform (SWT), in which the coefficients are not decimated, is presented in [68]. This transform yields an overcomplete representation of the original data and thereby achieves shift invariance.

Beyond the aforementioned methods, more sophisticated shift-invariant variants of the discrete wavelet transform [63, 69, 70] have also been introduced.

For instance, Kingsbury [63] coined the dual-tree complex wavelet transform (DT-CWT), which produces the real and imaginary parts of the complex wavelet coefficients of the input signal. This representation provides shift invariance while introducing limited redundancy and retaining perfect reconstruction.

In [69], Lang et al. incorporate the Beylkin algorithm to implement a shift-invariant discrete wavelet transform (SIDWT) and extend Donoho's thresholding strategy for the standard wavelet transform [40] to the shift-invariant one. They also mathematically justify that Donoho's thresholding theory applies to their proposed SIDWT.

The authors of [70] present a wavelet transform named the multiscale wavelet representation (MSWAR) with the property of shift invariance. Rather than generating redundant coefficients, their method avoids exhaustive shifts of the input signal and computes only the necessary non-redundant coefficients at each scale, for both the 1D and 2D cases.

In addition to achieving shift invariance with classical single-wavelet transforms, multiwavelets can also be employed to maintain shift invariance and further enhance the denoising effect. Generally, multiwavelets are obtained by translations and dilations of more than one mother wavelet function.

The basic steps of denoising with multiwavelets are as follows [71]:

1) Pre-process the input noisy signal and map the single stream into multiple streams by a specific prefilter.

2) Decompose the multiple streams by the discrete multiwavelet transform (DMWT) to obtain their multiwavelet coefficients.

3) Apply thresholding strategy to the multiwavelet coefficients.

4) Reconstruct the modified multiwavelet coefficients by the inverse DMWT.

5) Post-process the denoised multiple streams and map them back into a single stream to get the denoised signal.

Although multiwavelets increase the computational complexity, they have attracted extensive attention because of their advantages over single wavelets, such as symmetry, short support, and a higher order of approximation through vanishing moments. Hence, multiwavelets have shown superior performance over single wavelet transforms [71–74].

Denoising Methods Based on Other Wavelet-Related Transforms

It is intuitively known that in natural images, features such as edges and textures appear at various orientations. However, since the standard orthogonal wavelet transform uses a separable basis, it responds well only to features along the horizontal and vertical orientations; as a result, rotation variance is introduced.

One simple way to avoid rotation variance is to interpolate the input data, but this inevitably introduces unnecessary correlations between neighboring pixels, which contradicts decorrelation, one of the wavelet transform's main advantages.

Fortunately, as early as the 1990s, researchers such as Simoncelli et al. found that more directional responses can be attained by applying shiftable or steerable transforms [75, 76]. Since then, a vast number of directional wavelet transforms have been developed to describe the directional characteristics of objects in images.

For instance, ridgelets [77] and curvelets [78] are the most widely known tools designed specifically and effectively to deal with line-like phenomena in 2D images. The theory of the ridgelet and curvelet transforms indicates that they deliver significant performance in restoring sharp edges and linear and curvilinear features.

Moreover, in recent years some novel wavelet-based transforms have been introduced to achieve rotation invariance. Among them, adaptive transforms such as directional dyadic wavelets [79], contourlets [80], directionlets [81], beamlets [82], and bandelets [83] have attracted extensive attention due to their careful designs for representing image features with particular geometry.

Nevertheless, if the noisy images possess special characteristics, for example when most objects in them are isotropic, a transform that does not prefer any specific orientation, such as the isotropic wavelet transform [84, 85], may become more appropriate and achieve better denoising performance. For instance, applying the isotropic wavelet transform to astronomical images with numerous point light sources sprinkled on a flat background would outperform directional wavelet transforms.

3.2.3 Data-Adaptive Transform Denoising

In the transform domain, besides denoising under the conventional Fourier transform and the classical wavelet transform, we should be aware of another widely used group of transforms: data-adaptive transforms.

One of the most widely used data-adaptive transforms is Principal Component Analysis (PCA), also known as the Hotelling transform. Its essential concept is to obtain, from the observed data, a linear subspace that minimizes the mean squared distance between the data and their projections onto the subspace. It can ideally decorrelate the data, and its ultimate goal is an optimal reduction of the data's redundancy. Because of these properties, it has been widely employed in image denoising algorithms [86–88]. The general denoising idea is to decompose the input signal using the principal components, perform appropriate operations on the resulting coefficients, and finally reconstruct. Note that the adaptive principal component analysis proposed in [86] has a built-in shift-invariance characteristic.
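A minimal sketch of this decompose–modify–reconstruct idea (our own illustration), applied to a matrix whose rows are hypothetical image patches: keep only the k leading principal components and discard the rest.

```python
import numpy as np

def pca_denoise(P, k):
    """Project the rows of P onto the k leading principal components.
    Discarding the trailing components removes most of the isotropic noise
    while preserving the dominant structure shared across patches."""
    mean = P.mean(axis=0)
    U, s, Vt = np.linalg.svd(P - mean, full_matrices=False)
    return mean + (U[:, :k] * s[:k]) @ Vt[:k]
```

The choice of k is the critical parameter: too small and structure is lost, too large and noise is retained; practical algorithms select it adaptively per local region.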

In addition, Independent Component Analysis (ICA) has also emerged as a successful approach to denoising. It overcomes a significant limitation of PCA, namely that PCA depends only on second-order statistics of the data, while most natural images exhibit important higher-order statistics. The basic idea of ICA is to find a representation of the observed data whose components are as statistically independent from each other as possible. Such a representation has a strong ability to capture the essential structures in an image, so ICA has been introduced as a valuable tool for denoising images [89–91].

3.3 Other Alternative Approaches

Besides image denoising methods in the spatial and transform domains, some other remarkable methods have arisen as powerful alternatives.

In this section, we briefly introduce a few of the most successful approaches.

• BM3D

Dabov et al. coined a novel image denoising algorithm called sparse 3D transform-domain collaborative filtering (BM3D) in [92]. Essentially, the algorithm is a two-step procedure. The first step groups similar 2D fragments of the image into 3D groups, applies a collaborative 3D hard-thresholding procedure developed specifically for such groups, and finally obtains the estimate of each pixel by weighted averaging of the block-wise estimates that overlap it. The second step follows mostly the same procedure, except that 1) the grouping is performed on the basic estimate resulting from the first step, and 2) the 3D hard-thresholding is replaced by pointwise Wiener filtering, which regards the basic estimate as the underlying true clean image.

BM3D is very effective at preserving image features such as textures, periodic patterns, and sharp edges, and it currently achieves the best PSNR performance for most images over a wide range of noise levels.

• Total variation minimization

The basic principle is that a signal with high total variation (TV), the integral of the absolute gradient of the signal, is very likely to contain excessive and spurious details. Therefore, decreasing the total variation of a noisy signal yields a close estimate of the original signal.

In [93], a constrained minimization algorithm is derived that treats image denoising as a nonlinear, time-dependent partial differential equation (PDE). The minimization of the total variation of the image is subject to constraints involving the statistics of the noise. To solve this constrained problem, the algorithm introduces a parameter related to the degree of filtering, a Lagrange multiplier, thereby reformulating it as an unconstrained problem. It should be noted that textural structures can be over-smoothed if the Lagrange multiplier is set too small; in contrast, if it is set too large, a smaller total variation will be achieved, but the output signal becomes undesirably unlike the input. Refined algorithms based on total variation have also been proposed [94, 95].
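The role of the multiplier can be seen in a small 1D sketch (our own; instead of solving the PDE of [93], we minimize the unconstrained objective 0.5‖u − f‖² + λ·TV(u) by gradient descent on a smoothed TV term):

```python
import numpy as np

def tv_denoise_1d(f, lam, n_iter=800, step=0.05, eps=0.1):
    """Gradient descent on 0.5*||u - f||^2 + lam * sum_i sqrt((u[i+1]-u[i])^2 + eps^2).
    lam plays the role of the Lagrange multiplier: larger lam, stronger smoothing.
    eps smooths the absolute value so the objective is differentiable."""
    u = f.astype(float).copy()
    for _ in range(n_iter):
        du = np.diff(u)
        w = du / np.sqrt(du**2 + eps**2)                  # derivative of smoothed |.|
        div = np.concatenate(([w[0]], np.diff(w), [-w[-1]]))  # discrete divergence
        u -= step * ((u - f) - lam * div)                 # gradient step
    return u
```

On a noisy step signal, this drives the solution toward a piecewise-constant estimate, lowering both the total variation and the distance to the clean signal.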

• Non-local means

Buades et al. propose the non-local means filter [11, 96], a patch-based method that takes advantage of the redundant information in the various structural features of images. It is assumed that each small patch in a natural image has many similar patches in the same image. The photometric similarity between two patches is measured by the Gaussian-weighted Euclidean distance between them, and a given pixel is estimated as the average of the values of all pixels whose neighborhoods are similar to its own. This algorithm performs extremely well when the image contains a large number of periodic or textural features, because the high redundancy in both cases leads to similar configurations even at far-away pixels. [97] extends the work of Buades and colleagues by exploiting pairwise hypothesis testing to define the non-local estimation neighborhood of each pixel adaptively.
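The mechanism is easy to sketch in 1D (our illustration; [11, 96] operate on 2D patches with a Gaussian-weighted distance):

```python
import numpy as np

def nlmeans_1d(y, patch=3, h=0.6):
    """Non-local means sketch: each sample becomes a weighted average of all
    samples whose surrounding patch is photometrically similar; the weights
    decay exponentially with the squared distance between patches."""
    n = y.size
    pad = np.pad(y, patch, mode='reflect')
    # row i holds the patch of half-width `patch` centered at sample i
    P = np.stack([pad[i:i + 2 * patch + 1] for i in range(n)])
    out = np.empty(n)
    for i in range(n):
        d2 = np.sum((P - P[i]) ** 2, axis=1)
        w = np.exp(-d2 / (h**2 * (2 * patch + 1)))
        out[i] = np.sum(w * y) / np.sum(w)
    return out
```

On a periodic signal, every patch has many near-duplicates at distant positions, so the weighted average pools many independent noisy observations of the same underlying value, which is exactly the redundancy argument made above.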

• Hybrid methods

Besides the aforementioned alternative methods, there are a vast number of hybrid methods aimed at image denoising applications.

In [98], the authors present a multiresolution image denoising framework which incorporates bilateral filtering into a wavelet thresholding strategy. They use bilateral filtering on the approximation subbands and wavelet thresholding, more specifically the BayesShrink [41] method, on the detail subbands.

Schulte et al. [99] developed a novel image denoising method called FuzzyShrink by incorporating the main concept of [45] into a fuzzy logic framework. After the image data is transformed from the input plane to the membership plane by the process of fuzzification, they impose appropriate fuzzy membership functions to modify the coefficient values instead of estimating them according to their “signal of interest” probability.

Durand and Froment [100] propose a model for denoising signals by combining wavelet thresholding and a total variation (TV) minimization algorithm. Unlike traditional wavelet thresholding methods, which introduce Gibbs phenomena near discontinuities, their method yields nearly artifact-free results. Besides, Wang and Zhou [101] also introduce a denoising algorithm based on these two schemes, in which the wavelet coefficients are thresholded based on the minimization of the TV norm of the reconstructed images. They especially target their algorithm at handling medical images.

Other notable methods include nonlinear isotropic and anisotropic diffusion [102–104] and Markov Random Field (MRF) based modeling methods [105–108].

3.4 Summary

In this chapter, we provided a comprehensive literature review of image denoising works. These works are categorized by their intrinsic principles and basic foundations.

Among them, some seminal algorithms are explained in depth.

Based on this literature review, we conclude that the wavelet transform is still a promising tool for designing image denoising algorithms. Among the various wavelet based denoising methods, the soft-thresholding strategy combined with the discrete wavelet transform (DWT) is significantly favored due to its simplicity and intuitive assumptions.

We note that other more complicated transforms, such as the undecimated discrete wavelet transform (UDWT), are able to gain better performance at the expense of a heavier computational burden. Thus, in the forthcoming chapters, we will develop a new wavelet based denoising method utilizing the idea of soft thresholding.

Chapter 4

The Proposed Gaussian Denoising Method: CMWT

In this chapter, we will present the motivation and algorithms for our new Gaussian denoising method CMWT, based on context modeling (CM) and wavelet thresholding (WT) in the standard wavelet domain. Some parts of this chapter and the next chapter are based on our published paper [109].

4.1 Motivation

The wavelet transform is an important mathematical tool whose origins can be traced back decades. However, its application to image denoising did not emerge until the seminal work by Donoho and Johnstone in [32, 33].

In [32, 33], Donoho and Johnstone proved wavelet thresholding to be a successful method for denoising piecewise smooth functions. Their theoretical result shows that

$$E\|\hat{y}-x\|_{2,N}^2 \leq \Lambda_N \left[\varepsilon^2 + \sum_{i=1}^{N} \min(y_i^2, \varepsilon^2)\right] \qquad (4.1)$$

for all $x \in \mathbb{R}^N$, and for an estimator

$$\hat{y} = \left(\eta_s(y_i, \lambda_N \varepsilon)\right)_i \qquad (4.2)$$

where

$$\|v\|_{2,N}^2 = \sum_{i=1}^{N} v_i^2 \qquad (4.3)$$

$$\eta_s(y_i, \lambda_N \varepsilon) = \operatorname{sgn}(y_i)\,(|y_i| - \lambda_N \varepsilon)_+ \qquad (4.4)$$

$$y_i = x_i + b_i = x_i + \varepsilon d_i, \quad i = 1, 2, \ldots, N \qquad (4.5)$$

with $y = (y_i)_{i\in[1,N]}$ the noise-contaminated data for the signal $x = (x_i)_{i\in[1,N]}$, $\varepsilon > 0$ the noise level, $d_i$ i.i.d. $N(0,1)$, and $\lambda_N \leq (2\log N)^{1/2}$ and $\Lambda_N \leq 2\log N + 1$ for $N$ total sample points.

More specifically, the minimax quantities $\Lambda_N$ and $\lambda_N$ are defined as in [32]:

$$\Lambda_N \equiv \inf_{\lambda} \sup_{\mu} \frac{\rho_{ST}(\lambda,\mu)}{N^{-1} + \min(\mu^2, 1)} \qquad (4.6)$$

$$\lambda_N \equiv \text{the largest } \lambda \text{ attaining } \Lambda_N \text{ above} \qquad (4.7)$$

where $\rho_{ST}(\lambda,\mu) = E\{\eta_s(Y,\lambda) - \mu\}^2$ with $Y$ being $N(\mu,1)$. Also,

$$\rho_{ST}(\lambda,\mu) = 1 + \lambda^2 + (\mu^2 - \lambda^2 - 1)\{\Phi(\lambda-\mu) - \Phi(-\lambda-\mu)\} - (\lambda-\mu)\phi(\lambda+\mu) - (\lambda+\mu)\phi(\lambda-\mu) \qquad (4.8)$$

with $\phi$, $\Phi$ the standard Gaussian density and distribution functions [32].

With $y_i$ being a DWT coefficient and $b_i = \varepsilon d_i$ the additive Gaussian noise, where $d_i$ is $N(0,1)$ and $\varepsilon > 0$ is the noise level, the variable $b_i$ is distributed $N(0, \varepsilon^2)$, more commonly denoted $N(0, \sigma_n^2)$. Equation (4.1) means that to achieve the minimax upper bound of $E\|\hat{y}-x\|_{2,N}^2$, the soft thresholding of Equation (4.4) has to be followed, where $\Lambda_N$, $\lambda_N$ and $\rho_{ST}(\lambda,\mu)$ have to satisfy Equations (4.6)–(4.8).
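The soft-thresholding rule of Equation (4.4) with the universal threshold $\lambda_N \varepsilon \approx \sqrt{2\log N}\,\varepsilon$ can be sketched directly (the sparse test signal below is our own toy data):

```python
import numpy as np

def soft_threshold(y, t):
    # eta_s(y, t) = sgn(y) * (|y| - t)_+   -- Equation (4.4)
    return np.sign(y) * np.maximum(np.abs(y) - t, 0.0)

rng = np.random.default_rng(2)
N, eps = 1024, 1.0
x = np.zeros(N); x[:8] = 10.0            # sparse "signal" coefficients
y = x + eps * rng.standard_normal(N)     # y_i = x_i + eps * d_i (Eq. 4.5)
t = np.sqrt(2 * np.log(N)) * eps         # universal threshold
xhat = soft_threshold(y, t)

# Most pure-noise coefficients are set exactly to zero ...
assert np.mean(xhat[8:] == 0.0) > 0.95
# ... and the estimate beats the raw observation in squared error.
assert np.sum((xhat - x) ** 2) < np.sum((y - x) ** 2)
```

This is exactly the behavior the oracle inequality (4.1) bounds: small coefficients (mostly noise) are killed, large ones (mostly signal) are kept with a bias of at most the threshold.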

Figure 4.1: A wavelet denoising flowchart

A typical denoising flow chart is shown in Figure 4.1 [1]. In this figure, a discrete wavelet transform (DWT) is performed on the noisy data (image) $y = (y_i)_{i\in[1,N]}$ to produce the noisy wavelet subbands (subimages) $Y^j = (Y_i^j)_{i\in[1,N_j]}$, $j \in [1,J]$. The denoising operation produces $\hat{x}^1, \ldots, \hat{x}^j, \ldots, \hat{x}^J$, with $\hat{x}^j$ being an estimator of $x^j$, and the inverse wavelet transform (IDWT) produces the estimate $\hat{x}$ of the noise-free data $x$. The noisy data $y = (y_i)_{i\in[1,N]}$ is the sum of the noise-free data $x = (x_i)_{i\in[1,N]}$ and the noise $b = (b_i)_{i\in[1,N]}$, i.e. $y_i = x_i + b_i$. The additive noise $b$ is Gaussian with distribution $N(0, \sigma^2)$.

In addition, to the best of our knowledge, most wavelet-based image denoising techniques assume that the wavelet coefficients are independent of each other, which is disputable since there are correlated relationships not only between neighboring wavelet coefficients but also between those at the same position across subbands. Thus, denoising algorithms which take the dependencies among the wavelet coefficients into account are expected to yield better results.

For instance, in [41], the authors provide a context modeling estimator, in terms of the MSE, called $z_i^j$ of $y_i^j$. The details are as follows: for each subband $j \in [2,J]$, compute $z^j = (z_i^j)_{i\in[1,N_j]}$ as $z_i^j = (w^j)^t u_i^j$, where $u_i^j$ is a $9 \times 1$ vector with the 8 nearest neighbors of the coefficient $y_i^j$ placed as its first 8 components and the parent coefficient of $y_i^j$ placed as its 9th component. $w^j$ is the subband weight vector, with $w^j = \arg\min_w \sum_{N_j} (|y_i^j| - w^t u_i^j)^2$. The context modeling is used to estimate the soft thresholding parameter value of each wavelet coefficient.

Besides, Luisier and Blu [1] have derived a denoising operator which is a linear combination of $K$ derivatives of Gaussian (DOG), using an estimator $\lambda_N$ of Eq. (4.7) with $T$ as a parameter:

$$\hat{x}_j = \sum_{k=1}^{K} a_k z_j e^{-(k-1)\frac{z_j^2}{2T^2}} \qquad (4.9)$$

In the above expression, $K = 2$ and $T = \sqrt{6}\sigma$, which are experimentally obtained. Larger values of $K$ have little or no effect on the resulting $\hat{x}_j$.

Our research motivation is to reduce the squared-error upper bound of Equation (4.1) so that the resulting denoising procedure has the largest denoising effect, in either SNR or PSNR, when compared to the results summarized in [1] under a non-redundant wavelet transform. Based upon this principle, we propose a new method with two denoising operations: the application of an improved context modeling and the optimization of the soft thresholding function.

4.2 Overview

In this dissertation, we use the wavelet transform as our basic tool, based on its theoretically proven near-minimax optimality and its high effectiveness in practice. Its advantage of preserving the properties of AWGN makes the wavelet transform favorable in practice. Under the standard (orthogonal) wavelet transform, the noise remains additive, white and Gaussian with the same statistics in the wavelet domain, which means that the wavelet subbands can be assumed statistically independent. Within each wavelet subband, the observation model

yj = xj + bj (4.10)

still holds, where $b^j$ follows a Gaussian distribution with the same mean and variance as the noise in the input image $y$. Therefore we can denoise each subband independently.

Meanwhile, observing the right-hand side of the inequality in Equation (4.1) and Equations (4.2)–(4.8), an upper bound for the MSE of the pixel value estimation is guaranteed. We also know from this performance bound that the error decays as the number of observations increases. Furthermore, we realize that there are two places where the error can be reduced: $\varepsilon$ and $\Lambda_N$. The value $\varepsilon > 0$ (or $\sigma_n$) is the noise level added to $x_i$, and $\Lambda_N \leq 2\log N + 1$ is related to the sample size $N$ and the soft thresholding $\eta_s(\cdot)$ with $\lambda_N \leq (2\log N)^{1/2}$. Our first denoising operation is to reduce $\sigma_n(y)$. Our suggestion is to apply a linear estimator $Z_i = \sum_{\Omega_\iota} w_j Y_i^j$, with $\Omega_\iota$ a local region of $y_i$, such that $z_i$ does not degrade the signal $x_i$ and $\sigma_n^2(z) < \sigma_n^2(y)$. Our second denoising operation is related to $\Lambda_N$ and $\eta_s(\cdot)$, where the soft thresholding function is optimized. We are basically dividing the minimization of $E\|\hat{X}-X\|_{2,N}^2$ into two denoising operations.

A rather complete PSNR performance comparison over a set of 6 commonly known images and over a broad range of noise contaminations from $\sigma_n = 5$ to $\sigma_n = 100$, for BayesShrink [41], BiShrink [56], ProbShrink [45], BLS-GSM [3], and SURE-LET [1], has been summarized in Table 2.2. In summary, SURE-LET [1] produces the best PSNR performance based on a non-redundant wavelet transform on all images, with the exception of Image Barbara.

4.3 Denoising Operation 1–Improved Context Modeling

Context modeling can be utilized to reduce the mean squared error of the soft-thresholding estimator. Its general principle is to group, for a particular wavelet coefficient, those wavelet coefficients with similar statistical information, even though they are not spatially adjacent. This pointwise manner takes the local characteristics, in particular the parent-child relationship and the neighborhood similarity (see Figure 4.2), into account and yields promising results.

In Section 4.1, we introduced the context modeling estimator presented by Chang et al. in [41]. Here, we adopt its underlying idea to construct an improved context modeling estimator $Z_i$, which experimentally yields better denoising performance. It is a linear combination of the noisy data $Y_i$, expressed as

$$Z_i = \sum_{\Omega_\iota} w_j Y_i^j \qquad (4.11)$$

where $\Omega_\iota$ is the neighborhood of $Y_i$, with $\|Z - Y\|_{2,N}^2$ being minimized.

Figure 4.2: The parent-child relationship of a three-level wavelet decomposition.

Our objective is to search for the best $\Omega_\iota$ so as to achieve $\sigma_n^2(z) < \sigma_n^2(y)$ and $\mathrm{SNR}(z) > \mathrm{SNR}(y)$. The noise level of $Z = (Z_i)_{i=1}^N$ is then reduced to $\sigma_n(z)$ with an improved $\mathrm{SNR}(z)$.

With $Y_i = X_i + b_i$ and $b_i$ distributed $N(0, \sigma_n^2)$, the noise variance of $Z_i$ is $\sigma_n^2(z) = (\sum w_j^2)\,\sigma_n^2$. If $\sum w_j^2 < 1$, then $\sigma_n^2(z) < \sigma_n^2(y)$ and noise reduction is achieved. Thus, it is desirable to find the smallest $\sum w_j^2$. Furthermore, if all $w_j$ are positive and $\sum w_j$ is close to 1, then $\sum w_j^2 < 1$. Notice that DWT subband feature coefficients are large in magnitude and sparse in nature, with a strong inter-scale parent-child relationship, and this information is very useful in selecting the localized region $\Omega_\iota$. $\Omega_\iota$ should be selected such that it contains mostly feature coefficients or mostly noise coefficients. In this way, if a least squares approach is employed to obtain $(w_j)_{j\in[1,N]}$, the weights are positive, small and almost equal in value, with $\sum w_j$ close to 1, making $\sum w_j^2 < 1$. Signal preservation can be verified by computing the SNR of the resulting estimators.

To perform the context modeling strategy, we first apply a 4-level DWT to the incoming noisy image $y$, producing $Y$ with the subband designations shown in Figure 4.3.

Figure 4.3: Subband designations

• For subbands 1 to 6, compute $Z^j$ as $Z_i^j = (w^j)^t u_i^j$, where $u_i^j$ is a $34 \times 1$ vector. It includes the 24 nearest neighbors of $Y_i^j$ (excluding itself) in the same subband within a $5 \times 5$ window, the 9 nearest neighbors of the parent coefficient of $Y_i^j$ in the coarser resolution within a $3 \times 3$ window, and its grandparent coefficient placed as the 34th component of $u_i^j$, for a total of 34 coefficients.

• For subbands 7 to 12, build a context model similar to that proposed by Chang et al. [41]: place only the 8 nearest neighbors of the coefficient $Y_i^j$ as the first 8 components of $u_i^j$ and the parent coefficient of $Y_i^j$ in the coarser subband as the 9th component of $u_i^j$.

$w^j$ is the subband weight vector with

$$w^j = \arg\min_w \sum_{N_j} \left(|Y_i^j| - (w^j)^t u_i^j\right)^2 = (U^t U)^{-1} U^t |Y| \qquad (4.12)$$

where $U$ is an $N_j \times m$ ($m$ being either 34 or 9) matrix with each row being $u_i^j$ for all $i$, $j$.
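The normal-equation solution of Equation (4.12) is an ordinary least squares fit, which can be sketched as follows (the toy data below, where each $|Y_i|$ is roughly the mean of its $m$ neighbors, is our own construction; real subband statistics differ):

```python
import numpy as np

def context_weights(U, Yabs):
    """Solve w = argmin || |Y| - U w ||^2 = (U^t U)^{-1} U^t |Y|
    (Equation 4.12); U holds one neighborhood vector u_i per row."""
    w, *_ = np.linalg.lstsq(U, Yabs, rcond=None)
    return w

def context_estimate(U, w):
    return U @ w   # Z_i = w^t u_i for every coefficient i

rng = np.random.default_rng(3)
m, n = 9, 500
U = rng.random((n, m)) + 0.5                       # synthetic neighborhoods
Yabs = U.mean(axis=1) + 0.01 * rng.standard_normal(n)
w = context_weights(U, Yabs)

# As argued in the text: the fitted weights are nearly equal, sum
# close to 1, and therefore sum(w^2) < 1 (noise variance shrinks).
assert abs(w.sum() - 1.0) < 0.05
assert np.sum(w ** 2) < 1.0
```

Using `np.linalg.lstsq` rather than forming $(U^tU)^{-1}$ explicitly is the numerically preferred way to evaluate the same closed form.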

In addition, the absolute values of the wavelet coefficients are used instead of their original values, because the absolute values of neighboring coefficients better model and preserve the correlations between coefficients.

The main reason to introduce a vector with more components for subbands of larger size is to include more information about the particular coefficient, since the wavelet coefficients possess strong inter-scale correlations.

Meanwhile, instead of directly applying the $Z$ obtained by the above procedure in place of $Y$ in the optimization of the soft thresholding operation, we add a portion of $Y$ to this $Z$ to form a new smoothed version, named $Z_{new}$:

$$Z_{new} = A \times Z + (1 - A) \times Y \qquad (4.13)$$

with $A$ in the range $0 \sim 1$.

It should be noted that the selection of $A$ is based on the noise level of the input image $y$; the estimate of the noise level is calculated as in Equation (3.25). We summarize in Table 4.1 the selected values of $A$ under different noise levels and subbands, according to our empirical observations.

Table 4.1: Table for selecting A at different subbands and estimated noise levels

Noise level | Subbands 1,2,3 | 4,5,6 | 7,8,9 | 10,11,12
(0,5]       | 0.5            | 0.2   | 0.05  | 0
(5,10]      | 0.7            | 0.4   | 0.15  | 0.05
(10,15]     | 0.7            | 0.45  | 0.2   | 0.05
(15,20]     | 0.7            | 0.5   | 0.3   | 0.1
(20,25]     | 0.75           | 0.55  | 0.35  | 0.15
(25,30]     | 0.8            | 0.6   | 0.4   | 0.2
(30,50]     | 0.9            | 0.8   | 0.6   | 0.3
(50,100]    | 0.9            | 0.8   | 0.7   | 0.5
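The lookup of $A$ from Table 4.1 and the blend of Equation (4.13) can be sketched as follows (the helper names are ours; the table values are taken verbatim from Table 4.1):

```python
import bisect

# Rows: noise-level bins with upper edges 5,10,15,20,25,30,50,100;
# columns: subband groups (1-3, 4-6, 7-9, 10-12), as in Table 4.1.
A_TABLE = [
    (5,   [0.5,  0.2,  0.05, 0.0]),
    (10,  [0.7,  0.4,  0.15, 0.05]),
    (15,  [0.7,  0.45, 0.2,  0.05]),
    (20,  [0.7,  0.5,  0.3,  0.1]),
    (25,  [0.75, 0.55, 0.35, 0.15]),
    (30,  [0.8,  0.6,  0.4,  0.2]),
    (50,  [0.9,  0.8,  0.6,  0.3]),
    (100, [0.9,  0.8,  0.7,  0.5]),
]

def select_A(noise_level, subband):
    edges = [edge for edge, _ in A_TABLE]
    row = bisect.bisect_left(edges, noise_level)   # bins are (lo, hi]
    group = (subband - 1) // 3                     # subbands 1..12 -> group 0..3
    return A_TABLE[row][1][group]

def blend(Z, Y, A):
    # Z_new = A * Z + (1 - A) * Y   -- Equation (4.13)
    return A * Z + (1 - A) * Y

assert select_A(30, 1) == 0.8     # level in (25,30], subbands 1-3
assert select_A(12, 10) == 0.05   # level in (10,15], subbands 10-12
assert blend(2.0, 4.0, 0.5) == 3.0
```

Note that at low noise levels and fine subbands $A$ is small or zero, so $Z_{new}$ stays close to the raw coefficients; at high noise levels $A$ approaches 1 and the smoothed estimate dominates.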

4.4 Experimental Results of Denoising Operation 1

For all the numerical experiments in this dissertation, we test on the six standard images shown in Figure 4.4, which include a variety of features such as human faces (Lena), indoor objects (Couple and Barbara), artificial scenery (Boat) and natural scenery (Goldhill). We take the peak signal-to-noise ratio (PSNR), defined in Equation (2.4), as the measure of denoising performance.
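For reference, the following sketch computes the PSNR under the standard 8-bit definition $10\log_{10}(255^2/\mathrm{MSE})$ (we assume Equation (2.4) matches this common form); it also reproduces the correspondence between $\sigma_n = 20$ and an input PSNR of about 22.11 dB used throughout this chapter.

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    # PSNR in dB: 10 * log10(peak^2 / MSE)
    mse = np.mean((reference.astype(float) - estimate.astype(float)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

rng = np.random.default_rng(4)
img = rng.integers(0, 256, size=(256, 256)).astype(float)
noisy = img + 20.0 * rng.standard_normal(img.shape)  # sigma_n = 20
assert abs(psnr(img, noisy) - 22.11) < 0.2           # matches the tables' input PSNR
```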

We experimentally illustrate the efficiency of the estimator from Denoising Operation 1 on 3 selected images: Lena, Goldhill and Boat. Results are shown in Table 4.2. We can see that for both additive Gaussian noise levels, $\sigma_n(y) = 30$ and $\sigma_n(y) = 50$, we obtain $\mathrm{PSNR}(z) > \mathrm{PSNR}(y)$ and $\sigma_n(z) < \sigma_n(y)$. Inspecting the weight distribution $w_j$ in a few subbands of the orthogonal DWT shown in Figure 4.5 for Image Lena with $\sigma_n(y) = 30$, we have

Subband LH3: $\{w\} = \{0.016, 0.135, 0.047, 0.281, 0.004, 0.282, 0.045, 0.136, 0.018\}$; $\sum w_j = 0.96$ and $\sum w_j^2 = 0.20 < 1$

Subband HL3: $\{w\} = \{0.049, 0.187, 0.049, 0.186, 0.014, 0.187, 0.053, 0.188, 0.051\}$; $\sum w_j = 0.96$ and $\sum w_j^2 = 0.15 < 1$

Subband HH3: $\{w\} = \{0.032, 0.125, 0.103, 0.201, 0.018, 0.201, 0.103, 0.128, 0.033\}$; $\sum w_j = 0.94$ and $\sum w_j^2 = 0.14 < 1$

This suggests that non-thresholding types of filtering operations in the higher frequency subbands have a tendency to decrease the noise level without violating the theoretical foundation of the soft-thresholding strategy.

Table 4.2: Comparison of PSNR of 3 images from a reconstruction of Z_new and the input PSNR

Image         | Lena        | Goldhill    | Boat
σ_n(y)        | 30    50    | 30    50    | 30    50
Input PSNR(y) | 18.59 14.15 | 18.59 14.15 | 18.59 14.15
σ_n(z)        | 16.02 30.34 | 14.63 29.36 | 16.23 29.89
PSNR(z)       | 24.03 18.49 | 24.82 18.77 | 23.92 18.62

Figure 4.4: Six standard testing images used in our experiments: (a) Lena, (b) Boat, (c) Goldhill, (d) Barbara, (e) Couple, (f) Man.

Figure 4.5: Subbands of the 2D orthogonal wavelet transform
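The reported sums can be checked directly from the Subband LH3 weights above; the ratio $\sum w_j^2 \approx 0.20$ is exactly the factor by which the i.i.d. noise variance shrinks in that subband, since $\mathrm{var}(\sum_j w_j b_j) = (\sum_j w_j^2)\,\sigma_n^2$.

```python
import numpy as np

# Weights reported above for Subband LH3 of Lena (sigma_n(y) = 30).
w = np.array([0.016, 0.135, 0.047, 0.281, 0.004,
              0.282, 0.045, 0.136, 0.018])

assert round(float(w.sum()), 2) == 0.96          # sum w_j close to 1: signal preserved
assert round(float((w ** 2).sum()), 2) == 0.20   # sum w_j^2 < 1: noise power ~20%
```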

We emphasize again that in Figure 4.5 we group the wavelet coefficients into subbands of different levels and orientations for convenience. For instance, Subband LH2 refers to the coefficients at the second level which are the output of the lowpass filter in the vertical direction and the highpass filter in the horizontal direction.

4.5 Denoising Operation 2–The Optimization of the Soft Thresholding Function

The second denoising operation is the optimization of the soft thresholding operation in the sense of minimizing $\|\hat{z}_i - x\|^2$. Here, the approach used by Luisier and Blu in [1] is to approximate the soft thresholding denoising operator by a linear combination of parameterized derivative of Gaussian (DOG) terms. Two reasons are given in [1]: 1) soft thresholding depends only on a single parameter $T$, so its shape is not very flexible, and 2) the linear portion is too restrictive. The formula of the denoising function is

$$\hat{X}_i = \sum_{k=1}^{K} a_k Z_i e^{-(k-1)\frac{Z_i^2}{2T^2}} \qquad (4.14)$$

with $\|\hat{X}_z - X\|_{2,N}^2$ minimized. This model does not require any assumptions about the statistics of the underlying noise-free image, and the thresholding value for each wavelet coefficient can be optimized independently. The additional objective we have to achieve is

$$\|\hat{x}_z - x\|_{2,N}^2 < \|\hat{x}_y - x\|_{2,N}^2 \qquad (4.15)$$

or $\mathrm{PSNR}(\hat{x}_z) > \mathrm{PSNR}(\hat{x}_y)$. Since $\sigma_n^2(Z) < \sigma_n^2(Y)$, it follows that $\lambda_N(z)$ (or $T_z$) $< \lambda_N(y)$ (or $T_y$) and $\Lambda_N(z) < \Lambda_N(y)$. The error upper bound is thus further reduced.

More specifically, to compare the denoising efficiency of using $z$ or $y$ as input to Denoising Operation 2, we have to compare their PSNR performances by first equating either $\mathrm{PSNR}(y) = \mathrm{PSNR}(z)$ or $\sigma_n^2(z) = \sigma_n^2(y)$, and then using the resulting $y$ and $z$ as inputs to the denoising system.
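The shape of the pointwise function of Equation (4.14) with $K = 2$ and $T = \sqrt{6}\sigma$ can be sketched as follows; the weights `a` below are illustrative placeholders, not fitted values.

```python
import numpy as np

def dog_threshold(z, a, T):
    """Pointwise denoising function of Equation (4.14):
    xhat = sum_k a_k * z * exp(-(k-1) * z^2 / (2 T^2))."""
    out = np.zeros_like(z, dtype=float)
    for k, ak in enumerate(a, start=1):
        out += ak * z * np.exp(-(k - 1) * z * z / (2.0 * T * T))
    return out

sigma = 10.0
T = np.sqrt(6.0) * sigma           # T = sqrt(6) * sigma as in [1]
a = [1.0, -0.9]                    # illustrative weights (hypothetical)
z = np.array([0.0, 5.0, 100.0])
xhat = dog_threshold(z, a, T)

# Small coefficients are shrunk heavily, large ones are nearly kept:
assert abs(xhat[1]) < abs(z[1])
assert abs(xhat[2] - z[2]) / z[2] < 0.01
```

With $K = 2$ the function smoothly interpolates between strong attenuation near zero and identity for large coefficients, mimicking soft thresholding without its hard kink.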

Furthermore, the linear parameters can be obtained by a simple least squares operation for a given set of $T$, $\sigma_n$ and $K$ (the total number of linear terms). For a given $\sigma_n(y)$, $T$ and $K$ are the only two parameters contained in the optimized soft-thresholding function, and they can be obtained experimentally, with the best $T$ tending to be fixed over an effective range of $K$ values.

If the parameter $T$ is neglected and only the number of terms $K$ is considered (e.g. $K = 1$), the optimized soft-thresholding function regresses to a simple linear function which can be regarded as a pointwise Wiener filter. Therefore, in order to maximize the strength of the optimized soft-thresholding function, $K$ is set larger than one and the parameter $T$ is retained. Furthermore, in [1] it is shown that when $K \geq 2$, the shape of the function becomes insensitive to the variation of $K$, so $K = 2$ suffices as a simple but effective solution. Besides, we adopt the selection $T = \sqrt{6}\sigma_n$ from [1], which will be experimentally studied in Section 4.6.

However, in searching for the optimal parameters of the pointwise denoising function, since no closed-form solution is currently available, we need an alternative way to produce the weights $a_k$ in Equation (4.14).

Although the analytical solution given in the SURE-LET approach [1], obtained by solving a linear system of equations, is not directly applicable to $Z$ or $Z_{new}$, we can still work around this by applying offset values to these solutions in order to obtain the final weights $a_k$, $k \in \{1, 2\}$. The detailed procedure is as follows.

Determine $a_1$ and $a_2$, and compute

$$\hat{X} = \left(a_1 + a_2 \cdot e^{-\frac{Z_{new}^2}{12\hat{\sigma}^2}}\right) \cdot Z_{new} \qquad (4.16)$$

where $a_1 = a_1' + \Delta a_1$ and $a_2 = a_2' + \Delta a_2$, while $a_1'$ and $a_2'$ are obtained from the least-mean-square closed-form solution

$$a' = M^{-1} c \qquad (4.17)$$

where $a' = (a_1', a_2')^T$;

$$M = \begin{pmatrix} \overline{Y^2} & \overline{Y^2 \cdot e^{-\frac{Y^2}{12\hat{\sigma}^2}}} \\[4pt] \overline{Y^2 \cdot e^{-\frac{Y^2}{12\hat{\sigma}^2}}} & \overline{Y^2 \cdot e^{-\frac{Y^2}{6\hat{\sigma}^2}}} \end{pmatrix};$$

$$c = \left( \overline{Y^2} - \hat{\sigma}^2,\;\; \overline{\left[Y^2 - \hat{\sigma}^2\left(1 - \frac{Y^2}{6\hat{\sigma}^2}\right)\right] \cdot e^{-\frac{Y^2}{12\hat{\sigma}^2}}} \right)^T$$

And Δa1 and Δa2 can be obtained from Table 4.3 and Table 4.4, respectively.
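The closed-form solve of Equations (4.16) and (4.17) can be sketched as follows (a sketch of the SURE-LET style system under the assumption that the overbars denote subband averages; the $\Delta a$ offsets of Tables 4.3 and 4.4 are omitted here, and the sparse test signal is our own toy data):

```python
import numpy as np

def closed_form_weights(Y, sigma):
    """a' = M^{-1} c of Equation (4.17), built from subband moments
    of the noisy coefficients Y."""
    e1 = np.exp(-Y ** 2 / (12.0 * sigma ** 2))
    e2 = np.exp(-Y ** 2 / (6.0 * sigma ** 2))
    M = np.array([[np.mean(Y ** 2),      np.mean(Y ** 2 * e1)],
                  [np.mean(Y ** 2 * e1), np.mean(Y ** 2 * e2)]])
    c = np.array([np.mean(Y ** 2) - sigma ** 2,
                  np.mean((Y ** 2 - sigma ** 2 *
                           (1.0 - Y ** 2 / (6.0 * sigma ** 2))) * e1)])
    return np.linalg.solve(M, c)

def apply_weights(Z, a, sigma):
    # Equation (4.16): xhat = (a1 + a2 * exp(-Z^2 / (12 sigma^2))) * Z
    return (a[0] + a[1] * np.exp(-Z ** 2 / (12.0 * sigma ** 2))) * Z

rng = np.random.default_rng(5)
sigma = 10.0
x = np.where(rng.random(4096) < 0.1, 80.0 * rng.standard_normal(4096), 0.0)
Y = x + sigma * rng.standard_normal(4096)
a = closed_form_weights(Y, sigma)
xhat = apply_weights(Y, a, sigma)
assert np.mean((xhat - x) ** 2) < np.mean((Y - x) ** 2)
```

Since $a' = (1, 0)$ reproduces the identity estimator, the fitted solution can only improve on the raw noisy coefficients in the least-mean-square sense, which the final assertion checks on a sparse synthetic subband.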

Table 4.3: Table for selecting Δa₁ under different subbands and noise levels

Noise level | Subbands 1,2,3 | 4,5,6 | 7,8,9 | 10,11,12
(0,5]       | 0.2            | 0.1   | 0     | 0
(5,10]      | 0.3            | 0.2   | 0.1   | 0
(10,15]     | 0.2            | 0.3   | 0.1   | 0
(15,20]     | 0.2            | 0.3   | 0.2   | 0.1
(20,25]     | 0.1            | 0.4   | 0.2   | 0.1
(25,30]     | 0.1            | 0.4   | 0.3   | 0.1
(30,50]     | 0.1            | 0.3   | 0.4   | 0.1
(50,100]    | 0              | 0.2   | 0.4   | 0.3

Table 4.4: Table for selecting Δa₂ under different subbands and noise levels

Noise level | Subbands 1,2,3 | 4,5,6 | 7,8,9 | 10,11,12
(0,5]       | 0.1            | 0     | 0     | 0.1
(5,10]      | 0.1            | 0     | 0     | 0.1
(10,15]     | 0.1            | 0     | 0     | 0.1
(15,20]     | 0.1            | 0     | 0     | 0.1
(20,25]     | 0.1            | 0     | 0     | 0.1
(25,30]     | 0.1            | 0     | 0     | 0.1
(30,50]     | 0              | 0     | 0     | 0.1
(50,100]    | 0.1            | 0     | 0     | 0.1

It should be mentioned that the parameter selections in Tables 4.1, 4.3 and 4.4 are based on our empirical observations from experimental results on the six typical standard testing images (Figure 4.4). We also assume that the choice of the $K$ denoising functions is up to the practitioner, and the optimization of the weight parameters can be obtained by modifying the solutions of a linear system of equations. Moreover, instead of the standard wavelet transform, other promising tools designed for transform-domain operations, such as the steerable pyramid [76], ridgelets [77] and curvelets [78], can be employed prior to the actual denoising operations as well.

4.6 Experimental Results of Denoising Operation 2

It turns out that $Z_{new}$ behaves very similarly to $Y$, with large flexibility in the selection of $K$, so we fix $K = 2$ as well. Plots of $(\mathrm{PSNR}_{max} - \mathrm{PSNR})$ versus $T^2/\sigma_n^2$ are shown in Figure 4.6 for Image Lena and Figure 4.7 for Boat, for different values of $\sigma_n(y)$ with $K = 2$. From these figures, for all three noise levels tested, $\sigma_n = 20$, $\sigma_n = 30$ and $\sigma_n = 50$, we observe that a good selection of $T$ is $T = \sqrt{6}\sigma_n$.

Figure 4.6: Sensitivity of the denoising function with respect to variations of T on Lena

We also compare the proposed optimized soft-thresholding function with the original soft-thresholding strategy, for which we choose the Universal threshold [32]: $\lambda_{UNIV} = \sqrt{2\ln N}\,\sigma$, where $\sigma$ is the standard deviation of the noise and $N$ is the signal length. Coefficients in the subbands at the same level share the same thresholding value.

Figure 4.7: Sensitivity of the denoising function with respect to variations of T on Boat

We note that the standard deviation of the noise, also known as the noise level, can be estimated [32, 33] from

$$\hat{\sigma} = \mathrm{median}\left(|Y[i,j]|\right)/0.6745, \quad Y[i,j] \in \text{subband HH1} \qquad (4.18)$$

where HH1 is Subband 2 in Figure 4.3.
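Equation (4.18) works because the median of $|N(0,\sigma^2)|$ is about $0.6745\,\sigma$, and the median is robust to the few large signal coefficients present in HH1. A quick check on pure Gaussian data:

```python
import numpy as np

def estimate_sigma(hh1):
    # sigma_hat = median(|Y[i,j]|) / 0.6745   -- Equation (4.18)
    return np.median(np.abs(hh1)) / 0.6745

rng = np.random.default_rng(6)
hh1 = 15.0 * rng.standard_normal((256, 256))   # pure noise, sigma = 15
assert abs(estimate_sigma(hh1) - 15.0) < 0.3   # estimate recovers sigma
```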

In order to validate the efficiency of our optimized soft-thresholding function, we also apply the original soft-thresholding function to the coefficients after context modeling, as in Operation 1. Table 4.5 presents the numerical results, from which we see that the results of our optimized soft-thresholding strategy are 4∼5 dB higher than those of the original soft-thresholding. We also illustrate the visual comparison between the two methods on two images, Goldhill and Barbara, in Figures 4.8 and 4.9. They clearly indicate that the optimized soft-thresholding function significantly overcomes the over-smoothing problem of the original soft-thresholding.

Table 4.5: Comparison of PSNR (dB) with soft-thresholding (non-redundant)

Noise level                 | 5     | 10    | 15    | 20    | 25    | 30    | 50    | 100
Input PSNR                  | 34.15 | 28.13 | 24.61 | 22.11 | 20.17 | 18.59 | 14.15 | 8.13
Lena: Soft-thresholding     | 34.96 | 30.54 | 27.98 | 27.32 | 26.16 | 25.86 | 23.37 | 20.32
Lena: Proposed              | 38.92 | 35.42 | 33.50 | 31.89 | 31.16 | 30.32 | 28.17 | 25.32
Boat: Soft-thresholding     | 32.10 | 27.95 | 25.85 | 25.37 | 24.64 | 22.97 | 20.90 | 17.47
Boat: Proposed              | 37.14 | 33.65 | 31.70 | 30.28 | 29.23 | 28.50 | 26.40 | 24.10
Goldhill: Soft-thresholding | 32.53 | 27.89 | 26.76 | 25.12 | 24.20 | 23.27 | 22.46 | 19.82
Goldhill: Proposed          | 37.62 | 34.02 | 32.17 | 30.87 | 30.10 | 29.50 | 27.80 | 25.47
Barbara: Soft-thresholding  | 32.59 | 28.38 | 24.96 | 23.98 | 22.36 | 21.73 | 19.60 | 17.96
Barbara: Proposed           | 37.13 | 33.29 | 31.05 | 29.50 | 28.29 | 27.34 | 24.78 | 23.47
Couple: Soft-thresholding   | 32.43 | 28.42 | 25.24 | 25.12 | 23.42 | 22.69 | 22.13 | 18.97
Couple: Proposed            | 37.24 | 33.42 | 31.36 | 29.97 | 29.07 | 28.35 | 26.27 | 23.95
Man: Soft-thresholding      | 33.85 | 27.97 | 25.88 | 25.24 | 24.27 | 23.21 | 21.86 | 19.12
Man: Proposed               | 37.56 | 33.56 | 31.59 | 30.28 | 29.49 | 28.72 | 27.03 | 24.71

4.7 Experimental Results of Combining the Two Denoising Operations

To evaluate the practical performance of our whole denoising approach, hereafter termed CMWT (Context Modeling + Wavelet Thresholding), we use the six standard testing images shown in Figure 4.4 and combine the two denoising operations. In this simulation, we add white Gaussian noise to each image at eight noise levels (5, 10, 15, 20, 25, 30, 50, 100), corresponding to input PSNR values of (34.15, 28.13, 24.61, 22.11, 20.17, 18.59, 14.15, 8.13) respectively.

First, we present the denoising results on three images, Lena, Goldhill and Couple, using the most efficient non-redundant wavelet transform method, SURE-LET [1], and our proposed method. The PSNR comparisons are illustrated in Figures 4.10, 4.11 and 4.12; the experiments applied a non-redundant wavelet transform. They show the total improvement in PSNR for various values of $\sigma_n(y)$ on these three images. In these cases, improvements of 0.5∼1 dB have been demonstrated. In addition, we exhibit the visual comparisons for these three images in Figures 4.13, 4.14 and 4.15. Although image quality assessment by human judgment can be subjective, we observe that in the images denoised by our method the visual artifacts are better suppressed and the edges are better preserved.

Figure 4.8: Visual comparison between the original and the optimized soft-thresholding functions on Image Goldhill. (a) Original Goldhill, (b) noisy version at noise level 30, (c) denoised image by the original soft-thresholding function, (d) denoised image by the optimized soft-thresholding function.

We have also obtained PSNR results on three other standard and widely used 512×512 images: Boat, Barbara and Man. The quantitative results for these six images from our proposed method, along with the results from SURE-LET [1], are summarized in Table 4.6. It shows that our method achieves about 0.4∼1 dB improvement on average. The computation time is around 3∼5 seconds on a regular PC for the whole denoising process.

Figure 4.9: Visual comparison between the original and the optimized soft-thresholding functions on Image Barbara. (a) Original Barbara, (b) noisy version at noise level 30, (c) denoised image by the original soft-thresholding function, (d) denoised image by the optimized soft-thresholding function.

Figure 4.10: Comparison of PSNR of CMWT and SURE-LET [1] on Lena

Figure 4.11: Comparison of PSNR of CMWT and SURE-LET [1] on Goldhill

4.8 The Steps of the Proposed Denoising Method: CMWT

Here, we summarize the steps of our two-operation denoising method CMWT as follows:

1) Perform a 4-level DWT of the noisy image $y$ to produce its wavelet representation $Y$.

2) Estimate $Z$ from $Y$ using Equation (4.11), where the weight vector $w$ is calculated using Equation (4.12).

3) Estimate the standard deviation of the additive noise corrupting the noise-free image using Equation (4.18).

4) Form $Z_{new}$ from $Y$ and $Z$ as in Equation (4.13), where $A$ is obtained from Table 4.1 for the different subbands and noise levels.

5) Determine $a_1$ and $a_2$, and compute $\hat{X}$ as expressed in Equations (4.16) and (4.17), based on the corresponding values in Tables 4.3 and 4.4.

6) Invert the DWT of $\hat{X}$ to obtain the image $\hat{x}$.
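The steps above can be sketched as a runnable skeleton. This is a heavily simplified stand-in, not the actual CMWT: it uses a single-level 2-D Haar DWT instead of the 4-level transform, a crude 4-neighbor average in place of the context model of Equation (4.11)/(4.12), a single fixed $A$, and illustrative placeholder weights $a_1, a_2$ in place of step 5.

```python
import numpy as np

def haar_dwt2(img):
    a = (img[0::2, :] + img[1::2, :]) / np.sqrt(2)
    d = (img[0::2, :] - img[1::2, :]) / np.sqrt(2)
    ll = (a[:, 0::2] + a[:, 1::2]) / np.sqrt(2)
    lh = (a[:, 0::2] - a[:, 1::2]) / np.sqrt(2)
    hl = (d[:, 0::2] + d[:, 1::2]) / np.sqrt(2)
    hh = (d[:, 0::2] - d[:, 1::2]) / np.sqrt(2)
    return ll, lh, hl, hh

def haar_idwt2(ll, lh, hl, hh):
    a = np.empty((ll.shape[0], 2 * ll.shape[1])); d = np.empty_like(a)
    a[:, 0::2], a[:, 1::2] = (ll + lh) / np.sqrt(2), (ll - lh) / np.sqrt(2)
    d[:, 0::2], d[:, 1::2] = (hl + hh) / np.sqrt(2), (hl - hh) / np.sqrt(2)
    out = np.empty((2 * a.shape[0], a.shape[1]))
    out[0::2, :], out[1::2, :] = (a + d) / np.sqrt(2), (a - d) / np.sqrt(2)
    return out

def cmwt_sketch(noisy, A=0.7):
    ll, lh, hl, hh = haar_dwt2(noisy)                  # step 1
    sigma = np.median(np.abs(hh)) / 0.6745             # step 3 (Eq. 4.18)
    out = [ll]
    for band in (lh, hl, hh):
        z = 0.25 * (np.roll(band, 1, 0) + np.roll(band, -1, 0) +
                    np.roll(band, 1, 1) + np.roll(band, -1, 1))  # crude step 2
        znew = A * z + (1 - A) * band                  # step 4 (Eq. 4.13)
        a1, a2 = 1.0, -0.9                             # placeholder step 5
        out.append((a1 + a2 * np.exp(-znew ** 2 / (12 * sigma ** 2))) * znew)
    return haar_idwt2(*out)                            # step 6

rng = np.random.default_rng(7)
clean = np.outer(np.sin(np.linspace(0, np.pi, 64)), np.ones(64)) * 100
noisy = clean + 10.0 * rng.standard_normal(clean.shape)
den = cmwt_sketch(noisy)
assert np.mean((den - clean) ** 2) < np.mean((noisy - clean) ** 2)
```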

Figure 4.12: Comparison of PSNR of CMWT and SURE-LET [1] on Couple

Figure 4.13: Visual comparison on Image Lena. (a) Original image, (b) noisy image at noise level 20, (c) denoised image by SURE-LET [1], (d) denoised image by CMWT.

Figure 4.14: Visual comparison on Image Goldhill. (a) Original image, (b) noisy image at noise level 20, (c) denoised image by SURE-LET [1], (d) denoised image by CMWT.

Figure 4.15: Visual comparison on Image Couple. (a) Original image, (b) noisy image at noise level 20, (c) denoised image by SURE-LET [1], (d) denoised image by CMWT.

Table 4.6: Comparison of PSNR (dB) with the method SURE-LET (non-redundant)

Noise level        | 5     | 10    | 15    | 20    | 25    | 30    | 50    | 100
Input PSNR         | 34.15 | 28.13 | 24.61 | 22.11 | 20.17 | 18.59 | 14.15 | 8.13
Lena: SURE-LET     | 37.96 | 34.56 | 32.68 | 31.37 | 30.36 | 29.56 | 27.37 | 24.66
Lena: CMWT         | 38.92 | 35.42 | 33.50 | 31.89 | 31.16 | 30.32 | 28.17 | 25.32
Boat: SURE-LET     | 36.70 | 32.90 | 30.85 | 29.47 | 28.44 | 27.63 | 25.50 | 22.97
Boat: CMWT         | 37.14 | 33.65 | 31.70 | 30.28 | 29.23 | 28.50 | 26.40 | 24.10
Goldhill: SURE-LET | 36.53 | 32.69 | 30.76 | 29.52 | 28.60 | 27.89 | 26.06 | 23.82
Goldhill: CMWT     | 37.62 | 34.02 | 32.17 | 30.87 | 30.10 | 29.50 | 27.80 | 25.47
Barbara: SURE-LET  | 36.71 | 32.18 | 29.66 | 27.98 | 26.76 | 25.83 | 23.70 | 21.76
Barbara: CMWT      | 37.13 | 33.29 | 31.05 | 29.50 | 28.29 | 27.34 | 24.78 | 23.47
Couple: SURE-LET   | 36.68 | 32.62 | 30.44 | 29.02 | 27.99 | 27.19 | 25.13 | 22.79
Couple: CMWT       | 37.24 | 33.42 | 31.36 | 29.97 | 29.07 | 28.35 | 26.27 | 23.95
Man: SURE-LET      | 36.95 | 32.87 | 30.78 | 29.44 | 28.47 | 27.71 | 25.76 | 23.42
Man: CMWT          | 37.56 | 33.56 | 31.59 | 30.28 | 29.49 | 28.72 | 27.03 | 24.71

4.9 Summary

In this chapter, we have presented a new wavelet based approach to image denoising. Its core principle is to further decrease the upper bound, established by the soft-thresholding technique, on the error between the estimate and the original signal.

First, we presented a spatially adaptive, scale-based context modeling which improves the estimation of each individual wavelet coefficient. By taking advantage of the parent-child relationships and the correlations between neighboring coefficients, we grouped coefficients with similar statistics and optimized their weights in the least squared error sense. In this way, we suppressed the noise in smooth regions and, to some extent, sharpened the textures and edges, thus constructing a less noisy version of the input image prior to the next, crucial denoising operation.

Then, a pointwise optimized soft-thresholding strategy was introduced. It is a linear combination of a series of denoising functions in the form of derivatives of Gaussian. In contrast to the classic soft-thresholding function, its shape is more flexible and the thresholding values are less dependent on a single variable and more data adaptive. Though the analytical solution for the optimal parameter values of the denoising function can hardly be obtained mathematically, our workaround is to utilize an existing closed-form solution and adaptively add offset values which are obtained experimentally.

We have theoretically shown the principle of our denoising method CMWT and presented the experimental results of both denoising operations, as well as of their combination. CMWT achieves better quantitative results than applying either operation independently. The method also yields a significant improvement over the state-of-the-art denoising approach SURE-LET within the framework of the orthogonal wavelet transform.

Chapter 5

Expansion to the Overcomplete Representation

In this chapter, we extend the principle of our Gaussian denoising method to the overcomplete representation of the standard wavelet transform. Though its main drawbacks, a larger memory requirement and more computational time, are inevitable, the motivation for this study originates from the substantial denoising performance improvements that the overcomplete representation accomplishes.

5.1 Overview of Applying Overcomplete Expansion

One may often observe pseudo-Gibbs phenomena in the areas of edge and ridge discontinuities in images after standard wavelet denoising. These disturbing visual artifacts are generally caused by shift variance, an intrinsic drawback of the DWT. This drawback is brought in by the use of down-sampling in the DWT, in which every other wavelet coefficient at each decomposition level is discarded. In fact, as long as not all wavelet coefficients are used to perform the inverse DWT, shift variance is introduced into the reconstructed signal [65].

81 According to Coifman and Donoho’s experiment [67], the actual positions of the image discontinuities play a significant role for the size of these artifacts. Since the artifacts are highly related to the alignment between the features in the signal and the features of basis wavelet applied, it is intuitively understandable that similar signals only with different alignment may generate fewer artifacts after being wavelet denoised.

The reason is also easy to comprehend. In the standard discrete wavelet transform, down-sampling of the input signal by a factor of two retains the even-indexed coefficients but discards the odd-indexed ones. If we shift the input signal by one position, the output signal is shifted by one position as well; in this case, after down-sampling, the odd-indexed coefficients are kept while the even-indexed coefficients are abandoned, so the shift simply introduces more useful information for further investigation.

The overcomplete expansion procedure is originally derived from Coifman and Donoho’s translation-invariant denoising [67], in which a general routine called “cycle spinning” is proposed. By carrying out this procedure, we can shift the signal to change the positions of its features, so that the artifacts produced in the denoised image can be diminished by re-aligning the given noisy image; shift invariance can then be achieved. For 2D signals, we can achieve translation invariance by shifting the pixels of columns and/or rows. Suppose we shift the columns S−1 times, so that we have gained S redundant representations of the input image. The denoising procedure is then operated over all S representations, and the final estimate is obtained by averaging these S representations after inverse circular shifting each of them. Generally speaking, the overcomplete representation improves the image quality by suppressing pseudo-Gibbs phenomena.

5.2 Overcomplete Expansion Procedure

In this section, we provide a detailed procedure of conducting the overcomplete expan- sion.

1. Shift columns or rows of the input image y to generate a redundant set of S representations of it, namely {y(1), y(2), ..., y(S)}. For example, y(i) is created by circularly shifting each column by i positions.

2. For each image y(i) in the redundant set:

1) Apply the same denoising method CMWT of Chapter 4 to obtain the shifted estimated image y′(i);

2) Inverse-circularly shift each image y′(i) to obtain image y*(i).

3. Average all images y*(i) (i from 1 to S) to obtain the final estimated image x̂.

We should note that complete shift invariance can be achieved if all cyclic shifts are practiced. However, since this procedure is relatively time demanding, it is not necessary to shift 512 times to obtain every shifted representation: Figures 5.1 and 5.2 indicate that only a small number of shifts, for example 20∼30, is sufficient in practice.
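The three-step cycle-spinning routine above can be sketched as follows; `denoise` stands in for the CMWT routine of Chapter 4 (not reproduced here), so the usage example below substitutes a trivial identity placeholder.

```python
import numpy as np

def cycle_spin_denoise(y, denoise, n_shifts=25):
    """Overcomplete-expansion denoising by cycle spinning.

    y        : 2D noisy image
    denoise  : any single-image denoiser (CMWT in the dissertation)
    n_shifts : number of circular shifts (20-30 suffices in practice)
    """
    acc = np.zeros_like(y, dtype=float)
    for i in range(n_shifts):
        shifted = np.roll(y, i, axis=0)      # step 1: circularly shift each column by i
        den = denoise(shifted)               # step 2a: denoise the shifted copy
        acc += np.roll(den, -i, axis=0)      # step 2b: inverse circular shift
    return acc / n_shifts                    # step 3: average all S estimates

# usage with a trivial placeholder denoiser (identity):
img = np.random.rand(8, 8)
out = cycle_spin_denoise(img, lambda z: z, n_shifts=4)
assert np.allclose(out, img)  # identity denoiser: averaging recovers the input
```

With a real denoiser, each shifted copy presents differently aligned features to the wavelet basis, and averaging the inversely shifted estimates suppresses the pseudo-Gibbs artifacts.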

5.3 Experimental Results for Overcomplete Expansion

In order to demonstrate the denoising efficiency of CMWT when applying overcomplete expansion, termed CMWT-OE, we tested on the six widely used images shown in Figure 4.4 using BLS GSM [3], UWT SURE-LET [62], BM3D [92], and our proposed method, and show the denoising results for the Lena, Goldhill, and Couple images here in Figures

5.3, 5.4, and 5.5. Based on the conclusions from Figures 5.1 and 5.2, where the horizontal axis denotes the number of shifts and the vertical axis denotes the PSNR in dB, we set

Figure 5.1: Relation between shifted times and PSNR of the denoised Image Lena

Figure 5.2: Relation between shifted times and PSNR of the denoised Image Goldhill

the number of shifts to 25. We notice that for all three images, the proposed method impressively outperforms the other most efficient available methods, especially for Image Goldhill.

We also list the PSNR results in Table 5.1 for the six widely used images (see Figure 4.4) tested under different state-of-the-art denoising methods. The parameters of each

Figure 5.3: Comparison of PSNR (dB) with 3 other most efficient methods on Lena

Figure 5.4: Comparison of PSNR (dB) with 3 other most efficient methods on Goldhill

method have been set to their optimal values in accordance with the corresponding papers, thanks to the generosity of their authors.

Two sets of images (Lena and Boat) are presented for visual comparison; please refer to Figure 5.6 and Figure 5.7. Important features such as edges and textures

Figure 5.5: Comparison of PSNR (dB) with 3 other most efficient methods on Couple

are significantly preserved while very few visual artifacts are exhibited. For other results, we provide an executable software package which can be downloaded and run at http://secs.ceas.uc.edu/~aicv/Subpages/demo.html. Examining these experimental results, we note that our denoising method, when compared to the two other best wavelet based methods, SURE-LET and BLS GSM, has

0.5∼1 dB improvements in PSNR on average for all noise levels tested, with the only exception of Image Barbara, in which case our wavelet denoising method CMWT-OE is no better than BLS GSM. One intuitive explanation, similar to that indicated in [1], is that to some extent crucial detailed recurrent textures such as Barbara’s pants occupy a very tight frequency band that may be present only at a certain resolution, so these textures are not retained at coarser scales; thus the context modeling and the parent-children coefficient correlations may not be effective for this image. Please refer to Figure 5.8 for a visual explanation. From the figure, we can observe that most of the stripes on Barbara’s pants are visible in the first-scale wavelet domain image, but they become less and less visible as the scale becomes

Figure 5.6: The Image Lena (a) Noise-free image, (b) Noisy image with noise level of 50, (c) Denoised image using SURE-LET [62], (d) Denoised image using BLS GSM [3], (e) Denoised image using BM3D [92], (f) Denoised image using CMWT-OE.

Figure 5.7: The Image Boat (a) Noise-free image, (b) Noisy image with noise level of 50, (c) Denoised image using SURE-LET [62], (d) Denoised image using BLS GSM [3], (e) Denoised image using BM3D [92], (f) Denoised image using CMWT-OE.

Figure 5.8: Wavelet domain images of Barbara at: (a) the first scale (b) the second scale (scaled up by two) (c) the third scale (scaled up by four).

coarser, which means that much useful information has been lost.

Compared with the best non-wavelet based method BM3D [92], CMWT-OE is 0.3∼1 dB better on average in most cases, again with the exception of Image Barbara. This impressive result is a major motivation for this research work.

Table 5.1: Comparison of several most famous thresholding denoising methods (PSNR)

Noise σ         5      10      15      20      25      30      50     100
Input PSNR  34.15   28.13   24.61   22.11   20.17   18.59   14.15    8.13

Lena
  SURE-LET  38.25   35.08   33.31   32.06   31.10   30.33   28.22   25.57
  BLS-GSM   38.49   35.61   33.90   32.66   31.69   30.46   28.61   25.64
  BM3D      38.72   35.93   34.27   33.05   32.08   31.26   28.86   25.57
  CMWT-OE   39.36   36.10   34.08   32.65   31.81   30.98   28.87   25.95

Boat
  SURE-LET  37.13   33.53   31.57   30.22   29.20   28.39   26.20   23.61
  BLS-GSM   36.97   33.58   31.70   30.38   29.37   28.56   26.35   23.75
  BM3D      37.28   33.92   32.14   30.88   29.91   29.12   26.64   23.74
  CMWT-OE   37.91   34.50   32.40   30.90   29.90   29.12   27.19   24.63

Goldhill
  SURE-LET  36.85   33.20   31.37   30.17   29.30   28.61   26.83   24.69
  BLS-GSM   37.00   33.38   31.53   30.32   29.42   28.72   26.87   24.63
  BM3D      37.14   33.62   31.86   30.72   29.85   29.16   27.08   24.45
  CMWT-OE   38.04   34.57   32.60   31.32   30.55   30.02   28.54   26.18

Barbara
  SURE-LET  36.98   32.65   30.16   28.45   27.18   26.23   24.13   22.26
  BLS-GSM   37.79   34.03   31.86   30.32   29.13   28.15   25.48   22.61
  BM3D      38.31   34.98   33.11   31.78   30.72   29.81   27.17   23.49
  CMWT-OE   36.60   33.74   31.75   30.09   28.59   27.63   25.15   23.54

Couple
  SURE-LET  37.15   33.40   31.36   29.97   28.93   28.10   25.94   23.45
  BLS-GSM   37.01   33.46   31.44   30.01   28.93   28.06   25.78   23.12
  BM3D      37.52   34.04   32.11   30.76   29.72   28.87   26.38   23.37
  CMWT-OE   37.91   34.20   32.00   30.57   29.72   29.02   26.87   24.75

Man
  SURE-LET  37.28   33.42   31.40   30.07   29.10   28.35   26.38   24.05
  BLS-GSM   37.44   33.56   31.49   30.13   29.14   28.37   26.35   23.82
  BM3D      37.82   33.98   31.93   30.59   29.62   28.86   26.59   23.97
  CMWT-OE   38.12   34.14   32.16   30.84   30.03   29.46   27.72   25.43

5.4 Summary

In this chapter, we have presented a further development of our image denoising method CMWT of Chapter 4 under overcomplete expansion. Although overcomplete expansion is obviously inefficient for coding purposes, it has proven to be a superior tool for denoising due to its property of shift invariance.

We circularly shifted the input image, operated the same denoising algorithm on the shifted versions, and shifted the results back. By averaging all the processed images, considerably better objective performance can be achieved. Equipped with this simple yet very efficient technique, the visual artifacts in the denoised images are substantially eliminated, and our aforementioned denoising method becomes very competitive with state-of-the-art image denoising algorithms.

Part II

Image Denoising for Poisson Noise

Chapter 6

Background on Poisson Denoising

In Part I, we tackled the removal of the most widely assumed noise: additive white Gaussian noise (AWGN). However, in many real applications such as biomedical and astronomical research, images are generally contaminated by shot noise, which is strongly signal-dependent. Thus, in Part II, we opt for a non-additive and non-Gaussian model, in particular the Poisson model, to describe the statistics of the noise, which is necessary to guarantee successful image restoration in such situations.

6.1 Modeling of Low Intensity Images

The intensity of a pixel in an observed image is approximately proportional to the number of photons arriving at that pixel. These photons are collected by a detection device such as a CCD. Noise is invariably introduced by the detection device itself due to its finite precision. In addition, noise can also be generated as photons travel from the object to the detection device.

Thus, it is necessary to employ image restoration approaches so that the noise in the obtained image can be suppressed, providing an improved observation of the object of interest.

Meanwhile, low intensity images, in which relatively few photons are observed, are extremely common in biomedical and astronomical research. In these situations, the photon counts arriving at the detection device are very limited.

However, many well-established image restoration methods designed for additive white Gaussian noise (AWGN) expectedly become unsuitable, because their models are appropriate only when the number of photon counts per pixel is relatively large. A different assumption is much better suited to such cases: the observed image can be considered a realization of a Poisson process, and the photon count of each pixel can be modeled as Poisson distributed. Thus, a photon-limited image can be modeled as a 2D matrix of Poisson variables. Poisson noise has a multiplicative aspect: the more intense the signal, the larger the fluctuations of the noise. Moreover, the modeling of a Poisson process is very different from that of a Gaussian process, in that for Gaussian models the variance of the noise is stationary, whereas the variance of Poisson noise is non-stationary throughout the image and the magnitude of the noise depends on the pixel intensity we want to restore. Removing noise of this type is a more difficult problem. In the next sections, we introduce some of the important and efficient approaches for Poisson denoising.

6.2 Poisson Noise Model

The detection of an individual photon can be regarded as an independent random event, and photon counting is a classic Poisson process. Thus, we suppose $y = (y_i)_{i \in \mathbb{Z}^2}$ is an obtained image in which each pixel intensity $y_i$ can be modeled as a Poisson random variable with the probability density function

$$P_{x_i}(y_i) = \frac{e^{-x_i}\, x_i^{y_i}}{y_i!}, \qquad y_i \ge 0 \qquad (6.1)$$

where the Poisson parameter $x_i$ is not only the mean value of $y_i$ but also equals its variance $\sigma_i^2$. This indicates that the noise is signal dependent and that its standard deviation increases in proportion to the square root of the signal. Note that for large photon counts, by the central limit theorem, a Poisson distribution approaches a Gaussian distribution.
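The mean-variance property above can be checked numerically; the parameter value 17 and the sample size below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(0)
x = 17.0                              # underlying Poisson parameter (illustrative)
y = rng.poisson(x, size=1_000_000)    # noisy observations of that intensity

# For Poisson data the mean and the variance both equal the parameter x,
# so the noise standard deviation grows like sqrt(x).
print(y.mean(), y.var())              # both close to 17
assert abs(y.mean() - x) < 0.1
assert abs(y.var() - x) < 0.2
```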

We assume that the mean value $x_i$ of each observed pixel intensity $y_i$ is its corresponding pixel intensity in the clean image, and the variability around the mean can be interpreted as noise. Therefore, our goal is to restore the original clean image $x = (x_i)_{i \in \mathbb{Z}^2}$ by searching for an estimate $\hat{x} = (\hat{x}_i)_{i \in \mathbb{Z}^2}$ which is as close as possible to $x$ given the observed noisy image $y$.

Usually, the closeness between the estimate and the original image is measured in terms of the mean square error

$$\mathrm{MSE} = \frac{1}{N}\|\hat{x} - x\|^2 = \frac{1}{N}\sum_{i=1}^{N}(\hat{x}_i - x_i)^2 \qquad (6.2)$$

where $N$ is the total number of pixels in the image.

The denoising problem can also be stated as estimating the underlying mean value $x = (x_i)_{i \in \mathbb{Z}^2}$ of each pixel from a realization of the Poisson process.

6.3 Related Work

A variety of Poisson denoising methods have been proposed in the literature. The major contributions are as follows.

6.3.1 Variance Stabilization

Variance stabilization is a simple, intuitive, yet widespread procedure for Poisson denoising in practice, originally proposed by Donoho in [110]. The core principle of this method is to transform the Poisson variables into Gaussian variables so that existing denoising algorithms aimed at AWGN can be applied. It has three main steps, detailed as follows.

1) a variance stabilizing transformation (VST) is executed on the obtained Poisson noisy image so that the noise variance is approximately stabilized through the whole image;

2) an existing AWGN denoising algorithm is then applied on this transformed image;

3) an inverse transformation is used on this transformed and processed image resulting in the final recovered image. This procedure is illustrated in Figure 6.1.

Figure 6.1: General Poisson noisy image denoising procedure
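The three steps can be sketched generically; `gaussian_denoise` is a placeholder for any AWGN denoiser, and the Anscombe pair (detailed in Section 6.4) serves as the example VST.

```python
import numpy as np

def anscombe(y):                 # example VST (Anscombe): stabilizes variance to ~1
    return 2.0 * np.sqrt(y + 3.0 / 8.0)

def anscombe_inv(Y):             # arithmetical inverse of the example VST
    return (Y / 2.0) ** 2 - 3.0 / 8.0

def vst_denoise(y, gaussian_denoise):
    """Generic 3-step VST pipeline: stabilize -> AWGN denoise -> invert."""
    Y = anscombe(y)              # 1) variance stabilizing transformation
    X = gaussian_denoise(Y)      # 2) any existing AWGN denoising algorithm
    return anscombe_inv(X)       # 3) inverse transformation to recover intensities

# usage with an identity "denoiser": the pipeline reduces to a VST round trip
y = np.arange(10, dtype=float)
x_hat = vst_denoise(y, lambda Y: Y)
assert np.allclose(x_hat, y)
```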

Since the advent of the most famous and popular VST by Anscombe [111], which will be introduced in detail in the next section, many other meaningful VSTs have been proposed. Among them, the most significant examples are the Haar-Fisz transform [112], which combines the Fisz transform [113] with the Haar transform, and an advanced redundant multiscale-representation-based VST (MS-VST) [114] proposed by Zhang et al. Here we give a brief introduction to both of them.

• Haar-Fisz Transform

Suppose we have a signal $x = (x_i)$, $i \in [0, N-1]$, with $x_i \ge 0\ \forall i$, where $N = 2^J$. The Haar-Fisz transform is defined as follows.

1. Perform the Haar discrete wavelet transform (DWT) on $x$ and, for each obtained smooth subband $s^j$ and detail subband $d^j$ of $x$ at scale $2^j$, immediately calculate a new coefficient $f_i^j$ defined as

$$f_i^j = \begin{cases} 0 & \text{if } s_i^j = 0, \\ d_i^j / \sqrt{s_i^j} & \text{otherwise.} \end{cases} \qquad (6.3)$$

2. Apply the inverse Haar DWT to the modified transform $(x^J, f^J, f^{J-1}, \dots, f^1)$ to produce a new signal $\tilde{x}$. There is a one-to-one mapping between the original signal $x$ and the transformed signal $\tilde{x}$, denoted by

$$\tilde{x} = \Phi x \qquad (6.4)$$

where $\Phi$ defines the Haar-Fisz transform.

By applying the Haar-Fisz transform, a signal of Poisson counts is converted to an approximately Gaussian distributed signal with variance one. This transform is fast and simple, and when applied to estimate the intensity of an inhomogeneous Poisson process it yields very appealing performance. It is also potentially applicable to other noise distributions such as the binomial, Gamma, and negative binomial.
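A compact sketch of the forward Haar-Fisz transform, assuming the unnormalized Haar filters s = (a+b)/2, d = (a−b)/2 and the Fisz coefficient d/√s used in Fryzlewicz and Nason's algorithm [112]; the function name and signal parameters are illustrative.

```python
import numpy as np

def haar_fisz(x):
    """Forward Haar-Fisz transform of a length-2^J nonnegative signal (a sketch)."""
    c = np.asarray(x, dtype=float).copy()
    n = len(c)
    fs = []
    while n > 1:
        a, b = c[0:n:2].copy(), c[1:n:2].copy()
        s = (a + b) / 2.0                       # smooth coefficients
        d = (a - b) / 2.0                       # detail coefficients
        # Fisz coefficient: d / sqrt(s) where s != 0, else 0
        f = np.where(s > 0, d / np.sqrt(np.maximum(s, 1e-300)), 0.0)
        fs.append(f)
        c[: n // 2] = s
        n //= 2
    # inverse Haar DWT using the modified coefficients f
    for f in reversed(fs):
        m = len(f)
        s = c[:m].copy()
        c[0:2 * m:2] = s + f
        c[1:2 * m:2] = s - f
    return c

# a constant signal has zero details and is returned unchanged
assert np.allclose(haar_fisz(np.full(16, 3.0)), 3.0)

# Poisson counts around intensity 20 become roughly unit-variance Gaussian
rng = np.random.default_rng(3)
hx = haar_fisz(rng.poisson(20.0, size=4096))
print(hx.mean(), hx.var())   # mean near 20, variance near 1
```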

• MS-VST

In [114], Zhang et al. suppose that performance for low-intensity images would be improved if a well-designed lowpass filter were used to preprocess the input image.

To devise the MS-VST, they set $Y$ as the filtered Poisson image and $h$ as the impulse response of the filter, and define $\tau_k = \sum_i (h[i])^k$ for $k = 1, 2, \dots$. They also assume that the image is locally homogeneous, i.e., $\lambda_{j-i} = \lambda$ for all $i$ within the support of $h$.

The new VST is given by [114]

$$T(Y) \triangleq b \cdot \mathrm{sign}(Y + c)\sqrt{|Y + c|} \qquad (6.5)$$

where $\mathrm{sign}(\cdot)$ is the sign function. The constants $b$ and $c$ are defined as

$$b \triangleq 2\sqrt{|\tau_1|/\tau_2} \quad \text{and} \quad c \triangleq \frac{7\tau_2}{8\tau_1} - \frac{\tau_3}{2\tau_2}.$$

It is proven that

$$b\sqrt{Y + c} - b \cdot \mathrm{sign}(\tau_1)\sqrt{|\tau_1|\lambda} \;\xrightarrow[\lambda \to +\infty]{D}\; \mathcal{N}(0, 1). \qquad (6.6)$$

With this MS-VST, they incorporate multi-resolution transforms such as ridgelets and curvelets so as to guarantee efficient recovery of important structures in the images. Their algorithm especially targets very low intensity signals and yields very good performance.

Other effective VSTs used for reducing the signal dependence include the models proposed in [115], [116]. It is also worth noting that, more recently, Mäkitalo and Foi [7] introduced new inversions, including a maximum likelihood inversion and a minimum mean square error inversion, for the commonly used VST, the Anscombe transformation [111]. They combined these inversions with the BM3D technique, a state-of-the-art denoiser for AWGN, and achieved consistently improved performance.

6.3.2 Hypothesis Testing

This Poisson denoising framework was coined by Kolaczyk in [117], where he first introduced a thresholding strategy in the Haar transform domain. He derived the hypothesis testing approach, which controls a user-calibrated false positive rate, from the Gaussian case and proposed level-dependent thresholds to handle Poisson statistics.

In addition, Zhang et al. [118] extended the hypothesis testing framework to the decimated biorthogonal Haar transform instead of the classic Haar transform. For a low-intensity setting, they showed that the upper bound of the p-values is substantially decreased.

Their approach ensures a smooth estimate from the biorthogonal Haar filter bank while maintaining a low computational complexity.

Moreover, Kolaczyk developed “corrected” soft and hard thresholds for Poisson intensities in [119]. This method serves as an adaptation of the usual Gaussian-based universal thresholds designed for AWGN [32] combined with a test of hypothesis. However, the approximated thresholds may fail to remain effective in low-count situations.

6.3.3 Wavelet Filtering

In this part, we provide a brief introduction of some classical wavelet based Poisson denoising approaches proposed in the literature.

Nowak and Baraniuk [120] designed an optimal, data-adaptive filter in the wavelet domain using the method of cross-validation. Unlike conventional linear filters, which tend to produce large bias errors, their filtering strategy favors lower bias over variance reduction.

Another wavelet filtering approach with a wider scope of application was proposed by Antoniadis and Sapatinas [121]. They applied the idea of pointwise modulation estimator originally designed for Gaussian noise and selected an optimal modulation function by minimizing an appropriate risk. Recently, in [6], Luisier et al. proposed a Poisson denoising algorithm PURE-LET based on an unnormalized transform and the minimization of an unbiased estimate of the MSE for Poisson noise called “Poisson’s unbiased risk estimate” (PURE). This signal dependent technique contains a fast interscale thresholding strategy on wavelet coefficients. It has been validated that this approach is very competitive in terms of denoising performance and computational complexity.

99 6.3.4 Bayesian Based Approach

Bayesian approaches for Poisson estimation are mainly investigated in [122–126]. In

[122], Kolaczyk proposed a class of Bayesian multiscale models by using recursive dyadic partitions within an entirely likelihood based framework. In this model, the prior distribution of the multiscale parameter yj,k is given by a mixture of a point mass at 1/2 and a symmetric beta distribution as follows

$$y_{j,k} \mid \gamma_{j,k}, B_{j,k} \sim \gamma_{j,k}\,\frac{1}{2} + (1 - \gamma_{j,k})\,B_{j,k}, \qquad (6.7)$$

$$\gamma_{j,k} \mid p_j \sim \mathrm{Bernoulli}(p_j), \qquad (6.8)$$

and

$$B_{j,k} \mid a_j \sim \mathrm{beta}(a_j, a_j). \qquad (6.9)$$

The final posterior mean estimate of the underlying intensity is calculated by incorpo- rating a multiscale factorization of the posterior distribution into a translation invariant framework.

Timmermann and Nowak [123] constructed a multiscale Bayesian model of the Poisson noisy image. They specify more general beta mixtures for the multiscale parameters, defining the mixed priors by

$$f(y_{j,k}) = \sum_{i=1}^{M} p_i\, \frac{y_{j,k}^{\,s_i - 1}(1 - y_{j,k})^{\,s_i - 1}}{B(s_i, s_i)}, \qquad 0 \le y_{j,k} \le 1 \qquad (6.10)$$

where $M$ is the number of components, $B(a, b)$ is the standard beta function, and $0 \le p_i \le 1$ denotes the a priori probability of the $i$-th component, with $\sum_{i=1}^{M} p_i = 1$.

They show that with these mixture priors, a closed-form expression for the posterior mean of the underlying intensity, as the optimal estimate, can be derived. They also found that three components for the prior beta-mixture model suffice for many applications in practice and suggested $s_1 = 1$, $s_2 = 100$, $s_3 = 10000$, the weight $p_1 = 0.001$, with approximate values for $p_2$ and $p_3$ based on the moments.

Besides, in [124], Nowak and Kolaczyk applied the Haar transform and derived a novel multiscale Bayesian prior to model intensity functions under squared error loss.

They also characterize the correlation behavior of the new prior and show that it has spectral characteristics. In [125], Sardy et al. formulated the Poisson denoising problem under a penalized maximum likelihood framework combined with arbitrary multiscale transformations. In [126], Willett and Nowak employed a Poisson intensity estima- tion approach involving a platelet-based penalized likelihood estimation of a piecewise polynomial on recursive dyadic partitions of the support of the Poisson intensity. Its estimator does not require any a priori knowledge of the clean signal’s smoothness.

We note that many denoising methods are highly hybrid by involving different techniques, so that the categorization above is by no means strict.

6.4 Anscombe Transformation and Its Inversions

The main obstacle for many existing Gaussian denoising algorithms to be directly applied to noisy photon-limited images is that they cannot model the variance of the Poisson noise as non-stationary and dependent on the underlying intensity. To make this more concrete, Willett takes an extreme case as an example in [127]. Imagine a high-resolution imaging system in which every observation is either zero or one. An image obtained by this system contains only binary values and therefore has very non-Gaussian statistics. This leads to the conclusion that traditional wavelet denoising algorithms developed under Gaussian noise assumptions perform suboptimally and produce unnecessary visual artifacts.

Thus, several VST methods such as those in [112, 114] are used to remove the dependence of the noise variance on the underlying data. Among them, we choose the classic Anscombe transformation [111], since it is still frequently used and considered a useful tool due to its efficiency and simplicity. Here, we give an introduction to it.

In [111], Anscombe considered a transformation of the form

$$y = \sqrt{r + c} \qquad (6.11)$$

where $r$ is a Poisson variable.

It was further theoretically proven in [111] that

$$\operatorname{var}(y) \sim \frac{1}{4}\left(1 + \frac{3 - 8c}{8m} + \frac{32c^2 - 52c + 17}{32m^2}\right) \qquad (6.12)$$

where $m$ is the mean of the Poisson variable $r$, so that when $c = \frac{3}{8}$ we have

$$\operatorname{var}(y) \sim \frac{1}{4}\left(1 + \frac{1}{16m^2}\right). \qquad (6.13)$$

If the expectation of $y$ is defined as

$$E(y) = \sqrt{m_y + c} \qquad (6.14)$$

then $m_y$ is the estimate of $m$ derived from Equation (6.11) by taking the arithmetic mean of a large sample of observed values of $y$. Thus,

$$m_y \sim m - \frac{1}{4} + \frac{8c - 3}{32m}. \qquad (6.15)$$

Therefore, if we set $c = \frac{3}{8}$, the bias $m_y - m$ in $m_y$ is nearly constant and the transformed $y$ in Equation (6.11) has an approximately constant variance equal to $1/4$.

Next, we rewrite the expression as follows:

$$Y_i = \mathrm{T}(y_i) = 2\sqrt{y_i + \frac{3}{8}} \qquad (6.16)$$

where $y_i$ is the observed intensity value of the Poisson noisy image and $Y_i$ is the transformed intensity value. Multiplying by a factor of 2 gives the transformed $Y_i$ a variance approximately equal to one. From now on, we use uppercase letters to represent the corresponding transformed data.

After the Anscombe transformation T, the noise throughout the whole image is approximately Gaussian distributed with mean 0 and variance $\sigma^2 = 1$; its variance is thus assumed to be stationary.
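A quick numerical check of this stabilization; the intensities and sample size below are arbitrary illustrations.

```python
import numpy as np

rng = np.random.default_rng(1)

# Anscombe transformation (6.16): Y = 2 * sqrt(y + 3/8)
for x in (10, 30, 100):                   # different underlying intensities
    y = rng.poisson(x, size=500_000)
    Y = 2.0 * np.sqrt(y + 3.0 / 8.0)
    # the raw variance grows with x, the transformed variance stays near 1
    print(x, y.var(), Y.var())
    assert abs(Y.var() - 1.0) < 0.05
```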

We suppose that a promising denoising operation is available which provides a successful transformed estimate $\hat{X}$ based on the observed $y$. In practice, after the denoising operation is performed, it is necessary to apply an inverse transformation in order to obtain the final estimate $\hat{x}$ of the original data. The arithmetical inverse Anscombe transformation $f_1$ is naturally derived as

$$\hat{x}_i = f_1(\hat{X}_i) = \mathrm{T}^{-1}(\hat{X}_i) = \left(\frac{\hat{X}_i}{2}\right)^2 - \frac{3}{8} \qquad (6.17)$$

Though very simple, we emphasize that this inverse transformation is not competent when photon counts are very small, since the resulting estimate $\hat{x}$ inevitably contains biased errors due to the nonlinearity of the forward Anscombe transformation T, so that we have [7]

$$E\{\mathrm{T}(y) \mid x\} \neq \mathrm{T}(E\{y \mid x\}) \qquad (6.18)$$

and thus

$$\mathrm{T}^{-1}(E\{\mathrm{T}(y) \mid x\}) \neq E\{y \mid x\}. \qquad (6.19)$$

Meanwhile, it is worth noting that we can also choose an alternative to the arithmetical inverse Anscombe transformation which relatively mitigates the biased error for smaller-valued Poisson parameters. It is called the asymptotical inverse Anscombe transformation $f_2$, and its expression is [111]

$$\hat{x}_i = f_2(\hat{X}_i) = \left(\frac{\hat{X}_i}{2}\right)^2 - \frac{1}{8} \qquad (6.20)$$

Nevertheless, the main drawback of this inverse transformation is similar to that of the arithmetical one: its performance on very low values is unsatisfactory. Thus, in order to minimize the bias error in low-intensity images, in the next chapter we propose a more precise inverse transformation.
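The low-count behavior of the two inversions can be probed numerically; an ideal denoiser is emulated by the large-sample mean of the transformed data, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(2)

def T(y):  return 2.0 * np.sqrt(y + 3.0 / 8.0)    # Anscombe forward (6.16)
def f1(X): return (X / 2.0) ** 2 - 3.0 / 8.0      # arithmetical inverse (6.17)
def f2(X): return (X / 2.0) ** 2 - 1.0 / 8.0      # asymptotical inverse (6.20)

for x in (1, 3, 10, 30):
    # an ideal denoiser would return E{T(y)}; estimate it by a large sample mean
    X = T(rng.poisson(x, size=500_000)).mean()
    # f1 noticeably underestimates x at low counts; f2 mitigates the bias
    # but both drift for very small x
    print(x, f1(X), f2(X))
```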

6.5 Summary

In this chapter, we first discussed the rationale of modeling low intensity images as

Poisson distributed. Then a general Poisson noise model was provided, with an interpretation of the denoising goal for Poisson noisy images according to the characteristics of Poisson variables. In Section 6.3, we contributed an in-depth literature survey of important Poisson denoising algorithms, including the framework of the variance stabilizing transformation (VST), which will be exploited further in the next chapters.

Finally, in Section 6.4, the most frequently used VST, the Anscombe transformation, and its conventional inverse transformations were mathematically explained. The Anscombe transformation is applied to Poisson noisy images to reduce the dependence of the noise on the signal.

Chapter 7

The Proposed Poisson Denoising

Method: CMWT-IAT

Since a Poisson distributed signal is much more difficult to handle in the wavelet domain than a Gaussian distributed signal due to its signal-dependent nature, we use a VST to approximately convert a Poisson noisy image into a Gaussian distributed one, so that the denoising methods aimed at AWGN can be applied subsequently. Without changing the classic Anscombe transformation, in this chapter we propose a new inversion for it. The motivation for the new inversion originates from a main drawback of the conventional VSTs: their efficiency degrades significantly when the pixel intensities of the observed images are very low, due to the biased errors generated by the inverse transformation. We then develop a new Poisson denoising method called CMWT-IAT by combining the proposed inversion with our previous denoising method CMWT.

Some parts in this chapter are based on our published paper [128].

7.1 Investigation of the Biased Errors

As mentioned previously in Section 6.4, the efficiency of either the arithmetical or the asymptotical inverse Anscombe transformation generally holds only under the assumption that the underlying mean value is large enough; their reliability is invalid for Poisson variables with relatively small mean values, and it is well known that in such cases their performance deteriorates quickly. Thus, we are basically interested in getting a quantitative idea of how large the biased errors become after the forward transformation when the underlying Poisson variables are small.

Accordingly, for each integer i from 1 to 255, we generate a data set consisting of a very large number of Poisson variables with i being the underlying Poisson parameter.

We then calculate the variance of this data set for each i. The results are shown in Figure 7.1, from which we observe the Poisson property that the variance is approximately equal to the mean value. Then we apply the Anscombe transformation of Equation (6.16) to all values in these data sets, calculate the variance of each transformed data set, and illustrate the results in Figure 7.2. The figure reveals that the variances are almost equal to 1, with little dependence on the mean value i throughout the entire range of underlying Poisson parameters and only negligible oscillation.

We then explore the biased errors between the transformed parameters and their estimated means, shown in Figure 7.3. The x axis denotes the Poisson parameters, and the y axis denotes the mean values estimated from the Anscombe-transformed Poisson data for each parameter i, each divided by the mean value of the original Poisson parameter. From the figure it is clearly seen that, from a practical standpoint, both values are almost identical when the value of the Poisson parameter is no less than 30. Thus, it is reasonable to consider that after applying the Anscombe transformation, the Poisson variables are unbiased as long as

Figure 7.1: Variance of Poisson distributed data sets

Figure 7.2: Variance of transformed data sets

the values are no less than 30. This indicates that both the arithmetical and asymptotical inverse transformations still work effectively in such a situation, so that we can apply either of them directly.

Figure 7.3: Biased errors between Poisson parameters and estimated means

7.2 New Inverse Transformation for Anscombe Transformation

In Section 7.1, we investigated the biased errors generated by the conventional Anscombe transformations. We suppose that for Poisson parameters with values less than 30 the resulting values are severely underestimated, and this underestimation is especially severe when they are below 10. Consequently, our main focus at this stage is to develop a solution applicable to the inversion of Poisson parameters with relatively small values. Thus, it is intuitive to compensate the denoised results before the arithmetical inversion by adaptively dividing each obtained Poisson estimate by a predefined factor. This factor can be derived by fitting the curve in

Figure 7.3 in a least squares sense using the polynomial regression model

$$b_i = \mathrm{P}(a_i) = p_1 a_i^{n} + p_2 a_i^{n-1} + \dots + p_n a_i + p_{n+1} \qquad (7.1)$$

where $n + 1$ is called the order of the polynomial and $n$ the degree of the polynomial.

In order to calculate the estimated parameters of the polynomial, the following weighted sum of squares must be minimized:

$$\text{minimize } S = \sum_{i=1}^{K} w_i r_i^2 = \sum_{i=1}^{K} w_i (b_i - \hat{b}_i)^2 \qquad (7.2)$$

where the residual $r_i$ is defined as the difference between the measured value $b_i$ and the fitted value $\hat{b}_i$; $K$ is the number of data points provided in the fit; and $w_i$ is the weight, which determines how much each corresponding value influences the final estimate. For simplicity, we set the weights as

$$w_i = \frac{1}{\sigma_i^2} \qquad (7.3)$$

where each $\sigma_i^2$ is derived from the variance of the transformed Poisson data shown in Figure 7.2.

When fitting the curve by polynomials, we also assume that there exist random variations in the measured data and that these variations are Gaussian distributed with zero mean and variance $\sigma^2$. Adding this error term, the polynomial is expressed as

$$b_i = \mathrm{P}(a_i) = p_1 a_i^{n} + p_2 a_i^{n-1} + \dots + p_n a_i + p_{n+1} + \varepsilon_i \qquad (7.4)$$

where $\varepsilon_i \sim N(0, \sigma^2)\ \forall i$ and $\operatorname{cov}(\varepsilon_i, \varepsilon_j) = 0\ \forall i \neq j$.

In matrix notation, the polynomial regression model is given by

b = Ap + ε (7.5)

The solution vector p can be obtained by solving

p = (A^T W A)^{−1} A^T W b    (7.6)

where W is the diagonal weight matrix whose entries are the weights w_i.
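As a sketch, the weighted solution of Equation (7.6) can be computed directly with NumPy's linear algebra routines; the function name and data layout below are illustrative, not the dissertation's fitted values.

```python
import numpy as np

def weighted_polyfit(a, b, sigma, degree):
    """Solve p = (A^T W A)^(-1) A^T W b  (Eq. 7.6) with w_i = 1/sigma_i^2 (Eq. 7.3)."""
    A = np.vander(a, degree + 1)      # columns: a^degree, ..., a, 1
    W = np.diag(1.0 / sigma ** 2)     # diagonal weight matrix
    return np.linalg.solve(A.T @ W @ A, A.T @ W @ b)
```

With exact quadratic data and unit weights, the fit recovers the generating coefficients, e.g. `weighted_polyfit(a, 2*a**2 + 3*a + 1, ones, 2)` returns approximately `[2, 3, 1]`.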

With the polynomial regression model, the desired flexibility in fitting the data is achieved. However, if the degree of the polynomial is high, the fitting becomes dramatically unstable. We should also note that a polynomial regression model is reliable only within a certain range, and it can diverge significantly outside that range, so the fitting range has to be selected carefully. For these reasons, we suggest piecewise quadratic or cubic polynomials for our application.

We hereby separate the curve in Figure 7.3 into three segments before fitting the polynomial regression model: 1) Poisson parameters with values under 10, 2) those with values from 10 to 30, and 3) those with values larger than 30. For the first two groups, illustrated in Figures 7.4 and 7.5, with the coefficient vector p known from Equation (7.6) and the denoised transformed estimate X̂_i, the final desired value x̂_i can be obtained by

X̂′_i = X̂_i / ( Σ_{j=1}^{m} p_j X̂_i^{m−j} + ε_i )    (7.7)

x̂_i = ( X̂′_i / 2 )^2 − 3/8    (7.8)

where m is the order of the polynomial and X̂′_i denotes the corrected denoised estimate in the transformed domain.

Furthermore, if the value of a Poisson variable is larger than 30, the asymptotical inverse Anscombe transformation (see Equation 6.20) is adopted.
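A minimal sketch of this segment-wise inversion follows, with the regression noise term omitted and illustrative coefficients standing in for the fitted values of Figures 7.4 and 7.5; the asymptotical inverse is written in its usual form (X/2)^2 − 1/8, assumed here to match Equation (6.20).

```python
import numpy as np

def corrected_inverse(X_hat, p):
    """Sketch of Eqs. (7.7)-(7.8): divide the denoised transform-domain
    estimate by the fitted polynomial factor, then apply the algebraic
    inverse Anscombe transformation.  `p` holds regression coefficients,
    highest degree first (illustrative, not the dissertation's values)."""
    factor = np.polyval(p, X_hat)                       # sum_j p_j X_hat^(m-j)
    X_corr = np.asarray(X_hat, float) / factor          # Eq. (7.7), noise term omitted
    return (X_corr / 2.0) ** 2 - 3.0 / 8.0              # Eq. (7.8)

def asymptotic_inverse(X_hat):
    """Asymptotical inverse, used for Poisson parameters larger than 30."""
    return (np.asarray(X_hat, float) / 2.0) ** 2 - 1.0 / 8.0
```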

Figure 7.4: Curve fitting for Poisson parameters under 10

Figure 7.5: Curve fitting for Poisson parameters from 10 to 30

7.3 Combination of the Denoising Method CMWT for AWGN

So far, we have provided detailed explanations of the first and third steps in the Poisson denoising procedure. For the second step, the denoising itself, we apply CMWT, our wavelet-based denoising method for AWGN developed in Part I, which yields very competitive performance. We name the resulting Poisson denoising method CMWT-IAT (Context Modeling and Wavelet Thresholding + proposed Inverse Anscombe Transformation). For the reader's convenience, the CMWT procedure is briefly summarized here:

1). Perform multilevel discrete wavelet transform (DWT) on the input noisy image y to produce its wavelet representation Y .

2). Estimate a smoothed version Z from Y using Z_i = w^T u_i, where the weight vector w is calculated by minimizing the mean squared error, and u_i is a vector consisting of Y_i's relevant coefficients.

3). Estimate the standard deviation of the additive white Gaussian noise in the noisy image y.

4). Add a portion of the noisy Y to Z, adaptively for different subbands and noise levels, to form an updated version of Z.

5). Determine the parameters of the optimized soft-thresholding operation on Z and compute the estimate X̂ by adding offsets adaptively to a closed-form solution.

6). Apply the inverse DWT to X̂ to obtain the denoised image x̂.
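As an illustration of steps 1), 5) and 6), the sketch below uses a one-level Haar DWT and plain soft thresholding; the context modeling and optimized thresholding of steps 2)-5) are method-specific and are only stood in for here.

```python
import numpy as np

def haar2(x):
    """One-level 2-D Haar DWT (a minimal stand-in for the multilevel DWT of step 1)."""
    a = (x[0::2, 0::2] + x[1::2, 0::2] + x[0::2, 1::2] + x[1::2, 1::2]) / 2.0
    h = (x[0::2, 0::2] - x[1::2, 0::2] + x[0::2, 1::2] - x[1::2, 1::2]) / 2.0
    v = (x[0::2, 0::2] + x[1::2, 0::2] - x[0::2, 1::2] - x[1::2, 1::2]) / 2.0
    d = (x[0::2, 0::2] - x[1::2, 0::2] - x[0::2, 1::2] + x[1::2, 1::2]) / 2.0
    return a, h, v, d

def ihaar2(a, h, v, d):
    """Inverse of haar2 (perfect reconstruction)."""
    x = np.empty((2 * a.shape[0], 2 * a.shape[1]))
    x[0::2, 0::2] = (a + h + v + d) / 2.0
    x[1::2, 0::2] = (a - h + v - d) / 2.0
    x[0::2, 1::2] = (a + h - v - d) / 2.0
    x[1::2, 1::2] = (a - h - v + d) / 2.0
    return x

def soft(c, t):
    """Soft thresholding: shrink coefficient magnitudes toward zero by t."""
    return np.sign(c) * np.maximum(np.abs(c) - t, 0.0)

def denoise_sketch(y, threshold):
    """Steps 1, 5 and 6 only: transform, threshold the detail subbands, invert."""
    a, h, v, d = haar2(y)
    return ihaar2(a, soft(h, threshold), soft(v, threshold), soft(d, threshold))
```

With a zero threshold, the pipeline reduces to perfect reconstruction, which is a quick sanity check on the transform pair.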

Furthermore, in order to suppress the unpleasant pseudo-Gibbs phenomena near edges and ridge discontinuities that appear after standard wavelet denoising, we also carry out an overcomplete expansion process called cycle spinning, originally proposed in [67]. The basic procedure is to circularly shift the input image to generate overcomplete representations, apply the same denoising operation to each of them, shift each result back, and average all the denoised representations to obtain the final desired image. By applying this strategy, we remove some disturbing visual artifacts, so the denoised image's quality is notably improved. For the principle and details, please refer to Chapter 5.
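A compact sketch of cycle spinning with an arbitrary denoiser; the function name and the shift set are illustrative.

```python
import numpy as np

def cycle_spin(image, denoise, shifts):
    """Cycle spinning [67]: circularly shift, denoise each shifted copy,
    shift back, and average.  `denoise` is any 2-D denoising function;
    `shifts` is an iterable of (row, col) offsets."""
    shifts = list(shifts)
    acc = np.zeros_like(image, dtype=float)
    for dr, dc in shifts:
        shifted = np.roll(image, (dr, dc), axis=(0, 1))
        restored = denoise(shifted)
        acc += np.roll(restored, (-dr, -dc), axis=(0, 1))
    return acc / len(shifts)
```

Because each copy is shifted back before averaging, an identity denoiser returns the input unchanged, regardless of the shifts used.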

7.4 Summary

In this chapter, we have presented a new denoising method called CMWT-IAT for Poisson noise corrupted images. The method uses the Anscombe variance stabilizing transformation and combines CMWT, our previously developed wavelet-based Gaussian denoising method, with a newly proposed inverse transformation based on a polynomial regression model. By applying this method, the biased errors produced by conventional inversions are significantly corrected. We also extend it to the overcomplete wavelet representation by applying “cycle spinning” to achieve shift-invariance and suppress the pseudo-Gibbs phenomena. Though simple and straightforward, the method proves very effective on photon-limited images.

Chapter 8

Performance Evaluation

Our proposed method for denoising Poisson noise corrupted images, CMWT-IAT, is theoretically sound, and we conduct three groups of experiments to confirm its actual performance. In our experiments, we take the widely used peak signal-to-noise ratio (PSNR), defined in Equation (2.4), as our measure of denoising performance. We restate it here for convenience.

PSNR = 10 log_{10} ( I_max^2 / MSE )    (8.1)

where MSE is defined in Equation (2.3) and I_max is the largest intensity of the noiseless image.
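In code, Equation (8.1) amounts to:

```python
import numpy as np

def psnr(clean, estimate):
    """PSNR in dB as in Eq. (8.1); I_max is the largest intensity of the
    noiseless image and MSE is the mean squared error of Eq. (2.3)."""
    mse = np.mean((np.asarray(clean, float) - np.asarray(estimate, float)) ** 2)
    return 10.0 * np.log10(np.max(clean) ** 2 / mse)
```

For example, a constant image of intensity 60 with a uniform error of 6 gives MSE = 36 and PSNR = 10 log10(3600/36) = 20 dB.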

Since Poisson-model denoising methods specifically target photon-limited images, it is more meaningful to conduct our empirical experiments in a low-intensity setting. For this reason, the pixel intensities of six standard test images are scaled down proportionally so that the maximum intensity I_max is 60, 30, 20, 10 and 5 respectively, and Poisson noise is generated on each of them.
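The test-image preparation just described can be sketched as follows (function and variable names are illustrative).

```python
import numpy as np

def poisson_corrupt(image, peak, rng=None):
    """Scale an image so its maximum intensity equals `peak`, then draw one
    Poisson realization per pixel; the scaled image holds the underlying
    Poisson parameters to be estimated."""
    rng = np.random.default_rng() if rng is None else rng
    scaled = np.asarray(image, float) * (peak / float(np.max(image)))
    return scaled, rng.poisson(scaled)
```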

8.1 Comparisons with Two Conventional Inversions

In this experiment, we verify the advantage of our proposed inverse transformation over both the arithmetical and asymptotical inverse Anscombe transformations in Equations (6.17) and (6.20). In all three cases, we denoise the transformed images using our method presented in [109] with the overcomplete representation; the only difference is the inverse transformation applied after the denoising.

In Table 8.1, we present the PSNR comparisons between these three transformations. From the table, we can see that our proposed inverse method yields significant improvements over the other two when the peak intensity is low.

8.2 Comparisons with SURE-LET Using the Proposed Inversion

In this experiment, we illustrate the efficiency of our denoising method [109] when applied to the transformed Poisson data. We compare it with SURE-LET [1], a state-of-the-art Gaussian denoising algorithm, under the non-overcomplete wavelet transform. For both methods, we apply our proposed inverse transformation, and the PSNR values of our method are likewise obtained with the non-overcomplete wavelet transform. The numerical results are presented in Table 8.2.

In Figures 8.1 and 8.2, we show visual results of both denoising methods applied to Images Boat and Man. Delicate details, such as the ship's masts and the man's hair and feathers, are less over-smoothed and better restored in the images obtained by our denoising method.

From the numerical results and visual comparisons, we are convinced that our denoising method, originally designed for AWGN, remains a viable and effective approach on variance-stabilized Poisson data, and that it generates significantly better results than SURE-LET in a low-intensity setting.

Table 8.1: PSNR (dB) comparison of the arithmetical and asymptotical inverse Anscombe transformations and the proposed inverse transformation

Image      Peak   Arithmetical   Asymptotical   Proposed
Lena          5      23.11          25.30         25.98
             10      25.91          27.20         27.23
             20      28.43          28.98         28.99
             30      29.68          29.95         29.95
             60      31.32          31.45         31.46
Goldhill      5      23.27          25.72         26.11
             10      25.85          26.88         26.92
             20      27.77          28.29         28.29
             30      28.61          28.86         28.86
             60      30.16          30.26         30.26
Boat          5      22.76          24.80         25.18
             10      25.45          26.53         26.64
             20      27.69          27.72         27.73
             30      28.26          28.48         28.50
             60      30.05          30.15         30.15
Barbara       5      21.68          23.26         23.55
             10      23.81          24.51         24.61
             20      25.62          25.96         25.97
             30      26.66          28.85         28.85
             60      29.04          29.12         29.13
Couple        5      22.60          24.81         25.10
             10      25.15          26.07         26.28
             20      27.18          27.61         27.63
             30      28.13          28.39         28.40
             60      29.91          30.01         30.03
Man           5      22.96          25.16         25.53
             10      25.71          26.83         26.89
             20      27.37          27.84         27.84
             30      28.31          28.54         28.54
             60      29.95          30.04         30.04

Table 8.2: PSNR comparison of SURE-LET [1] and our denoising method [109] combined with the proposed inversion for different images and peak intensities

Image      Peak   Noisy   SURE-LET   CMWT-IAT
Lena          5    9.93     24.42      25.31
             10   12.95     26.73      26.56
             20   15.98     28.21      28.42
             30   17.72     29.07      29.38
             60   20.74     30.69      30.87
Goldhill      5   10.22     23.16      25.52
             10   13.23     25.23      26.41
             20   16.24     26.52      27.86
             30   17.99     27.29      28.47
             60   20.99     28.66      29.80
Boat          5    9.93     22.82      24.60
             10   12.94     25.02      25.80
             20   15.96     26.53      27.25
             30   17.73     27.33      28.03
             60   20.70     28.88      29.55
Barbara       5   10.20     21.23      23.15
             10   13.21     23.10      24.21
             20   16.24     24.38      25.56
             30   17.98     25.36      26.47
             60   20.99     27.08      28.56
Couple        5   10.20     22.72      24.55
             10   13.23     24.75      25.75
             20   16.24     26.15      27.18
             30   17.99     27.03      27.96
             60   21.03     28.55      29.48
Man           5   10.32     22.80      24.95
             10   13.33     25.08      26.01
             20   16.36     26.41      27.39
             30   18.14     27.24      28.14
             60   20.11     28.74      29.59

8.3 Comparisons with State-of-the-Art Denoising Methods

In this set of experiments, we compare the following denoising methods on their performance in denoising low-intensity images and list the PSNR results in Table 8.3 for different images and peak intensities.

Figure 8.1: (a) The original Image Boat at peak intensity 30, (b) Poisson noise corrupted image, (c) Image denoised with non-overcomplete SURE-LET [1] and the proposed inversion, (d) Image denoised with CMWT-IAT.

• SURE-LET: a combination of an overcomplete wavelet-transform denoising approach [62], derived from their state-of-the-art method [21], and our proposed inverse transformation applied after the denoising. It consists of a denoising estimator built from a series of weighted and optimized thresholding functions.

Figure 8.2: (a) Part of the original Image Man at peak intensity 30, (b) Poisson noise corrupted image, (c) Image denoised with non-overcomplete SURE-LET [1] and the proposed inversion, (d) Image denoised with CMWT-IAT.

• PURE-LET: a competitive Poisson intensity restoration technique [6]. It minimizes an unbiased estimate of the MSE for Poisson noise, and its denoising process is a linear combination of optimized thresholding functions. The denoising results of this approach are obtained by applying 10 cycle spins in our simulations.

• BM3D: a combination of the non-wavelet denoising method BM3D and their inverse transformation [7]. BM3D is a state-of-the-art denoising algorithm using sparse 3-D transform-domain collaborative filtering. Their unbiased inverse transformation is based on maximum likelihood (ML) and minimum mean square error (MMSE).

• CMWT-IAT: a combination of the overcomplete representation of our wavelet-based denoising method [109] for AWGN and the proposed inverse transformation presented in this dissertation.

From Table 8.3, our method's performance is in general very competitive in terms of PSNR. In particular, it outperforms SURE-LET and PURE-LET with improvements of up to 1.5 dB. It is also interesting to note that, when using the proposed inverse transformation, SURE-LET actually produces better results than the dedicated Poisson denoiser PURE-LET. Moreover, our method yields results comparable or even superior to the state-of-the-art BM3D approach, especially for Images Goldhill and Man. For Images Lena and Barbara, our algorithm cannot match BM3D; we attribute this, as investigated in Section 5.3, to the limitation of our wavelet modeling in estimating the dominant recurrent textures in these images.

We also provide, for visual comparison in Figures 8.3 and 8.4, two sets of results obtained by the different denoising methods on Image Goldhill at peak intensity 20 and Image Couple at peak intensity 10. Compared to the other methods, the combination of our wavelet-based denoising method and the proposed inverse transformation yields very few artifacts and preserves more useful features, such as the window grids in Image Goldhill and the tables and curtains in Image Couple.

Table 8.3: PSNR comparison of some of the best denoising methods for different images and peak intensities

Image      Peak   Noisy   SURE-LET   PURE-LET    BM3D   CMWT-IAT
Lena          5    9.93     25.84      24.74     26.56    25.98
             10   12.95     27.55      26.68     28.31    27.23
             20   15.98     29.08      27.81     29.99    28.99
             30   17.72     29.97      29.16     30.96    29.95
             60   20.74     31.42      30.94     32.43    31.46
Goldhill      5   10.22     24.54      23.48     24.92    26.11
             10   13.23     26.04      25.59     26.33    26.92
             20   16.24     27.25      26.38     27.75    28.29
             30   17.99     28.04      27.42     28.55    28.86
             60   20.99     29.36      28.92     29.92    30.26
Boat          5    9.93     24.07      23.68     24.77    25.18
             10   12.94     25.75      25.33     26.28    26.64
             20   15.96     27.22      26.41     27.83    27.73
             30   17.73     28.05      27.51     28.74    28.50
             60   20.70     29.64      29.15     30.29    30.15
Barbara       5   10.20     22.16      22.61     24.48    23.55
             10   13.21     23.50      23.56     26.35    24.61
             20   16.24     24.76      24.80     28.18    25.97
             30   17.73     28.05      27.51     29.19    28.50
             60   20.99     27.52      27.52     30.91    29.13
Couple        5   10.20     23.99      23.35     24.46    25.10
             10   13.23     25.53      25.01     26.14    26.28
             20   16.24     27.05      26.24     27.74    27.63
             30   17.99     27.97      27.19     28.77    28.40
             60   21.03     29.53      28.94     30.37    30.03
Man           5   10.32     24.06      23.65     24.77    25.53
             10   13.33     25.71      25.57     26.13    26.89
             20   16.36     27.00      26.13     27.54    27.84
             30   18.14     27.83      27.32     28.35    28.54
             60   20.11     29.30      28.91     29.83    30.04

8.4 Summary

In this chapter, we conducted three sets of empirical experiments to verify the competence of CMWT-IAT in denoising Poisson noise corrupted images.

First, we compared the performance of the proposed inversion with the two widely used inverse Anscombe transformations: the arithmetical and the asymptotical inverses. The results show that the new inversion outperforms them, especially when the intensities are low. Then, we compared our denoising method with the state-of-the-art denoiser SURE-LET, both combined with our proposed inversion; the numerical results validated its advantage and efficiency on the Gaussianized Poisson data. Last, we compared our Poisson denoising method with some leading image restoration methods, including an algorithm specifically designed for Poisson data, and it delivers competitive results both quantitatively and qualitatively.

Figure 8.3: (a) The original Image Goldhill at peak intensity 20, (b) Poisson noise corrupted image, (c) Image denoised with overcomplete SURE-LET [62] and the proposed inversion, (d) Image denoised with PURE-LET [6], (e) Image denoised with BM3D and their unbiased inversion [7], (f) Image denoised with our denoising method [109] and the proposed inversion.

Figure 8.4: (a) The original Image Couple at peak intensity 10, (b) Poisson noise corrupted image, (c) Image denoised with overcomplete SURE-LET [62] and the proposed inversion, (d) Image denoised with PURE-LET [6], (e) Image denoised with BM3D and their unbiased inversion [7], (f) Image denoised with our denoising method [109] and the proposed inversion.

Chapter 9

Conclusion and Perspectives

In this chapter, we summarize the dissertation and provide some potential further extensions to this work.

9.1 Conclusion

In this dissertation, a new wavelet based image denoising method is proposed.

In the first part of this dissertation, we handled the removal of additive white Gaussian noise (AWGN). Under the standard wavelet transform, we first sought to lower the standard deviation of the noise prior to thresholding. We introduced an improved context modeling method that groups wavelet coefficients with similar statistics by exploiting parent-child relationships and neighboring dependency; a smoothed version of the input noisy image was thus constructed. Then, we employed an adaptive, subband-dependent optimized soft-thresholding function on this processed image to gain an additional denoising effect. By combining these two operations, context modeling and wavelet thresholding, our proposed denoising method CMWT can be placed among the state-of-the-art image denoising algorithms. We also applied the technique of “cycle spinning” to suppress the artifacts generated by the undesired shift-variance of the standard wavelet transform.

In the second part, we considered another noise model, Poisson noise, which occurs when the photon counts obtained by the imaging devices are limited. In this case, the observed noisy image can be assumed to be a realization of a Poisson process. Under this model, many denoisers designed for Gaussian observation models fail to be efficient, whereas the proposed CMWT is still applicable within the variance stabilizing transformation (VST) framework, where the VST is used to reduce the dependency of the noise on the signal. We then proposed a new inversion for the Anscombe transformation to compensate the biased errors brought by the Anscombe transformation and its conventional inversions. We incorporated the CMWT proposed in Part I into this Poisson denoising approach and termed it CMWT-IAT. This approach was finally tested on various Poisson noise corrupted images, and it can be regarded as a strong competitor with a lighter computational cost.

9.2 Perspectives

In this section, we list several potential research topics and directions that might extend our work on image denoising.

Parameter Optimization

We believe that the performance of CMWT, as well as CMWT-OE and CMWT-IAT, can be further improved by employing optimal parameters. While the analytical tractability of the solution for the parameters a1 and a2 in the optimized soft-thresholding, as addressed in Section 4.5, remains an open problem, we would like to devise a more generic algorithm for choosing their optimal values; currently, we obtain the values from a training process on different images. We are also interested in deriving a more precise decision process for the percentage of the noisy wavelet coefficients Y added to the context model Z in Section 4.3.

Other Noise Models

In this dissertation, we first dealt with additive Gaussian noise, testing thoroughly on zero-mean additive Gaussian noise at different noise levels, and then tackled Poisson noise, testing mainly in a low-intensity setting. We did not derive estimators for images corrupted by noise following other statistics, but we expect the proposed method can be extended to mixed Gaussian-Poisson distributed noise, or to noise described by a more general probability density function (PDF).

Computational Cost

From a practical point of view, computational cost is an important criterion in evaluating denoising methods. Although an overcomplete expansion such as “cycle spinning” produces superior numerical and visual results compared to the standard wavelet transform, the extra time and additional memory it requires limit its real-world applications. We could use a smaller number of cycle spins when shorter processing time is preferred, or apply other wavelet-based transforms that possess the shift-invariance property.

Extension to the Denoising of Color Images and Videos

The denoising method presented in this work need not be bound to grayscale images. Color images usually contain more information and are more appealing to the human visual system; typically, RGB and YUV images have three channels. We can consider adapting the proposed monochannel method to multichannel image denoising. On the other hand, with the rapid development of video acquisition and transmission, video denoising remains a crucial step before any higher-level tasks. To optimize performance when extending to video denoising, the strong temporal correlations between adjacent frames should be taken into account.

Publications

1. J. Quan, W. G. Wee, C. Y. Han, and X. Zhou, “A new Poisson noisy image denoising method based on the Anscombe transformation,” in Proc. International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV’12), pp. 949, Las Vegas NV, USA, Jul. 2012.

2. J. Quan, W. G. Wee, and C. Y. Han, “A new wavelet based image denoising method,” in Proc. IEEE Data Compression Conference (DCC’12), pp. 408, Snowbird UT, USA, Apr. 2012.

3. X. Xiao, J. Quan, A. Ferro, X. Zhou and W. Wee, “An approach to the model-image selection problem of an assisted defect recognition system,” in Book of Abstracts, 37th Dayton-Cincinnati Aerospace Sciences Symposium, Miamisburg OH, USA, Mar. 2012.

4. J. Quan, W. G. Wee, and C. Y. Han, “A new image denoising approach based on wavelet thresholding,” under review by the Journal of Signal Processing Systems.

Bibliography

[1] F. Luisier and T. Blu, “A new SURE approach to image denoising: interscale

orthonormal wavelet thresholding,” IEEE Trans. Image Process., vol. 16, No. 3,

pp. 593-606, Mar. 2007.

[2] R. R. Coifman and D. L. Donoho, “Translation invariant de-noising,” Lecture Notes

in Statistics: Wavelets and Statistics, vol. 103, Springer Verlag, NewYork, pp. 125-

150,1995.

[3] J. Portilla, V. Strela, M. J. Wainwright, and E. P. Simoncelli, “Image denoising

using scale mixtures of gaussians in the wavelet domain,” IEEE Trans. Image Pro-

cess., vol. 12, no. 11, pp. 1338-1351, Nov. 2003.

[4] T. Blu and F. Luisier, “The SURE-LET approach to image denoising,” IEEE Trans.

Image Process., vol. 16, no. 11, pp. 2778-2786, Nov. 2007.

[5] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse

3-d transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16,

no. 8, pp. 2080-2095, Aug. 2007.

[6] F. Luisier, C. Vonesch, T. Blu, and M. Unser, “Fast interscale wavelet denoising of

Poisson-corrupted images,” Signal Process., vol. 90, no.2, pp. 415-427, Feb. 2010.

130 [7] Markku M¨akitalo and Alessandro Foi, “Optimal Inversion of the Anscombe Trans-

formation in Low-Count Poisson Image Denoising,” IEEE Trans. Image Process.,

vol. 20, no. 1, pp.99-109, Jan. 2011.

[8] Z. Wang, A. Bovik, H. Sheikh, and E. Simoncelli, “Image quality assessment: from

error visibility to structural similarity,” IEEE Trans. Image Process., vol. 13, no.

4, pp. 600-612, Apr.2004.

[9] H. R. Sheikh, A. C. Bovik, and G. de Veciana, “An information fidelity criterion

for image quality assessment using natural scene statistics,” vol. 14, no. 12, pp.

2127-2128, Dec.2005.

[10] P. G. J. Barten. “Contrast sensitivity of the human eye and its effects on image-

quality,” SPIE Optical Engineering, Press. Bellingham, WA, 1999.

[11] A. Buades, B. Coll, and J. M. Morel, “A review of image denoising algorithms, with

a new one,” Multiscale Modeling and Simulation (SIAM interdisciplinary journal),

vol. 4, no. 2, pp. 490-530, July 2005.

[12] M. Rudin, Molecular Imaging: “Basic Principles and Applications in Biomedical

Research,” Imperial College Press, August 2005.

[13] H. Sheikh, A. Bovik, and L. Cormack, “No-reference quality assessment using nat-

ural scene statistics: Jpeg2000,” IEEE Trans. Image Process., vol. 14, no. 11, pp.

1918-1927, Nov. 2005.

[14] N. Wiener, “Extrapolation, interpolation, and smoothing of stationary time series:

with engineering applications. ” Technology Press of the Massachusetts Institute of

Technology, 1949.

[15] J. S. Lee, “Digital image enhancement and noise filtering by use of local statistics,”

IEEE Pattern Anal. Machine Intell., vol. PAMI-2, pp. 165-168, Mar. 1980.

131 [16] A. C. Bovik, T. S. Huang and D. C. Munson, Jr., “A generalization of median fil-

tering using linear combinations of order statistics,” IEEE Trans. Acoustics, Speech

and Signal Process., vol. 31, no. 6, pp. 1342-1350, Dec. 1983.

[17] R. Yang, L. Yin, M. Gabbouj, J. Astola, and Y. Neuvo, “Optimal weighted median

filters under structural constraints,” IEEE Trans. Signal Process., vol. 43, no.3, pp.

591-604, Mar. 1995.

[18] C. Tomasi and R. Manduchi, “Bilateral filtering for gray and color images,” in

Proc. of the Sixth Int. Conf. on Computer Vision, pp. 839-846, Bombay, India,

1998.

[19] T. Chen, K-K. Ma and L-H. Chen, “Tri-state median filter for image denoising,”

IEEE Trans. Image Process., vol. 8, no. 12, pp. 1834-1838, Dec. 1999.

[20] A. Ben Hamza, P. Luque, J. Martinez, and R.Roman, “Removing noise and pre-

serving details with relaxed median filters,” J. Math. Imag. Vision, vol. 11, no. 2,

pp. 161-177, Oct. 1999.

[21] R. C. Gonzalez and R. E. Woods, Digital Image Processing, Prentice Hall, 2008.

[22] L. Yaroslavsky, “Local adaptive image restoration and enhancement with the use

of DFT and DCT in a running window,” in Proc. of Wavelet Applications in Signal

and Image Proc. IV, SPIE Proc. Series, vol. 2825, pp. 1-13, Denver, CO, 1996.

[23] I. Daubechies, “Orthonormal bases of compactly supported wavelets,” Comm.

Pure & Appl. Math., vol. 41, pp. 909-996, 1988.

[24] S. Mallat, “A theory for multiresolution signal decomposition: the wavelet repre-

sentation,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 11, no. 7,

pp. 674-693, July 1989.

132 [25] I. Daubechies, “ Ten Lectures on Wavelets”. SIAM Journal of Math. Analysis.

1992.

[26] B. Vidakovic and C. B. Lozoya, ”On Time-Dependent Wavelet Denoising,” IEEE

Trans. Signal Process., vol. 46, no. 9, pp. 2549-2554, Sept. 1998.

[27] S. Mallat and W.L.Hwang, ”Singularity detection and processing with wavelets,”

IEEE Trans , vol. 38, no. 2, pp. 617-643, 1992.

[28] T.C. Hsung, D. P. K. Lun, W. C. Siu, “Denoising by singularity detection,” IEEE

Trans on Signal Process., vol. 47, no. 11, pp. 3139-3144, Nov. 1999.

[29] S. G. Mallet and Z. F. Zhang, “Matching pursuits with time-frequency dictionar-

ies,” IEEE Trans. Signal Process., vol. 41, no. 12, pp. 3397-3415, Dec. 1993.

[30] P. Ishwar, K. Ratakonda, P. Moulin and N. Ahuja, “Image denoising using multiple

compaction domains,” in Proc. IEEE Int. Conf. on Acoustics, Speech and Signal

Proc., –ICASSP’98, Seattle, WA, pp. 1889-1892, May 1998.

[31] H. Choi and R. G. Baraniuk, “Multiple wavelet basis image denoising using besov

ball projections,” IEEE Signal Process. Letters, vol. 11, no. 9, pp. 717-720, Sep.

2004.

[32] D. L. Donoho and I.M. Johnstone, “Ideal spatial adaptation via wavelet shrink-

age,” Biometrika, vol.81, pp. 425-455, 1994.

[33] D. L. Donoho and I.M. Johnstone, “Adapting to unknown smoothness via wavelet

shrinkage,” J. Amer. Statist. Assoc., vol. 90, no. 432, pp.1200-1224, Dec. 1995.

[34] H.-Y. Gao and A. Bruce, “WaveShrink with firm shrinkage,” Statist. Sin.,vol.7,

pp. 855-874, 1997.

133 [35] A. Chambolle, R. A. DeVore, N. Lee and B. J. Lucier, “Nonlinear wavelet image

processing: variational problems, compression, and noise removal through wavelet

shrinkage, ” IEEE Trans. Image Process., vol. 7, no. 3, pp. 319-335, Mar. 1998.

[36] S. Sardy, “Minimax threshold for denoising complex signals with waveshrink,”

IEEE Trans. Signal Process., vol. 48, no. 4, pp. 1023-1028, Apr. 2000.

[37] M. Jansen, M. Malfait and A. Buttheel, “Generalized cross validation for wavelet

thresholding,” Signal Processing, vol. 56, pp.33-44, Jan. 1997.

[38] F. Abramovich, P. Besbeas and T. Sapatinas, “Empirical Bayes approach to block

wavelet function estimation,” Computational Statistics and Data Analysis, vol. 39,

pp. 435-451, 2002.

[39] G. P. Nason, “Wavelet shrinkage using cross-validation,” Journal of the Royal

Statistical Society, vol. 58, no. 2, pp.463-479, 1996.

[40] D. L. Donoho, “De-noising by soft-thresholding,” IEEE Trans. Inform. Theory,

vol.41, no. 9, pp. 613-627, May. 1995.

[41] S. G. Chang, B. Yu, and M. Vetterli, “Spatially adaptive wavelet thresholding

with context modeling for image denoising,” IEEE Trans. Image Process.,vol.9,

no. 9, pp. 1522-1531, Sep. 2000.

[42] N. Weyrich and G.T. Warhola, “Wavelet Shrinkage and Generalized Cross Vali-

dation for Image Denoising, ” IEEE Trans. Image Process., vol. 7, no. 1, pp 82-90,

Jan. 1998.

[43] M. Jansen and A. Bultheel, “Multiple wavelet threshold estimation by generalized

cross validation for images with correlated noise,” IEEE Trans. Image Process.,vol.

8, no. 7, pp. 947-953, 1999.

134 [44] E. P. Simoncelli, “Bayesian denoising of visual images in the wavelet domain,”

in Bayesian Inference in Wavelet Based Models, P. Muller and B. Vidakovic, Eds.

New York: Springer-Verlag, 1999.

[45] A. Pizurica and W. Philips, “Estimating the probability of the presence of a signal

of interest in multiresolution single- and multiband image denoising,” IEEE Trans.

Image Process., vol. 15, no. 3, pp. 645-665, Mar. 2006.

[46] P. Moulin and J. Liu, “Analysis of multiresolution image denoising schemes using

generalized Gaussian and complexity priors,” IEEE Trans. Inform. Theory, vol. 45,

no. 3, pp. 909-919, Apr. 1999.

[47] M. Hansen and B. Yu, “Wavelet thresholding via MDL for natural images,” IEEE

Trans. Inform. Theory, vol. 46, no. 5, pp. 1778-1788, Aug. 2000.

[48] E. Simoncelli and E. Adelson, “Noise removal via Bayesian wavelet coring,” in

Proc. IEEE Int. Conf. Image Processing, vol. I, pp. 379-382, Sept. 1996.

[49] M. Miller and N. Kingsbury, “Image denoising using derotated complex wavelet

coefficients,”IEEE Trans. Image Process., vol. 17, no. 9, pp 1500-1511, Sep. 2008.

[50] P. Scheunders and S. D. Backer, “Wavelet denoising of multicomponent images

using Gaussian scale mixture models and a noise-free image as priors, ” IEEE

Trans. Image Process., vol. 16, no. 7, pp. 1865-1872, Jul. 2007.

[51] G. Fan and X.G. Xia, “Image denoising using a local contextual hidden markov

model in the wavelet domain,” IEEE Signal Process. Lett., vol. 8, no. 5, pp. 125-128,

May. 2001.

[52] A. Achim and E.E. Kuruoglu, “Image denoising using alpha-stable distributions in

the complex wavelet domain,” IEEE Signal Process. Lett., vol. 12, no. 1, pp.17-20,

Jan. 2005.

135 [53] L. Boubchir and J. M. Fadili, “A closed-form nonparametric Bayesian estimator

in the wavelet domain of images using an approximate alpha-stable prior,” Pattern

Recognit. Lett., vol. 27, no. 12, pp. 1370-1382, Sept. 2006.

[54] S. M. M. Rahman, M. O. Ahmad, and M. N. S. Swamy, “Bayesian wavelet-based

image denoising using the Gauss-Hermite expansion,” IEEE Trans. Image Process.,

vol. 17, no. 10, pp. 1755-1771, Oct. 2008.

[55] J. Romberg, H. Choi and R. G. Baraniuk, “Bayesian wavelet domain image mod-

eling using hidden Markov models,”IEEE Trans. Image Process., vol. 10, no. 7, pp.

1056-1068, Jul. 2001.

[56] L. Sendur and I. W. Selesnick, “Bivariate shrinkage functions for wavelet-based

denoising exploiting interscale dependency,” IEEE Trans. Signal Process., vol. 50,

no. 11, pp. 2744-2756, Nov. 2002.

[57] X. Li and M. T. Orchard, “Spatially adaptive image denoising under overcomplete

expansion,” in Proc. IEEE Int’l Conf. on Image Proc., Vancouver, Canada, Sept.

2000.

[58] V. Strela, J. Portilla, and E. Simoncelli, “Image denoising using a local Gaussian

scale mixture model in the wavelet domain,” in Proc. SPIE. Wavelet Applications

in Signal and Image Processing VIII, vol. 4119, pp. 363-371, Dec. 2000.

[59] M. Figueiredo and R. Nowak, “An EM algorithm for wavelet-based image restora-

tion, ” IEEE Trans. Image Process., vol. 12, no. 8, pp. 906-916, Aug. 2003.

[60] A. Pizurica, W. Philips, I. Lemahieu, and M. Acheroy, “A joint inter- and in-

trascale statistical model for wavelet based Bayesian image denoising,”IEEE Trans.

Image Process., vol. 11, no. 5, pp. 545-557, May 2002.

[61] A. Pizurica, W. Philips, I. Lemahieu, and M. Acheroy, “A versatile wavelet domain noise filtration technique for medical imaging,” IEEE Trans. Medical Imaging, vol. 22, no. 3, pp. 323-331, Mar. 2003.

[62] T. Blu and F. Luisier, “The SURE-LET approach to image denoising,” IEEE Trans. Image Process., vol. 16, no. 11, pp. 2778-2786, Nov. 2007.

[63] N. Kingsbury, “Complex wavelets for shift invariant analysis and filtering of signals,” J. Appl. Comput. Harmon. Anal., vol. 10, no. 3, pp. 234-253, May 2001.

[64] M. A. Miller and N. G. Kingsbury, “Image modeling using interscale phase properties of complex wavelet coefficients,” IEEE Trans. Image Process., vol. 17, no. 9, pp. 1491-1499, Sep. 2008.

[65] A. P. Bradley, “Shift-invariance in the discrete wavelet transform,” in Proc. VIIth Digital Image Computing: Techniques and Applications, C. Sun, H. Talbot, S. Ourselin, and T. Adriaansen, Eds., Sydney, Australia, 10-12 Dec. 2003.

[66] M. Holschneider, R. Kronland-Martinet, J. Morlet, and P. Tchamitchian, “A real-time algorithm for signal analysis with the help of the wavelet transform,” in Wavelets, Time-Frequency Methods and Phase Space, pp. 289-297, Springer-Verlag, 1989.

[67] R. R. Coifman and D. L. Donoho, “Translation invariant de-noising,” in Lecture Notes in Statistics: Wavelets and Statistics, vol. 103, pp. 125-150, New York: Springer-Verlag, 1995.

[68] G. Nason and B. W. Silverman, “The stationary wavelet transform and some statistical applications,” in Lecture Notes in Statistics: Wavelets and Statistics, vol. 103, New York: Springer-Verlag, 1995.

[69] M. Lang, H. Guo, J. E. Odegard, and C. S. Burrus, “Nonlinear processing of a shift invariant DWT for noise reduction,” in Proc. SPIE, Mathematical Imaging: Wavelet Applications for Dual Use, Apr. 1995.

[70] H. Sari-Sarraf and D. Brzakovic, “A shift-invariant discrete wavelet transform,” IEEE Trans. Signal Process., vol. 45, no. 10, pp. 2621-2626, Oct. 1997.

[71] G. Y. Chen and T. D. Bui, “Multiwavelets denoising using neighboring coefficients,” IEEE Signal Process. Lett., vol. 10, no. 7, pp. 211-214, Jul. 2003.

[72] T. D. Bui and G. Y. Chen, “Translation-invariant denoising using multiwavelets,” IEEE Trans. Signal Process., vol. 46, no. 12, pp. 3414-3420, Dec. 1998.

[73] T. Hsung, D. Lun, and K. Ho, “Optimizing the multiwavelet shrinkage denoising,” IEEE Trans. Signal Process., vol. 53, no. 1, pp. 240-251, Jan. 2005.

[74] V. Strela and A. T. Walden, “Orthogonal and biorthogonal multiwavelets for signal denoising and image compression,” in Proc. SPIE, vol. 3391, pp. 96-107, 1998.

[75] E. P. Simoncelli, W. T. Freeman, E. H. Adelson, and D. J. Heeger, “Shiftable multi-scale transforms,” IEEE Trans. Information Theory, vol. 38, no. 2, pp. 587-607, Mar. 1992.

[76] E. Simoncelli and W. Freeman, “The steerable pyramid: a flexible architecture for multi-scale derivative computation,” in Proc. International Conference on Image Processing (ICIP’95), vol. 3, pp. 444-447, Oct. 1995.

[77] E. J. Candès and D. L. Donoho, “Ridgelets: a key to higher-dimensional intermittency?” Phil. Trans. R. Soc. Lond. A, pp. 2495-2509, 1999.

[78] J.-L. Starck, E. J. Candès, and D. L. Donoho, “The curvelet transform for image denoising,” IEEE Trans. Image Process., vol. 11, no. 6, pp. 670-684, Jun. 2002.

[79] P. Vandergheynst and J.-F. Gobbers, “Directional dyadic wavelet transforms: design and algorithms,” IEEE Trans. Image Process., vol. 11, no. 4, pp. 363-372, Apr. 2002.

[80] M. Do and M. Vetterli, “The contourlet transform: an efficient directional multiresolution image representation,” IEEE Trans. Image Process., vol. 14, no. 12, pp. 2091-2106, Dec. 2005.

[81] V. Velisavljevic, B. Beferull-Lozano, M. Vetterli, and P. Dragotti, “Directionlets: anisotropic multidirectional representation with separable filtering,” IEEE Trans. Image Process., vol. 15, no. 7, pp. 1916-1933, Jul. 2006.

[82] D. L. Donoho and X. Huo, “Beamlets and multiscale image analysis,” in Multiscale and Multiresolution Methods, T. J. Barth, T. Chan, and R. Haimes, Eds., Springer, vol. 20, pp. 149-196, 2002.

[83] E. Le Pennec and S. Mallat, “Sparse geometric image representations with bandelets,” IEEE Trans. Image Process., vol. 14, no. 4, pp. 423-438, Apr. 2005.

[84] D. Van De Ville, T. Blu, and M. Unser, “Isotropic polyharmonic B-Splines: scaling functions and wavelets,” IEEE Trans. Image Process., vol. 14, no. 11, pp. 1798-1813, Nov. 2005.

[85] L. Cayon, J. L. Sanz, R. B. Barreiro, E. Martinez-Gonzalez, P. Vielva, L. Toffolatti, J. Silk, J. M. Diego, and F. Argueso, “Isotropic wavelets: a powerful tool to extract point sources from cosmic microwave background maps,” Mon. Not. R. Astron. Soc., vol. 315, no. 4, pp. 757-761, Jul. 2000.

[86] D. Muresan and T. Parks, “Adaptive principal components and image denoising,” in Proc. International Conference on Image Processing (ICIP’03), vol. 1, pp. I-101-4, Sept. 2003.

[87] T. Tasdizen, “Principal components for non-local means image denoising,” in Proc. International Conference on Image Processing (ICIP’08), pp. 1728-1731, Oct. 2008.

[88] L. Zhang, W. Dong, D. Zhang, and G. Shi, “Two-stage image denoising by principal component analysis with local pixel grouping,” Pattern Recognition, vol. 43, no. 4, pp. 1531-1549, Apr. 2010.

[89] P. Hoyer, A. Hyvarinen, and E. Oja, “Sparse code shrinkage for image denoising,” in Proc. IEEE World Congress on Computational Intelligence, Neural Networks Proceedings, vol. 2, pp. 859-864, 1998.

[90] P. Gruber, K. Stadlthanner, M. Bohm, F. J. Theis, E. W. Lang, and A. M. Tomé, “Denoising using local projective subspace methods,” Neurocomputing, vol. 69, no. 13-15, pp. 1485-1501, Aug. 2006.

[91] A. Hyvarinen, E. Oja, P. Hoyer, and J. Hurri, “Image feature extraction by sparse coding and independent component analysis,” in Proc. Fourteenth International Conference on Pattern Recognition, vol. 2, pp. 1268-1273, Brisbane, Australia, Aug. 1998.

[92] K. Dabov, A. Foi, V. Katkovnik, and K. Egiazarian, “Image denoising by sparse 3-D transform-domain collaborative filtering,” IEEE Trans. Image Process., vol. 16, no. 8, pp. 2080-2095, Aug. 2007.

[93] L. I. Rudin, S. Osher, and E. Fatemi, “Nonlinear total variation based noise removal algorithms,” Physica D, vol. 60, pp. 259-268, 1992.

[94] C. R. Vogel and M. E. Oman, “Fast, robust total variation-based reconstruction of noisy, blurred images,” IEEE Trans. Image Process., vol. 7, no. 6, pp. 813-824, Jun. 1998.

[95] S. Osher, M. Burger, D. Goldfarb, J. Xu, and W. Yin, “An iterative regularization method for total variation-based image restoration,” Multiscale Model. Simul., vol. 4, pp. 460-489, 2005.

[96] A. Buades, B. Coll, and J.-M. Morel, “A non-local algorithm for image denoising,” in Proc. IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’05), vol. 2, pp. 60-65, San Diego, CA, Jun. 2005.

[97] C. Kervrann and J. Boulanger, “Optimal spatial adaptation for patch-based image denoising,” IEEE Trans. Image Process., vol. 15, no. 10, pp. 2866-2878, Oct. 2006.

[98] M. Zhang and B. K. Gunturk, “Multiresolution bilateral filtering for image denoising,” IEEE Trans. Image Process., vol. 17, no. 12, pp. 2324-2333, Dec. 2008.

[99] S. Schulte, B. Huysmans, A. Pizurica, E. E. Kerre, and W. Philips, “A new fuzzy-based wavelet shrinkage image denoising technique,” Lecture Notes Comput. Sci., vol. 4179, p. 12, 2006.

[100] S. Durand and J. Froment, “Reconstruction of wavelet coefficients using total variation minimization,” SIAM J. Sci. Comput., vol. 24, pp. 1754-1767, 2003.

[101] Y. Wang and H. Zhou, “Total variation wavelet-based medical image denoising,” Int. J. Biomed. Imag., vol. 2006, pp. 1-6, 2006.

[102] M. Black, G. Sapiro, D. Marimont, and D. Heeger, “Robust anisotropic diffusion,” IEEE Trans. Image Process., vol. 7, pp. 412-432, 1998.

[103] J. Weickert, Anisotropic Diffusion in Image Processing. Stuttgart: Teubner-Verlag, 1998.

[104] J. Bai and X.-C. Feng, “Fractional-order anisotropic diffusion for image denoising,” IEEE Trans. Image Process., vol. 16, no. 10, pp. 2492-2502, Oct. 2007.

[105] D. Geman and G. Reynolds, “Constrained restoration and the recovery of discontinuities,” IEEE Trans. Pattern Anal. Machine Intell., vol. 14, no. 3, pp. 367-383, Mar. 1992.

[106] R. Molina, “On the hierarchical Bayesian approach to image restoration: applications to astronomical images,” IEEE Trans. Pattern Anal. Machine Intell., vol. 16, no. 11, pp. 1123-1128, Nov. 1994.

[107] Q. Lu and T. Jiang, “Pixon-based image denoising with Markov random fields,” Pattern Recognition, vol. 34, no. 10, pp. 2029-2039, Oct. 2001.

[108] C. Bouman and K. Sauer, “A generalized Gaussian image model for edge-preserving MAP estimation,” IEEE Trans. Image Process., vol. 2, no. 3, pp. 296-310, Jul. 1993.

[109] J. Quan, W. G. Wee, and C. Y. Han, “A new wavelet based image denoising method,” in Proc. IEEE Data Compression Conference (DCC’12), p. 408, Snowbird UT, USA, Apr. 2012.

[110] D. L. Donoho, “Non-linear wavelet methods for recovery of signals, densities and spectra from indirect and noisy data,” in Proc. Symposia in Applied Mathematics: Different Perspectives on Wavelets, I. Daubechies, Ed., vol. 47, pp. 173-205, San Antonio: American Mathematical Society, 1993.

[111] F. J. Anscombe, “The transformation of Poisson, binomial and negative binomial data,” Biometrika, vol. 35, no. 3/4, pp. 246-254, 1948.

[112] P. Fryzlewicz and G. P. Nason, “A Haar-Fisz algorithm for Poisson intensity estimation,” Journal of Computational and Graphical Statistics, vol. 13, no. 3, pp. 621-638, 2004.

[113] M. Fisz, “The limiting distribution of a function of two independent random variables and its statistical application,” Colloq. Math., vol. 3, pp. 138-146, 1955.

[114] B. Zhang, J. M. Fadili, and J.-L. Starck, “Wavelets, ridgelets, and curvelets for Poisson noise removal,” IEEE Trans. Image Process., vol. 17, no. 7, pp. 1093-1108, Jul. 2008.

[115] P. R. Prucnal and B. E. A. Saleh, “Transformation of image-signal-dependent noise into image-signal-independent noise,” Optics Letters, vol. 6, no. 7, pp. 316-318, Jul. 1981.

[116] A. Foi, M. Trimeche, V. Katkovnik, and K. Egiazarian, “Practical Poissonian-Gaussian noise modeling and fitting for single-image raw-data,” IEEE Trans. Image Process., vol. 17, no. 10, pp. 1737-1754, 2008.

[117] E. D. Kolaczyk, “Nonparametric estimation of intensity maps using Haar wavelets and Poisson noise characteristics,” Astrophys. J., vol. 534, pp. 490-505, 2000.

[118] B. Zhang, M. J. Fadili, J.-L. Starck, and S. W. Digel, “Fast Poisson noise removal by biorthogonal Haar domain hypothesis testing,” Stat. Methodol., 2007.

[119] E. D. Kolaczyk, “Wavelet shrinkage estimation of certain Poisson intensity signals using corrected thresholds,” Statist. Sinica, vol. 9, pp. 119-135, 1999.

[120] R. D. Nowak and R. G. Baraniuk, “Wavelet-domain filtering for photon imaging systems,” IEEE Trans. Image Process., vol. 8, no. 5, pp. 666-678, May 1999.

[121] A. Antoniadis and T. Sapatinas, “Wavelet shrinkage for natural exponential families with quadratic variance functions,” Biometrika, vol. 88, pp. 805-820, 2001.

[122] E. D. Kolaczyk, “Bayesian multiscale models for Poisson processes,” J. Amer. Statist. Assoc., vol. 94, no. 447, pp. 920-933, Sep. 1999.

[123] K. E. Timmermann and R. D. Nowak, “Multiscale modeling and estimation of Poisson processes with application to photon-limited imaging,” IEEE Trans. Inf. Theory, vol. 45, no. 3, pp. 846-862, Apr. 1999.

[124] R. D. Nowak and E. D. Kolaczyk, “A statistical multiscale framework for Poisson inverse problems,” IEEE Trans. Inf. Theory, vol. 46, no. 8, pp. 1811-1825, Aug. 2000.

[125] S. Sardy, A. Antoniadis, and P. Tseng, “Automatic smoothing with wavelets for a wide class of distributions,” Journal of Computational and Graphical Statistics, vol. 13, no. 2, pp. 399-423, 2004.

[126] R. M. Willett and R. D. Nowak, “Multiscale Poisson intensity and density estimation,” IEEE Trans. Inf. Theory, vol. 53, no. 9, pp. 3171-3187, Sep. 2007.

[127] R. Willett, “Multiscale analysis of photon-limited astronomical images,” presented at the Statistical Challenges in Modern Astronomy IV, 2006.

[128] J. Quan, W. G. Wee, C. Y. Han, and X. Zhou, “A new Poisson noisy image denoising method based on the Anscombe transformation,” in Proc. International Conference on Image Processing, Computer Vision, and Pattern Recognition (IPCV’12), pp. 949-955, Las Vegas NV, USA, Jul. 2012.