
“© IEEE. Personal use of this material is permitted. However, permission to reprint/republish this material for advertising or promotional purposes or for creating new collective works for resale or redistribution to servers or lists, or to reuse any copyrighted component of this work in other works must be obtained from the IEEE.”

IEEE Transactions on Signal Processing, accepted October 2005, in press

The Gaussian Transform of distributions: definition, computation and application

Teodor Iulian Alecu*, Sviatoslav Voloshynovskiy and Thierry Pun

Abstract—This paper introduces the general-purpose Gaussian Transform of distributions, which aims at representing a generic symmetric distribution as an infinite mixture of Gaussian distributions. We start with the mathematical formulation of the problem and continue with the investigation of the conditions of existence of such a transform. Our analysis leads to the derivation of analytical and numerical tools for the computation of the Gaussian Transform, mainly based on the Laplace and Fourier transforms, as well as of the afferent set of properties (e.g. the transform of sums of independent variables). The Gaussian Transform of distributions is then analytically derived for the Gaussian and Laplacian distributions, and obtained numerically for the Generalized Gaussian and Generalized Cauchy distribution families. In order to illustrate the usage of the proposed transform, we further show how an infinite mixture of Gaussians model can be used to estimate/denoise non-Gaussian data with linear estimators based on the Wiener filter. The decomposition of the data into Gaussian components is straightforwardly computed with the Gaussian Transform, previously derived. The estimation is then based on a two-step procedure, the first step consisting in variance estimation, and the second step in data estimation through Wiener filtering. To this purpose we propose new generic variance estimators based on the Infinite Mixture of Gaussians prior. It is shown that the proposed estimators compare favorably in terms of distortion with the shrinkage denoising technique, and that the distortion lower bound under this framework is lower than the classical MMSE bound.

Index Terms—Gaussian mixture, Gaussian Transform of distributions, Generalized Gaussian, denoising, shrinkage.
I. INTRODUCTION

Gaussian distributions are extensively used in the (broad sense) signal processing community, mainly for computational benefits. For instance, in an estimation problem Gaussian priors yield quadratic functionals and linear solutions. In rate-distortion and coding theories, closed-form results are mostly available for Gaussian source and channel descriptions [11]. However, real data is most often non-Gaussian distributed,¹ and is best described by other types of distribution (e.g. in image processing, the most commonly used model for the distribution of wavelet coefficients is the Generalized Gaussian distribution [6]). The goal of the work presented in this paper is to describe non-Gaussian distributions as an infinite mixture of Gaussian distributions, through direct computation of the mixing functions from the non-Gaussian distribution.

In a related work [4], it was proven that any distribution can be approximated through a mixture of Gaussians up to an arbitrary level of precision. However, no hint was given by the author on how to obtain the desired mixture in the general case. In the radar community, the problem of modeling non-Gaussian radar clutter led to the theory of Spherically Invariant Random Processes (SIRP) [13], which aimed at generating non-Gaussian multivariate distributions using Gaussian distributions. However, the proposed solutions bypassed the mixing function ([12] and references therein), performing direct computation of the multivariate non-Gaussian distribution from the marginal probability distribution function and the desired covariance matrix. In [5], an analytical formula is given for an infinite mixture of Gaussians equivalent to the Laplacian distribution, and used in a source coding application. Unfortunately, no generalization was attempted by the authors. The work presented here has its roots in their proof and extends the concept through the introduced Gaussian Transform of distributions, which permits straightforward derivation of exact mixing functions for a wide range of symmetric distributions, including the Generalized Cauchy and the Generalized Gaussian distributions.

In order to exemplify the possible usage of the transform, we then consider the problem of estimation/denoising, motivated by works such as [9] and [10], where, in order to preserve the computational advantages provided by Gaussian modelling, non-Gaussian distributions are approximated with finite Gaussian mixtures obtained through iterative numerical optimization techniques. Relying on the Gaussian Transform, we propose new generic algorithms for denoising non-Gaussian data, based on the description of non-Gaussian distributions as an Infinite Mixture of Gaussians (IMG).

The rest of the paper is divided in three main sections. In Section II we define the Gaussian Transform, analyze its existence, investigate its properties and derive the mathematical tools for analytical and/or numerical computation. In Section III we exemplify both the transform, for some typical distributions such as the Generalized Gaussian and the Generalized Cauchy, and some of the properties deduced in Section II. The last section tackles the problem of estimation/denoising and proposes new Gaussian Transform based denoising schemes, exemplifying them for the Generalized Gaussian Distribution (GGD) family. The obtained results are compared in terms of distortion with the state-of-the-art denoising method of Moulin and Liu [7].

¹ This work is supported in part by the Swiss NCCR IM2 (Interactive Multimodal Information Management, http://www.im2.ch), the EU Network of Excellence Similar (http://www.similar.cc) and the SNF grant N° PP002-68653/1. T. I. Alecu, S. Voloshynovskiy and T. Pun are with the Computer Vision and Multimedia Laboratory, University of Geneva, 24 Rue Général-Dufour, 1204 Geneva, Switzerland (corresponding author: [email protected]; phone: +41 (22) 379 10 84; fax: +41 (22) 379 77 80).

II. THE GAUSSIAN TRANSFORM OF DISTRIBUTIONS

A. Definition and existence

We consider a generic symmetric continuous distribution $p(x)$. As we are aiming at representing it through an infinite mixture of Gaussians, we can safely disregard the mean, and assume for simplicity reasons that $p(x)$ is zero-mean. We are looking for an integral representation of the form:

$$\int_0^{\infty} G(\sigma^2)\,\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2 = p(x), \qquad (1)$$

where $\mathcal{N}(x\,|\,\sigma^2)$ is the zero-mean Gaussian distribution:

$$\mathcal{N}(x\,|\,\sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}}\,e^{-\frac{x^2}{2\sigma^2}},$$

and $G(\sigma^2)$ is the mixing function that should reproduce the original $p(x)$. We can now introduce the Gaussian Transform.

Definition 1: Gaussian Transform. The direct Gaussian Transform $\mathcal{G}$ is defined as the operator which transforms $p(x)$ into $G(\sigma^2)$, and the Inverse Gaussian Transform $\mathcal{G}^{-1}$ is defined as the operator which maps $G(\sigma^2)$ back to $p(x)$:

$$\mathcal{G}: p(x) \mapsto G(\sigma^2); \qquad \mathcal{G}^{-1}: G(\sigma^2) \mapsto p(x).$$

Obviously, $\mathcal{G}^{-1}$ is simply given by (1):

$$\mathcal{G}^{-1}\big(G(\sigma^2)\big) = \int_0^{\infty} G(\sigma^2)\,\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2 = p(x). \qquad (2)$$

The Gaussian Transform of a distribution exists if, given $p(x)$, a mixing distribution $G(\sigma^2)$ can be found such as to comply with (1). This can be summarized in three conditions.

Condition 1. For a given $p(x)$, a function $G(\sigma^2)$ defined according to (1) exists.

Condition 2. This function is non-negative.

Condition 3. Its integral $\int_0^{\infty} G(\sigma^2)\,d\sigma^2$ is equal to 1.

The last condition is a consequence of Condition 1. Indeed, if $G(\sigma^2)$ exists, then integrating both sides of (1) with respect to $x$ and inverting the integration order one obtains:

$$\int_{-\infty}^{\infty}\left(\int_0^{\infty} G(\sigma^2)\,\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2\right)dx = \int_{-\infty}^{\infty} p(x)\,dx,$$
$$\int_0^{\infty} G(\sigma^2)\left(\int_{-\infty}^{\infty}\mathcal{N}(x\,|\,\sigma^2)\,dx\right)d\sigma^2 = 1.$$

Finally, since $\mathcal{N}(x\,|\,\sigma^2)$ is a distribution:

$$\int_0^{\infty} G(\sigma^2)\,d\sigma^2 = 1.$$
In order to investigate Condition 1, the existence of $G(\sigma^2)$, we perform the variable substitutions $s = x^2$ and $t = \frac{1}{2\sigma^2}$. Since $p(x)$ is symmetric, it can be rewritten as $p(x) = p(\sqrt{s}) = \tilde{p}(s)$. The left-hand side of (1) then transforms to:

$$\int_0^{\infty} G(\sigma^2)\,\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2 = \int_0^{\infty} G\!\left(\frac{1}{2t}\right)\frac{1}{2t^2}\sqrt{\frac{t}{\pi}}\;e^{-st}\,dt.$$

According to the definition of the Laplace Transform $\mathcal{L}$ [1], equation (1) finally takes the form:

$$\mathcal{L}\!\left(G\!\left(\frac{1}{2t}\right)\frac{1}{2t^2}\sqrt{\frac{t}{\pi}}\right)(s) = \tilde{p}(s). \qquad (3)$$

Thus, $G(\sigma^2)$ is linked to the original $p(x)$ through the Laplace Transform, and can be computed using the Inverse Laplace Transform $\mathcal{L}^{-1}$. The direct Gaussian Transform is therefore given by:

$$\mathcal{G}\big(p(x)\big) = \frac{1}{\sigma^2}\sqrt{\frac{\pi}{2\sigma^2}}\;\mathcal{L}^{-1}\big(\tilde{p}(s)\big)(t)\Big|_{t=\frac{1}{2\sigma^2}}, \qquad (4)$$

and the Inverse Gaussian Transform can be computed as:

$$\mathcal{G}^{-1}\big(G(\sigma^2)\big) = \mathcal{L}\!\left(G\!\left(\frac{1}{2t}\right)\frac{1}{2t^2}\sqrt{\frac{t}{\pi}}\right)(s)\Big|_{s=x^2}. \qquad (5)$$

Consequently, the existence of the Gaussian Transform is conditioned by the existence of the Inverse Laplace Transform of $\tilde{p}(s)$. Using the general properties of the Laplace Transform, it is sufficient [1] to prove that the limit at infinity of $s\cdot\tilde{p}(s)$ is bounded, or equivalently:

$$\lim_{x\to\infty} x^2\,p(x) < \infty. \qquad (6)$$

The above condition is satisfied by all the distributions from the Generalized Cauchy family, as well as by all the distributions with finite variance.

Equation (4) allows for straightforward identification of Gaussian Transforms for distributions whose Laplace Transforms are known, by simply using handbook tables. Unfortunately, condition (6) does not guarantee compliance with Condition 2 (non-negativity). As it is rather difficult to verify this constraint a priori, the test should be performed a posteriori, either analytically or numerically. However, by imposing the non-negativity constraint in (1), one can see that it is necessary to have a strictly decreasing distribution function $p(x)$ on positive $x$, as $p(x)$ is described by a sum of strictly decreasing zero-mean Gaussian functions. For the same reason, but not related to non-negativity, the distribution function $p(x)$ should also be of class $C^{\infty}$.
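To make the route from $\tilde{p}(s)$ to a concrete mixing function tangible, the following minimal Python sketch applies (4) to the Cauchy density using the tabulated pair $\mathcal{L}^{-1}\{1/(s+b^2)\}(t) = e^{-b^2 t}$ (worked out analytically as example (17) in Section III), and then verifies Condition 3 and the mixture equation (1) by numerical quadrature. The function names and parameter values are ours, chosen for illustration only.

```python
import numpy as np
from scipy.integrate import quad

b = 1.0  # Cauchy scale parameter (illustrative value)

def G(s2):
    """Mixing function of the Cauchy distribution obtained from (4) with the
    tabulated pair L^{-1}{1/(s + b^2)}(t) = exp(-b^2 t); cf. example (17)."""
    return b / (s2 * np.sqrt(2.0 * np.pi * s2)) * np.exp(-b**2 / (2.0 * s2))

def gauss(x, s2):
    """Zero-mean Gaussian density N(x | sigma^2)."""
    return np.exp(-x**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

# Condition 3: the mixing function integrates to 1 over sigma^2.
print(quad(G, 0.0, np.inf)[0])  # ~ 1.0

# Equation (1): the Gaussian mixture reproduces p(x) = (1/pi) b / (b^2 + x^2).
for x in (0.0, 0.5, 2.0):
    mixture = quad(lambda s2: G(s2) * gauss(x, s2), 0.0, np.inf)[0]
    print(mixture, b / (np.pi * (b**2 + x**2)))
```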
B. Properties of the Gaussian Transform

Property 1: Uniqueness. If two functions $G_1(\sigma^2)$ and $G_2(\sigma^2)$ exist such that $\mathcal{G}^{-1}\big(G_1(\sigma^2)\big) = p(x)$ and $\mathcal{G}^{-1}\big(G_2(\sigma^2)\big) = p(x)$, then they are identical: $G_1(\sigma^2) = G_2(\sigma^2)$. Conversely, if two probability distributions $p_{X_1}(x)$ and $p_{X_2}(x)$ have the same Gaussian Transform $G(\sigma^2)$, then they are identical.

Proof. The second proposition of the uniqueness property is proved straightforwardly using (1). In what concerns the first proposition, it is a direct consequence of the uniqueness of the Laplace Transform [1], which states that if two functions have the same Laplace transform, then they are identical. It is then sufficient to observe that $\mathcal{G}^{-1}$ is computed as a Laplace Transform (5).

Property 2: Mean value (variance conservation). The mean value of the Gaussian Transform is equal to the variance of the original distribution:

$$\int_{-\infty}^{\infty} x^2\,p(x)\,dx = \int_0^{\infty} \sigma^2\,G(\sigma^2)\,d\sigma^2. \qquad (7)$$

Proof.
$$\int_{-\infty}^{\infty} x^2\,p(x)\,dx = \int_{-\infty}^{\infty} x^2\left(\int_0^{\infty} G(\sigma^2)\,\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2\right)dx = \int_0^{\infty} G(\sigma^2)\left(\int_{-\infty}^{\infty} x^2\,\mathcal{N}(x\,|\,\sigma^2)\,dx\right)d\sigma^2 = \int_0^{\infty} \sigma^2\,G(\sigma^2)\,d\sigma^2,$$
q.e.d.

Property 3: Value at infinity. If $\lim_{x\to\infty} x^2\,p(x) < \infty$, then the Gaussian Transform tends to 0 when $\sigma^2$ tends to infinity:

$$\lim_{\sigma^2\to\infty} G(\sigma^2) = 0. \qquad (8)$$

Proof. We use the initial value theorem for the Laplace Transform [1] and the direct formula (3):

$$\lim_{t\to 0}\; G\!\left(\frac{1}{2t}\right)\frac{1}{2t^2}\sqrt{\frac{t}{\pi}} = \lim_{s\to\infty} s\cdot\tilde{p}(s) = \lim_{x\to\infty} x^2\cdot p(x);$$

then $\lim_{\sigma^2\to\infty} \sigma^3 G(\sigma^2) < \infty \;\Rightarrow\; \lim_{\sigma^2\to\infty} G(\sigma^2) = 0$.

We study now the influence of basic operations on the Gaussian Transform, such as scalar multiplication and addition of independent variables.

Property 4: Scaling. If $X_1$ and $X_2$ are random variables such that $X_2 = \alpha X_1$ and $\mathcal{G}\big(p_{X_1}(x)\big) = G_{X_1}$, then the Gaussian Transform corresponding to the scaled variable $X_2$ is a scaled version of $G_{X_1}$:

$$G_{X_2}(\sigma^2) = \frac{1}{\alpha^2}\,G_{X_1}\!\left(\frac{\sigma^2}{\alpha^2}\right). \qquad (9)$$

Proof. We begin by observing that the probability distribution of the random variable $X_2$ can be expressed as $p_{X_2}(x) = \frac{1}{\alpha}\,p_{X_1}\!\left(\frac{x}{\alpha}\right)$. Then we compute the Inverse Gaussian Transform of the expression proposed in (9) using (2):

$$\mathcal{G}^{-1}\big(G_{X_2}(\sigma^2)\big) = \int_0^{\infty}\frac{1}{\alpha^2}\,G_{X_1}\!\left(\frac{\sigma^2}{\alpha^2}\right)\mathcal{N}(x\,|\,\sigma^2)\,d\sigma^2 = \int_0^{\infty} G_{X_1}(\sigma^2)\,\frac{1}{\alpha}\,\mathcal{N}\!\left(\frac{x}{\alpha}\,\Big|\,\sigma^2\right)d\sigma^2,$$

where the second equality follows from the substitution $\sigma^2 \to \alpha^2\sigma^2$, so that:

$$\mathcal{G}^{-1}\big(G_{X_2}(\sigma^2)\big) = \frac{1}{\alpha}\,p_{X_1}\!\left(\frac{x}{\alpha}\right) = p_{X_2}(x).$$

Finally, using the uniqueness property, we conclude that the Gaussian Transform $\mathcal{G}\big(p_{X_2}(x)\big)$ is unique and given by (9).

Property 5: Convolution. If $X_1$ and $X_2$ are two independent random variables with $\mathcal{G}\big(p_{X_1}(x)\big) = G_{X_1}(\sigma^2)$ and $\mathcal{G}\big(p_{X_2}(x)\big) = G_{X_2}(\sigma^2)$, then the Gaussian Transform of their sum is the convolution of their respective Gaussian Transforms (the result can be generalized for the sum of multiple variables):

$$\mathcal{G}\big(p_{X_1+X_2}(x)\big) = G_{X_1}(\sigma^2) * G_{X_2}(\sigma^2). \qquad (10)$$

Proof. Consider the sum $X = X_1 + X_2$. Since $G_{X_1}$ exists, $X_1$ is a random Gaussian variable with variance $\sigma^2_{X_1}$ distributed according to the probability distribution $G_{X_1}$. Similarly, $X_2$ is a random Gaussian variable with variance $\sigma^2_{X_2}$ distributed according to $G_{X_2}$. Then $X$ is also a random Gaussian variable, with variance $\sigma^2_X = \sigma^2_{X_1} + \sigma^2_{X_2}$. But $\sigma^2_{X_1}$ and $\sigma^2_{X_2}$ are independent variables drawn from $G_{X_1}$ and $G_{X_2}$. It follows that $\sigma^2_X$ is a random variable described by the probability distribution $G_X = G_{X_1} * G_{X_2}$, q.e.d.

Corollary. If $X_1$ and $X_2$ are independent random variables and $X_2$ is Gaussian distributed, $p_{X_2}(x) = \mathcal{N}(x\,|\,\sigma^2_{X_2})$, then the Gaussian Transform of their sum is a shifted version of $G_{X_1}$:

$$G_{X_1+X_2}(\sigma^2) = \begin{cases} 0, & \sigma^2 < \sigma^2_{X_2} \\ G_{X_1}\big(\sigma^2 - \sigma^2_{X_2}\big), & \sigma^2 \ge \sigma^2_{X_2} \end{cases}. \qquad (11)$$

C. Numerical computation

The computation of the Gaussian Transform for distributions not available in handbooks is still possible through the complex inversion method for Laplace Transforms known as the Bromwich integral [3]:

$$\mathcal{L}^{-1}\big(f(s)\big)(t) = \frac{1}{2\pi i}\int_{\varepsilon - i\infty}^{\varepsilon + i\infty} f(u)\,e^{ut}\,du, \qquad (12)$$

where $\varepsilon$ is a real positive constant satisfying $\varepsilon > \sup\big(\mathrm{Re}(\mathrm{poles}(f))\big)$ and $u$ is an auxiliary variable. The equation (12) can be rearranged as a Fourier Transform, allowing the use of the numerous numerical and/or symbolical packages available ($\omega$ is the variable in the Fourier space):

$$\mathcal{L}^{-1}\big(f(s)\big) = e^{\varepsilon t}\,\mathcal{F}^{-1}\big(f(\varepsilon + i\omega)\big). \qquad (13)$$

Very often $p(x)$ has no poles, being a continuous and bounded function, and in this case it might be very practical to evaluate (13) in the limit case $\varepsilon \to 0$. Using (4) and (13):

$$\mathcal{G}\big(p(x)\big) = \frac{1}{\sigma^2}\sqrt{\frac{\pi}{2\sigma^2}}\;\mathcal{F}^{-1}\big(\tilde{p}(i\omega)\big)(t)\Big|_{t=\frac{1}{2\sigma^2}}. \qquad (14)$$
When the original distribution is only numerically known, approximation of $G(\sigma^2)$ is still possible, either through analytical approximations of $p(x)$ followed by (4) or (14), or through solving the inverse problem yielded by (1), or through real inversion of the Laplace transform using (4). The accuracy of the obtained transforms can then be assessed using an appropriate metric or distance, such as the Kullback-Leibler divergence. However, the abovementioned approximation methods and the accuracy study constitute stand-alone problems out of the scope of the present paper.
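As a sketch of the numerical route (14) in the limit $\varepsilon \to 0$, the snippet below evaluates $\mathcal{F}^{-1}\big(\tilde{p}(i\omega)\big)(t)$ for the Laplacian as a cosine transform of $\mathrm{Re}\,\tilde{p}(i\omega)$ (legitimate here because the inverse Laplace transform is real and causal), and compares the result with the closed form (16) derived in Section III. The truncation point and quadrature settings are illustrative choices of ours, not prescriptions from the paper.

```python
import numpy as np
from scipy.integrate import quad

lam = 1.0  # Laplacian parameter: p(x) = lam/2 * exp(-lam |x|) (illustrative)

def p_tilde_re(w):
    """Re p~(i w) for the Laplacian, p~(s) = lam/2 * exp(-lam sqrt(s)),
    using sqrt(i w) = sqrt(w/2) (1 + i) for w >= 0."""
    r = lam * np.sqrt(w / 2.0)
    return 0.5 * lam * np.exp(-r) * np.cos(r)

def gauss_transform(s2, w_max=500.0):
    """Eq. (14) with eps -> 0: inverse cosine transform of Re p~(i w),
    truncated at w_max (the integrand decays like exp(-sqrt(w/2)))."""
    t = 1.0 / (2.0 * s2)
    h, _ = quad(lambda w: p_tilde_re(w) * np.cos(w * t), 0.0, w_max, limit=500)
    h *= 2.0 / np.pi
    return np.sqrt(np.pi / (2.0 * s2)) / s2 * h

# Compare with the closed form (16): G(sigma^2) = lam^2/2 * exp(-lam^2 sigma^2 / 2).
for s2 in (0.5, 1.0, 2.0):
    print(gauss_transform(s2), 0.5 * lam**2 * np.exp(-0.5 * lam**2 * s2))
```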

III. EXAMPLES OF GAUSSIAN TRANSFORMS

A. Analytic Gaussian Transforms

The most obvious and natural example is the Gaussian Transform of a Gaussian distribution $\mathcal{N}(x\,|\,\sigma_0^2)$. One would expect to have (with $\delta$ the Dirac function):

$$\mathcal{G}\big(\mathcal{N}(x\,|\,\sigma_0^2)\big) = \delta\big(\sigma^2 - \sigma_0^2\big). \qquad (15)$$

Proof. It is trivial that (15) verifies equation (1). Then, by using the uniqueness of the Gaussian Transform, one can conclude that the Gaussian Transform of a Gaussian distribution $\mathcal{N}(x\,|\,\sigma_0^2)$ is a Dirac function centered in $\sigma_0^2$.

The convolution property (10) can now be used to prove a well-known result in statistics: the sum of two independent Gaussian variables, with respective probability laws $\mathcal{N}(x\,|\,\sigma^2_{X_1})$ and $\mathcal{N}(x\,|\,\sigma^2_{X_2})$, is another Gaussian variable with probability distribution $\mathcal{N}(x\,|\,\sigma^2_{X_1}+\sigma^2_{X_2})$ (the extension to non-zero-mean distributions is trivial). Proof:

$$\mathcal{G}\big(p_{X_1}(x)\big) = \delta\big(\sigma^2-\sigma^2_{X_1}\big); \quad \mathcal{G}\big(p_{X_2}(x)\big) = \delta\big(\sigma^2-\sigma^2_{X_2}\big);$$
$$\mathcal{G}\big(p_{X_1+X_2}(x)\big) = \mathcal{G}\big(p_{X_1}(x)\big) * \mathcal{G}\big(p_{X_2}(x)\big) = \delta\big(\sigma^2-(\sigma^2_{X_1}+\sigma^2_{X_2})\big).$$

Then, inverting the Gaussian Transform:

$$p_{X_1+X_2}(x) = \mathcal{G}^{-1}\Big(\delta\big(\sigma^2-(\sigma^2_{X_1}+\sigma^2_{X_2})\big)\Big) = \mathcal{N}\big(x\,|\,\sigma^2_{X_1}+\sigma^2_{X_2}\big).$$

Similarly to (15), it is possible to compute the Gaussian Transforms of other usual symmetric distributions using the Laplace transform tables [1]. We exemplify with the Laplacian and Cauchy distributions.

Laplacian distribution:

$$p^L_X(x\,|\,\lambda) = \frac{\lambda}{2}\,e^{-\lambda|x|}; \quad \tilde{p}^L_X(s\,|\,\lambda) = \frac{\lambda}{2}\,e^{-\lambda\sqrt{s}};$$
$$\mathcal{G}\big(p^L_X(x\,|\,\lambda)\big) = \frac{\lambda^2}{2}\,e^{-\frac{\lambda^2}{2}\sigma^2}. \qquad (16)$$

As mentioned, the result (16) was already proven in [5].

Cauchy distribution:

$$p^C_X(x\,|\,b) = \frac{1}{\pi}\,\frac{b}{b^2+x^2}; \quad \tilde{p}^C_X(s\,|\,b) = \frac{1}{\pi}\,\frac{b}{b^2+s};$$
$$\mathcal{G}\big(p^C_X(x\,|\,b)\big) = \frac{b}{\sigma^2\sqrt{2\pi\sigma^2}}\,e^{-\frac{b^2}{2\sigma^2}}. \qquad (17)$$

The results (15), (16) and (17) are plotted in Fig. 1.

Figure 1. Gaussian Transforms of the Laplacian, Gaussian and Cauchy distributions.

As a remark, the variance of the Cauchy distribution being infinite, its Gaussian Transform has an infinite mean value (7).

To exemplify the Corollary to the Convolution Property, consider now Cauchy data contaminated with independent additive white Gaussian (AWG) noise. Then the Gaussian Transform of the measured data (Fig. 2) is a shifted version of the original data transform (11) (see the numerical sketch at the end of this section).

Figure 2. Shift of the Gaussian Transform following data contamination with AWG noise.

The previous analytical results will be generalized in subsection B through numerical computations.

B. Numerical computation of Gaussian Transforms

This part illustrates the numerical computation of the Gaussian Transform through (14) for the Generalized Gaussian and Generalized Cauchy distribution families.

1) Generalized Gaussian Distribution (GGD)

The GGD family is described by an exponential probability density function with parameters $\gamma$ and $\sigma_\gamma$:

$$p_X(x\,|\,\gamma,\sigma_\gamma) = \frac{\gamma\,\eta(\gamma)}{2\,\Gamma\!\left(\frac{1}{\gamma}\right)\sigma_\gamma}\,\exp\!\left(-\left(\frac{\eta(\gamma)\,|x|}{\sigma_\gamma}\right)^{\gamma}\right), \qquad (18)$$

where $\eta(\gamma) = \sqrt{\Gamma\!\left(\frac{3}{\gamma}\right)\big/\Gamma\!\left(\frac{1}{\gamma}\right)}$. For $\gamma = 1$ the GGD particularizes to the Laplacian distribution, while for $\gamma = 2$ one obtains the Gaussian distribution. The Gaussian Transform of the Generalized Gaussian distribution cannot be obtained in analytical form using (4). However, it does exist (6) and can be calculated numerically through (14):

$$\mathcal{G}\big(p_X(x\,|\,\gamma,\sigma_\gamma)\big)(\sigma^2) = \frac{1}{\sigma^2}\sqrt{\frac{\pi}{2\sigma^2}}\;\mathcal{F}^{-1}\big(\tilde{p}_X(i\omega\,|\,\gamma,\sigma_\gamma)\big)(t)\Big|_{t=\frac{1}{2\sigma^2}},$$
$$\tilde{p}_X(i\omega\,|\,\gamma,\sigma_\gamma) = \frac{\gamma\,\eta(\gamma)}{2\,\Gamma\!\left(\frac{1}{\gamma}\right)\sigma_\gamma}\,\exp\!\left(-\left(\frac{\eta(\gamma)}{\sigma_\gamma}\right)^{\gamma}(i\omega)^{\frac{\gamma}{2}}\right).$$

The Gaussian Transforms for $\gamma$ ranging from 0.5 to 2, with fixed $\sigma_\gamma = 1$, are plotted in Fig. 3. The transforms evolve from a Dirac-like distribution centered on 0 for small $\gamma$, to exponential for $\gamma = 1$, then Rayleigh-like for $\gamma = 1.2$, bell-shaped for $\gamma = 1.5$, and again Dirac-like centered at $\sigma_\gamma^2$ for $\gamma = 2$. As expected, the Gaussian Transform of the Laplacian distribution ($\gamma = 1$) is exponential (16).

Figure 3. Gaussian Transforms of GGD.

Unfortunately, the real part of the complex function diverges for periodical values of $\gamma$, which impedes the computation of the transform through this method for $\gamma > 2$. However, real data (in transform domains such as discrete cosine and wavelet) is mostly confined to $0 < \gamma < 2$ [6].

2) Generalized Cauchy Distribution (GCD)

The GCD probability density function is given by:

$$p^C_X(x\,|\,\nu,b) = \frac{\Gamma(\nu+0.5)}{\sqrt{\pi}\,\Gamma(\nu)}\,\frac{b^{2\nu}}{\left(b^2+x^2\right)^{\nu+0.5}},$$

and it particularizes to the Cauchy distribution for $\nu = 0.5$. Its Gaussian Transform can be computed through:

$$\mathcal{G}\big(p^C_X(x\,|\,\nu,b)\big)(\sigma^2) = \frac{1}{\sigma^2}\sqrt{\frac{\pi}{2\sigma^2}}\;\mathcal{F}^{-1}\!\left(\frac{\Gamma(\nu+0.5)}{\sqrt{\pi}\,\Gamma(\nu)}\,b^{2\nu}\big(b^2+i\omega\big)^{-\nu-0.5}\right)(t)\Big|_{t=\frac{1}{2\sigma^2}}.$$

Corresponding plots are given in Fig. 4. The GCD family possesses finite variance only for $\nu > 1$.

Figure 4. Gaussian Transforms of GCD.
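Returning to the Corollary example of Fig. 2, the shift (11) can be checked numerically: the mixture (1) built from the shifted transform must coincide with the density of the noisy data computed directly as the convolution $p_X * p_Z$. A minimal sketch for the Cauchy case, with illustrative parameter values of our choosing:

```python
import numpy as np
from scipy.integrate import quad

b, s2_z = 1.0, 0.25  # Cauchy scale and AWG noise variance (illustrative)

def G_cauchy(s2):
    """Gaussian Transform of the Cauchy distribution, eq. (17)."""
    return b / (s2 * np.sqrt(2.0 * np.pi * s2)) * np.exp(-b**2 / (2.0 * s2))

def gauss(x, s2):
    return np.exp(-x**2 / (2.0 * s2)) / np.sqrt(2.0 * np.pi * s2)

def p_noisy_mixture(y):
    """p_Y via (1) with the shifted transform (11): G_Y(s2) = G_X(s2 - s2_z)."""
    return quad(lambda s2: G_cauchy(s2 - s2_z) * gauss(y, s2), s2_z, np.inf)[0]

def p_noisy_conv(y):
    """Reference: p_Y = p_X * p_Z computed directly."""
    p_x = lambda x: b / (np.pi * (b**2 + x**2))
    return quad(lambda x: p_x(x) * gauss(y - x, s2_z), -np.inf, np.inf)[0]

for y in (0.0, 1.0, 3.0):
    print(p_noisy_mixture(y), p_noisy_conv(y))  # the two columns should agree
```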
IV. DENOISING

We are interested in the generic denoising problem of estimating the original scalar data $x$ from the measured data $y$, degraded by additive noise $z$:

$$y = x + z. \qquad (19)$$

We assume that the noise random variable $Z$ is zero-mean Gaussian distributed with variance $\sigma_Z^2$:

$$p_Z(z) = \frac{1}{\sqrt{2\pi\sigma_Z^2}}\,e^{-\frac{z^2}{2\sigma_Z^2}}. \qquad (20)$$

We further assume that the probability density function $p_X(x)$, which describes the source random variable $X$, is a zero-mean symmetric distribution, and that its Gaussian Transform $G_X(\sigma^2) = \mathcal{G}\big(p_X(x)\big)$ exists.

For exemplification of the proposed techniques we consider $p_X(x)$ to be the Generalized Gaussian distribution (18). We base our estimation on the Maximum a Posteriori (MAP) principle [14]:

$$\hat{x} = \arg\max_x\, p_{X|Y}(x\,|\,y),$$

which yields for i.i.d. zero-mean Gaussian data and noise (with variances $\sigma_X^2$ and $\sigma_Z^2$) the classical Wiener filter:

$$\hat{x} = \frac{\sigma_X^2}{\sigma_X^2+\sigma_Z^2}\,y. \qquad (21)$$

We also note that the estimate (21) is also the Minimum Mean Square Error (MMSE) estimate for i.i.d. zero-mean Gaussian data and noise:

$$\hat{x}_{MMSE} = \arg\min_{\hat{x}}\int_x (x-\hat{x})^2\,p_{X|Y}(x\,|\,y)\,dx,$$
$$\hat{x}_{MMSE} = \int_x x\,p_{X|Y}(x\,|\,y)\,dx. \qquad (22)$$

According to the IMG model, each data sample is interpreted to be drawn from a Gaussian distribution with variance $\sigma_X^2$ distributed according to the Gaussian Transform $G_X(\sigma^2)$. Consequently, using the prior knowledge of $G_X(\sigma^2)$, we can estimate the variance from $y$ and use Wiener filtering (21). Thus, our estimator is given by:

$$\hat{x} = \frac{\hat{\sigma}_X^2(y)}{\hat{\sigma}_X^2(y)+\sigma_Z^2}\,y. \qquad (23)$$

The initial problem is now reduced to a variance estimation problem.
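The two-step structure of (23) is summarized by the following skeleton; `variance_estimator` stands for any of the estimators introduced in the remainder of this section (the function names are ours):

```python
import numpy as np

def wiener(y, s2_x, s2_z):
    """Wiener filter, eq. (21)."""
    return s2_x / (s2_x + s2_z) * y

def img_estimate(y, variance_estimator, s2_z):
    """Two-step IMG estimator, eq. (23): per-sample variance estimation
    followed by Wiener filtering with the estimated variance."""
    return wiener(y, variance_estimator(y), s2_z)
```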
For this purpose, we begin by introducing the cumulative estimation technique.

A. Cumulative Estimation

We are looking for an estimator that provides variance estimates consistent with the original data distribution, in the sense that the description of the data as a Gaussian mixture should be identical to the non-Gaussian original description. We set the above assertion as a consistency principle.

Principle 1: Consistency. The probability distribution function of the variance estimate should be identical to the Gaussian Transform of the prior distribution $p_X(x)$.

Moreover, since the distributions $p_X(x)$ and $p_Z(z)$ are symmetric and zero-mean, we assume an estimator of the form:

$$\hat{\sigma}_X^2(y) = \xi^M_{\sigma_X^2}\big(M(y)\big),$$

where $\xi^M_{\sigma_X^2}$ denotes the consistent variance estimator depending on the symmetric real non-negative function

$$M: \mathbb{R} \to \mathbb{R}^{0+}, \quad M(y) = M(-y),$$

where $\mathbb{R}$ and $\mathbb{R}^{0+}$ denote the real, respectively the real non-negative numbers. As an example, a possible $M$ function is the absolute value of $y$. We further assume that this estimator is monotonically increasing with the function $M$:

$$M(y_1) < M(y_2) \;\Rightarrow\; \xi^M_{\sigma_X^2}\big(M(y_1)\big) < \xi^M_{\sigma_X^2}\big(M(y_2)\big). \qquad (24)$$

If one chooses the $M$ function as the absolute value $|y|$, this simply means that if the absolute value of $y$ is larger, then the estimated variance $\hat{\sigma}_X^2(y)$ is also larger. If the variables $y$ are vectors (e.g. in a local estimation paradigm), one can similarly use as the $M$ function the variance of the measurement vector, and (24) would mean that if the variance of the measurement is larger, then the estimated variance is also larger.

The assumption in (24) implies that the cumulative distribution function of the variance estimates is equal to the cumulative distribution function of $M$:

$$p\big(\hat{\sigma}_X^2 \le \hat{\sigma}_X^2(y)\big) = p\big(M \le M(y)\big). \qquad (25)$$

Considering continuous probability density functions and incorporating the consistency condition (the distribution of the estimates should be identical to $G_X$), (25) can be rewritten as:

$$P_M\big(M(y)\big) = PG_X\Big(\xi^M_{\sigma_X^2}\big(M(y)\big)\Big),$$
$$P_M\big(M(y)\big) = \int_0^{M(y)} p_M(u)\,du; \quad PG_X(\sigma^2) = \int_0^{\sigma^2} G_X(u)\,du, \qquad (26)$$

where $p_M$ is the probability density function describing $M(y)$, and $P_M$ and $PG_X$ are cumulative probability functions, as defined in (26). Then the estimator we are looking for is simply given by:

$$\xi^M_{\sigma_X^2}\big(M(y)\big) = PG_X^{-1}\Big(P_M\big(M(y)\big)\Big), \qquad (27)$$

where $PG_X^{-1}$ is the mathematical inverse of the function $PG_X$: if $PG_X(\sigma^2) = u$ then $PG_X^{-1}(u) = \sigma^2$. We denote (27) as the cumulative estimator, simply because it uses cumulative probability functions for the estimation.
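A grid-based sketch of the cumulative estimator (27), assuming the densities $p_M$ and $G_X$ are numerically known on grids (as is the case when the transform itself is computed through (14)); the inversion of $PG_X$ is done by interpolation, and the helper name is ours:

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

def cumulative_estimator(m_val, grid_m, p_m, grid_s2, g_x):
    """Cumulative estimator (27): PG_X^{-1}(P_M(M(y))) on sampled densities.
    p_m: density of M(y) on grid_m; g_x: Gaussian Transform on grid_s2."""
    P_m = cumulative_trapezoid(p_m, grid_m, initial=0.0)    # P_M of eq. (26)
    PG_x = cumulative_trapezoid(g_x, grid_s2, initial=0.0)  # PG_X of eq. (26)
    u = np.interp(m_val, grid_m, P_m)                       # P_M(M(y))
    return np.interp(u, PG_x, grid_s2)                      # invert PG_X
```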

We further infer that the cumulative estimator is robust with respect to monotonically increasing transformations of $M$.

Theorem 1. If $M_1$ and $M_2$ are two symmetric real non-negative functions $M_1, M_2: \mathbb{R} \to \mathbb{R}^{0+}$, and if $M_1(y_1) < M_1(y_2) \Leftrightarrow M_2(y_1) < M_2(y_2)$, then the associated cumulative estimators are identical:

$$\xi^{M_1}_{\sigma_X^2}\big(M_1(y)\big) = \xi^{M_2}_{\sigma_X^2}\big(M_2(y)\big).$$

Proof. Since the inequalities are preserved (theorem hypothesis), the cumulative distribution functions $P_{M_1}$ and $P_{M_2}$ are equal:

$$p\big(M_1 \le M_1(y)\big) = p\big(M_2 \le M_2(y)\big),$$
$$P_{M_1}\big(M_1(y)\big) = \int_0^{M_1(y)} p_{M_1}(u)\,du = \int_0^{M_2(y)} p_{M_2}(u)\,du = P_{M_2}\big(M_2(y)\big).$$

The proof is ended by corroborating the above equality with the cumulative estimation formula (27).

B. IMG cumulative estimator

The cumulative estimator (27) can be directly used in (23) by setting the $M$ function as $M(y) = |y|$, and the IMG cumulative estimator (IMG CE) is simply:

$$\hat{x}_{\mathrm{IMG\,CE}} = \frac{\hat{\sigma}^2_{X_{CE}}(y)}{\hat{\sigma}^2_{X_{CE}}(y)+\sigma_Z^2}\,y, \quad \text{with}$$
$$\hat{\sigma}^2_{X_{CE}}(y) = PG_X^{-1}\big(P_M(y)\big), \quad P_M(y) = 2\int_0^{|y|} p_Y(u)\,du. \qquad (28)$$

As $X$ and $Z$ are assumed to be independent, $p_Y(y)$ can be computed as the convolution of $p_X(x)$ and $p_Z(z)$: $p_Y = p_X * p_Z$. In the general case the operations in (28) have to be performed numerically. However, if the distribution $p_X(x)$ is Laplacian (GGD with $\gamma = 1$), its Gaussian Transform is, from (16) and (18):

$$G_{X|\gamma=1,\sigma_\gamma}(\sigma^2) = \frac{1}{\sigma_\gamma^2}\,e^{-\frac{\sigma^2}{\sigma_\gamma^2}}, \qquad (29)$$

and the cumulative estimator from (28) reduces to:

$$\hat{\sigma}^2_{X_{CE},\gamma=1}(y) = -\sigma_\gamma^2\,\ln\big(1 - P_M(y)\big). \qquad (30)$$

As $\gamma$ tends to 2, the Gaussian Transform of the GGD tends asymptotically to a Dirac function centered at $\sigma_\gamma^2$ (15), and the estimator (28) tends to the classical Wiener filter (21).

In order to assess the performance of the proposed estimators, we measure the accuracy of the estimate according to the L2 norm:

$$e(x,\hat{x}) = (x-\hat{x})^2.$$

The global distortion induced by an estimator is therefore:

$$D = \iint e(x,\hat{x})\,p_X(x)\,p_Z(z)\,dx\,dz. \qquad (31)$$

We compare our estimator in terms of distortion with the estimators known as shrinkage functions from [7]. The idea of shrinking originates from [8], but [7] provides both improved results and a statistical interpretation of the functions. In fact, the shrinkage functions from [7] are the result of direct MAP estimation, which in the case of GGD data and Gaussian noise yields $\hat{x}$ as the solution of the equation in $x$:

$$y = x + \sigma_Z^2\,\gamma\left(\frac{\eta(\gamma)}{\sigma_\gamma}\right)^{\gamma}|x|^{\gamma-1}\,\mathrm{sign}(x). \qquad (32)$$

Equation (32) has analytical solutions for $\gamma = 0.5$ (cubic), $\gamma = 1$ or $2$ (linear) and $\gamma = 1.5$ (quadratic). The resulting shrinkage functions have thresholding characteristics for $\gamma \le 1$, and yield the Wiener filter for $\gamma = 2$. The equivalent IMG CE functions do not present thresholding characteristics for any $\gamma$, behaving in a continuous manner. A graphic illustration of the shrinkage functions and of the equivalent IMG CE functions obtained via (28) is presented in Fig. 5.
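For the Laplacian prior, the whole IMG CE pipeline (28)-(30) fits in a few lines: $p_Y$ is obtained by numerical convolution, $P_M$ by cumulative integration, and the variance estimate by the closed form (30). Parameter values are illustrative and the helper names are ours.

```python
import numpy as np
from scipy.integrate import cumulative_trapezoid

s_g, s2_z = 1.0, 0.1           # GGD scale (gamma = 1) and noise variance
lam = np.sqrt(2.0) / s_g       # Laplacian rate giving variance s_g^2

# p_Y = p_X * p_Z on a symmetric grid (X Laplacian, Z Gaussian).
u = np.linspace(-20.0, 20.0, 2**14)
du = u[1] - u[0]
p_x = 0.5 * lam * np.exp(-lam * np.abs(u))
p_z = np.exp(-u**2 / (2.0 * s2_z)) / np.sqrt(2.0 * np.pi * s2_z)
p_y = np.convolve(p_x, p_z, mode="same") * du

# P_M(y) = 2 * int_0^{|y|} p_Y(u) du, tabulated on the positive half-grid.
pos = u >= 0.0
P_half = cumulative_trapezoid(p_y[pos], u[pos], initial=0.0)

def img_ce(y):
    P_m = min(2.0 * np.interp(abs(y), u[pos], P_half), 1.0 - 1e-12)
    s2_hat = -s_g**2 * np.log1p(-P_m)       # closed form (30)
    return s2_hat / (s2_hat + s2_z) * y     # Wiener step (23)

print(img_ce(0.3), img_ce(3.0))
```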

Figure 5. Shrinkage functions and IMG CE estimation curves.

One can also compute the lower bound on the distortion induced by an estimation procedure based on signal decomposition into an IMG. Assuming that the Gaussian components of the IMG are separable, and that the variance estimate is constant over each separate component, equation (21) is the MMSE estimate for each component. The bound is attained when the estimator "knows" the Gaussian component of the mixture that has generated the estimated sample. The ensuing distortion lower bound is:

$$D_{\mathrm{lower\ bound}} = \int_{\sigma^2} G_X(\sigma^2)\,\frac{\sigma^2\sigma_Z^2}{\sigma^2+\sigma_Z^2}\,d\sigma^2, \qquad (33)$$

where the ratio $\frac{\sigma^2\sigma_Z^2}{\sigma^2+\sigma_Z^2}$ is the theoretical distortion of the MMSE filter (21). We also compare our results with the MMSE distortion bound, given by the MMSE estimator (22), which can be rewritten as follows:

$$\hat{x}_{MMSE} = \int_x x\,p(x\,|\,y)\,dx = \int_x x\left(\int_{\sigma^2} p(x\,|\,y,\sigma^2)\,p(\sigma^2\,|\,y)\,d\sigma^2\right)dx.$$

Using Bayes' rule and switching integrals, one obtains:

$$\hat{x}_{MMSE} = \int_{\sigma^2}\left(\int_x x\,p(x\,|\,y,\sigma^2)\,dx\right)\frac{p(y\,|\,\sigma^2)\,p(\sigma^2)}{p(y)}\,d\sigma^2.$$

Considering Gaussian noise, the inner parenthesis yields the MMSE estimate (21), and by integrating the Gaussian Transform and developing $p(y\,|\,\sigma^2)$ and $p(y)$ one finds:

$$\hat{x}_{MMSE} = \frac{\displaystyle\int_{\sigma^2}\frac{\sigma^2}{\sigma^2+\sigma_Z^2}\,\mathcal{N}\big(y\,|\,\sigma^2+\sigma_Z^2\big)\,G_X(\sigma^2)\,d\sigma^2}{\displaystyle\int_{\sigma^2}\mathcal{N}\big(y\,|\,\sigma^2+\sigma_Z^2\big)\,G_X(\sigma^2)\,d\sigma^2}\;y. \qquad (34)$$

The MMSE distortion bound is then computed by plugging the result (34) into (31). The IMG lower bound (33) and the MMSE bound (34) are superposed in Fig. 6, which displays the distortion of the estimation (31) as a function of the signal-to-noise ratio $SNR = 10\log_{10}\frac{\sigma_\gamma^2}{\sigma_Z^2}$ (the variance of the GGD family is given by $\sigma_\gamma^2$; see the moments method from [6] and equation (18)). One may notice that the IMG lower bound is lower than the MMSE bound, a result due to the assumption of separable IMG components. In fact, for a given $y$, the MMSE estimator is unique (injective estimation), which is not the case for the ideal estimator that would lead to the proposed IMG lower bound, and that would yield, for the same value of $y$, different estimates for each isolated Gaussian component.

Figure 6. Distortion for GGD data as a function of SNR (dB).

A considerable difference in performance in favor of IMG CE with respect to the shrinkage functions is observed for $\gamma = 0.5$. The difference tends to decrease for larger $\gamma$, and inverts for $\gamma = 1.5$, albeit with a very small relative amplitude. However, the IMG CE estimator being unique for a given $y$, the IMG CE distortion curve necessarily remains above the MMSE bound, as for any other injective estimator. All curves tend to the distortion of the Wiener filter when $\gamma$ approaches 2, plotted dash-dotted in Fig. 6 on the graph corresponding to $\gamma = 1.5$.
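The MMSE estimate (34) is itself directly computable by one-dimensional quadrature once $G_X$ is available; a minimal sketch, using the Laplacian transform (29) as an example prior with $\sigma_\gamma^2 = 2$ chosen arbitrarily:

```python
import numpy as np
from scipy.integrate import quad

def mmse_estimate(y, g_x, s2_z):
    """MMSE estimator (34): ratio of two integrals against N(y | s2 + s2_z)."""
    n = lambda s2: np.exp(-y**2 / (2.0 * (s2 + s2_z))) \
        / np.sqrt(2.0 * np.pi * (s2 + s2_z))
    num = quad(lambda s2: s2 / (s2 + s2_z) * n(s2) * g_x(s2), 0.0, np.inf)[0]
    den = quad(lambda s2: n(s2) * g_x(s2), 0.0, np.inf)[0]
    return num / den * y

# Laplacian prior via (29) with sigma_gamma^2 = 2 (illustrative).
g_laplace = lambda s2: 0.5 * np.exp(-0.5 * s2)
print(mmse_estimate(1.0, g_laplace, 0.5))
```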
We expect the IMG ⎜⎟8y2 ⎜⎟11++ MAP curves to remain above the Moulin and Liu curves for ⎜⎟σ 2 ⎝⎠γ γ ≥ 1 , both falling slowly down to the Gaussian-Gaussian One would expect the IMG MAP estimator to behave curve in Fig.6 as the IMG MAP estimator tends to the Wiener similarly to the original shrinkage functions of Moulin and filter for γ = 2 . Liu, as it applies the MAP principle two times (37), (23) to While the IMG MAP estimator does not outperform the obtain an indirect estimate, whereas the shrinkage functions cumulative estimator, nor the shrinkage estimator for γ ≥ 1 , are the result of direct MAP estimation (32). This is confirmed and while the IMG CE estimator does not outperform the in Fig. 7, which compares the equivalent shrinkage functions classical MMSE estimate, the linear nature of the estimator of the IMG ML and IMG MAP estimators with the IMG CE (23), coupled with the simple analytical forms (38) and (30) curve and with the Moulin and Liu curve for γ = 1 . for γ = 1 , allowed the successful use of the IMG MAP and IMG CE estimators in the highly underdetermined EEG (electroencephalogram) inverse problem [15] with Laplacian prior, where a direct application of the MAP principle would lead to an underdetermined quadratic programming problem, and where a direct application of the MMSE principle would lead to highly expensive computations.

V. CONCLUSION AND FUTURE WORK

We introduced in this paper the Gaussian Transform, which allows for the representation of symmetric distributions as an infinite Gaussian mixture. The scope of applicability of the Gaussian Transform is potentially very broad, from denoising and regularization to filtering, coding, compression, watermarking, etc. However, an extension of the concept to non-symmetric distributions would be required for some specific applications. Further investigation of the existence conditions, especially non-negativity, is also necessary. Finally, one would need adapted numerical packages (most likely based on existing Laplace and Fourier transform computational packages) for the computation of Gaussian Transforms of both analytically and numerically defined distributions.

We also presented in this paper various denoising algorithms based on the Infinite Mixture of Gaussians model. Their application to the denoising of Generalized Gaussian data showed significant improvements in terms of distortion, both theoretically and empirically, with respect to the shrinkage functions of Moulin and Liu. Also, the lower bound under the proposed IMG estimation framework is lower than the classical MMSE bound. However, the potential of IMG-based denoising is not yet fully explored, as the lower bound presented in this paper is still below current results. The estimation methods proposed may reach this lower bound in the case where the data is generated locally from the same Gaussian component, which would allow for exact variance estimation.

ACKNOWLEDGMENT

The authors thank Dr. Oleksiy Koval for careful reading and fruitful discussions. The authors also thank anonymous reviewers for valuable comments and suggestions.

REFERENCES

[1] A. D. Poularikas and S. Seely, "Laplace Transforms," in The Transforms and Applications Handbook, 2nd ed., A. D. Poularikas, Ed. Boca Raton, FL: CRC Press, 2000.
[2] G. Arfken, Mathematical Methods for Physicists, 3rd ed. Orlando, FL: Academic Press, 1985, pp. 481-485.
[3] G. Arfken, "Inverse Laplace Transformation," §15.12 in Mathematical Methods for Physicists, 3rd ed. Orlando, FL: Academic Press, 1985, pp. 853-861.
[4] R. Wilson, "MGMM: Multiresolution Gaussian Mixture Models for Computer Vision," in Proc. IEEE International Conference on Pattern Recognition, Barcelona, Spain, September 2000.
[5] A. Hjørungnes, J. M. Lervik, and T. A. Ramstad, "Entropy Coding of Composite Sources Modeled by Infinite Mixture Gaussian Distributions," in Proc. 1996 IEEE Digital Signal Processing Workshop, Loen, Norway, September 1996.
[6] S. G. Mallat, "A Theory for Multiresolution Signal Decomposition: The Wavelet Representation," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 11, no. 7, pp. 674-693, July 1989.
[7] P. Moulin and J. Liu, "Analysis of Multiresolution Image Denoising Schemes Using Generalized-Gaussian and Complexity Priors," IEEE Transactions on Information Theory, Special Issue on Multiscale Analysis, vol. 45, no. 3, pp. 909-919, April 1999.
[8] D. L. Donoho and I. M. Johnstone, "Ideal Spatial Adaptation via Wavelet Shrinkage," Biometrika, vol. 81, pp. 425-455, 1994.
[9] A. Bijaoui, "Wavelets, Gaussian mixtures and Wiener filtering," Signal Processing, vol. 82, no. 4, pp. 709-712, April 2002.
[10] L. Benaroya and F. Bimbot, "Wiener-based source separation with HMM/GMM using a single sensor," in Proc. 4th Int. Symp. on Independent Component Analysis and Blind Signal Separation, Nara, Japan, April 2003.
[11] T. M. Cover and J. A. Thomas, Elements of Information Theory. New York: John Wiley & Sons, 1991.
[12] M. Rangaswamy, D. Weiner, and A. Ozturk, "Non-Gaussian Random Vector Identification Using Spherically Invariant Random Processes," IEEE Transactions on Aerospace and Electronic Systems, vol. 29, no. 1, January 1993.
[13] E. Conte and M. Longo, "Characterization of radar clutter as a spherically invariant random process," IEE Proceedings F (Communications, Radar and Signal Processing), vol. 134, no. 2, pp. 191-197, 1987.
[14] M. Bertero and P. Boccacci, Introduction to Inverse Problems in Imaging. Bristol: IOP Publishing, 1998.
[15] T. I. Alecu, P. Missionnier, S. Voloshynovskiy, P. Giannakopoulos and T. Pun, "Soft/hard focalization in the EEG inverse problem," in Proc. IEEE Workshop on Statistical Signal Processing (SSP'05), Bordeaux, France, July 17-20, 2005.