Parameter Estimation for Multivariate Generalized Gaussian Distributions
Frédéric Pascal, Lionel Bombrun, Jean-Yves Tourneret and Yannick Berthoumieu

SUBMITTED TO IEEE TRANS. ON SIGNAL PROCESSING
arXiv:1302.6498v2 [stat.AP] 24 Feb 2017

F. Pascal is with Supélec/SONDRA, 91192 Gif-sur-Yvette Cedex, France (e-mail: [email protected]).
L. Bombrun and Y. Berthoumieu are with Université de Bordeaux, IPB, ENSEIRB-Matmeca, Laboratoire IMS, France (e-mail: lionel.bombrun@ims-bordeaux.fr; [email protected]).
J.-Y. Tourneret is with Université de Toulouse, IRIT/INP-ENSEEIHT (e-mail: [email protected]).

Abstract—Due to its heavy-tailed and fully parametric form, the multivariate generalized Gaussian distribution (MGGD) has been receiving much attention for modeling extreme events in signal and image processing applications. Considering the estimation issue of the MGGD parameters, the main contribution of this paper is to prove that the maximum likelihood estimator (MLE) of the scatter matrix exists and is unique up to a scalar factor, for a given shape parameter $\beta \in (0, 1)$. Moreover, an estimation algorithm based on a Newton-Raphson recursion is proposed for computing the MLE of MGGD parameters. Various experiments conducted on synthetic and real data are presented to illustrate the theoretical derivations in terms of number of iterations and number of samples for different values of the shape parameter. The main conclusion of this work is that the parameters of MGGDs can be estimated using the maximum likelihood principle with good performance.

Index Terms—Multivariate generalized Gaussian distribution, covariance matrix estimation, fixed point algorithm.

I. INTRODUCTION

Univariate and multivariate generalized Gaussian distributions (GGDs) have received much attention in the literature. Historically, this family of distributions was introduced in [1]. Some properties of these distributions have been reported in several papers such as [2]–[4]. These properties include various stochastic representations, simulation methods and probabilistic characteristics. GGDs belong to the family of elliptical distributions (EDs) [5], [6], originally introduced by Kelker in [7] and studied in [8], [9]. For $\beta \in (0, 1]$, multivariate GGDs (MGGDs) are a subset of the spherically invariant random vector (SIRV) distributions. For $\beta > 1$, MGGDs are no longer SIRV distributions, as illustrated in Fig. ?? (for more details, see [10]).

MGGDs have been used intensively in the image processing community. Indeed, including Gaussian and Laplacian distributions as special cases, MGGDs are potentially interesting for modeling the statistical properties of various images or features extracted from these images. In particular, the distribution of wavelet or curvelet coefficients has been shown to be modeled accurately by GGDs [11]–[14]. This property has been exploited for many image processing applications including image denoising [15]–[18], context-based image retrieval [19], [20], image thresholding [21] or texture classification in industrial problems [22]. Other applications involving GGDs include radar [23], video coding and denoising [24]–[26] or biomedical signal processing [25], [27], [28]. Finally, it is interesting to note that complex GGDs have recently been studied in [29], [30] and that multivariate regression models with generalized Gaussian errors have been considered in [31].

Considering the important attention devoted to GGDs, estimating the parameters of these distributions is clearly an interesting issue. Classical estimation methods that have been investigated for univariate GGDs include the maximum likelihood (ML) method [32] and the method of moments [33]. In the multivariate context, MGGD parameters can be estimated by a least-squares method as in [17] or by minimizing a $\chi^2$ distance between the histogram of the observed data and the theoretical probabilities associated with the MGGD [34]. Estimators based on the method of moments and on the ML method have also been proposed in [35]–[37].

Several works have analyzed covariance matrix estimators defined under different modeling assumptions. On the one hand, fixed point (FP) algorithms have been derived and analyzed in [38], [39] for SIRVs. On the other hand, in the context of robust estimation, the properties of M-estimators have been studied by Maronna in [40]. Unfortunately, Maronna's conditions are not fully satisfied for MGGDs (see remark II.3). This paper shows that despite the non-applicability of Maronna's results, the MLE of MGGD parameters exists, is unique and can be computed by an FP algorithm. Although the methodology adopted in this paper has some similarities with the one proposed in [38], [39], there are also important differences which require a specific analysis (see for instance remark III.1). More precisely, the FP equation of [38] corresponds to an approximate MLE for SIRVs, while in [39] the FP equation results from a different problem (see Eq. (14) in [39] compared to Eq. (15) of this paper). The contributions of this paper are to establish some properties related to the FP equation of the ML estimator for MGGDs. More precisely, we show that for a given shape parameter $\beta$ belonging to $(0, 1)$, the MLE of the scatter matrix $\mathbf{M}$ exists and is unique up to a scalar factor¹. An iterative algorithm based on a Newton-Raphson procedure is then proposed to compute the MLE of $\mathbf{M}$.

¹ Since the submission of this paper, another approach based on geodesic convexity has been proposed in (include reference paper Wiesel).

The paper is organized as follows. Section II defines the MGGDs considered in this study and derives the MLEs of their parameters. Section III presents the main theoretical results of this paper while a proof outline is given in Section IV. For presentation clarity, full demonstrations are provided in the appendices. Section V is devoted to simulation results conducted on synthetic and real data. The convergence speed of the proposed estimation algorithm as well as the bias and consistency of the scatter matrix MLE are first investigated using synthetic data. Experiments performed on real images extracted from the VisTex database are then presented. Conclusions and future work are finally reported in Section VI.

II. PROBLEM FORMULATION

A. Definitions

The probability density function of an MGGD in $\mathbb{R}^p$ is defined by [41]

$$p(\mathbf{x} \mid \mathbf{M}, m, \beta) = \frac{1}{|\mathbf{M}|^{1/2}}\, h_{m,\beta}\!\left(\mathbf{x}^T \mathbf{M}^{-1} \mathbf{x}\right) \tag{1}$$

for any $\mathbf{x} \in \mathbb{R}^p$, where $\mathbf{M}$ is a $p \times p$ symmetric real scatter matrix, $\mathbf{x}^T$ is the transpose of the vector $\mathbf{x}$, and $h_{m,\beta}(\cdot)$ is a so-called density generator defined by

$$h_{m,\beta}(y) = \frac{\beta\, \Gamma\!\left(\frac{p}{2}\right)}{\pi^{p/2}\, \Gamma\!\left(\frac{p}{2\beta}\right) 2^{p/(2\beta)}}\, \frac{1}{m^{p/2}}\, \exp\!\left(-\frac{y^{\beta}}{2 m^{\beta}}\right) \tag{2}$$

for any $y \in \mathbb{R}^+$, where $m$ and $\beta$ are the MGGD scale and shape parameters. The matrix $\mathbf{M}$ will be normalized in this paper according to $\mathrm{Tr}(\mathbf{M}) = p$, where $\mathrm{Tr}(\mathbf{M})$ is the trace of the matrix $\mathbf{M}$. It is interesting to note that letting $\beta = 1$ corresponds to the multivariate Gaussian distribution. Moreover, when $\beta$ tends toward infinity, the MGGD is known to converge in distribution to a multivariate uniform distribution (see (3)).

B. Stochastic representation

Let $\mathbf{x}$ be a random vector of $\mathbb{R}^p$ distributed according to an MGGD with scatter matrix $\boldsymbol{\Sigma} = m\mathbf{M}$ and shape parameter $\beta$. Gómez et al. have shown that $\mathbf{x}$ admits the following stochastic representation [2]

$$\mathbf{x} \overset{d}{=} \tau\, \boldsymbol{\Sigma}^{1/2}\, \mathbf{u} \tag{3}$$

where $\overset{d}{=}$ means equality in distribution, $\mathbf{u}$ is a random vector uniformly distributed on the unit sphere of $\mathbb{R}^p$, and $\tau$ is a scalar positive random variable such that

$$\tau^{2\beta} \sim \Gamma\!\left(\frac{p}{2\beta},\, 2\right) \tag{4}$$

where $\Gamma(a, b)$ is the univariate gamma distribution with parameters $a$ and $b$ (see [42] for definition).

[...] elliptical distribution that has received much attention in the literature. Following the results of [44] for real elliptical distributions, by differentiating the log-likelihood of vectors $(\mathbf{x}_1, \ldots, \mathbf{x}_N)$ with respect to $\mathbf{M}$, the MLE of the matrix $\mathbf{M}$ satisfies the following FP equation

$$\mathbf{M} = \frac{2}{N} \sum_{i=1}^{N} \frac{-g_{m,\beta}\!\left(\mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i\right)}{h_{m,\beta}\!\left(\mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i\right)}\, \mathbf{x}_i \mathbf{x}_i^T \tag{5}$$

where $g_{m,\beta}(y) = \partial h_{m,\beta}(y) / \partial y$. In the particular case of an MGGD with known parameters $m$ and $\beta$, straightforward computations lead to

$$\mathbf{M} = \frac{\beta}{N m^{\beta}} \sum_{i=1}^{N} \frac{\mathbf{x}_i \mathbf{x}_i^T}{\left(\mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i\right)^{1-\beta}}. \tag{6}$$

When the parameter $m$ is unknown, the MLEs of $\mathbf{M}$ and $m$ are obtained by differentiating the log-likelihood of $(\mathbf{x}_1, \ldots, \mathbf{x}_N)$ with respect to $\mathbf{M}$ and $m$, yielding

$$\mathbf{M} = \frac{\beta}{N m^{\beta}} \sum_{i=1}^{N} \frac{\mathbf{x}_i \mathbf{x}_i^T}{\left(\mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i\right)^{1-\beta}}, \tag{7}$$

$$m = \left[ \frac{\beta}{pN} \sum_{i=1}^{N} \left(\mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i\right)^{\beta} \right]^{1/\beta}. \tag{8}$$

After replacing $m$ in (6) by its expression (8), the following result can be obtained

$$\mathbf{M} = \frac{1}{N} \sum_{i=1}^{N} \frac{N p}{y_i + y_i^{1-\beta} \sum_{j \neq i} y_j^{\beta}}\, \mathbf{x}_i \mathbf{x}_i^T \tag{9}$$

where $y_i = \mathbf{x}_i^T \mathbf{M}^{-1} \mathbf{x}_i$. Indeed, (8) gives $m^{\beta} = \frac{\beta}{pN} \sum_{j} y_j^{\beta}$, so that $\beta / (N m^{\beta}) = p / \sum_{j} y_j^{\beta}$, and $y_i^{1-\beta} \sum_{j} y_j^{\beta} = y_i + y_i^{1-\beta} \sum_{j \neq i} y_j^{\beta}$. As mentioned before and confirmed by (9), $\mathbf{M}$ can be estimated independently from the scale parameter $m$. Moreover, the following remarks can be made about (9).

Remark II.1
When $\beta = 1$, Eq. (9) is close to the sample covariance matrix (SCM) estimator (the only difference between the SCM estimator and (9) is due to the estimation of the scale parameter, which equals 1 for the multivariate Gaussian distribution). For $\beta = 0$, (9) reduces to the FP covariance matrix estimator that has received much attention in [44]–[46].

Remark II.2
Equation (9) remains unchanged if $\mathbf{M}$ is replaced by $\alpha \mathbf{M}$, where $\alpha$ is any non-zero real factor. Thus, the solutions of (9) (when they exist) can be determined up to a scale factor $\alpha$.
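The stochastic representation (3)-(4) and the fixed point equation (9) can be tried out numerically. The Python sketch below is only an illustration and is not the algorithm proposed in the paper: it iterates (9) by plain successive substitution for a known $\beta$, and it assumes that this recursion converges for the data at hand. The function names `sample_mggd` and `mle_scatter_and_scale`, the identity initialization, the stopping rule and the trace normalization applied at every iteration are assumptions made for this example. $\Gamma(a, b)$ is read as shape $a$ and scale $b$, which is the reading under which $\beta = 1$ gives a chi-square variable and hence a Gaussian vector.

```python
import numpy as np


def sample_mggd(n, M, m, beta, rng):
    """Draw n samples using x = tau * Sigma^(1/2) * u, Eqs. (3)-(4), Sigma = m*M."""
    p = M.shape[0]
    # u: uniform on the unit sphere, obtained by normalizing Gaussian vectors.
    g = rng.standard_normal((n, p))
    u = g / np.linalg.norm(g, axis=1, keepdims=True)
    # tau^(2*beta) ~ Gamma(shape=p/(2*beta), scale=2)  (assumed parametrization).
    tau = rng.gamma(shape=p / (2.0 * beta), scale=2.0, size=n) ** (1.0 / (2.0 * beta))
    # A Cholesky factor replaces the symmetric square root; since u is
    # rotation invariant, both choices give the same distribution for x.
    L = np.linalg.cholesky(m * M)
    return (tau[:, None] * u) @ L.T


def mle_scatter_and_scale(X, beta, n_iter=200, tol=1e-10):
    """Successive substitution on Eq. (9), then the scale from Eq. (8)."""
    n, p = X.shape
    M = np.eye(p)  # illustrative starting point
    for _ in range(n_iter):
        # y_i = x_i^T M^{-1} x_i
        y = np.einsum("ij,ji->i", X, np.linalg.solve(M, X.T))
        s = np.sum(y ** beta)
        # Weight of Eq. (9): N p / (y_i + y_i^(1-beta) * sum_{j != i} y_j^beta)
        w = n * p / (y + y ** (1.0 - beta) * (s - y ** beta))
        M_new = (X.T * w) @ X / n
        # Fix the free scale factor of Remark II.2 with Tr(M) = p.
        M_new *= p / np.trace(M_new)
        converged = np.linalg.norm(M_new - M) < tol * np.linalg.norm(M)
        M = M_new
        if converged:
            break
    # Eq. (8), evaluated at the normalized scatter estimate.
    y = np.einsum("ij,ji->i", X, np.linalg.solve(M, X.T))
    m_hat = (beta / (p * n) * np.sum(y ** beta)) ** (1.0 / beta)
    return M, m_hat


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    M_true = np.array([[1.0, 0.5], [0.5, 1.0]])  # Tr(M) = p = 2
    X = sample_mggd(5000, M_true, m=1.5, beta=0.4, rng=rng)
    M_hat, m_hat = mle_scatter_and_scale(X, beta=0.4)
    print(M_hat)
    print(m_hat)
```

Normalizing the trace inside the loop is a convenience: by Remark II.2 any rescaling of a solution is again a solution, so some constraint is needed to pin one down, and $\mathrm{Tr}(\mathbf{M}) = p$ is the convention stated in Section II-A.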
