New insights into the statistical properties of M-estimators Gordana Draskovic, Frédéric Pascal

To cite this version:

Gordana Draskovic, Frédéric Pascal. New insights into the statistical properties of M-estimators. IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, 2018, 66 (16), pp. 4253-4263. 10.1109/TSP.2018.2841892. hal-01816084

HAL Id: hal-01816084 https://hal.archives-ouvertes.fr/hal-01816084 Submitted on 26 Feb 2020

New insights into the statistical properties of M-estimators
Gordana Draskovic, Student Member, IEEE, Frédéric Pascal, Senior Member, IEEE

Abstract—This paper proposes an original approach to better understanding the behavior of robust scatter matrix M-estimators. Scatter matrices are of particular interest for many signal processing applications since the resulting performance strongly relies on the quality of the matrix estimation. In this context, M-estimators appear as very interesting candidates, mainly due to their flexibility to the statistical model and their robustness to outliers and/or missing data. However, the behavior of such estimators still remains unclear and not well understood, since they are described by fixed-point equations that make their statistical analysis very difficult. To fill this gap, the main contribution of this work is to prove that the distribution of these estimators is more accurately described by a Wishart distribution than by the classical asymptotic Gaussian approximation. To that end, we propose a new "Gaussian-core" representation for Complex Elliptically Symmetric (CES) distributions and we analyze the proximity between M-estimators and a Gaussian-based Sample Covariance Matrix (SCM), unobservable in practice and playing only a theoretical role. To confirm our claims, we also provide results for a widely used function of M-estimators, the Mahalanobis distance. Finally, Monte Carlo simulations for various scenarios are presented to validate the theoretical results.

Index Terms—M-estimators, Complex Elliptically Symmetric distributions, robust estimation, Wishart distribution, Mahalanobis distance.

I. INTRODUCTION

In signal processing applications, the knowledge of the scatter matrix is of crucial importance. It arises in diverse applications such as filtering, detection, estimation or classification. In recent years, there has been growing interest in covariance matrix estimation, reflected in a vast amount of literature on this topic (see e.g., [1]–[8] and references therein). Generally, in most signal processing methods the data can be locally modelled by a multivariate zero-mean circular Gaussian stochastic process, which is completely determined by its covariance matrix. The complex multivariate Gaussian, also called complex normal (CN), distribution plays a vital role in the theory of statistical analysis [9]. Very often the multivariate observations are approximately normally distributed; this approximation is (asymptotically) valid even when the original data are not multivariate normal, due to the central limit theorem. In that case, the classical covariance matrix estimator is the sample covariance matrix (SCM), whose behavior is perfectly known. Indeed, it follows the Wishart distribution [10], which is the multivariate extension of the gamma distribution. Thanks to its explicit form, the SCM is easy to manipulate and therefore widely used in the signal processing community.

Nevertheless, complex normality sometimes provides a poor approximation of the underlying physics. Noise and interference can be spiky and impulsive, i.e., have heavier tails than the Gaussian distribution. An alternative has been proposed by introducing elliptical distributions [11], namely the Complex Elliptically Symmetric (CES) distributions. These distributions possess an important property: their higher-order moment matrices are scalar multiples of those of the corresponding normal distribution. This is the starting point for the analysis done in this paper. CES distributions have been frequently employed for non-Gaussian modeling (see e.g., [12]–[16] for radar applications).

Although Huber introduced robust M-estimators in [17] for the scalar case, Maronna provided the detailed analysis of the corresponding scatter matrix estimators in the multivariate real case in his seminal work [18]. M-estimators correspond to a generalization of the well-known Maximum Likelihood estimators (MLE), which have been widely studied in the literature [19], [20]. In contrast to ML-estimators, where the estimating equation depends on the probability density function (PDF) of a particular CES distribution, the weight function in the M-estimating equation can be completely independent of the data distribution. Consequently, M-estimators form a wide class of scatter matrix estimators, including the ML-estimators, robust to the data model. In [18], it is shown that, under some mild assumptions, the estimator is defined as the unique solution of a fixed-point equation and that the robust estimator converges almost surely (a.s.) to a deterministic matrix, equal to the scatter matrix up to a scale factor (depending on the true statistical model). Their asymptotic properties have been studied by Tyler in the real case [21]. This has been recently extended to the complex case, more useful for signal processing applications, in [1], [6].

In most papers, three main M-estimators are studied and used in practice: the Student's M-estimator, which is the MLE for the t-distribution, the Huber's M-estimator and the Tyler's M-estimator [22], also known as the Fixed Point (FP) estimator [2]. The Student t-distribution is widely employed for non-Gaussian data modeling since it offers flexibility thanks to an additional parameter, namely the Degree of Freedom (DoF). As a consequence, the Student's M-estimator is often used for scatter matrix estimation. Huber's M-estimator, especially its complex multivariate extension, has received a lot of attention since it has proven to be very robust to outliers. Tyler's M-estimator is not exactly an M-estimator¹ but it is very useful because of the rare property that any CES distribution with the same scatter matrix leads to the same result (hence "distribution-free"). The asymptotic properties of this estimator have been analyzed in [1], [23]. Recently, it has been shown that the behavior of Tyler's estimator can be better approximated by a Wishart distribution [24]. In this work, one aims at providing more general results that can be applied to all M-estimators, and at analyzing the gain of this approach for the robust Mahalanobis distance [25], [26], very useful in various problems such as detection, clustering, etc.

¹especially because it does not respect all Maronna conditions [18]

Gordana Draskovic and Frédéric Pascal are with L2S - CentraleSupélec - CNRS - Université Paris-Sud - 3 rue Joliot-Curie, F-91192 Gif-sur-Yvette Cedex, France (e-mails: [email protected], [email protected]). "This paper has supplementary downloadable material available at http://ieeexplore.ieee.org, provided by the author. The material includes the results for the real case. This material is 149KB in size."

The contributions of this work are multiple. First, the originality of the results comes from a new CES representation introducing "Gaussian cores". This representation is a modified version of the stochastic representation given in [1] and is crucial to understand the proposed method. Second, in this paper, M-estimators are, for the first time, analyzed thanks to a comparison with a very simple estimator, the SCM. Indeed, the direct statistical analysis of these estimators is difficult because they are defined as the solution of an implicit equation and have been analyzed only in classical asymptotic regimes. Here, we propose a different approach to overcome this difficulty. More precisely, a sort of distance between M-estimators and the SCM is computed in order to propagate non-asymptotic properties of the SCM towards M-estimators. Third, the paper gives new insights into the correlation between M-estimators and the corresponding SCM in the Gaussian context, which is the central part of our approach. Finally, we present a practical interest of the results, specifically the application to the Mahalanobis distance. Note that all the results are provided in the complex case. For completeness purposes, supplemental material containing analogous results in the real case is provided together with this article.

The rest of this paper is organized as follows. Section II introduces the considered CES models based on Gaussian cores, as well as the M-estimators and the Mahalanobis distance. Section III contains the main contribution of the paper, with discussions and further explanations. Moreover, closed-form expressions are derived for some particular cases of M-estimators and the application to the Mahalanobis distance is presented. In Section IV, Monte Carlo simulations are presented in order to validate the theoretical results. Finally, some conclusions and perspectives are drawn in Section V.

Notations - Vectors (resp. matrices) are denoted by boldfaced lowercase letters (resp. uppercase letters). ^T, ^* and ^H respectively represent the transpose, conjugate and Hermitian operators. Re(.) and Im(.) denote respectively the real and the imaginary part of a complex quantity; i.i.d. stands for "independent and identically distributed" while ∼ means "is distributed as". =d stands for "shares the same distribution as", →d denotes convergence in distribution and ⊗ denotes the Kronecker product. 1{·} is the indicator function and vec is the operator which transforms an m × n matrix into a vector of length mn, concatenating its n columns into a single column. Moreover, I_m is the m × m identity matrix, 0 the matrix of zeros with appropriate dimension and K is the commutation matrix (square matrix with appropriate dimensions) which transforms vec(A) into vec(A^T), i.e. K vec(A) = vec(A^T).

II. PROBLEM FORMULATION

A. Complex distributions

Let z = Re(z) + jIm(z) be an m-dimensional complex random vector which consists of a pair of real random vectors Re(z) and Im(z). The distribution of z on C^m determines the joint real 2m-variate distribution of Re(z) and Im(z) on R^{2m}, and conversely. To completely define the second-order moments of Re(z) and Im(z), z is given by its covariance matrix C = E[(z−µ)(z−µ)^H] and its pseudo-covariance matrix P = E[(z−µ)(z−µ)^T]. If the complex vector is circular (see [1] for details), the pseudo-covariance vanishes, i.e. P = 0.

1) Generalized Complex Normal distribution: An m-dimensional random vector has the generalized complex normal distribution z ∼ CN(µ, C, P) if its probability density function (PDF) can be written as

    h_z(z) = exp( −(1/2) [ (z−µ)^H  (z−µ)^T ] V^{−1} [ (z−µ)^T  (z*−µ*)^T ]^T ) / ( π^m √|V| )    (1)

where µ is the statistical mean and V = [ C  P ; P*  C* ] (in block form). If z is circular CN-distributed, the pseudo-covariance will be omitted in the notation, i.e. z ∼ CN(µ, C).

B. Complex Elliptically Symmetric distributions

An important class of circular distributions are the CES distributions. An m-dimensional random vector has a CES distribution if its PDF can be written as

    h_z(z) = C |M|^{−1} g_z( (z − µ)^H M^{−1} (z − µ) )    (2)

where C is a constant, g_z : [0, ∞) → [0, ∞) is any function (called the density generator) such that (2) defines a PDF, and M is the scatter matrix. The matrix M reflects the structure of the covariance matrix C of z, i.e., the covariance matrix is equal to M up to a scale factor, C = ξM.² This CES distribution will be denoted by CES(µ, M, g_z). In this paper, we will assume that µ = 0, as is generally the case in many signal processing applications.

²if the random vector has a finite second-order moment (see [1] for details)

1) Stochastic Representation Theorem: A zero-mean random vector z ∼ CES_m(0, M, g_z) if and only if it admits the following stochastic representation [27]

    z =d √Q A u,    (3)

where the non-negative real random variable Q, called the modular variate, is independent of the random vector u, which is uniformly distributed on the unit complex m-sphere, and M = AA^H is a factorization of M.

2) Circular Complex Normal Distribution: The complex normal (Gaussian) distribution is a particular case of CES distributions, with g_z(z) = e^{−z} and C = π^{−m}. Thus, the PDF of z ∼ CN(0, M) is given by

    h_z(z) = exp( −z^H M^{−1} z ) / ( π^m |M| ).    (4)
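To make the stochastic representation (3) concrete, the following sketch draws u as n/‖n‖ with n ∼ CN(0, I) and uses the Gaussian modular variate Q ∼ (1/2)χ²_{2m}, so that the generated samples follow CN(0, M) and their empirical covariance approaches M. The parameter choices (m, K, the Toeplitz scatter matrix) are illustrative, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
m, K = 5, 200_000
M = 0.5 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # Toeplitz scatter matrix
A = np.linalg.cholesky(M)                                          # M = A A^H

# u uniformly distributed on the complex unit m-sphere: u = n / ||n||, n ~ CN(0, I)
n = (rng.standard_normal((K, m)) + 1j * rng.standard_normal((K, m))) / np.sqrt(2)
u = n / np.linalg.norm(n, axis=1, keepdims=True)

# Gaussian modular variate: Q ~ (1/2) chi^2_{2m}  =>  z ~ CN(0, M)
Q = 0.5 * rng.chisquare(2 * m, size=K)
z = np.sqrt(Q)[:, None] * (u @ A.T)    # one sample per row: z_k = sqrt(Q_k) A u_k

C_emp = z.T @ z.conj() / K             # empirical covariance, should approach M
print(np.max(np.abs(C_emp - M)))       # small for large K
```

Replacing the chi-squared draw for Q by another non-negative law (while keeping u) yields other CES distributions with the same scatter matrix, which is exactly the flexibility exploited in the sequel.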

Note that for µ = 0 the PDF (1) reduces to the PDF above, since the scatter matrix is then equal to the covariance matrix, C = M (i.e., the scale factor ξ equals 1). Regarding the previous stochastic representation theorem, for a CN-distributed vector z the random variable Q has a scaled chi-squared distribution, Q ∼ (1/2)χ²_{2m}.

3) Gaussian-core representation of CES: In order to better explain the context of this work, we rewrite the stochastic representation using the fact that u =d n/‖n‖, where n ∼ CN(0, I). Hence, a random vector z ∼ CES(0, M, g_z) can be represented as

    z =d ( √Q / ‖n‖ ) A n    (5)

with Q and A defined as in Eq. (3). If √Q/‖n‖ is independent of n, the vector z is said to have a compound-Gaussian distribution and it can be represented as z =d √τ x, where the non-negative real random variable τ, generally called the texture, is independent of the vector x ∼ CN(0, M).

4) Student t-distribution: A zero-mean random vector z follows a complex multivariate t-distribution with ν (0 < ν < ∞) degrees of freedom if the corresponding stochastic representation admits Q ∼ mF_{2m,ν}. This distribution belongs to the compound-Gaussian family, with τ ∼ IG(ν/2, ν/2), where IG denotes the inverse Gamma distribution. Note that the case ν → ∞ leads to the Gaussian distribution. The multivariate t-distributions, besides the Gaussian distribution, also encompass the multivariate Laplace distribution (for ν = 1/2) and the multivariate Cauchy distribution (for ν = 1), which are heavy-tailed alternatives to the Gaussian distribution. The complex multivariate t-distributions are thus useful for studying robustness, as decreasing ν yields distributions with heavier tails. We shall write Ct_ν(0, M) to denote this case.

C. Wishart distribution

The complex Wishart distribution is the distribution of Σ_{k=1}^K x_k x_k^H, where the x_k are m-dimensional complex circular i.i.d. zero-mean Gaussian vectors with covariance matrix M. Let

    M̂_SCM = (1/K) Σ_{k=1}^K x_k x_k^H

be the related sample covariance matrix (SCM), which will also be referred to as a Wishart matrix. Its asymptotic distribution [10] is given by

    √K vec( M̂_SCM − M ) →d CN( 0, Σ_SCM, Ω_SCM )    (6)

where the asymptotic covariance and pseudo-covariance matrices are

    Σ_SCM = M^T ⊗ M,
    Ω_SCM = ( M^T ⊗ M ) K.    (7)

D. M-estimators

Let (z_1, ..., z_K) be a K-sample of m-dimensional complex i.i.d. vectors with z_k ∼ CES(0, M, g_z). An M-estimator, denoted by M̂, is defined as the solution of the following M-estimating equation

    M̂ = (1/K) Σ_{k=1}^K ϕ( z_k^H M̂^{−1} z_k ) z_k z_k^H    (8)

where ϕ is any real-valued weight function on [0, ∞) that respects Maronna's conditions³ [18]. The theoretical (population) scatter matrix M-functional is defined as the solution of

    E[ ϕ( z^H M_σ^{−1} z ) z z^H ] = M_σ.

The M-functional is proportional to the true scatter matrix parameter M, as M_σ = σ^{−1} M, where the scalar factor σ > 0 can be found by solving

    E[ ψ(σt) ] = m    (9)

with ψ(σt) = ϕ(σt)σt and t = z^H M^{−1} z.

³The weight function ϕ does not need to be related to the PDF of any particular CES distribution, and hence M-estimators constitute a wide class of scatter matrix estimators.

Theorem II.1 Let M̂ be a complex M-estimator following Maronna's conditions [18]. Then

    √K vec( M̂ − M_σ ) →d CN( 0, Σ_M, Ω_M )

where the asymptotic covariance and pseudo-covariance matrices are

    Σ_M = ϑ1 M_σ^T ⊗ M_σ + ϑ2 vec(M_σ) vec(M_σ)^H,
    Ω_M = ϑ1 ( M_σ^T ⊗ M_σ ) K + ϑ2 vec(M_σ) vec(M_σ)^T.    (10)

The constants ϑ1 > 0 and ϑ2 > −ϑ1/m are given in [1], [6].

1) Tyler's estimator: For the particular function ϕ(t) = m/t, Tyler's estimator M̂_FP is the solution (up to a scale factor) of the following equation

    M̂_FP = (m/K) Σ_{k=1}^K z_k z_k^H / ( z_k^H M̂_FP^{−1} z_k ).    (11)

It should be noted that for Tyler's estimator, Σ_T and Ω_T are also defined as in Eq. (10) (see [23]), for σ = 1 and

    ϑ1 = m^{−1}(m + 1),
    ϑ2 = −m^{−2}(m + 1).    (12)

2) Huber's M-estimator: The complex extension of Huber's M-estimator is defined by

    M̂_Hub = (1/(Kβ)) Σ_{k=1}^K z_k z_k^H 1{ z_k^H M̂_Hub^{−1} z_k ≤ p² }
           + (1/(Kβ)) Σ_{k=1}^K p² ( z_k z_k^H / ( z_k^H M̂_Hub^{−1} z_k ) ) 1{ z_k^H M̂_Hub^{−1} z_k > p² },    (13)

where p² and β depend on a single parameter 0 < q < 1, according to q = F_{2m}(2p²) and β = F_{2m+2}(2p²) + p² (1−q)/m, where F_m(·) is the cumulative distribution function of a χ² distribution with m degrees of freedom. Note that Huber's M-estimator can be interpreted as a weighted combination of the SCM and Tyler's estimator.
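In practice, Eq. (11) is solved by fixed-point iteration. A minimal sketch (illustrative parameters; the trace-m normalization is one common way to fix the arbitrary scale, not something imposed by Eq. (11)), tested on compound-Gaussian data with inverse-gamma textures:

```python
import numpy as np

def tyler_fp(Z, n_iter=200, tol=1e-12):
    """Fixed-point iteration for Tyler's estimator, Eq. (11),
    normalized so that trace(M_hat) = m (the scale is arbitrary)."""
    K, m = Z.shape
    M = np.eye(m, dtype=complex)
    for _ in range(n_iter):
        # quadratic forms q_k = z_k^H M^{-1} z_k
        q = np.real(np.einsum('ki,ij,kj->k', Z.conj(), np.linalg.inv(M), Z))
        M_new = (m / K) * (Z.T * (1.0 / q)) @ Z.conj()   # (m/K) sum_k z_k z_k^H / q_k
        M_new *= m / np.real(np.trace(M_new))
        converged = np.linalg.norm(M_new - M) < tol * np.linalg.norm(M)
        M = M_new
        if converged:
            break
    return M

# heavy-tailed test data: Gaussian cores times inverse-gamma textures
rng = np.random.default_rng(0)
m, K = 5, 5000
M_true = 0.5 ** np.abs(np.subtract.outer(np.arange(m), np.arange(m)))  # trace = m
x = (rng.standard_normal((K, m)) + 1j * rng.standard_normal((K, m))) / np.sqrt(2)
tau = 1.0 / rng.gamma(1.0, 1.0, size=K)            # very heavy-tailed textures
Z = np.sqrt(tau)[:, None] * (x @ np.linalg.cholesky(M_true).T)

M_hat = tyler_fp(Z)
print(np.linalg.norm(M_hat - M_true))   # small despite the heavy tails
```

Because the weight m/t discards the texture entirely, the same code recovers the (normalized) scatter matrix for any texture law, which illustrates the "distribution-free" property mentioned in the introduction.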

3) Student's M-estimator: The MLE for the Student t-distribution, denoted M̂_t, is obtained as the solution of the following equation

    M̂_t = ( (m + ν/2) / K ) Σ_{k=1}^K z_k z_k^H / ( ν/2 + z_k^H M̂_t^{−1} z_k ).    (14)

The motivation to analyze this estimator arises from the fact that it presents a trade-off between the SCM and Tyler's estimator, but in a different way than Huber's M-estimator. Indeed, ν → ∞ leads to the Gaussian distribution and the resulting MLE is the SCM (ϕ(t) → 1), while ν → 0 yields Tyler's estimator (ϕ(t) → m/t). Finally, M̂_t is widely used both in theory (as a benchmark) and in practice, which is a strong motivation for understanding its behavior. Note also that, as with other M-estimators, it is not always used as the MLE for the t-distribution.

E. Mahalanobis distance

The Mahalanobis distance [25], [26] is one of the most common measures in multivariate statistics and signal processing. It is based on the correlation between variables, thanks to which different models can be identified and analyzed. The Mahalanobis distance of z from µ is given by ∆(µ, M), where

    ∆²(µ, M) = (z − µ)^H M^{−1} (z − µ),    (15)

µ is the population mean and M is the common scatter matrix. Since we work with zero-mean vectors, we will analyze ∆²(M) = z^H M^{−1} z, without loss of generality. If the data are normally distributed, z ∼ CN(0, M), and the distance is based on the true scatter matrix M, then it follows a scaled chi-squared distribution

    ∆²(M) ∼ (1/2) χ²_{2m}.    (16)

Since the scatter matrix is usually unknown, the distance is computed with its estimate. If the SCM is plugged in instead of the true scatter matrix, the distance becomes β′-distributed⁴, with an asymptotic chi-squared distribution:

    ∆²(M̂_SCM) ∼ K β′(m, K − m + 1)    (17)

where β′(a, b) denotes a Beta prime distribution with real shape parameters a and b.

⁴Beta prime distribution corresponds to a scaled F-distribution.

Besides testing whether an observed random sample comes from a multivariate normal distribution (detecting outliers) [28], [29], the Mahalanobis distance is also a useful way to determine similarities between sets of known and unknown data. Thus, it is widely used in classification problems [30], [31], feature selection problems [32], anomaly detection in hyperspectral images [33], [34], etc.

The object of our study is to analyze the robust Mahalanobis distance, i.e. the distance computed with M-estimators, comparing it to the one based on the SCM, in order to better understand its behavior.

III. MAIN CONTRIBUTION

This section is devoted to the main contribution of the paper. First, the results for the asymptotic distribution of the difference between any M-estimator and the corresponding SCM in a Gaussian context are derived. Then, the results for particular M-estimators and the application to the Mahalanobis distance are presented. Finally, a discussion and some explanations are provided to emphasize the significance of the theoretical results.

A. M-estimators

Based on the previously introduced Gaussian-core model, let us assume that the K measurements are defined as follows:
• z_k = ( √Q_k / ‖n_k‖ ) A n_k ∼ CES(0, M), k = 1, ..., K, with
  – (n_1, ..., n_K) a K-sample of m-dimensional complex i.i.d. vectors with n_k ∼ CN(0, I),
  – Q_1, ..., Q_K a K-sample of non-negative real i.i.d. random variables independent of the n_k's,
  – M = AA^H a factorization of M.
(z_1, ..., z_K) corresponds to the observed data, without more specifications on their distribution, and is used to design an M-estimator M̂.
Let us also consider some "fictive" (non-observable) data given by:
• x_k = A n_k ∼ CN(0, M), k = 1, ..., K,
and consider the SCM M̂_SCM built with (x_1, ..., x_K). Hereafter, we always consider this model unless it is stressed differently.

Theorem III.1 Let M̂ be defined by Eq. (8) and σ be the solution of Eq. (9). The asymptotic distribution of σM̂ − M̂_SCM is given by

    √K vec( σM̂ − M̂_SCM ) →d CN( 0, Σ, Ω )    (18)

where Σ and Ω are defined by

    Σ = σ1 M^T ⊗ M + σ2 vec(M) vec(M)^H,
    Ω = σ1 ( M^T ⊗ M ) K + σ2 vec(M) vec(M)^T    (19)

with

    σ1 = ( a m(m+1) + c(c − 2b) ) / c²,
    σ2 = (a − m²)/(c − m²)² − a(m+1)/c² + 2m(c − b)/( c(c − m²) )    (20)

where a = E[ψ²(σt₁)], b = E[ψ(σt₁)t₂] and c = E[ψ′(σt₁)σt₁] + m².

Remark III.1 Notice that the structure of the asymptotic covariance matrix Σ is the same as in the classical asymptotic results (Eqs. (7) and (10)), but the coefficients are different. In the case of the identity matrix as covariance matrix, this very particular structure involves only three non-null values d1, d2 and d3, at positions (i, j) equal to:
• d1 = σ1 + σ2 for i = j = p + m(p−1) with p = 1, ..., m,

• d2 = σ1 for i = j = p + m(q−1) with p ≠ q and p, q = 1, ..., m,
• d3 = σ2 for i = p + m(p−1), j = q + m(q−1) with p ≠ q and p, q = 1, ..., m.
A similar comment, with slight modifications, is valid for the pseudo-covariance matrix.

Proof sketch: We provide only a sketch of the proof; the detailed proof of Theorem III.1 is given in Appendix A. The main idea is to represent the matrix Σ as

    Σ = Σ1(M) − 2Σ2(M) + Σ3(M)

where Σ1(M) and Σ3(M) are given by Eq. (10) and Eq. (6), respectively, and the matrix Σ2(M) is the correlation matrix between an M-estimator and the corresponding SCM in a Gaussian context. The second important step relies on a decomposition of Σ2(M):

    Σ2(M) = D1^{−1}(M) B2(M) ( D2^{−1}(M) )^H

where D1(M) = E[ d{vec Ψ1(M)} / d{vec(M)} ], B2(M) = cov( vec Ψ1(M), vec Ψ2(M) ) and D2(M) = E[ d{vec Ψ2(M)} / d{vec(M)} ], with Ψ1(M) = σϕ( z^H (σ^{−1}M)^{−1} z ) z z^H − M and Ψ2(M) = x x^H − M, which is a generalization of a result derived in [18]. Finally, using the dependence between the practical and the fictive data, one can derive the elements of the matrix Σ2(M) and obtain the final result.

Remark III.2 In this paper, we consider only complex M-estimators since they are the ones used in signal processing applications. The results for the real case are given in the supplemental material; in the corresponding proof we provide only the steps that differ from the ones obtained in the complex case. It should be noted that the results of Theorem III.1 can also be derived using the results for the real case and the vector/matrix complex-to-real mapping [1]. This is briefly discussed at the end of the additional document.

B. Particular cases

1) Tyler's estimator: Hereafter, the results derived in [24] are presented. The first scale factor in the result can be (roughly speaking) obtained from Theorem III.1⁵, while the derivation of the second one requires a different approach.

Theorem III.2 Let M̂_FP be defined by Eq. (11). The asymptotic distribution of M̂_FP − M̂_SCM is given by

    √K vec( M̂_FP − M̂_SCM ) →d CN( 0, Σ_FP, Ω_FP )

where Σ_FP and Ω_FP are defined by

    Σ_FP = (1/m) M^T ⊗ M + ((m−1)/m²) vec(M) vec(M)^H,
    Ω_FP = (1/m) ( M^T ⊗ M ) K + ((m−1)/m²) vec(M) vec(M)^T.

⁵by considering the function ψ(x) = m.

Remark III.3 In the case of Tyler's estimator, ψ(σt₁) = m, which leads to a = m², b = m² and c = m². Substituting these values into the expression of σ1, one obtains the previous result. This is in agreement with the results obtained in [24], [35].

2) Student's M-estimator: In this subsection, one gives the results for the Student's M-estimator and t-distributed data. Let
• x_k ∼ CN(0, M), k = 1, ..., K,
• τ_k ∼ IG(ν/2, ν/2), k = 1, ..., K,
• z_k = √τ_k x_k ∼ Ct_ν(0, M), k = 1, ..., K,
where M = AA^H is a factorization of M. Consider the SCM M̂_SCM built with (x_1, ..., x_K) and the Student's M-estimator M̂_t built with (z_1, ..., z_K).

Corollary III.1 Let M̂_t be defined by Eq. (14). The asymptotic distribution of M̂_t − M̂_SCM is given by (19) with σ1 = (m + ν/2)^{−1} and σ2 = (2/ν)(m + 1 + ν/2)(m + ν/2)^{−1}.
Proof: See Appendix B.

3) Huber's M-estimator: The theoretical derivation of the asymptotic distribution for Huber's M-estimator is not possible along the same lines, since the function ψ(t₁) is not differentiable at every point. However, we will present empirical results for this estimator in the next section.

C. Application to Mahalanobis distance

In this subsection we provide results for the robust Mahalanobis distance, which show the main interest of our contribution.

Theorem III.3 Let M̂ be defined by Eq. (8) and σ be the solution of Eq. (9). For the Mahalanobis distance based on σM̂ one has, conditionally on the distribution of z, the following asymptotic distribution

    √K ( z^H (σM̂)^{−1} z − z^H M̂_SCM^{−1} z ) / ( z^H M^{−1} z ) →d N(0, φ)_z    (21)

where

    φ = σ1 + σ2    (22)

with σ1 and σ2 given by Eq. (20), and where the notation (.)_z stresses the conditional distribution, conditional on z.
Proof: See Appendix C.

Remark III.4 The asymptotic variance of the robust Mahalanobis distance when centering around the Wishart-based distance is smaller than the one when centering around the distance based on the true scatter matrix, since σ1 + σ2 < ϑ1 + ϑ2. The results are accurate even when K is small, as will be demonstrated in the simulation section. These findings reveal that the distribution of the robust (squared) Mahalanobis distance is better approximated by a scaled Beta prime distribution than by a scaled chi-squared distribution.
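The scale factors of Eq. (20) are easy to evaluate once a, b and c are known. For Tyler's weight, Remark III.3 gives a = b = c = m² (only σ1 is then meaningful, since c − m² = 0 makes the σ2 expression singular). For the Student weight, starting from the closed form of E[ψ²(σt₁)] recalled in the discussion, one can check that a = b = c = m(m+1)(m+ν/2)/(m+1+ν/2), which recovers the constants of Corollary III.1. A minimal exact-arithmetic check (illustrative; the equality b = c = a in the Student case is a derived simplification, not stated explicitly in the text):

```python
from fractions import Fraction as F

def sigma12(a, b, c, m):
    """Scale factors sigma_1 and sigma_2 of Eq. (20)."""
    s1 = (a * m * (m + 1) + c * (c - 2 * b)) / c**2
    s2 = (a - m**2) / (c - m**2)**2 - a * (m + 1) / c**2 \
         + 2 * m * (c - b) / (c * (c - m**2))
    return s1, s2

m, nu = 5, 2
d = F(nu, 2)                           # nu/2 as an exact rational

# Tyler: psi = m gives a = b = c = m^2; only sigma_1 is computed (c - m^2 = 0)
a = b = c = F(m * m)
s1_tyler = (a * m * (m + 1) + c * (c - 2 * b)) / c**2
print(s1_tyler)                        # equals 1/m, as in Theorem III.2

# Student: a = b = c = m(m+1)(m + nu/2)/(m + 1 + nu/2)
a = b = c = F(m * (m + 1)) * (m + d) / (m + 1 + d)
s1, s2 = sigma12(a, b, c, m)
print(s1, s2)                          # 1/(m + nu/2) and (2/nu)(m+1+nu/2)/(m+nu/2)
```

Exact rationals avoid any floating-point ambiguity when checking that the general formulas of Eq. (20) collapse to the closed forms of Theorem III.2 and Corollary III.1.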

D. Discussion where D and Q are “fixed”. Thus, one has a gain in terms of convergence of m. This is agreement with Here are some general comments on the proposed results the results obtained in [37] for a different convergence as well as their great interest in practice. regime (m, K → ∞ with m/K tending to a positive 1) First, to examine the values of the scale factors in Eq. constant). 2 (20), we discuss the values of E[ψ (σt1)], E[ψ(σt1)t2] 0 2 and E[ψ (σt1)σt1] + m . Since 0 < ψ(σt1) < M 5) Finally, it should be pointed out that the results can be and E(ψ(σt1)) = m, using Bhatia-Davis inequality applied to various signal processing problems. One can [36], one has that var(ψ(σt1)) < (M − m)m and note that the scaled variance of the robust Mahalanobis 2 thus E(ψ(σt1) ) < Mm. Since M is of same distance when centering around the one based on the magnitude as m and M > m, one obtains that SCM in a Gaussian context depends only on the scale 2 2 E(ψ(σt1) ) is of same magnitude as m (for factors given by Eq. (20). This directly leads to the 2 2 Tyler’s estimator E(ψ(σt1) ) = m , for Student ν conclusion that the distribution of the robust distance 2 m(m+1)(m+ 2 ) M-estimator E(ψ(σt1) ) = ν , for can be better approximated with the one of the SCM- m+1+ 2 2 2 SCM E(ψ(σt1) ) = m + m...). From this, it based distance, than with the asymptotical chi-squared follows that b is also of the same magnitude as m2 distribution. These results can be extended to various 2 since m = E(ψ(σt1))E(t2) < E(ψ(σt1)t2) < problems such as detection or classification problems (see p 2 p 2 p 2 p 2 E(ψ(σt1) ) E(t2) < (m + m) E(ψ(σt1) ). e.g., [35]). It is obvious that c is also of the same magnitude as m2. Generally, for all widely used M-estimators, one IV. SIMULATIONS 2 obtains that a, b, c = m + αm, α > 0 which leads A. Validation of the theoretical results to σ inversely proportional to m. 
For σ , one can not 1 2 In this section we first present some simulations that validate provide precise information about its value, but it turns the theoretical results of Theorem III.1. Figure 1 presents the out that it is eather smaller (e.g., Tyler’s estimator) or empirical mean6 norm of the difference between the empirical unchanged (e.g., Student’s M-estimator) comparing to √ covariance matrix of K(σM − M ) (Eq. (18)), denoted the scale factor given in Eq. (10). This ensures the strong c cSCM as Σ(K) and the theoretical results obtained in Theorem III.1. “proximity” between M-estimators and SCM, justifying The plotted results are obtained from t-distributed data with a the approximation of M-estimators behavior thanks to a DoF ν set to 2 and using the Student’s M-estimator (for which Wishart distribution. theoretical results are explicitly given in Corollary III.1). The 2) The results derived in this paper show that all M- estimators are asymptotically closer to the SCM than 0 to the true covariance matrix. By “close”, we mean 10 that the asymptotic variance when centering about the Gaussian-based SCM is much smaller than the one when k −1 centering about true scatter matrix. Also, this difference C 10 −

is more obvious when the dimension m increases. This ) K remark is of course also obvious for Tyler’s estimator. ( C k

3) An important consequence of the previous remark is that the behavior of any M-estimator (including Tyler's) can be approximated by that of the SCM built with Gaussian random vectors, namely by the Wishart distribution. This is of great interest in practice since all the analytical performance of functionals of robust scatter estimators can be derived from their equivalents for the simpler Wishart distribution, while keeping the inherent robustness brought by M-estimators (contrary to the SCM). To summarize, robust estimators are better approximated by a Wishart distribution than by the asymptotic Gaussian distribution with the true scatter matrix as mean.

4) Another comment is that, roughly speaking, one has the following result for any robust scatter matrix estimator $\widehat{M}$:
$$\sqrt{mK}\,\mathrm{vec}\left(\widehat{M} - \widehat{M}_{SCM}\right) \xrightarrow[K\to\infty]{d} \mathcal{CN}(0, D, Q).$$

Figure 1: Euclidean norm of the difference between the empirical covariance matrix of Eq. (18) and the theoretical result of Eq. (19) with $\sigma_1$ and $\sigma_2$ of Corollary III.1, $m = 5$, obtained as the empirical mean of the quantities over $I$ Monte Carlo runs ($I = 10K$).

The scatter matrix $M$ is defined by $M_{i,j} = \rho^{|i-j|}$, $i, j = 1, \dots, m$. The correlation coefficient $\rho$ is set to 0, i.e., the scatter matrix is equal to the identity matrix. One can notice that Figure 1 validates the results obtained in Theorem III.1, since the quantity tends to zero when the number $K$ of samples tends to infinity. Recall that, following Remark III.1, when the scatter matrix is equal to the identity, the matrices $\Sigma$ and $\Omega$ contain only three different non-null elements: $\sigma_1 + \sigma_2$, $\sigma_1$ and $\sigma_2$. Here, we compare the empirical value of $\sigma_1$ to the empirical value of $\vartheta_1$ (the first scale factor of the empirical covariance matrix of $\sqrt{K}(\sigma\widehat{M} - M)$) in distinct non-Gaussian environments. Results are similar for the other coefficients and are therefore omitted. The legends refer to the following quantities:

• SCM: empirical version of 1 in Eq. (7),
• TyE: empirical version of $\vartheta_1 = m^{-1}(m+1)$ in Eq. (12),
• Huber: empirical version of $\vartheta_1$ in Eq. (10) for Huber's M-estimator,
• Student: empirical version of $\vartheta_1$ in Eq. (10) for Student's M-estimator,
• TyE-SCM: empirical version of $1/m$ of Theorem III.2,
• Huber-SCM: empirical version of $\sigma_1$ of Theorem III.1 computed with Huber's M-estimator,
• Student-SCM: empirical version of $\sigma_1 = (m + \nu/2)^{-1}$ of Corollary III.1.

Figure 2 presents results in various non-Gaussian cases. On Figure 2(a), the results obtained for complex t-distributed data ($\nu = 2$) are presented. The second diagonal elements for Tyler's and Student's M-estimators are plotted; the horizontal axis represents the dimension $m$ of the data, and the number of samples $K$ is set to 1000. One can notice that the second diagonal element for the M-estimators vanishes when $m$ increases, as expected. Indeed, if we look at the results from Theorem III.2 and Corollary III.1, the first scale factor is inversely proportional to the dimension $m$.

On Figure 2(b), we present the results for Tyler's and Huber's M-estimators when the data are corrupted by some outliers. The parameter $q$ for Huber's M-estimator is set to 0.95, which means that 95% of the data are considered to be Gaussian distributed while the remaining 5% are treated as outliers. As can be noted, the results are the same as on Figure 2(a), showing the robustness of these two estimators and validating the theoretical results. Generally speaking, these tests show that M-estimators are better characterized by a Wishart distribution than by a Gaussian distribution centered on the true matrix $M$.

Figure 2: Empirical second diagonal element in the asymptotic covariance matrices versus the dimension $m$ for $K = 1000$. Panels: (a) Student-t distribution, (b) Gaussian plus outliers.

B. Application to Mahalanobis distance

We now present results for the robust Mahalanobis distance. On Figure 3(a), the results for Tyler's M-estimator are presented when the data follow a complex t-distribution with $\nu = 2$.

Figure 3: Scaled empirical variance of the robust Mahalanobis distance compared to the one when centering around the SCM-based distance (with the result of Theorem III.3, Eq. (22)) versus $K$, $m = 10$. Panels: (a) Tyler's estimator, (b) Huber's M-estimator.

The empirical variance of the robust distance and that of the difference between the robust distance and the distance computed with the SCM in a Gaussian context (compared to the theoretical result of Theorem III.3, Eq. (22)) are plotted. On Figure 3(b) the results for Huber's M-estimator are plotted: 95% of the data follow a Gaussian distribution, while the outliers (the remaining 5% of the data) are modelled with a t-distribution ($\nu = 2$). One can notice that the value of the robust distance is much closer to the one based on the SCM than to the distance computed with the true scatter matrix, which once again justifies the statement that the behavior of M-estimators can be approximated by a Wishart distribution. This also implies that the distribution of robust distances can be better approximated by the theoretical distribution of the SCM-based distance in the Gaussian framework than by the asymptotic distribution based on the true scatter matrix.

Figure 4 presents the empirical distribution of the robust Mahalanobis distance built with Student's M-estimator and the two corresponding distributions proposed in Eqs. (15) and (16) for t-distributed data with $\nu = 2$ ($K = 100$, $m = 10$). One can observe that the empirical distribution matches the scaled Beta prime distribution significantly better than the scaled chi-squared distribution. The essential advantage of these findings for, for instance, outlier detection is that they support the idea of using robust M-estimators to estimate the scatter matrix while relying on the theoretical distribution of the Wishart-based distance when computing the detection threshold.

Figure 4: Histogram of the robust Mahalanobis distance based on Student's M-estimator $\Delta^2(\widehat{M}_t)$ versus the asymptotic distribution (Eq. (15)) in red and the theoretical approximative distribution (Eq. (16)) in green, with $K = 100$, $m = 10$, $z \sim \mathbb{C}t_\nu$ with $\nu = 2$.

V. CONCLUSIONS

This paper investigated the statistical properties of M-estimators. To that end, a new "Gaussian-core" model has been introduced for CES distributions. We have proposed a new approach that consists in comparing M-estimators to the well-known Gaussian-based SCM in order to derive new properties. In other words, the approach can be summarized as follows: explaining the behavior of an "intractable" estimator $\hat{\theta}$ by analyzing its proximity to a well-known estimator $\hat{\theta}_1$. It has been shown that the second-order statistics of M-estimators when centering around a Wishart-distributed matrix are much smaller than the ones obtained when centering around the true scatter matrix. It has also been revealed that this difference is even more meaningful for high-dimensional data. It should be stressed that these results provide a better approximation of M-estimators' properties than the other analyses in the literature. In our view, these results represent an excellent initial step toward a better understanding of the behavior of M-estimators applied in various problems. In this work, we have presented the application to the widely used Mahalanobis distance. This approach could also be applied to adaptive detection problems and thus be very helpful in improving detection performance. Moreover, one potential application of our findings can be found in polarimetric SAR image restoration, clustering and/or target detection. To conclude, we are confident that the results of this work are very promising and can be applied to a wide range of signal processing problems.

APPENDIX A
PROOF OF THEOREM III.1

To prove the statement, let us rewrite the right-hand side of Eq. (19) as follows:
$$\sqrt{K}\,\mathrm{vec}\left(\sigma\widehat{M} - \widehat{M}_{SCM}\right) = \sqrt{K}\,\mathrm{vec}\left(\sigma\widehat{M} - M - \widehat{M}_{SCM} + M\right) = \sqrt{K}\,\mathrm{vec}\left(\sigma\widehat{M} - M\right) - \sqrt{K}\,\mathrm{vec}\left(\widehat{M}_{SCM} - M\right).$$
Therefore one has $\Sigma^{(K)} = \Sigma_1^{(K)} - 2\Sigma_2^{(K)} + \Sigma_3^{(K)}$ with
$$\Sigma_1^{(K)} = K\,\mathbb{E}\left[\mathrm{vec}(\sigma\widehat{M} - M)\,\mathrm{vec}(\sigma\widehat{M} - M)^H\right],$$
$$\Sigma_2^{(K)} = K\,\mathbb{E}\left[\mathrm{vec}(\sigma\widehat{M} - M)\,\mathrm{vec}(\widehat{M}_{SCM} - M)^H\right],$$
$$\Sigma_3^{(K)} = K\,\mathbb{E}\left[\mathrm{vec}(\widehat{M}_{SCM} - M)\,\mathrm{vec}(\widehat{M}_{SCM} - M)^H\right].$$
One has now
$$\Sigma^{(K)} \xrightarrow[K\to+\infty]{} \Sigma = \Sigma_1(M) - 2\Sigma_2(M) + \Sigma_3(M) \tag{23}$$
where the matrices $\Sigma_1(M)$ and $\Sigma_3(M)$ are given by (10) and (7), respectively.

Following ideas similar to those used in [18], [22], we provide a more general result that allows one to compute the correlation between the two estimators:
$$\Sigma_2^{(K)} \xrightarrow[K\to+\infty]{} \Sigma_2(M) = D_1^{-1}(M)\,B(M)\,D_2^{-1}(M)$$
where $D_1(M) = \mathbb{E}\left[\mathrm{d}\{\mathrm{vec}\,\Psi_1(M)\}/\mathrm{d}\{\mathrm{vec}(M)\}\right]$, $B(M) = \mathrm{cov}\left(\mathrm{vec}\,\Psi_1(M), \mathrm{vec}\,\Psi_2(M)\right)$ and $D_2(M) = \mathbb{E}\left[\mathrm{d}\{\mathrm{vec}\,\Psi_2(M)\}/\mathrm{d}\{\mathrm{vec}(M)\}\right]$, with $\Psi_1(M) = \sigma\varphi\!\left(z^H (\sigma^{-1}M)^{-1} z\right) zz^H - M$ and $\Psi_2(M) = xx^H - M$.

Without loss of generality, we will assume that $M = I$. Indeed, one has
$$\Sigma_2(M) = \left(M^{T/2} \otimes M^{1/2}\right) \Sigma_2(I) \left(M^{T/2} \otimes M^{1/2}\right)^H.$$
In order to determine the final result, we derive the expression of $\Sigma_2(I)$. One can show that
$$D_1^{-1}(I) = \alpha_1 I + \alpha_2\, \mathrm{vec}(I)\mathrm{vec}(I)^T$$
where $\alpha_1 = -\dfrac{m(m+1)}{c}$ and $\alpha_2 = \dfrac{m(c - m^2 - m)}{c(c - m^2)}$ with $c = \mathbb{E}\left[\sigma t_1 \psi'(\sigma t_1)\right] + m^2$. Moreover, it is simple to show that $D_2(I)^{-1} = -I$. Then, based on Theorem 2 from [22], one can derive the more general result
$$B(I) = \beta_1 I + \beta_2\, \mathrm{vec}(I)\mathrm{vec}(I)^T$$
where
$$\beta_1 = \mathrm{cov}\left[\Psi_1(I)_{jk}, \Psi_2(I)_{jk}\right] = \mathbb{E}\left[\psi(\sigma t_1)\, t_2\, u_j^2 u_k^2\right] = \frac{\mathbb{E}\left[\psi(\sigma t_1) t_2\right]}{m(m+1)} = \frac{b}{m(m+1)}$$
and
$$\beta_2 = \mathrm{cov}\left[\Psi_1(I)_{jj}, \Psi_2(I)_{kk}\right] = \beta_1 - \mathbb{E}\left[\psi(\sigma t_1)\right]\mathbb{E}\left[t_2\right]/m^2 = \beta_1 - 1$$
since $u_j^2 \sim \mathrm{Beta}(1, m-1)$, $\mathbb{E}[u_j^2] = 1/m$ and $\mathbb{E}[u_j^2 u_k^2] = 1/(m(m+1))$. After some mathematical manipulations, one obtains
$$\Sigma_2(I) = \gamma_1 I + \gamma_2\, \mathrm{vec}(I)\mathrm{vec}(I)^T$$
with
$$\gamma_1 = -\alpha_1\beta_1 = \frac{b}{c}, \qquad \gamma_2 = -(\alpha_1\beta_2 + 2\alpha_2\beta_1 + 2m\alpha_2\beta_2) = \frac{m(b-c)}{c(c-m^2)}. \tag{24}$$
This leads to the final expression of $\Sigma_2(M)$:
$$\Sigma_2(M) = \gamma_1\, M^T \otimes M + \gamma_2\, \mathrm{vec}(M)\mathrm{vec}(M)^H. \tag{25}$$
Combining Eq. (25) together with Eqs. (10) and (7) in Eq. (23), one obtains the coefficients $\sigma_1$ and $\sigma_2$ as
$$\sigma_1 = \vartheta_1 - 2\gamma_1 + 1 = \frac{a\,m(m+1) + c(c - 2b)}{c^2} \qquad \text{and} \qquad \sigma_2 = \vartheta_2 - 2\gamma_2 = \vartheta_2 - \frac{2m(b-c)}{c(c-m^2)}.$$
Finally, one can easily prove that $\Omega = \Sigma K$, where $K$ denotes the commutation matrix [6], which leads to the final results and concludes the proof.

APPENDIX B
PROOF OF COROLLARY III.1

For Student's t-distribution, $t_1 \sim m F_{2m,\nu}$ yields
$$f(t_1) = C_m\, t_1^{m-1}\left(1 + \frac{2t_1}{\nu}\right)^{-\frac{2m+\nu}{2}} \qquad \text{with} \qquad C_m = \left(\frac{2}{\nu}\right)^{m} \frac{\Gamma\!\left(m + \frac{\nu}{2}\right)}{\Gamma(m)\,\Gamma\!\left(\frac{\nu}{2}\right)}$$
where $\Gamma(\cdot)$ is the Gamma function. Since $\sigma = 1$ for every ML-estimator [1], one has
$$\psi(\sigma t_1) = \psi(t_1) = \frac{(2m+\nu)\, t_1}{\nu + 2t_1}.$$
Now one obtains
$$\mathbb{E}\left[\psi^2(t_1)\right] = \mathbb{E}\left[\frac{(2m+\nu)^2}{(\nu+2t_1)^2}\, t_1^2\right] = C_m\, \frac{(2m+\nu)^2}{\nu^2} \int_0^{+\infty} t_1^{m+1}\left(1+\frac{2t_1}{\nu}\right)^{-\frac{2m+4+\nu}{2}} \mathrm{d}t_1 = \frac{(2m+\nu)^2}{\nu^2}\, \frac{C_m}{C_{m+2}} = \frac{m(m+1)\left(m+\frac{\nu}{2}\right)}{m+1+\frac{\nu}{2}}$$
and
$$\mathbb{E}\left[t_1\psi'(t_1)\right] = \mathbb{E}\left[\frac{(2m+\nu)\,\nu}{(\nu+2t_1)^2}\, t_1\right] = C_m\, \frac{2m+\nu}{\nu} \int_0^{+\infty} t^{-1}\, t^{m+1}\left(1+\frac{2t}{\nu}\right)^{-\frac{2m+4+\nu}{2}} \mathrm{d}t = \frac{2m+\nu}{\nu}\, \frac{C_m}{C_{m+2}}\, \mathbb{E}\left[t^{-1}\right]$$
where now $t/(m+2) \sim F_{2m+4,\nu}$, or equivalently $(m+2)/t \sim F_{\nu,2m+4}$, which gives
$$\mathbb{E}\left[t^{-1}\right] = \frac{1}{m+2}\, \frac{2m+4}{2m+4-2} = \frac{1}{m+1}$$
and finally
$$\mathbb{E}\left[t_1\psi'(t_1)\right] = \frac{\nu\, m}{2\left(m+1+\frac{\nu}{2}\right)}.$$
To compute $\mathbb{E}[\psi(t_1)t_2]$, let us recall that $t_1 = \tau t_2$, where $\tau$ and $t_2$ are independent, $\tau \sim \mathcal{IG}(\nu/2, \nu/2)$ and $t_2 \sim \frac{1}{2}\chi^2_{2m}$. Thus, one can write
$$I = \mathbb{E}\left[\psi(t_1)t_2\right] = C \iint_{\mathbb{R}_+^2} \frac{\tau\, t_2^{m+1}}{1 + \frac{2\tau t_2}{\nu}}\, \tau^{-\frac{\nu}{2}-1}\, e^{-\frac{\nu}{2\tau}}\, e^{-t_2}\, \mathrm{d}\tau\, \mathrm{d}t_2$$
where $C = \frac{2m+\nu}{\nu}\left(\frac{\nu}{2}\right)^{\frac{\nu}{2}}\big/\left(\Gamma\!\left(\frac{\nu}{2}\right)\Gamma(m)\right)$. The change of variable $u = \frac{2\tau}{\nu} t_2$ gives $\mathrm{d}u = \frac{2\tau}{\nu}\, \mathrm{d}t_2$ and hence
$$I = C \int_0^{+\infty} \tau^{-\frac{\nu}{2}-1}\, e^{-\frac{\nu}{2\tau}}\, \tau \left(\frac{\nu}{2\tau}\right)^{m+2} \int_0^{+\infty} \frac{u^{m+1}}{1+u}\, e^{-\frac{\nu}{2\tau}u}\, \mathrm{d}u\, \mathrm{d}\tau.$$
Then, using the equality
$$\int_0^{+\infty} \frac{u^{m+1}}{1+u}\, e^{-\frac{\nu}{2\tau}u}\, \mathrm{d}u = e^{\frac{\nu}{2\tau}}\, (m+1)!\; \Gamma\!\left(-1-m;\, \frac{\nu}{2\tau}\right),$$
where $\Gamma(\cdot;\cdot)$ stands for the upper incomplete Gamma function, one obtains
$$I = C' \int_0^{+\infty} \tau^{-\frac{\nu}{2}-m-2}\; \Gamma\!\left(-1-m;\, \frac{\nu}{2\tau}\right) \mathrm{d}\tau, \qquad C' = C\,(m+1)!\left(\frac{\nu}{2}\right)^{m+2}.$$
Since $\Gamma\!\left(-1-m;\, \frac{\nu}{2\tau}\right) = \left(\frac{\nu}{2\tau}\right)^{-m-1} E_{m+2}\!\left(\frac{\nu}{2\tau}\right)$, where $E_{m+2}$ is the generalized exponential integral, one has
$$I = C'' \int_0^{+\infty} \tau^{-\frac{\nu}{2}-1}\, E_{m+2}\!\left(\frac{\nu}{2\tau}\right) \mathrm{d}\tau, \qquad C'' = C'\left(\frac{2}{\nu}\right)^{m+1},$$
which leads to
$$I = C'' \int_0^{+\infty}\!\!\int_1^{+\infty} \tau^{-\frac{\nu}{2}-1}\, e^{-\frac{\nu t}{2\tau}}\, t^{-m-2}\, \mathrm{d}t\, \mathrm{d}\tau = C''' \int_1^{+\infty} t^{-m-2-\frac{\nu}{2}}\, \mathrm{d}t = \frac{C'''}{m+1+\frac{\nu}{2}}$$
where $C''' = C''\left(\frac{2}{\nu}\right)^{\frac{\nu}{2}}\Gamma\!\left(\frac{\nu}{2}\right)$, and finally
$$\mathbb{E}\left[\psi(t_1)t_2\right] = \frac{\left(m+\frac{\nu}{2}\right)m(m+1)}{m+1+\frac{\nu}{2}}.$$
This leads to the following values for $a$, $b$ and $c$:
$$a = \frac{m(m+1)\left(m+\frac{\nu}{2}\right)}{m+1+\frac{\nu}{2}}, \qquad b = \frac{m(m+1)\left(m+\frac{\nu}{2}\right)}{m+1+\frac{\nu}{2}}, \qquad c = \frac{\nu m}{2\left(m+1+\frac{\nu}{2}\right)} + m^2 = \frac{m(m+1)\left(m+\frac{\nu}{2}\right)}{m+1+\frac{\nu}{2}}.$$
Substituting the previous results in Eq. (20), one finally obtains
$$\sigma_1 = \frac{1}{m+\frac{\nu}{2}} \qquad \text{and} \qquad \sigma_2 = \frac{2\left(m+1+\frac{\nu}{2}\right)}{\nu\left(m+\frac{\nu}{2}\right)}.$$

APPENDIX C
PROOF OF THEOREM III.3

To prove the statement of Theorem III.3, we rewrite $\phi$ as
$$\phi = \left(\phi_M - 2\phi_{corr} + \phi_{SCM}\right)\big/\left(y^H M^{-1} y\right)^2 \tag{26}$$
where
$$\phi_M = \mathbb{E}\left[\left(f(\sigma\widehat{M}) - f(M)\right)^2\right], \quad \phi_{corr} = \mathbb{E}\left[\left(f(\sigma\widehat{M}) - f(M)\right)\left(f(\widehat{M}_{SCM}) - f(M)\right)\right], \quad \phi_{SCM} = \mathbb{E}\left[\left(f(\widehat{M}_{SCM}) - f(M)\right)^2\right]$$
with $f(M) = y^H M^{-1} y$. Using the Delta method [17], one can obtain $\phi_M = f'(M)\, \Sigma_M\, f'(M)^H$, where $f'(M)$ is the first derivative with respect to $M$ and $\Sigma_M = \mathbb{E}\left[\mathrm{vec}(\sigma\widehat{M} - M)\,\mathrm{vec}(\sigma\widehat{M} - M)^H\right]$ is given by Eq. (10). Moreover, one has the more general result $\phi_{corr} = f'(M)\, \Sigma_2\, f'(M)^H$, where $\Sigma_2 = \mathbb{E}\left[\mathrm{vec}(\sigma\widehat{M} - M)\,\mathrm{vec}(\widehat{M}_{SCM} - M)^H\right] = \gamma_1\left(M^T \otimes M\right) + \gamma_2\, \mathrm{vec}(M)\mathrm{vec}(M)^H$ with $\gamma_1$ and $\gamma_2$ the complex versions of Eq. (24). In [?] it has been shown that $f'(M) = \mathrm{vec}\left(yy^H\right)^H \left(M^T \otimes M\right)^{-1}$. Since $\left(M^T \otimes M\right)^{-1} \mathrm{vec}(M) = \mathrm{vec}\left(M^{-1}\right)$, one has
$$\phi_M = f'(M)\, \Sigma_M\, f'(M)^H = \vartheta_1\, \mathrm{vec}\left(yy^H\right)^H \left(M^T \otimes M\right)^{-1} \mathrm{vec}\left(yy^H\right) + \vartheta_2\, \mathrm{vec}\left(yy^H\right)^H \mathrm{vec}\left(M^{-1}\right) \mathrm{vec}\left(M^{-1}\right)^H \mathrm{vec}\left(yy^H\right)$$
$$= \vartheta_1\, \mathrm{vec}\left(yy^H\right)^H \mathrm{vec}\left(M^{-1} yy^H M^{-1}\right) + \vartheta_2\, \mathrm{Tr}\left(yy^H M^{-1}\right) \mathrm{Tr}\left(M^{-1} yy^H\right) = \vartheta_1\, \mathrm{Tr}\left(y^H M^{-1} yy^H M^{-1} y\right) + \vartheta_2\, \mathrm{Tr}\left(y^H M^{-1} y\right)^2 = (\vartheta_1 + \vartheta_2)\left(y^H M^{-1} y\right)^2.$$
It is now clear that $\phi_{SCM} = \left(y^H M^{-1} y\right)^2$ and $\phi_{corr} = f'(M)\, \Sigma_2\, f'(M)^H = (\gamma_1 + \gamma_2)\left(y^H M^{-1} y\right)^2$, which leads to the final result.

REFERENCES

[1] E. Ollila, D. E. Tyler, V. Koivunen, and H. V. Poor, "Complex elliptically symmetric distributions: Survey, new results and applications," IEEE Transactions on Signal Processing, vol. 60, no. 11, pp. 5597–5625, November 2012.
[2] F. Pascal, Y. Chitour, J.-P. Ovarlez, P. Forster, and P. Larzabal, "Covariance structure maximum-likelihood estimates in Compound-Gaussian noise: existence and algorithm analysis," IEEE Transactions on Signal Processing, vol. 56, no. 1, pp. 34–48, January 2008.
[3] Y. Chen, A. Wiesel, and A. O. Hero, "Robust shrinkage estimation of high-dimensional covariance matrices," IEEE Transactions on Signal Processing, vol. 59, no. 9, pp. 4097–4107, 2011.
[4] A. Wiesel, "Unified framework to regularized covariance estimation in scaled Gaussian models," IEEE Transactions on Signal Processing, vol. 60, no. 1, pp. 29–38, 2012.
[5] F. Pascal, L. Bombrun, J.-Y. Tourneret, and Y. Berthoumieu, "Parameter estimation for multivariate generalized Gaussian distributions," IEEE Transactions on Signal Processing, vol. 61, no. 23, pp. 5960–5971, December 2013.
[6] M. Mahot, F. Pascal, P. Forster, and J.-P. Ovarlez, "Asymptotic properties of robust complex covariance matrix estimates," IEEE Transactions on Signal Processing, vol. 61, no. 13, pp. 3348–3356, July 2013.
[7] Y. Sun, P. Babu, and D. P. Palomar, "Regularized Tyler's scatter estimator: Existence, uniqueness and algorithms," IEEE Transactions on Signal Processing, vol. 62, no. 19, pp. 5143–5156, October 2014.
[8] E. Ollila and D. E. Tyler, "Regularized M-estimators of scatter matrix," IEEE Transactions on Signal Processing, vol. 62, no. 22, pp. 6059–6070, November 2014.
[9] A. K. Gupta and D. K. Nagar, Matrix Variate Distributions. Chapman & Hall/CRC, 2000.
[10] M. Bilodeau and D. Brenner, Theory of Multivariate Statistics. New York, NY, USA: Springer-Verlag, 1999.
[11] D. Kelker, "Distribution theory of spherical distributions and a location-scale parameter generalization," Sankhyā: The Indian Journal of Statistics, Series A, vol. 32, no. 4, pp. 419–430, December 1970.
[12] F. Gini and M. S. Greco, "Sub-optimum approach to adaptive coherent radar detection in Compound-Gaussian clutter," IEEE Transactions on Aerospace and Electronic Systems, vol. 35, no. 3, pp. 1095–1103, July 1999.
[13] F. Gini, M. S. Greco, M. Diani, and L. Verrazzani, "Performance analysis of two adaptive radar detectors against non-Gaussian real sea clutter data," IEEE Transactions on Aerospace and Electronic Systems, vol. 36, no. 4, pp. 1429–1439, October 2000.

[14] F. Gini and M. S. Greco, "Covariance matrix estimation for CFAR detection in correlated heavy tailed clutter," Signal Processing, special section on SP with Heavy Tailed Distributions, vol. 82, no. 12, pp. 1847–1859, December 2002.
[15] E. Conte and M. Longo, "Characterization of radar clutter as a spherically invariant random process," IEE Proceedings, Part F, vol. 134, no. 2, pp. 191–197, April 1987.
[16] E. Conte, A. De Maio, and G. Ricci, "Covariance matrix estimation for adaptive CFAR detection in Compound-Gaussian clutter," IEEE Transactions on Aerospace and Electronic Systems, vol. 38, no. 2, pp. 415–426, April 2002.
[17] P. J. Huber, "Robust estimation of a location parameter," The Annals of Mathematical Statistics, vol. 35, no. 1, pp. 73–101, January 1964.
[18] R. A. Maronna, "Robust M-estimators of multivariate location and scatter," The Annals of Statistics, vol. 4, no. 1, pp. 51–67, January 1976.
[19] J. T. Kent and D. E. Tyler, "Maximum likelihood estimation for the wrapped Cauchy distribution," Journal of Applied Statistics, vol. 15, no. 2, pp. 247–254, 1988.
[20] A. Balleri, A. Nehorai, and J. Wang, "Maximum likelihood estimation for Compound-Gaussian clutter with inverse gamma texture," IEEE Transactions on Aerospace and Electronic Systems, vol. 43, no. 2, pp. 775–779, April 2007.
[21] D. E. Tyler, "Radial estimates and the test for sphericity," Biometrika, vol. 69, no. 2, p. 429, 1982.
[22] ——, "A distribution-free M-estimator of multivariate scatter," The Annals of Statistics, vol. 15, no. 1, pp. 234–251, 1987.
[23] F. Pascal, P. Forster, J.-P. Ovarlez, and P. Larzabal, "Performance analysis of covariance matrix estimates in impulsive noise," IEEE Transactions on Signal Processing, vol. 56, no. 6, pp. 2206–2217, June 2008.
[24] G. Drašković and F. Pascal, "New properties for Tyler's covariance matrix estimator," in 2016 50th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, November 2016, pp. 820–824.
[25] P. C. Mahalanobis, "On the generalized distance in statistics," Proceedings of the National Institute of Sciences (Calcutta), vol. 2, pp. 49–55, 1936.
[26] R. De Maesschalck, D. Jouan-Rimbaud, and D. Massart, "The Mahalanobis distance," Chemometrics and Intelligent Laboratory Systems, vol. 50, no. 1, pp. 1–18, 2000. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0169743999000477
[27] K. Yao, "A representation theorem and its applications to spherically invariant random processes," IEEE Transactions on Information Theory, vol. 19, no. 5, pp. 600–608, September 1973.
[28] P. J. Rousseeuw and B. C. Van Zomeren, "Unmasking multivariate outliers and leverage points," Journal of the American Statistical Association, vol. 85, no. 411, pp. 633–639, 1990.
[29] A. S. Hadi, "Identifying multiple outliers in multivariate data," Journal of the Royal Statistical Society, Series B (Methodological), pp. 761–771, 1992.
[30] S. Xiang, F. Nie, and C. Zhang, "Learning a Mahalanobis distance metric for data clustering and classification," Pattern Recognition, vol. 41, no. 12, pp. 3600–3612, 2008. [Online]. Available: http://www.sciencedirect.com/science/article/pii/S0031320308002057
[31] K. Q. Weinberger, J. Blitzer, and L. K. Saul, "Distance metric learning for large margin nearest neighbor classification," in Advances in Neural Information Processing Systems, 2006, pp. 1473–1480.
[32] P. Pudil, J. Novovičová, and J. Kittler, "Floating search methods in feature selection," Pattern Recognition Letters, vol. 15, no. 11, pp. 1119–1125, 1994.
[33] C.-I. Chang and S.-S. Chiang, "Anomaly detection and classification for hyperspectral imagery," IEEE Transactions on Geoscience and Remote Sensing, vol. 40, no. 6, pp. 1314–1325, 2002.
[34] J. Frontera-Pons, M. A. Veganzones, S. Velasco-Forero, F. Pascal, J.-P. Ovarlez, and J. Chanussot, "Robust anomaly detection in hyperspectral imaging," in 2014 IEEE Geoscience and Remote Sensing Symposium, July 2014, pp. 4604–4607.
[35] G. Drašković, F. Pascal, A. Breloy, and J.-Y. Tourneret, "New asymptotic properties for the robust ANMF," in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-17, New Orleans, USA, March 2017.
[36] R. Bhatia and C. Davis, "A better bound on the variance," The American Mathematical Monthly, vol. 107, no. 4, pp. 353–357, 2000.
[37] R. Couillet, F. Pascal, and J. W. Silverstein, "The random matrix regime of Maronna's M-estimator with elliptically distributed samples," Journal of Multivariate Analysis, vol. 139, pp. 56–78, July 2015.
[38] F. Pascal and J.-P. Ovarlez, "Asymptotic properties of the robust ANMF," in IEEE International Conference on Acoustics, Speech, and Signal Processing, ICASSP-15, Brisbane, Australia, April 2015.