
Maximum likelihood estimation from two possibly mismatched data sets

Olivier Besson

To cite this version:

Olivier Besson. Maximum likelihood covariance matrix estimation from two possibly mismatched data sets. Signal Processing, Elsevier, 2020, 167, pp.107285-107294. doi:10.1016/j.sigpro.2019.107285. hal-02572461

HAL Id: hal-02572461 https://hal.archives-ouvertes.fr/hal-02572461 Submitted on 13 May 2020



This is an author's version published in: https://oatao.univ-toulouse.fr/25984


Maximum likelihood covariance matrix estimation from two possibly mismatched data sets

Olivier Besson

ISAE-SUPAERO, 10 Avenue Edouard Belin, Toulouse 31055, France

Abstract

We consider estimating the covariance matrix from two data sets, one whose covariance matrix R1 is the sought one and another set of samples whose covariance matrix R2 slightly differs from the sought one, due e.g. to different measurement configurations. We assume however that the two matrices are rather close, which we formulate by assuming that $R_1^{1/2} R_2^{-1} R_1^{1/2} \mid R_1$ follows a Wishart distribution around the identity matrix. It turns out that this assumption results in two data sets with different marginal distributions, hence the problem becomes that of covariance matrix estimation from two data sets which are distribution-mismatched. The maximum likelihood estimator (MLE) is derived and is shown to depend on the values of the number of samples in each set. We show that it involves whitening of one data set by the other one, shrinkage of eigenvalues and colorization, at least when one data set contains more samples than the size p of the observation space. When both data sets have less than p samples but the total number is larger than p, the MLE again entails eigenvalue shrinkage, but this time after a projection operation. Simulation results compare the new estimator to state of the art techniques.

Keywords: Covariance matrix estimation; Maximum likelihood; Mismatch

1. Problem statement

Analysis or processing of multichannel data most often relies on the covariance matrix, which is a fundamental tool e.g., for principal component analysis, spectral analysis, adaptive filtering, detection, direction of arrival estimation among others [1-3]. In practical applications, the p × p covariance matrix R needs to be estimated from a finite number n of samples. When the latter are independent and Gaussian distributed, the maximum likelihood estimator of R is n^{-1} S, where X is the p × n data matrix and S = X X^T is the sample covariance matrix (SCM) [1]. However, in low sample support or when deviation from the Gaussian assumption is at hand, the SCM tends to behave poorly. In particular, it was observed that the sample covariance matrix is usually less well-conditioned than the true covariance matrix, and therefore considerable effort has been dedicated to regularizing it with a view to improving its performance.

One of the most important approaches in this respect is due to Stein [4-6] who, instead of maximizing the likelihood function, advocated minimizing a meaningful loss function within a given class of estimators. Stein hence introduced the concepts of admissible estimation and minimax estimators under the so-called Stein's loss. He showed that the SCM-based estimator is not minimax and derived minimax estimators in two important classes, namely estimators of the form R̂ = G D G^T, where D is a diagonal matrix and G is the Cholesky factor of S, or of the form R̂ = U diag(φ(λ)) U^T, where U diag(λ) U^T is the eigenvalue decomposition of S and φ(λ) is a non-linear function of λ. This seminal work of Stein gave rise to a great number of studies, see for instance [7-13] and references therein. A second class of robust estimates is based on linear shrinkage of the SCM towards a target matrix (an approach which can be interpreted as an empirical Bayes technique), i.e., estimates of the form R̂ = α R_t + β S, where R_t = I is the most widely spread choice, see e.g., [14-20]. Note that these techniques applied with R_t = I achieve an affine transformation of the eigenvalues of S, while retaining the eigenvectors, and therefore bear resemblance with Stein's method, although the selection of α, β may not be driven by the same principle. Robustness to a possibly non-Gaussian distribution has also been a topic of considerable interest and many papers have focused on robust estimation for elliptically distributed data, see e.g., [21-30] and references therein.

Most of the above cited works deal with estimation of a covariance matrix from a single data set. In this paper, we consider a situation where two data sets X1 and X2 are available, with respective covariance matrices R1 and R2. This situation typically arises in radar applications when one wishes to detect a target buried in clutter with unknown statistics [31,32]. In order to infer the latter, training samples are generally used, which hopefully share the same statistics as the clutter in the cell under test (CUT). However, it has been evidenced that clutter is most often heterogeneous [31], with a discrepancy compared to the CUT that may grow with the distance to the CUT [33]. Therefore, one is led to use some clustering that separates training samples, either based on their proximity to the CUT or by means of some statistical criterion, such as the power selected training [34]. The samples so selected are deemed to be representative of the clutter in the CUT while others are less reliable, which corresponds to the situation considered herein. A second example is in the field of synthetic aperture radar in the case where a scene is imaged on two consecutive days, with possible changes in between [35]. Finally, in hyperspectral imagery, the problem of target or anomaly detection leads to a very similar framework. Indeed, the background in a pixel under test has to be estimated from the local pixels around it and from pixels located further apart [36].

In the present paper, we assume that R2 is close to R1, the covariance matrix we wish to estimate. Since R2 differs from but is close to R1, we investigate using both X1 and X2 to estimate R1. The reason for also using X2 is that, although its covariance matrix is not R1, it is close to it. Additionally, one might face situations where the number of samples in X1 is very small. This paper constitutes a first approach to this specific problem and we focus herein on the most natural approach, namely maximum likelihood estimation. The objective is to figure out the pros and cons of the latter and the conditions under which it is an accurate estimator. The paper is organized as follows. In Section 2 we formulate the statistical assumptions: more precisely, we assume that $R_1^{1/2} R_2^{-1} R_1^{1/2} \mid R_1$ is a random matrix with a Wishart distribution around the identity matrix, and we derive the joint distribution of (X1, X2). Section 3 is devoted to the derivation of the maximum likelihood estimator of R1 from (X1, X2), taking into account the possible configurations regarding the number of samples in each data set. Numerical simulations illustrate the performance of the MLE and compare it with existing alternatives in Section 4. Conclusions and possible extensions of the present work are drawn in Section 5.

2. Data model

Let us assume that we have two sets of measurements X1 (p × n1) and X2 (p × n2) which are distributed according to X1 ~ N(0, R1, I) and X2 ~ N(0, R2, I), where N(0, Σ, Ω) denotes the matrix-variate normal distribution whose density is

$$p(X) = (2\pi)^{-pn/2}\,|\Sigma|^{-n/2}\,|\Omega|^{-p/2}\,\operatorname{etr}\left\{-\tfrac{1}{2}\Sigma^{-1}X\Omega^{-1}X^{T}\right\}$$

with |·| the determinant and etr{·} the exponential of the trace of a matrix. Note that we consider real-valued data here whereas in radar applications it is customary to consider complex-valued signals. In Appendix A we show how the results below can be readily extended to the complex case. Our goal in this paper is to estimate R1, using both X1 and X2 even if R1 ≠ R2. However, we assume that the two matrices are close to each other. In order to define a model that can reflect the proximity between R1 and R2, we note that the natural distance between them is given by

$$d^{2}(R_1,R_2) = \sum_{k=1}^{p}\log^{2}\lambda_k\!\left(G_1^{T}R_2^{-1}G_1\right)$$

[37,38], where G1 is a square-root of R1, i.e., R1 = G1 G1^T, and λ_k(G_1^T R_2^{-1} G_1) stands for the kth eigenvalue of G_1^T R_2^{-1} G_1. This matrix is pivotal in adaptive detection problems also. More precisely, in the case of a covariance mismatch between the training samples and the data under test, it is shown in [39] that the performance of the well-known adaptive matched filter depends essentially on this matrix. Therefore, it becomes natural to encapsulate the difference between R1 and R2 through the matrix W = G_1^T R_2^{-1} G_1 and its proximity to the identity matrix. There are of course different ways to translate this constraint in the model. For instance, a frequentist approach may be advocated where the joint probability density function of (X1, X2) would be maximized under the constraint that the distance between W and I is smaller than some value. Alternatively, and this is what we elect here, one can resort to an empirical Bayes approach where the matrix W follows some prior distribution rather concentrated around I. For mathematical tractability, we choose a conjugate prior for W and we assume that W follows a Wishart distribution with ν degrees of freedom and parameter matrix μ^{-1} I, i.e., W ~ W_p(ν, μ^{-1} I). Of course, this is a rather strong assumption whose validity would be difficult to check, e.g., on real data. However, it is in accordance with the mere knowledge we have about the relation between R1 and R2, and it allows for tractable derivations.

Using the fact that X1 | R1 and X2 | R2 are independent and Gaussian distributed with respective covariance matrices R1 and R2, and since R2 = G1 W^{-1} G1^T, we thus assume the following stochastic model:

$$p(X_1,X_2\mid R_1,W) = (2\pi)^{-p(n_1+n_2)/2}\,|R_1|^{-n_1/2}\,\left|W^{-1}R_1\right|^{-n_2/2}\,\operatorname{etr}\left\{-\tfrac12 X_1^{T}R_1^{-1}X_1 - \tfrac12 X_2^{T}G_1^{-T}WG_1^{-1}X_2\right\} \quad (1a)$$

$$p(W) = \frac{\mu^{\nu p/2}}{2^{\nu p/2}\,\Gamma_p(\nu/2)}\,|W|^{(\nu-p-1)/2}\,\operatorname{etr}\left\{-\tfrac{\mu}{2}W\right\} \quad (1b)$$

Note that E{W^{-1}} = (ν − p − 1)^{-1} μ I, so that E{R2} = E{G1 W^{-1} G1^T} = (ν − p − 1)^{-1} μ R1: therefore, for E{R2} to be equal to R1, one must select μ = ν − p − 1. Observe also that W comes closer to I as ν grows large. Indeed, E{W} = ν(ν − p − 1)^{-1} I and E{(W − E{W})²} = pν(ν − p − 1)^{-2} I, which goes to zero as ν → ∞ [40].

The marginal distribution of (X1, X2) is obtained by integrating (1) with respect to W, which results in

$$\begin{aligned} p(X_1,X_2\mid R_1) &= \int_{W>0} p(X_1,X_2\mid R_1,W)\,p(W)\,dW \\ &= \frac{(2\pi)^{-p(n_1+n_2)/2}\mu^{\nu p/2}}{2^{\nu p/2}\Gamma_p(\nu/2)}\,|R_1|^{-(n_1+n_2)/2}\operatorname{etr}\left\{-\tfrac12 X_1^{T}R_1^{-1}X_1\right\}\int_{W>0}|W|^{(\nu+n_2-p-1)/2}\operatorname{etr}\left\{-\tfrac12 W\left(\mu I + G_1^{-1}X_2X_2^{T}G_1^{-T}\right)\right\}dW \\ &= \frac{(2\pi)^{-p(n_1+n_2)/2}\mu^{\nu p/2}\,2^{(\nu+n_2)p/2}\,\Gamma_p((\nu+n_2)/2)}{2^{\nu p/2}\Gamma_p(\nu/2)}\,|R_1|^{-(n_1+n_2)/2}\left|\mu I + G_1^{-1}X_2X_2^{T}G_1^{-T}\right|^{-(\nu+n_2)/2}\operatorname{etr}\left\{-\tfrac12 X_1^{T}R_1^{-1}X_1\right\} \\ &= (2\pi)^{-pn_1/2}|R_1|^{-n_1/2}\operatorname{etr}\left\{-\tfrac12 X_1^{T}R_1^{-1}X_1\right\} \times \pi^{-pn_2/2}\frac{\Gamma_p((\nu+n_2)/2)}{\Gamma_p(\nu/2)}\,|\mu R_1|^{-n_2/2}\left|I + X_2^{T}[\mu R_1]^{-1}X_2\right|^{-(\nu+n_2)/2} \end{aligned} \quad (2)$$

In order to obtain the third equality, we made use of the fact that, if S ~ W_p(ν, Σ),

$$\int_{S>0}p(S)\,dS = 1 \;\Rightarrow\; \int_{S>0}|S|^{(\nu-p-1)/2}\operatorname{etr}\left\{-\tfrac12\Sigma^{-1}S\right\}dS = 2^{\nu p/2}\,\Gamma_p(\nu/2)\,|\Sigma|^{\nu/2} \quad (3)$$

Note that p(X1, X2 | R1) in (2) can be factored as p(X1, X2 | R1) = f1(X1, R1) × f2(X2, R1), which shows that X1 and X2 are marginally independent and that p(X1, X2 | R1) = p(X1 | R1) p(X2 | R1), with p(X1 | R1) ∝ etr{−½ X1^T R1^{-1} X1} and p(X2 | R1) ∝ |I + X2^T [μ R1]^{-1} X2|^{-(ν+n2)/2}. Due to the model adopted for the random matrix W = G1^T R2^{-1} G1, X2 follows a matrix-variate Student distribution [41]. Therefore, the fact that R2 ≠ R1 results here in two data sets with different distributions: one set X1 is Gaussian distributed with covariance matrix R1 while the uncertainty in R2 leads to a Student distribution for X2. This is a rather original situation where one has to carry out covariance matrix estimation from two data sets which are mismatched in their distributions. This peculiarity will result in new schemes compared to the conventional case of a single set with given distribution, as detailed now.
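For concreteness, the following sketch shows how data could be generated under the above hierarchical model. This is not code from the paper; it assumes NumPy/SciPy, and the function and variable names are illustrative only:

```python
import numpy as np
from scipy.stats import wishart

def generate_two_sets(R1, n1, n2, nu, rng):
    """Draw (X1, X2) from the hierarchical model of Section 2 (real-valued case)."""
    p = R1.shape[0]
    mu = nu - p - 1                      # choice that makes E{R2} = R1
    G1 = np.linalg.cholesky(R1)          # one possible square root of R1
    # W ~ W_p(nu, mu^{-1} I): prior concentrated around I for large nu
    W = wishart.rvs(df=nu, scale=np.eye(p) / mu, random_state=rng)
    R2 = G1 @ np.linalg.solve(W, G1.T)   # R2 = G1 W^{-1} G1^T
    # columns of X1 (resp. X2) are i.i.d. N(0, R1) (resp. N(0, R2))
    X1 = G1 @ rng.standard_normal((p, n1))
    X2 = np.linalg.cholesky(R2) @ rng.standard_normal((p, n2))
    return X1, X2, R2
```

Marginally, X1 is then Gaussian while X2 follows a matrix-variate Student distribution, as stated above.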

3. Maximum likelihood estimation

In this section we address estimation of R1 from (X1, X2) and we focus on the most natural estimator, i.e., the maximum likelihood estimator. From (2), the log-likelihood function is, up to an additive constant,

$$f(R_1) = -\frac{n_1+n_2}{2}\log|R_1| - \frac{\nu+n_2}{2}\log\left|I+\mu^{-1}R_1^{-1}S_2\right| - \frac12\operatorname{Tr}\left\{R_1^{-1}S_1\right\} = \frac{\nu-n_1}{2}\log|R_1| - \frac{\nu+n_2}{2}\log\left|R_1+\mu^{-1}S_2\right| - \frac12\operatorname{Tr}\left\{R_1^{-1}S_1\right\} \quad (4)$$

where S1 = X1 X1^T and S2 = X2 X2^T. Differentiating the previous equation and using the facts that d|R| = |R| Tr{R^{-1} dR} and dR^{-1} = −R^{-1} (dR) R^{-1}, we obtain the following equation that the ML solution should satisfy:

$$(\nu-n_1)R_1^{-1} - (\nu+n_2)\left(R_1+\mu^{-1}S_2\right)^{-1} + R_1^{-1}S_1R_1^{-1} = 0 \quad (5)$$

In order to solve (5), we must investigate various configurations for (n1, n2), as the solution will depend on them. Before going into the technical details of each case, we give an overview of the results obtained.
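As a sanity check when experimenting with the estimators derived below, the log-likelihood (4) is straightforward to evaluate numerically. A minimal sketch, assuming NumPy and using slogdet for numerical stability (the function name is ours, not the paper's):

```python
import numpy as np

def log_likelihood(R1, S1, S2, n1, n2, nu, mu):
    """Evaluate f(R1) of Eq. (4), up to the additive constant dropped in the text."""
    sign, logdet_R1 = np.linalg.slogdet(R1)
    assert sign > 0, "R1 must be positive definite"
    # log|I + mu^{-1} R1^{-1} S2| is computed as log|R1 + mu^{-1} S2| - log|R1|
    _, logdet_R1_S2 = np.linalg.slogdet(R1 + S2 / mu)
    trace_term = np.trace(np.linalg.solve(R1, S1))
    return (0.5 * (nu - n1) * logdet_R1
            - 0.5 * (nu + n2) * logdet_R1_S2
            - 0.5 * trace_term)
```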

3.1. Summary of results

As is illustrated below, the expression of the maximum likelihood estimator depends on the respective values of n1 and n2. In the sequel three cases will be distinguished: a first situation where n1 < p and n2 ≥ p, a second one which is the mirror situation, namely n1 ≥ p and n2 < p, and finally a third, more challenging case where n1 < p, n2 < p and n1 + n2 ≥ p.

In the first [respectively second] case, the ML solution is given by (11) [resp. (21)]: it will be shown that the estimation process entails whitening of X1 [resp. X2] by the inverse of the square-root of the sample covariance matrix of X2 [resp. X1], followed by shrinkage of eigenvalues and finally colorization by the square-root of the sample covariance matrix of X2 [resp. X1]. The technique of eigenvalue shrinkage is rather well known but usually applied to the SCM of a single set: herein, due to the presence of two data sets, this technique is applied to one data set after it has been whitened by the second one. Interestingly enough, the ML solution can also be written as (14) [resp. (22)], that is as a weighted sum of the SCM of each data set, where the weighting matrix is diagonal for one set of samples, and non-diagonal for the other set.

Finally, when n2 < p, n1 < p and n1 + n2 ≥ p, the procedure includes a partitioning between the subspace spanned by the columns of X2 and its orthogonal complement. In the former, shrinkage of eigenvalues is used while, in the latter, the projection of the SCM of X1 is retained.

3.2. Case n1 < p and n2 ≥ p

We consider first the case where n1 < p and n2 ≥ p, i.e., n1 is not large enough for S1 to be positive definite and one needs to use X2 in order to estimate R1, even though R2 ≠ R1. Eq. (5) can be rewritten as

$$(\nu-n_1)R_1^{-1}\left(R_1+\mu^{-1}S_2\right) - (\nu+n_2)I + R_1^{-1}S_1R_1^{-1}\left(R_1+\mu^{-1}S_2\right) = 0$$
$$\Rightarrow\; -(n_1+n_2)I + (\nu-n_1)\mu^{-1}R_1^{-1}S_2 + R_1^{-1}S_1 + \mu^{-1}R_1^{-1}S_1R_1^{-1}S_2 = 0$$
$$\Rightarrow\; R_1S_2^{-1}R_1 - \left[\frac{\nu-n_1}{\mu(n_1+n_2)}I + \frac{1}{n_1+n_2}S_1S_2^{-1}\right]R_1 - \frac{1}{\mu(n_1+n_2)}S_1 = 0 \quad (6)$$

Let S2 = L2 L2^T and let us define R̃12 = L2^{-1} R1 L2^{-T} and S̃1 = L2^{-1} S1 L2^{-T}. Then, pre-multiplying the previous equation by L2^{-1} and post-multiplying it by L2^{-T}, we obtain

$$\tilde R_{12}^2 - \left[\frac{\nu-n_1}{\mu(n_1+n_2)}I + \frac{1}{n_1+n_2}\tilde S_1\right]\tilde R_{12} - \frac{1}{\mu(n_1+n_2)}\tilde S_1 = 0 \quad (7)$$

Let w be an eigenvector of R̃12 associated with eigenvalue ξ. Then,

$$\xi^2 w - \xi\left[\frac{\nu-n_1}{\mu(n_1+n_2)}w + \frac{1}{n_1+n_2}\tilde S_1 w\right] - \frac{1}{\mu(n_1+n_2)}\tilde S_1 w = 0 \;\Rightarrow\; \left[\frac{\xi}{n_1+n_2} + \frac{1}{\mu(n_1+n_2)}\right]\tilde S_1 w = \xi\left[\xi - \frac{\nu-n_1}{\mu(n_1+n_2)}\right]w \quad (8)$$

which implies that w is also an eigenvector of S̃1. Either it is associated with a zero eigenvalue (there are p − n1 of them) and, in this case, ξ = (ν − n1)/(μ(n1 + n2)), or it is associated with a strictly positive eigenvalue λ and ξ satisfies the second-order polynomial equation

$$\xi^2 - \xi\left[\frac{\nu-n_1}{\mu(n_1+n_2)} + \frac{\lambda}{n_1+n_2}\right] - \frac{\lambda}{\mu(n_1+n_2)} = 0 \quad (9)$$

The above polynomial has obviously two real-valued roots, one being negative, the other being positive, and thus the latter is the eigenvalue of R̃12. To summarize, if we let L2^{-1} X1 = U Σ V^T = Σ_{k=1}^{n1} σ_k u_k v_k^T be the singular value decomposition of L2^{-1} X1, we have

$$\tilde R_{12} = \sum_{k=1}^{n_1}\xi_k u_ku_k^T + \frac{\nu-n_1}{\mu(n_1+n_2)}\sum_{k=n_1+1}^{p}u_ku_k^T = \sum_{k=1}^{n_1}\left[\xi_k - \frac{\nu-n_1}{\mu(n_1+n_2)}\right]u_ku_k^T + \frac{\nu-n_1}{\mu(n_1+n_2)}I \quad (10)$$

where ξ_k is the positive root of (9) with σ_k² substituted for λ. The MLE of R1 is thus

$$R_1 = \sum_{k=1}^{n_1}\left[\xi_k - \frac{\nu-n_1}{\mu(n_1+n_2)}\right]L_2u_ku_k^TL_2^T + \frac{\nu-n_1}{\mu(n_1+n_2)}S_2 \quad (11)$$

It is instructive to study the form of this solution. The original data X1 is first adaptively whitened by L2^{-1} and its sample covariance matrix is computed. The eigenvectors of the latter are retained and the eigenvalues are modified. Then, data is re-colored by L2. Note that the technique of regularizing eigenvalues while keeping eigenvectors is classical in robust covariance matrix estimation. However, this technique usually applies to one set of samples. Here it applies to one set of samples after it has been "whitened" by the other set. Indeed, a whitening-colorization operation is performed pre and post eigenvalue modification. Another important observation is that the transformation λ → ξ preserves the order of the eigenvalues, an important issue in Stein's estimation using eigenvalue decomposition [42-44]. This can be seen by differentiating (9) with respect to λ, which gives

$$\left[2\xi - \frac{\nu-n_1}{\mu(n_1+n_2)} - \frac{\lambda}{n_1+n_2}\right]\frac{\partial\xi}{\partial\lambda} = \frac{\xi}{n_1+n_2} + \frac{1}{\mu(n_1+n_2)} \quad (12)$$

Since the bracketed term on the left-hand side of the previous equation is positive, it follows that ∂ξ/∂λ > 0 and therefore the transformation preserves ordering of the eigenvalues. This property will hold true in the other cases developed below.

A comment is also in order regarding the behavior of the MLE when ν grows large, i.e., when W comes closer to I. Indeed, with μ = ν − p − 1, one has

$$\lim_{\nu\to\infty}\xi_k = \frac{1+\lambda_k}{n_1+n_2} \;\Rightarrow\; \lim_{\nu\to\infty}\tilde R_{12} = \frac{1}{n_1+n_2}\left[\tilde S_1 + I\right] \;\Rightarrow\; \lim_{\nu\to\infty}R_1 = \frac{1}{n_1+n_2}L_2\left[L_2^{-1}S_1L_2^{-T} + I\right]L_2^T = \frac{1}{n_1+n_2}\left[S_1+S_2\right] \quad (13)$$

which shows that, as W comes closer to I, i.e., as R2 comes closer to R1, the MLE is simply the sample covariance matrix of the whole data, as may be expected.

Finally, another interpretation of the MLE can be obtained by rewriting it in another form. Noting that the range space of L2^{-1} X1 coincides with the range space of u1, ..., u_{n1}, it follows that u_k = L2^{-1} X1 η_k for some vector η_k. Therefore, (11) can be rewritten as

$$R_1 = X_1\left[\sum_{k=1}^{n_1}\left(\xi_k - \frac{\nu-n_1}{\mu(n_1+n_2)}\right)\eta_k\eta_k^T\right]X_1^T + \frac{\nu-n_1}{\mu(n_1+n_2)}S_2 = X_1\Delta_1X_1^T + \frac{\nu-n_1}{\mu(n_1+n_2)}X_2X_2^T \quad (14)$$

Consequently, the MLE is a weighted version of the sample covariance matrices of each data set. In fact, it can be shown (we omit the details) that if a solution to (5) is sought which is of the form (14), then Δ1 is the solution to the equation

$$\Delta_1^2 + \left[\frac{\nu-n_1}{\mu(n_1+n_2)}\left(X_1^TS_2^{-1}X_1\right)^{-1} - \frac{1}{n_1+n_2}I\right]\Delta_1 - \frac{\nu+n_2}{\mu(n_1+n_2)^2}\left(X_1^TS_2^{-1}X_1\right)^{-1} = 0 \quad (15)$$

It ensues that Δ1 and X1^T S2^{-1} X1 share the same eigenvectors, which are indeed the right singular vectors v_k of L2^{-1} X1. Moreover, the eigenvalues γ_k of Δ1 satisfy

$$\gamma_k^2 + \gamma_k\left[\frac{(\nu-n_1)\sigma_k^{-2}}{\mu(n_1+n_2)} - \frac{1}{n_1+n_2}\right] - \frac{(\nu+n_2)\sigma_k^{-2}}{\mu(n_1+n_2)^2} = 0 \quad (16)$$

To summarize, the MLE of R1 can either be written as in (11), where the eigenvalues ξ_k are related to the eigenvalues λ_k of L2^{-1} S1 L2^{-T} by (9), or as in (14), where Δ1 is given by (15).
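The computation implied by (9)-(11) is summarized in the sketch below (assuming NumPy; a sketch under the stated assumptions, not the author's code). It whitens X1 by the Cholesky factor L2 of S2, shrinks the squared singular values through the positive root of (9), and re-colors by L2 as in (11):

```python
import numpy as np

def mle_case1(X1, X2, nu, mu):
    """MLE of R1 when n1 < p and n2 >= p, following Eqs. (9)-(11)."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    S2 = X2 @ X2.T
    L2 = np.linalg.cholesky(S2)                  # S2 = L2 L2^T (requires n2 >= p)
    U, sigma, _ = np.linalg.svd(np.linalg.solve(L2, X1), full_matrices=False)
    lam = sigma ** 2                             # non-zero eigenvalues of S1_tilde
    a = (nu - n1) / (mu * (n1 + n2))
    # positive root of xi^2 - xi*(a + lam/(n1+n2)) - lam/(mu*(n1+n2)) = 0, Eq. (9)
    b = a + lam / (n1 + n2)
    c = lam / (mu * (n1 + n2))
    xi = 0.5 * (b + np.sqrt(b ** 2 + 4.0 * c))
    L2U = L2 @ U
    return (L2U * (xi - a)) @ L2U.T + a * S2     # Eq. (11)
```

With μ = ν − p − 1 and ν large, the output approaches (S1 + S2)/(n1 + n2), in line with (13).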

3.3. Case n2 < p and n1 ≥ p

We now consider a situation where n2 < p and n1 ≥ p, under which one has a sufficient number of "good" samples X1 for S1 to be full-rank. Yet, it might be of interest to use X2 even though its covariance matrix R2 ≠ R1. The derivation of the MLE follows along the same lines as in the previous case, except that now S2 is rank-deficient and S1 is full-rank. Starting from the ML Eq. (5), one can also write

$$(\nu-n_1)R_1^{-1}\left(R_1+\mu^{-1}S_2\right) - (\nu+n_2)I + R_1^{-1}S_1R_1^{-1}\left(R_1+\mu^{-1}S_2\right) = 0$$
$$\Rightarrow\; -(n_1+n_2)I + (\nu-n_1)\mu^{-1}R_1^{-1}S_2 + R_1^{-1}S_1 + \mu^{-1}R_1^{-1}S_1R_1^{-1}S_2 = 0$$
$$\Rightarrow\; -(n_1+n_2)R_1S_1^{-1}R_1 + (\nu-n_1)\mu^{-1}R_1S_1^{-1}S_2 + R_1 + \mu^{-1}S_2 = 0$$
$$\Rightarrow\; R_1S_1^{-1}R_1 - \frac{\nu-n_1}{\mu(n_1+n_2)}R_1S_1^{-1}S_2 - \frac{1}{n_1+n_2}R_1 - \frac{1}{\mu(n_1+n_2)}S_2 = 0 \quad (17)$$

Let S1 = L1 L1^T and let us define R̃11 = L1^{-1} R1 L1^{-T} and S̃2 = L1^{-1} S2 L1^{-T}. Then, taking the transpose of the previous equation, pre-multiplying by L1^{-1} and post-multiplying by L1^{-T}, we obtain

$$\tilde R_{11}^2 - \left[\frac{\nu-n_1}{\mu(n_1+n_2)}\tilde S_2 + \frac{1}{n_1+n_2}I\right]\tilde R_{11} - \frac{1}{\mu(n_1+n_2)}\tilde S_2 = 0 \quad (18)$$

As before, it can be seen that R̃11 and S̃2 share the same eigenvectors. The p − n2 eigenvectors of S̃2 associated with a zero eigenvalue will correspond to a constant eigenvalue of R̃11 equal to (n1 + n2)^{-1}. A strictly positive eigenvalue ζ of R̃11 is related to its counterpart λ of S̃2 by

$$\zeta^2 - \zeta\left[\frac{\lambda(\nu-n_1)}{\mu(n_1+n_2)} + \frac{1}{n_1+n_2}\right] - \frac{\lambda}{\mu(n_1+n_2)} = 0 \quad (19)$$

Now, if we let L1^{-1} X2 = Y Θ Z^T = Σ_{k=1}^{n2} θ_k y_k z_k^T be the singular value decomposition of L1^{-1} X2, we have

$$\tilde R_{11} = \sum_{k=1}^{n_2}\zeta_k y_ky_k^T + \frac{1}{n_1+n_2}\sum_{k=n_2+1}^{p}y_ky_k^T = \sum_{k=1}^{n_2}\left[\zeta_k - \frac{1}{n_1+n_2}\right]y_ky_k^T + \frac{1}{n_1+n_2}I \quad (20)$$

where ζ_k is the positive root of (19) with θ_k² substituted for λ. The MLE of R1 becomes

$$R_1 = \sum_{k=1}^{n_2}\left[\zeta_k - \frac{1}{n_1+n_2}\right]L_1y_ky_k^TL_1^T + \frac{1}{n_1+n_2}S_1 \quad (21)$$

Again, since the range space of L1^{-1} X2 is spanned by y1, ..., y_{n2}, one has y_k = L1^{-1} X2 χ_k and hence

$$R_1 = X_2\left[\sum_{k=1}^{n_2}\left(\zeta_k - \frac{1}{n_1+n_2}\right)\chi_k\chi_k^T\right]X_2^T + \frac{1}{n_1+n_2}X_1X_1^T = X_2\Delta_2X_2^T + \frac{1}{n_1+n_2}X_1X_1^T \quad (22)$$

Note that (22) differs from (14) in that the weighting matrix applied between X1 and X1^T is now diagonal while that applied between X2 and X2^T is no longer diagonal. Furthermore, if one looks for a solution of the form (22), then Δ2 is the solution to

$$\Delta_2^2 + \Delta_2\left[\frac{1}{n_1+n_2}\left(X_2^TS_1^{-1}X_2\right)^{-1} - \frac{\nu-n_1}{\mu(n_1+n_2)}I\right] - \frac{\nu+n_2}{\mu(n_1+n_2)^2}\left(X_2^TS_1^{-1}X_2\right)^{-1} = 0 \quad (23)$$

and Δ2 and X2^T S1^{-1} X2 share the same eigenvectors (actually the z_k); the eigenvalue γ_k of Δ2 is obtained as the positive solution to

$$\gamma_k^2 + \gamma_k\left[\frac{\theta_k^{-2}}{n_1+n_2} - \frac{\nu-n_1}{\mu(n_1+n_2)}\right] - \frac{(\nu+n_2)\theta_k^{-2}}{\mu(n_1+n_2)^2} = 0 \quad (24)$$

Remark 1. When n1 ≥ p and n2 ≥ p, the previous techniques can still be used, with slight variations. In this case, S̃1 and S̃2 are now full-rank, and therefore the MLE of R1 is given by (11) but with the first sum extended to p eigenvectors (S̃1 has now p non-zero eigenvalues), and with the second term vanishing. The ML solution is also given by (21) with the first sum extended to p eigenvectors and the second term vanishing.
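The computation for the case n2 < p ≤ n1 mirrors the previous sketch, with the roles of the two data sets exchanged; a sketch of (19)-(21) under the same assumptions (NumPy, illustrative names):

```python
import numpy as np

def mle_case2(X1, X2, nu, mu):
    """MLE of R1 when n2 < p and n1 >= p, following Eqs. (19)-(21)."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    S1 = X1 @ X1.T
    L1 = np.linalg.cholesky(S1)                  # S1 = L1 L1^T (requires n1 >= p)
    Y, theta, _ = np.linalg.svd(np.linalg.solve(L1, X2), full_matrices=False)
    lam = theta ** 2                             # non-zero eigenvalues of S2_tilde
    c = 1.0 / (n1 + n2)
    # positive root of zeta^2 - zeta*(lam*(nu-n1)/(mu*(n1+n2)) + c) - lam/(mu*(n1+n2)) = 0
    b = lam * (nu - n1) / (mu * (n1 + n2)) + c
    d = lam / (mu * (n1 + n2))
    zeta = 0.5 * (b + np.sqrt(b ** 2 + 4.0 * d))
    L1Y = L1 @ Y
    return (L1Y * (zeta - c)) @ L1Y.T + c * S1   # Eq. (21)
```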

3.4. Case n1 < p, n2 < p and n1 + n2 ≥ p

We now consider the more challenging case where neither of the two data sets contains enough samples for their respective sample covariance matrices to be full rank, and thus it becomes mandatory to combine both sets. This situation is a bit trickier and requires some carefulness. Going back to (5), the MLE of R1 should satisfy

$$(\nu-n_1)R_1^{-1} - (\nu+n_2)\left(R_1+\mu^{-1}S_2\right)^{-1} + R_1^{-1}S_1R_1^{-1} = 0$$
$$\Rightarrow\; (\nu-n_1)R_1^{-1} + R_1^{-1}S_1R_1^{-1} - (\nu+n_2)\left[R_1^{-1} - \mu^{-1}R_1^{-1}X_2\left(I+\mu^{-1}X_2^TR_1^{-1}X_2\right)^{-1}X_2^TR_1^{-1}\right] = 0$$
$$\Rightarrow\; (n_1+n_2)R_1 = (\nu+n_2)\mu^{-1}X_2\left(I+\mu^{-1}X_2^TR_1^{-1}X_2\right)^{-1}X_2^T + X_1X_1^T \quad (25)$$

Before pursuing, it is worth looking at the previous equation to get some insight. We observe that the projection of R1 onto the subspace orthogonal to X2 will be equal to the projection of S1 onto this same subspace. This suggests to use a decomposition that splits data in R(X2) and its orthogonal complement. To do so, let us consider the SVD of X2 as

$$X_2 = CDE^T = \begin{bmatrix}C_a & C_b\end{bmatrix}\begin{bmatrix}D_a\\ 0\end{bmatrix}E^T = C_aD_aE^T$$

where C is p × p, Ca is p × n2 and Da is the n2 × n2 diagonal matrix of singular values. Let us also operate a change of coordinates and define

$$\Sigma = C^TR_1C = \begin{bmatrix}C_a^TR_1C_a & C_a^TR_1C_b\\ C_b^TR_1C_a & C_b^TR_1C_b\end{bmatrix} = \begin{bmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{bmatrix} \quad (26)$$

With these definitions, it is straightforward to show that X2^T R1^{-1} X2 = E Da Σ_{a.b}^{-1} Da E^T, where Σ_{a.b} = Σ_{aa} − Σ_{ab} Σ_{bb}^{-1} Σ_{ba}, and thus

$$X_2\left(I+\mu^{-1}X_2^TR_1^{-1}X_2\right)^{-1}X_2^T = C_aD_a\left(I+\mu^{-1}D_a\Sigma_{a.b}^{-1}D_a\right)^{-1}D_aC_a^T = C_a\left(D_a^{-2}+\mu^{-1}\Sigma_{a.b}^{-1}\right)^{-1}C_a^T \quad (27)$$

Therefore, pre-multiplying (25) by C^T and post-multiplying it by C, we obtain

$$(n_1+n_2)\begin{bmatrix}\Sigma_{aa} & \Sigma_{ab}\\ \Sigma_{ba} & \Sigma_{bb}\end{bmatrix} = (\nu+n_2)\mu^{-1}\begin{bmatrix}\left(D_a^{-2}+\mu^{-1}\Sigma_{a.b}^{-1}\right)^{-1} & 0\\ 0 & 0\end{bmatrix} + \begin{bmatrix}C_a^TS_1C_a & C_a^TS_1C_b\\ C_b^TS_1C_a & C_b^TS_1C_b\end{bmatrix} \quad (28)$$

which immediately implies that

$$(n_1+n_2)\Sigma_{ba} = C_b^TS_1C_a, \qquad (n_1+n_2)\Sigma_{bb} = C_b^TS_1C_b \quad (29)$$

This corroborates the comments we made after Eq. (25) since one has

$$(n_1+n_2)C_b\Sigma_{bb}C_b^T = (n_1+n_2)C_bC_b^TR_1C_bC_b^T = (n_1+n_2)P_{X_2}^{\perp}R_1P_{X_2}^{\perp} = C_bC_b^TS_1C_bC_b^T = P_{X_2}^{\perp}S_1P_{X_2}^{\perp} \quad (30)$$

It now remains to find Σaa or, equivalently, Σ_{a.b}. Towards this end, note that

$$(n_1+n_2)\Sigma_{aa} = (\nu+n_2)\mu^{-1}\left(D_a^{-2}+\mu^{-1}\Sigma_{a.b}^{-1}\right)^{-1} + C_a^TS_1C_a \quad (31)$$

However,

$$(n_1+n_2)\Sigma_{aa} = (n_1+n_2)\Sigma_{a.b} + (n_1+n_2)\Sigma_{ab}\Sigma_{bb}^{-1}\Sigma_{ba} = (n_1+n_2)\Sigma_{a.b} + C_a^TS_1C_b\left(C_b^TS_1C_b\right)^{-1}C_b^TS_1C_a \quad (32)$$

which leads to

$$(n_1+n_2)\Sigma_{a.b} = (\nu+n_2)\mu^{-1}\left(D_a^{-2}+\mu^{-1}\Sigma_{a.b}^{-1}\right)^{-1} + \left(C^TS_1C\right)_{a.b} \quad (33)$$

For the sake of notational convenience, let us denote F = (C^T S1 C)_{a.b}. Post-multiplying the previous equation by (D_a^{-2} + μ^{-1} Σ_{a.b}^{-1}) results in

$$(n_1+n_2)\Sigma_{a.b}D_a^{-2}\Sigma_{a.b} - \left[(\nu-n_1)\mu^{-1}I + FD_a^{-2}\right]\Sigma_{a.b} - \mu^{-1}F = 0 \;\Rightarrow\; \tilde\Sigma_{a.b}^2 - \left[\frac{\nu-n_1}{\mu(n_1+n_2)}I + \frac{1}{n_1+n_2}\tilde F\right]\tilde\Sigma_{a.b} - \frac{1}{\mu(n_1+n_2)}\tilde F = 0 \quad (34)$$

where Σ̃_{a.b} = Da^{-1} Σ_{a.b} Da^{-1} and F̃ = Da^{-1} F Da^{-1}. Similarly to what was done before, Σ̃_{a.b} and F̃ share the same eigenvectors. When the eigenvalue λ of F̃ is zero (there are actually p − n1 of them [45]), the corresponding eigenvalue φ of Σ̃_{a.b} is (ν − n1)/(μ(n1 + n2)). For each of the r = n1 + n2 − p non-zero λ, the corresponding φ is the unique positive root of

$$\phi^2 - \left[\frac{\nu-n_1}{\mu(n_1+n_2)} + \frac{\lambda}{n_1+n_2}\right]\phi - \frac{\lambda}{\mu(n_1+n_2)} = 0 \quad (35)$$

Therefore, if ũ_k are the eigenvectors of F̃, Σ̃_{a.b} is given by

$$\tilde\Sigma_{a.b} = \sum_{k=1}^{r}\phi_k\tilde u_k\tilde u_k^T + \frac{\nu-n_1}{\mu(n_1+n_2)}\sum_{k=r+1}^{n_2}\tilde u_k\tilde u_k^T = \sum_{k=1}^{r}\left[\phi_k - \frac{\nu-n_1}{\mu(n_1+n_2)}\right]\tilde u_k\tilde u_k^T + \frac{\nu-n_1}{\mu(n_1+n_2)}I \quad (36)$$

Once Σ̃_{a.b} is computed, Σ_{a.b} = Da Σ̃_{a.b} Da and Σaa can be obtained from (32). Finally, the MLE of R1 is given by R1 = C Σ C^T.

We now present an alternative way to compute the solution. From (25), it appears that R1 can be written as (n1 + n2) R1 = X1 X1^T + X2 Δ2 X2^T, where Δ2 = (ν + n2) μ^{-1} (I + μ^{-1} X2^T R1^{-1} X2)^{-1}. Let X = [X1 X2] and let X^T = QR be the QR decomposition of X^T, with Q a (n1 + n2) × p semi-orthogonal matrix, i.e., Q^T Q = I_p. Let us partition Q as Q = [Q1; Q2] so that X1^T = Q1 R and X2^T = Q2 R. Then, one has

$$(n_1+n_2)R_1 = X_1X_1^T + X_2\Delta_2X_2^T = R^T\left[Q_1^TQ_1 + Q_2^T\Delta_2Q_2\right]R$$

and therefore

$$\begin{aligned}(n_1+n_2)^{-1}X_2^TR_1^{-1}X_2 &= Q_2\left[Q_1^TQ_1 + Q_2^T\Delta_2Q_2\right]^{-1}Q_2^T = Q_2\left[I + Q_2^T(\Delta_2-I)Q_2\right]^{-1}Q_2^T \\ &= Q_2\left[I - Q_2^T\left((\Delta_2-I)^{-1}+Q_2Q_2^T\right)^{-1}Q_2\right]Q_2^T \\ &= Q_2Q_2^T - Q_2Q_2^T\left[(\Delta_2-I)^{-1}+Q_2Q_2^T\right]^{-1}Q_2Q_2^T \\ &= \left[\left(Q_2Q_2^T\right)^{-1}+\Delta_2-I\right]^{-1}\end{aligned} \quad (37)$$

Consequently, if we define B2 = (Q2 Q2^T)^{-1} − I,

$$\Delta_2^{-1} = (\nu+n_2)^{-1}\mu\left[I+\mu^{-1}X_2^TR_1^{-1}X_2\right] = (\nu+n_2)^{-1}\mu I + (\nu+n_2)^{-1}X_2^TR_1^{-1}X_2 = (\nu+n_2)^{-1}\mu I + (\nu+n_2)^{-1}(n_1+n_2)\left(\Delta_2+B_2\right)^{-1} \quad (38)$$

Pre-multiplying the previous equation by (Δ2 + B2) and post-multiplying by Δ2, we obtain the following second-order polynomial equation:

$$\Delta_2^2 + \left[B_2 - \frac{\nu-n_1}{\mu}I\right]\Delta_2 - \frac{\nu+n_2}{\mu}B_2 = 0 \quad (39)$$

It follows that Δ2 and B2 share the same eigenvectors. If λ is a non-zero eigenvalue of B2 (there are n1 + n2 − p of them), then the corresponding eigenvalue γ of Δ2 is the unique positive root of the following polynomial equation:

$$\gamma^2 + \left[\lambda - \frac{\nu-n_1}{\mu}\right]\gamma - \frac{\nu+n_2}{\mu}\lambda = 0 \quad (40)$$

If λ = 0 then γ = (ν − n1) μ^{-1}. Finally, the solution Δ2 is given by

$$\Delta_2 = \sum_{k=1}^{r}\gamma_k b_kb_k^T + \frac{\nu-n_1}{\mu}\sum_{k=r+1}^{n_2}b_kb_k^T = \sum_{k=1}^{r}\left[\gamma_k - \frac{\nu-n_1}{\mu}\right]b_kb_k^T + \frac{\nu-n_1}{\mu}I \quad (41)$$

where b_k are the eigenvectors of B2. Note that

$$B_2b = \lambda b \;\Rightarrow\; \left(Q_2Q_2^T\right)^{-1}b - b = \lambda b \;\Rightarrow\; \left(Q_2Q_2^T\right)^{-1}b = (1+\lambda)b \;\Rightarrow\; \left(Q_2Q_2^T\right)b = (1+\lambda)^{-1}b$$

and hence b is an eigenvector of Q2 Q2^T associated with eigenvalue (1 + λ)^{-1}, or equivalently a right singular vector of Q2^T. Observe also that, since X2^T = Q2 R and X X^T = R^T Q^T Q R = R^T R, one has

$$Q_2Q_2^T = X_2^TR^{-1}R^{-T}X_2 = X_2^T\left(R^TR\right)^{-1}X_2 = X_2^T\left(XX^T\right)^{-1}X_2 = X_2^T\left(X_1X_1^T+X_2X_2^T\right)^{-1}X_2$$

Hence, if we let S = X1 X1^T + X2 X2^T = L L^T, then Q2^T and L^{-1} X2 share the same right singular vectors.
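The alternative route through (37)-(41) is convenient to implement because it only requires a QR decomposition and an n2 × n2 eigendecomposition. A sketch, assuming NumPy (function and variable names are ours):

```python
import numpy as np

def mle_case3(X1, X2, nu, mu):
    """MLE of R1 when n1 < p, n2 < p and n1 + n2 >= p, following Eqs. (37)-(41)."""
    p, n1 = X1.shape
    n2 = X2.shape[1]
    XT = np.vstack([X1.T, X2.T])                 # X^T, of size (n1+n2) x p
    Q, _ = np.linalg.qr(XT)                      # X^T = Q R with Q^T Q = I_p
    Q2 = Q[n1:, :]                               # X2^T = Q2 R
    B2 = np.linalg.inv(Q2 @ Q2.T) - np.eye(n2)   # B2 = (Q2 Q2^T)^{-1} - I
    lam, V = np.linalg.eigh(B2)
    a = (nu - n1) / mu
    # positive root of gamma^2 + (lam - a)*gamma - (nu + n2)*lam/mu = 0, Eq. (40)
    gamma = 0.5 * ((a - lam) + np.sqrt((lam - a) ** 2 + 4.0 * (nu + n2) * lam / mu))
    Delta2 = (V * gamma) @ V.T                   # Eq. (41)
    return (X1 @ X1.T + X2 @ Delta2 @ X2.T) / (n1 + n2)
```

For the zero eigenvalues of B2 this formula returns γ = (ν − n1)/μ, consistent with (41).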

4. Numerical simulations

In this section, we evaluate numerically the performance of the MLE presented above through Monte-Carlo simulations. We consider a scenario where the size of the observation space is p = 128. Three cases will be considered for the covariance matrix R1, which correspond to different kinds of processes. In the first case the (k, ℓ) element is R1(k, ℓ) = P ρ^{|k−ℓ|} + δ(k, ℓ) with ρ = 0.7. The second case assumes that R1(k, ℓ) = P e^{−0.5(2πσ_f |k−ℓ|)²} + δ(k, ℓ) with σ_f = 0.02. In the third case, R1(k, ℓ) = r_AR(|k − ℓ|) + δ(k, ℓ), where r_AR(|k − ℓ|) corresponds to the correlation of an autoregressive process whose poles are located at 0.95 e^{±i2π 0.05}, 0.9 e^{±i2π 0.15} and 0.9 e^{±i2π 0.18}. Finally, P = 100 and r_AR(0) = 100. The corresponding processes are rather lowpass in cases 1 and 2, while case 3 concerns processes with sharp peaks in their spectrum. In each simulation X1 is generated from a Gaussian distribution with covariance matrix R1. Then W is generated from a Wishart distribution with ν = p + 2 degrees of freedom and parameter matrix (ν − p − 1)^{-1} I, and R2 is computed as R2 = G1 W^{-1} G1^T. Then X2 is generated from a Gaussian distribution with covariance matrix R2.

The MLE is compared with four competitors. The first is the sample covariance matrix based on all samples, i.e., (n1 + n2)^{-1} S where S = X1 X1^T + X2 X2^T. The second is of the form G_SCM D G_SCM^T, where G_SCM is the Cholesky factor of S and D is a diagonal matrix which is chosen to minimize Stein's loss and is given by D_{k,k} = 1/(n1 + n2 + p − 2k + 1). The third is of the same form but is meant at minimizing the natural distance between R1 and its estimate: as shown in [13], it amounts to choosing $D_{k,k} = \exp\{-\mathrm{E}[\log\chi^2_{n_1+n_2-k+1}]\}$. Finally, we consider the class of orthogonally invariant estimators of the form U_SCM diag(φ(λ)) U_SCM^T, where S = U_SCM diag(λ) U_SCM^T is the eigenvalue decomposition of S and φ(λ) = [φ_1(λ), ..., φ_p(λ)]. Stein showed that the choice $\phi_k = \lambda_k/\big(n_1+n_2-p+1+2\lambda_k\sum_{j\neq k}(\lambda_k-\lambda_j)^{-1}\big)$ is the best with respect to Stein's loss. However this choice has two drawbacks: it can result in some φ_k < 0 and it does not preserve the order of the eigenvalues λ_k, which is a problem [42]. In order to overcome these problems, Stein proposed an isotonizing scheme that guarantees φ_k > 0 and preserves order, see [46] for details of this scheme. We consider this improved estimator as the fourth alternative. The figure of merit for all estimators will be the natural distance between the true and the estimated covariance matrices, d²(R1, R̂1) = Σ_{k=1}^p log² λ_k(R1^{-1} R̂1).
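A sketch of the figure of merit and of the case 1 covariance matrix, assuming NumPy/SciPy (illustrative code only, not the simulation code used to produce the figures):

```python
import numpy as np
from scipy.linalg import toeplitz

def natural_distance_sq(R_true, R_hat):
    """d^2(R1, R1_hat) = sum_k log^2 lambda_k(R1^{-1} R1_hat)."""
    lam = np.real(np.linalg.eigvals(np.linalg.solve(R_true, R_hat)))
    return np.sum(np.log(lam) ** 2)

def covariance_case1(p, P=100.0, rho=0.7):
    """Case 1: R1(k, l) = P * rho^{|k - l|} + delta(k, l)."""
    return P * toeplitz(rho ** np.arange(p)) + np.eye(p)
```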

The simulation results are shown in Figs. 1-4, where we consider different values for the total number of samples n = n1 + n2, namely n = p, n = 3p/2 and n = 2p. The main conclusions regarding these simulations are the following:

• the MLE is shown to outperform its competitors when n1 is small and n is large enough; typically it has the best performance for n = 3p/2 and n = 2p. One can observe that the improvement achieved by the MLE is more important when n = 2p and n1 is small, i.e., when one has very few samples drawn from R1 and a large majority of samples drawn from R2.
• in contrast, when n = p the other methods can perform better than the MLE, especially when n1 is above a threshold, i.e., when the number of "good" samples is large enough.
• among the Stein-like methods, that based on eigenvalue decomposition (with isotonizing) is the best, but the method based on Cholesky factorization and minimization of the geodesic distance comes very close.

Fig. 1. Average distance between R̂1 and R1 in case 1.
Fig. 2. Average distance between R̂1 and R1 in case 2.
Fig. 3. Average distance between R̂1 and R1 in case 3.

In a final simulation, we evaluate the influence of ν: recall that, as ν increases, W is closer to I and thus R2 is closer to R1, which means that X2 should be nearly as informative as X1. In Fig. 4 we display the average distance as a function of ν in case 1 with n1 + n2 = 2p. It is observed that, as ν increases, the performance of all estimators improves. The proposed MLE is no longer the most accurate above a threshold, where it is dominated by Stein's estimator based on the eigenvalues of the whole sample covariance matrix. However, the proposed MLE still performs better than all other estimators.

Fig. 4. Average distance between R̂1 and R1 in case 1 versus ν, with n1 + n2 = 2p.

5. Conclusions

In this paper, we considered the problem of estimating a covariance matrix R1 from two data sets, one set X1 whose covariance matrix is actually R1 and another set X2 whose covariance matrix R2 is different but close to R1. Since the distance between R1 and R2 depends on the eigenvalues of W = G1^T R2^{-1} G1, we embedded the latter in a Bayesian framework and assumed that it followed a Wishart distribution around the identity matrix. We showed that the problem is that of estimating R1 from two data sets with different distributions. The maximum likelihood estimator was derived and its expression was shown to depend on the number of samples in X1 and X2. The MLE was shown to perform quite well, as compared to state of the art algorithms, at least when the number of samples in X1 is small and the total number of samples n is large enough. However, as in a classical framework with a single data set, there is room for improvement of the MLE, especially in low sample support. Therefore, future work should be devoted to improving the MLE in this situation. For instance, one could study how the MLE could be regularized or could investigate whether a Stein-like approach is possible for this two data sets framework. Alternatively, a frequentist approach where joint estimation of R1 and W is performed under some constraints constitutes a worthy path of investigation.

Declaration of Competing Interest

None.

Appendix A. Extension to complex-valued data

In this appendix, we briefly show that the derivations concerning the maximum likelihood estimator can be extended in a straightforward manner to the complex case. Let us assume here that X1 | R1 ~ CN(0, R1, I) and X2 | R2 ~ CN(0, R2, I) are complex-valued data, distributed according to a circularly symmetric complex-valued matrix-variate normal distribution. Let R1 = G1 G1^H, where ^H stands for the Hermitian transpose, and R2 = G1 W^{-1} G1^H, where W ~ CW_p(ν, μ^{-1} I) follows a complex Wishart distribution. The statistical (complex-valued) model is thus

$$p(X_1,X_2\mid R_1,W) = \pi^{-p(n_1+n_2)}\,|R_1|^{-n_1}\left|W^{-1}R_1\right|^{-n_2}\operatorname{etr}\left\{-X_1^HR_1^{-1}X_1 - X_2^HG_1^{-H}WG_1^{-1}X_2\right\} \quad (A.1a)$$

$$p(W) = \frac{\mu^{\nu p}}{\tilde\Gamma_p(\nu)}\,|W|^{\nu-p}\operatorname{etr}\{-\mu W\} \quad (A.1b)$$

Note that, in the complex case, E{W^{-1}} = (ν − p)^{-1} μ I [40], so that E{R2} = E{G1 W^{-1} G1^H} = (ν − p)^{-1} μ R1. Therefore, for E{R2} to be equal to R1, one must have μ = ν − p in the complex case, instead of μ = ν − p − 1 in the real case.

The marginal distribution of (X1, X2) is now

$$\begin{aligned} p(X_1,X_2\mid R_1) &= \int_{W>0}p(X_1,X_2\mid R_1,W)\,p(W)\,dW \\ &= \frac{\pi^{-p(n_1+n_2)}\mu^{\nu p}}{\tilde\Gamma_p(\nu)}\,|R_1|^{-(n_1+n_2)}\operatorname{etr}\left\{-X_1^HR_1^{-1}X_1\right\}\int_{W>0}|W|^{\nu+n_2-p}\operatorname{etr}\left\{-W\left(\mu I + G_1^{-1}X_2X_2^HG_1^{-H}\right)\right\}dW \\ &= \frac{\pi^{-p(n_1+n_2)}\mu^{\nu p}\,\tilde\Gamma_p(\nu+n_2)}{\tilde\Gamma_p(\nu)}\,|R_1|^{-(n_1+n_2)}\operatorname{etr}\left\{-X_1^HR_1^{-1}X_1\right\}\left|\mu I + G_1^{-1}X_2X_2^HG_1^{-H}\right|^{-(\nu+n_2)} \\ &= \pi^{-pn_1}|R_1|^{-n_1}\operatorname{etr}\left\{-X_1^HR_1^{-1}X_1\right\} \times \pi^{-pn_2}\frac{\tilde\Gamma_p(\nu+n_2)}{\tilde\Gamma_p(\nu)}\,|\mu R_1|^{-n_2}\left|I + X_2^H[\mu R_1]^{-1}X_2\right|^{-(\nu+n_2)} \end{aligned} \quad (A.2)$$

and we recover the fact that X1 | R1 is Gaussian distributed and that X2 | R1 is Student distributed. From (A.2), the log-likelihood function is, up to an additive constant,

$$\tilde f(R_1) = -(n_1+n_2)\log|R_1| - (\nu+n_2)\log\left|I+\mu^{-1}R_1^{-1}S_2\right| - \operatorname{Tr}\left\{R_1^{-1}S_1\right\} = (\nu-n_1)\log|R_1| - (\nu+n_2)\log\left|R_1+\mu^{-1}S_2\right| - \operatorname{Tr}\left\{R_1^{-1}S_1\right\} \quad (A.3)$$

where S1 = X1 X1^H and S2 = X2 X2^H. Differentiating the previous equation, it follows that the maximum likelihood estimator of R1 should satisfy

$$(\nu-n_1)R_1^{-1} - (\nu+n_2)\left(R_1+\mu^{-1}S_2\right)^{-1} + R_1^{-1}S_1R_1^{-1} = 0 \quad (A.4)$$

which is exactly (5), the equation in the real case. From there, all previous derivations follow simply by replacing the transpose by the Hermitian transpose.

References

[1] R.J. Muirhead, Aspects of Multivariate Statistical Theory, John Wiley & Sons, Hoboken, NJ, 1982.
[2] L.L. Scharf, Statistical Signal Processing: Detection, Estimation and Time Series Analysis, Addison Wesley, Reading, MA, 1991.
[3] M.S. Srivastava, Methods of Multivariate Statistics, John Wiley & Sons, New York, 2002.
[4] C. Stein, Inadmissibility of the usual estimator for the mean of a multivariate normal distribution, in: Proceedings 3rd Berkeley Symposium on Mathematical Statistics and Probability, 1956, pp. 197–206.
[5] C. Stein, Lectures on the theory of estimation of many parameters, J. Math. Sci. 34 (1986) 1373–1403.
[6] W. James, C. Stein, Estimation with quadratic loss, Springer Series in Statistics (Perspectives in Statistics), Springer, pp. 443–460.
[7] D.K. Dey, C. Srinivasan, Estimation of a covariance matrix under Stein's loss, Ann. Stat. 13 (4) (1985) 1581–1591.
[8] D.K. Dey, C. Srinivasan, Trimmed minimax estimator of a covariance matrix, Ann. Inst. Stat. Math. 38 (1986) 101–108.
[9] F. Perron, Minimax estimators of a covariance matrix, J. Multivar. Anal. 43 (1) (1992) 16–28.
[10] T. Ma, L. Jia, Y. Su, A new estimator of covariance matrix, J. Stat. Plan. Inference 142 (2) (2012) 529–536.
[11] H. Tsukuma, Estimation of a high-dimensional covariance matrix with the Stein loss, J. Multivar. Anal. 148 (2016) 1–17.
[12] H. Tsukuma, Minimax estimation of a normal covariance matrix with the partial Iwasawa decomposition, J. Multivar. Anal. 145 (2016) 190–207.
[13] M.-T. Tsai, On the maximum likelihood estimation of a covariance matrix, Math. Methods Stat. 27 (2018) 71–82.
[14] L.R. Haff, Empirical Bayes estimation of the multivariate normal covariance matrix, Ann. Stat. 8 (3) (1980) 586–597.
[15] O. Ledoit, M. Wolf, A well-conditioned estimator for large-dimensional covariance matrices, J. Multivar. Anal. 88 (2) (2004) 365–411.
[16] P. Stoica, J. Li, X. Zhu, J.R. Guerci, On using a priori knowledge in space–time adaptive processing, IEEE Trans. Signal Process. 56 (6) (2008) 2598–2602.
[17] Y. Chen, A. Wiesel, Y.C. Eldar, A.O. Hero, Shrinkage algorithms for MMSE covariance estimation, IEEE Trans. Signal Process. 58 (10) (2010) 5016–5029.
[18] T. Fisher, X. Sun, Improved Stein-type shrinkage estimators for the high-dimensional multivariate normal covariance matrix, Comput. Stat. Data Anal. 55 (5) (2011) 1909–1918.
[19] A. Coluccia, Regularized covariance matrix estimation via empirical Bayes, IEEE Signal Process. Lett. 22 (11) (2015) 2127–2131.
[20] Y. Ikeda, T. Kubokawa, M.S. Srivastava, Comparison of linear shrinkage estimators of a large covariance matrix in normal and non-normal distributions, Comput. Stat. Data Anal. 95 (2016) 95–108.
[21] T. Kubokawa, M.S. Srivastava, Robust improvement in estimation of a covariance matrix in an elliptically contoured distribution, Ann. Stat. 27 (2) (1999) 600–609.
[22] F. Pascal, P. Forster, J.-P. Ovarlez, P. Larzabal, Performance analysis of covariance matrix estimates in impulsive noise, IEEE Trans. Signal Process. 56 (6) (2008) 2206–2217.
[23] Y. Chen, A. Wiesel, A.O. Hero, Robust shrinkage estimation of high-dimensional covariance matrices, IEEE Trans. Signal Process. 59 (9) (2011) 4097–4107.
[24] E. Ollila, D. Tyler, V. Koivunen, H. Poor, Complex elliptically symmetric distributions: survey, new results and applications, IEEE Trans. Signal Process. 60 (11) (2012) 5597–5625.
[25] A. Wiesel, Unified framework to regularized covariance estimation in scaled Gaussian models, IEEE Trans. Signal Process. 60 (1) (2012) 29–38.
[26] M. Mahot, F. Pascal, P. Forster, J.-P. Ovarlez, Asymptotic properties of robust complex covariance matrix estimates, IEEE Trans. Signal Process. 61 (13) (2013) 3348–3356.
[27] Y.I. Abramovich, O. Besson, Regularized covariance matrix estimation in complex elliptically symmetric distributions using the expected likelihood approach - part 1: the oversampled case, IEEE Trans. Signal Process. 61 (23) (2013) 5807–5818.
[28] O. Besson, Y.I. Abramovich, Regularized covariance matrix estimation in complex elliptically symmetric distributions using the expected likelihood approach - part 2: the under-sampled case, IEEE Trans. Signal Process. 61 (23) (2013) 5819–5829.
[29] F. Pascal, Y. Chitour, Y. Quek, Generalized robust shrinkage estimator and its application to STAP detection problem, IEEE Trans. Signal Process. 62 (21) (2014) 5640–5651.
[30] E. Ollila, E. Raninen, Optimal high-dimensional shrinkage covariance estimation for elliptical distributions, IEEE Trans. Signal Process. 67 (10) (2019) 2707–2719.
[31] W.L. Melvin, Space-time adaptive radar performance in heterogeneous clutter, IEEE Trans. Aerosp. Electron. Syst. 36 (2) (2000) 621–633.
[32] W.L. Melvin, J.A. Scheer (Eds.), Principles of Modern Radar: Advanced Techniques, vol. 2, Institution of Engineering and Technology, 2012.
[33] R. Nitzberg, An effect of range-heterogeneous clutter on adaptive Doppler filters, IEEE Trans. Aerosp. Electron. Syst. 26 (3) (1990) 475–480.
[34] D.J. Rabideau, A.O. Steinhardt, Improved adaptive clutter cancellation through data-adaptive training, IEEE Trans. Aerosp. Electron. Syst. 35 (3) (1999) 879–891.
[35] L.M. Novak, Change detection for multi-polarization multi-pass SAR, in: Proceedings SPIE 5808, Algorithms for Synthetic Aperture Radar Imagery XII, 2005, pp. 234–246.
[36] N.M. Nasrabadi, Hyperspectral target detection: an overview of current and future challenges, IEEE Signal Process. Mag. 31 (1) (2014) 34–44.
[37] R. Bhatia, Positive Definite Matrices, Princeton University Press, 2007.
[38] S.T. Smith, Covariance, subspace and intrinsic Cramér-Rao bounds, IEEE Trans. Signal Process. 53 (5) (2005) 1610–1630.
[39] R.S. Raghavan, False alarm analysis of the AMF algorithm for mismatched training, IEEE Trans. Signal Process. 67 (1) (2019) 83–96.
[40] J.A. Tague, C.I. Caldwell, Expectations of useful complex Wishart forms, Multidimensional Systems and Signal Processing 5 (1994) 263–279.
[41] A.K. Gupta, D.K. Nagar, Matrix Variate Distributions, Chapman & Hall/CRC, Boca Raton, FL, 2000.
[42] Y. Sheena, A. Takemura, Inadmissibility of non-order preserving orthogonally invariant estimators of the covariance matrix in the case of Stein's loss, J. Multivar. Anal. 41 (1992) 117–131.
[43] B. Rajaratnam, D. Vincenzi, A theoretical study of Stein's covariance estimator, Biometrika 103 (2016) 653–666.
[44] B. Naul, B. Rajaratnam, D. Vincenzi, The role of the isotonizing algorithm in Stein's covariance matrix estimator, Comput. Stat. 31 (4) (2016) 1453–1476.
[45] L. Guttman, General theory and methods for matric factoring, Psychometrika 9 (1) (1944) 1–16.
[46] S. Lin, M. Perlman, A Monte Carlo comparison of four estimators of a covariance matrix, in: P.R. Krishnaiah (Ed.), Multivariate Analysis VI, North Holland, Amsterdam, 1985, pp. 411–429.
Bhatia , Positive Definite Matrices, Princeton University Press, 2007 . matrix, Ann. Stat. 8 (3) (1980) 586–597 . [38] S.T. Smith , Covariance, subspace and intrinsic Cramér-Rao bounds, IEEE Trans. [15] O. Ledoit , M. Wolf , A well-conditioned estimator for large-dimensional covari- Signal Process. 53 (5) (2005) 1610–1630 . ance matrices, J. Multivar. Anal. 88 (2) (2004) 365–411 . [39] R.S. Raghavan , False alarm analysis of the AMF algorithm for mismatched [16] P. Stoica , J. Li , X. Zhu , J.R. Guerci , On using a priori knowledge in space– training, IEEE Trans. Signal Process. 67 (1) (2019) 83–96 . time adaptive processing, IEEE Trans.Signal Process. 56 (6) (2008) 2598– [40] J.A. Tague , C.I. Caldwell , Expectations of useful complex Wishart forms, Multi- 2602 . dimensional Systems and Signal Processing 5 (1994) 263–279 . [17] Y. Chen , A. Wiesel , Y.C. Eldar , A.O. Hero ,Shrinkage algorithms for MMSE [41] A.K. Gupta , D.K. Nagar , Matrix Variate Distributions, Chapman & Hall/CRC, Boca covariance estimation, IEEE Trans.Signal Process. 58 (10) (2010) 5016– Raton, FL, 20 0 0 . 5029 . [42] Y. Sheena, A. Takemura , Inadmissibility of non-order preserving orthogonally [18] T. Fisher , X. Sun , Improved stein-type shrinkage estimators for the high-dimen- invariant estimators of the covariance matrix in the case of Stein’s loss, J. Mul- sional multivariate normal covariance matrix, Comput. Stat. Data Anal. 55 (5) tivar. Anal. 41 (1992) 117–131 . (2011) 1909–1918 . [43] B. Rajaratnam , D. Vincenzi , A theoretical study of Stein’s covariance estimator, [19] A. Coluccia , Regularized covariance matrix estimation via empirical Bayes, IEEE Biometrika 103 (2016) 653–666 . Signal Process. Lett. 22 (11) (2015) 2127–2131 . [44] B. Naul , B. Rajaratnam , D. Vincenzi , The role of isotonizing algorithm in Stein’s [20] Y. Ikeda , T. Kubokawa , M.S. Srivastava , Comparison of linear shrinkage esti- covariance matrix estimator, Comput. Stat. 31 (4) (2016) 1453–1476 . mators of a large covariance matrix in normal and non-normal distributions, [45] L. Guttman , General theory and methods for matric factoring, Psychometrica 9 Comput. Stat. Data Anal. 95 (2016) 95–108 . (1) (1944) 1–16 . [21] T. Kubokawa , M.S. Srivastava , Robust improvement in estimation of a covari- [46] S. Lin , M. Perlman , A Monte Carlo comparison of four estimators of a covari- ance matrix in an elliptically contoured distribution, Ann. Stat. 27 (2) (1999) ance matrix, in: P.R. Krishnaiah (Ed.), Multivariate Analysis VI, North Holland, 600–609 . Amsterdam, 1985, pp. 411–429 .