Fixed Point Algorithms for Estimating Power Means of Positive Definite Matrices Marco Congedo, Alexandre Barachant, Rajendra Bhatia
Total Page:16
File Type:pdf, Size:1020Kb
Fixed Point Algorithms for Estimating Power Means of Positive Definite Matrices Marco Congedo, Alexandre Barachant, Rajendra Bhatia To cite this version: Marco Congedo, Alexandre Barachant, Rajendra Bhatia. Fixed Point Algorithms for Estimating Power Means of Positive Definite Matrices. IEEE Transactions on Signal Processing, Institute of Electrical and Electronics Engineers, 2017, 65 (9), pp.2211 - 2220. 10.1109/TSP.2017.2649483. hal- 01500514 HAL Id: hal-01500514 https://hal.archives-ouvertes.fr/hal-01500514 Submitted on 3 Apr 2017 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. Published in: IEEE Transactions on Signal Processing (2017), 65 (9) : 2211 - 2220. 1 Fixed Point Algorithms for Estimating Power Means of Positive Definite Matrices Marco Congedo, Alexandre Barachant and Ehsan Kharati Koopaei search space, etc. Averaging such SPD matrices is a ubiquitous Abstract—Estimating means of data points lying on the task. In signal processing we find it in a wide variety of data- Riemannian manifold of symmetric positive-definite (SPD) driven algorithms allowing spatial filters, blind source matrices has proved of great utility in applications requiring separation, beamformers and inverse solutions. While robust interpolation, extrapolation, smoothing, signal detection and estimation of covariance matrices and related quantities is a classification. The power means of SPD matrices with exponent p long-standing topic of research, only recently an in the interval [-1, 1] interpolate in between the Harmonic mean (p information/differential geometry perspective has been = -1) and the Arithmetic mean (p = 1), while the Geometric (Cartan/Karcher) mean, which is the one currently employed in considered [14-22]. most applications, corresponds to their limit evaluated at 0. In this Once observations are represented as SPD matrices, they may article we treat the problem of estimating power means along the be treated as points on a smooth Riemannian manifold in which continuum p(-1, 1) given noisy observed measurement. We the fundamental geometrical notion of distance between two provide a general fixed point algorithm (MPM) and we show that its convergence rate for p = ±0.5 deteriorates very little with the points and the center of mass among a number of points are number and dimension of points given as input. Along the whole naturally defined [14]. In turn, these notions allow useful continuum, MPM is also robust with respect to the dispersion of operations such as interpolation, smoothing, filtering, the points on the manifold (noise), much more so than the gradient approximation, averaging, signal detection and classification. descent algorithm usually employed to estimate the geometric In classification problems a simple Riemannian classifier based mean. Thus, MPM is an efficient algorithm for the whole family of on a minimum distance to mean (MDM) procedure [3] has been power means, including the geometric mean, which by MPM can tested with success on electroencephalographic data, in several be approximated with a desired precision by interpolating two kinds of brain-computer interfaces [3-6] and in the analysis of solutions obtained with a small ±p value. We also present an sleep stages [10, 11], as well as on motion capture data for the approximated version of the MPM algorithm with very low computational complexity for the special case p=±½. Finally, we classification of body movements [13]. A similar method has show the appeal of power means through the classification of been used for clustering in the context of video-based face and brain-computer interface event-related potentials data. scene recognition [7] and in radar detection [2]. These examples demonstrate that simple machine learning algorithms, which Index Terms—Brain-Computer Interface, Geometric Mean, are known to allow poor performance using the Euclidean High Dimension, Power Means, Riemannian Manifold, Symmetric metric, can be easily translated into equivalent Riemannian Positive-Definite Matrix. classifiers using an appropriate metric, obtaining excellent performance. I. INTRODUCTION Among the several means one may define from an information geometry point of view, so far the geometric mean (sometimes HE study of means (centers of mass) for a set of symmetric referred to as Karcher, Cartan or Fréchet mean) has been the T positive definite (SPD) matrices has recently attracted most studied and the most used in practical applications. It is much attention, driven by practical problems in radar data the natural definition of mean when the Fisher-Rao metric is processing, image and speech processing, computer vision, applied to multivariate Gaussian distributions [20, 21], but also shape and movement analysis, medical imaging (especially arises naturally from a pure geometrical and algebraic diffusion magnetic resonance imaging and brain-computer perspective without making assumptions on the data interface), sensor networks, elasticity, numerical analysis and distribution [14]. It so happens that the geometric mean satisfies machine learning (e.g., [1-13]). In many applications the a number of desirable invariances, including congruence observed data can be conveniently summarized by SPD invariance, self-duality, joint homogeneity and the determinant matrices, for example, some form of their covariance matrix in identity [23]. The simultaneous verification of all these the time, frequency or time-frequency domain, or properties is hard to find for means based on other metrics, such autocorrelation matrices. In others, SPD matrices arise naturally as the arithmetic, harmonic and log-Euclidean mean, thus the as kernels, tensors (or slice of), density matrices, elements of a Submitted on July 15 2016. This work has been partially supported by Author A.B. is with the Burke Medical Research Institute, White Plains, NY, European project ERC-2012-AdG-320684-CHESS. USA ([email protected]). Author E.K.K. is with the Author M.C. is with the GIPSA-lab (Grenoble Images Parole Signal Department of Statistical Science, University of Padova, Padova, Italy (e-mail: Automatique), CNRS, University of Grenoble Alpes and Grenoble Institute of [email protected]). Technology, Grenoble, France (e-mail: [email protected]). Published in: IEEE Transactions on Signal Processing (2017), 65 (9) : 2211 - 2220. 2 geometric mean of SPD matrices is not just important in space defined at each point and varying smoothly from point to practice, but a fundamental mathematical object per se. point. The tangent space TGM at point G is the vector space For positive numbers the arithmetic, geometric and harmonic containing the tangent vectors to all curves on M passing mean are all members of the family of power means, also through G. For the manifold M of SPD matrices S ++ , this is the known as Hölder or generalized mean. Given a set of K positive space S of symmetric matrices. (Fig. 1). For any two tangent numbers {x1,…xK} and K associated weights {w1,…,wK} vectors ζ1 and ζ2, we are concerned here with the inner product satisfying Σkwk=1, the w-weighted power mean of order p, g, of given by the Fisher-Rao metric at any base-point G [14] : {x1,…xK} is 11 , tr G G . (2) 1 1 2G 1 2 p p . (1) g k wkk x B. The Geodesic Power means interpolate continuously in between the limit p- The SPD manifold has non-positive curvature and is complete; (the minimum of the set) and the limit p+ (the maximum for any two points C1 and C2 on M, a unique path on M of of the set), passing by p=-1 (harmonic mean) and p=1 minimal length (at constant velocity) connecting the two points (arithmetic mean), while the limit p0 from both sides allows always exists. The path is named the geodesic and the points the geometric mean. This generality of power means is along it have analytical expressions given by appealing from a signal processing perspective; in a typical 1 1 1t 1 engineering scenario the sensor measurement is affected by 2 2 2 2 CCCCCCC1# t 2 1 1 2 1 1 , t 0,1 , (3) additive noise and varying p one can find an optimal mean depending on the signal-to-noise-ratio (SNR), as we will show. with t the arc-length parameter [14]. With t=0 we are at C1, with Recently [24] extended the concept of power means of positive t=1 we are at C2 and with t=1/2 we are at the geometric mean numbers to SPD matrices for the continuum p[-1,1], with the (or center of mass) of the two points (Fig. 1). As a special case t ½ -t ½ 1-t case p=-1 being the matrix harmonic mean, p=1 the matrix we note I#tC=C and C#tI=C C C = C . Geodesic equation -1 -1 -1 arithmetic mean and the limit to zero from both sides allowing (3) verifies C1#tC2=C2#1-tC1 and (C1#tC2) =C1 #tC2 . The the matrix geometric mean we have discussed (see also [25- points along the geodesic can be understood as means of C1 and 27]). So far power means of SPD matrices have not been C2 weighted by t according to the Riemannian metric, in applied in signal processing. Also, only a “naive” fixed-point analogy with the weighted mean according to the Euclidean algorithm has been proposed for their estimation [24] and its metric given by (1-t)C1+tC2, which still results in a SPD matrix, convergence behavior is unsatisfactory. In this article we report but greater than C1#tC2 in the Loewner order sense [27].