On Mahalanobis Distance in Functional Settings
Journal of Machine Learning Research 21 (2020) 1-33. Submitted 3/18; Revised 12/19; Published 1/20.

José R. Berrendero [email protected]
Beatriz Bueno-Larraz [email protected]
Antonio Cuevas [email protected]
Department of Mathematics, Universidad Autónoma de Madrid, Madrid, Spain

Editor: Lorenzo Rosasco

Abstract

Mahalanobis distance is a classical tool in multivariate analysis. We suggest here an extension of this concept to the case of functional data. More precisely, the proposed definition concerns those statistical problems where the sample data are real functions defined on a compact interval of the real line. The obvious difficulty for such a functional extension is the non-invertibility of the covariance operator in infinite-dimensional cases. Unlike other recent proposals, our definition is suggested and motivated in terms of the Reproducing Kernel Hilbert Space (RKHS) associated with the stochastic process that generates the data. The proposed distance is a true metric; it depends on a unique real smoothing parameter which is fully motivated in RKHS terms. Moreover, it shares some properties of its finite-dimensional counterpart: it is invariant under isometries, it can be consistently estimated from the data, and its sampling distribution is known under Gaussian models. An empirical study of two statistical applications, outlier detection and binary classification, is included. The results are quite competitive when compared to other recent proposals in the literature.

Keywords: functional data, Mahalanobis distance, reproducing kernel Hilbert spaces, kernel methods in statistics, square root operator

1. Introduction

The classical (finite-dimensional) Mahalanobis distance is a well-known tool in multivariate analysis with multiple applications. Let $X$ be a random variable taking values in $\mathbb{R}^d$ with non-singular covariance matrix $\Sigma$.
In many practical situations it is required to measure the distance between two points $x, y \in \mathbb{R}^d$ when considered as two possible observations drawn from $X$. Clearly, the usual squared Euclidean distance $\|x - y\|^2 = (x - y)'(x - y)$ is not a suitable choice, since it disregards the standard deviations and the covariances of the components of $X$ (given a column vector $x \in \mathbb{R}^d$, we denote by $x'$ the transpose of $x$). The most popular alternative is perhaps the classical Mahalanobis distance, $M(x, y)$, defined as
$$M(x, y) = \left( (x - y)' \Sigma^{-1} (x - y) \right)^{1/2}. \qquad (1)$$

©2020 José R. Berrendero, Beatriz Bueno-Larraz and Antonio Cuevas. License: CC-BY 4.0, see https://creativecommons.org/licenses/by/4.0/. Attribution requirements are provided at http://jmlr.org/papers/v21/18-156.html.

Very often the interest is focused on studying "how extreme" a point $x$ is within the distribution of $X$; this is typically evaluated in terms of $M(x, m)$, where $m$ stands for the vector of means of $X$. This distance is named after the Indian statistician P. C. Mahalanobis (1893-1972), who first proposed and analyzed this concept (Mahalanobis, 1936) in the setting of Gaussian distributions. Nowadays, some popular applications of the Mahalanobis distance are: supervised classification, outlier detection (Rousseeuw and van Zomeren, 1990; Penny, 1996), multivariate depth measures (Zuo and Serfling, 2000), hypothesis testing (through Hotelling's statistic; Rencher, 2012, Ch. 5) and goodness of fit (Mardia, 1975). This list of references is far from exhaustive.

1.1. The infinite-dimensional (functional) case. The covariance operator

Our framework here is Functional Data Analysis (FDA); see, e.g., Cuevas (2014) for an overview. In other words, we deal with statistical problems involving functional data. Thus our sample is made of trajectories $X_1(t), \ldots, X_n(t)$ in $L^2[0,1]$ drawn from a second-order stochastic process $X(t)$, $t \in [0,1]$, with $m(t) = E(X(t))$.
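In finite dimensions, definition (1) is straightforward to compute. As a minimal numpy sketch (the function name and toy data are illustrative, not from the paper; solving the linear system is preferred numerically to forming $\Sigma^{-1}$ explicitly):

```python
import numpy as np

def mahalanobis(x, y, cov):
    """Classical Mahalanobis distance (1): ((x - y)' Sigma^{-1} (x - y))^{1/2}."""
    d = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    # Solve Sigma v = d instead of inverting Sigma (better conditioned).
    return float(np.sqrt(d @ np.linalg.solve(cov, d)))

# Sanity check: with Sigma = I the distance reduces to the Euclidean one.
Sigma = np.eye(3)
x, y = np.array([1.0, 0.0, 0.0]), np.zeros(3)
print(mahalanobis(x, y, Sigma))  # 1.0
```

With a non-spherical $\Sigma$, directions of large variance are down-weighted, which is exactly the feature the Euclidean distance lacks.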
The inner product and the norm in $L^2[0,1]$ will be denoted by $\langle \cdot, \cdot \rangle$ and $\|\cdot\|$, respectively. We will henceforth assume that the covariance function $K(s,t) = \mathrm{Cov}(X(s), X(t))$ is continuous. The function $K$ defines a linear operator $K : L^2[0,1] \to L^2[0,1]$, called the covariance operator, given by
$$Kf(t) = \int_0^1 K(t,s) f(s)\, ds. \qquad (2)$$

Note that the covariance operator is the infinite-dimensional version of the covariance transformation $x \mapsto \Sigma x$ in $\mathbb{R}^d$. Still, its properties are different in some important aspects, as we will discuss below. We will define a Mahalanobis distance associated with $K$ assuming that $K$ is injective, that is, that all its eigenvalues are strictly positive.

For later use, it will be useful to note that the assumption of continuity for $K(s,t)$ entails $m \in L^2[0,1]$ and $E\|X\|^2 < \infty$. Indeed, since $\int_0^1 K(t,t)\,dt = E\int_0^1 (X(t) - m(t))^2\,dt < \infty$, it holds that $\int_0^1 (X(t) - m(t))^2\,dt < \infty$ almost surely (a.s.). Hence, as we are assuming $X(t) \in L^2[0,1]$ a.s., we also have $m \in L^2[0,1]$. Now, $E\int_0^1 (X(t) - m(t))^2\,dt = \int_0^1 E(X(t) - m(t))^2\,dt < \infty$ implies
$$E\|X\|^2 = E\int_0^1 X^2(t)\,dt = \int_0^1 E X^2(t)\,dt = \int_0^1 E(X(t) - m(t))^2\,dt + \int_0^1 m^2(t)\,dt < \infty.$$

It can be noted that this reasoning does not in fact require the continuity of $K$ (boundedness would suffice) but, for simplicity, we will keep the continuity assumption, as it is quite common in functional data analysis and not very restrictive. Besides, continuity will be needed below to prove, for example, Theorem 14 on the asymptotic distribution.

1.2. Some basic notions on linear operators. Functions of operators

Let $(H, \langle \cdot, \cdot \rangle)$ be a real, separable, infinite-dimensional Hilbert space. In what follows, we will mostly consider the case $H = L^2[0,1]$, but the basic ideas and definitions can be given in general. Denote by $\mathcal{C}$ the space of self-adjoint bounded linear operators on $H$; see, e.g., Gohberg et al. (2003) for definitions and background.
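In practice the integral in (2) is approximated on a grid. A small sketch under assumed choices (Brownian-motion kernel $K(s,t) = \min(s,t)$, whose eigenvalues $1/((j - 1/2)\pi)^2$ are classical, midpoint quadrature; none of this is prescribed by the paper):

```python
import numpy as np

# Discretize the covariance operator (2) on a uniform grid of [0, 1],
# using the Brownian-motion kernel K(s, t) = min(s, t) as illustration.
n = 500
t = (np.arange(n) + 0.5) / n          # midpoint grid
K_mat = np.minimum.outer(t, t)        # kernel matrix K(t_i, t_j)

def apply_operator(f_vals):
    """Quadrature approximation of (Kf)(t) = int_0^1 K(t, s) f(s) ds."""
    return K_mat @ f_vals / n

# Check against the closed form (K1)(t) = int_0^1 min(t, s) ds = t - t^2/2:
g = apply_operator(np.ones(n))
print(np.max(np.abs(g - (t - t**2 / 2))))  # small discretization error

# Eigenvalues of K_mat / n approximate those of the operator,
# here 1 / ((j - 1/2) pi)^2, so the largest is about 0.4053.
eigvals = np.linalg.eigvalsh(K_mat / n)[::-1]
print(eigvals[0], 1 / (0.5 * np.pi) ** 2)
```

The rapidly decaying eigenvalues visible here are the source of the inversion difficulty discussed later.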
Let us recall that for any operator $A \in \mathcal{C}$, the operator norm of $A$ is given by
$$\|A\|_{op} = \sup_{\|x\|=1} \|Ax\| = \sup_{\|x\|=1} |\langle Ax, x \rangle|.$$

If $K \in \mathcal{C}$ is compact then, from the Spectral Theorem, $K$ can be expressed in the form
$$Kx = \sum_j \lambda_j \langle x, e_j \rangle e_j, \qquad (3)$$
where $\{e_j\}$ is an orthonormal system of eigenfunctions of $K$ and $\{\lambda_j\}$ is the corresponding sequence of eigenvalues.

Let $\mathcal{C}^+$ be the subset of positive operators in $\mathcal{C}$, that is, the subset of those operators $A$ in $\mathcal{C}$ such that $\langle Ax, x \rangle \geq 0$ for all $x$. Note that every compact operator in $\mathcal{C}^+$ necessarily has non-negative eigenvalues. If $K \in \mathcal{C}$ is compact, the spectrum of $K$ is defined by
$$\sigma(K) = \{\lambda_j : \lambda_j \text{ is an eigenvalue of } K\} \cup \{0\}.$$

If $K$ has a spectral representation (3) and $f$ is a real function defined on a real interval containing $\sigma(K)$, bounded on $\sigma(K)$, we define the operator $f(K)$ by
$$f(K)(x) = f(0) P_0 x + \sum_j f(\lambda_j) \langle x, e_j \rangle e_j,$$
where $P_0 x$ denotes the orthogonal projection of $x$ onto the kernel of $K$. For these operators we have
$$\|f(K)\|_{op} = \sup_j |f(\lambda_j)|;$$
see, e.g., Gohberg et al. (2003, Ch. 8). Some examples of functions $f(K)$ to be considered below are as follows.

(i) If $K \in \mathcal{C}^+$ is compact and injective, then the square root of $K$, denoted $K^{1/2}$, is defined by $K^{1/2}(x) = \sum_j \lambda_j^{1/2} \langle x, e_j \rangle e_j$. Note that the name "square root" is justified since $K^{1/2} K^{1/2} = K$. In fact, $A^{1/2}$ can be defined as well, in a natural way, even if $A$ is not compact; see, for example, Debnath and Mikusiński (2005, p. 173) for a definition of the square root operator when $A$ is a bounded positive operator.

(ii) We will also need to use operator functions $f(A)$ in some examples where $A$ is not necessarily compact but can be expressed in the form $Ax = \sum_j \mu_j \langle x, e_j \rangle e_j$, for some (bounded) sequence of constants $\{\mu_j\}$ and some orthonormal system $\{e_j\}$.
Then, we can define $f(A)$ (if $f$ is a function bounded on $\{\mu_j\}$) by $f(A)(x) = \sum_j f(\mu_j) \langle x, e_j \rangle e_j$.

(iii) A particular case arises when $A = K + hI$, where $K \in \mathcal{C}^+$ is a compact, injective operator, $h$ is a real number and $I$ is the identity operator. Since $K$ is injective, there is an orthonormal basis $\{e_j\}$ of $H$ made of eigenvectors of $K$. Thus, if $\{\lambda_j\}$ are the corresponding eigenvalues, we have, for all $x \in H$,
$$(K + hI)(x) = \sum_j (\lambda_j + h) \langle x, e_j \rangle e_j \quad \text{and} \quad (K + hI)^{1/2}(x) = \sum_j (\lambda_j + h)^{1/2} \langle x, e_j \rangle e_j.$$
Similarly, the positive part of $K + hI$ is defined by
$$(K + hI)_+(x) = \sum_j (\lambda_j + h)^+ \langle x, e_j \rangle e_j,$$
where $f(t) := t^+ = \max\{t, 0\}$ is the usual positive-part function, defined for real numbers.

1.3. On the difficulties of defining a functional Mahalanobis-type distance

The aim of this paper is to extend the notion of the multivariate (finite-dimensional) Mahalanobis distance (1) to the functional case, where $x, y \in L^2[0,1]$. Clearly, in view of (1), the inverse $K^{-1}$ of the functional operator $K$ should play some role in this extension if we want to keep a close analogy with the multivariate case.
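The operator calculus of Section 1.2 has a transparent matrix analogue: $f(K)$ is obtained by applying $f$ to the eigenvalues in the spectral decomposition. A hedged numpy sketch (the helper `operator_function` and the random test matrix are illustrative, not from the paper):

```python
import numpy as np

def operator_function(K_mat, f):
    """Matrix analogue of f(K): apply f to the eigenvalues of a symmetric K."""
    lam, E = np.linalg.eigh(K_mat)    # eigenvalues lambda_j, eigenvectors e_j
    return E @ np.diag(f(lam)) @ E.T

rng = np.random.default_rng(0)
B = rng.standard_normal((5, 5))
K_mat = B @ B.T                       # symmetric positive semi-definite

# (i) square root: K^{1/2} K^{1/2} = K  (clip guards against tiny
# negative eigenvalues produced by floating-point round-off).
K_half = operator_function(K_mat, lambda lam: np.sqrt(np.clip(lam, 0, None)))
print(np.allclose(K_half @ K_half, K_mat))           # True

# (iii) regularized square root (K + hI)^{1/2}: shift the eigenvalues by h.
h = 0.1
K_reg = operator_function(K_mat, lambda lam: np.sqrt(np.clip(lam, 0, None) + h))
print(np.allclose(K_reg @ K_reg, K_mat + h * np.eye(5)))  # True
```

The shift by $h$ in (iii) is what keeps the eigenvalues bounded away from zero, anticipating why $K + hI$, unlike $K$ itself, can be safely inverted.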