A Theoretical and Computational Comparison of Approximations Used in Perturbation Analysis of Covariance Matrices Jacques Bénasséni, Alain Mom
Total Page:16
File Type:pdf, Size:1020Kb
A theoretical and computational comparison of approximations used in perturbation analysis of covariance matrices Jacques Bénasséni, Alain Mom To cite this version: Jacques Bénasséni, Alain Mom. A theoretical and computational comparison of approximations used in perturbation analysis of covariance matrices. 2019. hal-02083181 HAL Id: hal-02083181 https://hal.archives-ouvertes.fr/hal-02083181 Preprint submitted on 28 Mar 2019 HAL is a multi-disciplinary open access L’archive ouverte pluridisciplinaire HAL, est archive for the deposit and dissemination of sci- destinée au dépôt et à la diffusion de documents entific research documents, whether they are pub- scientifiques de niveau recherche, publiés ou non, lished or not. The documents may come from émanant des établissements d’enseignement et de teaching and research institutions in France or recherche français ou étrangers, des laboratoires abroad, or from public or private research centers. publics ou privés. A theoretical and computational comparison of approximations used in perturbation analysis of covariance matrices Jacques B´enass´eni and Alain Mom ∗† Univ Rennes, CNRS, IRMAR - UMR 6625 F-35000 Rennes, France Abstract Covariance matrices play a central role in a wide range of multivariate statistical methods. Therefore, a large amount of work has been devoted to analyzing the sensitivity of their eigenstructure to influential observations. In order to evaluate the effect of deleting one or a small subset of the observations, several approxima- tions to the eigenelements of the perturbed matrix have been proposed. This paper provides a theoretical and numerical comparison of the main approximations. A special emphasis is given to those based on Rayleigh quotients which are seldom used. A general approach, using new inequalities, is proposed in order to evaluate their accuracy without having to recompute the exact perturbed eigenvalues and eigenvectors. This approach is of specific interest from a computational standpoint. ∗Email addresses : corresponding author [email protected], alain.mom@univ- rennes2.fr † 1 Theoretical developments are illustrated with a numerical study which emphasizes the accuracy of approximations based on Rayleigh quotients. Keywords. approximation; eigenvalue and eigenvector; covariance matrix; per- turbation. Abbreviated title: Approximations for covariance matrices 1 Introduction We consider a n × p data matrix X in which the n rows are the observation vec- t Rp n tors xi ∈ , i = 1,...,n. Letting x = (1/n) i=1 xi , the covariance matrix n t S = (1/n) i=1 (xi − x)(xi − x) is involved in aP wide range of statistical meth- ods includingP principal component analysis or multiple regression for example. Since S is known to be highly prone to influential observations, sensitivity aspects for principal component analysis have been discussed in several papers including Critchley [6] , Pack, Jolliffe and Morgan [12], Tanaka [16], Jolliffe [10], Prender- gast [13], Prendergast & Li Wai Suen [14] among many others. In order to assess the influence of a small subset I of r observations on the eigenstructure of S, a possible approach consists of studying the effect of removing these r observations on the eigenelements of S. In this framework, letting S denote the covariance matrix obtained without the subset of observations indexed by I, several authors e have studied the relationship between the eigenelements of S and those of S. More specifically, providing approximations to the eigenelements of S allows to detect e influential subsets of observations without having to recompute the exact modi- e fied eigenvalues and eigenvectors. Wang & Nyquist [18] and Hadi & Nyquist [8] 2 study the effect of deleting a single observation while Wang & Liski [17], Enguix- Gonz´alez, Mu˜noz- Pichardo, Moreno-Rebollo & Pino-Mej´ıas [7] and B´enass´eni [3] focus on the general case where I comprises several observations. In these works, approximations to the eigenvalues and eigenvectors are obtained by retaining the first terms in power expansions of these parameters. Independently of these works, B´enass´eni [1] suggest using approximations based on Rayleigh quotients together with inequalities provided by Wilkinson [19]. The aim of this paper is twofold. First, approximations based on Rayleigh quotients are compared with other ap- proximations from a theoretical standpoint. Second, recent inequalities of Chatelin [4] are introduced in order to evaluate the accuracy of these approximations with- out having to recompute the eigenelements of S. Finally, all the approximations are compared from a computational standpoint using the soil composition data of e Kendall [11]. Throughout this paper, the eigenvalues λ1 > λ2 ...>λp ≥ 0 of S are assumed simple and associated to the normalized eigenvectors φ1, φ2,... φp. In the same way, the eigenvalues λ˜1 > λ˜2 ...> λ˜p ≥ 0 of the perturbed matrix S are also assumed simple and associated to the eigenvectors φ˜ , φ˜ ,... φ˜ . 1 2 p e 2 Theoretical background on matrix perturbation 2.1 General case Referring to Enguix-Gonz´alez, Mu˜noz-Pichardo, Moreno-Rebollo & Pino-Mej´ıas [7] or B´enass´eni [3] and letting xI = (1/r) i∈I xi, we know that, when a subset of observations indexed by I is deleted, theP covariance matrix S is transformed to S which can be expressed as: 2 e r 1 t r t S = S+ S − (xi − x)(xi − x) + − (x − x)(x − x) n − r r n − r I I " i∈I # X e (1) 3 2 r We then have a perturbation of the form S = S + ǫM + ǫ N with ǫ = n−r , M = S − (1/r) (x − x)(x − x)t and N = − (x − x)(x − x)t. Following i∈I i i e I I matrix perturbationP theory detailed in Wilkinson [19] for example or referring to Sibson [15], we know that, if ǫ is sufficiently small, for each simple eigenvalue λ of S there is an eigenvalue λ˜ of S given by a convergent power series: e 2 m m+1 λ˜ = λ + γ1ǫ + γ2ǫ + ... + γmǫ + O(ǫ ) (2) with a corresponding eigenvector which can also be expressed under a convergent power series: ˜ 2 m m+1 φ = φ + ψ1ǫ + ψ2ǫ + ... + ψmǫ + O(ǫ ) (3) The parameters γ1,γ2, ...,γm and ψ1,ψ2,...,ψm are derived by equating the 2 m ˜ ˜ ˜ r coefficients of ǫ, ǫ ,...,ǫ in the equation Sφ = λφ. Assuming that ǫ = n−r is sufficiently small to ensure the convergence of the power series, Enguix-Gonz´alez, e Mu˜noz-Pichardo, Moreno-Rebollo & Pino-Mej´ıas [7] provide the first and second t order terms in (2) and (3). Letting αki = φk (xi − x), for i = 1,...,n and k = 1,...,p and αkI =( i∈I αki)/r, they get: P r 1 λ˜ = λ + λ − α2 k k n − r k r ki i∈I ! X 2 2 3 r 2 1 1 r + −αkI − αkiαji + O (4) n − r λj − λk r n − r j k i∈I ! ! X=6 X φ 2 ˜ r 1 j r φk = φk + αkiαji + O (5) n − r r λj − λk n − r j k i∈I ! ! X=6 X for k = 1,...,p. Details on the derivation of the above expressions are given by 4 ˜ B´enass´eni [3]. However, the derivation of the second order term for φk is tedious and omitted by this author. Enguix-Gonz´alez, Mu˜noz-Pichardo, Moreno-Rebollo & Pino-Mej´ıas [7] also provide a comprehensive study of approximations for the unbiased matrix (n − r)S/(n − r − 1). Finally, it should be noted that the perturbation λ˜ of λ may not necessarily e k k be the kth largest eigenvalue of S if the subset of observations indexed by I has initially a strong influence on λ for example. In this case, we assume simply ke that the eigenvalues λ˜k have been reordered in decreasing order and that their ˜ corresponding eigenvectors φk have been relabeled. 2.2 Deleting a single observation When studying the influence of a single observation, we are in the case defined by 1 r = 1, I = {i} and ǫ = n−1 . We have the following simplified expressions for (1), (4) and (5) : 2 1 t 1 t S = S + S − (xi − x)(xi − x) + − (xi − x)(xi − x) (6) n − 1 n − 1 e 1 λ˜ = λ + λ − α2 k k n − 1 k ki 2 2 3 1 2 αji 1 − αki 1+ + O (7) n − 1 λj − λk n − 1 j k ! X=6 2 ˜ 1 αji 1 φk = φk + αki φj + O (8) n − 1 λj − λk n − 1 j k ! X=6 Influence of a single observation has been studied in a comprehensive way by 5 Critchley [6]. This author studies influence functions for the eigenvalues and eigen- vectors in principal component analysis and (6), (7) and (8) agree with relations given in his Subsection 3.2. Finally, it should be noted that (6) can be expressed as n 1 t S = S − (xi − x)(xi − x) (9) n − 1 n − 1 which shows that, whene deleting a single observation, we have a rank one pertur- bation. 3 Approximations based on power expansions 3.1 General approach For any integer m ≥ 1, retaining only terms of order lower or equal to m in ǫ in (2) and (3) provides the following approximations of order m for the eigenvalues and eigenvectors of S : e (m) 2 m λ˜ = λ + γ1ǫ + γ2ǫ + ... + γmǫ and ˜(m) 2 m φ = φ + ψ1ǫ + ψ2ǫ + ... + ψmǫ This is the general approach suggested by Wang & Liski [17], Enguix-Gonz´alez, Mu˜noz-Pichardo, Moreno-Rebollo & Pino-Mej´ıas [7] and B´enass´eni [3]. Develop- ments of the previous section provide explicit formulations for the first and second order approximations as well when deleting a single observation as in the general case where r > 1.