Optimal Whitening and Decorrelation


Agnan Kessy (a), Alex Lewin (b), and Korbinian Strimmer (c)

The American Statistician, 72:4, 309-314 (2018). ISSN: 0003-1305 (Print), 1537-2731 (Online). Journal homepage: https://www.tandfonline.com/loi/utas20
DOI: 10.1080/00031305.2016.1277159 (https://doi.org/10.1080/00031305.2016.1277159)
Accepted author version posted online: 19 Jan 2017. Published online: 26 Jan 2018.

To cite this article: Agnan Kessy, Alex Lewin & Korbinian Strimmer (2018) Optimal Whitening and Decorrelation, The American Statistician, 72:4, 309-314, DOI: 10.1080/00031305.2016.1277159

(a) Statistics Section, Department of Mathematics, Imperial College London, South Kensington Campus, London, United Kingdom; (b) Department of Mathematics, Brunel University London, Kingston Lane, Uxbridge, United Kingdom; (c) Epidemiology and Biostatistics, School of Public Health, Imperial College London, Norfolk Place, London, United Kingdom

ABSTRACT
Whitening, or sphering, is a common preprocessing step in statistical analysis to transform random variables to orthogonality. However, due to rotational freedom there are infinitely many possible whitening procedures. Consequently, there is a diverse range of sphering methods in use, for example, based on principal component analysis (PCA), Cholesky matrix decomposition, and zero-phase component analysis (ZCA), among others. Here, we provide an overview of the underlying theory and discuss five natural whitening procedures. Subsequently, we demonstrate that investigating the cross-covariance and the cross-correlation matrix between sphered and original variables allows us to break the rotational invariance and to identify optimal whitening transformations. As a result we recommend two particular approaches: ZCA-cor whitening to produce sphered variables that are maximally similar to the original variables, and PCA-cor whitening to obtain sphered variables that maximally compress the original variables.

ARTICLE HISTORY
Received December; Revised December

KEYWORDS
CAR score; CAT score; Cholesky decomposition; Decorrelation; Principal components analysis; Whitening; ZCA-Mahalanobis transformation

1. Introduction

Whitening, or sphering, is a linear transformation that converts a d-dimensional random vector x = (x_1, ..., x_d)^T with mean E(x) = μ = (μ_1, ..., μ_d)^T and positive definite d × d covariance matrix var(x) = Σ into a new random vector

    z = (z_1, ..., z_d)^T = W x    (1)

of the same dimension d and with unit diagonal "white" covariance var(z) = I. The square d × d matrix W is called the whitening matrix.

As orthogonality among random variables greatly simplifies multivariate data analysis, both from a computational and a statistical standpoint, whitening is a critically important tool, most often employed in preprocessing but also as part of modeling (e.g., Zuber and Strimmer 2009; Hao, Dong, and Fan 2015).

Whitening can be viewed as a generalization of standardizing a random variable, which is carried out by

    z = V^{-1/2} x,    (2)

where the matrix V = diag(σ_1^2, ..., σ_d^2) contains the variances var(x_i) = σ_i^2. This results in var(z_i) = 1 but does not remove correlations. Often, standardization and whitening transformations are also accompanied by mean-centering of x or z to ensure E(z) = 0, but this is not actually necessary for producing unit variances or a white covariance.

The whitening transformation defined in Equation (1) requires the choice of a suitable whitening matrix W. Since var(z) = I it follows that W Σ W^T = I and thus W(Σ W^T W) = W, which is fulfilled if W satisfies the condition

    W^T W = Σ^{-1}.    (3)

Unfortunately, however, this constraint does not uniquely determine the whitening matrix W. Quite the contrary: given Σ there are in fact infinitely many possible matrices W that all satisfy Equation (3), and each W leads to a whitening transformation that produces orthogonal but different sphered random variables.

This raises two important issues: first, how to best understand the differences among the various sphering transformations, and second, how to select an optimal whitening procedure for a particular situation. Here, we propose to address these questions by investigating the cross-covariance and cross-correlation matrix between z and x. As a result, we identify five natural whitening procedures, of which we recommend two particular approaches for general use.
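As a brief numerical illustration of Equations (1)-(3) (not part of the original article), the following NumPy sketch uses a hypothetical covariance matrix and one of the infinitely many valid whitening matrices, here the inverse of the lower-triangular Cholesky factor of Σ, to check that the constraint W^T W = Σ^{-1} indeed yields a white sample covariance. This is an editorial sketch, not a variant prescribed by the authors.

```python
import numpy as np

rng = np.random.default_rng(0)

# A positive definite covariance matrix Sigma (hypothetical example values).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# One of infinitely many valid whitening matrices: the inverse of the
# lower-triangular Cholesky factor L, where Sigma = L L^T.
L = np.linalg.cholesky(Sigma)
W = np.linalg.inv(L)

# The whitening constraint of Equation (3): W^T W = Sigma^{-1}.
assert np.allclose(W.T @ W, np.linalg.inv(Sigma))

# Apply z = W x to a large sample drawn with covariance Sigma; the sample
# covariance of z should be close to the identity ("white") matrix.
x = rng.multivariate_normal(mean=np.zeros(3), cov=Sigma, size=200_000)
z = x @ W.T
print(np.round(np.cov(z, rowvar=False), 2))   # approximately the 3x3 identity
```

Any other matrix satisfying Equation (3), for instance the symmetric inverse square root of Σ introduced in the next section, would whiten the data equally well, which is exactly the rotational freedom discussed below.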
2. Notation and Useful Identities

In the following, we will make use of a number of covariance matrix identities: the decomposition Σ = V^{1/2} P V^{1/2} of the covariance matrix into the correlation matrix P and the diagonal variance matrix V, the eigendecomposition of the covariance matrix Σ = U Λ U^T, and the eigendecomposition of the correlation matrix P = G Θ G^T, where U, G contain the eigenvectors and Λ, Θ the eigenvalues of Σ, P, respectively. We will frequently use Σ^{-1/2} = U Λ^{-1/2} U^T, the unique inverse matrix square root of Σ, as well as P^{-1/2} = G Θ^{-1/2} G^T, the unique inverse matrix square root of the correlation matrix.

Following the standard convention, we assume that the eigenvalues are sorted in order from largest to smallest value. In addition, we recall that by construction all eigenvectors are defined only up to a sign, that is, the columns of U and G can be multiplied by a factor of -1 and the resulting matrix is still valid. Indeed, using different numerical algorithms and software will often result in eigendecompositions with U and G showing diverse column signs.

3. Rotational Freedom in Whitening

The constraint of Equation (3) on the whitening matrix does not fully identify W but allows for rotational freedom. This becomes apparent by writing W in its polar decomposition

    W = Q_1 Σ^{-1/2},    (4)

where Q_1 is an orthogonal matrix with Q_1^T Q_1 = I_d. Clearly, W satisfies Equation (3) regardless of the choice of Q_1.

This implies a geometrical interpretation of whitening as a combination of multivariate rescaling by Σ^{-1/2} and rotation by Q_1. It also shows that all whitening matrices W have the same singular values Λ^{-1/2}, which follows from the singular value decomposition W = (Q_1 U) Λ^{-1/2} U^T with Q_1 U orthogonal. This highlights that the fundamental rescaling is via the square root of the eigenvalues, Λ^{-1/2}. Geometrically, the whitening transformation with W = Q_1 U Λ^{-1/2} U^T is a rotation U^T, followed by scaling, possibly followed by another rotation (depending on the choice of Q_1).

Since in many situations it is desirable to work with standardized variables V^{-1/2} x, another useful decomposition of W that also directly demonstrates the inherent rotational freedom is

    W = Q_2 P^{-1/2} V^{-1/2},    (5)

where Q_2 is a further orthogonal matrix with Q_2^T Q_2 = I_d. Evidently, this W also satisfies the constraint of Equation (3) regardless of the choice of Q_2.

In this view, with W = Q_2 G Θ^{-1/2} G^T V^{-1/2}, the variables are first scaled by the square root of the diagonal variance matrix, then rotated by G^T, then scaled again by the square root of the eigenvalues of the correlation matrix, and possibly rotated once more (depending on the choice of Q_2).

For the above two representations to result in the same whitening matrix W, two different rotations Q_1 and Q_2 are required. These are linked by Q_1 = Q_2 A, where the matrix A = P^{-1/2} V^{-1/2} Σ^{1/2} = P^{1/2} V^{1/2} Σ^{-1/2} is itself orthogonal. Since the eigendecompositions of the covariance and the correlation matrix are not readily related to each other, the matrix A can unfortunately not be further simplified.

4. Cross-Covariance and Cross-Correlation

The cross-covariance matrix Φ between z and x is given by

    Φ = (φ_ij) = cov(z, x) = cov(W x, x) = W Σ = Q_1 Σ^{1/2}.    (6)

Likewise, the cross-correlation matrix is

    Ψ = (ψ_ij) = cor(z, x) = Φ V^{-1/2} = Q_2 A Σ^{1/2} V^{-1/2} = Q_2 P^{1/2}.    (7)

Thus, we find that the rotational freedom inherent in W, which is represented by the matrices Q_1 and Q_2, is directly reflected in the corresponding cross-covariance Φ and cross-correlation Ψ between z and x. This provides the leverage that we will use to select and discriminate among whitening transformations by appropriately choosing or constraining Φ or Ψ.
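The rotational freedom of Equations (4)-(7) can be checked numerically. The short sketch below (again an editorial illustration assuming NumPy, with the same hypothetical Σ as above, not code from the article) draws a random orthogonal matrix Q_1, forms W = Q_1 Σ^{-1/2}, and verifies that the whitening constraint still holds, that the cross-covariance equals Q_1 Σ^{1/2} as in Equation (6), and that the rotation Q_2 recovered from Ψ = Q_2 P^{1/2} is orthogonal as in Equation (7).

```python
import numpy as np

rng = np.random.default_rng(1)
d = 3

# Hypothetical covariance matrix and its symmetric square roots.
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])
lam, U = np.linalg.eigh(Sigma)
Sigma_sqrt = U @ np.diag(lam ** 0.5) @ U.T        # Sigma^{1/2}
Sigma_inv_sqrt = U @ np.diag(lam ** -0.5) @ U.T   # Sigma^{-1/2}

# Variance matrix V and correlation matrix P from Sigma = V^{1/2} P V^{1/2}.
v = np.diag(Sigma)
P = Sigma / np.sqrt(np.outer(v, v))

# A random orthogonal matrix Q1 (QR decomposition of a Gaussian matrix).
Q1, _ = np.linalg.qr(rng.normal(size=(d, d)))

# Polar-decomposition form of Equation (4): W = Q1 Sigma^{-1/2}.
W = Q1 @ Sigma_inv_sqrt

# W still satisfies the whitening constraint W^T W = Sigma^{-1} ...
assert np.allclose(W.T @ W, np.linalg.inv(Sigma))

# ... and the cross-covariance of Equation (6) is Phi = W Sigma = Q1 Sigma^{1/2}.
Phi = W @ Sigma
assert np.allclose(Phi, Q1 @ Sigma_sqrt)

# Cross-correlation of Equation (7): Psi = Phi V^{-1/2} = Q2 P^{1/2}.
Psi = Phi @ np.diag(v ** -0.5)
theta, G = np.linalg.eigh(P)
P_sqrt = G @ np.diag(theta ** 0.5) @ G.T
Q2 = Psi @ np.linalg.inv(P_sqrt)                  # recover Q2 from Psi
assert np.allclose(Q2.T @ Q2, np.eye(d))          # Q2 is orthogonal
```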
As can be seen from Equations (6) and (7), both Φ and Ψ are in general not symmetric, unless Q_1 = I or Q_2 = I, respectively. Note that the diagonal elements of the cross-correlation matrix Ψ need not be equal to 1.

Furthermore, since x = W^{-1} z, each x_j is perfectly explained by a linear combination of the uncorrelated z_1, ..., z_d, and hence the squared multiple correlation between x_j and z equals 1. Thus, the column sum over the squared cross-correlations, ∑_{i=1}^{d} ψ_ij^2, is always 1. In matrix notation, diag(Ψ^T Ψ) = diag(P^{1/2} Q_2^T Q_2 P^{1/2}) = diag(P) = (1, ..., 1)^T. In contrast, the row sum over the squared cross-correlations, ∑_{j=1}^{d} ψ_ij^2, varies for different whitening procedures and is, as we will see below, highly informative for choosing relevant transformations.

5. Five Natural Whitening Procedures

In practical application of whitening, there are a handful of sphering procedures that are most commonly used (e.g., Li and Zhang 1998). Accordingly, in Table 1 we describe the properties of five whitening transformations, listing the respective sphering matrix W, the associated rotation matrices Q_1 and Q_2, and the resulting cross-covariances Φ and cross-correlations Ψ. All five methods are natural whitening procedures arising from specific constraints on Φ or Ψ, as we will show further below.

The ZCA whitening transformation employs the sphering matrix

    W^{ZCA} = Σ^{-1/2},    (8)

where ZCA stands for "zero-phase components analysis" (Bell and Sejnowski 1997). This procedure is also known as Mahalanobis whitening. With Q_1 = I it is the unique sphering method
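As a closing numerical illustration of the column-sum property and of the ZCA-Mahalanobis transformation in Equation (8) (an editorial sketch assuming NumPy, with the same hypothetical Σ as above, not code from the article), the following checks that the column sums of the squared cross-correlations equal 1 both for W^{ZCA} = Σ^{-1/2} and for a randomly rotated whitening matrix, while the row sums generally differ between the two.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 3

# Hypothetical covariance matrix (same values as in the earlier sketches).
Sigma = np.array([[4.0, 2.0, 0.5],
                  [2.0, 3.0, 1.0],
                  [0.5, 1.0, 2.0]])

# ZCA / Mahalanobis whitening of Equation (8): W_ZCA = Sigma^{-1/2}.
lam, U = np.linalg.eigh(Sigma)
W_zca = U @ np.diag(lam ** -0.5) @ U.T

# Cross-correlation matrix Psi = W Sigma V^{-1/2}.
v = np.diag(Sigma)
Psi = W_zca @ Sigma @ np.diag(v ** -0.5)

# Column sums of the squared cross-correlations, i.e., diag(Psi^T Psi),
# are always (1, ..., 1)^T for any valid whitening matrix.
print(np.round((Psi ** 2).sum(axis=0), 6))        # -> [1. 1. 1.]

# A rotated whitening matrix W = Q1 Sigma^{-1/2} gives the same column sums
# but, in general, different row sums of the squared cross-correlations.
Q1, _ = np.linalg.qr(rng.normal(size=(d, d)))
Psi_rot = (Q1 @ W_zca) @ Sigma @ np.diag(v ** -0.5)
print(np.round((Psi_rot ** 2).sum(axis=0), 6))    # still [1. 1. 1.]
print(np.round((Psi ** 2).sum(axis=1), 3))        # row sums for ZCA
print(np.round((Psi_rot ** 2).sum(axis=1), 3))    # row sums after rotation
```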