
ESS 522 2014

14. Covariance and Principal Component Analysis

Covariance and Correlation Coefficient

In many fields of observational geoscience, many variables are monitored together as a function of space (or sample number) or time. The covariance is a measure of how variations in pairs of variables are linked to each other. If we measure properties xi and yi for i = 1, 2, …, n, we can write the sample variances for x and y as

$$s_x^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2 \qquad (14\text{-}1)$$

and

$$s_y^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2 \qquad (14\text{-}2)$$

We define the covariance between x and y, $s_{xy}^2$, as

$$s_{xy}^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)\left(y_i - \bar{y}\right) \qquad (14\text{-}3)$$

The covariance tells us how the x and y values depend on each other. The correlation coefficient, r, is a normalized version of the covariance and is given by

$$r = \frac{s_{xy}^2}{s_x s_y} \qquad (14\text{-}4)$$

The correlation coefficient is constrained to fall in the range ±1. A value of +1 tells us that the points (xi, yi) define a straight line with a positive slope. A value of -1 tells us that the points (xi, yi) define a straight line with a negative slope. A value of 0 shows that there is no dependence of y on x or vice versa (i.e., no correlation).
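As a concrete illustration (a minimal sketch with synthetic data; the variable names and the linear test relationship are arbitrary), equations (14-1) to (14-4) can be computed directly in Matlab:

    % synthetic example data: y depends linearly on x plus noise
    n = 100;
    x = randn(n, 1);
    y = 2*x + 0.5*randn(n, 1);
    sx2 = sum((x - mean(x)).^2) / (n - 1);                 % eq. (14-1)
    sy2 = sum((y - mean(y)).^2) / (n - 1);                 % eq. (14-2)
    sxy = sum((x - mean(x)) .* (y - mean(y))) / (n - 1);   % eq. (14-3)
    r   = sxy / sqrt(sx2 * sy2);                           % eq. (14-4)

Matlab's built-in var, cov, and corrcoef functions return the same quantities and provide a useful check.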

The quantity $r^2$ is the coefficient of determination; it is a measure of the fraction of the variance of y that can be attributed to a relationship with x.

It is important to note that r measures the strength of the linear relationship between x and y but a high value of |r| does not necessarily imply a cause and effect relationship or that the two variables are linearly related. It is easy to devise non-linear relationships that give a high correlation coefficient. It is important to look at the data and use common sense.
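For example (a minimal sketch; the cubic is just one convenient choice of non-linear relationship), a noise-free non-linear dependence can still produce a high correlation coefficient:

    % a purely non-linear (cubic) relationship still gives a high r
    x = linspace(-1, 1, 201)';
    y = x.^3;                  % deterministic, non-linear dependence
    R = corrcoef(x, y);        % R(1,2) is about 0.92, even though
    disp(R(1,2))               % the relationship is not linear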

Principal Component Analysis

If you are measuring only two variables or a handful of variables, then it is straightforward to plot pairs of variables to see how they are correlated. However, when many variables are being measured, graphical comparisons of individual pairs can be time consuming. Principal component analysis is a tool commonly used in exploratory data analysis and predictive analysis, which seeks to explain the bulk of the variation in demeaned data in terms of a smaller number of uncorrelated derived variables (principal components). Each principal component is a linear combination of the starting variables, and they are ordered so that each accounts for more of the variation than the ones that follow.


To fully understand the process by which principal components are derived you will need to take an inverse theory class (ESS 523), but you can apply the technique and understand the results without it.

If we have M observations of N demeaned variables x1, x2, x3, …, xN, where $x_{i,j}$ is the j-th observation of the i-th variable, then the covariance C is an N x N matrix in which $C_{i,j}$ is given by

$$C_{i,j} = \frac{1}{M}\sum_{k=1}^{M} x_{i,k}\, x_{j,k} \qquad (14\text{-}5)$$

This expression is equivalent to equation (14-3) except that the data have already been demeaned. If we write all the demeaned observations as a matrix with each variable as a column,

$$\mathbf{X} = \begin{pmatrix} x_{1,1} & x_{2,1} & x_{3,1} & \cdots & x_{N,1} \\ x_{1,2} & x_{2,2} & x_{3,2} & \cdots & x_{N,2} \\ x_{1,3} & x_{2,3} & x_{3,3} & \cdots & x_{N,3} \\ \vdots & \vdots & \vdots & & \vdots \\ x_{1,M} & x_{2,M} & x_{3,M} & \cdots & x_{N,M} \end{pmatrix} \qquad (14\text{-}6)$$

the N x N symmetric covariance matrix can be calculated as

$$\mathbf{C} = \frac{1}{M}\mathbf{X}^{T}\mathbf{X} \qquad (14\text{-}7)$$

Now in principal component analysis we compute the matrix V of eigenvectors which diagonalizes the covariance matrix according to

$$\mathbf{V}^{-1}\mathbf{C}\mathbf{V} = \mathbf{D} \qquad (14\text{-}8)$$

where D is a diagonal matrix of eigenvalues of C. In Matlab the command eig.m will do this eigenvalue decomposition and compute V and D. The columns of V are orthogonal vectors (their dot products with each other are zero) of unit length, and they define the principal components – that is, combinations of the data in directions that have zero covariance with each other. The diagonal elements of D are the variances of the corresponding principal components.
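A minimal Matlab sketch of this procedure, including the sorting described in the next paragraph (the synthetic matrix X0 is a stand-in for real observations, with rows as observations and columns as variables):

    % synthetic example: M = 100 observations of N = 4 variables
    X0 = randn(100, 4);
    X  = X0 - repmat(mean(X0, 1), size(X0, 1), 1);  % demean each variable
    M  = size(X, 1);
    C  = (X' * X) / M;           % covariance matrix, eq. (14-7)
    [V, D]   = eig(C);           % eigenvectors V, eigenvalues D, eq. (14-8)
    [d, idx] = sort(diag(D), 'descend');  % eigenvalues, largest first
    V  = V(:, idx);              % same ordering applied to the eigenvectors
    f  = d / sum(d);             % fraction of variance explained, eq. (14-9)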

For principal component analysis one would take the diagonal of D, sort its elements into descending order, and then apply the same sorting to the columns of V (as in the sketch above). The fraction of the variance explained by each principal component is given by

$$f_i = \frac{D_{i,i}}{\sum_{k=1}^{N} D_{k,k}} \qquad (14\text{-}9)$$

The principal components have two related applications:

(1) They allow you to see how different variables change with each other. For example, if 4 variables have a first principal component that explains most of the variation in the data and which is given by


" 1 % 1 $ 1 ' $ − ' (14-10) 6 $ 2 ' $ 0 ' # & then this tells you that the first and second variable are anti-correlated in proportion to each other, that the first and third variable are correlated with the third variable changing twice as much as the first, and the fourth variable not being correlated with these changes in the other three. The second principal component would give the weightings for the next strongest independent correlation between all the variables. Once you see how variables change with each other in an observational data set, one can think about the physical explanation.

(2) By selecting the minimum number of principal components needed to explain most of the variation (90% of the variance is often chosen), one can develop a simple predictive model that predicts reasonable values of all the variables based on knowledge of only a few (see the sketch below).
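Continuing the sketch above (X, V, and f as computed there, with the 90% threshold mentioned in the text), the truncation in application (2) might look like:

    % keep the smallest number of components explaining >= 90% of the variance
    k  = find(cumsum(f) >= 0.9, 1);   % number of components to retain
    Vk = V(:, 1:k);                   % leading k principal components
    Xr = (X * Vk) * Vk';              % approximate reconstruction of all N
                                      % variables from only k derived variables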

Factor Analysis

Factor analysis is another technique that is similar to principal component analysis. To understand the difference, one can envision implementing principal component analysis one component at a time. The 1st principal component is oriented so that it explains as much of the variance as possible. Then the 2nd principal component is oriented to explain as much of the remaining variance as possible. Then the 3rd principal component is oriented, and so on. In factor analysis one chooses the number of components up front and then seeks to orient them together so that, in sum, they explain as much of the total variance as possible.

Why is this different? For example, one might find that the 1st principal component explains 75% of the variance and the 2nd 10%, but by adjusting them together come up with alternative components that explain 70% and 20%, respectively, thus accounting for a total of 90% rather than 85% of the variance. By reducing the amount of variance explained by the first component, it may be possible to find two directions that together explain more variance in total.

Unlike eigenvalue decomposition for principal component analysis, there is no elegant mathematical way to do factor analysis – it is generally implemented by an iterative process of adjusting trial orientations (possibly based on the principal components) to try to maximize the variance explained.
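For example, Matlab's factoran function (in the Statistics Toolbox) implements maximum-likelihood factor analysis; a minimal sketch with synthetic data and an arbitrary choice of two factors:

    % fit a two-factor model to synthetic observations
    % (requires the Statistics Toolbox; with N = 6 variables,
    % two factors satisfies the degrees-of-freedom requirement)
    X0 = randn(100, 6);                % stand-in for real observations
    [Lambda, Psi] = factoran(X0, 2);   % Lambda: factor loadings,
                                       % Psi: specific variances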

You will find a lot of discussion on the web and in the scientific literature as to which technique is best.
