
Principal Component Analysis (PCA)

• Pattern recognition in high-dimensional spaces

-Problems arise when performing recognition in a high-dimensional space (e.g., the curse of dimensionality).

-Significant improvements can be achieved by first mapping the data into a lower-dimensional space.

$$x = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \;\xrightarrow{\text{reduce dimensionality}}\; y = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} \qquad (K \ll N)$$

-The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset.

• Dimensionality reduction

-PCA allows us to compute a linear transformation that maps data from a high dimensional space to a lower dimensional space.

$$\begin{aligned} b_1 &= t_{11}a_1 + t_{12}a_2 + \cdots + t_{1N}a_N \\ b_2 &= t_{21}a_1 + t_{22}a_2 + \cdots + t_{2N}a_N \\ &\;\;\vdots \\ b_K &= t_{K1}a_1 + t_{K2}a_2 + \cdots + t_{KN}a_N \end{aligned}$$

or $y = Tx$ where

$$T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N} \\ t_{21} & t_{22} & \cdots & t_{2N} \\ \vdots & \vdots & & \vdots \\ t_{K1} & t_{K2} & \cdots & t_{KN} \end{bmatrix}$$
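-A minimal NumPy sketch (an illustration added here, not part of the original notes) of the transformation $y = Tx$; the matrix T below is a random placeholder, and PCA will later supply a specific T built from eigenvectors:

    import numpy as np

    N, K = 10, 3                  # original and reduced dimensionality (K << N)
    x = np.random.randn(N)        # an N-dimensional data vector
    T = np.random.randn(K, N)     # a K x N linear transformation (placeholder values)
    y = T @ x                     # reduced representation
    print(y.shape)                # (3,)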


• Lower dimensionality basis

-Approximate the vectors by finding a basis in an appropriate lower-dimensional space.

(1) Higher-dimensional space representation:

$$x = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N$$

$v_1, v_2, \ldots, v_N$ is a basis of the N-dimensional space

(2) Lower-dimensional space representation:

$$\hat{x} = b_1 u_1 + b_2 u_2 + \cdots + b_K u_K$$

$u_1, u_2, \ldots, u_K$ is a basis of the K-dimensional space

- Note: if both bases have the same size (N = K), then $x = \hat{x}$.

Example

 1   0   0  =   =   =   v1  0 , v2  1 , v3  0  (standard basis)  0   0   1 

 3  =   = + + xv  3  3v1 3v2 3v3  3 

 1   1   1  =   =   =   u1  0 , u2  1 , u3  1  (some other basis)  0   0   1 

 3  =   = + + xu  3  0u1 0u2 3u3  3  = thus, xv xu -- --


• Information loss

-Dimensionality reduction implies information loss!!

-Preserve as much information as possible, that is,

minimize $\|x - \hat{x}\|$ (error)

• How to determine the best lower-dimensional space?

-The best lower-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called "principal components").

• Methodology

-Suppose $x_1, x_2, \ldots, x_M$ are $N \times 1$ vectors

Step 1: compute the mean: $\bar{x} = \dfrac{1}{M} \sum_{i=1}^{M} x_i$

Step 2: subtract the mean: $\Phi_i = x_i - \bar{x}$

Step 3: form the matrix $A = [\Phi_1\ \Phi_2\ \cdots\ \Phi_M]$ ($N \times M$ matrix), then compute:

$$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T = A A^T$$

(sample covariance matrix, $N \times N$, characterizes the scatter of the data)

Step 4: compute the eigenvalues of C: $\lambda_1 > \lambda_2 > \cdots > \lambda_N$

Step 5: compute the eigenvectors of C: $u_1, u_2, \ldots, u_N$

-Since C is symmetric, $u_1, u_2, \ldots, u_N$ form a basis (i.e., any vector x, or actually $(x - \bar{x})$, can be written as a linear combination of the eigenvectors):

$$x - \bar{x} = b_1 u_1 + b_2 u_2 + \cdots + b_N u_N = \sum_{i=1}^{N} b_i u_i$$
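-A minimal NumPy sketch of Steps 1-5 (added as an illustration, not from the original notes); it assumes the M data vectors $x_i$ are stored as the columns of an $N \times M$ array X:

    import numpy as np

    def pca_eig(X):
        """Steps 1-5: mean, centering, covariance, eigen-decomposition."""
        M = X.shape[1]
        x_bar = X.mean(axis=1, keepdims=True)     # Step 1: mean vector
        Phi = X - x_bar                           # Step 2: subtract the mean
        C = (Phi @ Phi.T) / M                     # Step 3: sample covariance (N x N)
        lam, U = np.linalg.eigh(C)                # Steps 4-5 (eigh, since C is symmetric)
        order = np.argsort(lam)[::-1]             # sort by decreasing eigenvalue
        return x_bar, lam[order], U[:, order]

    X = np.random.randn(5, 100)                   # e.g., 100 samples in 5 dimensions
    x_bar, lam, U = pca_eig(X)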


Step 6: (dimensionality reduction step) keep only the terms corresponding to the K largest eigenvalues:

$$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \qquad \text{where } K \ll N$$

-The representation of $\hat{x} - \bar{x}$ in the basis $u_1, u_2, \ldots, u_K$ is thus

$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}$$

• Linear transformation implied by PCA

-The linear transformation $R^N \rightarrow R^K$ that performs the dimensionality reduction is:

$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_K^T \end{bmatrix} (x - \bar{x}) = U^T (x - \bar{x})$$
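-Continuing the sketch above (again an added illustration), Step 6 and this projection keep only the K eigenvectors with the largest eigenvalues and compute $b = U_K^T (x - \bar{x})$:

    K = 2
    U_K = U[:, :K]                  # N x K matrix of the top-K eigenvectors
    x = X[:, 0:1]                   # one sample as an N x 1 column
    b = U_K.T @ (x - x_bar)         # K x 1 vector of PCA coefficients
    x_hat = x_bar + U_K @ b         # reconstruction from the K components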

• An example

(see Castleman's appendix, pp. 648-649)


• Geometrical interpretation

-PCA projects the data along the directions where the data varies the most.

-These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.

-The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

• Properties and assumptions of PCA

-The new variables (i.e., the $b_i$'s) are uncorrelated.

 1 0 0   0 0   2   . . .  the covariance of b ’s is: UT CU = i  . . .     . . .   0 0 K 

-The covariance matrix represents only second order statistics among the vector values.

-Since the new variables are linear combinations of the original variables, it is usually difficult to interpret their meaning.


• How to choose the principal components?

-To choose K, use the following criterion:

$$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > \text{Threshold (e.g., 0.9 or 0.95)}$$
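-A short sketch of this criterion (added here, not from the original notes), using the eigenvalues lam (sorted in decreasing order) from the earlier code:

    threshold = 0.95
    ratios = np.cumsum(lam) / np.sum(lam)             # cumulative fraction of total variance
    K = int(np.searchsorted(ratios, threshold)) + 1   # smallest K meeting the threshold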

• What is the error due to dimensionality reduction?

-We saw above that an original vector x can be reconstructed using its principal components:

$$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \qquad \text{or} \qquad \hat{x} = \sum_{i=1}^{K} b_i u_i + \bar{x}$$

-It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error:

$$e = \|x - \hat{x}\|$$

-It can be shown that the error is equal to:

$$e = \frac{1}{2} \sum_{i=K+1}^{N} \lambda_i$$
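-A sketch (continuing the earlier code, added as an illustration) that measures the reconstruction error for one sample alongside the sum of the discarded eigenvalues on which the expression above is based:

    U_K = U[:, :K]
    x_hat = x_bar + U_K @ (U_K.T @ (x - x_bar))   # reconstruct x with the chosen K
    err = np.linalg.norm(x - x_hat)               # ||x - x_hat|| for this sample
    print(err, lam[K:].sum())                     # per-sample error vs. discarded eigenvalues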

• Standardization

-The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.

-We should always standardize the data prior to using PCA.

-A common standardization method is to transform all the data to have zero mean and unit standard deviation:

$$\frac{x_i - \mu}{\sigma}$$

($\mu$ and $\sigma$ are the mean and standard deviation of the $x_i$'s)
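-A minimal sketch of this standardization (added, not from the original notes), applied per dimension to the $N \times M$ data array X before running PCA:

    X_std = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)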


• PCA and classification

-PCA is not always an optimal dimensionality-reduction procedure for classification purposes.

• Other problems

-How to handle occlusions?

-How to handle different views of a 3D object?
