Principal Component Analysis (PCA)
• Pattern recognition in high-dimensional spaces
-Problems arise when performing recognition in a high-dimensional space (e.g., the curse of dimensionality).
-Significant improvements can be achieved by first mapping the data into a lower-dimensional space.
x = [a_1, a_2, ..., a_N]^T  --> reduce dimensionality -->  y = [b_1, b_2, ..., b_K]^T    (K << N)

-The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset.
• Dimensionality reduction
-PCA allows us to compute a linear transformation that maps data from a high-dimensional space to a lower-dimensional space.
b_1 = t_11 a_1 + t_12 a_2 + ... + t_1N a_N
b_2 = t_21 a_1 + t_22 a_2 + ... + t_2N a_N
...
b_K = t_K1 a_1 + t_K2 a_2 + ... + t_KN a_N

or y = Tx, where T is the K x N matrix

        | t_11 t_12 ... t_1N |
    T = | t_21 t_22 ... t_2N |
        |         ...        |
        | t_K1 t_K2 ... t_KN |
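As a quick illustration of the mapping y = Tx, the NumPy sketch below applies a K x N matrix to an N-dimensional vector. The random T here is purely hypothetical, just to show the shapes involved; PCA will choose its rows in a principled way below.

```python
import numpy as np

N, K = 5, 2                      # original and reduced dimensionality
rng = np.random.default_rng(0)

x = rng.normal(size=N)           # an N-dimensional input vector
T = rng.normal(size=(K, N))      # K x N transformation matrix (rows to be chosen by PCA)

y = T @ x                        # y = Tx, the K-dimensional representation
print(y.shape)                   # (2,)
```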
• Lower dimensionality basis
-Approximate the vectors by finding a basis in an appropriate lower-dimensional space.
(1) Higher-dimensional space representation:  x = a_1 v_1 + a_2 v_2 + ... + a_N v_N
v1, v2,..., vN is a basis of the N-dimensional space
(2) Lower-dimensional space representation:  x̂ = b_1 u_1 + b_2 u_2 + ... + b_K u_K
u1, u2,..., uK is a basis of the K-dimensional space
- Note: if both bases have the same size (K = N), then x̂ = x.
Example

    v_1 = [1, 0, 0]^T,  v_2 = [0, 1, 0]^T,  v_3 = [0, 0, 1]^T    (standard basis)

    x_v = [3, 3, 3]^T = 3v_1 + 3v_2 + 3v_3

    u_1 = [1, 0, 0]^T,  u_2 = [1, 1, 0]^T,  u_3 = [1, 1, 1]^T    (some other basis)

    x_u = [3, 3, 3]^T = 0u_1 + 0u_2 + 3u_3

    thus, x_v = x_u
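The change of basis in this example can be checked numerically: with the u_i's as the columns of a matrix U, solving U b = x yields the coefficients of x in the new basis. A small NumPy sketch:

```python
import numpy as np

x = np.array([3.0, 3.0, 3.0])

# "some other basis" from the example, with u_1, u_2, u_3 as columns of U
U = np.array([[1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0],
              [0.0, 0.0, 1.0]])

b = np.linalg.solve(U, x)        # coefficients of x in the new basis
print(b)                         # [0. 0. 3.]  -> x = 0*u_1 + 0*u_2 + 3*u_3
```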
• Information loss
-Dimensionality reduction implies information loss!
-Preserve as much information as possible, that is,
minimize ||x − xˆ|| (error)
• How to determine the best lower-dimensional space?
The best low-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called the "principal components").
• Methodology
-Suppose x_1, x_2, ..., x_M are N x 1 vectors
Step 1: compute the sample mean:

    x̄ = (1/M) Σ_{i=1}^{M} x_i

Step 2: subtract the mean: Φ_i = x_i − x̄

Step 3: form the N x M matrix A = [Φ_1 Φ_2 ... Φ_M], then compute:

    C = (1/M) Σ_{n=1}^{M} Φ_n Φ_n^T = (1/M) A A^T

    (the sample covariance matrix; N x N, characterizes the scatter of the data)

Step 4: compute the eigenvalues of C: λ_1 > λ_2 > ... > λ_N
Step 5: compute the eigenvectors of C: u1, u2,...,uN
-Since C is symmetric, u_1, u_2, ..., u_N form a basis (i.e., any vector x, or actually (x − x̄), can be written as a linear combination of the eigenvectors):
    x − x̄ = b_1 u_1 + b_2 u_2 + ... + b_N u_N = Σ_{i=1}^{N} b_i u_i
Step 6: (dimensionality reduction step) keep only the terms corresponding to the K largest eigenvalues:

    x̂ − x̄ = Σ_{i=1}^{K} b_i u_i    where K << N

-The representation of x̂ − x̄ in the basis u_1, u_2, ..., u_K is thus [b_1 b_2 ... b_K]^T
• Linear transformation implied by PCA
-The linear transformation R^N -> R^K that performs the dimensionality reduction is:

    | b_1 |   | u_1^T |
    | b_2 | = | u_2^T | (x − x̄) = U^T (x − x̄)
    | ... |   |  ...  |
    | b_K |   | u_K^T |
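The whole procedure (Steps 1-6 plus the transformation U^T (x − x̄)) can be sketched in NumPy. The synthetic data and the choices of M, N, K below are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
M, N, K = 200, 5, 2                              # samples, original dim, reduced dim
X = rng.normal(size=(N, M)) * np.array([5.0, 3.0, 1.0, 0.5, 0.1])[:, None]  # columns are samples

x_bar = X.mean(axis=1, keepdims=True)            # Step 1: sample mean
A = X - x_bar                                    # Steps 2-3: Phi_i = x_i - x_bar, stacked as A
C = (A @ A.T) / M                                # Step 3: sample covariance matrix, N x N

eigvals, eigvecs = np.linalg.eigh(C)             # Steps 4-5 (eigh: C is symmetric)
order = np.argsort(eigvals)[::-1]                # sort by decreasing eigenvalue
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

U = eigvecs[:, :K]                               # Step 6: keep the K largest
B = U.T @ A                                      # b = U^T (x - x_bar) for every sample
print(B.shape)                                   # (2, 200)
```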
• An example
(see Castleman's appendix, pp. 648-649)
• Geometrical interpretation
-PCA projects the data along the directions where the data varies the most.
-These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.
-The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.
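This last property can be verified numerically: projecting centered data onto the eigenvectors and measuring the variance along each direction recovers the eigenvalues. A sketch with synthetic data:

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.normal(size=(100, 3)) @ np.diag([4.0, 2.0, 0.5])   # rows are samples
A = X - X.mean(axis=0)                                     # center the data
C = A.T @ A / len(A)                                       # sample covariance matrix

eigvals, eigvecs = np.linalg.eigh(C)
proj = A @ eigvecs                         # coordinates along each eigenvector
var_along = proj.var(axis=0)               # variance along each eigenvector direction
print(np.allclose(var_along, eigvals))     # True
```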
• Properties and assumptions of PCA
-The new variables (i.e., the b_i's) are uncorrelated.
the covariance matrix of the b_i's is:

                | λ_1  0  ...  0  |
    U^T C U =   |  0  λ_2 ...  0  |
                |       ...       |
                |  0   0  ... λ_K |
-The covariance matrix represents only second-order statistics among the vector values.
-Since the new variables are linear combinations of the original variables, it is usually difficult to interpret their meaning.
• How to choose the principal components?
-To choose K, use the following criterion:
    Σ_{i=1}^{K} λ_i
    ---------------  >  Threshold    (e.g., 0.9 or 0.95)
    Σ_{i=1}^{N} λ_i
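A sketch of this criterion in NumPy, assuming the eigenvalues are already sorted in decreasing order (the values below are made up for illustration):

```python
import numpy as np

eigvals = np.array([4.5, 2.0, 1.0, 0.3, 0.15, 0.05])   # hypothetical, sorted descending
ratio = np.cumsum(eigvals) / eigvals.sum()             # fraction of variance retained by first K
K = int(np.argmax(ratio > 0.9)) + 1                    # smallest K exceeding the threshold
print(K)                                               # 3
```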
• What is the error due to dimensionality reduction?
-We saw above that an original vector x can be reconstructed using its principal components:

    x̂ − x̄ = Σ_{i=1}^{K} b_i u_i    or    x̂ = Σ_{i=1}^{K} b_i u_i + x̄
-It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error:
    e = ||x − x̂||
-It can be shown that the error is equal to:
    e = (1/2) Σ_{i=K+1}^{N} λ_i
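The squared-error form of this relationship — the mean of ||x − x̂||² over the dataset equals the sum of the discarded eigenvalues — is straightforward to check numerically (synthetic data; C computed as in Step 3):

```python
import numpy as np

rng = np.random.default_rng(4)
A = rng.normal(size=(200, 4)) * np.array([3.0, 1.0, 0.3, 0.1])   # rows are samples
A = A - A.mean(axis=0)                       # center the data
C = A.T @ A / len(A)                         # sample covariance matrix
eigvals, U = np.linalg.eigh(C)
eigvals, U = eigvals[::-1], U[:, ::-1]       # sort descending

def mse(K):
    B = A @ U[:, :K]                         # b_i coefficients for each sample
    A_hat = B @ U[:, :K].T                   # reconstruction of (x - x_bar)
    return np.mean(np.sum((A - A_hat) ** 2, axis=1))

# mean squared reconstruction error equals the sum of the discarded eigenvalues
print([np.isclose(mse(K), eigvals[K:].sum()) for K in range(1, 4)])   # [True, True, True]
```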
• Standardization
-The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.
-We should always standardize the data prior to using PCA.
-A common standardization method is to transform all the data to have zero mean and unit standard deviation:

    (x_i − μ) / σ    (μ and σ are the mean and standard deviation of the x_i's)
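A minimal standardization sketch in NumPy (column-wise zero mean and unit standard deviation; the data is synthetic):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.normal(loc=10.0, scale=4.0, size=(500, 3))   # rows are samples, columns are variables

mu = X.mean(axis=0)                 # per-variable mean
sigma = X.std(axis=0)               # per-variable standard deviation
Z = (X - mu) / sigma                # standardized data: zero mean, unit std per variable

print(Z.mean(axis=0).round(6), Z.std(axis=0).round(6))
```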
• PCA and classification
-PCA is not always an optimal dimensionality-reduction procedure for classification purposes: the directions of maximum variance are not necessarily the directions that best separate the classes.
• Other problems
-How to handle occlusions?
-How to handle different views of a 3D object?