
Principal Component Analysis (PCA)

• Pattern recognition in high-dimensional spaces

-Problems arise when performing recognition in a high-dimensional space (e.g., the curse of dimensionality).

-Significant improvements can be achieved by first mapping the data into a lower-dimensional space.

$$x = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_N \end{bmatrix} \;\xrightarrow{\text{reduce dimensionality}}\; y = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} \qquad (K \ll N)$$

-The goal of PCA is to reduce the dimensionality of the data while retaining as much as possible of the variation present in the original dataset.

• Dimensionality reduction

-PCA allows us to compute a linear transformation that maps data from a high dimensional space to a lower dimensional space.

$$\begin{aligned} b_1 &= t_{11}a_1 + t_{12}a_2 + \cdots + t_{1N}a_N \\ b_2 &= t_{21}a_1 + t_{22}a_2 + \cdots + t_{2N}a_N \\ &\;\;\vdots \\ b_K &= t_{K1}a_1 + t_{K2}a_2 + \cdots + t_{KN}a_N \end{aligned}$$

or $y = Tx$ where

$$T = \begin{bmatrix} t_{11} & t_{12} & \cdots & t_{1N} \\ t_{21} & t_{22} & \cdots & t_{2N} \\ \vdots & \vdots & & \vdots \\ t_{K1} & t_{K2} & \cdots & t_{KN} \end{bmatrix}$$
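-A minimal NumPy sketch (an illustration added here, not part of the original notes) of the transformation $y = Tx$; the matrix T below is a random placeholder, and PCA will later supply a specific T built from eigenvectors:

    import numpy as np

    N, K = 10, 3                  # original and reduced dimensionality (K << N)
    x = np.random.randn(N)        # an N-dimensional data vector
    T = np.random.randn(K, N)     # a K x N linear transformation (placeholder values)
    y = T @ x                     # reduced representation
    print(y.shape)                # (3,)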


• Lower dimensionality basis

-Approximate the vectors by finding a basis in an appropriate lower-dimensional space.

(1) Higher-dimensional space representation:

$$x = a_1 v_1 + a_2 v_2 + \cdots + a_N v_N$$

$v_1, v_2, \ldots, v_N$ is a basis of the N-dimensional space

(2) Lower-dimensional space representation:

$$\hat{x} = b_1 u_1 + b_2 u_2 + \cdots + b_K u_K$$

$u_1, u_2, \ldots, u_K$ is a basis of the K-dimensional space

- Note: if both bases have the same size (N = K), then $x = \hat{x}$.

Example

 1   0   0  =   =   =   v1  0 , v2  1 , v3  0  (standard basis)  0   0   1 

 3  =   = + + xv  3  3v1 3v2 3v3  3 

 1   1   1  =   =   =   u1  0 , u2  1 , u3  1  (some other basis)  0   0   1 

 3  =   = + + xu  3  0u1 0u2 3u3  3  = thus, xv xu -- --


• Information loss

-Dimensionality reduction implies information loss!!

-Preserve as much information as possible, that is,

minimize $\|x - \hat{x}\|$ (error)

• How to determine the best lower-dimensional space?

-The best lower-dimensional space is determined by the "best" eigenvectors of the covariance matrix of x (i.e., the eigenvectors corresponding to the "largest" eigenvalues, also called "principal components").

• Methodology

-Suppose $x_1, x_2, \ldots, x_M$ are $N \times 1$ vectors

Step 1: compute the mean: $\bar{x} = \dfrac{1}{M} \sum_{i=1}^{M} x_i$

Step 2: subtract the mean: $\Phi_i = x_i - \bar{x}$

Step 3: form the matrix $A = [\Phi_1\ \Phi_2\ \cdots\ \Phi_M]$ ($N \times M$ matrix), then compute:

$$C = \frac{1}{M} \sum_{n=1}^{M} \Phi_n \Phi_n^T = A A^T$$

(sample covariance matrix, $N \times N$, characterizes the scatter of the data)

Step 4: compute the eigenvalues of C: $\lambda_1 > \lambda_2 > \cdots > \lambda_N$

Step 5: compute the eigenvectors of C: $u_1, u_2, \ldots, u_N$

-Since C is symmetric, $u_1, u_2, \ldots, u_N$ form a basis (i.e., any vector x, or actually $(x - \bar{x})$, can be written as a linear combination of the eigenvectors):

$$x - \bar{x} = b_1 u_1 + b_2 u_2 + \cdots + b_N u_N = \sum_{i=1}^{N} b_i u_i$$
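-A minimal NumPy sketch of Steps 1-5 (added as an illustration, not from the original notes); it assumes the M data vectors $x_i$ are stored as the columns of an $N \times M$ array X:

    import numpy as np

    def pca_eig(X):
        """Steps 1-5: mean, centering, covariance, eigen-decomposition."""
        M = X.shape[1]
        x_bar = X.mean(axis=1, keepdims=True)     # Step 1: mean vector
        Phi = X - x_bar                           # Step 2: subtract the mean
        C = (Phi @ Phi.T) / M                     # Step 3: sample covariance (N x N)
        lam, U = np.linalg.eigh(C)                # Steps 4-5 (eigh, since C is symmetric)
        order = np.argsort(lam)[::-1]             # sort by decreasing eigenvalue
        return x_bar, lam[order], U[:, order]

    X = np.random.randn(5, 100)                   # e.g., 100 samples in 5 dimensions
    x_bar, lam, U = pca_eig(X)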


Step 6: (dimensionality reduction step) keep only the terms corresponding to the K largest eigenvalues:

$$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \qquad \text{where } K \ll N$$

-The representation of $\hat{x} - \bar{x}$ in the basis $u_1, u_2, \ldots, u_K$ is thus

$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix}$$

• Linear transformation implied by PCA

-The linear transformation $R^N \rightarrow R^K$ that performs the dimensionality reduction is:

$$\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_K \end{bmatrix} = \begin{bmatrix} u_1^T \\ u_2^T \\ \vdots \\ u_K^T \end{bmatrix} (x - \bar{x}) = U^T (x - \bar{x})$$
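-Continuing the sketch above (again an added illustration), Step 6 and this projection keep only the K eigenvectors with the largest eigenvalues and compute $b = U_K^T (x - \bar{x})$:

    K = 2
    U_K = U[:, :K]                  # N x K matrix of the top-K eigenvectors
    x = X[:, 0:1]                   # one sample as an N x 1 column
    b = U_K.T @ (x - x_bar)         # K x 1 vector of PCA coefficients
    x_hat = x_bar + U_K @ b         # reconstruction from the K components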

• An example

(see Castleman's appendix, pp. 648-649)


• Geometrical interpretation

-PCA projects the data along the directions where the data varies the most.

-These directions are determined by the eigenvectors of the covariance matrix corresponding to the largest eigenvalues.

-The magnitude of the eigenvalues corresponds to the variance of the data along the eigenvector directions.

• Properties and assumptions of PCA

-The new variables (i.e., the $b_i$'s) are uncorrelated.

 1 0 0   0 0   2   . . .  the covariance of b ’s is: UT CU = i  . . .     . . .   0 0 K 

-The covariance matrix represents only second order statistics among the vector values.

-Since the new variables are linear combinations of the original variables, it is usually difficult to interpret their meaning.


• How to choose the principal components?

-To choose K, use the following criterion:

$$\frac{\sum_{i=1}^{K} \lambda_i}{\sum_{i=1}^{N} \lambda_i} > \text{Threshold (e.g., 0.9 or 0.95)}$$
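-A short sketch of this criterion (added here, not from the original notes), using the eigenvalues lam (sorted in decreasing order) from the earlier code:

    threshold = 0.95
    ratios = np.cumsum(lam) / np.sum(lam)             # cumulative fraction of total variance
    K = int(np.searchsorted(ratios, threshold)) + 1   # smallest K meeting the threshold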

• What is the error due to dimensionality reduction?

-We saw above that an original vector x can be reconstructed using its principal components:

$$\hat{x} - \bar{x} = \sum_{i=1}^{K} b_i u_i \qquad \text{or} \qquad \hat{x} = \sum_{i=1}^{K} b_i u_i + \bar{x}$$

-It can be shown that the low-dimensional basis based on principal components minimizes the reconstruction error:

$$e = \|x - \hat{x}\|$$

-It can be shown that the error is equal to:

$$e = \frac{1}{2} \sum_{i=K+1}^{N} \lambda_i$$
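-A sketch (continuing the earlier code, added as an illustration) that measures the reconstruction error for one sample alongside the sum of the discarded eigenvalues on which the expression above is based:

    U_K = U[:, :K]
    x_hat = x_bar + U_K @ (U_K.T @ (x - x_bar))   # reconstruct x with the chosen K
    err = np.linalg.norm(x - x_hat)               # ||x - x_hat|| for this sample
    print(err, lam[K:].sum())                     # per-sample error vs. discarded eigenvalues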

• Standardization

-The principal components are dependent on the units used to measure the original variables as well as on the range of values they assume.

-We should always standardize the data prior to using PCA.

-A common standardization method is to transform all the data to have zero mean and unit standard deviation:

$$\frac{x_i - \mu}{\sigma}$$

($\mu$ and $\sigma$ are the mean and standard deviation of the $x_i$'s)
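-A minimal sketch of this standardization (added, not from the original notes), applied per dimension to the $N \times M$ data array X before running PCA:

    X_std = (X - X.mean(axis=1, keepdims=True)) / X.std(axis=1, keepdims=True)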


• PCA and classification

-PCA is not always an optimal dimensionality-reduction procedure for classification purposes.

• Other problems

-How to handle occlusions?

-How to handle different views of a 3D object?
