Principal Component Analysis & Independent Component Analysis

Principal Component Analysis & Independent Component Analysis Overview Principal Component Analysis •Purpose •Derivation •Examples Independent Component Analysis •Generative Model •Purpose •Restrictions •Computation •Example Literature Fall 2004 Pattern Recognition for Vision Notation & Basics u Vector U Matrix uvTN, u, v∈ R Dot product written as matrix product ua Product of a row vector with scalar as matrix product, and not au uuu2 ==2 T u squared norm Rules for matrix multiplication: UVW ==()UV W U(VW) ()UV+=WUWV+W ()UV TT= V UT Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) Purpose For a set of samples of a random vector xR∈ N ,discover or reduce the dimensionality and identify meaningful variables. u1 u2 u1 yU= x, U: p×<Np, N Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) PCA by Variance Maximization Find the vector u1,such that the variance of the data along this u direction is maximized: B u A σ 22==EE()uxT ,x 0,u =1 u1 {}11{} σ 2 ==EE()uxTT(xu)uTxxTu, u1 {}111{}1 σ 22> σ σ 2 ==uCTTu, C E xx uuA B u1 11 {} The solution is the eigenvector eC1of with the largest eigenvalue λ1. Ce ==e λλ, eT Ce ⇔σ2 =λ 111111 u1 1 Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) PCA by Variance Maximization For a given pN< , find p orthonormal basis vectors ui such that the variance of the data along these vectors is maximally large, under the constraint of decorrelation: TT T En{}()uxin()ux ==0,uiun0 ≠p The solution are the eigenvectors of C ordered according to decreasing eigenvalues λ : ue11==,u2e2,...,upp=e,λλ1>2... >λp Proof of decorrelation for eigenvectors: EE()exTT()ex ==eTxxTe eTCe =eTe λ =0 {}in i{}ninNinn orthogonal Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) PCA by Mean Square Error Compression For a given pN< , find p orthonormal basis vectors such that the mse between xx and its projection ˆ into the subspace spanned by the p orthonormal basis vectors is mimimum: p mse E xxˆˆ2 ,,x uxT u uT u =−{}ik=∑ ()kkkm= δk,m k=1 uk x xˆ Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) 2 p 2 mse =−E xxˆ =E x−u()xT u {}∑ kk k =1 scalar ppp 2 EExx2(TTuxu)E(xTu)uTu(xTu) =−{} ∑∑kk+ nn∑kk kn==11k=1 pp 2 =−EExx2(TTu)22+E(xu) {} ∑∑kk kk==11 p 2 EExx()T u2 =−{} ∑ k k =1 p T T =−trace()Cu∑ kkCu, C= E{}xx k =1 Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) P TT mse =−trace()Cu∑ kkCu, C=E{}xx k= 1 maximize Solution to minimizing mse is any (orthonormal) basis of the subspace spanned by the p first eigenvectors ee1,..., p of C. p N mse =−trace()C ∑λkk=∑ λ kk==11p+ The mse is the sum of the eigenvalues corresponding to N the discarded eigenvectors eePN+1,..., of C: mse = ∑ λk kp=+1 Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) How to determine the number of principal components p? Linear signal model with unknown number p < N of signals: aa1,1 1, p xA=+sn, A=% Np× aaNN,1 , p Signal si have 0 mean and are uncorrelated, n is white noise: TT2 EE{}ss ==I, {nn }σ n I Cx==EExTTAs()As+EnnT+EAsnT {} { } {} { } =0 TT T T 2 =+AsEE{}sA {nn}=AA+σ n I 2 dd12>>... >dpp>d++1=dp2=... =dN=σ n → cut off when eigenvalues become constants Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) Computing the PCA Given a set of samples {xx1,..., M }of a random vector x calculate mean and covariance. 1 M µx=→∑ i , xx−µ M i=1 1 M T Cx=−∑()iiµ()x−µ M i=1 Compute eigenvecotors of C e.g. with QR algorithm Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) Computing the PCA If the number of samples M is smaller than the dimensionality N of x : xx1,1 1,M TT BC==% ,,BBB:NM×,B:M×N xxNN,1 ,M BBT e = eλ BBT e′′= eλ′ BBT ()Be′′= ()Be λ′→ Reducing complexity from eB==e′′, λλ ON()22 to O(M ) Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) Examples Eigenfaces for face recognition (Turk&Pentland): Training: -Calculate the eigenspace for all faces in the training database -Project each face into the eigenspace → feature reduction Classification: -Project new face into eigenspace -Nearest neighbor in the eigenspace Fall 2004 Pattern Recognition for Vision Principal Component Analysis (PCA) Examples cont. Feature reduction/extraction Original Reconstruction with 20 PC http://www.nist.gov/ Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Generative model Noise free, linear signal model: N xA==s∑aiis i=1 T x = ()xx1,..., N Observed variables T s = ()ss1,..., N latent signals, independent components aa1,1 1,N T Aa==% unknown mixing matrix, ii(aa1, ,..., N,i) aaNN,1 , N Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Task For the linear, noise free signal model, compute As and given the measurements x. 1 X1 S S1 Blind source separation : X3 S 1 separate the three original signals S2 ss12, , and s 3 from their mixtures S 1 X2 xx12, , and x 3. S3 Figure by MIT OCW. Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Restrictions 1.) Statistical independence The signals si must be statistically independent: ps( 12, s,..., sNN) = p1(s1) p2(s2)...p (sN) Independent variables satisfy: Eg{}11(s)g2(s2)...gNN(s) = E{g11(s)}Eg{2(s2)}...E{gNN(s)} 1 for any gi ()sL∈ ∞∞ E{}g ()s g (s )= g ()s g (s )p(s ,s )ds ds 11 2 2 ∫∫−∞ −∞ 11 2 2 12 1 2 ∞∞ ==g()s ps()ds g(s)ps()ds E{}g()s E{g(s)} ∫∫−∞ 11 1 1−∞ 2 2 2 2 11 2 2 Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Restrictions Statistical independence cont. Eg{}11(s)g2(s2)...gNN(s) = E{g11(s)}Eg{2(s2)}...E{gNN(s)} Independence includes uncorrelatedness: Es{}()ii−−µµ(snn)=E{()sii−µ}E{(sn−µn)}=0 Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Restrictions 2.) Nongaussian components The components si must have a nongaussian distribution otherwise there is no unique solution. Example: σ 2 given A and two gaussian signals: 22 σ1 1 ss12 ps(,12s)=−exp( 22−) 22πσ12σ σ1 2σ 2 1 generate new signals 1 22 1/σ1 0 1 ss12′ ′ ss′′=⇒ps( 12, s′) =exp(−)exp(−) 01/σ 22π 2 2 Scaling matrix S Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Restrictions Nongaussian components cont. under rotation the components remain independent: 22 cosϕϕsin 1 ss12′′′′ ss′′ ==′, ps( 12′′, s′′ ) exp(−)exp(−) −sinϕϕcos 22π 2 Rotation matrix R combine whitening and rotation BR= S: xA==sAB−1s′′ AB−1 is also a solution to the ICA problem. Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Restrictions 3.) Mixing matrix must be invertible The number of independent components is equal to the number of observerd variables. Which means that there are no redundant mixtures. In case mixing matrix is not invertible apply PCA on measurements first to remove redundancy. Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Ambiguities 1.) Scale NN1 xa==∑∑iiss()ai(αii) ii==11αi 2 Reduce ambiguity by enforcing Es{}i = 1 2.) Order We cannot determine an order of the independant components Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Computing ICA a) Minimizing mutual information: sAˆ = ˆ −1x N ˆˆˆ Mutual information: IH(ss) =−∑ (si ) H( ) i=1 H is the differential etnropy: Hp()ssˆˆ=− ()log(p()sˆ)dsˆ ∫ 2 Is is always nonnegative and 0 only if the ˆi are independent. Iteratively modify Asˆ −1 such that I(ˆ) is minimized. Fall 2004 Pattern Recognition for Vision Independent Component Analysis (ICA) Computing ICA cont. b) Maximizing Nongaussianity sA= −1x introduce yy and bb: ==TTxbAs=qTs From central limit theorem: T ys= qs is more gaussian than any of the i and becomes least gaussian if ys= i . Iteratively modify bT such that the "gaussianity" of y is minimized. When a local minimum is reached, bAT is a row vector of −1. Fall 2004 Pattern Recognition for Vision PCA Applied to Faces x1,1 x1,M … xN ,1 xNM, x1 xM Each pixel is a feature, each face image a point in the feature space. Dimension of feature vector is given by the size of the image. x3 u2 xM ui are the eigenvectors which can be x1 represented as pixel images in the original u 1 x2 cooridinate system xx1... N x1 Fall 2004 Pattern Recognition for Vision ICA Applied to Faces x1,1 xN ,1 x1 xN S S 1 … 1 x1,M xNM, Now each image corresponds to a particular observed variable measured over time (M samples). N is the number of images. N xA==s∑aiis i=1 T x = ()xx1,..., N Observed variables T s = ()ss1,..., N latent signals, independent components Fall 2004 Pattern Recognition for Vision PCA and ICA for Faces Features for face recognition Image removed due to copyright considerations. See Figure 1 in: Baek, Kyungim et. al. "PCA vs. ICA: A comparison on the FERET data set." International Conference of Computer Vision, Pattern Recognition, and Image Processing, in conjunction with the 6th JCIS. Durham, NC, March 8-14 2002, June 2001. Fall 2004 Pattern Recognition for Vision.

Principal Component Analysis & Independent Component Analysis

Distribution of Mutual Information

Lecture 3: Entropy, Relative Entropy, and Mutual Information 1 Notation 2

A Statistical Framework for Neuroimaging Data Analysis Based on Mutual Information Estimated Via a Gaussian Copula

Information Theory 1 Entropy 2 Mutual Information

Fundamentals of Principal Component Analysis (PCA), Independent Component Analysis (ICA), and Independent Vector Analysis (IVA)

On the Estimation of Mutual Information

Weighted Mutual Information for Aggregated Kernel Clustering

Unsupervised Discovery of Temporal Structure in Noisy Data with Dynamical Components Analysis

Contents SYSTEMS GROUP

Lecture 2 Entropy and Mutual Information Instructor: Yichen Wang Ph.D./Associate Professor

Estimating the Mutual Information Between Two Discrete, Asymmetric Variables with Limited Samples

Relation Between the Kantorovich-Wasserstein Metric and the Kullback-Leibler Divergence