Review Session for Final Exam
BIOE 597, Spring 2017, Penn State University, by Xiao Liu

Notes about the Final Exam
• You will be allowed to bring with you a piece of letter-size "cheating" paper!
• You won't need a calculator this time
• 15 single-choice questions, each counting for 1 point
• You need to know/remember the major Matlab functions we used in our lab sessions

Principal Component Analysis

Multicollinearity
• Multicollinearity is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated
• Perfect multicollinearity: the design matrix X is rank deficient → X^T X is not invertible → no LS solution
• High multicollinearity: X^T X has a very small determinant → a small change in the data will result in large changes in the estimates

Principal Component Analysis
• One major goal of PCA is to identify feature predictors (e.g., factors and latent variables) that are uncorrelated with each other
• How can we do this? Rotate the data!
  (figure: 2-D scatter of X1 vs. X2 rotated onto its principal axes)

Diagonalizing the Covariance Matrix
• Covariance matrix of X (columns centered):
  C_X = X^T X / (n − 1)
• We want an orthogonal transformation (rotation) matrix P such that
  Y = XP,  C_Y = Y^T Y / (n − 1) = P^T X^T X P / (n − 1) = P^T C_X P
  where C_Y is diagonal, i.e., all of its off-diagonal elements are zero

Eigendecomposition of the Covariance Matrix
• To find the orthogonal transformation (rotation) matrix P:
  C_Y = P^T C_X P  ⇔  P C_Y = C_X P
• The covariance matrix is symmetric and positive semidefinite, so we can eigendecompose it:
  C_X = E Λ E^T
  E: orthogonal matrix of eigenvectors,  Λ = diag(λ_i)
• Choosing P = E gives C_Y = Λ, which is diagonal

Example (2D)
• C_X = [55.2184 27.6386; 27.6386 27.6554]
• Eigendecomposition: C_X = E Λ E^T with
  E = [0.8504 −0.5262; 0.5262 0.8504],  Λ = [72.3209 0; 0 10.5529]
• After the rotation Y = XE, the covariance becomes diagonal:
  C_Y = [72.3209 0; 0 10.5529]

Scree Plot: Example
• Grades of all 5 exams
  % variance:            61.8667  18.1790  9.4286  7.6280  2.8976
  cumulative % variance: 61.8667  80.0457  89.4744  97.1024  100

How Many PCs to Retain
• There is no single correct answer
• Based on total variance: e.g., keep enough PCs to explain up to 90% of the variance
• Based on the "elbow point" of the scree plot
• Kaiser's rule: retain the components with an eigenvalue larger than 1 (or 0.7). Note: this applies only when we eigendecompose the correlation matrix, where the average eigenvalue is 1.
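For quick reference, here is a minimal Matlab sketch of the eigendecomposition workflow above, using the 2-by-2 covariance matrix from the example (the variable names are ours, not taken from the lab code):

  % Diagonalizing the example covariance matrix by eigendecomposition
  Cx = [55.2184 27.6386; 27.6386 27.6554];

  [E, L] = eig(Cx);                          % E: eigenvectors, L = diag(lambda_i)
  [lambda, idx] = sort(diag(L), 'descend');  % order the PCs by decreasing variance
  E = E(:, idx);

  Cy = E' * Cx * E;                          % rotated covariance: approximately diag(72.32, 10.55)

  pctVar = 100 * lambda / sum(lambda);       % percent variance explained by each PC
  cumVar = cumsum(pctVar);                   % cumulative percentages, as used for a scree plot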
Dimension Reduction
• Another function of PCA is to reduce dimensionality: after the rotation Y = XP, keep only the first k principal components (e.g., the PCs explaining about 60% of the variance) and approximate
  X ≈ Y_k P_k^T
  where Y_k and P_k contain the first k columns of Y and P

Image Compression
• (figure: an image reconstructed from increasing numbers of principal components; total = 200 PCs)
• http://glowingpython.blogspot.com/2011/07/pca-and-image-compression-with-numpy.html

PCA for Classification: Spike Sorting
• (figure: spike waveforms projected into PC space)

PCA on the Correlation Matrix
• Instead of eigendecomposing the covariance matrix, we can also do PCA on the correlation matrix, which standardizes the variance of each variable
• The two give different results, especially when there are large differences in the variances of the individual variables
• In that situation it is usually better to do PCA on the correlation matrix

PCA through Singular Value Decomposition
• An n×p matrix X can be decomposed as
  X = U Σ V^T
  where
  o U is an n×n orthogonal matrix
  o V is a p×p orthogonal matrix
  o Σ is an n×p diagonal matrix: Σ = diag(σ_i)

PCA through Singular Value Decomposition
• The relation between SVD and PCA:
  X^T X = (U Σ V^T)^T (U Σ V^T) = V Σ^T U^T U Σ V^T = V Σ^2 V^T
  o V is the eigenvector matrix of X^T X (so P = V)
  o U is the eigenvector matrix of X X^T
  o Σ^2 = (n − 1) Λ, i.e., σ_i^2 = (n − 1) λ_i
• In terms of the scores: SVD gives XV = UΣ, PCA gives XP = Y, so Y = UΣ
• SVD is often preferred because of its numerical stability, but it can be slow when n >> p.

Independent Component Analysis

Blind Signal Separation
• ICA is a special case of blind signal separation
• Blind signal separation (BSS), also known as blind source separation, is the separation of a set of source signals from a set of mixed signals, without the aid of information (or with very little information) about the source signals or the mixing process.

Mathematical Description
• Each observed signal is a linear mixture of the sources:
  x_j = a_j1 s_1 + a_j2 s_2 + … + a_jn s_n,  for all j = 1, …, m
  X(m×T) = A(m×n) S(n×T),  A: mixing matrix
• Given: the observations X
• Find: the original independent components S
  S(n×T) = W(n×m) X(m×T),  W: unmixing matrix

Identifiability
• The sources s_i are statistically independent
• At most one of the sources s_i is Gaussian
• The number of observed mixtures m must be at least as large as the number of estimated components n: m ≥ n

Compare PCA and ICA
• PCA: Y = XP; finds directions of maximal variance in Gaussian data
• ICA: X = AS; finds directions of maximal independence in non-Gaussian data
• In fact, PCA is a pre-processing step of ICA

ICA Steps: Whitening
• Whitening/sphering, i.e., PCA plus a rescaling of each PC:
  Y = XP,  C_Y = P^T C_X P = diag(λ_i)
  Z = X P_w with P_w = P Λ^(−1/2),  C_Z = P_w^T C_X P_w = I
• Via the SVD: XV = UΣ with Σ = diag(σ_i), so X V Σ^(−1) = U, where Σ^(−1) = diag(1/σ_i)
• P (or V) provides the rotation; Λ^(−1/2) (or Σ^(−1)) provides the scaling
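Below is a minimal Matlab sketch of the whitening step, assuming the rows of X are observations and the columns are variables; the toy data and variable names are invented for illustration, and the last two lines check the SVD relation σ_i^2 = (n − 1) λ_i:

  % Whitening/sphering of centered data X (toy data; rows = observations)
  rng(0);
  X = randn(500, 3) * [2 0 0; 1 1 0; 0 0.5 0.2];   % invented correlated data
  X = X - mean(X);                                 % center the columns
  n = size(X, 1);

  Cx = (X' * X) / (n - 1);                 % covariance matrix
  [P, L] = eig(Cx);                        % rotation P and Lambda = diag(lambda_i)

  Z  = X * P * diag(1 ./ sqrt(diag(L)));   % whitened data, Pw = P * Lambda^(-1/2)
  Cz = (Z' * Z) / (n - 1);                 % approximately the identity matrix

  % Same quantities through the SVD of X: sigma_i^2 = (n-1) * lambda_i
  [U, S, V] = svd(X, 'econ');
  sigma2 = diag(S).^2 / (n - 1);           % matches sort(diag(L), 'descend')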
ICA Steps: Whitening
• Why do we do "whitening/sphering"?
  Z = X P_w,  C_Z = I
  For any orthogonal rotation R:  Y = ZR,  C_Y = R^T C_Z R = R^T R = I
• No matter how we rotate the whitened data, the resulting columns remain uncorrelated

ICA Steps: Rotating
• Maximize the statistical independence of the estimated components
  o Maximize non-Gaussianity
  o Minimize mutual information
• Measures of non-Gaussianity and independence
  o Kurtosis: kurt(y) = E{y^4} − 3(E{y^2})^2
  o Entropy: H(y) = −∫ f(y) log f(y) dy
  o Negentropy: J(y) = H(y_gauss) − H(y)
  o Kullback–Leibler divergence (relative entropy)
  o …

ICA Steps: Rotating
• FastICA, by Aapo Hyvärinen at Helsinki University of Technology
• Available for Matlab, Octave, Python, C, R, Java, and …
  o https://research.ics.aalto.fi/ica/fastica/

Limitations of ICA
• Cannot determine the variances (i.e., energies) or the signs of the independent components
• Cannot determine the order of the independent components (i.e., how many components?)
• Identification problem: the estimated unmixing matrix W can only be a scaled and permuted version of the true W_0

Canonical Correlation Analysis

Introduction
• When we have univariate data there are times when we would like to measure the linear relationship between things
  o Simple linear regression: we have 2 variables and all we are interested in is measuring their linear relationship.
  o Multiple linear regression: we have several independent variables and one dependent variable:
    y_i = β_0 + β_1 x_i1 + β_2 x_i2 + ⋯ + β_p x_ip + ε_i,  ε_i ~ N(0, σ^2)
• What if we have several dependent variables and several independent variables?
  o Multivariate regression
  o Canonical correlation analysis

Introduction
• Canonical correlation analysis (CCA) is a way of measuring the linear relationship between two groups of multidimensional variables.
• It finds two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
• It also determines the corresponding correlation coefficients

Geometric Interpretation

Jargon
• Variables: the two sets of variables X and Y
• Canonical variates: linear combinations of the variables
• Canonical variate pair: two canonical variates, one from each set, showing non-zero correlation
• Canonical correlations: correlations between the canonical variate pairs

CCA Definition
• Two groups of multidimensional variables, X = [x_1, x_2, …, x_p] and Y = [y_1, y_2, …, y_q], where each x_i = [x_i1, …, x_in]^T and y_j = [y_j1, …, y_jn]^T is a column of n observations
• Purpose of CCA: find coefficient vectors a_1 = [a_11, a_21, …, a_p1]^T and b_1 = [b_11, b_21, …, b_q1]^T that maximize the correlation
  ρ = corr(X a_1, Y b_1)
• u_1 = X a_1 and v_1 = Y b_1, i.e., linear combinations of X and Y respectively, are the first pair of canonical variates.
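As a usage sketch (not necessarily the function used in our lab sessions), Matlab's canoncorr from the Statistics and Machine Learning Toolbox computes the canonical coefficients and correlations directly; the toy data below is invented for illustration:

  % First pair of canonical variates via canoncorr (invented toy data)
  rng(1);
  X = randn(100, 3);                               % first variable set, n x p
  Y = X * [1 0; 0 1; 1 1] + 0.5 * randn(100, 2);   % second set, correlated with X, n x q

  [A, B, r] = canoncorr(X, Y);                     % coefficient matrices and canonical correlations

  u1 = (X - mean(X)) * A(:, 1);                    % first canonical variate of X
  v1 = (Y - mean(Y)) * B(:, 1);                    % first canonical variate of Y
  corr(u1, v1)                                     % equals r(1), the maximized correlation

canoncorr works with centered data, which is why the variates are formed from the centered columns here.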
CCA Definition
• The second pair of canonical variates can then be found in the same way, subject to the constraint that it is uncorrelated with the first pair.
• In total, r = min(p, q) pairs of canonical variates can be found by repeating this procedure
• We finally obtain two matrices, A = [a_1, a_2, …, a_r] and B = [b_1, b_2, …, b_r], that transform X and Y into the canonical variates U and V:
  U(n×r) = X(n×p) A(p×r)
  V(n×r) = Y(n×q) B(q×r)

Mathematical Description
• If X and Y are both centered, we can
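For centered X and Y, a standard way to obtain the canonical correlations is from the sample covariance blocks; the sketch below follows that textbook formulation (our own illustration with invented data) and can be checked against canoncorr:

  % Canonical correlations from the covariance blocks of centered X and Y
  rng(2);
  X = randn(200, 3);                      X = X - mean(X);   % centered, n x p
  Y = X(:, 1:2) + 0.5 * randn(200, 2);    Y = Y - mean(Y);   % centered, n x q
  n = size(X, 1);

  Sxx = (X' * X) / (n - 1);   Syy = (Y' * Y) / (n - 1);      % within-set covariances
  Sxy = (X' * Y) / (n - 1);   Syx = Sxy';                    % between-set covariances

  % The squared canonical correlations are the nonzero eigenvalues of
  % Sxx^-1 * Sxy * Syy^-1 * Syx
  ev  = sort(real(eig(Sxx \ (Sxy * (Syy \ Syx)))), 'descend');
  d   = min(size(X, 2), size(Y, 2));
  rho = sqrt(max(ev(1:d), 0))             % compare with [~, ~, r] = canoncorr(X, Y)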