Review Session for Final Exam
BIOE 597, Spring 2017, Penn State University
By Xiao Liu

Notes about the Final Exam
• You will be allowed to bring one piece of letter-size "cheat" paper!
• You won't need a calculator this time
• 15 single-choice questions, each counting for 1 point
• You need to know/remember the major Matlab functions we used in our lab sessions

Principal Component Analysis

Multicollinearity
• Multicollinearity is a phenomenon in which two or more predictor variables in a multiple regression model are highly correlated
• Perfect multicollinearity: the design matrix X is rank deficient → X'X is not invertible → no least-squares solution
• High multicollinearity: X'X has a very small determinant → a small change in the data results in large changes in the estimates

Principal Component Analysis
• One major goal of PCA is to identify feature predictors (e.g., factors and latent variables) that are uncorrelated with each other
• How can we do this? Rotate the data!

Diagonalizing the Covariance Matrix
• Covariance matrix of X (columns centered):
  S_X = (1/(n-1)) X'X
• To find an orthogonal transformation (rotation) matrix P:
  Y = XP
  S_Y = (1/(n-1)) Y'Y = (1/(n-1)) P'X'XP = P' S_X P
  where S_Y is diagonal (all off-diagonal elements equal to zero)

Eigendecomposition of the Covariance Matrix
• To find the orthogonal rotation P:
  S_Y = P' S_X P  →  P S_Y = S_X P, so the columns of P must be eigenvectors of S_X
• The covariance matrix is positive definite, so we can eigendecompose it:
  S_X = E Λ E'
  E: orthogonal matrix of eigenvectors
  Λ = diag(λ_i): diagonal matrix of eigenvalues
• Setting P = E gives S_Y = Λ

Example (2D)
• Covariance matrix:
  S_X = [55.2184 27.6386; 27.6386 27.6554]
• Eigendecomposition S_X = E Λ E':
  [55.2184 27.6386; 27.6386 27.6554]
  = [0.8504 -0.5262; 0.5262 0.8504] * [72.3209 0; 0 10.5529] * [0.8504 -0.5262; 0.5262 0.8504]'
• After rotation:
  S_Y = [72.3209 0; 0 10.5529]

Scree Plot: Example
• Grades of all 5 exams
  % variance:            61.8667  18.1790  9.4286  7.6280  2.8976
  Cumulative % variance: 61.8667  80.0457  89.4744  97.1024  100

How Many PCs to Retain
• There is no single correct answer
• Based on total variance: e.g., keep PCs up to 90% of the variance
• Based on the "elbow point" of the scree plot
• Kaiser's rule: retain components with an eigenvalue larger than 1 (or 0.7). Note: this applies only when we eigendecompose the correlation matrix, where the average eigenvalue is 1.
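As a lab refresher, here is a minimal Matlab sketch of PCA by eigendecomposition of the covariance matrix. The toy data and variable names are my own (not the exam-grade dataset from the slides); it just reproduces the diagonalization and percent-variance steps described above.

```matlab
% Minimal PCA-by-eigendecomposition sketch (toy data, not the exam-grade example).
rng(1);
X  = randn(100, 5) * randn(5, 5);     % 100 observations, 5 correlated variables
Xc = X - mean(X, 1);                  % center each column

Sx = cov(Xc);                         % covariance matrix, (1/(n-1)) * (Xc' * Xc)
[E, D] = eig(Sx);                     % eigenvectors E, eigenvalues on diag(D)
[lambda, idx] = sort(diag(D), 'descend');
E = E(:, idx);                        % order PCs by decreasing variance

Y  = Xc * E;                          % rotated, uncorrelated scores
Sy = cov(Y);                          % approximately diag(lambda)

pctVar = 100 * lambda / sum(lambda);  % percent variance explained per PC
cumPct = cumsum(pctVar);              % cumulative percent variance (as in the scree-plot slide)
disp([pctVar, cumPct]);
```

If the Statistics and Machine Learning Toolbox is available, the built-in pca function wraps these same steps and returns the coefficients, scores, and eigenvalues directly.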
Dimension Reduction
• Another use of PCA is to reduce dimensionality: keep only the first k columns of P
  Y_(n×k) = X_(n×p) P_(p×k)
  X ≈ Y_(n×k) P_(p×k)'  (reconstruction from the first k PCs, e.g., retaining ~60% of the variance)

Image Compression
• Example: reconstructing an image from subsets of its 200 total PCs
  http://glowingpython.blogspot.com/2011/07/pca-and-image-compression-with-numpy.html

PCA for Classification: Spike Sorting
• [Figure: spike waveforms projected into PC space, where clusters correspond to different units]

PCA on the Correlation Matrix
• Instead of eigendecomposing the covariance matrix, we can also eigendecompose the correlation matrix, which standardizes the variance of each variable
• This gives different results, especially when there are large differences in the variances of the individual variables
• In that situation it is usually better to do PCA on the correlation matrix

PCA through Singular Value Decomposition
• An n×p matrix X can be decomposed as
  X = U Σ V'
  where
  o U is an n×n orthogonal matrix
  o V is a p×p orthogonal matrix
  o Σ is an n×p diagonal matrix: Σ = diag(σ_i)
• The relation between SVD and PCA:
  X'X = (UΣV')' UΣV' = V Σ'U'U Σ V' = V Σ² V'
  o V is the eigenvector matrix of X'X
  o U is the eigenvector matrix of XX'
  o Σ² = (n-1)Λ, i.e., σ_i² = (n-1)λ_i
• SVD: XV = UΣ    PCA: XP = Y
• SVD is often preferred because of its numerical stability, but it can be slow when n >> p

Independent Component Analysis

Blind Signal Separation
• ICA is a special case of blind signal separation
• Blind signal separation (BSS), also known as blind source separation, is the separation of a set of source signals from a set of mixed signals, without the aid of information (or with very little information) about the source signals or the mixing process

Mathematical Description
• x_j = a_j1 s_1 + a_j2 s_2 + … + a_jn s_n, for all j = 1, …, m
  X_(m×T) = A_(m×n) S_(n×T)
• Given: the observations X (A is the mixing matrix)
• Find: the original independent components S
  S_(n×T) = W_(n×m) X_(m×T), where W is the unmixing matrix

Identifiability
• The sources s_i are statistically independent
• At most one of the sources s_i is Gaussian
• The number of observed mixtures m must be at least as large as the number of estimated components n: m ≥ n

Compare PCA and ICA
• PCA (Y = XP): finds directions of maximal variance in Gaussian data
• ICA (X = AS): finds directions of maximal independence in non-Gaussian data
• In fact, PCA is a pre-processing step of ICA

ICA Steps: Whitening
• Whitening/sphering, i.e., PCA with rescaling:
  Y = XP,  S_Y = P' S_X P = diag(λ_i)
  Z = X P_w with P_w = P Λ^(-1/2),  S_Z = Λ^(-1/2) P' S_X P Λ^(-1/2) = I
• Via SVD: XV = UΣ with Σ = diag(σ_i), so X V Σ^(-1) = U with Σ^(-1) = diag(1/σ_i)
  (V provides the rotation, Σ^(-1) the scaling)
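A short Matlab sketch of PCA through SVD, again on made-up data rather than anything from class: it checks the σ_i² = (n-1)λ_i relation and performs the kind of rank-k reconstruction used in the dimension-reduction and image-compression examples.

```matlab
% PCA via SVD vs. eigendecomposition, plus rank-k reconstruction (toy data).
rng(2);
n = 200; p = 8;
X  = randn(n, 4) * randn(4, p);       % correlated, roughly low-rank columns
Xc = X - mean(X, 1);                  % center

% Eigendecomposition of the covariance matrix
[E, D]  = eig(cov(Xc));
lambda  = sort(diag(D), 'descend');

% SVD of the centered data matrix
[U, S, V] = svd(Xc, 'econ');
sigma     = diag(S);

% Check sigma_i^2 = (n-1) * lambda_i
disp([sigma.^2 / (n - 1), lambda]);

% Dimension reduction: keep the first k PCs and reconstruct
k  = 2;
Yk = Xc * V(:, 1:k);                  % scores on the first k PCs
Xk = Yk * V(:, 1:k)';                 % rank-k approximation of Xc
reconErr = norm(Xc - Xk, 'fro') / norm(Xc, 'fro');
fprintf('relative reconstruction error with k = %d: %.3f\n', k, reconErr);
```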
ICA Steps: Whitening
• Why do we do "whitening/sphering"?
  Z = X P_w,  S_Z = I
  For any orthogonal rotation R:  Y = ZR,  S_Y = R' S_Z R = R' I R = I
• No matter how we rotate the whitened data, the resulting columns remain "uncorrelated"

ICA Steps: Rotating
• Maximize the statistical independence of the estimated components
  o Maximize non-Gaussianity
  o Minimize mutual information
• Measures of non-Gaussianity and independence
  o Kurtosis: kurt(y) = E{y^4} - 3(E{y^2})^2
  o Entropy: H(y) = -∫ f(y) log f(y) dy
  o Negentropy: J(y) = H(y_gauss) - H(y)
  o Kullback–Leibler divergence (relative entropy)
  o …
• FastICA by Aapo Hyvärinen at Helsinki University of Technology
• Available for Matlab, Octave, Python, C, R, Java, and more
  o https://research.ics.aalto.fi/ica/fastica/

Limitations of ICA
• Cannot determine the variances (i.e., energies) or signs of the independent components
• Cannot determine the order of the independent components (and, relatedly, how many components to extract)
• Identification problem: the estimated unmixing matrix W is a scaled and permuted version of the true W_0
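The following Matlab sketch is only a toy illustration of the "whiten, then rotate to maximize non-Gaussianity" idea: it uses a brute-force angle search rather than the FastICA algorithm, and the sources, mixing matrix, and helper names are assumptions of mine, not from the slides.

```matlab
% Toy illustration of "whiten, then rotate to maximize non-Gaussianity".
% This brute-force angle search is NOT FastICA; it just shows the idea on 2 sources.
rng(3);
T  = 5000;
s1 = sign(randn(1, T)) .* rand(1, T);       % non-Gaussian source 1
s2 = rand(1, T) - 0.5;                      % non-Gaussian (uniform) source 2
S  = [s1; s2];
A  = [0.8 0.3; 0.4 0.9];                    % mixing matrix
X  = A * S;                                 % observed mixtures (2 x T)

% Whitening via eigendecomposition of the covariance matrix
Xc = X - mean(X, 2);
[E, D] = eig(cov(Xc'));
Z = diag(1 ./ sqrt(diag(D))) * E' * Xc;     % whitened data: cov(Z') is approximately I

% Rotate the whitened data and pick the angle with maximal |excess kurtosis|
kurt   = @(y) mean(y.^4) - 3 * mean(y.^2).^2;   % kurt(y) = E{y^4} - 3(E{y^2})^2
angles = linspace(0, pi/2, 180);
score  = zeros(size(angles));
for i = 1:numel(angles)
    R = [cos(angles(i)) -sin(angles(i)); sin(angles(i)) cos(angles(i))];
    Y = R * Z;
    score(i) = abs(kurt(Y(1, :))) + abs(kurt(Y(2, :)));
end
[~, best] = max(score);
R    = [cos(angles(best)) -sin(angles(best)); sin(angles(best)) cos(angles(best))];
Shat = R * Z;                               % estimated sources (up to scale, sign, and order)
```

In practice you would use the FastICA package linked above rather than a grid search; note that the scale, sign, and order ambiguities in the last line are exactly the limitations listed on the slide.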
Canonical Correlation Analysis

Introduction
• With univariate data, there are times when we would like to measure the linear relationship between things
  o Simple linear regression: we have 2 variables and all we are interested in is measuring their linear relationship
  o Multiple linear regression: we have several independent variables and one dependent variable
    y = β_0 + β_1 x_1 + β_2 x_2 + … + β_p x_p + ε,  ε ~ N(0, σ²)
• What if we have several dependent variables and several independent variables?
  o Multivariate regression
  o Canonical correlation analysis

Introduction
• Canonical correlation analysis (CCA) is a way of measuring the linear relationship between two groups of multidimensional variables
• It finds two sets of basis vectors such that the correlation between the projections of the variables onto these basis vectors is maximized
• It also determines the corresponding correlation coefficients

Geometric Interpretation
• [Figure: geometric interpretation of CCA]

Jargon
• Variables: the two sets of variables X and Y
• Canonical variates: linear combinations of the variables
• Canonical variate pair: two canonical variates, one from each set, showing non-zero correlation
• Canonical correlations: the correlations between canonical variate pairs

CCA Definition
• Two groups of multidimensional variables X = [x_1, x_2, …, x_p] and Y = [y_1, y_2, …, y_q], where
  x_i = [x_i1, …, x_in]',  y_j = [y_j1, …, y_jn]'
• Purpose of CCA: find coefficient vectors a_1 = [a_11, a_21, …, a_p1]' and b_1 = [b_11, b_21, …, b_q1]' that maximize the correlation
  ρ = corr(X a_1, Y b_1)
• U_1 = X a_1 and V_1 = Y b_1, i.e., linear combinations of X and Y respectively, are the first pair of canonical variates

CCA Definition
• The second pair of canonical variates is then found in the same way, subject to the constraint that it is uncorrelated with the first pair
• d = min(p, q) pairs of canonical variates can be found by repeating this procedure
• We finally obtain two matrices A = [a_1, a_2, …, a_d] and B = [b_1, b_2, …, b_d] that transform X and Y into the canonical variates U and V:
  U_(n×d) = X_(n×p) A_(p×d)
  V_(n×d) = Y_(n×q) B_(q×d)

Mathematical Description
• If X and Y are both centered, we can …
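For the lab, CCA can be run with canoncorr from Matlab's Statistics and Machine Learning Toolbox. Below is a minimal sketch on synthetic data of my own (not the class example) showing the coefficient matrices A and B, the canonical correlations, and the canonical variate pairs U and V.

```matlab
% Minimal CCA sketch using canoncorr (Statistics and Machine Learning Toolbox).
% The toy data below is my own; it is not the example used in class.
rng(4);
n = 300;
latent = randn(n, 2);                       % shared latent signals linking the two sets
X = [latent, randn(n, 3)] * randn(5, 4);    % 4 X-variables, partly driven by the latents
Y = [latent, randn(n, 2)] * randn(4, 3);    % 3 Y-variables, partly driven by the latents

[A, B, r, U, V] = canoncorr(X, Y);          % A, B: coefficient matrices; r: canonical correlations

% U and V are the canonical variates (canoncorr centers X and Y internally)
disp(r);                                    % d = min(p, q) = 3 canonical correlations
disp(corr(U(:, 1), V(:, 1)));               % matches r(1); later pairs are uncorrelated with earlier ones
```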
