Eigenfaces and Fisherfaces: Dimension Reduction and Component Analysis

Jason Corso

University of Michigan

EECS 598 Fall 2014 Foundations of

Dimensionality: High Dimensions Often Test Our Intuitions

Consider a simple arrangement: you have a sphere of radius r = 1 in a space of D dimensions. We want to compute the fraction of the volume of the sphere that lies between radius r = 1 - \epsilon and r = 1.

Noting that the volume of the sphere will scale with r^D, we have

    V_D(r) = K_D r^D    (1)

where K_D is some constant (depending only on D).

The fraction of the volume lying in the shell is therefore

    \frac{V_D(1) - V_D(1 - \epsilon)}{V_D(1)} = 1 - (1 - \epsilon)^D    (2)

[Figure: volume fraction versus \epsilon for D = 1, 2, 5, and 20; as D grows, nearly all of the volume concentrates in a thin shell near the surface.]
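As a quick numerical check of Eq. (2), here is a minimal sketch; the values of \epsilon and D are arbitrary illustrations, not from the slides:

```python
# Evaluate Eq. (2): the fraction of a D-dimensional sphere's volume
# lying within a shell of width eps at the surface.
eps = 0.1
for D in (1, 2, 5, 20, 100):
    frac = 1 - (1 - eps) ** D
    print(f"D = {D:3d}: shell fraction = {frac:.3f}")
```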

Dimensionality: Let's Build Some More Intuition (Example from Bishop PRML)

Dataset: measurements taken from a pipeline containing a mixture of oil.
Three classes are present (different geometrical configurations): homogeneous, annular, and laminar.
Each data point is a 12-dimensional input vector consisting of measurements taken with gamma ray densitometers, which measure the attenuation of gamma rays passing along narrow beams through the pipe.

[Figure: scatter plot of features x6 and x7 for the three classes.]

100 data points of features x6 and x7 are shown in the scatter plot. Goal: classify the new data point at the 'x'.

[Figure: scatter plot of x6 versus x7 with the query point marked 'x'.]

Observations we can make:
The cross is surrounded by many red points and some green points.
Blue points are quite far from the cross.
Nearest-Neighbor Intuition: the query point should be determined more strongly by nearby points from the training set and less strongly by more distant points.

One simple way of doing it is:
We can divide the feature space up into regular cells.
For each cell, we associate the class that occurs most frequently in that cell (in our training data).
Then, for a query point, we determine which cell it falls into and assign the label associated with that cell.

What problems may exist with this approach?

The problem we are most interested in now is the one that becomes apparent when we add more variables into the mix, corresponding to problems of higher dimensionality. In this case, the number of cells grows exponentially with the dimensionality of the space. Hence, we would need an exponentially large training data set to ensure all cells are filled.

[Figure: regular grids over the feature space for D = 1 (x1), D = 2 (x1, x2), and D = 3 (x1, x2, x3), illustrating the exponential growth in the number of cells.]

Dimensionality: Curse of Dimensionality

This severe difficulty when working in high dimensions was coined the curse of dimensionality by Bellman in 1961. The idea is that the volume of a space increases exponentially with the dimensionality of the space.
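To make the cell-counting argument concrete, a minimal sketch (the number of bins per axis is an arbitrary choice for illustration):

```python
# With `bins` cells per axis, a regular grid has bins**D cells in D dimensions,
# and we would like training data in every one of those cells.
bins = 10
for D in (1, 2, 3, 10, 20):
    print(f"D = {D:2d}: {bins ** D:.3e} cells")
```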

Dimensionality: Dimensionality and Classification Error?
(Some parts taken from G. V. Trunk, TPAMI, Vol. 1, No. 3, pp. 306-307, 1979.)

How does the probability of error vary as we add more features, in theory? Consider the following two-class problem:

The prior probabilities are known and equal: P(\omega_1) = P(\omega_2) = 1/2.
The class-conditional densities are Gaussian with unit covariance:

    p(x \mid \omega_1) \sim N(\mu_1, I)    (3)
    p(x \mid \omega_2) \sim N(\mu_2, I)    (4)

where \mu_1 = \mu, \mu_2 = -\mu, and \mu is an n-vector whose ith component is (1/i)^{1/2}.

The corresponding Bayesian decision rule is:

    decide \omega_1 if x^T \mu > 0    (5)

The probability of error is

    P(error) = \frac{1}{\sqrt{2\pi}} \int_{r/2}^{\infty} \exp(-z^2/2)\, dz    (6)

where

    r^2 = \|\mu_1 - \mu_2\|^2 = 4 \sum_{i=1}^{n} (1/i).    (7)

Let's take this integral for granted... (For more detail, you can look at DHS Problem 31 in Chapter 2 and read Section 2.7.)

What can we say about this result as more features are added?

[Figure 2.17 from Duda, Hart, and Stork, Pattern Classification: components of the probability of error for equal priors and a (nonoptimal) decision point x*; moving the decision boundary to the point of equal posteriors x_B eliminates the reducible error and yields the Bayes error rate.]
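A minimal sketch evaluating Eqs. (6)-(7) numerically, using the identity Q(a) = erfc(a/\sqrt{2})/2; the particular values of n are arbitrary:

```python
import math

def p_error(n):
    # Eq. (7): r^2 = 4 * sum_{i=1}^n 1/i;  Eq. (6): P(error) = Q(r/2)
    r = math.sqrt(4 * sum(1.0 / i for i in range(1, n + 1)))
    return 0.5 * math.erfc((r / 2) / math.sqrt(2))

for n in (1, 10, 100, 1000):
    print(f"n = {n:4d}: P(error) = {p_error(n):.4f}")
```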

The probability of error approaches 0 as n approaches infinity because \sum 1/i is a divergent series.
More intuitively, each additional feature is going to decrease the probability of error as long as its class means differ.
In the general case of features with differing means but the same per-feature variance, we have

    r^2 = \sum_{i=1}^{d} \left( \frac{\mu_{i1} - \mu_{i2}}{\sigma_i} \right)^2    (8)

Certainly, we prefer features that have big differences in the mean relative to their variance.
We need to note that if the probabilistic structure of the problem is completely known, then adding new features cannot increase the Bayes risk (at worst it leaves it unchanged).

So, adding dimensions is good... in theory. But, in practice, performance seems not to obey this theory.

[Figure 3.3 from Duda, Hart, and Stork, Pattern Classification: two three-dimensional distributions have nonoverlapping densities, so in three dimensions the Bayes error vanishes; when projected to a subspace (the two-dimensional x1-x2 subspace or the one-dimensional x1 subspace) there can be greater overlap of the projected distributions, and hence greater Bayes error.]

Consider again the two-class problem, but this time with unknown means \mu_1 and \mu_2. Instead, we have m labeled samples x_1, ..., x_m. Then, the best estimate of \mu for each class is the sample mean (recall the parameter estimation lecture):

    \mu = \frac{1}{m} \sum_{i=1}^{m} x_i    (9)

where the x_i come from \omega_2. The covariance of this estimate is I/m.

The probability of error is given by

    P(error) = P(x^T \mu \ge 0 \mid \omega_2) = \frac{1}{\sqrt{2\pi}} \int_{\gamma_n}^{\infty} \exp(-z^2/2)\, dz    (10)

because the statistic has a Gaussian form as n approaches infinity, where

    \gamma_n = E(z) / [\mathrm{var}(z)]^{1/2}    (11)

    E(z) = \sum_{i=1}^{n} (1/i)    (12)

    \mathrm{var}(z) = \left( 1 + \frac{1}{m} \right) \sum_{i=1}^{n} (1/i) + \frac{n}{m}    (13)

The key is that we can show

    \lim_{n \to \infty} \gamma_n = 0    (14)

and thus the probability of error approaches one-half as the dimensionality of the problem becomes very high.

Trunk performed an experiment to investigate the convergence rate of the probability of error to one-half. He simulated the problem for dimensionality 1 to 1000 and ran 500 repetitions for each dimension. We see an increase in performance initially and then a decrease (as the dimensionality of the problem grows larger than the number of training samples).
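A minimal simulation sketch in the spirit of Trunk's experiment; the training-set size m, the dimensions tested, and the trial count are arbitrary choices here, not the paper's exact protocol:

```python
import numpy as np

rng = np.random.default_rng(0)

def trunk_error_rate(n_dims, m=25, n_trials=500):
    """Empirical error rate of the plug-in rule sign(x^T mu_hat) for one dimensionality."""
    mu = 1.0 / np.sqrt(np.arange(1, n_dims + 1))    # i-th component of mu is (1/i)^(1/2)
    errors = 0
    for _ in range(n_trials):
        train = rng.normal(mu, 1.0, size=(m, n_dims))   # m labeled samples from omega_1
        mu_hat = train.mean(axis=0)
        x = rng.normal(-mu, 1.0, size=n_dims)           # one test sample from omega_2
        errors += (x @ mu_hat >= 0)                     # error if it lands on omega_1's side
    return errors / n_trials

for d in (1, 10, 100, 1000):
    print(f"dim = {d:5d}: error rate ~ {trunk_error_rate(d):.3f}")
```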

Dimension Reduction: Motivation for Dimension Reduction

The discussion on the curse of dimensionality should be enough!
Even though our problem may have a high dimension, the data will often be confined to a much lower effective dimension in most real-world problems.
Computational complexity is another important point: generally, the higher the dimension, the longer the training stage will be (and potentially the measurement and classification stages).
We seek an understanding of the underlying data in order to:
remove or reduce the influence of noisy or irrelevant features that would otherwise interfere with the classifier;
identify a small set of features (perhaps a transformation thereof) during data exploration.

Dimension Reduction: Principal Component Analysis, Version 0

We have n samples {x_1, x_2, ..., x_n}.
How can we best represent the n samples by a single vector x_0?
First, we need a distance function on the sample space. Let's use the Euclidean distance and the sum of squared distances criterion:

    J_0(x_0) = \sum_{k=1}^{n} \| x_0 - x_k \|^2    (15)

Then, we seek the value of x_0 that minimizes J_0.

We can show that the minimizer is indeed the sample mean:

    m = \frac{1}{n} \sum_{k=1}^{n} x_k    (16)

We can verify this by adding m - m into J_0:

    J_0(x_0) = \sum_{k=1}^{n} \| (x_0 - m) - (x_k - m) \|^2    (17)
             = \sum_{k=1}^{n} \| x_0 - m \|^2 + \sum_{k=1}^{n} \| x_k - m \|^2    (18)

(the cross term vanishes because \sum_k (x_k - m) = 0). Thus, J_0 is minimized when x_0 = m. Note that the second term is independent of x_0.

So, the sample mean is an initial dimension-reduced representation of the data (a zero-dimensional one). It is simple, but it does not reveal any of the variability in the data. Let's try to obtain a one-dimensional representation: i.e., let's project the data onto a line running through the sample mean.

Dimension Reduction: Principal Component Analysis, Version 1

Let e be a unit vector in the direction of the line. The standard equation for a line is then

    x = m + a e    (19)

where the scalar a \in \mathbb{R} governs which point along the line we are at, and hence corresponds to the distance of any point x from the mean m.

Represent point x_k by m + a_k e. We can find an optimal set of coefficients by again minimizing the squared-error criterion:

    J_1(a_1, \ldots, a_n, e) = \sum_{k=1}^{n} \| (m + a_k e) - x_k \|^2    (20)
                             = \sum_{k=1}^{n} a_k^2 \|e\|^2 - 2 \sum_{k=1}^{n} a_k e^T (x_k - m) + \sum_{k=1}^{n} \| x_k - m \|^2    (21)

Differentiating with respect to a_k and equating to 0 yields

    a_k = e^T (x_k - m)    (22)

This indicates that the best value for a_k is the projection of the point x_k onto the line e that passes through m.

How do we find the best direction for that line?

What if we substitute the expression we computed for the best a_k directly into the J_1 criterion?

    J_1(e) = \sum_{k=1}^{n} a_k^2 - 2 \sum_{k=1}^{n} a_k^2 + \sum_{k=1}^{n} \| x_k - m \|^2    (23)
           = - \sum_{k=1}^{n} \left[ e^T (x_k - m) \right]^2 + \sum_{k=1}^{n} \| x_k - m \|^2    (24)
           = - \sum_{k=1}^{n} e^T (x_k - m)(x_k - m)^T e + \sum_{k=1}^{n} \| x_k - m \|^2    (25)

Dimension Reduction: Principal Component Analysis, Version 1 (The Scatter Matrix)

Define the scatter matrix S as

    S = \sum_{k=1}^{n} (x_k - m)(x_k - m)^T    (26)

This should be familiar: it is a multiple of the sample covariance matrix. Putting it in:

    J_1(e) = -e^T S e + \sum_{k=1}^{n} \| x_k - m \|^2    (27)

The e that maximizes e^T S e will minimize J_1.

Dimension Reduction: Principal Component Analysis, Version 1 (Solving for e)

We use Lagrange multipliers to maximize e^T S e subject to the constraint that \|e\| = 1:

    u = e^T S e - \lambda (e^T e - 1)    (28)

Differentiating with respect to e and setting the result equal to 0:

    \frac{\partial u}{\partial e} = 2 S e - 2 \lambda e    (29)

    S e = \lambda e    (30)

Does this form look familiar?

Dimension Reduction: Principal Component Analysis, Version 1 (Eigenvectors of S)

Se = λe

This is an eigenproblem. Hence, it follows that the best one-dimensional estimate (in a least-squares sense) for the data is the eigenvector corresponding to the largest eigenvalue of S. So, we will project the data onto the largest eigenvector of S and translate it to pass through the mean.
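A minimal NumPy sketch of this one-dimensional case; the synthetic data and variable names are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 2)) @ np.array([[3.0, 0.0], [0.0, 0.5]])  # one dominant direction

m = X.mean(axis=0)
S = (X - m).T @ (X - m)            # scatter matrix, Eq. (26)
evals, evecs = np.linalg.eigh(S)   # S is symmetric, so eigh applies
e = evecs[:, np.argmax(evals)]     # eigenvector with the largest eigenvalue

a = (X - m) @ e                    # coefficients a_k = e^T (x_k - m), Eq. (22)
X_1d = m + np.outer(a, e)          # one-dimensional representation m + a_k e
```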

Dimension Reduction: Principal Component Analysis

We're already done... basically.

This idea readily extends to multiple dimensions, say d' < d dimensions. We replace the earlier equation of the line with

    x = m + \sum_{i=1}^{d'} a_i e_i    (31)

And we have a new criterion function:

    J_{d'} = \sum_{k=1}^{n} \left\| \left( m + \sum_{i=1}^{d'} a_{ki} e_i \right) - x_k \right\|^2    (32)

J_{d'} is minimized when the vectors e_1, ..., e_{d'} are the d' eigenvectors of the scatter matrix having the largest eigenvalues. These vectors are orthogonal. They form a natural set of basis vectors for representing any feature x.

The coefficients a_i are called the principal components. Visualize the basis vectors as the principal axes of a hyperellipsoid surrounding the data (a cloud of points). Principal component analysis reduces the dimension of the data by restricting attention to those directions of maximum variation, or scatter.
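A minimal sketch of the d'-dimensional case, generalizing the one-dimensional snippet above; the function name and data are illustrative assumptions:

```python
import numpy as np

def pca_project(X, d_prime):
    """Project rows of X onto the d' leading eigenvectors of the scatter matrix."""
    m = X.mean(axis=0)
    S = (X - m).T @ (X - m)                 # scatter matrix
    evals, evecs = np.linalg.eigh(S)        # eigenvalues in ascending order
    E = evecs[:, ::-1][:, :d_prime]         # top-d' eigenvectors as columns
    A = (X - m) @ E                         # principal components a_{ki}
    return A, E, m

X = np.random.default_rng(2).normal(size=(100, 5))
A, E, m = pca_project(X, d_prime=2)
X_hat = m + A @ E.T                         # reconstruction from the reduced representation
```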

Dimension Reduction: Fisher Linear Discriminant

Description vs. Discrimination.
PCA is likely going to be useful for representing data.
But there is no reason to assume that it would be good for discriminating between two classes of data; think of 'Q' versus 'O', where the discriminating tail accounts for very little of the total variation.
Discriminant analysis seeks directions that are efficient for discrimination.

Dimension Reduction: Fisher Linear Discriminant (Let's Formalize the Situation)

Suppose we have a set of n d-dimensional samples with n_1 in set D_1, and similarly for set D_2:

    D_i = \{ x_1, \ldots, x_{n_i} \}, \quad i \in \{1, 2\}    (33)

We can form a linear combination of the components of a sample x:

    y = w^T x    (34)

which yields a corresponding set of n samples y_1, \ldots, y_n split into subsets Y_1 and Y_2.

Dimension Reduction: Fisher Linear Discriminant (Geometrically, This is a Projection)

If we constrain the norm of w to be 1 (i.e., \|w\| = 1), then we can conceptualize each y_i as the projection of the corresponding x_i onto a line in the direction of w.

[Figure 3.5 from Duda, Hart, and Stork, Pattern Classification: projection of the same set of samples onto two different lines in the directions marked w; the figure on the right shows greater separation between the red and black projected points.]

Does the magnitude of w have any real significance?

Dimension Reduction Fisher Linear Discriminant What is a Good Projection?

For our two-class setup, it should be clear that we want the projection that will have those samples from class ω1 falling into one cluster (on the line) and those samples from class ω2 falling into a separate cluster (on the line).

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 32 / 101 So, how do we find the best direction w?

Dimension Reduction Fisher Linear Discriminant What is a Good Projection?

For our two-class setup, it should be clear that we want the projection that will have those samples from class ω1 falling into one cluster (on the line) and those samples from class ω2 falling into a separate cluster (on the line). However, this may not be possible depending on our underlying classes.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 32 / 101 Dimension Reduction Fisher Linear Discriminant What is a Good Projection?

For our two-class setup, it should be clear that we want the projection that will have those samples from class ω1 falling into one cluster (on the line) and those samples from class ω2 falling into a separate cluster (on the line). However, this may not be possible depending on our underlying classes. So, how do we find the best direction w?

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 32 / 101 Then the sample mean for the projected points is 1 m˜ i = y (36) ni y i X∈Y 1 = wTx (37) ni x i X∈D T = w mi . (38)

And, thus, is simply the projection of mi.

Dimension Reduction Fisher Linear Discriminant Separation of the Projected Means

Let mi be the d-dimensional sample mean for class i: 1 mi = x . (35) ni x i X∈D

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 33 / 101 Dimension Reduction Fisher Linear Discriminant Separation of the Projected Means

Let mi be the d-dimensional sample mean for class i: 1 mi = x . (35) ni x i X∈D Then the sample mean for the projected points is 1 m˜ i = y (36) ni y i X∈Y 1 = wTx (37) ni x i X∈D T = w mi . (38)

And, thus, is simply the projection of mi.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 33 / 101 The scale of w: we can make this distance arbitrarily large by scaling w. Rather, we want to make this distance large relative to some measure of the standard deviation.

Dimension Reduction Fisher Linear Discriminant Distance Between Projected Means

The distance between projected means is thus

T |m˜ 1 − m˜ 2| = |w (m1 − m2)| (39)

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 34 / 101 Rather, we want to make this distance large relative to some measure of the standard deviation.

Dimension Reduction Fisher Linear Discriminant Distance Between Projected Means

The distance between projected means is thus

T |m˜ 1 − m˜ 2| = |w (m1 − m2)| (39)

The scale of w: we can make this distance arbitrarily large by scaling w.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 34 / 101 Dimension Reduction Fisher Linear Discriminant Distance Between Projected Means

The distance between projected means is thus

T |m˜ 1 − m˜ 2| = |w (m1 − m2)| (39)

The scale of w: we can make this distance arbitrarily large by scaling w. Rather, we want to make this distance large relative to some measure of the standard deviation.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 34 / 101 From this, we can estimate the variance of the pooled data: 1 s˜2 +s ˜2 . (41) n 1 2 2 2  s˜1 +s ˜2 is called the total within-class scatter of the projected samples.

Dimension Reduction Fisher Linear Discriminant The Scatter

To capture this variation, we compute the scatter

2 2 s˜i = (y − m˜ i) (40) y i X∈Y which is essentially a scaled sampled variance.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 35 / 101 Dimension Reduction Fisher Linear Discriminant The Scatter

To capture this variation, we compute the scatter

2 2 s˜i = (y − m˜ i) (40) y i X∈Y which is essentially a scaled sampled variance. From this, we can estimate the variance of the pooled data: 1 s˜2 +s ˜2 . (41) n 1 2 2 2  s˜1 +s ˜2 is called the total within-class scatter of the projected samples.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 35 / 101 This term is the ratio of the distance between the projected means scaled by the within-class scatter (the variation of the data). Recall the similar term from earlier in the lecture which indicated the amount a feature will reduce the probability of error is proportional to the ratio of the difference of the means to the variance. FLD will choose the maximum... We need to rewrite J(·) as a function of w.

Dimension Reduction Fisher Linear Discriminant Fisher Linear Discriminant

The Fisher Linear Discriminant will select the w that maximizes

2 |m˜ 1 − m˜ 2| J(w) = 2 2 . (42) s˜1 +s ˜2 It does so independently of the magnitude of w.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 36 / 101 We need to rewrite J(·) as a function of w.

Dimension Reduction Fisher Linear Discriminant Fisher Linear Discriminant

The Fisher Linear Discriminant will select the w that maximizes

2 |m˜ 1 − m˜ 2| J(w) = 2 2 . (42) s˜1 +s ˜2 It does so independently of the magnitude of w. This term is the ratio of the distance between the projected means scaled by the within-class scatter (the variation of the data). Recall the similar term from earlier in the lecture which indicated the amount a feature will reduce the probability of error is proportional to the ratio of the difference of the means to the variance. FLD will choose the maximum...

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 36 / 101 Dimension Reduction Fisher Linear Discriminant Fisher Linear Discriminant

The Fisher Linear Discriminant will select the w that maximizes

2 |m˜ 1 − m˜ 2| J(w) = 2 2 . (42) s˜1 +s ˜2 It does so independently of the magnitude of w. This term is the ratio of the distance between the projected means scaled by the within-class scatter (the variation of the data). Recall the similar term from earlier in the lecture which indicated the amount a feature will reduce the probability of error is proportional to the ratio of the difference of the means to the variance. FLD will choose the maximum... We need to rewrite J(·) as a function of w.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 36 / 101 SW is called the within-class scatter matrix.

SW is symmetric and positive semidefinite.

In typical cases, when is SW nonsingular?

Dimension Reduction Fisher Linear Discriminant Within-Class Scatter Matrix

Define scatter matrices Si:

T Si = (x − mi)(x − mi) (43)

x i X∈D and

SW = S1 + S2 . (44)

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 37 / 101 Dimension Reduction Fisher Linear Discriminant Within-Class Scatter Matrix

Define scatter matrices Si:

T Si = (x − mi)(x − mi) (43)

x i X∈D and

SW = S1 + S2 . (44)

SW is called the within-class scatter matrix.

SW is symmetric and positive semidefinite.

In typical cases, when is SW nonsingular?

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 37 / 101 Dimension Reduction Fisher Linear Discriminant Within-Class Scatter Matrix

Deriving the sum of scatters.

2 2 T T s˜i = w x − w mi (45) x i X∈D   T T = w (x − mi)(x − mi) w (46)

x i X∈D T = w Siw (47)

We can therefore write the sum of the scatters as an explicit function of w:

2 2 T s˜1 +s ˜2 = w SW w (48)

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 38 / 101 Dimension Reduction Fisher Linear Discriminant Between-Class Scatter Matrix

The separation of the projected means obeys

2 2 T T (m ˜ 1 − m˜ 2) = w m1 − w m2 (49)  T  T = w (m1 − m2)(m1 − m2) w (50) T = w SBw (51)

Here, SB is called the between-class scatter matrix:

T SB = (m1 − m2)(m1 − m2) (52)

SB is also symmetric and positive semidefinite.

When is SB nonsingular?

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 39 / 101 The vector w that maximizes J(·) must satisfy

SBw = λSW w (54)

which is a generalized eigenvalue problem.

For nonsingular SW (typical), we can write this as a standard eigenvalue problem:

1 SW− SBw = λw (55)

Dimension Reduction Fisher Linear Discriminant Rewriting the FLD J(·)

We can rewrite our objective as a function of w.

T w SBw J(w) = T (53) w SW w This is the generalized Rayleigh quotient.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 40 / 101 For nonsingular SW (typical), we can write this as a standard eigenvalue problem:

1 SW− SBw = λw (55)

Dimension Reduction Fisher Linear Discriminant Rewriting the FLD J(·)

We can rewrite our objective as a function of w.

T w SBw J(w) = T (53) w SW w This is the generalized Rayleigh quotient. The vector w that maximizes J(·) must satisfy

SBw = λSW w (54)

which is a generalized eigenvalue problem.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 40 / 101 Dimension Reduction Fisher Linear Discriminant Rewriting the FLD J(·)

We can rewrite our objective as a function of w.

T w SBw J(w) = T (53) w SW w This is the generalized Rayleigh quotient. The vector w that maximizes J(·) must satisfy

SBw = λSW w (54)

which is a generalized eigenvalue problem.

For nonsingular SW (typical), we can write this as a standard eigenvalue problem:

1 SW− SBw = λw (55)

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 40 / 101 The FLD converts a many-dimensional problem to a one-dimensional one. One still must find the threshold. This is easy for known densities, but not so easy in general.

Dimension Reduction Fisher Linear Discriminant Simplifying

Since kwk is not important and SBw is always in the direction of (m1 − m2), we can simplify

1 w∗ = SW− (m1 − m2) (56)

w∗ maximizes J(·) and is the Fisher Linear Discriminant.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 41 / 101 One still must find the threshold. This is easy for known densities, but not so easy in general.

Dimension Reduction Fisher Linear Discriminant Simplifying

Since kwk is not important and SBw is always in the direction of (m1 − m2), we can simplify

1 w∗ = SW− (m1 − m2) (56)

w∗ maximizes J(·) and is the Fisher Linear Discriminant. The FLD converts a many-dimensional problem to a one-dimensional one.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 41 / 101 Dimension Reduction Fisher Linear Discriminant Simplifying

Since kwk is not important and SBw is always in the direction of (m1 − m2), we can simplify

1 w∗ = SW− (m1 − m2) (56)

w∗ maximizes J(·) and is the Fisher Linear Discriminant. The FLD converts a many-dimensional problem to a one-dimensional one. One still must find the threshold. This is easy for known densities, but not so easy in general.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 41 / 101 Comparisons Classic A Classic PCA vs. FLD comparison Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997. 714 IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 19, NO. 7, JULY 1997 classification. ThisPCA method maximizes selects W in the [1] in total such a way that the ratio ofscatter the between-class across scatter all classes. and the within- class scatter is maximized. Let the between-classPCA projections scatter matrix be are defined thus as c T =--mmmm optimalSNBiiÂchch for reconstructioni fromi=1 a low dimensional and the within-class scatter matrix be defined as basis,c but not necessarily T S =--chchxxmm fromW  a discriminationk i k i i=1xk ŒXi where m i is thestandpoint. mean image of class Xi , and Ni is the num- ber of samples in class X . If S is nonsingular, the optimal FLD maximizesi W the ratio of projection Wopt is chosen as the matrix with orthonormal columns whichthe maximizes between-class the ratio of the scatter determinant of the between-class scatter matrix of the projected samples to the determinantand of the the within-class within-class scatter matrix scatter. of the projected samples, i.e.,

FLD tries toT “shape” the Fig. 2. A comparison of principal component analysis (PCA) and WSWB Fisher’s linear discriminant (FLD) for a two class problem where data Wscatteropt = arg max to make it more for each class lies near a linear subspace. W WSWT effective for classification.W K = ww12 wm (4) method, which we call Fisherfaces, avoids this problem by JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 42 / 101 K projecting the image set to a lower dimensional space so where nswi im=12,, , is the set of generalized eigen- that the resulting within-class scatter matrix SW is nonsin- vectors of SB and SW corresponding to the m largest gener- gular. This is achieved by using PCA to reduce the dimen- K alized eigenvalues nsli im=12,, , , i.e., sion of the feature space to N - c, and then applying the K standard FLD defined by (4) to reduce the dimension to c - 1. SSiBiww==l iWi,,,,12 m More formally, W is given by Note that there are at most c - 1 nonzero generalized eigen- opt values, and so an upper bound on m is c - 1, where c is the T T T WWWopt = fld pca (5) number of classes. See [4]. To illustrate the benefits of class specific linear projec- where tion, we constructed a low dimensional analogue to the T WWSWpca =arg max T classification problem in which the samples from each class W lie near a linear subspace. Fig. 2 is a comparison of PCA T T WWpca SWBpca W and FLD for a two-class problem in which the samples from Wfld =arg max W T T each class are randomly perturbed in a direction perpen- WWpca SWWpca W dicular to a linear subspace. For this example, N = 20, n = 2, Note that the optimization for W is performed over and m = 1. So, the samples from each class lie near a line pca passing through the origin in the 2D feature space. Both n ¥ (N - c) matrices with orthonormal columns, while the PCA and FLD have been used to project the points from 2D optimization for W is performed over (N - c) ¥ m matrices down to 1D. Comparing the two projections in the figure, fld PCA actually smears the classes together so that they are no with orthonormal columns. In computing Wpca , we have longer linearly separable in the projected space. It is clear thrown away only the smallest c - 1 principal components. that, although PCA achieves larger total scatter, FLD There are certainly other ways of reducing the within- achieves greater between-class scatter, and, consequently, class scatter while preserving between-class scatter. For classification is simplified. example, a second method which we are currently investi- In the face recognition problem, one is confronted with gating chooses W to maximize the between-class scatter of Rnn¥ the projected samples after having first reduced the within- the difficulty that the within-class scatter matrix SW Œ is always singular. This stems from the fact that the rank of class scatter. Taken to an extreme, we can maximize the between-class scatter of the projected samples subject to the S is at most N - c, and, in general, the number of images W constraint that the within-class scatter is zero, i.e., in the learning set N is much smaller than the number of pixels in each image n. This means that it is possible to T WWSWopt = arg max B (6) choose the matrix W such that the within-class scatter of the W Œ: projected samples can be made exactly zero. where : is the set of n ¥ m matrices with orthonormal col- In order to overcome the complication of a singular S , W umns contained in the kernel of SW . we propose an alternative to the criterion in (4). 
This IEEE TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE, VOL. 19, NO. 7, JULY 1997 711
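A minimal sketch of that two-stage Fisherface projection: my own NumPy illustration of the recipe quoted above, using the multi-class scatter definitions from the paper; an SVD stands in for an explicit eigendecomposition of the total scatter, and all names are assumptions rather than the authors' code:

```python
import numpy as np

def fisherfaces(X, labels, c):
    """PCA down to N - c dimensions, then multi-class FLD down to c - 1."""
    X = np.asarray(X, dtype=float)
    labels = np.asarray(labels)
    N, n = X.shape
    Xc = X - X.mean(axis=0)

    # PCA stage: top N - c principal directions of the total scatter
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    W_pca = Vt[: N - c].T                        # n x (N - c)
    Y = Xc @ W_pca                               # reduced training data

    # FLD stage: leading eigenvectors of S_W^{-1} S_B in the reduced space
    d = N - c
    S_W, S_B = np.zeros((d, d)), np.zeros((d, d))
    m = Y.mean(axis=0)
    for k in np.unique(labels):
        Yk = Y[labels == k]
        mk = Yk.mean(axis=0)
        S_W += (Yk - mk).T @ (Yk - mk)
        S_B += len(Yk) * np.outer(mk - m, mk - m)
    evals, evecs = np.linalg.eig(np.linalg.solve(S_W, S_B))
    order = np.argsort(-evals.real)[: c - 1]
    W_fld = evecs[:, order].real                 # (N - c) x (c - 1)

    return W_pca @ W_fld                         # overall n x (c - 1) projection
```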

Comparisons, Case Study: Faces (EigenFaces versus FisherFaces)
(Source: Belhumeur et al., IEEE TPAMI 19(7):711-720, 1997.)

Analysis of classic pattern recognition techniques (PCA and FLD) to do face recognition.
Fixed pose but varying illumination.

[The slide shows the first page of the paper, "Eigenfaces vs. Fisherfaces: Recognition Using Class Specific Linear Projection" by Peter N. Belhumeur, Joao P. Hespanha, and David J. Kriegman: the abstract describes a face recognition algorithm, based on Fisher's Linear Discriminant, that is insensitive to large variations in lighting direction and facial expression and reports lower error rates than the Eigenface technique on the Harvard and Yale Face Databases; Fig. 1 shows that the same person, with the same expression and viewpoint, can appear dramatically different when the dominant light source changes direction.]

[The next slide shows results from p. 717 of the paper. Fig. 6 (interpolation): each method is trained on images under both near-frontal and extreme lighting (Subsets 1 and 5) and tested under intermediate lighting. Error rates (%) on Subsets 2 / 3 / 4, with the reduced space dimension in parentheses:
Eigenface (4): 53.3 / 75.4 / 52.9
Eigenface (10): 11.11 / 33.9 / 20.0
Eigenface w/o first 3 components (4): 31.11 / 60.0 / 29.4
Eigenface w/o first 3 components (10): 6.7 / 20.0 / 12.9
Correlation (129): 0.0 / 21.54 / 7.1
Linear Subspace (15): 0.0 / 1.5 / 0.0
Fisherface (4): 0.0 / 0.0 / 1.2
Fig. 7: the Yale database contains 160 frontal face images covering 16 individuals taken under 10 different conditions: a normal image under ambient lighting, one with or without glasses, three images taken with different point light sources, and five different facial expressions.]

Comparisons, Case Study: Faces (Is Linear Okay For Faces?)
(Source: Belhumeur et al., IEEE TPAMI 19(7):711-720, 1997.)

Fact: all of the images of a Lambertian surface, taken from a fixed viewpoint but under varying illumination, lie in a 3D linear subspace of the high-dimensional image space.
But, in the presence of shadowing, specularities, and facial expressions, the above statement will not hold. This will ultimately result in deviations from the 3D linear subspace and worse classification accuracy.

Comparisons, Case Study: Faces (Method 1: Correlation)
(Source: Belhumeur et al., IEEE TPAMI 19(7):711-720, 1997.)

Consider a set of N training images, {x_1, x_2, ..., x_N}. We know that each of the N images belongs to one of c classes, and we can define a function C(\cdot) that maps an image x to its class \omega_c.
Pre-processing: each image is normalized to have zero mean and unit variance. Why? It gets rid of the light source intensity and the effects of a camera's automatic gain control.
For a query image x, we select the class of the training image that is the nearest neighbor in the image space:

    x^* = \arg\min_{x_i} \| x - x_i \|, \text{ then decide } C(x^*)    (57)

where we have vectorized each image.
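A minimal sketch of this correlation (nearest-neighbor) rule, Eq. (57), including the zero-mean, unit-variance pre-processing; the function names are illustrative:

```python
import numpy as np

def normalize(img):
    """Vectorize an image and normalize it to zero mean and unit variance."""
    v = np.asarray(img, dtype=float).ravel()
    return (v - v.mean()) / v.std()

def correlation_classify(query, train_images, train_labels):
    """Nearest-neighbor rule of Eq. (57) on normalized images."""
    q = normalize(query)
    dists = [np.linalg.norm(q - normalize(x)) for x in train_images]
    return train_labels[int(np.argmin(dists))]
```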

What are the advantages and disadvantages of the correlation-based method in this context?
Computationally expensive.
Requires a large amount of storage.
Noise may play a role.
Highly parallelizable.

Comparisons, Case Study: Faces (Method 2: EigenFaces)
(Source: Belhumeur et al., IEEE TPAMI 19(7):711-720, 1997. The original paper is Turk and Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, 3(1), 1991.)

Quickly recall the main idea of PCA. Define a linear projection of the original n-dimensional image space into an m-dimensional space, with m < n (indeed m \ll n), which yields new vectors y:

    y_k = W^T x_k, \quad k = 1, 2, \ldots, N    (58)

where W \in \mathbb{R}^{n \times m}.

Define the total scatter matrix S_T as

    S_T = \sum_{k=1}^{N} (x_k - \mu)(x_k - \mu)^T    (59)

where \mu is the sample mean image.

The scatter of the projected vectors {y_1, y_2, ..., y_N} is

    W^T S_T W    (60)

W_{opt} is chosen to maximize the determinant of the total scatter matrix of the projected vectors:

    W_{opt} = \arg\max_W |W^T S_T W|    (61)
            = [\, w_1 \;\; w_2 \;\; \ldots \;\; w_m \,]    (62)

where \{ w_i \mid i = 1, 2, \ldots, m \} is the set of n-dimensional eigenvectors of S_T corresponding to the largest m eigenvalues.
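A minimal sketch of computing such an Eigenface basis in NumPy; the SVD of the centered data is used as an equivalent, better-conditioned route to the leading eigenvectors of S_T, and the names are mine, not from the papers:

```python
import numpy as np

def eigenfaces(X, m):
    """Top-m eigenvectors of the total scatter of vectorized images X (N x n)."""
    mu = X.mean(axis=0)
    # right singular vectors of (X - mu) are the eigenvectors of S_T, Eq. (59)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    W = Vt[:m].T                       # n x m projection matrix W_opt
    return W, mu

def project(W, mu, x):
    """Coordinates y = W^T (x - mu) of an image, centered at the mean image."""
    return W.T @ (x - mu)
```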

Comparisons, Case Study: Faces (An Example)

Source: http://www.cs.princeton.edu/~cdecoro/eigenfaces/ (not sure if this dataset includes lighting variation...)

What are the advantages and disadvantages of the Eigenfaces method in this context?
The scatter being maximized is due not only to the between-class scatter that is useful for classification, but also to the within-class scatter, which is generally undesirable for classification.
If PCA is presented with faces under varying illumination, the projection matrix W_{opt} will contain principal components that retain the variation in lighting. If this variation is higher than the variation due to class identity, then PCA will suffer greatly for classification.
It yields a more compact representation than the correlation-based method.

Comparisons Case Study: Faces Method 3: Linear Subspaces Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Use the Lambertian model directly. Consider a point p on a Lambertian surface illuminated by a point light source at infinity. Let s ∈ R^3 denote the product of the light source intensity with the unit vector for the light source direction. The image intensity of the surface at p when viewed by a camera is

E(p) = a(p) n(p)^T s    (63)

where n(p) is the unit inward normal vector to the surface at point p, and a(p) is the albedo of the surface at p (a scalar). Hence, the image intensity of the point p is linear in s.

So, if we assume no shadowing, given three images of a Lambertian surface from the same viewpoint under three known, linearly independent light source directions, the albedo and surface normal can be recovered. Alternatively, one can reconstruct the image of the surface under an arbitrary lighting direction by a linear combination of the three original images. This fact can be used for classification.

For each face (class), use three or more images taken under different lighting conditions to construct a 3D basis for the linear subspace. For recognition, compute the distance of a new image to each linear subspace and choose the face corresponding to the shortest distance.
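A minimal sketch of this classifier under the assumptions above (illustrative names, not the authors' code): build an orthonormal basis for each person's three-dimensional illumination subspace from three or more vectorized training images, then assign a new image to the class with the smallest residual after orthogonal projection onto that subspace.

```python
import numpy as np

def illumination_basis(images, dim=3):
    """images: k x n array of vectorized images of one person (k >= 3)
    under different lighting. Returns an n x dim orthonormal basis."""
    U, _, _ = np.linalg.svd(images.T, full_matrices=False)
    return U[:, :dim]

def classify(bases, image):
    """bases: list of n x dim orthonormal bases, one per class.
    Returns the class whose linear subspace is closest to `image`."""
    residuals = [np.linalg.norm(image - B @ (B.T @ image)) for B in bases]
    return int(np.argmin(residuals))
```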

Pros and Cons?

If there is no noise or shadowing, this method will achieve error-free classification under any lighting conditions (and if the surface is indeed Lambertian). But faces inevitably have self-shadowing, and faces have expressions... It is also still fairly computationally expensive (linear in the number of classes).

Comparisons Case Study: Faces Method 4: FisherFaces Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Recall the Fisher Linear Discriminant setup. The between-class scatter matrix is

S_B = \sum_{i=1}^{c} N_i (µ_i − µ)(µ_i − µ)^T    (64)

The within-class scatter matrix is

S_W = \sum_{i=1}^{c} \sum_{x_k \in D_i} (x_k − µ_i)(x_k − µ_i)^T    (65)

The optimal projection W_opt is chosen as the matrix with orthonormal columns that maximizes the ratio of the determinant of the between-class scatter matrix of the projected vectors to the determinant of the within-class scatter of the projected vectors:

W_opt = \arg\max_W \frac{|W^T S_B W|}{|W^T S_W W|}    (66)

The eigenvectors {w_i | i = 1, 2, ..., m} corresponding to the m largest eigenvalues of the following generalized eigenvalue problem comprise W_opt:

S_B w_i = λ_i S_W w_i,   i = 1, 2, \ldots, m    (67)

This is a multi-class version of the FLD, which we will discuss in more detail later.

In face recognition, things get a little more complicated because the within-class scatter matrix S_W is always singular. This is because the rank of S_W is at most N − c, and the number of images in the learning set is commonly much smaller than the number of pixels in each image. This means we can choose W such that the within-class scatter of the projected samples is exactly zero.

To overcome this, project the image set to a lower-dimensional space so that the resulting within-class scatter matrix is nonsingular. How? Use PCA to first reduce the dimension to N − c and then FLD to reduce it to c − 1. W_opt^T is given by the product W_FLD^T W_PCA^T, where

W_PCA = \arg\max_W |W^T S_T W|    (68)

W_FLD = \arg\max_W \frac{|W^T W_PCA^T S_B W_PCA W|}{|W^T W_PCA^T S_W W_PCA W|}    (69)
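A minimal sketch of this Fisherface pipeline (Eqs. 64–69), assuming SciPy is available; the helper names are illustrative, and the generalized eigenproblem of Eq. (67) is solved in the PCA-reduced space, where S_W is generically nonsingular.

```python
import numpy as np
from scipy.linalg import eigh

def scatter_matrices(X, labels):
    """Between- and within-class scatter S_B, S_W (Eqs. 64-65), rows of X."""
    mu = X.mean(axis=0)
    d = X.shape[1]
    Sb, Sw = np.zeros((d, d)), np.zeros((d, d))
    for c in np.unique(labels):
        Xc = X[labels == c]
        mc = Xc.mean(axis=0)
        Sb += len(Xc) * np.outer(mc - mu, mc - mu)
        Sw += (Xc - mc).T @ (Xc - mc)
    return Sb, Sw

def fisherfaces(X, labels):
    """W_opt = W_PCA W_FLD (so W_opt^T = W_FLD^T W_PCA^T, Eqs. 68-69).
    X is N x n, one vectorized image per row."""
    N = X.shape[0]
    c = len(np.unique(labels))
    mu = X.mean(axis=0)
    # Step 1: PCA to N - c dimensions (top eigenvectors of the total scatter).
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    Wpca = Vt[:N - c].T                              # n x (N - c)
    # Step 2: FLD in the reduced space, keeping the top c - 1 directions of
    # the generalized eigenproblem S_B w = lambda S_W w (Eq. 67).
    Sb, Sw = scatter_matrices((X - mu) @ Wpca, labels)
    vals, vecs = eigh(Sb, Sw)                        # ascending eigenvalues
    Wfld = vecs[:, -(c - 1):][:, ::-1]               # largest c - 1 first
    return Wpca @ Wfld                               # n x (c - 1)
```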

Comparisons Case Study: Faces Experiment 1 and 2: Variation in Lighting Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Hypothesis: face recognition algorithms will perform better if they exploit the fact that images of a Lambertian surface lie in a linear subspace. Used Hallinan's Harvard database, which sampled the space of light source directions in 15-degree increments. Used 330 images of five people (66 of each) and extracted five subsets. Classification is nearest neighbor in all cases.

[Figs. 3 and 4 of the paper show the sampled light source directions and example images from each subset of the Harvard database. Subset 1 (30 images) has both the longitudinal and latitudinal angles of the light source direction within 15 degrees of the camera axis; Subsets 2 through 5 (45, 65, 85, and 105 images) have the greater of the two angles at 30, 45, 60, and 75 degrees from the camera axis, respectively. For the Eigenface and correlation tests, images were normalized to zero mean and unit variance; Eigenface results are reported with ten principal components and also with components four through thirteen (dropping the first three).]

Comparisons Case Study: Faces Experiment 1: Variation in Lighting – Extrapolation Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Train on Subset 1.

Test on Subsets 1, 2, and 3.

Extrapolating from Subset 1
Method                 Reduced Space   Error Rate (%)
                                       Subset 1   Subset 2   Subset 3
Eigenface              4               0.0        31.1       47.7
Eigenface              10              0.0        4.4        41.5
Eigenface w/o 1st 3    4               0.0        13.3       41.5
Eigenface w/o 1st 3    10              0.0        4.4        27.7
Correlation            29              0.0        0.0        33.9
Linear Subspace        15              0.0        4.4        9.2
Fisherface             4               0.0        0.0        4.6

Fig. 5 (Extrapolation): When each of the methods is trained on images with near-frontal illumination (Subset 1), the table shows the relative performance under extreme light source conditions. Since there are 30 images in the training set, correlation is equivalent to the Eigenface method using 29 principal components.

Comparisons Case Study: Faces Experiment 2: Variation in Lighting – Interpolation Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Train on Subsets 1 and 5. Test on Subsets 2, 3, and 4.

Interpolating between Subsets 1 and 5
Method                 Reduced Space   Error Rate (%)
                                       Subset 2   Subset 3   Subset 4
Eigenface              4               53.3       75.4       52.9
Eigenface              10              11.11      33.9       20.0
Eigenface w/o 1st 3    4               31.11      60.0       29.4
Eigenface w/o 1st 3    10              6.7        20.0       12.9
Correlation            129             0.0        21.54      7.1
Linear Subspace        15              0.0        1.5        0.0
Fisherface             4               0.0        0.0        1.2

Fig. 6 (Interpolation): When each of the methods is trained on images from both near-frontal and extreme lighting (Subsets 1 and 5), the table shows the relative performance under intermediate lighting conditions.

Comparisons Case Study: Faces Experiment 1 and 2: Variation in Lighting Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Observations:

All of the algorithms perform perfectly when lighting is nearly frontal. However, as lighting is moved off axis, there is a significant difference between the methods (specifically, between the class-specific methods and the Eigenface method).

The experiments empirically demonstrate that the Eigenface method is equivalent to correlation when the number of Eigenfaces equals the size of the training set (Exp. 1).

In the Eigenface method, removing the first three principal components results in better performance under variable lighting conditions.

The Linear Subspace method has error rates comparable with the Fisherface method, but it requires storing roughly three times as much information and takes three times as long.

The Fisherface method had error rates lower than the Eigenface method and required less computation time.

Comparisons Case Study: Faces Experiment 3: Yale DB Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Uses a second database, constructed at Yale, of 16 subjects with ten images each: a normal image under ambient lighting, one with/without glasses, three images taken with different point light sources, and five different facial expressions. Error rates are computed with the "leaving-one-out" strategy and a nearest neighbor classifier, at two crop scales (closely cropped faces and full faces).

"Leaving-One-Out" of Yale Database
Method                 Reduced Space   Error Rate (%)
                                       Close Crop   Full Face
Eigenface              30              24.4         19.4
Eigenface w/o 1st 3    30              15.3         10.8
Correlation            160             23.9         20.0
Linear Subspace        48              21.6         15.6
Fisherface             15              7.3          0.6

Fig. 9: The table shows the relative performance of the algorithms when applied to the Yale database, which contains variation in facial expression and lighting. The Fisherface method had error rates better than half those of any other method. The Linear Subspace method fared comparatively worse here than in the lighting experiments: because of variation in facial expression, the images no longer lie in a linear subspace. The Fisherface method tends to discount the portions of the image that are not significant for recognizing an individual (e.g., the region around the mouth, which varies considerably across expressions), while regions such as the nose, cheeks, and brow are stable within class.

Comparisons Case Study: Faces Experiment 3: Comparing Variation with Number of Principal Components Source: Belhumeur et al. IEEE TPAMI 19(7) 711–720. 1997.

Fig. 8: As demonstrated on the Yale database, the performance of the Eigenface method varies with the number of principal components retained; dropping the first three appears to improve performance.


Comparisons Case Study: Faces Can PCA outperform FLD for recognition? Source: Martínez and Kak. PCA versus LDA. PAMI 23(2), 2001.

There may be situations in which PCA might outperform FLD. Can you think of such a situation?

Key: When we have few training data, then it may be preferable to describe the total scatter.

[Figure from Martínez and Kak: two different classes embedded in two different Gaussian-like distributions, with only two samples per class supplied to the learning procedure (PCA or LDA). The classification result of the PCA procedure, using only the first eigenvector, is more desirable than the result of the LDA. D_PCA and D_LDA mark the decision thresholds obtained with nearest-neighbor classification.]

The paper's argument: the switch from PCA to LDA may not always be warranted and may sometimes lead to faulty system design, especially if the learning database is small or nonuniformly distributed. Taking all of the data into account, PCA computes the direction of largest variance, while LDA computes the direction that best discriminates between the two classes; with so few samples per class, the LDA direction generalizes poorly, and the PCA-based nearest-neighbor threshold separates the underlying class distributions better. As additional evidence, in the September FERET competition an LDA approach compared unfavorably with a standard PCA approach when only one or two learning samples per class were available. Of course, given large and representative learning datasets, LDA should outperform PCA; the paper confirms this on the AR face database, which provides many facial shots per subject (different expressions, illumination conditions, and/or occlusions). Note that FERET deals with a very large number of classes, but the number of classes is not the issue here; the concern is insufficient data per class available for learning.

They tested on the AR face database and found affirmative results.

[Figures from Martínez and Kak: performance curves for three different ways of dividing the data into a training set and a testing set, and performance curves for the high-dimensional case.]
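To explore this question on synthetic data, here is a small sketch (purely illustrative, not the paper's experiment): it computes a one-dimensional PCA direction and a two-class Fisher direction from the same tiny training set and scores each by nearest projected-class-mean accuracy on held-out samples. With only a couple of samples per class, the Fisher direction depends on a very noisy within-class scatter estimate and can come out worse than the label-free PCA direction, which is the point of the figure above.

```python
import numpy as np

def pca_direction(X):
    """First principal component of the pooled (label-free) training data."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[0]

def fisher_direction(X0, X1, reg=1e-8):
    """Two-class Fisher direction S_W^{-1}(m_1 - m_0); a tiny ridge term is
    added because S_W estimated from very few samples may be singular."""
    m0, m1 = X0.mean(axis=0), X1.mean(axis=0)
    Sw = (X0 - m0).T @ (X0 - m0) + (X1 - m1).T @ (X1 - m1)
    return np.linalg.solve(Sw + reg * np.eye(Sw.shape[0]), m1 - m0)

def accuracy(w, train0, train1, test0, test1):
    """Nearest projected-class-mean accuracy along the direction w."""
    c0, c1 = (train0 @ w).mean(), (train1 @ w).mean()
    right = np.sum(np.abs(test0 @ w - c0) <= np.abs(test0 @ w - c1))
    right += np.sum(np.abs(test1 @ w - c1) < np.abs(test1 @ w - c0))
    return right / (len(test0) + len(test1))
```

Drawing, say, two training samples per class from two elongated Gaussian-like classes and repeating over many random draws lets you see how sensitive the small-sample Fisher direction is to the particular samples drawn, relative to the PCA direction.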

Extensions Multiple Discriminant Analysis Multiple Discriminant Analysis

We can generalize the Fisher Linear Discriminant to multiple classes. Indeed we saw it done in the Case Study on Faces. But, let’s cover it with some rigor. Let’s make sure we’re all at the same place: How many discriminant functions will be involved in a c class problem?

Key: There will be c − 1 projection functions for a c-class problem, and hence the projection will be from a d-dimensional space to a (c − 1)-dimensional space (d must be greater than or equal to c).

Extensions Multiple Discriminant Analysis MDA The Projection

The projection is accomplished by c − 1 discriminant functions:

y_i = w_i^T x,   i = 1, \ldots, c − 1    (70)

which is summarized in matrix form as

y = W^T x    (71)

where W is the d × (c − 1) projection matrix.

[FIGURE 3.6 from Duda, Hart, and Stork, Pattern Classification: three three-dimensional distributions are projected onto two-dimensional subspaces, described by normal vectors W1 and W2. Informally, multiple discriminant methods seek the optimum such subspace, that is, the one with the greatest separation of the projected distributions for a given total within-class scatter, here associated with W1.]

Extensions Multiple Discriminant Analysis MDA Within-Class Scatter Matrix

The generalization for the Within-Class Scatter Matrix is straightforward:

S_W = \sum_{i=1}^{c} S_i    (72)

where, as before,

S_i = \sum_{x \in D_i} (x − m_i)(x − m_i)^T

and

m_i = \frac{1}{n_i} \sum_{x \in D_i} x .

Extensions Multiple Discriminant Analysis MDA Between-Class Scatter Matrix

The between-class scatter matrix S_B is not so easy to generalize.

Let's define a total mean vector

m = \frac{1}{n} \sum_{x} x = \frac{1}{n} \sum_{i=1}^{c} n_i m_i    (73)

Recall the total scatter matrix

S_T = \sum_{x} (x − m)(x − m)^T    (74)

Then it follows that

S_T = \sum_{i=1}^{c} \sum_{x \in D_i} (x − m_i + m_i − m)(x − m_i + m_i − m)^T    (75)
    = \sum_{i=1}^{c} \sum_{x \in D_i} (x − m_i)(x − m_i)^T + \sum_{i=1}^{c} \sum_{x \in D_i} (m_i − m)(m_i − m)^T    (76)
    = S_W + \sum_{i=1}^{c} n_i (m_i − m)(m_i − m)^T    (77)

So, we can define this second term as a generalized between-class scatter matrix. The total scatter is the sum of the within-class scatter and the between-class scatter.
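The decomposition in Eq. (77) is easy to check numerically. Below is a small self-contained sketch (illustrative random data, not from the lecture) that builds S_W, S_B, and S_T for labeled samples and verifies S_T = S_W + S_B.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(60, 5))                     # 60 samples in d = 5
labels = rng.integers(0, 3, size=60)             # c = 3 classes

m = X.mean(axis=0)                               # total mean (Eq. 73)
St = (X - m).T @ (X - m)                         # total scatter (Eq. 74)

Sw = np.zeros((5, 5))
Sb = np.zeros((5, 5))
for c in np.unique(labels):
    Xc = X[labels == c]
    mc = Xc.mean(axis=0)
    Sw += (Xc - mc).T @ (Xc - mc)                # within-class term (Eq. 72)
    Sb += len(Xc) * np.outer(mc - m, mc - m)     # between-class term (Eq. 77)

assert np.allclose(St, Sw + Sb)                  # S_T = S_W + S_B
```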

Extensions Multiple Discriminant Analysis MDA Objective

We again seek a criterion that will maximize the ratio of the between-class scatter of the projected vectors to the within-class scatter of the projected vectors. Recall that we can write down the scatter in terms of these projections:

\tilde{m}_i = \frac{1}{n_i} \sum_{y \in Y_i} y   and   \tilde{m} = \frac{1}{n} \sum_{i=1}^{c} n_i \tilde{m}_i    (78)

\tilde{S}_W = \sum_{i=1}^{c} \sum_{y \in Y_i} (y − \tilde{m}_i)(y − \tilde{m}_i)^T    (79)

\tilde{S}_B = \sum_{i=1}^{c} n_i (\tilde{m}_i − \tilde{m})(\tilde{m}_i − \tilde{m})^T    (80)

Then, we can show

\tilde{S}_W = W^T S_W W    (81)

\tilde{S}_B = W^T S_B W    (82)

A simple scalar measure of scatter is the determinant of the scatter matrix. The determinant is the product of the eigenvalues and thus is the product of the variation along the principal directions.

J(W) = \frac{|\tilde{S}_B|}{|\tilde{S}_W|} = \frac{|W^T S_B W|}{|W^T S_W W|}    (83)

Extensions Multiple Discriminant Analysis MDA Solution

The columns of an optimal W are the generalized eigenvectors that correspond to the largest eigenvalues in

S_B w_i = λ_i S_W w_i    (84)

If S_W is nonsingular, then this can be converted to a conventional eigenvalue problem. Or, we could notice that the rank of S_B is at most c − 1 and do some clever algebra...

The solution for W is, however, not unique and would allow arbitrary scaling and rotation, but these would not change the ultimate classification.
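Assuming S_W is nonsingular, the generalized eigenproblem of Eq. (84) can be handed to a generalized symmetric eigensolver. A minimal SciPy sketch (illustrative names) is below; scipy.linalg.eigh returns eigenvalues in ascending order, so the last c − 1 columns correspond to the largest eigenvalues.

```python
import numpy as np
from scipy.linalg import eigh

def mda_projection(Sb, Sw, c):
    """Columns of W solve S_B w = lambda S_W w (Eq. 84) for the c - 1
    largest eigenvalues; requires S_W symmetric positive definite."""
    vals, vecs = eigh(Sb, Sw)            # generalized problem, ascending eigenvalues
    W = vecs[:, -(c - 1):][:, ::-1]      # keep the top c - 1, largest first
    return W

# Projecting a data matrix X (one sample per row) then follows Eq. (71): Y = X @ W
```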

Extensions Image PCA IMPCA Source: Yang and Yang: IMPCA vs. PCA, Pattern Recognition v35, 2002.

Observation: To apply PCA and FLD to images, we need to first "vectorize" them, which can lead to very high-dimensional vectors. Solving the associated eigen-problems is a very time-consuming process.

So, can we apply PCA on the images directly? Yes! This will be accomplished by what the authors call the image total covariance (scatter) matrix.

Extensions Image PCA IMPCA: Problem Set-Up Source: Yang and Yang: IMPCA vs. PCA, Pattern Recognition v35, 2002.

Define A ∈ R^{m×n} as our image. Let w denote an n-dimensional column vector, which will represent the subspace onto which we project an image:

y = A w    (85)

which yields the m-dimensional vector y. You might think of w as a "feature selector." We, again, want to maximize the total scatter...

Extensions Image PCA IMPCA: Image Total Scatter Matrix Source: Yang and Yang: IMPCA vs. PCA, Pattern Recognition v35, 2002.

We have M total data samples. The sample mean image (written \bar{A} here to avoid clashing with the sample count M) is

\bar{A} = \frac{1}{M} \sum_{j=1}^{M} A_j    (86)

And the projected sample mean is

\tilde{m} = \frac{1}{M} \sum_{j=1}^{M} y_j    (87)
          = \frac{1}{M} \sum_{j=1}^{M} A_j w    (88)
          = \bar{A} w    (89)

The scatter of the projected samples is

\tilde{S} = \sum_{j=1}^{M} (y_j − \tilde{m})(y_j − \tilde{m})^T    (90)
          = \sum_{j=1}^{M} [(A_j − \bar{A}) w][(A_j − \bar{A}) w]^T    (91)

tr(\tilde{S}) = w^T \left[ \sum_{j=1}^{M} (A_j − \bar{A})^T (A_j − \bar{A}) \right] w    (92)

So, the image total scatter matrix is

S_I = \sum_{j=1}^{M} (A_j − \bar{A})^T (A_j − \bar{A})    (93)

And a suitable criterion, in a form we've seen before, is

J_I(w) = w^T S_I w    (94)

We know already that the vectors w that maximize J_I are the orthonormal eigenvectors of S_I corresponding to the largest eigenvalues of S_I.
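A minimal sketch of IMPCA feature extraction under these definitions (illustrative names, not the authors' code): form the n × n image total scatter matrix of Eq. (93) directly from the image stack, keep its top d eigenvectors, and project each image with Eq. (85).

```python
import numpy as np

def impca(images, d):
    """images: array of shape (M, m, n), one m x n image per slice.
    Returns (A_bar, W) where the columns of W (n x d) are the top-d
    eigenvectors of the image total scatter matrix S_I (Eq. 93)."""
    A_bar = images.mean(axis=0)                  # sample mean image, Eq. (86)
    C = images - A_bar                           # stack of A_j - A_bar
    S_I = np.einsum('jki,jkl->il', C, C)         # sum_j (A_j - A_bar)^T (A_j - A_bar)
    vals, vecs = np.linalg.eigh(S_I)             # ascending eigenvalues
    W = vecs[:, -d:][:, ::-1]                    # largest d eigenvectors first
    return A_bar, W

def impca_features(image, W):
    """Projected feature vectors Y = A W (Eq. 85): one m-vector per column of W."""
    return image @ W                             # shape (m, d)
```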

Extensions Image PCA IMPCA: Experimental Results Source: Yang and Yang: IMPCA vs. PCA, Pattern Recognition v35, 2002.

Dataset is the ORL database: 10 different images of each of 40 individuals. Facial expressions (open/closed eyes, smiling/non-smiling) and facial details (glasses/no glasses) vary. Even the pose can vary slightly: up to about 20 degrees of tilt/rotation and about 10% variation in scale. Image size is normalized to 92 × 112. The first five images of each person are used for training and the second five for testing. Since the image total scatter matrix here is only 92 × 92, its eigenvectors are very easy to compute; the number of selected eigenvectors (projection vectors) is varied from 1 to 10, and both a minimum distance classifier and a nearest neighbor classifier are used in the projected feature space.

IMPCA varying the number of extracted eigenvectors (NN classifier). Recognition rates (%) based on IMPCA projected features as the number of projection vectors varies from 1 to 10:

Projection vectors   1     2     3     4     5     6     7     8     9     10
Minimum distance     73.0  83.0  86.5  88.5  88.5  88.5  90.0  90.5  91.0  91.0
Nearest neighbor     85.0  92.0  93.5  94.5  94.5  95.0  95.0  95.5  93.5  94.0

Comparison of the maximal recognition rates (the value in parentheses is the number of axes at which the maximal rate is achieved):

Recognition rate     Eigenfaces   Fisherfaces   IMPCA
Minimum distance     89.5% (46)   88.5% (39)    91.0%
Nearest neighbor     93.5% (37)   88.5% (39)    95.5%

CPU time (s) consumed for feature extraction and classification (nearest neighbor classifier):

Method             Feature extraction   Classification   Total
Eigenfaces (37)    371.79               5.16             376.95
Fisherfaces (39)   378.10               5.27             383.37
IMPCA (112×8)      27.14                25.04            52.18

Tables 1 and 2 indicate that IMPCA outperforms Eigenfaces and Fisherfaces on this data, and Table 3 shows that the time consumed for feature extraction using IMPCA is much less than that of the other two methods.

Table 1 Recognition rates (%) based on IMPCA projected features as the number of projection vectors varying from 1 to 10

Projection 1 2 345678910 vector number

Minimum distance 73.0 83.0 86.5 88.5 88.5 88.5 90.0 90.5 91.0 91.0 Nearest neighbor 85.0 92.0 93.5 94.5 94.5 95.0 95.0 95.5 93.5 94.0

Table 2 Tables 1 and 2 indicate that our proposed IMPCA outper- Comparison ofthe maximal recognition rates using the three meth- forms Eigenfaces and Fisherfaces. What’s more, in Table 3, ods it is evident that the time consumed for feature extraction Recognition rate Eigenfaces Fisherfaces IMPCA using IMPCA is much less than that ofthe other two meth- ods. So our methods are more preferable for image feature Minimum distance 89.5% (46) 88.5% (39) 91.0% extraction. Nearest neighbor 93.5% (37) 88.5% (39) 95.5%

Note: The value in parentheses denotes the number ofaxes as the maximal recognition rate is achieved. Acknowledgements

We wish to thank National Science Foundation ofChina under Grant No.60072034 for supporting. Table 3 The CPU time consumed for feature extraction and classiÿcation using the three methods IMPCA vs. EigenFaces vs. FisherFaces for Speed References Time (s) Feature Classiÿcation Total extraction time time time [1] M. Turk, A. Pentland, Face recognition using Eigenfaces, Proceedings ofthe IEEE Conferenceon Computer Vision and Eigenfaces (37) 371.79 5.16 376.95 Pattern Recognition, 1991, pp. 586–591. Fisherfaces (39) 378.10 5.27 383.37 [2] P.N. Belhumeur, J.P. Hespanha, D.J. Kriengman, Eigenfaces vs. IMPCA (112 8) 27.14 25.04 52.18 Fisherfaces: recognition using class speciÿc linear projection, × IEEE Trans. Pattern Anal. Mach. Intell. 19 (7) (1997) 711–720. [3] K. Liu, Y.-Q. Cheng, J.-Y. Yang, et al., Algebraic feature time consumed for feature extraction and classiÿcation using extraction for image recognition based on an optimal the above methods under a nearest neighbor classiÿer is discriminant criterion, Pattern Recognition 26 (6) (1993) exhibited in Table 3. 903–911.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 84 / 101

Extensions Image PCA IMPCA: Discussion Source: Yang and Yang: IMPCA vs. PCA, Pattern Recognition v35, 2002.

There is a clear speed-up because the amount of computation has been greatly reduced.
However, this speed-up comes at some cost. What is that cost?
Why does IMPCA work in this case? What is IMPCA really doing?
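To make the trade-off concrete, here is a minimal NumPy sketch of the kind of computation IMPCA performs (a sketch assuming that, as described earlier, IMPCA builds an n x n image covariance directly from the m x n image matrices rather than from mn-dimensional vectorized images; the array names, sizes, and random data are illustrative, not taken from the paper):

import numpy as np

# Illustrative stack of M face images, each an m x n matrix; the width 92 is illustrative,
# only the 112 x 8 feature size appears in Table 3.
M, m, n = 200, 112, 92
rng = np.random.default_rng(0)
images = rng.random((M, m, n))

# IMPCA-style image covariance: only n x n, built from the image matrices directly.
mean_image = images.mean(axis=0)
centered = images - mean_image
G = np.einsum('kij,kil->jl', centered, centered) / M   # (1/M) sum_k (A_k - Abar)^T (A_k - Abar)

# Top d eigenvectors of the small n x n matrix give the projection axes.
d = 8
evals, evecs = np.linalg.eigh(G)        # eigenvalues in ascending order
Xproj = evecs[:, -d:]                   # n x d projection matrix

# Each image projects to an m x d feature matrix (e.g., 112 x 8, matching "IMPCA (112 x 8)").
features = centered @ Xproj             # shape (M, m, d)

# Contrast: Eigenfaces/Fisherfaces work on mn-dimensional vectors, so their eigenproblem comes
# from an (mn) x (mn) covariance (or an M x M Gram matrix), far costlier to form and solve.

One visible cost in the sketch: the projected features are m x d matrices rather than short vectors, which is consistent with IMPCA's larger classification time in Table 3 even though its extraction time is much smaller.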

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 85 / 101

Non-linear Dimension Reduction Locally Linear Embedding Locally Linear Embedding Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

So far, we have covered a few methods for dimension reduction that all make the underlying assumption that the data in high dimension lives on a planar manifold in the lower dimension.
The methods are easy to implement.
The methods do not suffer from local minima.
But, what if the data is non-linear?

Figure 1: The problem of nonlinear dimensionality reduction, as illustrated for three dimensional data (B) sampled from a two dimensional manifold (A). An unsupervised learning algorithm must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in two dimensions. The shading in (C) illustrates the neighborhood-preserving mapping discovered by LLE.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 86 / 101


Non-linear Dimension Reduction Locally Linear Embedding The Non-Linear Problem Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

In non-linear dimension reduction, one must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in the lower dimension (or even how many dimensions should be used).

The LLE way of doing this is to make the assumption that, given enough data samples, the local frame of a particular point xi is linear. LLE proceeds to preserve this local structure while simultaneously reducing the global dimension (indeed, preserving the local structure gives LLE the necessary constraints to discover the manifold).

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 87 / 101

Non-linear Dimension Reduction Locally Linear Embedding LLE: Neighborhoods Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

Suppose we have n real-valued vectors xi in a dataset D, each of dimension D. We assume these data are sampled from some smooth underlying manifold.
Furthermore, we assume that each data point and its neighbors lie on or close to a locally linear patch of the manifold.
LLE characterizes the local geometry of these patches by linear coefficients that reconstruct each data point from its neighbors. If the neighbors form a D-dimensional simplex, then these coefficients form the barycentric coordinates of the data point.
In the simplest form of LLE, one identifies K such nearest neighbors based on the Euclidean distance. But, one can use all points in a ball of fixed radius, or even more sophisticated metrics.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 88 / 101

Non-linear Dimension Reduction Locally Linear Embedding LLE: Neighborhoods Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

We can write down the reconstruction error with the following cost function:

J_{LLE1}(W) = \sum_i \big\| x_i - \sum_j W_{ij} x_j \big\|^2    (95)

Notice that each row of the weight matrix W will be nonzero in only K columns; i.e., W is a quite sparse matrix. If we define the set N(xi) as the K neighbors of xi, then we enforce

W_{ij} = 0 \text{ if } x_j \notin N(x_i)    (96)

We will enforce that the rows sum to 1, i.e., \sum_j W_{ij} = 1. (This is for invariance.)
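As a small illustration of the cost and constraints in Eqs. (95)-(96), the sketch below builds a K-nearest-neighbor graph and a valid (row-stochastic, K-sparse) W; the uniform weights over each neighborhood are purely for illustration, since the optimal weights are derived on the following slides. All names and data are illustrative.

import numpy as np

def knn_indices(X, K):
    # Brute-force K nearest neighbors by Euclidean distance (excluding the point itself).
    d2 = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    np.fill_diagonal(d2, np.inf)
    return np.argsort(d2, axis=1)[:, :K]            # shape (n, K)

def lle_reconstruction_cost(X, W):
    # J_LLE1(W) = sum_i || x_i - sum_j W_ij x_j ||^2
    return np.sum((X - W @ X) ** 2)

n, D, K = 100, 5, 8
X = np.random.default_rng(1).random((n, D))
nbrs = knn_indices(X, K)

W = np.zeros((n, n))
for i in range(n):
    W[i, nbrs[i]] = 1.0 / K     # W_ij = 0 outside N(x_i); each row sums to one

print(lle_reconstruction_cost(X, W))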

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 89 / 101

Non-linear Dimension Reduction Locally Linear Embedding LLE: Neighborhoods Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

Note that the constrained weights that minimize these reconstruction errors obey important symmetries: for any data point, they are invariant to rotations, rescalings, and translations of that data point and its neighbors.
This means that these weights characterize intrinsic geometric properties of the neighborhood as opposed to any properties that depend on a particular frame of reference.
If we suppose the data lie on or near a smooth nonlinear manifold of dimension d ≪ D, then, to a good approximation, there exists a linear mapping (a translation, rotation, and scaling) that maps the high dimensional coordinates of each neighborhood to global internal coordinates on the manifold.
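A quick numerical check of this invariance (a sketch; the closed-form weight solve used here anticipates Eq. (100) on the upcoming slides, and all names and data are illustrative):

import numpy as np

def local_weights(x, nbrs):
    Z = x - nbrs                                  # rows are (x - eta_j)
    C = Z @ Z.T + 1e-12 * np.eye(len(nbrs))       # local covariance; tiny jitter since C is rank-deficient when K > D
    w = np.linalg.solve(C, np.ones(len(nbrs)))
    return w / w.sum()

rng = np.random.default_rng(4)
x, nbrs = rng.random(3), rng.random((5, 3))

Q, _ = np.linalg.qr(rng.standard_normal((3, 3)))  # random orthogonal transform (rotation/reflection)
t, s = rng.random(3), 2.5                         # translation and rescaling

w1 = local_weights(x, nbrs)
w2 = local_weights(s * (x @ Q) + t, s * (nbrs @ Q) + t)
print(np.allclose(w1, w2))                        # True: the optimal weights are unchanged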

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 90 / 101

Non-linear Dimension Reduction Locally Linear Embedding LLE: Neighborhoods Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

So, we expect the characterization of the local neighborhoods (W) in the original space to be equally valid for the local patches on the manifold.

In other words, the same weights Wij that reconstruct point xi in the original space should also reconstruct it in the embedded manifold coordinate system. First, let’s solve for the weights. And, then we’ll see how to use this point to ultimately compute the global dimension reduction.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 91 / 101

Non-linear Dimension Reduction Locally Linear Embedding Solving for the Weights Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

Solving for the weights W is a constrained least-squares problem.

Consider a particular x with K nearest neighbors ηj and weights wj which sum to one. We can write the reconstruction error as

\epsilon = \big\| x - \sum_j w_j \eta_j \big\|^2    (97)

         = \big\| \sum_j w_j (x - \eta_j) \big\|^2    (98)   (using the constraint that the weights w_j sum to one)

         = \sum_{jk} w_j w_k C_{jk}    (99)

where C_{jk} is the local covariance C_{jk} = (x - \eta_j)^T (x - \eta_k).

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 92 / 101

Non-linear Dimension Reduction Locally Linear Embedding Solving for the Weights Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

We need to add a Lagrange multiplier to enforce the constraint

\sum_j w_j = 1, and then the weights can be solved in closed form. The optimal weights, in terms of the local covariance matrix, are

w_j = \frac{\sum_k C_{jk}^{-1}}{\sum_{lm} C_{lm}^{-1}} .    (100)

But, rather than explicitly inverting the covariance matrix, one can solve the linear system

\sum_k C_{jk} w_k = 1    (101)

and rescale the weights so that they sum to one.
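A sketch of the per-point solve in Eqs. (97)-(101) (assuming the K neighbors have already been gathered into a K x D array; the small ridge added to C is a common safeguard for the case where the neighbors outnumber the input dimensionality, and is an assumption of this sketch rather than something stated above):

import numpy as np

def reconstruction_weights(x, neighbors, reg=1e-3):
    # neighbors: K x D array whose rows are the eta_j for this point x.
    Z = x - neighbors                               # rows are (x - eta_j)
    C = Z @ Z.T                                     # local covariance C_jk = (x - eta_j)^T (x - eta_k)
    C = C + reg * np.trace(C) * np.eye(len(C))      # ridge so the solve stays well conditioned
    w = np.linalg.solve(C, np.ones(len(C)))         # solve sum_k C_jk w_k = 1
    return w / w.sum()                              # rescale so the weights sum to one

rng = np.random.default_rng(2)
x = rng.random(3)
eta = rng.random((5, 3))                            # K = 5 neighbors in D = 3
w = reconstruction_weights(x, eta)
print(w.sum(), np.linalg.norm(x - w @ eta))         # weights sum to 1; second value is the residual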

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 93 / 101

Non-linear Dimension Reduction Locally Linear Embedding Stage 2: Neighborhood Preserving Mapping Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

Each high-dimensional input vector xi is mapped to a low-dimensional vector yi representing the global internal coordinates on the manifold.

LLE does this by choosing the d-dimensional coordinates yi to minimize the embedding cost function:

J_{LLE2}(y) = \sum_i \big\| y_i - \sum_j W_{ij} y_j \big\|^2    (102)

The basis for the cost function is the same—locally linear reconstruction errors—but here, the weights W are fixed and the coordinates y are optimized.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 94 / 101

Non-linear Dimension Reduction Locally Linear Embedding Stage 2: Neighborhood Preserving Mapping Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

This defines a quadratic form:

J_{LLE2}(y) = \sum_i \big\| y_i - \sum_j W_{ij} y_j \big\|^2    (103)

            = \sum_{ij} M_{ij} \, (y_i^T y_j)    (104)

where

M_{ij} = \delta_{ij} - W_{ij} - W_{ji} + \sum_k W_{ki} W_{kj}    (105)

with \delta_{ij} equal to 1 if i = j and 0 otherwise (the Kronecker delta).

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 95 / 101

Non-linear Dimension Reduction Locally Linear Embedding Stage 2: Neighborhood Preserving Mapping Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

To remove the degree of freedom for translation, we enforce the coordinates to be centered at the origin:

\sum_i y_i = 0    (106)

To avoid degenerate solutions, we constrain the embedding vectors to have unit covariance, with their outer products satisfying

\frac{1}{n} \sum_i y_i y_i^T = I    (107)

The optimal embedding is found from the bottom d + 1 eigenvectors of M, i.e., the d + 1 eigenvectors corresponding to its smallest d + 1 eigenvalues. The bottom eigenvector is the constant unit vector that corresponds to the free translation; it is discarded, and the remaining d eigenvectors give the embedding coordinates.
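A sketch of the embedding step in Eqs. (102)-(107) (assuming the full n x n weight matrix W from the previous stage; a dense eigendecomposition is used here for clarity, whereas a sparse eigensolver would normally be used; all names are illustrative):

import numpy as np

def lle_embedding(W, d):
    n = W.shape[0]
    # M = (I - W)^T (I - W), whose entries are exactly Eq. (105):
    # M_ij = delta_ij - W_ij - W_ji + sum_k W_ki W_kj
    I = np.eye(n)
    M = (I - W).T @ (I - W)
    evals, evecs = np.linalg.eigh(M)      # eigenvalues in ascending order
    # Assuming the zero eigenvalue is simple, the bottom eigenvector is the constant vector
    # (free translation); discard it and keep the next d eigenvectors. Their columns sum to
    # zero (Eq. (106)) and are orthonormal, so up to a factor of sqrt(n) they satisfy Eq. (107).
    return evecs[:, 1:d + 1]              # row i is the embedding coordinate y_i

# Usage sketch: Y = lle_embedding(W, d=2) for a W built from the reconstruction weights.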

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 96 / 101

Non-linear Dimension Reduction Locally Linear Embedding Putting It Together Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

LLE ALGORITHM

1. Compute the neighbors of each data point, Xi.
2. Compute the weights Wij that best reconstruct each data point from its neighbors, minimizing the reconstruction cost (Eq. (95) here) by constrained linear fits.
3. Compute the vectors Yi best reconstructed by the weights Wij, minimizing the quadratic form (Eq. (102) here) by its bottom nonzero eigenvectors.

Figure 2: Summary of the LLE algorithm, mapping high dimensional data points Xi to low dimensional embedding vectors Yi.

LLE illustrates a general principle of manifold learning, elucidated by Tenenbaum et al [11], that overlapping local neighborhoods, collectively analyzed, can provide information about global geometry. (From Saul and Roweis 2001)
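Putting the three steps together as code (a sketch that simply composes the illustrative helpers from the earlier sketches, knn_indices, reconstruction_weights, and lle_embedding; none of these names come from the paper):

import numpy as np

def lle(X, K=12, d=2):
    # Step 1: neighbors of each data point.
    n = X.shape[0]
    nbrs = knn_indices(X, K)
    # Step 2: weights that best reconstruct each point from its neighbors (constrained fits).
    W = np.zeros((n, n))
    for i in range(n):
        W[i, nbrs[i]] = reconstruction_weights(X[i], X[nbrs[i]])
    # Step 3: embedding coordinates from the bottom nonzero eigenvectors of M.
    return lle_embedding(W, d)

# Usage sketch: Y = lle(X, K=12, d=2), e.g., on points sampled from the manifold of Figure 1.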

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 97 / 101

Non-linear Dimension Reduction Locally Linear Embedding Example 1 Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

K=12.


Figure 1: The problem of nonlinear dimensionality reduction, as illustrated for three dimensional data (B) sampled from a two dimensional manifold (A). An unsupervised learning algorithm must discover the global internal coordinates of the manifold without signals that explicitly indicate how the data should be embedded in two dimensions. The shading in (C) illustrates the neighborhood-preserving mapping discovered by LLE.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 98 / 101

Non-linear Dimension Reduction Locally Linear Embedding Example 2 Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

K=4.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 99 / 101

Figure 3: The results of PCA (top) and LLE (bottom), applied to images of a single face translated across a two-dimensional background of noise. Note how LLE maps the images with corner faces to the corners of its two dimensional embedding, while PCA fails to preserve the neighborhood structure of nearby images.

Non-linear Dimension Reduction Locally Linear Embedding Example 3 Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001

K=16.

If the lip images described a nearly linear manifold, these two methods would yield similar results; thus the significant differences in these embeddings reveal the presence of nonlinear structure.

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 100 / 101

Figure 4: Images of lips mapped into the embedding space described by the first two coordinates of PCA (top) and LLE (bottom). Representative lips are shown next to circled points in different parts of each space. The differences between the two embeddings indicate the presence of nonlinear structure in the data.


Non-linear Dimension Reduction Locally Linear Embedding LLE Complexity Source: Saul and Roweis, An Introduction to Locally Linear Embedding, 2001


Computing nearest neighbors scales as O(Dn^2) in the worst case. But, in many situations, space partitioning methods can be used to find the K nearest neighbors in O(n log n) time.
Computing the weights is O(DnK^3).
Computing the projection is O(dn^2).
All matrix computations are on very sparse matrices and can thus be implemented quite efficiently.
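As a concrete example of the space-partitioning idea (a sketch assuming SciPy is available; the k-d tree used here is one such structure, effective when D is small or moderate, and all names and data are illustrative):

import numpy as np
from scipy.spatial import cKDTree

rng = np.random.default_rng(3)
X = rng.random((2000, 3))            # n = 2000 points in D = 3

tree = cKDTree(X)                    # build the tree once
dist, idx = tree.query(X, k=13)      # query K + 1 = 13, since each point is its own nearest neighbor
nbrs = idx[:, 1:]                    # drop the self-match, leaving K = 12 neighbors per point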

JJ Corso (University of Michigan) Eigenfaces and Fisherfaces 101 / 101