
Face Recognition Using Eigenfaces

Matthew A. Turk and Alex P. Pentland
Vision and Modeling Group, The Media Laboratory
Massachusetts Institute of Technology

Abstract

We present an approach to the detection and identification of human faces and describe a working, near-real-time face recognition system which tracks a subject's head and then recognizes the person by comparing characteristics of the face to those of known individuals. Our approach treats face recognition as a two-dimensional recognition problem, taking advantage of the fact that faces are normally upright and thus may be described by a small set of 2-D characteristic views. Face images are projected onto a feature space ("face space") that best encodes the variation among known face images. The face space is defined by the "eigenfaces", which are the eigenvectors of the set of faces; they do not necessarily correspond to isolated features such as eyes, ears, and noses. The framework provides the ability to learn to recognize new faces in an unsupervised manner.

1 Introduction

Developing a computational model of face recognition is quite difficult, because faces are complex, multidimensional, and meaningful visual stimuli. They are a natural class of objects, and stand in stark contrast to sine wave gratings, the "blocks world", and other artificial stimuli used in human and machine vision research [1]. Thus unlike most early visual functions, for which we may construct detailed models of retinal or striate activity, face recognition is a very high level task for which computational approaches can currently only suggest broad constraints on the corresponding neural activity.

We therefore focused our research towards developing a sort of early, preattentive pattern recognition capability that does not depend upon having full three-dimensional models or detailed geometry. Our aim was to develop a computational model of face recognition which is fast, reasonably simple, and accurate in constrained environments such as an office or a household.

Although face recognition is a high level visual problem, there is quite a bit of structure imposed on the task. We take advantage of some of this structure by proposing a scheme for recognition which is based on an information theory approach, seeking to encode the most relevant information in a group of faces which will best distinguish them from one another. The approach transforms face images into a small set of characteristic feature images, called "eigenfaces", which are the principal components of the initial training set of face images. Recognition is performed by projecting a new image into the subspace spanned by the eigenfaces ("face space") and then classifying the face by comparing its position in face space with the positions of known individuals. Automatically learning and later recognizing new faces is practical within this framework. Recognition under reasonably varying conditions is achieved by training on a limited number of characteristic views (e.g., a "straight on" view, a 45° view, and a profile view). The approach has advantages over other face recognition schemes in its speed and simplicity, learning capacity, and relative insensitivity to small or gradual changes in the face image.

1.1 Background and related work

Much of the work in computer recognition of faces has focused on detecting individual features such as the eyes, nose, mouth, and head outline, and defining a face model by the position, size, and relationships among these features. Beginning with Bledsoe's [2] and Kanade's [3] early systems, a number of automated or semi-automated face recognition strategies have modeled and classified faces based on normalized distances and ratios among feature points. This general approach has recently been continued and improved by Yuille et al. [4].

Such approaches have proven difficult to extend to multiple views, and have often been quite fragile. Research in human strategies of face recognition, moreover, has shown that individual features and their immediate relationships comprise an insufficient representation to account for the performance of adult human face identification [5]. Nonetheless, this approach to face recognition remains the most popular one in the computer vision literature.

Connectionist approaches to face identification seek to capture the configurational, or gestalt-like, nature of the task. Fleming and Cottrell [6], building on earlier work by Kohonen and Lehtiö [7], use nonlinear units to train a network via back propagation to classify face images. Stonham's WISARD system [8] has been applied with some success to binary face images, recognizing both identity and expression. Most connectionist systems dealing with faces treat the input image as a general 2-D pattern,

586 CH2983-5/91/0000/0586/$01.00 © 1991 IEEE

and can make no explicit use of the configurational properties of a face. Only very simple systems have been explored to date, and it is unclear how they will scale to larger problems.

Recent work by Burt et al. uses a "smart sensing" approach based on multiresolution template matching [9]. This coarse-to-fine strategy uses a special-purpose computer built to calculate multiresolution pyramid images quickly, and has been demonstrated identifying people in near-real-time. The face models are built by hand from face images.

2 Eigenfaces for Recognition

Much of the previous work on automated face recognition has ignored the issue of just what aspects of the face stimulus are important for identification, assuming that predefined measurements were relevant and sufficient. This suggested to us that an information theory approach of coding and decoding face images may give insight into the information content of face images, emphasizing the significant local and global "features". Such features may or may not be directly related to our intuitive notion of face features such as the eyes, nose, lips, and hair.

In the language of information theory, we want to extract the relevant information in a face image, encode it as efficiently as possible, and compare one face encoding with a database of models encoded similarly. A simple approach to extracting the information contained in an image of a face is to somehow capture the variation in a collection of face images, independent of any judgement of features, and use this information to encode and compare individual face images.

In mathematical terms, we wish to find the principal components of the distribution of faces, or the eigenvectors of the covariance matrix of the set of face images. These eigenvectors can be thought of as a set of features which together characterize the variation between face images. Each image location contributes more or less to each eigenvector, so that we can display the eigenvector as a sort of ghostly face which we call an eigenface. Some of these faces are shown in Figure 2.

The idea of using eigenfaces was motivated by a technique developed by Sirovich and Kirby [10] for efficiently representing pictures of faces using principal component analysis. They argued that a collection of face images can be approximately reconstructed by storing a small collection of weights for each face and a small set of standard pictures.

It occurred to us that if a multitude of face images can be reconstructed by weighted sums of a small collection of characteristic images, then an efficient way to learn and recognize faces might be to build the characteristic features from known face images and to recognize particular faces by comparing the feature weights needed to (approximately) reconstruct them with the weights associated with the known individuals.

The following steps summarize the recognition process:

1. Initialization: Acquire the training set of face images and calculate the eigenfaces, which define the face space.

2. When a new face image is encountered, calculate a set of weights based on the input image and the M eigenfaces by projecting the input image onto each of the eigenfaces.

3. Determine if the image is a face at all (whether known or unknown) by checking to see if the image is sufficiently close to "face space."

4. If it is a face, classify the weight pattern as either a known person or as unknown.

5. (Optional) If the same unknown face is seen several times, calculate its characteristic weight pattern and incorporate it into the known faces (i.e., learn to recognize it).

2.1 Calculating Eigenfaces

Let a face image Γ(x, y) be a two-dimensional N by N array of intensity values, or a vector of dimension N². A typical image of size 256 by 256 describes a vector of dimension 65,536, or, equivalently, a point in 65,536-dimensional space. An ensemble of images, then, maps to a collection of points in this huge space.

Images of faces, being similar in overall configuration, will not be randomly distributed in this huge image space and thus can be described by a relatively low dimensional subspace. The main idea of the principal component analysis (or Karhunen-Loève expansion) is to find the vectors which best account for the distribution of face images within the entire image space. These vectors define the subspace of face images, which we call "face space". Each vector is of length N², describes an N by N image, and is a linear combination of the original face images. Because these vectors are the eigenvectors of the covariance matrix corresponding to the original face images, and because they are face-like in appearance, we refer to them as "eigenfaces." Some examples of eigenfaces are shown in Figure 2.

Each face image in the training set can be represented exactly in terms of a linear combination of the eigenfaces. The number of possible eigenfaces is equal to the number of face images in the training set. However, the faces can also be approximated using only the "best" eigenfaces: those that have the largest eigenvalues, and which therefore account for the most variance within the set of face images. The primary reason for using fewer eigenfaces is computational efficiency. The best M' eigenfaces span an M'-dimensional subspace ("face space") of all possible images. As sinusoids of varying frequency and phase are the basis functions of a Fourier decomposition (and are in fact eigenfunctions of linear systems), the eigenfaces are the basis vectors of the eigenface decomposition.

Figure 3: Three images and their projections onto the face space defined by the eigenfaces of Figure 2.
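The five-step process above can be sketched end to end. The following is a minimal illustration on synthetic data, not the authors' implementation: random vectors stand in for face images, the eigenfaces are computed with an SVD shortcut rather than the explicit covariance construction, and all variable names are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

# Step 1 (initialization): a synthetic stand-in for a training set of
# M face images, each flattened to a vector of N*N pixel intensities.
M, N = 16, 32
faces = rng.random((M, N * N))

mean_face = faces.mean(axis=0)            # the average face
A = (faces - mean_face).T                 # columns are mean-subtracted faces

# Eigenfaces are the principal components; an SVD of A yields them directly.
U, S, _ = np.linalg.svd(A, full_matrices=False)
M_prime = 7                               # keep the 7 "best" eigenfaces
eigenfaces = U[:, :M_prime]               # (N*N, M') basis of face space

# Step 2: project a new image onto each eigenface to obtain its weights.
new_image = faces[3] + 0.01 * rng.standard_normal(N * N)  # noisy view of person 3
weights = eigenfaces.T @ (new_image - mean_face)

# Step 3: distance from face space indicates whether this is a face at all.
reconstruction = mean_face + eigenfaces @ weights
dist_from_face_space = np.linalg.norm(new_image - reconstruction)

# Step 4: classify by comparing against each known individual's weights.
known_weights = eigenfaces.T @ A          # (M', M), one column per person
class_dists = np.linalg.norm(known_weights - weights[:, None], axis=0)
best = int(np.argmin(class_dists))        # index of the nearest known face class
```

Step 5 (learning an unknown face) would amount to appending a new column of characteristic weights to `known_weights`; in a real system, thresholds on `dist_from_face_space` and `class_dists` would gate steps 3 and 4.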

Figure 1: (a) Face images used as the training set. (b) The average face Ψ.

Let the training set of face images be Γ_1, Γ_2, Γ_3, ..., Γ_M. The average face of the set is defined by Ψ = (1/M) Σ_{n=1}^{M} Γ_n. Each face differs from the average by the vector Φ_i = Γ_i − Ψ. An example training set is shown in Figure 1(a), with the average face Ψ shown in Figure 1(b). This set of very large vectors is then subject to principal component analysis, which seeks a set of M orthonormal vectors u_k and their associated eigenvalues λ_k which best describe the distribution of the data. The vectors u_k and scalars λ_k are the eigenvectors and eigenvalues, respectively, of the covariance matrix

    C = (1/M) Σ_{n=1}^{M} Φ_n Φ_n^T = A A^T

where the matrix A = [Φ_1 Φ_2 ... Φ_M]. The matrix C, however, is N² by N², and determining the N² eigenvectors and eigenvalues is an intractable task for typical image sizes. We need a computationally feasible method to find these eigenvectors. Fortunately we can determine the eigenvectors by first solving a much smaller M by M matrix problem, and taking linear combinations of the resulting vectors. (See [11] for the details.)

With this analysis the calculations are greatly reduced, from the order of the number of pixels in the images (N²) to the order of the number of images in the training set (M). In practice, the training set of face images will be relatively small (M << N²), and the calculations become quite manageable. The associated eigenvalues allow us to rank the eigenvectors according to their usefulness in characterizing the variation among the images. Figure 2 shows the top seven eigenfaces derived from the input images of Figure 1. Normally the background is removed by cropping training images, so that the eigenfaces have zero values outside of the face area.

2.2 Using Eigenfaces to classify a face image

Once the eigenfaces are created, identification becomes a pattern recognition task. The eigenfaces span an M'-dimensional subspace of the original N² image space. The M' significant eigenvectors of the L matrix are chosen as those with the largest associated eigenvalues. In many of our test cases, based on M = 16 face images, M' = 7 eigenfaces were used. The number of eigenfaces to be used is chosen heuristically based on the eigenvalues.

A new face image (Γ) is transformed into its eigenface components (projected into "face space") by a simple operation, ω_k = u_k^T (Γ − Ψ) for k = 1, ..., M'. This describes a set of point-by-point image multiplications and summations. Figure 3 shows three images and their projections into the seven-dimensional face space.

The weights form a vector Ω^T = [ω_1 ω_2 ... ω_{M'}] that describes the contribution of each eigenface in representing the input face image, treating the
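The reduction to an M by M problem can be made concrete. One common route, consistent with the construction detailed in [11] but reconstructed here as an assumption rather than quoted from it: form L = A^T A, take its eigenvectors v, and map them back as u = A v, which are eigenvectors of A A^T with the same eigenvalues (constant factors in the covariance definition rescale eigenvalues, not eigenvectors).

```python
import numpy as np

rng = np.random.default_rng(1)
M, N2 = 16, 4096                  # 16 training faces of 64x64 = 4096 pixels

Gamma = rng.random((M, N2))       # stand-ins for the training faces
Psi = Gamma.mean(axis=0)          # average face
A = (Gamma - Psi).T               # N2 x M matrix of difference vectors Phi_n

# Instead of the huge N2 x N2 matrix A A^T, solve the small M x M problem.
L = A.T @ A
lam, V = np.linalg.eigh(L)        # eigenvalues returned in ascending order
order = np.argsort(lam)[::-1]     # re-sort descending by eigenvalue
lam, V = lam[order], V[:, order]

M_prime = 7
U = A @ V[:, :M_prime]            # map back: u = A v
U /= np.linalg.norm(U, axis=0)    # unit-normalize; columns are the eigenfaces

# Check: u_0 really is an eigenvector of A A^T with eigenvalue lam[0].
residual = np.linalg.norm(A @ (A.T @ U[:, 0]) - lam[0] * U[:, 0])

# Projection into face space: omega_k = u_k^T (Gamma_new - Psi), k = 1..M'.
omega = U.T @ (Gamma[0] - Psi)
```

The eigendecomposition now costs on the order of M³ rather than N⁶ operations, which is what makes training on ordinary hardware feasible.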

Figure 2: Seven of the eigenfaces calculated from the images of Figure 1, without the background removed.

eigenfaces as a basis set for face images. The vector is used to find which of a number of pre-defined face classes, if any, best describes the face. The simplest method for determining which face class provides the best description of an input face image is to find the face class k that minimizes the Euclidian distance ε_k = ||Ω − Ω_k||, where Ω_k is a vector describing the kth face class. A face is classified as belonging to class k when the minimum ε_k is below some chosen threshold θ_ε. Otherwise the face is classified as "unknown".

2.3 Using Eigenfaces to detect faces

We can also use knowledge of the face space to detect and locate faces in single images. This allows us to recognize the presence of faces apart from the task of identifying them.

Creating the vector of weights for an image is equivalent to projecting the image onto the low-dimensional face space. The distance ε between the image and its projection onto the face space is simply the distance between the mean-adjusted input image Φ = Γ − Ψ and Φ_f = Σ_{k=1}^{M'} ω_k u_k, its projection onto face space.

As seen in Figure 3, images of faces do not change radically when projected into the face space, while the projections of non-face images appear quite different. This basic idea is used to detect the presence of faces in a scene: at every location in the image, calculate the distance ε between the local subimage and face space. This distance from face space is used as a measure of "faceness", so the result of calculating the distance from face space at every point in the image is a "face map" ε(x, y). Figure 4 shows an image and its face map; low values (the dark area) indicate the presence of a face. There is a distinct minimum in the face map corresponding to the location of the face in the image.

Figure 4: (a) Original image. (b) Face map, where low values (dark areas) indicate the presence of a face.

Unfortunately, direct calculation of this distance measure is rather expensive. We have therefore developed a simpler, more efficient method of calculating the face map ε(x, y), which is described in [11].

2.4 Face space revisited

An image of a face, and in particular the faces in the training set, should lie near the face space, which in general describes images that are "face-like". In other words, the projection distance ε should be within some threshold θ_δ. Images of known individuals should project to near the corresponding face class, i.e., ε_k < θ_ε. Thus there are four possibilities for an input image and its pattern vector: (1) near face space and near a face class; (2) near face space but not near a known face class; (3) distant from face space and near a face class; and (4) distant from face space and not near a known face class. Figure 5 shows these four options for the simple example of two eigenfaces.

Figure 5: A simplified version of face space to illustrate the four results of projecting an image into face space. In this case, there are two eigenfaces (u_1 and u_2) and three known individuals (Ω_1, Ω_2, and Ω_3).

In the first case, an individual is recognized and identified. In the second case, an unknown individual is present. The last two cases indicate that the image is not a face image. Case three typically shows up as a false positive in most recognition systems; in our framework, however, the false recognition may be detected because of the significant distance between the image and the subspace of expected face images. Figure 3 shows some images and their projections into face space. Figure 3(a) and (b) are examples of case 1, while Figure 3(c) illustrates case 4.

In our current system calculation of the eigenfaces is done offline as part of the training. The recognition currently takes about 350 msec running rather inefficiently in Lisp on a Sun Sparcstation 1, using face images of size 128x128.
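The classification rule of Section 2.2 together with the four cases of Section 2.4 reduces to a few comparisons. A minimal sketch follows; the function and the example values are ours, while the quantities ε_k, ε, θ_ε, and θ_δ follow the text.

```python
import numpy as np

def classify(omega, omega_classes, eps_face_space, theta_eps, theta_delta):
    """Decision rule paraphrased from Sections 2.2 and 2.4 (our sketch).

    omega          : weight vector of the input image
    omega_classes  : weight vectors describing the known face classes
    eps_face_space : distance between the image and its face-space projection
    theta_eps      : threshold on the distance to a face class
    theta_delta    : threshold on the distance from face space
    """
    dists = [np.linalg.norm(omega - ok) for ok in omega_classes]
    k = int(np.argmin(dists))
    near_space = eps_face_space < theta_delta
    near_class = dists[k] < theta_eps

    if near_space and near_class:
        return f"recognized as individual {k}"   # case 1
    if near_space:
        return "unknown face"                    # case 2
    return "not a face"                          # cases 3 and 4

# Toy example with two eigenface weights and two known individuals.
omega_classes = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
print(classify(np.array([0.1, -0.1]), omega_classes, 0.5, 1.0, 2.0))
```

Note that case 3 (near a face class but far from face space) is rejected here rather than accepted, which is exactly how the framework suppresses the false positives discussed above.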

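The face map ε(x, y) of Section 2.3 can be computed directly, in the expensive way the text describes; the faster method of [11] is not reproduced here. Below is a self-contained sketch on synthetic data, with all names ours: a patch lying exactly in a toy "face space" is planted in a noise image, and the map's minimum recovers its location.

```python
import numpy as np

def face_map(image, eigenfaces, mean_face, patch):
    """Direct (slow) distance-from-face-space map of Section 2.3 (our sketch)."""
    H, W = image.shape
    out = np.empty((H - patch + 1, W - patch + 1))
    for y in range(out.shape[0]):
        for x in range(out.shape[1]):
            phi = image[y:y+patch, x:x+patch].ravel() - mean_face
            proj = eigenfaces @ (eigenfaces.T @ phi)  # projection onto face space
            out[y, x] = np.linalg.norm(phi - proj)    # epsilon(x, y)
    return out

rng = np.random.default_rng(2)
p = 8
train = rng.random((10, p * p))                       # toy training patches
mean_face = train.mean(axis=0)
U, _, _ = np.linalg.svd((train - mean_face).T, full_matrices=False)
eig = U[:, :4]                                        # four toy eigenfaces

img = rng.random((24, 24))                            # background noise image
img[5:5+p, 7:7+p] = (mean_face + eig @ np.array([1.0, -0.5, 0.3, 0.2])).reshape(p, p)

fm = face_map(img, eig, mean_face, p)
loc = np.unravel_index(np.argmin(fm), fm.shape)       # location of best "faceness"
```

This brute-force scan performs one projection per pixel location, which is why the paper resorts to the cheaper formulation in [11] for real images.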
3 Recognition Experiments

To assess the viability of this approach to face recognition, we have performed experiments with stored face images and built a system to locate and recognize faces in a dynamic environment. We first created a large database of face images collected under a wide range of imaging conditions. Using this database we have conducted several experiments to assess the performance under known variations of lighting, scale, and orientation.

The images from Figure 1(a) were taken from a database of over 2500 face images digitized under controlled conditions. Sixteen subjects were digitized at all combinations of three head orientations, three head sizes or scales, and three lighting conditions. A six level Gaussian pyramid was constructed for each image, resulting in image resolutions from 512x512 pixels down to 16x16 pixels.

In the first experiment the effects of varying lighting, size, and head orientation were investigated using the complete database of 2500 images. Various groups of sixteen images were selected and used as the training set. Within each training set there was one image of each person, all taken under the same conditions of lighting, image size, and head orientation. All images in the database were then classified as being one of these sixteen individuals; no faces were rejected as unknown.

Statistics were collected measuring the mean accuracy as a function of the difference between the training conditions and the test conditions. In the case of infinite θ_ε and θ_δ, the system achieved approximately 96% correct classification averaged over lighting variation, 85% correct averaged over orientation variation, and 64% correct averaged over size variation.

In a second experiment the same procedures were followed, but the acceptance threshold θ_ε was also varied. At low values of θ_ε, only images which project very closely to the known face classes (cases 1 and 3 in Figure 5) will be recognized, so there will be few errors but many of the images will be rejected as unknown. At high values of θ_ε most images will be classified, but there will be more errors. Adjusting θ_ε to achieve 100% accurate recognition boosted the unknown rates to 19% while varying lighting, 39% for orientation, and 60% for size. Setting the unknown rate arbitrarily to 20% resulted in correct recognition rates of 100%, 94%, and 74%, respectively.

These experiments show an increase of performance accuracy as the acceptance threshold decreases. This can be tuned to achieve effectively perfect recognition as the threshold tends to zero, but at the cost of many images being rejected as unknown. The tradeoff between rejection rate and recognition accuracy will be different for each of the various face recognition applications.

The results also indicate that changing lighting conditions causes relatively few errors, while performance drops dramatically with size change. This is not surprising, since under lighting changes alone the neighborhood pixel correlation remains high, but under size changes the correlation from one image to another is quite low. It is clear that there is a need for a multiscale approach, so that faces at a particular size are compared with one another.

4 Real-time recognition

People are constantly moving. Even while sitting, we fidget and adjust our body position, blink, look around, and so on. For the case of a moving person in a static environment, we built a simple motion detection and tracking system, depicted in Figure 6, which locates and tracks the position of the head. Simple spatio-temporal filtering followed by a nonlinearity accentuates image locations that change in intensity over time, so a moving person "lights up" in the filtered image.

Figure 6: The head tracking and locating system.

After thresholding the filtered image to produce a binary motion image, we analyze the "motion blobs" over time to decide if the motion is caused by a person moving and to determine head position. A few simple rules are applied, such as "the head is the small upper blob above a larger blob (the body)", and "head motion must be reasonably slow and contiguous" (heads aren't expected to jump around the image erratically). Figure 7 shows an image with the head located, along with the path of the head in the preceding sequence of frames.

We have used the techniques described above to build a system which locates and recognizes faces in near-real-time in a reasonably unstructured environment. When the motion detection and analysis

programs find a head, a subimage, centered on the head, is sent to the face recognition module. Using the distance-from-face-space measure ε, the image is either rejected as not a face, recognized as one of a group of familiar faces, or determined to be an unknown face. Recognition occurs in this system at rates of up to two or three times per second.

Figure 7: The head has been located; the image in the box is sent to the face recognition process. Also shown is the path of the head tracked over several previous frames.

5 Further Issues and Conclusion

We are currently extending the system to deal with a range of aspects (other than full frontal views) by defining a small number of face classes for each known person corresponding to characteristic views. Because of the speed of the recognition, the system has many chances within a few seconds to attempt to recognize many slightly different views, at least one of which is likely to fall close to one of the characteristic views.

An intelligent system should also have an ability to adapt over time. Reasoning about images in face space provides a means to learn and subsequently recognize new faces in an unsupervised manner. When an image is sufficiently close to face space (i.e., it is face-like) but is not classified as one of the familiar faces, it is initially labeled as "unknown". The computer stores the pattern vector and the corresponding unknown image. If a collection of "unknown" pattern vectors cluster in the pattern space, the presence of a new but unidentified face is postulated.

A noisy image or partially occluded face should cause recognition performance to degrade gracefully, since the system essentially implements an autoassociative memory for the known faces (as described in [7]). This is evidenced by the projection of the occluded face image of Figure 3(b).

The eigenface approach to face recognition was motivated by information theory, leading to the idea of basing face recognition on a small set of image features that best approximate the set of known face images, without requiring that they correspond to our intuitive notions of facial parts and features. Although it is not an elegant solution to the general object recognition problem, the eigenface approach does provide a practical solution that is well fitted to the problem of face recognition. It is fast, relatively simple, and has been shown to work well in a somewhat constrained environment.

References

[1] Davies, Ellis, and Shepherd (eds.), Perceiving and Remembering Faces, Academic Press, London, 1981.

[2] W. W. Bledsoe, "The model method in facial recognition," Panoramic Research Inc., Palo Alto, CA, Rep. PRI:15, Aug. 1966.

[3] T. Kanade, "Picture processing system by computer complex and recognition of human faces," Dept. of Information Science, Kyoto University, Nov. 1973.

[4] A. L. Yuille, D. S. Cohen, and P. W. Hallinan, "Feature extraction from faces using deformable templates," Proc. CVPR, San Diego, CA, June 1989.

[5] S. Carey and R. Diamond, "From piecemeal to configurational representation of faces," Science, Vol. 195, Jan. 21, 1977, 312-313.

[6] M. Fleming and G. Cottrell, "Categorization of faces using unsupervised feature extraction," Proc. IJCNN-90, Vol. 2.

[7] T. Kohonen and P. Lehtiö, "Storage and processing of information in distributed associative memory systems," in G. E. Hinton and J. A. Anderson (eds.), Parallel Models of Associative Memory, Hillsdale, NJ: Lawrence Erlbaum Associates, 1981, pp. 105-143.

[8] T. J. Stonham, "Practical face recognition and verification with WISARD," in H. Ellis, M. Jeeves, F. Newcombe, and A. Young (eds.), Aspects of Face Processing, Martinus Nijhoff Publishers, Dordrecht, 1986.

[9] P. Burt, "Smart sensing within a Pyramid Vision Machine," Proc. IEEE, Vol. 76, No. 8, Aug. 1988.

[10] L. Sirovich and M. Kirby, "Low-dimensional procedure for the characterization of human faces," J. Opt. Soc. Am. A, Vol. 4, No. 3, March 1987, 519-524.

[11] M. Turk and A. Pentland, "Eigenfaces for Recognition," Journal of Cognitive Neuroscience, March 1991.
