MIT Media Laboratory Perceptual Computing Section Technical Report No. 443

Appears in: The 3rd IEEE Int'l Conference on Automatic Face & Gesture Recognition, Nara, Japan, April 1998

Beyond Eigenfaces:

Probabilistic Matching for Face Recognition

Baback Moghaddam, Wasiuddin Wahid and Alex Pentland

MIT Media Laboratory, 20 Ames St., Cambridge, MA 02139, USA.

Email: {baback, wasi, sandy}@media.mit.edu

Abstract

We propose a novel technique for direct visual matching of images for the purposes of face recognition and database search. Specifically, we argue in favor of a probabilistic measure of similarity, in contrast to simpler methods which are based on standard L_2 norms (e.g., template matching) or subspace-restricted norms (e.g., eigenspace matching). The proposed similarity measure is based on a Bayesian analysis of image differences: we model two mutually exclusive classes of variation between two facial images: intra-personal (variations in appearance of the same individual, due to different expressions or lighting) and extra-personal (variations in appearance due to a difference in identity). The high-dimensional probability density functions for each respective class are then obtained from training data using an eigenspace density estimation technique and subsequently used to compute a similarity measure based on the a posteriori probability of membership in the intra-personal class, which is used to rank matches in the database. The performance advantage of this probabilistic matching technique over standard nearest-neighbor eigenspace matching is demonstrated using results from ARPA's 1996 "FERET" face recognition competition, in which this algorithm was found to be the top performer.

1 Introduction

Current approaches to image matching for visual object recognition and image database retrieval often make use of simple image similarity metrics such as Euclidean distance or normalized correlation, which correspond to a standard template-matching approach to recognition [2]. For example, in its simplest form, the similarity measure S(I_1, I_2) between two images I_1 and I_2 can be set to be inversely proportional to the norm ||I_1 - I_2||. Such a simple formulation suffers from a major drawback: it does not exploit knowledge of which type of variations are critical (as opposed to incidental) in expressing similarity. In this paper, we formulate a probabilistic similarity measure which is based on the probability that the image intensity differences, denoted by Δ = I_1 - I_2, are characteristic of typical variations in appearance of the same object. For example, for purposes of face recognition, we can define two classes of facial image variations: intrapersonal variations Ω_I (corresponding, for example, to different facial expressions of the same individual) and extrapersonal variations Ω_E (corresponding to variations between different individuals). Our similarity measure is then expressed in terms of the probability

    S(I_1, I_2) = P(Δ ∈ Ω_I) = P(Ω_I | Δ)    (1)

where P(Ω_I | Δ) is the a posteriori probability given by Bayes rule, using estimates of the likelihoods P(Δ | Ω_I) and P(Δ | Ω_E) which are derived from training data using an efficient subspace method for density estimation of high-dimensional data [6]. This Bayesian (MAP) approach can also be viewed as a generalized nonlinear extension of Linear Discriminant Analysis (LDA) [8, 3] or "FisherFace" techniques [1] for face recognition. Moreover, our nonlinear generalization has distinct computational/storage advantages over these linear methods for large databases.

2 Analysis of Intensity Differences

We now consider the problem of characterizing the type of differences which occur when matching two images in a face recognition task. We define two distinct and mutually exclusive classes: Ω_I representing intrapersonal variations between multiple images of the same individual (e.g., with different expressions and lighting conditions), and Ω_E representing extrapersonal variations which result when matching two different individuals. We will assume that both classes are Gaussian-distributed and seek to obtain estimates of the likelihood functions P(Δ | Ω_I) and P(Δ | Ω_E) for a given intensity difference Δ = I_1 - I_2.

Given these likelihoods we can define the similarity score S(I_1, I_2) between a pair of images directly in terms of the intrapersonal a posteriori probability as given by Bayes rule:

    S = P(Ω_I | Δ)
      = P(Δ | Ω_I) P(Ω_I) / [ P(Δ | Ω_I) P(Ω_I) + P(Δ | Ω_E) P(Ω_E) ]    (2)

where the priors P(Ω) can be set to reflect specific operating conditions (e.g., number of test images vs. the size of the database) or other sources of a priori knowledge regarding the two images being matched. Additionally, this particular Bayesian formulation casts the standard face recognition task (essentially an M-ary classification problem for M individuals) into a binary pattern classification problem with Ω_I and Ω_E. This much simpler problem is then solved using the maximum a posteriori (MAP) rule: two images are determined to belong to the same individual if P(Ω_I | Δ) > P(Ω_E | Δ), or equivalently, if S(I_1, I_2) > 1/2.
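In code, the similarity of Eqs. (1)-(2) and the MAP decision rule reduce to a few lines. The sketch below is illustrative Python: it assumes the two class-conditional likelihoods P(Δ | Ω_I) and P(Δ | Ω_E) are already available as callables (the function names are hypothetical, not from the paper).

```python
import numpy as np

def map_similarity(delta, p_intra, p_extra, prior_intra=0.5):
    """S(I1, I2) = P(Omega_I | delta) via Bayes rule (Eq. 2)."""
    prior_extra = 1.0 - prior_intra
    num = p_intra(delta) * prior_intra          # P(delta|Omega_I) P(Omega_I)
    den = num + p_extra(delta) * prior_extra    # total evidence
    return num / den

def same_individual(i1, i2, p_intra, p_extra):
    """MAP rule: same individual iff S(I1, I2) > 1/2."""
    delta = i1 - i2                             # intensity difference Delta
    return map_similarity(delta, p_intra, p_extra) > 0.5
```

With equal priors the rule reduces to a straight likelihood comparison, which is why the intrapersonal and extrapersonal densities alone determine the decision surface.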

2.1 Density Modeling

One difficulty with this approach is that the intensity difference vector is very high-dimensional, with Δ ∈ R^N and typically N = O(10^4). Therefore we typically lack sufficient independent training observations to compute reliable 2nd-order statistics for the likelihood densities (i.e., singular covariance matrices will result). Even if we were able to estimate these statistics, the computational cost of evaluating the likelihoods is formidable. Furthermore, this computation would be highly inefficient since the intrinsic dimensionality or major degrees-of-freedom of Δ for each class is likely to be significantly smaller than N.

Recently, an efficient density estimation method was proposed by Moghaddam & Pentland [6] which divides the vector space R^N into two complementary subspaces using an eigenspace decomposition. This method relies on a Principal Components Analysis (PCA) [4] to form a low-dimensional estimate of the complete likelihood which can be evaluated using only the first M principal components, where M << N. This decomposition is illustrated in Figure 1, which shows an orthogonal decomposition of the vector space R^N into two mutually exclusive subspaces: the principal subspace F containing the first M principal components and its orthogonal complement F̄, which contains the residual of the expansion. The component in the orthogonal subspace F̄ is the so-called "distance-from-feature-space" (DFFS), a Euclidean distance equivalent to the PCA residual error. The component of Δ which lies in the feature space F is referred to as the "distance-in-feature-space" (DIFS) and is a Mahalanobis distance for Gaussian densities.

Figure 1: (a) Decomposition of R^N into the principal subspace F and its orthogonal complement F̄ for a Gaussian density, (b) a typical eigenvalue spectrum and its division into the two orthogonal subspaces.

As shown in [6], the complete likelihood estimate can be written as the product of two independent marginal Gaussian densities

    P̂(Δ | Ω) = [ exp( -(1/2) Σ_{i=1..M} y_i² / λ_i ) / ( (2π)^{M/2} Π_{i=1..M} λ_i^{1/2} ) ]
              · [ exp( -ε²(Δ) / (2ρ) ) / (2πρ)^{(N-M)/2} ]
              = P_F(Δ | Ω) · P̂_F̄(Δ | Ω)    (3)

where P_F(Δ | Ω) is the true marginal density in F, P̂_F̄(Δ | Ω) is the estimated marginal density in the orthogonal complement F̄, y_i are the principal components and ε²(Δ) is the residual (or DFFS). The optimal value for the weighting parameter ρ is then found to be simply the average of the F̄ eigenvalues

    ρ = 1/(N-M) Σ_{i=M+1..N} λ_i    (4)

We note that in actual practice, the majority of the F̄ eigenvalues are unknown but can be estimated, for example, by fitting a nonlinear function to the available portion of the eigenvalue spectrum and estimating the average of the eigenvalues beyond the principal subspace.
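A minimal sketch of this two-subspace estimator in Python follows. It fits PCA directly to the training difference vectors; as a simplifying assumption (in place of the nonlinear spectrum fit described above), eigenvalues that cannot be observed from the sample are treated as zero when averaging for ρ.

```python
import numpy as np

def fit_subspace_density(deltas, M):
    """Fit the two-subspace Gaussian of Eq. (3) to (n, N) difference vectors."""
    mean = deltas.mean(axis=0)
    X = deltas - mean
    # eigendecomposition of the sample covariance via SVD
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    eigvals = s**2 / len(deltas)            # lambda_1 >= lambda_2 >= ...
    N = deltas.shape[1]
    # Eq. (4): rho = average residual eigenvalue (unobserved ones taken as 0,
    # a simplification of the spectrum-fitting estimate described in the text)
    rho = eigvals[M:].sum() / (N - M) if N > M else eigvals[-1]
    return mean, Vt[:M], eigvals[:M], rho, N

def log_likelihood(delta, model, M):
    """log of Eq. (3): Mahalanobis DIFS term plus rho-weighted DFFS term."""
    mean, V, lam, rho, N = model
    x = delta - mean
    y = V @ x                               # principal components y_i
    resid = x @ x - y @ y                   # epsilon^2(delta), the DFFS
    difs = np.sum(y**2 / lam)               # DIFS (Mahalanobis) term
    logPF = -0.5 * difs - 0.5 * M * np.log(2 * np.pi) - 0.5 * np.log(lam).sum()
    logPFbar = -0.5 * resid / rho - 0.5 * (N - M) * np.log(2 * np.pi * rho)
    return logPF + logPFbar
```

Working in log-space avoids underflow, which matters since for image-sized N the two exponents are large in magnitude.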

3 Experiments

To test our recognition strategy we used a collection of images from the FERET face database. This collection of images consists of hard recognition cases that have proven difficult for all face recognition algorithms previously tested on the FERET database. The difficulty posed by this dataset appears to stem from the fact that the images were taken at different times, at different locations, and under different imaging conditions. The set of images consists of pairs of frontal views (FA/FB) and is divided into two subsets: the "gallery" (training set) and the "probes" (testing set). The gallery images consisted of 74 pairs of images (2 per individual) and the probe set consisted of 38 pairs of images, corresponding to a subset of the gallery members. The probe and gallery datasets were captured a week apart and exhibit differences in clothing, hair and lighting (see Figure 2).

Before we can apply our matching technique, we need to perform an affine alignment of these facial images. For this purpose we have used an automatic face-processing system which extracts faces from the input image and normalizes for translation, scale as well as slight rotations (both in-plane and out-of-plane). This system is described in detail in [6] and uses maximum-likelihood estimation of object location (in this case the position and scale of a face and the location of individual facial features) to geometrically align faces into standard normalized form as shown in Figure 3.

All the faces in our experiments were geometrically aligned and normalized in this manner prior to further analysis.

Figure 2: Examples of FERET frontal-view image pairs used for (a) the Gallery set (training) and (b) the Probe set (testing).

Figure 3: The face alignment system.

3.1 Eigenface Matching

As a baseline comparison, we first used an eigenface matching technique for recognition [9]. The normalized images from the gallery and the probe sets were projected onto a 100-dimensional eigenspace and a nearest-neighbor rule based on a Euclidean distance measure was used to match each probe image to a gallery image. We note that this method corresponds to a generalized template-matching method which uses a Euclidean-norm type of similarity S(I_1, I_2), which is restricted to the principal component subspace of the data. A few of the lower-order eigenfaces used for this projection are shown in Figure 4. We note that these eigenfaces represent the principal components of an entirely different set of images; i.e., none of the individuals in the gallery or probe sets were used in obtaining these eigenvectors. In other words, neither the gallery nor the probe sets were part of the "training set." The rank-1 recognition rate obtained with this method was found to be 84% (64 correct matches out of 76), and the correct match was always in the top 10 nearest neighbors. Note that this performance is better than or similar to recognition rates obtained by any algorithm tested on this database, and that it is lower (by about 10%) than the typical rates that we have obtained with the FERET database [5]. We attribute this lower performance to the fact that these images were selected to be particularly challenging. In fact, using an eigenface method to match the first views of the 76 individuals in the gallery to their second views, we obtain a higher recognition rate of 89% (68 out of 76), suggesting that the gallery images represent a less challenging data set since these images were taken at the same time and under identical lighting conditions.

Figure 4: Standard Eigenfaces.
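The eigenface baseline can be sketched as follows. This is illustrative Python on synthetic arrays: the basis is trained on a separate image set and the subspace dimension is reduced from the 100 used in the experiments purely for the example.

```python
import numpy as np

def eigenface_basis(train_images, k=100):
    """PCA basis from a (n, N) training set disjoint from gallery/probes."""
    mean = train_images.mean(axis=0)
    _, _, Vt = np.linalg.svd(train_images - mean, full_matrices=False)
    return mean, Vt[:k]                      # mean face and k eigenfaces

def nearest_neighbor_match(gallery, probes, mean, V):
    """Match each probe to the gallery entry nearest in eigenspace."""
    G = (gallery - mean) @ V.T               # gallery coefficients
    P = (probes - mean) @ V.T                # probe coefficients
    d = np.linalg.norm(P[:, None, :] - G[None, :, :], axis=2)
    return d.argmin(axis=1)                  # index of best gallery match
```

Ranking by this Euclidean distance in the subspace is exactly the "subspace-restricted norm" the probabilistic measure is later compared against.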

3.2 Bayesian Matching

For our probabilistic algorithm, we first gathered training data by computing the intensity differences for a training subset of 74 intrapersonal differences (by matching the two views of every individual in the gallery) and a random subset of 296 extrapersonal differences (by matching images of different individuals in the gallery), corresponding to the Ω_I and Ω_E classes, respectively.

It is interesting to consider how these two classes are distributed; for example, are they linearly separable or embedded distributions? One simple method of visualizing this is to plot their mutual principal components, i.e., perform PCA on the combined dataset and project each vector onto the principal eigenvectors. Such a visualization is shown in Figure 5(a), which is a 3D scatter plot of the first 3 principal components. This plot shows what appears to be two completely enmeshed distributions, both having near-zero means and differing primarily in the amount of scatter, with Ω_I displaying smaller intensity differences as expected. It therefore appears that one can not reliably distinguish low-amplitude extrapersonal differences (of which there are many) from intrapersonal ones.

However, direct visual interpretation of Figure 5(a) is very misleading since we are essentially dealing with low-dimensional (or "flattened") hyper-ellipsoids which are intersecting near the origin of a very high-dimensional space. The key distinguishing factor between the two distributions is their relative orientation. Fortunately, we can easily determine this relative orientation by performing a separate PCA on each class and computing the dot product of their respective first eigenvectors. This analysis yields the cosine of the angle between the major axes of the two hyper-ellipsoids; the angle was found to be 124°, implying that the orientation of the two hyper-ellipsoids is quite different. Figure 5(b) is a schematic illustration of the geometry of this configuration, where the hyper-ellipsoids have been drawn to approximate scale using the corresponding eigenvalues.

Having analyzed the geometry of the two distributions, we then computed the likelihood estimates P(Δ | Ω_I) and P(Δ | Ω_E) using the PCA-based method outlined in Section 2.1. We selected principal subspace dimensions of M_I = 10 and M_E = 30 for Ω_I and Ω_E, respectively. These density estimates were then used with a default setting of equal priors, P(Ω_I) = P(Ω_E), to evaluate the a posteriori intrapersonal probability P(Ω_I | Δ) for matching probe images to those in the gallery.

Therefore, for each probe image we computed probe-to-gallery differences and sorted the matching order, this time using the a posteriori probability P(Ω_I | Δ) as the similarity measure. This probabilistic ranking yielded an improved rank-1 recognition rate of 89.5%. Furthermore, out of the 608 extrapersonal warps performed in this recognition experiment, only 2% (11) were misclassified as being intrapersonal, i.e., with P(Ω_I | Δ) > P(Ω_E | Δ).
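The orientation analysis described above amounts to one PCA per class and a dot product of the leading eigenvectors. A sketch follows; note that eigenvectors carry an inherent sign ambiguity, so the recovered angle may come out as the reported value or its supplement.

```python
import numpy as np

def first_eigenvector(X):
    """Leading principal direction of a (n, N) sample matrix."""
    _, _, Vt = np.linalg.svd(X - X.mean(axis=0), full_matrices=False)
    return Vt[0]

def major_axis_angle_deg(intra_deltas, extra_deltas):
    """Angle between the major axes of the two class hyper-ellipsoids."""
    c = float(first_eigenvector(intra_deltas) @ first_eigenvector(extra_deltas))
    return np.degrees(np.arccos(np.clip(c, -1.0, 1.0)))
```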

Figure 5: (a) Distribution of the two classes in the first 3 principal components (circles for Ω_I, dots for Ω_E) and (b) schematic representation of the two distributions showing the orientation difference between the corresponding principal eigenvectors.

3.3 Dual Eigenfaces

We note that the two mutually exclusive classes Ω_I and Ω_E correspond to a "dual" set of eigenfaces as shown in Figure 6. Note that the intrapersonal variations shown in Figure 6(a) represent subtle variations due mostly to expression changes (and lighting) whereas the extrapersonal variations in Figure 6(b) are more representative of general eigenfaces which code variations such as hair color, facial hair and glasses. Also note the overall qualitative similarity of the extrapersonal eigenfaces to the standard eigenfaces in Figure 4.

Figure 6: "Dual" Eigenfaces: (a) Intrapersonal, (b) Extrapersonal.

This suggests the basic intuition that intensity differences of the extrapersonal type span a larger vector space, similar to the volume of facespace spanned by standard eigenfaces, whereas the intrapersonal eigenspace corresponds to a more tightly constrained subspace. It is the representation of this intrapersonal subspace that is the critical part of formulating a probabilistic measure of facial similarity. In fact, our experiments with a larger set of FERET images have shown that this intrapersonal eigenspace alone is sufficient for a simplified maximum likelihood measure of similarity (see Section 3.4).

Finally, we note that since these classes are not linearly separable, simple linear discriminant techniques (e.g., using hyperplanes) can not be used with any degree of reliability. The proper decision surface is inherently nonlinear (quadratic, in fact, under the Gaussian assumption) and is best defined in terms of the a posteriori probabilities, i.e., by the equality P(Ω_I | Δ) = P(Ω_E | Δ). Fortunately, the optimal discriminant surface is automatically implemented when invoking a MAP classification rule.

3.4 The 1996 FERET Competition

This approach to recognition has produced a significant improvement over the accuracy we obtained using a standard eigenface nearest-neighbor matching rule. The probabilistic similarity measure was used in the September 1996 FERET competition (with subspace dimensionalities of M_I = M_E = 125) and was found to be the top-performing system, by a typical margin of 10-20% over the other competing algorithms [7] (see Figure 7). Figure 8 shows the performance comparison between standard eigenfaces and the Bayesian method from this test. Note the 10% gain in performance afforded by the new Bayesian similarity measure. Similarly, Figure 9 shows the recognition results for "duplicate" images which were separated in time by up to 6 months (a much more challenging recognition problem); here Bayesian matching yields a 30% improvement in recognition rate. Thus we note that in both cases (FA/FB and duplicates) the new probabilistic similarity measure has effectively halved the error rate of eigenface matching.

Figure 7: Cumulative recognition rates for frontal FA/FB views for the competing algorithms in the FERET 1996 test. The top curve (labeled "MIT Sep 96") corresponds to our Bayesian matching technique. Note that second placed is standard eigenface matching (labeled "MIT Mar 95").

Figure 9: Cumulative recognition rates for frontal duplicate views with standard eigenface matching and the newer Bayesian similarity metric.

We have recently experimented with a more simplified probabilistic similarity measure which uses only the intrapersonal eigenfaces with the intensity difference Δ to formulate a maximum likelihood (ML) matching technique using

    S' = P(Δ | Ω_I)    (5)

instead of the maximum a posteriori (MAP) approach defined by Equation 2. Although this simplified measure has not yet been officially FERET tested, our own experiments with a database of size 2000 have shown that using S' instead of S results in only a minor (2%) deficit in the recognition rate while cutting the computational cost by a factor of 1/2 (requiring a single eigenspace projection as opposed to two).
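The ML ranking of Eq. (5) can be sketched as follows, with `log_p_intra` standing in (as an assumed callable) for the intrapersonal subspace density of Section 2.1; only one projection per difference is needed since the extrapersonal density is never evaluated.

```python
import numpy as np

def ml_ranking(probe, gallery, log_p_intra):
    """Rank gallery entries by the intrapersonal likelihood alone (Eq. 5)."""
    scores = [log_p_intra(probe - g) for g in gallery]  # S' = P(delta|Omega_I)
    return np.argsort(scores)[::-1]                     # best match first
```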

Figure 8: Cumulative recognition rates for frontal FA/FB views with standard eigenface matching and the newer Bayesian similarity metric.

4 Conclusions

We have proposed a novel technique for direct visual matching of images for the purposes of recognition and search in a large face database. Specifically, we have argued in favor of a probabilistic measure of similarity, in contrast to simpler methods which are based on standard L_2 norms (e.g., template matching [2]) or subspace-restricted norms (e.g., eigenspace matching [9]). This technique is based on a Bayesian analysis of image differences which leads to a very useful measure of similarity.

The performance advantage of our probabilistic matching technique has been demonstrated using both a small database (internally tested) as well as a large (800+) database with an independent double-blind test as part of ARPA's September 1996 "FERET" competition, in which Bayesian similarity out-performed all competing algorithms (at least one of which was using an LDA/Fisher type method). We believe that these results clearly demonstrate the superior performance of probabilistic matching over eigenface, LDA/Fisher and other existing techniques.

This probabilistic framework is particularly advantageous in that the intra/extra density estimates explicitly characterize the type of appearance variations which are critical in formulating a meaningful measure of similarity. For example, the deformations corresponding to facial expression changes (which may have high image-difference norms) are, in fact, irrelevant when the measure of similarity is to be based on identity. The subspace density estimation method used for representing these classes thus corresponds to a learning method for discovering the principal modes of variation important to the classification task. Furthermore, by equating similarity with the a posteriori probability we obtain an optimal non-linear decision rule for matching and recognition. This aspect of our approach differs significantly from recent methods which use simple linear discriminant analysis techniques for recognition (e.g., [8, 3]). Our Bayesian (MAP) method can also be viewed as a generalized nonlinear (quadratic) version of Linear Discriminant Analysis (LDA) [3] or "FisherFace" techniques [1]. The computational advantage of our approach is that there is no need to compute and store an eigenspace for each individual in the gallery (as required with LDA). One (or at most two) eigenspaces are sufficient for probabilistic matching and therefore storage and computational costs are fixed and do not increase with the size of the database (as with LDA/Fisher methods).

Finally, the results obtained with the simplified ML similarity measure (S' in Eq. 5) suggest a computationally equivalent yet superior alternative to standard eigenface matching. In other words, a likelihood similarity based on the intrapersonal density P(Δ | Ω_I) alone is far superior to nearest-neighbor matching in eigenspace while essentially requiring the same number of projections. For completeness (and a slightly better performance), however, one should use the a posteriori similarity S in Eq. 2, at twice the computational cost of standard eigenfaces.

References

[1] P.N. Belhumeur, J.P. Hespanha, and D.J. Kriegman. Eigenfaces vs. Fisherfaces: Recognition using class specific linear projection. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-19(7):711-720, July 1997.

[2] R. Brunelli and T. Poggio. Face recognition: Features vs. templates. IEEE Transactions on Pattern Analysis and Machine Intelligence, 15(10), October 1993.

[3] K. Etemad and R. Chellappa. Discriminant analysis for recognition of human faces. In Proc. of Int'l Conf. on Acoustics, Speech and Signal Processing, pages 2148-2151, 1996.

[4] I.T. Jolliffe. Principal Component Analysis. Springer-Verlag, New York, 1986.

[5] B. Moghaddam and A. Pentland. Face recognition using view-based and modular eigenspaces. Automatic Systems for the Identification and Inspection of Humans, 2277, 1994.

[6] B. Moghaddam and A. Pentland. Probabilistic visual learning for object representation. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-19(7):696-710, July 1997.

[7] P.J. Phillips, H. Moon, P. Rauss, and S. Rizvi. The FERET evaluation methodology for face-recognition algorithms. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition, pages 137-143, June 1997.

[8] D. Swets and J. Weng. Using discriminant eigenfeatures for image retrieval. IEEE Transactions on Pattern Analysis and Machine Intelligence, PAMI-18(8):831-836, August 1996.

[9] M. Turk and A. Pentland. Eigenfaces for recognition. Journal of Cognitive Neuroscience, 3(1), 1991.