<<

FACIAL SIMILARITY: GROUPING SACRED WOMEN AT TEMPLE

Anil K. Jain, Brendan Klare, and Unsang Park Kent Davis

Department of Computer Science and Engineering DatAsia, Inc. Michigan State University Holmes Beach, FL East Lansing, MI

ABSTRACT

For nearly 1,000 years, Angkor Wat temple in modern Cambodia has protected nearly 2,000 ancient images of individual Asian women (called devata or ) portrayed in detailed full-body carvings. Using a database of 252 devata facial images from one temple section, this study processes the images using pattern recognition techniques Fig. 1 Angkor Wat, Siem Reap, Cambodia with the intent of discovering natural facial groupings. Shape In addition to fine architecture, and celebrated bas and feature-based facial similarity are computed based on depictions of the and other legends, Angkor Wat 140 manually marked facial landmarks. Domain expert has protected the most unique royal portrait gallery on Earth: knowledge in the form of a small number of pair-wise (must- a group of 1,780 contemporary 12th century women link) constraints is employed to guide the clustering (referred to hereafter as devata1) depicted in detailed full- algorithm. This semi-supervised learning algorithm yields body portraits (See Figure 2). These images are eight meaningful groups of visually similar images. extraordinary from an anthropological perspective because

they were created over a relatively short time period when Index Terms — Angkor Wat, archeology, facial Khmer stone carving was at its peak. According to domain landmarks, semi-supervised learning, shape similarity. experts, the women appear to represent a range of ethnicities

found in this region. They may, therefore, represent social

and political relationships among different corners of the 1. INTRODUCTION . Identifying their origins could indicate the People worldwide recognize the 12th century Hindu temple of political or cultural dominance of certain regions and Angkor Wat in modern day Cambodia. Few, however, know provide a genetic fingerprint of the empire’s influence. that it houses one of the world's most unique portrait Probable ethnicities include Khmer (Central Cambodia), Lao collections. Under the reign of King Suryavarman II (1,113- (NE Thailand & Laos), Thai-Lanna (North Central Thai), 1,150AD), the Khmer civilization reached its apex in Chinese (East Asian), Indian (South Asian), Cham (Coastal Southeast Asia, controlling the region now occupied by Vietnam), Primitive (Mountain Tribes) and Archipelago Cambodia and parts of Thailand, Laos, Burma, Vietnam and (Malaysian-Indonesian). Malaysia [1]. The king harnessed his vast empire's resources Facial pattern recognition presents an objective, scientific to build the massive Hindu edifice of Angkor Wat (See Figure tool that may unlock some of the secrets these lifelike 1). Following his reign, the Khmers declined until the 15th portraits hold. To show the feasibility of landmark-based century when they were overrun by their former subjects, the facial similarity in discovering facial groups, this study Siamese. Angkor Wat, and the ancient capital surrounding it, focuses on facial images of all 252 women portrayed in the were swallowed by jungle and only known to local West Gopura, or entrance pavilion, of the temple. The inhabitants. In the mid-1800’s, French explorers rediscovered portraits in this building are representative of the temple the hidden ruins of this previously unknown city and France complex as a whole, in variety, style and degrees of devoted huge resources to documenting and understanding the preservation. This exploratory technique creates an objective Khmer civilization. Yet, despite 150 years of intense study, classification tool for analyzing all 1,780 female devata Angkor Wat’s most amazing treasure has remained images at Angkor Wat. unexamined, hidden in plain sight.

1 Devata: supernatural beings, often depicted as young women of great beauty and elegance proficient in the art of dance (Wikipedia) 2. FACE RECOGNITION Face recognition is an innate human ability that we use in our daily lives. It is natural to build computer systems to duplicate this remarkable human activity for tasks such as man-machine communication, person authentication and surveillance. While substantial progress has been made in automatic face recognition, unconstrained face recognition (in the presence of changes in illumination, occlusion, pose and expression) remains a difficult problem (See Figure 2). Applying face recognition techniques to devata images at Angkor Wat poses its own challenges. The facial carvings do not display much of the texture information (as in photographs of human faces captured by digital cameras). Further, due to the carving age, some devata images are disfigured. On the other hand, the problem is constrained by the fact that almost all the devata facial images are in frontal pose and are portrayed at approximately the same scale. Further, our goal is not to uniquely identify each devata face, but instead to group them into a small number of meaningful categories based on defined measures of facial similarity. There has been some prior work on classifying human Fig. 2 Full body portrait and faces of devata faces into gender and ethnic types. The main purpose of mouth, etc.). The resampling scheme first connects neighboring these studies has been to improve the recognition capability landmarks in a region to form a contour. Next the intersection of and reduce the complexity (by indexing) of face recognition the center of the contour and the lower half of the boundary algorithms. Lu and Jain [3] used Linear Discriminant were used as the starting point. Finally, k ¤ n landmarks were Analysis (LDA) to classify faces as either Asian or non- sampled at an equal spacing, L=(k ¤ n), where k is the Asian type. Lyons et al. [4] classified facial images into East sampling rate, n is the original number of landmarks in the Asians vs. non-East Asians based on Gabor texture wavelet region, and L is the length of the region contour. The features. Gutta and Wechsler [5] classified facial images into resampling scheme leads to more consistent placement of Caucasian, Asian, Oriental, and African types. landmarks on devata faces. There are three main differences between the abovementioned studies and the proposed grouping of 2.2. Similarity measure devata: (i) As mentioned above, devata faces are devoid of In order to compute the similarity between a pair of faces, facial texture and reflectance typical of human faces; (ii) the corresponding landmarks are aligned by centering, instead of classifying faces into distinct races, such as scaling and rotating one set of points with respect to the Caucasian and Asian, our goal is to discover what these other. Given N facial images, an NxN similarity matrix S0 classes are through clustering; clustering is a more difficult is computed, where the similarity between face and , problem than classification; (iii) the number of groups or S0[i; j], is defined as S0[i; j] = n jj© (l) ¡ © (l)jj classes is not known, so both the groups as well as their Pl=1 i j 2 number need to be discovered. where ©i(l) is the two-dimensional location of the lth landmark in image i. We define the normalized similarity 0 0 0 2.1. Facial Feature Extraction matrix as S = S =(max(S ) ¡ min(S )). Figure 4 shows two sets of landmarks aligned for generating the similarity. In order to effectively describe each devata face with the objective of defining a measure of facial similarity, each 2.3. Clustering face was manually annotated with 140 landmarks. The use of Clustering algorithms seek to automatically partition a given landmarks is motivated by the Active Appearance Model set of objects into distinct groups such that objects in each (AAM) in automatic approaches to face recognition [6]. The group are more similar to each other than objects belonging landmarks were obtained from the boundaries of the face, eyes, to different groups [7]. Clustering algorithms are generally neck, nose, mouth and ears (See Figure 3). Since the manual separated into hierarchical and partitional types. placement of landmarks is not always consistent, we applied a Hierarchical algorithms successively merge the most similar resampling scheme separately in each facial region (e.g. face, patterns to generate a tree structure, called a dendrogram. A partition of the objects can be obtained from the dendrogram

Fig. 3 Two facial images with 140 landmarks each

Fig. 5 Must-link pairwise constraints; images in each column should be placed in the same group able to incorporate external constraints (semi-supervised clustering) [9].

2.4. Semi-supervised Clustering In semi-supervised clustering, the domain expert typically specifies two types of pairwise constraints: must-link and cannot-link constraints. Must-link constraints are ideal for Fig. 4 Landmarks in Fig 3 resampled with k = 9 and this task because it is easier for an expert to specify which aligned; similarity between two faces in Fig 3, S[i; j]=0.385 devata faces should belong to the same group. The pairwise constraints were provided by 34 professors and students at by choosing a specific level of similarity to cut the tree. Khmer Arts Academy, Phnom Penh, Cambodia. They were Partitional algorithms, on the other hand, directly construct a asked to group faces and to identify facial pairs that they partition with K clusters by minimizing an objective considered to be the most similar. This lead to a set of 243 function, e.g., the squared error criterion in the well-known must-link pairwise constraints (see Figure 5). One of the K-means algorithm. While hierarchical algorithms operate well-known semi-supervised clustering algorithm is MPC- ideally on a similarity matrix, partitional algorithms prefer a KMeans [9] which minimizes J, a weighted sum of the feature space representation, where each object is described squared error (data term) and the given pairwise constraints. by a feature vector. In order to use a partitional algorithm on In Eq. (1), K is the number of clusters, Âk is the set of our data, we transformed the NxN similarity matrix S to an objects in cluster k with center, ¹k, M is the set of must-link N x d pattern matrix using multidimensional scaling (MDS) constraints, w is the weight of the constraint term, I is the [8]. The goal of multidimensional scaling is to explain the indicator function (I(true) = 1, I(false) = 0), and jj ¢ jjA is given similarity between objects in terms of a small number, the learned metric distance. The value of the w is d, of features (attributes) such that the difference between determined empirically as w = 1. the given similarity and the derived similarity (in the new K 2 feature space) is as small as possible. Given the pattern J(Â; M; w) = X X jjxi¡¹kjjA + X w¢I(li 6= lj) (1) matrix representation, we were also able to use a partitional k=1 xi 2Âk (xi;xj )2M clustering algorithm to group the devata faces. In our exploratory analysis, we clustered the 252 devata 3. EXPERIMENTAL RESULTS faces using both a hierarchical clustering (Complete Link algorithm) of the NxN similarity matrix and a partitional The output of a clustering algorithm can be evaluated in two clustering (K-Means algorithm) of the Nxd pattern matrix ways: (i) subjective approach by visually examining the [8]. While the groups formed by these algorithms did exhibit similarity of objects placed in clusters, and (ii) objective good visual similarity, certain devata faces that were placed evaluation based on a quantitative measure of cluster in different clusters clearly belonged in the same cluster. separation and cohesiveness. In this paper, we evaluate This motivated us to utilize clustering algorithms that are clustering results based on the fraction of the pairwise 0.65

0.6

0.55 Algorithm FC 0.5 Random grouping 0.133 0.45 K-Means 0.242 0.4 FC MPC-KMeans (w = 0:5) 0.679 0.35

0.3 MPC-KMeans (w = 1:0) 0.712

0.25 Table 1 FC values for MPC-KMeans clustering (K = 8 ) 0.2

0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 % of Contraints Used Fig. 6 Performance of MPC-KMeans vs. no. of constraints constraints (not included in the MPC-KMeans clustering), FC, that are satisfied. The number of clusters was chosen ( ) based on the squared error criterion [7]. K = 8 Figure 6 shows performance as the number of constraints used in MPC-KMeans increases; One third of available constraints (81) were used to compute the measure FC (these constraints were not used by MPC-KMeans). Table 1 compares MPC-KMeans algorithm with K-Means algorithm (no constraints) and a random partitioning (grouping) of devata faces using 10-fold cross validation. Prototypes of the eight clusters output by MPC-KMeans Fig. 7 A prototype from each of the eight clusters using all the 243 pairwise constraints (w = 1:0) are shown in Figure 7. The facial features of these prototypes show 2006: FRVT 2006 and ICE 2006 Large-Scale Results, Tech. that the clusters represent devata of different ethnic types. Report NISTIR 7408, NIST, 2007. 4. SUMMARY [3] X Lu and A K Jain, “Ethnicity identification from face images,” in Proc. SPIE Defense and Security Symposium, Facial images of devata carvings in Angkor Wat temple Orlando, 2004, pp. 114-123. were processed and clustered according to their facial [4] M J Lyons, J Budynek, and S Akamatsu, “Automatic similarity. The goal of the clustering was to determine if the classification of single facial images,” IEEE Trans. Pattern devata can effectively be grouped into separate types. Analysis and Machine Intelligence, vol. 21, pp. 1357-1362, 1999. Domain expert knowledge about facial features was coded in [5] S Gutta and H Wechsler, “Gender and ethnic classification of the form of pairwise constraints. This prior knowledge human faces using hybrid classifiers,” Intl’ Joint Conf. Neural significantly improved the grouping, supported by visual Networks, vol. 6, pp. 4084-4089, Jul 1999. examination of cluster members and a quantitative measure. [6] T Cootes, G Edwards, and C Taylor, “Active appearance Domain experts see enormous potential utilizing this models,” IEEE Trans. Pattern Analysis and Machine Intelligence, objective classification tool to analyze and group the entire vol. 23, pp. 681-685, 2001. collection of 1,780 devata images. Additional domain [7] A K Jain and R C Dubes. Algorithms for Clustering Data, Prentice Hall, 1988. knowledge will enable cluster refinement and more detailed cluster interpretation. Archeological researchers propose that [8] J B Kruskal, “Multidimensional scaling by optimizing goodness of fit to a nonmetric hypothesis,” Psychometrika, vol. 29, devata facial types will match regional ethnic types, creating pp. 1–27, 1964. an objective measure to enhance anthropological, [9] Bilenko, M., Basu, S., and Mooney, R. J. 2004. “Integrating ethnographic, sociological and historic understanding of the constraints and metric learning in semi-supervised clustering,” In Khmer civilization and its evolution to the modern era. Proc. of the Twenty-First International Conference on Machine Learning, Banff, Alberta, 2004, pp. 11-18.

5. REFERENCES [1] Jessup, Helen Ibbitson, and Thierry Zephir. Sculpture of Angkor and Ancient Cambodia: Millennium of Glory. Washington, D.C.: National Gallery of Art, 1997. [2] P J Phillips, W T Scruggs, A J O’Toole, P J Flynn, K W Bowyer, C L Schott, and M Sharpe, Face Recognition Vendor Test