Facial Similarity: Grouping Sacred Women at Angkor Wat Temple

FACIAL SIMILARITY: GROUPING SACRED WOMEN AT ANGKOR WAT TEMPLE Anil K. Jain, Brendan Klare, and Unsang Park Kent Davis Department of Computer Science and Engineering DatAsia, Inc. Michigan State University Holmes Beach, FL East Lansing, MI ABSTRACT For nearly 1,000 years, Angkor Wat temple in modern Cambodia has protected nearly 2,000 ancient images of individual Asian women (called devata or apsara) portrayed in detailed full-body carvings. Using a database of 252 devata facial images from one temple section, this study processes the images using pattern recognition techniques Fig. 1 Angkor Wat, Siem Reap, Cambodia with the intent of discovering natural facial groupings. Shape In addition to fine architecture, and celebrated bas relief and feature-based facial similarity are computed based on depictions of the Ramayana and other legends, Angkor Wat 140 manually marked facial landmarks. Domain expert has protected the most unique royal portrait gallery on Earth: knowledge in the form of a small number of pair-wise (must- a group of 1,780 contemporary 12th century women link) constraints is employed to guide the clustering (referred to hereafter as devata1) depicted in detailed full- algorithm. This semi-supervised learning algorithm yields body portraits (See Figure 2). These images are eight meaningful groups of visually similar images. extraordinary from an anthropological perspective because they were created over a relatively short time period when Index Terms — Angkor Wat, archeology, facial Khmer stone carving was at its peak. According to domain landmarks, semi-supervised learning, shape similarity. experts, the women appear to represent a range of ethnicities found in this region. They may, therefore, represent social and political relationships among different corners of the 1. INTRODUCTION Khmer Empire. Identifying their origins could indicate the People worldwide recognize the 12th century Hindu temple of political or cultural dominance of certain regions and Angkor Wat in modern day Cambodia. Few, however, know provide a genetic fingerprint of the empire’s influence. that it houses one of the world's most unique portrait Probable ethnicities include Khmer (Central Cambodia), Lao collections. Under the reign of King Suryavarman II (1,113- (NE Thailand & Laos), Thai-Lanna (North Central Thai), 1,150AD), the Khmer civilization reached its apex in Chinese (East Asian), Indian (South Asian), Cham (Coastal Southeast Asia, controlling the region now occupied by Vietnam), Primitive (Mountain Tribes) and Archipelago Cambodia and parts of Thailand, Laos, Burma, Vietnam and (Malaysian-Indonesian). Malaysia [1]. The king harnessed his vast empire's resources Facial pattern recognition presents an objective, scientific to build the massive Hindu edifice of Angkor Wat (See Figure tool that may unlock some of the secrets these lifelike 1). Following his reign, the Khmers declined until the 15th portraits hold. To show the feasibility of landmark-based century when they were overrun by their former subjects, the facial similarity in discovering facial groups, this study Siamese. Angkor Wat, and the ancient capital surrounding it, focuses on facial images of all 252 women portrayed in the were swallowed by jungle and only known to local West Gopura, or entrance pavilion, of the temple. The inhabitants. In the mid-1800’s, French explorers rediscovered portraits in this building are representative of the temple the hidden ruins of this previously unknown city and France complex as a whole, in variety, style and degrees of devoted huge resources to documenting and understanding the preservation. This exploratory technique creates an objective Khmer civilization. Yet, despite 150 years of intense study, classification tool for analyzing all 1,780 female devata Angkor Wat’s most amazing treasure has remained images at Angkor Wat. unexamined, hidden in plain sight. 1 Devata: supernatural beings, often depicted as young women of great beauty and elegance proficient in the art of dance (Wikipedia) 2. FACE RECOGNITION Face recognition is an innate human ability that we use in our daily lives. It is natural to build computer systems to duplicate this remarkable human activity for tasks such as man-machine communication, person authentication and surveillance. While substantial progress has been made in automatic face recognition, unconstrained face recognition (in the presence of changes in illumination, occlusion, pose and expression) remains a difficult problem (See Figure 2). Applying face recognition techniques to devata images at Angkor Wat poses its own challenges. The facial carvings do not display much of the texture information (as in photographs of human faces captured by digital cameras). Further, due to the carving age, some devata images are disfigured. On the other hand, the problem is constrained by the fact that almost all the devata facial images are in frontal pose and are portrayed at approximately the same scale. Further, our goal is not to uniquely identify each devata face, but instead to group them into a small number of meaningful categories based on defined measures of facial similarity. There has been some prior work on classifying human Fig. 2 Full body portrait and faces of devata faces into gender and ethnic types. The main purpose of mouth, etc.). The resampling scheme first connects neighboring these studies has been to improve the recognition capability landmarks in a region to form a contour. Next the intersection of and reduce the complexity (by indexing) of face recognition the center of the contour and the lower half of the boundary algorithms. Lu and Jain [3] used Linear Discriminant were used as the starting point. Finally, k ¤ n landmarks were Analysis (LDA) to classify faces as either Asian or non- sampled at an equal spacing, L=(k ¤ n), where k is the Asian type. Lyons et al. [4] classified facial images into East sampling rate, n is the original number of landmarks in the Asians vs. non-East Asians based on Gabor texture wavelet region, and L is the length of the region contour. The features. Gutta and Wechsler [5] classified facial images into resampling scheme leads to more consistent placement of Caucasian, Asian, Oriental, and African types. landmarks on devata faces. There are three main differences between the abovementioned studies and the proposed grouping of 2.2. Similarity measure devata: (i) As mentioned above, devata faces are devoid of In order to compute the similarity between a pair of faces, facial texture and reflectance typical of human faces; (ii) the corresponding landmarks are aligned by centering, instead of classifying faces into distinct races, such as scaling and rotating one set of points with respect to the Caucasian and Asian, our goal is to discover what these other. Given N facial images, an NxN similarity matrix S0 classes are through clustering; clustering is a more difficult is computed, where the similarity between face and , problem than classification; (iii) the number of groups or S0[i; j], is defined as S0[i; j] = n jj© (l) ¡ © (l)jj classes is not known, so both the groups as well as their Pl=1 i j 2 number need to be discovered. where ©i(l) is the two-dimensional location of the lth landmark in image i. We define the normalized similarity 0 0 0 2.1. Facial Feature Extraction matrix as S = S =(max(S ) ¡ min(S )). Figure 4 shows two sets of landmarks aligned for generating the similarity. In order to effectively describe each devata face with the objective of defining a measure of facial similarity, each 2.3. Clustering face was manually annotated with 140 landmarks. The use of Clustering algorithms seek to automatically partition a given landmarks is motivated by the Active Appearance Model set of objects into distinct groups such that objects in each (AAM) in automatic approaches to face recognition [6]. The group are more similar to each other than objects belonging landmarks were obtained from the boundaries of the face, eyes, to different groups [7]. Clustering algorithms are generally neck, nose, mouth and ears (See Figure 3). Since the manual separated into hierarchical and partitional types. placement of landmarks is not always consistent, we applied a Hierarchical algorithms successively merge the most similar resampling scheme separately in each facial region (e.g. face, patterns to generate a tree structure, called a dendrogram. A partition of the objects can be obtained from the dendrogram Fig. 3 Two facial images with 140 landmarks each Fig. 5 Must-link pairwise constraints; images in each column should be placed in the same group able to incorporate external constraints (semi-supervised clustering) [9]. 2.4. Semi-supervised Clustering In semi-supervised clustering, the domain expert typically specifies two types of pairwise constraints: must-link and cannot-link constraints. Must-link constraints are ideal for Fig. 4 Landmarks in Fig 3 resampled with k = 9 and this task because it is easier for an expert to specify which aligned; similarity between two faces in Fig 3, S[i; j]=0.385 devata faces should belong to the same group. The pairwise constraints were provided by 34 professors and students at by choosing a specific level of similarity to cut the tree. Khmer Arts Academy, Phnom Penh, Cambodia. They were Partitional algorithms, on the other hand, directly construct a asked to group faces and to identify facial pairs that they partition with K clusters by minimizing an objective considered to be the most similar. This lead to a set of 243 function, e.g., the squared error criterion in the well-known must-link pairwise constraints (see Figure 5). One of the K-means algorithm. While hierarchical algorithms operate well-known semi-supervised clustering algorithm is MPC- ideally on a similarity matrix, partitional algorithms prefer a KMeans [9] which minimizes J, a weighted sum of the feature space representation, where each object is described squared error (data term) and the given pairwise constraints.

Load more