Age Estimation and Cultural Event Recognition
Total Page:16
File Type:pdf, Size:1020Kb
ChaLearn Looking at People 2015 new competitions: Age Estimation and Cultural Event Recognition Sergio Escalera Jordi Gonzalez` Xavier Baro´ Pablo Pardo University of Barcelona Univ. Autonoma` de Barcelona Universitat Oberta de Catalunya University of Barcelona Computer Vision Center, UAB Computer Vision Center, UAB Computer Vision Center, UAB Email: [email protected] Email: [email protected] Email: [email protected] Email: [email protected] Junior Fabian Marc Oliu Hugo Jair Escalante Ivan Huerta Univ. Autonoma` de Barcelona University of Barcelona INAOE University of Venezia Computer Vision Center, UAB Computer Vision Center, UAB Email: [email protected] Email: [email protected] Email: [email protected] Email: [email protected] Isabelle Guyon Clopinet, Berkeley Email: [email protected] Abstract—Following previous series on Looking at People and context. Therefore, how to combine and exploit all this (LAP) challenges [1], [2], [3], in 2015 ChaLearn runs two new knowledge from pixels constitutes an interesting problem. competitions within the field of Looking at People: age and cul- tural event recognition in still images. We propose the first crowd- This motivates our choice to organize a new workshop and sourcing application to collect and label data about apparent a competition on this topic to sustain the effort of the computer age of people instead of the real age. In terms of cultural event vision community. These new competitions come as a natural recognition, tens of categories have to be recognized. This involves evolution from our previous workshops at CVPR 2011, CVPR scene understanding and human analysis. This paper summarizes 2012, ICPR 2012, ICMI 2013, and ECCV 2014. We will both challenges and data, providing some initial baselines. The continue to use our website http://gesture.chalearn.org for pro- results of the first round of the competition were presented motion, while challenge entries in the quantitative competition at ChaLearn LAP 2015 IJCNN special session on computer will be scored on-line using the Codalab Microsoft-Stanford vision and robotics http://www.dtic.ua.es/∼jgarcia/IJCNN2015. Details of the ChaLearn LAP competitions can be found at University platforms (http://codalab.org/), from which we have http://gesture.chalearn.org/. already organized international challenges related to Computer Vision and Machine Learning problems. The results of the first round of the competition will be presented at ChaLearn LAP I. INTRODUCTION 2015 IJCNN special session on computer vision and robotics http://www.dtic.ua.es/∼jgarcia/IJCNN2015. The automatic analysis of the human body in still images In the rest of this paper, we describe in more detail both and image sequences, also known as Looking at People, keeps age estimation and cultural event recognition challenges, their making rapid progress with the constant improvement of new relevance in the context of the state of the art in terms of published methods that push the state-of-the-art. Applications application and data, and provide some initial baselines. are countless, like Human Computer Interaction, Human Robot Interaction, communication, entertainment, security, commerce and sports, while having an important social impact in assistive II. AGE ESTIMATION CHALLENGE technologies for the handicapped and the elderly. Age estimation is a challenging task which requires the automatic detection and interpretation of facial features. We In 2015, ChaLearn is organizing new competitions and have designed an application using the Facebook API for workshops on age estimation and cultural event recognition the collaborative harversting and labeling by the community from still images. The recognition of continuous, natural in a gamified fashion ( http://sunai.uoc.edu:8005/). We are human signals and activities is very challenging due to the currently collecting the data, which will be composed by multimodal nature of the visual cues (e.g., movements of thousands of images labeled by many users. fingers and lips, facial expression, body pose), as well as technical limitations such as spatial and temporal resolution. A. State of the art on age recognition Facial expressions analysis and age estimation are hot topics in Looking at People that serve as additional cues to determine Age estimation has historically been one of the most human behavior and mood indicators. Finally, images of cul- challenging problems within the field of facial analysis [4], [5]. tural events constitute a very challenging recognition problem It can be very useful for several applications, such as advanced due to a high variability of garments, objects, human poses video surveillance ([5], [6]), demographic statistics collection, business intelligence and customer profiling, and search opti- Active Shape Models (ASM) [22] and an improved version, mization in large databases. Some of the reasons age estimation Contourlet Appearance Model (CAM) [23], where they prove is yet a challenging problem are the uncontrollable nature of the efficiency of using facial landmarks. Then T. Wu et al. the aging process, the strong specificity to the personal traits [24] proposed to use facial landmarks and project them into a of each individual [7], high variance of observations within Grassmann manifold to model the age patterns. the same age range, and the fact that it is very hard to gather complete and sufficient data to train accurate models [8]. With regards to the learning algorithm, Support Vector Machines (SVM) have commonly been used for age clas- One of the earliest works in age estimation was done by sification and regression, as in [20]. A binary decision tree A. Lanitis et al. [9], [10], [11]. They propose two different ag- with SVMs at each node is proposed in [6] by Han et al. ing estimation methods: weighted appearance specific method Age ranges are coarsely assigned, and later are more precisely [10], [11]. The aging factor of a new individual is computed by estimated by Support Vector Regressors (SVR) at the leaves. the weighted sum of the aging functions of other individuals. Chang et al. in [25] used a particular ranking formulation of They also use an age specific method [9], where the new support vectors, OHRank. The approach uses cost-sensitive individual is first classified into a cluster with similar aging aggregation to estimate ordinal hyperplanes (OH) and ranks factor patterns. Then, it is classified into different age ranges them according to the relative order of ages. In this paper and an age-specific classifier is applied to estimate the final Active Appearance Model (AAM) is used. In [7], Weng et age. al. employs a similar ranking technique, where Local Binary There are some drawbacks to this “aging function” ap- Patterns (LBP) histogram features are combined with principal proach pointed out by X. Geng et al. [12]. The formula of the components of BIF, shape and textural features of AAM, and aging function is empirically determined, there is no evidence PCA projection of the original image pixels. Fusion of texture suggesting that the relation between face and age is described and local appearance descriptors (LBP and HOG features) have just by a quadratic function. Also, the new aging function independently also been used for age estimation by Huerta et for the unseen images is simply a linear combination of the al. in [26]. already known aging functions. X. Geng et al. claimed to solve There have been previous proposals training neural net- these problems in their new proposed method AGing pattErn works, which are able to learn complex mappings and deal with Subspace (AGES) [12], where each face image is represented outliers, for age estimation. In [9], Lanitis et al. used AAM- by a point in the aging pattern subspace. encoded face parameters as an input for the supervised training Later N. Ramanathan et al. [13], [14] approached the of a neural network with a hidden layer. More recently, Geng age estimation problem in two different scenarios, estimating et al. in [8] tackle age estimation as a discrete classification the age difference between two face images of the same problem using 70 classes, one for each age. The best algorithm individual based on a Bayesian age-difference classifier [13] proposed in this work (CPNN - Conditional Probability Neural and estimating the age of young faces using the facial growth Network) consists of a three-layered neural network, in which geometry [14]. The problem of the last approach is that it can the input to the network includes both BIF features x and a only be applied to face images of young people in a growing numerical value for age y, and the output neuron is a single age since afterwards the facial geometry does not change as value of the conditional probability density function p(yjx). much. An extensive comparison of these classification schemes for age estimation has been reported in Fernandez et al. [27]. Y. Fu et al. were the first to approach the problem through In [28], Yang et al. used Convolutional neural networks for manifold analysis methods [15], [16]. Each face image is age estimation under surveillance scenarios. assigned to a low-dimensional representation via manifold em- bedding. Following this approach G. Guo et al. [17] proposed Other different variations of the problem has been ad- a new method based on a study of different dimensionality dressed, A. Lanitis et al. [29] performed a first approach to reduction and manifold embedding techniques and added a age estimation using Head and Mouse tracking movements, robust regression step to the previous framework. In a posterior Y. Makihara et al. [30] used a gait-based database to estimate work [18], G. Guo et al. introduced a new approach, using the age, B. Xia et al. [31] proposed an age estimation method kernel partial least square (KPLS) regression to reduce feature based on 3D face images. dimensionality and learn the aging function in a single step.