Preprocessing and Descriptor Features for Facial Micro-Expression Recognition

Chris House and Rachel Meyer Department of Electrical Engineering, Stanford University {chouse12, r3m3y3r}@stanford.edu June 5, 2015

Abstract sions that reveal true feelings even when someone is attempting to conceal their emotions. These expres- Facial micro-expressions contain signicant informa- sions are extremely dicult to detect due to their tion about how people feel, even when they are at- short time scales and subtlety. Micro-expressions tempting to conceal their emotions. Previously, very were rst reported in psychological literature in the little has been done to detect and recognize 1960's and have been studied since then[1, 2]. Re- micro-expressions using computer vision methods. cently, a few research groups have attempted micro- In this paper, detection and classication of micro- expression detection and recognition using computer expressions from the Spontaneous Micro-Expression vision techniques. Trained humans can recognize database were implemented, following preprocessing micro-expressions accurately about 47% of the time and cropping of raw images using Haar features, us- [3], but perhaps computer vision methods can achieve ing local binary patterns on three orthogonal planes higher accuracy. The applications of technology (LBP-TOP) and local gray code patterns on three that can successfully detect and recognize micro- orthogonal planes (LGCP-TOP) as descriptors and expressions are varied. It would be valuable in law support vector machines (SVMs) as detection and enforcement interrogations to detect deceit, in mar- recognition algorithms. Results show accuracy com- keting to detect how humans respond to advertising, parable to other work. and in general and articial intelligence research to study human emotion. 1 Introduction 2 Background Facial expressions contain signicant information about how a person feels. Humans are adept at recog- Previous work in micro-expression analysis has taken nizing macro-expressions that occur for time periods mostly a psychological analysis form. The psychol- on the order of a few seconds, but humans are able to ogists Haggard and Isaacs rst proposed the idea manipulate these expressions to hide their true emo- of micro-expressions, or short, involuntary reactions tions. Micro-expressions, on the other hand, are short that are immediately retracted in an attempt at sup- (lasting less than half a second), involuntary expres- pression, in a 1966 study [4]. , noted

1 for his work in facial expression analysis, published A number of dierent approaches have been pro- his seminal work in 1968 regarding Nonverbal Leak- posed and implemented for detection and recogni- age and Clues to Deception, in which he detailed tion of micro-expressions [9]. Some groups have used his ndings in detecting deception. Ekman used mi- Gabor features and algorithms such as SVM, Ad- croexpression analysis in conjunction with his previ- aboost, and other variations of those to recognize ous work regarding emotion classication based o micro-expressions [10]. Others have divided the face of facial patterns to further understand why hu- into sub-regions and analyzed facial strain in these re- mans lie and how they display deceit nonverbally gions for detection [11], or used 3D gradient descrip- [5]. In 2002, John Gottman, a famous tors [12]. Temporal interpolation models and multi- for his work in predicting divorce rate based o of ple kernel learning have also been explored [3]. How- a couples' interaction, used micro-expression analy- ever, the most common method by far has been Local sis from short video clips to enhance his understand- Binary Patterns on Three Orthogonal Planes (LBP- ing of couple interaction and his divorce prediction TOP) combined with various learning techniques for ability [6]. All of these psychological experts cer- classication [7, 8]. In the last several years, several tainly gained an ability to detect and recognize micro- groups have achieved detection accuracies on the or- expressions. However, while all of this work helped der of 60-70% (with 50% as the baseline chance detec- increase understanding of micro-expressions from a tion) and recognition accuracy between 3 classes of psychological standpoint, none of the work sought emotions on the order of 50% (3 classes, so 33% as the to gain a generalized ability to detect or recognize baseline chance detection) [7]. Another newly devel- micro-expressions. Thus, in the last few years, sev- oped descriptor, Local Gray Code Patterns (LGCP), eral groups have attempted to create general com- has claims of improving upon the deciencies of the puter vision methods that can detect and recognize descriptors listed above, most notably the memory specic facial micro-expressions. Until recently, the requirements of Gabor features and the inability to biggest challenge in attempting to create a general handle scene illumination variance and noise for LBP method for micro-expression detection and recogni- [13]. tion had been a lack of useful data. However, in 2013, two databases were developed and released for pub- 2.1 Our Contributions lic use in research with micro-expressions: the SMIC database (Spontaneous Micro-Expression Database) Our project evaluates previous research by imple- and CASME II. These databases were developed un- menting similar descriptors and algorithms. Our der similar premises: test subjects were lmed using main contributions include a more thorough analysis high speed cameras while being shown various video of the results and implementing a newly developed clips that were intended to induce emotional reac- feature descriptor, LGCP-TOP. tions. The test subjects were also instructed to sup- press an outward reaction so as to properly induce micro-expressions [7, 8]. Other databases have also 3 Methods been constructed but these databases are generally less desirable for analysis since the micro-expressions For this project, we used the Spontaneous Micro- were obtained through posed expressions (test sub- Expression Database (SMIC). The database contains ject told to display facial expression corresponding to both raw images and preprocessed and cropped im- an emotion, but with low muscle intensity) or while ages. The raw images were cropped using Haar fea- the test subject was talking or moving their head tures to detect the subject's face. The descriptors (which makes computer vision analysis more dicult used were local binary patterns on three orthogo- as it essentially introduces noise) [7]. Other more nal planes (LBP-TOP) and local gray code patterns general facial expression databases include CK, MMI, on three orthogonal planes (LGCP-TOP). Support JAFFE, RUFACS, MULTI-PIE, and Oulu-CASIA. vector machines (SVMs) were used for detection and

2 recognition. are then classied using a classication and regres- sion tree (CART). The output of this function was a 3.1 SMIC Database bounding box for the given image. Using the bound- ing box, the image was then cropped (to a uniform The SMIC database, developed by the research group size) and then resized in order to allow for faster led by Xiaobai Li at University of Oulu, was created processing in obtaining the feature descriptors later specically for work with micro-expression recogni- on. In comparison to the preprocessed images in the tion. In the SMIC dataset, the test subjects' facial ex- SMIC database, our cropping method achieved a sim- pressions were lmed while watching 16 various movie ilar result. clips designed to induce strong emotions. Each sub- ject was instructed to try to conceal their reaction to each clip as much as possible, as the testers would at- tempt to guess their reactions to each clip afterwards. As an incentive for the test subjects, the testers added a penalty of having to complete a long, boring ques- tionnaire if they were to accurately guess the sub- Figure 1: Process outlining the cropping of the raw jects' emotions. In this way, the data obtained was images [3]. a more accurate representation of how humans pro- duce micro-expressions. The database contains 164 micro-expression video clips elicited from 16 partic- ipants. These micro-expressions were classied into 3.3 LBP-TOP positive, negative, and surprise categories. We uti- Choosing the features for micro-expressions is a dif- lized the high speed video (100 fps) data from the cult problem, as both spatial and temporal infor- dataset (other options included video sequences at mation should be encoded into them. Additionally, 25 fps and near-infrared images). Table 1 shows the micro-expressions are very subtle and last for a short composition of the dataset. period of time. The local binary patterns on three or- thogonal planes (LBP-TOP) descriptor was selected Table 1: Composition of the SMIC database. Note after an extensive survey of relevant literature [7, that these images are available both as raw images 8]. The local binary pattern (LBP) descriptor, from and cropped images. which LBP-TOP was derived, was developed in 1994 and is used commonly in computer vision applica- tions. Considering just one pixel in the image, a LBP is computed by comparing the pixel value with that of its neighbors, and then converting the binary string representing the local texture into a decimal num- ber. A feature vector can be formed by calculating 3.2 Preprocessing of Images a histogram of LBPs over an entire image. The LBP method is eective for describing 2D textures of static Using the raw images from the SMIC database, images, but to analyze time-dependent textures (i.e. we implemented the CascadeObjectDetectorSystem changing expressions in video), this model needs to be from the MATLAB computer vision toolbox. This extended. To do so, LBP histograms are calculated in object detector has several built-in object detectors, three orthogonal planes. For a video with time length including three face detectors (two using Haar fea- T, LBP are computed for the XY-, XT-, and YT- tures, one using LBP), eye detectors, a nose detector, planes. The XY-plane describes the spatial changes, and a mouth detector. We implemented the face de- while the XT- and YT-planes describe the spatial- tector using Haar features for facial features, which temporal change in each respective dimension. These

3 histograms are then concatenated to form the nal convolved with a 3x3 region to get 8 edge directions LBP-TOP feature vector. Thus, LBP can be ex- for the center pixel. Then, using bitwise compari- tended to our videos of facial micro-expressions. It son between neighboring pixels in a clockwise orien- is dicult to observe noticeable patterns in micro- tation, the edge direction magnitudes are converted expressions data because micro-expressions are reac- to gray code, as shown in Figure 2. With the whole tions which one deliberately tries to suppress, which image split into sub-blocks (typically 81 sub-blocks causes facial expressions to have very low intensity. in a 9x9 grid), the gray code values for each pixel are As a result, additional modications are usually im- then used to form a histogram for each sub-block. plemented in micro-expression recognition. Uniform When computing the histogram, unused bins are dis- binning is used to reduce dimensionality and improve carded, allowing the dimensionality of the feature information quality. Some groups also implement vector to be reduced from 256 bins to 20 bins. At this temporal interpolation models (TIM) to achieve more point, an optional weighting can be performed to give stable results [3]. In our experiments, we adjusted bins with higher term frequency more importance the radius of the descriptor and number of neigh- in the feature vector. Following this, the histogram boring points, and we also investigated uniform/non- of each sub-block is then concatenated to form a fea- uniform binning and bi-linear interpolation. All of ture vector for the whole image of size 1215 (81x15). the results reported used eight neighboring points While LGCP is a new feature descriptor, promising and bi-linear interpolation. Temporal interpolation results have been witnessed. According to the au- was not implemented yet, but could be worth incor- thors, LGCP is particularly good at handling noisy porating in the future as it can improve accuracy. images, as shown in Figure 3. In terms of classica- tion accuracy for facial expressions, LGCP was able to achieve improvements in facial expression recogni- tion for both the JAFFE database and CK+ database (databases with 7 classes of macro-expressions, and 326 images and 213 images, respectively) over other feature descriptors, including LBP, Gabor features, and LPQ (local phase quantization) [13].

Figure 2: Example showing LBP calculation (top) and concatenation (bottom) to form an LBP-TOP descriptor [8].

3.4 LGCP-TOP Like LBP-TOP, Local Gray Code Patterns in Three Orthogonal Planes (LGCP-TOP) is a descriptor used Figure 3: Example showing LGCP calculation [13]. for computer vision applications [13]. Developed in 2013, LGCP involves using Robinson Compass Masks

4 (50%) and recognition accuracy achieved an almost 16% improvement over chance recognition (33%). These results are comparable to those reported by Li and Pster [7], who achieved 65.5% detection accu- racy and 49.3% recognition accuracy using LBP-TOP and SVM. Those results are not directly comparable to ours, however, as they reported accuracies based on leave-one-subject-out cross validation as opposed to accuracies obtained from testing their model on a separate test group. We also examined the dierence in results achieved while varying the parameters used in the LBP-TOP descriptors. We computed LBP- TOP descriptors without uniform binning and spatial radius of one, descriptors with uniform binning and spatial radius of one, and descriptors with uniform binning and spatial radius of three. The results were Figure 4: Example showing LGCP's invariance to interesting, as the descriptors without uniform bin- noise, as opposed to LBP's inability to handle the ning achieved the best detection accuracy but worst noisy image (a is original image, b is noisy image) recognition accuracy. Perhaps uniform binning re- [13]. duces useful information in terms of discriminating between micro-/non-micro expressions, as it drasti- cally reduces the number of bins in the histogram. In 3.5 Detection and Recognition Algo- the case of recognition, uniform binning might elimi- rithms nate some of the noise that occurs in expressions and retain the useful information that allows the SVM Previous researchers have used Support Vector Ma- to discern between dierent micro-expressions. It chines (SVM), Random Forests (RF), and Multiple also seems that nding an optimal spatial radius can Kernel Learning (MKL) as detection and classi- improve the recognition accuracy as the descriptors cation algorithms. SVMs with RBF kernels were with uniform binning and a spatial radius of three chosen as a baseline algorithm to compare to other achieved the best recognition accuracy. Finding op- work. The dataset was split by subject into train- timal LBP-TOP spatial radius can help nd the right ing (75% of the data) and testing sets (25% of the level of localization and changes in features, which al- data) by subject in an eort to test how well these lows for better discrimination between dierent types algorithms would generalize to unseen people. Leave- of micro-expressions. Detection and classication of one subject-out cross-validation was performed on micro-expressions is extremely challenging. Figure the training set to optimize parameters for LBP-TOP 2 shows two examples of micro-expressions from the and the SVM. The detection and recognition were database. The small changes in expression that these done in separate experiments so that we could iso- subjects show is dicult to represent in a descriptor. late the eects of each stage separately. We noticed that small changes in the descriptor and machine learning algorithms drastically aected per- 4 Results formance. It makes sense that detection accuracy was better than recognition accuracy (19% above chance 4.1 LBP-TOP vs. 16% above chance), as when detecting micro- expressions the classier is looking for short periods As shown in Table 2, detection accuracy achieved of movement. This task is much easier than recogni- almost a 19% improvement over chance detection tion, where the classier must discern between dier-

5 Table 2: Final results for each descriptor.

Figure 5: Example of two subjects displaying micro- expressions (positive on the left - note the slight curl structures were not exhaustive although several tests of the lip, negative on the right - note the slight fur- seemed to indicate that block structure only played row of the brow) [3]. a minimal role in the results obtained. However, de- spite the lower accuracy, LGCP-TOP was advanta- geous over LBP-TOP as it was approximately 15x ent types of subtle movements that may be somewhat faster than LBP-TOP in computing the feature de- dierent between subjects. scriptors. This has to do with the block structure uti- lized with LGCP, whereas LBP simply iterates over 4.2 LGCP-TOP every pixel in each image sequence. The faster com- putation time allowed for testing with dierent block The results for LGCP-TOP were less encouraging structures and easier debugging, although the block than those of LBP-TOP, shown below in Table 2. The structures did not have a signicant eect in improv- top detection accuracy observed with LGCP-TOP ing accuracy. descriptors using SVM's as a classier was about 61% and the top recognition accuracy was about 48%. These accuracies still represent improvements over 5 Conclusion chance (11% and 15%) yet these accuracies still fall short of the the results witnessed with LBP-TOP. A method for detecting and identifying facial micro- There are several possibilities for these discrepan- expressions was implemented. Using the SMIC cies. One is that the feature vectors from LGCP-TOP database, preprocessing of raw images was imple- were far too big for SVM's to classify without over- mented using Haar features to identify a facial struc- tting. In an attempt to counter this, we performed ture and the images were cropped based o of the re- PCA through a low-rank SVD approximation to re- sulting bounding box. Two feature descriptors, LBP- duce the feature vectors from dimensionality greater TOP and LGCP-TOP were implemented and tested than 4000 to less than 1000. This helped improve using SVM's as a classier. LBP-TOP achieved re- the recognition accuracy (from 38% to 48%) but did sults comparable to Li's group, while LGCP-TOP fell not improve detection accuracy signicantly (57% to short in detection accuracy but close with recogni- 61%). Another possibility is that the sub-block struc- tion. LGCP-TOP also had the advantage of much ture used in computing the LGCP-TOP descriptors shorter computing time to compute the feature de- was not optimized for recognition or detection. With scriptors. LBP-TOP, changing the spatial radius of the neigh- bor points and binning for the descriptor had large eects on the detection and recognition accuracy, as 6 Future Work increasing the spatial radius from 1 to 3 improved recognition accuracy, while not using uniform binning Future work will seek to improve detection and recog- (reducing dimensionality from 256 to 59 by discard- nition accuracy in several manners. One possibil- ing repeated bins) with a spatial radius of 1 improved ity for the poorer performance of LGCP-TOP was detection accuracy. Our tests of LGCP-TOP block that its feature dimensionality was too large to clas-

6 sify without overtting. As noted previously, LGCP- psychotherapy. Methods of Research in Psy- TOP descriptor dimensionality was reduced using chotherapy, 1966. 154-165. PCA. Other possibilities for decreasing dimension- ality could involved performing weighting over the [5] Ekman, Paul, et al. Nonverbal Leakage and image block structure used in computing LGCP de- Clues to Deception. Psychiatry: Interpersonal scriptors. Blocks close to the center of the face would and Biological Processes. 1969. 88-109. most likely be more important than blocks towards [6] Gottman, John, et al. A Two-Factor Model for the edges of the image and would thus be weighted Predicting When a Couple Will Divorce: Ex- more heavily. Other possible improvements could ploratory Analyses Using 14-Year Longitudinal come from face alignment. Face alignment would Data. Family Process. 83-86. be performed to align each image to a canonical position, such as aligning the position of each sub- [7] Li, Xiaobai, et al. "A spontaneous micro- ject's nose or eyes. Finally, improvements could be expression database: Inducement, collection and witnessed with dierent micro-expression databases. baseline." Automatic Face and Gesture Recogni- Other databases, such as CASME II which con- tion (FG), 2013 10th IEEE International Confer- tains spontaneous micro-expression image sequences, ence and Workshops on. IEEE, 2013. might have better image quality. This could certainly help improve performance, particularly with LBP- [8] Yan,Wen-Jing,etal."CASMEII: An improved TOP, given that LBP is somewhat more susceptible spontaneous micro-expression database and to noise and illumination changes. the baseline evaluation." PloS one 9.1 (2014): e86041 Acknowledgements [9] Zeng, Zhihong, et al. "A survey of aect recog- nition methods: Audio, visual, and spontaneous We would like to thank Huizhong Chen as our project expressions." Pattern Analysis and Machine In- mentor and the entire EE368 teaching sta for the telligence, IEEE Transactions on 31.1 (2009): work over the entire quarter. 39-58. [10] Wu,Qi, Xunbing, Shen,and Xiaolan, Fu."The References machine knows what you are hiding: an auto- matic micro-expression recognition system." An [1] Haggard, Ernest A., and Kenneth S. Isaacs. Eective Computing and Intelligent Interaction. "Micromo- mentary facial expressions as indi- Springer Berlin Heidelberg, 2011. 152-162. cators of ego mechanisms in psychotherapy." [11] Shreve, Matthew, et al. "Macro-and micro- Methods of research in psychotherapy. Springer expression spotting in long videos using spatio- US, 1966. 154-165. temporal strain." Automatic Face & Ges- [2] Ekman, Paul, and Wallace V. Friesen. "Nonver- ture Recognition and Workshops(FG2011), 2011 bal leakage and clues to deception." Psychiatry IEEE International Conference on. IEEE, 2011 32.1 (1969): 88-106. [12] Polikovsky, Senya, Yoshinari Kameda, and [3] Pster, Tomas, et al. "Recognising sponta- Yuichi Ohta. "Facial micro-expressions recogni- neous facial micro-expressions."Computer Vi- tion using high speed camera and 3D-gradient sion (ICCV), 2011 IEEE International Confer- descriptor." (2009): 16-16 ence on. IEEE, 2011. [13] Islam, Mohammad Shahidul, et al. Local Gray [4] Isaacs, K. S., et al. Micro-momentary facial Code Pattern (LGCP): A Robust Feature De- expressions as indicators of ego mechanisms in scriptor for Facial Expression Recognition. In-

7 ternational Journal of Science and Research (IJSR), India Online ISSN: 2319-7064. 2013.

8