Journal of Image and Graphics, Vol. 8, No. 2, June 2020

Beyond Digitalisation: Facial Motion Capture through the Perspective of Aesthetic Experience and Uncanny Valley

Muhammad Zaffwan Idris and Naimah Musa, Creative Multimedia Department, Universiti Pendidikan Sultan Idris, Malaysia. Email: {zaffwan, naimah.m}@fskik.upsi.edu.my

Abstract—In recent years, studies on movement have risen among industry professionals and scholars. This also applies to the dance area, where movement and kinesthetics play the major role. However, measuring dance movement and kinesthetics is not complete without considering several elements that affect the dance performance, such as emotion, knowledge, and technical parts. The aim of this research is to observe both the potentials and challenges in digitising the facial expressions of the dancer in expressing different types of emotions as part of preserving a Malay folkdance known as Mak Yong. The intention is to use a facial motion capture (MoCap) device to record the facial expressions and then apply the data in different forms of visual representation. Further discussions are based on a theory and a model known as the Aesthetic Experience (AX) and the Uncanny Valley.

Index Terms—facial motion capture (MoCap), Aesthetic Experience (AX), uncanny valley, Mak Yong

Manuscript received December 18, 2019; revised March 27, 2020.

I. INTRODUCTION

This paper is at its preliminary stage. However, as part of a bigger research project, it highlights the potentials and challenges that might occur during the process of digitising Mak Yong, focusing on facial expression using MoCap. Mak Yong is a complex form of artistic expression, as it combines dance and drama components as part of the whole performance. Research exists on the preservation of Intangible Cultural Heritage (ICH) in Mak Yong, such as in its management [1], as well as on other ICH using MoCap, for example Cypriot folk dance [2] and Malaysian dances such as Zapin [3]. Thus, the preservation of Mak Yong using MoCap can be considered novel. Mak Yong emphasises expression through body movement and facial expression. Mak Yong can be segmented into dance and drama; the former consists of dance routines, musical instruments and song, while the latter consists of storytelling, dialogue and character. Digitising the whole Mak Yong performance could therefore become an overwhelming procedure. To begin with, one of the segments of Mak Yong that needs to be highlighted is the facial expression. This is because facial expression plays an important role in both the dance and drama components, especially because of the storytelling and different characters found in Mak Yong.

II. LITERATURE REVIEW

Many scholars, such as Skeat, Cuisinier, Sheppard and Malm, have mentioned Mak Yong in the 20th century as one of the unique forms of cultural expression. The scholar Mubin Sheppard claimed that Mak Yong was introduced from Pattani about 2000 years ago as a form of entertainment. Prior to 1920, the Sultan of Kelantan and the royalty were responsible for maintaining groups of Mak Yong performers. Unlike any other traditional dance theatre, Mak Yong is unique as it combines all the performing arts elements, such as dance, vocals, song, instrumental music, acting, the spoken word, ritual and poetry [4]. The repertoire of Mak Yong has expanded through numerous phases, beginning with Dewa Muda, which is regarded as the story that explains the origins of Mak Yong, and has grown into twelve classical stories: Dewa Muda, Dewa Pechil (with Dewa Samadaru as a variant title/version), Dewa Sakti (or Rajah Sakti), Dewa Indera-Indera Dewa, Dewa Panah (or Anak Rajah Panah), Anak Raja Gondang, Gading Bertimang, Raja Tangkai Hati, Raja Muda Lakleng, Raja Muda Lembek, Raja Besar Dalam Negeri Ho Gading, and Bedara Muda.

A. Pak Yong

Pak Yong is the main actor in Mak Yong, carrying several characters in different stories. Basically, Pak Yong can be performed by either a male or a female actor; however, female actors have mostly been playing these roles nowadays. Besides being portrayed as a hero, Pak Yong is also visualised as a King or a Prince. In the Mak Yong repertoire, one or more actors may play the Pak Yong roles; therefore, there could be a young Pak Yong and an older Pak Yong. Most Pak Yong actors are known for their wide experience in Mak Yong, as Pak Yong is the leading actor in a Mak Yong performance (Fig. 1). Basically, the actor or actress will undergo several other characters before being appointed as Pak Yong, based on skills, stage presence, and appearance [5]. As Mak Yong and many other folkdances are facing huge threats through modernization and lack of interest

©2020 Journal of Image and Graphics. doi: 10.18178/joig.8.2.37-41

from the younger generation, digital preservation is deemed one of the best solutions to ensure their continued existence [6]. Furthermore, with the advancement of digital technology, apart from preserving the authenticity of Mak Yong, new interpretations and presentations of Mak Yong, such as 3D animation, digital games, and hologram exhibitions, are made possible. The following topics are some of the key discussions that support this intention.

Figure 1. Pak Yong in a Mak Yong performance

III. MOTION CAPTURE

Motion Capture, better known as MoCap, is a system that captures the movement of actors or objects digitally for animation purposes. By placing markers onto the actor or object, movement and trajectory can automatically be traced and recorded. MoCap is able to extract and measure body movement, including parameters such as speed, position, and angle, during the process of archiving, while the data collected may be applied to instrument control and scientific investigation [7]. Currently, MoCap is widely used in many areas, such as digital games and movie productions, for example The Lord of the Rings, Rise of the Planet of the Apes, Avatar, Iron Man and Marvel's Avengers.

MoCap is an advanced tool to track, record and digitise the complexity of body movement, especially in dance, as it is efficient, produces accurate data and is able to assess movement and gesture [8], [9]. This process can help simplify animating a 3D character, which normally takes a long time to complete. Two types of MoCap are known: marker based and markerless [10]. The optical marker-based type is the most widely used and can be divided into active markers (LEDs) and passive markers (balls coated with reflective material). Passive markers give the highest accuracy and respond in a short time [11]. Recently, MoCap has also been used in the digitisation of folkdance [12], [6] (Fig. 2).

Figure 2. Motion capture pipeline

IV. MEASURING FACIAL EXPRESSION

A. Types of Facial Expression Measurement Techniques

Currently, the facial MoCap technique is widely used, as several software packages specialising in facial capture have been developed to meet market demand, especially in entertainment and animation. Mayya et al. [13] noted that a few methods are used to recognise facial expressions: expression recognition can be achieved with a Support Vector Machine classifier with boosted CBP features, and also with a technique known as the Deep Convolutional Neural Network (DCNN) framework, which extracts features from the images.

B. Facial Recognition

In 2007, Woodward et al. [14] used stereo web cameras to create the performance of a virtual character. However, this technique needs three modules: marker tracking, face animation, and the virtual performance. According to Wójcik [15], there are six ways or techniques to capture and analyse faces for recognition: i) classical face recognition algorithms, ii) artificial neural networks, iii) Gabor wavelets, iv) face descriptor-based methods, v) 3D-based face recognition, and vi) video-based recognition. One of the popular techniques is EMOTE, a facial recognition SDK that is able to integrate the target emotions without needing Open Source Computer Vision (OpenCV), a library of programming functions mostly used for real-time computer vision.

Basically, the most common approach used in facial recognition consists of three steps: i) face detection and tracking, ii) feature extraction, and iii) classification of expression. At the end of the process, facial changes can be identified as prototypic emotions or facial action units [16]. This technique is less advanced, also known as refined tracking, as it is unable to fully capture three-dimensional (3D) motions such as head rotation.

C. Facial Motion Capture

Facial MoCap is the process of electronically tracking and converting the movements of a person's face into a high-resolution digital form using tools such as cameras or laser scanners to track the expressions (Fig. 3). There are two types of facial MoCap on the market: i) markerless and ii) marker based. For the marker-based type, Edward et al. [17] used a Vicon MoCap system with the Nexus 1.7.1 software, where 100 markers with a diameter of 1.5 mm and 21 markers with a diameter of 3 mm were used to capture expressive facial motion on the face. Meanwhile, markerless facial MoCap is used to capture and thoroughly analyse the face's surface, including its texture and geometry. Its advantage is that it does not require any markers to be placed on the face, making the recording process more convenient for the actor (Fig. 4).

Figure 3. Multiple cameras used to capture the face in marker-based MoCap [18]
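Section III notes that MoCap extracts parameters such as speed, position and angle from recorded marker trajectories. The following sketch shows how such parameters can be derived from raw marker data by frame-to-frame differencing; the marker positions and the 120 Hz sampling rate are invented for illustration, not data from this study.

```python
import math

# Illustrative only: three frames of one facial marker's 3D position (mm),
# sampled at an assumed 120 Hz -- not data from the paper.
frames = [(0.0, 0.0, 0.0), (1.0, 0.5, 0.0), (2.0, 1.5, 0.1)]
dt = 1.0 / 120.0  # seconds between frames

def displacement(p, q):
    return tuple(b - a for a, b in zip(p, q))

def norm(v):
    return math.sqrt(sum(c * c for c in v))

# Speed: frame-to-frame displacement divided by the sampling interval (mm/s).
speeds = [norm(displacement(p, q)) / dt for p, q in zip(frames, frames[1:])]

# Angle between successive displacement vectors, i.e. change of direction.
d1 = displacement(frames[0], frames[1])
d2 = displacement(frames[1], frames[2])
cos_a = sum(a * b for a, b in zip(d1, d2)) / (norm(d1) * norm(d2))
angle_deg = math.degrees(math.acos(max(-1.0, min(1.0, cos_a))))
```

In practice a MoCap pipeline would first filter sensor noise and fill gaps before differencing, since differentiation amplifies jitter; the core idea of deriving speed and direction change from successive positions is the same.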

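The three-step approach described in Section IV (face detection and tracking, feature extraction, classification of expression) can be sketched end to end, with the classification step reading cue combinations of the kind tabulated in Table I as a simple rule table. The cue names, the stub detector, and the overlap rule below are invented for illustration; a real system would use a trained detector and learned features rather than this toy lookup.

```python
# Invented cue sets paraphrasing Table I's descriptions of the six basic
# emotions; this is an illustrative simplification, not FACS scoring.
EMOTION_CUES = {
    "anger":    {"inner_brow_lowered", "eyes_wide", "lid_tightened", "lip_tightened"},
    "disgust":  {"nose_wrinkled", "lip_corner_depressed", "lower_lip_depressed"},
    "fear":     {"brow_raised", "brow_lowered", "upper_lid_raised", "lip_stretched"},
    "happy":    {"cheek_raised", "lip_corners_back", "mouth_open"},
    "sad":      {"inner_brow_raised", "brow_lowered", "lip_corner_depressed"},
    "surprise": {"brow_raised", "upper_lid_raised", "jaw_dropped"},
}

def detect_face(frame):
    """Step i: stand-in for a real face detector; here the frame is the face."""
    return frame

def extract_cues(face):
    """Step ii: stand-in for feature extraction; returns observed facial cues."""
    return set(face["cues"])

def classify(observed):
    """Step iii: pick the emotion whose cue set overlaps the observation most."""
    best = max(EMOTION_CUES, key=lambda e: len(EMOTION_CUES[e] & observed))
    return best if EMOTION_CUES[best] & observed else None

frame = {"cues": ["brow_raised", "upper_lid_raised", "jaw_dropped"]}
label = classify(extract_cues(detect_face(frame)))
```

For the toy frame above, the overlap rule selects "surprise", since all three observed cues match that prototype while "fear" shares only two.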

Figure 4. Markerless facial MoCap

TABLE I. THE SIX FACIAL EMOTIONS, MODIFIED FROM [27], [29]

No. | Emotion   | Description
1   | Anger     | Inner eyebrows pulled lower, eyes wide open, lids tightened, and lips tightened or opened (exposing the teeth)
2   | Disgust   | Relaxed eyebrows and eyelids, wrinkled nose, lip corner depressor, and lower lip depressor (asymmetrical)
3   | Fear      | Inner/outer brow raiser, lowered brow, upper lid raiser, tightened lid, stretched lips and jaw, while the eyes are tense and alert
4   | Happy/Joy | Relaxed eyebrows, cheek raiser, lip corners pulled back toward the ears, and an open mouth
5   | Sad       | Inner brow raiser, slightly closed eyes, lowered brow, and lip corner depressor, while the mouth is relaxed
6   | Surprise  | Inner/outer brow raised, upper lid raised while the lower lid is relaxed, and jaw dropped open

V. AESTHETIC EXPERIENCE (AX)

The AX is fundamentally a form of communication between the media, individuals and the environment [19]. There is no doubt that the notion of 'aesthetic experience' is current in contemporary aesthetics, as AX is a basis of modern culture that has undergone an aestheticisation process. At the early stage, AX started as a 'regressive state' that revives early developmental phases (for example, symbiotic fusion with caregivers) [20]. By that, Hagman believed AX to be a special form of relationship with the object: audience members, without prejudice, are willing to give themselves over to the experience and approach it with compassionate neutrality. AX may appear through bodily and emotional engagement with the work of art, through a combined process of perception-action and motion-emotion loops [21].

Figure 5. Facial expressions by Tarnowski et al. [28]

A. Emotions

Certain visual cues can evoke an emotional response, while the view of a person's social background along with emotional experience becomes a continuous trigger of experience [22]. Therefore, movements, especially in dance, are intimately linked to emotional experience; the connection between movement and emotions is believed to be bidirectional, where the human body may be the main influence on, or the gainer of, certain emotional experiences [23]. Emotions are believed to be signs of values, as emotion charges the AX from a pragmatist perspective [24]. Emotions are human psychological states that are important in personal relations, experience and everyday life, and hundreds of emotions have been identified [25]. As mentioned by Falip et al. [26], emotions have many definitions and categorisations. Thus, with facial expressions, emotions will differ and can be measured (Fig. 5). Basic emotions, also known as primary, fundamental or prototype emotions, are hard to define, as scholars and theorists have not agreed on which and how many emotions are considered basic, though most counts are between three and eleven [27]. Tarnowski et al. [28] argue that facial emotions can be divided into six basic emotions: anger, disgust, fear, surprise, joy, and sadness (Table I).

VI. UNCANNY VALLEY

Highly realistic computer games, immersive applications and computer-generated characters can sometimes set off a sensation of eeriness, or even disgust, in the user. The negative sensation caused by highly realistic characters was discussed by Masahiro Mori, a roboticist who identified the negative sensations revealed when humans respond to robots and prosthetics that are not quite human-like [30], in a 1970 article in the Japanese journal Energy; it was not completely translated into English until 2012. Mori [31] posits a graph with human likeness on the x-axis and the level of affinity on the y-axis. Referring to Fig. 6, people have greater affinity as robots become more human-like, for instance a prosthetic hand that has achieved a degree of resemblance to the human form. However, once people realise that the real-looking hand is actually artificial, they lose their sense of affinity, or the level of affinity turns negative. In this case, the hand falls into what is known as the 'uncanny valley'.

Figure 6. Mori's graph of the uncanny valley [31]
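The qualitative shape of Mori's graph can be made concrete numerically. The function below is purely illustrative (the formula and the dip location are invented, not Mori's data): affinity rises with human likeness, plunges into a narrow "valley" near high likeness, and recovers toward full human likeness.

```python
import math

def affinity(likeness):
    """Toy affinity curve for likeness in [0, 1]: rising trend minus a dip.

    The Gaussian dip centred at 0.85 stands in for the uncanny valley; both
    its position and depth are arbitrary choices for illustration.
    """
    rising = likeness  # 'more human-like, more affinity' overall trend
    valley = 1.4 * math.exp(-((likeness - 0.85) ** 2) / 0.004)
    return rising - valley

# Sample the curve and locate the bottom of the valley on a coarse grid.
samples = [round(affinity(x / 100), 3) for x in range(0, 101, 5)]
bottom = min(range(0, 101), key=lambda x: affinity(x / 100)) / 100
```

Sampling the toy curve shows affinity climbing steadily, collapsing below zero around the dip, and recovering to its maximum at full likeness, which mirrors the prosthetic-hand example discussed above.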


Similar to other computer-generated characters, when the user perceives that the character is not really alive, the uncanny valley sensation will make the user see it as a dead corpse or zombie. The situation gets worse when the uncanny valley character is animated to move [32], as movement draws attention to the details that shape human affinity towards the character. Additionally, the uncanny valley in modeling and animation is also fundamentally affected by a series of inconsistencies, such as i) inconsistency between realistic modeling and creepy animation, ii) inconsistency between the animation of one part of the face and another, and iii) inconsistency between the 3D face animation and the expressions of our own faces [33].

Tinwell et al. [34] indicate that once a character reaches high levels of human likeness, exhibiting human-like appearance and behavior, the audience will apply greater scrutiny. A factor like facial expression is often the cause that makes a character look uncanny and appear lifeless, especially when the character's expressions appear odd or unnatural. Makarainen et al. [35] add that there are two specific ways that make a character look uncanny: it appears either i) distorted or ii) lifeless. Uncanniness due to distortion refers to situations in which one can detect something abnormal, and the perception of that abnormality disturbs the audience; for instance, a face with abnormally large eyes will become uncanny even when the degree of realism is high. As for lifelessness, the uncanny sensation cannot escape the audience when they look at a perfectly human-like robot; this feeling appears when the audience becomes aware that the robot is not human.

In this context, Tinwell et al. [34] have outlined many guidelines to advise character designers on how to cross the uncanny valley. The guidelines include factors like facial features and proportions, as well as the level of detail in skin texture. Designers can achieve more positive results if the character is designed with non-human features but has the ability to emote like a human [36]. On the other hand, changing a character's appearance to be more cartoon-like can eliminate the uncanny and boost positive feelings among the audience [37]. Geller [38] also agrees that a good way to cross the uncanny valley is to alter a character's proportions and structure outside the 'human' range. An example given in his article is why Gollum is so successful: he has big eyes and the shape of his face is not quite human. The audience unanimously deems the character not human, and thus does not have to judge it by human rules; but if the designer portrays a human, the audience can figure out what is missing.

VII. CONCLUSION

The advancement of digital technology, including computer graphics, has made a significant change in how we deal with cultural preservation. With perfect computer renderings, it is almost impossible for us to distinguish them from reality. Undeniably, increasing the accuracy of recorded motion and the realism of the 3D modeling environment can notably increase user immersion; but, conversely, improving the virtual character's realism seems to impede the user's acceptance, as highlighted in the uncanny valley. In the context of Mak Yong digital preservation, digital technology may improve significantly, but at the same time the aesthetic experience must be well considered in order to ensure that the types of emotions intended to be portrayed are accurately represented and do not derail during the process of digitalisation, as described by the uncanny valley. Therefore, this research is crucial in order to provide a clear guideline for future researchers, creative industry practitioners and local cultural organisations.

CONFLICT OF INTEREST

The authors declare that there is no conflict of interest.

AUTHOR CONTRIBUTIONS

All authors were involved in writing the paper and approved the final version.

ACKNOWLEDGMENT

This research was supported in part by the Ministry of Education Malaysia (MOE) through the Fundamental Research Grant Scheme (FRGS/1/2017/WAB04/UPSI/02/1).

REFERENCES

[1] A. S. H. Shafii, "Management of commercial Makyung Kelantan," in Proc. 18th Biennial New Zealand Asian Studies Society International Conference, Rutherford House (RH), Pipitea Campus, 2009.
[2] A. Aristidou, E. Stavrakis, P. Charalambous, Y. Chrysanthou, and S. L. Himona, "Folk dance evaluation using Laban movement analysis," Journal on Computing and Cultural Heritage, vol. 8, no. 4, pp. 1-19, 2015.
[3] N. Mustaffa and M. Z. Idris, "Accessing accuracy of structural performance on basic steps in recording Malay Zapin dance movement using motion capture," January 2017.
[4] G. S. Yousof, Mak Yong through the Ages: Kelantan's Traditional Dance Theatre, Malaysia: University of Malaya Press, 2018, p. 89.
[5] G. S. Yousof, Mak Yong World Heritage Theatre, Malaysia: Areca Books Asia Sdn Bhd, 2019.
[6] M. Z. Idris, N. B. Mustaffa, and S. O. S. Yusoff, "Preservation of intangible cultural heritage using advance digital technology: Issues and challenges," Harmonia: Journal of Arts Research and Education, vol. 16, no. 1, p. 1, 2016.
[7] E. Hegarini, E. Dharmayanti, and A. Syakur, "Indonesian traditional dance motion capture documentation," in Proc. 2nd International Conference on Science and Technology-Computer, 2016, pp. 108-111.
[8] L. Mündermann, S. Corazza, and T. P. Andriacchi, "The evolution of methods for the capture of human movement leading to markerless motion capture for biomechanical applications," Journal of NeuroEngineering and Rehabilitation, 2006.
[9] M. Alivizatou-Barakou et al., "Intangible cultural heritage and new technologies: Challenges and opportunities for cultural preservation and development," in Mixed Reality and Gamification for Cultural Heritage, M. Ioannides, N. Magnenat-Thalmann, and G. Papagiannakis, Eds., Springer, Cham, 2017.
[10] P. Nogueira, "Motion capture fundamentals: A critical and comparative analysis on real-world applications," in Proc. the 7th

Doctoral Symposium in Informatics Engineering, 2011, pp. 303-314.
[11] I. Kico, N. Grammalidis, Y. Christidis, and F. Liarokapis, "Digitization and visualization of folk dance in cultural heritage: A review," Inventions, vol. 3, no. 4, p. 72, 2018.
[12] A. Aristidou, E. Stavrakis, P. Charalambous, Y. Chrysanthou, and S. L. Himona, "Folk dance evaluation using Laban movement analysis," Journal on Computing and Cultural Heritage, vol. 8, no. 4, pp. 1-19, 2015.
[13] V. Mayya, R. M. Pai, and M. M. M. Pai, "Automatic facial expression recognition using DCNN," Procedia Computer Science, vol. 93, pp. 453-461, September 2016.
[14] A. Woodward, P. Delmas, G. Gimel'farb, and J. Marquez, "Low cost virtual face performance capture using stereo web cameras," in Proc. Pacific Rim Conference on Advances in Image and Video Technology, 2007.
[15] W. Wójcik, K. Gromaszek, and M. Junisbekov, "Face recognition: Issues, methods and alternative applications," in Face Recognition - Semisupervised Classification, Subspace Projection and Evaluation Methods, IntechOpen, July 2016.
[16] R. Deshmukh and V. Jagtap, "A comprehensive survey on techniques for facial emotion recognition," International Journal of Computer Science and Information Security, vol. 15, no. 3, pp. 219-224, 2017.
[17] L. Edward, S. Dakpe, P. Feissel, B. Devauchelle, and F. Marin, "Quantification of facial movements by motion capture," Computer Methods in Biomechanics and Biomedical Engineering, vol. 15, suppl. 1, pp. 259-260, 2012.
[18] B. Martin, C. Wallraven, D. W. Cunningham, and H. H. Buelthoff, "Combining 3D scans and motion capture for realistic facial animation," Eurographics, 2003.
[19] A. R. Cuthbert, Understanding Cities: Method in Urban Design, London: Routledge, 2011.
[20] G. Hagman, Aesthetic Experience: Beauty, Creativity, and the Search for the Ideal, Amsterdam: Rodopi, 2005.
[21] I. Brinck, Cognitive Processing, vol. 19, p. 201, 2018. [Online]. Available: https://doi.org/10.1007/s10339-017-0805-x
[22] P. Smith, "A psychological model for aesthetic experience," Leonardo, vol. 9, no. 1, pp. 25-31, 1976.
[23] N. F. Bernardi, A. Bellemare-Pepin, and I. Peretz, "Enhancement of pleasure during spontaneous dance," Front. Hum. Neurosci., 2017.
[24] P. Määttänen, "Emotionally charged aesthetic experience," in Aesthetics and the Embodied Mind: Beyond Art Theory and the Cartesian Mind-Body Dichotomy, vol. 73, pp. 85-99, 2015.
[25] A. Magid and M. I. Mohammed, "The tree of emotions: Exploring the relationships of basic human emotions," The International Journal of Indian Psychology, vol. 5, no. 1, 2017.
[26] S. V. Falip, J. D. Castells, and D. F. Escudero, "Facial animation and motion capture: Key role for the communication," in Proc. 3rd International Multi-Conference on Society, Cybernetics and Informatics, 2009, pp. 134-139.
[27] J. Kumari, R. Rajesh, and K. M. Pooja, "Facial expression recognition: A survey," Procedia Computer Science, vol. 58, pp. 486-491, 2015.
[28] P. Tarnowski, M. Kolodziej, A. Majkowski, and R. J. Rak, "Emotion recognition using facial expressions," in Proc. ICCS, 2017.
[29] S. Sumpeno, M. Hariadi, and M. H. Purnomo, "Facial emotional expressions of life-like," 2011.
[30] V. Schwind, K. Wolf, and N. Henze, "Avoiding the uncanny valley in virtual character design," Interactions, vol. 25, no. 5, pp. 45-49, 2018.
[31] M. Mori, "The uncanny valley," IEEE Robotics and Automation Magazine, vol. 19, no. 2, pp. 98-100, 2012.
[32] F. Engländer. (2014). The Uncanny Valley. [Online]. Available: https://www.animatorisland.com/the-uncanny-valley/?v=75dfaed2dded
[33] Natural Front. (2018). The Uncanny Valley: What It Is and How to Avoid It in Your Animation. [Online]. Available: https://naturalfront.com/the-uncanny-valley-what-it-is-and-how-to-avoid-it-in-your-animation
[34] A. Tinwell, M. Grimshaw, D. A. Nabi, and A. Williams, "Facial expression of emotion and perception of the uncanny valley in virtual characters," Computers in Human Behavior, vol. 27, no. 2, pp. 741-749, 2011.
[35] M. Makarainen, J. Katsyri, and T. Takala, "Exaggerating facial expressions: A way to intensify emotion or a way to the uncanny valley?" Cogn. Comput., vol. 6, no. 4, pp. 708-721, 2014.
[36] E. Schneider, Y. Wang, and S. Yang, "Exploring the uncanny valley with Japanese video game characters," in Proc. Situated Play, DiGRA Conference, Tokyo, Japan, 2007, pp. 546-549.
[37] D. Hanson, "Exploring the aesthetic range for humanoid robots," in Proc. the ICCS/CogSci-2006 Long Symposium: Toward Social Mechanisms of Android Science, 2006, pp. 16-20.
[38] T. Geller, "Overcoming the uncanny valley," IEEE Computer Graphics and Applications, vol. 28, no. 4, pp. 11-17, 2008.

Copyright © 2020 by the authors. This is an open access article distributed under the Creative Commons Attribution License (CC BY-NC-ND 4.0), which permits use, distribution and reproduction in any medium, provided that the article is properly cited, the use is non-commercial and no modifications or adaptations are made.

Muhammad Zaffwan Idris is an Associate Professor at Universiti Pendidikan Sultan Idris, Malaysia. He obtained his Professional Doctorate (D.Des) and Masters Degree from Swinburne University of Technology, Australia, in 2011 and 2006, respectively. Since graduating from Universiti Teknologi MARA in 1999 with his Bachelor Degree, he has spent an extensive number of years designing, lecturing, consulting, exhibiting and publishing in the areas of Graphic Design, Photography, Creative Multimedia and Digital Heritage.

Naimah Musa is a PhD student at Universiti Pendidikan Sultan Idris. She obtained her Masters Degree (MA) from Universiti Teknologi MARA, Malaysia, in Visual Communication and New Media. After completing her Bachelor Degree in Graphic Design in 2009, she gained experience in several areas such as desktop publishing, printing, new media and education. She passionately explores Graphic Design, especially Aesthetics, Ornaments, Digital Heritage and Motion Capture (MoCap).
