
TECHNIQUES IN LIP-SYNC AND FACIAL TRACKING: THE PRE-EVOLUTION TO MOCAP

Jong Sze Joon
Faculty of Creative Multimedia, Multimedia University
63100 Cyberjaya, Selangor, Malaysia
e-mail: [email protected]

Eddie Soon Eu Hui
LimKokWing University College of Creative Technology
63100 Cyberjaya, Selangor, Malaysia
e-mail: [email protected]

Abstract

In terms of realistic animation, MoCap (Motion Capture) gains the upper hand over techniques such as keyframing and simulation due to the capability of real-time visualization and the accuracy of recorded data to produce a distinctive, natural-looking quality of movement for animated characters.

There are many distinctive MoCap (Motion Capture) based animations from all over the world where hyper-real virtual CG characters will be an element of the future of digital storytelling.

There are still many unsolved problems, such as difficulty in hair plug-ins for realistic flowing hair and follicles, cloth, skin texturing and proportion deformations. This research will focus on the complexity of manipulating MoCap data to further develop the controls of character animation to achieve most of the animation principles. This will then enable the animator to understand character motion manipulation to a new degree. One way of accomplishing this task is to understand and form a proper integration among the applications needed, from capturing the data to manipulating the animation curves for the desired motion of the character.

With a proper integration system established, the animator can have full control of how far the character can be manipulated. The final output will be more effective with enhancement over realistic motion for character animation. In conclusion, this research will recommend creative ways of using MoCap in 3D animation based on the various approaches proposed.

Keywords: Motion-capture, animation, real-time visualization, character motion manipulation.

1. Introduction

"There is no particular mystery in animation...it is really very simple, and like anything that is simple, it is about the hardest thing in the world to do." Bill Tytla (1937).

Motion capture is not without controversy, however. The goal of animation is not to create human-like motion, but to impart unique personalities to animated characters, to give them the "illusion of life". Mocap is used to create 3D and natural simulations in a performance-oriented way. Thus, some might better define the term as performance animation. However, the current state of local MoCap production is still limited by resources, knowledge and technology, as well as by the lack of experience and reference among Malaysian practitioners. Therefore, it is hard to develop local content of motion-capture based animation without any guidelines and references.

MoCap can be considered a shortcut to photorealistic, real-life based animation. It also has some very useful applications for many types of users: not only does it contribute to the industry, but it also serves edutainment purposes. Beyond that, the usefulness of Mocap contributes greatly to medical, simulation, engineering and ergonomic applications, for the creation of generic and special-purpose virtual reality character integration. It is also used in the entertainment industry for feature films, advertising, TV, and 3D computer games. Its sole purpose is not simply to duplicate the movements of an actor or animator, but also to take a human's emotion and record it in some fashion.

The use of motion capture for computer character animation is fairly new, having begun in the late 1970s, and is only now beginning to become widespread. The idea of copying human motion for animated characters has, of course, been practiced for some time. A method called rotoscoping has been commonly utilized for replicating live footage of actors playing out scenes. This technique was invented in 1915 by Max Fleischer, a cartoonist, in an attempt to automate the production of animated cartoons. The idea is to painstakingly trace the image of the captured film frame by frame onto paper. It became contentious almost immediately because Fleischer was trying to get studios to mass-produce cartoon characters using this new technology, and that meant 2D animators would end up jobless.

Figure 1. Mocap data of an actor playing a drum, converted to a virtual character in 3D.

It took time for the industry to understand that rotoscoping should only be applied in cases where hyper-realistic human motion is intended. It remains controversial, and most studios do not like to admit using it. Later on, this copying technique was applied to 3D animation as well, whereby the animator overlays 3D models on background 2D drawn images and even video sequences. Beyond that, compositing and editing software was upgraded to perform rotoscoping as well.

The goal of animation is not to create human-like motion, but to impart unique personalities to animated characters, to give them the "illusion of life". Both the Rotoscope and motion capture impose human motion on animated characters, which makes them seem subtle and lifeless in comparison to those animated or hand-keyframed by skilled artists. This is also because actors cannot break the laws of reality to fulfil the principles of animation applied in fine animation. In the case of rotoscoping, artists trace human motion but interpret it through the model of the animated character. In the case of motion capture, human motion is copied and the data is directly applied to the animated character. The temptation to use this captured motion and call it "animation" has led computer animators practiced in the art of traditional animation to call it "Satan's Rotoscope" (a term attributed to animator Steph Greenberg).

Figure 2. Example of Mocap data for the game Quake 3: Arena.

Although the introduction of Mocap (motion capture) may date back as far as the 70s, it is now globally projected more into computer graphics character animation. Basically, the Mocap system enables the animator to record the precise movement of a human subject in time and space for immediate or delayed analysis and playback, which can later be modified and applied to an existing 3D character model in any 3D platform. The recorded data can be as general as the simple position of the body interacting with the geographical environment around it, or as intricate as the movement of facial expressions comprising muscle movements. However, no matter how high the technology, the most important thing about Mocap is the ability of the actor to act.
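To make the idea of recorded motion data concrete, the following is a minimal Python sketch of how a captured clip might be stored and replayed onto an existing character rig. The MocapClip and SimpleRig classes and their methods are hypothetical illustrations for this paper's description, not the API of any real MoCap package.

from dataclasses import dataclass, field

class SimpleRig:
    """Stand-in for an existing 3D character model (hypothetical API)."""
    def __init__(self):
        self.joint_rotations = {}
    def set_joint_rotation(self, joint, rotation):
        self.joint_rotations[joint] = rotation

@dataclass
class MocapClip:
    """Per-frame joint rotations (degrees), sampled at a fixed rate."""
    frame_rate: float = 120.0                    # sample rate is an assumption
    frames: list = field(default_factory=list)   # each frame: {joint: (rx, ry, rz)}

    def record_frame(self, joint_rotations):
        self.frames.append(dict(joint_rotations))

    def apply_to_rig(self, rig, frame_index):
        # Delayed playback: the data can be edited before it is applied,
        # which is where the animator regains control over the motion.
        for joint, rotation in self.frames[frame_index].items():
            rig.set_joint_rotation(joint, rotation)

clip = MocapClip()
clip.record_frame({"l_elbow": (0.0, 12.5, 3.0), "head": (5.0, 0.0, 0.0)})
rig = SimpleRig()
clip.apply_to_rig(rig, frame_index=0)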

Figure 3. Camera capturing with a slow shutter speed to show the path of the markers in space.

2. From times of yore to contemporary

2.1 A STEP BACK TO THE PAST

“The big question is, why make digital humans at all? Sure, they look cool and they've got a certain kitsch value, but what purpose do they really serve in society?” - Laura Schiff (2002).

Figure 4. Polygon mask done in Digital Fusion to show rotoscoping in compositing software.

2.1.1 What The Future Beholds?

Out of the many Mocap systems engineered, only a few classifications have found favor. The Electromagnetic and Electromechanic Mocap systems were previously used massively in the industry. In the past few years, Optical Mocap was introduced. The following is a brief description of these Mocap systems.

2.1.2 Electromagnetic Motion Capture

One type of motion-capture device is the electromagnetic tracker, which consists of a series of receivers or sensors, a transmitter and a control unit. The sensors are placed at all the joints of the actor, and they are connected to the control unit, which is placed on the main body of the performer. The transmitter generates a low-frequency magnetic field, and as the receivers move through it, the control unit tracks their signals in order to calculate their positions in space.

These systems have been widely used for many years in broadcast and military applications. They have a higher margin of error and a lower capture frequency than the optical types, but they are less expensive and capable of real-time feedback. Their biggest drawbacks are their susceptibility to interference from metallic objects and their confinement to a small capture area.

2.1.3 Electromechanic Motion Capture

These systems are armatures that are worn by the performer during capture. They consist of angular measurement devices that measure the rotation of joints. Their output can generate motion data in real time, but their margin of error is large and their capture frequency is low. Because of that, this system is very economical. Because they only measure the rotation of joints, most electromechanic suits do not have a way of capturing the global position of the performer, so they need to substitute it from other resources. One of the disadvantages of using a mechanical suit is that it cannot collide with another actor. However, it can be used outdoors, covering a wider capture area.

2.1.4 Optical Motion Capture

There are mainly two types of optical capture systems commercially used today: Active Optical and Passive Optical systems. They both use the same underlying principles. A chain of cameras placed around the capture area tracks the positions of markers attached to the bodies of the actors. Triangulation is used to compute the 3D position of a marker at any given sample from the array of 2D information from every camera. The Active optical system uses illuminating elements as markers; Passive optical systems, on the other hand, use retro-reflective markers. Most systems are more flexible in the use of passive markers, due to the capability of achieving greater speed.
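The triangulation step just described can be sketched in a few lines: each camera contributes two linear constraints on the marker's homogeneous 3D position, and a least-squares solve (the direct linear transform) recovers the point. The camera matrices below are toy placeholders, not calibration data from any real system.

import numpy as np

def triangulate(projections, points_2d):
    """projections: list of 3x4 camera matrices; points_2d: matching (u, v) observations."""
    rows = []
    for P, (u, v) in zip(projections, points_2d):
        # Each view contributes two linear constraints on the homogeneous 3D point.
        rows.append(u * P[2] - P[0])
        rows.append(v * P[2] - P[1])
    A = np.vstack(rows)
    # Homogeneous least squares: the smallest right singular vector of A.
    _, _, vt = np.linalg.svd(A)
    X = vt[-1]
    return X[:3] / X[3]

# Two toy cameras: an identity view and one shifted along x.
P1 = np.hstack([np.eye(3), np.zeros((3, 1))])
P2 = np.hstack([np.eye(3), np.array([[-1.0], [0.0], [0.0]])])
marker = np.array([0.2, 0.1, 4.0])

def project(P):
    x = P @ np.append(marker, 1.0)
    return (x[0] / x[2], x[1] / x[2])

print(triangulate([P1, P2], [project(P1), project(P2)]))  # ~ [0.2, 0.1, 4.0]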

Figure 5. Mocap actor using the electromagnetic suit.

Figure 6. Actor recording data using the Optical System. The background behind shows the real-time 3D representation.

Figure 7. The motion of the actor is captured and then manipulated in Vicon8.

2.2 FACIAL EXPRESSION AND LIP SYNCING IN 3D

Figure 8. Example of different facial expressions of a 3D character, Lela.

When working on a 3D animation, one can't actually animate until the 3D character has been modelled and set up for animation. For character bodies, this usually consists of model-segmented joints linked in a typical hierarchy, or, for a more complex occasion, a full skeleton setup.
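As a rough illustration of such a joint hierarchy, the Python sketch below stores a local offset per joint and walks up the parent chain to find world positions; the joint names and offsets are invented for the example.

import numpy as np

class Joint:
    def __init__(self, name, parent=None, local_offset=(0, 0, 0)):
        self.name = name
        self.parent = parent
        self.local_offset = np.asarray(local_offset, dtype=float)

    def world_position(self):
        # Accumulate offsets up the hierarchy; the root has no parent.
        if self.parent is None:
            return self.local_offset
        return self.parent.world_position() + self.local_offset

hips = Joint("hips", local_offset=(0, 1.0, 0))
spine = Joint("spine", parent=hips, local_offset=(0, 0.3, 0))
head = Joint("head", parent=spine, local_offset=(0, 0.5, 0))
print(head.world_position())   # [0.  1.8 0. ]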

However, for the case of facial animation, most animators prepare a set of facial expressions ready to apply to their character. How the data is processed actually depends on the 3D software. Some software requires a minimal amount of scripting to link the expressions to the character model. Others might depend on plug-ins to achieve the purpose.

There is one significant case that one needs to consider when preparing the facial setup. This is, of course, emotions or expressions. Unless the appointed character is to remain perfectly dull and emotionless, you will need to make it look happy, sad or a wide variety of other expressions. Typically there are 6 base emotions of a human character. They are:

• Sad
• Angry
• Fear
• Disgust
• Surprise
• Happy

Figure 9. Different NURBS faces that use the same surface topology (number of CVs) and the same facial poses using blend shape techniques.

When thinking about lip-sync in 3D, it helps to think about how 2D animation achieves lip-sync. By drawing different mouth shapes for a character on different frames, it creates the appearance of speech in cartoons. By animating the mouth shapes timed to a specific dialogue, the character appears to speak. Each of these mouth shapes is usually referred to as a phoneme. Lip-syncing in 3D works in a similar way. In this case, though, we're only focusing on the head of the character.

For the basic breakdown, the lowest number of phonemes most people use is 9. These are usually broken down as follows (a small mapping sketch is given after the list):

• A, I
• O
• E (as in sweet)
• U
• C, K, G, J, R, S, TH, Y, Z
• D, L, N, T
• W, Q
• M, B, P
• F, V
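As promised above, here is a small Python sketch of that nine-shape breakdown: each group of sounds shares one mouth shape, and dialogue given as (sound, frame) pairs becomes a frame-timed list of shapes. The shape names and timings are invented for illustration.

PHONEME_GROUPS = {
    "mouth_AI":   ["A", "I"],
    "mouth_O":    ["O"],
    "mouth_E":    ["E"],                       # as in "sweet"
    "mouth_U":    ["U"],
    "mouth_CKG":  ["C", "K", "G", "J", "R", "S", "TH", "Y", "Z"],
    "mouth_DLNT": ["D", "L", "N", "T"],
    "mouth_WQ":   ["W", "Q"],
    "mouth_MBP":  ["M", "B", "P"],
    "mouth_FV":   ["F", "V"],                  # often reused for TH
}
# Invert the table so each sound looks up its mouth shape directly.
SOUND_TO_SHAPE = {s: shape for shape, sounds in PHONEME_GROUPS.items()
                  for s in sounds}

def breakdown(sounds_with_frames):
    """[(sound, frame), ...] -> [(mouth shape, frame), ...]"""
    return [(SOUND_TO_SHAPE[s], f) for s, f in sounds_with_frames]

# "Hello" sketched as sounds landing on particular frames:
print(breakdown([("E", 1), ("L", 4), ("O", 7)]))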

Figure 10. The phonemes most people use for lip-sync animation.

Sometimes these shapes overlap. For example, you may need to use the F phoneme for TH. So this should be looked at as a basis that you can work from and not something that must be rigidly followed.

For more precision, the phonemes can be broken down into more sub-phonemes. This will help the animation attain a more realistic appearance. In most cases phonemes are created not only in a flat style but also comprise a mixture of expressions.

This is where facial motion capture comes into the picture. Like the optical Mocap system, facial motion capture involves the detailed placing of retroreflective markers on the actor's face. Using the same technology as the optical Mocap systems, the markers are illuminated by an infrared strobe, and a CCD camera captures the reflected light. Even though the optical Mocap system can capture both the actor's movements and the facial expressions at the same time, the facial Mocap is usually recorded separately for more accuracy. Apart from that, the capture is done separately also because voice recording is done at the same time as the facial capture.
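One plausible way to turn the captured marker motion into face animation, sketched here as an assumption rather than a method the paper prescribes, is to solve frame by frame for blend-shape weights whose combined marker displacements best match the capture. All numbers are toy values.

import numpy as np

# Columns of B: the displacement of every marker coordinate when one
# blend shape (e.g. "smile", "jaw_open") is at full strength.
B = np.array([[0.0,  0.0],
              [0.8,  0.1],    # mouth-corner marker, x
              [0.0, -0.6]])   # jaw marker, y

captured = np.array([0.0, 0.4, -0.3])   # marker displacements this frame

# Least-squares weights, clamped to the usual 0..1 blend-shape range.
weights, *_ = np.linalg.lstsq(B, captured, rcond=None)
weights = np.clip(weights, 0.0, 1.0)
print(weights)   # ~ [0.44, 0.5] -> part smile, half jaw open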

"From a production standpoint, there are a number of significant 2.2.1 Facial Animation advantages to real-time face tracking" - Diana Phillips Mahoney, (2000), Computer Graphics World. Facial animation specifically addresses the generation of facial expressions and mouth movements. There are two basic In the production process, the animators always spend a lot of approaches: parameterized key positions and physically based animating and rendering time. At the end of the day, the director models. might no find it satisfying to see the final animation done by his team. If there are changes to be applied, the animators need to The 3D model head is usually modeled with a mesh of surface spend another week or so animating and rendering. With facial patches. In the parameterized approach, some subset of the mesh motion capture, the director can just sit in and evaluate the capture vertices are placed in a key position and a parameter value is of the expressions done by the actors. If he is dissatisfied, all he associated with the position. As the parameter value is varied, the needs to do is to ask the actor to act again. Compared to the key- mesh vertices' positions are interpolated between key positions. framing method, facial motion capture is much faster and cost Facial animation is performed by appropriately varying parameter effective. values. Unfortunately, because of the flexibility of the face, this can result in a large number of parameters that are difficult for a user to deal with efficiently.

In a simple physically based approach, the mesh edges and vertices can be used to represent springs and mass nodes. By controlling spring constants and masses, the face can be animated by moving a small number of vertices. Unfortunately, the results this model produces fall short of being perfect.
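A toy version of that spring-and-mass idea, with invented constants: mesh edges act as springs, vertices as masses, and moving one pinned control vertex drags its neighbours along over a few explicit Euler steps.

import numpy as np

positions = np.array([[0.0, 0.0], [1.0, 0.0], [2.0, 0.0]])   # three vertices
velocities = np.zeros_like(positions)
edges = [(0, 1), (1, 2)]            # springs along mesh edges
rest = {e: 1.0 for e in edges}      # rest lengths
k, mass, dt, damping = 40.0, 1.0, 0.016, 0.9

positions[0] += np.array([0.0, 0.5])    # the animator moves a control vertex

for _ in range(10):                     # a few explicit Euler steps
    forces = np.zeros_like(positions)
    for i, j in edges:
        d = positions[j] - positions[i]
        length = np.linalg.norm(d)
        f = k * (length - rest[(i, j)]) * d / length   # Hooke's law
        forces[i] += f
        forces[j] -= f
    velocities = damping * (velocities + dt * forces / mass)
    velocities[0] = 0.0                 # keep the control vertex pinned
    positions += dt * velocities
print(positions)                        # neighbours have been dragged along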

More advanced implementations of facial animation model the bone structure and the muscle groups that slide over it. In this case, more accurate modeling of the physics and geometry involved produces very good results. However, there is a computational price to be paid for the higher quality of animation (no surprise there).

2.3 FACE TRACKING

"Mother Nature, in an apparent effort to prevent unauthorized imitations of her work, devised a system in which every grimace, grin, and frown relies on an intricate interplay of muscles that is nearly impossible to replicate" – Diana Phillips Mahoney (2000), Computer Graphics World.

Figure 11. The different positions of the markers used to capture facial expressions and lip sync in the Optical MoCap system.

The human face might be one of the most distinguishing features in comparison with other parts of the human body. This is mainly because the face reveals the identity of the individual. Besides that, the few thousand expressions a face can make tell a lot of statements. Imagine all the different muscle groups around the face interacting to pull an expression. Now imagine manipulating vertices on a 3D model head to imitate all these facial expressions. It would be a painstaking and time-consuming way to animate the character.

2.4 Conclusion

In a nutshell, Motion Capture (MoCap) in computer graphics is a way to digitally record position and motion information of actors from the real world. The great advantage of MoCap over traditional animation techniques such as keyframing and simulation is the capability of real-time visualization and the high, natural-looking quality of the generated animation.

This is, however, effective only in cases where hyper-real human movement is required, or for characters that are meant to be realistic instead of hand-animated. Most cartoon characters don't look appealing when this kind of motion is applied to them, mostly because we expect them to move in a stylised way.

The product of motion capture data is rather distinctive due to its flexibility: it can be used in a variety of different applications, including military, medicine, law, sports, and performance animation for visual effects and video games.

In terms of exploring Mocap technicalities, more precise tracking calculations customized to the actor's bone structure, or even bending the rules of reality when applying the animation principles, can be used to create the ultimate character animation.

The development and usage of Mocap in Malaysia is still immature, whereby a lot of animators and other individuals in this field have not received sufficient exposure to operating Mocap systems. The research into Mocap applications is still at a superficial level, and actually implementing Mocap for production purposes can still be categorized as a vision in the making.

Perhaps sometime in the near future, Mocap can be seen as a common practice in various fields in the industry and might even get an early start at the academic level.

3. References

FRAISSE, F.W. Motion Capture Research Website. [Online]. Available: http://www.visgraf.impa.br/Projects/mcapture/mcapture.html [19th November 2002]

GOME, J. 1999. Motion Capture Research Website. [Online]. Available: http://www.css.tayloru.edu/instrmat/graphics/hypgraph/animation/motion_capture/history1.htm [20th September 2001]

KLEISER, J. 1993. Character Motion Systems. Course Notes: Character Motion Systems, In Proceedings of ACM SIGGRAPH 1993, ACM Press / ACM SIGGRAPH, Anaheim, CA, 33-36.

MAHONEY, D.P. 2000. Computer Graphics World. [Online]. Available: http://cgw.pennnet.com/Articles/Article_Displa...play&ARTICLE_ID=49006&KEYWORD=motion%20capture [24th March 2003]

STURMAN, D.J. 1999. Visual Effects Cinematography: Elements of Cinematography. [Online]. Available: http://www.VFXPro.com/article/mainv/0,7220,106549,00.html [15th September 2000]

TARDIF, H. 1991. Character Animation in Real Time. Panel: Applications of Virtual Reality I: Reports from the Field, In ACM SIGGRAPH Panel Proceedings.

WALTERS, G. 1989. The Story of Waldo C. Graphic. Course Notes: 3D Character Animation by Computer, In Proceedings of ACM SIGGRAPH 1989, Boston, July 1989, pp. 65-79.