eHeritage of Shadow Puppetry: Creation and Manipulation

Min Lin∗1, Zhenzhen Hu∗2, Si Liu1, Meng Wang2, Richang Hong2, Shuicheng Yan1
1ECE Department, National University of Singapore
2School of Computer and Information, Hefei University of Technology
{mavenlin, huzhen.ice, eric.mengwang, hongrc.hfut}@gmail.com, {dcslius, eleyans}@nus.edu.sg

ABSTRACT
To preserve the precious traditional heritage of Chinese shadow puppetry, we propose a puppetry eHeritage system, including a creator module and a manipulator module. The creator module accepts a frontal view face image and a profile face image of the user as input, and automatically generates the corresponding puppet2, which looks like the original person while retaining typical characteristics of traditional Chinese shadow puppetry. In order to create the puppet, we first extract the central profile curve and warp the reference puppet eye and eyebrow to the shape of the frontal view eye and eyebrow. Then we transfer the puppet texture to the real face area. The manipulator module accepts a script provided by the user as input and automatically generates the motion sequences. Technically, we first learn atomic motions from a set of shadow puppetry videos. A scripting system converts the user's input to atomic motions, and finally synthesizes the animation based on the atomic motion instances. For better visual effects, we propose a sparsity-optimization-over-simplexes formulation to automatically assemble weighted instances of different atomic actions into a smooth shadow puppetry animation sequence. We evaluate the performance of the creator module and the manipulator module sequentially. Extensive experimental results on the creation of puppetry characters and puppetry plays demonstrate the effectiveness of the proposed system.

Categories and Subject Descriptors
J.5 [Arts and Humanities]: Arts, fine and performing

Keywords
Chinese shadow puppetry; face rendering; sparsity optimization over simplex; animation

1. INTRODUCTION
Shadow puppetry has a long history in China, Indonesia, India, Greece, etc., as a form of entertainment for both children and adults, and is also popular in many other countries around the world1. We focus on Chinese shadow puppetry in this work. Chinese shadow puppetry, as shown in Figure 1(a), is a traditional artistic form of theater performance with colorful silhouette figures. These figures are made of leather or paper. A traditional Chinese shadow puppet is composed of a head part, shown in Figure 1(b), and a body part, shown in Figure 1(c). As an ancient form of storytelling, sticks and flat puppets are manipulated behind an illuminated background to create moving pictures [1]. By moving both the puppets and the light source, various effects can be achieved. But in the 21st century, shadow puppetry in China is in steep and fast decline. Audiences and apprentices are evaporating at an alarming rate. To protect this ancient artistic heritage, China's State Council put Chinese shadow puppetry on the first list of National Intangible Cultural Heritage of China in 2006, and the United Nations Educational, Scientific and Cultural Organization (UNESCO) listed the artistic form as Intangible Cultural Heritage in 20113.

Nowadays, the preservation of cultural heritage attracts growing attention around the world. Our motivation is to develop a system that applies the latest multimedia technologies to aid the preservation, interpretation, and dissemination of this ancient cultural heritage. To attract more people to shadow puppetry, we design two puppetry modules, a creator module and a manipulator module, so that people all over the world can experience the fun of creating and performing puppets by themselves.

The input of the creator module is two face images. One of the most important advantages of our creator module is personalization. The traditional making process of a puppet includes seven steps, all of which are complex and ingenious4. Our creator module can automatically generate anyone's personalized puppet, and the process is almost immediate. We pay special attention to keeping the characteristics of the traditional puppet during the automatic creation process. For example, the puppet's face has long narrow eyes, a small mouth and a straight bridge of the nose, as shown in Figure 1(b). Technically, we extract the central profile curve from the profile face image and warp the reference puppet eye into

∗indicates equal contribution
1 http://en.wikipedia.org/wiki/Shadow puppetry
2 In this paper, we use puppet and puppetry interchangeably.
3 http://www.unesco.org/culture/ich/index.php?lg=en&pg=00011&RL=00421
4 http://www.travelchinaguide.com/intro/focus/shadow-puppetry.htm

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from [email protected].
MM'13, October 21–25, 2013, Barcelona, Spain.
Copyright 2013 ACM 978-1-4503-2404-5/13/10 ...$15.00.
http://dx.doi.org/10.1145/2502081.2502104

Area Chair: Marc Cavazza

Figure 1: (a) A scene of traditional Chinese shadow puppetry. When light penetrates a translucent sheet of cloth, the audience sees the "shadows", silhouettes in profile. (b) Faces of puppets are characterized by exquisite headdresses and representative facial features, such as thin eyebrows. (c) The puppets are manipulated with sticks through three keypoints, i.e., neck, left hand and right hand. (d) and (e) are two results of our puppetry creator module, for a male and a female respectively. (f) is an exemplar sequence generated from user-provided scripts by the puppetry manipulator module. All figures in this paper are best viewed in the original PDF file.

the frontal view eye simultaneously. Then we transfer the puppet texture as the texture of the profile curve. One male and one female puppet generated by our creator module are shown in Figure 1(d) and Figure 1(e).

We also design a manipulator module. For this module, we aim to preserve the performance pattern of Chinese shadow puppetry during the manipulation process. There are several basic puppet motion patterns (denoted as atomic actions hereafter), such as walk, dance, fight, nod, laugh, etc. Besides, as shown in Figure 1(c), the real puppet is controlled with three sticks fixed on the puppet's neck and two hands separately, and the motion pattern of the other puppet parts is affected by gravity. Therefore, we need to simulate all the atomic actions in the puppet style. There are some existing research works on shadow puppets focusing on the user's body interaction with the virtual puppet, i.e., the puppet's motion imitates the user's motion [2], [3], [4]. But none of them preserves the puppet's specific motion style. In contrast, our module conveys this traditional artistic charm completely. Our manipulator module can directly accept text scripts as input and display the specified motions accordingly. The interface is shown in Figure 1(f). For manipulation, we identify atomic motions for the animation and collect instances from a set of shadow puppetry videos. A scripting system converts the user input into atomic motions, and finally synthesizes the animation using the collected instances.

The organization of the rest of this paper is as follows. In Section 2, we provide a review of related work. Section 3 gives an overview of our puppetry modules: creator and manipulator. Next, in Sections 4 and 5, a more detailed step-by-step introduction of the modules is presented. The experimental results are shown in Section 6. Finally, we conclude the paper in Section 7.

2. RELATED WORK
Our shadow puppetry system is built on several areas of related work.

2.1 eHeritage
Recently, for the purpose of preserving cultural heritage through the application of advanced computing technologies, eHeritage has become a hot research topic. The related literature is mainly divided into two categories: tangible cultural heritages and intangible cultural heritages. Tangible cultural heritages5 include buildings and historic places, monuments, artifacts, etc. Paviotti et al. [5] dealt with the problem of estimating the lighting field of the multispectral acquisition of frescoes by a variational method. Wolf et al. [6] studied the task of finding the joins of the Cairo Genizah, which is a precious collection of mainly Jewish texts. Luo et al. [7] presented a multi-scale framework to generate 3D line drawings for archaeological illustration. Intangible cultural heritage6 includes traditional festivals, oral traditions, oral epics, customs, ways of life, traditional crafts, etc. Seidl et al. [8] proposed the detection of gradual transitions in historic material. Mallik et al. [9] studied the preservation of Indian classical dance. Shadow puppetry belongs to the intangible cultural heritages.

5 http://www.unesco.org/new/en/cairo/culture/tangible-cultural-heritage/
6 http://www.unesco.org/new/en/cairo/culture/intangible-cultural-heritage/

2.2 Face Rendering
Digital image processing provides a solid foundation for building artistic rendering algorithms. All image-based artistic rendering approaches utilize image processing operations in some form to extract information or synthesize results [10]. Non-photorealistic rendering focuses on enabling a wide variety of expressive styles of digital art. Among these arts, cartoon and paper cutting are the most related to our work.

Cartoon: A cartoon is a form of two-dimensional illustrated visual art with a typically non-realistic or semi-realistic drawing or painting. The CharToon system [11] provides special skeleton-driven components, an extensive set of building blocks to design faces, and support for reusing components and pieces of animation. Chen et al. [12] explored a Pictoon system that allows users to create a personalized cartoon and animation from an input face image based on sketch generation and cartoon stroke rendering.

Paper Cutting: Artistic paper cutting is also a traditional and popular Chinese decorative art which, usually

in a very concise two-tone form, has its unique beauty of expressive abstraction. Xu et al. [13] generated arrangements of shapes via a multilayer thresholding operation to compose digital paper cutting designs. Meng et al. [14] rendered paper cutting images from human portraits. They localized facial components and used pre-collected representative paper cutting templates, then obtained a synthesized paper cutting image by matching templates with the bottom-up proposals.

Our work differs from cartoon and paper cutting in that we only focus on the face rendering of shadow puppetry, which has its unique characteristics. Thus, carefully tailored image processing techniques are required.

2.3 Puppetry Animation
Developing an interactive and user-friendly interface for people of all skill levels to create animation is a long-standing problem in computer animation. Recently, a few works have been conducted on digital puppetry. As a visualization tool for traditional cinematic animation, digital puppetry transforms the movements of a performer into the actions of an animated character to provide live performances such as [15]. Shadow puppets are also used to produce animated films, but producing an animated film with shadow puppets, frame by frame, is laborious and time-consuming. The solution of animation performed by two-dimensional puppets has appeared only recently. Hsu et al. [16] introduced a motion planning technique which automatically generates the animation of 2D puppets. Barnes et al. [17] created a video-based animation interface: users first create a cast of physical puppets and move these puppets to tell a story; the system tracks the motions and renders an animated video. Tan et al. [18] presented a method for interactive animation of 2D shadow play puppets with real-time visual simulation using texture mapping, blending techniques, and lighting and blurring effects. Kim et al. [19] controlled 3D avatars to create user-designed peculiar motions of avatars in real time using general interfaces. ShadowStory [20] was a project created by Fei et al. for digital storytelling inspired by traditional Chinese shadow puppetry; the system allows children to design their own puppets and animate them with a tablet PC. Pan et al. [21] presented a 2D shape deformation of a triangulated cartoon which is driven by its skeleton, where the animation can be obtained by retargeting the skeleton joints to the shape.

There are some websites where users can entertain themselves with puppetry, such as Puppet Parade7, an interactive puppetry installation that animates puppets by tracking the arms of the audience, or We Be Monsters8, a collaborative puppet art installation that tracks multiple skeletons to animate a puppet. Although shadow puppetry is not the focus of these websites, the body motion is used to control the puppets. None of these puppetry methods accurately considers the unique motion style of shadow puppetry.

With the development and increasing popularity of the Microsoft Kinect camera, many researchers have begun to explore how to achieve human-puppetry interaction via Kinect. Held et al. [22] presented a 3D puppetry system that allows users to quickly create 3D animations by performing the motions with their own familiar, rigid toys, props, and puppets. During a performance, the performer physically manipulates these puppets in front of a Kinect depth sensor. Leite et al. [23] presented an anim-actor technique: a real-time interactive puppet control system using low-cost motion capture based on the body movements of non-expert artists. Zhang et al. [4] proposed a general framework for controlling two shadow puppets, a human model and an animal model. Yoo et al. [24] created a tangible stage interface that can be used to control a marionette without wires based on Kinect. The Kinect camera can provide very accurate depth estimation, which greatly facilitates the animation process, but it also limits the system.

7 http://design-io.com/projects/PuppetParadeCinekid/
8 http://wearemonsters.net/

Figure 2: Illustration of the whole system framework. The upper panel is the creator module and the lower panel is the manipulator module.

3. OVERVIEW
In this section, we give an overview of the shadow puppetry system. As shown in Figure 2, the whole system contains two modules: the creator module and the manipulator module.

The first part is the creator module. Since a real puppet face usually contains a profile (including forehead, nose, mouth, etc.) and a frontal view eye, we require users to input one frontal view image and another profile view face image. First, we extract the eye and eyebrow from the frontal view image, and the puppet eye and eyebrow are deformed into the frontal view eye and eyebrow shape. Then the profile curve of the real profile face is extracted and deformed. Finally, the texture is transferred from a sample puppet to our generated puppet.

The second part is the manipulator module. The objective of this part is to create a system that takes in a user script and outputs an animation sequence. To build this system, we collect puppetry show videos and manually define atomic motions as the building blocks of the animation. Instances of the atomic motions are extracted from these videos. The script input by the user is first interpreted into an atomic motion sequence, which is then synthesized by a weighted mean of the instances, as introduced later.

4. MODULE I: PUPPETRY CREATOR
The design of Chinese shadow puppetry figures follows traditional aesthetics. There are mainly four kinds of puppetry faces: Sheng (male roles), Dan (female roles), Jing (roles with painted faces) and Chou (comic characters). These roles are mostly from classical plays and their appearances are formalized. Since the procedures of puppetry making are very complicated and costly, it is hard for the user to create a specific puppetry figure. In this section, we introduce the creator part of our system in detail.

Figure 3: Aligned frontal and profile view faces. (a) Frontal face alignment, (b) profile face alignment.

4.1 Face Alignment
Given a frontal view face image and a profile face image of the same person, we utilize the frontal view eye & eyebrow and the profile curve to create the head of a user-specific puppetry figure.

Face Alignment: We process the face alignment of the frontal face and the profile face separately. The frontal view face is aligned by a commercial frontal face alignment algorithm9. The alignment of a profile face is performed based on the unified model proposed by Zhu et al. [25] for face detection, pose estimation, and landmark estimation in real-world, cluttered images. We show the aligned results of the frontal and profile views in Figure 3. After face alignment, we obtain the locations of a set of keypoints on the face. Based on these points, we extract the profile curve from the profile face and rotate the curve for direction adjustment. Similarly, eye and eyebrow areas are extracted from the frontal image.

9 http://www.omron.com/r d/coretech/vision/okao.html

Figure 4: Eye & eyebrow warping. (a) Frontal face image, (b) extracted eye and eyebrow region after face alignment, (c) puppet face in our repository, (d) extracted eye and eyebrow of the puppet, (e) eye and eyebrow regions after warping the puppet's eye and eyebrow to the real face shape.

Figure 5: Profile curve generation. (a) The profile face provided by the user, (b) the extracted profile curve after rotation, (c) the skeleton of the profile curve, (d) the reference puppet face, (e) the reference profile curve skeleton and (f) the refined skeleton of the profile curve after applying recombination.

4.2 Eye & Eyebrow Warping
In Chinese shadow puppetry, most figures' faces are in profile view. To describe the features of the characters, eyes and eyebrows are abstracted and exaggerated in an artistic way. For example, the eye of a puppet is stretched and the eyebrow is bent. The topology of the puppet eye is similar to that of the frontal view eye. To guarantee the similarity of the output figure face, given a target frontal view face image and a puppet face, we warp the puppet eye and eyebrow into the real eye and eyebrow shape.

Most state-of-the-art warping methods [26, 27] assume that the user provides explicit correspondences between the anchor points of the source and target shapes, and other points are mapped according to their relative positions with respect to the anchor points. Barycentric coordinates of triangles provide a convenient way to linearly interpolate points within a triangle. Given a planar triangle [v_1, v_2, v_3], any point v inside it has unique barycentric coordinates [w_1, w_2, w_3] with:

\frac{w_1 v_1 + w_2 v_2 + w_3 v_3}{w_1 + w_2 + w_3} = v.   (1)

The barycentric coordinates are used as invariants to build the correspondences between two topologically equivalent polygons, characterized by their vertexes [27].

In our case, we first extract the contour points of the puppet eye C = {c_1, ..., c_n} by annotating landmarks, and detect the contour points of the real eye P = {p_1, ..., p_m} automatically. In our setting, n = m = 10. We consider the puppet eye and the real eye as two topologically equivalent polygons, characterized by C and P, and then build the correspondences between the two eyes based on barycentric coordinates. The correspondences are used to warp the reference puppet eye into the shape of the real eye. Similarly, we carry out the same process for the eyebrow.

One example of eye and eyebrow warping is shown in Figure 4. The original eye and eyebrow of the reference puppet in Figure 4(c) are warped to the shape of Audrey Hepburn's eye and eyebrow in Figure 4(b). The synthetic result is shown in Figure 4(e). The newly generated eye and eyebrow fuse the characteristics of Audrey Hepburn and the reference puppet: Audrey Hepburn's eyes are big and slightly turn upwards, and the reference puppet's eyes have obvious double eyelids.

4.3 Profile Curve Generation
The central profile curve is an important geometric feature. To generate an appropriate puppet profile, we consider two aspects: 1) the profile should look like the input face, and 2) the profile should have puppetry style. We keep the most unique parts of the profile curve and transform them into one with puppetry style.

4.3.1 Profile Abstraction
The central profile curve is highly discriminative. Some 3D facial research works use the central profile curve as a significant feature for face matching [28] and face recognition [29]. To obtain a discriminative and puppet-like personalized puppet, we generate a profile curve which has the uniqueness of the input face and the characteristics of the reference puppet face. In order to estimate the uniqueness of

a real profile face, we collect a 90-degree profile face dataset including 500 images from the well-known MultiPIE benchmark [30].

Figure 6: Texture transfer. (a) Binary puppet profile curve, (b) texture sample, (c) profile curve after texture transform.

Figure 7: Props in the puppetry and different head dressings. (a) Props in the play, (b) different head dressings.

Given a 90-degree profile face, we extract the edge of the profile face using the extended Difference of Gaussians (DoG) algorithm [31], and based on the face alignment result we obtain the profile curve. A profile curve is separated into four parts: forehead, nose, mouth and jaw. We aim to preserve the unique parts and replace the others with the corresponding parts of the reference puppet, which are manually annotated offline.

First, we process this curve area with binary morphology tools and extract the binary skeleton line with a width of only one pixel [32]. Then, for each part, we extract a histogram of oriented gradients (HOG) [33] feature vector. For each part, based on the MultiPIE dataset, we calculate the average pairwise distance d-bar, and the importance of the part from the input face is measured by the ratio of d, the mean of its distances to its K nearest neighbors (k = 3 in our implementation), over d-bar. The two most unique parts are kept and the other two parts are replaced by the corresponding parts of the shadow puppetry profile.

From the profile image shown in Figure 5(a), we get the direction-adjusted profile curve in Figure 5(b). The parts in green circles are the two most unique parts and are retained, as shown in Figure 5(c).
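The part-importance measure described above can be sketched as follows. This is a minimal sketch assuming HOG feature vectors have already been extracted for each of the four profile parts; the function names and array shapes are ours, not from the paper's implementation:

```python
import numpy as np

def part_importance(part_feat, dataset_feats, k=3):
    """Importance of one profile part (forehead, nose, mouth or jaw):
    mean distance from the input part's HOG feature to its k nearest
    neighbours in the dataset, divided by the dataset's average pairwise
    distance d-bar. A larger ratio means the part is more unusual and
    thus more worth keeping."""
    dists = np.linalg.norm(dataset_feats - part_feat, axis=1)
    knn_mean = np.sort(dists)[:k].mean()  # d: mean distance to the k-NN
    # d-bar: average distance over all pairs in the dataset
    diffs = dataset_feats[:, None, :] - dataset_feats[None, :, :]
    pairwise = np.linalg.norm(diffs, axis=2)
    d_bar = pairwise[np.triu_indices(len(dataset_feats), k=1)].mean()
    return knn_mean / d_bar

def select_unique_parts(scores, keep=2):
    """Keep the `keep` highest-scoring parts; the others are replaced by
    the reference puppet's manually annotated parts."""
    return set(sorted(scores, key=scores.get, reverse=True)[:keep])
```

Given a score for each of the four parts, `select_unique_parts` returns the two parts to retain from the input face, mirroring the green-circled parts in Figure 5.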
The parts in gray circles are replaced with the parts in red circles in Figure 5(e), which are extracted from Figure 5(d). Finally, we recombine these selected parts and generate a puppet profile curve in skeleton form.

4.3.2 Profile Texture Transfer
After the previous steps, we obtain a profile skeleton which resembles both the real face and the puppet face. We then dilate this line to the same profile width as the reference puppet, as shown in Figure 6(a). To be more like a puppet, we transfer the leather texture to the profile area. Efros et al. [34] presented a simple image quilting algorithm for generating novel visual appearance, in which a new image is synthesized by stitching together small patches of existing images. Given a piece of leather sample in Figure 6(b), the texture for the profile curve can be synthesized by the image quilting method [34]. In Figure 6(c), we show an example of the texture transfer result.

4.4 Post Processing
In shadow puppetry plays, there are a lot of vivid props, including architecture, furniture, plants and animals, designed with distinct dynastic features. Some representative examples are shown in Figure 7(a). To make the roles more artistically charming, puppets are decorated with exquisite headdresses, which are classified into a male style and a female style. We can match the generated puppet with the proper headdress; some results are shown in Figure 7(b).

5. MODULE II: PUPPETRY MANIPULATOR
A traditional Chinese shadow puppet is created by hinging together nine parts: head, upper body, left & right arm, left & right hand, lower body, and front & rear foot. Although there are variants replacing the lower body with two upper legs, the nine-part design dominates. In a real puppet show, each puppet is controlled with three stick manipulators. The major manipulator is attached to the neck, clipping together the head and the upper body. Each of the other two manipulators is hinged to one hand. Based on this control model, shadow puppets have their own distinguished pattern of motion. In this section, we describe a system that is able to animate shadow puppets in their own style, learned from real-world puppet shows.

Figure 8: Degrees of freedom and annotation points. (a) Degrees of freedom, (b) annotation points for DOF inference.

5.1 Control Model
In our system we adopt the nine-part puppet design, i.e., the digital puppet consists of nine parts linked by 7 joints and in total has 12 degrees of freedom (DOF), as depicted in Figure 8(a). The 7 joints are named according to their positions: left & right shoulder, left & right elbow, waist, and front & rear knee. All 7 joints are revolute joints except the waist, which has an additional prismatic joint along the vertical axis of the upper body. Therefore, among the 12 DOFs, seven describe the rotation angles at each of the joints; three describe the x and y coordinates and the rotation angle of the head center (we choose the head center as the descriptor of puppet position because the major manipulator is located near the head); the remaining two DOFs are the rotation angle about the y axis in the 3D scene, which is used to flip the puppet, and the displacement of the prismatic joint at the waist, which produces the undulation of the upper body.
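As a concrete illustration, the 12-DOF conformation above can be written down as a small state object. This is a sketch only: the field names are ours, and the actual system stores these values in JavaScript objects for WebGL rendering.

```python
from dataclasses import dataclass, fields

@dataclass
class PuppetPose:
    """One frame of the 12-DOF puppet conformation (field names are ours)."""
    head_x: float = 0.0          # head-centre x: the major stick is near the
    head_y: float = 0.0          #   head, so it anchors the puppet's position
    head_angle: float = 0.0      # rotation angle of the head centre
    left_shoulder: float = 0.0   # seven revolute joint angles
    right_shoulder: float = 0.0
    left_elbow: float = 0.0
    right_elbow: float = 0.0
    waist: float = 0.0
    front_knee: float = 0.0
    rear_knee: float = 0.0
    flip: float = 0.0            # rotation about the scene's y axis (facing)
    waist_lift: float = 0.0      # prismatic joint: undulation of upper body

    def as_vector(self):
        """Flatten to the 12-vector that describes one animation frame."""
        return [getattr(self, f.name) for f in fields(self)]
```

An animation is then simply a sequence of such 12-vectors, one per frame, which matches the instance-matrix representation used in Section 5.2.2 (each column of an instance matrix is one such vector).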

187 The conformation of the puppet is controlled by assigning values to the 12 DOFs. Animation can thus be synthesized Table 1: Atomic motions Atomic by feeding a sequence of DOFs. The method to generate Meaning DOF sequence is described in the next sections. Motion 5.2 Motion System jump the whole body move forward for some dis- 5.2.1 Identification of Atomic Motions tance, which represents walking in shadow Based on five videos of popular real world puppet show, puppet show i.e. “Tale of the White Snake Madam”,“Emperor Xuanzong bow To lower the head or upper body as a social of Tang”, “Yanfei Wonder Woman”, “Yang Saga”, “Green gesture Dragon Sword”, it can be observed that the animation of sit Sitting on the chairs in the scene puppets can be decomposed into smaller building blocks, which we call atomic motions. Eight atomic motions identi- kneel Kneel down on the ground fied from the videos are listed in Table 1. point A hand gesture frequently appearing during 5.2.2 Data Preparation and Processing speech, indicates the puppet is talking. After the atomic motions are identified, we manually ex- tract video chunks that contain atomic motions from the swing The forward and backward motion of hands puppet shows. For each of the atomic motions, we extract breathe Undulation of upper body, in puppet show, 50 video chunks which we call instances of that atomic mo- this motion indicates the puppet is talking tion. Each frame in the video chunks are labeled with 12 turn Flipping the puppet about the y axis of the key points that describe the conformation of the puppet as scene, changing its facing direction shown in Figure 8(b). From the key points, we can eas- ily calculate the DOFs information for each frame. Thus composite motion“talk for 10 minutes”means“breathe”con- each instance of the atomic motion is finally converted to an tinuously and“point”sometimes for 10 minutes. 
instance matrix, with each column being the DOFs of the corresponding frame.

However, a shaking defect is observed when the DOFs are applied directly to our digital puppets: the available videos are generally of low quality, and the hand-labeled points introduce high-frequency jitter that compromises the annotation accuracy. To overcome this defect, we apply locally weighted scatterplot smoothing to the rows of the DOF matrix. Later experiments show that the smoothing improves the visual experience significantly.

After smoothing, the instances of the same atomic motion are temporally realigned by linear interpolation. For example, suppose the atomic motion A has n instances, each converted to a matrix A_i. The A_i are linearly interpolated to have the same number of columns as the instance with the most columns, yielding A'_i. We consider the weighted mean of the aligned instances as an eligible variant of that atomic motion:

$$A_{\mathrm{new}} = A'_1 w_1 + A'_2 w_2 + \cdots + A'_n w_n, \qquad \sum_{i=1}^{n} w_i = 1,\; w_i > 0. \eqno(2)$$

Unlimited variants of the atomic motion can be generated, which form a rich set of ingredients for building complex animations.

5.2.3 Recombination of Atomic Motions
Atomic motions captured from the videos are of limited types. One way to increase their number is through recombination. Each atomic motion can be decomposed into independent parts; for example, the upper-body motion is independent of the lower-body motion. We can thus combine the upper-body motion of "bow" with the lower-body motion of "kneel" to form a new atomic motion, "kowtow".

5.2.4 Composite Motions
While atomic motion provides high flexibility for animation synthesis, composite motion offers convenience. A composite motion consists of sequential atomic motions and is defined for easier use of the scripting interface. For instance, a complex motion may repeat many atomic motions; in this case it would take much more effort to input the atomic motions one by one. The scripting system handles the conversion of composite motions into atomic motions. Composite motion is thus an addition to atomic motion that gives users higher-level control of the puppet.

5.3 Animation Generation
The animation of puppets can be decomposed into smallest building blocks called atomic motions. Based on atomic motions, composite motions are synthesized as larger building blocks. In this work we render the animation with browser-based WebGL for ease of sharing; each puppet has a corresponding JavaScript object. Through the scripting interface, the manipulator of a digital puppet invokes different functions on the object. For each atomic or composite motion, we define a corresponding function. The manipulator can easily control the puppet by invoking the function with parameters such as time duration and spatial distance. For example, puppet.walk(10,15) tells the puppet to walk 10 units in the defined 2D world within 15 frames. Through the script, the manipulator can also place props such as a chair at specific locations; these locations can then be used in the script. The manipulator can do scripting either on-line, by inputting one motion after another, or off-line, by writing all the scripts in advance and feeding them to the puppet. The on-line mode is mainly for the user to experience individual motions. For generating an entire animation sequence, off-line scripting is preferred because it offers better control of timing. In the rest of this section, we present the method for off-line animation generation.

The off-line animation generation system takes a sequence of atomic and composite motions as input from the scripting interface. A preprocessing step decomposes the composite motions in the input sequence into atomic motions, making the motion sequence purely atomic. The DOF sequence is finally created by converting each atomic motion into DOFs. As mentioned in Section 5.2.2, atomic motions are generated as weighted means of their instances.
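To make the realignment-and-blending step concrete, here is a minimal sketch of Equation 2 (the instance matrices are synthetic and the function names are illustrative, not the authors' code): each instance is interpolated to a common frame count, then combined with simplex weights.

```python
import numpy as np

def realign(instances):
    """Linearly interpolate each DOF-by-frames matrix to the
    largest frame count among the instances (Section 5.2.2)."""
    target = max(a.shape[1] for a in instances)
    aligned = []
    for a in instances:
        src = np.linspace(0.0, 1.0, a.shape[1])
        dst = np.linspace(0.0, 1.0, target)
        # Interpolate every DOF row independently.
        aligned.append(np.vstack([np.interp(dst, src, row) for row in a]))
    return aligned

def blend(aligned, w):
    """Weighted mean of the aligned instances (Equation 2);
    w must be non-negative and sum to one."""
    w = np.asarray(w, dtype=float)
    assert np.all(w >= 0) and np.isclose(w.sum(), 1.0)
    return sum(wi * ai for wi, ai in zip(w, aligned))

# Three synthetic instances of one atomic motion: 4 DOFs, varying lengths.
instances = [np.random.rand(4, n) for n in (20, 24, 30)]
variant = blend(realign(instances), [0.5, 0.3, 0.2])
print(variant.shape)  # (4, 30)
```

Any weight vector on the simplex yields a different eligible variant of the motion.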

Reusing a limited number of the instances makes the animation monotonous; by taking different weight vectors, infinite variants of one atomic motion can be generated from its limited instances. The aesthetic of the animation is greatly influenced by its smoothness: the visual experience is discounted if two adjacent frames differ a lot. Thus, minimizing the position gap between atomic motions is the criterion for selecting the weight vectors.

Figure 9: Evaluation of our creator results from the 30 subjects. Results of the aesthetic and puppetry-alike assessments are shown sequentially. The horizontal axis corresponds to the evaluation levels (excellent, good, ordinary, weak/poor); the vertical axis is the number of persons.

5.3.1 Weights Selection
The smoothness of the animation sequence can be measured by the position gap between consecutive atomic motions, namely the distance from the last frame of one atomic motion to the first frame of the next. Smoothing the whole motion sequence can be achieved by optimizing the following problem:

$$\min_{\{w_i\}} s(\{w_i\}) = \sum_{i=1}^{N-1} \|F_{i+1} w_{i+1} - L_i w_i\|_2^2 \eqno(3)$$
$$\text{s.t.}\ \sum_j w_{ij} = 1,\; w_{ij} \ge 0,\; i \in \{1, 2, \cdots, N\}.$$

For better explanation, we introduce the instance matrix M^k_j, the jth instance of the atomic motion k, where k ranges over the set of atomic motions listed in Table 1, and k(i) denotes the atomic motion type of the ith atomic motion in the motion sequence. F_i is formed by taking the first columns of the M^{k(i)}_j for each j, and L_i by taking the last columns. w_i is the weight vector of the ith atomic motion; for a given atomic motion, the weights are constrained to be non-negative and to sum to one.

Due to the imperfectness of the instance realignment, the weighted recombination of instances exhibits a shaking defect if too many instances are combined. To reduce this defect, we impose a sparsity constraint on the weight vectors, so that a minimum number of instances is selected. The following optimization problem is formed:

$$\min_{\{w_i\}} s(\{w_i\}) + \lambda \sum_{i=1}^{N} \|w_i\|_0 \eqno(4)$$
$$\text{s.t.}\ \sum_j w_{ij} = 1,\; w_{ij} \ge 0,\; i \in \{1, 2, \cdots, N\}.$$

Generally, the l0-norm is relaxed into the l1-norm for computational efficiency. However, the l1-norm heuristic does not work here, because the simplex constraint makes the l1-norm of each weight vector equal to 1, i.e., constant.

We therefore utilize another relaxation, proposed in [35], to handle this sparsity-on-the-probability-simplex problem: when the l1-norm is constant, the l0-norm can be lower bounded by the reciprocal of the infinity norm. For any x constrained to the probability simplex:

$$\|x\|_1 = \sum_{i=1}^{n} |x_i| \le \|x\|_0 \max_i |x_i| \le \|x\|_0 \|x\|_\infty. \eqno(5)$$

Because $\|x\|_1 = 1$,

$$\frac{1}{\|x\|_\infty} \le \|x\|_0. \eqno(6)$$

Then we resort to solving the following problem:

$$\min_{\{w_i\}} s(\{w_i\}) + \lambda \sum_{i=1}^{N} \frac{1}{\|w_i\|_\infty}. \eqno(7)$$

We show here a single-variable version of this problem:

$$\min_x\, s(x) + \frac{\lambda}{\|x\|_\infty} = \min_x\, s(x) + \min_i \frac{\lambda}{x_i} = \min_i \min_x\, s(x) + \frac{\lambda}{x_i}. \eqno(8)$$

$\min_x s(x) + \lambda/x_i$ is a convex optimization problem if s(x) is convex, since $\lambda/x_i$ is a convex function when $x_i > 0$. The convex optimization is done for each i ∈ {1, 2, ..., n} using exponentiated gradient descent [36]. By solving the n convex problems (n is the length of the vector x), the single-variable version of the problem can be globally optimized.

Equivalently, (7) is converted to:

$$\min_{k_1} \cdots \min_{k_N}\; \min_{w_i,\, i \in \{1, 2, \cdots, N\}}\; s(\{w_i\}) + \frac{\lambda}{w_{1 k_1}} + \cdots + \frac{\lambda}{w_{N k_N}}. \eqno(9)$$

Each of the convex problems can be solved by alternately applying exponentiated gradient descent to each of the variables. However, the number of convex problems grows exponentially with the number of variables. In our case, the number of variables is N, the number of atomic motions in the motion sequence, so it is not practical to solve the weights for the whole motion sequence in one shot. We therefore employ a scanning-window method: each time we optimize only a segment of the motion sequence. In this way the complexity is linear in the sequence length.

After the weights are obtained by the optimization described above, we calculate the whole DOF sequence by applying Equation 2 to each atomic motion.

6. EXPERIMENTS
In this section, we first evaluate the performance of the creator module and the manipulator module sequentially. We then show the results of the whole system in a so-called "love story with puppetry" scenario, to demonstrate the combined effectiveness of the creator and manipulator results.

For the user studies in this section, 30 subjects (7 females and 23 males) aged from 22 to 40 (µ=27.3, σ=3.9) were invited to participate in our experiments.

6.1 Results on Creator Module
6.1.1 Quantitative Results: User Study
Here we discuss the properties of the created puppets in the following three aspects: aesthetic, puppetry-alike, and discrimination.
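As a closing note on the weight selection of Section 5.3.1: each inner problem of Eq. (9), minimizing s plus λ/w_i over the probability simplex with a pinned index i, is convex and can be prototyped with exponentiated gradient descent [36]. The sketch below is illustrative only — a synthetic quadratic stands in for the smoothness term s, and the dimensions are made up; it is not the authors' implementation.

```python
import numpy as np

def eg_minimize(grad, n, i, lam, eta=0.05, iters=2000):
    """Exponentiated gradient descent on the probability simplex
    for one inner problem of Eq. (9): min_w s(w) + lam / w_i."""
    w = np.full(n, 1.0 / n)
    for _ in range(iters):
        g = grad(w)
        g[i] -= lam / w[i] ** 2      # d/dw_i of the penalty lam / w_i
        w = w * np.exp(-eta * g)     # multiplicative update ...
        w = w / w.sum()              # ... keeps w on the simplex
    return w

# Toy smoothness term: s(w) = ||F w - c||^2 with a fixed transition target c.
rng = np.random.default_rng(0)
F, c = rng.standard_normal((6, 4)), rng.standard_normal(6)
grad_s = lambda w: 2.0 * F.T @ (F @ w - c)

# Solve the n pinned problems and keep the best, mimicking the outer
# minimization over i in Eq. (8).
best = min((eg_minimize(grad_s, 4, i, lam=1e-3) for i in range(4)),
           key=lambda w: np.sum((F @ w - c) ** 2) + 1e-3 / w.max())
print(best.round(3))
```

The 1/‖w‖∞ penalty concentrates mass on the pinned coordinate, which is what drives the sparsity of the selected instances.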

Aesthetic: Does the result look elegant?
Puppetry-alike: Does the generated face keep the characteristics of the puppet?

In these two questions, we set five values for the subjects to choose from: excellent, good, ordinary, weak, and poor. We present 10 creator puppet images to the subjects, and the results are shown in Figure 9. We can see that our creator module achieves very high scores on both evaluation metrics.

Discrimination: Could you recognize the person from the generated face? We evaluate this aspect with several single-choice questions. For example, we show a puppet of Emma Watson to the subjects and, at the same time, four face images: one of Emma Watson and three random choices. An example of the user-study interface is shown in Figure 10. The subjects choose which of the four candidate face images looks most like the puppet. We test 10 creator results and summarize the subjects' feedback. The average precision is 71.1%. The precision is not very high because the generated puppet images are all in profile view, while the candidates are in frontal view. However, the precision of 71.1% is much higher than random guessing (25%).

Figure 10: The exemplar user interface of our discriminative evaluation tool. (The interface shows the puppet next to four real faces and asks: "Could you recognize the people from the generated face? Please select the one you think is most similar to the puppet, and evaluate the similarity.")

6.1.2 Qualitative Results
Here we show some example results of the creator module. As shown in Figure 11, the first column shows the input frontal-view face images, and the second column shows the input profile face images of the same person. The creator results are shown in the third column, and the results decorated with a headdress are shown in the fourth column.

6.2 Results on Manipulator Module
6.2.1 Atomic Motion
Here we evaluate the result of applying locally weighted scatterplot smoothing to the instance matrices of atomic motions. For each type of atomic motion, one instance is randomly selected and smoothed. The subjects are presented with both the original and the smoothed version of the atomic motion instance. For each original/smoothed pair, the subjects are required to rate the smoothness on a 5-point Likert scale, without prior knowledge of which one was smoothed. One randomly selected instance of each of the eight types of atomic motions is evaluated. We show the notched box plot of the results in Figure 12(a). A one-way ANOVA analysis is also performed on the data, shown in Table 2. The results show that locally weighted scatterplot smoothing significantly improves the smoothness of the atomic motions.

Figure 12: Notched box plots of the smoothness evaluation, comparing smoothed against original. (a) Evaluation of atomic motion smoothness. (b) Evaluation of transition smoothness.

Table 2: ANOVA analysis of comparing smoothed and original atomic motions
  F-statistics    p-value
  1025.49         5.02281e-121

6.2.2 Animation Smoothness Evaluation
The smoothness of the generated animation depends not only on the smoothness of each atomic motion itself, but also on the transitions between adjacent atomic motions. To evaluate the weight-selection method, we generated five animation sequences, each about 30 s long, consisting of random atomic motions. Two versions of each animation were generated: one with weights selected using our method described in Section 5.3.1, and one with randomly selected weights. Subjects were given the same options as in the evaluation of atomic motion smoothness. Figures 13(a) and 13(b) compare transitions selected from the animation sequences generated with smoothness-optimized weights and with random weights; it can be observed that the position gap between consecutive atomic motions is smaller when the smoothness-optimized weights are used.

Figure 13: Smoothing result. (a) random weights; (b) smoothness-optimized weights. (The panels show the last frame of "jump" and the first frame of "bow".)

The notched box plot of the results is shown in Figure 12(b), and the one-way ANOVA analysis in Table 3. The results show that weights optimized with our method yield significantly smoother transitions than randomly selected weights.

Table 3: ANOVA analysis of comparing animation sequences generated using weights from smoothness optimization and random selection
  F-statistics    p-value
  327.53          6.60827e-50

6.2.3 Short Video Evaluation
Two video chunks, each about one minute long, were selected from the puppetries "Tale of the White Snake Madam" and "Green Dragon Sword". Using the scripting interface, we reproduced these two videos. The subjects in this study were required to watch the original and reproduced videos and then answer questions regarding three aspects of the synthesized videos, on a 5-point Likert scale from "very bad" (1) to "very good" (5). The mean and standard deviation of the scores are given in Table 4. From the results we conclude that the puppetries generated by our system preserve the motion patterns of real puppetry and successfully convey the information to the subjects. The subjects were satisfied with the resolution of the generated puppetry, which is expected because videos captured from real puppetries are limited by the capturing conditions and are usually of low quality.

6.3 Love Story with Puppetry
To demonstrate the usage of both the creator and manipulator modules, we made a short video that tells a love story

Figure 11: Examples of the creator module (a)-(h). The first two columns correspond to the frontal and profile images provided by users. The third column is the generated puppet, and the last column is the puppet after adding the headdress.

between a man and a female puppet. The animations of the puppets in this video are generated using our manipulator module. The man in the story finally becomes a puppet, which is generated by our creator module.

Story Synopsis: A boy named Ryan was a PhD student. He was sad because he was still single. One day a miracle happened: a very beautiful puppet girl appeared and said to him, "Would you come to my world?" So he was changed into a puppet and entered the girl's world. They got married and lived happily ever after. Some example frames are displayed in Figure 14. The full video can be watched online (http://www.youtube.com/watch?v=lxJUw3mKF18yA) and is also included in the supplementary materials.

We designed 8 questions and asked all the subjects to watch and evaluate the video on a 5-point Likert scale: very good, good, normal, bad, and very bad, from score 5 down to score 1. Besides watching the video, the subjects also tried the manipulator module by typing the script themselves and experiencing the play creation. We then compute the average score of each question; the results are shown in Table 5. Q1 and Q2 are related to the user-friendliness of our system. Q3 and Q4 are related to the performance of the creator module, while Q5 and Q6 are related to the performance of the manipulator module. Q7 and Q8 are related to our initial motivation.

Table 4: Evaluation of the short videos as compared to the original videos
  Question                                                      Mean     Stdev
  Q1. Does the generated video preserve the motion              4.3333   0.6609
      pattern of the real puppet?
  Q2. How is the information conveyed?                          4.2333   0.6789
  Q3. How is the resolution?                                    4.6667   0.6065

Table 5: The evaluation of the "love story with puppetry"
  Question                                                      Mean   Stdev
  Q1: Does the video convey information? Could you              4.14   0.9438
      understand the whole story by watching the video?
  Q2: Do you like the style of interaction between              4.00   0.8459
      physical space and virtual space?
  Q3: Can you recognize the person generated by our             4.01   1.0184
      puppet creator module?
  Q4: Does the generated puppet preserve the traditional        4.37   0.6897
      puppet style?
  Q5: Does the generated action pattern preserve the            4.31   0.6311
      traditional puppet style?
  Q6: Through the practical operation with the manipulator      4.27   0.9131
      module, do you think it is convenient to use?
  Q7: Do you think our work can help preserve the               4.35   0.9632
      traditional Chinese shadow puppet culture?
  Q8: Do you want to see a similar product in the puppet        4.60   0.6547
      museum?

From the results we can conclude that most subjects agree that both the generated puppet (mean score 4.37) and its action pattern (mean score 4.31) preserve the traditional puppet style well. Most subjects also think our work can help preserve the traditional Chinese shadow-puppet culture (mean score 4.27) and want to see a similar product in the puppet museum (a quite high score of 4.60). Due to the limitation of the profile face, people cannot always recognize the person behind the puppet generated by our creator module (score 4.01); we leave refining this part to future work. The subjects also provided some suggestions. One subject commented, "The idea is quite good, and I look forward to more wonderful puppetry videos." Some subjects thought that more atomic actions should be added, and one subject thought we should enhance the soundtrack of the puppetry.

7. CONCLUSIONS AND FUTURE WORK
In this paper, we proposed the eHeritage of shadow puppetry, including a creator module and a manipulator module. The creator module generates a puppet for a person from his/her frontal-view and profile face images. The manipulator module automatically generates motion sequences from a script provided by the user. We conducted extensive experiments on the creator and manipulator modules, and the results show the effectiveness of the proposed system. Currently our system focuses mainly on the visual effects of the puppetry. In the future, we would like to give more consideration to the audio, such as automatic vocal-style transfer from human singing to puppetry-style singing.

8. ACKNOWLEDGMENT
This research is partially supported by the Singapore National Research Foundation under its International Research

# Script
Puppet.walk(distance=10,duration=15)
......
Puppet.talk(duration=10)
......
Puppet.point(duration=15)
......
Puppet.sit(duration=20)
......
Puppet.point(duration=18)
......
Puppet.kneel(duration=20)
Puppet.kowtow(duration=40)
......

Figure 14: Some representative frames (panels 1-6) of the "love story with puppetry", driven by the script excerpt above.
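To illustrate how a script like the one in Figure 14 could drive the off-line pipeline of Section 5.3, here is a toy dispatcher. The `do()` method, the motion tables, and the frame counts are illustrative assumptions, not the authors' actual interface (which exposes per-motion functions such as `Puppet.walk`).

```python
# Composite motions expand into their atomic sequence; each atomic motion
# contributes `duration` frames of DOFs (the "greet" composite is made up).
ATOMIC = {"walk", "talk", "point", "sit", "kneel", "kowtow", "bow", "jump"}
COMPOSITE = {"greet": [("walk", 15), ("bow", 10)]}

class Puppet:
    def __init__(self):
        self.queue = []  # (atomic motion, frames) pairs, ready for DOF lookup

    def do(self, motion, distance=None, duration=10):
        """Queue a motion; composites recurse into their atomic sequence."""
        if motion in COMPOSITE:
            for atom, frames in COMPOSITE[motion]:
                self.do(atom, duration=frames)
        else:
            assert motion in ATOMIC
            self.queue.append((motion, duration))

p = Puppet()
p.do("walk", distance=10, duration=15)
p.do("greet")
print(p.queue)  # [('walk', 15), ('walk', 15), ('bow', 10)]
```

After this preprocessing the queue is purely atomic, matching the paper's description of the off-line generation step.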

Centre @ Singapore Funding Initiative and administered by the IDM Programme Office; and also partially supported by the State Key Development Program of Basic Research of China 2013CB336500, and in part by the Natural Science Foundation of China (NSFC) under grant 61172164.

9. REFERENCES
[1] F. Chen, "Visions for the masses: Chinese shadow plays from Shaanxi and Shanxi," Pacific Science, 2012.
[2] L. Leite and V. Orvalho, "Shape your body: control a virtual silhouette using body motion," in Proc. ACM Annual Conference on Human Factors in Computing Systems Extended Abstracts, 2012.
[3] S. Y. Lin, C. K. Shie, S. C. Chen, and Y. P. Hung, "Action recognition for human-marionette interaction," in Proc. ACM International Conference on Multimedia, 2012.
[4] H. Zhang, Y. Song, Z. Chen, J. Cai, and K. Lu, "Chinese shadow puppetry with an interactive interface using the kinect sensor," in Proc. the 12th European Conference on Computer Vision Workshops, 2012.
[5] A. Paviotti and D. A. Forsyth, "A lightness recovery algorithm for the multispectral acquisition of frescoed environments," in IEEE 12th International Conference on Computer Vision Workshops, 2009.
[6] L. Wolf, R. Littman, N. Mayer, N. Dershowitz, R. Shweka, and Y. Choueka, "Automatically identifying join candidates in the cairo genizah," in IEEE 12th International Conference on Computer Vision Workshops, 2009.
[7] T. Luo, R. Li, and H. Zha, "3d line drawing for archaeological illustration," International Journal of Computer Vision, 2011.
[8] M. Seidl, M. Zeppelzauer, and C. Breiteneder, "A study of gradual transition detection in historic film material," in Proc. the Second Workshop on eHeritage and Digital Art Preservation, 2010.
[9] A. Mallik, S. Chaudhury, and H. Ghosh, "Preservation of intangible heritage: a case-study of indian classical dance," in Proc. the Second Workshop on eHeritage and Digital Art Preservation, 2010.
[10] B. Gooch and A. Gooch, Non-photorealistic Rendering, 2001.
[11] Z. Ruttkay and H. Noot, "Animated CharToon faces," in Proc. the 1st International Symposium on Non-photorealistic Animation and Rendering, 2000.
[12] H. Chen, N. N. Zheng, L. Liang, Y. Li, Y. Q. Xu, and H. Y. Shum, "Pictoon: a personalized image-based cartoon system," in Proc. the 10th ACM International Conference on Multimedia, 2002.
[13] J. Xu, C. S. Kaplan, and X. Mi, "Computer-generated papercutting," in 15th Pacific Conference on Computer Graphics and Applications, 2007.
[14] M. Meng, M. Zhao, and S. C. Zhu, "Artistic paper-cut of human portraits," in Proc. the International Conference on Multimedia, 2010.
[15] A. Mazalek, M. Nitsche, C. Rébola, A. Wu, P. Clifton, F. Peer, and M. Drake, "Pictures at an exhibition: a physical/digital puppetry performance piece," in Proc. the 8th ACM Conference on Creativity and Cognition, 2011.
[16] S. W. Hsu and T. Y. Li, "Planning character motions for shadow play animations," in Proc. the International Conference on Computer Animation and Social Agents, 2005.
[17] C. Barnes, D. E. Jacobs, J. Sanders, D. B. Goldman, S. Rusinkiewicz, A. Finkelstein, and M. Agrawala, "Video puppetry: a performative interface for cutout animation," ACM Transactions on Graphics, 2008.
[18] K. Tan, A. Talib, and M. Osman, "Real-time simulation and interactive animation of shadow play puppets using opengl," International Journal of Computer and Information Engineering, 2010.
[19] D. H. Kim, M. Y. Sung, J.-S. Park, K. Jun, and S.-R. Lee, "Realtime control for motion creation of 3d avatars," in Advances in Multimedia Information Processing - PCM 2005, 2005.
[20] F. Lu, F. Tian, Y. Jiang, X. Cao, W. Luo, G. Li, X. Zhang, G. Dai, and H. Wang, "Shadowstory: creative and collaborative digital storytelling inspired by cultural heritage," in Proc. the 2011 Annual Conference on Human Factors in Computing Systems, 2011.
[21] J. Pan and J. Zhang, "Sketch-based skeleton-driven 2d animation and motion capture," Transactions on Edutainment VI, 2011.
[22] R. Held, A. Gupta, B. Curless, and M. Agrawala, "3d puppetry: a kinect-based interface for 3d animation," in Proc. the 25th Annual ACM Symposium on User Interface Software and Technology, 2012.
[23] L. Leite and V. Orvalho, "Anim-actor: understanding interaction with digital puppetry using low-cost motion capture," in Proc. the 8th International Conference on Advances in Computer Entertainment Technology, 2011.
[24] M. D. G. Cooper Sanghyun Yoo, "Digital puppet: a tangible, interactive stage interface," in Proc. the SIGCHI Conference on Human Factors in Computing Systems, 2011.
[25] X. Zhu and D. Ramanan, "Face detection, pose estimation, and landmark localization in the wild," in IEEE Conf. Computer Vision and Pattern Recognition, 2012.
[26] J. Gomes, Warping and Morphing of Graphical Objects. Morgan Kaufmann, 1999.
[27] K. Hormann and M. S. Floater, "Mean value coordinates for arbitrary planar polygons," ACM Transactions on Graphics, 2006.
[28] G. Pan, Y. Wu, Z. Wu, and W. Liu, "3d face recognition by profile and surface matching," in Proc. the International Joint Conference on Neural Networks, 2003.
[29] T. Nagamine, T. Uemura, and I. Masuda, "3d facial image analysis for human identification," in Proc. 11th IAPR International Conference on Pattern Recognition, Conference A: Computer Vision and Applications, 1992.
[30] R. Gross, I. Matthews, J. Cohn, T. Kanade, and S. Baker, "Multi-pie," Image and Vision Computing, 2010.
[31] H. Winnemöller, "Xdog: advanced image stylization with extended difference-of-gaussians," in Proc. the ACM SIGGRAPH/Eurographics Symposium on Non-Photorealistic Animation and Rendering, 2011.
[32] J. Serra, Image Analysis and Mathematical Morphology, 1982.
[33] N. Dalal and B. Triggs, "Histograms of oriented gradients for human detection," in IEEE Conf. Computer Vision and Pattern Recognition, 2005.
[34] A. A. Efros and W. T. Freeman, "Image quilting for texture synthesis and transfer," in Proc. the 28th Annual Conference on Computer Graphics and Interactive Techniques, 2001.
[35] M. Pilanci, L. E. Ghaoui, and V. Chandrasekaran, "Recovery of sparse probability measures via convex programming," in Advances in Neural Information Processing Systems 25, 2012.
[36] J. Kivinen and M. K. Warmuth, "Exponentiated gradient versus gradient descent for linear predictors," Information and Computation, 1997.