Received January 8th, 2019; Accepted May 27th, 2019

Generating 2.5D Character by Switching the Textures of Rigid Deformation

Yuki Morimoto (Tokyo Denki University, Kyushu University), Atsuko Makita (Tokyo Denki University), Takuya Semba (Tokyo Denki University)
[email protected]

Tokiichiro Takahashi (Tokyo Denki University, ASTRODESIGN Inc.)

Abstract  We generate 2.5D animation from raster images, skeletal animation data, and other data formats. The input image of a character is divided into several parts, each with arbitrarily assigned joint positions. The joint positions in the motion data and additional points are then applied as the control points of rigid deformation, generating a 2.5D animation. Geometric interpolation is replaced by switching of cel animation images. In experimental evaluations, our animation results were generated successfully without interpolation techniques, a large number of input images, or the high editing costs of interpolation. We also built interactive transformation content using the method and confirmed its entertainment value.

Keywords: 2.5D cartoon model, bone animation, animation

1 Introduction

In the traditional cel animation process, characters are animated by making slight changes to each manually drawn frame along the time series. However, the process requires more than eight drawings per second and is very expensive. To reduce these costs, animators abstract the motions of their characters from a reduced number of drawn frames, a process known as limited animation. Alternatively, animators reuse parts of the cels such as the mouth, eyes, and background. Recently, animation has become a digital process. However, the overall process, in which multiple cels of each part are drawn and switched to generate the animation, has not changed [1].

In non-photorealistic rendering, a field of computer graphics, many works have focused on a technique called toon rendering, which renders 3D models as 2D cartoon-like images. Although toon rendering has recently reduced the production cost of animation creation and game development, it encounters problems when the 2D expressions contradict the 3D world. To overcome such problems, some researchers have interpolated between user-specified 3D geometries viewed from several perspectives. The interpolation sometimes induces unnatural appearances, especially when the 3D model is bumpy or complicated; avoiding this problem requires additional editing of the interpolation.

Rivers et al. [2] proposed a 2.5D cartoon model that generates smooth, 3D-like animations from one input image per part. In this method, the user specifies the appearances (geometry and color) of each part from different viewpoints. The resulting animation is smooth like 3D animation because of morphing, but it has a 2D appearance. However, the inputs of this method are limited to simple geometry, and the method is inapplicable to general bone animation.

2.5D animation of more complicated images can be created in the Cubism software [3]. The main differences between the method of Rivers et al. and Cubism are the texture mapping and the morphing parameters of each part. Users of Cubism can associate parameter values with the part geometries; the system then interpolates the geometries by keyframe animation with the morphing parameters. The user sequentially edits the vertices of the part geometry at each keyframe, and can also edit many vertices at once by deforming the curved surfaces that the vertices are mapped onto. Such detailed editing usually incurs high costs. Live2D Inc. has released the animation software Euclid, which extends the viewing angle to 360 degrees by switching the textures of each part. For this purpose, Euclid extends the method already employed in Cubism, but the switching operation is applied only to the face and head parts.

Here, we propose a method that generates character animations from images while avoiding unnatural interpolations. Our method replaces interpolation with texture switching and a rigid deformation procedure. Moreover, because we correspond part textures viewed from different angles using joints instead of vertices as in the above 2.5D methods, our method does not rely on the texture geometry (which can be uneven and complex). The entertainment value of the method was assessed in a questionnaire survey of the animation results. Although the resulting animations are less smooth than those of other 2.5D methods, the smoothness quality was deemed reasonable by the respondents of our questionnaire survey.

2 Related works

2.1 Rigid deformation

Proposed by Alexa et al. [4], the rigid deformation technique reduces the distortion in the deformed geometry. Unlike simple linear interpolation of the vertices, rigid deformation restricts the transformation to be as close to rigid as possible, suppressing shear and non-uniform scaling. Igarashi et al. [5] divided a target image into triangle meshes and specified multiple vertices as control points; their method enables fast and smooth deformation with reduced distortion of each triangle. Using similar inputs, Schaefer et al. [6] deformed an image by affine transforms weighted by the distances between control points and mesh vertices. Further, Jacobson et al. proposed a method that accommodates the positions and rotations of control points, control lines, and control cages within the same framework, enabling flexible image deformation [7]. Other applications of rigid deformation include the registration of hand-drawn animation [8] and image processing methods such as content-aware image resizing [9]. Although rigid deformation has been extended in various ways, we present the first documented extension of rigid deformation to 2.5D animation.

Vertex blending is similar to the rigid deformation method proposed by Schaefer et al. [6]. Vertex blending is generally used for calculating the vertex positions of a 3D model in bone animation, and operates by summing the weighted affine transformations of bone joints. Rigid deformation in the method proposed by Schaefer et al. is limited to translation, rotation, and uniform scaling of the x and y axes; the results are therefore perceived as less distorted.

2.2 Combination of 2D and 3D expressions

Some research papers have combined the textures of 2D expressions with the smoothness of 3D animations [10, 11, 12]. These methods map 2D textures onto 3D models. Based on similar concepts, other researchers have arranged 2D layers in 3D space [13, 14], but these methods pursue different goals.

3 Our method

3.1 Overview

The overview of our system is described below with reference to Fig. 1. First, the character image is segmented into parts (Fig. 1(a)), which are the input for our system. Here, the green triangles and green arrows indicate the joint positions and bones of the character, respectively. Our system requires two or more images of each part, viewed from different angles; the body parts are indicated by the rectangles in Fig. 1(a). The body-part images that are input to our system are associated with their orientations, so the system selects the image whose direction is closest to that of each part (see Fig. 1(b)). Additional control points can be specified to align arbitrary positions of different parts (Fig. 1(c)); in this example, rigid deformation aligns the blue and red positions shown in Fig. 1(c). Then, bone animation is applied. As shown in Fig. 1(d), the part images are switched and deformed to correspond to the direction or angle of the joints in the input skeletal animation.

Figure 1 Overview.

4 Details

4.1 Rigid deformation scheme of Schaefer et al.

We apply the rigid deformation proposed by Schaefer et al., which uses the moving least squares method [6]. This method deforms an image based on control positions specified before and after the deformation, while avoiding unintuitive distortions; in this way, a wide range of motions can be covered by one image. The method maps the input image onto flat triangle meshes and calculates the positions of the mesh vertices as

f(v) = |v - p_*| \, \frac{\vec{f}(v)}{|\vec{f}(v)|} + q_*, \qquad \vec{f}(v) = \sum_i \hat{q}_i A_i,

A_i = w_i \begin{pmatrix} \hat{p}_i \\ -\hat{p}_i^{\perp} \end{pmatrix} \begin{pmatrix} v - p_* \\ -(v - p_*)^{\perp} \end{pmatrix}^{T}, \qquad w_i = \frac{1}{|p_i - v|^{4}},   (1)

\hat{p}_i = p_i - p_*, \quad \hat{q}_i = q_i - q_*, \quad p_* = \frac{\sum_i w_i p_i}{\sum_i w_i}, \quad q_* = \frac{\sum_i w_i q_i}{\sum_i w_i}.

Here, v is a vertex of the lattice into which the input image is divided, p_i and q_i are the control points before and after deformation, respectively, and f(v) is the deformation function applied to v. The weight w_i depends on the distance between the vertex and each control point, the superscript ⊥ denotes the perpendicular of a 2D vector, and m is the number of control points (i = 1, 2, ..., m). Please refer to [6] for more details.
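To make the computation in Eq. (1) concrete, the following is a minimal numpy sketch of the rigid moving-least-squares deformation of a single grid vertex. It illustrates the published formulation [6] rather than the authors' implementation; the function names and the handling of a vertex that coincides with a control point are our own assumptions.

```python
import numpy as np

def perp(u):
    """Perpendicular of a 2D vector: (x, y) -> (-y, x)."""
    return np.array([-u[1], u[0]])

def mls_rigid_deform(v, p, q, eps=1e-8):
    """Rigid moving-least-squares deformation of one grid vertex v (cf. Eq. (1)).

    v : (2,) vertex of the lattice covering the input image
    p : (m, 2) control points before deformation
    q : (m, 2) control points after deformation
    """
    v, p, q = np.asarray(v, float), np.asarray(p, float), np.asarray(q, float)
    d2 = np.sum((p - v) ** 2, axis=1)
    if np.any(d2 < eps):                       # v lies on a control point
        return q[np.argmin(d2)].copy()
    w = 1.0 / d2 ** 2                          # w_i = 1 / |p_i - v|^4
    p_star = (w[:, None] * p).sum(0) / w.sum()
    q_star = (w[:, None] * q).sum(0) / w.sum()
    p_hat, q_hat = p - p_star, q - q_star
    vp = v - p_star
    f = np.zeros(2)
    for wi, ph, qh in zip(w, p_hat, q_hat):
        A = wi * np.stack([ph, -perp(ph)]) @ np.stack([vp, -perp(vp)]).T
        f += qh @ A
    return np.linalg.norm(vp) * f / np.linalg.norm(f) + q_star
```

Applying this function to every lattice vertex, with p and q set to the joint-based and additional control points of Sections 4.2 and 4.3, would reproduce the deformation step illustrated in Fig. 1(d).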

4.2 Two kinds of control points

4.2.1 Joint-based control points

Our method defines control points at the joint positions of each part. The joint positions are specified by mouse clicks on the part images, and the target positions after deformation are obtained by scaling the motion data to the size of the character. Note that rigid deformation of long parts, such as arms and legs, using joints alone would yield rounded shapes. To avoid this problem, the control points in our method are arranged along the bones. Here, i is the joint number, j is a neighbouring joint of i, and k = 1, 2, ..., n, where n is the number of control points between joints i and j. The control point P^{joint}_{i,j,1} corresponds to the joint position P^{joint}_i. To further constrain the rigid deformation, we add sub-joints as control points approximately every four pixels between the joints, following the parent-child relation. After deformation, these sub-control points Q^{joint}_{i,j,k} are arranged on the bone between joints i and j at intervals of |Q^{joint}_i - Q^{joint}_j| / n (shown as black triangles in Fig. 1(d)).

4.2.2 Additionally specified control points

If rigid deformation uses only the joints as control points, voids are sometimes generated between neighbouring parts. Our system allows the addition of other control points to cover such voids. Moreover, it also makes it possible to attach accessory parts at locations other than the joint positions, such as the star, the heart, and the light-colored belly area in the example of the pink bear (Fig. 2). In our system, controllers and receivers are control points additionally specified at arbitrary locations. The controllers are positioned by rigid deformation, and their positions are passed as constraints to the receivers. The correspondence between controllers and receivers is not necessarily one-to-one, as each receiver can accept inputs from more than one controller; in such cases, a receiver finds its nearest corresponding controller (in relative coordinates) using the two joints nearest to the controller. In Figure 1(c), the additional points are the red and blue points on the shoulder and the top of the arm, respectively. In this example, the red points are controllers P^{add}_a and the blue points are receivers Q^{add}_a, where a is the index of the additional control-point set. Part images are deformed in sequential order: the part that has controllers is deformed first, and the receivers are then deformed to correspond with the deformed controllers. Other examples of additional constraints are shown in Fig. 3. Note that the number and locations of the additional points can be specified arbitrarily, allowing trial-and-error editing to generate a better result.
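As a small illustration of the sub-joint placement described in Section 4.2.1, the sketch below distributes control points along a bone at a fixed pixel spacing. The four-pixel default and the spacing of the deformed points follow the text; the function name and the rounding of the point count are assumptions.

```python
import numpy as np

def sub_joint_control_points(joint_i, joint_j, spacing=4.0):
    """Sub-joint control points along the bone from joint_i to joint_j (Sec. 4.2.1).
    Points are placed roughly every `spacing` pixels; the first point (k = 1)
    coincides with joint_i. The same n is reused after deformation, where the
    points are spaced at |Q_i - Q_j| / n along the deformed bone."""
    a = np.asarray(joint_i, dtype=float)
    b = np.asarray(joint_j, dtype=float)
    n = max(int(round(np.linalg.norm(b - a) / spacing)), 1)
    t = np.arange(n) / n                       # offsets 0, 1/n, ..., (n-1)/n
    return a + t[:, None] * (b - a), n
```

The returned points would be appended to p (and their deformed counterparts to q) before calling the rigid deformation of Section 4.1.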

4.3 Mesh generation

Texture mapping and rigid deformation are performed on meshes of grid points and control points (see Fig. 1(d)). Rigid deformation is computed using equation (1) with the settings p = P^{joint}_{i,j,k} ∪ P^{add}_a and q = Q^{joint}_{i,j,k} ∪ Q^{add}_a.

Figure 2 Example of a set of parts with a non-segmented trunk part.

4.4 Scaling of joint positions

In our method, the input data are the part images and their marked joint positions. A human character has 20 joints: head, neck, center, left and right shoulders, waist, left and right elbows, left and right wrists, left and right hands, left and right hips, left and right knees, left and right ankles, and left and right toes (the green triangles in Fig. 1(a)). Based on the 3D motion data, joint positions are calculated to fit the scaling rate of the bone lengths in the input 2D character images. We define the scaling rate sc, which roughly scales the entire input image to the motion data in 3D space. By using rigid deformation, the scaling rate sc avoids the distortions inherent in certain animations, such as the stretching of a cloth. The scaling rate sc is calculated as

sc = \frac{|K_{HipCenter} - K_{ShoulderCenter}|}{|P^{joint}_{HipCenter} - P^{joint}_{ShoulderCenter}|},   (2)

where K_i is a position in 3D coordinates and i is a joint index. The 3D joint positions are then transformed into 2D coordinates as follows:

Q^{joint}_i = Q^{joint}_j + H\left( sc \, |P^{joint}_i - P^{joint}_j| \, \frac{K_i - K_j}{|K_i - K_j|} \right).   (3)

Here, the function H transforms 3D positions to 2D positions on the x-y plane, and j is the parent joint of i. The waist joint is the root, meaning that all joint positions are calculated from parents to children, with Q^{joint}_{root} being the projection of K_{root} onto the x-y plane.
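The following sketch mirrors Eqs. (2) and (3) under the assumption that H is a simple orthographic projection (dropping the z coordinate) and that joints are stored as numpy arrays in dictionaries keyed by name, with a parent table. Those data structures and names are illustrative, not the authors' code.

```python
import numpy as np

def H(x3d):
    """Assumed orthographic projection of a 3D position onto the x-y plane."""
    return np.asarray(x3d, dtype=float)[:2]

def compute_scale(K, P):
    """Eq. (2): scale between the 3D skeleton K and the 2D image joints P.
    K maps joint names to 3D positions, P to 2D positions; key names are
    illustrative."""
    return (np.linalg.norm(K["HipCenter"] - K["ShoulderCenter"])
            / np.linalg.norm(P["HipCenter"] - P["ShoulderCenter"]))

def deformed_joint_positions(K, P, parent, root, sc):
    """Eq. (3): propagate the 2D target joint positions Q from the root joint
    (the waist) towards the children, using the 3D bone directions from K and
    the 2D bone lengths from P."""
    Q = {root: H(K[root])}
    pending = [name for name in K if name != root]
    while pending:                      # assumes every joint chains back to root
        for name in list(pending):
            par = parent[name]
            if par in Q:
                d = (K[name] - K[par]) / np.linalg.norm(K[name] - K[par])
                Q[name] = Q[par] + H(sc * np.linalg.norm(P[name] - P[par]) * d)
                pending.remove(name)
    return Q
```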

4.5 Setting of the part images

Our system requires one or more input images for each part. The number of input images for each part can be determined arbitrarily, so our system can generate animation from whatever part images could be prepared. If there is more than one input image for a part, these images are switched depending on the angle of the part. In our study, we unified the scaling ratio of the image of each part in advance, and the input images are not obscured by other objects. One or more joints must be specified for each part. It is important to decide where a part image can be divided from the image of the whole character. Here, we assigned areas with luminance differences, such as the areas between clothing and/or body parts, as the borders of separate parts in the body image. If the border areas of the obtained part comprise body or skin, they are dilated slightly with a similar color; this dilation overlaps the images of neighbouring parts to cover the voids caused by the deformation (see, for example, the waist border in Figs. 3 and 4). When dividing paper dolls and 2D digital animations, the parts are usually divided around the joints. Examples of divided parts are delineated by the rectangles shown in Fig. 1(a), and another example of the divided parts is shown in Fig. 2.

Figure 3 Examples of correspondences between joints and images. In the encircled area, the corresponding points are green to green and red to blue.

4.6 Two kinds of switching operations

Each part image is switched in two ways: by the direction of the part and by the angles between bones that belong to the same part. The part direction is represented by the direction of one joint in the part, which is included in the input animation data. Switching by part direction is a viewpoint-based approach; this switching method is exemplified in Fig. 1(a), where the switched parts are surrounded by red frames. Switching by angle is an approach based on three joints. When three neighbouring joints are included in one part (such as an arm or leg), the switching can be calculated from the angles between the bones. In the example of Fig. 1(a), the parts surrounded by blue frames are switched by their corresponding bone angles. Furthermore, both switching operations can be combined, and more than one operation can be applied to one part. For example, in Fig. 1(a), the skirt part is switched by both operations: as shown in Fig. 4, the skirt part is switched around the y axis and also by the angle between the right knee, waist, and left knee. In this example, the part images are switched among five directions around the y axis and three angle ranges (less than -15, -15 to 15, and more than 15 degrees).

Figure 4 Examples of image switching by direction of a part around the y axis (top row) or by the angle between joints (bottom row and right column).
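A minimal sketch of the two switching rules in Section 4.6 is given below. The three angle ranges (below -15, -15 to 15, above 15 degrees) come from the skirt example in the text; the even quantisation of the yaw into view bins and all function names are illustrative assumptions.

```python
import numpy as np

def bone_angle_deg(a, b, c):
    """Signed angle (degrees) at joint b between bones b->a and b->c, given 2D
    joint positions, e.g. right knee - waist - left knee for the skirt part."""
    u = np.asarray(a, float) - np.asarray(b, float)
    v = np.asarray(c, float) - np.asarray(b, float)
    return np.degrees(np.arctan2(u[0] * v[1] - u[1] * v[0], np.dot(u, v)))

def select_by_angle(images, angle_deg):
    """Angle-based switching with the three ranges used for the skirt part."""
    if angle_deg < -15:
        return images[0]
    if angle_deg <= 15:
        return images[1]
    return images[2]

def select_by_direction(images, yaw_deg, span_deg=180.0):
    """Direction-based switching: quantise the part's yaw around the y axis
    into len(images) bins covering [-span/2, span/2]."""
    n = len(images)
    idx = int((yaw_deg + span_deg / 2) / (span_deg / n))
    return images[int(np.clip(idx, 0, n - 1))]
```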

4.7 Setting of joint depth

In rendering, the relative depth order of each joint is decided from the captured joint depths. The user can optionally specify the relative depth based on the input image and the orientation of the body part. The depth of each vertex in the mesh (Fig. 1(d)) is then defined as the depth of the nearest joint. To avoid the unnatural appearance caused by vertices of uneven depths, the vertices around the border of the corresponding part are assigned the same depth.
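A sketch of the per-vertex depth assignment of Section 4.7 is shown below. Assigning each vertex the depth of its nearest joint follows the text; forcing border vertices to the median of their depths is our own placeholder for "the same depth", since the text does not say which single value is used.

```python
import numpy as np

def assign_vertex_depths(vertices, joints_2d, joint_depths, border_mask=None):
    """Assign each mesh vertex the depth of its nearest joint (Sec. 4.7).
    vertices: (n, 2), joints_2d: (m, 2), joint_depths: (m,),
    border_mask: optional boolean (n,) marking vertices on the part border."""
    vertices = np.asarray(vertices, dtype=float)
    joints_2d = np.asarray(joints_2d, dtype=float)
    joint_depths = np.asarray(joint_depths, dtype=float)
    dists = np.linalg.norm(vertices[:, None, :] - joints_2d[None, :, :], axis=2)
    depths = joint_depths[np.argmin(dists, axis=1)]
    if border_mask is not None and np.any(border_mask):
        # the text only says border vertices share one depth; the median is our
        # placeholder choice for that shared value
        depths[border_mask] = np.median(depths[border_mask])
    return depths
```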

5 Results, evaluation, and application

The results of our approach are shown in Figs. 5 and 6, and our animations are shown in a supplemental video. Our animations were compared against a stylized 3D animation. In one of our results, we input captured images of the stylized 3D model; the other animation results use hand-drawn versions of the same character, the pink bear, and a photograph of a person. Our system was developed and tested in the following environment: Intel(R) Core(TM) [email protected] GHz 2.39 GHz, 8 GB memory, Windows 8.1 Professional 64-bit, and motion-capture equipment (Microsoft Kinect v1). Note that the center shoulder joint was calculated as the center point between the right and left shoulders detected by Kinect in our system. Moreover, we did not use joints other than the 20 joints described in Section 4.4. As the motions were captured by Kinect, our system operated at a real-time rate (Table 1). Table 1 displays the number of vertices in representative parts of the images of each character and the computational time per frame; the numbers change slightly when images are switched in our actual experiments. The computational time includes the rendering time but not the motion-capture time. The part images used as inputs for the comparison experiments are shown in Fig. 7.

Figure 5 Comparison between an anime-styled rendered 3D model (left) and our 2.5D animation results (second, third, and fourth images from left).

Table 1 Number of vertices and processing time for every frame.
model                         number of vertices   computation time (ms)
captured stylized 3D model    1846                 3.5
hand-drawn girl               1467                 5.0
hand-drawn bear               4081                 33.5

Figure 7 Sets of input images for generating results: the captured stylized 3D model (whose hand and skirt parts are also shown in Figs. 1 and 4), the hand-drawn girl model, and the hand-drawn bear model. Same-part inputs are grouped in the frames. The images of the hand and leg parts were inverted for the other side. The parts enclosed in red and blue rectangles were switched by orientation and angle, respectively.

In our method, depth variations within a single part image are handled by setting the depth on each mesh. For example, in the top row of Fig. 6, the shoulder area of the arm part hides behind the upper-body part, while the elbow area appears in front of the upper-body part. Setting multiple additional control points Q^{add}_a generates relatively more natural results, as shown in the bottom panels of Fig. 6. Our method is also applicable to photographs, as shown in the bottom images of Fig. 9 on the final page of this paper.

Figure 6 Effects of our method. Top row: in the arm part, the shoulder area hides behind the upper body (left); in the leg part, the upper area hides behind the skirt (right). Bottom row: result of applying multiple control points on one part.

Using our system, we generated a virtual-character transformation application for users and evaluated its performance in a questionnaire survey. Transformation scenes are common in animated television programs and SFX television programs for children; such transformation scenes are popular traditional content that reflects human desires. General transformation scenes in television programs include many visual effects. Thus, our content displayed a 2.5D animation after a transformation scene. Throughout the content, we captured the motion of the user in real time for interaction and animation generation. When the user raised his or her right hand, glittery stars were seen to move from head to toe; this visual effect was accompanied by sound effects (Fig. 8). Next, the user placed his or her left hand beside the face, eliciting heart marks as another visual effect. The content then transitioned to the 2.5D character animation, which appeared after the transformation. Here, the user can pose freely with anything other than the left hand.

Figure 8 Interactive scene of transformation into a 2D character.

This content was shown to 15 participants, and we experimentally evaluated whether or not the user felt that they had become a cartoon character. Users were asked to rate their perceptions on a three-grade scale (3 = yes, 2 = unsure, and 1 = no). Ten of the 15 participants rated their experience as 3, suggesting that they perceived themselves as a cartoon character; the remaining participants were unsure of their experience. The main content was the 2.5D animation scene; the transformation scene was included but was significantly shorter. Therefore, our animation results were natural and familiar to the users. Incidentally, a vast majority of the participants (14) rated the experience as "fun," one rated it as "normal," and none rated it as "not fun." Our content was highly appreciated as entertaining.

6 Discussion

Our static results had a natural appearance, as shown in Fig. 5. Our animation results were also perceived as natural animation content by the 15 participants of our questionnaire survey. Because our animation results are switched at certain angles, they are less smooth than existing 2.5D animation, 3D stylized animation, and traditional cel animation. However, the study fulfilled its purpose of generating acceptable cartoon animations without interpolation methods, showing that our method can become an easier way to generate 2.5D animations in the future.

Inputting more images or applying interpolation would improve the smoothness of the animation, but would also increase its cost. Furthermore, some existing animations, such as stop-motion animation and time-lapse videos, are not uniformly smooth. Therefore, our method can potentially realize a new style of animation with low editing costs.

Although the present study ignored the user interface (UI), a comfortable UI for generating the animations is one of our goals. In particular, by automating some processes in the UI and replacing the manual joint specification with image processing, we can expect to reduce the editing costs and improve the quality of the animation results. Another future task is shading the textures in the animation, which is not achieved by existing 2.5D methods.

Here we describe the differences between Live2D and our method. Live2D requires only one image per part, whereas ours requires more than one image per part. Both methods need to create a mesh for each image. In Live2D, keyframe animation is applied to create both the movements and the pictures themselves; editing keyframe animation is the task of generating not only animations but also pictures (geometries). In our method, on the other hand, animations are generated by switching multiple part images and applying rigid deformation with bone data.

Unlike drawing a picture, Live2D is not an intuitive way of working; it must generate all geometries (that is, the appearances from all viewpoints) from one mesh per part image. In our method, the input images are simply raster images, so there is no cost for editing mesh vertices. In animation generation with Live2D, the shape may collapse for two reasons: large deformations caused by covering a part with only one mesh, and the absence of constraints on the geometric deformation. In our method, because vertex positions are calculated only by rigid deformation, there is almost no unnatural collapse of shapes. Live2D enables professional animation to be generated by detailed editing from very few input images, which is quite useful for producing professional animation; our method is not suitable for generating such professional animations. However, our method has the advantage of making it simple to create a 2D animation of more than minimal quality. The above considerations are stated in terms of the basic theory of both methods, setting aside some optional editing. In the future, simple content generation by individuals will be in demand because virtual avatars are currently popular online; thus, simple animation generation with minimal quality would be a valid choice.

7 Conclusion

We presented a system that generates 2.5D animations from bone animation, rigid deformation, and switching of part images. We then generated digital content with our system and evaluated its entertainment value in a questionnaire survey. Our system generates animations without interpolation (which produces smoothness but sometimes causes severe distortions) and with fewer input images than cel animation. In future work, we hope to equip our system with an intuitive UI and to evaluate the quality of the generated animations.

Figure 9 Our animation results.

References
[1] Celsys, Inc. RETAS STUDIO.
[2] Alec Rivers, Takeo Igarashi, and Frédo Durand. 2.5D cartoon models. ACM Trans. Graph., Vol. 29, No. 4, pp. 59:1–59:7, July 2010.
[3] Live2D Inc. Live2D. 2008.
[4] Marc Alexa, Daniel Cohen-Or, and David Levin. As-rigid-as-possible shape interpolation. In Proceedings of the 27th Annual Conference on Computer Graphics and Interactive Techniques, SIGGRAPH '00, 2000.
[5] Takeo Igarashi, Tomer Moscovich, and John F. Hughes. As-rigid-as-possible shape manipulation. ACM Trans. Graph., Vol. 24, No. 3, July 2005.
[6] Scott Schaefer, Travis McPhail, and Joe Warren. Image deformation using moving least squares. ACM Trans. Graph., Vol. 25, No. 3, July 2006.
[7] Alec Jacobson, Ilya Baran, Jovan Popović, and Olga Sorkine. Bounded biharmonic weights for real-time deformation. ACM Trans. Graph., Vol. 30, No. 4, July 2011.
[8] Daniel Sýkora, John Dingliana, and Steven Collins. As-rigid-as-possible image registration for hand-drawn cartoon animations. In Proceedings of the 7th International Symposium on Non-Photorealistic Animation and Rendering, NPAR '09, 2009.
[9] Yu-Shuen Wang, Chiew-Lan Tai, Olga Sorkine, and Tong-Yee Lee. Optimized scale-and-stretch for image resizing. ACM Trans. Graph., Vol. 27, No. 5, December 2008.
[10] Johannes Schmid, Martin Sebastian Senn, Markus Gross, and Robert W. Sumner. OverCoat: An implicit canvas for 3D painting. ACM Trans. Graph., Vol. 30, No. 4, July 2011.
[11] Eakta Jain, Yaser Sheikh, Moshe Mahler, and Jessica Hodgins. Three-dimensional proxies for hand-drawn characters. ACM Trans. Graph., Vol. 31, No. 1, February 2012.
[12] Katie Bassett, Ilya Baran, Johannes Schmid, Markus Gross, and Robert W. Sumner. Authoring and animating painterly characters. ACM Trans. Graph., Vol. 32, No. 5, October 2013.
[13] Xueting Liu, Xiangyu Mao, Xuan Yang, Linling Zhang, and Tien-Tsin Wong. Stereoscopizing cel animations. ACM Trans. Graph., Vol. 32, No. 6, November 2013.
[14] Daniel Sýkora, Ladislav Kavan, Martin Čadík, Ondřej Jamriška, Alec Jacobson, Brian Whited, Maryann Simmons, and Olga Sorkine-Hornung. Ink-and-ray: Bas-relief meshes for adding global illumination effects to hand-drawn characters. ACM Trans. Graph., Vol. 33, No. 2, April 2014.
