
A Software Pipeline for 3D Animation Generation using Mocap Data and Commercial Shape Models

Xin Zhang, David S. Biswas and Guoliang Fan
School of Electrical and Computer Engineering, Oklahoma State University, Stillwater, Oklahoma
{xin.zhang, david.s.biswas, guoliang.fan}@okstate.edu

ABSTRACT

We propose a software pipeline to generate 3D animations by using motion capture (mocap) data and human shape models. The proposed pipeline integrates two animation software tools, Maya and MotionBuilder, in one flow. Specifically, we address the issue of skeleton incompatibility among the mocap data, shape models, and animation software. Our objective is to generate both realistic and accurate motion-specific animation sequences. Our method is tested on three mocap data sets of various motion types and five commercial human shape models, and it demonstrates better visual realism and kinematic accuracy when compared with three other animation generation methods.

Categories and Subject Descriptors
H.2.8 [Information Systems]: Database Applications—image database; I.6.8 [Computing Methodologies]: Types of Simulation—animations, visual

General Terms
Experiment, Performance

Keywords

3D animation generation, Human motion, MoCap data

1. INTRODUCTION

Vision-based human motion analysis is an active research field due to its wide practical applications such as biometrics, human-computer interfaces, and image and video retrieval. Usually, an important prerequisite for this kind of research is a large amount of high-quality motion-specific animation data for algorithm training that reflects various imaging conditions, e.g., viewpoints and body shapes. Given the difficulty of collecting real-world training data, several computer animation software tools, such as Maya and MotionBuilder (Autodesk) and Poser (Curious Labs), have been adopted for animation generation due to their efficiency, flexibility and low cost.

Additionally, high-quality motion capture (mocap) data and realistic human shape models are two important components for animation generation, each of which involves a specific skeleton definition including the number of joints, the naming convention, the hierarchical relationship and the underlying physical meaning of each joint. Ideally, animation software can drive a human shape model to move and articulate according to the given mocap data and optimize the deformation of the body surface with natural smoothness, as shown in Fig. 1. With the development of 3D animation and mocap technology, there are plenty of mocap data and 3D human models available for various research activities. However, due to their different sources, there is a major gap between the mocap data, shape models and animation software, which often makes animation generation a challenging task. Three skeleton definitions are involved in animation generation: those of the mocap data, the shape models, and the software built-in skeleton. The incompatibility among these skeletons often makes synthesized animation sequences unrealistic, inaccurate or even distorted.

Figure 1: Animation generation using animation software, mocap data and 3D shape models.

The goal of this work is to propose a 3D motion-specific animation framework which improves both kinematic accuracy and visual realism. A new software pipeline is presented to overcome the problem of skeleton incompatibility among the mocap data, commercial shape models and animation software. The proposed pipeline employs two popular animation software tools, Maya and MotionBuilder, in one flow. In the pipeline, the skeleton from the mocap data is employed as the reference one. Accordingly, the human shape model is re-defined by using the reference skeleton, which is connected with the software skeleton via a skeleton mapping technique. We test our pipeline using several commercial human shape models and various mocap data sets from different sources, and our method achieves very promising results. Using the proposed pipeline, we can generate natural-looking animations that accurately reflect the underlying motion data. This work would provide a useful research tool for researchers in the field of vision-based human motion analysis.

2. RELATED WORK

3D human animations are widely used in various research topics, including detection, segmentation, tracking, estimation, recognition and retrieval, as shown in Fig. 2. There are three different ways to create human animations: rigid body modeling, body mesh animation, and virtual environments (Fig. 3). All these techniques have been employed in human-related vision research, as summarized in Table 1.

Figure 2: Related research topics and application fields involving 3D human animation data.

2.1 Rigid Human Body Modeling

The human body is assumed to be composed of several hierarchically connected rigid parts. Each body part can be modeled as a rigid super-quadric such as a cuboid or cylinder. This method is simple, with few parameters, and it does not involve any special animation software. However, the model can only provide a very rough synthesis of human shape and animation.

2.2 Body Mesh Animation

Body mesh animation can be achieved by graphics programming and commercial software. Given a large amount of high-quality mocap data, 3D animation software can be used to create various human animation sequences with good quality and high efficiency. Three animation software tools are often used: Poser, Maya and MotionBuilder.

• Poser is easy-to-use software that provides a collection of human models and many types of motions. But it is not easy to directly incorporate data or models from other sources.

• Maya is high-end professional 3D computer modeling software that supports detailed graphical representation. Given its powerful and complex capabilities, it requires a steep learning curve for animation generation. Moreover, Maya needs additional software to handle various often-used mocap data formats (AMC/ASF, BVH, C3D, etc.).

• MotionBuilder (MB) is specially designed software for 3D animation with powerful tools to handle mocap data and various camera settings. It also provides well-designed physical constraints and optimization algorithms for motion-realistic animation. However, it is often encumbered by the skeleton incompatibility between shape models and mocap data.

2.3 Virtual Environment

Several synthetic datasets and simulation tools have been developed as testing beds for research, especially in the field of visual surveillance. By building a virtual world, synthetic visual data can be acquired by freely placed cameras and can include multiple human activities and interactions in various scenarios.

This work belongs to the second category. We want to provide an efficient and flexible way for animation generation that accommodates arbitrary mocap data and any commercial human shape model.

Figure 3: 3D human animations using (a) rigid body modeling [34], (b) body mesh animation (Maya) [31], and (c) virtual environment [33].

Table 1: Brief summary of related research and their animation data generation methods.

Detection and Segmentation
  Li et al. [15]                 body part super-quadrics
  Lin et al. [16]                body part super-quadrics
  Schlogl et al. [27]            body mesh animation
Tracking
  Black et al. [5]               body mesh animation
  Desurmont et al. [9]           virtual environment
  Sminchisescu and Triggs [32]   body part super-quadrics
  Urtasun and Fua [34]           body part super-quadrics
Pose Estimation
  Agarwal and Triggs [3]         body mesh animation (Poser)
  Elgammal and Lee [10]          body mesh animation (Poser)
  Guo and Qian [13]              body mesh animation (Maya)
  Sigal et al. [29]              body part super-quadrics
  Shakhnarovich et al. [28]      body mesh animation (Poser)
  Sminchisescu et al. [31]       body mesh animation (Maya)
3D Motion Estimation
  Canton-Ferrer et al. [6]       body part super-quadrics
  Mundermann et al. [19]         scanned human shape
  Wei et al. [35]                body part super-quadrics
  Zhang et al. [36]              body mesh animation (proposed)
Recognition
  Chen et al. [7]                body mesh animation
  Park et al. [21]               body part super-quadrics
  Peng and Qian [23]             body mesh animation (Maya)
  Ragheb et al. [25]             body mesh animation (MB)
Retrieval
  Deng et al. [8]                body mesh animation
  Godil and Ressler [12]         scanned human shape
  Pawar et al. [22]              body part super-quadrics
Visual Surveillance
  Qureshi and Terzopoulos [24]   virtual environment
  Taylor et al. [33]             virtual environment

3. PRELIMINARY AND KEY IDEAS

3D animation generation requires mocap data, a human shape model and certain animation software.

• Mocap Data have two elements: the skeleton and the motion information. The skeleton is made up of a number of joints that have specific names and follow a hierarchical structure. The motion information is usually represented as the translation and rotation of each joint during animation. The skeleton is pre-defined for the motion capture process and can be very different due to different motion collection purposes and mocap systems [17]. Plenty of algorithms have been proposed to compress, edit, interpolate and retrieve mocap data [20, 26, 11, 14, 18].

• Human Shape Model has a set of vertices simulating human appearance and a hierarchical bone structure, that is, the shape model skeleton. Similar to the real human body, each joint on the skeleton is associated with a vertex mesh to control a specific part of the body. The skeleton is defined by the shape model provider, and the skeleton-mesh binding relationship can largely influence the realism and accuracy of synthetic motions.

• Animation Software has built-in skin deformation and optimization algorithms. The shape model mesh (skin) is deformed with skeleton movements. To simulate natural movements, the skin surface is further optimized given its binding information with the skeleton.

The skeleton is the crucial concept in the animation process, and we have to deal with three different skeletons: the mocap skeleton, the shape model skeleton and the software default skeleton. Animation generation is straightforward if the three skeletons are consistent. Unfortunately, these skeletons usually are not the same due to their different providers. The skeleton incompatibility exists in the joint naming convention, the number of joints in each skeleton and, most importantly, the structure of the skeletons, as shown in Fig. 4. The joint topologies and structures are different, especially in the chest and hip areas (circled in Fig. 4).

Figure 4: Three skeletons represented as tree structures. (a) Mocap skeleton; (b) Shape model skeleton; (c) MotionBuilder built-in skeleton.

In this work, we propose a skeleton mapping (in MotionBuilder) and skeleton replacement (in Maya) technique to overcome the incompatibility among the three skeletons. First, we use the mocap skeleton as the reference because it is directly associated with the motion data. Then we map the mocap skeleton to the software built-in skeleton by generating a mapping template in MotionBuilder. In this way, the mocap skeleton can be recognized by MotionBuilder, which can optimize the animation generation process via internal functions. For a given human shape model, the original skeleton is replaced by the mocap skeleton and the skin binding relationship is re-generated via Maya. After these steps, the shape model can be directly driven by any mocap data in MotionBuilder.
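To make the key idea concrete, the following pure-Python sketch applies a reusable mapping template that renames mocap joints to a software-compatible convention while preserving the joint hierarchy. This is only a conceptual illustration: in the actual pipeline the template is built interactively in MotionBuilder (Section 4.1), and all joint names below are hypothetical.

    # Conceptual sketch of a skeleton mapping template (hypothetical names).
    # A skeleton is represented as a parent table: child joint -> parent joint.
    mocap_skeleton = {
        "Hips": None,           # root joint
        "LeftUpLeg": "Hips",
        "LeftLeg": "LeftUpLeg",
        "Spine": "Hips",
        "LeftArm": "Spine",
    }

    # Mapping template: mocap joint name -> software-compatible joint name.
    # Built once per mocap data set, then reused for every sequence from it.
    template = {
        "Hips": "Character_Hips",
        "LeftUpLeg": "Character_LeftUpLeg",
        "LeftLeg": "Character_LeftLeg",
        "Spine": "Character_Spine",
        "LeftArm": "Character_LeftArm",
    }

    def apply_template(skeleton, template):
        """Rename every joint while keeping the hierarchical structure intact."""
        return {
            template[child]: (template[parent] if parent is not None else None)
            for child, parent in skeleton.items()
        }

    software_skeleton = apply_template(mocap_skeleton, template)
    print(software_skeleton)  # same tree, software-compatible joint names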

4. PROPOSED PIPELINE

The proposed pipeline can generate natural human animations efficiently and effectively by overcoming the skeleton incompatibility. The flowchart is shown in Fig. 5, which involves two animation software tools, Maya and MotionBuilder, and has three major steps: skeleton mapping, skin binding and animation generation. The detailed implementation steps are listed in Table 2.

Figure 5: The proposed animation generation pipeline. (1) Skeleton mapping; (2) Skin binding; (3) Animation generation.

4.1 Skeleton Mapping

As discussed before, the skeleton incompatibility is the most crucial challenge. Skeleton mapping aims at bridging the gap between the mocap skeleton and the default software skeleton by defining a mapping template. Since the default software skeleton is connected with built-in algorithms, the mocap skeleton has to be consistent with this skeleton in the joint names and structure. We employ MotionBuilder for its capabilities and flexibility in skeleton conversion. MotionBuilder enables users to manually build the joint relationship between the mocap skeleton and the software one and to plot the built-in FK/IK (forward and inverse kinematics) controllers onto the skeleton. We save the result as the skeleton mapping template. By using the template, the joint names are automatically changed and a new software skeleton is constructed following the mocap skeleton's hierarchical structure. Once the conversion template is generated, it can be used for any mocap data from the same data set.

4.2 Skin Binding

To overcome the skeleton incompatibility between the mocap data and the shape model, we propose a skeleton replacement technique, as shown in Fig. 6; that is, we create a new shape model that combines the original mesh and the mocap skeleton. However, replacing the skeleton inside the shape model not only changes the internal skeleton structure but also requires redefining the binding relationship that specifies the association between the body mesh and the skeleton joints. This process is called skin binding. There are several algorithms [4] on this topic, and here we adopt Maya for its efficiency. We use the Maya built-in function smooth binding to rig the shape model with the new skeleton from the mocap data and fine-tune two parameters (maximum references and dropoff rate) to reach the best performance. If the shape model and the skeleton have different heights, we re-scale the shape model as a whole for a better binding relationship while keeping the mocap skeleton unchanged. After this, the shape model is controlled by a skeleton that has the same structure as the mocap skeleton and can be recognized by MotionBuilder for animation generation.

Figure 6: Illustration of the skeleton binding process.

Table 2: Summary of detailed implementation steps.

Skeleton Mapping (in MB):
• Load the mocap data into MotionBuilder (MB) and drag a character onto it;
• Manually connect the mocap skeleton with the built-in software character skeleton for every joint;
• Use the "Control Rig In" command to plot FK/IK controllers onto the mocap skeleton;
• Save the file as a mapping template to connect the mocap skeleton with the MotionBuilder built-in skeleton.

Skin Binding (in Maya):
• Load the original shape model into Maya and detach the shape model skeleton from the shape mesh;
• Load the mocap skeleton and "Smooth Bind" it with the shape mesh;
• Adjust the two binding parameters, maximum reference set and dropoff rate, which influence the smoothness of the skin deformation;
• Export the new shape model for animation generation.

Animation Generation (in MB):
• Load the new shape model and the skeleton mapping template into MotionBuilder;
• Import a mocap data file and use it to animate the new shape model;
• Activate the "Characterize" command in MotionBuilder to incorporate built-in constraints and optimization algorithms.
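As a rough illustration of the skin binding step in Table 2, the sketch below uses Maya's Python commands and would run inside Maya's script editor. It is only a sketch: the node names are hypothetical placeholders, and we assume the smooth-bind flags maximumInfluences and dropoffRate correspond to the two binding parameters discussed in Section 4.2.

    # Sketch of the Maya-side skin binding step (hypothetical node names).
    import maya.cmds as cmds

    mesh = "shapeModel_mesh"    # the original shape model's mesh (placeholder)
    mocap_root = "Hips"         # root joint of the imported mocap skeleton

    # Optionally re-scale the whole model so its height matches the mocap
    # skeleton's height (Section 4.2), keeping the skeleton unchanged.
    scale = 1.72 / 1.80         # hypothetical skeleton/model height ratio
    cmds.scale(scale, scale, scale, mesh)

    # Smooth-bind the mesh to the mocap skeleton; maximumInfluences and
    # dropoffRate control the smoothness of the skin deformation.
    cmds.select(mocap_root, mesh)
    cmds.skinCluster(toSelectedBones=True, maximumInfluences=4, dropoffRate=4.0)

    # Export the new shape model (mesh + mocap skeleton) for MotionBuilder;
    # in practice this would typically be an FBX export.
    cmds.file(rename="new_shape_model.mb")
    cmds.file(save=True, type="mayaBinary")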

4.3 Animation Generation

Given the mocap data, the skeleton mapping template and the new shape model with the mocap skeleton, we use MotionBuilder to generate 3D animation sequences. We first load the new shape model into MotionBuilder. With the mapping template, the skeleton can be directly recognized by MotionBuilder. After importing a mocap file, the shape model can move seamlessly following the mocap motion, as shown in Fig. 7. Additionally, we can "characterize" the skeleton inside the shape model, which incorporates the FK/IK optimization algorithms and physical constraints predefined by the software. This step employs MotionBuilder's built-in features to improve noisy mocap motion. The visual human motion sequences can be recorded by freely placed virtual cameras.

Figure 7: Illustration of the human animation.

4.4 Summary

Using the proposed pipeline, we overcome the skeleton differences and generate realistic human animations efficiently using any mocap data set and shape model. Once the skeleton mapping template and the new shape model are generated, we can synthesize human motions using any mocap sequence from the data set in MotionBuilder. Additionally, the advanced animation and optimization features provided by MotionBuilder can be easily incorporated. The implementation steps are summarized in Table 2.
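Although the paper describes an interactive workflow, the per-sequence animation step could in principle be batched with MotionBuilder's Python SDK (pyfbsdk). The following minimal sketch assumes a scene file that already contains the characterized shape model and the mapping template; it relies only on the generic FBApplication file calls, whose exact signatures vary across MotionBuilder versions, and all file names are hypothetical.

    # Minimal batch sketch for the MotionBuilder side (hypothetical paths).
    from pyfbsdk import FBApplication

    app = FBApplication()

    # Scene prepared once per data set: the new shape model (mesh + mocap
    # skeleton) together with the saved skeleton mapping template.
    app.FileOpen("new_shape_model_with_template.fbx")

    # Drive the characterized model with any mocap sequence from the same
    # data set, then save the resulting animation; repeat over the data set.
    for take in ["walk_01.fbx", "walk_02.fbx"]:
        app.FileImport(take, False)        # merge the mocap take into the scene
        app.FileSave("animated_" + take)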
5. EXPERIMENTS AND DISCUSSIONS

5.1 Comparative Studies

We tested the proposed pipeline using mocap data from the CMU Motion Capture Library [1], the Ohio State University Mocap Lab [2] and ViHASi [25], and shape models from the ClipArt of MotionBuilder and from aXYZ design (http://www.axyz-design.com/axyz-design-3d-humans-characters-metropoly-rigged.php). The motion animations are compared with other animation generation methods visually and quantitatively.

First, using mocap data from the CMU mocap library, we compare our pipeline with three other animation generation methods in terms of visual realism and kinematic accuracy. Since we do not have the ground truth of the human body surface, it is hard to quantitatively evaluate visual realism using mesh differences. Hence, we treat realism as a subjective measurement of how realistic the motion animation looks. The kinematic accuracy indicates the quantitative difference between the ground truth motion (mocap data) and the actual motion underlying the animation sequence; a smaller difference means better accuracy. The following are the three other animation methods, all using MotionBuilder, used for the comparison:

• Method-I is the general process using MotionBuilder following its tutorial.

• Method-II is based on the Method-I results, where we manually adjust the translation and rotation of body parts to remove artificial movements and to follow the mocap data better.

• Method-III adopts the same process as Method-I but uses specially converted, MotionBuilder-friendly mocap data derived from the original CMU mocap data (http://sites.google.com/a/cgspeed.com/cgspeed/motion-capture/cmu-bvh-conversion). The skeleton in the converted mocap data is renamed according to the MotionBuilder naming convention, and the shoulder of the T-pose is manually adjusted to generate natural movements.

In Fig. 8, we illustrate a few walking sequences generated by the four methods in three shape models from three viewing angles. The first row shows the ground truth mocap motion as a skeleton, and the following rows are Method-I, II, III and our proposed pipeline. It is obvious that the last two rows have more realistic and accurate movements, especially around the shoulders, knees and feet.

Secondly, we quantitatively compare these generation methods by computing the average joint error between the ground truth mocap data and the synthetic motions performed by the animated characters. We generate five different walking motions in three different shape models (humanoid, female and male). In the mocap data, the motion is represented as relative rotation angles of each joint. Hence, the kinematic error is defined as the joint rotation angle difference. The ground truth of the ith motion is denoted by a sequence of rotation angles Θ^i = {θ^i(k) | k = 1, ..., K}, where K is the number of joints in the skeleton. Similarly, the corresponding synthetic motion is defined as the joint rotation sequence of the shape model skeleton, represented as Φ^i = {φ^i(k) | k = 1, ..., K}. The kinematic error of two motions is written as

    ε = (1 / (N_i N_k)) Σ_i Σ_k |θ^i(k) − φ^i(k)|,    (1)

where N_i is the total number of motions and N_k is the number of joints used for error computation. We used eight joints on the limbs (knees, feet, elbows and hands) as they are the dominant joints in human motion.
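For concreteness, the following sketch computes the kinematic error of Eqn. (1) with NumPy. The rotation arrays here are synthetic stand-ins, since the actual values come from the mocap data and the recovered animation.

    # Sketch of the kinematic error in Eqn. (1): the average absolute
    # joint-angle difference between ground-truth mocap rotations and the
    # rotations underlying the synthesized animation. Data are mock values.
    import numpy as np

    N_i, N_k = 5, 8      # five walking motions, eight dominant limb joints

    # theta[i, k] / phi[i, k]: rotation angle (degrees) of joint k in motion i
    # for the ground truth and the synthetic motion, respectively.
    rng = np.random.default_rng(0)
    theta = rng.uniform(0.0, 90.0, size=(N_i, N_k))      # ground truth (mock)
    phi = theta + rng.normal(0.0, 8.0, size=(N_i, N_k))  # synthetic (mock)

    # Eqn. (1): epsilon = (1/(N_i N_k)) * sum_i sum_k |theta_i(k) - phi_i(k)|
    epsilon = np.abs(theta - phi).sum() / (N_i * N_k)
    print(f"average kinematic error: {epsilon:.2f} degrees")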
Table 3: Comparison of four animation generation methods for synthesizing walking in three shape models, using five mocap sequences from the CMU mocap library.

                           Method-I   Method-II   Method-III   Ours
Avg. Error                 43.26°     22.38°      16.53°       8.07°
Speed                      Fast       Slow        Medium       Medium
Mocap Data Extendability   Good       Good        Fair         Good
Human Involvement          No         Yes         Yes          Minimal

In Table 3, we compare the average error and properties of the four approaches. Method-I has nice features but cannot provide satisfactory visual and quantitative results given the incompatibility among the resources. Method-II requires intensive manual labor and changes the original motion defined by the mocap data. Method-III provides natural animation sequences, but the data conversion process involves professional knowledge and programming, which makes it difficult to apply to other data sets that have the skeleton incompatibility issue. The proposed pipeline combines the advantages of two commercial software tools and can be extended to a wide range of mocap data and shape models. More importantly, our method reaches the smallest error; that is, the ground truth mocap data is best reflected in the synthesized motion sequence. This kind of kinematic accuracy is essential in vision-based research for both training data generation and testing data evaluation.

Also, we show more animation sequences of various motions generated by our pipeline in Fig. 9. A few sophisticated motions are demonstrated, including dancing, kicking the ball, gym movements (like jump, twist and squat) and running-leaping. The 3D human motions are visually realistic, and the skin is smooth and natural even in highly articulated dancing poses. We also quantitatively compare Method-I and the proposed pipeline using Eqn. (1). Results are shown in Table 4. It is obvious that our method has a smaller kinematic error and provides better accuracy.

Table 4: Comparison of two animation generation methods for synthesizing seven different motions in one shape model, using mocap data from the CMU mocap library.

            Dance    Kick    Jump    Twist   Squat   Run     Leap
Method-I    53.26°   32.27°  32.90°  48.91°  32.32°  26.13°  38.07°
Proposed    10.29°   9.30°   9.53°   9.97°   8.21°   6.77°   8.26°

Additionally, we test our pipeline on different mocap data sets, including data from the Ohio State University mocap lab [2] and ViHASi [25]. Our method can generate natural motions using both data sets and various human shape models. In Fig. 10, we show a few animations from the two mocap data sets using four different shape models.

5.2 Research Applications

In [36, 37], the proposed pipeline was used to generate training data of a specific motion for a framework that can estimate 3D joint positions given image sequences of human walking taken by a single camera. The framework involves two generative models, i.e., the Kinematics Gait Generative Model (KGGM) and the Visual Gait Generative Model (VGGM). The dual generative models can interpolate and synthesize new gaits visually and kinematically, which allows us to infer the kinematics of a new gait from its appearance. In order to learn these models, we need a large amount of fully synchronized, high-quality kinematic and visual motion data from multiple persons. Moreover, animation sequences have to be collected from various viewing angles with people of different body shapes. Hence, the visual training data are generated via animation software through the proposed pipeline. We selected 20 representative walking motions from the CMU Mocap library for KGGM training. We rendered 100 3D gait animations by using the 20 gaits (the same ones used in KGGM training) and five human models (Fig. 11). Each 3D gait animation was recorded under 12 camera views (30° apart). Fig. 12 shows gait kinematics and the corresponding gait appearances (five shapes under one view). We tested estimation algorithms on Subjects 1, 2 and 3 in the HumanEva-I dataset [30] and reached state-of-the-art results. We have also provided the rigged shape models and a set of gait animations used in [36, 37] online for public use (http://www.vcipl.okstate.edu).

Figure 11: Five 3D human models. The first and the last are from the MotionBuilder ClipArt and the others are from aXYZ design.

Figure 12: Some gait animations generated by the proposed pipeline, showing one gait (the first one) on five shapes under one view.

5.3 Limitations

There are three limitations in the proposed pipeline.

• Firstly, in the skin binding step, two parameters need to be adjusted manually. However, for all the human shape models, the parameters are set to similar values.

• Secondly, since different mocap data sets usually have different skeleton structures, the skeleton mapping and skin binding steps need to be re-defined for each new mocap data set. Fortunately, these procedures only need to be done once per data set.


• Lastly, the pipeline can only work successfully when the skeleton and the shape model have similar sizes. For example, the mocap data of a child cannot drive an adult shape model, and vice versa.

6. CONCLUSION AND FUTURE WORK

We have proposed a software pipeline for motion animation generation that effectively overcomes the skeleton incompatibility among mocap data, shape models and animation software. The created 3D motion animations are not only visually realistic but also kinematically accurate when compared with other generation methods. This work provides a useful tool as well as a rich set of training data for researchers in the field of vision-based human motion analysis. Its usefulness and effectiveness have been manifested in our recent human motion estimation research. In the future, we will study the influence of the binding parameters during skin binding and develop an objective measurement for visual quality assessment.

Acknowledgements

The authors thank the anonymous reviewers for their valuable comments and suggestions that improved this paper. The authors also thank Mr. Favian Beltran for his insightful suggestions. This work is supported by the National Science Foundation (NSF) under Grant IIS-0347613 and an OHRS award (HR09-030) from the Oklahoma Center for the Advancement of Science and Technology (OCAST).

7. REFERENCES

[1] CMU Mocap Library. http://mocap.cs.cmu.edu/.
[2] Ohio State University Motion Capture Lab. http://accad.osu.edu/research/mocap/mocapdata.htm.
[3] A. Agarwal and B. Triggs. 3D human pose from silhouettes by relevance vector regression. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[4] I. Baran and J. Popovic. Automatic rigging and animation of 3D characters. ACM Trans. on Graphics, 26, 2007.
[5] J. Black, T. Ellis, and P. Rosin. A novel method for video tracking performance evaluation. In Proc. VS-PETS, 2003.
[6] C. Canton-Ferrer, J. Casas, and M. Pardas. Exploiting structural hierarchy in articulated objects towards robust motion capture. In Conf. on Articulated Motion and Deformable Objects, 2008.
[7] Y. Chen, R. Parent, R. Machiraju, and J. Davis. Human activity recognition for synthesis. In IEEE Workshop on Learning, Representation, and Context for Human Sensing in Video, 2006.
[8] Z. Deng, Q. Gu, and Q. Li. Perceptually consistent example-based human motion retrieval. In Proc. of ACM SIGGRAPH Symposium on Interactive 3D Graphics and Games (SI3D), 2009.
[9] X. Desurmont, J.-B. Hayet, C. Machy, J.-F. Delaigle, and J.-F. Macq. On the performance evaluation of tracking systems using multiple pan-tilt-zoom cameras. In IS&T/SPIE Symposium on Electronic Imaging, 2007.
[10] A. Elgammal and C.-S. Lee. Tracking people on torus. IEEE Trans. on Pattern Analysis and Machine Intelligence, 31:520–538, 2009.
[11] M. Gleicher. Retargetting motion to new characters. In SIGGRAPH, 1998.
[12] A. Godil and S. Ressler. Retrieval and clustering from a 3D human database based on body and head shape. In Proc. of SAE Digital Human Modeling Conference, 2006.
[13] F. Guo and G. Qian. Monocular 3D tracking of articulated human motion in silhouette and pose manifolds. EURASIP Journal on Image and Video Processing, 2008:1–18, 2008.
[14] L. Kovar, M. Gleicher, and F. Pighin. Motion graphs. In SIGGRAPH '08: ACM SIGGRAPH 2008 classes, pages 1–10, New York, NY, USA, 2008. ACM.
[15] Y. Li, B. Wu, and R. Nevatia. Human detection by searching in 3D space using camera and scene knowledge. In Proc. International Conference on Pattern Recognition, 2008.
[16] Z. Lin, L. S. Davis, D. Doermann, and D. DeMenthon. Hierarchical part-template matching for human detection and segmentation. In IEEE International Conference on Computer Vision, 2007.
[17] M. Meredith and S. Maddock. Motion capture file formats explained. Technical Report CS-01-11, University of Sheffield.
[18] M. Müller, T. Röder, and M. Clausen. Efficient content-based retrieval of motion capture data. ACM Trans. Graph., 24(3):677–685, 2005.
[19] L. Mundermann, S. Corazza, and T. P. Andriacchi. Accurately measuring human movement using articulated ICP with soft-joint constraints and a repository of articulated models. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[20] O. Onder, U. Gudukbay, B. Ozguc, T. Erdem, C. E. Erdem, and M. Ozkan. Keyframe reduction techniques for motion capture data. In Proceedings of the 3DTV Conference: The True Vision – Capture, Transmission and Display of 3D Video, 2008.
[21] J. Park, S. Park, and J. Aggarwal. Model-based human motion tracking and behavior recognition using hierarchical finite state automata. In ACM Conference on Image and Video Retrieval, 2003.
[22] M. Pawar, G. Pradhan, K. Zhang, and B. Prabhakaran. Content based querying and searching for 3D human motions. In Proc. of International ACM Multimedia Modeling Conference, 2008.
[23] B. Peng, G. Qian, and S. Rajko. View-invariant full-body gesture recognition from video. In Proc. of International Conference on Pattern Recognition, 2008.
[24] F. Qureshi and D. Terzopoulos. Towards intelligent camera networks: A virtual vision approach. In Proc. VS-PETS, 2005.
[25] H. Ragheb, S. Velastin, P. Remagnino, and T. Ellis. ViHASi: Virtual human action silhouette data for the performance evaluation of silhouette-based action recognition methods. In Workshop on Activity Monitoring by Multi-Camera Surveillance Systems, 2008.
[26] L. Ren, A. Patrick, A. A. Efros, J. K. Hodgins, and J. M. Rehg. A data-driven approach to quantifying natural human motion. ACM Transactions on Graphics (SIGGRAPH 2005), 24(3):1090–1097, Aug. 2005.
[27] T. Schlogl, C. Beleznai, M. Winter, and H. Bischof. Performance evaluation metrics for motion detection and tracking. In Proc. International Conference on Pattern Recognition, 2004.
[28] G. Shakhnarovich, P. Viola, and T. Darrell. Fast pose estimation with parameter sensitive hashing. In Proc. of International Conference on Computer Vision, 2003.
[29] L. Sigal, S. Bhatia, S. Roth, M. Black, and M. Isard. Tracking loose-limbed people. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2004.
[30] L. Sigal and M. Black. HumanEva: Synchronized video and motion capture dataset for evaluation of articulated human motion. Technical Report CS-06-08, Brown University, 2006.
[31] C. Sminchisescu, A. Kanaujia, and D. N. Metaxas. BM3E: Discriminative density propagation for visual tracking. IEEE Trans. on Pattern Analysis and Machine Intelligence, 29:2030–2044, 2007.
[32] C. Sminchisescu and B. Triggs. Kinematic jump processes for monocular 3D human tracking. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2003.
[33] G. R. Taylor, A. J. Chosak, and P. C. Brewer. OVVV: Using virtual worlds to design and evaluate surveillance systems. In Proc. IEEE Conference on Computer Vision and Pattern Recognition, 2007.
[34] R. Urtasun and P. Fua. 3D human body tracking using deterministic temporal motion models. In Proc. of European Conference on Computer Vision, 2004.
[35] X. K. Wei and J. Chai. Modeling 3D human poses from uncalibrated monocular images. In Proc. IEEE International Conference on Computer Vision, 2009.
[36] X. Zhang and G. Fan. Dual gait generative models for human motion estimation from a single camera. IEEE Trans. on Systems, Man, and Cybernetics, Part B: Cybernetics, 2010 (to appear).
[37] X. Zhang, G. Fan, and L. Chou. Two-layer gait generative models for estimating unknown human gait kinematics. In Proc. IEEE ICCV Workshop on Machine Learning for Vision-based Motion Analysis, 2009.

Figure 8: 3D human animations using various methods. From the first row to the last: mocap skeleton animation, Method-I, Method-II, Method-III and the proposed method.




Figure 9: Demonstration of various motions. (a) Ballet dance; (b) Kick a ball; (c) Jump-twist-squat; (d) Run-leap.



Figure 10: 3D human animations using mocap data from the Ohio State University mocap lab (first row) and the ViHASi data set (second row). (a) Walk and turn left 90°; (b) punch; (c) run; (d) walk dog.