Using automatic generation of Labanotation to protect folk dance
Jiaji Wang Zhenjiang Miao Hao Guo Ziming Zhou Hao Wu
Jiaji Wang, Zhenjiang Miao, Hao Guo, Ziming Zhou, Hao Wu, “Using automatic generation of Labanotation to protect folk dance,” J. Electron. Imaging 26(1), 011028 (2017), doi: 10.1117/1.JEI.26.1.011028. Journal of Electronic Imaging 26(1), 011028 (Jan∕Feb 2017)
Using automatic generation of Labanotation to protect folk dance
Jiaji Wang,a,* Zhenjiang Miao,a Hao Guo,b Ziming Zhou,a and Hao Wuc aBeijing Jiaotong University, School of Computer and Information Technology, Institute of Information Science, Haidian District, Beijing, China bUniversity of South Carolina, Computer Vision Lab, Department of Computer Science and Engineering, 3A19 Swearingen Engineering Center, Columbia, South Carolina 29208, United States cBeijing Normal University, College of Information Science and Technology, Haidian District, Beijing, China
Abstract. Labanotation uses symbols to describe human motion and is an effective means of protecting folk dance. We use motion capture data to automatically generate Labanotation. First, we convert the motion capture data of the biovision hierarchy file into three-dimensional coordinate data. Second, we divide human motion into element movements. Finally, we analyze each movement and find the corresponding notation. Our work has been supervised by an expert in Labanotation to ensure the correctness of the results. At present, the work deals with a subset of symbols in Labanotation that correspond to several basic movements. Labanotation contains many symbols and several new symbols may be introduced for improvement in the future. We will refine our work to handle more symbols. The automatic generation of Labanotation can greatly improve the work efficiency of documenting movements. Thus, our work will significantly contribute to the protection of folk dance and other action arts. © 2017 SPIE and IS&T [DOI: 10.1117/1.JEI.26.1.011028]
Keywords: motion capture data; Labanotation; motion segmentation; movement analysis. Paper 16462SS received Jun. 17, 2016; accepted for publication Jan. 16, 2017; published online Feb. 10, 2017.
1 Introduction Our work includes five aspects, as follows: (1) acquiring Folk dance is a way for working people to feel and express the motion capture data. We obtain the data via two types themselves in their production and living. The world has a of motion capture systems: marked and nonmarked. rich tradition of folk dances, which are a valuable asset and (2) Analyzing the motion capture data. We analyze the struc- an important part of intangible cultural heritage. However, ture of motion capture data stored in biovision hierarchy influenced by inheritance, geographical environment, and (BVH) files and calculate the spatial information in the regional development imbalance, many folk dances and data. (3) Motion segmentation. The segmentation is used other action arts are disappearing. Therefore, it is an urgent to divide human motion into element movements. We use three methods to address the motion of gravity center, task to protect these precious art forms. upper limbs, and lower limbs. (4) Analyzing element move- Dance notation is an effective method for recording ments. Based on whether the movements are characterized human movements and is a useful tool for the protection as supporting, we use two methods to determine the corre- of folk dances. Similar to music scores, dancers can sponding symbols for the movements. (5) Software develop- understand and perform the content from dance notations. ment. By using the motion capture data acquired through (1), Moreover, the symbolic representation makes dance notation via the processes of (2), (3), and (4), we develop the auto- intuitive and easy to understand. However, dance notation is matic generation software of Labanotation. difficult to document because this type of work requires The paper is organized as follows. Section 2 describes manual drawing by experts, and work efficiency is extremely related work on dance notation and computer technology. low. Section 3 describes Labanotation and our human motion cap- Generating the dance notation automatically using a com- ture. Section 4 describes in detail the automatic generation of puter is an effective method for solving this problem. In our Labanotation. Section 5 presents experiments and evalua- work, we choose Labanotation for the experiments, and the tions. Section 6 contains the conclusions and future work. action arts are Chinese folk dance and opera. In 2012, the Chinese ministry of culture and ministry of science and tech- 2 Related Work nology worked together and established a project called the National Science and Technology Support Program. We took Dance notation is a set of symbols used for the recording of part in a study called “multiple display modes of dynamic dance, which is similar to the music score for music. Using digital cultural resources based on human motion capture,” dance notation to describe human motion is rigorous, con- which is intended to use Labanotation as a means to show- venient, and easy to save. There are many kinds of dance notations. In China, the oldest dance notation is Chinese case Chinese folk dance and opera, thereby ensuring the Naxi Dongba dance notation. Dunhuang dance notation is inheritance and protection of these types of action arts. another old one, which was used in the Tang Dynasty of China. In England, there exists Maurice notation, Benish
*Address all correspondence to: Jiaji Wang, E-mail: [email protected] 1017-9909/2017/$25.00 © 2017 SPIE and IS&T
Journal of Electronic Imaging 011028-1 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance dance notation,1 Seiktein shorthand method, etc. Alphabet Ritsumeikan University in 2001. They proposed a method dance notation is used in North Korea. Labanotation, created to generate Labanotation of upper limb movements based by Rudolf Laban,2 is recognized as one of the most widely on spatial analysis.11 However, their research only focused used and most accurate notations for the recording of dance. on the upper limbs and did not discuss the movements of Therefore, in this paper, we choose Labanotation to do our lower limbs or the gravity center. In Thailand, Worawat research for the automatic generation. Choensawat cooperated with Hachimura and Nakamura, Labanotation is intuitive, vivid, and logical. It has been and they developed GenLaban software for the automatic widely used in ballet and other western dances. However, generation of Labanotation.12 However, the GenLaban drawing the Labanotation is difficult, which is an obstacle software does not handle well the complex lower limb move- for using this notation to describe human movements. ments and gravity center movements. Chen et al.13 analyzed To make the drawing easier, a method to simplify the work motion capture data by using the rule-based approach. through computer technology is necessary. However, in the process of generating Labanotation, they At present, there are three main research directions did not consider the pause and segmentation of movements. regarding combining Labanotation and computer technology. Thus, the quality of the generated Labanotation is not high. The first direction is to develop software as a tool for the manual drawing of Labanotation. The software provides a 3 Labanotation and Human Motion Capture variety of symbols, and people only need to drag the symbols to the designated places. This simplifies the manual drawing. 3.1 Labanotation The research on developing software is mature.3 This kind Labanotation is a recording system designed for analyzing of software includes Calaban, Labanatory,4 LED,5 Laban human movements. The notation uses intuitive symbols to Writer,6,7 etc. Among them, the Laban Writer developed represent the movements of the human body. by Ohio State University is one of the most widely used soft- The dance notation comes in two parts: structure and ware platforms. Laban Writer is based on the Macintosh notation symbols. The basic structure is composed by three platform and provides a graphical tool for the user. vertical lines based on the symmetrical structure of the The second direction is to drive models by Labanotation. human body.2,14 The symbols are placed between the vertical Japanese researchers have developed a software platform lines from the bottom to the top. called Laban Editor,8 which uses Labanotation to drive a The structure of Labanotation is shown in Fig. 1. Among human model. The software named Life Forms designed the three bold lines, the middle one represents the human by Maranan et al.9 and the software called Laban Dancer spine while the left and right lines represent the left and developed by Wilke et al.10 can convert Labanotation to right sides of the human body, respectively. Around the a section of human body animation. three vertical lines, Labanotation generally contains 11 or The third direction is the automatic generation of 9 columns. The quantity of columns is determined by the Labanotation. As an interdiscipline of dance and computer requirement. From the middle to both sides, the 11 columns technology, the study is in the beginning stage. At present, are as follows: supporting movement column (left and right), the acquisition of Labanotation is mainly by manual leg movement column (left and right), torso movement col- drawing, but the speed of manual drawing is too slow to umn (left and right), arm movement column (left and right), write down the huge amount of folk dance in the world. hand movement column (left and right), and the rightmost Therefore, it is necessary to research automatic genera- one is the head movement column (or at the most left). tion. The first study on using motion capture data to gen- Notation symbols are written in a column corresponding erate Labanotation is by Hachimura and Nakamura at to the body parts. The horizontal lines in the notation are
Fig. 1 Distribution of Labanotation columns.
Journal of Electronic Imaging 011028-2 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance
levels. For each symbol, the length represents time span of the movement. The directions of nine kinds of horizontal symbols are as follows: place (original position), left, right, forward, back- ward, left forward, right forward, left backward, and right backward. The shape of the place symbol is a rectangle. The place symbol represents that the body part is in a natural state. The other eight horizontal symbols come into being by cutting off a part of the rectangle. The three kinds of vertical symbols include low, middle, and high levels. Filling the symbol with black indicates low level. Filling with a solid dot indicates middle level. And slashes indicate high level. Figure 3 is an example for human movements and the corresponding Labanotation symbols. The first line shows eight kinds of movements of the right arm, and the second line shows the same movements of the right leg. The move- ments are as follows: ① place, low; ② right, low; ③ right, middle; ④ right, high; ⑤ place, high; ⑥ forward, high; ⑦ forward, middle; ⑧ forward, low.
3.2 Human Motion Capture Motion capture can accurately note the movements of each body part in three-dimensional (3-D) space. The method of motion capture was first proposed by Johansson.15 In the beginning, the method was mainly for producing animation. Later, with the maturity of motion capture technology,16 it has been widely used in many fields, such as film and ani- mation, human–computer interaction, virtual reality, game Fig. 2 Twenty-seven basic symbols of Labanotation. production, sport analysis, and so on. section lines that represent the rhythm. The first section line 3.2.1 Two kinds of motion capture system is a double line, and the others are single lines. Depending on whether there is a need to install markers on Labanotation has a lot of symbols to describe all kinds of the human body, the motion capture system can be divided movements in detail. Nevertheless, the basic symbols are not into a marked one and an unmarked one. In our experiments, complex. There are 27 basic symbols corresponding to we use two kinds of systems. The marked motion capture 27 quantized spaces, including nine kinds of horizontal system we used is called OptiTrack,17 and the unmarked directions and three kinds of vertical levels, as shown in one is built by ourselves and utilizes the method in Fig. 2. For the basic symbols, the shape represents horizontal Refs. 18–20. We use the marked one as the main capture directions, and the filling of symbol represents vertical method and the unmarked one as a supplement.
Fig. 3 Several movements of arm and leg and the corresponding symbols of Labanotation.
Journal of Electronic Imaging 011028-3 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance
OptiTrack motion capture system is a mature industrial the movement of each body part, we convert motion capture product, which is sufficiently accurate for our experiments. data into coordinate data. The devices of OptiTrack are expensive, including a set of Second, we cut the human motion into basic units— infrared cameras and the corresponding equipment. element movements. We use three methods for motion seg- Unmarked motion capture system is not a mature product. mentation. Spatial clustering based on the Laban direction is The devices of the unmarked system are relatively simple. used for cutting the motion of the human body’s gravity The system only needs a computer and several consumer center. We cut the motion of the human upper limbs based level cameras.20 Thus, the unmarked system is really port- on velocity threshold, and probabilistic principal component able. The disadvantage is that the accuracy needs to be analysis (PPCA) segmentation is used for the motion of the improved. human lower limbs. Third, we analyze each element movement by the rules 3.2.2 Motion capture data of Labanotation. We find the symbol of Labanotation to describe each element movement and write the symbols at The commonly used motion capture data formats include the right places corresponding to each body part. Thus, the BVH, acclaim skeleton file/acclaim motion capture data, generation of Labanotation is complete. hierarchical translation rotation, and so on.21 In our experi- ment, we choose the BVH format because it is widely accepted. 4.1 Motion Capture Data Analysis and Format BVH is a text file with ASCII encoding format. In the file, Conversion there are two parts. The first part starts with the keyword 4.1.1 Motion capture data analysis hierarchy, which defines the joint structure of a human In a BVH file, each node in the human skeleton describes skeleton. The second part starts with the keyword motion, a body joint. However, different BVH files may use different which stores the movement information of all the joints. skeletons. There are two kinds of differences. One kind is the differ- 4 Automatic Generation of Labanotation ence in the number of nodes. For example, a BVH file uses Using motion capture data to generate Labanotation can be 26 nodes to represent a human body, while another file may summarized in three parts: analyzing and processing the use 23 nodes. The other kind is the difference in the meaning motion capture data, motion segmentation, and analyzing of nodes. For example, there are two skeletons with 26 the element movements. The flow chart is shown in Fig. 4. nodes, but only 24 nodes have the same meaning. One skel- First of all, we analyze the structure of body joints in the eton use the other two nodes to describe the details of the BVH file. Through the analysis of each joint, we confirm the hands, while the other skeleton use the two nodes for the relationships between body joints and body parts. To judge details of the feet.
Fig. 4 Flow chart of generating Labanotation based on motion capture data.
Journal of Electronic Imaging 011028-4 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance
Table 1 Unique identifiers of 26 nodes in the BVH file.
Unique identifiers
Nodes P (part) D (depth) C1 (children) C2 (count)
Hips Center 1 3 1
Spine Center 2 1 1
Spine1 Center 3 3 2
Neck Center 4 1 2
Head Center 5 1 2
End of head Center 6 0 2
Left shoulder Left 4 1 2
Left arm Left 5 1 2
Left fore arm Left 6 1 2 Fig. 5 Human body skeleton (the skeleton faces us, so the left and the right are opposite). Left hand Left 7 1 2
In this paper, we use the skeleton with 26 nodes in BVH End of hand (L) Left 8 0 2 files; the node structure is shown in Fig. 5 (the same structure used in our previous work).22 Right shoulder Right 4 1 2 To set a unique identifier for each node, we define a quad- hP; D; C ;C i Right arm Right 5 1 2 ruple 1 2 to represent a node. The meaning of the four elements is as follows. P (part) represents the location of the node in the skeleton. As shown in Fig. 5, the nodes can Right fore arm Right 6 1 2 be divided into three parts: left, right, and center, so the Right hand Right 7 1 2 enumeration values of P are {left, right, center}. D (depth) “ ” represents the layer of the node. Node root is in the first End of hand (R) Right 8 0 2 layer, in other words, DðrootÞ¼1. Assuming the layer sequence of a nonleaf-node is d, then the layer sequence Left up leg Left 2 1 1 d þ C of its child nodes are 1. 1 (children) represents the C ¼ C number of child nodes. For leaf nodes, 1 0. 2 (count) Left leg Left 3 1 1 represents the number of a kind of node. Assuming node N is N C on a node chain, on the chain, from root node to node , 2 Left foot Left 4 1 1 is the number of nodes containing three child nodes. J ;J ;:::J J Assuming that 0 1 n are nodes on the chain, 0 is Left toe base Left 5 1 1 the root node and Jn is the leaf node; thus, for a node Ji, ⩽i⩽n C ðJ Þ End of foot (L) Left 6 0 1 0 , the value of 2 i is Xi Right up leg Right 2 1 1 C ðJ Þ¼ uðJ Þ; EQ-TARGET;temp:intralink-;e001;63;260 2 i k (1) k¼1 Right leg Right 3 1 1 where Right foot Right 4 1 1 ;C ðJÞ¼ uðJÞ¼ 1 1 3 : Right toe base Right 5 1 1 EQ-TARGET;temp:intralink-;e002;63;199 ;C ðJÞ ≠ (2) 0 1 3 End of foot (R) Right 6 0 1 In our experiments, unique identifiers of the 26 nodes are shown in Table 1. coordinates in the Cartesian coordinate system. 3-D coordi- nates are suitable for the judgment of relative positions. 4.1.2 Format conversion The conversion process is as follows. Considering any In the BVH file, motion data are recorded by Euler angles. nonroot node J and its parent node Jp in the BVH file, Euler angles describe the orientation of each joint during the Euler angles describe the angle rotation around the Z, X, movements, but the relative positions between joints are not and Y axes for the nodes. After a movement, there exists intuitive. Consequently, we convert the Euler angles to 3-D an angular displacement of joint Jp. Assuming ðxc;yc;zcÞ
Journal of Electronic Imaging 011028-5 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance
α is the relative position between J and Jp after a movement Table 2 Relationships between the angle and the horizontal direc- ðx ;y ;z Þ ðx ;y ;z Þ tion of Labanotation. and 0 0 0 is the previous relative position, c c c can be calculated by the angular displacement matrix M and ðx ;y ;z Þ 0 0 0 : Horizontal direction ! ! Range of angle α of Labanotation x x c 0 y ¼ y : ð−22.5 deg; 22.5 deg Forward EQ-TARGET;temp:intralink-;e003;63;708 c M 0 (3) zc z 0 (22.5 deg, 67.5 deg] Left forward Assuming that the angular displacement matrices of all (67.5 deg, 112.5 deg] Left the precursor nodes of J are M1; M2;:::;Mr, where M1 is the matrix of the direct precursor node (parent node of (112.5 deg, 157.5 deg] Left back J) and Mr is the matrix of the root node, after a movement, J J the relative position between node and root node r is ð157.5 deg; 180 deg ∪ ½−180 deg; −157.5 degÞ Back ! ( " !#)! x x c 0 ½−157.5 deg; −112.5 degÞ Right back y ¼ ::: y : EQ-TARGET;temp:intralink-;e004;63;598 c Mr · Mr−1 · M2 · M1 · 0 z z ½−112 5 ; −67 5 Þ c 0 . deg . deg Right (4) ½−67.5 deg; −22.5 degÞ Right forward Thus, the relative position between any node and its pre- cursor node can be calculated. For a node J on a node chain, 0 ¼ðΔx; Δy; ΔzÞ¼ðx ;y ;z Þ − ðx ;y ;z Þ: J ;J ;:::;J J EQ-TARGET;temp:intralink-;e006;326;512m (6) all the precursor nodes are 1 2 r, where 1 is the 2 2 2 1 1 1 direct precursor node and Jr is the root node. Assuming the relative position of these nodes to their direct precursor Through motion vector m, we can confirm the movement O ;O ;:::;O direction, including a vertical component and a horizontal nodes are 0 1 r−1, notice that the root node has P component. We first analyze the vertical component by cal- no predecessor node. Using root to represent the position J J culating the angle between vector m and the y-axis. When of the root node r, the position of 0 in the world coordinate system can be calculated by the angle is less than a threshold (16 deg, determined by experience), it indicates that the gravity center only does ver- P ¼ P þ O þ ::: þ O þ O þ O : EQ-TARGET;temp:intralink-;e005;63;431 m root r−1 2 1 0 (5) tical motion. If the angle between vector and the positive direction of y-axis is between ð−16 deg; 16 degÞ, the move- Thus, to calculate the coordinates of a node, all the rel- ment direction of gravity center is upward. If the angle ative positions between every adjacent node on the chain are is between (ð164 deg; 180 deg ∪ ½−180 deg; −164 degÞ), accumulated and the coordinates of the root node are added. the movement direction is downward. When the angle between vector m and the y-axis is bigger than the threshold, 4.2 Motion Segmentation it indicates that the movement direction of the gravity center is horizontal. We then analyze the angle between the hori- In Labanotation, a symbol represents an element movement zontal component of vector m and the positive direction of a body part. To generate the dance notation automatically, of z-axis. The relationships between the angle and the hori- motion capture data should be cut into element movements. zontal direction are shown in Table 2. The human motion is composed of a series of element Using the interframe difference method, we can calculate movements. the movement directions between every two adjacent frames. In our experiments, we use three methods to cut human ’ Considering that there are almost no movements that take motion. The movements of the human body s gravity center less than 0.1 s, in our experiments, a suitable frame are segmented by the spatial clustering of Laban direction. frequency used in the interframe difference is 10. Through Upper limb movements are segmented by the method of the twice sampling of high frequency motion capture data, velocity threshold. Lower limb movements are segmented the frame frequency can be reduced to 10. For the consecu- by the PPCA. tive vectors that belong to the same direction, they can be classified as a class, as shown in Fig. 6. 4.2.1 Spatial clustering of Laban direction The movements of human body’s gravity center represent the tendency of human motion. In Labanotation, symbols of 4.2.2 Velocity threshold segmentation this kind of movement are written in the supporting column. The velocity threshold method is suitable for the segmenta- In Fig. 5, the root node is in the middle of skeleton, tion of simple human movements, such as upper limb so the movements of the root node can be roughly seen as movements. There are discontinuities between the simple the movements of the gravity center. From motion capture movements. For example, one arm moves to the left, then data, we can get the trajectory of the root node, and the to the right. When the arm turns from left to right, the moving movement directions of the node can be calculated through speed decreases first and then increases, so there exists a dis- t t changes of its coordinates. Assuming in 1 and 2 moments, continuity between the two movements. In other words, there ðx ;y ;z Þ ðx ;y ;z Þ the coordinates of root node are 1 1 1 and 2 2 2 , is a minimum velocity. respectively, the motion vector of the root node between two For a BVH file, setting a speed threshold not only can moments is remove the nodes that do not move but also can separate
Journal of Electronic Imaging 011028-6 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance
4.2.3 Motion segmentation based on the probabilistic principal component analysis The segmentation based on the PPCA is extended to the PCA segmentation method.23 In the PCA method, each frame of motion capture data can be seen as a point in the high dimensional space. In this paper, the human body skeleton in the BVH file contains 26 nodes. For the 26 nodes (see Fig. 5), the root node has six-dimensions, five leaf nodes have no dimensions, and each of the other 20 nodes has three dimensions, for a total of 66-dimensions. Thus, the motion capture data of the i‘th frame xi ði ¼ 1;2;:::;nÞ represents a point in the 66-dimensional space. We can calculate the center of xi: P n x x ¼ i¼1 i : EQ-TARGET;temp:intralink-;e007;326;605 ¯ n (7) The essence of the PCA segmentation method is to reduce the dimension. Motion capture data of one frame is a 66- dimensional matrix, and we do eigenvalue decomposition n Fig. 6 Spatial clustering of Laban direction. of the data with frames X ¼ T; EQ-TARGET;temp:intralink-;e008;326;523M U V (8) two adjacent different movements. When the speed of the arm is larger than the threshold, we know that the arm is where M, U, and V are matricesP with the size of n × 66, in the process of moving. When the speed grows from n × n,and66 × 66, respectively. is a covariance matrix less than the threshold to larger than it (red points in with the size of n × 66
Fig. 7), we know that a new movement is going to start. EQ-TARGET;temp:intralink-;e009;326;457 2 3 σ When the speed changes from larger than the threshold to 1 6 7 less than it (green point in Fig. 7), we know that a movement 6 σ 7 6 2 7 is finished. Between two adjacent movements, it generally X 6 7 ¼ 6 . . 7: indicates that there exists a transition when the speed 6 . 7 (9) 6 7 value is less than the threshold. Figure 7 is an example of 4 σ 5 our velocity threshold segmentation. 65 σ Velocity threshold segmentation is based on the kinematic 66 feature. This segmentation method is effective for move- P ments with obvious pauses; however, the method does not The diagonal elements of are arranged in descending work well when movements are smooth. order. Assuming that the first d columns are the principal
Fig. 7 Velocity threshold segmentation.
Journal of Electronic Imaging 011028-7 Jan∕Feb 2017 • Vol. 26(1) Wang et al.: Using automatic generation of Labanotation to protect folk dance component of the matrix, then the other 66-d nonprincipal components are useless in PCA. PPCA reuses the nonprincipal components. For the 66-d eigenvalues, the definition of mean square error is
1 X66 σ2 ¼ σ2: EQ-TARGET;temp:intralink-;e010;63;708 d i (10) 66- i¼dþ1
Motion data with n frames can be represented by a high dimensional Gaussian distribution. The mean value of Gaussian distribution is x¯ in Eq. (7), and the covariance matrix of the distribution is X 1 1 ͠ 2 ¼ ð T þ σ2 Þ¼ T; EQ-TARGET;temp:intralink-;e011;63;614C WW I V V (11) n − 1 n − 1 where