
TRACKING OF HUMAN BODY JOINTS USING ANTHROPOMETRIC CONSTRAINTS

A. Gritai and M. Shah

School of Electrical Engineering and Computer Science, University of Central Florida

ABSTRACT

We propose a novel approach for tracking human joints based on anthropometric constraints. A human is modeled as a pictorial structure consisting of body landmarks (joints) and corresponding links between them. Anthropometric constraints relate the landmarks of two persons if they are in the same posture. Given a test video, in which an actor performs the same action as in a model video, and the joint locations in the model video, anthropometric constraints are used to determine the epipolar lines along which the potential joint locations are searched in the test video. Edge templates around the joints and the related links are used to locate the joints in the test video. The performance of this method is demonstrated on several different human actions.

1. INTRODUCTION

Tracking of human joints is one of the important tasks in computer vision due to its vast area of applications. These applications include surveillance, human-computer interaction, action recognition, athlete performance analysis, etc. Joint tracking is a hard problem, since the appearance changes significantly due to non-rigid motion of body parts, viewpoint, lighting, etc.; therefore, appearance alone is not enough for successful tracking. We propose a novel approach for 2D joint tracking in a single uncalibrated camera using anthropometric constraints and known joint locations in a model video.

There has been a large amount of work related to this problem, and for a more detailed analysis we refer to the surveys by Gavrila and Moeslund [2, 5]. The advanced methods are based on sophisticated tracking algorithms. The Kalman filter has been used previously for human motion tracking [8, 7]; however, its use is limited by complex human dynamics. A strong alternative to the Kalman filter is the Condensation algorithm [4], employed by Ong in [6] and by Sidenbladh in [9]. In [1], Rehg modified the Condensation algorithm to overcome the problem of the large state space required for human motion tracking. However, even if a kinematic model is known, it is a non-trivial task to predict possible deviations from the model. Since humans perform actions with significant spatial and temporal variations that are hard to model, a tracker should take all these aspects into account. Compared to such complex methods, our approach does not require specific knowledge of human dynamics modeling. Given a model of an action from any viewpoint, this paper proposes a novel approach to track joints in a single uncalibrated camera. Our motivation was the recent successful application of anthropometric constraints in an action recognition framework [3]. Anthropometric constraints establish the relation between semantically corresponding anatomical landmarks of different people performing the same action, much as epipolar geometry governs the relation between corresponding points in different views of the same scene. Because of the nature of anthropometric constraints, the lines associated with landmarks can slightly deviate from true epipolar lines (due to errors in positioning the landmarks and the approximately linear relation between human bodies of different sizes); however, they still reasonably approximate the landmark locations. Anthropometric constraints and known image positions of joints in a model video can thus be combined into an alternative to complex methods. As with previous methods, the proposed approach has limitations, mainly due to view-geometric constraints; however, these limitations can be overcome without strong additional effort. The performance of the proposed approach is demonstrated on several actions.

2. A HUMAN MODEL

We consider a window around each joint for modeling. This window provides us with color and edge information. The detection and tracking of joints can be improved by imposing constraints on their mutual geometric coherence, i.e. the optimal joint locations must preserve the appearance of the links (body parts) connecting the joints. Image regions corresponding to links contain more essential information than windows around joints. Windows around joints and regions corresponding to links can be naturally embedded in a pictorial structure.
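The window-based appearance model described above can be illustrated with a short sketch. The edge detector, window size, and threshold below are arbitrary illustrative choices, not values specified in the paper:

```python
import numpy as np

def edge_map(image, thresh=0.2):
    """Binary edge map from normalized gradient magnitude (simple finite differences)."""
    gy, gx = np.gradient(image.astype(float))
    mag = np.hypot(gx, gy)
    if mag.max() > 0:
        mag = mag / mag.max()
    return mag > thresh

def joint_window(edges, center, half=8):
    """Extract a (2*half) x (2*half) edge-map window centered on a joint location."""
    r, c = center
    return edges[r - half:r + half, c - half:c + half]

# Synthetic test image: a bright square whose corner plays the role of a joint.
img = np.zeros((64, 64))
img[20:40, 20:40] = 1.0
e = edge_map(img)
w = joint_window(e, (20, 20))   # 16 x 16 window around the "joint"
```

In the same spirit, windows for all joints, together with the regions between connected joints, would be collected into the pictorial structure used for matching.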
We refer to an entity performing an action as an actor. A posture is the stance that an actor has at a certain time instant, not to be confused with the actor's pose, which refers to position and orientation (in the rigid sense). The posture of an actor is represented as a set of points in 3-space, i.e. a set of 4-vectors Q = {X^1, X^2, ..., X^n}, where X^k = (X^k, Y^k, Z^k, Λ)⊤ are the homogeneous coordinates of joint k. Each point represents the spatial coordinate of a joint, as shown in Fig. 1 a), and points are connected by links. Thus, a human body is represented as a pictorial structure defined as follows:

P = (V, S),

where V = {x^1, x^2, ..., x^n} corresponds to joints, and S = {L^(k,j) | k ≠ j; k, j ∈ V} corresponds to links. The imaged joint positions are represented by q = {x^1, x^2, ..., x^n}, where x^k = (a^k, b^k, λ)⊤. X^k and x^k are related by a 3 × 4 projection matrix C, i.e. x^k = C X^k. In [3], we proposed a conjecture stating that there exists an invertible, non-singular 4 × 4 matrix M relating the anatomical landmarks (Q and W) of two actors if they are in the same posture, s.t. X^k = M Y^k. As a consequence of this conjecture we have the following. First, if q and w describe the imaged positions of the joints of two actors, a fundamental matrix F can be uniquely associated with the pairs (x^k, y^k), i.e. x^k⊤ F y^k = 0, if the two actors are in the same posture; see Fig. 1 c)-d). Second, the fundamental matrix remains the same for all frames of the action, as long as the actors perform the same action.

Fig. 1. a) Point-based representation. b) Pictorial structure showing joints (head, chest, belly, shoulders, elbows, palms, knees and feet) and the corresponding links. c)-d) The fundamental matrix captures the relationship between joints of two different actors that are in the same posture, the variability in proportion, as well as the change in viewpoint. c) An actor in two frames of the model video. d) Another actor in the corresponding frames of the test video. The joint correspondences in the first frames of the model and test videos were used to compute the fundamental matrix. The image on the right in d) shows epipolar lines in different colors corresponding to the joints in the image on the right in c); the joints in the test video lie on the corresponding epipolar lines.

3. TRACKING

We assume that model videos corresponding to different actions are available in a database, and that the joint locations in each model video are known. The problem then is: given an unknown test video, we need to simultaneously decide which action it is and determine frame-to-frame joint correspondences. Suppose the actors in a test and model video perform the same action. The known image location of joint k in frame i of the model video is denoted by y_i^k, and the unknown image location of joint k in frame j of the test video is denoted by x_j^k. Assuming that the joint locations in each frame i of the model video and an initial correspondence among joints, w_1 = {y_1^1, y_1^2, ..., y_1^n} and q_1 = {x_1^1, x_1^2, ..., x_1^n}, between the first frames of the model and test videos are known, we propose an algorithm for tracking the joints in the test video. Since we know a sufficient number of joint correspondences between the two starting postures of both actors, the fundamental matrix F can be recovered. Thus, the kth joint location y_i^k in frame i of the model video corresponds to the epipolar line l_j^k passing through the kth joint location in some frame j of the test video. From the fundamental matrix F we can compute this epipolar line as l_j^k = F y_i^k. Thus, knowing F and the imaged joint locations in the model video, it is possible to predict the joint locations in each frame of the test video.

3.1. Locating joints in the test video

Assume that the joint correspondences between frame f_i in the model video and frame f_j in the test video are known, so that x_j^k⊤ F y_i^k = 0. We can impose constraints on the search space of joint locations in frame f_{j+1} of the test video by using the known joint locations in frames f_{i+m} of the model video, where m = 0, ..., T and T is the length of the temporal window. The appearance model of each joint is represented by a small (e.g. 16 × 16) window of the edge map centered around the joint location and its links in the first frame of the test video; see Fig. 2 b). In order to find the match for a given joint in the current frame, we search for the location that gives the minimum Hausdorff distance between the model template and the corresponding patches around the candidate locations in the search space. Let g(x_1^k) and g(L_1^(k,m)) respectively represent the edge maps around joint k and its link to joint m in frame 1 of the test video. Then the Hausdorff distance between the appearance model of joint k in frame 1 and its appearance at some possible location x̂_j^k in frame j is denoted by h(g(x̂_j^k), g(x_1^k)). Similarly, the Hausdorff distance between the appearance model of the link connecting joints k and m in frame 1 and its appearance in frame j is denoted by H(g(L̂_j^(k,m)), g(L_1^(k,m))). Thus, the correct location of joint k in frame j of the test video is determined as

x_j^k = argmin_{x̂_j^k ∈ G_j^k} ( h(g(x̂_j^k), g(x_1^k)) + Σ_{m ∈ N_k} H(g(L̂_j^(k,m)), g(L_1^(k,m))) ),

where G_j^k is the search space of joint k in frame j, and N_k is the set of joints connected to joint k. For each joint, the search space is bounded by four lines: two of them are the epipolar lines corresponding to the joint locations in f_i and f_{i+m}, and the other two constrain the deviation of the joint motion from the kinematic model. In Fig. 2 a), the location of the left foot in the previous frame of the test video is shown in yellow, and the correct location, which needs to be determined in the current frame, is shown in magenta. In this figure, the epipolar lines shown in red, cyan, green and orange respectively correspond to the joint locations in frames f_i, f_{i+1}, f_{i+2} and f_{i+3} of the model video. It is clear from the figure that none of the epipolar lines passes through the true joint location in frame f_{j+1}. Therefore, we search for the true location of this joint in the space limited by the epipolar lines and the white lines, which constrain the deviation of the joint motion from the model.

The distance from the correct location of joint k in frame j of the test video to the epipolar line l_i^k of the corresponding joint location in frame i of the model video is denoted by d_(j,i)^k. Similarly, the distance from the known location of joint k in frame i of the model video to the epipolar line l_j^k of the correct location of joint k in frame j of the test video is denoted by D_(i,j)^k. The correct correspondence between q_j in frame j of the test video and w_i in the window of T frames of the model video is determined as

min_{i ∈ T} Σ_{k=1}^{n} ( d_(j,i)^k + D_(i,j)^k ).   (1)

If there are several minima, then the correct posture-state is the one closest to state i.

Fig. 2. a) The joint location in frame f_j is shown in yellow, and the correct joint location in frame f_{j+1} is shown in magenta. The epipolar lines shown in red, cyan, green and orange respectively correspond to the joint locations in frames f_i, f_{i+1}, f_{i+2} and f_{i+3} of the model video. The white lines constrain the deviation of the joint motion from the kinematic model. b) The left image shows the edge image and the windows (edge maps) around joints. The right image shows the regions corresponding to the links connecting joints.
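The selection of the best model frame in Eq. (1) can be sketched as follows. This is a toy illustration, assuming a synthetic rank-2 matrix F and hand-picked homogeneous joint coordinates rather than quantities estimated from video:

```python
import numpy as np

def line_dist(x, l):
    """Perpendicular distance from homogeneous point x = (u, v, 1) to line l = (a, b, c)."""
    return abs(l @ (x / x[2])) / np.hypot(l[0], l[1])

def best_model_frame(F, q_j, model_frames):
    """Model frame index minimizing sum_k d_(j,i)^k + D_(i,j)^k, as in Eq. (1)."""
    costs = []
    for w_i in model_frames:
        c = 0.0
        for x, y in zip(q_j, w_i):
            c += line_dist(x, F @ y)    # d: test joint to the epipolar line of the model joint
            c += line_dist(y, F.T @ x)  # D: model joint to the epipolar line of the test joint
        costs.append(c)
    return int(np.argmin(costs))

# Toy rank-2 fundamental matrix and joint sets (homogeneous 3-vectors).
F = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 0.]])
q = [np.array([1., 2., 1.]), np.array([2., 1., 1.])]
frame_bad = [np.array([3., 1., 1.]), np.array([1., 3., 1.])]
frame_good = [np.array([2., 4., 1.]), np.array([4., 2., 1.])]  # satisfies x'Fy = 0 per joint
idx = best_model_frame(F, q, [frame_bad, frame_good])
```

Here every joint pair of frame_good satisfies the epipolar constraint, so its summed distances vanish and that frame is selected.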
4. EXPERIMENTAL RESULTS

The proposed approach was tested on several actions, including walking, sitting down, standing up, and standing up followed by sitting down. Due to space limitations, only the results of two experiments, walking and sitting down, are included here. In all experiments, the point correspondence between joints was manually initialized in the first two frames. The locations of all joints in the remaining frames were obtained automatically.

In the first experiment the model video was 125 frames long and contained one cycle of walking. The test video was 76 frames long. Fig. 3 a) shows the tracking results in frames 3, 8, 17, 35, 47 and 75 of the test video. In the second experiment the model video was 79 frames long and contained an example of the "sitting down" action. Three test videos were 180, 81 and 76 frames long. Fig. 3 b) shows the joint tracking results in frames 3, 30, 56, 80, 113, and 172 of the first test video, Fig. 3 c) shows the tracking results in frames 5, 19, 32, 43, 52, and 63 of the second test video, and Fig. 3 d) shows the tracking results in frames 4, 24, 36, 48, 60, and 72 of the third test video.
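Returning to the template-matching step of Section 3.1, the Hausdorff comparison of edge windows can be sketched as below. The directed (one-sided) Hausdorff distance, the synthetic edge map, and the three-candidate search are illustrative assumptions; the link terms H(·,·) of the full criterion are omitted for brevity:

```python
import numpy as np

def hausdorff(a_pts, b_pts):
    """Directed Hausdorff distance h(A, B) = max_{a in A} min_{b in B} ||a - b||."""
    if len(a_pts) == 0 or len(b_pts) == 0:
        return np.inf
    d = np.linalg.norm(a_pts[:, None, :] - b_pts[None, :, :], axis=2)
    return d.min(axis=1).max()

def edge_points(window):
    """Coordinates of edge pixels in a binary window, relative to the window."""
    return np.argwhere(window)

def best_location(edges, template, candidates, half=8):
    """Return the candidate center whose edge window best matches the template."""
    best, best_d = None, np.inf
    for (r, c) in candidates:
        win = edges[r - half:r + half, c - half:c + half]
        d = hausdorff(edge_points(template), edge_points(win))
        if d < best_d:
            best, best_d = (r, c), d
    return best

# Synthetic edge map: a cross centered at (30, 30); the template is taken there.
edges = np.zeros((64, 64), dtype=bool)
edges[30, 20:40] = True
edges[20:40, 30] = True
template = edges[22:38, 22:38]
found = best_location(edges, template, [(26, 26), (30, 30), (34, 34)])
```

In a tracker, the candidate centers would be the pixels of the bounded search space G_j^k rather than a fixed three-element list.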

5. CONCLUSION

This paper proposed a novel approach for the tracking of human body joints. Compared to previous approaches, our method employs a much simpler model of human dynamics. The simplicity of modeling human kinematics and the good performance of the tracking should make the proposed method a promising alternative to existing approaches. The performance of the tracking was demonstrated on several human actions.

6. REFERENCES

[1] T. Cham and J. Rehg, “Multiple hypothesis approach to figure tracking”, CVPR, 1999.

[2] D. Gavrila, "The visual analysis of human movement: A survey", CVIU, 1999.

[3] A. Gritai, Y. Sheikh and M. Shah, "On the use of anthropometry in the invariant analysis of human actions", ICPR, 2004.

[4] M. Isard and A. Blake, "Condensation - conditional density propagation for visual tracking", IJCV, 1998.

[5] T. Moeslund and E. Granum, "A survey of computer vision-based human motion capture", CVIU, 2001.

[6] E. Ong and S. Gong, "Tracking hybrid 2d-3d human models from multiple views", International Workshop on Modeling People at ICCV, 1999.

[7] A. Pentland and B. Horowitz, "Recovery of nonrigid motion and structure", PAMI, 13(7):730-742, 1991.

[8] N. Shimada, Y. Shirai, Y. Kuno and J. Miura, "Hand gesture estimation and model refinement using monocular camera - ambiguity limitation by inequality constraints", CVPR, 1996.

[9] H. Sidenbladh, F. De la Torre and M.J. Black, "A framework for modeling the appearance of 3D articulated figures", International Conference on Automatic Face and Gesture Recognition, 2000.

Fig. 3. a) The cyan marks show the joint locations in frames 3, 8, 17, 35, 47, and 75 of the test video. b) The cyan marks show the joint locations in frames 3, 30, 56, 80, 113, and 172 of the first test video. c) The cyan marks show the joint locations in frames 5, 19, 32, 43, 52, and 63 of the second test video. d) The cyan marks show the joint locations in frames 4, 24, 36, 48, 60, and 72 of the third test video.