Trajectory-Based Human Activity Recognition Using Hidden Conditional Random Fields

QINGBIN GAO, SHILIANG SUN

Department of Computer Science and Technology, East China Normal University, 500 Dongchuan Road, Shanghai 200241, P.R. China. E-MAIL: [email protected], [email protected]

Abstract:

This paper presents a new method for recognizing trajectory-based human activities. We use a discriminative latent variable model which considers that human trajectories are made up of specific motion regimes and that different activities have different switching patterns among the motion regimes. We model the trajectories using Hidden Conditional Random Fields (HCRFs), in which the motion regimes act as sub-structures. Experiments using both synthetic and real data sets demonstrate the superiority of our model in comparison with other methods, including Hidden Markov Models (HMMs) and Conditional Random Fields (CRFs).

Keywords:

Human activity recognition, trajectory classification, hidden conditional random field.

1. Introduction

The goal of human activity recognition (HAR) is to understand what people are doing from their position [1], figure [2], motion [3], or other spatiotemporal information derived from video sequences. With its potential for wide application, HAR has been actively investigated for decades. A focus of recent interest is the use of trajectory data to learn to recognize the behaviors in which a person is engaged over a long period of time [1] [4] [5]. An important application area in this domain is automatic surveillance of busy public places such as parks, airports, and campuses. In a surveillance setting, HAR aims at characterizing human behaviors and raising an alarm when illegal or abnormal activities are performed [6]. Other examples in this area include human-robot interaction [7] and intelligent environments [8].

The challenge of this research is how to recognize trajectories accurately. Methods based on Hidden Markov Models (HMMs) have been widely used for this problem [1] [5] [9]. In these methods, a restrictive and usually unrealistic assumption is made that observations are conditionally independent given the values of the latent variables. However, since human behaviors are complex, they are often more accurately modeled by incorporating long-range dependencies and allowing latent variables to depend on several local features.

Conditional Random Fields (CRFs) have proven to be a successful tool for labeling sequence data and have been used for tasks such as part-of-speech tagging and gesture recognition [3] [10]. CRFs condition on the observations without modeling them; they therefore avoid the independence assumption and can accommodate long-range dependencies among observations at different steps. However, CRFs assign a label to each observation in a sequence, and they neither capture hidden states nor directly provide a way to estimate the conditional probability of a class label for an entire sequence [11]. This makes them unfit for trajectory classification tasks.

From daily experience we know that complex human behaviors usually consist of simple motion regimes. For example, the behavior of a person "crossing a park" may be decomposed into "moving east first" and "then moving north". This observation underlies the use of models with hidden states, which have the capacity to capture intrinsic sub-structures. Hidden Conditional Random Fields (HCRFs) are discriminative latent variable models. HCRFs are based on CRFs and, in addition, use intermediate hidden variables to model the latent structure of the input domain [12]. They therefore avoid the independence assumption and are able to capture sub-structures.
In this paper, we propose a method for trajectory-based human activity recognition based on HCRFs. In our method, a set of latent variables is introduced to model the unobservable motion regimes, and different activities are recognized based on their different switching patterns. Our work is related to the switched dynamical HMM (SD-HMM) [1], but there are important differences. The most significant difference is that the SD-HMM is a generative model while ours is discriminative. Another difference is that in [1] different activities share identical motion regimes, while in our method the potentials of the motion regimes of different activities are parameterized differently. We examine our model on both synthetic and real data sets and compare its performance against HMM-based and CRF-based methods. Experimental results show the superiority of our model.

The remainder of this paper is organized as follows. Section 2 gives a brief introduction to CRFs. Section 3 presents the detailed model for human trajectories, including the parameter estimation and inference techniques. Section 4 reports experimental results on both synthetic and real data sets, including comparisons with two other methods, HMMs and CRFs. Finally, Section 5 gives conclusions and future research directions.

2. CRFs: A Nutshell

Before describing our model, we give a review of CRFs as proposed by [10], which will make HCRFs easier to understand.

CRFs are undirected graphical models (UGMs) which aim at mapping a sequence of observations X = {x_1, x_2, ..., x_m} to a sequence of labels Y = {y_1, y_2, ..., y_m}. Let G = (V, E) be a UGM and let Y be indexed by the vertices of G, Y = (Y_v)_{v \in V}. (i, j) \in E is an edge when there exists a link between nodes y_i and y_j. By defining different edge structures, CRFs can be applied to different tasks. If, when conditioned on X, each y_v obeys the Markov property with respect to G, then (Y, X) is a CRF.

To define the conditional distribution P(Y|X), we formulate it in terms of maximal cliques, which are the fully connected sub-graphs of the CRF. Let C be the set of all maximal cliques of G, and let the non-negative potential function of clique c be \phi_c(Y_c, X_c). The conditional distribution can then be written as

    P(Y|X) = \frac{1}{Z(X)} \prod_{c \in C} \phi_c(Y_c, X_c),    (1)

where Z(X) is a normalization factor which guarantees that the distribution sums to one. Specifically, Z(X) can be computed by summing over all possible configurations of Y:

    Z(X) = \sum_{Y} \prod_{c \in C} \phi_c(Y_c, X_c).    (2)

The potential function can be defined arbitrarily according to the task at hand. A widely used form is

    \phi_c(Y_c, X_c) = \exp\Big( \sum_{i \in V_c} \lambda_i f_{1,c}(y_i, X_c) + \sum_{(i,j) \in E_c} \beta_{i,j} f_{2,c}(y_i, y_j, X_c) \Big),    (3)

where f_{1,c} is a state feature function which models the observation-label correlations, f_{2,c} is a transition feature function which models the label-label dependencies, and \lambda_i and \beta_{i,j} are weights to be estimated.

To simplify the formula, we use a feature function F_c(Y_c, X_c) to represent either a state function or a transition function, and \lambda_i, \beta_{i,j} are represented by a set of weights w_c. The potential functions then become

    \phi_c(Y_c, X_c) = \exp(w_c F_c(Y_c, X_c)).    (4)

Substituting (4) into (1) and (2), the conditional distribution becomes

    P(Y|X) = \frac{1}{Z(X)} \prod_{c \in C} \exp(w_c F_c(Y_c, X_c)) = \frac{1}{Z(X)} \exp\Big( \sum_{c \in C} w_c F_c(Y_c, X_c) \Big),    (5)

where

    Z(X) = \sum_{Y} \exp\Big( \sum_{c \in C} w_c F_c(Y_c, X_c) \Big).    (6)
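As a concrete illustration of (5) and (6), the following minimal sketch (not part of the original paper) evaluates the conditional distribution of a short linear-chain CRF, where the maximal cliques are the edges (y_{t-1}, y_t). The toy feature functions, weights, and dimensions are assumptions made for this example, and the exhaustive sum over Y is used only to keep it transparent.

    # Illustrative sketch only (not from the paper): evaluates eqs. (5)-(6) for a short
    # linear-chain CRF, where the maximal cliques are the edges (y_{t-1}, y_t).
    # The toy feature functions and weights below are assumptions made for this example.
    import itertools
    import numpy as np

    def clique_score(y_prev, y_cur, x_cur, w_state, w_trans):
        # w_c . F_c(Y_c, X_c): a state (observation-label) term plus a transition (label-label) term
        return w_state[y_cur] @ x_cur + w_trans[y_prev, y_cur]

    def crf_conditional(Y, X, w_state, w_trans):
        n_labels = w_trans.shape[0]
        def total(labels):
            return sum(clique_score(labels[t - 1], labels[t], X[t], w_state, w_trans)
                       for t in range(1, len(X)))
        # eq. (6): the partition function sums over every possible label sequence
        Z = sum(np.exp(total(labels))
                for labels in itertools.product(range(n_labels), repeat=len(X)))
        return np.exp(total(Y)) / Z                      # eq. (5)

    X = np.random.randn(4, 2)                            # four observations with 2-D features
    w_state, w_trans = np.random.randn(3, 2), np.random.randn(3, 3)   # three labels
    print(crf_conditional((0, 1, 1, 2), X, w_state, w_trans))

The exhaustive enumeration is exponential in the sequence length and is used here only for clarity; the chain structure admits the usual forward-backward dynamic programming.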
3. Human Activity Recognition

3.1. Trajectory Model

Our task is to learn a mapping from a sequential trajectory X to a single activity label y. Formally, each trajectory X is a vector of observations, X = {x_1, x_2, ..., x_T}, and each observation x_t is the displacement of a person from time t-1 to time t (t = 1, ..., T). x_t is represented by a D-dimensional local feature \phi(x_t) \in R^D. Each y is one of the activity labels, represented by a set of constants. Assume we have Y activities; then y \in \{1, 2, ..., Y\}.

Based on the fully observable CRFs described in the previous section, we introduce a vector of latent variables H = {h_1, h_2, ..., h_T} to model the intermediate motion regimes contained in complex activities [12]. Each h_t is a member of a finite set H, which is the collection of all possible motion regimes. For example, if we assume that all trajectories are made up of five motion regimes, namely "stopped", "moving east", "moving west", "moving south", and "moving north", then H contains all five of them and each h_t corresponds to one of them.

Let us still consider the UGM G = (V, E). In an HCRF, the latent variables H = {h_1, h_2, ..., h_T} correspond to vertices in the graph, and (i, j) \in E is an edge when there exists a link between variables h_i and h_j. It is worth noticing that the presence of an edge between two vertices in a UGM implies a dependency between the random variables represented by these vertices. By defining different edge structures, HCRFs can be applied to different domains. Our activity recognition task is intrinsically a temporal classification problem. Based on the general HCRF and considering the specific characteristics of our task, we define a linear-chain structure in order to capture the temporal dynamics (see Figure 1). In this structure, the maximal cliques are the pairs of neighboring states (h_{t-1}, h_t). The connectivity between each latent state and the observations, which introduces long-range dependencies among observations, is unrestricted. We introduce a window size w to define this connectivity: w = 0 indicates that the current state depends only on the current observation, while w > 0 indicates that the neighboring observations from t - w to t + w are also used.

Figure 1. The chain-structure HCRF for trajectory recognition: the label node y (activity) is connected to the latent variables h_1, ..., h_T (motion regimes), which are in turn connected to the observations x_1, ..., x_T (displacements).

Given the above definitions, we first model human trajectories in a CRF way as

    P(y, H | X; \theta) = \frac{1}{Z(X; \theta)} \exp\Big( \sum_{t=1}^{T} F(y, h_{t-1}, h_t, X; \theta) \Big).    (7)

Marginalizing out the latent variables H = {h_1, h_2, ..., h_T} yields the following HCRF form:

    P(y | X; \theta) = \sum_{H} P(y, H | X; \theta) = \frac{1}{Z(X; \theta)} \sum_{H} \exp\Big( \sum_{t=1}^{T} F(y, h_{t-1}, h_t, X; \theta) \Big),    (8)

where the normalization factor Z(X; \theta) takes the form

    Z(X; \theta) = \sum_{y, H} \exp\Big( \sum_{t=1}^{T} F(y, h_{t-1}, h_t, X; \theta) \Big).    (9)

We define the feature function F as follows:

    F(y, h_{t-1}, h_t, X; \theta) = \sum_{a \in A} \theta_a f_a(y, h_{t-1}, h_t, X) + \sum_{b \in B} \theta_b f_b(y, h_t, X),    (10)

where A is the set of edge features and B is the set of node features, f_a is a predefined transition function which depends on a pair of latent variables, and f_b is a predefined state function which depends on a single latent variable in the model. \theta = \{\theta_a, \theta_b\} are the parameters to be estimated from training data.
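The sums over H in (8) and (9) need not be computed by enumeration: with the linear-chain structure they can be evaluated by a forward recursion over the latent states. The sketch below is our illustration, not the authors' code; the parameter shapes are assumptions. It computes log \sum_H \exp(\sum_t F(y, h_{t-1}, h_t, X; \theta)) for one class y.

    # Illustrative numpy sketch (not the authors' code) of the forward recursion that
    # evaluates log sum_H exp( sum_t F(y, h_{t-1}, h_t, X; theta) ) for one class y.
    # The parameter shapes below are assumptions made for this example.
    import numpy as np
    from scipy.special import logsumexp

    def class_log_score(y, feats, theta_node, theta_trans):
        """feats: (T, D) windowed observation features phi(x_t);
        theta_node: (n_classes, n_states, D) state weights (the theta_b f_b terms);
        theta_trans: (n_classes, n_states, n_states) transition weights (the theta_a f_a terms)."""
        node = feats @ theta_node[y].T                   # (T, n_states) node scores
        alpha = node[0]                                  # log-domain forward messages over h_1
        for t in range(1, feats.shape[0]):
            # alpha_new[j] = log sum_i exp(alpha[i] + trans[i, j]) + node[t, j]
            alpha = logsumexp(alpha[:, None] + theta_trans[y], axis=0) + node[t]
        return logsumexp(alpha)                          # marginalize the last latent state

The conditional probability (8) is then this score minus the log-sum-exp of the scores of all classes, which equals log Z(X; \theta) in (9); with window size w, each row of feats would simply stack the observations from t - w to t + w.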
3.2. Parameter Estimation

Our training data set consists of N labeled trajectories, T = {(X_1, y_1), (X_2, y_2), ..., (X_N, y_N)}. The parameters can be obtained by optimizing the conditional log-likelihood of the training data:

    L(\theta) = \sum_{i=1}^{N} L_i(\theta) = \sum_{i=1}^{N} \log P(y_i | X_i; \theta).    (11)

In practice, we regularize the problem by optimizing a penalized likelihood L(\theta) + R(\theta), where R(\theta) is the log of a Gaussian prior with variance \sigma^2, P(\theta) \sim \exp(-\frac{1}{2\sigma^2} \|\theta\|^2) [13].

Likelihood maximization is an optimization task which can be solved using gradient ascent methods. In this paper, we solve it using a limited-memory variable-metric gradient ascent method (BFGS) [14].

3.3. Classification

For testing, given a new observed trajectory X, we want to classify it into the activity y^* \in \mathcal{Y} which maximizes the conditional probability:

    y^* = \arg\max_{y \in \mathcal{Y}} P(y | X; \theta^*),    (12)

where the values of \theta^* are learned from the training data. Since HCRFs can be considered as UGMs, the inference tasks can be solved using belief propagation [15].
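To make Sections 3.2 and 3.3 concrete, here is a self-contained toy sketch of penalized-likelihood training followed by the classification rule (12). It is our illustration rather than the authors' implementation: the parameter layout, toy sizes, and random data are assumptions, and the optimizer approximates gradients numerically, whereas an efficient implementation would supply analytic gradients as in [14].

    # Self-contained toy sketch (our illustration, not the authors' implementation) of
    # penalized-likelihood training (Section 3.2) and the classification rule (12).
    import numpy as np
    from scipy.optimize import minimize
    from scipy.special import logsumexp

    N_CLASSES, N_STATES, D = 2, 5, 2                    # activities, motion regimes, feature dim

    def unpack(theta):
        k = N_CLASSES * N_STATES * D
        return (theta[:k].reshape(N_CLASSES, N_STATES, D),
                theta[k:].reshape(N_CLASSES, N_STATES, N_STATES))

    def log_conditional(theta, X, y):
        """log P(y | X; theta), with the latent states marginalized by a forward pass."""
        node_w, trans_w = unpack(theta)
        scores = []
        for c in range(N_CLASSES):
            node = X @ node_w[c].T
            alpha = node[0]
            for t in range(1, len(X)):
                alpha = logsumexp(alpha[:, None] + trans_w[c], axis=0) + node[t]
            scores.append(logsumexp(alpha))
        scores = np.array(scores)
        return scores[y] - logsumexp(scores)            # eq. (8) with normalization (9)

    def neg_penalized_loglik(theta, data, sigma2=1.0):
        ll = sum(log_conditional(theta, X, y) for X, y in data)        # eq. (11)
        return -(ll - theta @ theta / (2.0 * sigma2))                  # minus Gaussian log-prior

    data = [(np.random.randn(20, D), 0), (np.random.randn(20, D), 1)]  # toy labeled trajectories
    theta0 = np.zeros(N_CLASSES * N_STATES * (D + N_STATES))
    res = minimize(neg_penalized_loglik, theta0, args=(data,), method="L-BFGS-B")
    y_star = max(range(N_CLASSES),
                 key=lambda c: log_conditional(res.x, data[0][0], c))  # eq. (12)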

4. Experiments

We run a variety of experiments using both synthetic and real data. To evaluate the performance of our model, comparisons with other approaches are also given.

Figure 2. Two synthetic activities sharing the same motion regimes but with different switching patterns: training data (left), testing data (right).


Figure 4. Examples of the seven activities defined for the campus scenario: (a) entering building; (b) leaving building; (c) walking along; (d) crossing park up; (e) crossing park down; (f) passing through; (g) wandering.

Figure 3. Examples of the four activities defined for the shopping scenario: (a) entering; (b) leaving; (c) passing; (d) browsing.

4.1. Synthetic Data

We first run a simple synthetic experiment in an idealized scenario, which aims at demonstrating the effectiveness of our model. In this experiment, we consider the two activities shown in Figure 2. The two activities, depicted in red and green, share two motion regimes: moving horizontally and moving vertically. The mean of the horizontal displacements is T_1 = [0.02, 0]^T and the mean of the vertical displacements is T_2 = [0, 0.02]^T; the corresponding covariances are Q_1 = Q_2 = 10^{-3} I. The only difference between the two activities lies in the switching patterns: the red activity has a low probability of switching between different motion regimes, while the green activity has identical switching probabilities at all instants. The transition matrices for the red and green activities are, respectively,

    B_1 = \begin{pmatrix} 0.95 & 0.05 \\ 0.05 & 0.95 \end{pmatrix}, \qquad B_2 = \begin{pmatrix} 0.5 & 0.5 \\ 0.5 & 0.5 \end{pmatrix}.

Given the above parameters, we generate 100 training trajectories and 100 testing trajectories using HMMs. The reason we use HMMs to generate the synthetic data is that discriminative models condition on observations without modeling them; without knowledge of the observations, they are unable to generate data.
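For reference, a minimal sketch of how such trajectories can be sampled is given below. This is our reading of the generative setup above, not the original data generator; the trajectory length, the uniform initial regime, and the 50/50 split per activity are assumptions.

    # Sketch only: sample two-regime synthetic trajectories from switching HMMs with
    # the stated parameters (Gaussian displacements; transition matrices B1 and B2).
    import numpy as np

    MEANS = np.array([[0.02, 0.0],                      # regime 0: moving horizontally
                      [0.0, 0.02]])                     # regime 1: moving vertically
    COV = 1e-3 * np.eye(2)                              # Q1 = Q2 = 10^-3 I
    B = {"red":   np.array([[0.95, 0.05], [0.05, 0.95]]),
         "green": np.array([[0.5, 0.5], [0.5, 0.5]])}
    rng = np.random.default_rng(0)

    def sample_displacements(activity, T=100):          # T is an assumed trajectory length
        regime = rng.integers(2)                        # assumed uniform initial regime
        steps = []
        for _ in range(T):
            steps.append(rng.multivariate_normal(MEANS[regime], COV))
            regime = rng.choice(2, p=B[activity][regime])
        return np.array(steps)                          # the displacements x_t used as inputs

    train = [(sample_displacements(a), a) for a in ("red", "green") for _ in range(50)]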

4.2. Experimental Results on Synthetic Data

From the way this synthetic data set is generated, it is clear that each frame in a trajectory corresponds to exactly one motion regime. Thus we only run the experiment using an HCRF with window size w = 0. The classification accuracy obtained on the testing data is 100%, suggesting that our model has the capacity to recognize trajectories.

4.3. Real Data

We consider two scenarios in our experiments with real data: a shopping center and a university campus. In the shopping center scenario four human activities have been predefined, while in the campus scenario seven human activities have been predefined. Figure 3 shows examples of trajectories in the shopping center scenario and Figure 4 shows examples of trajectories in the campus scenario.

A notable point in the experiments with real data is that, since formula (8) has to marginalize out the latent variables, our model works with a finite number of motion regimes. Estimating the number of motion regimes is a model selection task, and many methods already exist for it [16]. Since model selection is not the focus of this paper, we adopt the model selection result of [1]. Thus, for the shopping data we define five motion regimes: "stopped", "moving north", "moving south", "moving east", and "moving west". For the campus data we define nine motion regimes: "stopped", "moving north", "moving north-east", "moving east", "moving south-east", "moving south", "moving south-west", "moving west", and "moving north-west".

Another notable point is that, in the original data sets, each element of a trajectory sequence is the position of a person. In order to use displacement features, we perform some preprocessing: representing an original trajectory by P = {p_0, p_1, ..., p_T}, our input trajectory becomes X = {x_1, x_2, ..., x_T}, where x_t = p_t - p_{t-1} (t = 1, ..., T).

After preprocessing, we obtain 53 available trajectories in the shopping scenario and 143 available trajectories in the campus scenario.
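A one-line sketch of the preprocessing just described: positions p_0, ..., p_T are converted to displacement features x_t = p_t - p_{t-1}. The positions array below is hypothetical; the real shopping and campus data sets are not reproduced here.

    # Minimal sketch of the position-to-displacement preprocessing (hypothetical positions).
    import numpy as np

    positions = np.array([[0.0, 0.0], [0.1, 0.0], [0.2, 0.1], [0.2, 0.3]])   # p_0 ... p_T
    displacements = np.diff(positions, axis=0)                               # x_1 ... x_T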
Table 1. Comparison of recognition performance for the shopping scenario.

    Methods     1st split procedure   2nd split procedure
    HMM         70.73%                70.73%
    CRF w=0     12.20%                12.77%
    CRF w=1     17.07%                10.67%
    HCRF w=0    85.37%                85.11%
    HCRF w=1    80.49%                76.60%
    HCRF w=2    80.49%                80.85%
    HCRF w=3    75.61%                78.72%

Table 2. Comparison of recognition performance for the campus scenario.

    Methods     1st split procedure   2nd split procedure
    HMM         82.61%                87.60%
    CRF w=0     10.14%                9.302%
    CRF w=1     13.04%                10.85%
    HCRF w=0    88.41%                92.25%
    HCRF w=1    91.30%                93.02%
    HCRF w=2    78.26%                89.92%
    HCRF w=3    68.12%                87.60%

4.4. Experimental Results on Real Data

We consider two different procedures for splitting the available data into training and testing sets: 1) a single training/testing split; 2) a complete p-fold cross validation. For the shopping scenario, the first procedure picks three samples of each activity to form the training set, and the remaining samples form the testing set. For the campus scenario, the first procedure splits all available data into two disjoint sets, each containing 50% of the data. The second procedure performs a complete ten-fold cross validation for both scenarios.

On the same data sets, we evaluate our model with varying levels of long-range dependencies (i.e., with different window sizes) and compare its performance with HMM and CRF models.

In our HMM experiments, we consider the switched dynamical HMM (SD-HMM) proposed in [1], which is essentially a two-layer hierarchical HMM: the lower layer consists of a bank of Gaussians representing the motion regimes, and the higher layer models the switching among the motion regimes.

Though CRFs do not directly provide a way to map an entire sequence to a class label, with some tricks they can still be used. In our CRF experiments, each input trajectory sequence X = {x_1, x_2, ..., x_T} is associated with a sequence of labels Y = {y_1, y_2, ..., y_T}. In the training data, the label sequences are generated by repeating the target activity label y T times. For a testing trajectory, the final activity label assigned is the label which appears most frequently in the decoded sequence. This idea is borrowed from the gesture recognition literature [12].

Table 1 shows the results for the shopping experiments and Table 2 shows the results for the campus experiments. As we can see, our approach performs better than the HMM-based and CRF-based methods.

From the results in Table 1, we can see that our approach performs best at window size 0. Although this suggests that the independence assumption holds for this data set, our model still performs better than HMMs. From the results in Table 2, we can see that increasing the window size from 0 to 1 improves the performance of our model, which implies that incorporating an appropriate degree of long-range dependency is helpful. However, we also see that further increasing the window size does not improve the performance.

It is not surprising that CRFs achieve poor results: we try to recognize human activities by modeling the intermediate motion regimes, but CRFs have no capacity to capture such sub-structures.

5. Conclusions

In this work, we have presented a method for recognizing trajectory-based human activities. Our method models trajectories using HCRFs, with shared motion regimes acting as latent variables, so that different trajectories are recognized based on their different switching patterns. To validate the model, we ran a variety of experiments using both synthetic and real data and compared the performance with other methods. Experimental results have shown that our method outperforms both HMM-based methods and CRF-based methods.

For future research, the proposed method can be combined with model selection methods, so that the number of latent variables can be obtained automatically and the model becomes more flexible. Another possible direction is extending the proposed method to infinite Gaussian mixture models [17]; in that case, techniques of variational inference will play an important role.

Acknowledgements

This work is supported in part by the National Natural Science Foundation of China under Project 61075005, and the Fundamental Research Funds for the Central Universities.

References

[1] J. C. Nascimento, A. T. Figueiredo and J. S. Marques, "Trajectory classification using switched dynamical hidden Markov models", IEEE Transactions on Image Processing, pp. 1338-1348, 2010.
[2] N. Vaswani, A. R. Chowdhury and R. Chellappa, "Shape activity: A continuous state HMM for moving/deforming shapes with application to abnormal activity detection", IEEE Transactions on Image Processing, pp. 1603-1616, 2005.
[3] C. Sminchisescu, A. Kanaujia, Z. Li and D. Metaxas, "Conditional models for contextual human motion recognition", Proceedings of the 10th IEEE International Conference on Computer Vision, pp. 1808-1815, 2005.
[4] L. Liao, D. Fox and H. Kautz, "Hierarchical conditional random fields for GPS-based activity recognition", Springer Tracts in Advanced Robotics, pp. 487-506, 2007.
[5] F. Bashir, A. Khokhar and D. Schonfeld, "Object trajectory-based activity classification and recognition using hidden Markov models", IEEE Transactions on Image Processing, pp. 1912-1919, 2007.
[6] W. Hu, T. Tan, L. Wang and S. Maybank, "A survey on visual surveillance of object motion and behaviors", IEEE Transactions on Systems, Man, and Cybernetics, Part C: Applications and Reviews, pp. 334-352, 2004.
[7] M. Bennewitz, W. Burgard, G. Cielniak and S. Thrun, "Learning motion patterns of people for compliant robot motion", International Journal of Robotics Research, pp. 31-48, 2005.
[8] B. Brumitt, B. Meyers, J. Krumm, A. Kern and S. Shafer, "EasyLiving: Technologies for intelligent environments", Handheld and Ubiquitous Computing, pp. 97-119, 2000.
[9] H. H. Bui, D. Q. Phung and S. Venkatesh, "Hierarchical hidden Markov models with general state hierarchy", Proceedings of the 19th National Conference on Artificial Intelligence, pp. 324-329, 2004.
[10] J. Lafferty, A. McCallum and F. Pereira, "Conditional random fields: Probabilistic models for segmenting and labeling sequence data", Proceedings of the 18th International Conference on Machine Learning, pp. 282-289, 2001.
[11] S. B. Wang, A. Quattoni, L. P. Morency and D. Demirdjian, "Hidden conditional random fields for gesture recognition", Proceedings of the 19th IEEE Conference on Computer Vision and Pattern Recognition, pp. 1521-1527, 2006.
[12] A. Quattoni, S. Wang, L. P. Morency, M. Collins and T. Darrell, "Hidden conditional random fields", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 1848-1853, 2007.
[13] L. P. Morency, A. Quattoni and T. Darrell, "Latent-dynamic discriminative models for continuous gesture recognition", Proceedings of the 20th IEEE Conference on Computer Vision and Pattern Recognition, pp. 1-8, 2007.
[14] A. McCallum, "Efficiently inducing features of conditional random fields", Proceedings of the 19th Conference on Uncertainty in Artificial Intelligence, pp. 403-410, 2003.
[15] C. M. Bishop, Pattern Recognition and Machine Learning, Springer, New York, USA, 2006.
[16] A. T. Figueiredo and A. K. Jain, "Unsupervised learning of finite mixture models", IEEE Transactions on Pattern Analysis and Machine Intelligence, pp. 381-396, 2002.
[17] S. Sun and X. Xu, "Variational inference for infinite mixtures of Gaussian processes with applications to traffic flow prediction", IEEE Transactions on Intelligent Transportation Systems, pp. 466-475, 2011.