A CLUSTER SPECIFIC LATENT DIRICHLET ALLOCATION MODEL FOR TRAJECTORY CLUSTERING IN CROWDED VIDEOS

Jialing Zou, Yanting Cui, Fang Wan, Qixiang Ye, Jianbin Jiao
University of Chinese Academy of Sciences, Beijing 101408, China. [email protected]

ABSTRACT

Trajectory analysis in crowded video scenes is challenging because the trajectories obtained by existing tracking algorithms are often fragmented. In this paper, we propose a new approach to trajectory inference and clustering on fragmented trajectories by exploring a cluster specific Latent Dirichlet Allocation (CLDA) model. LDA models are widely used to learn middle level trajectory features and perform trajectory inference; however, they often require scene priors in the learning or inference process. Our cluster specific LDA model addresses this issue by using manifold based clustering as initialization and iterative statistical inference as optimization. The output middle level features of CLDA are input to a clustering algorithm to obtain trajectory clusters. Experiments on a public dataset show the effectiveness of our approach.

Index Terms— Latent Dirichlet Allocation, Manifold, Trajectory clustering

Fig. 1. (a) Illustration of LDA based topic extraction in the original feature space (top) and in a manifold embedding space (bottom). (b) Illustration of the correspondence between trajectory clusters and the manifold embedding.

1. INTRODUCTION

Trajectory clustering is a video analysis task whose goal is to assign individual trajectories common cluster labels, with applications in activity surveillance, traffic flow estimation and emergency response [1], [2].

In straightforward trajectory clustering approaches, a set of features, i.e. coordinates, velocities and/or geometrical shapes and scene specific information, is extracted to represent trajectories, and then unsupervised learning methods are used to classify these features [1], [2], [3]. However, in videos of crowded scenes it is often difficult to obtain complete trajectories with off-the-shelf tracking algorithms [4]. In most cases, fragmented trajectories of different lengths are obtained, which are difficult to align and represent with low level features.

Recently, middle level feature based trajectory clustering approaches have attracted attention. Middle level features are usually observed as dominant paths of moving objects, which can bridge the fragmented trajectories in low level feature space with their clusters [5]. Trajectory middle level features can be learned with non-Bayesian approaches, for example, similarity clustering [6] or dimension reduction [7]. In [6], Wang et al. propose two Euclidean similarity measures and cluster trajectories with the defined measures. In [7], Hu et al. introduce a spectral clustering method in which trajectories are projected to a lower-dimensional space through eigenvalue factorization and are clustered in that sub-space with a k-means algorithm.

Middle level features can also be learned with hierarchical latent variable Bayesian models, such as latent Dirichlet allocation (LDA) [8] models, which have been widely explored in classification and clustering tasks [9], [10], [11], [12], [13], [14]. These models are adopted from natural language processing and are well known as "topic models". In LDA related models, trajectories are treated as documents and observations of trajectories are treated as visual words; the learned topics correspond to the middle level features of trajectories.

In [12], Li et al. proposed a theme model that imposes class supervision on LDA and enables different classes to have different topics. In [13], Daniel et al. proposed a labeled LDA that assigns each class a Dirichlet prior. In [11], Wang et al. proposed a topic-supervised LDA (ts-LDA) which enables different classes to share different topics. In [14], Wang et al. proposed a semi-latent LDA in which both the latent labels and the words are visible in the training process. In [10], Wang et al. proposed a mixture of latent Dirichlet allocation (LDA) models for trajectory clustering. In [9], Zhou et al. proposed a Random Field Topic (RFT) model for trajectory clustering by integrating scene priors.

Despite the effectiveness of the above LDA models, most of them [10, 11, 12, 13] ignore the distribution of the data, so they often require a complex parameter estimation and variable inference procedure. Some of them [9] use scene priors to improve performance, but can only be used in situations where such priors are available. Others [10, 12] use cluster initialization to replace scene priors; however, a good initialization in a high dimensional feature space is often difficult.

We propose a cluster-specific LDA (CLDA) model with cluster initialization in a manifold embedding space and iterative optimization with Bayesian inference. As shown in Fig. 1(a), the cluster initialization enables our approach to create topics (middle level features) that effectively reflect the data distribution and cluster information. After the iterative optimization, the CLDA model creates discriminative topics for clustering without using any scene prior. Here, "discriminative" implies that different sets of trajectories form different clusters in the manifold space, as shown in Fig. 1(b).

The remainder of the paper is organized as follows. The CLDA model is described in Section 2. The trajectory clustering approach is presented in Section 3. Experiments are presented in Section 4, and we conclude the paper in Section 5.

2. CLUSTER-SPECIFIC LATENT DIRICHLET ALLOCATION

This section presents the CLDA model and the middle level feature (topic) extraction via CLDA parameter estimation.

Fig. 2. LDA and CLDA models. (a) Graphical representation of LDA; (b) graphical representation of CLDA; (c) graphical representation of the approximate distribution of CLDA.

2.1. Latent Dirichlet Allocation (LDA)

To make the paper self-contained, we first review LDA. Fig. 2(a) shows the graphical representation of LDA [8]. Four important notions in LDA are corpus, document, topic and word, which in our case correspond to path, trajectory, topic and visual word, respectively. M is the number of documents and N_j is the number of words of trajectory j. θ is assumed to follow a Dirichlet distribution with parameter α. z_ji is a latent variable assumed to follow a parameterized multinomial distribution Mult(θ_j). x denotes the words. β is a hyper-parameter corresponding to the middle level features.
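To make the corpus/document/topic/word analogy concrete, the sketch below quantizes trajectory points into a visual-word codebook and fits a standard LDA model with scikit-learn. This is an illustration only, not the authors' implementation: the grid codebook, the topic count, the synthetic trajectories and all names (GRID, N_TOPICS, trajectory_to_counts) are assumptions, and for brevity it quantizes positions only, whereas the visual words used in this paper also encode velocities.

```python
# Illustrative sketch only: maps the corpus/document/topic/word analogy of
# Section 2.1 onto trajectories using a standard LDA implementation.
# Grid size, topic count, and all variable names are assumptions.
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

GRID = 20          # quantize the image plane into a GRID x GRID codebook
N_TOPICS = 10      # number of middle level features (topics), K

def trajectory_to_counts(points, frame_w, frame_h, grid=GRID):
    """Turn one trajectory (a 'document') into a bag-of-visual-words count
    vector: each observed (x, y) is quantized to a grid cell index (a 'word')."""
    counts = np.zeros(grid * grid, dtype=np.int64)
    for x, y in points:
        cx = min(int(x / frame_w * grid), grid - 1)
        cy = min(int(y / frame_h * grid), grid - 1)
        counts[cy * grid + cx] += 1
    return counts

# trajectories: list of (x, y) point arrays from a tracker (placeholder data here)
rng = np.random.default_rng(0)
trajectories = [rng.uniform(0, 1, size=(rng.integers(5, 40), 2)) for _ in range(200)]
X = np.stack([trajectory_to_counts(t, 1.0, 1.0) for t in trajectories])

lda = LatentDirichletAllocation(n_components=N_TOPICS, random_state=0)
theta = lda.fit_transform(X)   # per-trajectory topic mixtures (middle level features)
beta = lda.components_         # per-topic word weights (dominant paths on the grid)
```

Each row of theta is a per-trajectory topic mixture, which plays the role of the middle level feature discussed above.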
2.2. Cluster-specific Dirichlet Allocation (CLDA)

Fig. 2(b) shows the graphical representation of CLDA. Observed visual words (low level features) and cluster labels are the inputs of CLDA. The visual words, which are the coordinates and velocities of tracked objects, are described in Section 3. The joint probability of the model is given by Eq. (1); compared with LDA, the probability of θ is changed to p(θ|c, α) = ∏_{j=1}^{C} Dir(θ|α_j)^{δ(c,j)}, where Dir(·) denotes the Dirichlet distribution with parameter α_j.

p(x, z, θ, c | α, β, η) = p(c|η) p(θ|α, c) · ∏_{j=1}^{M} p(z_j|θ_j) p(x_j|z_j, β).    (1)

Other terms are consistent with those in [8]. Similar to [12], we set the uniform distribution p(c) = 1/C and then leave out the estimation of η. The label c is updated by arg max_c p(x|c, α, β), where x denotes the set of words of a trajectory. The posterior probability of x is given in Eq. (2).

p(x|α, β, c) = ∫ p(θ|α, c) ∏_j Σ_{z_j} p(z_j|θ_j) p(x_j|z_j, β) dθ.    (2)

We use the variational inference algorithm in [8] to perform variable inference and parameter estimation. Fig. 2(c) is the graphical representation of the approximate distribution of CLDA. Therefore, we have

log p(x|α, β, c) = L(γ^c, φ^c; α^c, β) + KL( q(θ, z|γ^c, φ^c) || p(θ, z|x, α^c, β) ).    (3)

We iteratively maximize the term L(·) instead of log p(x|α, β, c), which minimizes the difference between the distributions in Fig. 2(b) and Fig. 2(c). The details of the computation can be found in [8]. In Eq. (4) and Eq. (5), we present the two terms related to the middle level features:

φ^c_{ki} ∝ β exp[ Ψ(γ^c_k) − Ψ(Σ_{k=1}^{K} γ^c_k) ]    (4)

β_k ∝ Σ_i φ^c_{k,i} n_i    (5)

where φ^c_{ki} denotes the probability that the ith word belongs to the kth topic, Ψ(·) is the digamma function, and n_i is the count of the ith word in the codebook. β_k is the kth topic's feature representation with respect to the codebook.
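The following is a minimal sketch of how the updates in Eq. (4) and Eq. (5) and the label rule arg max_c p(x|c, α, β) could be organized for one trajectory, assuming cluster-specific Dirichlet priors α_c and a shared topic-word matrix β. It is not the authors' implementation: the fixed-point schedule, the γ update, and the likelihood proxy used to score clusters (in place of the exact marginal p(x|c, α, β)) are assumptions, as are all function and variable names.

```python
# Minimal sketch, not the authors' code: organizes CLDA-style updates of
# Eqs. (4)-(5) and the label rule argmax_c p(x | c, alpha, beta) for a single
# trajectory represented by codebook counts n (length V).
import numpy as np
from scipy.special import digamma

def variational_pass(n, alpha_c, beta, n_iter=50):
    """Fixed-point updates for one trajectory under cluster c.
    n: (V,) word counts; alpha_c: (K,) Dirichlet prior of cluster c;
    beta: (K, V) topic-word distributions.  Returns (gamma, phi)."""
    K, V = beta.shape
    gamma = alpha_c + n.sum() / K          # common initialization choice
    phi = np.full((K, V), 1.0 / K)
    for _ in range(n_iter):
        # Eq. (4): phi_ki proportional to beta * exp(digamma(gamma_k) - digamma(sum_k gamma_k))
        log_phi = np.log(beta + 1e-12) + (digamma(gamma) - digamma(gamma.sum()))[:, None]
        phi = np.exp(log_phi - log_phi.max(axis=0, keepdims=True))
        phi /= phi.sum(axis=0, keepdims=True)
        gamma = alpha_c + phi @ n          # standard LDA gamma update
    return gamma, phi

def assign_cluster(n, alphas, beta):
    """Label update: pick c maximizing an approximation of p(x | c, alpha, beta).
    A mixture log-likelihood under the inferred topic mixture is used as the
    score; this is a proxy, not the exact marginal."""
    scores = []
    for alpha_c in alphas:                 # alphas: (C, K) cluster-specific priors
        gamma, _ = variational_pass(n, alpha_c, beta)
        theta = gamma / gamma.sum()
        scores.append(n @ np.log(theta @ beta + 1e-12))
    return int(np.argmax(scores))

def update_beta(phis, counts):
    """Eq. (5): beta_k accumulated from phi_ki * n_i over trajectories, then normalized."""
    K, V = phis[0].shape
    beta = np.zeros((K, V))
    for phi, n in zip(phis, counts):
        beta += phi * n[None, :]
    return beta / beta.sum(axis=1, keepdims=True)
```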
3. TRAJECTORY CLUSTERING

In trajectory clustering, we first connect trajectory fragments into trajectory trees and extract low level features, which are embedded in a low-dimensional manifold space for initial trajectory clustering. With the initial cluster labels, the middle level features are then extracted by the proposed CLDA for fine trajectory clustering. The flowchart of the proposed approach is shown in Fig. 3.

Fig. 3. Flowchart of the proposed trajectory clustering approach. (Pipeline stages shown in the flowchart: videos → KLT tracking and tree spanning (low level feature extraction) → bag of words and initial clustering by k-means on the manifold → CLDA (middle level feature extraction) → trajectory clustering and cluster labels.)

3.1. Low-level Feature Extraction

Given a crowded video, we first use a KLT tracker [15] to calculate trajectory segments and motion vectors of objects. A spanning tree algorithm [9] is used to uncover the spatiotemporal relations among trajectory fragments and connect them into trajectory trees, on which visual words are extracted as low level features.

will be input to the information entropy clustering algorithm [9], which computes the potential information entropy of the trees that a trajectory segment belongs to, and then determines the
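As an illustration of the KLT tracking step in Section 3.1, the sketch below detects corner features and tracks them with pyramidal Lucas-Kanade optical flow in OpenCV, collecting the resulting fragmented trajectory segments. It is only a sketch under assumed parameters (corner count, minimum track length, no re-detection of lost features) and omits the spanning tree construction of [9] and the quantization of positions and velocities into visual words.

```python
# Illustrative KLT tracking sketch (assumed parameters; not the authors' setup).
# Detects corner features and tracks them frame to frame with pyramidal
# Lucas-Kanade optical flow, producing short trajectory segments whose points
# (and frame-to-frame velocities) can later be quantized into visual words.
import cv2
import numpy as np

def klt_trajectory_segments(video_path, max_corners=500, min_len=5):
    cap = cv2.VideoCapture(video_path)
    ok, frame = cap.read()
    if not ok:
        return []
    prev_gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    pts = cv2.goodFeaturesToTrack(prev_gray, maxCorners=max_corners,
                                  qualityLevel=0.01, minDistance=5)
    tracks = [[tuple(p.ravel())] for p in pts] if pts is not None else []
    segments = []
    while True:
        ok, frame = cap.read()
        if not ok or not tracks:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        prev_pts = np.float32([t[-1] for t in tracks]).reshape(-1, 1, 2)
        next_pts, status, _ = cv2.calcOpticalFlowPyrLK(prev_gray, gray, prev_pts, None)
        new_tracks = []
        for t, p, s in zip(tracks, next_pts, status.ravel()):
            if s == 1:
                t.append(tuple(p.ravel()))
                new_tracks.append(t)
            elif len(t) >= min_len:          # track lost: keep it as a fragment
                segments.append(np.array(t))
        tracks, prev_gray = new_tracks, gray
    segments.extend(np.array(t) for t in tracks if len(t) >= min_len)
    cap.release()
    return segments                           # list of (T_i, 2) point arrays
```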