Clustering Human Behaviors with Dynamic Time Warping and Hidden Markov Models for a Video Surveillance System
Total Page:16
File Type:pdf, Size:1020Kb
Clustering Human Behaviors with Dynamic Time Warping and Hidden Markov Models for a Video Surveillance System Kan Ouivirach and Matthew N. Dailey Computer Science and Information Management Asian Institute of Technology [email protected], [email protected] Abstract—We propose and experimentally evaluate a new by constructing an affinity matrix using dynamic time warping method for clustering human behaviors that is suitable for (DTW) [7] then apply the normalized-cut approach to cluster bootstrapping an anomaly detection module for intelligent video the gestures. Hautamaki¨ et al. [8] apply DTW and use the surveillance systems. The method uses dynamic time warping, ag- glomerative hierarchical clustering, and hidden Markov models pairwise DTW distances as input to a hierarchical clustering to provide an initial partitioning of a set of observation sequences process in which k-means is used to fine-tune the output. then automatically identifies where to cut off the hierarchical In this paper, we use a combination of clustering and HMMs clustering dendrogram. We show that the method is extremely to group human behaviors in a scene. There is some recent effective, providing 100% accuracy in separating anomalous from related work using graphical models such as HMMs to cluster typical behaviors on real-world testbed video surveillance data. behavior patterns. Li and Biswas [9] use the Bayesian informa- tion criterion (BIC) for HMM model selection and construct I. INTRODUCTION a binary hierarchical clustering dendrogram to initialize data Human behavior understanding is an important component partitions based on a sequence-to-model likelihood distance of a wide variety of desirable intelligent systems. However, the measure. They then compare each pair of clusters using a problem is very difficult, due to the wide range of activities partition mutual information (PMI) criterion [10] to find the possible in any given context and the large amount of vari- optimal number of clusters. ability within any particular activity. Many researchers have Xiang and Gong [11] model the distribution of activity data attempted to build systems able to interpret and understand in a scene using a Gaussian mixture model (GMM) and also human behaviors. The most classic work is from Yamato employ the BIC to select the optimal number of behavior et al. [1], who model tennis actions using hidden Markov classes prior to HMM training. models (HMMs). Du et al. [2] present an approach to recog- Swears et al. [12] propose hierarchical HMM-based cluster- nize interaction activities using dynamic Bayesian networks ing to find and cluster motion trajectories and velocities in a (DBNs) that outperforms conventional HMMs. Gao et al. [3] highway interchange scene. They build up a set of HMMs use a single mixture-of-Gaussian HMM, in which the states incrementally. For each new trajectory, they first test the represent the stages of dining activities, to monitor the eating likelihood of the trajectory according to each existing HMM behavior of elderly people in a nursing home. model. If the new observation is not fit by any existing model, As video monitoring is becoming more ubiquitous in our it is considered deviant and is grouped with other deviant lives, research on advanced video surveillance analysis is in- observations to form a new HMM. creasingly important. To help security personnel work reliably Alon et al. [13] propose a method to discover groupings and efficiently, we would like to filter out typical events, and of similar object motions. They apply a finite mixture of in cases of anomalous events, automatically raise an alarm or HMMs where the number of mixture components is assumed present the event to a human operator for consideration as to be known. They estimate the number of clusters using the a security threat. One limitation of most of the work is that minimum description length (MDL) criterion [14], a penalized the number of “normal” behavior patterns need to be known likelihood measure. beforehand. One example is that of Nair and Clark [4], who We propose a new method for clustering human behaviors built an automated video surveillance system using HMMs, in the context of video surveillance. After extracting sequences each modeling a common, predefined activity in a scene. of features representing individual human behaviors in a More recently, research has started to focus on unsupervised given scene, we use DTW to measure the pairwise similarity analysis and clustering of behaviors in a particular scene for a between sequences. Then we construct an agglomerative hier- variety of purposes including anomaly detection, surveillance, archical clustering dendrogram based on the DTW similarity and classification. Zhong et al. [5] treat video segments measure. To find the optimal set of behavior clusters, we start as documents and cluster the documents based on the co- at the root of the tree, train a HMM on the patterns in that occurrence information. Li et al. [6] cluster human gestures cluster, and determine how well the HMM models the set CCTV�camera of patterns in the cluster. When we find that a HMM is an insufficient representation of the patterns in a given cluster, Video we throw away that HMM and recursively consider each Background Discrete�symbol Foreground model Background sequences of the child clusters according to the pre-calculated DTW- Extraction Modeling based dendrogram. Our method is able to automatically find List�of�blobs Similarity the common human behaviors occurring in a given scene. In Single�Blob Measurement Tracking Distance an experiment with a testbed video surveillance data set, we Blob�features matrix find that the method separates typical behaviors and abnormal Vector Agglomerative Quantization Hierarchical�Clustering behaviors into separate sets of clusters with 100% accuracy. Observation The most similar related work is that of Oates et al. [15], symbols Dendrogram Sequence HMM-based who first proposed the idea of using the DTW with HMMs Aggregation Hierarchical�Clustering to cluster time series. They use the DTW dendrogram cut Discrete�symbol off at an arbitrary depth as an initial partition of the training sequences Set�of�HMMs sequences, then they train HMMs on each partition iteratively (a) (b) until they have a set of HMMs that models all of the training sequences. They apply their method to simulated time series Fig. 1. Block diagrams for the proposed method. (a) Blob extraction flow. with good results but report obtaining poor clustering results (b) Behavior clustering flow. in an experiment with real robot sensor data. Our method has the potential to improve upon the state of the art in intelligent video surveillance applications by bootstrapping human behavior classification and anomaly de- tection modules in a given installation. Once a set of ini- tial clusters and corresponding HMMs representing typical (a) (b) (c) behavior is determined, we can easily find which cluster a new sequence should fall into by performing statistical Fig. 2. Sample foreground extraction and shadow removal results. (a) tests on the sequence’s likelihood according to each HMM Original image. (b) Foreground pixels according to background model. (c) Foreground pixels after shadow removal. model. When the likelihood is low according to all of the pre-existing clusters, we can consider the sequence to be anomalous and alert a human operator. When the likelihood 1) Apply dynamic time warping to the set of sequences to is sufficiently high for one of the pre-existing models, we can construct a similarity-based distance matrix. simply incrementally update the sufficient statistics for that 2) Run agglomerative hierarchical clustering on the dis- model. This approach would allow raising alerts for behaviors tance matrix to get a dendrogram. inconsistent with the automatically-derived typical behavior 3) Perform HMM-based hierarchical clustering using the profile for the scene while providing adaptation to gradual set of sequences and the dendrogram to get a set of changes in typical behavior patterns over time. We do not HMMs modeling the typical behaviors in the scene. focus on these incremental learning and anomaly detection issues in this paper, but we plan to in future work. B. Blob Extraction II. HUMAN BEHAVIOR PATTERN CLUSTERING After discarding the no-motion video segments, we use A. Overview Poppe et al.’s [16] background modeling technique to separate the moving foreground pixels from the background. Poppe Fig. 1 provides an overview of the architecture of our pro- et al. extend the standard mixture of Gaussians background posed method. The blob extraction phase (Fig. 1a) generates model [17] to handle gradual illumination changes. observation sequences from videos as follows: To remove shadows cast by moving objects, we eliminate 1) Grab a few initial frames from an the input video to any foreground pixels whose gray level normalized cross model the background scene. correlation (NCC) with the background model is above some 2) Perform foreground extraction to get a list of blobs. threshold. This method works well in outdoor scenes with vis- 3) Find the single largest blob in the scene, remove any ible background texture inside shadows. Sample results from pixels likely to be shadow pixels, and extract feature the foreground extraction and shadow removal procedures are ~ vector ft for the blob at time t. shown in Fig. 2. 4) Apply vector quantization using the k-means algorithm We next apply morphological opening and closing opera- to convert features