Incremental Discriminative-Analysis of Canonical Correlations for Action Recognition

Xinxiao Wu, Wei Liang, Yunde Jia
Beijing Laboratory of Intelligent Information Technology, School of Computer Science,
Beijing Institute of Technology, Beijing 100081, PR China
{wuxinxiao, liangwei, jiayunde}@bit.edu.cn

Abstract

Human action recognition is a challenging problem due to the large changes of human appearance in the cases of partial occlusions, non-rigid deformations and high irregularities. It is difficult to collect a large set of training samples with the hope of covering all possible variations of an action. In this paper, we propose an online recognition method, namely Incremental Discriminant-Analysis of Canonical Correlations (IDCC), whose discriminative model is incrementally updated to capture the changes of human appearance and thereby facilitates the recognition task in changing environments. As the training sets are acquired sequentially instead of being given completely in advance, our method is able to compute a new discriminant matrix by updating the existing one using the eigenspace merging algorithm. Experimental results on both the Weizmann and KTH action data sets show that our method performs better than state-of-the-art methods in both accuracy and efficiency. Moreover, the robustness of our method is demonstrated on irregular action recognition.

1. Introduction

Recognizing human actions has recently attracted increasing interest from computer vision for a wide range of promising applications, such as video indexing, visual surveillance, human-computer interaction, sports video analysis, and intelligent systems.

As motion speeds and body sizes are associated with individuals, the same action executed by different persons may exhibit large variations, while the environmental conditions such as lighting and view point may make the observations of different actions become similar. In order to reduce the variations of actions within the same class and suppress the environmental contributions to the similarities of actions in different classes, our method maximizes the canonical correlations of actions within the same class and minimizes the canonical correlations of actions between different classes. In the proposed algorithm for action recognition, each action is represented by an orthogonal linear subspace of sequential images, and the similarity between two actions is defined by the canonical correlations of the corresponding two subspaces. We do not take into account the temporal dynamics of an action, and in many cases several principal images, even a single image, are sufficient to recognize what a person is doing.

Another problem in action recognition arises from the high irregularities of actions undergoing various non-stationary scenarios. Taking the walk action for example, people may walk with a dog, swinging a bag, carrying a briefcase, or partially occluded by other objects. It is difficult to account for all the possible variations of an action when learning the discriminative model. In order to make the recognition task adapt to the changing image observations, we aim to find a discriminative model that can be learned online to describe the changes of human appearance and accurately classify the actions even in irregular performances. Our method is capable of online updating the discriminative model with captured images as the new training data, so the updated discriminative model can reflect the appearance variations. By merging eigenspace models [20], the proposed method updates the principal components of the total canonical correlations and between-class canonical correlations separately, and then computes the discriminant components directly from both updated principal component sets. To improve the computational efficiency of the eigen-analysis, the sufficient spanning set [20] is adopted in the solution.

2. Related work

2.1. Action recognition

Many approaches for human action recognition have been proposed in recent decades. Wang and Suter [1] used kernel principal component analysis to obtain a low-dimensional representation of human silhouettes and introduced a factorial discriminative graphical model to model the motion. Their framework can effectively recognize human activities performed by different people with different body builds as well as different motion styles and speeds.


Jia and Yeung [2] reported a new manifold embedding method to discover both the local spatial and temporal discriminant structures of human silhouettes. They designed a two-stage recognition scheme to improve the recognition with low sensitivity to the temporal shape variation within the same action. Rodriguez et al. [3] introduced a template-based method for action recognition which is capable of capturing intra-class variability by synthesizing a single Action MACH filter for a given action class. Jhuang et al. [5] applied a biological model of motion processing to action recognition by accounting for the dorsal stream of the visual cortex. Some other approaches [7-10, 12-14] extracted local spatio-temporal features to exploit rich and intrinsic representations and introduced statistical models to classify actions with large intra-class variations.

However, most of these approaches learn the recognition model offline and lack the adaptability to classify irregular actions which are not included in the training data. Our method can efficiently update the discriminative model online to learn the changes of human appearance, with superior adaptability to recognizing highly irregular actions. Moreover, to address the intra-class variation problem, our method maximizes the canonical correlations of within-class image sets and minimizes the canonical correlations of between-class sets.

2.2. Incremental learning

Recently, a number of incremental learning approaches have been proposed and applied to computer vision. Hall et al. [15] proposed Incremental Principal Component Analysis (IPCA) based on the update of the covariance matrix through a residue estimating procedure. They then improved their method by merging and splitting eigenspace models, which allows a chunk of new samples to be learned in a single step [20]. Pang et al. [16] proposed Incremental Linear Discriminant Analysis (ILDA) in two forms: sequential ILDA and chunk ILDA. The discriminant eigenspace is updated for classification when bursts of data are added to an initial discriminant eigenspace in the form of random chunks. As an improvement of ILDA, Kim et al. [17] applied the concept of the sufficient spanning set approximation in updating the between-class scatter matrix, the projected data matrix, as well as the total scatter matrix. Lin et al. [18] handled the online update of discriminative models for tracking objects undergoing large pose and lighting changes. In image set-based recognition, Kim et al. [22] proposed an incremental method for learning orthogonal subspaces. With the concept of the sufficient spanning set, the algorithm separately updates the principal components of the class correlation and total correlation matrices, and then computes the orthogonal components of the updated few principal components.

Many of these methods simply combine all examples of a class together and do not exploit the concept of multiple sets in a single class. Our method maximizes the canonical correlations between multiple sets within the same class and is more robust to intra-class changes.

3. Background

Table 1 summarizes the important notations used throughout the paper.

    $X_i$: the $i$-th image set, with each column describing an image
    $C_i$: class label of $X_i$
    $P_i$: orthonormal basis matrix representing the linear subspace of $X_i$
    $T$: discriminant transformation matrix
    $S_b$, $S_t$: transformed canonical correlations of the between-class sets and the total sets
    $V$, $\Delta$: eigenvector and eigenvalue matrices of $S_b$
    $U$, $\Lambda$: eigenvector and eigenvalue matrices of $S_t$
    $m$: number of image sets
Table 1: Notations.

By analogy to the optimization concept of LDA [21], Discriminant-Analysis of Canonical Correlations (DCC) [19] introduces a linear discriminative function to maximize the canonical correlations of within-class sets and minimize the canonical correlations of between-class sets. Assume $m$ image sets are given as $\{X_1, X_2, \ldots, X_m\}$, where $X_i$ represents a matrix with each column describing an image. $X_i$ belongs to one action class denoted by $C_i$. A $d$-dimensional linear subspace of $X_i$ is represented by an orthonormal basis matrix $P_i \in \mathbb{R}^{N \times d}$ s.t. $X_i X_i^T \simeq P_i \Lambda_i P_i^T$, where $\Lambda_i$ and $P_i$ are the eigenvalue and eigenvector matrices of the $d$ largest eigenvalues, and $N$ is the dimension of the column vectors.
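In practice, the basis $P_i$ and eigenvalues $\Lambda_i$ of an image set can be obtained from a thin SVD of the set matrix. The following is a minimal NumPy sketch under our own naming, shown for illustration only (the optional energy-based choice of $d$ anticipates the 99 percent rule mentioned in Sec. 5.1); it is not the authors' implementation.

```python
import numpy as np

def subspace_basis(X, d):
    """Orthonormal basis P (N x d) of an image set X (N x n): the top-d
    eigenvectors of X X^T, obtained from the thin SVD of X."""
    P, s, _ = np.linalg.svd(X, full_matrices=False)
    return P[:, :d], s[:d] ** 2      # basis P_i and the d largest eigenvalues of X X^T

def choose_dim(s, energy=0.99):
    """Smallest d whose leading eigenvalues keep the requested energy fraction."""
    lam = s ** 2
    frac = np.cumsum(lam) / lam.sum()
    return int(np.searchsorted(frac, energy)) + 1
```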


The discriminant transformation matrix $T = [t_1, \ldots, t_n] \in \mathbb{R}^{N \times n}$ is defined by $Y_i = T^T X_i$ to make the transformed image sets more discriminative using canonical correlations. Orthonormal basis matrices of the subspaces of the transformed data are given by

$Y_i Y_i^T = (T^T X_i)(T^T X_i)^T \simeq (T^T P_i)\,\Lambda_i\,(T^T P_i)^T.$   (1)

Canonical correlations are only defined for orthonormal basis matrices of subspaces. Because $T^T P_i$ is not generally orthonormal, the matrix $P_i$ is normalized to $P_i'$ so that the columns of $T^T P_i'$ are orthonormal. By the SVD computation $(T^T P_i')^T (T^T P_j') = Q_{ij} \Lambda Q_{ji}^T$, the similarity of two transformed data sets is defined as the sum of canonical correlations:

$F_{ij} = \max_{Q_{ij}, Q_{ji}} \operatorname{tr}\{T^T P_j' Q_{ji} Q_{ij}^T P_i'^T T\}.$   (2)

$T$ is determined to maximize the similarities of all pairs of within-class sets and minimize the similarities of pair-wise sets of different classes:

$T = \arg\max_T \dfrac{\sum_{i=1}^{m} \sum_{k \in W_i} F_{ik}}{\sum_{i=1}^{m} \sum_{l \in B_i} F_{il}}.$   (3)

The two index sets $W_i = \{j \mid C_j = C_i\}$ and $B_i = \{j \mid C_j \neq C_i\}$ respectively denote the within-class and between-class sets for a given set of class $C_i$. By the simple linear algebra

$T^T P_j' Q_{ji} Q_{ij}^T P_i'^T T = I - T^T (P_j' Q_{ji} - P_i' Q_{ij})(P_j' Q_{ji} - P_i' Q_{ij})^T T / 2,$   (4)

the discriminative function is rewritten as

$T = \arg\max_T \operatorname{tr}(T^T S_b T) / \operatorname{tr}(T^T S_w T),$   (5)

where $S_b = \sum_{i=1}^{m} \sum_{l \in B_i} (P_l' Q_{li} - P_i' Q_{il})(P_l' Q_{li} - P_i' Q_{il})^T$ with $B_i = \{j \mid C_j \neq C_i\}$, and $S_w = \sum_{i=1}^{m} \sum_{k \in W_i} (P_k' Q_{ki} - P_i' Q_{ik})(P_k' Q_{ki} - P_i' Q_{ik})^T$ with $W_i = \{j \mid C_j = C_i\}$. Finally the optimal $T$ is computed by eigen-decomposition of $(S_w)^{-1} S_b$. Without loss of generality, we assume in the rest of the paper that all the $P_i$ are normalized.
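Given two such bases and a current transformation $T$, the similarity of Eq. 2 is the sum of the singular values of the product of the re-orthonormalized transformed bases (the cosines of the principal angles). A small sketch, in our own naming with NumPy assumed, follows.

```python
import numpy as np

def canonical_correlation_similarity(T, Pi, Pj):
    """F_ij of Eq. 2: sum of canonical correlations between the subspaces
    spanned by T^T P_i and T^T P_j after re-orthonormalization."""
    Yi, _ = np.linalg.qr(T.T @ Pi)   # T^T P_i is generally not orthonormal
    Yj, _ = np.linalg.qr(T.T @ Pj)
    # Singular values of Yi^T Yj are the canonical correlations.
    sigma = np.linalg.svd(Yi.T @ Yj, compute_uv=False)
    return float(sigma.sum())
```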
4. Incremental Discriminant-Analysis of Canonical Correlations (IDCC)

Two equivalent criteria to obtain the discriminant transformation matrix $T$ are given by

$T = \arg\max_T \dfrac{\sum_{i=1}^{m} \sum_{k \in W_i} F_{ik}}{\sum_{i=1}^{m} \sum_{l \in B_i} F_{il}} = \arg\max_T \dfrac{\sum_{i=1}^{m} \sum_{h \in T_i} F_{ih}}{\sum_{i=1}^{m} \sum_{l \in B_i} F_{il}},$   (6)

where $W_i = \{j \mid C_j = C_i\}$, $B_i = \{j \mid C_j \neq C_i\}$, and $T_i = W_i \cup B_i$ indexes the total sets. $\sum_{i=1}^{m} \sum_{k \in W_i} F_{ik}$ and $\sum_{i=1}^{m} \sum_{l \in B_i} F_{il}$ respectively represent the canonical correlations of within-class sets and between-class sets. The total canonical correlations are represented by $\sum_{i=1}^{m} \sum_{h \in T_i} F_{ih} = \sum_{i=1}^{m} \sum_{k \in W_i} F_{ik} + \sum_{i=1}^{m} \sum_{l \in B_i} F_{il}$.

In this paper, the algorithm uses the second criterion in Eq. 6. By the simple linear algebra (Eq. 4), the discriminative function is

$T = \arg\max_T \operatorname{tr}(T^T S_b T) / \operatorname{tr}(T^T S_t T),$   (7)

where $S_b = \sum_{i=1}^{m} \sum_{l \in B_i} (P_l Q_{li} - P_i Q_{il})(P_l Q_{li} - P_i Q_{il})^T$ with $B_i = \{j \mid C_j \neq C_i\}$, and $S_t = \sum_{i=1}^{m} \sum_{h \in T_i} (P_h Q_{hi} - P_i Q_{ih})(P_h Q_{hi} - P_i Q_{ih})^T$ with $T_i = \{j \mid j = 1, \ldots, m\}$.

The solution of incremental discriminant-analysis of canonical correlations includes three steps: updating the canonical correlations of the total sets, updating the canonical correlations of the between-class sets, and computing the discriminant transformation matrix. Note that $S_b$ and $S_t$ are respectively the linear algebra transformations (see Eq. 4) of the between-class canonical correlations and the total canonical correlations, so the proposed algorithm actually involves the update of the principal components of $S_t$, the update of the principal components of $S_b$, and the computation of $T$ from the updated $S_t$ and $S_b$.

4.1. Updating the total canonical correlations

Let the total canonical correlations of the existing image sets be $\{U, \Lambda, P_i, i = 1, 2, \ldots, m\}$, where $U$ and $\Lambda$ respectively denote the eigenvectors and eigenvalues of $S_t$, s.t. $S_t = U \Lambda U^T$, and $P_i \in \mathbb{R}^{N \times d}$ is the normalized orthonormal basis matrix of the $i$-th existing set. Assume $P_{m+1}$ is the normalized orthonormal basis matrix of a new data set; the update is defined as

$\Phi_1(U, \Lambda, P_i, P_{m+1}) \rightarrow (U', \Lambda'), \quad i = 1, 2, \ldots, m.$   (8)

Let $A = 2\sum_{i=1}^{m} (P_{m+1} Q_{m+1,i} - P_i Q_{i,m+1})(P_{m+1} Q_{m+1,i} - P_i Q_{i,m+1})^T$; then the updated $S_t'$ is computed by $S_t' = S_t + A = U \Lambda U^T + A$. We wish to calculate the eigenvectors $U'$ and eigenvalues $\Lambda'$ of $S_t'$, i.e. $S_t' = U' \Lambda' U'^T$. To reduce the dimension of the eigenvalue problem, the concept of the sufficient spanning set [20] is used. Let the SVD of $A$ be $A = W \Sigma W^T$, where $W$ and $\Sigma$ are respectively the eigenvectors and eigenvalues. The sufficient spanning set of $S_t'$ can be calculated by $\Psi_t = h([U, W])$ with $h$ an orthonormalization function. Then $U'$ is written as $U' = \Psi_t R_t$, where $R_t$ is a rotation matrix. Thus, we solve a smaller eigen-problem to obtain $R_t$ and $\Lambda'$:

$S_t' = U' \Lambda' U'^T \;\Rightarrow\; \Psi_t^T (S_t + A) \Psi_t = \Psi_t^T (U \Lambda U^T + A) \Psi_t = R_t \Lambda' R_t^T.$   (9)

Suppose that $d_t$ and $d_A$ are the numbers of eigenvectors of $U$ and $W$ respectively; the matrix $\Psi_t^T S_t' \Psi_t$ has the reduced size $d_t' \leq d_t + d_A$, and the eigen-analysis of $S_t'$ takes only $O((d_t + d_A)^3)$ computations.
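The merge of Eqs. 8 and 9 never forms $S_t'$ explicitly: all work is confined to the span of $[U, W]$. The sketch below is one possible realization under our own naming, with the orthonormalization $h$ implemented by a QR decomposition and near-zero eigenvalues truncated; it illustrates the sufficient spanning set idea and is not the authors' code. The same routine can be reused for the between-class model of Sec. 4.2.

```python
import numpy as np

def update_eigenmodel(U, lam, A, eps=1e-10):
    """Merge S = U diag(lam) U^T with a symmetric update A using a
    sufficient spanning set (cf. Eq. 9); returns the new (U', lam')."""
    w, W = np.linalg.eigh((A + A.T) / 2)
    W = W[:, w > eps]                          # keep significant directions of A only
    Psi, _ = np.linalg.qr(np.hstack([U, W]))   # Psi = h([U, W]) via QR
    PU = Psi.T @ U                             # project the old model onto Psi
    M = (PU * lam) @ PU.T + Psi.T @ A @ Psi    # Psi^T (U diag(lam) U^T + A) Psi
    lam_new, R = np.linalg.eigh((M + M.T) / 2)
    order = np.argsort(lam_new)[::-1]          # descending eigenvalues
    keep = order[lam_new[order] > eps]
    return Psi @ R[:, keep], lam_new[keep]
```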


Let $m$ be the number of existing training sets; the eigen-analysis of $A$ requires $O(m^3)$, and the total cost of our incremental method is $O((d_t + d_A)^3 + m^3)$, while the eigen-analysis of $S_t'$ in batch mode requires $O(m^6)$. Typically, $d_t$ and $d_A$ are (much) less than $m^2$ and $m$ respectively.

4.2. Updating the between-class canonical correlations

The between-class canonical correlations of the existing data sets are represented by $\{V, \Delta, P_i, C_i, i = 1, 2, \ldots, m\}$, where $V$ and $\Delta$ are the eigenvectors and eigenvalues of $S_b$, s.t. $S_b = V \Delta V^T$. $P_i$ and $C_i$ respectively represent the normalized orthonormal component matrix and class label of the $i$-th set. Given a new image set represented by an orthonormal basis matrix $P_{m+1}$ and the corresponding class label $C_{m+1}$, the update is described as

$\Phi_2(V, \Delta, P_i, C_i, P_{m+1}, C_{m+1}) \rightarrow (V', \Delta'), \quad i = 1, 2, \ldots, m.$   (10)

The updated $S_b'$ is computed by $S_b' = V \Delta V^T + F$, where $F = 2\sum_{i \in E} (P_{m+1} Q_{m+1,i} - P_i Q_{i,m+1})(P_{m+1} Q_{m+1,i} - P_i Q_{i,m+1})^T$ and $E = \{j \mid C_j \neq C_{m+1}\}$. $Q_{m+1,i}$ and $Q_{i,m+1}$ are obtained by the SVD solution $P_i^T T T^T P_{m+1} = Q_{i,m+1} \Lambda Q_{m+1,i}^T$. Let $Z$ be the eigenvectors of $F$ obtained by the SVD solution; the sufficient spanning set of $S_b'$ can be given by $\Psi_b = h([V, Z])$, and $V' = \Psi_b R_b$ with $R_b$ a rotation matrix. Accordingly, the new small-dimensional eigen-problem is expressed by

$S_b' = V' \Delta' V'^T \;\Rightarrow\; \Psi_b^T (V \Delta V^T + F) \Psi_b = R_b \Delta' R_b^T.$   (11)

Let $n_k$ be the number of sets belonging to class $k$; then the eigen-analysis of $F$ costs $O((m - n_{C_{m+1}})^3)$. Suppose $d_b$ and $d_F$ are the numbers of eigenvectors of $V$ and $F$ respectively; the matrix $\Psi_b^T S_b' \Psi_b$ has the reduced size $d_b' \leq d_b + d_F$. The eigen-analysis of $S_b'$ requires at most $O((d_b + d_F)^3)$, whereas the eigen-analysis of the new between-class canonical correlations in batch mode costs $O((m^2 - \sum_k n_k^2)^3)$ with $m = \sum_k n_k$. Typically, $d_b$ and $d_F$ are respectively (much) less than $(m^2 - \sum_k n_k^2)$ and $(m - n_{C_{m+1}})$.
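Since $S_b'$ has the same additive structure as $S_t'$, the merge routine sketched after Eq. 9 can be reused once the update term $F$ has been assembled from the sets whose class differs from the new set. The sketch below follows our reading of the reconstructed update term; the function names, and the factor of 2 that accounts for both orderings of each new pair in the double sum, are our assumptions.

```python
import numpy as np

def pairwise_term(T, P_new, P_i):
    """One summand (P_new Q_new,i - P_i Q_i,new)(...)^T of A or F (cf. Eq. 4)."""
    Qi, _, Qnew_t = np.linalg.svd(P_i.T @ T @ T.T @ P_new)
    D = P_new @ Qnew_t.T - P_i @ Qi
    return D @ D.T

def between_class_update(T, P_sets, labels, P_new, c_new):
    """Update term F of Eqs. 10-11, built only from sets of other classes."""
    N = P_new.shape[0]
    F = np.zeros((N, N))
    for P_i, c_i in zip(P_sets, labels):
        if c_i != c_new:                 # E = {j | C_j != C_{m+1}}
            F += 2.0 * pairwise_term(T, P_new, P_i)
    return F                             # then call update_eigenmodel(V, delta, F)
```

Building the total update term $A$ proceeds the same way, except that the sum runs over all existing sets regardless of class.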
4.3. Updating the discriminant transformation matrix

The discriminant transformation matrix is computed using the updated total canonical correlations and between-class canonical correlations:

$\Phi_3(U', \Lambda', V', \Delta') \rightarrow T'.$   (12)

In order to further reduce the computational complexity, we introduce a new sufficient spanning set to change the eigen-analysis into a smaller-dimensional eigenvalue problem requiring a cost of $O(d_b'^3)$ rather than $O(d_t'^3)$. Let $G = U' \Lambda'^{-1/2}$; then $G^T S_t' G = I$. As the denominator of the second criterion in Eq. 6 is the identity matrix, the problem is to find the discriminative components $H$ that maximize $G^T S_b' G$, s.t. $G^T S_b' G = H \Omega H^T$. The final discriminant components are obtained by $T' = GH$. The sufficient spanning set of the projection data can be constructed by $\Phi = h([G^T V'])$ and the eigenvalue problem is

$\Phi^T G^T S_b' G \Phi = \Phi^T G^T V' \Delta' V'^T G \Phi = R \Omega R^T.$   (13)

The updated discriminant matrix is given by

$T' = GH = G \Phi R.$   (14)

Let $d_b'$ be the number of eigenvectors of $V'$; the eigen-problem in Eq. 13 then takes $O(d_b'^3)$. The dimension $d_t'$ of $U'$ is usually larger than $d_b'$, so the computational efficiency of obtaining $T'$ improves from $O(d_t'^3)$ to $O(d_b'^3)$. The procedure of the complete IDCC algorithm is listed in Table 2.

Algorithm: Incremental Discriminant-Analysis of Canonical Correlations
Input: the total and between-class canonical correlation eigen-models $\{U, \Lambda, V, \Delta, P_i, C_i, i = 1, 2, \ldots, m\}$ of the existing data sets, and the normalized orthonormal basis matrix $P_{m+1}$ of the new data set with its label $C_{m+1}$.
Output: the updated discriminant matrix $T'$.
1. Update the total canonical correlations. Compute $A$ and $A = W \Sigma W^T$. Set $\Psi_t = h([U, W])$. Compute the eigenvectors $R_t$ of $\Psi_t^T (U \Lambda U^T + A) \Psi_t$. Then $U' = \Psi_t R_t$.
2. Update the between-class canonical correlations. Compute $F$ and its eigenvectors $Z$. Set $\Psi_b = h([V, Z])$. Compute the eigenvectors $R_b$ of $\Psi_b^T (V \Delta V^T + F) \Psi_b$. Then $V' = \Psi_b R_b$.
3. Update the discriminant matrix. Compute $G = U' \Lambda'^{-1/2}$, $\Phi = h([G^T V'])$ and the eigenvectors $R$ of $\Phi^T G^T V' \Delta' V'^T G \Phi$. Then $T' = G \Phi R$.
Table 2: Procedure of IDCC.
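Step 3 of Table 2 amounts to whitening with the updated total model followed by a small eigenproblem. A minimal sketch in our own notation, with NumPy assumed, is given below; it mirrors Eqs. 12-14 but is not the authors' implementation.

```python
import numpy as np

def update_discriminant_matrix(U_new, lam_new, V_new, delta_new):
    """Compute T' = G Phi R from the updated eigenmodels (Eqs. 12-14)."""
    G = U_new / np.sqrt(lam_new)           # G = U' Lambda'^{-1/2}, so G^T S_t' G = I
    Phi, _ = np.linalg.qr(G.T @ V_new)     # sufficient spanning set Phi = h([G^T V'])
    B = Phi.T @ (G.T @ V_new)              # project G^T V' onto Phi
    M = (B * delta_new) @ B.T              # Phi^T G^T V' Delta' V'^T G Phi
    omega, R = np.linalg.eigh((M + M.T) / 2)
    R = R[:, np.argsort(omega)[::-1]]      # leading discriminative components first
    return G @ Phi @ R                     # T' = G Phi R
```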


5. Experiments

We have conducted experiments to evaluate the performance of the proposed method on two publicly available datasets: the KTH human action dataset and the Weizmann human action dataset. For all experiments, the non-optimized Matlab code runs on a Dell PC with an Intel Pentium D 3.4 GHz CPU and 1 GB RAM.

5.1. Weizmann action recognition

We have tested the proposed method on the Weizmann action dataset [11]. There are about 90 low-resolution (180 x 144, 25 fps) video sequences showing nine different subjects, each performing 10 actions: bending (bend), jumping jack (jack), jumping-forward-on-two-legs (jump), jumping-in-place-on-two-legs (pjump), running (run), skipping (skip), galloping-sideways (side), walking (walk), waving-one-hand (wave1) and waving-two-hands (wave2). The centered silhouettes extracted in [11] are normalized to the same 64 x 48 dimension and converted into 3072-dimensional vectors in a raster-scan manner. The classification accuracy is evaluated under nine-fold cross validation. Each time we take the silhouette frames of eight subjects for training and use those of the remaining subject for testing. The training dataset is further partitioned into an initial set, which is used for learning the initial discriminative model, and the remaining sets, which are added successively for re-training.

The efficiency and accuracy of IDCC have been examined by comparing it with DCC [19], ILDA [16] and IPCA [15]. In particular, we are interested in evaluating the discriminability and execution time of IDCC on the increasing datasets. In IDCC and DCC, the best dimension of the linear subspace of each image set is around 19 to represent 99 percent of the information, and Nearest Neighbor (NN) classification is utilized based on the similarity between subspaces. PCA is performed to learn the linear subspace of each set in IDCC and DCC. For ILDA and IPCA, the dimensions of the eigenspace are set to 8 and 28 respectively, and k-Nearest Neighbor (k = 10) classification is used. Figure 1 shows the recognition accuracy of IDCC and the related methods with increasing training data. IDCC achieves nearly the same accuracy as DCC, provided that enough components of the total and between-class canonical correlations are stored. 10NN-ILDA and 10NN-IPCA perform worse since they are based on single image matching without exploiting the multiple image sets. The comparison of computational costs between DCC and IDCC is illustrated in Figure 2. Whereas the execution time of DCC increases significantly as training samples arrive successively, the time of IDCC remains low. Table 3 summarizes the mean and standard deviation of recognition accuracy for the different methods. Both IDCC and DCC provide significant improvements in recognition accuracy over IPCA and ILDA. Table 4 reports the recognition rates of IDCC as well as some state-of-the-art methods, all of which adopt the leave-one-out cross validation scheme. As shown in Table 4, our method is significantly superior to the previous methods.

Figure 1: Recognition accuracy of the incremental solutions (ILDA, IPCA, DCC, IDCC) versus the number of training samples.

Figure 2: Computation efficiency of IDCC and DCC (execution time in seconds versus the number of samples).

Methods      Weizmann accuracy (%)
10NN-IPCA    86.09 ± 0.03 (28)
10NN-ILDA    65.15 ± 0.08 (8)
DCC          98.89 ± 0.02 (19)
IDCC         98.89 ± 0.02 (19)
Table 3: Mean and standard deviation of recognition accuracy for the different methods. The number in parentheses represents the dimensionality of the linear subspace.
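For reference, the set-based Nearest Neighbor rule used in these experiments simply assigns the label of the training set with the largest canonical correlation similarity. The sketch below reuses the canonical_correlation_similarity helper defined earlier; the naming is ours and it is not the authors' code.

```python
import numpy as np

def nn_classify(T, P_test, train_sets):
    """Assign the test image set the label of the most similar training set,
    where similarity is the sum of canonical correlations (Eq. 2)."""
    best_label, best_score = None, -np.inf
    for P_train, label in train_sets:        # train_sets: list of (basis, label) pairs
        score = canonical_correlation_similarity(T, P_test, P_train)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```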


Methods                Weizmann accuracy (%)
Our method             98.9
Wang and Suter [1]     97.8
Zhang et al. [13]      92.9
Ali et al. [6]         92.6
Jia and Yeung [2]      90.9
Scovanner et al. [14]  84.2
Niebles and Li [7]     72.8
Table 4: Recognition accuracy of some related recognition approaches on the Weizmann dataset. All these approaches use the leave-one-out cross validation scheme.

5.2. KTH action recognition

The KTH human action dataset [12] contains six types of human actions: walking, jogging, running, boxing, hand waving and hand clapping. These actions are performed several times by twenty-five subjects in four different scenarios: outdoors (s1), outdoors with scale variation (s2), outdoors with different clothes (s3) and indoors with lighting variation (s4). Body tracking methods (e.g. [23]) can be applied to locate the areas and the geometric centers of human bodies in each frame, and the centered body region is normalized to the size of 50 x 50. Since scenario s2 is only the scale variation of s1, the normalized human images of s2 are very similar to those of s1, and we conduct the experiments on s1, s3 and s4. Leave-one-out cross validation is performed to test the proposed method, i.e. for each run the image sets of 24 subjects are used for training and the image sets of the remaining subject for testing. Figure 3 shows the recognition rates of the incremental solutions IDCC, DCC, IPCA and ILDA. Table 5 shows the mean and standard deviation of recognition accuracy for the different methods on the s1, s3 and s4 datasets of KTH. Moreover, we compare the recognition rates with some related recognition methods in Table 6.

Figure 3: Recognition rates of the incremental solutions (ILDA, IPCA, DCC, IDCC) versus the number of training samples.

Methods     KTH s1         KTH s3         KTH s4         Avg
10NN-IPCA   51.46 ± 0.16   49.82 ± 0.14   50.69 ± 0.15   50.66 ± 0.15
10NN-ILDA   48.07 ± 0.13   39.77 ± 0.13   53.62 ± 0.08   47.15 ± 0.11
DCC         94.67 ± 0.05   89.33 ± 0.10   97.67 ± 0.02   93.89 ± 0.06
IDCC        96.00 ± 0.06   90.67 ± 0.10   98.67 ± 0.02   95.11 ± 0.06
Table 5: Comparisons of recognition rates (%). s1, s3 and s4 correspond to different conditions of the KTH database, and Avg to the mean performance across the sets.

Methods              KTH accuracy (%)
Our method           95.1
Zhang et al. [13]    91.3
Savarese et al. [8]  86.8
Wang et al. [4]      85.0
Niebles et al. [9]   81.5
Dollar et al. [10]   81.7
Table 6: Comparison of recognition accuracy on the KTH dataset between related recognition approaches. All these approaches use the leave-one-out cross validation scheme.

5.3. Robustness test

To evaluate the adaptability and robustness of IDCC to irregular actions in changing scenarios, we conduct an experiment on 10 video sequences of people walking in various difficult scenarios [11], including walking with a dog, walking while swinging a bag, walking in a skirt, walking with partially occluded legs, walking occluded by a pole, sleepwalking, limping, walking with knees up, walking while carrying a briefcase, and normal walking. In this experiment, the testing action is recognized on a frame-by-frame basis, and the recognition accuracy is measured as the percentage of correctly recognized frames in the whole sequence. At each frame, we collect its local temporal neighbors as the test image set and acquire the class label by computing the canonical correlations between the test set and the training sets. As time proceeds, several recognized frames are accumulated to construct a new training set for online re-training of the discriminative model. As a trade-off between computational efficiency and effectiveness, we update the discriminative model at intervals of several frames rather than at each frame; a schematic sketch of this online loop is given below. Table 7 gives the comparison results of LSTDE [2], DCC and IDCC, from which we can see that IDCC indeed flexibly adapts to irregular action recognition from ever-changing silhouettes by online updating of the discriminative model.

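The loop below summarizes the online protocol of Sec. 5.3. It is a schematic sketch only: the window size, update interval and the incremental_update wrapper (standing in for the IDCC update of Table 2) are our placeholders rather than values or routines from the paper, and subspace_basis and nn_classify refer to the earlier sketches.

```python
import numpy as np

def online_recognition(frames, T, train_sets, window=5, update_every=10):
    """Frame-by-frame recognition with periodic online model updates (Sec. 5.3)."""
    predictions, buffer = [], []
    for t in range(len(frames)):
        # Local temporal neighborhood of frame t forms the test image set.
        lo, hi = max(0, t - window), min(len(frames), t + window + 1)
        X_test = np.column_stack(frames[lo:hi])
        P_test, _ = subspace_basis(X_test, d=min(19, X_test.shape[1]))
        label = nn_classify(T, P_test, train_sets)
        predictions.append(label)
        buffer.append(frames[t])
        # Periodically fold the recognized frames back in and refresh the model.
        if len(buffer) == update_every:
            X_new = np.column_stack(buffer)
            P_new, _ = subspace_basis(X_new, d=min(19, X_new.shape[1]))
            train_sets.append((P_new, label))
            # Placeholder for the IDCC update of Table 2 (Sec. 4).
            T = incremental_update(T, train_sets, P_new, label)
            buffer = []
    return predictions
```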


Test sequence         Recognition accuracy (%)
                      LSTDE    DCC      IDCC
Walk with a dog       90.74    100      100
Swinging a bag        74.58    100      100
Walk in a skirt       92.16    80.49    85.37
Occluded feet         71.19    77.55    93.88
Occluded by pole      -        84.62    92.31
Moonwalk              56.06    78.57    94.64
Limp walk             83.96    87.50    90.63
Walk with knees up    66.02    64.52    72.10
Carry a briefcase     92.86    100      100
Normal walk           95.16    100      100
Table 7: Comparison of robustness test results between LSTDE, DCC and IDCC.

6. Conclusions

We have presented a novel Incremental Discriminant-Analysis of Canonical Correlations (IDCC) method and its application to online human action recognition in various changing scenarios. By efficiently updating the discriminative model, IDCC can adapt to the appearance variations of humans and accurately recognize actions even in highly irregular performances. Experiments on both regular and irregular actions have shown superior discriminability in classification, significant adaptability to changing environments, and high computational efficiency of learning.

7. Acknowledgements

This work was partially supported by the Natural Science Foundation of China (60675021), the 973 Program of China (2006CB303103), and the Chinese High-Tech Program (2009AA01Z323).

References

[1] L. Wang and D. Suter. Recognizing human activities from silhouettes: motion subspace and factorial discriminative graphical model. In CVPR, 2007.
[2] K. Jia and D. Y. Yeung. Human action recognition using local spatio-temporal discriminant embedding. In CVPR, 2008.
[3] M. D. Rodriguez, J. Ahmed and M. Shah. Action MACH: a spatio-temporal maximum average correlation height filter for action recognition. In CVPR, 2008.
[4] Y. Wang, P. Sabzmeydani and G. Mori. Semi-latent Dirichlet allocation: a hierarchical model for human action recognition. In HUMO, 2007.
[5] H. Jhuang, T. Serre, L. Wolf and T. Poggio. A biologically inspired system for action recognition. In ICCV, 2007.
[6] S. Ali, A. Basharat and M. Shah. Chaotic invariants for human action recognition. In ICCV, 2007.
[7] J. C. Niebles and F. F. Li. A hierarchical model of shape and appearance for human action classification. In CVPR, 2007.
[8] S. Savarese, A. DelPozo, J. C. Niebles and F. F. Li. Spatial-temporal correlations for unsupervised action classification. In IEEE Workshop on Motion and Video Computing, 2008.
[9] J. C. Niebles, H. C. Wang and F. F. Li. Unsupervised learning of human action categories using spatial-temporal words. In BMVC, 2006.
[10] P. Dollar, V. Rabaud, G. Cottrell and S. Belongie. Behavior recognition via sparse spatio-temporal features. In VS-PETS, 2005.
[11] M. Blank, L. Gorelick, E. Shechtman, M. Irani and R. Basri. Actions as space-time shapes. In ICCV, 2005.
[12] C. Schuldt, I. Laptev and B. Caputo. Recognizing human actions: a local SVM approach. In ICPR, 2004.
[13] Z. M. Zhang, Y. Q. Hu, S. Chan and L. T. Chia. Motion context: a new representation for human action recognition. In ECCV, 2008.
[14] P. Scovanner, S. Ali and M. Shah. A 3-dimensional SIFT descriptor and its applications to action recognition. In ACM Multimedia, 2007.
[15] P. Hall and R. Martin. Incremental eigenanalysis for classification. In BMVC, 1998.
[16] S. N. Pang, S. Ozawa and N. Kasabov. Incremental linear discriminant analysis for classification of data streams. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 35, no. 5, October 2005.
[17] T. K. Kim, S. F. Wong, B. Stenger, J. Kittler and R. Cipolla. Incremental linear discriminant analysis using sufficient spanning set approximations. In CVPR, 2007.
[18] R. S. Lin, D. Ross, J. Lim and M. H. Yang. Adaptive discriminative generative model and its applications. In NIPS, 2004.
[19] T. K. Kim, J. Kittler and R. Cipolla. Discriminative learning and recognition of image set classes using canonical correlations. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 29, no. 6, June 2007.
[20] P. Hall, D. Marshall and R. Martin. Merging and splitting eigenspace models. IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 22, no. 9, pp. 1042-1049, September 2000.
[21] R. O. Duda, P. E. Hart and D. G. Stork. Pattern Classification, second edition. John Wiley and Sons, 2000.
[22] T. K. Kim, J. Kittler and R. Cipolla. Incremental learning of locally orthogonal subspaces for set-based object recognition. In BMVC, 2006.
[23] A. Bissacco, M. H. Yang and S. Soatto. Fast human pose estimation using appearance and motion via multi-dimensional boosting regression. In CVPR, 2007.
