Statistical Analysis on Stiefel and Grassmann Manifolds with Applications in Computer Vision
Pavan Turaga, Ashok Veeraraghavan and Rama Chellappa
Center for Automation Research, University of Maryland, College Park
{pturaga,vashok,rama}@umiacs.umd.edu

Abstract

Many applications in computer vision and pattern recognition involve drawing inferences on certain manifold-valued parameters. In order to develop accurate inference algorithms on these manifolds we need to a) understand the geometric structure of these manifolds, b) derive appropriate distance measures, and c) develop probability distribution functions (pdfs) and estimation techniques that are consistent with the geometric structure of these manifolds. In this paper, we consider two related manifolds - the Stiefel manifold and the Grassmann manifold - which arise naturally in several vision applications such as spatio-temporal modeling, affine invariant shape analysis, image matching and learning theory. We show how accurate statistical characterization that reflects the geometry of these manifolds allows us to design efficient algorithms that compare favorably to the state of the art in these very different applications. In particular, we describe appropriate distance measures and parametric and non-parametric density estimators on these manifolds. These methods are then used to learn class conditional densities for applications such as activity recognition, video based face recognition and shape classification.

1. Introduction

Many applications in computer vision such as dynamic textures [24, 9], human activity modeling and recognition [7, 30], video based face recognition [2], and shape analysis [14, 20] involve learning and recognition of patterns from exemplars which lie on certain manifolds. Given a database of examples and a query, the following two questions are usually addressed: a) what is the 'closest' example to the query in the database? b) what is the 'most probable' class to which the query belongs? A systematic solution to these problems involves a study of the manifold on which the data lies. The answer to the first question involves a study of the geometric properties of the manifold, which then leads to appropriate definitions of distance metrics on the manifold (geodesics etc.). The answer to the second question involves statistical modeling of inter- and intra-class variations on the manifold, and its solution goes far beyond simply defining distance metrics. Given several samples per class, one can derive efficient probabilistic models on the manifold by exploiting the class ensemble populations in a way that is also consistent with the geometric structure of these manifolds. This offers significant improvements in performance over, say, a distance-based nearest-neighbor classifier. In addition, statistical models also provide us with generative capabilities via appropriate sampling strategies.

In this paper, we concern ourselves with two related manifolds that often appear in several applications in computer vision – the Stiefel manifold and the Grassmann manifold. The Stiefel manifold is the space of k orthonormal vectors in R^m, represented by an m × k matrix Y such that Y^T Y = I_k. The Grassmann manifold is the space of k-dimensional subspaces in R^m. While a point on the Grassmann manifold represents a subspace, a point on the Stiefel manifold also specifies exactly what frame (choice of basis vectors) is used to specify the subspace. The study of these manifolds has important consequences for applications such as dynamic textures [24, 9], human activity modeling and recognition [7, 30], video based face recognition [2] and shape analysis [14, 20], where data naturally lies either on the Stiefel or the Grassmann manifold.

The geometric properties of the Stiefel and Grassmann manifolds are well understood, and we refer the interested reader to [13, 1] for specific details. The scope of this paper is not restricted to the differential geometry of these manifolds. Instead, we are interested in statistical modeling and inference tools on these manifolds and their applicability in vision applications. While a meaningful distance metric is an important requirement, statistical methods (such as learning probability density functions) provide far richer tools for pattern classification problems. In this context, we describe the Procrustes distance measures on the Stiefel and the Grassmann manifolds [10]. Further, we describe parametric and non-parametric kernel based density functions on these manifolds and describe learning algorithms for estimating the distribution from data.

Prior Work: Statistical methods on manifolds have been studied for several years in the statistics community [5, 21, 22]. A compilation of research results on statistical analysis on the Stiefel and Grassmann manifolds can be found in [10]. Their utility in practical applications has not yet been fully explored. Theoretical foundations for manifold-based shape analysis were described in [14, 20]. The Grassmann manifold structure of the affine shape space is exploited in [4] to perform affine invariant clustering of shapes. [26] exploited the geometry of the Grassmann manifold for subspace tracking in array signal processing applications. Statistical learning of shape classes using non-linear shape manifolds was presented in [27], where statistics are learnt on the manifold's tangent space. Classification on Riemannian manifolds has also been explored in the vision community, such as in [28, 29].

Organization of the paper: In section 2, we present motivating examples where Stiefel and Grassmann manifolds arise naturally in vision applications. In section 3, a brief review of statistical modeling theory on the Stiefel and Grassmann manifolds is presented. In section 4, we demonstrate the strength of the framework on several example applications including view invariant activity recognition, video based face recognition, and shape matching and retrieval. Finally, section 5 presents some concluding remarks.

2. Motivating Examples

1. Spatio-temporal dynamical models: A wide variety of spatio-temporal data in computer vision are modeled as realizations of dynamical models. Examples include dynamic textures [24], human joint angle trajectories [7] and silhouette sequences [30]. One popular dynamical model for such time-series data is the autoregressive and moving average (ARMA) model. For the ARMA model, closed form solutions for learning the model parameters have been proposed in [19, 24] and are widely used. The parameters of the model are known to lie on the Stiefel manifold, as noted in [23]. Given several instances, current approaches involve computing the distance between them using well-known distance measures [11] followed by nearest neighbor classification. Instead, given several instances of each class we can learn compact class conditional probability density functions over the parameter space – the Stiefel manifold in this case. Maximum likelihood and maximum a posteriori estimation can then be performed in order to solve problems such as video classification, clustering and retrieval.

2. Shape Analysis: Representation and recognition of shapes is a well understood field [15, 12]. The shape observed in an image is a perspective projection of the original shape. In order to account for this, shape theory studies the equivalence class of all configurations that can be obtained by a specific class of transformations (e.g. linear, affine, projective) on a single basis shape. It can be shown that affine and linear shape spaces for specific configurations can be identified with points on the Grassmann manifold [20]. Given several exemplar shapes belonging to a few known classes, we are interested in estimating a probability distribution over the shape space for each of the classes. These can then be used for problems such as retrieval and classification, or even to learn a generative model for shapes. The theory developed here may also impact several related problems such as video stabilization and image registration, which amount to finding the best warping based on mean affine shapes of a few landmark points.

3. On-line Visual Learning via Subspace Tracking: Applications involving dynamic environments and autonomous agents, such as a mobile robot navigating through an unknown space, cannot be represented by static models. In such applications it is important to adapt models that have been learnt offline according to new observations, in an online fashion. One approach is to perform incremental PCA to dynamically learn a better representational model as the appearance of the target dynamically changes, as in [3]. Incremental PCA has also been used to recognize abnormalities in the visual field of a robot, as in [18]. In an unrelated domain, the theory of subspace tracking on the Grassmann manifold [26] has been developed for array signal processing applications. Since PCA basis vectors represent a subspace, which is identified by a point on the Grassmann manifold, subspace tracking lends itself readily to statistical analysis for online visual learning applications.

3. Statistics on Manifolds

In this section, we discuss the problem of pattern recognition on Stiefel and Grassmann manifolds. Specifically, parametric and non-parametric distributions and parameter estimation are reviewed. Procrustes analysis and the corresponding distance metrics on the manifolds are also presented. The discussion will focus mainly on the Stiefel manifold; similar results extend to the Grassmann manifold [10].

3.1. Definitions

The Stiefel manifold V_{k,m} [10]: The Stiefel manifold V_{k,m} is the space whose points are k-frames in R^m, where a set of k orthonormal vectors in R^m is called a k-frame in R^m (k <= m). Each point on the Stiefel manifold V_{k,m} can be represented as an m × k matrix X such that X^T X = I_k, where I_k is the k × k identity matrix.

… the reduced rank (rank = p) singular value decomposition of X. The maximum likelihood estimators for Γ, Θ are given by …