An efficient Exact-PGA algorithm for constant curvature manifolds

Rudrasis Chakraborty^1, Dohyung Seo^2, and Baba C. Vemuri^1
^1 Department of CISE, University of Florida, FL 32611, USA
^2 U-Systems, A GE Healthcare Company, CA, USA
{rudrasis, vemuri}@cise.ufl.edu, dhseo.118@gmail.com

Abstract

Manifold-valued datasets are widely encountered in many computer vision tasks. A non-linear analog of the PCA algorithm, called the Principal Geodesic Analysis (PGA) algorithm, suited for data lying on Riemannian manifolds, was reported in the literature a decade ago. Since the objective function in the PGA algorithm is highly non-linear and hard to solve efficiently in general, researchers have proposed a linear approximation. Though this linear approximation is easy to compute, it lacks accuracy, especially when the data exhibit a large variance. Recently, an alternative called the exact PGA was proposed which tries to solve the optimization without any linearization. For general Riemannian manifolds, though it yields better accuracy than the original (linearized) PGA for data that exhibit large variance, the optimization is not computationally efficient. In this paper, we propose an efficient exact PGA algorithm for constant curvature Riemannian manifolds (CCM-EPGA). The CCM-EPGA algorithm differs significantly from existing PGA algorithms in two aspects: (i) the distance between a given manifold-valued data point and the principal submanifold is computed analytically, and thus no optimization is required as in the existing methods; (ii) unlike the existing PGA algorithms, the descent into codimension-1 submanifolds does not require any optimization but is accomplished through the use of the Riemannian inverse Exponential map and the parallel transport operations. We present theoretical and experimental results for constant curvature Riemannian manifolds depicting favorable performance of the CCM-EPGA algorithm compared to existing PGA algorithms. We also present data reconstruction from the principal components, which has not been reported in the literature in this setting.

1. Introduction

Principal Component Analysis (PCA) is a widely used dimensionality reduction technique in Science and Engineering. PCA however requires the input data to lie in a vector space. With the advent of new technologies and widespread use of sophisticated feature extraction methods, manifold-valued data have become ubiquitous in many fields including, but not limited to, Computer Vision, Medical Imaging and Machine Learning. A nonlinear version of PCA, called the Principal Geodesic Analysis (PGA), for data lying on Riemannian manifolds was introduced in [8].

Since the objective function of PGA is highly non-linear and hard to solve in general, researchers proposed a linearized version of the PGA [8]. Though this linearized PGA, hereafter referred to as PGA, is computationally efficient, it lacks accuracy for data with large spread/variance. In order to solve the objective function exactly, Sommer et al. [25] proposed to solve the original objective function (not the approximation) and called it exact PGA. While exact PGA attempts to solve this complex nonlinear optimization problem, it is however computationally inefficient. Though it is not possible to efficiently and accurately solve this optimization problem for a general manifold, for manifolds with constant sectional curvature we formulate an efficient and exact PGA algorithm, dubbed CCM-EPGA. It is well known in geometry, by virtue of the Killing-Hopf theorem [4], that any non-zero constant curvature manifold is isomorphic to either the hypersphere (S^N) or the hyperbolic space (H^N); hence, in this work, we present the CCM-EPGA formulation for S^N and H^N. Our formulation has several applications to Computer Vision and Statistics including directional data [21] and color spaces [19]. Several other applications of hyperbolic geometry are shape analysis [30], Electrical Impedance Tomography, Geoscience Imaging [28], Brain Morphometry [29], Catadioptric Vision [3], etc.

In order to depict the effectiveness of our proposed CCM-EPGA algorithm, we use the average projection error as defined in [25]. We also report the computational time comparison of the CCM-EPGA with the PGA [8] and the exact PGA [25] algorithms respectively. Several variants of the PGA exist in the literature and we briefly mention a few here. In [23], the authors computed the principal geodesics (without approximation) only for a special Lie group, SO(3). Geodesic PCA (GPCA) [14, 13] solves a different optimization function, namely, optimizing the projection error along the geodesics. The authors in [13] minimize the projection error instead of maximizing variance in geodesic subspaces (defined later in the paper). GPCA does not use a linear approximation, but it is restricted to manifolds where a closed form expression for the geodesics exists. More recently, a probabilistic version of PGA called PPGA was presented in [31], which is a nonlinear version of PPCA [27]. None of these methods attempt to compute the solution to the exact PGA problem defined in [25]. Another recent work in [11] reports a non-linear generalization of PGA, namely the principal geodesic curves, and argues about its usefulness over PGA.

The rest of the paper is organized as follows. In Section 2, we present the formulation of PGA. We also discuss the details of the linearized version of PGA [8] and exact PGA [25]. Our formulation of CCM-EPGA is presented in Section 2. Experimental results for the CCM-EPGA algorithm along with comparisons to exact PGA and PGA are presented in Section 3. In addition to synthetic data experiments, we present the comparative performance of CCM-EPGA on two real data applications. In Section 4, we present the formulation for the reconstruction of data from principal directions and components in this nonlinear setting. Finally, in Section 5, we draw conclusions.

2. Principal Geodesic Analysis

Principal Component Analysis (PCA) [17] is a well known and widely used statistical method for dimensionality reduction. Given a vector valued dataset, it returns a sequence of linear subspaces that maximize the variance of the projected data. The k-th subspace is spanned by the principal vectors {v_1, v_2, ..., v_k}, which are mutually orthogonal. PCA is well suited for vector-valued data sets but not for manifold-valued inputs. A decade ago, the nonlinear version called the Principal Geodesic Analysis (PGA) was developed to cope with manifold-valued inputs [8]. In this section, first, we briefly describe this PGA algorithm; then, we show the key modification performed in [25] to arrive at what they termed the exact PGA algorithm. We then motivate and present our approach, which leads to an efficient and novel algorithm for exact PGA on constant curvature manifolds (CCM-EPGA).

Let M be a Riemannian manifold. Let us suppose we are given a dataset, X = {x_1, ..., x_n}, where x_j ∈ M. Let us assume that the finite sample Fréchet mean [9] of the data set exists and is denoted by µ. Let V_k be the space spanned by mutually orthogonal vectors (principal directions) {v_1, ..., v_k}, v_j ∈ T_µM, ∀j. Let S_k be the k-th geodesic subspace of T_µM, i.e., S_k = Exp_µ(V_k), where Exp is the Riemannian exponential map (see [4] for the definition). Then, the principal directions v_i are defined recursively by

    v_i = \arg\max_{\|v\|=1,\; v \in V_{i-1}^{\perp}} \frac{1}{n} \sum_{j=1}^{n} d^2(\mu, \Pi_{S_i}(x_j))    (1)

    S_i = \mathrm{Exp}_{\mu}(\mathrm{span}(V_{i-1}, v_i))    (2)

where d(x, y) is the geodesic distance between x ∈ M and y ∈ M, and Π_S(x) is the point in S closest to x ∈ M. The PGA algorithm on M is summarized in Alg. 1.

Algorithm 1 The PGA algorithm on manifold M
 1: Given a data set X = {x_1, ..., x_n} ⊂ M, and 1 ≤ L ≤ dim(M)
 2: Compute the FM, µ, of X [1]
 3: Set k ← 1
 4: Set {x̄_1^0, ..., x̄_n^0} ← {x_1, ..., x_n}
 5: while k ≤ L do
 6:   Solve v_k = \arg\max_{\|v\|=1,\; v \in T_\mu M,\; v \in V_{k-1}^{\perp}} \frac{1}{n} \sum_{j=1}^{n} d^2(\mu, \Pi_{S_k}(\bar{x}_j^{k-1})) as in Eq. (1).
 7:   Project {x̄_1^{k-1}, ..., x̄_n^{k-1}} to a co-dimension one submanifold Z of M which is orthogonal to the current geodesic subspace.
 8:   Set the projected points to {x̄_1^k, ..., x̄_n^k}
 9:   k ← k + 1
10: end while

2.1. PGA and exact PGA

In Alg. 1 (lines 6-7), the projection operator Π is hard to compute; hence, a common alternative is to locally linearize the manifold. This approach [8] maps all data points onto the tangent space at µ, and as the tangent plane is a vector space, one can use PCA to compute the principal directions. This simple scheme is an approximation to the PGA and naturally raises the following question: Is it possible to do PGA (solve Eq. (1)) without any linearization? The answer is yes. But computation of the projection operator Π_S(x), i.e., the closest point to x in S, is computationally expensive. In [25], Sommer et al. give an alternative formulation for the PGA by minimizing the average squared reconstruction error, i.e., d^2(x_j, \Pi_{S_i}(x_j)) instead of d^2(\mu, \Pi_{S_i}(x_j)) in Eq. (1). They use an optimization scheme to compute this projection. Further, they termed their algorithm exact PGA, as it does not require any linearization. However, their optimization scheme is in general computationally expensive and for

... given by \mathrm{Exp}_{\psi}^{-1}(\bar{\psi}) = \frac{\theta}{\sin\theta}(\bar{\psi} - \psi\cos\theta), where \theta = d(\psi, \bar{\psi}).

2.2.2 Basic Riemannian Geometry of H^N

The hyperbolic N-dimensional manifold can be embedded in R^{N+1} using any of three different models.
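The linearized PGA of [8] described in Section 2.1 (lift the data to T_µM with the inverse exponential map, then run ordinary PCA in that vector space) is straightforward to sketch for S^N. Below is a minimal NumPy illustration on the sphere, not the authors' implementation: the helper names `sphere_exp`, `sphere_log`, `frechet_mean` and the fixed-point iteration for the Fréchet mean are our own illustrative choices. `sphere_log` implements the spherical inverse exponential map Exp_ψ^{-1}(ψ̄) = (θ/sin θ)(ψ̄ − ψ cos θ) quoted in the text.

```python
import numpy as np

def sphere_exp(p, v):
    """Riemannian exponential map on S^N (unit sphere in R^(N+1))."""
    nv = np.linalg.norm(v)
    if nv < 1e-12:
        return p.copy()
    return np.cos(nv) * p + np.sin(nv) * (v / nv)

def sphere_log(p, q):
    """Inverse exponential map: (theta/sin theta) (q - p cos theta), theta = d(p, q)."""
    theta = np.arccos(np.clip(np.dot(p, q), -1.0, 1.0))
    if theta < 1e-12:
        return np.zeros_like(p)
    return (theta / np.sin(theta)) * (q - p * np.cos(theta))

def frechet_mean(X, iters=100):
    """Fixed-point iteration for the sample Frechet mean of points on the sphere."""
    mu = X[0]
    for _ in range(iters):
        mu = sphere_exp(mu, np.mean([sphere_log(mu, x) for x in X], axis=0))
    return mu

def linearized_pga(X, k):
    """Tangent-space PCA: lift data to T_mu S^N via the log map, run PCA there."""
    mu = frechet_mean(X)
    # At the Frechet mean the lifted vectors have (numerically) zero mean,
    # so the SVD of the stacked tangent vectors gives the principal directions.
    V = np.stack([sphere_log(mu, x) for x in X])
    _, _, Vt = np.linalg.svd(V, full_matrices=False)
    return mu, Vt[:k]  # k orthonormal principal directions in T_mu S^N
```

The returned directions are unit vectors lying in the tangent plane at µ (orthogonal to µ in the embedding), which is exactly the approximation whose accuracy degrades for data with large spread.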
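Section 2.2.2 notes that H^N embeds in R^{N+1}. In the hyperboloid model (one of the three models alluded to), H^N = {x ∈ R^{N+1} : ⟨x, x⟩_L = −1, x_0 > 0} with the Lorentzian inner product ⟨x, y⟩_L = −x_0 y_0 + Σ_i x_i y_i, and the exponential and inverse exponential maps have closed forms exactly analogous to the spherical ones, with cos/sin replaced by cosh/sinh. A minimal sketch (the helper names are ours, not from the paper):

```python
import numpy as np

def minkowski(x, y):
    """Lorentzian inner product on R^(N+1): <x,y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + np.dot(x[1:], y[1:])

def hyp_exp(p, v):
    """Exponential map on the hyperboloid model of H^N."""
    nv = np.sqrt(max(minkowski(v, v), 0.0))  # tangent vectors are spacelike
    if nv < 1e-12:
        return p.copy()
    return np.cosh(nv) * p + np.sinh(nv) * (v / nv)

def hyp_log(p, q):
    """Inverse exponential map: (theta/sinh theta) (q - p cosh theta),
    where theta = d(p, q) = arccosh(-<p,q>_L)."""
    theta = np.arccosh(max(-minkowski(p, q), 1.0))
    if theta < 1e-12:
        return np.zeros_like(p)
    return (theta / np.sinh(theta)) * (q - p * np.cosh(theta))
```

Compare `hyp_log` with the spherical formula Exp_ψ^{-1}(ψ̄) = (θ/sin θ)(ψ̄ − ψ cos θ): only the trigonometric functions and the inner product change, which is why a single CCM-EPGA formulation covers both constant curvature cases.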