The Space of Essential Matrices As a Riemannian Quotient Manifold∗

SIAM J. IMAGING SCIENCES © 2017 Society for Industrial and Applied Mathematics, Vol. 10, No. 3, pp. 1416–1445

Roberto Tron† and Kostas Daniilidis‡

Abstract. The essential matrix, which encodes the epipolar constraint between points in two projective views, is a cornerstone of modern computer vision. Previous works have proposed different characterizations of the space of essential matrices as a Riemannian manifold. However, they either do not consider the symmetric role played by the two views or do not fully take into account the geometric peculiarities of the epipolar constraint. We address these limitations with a characterization as a quotient manifold that can be easily interpreted in terms of camera poses. While our main focus is on theoretical aspects, we include applications to optimization problems in computer vision.

Key words. epipolar geometry, Riemannian geometry, optimization

AMS subject classifications. 68Q25, 68R10, 68U05

DOI. 10.1137/16M1091332

1. Introduction. The essential matrix and the epipolar constraint, introduced by [16], have been a mainstay of computer vision for the last 30 years and are the basic building block in any Structure from Motion (SfM) system. The robust estimation of the essential matrix from image data is now standard course material (see the textbooks from [13] and [17]). In practical terms, an essential matrix encodes an epipolar configuration (i.e., a Euclidean motion between two camera views) as a matrix in $\mathbb{R}^{3\times 3}$. The space of essential matrices is a subset of $\mathbb{R}^{3\times 3}$, but the algebraic relations imposed by the epipolar constraint render its geometry and its relation with the space of epipolar configurations far from trivial. There have been a few attempts to characterize the space of essential matrices as a Riemannian manifold.
The earliest works in this direction are from [22] and [18], with a follow-up from [7]; in these works, one of the two views is chosen as the global reference frame, and essential matrices are parametrized using the (normalized) relative poses between cameras (unit vectors for the translations and rotation matrices). This parametrization implies a preferential treatment of one of the two cameras and breaks the natural symmetry of the constraint. A different representation, based on the Singular Value Decomposition (SVD) of the essential matrix, has been used in several papers [9, 14, 24, 25]. While this representation has a natural symmetry, previous works do not provide an intuitive geometric interpretation of its parameters. In addition, they do not completely take into account the well-known twisted-pair ambiguity, i.e., the fact that four different epipolar configurations correspond to the same essential matrix (with an arbitrary choice of sign). Moreover, when considered, the algorithm used for the computation of the logarithm map (which is related to the notion of geodesics in this space) is neither efficient nor rigorously motivated.

∗ Received by the editors August 29, 2016; accepted for publication (in revised form) March 27, 2017; published electronically August 31, 2017. A preliminary version of part of this work has appeared in the conference proceeding [28]. http://www.siam.org/journals/siims/10-3/M109133.html
Funding: The work of the authors was supported by grants NSF-IIP-0742304, NSF-OIA-1028009, ARL MAST-CTA W911NF-08-2-0004, and ARL RCTA W911NF-10-2-0016, NSF-DGE-0966142, and NSF-IIS-1317788.
† Department of Mechanical Engineering, Boston University, Boston, MA 02215 ([email protected], http://sites.bu.edu/tron).
‡ GRASP Lab, University of Pennsylvania, Philadelphia, PA 19104 ([email protected], www.cis.upenn.edu/~kostas).
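The twisted-pair ambiguity mentioned in the introduction can be checked numerically. The sketch below is only an illustration under stated assumptions: it uses the common textbook convention $E = [t]_\times R$ with a unit-norm translation (the paper's own parametrization may differ), and the names `hat`, `twisted_pair_candidates`, `R0`, and `t0` are hypothetical helpers introduced here. It builds an essential matrix from a known pose, recovers four candidate poses from its SVD, and verifies that each of them reproduces the same matrix up to sign.

```python
import numpy as np

def hat(a):
    """Cross-product matrix [a]_x, so that hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def twisted_pair_candidates(E):
    """Return four (R, t) pairs obtained from the SVD of E (assumed E = [t]_x R)."""
    U, _, Vt = np.linalg.svd(E)
    # Force U and V to be proper rotations; a sign flip only maps E to -E.
    if np.linalg.det(U) < 0:
        U = -U
    if np.linalg.det(Vt) < 0:
        Vt = -Vt
    # Rotation by pi/2 about the z-axis.
    W = np.array([[0.0, -1.0, 0.0],
                  [1.0, 0.0, 0.0],
                  [0.0, 0.0, 1.0]])
    t = U[:, 2]  # left singular vector of the zero singular value
    return [(U @ Wk @ Vt, s * t) for Wk in (W, W.T) for s in (1.0, -1.0)]

# Hypothetical ground-truth pose used only for this check.
a, b = 0.3, -0.5
Rz = np.array([[np.cos(a), -np.sin(a), 0.0],
               [np.sin(a), np.cos(a), 0.0],
               [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0],
               [0.0, np.cos(b), -np.sin(b)],
               [0.0, np.sin(b), np.cos(b)]])
R0 = Rz @ Rx
t0 = np.array([1.0, 2.0, 3.0])
t0 /= np.linalg.norm(t0)
E = hat(t0) @ R0

candidates = twisted_pair_candidates(E)
# Distance of each reconstructed matrix from E, allowing for a global sign.
residuals = [min(np.linalg.norm(hat(t) @ R - E), np.linalg.norm(hat(t) @ R + E))
             for R, t in candidates]
print(residuals)
```

All four residuals should be at the level of floating-point round-off, and the original pair `(R0, t0)` appears among the candidates up to the sign of the translation. Selecting the physically valid candidate requires extra information such as the chirality constraint discussed in the contributions.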
In this work, we propose characterizations of the spaces of essential matrices and epipolar configurations as Riemannian quotient manifolds. We make the following contributions:
1. Our representation is related to the aforementioned SVD formulation, but we derive our results from a particular choice of the global reference frame, leading to a clear geometric interpretation of the parameters.
2. We clarify the relation between the chirality constraint (i.e., the constraint that all the points lie in front of both cameras), the space of essential matrices, and the space of epipolar configurations.
3. We use the theory of quotient manifolds to endow the space of essential matrices and the space of epipolar configurations with a Riemannian manifold structure. This procedure leads to a natural characterization of geodesics, distance, and curvature from those defined in the space of rotations.
4. We provide expressions for the curvature of the manifolds, showing that it is nonnegative. This is an important fact for some optimization algorithms.
5. Our treatment includes procedures to efficiently and correctly compute the logarithm map and distance function between points on the manifolds.
6. We apply the theory to problems in two-view SfM, showing that the proposed representation provides an effective way to parametrize optimization problems and a meaningful notion of distance between epipolar configurations.

Some material in this paper might appear quite basic to any reader versed in computer vision. However, it is necessary to revisit it and place it in the context of our parametrization.

Paper outline. The paper is organized as follows. We first introduce our notation and review basic concepts in Riemannian geometry and group theory (section 2).
We then derive a canonical decomposition of essential matrices that is given by a particular choice of the global reference frame (subsection 4.1), use it to characterize the space of essential matrices as a quotient space (subsection 4.2), and show its interpretation in terms of image vectors (subsection 4.3). Using the chirality constraint, we derive the signed normalized essential space from the space of essential matrices and show that it is a quotient manifold (subsection 5.3); we derive expressions and algorithms for computing geodesics, distances, and curvature of this manifold (subsections 5.4 to 5.6). We then use these results to go back to the space of essential matrices and show that it is also a manifold (section 6). Finally, we show an application of the theory to optimization problems in computer vision (section 7).

2. Definitions and notation. In this section, we define the notation used in this paper and review several notions from Riemannian geometry and group theory. For the most part, these are well-established results, and we include here just the minimum necessary to follow the paper while referring the reader to the literature for complete and rigorous definitions [6, 21]. Nonetheless, subsection 2.7 includes results for $SO(3) \times SO(3)$ as a Lie group that, although simple, have never been explicitly presented before.

2.1. General notation. We denote as $I \in \mathbb{R}^{3\times 3}$ the identity matrix and as $P_z = \mathrm{diag}(1, 1, 0)$ the standard projector on the xy-plane. As customary, we use $\mathfrak{so}(3)$ to indicate the space of $3 \times 3$ skew-symmetric matrices. For standard vectors $a \in \mathbb{R}^3$, $[a]_\times : \mathbb{R}^3 \to \mathfrak{so}(3)$ denotes the matrix representation of the cross-product operator, i.e., $[a]_\times b = a \times b$ for all $a, b \in \mathbb{R}^3$. We use $[a]_\times^{\mathrm{inv}} : \mathfrak{so}(3) \to \mathbb{R}^3$ to denote the inverse of this linear mapping.
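For concreteness, the cross-product matrix and its inverse can be written out entrywise. The component names $a_1, a_2, a_3$ are introduced here only for illustration and are not notation taken from the paper.

```latex
% Explicit form of the cross-product matrix for a = (a_1, a_2, a_3)^T
[a]_\times =
\begin{bmatrix}
 0   & -a_3 &  a_2 \\
 a_3 &  0   & -a_1 \\
-a_2 &  a_1 &  0
\end{bmatrix},
\qquad
[a]_\times b = a \times b =
\begin{bmatrix}
a_2 b_3 - a_3 b_2 \\
a_3 b_1 - a_1 b_3 \\
a_1 b_2 - a_2 b_1
\end{bmatrix}.
% The inverse mapping reads the vector back from the (3,2), (1,3), and (2,1) entries
% of the skew-symmetric matrix, which hold a_1, a_2, and a_3, respectively.
```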
We use $\mathrm{sym}(A)$ and $\mathrm{skew}(A)$ to denote, respectively, the symmetric and antisymmetric parts of a square matrix $A \in \mathbb{R}^{d\times d}$, that is,

$$\mathrm{sym}(A) = \frac{1}{2}\left(A + A^T\right), \qquad \mathrm{skew}(A) = \frac{1}{2}\left(A - A^T\right). \tag{1}$$

2.2. Riemannian geometry. At a high level, a manifold $\mathcal{M}$ is defined by a topological space that is Hausdorff¹ and that is equipped with a set of overlapping local coordinate charts. These charts locally parametrize the space, and it is possible to transition smoothly from one chart to the other. The tangent space at a point $x \in \mathcal{M}$, denoted as $T_x\mathcal{M}$, can be defined as the linear space containing all the tangent vectors corresponding to the curves passing through $x$. We use the notation $v^\vee$ to denote the vector of coordinates of $v$ in some basis for $T_x\mathcal{M}$. A vector field $V$ assigns a tangent vector $v = V|_x$ to each $x$ in $\mathcal{M}$ or a subset of it. We denote as $\mathfrak{X}(\mathcal{M})$ the set of smooth vector fields on $\mathcal{M}$. Given $V, W \in \mathfrak{X}(\mathcal{M})$, the Lie bracket of two vector fields is denoted as $[V, W] \in \mathfrak{X}(\mathcal{M})$. Intuitively, $[V, W]$ represents the "derivative" of a field with respect to another and assumes a simple expression for the manifolds and fields considered in this paper (as we will see in subsection 2.7).

A Riemannian manifold $(\mathcal{M}, \langle\cdot,\cdot\rangle)$ is a manifold endowed with a Riemannian metric, a collection of inner products $\langle\cdot,\cdot\rangle_x$ over $T_x\mathcal{M}$ that varies smoothly with $x$. The metric is used to define the length of a curve $\gamma : \mathbb{R} \supset [a, b] \to \mathcal{M}$. A curve is a geodesic if the covariant derivative of its tangent is zero, i.e., $\nabla_{\dot\gamma}\dot\gamma \equiv 0$, where $\nabla$ is the Levi-Civita connection. The exponential map $\exp_x : T_x\mathcal{M} \to \mathcal{M}$ maps each tangent vector $v$ to the endpoint of the unit-speed geodesic starting at $x$ with tangent $v$. The logarithm map $\log_x$ is the inverse of $\exp_x$ and is defined (in general) only on a neighborhood of $x$. We use the shorthand notation $\mathrm{Log} = \log^\vee$ to denote the logarithm map expressed in local tangent space coordinates.
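For any point $x$ and any curve $\gamma(t)$ in $\mathcal{M}$ sufficiently close together, the logarithm is related to the distance function by the relations

$$d\bigl(x, \gamma(t)\bigr) = \bigl\|\mathrm{Log}_x\bigl(\gamma(t)\bigr)\bigr\|, \qquad \frac{d}{dt}\,\frac{1}{2}\, d^2\bigl(x, \gamma(t)\bigr) = -\mathrm{Log}_x\bigl(\gamma(t)\bigr)^T \dot\gamma(t)^\vee. \tag{2}$$

Given the Levi-Civita connection, one can define an intrinsic notion of curvature of the space.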
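The exponential map, the logarithm map in coordinates, and the first relation in (2) can be made concrete on the rotation group $SO(3)$, which the paper uses as a building block. The sketch below is an illustration under assumptions not taken from the text: tangent vectors at $R$ are written as $R[v]_\times$ with coordinates $v \in \mathbb{R}^3$, the metric is normalized so that the norm of $R[v]_\times$ equals $\|v\|$, and the helper names `hat`, `vee`, `rot_exp`, and `rot_log` are hypothetical. The logarithm is only valid for rotation angles below $\pi$.

```python
import numpy as np

def hat(v):
    """Cross-product matrix [v]_x of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def vee(S):
    """Inverse of hat: read the 3-vector back from a skew-symmetric matrix."""
    return np.array([S[2, 1], S[0, 2], S[1, 0]])

def rot_exp(R, v):
    """Exponential map at R for the tangent vector R @ hat(v) (Rodrigues' formula)."""
    theta = np.linalg.norm(v)
    if theta < 1e-12:
        return R.copy()
    K = hat(v / theta)
    return R @ (np.eye(3) + np.sin(theta) * K + (1.0 - np.cos(theta)) * K @ K)

def rot_log(R, Q):
    """Coordinates of log_R(Q); assumes the angle between R and Q is below pi."""
    M = R.T @ Q
    cos_theta = np.clip((np.trace(M) - 1.0) / 2.0, -1.0, 1.0)
    theta = np.arccos(cos_theta)
    if theta < 1e-12:
        return vee(M - M.T) / 2.0
    return theta / (2.0 * np.sin(theta)) * vee(M - M.T)

# A base point and a tangent direction chosen arbitrarily for the check.
R = rot_exp(np.eye(3), np.array([0.3, 0.1, -0.2]))
v = np.array([0.2, -0.4, 0.1])

# Along the geodesic t -> exp_R(t v), the distance from R is the norm of the logarithm.
for t in np.linspace(0.0, 1.0, 5):
    dist = np.linalg.norm(rot_log(R, rot_exp(R, t * v)))
    print(f"t = {t:.2f}  distance = {dist:.6f}  t*|v| = {t * np.linalg.norm(v):.6f}")
```

The two printed columns should agree, reflecting that the logarithm inverts the exponential near the base point and that its norm gives the geodesic distance. The quotient manifolds in the paper require additional steps beyond this single-rotation case.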
