
Relative Camera Pose Recovery and Scene Reconstruction with the Essential Matrix in a Nutshell

George Terzakis

Technical Report#: MIDAS.SNMSE.2013.TR.007

Plymouth University, March 11, 2013

Abstract

This report presents and analyzes all the steps one needs to take in order to build a simple application for relative-pose structure from motion using only the epipolar constraint. In other words, it illustrates in practice (i.e., cook-book style) how the most fundamental concepts in multiple view geometry can be combined into a complete and disambiguated series of steps for camera motion estimation and 3D reconstruction in a relatively simple computer program. In particular, the essential matrix is defined with respect to the algebraic interpretation of the epipolar constraint, in order to eliminate ambiguities with respect to how relative orientation is extracted from that constraint. Based on this formulation, we list the prominent properties of the essential matrix with proofs, to provide the necessary intuition for the series of derivations that ultimately lead to the recovery of the orientation and the baseline vector.

1. Introduction

The problem of 3D reconstruction and odometry estimation has been under the spotlight of research in computer vision and robotics for the past two decades. Since the early paper by Longuet-Higgins [1], which built an initial theoretical layer on the original photogrammetric equations by introducing the epipolar constraint, there have been numerous studies and proposed algorithms to estimate camera motion and to recover the world locations of a sparse cloud of points [2-5]. Such problems are now known as structure from motion (SFM) problems and fall in the broader field of multiple view geometry in computer vision. Although the geometry of multiple images has been extensively researched, structure from motion remains a vastly unexplored territory, mainly due to the great informational bandwidth that images carry. The SFM pipeline starts with the detection of corresponding points in two distinct images of the same scene. There have been several algorithms that match points across multiple views, known as optical flow estimation methods [6-8]. Although such methods are extensively employed with a great deal of success in many cases, they nevertheless rely on a number of strong assumptions such as brightness constancy, absence of occlusions and relatively slow camera motion. Limitations in point matching will consequently propagate a fair amount of noise throughout the geometrical computations, thereby producing unstable results in many cases.

2. The Epipolar Constraint

2.1 Moving Between Coordinate Frames

The goal of this section is to derive a formula for the coordinates of a 3D point $M$ in a camera frame located at $b$, given the orientation of that frame as three unit vectors $(e_1, e_2, e_3)$, where $b$ and $e_1, e_2, e_3$ are expressed in terms of the first camera frame, which is chosen as the global reference. Figure 1 illustrates the two frames, the baseline and the vectors that connect the two camera centers with the point $M$.

Figure 1. The two camera frames; the baseline and the vectors that connect the two camera centers $O_1$, $O_2$ with $M$ are indicated with dashed lines.

Let now $M_1$ be the coordinates of $M$ in the first camera frame and $M_2$ the respective coordinates in the second camera frame. Also, let $(O_2M)_1$ be the vector $\overrightarrow{O_2M}$ expressed in the coordinate frame of the first camera. Then, the coordinates $M_2$ will be the projections of $(O_2M)_1$ on the direction vectors $e_1$, $e_2$, $e_3$:

$$M_2 = \begin{bmatrix} e_1^T (O_2M)_1 \\ e_2^T (O_2M)_1 \\ e_3^T (O_2M)_1 \end{bmatrix} = [e_1 \;\; e_2 \;\; e_3]^T (O_2M)_1 = R^T (O_2M)_1 \qquad (1)$$

But $(O_2M)_1$ can be written as the difference of the vectors $(O_1M)_1 = M_1$ and $b$:

$$(O_2M)_1 = M_1 - b \qquad (2)$$

Substituting from (2) into (1) yields:

$$M_2 = R^T (M_1 - b) \qquad (3)$$

2.2 The Essential Matrix

With the above in place, consider now the normalized Euclidean projections $m_1$ and $m_2$ of $M$. Epipolar geometry in two views concerns the geometry of the plane induced by the baseline and the projection rays that connect the real-world location of a feature with the two projection (camera) centers. Thus, for each pair of correspondences there exists a plane that contains the respective world point and the baseline vector. Figure 2 illustrates the epipolar plane of a pair of correspondences. The projections of the two camera centers in the first and second view are the epipoles $e_1$, $e_2$. The lines $\lambda_1$ and $\lambda_2$ defined by the epipoles and the normalized Euclidean projections $m_1$, $m_2$ are known as the epipolar lines; epipolar lines are, in fact, the projections of the two rays that pass through the camera centers $O_1$, $O_2$ and the point $M$ onto the opposite image planes.
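As an illustration (not part of the original report), the frame change of equation (3) can be checked numerically with NumPy; the rotation and baseline below are made up for the example:

```python
import numpy as np

def rot_z(theta):
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

# Second-camera frame: the columns of R are the frame directions
# e1, e2, e3 expressed in the first camera frame; b is the baseline
# (also in the first frame), as in Section 2.1.
R = rot_z(0.3) @ np.array([[1.0, 0.0, 0.0],
                           [0.0, 0.0, -1.0],
                           [0.0, 1.0, 0.0]])
b = np.array([0.5, -0.2, 0.1])

M1 = np.array([1.0, 2.0, 7.0])   # point in the first camera frame
M2 = R.T @ (M1 - b)              # equation (3)

# Consistency check: expressing M2 back in the first frame recovers M1.
assert np.allclose(R @ M2 + b, M1)
```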

Let now $u_1$ and $u_2$ be the direction vectors (in the first camera coordinate frame) of the projection rays that connect the first and second camera centers with the point $M$. Evidently, $u_1$, $u_2$ and $b$ span the epipolar plane that corresponds to $M$.

Figure 2. Epipolar plane induced by corresponding projections.

An equivalent way of expressing this coplanarity is by considering the orthogonality relationship between $u_2$ and the cross product of $b$ and $u_1$:

$$u_2 \cdot (b \times u_1) = 0 \qquad (4)$$

where $\cdot$ is the inner product operator. Since $u_2 = M_1 - b$ and $u_1 = M_1 = R M_2 + b$, substituting in equation (4) yields:

$$(M_1 - b)^T [b]_\times (R M_2 + b) = 0 \qquad (5)$$

where $[b]_\times$ is the cross-product skew-symmetric matrix of $b$. By simply applying the distributive law and taking into consideration the fact that $[b]_\times b = 0$ and $b^T [b]_\times = 0$, the following constraint is obtained:

$$M_1^T [b]_\times R M_2 = 0 \iff M_2^T R^T [b]_\times M_1 = 0 \qquad (6)$$

(the sign flip introduced by $[b]_\times^T = -[b]_\times$ when transposing is immaterial, since the right-hand side is zero). By definition, the normalized Euclidean projections $m_1$ and $m_2$ are projectively equal to $M_1$ and $M_2$, and therefore the constraint can be re-expressed in terms of $m_1$ and $m_2$:

$$m_1^T [b]_\times R m_2 = 0 \iff m_2^T R^T [b]_\times m_1 = 0 \qquad (7)$$

The essential matrix. The $3 \times 3$ matrix $R^T [b]_\times$ is known in the literature as the essential matrix $E$. Using the essential matrix notation, the epipolar constraint is obtained in its widely recognized form [1]ΒΉ:

$$m_2^T \underbrace{R^T [b]_\times}_{E} m_1 = 0 \qquad (8)$$

ΒΉ Evidently, there can be many essential matrices corresponding to the same set of 2-view projections, depending on the interpretation given to the rigid transformation that links the two camera poses. In this report, the rotation matrix $R$ is perceived as the matrix that contains the second camera frame directions (expressed in the first camera frame) as its columns, and $b$ is the baseline vector (also in the first camera coordinate frame). Thus, the essential matrix will always be given by $E = R^T [b]_\times$.

3. Properties of the Essential Matrix
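As a quick numerical sanity check (an illustration, not from the report), one can synthesize a relative pose, project a point in both views, and confirm that constraint (8) vanishes:

```python
import numpy as np

def skew(v):
    """Cross-product matrix [v]x such that skew(v) @ w == np.cross(v, w)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# A synthetic relative pose: R's columns are the second frame's axes
# in the first frame, b is the baseline (Section 2.1 convention).
angle = 0.4
R = np.array([[np.cos(angle), -np.sin(angle), 0.0],
              [np.sin(angle),  np.cos(angle), 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.2, -0.3])

E = R.T @ skew(b)                    # essential matrix, equation (8)

rng = np.random.default_rng(0)
for _ in range(5):
    M1 = rng.uniform(-1.0, 1.0, 3) + np.array([0.0, 0.0, 6.0])
    M2 = R.T @ (M1 - b)              # point in the second frame
    m1, m2 = M1 / M1[2], M2 / M2[2]  # normalized Euclidean projections
    assert abs(m2 @ E @ m1) < 1e-12  # epipolar constraint m2' E m1 = 0
```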

The most important properties of the essential matrix are listed in this section. These properties are very useful either as constraints or as supplementary formulas in the course of scene structure and relative camera pose estimation.

Lemma 1. Suppose $E$ is an essential matrix. Then the epipoles $e_2$ and $e_1$ span the left and right null spaces of $E$ respectively.

Proof. Since the epipoles are the scaled images of the camera centers, there exist $\lambda_1$ and $\lambda_2$ such that $e_2 = -\lambda_2 R^T b$ and $e_1 = \lambda_1 b$. By the expression of the essential matrix in equation (8), $E = R^T [b]_\times$. Therefore:

$$e_2^T E = -\lambda_2 (R^T b)^T R^T [b]_\times = -\lambda_2 b^T \underbrace{R R^T}_{I_3} [b]_\times = -\lambda_2 \underbrace{b^T [b]_\times}_{=0} = 0 \qquad (9)$$

Also,

$$E e_1 = \lambda_1 R^T \underbrace{[b]_\times b}_{=0} = 0 \qquad (10)$$

Lemma 2. Any pair of corresponding points $m_1$, $m_2$ has associated epipolar lines $l_2$, $l_1$ in the opposite views given by:

$$l_2 = E m_1 = e_2 \times m_2, \qquad l_1 = E^T m_2 = e_1 \times m_1$$
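The null-space property of Lemma 1 is easy to verify numerically; the sketch below (illustrative values only) checks both epipole directions:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

R = np.array([[0.0, -1.0, 0.0],
              [1.0,  0.0, 0.0],
              [0.0,  0.0, 1.0]])   # a 90-degree rotation about z
b = np.array([0.3, -0.4, 1.2])

E = R.T @ skew(b)

e1 = b           # right null space: E e1 = 0   (epipole in view 1)
e2 = -R.T @ b    # left null space:  e2' E = 0  (epipole in view 2)

assert np.allclose(E @ e1, 0.0)
assert np.allclose(e2 @ E, 0.0)
```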

Proof. The proof is a direct consequence of the epipolar constraint for $m_1$ and $m_2$. β–ͺ

Lemma 3. For any orthonormal matrix $R \in \mathbb{R}^{3 \times 3}$ and for any vector $a \in \mathbb{R}^3$ the following holds:

$$[Ra]_\times = R [a]_\times R^T \qquad (11)$$

Proof. Let $R$ be an orthonormal matrix. Then, by the definition of the cross-product matrix, for any vector $v$,

$$(Ra) \times (Rv) = [Ra]_\times R v \qquad (12)$$

We now resort to the following property of the cross product, which holds for any rotation matrix $R$ and any pair of vectors $a$, $v$ (it is the special case, for $\det R = 1$ and $R^{-T} = R$, of the general identity $(Ma) \times (Mv) = \det(M)\, M^{-T} (a \times v)$):

$$(Ra) \times (Rv) = R\, (a \times v) \qquad (13)$$

Combining (12) and (13):

$$[Ra]_\times R v = R (a \times v) = R [a]_\times v \quad \text{for every } v \quad \Longrightarrow \quad [Ra]_\times R = R [a]_\times \qquad (14)$$

and multiplying by $R^T$ from the right,

$$[Ra]_\times = R [a]_\times R^T \qquad (15)$$ β–ͺ

A very significant theorem that provides a simple yet strict criterion for the existence of an essential matrix is the following (Faugeras, Luong et al. 2004).

Theorem 1. A $3 \times 3$ matrix $E$ is an essential matrix if and only if it has a singular value decomposition $E = U S V^T$ such that:

$$S = \mathrm{diag}\{\sigma, \sigma, 0\}, \quad \sigma > 0$$

where $U$, $V$ are orthonormal matrices.

Proof. Let $E$ be an essential matrix. Then $E$ is given by $E = R^T [b]_\times$. Taking $E^T E$ yields:

$$E^T E = (R^T [b]_\times)^T R^T [b]_\times = [b]_\times^T R R^T [b]_\times = [b]_\times^T [b]_\times \qquad (16)$$

Let now $R_0$ be the rotation that aligns the baseline vector with the z-axis, that is, $R_0 b = [0 \;\; 0 \;\; \|b\|]^T$. Then, using Lemma 3, it follows that $[b]_\times = [R_0^T R_0 b]_\times = R_0^T [R_0 b]_\times R_0$. Substituting in (16) yields:

$$E^T E = R_0^T [R_0 b]_\times^T [R_0 b]_\times R_0 = R_0^T \begin{bmatrix} \|b\|^2 & 0 & 0 \\ 0 & \|b\|^2 & 0 \\ 0 & 0 & 0 \end{bmatrix} R_0 \qquad (17)$$

The decomposition in equation (17) is an eigendecomposition (one out of many) of $E^T E$, so the singular values of $E$ are $\|b\|, \|b\|, 0$. Hence $E$ will also decompose as follows:

$$E = U \begin{bmatrix} \|b\| & 0 & 0 \\ 0 & \|b\| & 0 \\ 0 & 0 & 0 \end{bmatrix} R_0 \qquad (18)$$

for some orthonormal matrix $U$. β–ͺ
Let now $E$ be a $3 \times 3$ matrix with exactly two non-zero singular values which are equal. In this case it is easiest to proceed by gradually constructing the sought result. The decomposition of the matrix $E$ is

$$E = U \begin{bmatrix} \sigma & 0 & 0 \\ 0 & \sigma & 0 \\ 0 & 0 & 0 \end{bmatrix} V^T = U S V^T \qquad (19)$$

where $U$ and $V$ are orthonormal matrices. The construction that follows relies on the observation that $S$ can be written in the following way:

$$S = R_z\!\left(\tfrac{\pi}{2}\right)^T \left( R_z\!\left(\tfrac{\pi}{2}\right) S \right) \qquad (20)$$

or,

$$S = R_z\!\left(-\tfrac{\pi}{2}\right)^T \left( R_z\!\left(-\tfrac{\pi}{2}\right) S \right) \qquad (21)$$

where $R_z(\pm\pi/2)$ are rotations by $\pm\pi/2$ about the z-axis. We observe that the product of $S$ with the preceding rotation yields a skew-symmetric matrix:

$$R_z\!\left(\tfrac{\pi}{2}\right) S = \begin{bmatrix} 0 & -\sigma & 0 \\ \sigma & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \left[(0,\,0,\,\sigma)^T\right]_\times \qquad (22)$$

or,

$$R_z\!\left(-\tfrac{\pi}{2}\right) S = \begin{bmatrix} 0 & \sigma & 0 \\ -\sigma & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix} = \left[(0,\,0,\,-\sigma)^T\right]_\times \qquad (23)$$

Now, the property of Lemma 3 can be of great use in a heuristic sense. In particular, it guarantees that for any rotation matrix $V$ and for any skew-symmetric matrix $[a]_\times$, the matrix $V [a]_\times V^T = [Va]_\times$ is also skew-symmetric. In light of this consequence, and choosing (20) with (22), the SVD of $E$ can be expressed as follows:

$$E = U S V^T = U R_z\!\left(\tfrac{\pi}{2}\right)^T R_z\!\left(\tfrac{\pi}{2}\right) S\, V^T = \underbrace{U R_z\!\left(\tfrac{\pi}{2}\right)^T V^T}_{\text{a rotation matrix}} \; \underbrace{V \left( R_z\!\left(\tfrac{\pi}{2}\right) S \right) V^T}_{\text{a skew-symmetric matrix by Lemma 3}} \qquad (24)$$

In the very same way, one arrives at a similar result starting from (21) with (23):

$$E = \underbrace{U R_z\!\left(-\tfrac{\pi}{2}\right)^T V^T}_{\text{a rotation matrix}} \; \underbrace{V \left( R_z\!\left(-\tfrac{\pi}{2}\right) S \right) V^T}_{\text{a skew-symmetric matrix by Lemma 3}} \qquad (25)$$

In either case $E$ has the form $R^T [b]_\times$ with $R = V R_z(\pm\pi/2)\, U^T$ and $b = \pm\sigma v_3$, where $v_3$ is the third column of $V$. The only remaining "loose end" is to show that the matrices $U R_z(\pm\pi/2)^T V^T$ are indeed rotation matrices. Since $U$ and $V$ are orthonormal, $\det(UV^T) = \pm 1$; if $\det(UV^T) = -1$, flipping the sign of the third column of $V$ — which leaves $E = USV^T$ unchanged, since the third singular value is zero — makes it $+1$. With this choice, and since $\det R_z(\pm\pi/2) = 1$, the two aforementioned products are orthonormal matrices with positive determinant, hence rotations. And that concludes the proof. β–ͺ
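Both directions of Theorem 1 can be exercised numerically. The sketch below (an illustration under the report's conventions) builds an essential matrix, checks its singular values, and reassembles it from the rotation/skew-symmetric factorization of equation (24):

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def rot_z(t):
    c, s = np.cos(t), np.sin(t)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

R = rot_z(0.7)
b = np.array([0.6, -0.8, 0.5])
E = R.T @ skew(b)

U, s, Vt = np.linalg.svd(E)
# Theorem 1 (forward): singular values are {||b||, ||b||, 0}.
assert np.allclose(s, [np.linalg.norm(b), np.linalg.norm(b), 0.0])

# Equation (24): rotation times skew-symmetric factor reassembles E.
Rz = rot_z(np.pi / 2)
rot_factor = U @ Rz.T @ Vt                       # a rotation matrix
skew_factor = Vt.T @ Rz @ np.diag(s) @ Vt        # skew by Lemma 3
assert np.allclose(skew_factor, -skew_factor.T, atol=1e-12)
assert np.allclose(rot_factor @ skew_factor, E)
```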

Lemma 4. If $E$ is an essential matrix such that $E = R^T [b]_\times$, then:

$$\mathrm{tr}(E E^T) = \mathrm{tr}(E^T E) = 2\|b\|^2$$

Proof. Taking $E^T E$ yields:

$$E^T E = (R^T [b]_\times)^T R^T [b]_\times = [b]_\times^T [b]_\times = -[b]_\times [b]_\times \qquad (26)$$

If now $b = [b_1 \;\; b_2 \;\; b_3]^T$, then

$$E^T E = -\begin{bmatrix} 0 & -b_3 & b_2 \\ b_3 & 0 & -b_1 \\ -b_2 & b_1 & 0 \end{bmatrix}^2 = \begin{bmatrix} b_2^2 + b_3^2 & -b_1 b_2 & -b_1 b_3 \\ -b_1 b_2 & b_1^2 + b_3^2 & -b_2 b_3 \\ -b_1 b_3 & -b_2 b_3 & b_1^2 + b_2^2 \end{bmatrix} \qquad (27)$$

From (27) it is clear that $\mathrm{tr}(E^T E) = 2(b_1^2 + b_2^2 + b_3^2) = 2\|b\|^2$.

In the case of $E E^T$, the following is obtained:

$$E E^T = R^T [b]_\times (R^T [b]_\times)^T = -R^T [b]_\times [b]_\times R \qquad (28)$$

Once again, Lemma 3 states that $R^T [b]_\times R = [R^T b]_\times$ for any rotation $R$. Hence, (28) becomes:

$$E E^T = -\left(R^T [b]_\times R\right)\left(R^T [b]_\times R\right) = -[R^T b]_\times [R^T b]_\times \qquad (29)$$

And since $\|R^T b\| = \|b\|$, it follows from (27) that $\mathrm{tr}(E E^T) = 2\|b\|^2$. β–ͺ

Theorem 5. A non-zero $3 \times 3$ matrix $E$ is an essential matrix if and only if the following relationship holds:

$$E E^T E = \tfrac{1}{2}\,\mathrm{tr}(E E^T)\, E$$

Proof. Proving that, if the relationship holds, then $E$ is an essential matrix can be done through the SVD of $E$. Let $E = U S V^T$, where $U$, $V$ are orthonormal matrices and $S$ is a diagonal matrix with non-negative entries. Substituting in the given relationship yields:

$$\underbrace{U S V^T V S U^T}_{E E^T} U S V^T = \tfrac{1}{2}\,\mathrm{tr}(E E^T)\, U S V^T \iff S^3 = \tfrac{1}{2}\,\mathrm{tr}(E E^T)\, S \qquad (30)$$

Let $s_1 \ge s_2 \ge s_3$ be the singular values of $E$. The trace of $E E^T$ equals the sum of the squared singular values:

$$\mathrm{tr}(E E^T) = s_1^2 + s_2^2 + s_3^2 \qquad (31)$$

It follows from (30) and (31) that:

$$2 s_1^3 = (s_1^2 + s_2^2 + s_3^2)\, s_1, \quad 2 s_2^3 = (s_1^2 + s_2^2 + s_3^2)\, s_2, \quad 2 s_3^3 = (s_1^2 + s_2^2 + s_3^2)\, s_3 \qquad (32)$$

Since $E$ is non-zero, one singular value must be strictly positive; without constraining generality, let $s_1 > 0$. It follows that:

$$s_1^2 = \tfrac{1}{2}(s_1^2 + s_2^2 + s_3^2) \iff s_1^2 = s_2^2 + s_3^2 \qquad (33)$$

Substituting into the expression for the second singular value in (32) yields:

$$s_2^3 = (s_2^2 + s_3^2)\, s_2 \iff s_2\, s_3^2 = 0 \qquad (34)$$

It follows that either $s_2$ or $s_3$ is zero. If they are both zero, then by (33) so is $s_1$, which is a contradiction. Hence exactly 2 of the 3 singular values are non-zero and have the same value, $s_1 = s_2 = \sqrt{\mathrm{tr}(E E^T)/2}$; by Theorem 1, $E$ is an essential matrix.

Consider now the case in which we know that $E$ is an essential matrix, $E = R^T [b]_\times$. Using (29) and the fact that $[b]_\times^2 = b b^T - \|b\|^2 I_3$:

$$E E^T E = -R^T [b]_\times [b]_\times \underbrace{R\, R^T}_{I_3} [b]_\times = -R^T \left( b \underbrace{b^T [b]_\times}_{0} - \|b\|^2 [b]_\times \right) = \|b\|^2\, R^T [b]_\times = \|b\|^2 E \qquad (35)$$

and since, by Lemma 4, $\|b\|^2 = \tfrac{1}{2}\mathrm{tr}(E E^T)$, the relationship holds. And that concludes the proof in the opposite direction. β–ͺ

4. Recovering Baseline and Orientation

The method for relative pose extraction detailed in this section is based on the brilliant observation by Berthold Horn (Horn 1990) that the matrix of cofactors of an essential matrix can be expressed in terms of the rotation matrix, the skew-symmetric matrix of the baseline and the essential matrix itself.

Consider a 3D point $M$ and its normalized Euclidean projections $m_1$ and $m_2$ in two views, as shown in Figure 1. With the essential matrix in place, the next step is to obtain the rotation matrix $R$ and the unit-length baseline vector $b$. As a first step, by Theorem 1, scale can be removed from the essential matrix by dividing it by $\|b\|$. Since Lemma 4 states that $\mathrm{tr}(E^T E) = 2\|b\|^2$, a "normalized" essential matrix is obtained as follows:

$$E_n = \frac{E}{\sqrt{\mathrm{tr}(E^T E)/2}} \qquad (36)$$

4.1 Baseline

From Lemma 4 it is easy to extract the absolute values of the (unit-norm) baseline components as follows:

$$|b_1| = \sqrt{1 - [E_n^T E_n]_{11}} \qquad (37)$$
$$|b_2| = \sqrt{1 - [E_n^T E_n]_{22}} \qquad (38)$$
$$|b_3| = \sqrt{1 - [E_n^T E_n]_{33}} \qquad (39)$$

where $[E_n^T E_n]_{ij}$ denotes the element of $E_n^T E_n$ in the $i$-th row and $j$-th column. To resolve the sign ambiguity, the largest squared component is assumed to be a positive square root and the remaining signs are inferred from the off-diagonal elements of $E_n^T E_n$:

$$E_n^T E_n = \begin{bmatrix} b_2^2 + b_3^2 & -b_1 b_2 & -b_1 b_3 \\ -b_1 b_2 & b_1^2 + b_3^2 & -b_2 b_3 \\ -b_1 b_3 & -b_2 b_3 & b_1^2 + b_2^2 \end{bmatrix} \qquad (40)$$

It suffices to recover one baseline vector from $E_n^T E_n$ as described above, as the second baseline will simply be a vector of opposite direction.

4.2 Orientation

Recovering the rotation matrix requires slightly more elaborate pre-processing. Consider the matrix of cofactors of $E_n$:

$$C_n = \begin{bmatrix} \begin{vmatrix} e_{22} & e_{23} \\ e_{32} & e_{33} \end{vmatrix} & -\begin{vmatrix} e_{21} & e_{23} \\ e_{31} & e_{33} \end{vmatrix} & \begin{vmatrix} e_{21} & e_{22} \\ e_{31} & e_{32} \end{vmatrix} \\[1ex] -\begin{vmatrix} e_{12} & e_{13} \\ e_{32} & e_{33} \end{vmatrix} & \begin{vmatrix} e_{11} & e_{13} \\ e_{31} & e_{33} \end{vmatrix} & -\begin{vmatrix} e_{11} & e_{12} \\ e_{31} & e_{32} \end{vmatrix} \\[1ex] \begin{vmatrix} e_{12} & e_{13} \\ e_{22} & e_{23} \end{vmatrix} & -\begin{vmatrix} e_{11} & e_{13} \\ e_{21} & e_{23} \end{vmatrix} & \begin{vmatrix} e_{11} & e_{12} \\ e_{21} & e_{22} \end{vmatrix} \end{bmatrix} \qquad (41)$$

Standard tensor notation is adopted to denote matrix rows, columns and elements for the following derivations. Thus, $e^i_j$ is the element of $E_n$ in the $i$-th row and $j$-th column, $e^i$ is the $i$-th row of $E_n$ as a $1 \times 3$ vector, and $e_j$ is the $j$-th column as a $3 \times 1$ vector. With notation in place, we observe that $C_n$ can be written as follows:

$$C_n = \begin{bmatrix} \left((e^2)^T \times (e^3)^T\right)^T \\ \left((e^3)^T \times (e^1)^T\right)^T \\ \left((e^1)^T \times (e^2)^T\right)^T \end{bmatrix} \qquad (42)$$

Also, writing $r_1, r_2, r_3$ for the columns of $R$ (so that the rows of $R^T$ are $r_i^T$), the rows of $E_n$ can be expressed in terms of cross products:

$$E_n = R^T [b]_\times = \begin{bmatrix} r_1^T [b]_\times \\ r_2^T [b]_\times \\ r_3^T [b]_\times \end{bmatrix} = \begin{bmatrix} -(b \times r_1)^T \\ -(b \times r_2)^T \\ -(b \times r_3)^T \end{bmatrix} \qquad (43)$$

Substituting from (43) into (42) yields triple cross products in the rows of $C_n$; applying the well-known triple product expansion formula leaves cross products only between the columns of $R$ (intermediate result), which can also be eliminated from the expression by orthonormality ($r_2 \times r_3 = r_1$, etc.; final expression on the right):

$$C_n = \begin{bmatrix} \left((b \times r_2) \times (b \times r_3)\right)^T \\ \left((b \times r_3) \times (b \times r_1)\right)^T \\ \left((b \times r_1) \times (b \times r_2)\right)^T \end{bmatrix} = \begin{bmatrix} \left(b \cdot (r_2 \times r_3)\right) b^T \\ \left(b \cdot (r_3 \times r_1)\right) b^T \\ \left(b \cdot (r_1 \times r_2)\right) b^T \end{bmatrix} = \begin{bmatrix} (b \cdot r_1)\, b^T \\ (b \cdot r_2)\, b^T \\ (b \cdot r_3)\, b^T \end{bmatrix} \qquad (44)$$
�𝑏𝑏 βˆ™ π‘Ÿπ‘Ÿ1 π‘Ÿπ‘Ÿ2 οΏ½ 𝑏𝑏 ⎒ οΏ½οΏ½οΏ½3οΏ½οΏ½ βŽ₯ It is now easy to re-arrange the inner products⎣ inπ‘Ÿπ‘Ÿ (44) in ⎦order to factor-out the columns of R: ( ) ( ) = ( 𝑇𝑇 ) 𝑇𝑇 = 𝑇𝑇( 𝑇𝑇) = ( ) (45) π‘Ÿπ‘Ÿ1 𝑏𝑏 𝑏𝑏 π‘Ÿπ‘Ÿ1 𝑏𝑏𝑏𝑏 ( 𝑇𝑇 ) 𝑇𝑇 𝑇𝑇( 𝑇𝑇) 𝑇𝑇 𝑇𝑇 𝑛𝑛 2 2 𝐢𝐢 οΏ½ π‘Ÿπ‘Ÿ 𝑇𝑇𝑏𝑏 𝑏𝑏𝑇𝑇� οΏ½π‘Ÿπ‘Ÿ 𝑇𝑇 𝑏𝑏𝑏𝑏𝑇𝑇 οΏ½ 𝑅𝑅 𝑏𝑏𝑏𝑏 And a well-known skew symmetricπ‘Ÿπ‘Ÿ3 𝑏𝑏 𝑏𝑏 matrixπ‘Ÿπ‘Ÿ3 property𝑏𝑏𝑏𝑏 is that, 2 = + [ ]Γ— (46) 𝑇𝑇 Substituting (46) into (45) yields: 3 𝑏𝑏𝑏𝑏 𝐼𝐼 𝑏𝑏 [ ]2 ( [ ] ) [ ] [ ] [ ] = + Γ— = + Γ— Γ— = + Γ— = + Γ— (47) 𝑇𝑇 𝑇𝑇 𝑇𝑇 𝑇𝑇 𝑇𝑇 𝑇𝑇 𝐢𝐢𝑛𝑛 𝑅𝑅 �𝐼𝐼3 οΏ½ 𝑅𝑅 �𝑅𝑅������ 𝑅𝑅 𝐸𝐸𝑛𝑛 ⇔ 𝑅𝑅 𝐢𝐢𝑛𝑛 𝐸𝐸𝑛𝑛 And since b is sign𝑏𝑏 -ambiguous, 𝐸𝐸it𝑛𝑛𝑏𝑏 follows𝑏𝑏 that there exist𝑏𝑏 two possible rotation𝑏𝑏 matrices and can be obtaining by flipping the sign of [ ]Γ— in (D.38): = Β± [ ] (48) 𝑏𝑏× 𝑇𝑇 𝑇𝑇 𝑛𝑛 𝑛𝑛 5.3 Resolving ambiguities in the𝑅𝑅 baseline𝐢𝐢 -rotation𝑏𝑏 𝐸𝐸 solution From an algebraic point of view, it is clear that, given an essential matrix , a very same baseline vector can be obtained either from , or from , since = ( ) ( ). Clearly, this ambiguity reflects the direction of the baseline, since𝑇𝑇 𝐸𝐸the negative𝑇𝑇 sign can only originate from [ ]Γ—. 𝐸𝐸 βˆ’πΈπΈ 𝐸𝐸 𝐸𝐸 βˆ’πΈπΈ βˆ’πΈπΈ The four possible relative pose configurations recovered from the essential matrix 𝑏𝑏 are illustrated in Figures 3 and 4 in terms of the location of the observed points. It is clear that only one combination of the two baselines and two rotation matrices aligns both cameras behind the observed point. This suggests that, in order to resolve the ambiguity between the four solutions, a reconstruction of the scene must be obtained and the transformation that yields positive (or negative) signs in both depths of an observed point as observed from the two camera views should be the correct one. In other words, the point should lie in front of both cameras. 
With noisy data, this may not be the case for all points even for the correct transformation and therefore in practice this is a matter of voting.
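The baseline extraction of equations (37)-(40) and Horn's rotation formula (48) can be sketched end-to-end; the code below is an illustrative check on noise-free synthetic data, with all variable names made up for the example:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cofactor(A):
    """Matrix of cofactors: C[i, j] = (-1)^(i+j) * minor(i, j)."""
    C = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            m = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(m)
    return C

def recover_pose(En):
    # Baseline magnitudes from the diagonal of En' En (eqs 37-39).
    G = En.T @ En
    babs = np.sqrt(np.clip(1.0 - np.diag(G), 0.0, None))
    # Fix the largest component positive, infer the remaining signs
    # from the off-diagonal entries -bi*bj (eq 40).
    k = int(np.argmax(babs))
    b = np.zeros(3)
    b[k] = babs[k]
    for j in range(3):
        if j != k and babs[j] > 0.0:
            b[j] = -G[k, j] / b[k]
    Cn = cofactor(En)
    # Two rotation candidates, one per baseline sign (eq 48).
    R1 = Cn.T + skew(b) @ En.T
    R2 = Cn.T - skew(b) @ En.T
    return b, R1, R2

# Ground-truth pose with a unit baseline.
a = 0.5
R = np.array([[np.cos(a), 0.0, np.sin(a)],
              [0.0, 1.0, 0.0],
              [-np.sin(a), 0.0, np.cos(a)]])
b = np.array([0.6, 0.64, 0.48])          # already unit norm
En = R.T @ skew(b)

b_hat, R1, R2 = recover_pose(En)
assert np.allclose(b_hat, b, atol=1e-9)  # baseline recovered
assert np.allclose(R1, R, atol=1e-9) or np.allclose(R2, R, atol=1e-9)
```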

Figure 3. Camera orientation with respect to the observed points for "positive" baseline direction.

Figure 4. Camera orientation with respect to the observed points for "negative" baseline direction.

5. Scene Reconstruction

In this section, a general method for 3D scene reconstruction in two views from known relative pose and correspondences is presented. This method is also used to disambiguate the four relative pose solutions extracted from the essential matrix.

Let $M = [X \;\; Y \;\; Z]^T$ be the real-world location of an observed point expressed in the first camera coordinate frame, and let $m_1$ and $m_2$ be the respective normalized Euclidean projections in the two camera views. Also, let $b$ denote the baseline vector in the coordinate frame of the first camera and $R$ the orthonormal matrix containing the directions of the second camera frame (expressed in the first camera coordinate frame) arranged column-wise. It follows that $M$ can be expressed in terms of $m_1$ as follows:

$$M = Z\, m_1 \qquad (49)$$

The location of $M$ in the second camera frame is then $R^T (M - b)$. The normalized Euclidean coordinates $m_2$ are given by the following:

$$m_2 = \frac{R^T (M - b)}{1_z^T R^T (M - b)} \qquad (50)$$

where $1_z = [0 \;\; 0 \;\; 1]^T$. Substituting (49) into (50) yields a relationship in which the only unknown is the depth $Z$:

$$\left(1_z^T R^T (Z m_1 - b)\right) m_2 = R^T (Z m_1 - b) \iff \left(1_z^T R^T (Z m_1 - b)\right) m_2 - R^T (Z m_1 - b) = 0 \qquad (51)$$

For any vectors $a$, $v$, $c$ of compatible dimensions, it is easy to prove that $(a^T v)\, c = (c\, a^T)\, v$. With this identity at hand, the relationship in (51) becomes:

$$\left(m_2 1_z^T - I_3\right) R^T (Z m_1 - b) = 0 \qquad (52)$$

where $I_3$ is the $3 \times 3$ identity matrix. Equation (52) is an over-determined system in $Z$ and yields two solutions, one for each projection component, which agree only if the measurements are completely noise-free. However, in most cases the two solutions do not agree and, furthermore, we observed that in the majority of these cases disparity tends to concentrate either on the x or on the y axis, thereby making one solution more "reliable" than the other. The proposed workaround is to regard $Z$ as the minimizer of the following optimization problem:

$$\min_Z \; (Z m_1 - b)^T R\, C\, R^T (Z m_1 - b) \qquad (53)$$

where $C$ is the following non-invertible positive semi-definite (PSD) matrix:

$$C = \left(m_2 1_z^T - I_3\right)^T \left(m_2 1_z^T - I_3\right) = \begin{bmatrix} 1 & 0 & -x_2 \\ 0 & 1 & -y_2 \\ -x_2 & -y_2 & x_2^2 + y_2^2 \end{bmatrix} \qquad (54)$$

and $x_2$, $y_2$ are the coordinates of $m_2$ in the directions of the local x and y axes respectively. Taking the derivative of the quadratic expression in (53) with respect to $Z$ and setting it to zero yields the following minimizer:

$$Z = \frac{m_1^T R\, C\, R^T\, b}{m_1^T R\, C\, R^T\, m_1} \qquad (55)$$

The expression in (55) is a robust depth estimate which takes the direction of disparity into consideration, thereby avoiding the "pitfall" of having to choose between two solutions for depth without any criterion at hand for making that choice. It should however be noted that the estimate can be very erroneous (e.g., negative depth) if disparity is very noisy in both axes; in such a case, it is preferable to discard the point.
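The depth minimizer (55) translates directly into a few lines of NumPy; the sketch below (illustrative names, noise-free synthetic data) recovers the true depth exactly:

```python
import numpy as np

def depth_in_first_view(m1, m2, R, b):
    """Least-squares depth Z of equation (55): M = Z * m1 in view 1."""
    one_z = np.array([0.0, 0.0, 1.0])
    A = np.outer(m2, one_z) - np.eye(3)
    C = A.T @ A                  # PSD weighting matrix, equation (54)
    W = R @ C @ R.T
    return (m1 @ W @ b) / (m1 @ W @ m1)

# Synthetic check with noise-free projections.
a = 0.3
R = np.array([[np.cos(a), -np.sin(a), 0.0],
              [np.sin(a),  np.cos(a), 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([0.4, 0.1, -0.2])
M = np.array([0.5, -0.3, 5.0])   # ground-truth point, first frame
M2 = R.T @ (M - b)
m1, m2 = M / M[2], M2 / M2[2]

Z = depth_in_first_view(m1, m2, R, b)
assert abs(Z - M[2]) < 1e-9      # recovers the true depth
assert np.allclose(Z * m1, M)
```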

6. The Algorithm and Sample Reconstructions

Examples of 2-view reconstructions using the methods described in the previous sections are illustrated in Figures 5-8. Features were detected with SIFT and the LK tracker was used to establish correspondences. Flow fields are shown on the images on the right (in green are new features detected in the second image for subsequent tracking). RANSAC outliers were omitted from the reconstruction and do not appear in the flow field illustration. Finally, since camera intrinsics were unknown in all cases, the reconstructions present a discrepancy up to an affine transformation from the ground truth (made-up intrinsics were used). Algorithm 1 describes the steps for relative pose and structure recovery from an essential matrix.

Algorithm 1. 3D reconstruction and recovery of relative pose from two views

Input: a) Set of normalized Euclidean correspondences $m_1^{(i)}$ and $m_2^{(i)}$; b) essential matrix $E$.
Output: a) Camera relative pose $(R, b)$; b) 3D coordinates of all points $M^{(i)}$.

comment Obtain the SVD of E:
    [U, S, V] ← svd(E)

comment Obtain a "normalized" essential matrix by removing scale and, at the same time, imposing the necessary and sufficient condition of exactly two equal non-zero singular values:
    En ← U diag(1, 1, 0) Vᡀ

comment Compute the absolute values of the baseline components b1, b2, b3 from EnᡀEn:
    b1 ← sqrt(1 - [EnᡀEn]11)
    b2 ← sqrt(1 - [EnᡀEn]22)
    b3 ← sqrt(1 - [EnᡀEn]33)

comment Choose the greatest component (in absolute value) as positive and work out the remaining signs from the off-diagonal elements of EnᡀEn:
    If max{b1, b2, b3} = b1:
        If [EnᡀEn]12 > 0: b2 ← -b2
        If [EnᡀEn]13 > 0: b3 ← -b3
    Else If max{b1, b2, b3} = b2:
        If [EnᡀEn]12 > 0: b1 ← -b1
        If [EnᡀEn]23 > 0: b3 ← -b3
    Else:
        If [EnᡀEn]13 > 0: b1 ← -b1
        If [EnᡀEn]23 > 0: b2 ← -b2

comment Store the two possible baselines:
    Baselines ← { (b1, b2, b3)ᡀ, (-b1, -b2, -b3)ᡀ }

comment Find the matrix of cofactors of En:
    Cn ← [  e22e33 - e32e23,  -(e21e33 - e31e23),   e21e32 - e31e22 ;
          -(e12e33 - e32e13),   e11e33 - e31e13,  -(e11e32 - e31e12) ;
            e12e23 - e22e13,  -(e11e23 - e21e13),   e11e22 - e21e12 ]

comment Store the two possible rotation matrices:
    Rotations ← { Cnᡀ + [Baselines(1)]Γ— Enᡀ, Cnᡀ + [Baselines(2)]Γ— Enᡀ }

comment Find the best scene reconstruction among the 4 possible relative poses:
    BestReconstruction ← {Rotations(1), Baselines(1)}
    minCount ← ∞
    For each baseline b in Baselines:
        For each rotation matrix R in Rotations:
            errorCount ← 0
            For each pair of correspondences (m1, m2):
                C ← (m2 1zᡀ - I3)ᡀ (m2 1zᡀ - I3)
                Z1 ← (m1ᡀ R C Rᡀ b) / (m1ᡀ R C Rᡀ m1)
                Z2 ← 1zᡀ Rᡀ (Z1 m1 - b)
                If (Z1 ≀ 0) Or (Z2 ≀ 0):
                    errorCount ← errorCount + 1
                Else:
                    M ← Z1 m1
            If errorCount < minCount:
                BestReconstruction ← {R, b}
                minCount ← errorCount
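The steps of Algorithm 1 can be condensed into a short NumPy sketch; this is an illustrative implementation under the report's conventions (all function and variable names are made up), verified here on noise-free synthetic data:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def cofactor(A):
    C = np.empty((3, 3))
    for i in range(3):
        for j in range(3):
            m = np.delete(np.delete(A, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(m)
    return C

def two_view_reconstruction(E, m1s, m2s):
    """Algorithm 1: relative pose and depths from E and correspondences."""
    # Normalized essential matrix: exactly two equal unit singular values.
    U, _, Vt = np.linalg.svd(E)
    En = U @ np.diag([1.0, 1.0, 0.0]) @ Vt
    # Baseline magnitudes (eqs 37-39) and signs (eq 40).
    G = En.T @ En
    b = np.sqrt(np.clip(1.0 - np.diag(G), 0.0, None))
    k = int(np.argmax(b))
    for j in range(3):
        if j != k and G[k, j] > 0.0:
            b[j] = -b[j]
    Cn = cofactor(En)
    baselines = [b, -b]
    rotations = [Cn.T + skew(v) @ En.T for v in baselines]   # eq (48)
    one_z = np.array([0.0, 0.0, 1.0])
    best, min_err = None, np.inf
    for bb in baselines:
        for R in rotations:
            err, pts = 0, []
            for m1, m2 in zip(m1s, m2s):
                A = np.outer(m2, one_z) - np.eye(3)
                W = R @ A.T @ A @ R.T
                den = m1 @ W @ m1
                if den < 1e-12:
                    err += 1
                    continue
                Z1 = (m1 @ W @ bb) / den                 # eq (55)
                Z2 = one_z @ (R.T @ (Z1 * m1 - bb))      # depth, view 2
                if Z1 <= 0 or Z2 <= 0:
                    err += 1                             # cheirality vote
                else:
                    pts.append(Z1 * m1)
            if err < min_err:
                best, min_err = (R, bb, pts), err
    return best

# Synthetic two-view setup with a unit baseline.
t = 0.2
R_true = np.array([[np.cos(t), 0.0, np.sin(t)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(t), 0.0, np.cos(t)]])
b_true = np.array([0.6, 0.64, 0.48])
E = R_true.T @ skew(b_true)

rng = np.random.default_rng(2)
Ms = [np.append(rng.uniform(-1, 1, 2), rng.uniform(4, 8)) for _ in range(8)]
m1s = [M / M[2] for M in Ms]
m2s = [(R_true.T @ (M - b_true)) / (R_true.T @ (M - b_true))[2] for M in Ms]

R_hat, b_hat, pts = two_view_reconstruction(E, m1s, m2s)
assert np.allclose(R_hat, R_true, atol=1e-8)
assert np.allclose(b_hat, b_true, atol=1e-8)
```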

Figure 5. A reconstruction of the famous Hannover dinosaur (Niem and Buschmann 1995) from the first two views of the sequence.

Figure 6. A reconstruction of Oxford's Wadham College from two views (Werner and Zisserman 2002).

Figure 7. A reconstruction of a model houseΒ² from two views.

Figure 8. Reconstruction from the first two frames of the "corridor" sequenceΒ². The two camera locations are shown as green spots on the left.

7. Scene Reconstruction from the Essential Matrix: Where’s the Hack?

There have been many authors who presented good to high quality results in terms of recovering camera pose and scene structure from the essential matrix. To name a few renowned researchers, Pollefeys (Pollefeys, Van Gool et al. 2004), Nister (NistΓ©r 2004), Zisserman and Hartley (Hartley and Zisserman 2003) have presented remarkable scene

2 Images retrieved from http://www.robots.ox.ac.uk/~vgg/data/data-mview.html reconstructions with their specialized algorithms for the computation of the fundamental matrix (and subsequently, of the essential matrix through known camera intrinsics). To the best of my knowledge, with the exception of Nister’s algorithm3, what these methods have in common is that they do not directly address the two primary constraints associated with the essential matrix: a) It has exactly two singular values and, b) These singular values are equal. Typically, the aforementioned algorithms enforce these constraints after the optimization. In my opinion, the ramifications of this strategy can be unpredictable, depending on the level of noise in the data. Enforcing two equal singular values is a brutal way of imposing constraints and can potentially alter the relative pose estimate to an extent at which no iterative refinement can recover from. As proven in theorem 1, matrix E is an essential matrix if and only if it can be written as the sum of two tensor products, = + (56) , , , 𝑇𝑇 𝑇𝑇 = = 0 where are unit𝐸𝐸 vectors𝑣𝑣1𝑒𝑒1 such𝑣𝑣2𝑒𝑒2 that, . An alternative approach would be to use3 the standard formula for the essential𝑇𝑇 matrix𝑇𝑇 given in equation 1 2 1 2 2 2 (8) and𝑣𝑣 impose𝑣𝑣 𝑒𝑒 orthonormality𝑒𝑒 ∈ ℝ constraints on the columns𝑣𝑣1 𝑣𝑣 of 𝑒𝑒the1 𝑒𝑒 rotation matrix and a unit-norm constraint on the baseline. Either way, a properly constrained optimization involves a Lagrangian (or some parametrized expression) with the standard 8-point algorithm cost function and 5 Lagrange multipliers (2 for orthogonality and 3 for unit norms). The Karush-Kuhn-Tucker (KKT) conditions will eventually lead to a 4th degree polynomial system in the components of , , , . This system is relatively hard to obtain analytically and it requires a special category of algorithms known as Groebner 1 2 1 2 basis solvers (Lazard 1983) to solve it. 
To the best of my knowledge, the parametrization of equation (3.32) is not frequently encountered in the literature. NistΓ©r's solution (NistΓ©r 2004) deserves special mention in this section for being the only method, as far as I am aware, that actually solves for the essential matrix while abiding by the orthogonality constraints. The idea is to recover the essential matrix from the 4-dimensional null space of the data matrix. The constraints yield a polynomial system containing reasonably sized expressions (only 4 unknowns, up to arbitrary scale), and it can be solved in a relatively uncomplicated manner. Of course, the problem with this method is that it does not generalize to the most common formulation of the problem, which is an overdetermined system⁴, in which case the null space of the data matrix is rarely non-trivial. For completeness, I would also like to mention a parametrization by Vincent Lui and Tom Drummond for an iterative solution of the 5-point problem (Lui and Drummond 2007). Other solutions to the 5-point problem include the methods of Li and Hartley (Li and Hartley 2006) and Kukelova et al. (Kukelova, Bujnak et al. 2008).
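For intuition about the null-space setup described above, the following NumPy sketch (my own illustration; the helper names are assumed, not from the report) builds the epipolar data matrix from 5 calibrated correspondences and extracts the 4-dimensional null-space basis from which the 5-point method recovers the essential matrix:

```python
import numpy as np

def epipolar_data_matrix(x1, x2):
    """Stack one row per correspondence for the constraint x2^T E x1 = 0.

    x1, x2: (N, 3) arrays of calibrated homogeneous image points.
    Each row is kron(x2_i, x1_i), so that row . vec(E) = x2_i^T E x1_i
    with E flattened row-major.
    """
    return np.stack([np.kron(b, a) for a, b in zip(x1, x2)])

def null_space_basis(A, dim=4):
    """Right singular vectors spanning the (numerical) null space of A."""
    _, _, Vt = np.linalg.svd(A)
    return Vt[-dim:]  # (dim, 9); each row reshapes to a 3x3 candidate matrix

# With exactly 5 correspondences, A is 5x9 and its null space is
# (generically) 4-dimensional; the essential matrix is then a linear
# combination of the 4 basis matrices, with the coefficients fixed by
# the essential-matrix constraints.
rng = np.random.default_rng(0)
x1 = np.c_[rng.standard_normal((5, 2)), np.ones(5)]
x2 = np.c_[rng.standard_normal((5, 2)), np.ones(5)]
A = epipolar_data_matrix(x1, x2)
basis = null_space_basis(A)
print(A @ basis.T)  # ~0: every basis matrix satisfies all 5 constraints
```

In the overdetermined case (N > 8 correspondences) the same data matrix generically has full column rank minus one at best, which is exactly why the null-space construction above does not carry over directly.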

3 Of course, many variations of the 5-point algorithm have been proposed since NistΓ©r's, but they do not add anything essentially new to the concept.

4 NistΓ©r mentions in his paper that the method generalizes to the overdetermined case if the 4 singular vectors of the data matrix corresponding to the 4 smallest singular values are taken instead of the 4 null-space basis vectors used in the 5-point case. I believe this assertion is arbitrary. Clearly, one is free to employ the singular vectors to diagonalize the data matrix, but there is no justification for why the constrained minimum should lie in the span of the 4 smallest singular vectors. It is a sane conjecture from a greedy point of view, but there is no proof; moreover, there is no justification for why 4 vectors should be used instead of, for instance, 5, which is the number of degrees of freedom of the essential matrix.

References

[1] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, p. 61, 1987.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, vol. 2. Cambridge University Press, 2003.
[3] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision, vol. 93. Prentice Hall, 1998.
[4] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[5] O. Faugeras, Q.-T. Luong, and T. Papadopoulo, The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications. MIT Press, 2004.
[6] J.-Y. Bouguet, "Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm," Technical report, Intel Corporation, 2001.
[7] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[8] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in computational stereo," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 993-1008, 2003.
[9] Q.-T. Luong and O. D. Faugeras, "The fundamental matrix: theory, algorithms, and stability analysis," International Journal of Computer Vision, vol. 17, pp. 43-75, 1996.
[10] W. Chojnacki and M. J. Brooks, "On the consistency of the normalized eight-point algorithm," Journal of Mathematical Imaging and Vision, vol. 28, pp. 19-27, 2007.
[11] R. I. Hartley, "In defense of the eight-point algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 580-593, 1997.
[12] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models, vol. 26. Springer, 2003.
[13] B. K. P. Horn, "Recovering baseline and orientation from essential matrix," J. Optical Society of America, 1990.
[14] M. Woo, J. Neider, and T. Davis, OpenGL Programming Guide. Addison-Wesley, 1999.
[15] R. Hartley, "Ambiguous configurations for 3-view projective reconstruction," in Computer Vision - ECCV 2000, pp. 922-935, 2000.
[16] A. Hejlsberg, S. Wiltamuth, and P. Golde, C# Language Specification. Addison-Wesley Longman, 2003.
[17] D. Fay, "An architecture for distributed applications on the Internet: overview of Microsoft's .NET platform," in Proc. International Parallel and Distributed Processing Symposium, 2003.
[18] G. Bradski, "The OpenCV library," Dr. Dobb's Journal, vol. 25, pp. 120-126, 2000.
[19] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, pp. 381-395, 1981.
