Relative Camera Pose Recovery and Scene Reconstruction with the Essential Matrix in a Nutshell
Technical Report · March 2013
George Terzakis
Technical Report#: MIDAS.SNMSE.2013.TR.007
Plymouth University, March 11, 2013
Abstract

This paper presents and analyzes all the necessary steps one needs to take in order to build a simple application for relative pose recovery and structure from motion using only the epipolar constraint. In other words, it illustrates in practice (i.e., cook-book style) how the most fundamental concepts in epipolar geometry can be combined into a complete and disambiguated series of steps for camera motion estimation and 3D reconstruction in a relatively simple computer program. In particular, the essential matrix is defined with respect to the algebraic interpretation of the rotation matrix, in order to eliminate ambiguities in how relative orientation is extracted from the epipolar constraint. Based on this formulation, we list the prominent properties of the essential matrix with proofs, to provide the necessary intuition for the series of derivations that ultimately lead to the recovery of the orientation and baseline vector.
1. Introduction
The problem of 3D reconstruction and odometry estimation has been under the spotlight of research in computer vision and robotics for the past two decades. Since the early paper by Longuet-Higgins [1], which built a first theoretical layer on the original photogrammetric equations by introducing the epipolar constraint, there have been numerous studies and proposed algorithms to estimate camera motion and to recover the world locations of a sparse cloud of points [2-5]. Such problems are now known as structure from motion (SFM) problems and fall in the broader field of multiple view geometry in computer vision. Although the geometry of multiple images has been extensively researched, structure from motion remains a vastly unexplored territory, mainly due to the great informational bandwidth that images carry. The SFM pipeline starts with the detection of corresponding points in two distinct images of the same scene. There are several algorithms that match points across multiple views, known as optical flow estimation methods [6-8]. Although such methods are extensively employed with a great deal of success in many cases, they nevertheless rely on a number of strong assumptions, such as brightness constancy, absence of occlusions and relatively slow camera motion. Limitations in point matching will consequently propagate a fair amount of noise throughout the geometrical computations, thereby introducing unstable results in many cases.
2. The Epipolar Constraint
2.1 Moving Between Coordinate Frames

The goal of this section is to derive a formula for the coordinates of a 3D point M in a camera frame located at a position b, given the orientation of that frame as three unit vectors (u_1, u_2, u_3), where b and u_1, u_2, u_3 are expressed in terms of the first camera frame (w_1, w_2, w_3), which is chosen as the global reference. Figure 1 illustrates the two frames, the baseline and the vectors that connect the two camera centers with the point M.
Figure 1. The two camera frames; the baseline and the vectors that connect the two camera centers with M are indicated with dashed lines.

Let now M_1 be the coordinates of M in the first camera frame and M_2 the respective coordinates in the second camera frame. Also, let (M_2)_1 be the vector from the second camera center to M, expressed in the coordinate frame of the first camera. Then, the coordinates M_2 will be the projections of (M_2)_1 on the direction vectors u_1, u_2, u_3:

M_2 = [ u_1^T (M_2)_1 ; u_2^T (M_2)_1 ; u_3^T (M_2)_1 ] = [u_1 u_2 u_3]^T (M_2)_1 = R^T (M_2)_1    (1)

where R = [u_1 u_2 u_3] is the matrix holding the second-frame directions as its columns. But (M_2)_1 can be written as the difference of the vectors M_1 and b:

(M_2)_1 = M_1 - b    (2)

Substituting (2) into (1) yields:

M_2 = R^T (M_1 - b)    (3)

2.2 The Essential Matrix

With the above in place, consider now the normalized Euclidean projections m_1 and m_2 of M. Epipolar geometry in two views concerns the geometry of the plane induced by the baseline and the projection rays that connect the real-world location of a feature with the two projection (camera) centers. Thus, for each pair of correspondences there exists a plane that contains the respective world point and the baseline vector. Figure 2 illustrates the epipolar plane of a pair of correspondences. The projections of the two camera centers in the first and second view are the epipoles e_1, e_2. The lines l_1 and l_2 defined by the epipoles and the normalized Euclidean projections m_1, m_2 are known as the epipolar lines; epipolar lines are, in fact, the projections of the two rays that pass through the camera centers C_1, C_2 and the point M onto the opposite image planes.
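As a quick numerical sanity check of equations (1)-(3), the sketch below (in Python with numpy; the rotation and baseline are made up purely for illustration) maps a point into the second frame and back:

```python
import numpy as np

# Made-up pose: the columns of R are the second-frame directions
# u1, u2, u3 expressed in the first (reference) camera frame.
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
b = np.array([1.0, 0.0, 0.0])          # baseline, first camera frame

M1 = np.array([0.5, -0.2, 4.0])        # point M in the first camera frame
M2 = R.T @ (M1 - b)                    # equation (3)

# Transforming back recovers M1, so the frame change is self-consistent.
print(np.allclose(R @ M2 + b, M1))     # -> True
```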
Let now u_1 and u_2 be the direction vectors (in the first camera coordinate frame) of the projection rays that connect the first and second camera centers with the point M. Evidently, u_1, u_2 and b span the epipolar plane that corresponds to M.
Figure 2. Epipolar plane induced by corresponding projections.

An equivalent way of expressing this coplanarity is the orthogonality between u_2 and the cross product of b and u_1:

u_2^T (b × u_1) = 0    (4)

Since u_1 = M_1 = R M_2 + b and u_2 = M_1 - b, substituting in equation (4) yields:

(M_1 - b)^T [b]_× (R M_2 + b) = 0    (5)

where [b]_× is the skew-symmetric cross-product matrix of b. By simply applying the distributive law and taking into consideration the fact that b^T [b]_× = ([b]_× b)^T = 0, the following constraint is obtained:

M_1^T [b]_× R M_2 = 0  ⟺  M_2^T R^T [b]_× M_1 = 0    (6)

By definition, the normalized Euclidean projections m_1 and m_2 are projectively equal to M_1 and M_2, and the constraint can therefore be re-expressed in terms of m_1 and m_2:

m_1^T ([b]_× R) m_2 = 0  ⟺  m_2^T (R^T [b]_×) m_1 = 0    (7)

The essential matrix. The 3 × 3 matrix R^T [b]_× is known in the literature as the essential matrix E. Using the essential matrix notation, the epipolar constraint is obtained in its widely recognized form [1]¹:
m_2^T (R^T [b]_×) m_1 = m_2^T E m_1 = 0    (8)

¹ Evidently, there can be many essential matrices corresponding to the same set of two-view projections, depending on the interpretation given to the rigid transformation that links the two camera poses. In this report, the rotation matrix R is perceived as the matrix that contains the second camera frame directions (expressed in the first camera frame) as its columns, and b is the baseline vector (also in the first camera coordinate frame). Thus, the essential matrix will always be given by E = R^T [b]_×.

4. Properties of the essential matrix
The most important properties of the essential matrix are listed in this section. These properties are very useful either as constraints or as supplementary formulas in the course of scene structure and relative camera pose estimation.

Lemma 1. Suppose E is an essential matrix. Then the epipoles e_2 and e_1 span the left and right null spaces of E respectively.

Proof. Since the epipoles are the scaled images of the camera centers, there exist scalars z_1 and z_2 such that e_1 = z_1 b and e_2 = -z_2 R^T b. By the expression of the essential matrix in equation (8), E = R^T [b]_×. Therefore:

e_2^T E = -z_2 (R^T b)^T R^T [b]_× = -z_2 b^T R R^T [b]_× = -z_2 b^T [b]_× = 0    (9)

Also,

E e_1 = z_1 R^T [b]_× b = 0    (10)

Lemma 2. Any point m_1 (respectively m_2) has an associated epipolar line l_2 (respectively l_1) in the opposite view, given up to scale by:

l_2 = E m_1 = e_2 × m_2,   l_1 = E^T m_2 = e_1 × m_1
Proof. The proof is a direct consequence of the epipolar constraint (8) for m_1 and m_2, together with lemma 1.

Lemma 3. For any orthonormal matrix R ∈ ℝ^{3×3} and for any vector b ∈ ℝ³ the following holds:

[R b]_× = R [b]_× R^T    (11)

Proof. We resort to the following property of the cross product, which holds for any rotation matrix R and any pair of vectors a, b:

(R a) × (R b) = R (a × b)    (12)

(a rotation preserves lengths, angles and orientation, hence cross products). Then, for an arbitrary vector a:

[R b]_× a = (R b) × a = (R b) × (R R^T a)    (13)

Applying (12) to the right-hand side:

(R b) × (R R^T a) = R (b × R^T a) = R [b]_× R^T a    (14)

Combining (13) and (14):

[R b]_× a = R [b]_× R^T a  for every a ∈ ℝ³    (15)

which proves (11).

A very significant theorem that provides a simple yet strict criterion for the existence of an essential matrix is the following (Faugeras, Luong et al. 2004).

Theorem 1. A 3 × 3 matrix E is an essential matrix if and only if it has a singular value decomposition E = U Σ V^T such that:

Σ = diag{σ, σ, 0},  σ > 0

where U, V are orthonormal matrices.

Proof. Let E be an essential matrix. Then E is given by E = R^T [b]_×. Taking E^T E yields:

E^T E = (R^T [b]_×)^T R^T [b]_× = [b]_×^T R R^T [b]_× = [b]_×^T [b]_×    (16)

Let now R_0 be the rotation that aligns the baseline vector with the z-axis, that is, R_0 b = [0 0 ‖b‖]^T. Then, using lemma 3, it follows that [b]_× = R_0^T [R_0 b]_× R_0. Substituting in (16) yields:

E^T E = R_0^T [R_0 b]_×^T [R_0 b]_× R_0 = R_0^T [ ‖b‖² 0 0 ; 0 ‖b‖² 0 ; 0 0 0 ] R_0    (17)

The decomposition in (17) is an eigendecomposition of E^T E (one out of many); hence the singular values of E are ‖b‖, ‖b‖ and 0, and E decomposes as follows:

E = U [ ‖b‖ 0 0 ; 0 ‖b‖ 0 ; 0 0 0 ] R_0    (18)

for some orthonormal matrix U.
Let now E be a 3 × 3 matrix with exactly two non-zero singular values which are equal. In this case it is easier to proceed by gradually constructing the sought result. The decomposition of E is:

E = U S V^T,  S = [ σ 0 0 ; 0 σ 0 ; 0 0 0 ]    (19)

where U and V are orthonormal matrices. The construction that follows relies on the observation that S can be written in the following way:

S = [ 0 1 0 ; -1 0 0 ; 0 0 1 ] [ 0 -σ 0 ; σ 0 0 ; 0 0 0 ] = R_z(-π/2) [σ e_z]_×    (20)

where e_z = [0 0 1]^T, or,

S = [ 0 -1 0 ; 1 0 0 ; 0 0 1 ] [ 0 σ 0 ; -σ 0 0 ; 0 0 0 ] = R_z(π/2) [-σ e_z]_×    (21)

where R_z(±π/2) are the rotations by ±π/2 about the z-axis. Equivalently, the product of S with the appropriate z-rotation yields a skew-symmetric matrix:

R_z(π/2) S = [σ e_z]_×    (22)

or,

R_z(-π/2) S = [-σ e_z]_×    (23)

Now the property of lemma 3 can be put to use: for any rotation V and any vector a, [V a]_× = V [a]_× V^T, hence [a]_× V^T = V^T [V a]_×. Choosing (20), the SVD of E can be expressed as follows:

E = U R_z(-π/2) [σ e_z]_× V^T = ( U R_z(-π/2) V^T ) [σ V e_z]_×    (24)

which is a product of an orthonormal matrix and a skew-symmetric matrix, exactly as in the definition E = R^T [b]_×, with R = (U R_z(-π/2) V^T)^T and b = σ V e_z. In the very same way, one arrives at a similar result starting from (21):

E = ( U R_z(π/2) V^T ) [-σ V e_z]_×    (25)
The only remaining "loose end" is to show that the matrices U R_z(-π/2) V^T and U R_z(π/2) V^T appearing in (24) and (25) are indeed rotation matrices. Both are orthonormal as products of orthonormal matrices, so only the sign of their determinants is in question. Because the third singular value of E is zero, the last column of U (or of V) can be negated without altering the product U S V^T, so U and V can always be chosen such that det(U) det(V) = 1. With this choice, det(U R_z(∓π/2) V^T) = det(U) det(R_z(∓π/2)) det(V) = 1, and the two aforementioned products are orthonormal matrices with positive determinant, hence rotations. And that concludes the proof.
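The machinery introduced so far can be verified numerically. The following sketch (made-up pose; numpy assumed) builds E = R^T [b]_× from a known rotation and baseline, then checks the epipolar constraint of equation (8), the null-space property of lemma 1 and the singular-value structure of theorem 1:

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v]x of a 3-vector."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Illustrative relative pose: R holds the second-frame directions as
# columns, b is the baseline (both in the first camera frame).
theta = 0.2
R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
              [np.sin(theta),  np.cos(theta), 0.0],
              [0.0, 0.0, 1.0]])
b = np.array([1.0, 0.2, 0.1])
E = R.T @ skew(b)                       # essential matrix, equation (8)

# Epipolar constraint m2^T E m1 = 0 for a made-up point M.
M1 = np.array([0.3, -0.4, 5.0])
M2 = R.T @ (M1 - b)                     # equation (3)
m1, m2 = M1 / M1[2], M2 / M2[2]
assert abs(m2 @ E @ m1) < 1e-12

# Lemma 1: the epipoles span the right/left null spaces of E.
e1 = b / b[2]                           # image of the second camera center
e2 = R.T @ -b; e2 = e2 / e2[2]          # image of the first camera center
assert np.allclose(E @ e1, 0.0) and np.allclose(e2 @ E, 0.0)

# Theorem 1: the singular values of E are {||b||, ||b||, 0}.
print(np.linalg.svd(E, compute_uv=False), np.linalg.norm(b))
```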
Lemma 4. If E is an essential matrix such that E = R^T [b]_×, then:

tr(E E^T) = tr(E^T E) = 2 ‖b‖²

Proof. Taking E^T E yields:

E^T E = (R^T [b]_×)^T R^T [b]_× = [b]_×^T [b]_× = -[b]_ײ    (26)

If now b = [b_1 b_2 b_3]^T, then:

E^T E = [ b_2²+b_3²  -b_1 b_2   -b_1 b_3  ]
        [ -b_1 b_2   b_1²+b_3²  -b_2 b_3  ]
        [ -b_1 b_3   -b_2 b_3   b_1²+b_2² ]    (27)

From (27) it is clear that tr(E^T E) = 2(b_1² + b_2² + b_3²) = 2‖b‖².

In the case of E E^T, the following is obtained:

E E^T = R^T [b]_× ([b]_×^T R) = R^T ([b]_× [b]_×^T) R    (28)

and, by the cyclic property of the trace,

tr(E E^T) = tr([b]_× [b]_×^T R R^T) = tr([b]_×^T [b]_×) = 2‖b‖²    (29)

Theorem 2. A non-zero 3 × 3 matrix E is an essential matrix if and only if the following relationship holds:

E E^T E = ½ tr(E E^T) E

Proof. Proving that E is an essential matrix if the relationship holds can be done through its SVD. Let E = U Σ V^T, where U, V are orthonormal matrices and Σ is diagonal with non-negative entries. Substituting in the given relationship yields:

U Σ³ V^T = ½ tr(E E^T) U Σ V^T  ⟹  Σ³ = ½ tr(E E^T) Σ    (30)

Let σ_1, σ_2, σ_3 be the singular values of E. The trace of E E^T equals the sum of the squared singular values:

tr(E E^T) = σ_1² + σ_2² + σ_3²    (31)

It follows from (30) and (31) that:

2σ_1³ = (σ_1² + σ_2² + σ_3²) σ_1
2σ_2³ = (σ_1² + σ_2² + σ_3²) σ_2    (32)
2σ_3³ = (σ_1² + σ_2² + σ_3²) σ_3

Since E is non-zero, at least one singular value must be strictly positive. Without constraining generality, let σ_1 > 0. It follows from the first equation of (32) that:

σ_1² = (σ_1² + σ_2² + σ_3²)/2  ⟹  σ_1² = σ_2² + σ_3²    (33)

Substituting (33) into the expression for the second singular value in (32) yields:

σ_2³ = (σ_2² + σ_3²) σ_2  ⟹  σ_2 σ_3² = 0    (34)

It follows that either σ_2 or σ_3 is zero; since (32) is symmetric in σ_2 and σ_3, we may take σ_3 = 0. If σ_2 were also zero, then by (33) so would be σ_1, which is a contradiction. Hence exactly 2 of the 3 singular values are non-zero and, by (33), they have the same value:

σ_1 = σ_2 = sqrt( tr(E E^T) / 2 ),  σ_3 = 0

and, by theorem 1, E is an essential matrix.

Consider now the case in which we know that E is an essential matrix, E = R^T [b]_×. Using [b]_ײ = b b^T - ‖b‖² I_3, we have [b]_× [b]_×^T [b]_× = -[b]_׳ = ‖b‖² [b]_×, and therefore:

E E^T E = R^T [b]_× [b]_×^T R R^T [b]_× = R^T ([b]_× [b]_×^T [b]_×) = ‖b‖² R^T [b]_× = ½ tr(E E^T) E    (35)

where the last equality uses lemma 4. And that concludes the proof in the opposite direction.

5. Recovering baseline and orientation
The method for relative pose extraction detailed in this section is based on the brilliant observation by Berthold Horn (Horn 1990) that the matrix of cofactors of an essential matrix can be expressed in terms of the rotation matrix, the skew-symmetric matrix of the baseline and the essential matrix itself.

Consider a 3D point M and its normalized Euclidean projections m_1 and m_2 in two views, as shown in Figure 1. With the essential matrix in place, the next step is to obtain the rotation matrix R and the unit-length baseline vector b. As a first step, by theorem 1, scale can be removed from the essential matrix by dividing it by ‖b‖. Since lemma 4 states that tr(E^T E) = 2‖b‖², a "normalized" essential matrix is obtained as follows:

E_n = E / sqrt( tr(E^T E) / 2 )    (36)

5.1 Baseline

From lemma 4 (with ‖b‖ = 1 after normalization) it is easy to extract the absolute values of the baseline components as follows:

|b_1| = sqrt( 1 - [E_n^T E_n]_11 )    (37)
|b_2| = sqrt( 1 - [E_n^T E_n]_22 )    (38)
|b_3| = sqrt( 1 - [E_n^T E_n]_33 )    (39)

where [E_n^T E_n]_ij denotes the element of E_n^T E_n in the i-th row and j-th column. To resolve the sign ambiguity, the largest squared component is assumed to be a positive square root and the remaining signs are inferred from the off-diagonal elements of E_n^T E_n:

E_n^T E_n = [ b_2²+b_3²  -b_1 b_2   -b_1 b_3  ]
            [ -b_1 b_2   b_1²+b_3²  -b_2 b_3  ]
            [ -b_1 b_3   -b_2 b_3   b_1²+b_2² ]    (40)

It suffices to recover one baseline vector from E_n^T E_n as described above, as the second baseline will simply be the vector of opposite direction.

5.2 Orientation

Recovering the rotation matrix requires slightly more elaborate pre-processing. Consider the matrix of cofactors of E_n:
πΈπΈππ ππ22 ππ23 ππ21 ππ23 ππ21 ππ22 = 32 33 31 33 31 32 β‘ οΏ½ππ ππ οΏ½ β οΏ½ππ ππ οΏ½ οΏ½ππ ππ οΏ½ β€ (41) β’ 12 13 11 13 11 12 β₯ ππ ππ ππ ππ ππ ππ ππ πΆπΆ β’β οΏ½ 32 33οΏ½ οΏ½ 31 33οΏ½ β οΏ½ 31 32οΏ½β₯ β’ ππ ππ ππ ππ ππ ππ β₯ ππ12 ππ13 ππ11 ππ13 ππ11 ππ12 Standard tensor notationβ’ is adopted to denote matrix rows andβ₯ columns as well as β£ οΏ½ππ22 ππ23οΏ½ β οΏ½ππ21 ππ23οΏ½ οΏ½ππ21 ππ22οΏ½ β¦ elements for the following derivations. Thus, for instance, is the element of in the ith row and the jth column. Also, is the ith row of as a 1 Γππ 3 vector, while is the jth ππππ πΈπΈππ column as a 3 Γ 1 vector. With notationππ in place, we observe that can be written as ππ ππ follows: ππ πΈπΈ ππ πΆπΆππ (( ) Γ ( ) ) = (( 2)ππ Γ ( 3)ππ)ππ (42) ((ππ3)ππ Γ (ππ1)ππ)ππ πΆπΆππ οΏ½ ππ ππ οΏ½ Also, can be expressed in terms of 1crossππ -products2 ππ ππ as follows: ππ ππ [ ] ([ ] ) ( Γ ) πΈπΈππ Γ Γ = [ ] = ππ [ ] = ππ[ ] = ([ ] )ππ = ( Γ )ππ Γ 1 Γ 1 Γ Γ 1 1 (43) ππ ππ ππ ππ ππ[ππ] β([ππ] ππ )ππ β(ππ Γ ππ )ππ ππ 2 2 Γ Γ 2 2 πΈπΈ π π ππ οΏ½ππ πποΏ½ ππ οΏ½ππ ππ ππ οΏ½ οΏ½β ππ ππ πποΏ½ οΏ½β ππ ππ πποΏ½ Substituting from (43)3 in (42) yields3 triple products in the3 rows of ; applying3 the well- ππ ππ ππ β ππ ππ β ππ ππ known triple product expansion formula leaves cross product expressions only between ππ the columns of R (intermediate result) which can also be eliminatedπΆπΆ from the expression by orthonormality (final expression on the right): ( Γ ) ππ ( ) ( ) 2 3 Γ Γ Γ β‘οΏ½ππ β οΏ½πποΏ½οΏ½οΏ½πποΏ½οΏ½ ππ β€ ( ) ππ β’ ππ1 β₯ = ( Γ 2) Γ ( Γ 3) = ( Γ ) = ( ) ππ (44) β‘οΏ½ ππ ππ ππ ππ οΏ½ β€ β’ β₯ ππ β ππ1 ππ ππ ππ ( ) ππ ππ ( Γ ) Γ ( Γ ) β’ 3 1 β₯ 2 πΆπΆ β’οΏ½ ππ ππ3 ππ ππ1 οΏ½ β₯ οΏ½ππ β οΏ½πποΏ½οΏ½οΏ½πποΏ½οΏ½ ππ οΏ½ ππ β ππ ππ οΏ½ ππ β’ ππ2 β₯ ππ β’ β₯ β’ ( Γ ) β₯ ππ β ππ3 ππ οΏ½ ππ ππ1 ππ ππ2 οΏ½ β£ β¦ β’ ππβ₯ οΏ½ππ β ππ1 ππ2 οΏ½ ππ β’ οΏ½οΏ½οΏ½3οΏ½οΏ½ β₯ It is now easy to re-arrange the inner productsβ£ inππ (44) in β¦order to factor-out the columns of R: ( ) ( ) = ( ππ ) ππ = 
ππ( ππ) = ( ) (45) ππ1 ππ ππ ππ1 ππππ ( ππ ) ππ ππ( ππ) ππ ππ ππ 2 2 πΆπΆ οΏ½ ππ ππππ πππποΏ½ οΏ½ππ ππ ππππππ οΏ½ π π ππππ And a well-known skew symmetricππ3 ππ ππ matrixππ3 propertyππππ is that, 2 = + [ ]Γ (46) ππ Substituting (46) into (45) yields: 3 ππππ πΌπΌ ππ [ ]2 ( [ ] ) [ ] [ ] [ ] = + Γ = + Γ Γ = + Γ = + Γ (47) ππ ππ ππ ππ ππ ππ πΆπΆππ π π οΏ½πΌπΌ3 οΏ½ π π οΏ½π π οΏ½οΏ½οΏ½οΏ½οΏ½οΏ½ π π πΈπΈππ β π π πΆπΆππ πΈπΈππ And since b is signππ -ambiguous, πΈπΈitππππ followsππ that there existππ two possible rotationππ matrices and can be obtaining by flipping the sign of [ ]Γ in (D.38): = Β± [ ] (48) ππΓ ππ ππ ππ ππ 5.3 Resolving ambiguities in theπ π baselineπΆπΆ -rotationππ πΈπΈ solution From an algebraic point of view, it is clear that, given an essential matrix , a very same baseline vector can be obtained either from , or from , since = ( ) ( ). Clearly, this ambiguity reflects the direction of the baseline, sinceππ πΈπΈthe negativeππ sign can only originate from [ ]Γ. πΈπΈ βπΈπΈ πΈπΈ πΈπΈ βπΈπΈ βπΈπΈ The four possible relative pose configurations recovered from the essential matrix ππ are illustrated in Figures 3 and 4 in terms of the location of the observed points. It is clear that only one combination of the two baselines and two rotation matrices aligns both cameras behind the observed point. This suggests that, in order to resolve the ambiguity between the four solutions, a reconstruction of the scene must be obtained and the transformation that yields positive (or negative) signs in both depths of an observed point as observed from the two camera views should be the correct one. In other words, the point should lie in front of both cameras. With noisy data, this may not be the case for all points even for the correct transformation and therefore in practice this is a matter of voting.
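The recovery procedure of sections 5.1 and 5.2 can be sketched end-to-end. The pose below is fabricated solely to generate an essential matrix to decompose; with noise-free data the recovered baseline and one of the two rotations should match it exactly:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Fabricated ground truth, used only to build an essential matrix.
theta = 0.25
R_true = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                   [0.0, 1.0, 0.0],
                   [-np.sin(theta), 0.0, np.cos(theta)]])
b_true = np.array([0.8, 0.36, 0.48])    # unit-length baseline
E = R_true.T @ skew(b_true)

# Equation (36): remove scale.
En = E / np.sqrt(np.trace(E.T @ E) / 2.0)

# Equations (37)-(40): |b_i| from the diagonal of En^T En; the entry
# (i, j), i != j, equals -b_i b_j and fixes the remaining signs.
EtE = En.T @ En
b = np.sqrt(np.clip(1.0 - np.diag(EtE), 0.0, None))
k = int(np.argmax(b))                   # largest component taken positive
for i in range(3):
    if i != k and EtE[k, i] > 0:
        b[i] = -b[i]

# Equations (41)-(45): cofactor matrix (each row is the cross product of
# the other two rows of En), then equation (48): R = C^T +/- [b]x En^T.
C = np.stack([np.cross(En[1], En[2]),
              np.cross(En[2], En[0]),
              np.cross(En[0], En[1])])
R1 = C.T + skew(b) @ En.T
R2 = C.T - skew(b) @ En.T               # companion solution for -b

print(np.allclose(R1, R_true), np.allclose(b, b_true))   # -> True True
```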
Figure 3. Camera orientation with respect to the observed points for "positive" baseline direction.
Figure 4. Camera orientation with respect to the observed points for "negative" baseline direction. 6. Scene Reconstruction
In this section, a general method for 3D scene reconstruction in two views from known relative pose and correspondences is presented. This method is also used to disambiguate the four relative pose solutions extracted from the essential matrix.

Let M = [X Y Z]^T be the real-world location of an observed point expressed in the first camera coordinate frame, and let m_1 and m_2 be the respective normalized Euclidean projections in the two camera views. Also, let b denote the baseline vector in the coordinate frame of the first camera, and R the orthonormal matrix containing the directions of the second camera frame (expressed in the first camera coordinate frame) arranged column-wise. It follows that M can be expressed in terms of m_1 as follows:
M = Z m_1    (49)

The location of M in the second camera frame is then R^T (Z m_1 - b). The normalized Euclidean coordinates of m_2 are therefore given by:

m_2 = R^T (Z m_1 - b) / ( 1_z^T R^T (Z m_1 - b) )    (50)

where 1_z = [0 0 1]^T. Equation (50) is a relationship in which the only unknown is the depth Z:

( 1_z^T R^T (Z m_1 - b) ) m_2 - R^T (Z m_1 - b) = 0    (51)

For any vectors a, b, c of suitable dimensions, it is easy to show that (a^T b) c = c a^T b. With this identity at hand, the relationship in (51) becomes:

( m_2 1_z^T - I_3 ) R^T (Z m_1 - b) = 0    (52)

where I_3 is the 3 × 3 identity matrix. Equation (52) is an over-determined system in Z and yields two solutions, one for each projection component, provided that the measurements are completely noise-free. However, in most cases the two solutions do not agree and, furthermore, we observed that in the majority of these cases disparity tends to concentrate either on the x or on the y axis, thereby making one solution more "reliable" than the other. The proposed workaround is to regard Z as the minimizer of the following optimization problem:

min_Z { (Z m_1 - b)^T R C R^T (Z m_1 - b) }    (53)

where C is the following non-invertible positive semi-definite (PSD) matrix:

C = (m_2 1_z^T - I_3)^T (m_2 1_z^T - I_3) = [ 1  0  -x_2 ]
                                            [ 0  1  -y_2 ]
                                            [ -x_2  -y_2  x_2²+y_2² ]    (54)

and x_2 and y_2 are the coordinates of m_2 in the directions of the local x and y axes respectively. Taking the derivative of the quadratic expression in (53) with respect to Z and setting it to zero yields the following minimizer:

Z = ( m_1^T R C R^T b ) / ( m_1^T R C R^T m_1 )    (55)

The expression in (55) is a robust depth estimate which takes the direction of disparity into consideration, thereby avoiding the "pitfall" of having to choose between two solutions for depth without any criterion for making that choice. It should however be noted that the estimate can be very erroneous (e.g., negative depth) if disparity is very noisy in both axes; in such a case it is preferable to discard the point.
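A minimal sketch of the depth estimate of equation (55), with a made-up pose and a noise-free correspondence (numpy assumed):

```python
import numpy as np

# Hypothetical pose and a noise-free correspondence (m1, m2).
theta = 0.15
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
b = np.array([1.0, 0.0, 0.0])

M = np.array([0.4, 0.3, 6.0])          # ground-truth point, first frame
m1 = M / M[2]
M2 = R.T @ (M - b)
m2 = M2 / M2[2]

one_z = np.array([0.0, 0.0, 1.0])
A = np.outer(m2, one_z) - np.eye(3)    # equation (52)
C = A.T @ A                            # equation (54), PSD weighting matrix

# Equation (55): Z = (m1^T R C R^T b) / (m1^T R C R^T m1).
W = R @ C @ R.T
Z = (m1 @ W @ b) / (m1 @ W @ m1)
print(Z)                               # ~6.0, the true depth
```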
6. The Algorithm and Sample Reconstructions
Examples of two-view reconstructions using the methods described in the previous sections are illustrated in Figures 5-8. Features were detected with SIFT and the LK tracker was used to establish correspondences. Flow fields are shown on the images on the right (in green are new features detected in the second image for subsequent tracking). RANSAC outliers were omitted from the reconstruction and do not appear in the flow field illustration. Finally, since camera intrinsics were unknown in all cases, the reconstructions present a discrepancy of up to an affine transformation with respect to the ground truth (made-up intrinsics were used). Algorithm 1 describes the steps for relative pose and structure recovery from an essential matrix.

Algorithm 1. 3D reconstruction and recovery of relative pose from two views
Input: a) Set of normalized Euclidean correspondences (m_1^(i), m_2^(i)); b) Essential matrix E.
Output: a) Camera relative pose (R, b); b) 3D coordinates M^(i) of all points.

comment Obtain the SVD of E:
[U, S, V] ← svd(E)

comment Obtain a "normalized essential matrix" by removing scale and, at the same time, imposing the necessary and sufficient condition of exactly two equal non-zero singular values:
E_n ← U diag(1, 1, 0) V^T

comment Compute the absolute values of the baseline components b_1, b_2, b_3 from E_n^T E_n:
b_1 ← sqrt(1 - [E_n^T E_n]_11)
b_2 ← sqrt(1 - [E_n^T E_n]_22)
b_3 ← sqrt(1 - [E_n^T E_n]_33)

comment Choose the greatest component (in absolute value) as positive and work out the remaining signs from the off-diagonal elements of E_n^T E_n:
If max{b_1, b_2, b_3} = b_1:
    If [E_n^T E_n]_12 > 0: b_2 ← -b_2
    If [E_n^T E_n]_13 > 0: b_3 ← -b_3
Else If max{b_1, b_2, b_3} = b_2:
    If [E_n^T E_n]_12 > 0: b_1 ← -b_1
    If [E_n^T E_n]_23 > 0: b_3 ← -b_3
Else:
    If [E_n^T E_n]_13 > 0: b_1 ← -b_1
    If [E_n^T E_n]_23 > 0: b_2 ← -b_2

comment Store the two possible baselines:
Baselines ← { (b_1, b_2, b_3)^T, (-b_1, -b_2, -b_3)^T }

comment Find the matrix of cofactors of E_n:
Cof ← [ e_22 e_33 - e_32 e_23    -(e_21 e_33 - e_31 e_23)   e_21 e_32 - e_31 e_22  ]
      [ -(e_12 e_33 - e_32 e_13)   e_11 e_33 - e_31 e_13    -(e_11 e_32 - e_31 e_12) ]
      [ e_12 e_23 - e_22 e_13    -(e_11 e_23 - e_21 e_13)   e_11 e_22 - e_21 e_12  ]

comment Store the two possible rotation matrices:
Rotations ← { Cof^T + [Baselines(1)]_× E_n^T, Cof^T + [Baselines(2)]_× E_n^T }

comment Find the best scene reconstruction over the 4 possible relative poses:
BestReconstruction ← {Rotations(1), Baselines(1)}
MinCount ← ∞
For each baseline b in Baselines:
    For each rotation matrix R in Rotations:
        errorCount ← 0
        For each pair of correspondences (m_1, m_2):
            C ← (m_2 1_z^T - I_3)^T (m_2 1_z^T - I_3)
            Z_1 ← (m_1^T R C R^T b) / (m_1^T R C R^T m_1)
            Z_2 ← 1_z^T R^T (Z_1 m_1 - b)
            If Z_1 ≤ 0 Or Z_2 ≤ 0:
                errorCount ← errorCount + 1
            Else:
                M ← Z_1 m_1
        If errorCount < MinCount:
            BestReconstruction ← {R, b}
            MinCount ← errorCount
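The voting loop of Algorithm 1 can be sketched on synthetic data. Everything below (the pose, the points and the helper `depths`) is made up for illustration; with noise-free correspondences the correct (R, b) combination should incur zero negative-depth votes:

```python
import numpy as np

def skew(v):
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

def depths(R, b, m1, m2):
    # Depths in both views via equations (52)-(55).
    A = np.outer(m2, [0.0, 0.0, 1.0]) - np.eye(3)
    W = R @ (A.T @ A) @ R.T
    Z1 = (m1 @ W @ b) / (m1 @ W @ m1)
    Z2 = (R.T @ (Z1 * m1 - b))[2]
    return Z1, Z2

# Synthetic data standing in for tracked correspondences.
rng = np.random.default_rng(1)
theta = 0.2
R_true = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                   [np.sin(theta),  np.cos(theta), 0.0],
                   [0.0, 0.0, 1.0]])
b_true = np.array([1.0, 0.0, 0.0])
pts = rng.uniform([-1.0, -1.0, 4.0], [1.0, 1.0, 8.0], size=(20, 3))
m1s = pts / pts[:, 2:3]
pts2 = (R_true.T @ (pts - b_true).T).T
m2s = pts2 / pts2[:, 2:3]

# Decompose the (normalized) essential matrix as in section 5.
E = R_true.T @ skew(b_true)
En = E / np.sqrt(np.trace(E.T @ E) / 2.0)
EtE = En.T @ En
b = np.sqrt(np.clip(1.0 - np.diag(EtE), 0.0, None))
k = int(np.argmax(b))
for i in range(3):
    if i != k and EtE[k, i] > 0:
        b[i] = -b[i]
C = np.stack([np.cross(En[1], En[2]),
              np.cross(En[2], En[0]),
              np.cross(En[0], En[1])])

# Vote over the four (rotation, baseline) combinations.
baselines = [b, -b]
rotations = [C.T + skew(bb) @ En.T for bb in baselines]
best_pose, best_errors = None, len(m1s) + 1
for R in rotations:
    for bb in baselines:
        errors = sum(1 for m1, m2 in zip(m1s, m2s)
                     if min(depths(R, bb, m1, m2)) <= 0)
        if errors < best_errors:
            best_pose, best_errors = (R, bb), errors
print(best_errors)                      # -> 0 for the correct configuration
```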
Figure 5. A reconstruction of the famous Hannover dinosaur (Niem and Buschmann 1995) from the first two views of the sequence.
Figure 6. A reconstruction of Wadham College, Oxford, from two views (Werner and Zisserman 2002).
Figure 7. A reconstruction of a model house2 from two views.
Figure 8. Reconstruction from the first two frames of the "corridor" sequence2. The two camera locations are shown as green spots on the left.
7. Scene Reconstruction from the Essential Matrix: Whereβs the Hack?
There have been many authors who presented good to high quality results in terms of recovering camera pose and scene structure from the essential matrix. To name a few renowned researchers, Pollefeys (Pollefeys, Van Gool et al. 2004), Nister (NistΓ©r 2004), Zisserman and Hartley (Hartley and Zisserman 2003) have presented remarkable scene
2 Images retrieved from http://www.robots.ox.ac.uk/~vgg/data/data-mview.html

reconstructions with their specialized algorithms for the computation of the fundamental matrix (and subsequently, of the essential matrix through known camera intrinsics). To the best of my knowledge, with the exception of Nister's algorithm3, what these methods have in common is that they do not directly address the two primary constraints associated with the essential matrix: a) it has exactly two non-zero singular values and, b) these singular values are equal. Typically, the aforementioned algorithms enforce these constraints after the optimization. In my opinion, the ramifications of this strategy can be unpredictable, depending on the level of noise in the data. Enforcing two equal singular values is a brutal way of imposing constraints and can potentially alter the relative pose estimate to an extent from which no iterative refinement can recover. As proven in theorem 1, a matrix E is an essential matrix if and only if it can be written as the sum of two tensor products:

E = v_1 u_1^T + v_2 u_2^T    (56)

where v_1, v_2, u_1, u_2 are unit vectors such that v_1^T v_2 = u_1^T u_2 = 0. An alternative approach would be to use the standard formula for the essential matrix given in equation (8) and impose orthonormality constraints on the columns of the rotation matrix and a unit-norm constraint on the baseline. Either way, a properly constrained optimization involves a Lagrangian (or some parametrized expression) with the standard 8-point algorithm cost function and 5 Lagrange multipliers (2 for orthogonality and 3 for unit norms). The Karush-Kuhn-Tucker (KKT) conditions will eventually lead to a 4th degree polynomial system in the components of v_1, v_2, u_1, u_2. This system is relatively hard to obtain analytically and requires a special category of algorithms known as Groebner basis solvers (Lazard 1983) to solve it.
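A small numerical check of this claim (assuming numpy; the orthonormal vectors are an arbitrary illustrative choice): any E built as in equation (56) has singular values {1, 1, 0} by construction, so the essential-matrix constraints hold without post-hoc enforcement:

```python
import numpy as np

# Equation (56): with orthonormal pairs (v1, v2) and (u1, u2), the sum of
# the two tensor products below satisfies E E^T = v1 v1^T + v2 v2^T,
# whose eigenvalues are {1, 1, 0}; hence the singular values of E are
# {1, 1, 0} and the conditions of theorem 1 hold by construction.
v1 = np.array([1.0, 0.0, 0.0])
v2 = np.array([0.0, 0.6, 0.8])
u1 = np.array([0.0, 1.0, 0.0])
u2 = np.array([-0.8, 0.0, 0.6])
E = np.outer(v1, u1) + np.outer(v2, u2)

print(np.linalg.svd(E, compute_uv=False))   # two equal singular values, one zero
assert np.allclose(E @ E.T @ E, 0.5 * np.trace(E @ E.T) * E)
```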
To the best of my knowledge, the parametrization of equation (3.32) is not frequently encountered in the literature. Nistér's solution (Nistér 2004) deserves special mention in this section as, to the best of the author's knowledge, the only method that actually solves for the essential matrix while abiding by the orthogonality constraints. The idea is to recover the essential matrix from the 4-dimensional null space of the data matrix. The constraints yield a polynomial system containing reasonably-sized expressions (only 4 unknowns up to arbitrary scale), which can be solved in a relatively uncomplicated manner. Of course, the problem with this method is that it does not generalize to the most usual formulation of the problem, which is an overdetermined system4, in which case the null space of the data matrix is rarely non-empty. For completeness, I would also like to mention a parametrization by Vincent Lui and Tom Drummond for an iterative solution of the 5-point problem (Lui and Drummond 2007). Other solutions to the 5-point problem include Hongdong Li's (Li and Hartley 2006) and Kukelova's (Kukelova, Bujnak et al. 2008) methods.
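The null-space step of Nistér's method described above can be sketched as follows. This is only the linear first stage; the point correspondences here are synthetic placeholders, and the subsequent polynomial solving is omitted:

```python
import numpy as np

rng = np.random.default_rng(1)
x1 = rng.standard_normal((5, 3))  # homogeneous image points, view 1
x2 = rng.standard_normal((5, 3))  # homogeneous image points, view 2

# Row i is kron(x2_i, x1_i), so that A @ vec(E) = 0 expresses the
# epipolar constraint x2^T E x1 = 0 linearly in the 9 row-major
# entries of E.
A = np.stack([np.kron(x2[i], x1[i]) for i in range(5)])

# The right singular vectors belonging to the zero singular values of
# the 5x9 data matrix span its 4-dimensional null space; any essential
# matrix fitting the data is a combination E = sum_i a_i * X[i].
_, _, Vt = np.linalg.svd(A)
X = [Vt[i].reshape(3, 3) for i in range(5, 9)]
```

Nistér's method then substitutes this 4-parameter family into det(E) = 0 and 2*E*E^T*E - trace(E*E^T)*E = 0, which produces the polynomial system in the 4 unknowns (up to scale) mentioned above.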
3 Of course, many variations of the 5-point algorithm have been proposed since Nistér's, but they do not add anything essentially new to the concept.
4 Nistér mentions in his paper that the method generalizes to the overdetermined case if the 4 singular vectors of the data matrix corresponding to the 4 smallest singular values are taken instead of the 4 null-space basis vectors used in the 5-point case. I believe that this is an arbitrary assertion. Clearly, one is free to employ the singular vectors to diagonalize the data matrix, but there is no justification for why the constrained minimum should lie in the span of the 4 smallest singular vectors. It is a sane conjecture from a greedy point of view, but there is no proof; moreover, there is no justification for why 4 vectors should be used instead of, for instance, 5, which is the number of degrees of freedom of the essential matrix.

References
[1] H. C. Longuet-Higgins, "A computer algorithm for reconstructing a scene from two projections," in Readings in Computer Vision: Issues, Problems, Principles, and Paradigms, p. 61, 1987.
[2] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision, 2nd ed.: Cambridge University Press, 2003.
[3] E. Trucco and A. Verri, Introductory Techniques for 3-D Computer Vision: Prentice Hall, 1998.
[4] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint: MIT Press, 1993.
[5] O. Faugeras, Q.-T. Luong, and T. Papadopoulo, The Geometry of Multiple Images: The Laws That Govern the Formation of Multiple Images of a Scene and Some of Their Applications: MIT Press, 2004.
[6] J.-Y. Bouguet, "Pyramidal implementation of the affine Lucas-Kanade feature tracker: description of the algorithm," Technical report, Intel Corporation, 2001.
[7] B. K. P. Horn and B. G. Schunck, "Determining optical flow," Artificial Intelligence, vol. 17, pp. 185-203, 1981.
[8] M. Z. Brown, D. Burschka, and G. D. Hager, "Advances in computational stereo," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 25, pp. 993-1008, 2003.
[9] Q.-T. Luong and O. D. Faugeras, "The fundamental matrix: theory, algorithms, and stability analysis," International Journal of Computer Vision, vol. 17, pp. 43-75, 1996.
[10] W. Chojnacki and M. J. Brooks, "On the consistency of the normalized eight-point algorithm," Journal of Mathematical Imaging and Vision, vol. 28, pp. 19-27, 2007.
[11] R. I. Hartley, "In defense of the eight-point algorithm," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, pp. 580-593, 1997.
[12] Y. Ma, S. Soatto, J. Kosecka, and S. S. Sastry, An Invitation to 3-D Vision: From Images to Geometric Models: Springer, 2003.
[13] B. K. P. Horn, "Recovering baseline and orientation from essential matrix," Journal of the Optical Society of America, 1990.
[14] OpenGL Architecture Review Board, M. Woo, J. Neider, and T. Davis, OpenGL Programming Guide: Addison-Wesley, 1999.
[15] R. Hartley, "Ambiguous configurations for 3-view projective reconstruction," in Computer Vision - ECCV 2000, pp. 922-935, 2000.
[16] A. Hejlsberg, S. Wiltamuth, and P. Golde, C# Language Specification: Addison-Wesley Longman Publishing Co., Inc., 2003.
[17] D. Fay, "An architecture for distributed applications on the Internet: overview of Microsoft's .NET platform," in Proceedings of the International Parallel and Distributed Processing Symposium, 2003.
[18] G. Bradski, "The OpenCV library," Dr. Dobb's Journal, vol. 25, pp. 120-126, 2000.
[19] M. A. Fischler and R. C. Bolles, "Random sample consensus: a paradigm for model fitting with applications to image analysis and automated cartography," Communications of the ACM, vol. 24, pp. 381-395, 1981.