An Efficient Solution to Non-Minimal Case Essential Matrix Estimation

Ji Zhao

Abstract—Finding the relative pose between two calibrated images is a fundamental task in computer vision. Given five point correspondences, the classical five-point methods can be used to calculate the essential matrix efficiently. For the case of N (N > 5) inlier point correspondences, which is called the N-point problem, existing methods are either inefficient or prone to local minima. In this paper, we propose a certifiably globally optimal and efficient solver for the N-point problem. First we formulate the problem as a quadratically constrained quadratic program (QCQP). Then a certifiably globally optimal solution is obtained by semidefinite relaxation, which allows us to solve the original non-convex QCQP in polynomial time. Theoretical guarantees of the semidefinite relaxation are also provided, including tightness and local stability. To deal with outliers, we propose a robust N-point method using M-estimators. Though global optimality cannot be guaranteed for the overall robust framework, the proposed robust N-point method achieves good performance when the outlier ratio is not high. Extensive experiments on synthetic and real-world datasets demonstrate that our N-point method is 2∼3 orders of magnitude faster than state-of-the-art methods. Moreover, our robust N-point method outperforms state-of-the-art methods in terms of robustness and accuracy.

Index Terms—Relative pose estimation, essential manifold, non-minimal solver, robust estimation, quadratically constrained quadratic program, semidefinite programming, convex optimization

• J. Zhao is with TuSimple, Beijing, China. E-mail: [email protected]

1 INTRODUCTION

Finding the relative pose between two images using 2D-2D point correspondences is a cornerstone in geometric vision. It is a basic building block in many structure-from-motion (SfM), visual odometry, and simultaneous localization and mapping (SLAM) systems [1]. Relative pose estimation is a difficult problem since it is by nature non-convex and known to be plagued by local minima and ambiguous solutions. Relative poses for uncalibrated and calibrated cameras are usually characterized by the fundamental matrix and the essential matrix, respectively [1]. A matrix is a fundamental matrix if and only if it has two non-zero singular values. An essential matrix has the additional property that its two nonzero singular values are equal. Due to these strict constraints, essential matrix estimation is arguably more challenging than fundamental matrix estimation. This paper focuses on optimal essential matrix estimation.

Due to the scale ambiguity of translation, the relative pose of calibrated cameras has 5 degrees-of-freedom (DoFs), including 3 for rotation and 2 for translation. Except for degenerate configurations, 5 point correspondences are hence enough to determine the relative pose. Given five point correspondences, the five-point methods using the essential matrix [2], [3] or other parametrizations [4] can be used to calculate the relative pose efficiently. The aforementioned solvers are the so-called minimal solvers. When point correspondences contain outliers, minimal solvers are usually integrated into a hypothesize-and-test framework, such as RANSAC [5], to find the solution corresponding to the maximal consensus set. Hence this framework can provide high robustness.

Once the maximal consensus set has been found, standard RANSAC optionally re-estimates a model by using all inliers to reduce the influence of noise [1], [6]. Thus a non-minimal solver is needed in this procedure. This RANSAC framework is called the gold standard algorithm [1]. Figure 1 illustrates this framework by taking line fitting as an example. In addition to this usage as post-processing, non-minimal solvers can also be integrated more tightly into RANSAC variants. For example, LO-RANSAC [7] attempts to enlarge the consensus set of an initial RANSAC estimate by generating hypotheses from larger-than-minimal subsets of the consensus set. The rationale is that hypotheses fitted on a larger number of inliers typically lead to better estimates with higher support.

In this paper, non-minimal case relative pose estimation is called the N-point problem, and its solver is called an N-point method. As has been investigated in [8], [9], [10], N-point methods usually lead to more accurate results than five-point methods. Thus the N-point method is useful for scenarios that require accurate pose estimation, such as visual odometry and image-based visual servoing. The well-known direct linear transformation (DLT) technique [1] with proper normalization [11] can be used to estimate the essential matrix using 8 or more point correspondences. However, DLT ignores the inherent nonlinear constraints of the essential matrix. To deal with this problem, a valid essential matrix is recovered after an approximate essential matrix is obtained from the DLT solution [1]. An eigenvalue-based formulation and its variant were proposed to solve the N-point problem [9], [10]. However, all the aforementioned methods fail to guarantee global optimality and efficiency simultaneously.

Since the N-point problem is challenging, the progress of its solvers is far behind absolute pose estimation (perspective-n-point, PnP) and point cloud registration. For example, the EPnP algorithm [12] in the PnP area is very efficient and has linear complexity in the number of observations. It has been successfully used in the RANSAC framework given an arbitrary number of 2D-3D correspondences. Though relative pose estimation is more difficult than absolute pose estimation, it is desirable to find a practical solver for the N-point problem whose efficiency and global optimality are both satisfactory. This is one motivation of this paper.

Another motivation of this paper is developing an M-estimator based method for relative pose estimation. In practice, it arises frequently that the data have been contaminated by large noise and outliers. Existing robust estimation methods are mainly classified into two categories, i.e., inlier set maximization [13] and M-estimator based methods [14]. Inlier set maximization can be achieved by randomized sampling methods [5], [7] or deterministic optimization methods [15], [16]. For M-estimator based methods, the associated optimization problems are non-convex and difficult to solve. In this paper, the proposed robust N-point method uses M-estimators. To solve the associated optimization problem effectively, the line process is adopted [17]. The line process uses a continuous iterative optimization strategy which takes a weighted version of a non-minimal solver as a vital requirement. Due to the lack of an efficient and globally optimal N-point method, a practical M-estimator based relative pose estimation method did not exist before.

Based on the aforementioned motivations, in this paper we propose a novel N-point method and integrate this method into M-estimators. The contributions of this paper are three-fold.

• Efficient and globally optimal N-point method. A simple parameterization is proposed to characterize the essential manifold. Based on this parameterization, a certifiably globally optimal N-point method is proposed, which is 2∼3 orders of magnitude faster than state-of-the-art methods.
• Robust N-point method. We propose a robust essential matrix estimation method by integrating the N-point method into M-estimators. Considering that the robust components of the overall framework are not certifiably and provably optimal, we can only demonstrate empirical performance assurances.
• Theoretical aspects. We provide theoretical proofs for the semidefinite relaxation (SDR) in the proposed N-point method, including SDR tightness and local stability with small observation noise.

Fig. 1. The RANSAC framework contains the collaboration of a minimal solver and a non-minimal solver: consensus set maximization (using a minimal solver) selects the best model maximizing the consensus set from input observations with outliers, and error minimization (using a non-minimal solver) then finds the best model minimizing the fitting error. This framework is known as the gold standard algorithm [1]. This paper focuses on relative pose estimation; line fitting is used here only because of its convenient visualization.

The paper is organized as follows. Section 2 introduces the related work. In Section 3, we propose a simple parameterization of the essential manifold and provide novel formulations of N-point methods. Based on these formulations, Section 4 derives a convex optimization approach by SDR. Section 5 proves tightness and local stability of the SDR. A robust N-point method based on a robust loss function is proposed in Section 6. Section 7 presents the performance of our method in comparison to other approaches, followed by a concluding discussion in Section 8.

2 RELATED WORK

Estimating an essential matrix for a calibrated camera from point correspondences is an active research area in computer vision. Finding the optimal essential matrix by an L∞-norm cost and branch-and-bound (BnB) was proposed in [18]. It achieves global optima but is inefficient. There are several works for N-point fundamental/essential matrix estimation using local optimization [19], [20] or manifold optimization [21], [22], [23]. A method for minimizing an algebraic error was investigated in [8]. Its global optimality was obtained by a square matrix representation of homogeneous forms and relaxation. An eigenvalue-based formulation was proposed to estimate the rotation matrix [9], in which the problem is optimized by local gradient descent or BnB search. It was later improved by a certifiably globally optimal solution based on relaxation and semidefinite programming (SDP) [10]. However, none of the aforementioned methods can find the globally optimal solution efficiently. Though BnB search methods [9], [18] can obtain global optima in theory, they have exponential time complexity in the worst case. In [8], [10], a theoretical guarantee of the convexification procedures is not provided. The paper most related to this work is [10], which converts the N-point problem of an eigenvalue-based formulation to a QCQP. However, its efficiency is not satisfactory and the tightness of its SDR has not been proved.

There are also several works on fundamental matrix estimation for uncalibrated cameras. The eight-point method [11] uses a linear solution, then recovers a valid fundamental matrix by SVD. This method ignores the rank constraint on the fundamental matrix, thus the solution is not optimal. In [24], a method for minimizing an algebraic error was proposed which ensures the rank constraint. However, it does not guarantee global minima. In [25], [26], the constraint for a fundamental matrix is imposed by setting its determinant to 0, leading to a cubic polynomial constraint. In [27], the fundamental matrix estimation problem is reduced to one or several constrained polynomial optimization problems.

Unfortunately, the aforementioned methods deal with uncalibrated cameras only, where the underlying Euclidean constraints of an essential matrix are not exploited. Thus they cannot be applied to essential matrix estimation.

For both the essential matrix and the fundamental matrix, optimal pose estimation can be formulated as a polynomial optimization problem [28]. A polynomial optimization problem can be converted to a QCQP. In multiple view geometry, SDR for polynomial optimization problems was first studied in [29]. Later, a large number of methods using QCQP formulations were developed in computer vision and robotics. For example, SDR or Lagrangian duality of QCQPs has been used in point set registration [30], triangulation [31], relative pose estimation [10], rotation averaging [32], pose synchronization [33], and the Wahba problem [34]. However, SDR does not guarantee a priori that it generates an optimal solution. If a QCQP satisfies certain conditions and the data noise lies within a critical threshold, a recent study proved that the solution of the SDP optimization is guaranteed to be globally optimal [35], [36]. A noise threshold that guarantees tightness of SDR is given for the rotation averaging problem [32]. For general QCQPs, global optimality still remains an open problem.

Existing robust estimation methods in geometric vision are mainly based on inlier set maximization [13] or M-estimators [14]. Inlier set maximization was proven to be NP-hard [37]. BnB search can be used to find the globally optimal solution [38], [39], [40], [41], [42], but its efficiency is not satisfactory. There are a variety of methods to approximately and efficiently solve the inlier set maximization problem. The most popular algorithms belong to a class of randomized sampling techniques, i.e., RANSAC [5] and its variants [6], [7], [15], [16]. A hybrid method of BnB and mixed integer programming (MIP) [43] was proposed to solve inlier set maximization in relative pose estimation. Another alternative robust framework, which is based on M-estimators, has been successfully applied to many fields such as bundle adjustment [44], registration [45], [46], and data clustering [47]. An important technique to optimize M-estimators is the line process [17], which is also a building block of the robust version of the proposed method. In the line process, progress is hindered by the lack of an efficient and globally optimal non-minimal solver. The proposed N-point method in this paper can be integrated into the line process to obtain a robust N-point method.

3 FORMULATIONS OF N-POINT METHOD

Denote (p_i, p_i') as the i-th point correspondence of the same 3D world point from two distinct viewpoints. Point observations p_i and p_i' are represented as homogeneous coordinates in the normalized image plane¹. Each point in the normalized image plane can be translated into a unique unit bearing vector originating from the camera center. Let (f_i, f_i') denote a correspondence of bearing vectors pointing at the same 3D world point from two distinct viewpoints, where f_i represents the observation from the first viewpoint, and f_i' the observation from the second viewpoint. The bearing vectors are determined by f_i = p_i/‖p_i‖ and f_i' = p_i'/‖p_i'‖.

The relative pose is composed of rotation R and translation t. Rotation R transforms vectors from the second into the first frame. Translation t is expressed in the first frame and denotes the position of the second frame with respect to the first one. The normalized translation t = [t_1, t_2, t_3]^T will be identified with points in the 2-sphere S², i.e.,

    S² ≜ {t ∈ R³ | t^T t = 1}.

The 3D rotation will be represented as a 3×3 orthogonal matrix with positive determinant belonging to the special orthogonal group SO(3), i.e.,

    SO(3) ≜ {R ∈ R^{3×3} | R^T R = I, det(R) = 1},

where I is a 3×3 identity matrix.

1. Bold capital letters denote matrices (e.g., E and R); bold lower-case letters denote column vectors (e.g., e, t); non-bold lower-case letters represent scalars (e.g., λ). By Matlab syntax, we use semicolon/comma in matrix concatenation to arrange entries vertically/horizontally. For example, [[a], [b]] = [a, b] and [[a]; [b]] = [a; b].

3.1 Parametrization for Essential Manifold

The essential matrix E is defined as [1]

    E = [t]× R,    (1)

where [·]× denotes the skew-symmetric matrix corresponding to a 3-dimensional vector, i.e.,

    [t]× = [ 0, −t_3, t_2;  t_3, 0, −t_1;  −t_2, t_1, 0 ].    (2)

Denote the essential matrix E as

    E = [e_1^T; e_2^T; e_3^T] = [ e_11, e_12, e_13;  e_21, e_22, e_23;  e_31, e_32, e_33 ],

where e_i^T is the i-th row of E. Denote its corresponding vector as

    e ≜ vec(E) = [e_1; e_2; e_3] = [e_11; e_12; e_13; e_21; e_22; e_23; e_31; e_32; e_33],    (3)

where vec(·) means stacking all the entries of a matrix in row-first order. There is no essential difference between different orders of entry stacking in this paper; we adopt row-first order to make the notation in Section 5 more convenient.

In this paper, an essential matrix set is defined as

    M_E ≜ {E | E = [t]× R, ∃ R ∈ SO(3), t ∈ S²}.    (4)

This essential matrix set is called the normalized essential manifold [21], [22], [23]. Theorem 1 provides equivalent conditions to define M_E, which will greatly simplify the optimization in the proposed methods.

Theorem 1. A real 3×3 matrix E is an element of M_E if and only if there exists a vector t ∈ R³ satisfying the following two conditions:

    (i) E E^T = [t]× [t]×^T  and  (ii) t^T t = 1.    (5)

Proof. For the if direction, first it can be verified that

    det([t]× [t]×^T − σ I) = −σ [σ − (t_1² + t_2² + t_3²)]² = −σ(σ − 1)².

By combining this result with condition (i), we can see that E E^T has an eigenvalue 1 with multiplicity 2 and an eigenvalue 0. According to the definition of singular values, the nonzero singular values of E are the square roots of the nonzero eigenvalues of E E^T. Thus the two nonzero singular values of E are equal to 1. According to Theorem 1 in [48], E is an essential matrix. By combining condition (ii), E is an element of M_E.

For the only if direction, E is supposed to be an essential matrix in M_E. According to the definition of M_E, there exists a vector t satisfying condition (ii). In addition, there exists a rotation matrix R such that E = [t]× R. It can be verified that E E^T = ([t]× R)([t]× R)^T = [t]× [t]×^T, thus condition (i) is also satisfied.

It is worth mentioning that a necessary condition for a general essential matrix, which is similar to the only if direction in Theorem 1, was presented in [48, Proposition 2] and [49, Lemma 7.2]. In Theorem 1, we further prove that this condition is also sufficient and propose a novel parameterization for the normalized essential manifold.

3.2 Optimizing Essential Matrix by Minimizing an Algebraic Error

For noise-free cases, the epipolar constraint [1] implies that

    f_i^T E f_i' = 0.    (6)

Under the presence of noise, this constraint does not strictly hold. We pursue the optimal pose by minimizing an algebraic error

    min_{E ∈ M_E}  Σ_{i=1}^{N} (f_i^T E f_i')².    (7)

The algebraic error in the objective has been widely used in previous literature [8], [24], [27], [50]. The objective in problem (7) can be reformulated as a standard quadratic form

    Σ_{i=1}^{N} (f_i^T E f_i')² = e^T C e,    (8)

where

    C = Σ_{i=1}^{N} (f_i ⊗ f_i')(f_i ⊗ f_i')^T,    (9)

and "⊗" denotes the Kronecker product. Note that C is a Gram matrix, so it is positive semidefinite and symmetric.

3.3 QCQP Formulations

By explicitly writing the constraints for the essential manifold M_E, we reformulate problem (7) as

    min_{E, R, t}  e^T C e    (10)
    s.t.  E = [t]× R,  R ∈ SO(3),  t ∈ S².

This problem is a QCQP: the objective is a positive semidefinite quadratic polynomial; the constraint on the translation vector, t^T t = 1, is also quadratic; a rotation matrix R can be fully defined by 20 quadratic constraints [4], [10]; and the relationship between E, R and t, E = [t]× R, is also quadratic. This formulation has 21 variables and 30 constraints.

According to Theorem 1, an equivalent QCQP form of minimizing the algebraic error is

    min_{E, t}  e^T C e    (11)
    s.t.  E E^T = [t]× [t]×^T,  t^T t = 1.

There are 12 variables and 7 constraints in this problem. The constraints can be written explicitly as

    h_1 = e_1^T e_1 − (t_2² + t_3²) = 0,    (12a)
    h_2 = e_2^T e_2 − (t_1² + t_3²) = 0,    (12b)
    h_3 = e_3^T e_3 − (t_1² + t_2²) = 0,    (12c)
    h_4 = e_1^T e_2 + t_1 t_2 = 0,    (12d)
    h_5 = e_1^T e_3 + t_1 t_3 = 0,    (12e)
    h_6 = e_2^T e_3 + t_2 t_3 = 0,    (12f)
    h_7 = t^T t − 1 = 0.    (12g)

Problems (10) and (11) are equivalent since their objectives are the same and their feasible regions are equivalent. Both problems are nonconvex. Appendix A provides another equivalent optimization problem. In the following, we will only consider problem (11) due to its fewer variables and constraints. Moreover, it is homogeneous without the need of homogenization, which makes it simpler than the alternative formulations.

Remark: Both problems (10) and (11) are equivalent to an eigenvalue-based formulation [9], [10]. A proof of the equivalence is available in the supplementary material of [10]. Since all the mentioned formulations essentially utilize the normalized essential manifold as feasible region and have equivalent objectives, all these formulations are equivalent. Our formulations (10) and (11) have the following two advantages:

(1) Our formulations have fewer variables and constraints. In contrast, the eigenvalue-based formulation in [10] involves 40 variables and 536 constraints. As shown in the following sections, the simplicity of our formulations will result in much more efficient solvers and enable the proofs of tightness and local stability.

(2) Our formulations easily associate priors with each point correspondence by simply introducing weights in the objective. For example, we may introduce a weight for each sample by slightly changing the objective to Σ_{i=1}^{N} w_i (f_i^T E f_i')², where w_i ≥ 0 is the weight for the i-th observation. For general cases in which w_i ≥ 0, we can keep the current formulation by changing the construction of C to

    C = Σ_{i=1}^{N} w_i (f_i ⊗ f_i')(f_i ⊗ f_i')^T.    (13)
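To make the construction above concrete, the following NumPy sketch builds the data matrix C of Eq. (9)/(13) from bearing vectors and checks the manifold conditions of Theorem 1 (constraints (12a)–(12g)) on an essential matrix assembled from a ground-truth pose. The function names (build_C, manifold_residuals) and the synthetic-scene setup are illustrative assumptions, not code from the paper.

```python
import numpy as np

def skew(t):
    """[t]_x of Eq. (2)."""
    return np.array([[0, -t[2], t[1]],
                     [t[2], 0, -t[0]],
                     [-t[1], t[0], 0]])

def build_C(f1, f2, w=None):
    """Gram matrix C of Eq. (9) (or weighted Eq. (13)) from (N, 3) bearing vectors."""
    K = np.einsum('ij,ik->ijk', f1, f2).reshape(len(f1), 9)  # rows are f_i Kronecker f_i'
    if w is None:
        w = np.ones(len(f1))
    return K.T @ (K * w[:, None])

def manifold_residuals(E, t):
    """Residuals of constraints (12a)-(12g); all vanish iff E lies on M_E (Theorem 1)."""
    e1, e2, e3 = E
    t1, t2, t3 = t
    return np.array([e1 @ e1 - (t2**2 + t3**2),
                     e2 @ e2 - (t1**2 + t3**2),
                     e3 @ e3 - (t1**2 + t2**2),
                     e1 @ e2 + t1 * t2,
                     e1 @ e3 + t1 * t3,
                     e2 @ e3 + t2 * t3,
                     t @ t - 1.0])

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random ground-truth pose: X1 = R X2 + t (R maps frame 2 into frame 1).
    t = rng.normal(size=3); t /= np.linalg.norm(t)
    R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
    if np.linalg.det(R) < 0:
        R[:, 0] *= -1
    # Noise-free points at distance 4-8 from the origin, expressed in both frames.
    dirs = rng.normal(size=(30, 3)); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    X1 = rng.uniform(4, 8, size=(30, 1)) * dirs
    X2 = (X1 - t) @ R
    f1 = X1 / np.linalg.norm(X1, axis=1, keepdims=True)
    f2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)
    E = skew(t) @ R                       # Eq. (1)
    e = E.reshape(-1)                     # vec(E), row-first, Eq. (3)
    print("algebraic error e^T C e:", e @ build_C(f1, f2) @ e)    # ~0 for noise-free data
    print("max |h_i|:", np.abs(manifold_residuals(E, t)).max())   # ~0, Theorem 1
```

With noisy observations, the algebraic error e^T C e becomes strictly positive and problem (11) searches for the E on the essential manifold that minimizes it.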

4 OPTIMIZATION OF N-POINT METHOD

QCQP is a long-standing problem in the optimization literature with many applications. Solving its general case is an NP-hard problem. Global optimization methods for QCQP are typically based on convex relaxations of the problem. There are two main relaxations for QCQP: SDR and the reformulation-linearization technique. In this paper, we use SDR since it usually has better performance [51] and is convenient for tightness analysis.

Let us consider a QCQP in a general form as

    min_{x ∈ R^n}  x^T C_0 x    (14)
    s.t.  x^T A_i x = b_i,  i = 1, ..., m,

where C_0, A_1, ..., A_m ∈ S^n and S^n denotes the set of all real symmetric n×n matrices. In our problem,

    x ≜ [e; t]    (15)

is a vector stacking all entries of the essential matrix E and the translation vector t; n = 12; m = 7; C_0 = [C, 0_{9×3}; 0_{3×9}, 0_{3×3}]; and A_1, ..., A_7 correspond to the canonical form x^T A_i x of Eqs. (12a)∼(12g), respectively.

A crucial first step in deriving an SDR of problem (14) is to observe that

    x^T C_0 x = trace(x^T C_0 x) = trace(C_0 x x^T),
    x^T A_i x = trace(x^T A_i x) = trace(A_i x x^T).    (16)

It can be seen that both the objective and the constraints in problem (14) are linear in the matrix x x^T. Thus, by introducing a new variable X = x x^T and noting that X = x x^T is equivalent to X being a rank-one symmetric positive semidefinite (PSD) matrix, we obtain the following equivalent formulation of problem (14)

    min_{X ∈ S^n}  trace(C_0 X)    (17)
    s.t.  trace(A_i X) = b_i,  i = 1, ..., m,
          X ⪰ 0,  rank(X) = 1.

Here, X ⪰ 0 means that X is PSD. Solving rank-constrained semidefinite programs (SDPs) is NP-hard [52]. SDR drops the rank constraint to obtain the following relaxed version of problem (17)

    min_{X ∈ S^n}  trace(C_0 X)    (18)
    s.t.  trace(A_i X) = b_i,  i = 1, ..., m,
          X ⪰ 0.

Problem (18) turns out to be an instance of SDP [52], which belongs to convex optimization and can be readily solved using primal-dual interior point methods [53]. Its dual problem is

    max_{λ}  b^T λ    (19)
    s.t.  Q(λ) = C_0 − Σ_{i=1}^{m} λ_i A_i ⪰ 0,

where b = [b_1, ..., b_m]^T and λ = [λ_1, ..., λ_m]^T ∈ R^m. Problem (19) is called the Lagrangian dual problem of problem (14), and Q(λ) is the Hessian of the Lagrangian. In our problem, b_i = 0 for i = 1, ..., 6, b_7 = 1, and hence b^T λ = λ_7. In summary, the relations between the different formulations are demonstrated in Fig. 2.

Fig. 2. An overview of the relations between different formulations: the original non-convex QCQP (Eq. (11) / Eq. (14)) is handled in practice by lifting and semidefinite relaxation into the convex primal SDP problem (Eq. (18)), which is connected to the dual problem (Eq. (19)) by Lagrangian/SDP duality; the tightness and local stability of the relaxation (i.e., recovering the global optimum of the QCQP) are analyzed in Section 5.

4.1 Essential Matrix and Relative Pose Recovery

Once the optimal X* of the SDP primal problem (18) has been obtained by an SDP solver, we need to recover the optimal essential matrix E*. Denote X_e as the top-left 9×9 submatrix of X and X_t as the bottom-right 3×3 submatrix of X, i.e., X_e ≜ X_{[1:9,1:9]} and X_t ≜ X_{[10:12,10:12]}. Empirically, we found that the largest singular value of X_e* is near 2 and the others are close to zero. It is common to set the rank of a matrix as the number of singular values larger than a threshold. In our method, the threshold depends on the accuracy of the SDP solver (usually 10⁻⁷ ∼ 10⁻⁵), leading to rank(X_e*) = 1. Denote the eigenvector corresponding to the nonzero eigenvalue of X_e* as e*; then the optimal essential matrix is recovered by

    E* = mat(e*, [3, 3]),    (20)

where mat(e, [r, c]) means reshaping the vector e to an r×c matrix in row-first order.

After the essential matrix has been obtained, we can recover the rotation and translation by the standard method in the literature [1]. A recent work proved that the rotation matrix can be accurately recovered from the essential matrix for pure rotation scenarios [54]. Moreover, a statistic, the mean of {‖x_i × R* x_i'‖ / (‖x_i‖ ‖x_i'‖)}_{i=1}^{N}, was proposed to identify pure rotation scenarios.

In Section 4.2, the theoretical guarantee of this pose recovery method will be provided. In Section 5, the proofs of tightness and local stability that guarantee global optimality will be provided. The outline of the N-point method is shown in Algorithm 1.

Algorithm 1: Weighted N-Point Method
Input: observations {(f_i, f_i')}_{i=1}^{N}, (optional) weights {w_i}_{i=1}^{N}
Output: essential matrix E*, rotation R*, translation t*, identification of pure rotation
1 Construct C by Eq. (9) for the unweighted version or Eq. (13) for the weighted version; C_0 = [C, 0_{9×3}; 0_{3×9}, 0_{3×3}];
2 Construct {A_i}_{i=1}^{7} in problem (14), which is independent of the input;
3 Obtain X* by solving SDP problem (18) or its dual problem (19);
4 Assert that rank(X_e*) = rank(X_t*) = 1;
5 E* = mat(e*, [3, 3]), where e* is the eigenvector corresponding to the largest eigenvalue of X_e*;
6 Decompose E* to obtain R* and t*;
7 Identify whether pure rotation occurs.

4.2 Necessary and Sufficient Conditions for Global Optimality

The following Theorem 2 provides a theoretical guarantee for the proposed pose recovery method.

Theorem 2. For QCQP (11), its SDR is tight if and only if the optimal solution X* to its primal SDP problem (18) satisfies rank(X_e*) = rank(X_t*) = 1.

Proof. First, we prove the if part. Note that X_e* and X_t* are real symmetric matrices because they are in the feasible region of the primal SDP. In addition, it is given that rank(X_e*) = rank(X_t*) = 1, thus there exist two vectors e* and t* satisfying e*(e*)^T = X_e* and t*(t*)^T = X_t*. Since the constraints in problem (11) do not include any cross term between E and t, the block of each matrix A_i that couples E and t is zero in the SDP problem. By substituting e* and t* into Eq. (16), it can be verified that e* and t* satisfy the constraints in primal problem (11). Now we can see that X* and its uniquely determined derivatives (e* and t*) are feasible solutions for the semidefinite relaxation problem and the primal problem, respectively. Thus the relaxation is tight.

Then we prove the only if part. Since the semidefinite relaxation is tight, we have rank(X*) = 1. Then rank(X_e*) ≤ 1 and rank(X_t*) ≤ 1. Since X_e* and X_t* cannot be zero matrices (otherwise X* is not in the feasible region), the equalities must hold, i.e., rank(X_e*) = rank(X_t*) = 1.

Theorem 2 provides a necessary and sufficient condition to recover the globally optimal solution for the primal problem. Empirically, the optimal X* returned by the SDP always satisfies this condition. Specifically, the optimal X* has the following structure

    X* = [X_e*, ⋆; ⋆, X_t*] = [e*(e*)^T, ⋆; ⋆, t*(t*)^T],    (21)

where the ⋆ parts could be arbitrary matrices making X* symmetric.

Remark: The block-diagonal structure of X* in Eq. (21) is caused by the sparsity pattern of the problem. The aggregate sparsity pattern of our SDP problem, which is the union of the individual sparsity patterns of the data matrices {C_0, A_1, ..., A_m}, includes two cliques: one includes the 1∼9-th entries of x, and the other includes the 10∼12-th entries of x, see Fig. 3(a). There is no common node in these two cliques, see Fig. 3(b). The chordal decomposition theory of sparse SDPs explains the structure of X* well. The interested reader may refer to [55], [56] for more details.

Fig. 3. The sparsity of our SDP problem. (a) Aggregate sparsity pattern over the variables e_11, ..., e_33, t_1, t_2, t_3: white parts correspond to zeros and gray parts correspond to nonzeros; there are two diagonal blocks in this pattern. (b) Chordal decomposition of the corresponding graph: the graph contains 12 nodes and can be decomposed into 2 distinct maximal cliques.
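Steps 1–5 of Algorithm 1 can be prototyped with an off-the-shelf SDP modeling tool. The sketch below uses CVXPY with whatever SDP-capable solver is installed, rather than the SDPA solver used in the paper; the helper names (constraint_matrices, solve_npoint) and the rank-1 tolerance are my own assumptions.

```python
import numpy as np
import cvxpy as cp

def constraint_matrices():
    """12x12 symmetric A_i with x^T A_i x = h_i for Eqs. (12a)-(12g), x = [e; t]."""
    Eb = [[0, 1, 2], [3, 4, 5], [6, 7, 8]]                  # indices of e1, e2, e3 in x
    A_list, b = [], np.zeros(7)
    for i, (p, q) in enumerate([(1, 2), (0, 2), (0, 1)]):    # (12a)-(12c)
        A = np.zeros((12, 12))
        for k in Eb[i]:
            A[k, k] = 1.0
        A[9 + p, 9 + p] = A[9 + q, 9 + q] = -1.0
        A_list.append(A)
    for (i, j) in [(0, 1), (0, 2), (1, 2)]:                  # (12d)-(12f)
        A = np.zeros((12, 12))
        for k in range(3):
            A[Eb[i][k], Eb[j][k]] = A[Eb[j][k], Eb[i][k]] = 0.5
        A[9 + i, 9 + j] = A[9 + j, 9 + i] = 0.5
        A_list.append(A)
    A = np.zeros((12, 12)); A[9:, 9:] = np.eye(3)            # (12g)
    A_list.append(A); b[6] = 1.0
    return A_list, b

def solve_npoint(f1, f2, w=None, tol=1e-4):
    """Solve the SDR (18) and recover E*, t* (steps 1-5 of Algorithm 1)."""
    K = np.einsum('ij,ik->ijk', f1, f2).reshape(len(f1), 9)
    w = np.ones(len(f1)) if w is None else np.asarray(w, float)
    C = K.T @ (K * w[:, None])                               # Eq. (9) / Eq. (13)
    C0 = np.zeros((12, 12)); C0[:9, :9] = C
    A_list, b = constraint_matrices()
    X = cp.Variable((12, 12), symmetric=True)
    cons = [X >> 0] + [cp.trace(A @ X) == bi for A, bi in zip(A_list, b)]
    # Any SDP-capable backend (SCS, Clarabel, MOSEK, ...) can be used here.
    cp.Problem(cp.Minimize(cp.trace(C0 @ X)), cons).solve()
    Xe, Xt = X.value[:9, :9], X.value[9:, 9:]
    le, Ve = np.linalg.eigh(Xe)                              # ascending eigenvalues
    lt, Vt = np.linalg.eigh(Xt)
    if le[-2] > tol or lt[-2] > tol:                         # rank-1 check of Theorem 2
        raise RuntimeError("relaxation not certified tight for this input")
    E = (np.sqrt(le[-1]) * Ve[:, -1]).reshape(3, 3)          # Eq. (20), up to sign
    t = np.sqrt(max(lt[-1], 0.0)) * Vt[:, -1]
    return E, t
```

On noise-free or mildly noisy inputs, the largest eigenvalue of X_e* is close to 2 and the recovered E matches [t]×R up to sign, as described above.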

4.3 Time Complexity

First, we consider the time complexity of problem construction. The construction of the optimization problem (i.e., calculating C in Eq. (9) or Eq. (13)) is linear in the number of point correspondences, so its time complexity is O(N). It is worth mentioning that the optimization is independent of the number of point correspondences once the problem has been constructed.

Second, we discuss the time complexity of the SDP optimization. Most SDP solvers use an interior-point algorithm. The SDP problem (18) can be solved with a worst-case complexity of

    O(max(m, n)⁴ n^{1/2} log(1/ε))    (22)

flops given a solution accuracy ε > 0 [53]. It can be seen that the time complexity can be largely reduced given a smaller variable number n and constraint number m. Since our formulations have much fewer variables and constraints than previous work [10], they have much lower time complexity.

Finally, we discuss the time complexity of pose recovery. In our method, the essential matrix is recovered by finding the eigenvector corresponding to the largest eigenvalue of a 9×9 matrix X_e*. Thus the time complexity of pose recovery is O(1).
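Step 6 of Algorithm 1 decomposes E* into R* and t* by the standard method of [1]. The sketch below shows one common way to do this (SVD factorization into four candidate poses, disambiguated by a positive-depth check on a single correspondence); it is an illustrative assumption, not the paper's implementation, and uses the frame convention X1 = R X2 + t introduced in Section 3.

```python
import numpy as np

def decompose_essential(E, f1, f2):
    """Recover (R, t) from an essential matrix (step 6 of Algorithm 1).

    f1, f2: unit bearing vectors of one correspondence in view 1 / view 2,
    used for the cheirality (positive depth) check.
    """
    U, _, Vt = np.linalg.svd(E)
    if np.linalg.det(U) < 0:
        U[:, -1] *= -1
    if np.linalg.det(Vt) < 0:
        Vt[-1, :] *= -1
    W = np.array([[0., -1., 0.], [1., 0., 0.], [0., 0., 1.]])
    candidates = [(U @ W @ Vt, U[:, 2]), (U @ W @ Vt, -U[:, 2]),
                  (U @ W.T @ Vt, U[:, 2]), (U @ W.T @ Vt, -U[:, 2])]
    for R, t in candidates:
        # Depths along the two rays: solve d1*f1 - d2*(R f2) = t in least squares.
        A = np.stack([f1, -R @ f2], axis=1)
        d1, d2 = np.linalg.lstsq(A, t, rcond=None)[0]
        if d1 > 0 and d2 > 0:                 # both depths positive: correct pose
            return R, t
    return candidates[0]                       # fallback; not expected for valid data
```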

5 TIGHTNESS AND LOCAL STABILITY OF N-POINT METHOD

In this section, we prove tightness and local stability of the SDR for our problem. Readers who are not interested in the theory can safely skip this section. To understand the proofs in this section, preliminary knowledge about convex optimization, manifolds, and algebraic geometry is necessary. We recommend the reader to refer to [35, Chapter 6] and [36] for more details.

In the following, a bar on a symbol stands for a value in the noise-free case; in other words, it represents ground truth. For example, C̄ denotes the matrix in the objective constructed from noise-free observations; x̄ is the optimal state estimate from noise-free observations; t̄ = [t̄_1, t̄_2, t̄_3]^T is the optimal translation estimate from noise-free observations.

5.1 Tightness of the Semidefinite Relaxation

In this subsection, we prove that the SDR is tight given noise-free observations. The proof is mainly based on Lemma 2.4 in [36].

Lemma 1. If the point correspondences {(f_i, f_i')}_{i=1}^{N} are noise-free, the matrix C̄ in Eq. (9) satisfies rank(C̄) ≤ min(N, 8), where N is the number of point correspondences. The equality holds except for degenerate configurations, including points on a ruled quadric, points on a plane, and no translation (an explanation of these degeneracies can be found in [57]).

Proof. (i) When N ≤ 8, from the construction of C in Eq. (9), each point correspondence adds a rank-1 matrix to the Gram matrix C. The rank-1 matrices are linearly independent except for the degenerate cases [57]. Thus the rank of C is N for non-degenerate cases. When a degeneracy occurs, there will be linear dependence between these rank-1 matrices, and the rank will be below N. (ii) When N > 8, there exists the stacked vector ē of an essential matrix satisfying ē^T C̄ ē = 0, so the upper limit of rank(C̄) is 8. We complete the proof by combining these two properties.

Given a set of point correspondences, if N ≥ 8 after excluding points that belong to a planar degeneracy, the rank of C̄ is 8. This principle is the basis of the eight-point method [11]. In our methods, we do not need to distinguish the points that belong to the degenerate configurations, so the rank-8 assumption in the following text is easily satisfied in the N-point problem, in which N ≫ 8.

Lemma 2. Let C ∈ R^{n×n} be positive semidefinite. If x^T C x = 0 for a given vector x, then C x = 0.

Proof. Since C is positive semidefinite, its eigenvalues are non-negative. Suppose the rank of C is r. The eigenvalues can be listed as σ_1 ≥ σ_2 ≥ ··· ≥ σ_r > 0 = σ_{r+1} = ··· = σ_n. Denote the i-th eigenvector as x_i. The eigenvectors are orthogonal to each other, i.e., x_i^T x_j = 0 when i ≠ j. The vector x can be expressed as x = Σ_{i=1}^{n} a_i x_i. Thus C x = Σ_{i=1}^{n} a_i σ_i x_i, and x^T C x = (Σ_{i=1}^{n} a_i x_i)^T (Σ_{i=1}^{n} a_i σ_i x_i) = Σ_{i=1}^{r} σ_i a_i² ‖x_i‖². Given x^T C x = 0, we have a_i = 0 for i = 1, ..., r. Thus C x = Σ_{i=1}^{n} a_i σ_i x_i = 0.

Lemma 3. If the point correspondences {(f_i, f_i')}_{i=1}^{N} are noise-free, there is zero duality gap between problem (11) and its Lagrangian dual problem (19).

Proof. Our proof is an application of Lemma 2.4 in [36]. Let x̄ = [ē; t̄] be a feasible point of the primal problem, where ē and t̄ are ground truth, and let λ = 0 be a feasible point of its Lagrangian dual problem (19). The three conditions needed in Lemma 2.4 in [36] are satisfied: (i) Primal feasibility. By substituting x̄ into the primal problem, the constraints are satisfied since x̄ is the ground truth and the point correspondences are noise-free. (ii) Dual feasibility. Q(λ) = C_0 − Σ_{i=1}^{m} λ_i A_i = [C̄, 0_{9×3}; 0_{3×9}, 0_{3×3}] ⪰ 0. (iii) Lagrangian multiplier. It satisfies (C_0 − Σ_{i=1}^{m} λ_i A_i) x̄ = [C̄ ē; 0_{3×1}]. Since ē is the ground truth, ē^T C̄ ē = 0 according to Eq. (8). Recall that C̄ is a Gram matrix and thus positive semidefinite. According to Lemma 2, C̄ ē = 0 is obtained.

The zero duality gap still holds for the case of noisy observations. A proof is provided in Appendix B.

5.2 Local Stability of the Semidefinite Relaxation

In this subsection, we prove that the SDR has local stability near noise-free observations. In other words, our QCQP formulation has a zero-duality-gap regime when its observations are perturbed (e.g., with noise in the case of sensor measurements). The proof is based on Theorem 5.1 in [36]. Following [36], we use the following notation in the remainder of this section for simplicity.

• θ̄ ∈ Θ is a zero-duality-gap parameter. In our problem, θ̄ = {C̄}.
• Given noise-free observations, let x̄ ∈ R^n be optimal for the primal problem, and λ̄ ∈ R^m be optimal for the dual problem. In our problem, n = 12 and m = 7. According to the proof procedure of Lemma 3, we have x̄ = [ē; t̄] and λ̄ = 0.
• Denote Q̄ ≜ Q(λ̄)|_θ̄ ∈ S^n as the Hessian of the Lagrangian at θ̄. In our problem, Q̄ = [C̄, 0_{9×3}; 0_{3×9}, 0_{3×3}] ⪰ 0.
• Denote X_θ ≜ {x ∈ R^n | h_{i|θ}(x) = 0, i = 1, ..., m} as the primal feasible set given θ, and denote X̄ ≜ X_θ̄.
• Denote h(x) = [h_1(x), ..., h_m(x)].

In QCQP (11), the objective e^T C e is convex in e. However, the presence of the auxiliary variable t makes the objective not strictly convex. Theorem 5.1 in [36] provides a framework to prove local stability for such kinds of problems.

Theorem 3 (Theorem 5.1 in [36]). Assume that the following 4 conditions are satisfied:
RS (restricted Slater): There exists µ ∈ R^m such that µ^T ∇h_θ̄(x̄) = 0 and (Σ_{i=1}^{m} µ_i A_i|_θ̄)|_V ≻ 0, where V ≜ {v ∈ R^n | Q̄ v = 0, x̄^T v = 0}.
R1 (constraint qualification): The Abadie constraint qualification ACQ_X̄(x̄) holds.
R2 (smoothness): W ≜ {(θ, x) | h_θ(x) = 0} is a smooth manifold near w̄ ≜ (θ̄, x̄), and dim_w̄ W = dim Θ + dim_x̄ X̄.
R3 (not a branch point): x̄ is not a branch point of X̄ with respect to v ↦ Q̄ v.
Then the SDR is tight when θ is close enough to θ̄. Moreover, the QCQP has a unique optimal solution x_θ, and the SDR problem has a unique optimal solution x_θ x_θ^T.

Among the four conditions in Theorem 3, RS (restricted Slater) is the main assumption, which is related to the convexity of the Lagrangian function. It corresponds to the strict feasibility of an SDP. R1∼R3 are regularity assumptions which are related to the continuity of the Lagrange multipliers. In the following, we prove that restricted Slater and R3 are satisfied for problem (11), which builds an important foundation for proving local stability. In our problem, A_i and h_i are independent of θ, thus A_i|_θ̄ = A_i and h_{i|θ̄} = h_i.

Lemma 4 (Restricted Slater). For QCQP (11), suppose rank(C̄) = 8. Then there exists µ ∈ R^m such that µ^T ∇h_θ̄(x̄) = 0 and (Σ_{i=1}^{m} µ_i A_i|_θ̄)|_V ≻ 0, where V ≜ {v ∈ R^n | Q̄ v = 0, x̄^T v = 0}.

Proof. For the constraints of Eqs. (12a)∼(12g), the gradient (one row per constraint, blocks ∇_{e_1} h_i, ∇_{e_2} h_i, ∇_{e_3} h_i, ∇_t h_i) is

    ∇h_θ̄(x̄) =
    [ 2ē_1^T,   0,       0,       (0, −2t̄_2, −2t̄_3);
      0,        2ē_2^T,  0,       (−2t̄_1, 0, −2t̄_3);
      0,        0,       2ē_3^T,  (−2t̄_1, −2t̄_2, 0);
      ē_2^T,    ē_1^T,   0,       (t̄_2, t̄_1, 0);
      ē_3^T,    0,       ē_1^T,   (t̄_3, 0, t̄_1);
      0,        ē_3^T,   ē_2^T,   (0, t̄_3, t̄_2);
      0,        0,       0,       (2t̄_1, 2t̄_2, 2t̄_3) ].    (23)

Let

    µ = −[½ t̄_1², ½ t̄_2², ½ t̄_3², t̄_1 t̄_2, t̄_1 t̄_3, t̄_2 t̄_3, 0]^T.    (24)

Note that for noise-free cases we have t̄^T Ē = t̄^T ([t̄]× R̄) = 0, or equivalently

    t̄_1 ē_1 + t̄_2 ē_2 + t̄_3 ē_3 = 0.    (25)

By combining Eqs. (23)∼(25), it can be verified that µ^T ∇h_θ̄(x̄) = 0.

It remains to check the positivity condition. According to the definition of V, [Q̄; x̄^T] v = 0 ⇔ [C̄, 0_{9×3}; ē^T, t̄^T] v = 0. Since C̄ is constructed from noise-free observations, C̄ ē = 0; in other words, ē is orthogonal to the space spanned by C̄. It is given that rank(C̄) = 8, thus rank([C̄; ē^T]) = 9. Considering v as a non-trivial solution of a homogeneous linear equation system, v can be expressed in a coordinate system as v = [0_{9×1}; t]. It can be seen that only the 10∼12-th entries of the coordinate system v, which correspond to t, are nonzero. Taking the Hessian with respect to the variable t and computing the linear combination with coefficients µ, we have

    A(µ) ≜ Σ_{i=1}^{m} µ_i ∇_tt h_i|_θ̄(x̄)
         = [ t̄_2²+t̄_3², −t̄_1 t̄_2, −t̄_1 t̄_3;
             −t̄_1 t̄_2, t̄_1²+t̄_3², −t̄_2 t̄_3;
             −t̄_1 t̄_3, −t̄_2 t̄_3, t̄_1²+t̄_2² ]
         = [t̄]× [t̄]×^T = Ē Ē^T.    (26)

Recall that in the proof of Theorem 1 we proved that the eigenvalues of Ē Ē^T are 1, 1, and 0. So the eigenvalues of A(µ) are 1, 1, and 0. In addition, it can be verified that t̄ = [t̄_1, t̄_2, t̄_3]^T is the normalized eigenvector corresponding to eigenvalue 0 of A(µ). Hence V is the orthogonal complement of t̄. For any vector v ∈ V \ {0}, its 10∼12-th entries are orthogonal to t̄, so v_{[10:12]}^T A(µ) v_{[10:12]} is strictly positive. It follows that (Σ_{i=1}^{m} µ_i A_i|_θ̄)|_V = A(µ)|_{(t̄)^⊥} ≻ 0.

Definition 1 (Branch Point [36]). Let π: R^n → R^k be a linear map v ↦ Q̄ v. Let X̄ ⊆ R^n be the zero set of the equation system h(x) = [h_1(x), ..., h_m(x)], and let T_x X̄ ≜ ker(∇h(x)) denote the tangent space of X̄ at x. We say that x is a branch point of X̄ with respect to π if there is a nonzero vector v ∈ T_x X̄ with π(v) = 0.

Lemma 5. For QCQP (11), suppose rank(C̄) = 8. Then x̄ = [ē; t̄] is not a branch point of X̄ with respect to the mapping π: v ↦ Q̄ v.

Proof. It can be verified that π(v) = 0 ⇔ Q̄ v = 0 ⇔ [C̄, 0_{9×3}; 0_{3×9}, 0_{3×3}] v = 0 ⇔ v = [c ē; t], where c and t are free parameters. The last equivalence takes advantage of rank(C̄) = 8. If x̄ were a branch point, there would exist a nonzero vector v ∈ ker(∇h(x̄)), i.e., ∇h(x̄) v = 0. We prove that such a nonzero v does not exist. By substituting v = [c ē; t] into the equation ∇h_θ̄(x̄) v = 0 (see Eq. (23)), we obtain a homogeneous linear system with unknowns t = [t_1, t_2, t_3]^T and c:

    ē_1^T ē_1 c − t̄_2 t_2 − t̄_3 t_3 = 0,    (27a)
    ē_2^T ē_2 c − t̄_1 t_1 − t̄_3 t_3 = 0,    (27b)
    ē_3^T ē_3 c − t̄_1 t_1 − t̄_2 t_2 = 0,    (27c)
    2 ē_1^T ē_2 c + t̄_2 t_1 + t̄_1 t_2 = 0,    (27d)
    2 ē_1^T ē_3 c + t̄_3 t_1 + t̄_1 t_3 = 0,    (27e)
    2 ē_2^T ē_3 c + t̄_3 t_2 + t̄_2 t_3 = 0,    (27f)
    t̄_1 t_1 + t̄_2 t_2 + t̄_3 t_3 = 0.    (27g)

By eliminating t from Eqs. (27a), (27b), (27c) and (27g), we obtain (ē_1^T ē_1 + ē_2^T ē_2 + ē_3^T ē_3) c = 0 and hence c = 0. By substituting c = 0 into this equation system, it can be further verified that the system only has zero as its solution. So v can only be a zero vector.

Theorem 4. For QCQP (11) and its Lagrangian dual problem (19), let C̄ be constructed from noise-free observations and assume that rank(C̄) = 8. (i) There is zero duality gap whenever C is close enough to C̄. In other words, there exists a hypersphere of nonzero radius ε with center C̄, i.e., B(C̄, ε) = {C : ‖C − C̄‖ ≤ ε}, such that for any C ∈ B(C̄, ε) there is zero duality gap. (ii) Moreover, the semidefinite relaxation problem (19) recovers the optimum of the original QCQP problem (11).

Proof. From Lemma 3, when the point correspondences are noise-free, the relaxation is tight. From the proof procedure, the optimum is x̄ = [ē; t̄], which uniquely determines the zero-duality-gap parameter C̄. Our proof is an application of Theorem 3. The four conditions needed in Theorem 3 are satisfied: (RS) From Lemma 4, the restricted Slater condition is satisfied. (R1) The equality constraints in the primal problem form a variety. The Abadie constraint qualification (ACQ) holds everywhere since the variety is smooth and the ideal is radical, see [36, Lemma 6.1]. (R2) Smoothness. The normalized essential manifold is smooth [23]. (R3) From Lemma 5, x̄ is not a branch point of X̄.

This completes the proof that the SDR of the proposed QCQP is tight under low-noise observations. In other words, when the observation noise is small, Theorem 4 guarantees that the optimum of the original QCQP can be found by optimizing its SDR. Finding the noise bounds that the SDR can tolerate is still an open problem. In Section 7, we will demonstrate that the SDR is tight for noise levels much larger than those occurring in practice.
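The noise-free facts used in Section 5.1, namely rank(C̄) = 8 for a generic scene (Lemma 1) and C̄ ē = 0 (Lemmas 2 and 3), are easy to probe numerically. The following self-contained sketch uses synthetic data; the variable names and thresholds are my own choices, not the paper's code.

```python
import numpy as np

def skew(t):
    return np.array([[0, -t[2], t[1]], [t[2], 0, -t[0]], [-t[1], t[0], 0]])

rng = np.random.default_rng(1)
# Noise-free synthetic scene with X1 = R X2 + t (R maps frame 2 into frame 1).
t = rng.normal(size=3); t /= np.linalg.norm(t)
R, _ = np.linalg.qr(rng.normal(size=(3, 3)))
if np.linalg.det(R) < 0:
    R[:, 0] *= -1
N = 50
dirs = rng.normal(size=(N, 3)); dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
X1 = rng.uniform(4, 8, size=(N, 1)) * dirs
X2 = (X1 - t) @ R
f1 = X1 / np.linalg.norm(X1, axis=1, keepdims=True)
f2 = X2 / np.linalg.norm(X2, axis=1, keepdims=True)

K = np.einsum('ij,ik->ijk', f1, f2).reshape(N, 9)   # rows f_i Kronecker f_i'
C_bar = K.T @ K                                     # Eq. (9), noise-free
e_bar = (skew(t) @ R).reshape(-1)                   # vec(E_bar), row-first

eigvals = np.linalg.eigvalsh(C_bar)
print("rank(C_bar) =", int(np.sum(eigvals > 1e-9 * eigvals[-1])))   # Lemma 1: 8
print("||C_bar e_bar|| =", np.linalg.norm(C_bar @ e_bar))           # Lemmas 2-3: ~0
```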

6 ROBUST N-POINT METHOD

In previous sections, we proposed a solution to non-minimal case essential matrix estimation. However, the presence of outliers in point correspondences is inevitable due to the ambiguities of feature points' local appearance. When the correspondences are contaminated by outliers, the solution that minimizes an algebraic error will be biased. To apply our method to outlier-contaminated scenarios, a robust loss instead of the least-squares loss can be used in the objective function. We take advantage of M-estimators in robust statistics [14], and modify problem (7) as

    min_{E ∈ M_E}  Σ_{i=1}^{N} ρ(f_i^T E f_i'),    (28)

where ρ(·) is a robust loss function.

The selection of an appropriate robust loss function is critical to this problem. A natural choice for the loss function would be the ℓ0 norm, i.e., ρ(x; τ) = 0 if |x| ≤ τ and 1 otherwise. This turns out to be the inlier set maximization problem, which has been proven to be computationally expensive [37]. The approach presented below can accommodate many M-estimators in a computationally efficient framework. First we take the scaled Welsch (Leclerc) function [58], [59] as an example

    ρ(x; τ) = (τ²/2) (1 − e^{−x²/τ²}),    (29)

where τ ∈ (0, +∞) is a scale parameter.

Problem (28) can be optimized by local optimization methods directly. However, local optimization is prone to local optima. To solve this problem, our approach is based on the Black-Rangarajan duality between M-estimators and line processes [17]. The duality introduces an auxiliary variable w_i for the i-th point correspondence and optimizes a joint objective over the essential matrix E and the line process variables W = {w_i}_{i=1}^{N}

    min_{E ∈ M_E, W}  Σ_{i=1}^{N} w_i (f_i^T E f_i')² + Σ_{i=1}^{N} Ψ(w_i).    (30)

Here Ψ(w_i) is a loss on ignoring the i-th correspondence; it tends to zero when the i-th correspondence is active and to one when it is inactive. A broad variety of robust estimators ρ(·) have corresponding loss functions Ψ(·) such that problems (28) and (30) are equivalent with respect to E: optimizing either of the two problems yields the same essential matrix. The form of problem (30) enables efficient and scalable optimization by an iterative solution of the weighted N-point method. This yields a general approach that can accommodate many robust nonconvex functions ρ(·). The loss function that makes problems (28) and (30) equivalent with respect to E is

    Ψ(w_i) = (τ²/2) (1 + w_i log(w_i) − w_i).    (31)

Objective (30) is biconvex in E and W. When E is fixed, the optimal value of each w_i has a closed-form solution. When the variables W are fixed, objective (30) turns into a weighted N-point problem, which can be efficiently solved by the proposed method. Specifically, we exploit this special structure and optimize the objective by alternately updating the variable sets E and W. As a block coordinate descent algorithm, this alternating minimization scheme provably converges. By fixing E, the optimal value of each w_i is given by

    w_i = e^{−(f_i^T E f_i')² / τ²}.    (32)

This can be verified by substituting Eq. (32) into objective (30). Thus optimizing objective (30) yields a solution E that is also optimal for the original objective (28). The line process theory [17] provides general formulations to calculate w_i for a broad variety of robust loss functions ρ(x; τ). The alternating minimization of a line process is illustrated in Fig. 4.

Fig. 4. Robust estimation of the essential matrix by solving a line process of an M-estimator: given input observations with outliers, per-sample weighting (Eq. (32)) based on the epipolar residuals and the weighted N-point method are alternated until convergence, yielding the essential matrix and the inlier set. In the left part, the size of a feature point represents its weight.

It is worth mentioning that the loss function in our robust N-point method is not limited to the Welsch function. The Black-Rangarajan duality [17] provides a theory to build the relation between M-estimators and line processes. Based on this general-purpose framework, integrating other robust loss functions is essentially the same as for the Welsch function. Typical robust loss functions include Cauchy (Lorentzian), Charbonnier (pseudo-Huber, ℓ1-ℓ2), Huber, Geman-McClure, smooth truncated quadratic, truncated quadratic, and Tukey's biweight functions, see Fig. 5(a). The Black-Rangarajan duality of these loss functions can be found in [17], [59].

Fig. 5. Loss functions. (a) Different loss functions (quadratic, Huber, Charbonnier, Cauchy, truncated quadratic, Welsch, Geman-McClure, smooth truncated, Tukey's biweight). (b) Welsch functions [17] with different scale parameters τ (τ = 1, ..., 5).

Objective (30) is non-convex and its shape is controlled by the parameter τ of the loss function ρ(·). To alleviate the effect of local minima, we employ graduated non-convexity (GNC), which dates from the 1980s [46], [60]. The idea is to start from a problem that is easier to solve, and then progressively deform it into the actual objective while tracking the solution along the way. A large τ makes the objective function smoother and allows many correspondences to participate in the optimization even when they are not fit well by the essential matrix E, see Fig. 5(b). Our method begins with a very large value of τ. Over the iterations, τ is automatically decreased, gradually introducing non-convexity into the objective. The line process together with the GNC strategy is not guaranteed to obtain the global optimum of the original non-convex problem (28). They obtain a good solution, if not the globally optimal one, in most cases when the outlier ratio is below a threshold. The outline of the robust N-point method is shown in Algorithm 2.

Algorithm 2: Robust N-Point Method
Input: correspondences {(f_i, f_i')}_{i=1}^{N}, parameter τ_min
Output: essential matrix E*, rotation R*, translation t*, and inlier set I
1 Initialize w_i ← 1, ∀i = 1, ..., N;
2 Initialize τ² ← 1 × 10³;
3 repeat
4   Update E by the weighted N-point method in Algorithm 1;
5   Update w_i by Eq. (32);
6   Set τ² ← τ²/1.3;
7 until convergence or τ < τ_min;
8 Generate the inlier set I = {i | w_i > 0.1};
9 Calculate E* by using the inlier set I and the unweighted N-point method in Algorithm 1;
10 Decompose E* to obtain R* and t*.
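A compact sketch of Algorithm 2 is shown below. It assumes a weighted N-point solver callable, such as the solve_npoint sketch from Section 4, passed in as solve_weighted; the function name and default parameters follow the algorithm listing above but are otherwise my own choices.

```python
import numpy as np

def robust_npoint(f1, f2, solve_weighted, tau_min=1e-2):
    """Sketch of Algorithm 2: line process + graduated non-convexity (Welsch loss).

    solve_weighted(f1, f2, w) -> (E, t) is a weighted N-point solver
    (e.g. the CVXPY sketch from Section 4; an assumption of this snippet).
    """
    N = len(f1)
    w = np.ones(N)                 # step 1: every correspondence starts fully active
    tau2 = 1e3                     # step 2: tau^2, very smooth objective at first
    while tau2 > tau_min ** 2:
        E, _ = solve_weighted(f1, f2, w)                 # step 4: weighted N-point
        r = np.einsum('ij,jk,ik->i', f1, E, f2)          # residuals f_i^T E f_i'
        w = np.exp(-r ** 2 / tau2)                       # step 5: Eq. (32)
        tau2 /= 1.3                                      # step 6: anneal the scale
    inliers = np.flatnonzero(w > 0.1)                    # step 8
    E, t = solve_weighted(f1[inliers], f2[inliers], None)  # step 9: unweighted refit
    return E, t, inliers
```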

7 EXPERIMENTAL RESULTS

Setting for the N-point method: We compared the proposed N-point method with several state-of-the-art methods on synthetic and real data. Specifically, we compared our method with 5 classical or state-of-the-art methods:

• a general five-point method 5pt [2] for relative pose estimation;
• two general methods for fundamental matrix estimation, including the seven-point method 7pt [1] and the eight-point method 8pt [11];
• the eigenvalue-based method proposed by Kneip and Lynen [9], which is referred to as eigen, and the certifiably globally optimal solution by Briales et al. [10], which is referred to as SDP-Briales.

Among these methods, the implementation of SDP-Briales is provided by the authors. The implementations of the other comparison methods are provided by OpenGV [61]. The method eigen needs initialization, and it is initialized using the 8pt method. Our method and all comparison methods in OpenGV are implemented in C++. The SDP-Briales method is implemented in Matlab, and the optimization in it relies on an SDP solver with hybrid Matlab/C++ programming.

The eigen method in the OpenGV implementation provides rotation only. Once the relative rotation has been obtained, we can calculate the translation t. Recall that f_i and f_i' represent the bearing vectors of a point correspondence across two images. From Eqs. (1) and (6), the epipolar constraint can be written as

    f_i^T [t]× R f_i' = 0.    (33)

Since R has been calculated, each point correspondence provides a linear constraint on the entries of the translation vector t. Due to the scale ambiguity, the translation has only two DoFs. After DLT, a normalized version of t can be recovered by simple linear derivation of the right-hand null-space vector (e.g., via singular value decomposition). Given N (N ≥ 2) point correspondences, the least-squares fit of t is determined by the singular vector corresponding to the smallest singular value.

Setting for the robust N-point method: We compared the proposed robust N-point method with RANSAC+5pt, RANSAC+7pt and RANSAC+8pt, which stand for integrating 5pt, 7pt and 8pt into the RANSAC framework [5], respectively. All comparison methods are provided by OpenGV [61], and the default parameters are used. In the experiment on real-world data, we also compare our method with a branch-and-bound method BnB-Yang² [41]. The angular error threshold in this method was set to 0.002 radians.

2. The C++ code is available from http://jlyang.org/

To evaluate the performance of the proposed method, we separately compared the relative rotation and translation accuracy. We follow the criteria defined in [62] for quantitative evaluation. Specifically,

• the angle difference for rotations is defined as

    ε_rot [degree] = arccos((trace(R_true^T R*) − 1) / 2) · (180/π),

• and the translation direction error is defined as

    ε_tran [degree] = arccos(t_true^T t* / (‖t_true‖ · ‖t*‖)) · (180/π).

In the above criteria, R_true and R* are the ground truth and estimated rotation, respectively; t_true and t* are the ground truth and estimated translation, respectively.

7.1 Efficiency of N-Point Method

The SDPA solver [63] was adopted as the SDP solver in our methods. All experiments were performed on an Intel Core i7 CPU running at 2.40 GHz. The number of point correspondences is fixed to 100. The SDP optimization takes about 5 ms. In addition, it takes about 1 ms for the other procedures in our method, including problem construction, essential matrix recovery, and pose decomposition. In summary, the runtime of our method is about 6 ms.

We compared the proposed method with several state-of-the-art methods [8], [10], [18], which also aim to find the globally optimal relative pose. The efficiency comparison is shown in Table 1. It can be seen that our method is 2∼3 orders of magnitude faster than the comparison methods. The superior efficiency makes our method the first globally optimal method that can be applied to large-scale structure-from-motion and realtime SLAM applications.

TABLE 1
Efficiency comparison with other globally optimal methods. The last column is the runtime normalized by setting ours as 1.

method              | optimization | runtime | norm. runtime
Hartley & Kahl [18] | BnB          | > 7 s   | > 1000
Chesi [8]           | LMI          | 1.15 s  | 190
SDP-Briales [10]    | SDP          | 1 s     | 160
ours                | SDP          | 6 ms    | 1

Since both SDP-Briales and our method take advantage of SDP optimization, a further comparison between them is reported in Table 2. It can be seen that our method has a much simpler formulation in terms of the numbers of variables and constraints. It is not surprising that our method has significantly better efficiency.

TABLE 2
SDP formulation comparison. For domain S^n, the number of variables is n(n+1)/2.

method           | domain | #variable | #constraint
SDP-Briales [10] | S^40   | 820       | 536
ours             | S^12   | 78        | 7

7.2 Accuracy of N-Point Method

7.2.1 Synthetic Data

To thoroughly evaluate our method's performance, we perform experiments on synthetic scenes in a similar manner to [9]. We generate random scenes by first fixing the position of the first frame to the origin and its orientation to the identity. The translational offset of the second frame is chosen with a uniformly distributed random direction and a maximum magnitude of 2. The orientation of the second frame is generated with random Euler angles bounded to 0.5 radians in absolute value. This generates random relative poses as they would appear in practical situations. Point correspondences result from uniformly distributed random points around the origin with a distance varying between 4 and 8, transforming those points into both frames. Then a virtual camera is defined with a focal length of 800 pixels. Gaussian noise is added by perturbing each point correspondence. The standard deviation of the Gaussian noise is referred to as the noise level.

First, we test image noise resilience. For each image noise level, we randomly generate synthetic scenes and repeat the experiments 1000 times. The number of correspondences is fixed to 10, and the step size of the noise level is 0.1 pixels. The results for all methods with varying image noise levels are shown in Fig. 6. It can be seen that our method consistently has smaller rotation error ε_rot and translation error ε_tran than the other methods. Moreover, our method and eigen significantly outperform 5pt, 7pt and 8pt. This result demonstrates the advantage of non-minimal solvers in terms of accuracy.

Fig. 6. Relative pose accuracy with respect to image noise levels. (a) rotation error ε_rot; (b) translation error ε_tran.

Second, we set the noise level to 5 pixels and vary the number N of point correspondences. The step size of the correspondence number is 5. The methods 5pt, 7pt, and 8pt can only take a small subset of the point correspondences. In contrast, eigen and ours utilize all the point correspondences. To make a fair comparison, we randomly sample the minimal number of point correspondences for 5pt, 7pt, and 8pt, and repeat 20 times for each method. Then we find the optimal relative pose among them. Since all point correspondences are inliers, we cannot use the maximal inlier criterion to find the optimal rotation as in the traditional RANSAC framework. Instead, we use the algebraic error to find the optimal relative rotation for these methods.

The pose estimation accuracy with respect to the number of point correspondences is shown in Fig. 7. We have the following observations: (1) The errors of eigen and ours decrease when increasing the number of point correspondences. This further verifies the effectiveness of non-minimal solvers. (2) When N > 20, eigen and ours have significantly smaller errors than the other methods, and our method has the smallest rotation and translation error among all methods. (3) The error curve of eigen oscillates due to local minima when the noise level is large. In contrast, the error curve of our method is smooth for any noise level.

7.2.2 Real-World Data

We further provide an experiment on real-world images from the EPFL Castle-P19 dataset [64]. There are 19 images in this dataset. We generate 18 wide-baseline image pairs by grouping adjacent images. For each image pair, putative point correspondences are determined by the SIFT feature [65]. Then we use RANSAC with an iteration number of 2000 and a Sampson distance threshold of 1.0 × 10⁻³ for outlier removal. Since the iteration number is sufficiently large and the distance threshold is small, the preserved correspondences can be treated as inliers.

Given correct point correspondences, we compare the relative pose accuracy of the different methods. The 5pt, 7pt and 8pt methods only use a small portion of the correspondences. For a fair comparison, we repeat them 10 times using randomly sampled subsets. The rotation and translation errors of the different methods are shown in Fig. 9. It can be seen that the mean and median errors of our N-point method are significantly smaller than the errors produced by the comparison methods. Specifically, our method achieves a median rotation error of 0.15° and a median translation error of 0.56°.


Fig. 8. A sample image pair of the EPFL Castle-P19 dataset.


Fig. 9. Relative pose accuracy on the EPFL Castle-P19 dataset. (a) rotation error εrot; (b) translation error εtran.

Fig. 10. Relative pose accuracy with respect to translation length. (a) rotation error εrot; (b) translation error εtran; (c) pure rotation statistic.

In this experiment, eigen and our method show a negligible difference. This indicates that when the noise level is small, local optimization methods may work as well as global optimization methods.

7.2.3 Performance for Pure Rotation
We use synthetic data to validate the performance of our method for pure rotation. The synthetic scenes are generated as in Section 7.2.1. The field-of-view of the camera is 180°. We set the noise level to 0.5 pixels and the number of point correspondences to 100, and vary the translation length. For each translation length, we repeat the experiments 1000 times using randomly generated scenes. The pose estimation errors with respect to the translation length are shown in Fig. 10. The rotation error is not affected by the translation length. In contrast, the translation error is greatly affected by it: a larger translation length tends to result in more accurate translation estimation. (When the ground-truth translation is zero, the translation error εtran is ill-defined. We define εtran as 90° in this case, because the expectation of εtran is 90° when the estimated translation is uniformly random.)

In Figure 10(c), we plot a pure rotation statistic [54] with respect to the translation length. The statistic is defined as the mean of ‖x_i' × R* x_i‖ / (‖x_i‖ ‖x_i'‖), i = 1, ..., N, where R* is the estimated rotation. This statistic is discriminative for identifying pure rotation scenarios: when (near) pure rotation occurs, it is (near) zero. The above experiments certify that the proposed N-point method can be applied to pure rotation cases.
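A sketch of this statistic is given below. It assumes that x1 and x2 hold the homogeneous normalized coordinates (bearing vectors) of corresponding points in the two frames and that R_est is the estimated rotation; the exact direction convention (which frame is rotated) should follow the definition in [54].

    import numpy as np

    def pure_rotation_statistic(R_est, x1, x2):
        # Mean of ||x2_i x (R_est x1_i)|| / (||x1_i|| ||x2_i||); near zero under (near)
        # pure rotation, since then x2_i is parallel to R_est x1_i up to noise.
        rotated = x1 @ R_est.T                 # rows are R_est @ x1_i
        num = np.linalg.norm(np.cross(x2, rotated), axis=1)
        den = np.linalg.norm(x1, axis=1) * np.linalg.norm(x2, axis=1)
        return float(np.mean(num / den))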

Fig. 11. The second largest singular value with respect to image noise levels and numbers of point correspondences. (a) forward & varied noise level; (b) sideways & varied noise level; (c) forward & varied point num.; (d) sideways & varied point num.

7.3 Global Optimality of N-Point Method
Recall that rank(X̃*) is used to verify the optimality of the proposed N-point method, so the second largest singular value of X̃* is the key to ensuring the rank-1 condition and global optimality. We use the synthetic scenes defined in Section 7.2.1 to examine the second largest singular value of X̃*. Figure 11 shows the cumulative distribution functions (CDFs) of the second largest singular values under different settings. Given a threshold for determining the rank of X̃*, the proportion of instances with global optimality can be read from the corresponding CDF. The success rate of global optimality depends on the singular value threshold and the accuracy of the SDP solver.

In Figures 11(a) and 11(b), the number of point correspondences is fixed to 100, and the noise level is varied for forward and sideways motion modes. In Figures 11(c) and 11(d), the noise level is fixed to 0.5 pixels, and the number of point correspondences is varied for forward and sideways motion modes. We have the following observations. (1) When the singular value threshold is 1.0 × 10^-6, global optimality is obtained in most cases. (2) Smaller noise levels or more point correspondences result in higher success rates of global optimality. (3) For usual cases in which the noise level is below 1 pixel and the number of point correspondences is above 20, global optimality is obtained in most cases.
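A sketch of this rank-1 check is given below. X_opt is assumed to be the 12 × 12 matrix returned by the SDP solver, matching the S^12 domain in Table 2; the recovery of the rank-1 factor in the last step is one standard choice.

    import numpy as np

    def certify_rank_one(X_opt, threshold=1.0e-6):
        # Global optimality is certified when the second largest singular value of the
        # SDP solution is numerically zero, i.e., X_opt is (close to) rank one.
        U, s, _ = np.linalg.svd(X_opt)
        certified = bool(s[1] < threshold)
        x = np.sqrt(s[0]) * U[:, 0]           # rank-1 factor, defined up to sign
        return certified, s[1], x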

7.4 Performance of Robust N-Point Method
We test the robust N-point method on both synthetic data and real-world data. The parameter τ_min^2 is set to 6.0 × 10^-7.

7.4.1 Synthetic Data
The synthetic scene is generated as in Section 7.2.1. The noise level is fixed to 0.5 pixels, and the correspondence number is fixed to 100. The outlier ratio is varied from 0% to 100% with a step size of 5%. For each outlier ratio, we repeat the experiments 100 times using randomly generated data.

Fig. 12. Success rate of the robust N-point method. (a) Robustness comparison with different loss functions; (b) robustness of the Welsch function (plotted separately for better visualization).

We evaluate the 8 robust loss functions in Fig. 5(a) and report their success rates. The success rate is the ratio of successful cases to the overall trials, and a trial is treated as a success if εrot ≤ 0.15° and εtran ≤ 0.5°. The results are shown in Fig. 12. The truncated quadratic, Welsch, smooth truncated quadratic, and Tukey's biweight functions can tolerate up to 45% outliers; the Cauchy and Geman-McClure functions can tolerate up to 40% outliers; and the Charbonnier and Huber functions can tolerate up to 15% outliers. In the following experiments, we use the Welsch function in the robust N-point method since it has superior performance.
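For reference, the Welsch loss and the weight it induces in an iteratively reweighted scheme take the standard form sketched below; the scale parameter c and its graduated schedule inside the GNC loop are implementation details not shown here.

    import numpy as np

    def welsch_loss(r, c):
        # rho(r) = (c^2 / 2) * (1 - exp(-(r / c)^2)); bounded, hence robust to gross outliers.
        return 0.5 * c**2 * (1.0 - np.exp(-(r / c) ** 2))

    def welsch_weight(r, c):
        # w(r) = rho'(r) / r = exp(-(r / c)^2); the per-residual weight in IRLS / GNC.
        return np.exp(-(r / c) ** 2)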

The performance of the robust N-point method and the RANSAC-based methods is shown in Fig. 13. Our method consistently obtains smaller errors than the RANSAC-based methods in terms of both mean and median errors. From the logarithmic-scale plots in Fig. 13(e)∼(h), our method has significantly smaller rotation and translation errors than the other methods when the outlier ratio is below a threshold.

Fig. 13. Relative pose accuracy with respect to outlier ratios. (a) mean εrot; (b) mean εtran; (c) median εrot; (d) median εtran; (e)-(h) the same quantities with a logarithmic scale for the vertical axis. The compared methods are RANSAC+5pt, RANSAC+7pt, RANSAC+8pt, and ours.

7.4.2 Real-World Data
We use the EPFL Castle-P19 dataset [64] for real-world data. The image pairs and their putative correspondences are generated in the same way as in Section 7.2.2. Each image pair contains 4408 putative correspondences on average. Though it is theoretically sound, the BnB-Yang method [41] cannot handle a large number of point correspondences: it fails to return a solution for an image pair within a day, so we randomly select 100 putative correspondences as its input.

The relative pose accuracy of the different methods is shown in Fig. 14. The proposed robust N-point method has higher overall accuracy than the RANSAC-based methods and the BnB-Yang method. The efficiency comparison between our method and the RANSAC-based methods is summarized as follows. (1) The robust N-point method takes a roughly constant number of iterations; according to the GNC strategy and its parameter setting, its iteration number is at most 79. In contrast, the number of iterations of the RANSAC-based methods increases drastically with a high outlier ratio or a high confidence level. For example, given 45% outliers and a 99% confidence level, the 5pt, 7pt, and 8pt methods need to iterate at least 90, 301, and 548 times, respectively (see the short calculation after this list). In practice, the RANSAC-based methods need more iterations than the least required to achieve high accuracy.


(2) The non-minimal solver in the robust N-point method takes much more time than the minimal solvers in the RANSAC-based methods (6 milliseconds vs. 30 ∼ 180 microseconds). As a result, our method takes about 480 ms, while the RANSAC-based methods take 52 ms on average. (3) The BnB-Yang method takes 118 ∼ 9007 seconds for the randomly selected 100 point correspondences.
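The minimum iteration counts quoted in item (1) follow from the standard RANSAC formula k = ceil(log(1 - p) / log(1 - (1 - ε)^s)) with confidence p = 0.99, outlier ratio ε = 0.45, and minimal sample size s; a quick check:

    import math

    def min_ransac_iterations(outlier_ratio, sample_size, confidence=0.99):
        # Smallest k such that an all-inlier minimal sample is drawn with probability >= confidence.
        w = (1.0 - outlier_ratio) ** sample_size
        return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - w))

    for s in (5, 7, 8):
        print(s, min_ransac_iterations(0.45, s))   # 5 -> 90, 7 -> 301, 8 -> 548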

Fig. 14. Relative pose accuracy on the EPFL Castle-P19 dataset. (a) rotation error εrot; (b) translation error εtran.

8 CONCLUSIONS
This paper introduces a novel non-minimal solver for the N-point problem in essential matrix estimation. First, we reformulate the N-point problem as a simple QCQP by proposing an equivalent form of the essential manifold. Second, semidefinite relaxation is exploited to convert this problem to an SDP problem, and pose recovery from an optimal solution of the SDP is proposed. Finally, a theoretical analysis of tightness and local stability is provided. Our method is stable, globally optimal, and relatively easy to implement. In addition, we propose a robust N-point method by integrating the non-minimal solver into M-estimators. Extensive experiments demonstrate that the proposed N-point method can find and certify the global optimum of the optimization problem, and that it is 2 ∼ 3 orders of magnitude faster than state-of-the-art non-minimal solvers. Moreover, the robust N-point method outperforms state-of-the-art methods in terms of robustness and accuracy.

APPENDIX A
ANOTHER FORMULATION OF THE N-POINT PROBLEM
The proposition below provides another equivalent way to define the essential manifold ME, which leads to another simple optimization problem.

Lemma 6. For an essential matrix E which can be decomposed as E = [t]×R, it holds that trace(E E^T) = Σ_{i=1}^3 Σ_{j=1}^3 E_ij^2 = 2‖t‖^2.

Proof. Note that each row of R has unit norm and the rows of R are orthogonal to each other. Using E = [t]×R, it can be verified that trace(E E^T) = Σ_{i=1}^3 Σ_{j=1}^3 E_ij^2 = 2(t_1^2 + t_2^2 + t_3^2) = 2‖t‖^2.

Proposition 1 (Proposition 7.3 in [49]). A real 3 × 3 matrix E is an essential matrix if and only if it satisfies the equation

    E E^T E - (1/2) trace(E E^T) E = 0.   (34)

Proposition 2. A real 3 × 3 matrix E is an essential matrix in ME if and only if it satisfies the following two conditions:

    (i) trace(E E^T) = 2  and  (ii) E E^T E = E.   (35)

Proof. For the if direction, combining conditions (i) and (ii) yields Eq. (34). According to Theorem 1, E is a valid essential matrix, so there exists (at least) one pair t ∈ R^3 and R ∈ SO(3) such that E = [t]×R. By condition (i) and Lemma 6, we have trace(E E^T) = 2‖t‖^2 = 2, which means ‖t‖ = 1. Thus E ∈ ME.

For the only if direction, since E ∈ ME, condition (i) is satisfied immediately by Lemma 6. In addition, Eq. (34) is satisfied since E is an essential matrix. Substituting condition (i) into Eq. (34) yields condition (ii).

According to Proposition 2, an equivalent form of minimizing the algebraic error is

    min_E  e^T C e   (36)
    s.t.  trace(E E^T) = 2,  E E^T E - E = 0.

By introducing an auxiliary matrix G, this problem can be reformulated as a QCQP

    min_{E,G}  e^T C e   (37)
    s.t.  G = E E^T,  trace(G) = 2,  G E - E = 0.

Note that the symmetric matrix G introduces 6 variables. Thus there are 15 variables and 16 constraints in this QCQP.
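A quick numerical sanity check of Lemma 6 and Proposition 2 is sketched below; the skew-symmetric construction and the random rotation are generic utilities, not part of the proposed solver.

    import numpy as np

    def skew(t):
        # Cross-product matrix [t]_x.
        return np.array([[0.0, -t[2], t[1]],
                         [t[2], 0.0, -t[0]],
                         [-t[1], t[0], 0.0]])

    # Build E = [t]_x R with ||t|| = 1 and R in SO(3) (random orthogonal factor via QR).
    t = np.random.randn(3)
    t /= np.linalg.norm(t)
    Q, _ = np.linalg.qr(np.random.randn(3, 3))
    R = Q if np.linalg.det(Q) > 0 else -Q
    E = skew(t) @ R

    print(np.isclose(np.trace(E @ E.T), 2.0))      # condition (i), cf. Lemma 6
    print(np.allclose(E @ E.T @ E, E))             # condition (ii)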
APPENDIX B
STRONG DUALITY BETWEEN THE PRIMAL SDP AND ITS DUAL

Lemma 7. For QCQP (11), there is no duality gap between the primal SDP problem (18) and its dual problem (19).

Proof. Denote the optimal values of problem (18) and its dual problem (19) by f_primal and f_dual. The inequality f_primal ≥ f_dual follows from weak duality. Equality, and the existence of X* and λ* attaining the optimal values, follow if we can show that the feasible regions of both the primal and dual problems have nonempty interiors; see [52, Theorem 3.1] (also known as Slater's constraint qualification [66]).

For the primal problem, let E_0 be an arbitrary point on the essential manifold ME: E_0 = [t_0]×R_0, where ‖t_0‖ = 1. Denote x_0 = [vec(E_0); t_0]. It can be verified that X_0 = x_0 x_0^T is an interior point of the feasible domain of the primal problem. For the dual problem, let λ_0 = [-1, -1, -1, 0, 0, 0, -3]^T. Recall that C ⪰ 0, and it can be verified that Q(λ_0) = [C, 0; 0, 0] + I ≻ 0. That means λ_0 is an interior point of the feasible domain of the dual problem.

ACKNOWLEDGMENTS
The author would like to thank Prof. Laurent Kneip at ShanghaiTech, Prof. Qian Zhao at XJTU, and Haoang Li at CUHK for fruitful discussions. The author also thanks Dr. Jesus Briales at the University of Malaga for providing the code of [10] and Dr. Yijia He at CASIA for his help in the experiments.

REFERENCES
[1] R. Hartley and A. Zisserman, Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.
[2] D. Nistér, "An efficient solution to the five-point relative pose problem," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 6, pp. 756–770, 2004.
[3] H. Stewénius, C. Engels, and D. Nistér, "Recent developments on direct relative orientation," ISPRS Journal of Photogrammetry and Remote Sensing, vol. 60, no. 4, pp. 284–294, 2006.
[4] L. Kneip, R. Siegwart, and M. Pollefeys, "Finding the exact rotation between two images independently of the translation," in European Conference on Computer Vision. Springer, 2012, pp. 696–709.
[5] M. A. Fischler and R. C. Bolles, "Random sample consensus: A paradigm for model fitting with application to image analysis and automated cartography," Communications of the ACM, vol. 24, no. 6, pp. 381–395, 1981.
[6] R. Raguram, O. Chum, M. Pollefeys, J. Matas, and J.-M. Frahm, "USAC: A universal framework for random sample consensus," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 35, no. 8, pp. 2022–2038, 2013.
[7] O. Chum, J. Matas, and J. Kittler, "Locally optimized RANSAC," Lecture Notes in Computer Science, vol. 2781, pp. 236–243, 2003.
[8] G. Chesi, "Camera displacement via constrained minimization of the algebraic error," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 31, no. 2, pp. 370–375, 2009.
[9] L. Kneip and S. Lynen, "Direct optimization of frame-to-frame rotation," in IEEE International Conference on Computer Vision, 2013, pp. 2352–2359.
[10] J. Briales, L. Kneip, and J. Gonzalez-Jimenez, "A certifiably globally optimal solution to the non-minimal relative pose problem," in IEEE Conference on Computer Vision and Pattern Recognition, 2018, pp. 145–154.
[11] R. I. Hartley, "In defence of the 8-point algorithm," in International Conference on Computer Vision, 1995, pp. 1064–1070.
[12] V. Lepetit, F. Moreno-Noguer, and P. Fua, "EPnP: An accurate O(n) solution to the PnP problem," International Journal of Computer Vision, vol. 81, no. 2, pp. 155–166, 2009.
[13] T.-J. Chin and D. Suter, The Maximum Consensus Problem: Recent Algorithmic Advances. Morgan & Claypool Publishers, 2017.
[14] P. J. Huber, Robust Statistics. John Wiley and Sons, 1981.
[15] Z. Cai, T.-J. Chin, H. Le, and D. Suter, "Deterministic consensus maximization with biconvex programming," in European Conference on Computer Vision. Springer, 2018, pp. 699–714.
[16] H. Le, T.-J. Chin, A. Eriksson, T.-T. Do, and D. Suter, "Deterministic approximate methods for maximum consensus robust fitting," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2019.
[17] M. J. Black and A. Rangarajan, "On the unification of line processes, outlier rejection, and robust statistics with applications in early vision," International Journal of Computer Vision, vol. 19, no. 1, pp. 57–91, 1996.
[18] R. I. Hartley and F. Kahl, "Global optimization through rotation space search," International Journal of Computer Vision, vol. 82, no. 1, pp. 64–79, 2009.
[19] Z. Zhang, "Determining the epipolar geometry and its uncertainty: A review," International Journal of Computer Vision, vol. 27, pp. 161–195, 1998.
[20] K. Kanatani and Y. Sugaya, "Unified computation of strict maximum likelihood for geometric fitting," Journal of Mathematical Imaging and Vision, vol. 38, pp. 1–13, 2010.
[21] Y. Ma, J. Košecká, and S. Sastry, "Optimization criteria and geometric algorithms for motion and structure estimation," International Journal of Computer Vision, vol. 44, no. 3, pp. 219–249, 2001.
[22] U. Helmke, K. Hüper, P. Y. Lee, and J. Moore, "Essential matrix estimation using Gauss-Newton iterations on a manifold," International Journal of Computer Vision, vol. 74, no. 2, pp. 117–136, 2007.
[23] R. Tron and K. Daniilidis, "The space of essential matrices as a Riemannian quotient manifold," SIAM Journal on Imaging Sciences, vol. 10, no. 3, pp. 1416–1445, 2017.
[24] R. Hartley, "Minimizing algebraic error in geometric estimation problems," in IEEE International Conference on Computer Vision, 1998, pp. 469–476.
[25] Y. Zheng, S. Sugimoto, and M. Okutomi, "A branch and contract algorithm for globally optimal fundamental matrix estimation," in IEEE Conference on Computer Vision and Pattern Recognition, 2011, pp. 2953–2960.
[26] F. Bugarin, A. Bartoli, D. Henrion, J.-B. Lasserre, J.-J. Orteu, and T. Sentenac, "Rank-constrained fundamental matrix estimation by polynomial global optimization versus the eight-point algorithm," Journal of Mathematical Imaging and Vision, vol. 53, no. 1, pp. 42–60, 2015.
[27] G. Chesi, A. Garulli, A. Vicino, and R. Cipolla, "Estimating the fundamental matrix via constrained least-squares: A convex approach," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 24, no. 3, pp. 397–401, 2002.

[28] M. Mevissen and M. Kojima, "SDP relaxations for quadratic optimization problems derived from polynomial optimization problems," Asia-Pacific Journal of Operational Research, vol. 27, no. 1, pp. 15–38, 2010.
[29] F. Kahl and D. Henrion, "Globally optimal estimates for geometric reconstruction problems," International Journal of Computer Vision, vol. 74, no. 1, pp. 3–15, 2007.
[30] C. Olsson and A. Eriksson, "Solving quadratically constrained geometrical problems using Lagrangian duality," in International Conference on Pattern Recognition. IEEE, 2008, pp. 1–4.
[31] C. Aholt, S. Agarwal, and R. Thomas, "A QCQP approach to triangulation," in European Conference on Computer Vision. Springer, 2012, pp. 654–667.
[32] A. Eriksson, C. Olsson, F. Kahl, and T.-J. Chin, "Rotation averaging with the chordal distance: Global minimizers and strong duality," IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.
[33] D. M. Rosen, L. Carlone, A. S. Bandeira, and J. J. Leonard, "SE-Sync: A certifiably correct algorithm for synchronization over the special Euclidean group," International Journal of Robotics Research, vol. 38, no. 2-3, pp. 95–125, 2019.
[34] H. Yang and L. Carlone, "A quaternion-based certifiably optimal solution to the Wahba problem with outliers," in IEEE Conference on Computer Vision and Pattern Recognition, 2019, pp. 1665–1674.
[35] D. Cifuentes, "Polynomial systems: Graphical structure, geometry, and applications," Ph.D. dissertation, Massachusetts Institute of Technology, 2018.
[36] D. Cifuentes, S. Agarwal, P. A. Parrilo, and R. R. Thomas, "On the local stability of semidefinite relaxations," arXiv preprint arXiv:1710.04287v2, 2018.
[37] T.-J. Chin, Z. Cai, and F. Neumann, "Robust fitting in computer vision: Easy or hard?" International Journal of Computer Vision, vol. 128, pp. 575–587, 2020.
[38] O. Enqvist and F. Kahl, "Robust optimal pose estimation," in European Conference on Computer Vision, 2008, pp. 141–153.
[39] O. Enqvist and F. Kahl, "Two view geometry estimation with outliers," in British Machine Vision Conference, 2009, pp. 1–10.
[40] H. Li, "Consensus set maximization with guaranteed global optimality for robust geometry estimation," in International Conference on Computer Vision, 2009, pp. 1074–1080.
[41] J. Yang, H. Li, and Y. Jia, "Optimal essential matrix estimation via inlier-set maximization," in European Conference on Computer Vision. Springer, 2014, pp. 111–126.
[42] J. Fredriksson, V. Larsson, C. Olsson, and F. Kahl, "Optimal relative pose with unknown correspondences," in IEEE Conference on Computer Vision and Pattern Recognition, 2016, pp. 1728–1736.
[43] P. Speciale, D. P. Paudel, M. R. Oswald, T. Kroeger, L. Van Gool, and M. Pollefeys, "Consensus maximization with linear matrix inequality constraints," in IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 5048–5056.
[44] C. Zach, "Robust bundle adjustment revisited," in European Conference on Computer Vision, 2014, pp. 772–787.
[45] Q.-Y. Zhou, J. Park, and V. Koltun, "Fast global registration," in European Conference on Computer Vision, 2016, pp. 766–782.
[46] H. Yang, P. Antonante, V. Tzoumas, and L. Carlone, "Graduated non-convexity for robust spatial perception: From non-minimal solvers to global outlier rejection," IEEE Robotics and Automation Letters, vol. 5, no. 2, pp. 1127–1134, 2020.
[47] S. A. Shah and V. Koltun, "Robust continuous clustering," Proceedings of the National Academy of Sciences of the United States of America, vol. 114, no. 37, p. 201700770, 2017.
[48] O. D. Faugeras and S. Maybank, "Motion from point matches: Multiplicity of solutions," International Journal of Computer Vision, vol. 4, no. 3, pp. 225–246, 1990.
[49] O. Faugeras, Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[50] T. Migita and T. Shakunaga, "Evaluation of epipole estimation methods with/without rank-2 constraint across algebraic/geometric error functions," in IEEE Conference on Computer Vision and Pattern Recognition, 2007, pp. 1–7.
[51] K. M. Anstreicher, "Semidefinite programming versus the reformulation-linearization technique for nonconvex quadratically constrained quadratic programming," Journal of Global Optimization, vol. 43, no. 2-3, pp. 471–484, 2009.
[52] L. Vandenberghe and S. Boyd, "Semidefinite programming," SIAM Review, vol. 38, no. 1, pp. 49–95, 1996.
[53] Y. Ye, Interior Point Algorithms: Theory and Analysis. Wiley & Sons, 1997.
[54] Q. Cai, Y. Wu, L. Zhang, and P. Zhang, "Equivalent constraints for two-view geometry: Pose solution/pure rotation identification and 3D reconstruction," International Journal of Computer Vision, vol. 127, no. 2, pp. 163–180, 2019.
[55] M. Fukuda, M. Kojima, K. Murota, and K. Nakata, "Exploiting sparsity in semidefinite programming via matrix completion I: General framework," SIAM Journal on Optimization, vol. 11, no. 3, pp. 647–674, 2001.
[56] L. Vandenberghe and M. S. Andersen, "Chordal graphs and semidefinite optimization," Foundations and Trends in Optimization, vol. 1, no. 4, pp. 241–433, 2015.
[57] S. Maybank, "The projective geometry of ambiguous surfaces," Philosophical Transactions of the Royal Society of London, Series A: Physical and Engineering Sciences, vol. 332, no. 1623, pp. 1–47, 1990.
[58] S. Geman and D. E. McClure, "Statistical methods for tomographic image reconstruction," Bulletin of the International Statistical Institute, vol. LII-4, pp. 5–21, 1987.
[59] C. Zach and G. Bourmaud, "Iterated lifting for robust cost optimization," in British Machine Vision Conference, 2017, pp. 1–11.
[60] A. Blake and A. Zisserman, Visual Reconstruction. MIT Press, 1987.
[61] L. Kneip and P. Furgale, "OpenGV: A unified and generalized approach to real-time calibrated geometric vision," in IEEE International Conference on Robotics and Automation, 2014, pp. 1–8.
[62] Y. Zheng, Y. Kuang, S. Sugimoto, K. Åström, and M. Okutomi, "Revisiting the PnP problem: A fast, general and optimal solution," in IEEE International Conference on Computer Vision, 2013, pp. 2344–2351.
[63] M. Yamashita, K. Fujisawa, M. Fukuda, K. Kobayashi, K. Nakata, and M. Nakata, "Latest developments in the SDPA family for solving large-scale SDPs," in Handbook on Semidefinite, Conic and Polynomial Optimization. Springer, 2012, pp. 687–713.
[64] C. Strecha, W. von Hansen, L. Van Gool, P. Fua, and U. Thoennessen, "On benchmarking camera calibration and multi-view stereo for high resolution imagery," in IEEE Conference on Computer Vision and Pattern Recognition, 2008.
[65] D. G. Lowe, "Distinctive image features from scale-invariant keypoints," International Journal of Computer Vision, vol. 60, no. 2, pp. 91–110, 2004.
[66] S. Boyd and L. Vandenberghe, Convex Optimization. Cambridge University Press, 2004.