2013 IEEE International Conference on Computer Vision

Enhanced Continuous Tabu Search for Parameter Estimation in Multiview Geometry

Guoqing Zhou, Qing Wang
School of Computer Science and Engineering, Northwestern Polytechnical University, Xi'an 710072, P. R. China
{zhouguoqing,qwang}@nwpu.edu.cn

Abstract

Optimization under the L∞ norm has become an effective way to solve parameter estimation problems in multiview geometry, but the computational cost increases rapidly with the size of the measurement data. Although some strategies have been presented to improve the efficiency of L∞ optimization, it is still an open issue. In this paper, we propose a novel approach under the framework of enhanced continuous tabu search (ECTS) for generic parameter estimation in multiview geometry. ECTS is an optimization method from the domain of artificial intelligence, with the interesting ability of covering a wide solution space by driving the search away from the current solution, which consecutively decreases the possibility of being trapped in local minima. Taking the triangulation as an example, we propose corresponding strategies for the key steps of ECTS, diversification and intensification. We also present a theoretical proof that guarantees the global convergence of the search with probability one. Experimental results validate that the ECTS based approach can obtain the global optimum efficiently, especially when the parameter dimension is large. Potentially, the novel ECTS based algorithm can be applied to many applications in multiview geometry.

1. Introduction

Parameter estimation is one of the most fundamental problems in multiview geometry. Typical error measurements include the algebraic distance, geometric distance, reprojection error, and Sampson error [9].

Traditional optimization algorithms have been dominated by local techniques based on the L2 norm, such as Newton or Levenberg-Marquardt iterations [9] and bundle adjustment [20], which find a local optimum. Some non-iterative methods yield closed-form solutions and are quite efficient and relatively easy to implement. However, solving multiple view geometry problems in general is difficult due to the inherent non-convexity and the presence of local optima.
To remedy these problems, a number of works have shown that many multiview geometric problems are quasi-convex under the L∞ norm [8][13]. A particularly fruitful line of work has been the development of methods that minimize the maximum of the reprojection errors (the L∞ norm) across observations, instead of the sum of squared reprojection errors. It has been proven that many multiview problems have a single local optimum under the L∞ framework. The existence of a globally optimal solution makes this framework attractive for parameter estimation [11]. However, this kind of method is too costly to solve large-scale geometric problems efficiently.

Recently, researchers proposed a new strategy that gives a simple sufficient condition for global optimality, which can be used to verify that a solution obtained from any local method is indeed global [15][17]. Such an algorithm returns either a certificate of optimality for the local solution or the global solution itself. Agarwal et al. [1] discovered that Olsson's method [17] is a special case of generalized fractional programming. Dai et al. [5] found that the sequence of convex problems is highly related and proposed a method to derive a Newton-like step from any given point. The efficiency of L∞ algorithms has thus been improved considerably.

All of the above are still traditional optimization approaches, and few modern optimization methods have been considered for these problems so far. In recent years, Tabu search (TS), a meta-heuristic optimization method originally proposed by Glover [6][7], has attracted extensive attention. It enhances the performance of a local search method by using memory structures that describe the visited solutions: once a potential solution has been determined, it is marked as 'tabu' so that the algorithm does not visit that possibility repeatedly. However, the basic TS was designed primarily for combinatorial optimization problems. Chelouah et al. [4] proposed a variant of TS for global continuous optimization problems (GCOPs), called enhanced continuous tabu search (ECTS). This scheme divides the optimization process into two sequential phases, namely diversification and intensification. As a common drawback of GCOP methods, meta-heuristic approaches cannot in general guarantee finding the global optimum.

In this paper, we propose a novel method under the ECTS framework for parameter estimation in multiview geometry. The procedure takes the result of a linear method as the initial estimation and utilizes the ECTS to attain the global optimum. In the phase of diversification, we propose a non-iterative way to obtain an initial bounding convex hull that contains the global optimum. At the stage of intensification, we propose a new approach to attain the best neighbor set according to the characteristics of multiview geometric problems. Finally, we prove the convergence of the ECTS method in multiview geometry from the viewpoint of probability. The algorithm tends to achieve the global estimation within an arbitrarily small tolerance, and for this reason we can prove that the proposed ECTS method converges with probability one to the global optimum. Compared with the L∞ algorithm [11] and its variants or improvements [1][15][5], our method not only obtains accurate estimations, but also decreases the computational cost dramatically.
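Before turning to the problem formulation, the tabu memory mentioned above can be illustrated with a small data structure. The following is a minimal sketch in Python/NumPy (the paper's own implementation is in MATLAB), assuming a continuous search space in which "already visited" means lying within a fixed radius of a stored solution; the class name, list length and radius are illustrative choices, not values from the paper.

```python
from collections import deque
import numpy as np

class TabuList:
    """Minimal fixed-length tabu memory for a continuous search space.

    A candidate is treated as tabu if it lies within `radius` of any
    recently visited solution. Both `maxlen` and `radius` are
    illustrative choices, not values prescribed by the paper.
    """
    def __init__(self, maxlen=50, radius=1e-3):
        self.memory = deque(maxlen=maxlen)   # oldest entries drop off automatically
        self.radius = radius

    def add(self, x):
        self.memory.append(np.asarray(x, dtype=float))

    def is_tabu(self, x):
        x = np.asarray(x, dtype=float)
        return any(np.linalg.norm(x - v) < self.radius for v in self.memory)
```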

2. Problem formulation

The geometric vision problems we consider in this paper are the ones where the reprojection error can be written as affine functions composed with a projection, i.e., quotients of affine functions. These problems can be represented as (P0) based on the squared reprojection error,

\min_{x} f(x) = \sum_{i=1}^{N} f_i(x), \quad \text{s.t.} \;\; f_i(x) = \frac{\sum_{j=1}^{2}(a_{ij}^{\top}x + \tilde{a}_{ij})^{2}}{(b_{i}^{\top}x + \tilde{b}_{i})^{2}}, \;\; b_{i}^{\top}x + \tilde{b}_{i} > 0    (P0)

where x ∈ R^n is the vector of unknowns to be solved for, and a_ij, b_i ∈ R^n and ã_ij, b̃_i ∈ R differ with the geometric problem at hand. The constraint b_i^T x + b̃_i > 0 reflects the fact that the reconstructed points should be located in front of the cameras. The dimension of the problem (P0) is n, which is often fixed and intrinsic to the particular application. For example, n = 3 for multi-view triangulation, n = 6 for 2D affinity, n = 7 for the fundamental matrix, n = 8 for planar homography, n = 11 for camera calibration, etc. In order to facilitate the following discussions, we take the N-view triangulation as an example.

Consider a set of camera matrices P_i and corresponding image points u_i = (u_i1, u_i2)^T of x = (x_1, x_2, x_3)^T. The objective of triangulation is to recover x. The simplest way is based on a linear algebraic method. Though this method may seem attractive, the cost function it minimizes has no particular meaning and the method is not reliable. Under the framework of the L2 norm, we are instead led to minimize the following cost function subject to the constraint b_i^T x + b̃_i > 0,

f(x) = \sum_{i=1}^{N} \| u_i - P_i x \|^{2}    (P1)

After a simple expansion, (P1) can be rewritten as

E(x) = \sum_{i=1}^{N} f_i(x), \quad \text{s.t.} \;\; f_i(x) = \frac{\sum_{j=1}^{2}(a_{ij}^{\top}x + \tilde{a}_{ij})^{2}}{(b_{i}^{\top}x + \tilde{b}_{i})^{2}}, \;\; b_{i}^{\top}x + \tilde{b}_{i} > 0    (1)

where x, a_ij, b_i ∈ R^n and ã_ij, b̃_i ∈ R, and

a_{ij} = p_{ij} - u_{ij}\, p_{i3}, \quad \tilde{a}_{ij} = \tilde{p}_{ij} - u_{ij}\, \tilde{p}_{i3}, \quad j = 1, 2, \qquad b_{i} = p_{i3}, \quad \tilde{b}_{i} = \tilde{p}_{i3}, \quad i = 1, 2, \ldots, N,    (2)

where p_ij concatenated with the scalar p̃_ij is the j-th row of P_i.
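To make the notation concrete, the coefficients of Eq. (2) and the cost E(x) of Eq. (1) can be computed for the triangulation case as sketched below. This is a Python/NumPy sketch (the paper's implementation is in MATLAB); the function names are ours and are not part of the paper.

```python
import numpy as np

def reprojection_coefficients(P, u):
    """Build the affine coefficients of Eq. (2) for one view.

    P : (3, 4) camera matrix, u : (2,) measured image point.
    Row j of P is split as [p_ij | p~_ij] (3 coefficients plus a scalar).
    Returns (A, a_tilde, b, b_tilde) with A of shape (2, 3).
    """
    P = np.asarray(P, dtype=float)
    u = np.asarray(u, dtype=float)
    A = P[:2, :3] - np.outer(u, P[2, :3])      # a_i1, a_i2
    a_tilde = P[:2, 3] - u * P[2, 3]           # a~_i1, a~_i2
    b, b_tilde = P[2, :3], P[2, 3]             # b_i, b~_i
    return A, a_tilde, b, b_tilde

def squared_reprojection_error(x, cams, points):
    """E(x) of Eq. (1): sum of f_i(x) over all views (cheirality assumed)."""
    E = 0.0
    for P, u in zip(cams, points):
        A, a_t, b, b_t = reprojection_coefficients(P, u)
        depth = b @ x + b_t                    # must be > 0 (point in front of camera)
        E += np.sum((A @ x + a_t) ** 2) / depth ** 2
    return E
```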
3. ECTS in multiview geometry

ECTS is a variant of traditional tabu search for global continuous optimization [6]. It consists of five stages: setting of parameters, diversification, search for the most promising area, intensification, and output of the best point found. The key stages of ECTS are diversification and intensification. At the stage of diversification, the algorithm scans the whole solution space and detects the promising areas, which are likely to contain the global minimum. The centers of these promising areas are stored in a so-called promising list. The aim of diversification is to determine the most promising area from the promising list. When the diversification ends, the step of intensification starts. It searches inside the most promising area for a better result. In this phase, the search is concentrated on the most promising area by making the search domain smaller and gradually reducing the neighborhood structure. This strategy improves the performance of the algorithm and allows exploiting the most promising area with more accuracy.

3.1. The diversification

Now we present how to carry out the diversification for multiview geometry problems. In order to facilitate the discussion, we again take the triangulation as the example. At first, we show how to determine the most promising area, which should contain the global optimum x_opt. We start with a convex hull and an initial point x_init found by the linear algebraic method. If x_opt is the true global optimum of the L2 norm minimization, it follows that

E(x_{opt}) = \min_{x} \sum_{i=1}^{N} f_i(x) \le E(x_{init}) = \delta^{2},    (3)

where δ is a positive value. This means that each f_i(x_opt) term is less than δ². According to Eq. (1), Eq. (3) can be rewritten in the following form. For each i,

\frac{(a_{i1}^{\top}x_{opt} + \tilde{a}_{i1})^{2} + (a_{i2}^{\top}x_{opt} + \tilde{a}_{i2})^{2}}{(b_{i}^{\top}x_{opt} + \tilde{b}_{i})^{2}} \le \delta^{2}.    (4)

This means that, for each i, the following two constraints are satisfied,

\left| \frac{a_{i1}^{\top}x_{opt} + \tilde{a}_{i1}}{b_{i}^{\top}x_{opt} + \tilde{b}_{i}} \right| \le \delta \quad \text{and} \quad \left| \frac{a_{i2}^{\top}x_{opt} + \tilde{a}_{i2}}{b_{i}^{\top}x_{opt} + \tilde{b}_{i}} \right| \le \delta.    (5)

Notice that for N-view triangulation we therefore have a total of 4N linear constraints on the variable x, formulated by multiplying both sides of the above constraints with the depth term b_i^T x + b̃_i > 0. We wish to obtain a convex hull containing the optimal x_opt, that is, to find the lower and upper boundaries x_min^j, x_max^j such that x_min^j ≤ x^j ≤ x_max^j, j = 1, ..., n. We can formulate a linear programming (LP) problem by linearizing the constraints for i = 1, ..., N,

-(a_{i1} + \delta\, b_{i})^{\top}x - \tilde{a}_{i1} - \delta\, \tilde{b}_{i} \le 0
(a_{i1} - \delta\, b_{i})^{\top}x + \tilde{a}_{i1} - \delta\, \tilde{b}_{i} \le 0
-(a_{i2} + \delta\, b_{i})^{\top}x - \tilde{a}_{i2} - \delta\, \tilde{b}_{i} \le 0
(a_{i2} - \delta\, b_{i})^{\top}x + \tilde{a}_{i2} - \delta\, \tilde{b}_{i} \le 0    (6)

This process provides an initial bounding convex hull that contains the global optimum x_opt based on the L2 norm cost function. Compared to traditional diversification methods, the above way does not need an iterative process to find the most promising area, so it is effective for multiview geometry problems.
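As an illustration, the per-coordinate bounds of this hull can be obtained with any LP solver. The sketch below is a minimal Python version that uses SciPy's linprog in place of the MOSEK solver used in the paper; δ is taken from the linear initialization via Eq. (3), and the helper functions from the Section 2 sketch are assumed.

```python
import numpy as np
from scipy.optimize import linprog

def bounding_hull(cams, points, x_init):
    """Per-coordinate bounds [x_min, x_max] of the promising area (Sec. 3.1).

    Builds the 4N linear constraints of Eq. (6) with delta^2 = E(x_init)
    (Eq. (3)) and solves 2n small LPs, one minimizing and one maximizing
    each coordinate. SciPy's 'highs' solver stands in for MOSEK here.
    """
    delta = np.sqrt(squared_reprojection_error(x_init, cams, points))
    G, h = [], []
    for P, u in zip(cams, points):
        A, a_t, b, b_t = reprojection_coefficients(P, u)
        for j in range(2):                        # two constraints per image coordinate
            G.append(-(A[j] + delta * b)); h.append(a_t[j] + delta * b_t)
            G.append(  A[j] - delta * b ); h.append(delta * b_t - a_t[j])
    G, h = np.array(G), np.array(h)
    n = G.shape[1]
    x_min, x_max = np.empty(n), np.empty(n)
    for j in range(n):
        c = np.zeros(n); c[j] = 1.0
        lo = linprog(c,  A_ub=G, b_ub=h, bounds=[(None, None)] * n, method="highs")
        hi = linprog(-c, A_ub=G, b_ub=h, bounds=[(None, None)] * n, method="highs")
        x_min[j], x_max[j] = lo.x[j], hi.x[j]
    return x_min, x_max
```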
3.2. The intensification

In the classical ECTS, the intensification carries out the following routines: generation of neighbors, selection of the best neighbor, updating of the various lists and adjustment of the parameters. In other words, if the current solution converges to a local optimum, we must provide a feasible direction that enables the solution to escape from it. The generic ECTS generates a specified number of neighbors, but when the dimension of the vector or the number of constraints increases rapidly, the verification of the best neighbors becomes inefficient. In this paper, we propose a new approach to attain the best neighbor set according to the characteristics of multiview geometry problems.

The cost function f_i(x) is pseudo-convex [3],

f_i(x) = \frac{\sum_{j=1}^{2}(a_{ij}^{\top}x + \tilde{a}_{ij})^{2}}{(b_{i}^{\top}x + \tilde{b}_{i})^{2}}    (7)

It is well known that pseudo-convex functions have some nice properties, such as the one described in Lemma 1 [16]:

Lemma 1. Let f(x) = max_i f_i(x) and S = {x | b_i^T x + b̃_i > 0, ∀i}. Then x* solves μ* = min_{x∈S} f(x) if and only if there exists λ* such that

\sum_{i=1}^{N} \lambda_i^{*} \nabla f_i(x^{*}) = 0,

where λ_i* ≥ 0 if f_i(x*) = μ* and λ_i* = 0 if f_i(x*) < μ* for i = 1, 2, ..., N, and Σ_i λ_i* = 1.

The geometric interpretation of Lemma 1 is that, if none of the gradients vanish, then in each direction d there is an i such that ∇f_i(x)^T d ≥ 0, that is, at least one of the f_i(x) does not decrease in each direction. Lemma 1 roughly states that the gradient does not vanish anywhere except at the optimum. In the proposed method, we take the gradient of f_i(x_k) (k is the iteration index of ECTS) as the descent direction and construct the candidate solution set. We generate the component z^j of z so that it satisfies the Gaussian distribution with mean x_k^j and standard deviation σ, j = 1, ..., n. Initially σ = 1; as k increases, σ = d × σ (d is a factor chosen from 0.997 to 0.999). If σ < 10^−4, we fix σ = 10^−4. Lemma 1 ensures that the candidate solution set includes solutions that trail off the error, so a better solution can be obtained through the ECTS.

Since pseudo-convexity makes the condition in Lemma 1 sufficient and necessary for a global optimum, each iteration drives the solution toward the global optimum. In a word, in our proposed approach, we construct the most promising area from the L2 based basic method for the diversification and determine the best search direction from the L∞ based optimality conditions for the intensification.

3.3. The convergence analysis

Now we discuss the convergence of the proposed ECTS algorithm. On the basis of the description in Section 3.2, problem (P1) is a global continuous optimization problem. It can be rewritten as

\min_{x \in \Omega} f(x)    (P2)

where Ω = {x ∈ R^n | x_min^j ≤ x^j ≤ x_max^j, j = 1, ..., n}. Essentially, the proposed method is an instance of the memory tabu search (MTS) [10]. The MTS has the following pipeline.

Step 1: Generate an initial point x_0 ∈ Ω. Set x_0* = x_0 and k = 0.
Step 2: If a prescribed termination condition is satisfied, stop the iteration. Otherwise, generate a random vector y by using the generator of the probability density function.
Step 3: If f(y) ≤ f(x_k*) then x_{k+1} = y and x_{k+1}* = y. Otherwise, if f(y) ≤ f(x_k), then x_{k+1} = y; else, if y does not satisfy the tabu conditions, then x_{k+1} = y; else x_{k+1} = x_k. Go to Step 2.
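The gradient-guided candidate generation of Section 3.2 (Step 4 of the algorithm in Section 4) might be realized as in the sketch below. This is one plausible reading of the text, in which Gaussian samples around x_k are kept only when they move along some descent direction −∇f_i(x_k); the analytic gradient of Eq. (7) is used, the sample count is an illustrative choice, and the helpers from the earlier sketches are assumed.

```python
import numpy as np

def grad_f_i(x, P, u):
    """Analytic gradient of the per-view cost f_i of Eq. (7) at x."""
    A, a_t, b, b_t = reprojection_coefficients(P, u)
    r = A @ x + a_t                       # residual numerators a_ij.x + a~_ij
    d = b @ x + b_t                       # depth b_i.x + b~_i, assumed > 0
    return 2.0 * (A.T @ r) / d ** 2 - 2.0 * np.sum(r ** 2) * b / d ** 3

def generate_candidates(x_k, sigma, cams, points, x_min, x_max, tabu, n_samples=30):
    """Candidate set of Sec. 3.2 / Step 4: component-wise Gaussian samples
    around x_k, kept only if they descend along some -grad f_i(x_k), stay
    inside the hull [x_min, x_max] and are not tabu. `n_samples` is an
    illustrative size; the paper does not fix it.
    """
    grads = [grad_f_i(x_k, P, u) for P, u in zip(cams, points)]
    S = []
    for _ in range(n_samples):
        z = np.random.normal(x_k, sigma)              # z^j ~ N(x_k^j, sigma^2)
        descends = any(g @ (z - x_k) < 0.0 for g in grads if np.linalg.norm(g) > 0)
        inside = np.all(z >= x_min) and np.all(z <= x_max)
        if descends and inside and not tabu.is_tabu(z):
            S.append(z)
    return S
```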

In order to interpret the convergence of ECTS, we introduce the following definitions [14].

Definition 1. Let {ξ_m} be a sequence of random variables defined on a probability space. We say that {ξ_m} converges in probability towards a random variable ξ if, ∀ε > 0,

\lim_{m \to \infty} \Pr\{ |\xi_m - \xi| < \varepsilon \} = 1, \quad \text{denoted as } \xi_m \xrightarrow{p} \xi.

Definition 2. Let {ξ_m} be a sequence of random variables defined on a probability space. We say that {ξ_m} converges with probability one (or almost surely) towards a random variable ξ (denoted as ξ_m \xrightarrow{a.s.} ξ) if

\Pr\{ \lim_{m \to \infty} \xi_m = \xi \} = 1,

or, equivalently, when for any ε > 0,

\Pr\{ \cap_{m=1}^{\infty} \cup_{k \ge m} [\, |\xi_k - \xi| \ge \varepsilon \,] \} = 0.

Without doubt, ξ_m \xrightarrow{a.s.} ξ is stronger than ξ_m \xrightarrow{p} ξ.

Theorem 1 (Borel-Cantelli theorem). Let {A_n} be a sequence of events in a probability space, and P_k = Pr{A_k}. Then Σ_{n=1}^∞ P_n < ∞ implies

\Pr(\limsup_{n \to \infty} A_n) = \Pr\{ \cap_{n=1}^{\infty} \cup_{k \ge n} A_k \} = 0.

If Σ_{n=1}^∞ P_n = ∞ and the A_n are independent, then Pr{∩_{n=1}^∞ ∪_{k≥n} A_k} = 1.

Lemma 2 and Theorem 2 give the global convergence property of the objective optimal value sequence induced by MTS when solving problem (P2). f is supposed to have a global minimum f* = min_{x∈Ω} f(x). For any ε > 0, let D_0 = {x ∈ Ω : |f(x) − f*| < ε} and D_1 = Ω \ D_0.

Lemma 2. Solving (P2) by using MTS, suppose x_k* ∈ D_1. Let the probability of x_{k+1}* ∈ D_1 be q_{k+1} and the probability of x_{k+1}* ∈ D_0 be p_{k+1}. If y^j, j = 1, 2, ..., n, satisfies the Gaussian distribution, then q_{k+1} ≤ c ∈ (0, 1).

Theorem 2. Solving (P2) by using MTS, if y^j, j = 1, 2, ..., n, satisfies the Gaussian distribution, then Pr{lim_{k→∞} f(x_k*) = f*} = 1. Namely, x_k* converges with probability one to the global optimal solution of (P2).

The proofs of Lemma 2 and Theorem 2 are given in the Appendix of the paper. In Theorem 2, f* is the global optimum of f and y is the candidate solution in each step of tabu search. Therefore, we can start from an initial estimate x_init. The algorithm tends to achieve the global estimation within an arbitrarily small tolerance, and for this reason we conclude that the proposed ECTS converges with probability one to the global optimum.

4. The algorithm

Now we summarize our framework of ECTS for estimating parameters in multiview geometry.

Input: 2D points u_i in the images and camera matrices P_i; the step size of tabu search is t = 10^−4, the tolerance is ε > 0 and K is a predefined maximal number of iterations.
Output: The global optimal solution x_opt.
Step 1. Take the result of the linear method as the initial solution x_0* = x_0. Set k = 0 and the tabu list as empty.
Step 2. Construct the convex hull Ω, which contains x_opt, as described in Section 3.1.
Step 3. If |Σ_i f_i(x_{k+1}) − Σ_i f_i(x_k)| < ε or k > K, the algorithm terminates; otherwise it continues.
Step 4. Generate the candidate set. If ||∇f_i(x_k)|| > 0, generate the candidate element z along the gradient direction ∇f_i(x_k). The element z^j of z satisfies the Gaussian distribution with mean x_k^j and standard deviation σ; the details of σ and d are given in Section 3.2. If z ∈ Ω and it is not in the tabu list, we put z into the candidate set S.
Step 5. For each z_s, s = 1, ..., |S|, we obtain y = argmin_s (max_i f_i(z_s)) (based on the L∞ norm).
Step 6. If max_i f_i(y) ≤ max_i f_i(x_k*) then x_{k+1}* = y and x_{k+1} = y. Otherwise, if max_i f_i(y) ≤ max_i f_i(x_k) or y is not in the tabu list, then x_{k+1} = y; else x_{k+1}* = x_k. Put x_{k+1} into the tabu list.
Step 7. k = k + 1. Go to Step 3.

In Step 6, we introduce a variable x_{k+1}* to record the optimal one among {x_i | i = 1, ..., k+1}. This is the main distinction between our method and traditional TS. It is worth noting that, in Step 1, if the initial point is an unreliable algebraic result or a random initial value, the number of tabu search iterations will increase accordingly, but the accuracy can still be guaranteed.
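A condensed sketch of this pipeline is given below. It reuses the helper functions sketched in the earlier sections; the tolerance, iteration cap and σ decay factor follow the ranges quoted in the paper, while the remaining choices (and the simplified handling of Step 6) are ours and only illustrate the flow, not the exact implementation.

```python
import numpy as np

def max_reproj_error(x, cams, points):
    """max_i f_i(x): the L-infinity style score used in Steps 5-6."""
    errs = []
    for P, u in zip(cams, points):
        A, a_t, b, b_t = reprojection_coefficients(P, u)
        errs.append(np.sum((A @ x + a_t) ** 2) / (b @ x + b_t) ** 2)
    return max(errs)

def ects_estimate(cams, points, x_init, eps=1e-6, K=100, decay=0.998):
    """Condensed sketch of the Section 4 pipeline (Steps 1-7)."""
    x_min, x_max = bounding_hull(cams, points, x_init)        # Step 2
    tabu = TabuList()
    x_k = x_best = np.asarray(x_init, dtype=float)            # Step 1
    sigma = 1.0
    for k in range(K):                                        # Step 3 (iteration cap)
        S = generate_candidates(x_k, sigma, cams, points,
                                x_min, x_max, tabu)           # Step 4
        if not S:
            break
        y = min(S, key=lambda z: max_reproj_error(z, cams, points))        # Step 5
        if max_reproj_error(y, cams, points) <= max_reproj_error(x_best, cams, points):
            x_best = y                                        # Step 6: new incumbent
        prev, x_k = x_k, y
        tabu.add(x_k)
        sigma = max(decay * sigma, 1e-4)                      # Sec. 3.2 schedule
        if abs(squared_reprojection_error(x_k, cams, points) -
               squared_reprojection_error(prev, cams, points)) < eps:      # Step 3
            break
    return x_best
```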
5. Experimental results

We have tested the proposed method on both synthetic and real scene data. At the first stage of the evaluation, we compared our method with the bisection algorithm (Bisect-I) [11] to verify the effectiveness and efficiency for moderate scale problems, taking the triangulation and resection as examples. Then, in order to evaluate the efficiency for large scale problems, we compare our method with some state-of-the-art methods discussed in [1][5][15], taking SfM with known camera orientation as an example.

The synthetic data comes from the linfinity-1.0 package¹. Most of the real scene data are from the VGG group² and the Notre Dame data is courtesy of [19]. Our algorithm is coded in MATLAB. The experimental environment is a standard PC (P8600, 6GB, 64 bits) with Matlab 2010a. We use the Matlab profiler to report the timings and performance comparisons. In the L∞ based methods, the adopted SOCP solver is SeDuMi and the linear programming solver is MOSEK. The reported runtimes are the total time spent in optimization routines; the time for setting up the problem is omitted.

¹ See http://www.maths.lth.se/matematiklth/personal/fredrik/download.html
² See http://www.robots.ox.ac.uk/~vgg/data.html

Figure 1. RMS error of the triangulation on varying view amount.
Figure 3. RMS error of triangulation with different noise levels.

Figure 2. RMS error of the resection on varying point amount.
Figure 4. RMS error of the resection with different noise levels.

Table 1. Runtimes of the triangulation on synthetic data.
Problems      Bisect-I [11] (s)   Our ECTS (s)   Speedup
10-views      117.5779            0.4191         280.5
100-views     150.2874            3.0974         48.5
300-views     263.2977            16.2152        16.2
500-views     341.4276            25.6921        13.3
700-views     421.1711            39.8433        10.6
1000-views    482.4539            51.6829        9.3

Table 2. Runtimes of the resection on synthetic data.
Problems      Bisect-I [11] (s)   Our ECTS (s)   Speedup
10-points     4.1865              0.0583         71.8
100-points    5.4075              0.2102         25.7
300-points    9.3340              0.8087         11.5
500-points    9.6668              0.8856         10.9
700-points    14.4366             1.3857         10.4
1000-points   17.7806             1.7039         10.4

5.1. Test on synthetic data

In this section we compare the ECTS algorithm with the Bisect-I algorithm [11]. For the moderate scale problems we tested the algorithms on randomly generated instances of triangulation and resection problems with different sizes. We simulated a 3D scene with 1,000 points within a cube and set N views in front of the scene. The corresponding synthetic image points are normalized into [−1, +1] and Gaussian noise up to 0.01 is added randomly. In Figures 1 and 2, the RMS (Root Mean Squares) errors of triangulation and resection problems with different sizes are reported. One can see that the errors of the ECTS are smaller than those of the Bisect-I algorithm. Kahl et al. have pointed out that the improved bisection algorithm based on the L∞ norm can obtain the global optimum [12]. Apart from the expected global optimum being achieved by the ECTS algorithm, Tables 1 and 2 clearly show that the ECTS algorithm is more efficient than the Bisect-I algorithm. The time cost of the triangulation on synthetic data shows that the speedup of our ECTS method decreases when the number of views increases. The main reason is that the number of iterations in the ECTS increases with the larger number of views. Fortunately, the view amount is rarely more than 100 for a 3D point in practical applications.

We have also validated the algorithms on different Gaussian noise levels for the triangulation and resection. Figures 3 and 4 show the RMS errors of the two methods with different noise levels. The RMS errors of our ECTS method are lower than those of the Bisect-I algorithm.
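A rough analogue of this synthetic setup can be generated as sketched below. The camera layout, cube placement and normalization step are assumptions of ours; the paper's own generator (from the linfinity-1.0 package) is not reproduced here.

```python
import numpy as np

def make_synthetic_scene(n_views=10, n_points=1000, noise=0.01, seed=0):
    """Points in a cube, cameras in front of it, image points scaled into
    [-1, 1] and perturbed by Gaussian noise (an assumed analogue of Sec. 5.1).
    """
    rng = np.random.default_rng(seed)
    X = rng.uniform(-1.0, 1.0, size=(n_points, 3)) + np.array([0.0, 0.0, 5.0])
    Xh = np.hstack([X, np.ones((n_points, 1))])
    cams, obs = [], []
    for i in range(n_views):
        # identity rotation, camera centres spread along x in front of the cube
        c = np.array([-1.0 + 2.0 * i / max(n_views - 1, 1), 0.0, 0.0])
        P = np.hstack([np.eye(3), -c[:, None]])          # P = [I | -c]
        proj = (P @ Xh.T).T
        uv = proj[:, :2] / proj[:, 2:3]
        s = np.abs(uv).max()                             # scale image into [-1, 1]
        P = np.diag([1.0 / s, 1.0 / s, 1.0]) @ P         # keep P consistent with the scaling
        uv = uv / s + rng.normal(0.0, noise, size=uv.shape)
        cams.append(P)
        obs.append(uv)
    return X, cams, obs
```

For a single N-view triangulation instance, the measurements of point j are [obs[i][j] for i in range(n_views)], which together with cams can be fed to the ects_estimate sketch given after Section 4.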

Table 3. Performance evaluation and speedup of the triangulation.
Dataset       Images   3D points   RMS Bisect-I [11] (px)   RMS ECTS (px)   Time Bisect-I [11] (s)   Time ECTS (s)   Speedup
Model house   10       672         0.3930                   0.3741          455.8962                 2.2511          202.5
Wadham        5        1331        0.1553                   0.1533          751.4879                 3.1245          240.5
Dinosaur      36       4983        0.4283                   0.4060          3199.8564                12.2819         261.5
Notre Dame    212      27000       0.5739                   0.5488          19721.1946               150.6106        130.9

Table 4. Performance evaluation and speedup of the resection.
Dataset       Images   3D points   RMS Bisect-I [11] (px)   RMS ECTS (px)   Time Bisect-I [11] (s)   Time ECTS (s)   Speedup
Model house   10       672         0.1000                   0.0368          18.0754                  0.5111          35.4
Wadham        5        1331        0.0091                   0.0084          13.6696                  1.9669          6.9
Library       3        667         0.0173                   0.0124          7.8832                   0.3587          22.0
Oxford        11       737         0.0615                   0.0305          21.4552                  0.8945          24.0

5.2. Test on real scene data

A. Triangulation

In the triangulation, we validated our method on real scene data sets. From the VGG data, the Model house contains 10 views and 672 tracks, and the Wadham contains 5 views and 1331 tracks. The Dinosaur data contains 36 views with 4,983 tracks and 16,432 feature points. The Notre Dame sequence contains 595 views with 277,877 tracks and about one million feature points; we only tested 212 images and randomly selected 27,000 (from 160,147) tracks, which is probably sufficient to give a performance indication. The obtained reconstructions are shown in Figures 5 and 6. In order to give a more intuitive comparison of the reconstruction results, Figure 7 shows the main part of the reconstructed dino's head, where the red crosses and blue circles are the reconstruction results of the Bisect-I method and the ECTS algorithm respectively. One can see that most of the points in 3D space coincide; at the edge of the point clouds, the reconstruction results appear slightly inconsistent.

In paper [15], the author reported that the speedups on Dinosaur and Notre Dame are 10.1 (3676s vs. 365s) and 7.2 (35815s vs. 4968s) respectively. From Table 3, we find that it only took 12.28s for our ECTS method to obtain the result for the Dinosaur data, while the Bisect-I method spent 3199s. For the Notre Dame data set, the improvement of efficiency is also significant, and the speedup is 130.9.

Figure 5. The reconstructed 3D points of Dinosaur.
Figure 6. The reconstructed 3D points of Notre Dame Cathedral.
Figure 7. The enlarged details of the reconstructed Dino's head.

B. Resection

For the resection problem, we chose four publicly available benchmarks from the VGG data sets. The details of the average reprojection error and computational cost can be found in Table 4.

Since it is hard to illustrate the camera pose against the ground truth for the University library and Oxford data, we only show comparisons of camera pose estimation by the Bisect-I algorithm and our ECTS method with the ground truth for Model house and Wadham in Figures 8 and 9 respectively. The black rectangular pyramids illustrate the ground truth. Figures 8(b) and 9(b) show comparisons of the Bisect-I algorithm and the ground truth; the green rectangular pyramids represent the position and orientation of the cameras estimated by the Bisect-I method. In Figures 8(c) and 9(c), the red rectangular pyramids represent the position and principal axis of the cameras estimated by our ECTS method. From these results we find that the ECTS method achieves a more accurate estimation of camera pose than the Bisect-I method, nearly approaching the ground truth.

C. SfM with known camera orientation

Now we present experiments on two benchmark data sets, Dinosaur and Oxford, which are publicly available from the Oxford VGG. The Dinosaur data set contains 36 cameras and 328 3D points [1]. The Oxford data set contains 11 cameras and 737 3D points. Since our ECTS algorithm focuses on the L2 norm reprojection error, we compare it with the Bisect-II, Dinkel-II, and Gugat algorithms discussed in [1] (the source code is provided by the author).

In [1], the author pointed out that Gugat's algorithm is the best one compared to the others. In [5], the author reported the experimental result on Dinosaur with 127 3D points, which took 1.07s under the L2 norm; compared to Gugat's algorithm (11.84s), its speedup is nearly 11 times. But for the Oxford data, the author reported that Gugat's algorithm failed. In our experiments, we obtain about 3.7 and 4.8 times speedup compared to Gugat's algorithm.

Figure 8. Comparisons of resection results of Wadham college: (a) image, (b) Bisect-I method [11], (c) our ECTS method.

Figure 9. Comparisons of resection results of Model house: (a) image, (b) Bisect-I method [11], (c) our ECTS method.

Table 5. Runtimes for L2 norm reprojection error. All times are in seconds. "f" denotes numerical failure. Parameter settings: ε₁ = 0.01, ε₂ = 0.001, σ = 1e6.
Dataset                  Images   Points   Unknowns   Observations   Bisect-II   Dinkel-II   Gugat     Our ECTS
Dinosaur (Partial data)  36       328      1089       2663           12.6306     13.9046     17.4233   4.7787
Oxford                   11       737      2241       4035           46.2654     45.0122     35.7344   7.4617
Dinosaur (Full data)     36       4983     15054      16432          f           f           f         1040.6616

More importantly, we carried out the ECTS algorithm and all baseline algorithms for structure and motion recovery on the whole Dinosaur data set (4,983 3D points, 15,054 unknowns and 16,432 observations). The RMS error of our algorithm based on the L2 norm reprojection is 0.2247 and the runtime is 1040.66s. However, Bisect-II, Dinkel-II and Gugat fail to obtain a result, encountering out-of-memory problems. We think the key reason is that all these improved algorithms, solved with SeDuMi, cannot remain robust when the dimension of the problem becomes large. This shows that the proposed ECTS algorithm is suitable for large scale multiview geometric problems.

We have also carried out comparisons with classical bundle adjustment (BA) on this problem. The BA code is Vincent's SfM Toolbox³. Agarwal et al. have pointed out that BA has space complexity O(N²) and time complexity O(N³) [2], while traditional tabu search applicable to SfM is only O(N²) per iteration [18]. In this paper, we propose to generate the candidate set along guided directions rather than a randomized hyper-cube around the current solution, so the time complexity is less than O(k·N²), where k is the number of iterations. In our experiments, k did not exceed 15. For the Oxford and whole Dinosaur data, the runtime of BA is 31.3211s and 9900s, while our algorithm only costs 7.4617s and 1041s, which shows that our method is more efficient when the scale of the problem becomes larger. At the same time, the RMS errors of BA are 0.5217 and 0.4761 pixels while ours are 0.2267 and 0.2247 pixels for Oxford and Dinosaur respectively. It is obvious that our method outperforms BA.

³ See http://vision.ucsd.edu/~vrabaud/toolbox/doc/

6. Discussions and conclusions

In this paper, we have demonstrated that many problems of parameter estimation in multiple view geometry can be formulated within a unified framework of enhanced continuous tabu search (ECTS), which is theoretically guaranteed to converge with probability one to the global optimum. We have validated the ECTS method for multiview geometric problems on both synthetic and real scene data, including triangulation, resection and N-view based structure and motion recovery. Experimental results have shown that the proposed ECTS algorithm can obtain results as accurate as the traditional Bisect-I method. More importantly, the ECTS algorithm speeds up parameter estimation many times over the Bisect-I method and some state-of-the-art algorithms. Another encouraging result is that the RMS error of our method is lower than that of the baseline algorithms regardless of the number of images or the noise level. Therefore, the proposed method can be extended to many problems of parameter estimation in multiview geometry.

Acknowledgement. The work is supported by NSFC fund (61272287), "863" project (2012AA011803), Specialized Research Fund for the Doctoral Program of Higher Education (20116102110031) and NPU Foundation for Fundamental Research (JC20120240), China. We thank the anonymous reviewers for their insightful suggestions and Sameer Agarwal for his source code [1] used for comparison.

References

[1] S. Agarwal, N. Snavely, and S. M. Seitz. Fast algorithms for L∞ problems in multiview geometry. In Proc. CVPR, 2008.
[2] S. Agarwal, N. Snavely, S. M. Seitz, and R. Szeliski. Bundle adjustment in the large. In Proc. ECCV, 2010.
[3] S. Boyd and L. Vandenberghe. Convex Optimization. Cambridge Univ. Press, 2004.
[4] R. Chelouah and P. Siarry. Tabu search applied to global optimization. European Journal of Operational Research, 123(2):256–270, 2000.
[5] Z. Dai, Y. Wu, F. Zhang, and H. Wang. A novel fast method for L∞ problems in multiview geometry. In Proc. ECCV, 2012.
[6] F. Glover. Tabu search, part I. ORSA Journal on Computing, 1:190–206, 1989.
[7] F. Glover. Tabu search, part II. ORSA Journal on Computing, 2:4–32, 1990.
[8] R. Hartley and F. Schaffalitzky. L∞ minimization in geometric reconstruction problems. In Proc. CVPR, 2004.
[9] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision, 2nd edn. Cambridge University Press, Cambridge, 2004.
[10] M. Ji and H. Tang. Global optimizations and tabu search based on memory. Applied Mathematics and Computation, 159(2):449–457, 2004.
[16] C. Olsson and F. Kahl. Generalized convexity in multiple view geometry. J. Math. Imaging Vis., 38(1):35–51, 2010.
[17] C. Olsson, F. Kahl, and R. Hartley. Projective least-squares: global solutions with local optimization. In Proc. CVPR, 2009.
[18] G. Paul. An efficient implementation of the robust tabu search heuristic for sparse quadratic assignment problems. European Journal of Operational Research, 209(3):215–218, 2011.
[19] N. Snavely, S. M. Seitz, and R. Szeliski. Photo tourism: exploring photo collections in 3D. In SIGGRAPH, pages 835–846, 2006.
[20] W. Triggs, P. McLauchlan, R. Hartley, and A. Fitzgibbon. Bundle adjustment for structure from motion. In Vision Algorithms: Theory and Practice, pages 504–509, 2000.

A. Appendix: proofs of Lemma 2 and Theorem 2 in Section 3.3

A.1. Proof of Lemma 2

Proof: Let x_min be a global optimal solution of (P2). Since f is a continuous function, there exists an r > 0 such that |f(x) − f(x_min)| < ε/2 for all x with ||x − x_min|| ≤ r. Let Q_{x_min,r} = {x ∈ Ω : ||x − x_min|| ≤ r}. Obviously, Q_{x_min,r} ⊂ D_0. By the assumption x_k* ∈ D_1, we have f(x_{k+1}*) ≤ f(x_k*) ≤ f(x_k). Sampling y^j ~ N(x_k^j, σ²), j = 1, ..., n, leads to the generation probability density function

g = \frac{1}{\sqrt{2\pi}\,\sigma}\exp\!\left(-\frac{(y^{j}-x_{k}^{j})^{2}}{2\sigma^{2}}\right).

Thus, the acceptance probability is

A = \begin{cases} 1 & f(y) \le f(x_k), \\ \mu\{\Omega - \cup_{k-L}(S_1 \cap S_2 \cup S_3)\}/\mu\{\Omega\} & f(y) > f(x_k), \end{cases}

where S_1, S_2 and S_3 are three criteria used to determine whether the candidate solution is tabu or not,

S_1 = \{y \in \Omega \mid \|x_k - y\| < \delta_1 \},
S_2 = \{y \in \Omega \mid |f(x_k) - f(y)| < \delta_2 \},
S_3 = \{y \in \Omega \mid |f(x_k) - f(y)|/f(y) < \delta_3 \}.

Obviously A ≤ 1. The probability of x_{k+1}* ∈ Q_{x_min,r} is

\Pr\{x_{k+1}^{*} \in Q_{x_{min},r}\} = \Pr\{y \in Q_{x_{min},r}\} = \int_{Q_{x_{min},r}} (g \times A)\, d\Omega \le \int_{Q_{x_{min},r}} g\, d\Omega.

Since ∅ ≠ Q_{x_min,r} ⊂ D_0 and g > 0 on Ω, this probability is strictly positive, so p_{k+1} = Pr{x_{k+1}* ∈ D_0} ≥ Pr{x_{k+1}* ∈ Q_{x_min,r}} > 0, and hence q_{k+1} = 1 − p_{k+1} ≤ c for some c ∈ (0, 1).

A.2. Proof of Theorem 2

Proof: For any ε > 0, let q_k = Pr{|f(x_k*) − f*| ≥ ε}. If ∃ j ∈ {0, 1, ..., k} such that x_j* ∈ D_0, then q_k = 0. If ∀ j ∈ {0, 1, ..., k}, x_j* ∉ D_0, we set q_k = P̄_k. By Lemma 2, we have

\bar{P}_k = P\{x_0^{*} \in D_1, x_1^{*} \in D_1, \ldots, x_k^{*} \in D_1\} \le c^{k}.

So

\sum_{k=1}^{\infty} P_k \le \sum_{k=1}^{\infty} c^{k} = \frac{c}{1-c} < \infty.

Then by Theorem 1, we get

\Pr\{\cap_{n=1}^{\infty} \cup_{k \ge n} [\,|f(x_k^{*}) - f^{*}| \ge \varepsilon\,]\} = 0.

According to Definition 2, we obtain the proof.
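For completeness, one way to evaluate the three tabu criteria above in code is sketched below. This is a reading in which the tabu region is the intersection of the three sets; the thresholds δ₁, δ₂, δ₃ are illustrative values, since the paper does not report the ones it uses, and the function name is ours.

```python
import numpy as np

def is_tabu_mts(y, x_k, f, delta1=1e-3, delta2=1e-6, delta3=1e-3):
    """Check the tabu criteria S1-S3 of Appendix A.1 for a candidate y against
    a recently visited solution x_k; f is the objective. Thresholds are
    illustrative. Returns True when y lies in all three neighbourhoods.
    """
    y, x_k = np.asarray(y, dtype=float), np.asarray(x_k, dtype=float)
    fy, fx = f(y), f(x_k)
    in_s1 = np.linalg.norm(x_k - y) < delta1
    in_s2 = abs(fx - fy) < delta2
    in_s3 = abs(fx - fy) / max(abs(fy), 1e-12) < delta3
    return in_s1 and in_s2 and in_s3
```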
