6D Dynamic Relocalization from Single Reference Image

Wei Feng, Fei-Peng Tian∗, Qian Zhang, Jizhou Sun
School of Computer Science and Technology, Tianjin University, Tianjin, China
Tianjin Key Laboratory of Cognitive Computing and Application, Tianjin University, Tianjin, China
{wfeng, tianfeipeng, qianz, jzsun}@tju.edu.cn

Abstract

Dynamic relocalization of 6D camera pose from a single reference image is a costly and challenging task that requires delicate hand-eye calibration and a precision positioning platform to perform 3D mechanical rotation and translation. In this paper, we show that high-quality camera relocalization can be achieved in a much less expensive way. Based on an inexpensive platform with unreliable absolute repositioning accuracy (ARA), we propose a hand-eye calibration free strategy to actively relocate the camera into the same 6D pose that produces the input reference image, by sequentially correcting the 3D relative rotation and translation. We theoretically prove that, by this strategy, both the rotational and translational relative pose can be effectively reduced to zero, with bounded unknown hand-eye pose displacement.

To conquer the 3D rotation and translation ambiguity, this theoretical strategy is further revised into a practical relocalization algorithm with faster convergence rate and more reliability by jointly adjusting the 3D relative rotation and translation. Extensive experiments validate the effectiveness and superior accuracy of the proposed approach on laboratory tests and challenging real-world applications.

Figure 1. Inexpensive single image DCR. (a) Coordinate systems of eye ⟨R_A, t_A⟩, hand ⟨R_B, t_B⟩, target camera pose ⟨R_A^ref, t_A^ref⟩, and the unknown hand-eye relative pose ⟨R_X, t_X⟩. (b) DCR dynamic convergence process. (c)–(d) Our DCR results for near-planar and nonplanar scenes. (e) Real minute changes (0.1mm level), occurred between June 2014 and July 2015, discovered by our DCR in Cave-465 of Dunhuang Mogao Grottoes. (f) HD panoramic image directly captured by a moving camera whose trajectory is controlled by our DCR. (g) Our inexpensive DCR platform. See text for more details.

1. Introduction

Camera pose registration and relocalization is an essential problem in computer vision and robotics, fundamentally supporting a number of important real-world applications, such as structure-from-motion (SfM) [22, 37, 16], 3D tracking and mapping [24, 17], monocular SLAM [2, 34], and scene change detection [7, 5].

Despite the diversity of previous successful methods in computer vision, most recent efforts on this topic have been focused on static camera registration and relocalization (SCR) [22, 30, 10]. That is, for one or a group of input images, SCR studies how to align their camera poses into a unified world coordinate system that may come from a
known 3D scene [10, 30, 15] or from 3D reconstruction using the input images [22, 16]. In SCR, the camera poses of the input images are fixed and cannot be actively readjusted.

∗Fei-Peng Tian is the corresponding author. Tel: (+86)-22-27406538.

Figure 2. Working flow of the proposed 6D dynamic camera relocalization: homography-based coarse relocalization, image capturing, camera relative pose estimation (5-point algorithm), mechanical rotation, and mechanical translation with bisection of the step s until s is small enough. See text for details.

In this paper, we treat camera pose relocalization as a dynamic process and study dynamic (or active) camera relocalization (DCR) from a single reference image. Our work was originally motivated by a real-world problem, minute change monitoring and measurement of ancient murals for preventive conservation. This is indeed a very challenging problem due to three major reasons.

1. High accuracy requirement. We want to find and measure very fine changes occurring on murals with complex image content and deterioration patterns.

2. Wild environment applicability. Both the equipment and the algorithm should work well in unrestricted environments, where both the unpleasant weather and the need for frequent disassembling/reassembling of the equipment to realize portability for different caves/spots can jeopardize hardware accuracy.

3. Long time interval. Relics usually change very slowly, thus we must precisely relocalize the 6D camera pose to capture the possible minute changes.

Note, there is NO mature solution satisfying all the above three requirements. For instance, single image DCR could be directly solved as a robotic visual servoing problem [3, 21, 39, 26]. However, high-precision robotics are not applicable to wild environments, cannot be frequently disassembled and reassembled, and their accuracy highly relies on hand-eye calibration [29, 14, 28]. Rephotography [1] cannot do this either, due to its much lower relocalization accuracy and its limited ability to physically handle only 3D relative translation. In contrast, as shown in Fig. 1 (c)–(e), the proposed approach guarantees to produce high-quality 6D DCR for both near-planar and nonplanar scenes. Fig. 1 also shows that our DCR has successfully discovered 0.1mm-level real mural changes occurring in Dunhuang Mogao Grottoes, which truly provides ice-breaking results for this important real-world problem and has great potential in other areas, such as online status monitoring of high-speed trains.

Our major contribution is three-fold: 1) hand-eye calibration free, which enables frequent equipment disassembling/reassembling without losing precision; 2) applicable to common platforms and wild environments; 3) theoretically guaranteed convergence and feasible boundary condition. To our best knowledge, it is the first solid hand-eye calibration free 6D DCR method in CV and robotics.

2. Related Work

Static camera relocalization (SCR). Many fundamental computer vision applications relate to camera pose registration and relocalization, such as sparse or dense 3D reconstruction [15, 17, 2, 22, 24, 27], monocular SLAM and camera tracking from RGB [39, 21, 34] or RGBD images [23, 8, 30, 10]. Basically, they share a common SCR problem. That is, they all want to align the camera poses of one or multiple input images within a unified world coordinate system, which can be defined by a (partially) known 3D scene (e.g., monocular SLAM [2, 34] and active scene scanning [23, 35]) or reconstructed via an SfM pipeline [15, 16].

Fast and reliable camera pose estimation is critical to SCR. The state-of-the-art method is the 5-point algorithm [25], due to its generality and superior accuracy [31]. Given the matched feature points [20] extracted from the input images, the 5-point algorithm can faithfully generate their relative rotation R and relative translation direction t̄.¹ Besides, ESM is also used in camera pose estimation [24, 27].

Visual servoing. Robotic visual servoing (VS), aiming to control the pose of the robot end-effector (hand) via visual feedback (eye) [26, 39, 21], is closely related to our work. In restricted environments, e.g., objects with markers or a planar moving assumption [21, 39], well-calibrated VS could be used to dynamically relocalize camera pose. However, as aforementioned, VS is not a cheap and reliable DCR worker, since its accuracy highly relies on hand-eye calibration or other restricted conditions.

Rephotography. Computational rephotography aims to recapture a photograph from the same viewpoint as a historical photograph. Previous work mainly focuses on the study of history [32], monitoring of the natural environment, such as glacier melting [9] and geological erosion [11], and ecological research [33]. These works highly rely on manual judgment, which makes them hard to operate, and their accuracy is not high. Recently, several computational rephotogra-

¹Unlike the relative rotation R, only the direction t̄ = t/‖t‖ of the relative translation t can be determined purely from images.
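Footnote 1 states that images alone determine the relative translation only up to scale. A minimal NumPy sketch (all poses and points are made-up toy values) verifying that the epipolar constraint x₂ᵀ E x₁ = 0, with E = [t]ₓ R, is unchanged when t is scaled, so only the direction t̄ is recoverable:

```python
import numpy as np

rng = np.random.default_rng(7)

def skew(v):
    """Cross-product matrix [v]_x so that skew(v) @ u == np.cross(v, u)."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Toy relative pose between two views: rotation about Y, translation t.
theta = 0.1
R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
              [0.0, 1.0, 0.0],
              [-np.sin(theta), 0.0, np.cos(theta)]])
t = np.array([0.3, -0.2, 0.1])

# Random 3D points in front of camera 1; x1, x2 are normalized image coords.
P1 = rng.uniform([-1, -1, 4], [1, 1, 8], size=(20, 3))
P2 = (R @ P1.T).T + t          # the same points in camera-2 coordinates
x1 = P1 / P1[:, 2:3]
x2 = P2 / P2[:, 2:3]

# The epipolar constraint holds for E = [t]_x R and for any scaled s*t,
# so two images fix only the direction of t, never its length.
for s in (1.0, 5.0):
    E = skew(s * t) @ R
    residual = np.abs(np.einsum('ni,ij,nj->n', x2, E, x1)).max()
    assert residual < 1e-9
```

This is why the step length sᵢ in the proposed DCR model must be guessed and corrected mechanically rather than estimated from the images.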

phy (CRP) methods [1, 18, 33] have been proposed and have improved the efficiency and accuracy. West et al. [33] present a linear-blending based rephotography method for the purpose of monitoring urban tree canopy. Bae et al. [1] propose an interactive and computational tool to guide users to reach the desired viewpoint, which can achieve about 0.5m physical relocalization accuracy. However, these existing methods rely on many user interactions and lack the accuracy to support minute change detection of high-value scenes.

Hand-eye calibration. Hand-eye calibration estimates the relative pose between the robot hand and eye coordinate systems by solving AX = XB, where X is the homogeneous representation of the relative pose ⟨R_X, t_X⟩ between the eye coordinate system ⟨R_A, t_A⟩ and the hand system ⟨R_B, t_B⟩ [29, 14, 28]. Generally, accurate hand-eye calibration involves nonlinear optimization in SO(3) and R³ [12, 13]. State-of-the-art methods apply a branch-and-bound strategy to search for the global optimum using labeled images with object markers in a restricted environment [29, 14]. In contrast, this paper gives a hand-eye calibration free approach to DCR and provides a theoretical upper bound of the hand-eye pose displacement, under which the relative rotational and translational pose can be certainly decreased to zero.

3. 6D Dynamic Camera Relocalization

Notation. A 3D rotation matrix R ∈ SO(3) can also be expressed by the axis-angle representation ⟨θ, ē⟩ and the quaternion (cos(θ/2), ē sin(θ/2)), where the unit vector ē is the invariant Euler axis of R and θ is the rotation magnitude about ē. In this paper, we use R ≃ ⟨θ, ē⟩ ≃ (cos(θ/2), ē sin(θ/2)) to indicate the equivalence of the three representations of a 3D rotation. Let ⟨R_X, t_X⟩ be the constant relative pose between the eye coordinate system ⟨R_A, t_A⟩ and the hand system ⟨R_B, t_B⟩, where R denotes the orientation and t the position of a coordinate system w.r.t. a fixed world coordinate system. As shown in Fig. 1, the input reference image defines the target camera pose ⟨R_A^ref, t_A^ref⟩. Our problem is to dynamically relocalize the current eye system ⟨R_A, t_A⟩ to the target pose ⟨R_A^ref, t_A^ref⟩ by properly moving the hand, with unknown ⟨R_X, t_X⟩. We use i to indicate the iteration number, thus ⟨R_A^i, t_A^i⟩ and ⟨R_B^i, t_B^i⟩ represent the current camera and hand poses after i iterations of movement, respectively.

3.1. Problem formulation

We want to realize hand-eye calibration free DCR on a common low-cost positioning platform. Compared to expensive high-precision platforms, whose absolute repositioning accuracy (ARA) and repetitive repositioning accuracy (RRA) are both reliable, proper low-cost platforms have unreliable ARA but trustworthy RRA.² Hence, we should make the best use of a repetitive moving strategy. Besides, our DCR model needs to overcome two realistic difficulties:

1. Difficulty 1: unknown hand-eye calibration;
2. Difficulty 2: only the direction t̄ of the relative camera translation can be obtained from images [25, 31].

We start from the ideal relations between the hand and eye coordinate systems during the DCR process. Let p ∈ R³ be an arbitrary 3D point in the world coordinate system, and let c and h denote its coordinates in the reference camera system and the corresponding target hand system, respectively. Similarly, c_i and h_i denote the coordinates of point p in the current camera and hand systems after i iterations of adjustment. Since the hand-eye relation is fixed and the 5-point algorithm can produce reliable camera pose, ideally we have

  c = R_A^i c_i + t_A^i,
  h = R_B^i h_i + t_B^i,
  c = R_X h + t_X,                    (1)
  c_i = R_X h_i + t_X.

Since we want hand-eye calibration free DCR (Difficulty 1), we need to "guess" the hand-eye relative pose, denoted by ⟨R̃_X, t̃_X⟩. Considering that camera pose adjustment is achieved by moving the hand, in practice we have

  c_{i+1} = R̃_A^i c_i + t̃_A^i,
  h_{i+1} = R̃_B^i h_i + t̃_B^i,      (2)

where ⟨R̃_A^i, t̃_A^i⟩ and ⟨R̃_B^i, t̃_B^i⟩ are the deviated pose adjustments caused by an inaccurate guess of ⟨R_X, t_X⟩.

Considering Difficulty 2, we cannot obtain the real t_A^i but only its orientation t̄_A^i from two consecutively captured images. Its length must also be "guessed" by s_i, thus yielding another approximation

  t̂_A^i = s_i t̄_A^i.                 (3)

Combining Eqs. (1)–(3) leads to

  c = R_A^{i+1} c_{i+1} + t_A^{i+1},
  R_A^{i+1} = R_A^i R_X^∗ (R_A^i)^{−1} (R_X^∗)^{−1},
  t_A^{i+1} = t_A^i + Δt_A^i,          (4)
  Δt_A^i = R_A^i t_X − R_A^i R_X^∗ t̃_X − R_A^i R_X^∗ (R_A^i)^{−1} q,
  q = t̂_A^i − t̃_X + (R_X^∗)^{−1} t_X,

where R_X^∗ = R̃_X^{−1} R_X. Eq. (4) is a general hand-eye calibration free DCR model that shows the recurrence relation of the camera pose between two adjacent adjustments. A feasible DCR process should guarantee to reduce the relative 3D rotational and translational pose displacement to zero rapidly. The major obstacle comes from the inaccurate guesses ⟨R̃_X, t̃_X⟩ and t̂_A^i, due to Difficulties 1 and 2. It is clear to see that, if the two guesses are correct, Eq. (4) trivially collapses to a one-step relocalization.

²An industrial-grade 6D miniature hexapod can cost more than 50,000 USD and requires a clean working environment. The repositioning system in our DCR platform costs less than 30,000 RMB.
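The one-step collapse of Eq. (4) under correct guesses can be checked numerically. Below is a toy NumPy sketch of the recurrence; the helper `rot()` and all numeric poses are made-up illustration values, not the paper's platform parameters:

```python
import numpy as np

def rot(axis, angle):
    """Rodrigues rotation matrix about a unit axis (toy helper)."""
    axis = np.asarray(axis, float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

# Current relative camera pose <R_A, t_A> and true hand-eye pose <R_X, t_X>.
R_A = rot([0, 1, 0], 0.4); t_A = np.array([2.0, -1.5, 1.0])
R_X = rot([1, 1, 1], 0.3); t_X = np.array([0.5, 0.2, -0.1])

def dcr_step(R_A, t_A, R_Xg, t_Xg, s):
    """One update of Eq. (4) with guessed hand-eye pose <R_Xg, t_Xg> and
    guessed step length s; the true <R_X, t_X> enters through R_X^*."""
    R_star = np.linalg.inv(R_Xg) @ R_X            # R_X^* = inv(guess) @ truth
    t_hat = s * t_A / np.linalg.norm(t_A)         # guessed translation s * t_bar
    q = t_hat - t_Xg + np.linalg.inv(R_star) @ t_X
    d = R_A @ t_X - R_A @ R_star @ t_Xg - R_A @ R_star @ np.linalg.inv(R_A) @ q
    R_next = R_A @ R_star @ np.linalg.inv(R_A) @ np.linalg.inv(R_star)
    return R_next, t_A + d

# With exact guesses (R_Xg = R_X, t_Xg = t_X, s = ||t_A||) the relative pose
# vanishes after a single adjustment, as remarked below Eq. (4).
R1, t1 = dcr_step(R_A, t_A, R_X, t_X, np.linalg.norm(t_A))
assert np.allclose(R1, np.eye(3)) and np.allclose(t1, 0.0)
```

With wrong guesses the same `dcr_step` keeps a residual pose, which is exactly what the iterative strategy of Sec. 3.2 is designed to drive to zero.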

3.2. An easy-to-understand strategy

Here we give an easy-to-understand strategy: first relocalize the 3D relative rotational pose to convergence via iterative adjustments using Eq. (5), then reduce the 3D relative translation to zero using the bisection-try method described in Theorem 2. The following two theorems theoretically guarantee that the influence of a bounded hand-eye relative pose displacement ⟨R_X, t_X⟩ can be simply ignored in DCR. Specifically, we can just guess R̃_X = I and t̃_X = 0. Accordingly, Eq. (4) can be simplified to

  R_A^{i+1} = R_A^i R_X (R_A^i)^{−1} R_X^{−1},                       (5)
  t_A^{i+1} = t_A^i − R_A^i R_X (R_A^i)^{−1} (t̂_A^i + R_X^{−1} t_X) + R_A^i t_X.   (6)

Theorem 1 (R_A convergence). By the rotation adjustment strategy defined by Eq. (5), if θ_X ≤ π/3, then θ^{i+1} ≤ θ^i and lim_{i→∞} θ^i = 0, where θ^i and ē^i are the angle and axis of R_A^i, and θ_X and ē_X are the angle and axis of R_X.

Proof. Using the quaternion representation, we have

  R_A^{i+1} ≃ (cos(θ^{i+1}/2), ē^{i+1} sin(θ^{i+1}/2)),
  R_A^i ≃ (cos(θ^i/2), ē^i sin(θ^i/2)),
  R_X ≃ (cos(θ_X/2), ē_X sin(θ_X/2)),                               (7)
  (R_A^i)^{−1} ≃ (cos(θ^i/2), −ē^i sin(θ^i/2)),
  R_X^{−1} ≃ (cos(θ_X/2), −ē_X sin(θ_X/2)).

From Eqs. (5) and (7), we have

  R_A^i R_X ≃ (cos(θ^i/2)cos(θ_X/2) − sin(θ^i/2)sin(θ_X/2)⟨ē^i, ē_X⟩,
    cos(θ^i/2)sin(θ_X/2) ē_X + sin(θ^i/2)cos(θ_X/2) ē^i + sin(θ^i/2)sin(θ_X/2) ē^i × ē_X),   (8)

  (R_A^i)^{−1} R_X^{−1} ≃ (cos(θ^i/2)cos(θ_X/2) − sin(θ^i/2)sin(θ_X/2)⟨ē^i, ē_X⟩,
    −cos(θ^i/2)sin(θ_X/2) ē_X − sin(θ^i/2)cos(θ_X/2) ē^i + sin(θ^i/2)sin(θ_X/2) ē^i × ē_X),  (9)

where ⟨ē^i, ē_X⟩ and ē^i × ē_X denote the inner and cross product of ē^i and ē_X, respectively. Combining Eqs. (5), (8) and (9) yields

  cos(θ^{i+1}/2) = cos²(θ^i/2) + (1 − 2sin²(θ_X/2)) sin²(θ^i/2)
                   + 2 sin²(θ^i/2) sin²(θ_X/2) ⟨ē^i, ē_X⟩²            (10)
                 ≥ cos²(θ^i/2) + cosθ_X sin²(θ^i/2).

Note, by the right-hand rule, θ^i ∈ [0, π] and θ_X ∈ [0, π]. Define f(θ^i) = cos²(θ^i/2) + cosθ_X sin²(θ^i/2) − cos(θ^i/2), which is an even function. We need to prove f(θ^i) ≥ 0 for θ^i ∈ [0, π]. Since f′(θ^i) = sin(θ^i/2)[cos(θ^i/2)(cosθ_X − 1) + 1/2], clearly, when θ_X ≤ π/3, f′(θ^i) ≥ 0. Hence, f(θ^i) is non-decreasing in [0, π]. Meanwhile, due to f(0) = 0 and the fact that f(θ^i) is an even function, we have f(θ^i) ≥ 0 in [−π, π], thus cos(θ^{i+1}/2) ≥ cos(θ^i/2) and θ^{i+1} ≤ θ^i.

Moreover, note that f′(θ^i) ≥ 0, with equality only at θ^i = 0. Therefore, θ^i will consistently be reduced to zero, i.e., lim_{i→∞} R_A^i = I, or equivalently lim_{i→∞} θ^i = 0.

Theorem 2 (t_A bisection-try convergence). When the eye relative rotation R_A^i has converged to I, with current step size s_i, we measure ⟨t̄_A^{i+1}, t̄_A^i⟩. If ⟨t̄_A^{i+1}, t̄_A^i⟩ ≥ 0, we guess the length of t_A^{i+1} by s_{i+1} = s_i; otherwise we do bisection s_{i+1} := s_i/2, and guess t̂_A^{i+1} = s_{i+1} t̄_A^{i+1}. Using this bisection-try method, just before the bisection, we strictly have ‖t_A^{i+1}‖ ≤ s_{i+1}. As s_i is monotonically reduced to zero, so is ‖t_A^{i+1}‖.

Proof. If the relative rotation has converged to I, Eq. (6) can be reduced to

  t_A^{i+1} = t_A^i − Δ^i = t_A^i − R_X t̂_A^i = t_A^i − R_X s_i t̄_A^i,   (11)

where s_i is the bisection step size before the i-th adjustment. By Rodrigues' rotation formula, we have

  R_X t_A^i = cosθ_X t_A^i + sinθ_X (ē_X × t_A^i) + (1 − cosθ_X)⟨ē_X, t_A^i⟩ ē_X.   (12)

Hence, ⟨R_X t_A^i, t_A^i⟩ ≥ 0, because

  ⟨R_X t_A^i, t_A^i⟩ = cosθ_X ‖t_A^i‖² + (1 − cosθ_X)⟨ē_X, t_A^i⟩² ≥ 0.   (13)

That is, ∠(t_A^i, R_X t_A^i) ≤ π/2, where ∠(a, b) denotes the angle between vectors a and b. Since ∠(Δ^i, R_X t_A^i) = 0, we have ∠(t_A^i, Δ^i) ≤ π/2 and 0 ≤ cos∠(t_A^i, Δ^i) ≤ 1.

Now, let us check the i-th iteration when the s_i bisection happens. We have ∠(t_A^{i+1}, t_A^i) ≥ π/2. This means

  ⟨t_A^i − Δ^i, t_A^i⟩ = ‖t_A^i‖² − ⟨Δ^i, t_A^i⟩
                       = ‖t_A^i‖² − ‖t_A^i‖ s_i cos∠(t_A^i, Δ^i) ≤ 0.   (14)

Hence, we have ‖t_A^i‖² ≤ ‖t_A^i‖ s_i cos∠(t_A^i, Δ^i) ≤ ‖t_A^i‖ s_i, which leads to ‖t_A^i‖ ≤ s_i.

Theorem 2 shows that, using the bisection-try method, t̂_A^i = s_i t̄_A^i can gradually approach t_A^i. In this case, Eqs. (6) and (11) can be further simplified to

  t_A^{i+1} = t_A^i − R_X t_A^i = (I − R_X) t_A^i.   (15)

Therefore, we have ‖t_A^{i+1}‖ ≤ ‖t_A^i‖ if and only if (iff) the magnitudes of the eigenvalues of I − R_X are not greater than 1. Clearly, since the eigenvalues of R_X are 1 and e^{±jθ_X}, the eigenvalues of I − R_X are 0 and 1 − cosθ_X ∓ j sinθ_X, whose magnitudes are not greater than 1 iff cosθ_X ≥ 1/2. That is, we need θ_X ≤ π/3 to guarantee ‖t_A^{i+1}‖ ≤ ‖t_A^i‖ in Eq. (15).

Theorems 1 and 2 theoretically prove that this easy-to-understand strategy leads to convergent DCR. Moreover, the bisection strategy to approach the relative camera translation takes advantage of the RRA of inexpensive positioning platforms, thus its real repetitive accuracy is reliable.
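The rotation update of Eq. (5) is a matrix commutator, so Theorem 1 can be checked numerically: with any θ_X < π/3 the angle of R_A shrinks monotonically. A small NumPy simulation (the `rot()` helper and the initial poses are made-up toy values):

```python
import numpy as np

def rot(axis, angle):
    """Rodrigues rotation matrix about a unit axis (toy helper)."""
    axis = np.asarray(axis, float)
    axis = axis / np.linalg.norm(axis)
    K = np.array([[0, -axis[2], axis[1]],
                  [axis[2], 0, -axis[0]],
                  [-axis[1], axis[0], 0]])
    return np.eye(3) + np.sin(angle) * K + (1 - np.cos(angle)) * (K @ K)

def angle_of(R):
    """Rotation angle recovered from the trace of R."""
    return np.arccos(np.clip((np.trace(R) - 1) / 2, -1.0, 1.0))

# Hand-eye rotation with theta_X = pi/4 < pi/3, and an initial relative rotation.
R_X = rot([1, 2, 0], np.pi / 4)
R_A = rot([0, 1, 1], 1.2)

angles = [angle_of(R_A)]
for _ in range(60):
    # Rotation update of Eq. (5): R_A <- R_A R_X R_A^{-1} R_X^{-1}
    # (for rotation matrices the transpose is the inverse).
    R_A = R_A @ R_X @ R_A.T @ R_X.T
    angles.append(angle_of(R_A))

# theta^i is non-increasing and converges to zero, as Theorem 1 predicts.
assert all(a2 <= a1 + 1e-6 for a1, a2 in zip(angles, angles[1:]))
assert angles[-1] < 1e-3
```

Note the update never needs R_X itself in practice; the platform applies it implicitly, which is exactly why the strategy is hand-eye calibration free.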

Figure 3. Convergence of R_A and t_A. (a) and (b) show the 3D rotation and translation convergence, respectively. Red, green and blue lines represent the X, Y and Z axes, respectively; the black lines indicate the angle θ (in radians) in (a) and the step size s in (b), over the iteration number.

3.3. The algorithm and implementation details

As shown in Fig. 1(a), there exist pair ambiguities between particular axes of rotation and translation, e.g., pitch and height movement. That is, with the existence of translational displacement, during the first rotational relocalization stage, the theoretical DCR strategy may not converge well, because in this case the 5-point algorithm is quite likely to generate an unreliably large rotation estimate that is actually caused by translation. To overcome this problem, based on Theorems 1 and 2, we propose a practical DCR algorithm that jointly adjusts both 3D rotation and translation in the process. The detailed working flow of the proposed DCR algorithm is shown in Fig. 2 and Algorithm 1. Besides, separately relocalizing the rotational and translational relative pose highly relies on the mechanical independence of the rotation and translation platforms. In practice, some translation axes movement may cause extra rotational displacement; even if the first stage of relative rotational pose relocalization does very well, subsequent translation axes movement may still jeopardize the final DCR accuracy. As verified by our extensive experiments, the rotation-translation joint relocalization algorithm constantly outperforms the sequential adjustment strategy in both convergence rate and accuracy.

Algorithm 1 Practical 6D dynamic camera relocalization
Input: I^ref, initial and stopping moving steps s_0 and s_min.
1: Initialization: initialize platform to zero position, s = s_0, ⟨R̃_X = I, t̃_X = 0⟩, i = 1, t̄_A^0 = 0;
2: Homography-based coarse camera relocalization to reduce the error into the range of the platform [7];
3: while s > s_min do
4:   Capture current image I_i and compute ⟨R_A^i, t̄_A^i⟩;
5:   Rotate platform by R̃_B^i = R̃_X^{−1} R_A^i R̃_X = R_A^i;
6:   if t̄_A^i · t̄_A^{i−1} < 0 then
7:     Bisection s = s/2;
8:   end if
9:   t̂_A^i = s t̄_A^i;
10:  Translate platform by t̃_B^i = t̂_A^i;
11:  i++;
12: end while
Output: DCR result I_{i−1}.

Specifically, since precision positioning platforms usually cannot have a large moving range, we first use the homography-based CR [7] to roughly relocate the camera to bring its relative pose within the range of the mechanical platforms. We then repeatedly capture the current image and calculate its 6D pose displacement ⟨R_A^i, t̄_A^i⟩. We directly rotate by R_A^i and check the direction consistency of t̄_A^i and t̄_A^{i−1}. If they are not in the same direction and s is not small enough, the last move has passed the objective position, i.e., s is too large and needs to be bisected, s ← s/2, before moving s t̄_A^i; otherwise, we directly move s t̄_A^i. In practice, we initialize s_0 as 1/5 of the full translation range of the platform. A larger or smaller s_0, if feasible, will only impact the convergence rate. Besides, to counteract inevitable mechanical gaps, for a single-axis movement along a particular direction with length L, we first go 1.1L along that direction and then return 0.1L inversely.

4. Experimental Results

Baselines. We compare our DCR with manual relocalization, homography-based relocalization [7] and computational rephotography [1, 33]. Note, manual relocalization just uses human visual judgment to manually steer the camera pose. For better performance, we do the manual relocalization very carefully in our experiments. The homography-based relocalization uses the reference image and the current image to generate a homography matrix that produces two navigation rectangles, guiding users to correct the relative camera pose [7].

Criterion. To evaluate the performance of camera relocalization, we present the feature-point displacement flow (FDF), a sparse field of feature-point displacement vectors, and the average feature-point displacement (AFD) to quantitatively measure the relocalization accuracy:

  AFD(P_ref, P_cur) = (1/n) Σ_{i=1}^{n} ‖P_ref^i − P_cur^i‖₂,   (16)

where P_ref and P_cur are the matched feature-point coordinates in the reference image and the current relocalized image, respectively, and n is the number of matches. In our experiments, we use the SIFT feature detector and descriptor, and robust matching with RANSAC correction.

4.1. Convergence and accuracy

Laboratory tests. Given the initialization of R_A, t_A and a fixed known R_X, t_X, we can simulate the proposed camera relocalization process in a computer. Fig. 3 demonstrates the rotation and translation convergence process of our DCR. The initial setup is as follows: the initial angles of R_A, the initial t_A, the angles of R_X, and t_X are (π/4, π/5, −π/4), (20, −15, 15), (π/8, π/8, π/8) and (15, 10, 10), respectively.
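The AFD criterion of Eq. (16) is simply the mean length of the matched-point displacement vectors. A minimal NumPy sketch, using hypothetical toy matches in place of the paper's SIFT+RANSAC correspondences:

```python
import numpy as np

def afd(p_ref, p_cur):
    """Average feature-point displacement, Eq. (16): mean L2 distance
    between matched feature-point coordinates in the reference image
    and the relocalized image."""
    p_ref = np.asarray(p_ref, float)
    p_cur = np.asarray(p_cur, float)
    return np.linalg.norm(p_ref - p_cur, axis=1).mean()

# Toy matches: the relocalized image is shifted by (3, 4) pixels, so every
# displacement vector has length 5 and the AFD is exactly 5 pixels.
p_ref = np.array([[10.0, 20.0], [30.0, 40.0], [55.0, 5.0]])
p_cur = p_ref + np.array([3.0, 4.0])
assert np.isclose(afd(p_ref, p_cur), 5.0)
```

The FDF is the field of these per-match displacement vectors (`p_cur - p_ref`); AFD is its scalar summary, reported in pixels throughout the experiments.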

Fig. 3 shows a typical DCR convergence process, which clearly shows that t_A is bounded by the step size s during the whole DCR procedure. With the decrease of the step size s, t_A descends accordingly. In contrast, the angle θ of R_A is monotonically decreasing to zero. Fig. 3 indeed validates the theoretical convergence of our DCR.

Table 1. Average number of iterations and AFD over 10 DCR tests of the theoretical strategy and the proposed algorithm.

                          #iterations    AFD
  Theoretical strategy       14.4       13.37
  The proposed algorithm     10.4        0.85

Convergence rate comparison of two strategies. Here, we provide a detailed comparison of the convergence rate and accuracy of the two proposed DCR strategies. Table 1 compares the AFD and iteration number in 10 independent DCR tests using the two strategies. In Fig. 4, we visually compare the DCR accuracy of the two strategies. Note, the selected scene in Fig. 4 has the minimum AFD value among the 10 tests using the theoretical strategy.

Figure 4. A visual comparison of DCR using the theoretical strategy (AFD = 13.0032) and the proposed algorithm (AFD = 0.4825) on a near-planar scene.

From Table 1 and Fig. 4, we can clearly see that the proposed algorithm consistently converges faster than the theoretical strategy, and produces much better accuracy. This is because separately relocalizing the 3D rotational and translational relative pose highly relies on the mechanical independence of the rotation and translation platforms. In practice, some translation axes movement may cause extra rotational displacement, even after the first stage of relative rotational pose relocalization, which may inevitably jeopardize the final DCR accuracy.

DCR accuracy comparison. Physically measuring the relocalization error is an important and direct criterion to evaluate relocalization accuracy. However, ground truth is hard to obtain in practice. To this end, we establish scene-related FDF/AFD rulers by densely sampling the platform positions and orientations in 6 DoFs, with two adjacent samplings having only 0.1mm positional displacement or 0.01 degree rotation shift. Via such rulers, for a particular scene, we can quantitatively evaluate physically meaningful 6D DCR accuracy with a radar chart.

Figure 5. DCR accuracy measurement for two scenes: (a) near-planar scene, (b) nonplanar scene. Radar charts over R_Ax, R_Ay, R_Az (in 0.01°) and t_Ax, t_Ay, t_Az (in 0.1mm), comparing manual, homography-based and our relocalization.

Fig. 5 shows the physical DCR accuracy measurement on two different scenes, including one near-planar scene and one nonplanar scene. Table 2 gives the detailed statistics. From Fig. 5 and Table 2, we can clearly see that our DCR algorithm reaches subpixel accuracy and always significantly outperforms the baseline manual and homography-based relocalization methods. We also observe that, in all tested scenes, the physical deviations in the Z axis for both rotation and translation are constantly larger than in other axes. This is mainly attributed to the particular configuration and FoV of the camera. That is, in our camera configuration, a relatively larger movement and rotation in the Z axis is required to cause a displacement in the image plane comparable to that of the translation and rotation in other axes.

Table 2. DCR accuracy statistics for two scenes. Units of measurement are 1 degree for rotation and 1mm for translation.

  Near-planar   R_Ax   R_Ay   R_Az   t_Ax   t_Ay   t_Az    AFD
  manual        0.58   0.56   1.48   7.8    8.2    27.2   20.18
  homography    0.22   0.24   0.60   2.8    3.0    12.0    8.82
  ours          0.01   0.02   0.01   0.1    0.1     0.2    0.26

  Nonplanar     R_Ax   R_Ay   R_Az   t_Ax   t_Ay   t_Az    AFD
  manual        0.24   0.22   0.56   3.5    3.6    13.5    6.98
  homography    0.13   0.13   0.34   2.0    2.0     8.0    6.01
  ours          0.03   0.02   0.03   0.3    0.3     1.3    0.81

Table 3. Comparison of physical camera relocalization error.

                           CRP [1]    Ours      Ratio
  Translation error (m)     1.135     0.00038   2987
  Rotation error (°)        NA        0.02      NA

We also compare the proposed DCR with a state-of-the-art computational rephotography (CRP) tool, rephoto [33]. Fig. 6 shows the comparative results in two outdoor scenes. Table 3 gives a comparison of the physical camera relocalization error of our DCR and another CRP method [1]. It is clear that, compared to CRP, the proposed DCR has much higher physical relocalization accuracy.

4.2. Active panorama acquisition

With the proposed DCR, we can realize active acquisition of high-quality panoramas. Fig. 7 illustrates the detailed working flow of active HD panorama acquisition by the proposed DCR. Intuitively, we repeatedly relocalize the "left" part of the current camera FoV to the same 6D pose as the reference image, which is the "right" part of the last captured image. With high-quality camera calibration and relocalization, this process guarantees to produce

Figure 6. Accuracy of our DCR compared with 2 state-of-the-art competitors on two scenes: Rephoto [33] (AFD = 16.5952 and 30.6004) and homography-based coarse camera relocalization [7] (AFD = 3.9990 and 9.9712); ours achieves AFD = 0.8892 and 0.7131.

Figure 7. Working flow of DCR-based active panorama acquisition. The detailed DCR process in the red block is shown in Fig. 2.

seamless panoramic images, by direct image stitching, with theoretically "unlimited" resolution, as such a panorama can be actively captured as long and big as possible. Note, to guarantee the stitching quality, we suggest the ratio of the common part of two successive images be not less than 30%. As shown in Fig. 8, compared to a state-of-the-art image stitching method like APAP [36], the proposed DCR-based active panorama acquisition is able to generate warping and artifact free panoramic images, even for highly-structured scenes taken at very close distance. Fig. 1 (f) shows an HD panorama of an ancient mural acquired by our approach.

4.3. Minute change monitoring of ancient murals

Another promising application of the proposed DCR is long-time-interval minute change monitoring of ancient Mogao murals, which, as discussed in Sec. 1, is an open real-world problem. Although ancient Mogao murals have been seriously protected, they still suffer from many types of deterioration caused by various environmental and human influences. As a result, their status is constantly changing at a very low speed. It is critically important to provide a feasible imaging method, through which fine-grained changes become visually apparent, so that timely detection and accurate measurement of such minute changes can trigger and support multi-disciplinary protective preservation. However, the current fact is that no suitable method and equipment can be used for accurate minute change imaging within the unrestricted environment in Dunhuang Mogao Grottoes.

Using our inexpensive DCR platform shown in Fig. 1(g), we selected 46 monitoring spots (very small regions) from 11 real Dunhuang caves and conducted two rounds of DCR-based image capturing, in June 2014 and July 2015, respectively. The proposed approach and platform have successfully discovered 0.1mm-level minute changes (measured by close-range photogrammetry) in 31 monitoring spots (67%). This is truly a breakthrough, considering that the practical state-of-the-art way is simply naked-eye observation by experts, which can only support object-level change detection over a 100-year period. Some realistic yearly DCR results and the corresponding changes derived by image-difference-aided human labeling are shown in Fig. 9. Our yearly monitoring data also find that caves open for tourists have a 14 times faster deterioration speed than closed caves. This, for the first time, provides real data evidence about the influence of tourism on cultural heritages over a 1-year period.

5. Conclusion

In this paper, we have proposed an inexpensive 6D dynamic camera relocalization approach. We theoretically

prove that both the 3D rotational and translational pose displacement can be effectively reduced to zero based on a low-cost RRA-reliable platform, without hand-eye calibration. We also find that, to make the proposed strategy work, the feasible upper bound of the hand-eye 3D orientation displacement angle should be less than π/3. Extensive tests validate the effectiveness and accuracy of our approach. More importantly, we demonstrate the promising applications of our approach in solving two challenging real-world problems: 1) active acquisition of seamless high-definition panoramic images; and 2) 0.1mm-level minute deterioration monitoring of very-slowly-changing ancient murals in Dunhuang Mogao Grottoes. Our work indeed shows the great potential of combining algorithms with common hardware in active vision. In the near future, we plan to further accelerate our DCR approach based on reliable superpixel segmentation and matching [4, 19, 6, 38] and apply it to solve more real-world change monitoring problems. Besides, we want to extend the proposed DCR model to faithfully relocalize the hand coordinate system from multiple images. We are also interested in dynamic lighting recurrence from a single image.

Figure 8. Comparison of panorama acquisition using the state-of-the-art image stitching method APAP [36] and the proposed approach.

Figure 9. Real minute deteriorations of Dunhuang Mogao murals discovered by the proposed DCR between June 2014 and July 2015 (relocalization AFDs of 1.0414, 1.5686, 0.8266 and 0.7854; labeled change areas ranging from 0.0932 mm² to 15.6322 mm²).

Acknowledgements. We thank Xudong Wang, Bomin Su, Mingming Wang, Gangquan Chen, Qinglin Guo, Xiaowei Wang, Bolong Chai, Shujun Ding, Shengli Sun of Dunhuang Academy China for valuable discussions on slow-and-minute change monitoring of ancient murals. We thank Jiliang Sun, Jiawan Zhang, Qifeng Yue, Nan Zhang, Rui Huang, Yifeng Zhang, Dongrui Xiao for their contributions to our platform development and onsite data collection. This work is supported by the National Science and Technology Support Project (2013BAK01B01, 2013BAK01B05) and NSFC (61572354, 61272266).

References
[1] S. Bae, A. Agarwala, and F. Durand. Computational rephotography. ACM TOG, 29(3):1–15, 2010.
[2] A. J. Davison. Real-time simultaneous localization and mapping with a single camera. In ICCV, 2003.
[3] Y. Fang, W. E. Dixon, D. M. Dawson, and P. Chawda. Homography-based visual servo regulation of mobile robots. IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, 35(5):1041–1050, 2005.
[4] W. Feng, J. Jia, and Z.-Q. Liu. Self-validated labeling of Markov random fields for image segmentation. IEEE TPAMI, 32(10):1871–1887, 2010.
[5] W. Feng and Z.-Q. Liu. Region-level image authentication using Bayesian structural content abstraction. IEEE TIP, 17(12):2413–2424, 2008.
[6] W. Feng, Z.-Q. Liu, L. Wan, C.-M. Pun, and J. Jiang. A spectral-multiplicity-tolerant approach to robust graph matching. Pattern Recognition, 46(10):2819–2829, 2013.
[7] W. Feng, F.-P. Tian, Q. Zhang, N. Zhang, L. Wan, and J. Sun. Fine-grained change detection of misaligned scenes with varied illuminations. In ICCV, 2015.
[8] B. Glocker, S. Izadi, J. Shotton, and A. Criminisi. Real-time RGB-D camera relocalization. In ISMAR, 2013.
[9] D. Guggenheim, A. Gore, L. Bender, S. Z. Burns, and L. David. An inconvenient truth. Documentary, Paramount Classics, Hollywood, CA, 2006.
[10] A. Guzman-Rivera, P. Kohli, B. Glocker, J. Shotton, T. Sharp, A. Fitzgibbon, and S. Izadi. Multi-output learning for camera relocalization. In CVPR, 2014.
[11] F. C. Hall. Photo point monitoring handbook: Part A, field procedures. USDA Forest Service, 2002.
[12] R. Hartley and F. Kahl. Global optimization through rotation space search. IJCV, 82(1):64–79, 2009.
[13] R. Hartley, J. Trumpf, Y. Dai, and H. Li. Rotation averaging. IJCV, 103(3):267–305, 2013.
[14] J. Heller, M. Havlena, and T. Pajdla. A branch-and-bound algorithm for globally optimal hand-eye calibration. In CVPR, 2012.
[15] A. Irschara, C. Zach, J.-M. Frahm, and H. Bischof. From structure-from-motion point clouds to fast location recognition. In CVPR, 2009.
[16] N. Jiang, Z. Cui, and P. Tan. A global linear method for camera pose registration. In ICCV, 2013.
[17] G. Klein and D. Murray. Parallel tracking and mapping for small AR workspaces. In ISMAR, 2007.
[18] K.-T. Lee, S.-J. Luo, and B.-Y. Chen. Rephotography using image collections. In Computer Graphics Forum, 2011.
[19] L. Li, W. Feng, L. Wan, and J. Zhang. Maximum cohesive grid of superpixels for fast object localization. In CVPR, 2013.
[20] D. G. Lowe. Distinctive image features from scale-invariant keypoints. IJCV, 60(2):91–110, 2004.
[21] G. L. Mariottini, G. Oriolo, and D. Prattichizzo. Image-based visual servoing for nonholonomic mobile robots using epipolar geometry. IEEE Transactions on Robotics, 23(1):87–100, 2007.
[22] R. A. Newcombe and A. J. Davison. Live dense reconstruction with a single moving camera. In CVPR, 2010.
[23] R. A. Newcombe, S. Izadi, O. Hilliges, D. Molyneaux, D. Kim, A. J. Davison, P. Kohli, J. Shotton, S. Hodges, and A. Fitzgibbon. KinectFusion: Real-time dense surface mapping and tracking. In ISMAR, 2011.
[24] R. A. Newcombe, S. J. Lovegrove, and A. J. Davison. DTAM: Dense tracking and mapping in real-time. In ICCV, 2011.
[25] D. Nistér. An efficient solution to the five-point relative pose problem. IEEE TPAMI, 26(6):756–770, 2004.
[26] D.-H. Park, J.-H. Kwon, and I.-J. Ha. Novel position-based visual servoing approach to robust global stability under field-of-view constraint. IEEE Transactions on Industrial Electronics, 59(12):4735–4752, 2012.
[27] V. Pradeep, C. Rhemann, S. Izadi, C. Zach, M. Bleyer, and S. Bathiche. MonoFusion: Real-time 3D reconstruction of small scenes with a single web camera. In ISMAR, 2013.
[28] T. Ruland, T. Pajdla, and L. Krüger. Globally optimal hand-eye calibration. In CVPR, 2012.
[29] Y. Seo, Y.-J. Choi, and S. W. Lee. A branch-and-bound algorithm for globally optimal calibration of a camera-and-rotation-sensor system. In ICCV, 2009.
[30] J. Shotton, B. Glocker, C. Zach, S. Izadi, A. Criminisi, and A. Fitzgibbon. Scene coordinate regression forests for camera relocalization in RGB-D images. In CVPR, 2013.
[31] H. Stewénius, C. Engels, and D. Nistér. Recent developments on direct relative orientation. ISPRS Journal of Photogrammetry and Remote Sensing, 60(4):284–294, 2006.
[32] J. E. Wells II. Western landscapes, western images: a rephotography of US Highway 89. PhD thesis, Kansas State University, 2012.
[33] R. West, A. Halley, D. Gordon, J. O'Neil-Dunne, and R. Pless. Collaborative rephotography. In ACM SIGGRAPH Studio Talks, 2013.
[34] B. Williams, G. Klein, and I. Reid. Automatic relocalization and loop closing for real-time monocular SLAM. IEEE TPAMI, 33(9):1699–1712, 2011.
[35] K. Xu, H. Huang, H. Li, P. Long, J. Caichen, W. Sun, and B. Chen. Autoscanning for coupled scene reconstruction and proactive object analysis. In ACM SIGGRAPH Asia, 2015.
[36] J. Zaragoza, T.-J. Chin, Q.-H. Tran, M. S. Brown, and D. Suter. As-projective-as-possible image stitching with moving DLT. IEEE TPAMI, 36(7):1285–1298, 2014.
[37] G. Zhang, J. Jia, T.-T. Wong, and H. Bao. Consistent depth maps recovery from a video sequence. IEEE TPAMI, 31(6):974–988, 2009.
[38] S. Zhang, W. Feng, J. Zhang, and C.-M. Pun. Bag of squares: A reliable model of measuring superpixel similarity. In ICME, 2014.
[39] X. Zhang, Y. Fang, and X. Liu. Motion-estimation-based visual servoing of nonholonomic mobile robots. IEEE Transactions on Robotics, 27(6):1167–1175, 2011.
