MITSUBISHI ELECTRIC RESEARCH LABORATORIES http://www.merl.com

A Theory of Multi-Layer Flat Refractive Geometry

Agrawal, A.; Ramalingam, S.; Taguchi, Y.; Chari, V.

TR2012-047 June 2012

Abstract: Flat refractive geometry corresponds to a perspective camera looking through single/multiple parallel flat refractive mediums. We show that the underlying geometry of rays corresponds to an axial camera. This realization, while missing from previous works, leads us to develop a general theory of calibrating such systems using 2D-3D correspondences. The pose of 3D points is assumed to be unknown and is also recovered. Calibration can be done even using a single image of a plane. We show that the unknown orientation of the refracting layers corresponds to the underlying axis, and can be obtained independently of the number of layers, their distances from the camera and their refractive indices. Interestingly, the axis estimation can be mapped to the classical essential matrix computation and the 5-point algorithm [17] can be used. After computing the axis, the thicknesses of layers can be obtained linearly when refractive indices are known, and we derive analytical solutions when they are unknown. We also derive the analytical forward projection (AFP) equations to compute the projection of a 3D point via multiple flat refractions, which allows non-linear refinement by minimizing the reprojection error. For two refractions, AFP is either a 4th or a 12th degree equation depending on the refractive indices. We analyze ambiguities due to small field of view, stability under noise, and show how a two layer system can be well approximated as a single layer system. Real experiments using a water tank validate our theory.

IEEE Conference on Computer Vision and Pattern Recognition

This work may not be copied or reproduced in whole or in part for any commercial purpose. Permission to copy in whole or in part without payment of fee is granted for nonprofit educational and research purposes provided that all such whole or partial copies include the following: a notice that such copying is by permission of Mitsubishi Electric Research Laboratories, Inc.; an acknowledgment of the authors and individual contributions to the work; and all applicable portions of the copyright notice. Copying, reproduction, or republishing for any other purpose shall require a license with payment of fee to Mitsubishi Electric Research Laboratories, Inc. All rights reserved.

Copyright © Mitsubishi Electric Research Laboratories, Inc., 2012
201 Broadway, Cambridge, Massachusetts 02139

A Theory of Multi-Layer Flat Refractive Geometry

Amit Agrawal, Srikumar Ramalingam, Yuichi Taguchi (Mitsubishi Electric Research Labs (MERL), [agrawal,ramalingam,taguchi] at merl.com) and Visesh Chari (INRIA Rhône-Alpes, visesh.chari at inrialpes.fr)

Abstract

Flat refractive geometry corresponds to a perspective camera looking through single/multiple parallel flat refractive mediums. We show that the underlying geometry of rays corresponds to an axial camera. This realization, while missing from previous works, leads us to develop a general theory of calibrating such systems using 2D-3D correspondences. The pose of 3D points is assumed to be unknown and is also recovered. Calibration can be done even using a single image of a plane.

We show that the unknown orientation of the refracting layers corresponds to the underlying axis, and can be obtained independently of the number of layers, their distances and their refractive indices. Interestingly, the axis estimation can be mapped to the classical essential matrix computation, and the 5-point algorithm [17] can be used. After computing the axis, the thicknesses of layers can be obtained linearly when refractive indices are known, and we derive analytical solutions when they are unknown. We also derive the forward projection (FP) equations to analytically compute the projection of a 3D point via multiple flat refractions, which allows non-linear refinement by minimizing the reprojection error. For two refractions, FP is either a 4th or a 12th degree equation depending on the refractive indices. We analyze ambiguities due to small field of view, stability under noise, and show how two layer systems can be well approximated using a single layer. Real experiments using a water tank validate our theory.

Figure 1. (Left) Flat refractive geometry with n layers: the camera ray v_0 refracts at the flat surfaces into rays v_1, ..., v_n at refraction points q_1, ..., q_n; d_i and µ_i denote the layer thicknesses and refractive indices, and [R, t] is the unknown pose of the object points. (Middle) The entire light-path for each pixel lies on a plane, and all such planes intersect in a common axis passing through the camera center. (Right) After computing the axis, analysis can be done on the plane of refraction to estimate layer thicknesses and refractive indices.

1. Introduction

A camera observing a scene through multiple refractive planes (e.g. underwater imaging) results in distortions and gives the illusion of the scene being closer and magnified. While 3D reconstruction in such scenarios has been analyzed in multi-media photogrammetry [11, 21, 19], such imaging setups have been relatively unaddressed in the computer vision community until recently. Calibrating such a system with multiple layers with unknown layer orientation, distances and refractive indices remains an open and challenging problem.

The fact that such systems do not correspond to a single viewpoint system is known (see, for example, [27]). However, we show that the underlying geometry of rays in such systems actually corresponds to an axial camera. This realization, which has been missing from previous works to the best of our knowledge, allows us to handle multiple layers in a unified way and results in practical and robust algorithms. Firstly, we show that the unknown orientation of the refracting layers corresponds to the underlying axis, which can be estimated independently of the number of layers, their distances and their refractive indices. This results in considerable simplification of the calibration problem via a two-step process, where the axis is computed first. Without such simplification, the calibration is difficult to achieve. Secondly, we show that the axis estimation can be mapped to the classical relative orientation problem (essential matrix estimation) for which excellent solutions (e.g., the 5-point algorithm [17]) already exist. In fact, calibration can be done using a single plane, since the essential matrix can be estimated from a planar scene [17, 14]. Our primary contributions are as follows.

• We show that the geometry of rays in flat refraction systems corresponds to an axial camera, leading to a unified theory for calibrating such systems with multiple layers.
• By demonstrating the equivalence with classical essential matrix estimation, we propose efficient and robust algorithms for calibration using planar as well as non-planar objects.
• We derive analytical forward projection equations for multiple refractions, which allows minimizing the reprojection error.
• We analyze ambiguities with small FOV, and show that multiple layer systems may be well-approximated by single/two-layer systems.

1.1. Related Work

Maas [15] considered a three layer system assuming that the image plane is parallel to the refractive interfaces. His approach corrects for the radial shift of the projected 3D points using optimization. Treibitz et al. [27] considered a single refraction with known refractive index in an underwater imaging scenario. They assume the distance of the interface as the single unknown parameter (when the camera is internally calibrated) and perform calibration using the known depth of a planar checkerboard. The image plane is parallel to the interface in their setup as well. In contrast, we do not assume that the refractive interfaces are fronto-parallel. We handle multiple layers with unknown layer distances, consider known/unknown refractive indices and do not assume prior information about the pose of the calibrating object. We only assume that the camera is internally calibrated a-priori.

3D reconstruction under reflections/refractions has been explored in [4, 16, 20, 2, 23] either for reconstructing the scene or the medium itself. Chen et al. [4] used a direct view and an indirect view by looking through a thick glass slab. However, the direct view is required to estimate the orientation of the slab and an additional view to obtain the refractive index. We show that a single set of 2D-3D correspondences allows estimating the medium thickness as well as the refractive index without requiring the direct view. Other works assume a known vertical direction [2] or require several images for calibration [20]. Steger and Kutulakos [23] showed that light-path triangulation becomes degenerate when the entire light-path lies on a plane, which is the case here. Their goal is to compute the shape of the refractive medium, and thus they consider each light-path independently. In contrast, we have partial knowledge of the shape (flat layers), and thus light-paths can be parameterized using calibration parameters. Thus, we can use information from multiple light-paths to compute the calibration parameters. For two refractions (air-medium-air), our analysis is consistent with [22] in that the distance to the medium cannot be estimated. However, we show theoretically that if all refractive indices are different, light paths are not degenerate for any number of layers.

Refractions have also been modeled using ray-tracing [12, 13] for calibration. Kotowski [12] proposed a bundle-triangulation framework where the points of refraction are computed iteratively, starting with a central approximation. In contrast, we derive analytical equations for computing the projection of a 3D point via multiple refractions (FP). This allows non-linear refinement of the initial solution by minimizing the reprojection error. For a single refraction, FP is a 4th degree equation [7]. For two refractions, we derive a 4th/12th degree equation depending on the refractive indices.

Non-Central/Axial Cameras: Chari and Sturm [3] showed the existence of geometric entities such as the fundamental matrix considering refraction. Generic camera calibration algorithms [24] have been proposed to calibrate non-central cameras. Ramalingam et al. [18] proposed a general framework for calibrating axial cameras using three checkerboards. Their parameterization does not use camera-side rays and involves two rotations/translations. In contrast, we parameterize the axis to pass through the camera and use camera-side rays, allowing calibration from a single plane. However, we use their insight that the axis can be computed first. Models based on radial distortion for calibration assume a known center of distortion [26] or model each distortion circle separately [25]. We use a global model with fewer parameters for flat refractive systems. Interestingly, axis estimation in our case is similar to the center of distortion estimation for central cameras in [9].
2. General Flat Refractive Geometry

Consider the general setup for flat refractive geometry as shown in Figure 1, where a perspective camera observes a known calibration object via n flat refraction layers. We work in the camera coordinate system with the camera center at the origin. Let n denote the common surface normal to all the layers and [d_i, µ_i] be the thickness and refractive index of the i-th medium respectively. d_0 represents the distance between the camera and the first layer. Let {P(i)}, i = 1..K, denote K 3D points on the object which are known in the object coordinate system and let [R, t] be the unknown rigid transformation of these points.

2.1. Flat Refraction Constraint

We assume that the internal camera calibration has been done offline and hence we know the camera ray v_0(i) for each 3D point P(i). Let [v_0(i), v_1(i), ..., v_n(i)] denote the corresponding light-path. The last refracted ray v_n should be parallel to the line joining the transformed 3D point RP + t and the refraction point q_n on the last layer. Thus, the following Flat Refraction Constraint (FRC) should be satisfied:

  FRC: (RP + t − q_n) × v_n = 0,    (1)

where × denotes the cross-product. Our goal is to estimate the unknown calibration parameters n, [d_i] for i = 0..n−1 and [µ_i] for i = 0..n, as well as the unknown pose [R, t], given K 2D-3D correspondences [v_0(i), P(i)], i = 1..K.

From Snell's law, µ_i sin θ_i = µ_{i+1} sin θ_{i+1}, where θ_i is the angle between v_i and n. This can be written in vector form [8] as

  v_{i+1} = a_{i+1} v_i + b_{i+1} n,    (2)

where a_{i+1} = µ_i / µ_{i+1} and

  b_{i+1} = ( −µ_i v_i^T n − sqrt( µ_i^2 (v_i^T n)^2 − (µ_i^2 − µ_{i+1}^2) v_i^T v_i ) ) / µ_{i+1}.    (3)

Since Snell's law only depends on the ratio of the refractive indices, we assume µ_0 = 1 without loss of generality. We first derive the FRC for a single layer and a single 3D point P. The refraction point q_1 is given by q_1 = −d_0 v_0 / (v_0^T n). Substituting in (1),

  (RP + t) × (a_1 v_0 + b_1 n) + b_1 d_0 (v_0 × n) / (v_0^T n) = 0.    (4)

After substituting for a_1 and b_1 and removing the square root term in b_1, we get an equation with second order terms of R, t and µ_i and sixth order terms of n. Thus, directly solving the FRC is quite difficult. More importantly, the complexity of the FRC equation increases with each additional layer due to the square root term in each b_i. Thus, in order to solve this problem efficiently, we need to analyze the geometry of the underlying rays to derive simpler constraints. In the next section, we describe coplanarity constraints that allow us to estimate the normal n and 5 out of 6 pose parameters independently of the d_i's and µ_i's.
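The vector form (2)-(3) is all that is needed to trace a pixel's light-path once the layer parameters are known. Below is a minimal NumPy sketch (our illustration, not code from the paper; the function names are hypothetical) that refracts a camera ray through n parallel flat layers.

```python
import numpy as np

def refract(v, n, mu_i, mu_j):
    """Vector form of Snell's law, Eqs. (2)-(3): returns a*v + b*n.

    v is the incoming ray (need not be unit length); n is the unit layer
    normal pointing towards the incoming side, so that np.dot(v, n) < 0."""
    a = mu_i / mu_j
    vn = np.dot(v, n)
    disc = mu_i**2 * vn**2 - (mu_i**2 - mu_j**2) * np.dot(v, v)
    b = (-mu_i * vn - np.sqrt(disc)) / mu_j
    return a * v + b * n

def trace_light_path(v0, n, d, mu):
    """Trace a camera ray v0 (camera centre at the origin) through len(d)
    flat layers: d[i] is the distance travelled along the normal inside
    medium i (d[0] = camera-to-first-layer distance) and mu[i] is the
    refractive index of medium i, with len(mu) == len(d) + 1.
    Returns the ray directions [v0, ..., vn] and refraction points [q1, ..., qn]."""
    rays, points = [np.asarray(v0, dtype=float)], []
    q = np.zeros(3)
    for i, di in enumerate(d):
        v = rays[-1]
        q = q + v * (di / abs(np.dot(v, n)))      # next refraction point q_{i+1}
        points.append(q)
        rays.append(refract(v, n, mu[i], mu[i + 1]))
    return rays, points

# Toy usage: a fronto-parallel slab in air (Case 2 of the paper).
n = np.array([0.0, 0.0, -1.0])                    # unit normal pointing towards the camera
rays, pts = trace_light_path([0.2, 0.1, 1.0], n, d=[300.0, 450.0], mu=[1.0, 1.5, 1.0])
```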

3. Coplanarity Constraints

Axial Camera: We first show that an n-layer flat refraction system corresponds to an axial (non-central) camera. We define the axis A as the line parallel to n passing through the camera center. Let π be the plane of refraction (POR) containing the axis A and a given camera ray v_0. The normal n lies on π. From Snell's law, the incoming ray, the normal and the refracted ray lie on the same plane at any refraction boundary. Hence v_1 should lie on π. By induction, the entire light-path should lie on π. Thus, the last refracted ray v_n should intersect the axis A, since both of them are coplanar. Thus, all outgoing rays intersect the axis and the system is axial.

Since v_n lies on π, the transformed 3D point RP + t should also lie on π. Thus, the coplanarity constraint for each 3D point can be written as

  Coplanarity: (RP + t)^T (A × v_0) = 0,    (5)

where (A × v_0) is the normal to the POR. Note that the coplanarity constraint is independent of the number of layers n, their thicknesses d_i, and the refractive indices µ_i. It only depends on the axis and pose parameters.

3.1. Axis Computation

Let X(:) be the vector formed by stacking the columns of a matrix X and let ⊗ denote the Kronecker product. Let [A]× be the 3 × 3 skew-symmetric matrix obtained from the 3-vector A. The constraint can be re-written as

  0 = v_0^T (A × (RP + t)) = v_0^T E P + v_0^T s,    (6)

where E = [A]× R and s = A × t. Note that s^T A = 0 and thus the full translation t cannot be estimated using coplanarity constraints. The component of t in the direction of the axis, t_A, vanishes in s. Thus, we have 7 degrees of freedom: 2 for axis, 3 for rotation and 2 for translation.

11-point Linear Algorithm: Stacking equations for 11 correspondences, we get a linear system

  [ (P(1)^T ⊗ v_0(1)^T)    v_0(1)^T  ]
  [             ...                  ]  [ E(:) ; s ] = 0,    (7)
  [ (P(11)^T ⊗ v_0(11)^T)  v_0(11)^T ]

where the stacked matrix B is 11 × 12 and has rank 11. Let B = UΣV^T be the SVD of B. The solution is given by the last column of V. The scale factor is obtained by setting the norm of E to one.

8-point Algorithm: Notice the striking similarity between our E matrix ([A]× R) and the essential matrix [10] for relative motion between two perspective cameras ([t]× R). This implies that we can map the axis estimation to the 5-point algorithm for essential matrix computation [17]. Given 8 correspondences, we obtain an 8 × 12 matrix B as above. Let U_1, ..., U_4 be the right null singular vectors of B. The solution lies in a four dimensional subspace

  [ E(:) ; s ] = λ_1 U_1 + λ_2 U_2 + λ_3 U_3 + λ_4 U_4,    (8)

where the λ_i's are unknown scalars. λ_4 can be set to 1 due to the unknown scale factor. The 'E' part of the solution is

  E = λ_1 U_1(1:9) + λ_2 U_2(1:9) + λ_3 U_3(1:9) + U_4(1:9),

where U_i(1:9) denotes the first 9 elements of U_i. Now the λ_i's can be computed using the solution in [17] by providing the above subspace vectors for E.

After recovering E and s, the axis is computed as the left null singular vector of E (since A^T E = 0). The sign ambiguity in the axis is resolved by pointing it towards the camera. The translation orthogonal to the axis, t_A⊥, can be obtained as s × A. Four solutions for R are recovered from E as in [17]. The correct solution is obtained after recovering the layer thicknesses and constraining them to be positive.
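The 11-point algorithm of (7) reduces to building one 12-vector per correspondence and taking an SVD; the axis then follows from the left null vector of E. The sketch below is an illustrative NumPy implementation under these definitions (not the authors' code); the 8-point variant would instead keep the four right singular vectors with smallest singular values and pass them to a 5-point essential-matrix solver [17].

```python
import numpy as np

def estimate_E_s(P, v0):
    """Linear solution of Eq. (7) from K >= 11 correspondences.

    P  : (K, 3) array of 3D points in the object frame.
    v0 : (K, 3) array of the corresponding calibrated camera rays.
    Returns E = [A]x R (up to sign) and s = A x t, with ||E|| = 1."""
    K = P.shape[0]
    B = np.zeros((K, 12))
    for k in range(K):
        B[k, :9] = np.kron(P[k], v0[k])   # (P^T kron v0^T) acting on E(:)
        B[k, 9:] = v0[k]                  # v0^T acting on s
    _, _, Vt = np.linalg.svd(B)
    sol = Vt[-1]                          # right singular vector of smallest singular value
    nrm = np.linalg.norm(sol[:9])
    E = sol[:9].reshape(3, 3, order='F') / nrm   # E(:) stacks columns, hence Fortran order
    s = sol[9:] / nrm
    return E, s

def axis_from_E(E):
    """The axis A is the left null vector of E (A^T E = 0), Sec. 3.1."""
    U, _, _ = np.linalg.svd(E)
    A = U[:, -1]
    return A if A[2] < 0 else -A          # one possible convention: point A towards the camera
```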


Figure 2. (Left) Single layer (Case 1) and two layer (Case 2, µ_2 = µ_0) configurations. (Right) Comparison of the 11pt and 8pt algorithms for estimation of axis, rotation and t_A⊥ using coplanarity constraints.

3.2. Simulations

We present simulations for estimating the axis with Gaussian noise (variance σ^2 pixels) in the feature points for Case 1 and Case 2, shown in Figure 2. We assume a camera with a FOV of 45° and a resolution of 1000^2 pixels. The scene is set by choosing d_0 = 300 units, d_1 = 450 units, µ_1 = 1.5 and µ_2 = 1. We perform 100 trials for each noise setting, and plot the average error in axis, rotation and t_A⊥ in Figure 2. For each trial, the axis is randomly generated in a cone of half-angle 45° around the camera's optical axis. 100 3D points are randomly generated along with R and t so that they lie within [d_0, 2d_0] after the last layer. For each trial, a RANSAC framework is employed for both the 11pt and 8pt algorithms using 200 iterations to choose the best solution using the coplanarity error. The rotation error is defined as the minimum angle of rotation required to go from the estimated rotation to the true rotation. Similarly, the axis error is defined as the angle between the estimated and the true axis. The translation error is computed as the norm of the translation error vector and is normalized using the corresponding layer thickness. Notice that when v_0 is parallel to the axis, the coplanarity constraint cannot be used. Thus, rays having a small angle with the axis are unreliable. As expected, the 8pt algorithm performs significantly better than the 11pt algorithm as well as using all points in a least squares fashion.

4. Recovery of Layer Thicknesses

We first assume known refractive indices. Section 3.1 showed how to compute the axis A, the rotation R and the translation t_A⊥ orthogonal to the axis. When the µ_i's are known, the ray directions for the entire light-path v_0(i), ..., v_n(i) can be pre-computed using the estimated A. The remaining unknowns are the layer thicknesses d_i and the translation t_A along the axis, which can be computed linearly as described below.

Coordinate Transformations: We first apply the computed R and t_A⊥ to the 3D points P. Let P_c = RP + t_A⊥. With the known axis, the analysis can be done in 2D on the plane of refraction (POR) itself as shown in Figure 1. Let t_A = αA, where α is the unknown translation magnitude along the axis. Let [z_2, z_1] denote an orthogonal coordinate system on the POR. We choose z_1 along the axis. For a given camera ray v_0, let z_2 = z_1 × (z_1 × v_0) be the orthogonal direction. The projection of P_c on the POR is given by u = [u^x, u^y], where u^x = z_2^T P_c and u^y = z_1^T P_c. Similarly, each ray v_i on the light-path of v_0 can be represented by a 2D vector vp_i on the POR, whose components are given by z_2^T v_i and z_1^T v_i. Let z_p = [0; 1] be the unit 2D vector along the axis and c_i = vp_i^T z_p.

4.1. Linear System for n Layers

For each correspondence, the FRC for an n layer system on its plane of refraction is given by

  vp_n × (u + α z_p − q_n) = 0.    (9)

This is because the last refracted ray vp_n should be parallel to the line joining the transformed 3D point u + α z_p and the refraction point q_n on the last layer, where q_n = Σ_{i=0}^{n−1} −d_i vp_i / c_i. Substituting, we get

  vp_n × [ vp_0/c_0  ...  vp_{n−1}/c_{n−1}  z_p ] [ d_0 ; ... ; d_{n−1} ; α ] = −vp_n × u.

Thus, each correspondence gives one linear equation in the d_i's and α. By stacking K > n correspondences, the resulting linear system can be solved to obtain the d_i's and α for n layers. After estimating α, the translation t is given by t_A⊥ + αA. However, if µ_i = µ_n for any i, vp_i is parallel to vp_n and d_i cannot be estimated. In addition, if µ_i = µ_j, only the combined layer thickness d_i + d_j can be estimated, since the corresponding columns in the linear system become equal. Now we analyze some special cases.
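The linear system above is small (one equation per correspondence, n + 1 unknowns) and can be solved by ordinary least squares. The following is an illustrative sketch under the definitions above (helper names are ours), assuming the per-correspondence light-path directions vp_i have already been computed from the axis and the known refractive indices.

```python
import numpy as np

def cross2(a, b):
    """Scalar cross product of two 2D vectors on the plane of refraction."""
    return a[0] * b[1] - a[1] * b[0]

def solve_thicknesses(u, vp, zp=(0.0, 1.0)):
    """Least-squares solution of the stacked system of Sec. 4.1.

    u  : list of K 2D points on the POR (transformed object points).
    vp : list of K light-paths; vp[k][i] is the 2D direction of ray i
         (i = 0..n) for correspondence k.
    Returns the layer thicknesses [d_0, ..., d_{n-1}] and alpha."""
    zp = np.asarray(zp)
    K, n = len(u), len(vp[0]) - 1
    M = np.zeros((K, n + 1))
    rhs = np.zeros(K)
    for k in range(K):
        vpn = np.asarray(vp[k][n])
        for i in range(n):
            vpi = np.asarray(vp[k][i])
            M[k, i] = cross2(vpn, vpi / vpi[1])   # vpi[1] = c_i, component along the axis
        M[k, n] = cross2(vpn, zp)                 # coefficient of alpha
        rhs[k] = -cross2(vpn, np.asarray(u[k]))
    x, *_ = np.linalg.lstsq(M, rhs, rcond=None)
    return x[:n], x[n]
```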
Case 1 (Single Refraction): For a single layer, we have two unknowns d_0 and α and the FRC is given by

  vp_1 × [ vp_0/c_0  z_p ] [ d_0 ; α ] = −vp_1 × u.    (10)

Using K ≥ 2 correspondences, a least squares solution can be obtained.

Case 2 (Two Refractions, µ_0 = µ_2): This is a common scenario when looking through a refractive medium such as a thick glass slab. Here d_0, d_1 and α are unknowns. Since µ_0 = µ_2, vp_2 is parallel to vp_0 and the FRC reduces to

  vp_0 × [ vp_1/c_1  z_p ] [ d_1 ; α ] = −vp_0 × u.    (11)

Thus, we can only estimate the thickness d_1 of the medium, but not its distance d_0. This is consistent with the analysis shown in [22]. In Section 5, we show that although d_0 cannot be estimated, the projection of the 3D point on the image plane can still be computed for non-linear refinement.

Case 3 (Two Refractions, µ_0 ≠ µ_2): Now v_2 and v_0 are not parallel and the FRC is given by

  vp_2 × [ vp_0/c_0  vp_1/c_1  z_p ] [ d_0 ; d_1 ; α ] = −vp_2 × u.    (12)

Thus, we can estimate the distance d_0 as well.

Figure 3. Comparison of rotation, translation and reprojection error using our algorithm with a central approximation (CA) for Case 1 and Case 2. The right-most plot shows the estimated t_z for Case 2 over all 100 trials for σ = 1 pixel. CA estimates the object to be closer to the camera.

5. Non-Linear Refinement

Till now we have described an initial solution for estimating the calibration and pose parameters. We now show how to perform non-linear refinement of the calibration and pose parameters.

Analytical Forward Projection (AFP): Given a calibrated central or non-central camera, AFP refers to an analytical method to compute the projection (or the corresponding camera ray) of a known 3D point. AFP can be used to minimize the image reprojection error. For Case 1, AFP is a 4th degree equation [7]. We derive the AFP equation for two refractions and show that it is a 4th degree equation for Case 2 and a 12th degree equation for Case 3. The analysis can be done on the POR.

Case 2: Assume given calibration parameters d_0, d_1, µ_1 and the coordinates u = [u^x, u^y] of a known 3D point on the POR. The unknown camera ray vp_0 can be parameterized as [x, d_0]. Since µ_2 = µ_0, vp_2 is parallel to vp_0. The AFP equation is given by

  vp_0 × (u − q_2) = 0.    (13)

From (2) and (3), the refracted ray on the POR is vp_1 = vp_0/µ_1 + b_1 z_1, with

  b_1 = ( d_0 − sqrt( d_0^2 − (1 − µ_1^2)(x^2 + d_0^2) ) ) / µ_1.    (14)

The refraction point is q_2 = [x, d_0] − d_1 vp_1/c_1. Substituting all these terms in (13), we get

  (d_0^2 µ_1^2 + µ_1^2 x^2 − x^2)(d_0 u^x + d_1 x − u^y x)^2 = (d_0 d_1 x)^2.    (15)

Similar to Case 1, we obtain a 4th degree equation in x. This gives four solutions for x and the correct solution is found by checking Snell's law for each solution. After obtaining x, the camera ray is obtained as x z_2 + d_0 z_1, using which the image projection p̂ can be computed via the internal camera calibration matrix. In Section 4, we showed that d_0 cannot be estimated for Case 2. The reader might wonder how the AFP equation can then be solved. The key idea is that the camera ray can be computed by using any value of d_0. For example, let d_0' = λ d_0 for λ > 0. Then the solution is x' = λ x. Thus, the camera ray remains the same. For Case 3, given calibration parameters d_0, d_1, µ_1, µ_2 and the known 3D point, we show in the supplementary materials that a 12th degree AFP equation can be derived similarly.

Iterative Refinement: Let x_c denote all calibration parameters. Given an initial estimate of x_c and the pose, let p̂(i) = AFP(x_c, R P(i) + t) be the image projection of P(i) computed by solving the AFP equation. The reprojection error is defined as the root mean square (RMS) error J = sqrt( (1/K) Σ_{i=1}^{K} (p(i) − p̂(i))^2 ). We use lsqnonlin in Matlab to refine x_c and the pose [R, t] by minimizing J.
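Since (15) is a quartic in x, the forward projection for Case 2 can be solved numerically without expanding the coefficients by hand. The following sketch (ours, under the sign conventions used in the reconstruction above, not the paper's code) builds the polynomial with NumPy, takes its roots, and keeps a real root satisfying the un-squared constraint, which plays the role of the Snell's-law check.

```python
import numpy as np

def afp_case2(u, d0, d1, mu1):
    """Analytical forward projection for Case 2 (air-medium-air), Eq. (15).

    u = (ux, uy) is the 3D point on the POR (uy measured along the axis),
    d0 the camera-to-slab distance, d1 the slab thickness, mu1 its index.
    Returns the transverse coordinate x of the camera ray vp_0 = [x, d0],
    or None if no admissible real root is found."""
    ux, uy = u
    # Eq. (15): (mu1^2 d0^2 + (mu1^2 - 1) x^2) (d0 ux + (d1 - uy) x)^2 = (d0 d1 x)^2
    p1 = np.array([mu1**2 - 1.0, 0.0, (mu1 * d0)**2])   # quadratic factor in x
    p2 = np.array([d1 - uy, d0 * ux])                   # linear factor in x
    lhs = np.polymul(p1, np.polymul(p2, p2))
    rhs = np.array([(d0 * d1)**2, 0.0, 0.0])            # (d0 d1 x)^2
    for x in np.roots(np.polysub(lhs, rhs)):
        if abs(x.imag) > 1e-9:
            continue
        x = x.real
        s = np.sqrt((mu1 * d0)**2 + (mu1**2 - 1.0) * x**2)
        # un-squared constraint: the Snell's-law check under these conventions
        if np.isclose(s * (d0 * ux + (d1 - uy) * x), d0 * d1 * x, rtol=1e-6):
            return x
    return None
```

The returned x gives the camera ray x z_2 + d_0 z_1, which the internal calibration matrix then maps to the image point.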
Notice the large error in tz and large J in central Case 2 0.025 Case 2 0.35 approximation, also evident from projected points in Fig- 0.3 0.02 0.25 ure 6. Interestingly, the central approximation can recover 0.015 0.2 the rotation well-enough. 0.15 0.01 0.1 6.1. Planar Calibration Grid 0.005 0.05

0 0 We now show that calibration can also be done using a 0 0.2 0.4 0.6 0.8 1 0 0.2 0.4 0.6 0.8 1 Noise σ Noise σ single planar grid, which is useful in practice. We describe Figure 4. Error in axis and layer thickness for Case 1 and Case 2 an 8pt algorithm as follows. Without loss of generality, as- using 8pt algorithm after non-linear refinement. sume the plane is aligned with xy plane (Pz(i) = 0 ). Sub- stituting in the coplanarity equation ( 6), the columns 7, 8, 9 of B matrix reduce to zero. Let B′ be the reduced 8 × 9 matrix, whose rank is 8. Thus, we can directly estimate the first two columns of the E matrix and s by SVD based so- lution using 8 correspondences. The last column of E is re- covered using Demazure constraints [ 6, 5] and det (E) = 0 constraint. Figure 6. (Left) Setup. (Right) Photo captured by looking through e1 e4 x a water tank. Projected 3D points are overlayed by applying pose Let E =  e2 e5 y , where ei’s are estimated as estimated using CA (green) and our algorithm (red). e3 e6 z   above and x, y, z are unknown. Setting det (E) = 0 di > 0 ∀ i. The obtained initial solution is refined by min- gives a linear equation using which x can be obtained as imizing the reprojection error using the AFP. We also com- x = (( e1e6 − e3e4)y + ( e2e4 − e1e5)z)/(e2e6 − e3e5). pute the pose obtained using a central (perspective) approxi- The Demazure constraints provide three cubic equations mation from the given 2D-3D correspondences (referred by and six quadratic equations in unknowns, from which any CA ). Figure 3 and 4 show error plots for pose and calibra- two quadratic equations can be chosen. Substituting x re- tion parameters and the final reprojection error for different sults in two quadratic equations in y and z, which can be noise levels, averaged over 100 trials. These plots show that solved to obtain a 4th degree equation in z. In general, there correct calibration and pose parameters can be obtained us- are two real solutions which differ in sign. Thus, we obtain ing our algorithm. Notice the large translation and repro- a pair of E matrices which differ in sign of their last column. jection error, and smaller estimated tz when using a central Each pair of obtained rotation matrices also have the same approximation (CA) in Figure 3. This is expected, because property. The correct rotation matrix is chosen by checking when looking through a denser refractive medium, objects for the determinant value of one 1. appear closer to the camera. Note that the error due to noise Figure 7 shows simulation results on estimating calibra- in CA is insignificant compared to the error due to incorrect tion and pose parameters for Case 1 and Case 2 using a modeling. planar grid. Again, note that a central approximation com- pletely fails. For real data shown in Figure 6, we estimated 6. Real Experiments the calibration parameters using only the left checkerboard We show real results using a water tank of as shown in Figure 5. Thus, we see that calibration can also 508 × 260 × 300 mm 3. We use a Canon Rebel XT camera be performed using a single planar grid. having resolution of 3456 × 2304 pixels with a 18 − 55 mm 6.2. Unknown Refractive Indices zoom lens. The camera was internally calibrated offline us- Due to lack of space, we only consider Case 1. We have ing [1]. Figure 6 shows a photo of a scene consisting of ′ three checkerboards, captured by looking through the wa- three unknowns d0, µ1 and α. When µis are unknown, ter tank (facing 260 mm side of tank). 
In order to obtain ray directions cannot be pre-computed and FRC needs to ground truth, we took another photo in air, using which the be written in terms of camera rays as follows extrinsic of checkerboards were computed. The resulting (a1vp 0 + b1z1) × (u + αzp + d0vp 0/c 0) = 0 . (16) 3D points in the coordinate system of left checkerboard are x y Let vp = [ v ; v ]. After substituting a1 and b1 and remov- shown in Figure 5. We detect corners in the captured photo 0 ing the square root term in b1, we get and run our algorithm (Case 2) to estimate the calibration x y x y 2 x y y 2 and pose parameters. Figure 5 shows the estimated rotation (d0v − v u )( γ + ( v ) − 1) = ( v v (α − d0 − u )) , (17) and translation parameters, along with the final reprojec- 1The determinant of incorrect rotation matrices equals −1, correspond- tion error J. The estimated thickness of the tank using our ing to a reflection. 3D Points in Object Coordinate System N θx, θ y, θ z (deg) tx, t y, t z (mm) J (pixel) d1 (mm)
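To make the planar-grid completion step concrete, the sketch below (our illustration, not the paper's code) recovers the third-column entry x from det(E) = 0 for candidate values of y and z, and evaluates the standard essential-matrix trace (Demazure) constraint E E^T E − ½ tr(E E^T) E = 0 as a residual for scoring candidates.

```python
import numpy as np

def complete_E(e, y, z):
    """Given the first two columns e = [e1..e6] of E (planar 8pt case)
    and candidate values for y, z, recover x from det(E) = 0 and return E."""
    e1, e2, e3, e4, e5, e6 = e
    x = ((e1 * e6 - e3 * e4) * y + (e2 * e4 - e1 * e5) * z) / (e2 * e6 - e3 * e5)
    return np.array([[e1, e4, x],
                     [e2, e5, y],
                     [e3, e6, z]])

def demazure_residual(E):
    """Residual of E E^T E - 0.5 tr(E E^T) E = 0, which holds when E = [A]x R."""
    EEt = E @ E.T
    return np.linalg.norm(EEt @ E - 0.5 * np.trace(EEt) * E)
```

In practice one would solve the two chosen quadratic Demazure equations for y and z as described above; the residual form here is only meant to make the constraint explicit.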

          N     θx, θy, θz (deg)        tx, ty, tz (mm)               J (pixel)   d1 (mm)
GT        -     131.38, 1.22, 84.07     −237.58, −128.85, 455.80      0           260
CA        144   130.24, 1.42, 83.84     −217.71, −120.73, 372.14      10.1        -
Ours †§   144   131.38, 1.26, 84.12     −237.11, −128.16, 453.12      0.33        255.69
Ours †♯   48    131.40, 1.36, 84.03     −239.76, −129.26, 456.34      0.26        272.81
Ours ‡    144   131.37, 1.26, 84.12     −236.46, −127.86, 449.70      0.33        262.39

†: Assuming µ_1 = 1.33; §: All Planes; ♯: Left Plane Only; ‡: All Planes, Unknown µ_1.

Figure 5. (Left) 3D points in the left checkerboard coordinate system. (Right) Estimates of pose and water-tank thickness d_1, and the final reprojection error J, for the real data shown in Figure 6 using the central approximation (CA) and our algorithm. GT denotes ground-truth and N denotes the number of 2D-3D correspondences used.

Figure 7. Errors in axis, rotation, translation and layer thickness using a planar calibration grid for different noise values, averaged over 100 trials. Rotation and translation errors using a central approximation (CA) are also shown.

6.2. Unknown Refractive Indices

Due to lack of space, we only consider Case 1. We have three unknowns d_0, µ_1 and α. When the µ_i's are unknown, the ray directions cannot be pre-computed and the FRC needs to be written in terms of camera rays as follows:

  (a_1 vp_0 + b_1 z_1) × (u + α z_p + d_0 vp_0/c_0) = 0.    (16)

Let vp_0 = [v^x; v^y]. After substituting a_1 and b_1 and removing the square root term in b_1, we get

  (d_0 v^x − v^y u^x)^2 (γ + (v^y)^2 − 1) = ( v^x v^y (α − d_0 − u^y) )^2,    (17)

where γ = µ_1^2. Let EQ_1, EQ_2, EQ_3 be the three such equations for 3 correspondences. Using EQ_1, γ can be obtained as a function of d_0 and α. Substituting γ in EQ_2 and EQ_3 makes them cubic in d_0 and quadratic in α. After eliminating α^2 between EQ_2 and EQ_3, α is obtained as a cubic function of d_0, which when substituted back into EQ_3 results in a 6th degree equation in the single unknown d_0. Solving it results in 6 solutions. The correct solution is found by enforcing d_0 > 0, α > d_0 and µ_1 > 0. Similarly, Case 2 also results in a 6th degree equation². However, Case 3 proved too difficult to obtain an analytical equation. Thus, multi-layer systems require a good initial guess when the µ_i's are unknown. Figure 5 shows the pose and calibration estimates for the real data (Figure 6) assuming unknown µ_1, which was recovered as 1.296 (relative error 2.55%).

7. Analysis

Field-of-View (FOV): A small FOV results in ambiguity between α and the layer thicknesses, as analyzed below using the small angle approximation sin θ ∼ tan θ ∼ θ. For small angles, θ_i ∼ θ_0/µ_i, where θ_0 is the angle between the camera ray and the axis³. Rewriting the FRC (9) on the POR using angles,

  tan θ_n = ( u^x − Σ_i d_i tan θ_i ) / ( u^y + α − Σ_i d_i ),    (18)

  θ_0 ( α + Σ_i d_i (µ_n/µ_i − 1) ) = −u^y θ_0 + u^x µ_n,    (19)

where we denote χ = α + Σ_i d_i (µ_n/µ_i − 1). Notice that even when the µ_i's are known, the only quantity that can be estimated is χ, which is a combination of α and the d_i's. This also implies that even when the depth of the calibration object (and hence α) is known as in [27], the individual layer thicknesses cannot be obtained for a small FOV. In Figure 6, if we perform calibration using the center checkerboard only, the rotation error is within 0.5° whereas the tank thickness d_1 is estimated to be 550.38 mm (an error of 290.38 mm), along with a 70.45 mm error in α. However, the error in the corresponding quantity χ = α + d_1(1/µ_1 − 1) from the true value is only 1.59 mm. A central approximation in this case also gives a low reprojection error with a similar translation error.

Multi-Layer Refractions: Non-central cameras can be well modeled using a central approximation when the locus of viewpoints is small (e.g., a catadioptric camera with mirror size significantly smaller than the scene depth). For multi-layer refractions, a natural question then arises whether they can be modeled by simpler single/two-layer models. This also becomes important when multi-layer modeling becomes challenging due to similar or unknown refractive indices. We analyze if Case 3 (air-glass-water) can be approximated by Case 1 (air-medium). We perform simulations as in Section 3.2 using µ_1 = 1.5, µ_2 = 1.33, d_0 = 300 units and d_1 = 450 units. We add different amounts of noise to the 2D features and perform 100 trials for each. For each trial, we apply a central approximation, a single layer (SL) approximation using µ^SL = 0.5(µ_1 + µ_2) = 1.415, and the correct two-layer model to estimate the pose and calibration parameters (µ^SL can also be estimated as in Section 6.2). Figure 8 shows the average error plots. Notice that with no noise,

²Supplementary materials provide details for Case 1 and Case 2.
³For simplicity, we assume a small angle between the camera's optical axis and the normal n.
both the SL and CA approximations give non-zero errors while the correct model gives zero error. However, as the noise is increased, the SL approximation gives similar pose, axis and reprojection errors compared to the true model. As expected, a central approximation is significantly worse than the SL approximation. In general, with large noise we expect multi-layer refractions to be modeled well by either Case 1 or Case 2, depending on whether µ_n ≠ µ_0 or µ_n = µ_0 respectively.

Figure 8. As noise increases, a two layer refraction system (Case 3) can be well-approximated using a single layer (SL) system (Case 1), but not using a central approximation.

8. Conclusions

Our primary contribution is to understand the geometry of perspective cameras imaging through parallel refractive layers. We developed a theory for calibration and derived forward projection equations, which can be directly used in applications such as 3D reconstruction [2]. We presented a comprehensive analysis under unknown layer distances and orientation, and known/unknown refractive indices. Since calibration can be done using a single plane, the proposed algorithms are useful in practical scenarios such as underwater imaging. We showed that multi-layer systems may be well-approximated by simpler single layer systems. Multiple planar grids can be used to increase the calibration accuracy, similar to the calibration of perspective cameras. Our proposed 8 point algorithm for axis computation can be used for other axial setups such as catadioptric cameras, as well as to compute the distortion center for fish-eye cameras [9]. Developing a minimal solution for calibrating flat refractive geometry remains interesting future work.

Acknowledgements: We thank several members of MERL and Mitsubishi Electric, Japan, for their help and support in this project.

References

[1] J.-Y. Bouguet. Camera calibration toolbox for Matlab. http://www.vision.caltech.edu/bouguetj.
[2] Y.-J. Chang and T. Chen. Multi-view 3D reconstruction for scenes under the refractive plane with known vertical direction. In ICCV, 2011.
[3] V. Chari and P. Sturm. Multiple-view geometry of the refractive plane. In BMVC, Sep. 2009.
[4] Z. Chen, K.-Y. K. Wong, Y. Matsushita, X. Zhu, and M. Liu. Self-calibrating depth from refraction. In ICCV, 2011.
[5] M. Demazure. Sur deux problemes de reconstruction. Technical Report 882, INRIA, 1988.
[6] O. Faugeras. Three-Dimensional Computer Vision: A Geometric Viewpoint. MIT Press, 1993.
[7] G. Glaeser and H.-P. Schröcker. Reflections on refractions. J. Geometry and Graphics, 4(1):1–18, 2000.
[8] A. S. Glassner. An Introduction to Ray Tracing. Morgan Kaufmann, 1989.
[9] R. I. Hartley and S. B. Kang. Parameter-free radial distortion correction with center of distortion estimation. PAMI, 29(8):1309–1321, Aug. 2007.
[10] R. I. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, second edition, 2004.
[11] J. Höhle. Reconstruction of the underwater object. Photogrammetric Engineering, pages 948–954, 1971.
[12] R. Kotowski. Phototriangulation in multi-media photogrammetry. Int'l Archives of Photogrammetry and Remote Sensing, XXVII, 1988.
[13] R. Li, H. Li, W. Zou, R. G. Smith, and T. A. Curran. Quantitative photogrammetric analysis of digital underwater video imagery. IEEE J. Oceanic Eng., 2:364–375, 1997.
[14] H. C. Longuet-Higgins. The reconstruction of a plane surface from two perspective projections. Proc. Royal Society of London, B227:399–410, 1986.
[15] H.-G. Maas. New developments in multimedia photogrammetry. Optical 3D Measurement Techniques, III, 1995.
[16] N. Morris and K. Kutulakos. Dynamic refraction stereo. In ICCV, volume 2, pages 1573–1580, 2005.
[17] D. Nistér. An efficient solution to the five-point relative pose problem. PAMI, 26(6):756–770, June 2004.
[18] S. Ramalingam, P. Sturm, and S. K. Lodha. Theory and calibration algorithms for axial cameras. In ACCV, 2006.
[19] K. Rinner. Problems of two-medium photogrammetry. Photogrammetric Engineering, 35(3):275–282, 1969.
[20] M. Shimizu and M. Okutomi. Calibration and rectification for reflection stereo. In CVPR, pages 1–8, June 2008.
[21] M. Shortis, E. Harvey, and J. Seager. A review of the status and trends in underwater videometric measurement. In SPIE Conf. 6491, Videometrics IX, Jan. 2007.
[22] E. Steger. Reconstructing transparent objects by refractive light-path triangulation. Master's thesis, University of Toronto, 2006.
[23] E. Steger and K. Kutulakos. A theory of refractive and specular 3D shape by light-path triangulation. IJCV, 76(1):13–29, 2008.
[24] P. Sturm and S. Ramalingam. A generic concept for camera calibration. In ECCV, 2004.
[25] J.-P. Tardif, P. Sturm, M. Trudeau, and S. Roy. Calibration of cameras with radially symmetric distortion. PAMI, 31(9):1552–1566, 2009.
[26] S. Thirthala and M. Pollefeys. Multi-view geometry of 1D radial cameras and its application to omnidirectional camera calibration. In ICCV, volume 2, pages 1539–1546, 2005.
[27] T. Treibitz, Y. Y. Schechner, and H. Singh. Flat refractive geometry. In CVPR, 2008.