Computer Vision ECE 661 Homework 10
Total Page:16
File Type:pdf, Size:1020Kb
Computer Vision ECE 661 Homework 10 Rohan Sarkar ([email protected]) November 29, 2016 1. Introduction In this assignment, we perform a 3D scene reconstruction with projective distortion using two stereo images captured by an uncalibrated camera. Such a 3D reconstruction is called projective reconstruction. because the reconstruction is projectively distorted visually and it is related by a 4 4 homography to the real scene. If the scene is visually rich and we £ have prior knowledge about the objects that appear in the scene, we can remove projective distortion, affine and similarity distortion successively. The following steps are performed for the 3D reconstruction : 1. An initial estimate of the fundamental matrix F is found, using the manually selected correspondences (minimum of 8) between the 2 images by linear least squares optimization and refine the fundamental matrix F using the Levenberg Marquadt Method. 2. Using this initial estimate of F, we rectify the images to send the epipoles to infinity T such that e £1 0 0¤ and thereby find the Homographies for the two cameras for Æ this purpose, represented by H,H0. 3. Using the rectified images, the interest points on the rectified images are found using SURF and canny edge detector. 4. After finding the interest points in the rectified images, we calculate the euclidean distance geometric error to calculate the cost function and apply the non-linear least squares optimization to minimize it and refine the fundamental matrix, camera matrices and the 3D world points. 5. Finally we shall use triangulation to project the point correspondences to world 3D and compute the 3D Scene structure from the set of correspondences for the given camera pair (P,P0). 2. Theory and Implementation Linear Least Squares estimation of the Fundamental Matrix The Fundamental Matrix F is estimated using the manual correspondences (xi,xi0 ) that are selected by the user.(xi,xi0 ) are the pixel locations in homogeneous coordinates in the left and right image of the same world point selected by the user manually. The Fundamental Matrix F gives us the algebraic representation of the epipolar geometry which finds the relation between the corresponding points (xi,xi0 ). The algebraic equation is given by T x0 Fxi 0 i Æ £ ¤ u0 ui u0 vi u0 v0 ui v0 vi v0 ui vi 1 f 0 Æ) i i i i i i Æ 1 From a set of n point matches, we obtain a set of linear equations of the form 2 3 u01u1 u01v1 u01 v10 u1 v10 v1 v10 u1 v1 1 . A.f 6 . 7f 0 Æ 4 5 Æ u0nun u0nvn u0n v0nun v0nvn v0n un vn 1 where, £ ¤ f f11 f12 f13 f21 f22 f23 f31 f32 f33 f33 1 Æ Æ £ ¤T £ ¤T xi ui vi 1 x0 u0 v0 1 Æ i Æ i i T Since each correspondence gives us 1 equation by solving, x0 Fxi 0. Since there are 8 i Æ unknowns, we need a minimum of 8 such correspondences to use the normalized 8 point algorithm, which normalizes the data to improve the estimate of the Fundamental Matrix F. The steps are enumerated below: 1. Normalization : The image coordinates are transformed according to ˆ xˆ i Txi x i T0x0 Æ 0 Æ i where T and T0 are normalizing transformations consisting of a translation and scaling.The normalizing transformations T, for points xi and T0, for points xi0 for the two images are found such that all the pixel correspondences have 0 mean and are at a distance of p2 from the center C1 (0,0). Æ 2. Linear Solution : When we have normalized all the points, we stack up all of them and make a matrix vector equation of the form Aˆ .fˆ 0.The matrix Aˆ has Æ all the stacked rows similar to the matrix shown above and is composed of the ˆ ˆ ˆ matches between xˆ i and x0i. This is solved using SVD to yield the matrix F from f corresponding to the smallest singular value of Aˆ . 3. Constraint enforcement : The rank of matrix Fˆ has to be 2. Therefore we may need to condition the matrix Fˆ to make its rank 2. This is done again by using Singular Value Decomposition.So if Fˆ UDVˆ T , we set the smallest singular value in ˆ ƈ ˆ ˆ ˆ T D equal to 0 and then denote it with DCE. So we replace F by FCE UDCEV such ˆ Æ that det(FCE) 0 Æ ˆ 4. Denormalization : Next, we denormalize the fundamental matrix FCE by using the following relation T ˆ F T0 FCET Æ 5. Next we compute the epipoles of the left and right images. e is the right null vector of fundamental matrix F and e0 is the left null vector of fundamental matrix F. T F.e 0 e0 F 0 Æ Æ 6. Finally we calculate the camera projection matrices in the canonical form as follows: P [I 0] P0 [[e0]xF e0] Æ j Æ j 2 Initial estimates before rectification and before refining Initial Fundamental Matrix before images are rectified is given by : 21.61993193122650e 07 8.41209003027106e 06 0.004300545977564843 ¡ ¡ ¡ ¡ F 48.50415583183406e 06 7.34816171971557e 07 0.0287916342927544 5 Æ ¡ ¡ ¡ ¡ 0.00364100843113559 0.0295143317479700 1 Initial First Camera Projection Matrix is given by : 21 0 0 03 P 40 1 0 05 Æ 0 0 1 0 Initial Second Camera Projection Matrix is given by : 2 1.80523197679284 14.6332981048719 495.774364765972 3551.87067873625 3 ¡ ¡ ¡ P0 4 12.9323909255888 104.831097950199 3551.87497928223 495.8031564002655 Æ ¡ ¡ ¡ ¡ 0.0302859784829609 0.00678071280441858 104.396385907253 1 ¡ ¡ Initial epipoles are given by : 2 3346.99098369553 3 2 3551.87067873625 3 e 4 446.7803134820165 e0 4 495.8031564002655 Æ ¡ Æ ¡ 1 1 Final estimates before rectification and after refining Refined Fundamental Matrix before images are rectified is given by : 22.00548076986446e 07 1.29984078437012e 05 0.001300769775751963 ¡ ¡ ¡ ¡ F 41.34181204176223e 05 5.41186060674474e 07 0.0377942991379238 5 Æ ¡ ¡ ¡ ¡ 0.000417385932814687 0.0383884228246249 1 Refined First Camera Projection Matrix is given by : 21 0 0 03 P 40 1 0 05 Æ 0 0 1 0 Refined Second Camera Projection Matrix is given by : 2 14.9486114725145 21.6540815829875 408.511137615021 2956.45213307080 3 ¡ ¡ ¡ P0 4 0.844575731781253 112.141710233451 2925.19966927627 75.29346073520635 Æ ¡ ¡ ¡ ¡ 0.0343340132683272 0.00891305474721164 111.087250279651 1 ¡ ¡ 3 Refined epipoles are given by : 2 2814.37638782766 3 2 2956.45213307083 3 e 4 56.64939984275905 e0 4 75.29346073520725 Æ ¡ Æ ¡ 1 1 Final estimates after rectification Final Fundamental Matrix after images are rectified is given by : 2 5.69457800755607e 26 1.24071872320692e 21 8.04011778842355e 203 ¡ ¡ ¡ ¡ F 4 1.32348898008484e 22 7.85821581925376e 23 0.000214584025844281 5 Æ ¡ ¡ ¡ 3.81164826264435e 21 0.000214584025844281 0.00584613578291279 ¡ ¡ ¡ ¡ Final Rectification Homography for first image is given by : 2 0.3537 0.1086 131.53163 ¡ H 4 0.0111 0.5653 0.8599 5 Æ 0.0004 0.0001 1.0000 ¡ Final Rectification Homography for second image is given by : 2 0.3750 0.1174 141.47483 ¡ H0 4 0.0145 0.5700 0.0090 5 Æ ¡ 0.0003 0.0001 1.0000 ¡ 4 Input Images 5 Figure 1: Input Images Extracting Manual Correspondences Taking the manual correspondences requires the user to click on points of interest that are present in both images. The advantage of this method is that it uses human knowledge about the corresponding points of interest that lie on the object of interest to accurately spot correspondences. Often it is difficult to manually extract 40 corresponding points which is required to rectify the image accurately.However, since this method uses human knowledge, we can use even the bare minimum number of points (minimum of 8) required to compute the fundamental matrix.I found 20 manual correspondences. However, a significant draw- back is that while humans can identify correspondences easily, human eye cannot pinpoint the exact position of the correspondences upto a pixel or sub-pixel accuracy. The accuracy of the correspondences greatly affects the result of the scene reconstruction procedure. In my approach, I have written a MATLAB Script find_manual_correspondence.m using which I can just click on the corresponding points of interest on both the images using a cross-hair cursor which makes locating points easier and accurate. Figure 2: Setup for taking Manual Correspondences 6 Figure 3: Points of Interest selected manually Figure 4: Correspondences between points of interest selected manually Image Rectification In order to accurately compute our projectively distorted 3D scene structure, we must obtain a highly refined estimate of matrix F. For this we need a considerably large number of pixel correspondences (xi,xi0 ) between the two images. For this we need to solve the ’correspondence problem’. It is computationally efficient, if we can simply lookup for pixel correspondences along the same rows for each pixel in one image (if F is perfectly estimated).In practical cases estimating F perfectly is not possible, and thus we look for correspondences in a small number of adjoining rows in the second image, but it still reduces the search space considerably and hence is much more computationally efficient. This is done by sending the epipoles in both the images to infinity. We compute the homographies H and H0 to send the epipoles e and e0 to infinity. The steps are enumerated 7 below: Estimating Homography H’ to send epipole e’ to along x axis 1 1. The second image center is shifted to origin using the transformation T1 to make the application of rotation easier. 2. The angle of the epipole w.r.t positive x-axis is found by using the formula 1 ³ height(image)/2 e2(y)´ θ tan¡ ¡ Æ ¡ width(image)/2 e2(x) ¡ 3.