3D Scene Reconstruction from Multiple Uncalibrated Views

Li Tao ([email protected])    Xuerong Xiao ([email protected])

Abstract

In this project, we focus on the problem of 3D scene reconstruction from multiple uncalibrated views. We have studied different 3D scene reconstruction methods, including Structure from Motion (SFM) and volumetric stereo (space carving and voxel coloring). Here we report the results of applying these methods to different scenes, ranging from simple geometric structures to complicated buildings, and compare the performances of the different methods.

1. Introduction

3D reconstruction from multiple images is the creation of three-dimensional models from a set of images. It is the reverse process of obtaining 2D images from 3D scenes. In recent decades, there has been an important demand for 3D content for computer graphics, virtual reality and communication, triggering a change in emphasis in the requirements. Many existing systems for constructing 3D models are built around specialized hardware (e.g. stereo rigs), resulting in a high cost that cannot satisfy the requirements of these new applications. This gap stimulates the use of ordinary digital imaging facilities (such as cameras) [1].

3D scene reconstruction from multiple-view images is an increasingly popular topic which can be applied to street view mapping, building construction, gaming and even tourism. When the reconstruction of a 3D scene is needed, a reliable computer vision based reconstruction method is much more cost-efficient and time-efficient than traditional methods such as aerial photo filming. 3D scene reconstruction applications such as Google Earth allow people to take flight over entire metropolitan areas in a virtually real 3D world, explore 3D tours of buildings, cities and famous landmarks, and take a virtual walk around natural and cultural landmarks without having to be physically there. A computer vision based reconstruction method also allows the use of rich image resources from the internet.

In this project, we have studied different 3D scene reconstruction methods, including the Structure from Motion (SFM) method and volumetric stereo (space carving and voxel coloring). Here we report the results of applying these methods to different scenes, ranging from simple geometric structures to complicated buildings, and compare the performances of the different methods.

2. Previous Work

2.1. Review of Previous Work

Reconstructing 3D models just by taking and using 2D images of objects has been a popular research topic in computer vision due to its potential to create an accurate model of the 3D world at very low cost. 3D reconstruction from one image alone is possible [2,3] but does not perform as well as reconstruction from multiple images with different views. In most practical cases, the images used to reconstruct a 3D model are not calibrated. Therefore, structure from motion has become a very popular and convenient method to calculate both the camera parameters and the 3D point positions simultaneously, depending only on the identified correspondences between the 2D images used. Many successful applications have been created using this concept, such as the Photo Tourism project, which establishes a system for interactively browsing and exploring large unstructured collections of photographs of a scene using a novel 3D interface, and successfully reconstructs various historic and natural scenes [4].

Another genre of 3D reconstruction methods is volumetric stereo, including space carving [5,6] and voxel coloring [7]. Unlike SFM, these methods do not require the identification of a large number of correspondences for dense 3D reconstruction. However, they require calibrated images, i.e., both the intrinsic and extrinsic parameters of the cameras associated with every view must be known and input to the algorithm for successful reconstruction.

2.2. Our Work

In this project, we have studied several 3D reconstruction methods, including both SFM and volumetric stereo. We have implemented our own algorithms and also tried out several available software packages and applications. We have tested their performance in reconstructing different scenes, ranging from small objects to large buildings. Here we report the results and compare the performance of the different methods.

3. Technical Part

3.1. Summary of Technical Solutions

In this project, we have implemented both SFM and volumetric stereo algorithms for the 3D reconstruction task. After camera calibration, we achieved SFM from two views [8] and multiple views [9]. We also experimented extensively with the Princeton SFMedu package [10] using images of multi-scale objects. The package contains an additional multi-view stereo step and produces a dense reconstruction from the output of the SFM algorithm.

In volumetric stereo, we focus on space carving for reconstructing 3D visual hulls. In order to acquire calibrated views, we obtained camera parameters (intrinsic and extrinsic) from our SFM algorithm and used these parameters as the input for the space carving algorithm. We have also tried out an application employing voxel coloring [11] and a smart phone based commercial 3D reconstruction application [12] for comparison. The details are discussed below.

3.2. Technical Solutions

3.2.1 SFM from two views

We first implemented the SFM algorithm in Matlab, starting with two views [8]. Using the Camera Calibrator app, we calculate the camera intrinsic parameters from 16 pictures of an asymmetric checkerboard pattern. The 2D and 3D information are linked by measuring the grid size of the checkerboard. Figure 1(a) shows the automatic checkerboard corner detection, Figure 1(b) shows the reprojection errors, and Figure 1(c) shows the picture orientations (and thus camera positions) in the world coordinate system. We eliminate some pictures by setting a threshold on the mean reprojection error. The resulting camera intrinsics, the lens distortion coefficients and other parameters are then input to the SFM process.

We then used the calibrated camera to take two pictures of the same object from two different views. We extract features and find point correspondences between the two images. We have experimented with Harris features, FAST features, and SURF features; the SURF features give the best results on the target scenes. With the identified correspondences, we then estimate the fundamental matrix and find the epipolar inliers using the RANSAC method. We calculate the camera positions using these inliers. With the calculated camera positions, we perform the dense reconstruction of the 3D scene using triangulation as the last step.
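To make the two-view pipeline above concrete, the following is a minimal Matlab sketch of the steps in Section 3.2.1, modeled on the MathWorks example we followed [8]. The image and parameter variables (I1, I2, cameraParams) are placeholders, and exact function names may differ slightly across Computer Vision System Toolbox releases.

```matlab
% Two-view SFM sketch. cameraParams comes from the Camera Calibrator app;
% I1 and I2 are two views of the same object taken with that camera.

% Remove lens distortion so the pinhole camera model holds.
I1 = undistortImage(I1, cameraParams);
I2 = undistortImage(I2, cameraParams);

% Detect and match SURF features (SURF gave the best results for our scenes).
[f1, vpts1] = extractFeatures(I1, detectSURFFeatures(I1));
[f2, vpts2] = extractFeatures(I2, detectSURFFeatures(I2));
pairs = matchFeatures(f1, f2);
m1 = vpts1(pairs(:, 1));
m2 = vpts2(pairs(:, 2));

% Estimate the fundamental matrix and keep only the epipolar inliers (RANSAC).
[F, inliers] = estimateFundamentalMatrix(m1, m2, ...
    'Method', 'RANSAC', 'NumTrials', 10000, 'DistanceThreshold', 0.1);
in1 = m1(inliers);
in2 = m2(inliers);

% Recover the relative camera pose and triangulate a 3D point cloud.
[R, t] = relativeCameraPose(F, cameraParams, in1, in2);
camMat1 = cameraMatrix(cameraParams, eye(3), [0 0 0]);
camMat2 = cameraMatrix(cameraParams, R', -t * R');
worldPoints = triangulate(in1.Location, in2.Location, camMat1, camMat2);
```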


3.2.2 SFM from multiple views

Using the same calibrated camera, we moved from two-view reconstruction to reconstruction with multiple views. We need to merge the point clouds during the reconstruction process. Bundle adjustment is hence performed to refine the resulting coordinates by minimizing the reprojection error [9].

Figure 1. (a) Automatic checkerboard corner detection; (b) reprojection errors of different pictures; (c) picture orientations (camera positions) in the world coordinate system.

3.2.3 Princeton SFMedu

There are four steps of image-based modeling: SFM, multi-view stereo, model fitting and texture mapping [10], following a bottom-up order. From images taken by moving cameras, the SFM method computes a 3D point cloud, after which multi-view stereo generates denser points. Model fitting creates meshes from the points, and texture mapping leads to models based on the meshes. Up to this point, we reconstruct scenes using SFM and multi-view stereo.

Given two images taken by a moving camera at different locations, keypoints are described using SIFT features. The keypoints in the two images are then matched to estimate the fundamental matrix and the essential matrix. RANSAC is used to eliminate outliers and prune the estimation. The camera trajectory, described by the rotation and translation matrices, is then computed. For multi-view reconstruction, bundle adjustment is performed to refine the resulting coordinates by minimizing the reprojection error. Multi-view stereo is implemented using matching propagation, which utilizes the zero-mean normalized cross-correlation in a priority queue and adds new potential matches in each iteration. The algorithm propagates only in areas with a maximal gradient greater than a threshold. Lastly, the triangulation step is used to generate colorized 3D point clouds.

3.2.4 Space carving

In order to acquire calibrated views for space carving, we first used the SFM algorithm to calculate the intrinsic and extrinsic camera parameters associated with different views, and then used these parameters as the input to the space carving algorithm. In order to acquire better silhouettes, we manually removed the background of the input images. The space carving algorithm is developed based on code from the course homework.

3.2.5 Voxel coloring

We have also experimented with a generalized voxel coloring implementation, the Archimedes executable [11]. This implementation also requires calibrated views (preferably images acquired from an object placed on a turntable) and accurate intrinsic and extrinsic camera parameters. We have tried out a provided example with this implementation and will discuss its performance in later sections.

3.2.6 Commercial 3D reconstruction application - Autodesk 123D Catch

For comparison purposes, we also experimented with a commercial smart phone based 3D reconstruction application, Autodesk 123D Catch [12]. This application does not require calibrated views. A dense and accurate 3D model can be generated just with images taken from different views of an object.
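For reference, the zero-mean normalized cross-correlation score used by the matching propagation step of Section 3.2.3 is the standard one; writing W for a small window around a candidate pair (x, x') and using bars for the mean intensities over the two windows (our notation), it reads

\mathrm{ZNCC}(x,x') = \frac{\sum_{i \in W} \big(I(x_i) - \bar{I}\big)\big(I'(x'_i) - \bar{I}'\big)}{\sqrt{\sum_{i \in W} \big(I(x_i) - \bar{I}\big)^2 \, \sum_{i \in W} \big(I'(x'_i) - \bar{I}'\big)^2}}

The score lies in [-1, 1] and is invariant to affine illumination changes, which is why candidate matches scoring above a threshold can be pushed into the priority queue and propagated.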

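As an illustration of the space-carving step of Section 3.2.4 (our actual implementation builds on the course homework code), a minimal Matlab sketch of silhouette-based carving follows. The silhouettes cell array, the voxel-grid bounds, and the camMats cell array of 4x3 camera matrices built from the SFM-estimated parameters are assumed inputs.

```matlab
% Space carving sketch: carve a voxel grid against binary silhouettes.
% silhouettes{k} is a logical image for view k; camMats{k} is the 4x3
% camera matrix of that view (Matlab convention: [X Y Z 1] * P = w*[u v 1]).

g = linspace(-1, 1, 100);                     % assumed object bounding box
[X, Y, Z] = meshgrid(g, g, g);
voxels = [X(:), Y(:), Z(:)];                  % N x 3 candidate voxel centers
keep = true(size(voxels, 1), 1);              % start from the full volume

for k = 1:numel(silhouettes)
    % Project every voxel center into view k.
    proj = [voxels, ones(size(voxels, 1), 1)] * camMats{k};
    u = round(proj(:, 1) ./ proj(:, 3));
    v = round(proj(:, 2) ./ proj(:, 3));

    % Carve away voxels that project outside the image or the silhouette.
    [h, w] = size(silhouettes{k});
    inside = u >= 1 & u <= w & v >= 1 & v <= h;
    inSil = false(size(voxels, 1), 1);
    inSil(inside) = silhouettes{k}(sub2ind([h, w], v(inside), u(inside)));
    keep = keep & inSil;
end

visualHull = voxels(keep, :);   % surviving voxels approximate the visual hull
```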
4. Experiments and Results

4.1. SFM from Two Views

With SFM from two views, the identified SURF features of the first image are shown in Figure 2(a). The tracked correspondences between the two images are shown in Figure 2(b). The resulting identified epipolar inliers are shown in Figure 2(c). The dense reconstruction of the 3D scene, along with the camera positions, is shown in Figure 2(d).


Figure 2. (a) 150 strongest SURF corners from the first image; (b) tracked correspondences between the two images; (c) identified epipolar inliers; (d) dense reconstruction of the 3D scene and the camera positions.

4.2. SFM from Multiple Views

With SFM from multiple views, we can also acquire a dense 3D reconstruction of the same scene as used in two-view SFM. The original input image sequence is shown in Figure 3(a), and the dense 3D reconstruction of the scene (from two viewing angles) is shown in Figure 3(b).

Figure 3. (a) Original input image sequence; (b) dense 3D reconstruction of the scene (from two viewing angles).

By adjusting the threshold on the number of key features to keep, the density of the reconstructed points varies. When the threshold is low, the reconstruction is dense but has more outliers; when the threshold is high, the reconstruction is sparser with fewer outliers. In multi-view SFM, it is more difficult to extract correspondence epipolar inliers across all the views, resulting in a less dense reconstruction compared to the two-view case. However, the advantage of multi-view SFM is that, since we input images from more viewing angles, the reconstructed model reveals more 3D information about the object.

4.3. Princeton SFMedu

The Princeton SFMedu package also features SFM reconstruction with multiple uncalibrated views. The reconstruction results for various scenes are shown in Figure 4, with the left part showing the original input image sequence and the right part showing the reconstructed 3D models. The difference between this package and our SFM algorithm is that it uses SIFT features instead of SURF features. It also creates a dense 3D reconstruction as a "patchwork" of the input image segments, instead of a collection of bare 3D points, which makes the reconstructed model look more like the real object.

Figure 4. Reconstruction results of various scenes with the Princeton SFMedu package. The 3D scene snapshots are taken from MeshLab.
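Returning to the merging step described in Sections 3.2.2 and 4.2, the refinement can be sketched with the view-set and bundle-adjustment utilities used in the MathWorks multi-view example [9]. Here vSet is assumed to be a viewSet whose views and feature matches were accumulated while processing the image sequence.

```matlab
% Multi-view refinement sketch: merge point tracks across views and run
% bundle adjustment to minimize the total reprojection error.
tracks = findTracks(vSet);              % point tracks spanning all views
camPoses = poses(vSet);                 % current camera pose estimates

% Triangulate an initial 3D point for every track.
xyzPoints = triangulateMultiview(tracks, camPoses, cameraParams);

% Jointly refine the 3D points and the camera poses; the first camera is
% held fixed to pin down the global coordinate frame.
[xyzRefined, camPosesRefined] = bundleAdjustment(xyzPoints, tracks, ...
    camPoses, cameraParams, 'FixedViewId', 1, 'PointsUndistorted', true);

vSet = updateView(vSet, camPosesRefined);   % store the refined poses
```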

4.4. Space Carving

The intrinsic and extrinsic camera parameters required for space carving are acquired with our multi-view SFM algorithm. We have also manually removed the background of the input images for better silhouette generation. One of the original input images (after background removal) and its associated silhouette are shown in Figure 5. The reconstruction result after one carving is shown in Figure 6(a). The result after all (four) carvings is shown in Figure 6(b). The final carving result, with the input camera positions and associated images, is shown in Figure 6(c).

Figure 5. One of the original input images for space carving and its associated silhouette.

Figure 6. (a) Reconstruction result after one carving; (b) reconstruction result after all carvings; (c) final carving result with input camera positions and associated images.

In the results shown above, we have enlarged the input images in order to clearly show the projection of the voxels onto them. From Figure 6(a), we can see that the "cross" structure of the object is clearly visible after one carving. However, after all four carvings, the reconstructed model does not accurately represent the original object. This is because the camera parameters acquired with the SFM algorithm are not accurate enough. From Figure 6(c), we can see that the acquired camera positions are not consistent with each other, resulting in the cut-out of some parts of the visual hull that actually belong to the object. We have found that a slight inaccuracy in the camera parameters can lead to a large error in the reconstructed model with space carving.
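A practical safeguard suggested by this failure (a hypothetical check, not part of our submitted pipeline) would be to score every SFM-estimated view by its mean reprojection error and exclude inconsistent views before carving:

```matlab
% Sketch: vet each view's camera matrix before trusting it for carving.
% worldPts is the N x 3 SFM point cloud; imagePts{k} holds the matching
% N x 2 observations in view k; camMats{k} is that view's camera matrix.
for k = 1:numel(camMats)
    proj = [worldPts, ones(size(worldPts, 1), 1)] * camMats{k};
    uv = proj(:, 1:2) ./ proj(:, 3);
    err = sqrt(sum((uv - imagePts{k}).^2, 2));    % per-point pixel error
    fprintf('View %d: mean reprojection error = %.2f px\n', k, mean(err));
end
```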

4.5. Voxel Coloring

The reconstruction result for a provided example (a sitting teddy bear) in the Archimedes executable [11], which employs the generalized voxel coloring concept, is shown in Figure 7. The result is shown from two viewing angles, both in cubes (Figure 7(a)) and with a smooth surface (Figure 7(b)). Since this implementation requires a specific input file type, we did not have the time to experiment with our own images. This will be a future step of our work.
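The photo-consistency test at the heart of voxel coloring can be summarized in a few lines: a voxel survives only if the colors it projects to in the views that see it agree. Below is a simplified Matlab sketch for a single voxel, ignoring the occlusion bookkeeping that a full implementation such as Archimedes [11] must handle; vx, images, camMats and colorThreshold are assumed inputs.

```matlab
% Voxel coloring sketch (no occlusion handling). vx is a 1 x 3 voxel center;
% images{k} and camMats{k} are the calibrated views; the projection is
% assumed to land inside every image.
colors = zeros(numel(images), 3);
for k = 1:numel(images)
    p = [vx, 1] * camMats{k};
    u = round(p(1) / p(3));
    v = round(p(2) / p(3));
    colors(k, :) = double(squeeze(images{k}(v, u, :)))';   % RGB sample
end
if max(std(colors)) < colorThreshold    % photo-consistency test
    voxelColor = mean(colors);          % keep the voxel, assign mean color
end
```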

Figure 7. Reconstruction result of a sitting teddy bear in the Archimedes executable [11], shown (a) in cubes and (b) with a smooth surface.

4.6. Commercial 3D Reconstruction Application - Autodesk 123D Catch

As the last step, we experimented with the smart phone based 3D reconstruction application Autodesk 123D Catch [12]. Two reconstruction examples are shown in Figure 8. The left part shows one of the input images (front view) and the right part shows the reconstruction result from a few different viewing angles. Each reconstruction requires around 20-30 input images taken from different views.

Figure 8. Reconstruction examples with Autodesk 123D Catch [12].

Compared to the other methods studied, the Autodesk 123D Catch application gives the best and most robust reconstructions of 3D scenes. For example, we tried to use our SFM algorithm to reconstruct the first scene (the flower); however, due to the small number of identified inlier correspondences, the reconstruction was not successful. The Autodesk 123D Catch application also makes use of image refining methods to make the reconstructed model look very similar to the real object. The disadvantage of this application is that it tends to take a long time (around one hour) to reconstruct a 3D scene from 20-30 input images.

5. Conclusions

To achieve 3D reconstruction from uncalibrated views, we applied SFM, space carving, and voxel coloring to different scenes with a wide range of geometric complexity. We also tested several off-the-shelf packages and examined their performance in different settings.

In two-view SFM, we compute the camera parameters by finding point correspondences and estimating the fundamental matrix. In multi-view SFM, we merge and refine the point clouds using bundle adjustment. The reconstruction is denser with two-view SFM, because it is difficult to find correspondence inliers across all the views in multi-view SFM. However, the resulting model from the multi-view reconstruction possesses more 3D spatial information. In comparison, with an additional multi-view stereo step that iteratively searches for nearby patches, the Princeton SFMedu package can reconstruct models that resemble the real target objects.

Our space carving algorithm takes the camera parameters from our SFM algorithm as input. The one-carving result reveals the basic structure of the 3D object, but the multiple-carving result is too susceptible to inaccuracy in the input camera parameters to preserve the entire 3D structure. We also tested a voxel coloring package with a provided example and will further the investigation with our own images in the future.

Despite the long running time, the commercial Autodesk 123D Catch application provides robust and accurate 3D models. This is partly due to its ability to integrate a relatively large number of images and to its refining algorithm for the reconstructed model.

Please find our project code at this link: https://drive.google.com/open?id=0B0A7ShsB5wBgYTR0QVlwVTdPRVk.

6. References

[1] https://en.wikipedia.org/wiki/3D_reconstruction_from_multiple_images

[2] Saxena, Ashutosh, Sung H. Chung, and Andrew Y. Ng. "3-D depth reconstruction from a single still image." International Journal of Computer Vision 76.1 (2008): 53-69.

[3] Saxena, Ashutosh, Min Sun, and Andrew Y. Ng. "Make3D: Learning 3D scene structure from a single still image." IEEE Transactions on Pattern Analysis and Machine Intelligence 31.5 (2009): 824-840.

[4] Snavely, Noah, Steven M. Seitz, and Richard Szeliski. "Photo tourism: exploring photo collections in 3D." ACM Transactions on Graphics (TOG). Vol. 25. No. 3. ACM, 2006.

[5] Kutulakos, Kiriakos N., and Steven M. Seitz. "A theory of shape by space carving." International Journal of Computer Vision 38.3 (2000): 199-218.

[6] Fitzgibbon, Andrew W., Geoff Cross, and Andrew Zisserman. "Automatic 3D model construction for turn-table sequences." 3D Structure from Multiple Images of Large-Scale Environments. Springer Berlin Heidelberg, 1998. 155-170.

[7] Seitz, Steven M., and Charles R. Dyer. "Photorealistic scene reconstruction by voxel coloring." Proceedings of the 1997 IEEE Computer Society Conference on Computer Vision and Pattern Recognition. IEEE, 1997.

[8] http://www.mathworks.com/help/vision/examples/structure-from-motion-from-two-views.html

[9] http://www.mathworks.com/help/vision/examples/structure-from-motion-from-multiple-views.html

[10] http://robots.princeton.edu/courses/SFMedu

[11] http://matt.loper.org/Archimedes/Archimedes_docs/html/index.html

[12] http://www.123dapp.com/catch
