STEREO CALIBRATION OF DEPTH SENSORS WITH 3D CORRESPONDENCES AND SMOOTH INTERPOLANTS

A Thesis Presented to the Faculty of the Department of Computer Science, University of Houston, in Partial Fulfillment of the Requirements for the Degree Master of Science

By Chrysanthi Chaleva Ntina

May 2013

APPROVED:
Dr. Zhigang Deng, Chairman, Dept. of Computer Science
Dr. Guoning Chen, Dept. of Computer Science
Dr. Mina Dawood, Dept. of Civil and Environmental Engineering
Dean, College of Natural Sciences and Mathematics

Abstract

The Microsoft Kinect is a novel sensor that, besides color images, also returns the actual distance of the captured scene from the camera. Its depth sensing capabilities, along with its affordable, commercial-type availability, led to its quick adoption for research and applications in Computer Vision and Graphics. Recently, multi-Kinect systems have been introduced to tackle problems such as body scanning, scene reconstruction, and object detection. Multiple-camera configurations, however, must first be calibrated to a common coordinate system, i.e., the relative position of each camera needs to be estimated with respect to a global origin. Up to now, this has been addressed by applying well-established calibration methods developed for conventional cameras. Such approaches do not take advantage of the additional depth information, and they disregard the quantization error model introduced by the depth resolution specifications of the sensor. We propose a novel algorithm for calibrating a pair of depth sensors, based on an affine transformation recovered from very few 3D point correspondences and refined under a non-rigid registration that accounts for the non-linear sensor acquisition error. The result is a closed-form mapping, of the smooth warping type, that compensates for pairwise differences in the calibration points. The formulation is further complemented by two different ways of efficiently capturing candidate calibration points. Qualitative 3D registration results show significant improvement over the conventional rigid calibration method and highlight the potential for advanced, more accurate multi-sensor configurations.

Contents

1 Introduction
  1.1 Motivation
  1.2 Purpose and Aim
  1.3 Challenges
  1.4 Contributions
  1.5 Outline

2 Related Work
  2.1 Multi-Kinect Calibration Approaches
    2.1.1 Conventional Pairwise Stereo Calibration
    2.1.2 Global Calibration and Refinements
    2.1.3 Other Methods
    2.1.4 Limitations of Existing Approaches
  2.2 Capturing Calibration Points

3 The Microsoft Kinect
  3.1 Sensor Description
    3.1.1 Kinect for XBOX vs Kinect for Windows
    3.1.2 Hardware Components
    3.1.3 Software Drivers
  3.2 Sensor Calibration
    3.2.1 Color Camera Intrinsics
    3.2.2 Infrared Camera Intrinsics
    3.2.3 Color-IR Stereo Calibration
  3.3 The Depth Capturing System
    3.3.1 Acquiring Depth Data
    3.3.2 The Depth Error Model

4 Non-rigid Calibration
  4.1 Initial Stereo Calibration
  4.2 Insufficiency of the Rigid Assumption
  4.3 Non-rigid Correction
    4.3.1 Thin Plate Splines
    4.3.2 Approximating the Mapping Function with RBFs

5 Capturing Calibration Points
  5.1 Using the Infrared Image
    5.1.1 Obtaining 2D Points
    5.1.2 Transforming Points to 3D
    5.1.3 Using RGB Instead of IR Camera
  5.2 Using the Depth Map Directly
    5.2.1 Background Removal
    5.2.2 RANSAC-based Model Fitting
    5.2.3 MLS Resampling
    5.2.4 Center Extraction

6 Reconstruction Experiments and Calibration Evaluation
  6.1 Experimental Setup
    6.1.1 Kinect Network
    6.1.2 Multiple Kinects Interference
  6.2 Preprocessing for Data Registration
    6.2.1 Background Removal
    6.2.2 Converting Depth to Point Clouds
    6.2.3 Coloring the Point Clouds
  6.3 Registration Results and Comparison
    6.3.1 Data, Methods, Setups, Visuals
    6.3.2 Visual Registration Comparisons
    6.3.3 Qualitative and Comparative Analysis

7 Conclusion
  7.1 Future Work

Bibliography

List of Figures

2.1 Three Kinects setup (Berger et al. [7])
2.2 Three Kinects setup for scanning the human body (Tong et al. [69])
2.3 Four Kinects setup and calibration object (Alexiadis et al. [5])
2.4 Calibration objects in RGB, depth and IR (Berger et al. [8])
3.1 The Microsoft Kinect device
3.2 Valid depth values
3.3 Microsoft Kinect components [54]
3.4 RGB and corresponding IR frames used for intrinsics calibration
3.5 Uncalibrated color frame mapped to corresponding depth
3.6 RGB-IR calibration interface
3.7 Actual depth distance measured by the Kinect sensor
3.8 The IR speckled pattern emitted by the laser projector
3.9 The triangulation process for depth from disparity
4.1 Local and global coordinate systems for two sensors
4.2 Error during depth capturing
4.3 Error difference in calibration points captured by two sensors
4.4 Thin Plate Splines interpolation (from [22])
4.5 Calibration steps
5.1 Interface to capture infrared (top) and depth (bottom) images for two Kinects
5.2 Example setup with infrared and depth images captured
5.3 Detected 2D points mapped to 3D
5.4 Geometry of the pinhole camera model
5.5 Detected checkerboard corners converted to 3D point cloud
5.6 Ball quantization step
5.7 Moving Least Squares to upsample sphere
5.8 Calibration points using the depth map
6.1 Server interface with two connected Kinect clients
6.2 Interference depending on relative sensor position
6.3 Dot pattern interference with and without enforced motion blur
6.4 Frames for building a background depth model
6.5 Background subtraction steps
6.6 Depth map converted to point cloud
6.7 Coloring of point cloud through mapping of the RGB frame
6.8 Uncalibrated point clouds
6.9 (a) Conventional and (b) our registration results
6.10 Uncalibrated point clouds
6.11 (a) Conventional and (b) our registration results
6.12 Colored registered point clouds using (a) conventional calibration and (b) our method
6.13 Variance of the proposed method
6.14 Registration using our method for different poses (a) and (b)
6.15 Registration using our method for different scenes (a) and (b)
6.16 Registration results of (a) conventional and (b) our method in clouds with very little overlap
6.17 Registration results of (a) conventional and (b) our method in clouds with almost no overlap
List of Tables

3.1 Comparison of available Kinect drivers
3.2 Intrinsic parameters for RGB camera
3.3 Distortion coefficients for RGB camera
3.4 Intrinsic parameters for IR camera
3.5 Distortion coefficients for IR camera
6.1 Color-coded point clouds

Chapter 1

Introduction

In this chapter we state the research topic and motivation of this thesis, and provide an outline of its contents.

1.1 Motivation

The release of the Microsoft Kinect sensor in 2010 gave a new boost to Computer Vision and Graphics research and applications. With its depth sensing capabilities, a new source of scene data, and its affordable, commercial-type availability, it provides a fast and convenient way of enhancing camera setups. The Kinect is a novel sensing device that, apart from being a conventional RGB camera, also incorporates a depth sensor providing the depth value of each frame pixel as a distance from the camera in millimeters. In the relatively few years that have elapsed since its release, a large body of work has emerged in the literature across diverse fields, related to using, configuring, or extending the new framework.

At an algorithmic level, Kinect data have been used with conventional image processing methods for human detection [77], person tracking [45], [37], and object detection [65]. Graphics applications include face [82], [32], body [20], [18], [69], [74], and shape [21], [19] scanning, as well as markerless motion capture [8] and scene reconstruction [35], [75]. A range of additional fields have also taken advantage of the versatility of Kinect data for solving traditional or newly emerging problems beyond the originally intended gaming and motion-based interfaces; examples include Robotics, Human-Computer Interaction (HCI), Animation, Smart(er) Interfaces, and more. Applications in HCI range from new interactive designs [58], [55], [81], [42], [76] to immersive and virtual reality [72]. In robotics, the sensor's portability and high frame rate [61] have been employed, e.g., by mounting Kinects on mobile robots for vision-based navigation, obstacle avoidance [52], [9], and indoor scene modeling [31], [70]. Moreover, specialized application domains have emerged by adopting the Kinect in place of traditional cameras; in healthcare, for example, Kinect sensors have been used for patient rehabilitation [44], [16]. These applications being only indicative, the diversity of the domains reflects the wide range of problems involving shape, scene, and motion measurements in three-dimensional space.
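As context for the pairwise calibration problem described in the abstract, the conventional baseline that the thesis improves upon is a closed-form rigid alignment between the 3D points captured by two sensors. Below is a minimal sketch of that rigid step (the Kabsch/Horn least-squares solution); the function name, the NumPy dependency, and the synthetic usage data are illustrative assumptions, not code or data from the thesis.

```python
import numpy as np

def rigid_transform_3d(P, Q):
    """Least-squares rigid alignment: find R, t minimizing
    sum_i ||R @ P[i] + t - Q[i]||^2 for corresponding (N, 3) point sets."""
    # Center both point sets on their centroids.
    cP, cQ = P.mean(axis=0), Q.mean(axis=0)
    # SVD of the 3x3 cross-covariance matrix of the centered sets.
    U, _, Vt = np.linalg.svd((P - cP).T @ (Q - cQ))
    # Reflection guard: force det(R) = +1 so R is a proper rotation.
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    t = cQ - R @ cP
    return R, t

# Hypothetical usage: a few corresponding calibration points (meters),
# with the second "sensor" rotated 30 degrees and translated.
P = np.array([[0.0, 0.0, 1.2], [0.5, 0.1, 1.5], [0.2, 0.4, 2.0], [0.7, 0.3, 1.8]])
a = np.deg2rad(30.0)
R_true = np.array([[np.cos(a), -np.sin(a), 0.0],
                   [np.sin(a),  np.cos(a), 0.0],
                   [0.0,        0.0,       1.0]])
Q = P @ R_true.T + np.array([0.1, -0.2, 0.05])
R, t = rigid_transform_3d(P, Q)
print(np.allclose(P @ R.T + t, Q))  # True for noise-free correspondences
```

The contribution outlined in the abstract is to refine exactly this kind of rigid estimate with a smooth non-rigid warp (thin plate splines approximated with RBFs) that absorbs the sensor's non-linear depth quantization error; the sketch above is only the rigid baseline that such a warp corrects.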