DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS
AND THE MAIN FIELD OF STUDY ELECTRICAL ENGINEERING, SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Calibration using a general homogeneous depth camera model

DANIEL SJÖHOLM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master’s Thesis at CSC/CVAP
Supervisor: Magnus Burenius
Supervisor at KTH: Patric Jensfelt
Examiner: Joakim Gustafson

Abstract

Being able to accurately measure distances in depth images is important for reconstructing objects faithfully. But the measurement of depth is a noisy process, and depth sensors benefit from additional correction even after factory calibration. We regard the pair of depth sensor and image sensor as one single unit returning complete 3D information. The 3D information is combined by relying on the more accurate image sensor for everything except the depth measurement. We present a new linear method of correcting depth distortion, using an empirical model built around the constraint of only modifying depth data while keeping planes planar. The depth distortion model is implemented and tested on the Intel RealSense SR300 camera. The results show that the model is viable and generally decreases depth measurement errors after calibration, with an average improvement in the 50 % range on the tested data sets.

Referat

Calibration using a general homogeneous depth camera model

Being able to measure distances accurately in depth images is important for making good reconstructions of objects. But this measurement process is noisy, and today's depth sensors benefit from additional correction after factory calibration. We regard the pair of a depth sensor and an image sensor as a single unit that returns complete 3D information. The 3D information is built up from the two sensors by relying on the more precise image sensor for everything except the depth measurement. We present a new linear method for correcting depth distortion using an empirical model, built around modifying only the depth data while keeping planar surfaces planar. The depth distortion model was implemented and tested on the Intel RealSense SR300 camera. The results show that the model works and generally reduces the depth measurement error after calibration, with an average improvement of around 50 % for the tested data sets.

Acknowledgments

I would like to thank Magnus Burenius for offering the thesis project and supervising me. I am especially grateful that he shared his devised linear depth distortion model, allowing me to write my thesis on it. I would in addition like to thank everyone who has taken part in discussions regarding the camera internals and possible physical causes of the distortion.

Contents

1 Introduction
  1.1 Objectives
  1.2 Outline

2 Background
  2.1 Projective geometry
  2.2 Geometric transformations
    2.2.1 Notation
    2.2.2 Rotations
    2.2.3 Euclidean transformations
    2.2.4 Similarity transformation
    2.2.5 Affine transformation
    2.2.6 Projective transformation
    2.2.7 Extension to 3D
  2.3 Camera model
    2.3.1 Extension to depth cameras
    2.3.2 Lens correction model
    2.3.3 Calibration outline
  2.4 Depth sensor
    2.4.1 Structured light
    2.4.2 Depth versus lens distortion
  2.5 Accuracy metrics
    2.5.1 Reprojection error
    2.5.2 Depth accuracy
    2.5.3 Depth and image combined

3 Related work
  3.1 Camera calibration
  3.2 Depth cameras

4 Theory
  4.1 Homogeneous depth camera model
  4.2 Projective depth distortion
  4.3 General homogeneous depth camera model
  4.4 Calibrating the depth distortion parameters analytically
  4.5 Iterative reweighted least squares
  4.6 Calibrating the depth-distortion parameters non-linearly
  4.7 Parameter interpretation
  4.8 Parameter visualization

5 Implementation
  5.1 Physical hardware
  5.2 Calibration pattern
  5.3 Data collection
  5.4 Calibration algorithm
  5.5 Evaluation
    5.5.1 Dot error
    5.5.2 Plane error

6 Results
  6.1 Data set list
  6.2 Complete model with lens distortion
    6.2.1 Parameter stability
    6.2.2 Difference images
    6.2.3 Mean image error
  6.3 2 parameter model (x, w) with lens distortion
  6.4 Model comparison
  6.5 Multi camera comparison
    6.5.1 Camera 2
    6.5.2 Camera 3
    6.5.3 Camera 4

7 Discussion and conclusions
  7.1 Discussion
    7.1.1 Lens distortion
    7.1.2 Parameter stability
    7.1.3 Error metrics
    7.1.4 Measurement error
  7.2 Future work
  7.3 Conclusions

Appendices

A Numerical results camera 1

B Depth distortion properties

C Ethical considerations

Bibliography

Chapter 1

Introduction

The camera has been a huge part of society ever since its invention. Its progress into the digital world, with digital cameras and small embedded optics in today's smartphones, shows that it is not going to disappear anytime soon. As is well known, a camera captures a two dimensional image of our three dimensional world. From this image it is in some situations possible to reconstruct parts of the captured 3D world. However, this reconstruction is generally not well defined. Scale and even perspective can be completely off from the true situation. A way to enable a reconstruction which is correct both in perspective and in scale is to calibrate the camera. Calibration establishes a connection between the metric properties of the camera, such as its position in space and the distance between the lens and the optical center (focal length), and the images captured by it.

More than one 2D image is required to infer general information about a 3D scene. This can be solved by capturing multiple images of the scene from different angles or by extending the information present in the image with data from another sensor. Using a stereo setup of cameras, one can begin to reconstruct depth by matching features between the two images. A well textured scene is however required in order to match features between the images and draw conclusions about the depth. The distance to the object can also not be too great in comparison to the inter-camera distance, or the views become too similar.

A quite recent development in the direction of distance measuring by cameras is to bundle a depth sensor with a camera into a so called RGB-D camera (Red, Green, Blue and Depth), which provides even more information. The depth sensing is implemented as a separate sensor, which internally might utilize stereo vision or another technique. This has received new attention especially after the release of the Microsoft Kinect camera for the Xbox gaming system, whose cheap hardware with acceptable performance attracted new research. The bundled depth sensing is of course welcome in all kinds of applications, as mimicking human vision requires acquiring images as well as information about the distance to the imaged objects. Fields taking explicit advantage of the depth information include creating 3D models of real life objects [21], which in turn can be useful for tracking, taking measurements on the object or re-creating it by 3D printing. The depth information can assist in visual odometry [25] and its extension to Simultaneous Localization and Mapping (SLAM) [26, 34]. It is also of use whenever image segmentation based on depth is required, such as in object recognition [16] and related applications [16, 22].

Mapping using a camera coupled with a depth sensor is a procedure which instantaneously captures a dense representation of the view. Two alternatives to this are to scan a static scene with a laser scanner or to measure objects through tactile measurements. These methods generally produce depth measurements of higher accuracy than the RGB-D system in focus in this thesis, but the measurements take a long time, which makes them unsuitable for dynamic scenes, leads to long wait times, and they generally cost a lot more than an RGB-D camera setup. These methods are also generally adapted either for long-range measurements or for contact measurement at very close range. RGB-D cameras fit in the middle, allowing relatively precise measurements in the single meter distance range.

The act of calibrating a camera is essentially to model it mathematically. This is done by gathering data to determine the parameters of the mathematical model, so that more information about the real world can be extracted from the images captured by the camera.

1.1 Objectives

This thesis looks at improving the spatial accuracy of the depth data returned by an active stereo vision system, by regarding the camera system as a complete 3D imaging device rather than an imaging device coupled with a depth sensor. The end goal is to enable more accurate measurements by modeling distortions in the depth data. We achieve this through a joint camera model of image and depth. The model contains a linear correction for the depth data, utilizing the image data in order to improve the depth data. This previously unpublished linear correction model of depth is presented and tested in practice. Achieving the above requires calibrating the camera, which is simply another name for determining the parameters in the model.

1.2 Outline

The report is structured as follows. Chapter 2 summarizes the necessary theory and gives a brief description of active stereo cameras. Chapter 3 reviews previous work on both ordinary camera calibration and depth cameras. The thesis contributions start in Chapter 4, where the new theory is presented. In Chapter 5 the experimental setup is explained together with the methods used for evaluating results, and the results are presented in Chapter 6. Finally, conclusions are drawn and potential future work is suggested in Chapter 7.

Chapter 2

Background

This section introduces fundamental concepts and theory required to understand this thesis. Most of this section is inspired by [9]. It first introduces projective geometry, which is helpful for understanding the upcoming section on the camera model, continues with the linear transformations which relate the physical world to the camera, and ends with the various properties utilized for calibrating the camera.

2.1 Projective geometry

Scenes in reality often have parts in them that are effectively at infinity, such as the horizon or the point where parallel lines appear to converge, as in Figure 2.1. This is a consequence of the perspective nature of vision. In order to describe this in a mathematical framework, one can introduce homogeneous coordinates by extending our 2D and 3D euclidean geometry with an additional dimension. The extra dimension allows us to represent even infinite points with finite coordinates. In homogeneous coordinates, we extend the euclidean 2D plane (x, y) by adding another dimension, (x, y, w), where w is any scalar.

Figure 2.1: Example of a point at infinity where the parallel railroad tracks converge. (Photo: Martin Winkler, Fotoworkshop4You on pixabay.com, CC0 license.)


Homogeneous coordinates are only defined up to a scale, and as such the value of the scalar (assumed $\neq 0$) is of less importance. That is, the point (1, 2) in 2D euclidean space turns into (1, 2, 1) in homogeneous coordinates, but is also equal to (3, 6, 3) = 3(1, 2, 1) due to the scale invariance.

The shorthand notation for euclidean N-dimensional space is $\mathbb{E}^N = \mathbb{R}^N$. The corresponding projective space is $\mathbb{P}^N = \mathbb{R}^{N+1} \setminus (0, \ldots, 0)$, which has N + 1 dimensions. For the two dimensional case the matching spaces are $\mathbb{E}^2$ and $\mathbb{P}^2$, where $\dim \mathbb{E}^2 = 2$ and $\dim \mathbb{P}^2 = 3$. To convert back from homogeneous coordinates into euclidean, one divides all elements in the vector by the last value, and subsequently discards the last dimension,
\[
\begin{pmatrix} x \\ y \\ w \end{pmatrix} \mapsto \begin{pmatrix} x/w \\ y/w \end{pmatrix}
\tag{2.1}
\]
which is a transformation from $\mathbb{P}^2 \mapsto \mathbb{E}^2$. Homogeneous coordinates allow representing points at infinity by setting the last dimension equal to zero, which captures infinite distance in a concise representation without having to actually divide by 0 in euclidean space. The conversion keeps its structure when working in three dimensional spaces,
\[
\begin{pmatrix} x \\ y \\ z \\ w \end{pmatrix} \mapsto \begin{pmatrix} x/w \\ y/w \\ z/w \end{pmatrix}
\tag{2.2}
\]
which is a transformation from $\mathbb{P}^3 \mapsto \mathbb{E}^3$.
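As an illustration, a minimal numpy sketch of the conversion in Eqs. (2.1)-(2.2) might look as follows (the function names are our own and not part of the thesis):

import numpy as np

def to_homogeneous(points):
    """Append a w = 1 coordinate to Euclidean points (one point per row)."""
    points = np.atleast_2d(points)
    return np.hstack([points, np.ones((points.shape[0], 1))])

def to_euclidean(points_h):
    """Divide by the last coordinate and drop it, as in Eqs. (2.1)/(2.2)."""
    points_h = np.atleast_2d(points_h)
    return points_h[:, :-1] / points_h[:, -1:]

# (1, 2, 1) and (3, 6, 3) represent the same Euclidean point (1, 2)
print(to_euclidean(np.array([[1.0, 2.0, 1.0], [3.0, 6.0, 3.0]])))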

2.2 Geometric transformations

Geometric transformations used in this thesis will be linear, invertible 2- and 3-dimensional transformations. They can be ordered by their complexity in terms of degrees of freedom (dof) and the properties that remain invariant after transformation. The transformations are outlined here more thoroughly in two dimensions for easier mental visualization, but preserve key characteristics when utilized in three dimensions. Extending into 3D, the degrees of freedom increase and with them the number of invariant properties decrease.

The non-invariant properties of a transformation can be used to undo a transformation by identifying it, such as the perspective transformation a scene undergoes when observed by our eyes or through a camera. Planes in the real world that aren't parallel to the image plane can also be rectified to look as if they were, since they are related by a homography. See for example Figure 2.2.

All linear transformations in this thesis will be performed as matrix multiplications from the left with points stored as column vectors.


Figure 2.2: Example of a perspective distortion and planar rectification. (a) Original image; (b) rectified to the plane of one of the walls.

For a given transformation $T : \mathbb{P}^2 \to \mathbb{P}^2$, source point $P = (X, Y, W)^T \in \mathbb{P}^2$ and transformed point $P' \in \mathbb{P}^2$, this becomes
\[
P' = T P = T \begin{pmatrix} X \\ Y \\ W \end{pmatrix}
\tag{2.3}
\]

2.2.1 Notation

The notation of this thesis is as follows. Scalars are written as lowercase letters, like 1 or $d_x$. Matrices are written as uppercase letters, like $R$. Vectors are lowercase and bold, like $\mathbf{t}$. An exception to this is points, which may be written in uppercase as well in order to be more distinct. Points in either the real world or the image are represented like $X$ or $x$. Points and vectors are written in column form unless otherwise stated. Placeholders in equations and their explanations are marked by the sign $\bullet$.

2.2.2 Rotations

Rotations can introduce ambiguity as to how the coordinate system is oriented after a performed rotation. Rotations from the special orthogonal group counteract the ambiguity by imposing constraints on the rotation matrix to preserve orientation and scale. The special orthogonal group enforces that a rotation matrix $R$ must fulfill $\det R = 1$ and $R^T R = I$. All rotation matrices in this thesis belong to this group. A rotation matrix in the 2D plane which rotates points counter clockwise by the angle $\theta$ is written as
\[
R = \begin{pmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{pmatrix}
\tag{2.4}
\]

for points in Cartesian coordinates, and

\[
T = \begin{pmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} R & \mathbf{0} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{2.5}
\]
for points in homogeneous coordinates. As can be seen, this transformation provides 1 degree of freedom, which is represented by the variable $\theta$. The rest of the transformations in this section will only be described for points in homogeneous coordinates.

2.2.3 Euclidean transformations

Euclidean transformations model rigid motion as translations and rotations. They have the fewest degrees of freedom of the transformation classes, offering only 3 degrees of freedom: 2 from translation and 1 from rotation.

\[
T = \begin{pmatrix} \cos\theta & -\sin\theta & t_x \\ \sin\theta & \cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{2.6}
\]

2.2.4 Similarity transformation

Similarity transformations extend euclidean transformations by allowing a uniform scale factor. They model translations, rotations and scaling, offering 4 degrees of freedom, one more than the euclidean transformation thanks to the scale factor.

\[
T = \begin{pmatrix} s\cos\theta & -s\sin\theta & t_x \\ s\sin\theta & s\cos\theta & t_y \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} sR & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{2.7}
\]

2.2.5 Affine transformation

Affine transformations no longer require a pure rotation in the upper-left block, but instead allow arbitrary entries. They model translations, rotations, skew and per-axis scaling, which adds up to 6 degrees of freedom.

\[
T = \begin{pmatrix} a_{11} & a_{12} & t_x \\ a_{21} & a_{22} & t_y \\ 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{2.8}
\]


2.2.6 Projective transformation

In 2D projective space, the most general transformation is a so called projective transform or homography [9],
\[
H = \begin{pmatrix} h_{11} & h_{12} & h_{13} \\ h_{21} & h_{22} & h_{23} \\ h_{31} & h_{32} & h_{33} \end{pmatrix}
\tag{2.9}
\]
which has 8 degrees of freedom. The matrix offers 9 entries, but loses one degree of freedom due to the scale invariance of homogeneous coordinates. Among the few properties preserved by a projective transformation is collinearity of points; parallelism of lines is not preserved.

2.2.7 Extension to 3D

When extending to three dimensions the transformation matrices retain their general shape, but grow from 3 × 3 to 4 × 4 matrices. The block-matrix contents look the same, but the vectors and matrices grow to accommodate the added dimension. For example, an affine transformation in 3D looks like
\[
T = \begin{pmatrix} a_{11} & a_{12} & a_{13} & t_x \\ a_{21} & a_{22} & a_{23} & t_y \\ a_{31} & a_{32} & a_{33} & t_z \\ 0 & 0 & 0 & 1 \end{pmatrix} = \begin{pmatrix} A & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{2.10}
\]
and as such the transformations retain their block-matrix representation.
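To make the block structure concrete, here is a small numpy sketch (our own illustration, not code from the thesis) that builds 2D homogeneous transformation matrices of increasing generality and applies one to a homogeneous point:

import numpy as np

def euclidean_2d(theta, tx, ty):
    """Rigid motion, Eq. (2.6): rotation by theta plus translation (tx, ty)."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s, tx],
                     [s,  c, ty],
                     [0,  0,  1]])

def similarity_2d(scale, theta, tx, ty):
    """Similarity, Eq. (2.7): adds a uniform scale factor."""
    T = euclidean_2d(theta, tx, ty)
    T[:2, :2] *= scale
    return T

def affine_2d(A, tx, ty):
    """Affine, Eq. (2.8): arbitrary 2x2 block A plus translation."""
    T = np.eye(3)
    T[:2, :2] = A
    T[:2, 2] = [tx, ty]
    return T

# Apply to a homogeneous point (column vector), as in Eq. (2.3)
p = np.array([1.0, 2.0, 1.0])
print(similarity_2d(2.0, np.pi / 2, 1.0, 0.0) @ p)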

2.3 Camera model

A camera creates a mapping, or image, of our 3D world onto a 2D camera plane. Under the ideal assumption that there is no lens and thus no distortions affecting the light entering the sensor, a camera can be modeled as a pinhole camera, where all light enters through one single point, the optical center. The model allows us to map any ray of light from the real world into an image point on the plane, and vice versa, by drawing a straight line from the 3D point and finding its intersection with the image plane of the camera, see Figure 2.3. In a physical camera the actual image plane is located at a distance equal to the focal length $f$ behind the optical center, but we can model it as being in front for ease of calculation without issues.

Let us create a 3D coordinate system with the origin in the optical center and call it the Camera Coordinate Frame, with the $Z$ axis facing towards the image plane. A point in the 3D world with coordinates $(X, Y, Z)^T$ is then mapped into a point $(fX/Z, fY/Z, f)^T$ on the image plane, which can be seen from similar triangles [9]. Note however that this relation is non-linear due to the division by $Z$, something that we can adjust by using homogeneous coordinates.


Figure 2.3: Pinhole camera model with image plane in front of camera center C.

By disregarding the last dimension of the image coordinates (which is always equal to $f$, by construction), we have that
\[
\begin{pmatrix} X \\ Y \\ Z \end{pmatrix} \mapsto \frac{f}{Z} \begin{pmatrix} X \\ Y \end{pmatrix}
\tag{2.11}
\]
and we now have a pure 2D point. We can observe that any points on a line through the optical center will map to the same point in the image plane, and all depth information will be lost. Introducing homogeneous coordinates from Section 2.1, a world point will be projected upon our image coordinate $(x, y)^T$ according to
\[
\begin{pmatrix} x \\ y \\ 1 \end{pmatrix} \propto \lambda \begin{pmatrix} x \\ y \\ 1 \end{pmatrix} = \begin{pmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\tag{2.12}
\]
where $\lambda = Z$ is the scaling factor, multiplied to the other side of the equation from Eq. (2.11). It can be disposed of since the homogeneous representation is only defined up to a scale factor, and the representation is now fully linear.

Actual digital image coordinates usually start from one of the corners, however, rather than in the middle as specified here. The camera may also have various geometrical properties such as non-square pixels, or skewed or non-orthogonal axes, which is all captured by the so called camera calibration matrix,
\[
K = \begin{pmatrix} f_x & s & x_0 \\ 0 & f_y & y_0 \\ 0 & 0 & 1 \end{pmatrix}
\tag{2.13}
\]
where $x_0, y_0$ represent the shift of origin, $f_x, f_y$ the focal lengths in the two directions, which are ideally identical, and $s$ the skew. As can be seen, this matrix has 5 degrees of freedom and represents all the intrinsic parameters of the distortion free pinhole camera.

We return to representing the world and image coordinates in homogeneous coordinates. Let $X$ be a point in the world coordinate system, and $x$ the corresponding point in the image. We can now write the complete correspondence between $X$ and $x$ as
\[
x = \underset{3\times 3}{K}\,[\, \underset{3\times 3}{I} \mid \underset{3\times 1}{0} \,]\, X
\tag{2.14}
\]
which maps 3D world points expressed in the camera coordinate system into points on an image. For the extrinsic parameters, such as position and rotation of the camera relative to a world coordinate system, we introduce the translation vector $\mathbf{t}$ of the camera center relative to the world coordinate system, and its $3 \times 3$ rotation matrix $R$, which transforms Eq. (2.14) into

\[
x = \underset{3\times 3}{K}\,[\, \underset{3\times 3}{R} \mid \underset{3\times 1}{\mathbf{t}} \,]\, X = \underset{3\times 4}{P} X
\tag{2.15}
\]
where we call the product of the intrinsic and extrinsic parameters the projection matrix $P$. In terms of degrees of freedom, $R$ has 3 and the translation vector $\mathbf{t}$ contributes 3, totaling 6 degrees of freedom for position and rotation of the camera and bringing the total degrees of freedom for the entire system up to 11 when multiplied together with $K$. The extrinsic parameters are required to describe the camera position relative to the real world, which could be defined as a certain position on a calibration object, or relative to another camera in multi-camera systems, see Figure 2.4. Keeping track of points in the real world coordinate system and capturing from several different orientations then allows solving the equation with enough constraints on the solution.
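As a concrete illustration of Eq. (2.15), the following numpy sketch (our own, with invented example values) projects world points into pixel coordinates with a calibration matrix K and extrinsics [R | t]:

import numpy as np

def project(K, R, t, X_world):
    """Project Nx3 world points to Nx2 pixel coordinates via x = K [R | t] X, Eq. (2.15)."""
    X_h = np.hstack([X_world, np.ones((X_world.shape[0], 1))])   # to homogeneous
    P = K @ np.hstack([R, t.reshape(3, 1)])                      # 3x4 projection matrix
    x_h = (P @ X_h.T).T                                          # homogeneous image points
    return x_h[:, :2] / x_h[:, 2:]                               # divide by the scale factor

# Hypothetical intrinsics: 600 px focal length, 640x480 image; identity pose
K = np.array([[600.0, 0.0, 320.0],
              [0.0, 600.0, 240.0],
              [0.0, 0.0, 1.0]])
R, t = np.eye(3), np.zeros(3)
print(project(K, R, t, np.array([[0.1, -0.05, 1.0]])))   # a point 1 m in front of the camera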

Figure 2.4: Camera coordinate systems and the world coordinate system.


The projection matrix $P$ is a $3 \times 4$ matrix, but can be regarded as a composition of homographies and a $\mathbb{P}^3 \mapsto \mathbb{P}^2$ projection,
\[
P = [\,3 \times 3 \text{ homography}\,] \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \end{pmatrix} [\,4 \times 4 \text{ homography}\,]
\tag{2.16}
\]
giving another view of what the camera projection does mathematically.

2.3.1 Extension to depth cameras

A way of extending this 2D imaging to account for 3D measurements is simply to consider the homogeneous coordinate in the image view to be the depth. That is,
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = K\,[\, R \mid \mathbf{t} \,] \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix}
\tag{2.17}
\]

2.3.2 Lens correction model

The lens in front of the camera adds distortions to the image, which are not modeled by the pinhole camera model described up until now. There are two directions to keep in mind for distortion correction, out of which only one mirrors the true physical behavior of light. The physically accurate representation assumes that light enters through the pinhole model, which gives rise to a perfect image without distortions. In order to represent the actual image, this perfect image is distorted (which physically represents the influence of the lens and everything else not accounted for in the pinhole model) and the true distorted points in the image arise. The non-physical representation is the reverse mapping of the above, which takes the distorted points and undistorts them into the non-existing perfect points.

We thus seek to model the distortions caused by the lens. The lens distortion model is usually split up into a part correcting the radial displacements of image points due to lens distortion and a tangential part, correcting non-perfect alignment of the lens [11]. A common way to model lens distortion is using polynomials [20]. Usually only one direction is specified and then approximately inverted (since there is no analytical solution to the inversion problem) to yield the other [10, 20]. Modeling the distortions according to [4] as a forward model, taking undistorted points $x, y$ into their distorted counterparts $x', y'$, we have
\[
\begin{aligned}
x' &= x(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + 2 p_1 x y + p_2 (r^2 + 2 x^2) \\
y' &= y(1 + k_1 r^2 + k_2 r^4 + k_3 r^6) + p_1 (r^2 + 2 y^2) + 2 p_2 x y
\end{aligned}
\tag{2.18}
\]
where $r^2 = x^2 + y^2$, $k_i$ are the radial distortion coefficients and $p_i$ the tangential distortion coefficients. This polynomial method of distortion correction is also known as the Brown-Conrady model.
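A direct transcription of the forward model in Eq. (2.18) into numpy might look as follows (a sketch of the standard Brown-Conrady polynomial operating on normalized image coordinates; the coefficient values below are invented for illustration):

import numpy as np

def distort(x, y, k1, k2, k3, p1, p2):
    """Forward Brown-Conrady model, Eq. (2.18): ideal pinhole points -> distorted points."""
    r2 = x**2 + y**2
    radial = 1.0 + k1 * r2 + k2 * r2**2 + k3 * r2**3
    x_d = x * radial + 2.0 * p1 * x * y + p2 * (r2 + 2.0 * x**2)
    y_d = y * radial + p1 * (r2 + 2.0 * y**2) + 2.0 * p2 * x * y
    return x_d, y_d

# Example with small, made-up coefficients
print(distort(0.2, -0.1, k1=-0.1, k2=0.01, k3=0.0, p1=0.001, p2=-0.0005))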


2.3.3 Calibration outline

There are two main categories of calibration methods: auto-calibration and camera resectioning or photogrammetric calibration [31]. Auto-calibration uses several images of an unknown scene and determines the internal parameters of the camera by correlating the images with one another under the assumption of a static scene. Camera calibration/resectioning requires a known calibration object to be placed in the scene during calibration.

Auto-calibration derives all required properties from several images of the scene, without any prior knowledge of the scene. This is done by identifying constraints imposed by invariants of geometric transformations. Auto-calibration will not be studied in this thesis, due to a desire for a simpler relation between calibration scene and end result. Any further mentions of calibration will regard camera resectioning using a known calibration object, unless otherwise noted.

For camera resectioning using point targets, a good detection of points is necessary, in general down to sub-pixel accuracy (by interpolating the data that should be between the discrete pixels), with the number of points varying depending on the chosen method. A human can perform this sub-pixel selection of points, but it is tedious work and prone to mistakes, and improving it is an important task for increasing calibration accuracy [7, 6]. Making the extraction automatic and as precise as possible is one of the difficulties of high-accuracy calibration, but it pays off as it generates better data for the optimization. Once points have been extracted, a non-linear optimization procedure called bundle adjustment has to be conducted in order to find the physical parameters of the camera, which accounts for most of the computational time of calibration. Non-linear optimization also introduces a risk of finding a local minimum instead of the global one, requiring some thought when initializing the optimization.

Non-linear optimization

Non-linear optimization procedures work in an iterative manner in order to reach a minimum of a so called cost function. For bundle adjustment and camera calibration, the cost function often defines how large the error is between points in a captured image and the theoretical model of how the camera works.

An example formulation of a bundle adjustment problem follows, which minimizes a sum of a cost function over all the points in all images part of the procedure. We have $N_c$ points $\hat{p}_i = (x_i, y_i)$ in each image, and corresponding points $P_i = (X_i, Y_i, Z_i)$ in the world, for $i = 0, \ldots, N_c$. Let $K$ be the camera intrinsics of Eq. (2.13), and $R_c$ and $\mathbf{t}_c$ the camera extrinsics of Eq. (2.15) for image $c$, with $c = 0, \ldots, C$; thus $E_c = [\, R_c \mid \mathbf{t}_c \,]$. Lens distortions are modeled as in Section 2.3.2 as a function $L_f(p) \mapsto p'$, which transforms ideal pinhole points into points affected by lens distortions. The general cost function formulation is $f(\hat{p}, p) \mapsto \text{scalar}$. An often used metric is the euclidean distance between the two points, which will be detailed in Section 2.5.1. Minimizing over all the parameters $K, \{E_c\}, L_f$, the complete bundle adjustment problem looks like


\[
\underset{K, \{E_c\}, L_f}{\arg\min} \; \sum_{c}^{C} \sum_{i}^{N_c} f\bigl(\hat{p}_{i,c},\, L_f(K E_c P_{i,c})\bigr)
\tag{2.19}
\]
where $f$ commonly looks similar to $f(a, b) = \lVert a - b \rVert$. The actual optimization procedure carried out for finding the minimum is often a non-linear least squares procedure, which iteratively linearizes the function around the current best solution and then alters the parameters to decrease the value of the linearized function, which becomes the next iteration's best solution. Several different methods are available; one of the more popular for bundle adjustment is the Levenberg-Marquardt algorithm.
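As a sketch of how such a cost could be minimized in practice (our own illustration, not the procedure used in the thesis), one can hand the stacked residuals of Eq. (2.19) to a generic non-linear least squares solver such as scipy's least_squares. For brevity the sketch handles a single view and omits lens distortion; the input point arrays are assumed to be given.

import numpy as np
from scipy.optimize import least_squares

def residuals(params, world_pts, image_pts):
    """Residuals for one view of a distortion-free pinhole model.
    params = [fx, fy, cx, cy, rvec(3), t(3)]."""
    fx, fy, cx, cy = params[:4]
    rvec, t = params[4:7], params[7:10]
    # Rodrigues formula for the rotation matrix
    theta = np.linalg.norm(rvec)
    if theta < 1e-12:
        R = np.eye(3)
    else:
        k = rvec / theta
        Kx = np.array([[0, -k[2], k[1]], [k[2], 0, -k[0]], [-k[1], k[0], 0]])
        R = np.eye(3) + np.sin(theta) * Kx + (1 - np.cos(theta)) * Kx @ Kx
    cam = world_pts @ R.T + t                    # points in the camera frame
    proj = np.column_stack([fx * cam[:, 0] / cam[:, 2] + cx,
                            fy * cam[:, 1] / cam[:, 2] + cy])
    return (proj - image_pts).ravel()

# world_pts: Nx3 known calibration points, image_pts: Nx2 detections (assumed given)
# result = least_squares(residuals, x0, args=(world_pts, image_pts), method="lm")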

2.4 Depth sensor

There are several different technologies available for depth sensing, such as stereo vision, time of flight or structured light. This work focuses on structured light sensors, but is in general sensor agnostic.

Stereo vision enables depth sensing in a way similar to our human vision, but requires two spatially separated cameras. In addition, the object being viewed must have enough texture to allow uniquely matching points between the two views. The accuracy of depth measurements also degrades heavily when the distance to the object becomes much larger than the distance between the cameras, as their views start to coincide.

Time of flight cameras determine the depth by emitting a light pulse and measuring the time it takes to return. These types of cameras typically outperform structured light for long range uses, but come with their own set of limitations. Work on these sensors that is relevant here includes [3], which presents a non-parametric model of the depth error, with a constraint of keeping the resulting map smooth, managing to reduce the average error for their sensor from the scale of 40 mm down to 5 mm at ranges of 2-9 m.

Regardless of which depth sensor is used, the sensors must be calibrated extrinsically against each other, in order to be able to match the data from the depth sensor with the RGB camera. This can be done using similar techniques as for camera calibration, by estimating the extrinsics relating the two sensors.

2.4.1 Structured light

Structured light depth sensors are a combination of a light projector and a camera pointed in the same direction. Depth from structured light can be recovered using any wavelength of light, but using infrared has the benefit of being invisible to the human eye. The Intel RealSense camera used in this thesis uses infrared light. See Figure 2.5 for a schematic view.

Structured light is in essence stereo vision, with depth measured through triangulation, but with one camera replaced with an active system.

Thus there is no mapping between two captured images, but rather a mapping between the image plane of the projector and the captured image in the camera. The projected pattern is known, and how the scene lights up is recorded by the camera. By geometry the depth can then be computed, just like in the stereo vision case.

Figure 2.5: Structured light RGB-D camera, consisting of an IR laser projector, an IR camera and a color camera.

A typical issue of structured light sensors is strong illumination in the spectrum used for sensing, causing either a complete failure or unreliable output. The spectrum is typically either the visible spectrum or the near infrared spectrum. Border issues also arise when the projector illuminates one side of an object and the camera captures another [18]. There may also be issues with temperature dependent measurements, and distortions in the returned depth values that depend both on the radial distance and the actual depth [13]. Ways to address the depth distortions have included measuring planes at various distances and then applying an offset [13], multiplying the returned value with trained constant maps [26] or fitting a B-spline [19, 17].

Possible distortions stem to a large part from the stereo vision nature of the sensor, since the projector is essentially a reversed camera. The projector has an image chip, which in the analogy to a reversed camera would be equal to the image plane. Light passes through the image chip from behind, and is projected into the world through a lens. As this principle is similar to that of a camera, the same theory for radial and tangential distortions applies. Similarly, the rest of the internal camera parameters affect the final image. In addition to camera calibration there is also the possibility of uneven light intensity through the projected image, which may need to be considered depending on the algorithm's robustness to lighting conditions.

Thanks to the known projected pattern, additional matching methods are applicable compared to stereo vision. The list of used patterns is extensive, and contains both grayscale as well as color patterns [8]. Other major differentiators between patterns are the spatial resolution offered, as well as the information given per image. An increase in spatial resolution generally requires more versions of a pattern projected sequentially into the scene, but comes at the cost of requiring a static scene in order to be able to use the measurements spread out in time. Other patterns are optimized for capturing dynamic scenes and may only require one projected pattern, but at the cost of spatial resolution.

Two pattern techniques that are applicable with the illumination offered in the infrared spectrum are gray coding and phase shifting.

Gray Coding

Gray coded structured light uses a sequence of binary stripe patterns (light/dark) of decreasing width projected over time. This enables absolute positioning down to the resolution offered by the projector/camera imaging chips and the number of patterns used. See Figure 2.6 for an example of width 64 (pixels), with a comparison to normal binary coding. Each row corresponds to one projected image pattern (imagine the row stretched out vertically), and each column to a horizontal position within the image. The benefit of gray coding over normal binary coding is that it is more robust, since any neighboring values only differ by 1 bit.

Figure 2.6: Binary and gray code coding compared. The red line marks reading pixel position 24. Note how being off-by-one in the binary code results in several changed bits, whereas the gray code only has a change in a single bit.

By comparing the locations of the code words in the projected image and in the captured one, one can calculate the disparity and depth of the scene. This is the technique that the Intel RealSense camera seems to be using, judging from the images presented in an Intel blog post [15].
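To illustrate the binary-to-gray-code relationship described above, here is a small Python sketch (our own, not from the thesis) of the standard conversion and of the single-bit-difference property:

def binary_to_gray(n: int) -> int:
    """Convert a binary-coded integer to its gray code."""
    return n ^ (n >> 1)

def gray_to_binary(g: int) -> int:
    """Invert the gray code by successively folding in higher bits."""
    n = 0
    while g:
        n ^= g
        g >>= 1
    return n

# Neighboring positions differ in exactly one bit in gray code
for pos in (23, 24, 25):
    print(pos, format(binary_to_gray(pos), "06b"))
assert all(gray_to_binary(binary_to_gray(i)) == i for i in range(64))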

Phase Shift

Another method using a sequence of projected patterns is fringe projection, or phase shift. It uses a set of sinusoidal (or trapezoidal or other continuous) functions, projected into the scene with varying phase, allowing depth reconstruction from as little as three patterns. The matching between projected patterns and captured images is done by comparing the phase, globally over the entire image. The phase is continuous throughout the image, but due to the phase periodicity this cannot be seen directly in the image. Phase unwrapping is thus performed, to map the several single-period waves into a single continuously increasing phase shift.

However, phase unwrapping has issues when the depth causes a shift of more than 2π, limiting the depth reconstruction to depths within one period [0, 2π], with results outside of that range being ambiguous to a depth within it. This method can also be combined with gray coding to remove the phase ambiguity.

2.4.2 Depth versus lens distortion

The difference between depth and lens distortion might not be apparent from the beginning and is thus explained here. Lens distortion (Eq. 2.18) moves pixels around in the image, without modifying their values. Depth distortion does not alter the position of the pixels, only their values. In depth images the values correspond to the distance from the camera; in an ordinary image the values would correspond to intensity. The depth sensor is also affected by lens distortion, since the depth sensor in the active stereo case also consists of a camera with a lens. However, we classify all distortions into the distinct groups of depth or lens distortion, based on whether they alter the values or the positions of pixels, respectively.

2.5 Accuracy metrics

Optimization of the camera parameters requires a cost function which defines the error between the current solution and what one theoretically wishes to achieve. A common metric for 2D camera calibration is the reprojection error.

2.5.1 Reprojection error

Reprojection error is one of the most commonly used metrics for ordinary camera calibration. It is computed by taking the known points in the world coordinate system, computing their projection through the current model of the camera, and comparing the resulting position with the position in the captured image. The distance (error) for each point is then summed in a mean square error fashion, and returned as a scalar for the entire procedure.

Let $\hat{x}_I$ be the observed image point of $X$. This point is measured by the camera, and is as such affected by measurement errors and noise. Let $x$ be the point projected from the known point $X$ using the camera projection matrix $P$. The projected point is calculated according to the current intrinsic and extrinsic parameters of the camera calibration with the actual 3D coordinates of a point. When observing the calibration pattern, the 3D coordinates are known by construction,
\[
x = P X
\]
The reprojection error is the euclidean distance between the Cartesian representations of the actual image point $\hat{x}_I$ and the theoretical point $x$,

\[
\text{error} = d(\hat{x}_I, x) = \left\lVert \frac{1}{\hat{x}_{Iw}} \hat{x}_I - \frac{1}{x_w} x \right\rVert
\tag{2.20}
\]

where in two dimensions
\[
d(a, b) = \left\lVert \frac{1}{a_w} a - \frac{1}{b_w} b \right\rVert = \sqrt{ \left(\frac{a_x}{a_w} - \frac{b_x}{b_w}\right)^2 + \left(\frac{a_y}{a_w} - \frac{b_y}{b_w}\right)^2 + \underbrace{\left(\frac{a_w}{a_w} - \frac{b_w}{b_w}\right)^2}_{=0} }
\tag{2.21}
\]

This enables correcting the camera parameters to more accurately reflect the measured data in the captured image. For 2D images this is very straightforward, since pixel distances are compared. When appending the depth dimension, the units make the comparison harder, since each "pixel" in the depth image has the value of the depth of that "pixel" once back-projected into the three dimensional world. The units turn into (pixel, pixel, depth/disparity), where depth (or disparity) is measured in some real-world distance unit (inverted). Reprojection error is as such not directly applicable for 3D cameras due to the different units.
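A minimal numpy sketch of Eqs. (2.20)-(2.21) (our own illustration, with made-up point values) for a set of observed and projected homogeneous 2D points:

import numpy as np

def reprojection_errors(x_obs_h, x_proj_h):
    """Per-point Euclidean distance between observed and projected homogeneous 2D points."""
    obs = x_obs_h[:, :2] / x_obs_h[:, 2:]     # normalize by the last coordinate
    proj = x_proj_h[:, :2] / x_proj_h[:, 2:]
    return np.linalg.norm(obs - proj, axis=1)

x_obs = np.array([[100.0, 50.0, 1.0], [202.0, 80.0, 2.0]])   # observed points (hypothetical)
x_proj = np.array([[101.0, 49.0, 1.0], [100.0, 41.0, 1.0]])  # projected by the current model
print(reprojection_errors(x_obs, x_proj))   # their mean square drives the calibration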

2.5.2 Depth accuracy

For depth measurements, a common way is to compare the resulting point cloud of distance measurements to a known object. Separate ground truth knowledge is thus required. The object need not be geometrically complex; it could be a simple plane or a cube. With simpler primitives, plane fitting can be applied as well in order to reduce the impact of random noise, for a clearer image of the systematic error. When combining this with a cube known to hold a certain fabrication accuracy, the angles between the planes also serve as features for accuracy evaluation.

2.5.3 Depth and image combined

For evaluating the complete RGB-D camera pair at once, there are no good universal numeric methods. Visual inspection is the leading method of evaluation for general scenes, and the focus is on the boundaries between objects, where the textures of two different objects at various depths easily overlap or get assigned to the wrong object. When utilizing an object with known (defined) world coordinates, evaluation becomes possible by forward projecting the object points, or back projecting the measured points into the world and comparing with the object. Utilizing reprojection error for 3D cameras is thus reasonable, as long as attention is paid to the units and their relative magnitude when working in the 3D space of the camera.

Chapter 3

Related work

The need for calibrating a camera to reconstruct metric properties from images arose early, but interest has surged in recent times for computer vision and computational photography. Calibrated cameras are beneficial for example for determining geodetic information from aerial photographs, and in robotics, where metric information is of interest in subfields such as navigation [25] and reconstruction of real world objects into 3D models [34, 16]. This chapter is split into two sections based on the area of focus. It begins with a look at methods of camera calibration in Section 3.1, which contains some history of camera calibration including lens distortions, and finishes with a study of previous works on depth cameras in Section 3.2.

3.1 Camera calibration

The currently most popular way of calibrating a camera is the one developed in [31], which received widespread adoption after inclusion in both MATLAB and OpenCV, making its usage simple. Other methods have been proposed since then, but the lack of a publicly released toolbox makes the step to usage a lot higher than for [31], which is accepted as performing well enough for most applications. It does however generally take a lot of time to capture all the images required, and as such is not so tempting to use when one has to calibrate many cameras. Camera calibration using circles has been studied in [33], where two co-planar circles are all that is required to refine a previously done calibration. Their method focuses solely on calibrating the extrinsic parameters as well as the focal length.

There has been an extensive search for various geometric objects to calibrate from, the currently most popular one being a chessboard-like 2D plane. There are however plenty of methods utilizing other constraints. Approaches using only rigidity constraints of a natural scene, without any specialized calibration object, are generally called self-calibration, and can result in a camera calibration with accuracy comparable to standard methods; they exist in both sparse (feature based) and dense (whole scene) variants.

Since there is no special calibration object involved, [32] calls these 0D approaches. [32] explores calibrating using markers on a string, as a 1D object alternative to the other approaches, noting that a freely moving 1D object can never provide enough information to fully calibrate a camera, but that fixing one end of it achieves worse, but comparable, results to chessboard calibration.

Correcting lens distortions is very important, and most related work bases the correction on a simple polynomial model, commonly called a Brown-Conrady model. Corrections usually account for both radial and tangential distortion, but sometimes focus only on the radial part due to its higher impact [20]. A way of giving more physical meaning to the parameters in the lens distortion correction is given in [27], where they decouple and couple equations depending on whether they are physically coupled in the actual camera construction or not. The result is a model that performs as well as the classical polynomial Brown-Conrady model, but with, they argue, a better physical explanation.

Most of the popular calibration methods use information from multiple images in order to get enough information for calibration. Under certain assumptions on the camera and the scene, however, calibration can be performed by capturing just one image. One such approach is laid out in [12], which proposes two different methods of single image calibration from a specially constructed 3D object, attaining similar but not identical calibration compared to a multiple image approach. On the subject of single image calibration using planar circle-based markers, [22] argue that using two concentric semi-circles gives a more robust identification of points, ultimately leading to a better calibration than using two complete concentric circles. A survey of further contributions to the field of single-view camera calibration can be found in [28], which details developments up to the year 2009.

An approach to calibrating from the outline of a sphere (which is a conic) can be found in [1]. They provide a detailed analysis of the number of constraints posed per image as well as the number of variables to determine for intrinsic vs. extrinsic calibration. Their solution requires three images of the same sphere in various locations in order to solve the system of equations. They note that calibration based on ellipses has drawbacks depending on the region of the image in which the ellipse is captured: if the ellipse is almost in the center of the image, the aspect ratio is almost one and calibration is difficult, whereas close to the borders of the image the lens distortions start to have significant impact.

In an attempt at higher accuracy, [7] improve earlier methods by replacing squares and checkerboards with rings. They do not try to detect them accurately in the captured image, but rather project what they should look like given the current camera model. The optimization is done around the error between the measured and constructed views, citing reprojection errors on the scale of 0.05 pixels. The presented method is however very sensitive to planarity errors and suffers greatly already at 1 mm planarity distortions. [6] improves matching accuracy by performing a fronto-planar rectification and matching different kinds of markers on the rectified image.

By cross-correlating templates of the marker with the fronto-planar image, they are able to increase precision further than point detection and sub-pixel refinement in the projectively distorted image can.

The popular method of chessboard-like calibration requires manual alignment of the chessboard. This requires knowledge from the operator in order to place the planes in good orientations, but still is not foolproof. The most popular countermeasure to the impact of operator error is usually to capture more images than necessary, which also helps decrease the impact of noise in the images. A method to simplify the procedure was developed in [24], which displays the video stream from the camera being calibrated. The video stream is utilized in an augmented reality fashion to show where to place the calibration pattern next, based on where it would currently improve the calibration the most. They base it on their newly proposed metric, which is computed by regarding all the parameters of the model as probabilistic Gaussians around the true value and sampling from them, based on the data already collected. Depending on the uncertainty or spread of the distributions, different alignments of the calibration board are proposed. This helps to calibrate the camera faster than by capturing a lot of images. It also adds some accuracy guarantees.

3.2 Depth cameras

Many implementations of improving depth data are based on the Microsoft Kinect camera, initially released in 2010 for the Xbox 360 entertainment system. This made a cheap RGB-D sensor combination available to the public, sparking researcher interest. The first version of the Kinect used structured light, whereas later versions switched to time of flight implementations.

An impactful work on Kinect calibration is [13] by Herrera et al., which studies the issue of calibrating the color camera and depth sensor individually or together as a pair. They also study the distortions present and propose a correction of the depth. Their work is based on the idea that calibrating both sensors jointly gives access to more data, and thus more information for optimizing the model. The depth-dependent radial distortions are corrected as well, by fitting a correction function from measuring the disparity to planes set up at various distances. This correction function consists of per-pixel constants and an exponential decay with increasing distance from the camera. It is as such a dense correction, working on a map of image plane depth errors, modified in amplitude by the distance to the camera.

Herrera et al.'s work is extended by Raposo et al. in [23], which focuses on lowering the execution time of the calibration procedure while keeping accuracy high. They note the need for distortion correction in the depth images, as otherwise accuracy is negatively affected. The lower run time is partly achieved by a significant reduction in the number of images required for a successful calibration. Where Herrera et al. [13] propose 60 images for calibration, Raposo et al. manage with as few as 4 and reach comparable performance after 6 to 10 images.

However, most of the run time difference comes from the difference in distortion correction, which is done open-loop in [23] instead of alternating between optimizing spatial depth corrections and the reprojection error.

The distortions of Kinect-like cameras are also studied in [2], which notes that the approach by Herrera et al. in [13] is not as suitable for long distances and extends upon it. By using the myopic property of the depth sensing error, they split up the error into a function describing the systematic depth error, that is the error in average depth over the entire sensor at a certain distance, and a local function describing the error in each pixel as a function of depth. The two are as such coupled, but by splitting them they can calculate so called undistortion maps for the localized error and later estimate the systematic error. Data is collected by measuring various distances to a wall, and the procedure ends with an approximately 1 hour long optimization scheme.

A comparison between three different Kinect calibration methods was done in [30], which compares the supervised approach of Herrera et al. [13], a checkerboard approach using no depth correction, and the SLAM procedure in [26]. They show that distortion correction in depth measurements is perhaps the most important aspect of getting a good measurement, arguing that Herrera's method outperforms the others thanks to its more rigorous correction. Distortion mapping is also part of [34], where they manage to get a Simultaneous Localization and Calibration routine up to real time performance on a modern computer. [17] extended the plane calibration into a 2.5D domain by punching holes in the plane, allowing calibration of depth sensors at the same time. They use a specialized hole pattern in order to be able to determine the orientation of the board even when parts of the pattern are occluded.

Chapter 4

Theory

The new contributions of this thesis start with this chapter, which details the theory behind the implementation. This thesis investigates what extra information can be gained from regarding an RGB-D camera as a 3D imaging device, rather than an imaging device coupled with a depth sensor. Instead of regarding the image as residing in the 3-dimensional space $\mathbb{P}^2$ and extending it with depth data outside of the camera model, we consider the image and depth data together as a 4-dimensional space in $\mathbb{P}^3$. This space can in turn have been distorted by projective deformations, which we can rectify using scene knowledge.

4.1 Homogeneous depth camera model

The camera model of Section 2.3 works very well for normal 2D imaging devices. However, since we assume a sensor returning depth data as well, we would like to extend it to also explain 3D measurements of the world in an equally linear fashion.

It is possible to use the 2D model for 3D data as is, as in Section 2.3.1. This is done by disregarding homogeneity and assuming that $x_3$ is in fact the depth in the scene, thus turning it into a model going from projective space into the corresponding euclidean space, $\mathbb{P}^3 \mapsto \mathbb{E}^3$. This decreases the flexibility of the model, turning it into a pseudo-homogeneous representation, and requires additional work for applying the inverse transformation, since the matrix representing the transformation cannot be invertible.

Instead, we would like to remain in $\mathbb{P}^3$ by extending the camera model to stay in projective 3D space. We do this by embedding the normal $K$ matrix of internal camera parameters into a bigger representation. By extending the projection matrix as follows,
\[
M = \underbrace{\begin{pmatrix} f_x & \alpha & c_1 & 0 \\ 0 & f_y & c_2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}}_{K} \underbrace{\begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}}_{E}
\tag{4.1}
\]


we stay in $\mathbb{P}^3$. A homogeneous point $(X_1, X_2, X_3, X_4)$ in the 3D world is now projected to the depth image as
\[
\begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \end{pmatrix} = M \begin{pmatrix} X_1 \\ X_2 \\ X_3 \\ X_4 \end{pmatrix}
\tag{4.2}
\]
and returning back to Cartesian coordinates is performed as usual by normalizing by the fourth coordinate,
\[
\begin{pmatrix} x_1^c \\ x_2^c \\ x_3^c \end{pmatrix} = \frac{1}{x_4} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \end{pmatrix}
\tag{4.3}
\]
where the third Cartesian coordinate $x_3^c$ is now equal to the disparity, that is, the inverse of the depth.
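To make the structure of M concrete, here is a small numpy sketch (our own, with invented intrinsic values) that builds M from Eq. (4.1) and projects a world point into (pixel, pixel, disparity) as in Eqs. (4.2)-(4.3):

import numpy as np

def depth_camera_matrix(fx, fy, alpha, c1, c2, R, t):
    """4x4 homogeneous depth camera matrix M = K E, Eq. (4.1)."""
    K = np.array([[fx, alpha, c1, 0.0],
                  [0.0, fy, c2, 0.0],
                  [0.0, 0.0, 0.0, 1.0],
                  [0.0, 0.0, 1.0, 0.0]])
    E = np.eye(4)
    E[:3, :3], E[:3, 3] = R, t
    return K @ E

# Hypothetical intrinsics; identity extrinsics
M = depth_camera_matrix(600.0, 600.0, 0.0, 320.0, 240.0, np.eye(3), np.zeros(3))
X = np.array([100.0, -50.0, 1000.0, 1.0])   # world point 1000 mm in front of the camera
x = M @ X
print(x[:3] / x[3])   # (u, v, disparity); disparity is 1/depth = 1/1000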

4.2 Projective depth distortion

The 2D image returned by current RGB-D cameras is typically of higher quality than the depth image, and its distortions have been well studied (see Section 3.1 for summaries of a few studies). With this in mind, we seek to formulate a correction that affects depth data only, while leaving the 2D image the same. It is also beneficial for inverse calculations if the model is linear and invertible. An additional constraint that can be placed on the correction is that it should keep planes planar. This is beneficial since a lot of man-made objects contain planar surfaces, which can thus be used for identification. It can be shown (see Appendix B) that the only distortion model fulfilling the above is a special 3D homography of the form
\[
D = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ d_1 & d_2 & d_3 & d_4 \end{pmatrix}
\tag{4.4}
\]

The nominal values expected from a perfect system are $d_1 = d_2 = d_3 = 0$ and $d_4 = 1$, which turns $D$ into the identity matrix.

4.3 General homogeneous depth camera model

Combining the projective depth distortion with the homogeneous depth camera model, we arrive at the complete projection matrix for the transformation.

\[
x = K D E X
\tag{4.5}
\]


where $x$ is a point in the “3D image” space of the camera, and $X$ is the corresponding world point being imaged.
\[
P = K D E = \begin{pmatrix} f_x & \alpha & c_1 & 0 \\ 0 & f_y & c_2 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ d_1 & d_2 & d_3 & d_4 \end{pmatrix} E = \begin{pmatrix} f_x & \alpha & c_1 & 0 \\ 0 & f_y & c_2 & 0 \\ d_1 & d_2 & d_3 & d_4 \\ 0 & 0 & 1 & 0 \end{pmatrix} \begin{pmatrix} R & \mathbf{t} \\ \mathbf{0}^T & 1 \end{pmatrix}
\tag{4.6}
\]
Comparing this model to the normal 2D camera projection of Eq. (2.13), we note that our projection matrix has grown by one row and now keeps the results in $\mathbb{P}^3$. The first two coordinates remain the same in both models. But whereas Eq. (2.13) loses homogeneity upon using depth data, our model preserves it. The proposed model is fully linear, under the strong assumption that the data is captured ideally without radial or tangential distortions. An equivalent situation is undistorting all images before using them for the perspective depth correction.

4.4 Calibrating the depth distortion parameters analytically

In the case of known intrinsics and extrinsics, such as those returned by calibrating the camera, solving for the depth distortion parameters analytically is simple. With at least 3 point correspondences, 3 of the parameters can be solved for. In order to solve for all 4 parameters, 4 point correspondences are required, but they cannot all lie in a plane, as this does not provide enough information to solve for all 4 parameters uniquely. The solution is based on stacking relations derived from each correspondence, in order to later solve them in a least squares sense. For one single point correspondence between world point $X_W$ and image captured point $\hat{X}_I$, with intrinsic parameters $K$ and extrinsic parameters $E$, the distortion affected relation is
\[
\hat{X}_I = K D E X_W
\tag{4.7}
\]

Let $X_C$ be the world point in the camera-centered coordinate system,

\[
X_C = E X_W
\tag{4.8}
\]
Then
\[
K^{-1} \hat{X}_I = D X_C \;\Rightarrow\; \bigl\{ \tilde{X}_I = K^{-1} \hat{X}_I \bigr\} \;\Rightarrow\; \tilde{X}_I = D X_C
\tag{4.9}
\]
\[
\begin{pmatrix} \tilde{X}_{I1} \\ \tilde{X}_{I2} \\ \tilde{X}_{I3} \\ \tilde{X}_{I4} \end{pmatrix} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ d_1 & d_2 & d_3 & d_4 \end{pmatrix} \begin{pmatrix} X_{C1} \\ X_{C2} \\ X_{C3} \\ X_{C4} \end{pmatrix}
\tag{4.10}
\]

Inspecting the fourth row yields

\[
\tilde{X}_{I4} \sim d_1 X_{C1} + d_2 X_{C2} + d_3 X_{C3} + d_4 X_{C4} = X_C^T \mathbf{d}
\tag{4.11}
\]

where $\mathbf{d} = (d_1, d_2, d_3, d_4)^T$. Here $\sim$ is used instead of $=$ in order to clarify that we are looking at a scalar from a homogeneous vector, which is only defined up to a scale factor. Dividing through by $\tilde{X}_{I3}$ gets rid of the unknown scale factor,

\[
\frac{\tilde{X}_{I4}}{\tilde{X}_{I3}} = \frac{X_C^T}{X_{C3}} \mathbf{d} \;\Rightarrow\; \underbrace{\frac{\tilde{X}_{I4} X_{C3}}{\tilde{X}_{I3}}}_{y} = \underbrace{X_C^T}_{\mathbf{a}^T} \mathbf{d}
\tag{4.12}
\]
where the left hand side is a scalar, and the right hand side a product of a row and a column vector. The division by $\tilde{X}_{I3}$ and $X_{C3}$ is okay, since $X_{C3} \sim \tilde{X}_{I3} = 0$ if and only if the point is in the camera optical center, which it cannot be. Each point correspondence thus provides one equation. Repeating the procedure for all captured points and stacking Eq. (4.12) vertically, $Y = (y_1, y_2, \ldots, y_N)^T$, $A = [\mathbf{a}_1, \mathbf{a}_2, \ldots, \mathbf{a}_N]^T$, we get

\[
\underset{N\times 1}{Y} = \underset{N\times 4}{A}\; \underset{4\times 1}{\mathbf{d}}
\tag{4.13}
\]
which has the least squares solution

\[
\mathbf{d} = (A^T A)^{-1} A^T Y
\tag{4.14}
\]
and solves
\[
\mathbf{d} = \underset{\mathbf{d}}{\arg\min} \sum_{j=0}^{N} \left( X_C^{[j]T} \mathbf{d} - \frac{\tilde{X}_{I4}^{[j]} X_{C3}^{[j]}}{\tilde{X}_{I3}^{[j]}} \right)^2
\tag{4.15}
\]
where $\bullet^{[j]}$ refers to number $j$ of the point $\bullet$.
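A numpy sketch of this stacking and solve (our own illustration; rather than forming the normal equations of Eq. (4.14) explicitly, the numerically stabler np.linalg.lstsq is used):

import numpy as np

def solve_depth_distortion(X_tilde_I, X_C):
    """Solve Eqs. (4.13)-(4.14) for d, given Nx4 arrays of normalized image points
    X_tilde_I = K^{-1} X_hat_I and camera-frame world points X_C."""
    y = X_tilde_I[:, 3] * X_C[:, 2] / X_tilde_I[:, 2]   # left hand side of Eq. (4.12)
    A = X_C                                             # each row is a^T = X_C^T
    d, *_ = np.linalg.lstsq(A, y, rcond=None)
    return d

# For an undistorted camera the recovered d should be close to (0, 0, 0, 1).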

4.5 Iterative reweighted least squares

To account for outliers and noisy data, the analytical solution can optionally be further refined by iterative reweighted least squares (IRLS) [29]. IRLS instead iteratively minimizes

d^{(t+1)} = \arg\min_{d} \sum_{j=1}^{N} w_j^{(t)} \left( X_C^{[j]T} d - \frac{\tilde{X}_{I4}^{[j]}\, X_{C3}^{[j]}}{\tilde{X}_{I3}^{[j]}} \right)^{2}    (4.16)

where \bullet^{(t)} indicates the variable at iteration t.


Using an L1 norm and regularizing yields

w_j^{(t)} = \frac{1}{\max\!\left(\delta,\; \left| X_C^{[j]T} d^{(t)} - \dfrac{\tilde{X}_{I4}^{[j]}\, X_{C3}^{[j]}}{\tilde{X}_{I3}^{[j]}} \right| \right)}    (4.17)

with δ a small regularization constant, for instance set to 0.0001.
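A sketch of the IRLS refinement of Eqs. (4.16)-(4.17), reusing the A and y construction from the previous sketch. The fixed iteration count is an illustrative choice; one could equally stop when d changes by less than some tolerance.

```python
import numpy as np

def irls_depth_distortion(A, y, n_iter=20, delta=1e-4):
    """Iteratively reweighted least squares for d, with L1-type weights (Eq. 4.17)."""
    d, *_ = np.linalg.lstsq(A, y, rcond=None)            # start from the analytical solution
    for _ in range(n_iter):
        residuals = A @ d - y
        w = 1.0 / np.maximum(delta, np.abs(residuals))   # weights of Eq. (4.17)
        sw = np.sqrt(w)
        # Weighted least squares: minimize sum_j w_j (a_j^T d - y_j)^2, Eq. (4.16)
        d, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    return d
```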

4.6 Calibrating the depth-distortion parameters non-linearly

In an attempt to obtain more accurate depth distortion parameters, we also formulate the problem as a non-linear least squares problem over the point positions in 3D. Since the modeled distortion does not affect the x, y image positions, only the depth distortion parameters are included in the optimization. Back-projecting points into the world using the distortion parameters does change their believed position in the world. However, we consider our defined world points to have much higher accuracy than those measured by the camera, and ignore this effect. We have image-measured points \hat{p}_{ic}, with i = 1 \dots N_c points in every camera view c = 1 \dots C. The corresponding points in the world are P_{ic}. All points are here described as points in P^3, with the last entry equal to 1. For each view c we have the extrinsics E_c. The intrinsics K are fixed for all views. The optimization problem is

\arg\min_{D} \sum_{c=1}^{C} \sum_{i=1}^{N_c} d\!\left( P_{ic},\; E_c^{-1} D^{-1} K^{-1} \hat{p}_{ic} \right)    (4.18)

where D is the matrix form of d = (d_x, d_y, d_z, d_w). Here d(a, b) is the distance measure detailed in Section 5.5.1; in short, we divide each vector by its last entry and then take the norm of the difference between the corresponding Euclidean vectors. For restricted models, entries that are not solved for are instead fixed to the value at the corresponding position in the 4 × 4 identity matrix. For no depth distortion, this corresponds to setting D to the identity matrix.
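A sketch of the non-linear refinement in Eq. (4.18) using scipy.optimize.least_squares. Only the four distortion parameters are optimized; K, the per-view extrinsics and the point correspondences are assumed to be given, and all names are illustrative rather than taken from the thesis implementation.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_depth_distortion(d0, K, extrinsics, points_img, points_world):
    """Non-linear refinement of d = (dx, dy, dz, dw) over all views.

    K            : (4, 4) intrinsic matrix of the homogeneous depth camera model.
    extrinsics   : list of (4, 4) matrices E_c, one per view.
    points_img   : list of (N_c, 4) measured homogeneous points per view.
    points_world : list of (N_c, 4) defined homogeneous world points per view.
    """
    K_inv = np.linalg.inv(K)

    def residuals(d):
        D = np.eye(4)
        D[3, :] = d
        D_inv = np.linalg.inv(D)
        res = []
        for E, p_img, P_world in zip(extrinsics, points_img, points_world):
            E_inv = np.linalg.inv(E)
            X = (E_inv @ D_inv @ K_inv @ p_img.T).T          # back-project into the world
            X = X / X[:, 3:4]                                # homogenize
            P = P_world / P_world[:, 3:4]
            res.append(np.linalg.norm(X[:, :3] - P[:, :3], axis=1))  # d(a, b)
        return np.concatenate(res)

    return least_squares(residuals, d0, method="lm").x
```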

4.7 Parameter interpretation

The 4 depth distortion parameters, d_1, d_2, d_3, d_4, or with more descriptive subscripts d_x, d_y, d_z, d_w, are presented as an empirical correction of the depth data. They share the same kind of end result, since they all act as scale factors: an increase in parameter magnitude increases the correction performed. They differ, however, in what drives the correction. Since the x and y coordinates can be both positive and negative, the signs of d_x and d_y indicate in which direction the correction is tilted with respect to the image plane, and their magnitudes the strength of the tilt. Since the coordinate system is based in the camera center with the positive Z-axis pointing forward, there can be no negative Z values. A sign change of the d_z parameter thus moves the depth data further from or closer to the camera, depending on the already measured depth. The effect of d_w is similar, offering a general scale increase or decrease independent of the points' measured positions. The effect of d_w compared to a general scale s is inverted, s = d_w^{-1}, so that d_w < 1 implies an increased scale, s > 1, and d_w > 1 implies s < 1.

We can imagine some possible underlying physical causes, although they have not been the focus of this thesis. First and foremost, the parameters account for the deviation between the model used to calculate the depth (stereo vision or an extension thereof) and reality. This could for example be manufacturing errors, timing errors, or errors arising from finite precision calculations. Perhaps the image planes are not entirely parallel to each other, so that the projected pattern is slightly tilted compared to the camera; this could be corrected with d_x and d_y depending on the situation. If the image planes are parallel to each other but offset in distance from the baseline, or the baseline has a slightly different length than assumed, it could be corrected by d_w or d_z. The magnitude of the parameters is of course highly dependent on the coordinate system used, and will for the purpose of this thesis always refer to an orthogonal coordinate system with millimeters as the unit along each axis.

The proposed 4 parameters have similarities to the depth-multiplier images proposed by some authors (e.g. [13, 26]), which generally consist of the product of one or more dense maps of per-pixel constants similar to d_w, and a scalar function dependent only on the measured depth, to increase or decrease their impact. These methods result in parameterisations of the distortion with parameter counts in the 10000 range. The 4 parameters in our presented model most likely correct for the distortions in a similar way, with the obvious disadvantage of being unable to capture non-linearities perfectly. The big advantage, however, is that calibration does not require full-view images of a plane parallel to the image sensor, as is required for determining the per-pixel constants.

4.8 Parameter visualization

In order to get a better grasp of what the distortion parameters actually affect, a couple of visualizations with points located at physically possible locations are shown. All visualizations show a cube with a side of 16 cm, placed in front of the camera at a distance of 38 cm and centered around the Z axis. The camera center and coordinate frame are represented by the axes at the origin, with Z pointing forwards. The effect of d_x and d_y is identical along their respective axes, and is in Figure 4.1 visualized for the Y axis from the side, as well as from the camera's perspective. The difference between d_z and d_w can be seen in Figure 4.2. The scaling induced by d_w is independent of the distance to the camera, whereas d_z causes a shift dependent on the distance.


(a) Side view of modifying dy

(b) Camera view, identical image regardless of the distortion parameters

Figure 4.1: Impact of modifying dy with quite extreme values. To the left, original points. To the right, distorted points.

The careful reader might notice in the side views that the spheres have different radii. The radii of the spheres in these visualizations depend linearly on the distance to the camera center. This is to reflect that we are changing the camera's view of the world, and not the world itself. The depth distortion moves all points along the ray from the camera center through the point, which naturally affects the size of the imaged spheres when viewed under a perspective projection as in these images. The points in the real world do not change when the depth distortion parameters are applied, but the measurements do. These images are as such not different views of the world, but different views of the camera's measurement volume, with the camera view in Figure 4.1b showing the camera's rendering of the volume. Another way of reasoning is that each view represents a different real world situation that is imaged identically by the camera, given that the camera is affected by the corresponding depth distortion. Finding the distortion parameters then allows us to find the correct situation, which here is the cube.


Figure 4.2: Different impacts of dz and dw. From top to bottom: points in their original position, points modified by the 4th distortion parameter dw, points modified by the 3rd distortion parameter dz. Note that the distance between XY -planes in the dz case increases with Z distance, whereas it remains constant for dw.

Chapter 5

Implementation

This chapter outlines the experimental setup of the thesis. It includes some of the considerations taken and rough implementation details.

5.1 Physical hardware

The physical camera used in this thesis is an Intel RealSense SR300 camera. It uses an IR projector coupled with an IR camera to generate the depth image in an active stereo setup. Since the depth data is measured in the same coordinate frame as the IR camera, the IR image will be used instead of the color image, to simplify calculations by considering the pair as one camera returning depth as well as a grayscale image. An extension to working with color data as well would require keeping track of its coordinate system and its relation to the coordinate system of the depth camera.

5.2 Calibration pattern

Detecting points for calibration accurately is a large part of ordinary camera calibration, which puts constraints on the calibration object. The structured light sensor used for the practical implementation relies on an IR projector, which requires that the material is visible in the IR spectrum for the depth sensor to function correctly. The sensor applies filtering and smoothing in order to improve the quality of the captured data. This is not without its faults, leading to disturbances near depth discontinuities, as well as near regions of differing infrared reflectivity. Supported and frequently used calibration patterns in the OpenCV library are checkerboards and dot patterns, with black printed on white paper. Checkerboards produced in this manner contain frequent reflectivity changes throughout, which leads to less reliable depth data. Thus the (asymmetric) circle grid pattern from OpenCV was selected to reduce depth disturbances. A picture of it can be seen in Figure 5.1. This does not ensure depth data at the image-measured marker positions, but it allows for measuring the plane that the pattern lies in over a relatively large area, as accurately as the camera allows. This in turn enables a plane fit, and interpolation of the depth data to the measured marker positions.

Figure 5.1: Pattern used for calibration

Detection of the pattern is done in the IR image by blob detection in OpenCV. Since circles under perspective deformation look like ellipses, the detected center of a blob does not match the actual center of the ideal circle [11]. The positions are therefore refined by rectifying the image of the pattern to a fronto-planar position and applying template matching, as presented in [6]. See Figure 5.2a and the resulting fronto-planar rectified pattern in Figure 5.2b respectively.
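A sketch of the detection step with OpenCV's blob detector and circle grid finder. The grid size and blob detector settings are illustrative guesses that would need to match the actual printed pattern, and the fronto-planar rectification and template matching refinement from [6] is not reproduced here.

```python
import cv2
import numpy as np

def detect_circle_grid(ir_image, grid_size=(4, 11)):
    """Detect an asymmetric circle grid in an IR image; returns (N, 2) centers or None."""
    # Blob detector parameters are rough guesses for a printed pattern.
    params = cv2.SimpleBlobDetector_Params()
    params.filterByArea = True
    params.minArea = 20
    params.maxArea = 5000
    detector = cv2.SimpleBlobDetector_create(params)

    found, centers = cv2.findCirclesGrid(
        ir_image, grid_size, None, cv2.CALIB_CB_ASYMMETRIC_GRID, detector)
    return centers.reshape(-1, 2) if found else None
```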

(a) Example of image used for calibration. (b) The same image rectified to fronto-planar position with the pattern cut out.

Figure 5.2: An image from the “general1” data set

Detecting the pattern directly in the depth data is difficult in this case, since the data usually is not clean enough to consistently detect the markers. Even if consistent detection were possible, the holes in the depth data caused by the markers are usually not anywhere near round, and thus less suitable for use. An implementation working directly with only the depth data would benefit from using a pattern or object with even surfaces for consistent measurements.

5.3 Data collection

Gathering data for running the algorithms was conducted as simply as possible. The planar calibration target was glued onto a piece of plywood. To check how reasonable the assumption of a perfect plane is, a flat metallic ruler was slid across the plywood, confirming that it was not quite planar. The deviation from a perfect plane was estimated to be at most 1 mm. The calibration target was placed where it could be seen clearly from several directions. Data capture starts by requesting images from the camera and throwing the first frames away, in order to let the on-board processing stabilize, which among other things seems to carry out a correction based on the internal temperature of the camera. Data capture then proceeded by recording a video stream of the target from different directions and distances. After capture, a subset of frames, on the order of 5 to 30, was extracted from each video sequence and stored as a data set. Several kinds of data sets were selected. There are data sets with the pattern captured at positions and angles spread out as well as possible in the measurement volume, which should also provide good data for the ordinary camera calibration. Other data sets had more constraints on them, such as the camera being held parallel and at the same distance to the pattern at all times, or the pattern only being visible in the center of the image. The camera was allowed to cool down between each data set capture. Each capture was preceded by a capture of at least 100 frames that were thrown away, in the hope of the camera internals reaching a steady state. Depth data is converted into millimeter units using librealsense [14]. The pattern is defined in millimeter units in the world, and thus the entire world coordinate system is measured in millimeters. The calibrated ordinary camera intrinsics thus map between millimeters in the world and pixels in the image. Figure 5.3 shows an example input image pair, together with an enlarged cut-out of the depth data from the pattern. An additional set of example images can be seen in Fig. 6.1, which shows some of the different characteristics of the images in the data sets.

5.4 Calibration algorithm

Trying to perform single image calibration with the help of the depth data returns too little information to be able to determine all desired camera parameters. We thus turn to the planar method of Zhang [31] to calibrate the normal 2D camera parameters. The method proceeds in two steps, where the first step acts as an initializer for the second step.



Figure 5.3: Example input images. Lens distortion is visible at the bottom edge. (a) IR image with increased brightness. (b) Depth image presented using color to signify distance; dark blue represents missing data. (c) Close-up of the pattern in more detailed coloring. Note the uneven hole sizes from the markers and the bumpy surface where the camera does manage to capture data at and near the markers.

The first step is an analytical initial estimate of the intrinsic and extrinsic camera parameters. The initial estimate is found by forming a homography between each image and the known calibration plane. The second step is bundle adjustment, the non-linear optimization over all camera parameters and world point positions. We extend this method by analytically solving for the depth distortion parameters in a linear least squares sense, using all control points in all the images, as described in Section 4.4. In an attempt to make the adapted parameters more stable, the result of the linear least squares can be run through an iteratively reweighted least squares procedure, which theoretically should offer better resilience to outliers (Section 4.5). We vary the model depending on whether we consider lens distortion or not, as well as on which depth distortion parameters we consider. A further refinement is to extend the previous method by refining the linear least squares solution with a second, non-linear least squares optimization, detailed in Section 4.6.
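The 2D part of this pipeline corresponds closely to what OpenCV's calibrateCamera exposes (a homography-based initial estimate followed by bundle adjustment over all parameters). The sketch below is a minimal illustration with hypothetical variable names; the depth distortion steps of Sections 4.4-4.6 would then run on its output.

```python
import cv2
import numpy as np

def calibrate_2d(object_points, image_points, image_size):
    """Zhang-style planar calibration of the 2D camera parameters.

    object_points : list of (N, 3) float32 arrays, pattern points in mm (Z = 0).
    image_points  : list of (N, 1, 2) float32 arrays, detected centers per image.
    image_size    : (width, height) of the IR image.
    """
    rms, K, dist, rvecs, tvecs = cv2.calibrateCamera(
        object_points, image_points, image_size, None, None)
    # rms          : reprojection error in pixels
    # K            : 3x3 intrinsic matrix, dist : (k1, k2, p1, p2, k3)
    # rvecs, tvecs : per-view extrinsics (rotation vectors and translations)
    return rms, K, dist, rvecs, tvecs
```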

5.5 Evaluation

To compare accuracy of the calibration method and the impact of the correction for projective depth distortion, two metrics are used.

5.5.1 Dot error

The proposed model is fully linear. This enables inverting the model without losing information, retaining the physical meaning of the transformation, with errors occurring as part of the imaging process (that is, the process causes noisy measurements of 3D points) rather than from the 3D points themselves being noisy. We rely on the assumption that the world points are defined to a much higher accuracy than what is measured by the camera. Considering only the depth data, which has errors on the scale of several millimeters, this is a reasonable assumption even if the calibration board is uneven. The dot error is defined as

error_{dot}(\check{X}, X) = d(\check{X}, X)    (5.1)
= \sqrt{ \left( \frac{\check{X}_x}{\check{X}_w} - \frac{X_x}{X_w} \right)^{2} + \left( \frac{\check{X}_y}{\check{X}_w} - \frac{X_y}{X_w} \right)^{2} + \left( \frac{\check{X}_z}{\check{X}_w} - \frac{X_z}{X_w} \right)^{2} }    (5.2)

with \hat{x} the actual measured point, \check{X} = E^{-1} D^{-1} K^{-1} \hat{x} its corresponding projection into the world using the model, and X the defined world point. This sparse error metric is used for the black markers in the calibration pattern, seen in Fig. 5.1. Only the centroid of each marker is considered. For the mean value over all markers in the pattern, we take the arithmetic mean,

\text{mean dot error} = \frac{1}{N} \sum_{i=1}^{N} error_{dot}(\check{X}_i, X_i)    (5.3)

for all i = 1 \dots N point correspondences \check{X}_i \leftrightarrow X_i.
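A direct numpy transcription of Eqs. (5.1)-(5.3), assuming the measured points have already been back-projected into the world through E^{-1} D^{-1} K^{-1}; names are illustrative.

```python
import numpy as np

def mean_dot_error(X_back, X_world):
    """Mean Euclidean distance between back-projected and defined points.

    X_back  : (N, 4) homogeneous back-projected measurements.
    X_world : (N, 4) homogeneous defined world points.
    """
    a = X_back[:, :3] / X_back[:, 3:4]      # homogenize to Euclidean 3D
    b = X_world[:, :3] / X_world[:, 3:4]
    errors = np.linalg.norm(a - b, axis=1)  # Eq. (5.2) per point
    return errors.mean()                    # Eq. (5.3)
```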

5.5.2 Plane error

Since only the markers have a truly defined position in the world, a different metric is used for densely comparing the whole pattern. The most characteristic criterion for the pattern in the world coordinate system is that it is defined to lie in a certain Z = constant plane. As such, the dense error for the pattern is easily defined as the difference between the measured Z coordinate and the Z = constant plane, for all points with X, Y coordinates inside the pattern boundaries. We disregard pattern data that lies outside the bounding box of the outermost markers.

error_{plane}(\check{X}) = \check{X}_Z - Z_{const}    (5.4)

with \hat{x} the actual measured point and \check{X} = E^{-1} D^{-1} K^{-1} \hat{x} its corresponding projection into the world using the model. This metric is valid for all \check{X} whose X, Y coordinates are within the defined world pattern area, except in the vicinity of the markers, where the depth is unreliable. This dense metric is used to present difference images of how the depth error in the world Z coordinate looks before and after depth correction. To present a summary statistic we take the mean absolute error over all valid points, since this error is defined with a sign.

\text{mean absolute plane error} = \frac{1}{N} \sum_{i} \left| error_{plane}(\check{X}_i) \right|    (5.5)
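A corresponding sketch of Eqs. (5.4)-(5.5), where masking out points near the markers and outside the pattern bounding box is assumed to have been done beforehand.

```python
import numpy as np

def mean_absolute_plane_error(X_back, z_const):
    """Mean |Z - Z_const| for back-projected depth pixels inside the pattern area.

    X_back  : (M, 4) homogeneous back-projected depth pixels within the pattern.
    z_const : Z value of the pattern plane in the world frame (mm).
    """
    z = X_back[:, 2] / X_back[:, 3]           # world Z coordinate of each pixel
    plane_errors = z - z_const                # Eq. (5.4), signed
    return np.abs(plane_errors).mean()        # Eq. (5.5)
```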


Chapter 6

Results

This chapter contains the main results and analysis of the thesis. It starts out with several sections dedicated to one single camera, and ends with a briefer analysis of 3 other cameras of the same type. All cameras except the first, in-depth studied camera are represented with numbers to distinguish them from one another. The full model is calibrated with lens distortion in Section 6.2, where both infrared and depth images are undistorted before processing. In Section 6.3 a reduced model using only two parameters is presented. In Section 6.4 a quick summary of different parameter models is presented, including the end-to-end completely linear model that does not consider lens distortion. In Section 6.5 the distortion model is tested on additional cameras of the same camera model and summarized results are presented, to see how general the results are. The camera was calibrated on a data set consisting of images of the calibration board taken from several different angles and distances; an example image can be seen in Figure 5.2a. This was done in order to provide a well-determined system for the ordinary camera calibration, including lens distortion, as well as accurate determination of the depth distortion parameters. We evaluate the results with the help of the sparse dot error explained in Section 5.5.1 and the dense plane error explained in Section 5.5.2. Unless otherwise noted, all results are reported for a different data set than the one the camera was calibrated on. This is in order to correspond to real-life usage, where the camera would be calibrated once and then used many times, without enough information to re-calibrate the parameters.

6.1 Data set list

We summarize the data sets used, and their general characteristics.

• The “centered1” data set is 18 images large, taken with the pattern centered in the view but at different distances and angles to the camera. Images exemplary to this data set are Figs. 6.1g to 6.1i and 6.1k.


• The “general1” data set is 27 images large, taken at varying angles and distances to the pattern. Images exemplary to this data set are all shown in Fig. 6.1.

• The “longplywood1” data set is 12 images large, taken in general position. Images exemplary to this data set are all in Fig. 6.1.

• The “longplywood2” data set is 6 images large, taken in general position. Images exemplary to this data set are shown in Figs. 6.1e, 6.1j and 6.1l.

• The “plywood1” data set is 7 images large, taken in general position. Images exemplary to this data set are shown in Figs. 6.1e, 6.1j and 6.1l.

• The “sameheight1” data set is 13 images large. The pattern was on a carpet that is invisible in IR. The images are taken with the camera approximately 1 m from the ground, with the pattern parallel to the image plane. Images exemplary to this data set are shown in Figs. 6.1a to 6.1c.

• The “sameheight2” data set is 27 images large, taken approximately 0.5 m from the camera. The pattern was held by hand and captured at varying positions in the image with the pattern parallel to the image plane. Images exemplary to this data set are shown in Figs. 6.1d to 6.1f.

• The “vertical1” data set is 25 images large, taken in general position, but almost exclusively from the left half of the image plane. Images exemplary to this data set are shown in Figs. 6.1e, 6.1j and 6.1l.

6.2 Complete model with lens distortion

Calibrating on the “general1” data set while taking lens distortion into account, we get the intrinsics

K = \begin{pmatrix} 476.9171 & 0 & 312.9668 & 0 \\ 0 & 476.9093 & 242.4875 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}

with lens distortion parameters (k_1, k_2, p_1, p_2, k_3) equal to

(-0.1468, 0.0463, 0.0012, -0.0007, -0.0756)

and depth distortion parameters, here represented as the last row of D,

d = (0.00013406, -0.00000352, -0.00000699, 1.00604654)

since the rest of the D matrix equals the corresponding part of the identity matrix. The d_z value indicates that points are moved away from the camera depending on their depth, while the d_w value has a counteracting effect of moving all points closer to the camera, albeit very little.


(a) Far away (b) Far away, close to border (c) Far away, close to border

(d) Off center bottom right (e) Off center bottom left (f) Touching left border

(g) Close up (h) Tilted up (i) Tilted right

(j) Close, angled (k) Tilted down (l) Oblique angle

Figure 6.1: Example images from the data sets, showing some possible characteristics. The images have been modified for improved visibility in the printed version.


The reprojection error for the OpenCV calibration was 0.44 pixels. The focal length and principal point position correspond well to the intrinsics used by librealsense for the IR/depth camera (f = f_x = f_y = 476.87, p_x = 311.13, p_y = 245.99). These intrinsics are used to naively undistort the lens distortions in both the infrared and depth images before any further computations. We thus consider the lens-undistorted images to be captured by an ideal pinhole camera. Several data sets now use a subset of their original images, since the lens undistortion moves pattern points too close to the image border for the pattern finder to locate them. These images are thus part of calibrating the intrinsics including lens distortion, but not part of the depth distortion calibration.
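The naive lens undistortion described above could be sketched with OpenCV as follows. Applying the same remap to the IR and depth images assumes both live on the IR camera's pixel grid, and nearest-neighbour interpolation is a cautious choice for depth so that no interpolated depth values are created across discontinuities; the function and variable names are illustrative.

```python
import cv2
import numpy as np

def undistort_pair(ir_image, depth_image, K3x3, dist_coeffs):
    """Undistort the IR image and the depth image with the IR camera's lens model."""
    h, w = ir_image.shape[:2]
    map1, map2 = cv2.initUndistortRectifyMap(
        K3x3, dist_coeffs, None, K3x3, (w, h), cv2.CV_32FC1)
    ir_u = cv2.remap(ir_image, map1, map2, cv2.INTER_LINEAR)
    # Nearest neighbour for depth so that no new depth values are invented.
    depth_u = cv2.remap(depth_image, map1, map2, cv2.INTER_NEAREST)
    return ir_u, depth_u
```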

Figure 6.2: Mean dot error [mm] over all points in each data set, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).

As can be seen in the plots of both the sparse dot error (Figure 6.2) and the dense plane error (Figure 6.3), the depth distortion correction improves the situation for all data sets. The numerical data behind these plots can be seen in Table 6.1. The two plots and their numerical data are very similar, showing that the two metrics correspond well to each other, despite one working on a few individual points (27 per image) and the other on thousands of pixels. Without correction, the measurement error of the points (Figure 6.2) is slightly larger than the measurement error of the entire plane (Figure 6.3). This could be because the points' depths are not actually measured, but derived from a plane fit.


Figure 6.3: Mean absolute plane error [mm] in each data set, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).

It could also be due to the different metric, which uses not only the world Z coordinate as in the plane error metric, but the Euclidean distance in XYZ from the measured to the defined point as in the dot error metric. The fact that data sets other than the calibration data set “general1” perform significantly better before correction is most likely a result of them not being as well spread out in the scene as the “general1” data set.

6.2.1 Parameter stability

It is interesting to see how stable the depth distortion parameters are, depending on the data set they are calibrated on. Keeping the intrinsics from the full calibration, and re-calibrating the depth distortion parameters using analytical linear least squares on all data sets that are part of the verification, shows that the depth distortion parameters fluctuate somewhat and do not remain entirely stable. The results are presented in Table 6.2. The analytical calibration of the depth distortion parameters returns similar, but not identical, values for all data sets. The signs of d_y and d_z change between data sets. One data set also shows deviations in d_x and d_w, which both see a small but clear reduction compared to the other data sets. In Table 6.3 we compare the three different implemented methods of determining the depth distortion parameters, again keeping the intrinsics from the original calibration but applying each method to all data sets. The methods are the analytical linear least squares, the iteratively reweighted least squares (IRLS), and the non-linear least squares optimization previously described.


Table 6.1: Mean absolute plane and mean dot error with standard deviation, for each data set. Calibration data set marked with *, calibrated with non-linear least squares.

data set        depth correction    dot error/mm        plane error/mm
                                    mean    std         mean    std
centered1       no                  5.3     3.7         3.8     3.0
                yes                 2.8     1.0         2.2     1.3
general1*       no                  9.7     5.1         8.7     4.5
                yes                 1.4     1.0         2.3     1.9
longplywood1    no                  4.8     3.1         3.8     2.4
                yes                 1.7     1.3         1.6     1.4
longplywood2    no                  4.3     2.5         3.8     2.3
                yes                 3.9     1.9         3.6     1.9
plywood1        no                  6.1     3.5         4.5     2.4
                yes                 1.8     0.9         1.7     0.9
sameheight1     no                  22.3    8.0         21.2    7.4
                yes                 8.3     2.5         8.2     4.0
sameheight2     no                  4.5     3.0         3.9     2.6
                yes                 1.5     0.8         1.3     0.8

The comparison reveals that the three methods return similar results within each data set, but some signs of inter-method instability are visible. We see in Table 6.3 that the d_x parameter is in general 1-2 orders of magnitude larger than the d_y and d_z parameters, while d_w remains close to 1. The d_x parameter is the most stable parameter, closely matched by d_w. It is interesting to note that d_x is aligned with the baseline of the camera, indicating a correction in the direction that provides the basis for the depth calculations. The “sameheight1” data set is the worst data set without correction. It was captured at an approximately constant distance of 1 m, against a background that was invisible in the IR spectrum, and with the pattern along the edges of the image. A possible explanation for why the measurements are off is that the camera got confused by the invisible background. No further study was however done on the possible impact of the background.


Table 6.2: D parameters if intrinsics are kept from the original calibration, but distortion parameters are re-calibrated on the specified data set using the analytical linear least squares

data set        dx            dy            dz            dw
centered1       1.3 × 10^-4   1.5 × 10^-6   1.8 × 10^-6   1.0038
general1*       1.3 × 10^-4  -1.0 × 10^-6  -3.2 × 10^-6   1.0038
longplywood1    1.2 × 10^-4   3.5 × 10^-6  -4.7 × 10^-6   1.0064
longplywood2    9.3 × 10^-5   3.4 × 10^-5   2.1 × 10^-5   0.9940
plywood1        1.3 × 10^-4   5.6 × 10^-6  -4.1 × 10^-6   1.0084
sameheight1     1.1 × 10^-4  -2.6 × 10^-5   5.0 × 10^-6   1.0016
sameheight2     1.2 × 10^-4  -1.0 × 10^-6   1.1 × 10^-6   1.0012

Table 6.3: D parameters if intrinsics are kept from the original calibration, but distortion parameters are re-calibrated on the specified data set using the given method. Calibration data set marked by *.

data set        method        dx            dy            dz            dw
centered1       analytical    1.3 × 10^-4   1.5 × 10^-6   1.8 × 10^-6   1.0038
                IRLS          1.3 × 10^-4  -3.7 × 10^-6   8.4 × 10^-6   0.9996
                non-linear    1.3 × 10^-4   4.4 × 10^-7   5.0 × 10^-6   1.0017
general1*       analytical    1.3 × 10^-4  -1.0 × 10^-6  -3.2 × 10^-6   1.0038
                IRLS          1.3 × 10^-4  -1.5 × 10^-6  -4.0 × 10^-6   1.0039
                non-linear    1.3 × 10^-4  -3.5 × 10^-6  -7.0 × 10^-6   1.0061
longplywood1    analytical    1.2 × 10^-4   3.5 × 10^-6  -4.7 × 10^-6   1.0064
                IRLS          1.2 × 10^-4   6.4 × 10^-6   5.7 × 10^-8   1.0037
                non-linear    1.1 × 10^-4   3.2 × 10^-6  -2.8 × 10^-6   1.0053
longplywood2    analytical    9.3 × 10^-5   3.4 × 10^-5   2.1 × 10^-5   0.9940
                IRLS          9.7 × 10^-5   1.7 × 10^-5   1.3 × 10^-5   0.9983
                non-linear    9.1 × 10^-5   3.1 × 10^-5   1.9 × 10^-5   0.9945
plywood1        analytical    1.3 × 10^-4   5.6 × 10^-6  -4.1 × 10^-6   1.0084
                IRLS          1.3 × 10^-4  -3.1 × 10^-6  -1.1 × 10^-5   1.0112
                non-linear    1.2 × 10^-4   6.1 × 10^-6  -3.0 × 10^-6   1.0079
sameheight1     analytical    1.1 × 10^-4  -2.6 × 10^-5   5.0 × 10^-6   1.0016
                IRLS          1.1 × 10^-4  -4.0 × 10^-5   5.1 × 10^-5   0.9623
                non-linear    1.1 × 10^-4  -3.0 × 10^-5  -1.7 × 10^-5   1.0180
sameheight2     analytical    1.2 × 10^-4  -1.0 × 10^-6   1.1 × 10^-6   1.0012
                IRLS          1.2 × 10^-4   3.3 × 10^-6  -3.1 × 10^-5   1.0134
                non-linear    1.2 × 10^-4   2.8 × 10^-6  -7.9 × 10^-5   1.0325


6.2.2 Difference images

Difference images were generated by comparing the raw depth data and the corrected depth data to the known (defined) ground truth position of the pattern. The slight positional change arises because the data is viewed in the world coordinate system, where a change of distance to the camera in general also brings about a change of the X and Y position. The color corresponds to the value of the plane error defined in Section 5.5.2.

Figure 6.4: Difference image of camera calibration from the “general1” data set accounting for lens distortion, applied on data set “centered1” images 1 to 3. The correction clearly removes an error going from the left side being too far away from the camera to the right side being too close, into an error that is equal over the pattern.

Looking at some images from the “centered1” data set, we see in Figure 6.4 a correction that visually removes the tilt of the error, leaving the pattern a few millimeters too close to the camera. We can also see that the error over the entire pattern decreases, which is clear from the middle row in Fig. 6.4. A small scale difference from the uncorrected image to the corrected image is visible at the right border of the pattern. This is caused by the view being taken as a top-down view of the pattern, regarded as lying in the X,Y-plane with the Z value as the error.


These are two very distinct corrections that are easy to explain. Similar corrections occur throughout all images in all data sets.

Figure 6.5: Original depth data in blue, corrected in green, data from one image in the “centered1” data set after calibrating on “general1”. Camera coordinate frame represented by the axis. Points in the image have been moved closer to the camera the further left in the image they are.

Looking at the data in 3D, in Fig. 6.5 we see the entire scene with the calibration board held by hand. Blue points indicate data before correction, green after. The hard to see blue spheres located in the calibration board represent the actual points used for calibration, green spheres the positions of the calibration points after applying the correction. We see how the left hand side of the image has been moved closer to the camera, and the right hand side further away. In Fig. 6.6 we see the same image but captured from above, which more clearly shows the effect of the correction. In Fig. 6.7 we have an example of when the correction moves all points further away from the camera combined with points on the right hand side being moved further away as well.


Figure 6.6: Original depth data in blue, corrected in green, data from one image in the “centered1” data set after calibrating on “general1”. Same image as in Fig. 6.5 but from above. Points to the left have been moved closer to the camera.

Figure 6.7: Original depth data in blue, corrected in green. The correction has moved points further away from the camera, and points on the right hand side of the image further away than those on the left.


6.2.3 Mean image error

Figure 6.8: Mean absolute plane error [mm] per image when measuring on the “centered1” data set, after calibrating on “general1” accounting for lens distortion, before and after perspective correction, with standard deviations.

Looking at the “centered1” data set in Figure 6.8, we see that the correction makes all mean measurements more accurate. The standard deviations of the measurements are however now overlapping. We see that image 2 performs exceptionally well, with a mean depth error of about 0.5 mm. This image was captured in the middle of the frame, almost parallel to the image plane, at a distance of 0.5 m, occupying about a quarter of the area; in other words, a quite ideal placement. There is however no clear pattern in why the depth error varies as it does over the images in the “centered1” data set shown in Figure 6.8. The first two images, with the largest errors, are quite far away from the camera and parallel to it, which could simply mean that the depth camera is less accurate the further away the target is. But images 7 and 8 are equally far away, and show a much smaller uncorrected depth error. Figure 6.8 also shows that the first two difference images of Figure 6.4 are the two worst uncorrected images of the “centered1” data set. For the “sameheight1” data set in Figure 6.9 we see that the correction generally improves the situation, but not always. Images 8 and 10 become worse. They both have the pattern at the top of the image, very close to the image border. Image 10 additionally has the pattern in the top left corner of the image. The “sameheight2” data set in Figure 6.10 shows a larger gain from applying the depth undistortion than the other two highlighted data sets. This data set gives a quite even response for all images after correction. It contains images both at the left edge and the right edge of the image, without any clear outliers.


Figure 6.9: Mean absolute plane error [mm] per image when measuring on the “sameheight1” data set, after calibrating on “general1” accounting for lens distortion, before and after perspective correction, with standard deviations.

Figure 6.10: Mean absolute plane error [mm] per image when measuring on the “sameheight2” data set, after calibrating on “general1” accounting for lens distortion, before and after perspective correction, with standard deviations. Empty image numbers signify images that were discarded due to pattern finder difficulties.


6.3 2 parameter model (x, w) with lens distortion

Conducting everything as in Section 6.2 but restricting the depth distortion model to only calibrate dx and dw, keeping dy = dz = 0, we get the following results. The summary plots Fig. 6.11 and Fig. 6.12 show that we have almost equal performance to the 4 parameter case shown in Fig. 6.2 and Fig. 6.3. This can also be seen in Table 6.4 and Table 6.1. From the table of parameters in Table 6.5 we see that dx fluctuates more than in the 4 parameter case, whereas dw now fluctuates less for all calibration methods.

Figure 6.11: Mean dot error [mm] over all points in each data set, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).


Figure 6.12: Mean absolute plane error [mm] over all pattern points in each data set, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).

Table 6.4: Mean absolute plane error and mean dot error with standard deviation, for each data set. Calibration data set marked with *, calibrated with non-linear least squares.

data set        depth correction    dot error/mm        plane error/mm
                                    mean    std         mean    std
centered1       no                  5.3     3.7         3.8     3.0
                yes                 2.7     1.0         2.1     1.3
general1*       no                  9.7     5.1         8.7     4.5
                yes                 1.5     1.0         2.3     1.9
longplywood1    no                  4.8     3.1         3.8     2.4
                yes                 1.8     1.2         1.7     1.4
longplywood2    no                  4.3     2.5         3.8     2.3
                yes                 3.9     1.9         3.6     1.9
plywood1        no                  6.1     3.5         4.5     2.4
                yes                 2.2     0.9         2.0     0.9
sameheight1     no                  22.3    8.0         21.2    7.4
                yes                 8.4     2.6         8.3     4.1
sameheight2     no                  4.5     3.0         3.9     2.6
                yes                 1.3     0.9         1.1     0.8


Table 6.5: D parameters if intrinsics are kept from the original calibration, but distortion parameters are re-calibrated on the specified data set using the given method. Calibration data set marked by *.

data set        method        dx            dy    dz    dw
centered1       analytical    1.3 × 10^-4   0     0     1.0048
                IRLS          1.4 × 10^-4   0     0     1.0052
                non-linear    1.3 × 10^-4   0     0     1.0050
general1*       analytical    1.3 × 10^-4   0     0     1.0020
                IRLS          1.3 × 10^-4   0     0     1.0019
                non-linear    1.3 × 10^-4   0     0     1.0018
longplywood1    analytical    1.2 × 10^-4   0     0     1.0039
                IRLS          1.2 × 10^-4   0     0     1.0034
                non-linear    1.1 × 10^-4   0     0     1.0037
longplywood2    analytical    8.6 × 10^-5   0     0     1.0033
                IRLS          9.1 × 10^-5   0     0     1.0043
                non-linear    8.4 × 10^-5   0     0     1.0035
plywood1        analytical    1.3 × 10^-4   0     0     1.0062
                IRLS          1.3 × 10^-4   0     0     1.0059
                non-linear    1.2 × 10^-4   0     0     1.0063
sameheight1     analytical    1.0 × 10^-4   0     0     1.0091
                IRLS          1.1 × 10^-4   0     0     1.0104
                non-linear    1.0 × 10^-4   0     0     1.0089
sameheight2     analytical    1.2 × 10^-4   0     0     1.0017
                IRLS          1.2 × 10^-4   0     0     1.0014
                non-linear    1.2 × 10^-4   0     0     1.0018


6.4 Model comparison

In this section we summarize results in a more readable format. These are the results of the presented models as well as other combinations of the depth distortion parameters not presented above. In Table 6.6 we have mean measurement errors using both metrics. We see that the model utilizing just the dx and dw parameters is the best performing model. The full model is very close in performance, but performs slightly worse.

Table 6.6: Mean absolute plane and mean dot errors for different models calibrated with non-linear least squares, averaged over all data sets except the calibration data set. With(out) lens is short for considering lens distortion (or not).

model                               dot error/mm        plane error/mm
                                    mean    std         mean    std
with lens, all parameters           3.1     1.4         2.9     2.0
with lens, only xy                  3.3     1.5         3.0     2.0
with lens, only xz                  3.2     1.5         3.0     2.0
with lens, only xw                  3.1     1.5         2.9     2.0
with lens, only yw                  8.2     4.5         7.1     3.9
with lens, only x                   3.6     1.5         3.3     2.0
with lens, only y                   7.7     4.4         6.7     3.8
with lens, only z                   7.9     4.4         6.9     3.9
with lens, only w                   7.8     4.4         6.8     3.8
with lens, no depth correction      7.8     4.4         6.8     3.8
without lens, all parameters        9.1     4.6         7.7     3.8

A comparison with all models summarized over each individual data set is available in Table A.1 in Appendix A.

6.5 Multi camera comparison

In order to see how much of the results are reproducible on other cameras, three additional Intel RealSense SR300 cameras were tested. The results in this section are presented in a more summarized manner and with less analysis. For each camera, 6 data sets were collected. The cameras were allowed to warm up before saving images, and at least 1000 frames were discarded before capturing the data sets. Four of the data sets were taken in general position, like the “general1” data set previously described, and two at a fixed distance to the calibration pattern, with image plane approximately parallel to the pattern, like the “sameheight” data sets previously described. The data set names are prefixed with “Ci_”, i = 2, 3, 4 equal to the camera number, in order to distinguish them from the previous camera.


The data sets are thus called “Ci_generalX” with X = 1, 2, 3, 4 and “Ci_sameX” with X = 1, 2. They are unique for each camera, but share the same names as the underlying idea behind them is the same. As such, no data is shared at all between “general1” which was presented previously, and “C2_general1” for camera 2 or “C3_general1” for camera 3. The “Ci_sameX” data sets differ in at what distance the camera was held from the pattern, one at approximately 0.5 m and one at approximately 1 m. The “Ci_generalX” data sets are four different attempts at capturing the pattern in varying positions.

6.5.1 Camera 2

Camera 2 was calibrated on its “C2_general2” data set, which gave a reprojection error of 0.23 pixels. The resulting intrinsics are

K = \begin{pmatrix} 487.63 & 0 & 316.54 & 0 \\ 0 & 487.75 & 245.37 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}    (6.1)

with lens distortion coefficients

(-0.113493, -0.085626, 0.001575, 0.001248, 0.094735)    (6.2)

and depth distortion parameters

d = (0.00017984, 0.00001908, -0.00003684, 1.0393758)    (6.3)

Intel calibrated intrinsics are f_x = f_y = 475.77, c_x = 311.13, c_y = 245.84. We see that we have a quite large focal length difference, and a noticeable difference in the principal point's x coordinate. The plane error is slightly lower than the dot error shown in Fig. 6.13, but very similar in magnitude and is thus left out for brevity. The number of images in each data set varies from 12 to 34. The improvement using different models is summarized in Table 6.7. We see that the full model provides the best correction for this camera and the captured data sets, but that the absolute correction is quite low, dropping from around 6.9 mm to 4.0 mm with the full model. The majority of this correction comes from the x parameter alone, lowering the mean dot error to 4.4 mm.


Figure 6.13: Mean dot error [mm] over all points in each data set for camera 2, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).

Table 6.7: Mean absolute plane and mean dot errors for different models calibrated with non-linear least squares, averaged over all data sets except the calibration data set. With(out) lens is short for considering lens distortion (or not). Reported for camera 2.

model                               dot error/mm        plane error/mm
                                    mean    std         mean    std
with lens, all parameters           2.5     1.6         2.6     1.7
with lens, only xy                  10.3    2.3         9.6     2.4
with lens, only xz                  6.2     2.2         5.8     2.2
with lens, only xw                  4.5     2.0         4.4     2.0
with lens, only yw                  16.9    7.0         14.8    5.6
with lens, only x                   10.2    2.3         9.5     2.4
with lens, only y                   17.7    7.1         15.6    5.6
with lens, only z                   17.3    7.2         15.3    5.7
with lens, only w                   17.0    7.1         14.9    5.6
with lens, no depth correction      17.8    7.1         15.7    5.6
without lens, all parameters        14.2    8.0         11.4    5.9


6.5.2 Camera 3

Camera 3 was calibrated on its “C3_general1” data set, yielding a reprojection error of 0.24 pixels. The resulting intrinsics are

K = \begin{pmatrix} 476.85 & 0 & 314.35 & 0 \\ 0 & 476.06 & 244.32 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}    (6.4)

with lens distortion parameters

(-0.129139, -0.021613, -0.001209, 0.00017, -0.005435)    (6.5)

and depth distortion parameters

d = (0.00005374, -0.00001949, 0.00000946, 0.99313368)    (6.6)

Intel calibrated intrinsics are f_x = f_y = 476.25, c_x = 313.46, c_y = 246.03. We see that we in general have a good correspondence to the Intel calibrated intrinsics, with a slightly larger deviation in the principal point's y coordinate. The plane error is slightly lower than the dot error shown in Fig. 6.14, but very similar in magnitude and is thus left out for brevity. The number of images in each data set varies from 18 to 30.

Figure 6.14: Mean dot error [mm] over all points in each data set for camera 3, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).

The improvement using different models is summarized in Table 6.8. We see that the full model provides the best correction for this camera and the captured data sets, and that no two-parameter model comes close in performance.


Table 6.8: Mean absolute plane and mean dot errors for different models calibrated with non-linear least squares, averaged over all data sets except the calibration data set. With(out) lens is short for considering lens distortion (or not). Reported for camera 3.

model                               dot error/mm        plane error/mm
                                    mean    std         mean    std
with lens, all parameters           4.0     2.0         3.5     2.0
with lens, only xy                  4.3     2.0         3.8     2.0
with lens, only xz                  4.7     2.1         4.2     2.1
with lens, only xw                  4.6     2.1         4.1     2.1
with lens, only yw                  6.7     2.7         6.1     2.5
with lens, only x                   4.4     2.1         4.0     2.1
with lens, only y                   6.7     2.7         6.2     2.5
with lens, only z                   7.0     2.7         6.3     2.6
with lens, only w                   7.1     2.7         6.4     2.6
with lens, no depth correction      7.0     2.7         6.3     2.6
without lens, all parameters        11.1    6.9         9.4     5.6

6.5.3 Camera 4

Camera 4 was calibrated on its “C4_general1” data set, yielding a reprojection error of 0.18 pixels. The resulting intrinsics are

K = \begin{pmatrix} 476.82 & 0 & 311.17 & 0 \\ 0 & 476.76 & 240.12 & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix}    (6.7)

with lens distortion parameters

(-0.117859, -0.073007, 0.001899, 0.000341, 0.055911)    (6.8)

and depth distortion parameters

d = (0.00004329, 0.00000009, 0.00000114, 1.0045781)    (6.9)

Intel calibrated intrinsics are f_x = f_y = 476.7, c_x = 312.2, c_y = 245.96, i.e. we have a noticeable difference in the principal point's y coordinate. The plane error is slightly lower than the dot error shown in Fig. 6.15, but similar in magnitude and is thus left out for brevity. The number of images in each data set varies from 21 to 45. The improvement using different models is summarized in Table 6.9. We see that the full model provides the best correction for this camera and the captured data sets. As with the first camera analyzed, the two-parameter xw model provides the same level of correction.


Figure 6.15: Mean dot error [mm] over all points in each data set for camera 4, before and after perspective correction, with image mean standard deviations. Calibration data set marked with (calibration).


Table 6.9: Mean absolute plane and mean dot errors for different models calibrated with non-linear least squares, averaged over all data sets except the calibration data set. With(out) lens is short for considering lens distortion (or not). Reported for camera 4.

model                               dot error/mm        plane error/mm
                                    mean    std         mean    std
with lens, all parameters           2.5     1.3         2.7     1.9
with lens, only xy                  3.5     1.4         4.0     2.0
with lens, only xz                  2.8     1.4         2.7     1.9
with lens, only xw                  2.5     1.3         2.7     1.9
with lens, only yw                  4.1     2.0         4.2     2.3
with lens, only x                   3.7     1.2         4.2     2.0
with lens, only y                   4.5     1.7         4.8     2.2
with lens, only z                   3.8     1.7         4.1     2.2
with lens, only w                   3.5     1.6         3.8     2.1
with lens, no depth correction      4.5     1.7         4.7     2.2
without lens, all parameters        10.5    5.9         9.1     4.9

Chapter 7

Discussion and conclusions

7.1 Discussion

In this section we mention concerns of validity and considerations of possible issues in the work. These considerations also give rise to suggestions for further work, which are presented at the end.

7.1.1 Lens distortion

This work has not paid much attention to distortions caused by the lens or by imperfect determination of the camera intrinsics. The primary reason for this is uncertainty as to what effects it would have on the depth, since such a correction affects the way depth is calculated. The IR image and depth image have clear lens distortions present. A naive correction of the distortion, by remapping the infrared and depth data with distortion parameters found from calibration of the infrared camera, does make the distortions smaller in the infrared image, but the validity of the depth data is uncertain. A possible future work could be to study this effect and propose a way of rectifying it after the depth calculation. However, one could hope that the error from handling lens distortion as presented is small, due to the fact that librealsense released by Intel does a similar thing.

7.1.2 Parameter stability

It has been observed that the distortion parameters fluctuate quite a bit even for the same camera, and so far the reasons are unknown. What does seem to hold, however, is that the parameters calibrated on one data set improve the depth data for most other data sets. Whether this remains true under varying temperatures, ambient illumination and other possible factors affecting the camera has not been studied. During the various data set captures the ambient illumination and the room temperature have been approximately constant, but they could still have an impact. The fluctuations will of course to a degree also vary with the data. Data sets with points spanning the entire measurement volume should logically provide a much more general adaptation of the depth distortion parameters than data sets involving multiple near-degenerate cases, such as groups of planes in approximately the same location and orientation.

To the author's knowledge, Intel has not released any public information on how the IR projector in the RealSense SR300 actually works. However, people dismantling and studying it have stated that the Intel RealSense SR300 seems to use a very small mirror, vibrating with a frequency in the kHz range [5], combined with a line laser as the IR projector. The mirror seems to vibrate in the same direction as the baseline, as seen in [15]. An error source could here be a slight mistiming of the circuitry, causing the pattern to be projected with a slight angular error compared to what was expected. Considering the construction of the camera, a plausible explanation for the parameter instability is that something occurs inside the camera between captures. The laser projector in the camera has a moving mirror, which could affect the measurements depending on the orientation of the camera or movements of the camera while capturing. Small timing fluctuations between the IR camera shutter and the laser projector could also influence the measurements and thus the parameters. For two of the tested cameras the d_x and d_w parameters are the most stable of the 4 parameters and offer equal performance to using the complete model. This could indicate that there are indeed issues with the measured baseline or timing errors with the laser, or an issue with the calibrated principal point in the x direction, as these effects are expected to show up in these parameters.

Calibrating with the non-linear optimization seems to be less prone to fluctuations between data sets, but they still occur. The results of the non-linear optimization are usually very similar to those of the linear least squares and the iteratively reweighted least squares. The non-linear optimization seems to perform better than the other methods in almost degenerate cases, such as when the combined set of all points in a data set almost forms a plane.

7.1.3 Error metrics

All of the dense plane errors are taken along the Z-coordinate in the world coordinate system, as explained in Section 5.5.2. In other words, how large a part of this error that actually is depth error varies with the angle to the pattern. The pattern is however not imaged at angles larger than 45 degrees, which somewhat limits the impact of this error source. This problem exists for the dense plane error, but not for the sparse dot error metric. The sparse dot error takes the 3-dimensional error into account, as explained in Section 5.5.1. It therefore does not have the same problematic dependence on the angle to the imaged pattern.


7.1.4 Measurement error

No weighting of the measurements has been done depending on their distance to the camera. This would potentially be beneficial, since the depth calculation depends on triangulation, and the precision of triangulation naturally decreases the further away the measured object is from the sensor. It is also quite possible that the printed pattern has an uneven error, or a larger error along one axis, leading to a bias in all measurements. The IR image is also regarded as perfect in comparison to the depth image, and the model is adapted to make the depth data correspond with the IR image. The IR image is however also affected by noise, which can bias the pattern detection and the further calibration of the model. Some close-up pictures of the pattern have also been removed, due to the uneven illumination from the laser causing detected marker positions that are visibly off from the marker centers. It has not been studied to what extent the reprojection errors vary in each image, or whether there is any clear bias.

7.2 Future work

We suggest the following extensions of the work:

Physical interpretation: As mentioned earlier in the report, the proposed parameters have a purely empirical derivation to account for detected errors. Ideas for their possible underlying cause have been shared, but their true physical interpretation is not clear. This would be of interest for a better understanding of what is being corrected for.

Parameter stability: The parameters vary between data sets for an unknown reason. A further study, possibly coupled tightly with the physical interpretation, would be to gauge the outcome more thoroughly and find an optimal parameter set for a given camera.

Depth correction comparison: There have been several published articles ([13, 17, 2, 26] to name a few) correcting the data returned from depth sensors. Methods of correction include B-splines, depth-multiplier images and bias correction. It would be interesting to do a comparison between the methods and the improvement of accuracy that can be achieved with each. It would also be interesting to study combinations of correction methods, and see if a combination is better than any individual method.


7.3 Conclusions

This thesis has investigated a new linear empirical method of correcting depth from a depth sensor with image information available in the same reference frame. The depth distortion correction is formulated as a restricted homography in P^3 with only 4 variable parameters. It is based around the idea of altering only the measured depth of pixels, but not their positions in the image, while keeping planes planar. Determination of the depth distortion parameters requires at least 4 points not all positioned in the same plane. The method was tested on images and depth data from Intel RealSense SR300 cameras. The results indicate that calibrating once and using the result for further recordings provides a noticeable improvement in the accuracy of the depth data, without requiring adjustments after the initial calibration. The average uncorrected measurement error over the collected data sets for the primary camera is 7.8 ± 4.4 mm, and around 3.1 ± 1.4 mm when corrected. Utilizing only the d_x and d_w parameters of the model provides similar correction, showing that 2 parameters are sufficient for this camera. Testing on additional cameras revealed that the reduced model only works on some cameras, but that the full model corrected measurements from 17.8 ± 7.1 mm to 2.5 ± 1.6 mm, 6.9 ± 2.7 mm to 4.0 ± 2.0 mm and 4.5 ± 1.7 mm to 2.5 ± 1.3 mm. This indicates that the accuracy of the cameras varies quite widely, despite them being of the same model.

Appendix A

Numerical results camera 1

Summary statistics for all models over all data sets for camera 1 can be seen in Table A.1. These results are also averaged over all data sets in Table 6.6.

Table A.1: Mean absolute dense and point errors for different models calibrated with non-linear least squares. With(out) lens is short for considering lens distortion (or not). Calibration data set marked with *.

                                                  dot error/mm      plane error/mm
data set       model                              mean    std       mean    std

centered1      with lens, all parameters           2.8    1.0        2.2    1.3
               with lens, only xy                  3.1    1.0        2.5    1.3
               with lens, only xz                  2.7    1.0        2.2    1.3
               with lens, only xw                  2.7    1.0        2.1    1.3
               with lens, only yw                  5.7    4.1        4.3    3.2
               with lens, only x                   3.3    1.0        2.6    1.3
               with lens, only y                   5.2    3.7        3.8    2.9
               with lens, only z                   5.4    3.9        4.0    3.1
               with lens, only w                   5.3    3.8        3.9    3.0
               with lens, no depth correction      5.3    3.7        3.8    3.0
               without lens, all parameters        6.5    3.4        4.4    2.1

general1*      with lens, all parameters           1.4    1.0        2.3    1.9
               with lens, only xy                  1.6    1.1        2.4    2.0
               with lens, only xz                  1.5    1.0        2.4    2.0
               with lens, only xw                  1.5    1.0        2.3    1.9
               with lens, only yw                  9.7    5.1        8.7    4.5
               with lens, only x                   1.7    1.1        2.6    2.0
               with lens, only y                   9.7    5.1        8.6    4.5
               with lens, only z                   9.7    5.1        8.7    4.5
               with lens, only w                   9.7    5.1        8.7    4.5
               with lens, no depth correction      9.7    5.1        8.7    4.5
               without lens, all parameters        6.7    3.7        5.7    3.2

longplywood1   with lens, all parameters           1.7    1.3        1.6    1.4
               with lens, only xy                  2.2    1.4        2.1    1.5
               with lens, only xz                  1.9    1.2        1.8    1.4
               with lens, only xw                  1.8    1.2        1.7    1.4
               with lens, only yw                  5.0    3.1        4.0    2.4
               with lens, only x                   2.3    1.3        2.2    1.5
               with lens, only y                   4.7    3.1        3.8    2.4
               with lens, only z                   4.8    3.1        3.8    2.3
               with lens, only w                   4.8    3.1        3.9    2.4
               with lens, no depth correction      4.8    3.1        3.8    2.4
               without lens, all parameters        8.4    3.8        7.5    3.3

longplywood2   with lens, all parameters           3.9    1.9        3.6    1.9
               with lens, only xy                  4.3    2.0        4.0    2.0
               with lens, only xz                  4.0    1.9        3.7    1.9
               with lens, only xw                  3.9    1.9        3.6    1.9
               with lens, only yw                  4.3    2.7        3.8    2.4
               with lens, only x                   4.6    2.0        4.2    2.0
               with lens, only y                   4.5    2.7        3.9    2.4
               with lens, only z                   4.2    2.5        3.6    2.2
               with lens, only w                   4.2    2.5        3.7    2.2
               with lens, no depth correction      4.3    2.5        3.8    2.3
               without lens, all parameters        5.9    3.2        5.3    3.0

plywood1       with lens, all parameters           1.8    0.9        1.7    0.9
               with lens, only xy                  3.0    1.0        2.5    1.0
               with lens, only xz                  2.5    0.9        2.2    0.9
               with lens, only xw                  2.2    0.9        2.0    0.9
               with lens, only yw                  7.0    3.7        5.2    2.4
               with lens, only x                   3.0    0.9        2.6    1.0
               with lens, only y                   6.1    3.5        4.5    2.4
               with lens, only z                   6.2    3.5        4.6    2.4
               with lens, only w                   6.2    3.5        4.6    2.4
               with lens, no depth correction      6.1    3.5        4.5    2.4
               without lens, all parameters        5.6    1.7        4.5    1.3

sameheight1    with lens, all parameters           8.3    2.5        8.2    4.0
               with lens, only xy                  7.6    2.4        7.7    4.0
               with lens, only xz                  8.3    2.6        8.3    4.1
               with lens, only xw                  8.4    2.6        8.3    4.1
               with lens, only yw                 23.4    8.0       22.2    7.4
               with lens, only x                   9.1    2.5        9.0    4.1
               with lens, only y                  22.1    7.9       21.1    7.4
               with lens, only z                  22.7    8.0       21.6    7.5
               with lens, only w                  22.4    8.0       21.4    7.4
               with lens, no depth correction     22.3    8.0       21.2    7.4
               without lens, all parameters       20.7    7.7       18.1    6.3

sameheight2    with lens, all parameters           1.5    0.8        1.3    0.8
               with lens, only xy                  1.4    1.0        1.2    0.9
               with lens, only xz                  1.3    0.9        1.1    0.9
               with lens, only xw                  1.3    0.9        1.1    0.8
               with lens, only yw                  4.4    3.0        3.8    2.6
               with lens, only x                   1.4    0.9        1.2    0.9
               with lens, only y                   4.4    3.0        3.8    2.6
               with lens, only z                   4.4    3.0        3.8    2.6
               with lens, only w                   4.4    3.0        3.8    2.6
               with lens, no depth correction      4.5    3.0        3.9    2.6
               without lens, all parameters        7.4    4.5        6.5    3.8


Appendix B

Depth distortion properties

We would like to show that the depth distortion keeps planes planar and only alters the depth of points while retaining their image x, y positions. The plane may change orientation and position in space, as long as all points that previously lay in a plane still lie in a plane after the transformation.

We work in $\mathbb{P}^3$, with world points $\mathbf{X} = (X, Y, Z, W)^T$. The only transformations of this space that keep planes planar are $4 \times 4$ homographies; this is because lines remain lines under a homography, see [9, p. 64], which follows from the results in the two-dimensional case on [9, p. 33]. As such, we only need to consider transformations that are $4 \times 4$ invertible matrices. We now want to see which $4 \times 4$ matrices yield the same image points under the homogeneous pinhole camera projection $K$. We have

$$K = \begin{pmatrix} f_x & s & c_x & 0 \\ 0 & f_y & c_y & 0 \\ 0 & 0 & 0 & 1 \\ 0 & 0 & 1 & 0 \end{pmatrix} \tag{B.1}$$

and the general $4 \times 4$ distortion matrix

$$D = \begin{pmatrix} d_{11} & d_{12} & d_{13} & d_{14} \\ d_{21} & d_{22} & d_{23} & d_{24} \\ d_{31} & d_{32} & d_{33} & d_{34} \\ d_{41} & d_{42} & d_{43} & d_{44} \end{pmatrix}.$$

We want the projection of a world point $\mathbf{X}$ to look the same with and without this distortion, i.e. $K\mathbf{X} \sim KD\mathbf{X}$. The imaged point follows $\mathbf{x} = (x, y, d, w)^T$, since the projection matrix turns depth into disparity. The combined transformation matrix is

$$KD = \begin{pmatrix}
f_x d_{11} + s d_{21} + c_x d_{31} & f_x d_{12} + s d_{22} + c_x d_{32} & f_x d_{13} + s d_{23} + c_x d_{33} & f_x d_{14} + s d_{24} + c_x d_{34} \\
f_y d_{21} + c_y d_{31} & f_y d_{22} + c_y d_{32} & f_y d_{23} + c_y d_{33} & f_y d_{24} + c_y d_{34} \\
d_{41} & d_{42} & d_{43} & d_{44} \\
d_{31} & d_{32} & d_{33} & d_{34}
\end{pmatrix} \tag{B.2}$$


Comparing the action of Eq. (B.1) and Eq. (B.2) on a world point $\mathbf{X}$ element by element, $K\mathbf{X} \sim KD\mathbf{X}$, we get

1st element, x

$$\begin{aligned} X f_x + Y s + Z c_x \sim{} & X \left( f_x d_{11} + s d_{21} + c_x d_{31} \right) \\ & + Y \left( f_x d_{12} + s d_{22} + c_x d_{32} \right) \\ & + Z \left( f_x d_{13} + s d_{23} + c_x d_{33} \right) \\ & + W \left( f_x d_{14} + s d_{24} + c_x d_{34} \right), \end{aligned}$$

where $\sim$ denotes equality up to a multiplicative scale factor (but not up to an additive bias), since we are dealing with homogeneous coordinates. For this relation to hold for a general point $\mathbf{X}$ and camera matrix $K$, every $d_{ij}$ that is multiplied by a variable not present on the left-hand side must equal zero. We thus require $d_{21} = d_{31} = d_{12} = d_{32} = d_{13} = d_{23} = d_{14} = d_{24} = d_{34} = 0$. This yields

$$X f_x + Y s + Z c_x \sim X f_x d_{11} + Y s d_{22} + Z c_x d_{33},$$

where it is clear that we require $d_{11} = d_{22} = d_{33}$ for the relation to hold for all points $\mathbf{X}$ without modifying the image information:

$$X f_x + Y s + Z c_x \sim d_{11}\left( X f_x + Y s + Z c_x \right).$$

2nd element, y

$$\begin{aligned} Y f_y + Z c_y \sim{} & X \left( f_y d_{21} + c_y d_{31} \right) \\ & + Y \left( f_y d_{22} + c_y d_{32} \right) \\ & + Z \left( f_y d_{23} + c_y d_{33} \right) \\ & + W \left( f_y d_{24} + c_y d_{34} \right) \end{aligned}$$

provides no new information compared to the x analysis. Removing the entries that need to be zero gives

$$Y f_y + Z c_y \sim Y f_y d_{22} + Z c_y d_{33}$$

and the same constraint as before for the two remaining distortion matrix entries.

3rd element, the disparity d

We do want to modify the disparity in order to correct it, so the following relation does not need to hold:

$$W \sim X d_{41} + Y d_{42} + Z d_{43} + W d_{44}.$$

We can see that keeping $d_{4i}$, $i = 1, 2, 3, 4$, variable gives us the most degrees of freedom for modifying the disparity.

4th element, w

$$Z \sim X d_{31} + Y d_{32} + Z d_{33} + W d_{34}$$ provides no new information compared to the x and y analysis.

Results

The matrix $D$ now takes the form

$$D = \begin{pmatrix} d_{11} & 0 & 0 & 0 \\ 0 & d_{11} & 0 & 0 \\ 0 & 0 & d_{11} & 0 \\ d_{41} & d_{42} & d_{43} & d_{44} \end{pmatrix},$$

which corresponds to the matrix form used in this thesis. Since the matrix is homogeneous it is only defined up to a scale factor, so we can set $d_{11} = 1$ without loss of generality.
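As a quick numerical sanity check of this result (a sketch with arbitrary example intrinsics and parameter values, not an experiment from the thesis), the snippet below verifies that the restricted D leaves the projected pixel position untouched and only changes the disparity:

```python
import numpy as np

# Homogeneous pinhole projection mapping (X, Y, Z, W) to (x, y, d, w) as in
# Eq. (B.1); the intrinsic values are arbitrary example numbers.
fx, fy, s, cx, cy = 600.0, 600.0, 0.0, 320.0, 240.0
K = np.array([[fx,  s,  cx, 0.0],
              [0.0, fy, cy, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [0.0, 0.0, 1.0, 0.0]])

# Restricted distortion: identity except for the free last row (d11 = 1).
D = np.eye(4)
D[3] = [1e-4, -2e-4, 5e-5, 1.02]        # example (d41, d42, d43, d44)

X = np.array([0.3, -0.1, 1.5, 1.0])     # a homogeneous world point

p = K @ X                               # (x, y, d, w) without distortion
q = K @ D @ X                           # (x, y, d, w) with distortion

assert np.allclose(p[:2] / p[3], q[:2] / q[3])   # pixel position unchanged
print("disparity changes from", p[2] / p[3], "to", q[2] / q[3])
```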


Appendix C

Ethical considerations

Camera surveillance is ever increasing in popularity and finds more and more uses in everyday life, in areas such as protecting property, protecting individuals and analyzing production. Stores and banks use cameras to ensure that potential burglars are deterred, or caught on film so that they can be identified after a crime has been committed. Commuter transit systems use cameras to monitor for crimes as well as to prevent people from ending their lives in front of trains. Industry and factories use cameras to make sure that a product looks as expected and has not been damaged along the way, or to enable robotic sorting of items. These are just a few examples of where cameras are in use. Most of these situations would be assisted by extending the image with depth information, but widespread adoption has so far not happened, likely due to the increased cost as well as possible technical issues of depth cameras in comparison to normal cameras.

When cameras offer a possible intrusion into people's private lives, however, some considerations need to be made. The balance between security and privacy is a delicate one: ideally, no one should be tracked except those who would hurt others, damage property or break the law severely enough to warrant tracking their movements. As artificial intelligence applied to image understanding is on the rise, these issues only grow larger. A human is perhaps no longer watching the footage directly; instead a computer labels what is happening, tracks individuals and searches through enormous amounts of data at speeds that are not possible for a human.

Regarding the ethical aspects of this work, the direct consequences are minimal to non-existent. Using calibrated cameras instead of uncalibrated ones is beneficial whenever metric measurements and general distortion correction are wanted, which can be utilized in the reconstruction of physical objects.

From a societal perspective, this work is primarily of interest for people conducting measurements using cameras. Surveillance is one mentioned area where measurements are of interest, but industrial settings such as production facilities or waste-sorting plants are other examples. However, augmented reality is on the rise and being used by the general public. While accurate measurements may not

be the primary goal of augmented reality, there might come a point in the near future where increased accuracy is desired. This could broaden the scope of parties interested in this thesis.

We see no direct issues with this thesis from a sustainability perspective; rather, we see advantages. Cameras and similar sensors for measuring depth already exist, and there are no signs that the use of cameras will decrease, rather the contrary. Given this assumption, it should be better that each camera measures as accurately as possible, reducing the need to complement it with additional sensors. A possible issue could be a positive feedback effect, increasing demand for cameras and competing with other sensors that might have a lower environmental footprint. Should this occur, however, there are most likely other issues stopping the competing sensor from reaching market adoption, such as cost or ease of use. This thesis thus poses no issues in terms of sustainability or economics.

For civilian or industrial uses, the primary risk of this work is that someone can now take a more accurate measurement from an image or an image sequence thanks to the added calibration. But methods of calibrating a camera have been known for years and have already been adapted to situations where privacy intrusion could occur, such as metric reconstruction of a person's body from surveillance (RGB-D) footage. So although this work aids in accurate calibration for more accurate measurements, which could aid a dubious application, it does not in general pose any ethical issues of an environmental, scientific or societal character.

Bibliography

[1] Motilal Agrawal and Larry S Davis. Camera calibration using spheres: A semi-definite programming approach. In Computer Vision, 2003. Proceedings. Ninth IEEE International Conference on, pages 782–789. IEEE, 2003.

[2] Filippo Basso, Alberto Pretto, and Emanuele Menegatti. Unsupervised intrinsic and extrinsic calibration of a camera-depth sensor couple. In Robotics and Automation (ICRA), 2014 IEEE International Conference on, pages 6244–6249. IEEE, 2014.

[3] Amira Belhedi, Steve Bourgeois, Vincent Gay-Bellile, Patrick Sayd, Alberto Bartoli, and Kamel Hamrouni. Non-parametric depth calibration of a ToF camera. In Image Processing (ICIP), 2012 19th IEEE International Conference on, pages 549–552. IEEE, 2012.

[4] G. Bradski. The OpenCV library. Dr. Dobb's Journal of Software Tools, 2000.

[5] Sinjin Dixon-Warren (Chipworks). Inside the Intel RealSense gesture camera. http://www.chipworks.com/about-chipworks/overview/blog/inside-the-intel-realsense-gesture-camera. Accessed 2016-10-31.

[6] Ankur Datta, Jun-Sik Kim, and Takeo Kanade. Accurate camera calibration using iterative refinement of control points. In Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pages 1201–1208. IEEE, 2009.

[7] Damien Douxchamps and Kunihiro Chihara. High-accuracy and robust localization of large control markers for geometric camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 31(2):376–383, 2009.

[8] Jason Geng. Structured-light 3D surface imaging: a tutorial. Adv. Opt. Photon., 3(2):128–160, Jun 2011.

[9] Richard Hartley and Andrew Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2003.

[10] Janne Heikkilä. Geometric camera calibration using circular control points. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(10):1066–1077, 2000.


[11] Janne Heikkilä and Olli Silvén. A four-step camera calibration procedure with implicit image correction. In Computer Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE Computer Society Conference on, pages 1106–1112. IEEE, 1997.

[12] Jussi Heikkinen and Keijo Inkilä. The comparison of single view calibration methods. In SPIE Optical Metrology, pages 80850Q–80850Q. International Society for Optics and Photonics, 2011.

[13] C Herrera, Juho Kannala, Janne Heikkilä, et al. Joint depth and color camera calibration with distortion correction. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 34(10):2058–2064, 2012.

[14] Intel. librealsense. https://github.com/IntelRealSense/librealsense. Accessed 2016-10-31.

[15] Tim Duncan (Intel). Can your webcam do this? Exploring the Intel RealSense 3D camera (F200). https://software.intel.com/en-us/blogs/2015/01/26/can-your-webcam-do-this. Accessed: 2016-10-31.

[16] Shahram Izadi, David Kim, Otmar Hilliges, David Molyneaux, Richard Newcombe, Pushmeet Kohli, Jamie Shotton, Steve Hodges, Dustin Freeman, Andrew Davison, et al. KinectFusion: Real-time 3D reconstruction and interaction using a moving depth camera. In Proceedings of the 24th annual ACM symposium on User Interface Software and Technology, pages 559–568. ACM, 2011.

[17] Jiyoung Jung, Joon-Young Lee, Yekeun Jeong, and In So Kweon. Time-of-flight sensor calibration for a color and depth camera pair. IEEE Trans. Pattern Anal. Mach. Intell., 37(7):1501–1513, Jul 2015.

[18] Kourosh Khoshelham and Sander Oude Elberink. Accuracy and resolution of Kinect depth data for indoor mapping applications. Sensors, 12(12):1437–1454, Feb 2012.

[19] Marvin Lindner, Ingo Schiller, Andreas Kolb, and Reinhard Koch. Time-of-flight sensor calibration for accurate range sensing. Computer Vision and Image Understanding, 114(12):1318–1328, Dec 2010.

[20] John Mallon and Paul F Whelan. Precise radial un-distortion of images. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 1, pages 18–21. IEEE, 2004.

[21] Iason Oikonomidis, Manolis Lourakis, and Antonis Argyros. Evolutionary quasi-random search for hand articulations tracking. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3422–3429, 2014.


[22] Hanhoon Park and Jong-Il Park. Linear camera calibration from a single view of two concentric semicircles for augmented reality applications. In Electronic Imaging 2005, pages 353–361. International Society for Optics and Photonics, 2005.

[23] Carolina Raposo, Joao P Barreto, and Urbano Nunes. Fast and accurate calibration of a Kinect sensor. In 3D Vision-3DV 2013, 2013 International Conference on, pages 342–349. IEEE, 2013.

[24] Andrew Richardson, Johannes Strom, and Edwin Olson. AprilCal: Assisted and repeatable camera calibration. In Proceedings of the IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), November 2013.

[25] Davide Scaramuzza and Friedrich Fraundorfer. Visual odometry [tutorial]. Robotics & Automation Magazine, IEEE, 18(4):80–92, 2011.

[26] Alex Teichman, Stephen Miller, and Sebastian Thrun. Unsupervised intrinsic calibration of depth sensors via SLAM. In Robotics: Science and Systems. Citeseer, 2013.

[27] Jianhua Wang, Fanhuai Shi, Jing Zhang, and Yuncai Liu. A new calibration model of camera lens distortion. Pattern Recognition, 41(2):607–615, Feb 2008.

[28] Zhuo Wang. Pin-hole modelled camera calibration from a single image. 2009.

[29] Wikipedia. Iteratively reweighted least squares — Wikipedia, the free encyclopedia. https://en.wikipedia.org/wiki/Iteratively_reweighted_least_squares. Online, Accessed: 2016-10-28.

[30] Wei Xiang, Christopher Conly, Christopher D McMurrough, and Vassilis Athitsos. A review and quantitative comparison of methods for Kinect calibration. In Proceedings of the 2nd International Workshop on Sensor-based Activity Recognition and Interaction, page 3. ACM, 2015.

[31] Zhengyou Zhang. A flexible new technique for camera calibration. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 22(11):1330–1334, 2000.

[32] Zhengyou Zhang. Camera calibration with one-dimensional objects. Pattern Analysis and Machine Intelligence, IEEE Transactions on, 26(7):892–899, 2004.

[33] Yinqiang Zheng and Yuncai Liu. The projective equation of a circle and its application in camera calibration. In Pattern Recognition, 2008. ICPR 2008. 19th International Conference on, pages 1–4. IEEE, 2008.

[34] Qian-Yi Zhou and Vladlen Koltun. Simultaneous localization and calibration: Self-calibration of consumer depth cameras. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 454–460, 2014.
