Calibration Using a General Homogeneous Depth Camera Model

DEGREE PROJECT IN THE FIELD OF TECHNOLOGY ENGINEERING PHYSICS
AND THE MAIN FIELD OF STUDY ELECTRICAL ENGINEERING,
SECOND CYCLE, 30 CREDITS
STOCKHOLM, SWEDEN 2017

Calibration using a general homogeneous depth camera model

DANIEL SJÖHOLM

KTH ROYAL INSTITUTE OF TECHNOLOGY
SCHOOL OF COMPUTER SCIENCE AND COMMUNICATION

Master's Thesis at CSC/CVAP
Supervisor: Magnus Burenius
Supervisor at KTH: Patric Jensfelt
Examiner: Joakim Gustafson

Abstract

Being able to accurately measure distances in depth images is important for accurately reconstructing objects. But the measurement of depth is a noisy process, and depth sensors can benefit from additional correction even after factory calibration.

We regard the pair of depth sensor and image sensor as one single unit returning complete 3D information. The 3D information is combined by relying on the more accurate image sensor for everything except the depth measurement.

We present a new linear method of correcting depth distortion, using an empirical model built around the constraint of modifying only the depth data while keeping planes planar. The depth distortion model is implemented and tested on the Intel RealSense SR300 camera.

The results show that the model is viable and generally decreases depth measurement errors after calibration, with an average improvement in the 50 % range on the tested data sets.

Referat (Swedish abstract, translated)

Calibration of a general homogeneous depth camera model

Being able to measure distances accurately in depth images is important for making good reconstructions of objects. But this measurement process is noisy, and today's depth sensors benefit from further correction after factory calibration.

We regard the pair of a depth sensor and an image sensor as a single unit that returns complete 3D information. The 3D information is built up from the two sensors by relying on the more precise image sensor for everything except the depth measurement.

We present a new linear method for correcting depth distortion using an empirical model, based on modifying only the depth data while planar surfaces are kept planar. The depth distortion model was implemented and tested on the Intel RealSense SR300 camera.

The results show that the model works and as a rule decreases the depth measurement error after calibration, with an average improvement of around 50 % for the tested data sets.

Acknowledgments

I would like to thank Magnus Burenius for offering the thesis project and supervising me, and especially for sharing his devised linear depth distortion model, allowing me to write my thesis on it. I would in addition like to thank everyone who has taken part in discussions regarding the camera internals and the possible physical causes of the distortion.

Contents

1 Introduction
  1.1 Objectives
  1.2 Outline
2 Background
  2.1 Projective geometry
  2.2 Geometric transformations
    2.2.1 Notation
    2.2.2 Rotations
    2.2.3 Euclidean transformations
    2.2.4 Similarity transformation
    2.2.5 Affine transformation
    2.2.6 Projective transformation
    2.2.7 Extension to 3D
  2.3 Camera model
    2.3.1 Extension to depth cameras
    2.3.2 Lens correction model
    2.3.3 Calibration outline
  2.4 Depth sensor
    2.4.1 Structured light
    2.4.2 Depth versus lens distortion
  2.5 Accuracy metrics
    2.5.1 Reprojection error
    2.5.2 Depth accuracy
    2.5.3 Depth and image combined
3 Related work
  3.1 Camera calibration
  3.2 Depth cameras
4 Theory
  4.1 Homogeneous depth camera model
  4.2 Projective depth distortion
  4.3 General homogeneous depth camera model
  4.4 Calibrating the depth distortion parameters analytically
  4.5 Iterative reweighted least squares
  4.6 Calibrating the depth distortion parameters non-linearly
  4.7 Parameter interpretation
  4.8 Parameter visualization
5 Implementation
  5.1 Physical hardware
  5.2 Calibration pattern
  5.3 Data collection
  5.4 Calibration algorithm
  5.5 Evaluation
    5.5.1 Dot error
    5.5.2 Plane error
6 Results
  6.1 Data set list
  6.2 Complete model with lens distortion
    6.2.1 Parameter stability
    6.2.2 Difference images
    6.2.3 Mean image error
  6.3 2-parameter model (x, w) with lens distortion
  6.4 Model comparison
  6.5 Multi-camera comparison
    6.5.1 Camera 2
    6.5.2 Camera 3
    6.5.3 Camera 4
7 Discussion and conclusions
  7.1 Discussion
    7.1.1 Lens distortion
    7.1.2 Parameter stability
    7.1.3 Error metrics
    7.1.4 Measurement error
  7.2 Future work
  7.3 Conclusions
Appendices
A Numerical results camera 1
B Depth distortion properties
C Ethical considerations
Bibliography
Chapter 1

Introduction

The camera has been a huge part of society ever since its invention. Its progress into the digital world, with digital cameras and small embedded optics in today's smartphones, shows that it is not going to disappear anytime soon.

As is well known, a camera captures a two-dimensional image of our three-dimensional world. From this image it is in some situations possible to reconstruct parts of the captured 3D world. However, this reconstruction is generally not well defined: scale and even perspective can be completely off from the true situation. A way to enable a reconstruction that is correct both in perspective and in scale is to calibrate the camera. Calibration establishes a connection between the metric properties of the camera, such as its position in space and the distance between the lens and the optical center (the focal length), and the images captured by it.

More than one 2D image is required to infer general information about a 3D scene. This can be solved by capturing multiple images of the scene from different angles, or by extending the information present in the image with data from another sensor. Using a stereo setup of cameras, one can begin to reconstruct depth by matching features between the two images. A well-textured scene is however required in order to match features between the images and draw conclusions about the depth. The distance to the object also cannot be too great in comparison to the inter-camera distance, or the views become too similar.
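The baseline argument can be made concrete with the standard relation for a rectified stereo pair; the symbols below (focal length f in pixels, baseline b, disparity d) are the usual textbook notation rather than definitions taken from this thesis:

```latex
% Depth from disparity in a rectified stereo pair, and the sensitivity of
% the recovered depth Z to a disparity (matching) error \varepsilon:
\[
  Z = \frac{f\,b}{d},
  \qquad
  \Delta Z \approx \left|\frac{\partial Z}{\partial d}\right| \varepsilon
          = \frac{Z^{2}}{f\,b}\,\varepsilon .
\]
```

Because the depth error grows quadratically with Z for a fixed matching error, stereo depth degrades quickly once the object distance becomes large relative to the baseline b.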
A quite recent development in camera-based distance measurement is to bundle a depth sensor with a camera into a so-called RGB-D camera (Red, Green, Blue and Depth), which provides even more information. The depth sensing is implemented as a separate sensor, which internally might utilize stereo vision or another technique. This has received renewed attention since the release of the Microsoft Kinect camera for the Xbox gaming system, which attracted new research because its cheap hardware showed acceptable performance. The bundled depth sensing is of course welcome in all kinds of applications, as mimicking human vision requires acquiring images as well as information about the distance to the imaged objects.

Fields that take explicit advantage of the depth information include creating 3D models of real-life objects [21], which in turn is useful for tracking, taking measurements of an object, or re-creating it by 3D printing. The depth information can assist in visual odometry [25] and its extension, Simultaneous Localization and Mapping (SLAM) [26, 34]. It is also of use whenever image segmentation based on depth is required, such as in object recognition [16] and augmented reality applications [16, 22].

Mapping using a camera coupled with a depth sensor instantaneously captures a dense representation of the view. Two alternatives are to scan a static scene with a laser scanner or to measure objects through tactile measurements. These methods generally produce depth measurements of higher accuracy than the RGB-D system in focus in this thesis, but the measurements take a long time, which makes them unsuitable for dynamic scenes and causes long wait times, and they generally cost far more than an RGB-D camera setup. They are also generally adapted either to long-range measurements or to contact measurements at very close range. RGB-D cameras fit in the middle, allowing relatively precise measurements in the single-meter distance range.

The act of calibrating a camera is essentially to model it mathematically. This is done by gathering data to determine the parameters of the mathematical model, so that more information about the real world can be extracted from the images captured by the camera.

1.1 Objectives

This thesis looks at improving the spatial accuracy of the depth data returned by an active stereo vision system by regarding the camera system as a complete 3D imaging device, rather than as an imaging device coupled with a depth sensor. The end goal is to enable more accurate measurements by modeling distortions in the depth data.

We achieve this through a joint camera model of image and depth. The model contains a linear correction for the depth data, utilizing the image data in order to improve the depth data (a schematic sketch of one such correction is given at the end of this chapter). This previously unpublished linear correction model of depth is presented and tested in practice. To achieve the above, the camera is calibrated, which is simply another name for determining the parameters of the model.

1.2 Outline

The report is structured as follows. Chapter 2 summarizes the necessary theory and gives a brief description of active stereo cameras. Chapter 3 reports previous work on both ordinary camera calibration and depth cameras.
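This excerpt states the constraints on the correction (linear, touching only the depth data, planes kept planar) but not its exact parameterization, which the full thesis develops in Chapter 4. The sketch below is therefore only one plausible reading: a 4x4 projective map acting on homogeneous image-space points (u, v, d, 1) whose only non-identity row is the depth row. Such a map leaves the pixel coordinates untouched, and since a homogeneous depth camera model is itself a projective map of 3-space (which preserves planes), composing the two keeps planar surfaces planar. The parameter vector p and the helper names are hypothetical, not taken from the thesis.

```python
import numpy as np

def depth_correction_matrix(p):
    """4x4 projective correction that rewrites only the depth row.

    p = (p_x, p_y, p_z, p_w) expresses the corrected depth as a linear
    function of the homogeneous image-space point (u, v, d, 1). The
    identity rows for u, v and w leave the pixel coordinates (and hence
    the image rays) unchanged, matching the constraint of modifying
    only the depth data.
    """
    C = np.eye(4)
    C[2, :] = p  # corrected depth d' = p_x*u + p_y*v + p_z*d + p_w
    return C

def correct_depth(u, v, d, p):
    """Apply the correction per pixel (hypothetical helper)."""
    pts = np.stack([u, v, d, np.ones_like(d)], axis=-1)  # (u, v, d, 1)
    out = pts @ depth_correction_matrix(p).T
    return out[..., 2] / out[..., 3]  # w stays 1, so this is the new depth

# Identity correction plus a small planar tilt and offset on a 640x480 frame:
u, v = np.meshgrid(np.arange(640.0), np.arange(480.0))
d = np.full(u.shape, 500.0)              # a flat wall at depth 500
p = np.array([1e-3, 0.0, 1.0, -3.0])     # d' = 0.001*u + d - 3
print(correct_depth(u, v, d, p).mean())  # ~497.32
```

Under this reading, the two-parameter (x, w) model evaluated in Chapter 6 would correspond to fixing p_y = 0 and p_z = 1 and calibrating only p_x and p_w; that correspondence is inferred from the section title alone.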