Abstract: Computer Vision in the Space of Light Rays
ABSTRACT

Title of dissertation: COMPUTER VISION IN THE SPACE OF LIGHT RAYS: PLENOPTIC VIDEO GEOMETRY AND POLYDIOPTRIC CAMERA DESIGN

Jan Neumann, Doctor of Philosophy, 2004

Dissertation directed by: Professor Yiannis Aloimonos, Department of Computer Science

Most of the cameras used in computer vision, computer graphics, and image processing applications are designed to capture images that are similar to the images we see with our eyes. This enables an easy interpretation of the visual information by a human observer. Nowadays, though, more and more processing of visual information is done by computers. Thus, it is worth questioning whether these human-inspired "eyes" are the optimal choice for processing visual information using a machine.

In this thesis I describe how one can study problems in computer vision without reference to a specific camera model by studying the geometry and statistics of the space of light rays that surrounds us. The study of the geometry allows us to determine all the possible constraints that exist in the visual input and could be utilized if we had a perfect sensor. Since no perfect sensor exists, we use signal processing techniques to examine how well the constraints between different sets of light rays can be exploited given a specific camera model. A camera is modeled as a spatio-temporal filter in the space of light rays, which lets us express the image formation process in a function approximation framework. This framework then allows us to relate the geometry of the imaging camera to the performance of the vision system with regard to the given task.

In this thesis I apply this framework to the problem of camera motion estimation. I show how, by choosing the right camera design, we can solve for the camera motion using linear, scene-independent constraints that allow for robust solutions. This is compared to motion estimation using conventional cameras.
In addition, we show how we can extract spatio-temporal models from multiple video sequences using multi-resolution subdivision surfaces.

COMPUTER VISION IN THE SPACE OF LIGHT RAYS: PLENOPTIC VIDEO GEOMETRY AND POLYDIOPTRIC CAMERA DESIGN

by Jan Neumann

Dissertation submitted to the Faculty of the Graduate School of the University of Maryland, College Park, in partial fulfillment of the requirements for the degree of Doctor of Philosophy, 2004

Advisory Committee:
Professor Yiannis Aloimonos, Chair and Advisor
Professor Rama Chellappa, Dean's Representative
Professor Larry Davis
Professor Hanan Samet
Professor Amitabh Varshney

© Copyright by Jan Neumann, 2004

ACKNOWLEDGMENTS

I owe my gratitude to all the people who have made this thesis possible and because of whom my graduate experience has been one that I will cherish forever.

First and foremost, I would like to thank my advisor, Professor Yiannis Aloimonos, for giving me an invaluable opportunity to work on challenging and extremely interesting projects over the past seven years, while still being free to follow my own paths. He has always given me help and advice, and there has never been an occasion when he hasn't given me time to discuss personal and research questions with him.

I also want to thank Professor Rama Chellappa, Professor Larry Davis, Professor Amitabh Varshney, and Professor Hanan Samet for agreeing to serve on my thesis committee and for sparing their invaluable time reviewing the manuscript. I also want to thank the late Professor Azriel Rosenfeld, whose guidance and enthusiasm for the field of computer vision made the computer vision lab at the University of Maryland the inspiring place that it is.

My colleagues and friends at the computer vision lab have enriched my graduate life in many ways and deserve a special mention: Patrick Baker, Tomas Brodsky, Filip Defoort, Cornelia Fermüller, Gutemberg Guerra, Abhijit Ogale, and Robert Pless.
I also want to thank my friends Yoni Wexler, Darko Sedej, Gene Chipmann, Nizar Habash, Adam Lopez, Supathorn Phongikaroon, and Theodoros Salonidis for sharing the graduate student experience with all its highs and lows with me, and for understanding if I could not always spend as much time with them as they deserve. I also want to thank Kostas Daniilidis for the many illuminating discussions about the structure of the space of light rays.

I dedicate this thesis to my family: my mother Insa, my father Joachim, and my brother Lars, who have always stood by me and guided me through my career, and especially to my wife Danitza, who has supported me during the most stressful times of my PhD and was willing to accept the negative consequences of being with a PhD student. Thank you.

In the end, I would also like to acknowledge the financial support of the Department of Computer Science and the Graduate School for their two fellowships, and the Deutsche Studienstiftung for their summer schools that truly enriched my studies during the early years of my graduate school career.

TABLE OF CONTENTS

List of Tables
List of Figures

1 Introduction
1.1 Why study cameras in the space of light rays?
1.2 Example Application: Dynamic 3D photography
1.3 Plenoptic video geometry: How is information encoded in the space of light rays?
1.4 Polydioptric Camera Design
1.5 Overview of the thesis

2 Plenoptic Video Geometry: The Structure of the Space of Light Rays
2.1 Preliminaries
2.2 Representations for the space of light rays
2.2.1 Plenoptic Parametrization
2.2.2 Plücker coordinates
2.2.3 Light field parameterization
2.3 Information content of plenoptic subspaces
2.3.1 One-dimensional Subspaces
2.3.2 Two-dimensional Subspaces
2.3.3 Three-dimensional Subspaces
2.3.4 Four-dimensional Subspaces
2.3.5 Five-dimensional Subspaces
2.4 The space of images
2.5 The Geometry of Epipolar Images
2.5.1 Fourier analysis of the epipolar images
2.5.2 Scenes of constant depth
2.5.3 A slanted surface in space
2.5.4 Occlusion
2.6 Extension of natural image statistics to light field statistics
2.7 Summary

3 Polydioptric Image Formation: From Light Rays to Images
3.1 The Pixel Equation
3.2 Optics of the lens system
3.3 Radiometry
3.4 Noise characteristics of a CCD sensor
3.5 Image formation in shift-invariant spaces
3.6 Summary

4 Polydioptric Sampling Theory
4.1 Plenoptic Sampling in Computer Graphics
4.2 Quantitative plenoptic approximation error
4.3 Evaluation of Approximation Error based on Natural Image Statistics
4.4 Non-uniform Sampling
4.5 Summary

5 Application: Structure from Motion
5.1 Plenoptic Video Geometry: How is 3D motion information encoded in the space of light rays?
5.2 Ray incidence constraint
5.3 Ray Identity Constraint
5.3.1 Discrete plenoptic motion equations
5.3.2 Differential plenoptic motion equations
5.3.3 Differential Light Field Motion Equations
5.3.4 Derivation of light field motion equations
5.4 Variational Formulation of the Plenoptic Structure from Motion
5.5 Feature computation in the space of light rays
5.6 How much do we need to see of the world?
5.7 Influence of the field of view on the motion estimation
5.8 Sensitivity of motion and depth estimation using perturbation analysis
5.9 Stability Analysis using the Cramér-Rao lower bound
5.10 Experimental validation of plenoptic motion estimation
5.10.1 Polydioptric sequences generated using computer graphics
5.10.2 Polydioptric sequences generated by resampling epipolar volumes
5.10.3 Polydioptric sequences captured by a multi-camera rig
5.10.4 Comparison of single view and polydioptric cameras
5.11 Summary
6 Application: Spatio-temporal Reconstruction from Multiple Video Sequences
6.1 Preliminaries
6.2 Multi-Resolution Subdivision Surfaces
6.3 Subdivision
6.4 Smoothing and Detail Computation
6.5 Synthesis
6.6 Encoding of Detail in Local Frames
6.7 Multi-Camera Shape and Motion Estimation
6.8 Shape Initialization
6.9 Motion Initialization
6.10 Shape and Motion Refinement through Spatio-Temporal Stereo
6.11 Results
6.12 Hierarchies of cameras for 3D photography
6.13 Conclusion and Future Work

7 Physical Implementation of a Polydioptric Camera
7.1 The Physical Implementation of Argus and Polydioptric Eyes
7.2 The plenoptic camera by Adelson and Wang
7.3 The optically differentiating sensor by Farid and Simoncelli
7.4 Compound Eyes of Insects

8 Conclusion

A Mathematical Tools
A.1 Preliminaries
A.1.1 Fourier Transform
A.1.2 Poisson Summation Formula
A.1.3 Cauchy-Schwarz Inequality
A.1.4 Minkowski Inequality
A.1.5 Sobolev Space
A.2 Derivation of Quantitative Error Bound
A.2.1 Definitions for Sampling and Synthesis Functions
A.2.2 Hypotheses
A.2.3 ℓ2 Convergence of the Samples
A.2.4 Expression of f in Fourier Variables
A.2.5 Average Approximation Error

Bibliography

LIST OF TABLES

2.1 Information content of the axis-aligned plenoptic subspaces, Part I
2.2 Information content of the axis-aligned plenoptic subspaces, Part II
5.1 Quantities that can be computed using the basic light ray constraints for different camera types.