Single-Image Omnidirectional Vision Systems: A Survey of Models, Calibration, and Applications

by

Carlos Jaramillo

Literature Review (Second Exam)

Computer Science Department

The Graduate Center of the City University of New York

Committee in charge:
Professor Jizhong Xiao, Advisor
Professor Ioannis Stamos, Hunter College
Professor YingLi Tian, City College
Professor Zhigang Zhu, City College

Spring 2016

Abstract

Single-Image Omnidirectional Vision Systems: A Survey of Models, Calibration, and Applications
by Carlos Jaramillo
The Graduate Center, City University of New York
Professor Jizhong Xiao, Advisor

The prominent use of omnidirectional vision sensors (ODVS) in various areas, such as visual odometry for navigation (egomotion), structure reconstruction (mapping) and localization, teleconferencing, surveillance, virtual reality, and so on, motivates the study of existing camera models and their fundamental calibration methods. The main advantage of an ODVS is the instantaneous wide field-of-view from a single image. Among the most popular instantaneous ODVSs, we consider fish-eye lenses and the family of omnidirectional catadioptric systems (central and non-central), which are those sensors combining conventional cameras and curved mirrors obtained by revolving conic sections (e.g. hyperboloids, paraboloids, spheres, etc.) in order to capture the entire 360◦ azimuthal region of a panorama. Although the majority of state-of-the-art methods describe the image formation (projective geometry) and calibration via the unified “sphere model”, we discuss various alternatives such as the distortion-based model and analytical models. In fact, a calibrated system is required for reliable 3D perception (from motion or multiple views) because a perfect assembly of the components is never guaranteed. We discuss the advantages and disadvantages of the portrayed approaches and how well they generalize. We cover multiview ODVS configurations for viewing a common region of interest for the instantaneous computation of 3D information. We consider coaxial and multiaxial catadioptric configurations using a single camera for rig compactness and simplification. In particular, coaxial (folded catadioptric) ODVSs can provide range measurements from triangulation of pixel correspondences around the complete panorama. We review a few proposed techniques for calibrating multiview ODVSs as well.

Contents

List of Figures

List of Tables

1 Introduction
  1.1 Panoramic Cameras in the Second Millennium
  1.2 Single-Image Omnidirectional Vision Sensors (ODVSs)
    1.2.1 Central Systems (SVP)
      1.2.1.1 Para-catadioptric Configuration
      1.2.1.2 Hyper-catadioptric Configuration
      1.2.1.3 Ellipsoidal Catadioptric Configuration
    1.2.2 Non-Central Systems (non SVP)
      1.2.2.1 Conical Mirrors
      1.2.2.2 Spherical Mirrors
      1.2.2.3 Fisheye and Panoramic Annular Lenses
    1.2.3 Quasi-Central Systems

2 Monocular Omnidirectional Vision
  2.1 Projection Models
    2.1.1 Unified Sphere Model
    2.1.2 3D Lines Projected as Conics
    2.1.3 Distortion Model
    2.1.4 Non-Central Models
      2.1.4.1 Analytical Forward Projection (AFP) Model
      2.1.4.2 Dense Projection through Quadric Mirrors
    2.1.5 Two-Plane Model
  2.2 Available Calibration Implementations for ODVSs
    2.2.1 Hybrid Calibration for Quasi-Central Systems
    2.2.2 Point-based Methods
    2.2.3 Self-Calibration
  2.3 General Remarks

3 Multiview Omnidirectional Vision
  3.1 Relevant Rig Configurations
    3.1.1 Coaxial Arrangements
      3.1.1.1 Single-Camera “Folded” Catadioptric Systems
      3.1.1.2 Single-Camera “Unfolded” Catadioptric Systems
      3.1.1.3 Using a Plurality of Cameras
    3.1.2 Multiaxial Arrangements
      3.1.2.1 Single-Camera Multiaxial Systems
      3.1.2.2 Binocular Multiaxial Systems
  3.2 Omnistereo Vision Calibration
    3.2.1 Epipolar Geometry
    3.2.2 Calibration Approaches
  3.3 Range Sensing from Omnistereo Sensors
    3.3.1 Cylindrical Panoramas
    3.3.2 Correspondence Search
    3.3.3 Triangulation

4 Applications of Omnidirectional Vision Sensors
  4.1 Structure from Motion (a.k.a. SLAM)
    4.1.1 Introduction
    4.1.2 Egomotion from Omnidirectional Visual Odometry
      4.1.2.1 Optical Flow from an ODVS
    4.1.3 Localization
    4.1.4 Mapping
      4.1.4.1 Mapping (Erroneously thought as Localization)
      4.1.4.2 Dense 3D Reconstruction (Mapping)
    4.1.5 SfM / 3D SLAM
      4.1.5.1 Sparse SLAM
      4.1.5.2 Semi-Dense SLAM
      4.1.5.3 Dense SLAM
  4.2 Obstacle Avoidance for Autonomous Robots
  4.3 Wide-Angle Light Fields (Digital Refocusing)
  4.4 Surveillance, Conferencing, and Virtual Omnipresence
    4.4.1 Omnidirectional Surveillance and Monitoring
    4.4.2 Panoramic Teleconferencing
    4.4.3 Omnipresence for Virtual Reality
  4.5 Omnidirectional Thermal Vision
  4.6 Concluding Remarks

Bibliography

A Symbol Notation

List of Figures

1.1 First panoramic “strip/slit” camera
1.2 Mirror-based imager patented by Wolcott in 1840
1.3 Hyperbolic catadioptric sensor developed by Rees in 1970
1.4 Sphereo developed by Nayar in 1988
1.5 Conic sections satisfying the SVP constraint for catadioptric central systems
1.6 Para-catadioptric rigs by Shree K. Nayar
1.7 The effect of changing the curvature parameter k for a hyperboloidal mirror
1.8 Impractical conic-section catadioptric configurations (non-central systems)
1.9 An example of a conical mirror configuration
1.10 Fisheye lenses and PAL configurations
1.11 Projection situation with a quasi-central system

2.1 Generalized Unified Model (GUM) projection process
2.2 3D line projected as a conic on the image plane
2.3 Distortion model proposed by Scaramuzza
2.4 Viewing ray from back-projection of an image point via the two-plane model
2.5 Hybrid calibration for quasi-central systems (setup and centering results)
2.6 Control patterns and toolbox sample for point-based calibration
2.7 Initialization of the generalized focal length from points on a 3D line

3.1 Single-camera omnistereo sensor designed for quadrotor micro aerial vehicles
3.2 Omnistereo sensor using spherical mirrors in a folded configuration
3.3 Double-lobed omnistereo catadioptrics
3.4 Coaxial omnistereo catadioptrics for unfolded, disjoint mirrors
3.5 Cata-Fisheye camera for omnistereo imaging
3.6 Cylindrical representation for the horizontally-aligned binocular omnistereo
3.7 Various multiaxial catadioptric arrangements for a single camera
3.8 Horizontal omnistereo rigs using separate hyperbolic ODVSs
3.9 The head-mounted stereo display developed by Nagahara et al. [125]
3.10 Epipolar plane constraint for omnistereo
3.11 Epipolar lines rendered by the two possible omnistereo configurations
3.12 Cylindrical panorama construction and triangulation for coaxial omnistereo

3.13 Example of dense stereo matching from panoramic stereo
3.14 Examples of omnistereo range from template matching methods
3.15 Triangulation with skew back-projection rays
3.16 3D dense point cloud example from triangulation of correspondences

4.1 3D reconstruction comparison and dominant vanishing points extraction
4.2 LSD-SLAM reconstruction example and pipeline overview
4.3 Wide-angle light field via axial-cones modeling on a matrix of spherical mirrors
4.4 Teleconferencing and head-mounted catadioptric ODVS
4.5 Triangulated model from omnidirectional images captured along a city walk [99]
4.6 Examples of omnidirectional thermal vision

List of Tables

2.1 The parameters and degree of the FP equation (2.17) for various mirror shapes [4]
2.2 Classification of calibration methods for generic ODVSs

Chapter 1

Introduction

1.1 Panoramic Cameras in the Second Millennium

Vision is the sense most exploited by human beings. However, the central region of our retina (the fovea centralis) is capable of focusing on only approximately 6◦ of view [195], and our peripheral vision is poor, but still useful. It is in our nature to want to extend our visual perception capabilities by using technologies that let us see more. Figure 1.1 illustrates the first hand-cranked rotating camera recorded in human history, designed by Puchberger in 1843, which was capable of up to 150◦ of view [21, 179]. However, artistic depictions of 360◦ panoramas and strip/slit photography (mosaics) have obvious drawbacks in practice, especially when compared against instantaneous capture devices in dynamic environments. Therefore, this survey only considers instantaneous single-image devices as practical panoramic cameras (a.k.a. omnidirectional cameras) for real-time applications (Chapter 4).

Figure 1.1: Hand-cranked panoramic camera (slit photography) invented by Joseph Puchberger in 1843 (from [179]).

Over a century and a half has passed since inventors were first able to capture wider views on a single image. The first documented device for imaging a field-of-view (FOV) wider than a conventional pinhole camera dates back to 1840 [184]. Although his device was not really a panoramic camera, its inventor, A.S. Wolcott, patented the configuration of a concave mirror with a daguerreotype

plate onto which reflected light rays are imaged without the need for lenses (Figure 1.2). In this text, we use the acronym ODVS to refer to an omnidirectional vision sensor.

Figure 1.2: Drawing from U.S. Patent # 1582 [184] by inventor A.S. Wolcott (in year 1840) of an apparatus for taking images by means of a concave reflector (without lenses) reprojecting upon the plate (adjustable position for focus control).

The first use of a cone-shaped reflector to produce panoramic views appeared in the mechanism patented in 1911 [87]. New generations of panoramic cameras continued to emerge through the 20th century. Some combined spheres and lenses [42]; others applied the periscope principle by projection onto cylinders for exterior viewing [142] or introverted viewing [81]. In 1939, Conant described several catadioptric sensors as combinations of spherical, parabolic, or conical mirrors with a camera lens to obtain high-quality images on film [35]. The combination of reflectors (mirrors) and cameras (using lenses) is known as catadioptrics. There exists a variety of catadioptric configurations for extending the FOV of the camera to its 360◦ periphery. We review the possible catadioptric ODVSs in Section 1.2. In the late 1960s, Rees developed the first ODVS (Figure 1.3) that could display perspectively-correct images in real-time by unwarping the view from a hyperbolic mirror captured by a conventional perspective camera [148]. Other known configurations that followed are based on conical mirrors [197], spherical mirror setups [71], hyperboloidal mirrors [199], and paraboloidal reflectors with orthographic cameras [128]. The points projected onto the image sensor can be modeled in several ways, so we review the most important models in Section 2.1. The work by Ishiguro et al. in [75] explains the construction process of low-cost omnidirectional mirrors. Since the mid 1990s, omnidirectional vision research has grown at a dramatic rate. For instance, Yagi et al. applied their conical-mirror ODVS (called COPIS) to real-time obstacle avoidance during a ground robot’s navigation [196]. Around the same time, Nalwa from Bell Labs reported a true omnidirectional camera for instant 360◦ views satisfying a single effective viewpoint [126]. We discuss several developments of single-image omnidirectional sensors in Chapter 2. Several works surveying the field exist; among the most popular textbooks is “Panoramic Vision” by Benosman and Kang [21] and, more recently, the monograph titled “Camera Models and Fundamental Concepts Used in Geometric Computer Vision” [166].

Light refraction via lenses alone is another popular approach for increasing the viewing angle (≥ 180◦). The idea of a “fish-eye” view was conceived in 1906 by Wood [188]. According to [85], the first wide-angle lens made out of glass was the Hill Sky Lens in 1924. Miyamoto produced a fisheye lens for a 35 mm film camera in 1964 [119].

Figure 1.3: First catadioptric ODVS to use a hyperbolic mirror and a perspective camera [148].

Although fisheye lenses have physical refraction limits, some lenses can view up to 185◦. In fact, the Fisheye-Nikkor 6mm (Fig. 1.10a) is capable of a 220◦ view at an exorbitant price tag (> $100,000 USD). In addition, some hybrid systems have been proposed, like Buchele’s unitary catadioptric objective lens system (patented in 1950), which combines a couple of hemispherical lenses and a concave mirror [25]. It is also common to combine a pair of cameras (using lenses) so as to imitate binocular vision and its depth-sensing properties. The first catadioptric configuration attempting omnidirectional stereo (omnistereo) vision is the work by Nayar for his “Robotic vision system” in 1988 [127]. His “sphereo” rig consisted of a single camera and a pair of identical reflecting spheres sitting on a horizontal plane (Figure 1.4). In 2000, Yagi and Yachida combined a plurality of mirrors and a single camera [198]. Various multiview ODVS setups have emerged since then, and they are surveyed in Chapter 3.

1.2 Single-Image Omnidirectional Vision Sensors (ODVSs)

Many of the catadioptric configurations proposed over the last decade (Chapter 2) are primarily derived from the geometries outlined in the seminal treatment by Baker and Nayar [12]. Systems aimed at satisfying the ubiquitous single-viewpoint (SVP) constraint are known as “Central Systems” (Section 1.2.1). Sensors not obeying the SVP property, such as spherical catadioptric systems and fisheye lenses, are considered “Non-Central Systems” (Section 1.2.2). We review this seminal ODVS classification based on the SVP requirement of the projective geometries.

Figure 1.4: In 1988, Shree K. Nayar prototyped the first catadioptric omnidirectional stereo rig (dubbed “sphereo”) using a pair of spherical mirrors on a horizontal plane at orthogonal distance h from a single camera’s viewpoint [127].

1.2.1 Central Systems (SVP)

In 1637, René Descartes postulated that conical lenses and mirrors have the ability to focus light through a single point [46]. The parabola, hyperbola, ellipse, and the cone are the only conic sections that respect the single viewpoint (SVP) property while realizing a pure perspective imaging system [130, 12]. Contemporaneously, [69] described the profiles of conic mirrors (surfaces of revolution) producing perspective projections. Thus, only catadioptric cameras using the first three of these mirror profiles (Figure 1.5) are considered within this class; the conical case turns out to be impractical (Section 1.2.2.1). Baker and Nayar derived two general solutions for the fixed SVP constraint:

(z − c/2)² − r²(k/2 − 1) = (c²/4)·((k − 2)/k)   for k ≥ 2    (1.1)

(z − c/2)² + r²(1 + c²/(2k)) = (2k + c²)/4   for k > 0    (1.2)

where, due to rotational symmetry, r = √(x² + y²). Here k is a unit-less parameter that is inversely related to the conic’s curvature, and c is the distance between the conic’s foci. Planar, conical, parabolic, and hyperbolic profiles use solution (1.1), while spherical and ellipsoidal profiles use (1.2), all with the appropriate parameters specified for each type.
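As a concrete illustration, the sketch below evaluates the hyperboloidal mirror profile implied by solution (1.1) for user-chosen values of k and c by solving (1.1) for z at a given radius r. The function name, the sample parameter values, and the choice of returning both sheets of the hyperboloid are our own; this is not code from any of the cited works.

```python
import numpy as np

def hyperboloid_profile(r, k, c):
    """Evaluate z(r) from Baker-Nayar solution (1.1):
    (z - c/2)^2 - r^2 (k/2 - 1) = (c^2/4) (k - 2)/k, valid for the hyperbolic case k > 2, c > 0.
    Returns both sheets of the hyperboloid; the physical mirror corresponds to one of them,
    depending on the mounting convention."""
    assert k > 2 and c > 0, "hyperbolic profile requires k > 2 and c > 0"
    dz = np.sqrt((c**2 / 4.0) * (k - 2.0) / k + r**2 * (k / 2.0 - 1.0))
    return c / 2.0 - dz, c / 2.0 + dz

if __name__ == "__main__":
    r = np.linspace(0.0, 0.04, 5)            # radial samples (units are arbitrary here)
    lower, upper = hyperboloid_profile(r, k=5.0, c=0.1)
    print(lower, upper)
```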

1.2.1.1 Para-catadioptric Configuration In [129], Nayar achieves omnidirectional vision via a “circular” paraboloidal mirror and an orthographic camera lens. Principal rays of light (directed toward the focus of the parabolic mirror) are reflected orthographically onto the image plane as shown in Figure 1.5a. Due to the nature of the orthographic projection, a zoom lens is usually needed to satisfy these optics. Notice that the projective geometry is invariant to translations of the mirror (Figure 1.6). Also, since c → ∞ and k → ∞ such that h = c/k indicates the radius at z = 0, solution (1.1) becomes:

(a) Paraboloidal mirror (b) Hyperboloidal mirror (c) Ellipsoidal mirror

Figure 1.5: Conic sections satisfying the SVP constraint for central systems. These non-degenerate catadioptric configurations were derived by Baker and Nayar in [12].

z = (h² − r²) / (2h)    (1.3)

(a) Paracatadioptric rig (b) Image from the OneShot360 (c) OneShot360 lens

Figure 1.6: Para-catadioptric rigs by Shree K. Nayar

1.2.1.2 Hyper-catadioptric Configuration A hyperbola has two foci. A system that places the camera’s effective viewpoint at one focal point in order to image the principal rays reflected by the “circular” hyperbolic mirror obeys the SVP constraint (Fig. 1.5b). Hyperbolic projection obeys equation (1.1) by enforcing k > 2 and c > 0. As depicted in Figure 1.7, k is a unit-less parameter that is inversely related to the mirror’s curvature or, more precisely, to the eccentricity εc of the conic. In fact, εc > 1 for hyperbolas, yet a plane is produced when εc → ∞ or k → 2.

Figure 1.7: Hyperboloidal mirrors change curvature according to parameter k. When k → 2, the mirror approximates a plane. The camera’s viewpoint coincides with the hyperbola’s second focus.

1.2.1.3 Ellipsoidal Catadioptric Configuration The ellipsoidal solution is illustrated in Figure 1.5c and obeys equation (1.2) for k > 0 and c > 0. We already mentioned Rees [148] as a first example of a catadioptric camera using elliptic mirrors. Ellipsoidal mirrors cannot increase the FOV by themselves, but two or more can be arranged to achieve wide-angle omnidirectional vision (for example, see Figure 3.9).

1.2.2 Non-Central Systems (non SVP) While the SVP guarantees that true perspective geometry can always be recovered from the original image, it limits the selection of mirror profiles to a set of conic sections (Section 1.2.1). The effect of this limit is twofold:

1. SVP mirror profiles typically need to be manufactured to unique specifications, which is difficult to do at a low cost, and

2. the vertical field-of-view is limited in SVP mirrors for compact systems.

1.2.2.1 Conical Mirrors The projection via a mirror cone obeys equation (1.1) with k ≥ 2 and c = 0, as shown in Figure 1.8a. Conical mirrors exhibit a circular locus of virtual viewpoints, so they are called radial imaging systems; at first, Nayar considered them “impractical” central systems. Later, in [92, 101], truncated mirror cones were proved useful. They can be employed for the 3D reconstruction of an object observed through a hole by a camera positioned at the cusp of the cone (Figure 1.9).

(a) Conical mirror (b) Spherical mirror (c) Caustic in non-SVP rig

Figure 1.8: The conic sections without a practical SVP location. These degenerate catadioptric configurations were derived by Baker and Nayar in [12].

(a) Conical catadioptric setup (b) Camera, cone mirror, subject (c) Face capture

Figure 1.9: Conical mirror configuration employed in [92] for object 3D reconstruction.

1.2.2.2 Spherical Mirrors The spherical mirror’s theoretical SVP property is demonstrated in Figure 1.8b and obeys equation (1.2) for k > 0 and c = 0. Since the focus is the center of the sphere, the SVP configuration is violated in any real system. Instead, a locus of viewpoints (a caustic) is generated, as computed parametrically for a spherical mirror by Baker and Nayar [12]. The caustics of catadioptric cameras are analyzed in [170], and an example is given in Figure 1.8c.

1.2.2.3 Fisheye and Panoramic Annular Lenses Fisheye lenses are a popular method for sky photography (hence the name sky-lenses). As mentioned earlier, it is very hard to design a fisheye lens that obeys the SVP constraint [126]. Even commercial-grade lenses like Nikon’s Fisheye-Nikkor lenses (Figure 1.10a) cannot satisfy the SVP property or even reduce the inherent distortion around the peripheral regions of the hemispheric view. A back-to-back configuration is necessary to capture full 360◦ sphere images (see Figure 1.10b for an example), but it is impossible to obtain coincident focal viewpoints since they are located inside each lens. Alternatively, a panoramic annular lens (PAL) combines a single piece of glass and conic reflecting surfaces to capture the 360◦ view of the surroundings as well as extend the vertical FOV (Figure 1.10c). The construction and characteristics of PALs are described in [108], but the first real-time application of PALs is provided by Zhu et al. [205] (Section 3.1.2.2 concerns stereo vision via PALs).

(a) Fisheye-Nikkor 6mm (b) Back-to-back fisheye lens (c) Rays from PAL onto image

Figure 1.10: (a) Nikon’s very expensive Fisheye-Nikkor 6mm f/2.8, capable of a 220◦ FOV; (b) Back-to-back fisheye lens configuration for full-sphere images by Slater [162]; (c) The imaging path of a panoramic annular lens (PAL) from [73].

1.2.3 Quasi-Central Systems “Quasi-Central” or “Slightly Non-SVP” systems are those central systems (Section 1.2.1) that contain assembly errors forsaking the SVP assumption. A real ODVS cannot be perfect. For example, it is hard to place the camera’s viewpoint at the exact focal point location, and the orthogonality of the sensor plane with respect to the mirror axis cannot be guaranteed. Thus, all principal rays deviate from their theoretical projection locations as illustrated in Figure 1.11, so a special calibration to reduce the overall projection error is necessary (Section 2.2.1).

Figure 1.11: [158] illustrates the viewing-ray errors for a non-calibrated quasi-central system.

An extensive classification of catadioptric sensors is given by Shabayek [159], who identifies [7] and [110] as calibration methods specifically for slightly non-SVP cameras. More recently, Schönbein et al. [158] provide a method for calibrating quasi-central systems. We study generalized calibration techniques in Chapters 2 and 3 that are applicable to SVP and non-SVP systems, so they should work for slightly non-SVP cameras too.

Chapter 2

Monocular Omnidirectional Vision

This survey is directed toward low-cost compact omnidirectional vision sensors (C-ODVS) due to the nature of our main application in robotic navigation (Chapter 4). Here, we investigate the available models for generalizing the image acquisition of ODVSs (in the monocular case) and the various calibration techniques associated with these models.

2.1 Projection Models

It is imperative to survey the state-of-the-art projection models because they express the computational mapping between 3D entities (e.g. geometric shapes) and their projected form on the image. Sturm et al. [166] survey the existing camera models up to 2011 (the publication date). From that point forward, a few enhancements upon existing models emerged, such as the work by Xiang et al. [191], where the unified sphere model [58, 14, 112] is generalized, and that of Schönbein et al. [158], which extends the analytical model by Agrawal [4] to calibrate slightly non-central systems. The distortion-based model derived by Micusik and Pajdla [114] and extended by Scaramuzza et al. [154] and Tardif [173] is also popular; it has been compared in [146] against a few calibration methods based on the sphere model and enhanced in [158]. We also survey the projection model for non-central quadric-shaped mirrors proposed by Gonçalves and Araújo [62].

2.1.1 Unified Sphere Model This popular image formation model was introduced in 2000 by Geyer and Daniilidis [58]. Originally, the sphere model was only described for central catadioptric cameras. They theorized the existence of a unit-sphere projection model that is equivalent to the nonlinear analytical solution for the actual quadric surface. In 2001, Barreto and Araújo [14] extended the mapping of parameters onto a sphere model for all SVP projections. Later, in 2006, Mei and Rives [112] improved the unified sphere model by adding a distortion step (using radial and tangential distortion parameters) to the projection pipeline, which then permitted the calibration (to some extent) of slightly non-central catadioptric systems such as spherical mirrors and fisheye lenses.

Figure 2.1: Steps of the projection pipeline of the Generalized Unified Model (GUM) for the monocular case: (1) project point P_w toward the unit sphere’s focus O_M, (2) normalize it as [M]p_S, (3) change coordinates with respect to [C_P] to obtain [C_P]p_S, (4) project onto the normalized plane [π_u] as p_u, (5) apply radial distortion to get [π_d]p_d, and (6) transform to pixel [I]m in the image.

An extension to Mei’s sphere model was given by Xiang et al. [191] in 2013. With the goal of generalizing the unified model to non-central catadioptric cameras that suffer from misalignment due to rotation and translation, they removed the axial constraint on the center of projection C_P in the model as well as the unnecessary tangential distortion parameters. The vector of 10 intrinsic parameters of GUM is given by

θ_(1×10) = [ξ, d, c]    (2.1)

with grouped parameters ξ = [ξ_X, ξ_Y, ξ_Z]; d = [k_dist1, k_dist2]; and c = [α, γ_1, γ_2, u_c, v_c]. When the virtual pinhole camera is collinear with the mirror’s revolution axis such that [M]c_P = [0, 0, −ξ_Z]^T, the value of ξ_Z relates to the type of system as follows:

ξ_Z = 0 : pinhole (perspective) cameras or planar mirrors
0 < ξ_Z < 1 : hyperbolic, elliptical, and spherical mirrors, and fisheye lenses
ξ_Z = 1 : parabolic mirrors

The radial distortion is indicated by the parameters k_disti, the coordinates of the optical axis projection in the image are [u_c, v_c]^T, α is a skew parameter, and γ_{x,y} are the generalized focal length parameters defined as γ_{x,y} = η f_{x,y}, where η := ξ_Z − ψ for a camera plane distance ψ described in Barreto’s model [14]. We refer the reader to Appendix A for a summary of our symbol notation in what follows. With the unit sphere’s center O_M, the free position for the projection point becomes [M]c_P = [ξ_X, ξ_Y, −ξ_Z]^T. Figure 2.1 depicts the entire projection process of a point P_w in the world frame [W] into the image as [I]m. Assuming that the coordinates of point p_w are already given with respect to [M], the projection function f_ϕ is the composition of various subroutines:

[I]m ← f_ϕ([M]p_w, θ) := f_P ◦ f_D ◦ f_π ◦ f_CP ◦ f_S    (2.2)

In sum, the following steps are taken:

1. Given [W]p, change its coordinates with respect to [M]:

[M]p_w ← f_W([W]p) := [M]T_[W] · [W]p    (2.3)

Note that this is only possible if the transform [M]T_[W] is known.

2. Normalize [M]p_w (onto the unit sphere) by

[M]p_S ← f_S([M]p_w) := [M]p_w / ‖[M]p_w‖    (2.4)

3. Change to coordinates with respect to the center of projection C_P:

[C_P]p_S ← f_CP([M]p_S) := [M]p_S − [M]c_P = [x_S − ξ_X, y_S − ξ_Y, z_S − ξ_Z]^T    (2.5)

4. Project onto the undistorted normalized plane [π_u] as

[π_u]p_u ← f_π([C_P]p_S) := [ [C_P]x_S / [C_P]z_S ,  [C_P]y_S / [C_P]z_S ]^T    (2.6)

5. Apply the radial distortion parameters (k_1, k_2):

[π_d]p_d ← f_D([π_u]p_u) := [π_u]p_u + [π_u]p_u (k_1 ρ_u² + k_2 ρ_u⁴)    (2.7)

where ρ_u = √(x_u² + y_u²)    (2.8)

6. Finally, we obtain the pixel point [I]m in the image via

[I]m ← f_P([π_d]p_d) := K [π_d]p_d    (2.9)

where

K = [ γ_1   γ_1·α   u_c
       0     γ_2    v_c
       0      0      1  ]    (2.10)
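To make the pipeline concrete, the following is a minimal Python/NumPy sketch of steps 2 through 6 above (the cited toolboxes are MATLAB implementations; this is not one of them). The function name, argument layout, and example parameter values are our own, and the sign convention for c_P follows step 3 exactly as written.

```python
import numpy as np

def gum_project(p_w, xi, dist, K):
    """Forward-project a 3D point already expressed in the mirror frame [M] with the
    GUM pipeline of (2.4)-(2.10). xi = (xi_X, xi_Y, xi_Z), dist = (k1, k2), K is (2.10)."""
    p_s = p_w / np.linalg.norm(p_w)               # (2.4) normalize onto the unit sphere
    p_cp = p_s - np.asarray(xi)                   # (2.5) shift to the projection center C_P
    p_u = p_cp[:2] / p_cp[2]                      # (2.6) perspective division onto [pi_u]
    rho2 = p_u @ p_u                              # squared radial distance rho_u^2
    k1, k2 = dist
    p_d = p_u * (1.0 + k1 * rho2 + k2 * rho2**2)  # (2.7) radial distortion onto [pi_d]
    m = K @ np.array([p_d[0], p_d[1], 1.0])       # (2.9)-(2.10) map to pixel coordinates
    return m[:2]

if __name__ == "__main__":
    K = np.array([[400.0, 0.0, 320.0],            # gamma_1, gamma_1*alpha (alpha = 0), u_c
                  [0.0, 400.0, 240.0],            # gamma_2, v_c
                  [0.0, 0.0, 1.0]])
    print(gum_project(np.array([0.5, 0.2, 1.0]), xi=(0.0, 0.0, 0.9), dist=(0.0, 0.0), K=K))
```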

With the hope of simplifying the original sphere model by Geyer and Daniilidis, the model has been translated into a few different algebras and geometries. For example, Lopez-Franco and Bayro-Corrochano [103] use conformal geometry to express the forward and back projection of points and lines. Tolvanen et al. [174] described the projection of these geometrical entities with Clifford algebra instead. Perwass and Sommer [143] proposed an “inversion camera model” based on Geometric Algebra (of two parameters) to model the pinhole camera, lens distortions, and para-catadioptric cameras as a transformation operator. The use of lifted coordinates by Barreto and Daniilidis [13] helps deal with the nonlinearities present in the back-projection function. Similarly, Puig et al. [145] show how to compute the lifted 6 × 10 matrix for the generic catadioptric projection of 3D points as a linear transformation.

2.1.2 3D Lines Projected as Conics

Figure 2.2: 3D lines project as conics on the image and as great circles on the unit sphere model.

On the unit sphere model presented in Section 2.1.1, a line in the world and the sphere’s center O_M form a plane that intersects the sphere as a great circle. The great circle’s plane can be represented by a normal vector n_c = [n_X, n_Y, n_Z]^T. As a result, this great circle on the model projects as a conic onto the image, as depicted in Figure 2.2. Initially, Geyer and Daniilidis [58] showed that a para-catadioptric sensor could be calibrated using three lines on the image (conics). Later, this technique was generalized to any central system by Barreto and Araújo [15], who also proved that two lines are enough to recover the sphere model parameters for hyperbolic and elliptical catadioptric systems. Assuming coaxial alignment ([M]c_P = [0, 0, −ξ_Z]^T), according to [17], the final expression of the projected conic becomes

C_n = [ −ξ_Z² n_Y² n_Z² − n_X²(ξ_Z² + n_Z² − 1)     n_X n_Y (ξ_Z² − 1)(n_Z² − 1)                n_X n_Z (ξ_Z + ψ)(n_Z² − 1)
        n_X n_Y (ξ_Z² − 1)(n_Z² − 1)                −ξ_Z² n_X² n_Z² − n_Y²(ξ_Z² + n_Z² − 1)     n_Y n_Z (ξ_Z + ψ)(n_Z² − 1)
        n_X n_Z (ξ_Z + ψ)(n_Z² − 1)                 n_Y n_Z (ξ_Z + ψ)(n_Z² − 1)                 −n_Z² (ξ_Z + ψ)² (n_Z² − 1) ]    (2.11)

Figure 2.3: An illustration of the components of the distortion model proposed by Scaramuzza et al. [154]. The z component of the direction ray [M]v is given by an N-degree polynomial that is a function of the radial distance of the point [π]u on the model’s camera plane [π].

Ying et al. [201] use the original unified sphere model (postulated in [58]) by projecting lines and spheres as conics on the image. They prove that a line image has 3 invariants:

L_1 := d(bd − ae) − e(be − cd) = 0
L_2 := bf + de(l² − 1) = 0
L_3 := d(bd − ae) f_e² + f(bf − de) = 0

for C_L = [ a  b  d
            b  c  e
            d  e  f ]

and the sphere image has 2 invariants, which provide higher accuracy during calibration than lines alone:

S_1 := d(bd − ae) − e(be − cd) = 0
S_2 := b(bd − ae) f_e² + e(bf − de)(l² − 1) = 0

for C_S = [ a  b  d
            b  c  e
            d  e  f ]

where the length f_e := ‖C_P O_Ic‖ according to Figure 2.2, and the parameter l := 2ε_c / (1 + ε_c²) for the eccentricity ε_c of the respective conic. These geometric invariant properties are followed by Barreto et al. [15] for the calibration of all central catadioptric systems.

2.1.3 Distortion Model Fisheye lens distortion equations were mostly trigonometric, so finding closed-form solutions for the “forward” projection (needed for calibration) is a cumbersome task due to the high nonlinearity of the expressions (of any non-central system). Various models have evolved from refractive solutions to rectify (remove distortion from) wide-angle and fisheye lenses, usually by

computing the relationship between the distorted point’s radial distance r_d and its undistorted one r_u, or its angle θ subtended between the optical axis and the principal ray, as suggested in [68], where an n-degree polynomial is devised as r_d = Σ_{i=1}^{n} c_i θ^i. The literature is vast on classical polynomial radial distortion models for the calibration of wide-FOV lenses. We skip forward to Micusik and Pajdla [114], who employ rational polynomials to represent both linear and nonlinear parametric projection models for dioptric cameras. With the underlying assumptions of rotational symmetry and an image center coincident with the lens’ optical axis, two parameters a and b are sought to satisfy the following form in 1D:

θ = a r_d / (1 + b r_d²)

Calibration of a fisheye lens using this model was first attempted by Kannala and Brandt [80]. Then, Scaramuzza et al. [154] and Tardif et al. [173] extended the distortion model to include central catadioptric cameras. We discuss Scaramuzza’s unified distortion model, whose components are depicted in Figure 2.3. For a scene point P_w located at [M]p_{w,h} in homogeneous coordinates with respect to the model’s viewpoint O_M, the perspective projection matrix T_π ∈ R^(3×4) reduces it to a ray:

[M]v ← T_π [M]p_{w,h}    (2.12)

The relationship between this ray and its point [π]u on the model’s camera plane [π] is given by a nonlinear relation g, defined as

[M]v ← g([π]u) := [x_u, y_u, f(u)]^T    (2.13)

where f is a polynomial of degree N with coefficients c_i for i = 0, 2, 3, ..., N (dropping the i = 1 term as in [154]) and with radial Euclidean distance ρ_u := ‖u‖_L2 = √(x_u² + y_u²), as follows:

f(u) := c_0 + c_2 ρ_u² + c_3 ρ_u³ + ... + c_N ρ_u^N    (2.14)

The final mapping between the point [π]u and the pixel [I_C]m_h on the centered image plane [I_C] is achieved via an affine transformation K ∈ R^(3×3) such that

[π]u ← K [I_C]m_h    (2.15)

By composition of (2.13), (2.15), and (2.12), the complete projection equation is produced:

[M]v = g(K [I_C]m_h) = T_π [M]p_{w,h}    (2.16)

Therefore, the calibration process consists in finding the 6 parameters of the affine homogeneous matrix K and the N coefficients of the polynomial function f in (2.14) by minimizing the projection error of control points (Section 2.2.2). The model by Scaramuzza et al. [154] is applicable for calibrating both central and non-central ODVSs.
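Since back-projection is the operation most applications need from this model, here is a short Python sketch of (2.13)–(2.15) that lifts a centered image point to a viewing ray. The polynomial coefficients and the identity affinity below are placeholders (not a real calibration), and the function name is our own.

```python
import numpy as np

def cam2ray(m_pixel, K_affine, poly):
    """Back-project a centered image point into a unit viewing ray with the distortion
    model of (2.13)-(2.15). `poly` holds [c0, c2, c3, ..., cN] (no c1 term, as in [154])."""
    m_h = np.array([m_pixel[0], m_pixel[1], 1.0])
    u = K_affine @ m_h                     # (2.15) affine map onto the model's camera plane
    rho = np.hypot(u[0], u[1])             # radial distance rho_u
    # (2.14) f(u) = c0 + c2*rho^2 + c3*rho^3 + ... + cN*rho^N (the i = 1 term is omitted)
    powers = np.concatenate(([1.0], rho ** np.arange(2, len(poly) + 1)))
    f_u = poly @ powers
    v = np.array([u[0], u[1], f_u])        # (2.13) un-normalized viewing ray
    return v / np.linalg.norm(v)

if __name__ == "__main__":
    K_affine = np.eye(3)                   # placeholder affinity (already centered image)
    poly = np.array([-250.0, 0.0, 1.2e-3]) # placeholder [c0, c2, c3], not a real calibration
    print(cam2ray((120.0, -45.0), K_affine, poly))
```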

2.1.4 Non-Central Models Non-central models are a better candidate to represent the many systems that violate the SVP constraint, for the reasons brought up in Section 1.2.2. Unfortunately, all the existing non-central system models are nonlinear, so their running times compare unfavorably against SVP models (like the ones studied above). For example, Micusik and Pajdla [115] proposed a method to calibrate non-central cameras by means of nonlinear optimization, and they show that a more geometrically correct 3D reconstruction is achieved with the non-central camera model.

2.1.4.1 Analytical Forward Projection (AFP) Model More recently, the analytical forward projection (AFP) method given by Agrawal, Taguchi and Ramalingam [3, 4] requires solving for a root of an 8th-degree polynomial equation. We describe this method’s characteristics since it has proven useful when seeking accurate orientations for the viewing rays incident to the quadric mirror’s surface (as employed by the calibration method in Section 2.2.1). As illustrated in Figure 2.5a, AFP analytically computes the reflection point m on a quadric mirror by tracing the optical path between the 3D world point p and the camera’s viewpoint c. This is different from computing the reflection point via numerical approximations (as done in the work of Baker and Nayar [12]). By solving the AFP equation, dense volumetric reconstruction can be achieved efficiently via plane sweeping from multiple views of a non-central catadioptric system. A rotationally symmetric surface around the Z-axis can be described by:

x² + y² + Az² + Bz − C = 0    (2.17)

with parameters A, B, and C. Thus, any point m lying on the mirror surface must satisfy equation (2.17). We refer the reader to [4] for the full derivation of the parameters and the degree of the forward projection (FP) equation for the various conic shapes summarized in Table 2.1.

Mirror Shape   FP Equation Parameters    Degree (non-SVP, coaxial)   Degree (non-SVP, off-axis)   Degree (SVP)
Spherical      A = 1, B = 0, C > 0       4                           4                            -
Elliptical     A > 0, B = 0, C > 0       8                           6                            2
Hyperbolic     A < 0, B = 0, C < 0       8                           6                            2
Parabolic      A = 0, C = 0              7                           5                            2
Conical        A < 0, B = 0, C = 0       4                           2                            -
Cylindrical    A = 0, B = 0, C > 0       4                           2                            -

Table 2.1: The parameters and degree of the FP equation (2.17) for various mirror shapes [4]. 16

2.1.4.2 Dense Projection through Quadric Mirrors Gonçalves and Araújo [62, 61] characterize the shape of a quadric mirror by three polynomials found via optimization over a single variable. The 1D search for the reflection point is performed efficiently along a parametrized curve equation of one unknown instead of employing a multidimensional search with the classical reflection equations (Snell’s law and Fermat’s principle). In this model, a perspective camera and curved mirrors expressed by non-degenerate quadric shapes are posed as a function of three independent parameters A, B and C, now given in matrix form:

Q = [ 1  0   0     0
      0  1   0     0
      0  0   A    B/2
      0  0  B/2   −C ]

where the parameters vary based on the surface type as shown in Table 2.1. For example, paraboloids require C = 0 and A = 0; ellipsoids need B = 0; hyperboloids A < 0 and C < 0; and spheres use A = 1 and C + B²/2 > 0. Gonçalves presents a non-iterative method where the curve is represented by a vector function X(λ) ∈ R^(4×1). With λ_0 as the solution value, the reflection point on the quadric surface is m ← X(λ_0). An advantage here is that the solution of the nonlinear equation is given by a single parameter λ. The inverted projection model for Q_cam in camera coordinates is:

Q_cam = {q_ij} = T^(−T) Q T^(−1) = [R t; 0 1]^(−T) · Q · [R t; 0 1]^(−1)    (2.18)

Therefore, the projection of the reflection point m onto the image plane is given by

u = (1/λ) K [I | 0] m    (2.19)

where m(λ_0) = [v_r^T, 1]^T for its associated emanating ray v_r = λ_0 K^(−1) u and camera matrix K. Thus, the quadric is constrained as m^T Q_cam m = 0, expanded as

(v_1² q_11 + v_2² q_22 + v_3² q_33 + 2 v_2 v_3 q_23 + 2 v_1 v_3 q_13 + 2 v_1 v_2 q_12) λ² + (2 v_1 q_14 + 2 v_2 q_24 + 2 v_3 q_34) λ + q_44 = 0    (2.20)

to be solved for λ_0. This geometric model applies a ray-based calibration method. The complete calibration steps are provided in Gonçalves’ thesis [61]. Although he claims the model is efficient, it is noted that it requires ∼200 seconds to project 10,000 3D points to the image using a CPU. It is suggested that the algorithm can be parallelized on a GPU for faster projection speeds.
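The quadratic (2.20) makes the 1D search explicit, so a small sketch is easy to write: given a pixel, form the ray v = K⁻¹u and pick the root λ that places m = [λv, 1] on the quadric. The mirror below (a hypothetical sphere written directly as Q_cam in camera coordinates) and all names are our own illustration, not Gonçalves’ implementation.

```python
import numpy as np

def backproject_to_quadric(u_pixel, K, Q_cam):
    """Scale the ray v = K^{-1} u by the roots lambda of (2.20) so that m = [lambda*v, 1]
    lies on the quadric Q_cam. Both intersections are returned; which one is the physical
    reflection point depends on the rig geometry."""
    u_h = np.array([u_pixel[0], u_pixel[1], 1.0])
    v = np.linalg.solve(K, u_h)                      # viewing direction in camera coordinates
    q = Q_cam
    a = (v[0]**2 * q[0, 0] + v[1]**2 * q[1, 1] + v[2]**2 * q[2, 2]
         + 2 * v[1] * v[2] * q[1, 2] + 2 * v[0] * v[2] * q[0, 2]
         + 2 * v[0] * v[1] * q[0, 1])                # lambda^2 coefficient of (2.20)
    b = 2 * (v[0] * q[0, 3] + v[1] * q[1, 3] + v[2] * q[2, 3])   # lambda coefficient
    c = q[3, 3]                                      # constant term q_44
    roots = np.roots([a, b, c])
    real_roots = roots[np.isreal(roots)].real
    return [np.append(lam * v, 1.0) for lam in real_roots]

if __name__ == "__main__":
    K = np.array([[500.0, 0.0, 320.0], [0.0, 500.0, 240.0], [0.0, 0.0, 1.0]])
    # Hypothetical spherical mirror of radius 0.1 centered 0.5 in front of the camera,
    # written directly as Q_cam (i.e. already in camera coordinates).
    Q_cam = np.array([[1.0, 0.0, 0.0, 0.0],
                      [0.0, 1.0, 0.0, 0.0],
                      [0.0, 0.0, 1.0, -0.5],
                      [0.0, 0.0, -0.5, 0.24]])
    for m in backproject_to_quadric((320.0, 240.0), K, Q_cam):
        print(m, "residual:", m @ Q_cam @ m)         # residual should be ~0
```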

2.1.5 Two-Plane Model As opposed to the previous models, this one operates on local regions of the image. It was first proposed by Chen et al. [33] in the late 70s and applied to camera calibration by Martins et al. [107]. The two-plane model is applicable to any type of imaging device because it can handle non-perspective distortions and the lines of sight are not forced to pass through a SVP. This model uses lifted image coordinates in its formulation, and it is based on an interpolation function that maps the image plane to the point correspondences on the planar calibration grids. In fact, the linear and quadratic interpolations employed for grid point matches are particular instances of rational polynomial camera models.

Figure 2.4: The viewing ray that results after back-projecting the coordinates of an arbitrarily chosen point on the image (as done in the first version of the two-plane model by [107]). The point is mapped by the respective affine transformation (obtained from calibration) of the enclosing triangle on each grid image (not shown here). The back-projected point is shown as a yellow dot inside the corresponding triangle location for the same coordinates on the image plane.

Figure 2.4 illustrates the idea behind the two-plane model for calibration. First, correspondences (chessboard corners) are found between the two images, and a triangulation is performed. These triangles are then matched by estimating an affine transformation (a homography among the triangle vertices of the candidates). An image point (whose position is arbitrarily chosen, shown as a yellow dot in Figure 2.4) is back-projected to each calibration grid via the corresponding affine transformation of the triangle enclosing the point on each image (independently). Last, the camera ray is constructed as the line spanned by the back-projected points on the two calibration grids (planes).
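As a rough sketch of this back-projection step, assume calibration has already produced, for the triangle enclosing a given pixel, two 3×3 affine maps that send homogeneous pixel coordinates to a 3D point on calibration plane 1 and on plane 2 (expressed in a common frame); the viewing ray is then the line through the two mapped points. The map representation and the placeholder values below are our simplification of the procedure above, not the actual implementation of [107].

```python
import numpy as np

def two_plane_ray(pix, A1, A2):
    """Back-project a pixel with the two-plane idea: map it to a point on each calibration
    plane (via the per-triangle affine maps A1, A2 assumed here) and span the ray through
    the two mapped points. Returns the ray origin and its unit direction."""
    pix_h = np.array([pix[0], pix[1], 1.0])
    p1 = A1 @ pix_h                     # mapped point on the near calibration plane
    p2 = A2 @ pix_h                     # mapped point on the far calibration plane
    d = p2 - p1
    return p1, d / np.linalg.norm(d)

if __name__ == "__main__":
    A1 = np.array([[0.001, 0.0, -0.3], [0.0, 0.001, -0.2], [0.0, 0.0, 0.5]])  # hypothetical
    A2 = np.array([[0.002, 0.0, -0.6], [0.0, 0.002, -0.4], [0.0, 0.0, 1.0]])  # hypothetical
    print(two_plane_ray((250.0, 180.0), A1, A2))
```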

Cited Method                     Pattern / entity                     Views      Model
Caglioti et al. [29]             1 line + mirror contour              Single     Non-central
Barreto and Araujo [15]          3 lines                              Single     Sphere
Ying and Hu [201]                2 lines or 3 spheres                 Single     Sphere
Ying and Zha [202]               3 lines                              Single     Sphere
Wu et al. [189]                  3 lines                              Single     Sphere
Vasseur and Mouaddib [180]       multiple lines                       Single     Sphere
Puig et al. [145]                3D pattern (20+ points)              Single     Sphere
Mei and Rives [112]              2D pattern (points)                  Multiple   Sphere
Deng et al. [44]                 2D pattern (points)                  Multiple   Sphere
Gasparini et al. [56]            2D pattern (points)                  Multiple   Sphere
Wu and Hu [190]                  4 correspondences                    Multiple   Sphere
Xiang et al. [191]               2D pattern (points)                  Multiple   Sphere (GUM)
Scaramuzza et al. [154]          2D pattern (points)                  Multiple   Distortion
Frank et al. [54]                2D pattern (points)                  Multiple   Distortion
Micusik and Pajdla [116]         9 correspondences (epipolar)         Multiple   Distortion
Ramalingam et al. [147]          2 rotational + translation flows     Multiple   Generic [165]
Espuny and Burgos Gil [50]       2 rotational flows                   Multiple   Generic [165]
Schönbein et al. [158]           2D pattern (points)                  Multiple   Non-central + Centered

Table 2.2: Calibration methods for generic omnidirectional systems (based on [146])

2.2 Available Calibration Implementations for ODVSs

The calibration of a camera consists of estimating the mapping between each pixel in the image and its viewing direction (ray) from the single viewpoint (SVP) or from the caustic of viewpoints (non-SVP). Calibration is rarely the ultimate goal of a project, but the results of the end application rely heavily on the calibration quality. There exist many omnidirectional camera models and respective calibration methods. The classification tabulated by Puig et al. [146] is adapted in Table 2.2, restricted to those methods that calibrate the system directly as a whole (camera parameters are not found separately) and that are applicable to generic mirror shapes. We review the most popular methods for ODVS calibration. These methods are chosen due to their available implementations (all in MATLAB, for some reason) from the authors’ websites. The method based on a direct linear transform (DLT-like)¹ by Puig et al. [145] and the generalized unified model (GUM)² extended by Xiang et al. [191] are both based on the unified sphere model (Section 2.1.1). Scaramuzza et al. implemented the unified distortion model³. More recently, a hybrid method⁴ for the calibration of quasi-central catadioptric cameras by Schönbein et al. [158] is presented first, in Section 2.2.1. We then overview the steps generally involved in the point-based methods (Section 2.2.2). Last, we discuss self-calibration possibilities in Section 2.2.3.

¹ http://webdiis.unizar.es/~lpuig/DLTOmniCalibration/
² https://github.com/zju-isee-cv/new_cv
³ https://sites.google.com/site/scarabotix/ocamcalib-toolbox
⁴ http://www.cvlibs.net/projects/omnicam/

2.2.1 Hybrid Calibration for Quasi-Central Systems Schönbein et al. [158] take advantage of the disproportionately large distance from the camera to the 3D points in the scene in comparison with the SVP deviation of quasi-central catadioptric cameras (Figure 1.11). The error in the direction of the viewing rays has a higher impact on the system’s end goal than obtaining a precise SVP location does. Thus, central models alone cannot improve the orientation of viewing rays, particularly when a bias is caused by 3D points sampled from calibration patterns located relatively close to the camera. They propose to estimate their calibration first by employing the analytical forward projection model given by Agrawal et al. [3] for non-central cameras (discussed in Section 2.1.4.1). Then, a simplification is attempted by approximating the resulting viewing rays with a SVP model (called the Centered Model in [158]). In fact, Micusik and Pajdla [115] first showed that a non-central model can be approximated by a central model. Figure 2.5 shows the non-central configuration for the geometric model and its parameters as well as the approximated SVP for the centered model. This technique is evaluated experimentally by measuring the 3D displacement of triplets of landmarks (using monocular and stereo localization). The authors compare their results to the popular calibration toolboxes implemented by Mei [112] and Scaramuzza [154] discussed in previous sections. This hybrid implementation (LIBOMNICAL) and its data are publicly available on the authors’ website. The authors claim to gain three orders of magnitude in speed with similar accuracy when compared to the existing non-central models, particularly against [63] and [3].
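One simple way to picture the “centering” step is as a least-squares fit of a single viewpoint to the bundle of reflected rays: find the 3D point with minimum summed squared distance to all rays. The sketch below implements that classic fit; it is only an illustration of the idea and is not claimed to be the exact procedure of [158].

```python
import numpy as np

def approximate_single_viewpoint(origins, directions):
    """Fit one effective viewpoint to a bundle of rays by minimizing the sum of squared
    point-to-ray distances. origins[i] is a point on ray i (e.g. the mirror reflection
    point) and directions[i] its direction."""
    A = np.zeros((3, 3))
    b = np.zeros(3)
    for p, d in zip(origins, directions):
        d = np.asarray(d, dtype=float)
        d = d / np.linalg.norm(d)
        P = np.eye(3) - np.outer(d, d)   # projector onto the plane orthogonal to the ray
        A += P
        b += P @ np.asarray(p, dtype=float)
    return np.linalg.solve(A, b)
```

The normal equations A c = b accumulate one orthogonal projector per ray, so the fit is closed-form and remains fast even for thousands of rays.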

(a) Non-central projective geometry (b) SVP approximation (red cross)

Figure 2.5: Hybrid calibration for quasi-central systems proposed by Schönbein et al. [158]: (a) shows the non-central model (AFP in Section 2.1.4.1) used as the base for the initial non-SVP projection: a scene point p is reflected by the mirror at m toward the camera pinhole c, crossing the camera plane at q; (b) indicates the approximated SVP as a red cross obtained from the locus formed by the reflected rays {w_r}. The rays {w_c} in blue are reflected toward the camera.

2.2.2 Point-based Methods

(a) 3D pattern (fisheye) (b) Center and boundary extraction (c) Bounding grid by its 4 corners

Figure 2.6: Control patterns for point-based calibration: (a) 3D pattern with more than 20 corner points detected for the DLT-like method and a fisheye lens; (b) Initial estimation of the image’s center based on mirror boundary extraction required in sphere-based methods; (c) the manual selection of four grid corners on the image.

The selected methods employ control points from 2D chessboard patterns (with the exception of the DLT-like approach). As shown in Figure 2.6a, the DLT-like method requires a 3D volume of grid points from a single image, where 20+ points are spread over at least 3 different planes. Unlike Mei’s sphere model or Xiang’s GUM, the DLT-like calibration uses lifted coordinates in order to deal with the non-linearity of the sphere model. Most implementations allow for automatic extraction of the control points in the image (sometimes after selecting the 4 boundary corners, masking, or zooming into the region of interest). However, in the comparison of calibration methods carried out by Puig et al. [146], the points for the DLT-like approach are manually selected to improve the detector’s accuracy. The accuracy of the chessboard point extraction depends on various assumptions that cannot be satisfied under the significant distortion inherent in ODVSs. For example, some chessboard corner-detection implementations assume a frontal orthonormality of the grid pose with respect to the camera plane, as was the case (prior to Datta’s analysis in 2009 [40]) of OpenCV’s chessboard corner finder function [24]. Estimating the camera model parameters (intrinsic and extrinsic) by iteratively minimizing the reprojection error (residuals) of the control points in the image is the popular approach, where the residual norm acts as the convergence metric for the cost function. The appropriate initialization of the parameters is also critical for gradient-descent-based approaches. In Section 2.1.3, we described Scaramuzza’s distortion model, which relies on a polynomial approximation of the projection function whose initial values are difficult to estimate. In fact, Schönbein et al. [158] improve the calibration accuracy of the distortion model implementation by appropriately normalizing the initialized polynomial coefficients, which are otherwise presumed to deviate the solution from the true SVP due to numerical instabilities. In the original calibration toolbox developed by Mei for the unified sphere model described in Section 2.1.1 (available at http://www.robots.ox.ac.uk/~cmei/Toolbox.html), the user must click on the four grid corners of each calibration image (an example is shown in Figure 2.6c) so the remaining corners can be interpolated (and detected) using the model’s initial parameters. Thus, we provide our understanding of the initialization procedure for the sphere model’s parameters in θ defined in (2.1):

1. It is safe to assume the mirror is parabolic and coaxially aligned (always true for Mei’s model), so that [M]c_P = [ξ_X, ξ_Y, ξ_Z]^T = [0, 0, 1]^T.

2. All distortion parameters are initialized to zero, i.e., [k_1, k_2] ← [0, 0].

3. The center of the image [I]m_c is initialized from the mirror boundary by performing a RANSAC circle fitting from Sobel edges found within the vicinity of the initial guess given by the user, as demonstrated in Figure 2.6b.

4. The model’s focal lengths γ_{1,2} are initialized from four or more points [M]p_1, [M]p_2, ..., [M]p_N on a 3D line [M]l, so that their 2D correspondences on the image {m_1, m_2, ..., m_N} can be selected, as depicted in Figure 2.7. A plane with normal n is formed across [M]l and the center of the model O_M. Note that l should not be collinear (radial) with O_M so that a non-degenerate solution can be found. Assuming that the mirror’s Z_M-axis passes through the image center [I]m_c, the 2D points can be put in the centered image frame as [I_C]m_i = [I]m_i − [I]m_c for i ∈ {1, ..., N}. Since we are also assuming ξ_Z = 1, we can lift a pixel into its 3D point on the sphere:

[M]p_Si ← h^(−1)([I_C]m_i) :≈ [ u_i,  v_i,  γ/2 − (u_i² + v_i²)/(2γ) ]^T

Ultimately, [M]p_Si is also coplanar with its corresponding line point [M]p_i, so it must obey:

[M]p_Si^T n = 0  ⟹  u_i n_x + v_i n_y + ( γ/2 − (u_i² + v_i²)/(2γ) ) n_z = 0

With 4 or more points, the value of γ is then solved from a system of equations using SVD, for which a closed-form expression is given in [112]. A numerical sketch of this step is given right after this list.

5. The initial poses for the L grids g_g, where g ∈ {1, ..., L}, are also required; these can be found using planar homographies, as re-implemented by Xiang et al. [191] for GUM or by Schönbein’s calibration toolbox [158].
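The following sketch (referenced in step 4) shows one closed-form way to recover γ from the coplanarity constraint: substituting c₃ = n_z γ/2 and c₄ = n_z/(2γ) makes each point contribute a linear equation, and then γ = √(c₃/c₄). This follows the spirit of the closed form referenced in [112] but is not copied from any toolbox.

```python
import numpy as np

def estimate_gamma(points_2d):
    """Estimate the generalized focal length gamma from N >= 4 centered image points of a
    single non-radial 3D line (step 4 above). Each point gives one row of the linear system
    [u_i, v_i, 1, -rho_i^2] . [n_x, n_y, c3, c4]^T = 0, solved via SVD."""
    pts = np.asarray(points_2d, dtype=float)
    u, v = pts[:, 0], pts[:, 1]
    rho2 = u**2 + v**2
    A = np.column_stack([u, v, np.ones_like(u), -rho2])
    _, _, Vt = np.linalg.svd(A)
    nx, ny, c3, c4 = Vt[-1]                 # right null vector (least-squares sense)
    ratio = c3 / c4                          # equals gamma^2 regardless of the sign ambiguity
    if not np.isfinite(ratio) or ratio <= 0:
        raise ValueError("degenerate configuration (e.g. radial line): cannot recover gamma")
    return float(np.sqrt(ratio))
```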

The calibration is posed as an optimization problem in which a popular “geometric error” is the Frobenius norm (similar to the Euclidean norm for vectors), which computes the square root of the sum of squares of all components. For example, the norm of the pixel residuals is

Figure 2.7: The generalized focal length γ in the unified sphere model by Mei and then Xiang (GUM) can be estimated by selecting 4+ image points {m_1, m_2, ..., m_N} of a non-radial 3D line [M]l. The plane formed by [M]l and the center of the model O_M is described by its normal n.

r_i ← ‖m_i − m̃_i‖_F = √( (u − ũ)² + (v − ṽ)² )    (2.21)

The goal is to find a parameter vector

v = [g_g, x]    (2.22)

that minimizes the objective f_J, which is the scalar-valued function accumulating the projection errors (pixel residuals) r_i defined in (2.21), such that

v* = argmin_v (f_J),  where  f_J(v) := Σ_{g=1}^{L} Σ_{i=1}^{N} r_ig    (2.23)

and, according to (2.21), we have

r_ig ← f_r(m̃_ig, m_ig) := ‖r_ig‖ = √( (ũ_ig − u_ig)² + (ṽ_ig − v_ig)² )    (2.24)

Recall that m_ig is the true (detected) image position of corner point i in its grid pattern g, and m̃_ig ← f_ϕ([C]p̃_ig, θ̃), where f_ϕ is the projection function projecting m̃_ig for the corresponding point p̃_ig from grid pattern g using its estimated pose g̃_g and the hypothesized model parameters θ̃. Instead of letting the solver estimate the gradient values numerically, the search speed can be vastly improved by providing the first partial derivatives (Jacobian) of the objective function in (2.23) with respect to the parameters in ṽ, as derived by Mei using the chain rule. All these methods perform a nonlinear optimization (e.g. via Levenberg-Marquardt) as a final refinement. In the case of the hybrid method described in Section 2.2.1, the centered model is obtained via a non-linear least-squares minimization over the orientation error of the viewing rays.
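For completeness, here is a generic sketch of the refinement in (2.22)–(2.24) using SciPy’s Levenberg-Marquardt solver. The data layout, the `project` callback (any forward projection such as the GUM pipeline of Section 2.1.1), and the flat pose parametrization are our own conventions, not the API of the toolboxes discussed above; an analytic Jacobian, as derived by Mei, would be passed via the `jac` argument instead of relying on the default numerical differentiation.

```python
import numpy as np
from scipy.optimize import least_squares

def calibrate(detected, grid_points, project, theta0, poses0):
    """Jointly refine intrinsic parameters theta and the grid poses by stacking the pixel
    residuals of every corner i in every grid g, as in (2.23)-(2.24).
    detected[g][i]   : detected 2D pixel of corner i in grid g
    grid_points[g][i]: corresponding 3D corner on the pattern
    project(p, theta, pose) -> predicted 2D pixel (any forward projection model)."""
    n_theta = theta0.size

    def residuals(x):
        theta = x[:n_theta]
        poses = x[n_theta:].reshape(len(poses0), -1)
        res = []
        for g, (m_g, p_g) in enumerate(zip(detected, grid_points)):
            for m_ig, p_ig in zip(m_g, p_g):
                res.extend(project(p_ig, theta, poses[g]) - m_ig)  # (u_tilde-u, v_tilde-v)
        return np.asarray(res)

    x0 = np.concatenate([theta0, np.asarray(poses0, dtype=float).ravel()])
    sol = least_squares(residuals, x0, method="lm")   # Levenberg-Marquardt refinement
    return sol.x[:n_theta], sol.x[n_theta:].reshape(len(poses0), -1)
```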

2.2.3 Self-Calibration Self-calibration relies on the point correspondence problem from multiple views in which all 3D poses and camera model parameters are unknown. In 2000, Kang [79] performed pairwise tracking of point features [160] for the self-calibration of a para-catadioptric camera in order to compute the essential matrix E between two views via an error minimization procedure. Then, Micusik and Pajdla [115] extended the method to the auto-calibration of non-central catadioptric cameras by obtaining the distortion function (based on the polynomial eigenvalue problem) that maps the projection of the catadioptric images to a unit sphere. A similar approach is taken by Tardif et al. [173], who presented a self-calibration method for fisheye and catadioptric cameras by means of the radially symmetric distortion function in a couple of settings: from images of unknown planar textures, or by restricting the camera motion (e.g. to pure translation) in a non-planar scene where the tracking is performed. In [116], Micusik and Pajdla proposed a self-calibration method valid for both fish-eye lenses and catadioptric systems by estimating the epipolar geometry of these systems from a small number of point correspondences. In this case, the epipolar geometry model and RANSAC [52] allow them to remove outliers (mismatches) from the set of correspondences. The auto-calibration proposed in all these cases is essentially a 3D structure reconstruction from two-view geometry due to motion (SfM) [102]. Ramalingam et al. [147] calibrate central cameras from optical flows due to pure rotational and translational constrained motion. Espuny and Burgos Gil [50] tried a similar approach but only from a couple of optical flows due to purely rotational motion about unknown linearly independent axes. The latter method suggests that a linear calibration can be performed by estimating the angular velocities from three rotations (instead of two) so that computing the derivative of the optical flows is no longer necessary. Kelly and Sukhatme [82] present an algorithm for the relative sensor pose self-calibration between an ODVS and an inertial measurement unit (IMU) in order to build a sparse metric map of the environment. Supervised machine learning approaches have not yet been explicitly employed for ODVS calibration. However, in [109], Masci et al. argue that the neural network hashing (NNhash) method outperforms other artificial neural network (ANN) approaches for descriptor learning and matching from omnidirectional images. Their method was not intended to be used explicitly for calibration, but other works employing ANNs for the implicit calibration of conventional cameras exist, such as [5, 6, 187]. The employment of ANNs for ODVSs on ground mobile robots was first demonstrated by Zhu et al. in 1996 [206]. Zhu’s method allows the vehicle to estimate its heading and its location along the route while generating orientation histograms from Fourier amplitudes, in which rotation invariance is maintained. The road classes (e.g. paved, dirt, curves, intersections, etc.) are learned by the back-propagation network from the low-resolution omnidirectional images. Another usage of ANNs for training a path-following system from omnidirectional images is given in [150]. More recently, Plagemann et al. [144] were able to learn ranges to surrounding obstacles from edge features extracted from omnidirectional images via a Gaussian process (a nonparametric model for regression), or via an unsupervised approach based on principal-component analysis (PCA) or linear discriminant analysis (LDA) for dimensionality reduction.

2.3 General Remarks

Various factors affect the choice of the appropriate camera model. From the surveyed omnidirectional vision models, we can base their selection on the application’s goal and the ease of calibration. As we stated, several models and their respective calibration procedures have been given in the literature. Keeping reproducibility in mind and due to implementation availability, we can only attest to the usefulness and accuracy obtained by the models addressed in Section 2.2 of this survey. These calibration methods rely heavily on the quantity and quality of the control image data employed; the methods are numerical optimization procedures that may be biased by the density and location of the calibration patterns, and they may not converge at times due to noise in the control data (detected corners). The sphere-based models and the distortion-based model are essentially designed for satisfying a single-viewpoint (SVP) configuration of the camera. Therefore, approximating a non-central or slightly non-SVP device with an SVP model may produce unexpected results, as analyzed by Derrien and Konolige [45]. Closed-form expressions relating 3D world point coordinates to their corresponding image coordinates do not exist for non-central catadioptrics. Calibration is easier and faster using parametrized models designed for central systems, whereas non-central models for non-SVP-compliant systems are required for extreme accuracy. The hybrid model discussed in Section 2.2.1 is a good compromise if the non-SVP system is employed for accurate structure reconstruction applications, although this is not entirely attempted in Schönbein’s work [156]. Applications that do not require extreme accuracy include robotic navigation, tracking, and visualization, so the latest Generalized Unified Model (GUM) may be a good candidate for them.

Chapter 3

Multiview Omnidirectional Vision

By multiview, we mainly refer to the omnidirectional stereo (omnistereo) case formed by a pair of panoramic views of the world captured from different viewpoints. The reason why one may combine more than two panoramic views is to reduce false correspondences from the environment. In particular, catadioptric omnistereo vision offers a richer array of topologies that can be adapted to a specific task. For example, omnistereo can be useful when dealing with nonholonomically constrained motion such as for an unmanned aerial vehicle (UAV), which requires a complete globe of range data in order to perform safe and agile maneuvers in cluttered environments (as demonstrated in [37]). In what follows, we outline the state-of-the-art in multiview omnistereo vision and possible system configurations. Many of the catadioptric omnistereo configurations that were proposed over the last decade are primarily derived from the geometries outlined in the seminal treatment by Baker and Nayar [12] reviewed in Section 1.2, and they are often aimed at satisfying the ubiquitous single-viewpoint (SVP) constraint, but not always.

3.1 Relevant Rig Configurations

For practical reasons, we consider only ODVSs where the camera's optical axis is meant to coincide with the axis of symmetry of the quadric mirror or wide-angle lens. We now combine various mirrors or lenses to achieve omnidirectional views from multiple viewpoints. In [204], Zhu studied both the vertically-aligned and horizontally-aligned omnistereo configurations as well as their general triangulation equations and depth error (given some pixel correspondence error). Thus, we mainly treat possible vertically-aligned arrangements in Section 3.1.1 and various horizontal formations in Section 3.1.2.

3.1.1 Coaxial Arrangements

When the optical components of the system are meant to be aligned by their axes of symmetry, this is considered a “coaxial” arrangement. The main advantages of this configuration for omnidirectional vision are twofold. First, the omnidirectional multiview can be achieved with a single camera that is shared among the multiple views, thus offering practical advantages for robotics such as reduced cost, size, and weight. Second, a coaxial configuration avoids self-occlusions (singularity cases according to [204]) and maximizes the omnistereo region-of-interest S_ROI of the 360° overlapping view-field around the vertical axis. In this section, we use FOV to refer to the vertical field-of-view.

(a) Prototypes designed by Jaramillo et al. [77] (b) Vertical FOVs and Stereo ROI

Figure 3.1: Single-camera omnistereo sensor designed for quadrotor MAVs: (a) the synthetic and real prototypes of the folded catadioptric ODVS with hyperboloidal mirrors, whose profiles have been optimized for acquiring an unoccluded FOV when mounted on a quad-copter; (b) the omnistereo region of interest S_ROI conceived by the overlapping views from the FOV of each mirror.

3.1.1.1 Single-Camera “Folded” Catadioptric Systems

The coaxial arrangement of a camera-mirror system is known as folded catadioptrics (first introduced by Nayar and Peri [131]). There, 9 possible folded-catadioptric configurations were devised. A small form-factor together with a scalable baseline can be achieved with a “folded” configuration. Of practical interest to mobile robotics are configurations that not only offer a wide field-of-view but also exploit the spatially variant resolution of a mirror to the advantage of the robot's unique dynamics. For example, the spatial distribution of depth resolution may be tuned to a particular azimuth and elevation, such as the robot's dominant direction of motion [93]. However, folded configurations tend to have a limited common (overlapping) FOV angle α_SROI in which depth from correspondences can be computed (see Figure 3.1b). This fairly narrow vertical field-of-view region typically lies around the equator of the view-sphere. Single-camera catadioptric stereo has been implemented, for example, in [76, 93, 77].

Hyper-catadioptric Omnistereo Design Jaramillo et al. design and analyze the characteristics of an SVP-compliant omnistereo system based on the folded catadioptric configuration with hyperboloidal mirrors [77], shown in Figure 3.1a. Jaramillo's approach resembles the work by Jang et al. in [76] in its use of a flat “reflex” mirror on the top mirror. Nevertheless, the sensor's characteristics and parameter values in [76] are not fully justified. This motivated Jaramillo et al. [77] to perform an extensive analysis of their model's geometric parameters, which are obtained via a constrained numerical optimization considering the sensor's end application. In fact, Jaramillo et al. are the first to propose a single-camera catadioptric omnistereo solution for passive omnidirectional 3D perception on a micro aerial vehicle (MAV). They also show how the panoramic images are obtained so that correspondences can be found for triangulation (explained further in Section 3.3.3), and they present a triangulation uncertainty model together with preliminary 3D experimental results from simulations and real images.

(a) Model parameters and common FOV (b) Prototypes: Small-scale and large-scale (robot)

Figure 3.2: Omnistereo sensor using spherical mirrors in a folded configuration from [93].

Spherical Mirrors (Non-SVP) Non-SVP configurations using spherical mirrors have addressed the issues of cost and limited field-of-view. The most relevant of such works is that of Derrien and Konolige [45], which, while not a stereo system, explicitly models the error introduced by relaxing the SVP constraint in the projection function. Although non-SVP mirrors had been previously used in robotics, we consider their work seminal in its detailed study of a non-SVP mirror applied to mobile robotics. Labutov, Jaramillo, and Xiao [93] designed a novel omnistereo catadioptric rig consisting of a perspective camera coaxially-aligned with two spherical mirrors of distinct radii in a “folded” configuration (Figure 3.2). One caveat of spherical mirrors is their non-centrality, as they do not satisfy the single effective viewpoint (SVP) constraint (discussed in Section 1.2.2) but rather produce a locus of viewpoints [170]. However, the proposed system addresses several of the aforementioned limits by generating a near-spherical depth panorama using generic, low-cost spherical mirrors. Their main contributions are:

1. Spherical mirrors in a folded configuration maximize the image resolution near the poles of the view-sphere. This is a useful property for robots moving in a horizontal plane because the optical flow generated by the motion is imaged at a higher resolution, while depth from omnistereo is measured on the lower-resolution regions around the equator.

2. The radial epipolar geometry of the spherical mirrors is exploited (shown in Figure 3.11b) in order to compute dense metric-depth in the equatorial region of the view-sphere.

3. In addition, they fuse depth from optical-flow (poles) and stereo (equator) in a dense probabilistic depth panorama. As a result, the scale factor for the depth-from-optical-flow regions is recovered from the overlaps.

3.1.1.2 Single-Camera “Unfolded” Catadioptric Systems

(a) Rig (b) Triangulation uncertainty for a narrow baseline (c) Resolution invariance

Figure 3.3: Omnistereo catadioptrics via double-lobed mirrors. (a) A non-SVP double-lobed mirror; (b) Fiala and Basu illustrate the triangulation uncertainty due to the narrow baseline distance, an unfortunate characteristic of double-lobed mirror systems [51]; (c) Geometry of resolution invariance for omnistereo mirrors by Conroy and Moore [38].

Double-lobed Mirrors The geometry of a mirror with two lobes of hyperbolic profile seen by a single camera was first reported by Ollis et al. in 1999 [140]. A double-lobed hyperbolic mirror was materialized by Cabral et al. [28] and Correa et al. [39], who developed this kind of omnistereo system characterized by its exceptionally small form-factor. Both utilize SVP mirrors. However, in Fiala and Basu's reconstruction work [51], the convex conic profile of the two lobes in their rig is not specified, so they are presumed to be non-SVP quadrics (perhaps spherical lobes). As shown in Figure 3.3, the effective baseline distance of these systems is relatively small compared to the mirrors' height, so this configuration is considered impractical for range sensing (see Section 3.3.3). Another family of double-lobed mirror profiles is specifically designed to guarantee the resolution invariance of the projected points in the image. Figure 3.3c illustrates the projective geometry for a 3D point whose two elevation angles are maintained as it gets reflected by the resolution-invariant mirror.

Disjoint Mirrors We call this setup of mirrors “disjoint” in order to differentiate it from the double-lobed mirrors surveyed above. In fact, this catadioptric configuration is more practical for computing 3D information from coaxial panoramic stereo. The rig designed by Su et al. [167, 105] uses a pair of vertically-aligned hyperbolic mirrors and a single perspective camera. Their design places both hyperbolic mirrors facing down and vertically apart from each other, as illustrated in Figure 3.4a. This omnistereo sensor provides a wide baseline at the expense of a very tall system (Figure 3.4b). In [67], He et al. use this rig to propose various solutions to the stereo matching problem between its panoramic images. Figure 3.4c is the diagram of another design proposed in [193] by Xiong et al., who use parabolic mirrors of different diameters such that the smaller mirror is placed at the bottom (closer to the camera, which has a telecentric lens). Although the FOV is not explicitly stated in these manuscripts, it can be estimated from the specified geometry to be less than 90°. A possible disadvantage of these unfolded configurations is that, while being suitable for ground vehicles, the rig is too tall for use on micro aerial vehicles.

(a) Omnistereo projective geometry (b) Rig by Su et al. (c) Design by Xiong et al.

Figure 3.4: Two examples of coaxial omnistereo catadioptric systems (unfolded, disjoint configuration): (a) and (b) the projective geometry and a prototype of the omnistereo rig designed by Su et al. [167]; (c) design by Xiong et al. [193].

Hybrid Omnistereo A combination of a concave lens and a hyperbolic mirror was attempted in [200]. Similarly, Krishnan and Nayar designed “cata-fisheye” [90] in order to achieve an omnistereo view by putting a convex mirror in front of a camera with a fisheye lens (Figure 3.5). A caveat of these systems is that they render very short baselines (like the double-lobed mirrors from Section 3.1.1.2) in comparison to other catadioptric omnistereo configurations.

3.1.1.3 Using a Plurality of Cameras

Although not a single-image ODVS, a hyperbolic omnistereo rig can obviously be conceived by aligning two or more independent omnidirectional cameras by their central axes. Throughout the years, various binocular configurations have been applied to ground mobile robotics, so they are included here for completeness. For instance, in 2001, Koyasu et al. [89] developed an omnistereo system using two separate hyper-catadioptric cameras to assist a mobile robot with obstacle avoidance (Section 4.2). In [164], Spacek studies the projective geometry of coaxial omnistereo systems using hyperbolic as well as conical mirrors. He shows how to find radial edges in the panoramas to be used as features for matching; however, no depth-computation results from the contributed edge-based method are provided. More recently, Wang et al. [182] present a deeper design analysis of a few omnistereo vision sensors of this binocular type. Unfortunately, these systems are not compact since they use separate camera-mirror pairs, which are known to produce synchronization issues if not handled directly on chip.

(a) Cata-Fisheye prototype (b) Omnistereo image example

Figure 3.5: The Cata-Fisheye camera designed by Krishnan and Nayar [90].

3.1.2 Multiaxial Arrangements

Figure 3.6: Viewer-based cylindrical representation for the horizontally-aligned binocular omnistereo configuration as shown by Zhu in [204].

In [133], Nene and Nayar depicted the arrangement of a variety of SVP-compliant mirrors so that reflecting rays pass through the single camera's viewpoint. Although these configurations did not provide fully omnidirectional stereo vision, they are the foundations for systems such as the “folded” catadioptric rigs reviewed in Subsection 3.1.1.1 and the single-camera omnistereo catadioptric systems discussed here. A useful multiaxial configuration is generally achieved by a horizontal alignment of the cameras so that the common region-of-interest S_ROI is maximized (ignoring misalignments). However, triangulation singularities (from self-occlusion) are unavoidable for the horizontally-aligned arrangements, as shown by Zhu [204] in Figure 3.6 for the binocular case of cylindrical panoramas.

Figure 3.7: Various multiaxial catadioptric arrangements for a single camera (from [121]).

3.1.2.1 Single-Camera Multiaxial Systems

The “sphereo” rig proposed in 1988 by Nayar [127] is perhaps the first of this kind (Figure 1.4). In fact, various alternatives using a single camera and multiple mirrors can be realized. Mouaddib et al. [121] give some arrangements, and we present some of them in Figure 3.7. A comparison of arrangements is carried out in [122], where increasing the vertical disparity by modifying the relative heights between the mirrors is suggested, especially for compound systems. Compound systems are those that use mirrors of several sizes fused closely together. The “Category Four” arrangement is argued not to have any triangulation singularity because there are always at least two views available for all the points in the global FOV of the system. A “Category Four” arrangement is implemented by Caron et al. [31] for 3D model-based tracking. Other actual examples of the usage of compound spherical mirrors can be seen in [88], [94], [171], and [2]. Parabolic mirrors are instead used in [152]. We discuss the application of a larger array of spherical mirrors employed by Taguchi et al. [171] in Section 4.3.

3.1.2.2 Binocular Multiaxial Systems Again, for completeness, we cover some multi-camera ODVSs that have been successfully employed to achieve omnistereo vision.

Horizontally-Combined Omnidirectional Catadioptric Cameras For example, in 2000, Sogo et al. [163] placed several catadioptric ODVSs around a room in an attempt to track the paths of passersby in real time. The hyper-catadioptric horizontal-omnistereo rig by [84] is shown in Figure 3.8a. This system was used to build a 2D map and localize a ground robot in it (2D SLAM, see Section 4.1). Figure 3.8b shows a similar configuration employed by Schönbein et al. in [156, 158] on top of the AnnieWAY1 vehicle. Here, the baseline of the system (unspecified) is large enough for far-sighted range sensing (their 3D mapping experiment is discussed in Section 4.1.5).

1http://www.mrt.kit.edu/annieway/

(a) Rig employed by Kim in 2003 (b) Rig employed by Schönbein et al. in 2014

Figure 3.8: Horizontal omnistereo rigs using separate hyper-catadioptric ODVSs: (a) Kim and Chung provided a 2D SLAM algorithm using this omnistereo rig; (b) A pair of hyperbolic omnidirectional cameras horizontally aligned on top of AnnieWAY [156, 158].

Elliptical Mirrors for Stereopsis Nagahara et al. [125] prototyped a head-mounted stereo display (Figure 3.9) by combining hyperboloidal and ellipsoidal mirrors to achieve α_SROI ≈ 60°.

(a) HMD Prototype (b) Wide FOVs

Figure 3.9: The head mounted stereo display developed by Nagahara et al. [125].

Stereo via Panoramic Annular Lenses (PALs) The light projection of a PAL is illustrated in Figure 1.10c, although this is not the only PAL configuration. Zhu et al. reported in 1999 how a PAL may be constructed with a variety of combinations of conic reflectors sharing focal placements (similar to what SVP folded catadioptric systems attempt). The main differences are their compact form and the fact that overlapping FOVs are avoided, so a single PAL cannot be used for stereopsis. Obviously, a pair of views from a couple of PALs can be used for 3D estimation. Procedures for image unwarping into cylindrical panoramas (via look-up tables for speed) and for an empirical self-calibration are also given by Zhu et al. in [205]. The triangulation geometry is the same as that proposed for the general case of omnistereo cylinders presented in [204], where the baseline is estimated simply from the view of the other system in the world (a.k.a. cooperative stereo vision). A PAL is used by Lei et al. [95] to exemplify their omnistereo calibration method.

3.2 Omnistereo Vision Calibration

For one to be able to take advantage of the immediate 3D sensing capabilities of an omnistereo system, the relative pose between its views is required. As discussed in Section 2.2 for the monocular case, the accuracy of the results (here, from triangulation of correspondences) depends on the quality of both the intrinsic and the extrinsic parameters of the system. Epipolar geometry plays an important role in the majority of methods, so we discuss it first.

3.2.1 Epipolar Geometry

(a) Epipolar plane for hyperbolic mirrors (from [168]) (b) Epipolar plane with sphere models

Figure 3.10: Epipolar plane constraint for omnistereo

In general, epipolar geometry is essential for efficiently searching for point correspondences in stereo images because the pose between the viewpoints is tied by geometrical and projective constraints. Epipolar geometry for omnistereo has been studied extensively: first by Nene and Nayar in [133] and by Svoboda in [169, 168], then by Geyer and Daniilidis in [57], by Bunschoten and Kröse in [26], by Micusik in [114, 113], and by Barreto and Daniilidis in [13]. According to the illustration (and symbols) employed in Figure 3.10a, an epipolar plane π intersects the mirrors through the focal points (F_1, F_2), the reflection points (x_1, x_2), the pair of epipoles on each mirror, and the point in space X. The cut of the epipolar plane π on each mirror is another conic segment from the point x_i to its mirror epipole (depicted accordingly for each configuration in Figure 3.11). The mirror epipoles get projected as (e_i, e_i') onto the image plane, whereas x_i is projected as u_i, for i = {1, 2}, respectively, for each camera-mirror system. Obeying this coplanarity constraint (as in the perspective stereo case), a conic in the second image gets associated with an image point u_1. This gives the epipolar geometry expression:

u_2^T [M_2(E, u_1)] u_2 = 0    (3.1)

where the nonlinear matrix function M_2(E, u_1) depends on the pose (R, t), the point u_1, and the mirror and camera parameters (i.e., the models' intrinsic parameters) to be determined from calibration.
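Under a central (sphere-model) approximation of each camera-mirror pair, the conic that expression (3.1) associates with u_1 can also be traced numerically instead of through a closed form for M_2: back-project u_1, sweep sample depths along that ray, transfer the samples into the second view with the relative pose (R, t), and project them with the second sphere model. The sketch below is only an illustration under that central assumption; backproject_sphere and project_sphere follow the unified sphere model of Section 2.1.1 with mirror parameter xi and intrinsic matrix K, and the pose convention is X_2 = R X_1 + t.

import numpy as np

def backproject_sphere(u, K, xi):
    """Lift pixel u = (col, row) to a unit ray using the unified sphere model."""
    m = np.linalg.solve(K, np.array([u[0], u[1], 1.0]))      # normalized image coordinates
    r2 = m[0]**2 + m[1]**2
    eta = (xi + np.sqrt(1.0 + (1.0 - xi**2) * r2)) / (r2 + 1.0)
    return np.array([eta * m[0], eta * m[1], eta - xi])      # unit-norm viewing ray

def project_sphere(X, K, xi):
    """Project a 3D point with the unified sphere model; returns pixel (col, row)."""
    Xs = X / np.linalg.norm(X)                               # normalize onto the unit sphere
    p = K @ np.array([Xs[0] / (Xs[2] + xi), Xs[1] / (Xs[2] + xi), 1.0])
    return p[:2]

def epipolar_curve(u1, K1, xi1, K2, xi2, R, t, depths=np.geomspace(0.2, 50.0, 200)):
    """Sample the epipolar conic in image 2 associated with pixel u1 of image 1."""
    ray = backproject_sphere(u1, K1, xi1)
    return np.array([project_sphere(R @ (d * ray) + t, K2, xi2) for d in depths])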

(a) Non-coaxial configurations have epipolar conics (b) Coaxial configuration has radial epipolar lines

Figure 3.11: Epipolar lines rendered by the two possible omnistereo configurations (from [95])

3.2.2 Calibration Approaches

As far as we are aware, all existing methods for omnistereo systems assume a multi-camera configuration. This is an acceptable assumption even for single-camera coaxial omnistereo rigs. We cannot generalize these methods for the folded configuration, although Jaramillo et al. [77] claimed to have successfully calibrated their rig based on some state-of-the-art models (Chapter 2). For example, Lei et al. [95] use the distortion-based model described in Section 2.1.3 to calibrate a PAL camera (SONY RPU-C251). Then, they compute the essential matrix in the context of expression (3.1) between the two (intrinsically) pre-calibrated models. Finally, the omnidirectional (warped) images can be rectified for stereo correspondence search along epipolar lines. A similar approach is taken by others, such as for the horizontal omnistereo setup shown in Figure 3.8b in [158, 156], based on the epipolar geometry solution. However, they claim to jointly optimize the intrinsic and extrinsic calibration parameters of this rig using the various models they compare, such as their own quasi-central model [158], the geometric model [4], the original unified central model by Geyer and Daniilidis [58], the unified sphere model of Mei and Rives [112], and the unified distortion model by Scaramuzza et al. [154]. It is reasonable to use the specifications provided by a commercial ODVS in order to initialize the models' theoretical parameters. In addition, Schönbein and Geiger [156] use a Velodyne 3D laser scanner as precise ground truth for comparison of their depth measurements. In fact, this is the only open-sourced implementation2 at the time of this writing.

2http://www.cvlibs.net/projects/omnicam/

Alternatively, for single-camera multiaxial systems with spherical mirrors, a method to find the extrinsic parameters is given in [2, 1]. Taguchi et al. [171] have previously mentioned the use of Agrawal's method [3] to calibrate an array of spheres (Figure 4.3), but Agrawal does not really explain how his method can be used for stereo in [4].

3.3 Range Sensing from Omnistereo Sensors

Given the epipolar geometry of the omnistereo system (from intrinsic and extrinsic parameters estimated from calibration via a model), and assuming that point correspondences have been correctly established from the 2D images (usually from cylindrical panoramas), 3D points can be computed via triangulation.

(a) Panoramic Image Formation [77] (b) Basic triangulation on a coaxial omnistereo system

Figure 3.12: Illustration of (a) a cylindrical panorama obtained by mapping pixels from the omnidirectional image π_img onto a cylindrical image S_cyl; (b) corresponding non-skew rays intersect at point P_w via triangulation of elevation angles θ_t and θ_b for the top and bottom model, respectively.

3.3.1 Cylindrical Panoramas

Unwarping the omnidirectional image is an essential step prior to searching for correspondences. In [194], Xiong et al. improve the unwarping performance (in regard to memory size and speed) when compared against Jeng's method [78]. The improvement consists of partitioning the omnidirectional image into eight radial sectors and then creating a panorama mapping table out of only one of the sectors, given the assumption of geometric symmetry of the ODVS. The intuition behind the speed-up is the reduction in memory space, since the LUTs are now 1/8 of the original size and can be accessed from cache more efficiently. In sum, Figure 3.12a illustrates the construction of panoramic images as a projection onto a unit cylinder, as demonstrated by Jaramillo et al. [77].
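As a rough illustration of such LUT-based unwarping (neither Xiong's eight-sector scheme nor the exact procedure of [77]), the sketch below precomputes, once, a map from every panorama pixel to a location in the omnidirectional image and then applies it per frame with OpenCV's remap. Here project_omni is assumed to be a calibrated forward-projection function returning (column, row) pixel coordinates in the omnidirectional image (e.g., the sphere model of Section 2.1.1), and elev_min/elev_max bound the mirror's vertical FOV.

import numpy as np
import cv2

def build_panorama_lut(project_omni, pano_w, pano_h, elev_min, elev_max):
    """Precompute maps (map_x, map_y): panorama pixel -> omnidirectional image pixel."""
    az = np.linspace(-np.pi, np.pi, pano_w, endpoint=False)       # panorama columns = azimuth
    hs = np.linspace(np.tan(elev_max), np.tan(elev_min), pano_h)  # rows = heights on the unit cylinder
    map_x = np.empty((pano_h, pano_w), np.float32)
    map_y = np.empty((pano_h, pano_w), np.float32)
    for r, h in enumerate(hs):                                    # one-time cost, so plain loops suffice
        for c, phi in enumerate(az):
            ray = np.array([np.cos(phi), np.sin(phi), h])         # point on the unit cylinder S_cyl
            map_x[r, c], map_y[r, c] = project_omni(ray)
    return map_x, map_y

# One-time LUT construction, then fast per-frame unwarping:
# map_x, map_y = build_panorama_lut(project_omni, 1024, 256, -0.5, 0.5)
# panorama = cv2.remap(omni_img, map_x, map_y, interpolation=cv2.INTER_LINEAR)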

3.3.2 Correspondence Search

Figure 3.13: Example of dense stereo matching from panoramic stereo

As shown in [60], the unwarped panoramas contain vertical, parallel epipolar lines that facilitate the pixel correspondence search. Koyasu et al. [89] compute the real-time range of the surrounding scene with a coaxial omnistereo vision rig (of two separate cameras), which eventually provides dense disparity panoramas. They employ a SAD (sum of absolute differences) window for the correspondence matching along the vertical epipolar lines. Jaramillo et al. [77] obtain acceptable disparity maps with the semi-global block matching (SGBM) method introduced by [70]. Figure 3.13 provides an example disparity image [Ξ_Δm] resulting from this dense stereo matching between a pair of panoramic images ([Ξ_t], [Ξ_b]). Recall that no stereo matching algorithm (as far as we are aware) is immune to mismatches, due to reasons such as the ambiguity introduced by cyclic patterns or textureless regions. The algorithm chosen for finding matches is crucial to attain correct pixel disparity results. We refer the reader to [24] for a detailed description of stereo correspondence methods. A horizontal arrangement example is [59], where Geyer and Daniilidis show the rectification of a pair of panoramic images obtained from two different poses of a paracatadioptric rig. Figure 3.14b shows the resulting disparity image using a correlation window search on the rectified panoramas. The range-space template matching derived by Ng et al. in [136, 137] for multiple-baseline omnistereo configurations consists of maintaining a volumetric map of templates (Figure 3.14a).
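As a minimal sketch of the SGBM matching mentioned above (with placeholder parameters, not the settings used in [77]), the following assumes pan_top and pan_bot are row-aligned 8-bit grayscale cylindrical panoramas from the two views of a coaxial rig; since their epipolar lines are vertical, the panoramas are rotated by 90° so that OpenCV's horizontal disparity search runs along them. Which panorama plays the role of the "left" image, and the sign of the resulting disparity, depend on the rig's geometry.

import numpy as np
import cv2

# Rotate so that the (vertical) epipolar lines become horizontal scanlines.
top = cv2.rotate(pan_top, cv2.ROTATE_90_CLOCKWISE)
bot = cv2.rotate(pan_bot, cv2.ROTATE_90_CLOCKWISE)

block = 7
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=64,          # search range; must be divisible by 16
    blockSize=block,
    P1=8 * block * block,       # smoothness penalties (typical heuristic values)
    P2=32 * block * block,
    uniquenessRatio=10,
    speckleWindowSize=100,
    speckleRange=2)

disp = sgbm.compute(top, bot).astype(np.float32) / 16.0   # fixed-point output to pixel units
disp = cv2.rotate(disp, cv2.ROTATE_90_COUNTERCLOCKWISE)   # back to the panorama layout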

3.3.3 Triangulation

We exemplify the triangulation process for a vertical omnistereo rig as illustrated in Figure 3.12b. The position of a 3D point P_w is simply obtained by the geometric triangulation of corresponding rays leaving each model center at elevation angles θ_t and θ_b, such that:

(a) Range-Space Approach [136] (b) Disparity image from correlation window search [59]

Figure 3.14: Omnistereo range from template matching: (a) Illustration of the range-space approach for multiple baselines; (b) Disparity image resulting from a dense matching between a pair of rectified stereo panoramas (only one shown) using the same paracatadioptric rig.

p_w^{[M_b]} = [ ρ_w cos(φ),  ρ_w sin(φ),  ρ_w tan(θ_b) ]^T ,   where   ρ_w = (b sin(α) sin(β)) / sin(α + β) = (b cos(θ_t) cos(θ_b)) / sin(θ_t + θ_b)    (3.2)

where φ is the azimuth angle and b the baseline distance. For convenience, the rig's origin of coordinates O_R coincides with the bottom model's center O_{M_b}. However, when the rays are skew (as shown in Figure 3.15b), the formulation becomes an approximation of the triangulated point P_w obtained as the midpoint P_wG of the common perpendicular line segment G_t G_b = λ_{t⊥b} v̂_{t⊥b}, where

v̂_{t⊥b} = (v_t × v_b) / ||v_t × v_b||    (3.3)

If the rays are not parallel, there exists an exact solution λ = [λ_{G_t}, λ_{G_b}, λ_{t⊥b}]^T for the well-determined system of equations:

[ v_t   −v_b   v̂_{t⊥b} ] λ = t_b^{[C]} − t_t^{[C]}    (3.4)

Pixel correspondences are encoded in the panoramic disparity map represented as a grayscale image in Figure 3.13. Points with larger disparities tend to be closer to the system and should provide a higher accuracy in the range estimation. However, establishing correspondences for objects too close to the system is more difficult due to their more pronounced perspective deformations (foreshortening effect). The role that the baseline distance and the physical size of the pixel play in the detectable range of an omnistereo system is analyzed in greater detail in [64]. A dense point cloud like the one visualized in Figure 3.16 can be produced.
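The following sketch is a generic illustration of equations (3.2)-(3.4), not the authors' code: it computes the coaxial range ρ_w from the two elevation angles, and the midpoint P_wG of the common perpendicular when the back-projection rays are skew. Ray origins and directions are assumed to be expressed in a common frame [C].

import numpy as np

def coaxial_range(b, theta_t, theta_b):
    """Horizontal range rho_w from elevation angles of corresponding coplanar rays, cf. (3.2)."""
    return b * np.cos(theta_t) * np.cos(theta_b) / np.sin(theta_t + theta_b)

def triangulate_midpoint(o_t, v_t, o_b, v_b):
    """Approximate triangulation of two (possibly skew) back-projection rays.
    o_t, o_b: ray origins (the two viewpoints); v_t, v_b: ray directions. Returns
    the midpoint of the common perpendicular segment G_t G_b, cf. (3.3)-(3.4)."""
    v_perp = np.cross(v_t, v_b)
    n = np.linalg.norm(v_perp)
    if n < 1e-12:
        raise ValueError("rays are (nearly) parallel")
    v_perp /= n
    # Solve [ v_t  -v_b  v_perp ] [lam_t, lam_b, lam_perp]^T = o_b - o_t  (cf. (3.4))
    A = np.column_stack((v_t, -v_b, v_perp))
    lam = np.linalg.solve(A, o_b - o_t)
    G_t = o_t + lam[0] * v_t            # closest point on the top ray
    G_b = o_b + lam[1] * v_b            # closest point on the bottom ray
    return 0.5 * (G_t + G_b)            # midpoint P_wG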

(a) Midpoint for horizontal omnistereo (b) Midpoint for coaxial omnistereo

Figure 3.15: Midpoint P_wG found from triangulation of skew back-projection rays: (a) for horizontal omnistereo performed by Schönbein et al. [157]; (b) for skew rays (v_t, v_b) as provided by Jaramillo et al. [77] for their coaxial omnistereo sensor (Figure 3.1).

(a) 3D Perspective View (b) Orthographic View

Figure 3.16: A 3D dense point cloud computed by triangulation of the correspondences visualized as a disparity image in Figure 3.13: (a) 3D visualization of the point cloud, where the position of the omnistereo sensor mounted on the quadrotor is annotated as frame [C] with respect to the scene's coordinate frame [S]; (b) Orthographic projection of the point cloud onto the XY grid.

Chapter 4

Applications of Omnidirectional Vision Sensors

Various possible applications of ODVSs are described in [195]. Omnidirectional catadioptric systems have been applied to a range of important problems in robotics, including egomotion estimation (odometry) and simultaneous localization and mapping (SLAM) [84, 27]. A significant subset of these applications, the recovery of dense omnidirectional depth maps for structure reconstruction and occupancy grids (usually employed during path planning and reactive obstacle avoidance), is covered in Section 4.1. Non-robotics applications include panoramic synthetic refocusing (Section 4.3), omnipresent video conferencing, surveillance, and, lately, 360-degree virtual tourism, reviewed in Section 4.4.

4.1 Structure from Motion (a.k.a. SLAM)

4.1.1 Introduction

In robotics, egomotion can be achieved via visual odometry (a term coined by Nister in [139]). Visual odometry (VO) is the problem of using only visual information (from single or multiple cameras) in order to estimate the position of the robot in the world. Monocular VO can be applied to aerial robotics such as in [22] (done off-board and tracking sparse visual cues without a map). The most prominent use of VO in aerial navigation (to our knowledge) was executed recently by Sinha et al. [161]. They also use an offline map computed by the structure from motion (Sfm) proposed by Longuet-Higgins in 1981 [102] (dubbed SLAM in robotics, for Simultaneous Localization and Mapping). They combine 2D image-feature point tracking with 3D point matching by exploiting the spatio-temporal coherence of their map. Their implementation is fast enough for autonomous navigation; however, the robustness of this kind of localization relies heavily on the quality of the 3D map. In fact, it has been very hard to produce accurate maps from Sfm alone.

The unconstrained motion of monocular cameras and their ubiquity on all mobile platforms, including aerial vehicles, requires that the mapping and localization be processed simultaneously on-board. When the map is known in advance, the problem is reduced to pure localization. For example, in a robotic swarm, a leader equipped with powerful sensor(s) can map the environment first, so its peers carrying simple cameras and computers can localize themselves in the map. Visual SLAM achieved by using a single camera is known as MonoSLAM [41]. Various real-time methods for relocalization using a monocular system have been proposed, for example, in 2007 by Williams et al. [183], and similar biologically-inspired systems are given in [117, 118]. Monocular SLAM methods, such as the Parallel Tracking and Mapping (PTAM) developed by Klein et al. [86], are the foundation of real-time SLAM for robotics and mobile devices. More recently, with the emergence of cost-effective RGB-D sensors, Engelhard et al. [49] were able to develop a 3D visual SLAM system in 2011, and DTAM [134] appeared as a GPU-based dense variant of PTAM. With the broad availability of RGB-D sensors like the Microsoft Kinect, faster and more accurate visual mapping techniques have emerged, such as the RGB-D SLAM in [47] and a dense approach in [83]. We devote the following subsections to the related technology pertaining to visual SLAM acquired from omnidirectional vision sensors (ODVSs).

4.1.2 Egomotion from Omnidirectional Visual Odometry

Before we delve into SLAM, we review examples of visual odometry (VO) acquired from ODVS(s). The contribution by Zhu in [205] employs appearance-based panoramic vision instead of tracking hand-crafted features, because the panoramic images (represented as matrices) can be projected onto an eigenspace via a principal component analysis (PCA) transformation. In 1998, Kurata et al. [91] employed a fisheye lens for ground robot navigation. However, the traversed trajectories are quite short, and they fuse a gyroscope with the visual odometry in order to estimate the angular velocity of the robot. In [89], besides determining the location of moving obstacles in the scene, the authors also estimate the robot's egomotion by comparing the range profiles (in a 2D layer) obtained from consecutive dense disparity panoramas. In [36], the egomotion estimation from a binocular omnistereo using panoramic annular lenses for a hopping robot is theorized via epipolar constraints (discussed in Section 3.2.1). The VO executed by Scaramuzza et al. [155] for a car's trajectory of 400 meters uses two methods. Frame-by-frame poses are estimated by a direct application of Triggs' algorithm [176, 175], which extracts the rotation and translation components from the homography decomposition of tracked planes; the planes are extracted from a calibrated hyper-catadioptric ODVS so that perspective views can be produced for the detection of coplanar SIFT features [104] filtered by RANSAC [52]. For robustness, a visual compass (heading orientation) is computed via an appearance-based approach (as Zhu et al. did in [206]) that exploits the rotation-invariance property of omnidirectional images. Although not using a single-camera ODVS, but rather an array of perspective cameras (a PointGrey Ladybug 2), Tardif et al. [172] estimate VO for a vehicle in an urban environment. On the other hand, VO from omnistereo seems more robust, and an approach for real-time egomotion is demonstrated in [138], where tracked feature outliers are filtered with RANSAC. The sensor used is a single-camera compound configuration of seven parabolic mirrors, as explained in Section 3.1.2.1.
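As an illustration of the rotation-invariance idea behind such an appearance-based visual compass (a didactic sketch, not the implementation of [155] or [206]), the heading change between two row-aligned cylindrical panoramas can be estimated by the circular column shift that aligns them best:

import numpy as np

def visual_compass(pan_prev, pan_curr):
    """Estimate the yaw change between two cylindrical panoramas as the column
    shift minimizing the mean squared intensity difference; columns span 0-360 deg."""
    h, w = pan_prev.shape
    a = pan_prev.astype(np.float32)
    b = pan_curr.astype(np.float32)
    best_shift, best_err = 0, np.inf
    for s in range(w):                        # exhaustive circular search
        err = np.mean((a - np.roll(b, s, axis=1)) ** 2)
        if err < best_err:
            best_err, best_shift = err, s
    return best_shift * 360.0 / w             # heading change in degrees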

Bazin et al. in [19] use a para-catadioptric rig for motion estimation. Their algorithm estimates rotation from the assumption that parallel lines in man-made environments appear as conics in the omnidirectional image, and they show a novel way to detect them. The translation is instead calculated with a proposed 2-point algorithm. In addition, they extract dominant vanishing points from the intersections of parallel-line bundles (conics). By a voting scheme among the intersections from a bundle of conics, the dominant vanishing point is chosen and the final direction is computed by SVD. This vanishing point extraction method is further developed in [18] (Figure 4.1c). In addition, they show how the absolute attitude (pitch and roll) can be estimated from the vertical direction inferred by detecting the sky as the brightest component in the image. In fact, the absolute attitude can always be computed afresh without accumulating error history. An alternative method for egomotion estimation and depth-mapping is the use of optical flow, which we review next.

4.1.2.1 Optical Flow from an ODVS

As proved by Nelson and Aloimonos [132], omnidirectional optical flow offers a significant advantage in that it provides an unambiguous recovery of the system's extrinsic parameters given a sufficiently dense optical flow field. This permits a more robust de-rotation of the optical flow field and, thus, a more robust recovery of depth. McCarthy et al. [111] implemented Nelson and Aloimonos' algorithm on a planar-moving robot using fish-eye optics. Egomotion estimation from optical flow is also demonstrated by Conroy et al. in 2009 [37] on top of a quadrotor navigating autonomously along an indoor corridor. While this method offers a nearly hemispherical field-of-view, the depth is only recovered up to a scale factor. In addition, it suffers from loss of depth resolution in the direction of the robot's motion, where it is most valuable. This is inherent to all depth-from-optical-flow approaches.

4.1.3 Localization

Localizing the omnidirectional view on an existing map can take various approaches. For example, in [177], an appearance-based technique like the one employed by Zhu et al. in [206] (for road classification using a conical mirror, described in Section 2.2.3) is implemented via color histogram matching for a mobile robot's topological localization from panoramic images, also acquired via a catadioptric ODVS. This classification occurs in real-time by adapting a nearest-neighbor voting scheme. Murillo et al. [124] improved upon previous appearance-based localization methods by employing a hierarchical matching of SURF [16] keypoints via kernel pyramids. They also incorporate three views for robustness of the outdoor localization. Later, in [178], this kind of feature-based localization is analyzed while taking seasonal changes into account. Another localization example is the work by Sato et al. [153] for estimating the trajectory of the camera. Again, epipolar geometry is employed for tracking points given the wide motion baseline of panoramic images taken from Google Street View.
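A minimal sketch of this kind of appearance-based topological localization (an illustration only, not the exact method of [177]): each map node stores a reference color histogram of its panorama, and a query panorama is assigned to the node whose histogram matches best.

import numpy as np
import cv2

def panorama_histogram(bgr_panorama, bins=(8, 8, 8)):
    """Normalized 3-D color histogram of a panoramic image in HSV space."""
    hsv = cv2.cvtColor(bgr_panorama, cv2.COLOR_BGR2HSV)
    hist = cv2.calcHist([hsv], [0, 1, 2], None, list(bins), [0, 180, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

def localize(query_pano, node_histograms):
    """Return the topological node whose stored histogram best matches the query panorama."""
    q = panorama_histogram(query_pano)
    scores = {node: cv2.compareHist(q, np.float32(h), cv2.HISTCMP_CORREL)
              for node, h in node_histograms.items()}
    return max(scores, key=scores.get)        # highest correlation wins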

4.1.4 Mapping

4.1.4.1 Mapping (Erroneously Thought of as Localization)

In [64], the authors analyze the "localization" performance of a pair of para-catadioptric ODVSs. In their context, "localization" actually refers to the 3D triangulated position of correspondences (range-from-stereo). They study the range error associated with the baseline distance and the physical size of a pixel in the camera. However, only a single ODVS is used; it acts as a stereo pair by capturing views at constant relative displacements on top of a boat traveling along a creek. The existing topographical map of the region is only used offline to measure the results of the map estimated from the acquired VO. Although it may seem that SLAM is being executed, this is not the case because there was no attempt to do re-entry localization into the map or loop-closure of the computed model. Therefore, [64] is only considered a mapping example via a monocular ODVS assisted by stereo triangulation (intersection of corresponding coplanar lines). The authors point out that obstacles in the direction of the trajectory cannot be detected, which is true for the optical flow field of a pure translation, where the focus-of-expansion perceives no flow and indicates the direction of motion over a flat surface.

4.1.4.2 Dense 3D Reconstruction (Mapping)

The example discussed previously tracks sparse features (e.g., Kanade-Lucas-Tomasi (KLT) [160]), and 3D positions are computed based on the omnistereo epipolar geometry. An early attempt to produce dense 3D point clouds from the triangulation of panoramas, when the relative pose between the views is given a priori, was demonstrated in [26]. Panoramic reconstruction is attempted by Fiala and Basu [51] from well-defined rectangular patterns placed around the room. Dense 3D reconstruction of an indoor scene was later demonstrated by Arican and Frossard [10] from disparity maps acquired from two omnidirectional views. This procedure is further described in [11]. Graph cuts are employed as an optimization search of the pixel-wise energy for matching. This approach had also been taken previously by Fleck et al. in [53]. We cover simultaneous mapping and localization from omnidirectional vision in the next sections based on the system applied: monocular versus binocular.

4.1.5 Sfm / 3D SLAM

In [57], Geyer and Daniilidis theorize an algorithm for structure-from-motion (Sfm) using images from an uncalibrated para-catadioptric camera. They represent points and lines in a "circle space" which also contains the image of the absolute conic and a 4×4 catadioptric fundamental matrix. From this point on, various approaches to Euclidean reconstruction have been proposed using the motion of a single ODVS or a number of overlapping views. For instance, in [84], a 2D SLAM algorithm is given for an autonomous mobile robot (in a static environment) based on the horizontal omnistereo rig shown in Figure 3.8. Kim and Chung claim their rig to be robust to the correspondence problem. By abandoning high-innovation (new) measurements during Kalman filtering, the SLAM process can run in real-time. However, the mapping is only experimentally performed in 2D space as a proof-of-concept of their algorithm. An actual 2D SLAM experiment from panoramic vision is demonstrated by Deans [43]. The next surveyed works are successful attempts at the visual SLAM problem in 3D.

(a) Semi-Global Block Matcher (b) Proposed MRF-based planes (c) Vanishing points detection

Figure 4.1: Examples: (a,b) Comparison of depth images and 3D reconstructions of the same scene using the baseline algorithms given in [156]; (c) Detection of parallel lines as conics and dominant vanishing point extraction [18].

4.1.5.1 Sparse SLAM

Various approaches to structure-from-motion / SLAM [172, 149, 65] have employed sparse features from the scene. Lemaire and Lacroix [96] present a solution to the bearing-only SLAM (BO-SLAM) problem using a calibrated para-catadioptric ODVS on top of a rover taking long trajectories. Gutierrez et al. [65] adapt the 1-Point RANSAC technique [34] to achieve EKF SLAM with omnidirectional images whose projection rays are linearized via the unified sphere model [58, 14]. They use the Jacobian matrix computation for the forward- and back-projection solutions of the sphere model (previously shown in [149]) because, recall, the EKF requires a linearized measurement equation. They use FAST feature keypoints [151].

4.1.5.2 Semi-Dense SLAM

The state-of-the-art work is Large-Scale Direct (LSD) SLAM for omnidirectional cameras, released by Caruso et al. [32] in 2015. In their paper, they exploit the semi-dense depth map estimation approach for VO developed in [48]. The main method consists of maintaining a Gaussian probability distribution of the semi-dense inverse distances on each keyframe of the model. Simply put, an inverse distance is defined as d = ||x||^{-1} for a 3D point x with respect to its camera frame. New poses are computed using direct image alignment: the new keyframe is matched against its closest neighbor using the inverse distance map while minimizing a Huber loss function on the photometric residuals. In order to avoid drift, a pose graph optimization of the model's keyframes continuously runs in a background thread. An overview of the LSD-SLAM pipeline and an example reconstruction from a fisheye sequence are given in Figure 4.2.
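As a rough illustration of the direct-alignment objective (not Caruso et al.'s implementation), the sketch below accumulates a Huber-weighted photometric cost between a keyframe and a new image over the semi-dense pixel set; warp is a hypothetical function that reprojects a keyframe pixel with known inverse distance into the new image through the omnidirectional camera model, returning integer (row, col) coordinates or None when the point leaves the image.

import numpy as np

def huber_weight(r, delta):
    """Huber influence weight: quadratic for small residuals, linear beyond delta."""
    a = abs(r)
    return 1.0 if a <= delta else delta / a

def photometric_cost(keyframe, image, inv_dist, pose, model, warp, delta=10.0):
    """Sum of Huber-weighted squared intensity residuals over the keyframe pixels
    that carry a valid inverse-distance estimate (the semi-dense set)."""
    cost = 0.0
    for u, d in inv_dist.items():              # inv_dist: {(row, col): inverse distance}
        v = warp(u, d, pose, model)            # hypothetical omnidirectional reprojection
        if v is None:                          # projected outside the new image
            continue
        r = float(image[v]) - float(keyframe[u])
        cost += huber_weight(r, delta) * r * r
    return cost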

(a) Pipeline (b) Reconstruction from a fisheye images set

Figure 4.2: LSD-SLAM for ODVS [32]

4.1.5.3 Dense SLAM

As mentioned in Section 4.1.4.2, Arican and Frossard [10, 11] attempt a dense 3D reconstruction of indoor environments based on disparity maps acquired from two omnidirectional views, employing graph cuts as an optimization search of the pixel-wise matching energy (an approach previously taken by Fleck et al. in [53]). Pagani and Stricker [141] have also proposed algorithms to cope with the unavoidable estimation errors of visual SLAM from panoramic images of urban environments. Lhuillier [97, 98] proposed a fully automatic method for Sfm using a catadioptric ODVS. He designed bundle adjustment algorithms for both central and non-central models (even for totally uncalibrated cameras later in [99]). This method was generalized in [123] for "sparse" tracking in real-time as well. The same approach was taken in [203] for the modeling of larger sequences. A post-processed model employing 3D Delaunay triangulation of the "sparse" point cloud registered by traversing an urban scene with a hand-held ODVS is shown in Figure 4.5. Another distinguished approach to dense SLAM is given by Schönbein and Geiger [156], but in this case via an omnistereo rig calibrated with the quasi-central method described in Section 2.2.1. Motion is estimated by tracking FAST features [151] that get triangulated as 3D points. In a RANSAC PnP fashion, the 3D points from the previous frame t − 1 are reprojected onto the current frame t, so the relative pose is obtained as a reprojection error minimization problem. The stereo matching algorithm for the model construction employs Markov random fields (MRF) for generating virtual omnidirectional views of hypothesized slanted planes (recall that 3D planes do not get imaged as planes by an ODVS). A comparison of stereo matching methods and the reconstructed point clouds are shown for a mostly-planar scene in Figures 4.1a-4.1b. The authors use a Velodyne 3D laser scanner for precise ground-truth depth measurements. A high-precision GPS/IMU was employed for the ground-truth motion of the vehicle (Figure 3.8b).
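To illustrate the reprojection-error minimization step of such a pipeline (a hedged sketch, not Schönbein and Geiger's implementation), the pose mapping the 3D points triangulated at frame t − 1 into frame t can be refined robustly as follows, where project is a hypothetical calibrated omnidirectional projection function returning pixel coordinates; in practice, this refinement runs on the inlier set selected by a RANSAC loop over minimal pose hypotheses.

import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def reprojection_residuals(pose6, pts3d_prev, obs_curr, project):
    """pose6 = [rx, ry, rz, tx, ty, tz]: axis-angle rotation and translation taking
    points from frame t-1 into frame t; returns the stacked 2-D reprojection errors."""
    R = Rotation.from_rotvec(pose6[:3]).as_matrix()
    t = pose6[3:]
    pts_curr = (R @ pts3d_prev.T).T + t
    proj = np.array([project(p) for p in pts_curr])   # hypothetical omni projection
    return (proj - obs_curr).ravel()

# Robust refinement of an initial pose guess (e.g., from a RANSAC hypothesis):
# result = least_squares(reprojection_residuals, pose0, loss="huber",
#                        args=(pts3d_prev, obs_curr, project))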

Figure 4.3: Wide-angle light field depth and refocusing example via the axial-cones modeling proposed by Taguchi et al. [171] from a single image of an array of spherical mirrors.

4.2 Obstacle Avoidance for Autonomous Robots

Koyasu et al. [89] are capable of recognizing dynamic obstacles from their acquired omnidirectional range data in real-time. The mapped free space is compared temporally, and the velocity vectors of moving candidates are estimated via a Kalman filter. As described in Section 4.1.2, the robot's egomotion is also filtered into the free-space map generation. The work by Su et al. in [167] uses a coaxial binocular hyper-catadioptric rig to obtain 3D obstacle information computed via triangulation, as demonstrated in Section 3.3.3. The application of annular fisheye lenses for avoiding obstacles is demonstrated in [192].

4.3 Wide-Angle Light Fields (Digital Refocusing)

Traditional light field cameras alter the depth of field via prisms or micro-lenses around the image sensor, but they have a limited FOV. The approach taken by Taguchi et al. [171] uses an array of spherical mirrors captured by a single perspective camera, so that corresponding cones of rays can be mapped among the different viewpoints on the rotationally symmetric mirrors (spheres). For perspective light fields, a popular technique is texture mapping for refocusing [74]. Thus, Taguchi's axial-cone modeling employs texture mapping for generating wide-angle light fields.

(a) Omni-image [129] (b) Images of participants in (a) (c) FlyVIZ prototype [9]

Figure 4.4: Example applications of catadioptric omnidirectional rigs: (a,b) Teleconferencing from para-catadioptric ODVS developed by Nayar et al. [129]; (c) Extending human capabilities with panoramic vision acquired via a catadioptric ODVS and HMD gear [9].

4.4 Surveillance, Conferencing, and Virtual Omnipresence

4.4.1 Omnidirectional Surveillance and Monitoring

A camera with a panoramic annular lens (Section 1.2.2.3) was used for surveillance and smoke detection in [120] via motion history algorithms combining optical flow (reviewed in Section 4.1.2.1) and irregularity (entropy) checks. A low-power omnidirectional tracking system (LOTS) is developed in [23] for area video surveillance and monitoring. LOTS employs background and threshold modeling of the scene, but it is only shown to work during the daytime. To solve this issue, human surveillance can be achieved with an omnidirectional thermal camera (Section 4.5). Boult et al. [23] proposed a probabilistic algorithm for detecting false alarms, and they empirically compare the resolution of a Nikon fisheye lens versus a para-catadioptric rig by RemoteReality (oneShot360) with the same camera, where the latter wins and also has a larger FOV. In [135], Ng et al. analyze various N-ocular stereo setups for human-activity tracking. They conclude that a plurality of vantage points is desired in a robust monitoring system of dynamic environments at a low cost. They also innovate with the use of virtual view synthesis via the range-space approach for multiple omnistereo baselines described in their previous work [137] (illustrated in Figure 3.14a). Similarly, Gandhi and Trivedi [55] develop a Panoramic Appearance Map (PAM) that performs person re-identification in a multi-camera setup. A person's location on the floor plan is triangulated using the azimuths and height information from the cameras in which the target is detected. The temporal tracks from each omnistereo pair are estimated via a weighted SSD of their proposed error measurements (skew-ray minimum distances, as illustrated in Figure 3.15b). A person is re-identified by matching against their color-appearance signature (affected by clothing).

4.4.2 Panoramic Teleconferencing

Another popular use of an ODVS is teleconferencing. As opposed to the use of multiple cameras for immersive teleconferencing as demonstrated in [106], Nayar et al. in [129] designed various single-camera compact para-catadioptric sensors for teleconferencing, such as the one shown in Figure 1.6a. They implemented software to let the end-user specify parameters for the generation of perspective images for the participants in the panoramic view (shown in Figures 4.4a and 4.4b).

4.4.3 Omnipresence for Virtual Reality

In the last few years, virtual omnipresence has gained tremendous popularity due to the widespread availability of off-the-shelf head-mounted displays (HMDs) like the Oculus Rift1 and the Google Cardboard2 project, among others. For example, Google Street View, tourist attractions such as view-sphere tours of museums, educational exploration, video games, etc., can take advantage of these wearable displays. Generating 360-degree content is now a trivial task, also due to the availability of affordable PAL and catadioptric omnidirectional cameras. An interesting use of a catadioptric ODVS combined with an HMD is the head-mounted FlyVIZ prototype by Ardouin et al. [9] shown in Figure 4.4c. This rig is meant to capture the 360° panoramic view of the user wearing it so it can be displayed back in an augmented fashion, as demonstrated in the later work by the same authors in [8]. More recently, Lhuillier and Yu [99] demonstrated an immersive 3D reconstruction from an uncalibrated head-mounted catadioptric camera carried along a long walking path in a city (example view shown in Figure 4.5). The gamut of applications of this kind of device for VR, along with other foreseeable uses not discussed here, has great potential.

4.5 Omnidirectional Thermal Vision

Thermal imagers can measure IR radiation during both night and day, so they are better suited for 24-hour image acquisition. In addition, thermal images do not depend on the shortcomings of color intensity contrast; rather, they use the small variations in the heat signatures that objects emit. Due to the high cost of thermal cameras, expanding their field of view is desirable. Since IR energy can be reflected (via IR-reflective metal coatings such as aluminum, gold, silver, etc. [181]), it is possible to expand the field-of-view via conic reflectors (mirrors). For example, Wong et al. [186, 185] designed an omnidirectional thermal camera by reflecting heat signatures off a hyperbolic mirror (chromium coated), and thus expanded the thermal camera's FOV for human surveillance (Figures 4.6a and 4.6b). Another use of infrared omnidirectional vision was the skyline detection for UAVs described by Bazin et al. in [20] via adapted dynamic programming (Figure 4.6c). In fact, there exists an array of uses for omnidirectional thermal infrared panoramic imaging sensors, such as "defense and security applications, including force protection, asset protection, asset control, security including port security, perimeter security, video surveillance, border control, airport security, coastguard operations, search and rescue, intrusion detection, and many others" listed by Gutin et al. in [66].

1http://oculus.com
2https://vr.google.com/cardboard

Figure 4.5: Triangulated model from omnidirectional images captured along a city walk [99]

(a) Chromatic image [186] (b) Thermal Image [186] (c) Skyline extraction [20]

Figure 4.6: Examples of omnidirectional thermal vision

4.6 Concluding Remarks

We have only overviewed a few direct applications of monocular omnidirectional vision sensors (with the exception of a binocular ODVS used for Sfm/SLAM). In fact, performing Sfm/SLAM from omnidirectional vision provides obvious advantages over traditional perspective cameras, including fisheye lenses and, most recently, RGB-Depth sensors. For example, observing points all around the ODVS decreases the probability of losing track of the system's pose, as may happen when heading toward a featureless surface with a narrow-FOV camera. An ODVS can also be employed for applications other than the aforementioned ones, such as assisting visually-impaired individuals with navigation in unfamiliar environments, as proposed in [72]. As in any field, there are myths about catadioptric omnidirectional vision, too. The short course on omnidirectional vision given at ICCV 2003 by C. Geyer, T. Pajdla, and K. Daniilidis debunks a few:

Myth: Catadioptric images are highly distorted.

Truth: Not necessarily. For instance, parabolic mirrors induce no distortion when perpendicular to the viewing direction.

Myth: Omnidirectional cameras are more complicated to use for Sfm/SLAM than with perspec- tive cameras.

Truth: Not always. Parabolic mirrors (in particular) are easy to model, calibrate, and do Sfm/SLAM with.

Fact: Omnidirectional systems have lower resolution.

Tradeoff: ODVSs balance resolution and field-of-view to fit certain application’s needs.

The choice of ODVS configuration and projection model (as well as calibration method) depends solely upon the task at hand. The possibility of combining sensor modalities is a practical approach to compensate for the deficiencies of a particular configuration. For example, an observable region's image resolution can be increased by adding a perspective camera (as done in [30] and [100]) looking into the region-of-interest. Also, an inertial measurement unit (IMU) can be fused with an ODVS to strengthen egomotion estimation when visual odometry is temporarily unavailable due to a lack of features to track or any other unforeseeable event. Trading off the accuracy and high definition provided by a polydioptric system for the unmatched functionality of a single-image ODVS may be the route to take when, for instance, the payload capacity, energy consumption, and computational resources are limited.

Bibliography

[1] Amit Agrawal. Extrinsic Camera Calibration without a Direct View Using Spherical Mirror. In 2013 IEEE International Conference on Computer Vision, pages 2368–2375. Ieee, dec 2013.

[2] Amit Agrawal and Srikumar Ramalingam. Single Image Calibration of Multi-axial Imaging Systems. In 2013 IEEE Conference on Computer Vision and Pattern Recognition, pages 1399–1406. Ieee, jun 2013.

[3] Amit Agrawal, Yuichi Taguchi, and Srikumar Ramalingam. Analytical Forward Projection for Axial Non-Central Dioptric and Catadioptric Cameras. In Proceedings of the 11th European Conference on Computer Vision (ECCV'10): Part III, volume 6313 LNCS, pages 129–143, 2010.

[4] Amit Agrawal, Yuichi Taguchi, and Srikumar Ramalingam. Beyond Alhazen's problem: Analytical projection model for non-central catadioptric cameras with quadric mirrors. In Computer Vision and Pattern Recognition 2011, 2011.

[5] M T Ahmed, E E Hemayed, and A A Farag. Neurocalibration: a neural network that can tell camera calibration parameters. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 1, pages 463–468 vol.1, 1999.

[6] Moumen Ahmed and Aly Farag. A neural approach to zoom-lens camera calibration from data with outliers. Image and Vision Computing, 20(9-10):619–630, 2002.

[7] D.G. Aliaga. Accurate catadioptric calibration for real-time pose estimation in room-size environments. In Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, volume 1, 2001.

[8] J Ardouin, A Lécuyer, M Marchal, and E Marchand. Stereoscopic rendering of virtual environments with wide Field-of-Views up to 360 degrees. In 2014 IEEE Virtual Reality (VR), pages 3–8, mar 2014.

[9] Jérôme Ardouin, Anatole Lécuyer, Maud Marchal, Clément Riant, and Eric Marchand. FlyVIZ: a novel display device to provide humans with 360 vision by coupling catadioptric camera with hmd. In Proceedings of the 18th ACM symposium on Virtual reality software and technology, pages 41–44, 2012.

[10] Zafer Arican and Pascal Frossard. Dense disparity estimation from omnidirectional images. In Advanced Video and Signal Based Surveillance, 2007. AVSS 2007. IEEE Conference on, pages 399–404. Ieee, 2007.

[11] Zafer Arican and Pascal Frossard. Dense Depth Estimation from Omnidirectional Images. 2009.

[12] Simon Baker and Shree K. Nayar. A theory of single-viewpoint catadioptric image formation. International Journal of Computer Vision, 35(2):175–196, 1999.

[13] J P Barreto and K Daniilidis. Epipolar Geometry of Central Projection Systems Using Veronese Maps. In 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR’06), volume 1, pages 1258–1265, jun 2006.

[14] Joao P. Barreto and Helder Araújo. Issues on the Geometry of Central Catadioptric Image Formation. In Conference on Computer Vision and Pattern Recognition (CVPR). Published by the IEEE Computer Society, 2001.

[15] Joao P. Barreto and Helder Araújo. Geometric properties of central catadioptric line images and their application in calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(8):1327–1333, 2005.

[16] H Bay, A Ess, T Tuytelaars, and L Vangool. Speeded-Up Robust Features (SURF). Computer Vision and Image Understanding, 110(3):346–359, jun 2008.

[17] J.C. Bazin. Catadioptric Vision for Robotic Applications. PhD thesis, Korea Advanced Institute of Science and Technology (KAIST), 2011.

[18] J.C. Bazin, C. Demonceaux, P. Vasseur, and I. Kweon. Rotation estimation and vanishing point extraction by omnidirectional vision in urban environment. The International Journal of Robotics Research, 31(1):63–81, jan 2012.

[19] J.C. Bazin, C. Demonceaux, P. Vasseur, and I.S. Kweon. Motion estimation by decoupling rotation and translation in catadioptric vision. Computer Vision and Image Understanding, 114(2):254–273, 2010.

[20] J.C. Bazin, I. Kweon, C. Demonceaux, and P. Vasseur. Dynamic programming and skyline extraction in catadioptric infrared images. 2009 IEEE International Conference on Robotics and Automation, pages 409–416, may 2009.

[21] Ryad Benosman and Sing B. Kang. Panoramic vision : sensors, theory, and applications. Springer, New York, New York, USA, 2001.

[22] Cooper Bills, Joyce Chen, and Ashutosh Saxena. Autonomous MAV Flight in Indoor Environments using Single Image Perspective Cues. In IEEE Int. Conf. on Robotics and Automation, volume 67, 2011. 52

[23] T. E. Boult, X. Gao, R. Micheals, and M. Eckmann. Omni-directional visual surveillance. Image and Vision Computing, 22(7):515–534, 2004. [24] G. Bradski and A. Kaehler. Learning OpenCV: Computer vision with the OpenCV library. O’Reilly Media, 2008. [25] Donald Robert Buchele and William Martin Buchele. Unitary catadioptric objective lens system, 1950. [26] Roland Bunschoten and Ben Kröse. 3-D scene reconstruction from cylindrical panoramic images. Robotics and Autonomous Systems, 41(2-3):111–118, 2002. [27] Christopher Burbridge and Libor Spacek. Omnidirectional vision simulation and robot localisation. In Proceedings of TAROS, 2006. [28] Eduardo E.L.L. Cabral, José C. Junior de Souza, and Marcos C Hunold. Omnidirec- tional stereo vision with a hyperbolic double lobed mirror. In Proceedings of the 17th International Conference on Pattern Recognition (ICPR), pages 0–3. Ieee, 2004. [29] Vincenzo Caglioti, Pierluigi Taddei, Giacomo Boracchi, Simone Gasparini, and Alessan- dro Giusti. Single-image calibration of off-axis catadioptric cameras using lines. Proceed- ings of the IEEE International Conference on Computer Vision, 2007. [30] Stefano Cagnoni, Monica Mordonini, Luca Mussi, and Giovanni Adorni. Hybrid Stereo Sensor with Omnidirectional Vision Capabilities: Overview and Calibration Procedures. In Image Analysis and Processing, 2007. [31] Guillaume Caron, El Mustapha Mouaddib, and Eric Marchand. 3D model based tracking for omnidirectional vision: A new spherical approach. Robotics and Autonomous Systems, 60(8):1056–1068, 2012. [32] David Caruso, Jakob Engel, and Daniel Cremers. Large-Scale Direct SLAM for Omnidi- rectional Cameras. In International Conference on Intelligent Robots and Systems (IROS), 2015. [33] Nai-Yung Chen, J Birk, and R Kelley. Estimating workpiece pose using the feature points method. IEEE Transactions on Automatic Control, 25(6):1027–1041, dec 1980. [34] Javier Civera, Oscar G Grasa, Andrew J Davison, and J M M Montiel. 1-Point RANSAC for EKF Filtering. Application to Real-Time Structure from Motion and Visual Odometry. Journal of Field Robotics, 27(5):609–631, 2010. [35] James S. Conant. Apparatus and method for taking and projecting pictures, 1942. [36] M Confente, P Fiorini, and G Bianco. Stereo omnidirectional vision for a hopping robot. Robotics and Automation, 2003. Proceedings. ICRA ’03. IEEE International Conference on, 3:3467–3472 vol.3, 2003. 53

[37] Joseph Conroy, Gregory Gremillion, Badri Ranganathan, and J. Sean Humbert. Imple- mentation of wide-field integration of optic flow for autonomous quadrotor navigation. Autonomous Robots, 27(3):189–198, aug 2009.

[38] Tanya L Conroy and John B Moore. Resolution Invariant Surfaces for Panoramic Vision Systems. Proceedings of the 7th IEEE International Conference on Computer Vision (ICCV’99), 1:392–397, 1999.

[39] Fabiano Rogério Corrêa, V.C. Guizilini, and J.O. Junior. Omnidirectional Stereovision System With Two-Lobe Hyperbolic Mirror for Robot Navigation. ABCM Symposium Series in Mechatronics, 2(1996):653–660, 2006.

[40] Ankur Datta, Jun-Sik Kim, and Takeo Kanade. Accurate camera calibration using iterative refinement of control points. Computer Vision Workshops (ICCV Workshops), 2009 IEEE 12th International Conference on, pages 1201–1208, 2009.

[41] Andrew J Davison, Ian D Reid, Nicholas D Molton, and Olivier Stasse. MonoSLAM: real-time single camera SLAM. IEEE Transactions on Pattern Analysis and Machine Intelligence, 29(6):1052–1067, jun 2007.

[42] Joseph De Falco. Observation Apparatus, 1915.

[43] Matthew Charles Deans. Bearings-Only Localization and Mapping. PhD thesis, Carnegie Mellon University, 2005.

[44] Xiao-Ming Deng, Fu-Chao WU, and Yi-Hong WU. An Easy Calibration Method for Central Catadioptric Cameras. Acta Automatica Sinica, 33(8):801–808, 2007.

[45] S Derrien and K Konolige. Approximating a single viewpoint in panoramic imaging devices. Omnidirectional Vision, 2000. Proceedings. IEEE Workshop on, (April):85–90, 2000.

[46] Renè Descartes, David Eugene Smith, and Marcia L Latham. The geometry of Renè Descartes. Dover Publications, [New York, 1637.

[47] Ivan Dryanovski, Roberto G Valenti, and Jizhong Xiao. Fast Visual Odometry and Mapping from RGB-D Data. In International Conference on Robotics and Automation, volume 10031, 2013.

[48] Jakob Engel, Jurgen Sturm, and Daniel Cremers. Semi-Dense Visual Odometry for a Monocular Camera. In IEEE International Conference on Computer Vision (ICCV), Sydney, Australia, dec 2013.

[49] Nikolas Engelhard, Felix Endres, and J Hess. Real-time 3D visual SLAM with a hand-held RGB-D camera. In Proc. of the RGB-D Workshop on 3D Perception in Robotics at the European Robotics Forum, number c, Vasteras, Sweden, 2011. 54

[50] Ferran Espuny and José I Burgos Gil. Generic Self-calibration of Central Cameras from Two Rotational Flows. International Journal of Computer Vision, 91(2):131–145, 2011.

[51] Mark Fiala and Anup Basu. Panoramic stereo reconstruction using non-SVP optics. Object recognition supported by user interaction for service robots, 4(3):27–30, jun 2005.

[52] Martin A. Fischler and Robert C. Bolles. Random sample consensus: A paradigm for model fitting with applications to image analysis and automated cartography. Communi- cations of the ACM, 24(6), 1981.

[53] S Fleck, F Busch, P Biber, W Strasser, and H Andreasson. Omnidirectional 3D Modeling on a Mobile Robot using Graph Cuts. In Proceedings of the 2005 IEEE International Conference on Robotics and Automation, pages 1748–1754, apr 2005.

[54] Oliver Frank, Roman Katz, C L Tisse, and H Durrant-Whyte. Camera calibration for miniature, low-cost, wide-angle imaging systems. In British Machine Vision Conference. Citeseer, 2007.

[55] Tarak Gandhi and Mohan Manubhai Trivedi. Person tracking and reidentification: Intro- ducing Panoramic Appearance Map (PAM) for feature representation. Machine Vision and Applications, 18(3-4):207–220, 2007.

[56] Simone Gasparini, Peter Sturm, and Joao P. Barreto. Plane-based calibration of central catadioptric cameras. In 2009 IEEE 12th International Conference on Computer Vision, pages 1195–1202. Ieee, sep 2009.

[57] Christopher Geyer and K Daniilidis. Structure and motion from uncalibrated catadioptric views. In Computer Vision and Pattern Recognition, 2001. CVPR 2001. Proceedings of the 2001 IEEE Computer Society Conference on, volume 1, pages I–279–I–286 vol.1, 2001.

[58] Christopher Geyer and Kostas Daniilidis. A unifying theory for central panoramic systems and practical implications. European Conference on Computer Vision (ECCV), pages 445–461, 2000.

[59] Christopher Geyer and Kostas Daniilidis. Conformal Rectification of Omnidirectional Stereo Pairs. 2003 Conference on Computer Vision and Pattern Recognition Workshop, pages 73–73, jun 2003.

[60] Joshua M Gluckman, Shree K. Nayar, and Keith J. Thoresz. Real-Time Omnidirectional and Panoramic Stereo. Computer vision and image understanding, pages 299–303, 1998.

[61] Nuno Gonçalves. Noncentral catadioptric systems with quadric mirrors: geometry and calibration. PhD thesis, UNIVERSIDADE DE COIMBRA, 2008. 55

[62] Nuno Gonçalves and Helder Araújo. Projection model, 3D reconstruction and rigid motion estimation from non-central catadioptric images. In 3D Data Processing, Visualization and Transmission, 2004.

[63] Nuno Gonçalves and Ana Catarina Nogueira. Projection through quadric mirrors made faster. 2009 IEEE 12th International Conference on Computer Vision Workshops, ICCV Workshops, pages 2141–2148, sep 2009.

[64] Xiaojin Gong, Anbumani Subramanian, Christopher L. Wyatt, and Daniel J. Stilwell. Per- formance analysis and validation of a paracatadioptric omnistereo system. In Proceedings of the IEEE International Conference on Computer Vision, 2007.

[65] Daniel Gutierrez, Alejandro Rituerto, J M M Montiel, and J J Guerrero. Adapting a Real-Time Monocular Visual SLAM from Conventional to Omnidirectional Cameras. In ICCV Workshops, 2011.

[66] Mikhail Gutin, Eddy K. Tsui, Olga Gutin, Xu-Ming Wang, and Alexey Gutin. Thermal infrared panoramic imaging sensor. Proceedings of SPIE, 6206:62062E–62062E–10, 2006.

[67] Lei He, Chuanjiang Luo, Feng Zhu, and Yingming Hao. Stereo Matching and 3D Reconstruction via an Omnidirectional Stereo Sensor. In Motion Planning, number 60575024, pages 123–142. In-Tech Education and Publishing, Vienna, Austria, 2008.

[68] Thomas J Herbert. Calibration of fisheye lenses by inversion of area projections. Applied optics, 25(12):1875–1876, jun 1986.

[69] R A Hicks and R Bajcsy. Catadioptric sensors that approximate wide-angle perspective projections. In Computer Vision and Pattern Recognition, 2000. Proceedings. IEEE Conference on, volume 1, pages 545–551 vol.1, 2000.

[70] Heiko Hirschmüller. Stereo processing by semiglobal matching and mutual information. IEEE transactions on pattern analysis and machine intelligence, 30(2):328–41, feb 2008.

[71] Jiawei Hong, Xiaonan Tan, Brian Pinette, Richard Weiss, and Edward M. Riseman. Image-Based Homing. IEEE Control Systems, 12(1):38–45, 1992.

[72] Feng Hu, Zhigang Zhu, and Jianting Zhang. Mobile panoramic vision for assisting the blind via indexing and localization. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 8927:600–614, 2015.

[73] Zhi Huang, Jian Bai, Tian Xiong Lu, and Xi Yun Hou. Stray light analysis and suppression of panoramic annular lens. Optics express, 21(9):10810–10820, 2013. 56

[74] Aaron Isaksen, Leonard McMillan, and Steven J Gortler. Dynamically reparameterized light fields. In Siggraph ’00, pages 297–306, 2000.

[75] Hiroshi Ishiguro. Development of Low-Cost Compact Omnidirectional Vision Sensors and their applications. In Proc. Int. Conf. Information systems, analysis . . . , 1998.

[76] Gijeong Jang, Sungho Kim, and Inso Kweon. Single camera catadioptric stereo system. In Proceedings of Workshop on Omnidirectional Vision, Camera Networks and Nonclassical cameras (OMNIVIS2005). Citeseer, 2005.

[77] Carlos Jaramillo, Roberto G Valenti, Ling Guo, and Jizhong Xiao. Design and Analysis of a Single-Camera Omnistereo Sensor for Quadrotor Micro Aerial Vehicles (MAVs). Sensors, 16(2):217, jan 2016.

[78] S W Jeng and W H Tsai. Using pano-mapping tables for unwarping of omni-images into panoramic and perspective-view images. IET Image Processing, 1(2):149–155, jun 2007.

[79] Sing Bing Kang. Catadioptric self-calibration. In Computer Vision and Pattern Recogni- tion, 2000. Proceedings. IEEE Conference on, volume 1, pages 201–207 vol.1, 2000.

[80] Juho Kannala and Sami Brandt. A generic camera calibration method for fish-eye lenses. In Proceedings - International Conference on Pattern Recognition, volume 1, pages 10–13, 2004.

[81] James C. Karnes. Omniscope, 1931.

[82] Jonathan Kelly and Gaurav S Sukhatme. Self-Calibration of Inertial and Omnidirectional Visual Sensors for Navigation and Mapping. In Proceedings of the 2nd. Workshop on Omnidirectional Robot Vision A workshop of the 2010 IEEE International Conference on Robotics and Automation (ICRA2010), pages 1–6, Anchorage, 2010.

[83] Christian Kerl, Jurgen Sturm, and Daniel Cremers. Dense visual SLAM for RGB-D cameras. IEEE International Conference on Intelligent Robots and Systems, pages 2100– 2106, 2013.

[84] Jae-Hean Kim and Myung Jin Chung. SLAM with omni-directional stereo vision sensor. Proceedings 2003 IEEERSJ International Conference on Intelligent Robots and Systems IROS 2003 Cat No03CH37453, 1(October):442–447, 2003.

[85] Rudolf Kingslake. A History of the Photographic Lens. 1989.

[86] Georg Klein and David Murray. Parallel Tracking and Mapping for Small AR Workspaces. In 2007 6th IEEE and ACM International Symposium on Mixed and , pages 1–10, Nara, Japan, nov 2007. Ieee.

[87] L H Kleinschmidt. Apparatus for producing topographic views., 1911. 57

[88] Yuichiro Kojima, Ryusuke Sagawa, Tomio Echigo, and Yasushi Yagi. Calibration and Performance Evaluation of Omnidirectional Sensor with Compound Spherical Mirrors. In Workshop on Omnidirectional Vision, Camera Networks and Non-classical Cameras (OMNIVIS’05), 2005.

[89] Hiroshi Koyasu, Jun Miura, and Yoshiaki Shirai. Recognizing moving obstacles for robot navigation using real-time omnidirectional stereo vision real-time omnidirectional stereo. Journal of Robotics and Mechatronics, 14(2):147–156, 2002.

[90] G Krishnan and Shree K. Nayar. Cata-Fisheye Camera for Panoramic Imaging. 2008 IEEE Workshop on Applications of Computer Vision, pages 1–8, 2008.

[91] J Kurata, K T V Grattan, and H Uchiyama. Navigation system for a mobile robot with a visual sensor using a fish-eyelens. Review of Scientific Instruments, 69(2):585–590, 1998.

[92] Sujit Kuthirummal and Shree K. Nayar. Multiview radial catadioptric imaging for scene capture. ACM Transactions on Graphics, 25(3):916, 2006.

[93] Igor Labutov, Carlos Jaramillo, and Jizhong Xiao. Generating near-spherical range panoramas by fusing optical flow and stereo from a single-camera folded catadioptric rig. Machine Vision and Applications, 24(1):1–12, sep 2011.

[94] D Lanman, D Crispell, M Wachs, and G Taubin. Spherical Catadioptric Arrays: Construc- tion, Multi-View Geometry, and Calibration. In 3D Data Processing, Visualization, and Transmission, Third International Symposium on, pages 81–88, jun 2006.

[95] Jie Lei, Xin Du, Yun-fang Zhu, and Ji-lin Liu. Unwrapping and stereo rectification for omnidirectional images. Journal of Zhejiang University SCIENCE A, 10(8):1125–1139, 2009.

[96] Thomas Lemaire and Simon Lacroix. SLAM with Panoramic Vision. Journal of Field Robotics, 24(1-2):91–111, jan 2007.

[97] Maxime Lhuillier. Toward Flexible 3D Modeling using a Catadioptric Camera. IEEE Conference on Computer Vision and Pattern Recognition, pages 1–8, 2007.

[98] Maxime Lhuillier. Automatic scene structure and camera motion using a catadioptric system. Computer Vision and Image Understanding, 109(2):186–203, feb 2008.

[99] Maxime Lhuillier and Shuda Yu. Manifold surface reconstruction of an environment from sparse Structure-from-Motion data. Computer Vision and Image Understanding, 117(11):1628–1644, 2013.

[100] Huei-Yung Lin. Robot vision with hybrid omnidirectional and perspective imaging. Transactions on Mechatronics, 2012. 58

[101] Shih-Schon Lin and R Bajcsy. Single-view-point omnidirectional catadioptric cone mirror imager. IEEE Transactions on Pattern Analysis and Machine Intelligence, 28(5):840–845, may 2006.

[102] H.C. Longuet-Higgins. A computer algorithm for reconstructing a scene from two projections. Readings in Computer Vision: Issues, Problem, Principles, and Paradigms, 293(10):133–135, 1981.

[103] C Lopez-Franco and E Bayro-Corrochano. Unified model for omnidirectional vision using the conformal geometric algebra framework. In Pattern Recognition, 2004. ICPR 2004. Proceedings of the 17th International Conference on, volume 4, pages 48–51 Vol.4, aug 2004.

[104] David G. Lowe. Distinctive image features from scale-invariant keypoints. International Journal of Computer Vision, 60(2):91–110, 2004.

[105] Chuanjiang Luo, Liancheng Su, and Feng Zhu. A Novel Omnidirectional Stereo Vision System via a Single Camera. In Rustam Stolkin, editor, Scene Reconstruction Pose Estimation and Tracking, number June, chapter 2, pages 19–38. In-Tech Education and Publishing, Vienna, Austria, 2007.

[106] Aditi Majumder, W Brent Seales, M Gopi, and Henry Fuchs. Immersive teleconferencing: a new algorithm to generate seamless panoramic video imagery. In Proceedings of the seventh ACM international conference on Multimedia (Part 1), pages 169–178, 1999.

[107] H A Martins, J R Birk, and R B Kelley. Camera models based on data from two calibration planes. Computer Graphics and Image Processing, 17(2):173–180, 1981.

[108] V N Martynov, T I Jakushenkova, and M V Urusova. New constructions of panoramic annular lenses: design principle and output characteristics analysis, 2008.

[109] Jonathan Masci, Davide Migliore, Michael M Bronstein, and Jürgen Schmidhuber. De- scriptor learning for omnidirectional image matching. In Roberto Cipolla, Sebastiano Battiato, and Maria Giovanni Farinella, editors, Registration and Recognition in Im- ages and Videos, chapter Matching, pages 49–62. Springer Berlin Heidelberg, Berlin, Heidelberg, 2014.

[110] Tomohiro Mashita, Yoshio Iwai, and Masahiko Yachida. Calibration Method for Mis- aligned Catadioptric Camera. IEICE Transactions on Information and Systems, E89-D(7), 2006.

[111] C McCarthy, N Barnes, and M Srinivasan. Real Time Biologically-Inspired Depth Maps from Spherical Flow. Robotics and Automation, 2007 IEEE International Conference on, pages 4887–4892, 2007. 59

[112] Christopher Mei and Patrick Rives. Single View Point Omnidirectional Camera Calibration from Planar Grids. In IEEE International Conference on Robotics and Automation (ICRA), number April, pages 3945–3950. Ieee, apr 2007.

[113] Branislav Micusik. Two-view geometry of omnidirectional cameras. PhD thesis, 2004.

[114] Branislav Micusik and Tomas Pajdla. Estimation of omnidirectional camera model from epipolar geometry. Computer Vision and Pattern Recognition, 1:I–485, 2003.

[115] Branislav Micusik and Tomas Pajdla. Autocalibration and 3D reconstruction with non- central catadioptric cameras. In Computer Vision and Pattern Recognition, 2004. CVPR 2004. Proceedings of the 2004 IEEE Computer Society Conference on, volume 1, pages I–58–I–65 Vol.1, jun 2004.

[116] Branislav Micusik and Tomas Pajdla. Structure from motion with wide circular field of view cameras. In Pattern Analysis and Machine Intelligence, IEEE Transactions on, volume 28, pages 1135–1149. IEEE, jul 2006.

[117] Michael J. Milford and Gordon F. Wyeth. Single camera vision-only SLAM on a suburban road network. 2008 IEEE International Conference on Robotics and Automation, pages 3684–3689, may 2008.

[118] Michael J. Milford and Gordon F GF Gordon F Wyeth. Mapping a Suburb With a Single Camera Using a Biologically Inspired SLAM System. Robotics, IEEE Transactions on, 24(5):1038–1053, 2008.

[119] Kenro Miyamoto. Fish Eye Lens. J. Opt. Soc. Am., 54(8):1060–1061, aug 1964.

[120] Y. Morimoto, Y. Kondo, H. Kataoka, et al. Application of panoramic annular lens for motion analysis tasks: surveillance and smoke detection. In Pattern Recognition, 2000. Proceedings. 15th International Conference on, volume 4, pages 714–717 vol.4, 2000.

[121] El Mustapha Mouaddib and Ryusuke Sagawa. Stereovision with a single camera and multiple mirrors. In International Conference on Robotics and Automation, number April, pages 800–805, 2005.

[122] El Mustapha Mouaddib, Ryusuke Sagawa, Tomio Echigo, and Yasushi Yagi. Two or more mirrors for the omnidirectional stereovision? In The European Association for Signal Processing (EURASIP), number 1, pages 1–4, 2006.

[123] E. Mouragnon, Maxime Lhuillier, M. Dhome, F. Dekeyser, and P. Sayd. Generic and real- time structure from motion using local bundle adjustment. Image and Vision Computing, 27(8):1178–1193, 2009. 60

[124] A C Murillo, J J Guerrero, and C Sagues. SURF features for efficient robot localization with omnidirectional images. In Proceedings 2007 IEEE International Conference on Robotics and Automation, pages 3901–3907, apr 2007.

[125] Hajime Nagahara, Yasushi Yagi, and Masahiko Yachida. Super Wide Field of View Head Mounted Display Using Catadioptrical Optics. Presence: Teleoperators and Virtual Environments, 15(5):588–598, 2006.

[126] Vic Nalwa. A true omnidirectional viewer. Technical report, technical report, Bell Laboratories, 1996.

[127] Shree K. Nayar. Sphereo: Determining depth using two specular spheres and a single camera. In Proceedings of SPIE Conference on Optics, Illumination, and Image Sensing for Machine Vision III, pages 245–254. Citeseer, 1988.

[128] Shree K. Nayar. Catadioptric Omnidirectional Camera. In Computer Vision and Pattern Recognition, pages 482–488. IEEE Comput. Soc, 1997.

[129] Shree K. Nayar. Omnidirectional Vision. Robotics Research, pages 195–202, 1998.

[130] Shree K. Nayar and Simon Baker. Catadioptric Image Formation. In Proceedings of the 1997 DARPA Image Understanding Workshop, pages 1431–1437, 1997.

[131] Shree K. Nayar and Venkata Peri. Folded catadioptric cameras. Proceedings. 1999 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (Cat. No PR00149), pages 217–223, 1999.

[132] R C Nelson and J Aloimonos. Finding motion parameters from spherical motion fields (or the advantages of having eyes in the back of your head). Biological Cybernetics, 58(4):261–273, 1988.

[133] Sameer A Nene and Shree K. Nayar. Stereo with Mirrors. ICCV’98 Proceedings of the Sixth International Conference on Computer Vision, page 1087, 1998.

[134] R A Newcombe, S J Lovegrove, and A J Davison. DTAM: Dense tracking and mapping in real-time. In 2011 International Conference on Computer Vision, pages 2320–2327, nov 2011.

[135] K C Ng, H Ishiguro, M M Trivedi, and T Sogo. An integrated surveillance system: human tracking and view synthesis using multiple omni-directional vision sensors. Image and Vision Computing, 22(7):551–561, 2004.

[136] K C Ng, M Trivedi, and H Ishiguro. Range-space approach for generalized multiple baseline stereo and direct virtual view synthesis. In Stereo and Multi-Baseline Vision, 2001. (SMBV 2001). Proceedings. IEEE Workshop on, pages 62–71, 2001. 61

[137] Kim C. Ng, Mohan Trivedi, and Hiroshi Ishiguro. Generalized multiple baseline stereo and direct virtual view synthesis using range-space search, match, and render. International Journal of Computer Vision, 47(1-3):131–147, 2002.

[138] Thanh Trung Ngo, Hajime Nagahara, Ryusuke Sagawa, et al. Robust and Real-Time Egomotion Estimation Using a Compound Omnidirectional Sensor. In IEEE International Conference on Robotics and Automation, number 1, pages 492–497, 2008.

[139] David Nistér. An efficient solution to the five-point relative pose problem. IEEE transac- tions on pattern analysis and machine intelligence, 26(6):756–77, jun 2004.

[140] Mark Ollis, Herman Herman, and Sanjiv Singh. Analysis and design of panoramic stereo vision using equi-angular pixel cameras. Technical Report January, 1999.

[141] Alain Pagani and Didier Stricker. Structure from Motion using full spherical panoramic cameras. In 2011 IEEE International Conference on Computer Vision Workshops (ICCV Workshops), pages 375–382, 2011.

[142] Roberto Parodi. Optical Device for Horizontal Panoramic and Zenithal View, 1925.

[143] Christian Perwass and Gerald Sommer. The Inversion Camera Model. Pattern Recognition, pages 647–656, 2006.

[144] Christian Plagemann, Cyrill Stachniss, Jürgen Jurgen Hess, Felix Endres, and Nathan Franklin. A nonparametric learning approach to range sensing from omnidirectional vision. Robotics and Autonomous Systems (Special Issue on Omnidirectional Robot Vision), 58(6):762–772, jun 2010.

[145] Luis Puig, Yalin Bastanlar, Peter Sturm, José J Guerrero, and Joao P. Barreto. Calibration of central catadioptric cameras using a DLT-like approach. International Journal of Computer Vision, 93(1):101–114, 2011.

[146] Luis Puig, J. Bermúdez, Peter Sturm, and J.J. Guerrero. Calibration of omnidirectional cameras in practice: A comparison of methods. Computer Vision and Image Understand- ing, 116(1):120–137, sep 2011.

[147] Srikumar Ramalingam, Peter Sturm, and Suresh K. Lodha. Generic self-calibration of central cameras. Computer Vision and Image Understanding, 114(2):210–219, 2010.

[148] Donald W. Rees. Panoramic television viewing system, 1970.

[149] A Rituerto, L Puig, and J J Guerrero. Visual SLAM with an Omnidirectional Camera. In Pattern Recognition (ICPR), 2010 20th International Conference on, pages 348–351, aug 2010. 62

[150] A. Rizzi, R. Cassinis, and N. Serana. Neural networks for autonomous path-following with an omnidirectional image sensor. Neural Computing and Applications, 11(1):45–52, 2002.

[151] Edward Rosten, Reid Porter, and Tom Drummond. Faster and better: a machine learn- ing approach to corner detection. IEEE transactions on pattern analysis and machine intelligence, 32(1):105–119, jan 2010.

[152] Ryusuke Sagawa, Naoki Kurita, Tomio Echigo, and Yasushi Yagi. Compound Catadioptric Stereo Sensor for Omnidirectional Object Detection. Camera, pages 362–367, 2004.

[153] Tomokazu Sato, Tomáš Pajdla, and Naokazu Yokoya. Epipolar geometry estimation for wide-baseline omnidirectional street view images. In Computer Vision Workshops (ICCV Workshops), 2011 IEEE International Conference on, number 1, pages 56–63, nov 2011.

[154] Davide Scaramuzza, Agostino Martinelli, and Roland Siegwart. A Toolbox for Easily Calibrating Omnidirectional Cameras. In 2006 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), pages 5695–5701. IEEE, oct 2006.

[155] Davide Scaramuzza and Roland Siegwart. Monocular omnidirectional visual odometry for outdoor ground vehicles. In Proceedings of the 6th international conference on Computer vision systems, pages 206–215. Springer-Verlag, 2008.

[156] Miriam Schönbein and Andreas Geiger. Omnidirectional 3D Reconstruction in Augmented Manhattan Worlds. 2014 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS 2014), 2014.

[157] Miriam Schönbein, Bernd Kitt, and Martin Lauer. Environmental Perception for Intel- ligent Vehicles Using Catadioptric Stereo Vision Systems. In In Proc. of the European Conference on Mobile Robots (ECMR), pages 1–6, 2011.

[158] Miriam Schönbein, Tobias Strauss, and Andreas Geiger. Calibrating and Centering Quasi- Central Catadioptric Cameras. In International Conference on Robotics and Automation (ICRA), 2014.

[159] Abd El Rahman Shabayek. Non-Central Catadioptric Sensors Auto-Calibration. PhD thesis, Universite de Bourgogne, 2009.

[160] Jianbo Shi and Carlo Tomasi. Good Features to Track. In Conference on Computer Vision and Pattern Recognition (CVPR), pages 593–600, 1994.

[161] S. N. Sinha, M. F. Cohen, and M. Uyttendaele. Real-time image-based 6-DOF localization in large-scale environments. 2012 IEEE Conference on Computer Vision and Pattern Recognition, pages 1043–1050, jun 2012. 63

[162] Dan Slater. Panoramic photography with fish eye lenses. Panorama, International Association of Panoramic Photographers, 13, 1996.

[163] T Sogo, Hiroshi Ishiguro, and M M Trivedi. Real-time target localization and tracking by N-ocular stereo. In Omnidirectional Vision, 2000. Proceedings. IEEE Workshop on, pages 153–160, 2000.

[164] Libor Spacek. Coaxial Omnidirectional Stereopsis. In Tomás Pajdla and J. Matas, editors, Computer Vision-ECCV 2004, chapter Coaxial Om, pages 354–365. Springer Berlin Heidelberg, Berlin, Heidelberg, 2004.

[165] Peter Sturm and Srikumar Ramalingam. A Generic Concept for Camera Calibration. Computer VisionECCV 2004, 2:1–13, 2004.

[166] Peter Sturm, Srikumar Ramalingam, Jean-Philippe Tardif, Simone Gasparini, and Joao P. Barreto. Camera models and fundamental concepts used in geometric computer vision. Foundations and Trends® in Computer Graphics and Vision, 6(1-2):1–183, 2010.

[167] Liancheng Su, Chuanjiang Luo, and Feng Zhu. Obtaining Obstacle Information by an Omnidirectional Stereo Vision System. 2006 IEEE International Conference on Information Acquisition, pages 48–52, 2006.

[168] Tomáš Svoboda and Tomáš Pajdla. Epipolar geometry for central catadioptric cameras. International Journal of Computer Vision, 49(1):23–37, 2002.

[169] Tomáš Svoboda, Tomáš Pajdla, and Vaclav Hlavác. Epipolar geometry for panoramic cameras. In Computer Vision - ECCV’98, 1998.

[170] Rahul Swaminathan, Michael D. Grossberg, and Shree K. Nayar. Caustics of catadioptric cameras. Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001, 2:2–9, 2001.

[171] Yuichi Taguchi, Amit Agrawal, Ashok Veeraraghavan, Srikumar Ramalingam, and Ramesh Raskar. Axial-cones: modeling spherical catadioptric cameras for wide-angle light field rendering. In ACM Transactions on Graphics (Proceedings of SIGGRAPH Asia 2010), volume 29, pages 172:1–172:8, 2010.

[172] Jean-Philippe Tardif, Y. Pavlidis, and Kostas Daniilidis. Monocular visual odometry in urban environments using an omnidirectional camera. In 2008 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 22–26, 2008.

[173] Jean Philippe Tardif, Peter Sturm, and Sébastien Roy. Self-calibration of a general radially symmetric distortion model. Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 3954 LNCS:186–199, 2006. 64

[174] Antti Tolvanen, Christian Perwass, and Gerald Sommer. Projective Model for Cen- tral Catadioptric Cameras Using Clifford Algebra, chapter Projective, pages 192–199. Springer Berlin Heidelberg, Berlin, Heidelberg, 2005.

[175] B Triggs. Camera pose and calibration from 4 or 5 known 3D points. In Computer Vision, 1999. The Proceedings of the Seventh IEEE International Conference on, volume 1, pages 278–284 vol.1, 1999.

[176] Bill Triggs, Hans Burkhardt, and Bernd Neumann. Autocalibration from Planar Scenes. In Proceedings Fifth European Conference Computer Vision, volume 1406, pages 89–105, 1998.

[177] I Ulrich and I Nourbakhsh. Appearance-based place recognition for topological local- ization. In Robotics and Automation, 2000. Proceedings. ICRA ’00. IEEE International Conference on, volume 2, pages 1023–1029 vol.2, 2000.

[178] Christoffer Valgren and Achim J. Lilienthal. SIFT, SURF and seasons: Appearance-based long-term localization in outdoor environments. Robotics and Autonomous Systems, 58(2):149–156, 2010.

[179] Maarten Vanvolsem. The art of strip photography. Making still images with a moving camera. Leuven UP, Leuven, 2010.

[180] P. Vasseur and El Mustapha Mouaddib. Central Catadioptric Line Detection. Procedings of the British Machine Vision Conference 2004, pages 8.1–8.10, 2004.

[181] L V. Wake and R.F. Brady. Formulating Infrared Coatings for Defence Applications. Technical report, DSTO Materials Research Laboratory (MRL), Victoria, Australia, 1993.

[182] Qing Wang, Yi Ping Tang, Ming Li Zong, Jun Jiang, and Yi Hua Zhu. Design of vertically aligned binocular omnistereo vision sensor. Eurasip Journal on Image and Video Processing, 2010, 2010.

[183] Brian Williams, Georg Klein, and Ian Reid. Real-Time SLAM Relocalisation. In Computer Vision, 2007. ICCV 2007. IEEE 11th International Conference on, pages 1–8, Rio de Janeiro, 2007. Ieee.

[184] Alexander S. Wolcott. Method Of Taking Likenesses By Means Of A Concave Reflector And Plates So Prepared As That Luminous Or Other Rays Will Act Thereon, 1840.

[185] W K Wong, C K Loo, and W S Lim. Optical Approach Based Omnidirectional Thermal Visualization. International Journal of Image Processing (IJIP), 4(4):263, 2010.

[186] Wai Kit Wong, Poi Ngee Tan, Chu Kiong Loo, and Way Soong Lim. Omnidirectional Surveillance System Using Thermal Camera. Journal of Computer Science and Engineer- ing, 3(2):42–51, 2010. 65

[187] Dong-Min D M Woo and D C Dong-Chul Park. Implicit Camera Calibration Based on a Nonlinear Modeling Function of an Artificial Neural Network. In Wen Yu, Haibo He, and Nian Zhang, editors, Advances in Neural Networks - ISNN 2009, volume 5551 of Lecture Notes in Computer Science, pages 967–975. Springer Berlin Heidelberg, 2009. [188] R W Wood. Fish-eye views, and vision under water. Philosophical Magazine, 12(6):159– 162, 1906. [189] Fuchao Wu, Fuqing Duan, Zhanyi Hu, and Yihong Wu. A new linear algorithm for calibrating central catadioptric cameras. Pattern Recognition, 41(10):3166–3172, 2008. [190] Yihong Wu and Zhanyi Hu. Geometric invariants and applications under catadioptric camera model. Tenth IEEE International Conference on Computer Vision (ICCV’05) Volume 1, pages 1547—-1554 Vol. 2, 2005. [191] Zhiyu Xiang, Xing Dai, and Xiaojin Gong. Noncentral catadioptric camera calibration using a generalized unified model. Optics letters, 38(9):1367–1369, 2013. [192] Xiao Xiao, Guo Guang Yang, and Jian Bai. Capture of vehicle surroundings using a pair of panoramic annular lens cameras. In ITST 2007 - 7th International Conference on Intelligent Transport Systems Telecommunications, Proceedings, pages 321–325, 2007. [193] Zhihui Xiong, Wang Chen, and Maojun Zhang. Catadioptric Omni-directional Stereo Vision and Its Applications in Moving Objects Detection. In In Tech: Computer Vision, number November, chapter 26, pages 493–538. Vienna, Austria, 2008. [194] Zhihui Xiong, Irene Cheng, Anup Basu, et al. Efficient omni-image unwarping using geometric symmetry. Machine Vision and Applications, 23(4):725–737, 2012. [195] Yasushi Yagi. Omnidirectional sensing and its applications. IEICE Transactions on Information and Systems, 1(3):568–579, 1999. [196] Yasushi Yagi, Saburo Tsuji, and Shinjiro Kawato. Real-Time Omnidirectional Image Sensor (COPIS) for Vision-Guided Navigation. IEEE Transactions on Robotics and Automation, 10(1):11–22, 1994. [197] Yasushi Yagi and Masahiko Yachida. Real-time Generation of Environmental Map and Obstacle Avoidance using Omnidirectional Image Sensor with Conic Mirror. pages 160–165, 1991. [198] Yasushi Yagi and Masahiko Yachida. Omnidirectional visual sensor having a plurality of mirrors with surfaces of revolution, 2000. [199] Kazumasa Yamazawa, Yasushi Yagi, and Masahiko Yachida. Omnidirectional imaging with hyperboloidal projection. In Intelligent Robots and Systems ’93, IROS ’93. Proceed- ings of the 1993 IEEE/RSJ International Conference on, volume 2, pages 1029–1034 vol.2, jul 1993. 66

[200] Sooyeong Yi and Narendra Ahuja. An Omnidirectional Stereo Vision System Using a Single Camera. In 18th International Conference on Pattern Recognition (ICPR’06), pages 861–865. IEEE, 2006.

[201] Xianghua Ying and Zhanyi Hu. Catadioptric camera calibration using geometric invariants. IEEE transactions on pattern analysis and machine intelligence, 26(10):1260–71, oct 2004.

[202] Xianghua Ying and Hongbin Zha. Simultaneously calibrating catadioptric camera and de- tecting line features using Hough transform. In 2005 IEEE/RSJ International Conference on Intelligent Robots and Systems, pages 412–417, aug 2005.

[203] Shuda Yu and Maxime Lhuillier. Surface reconstruction of scenes using a catadioptric camera. Computer Vision/Computer Graphics Collaboration Techniques, pages 145–156, 2011.

[204] Zhigang Zhu. Omnidirectional stereo vision. In Proceedings of IEEE, 10th International Conference on Advanced Robotics, number 69805003, Budapest, 2001. Citeseer.

[205] Zhigang Zhu, Edward Riseman, and Allen Hanson. Geometrical modeling and real-time vision applications of a panoramic annular lens (PAL) camera system. Technical report, 1999.

[206] Zhigang Zhu, Haojun Xi, and Guangyou Xu. Combining Rotation-Invariance Images and Neural Networks. In Neural Networks, number 1, pages 1732–1737, 1996. 67

Appendix A

Symbol Notation

$P_i$: a point in $\mathbb{R}^3$, where the post-subscript $i$ serves as a unique identifier, e.g., for a point within a set $\mathcal{P} \subset \mathbb{R}^3$ such that $P_i \in \mathcal{P}$.

$[A]$: a reference frame or image space with origin $O_A$.

$^{[A]}p_i$: the position vector of $P_i$ with respect to a reference frame $[A]$, given component-wise by the column vector $[x_i, y_i, z_i]^T$. The frame $[A]$, with respect to which the vector $p_i$ is expressed, is written as a pre-superscript on the symbol. In homogeneous coordinates, an $h$ post-subscript is appended, as in $^{[A]}p_{i,h} = [x_i, y_i, z_i, 1]^T$.

$^{[I]}m_i$: a 2D point or pixel position on the image frame $[I]$.

$\|p_i\|$: the magnitude (Euclidean norm) of $p_i$.

$\hat{q}$: a unit vector, so $\|\hat{q}\| = 1$.

$\hat{v}$: a theoretical value for the symbol $v$.

$M_i$: a $3 \times 3$ matrix, or $M_{i,h}$ in homogeneous coordinates.

$f_s$: a scalar-valued function that outputs some scalar $s$.

$f_v$: a vector-valued function for the computation of a vector $v$.
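
To make the frame and homogeneous-coordinate conventions concrete, the short NumPy sketch below is offered as an illustration only (it is not part of the survey); the helper names to_homogeneous and from_homogeneous and the transform A_T_B are hypothetical. It expresses a point given with respect to a frame $[B]$ in another frame $[A]$, mirroring the pre-superscript frame notation and the $h$ post-subscript for homogeneous coordinates, and evaluates the point's Euclidean norm.

    import numpy as np

    def to_homogeneous(p):
        """[x, y, z] -> [x, y, z, 1], i.e., p_i -> p_{i,h}."""
        return np.append(p, 1.0)

    def from_homogeneous(p_h):
        """[x, y, z, w] -> [x/w, y/w, z/w]."""
        return p_h[:3] / p_h[3]

    # Hypothetical 4x4 rigid-body transform taking coordinates from frame [B] to frame [A]:
    # a rotation R about the z-axis and a translation t.
    theta = np.deg2rad(30.0)
    R = np.array([[np.cos(theta), -np.sin(theta), 0.0],
                  [np.sin(theta),  np.cos(theta), 0.0],
                  [0.0,            0.0,           1.0]])
    t = np.array([0.1, -0.2, 0.5])
    A_T_B = np.eye(4)
    A_T_B[:3, :3] = R
    A_T_B[:3, 3] = t

    B_p = np.array([1.0, 2.0, 3.0])                       # position vector of P_i w.r.t. frame [B]
    A_p = from_homogeneous(A_T_B @ to_homogeneous(B_p))   # the same point P_i expressed w.r.t. frame [A]
    print(A_p, np.linalg.norm(A_p))                        # coordinates in [A] and the norm ||p_i||

In this sketch, the variable prefix (B_p versus A_p) plays the role of the pre-superscript frame label, and to_homogeneous/from_homogeneous convert between the Cartesian and homogeneous forms of the same position vector.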