Structure from Motion and Recognition

MASSACHUSETTS INSTITUTE OF TECHNOLOGY ARTIFICIAL INTELLIGENCE LABORATORY A.I. Memo No. 1363 September, 1992 Pro jective Structure from two Uncalibrated Images: Structure from Motion and Recognition Amnon Shashua Abstract This pap er addresses the problem of recovering relative structure, in the form of an invariant, from two views of a 3D scene. The invariant structure is computed without any prior knowledge of camera geometry,orinternal calibration, and with the prop erty that p ersp ective and orthographic pro jections are treated alike, namely, the system makes no assumption regarding the existence of p ersp ective distortions in the input images. We show that, given the lo cation of epip oles, the pro jective structure invariant can b e constructed from only four corresp onding p oints pro jected from four non-coplanar p oints in space (like in the case of parallel pro jection). This result leads to two algorithms for computing pro jective structure. The rst algorithm requires six corresp onding p oints, four of which are assumed to b e pro jected from four coplanar p oints in space. Alternatively, the second algorithm requires eight corresp onding p oints, without assumptions of coplanarity of ob ject p oints. Our study of pro jective structure is applicable to b oth structure from motion and visual recognition. We use pro jective structure to re-pro ject the 3D scene from two mo del images and six or eight corresp onding p oints with a novel view of the scene. The re-pro jection pro cess is well-de ned under all cases of central pro jection, including the case of parallel pro jection. c Copyright Massachusetts Institute of Technology, 1992 This rep ort describ es research done at the Arti cial Intelligence Lab oratory of the Massachusetts Institute of Technology. Supp ort for the lab oratory's arti cial intelligence researchisprovided in part bytheAdvanced Research Pro jects Agency of the Department of Defense under Oce of Naval Researchcontract N00014-85-K-0124. A. Shashua was also supp orted by NSF-IRI8900267. Brill, Haag & Pyton (1991) have recently intro duced a 1 Intro duction quadratic invariant based on the fundamental matrix of The problem we address in this pap er is that of recover- Longuet-Higgins (1981), which is computed from eight ing relative, non-metric, structure of a three-dimensional corresp onding p oints. In App endix E weshowthat scene from two images, taken from di erent viewing p o- their result is equivalenttointersecting epip olar lines, sitions. The relative structure information is in the form and therefore, is singular for certain viewing transfor- of an invariant that can b e computed without any prior mations dep ending on the viewing geometry b etween the knowledge of camera geometry, and under all central pro- two mo del views. Our pro jectiveinvariant is not based jections | including the case of parallel pro jection. The on an epip olar intersection, but is based directly on the non-metric nature of the invariantallows the cameras to relative structure of the ob ject, and do es not su er from be internally uncalibrated (intrinsic parameters of cam- any singularities, a nding that implies greater stability era are unknown). The unique nature of the invariant al- in the presence of errors. lows the system to make no assumptions ab out existence The pro jective structure invariant, and the re- of p ersp ective distortions in the input images. Therefore, pro jection metho d that follows, is based on an exten- any degree of p ersp ective distortions is allowed, i.e., or- sion of Ko enderink and Van-Do orn's representation of thographic and p ersp ective pro jections are treated alike, ane structure as an invariant de ned with resp ect to or in other words, no assumptions are made on the size a reference plane and a reference p oint. We start byin- of eld of view. tro ducing an alternativeaneinvariant, using two ref- Weenvision this study as having applications b oth in erence planes (section 5), and it can easily b e extended the area of structure from motion and in the area of to pro jective space. As a result we obtain a pro jective visual recognition. In structure from motion our contri- structure invariant (section 6). bution is an addition to the recent studies of non-metric Weshow that the di erence b etween the ane and structure from motion pioneered by Ko enderink and Van pro jective case lie entirely in the lo cation of the epip oles, Do orn (1991) in parallel pro jection, followed byFaugeras i.e., given the lo cation of epip oles b oth the ane and (1992) and Mohr, Quan, Veillon & Boufama (1992) for pro jective structures are constructed by linear metho ds reconstructing the pro jective co ordinates of a scene up using the information captured from four corresp onding to an unknown pro jective transformation of 3D pro jec- p oints pro jected from four non-coplanar p oints in space. tive space. Our approach is similar to Ko enderink and In the pro jectivecasewe need additional corresp onding Van Do orn's in the sense that we deriveaninvariant, p oints | solely for the purp ose of recovering the lo cation based on a geometric construction, that records the 3D of the epip oles (Theorem 1, section 6). structure of the scene as a variation from two xed ref- Weshow that the pro jective structure invariantcan erence planes measured along the line of sight. Unlike b e recovered from two views | pro duced by parallel or Faugeras and Mohr et al. wedonotrecover the pro jec- central pro jection | and six corresp onding p oints, four tive co ordinates of the scene, and, as a result, we use a of which are assumed to b e pro jected from four coplanar smaller numb er of corresp onding p oints: in addition to p oints in space (section 7.1). Alternatively, the pro jec- the lo cation of epip oles we need only four corresp ond- tive structure can b e recovered from eight corresp onding ing p oints, coming from four non-coplanar p oints in the p oints, without assuming coplanarity of ob ject p oints scene, whereas Faugeras and Mohr et al. require corre- (section 8.1). The 8-p oint metho d uses the fundamental sp ondences coming from vepoints in general p osition. matrix approach (Longuett-Higgins, 1981) for recover- The second contribution of our study is to visual recog- ing the lo cation of epip oles (as suggested byFaugeras, nition of 3D ob jects from 2D images. We show that our 1992). pro jectiveinvariant can b e used to predict novel views of Finally,weshow that, for b oth schemes, it is p ossible the ob ject, given two mo del views in full corresp ondence to limit the viewing transformations to the group of rigid and a small numb er of corresp onding p oints with the motions, i.e., it is p ossible to work with p ersp ectivepro- novel view. The predicted view is then matched against jection assuming the cameras are calibrated. The result, the novel input view, and if the twomatch, then the however, do es not include orthographic pro jection. novel view is considered to b e an instance of the same ob- Exp eriments were conducted with b oth algorithms, ject that gaverisetothetwo mo del views stored in mem- and the results show that the 6-p oint algorithm is sta- ory. This paradigm of recognition is within the general ble under noise and under conditions that violate the framework of alignment (Fischler and Bolles 1981, Lowe assumption that four ob ject p oints are coplanar. The 8- 1985, Ullman 1989, Huttenlo cher and Ullman 1987) and, p oint algorithm, although theoretically sup erior b ecause more sp eci cally, of the paradigm prop osed byUllman of lack of the coplanarity assumption, is considerably and Basri (1989) that recognition can pro ceed using only more sensitive to noise. 2D images, b oth for representing the mo del, and when matching the mo del to the input image. We refer to the 2 Why not Classical SFM? problem of predicting a novel view from a set of mo del views using a limited numb er of corresp onding p oints, The work of Ko enderink and Van Do orn (1991) on ane structure from two orthographic views, and the work of as the problem of re-projection. Ullman and Basri (1989) on re-pro jection from twoor- The problem of re-pro jection has b een dealt with in thographic views, have a clear practical asp ect: it is the past primarily assuming parallel pro jection (Ull- known that at least three orthographic views are re- man and Basri 1989, Ko enderink and Van Do orn 1991). quired to recover metric structure, i.e., relative depth For the more general case of central pro jection, Barret, 1 the image plane is p erp endicular to the pro jecting rays. (Ullman 1979, Huang & Lee 1989, Aloimonos & Brown The third problem is related to the wayshapeis 1989). Therefore, the suggestion to use ane structure instead of metric structure allows a recognition system typically represented under the p ersp ective pro jection mo del. Because the center of pro jection is also the ori- to p erform re-pro jection from two-mo del views (Ullman gin of the co ordinate system for describing shap e, the & Basri), and to generate novel views of the ob ject pro- duced by ane transformations in space, rather than by shap e di erence (e.g., di erence in depth, b etween two ob ject p oints), is orders of magnitude smaller than the rigid transformations (Ko enderink & Van Do orn).

Structure from Motion and Recognition

Hardware Support for Non-Photorealistic Rendering

Automatic Workflow for Roof Extraction and Generation of 3D

A Unified Algorithmic Framework for Multi-Dimensional Scaling

Geant4 Integration Status

Obtaining a Canonical Polygonal Schema from a Greedy Homotopy Basis with Minimal Mesh Reﬁnement

MATH 113 Section 8.1: Basic Geometry

3.3 Convex Polyhedron

Vertex-Transitive Polyhedra, Their Maps, and Quotients

Ball Grid Array (BGA) Packaging 14

On the Axiomatics of Projective and Affine Geometry in Terms of Line

Exterior Orientation Improved by the Coplanarity Equation and Dem Generation for Cartosat-1

Shaping Space Exploring Polyhedra in Nature, Art, and the Geometrical Imagination