MASSACHUSETTS INSTITUTE OF TECHNOLOGY

ARTIFICIAL INTELLIGENCE LABORATORY

A.I. Memo No. 1363 September, 1992

Pro jective Structure from two Uncalibrated Images: Structure from

Motion and Recognition

Amnon Shashua

Abstract

This pap er addresses the problem of recovering relative structure, in the form of an invariant, from two views of a 3D

scene. The invariant structure is computed without any prior knowledge of camera ,orinternal calibration,

and with the prop erty that p ersp ective and orthographic pro jections are treated alike, namely, the system makes no

assumption regarding the existence of p ersp ective distortions in the input images.

We show that, given the lo cation of epip oles, the pro jective structure invariant can b e constructed from only four

corresp onding p oints pro jected from four non-coplanar p oints in space (like in the case of parallel pro jection). This

result leads to two algorithms for computing pro jective structure. The rst algorithm requires six corresp onding

p oints, four of which are assumed to b e pro jected from four coplanar p oints in space. Alternatively, the second

algorithm requires eight corresp onding p oints, without assumptions of coplanarity of ob ject p oints.

Our study of pro jective structure is applicable to b oth structure from motion and visual recognition. We use

pro jective structure to re-pro ject the 3D scene from two mo del images and six or eight corresp onding p oints with a

novel view of the scene. The re-pro jection pro cess is well-de ned under all cases of central pro jection, including the

case of parallel pro jection.

c

Copyright Massachusetts Institute of Technology, 1992

This rep ort describ es research done at the Arti cial Intelligence Lab oratory of the Massachusetts Institute of Technology.

Supp ort for the lab oratory's arti cial intelligence researchisprovided in part bytheAdvanced Research Pro jects Agency of

the Department of Defense under Oce of Naval Researchcontract N00014-85-K-0124. A. Shashua was also supp orted by NSF-IRI8900267.

Brill, Haag & Pyton (1991) have recently intro duced a 1 Intro duction

quadratic invariant based on the fundamental of

The problem we address in this pap er is that of recover-

Longuet-Higgins (1981), which is computed from eight

ing relative, non-metric, structure of a three-dimensional

corresp onding p oints. In App endix E weshowthat

scene from two images, taken from di erent viewing p o-

their result is equivalenttointersecting epip olar lines,

sitions. The relative structure information is in the form

and therefore, is singular for certain viewing transfor-

of an invariant that can b e computed without any prior

mations dep ending on the viewing geometry b etween the

knowledge of camera geometry, and under all central pro-

two mo del views. Our pro jectiveinvariant is not based

jections | including the case of parallel pro jection. The

on an epip olar intersection, but is based directly on the

non-metric nature of the invariantallows the cameras to

relative structure of the ob ject, and do es not su er from

be internally uncalibrated (intrinsic parameters of cam-

any singularities, a nding that implies greater stability

era are unknown). The unique nature of the invariant al-

in the presence of errors.

lows the system to make no assumptions ab out existence

The pro jective structure invariant, and the re-

of p ersp ective distortions in the input images. Therefore,

pro jection metho d that follows, is based on an exten-

any degree of p ersp ective distortions is allowed, i.e., or-

sion of Ko enderink and Van-Do orn's representation of

thographic and p ersp ective pro jections are treated alike,

ane structure as an invariant de ned with resp ect to

or in other words, no assumptions are made on the size

a reference and a reference p oint. We start byin-

of eld of view.

tro ducing an alternativeaneinvariant, using two ref-

Weenvision this study as having applications b oth in

erence planes (section 5), and it can easily b e extended

the area of structure from motion and in the area of

to pro jective space. As a result we obtain a pro jective

visual recognition. In structure from motion our contri-

structure invariant (section 6).

bution is an addition to the recent studies of non-metric

Weshow that the di erence b etween the ane and

structure from motion pioneered by Ko enderink and Van

pro jective case lie entirely in the lo cation of the epip oles,

Do orn (1991) in parallel pro jection, followed byFaugeras

i.e., given the lo cation of epip oles b oth the ane and

(1992) and Mohr, Quan, Veillon & Boufama (1992) for

pro jective structures are constructed by linear metho ds

reconstructing the pro jective co ordinates of a scene up

using the information captured from four corresp onding

to an unknown pro jective transformation of 3D pro jec-

p oints pro jected from four non-coplanar p oints in space.

tive space. Our approach is similar to Ko enderink and

In the pro jectivecasewe need additional corresp onding

Van Do orn's in the sense that we deriveaninvariant,

p oints | solely for the purp ose of recovering the lo cation

based on a geometric construction, that records the 3D

of the epip oles (Theorem 1, section 6).

structure of the scene as a variation from two xed ref-

Weshow that the pro jective structure invariantcan

erence planes measured along the of sight. Unlike

b e recovered from two views | pro duced by parallel or

Faugeras and Mohr et al. wedonotrecover the pro jec-

central pro jection | and six corresp onding p oints, four

tive co ordinates of the scene, and, as a result, we use a

of which are assumed to b e pro jected from four coplanar

smaller numb er of corresp onding p oints: in addition to

p oints in space (section 7.1). Alternatively, the pro jec-

the lo cation of epip oles we need only four corresp ond-

tive structure can b e recovered from eight corresp onding

ing p oints, coming from four non-coplanar p oints in the

p oints, without assuming coplanarity of ob ject p oints

scene, whereas Faugeras and Mohr et al. require corre-

(section 8.1). The 8-p oint metho d uses the fundamental

sp ondences coming from vepoints in general p osition.

matrix approach (Longuett-Higgins, 1981) for recover-

The second contribution of our study is to visual recog-

ing the lo cation of epip oles (as suggested byFaugeras,

nition of 3D ob jects from 2D images. We show that our

1992).

pro jectiveinvariant can b e used to predict novel views of

Finally,weshow that, for b oth schemes, it is p ossible

the ob ject, given two mo del views in full corresp ondence

to limit the viewing transformations to the group of rigid

and a small numb er of corresp onding p oints with the

motions, i.e., it is p ossible to work with p ersp ectivepro-

novel view. The predicted view is then matched against

jection assuming the cameras are calibrated. The result,

the novel input view, and if the twomatch, then the

however, do es not include orthographic pro jection.

novel view is considered to b e an instance of the same ob-

Exp eriments were conducted with b oth algorithms,

ject that gaverisetothetwo mo del views stored in mem-

and the results show that the 6-p oint algorithm is sta-

ory. This paradigm of recognition is within the general

ble under noise and under conditions that violate the

framework of alignment (Fischler and Bolles 1981, Lowe

assumption that four ob ject p oints are coplanar. The 8-

1985, Ullman 1989, Huttenlo cher and Ullman 1987) and,

p oint algorithm, although theoretically sup erior b ecause

more sp eci cally, of the paradigm prop osed byUllman

of lack of the coplanarity assumption, is considerably

and Basri (1989) that recognition can pro ceed using only

more sensitive to noise.

2D images, b oth for representing the mo del, and when

matching the mo del to the input image. We refer to the

2 Why not Classical SFM?

problem of predicting a novel view from a set of mo del

views using a limited numb er of corresp onding p oints,

The work of Ko enderink and Van Do orn (1991) on ane

structure from two orthographic views, and the work of as the problem of re-projection.

Ullman and Basri (1989) on re-pro jection from twoor- The problem of re-pro jection has b een dealt with in

thographic views, have a clear practical asp ect: it is the past primarily assuming parallel pro jection (Ull-

known that at least three orthographic views are re- man and Basri 1989, Ko enderink and Van Do orn 1991).

quired to recover metric structure, i.e., relative depth For the more general case of central pro jection, Barret, 1

the image plane is p erp endicular to the pro jecting rays. (Ullman 1979, Huang & Lee 1989, Aloimonos & Brown

The third problem is related to the wayshapeis 1989). Therefore, the suggestion to use ane structure

instead of metric structure allows a recognition system

typically represented under the p ersp ective pro jection

mo del. Because the center of pro jection is also the ori- to p erform re-pro jection from two-mo del views (Ullman

gin of the co ordinate system for describing shap e, the & Basri), and to generate novel views of the ob ject pro-

duced by ane transformations in space, rather than by

shap e di erence (e.g., di erence in depth, b etween two

ob ject p oints), is orders of magnitude smaller than the rigid transformations (Ko enderink & Van Do orn).

distance to the scene, and this makes the computations

This advantage, of working with two rather than three

very sensitive to noise. The sensitivitytonoiseisre-

views, is not present under p ersp ective pro jection, how-

duced if images are taken from distant viewp oints (large

ever. It is known that two p ersp ective views are sucient

base-line in stereo triangulation), but that makes the

for recovering metric structure (Roach & Aggarwal 1979,

pro cess of establishing corresp ondence b etween p oints in

Longuett-Higgins 1981, Tsai & Huang 1984, Faugeras &

b oth views more of a problem, and hence, may makethe

Maybank 1990). The question, therefore, is whylookfor

situation even worse. This problem do es not o ccur un-

alternative representations of structure, and new meth-

der the assumption of orthographic pro jection b ecause

o ds for p erforming re-pro jection?

translation in depth is lost under orthographic pro jec-

There are three ma jor problems in structure from mo-

tion, and therefore, the origin of the co ordinate system

tion metho ds: (i) critical dep endence on an orthographic

for describing shap e (metric and non-metric) is ob ject-

or p ersp ective mo del of pro jection, (ii) internal camera

centered, rather than viewer-centered (Tomasi, 1991).

calibration, and (iii) the problem of stereo-triangulation.

These problems, in isolation or put together, make

The rst problem is the strict division b etween meth-

much of the reason for the sensitivity of structure from

o ds that assume orthographic pro jection and metho ds

motion metho ds to errors. The recentwork of Faugeras

that assume p ersp ective pro jection. These two classes

(1992) and Mohr et al. (1992) addresses the problem of

of metho ds do not overlap in their domain of applica-

internal calibration by assuming central pro jection in-

tion. The p ersp ective mo del op erates under conditions

stead of p ersp ective pro jection. Faugeras and Mohr et

of signi cant p ersp ective distortions, such as driving on

al. then pro ceed to reconstruct the pro jectivecoordi-

a stretch of highway, requires a relatively large eld of

nates of the scene. Since pro jective co ordinates are mea-

view and relatively large depth variations b etween scene

sured relative to the center of pro jection, this approach

p oints (Adiv 1989, Dutta & Synder 1990, Tomasi 1991,

do es not address the problem of stereo-triangulation or

Broida et al. 1990). The orthographic mo del, on the

the problem of uniformity under b oth orthographic and

other hand, provides a reasonable approximation when

p ersp ective pro jection mo dels.

the imaging situation is at the other extreme, i.e., small

eld of view and small depth variation b etween ob ject

3 Camera Mo del and Notations

p oints (a situation for which p ersp ectiveschemes often

break down). Typical imaging situations are at neither

We assume that ob jects in the world are rigid and are

end of these extremes and, therefore, would b e vulner-

viewed under central projection. In central pro jection

able to errors in b oth mo dels. From the standp ointof

the center of pro jection is the origin of the camera co or-

p erforming recognition, this problem implies that the

dinate frame and can b e lo cated anywhere in pro jective

viewer has control over his eld of view | a prop erty

space. In other words, the center of pro jection can b e

that may b e reasonable to assume at the time of mo del

apoint in Euclidean space or an ideal point(suchas

acquisition, but less reasonable to assume o ccurring at

happ ens in parallel pro jection). The image plane is as-

recognition time.

sumed to b e arbitrarily p ositioned with resp ect to the

camera co ordinate frame (unlike p ersp ective pro jection The second problem is related to internal camera cal-

where it is parallel to the xy plane). We refer to this as a ibration. The assumption of p ersp ective pro jection in-

non-rigid camera con guration. The motion of the cam- cludes a distinguishable p oint, known as the principal

era, therefore, consists of the translation of the center of p oint, which is at the intersection of the optical axis and

pro jection, rotation of the co ordinate frame around the the image plane. The lo cation of the principal p ointis

new lo cation of the center of pro jection, and followed by an internal parameter of the camera, whichmay deviate

tilt, pan, and fo cal length scale of the image plane with somewhat from the geometric center of the image plane,

resp ect to the new optical axis. This mo del of pro jection and therefore, may require calibration. Persp ective pro-

will also b e referred to as p ersp ective pro jection with an

jection also assumes that the image plane is p erp endicu-

uncalibrated camera. lar to the optical axis and the p ossibility of imp erfections

We also include in our derivations the p ossibilityof in the camera requires, therefore, the recovery of the two

having a rigid camera con guration. A rigid camera is axes describing the image frame, and of the fo cal length.

simply the familiar mo del of perspective projection in Although the calibration pro cess is somewhat tedious, it

which the center of pro jection is a p oint in Euclidean is sometimes necessary for many of the available com-

space and the image plane is xed with resp ect to the mercial cameras (Brown 1971, Faig 1975, Lenz and Tsai

camera co ordinate frame. A rigid camera motion, there- 1987, Faugeras, Luong and Maybank 1992). The prob-

fore, consists of translation of the center of pro jection lem of calibration is lesser under orthographic pro jection

followed by rotation of the co ordinate frame and fo cal b ecause the pro jection do es not have a distinguishable

length scaling. Note that a rigid camera implicitly as- ray; therefore any p oint can serve as an origin, however

sumes internal calibration, i.e., the optical axis pierces must still b e considered b ecause of the assumption that 2

P Q 0

on the reference plane). Note that the lo cation ofp ~ is

known via the ane transformation determined bythe

~

ts. Finally,let Q reference pro jections of the three reference p oin

plane

Q b e the fourth reference p oint (not on the reference

P~

plane). Using a simple geometric drawing, the ane

structure invariant is derived as follows.

Consider Figure 1. The pro jections of the reference

p oint Q and an arbitrary p ointofinterest P form two

0 0 0 0

~ ~

P Pp p~ and QQq q~ .From similarity p’ similar trap ezoids:

p

of trap ezoids wehave,

q

0 0

~

jp p~ j P P j ~ j

p’

= : =

p

0 0

q’ ~ jq q~ j

jQ Qj

V

0

q; q is a given corresp onding p oint, we

1 By assuming that

variantthatisinvariant under parallel V ~ obtain a shap e in

2 q’

pro jection (the ob ject p oints are xed while the camera

changes the lo cation and p osition of the image plane

towards the pro jecting rays).

Before we extend this result to central pro jection by

Figure 1: Ko enderink and Van Do orn's Ane Structure.

using pro jective geometry,we rst describ e a di erent

ane invariant using two reference planes, rather than

one reference plane and a reference p oint. The new ane

through a xed p oint in the image and the image plane

invariant is the one that will b e applied later to central

is p erp endicular to the optical axis.

pro jection.

We denote ob ject p oints in capital letters and image

p oints in small letters. If P denotes an ob ject p ointin3D

5 Ane Structure Using Two

0 00

space, p; p ;p denote its pro jections onto the rst, sec-

Reference Planes

ond and novel pro jections, resp ectively.We treat image

p oints as rays (homogeneous co ordinates) in 3D space,

We make use of the same information | the pro jections

and refer to the notation p =(x; y ; 1) as the standard of four non-coplanar p oints | to set up two reference

representation of the image plane. We note that the

planes. Let P , j =1; :::; 4, b e the four non-coplanar

j

0

true co ordinates of the image plane are related to the

b e their ob- reference p oints in space, and let p ! p

j

j

served pro jections in b oth views. The p oints P ;P ;P standard representation by means of a pro jective trans-

1 2 3

formation of the plane. In case we deal with central

and P ;P ;P lie on two di erent planes, therefore, we

2 3 4

pro jection, all representations of image co ordinates are

can account for the motion of all p oints coplanar with

each of these two planes. Let P b e a p ointofinterest, allowed, and therefore, without loss of generalitywework

with the standard representation (more on that in Ap-

not coplanar with either of the reference planes, and let

~ ^

p endix A).

P and P b e its pro jections onto the two reference planes

along the raytowards the rst view.

~ ^

4 Ane Structure: Ko enderink and

Consider Figure 2. The pro jection of P; P and P onto

0 0 0

p ; p~ andp ^ resp ectively,gives rise to two similar trap e-

Van Do orn's Version

zoids from whichwe derive the following relation:

The ane structure invariant describ ed by Ko enderink

0 0

~

jP P j jp p~ j

and Van Do orn (1991) is based on a geometric con-

= = :

p

0 0

^

jp p^ j

jP P j

struction using a single reference plane, and a reference

p oint not coplanar with the reference plane. In ane

The ratio is invariant under parallel pro jection. There

p

geometry (induced by parallel pro jection), it is known

is no particular advantage for preferring over as

p p

from the fundamental theorem of plane pro jectivity, that

a measure of ane structure, but as will b e describ ed

three (non-collinear) corresp onding p oints are sucient

b elow, this new construction forms the basis for extend-

to uniquely determine all other corresp ondences (see Ap-

ing ane structure to pro jective structure, whereas the

p endix A for more details on plane pro jectivity under

single reference plane construction do es not (see Ap-

ane and pro jective geometry). Using three corresp ond-

p endix D for pro of ).

ing p oints b etween two views provides us, therefore, with

In the pro jectiveplane,we need four coplanar p oints

a transformation (ane transformation) for determining

to determine the motion of a reference plane. Weshow

the lo cation of all p oints of the plane passing through

that, given the epip oles, only three corresp onding p oints

the three reference p oints in the second image plane.

for each reference plane are sucient for recovering the

Let P b e an arbitrary p oint in the scene pro jecting

asso ciated pro jective transformations induced by those

0

~

onto p; p on the two image planes. Let P b e the pro jec-

planes. Altogether, the construction provides us with

tion of P onto the reference plane along the raytowards

four p oints along each epip olar line. The similarityof

0

~

the rst image plane, and letp ~ b e the pro jection of P trap ezoids in the ane case turns, therefore, into a cross-

0 0

onto the second image plane (p andp ~ coincide if P is ratio in the pro jective case. 3 P P reference planes

reference ~P plane P~ ^P image image ^ ~ plane P plane p’ p’ p^’

p’ p p V 2 ~ p’ v l V V 1 1 V

^p’

e shap e as the cross ratio

2 Figure 3: De nition of pro jectiv

0 0 0

of p ; p~ ; p^ ;V .

l

Figure 2: Ane structure using two reference planes.

as the ane invariant de ned in section 5 for parallel

pro jection.

The cross-ratio is a direct extension of the ane

p

This leads to the result (Theorem 1) that, in addition

structure invariant de ned in section 5 and is referred

to the epip oles, only four corresp onding p oints, pro jected

to as projective structure.We can use this invariantto

from four non-coplanar p oints in the scene, are sucient

reconstruct anynovel view of the ob ject (taken bya

for recovering the pro jective structure invariant for all

non-rigid camera) without ever recovering depth or even

other p oints. The epip oles can b e recovered by either

pro jective co ordinates of the ob ject.

extending the Ko enderink and Van Do orn (1991) con-

Having de ned the pro jective shap e invariant, and as-

struction to pro jective space using six p oints (four of

suming we still are given the lo cations of the epip oles,

which are assumed to b e coplanar), or by using other

we show next how to recover the pro jections of the two

metho ds, notably those based on the Longuet-Higgins

reference planes onto the second image plane, i.e., we

0 0 fundamental matrix. This leads to pro jective structure

describ e the computations leading top ~ andp ^ .

from eight p oints in general p osition.

Since we are working under central pro jection, we

need to identify four coplanar p oints on each reference

6 Pro jective Structure

plane. In other words, in the pro jective geometry of the

plane, four corresp onding p oints, no three of whichare

We assume for now that the lo cation of b oth epip oles is

collinear, are sucient to determine uniquely all other

known, and we will address the problem of nding the

corresp ondences (see App endix A, for more details). We

epip oles later. The epip oles, also known as the fo ci of ex-

must, therefore, identify four corresp onding p oints that

pansion, are the intersections of the line in space connect-

are pro jected from four coplanar p oints in space, and

ing the two centers of pro jection and the image planes.

then recover the pro jective transformation that accounts

There are two epip oles, one on each image plane | the

for all other corresp ondences induced from that plane.

epip ole on the second image we call the left epip ole, and

The following prop osition states that the corresp onding

the epip ole on the rst image we call the right epip ole.

epip oles can b e used as a fourth corresp onding p oint for

The image lines emanating from the epip oles are known

any three corresp onding p oints selected from the pair of

as the epipolar lines.

images.

Consider Figure 3 which illustrates the two reference

plane construction, de ned earlier for parallel pro jection,

Prop osition 1 Aprojective transformation, A, that is

now displayed in the case of central pro jection. The

determinedfrom three arbitrary, non-col linear, corre-

left epip ole is denoted by V , and b ecause it is on the

l

sponding points and the corresponding epipoles, is a pro-

line V V (connecting the two centers of pro jection), the

1 2

jective transformation of the plane passing through the

0

line PV pro jects onto the epip olar line p V . Therefore,

1 l

three object points which project onto the correspond-

0 0

~ ^

the p oints P and P pro ject onto the p ointsp ~ andp ^ ,

ing image points. The transformation A is an induced

0

which are b oth on the epip olar line p V . The p oints

l

epipolar transformation, i.e., the ray Ap intersects the

0 0 0

0

p ; p~ ; p^ and V are collinear and pro jectively related to

l

epipolar line p V for any arbitrary image point p and its

l

0 ~ ^

P; P; P; V , and therefore have the same cross-ratio:

corresponding point p .

1

0 0 0

~

Comment: An epip olar transformation F is a mapping

jV p^j jV p^ j jp p~ j jP P j

1 l

  : = =

p

between corresp onding epip olar lines and is determined

0 0 0

~

jP p^j jp p^ j jV p~ j

jV P j

l

1

(not uniquely) from three corresp onding epip olar lines

and the epip oles. The induced p oint transformation is Note that when the epip ole V b ecomes an ideal p oint

l

1 t

E =(F ) (induced from the p oint/line dualityofpro- (vanishes along the epip olar line), then is the same

p 4

k

(see App endix B for more details). This jective geometry, see App endix C for more details on then =

0 p

k

way of computing the cross-ratio is preferred over the epip olar transformations).

0

more familiar cross-ratio of four collinear p oints, b ecause Pro of: Let p ! p , j =1; 2; 3, b e three arbitrary

j

j

it enables us to work with all elements of the pro jective corresp onding p oints, and let V and V denote the left

l r

plane, including ideal p oints (a situation that arises, for and right epip oles. First note that the four p oints p and

j

instance, when epip olar lines are parallel, and in general V are pro jected from four coplanar p oints in the scene.

r

under parallel pro jection). The reason is that the plane de ned by the three ob ject

p oints P intersects the line V V connecting the two

Wehave therefore shown the following result:

j 1 2

centers of pro jection, at a p oint | regular or ideal. That

Theorem 1 In the case wherethelocation of epipoles

p oint pro jects onto b oth epip oles. The transformation

are known, then four corresponding points, coming from

A, therefore, is a pro jective transformation of the plane

four non-coplanar points in space, are sucient for com-

passing through the three ob ject p oints P ;P ;P . Note

1 2 3

puting the projective structure invariant for al l other

p

that A is uniquely determined provided that no three of

points in spaceprojecting onto corresponding points in

the four p oints are collinear.

both views, for al l central projections, including paral lel

0

Let p~ = Ap for some arbitrary p oint p. Because lines

projection.

are pro jectiveinvariants, any p oint along the epip olar

0

This result shows that the di erence b etween parallel

line pV must pro ject onto the epip olar line p V . Hence,

r l

and central pro jection lies entirely on the epip oles. In

A is an induced epip olar transformation.

b oth cases four non-coplanar p oints are sucient for ob-

Given the epip oles, therefore, we need just three p oints

taining the invariant, but in the parallel pro jection case

to determine the corresp ondences of all other p oints

wehave prior knowledge that b oth epip oles are ideal,

coplanar with the reference plane passing through the

therefore they are not required for determining the trans-

three corresp onding ob ject p oints. The transformation

formations A and E (in other words, A and E are ane

(collineation) A is determined from the following equa-

transformations, more on that in Section 7.2).

tions:

0

Ap =  p ; j =1; 2; 3

Another p oint to note with this result is that the

j j

j

minimal numb er of corresp onding p oints needed for re-

AV = V ;

r l

pro jection is smaller than the previously rep orted num-

where ;  are unknown scalars, and A =1. One

j 3;3

ber (Faugeras 1992, Mohr et al. 1992) for recovering

can eliminate ;  from the equations and solve for the

j

the pro jective co ordinates of ob ject p oints. Faugeras

matrix A from the three corresp onding p oints and the

shows that ve corresp onding p oints coming from ve

corresp onding epip oles. That leads to a linear system

p oints in general p osition (i.e., no four of them are copla-

of eight equations, and is describ ed in more detail in

nar) can b e used, together with the epip oles, to recover

App endix A.

the pro jective co ordinates of all other p oints in space.

If P ;P ;P de ne the rst reference plane, the trans-

1 2 3

Because the pro jective structure invariant requires only

0

formation A determines the lo cation ofp ~ for all other

four p oints, this implies that re-pro jection is done more

0 0

p oints p (~p and p coincide if P is coplanar with the rst

directly than through full reconstruction of pro jective

0

reference plane). In other words, wehave thatp ~ = Ap.

co ordinates, and therefore is likely to b e more stable.

0

Note thatp ~ is not necessarily a p oint on the second im-

We next discuss algorithms for recovering the lo ca-

~

age plane, but it is on the line V P .We can determine

2

tion of epip oles. The problem of recovering the epip oles

its lo cation on the second plane by normalizing Ap such

is well known and several approaches havebeensug-

that its third comp onent is set to 1.

gested in the past (Longuet-Higgins and Prazdny1980,

Similarly, let P ;P ;P de ne the second reference

2 3 4

Rieger-Lawton 1985, Faugeras and Maybank 1990, Hil-

plane (assuming the four ob ject p oints P , j =1; :::; 4,

j

dreth 1991, Faugeras 1992, Faugeras, Luong and May-

are non-coplanar). The transformation E is uniquely

bank 1992). We start with a metho d that requires six

determined by the equations

corresp onding p oints (two additional p oints to the four

0

we already have). The metho d is a direct extension of the

Ep =  p ; j =2; 3; 4

j j

j

Ko enderink and Van Do orn (1991) construction in par-

EV = V ;

r l

allel pro jection, and was describ ed earlier by Lee (1988)

and determines all other corresp ondences induced by the

for the purp ose of recovering the translational comp o-

second reference plane (we assume that no three of the

nent of camera motion.

four p oints used to determine E are collinear). In other

The second algorithm for lo cating the epip oles is

0

words, Ep determines the lo cation ofp ^ up to a scale

adopted from Faugeras (1992) and is based on the fun-

^

damental matrix of Longuet-Higgins (1981).

factor along the ray V P .

2

Instead of normalizing Ap and Ep we compute

p

from the cross-ratio of the p oints represented in homo-

7 Epip oles from Six Points

geneous co ordinates, i.e., the cross-ratio of the four rays

0 0 0 0

We can recover the corresp ondences induced from the

V p ;V p~ ;V p^ ;V V , as follows: Let the rays p ;V be

2 2 2 2 l l

0

rst reference plane by selecting four corresp onding

represented as a linear combination of the raysp ~ = Ap

0

p oints, assuming they are pro jected from four coplanar

andp ^ = Ep, i.e.,

0 0 0

ob ject p oints. Let p =(x ;y ; 1) and p =(x ;y ; 1)

j j j

j j j

0 0 0

p =~p + k p^

and j =1; :::; 4 represent the standard image co ordinates

0 0 0

of the four corresp onding p oints, no three of whichare

V =~p + k p^ ;

l 5

P

wo epip olar lines are

6 In the case where more than t

available (such as when more than six corresp onding

p oints are available), one can nd a least-squares so-

reference

y using a principle comp onent P lution for the epip ole b

5 plane

analysis, as follows. Let B be a k  3 matrix, where

hrow represents an epip olar line. The least squares ~ ~ eac P P

5 6 image

V is the unit eigenvector asso ciated with the p’ solution to plane l

image 6 t

umb er of the 3  3 matrix B B . Note that p smallest eigen

plane p’ haracteristic 6 this can b e done analytically b ecause the c 5

~ . ~p’ p’ p equation is a cubic p olynomial

6 5

ehave a six p oint algorithm for recover-

. 5 Altogether, w

ing b oth the epip oles, and the pro jective structure , p

V

to anynovel view. 2 V and for p erforming re-pro jection on

l

e summarize in the following section the 6-p oint algo-

. W (left epipole) V

1 rithm.

7.1 Re-pro jection Using Pro jective Structure:

Figure 4: The geometry of lo cating the left epip ole using

6-p oint Algorithm

two p oints out of the reference plane.

We assume we are given two mo del views of a 3D ob ject,

and that all p oints of interest are in corresp ondence. We

assume these corresp ondences can b e based on measures

collinear, in b oth pro jections. Therefore, the transfor-

of correlation, as used in optical- ow metho ds (see also

mation A is uniquely determined by the following equa-

Shashua 1991, Bachelder & Ullman 1992 for metho ds for

tions,

0

extracting corresp ondences using combination of optical

= Ap :  p

j j

j

ow and ane geometry).

0

Letp ~ = Ap b e the homogeneous co ordinate representa-

Given a novel view we extract six corresp onding p oints

1 1 0

~

tion of the ray V P , and letp ~ = A p .

0 00

2

(with one of the mo del views): p ! p ! p ,

j

j j

Having accounted for the motion of the reference

j =1; :::; 6. We assume the rst four p oints are pro jected

plane, we can easily nd the lo cation of the epipoles (in

from four coplanar p oints, and the other corresp onding

standard co ordinates). Given two ob ject p oints P ;P

5 6

p oints are pro jected from p oints that are not on the ref-

that are not on the reference plane, we can nd b oth

erence plane. Without loss of generality,we assume the

0

epip oles by observing thatp ~ is on the left epip olar

standard co ordinate representation of the image planes,

1

line, and similarly thatp ~ is on the right epip olar line.

i.e., the image co ordinates are emb edded in a 3D vec-

Stated formally,wehave the following prop osition:

tor whose third comp onent is set to 1 (see App endix A).

Prop osition 2 The left epipole, denotedbyV ,isatthe

The computations for recovering pro jective shap e and

l

0 0 0 0

intersection of the line p p~ and the line p p~ . Similarly,

p erforming re-pro jection are describ ed b elow.

5 5 6 6

the right epipole, denotedbyV ,isattheintersection of

0 r

= 1: Recover the transformation A that satis es  p

j

1 1

j

p p~ and p p~ .

5 6

5 6

Ap , j =1; :::; 4. This requires setting up a linear

j

Pro of: It is sucient to prove the claim for one of the

system of eight equations (see App endix A). Apply

0

epip oles, say the left epip ole. Consider Figure 4 which

the transformation to all p oints p, denotingp ~ = Ap.

0 0 0 0

describ es the construction geometrically. By construc-

Also recover the epip oles V =(p  p~ )  (p  p~ )

l

5 5 6 6

1 0 1 0 0 0

~

and V =(p  A p )  (p  A p ). tion, the line P P V pro jects to the line p p~ via V

r 5 6 5 5 1 2

5 5

5 6

(p oints and lines are pro jectiveinvariants) and therefore

2: Recover the transformation E that satis es V =

l

they are coplanar. In particular, V pro jects to V which

0

1 l

EV and  p = Ep , j =4; 5; 6.

r j j

j

0 0

is lo cated at the intersection of p p~ and V V . Simi-

1 2

5 5

0

0 0

3: Compute the cross-ratio of the p oints p ;Ap;Ep;V ,

^

l

larly, the line p p~ intersects V V at V . Finally, V and

1 2 l l

6 6

for all p oints p and denote that by (see Ap-

0 0 0 0

p

^

V must coincide b ecause the twolines p p~ and p p~ are

l

5 5 6 6

p endix B for details on computing the cross-ratio

coplanar (b oth are on the image plane).

of four rays).

Algebraically,we can recover the ray V V ,or V up to

1 2 l

4: Perform step 1 b etween the rst and novel view:

a scale factor, using the following formula:

00

~ ~

0 0 0 0

recover A that satis es  p = Ap , j =1; :::; 4,

j j

j

):  p~ )  (p  p~ V =(p

l

6 6 5 5

00

~ ~

apply A to all p oints p and denote that by~p = Ap,

Note that V is de ned with resp ect to the standard co or-

l

00 00 00 00

recover the epip oles V =(p  p~ )  (p  p~ ) and

ln

5 5 6 6

dinate frame of the second camera. We treat the epip ole

1 00 1 00

~ ~

). )  (p  A p V =(p  A p

6 rn 5

6 5

V as the ray V V with resp ect to V , and the epip ole

l 1 2 2

V as the same ray but with resp ect to V . Note also

5: Perform step 2 b etween the rst and novel view:

r 1

~

that the third comp onentofV is zero if epip olar lines

l

Recover the transformation E that satis es V =

ln

00

are parallel, i.e., V is an ideal p oint in pro jectiveterms

~

l

EV and  p = Ep , j =4; 5; 6.

rn j j

j

(happ ening under parallel pro jection, or when the non-

00

6: For every p oint p, recover p from the cross-ratio

p

rigid camera motion brings the image plane to a p osition

00

~ ~

where it is parallel to the line V V ). and the three rays Ap; Ep; V . Normalize p such

1 2 ln 6

that its third co ordinate is set to 1. 8 Epip oles from EightPoints

We adopt a recent algorithm suggested byFaugeras

The entire pro cedure requires setting up a linear sys-

(1992) which is based on Longuet-Higgins' (1981) funda-

tem of eight equations four times (Step 1,2,4,5) and com-

mental matrix. The metho d is very simple and requires

puting cross-ratios (linear op erations as well).

eight corresp onding p oints for recovering the epip oles.

We discuss b elow an imp ortant prop erty of this pro-

0

Let F b e an epip olar transformation, i.e., Fl = l ,

cedure which is the transparency with resp ect to pro jec-

0 0

where l = V  p and l = V  p are corresp onding

r l

tion mo del: central and parallel pro jection are treated

epip olar lines. We can rewrite the pro jective relation of

alike | a prop ertywhich has implications on stability

epip olar lines using the matrix form of cross-pro ducts:

of re-pro jection no matter what degree of p ersp ective

0

distortions are present in the images.

F (V  p)=F [V ]p = l ;

r r

where [V ]isa skew symmetric matrix (and hence has

r

7.2 The Case of Parallel Pro jection

2). From the p oint/line incidence prop ertywehave

t t

0 0 0 0

Hp =0 F [V ]p =0,orp that p  l = 0 and therefore, p

r

The construction for obtaining pro jective structure is

where H = F [V ]. The matrix H is known as the fun-

r

well de ned for all central pro jections, including the case

damental matrix intro duced by Longuet-Higgins (1981),

where the center of pro jection is an ideal p oint, i.e., such

and is of rank 2. One can recover H (up to a scale factor)

as happ ening with parallel pro jection. The construction

directly from eight corresp onding p oints, or by using a

has two comp onents: the rst comp onent has to do with

principle comp onents approach if more than eightpoints

recovering the epip olar geometry via reference planes,

are available. Finally, it is easy to see that

and the second comp onent is the pro jectiveinvariant .

p

HV =0;

From Prop osition 1 the pro jective transformations A

r

and E can b e uniquely determined from three corre-

and therefore the epip ole V can b e uniquely recovered

r

sp onding p oints and the corresp onding epip oles. If b oth

(up to a scale factor). Note that the determinantof

epip oles are ideal, the transformations b ecome ane

the rst principle minor of H vanishes in the case where

transformations of the plane (an ane transformation

V is an ideal p oint, i.e., h h h h =0.Inthat

r 11 22 12 21

separates ideal p oints from Euclidean p oints). All other

case, the x; y comp onents of V can b e recovered (up to

r

p ossibilities (b oth epip oles are Euclidean, one epip ole

a scale factor) from the third rowof H . The epip oles,

Euclidean and the other epip ole ideal) lead to pro jective

therefore, can b e uniquely recovered under b oth central

transformations. Because a pro jectivity of the pro jec-

and parallel pro jection. Wehavearrived at the following

tive plane is uniquely determined from any four p oints

theorem:

on the pro jective plane (provided no three are collinear),

the transformations A and E are uniquely determined

Theorem 2 In the case where we have eight correspond-

under all situations of central pro jection | including

ing points of two views taken under central projection

parallel pro jection.

(including paral lel projection), four of these points, com-

ing from four non-coplanar points in space, aresu-

The pro jectiveinvariant is the same as the one

p

cient for computing the projective structure invariant

p

de ned under parallel pro jection (Section 5) | ane

for the remaining four points and for al l other points in

structure is a particular instance of pro jective structure

spaceprojecting onto corresponding pointsinboth views.

in which the epip ole V is an ideal p oint. By using the

l

same invariant for b oth parallel and central pro jection,

We summarize in the following section the 8-p oint

and b ecause all other elements of the geometric construc-

scheme for reconstructing pro jective structure and p er-

tion hold for b oth pro jection mo dels, the overall system

forming re-pro jection onto a novel view.

is transparent to the pro jection mo del b eing used.

8.1 8-p oint Re-pro jection Algorithm

The rst implication of this prop erty has to do with

stability. Pro jective structure do es not require any p er-

We assume wehave eight corresp onding p oints b etween

0 00

sp ective distortions, therefore all imaging situations can

two mo del views and the novel view, p ! p ! p ,

j

j j

b e handled | wide or narrow eld of views. The second

j =1; :::; 8, and that the rst four p oints are coming from

implication is that 3D visual recognition from 2D images

four non-coplanar p oints in space. The computations

can b e achieved in a uniform manner with regard to the

for recovering pro jective structure and p erforming re-

pro jection mo del. For instance, we can recognize (via re-

pro jection are describ ed b elow.

pro jection) a p ersp ective image of an ob ject from only

1: Recover the fundamental matrix H (up to a scale

two orthographic mo del images, and in general anycom-

t

0

factor) that satis es p Hp , j =1; :::; 8. The right

j

j

bination of p ersp ective and orthographic images serving

epip ole V then satis es HV = 0. Similarly,the

r r

as mo del or novel views is allowed.

t 0

~

left epip ole is recovered from the relation p Hp and

The results so far required prior knowledge (or as-

~

HV =0.

l

sumption) that four of the corresp onding p oints are com-

ing from coplanar p oints in space. This requirementcan 2: Recover the transformation A that satis es V =

l

0

be avoided, using two more corresp onding p oints (mak- AV and  p = Ap , j =1; 2; 3. Similarly, recover

r j j

j

ing eight p oints overall), and is describ ed in the next the transformation E that satis es V = EV and

l r

0

section.  p = Ep , j =2; 3; 4.

j j

j 7

0 P

as the cross-ratio of p ;Ap;Ep;V ,for

3: Compute s

p l

all p oints p.

P

erform step 1 and 2 b etween the rst and novel 4: P ~

Ps

view: recover the epip oles V ;V , and the trans-

rn ln

~ ~

~ P

formations A and E .

00

5: For every p oint p, recover p from the cross-ratio

p

00

~ ~

and the three rays Ap; Ep; V . Normalize p such

ln that its third co ordinate is set to 1. p’

p

e discuss next the p ossibilityofworking with a rigid W ~p’ s p

^ e pro jection and calibrated cam-

camera (i.e., p ersp ectiv p’ era).

V 2

V’

9 The Rigid Camera Case V 1 v

s

The advantage of the non-rigid camera mo del (or the

central pro jection mo del) used so far is that images can

b e obtained from uncalibrated cameras. The price paid

Figure 5: Illustration that pro jective shap e can b e re-

for this prop erty is that the images that pro duce the

covered only up to a uniform scale (see text).

same pro jective structure invariant(equivalence class of

images of the ob ject) can b e pro duced by applying non-

To determine the re ection comp onent, it is sucient

rigid transformations of the ob ject, in addition to rigid

0

.The to observe a third corresp onding p oint p ! p

3

3

transformations.

ob ject p oint P is along the ray V p and therefore has

3 1 3

In this section we show that it is p ossible to verify

the co ordinates p (w.r.t. the rst camera co ordinate

3 3

whether the images were pro duced by rigid transfor-

0

frame), and is also along the ray V p and therefore has

2

3

mations, which is equivalenttoworking with p ersp ec-

0 0

(w.r.t. the second camera co ordi- p the co ordinates

3 3

tive pro jection assuming the cameras are internally cal-

nate frame). We note that the ratio b etween and

3

ibrated. This can b e done for b oth schemes presented

0

is a p ositivenumb er. The change of co ordinates is

3

ab ove, i.e., the 6-p oint and 8-p oint algorithms. In b oth

represented by:

cases we exclude orthographic pro jection and assume

0 0

V + Rp = p ;

only p ersp ective pro jection.

r 3 3

3 3

In the p ersp ective case, the second reference plane is

where is an unknown constant. If wemultiply b oth

0

the image plane of the rst mo del view, and the trans-

, j =1; 2; 3, the term V sides of the equation by l

r

j

formation for pro jecting the second reference plane onto

drops out, b ecause V is incident to all left epip olar lines,

r

t

t 0

any other view is the rotational comp onent of camera

and after substituting l with l R,we are left with,

j j

motion (rigid transformation). We recover the rota-

t

0 0 0 t

;  p l  p = l

tional comp onent of camera motion byadoptingare-

3 3

3 j 3 j

0

sult derived by Lee (1988), who shows that the rota-

which is sucient for determining the sign of l .

j

tional comp onent of motion can b e uniquely determined

The rotation matrix R can b e uniquely recovered from

from two corresp onding p oints and the corresp onding

any three corresp onding p oints and the corresp onding

epip oles. We then show that pro jective structure can b e

epip oles. Pro jective structure can b e reconstructed by

uniquely determined, up to a uniform scale factor, from

replacing the transformation E of the second reference

two calibrated p ersp ective images.

plane, with the rigid transformation R (whichisequiv-

alent to treating the rst image plane as a reference

Prop osition 3 (Lee, 1988) In the case of perspective

plane). We show next that this can lead to pro jective

projection, the rotational component of camera motion

structure up to an unknown uniform scale factor (unlike

can be uniquely recovered, up to a re ection, from two

the non-rigid camera case).

corresponding points and the corresponding epipoles.

Prop osition 4 In the perspective case, the projective

The re ection component can also be uniquely deter-

shapeconstant can be determined, from two views,

mined by using a thirdcorresponding point.

p

at most up to a uniform scale factor.

0 0

Pro of: Let l = p  V and l = p  V , j =1; 2

l j j r

j j

Pro of: Consider Figure 5, and let the e ective trans-

be two corresp onding epip olar lines. Because R is an or-

lation b e V V = k (V V ), which is the true trans-

2 s 2 1

thogonal matrix, it leaves vector magnitudes unchanged,

0 0

lation scaled by an unknown factor k . Pro jectiveshape,

and we can normalize the length of l ;l ;V to b e of the

l

1 2

, remains xed if the scene and the fo cal length of the

p

same length of l ;l ;V , resp ectively.Wehave therefore,

1 2 r

0

rst view are scaled by k : from similarity of triangles we

l = Rl , j =1; 2, and V = RV ,which is sucientfor

j l r

j

have,

determining R up to a re ection. Note that b ecause R

V V p V f

s 2 s s s

is a rigid transformation, it is b oth an epip olar and an

k = = =

V V p V 1

1 2 1

induced epip olar transformation (the induced transfor-

1 t

P V P V mation E is determined by E =(R ) , therefore E = R

s s s 2

= =

b ecause R is an orthogonal matrix).

P V P V

1 2 8

Y

the corresp onding p oints b e pro jected from four copla-

nar p oints in space, it is of sp ecial interest to see how

the metho d b ehaves under conditions that violate this

assumption, and under noise conditions in general. The

stability of the 8-p oint algorithm largely dep ends on the

metho d for recovering the epip oles. The metho d adopted

from Faugeras (1992), describ ed in Section 8, based on

the fundamental matrix, tends to b e very sensitiveto

al numberofpoints (eightpoints) are

Z noise if the minim

used. Wehave, therefore, fo cused the exp erimental error

analysis on the 6-p ointscheme.

Figure 6 illustrates the exp erimental set-up. The ob-

object points

ts in space arranged in the follow- image plane ject consists of 26 p oin

X

ing manner: 14 p oints are on a plane (reference plane)

ortho-parallel to the image plane, and 12 p oints are out

Figure 6: The basic ob ject con guration for the exp eri-

of the reference plane. The reference plane is lo cated

mental set-up.

two fo cal lengths away from the center of pro jection (fo-

cal length is set to 50 units). The depth of out-of-plane

p oints varies randomly b etween 10 to 25 units away from

where f is the scaled fo cal length of the rst view. Since

s

the reference plane. The x; y co ordinates of all p oints,

the magnitude of the translation along the line V V is

1 2

except the p oints P ;:::;P ,vary randomly b etween 0

1 6

irrecoverable, we can assume it is null, and compute

p

| 240. The `privileged' p oints P ; :::; P have x; y co-

1 6

0

as the cross-ratio of p ; Ap; Rp; V which determines pro-

l

ordinates that place these p oints all around the ob ject

jective structure up to a uniform scale.

(clustering privileged p oints together will inevitably con-

Because is determined up to a uniform scale, we

p

tribute to instability).

need an additional p oint in order to establish a common

The rst view is simply a p ersp ective pro jection of the

scale during the pro cess of re-pro jection (we can use one

ob ject. The second view is a result of rotating the ob ject

of the existing six or eightpoints we already have). We

around the p oint (128; 128; 100) with an axis of rotation

obtain, therefore, the following result:

describ ed by the unit vector (0:14; 0:7; 0:7) by an an-

Theorem 3 In the perspective case, a rigid re-

gle of 29 degrees, followed by a p ersp ective pro jection

projection from two model views onto a novel view is pos-

(note that rotation ab out a p oint in space is equivalent

sible, using four corresponding points coming from four

to rotation ab out the center of pro jection followed by

non-coplanar points, and the corresponding epipoles.

translation). The third (novel) view is constructed in a

The projective structurecomputedfrom two perspective

similar manner with a rotation around the unit vector

images, is invariant up to an overal l scale factor.

(0:7; 0:7; 0:14)by an angle of 17 degrees. Figure 7 ( rst

row) displays the three views. Also in Figure 7 (second

Orthographic pro jection is excluded from this result

row) we show the result of applying the transformation

b ecause it is well known that the rotational comp onent

due to the four coplanar p oints p ; :::; p (Step 1, see Sec-

1 4

cannot b e uniquely determined from two orthographic

tion 7.1) to all p oints in the rst view. We see that all

views (Ullman 1979, Huang and Lee 1989, Aloimonos

the coplanar p oints are aligned with their corresp ond-

and Brown 1989). To see what happ ens in the case of

ing p oints in the second view, and all other p oints are

parallel pro jection note that the epip oles are vectors on

situated along epip olar lines. The display on the right

the xy plane of their co ordinate systems (ideal p oints),

in the second rowshows the nal re-pro jection result (8-

and the epip olar lines are twovectors p erp endicular to

p oint and 6-p oint metho ds pro duce the same result). All

the epip ole vectors. The equation RV = V takes care

r l

p oints re-pro jected from the two mo del views are accu-

of the rotation in plane (around the optical axis). The

0 rately (noise-free exp eriment) aligned with their corre-

, j =1; 2, take care only other two equations Rl = l

j

j

sp onding p oints in the novel view.

of rotation around the epip olar direction | rotation

The third row of Figure 7 illustrates a more challeng-

around an axis p erp endicular to the epip olar direction

ing imaging situation (still noise-free). The second view

is not accounted for. The equations for solving for R

is orthographically pro jected (and scaled by 0.5) follow-

provide a non-singular system of equations but do pro-

ing the same rotation and translation as b efore, and the

duce a rotation matrix with no rotational comp onents

novel view is a result of a central pro jection onto a tilted

around an axis p erp endicular to the epip olar direction.

image plane (rotated by 12 degrees around a coplanar

axis parallel to the x-axis). Wehave therefore the situ-

10 Simulation Results Using Synthetic

ation of recognizing a non-rigid p ersp ective pro jection

Ob jects

from a novel viewing p osition, given a rigid p ersp ec-

tive pro jection and a rigid orthographic pro jection from We ran simulations using synthetic ob jects to illustrate

two mo del viewing p ositions. The 6-p oint re-pro jection the re-pro jection pro cess using the 6-p ointscheme under

scheme was applied with the result that all re-pro jected various imaging situations. We also tested the robust-

p oints are in accurate alignment with their corresp ond- ness of the re-pro jection metho d under various typ es of

ing p oints in the novel view. Identical results were ob- noise. Because the 6-p ointscheme requires that four of 9

Figure 7: Illustration of Re-pro jection. Row 1 (left to right): Three views of the ob ject, two mo del views and a

novel view, constructed by rigid motion following p ersp ective pro jection. The lled dots represent p ; :::; p (coplanar

1 4

p oints). Row 2: Overlay of the second view and the rst view following the transformation due to the reference

plane (Step 1, Section 7.1). All coplanar p oints are aligned with their corresp onding p oints, the remaining p oints are

situated along epip olar lines. The righthand display is the result of re-pro jection | the re-pro jected image p erfectly

matches the novel image (noise-free situation). Row 3: The lefthand displayshows the second view whichisnow

orthographic. The middle display shows the third view whichisnow a p ersp ective pro jection onto a tilted image

plane. The righthand display is the result of re-pro jection which p erfectly matches the novel view. 10

lower than the noise asso ciated with other p oints, for served with the 8-p oint algorithms.

the reason that we are interested in tracking p oints of

The remaining exp eriments, discussed in the follow-

interest that are often asso ciated with distinct inten-

ing sections, were done under various noise conditions.

sity structure (such as the tip of the eye in a picture

We conducted three typ es of exp eriments. The rst ex-

of a face). Correlation metho ds, for instance, are known

p eriment tested the stability under the situation where

to p erform much b etter on such lo cations, than on ar-

P ; :::; P are non-coplanar ob ject p oints. The second

1 4

eas having smo oth intensitychange, or areas where the

exp eriment tested stability under random noise added

change in intensity is one-dimensional. We therefore ap-

to all image p oints in all views, and the third exp eri-

plied a level of 0{0:3 p erturbation to the x and y co or-

ment tested stability under the situation that less noise

dinates of the six p oints, and a level of 0{1 to all other

is added to the privileged six p oints, than to other p oints.

p oints (as b efore). The results are shown in Figure 10.

10.1 Testing Deviation from Coplanarity

The average pixel error over 10 trials uctuates around

0:5 pixels, and the re-pro jection shown for a typical trial

In this exp erimentweinvestigated the e ect of translat-

(average error 0:52, maximal error 1:61) is in relatively

ing P along the optical axis (of the rst camera p osition)

1

go o d corresp ondence with the novel view. With larger

from its initial p osition on the reference plane (z =100)

p erturbations at a range of 0{2, the algorithm b ehaves

to the farthest depth p osition (z = 125), in increments

prop ortionally well, i.e., the average error over 10 trials

of one unit at a time. The exp erimentwas conducted us-

is 1:37.

ing several ob jects of the typ e describ ed ab ove (the six

privileged p oints were xed, the remaining p oints were

11 Summary

assigned random p ositions in space in di erent trials),

undergoing the same motion describ ed ab ove (as in Fig-

In this pap er we fo cused on the problem of recovering

ure 7, rst row). The e ect of depth translation to the

relative, non-metric, structure from two views of a 3D

level z = 125 on the lo cation of p is a shift of 0:93 pix-

1

ob ject. Sp eci cally,theinvariant structure we recover

0 00

els, on p is 1:58 pixels, and on the lo cation of p is 3:26

1 1

do es not require internal camera calibration, do es not

pixels. Depth translation is therefore equivalent to p er-

involve full reconstruction of shap e (Euclidean or pro-

turbing the lo cation of the pro jections of P byvarious

1

jective co ordinates), and treats parallel and central pro-

degrees (dep ending on the 3D motion parameters).

jection as an integral part of one uni ed system. We

Figure 8 shows the average pixel error in re-pro jection

have also shown that the invariant can b e used for the

over the entire range of depth translation. The average

purp oses of visual recognition, within the framework of

pixel error was measured as the average of deviations

the alignment approach to recognition.

from the re-pro jected p oint to the actual lo cation of the

The study is based on an extension of Ko enderink and

corresp onding p oint in the novel view, taken over all

Van Do orn's representation of ane structure as an in-

p oints. Figure 8 also displays the result of re-pro jection

variant de ned with resp ect to a reference plane and

for the case where P is at z = 125. The average error

1

a reference p oint. We rst showed that the KV ane

is 1:31, and the maximal error (the p oint with the most

invariant cannot b e extended directly to a pro jectivein-

deviation) is 7:1 pixels. The alignmentbetween the re-

variant (App endix D), but there exists another ane in-

pro jected image and the novel image is, for the most

variant, describ ed with resp ect to two reference planes,

part, fairly accurate.

that can easily b e extended to pro jective space. As a

result we obtained the pro jective structure invariant.

10.2 Situation of Random Noise to all Image

Wehaveshown that the di erence b etween the ane

Lo cations

and pro jective case lie entirely in the lo cation of epip oles,

We next add random noise to all image p oints in all

i.e., given the lo cation of epip oles b oth the ane and

three views (P is set back to the reference plane). This

1

pro jective structure are constructed from the same infor-

exp erimentwas done rep eatedly over various degrees of

mation captured by four corresp onding p oints pro jected

noise and over several ob jects. The results shown here

from four non-coplanar p oints in space. Therefore, the

have noise b etween 0{1 pixels randomly added to the x

additional corresp onding p oints in the pro jectivecase

and y co ordinates separately. The maximal p erturbation

are used solely for recovering the lo cation of epip oles.

p

2, and b ecause the direction of p erturba- is therefore

Wehaveshown that the lo cation of epip oles can b e

tion is random, the maximal error in relative lo cation is

recovered under b oth parallel and central pro jection us-

double, i.e., 2:8 pixels. Figure 9 shows the average pixel

ing six corresp onding p oints, with the assumption that

errors over 10 trials (one particular ob ject, the same mo-

four of those p oints are pro jected from four coplanar

tion as b efore). The average error uctuates around 1:6

p oints in space, or alternatively byhaving eightcor-

pixels. Also shown is the result of re-pro jection on a typ-

resp onding p oints without assumptions on coplanarity.

ical trial with average error of 1:05 pixels, and maximal

The overall metho d for reconstructing pro jective struc-

error of 5:41 pixels. The matchbetween the re-pro jected

ture and achieving re-pro jection was referred to as the 6-

image and the novel image is relatively go o d considering

p oint and the 8-p oint algorithms. These algorithms have

the amount of noise added.

the unique prop erty that pro jective structure can b e re-

covered from b oth orthographic and p ersp ective images

10.3 Random Noise Case 2

from uncalibrated cameras. This prop erty implies, for

instance, that we can p erform recognition of a p ersp ec- A more realistic situation o ccurs when the magnitude of

tive image of an ob ject given two orthographic images as noise asso ciated with the privileged six p oints is much 11 1.40 |  

|  1.20   

|  1.00   

|  0.80    |  0.60  

|   Average Pixel Error 0.40   0.20 |    0.00 || | | | | 1 6 11 16 21

Deviation from Reference Plane

Figure 8: Deviation from coplanarity: average pixel error due to translation of P along the optical axis from z =100

1

to z = 125, by increments of one unit. The result of re-pro jection (overlay of re-pro jected image and novel image)

for the case z = 125. The average error is 1:31 and the maximal error is 7:1.

3.00 |  2.70 |

|  2.40    2.10 |

1.80 |

1.50 | | Average Pixel Error 1.20 

|  0.90    0.60 || | | | 1 4 7 10

Random Error Trials

Figure 9: Random noise added to all image p oints, over all views, for 10 trials. Average pixel error uctuates around

1:6 pixels. The result of re-pro jection on a typical trial with average error of 1:05 pixels, and maximal error of 5:41

pixels.

jectivity is completely determined by four corresp onding a mo del. It also implies greater stability b ecause the size

p oints. of the eld of view is no longer an issue in the pro cess of

reconstructing shap e or p erforming re-pro jection.

Geometric Illustration

Acknowledgments

Consider the geometric drawing in Figure 11. Let

A; B ; C ; U b e four coplanar p oints in the scene, and let

0 0 0 0

Iwant to thank David Jacobs and Shimon Ullman for

A ;B ;C ;U b e their pro jection in the rst view, and

00 00 00 00

discussions and comments on this work.

A ;B ;C ;U b e their pro jection in the second view.

By construction, the two views are pro jectively related

A Fundamental Theorem of Plane

to each other. We further assume that no three of the

p oints are collinear (four p oints form a quadrangle), and

Pro jectivity

without loss of generality let U b e lo cated within the

The fundamental theorem of plane pro jectivity states

triangle AB C . Let BC b e the x-axis and BA be the

that a pro jective transformation of the plane is com-

y -axis. The pro jection of U onto the x-axis, denoted by

pletely determined by four corresp onding p oints. We

U ,istheintersection of the line AU with the x-axis.

x

prove the theorem by rst using a geometric drawing,

Similarly U is the intersection of the line CU with the

y

and then algebraically byintro ducing the concept of rays

y -axis. b ecause straight lines pro ject onto straight lines,

0 0

(homogeneous co ordinates). The app endix ends with the

wehave that U ;U corresp ond to U ;U if and only if U

x y

x y

0

system of linear equations for determining the corresp on-

corresp onds to U .For any other p oint P , coplanar with

dence of all p oints in the plane, given four corresp onding

AB C U in space, its co ordinates P ;P are constructed

x y

p oints (used rep eatedly throughout this pap er).

in a similar manner. We therefore havethatB; U ;P ;C

x x

De nitions: A perspectivity between two planes is

are collinear and therefore the cross ratio must b e equal

0 0 0 0

de ned as a central pro jection from one plane onto the

to the cross ratio of B ;U ;P ;C , i.e.

x x

other. A projectivity is de ned as made out of a nite

0 0 0 0

BC  U P B C  U P

x x

x x

sequence of p ersp ectivities. A pro jectivity, when repre-

= :

0 0 0 0

BP  U C B  U C P

x x

x x

sented in an algebraic form, is called a projective trans-

formation. The fundamental theorem states that a pro- This form of cross ratio is known as the canonical cross 12 0.54 |  | 0.51    | 0.48    0.45 |   0.42 |

0.39 | | Average Pixel Error 0.36

0.33 |

0.30 || | | | 1 4 7 10

Random Error to Non-privileged Points Trials

Figure 10: Random noise added to non-privileged image p oints, over all views, for 10 trials. Average pixel error

uctuates around 0:5 pixels. The result of re-pro jection on a typical trial with average error of 0:52 pixels, and

maximal error of 1:61 pixels.

A ation P Algebraic Deriv

y

From an algebraic p oint of view it is convenient to view

ts as laying on rays emanating from the center of U p oin

y

y representation is also called the homo- reference pro jection. A ra

U

ous coordinates representation of the plane, and is plane gene

B P

achieved by adding a third co ordinate. Twovectors rep-

resent the same p oint X =(x; y ; z ) if they di er at most

U

y a scale factor (di erent lo cations along the same ray). x b

C

ey result, whichmakes this representation amenable P Ak

x

to application of linear algebra to geometry, is describ ed wing prop osition: A’ A" in the follo

p"

on 5 Aprojectivity of the plane is equivalent y Prop ositi

p’

ar transformation of the homogeneous represen- y to a line image u" image

y p" tation. plane u’

y u" plane uller p’ B" The pro of is omitted here, and can b e found in T

u’

A pro jectivity is equiv- C" (1967, Theorems 5.22, 5.24). C’ u"x p"

B’ x t, therefore, to a linear transformation applied to p’ alen

ux’

ys. Because the corresp ondence b etween p oints

x the ra

and co ordinates is not one-to-one, wehavetotake scalar

factors of prop ortionalityinto account when represent-

ing a pro jective transformation. An arbitrary pro jective

transformation of the plane can b e represented as a non-

col lineation) V singular linear transformation (also called

2 0

X = TX, where  is an arbitrary scale factor.

V

Given four corresp onding rays p =(x ;y ; 1) !

j j

1 j

0 0 0

=(x ;y ; 1), wewould like to nd a linear transfor- p

j j j

0

mation T and the scalars  suchthat p = Tp . Note

j j j

j

Figure 11: The geometry underlying plane pro jectivity

that b ecause only ratios are involved, we can set  =1.

4

from four p oints.

The following are a basic lemma and theorem adapted

from Semple and Kneeb one (1952).

3

ratio. In general there are 24 cross ratios, six of whichare

Lemma 1 If p ; :::; p arefourvectors in R ,nothreeof

1 4

numerically di erent (see App endix B for more details

which are linearly dependent, and if e ; :::; e arerespec-

1 4

on cross-ratios). Similarly, the cross ratio along the y -

tively the vectors (1; 0; 0); (0; 1; 0); (0; 0; 1); (1; 1; 1),there

axis of the reference frame is equal to the cross ratio of

exists a non-singular linear transformation A such that

the corresp onding p oints in b oth views.

Ae =  p ,wherethe  are non-zeroscalars; and the

j j j j

0

Therefore, for anypoint p in the rst view, wecon-

matrices of any two transformations with this property

0 0 0 0 0 0

struct its x and y lo cations, p ;p , along B C and B A ,

x y

di er at most by a scalar factor.

resp ectively.From the equality of cross ratios we nd

00 00 00

the lo cations of p ;p , and that leads to p . Because Pro of: Let p have the comp onents (x ;y ; 1), and with-

j j j

x y

wehave used only pro jective constructions, i.e. straight out loss of generalitylet  = 1. The matrix A satis es

4

0

lines pro ject to straightlines,we are guaranteed that p three conditions Ae =  p , j =1; 2; 3 if and only if

j j j

00

and p are corresp onding p oints.  p is the j 'th column of A. Because of the fourth con-

j j 13

A

dition, the values  ; ; satisfy

1 2 3

!

 p

1 y. .p



= p [p ;p ;p ]

2

4 1 2 3

BC.  3 p

x

and since, byhyp othesis of linear indep endence of

p ;p ;p , the matrix [p ;p ;p ] is non-singular, the 

2 3 1 2 3 j

1 A’

are uniquely determined and non-zero. The matrix A is therefore determined up to a scalar factor. .

p’ . p’ C’ 0

0 y

Theorem 4 If p ; :::; p and p ; :::; p are two sets of

1 4

1 4

3 .

ctors in R ,nothreevectors in either set be- four ve p’

B’ x

ing linearly dependent, there exists a non-singular linear

0

transformation T such that Tp =  p (j =1; :::; 4),

j j j

A’’

where the  arescalars; and the matrix T is uniquely

j

determinedapart fromascalar factor. p’’

y.

Pro of: By the lemma, we can solveforA and  that

j . p’’

satisfy Ae =  p (j =1; :::; 4), and similarly wecan

j j

j B’’ . 0

cho ose B and  to satisfy Be =  p ; and without loss

j j j p’’

j x

of generality assume that  =  =1.We then have, 4

4 C’’



j

0 1

. If, further, Tp =  p and T = BA and  =

j j j

j



j

0 0 0

Up =  p , then TAe =   p and UAe =   p ;

Figure 12: Setting a pro jectivity under parallel pro jec-

j j j j j j j j

j j j

and therefore, by the lemma, TA = UA,i.e.,T = U

tion.

for some scalar  .

The immediate implication of the theorem is that one

The standard way to pro ceed is to assume that b oth

can solve directly for T and  ( = 1). Four p oints

j 4

image planes are parallel to their xy plane with a fo cal

provide twelve equations and wehavetwelve unknowns

length of one unit, or in other words to emb ed the im-

(nine for T and three for  ). Furthermore, b ecause the

j

age co ordinates in a 3D vector whose third comp onent

system is linear, one can lo ok for a least squares solu-

0 0 0

; 1) b e the the ;y =(x is 1. Let p =(x ;y ; 1) and p

j j j

tion by using more than four corresp onding p oints (they

j j j

chosen representation of image p oints. The true co ordi-

all have to b e coplanar): each additional p ointprovides

nates of those image p oints may b e di erent (if the image

three more equations and one more unknown (the  as-

plane are in di erent p ositions than assumed), but the

so ciated with it).

main p oint is that all such representations are pro jec-

Alternatively, one can eliminate  from the equations,

j

tively equivalenttoeach other. Therefore,  p = B p^

j j j

set T = 1 and set up directly a system of eight lin-

3;3

0 0 0

and  p = C p^ , wherep ^ andp ^ are the true image

j j

ear equations as follows. In general wehave four cor-

j j j

0 0 0 0

co ordinates of these p oints. If T is the pro jective trans-

resp onding rays p =(x ;y ;z ) ! p =(x ;y ;z ),

j j j j

j j j j

formation determined by the four corresp onding p oints

j =1; :::; 4, and the linear transformation T satis es

0 1

0

p^ ! p^ ,thenA = CT B is the pro jective transfor-

j

 p = Tp . By eliminating  , each pair of corresp ond-

j

j j j

j

0

. mation b etween the assumed representations p ! p

ing rays contributes the following two linear equations:

j

j

Therefore, the matrix A can b e solved for directly

0 0 0

y x z x x x 0

j j j

j j j

(the system of from the corresp ondences p ! p

j

j

t t = x t + y t + z t

3;1 3;2 j 1;1 j 1;2 j 1;3

0 0 0

z z z

eight equations detailed in the previous section). For

j j j

0 0 0

any given p oint p =(x; y ; 1), the corresp onding p oint

x y y y z y

j j j

j j j

0 0 0

x t + y t + z t p =(x ;y ; 1) is determined by Ap followed by - t t =

j 2;1 j 2;2 j 2;3 3;1 3;2

0 0 0

z z z

j j j

ization to set the third comp onent backto1.

A similar pair of equations can b e derived in the case

A.1 Plane Pro jectivity in Ane Geometry

0 0 0

z = 0 (ideal p oints) by using either x or y (all three

j j j

In parallel pro jection we can takeadvantage of the fact

cannot b e zero).

that parallel lines pro ject to parallel lines. This allows to

de ne co ordinates on the plane by subtending lines par-

Pro jectivity Between Two image Planes of an

allel to the axes (see Figure 12). Note also that the two

0 0 0 0

Uncalibrated Camera

trap ezoids BB p p and BB C C are similar trap ezoids,

x

x

We can use the fundamental theorem of plane pro-

therefore,

0 0 jectivity to recover the pro jective transformation that

B C BC

= :

was illustrated geometrically in Figure 11. Given four

0 0

p C p C

0 0

x

x

corresp onding p oints (x ;y ) ! (x ;y ) that are pro-

j j

j j

This provides a geometric derivation of the result that

jected from four coplanar p oints in space wewould like

three p oints are sucient to set up a pro jectivitybe-

to nd the pro jective transformation A that accounts

0 0

tween anytwo planes under parallel pro jection.

for all other corresp ondences (x; y ) ! (x ;y ) that are

Algebraically, a pro jectivity of the plane can b e pro jected from coplanar p oints in space. 14 ys is computed algebraically

A’ a The cross-ratio of ra

through linear combination of p oints in homogeneous

co ordinates (see Gans 1969, pp. 291{295), as follows.

ys a; b; c; d b e represented byvectors

A B’ b Let the the ra

(a ;a ;a ); :::; (d ;d ;d ), resp ectively. We can repre-

2 3 1 2 3

B 1

sent the rays a ; d as a linear combination of the rays

C C’

; c,by

c b

= b + k c D a

D’

0

d = b + k c

d

For example, k can b e found by solving the linear system

of three equation a = b + k c with two unknowns ; k

Figure 13: The cross-ratio of four distinct concurrent

(one can solve using anytwo of the three equations, or

rays is equal to the cross-ratio of the four distinct p oints

nd a least squares solution using all three equations).

that result from intersecting the rays by a transversal.

We shall assume, rst, that the p oints are Euclidean.

The ratio in which A divides the line BC can b e derived

by:

b b b +kc a

1 1 1 1 1

uniquely represented as a 2D ane transformation of the

AB c

3

a b b +kc b

3 3 3 3 3

= = = k

non-homogeneous co ordinates of the p oints. Namely,if

c a

1 1 b +kc c

1 1 1

AC b

0 0 0

3

a c

3 3 b +kc c

p =(x; y ) and p =(x ;y )aretwo corresp onding p oints,

3 3 3

c DB

0

then

3

and, therefore, the cross- = k Similarly,wehave

DC b

0

3

= Ap + w p

k

ratio of the four rays is = . The same result holds

0

k

where A is a non-singular matrix and w is a vector. The

under more general conditions, i.e., p oints can b e ideal

six parameters of the transformation can b e recovered

as well:

from two non-collinear sets of three p oints, p ;p ;p and

o 1 2

Prop osition 6 If A; B ; C ; D are distinct col linear

0 0 0

p ;p ;p . Let

0

o 1 2

points, with homogeneous coordinates b + k c ; b; c; b + k c,

k

  

1

then the canonical cross-ratio is .

0

0 0 0 0

k

x x ;x x x x ;x x

1 o 2 o

1 o 2 o

A =

0 0 0 0

(for a complete pro of, see Gans 1969, pp. 294{295). For

y y ;y y y y ;y y

1 o 2 o

1 o 2 o

our purp oses it is sucient to consider the case when

0 0 0

and w = p Ap , which together satisfy p p = one of the p oints, say the vector d, is ideal (i.e. d = 0).

o 3

o j o

0

From the vector equation d = b + k c,wehavethat

A(p p ) for j =1; 2. For any arbitrary p oint p on

j o

b

DB

0

3

and, therefore, the ratio = 1. As a result, k =

the plane, wehavethat p is spanned by the twovectors

c DC

3

the cross-ratio is determined only by the rst term, i.e., p p and p p , i.e., p = (p p )+ (p p ); and

1 o 2 o 1 1 o 2 2 o

AB

= = k |whichiswhatwewould exp ect if we b ecause translation in depth is lost in parallel pro jection,

AC

0 0 0 0 0

represented p oints in the Euclidean plane and allowed wehavethat p = (p p )+ (p p ), and therefore

1 2

1 o 2 o

0 0

the p oint D to extend to in nity along the line A; B ; C ; D p p = A(p p ).

o

o

(see Figure 13).

The derivation so far can b e translated directly to our

B Cross-Ratio and the Linear

purp oses of computing the pro jective shap e constantby

Combination of Rays

0 0 0

replacing a; b; c; d with p ; p~ ; p^ ;V , resp ectively.

l

The cross-ratio of four collinear p oints A; B ; C ; D is pre-

served under central pro jection and is de ned as:

C On Epip olar Transformations

0

0 0 0 0

Prop osition 7 The epipolar lines pV and p V areper-

DB A B D B AB

r l

 =  ; =

0 0 0 0 spectively related.

AC DC A C D C

Pro of: Consider Figure 14. Wehave already estab-

(see Figure 13). All p ermutations of the four p oints

0

lished that p pro jects onto the left epip olar line p V .

l

are allowed, and in general there are six distinct cross-

By de nition, the right epip ole V pro jects onto the left

r

ratios that can b e computed from four collinear p oints.

epip ole V , therefore, b ecause lines are pro jectiveinvari-

l

Because the cross-ratio is invariant to pro jection, any

0

ants the line pV pro jects onto the line p V .

r l

transversal meeting four distinct concurrentrays in four

The result that epip olar lines in one image are p er-

distinct p oints will have the same cross ratio | therefore

sp ectively related to the epip olar lines in the other im-

one can sp eak of the cross-ratio of rays (concurrentor

age, implies that there exists a pro jective transformation

parallel) a; b; c; d.

0

F that maps epip olar lines l onto epip olar lines l ,that

j

j

The cross-ratio result in terms of rays, rather than

0 0 0

is Fl =  l , where l = p  V and l = p  V .From

p oints, is app ealing for the reasons that it enables the ap- j j j j r l

j j j

plication of linear algebra (rays are represented as p oints the prop ertyofpoint/line duality of pro jective geome-

in homogeneous co ordinates), and more imp ortant, en- try (Semple and Kneeb one, 1952), the transformation

ables us to treat ideal p oints as any other p oint (critical E that maps p oints on left epip olar lines onto p oints on

for having an algebraic system that is well de ned under the corresp onding right epip olar lines is induced from F ,

1 t

b oth central and parallel pro jection). i.e., E =(F ) . 15 P Q reference plane ~−1 P ~P

P image image ~ plane p’ p’ plane ~ ~P Q p q’ p~ −1

V ~q’ 2 p’ ~ V V p’ l r

V 1

V

2 ely related. Figure 14: Epip olar lines are p ersp ectiv V l

V

t/line duality) The

Prop osition 8 (p oin 1

transformation for projecting p onto the left epipolar line

0 1 t

p V ,isE =(F ) .

l

Figure 15: See text.

0

Pro of: Let l; l b e corresp onding epip olar lines, related

0 0

by the equation l = Fl.Letp; p be anytwo p oints,

The epip olar transformation, therefore, has three free

one on each epip olar line (not necessarily corresp onding

parameters (one for scale, the other two b ecause the

p oints). From the p oint/line incidence axiom wehave

equation Fl =  l has dropp ed out).

t

3 3 3

that l  p = 0. By substituting l wehave

   

t

t

1 0 0 t

D Ane Structure in Pro jective Space

F l  p =0 =) l  F p =0:

Prop osition 10 The ane structure invariant, based

1 t

Therefore, the collineation E =(F ) maps p oints p

on a single reference plane and a referencepoint, cannot

onto the corresp onding left epip olar line.

bedirectly extendedtocentral projection.

It is intuitively clear that the epip olar line transforma-

Pro of: Consider the drawing in Figure 15. Let Q be

tion F is not unique, and therefore the induced trans-

the reference p oint, P b e an arbitrary p ointofinterest

formation E is not unique either. The corresp ondence

~ ~

between the epip olar lines is not disturb ed under trans- in space, and Q; P b e the pro jection of Q and P onto

the reference plane (see section 4 for de nition of ane

lation along the line V V , or under non-rigid camera

1 2

structure under parallel pro jection).

motion that results from tilting the image plane with re-

~ ~

sp ect to the optical axis such that the epip ole remains

The relationship b etween the p oints P; Q; P; Q and

0 0 0 0

on the line V V .

1 2 the p oints p ; p~ ;q ; q~ can b e describ ed as a p ersp ectivity

between two triangles. However, in order to establish

Prop osition 9 The epipolar transformation F is not

an invariantbetween the two triangles one must havea

unique.

coplanar p oint outside each of the triangles, therefore the

Pro of: A pro jective transformation is determined

ve corresp onding p oints are not sucient for determin-

by four corresp onding p encils. The transformation is

ing an invariant relation (this is known as the ` vepoint

unique (up to a scale factor) if no three of the p encils are

invariant' which requires that no three of the p oints b e

linearly dep endent, i.e., if the p encils are lines, then no

collinear).

three of the four lines should b e coplanar. The epip olar

line transformation F can b e determined by the corre-

E On the Intersection of Epip olar Lines

sp onding epip oles, V ! V , and three corresp onding

r l

0

Barret et al. (1991) derive a quadratic invariantbased

epip olar lines l ! l .Weshow next that the epip olar

j

j

on Longuet-Higgins' fundamental matrix. We describ e

lines are coplanar, and therefore, F cannot b e deter-

brie y their invariant and show that it is equivalentto

mined uniquely.

0

p erforming re-pro jection using intersection of epip olar

Let p and p , j =1; 2; 3, b e three corresp onding

j

j

0 0

lines.

p oints and let l = p  V and l = p  V . Let

j j r l

j j

In section 8 we derived Longuet-Higgins' fundamental

p = p + p , + = 1,beapoint on the epip o-

3 1 2

t

0

matrix relation p Hp = 0. Barret et al. note that the

lar line p V collinear with p ;p .Wehave,

3 r 1 2

t

equation can b e written in vector form h  q =0,where

l = p  V =(ap + bV )  V = ap  V = a l + a l ;

3 3 r 3 r r 3 r 1 2

h contains the elements of H and

0 0 0 0 0 0

0 0 0 0 0

and similarly l = l + l . q =(x x; x y; x ;y x; y y; y ;x;y;1):

3 1 2 16

[5] T. Broida, S. Chandrashekhar, and R. Chellapa. re- Therefore, the matrix

3 2

cursive 3-d motion estimation from a mono cular im-

q

1

age sequence. IEEE Transactions on Aerospace and

7 6

:

Electronic Systems, 26:639{656, 1990. 7 6

:

(1) B =

7 6

5 4

:

[6] D.C. Brown. Close-range camera calibration. Pho-

q

togrammetric Engineering, 37:855{866, 1971.

9

must haveavanishing determinant. Given eight corre-

[7] R. Dutta and M.A. Synder. Robustness of corre-

sp onding p oints, the condition jB j = 0 leads to a con-

sp ondence based structure from motion. In Pro-

straint line in terms of the co ordinates of anyninth p oint,

ceedings of the International ConferenceonCom-

i.e., x + y + = 0. The lo cation of the ninth p oint

puter Vision, pages 106{110, Osaka, Japan, Decem-

in any third view can, therefore, b e determined byinter-

b er 1990.

secting the constraint lines derived from views 1 and 3,

[8] W. Faig. Calibration of close-range photogram-

and views 2 and 3.

metry systems: Mathematical formulation. Pho-

Another way of deriving this re-pro jection metho d is

togrammetric Engineering and Remote Sensing,

by rst noticing that H is a correlation that maps p onto

41:1479{1486, 1975.

0 0

the corresp onding epip olar line l = V p (see section 8).

l

Therefore, from views 1 and 3 wehave the relation

[9] O.D. Faugeras. What can b e seen in three dimen-

sions with an uncalibrated stereo rig? In Proceed-

t

00

~

Hp =0; p

ings of the European Conference on Computer Vi-

sion, pages 563{578, Santa Margherita Ligure, Italy,

and from views 2 and 3 wehave the relation

June 1992.

t

00 0

^

p Hp =0;

[10] O.D. Faugeras, Q.T. Luong, and S.J. Maybank.

0

~ ^

where Hp and Hp are twointersecting epip olar lines.

Camera self calibration: Theory and exp eriments.

~

Given eight corresp onding p oints, we can recover H and

In Proceedings of the European Conference on Com-

00

^

H . The lo cation of any ninth p oint p can b e recovered

puter Vision, pages 321{334, Santa Margherita Lig-

0

~ ^

ure, Italy, June 1992.

byintersecting the lines Hp and Hp .

This way of deriving the re-pro jection metho d has an

[11] O.D. Faugeras and S. Maybank. Motion from p oint

advantage over using the condition jB j = 0 directly,

matches: Multiplicity of solutions. International

b ecause one can use more than eight p oints in a least

Journal of Computer Vision, 4:225{246, 1990.

~ ^

squares solution (via SVD) for the matrices H and H .

[12] M.A. Fischler and R.C. Bolles. Random sample con-

Approaching the re-pro jection problem using intersec-

census: a paradigm for mo del tting with applica-

tion of epip olar lines is problematic for novel views that

tions to image analysis and automated cartography.

have a similar epip olar geometry to that of the two mo del

Communications of the ACM, 24:381{395, 1981.

~

views (these are situations where the two lines Hp and

0

^

Hp are nearly parallel, such as when the ob ject rotates

[13] D. Gans. Transformations and .

around nearly the same axis for all views). We there-

Appleton-Century-Crofts, New York, 1969.

fore exp ect sensitivity to errors also under conditions of

[14] E.C. Hildreth. Recovering heading for visually-

small separation b etween views. The metho d b ecomes

guided navigation. A.I. Memo No. 1297, Arti cial

more practical if one uses multiple mo del views instead

Intelligence Lab oratory, Massachusetts Institute of

of only two, b ecause each mo del views adds one epip olar

Technology, June 1991.

line and all lines should intersect at the lo cation of the

p ointofinterest in the novel view.

[15] T.S. Huang and C.H. Lee. Motion and struc-

ture from orthographic pro jections. IEEE Transac-

References

tions on Pattern Analysis and Machine Intel ligence,

PAMI-11:536{540, 1989.

[1] G. Adiv. Inherentambiguities in recovering 3D mo-

[16] D.P. Huttenlo cher and S. Ullman. Ob ject recog-

tion and structure from a noisy ow eld. IEEE

nition using alignment. In Proceedings of the In-

Transactions on Pattern Analysis and Machine In-

ternational Conference on Computer Vision, pages

tel ligence,PAMI-11(5):477{489, 1989.

102{111, London, December 1987.

[2] J. Aloimonos and C.M. Brown. On the kinetic depth

e ect. Biological Cybernetics, 60:445{455, 1989.

[17] C.H. Lee. Structure and motion from two p ersp ec-

tive views via planar patch. In Proceedings of the In-

[3] I.A. Bachelder and S. Ullman. Contour match-

ternational Conference on Computer Vision, pages

ing using lo cal ane transformations. In Pro-

158{164, Tampa, FL, December 1988.

ceedings Image Understanding Workshop.Morgan

Kaufmann, San Mateo, CA, 1992.

[18] R.K. Lenz and R.Y. Tsai. Techniques for calibration

[4] E.B. Barrett, M.H. Brill, N.N. Haag, and P.M. Py- of the scale factor and image center for high accu-

ton. General metho ds for determining pro jective racy 3D machine vision metrology.Infourth int.

invariants in imagery. Computer Vision, Graphics, Symp. on Robotics Research, pages 68{75, Santa

and Image Processing, 53:46{65, 1991. Cruz, CA, Aug 1987. 17

[19] H.C. Longuet-Higgins. A computer algorithm for

reconstructing a scene from two pro jections. Nature,

293:133{135, 1981.

[20] H.C. Longuet-Higgins and K. Prazdny.Theinter-

pretation of a moving retinal image. Proceedings of

the Royal Society of London B, 208:385{397, 1980.

[21] D.G. Lowe. Perceptual organization and visual

recognition. Kluwer Academic Publishing, Hing-

ham, MA, 1985.

[22] R. Mohr, L. Quan, F. Veillon, and B. Boufama. Rel-

ative 3D reconstruction using multiple uncalibrated

images. Technical Rep ort RT 84-IMAG, LIFIA |

IRIMAG, France, June 1992.

[23] J.H. Rieger and D.T. Lawton. Pro cessing di eren-

tial image motion. Journal of the Optical Society of

America, 2:354{360, 1985.

[24] J.W. Roach and J.K. Aggarwal. Computer tracking

of ob jects moving in space. IEEE Transactions on

Pattern Analysis and Machine Intel ligence, 1:127{

135, 1979.

[25] J.G. Semple and G.T. Kneeb one. Algebraic Projec-

tive Geometry. Clarendon Press, Oxford, 1952.

[26] A. Shashua. Corresp ondence and ane shap e from

two orthographic views: Motion and Recognition.

A.I. Memo No. 1327, Arti cial Intelligence Lab o-

ratory, Massachusetts Institute of Technology, De-

cemb er 1991.

[27] C. Tomasi. shape and motion from image streams:

a factorization method. PhD thesis, Scho ol of Com-

puter Science, Carnagie Mellon University,Pitts-

burgh, PA, 1991.

[28] R.Y. Tsai and T.S. Huang. Uniqueness and estima-

tion of three-dimensional motion parameters of rigid

ob jects with curved surface. IEEE Transactions on

Pattern Analysis and Machine Intel ligence,PAMI-

6:13{26, 1984.

[29] A. Tuller. Amodern introduction to geometries.D.

Van Nostrand Co., Princeton, NJ, 1967.

[30] S. Ullman. The Interpretation of Visual Motion.

MIT Press, Cambridge and London, 1979.

[31] S. Ullman. Aligning pictorial descriptions: an ap-

proach to ob ject recognition. Cognition, 32:193{

254, 1989. Also: in MIT AI Memo 931, Dec. 1986.

[32] S. Ullman and R. Basri. Recognition by linear com-

bination of mo dels. IEEE Transactions on Pattern

Analysis and Machine Intel ligence,PAMI-13:992|

1006, 1991. Also in M.I.T AI Memo 1052, 1989. 18