Chapter 11

Matching in 2D

This chapter explores how to make and use correspondences between images and maps, images and models, and images and other images. All work is done in two dimensions; methods are extended in Chapter 14 to 3D-2D and 3D-3D matching. There are many immediate applications of 2D matching which do not require the more general 3D analysis.

Consider the problem of taking inventory of land use in a township for the purpose of planning development or collecting taxes. A plane is dispatched on a clear day to take aerial images of all the land in the township. These pictures are then compared to the most recent map of the area to create an updated map. Moreover, other databases are updated to record the presence of buildings, roads, oil wells, etc., or perhaps the kind of crops growing in the various fields. This work can all be done by hand, but is now commonly done using a computer. A second example is from the medical area. The blood flow in a patient's heart and lungs is to be examined. An X-ray image is taken under normal conditions, followed by one taken after the injection into the bloodstream of a special dye. The second image should reveal the blood flow, except that there is a lot of noise due to other body structures such as bone. Subtracting the first image from the second will reduce noise and artifact and emphasize only the changes due to the dye. Before this can be done, however, the first image must be geometrically transformed or warped to compensate for small motions of the body due to body positioning, heart motion, breathing, etc.

11.1 Registration of 2D Data

A simple general mathematical model applies to all the cases in this chapter and many others not covered. Equation 11.1 and Figure 11.1 show an invertible mapping between points of a model M and points of an image I. Actually, M and I can each be any 2D coordinate space and can each represent a map, model, or image.

    M[x, y] = I[g(x, y), h(x, y)]                                    (11.1)
    I[r, c] = M[g^{-1}(r, c), h^{-1}(r, c)]



Figure 11.1: A mapping between 2D spaces M and I. M may be a model and I an image, but in general any 2D spaces are possible.

1 Definition The mapping from one 2D coordinate space to another as defined in Equation 11.1 is sometimes called a spatial transformation, geometric transformation, or warp (although to some the term warp is reserved for only nonlinear transformations).

The functions g and h create a correspondence between model points [x, y] and image points [r, c] so that a point feature in the model can be located in the image: we assume that the mapping is invertible so that we can go in the other direction using their inverses. Having such mapping functions in the tax record problem allows one to transform property boundaries from a map into an aerial image. The region of the image representing a particular property can then be analyzed for new buildings or crop type, etc. (Currently, the analysis is likely to be done by a human using an interactive graphics workstation.) Having such functions in the medical problem allows the radiologist to analyze the difference image I_2[r, c] - I_1[g(r, c), h(r, c)]: here the mapping functions register like points in the two images.

2 Definition Image registration is the process by which points of two images from similar viewpoints of essentially the same scene are geometrically transformed so that corresponding feature points of the two images have the same coordinates after transformation.
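To make the registration operation concrete, the following minimal sketch (not from the text; it assumes grayscale numpy arrays and user-supplied mapping functions g and h as in Equation 11.1, with nearest-neighbor sampling) computes the difference image used in the dye-injection example above.

    import numpy as np

    def difference_image(img1, img2, g, h):
        """Subtract a geometrically registered img1 from img2.

        g and h map [r, c] of img2 into the (row, col) coordinates of
        img1, as in Equation 11.1; nearest-neighbor sampling is used."""
        out = np.zeros(img2.shape, dtype=float)
        rows, cols = img2.shape
        for r in range(rows):
            for c in range(cols):
                r1, c1 = int(round(g(r, c))), int(round(h(r, c)))
                if 0 <= r1 < img1.shape[0] and 0 <= c1 < img1.shape[1]:
                    out[r, c] = float(img2[r, c]) - float(img1[r1, c1])
        return out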

Another common and important application, although not actually a matching operation, is creation of a new image by collecting samples from another image. For example, as shown in Figure 11.2, we might want to cut out a single face I_2 from an image I_1 of several people. Although the content of the new image I_2 is a subset of the original image I_1, it is possible that I_2 can have the same number of pixels as I_1 (as is the case in the figure), or even more.

There are several issues of this theory which have practical importance. What is the form of the functions g and h; are they linear, continuous, etc.? Are straight lines in one space mapped into straight or curved lines in the other space? Are the distances between point pairs the same in both spaces? More important, how do we use the properties of different functions to achieve the mappings needed? Is the 2D space of the model or image continuous or discrete? If at least one of the spaces is a digital image, then quantization effects will impact both accuracy and visual quality. (The quantization effects have deliberately been kept in the right image of Figure 11.2 to demonstrate this point.)


Figure 11.2: New image of a face (right) cut out of original image (left) using a sampling transformation. The face at the right is the rightmost face of the five at the left.

Exercise 1
Describe how to enhance the right image of Figure 11.2 to lessen the quantization or aliasing effect.

11.2 Representation of Points

In this chapter, we work specifically with points from 2D spaces. Extension of the definitions and results to 3D is done later in Chapter 13; most, but not all, extensions are straightforward. It is good for the student to master the basic concepts and notation before handling the increased complexity sometimes present in 3D. A 2D point has two coordinates and is conveniently represented as either a row vector P = [x, y] or column vector P = [x, y]^t. The column vector notation will be used in our equations here to be consistent with most engineering books which apply transformations T on the left to points P on the right. For convenience, in our text we will often use the row vector form and omit the formal transpose notation t. Also, we will separate coordinates by commas, something that is not needed when a column vector is displayed vertically.

    P = [x, y]^t = \begin{bmatrix} x \\ y \end{bmatrix}

Sometimes we will have need to label a point according to the type of feature from which it was determined. For example, a point could be the center of a hole, a vertex of a polygon, or the computed location where two extended line segments intersect. Point type will be used to advantage in the automatic matching algorithms discussed later in the chapter.

Reference Frames

The coordinates of a point are always relative to some coordinate frame. Often, there are several coordinate frames needed to analyze an environment, as discussed at the end of Chapter 2. When multiple coordinate frames are in use, we may use a special superscript to denote which frame is in use for the coordinates being given for the point.

3 Definition If P_j is some feature point and C is some reference frame, then we denote the coordinates of the point relative to the coordinate system as {}^{C}P_j.


Homogeneous Coordinates

As will soon become clear, it is often convenient notationally and for computer processing to use homogeneous coordinates for points, especially when affine transformations are used.

4 Definition The homogeneous coordinates of a 2D point P = [x, y]^t are [sx, sy, s]^t, where s is a scale factor, commonly 1.0.

Finally, we need to note the conventions of coordinate systems and programs that display pictures. The coordinate systems in the drawn figures of this chapter are typically plotted as they are in mathematics books with the first coordinate (x or u or even r) increasing to the right from the origin and the second coordinate (y or v or even c) increasing upward from the origin. However, our image display programs display an image of n rows and m columns with the first row (row r = 0) at the top and the last row (row r = n - 1) at the bottom. Thus r increases from the top downward and c increases from left to right. This presents no problem to our algebra, but may give our intuition trouble at times since the displayed image needs to be mentally rotated counterclockwise 90 degrees to agree with the conventional orientation in a math book.

11.3 Affine Mapping Functions

A large class of useful spatial transformations can be represented by multiplication of a matrix and a homogeneous point. Our treatment here is brief but fairly complete; for more details, consult one of the graphics or robotics texts listed in the references.

Scaling

A common operation is scaling. Uniform scaling changes all coordinates in the same way, or equivalently changes the size of all objects in the same way. Figure 11.3 shows a 2D point P = [1, 2] scaled by a factor of 2 to obtain the new point P' = [2, 4]. The same scale factor is applied to the three vertices of the triangle yielding a triangle twice as large. Scaling is a linear transformation, meaning that it can be easily represented in terms of the scale factor applied to the two basis vectors for 2D Euclidean space. For example,

    [1, 2] = 1[1, 0] + 2[0, 1]  and  2[1, 2] = 2(1[1, 0] + 2[0, 1]) = 2[1, 0] + 4[0, 1] = [2, 4].

Equation 11.2 shows how scaling of a 2D point is conveniently represented using multiplication by a simple matrix containing the scale factors on the diagonal. The second case is the general case where the x and y unit vectors are scaled differently and is given in Equation 11.3. Recall the five coordinate frames introduced in Chapter 2: the change of coordinates of real image points expressed in mm units to image points expressed in row and column units is one such scaling. In the case of a square pixel camera, c_x = c_y = c, but these constants will be in the ratio of 4/3 for cameras built using TV standards.

    \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} c & 0 \\ 0 & c \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} cx \\ cy \end{bmatrix} = c \begin{bmatrix} x \\ y \end{bmatrix}    (11.2)

    \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} c_x & 0 \\ 0 & c_y \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} c_x x \\ c_y y \end{bmatrix}    (11.3)


Figure 11.3: Scaling both coordinates of a 2D vector by scale factor 2.

Exercise 2 scaling for a non-square pixel camera
Suppose a square CCD chip has side 0.5 inches and contains 480 rows of 640 pixels each on this active area. Give the scaling matrix needed to convert pixel coordinates [r, c] to coordinates [x, y] in inches. The center of pixel [0,0] corresponds to [0, 0] in inches. Using your conversion matrix, what are the integer coordinates of the center of the pixel in row 100 column 200?

Rotation

A second common operation is rotation about a point in 2D space. Figure 11.4 (left) shows a 2D point P = [x, y] rotated by angle θ counterclockwise about the origin to obtain the new point P' = [x', y']. Equation 11.4 shows how rotation of a 2D point about the origin is conveniently represented using multiplication by a simple matrix. As for any linear transformation, we take the columns of the matrix to be the result of the transformation applied to the basis vectors (Figure 11.4 (right)); transformation of any other vector can be expressed as a linear combination of the basis vectors.

    R_θ([x, y]) = R_θ(x[1, 0] + y[0, 1])
                = x R_θ([1, 0]) + y R_θ([0, 1]) = x[\cos θ, \sin θ] + y[-\sin θ, \cos θ]
                = [x \cos θ - y \sin θ, x \sin θ + y \cos θ]

    \begin{bmatrix} x' \\ y' \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ \\ \sin θ & \cos θ \end{bmatrix} \begin{bmatrix} x \\ y \end{bmatrix} = \begin{bmatrix} x \cos θ - y \sin θ \\ x \sin θ + y \cos θ \end{bmatrix}    (11.4)

2D rotations can be made about some arbitrary point in the 2D plane, which need not be the origin of the reference frame. Details are left for a guided exercise later in this section.


Figure 11.4: Rotation of any 2D point in terms of rotation of the basis vectors. (Left) rotation of point P by angle θ; (right) rotation of the basis vectors by θ: [1, 0] maps to [cos θ, sin θ] and [0, 1] maps to [-sin θ, cos θ].
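As a quick numerical check (an illustrative numpy fragment, not part of the text), the matrix of Equation 11.4 can be built from its basis-vector columns and applied to a point:

    import numpy as np

    def rotation_matrix(theta):
        # Columns are the images of the basis vectors, as in Equation 11.4.
        return np.array([[np.cos(theta), -np.sin(theta)],
                         [np.sin(theta),  np.cos(theta)]])

    R = rotation_matrix(np.pi / 2)        # rotate 90 degrees about the origin
    print(R @ np.array([1.0, 2.0]))       # approximately [-2, 1]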

Exercise 3
(a) Sketch the three points [0,0], [2,2], and [0,2] using an XY coordinate system. (b) Scale these points by 0.5 using Equation 11.2 and plot the results. (c) Using a new plot, plot the result of rotating the three points by 90 degrees about the origin using Equation 11.4. (d) Let the scaling matrix be S and the rotation matrix be R. Let SR be the matrix resulting from multiplying matrix S on the left of matrix R. Is there any difference if we transform the set of three points using SR and RS?


* Orthogonal and Orthonormal Transforms

5 Definition A set of vectors is said to be orthogonal if all pairs of vectors in the set are perpendicular; or equivalently, have scalar product of zero.

6 Definition A set of vectors is said to be orthonormal if it is an orthogonal set and if all the vectors have unit length.

A rotation preserves both the length of the basis vectors and their orthogonality. This can be seen both intuitively and algebraically. As a direct result, the distance between any two transformed points is the same as the distance between the points before transformation. A rigid transformation has this same property: a rigid transformation is a composition of rotation and translation. Rigid transformations are commonly used for rigid objects or for change of coordinate system. A uniform scaling that is not 1.0 does not preserve length; however, it does preserve the angle between vectors. These issues are important when we seek properties of objects that are invariant to how they are placed in the scene or how a camera views them.

Translation

Often, point coordinates need to be shifted by some constant amount, which is equivalent to changing the origin of the coordinate system. For example, row-column coordinates of a pixel image might need to be shifted to transform to latitude-longitude coordinates of a map. Since translation does not map the origin [0, 0] to itself, we cannot model it using a simple 2x2 matrix as has been done for scaling and rotation: in other words, it is not a linear operation. We can extend the dimension of our matrix to 3x3 to handle translation as well as some other operations: accordingly, another coordinate is added to our point vector to obtain homogeneous coordinates. Typically, the appended coordinate is 1.0, but other values may sometimes be convenient.

    P = [x, y] ≃ [wx, wy, w] = [x, y, 1] for w = 1

The translation matrix shown in Equation 11.5 can now be used to model the translation D_{x_0, y_0} of point [x, y] so that [x', y'] = D_{x_0, y_0}([x, y]) = [x + x_0, y + y_0].

    \begin{bmatrix} x' \\ y' \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} x + x_0 \\ y + y_0 \\ 1 \end{bmatrix}    (11.5)
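In homogeneous coordinates, scaling, rotation, and translation all become 3x3 matrices that compose by matrix product. The sketch below (illustrative only; numpy assumed, function names are ours) mirrors Equations 11.2, 11.4, and 11.5; note that the rightmost matrix acts first.

    import numpy as np

    def translate(x0, y0):
        return np.array([[1.0, 0.0, x0],
                         [0.0, 1.0, y0],
                         [0.0, 0.0, 1.0]])

    def rotate(theta):
        return np.array([[np.cos(theta), -np.sin(theta), 0.0],
                         [np.sin(theta),  np.cos(theta), 0.0],
                         [0.0,            0.0,           1.0]])

    def scale(c):
        return np.array([[c,   0.0, 0.0],
                         [0.0, c,   0.0],
                         [0.0, 0.0, 1.0]])

    # Compose: rotate, then scale, then translate.
    T = translate(5.0, 8.0) @ scale(2.0) @ rotate(np.pi / 2)
    P = np.array([1.0, 0.0, 1.0])   # homogeneous point [1, 0]
    print(T @ P)                    # [5, 10, 1]: [1,0] -> [0,1] -> [0,2] -> [5,10]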

Exercise 4 rotation about a point
Give the 3x3 matrix that represents a π/2 rotation of the plane about the point [5, 8]. Hint: first derive the matrix D_{-5,-8} that translates the point [5, 8] to the origin of a new coordinate frame. The matrix which we want will be the combination D_{5,8} R_{π/2} D_{-5,-8}. Check that your matrix correctly transforms points [5, 8], [6, 8] and [5, 9].


Exercise 5 reflection about a coordinate axis
A reflection about the y-axis maps the basis vector [1, 0] onto [-1, 0] and the basis vector [0, 1] onto [0, 1]. (a) Construct the matrix representing this reflection. (b) Verify that the matrix is correct by transforming the three points [1, 1], [1, 0], and [2, 1].

Figure 11.5: Image from a square-pixel camera looking vertically down on a workbench: feature points in image coordinates need to be rotated, scaled, and translated to obtain workbench coordinates.

Rotation, Scaling and Translation

Figure 11.5 shows a common situation: an image I[r, c] is taken by a square-pixel camera looking perpendicularly down on a planar workspace W[x, y]. We need a formula that converts any pixel coordinates [r, c] in units of rows and columns to, say, mm units [x, y]. This can be done by composing a rotation R, a scaling S, and a translation D as given in Equation 11.6 and denoted {}^w P_j = D_{x_0, y_0} S_s R_θ {}^i P_j. There are four parameters that determine the mapping between row-column coordinates and x-y coordinates on the workbench: angle of rotation θ, scale factor s that converts pixels to mm, and the two displacements x_0, y_0. These four parameters can be obtained from the coordinates of two control points P_1 and P_2. These points are determined by clearly marked and easily measured features in the workspace that are also readily observed in the image ('+' marks, for example). In the land use application, road intersections, building corners, sharp river bends, etc. are often used as control points. It is important to emphasize that the same point feature, say P_1, can be represented by two (or more) distinct vectors, one with row-column coordinates relative to I and one with mm x-y coordinates relative to W. We denote these representations as {}^i P_1 and {}^w P_1 respectively. For example, in Figure 11.5 we have {}^i P_1 = [100, 60] and {}^w P_1 = [200, 100].


7 Definition Control points are clearly distinguishable and easily measured points used to establish known correspondences between different coordinate spaces.

Given coordinates for point P_1 in both coordinate systems, matrix equation 11.6 yields two separate equations in the four unknowns.

    \begin{bmatrix} x_w \\ y_w \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & x_0 \\ 0 & 1 & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} s & 0 & 0 \\ 0 & s & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos θ & -\sin θ & 0 \\ \sin θ & \cos θ & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x_i \\ y_i \\ 1 \end{bmatrix}    (11.6)

    x_w = x_i s \cos θ - y_i s \sin θ + x_0    (11.7)
    y_w = x_i s \sin θ + y_i s \cos θ + y_0    (11.8)

Using point P_2 yields two more equations and we should be able to solve the system to determine the four parameters of the conversion formula. θ is easily determined independent of the other parameters as follows. First, the direction of the vector P_1 P_2 in I is determined as θ_i = arctan(({}^i y_2 - {}^i y_1)/({}^i x_2 - {}^i x_1)). Then, the direction of the vector in W is determined as θ_w = arctan(({}^w y_2 - {}^w y_1)/({}^w x_2 - {}^w x_1)). The rotation angle is just the difference of these two angles: θ = θ_w - θ_i. Once θ is determined, all the sin and cos elements are known: there are 3 equations and 3 unknowns which can easily be solved for s and x_0, y_0. The reader should complete this solution via the exercise below.
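One way to carry out this solution numerically is sketched below (a hedged numpy fragment, not the book's code; it recovers the scale from the ratio of the two vector lengths rather than by elimination in Equations 11.7-11.8, which gives the same result for consistent data).

    import numpy as np

    def rst_from_two_points(i_p1, i_p2, w_p1, w_p2):
        """Recover theta, s, x0, y0 of Equations 11.7-11.8 from two
        control points in image (i_) and workbench (w_) coordinates."""
        i_p1, i_p2 = np.asarray(i_p1, float), np.asarray(i_p2, float)
        w_p1, w_p2 = np.asarray(w_p1, float), np.asarray(w_p2, float)
        # Rotation: difference of the two vector headings.
        theta_i = np.arctan2(i_p2[1] - i_p1[1], i_p2[0] - i_p1[0])
        theta_w = np.arctan2(w_p2[1] - w_p1[1], w_p2[0] - w_p1[0])
        theta = theta_w - theta_i
        # Scale: ratio of the two vector lengths.
        s = np.linalg.norm(w_p2 - w_p1) / np.linalg.norm(i_p2 - i_p1)
        # Translation from Equations 11.7 and 11.8 using point P1.
        x0 = w_p1[0] - s * (i_p1[0] * np.cos(theta) - i_p1[1] * np.sin(theta))
        y0 = w_p1[1] - s * (i_p1[0] * np.sin(theta) + i_p1[1] * np.cos(theta))
        return theta, s, x0, y0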

Exercise 6 converting image coordinates to workbench coordinates
Assume an environment as in Figure 11.5. (Perhaps the vision system must inform a pick-and-place robot of the locations of objects.) Give, in matrix form, the transformation that relates image coordinates [x_i, y_i, 1] to workbench coordinates [x_w, y_w, 1]. Compute the four parameters using these control points: {}^i P_1 = [100, 60]; {}^w P_1 = [200, 100]; {}^i P_2 = [380, 120]; {}^w P_2 = [300, 200].

An Example Affine Warp

It is easy to extract a parallelogram of data from a digital image by selecting 3 points. The first point determines the origin of the output image to be created, while the second and third points determine the extreme points of the parallelogram sides. The output image will be a rectangular pixel array of any size constructed from samples from the input image. Figure 11.6 shows results of using a program based on this idea. To create the center image of the figure, the three selected points defined nonorthogonal axes, thus creating shear in the output; this shear was removed by extracting a third image from the center image and aligning the new sampling axes with the skewed axes. Figure 11.7 shows another example: a distorted version of Andrew Jackson's head has been extracted from an image of a $20 bill (see Figure 11.24). In both of these examples, although only a portion of the input image was extracted, the output image contains the same number of pixels.

The program that created Figure 11.7 used the three user selected points to transform the input parallelogram defined at the left of the figure. The output image was n by m or 512x512 pixels with coordinates [r, c]; for each pixel [r, c] of the output, the input image value was sampled at pixel [x, y] computed using the transformation in Equation 11.9. The first form of the equation is the intuitive construction in terms of basis vectors, while the second is its equivalent in standard form.


Figure 11.6: (Left) 128x128 digital image of grid; (center) 128x128 image extracted by an affine warp defined by three points in the left image; (right) 128x128 rectified version of part of the center image.

Figure 11.7: Distorted face of Andrew Jackson extracted from a $20 bill by defining an affine mapping with shear. The selected points [x_0, y_0], [x_1, y_1], and [x_2, y_2] define the new axes u and v.


    \begin{bmatrix} x \\ y \end{bmatrix} = \frac{r}{n} \begin{bmatrix} x_1 - x_0 \\ y_1 - y_0 \end{bmatrix} + \frac{c}{m} \begin{bmatrix} x_2 - x_0 \\ y_2 - y_0 \end{bmatrix} + \begin{bmatrix} x_0 \\ y_0 \end{bmatrix}

    \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} (x_1 - x_0)/n & (x_2 - x_0)/m & x_0 \\ (y_1 - y_0)/n & (y_2 - y_0)/m & y_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} r \\ c \\ 1 \end{bmatrix}    (11.9)

Conceptually, the point [x, y] is defined in terms of the new unit vectors along the new axes defined by the user selected points. The computed coordinates [x, y] must be rounded to get integer pixel coordinates to access digital image I_1. If either x or y are out of bounds, then the output point is set to black, in this case I_2[r, c] = 0; otherwise I_2[r, c] = I_1[x, y]. One can see a black triangle at the upper right of Jackson's head because the sampling parallelogram protrudes above the input image of the $20 bill.
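A minimal implementation of this sampling program might look as follows (an illustrative numpy sketch, not the original program; nearest-neighbor rounding and the out-of-bounds-to-black rule follow the description above).

    import numpy as np

    def affine_extract(img, p0, p1, p2, n, m):
        """Sample an n x m output from the parallelogram defined by three
        user-selected points p0, p1, p2 (row, col), as in Equation 11.9."""
        out = np.zeros((n, m), dtype=img.dtype)
        (x0, y0), (x1, y1), (x2, y2) = p0, p1, p2
        for r in range(n):
            for c in range(m):
                x = (x1 - x0) * r / n + (x2 - x0) * c / m + x0
                y = (y1 - y0) * r / n + (y2 - y0) * c / m + y0
                xi, yi = int(round(x)), int(round(y))
                if 0 <= xi < img.shape[0] and 0 <= yi < img.shape[1]:
                    out[r, c] = img[xi, yi]   # else leave black (0)
        return out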

Object Recognition/Location Example

Consider the example of computing the transformation matching the model of an object shown at the left in Figure 11.8 to the object in the image shown at the right of the figure. Assume that automatic feature extraction produced only three of the holes in the object. The spatial transformation will map points [x, y] of the model to points [u, v] of the image. Assume that we have a controlled imaging environment and the known scale factor has already been applied to the image coordinates to produce the u-v coordinates shown. Only two image points are needed in order to derive the rotation and translation that will align all model points with corresponding image points. Point locations in the model and image and interpoint distances are shown in Tables 11.1 and 11.2. We will use the hypothesized correspondences (A, H_2) and (B, H_3) to deduce the transformation. Note that these correspondences are consistent with the known interpoint distances. We will discuss algorithms for making such hypotheses later on in this chapter.

Table 11.1: Model Point Locations and Interpoint Distances (coordinates are for the centers of holes).

    point   coordinates   to A   to B   to C   to D   to E
    A       (8,17)           0     12     15     37     21
    B       (16,26)         12      0     12     30     26
    C       (23,16)         15     12      0     22     15
    D       (45,20)         37     30     22      0     30
    E       (22,1)          21     26     15     30      0

The direction of the vector from A to B in the model is θ_1 = arctan(9.0/8.0) = 0.844 and the heading of the corresponding vector from H_2 to H_3 in the image is θ_2 = arctan(12.0/0.0) = π/2 = 1.571. The rotation is thus θ = 0.727 radians. Using Equations 11.6 and substituting the known matching coordinates for points A in the model and H_2 in the image, we obtain the following system, where u_0, v_0 are the unknown translation components in the image plane. Note that the values of sin(θ) and cos(θ) are actually known since θ has been computed.


Figure 11.8: (Left) Model object and (right) three holes detected in an image.

Table 11.2: Image Point Locations and Interpoint Distances (coordinates are for the centers of holes).

    point   coordinates   to H1   to H2   to H3
    H1      (31,9)            0      21      26
    H2      (10,12)          21       0      12
    H3      (10,24)          26      12       0

    \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} 10 \\ 12 \\ 1 \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ & u_0 \\ \sin θ & \cos θ & v_0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 8 \\ 17 \\ 1 \end{bmatrix}    (11.10)

The two resulting linear equations readily produce u_0 = 15.3 and v_0 = -5.95. As a check, we can use the matching points B and H_3, obtaining a similar result. Each distinct pair of points will result in a slightly different transformation. Methods of using many more points to obtain a transformation that is more accurate across the entire 2D space are discussed later in this chapter. Having a complete spatial transformation, we can now compute the location of any model points in the image space, including the grip points R = [29, 19] and Q = [32, 12]. As shown below, model point R transforms to image point {}^i R = [24.4, 27.4]; using Q = [32, 12] as the input to the transformation outputs the image location {}^i Q = [31.2, 24.2] for the other grip point.

    \begin{bmatrix} u_R \\ v_R \\ 1 \end{bmatrix} = \begin{bmatrix} 24.4 \\ 27.4 \\ 1 \end{bmatrix} = \begin{bmatrix} \cos θ & -\sin θ & 15.3 \\ \sin θ & \cos θ & -5.95 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 29 \\ 19 \\ 1 \end{bmatrix}    (11.11)


Given this knowledge, a robot could grasp the real object being imaged, provided that it had knowledge of the transformation from image coordinates to coordinates of the workbench supporting the object. Of course, a robot gripper would be opened a little wider than the transformed length obtained from {}^i R {}^i Q indicates, to allow for small effects such as distortions in imaging, inaccuracies of feature detection, and computational errors. The required gripping action takes place on a real continuous object and real number coordinates make sense despite the fact that the image holds only discrete spatial samples. The image data itself is only defined at integer grid points. If our action were to verify the presence of holes {}^i C and {}^i D by checking for a bright image pixel, then the transformed model points should be rounded to access an image pixel. Or, perhaps an entire digital neighborhood containing the real transformed coordinates should be examined. With this example, we have seen the potential for recognizing 2D objects by aligning a model of the object with important feature points in its image.
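The worked example can be verified numerically; the fragment below (illustrative only, numpy assumed) applies the recovered transformation of Equation 11.11 to both grip points.

    import numpy as np

    theta = 0.727                      # rotation recovered in the example
    T = np.array([[np.cos(theta), -np.sin(theta), 15.3],
                  [np.sin(theta),  np.cos(theta), -5.95],
                  [0.0,            0.0,            1.0]])
    for name, p in (("R", [29.0, 19.0, 1.0]), ("Q", [32.0, 12.0, 1.0])):
        print(name, T @ np.array(p))   # about [24.4, 27.4] and [31.2, 24.2]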

8 Definition Recognizing an object by matching transformed model features to image features via a rotation, scaling, and translation (RST) is called recognition-by-alignment.

Exercise 7 are transformations commutative?
Suppose we have matrices for three primitive transformations: R_θ for a rotation about the origin, S_{s_x, s_y} for a scaling, and D_{x_0, y_0} for a translation. (a) Do scaling and translation commute; that is, does S_{s_x, s_y} D_{x_0, y_0} = D_{x_0, y_0} S_{s_x, s_y}? (b) Do rotation and scaling commute; that is, does R_θ S_{s_x, s_y} = S_{s_x, s_y} R_θ? (c) Same question for rotation and translation. (d) Same question for scaling and translation. Do both the algebra and intuitive thinking to derive your answers and explanations.

Exercise 8
Construct the matrix for a reflection about the line y = 3 by composing a translation with y_0 = -3 followed by a reflection about the x-axis. Verify that the matrix is correct by transforming the three points [1, 1], [1, 0], and [2, 1] and plotting the input and output points.

Exercise 9
Verify that the product of the matrices D_{x_0, y_0} and D_{-x_0, -y_0} is the 3x3 identity matrix. Explain why this should be the case.


Figure 11.9: (Left) v-axis shear and (right) u-axis shear.

* General Affine Transformations

We have already covered affine transformation components of rotation, scaling, and translation. A fourth component is shear. Figure 11.9 shows the effect of shear. Using u-v coordinates, point vectors move along the v-axis in proportion to their distance from the v-axis. The point [u, v] is transformed to [u, e_u u + v] with v-axis shear and to [u + e_v v, v] with u-axis shear. The matrix equations are given in Equation 11.12 and Equation 11.13. Recall that the column vectors of the shear matrix are just the images of the basis vectors under the transformation.

    \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & 0 & 0 \\ e_u & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}    (11.12)

    \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} 1 & e_v & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}    (11.13)

Reflections are a fifth type of component. A reflection about the u-axis maps the basis vectors [1, 0], [0, 1] onto [1, 0], [0, -1] respectively, while a reflection about the v-axis maps [1, 0], [0, 1] onto [-1, 0], [0, 1]. The 2x2 or 3x3 matrix representation is straightforward. Any affine transformation can be constructed as a composition of any of the component types: rotations, scaling, translation, shearing, or reflection. Inverses of these components exist and are of the same type. Thus, it is clear that the matrix of a general affine transformation composed of any components has six parameters as shown in Equation 11.14. These six parameters can be determined using 3 sets of noncolinear points known to be in correspondence by solving 3 matrix equations of this type. We have already seen an example using shear in the case of the slanted grid in Figure 11.6.

    \begin{bmatrix} x \\ y \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}    (11.14)
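Determining the six parameters from three correspondences amounts to solving six linear equations; the sketch below (a minimal illustration, numpy assumed, names are ours) sets up and solves that system for Equation 11.14.

    import numpy as np

    def affine_from_three(uv_pts, xy_pts):
        """Solve for the six parameters of Equation 11.14 given three
        noncollinear correspondences [u, v] -> [x, y]."""
        A, b = [], []
        for (u, v), (x, y) in zip(uv_pts, xy_pts):
            A.append([u, v, 1, 0, 0, 0]); b.append(x)  # x = a11 u + a12 v + a13
            A.append([0, 0, 0, u, v, 1]); b.append(y)  # y = a21 u + a22 v + a23
        a = np.linalg.solve(np.array(A, float), np.array(b, float))
        return np.array([[a[0], a[1], a[2]],
                         [a[3], a[4], a[5]],
                         [0.0,  0.0,  1.0]])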


11.4 * A Best 2D Affine Transformation

A general affine transformation from 2D to 2D as in Equation 11.15 requires six parameters and can be computed from only 3 matching pairs of points ([x_j, y_j], [u_j, v_j])_{j=1,3}.

    \begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = \begin{bmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} x \\ y \\ 1 \end{bmatrix}    (11.15)

Error in any of the coordinates of any of the points will surely cause error in the transformation parameters. A much better approach is to use many more pairs of matching control points to determine a least-squares estimate of the six parameters. We can define an error criterion function similar to that used for fitting a straight line in Chapter 10.

    \varepsilon(a_{11}, a_{12}, a_{13}, a_{21}, a_{22}, a_{23}) = \sum_{j=1}^{n} \left( (a_{11} x_j + a_{12} y_j + a_{13} - u_j)^2 + (a_{21} x_j + a_{22} y_j + a_{23} - v_j)^2 \right)    (11.16)

Taking the six partial derivatives of the error function with respect to each of the six variables and setting these derivatives to zero gives us the six equations represented in matrix form in Equation 11.17.

    \begin{bmatrix}
    \sum x_j^2 & \sum x_j y_j & \sum x_j & 0 & 0 & 0 \\
    \sum x_j y_j & \sum y_j^2 & \sum y_j & 0 & 0 & 0 \\
    \sum x_j & \sum y_j & \sum 1 & 0 & 0 & 0 \\
    0 & 0 & 0 & \sum x_j^2 & \sum x_j y_j & \sum x_j \\
    0 & 0 & 0 & \sum x_j y_j & \sum y_j^2 & \sum y_j \\
    0 & 0 & 0 & \sum x_j & \sum y_j & \sum 1
    \end{bmatrix}
    \begin{bmatrix} a_{11} \\ a_{12} \\ a_{13} \\ a_{21} \\ a_{22} \\ a_{23} \end{bmatrix}
    =
    \begin{bmatrix} \sum u_j x_j \\ \sum u_j y_j \\ \sum u_j \\ \sum v_j x_j \\ \sum v_j y_j \\ \sum v_j \end{bmatrix}    (11.17)
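Equation 11.17 need not be formed explicitly; an equivalent route (a sketch, numpy assumed) is to solve the two independent three-parameter least-squares problems directly.

    import numpy as np

    def best_affine_fit(xy, uv):
        """Least-squares estimate of the six parameters of Equation 11.15
        from n >= 3 control point pairs ([x, y], [u, v])."""
        xy, uv = np.asarray(xy, float), np.asarray(uv, float)
        A = np.column_stack([xy[:, 0], xy[:, 1], np.ones(len(xy))])
        # The u and v equations decouple, as the block structure of
        # Equation 11.17 shows, so each row is fit independently.
        row_u = np.linalg.lstsq(A, uv[:, 0], rcond=None)[0]
        row_v = np.linalg.lstsq(A, uv[:, 1], rcond=None)[0]
        return np.vstack([row_u, row_v, [0.0, 0.0, 1.0]])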

Exercise 10
Solve Equation 11.17 using the following three pairs of matching control points: ([0,0],[0,0]), ([1,0],[0,2]), ([0,1],[0,2]). Do your computations give the same answer as reasoning about the transformation using basis vectors?

Exercise 11
Solve Equation 11.17 using the following three pairs of matching control points: ([0,0],[1,2]), ([1,0],[3,2]), ([0,1],[1,4]). Do your computations give the same answer as reasoning about the transformation using basis vectors?

It is common to use many control points to put an image and map or two images into correspondence. Figure 11.10 shows two images of approximately the same scene. Eleven pairs of matching control points are given at the bottom of the figure. Control points are corners of objects that are uniquely identifiable in both images (or map). In this case, the control points were selected using a display program and mouse. The list of residuals shows that, using the derived affine transformation, no u or v coordinate in the right image will be off by 2 pixels from the transformed value. Most residuals are less than one pixel. Better results can be obtained by using automatic feature detection which locates feature points with subpixel accuracy: control point coordinates are often off by one pixel when chosen


using a computer mouse and the human eye. Using the derived affine transformation, the right image can be searched for objects known to be in the left image. Thus we have reached the point of understanding how the tax-collector's map can be put into correspondence with an aerial image for the purpose of updating its inventory of objects.

Exercise 12
Take three pairs of matching control points from Figure 11.10 (for example, ([288, 210, 1], [31, 160, 1])) and verify that the affine transformation matrix maps the first into the second.

11.5 2D Object Recognition via Affine Mapping

In this section we study a few methods of recognizing 2D objects through mappings of model points onto image points. We have already introduced one method of recognition-by-alignment in the section on affine mappings. The general methods work with general point features; however, each application domain will present distinguishing features which allow labels to be attached to feature points. Thus we might have corners or centers of holes in a part sorting application, or intersections and high curvature land/water boundary points in a land use application.

Figure 11.11 illustrates the overall model-matching paradigm. Figure 11.11a) is a boundary model of an airplane part. Feature points that may be used in matching are indicated with small black circles. Figure 11.11b) is an image of the real airplane part in approximately the same orientation as the model. Figure 11.11c) is a second image of the real part rotated about 45 degrees. Figure 11.11d) is a third image of the real part in which the camera angle causes a large amount of skewing in the resultant image. The methods described in this section are meant to determine if a given image, such as those of Figures 11.11b), 11.11c), and 11.11d), contains an instance of an object model such as that of Figure 11.11a) and to determine the pose (position and orientation) of the object with respect to the camera.

Local Feature Focus Method

The local-feature-focus method uses local features of an object and their 2D spatial relationships to recognize the object. In advance, a set of object models is constructed, one for each object to be recognized. Each object model contains a set of focus features, which are major features of the object that should be easily detected if they are not occluded by some other object. For each focus feature, a set of its nearby features is included in the model. The nearby features can be used to verify that the correct focus feature has been found and to help determine the pose of the object.

In the matching phase, feature extraction is performed on an image of one or more objects. The matching algorithm looks first for focus features. When it finds a focus feature belonging to a given model, it looks for a cluster of image features near the focus feature that match as many as possible of the required nearby features for that focus feature in that model. Once such a cluster has been found and the correspondences between this small set of image features and object model features have been determined, the algorithm


======Best 2D Affine Fit Program======

Matching control point pairs are:

288 210 31 160 232 288 95 205 195 372 161 229 269 314 112 159

203 424 199 209 230 336 130 196 284 401 180 124 327 428 198 69

284 299 100 146 337 231 45 101 369 223 38 64

The Transformation Matrix is:

[ -0.0414 , 0.773 , -119

-1.120 , -0.213 , 526 ]

Residuals (in pixels) for 22 equations are as follows:

0.18 -0.68 -1.22 0.47 -0.77 0.06 0.34 -0.51 1.09 0.04 0.96

1.51 -1.04 -0.81 0.05 0.27 0.13 -1.12 0.39 -1.04 -0.12 1.81

======Fitting Program Complete======

Figure 11.10: Images of same scene and best affine mapping from the left image into the right image using 11 control points. [x, y] coordinates for the left image with x increasing downward and y increasing to the right; [u, v] coordinates for the right image with u increasing downward and v toward the right. The 11 clusters of coordinates directly below the images are the matching control points x, y, u, v.


a) Part Model; b) Horizontal Image; c) Rotated Image; d) Rotated and Skewed Image

Figure 11.11: A 2D model and 3 matching images of an airplane part.

hypothesizes that this object is in the image and uses a verification technique to decide if the hypothesis is correct.

The verification procedure must determine whether there is enough evidence in the image that the hypothesized object is in the scene. For polyhedral objects, the boundary of the object is often used as suitable evidence. The set of feature correspondences is used to determine a possible affine transformation from the model points to the image points. This transformation is then used to transform each line segment of the boundary into the image space. The transformed line segments should approximately line up with image line segments wherever the object is unoccluded. Due to noise in the image and errors in feature extraction and matching, it is unlikely that the transformed line segments will exactly align with image line segments, but a rectangular area about each transformed line segment can be searched for evidence of possibly matching image segments. If sufficient evidence is found, that model segment is marked as verified. If enough model segments are verified, the object is declared to be in the image at the location specified by the computed transformation.

The local-feature-focus algorithm for matching a given model F to an image is given below. The model has a set {F_1, F_2, ..., F_M} of focus features. For each focus feature F_m,


there is a set S(F_m) of nearby features that can help to verify this focus feature. The image has a set {G_1, G_2, ..., G_I} of detected image features. For each image feature G_i, there is a set of nearby image features S(G_i).

Find the transformation from model features to image features using the local-feature-focus approach.

G_i, i = 1..I is the set of the detected image features.
F_m, m = 1..M is the set of focus features of the model.
S(f) is the set of nearby features for any feature f.

procedure local_feature_focus(G, F);
{
  for each focus feature F_m
    for each image feature G_i of the same type as F_m
    {
      Find the maximal subgraph S_m of S(F_m) that matches a subgraph S_i of S(G_i);
      Compute the transformation T that maps the points of each feature of S_m to the corresponding feature of S_i;
      Apply T to the boundary segments of the model;
      if enough of the transformed boundary segments find evidence in the image then return(T);
    };
}

Algorithm 1: Local-Feature-Focus Method

Figure 11.12 illustrates the local-feature-focus method with two models, E and F, and an image. The detected features are circular holes and sharp corners. Local feature F1 of model F has been hypothesized to correspond to feature G1 in the image. Nearby features F2, F3, and F4 of the model have been found to correspond well to nearby features G2, G3, and G4, respectively, of the image. The verification step will show that model F is indeed in the image. Considering the other model E, feature E1 and the set of nearby features E2, E3, and E4 have been hypothesized to correspond to features G5, G6, G7, and G8 in the image. However, when verification is performed, the boundary of model E will not line up well with the image segments, and this hypothesis will be discarded.

Pose Clustering

We have seen that an alignment between model and image features using an RST transformation can be obtained from two matching control points. The solution can be obtained using Equations 11.6 once two control points have been matched between image and model. Obtaining the matching control points automatically may not be easy due to ambiguous matching possibilities. The pose clustering approach computes an RST alignment for all possible control point pairs and then checks for a cluster of similar parameter sets. If indeed there are many matching feature points between model and image, then a cluster should


Figure 11.12: The Local Feature Focus Method

exist in the parameter space. A pose-clustering algorithm is sketched below.

9 Definition Let T be a spatial transformation aligning model M to an object O in image I. The pose of object O is its location and orientation as defined by the parameters of T.

Using all possible pairs of feature points would provide too much redundancy. An application in matching aerial images to maps can use intersection points detected on road networks or at the corners of regions such as fields. The degree of the intersection gives it a type to be used in matching; for example, common intersections have type 'L', 'Y', 'T', 'Arrow', and 'X'. Assume that we use only the pairs of combined type LX or TY. Figure 11.14 shows an example with 5 model pairs and 4 image pairs. Although there are 4 × 5 = 20

Figure 11.13: Common line-segment junctions used in matching: 'L', 'Y', 'T', 'Arrow', and 'X' junctions.


Find the transformation from model features to image features using pose clustering.

P_i, i = 1..D is the set of detected image features.
L_j, j = 1..M is the set of stored model features.

procedure pose_clustering(P, L);
{
  for each pair of image feature points (P_i, P_j)
    for each pair of model feature points (L_m, L_n) of same type
    {
      compute parameters α of RST mapping pair (L_m, L_n) onto (P_i, P_j);
      contribute α to the cluster space;
    };
  examine space of all candidates α for clusters;
  verify every large cluster by mapping all model feature points and checking the image;
  return(verified {α_k});
}

Algorithm 2: Pose-Clustering Algorithm

possible pairings, only 10 of them have matching types for both points. The transformations computed from each of those are given in Table 11.3. The 10 transformations computed have scattered and inconsistent parameter sets, except for 3 of them indicated by '*' in the last column of the table. These 3 parameter sets form a cluster whose average parameters are θ = 0.68, s = 2.01, u_0 = 233, v_0 = -41. While one would like less variance than this for correct matches, this variance is typical due to slight errors in point feature location and small nonlinear distortions in the imaging process. If the parameters of the RST mapping are inaccurate, they can be used to verify matching points which can then be used as control points to find a nonlinear mapping or affine mapping (with more parameters) that is more accurate in matching control points.

Pose clustering can work using low level features; however, both accuracy and efficiency are improved when features can be filtered by type. Clustering can be performed by a simple O(n^2) algorithm: for each parameter set α_i, count the number of other parameter sets that are close to it using some permissible distance. This requires n - 1 distance computations for each of the n parameter sets in cluster space. A faster, but less flexible, alternative is to use binning. Binning has been the traditional approach reported in the literature and was discussed in Chapter 10 with respect to the Hough transform. Each parameter set produced is contributed to a bin in parameter space, after which all bins are examined for significant counts. A cluster can be lost when a set of similar α_i spread over neighboring bins.
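The simple O(n^2) scheme can be sketched as follows (illustrative numpy code, not from the text; a real implementation would rescale the four parameters so that angle, scale, and translation distances are comparable before thresholding).

    import numpy as np

    def largest_cluster(params, tol):
        """For each pose alpha_i (a row of theta, s, u0, v0), count how
        many other poses lie within distance tol; return the
        best-supported pose and its support count."""
        params = np.asarray(params, float)
        best, best_count = None, -1
        for p in params:
            # n - 1 distance computations per parameter set
            count = int(np.sum(np.linalg.norm(params - p, axis=1) < tol)) - 1
            if count > best_count:
                best, best_count = p, count
        return best, best_count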

The clustering approach has been used to detect the presence of a particular model of airplane from an aerial image, as shown in Figure 11.15. Edge and curvature features are extracted from the image using the methods of Chapters 5 and 10. Various overlapping


Figure 11.14: Example pose detection problem with 5 model feature point pairs and 4 image feature point pairs.

Table 11.3: Cluster space formed from 10 pose computations from Figure 11.14

    Model Pair               Image Pair               θ      s      u_0    v_0
    L(170,220),X(100,200)    L(545,400),X(200,120)    0.403  6.10    118  -1240
    L(170,220),X(100,200)    L(420,370),X(360,500)    5.14   2.05    -97    514
    T(100,100),Y( 40,150)    T(260,240),Y(100,245)    0.663  2.05    225    -48  *
    T(100,100),Y( 40,150)    T(140,380),Y(300,380)    3.87   2.05    166    669
    L(200,100),X(220,170)    L(545,400),X(200,120)    2.53   6.10   1895    200
    L(200,100),X(220,170)    L(420,370),X(360,500)    0.711  1.97    250    -36  *
    L(260, 70),X( 40, 70)    L(545,400),X(200,120)    0.682  2.02    226    -41  *
    L(260, 70),X( 40, 70)    L(420,370),X(360,500)    5.14   0.651   308    505
    T(150,125),Y(150, 50)    T(260,240),Y(100,245)    4.68   2.13      3    568
    T(150,125),Y(150, 50)    T(140,380),Y(300,380)    1.57   2.13    407     60


Figure 11.15: Pose-clustering applied to detection of a particular airplane. (a) Aerial image of an airfield; (b) object model in terms of real edges and abstract edges subtending one corner and one curve tip point; (c) image window containing detections that match many model parts via the same transformation.


windows of these features are matched against the model shown in part (b) of the figure. Part (c) of the figure shows the edges detected in one of these windows where many of the features aligned with the model features using the same transformation parameters.

Geometric Hashing

Both the local-feature-focus method and the pose clustering algorithm were designed to match a single model to an image. If several different object models were possible, then these methods would try each model, one at a time. This makes them less suited for problems in which a large number of different objects can appear. Geometric hashing was designed to work with a large database of models. It trades a large amount of offline preprocessing and a large amount of space for a potentially fast online object recognition and pose determination.

Suppose we are given

1. a large database of models

2. an unknown object whose features are extracted from an image and which is known to be an affine transformation of one of the models,

and we wish to determine which model it is and what transformation was applied.

Consider a model M to be an ordered set of feature points. Any subset of three noncollinear points E = {e_{00}, e_{01}, e_{10}} of M can be used to form an affine basis set, which defines a coordinate system on M, as shown in Figure 11.16a). Once the coordinate system is chosen, any point x ∈ M can be represented in affine coordinates (α, β) where

    x = α(e_{10} - e_{00}) + β(e_{01} - e_{00}) + e_{00}

Furthermore, if we apply an affine transform T to point x, we get

    Tx = α(Te_{10} - Te_{00}) + β(Te_{01} - Te_{00}) + Te_{00}

Thus Tx has the same affine coordinates (α, β) with respect to (Te_{00}, Te_{01}, Te_{10}) as x has with respect to (e_{00}, e_{01}, e_{10}). This is illustrated in Figure 11.16b).
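Computing the affine coordinates of a point with respect to a basis triple is a 2x2 linear solve; the sketch below (illustrative numpy code with our own names) is also reused in the preprocessing sketch that follows Algorithm 3.

    import numpy as np

    def affine_coords(x, e00, e01, e10):
        """Affine coordinates (alpha, beta) of point x with respect to the
        basis triple: x = alpha*(e10 - e00) + beta*(e01 - e00) + e00."""
        e00, e01, e10 = (np.asarray(e, float) for e in (e00, e01, e10))
        B = np.column_stack([e10 - e00, e01 - e00])   # 2x2 basis matrix
        return np.linalg.solve(B, np.asarray(x, float) - e00)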

Offline Preprocessing: The offline preprocessing step creates a hash table containing all of the models in the database. The hash table is set up so that the pair of affine coordinates (α, β) indexes a bin of the hash table that stores a list of model-basis pairs (M, E) where some point x of model M has affine coordinates (α, β) with respect to basis E. The offline preprocessing algorithm is given below.

Online Recognition: The hash table created in the preprocessing step is used in the online recognition step. The recognition step also uses an accumulator array A indexed by model-basis pairs. The bin for each (M, E) is initialized to zero and used to vote for the hypothesis that there is a transformation T that places (M, E) in the image. Computation of the actual transformations is done only for those model-basis pairs that achieve a high number of votes and is part of the verification step that follows the voting. The online


Figure 11.16: The affine transformation of a point with respect to an affine basis set: a) original object; b) transformed object.

recognition and pose estimation algorithm is given below.

Suppose that there are s models of approximately n points each. Then the preprocessing step has complexity O(sn^4), which comes from processing s models, O(n^3) triples per model, and O(n) other points per model. In the matching, the amount of work done depends somewhat on how well the feature points can be found in the image, how many of them are occluded, and how many false or extra feature points are detected. In the best case, the first triple selected consists of three real feature points all from the same model, this model gets a large number of votes, the verification procedure succeeds, and the task is done. In this best case, assuming the average list in the hash table is of a small constant size and that hashing time is approximately constant, the complexity of the matching step is approximately O(n). In the worst case, for instance when the model is not in the database at all, every triple is tried, and the complexity is O(n^4). In practice, although it would be rare to try all bases, it is also rare for only one basis to succeed. A number of different things can go wrong:

1. feature point coordinates have some error

2. missing and extra feature points

3. occlusion, multiple objects

4. unstable bases

5. weird affine transforms on a subset of the points

In particular, the algorithm can "hallucinate" a transformation based on a subset of the points that passes the point verification tests, but gives the wrong answer. Figure 11.17 illustrates this point. Pose clustering and focus feature methods are also susceptible to this same phenomenon.


Set up the hash table for matching to a database of models using geometric hashing.

D is the database of models.
H is an initially empty hash table.

procedure GH_Preprocessing(D, H);
{
  for each model M
  {
    Extract the feature point set F_M of M;
    for each noncollinear triple E of points from F_M
      for each other point x of F_M
      {
        Calculate (α, β) for x with respect to E;
        Store (M, E) in hash table H at index (α, β);
      };
  };
}

Algorithm 3: Geometric Hashing Offline Preprocessing
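A compact dictionary-based version of Algorithm 3 might look as follows (a sketch under our own assumptions: points are 2D tuples, bins are formed by quantizing (α, β) with a step `quantum` that is not from the text, and `affine_coords` is the helper sketched earlier).

    import numpy as np
    from itertools import permutations

    def gh_preprocessing(models, quantum=0.1):
        """models maps a model name to its list of 2D feature points.
        Returns a dict from quantized (alpha, beta) to (model, basis)."""
        H = {}
        for name, pts in models.items():
            for E in permutations(range(len(pts)), 3):
                e00, e01, e10 = (np.asarray(pts[k], float) for k in E)
                u, w = e10 - e00, e01 - e00
                if abs(u[0] * w[1] - u[1] * w[0]) < 1e-9:
                    continue              # collinear triple: not a basis
                for j, x in enumerate(pts):
                    if j in E:
                        continue
                    a, b = affine_coords(x, e00, e01, e10)
                    key = (round(a / quantum), round(b / quantum))
                    H.setdefault(key, []).append((name, E))
        return H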

11.6 2D Object Recognition via Relational Matching

We have previously described three useful methods for matching observed image points to model points: these were local-feature-focus, pose clustering, and geometric hashing. In this section, we examine three simple general paradigms for object recognition within the context given in this chapter. All three paradigms view recognition as a mapping from model structures to image structures: a consistent labeling of image features is sought in terms of model features; recognition is equivalent to mapping a sufficient number of features from a single model to the observed image features. The three paradigms differ in how the mapping is developed.

Four concepts important to the matching paradigms are parts, labels, assignments, and relations.

- A part is an object or structure in the scene such as region segment, edge segment, hole, corner or blob.

- A label is a symbol assigned to a part to identify/recognize it at some level.

- An assignment is a mapping from parts to labels. If P_1 is a region segment and L_1 is the lake symbol and L_2 the field symbol, an assignment may include the pair (P_1, L_2) or perhaps (P_1, {L_1, L_2}) to indicate remaining ambiguity. A pairing (P_1, NIL) indicates that P_1 has no interpretation in the current label set. An interpretation of the scene is just the set of all pairs making up an assignment.

- A relation is the formal mathematical notion. Relations will be discovered and computed among scene objects and will be stored for model objects. For example, R4(P_1, P_2) might indicate that region P_1 is adjacent to region P_2.


Use the hash table to find the correct model and transformation that maps image features to model features.

H is the hash table created by the preprocessing step.
A is an accumulator array indexed by (M, E) pairs.
I is the image being analyzed.

procedure GH_Recognition(H, A, I);
{
  Initialize accumulator array A to all zeroes;
  Extract feature points from image I;
  for each basis triple F
  {
    for each other point v
    {
      Calculate (α, β) for v with respect to F;
      Retrieve the list L of model-basis pairs from the hash table H at index (α, β);
      for each pair (M, E) of L
        A[M, E] = A[M, E] + 1;
    };
    Find the peaks in accumulator array A;
    for each peak (M, E)
    {
      Calculate T such that F = T E;
      if enough of the transformed model points of M find evidence in the image then return(T);
    };
  };
}

Algorithm 4: Geometric Hashing Online Recognition


a) image points; b) hallucinated object

Figure 11.17: The geometric hashing algorithm can hallucinate that a given model is present in an image. In this example, 60% of the feature points (left) led to the verified hypothesis of an object (right) that was not actually present in the image.

Given these four concepts, we can define a consistent labeling.

10 Definition Given a set of parts P, a set of labels for those parts L, a relation R_P over P, and a second relation R_L over L, a consistent labeling f is an assignment of labels to parts that satisfies:

    If (p_i, p_i') ∈ R_P, then (f(p_i), f(p_i')) ∈ R_L.

For example, suppose we are trying to find a match between two images: for each image we have a set of extracted line segments and a connection relation that indicates which pairs of line segments are connected. Let P be the set of line segments and R_P be the set of pairs of connecting line segments from the first image, R_P ⊆ P × P. Similarly, let L be the set of line segments and R_L be the set of pairs of connecting line segments from the second image, R_L ⊆ L × L. Figure 11.18 illustrates two sample images and the sets P, R_P, L, and R_L. Note that both R_P and R_L are symmetric relations; if (Si, Sj) belongs to such a relation, then so does (Sj, Si). In our examples, we list only tuples (Si, Sj) where i < j, but the mirror image tuple (Sj, Si) is implicitly present.

A consistent labeling for this problem is the mapping f given below:

    f(S1) = Sj    f(S7) = Sg
    f(S2) = Sa    f(S8) = Sl
    f(S3) = Sb    f(S9) = Sd
    f(S4) = Sn    f(S10) = Sf
    f(S5) = Si    f(S11) = Sh
    f(S6) = Sk

For another example, we return to the recognition of the kleep object shown in Figure 11.8 and defined in the associated tables. Our matching paradigms will use the distance



P = {S1, S2, S3, S4, S5, S6, S7, S8, S9, S10, S11}.

L = {Sa, Sb, Sc, Sd, Se, Sf, Sg, Sh, Si, Sj, Sk, Sl, Sm, Sn}.

R_P = { (S1,S2), (S1,S5), (S1,S6), (S2,S3), (S2,S4), (S3,S4), (S3,S9), (S4,S5), (S4,S7), (S4,S11), (S5,S6), (S5,S7), (S5,S11), (S6,S8), (S6,S11), (S7,S9), (S7,S10), (S7,S11), (S8,S10), (S8,S11), (S9,S10) }.

R_L = { (Sa,Sb), (Sa,Sj), (Sa,Sn), (Sb,Sc), (Sb,Sd), (Sb,Sn), (Sc,Sd), (Sd,Se), (Sd,Sf), (Sd,Sg), (Se,Sf), (Se,Sg), (Sf,Sg), (Sf,Sl), (Sf,Sm), (Sg,Sh), (Sg,Si), (Sg,Sn), (Sh,Si), (Sh,Sk), (Sh,Sl), (Sh,Sn), (Si,Sj), (Si,Sk), (Si,Sn), (Sj,Sk), (Sk,Sl), (Sl,Sm) }.

Figure 11.18: Example of a Consistent Labeling Problem (line segments S1-S11 of Image 1 and Sa-Sn of Image 2).

Exercise 13 Consistent Labeling Problem
Show that the labeling f given above is a consistent labeling. Because the relations are symmetric, the following modified constraint must be satisfied:

    If (p_i, p_i') ∈ R_P, then (f(p_i), f(p_i')) ∈ R_L or (f(p_i'), f(p_i)) ∈ R_L.
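Such a check is easy to mechanize; the fragment below (an illustrative Python sketch using our own data structures: f as a dict and the relations as sets of pairs) tests the modified constraint of Exercise 13.

    def is_consistent(f, R_P, R_L):
        """Check the symmetric consistent-labeling constraint for every
        pair in R_P whose parts are both labeled by f."""
        for (p, q) in R_P:
            if p in f and q in f:
                if (f[p], f[q]) not in R_L and (f[q], f[p]) not in R_L:
                    return False
        return True

    # A tiny fragment of the Figure 11.18 data:
    f = {"S1": "Sj", "S2": "Sa"}
    print(is_consistent(f, {("S1", "S2")}, {("Sa", "Sj")}))   # True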

"relation" defined for any two points: each pair of holes is related by the distance between them. Distance is invariant to rotations and translations but not scale change. Abusing notation, we write 12(A, B) and 12(B, C) to indicate that points A and B are distance 12 apart in the model, similarly for points B and C. 12(C, D) does NOT hold, however, as we see from the distance tables. To allow for some distortion or detection error, we might allow that 12(C, D) is true even when the distance between C and D is actually 12 ± δ for some small amount δ.

The Interpretation Tree

11 Definition An interpretation tree (IT) is a tree that represents all possible assignments of labels to parts. Every path in the tree is terminated either because it represents a complete consistent assignment, or because the partial assignment it represents fails some relation.

A partial interpretation tree for the image data of Figure 11.8 is shown in Figure 11.19. The tree has three levels, each to assign a label to one of the three holes H_1, H_2, H_3 observed in the image.


[Figure: partial interpretation tree. The root φ branches on the labels A, B, C, D, E, NIL for H_1; surviving nodes branch on labels for H_2, then H_3. Annotations such as 21(H_1,H_2) ~ 21(A,E), 12(H_2,H_3) ~ 12(A,B), and 26(H_1,H_3) ~ 26(E,B) mark satisfied distance checks, while entries such as 0(A,A), 12(A,B), 15(A,C), 37(A,D) mark failed checks of 21(H_1,H_2).]

Figure 11.19: Partial interpretation tree search for a consistent labeling of the kleep parts in Figure 11.8 (right).

observed in the image. No inconsistencies occur at the first level since there are no distance constraints to check. However, most label possibilities at level 2 can be immediately terminated using one distance check. For example, the partial assignment {(H_1, A), (H_2, A)} is inconsistent because the relation 21(H_1, H_2) is violated by 0(A, A). Many paths are not shown due to lack of space. The path of labels denoted by the boxes yields a complete and consistent assignment. The path of labels denoted by ellipses is also consistent; however, it contains one NIL label and thus has fewer constraints to check. This assignment has the first two pairs of the complete (boxed) assignment reversed in labels, and the single distance check is consistent. Multiple paths of an IT can succeed due to symmetry. Although the IT potentially contains an exponential number of paths, it has been shown that most paths will terminate by level 3 due to the relational constraints. Use of the label NIL allows for detection of artifacts or the presence of features from another object in the scene.

The IT can easily be developed using a recursive backtracking process that develops paths in a depth-first fashion. At any instantiation of the procedure, the parameter f, which is initially NIL, contains the consistent partial assignment. Whenever a new labeling of a part is consistent with the partial assignment, the algorithm goes deeper in the tree by hypothesizing another label for an unlabeled part; if an inconsistency is detected, then the algorithm backs up to make an alternate choice. As coded, the algorithm returns the first completed path, which may include NIL labels if that label is explicitly included in L.


An improvement would be to return the completed path with the most non-NIL pairs, or perhaps all completed paths.

The recursive interpretation tree search algorithm is given below. It is defined to be general and to handle arbitrary N-ary relations, R_P and R_L, rather than just binary relations. R_P and R_L can be single relations, such as the connection relation in our first example, or they can be unions of a number of different relations, such as connection, parallel, and distance.

Find the mapping from model features to image features that satisfies the model relationships by a tree search.

P is the set of detected image features.
L is the set of stored model features.
R_P is the relationship over the image features.
R_L is the relationship over the model features.
f is the consistent labeling to be returned, initially NIL.

procedure Interpretation_Tree_Search(P, L, R_P, R_L, f);
{
  p := first(P);
  for each l in L
  {
    f' = f ∪ {(p, l)};   /* add part-label pair to interpretation */
    OK = true;
    for each N-tuple (p_1, ..., p_N) in R_P containing component p
        and whose other components are all in domain(f')
      /* check on relations */
      if (f'(p_1), ..., f'(p_N)) is not in R_L then
      {
        OK := false;
        break;
      }
    if OK then
    {
      P' = rest(P);
      if isempty(P') then output(f');
      else Interpretation_Tree_Search(P', L, R_P, R_L, f');
    }
  }
}

Algorithm 5: Interpretation Tree Search
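
For concreteness, here is a runnable Python sketch of the same depth-first search, specialized to binary relations; the names interpretation_tree_search and NIL, and the use of a Python dictionary for the partial assignment, are our own choices rather than the book's notation.

# Depth-first interpretation tree search for binary relations.
# parts: list of unlabeled parts; labels: candidate labels (may include NIL);
# R_P, R_L: sets of pairs; f: partial assignment (dict); solutions: output list.
NIL = None

def interpretation_tree_search(parts, labels, R_P, R_L, f, solutions):
    if not parts:
        solutions.append(dict(f))        # complete consistent assignment
        return
    p = parts[0]
    for l in labels:
        f[p] = l
        ok = all(
            f[a] is NIL or f[b] is NIL
            or (f[a], f[b]) in R_L or (f[b], f[a]) in R_L
            for (a, b) in R_P
            if p in (a, b) and a in f and b in f
        )
        if ok:                           # go deeper; otherwise backtrack
            interpretation_tree_search(parts[1:], labels, R_P, R_L,
                                       f, solutions)
        del f[p]

Called with the sets of Figure 11.18 and a label list that includes NIL, this collects every complete consistent assignment, including the one displayed earlier, in the list solutions.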


Discrete Relaxation

Relaxation uses only local constraints rather than all the constraints available; for example, all constraints from matches on a single path of the IT. After N iterations, local constraints from one part neighborhood can propagate across the object to another part on a path N edges distant. Although the constraints used in one iteration are weaker than those available using the IT search, they can be applied in parallel, thus making faster and simpler processing possible.

Initially, a part can be labeled by any label permitted by its type; suppose we assign it the set of all these possible labels. Discrete relaxation examines the relations between a particular part and all others, and by doing so, reduces the possible labels for that particular part. In the related problem of character recognition, if it is known that the following letter cannot be "U", then we can conclude that the current letter cannot be "Q". In yet a different domain, if it is known that an image region is not water, then an object in it is not a ship. Discrete relaxation was popularized by David Waltz, who used it to constrain the labels assigned to edges of line drawings. (Waltz filtering is discussed in the text by Winston.) Waltz used an algorithm with some sequential character; here we present a parallel approach.

Each part P_i is initially assigned the entire set of possible labels L_j according to its type. Then, all relations are checked to see if some labels are impossible: inconsistent labels are removed from the set. The label sets for each part can be processed in parallel through passes. If, after any pass, some labels have been filtered out of some sets, then another pass is executed; if no labels have changed, then the filtering is complete. There may be no interpretations left possible, or there may be several. The following example should be instructive. To keep it simple, we assume that there are no extra features detected that are not actual parts of the model; as before, we assume that some features may have been missed.

We now match the data in Tables 11.1 and 11.2. The filtering process begins with all 5 labels possible for each of the 3 holes H_1, H_2, H_3. To add interest and to be more practical, we will allow a tolerance of 1 on distance matches. Table 11.4 shows the 3 label sets at some point midway through the first pass. Each cell of the table gives the reason why a label must be deleted or why it survives. A is deleted from the label set for H_1 because the relation 26(H_1, H_3) cannot be explained by any label for H_3. The label A survives for H_2 because there is label E ∈ L(H_1) to explain the relation 21(H_2, H_1) and label B ∈ L(H_3) to explain relation 12(H_2, H_3). The label C survives for H_2 because d(H_2, H_1) = 21 ≈ 22 = d(C, D).

At the end of the first pass, as Table 11.5 shows, there are only two labels possible for H_2, only label E remains for H_1, and only label B remains for H_3. At the end of pass 1 the reduced label sets are made available for the parallel processing of pass 2, where each label set is further filtered in asynchronous parallel order.

Exercise 14

Give detailed justification for each of the labels being in or out of each of the label sets after pass 1, as shown in Table 11.5.

Pass 2 deletes label C from L(H_2) because the relation 21(H_1, H_2) can no longer be explained by using D as a label for H_1. After pass 3, additional passes cannot change any label set, so the process has converged. In this case, the label sets are all singletons, representing a single assignment and interpretation.


Table 11.4: Midway through the first pass of relaxation labeling

        A               B               C               D               E
 H_1    no N ∋          no N ∋          no N ∋          no N ∋          21(H_1,H_2):
        d(A,N)=26       d(B,N)=21       d(C,N)=26       d(D,N)=26       A ∈ L(H_2);
                                                                        26(H_1,H_3):
                                                                        B ∈ L(H_3)
 H_2    21(H_2,H_1):    no N ∋          21(H_2,H_1):
        E ∈ L(H_1);     d(B,N)=21       D ∈ L(H_1);
        12(H_2,H_3):                    12(H_2,H_3):
        B ∈ L(H_3)                      B ∈ L(H_3)
 H_3    no N ∋          12(H_3,H_2):
        d(A,N)=26       A ∈ L(H_2);
                        26(H_3,H_1):
                        E ∈ L(H_1)

Table 11.5: After completion of the first pass of relaxation labeling

        A          B          C          D          E
 H_1    no         no         no         no         possible
 H_2    possible   no         possible   no         no
 H_3    no         possible   no         no         no

A high-level sketch of the algorithm is given below. Although a simple and potentially fast procedure, relaxation labeling sometimes leaves more ambiguity in the interpretation than does IT search, because constraints are only applied pairwise. Relaxation labeling can be applied as preprocessing for IT search: it can substantially reduce the branching factor of the tree search.

Continuous Relaxation

In exact consistent labeling procedures, such as tree search and discrete relaxation, a label l for a part p is either possible or impossible at any stage of the process. As soon as a part-label pair (p, l) is found to be incompatible with some already instantiated pair, the label l is marked as illegal for part p. This property of calling a label either possible or impossible in the preceding algorithms makes them discrete algorithms. In contrast, we can associate with each part-label pair (p, l) a real number representing the probability or certainty that part p can be assigned label l.

Table 11.6: After completion of the second pass of relaxation labeling

        A          B          C          D          E
 H_1    no         no         no         no         possible
 H_2    possible   no         no         no         no
 H_3    no         possible   no         no         no


Table 11.7: After completion of the third pass of relaxation labeling

        A          B          C          D          E
 H_1    no         no         no         no         possible
 H_2    possible   no         no         no         no
 H_3    no         possible   no         no         no

Remove incompatible labels from the possible labels for a set of detected image features.

P_i, i = 1, ..., D is the set of detected image features.
S(P_i), i = 1, ..., D is the set of initially compatible labels.
R is a relationship over which compatibility is determined.

procedure Relaxation_Labeling(P, S, R);
{
  repeat
    for each (P_i, S(P_i))
    {
      for each label L_k ∈ S(P_i)
        for each relation R(P_i, P_j) over the image parts
          if ∃ L_m ∈ S(P_j) with R(L_k, L_m) in the model
            then keep L_k in S(P_i)
            else delete L_k from S(P_i)
    }
  until no change in any set S(P_i)
  return(S);
}

Algorithm 6: Discrete Relaxation Labeling
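
The filtering loop translates directly into Python; the sketch below assumes binary relations and that R_P lists each related pair in both orders, and the names discrete_relaxation and label_sets are our own.

# Discrete relaxation for binary relations. label_sets maps each part
# to its current set of candidate labels; a label survives only if some
# label of the related part supports it through R_L.
def discrete_relaxation(R_P, R_L, label_sets):
    changed = True
    while changed:                        # one pass per iteration
        changed = False
        for (pi, pj) in R_P:
            keep = {lk for lk in label_sets[pi]
                    if any((lk, lm) in R_L or (lm, lk) in R_L
                           for lm in label_sets[pj])}
            if keep != label_sets[pi]:
                label_sets[pi] = keep
                changed = True
    return label_sets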


In this case the corresponding algorithms are called continuous. In this section we will look at a labeling algorithm called continuous relaxation for symmetric binary relations.

A continuous relaxation labeling problem is a 6-tuple CLRP = (P, L, R_P, R_L, PR, C). As before, P is a set of parts, L is a set of labels for those parts, R_P is a relationship over the parts, and R_L is a relationship over the labels. L is usually given as the union over all parts i of L_i, the set of allowable labels for part i. Suppose that |P| = n. Then PR is a set of n functions PR = {pr_1, ..., pr_n} where pr_i(l) is the a priori probability that label l is valid for part i. C is a set of n^2 compatibility coefficients C = {c_ij}, i = 1, ..., n, j = 1, ..., n; c_ij can be thought of as the influence that part j has on the labels of part i. Thus, if we view the constraint relation R_P as a graph, we can view c_ij as a weight on the edge between part i and part j.

Instead of using R_P and R_L directly, we combine them to form a set R of n^2 functions R = {r_ij}, i = 1, ..., n, j = 1, ..., n, where r_ij(l, l') is the compatibility of label l for part i with label l' for part j. In the discrete case, r_ij(l, l') would be 1, meaning ((i, l), (j, l')) is allowed, or 0, meaning that combination is incompatible. In the continuous case, r_ij(l, l') can be any value between 0 and 1, indicating how compatible the relationship between parts i and j is with the relationship between labels l and l'. This information can come from R_P and R_L, which may themselves be simple binary relations or may be attributed binary relations, where the attribute associated with a pair of parts (or pair of labels) represents the likelihood that the required relationship holds between them. The solution of a continuous relaxation labeling problem, like that of a consistent labeling problem, is a mapping f : P → L that assigns a label to each unit. Unlike the discrete case, there is no external definition stating what conditions such a mapping f must satisfy. Instead, the definition of f is implicit in the procedure that produces it. This procedure is known as continuous relaxation.

As discrete relaxation algorithms iterate to remove possible labels from the label set L_i of a part i, continuous relaxation iterates to update the probabilities associated with each part-label pair. The initial probabilities are defined by the set PR of functions defining a priori probabilities. The algorithm starts with these initial probabilities at step 0. Thus we define the probabilities at step 0 by

    pr_i^0(l) = pr_i(l)                                                  (11.18)

for each part i and label l. At each iteration k of the relaxation, a new set of probabilities {pr_i^k(l)} is computed from the previous set and the compatibility information. In order to define pr_i^k(l) we first introduce a piece of it, q_i^k(l), defined by

    q_i^k(l) = Σ_{j | (i,j) ∈ R_P} c_ij [ Σ_{l' ∈ L_j} r_ij(l, l') pr_j^k(l') ]      (11.19)

The function q_i^k(l) represents the influence that the current probabilities associated with labels of other parts constrained by part i have on the label of part i. With this formulation, the formula for updating the pr_i^k's can be written as

    pr_i^{k+1}(l) = pr_i^k(l) (1 + q_i^k(l)) / Σ_{l' ∈ L_i} pr_i^k(l') (1 + q_i^k(l'))      (11.20)


[Figure: a model with parts p1-p4 (line segments) and an image with labels l1-l7 (line segments).]

Figure 11.20: A model and image for continuous relaxation.

The numerator of the expression allows us to add to the current probability pr_i^k(l) a term that is the product pr_i^k(l) q_i^k(l) of the current probability and the opinions of other related parts, based on the current probabilities of their own possible labels. The denominator normalizes the expression by summing over all possible labels for part i.
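
As a concrete rendering of Equations 11.19 and 11.20, the following Python sketch performs one update step; the dictionary encoding and the name relaxation_step are our own assumptions.

# One continuous relaxation step. pr[i][l]: current probability of label
# l for part i; c[(i, j)]: compatibility coefficient; r[(i, j)][(l, lp)]:
# compatibility of label l for part i with label lp for part j;
# neighbors[i]: the parts j with (i, j) in R_P.
def relaxation_step(pr, c, r, neighbors):
    new_pr = {}
    for i in pr:
        q = {l: sum(c[(i, j)] * sum(r[(i, j)][(l, lp)] * pr[j][lp]
                                    for lp in pr[j])
                    for j in neighbors[i])
             for l in pr[i]}
        z = sum(pr[i][l] * (1.0 + q[l]) for l in pr[i])   # Eq. 11.20 denominator
        new_pr[i] = {l: pr[i][l] * (1.0 + q[l]) / z for l in pr[i]}
    return new_pr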

Exercise 15 Continuous Relaxation

Figure 11.20 shows a model and an image, each composed of line segments. Two line segments are said to be in the relationship closadj if their endpoints either coincide or are close to each other. a) Construct the attributed relation R_P over the parts of the model defined by R_P = {(p_i, p_j, d) | p_i closadj p_j} and the attributed relation R_L over the labels of the image defined by R_L = {(l_i, l_j) | l_i closadj l_j}. b) Define the compatibility coefficients by c_ij = 1 if (p_i, p_j) ∈ R_P, else 0. Use R_P and R_L together to define R in a manner of your choosing. Let pr_i(l_j) be given as 1 if p_i has the same orientation as l_j, 0 if they are perpendicular, and 0.5 if one is diagonal and the other is horizontal or vertical. Define pr for the parts of the model and labels of the image. c) Apply several iterations of continuous relaxation to find a probable labeling from the model parts to the image labels.

Relational Distance Matching

A fully consistent labeling is unrealistic in many real applications. Due to feature extraction errors, noise, and occlusion of one object by another, an image may have missing and extra parts, and required relationships may not hold. Continuous relaxation may be used in these cases, but it is not guaranteed to find the best solution. In problems where finding an optimal solution is important, we can perform a search to find the best mapping f from P to L, in the sense that it preserves the most relationships and/or minimizes the number of NIL labels. The concept of relational distance as originally defined by Haralick and Shapiro [1981] allows us to define the best mapping in the general case where there may be any number of relations with possibly different dimensions. To do this we first need the concept of a relational description of an image or object.


12 Definition A relational description D_P is a sequence of relations D_P = {R_1, ..., R_I} where for each i = 1, ..., I, there exists a positive integer n_i with R_i ⊆ P^{n_i} for some set P. P is a set of the parts of the entity being described, and the relations R_i indicate various relationships among the parts.

A relational description is a data structure that may be used to describe two-dimensional shape models, three-dimensional object models, regions of an image, and so on.

Let D_A = {R_1, ..., R_I} be a relational description with part set A and D_B = {S_1, ..., S_I} be a relational description with part set B. We will assume that |A| = |B|; if this is not the case, we will add enough dummy parts to the smaller set to make it the case. This assumption is made in order to guarantee that the relational distance is a metric.

Let f be any one-one, onto mapping from A to B. For any R ⊆ A^N, N a positive integer, the composition R ∘ f of relation R with function f is given by

    R ∘ f = {(b_1, ..., b_N) ∈ B^N | there exists (a_1, ..., a_N) ∈ R
             with f(a_n) = b_n, n = 1, ..., N}

This composition operator takes N-tuples of R and maps them, component by component, into N-tuples of B^N.
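
In code, the composition is a one-line set operation; this small sketch (the helper name compose is ours) mirrors the definition for a mapping stored as a Python dict.

# Composition R ∘ f: map every tuple of R component by component.
def compose(R, f):
    return {tuple(f[a] for a in t) for t in R}

# For the digraphs of Figure 11.21:
R = {(1, 2), (2, 3), (3, 4), (4, 2)}
f = {1: 'a', 2: 'b', 3: 'c', 4: 'd'}
print(compose(R, f))    # {('a','b'), ('b','c'), ('c','d'), ('d','b')}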

The function f maps parts from set A to parts from set B. The structural error of f for the ith pair of corresponding relations (R_i and S_i) in D_A and D_B is given by

    E_S^i(f) = |R_i ∘ f - S_i| + |S_i ∘ f^{-1} - R_i|.

The structural error indicates how many tuples in R_i are not mapped by f to tuples in S_i and how many tuples in S_i are not mapped by f^{-1} to tuples in R_i. The structural error is expressed with respect to only one pair of corresponding relations.

The total error of f with respect to D_A and D_B is the sum of the structural errors for each pair of corresponding relations. That is,

    E(f) = Σ_{i=1}^{I} E_S^i(f).

The total error gives a quantitative idea of the difference between the two relational descriptions D_A and D_B with respect to the mapping f.

The relational distance GD(D_A, D_B) between D_A and D_B is then given by

    GD(D_A, D_B) = min over all one-one, onto f : A → B of E(f).

That is, the relational distance is the minimal total error obtained for any one-one, onto mapping f from A to B. We call a mapping f that minimizes total error a best mapping from D_A to D_B. If there is more than one best mapping, additional information that is outside the pure relational paradigm can be used to select the preferred mapping. More than one


[Figure: two four-node digraphs, R with nodes 1-4 and S with nodes a-d.]

Figure 11.21: Two digraphs whose relational distance is 2.

best mapping will occur when the relational descriptions involve certain kinds of symmetries.

We illustrate the relational distance with several examples. Figure 11.21 shows two digraphs, each having four nodes. A best mapping from A = {1, 2, 3, 4} to B = {a, b, c, d} is {f(1) = a, f(2) = b, f(3) = c, f(4) = d}. For this mapping we have

    |R ∘ f - S| = |{(1,2),(2,3),(3,4),(4,2)} ∘ f - {(a,b),(b,c),(c,b),(d,b)}|
                = |{(a,b),(b,c),(c,d),(d,b)} - {(a,b),(b,c),(c,b),(d,b)}|
                = |{(c,d)}|
                = 1

    |S ∘ f^{-1} - R| = |{(a,b),(b,c),(c,b),(d,b)} ∘ f^{-1} - {(1,2),(2,3),(3,4),(4,2)}|
                     = |{(1,2),(2,3),(3,2),(4,2)} - {(1,2),(2,3),(3,4),(4,2)}|
                     = |{(3,2)}|
                     = 1

    E(f) = |R ∘ f - S| + |S ∘ f^{-1} - R|
         = 1 + 1
         = 2

Since f is a best mapping, the relational distance is also 2.
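
For small part sets, the relational distance can be computed by brute force over all one-one, onto mappings; the sketch below (our own function relational_distance, built on the compose helper above) does exactly that, and is feasible only for a handful of parts since it enumerates |A|! permutations.

from itertools import permutations

# Brute-force relational distance between descriptions D_A and D_B,
# given as lists of corresponding relations (sets of tuples).
def relational_distance(A, B, D_A, D_B):
    best = None
    for perm in permutations(B):
        g = dict(zip(A, perm))
        ginv = {b: a for a, b in g.items()}
        e = sum(len(compose(R_i, g) - S_i) + len(compose(S_i, ginv) - R_i)
                for R_i, S_i in zip(D_A, D_B))
        best = e if best is None else min(best, e)
    return best

# Reproduces the Figure 11.21 result (R as defined in the compose example):
S = {('a', 'b'), ('b', 'c'), ('c', 'b'), ('d', 'b')}
print(relational_distance([1, 2, 3, 4], ['a', 'b', 'c', 'd'], [R], [S]))  # 2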

Figure 11.22 gives a set of object models M_1, M_2, M_3, and M_4 whose primitives are image regions. Two relations are shown in the figure: the connection relation and the parallel relation. Both are binary relations over the set of primitives. Consider the first two models, M_1 and M_2. The best mapping f maps primitive 1 to 1', 2 to 2', and 3 to 3'. Under this mapping the connection relations are isomorphic. The parallel relationship (2, 3) in model M_1 does not hold between 2' and 3' in model M_2. Thus the relational distance between M_1 and M_2 is exactly 1. Now consider models M_1 and M_3. The best mapping maps 1 to 1'', 2 to 2'', 3 to 3'', and a dummy primitive to 4''. Under this mapping, the parallel relations are


Connection 4" Connection (1",2") (1,2) (1",3") 1 1" (1,3) (1",4") 2 3 2" 3" Parallel Parallel (2,3) (2",3")

M1 M3

Connection (4*,5*) 4* Connection 1' (4*,6*) 5* 6* (1',2') 2' (1*,5*) (1',3') 3' (1*,6*) 1* (1*,2*) Parallel (1*,3*) 2* 3* 0 Parallel (2*,3*) M2 M4

(5*,6*)

Figure 11.22: Four ob ject mo dels. The relational distance of mo del M to M and M to

1 2 1

M is 1. The relational distance of mo del M to M is 6.

3 3 4

now isomorphic, but there is one more connection in M_3 than in M_2. Again the relational distance is exactly 1.

Finally consider models M_3 and M_4. The best mapping maps 1'' to 1*, 2'' to 2*, 3'' to 3*, 4'' to 4*, 5_d to 5*, and 6_d to 6*. (5_d and 6_d are dummy primitives added to M_3.) For this mapping we have


    |R_1 ∘ f - S_1| = |{(1'',2''),(1'',3''),(1'',4'')} ∘ f
                       - {(4*,5*),(4*,6*),(1*,5*),(1*,6*),(1*,2*),(1*,3*)}|
                    = |{(1*,2*),(1*,3*),(1*,4*)}
                       - {(4*,5*),(4*,6*),(1*,5*),(1*,6*),(1*,2*),(1*,3*)}|
                    = |{(1*,4*)}|
                    = 1

    |S_1 ∘ f^{-1} - R_1| = |{(4*,5*),(4*,6*),(1*,5*),(1*,6*),(1*,2*),(1*,3*)} ∘ f^{-1}
                            - {(1'',2''),(1'',3''),(1'',4'')}|
                         = |{(4'',5_d),(4'',6_d),(1'',5_d),(1'',6_d),(1'',2''),(1'',3'')}
                            - {(1'',2''),(1'',3''),(1'',4'')}|
                         = |{(4'',5_d),(4'',6_d),(1'',5_d),(1'',6_d)}|
                         = 4

    |R_2 ∘ f - S_2| = |{(2'',3'')} ∘ f - {(2*,3*),(5*,6*)}|
                    = |{(2*,3*)} - {(2*,3*),(5*,6*)}|
                    = |∅|
                    = 0

    |S_2 ∘ f^{-1} - R_2| = |{(2*,3*),(5*,6*)} ∘ f^{-1} - {(2'',3'')}|
                         = |{(2'',3''),(5_d,6_d)} - {(2'',3'')}|
                         = |{(5_d,6_d)}|
                         = 1

    E_S^1(f) = 1 + 4 = 5
    E_S^2(f) = 0 + 1 = 1
    E(f) = 6

Exercise 16 Relational Distance Tree Search

Modify the algorithm for Interpretation Tree Search to find the relational distance between two structural descriptions and to determine the best mapping in the process of doing so.

Exercise 17 One-Way Relational Distance

The above definition of relational distance uses a two-way mapping error, which is useful when comparing two objects that stand alone. When matching a model to an image, we often want to use only a one-way mapping error, checking how many relationships of the model are in the image, but not vice versa. Define a modified one-way relational distance that can be used for model-image matching.


Exercise 18 NIL mappings in Relational Distance

The above definition of relational distance does not handle NIL labels explicitly. Instead, if part j has a NIL label, then any relationship (i, j) will cause an error, since (f(i), NIL) will not be present. Define a modified relational distance that counts NIL labels as errors only once and does not penalize again for missing relationships caused by NIL labels.

Exercise 19 Attributed Relational Distance

The above definition also does not handle attributed relations, in which each tuple, in addition to a sequence of parts, contains one or more attributes of the relation. For example, a connection relation for line segments might have as an attribute the angle between the connecting segments. Formally, an attributed n-relation R over part set P and attribute set A is a set R ⊆ P^n × A^m for some nonnegative integer m that specifies the number of attributes in the relation. Define a modified relational distance in terms of attributed relations.

Relational Indexing

Sometimes a tree search, even with relaxation filtering, is too slow, especially when an image is to be compared to a large database of models. For structural descriptions in terms of labeled relations, it is possible to approximate the relational distance with a simpler voting scheme. Intuitively, suppose we observe two concentric circles and two 90-degree corners connected by an edge. We would like to quickly find all models that have these structures and match them in more detail. To achieve this, we can build an index that allows us to look up the models given the partial graph structure. Given two concentric circles, we look up all models containing these related features and give each of those models one vote. Then, we look up all models having connected 90-degree corners: any models repeating from the first test will now have two votes. These lookups can be done rapidly provided that an index is built offline before recognition by extracting significant binary relations from each model and recording each in a lookup table.

Let DB = {M_1, M_2, ..., M_t} be a database of t object models. Each object model M_t consists of a set of attributed parts P_t plus a labeled relation R_t. For simplicity of explanation, we will assume that each part has a single label, rather than a vector of attributes, and that the relation is a binary relation, also with a single label attached to each tuple. In this case, a model is represented by a set of two-graphs, each of which is a graph with two nodes and two directed edges. Each node represents a part, and each edge represents a directed binary relationship. The value in the node is the label of the part, rather than a unique identifier. Similarly, the value in an edge is the label of the relationship. For example, one node could represent an ellipse and another could represent a pair of parallel lines. The edge from the parallel-lines node to the ellipse node could represent the relationship "encloses", while the edge in the opposite direction represents the relationship "is enclosed by".

Relational indexing requires a preprocessing step in which a large hash table is set up. The hash table is indexed by a string representation of a two-graph. When it is completed, one can look up any two-graph in the table and quickly retrieve a list of all models containing that particular two-graph. In our example, all models containing an ellipse between two parallel line segments can be retrieved. During recognition of an object from an image, the


Figure 11.23: (Left) Regular grid of lines; (right) grid warped by wrapping it around a cylinder.

Figure 11.24: (Left) Image of center of US$20 bill; (center) image of Andrew Jackson wrapped around a cylinder of circumference 640 pixels; (right) same as center except circumference is 400 pixels.

features are extracted and all the two-graphs representing the image are computed. A set of accumulators, one for each model in the database, are all set to zero. Then each two-graph in the image is used to index the hash table, retrieve the list of associated models, and vote for each one. The discrete version of the algorithm adds one to the vote; a probabilistic algorithm would add a probability value instead. After all two-graphs have voted, the models with the largest numbers of votes are candidates for verification.
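
A minimal Python sketch of the offline indexing and online voting steps follows; the two-graph key encoding and the names build_index and vote are illustrative choices, not a published API.

from collections import defaultdict

# Offline: hash every two-graph key (part label, part label, edge label)
# of each model to the list of models containing it.
def build_index(models):
    index = defaultdict(list)     # models: dict name -> set of two-graph keys
    for name, two_graphs in models.items():
        for key in two_graphs:
            index[key].append(name)
    return index

# Online: each image two-graph votes for all models stored under its key.
def vote(index, image_two_graphs):
    votes = defaultdict(int)
    for key in image_two_graphs:
        for name in index.get(key, []):
            votes[name] += 1      # discrete vote; could add a probability
    return sorted(votes.items(), key=lambda kv: -kv[1])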

11.7 Nonlinear Warping

Nonlinear warping functions are also important. We may need to rectify nonlinear distortion in an image; for example, radial distortion from a fisheye lens. Or, we may want to distort an image in creative ways. Figure 11.23 shows a nonlinear warp which maps a regular grid onto a cylinder. The effect is the same as if we wrapped a flat image around a cylinder and then viewed the cylinder from afar. This same warp applied to a twenty-dollar bill is shown in Figure 11.24. Intuitively, we need to choose some image axis corresponding to the center of a cylinder and then use a formula which models how the pixels of the input image will "compress" off the cylinder axis in the output image. Figure 11.24 shows two warps; the rightmost one uses a cylinder of smaller radius than the center one.

Figure 11.25 shows how to derive the cylindrical warp. We choose an axis for the warp


[Figure: input image I^1 with point [x, y] and axis x_0; a cylinder of radius r with angle φ, arc length d, and chord d'; output image I^2 with point [u, v] and width W.]

Figure 11.25: The output image at the right is created by "wrapping" the input image at the left around a cylinder (center): distance d in the input image becomes distance d' on output.

(determined by x_0) and a width W. W corresponds to one quarter of the circumference of the cylinder. Any length d in the input image is "wrapped around the cylinder" and then projected to the output image. Actually, d corresponds to the length x - x_0, where x_0 is the x-coordinate of the axis of the cylinder. The y coordinate of the input image point is preserved by the warp, so we have v = y. From the figure, the following equations are seen to hold. First, W = (π/2) r, since W accounts for one quarter of the circumference. d accounts for a fraction φ of that: d/W = φ/(π/2) and sin φ = d'/r. Combining these yields d = x - x_0 = (2W/π) arcsin((π/(2W))(u - u_0)). Of course, d' = u - u_0 = u - x_0.

We now have a formula for computing input image coordinates [x, y] from output coordinates [u, v] and the warp parameters x_0 and W. This seems to be backwards; why not take each pixel from the input image and transform it to the output image? Were we to do this, there would be no guarantee that the output image would have each pixel set. For a digital image, we would like to generate each and every pixel of output exactly once and obtain the pixel value from the input image as the following algorithm shows. Moreover, using this approach it is easily possible for the output image to have more or fewer pixels than the input image. The concept is that in generating the output image, we map back into the input image and sample it.


Perform a cylindrical warp operation.

I^1[x, y] is the input image.
x_0 is the axis specification.
W is the width.
I^2[u, v] is the output image.

procedure Cylindrical_Warp(I^1[x, y])
{
  r = 2W/π;
  for u := 0, Nrows - 1
    for v := 0, Ncols - 1
    {
      I^2[u, v] = 0;    // set as background
      if (|u - x_0| ≤ r)
      {
        x = x_0 + r arcsin((u - x_0)/r);
        y = v;
        I^2[u, v] = I^1[round(x), round(y)];
      }
    }
  return(I^2[u, v]);
}

Algorithm 7: Cylindrical Warp of Image Region
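
For readers who prefer runnable code, here is a NumPy rendering of Algorithm 7; it assumes a grayscale image stored as a 2D array whose first index plays the role of u, mirroring the pseudocode, and the function name cylindrical_warp is our own.

import math
import numpy as np

# Backward-map each output pixel into the input image and sample it.
def cylindrical_warp(image, x0, W):
    nrows, ncols = image.shape
    out = np.zeros_like(image)            # background = 0
    r = 2.0 * W / math.pi
    for u in range(nrows):
        for v in range(ncols):
            if abs(u - x0) <= r:
                x = x0 + r * math.asin((u - x0) / r)
                xi = int(round(x))
                if 0 <= xi < nrows:       # guard against sampling off-image
                    out[u, v] = image[xi, v]   # y = v is preserved
    return out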

Exercise 20

(a) Determine the transformation that maps a circular region of an image onto a hemisphere and then projects the hemisphere onto an image. The circular region of the original image is defined by a center (x_c, y_c) and radius r_0. (b) Develop a computer program to carry out the mapping.


Figure 11.26: Two types of radial distortion, barrel (left) and pincushion (center), which can be removed by warping to produce a rectified image (right).

Rectifying Radial Distortion

Radial distortion is present in most lenses: it might go unnoticed by human interpreters, but it can sometimes produce large errors in photometric measurements if not corrected. Physical arguments show that radial distortion in the location of an imaged point is proportional to the distance from that point to the optical axis. Figure 11.26 shows two common cases of radial distortion along with the desirable rectified image. If we assume that the optical axis passes near the image center, then the distortion can be corrected by displacing all image points either toward or away from the center by a displacement proportional to the square of the distance from the center. This is not a linear transformation, since the displacement varies across the image. Sometimes, more even powers of the radial distance are used in the correction, as the mathematical model below shows. Let [x_c, y_c] be the image center, which we are assuming is also where the optical axis passes through the image. The corrections for the image points are as follows, assuming we need the first two even powers of the radial distance to compute the radial distortion. The best values for the constants c_2 and c_4 can be found by empirical study of the radial displacements of known control points or by formally using least-squares fitting in a calibration process.

    R   = sqrt((x - x_c)^2 + (y - y_c)^2)                                (11.21)
    D_r = c_2 R^2 + c_4 R^4
    x'  = x_c + (x - x_c) D_r
    y'  = y_c + (y - y_c) D_r
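
A small Python sketch of the correction in Equation 11.21 follows; the function name correct_radial and the choice to return unrounded coordinates are our own.

import math

# Correct one image point for radial distortion (Eq. 11.21).
def correct_radial(x, y, xc, yc, c2, c4):
    R = math.hypot(x - xc, y - yc)        # distance from the image center
    Dr = c2 * R**2 + c4 * R**4
    return (xc + (x - xc) * Dr, yc + (y - yc) * Dr)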

Polynomial Mappings

Many small global distortion factors can be rectified using polynomial mappings of maximum degree two in two variables, as defined in Equation 11.22. Twelve different coefficients must be estimated in order to adapt to the different geometric factors. To estimate these coefficients, we need the coordinates of at least six control points before and after mapping; however, many more points are used in practice. (Each such control point yields two equations.) Note that if only the first three terms are used in Equation 11.22, the mapping is an affine mapping.

    u = a_00 + a_10 x + a_01 y + a_11 xy + a_20 x^2 + a_02 y^2           (11.22)
    v = b_00 + b_10 x + b_01 y + b_11 xy + b_20 x^2 + b_02 y^2
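
Since each control point contributes two linear equations in the twelve coefficients, the coefficients can be estimated by ordinary least squares; the NumPy sketch below (our own function fit_quadratic_map) sets up and solves that system.

import numpy as np

# Fit the 12 coefficients of Eq. 11.22 from n >= 6 control points.
# src, dst: (n, 2) arrays of [x, y] and corresponding [u, v] coordinates.
def fit_quadratic_map(src, dst):
    x, y = src[:, 0], src[:, 1]
    # design matrix rows: [1, x, y, xy, x^2, y^2]
    A = np.column_stack([np.ones_like(x), x, y, x * y, x**2, y**2])
    a, *_ = np.linalg.lstsq(A, dst[:, 0], rcond=None)   # u coefficients
    b, *_ = np.linalg.lstsq(A, dst[:, 1], rcond=None)   # v coefficients
    return a, b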

Exercise 21

Show that radial distortion in Equation 11.21 with c_4 = 0 can be modeled exactly by a polynomial mapping of the form shown in Equation 11.22.

11.8 Summary

Multiple concepts have been discussed in this chapter under the theme of 2D matching. One major theme was 2D mapping using transformations. These could be used as simple image processing operations which could extract or sample a region from an image, register two images together in the same coordinate system, or remove or creatively add distortion to 2D images. Algebraic machinery was developed for these transformations, and various methods and applications were discussed. This development is continued in Chapter 13 as it relates to mapping points in 3D scenes and 3D models. The second major theme of this chapter was the interpretation of 2D images through correspondences with 2D models. A general paradigm is recognition-by-alignment: the image is interpreted by discovering a model and an RST transformation such that the transformation maps known model structures onto image structures. Several different algorithmic approaches were presented, including pose clustering, interpretation tree search, and the feature focus method. Discrete relaxation and relational matching were also presented: these two methods can be applied in very general contexts, even though they were introduced here within the context of constrained geometric relationships. Relational matching is potentially more robust than rigid alignment when the relations themselves are more robust than those depending on metric properties. Image distortions caused by lens distortions, slanted viewing axes, quantification effects, etc., can cause metric relations to fail; however, topological relationships such as cotermination, connectivity, adjacency, and insideness are usually invariant to such distortions. A successful match using topological relationships on image/model parts might then be used to find a large number of matching points, which can then be used to find a mapping function with many parameters that can adapt to the metric distortions. The methods of this chapter are directly applicable to many real-world applications. Subsequent chapters extend the methods to 3D.

11.9 References

The paper by Van Wie and Stein (1977) discusses a system to automatically bring satellite images into registration with a map. An approximate mapping is known using the time at which the image is taken. This mapping is used to do a refined search for control points using templates: the control points are then used to obtain a refined mapping. The book by Wolberg (1990) gives a complete account of image warping, including careful treatment of sampling the input image and lessening aliasing by smoothing. The treatment of 2D matching via pose clustering was drawn from Stockman et al. (1982), which contained the airplane detection example provided. A more general treatment handling the 3D case is given in Stockman (1987). The paper by Grimson and Lozano-Perez (1984) demonstrates how distance constraints can be used in matching model points to observed data points. Least-squares fitting is treated very well in other references and is a topic of much depth. Least-squares techniques are in common use for the purpose of estimating transformation parameters in the presence of noise by using many more than the minimal number of control points. The book by Wolberg (1990) treats several least-squares methods within the context of warping, while the book by Daniel and Wood (1971) treats the general fitting problem. In some problems, it is not possible to find a good geometric transformation that can be applied globally to the entire image. In such cases, the image can be partitioned into a number of regions, each with its own control points, and separate warps can be applied to each region. The warps from neighboring regions must smoothly agree along the boundary between them. The paper by Goshtasby (1988) presents a flexible way of doing this.

Theory and algorithms for the consistent labeling problem can be found in the papers by Haralick and Shapiro (1979, 1980). Both discrete and continuous relaxation are defined in the paper by Rosenfeld, Hummel, and Zucker (1976), and continuous relaxation is further analyzed in the paper by Hummel and Zucker (1983). Methods for matching using structural descriptions have been derived from Shapiro (1981); relational indexing using 2-graphs can be found in Costa and Shapiro (1995). Use of invariant attributes of structural parts for indexing into models can be found in Chen and Stockman (1996).

1. J. L. Chen and G. Stockman, Indexing to 3D Model Aspects using 2D Contour Features, Proc. Int. Conf. on Computer Vision and Pattern Recognition (CVPR), San Francisco, CA, June 18-20, 1996. Expanded paper to appear in the journal CVIU.

2. M. Clowes (1971), On seeing things, Artificial Intelligence, Vol. 2, pp. 79-116.

3. M. S. Costa and L. G. Shapiro (1995), Scene Analysis Using Appearance-Based Models and Relational Indexing, IEEE Symposium on Computer Vision, November 1995, pp. 103-108.

4. C. Daniel and F. Wood (1971), Fitting Equations to Data, John Wiley and Sons, Inc., New York.

5. A. Goshtasby (1988), Image Registration by Local Approximation Methods, Image and Vision Computing, Vol. 6, No. 4 (Nov 1988), pp. 255-261.

6. W. Grimson and T. Lozano-Perez (1984), Model-Based Recognition and Localization From Sparse Range or Tactile Data, Int. Journal of Robotics Research, Vol. 3, No. 3, pp. 3-35.

7. R. Haralick and L. Shapiro (1979), The consistent labeling problem I, IEEE Trans. PAMI-1, pp. 173-184.

8. R. Haralick and L. Shapiro (1980), The consistent labeling problem II, IEEE Trans. PAMI-2, pp. 193-203.

9. R. Hummel and S. Zucker (1983), On the Foundations of Relaxation Labeling Processes, IEEE Trans. PAMI-5, pp. 267-287.

10. Y. Lamdan and H. Wolfson (1988), Geometric Hashing: A General and Efficient Model-Based Recognition Scheme, Proc. 2nd Int. Conf. on Computer Vision, Tarpon Springs, FL (Nov 1988), pp. 238-249.

11. D. Rogers and J. Adams (**), Mathematical Elements for Computer Graphics, 2nd Ed., McGraw-Hill.

12. A. Rosenfeld, R. Hummel, and S. Zucker (1976), Scene Labeling by Relaxation Operators, IEEE Trans. Systems, Man, and Cybernetics, SMC-6, pp. 420-453.

13. G. Stockman, S. Kopstein, and S. Benett (1982), Matching Images to Models for Registration and Object Detection via Clustering, IEEE Trans. on PAMI, Vol. PAMI-4, No. 3 (May 1982), pp. 229-241.

14. G. Stockman (1987), Object Recognition and Localization via Pose Clustering, Computer Vision, Graphics and Image Processing, Vol. 40 (1987), pp. 361-387.

15. P. Van Wie and M. Stein (1977), A LANDSAT digital image rectification system, IEEE Trans. Geosci. Electron., Vol. GE-15, July 1977.

16. P. Winston (1977), Artificial Intelligence, Addison-Wesley.

17. G. Wolberg (1990), Digital Image Warping, IEEE Computer Society Press, Los Alamitos, CA.