Notes on Motion Estimation

Psych 267 / CS 348D / EE 365

Prof David J Heeger

November

There are a great variety of applications that depend on analyzing the motion in image sequences. These include motion detection (for surveillance), image sequence data compression (MPEG), image understanding (motion-based segmentation, depth/structure from motion, obstacle avoidance), and image registration and compositing. The first step in processing image sequences is typically image velocity estimation. The result is called the optical flow field: a collection of two-dimensional velocity vectors, one for each small region (potentially one for each pixel) of the image.

Image velocities can be measured using correlation or block matching (for example, see Anandan), in which each small patch of the image at one frame is compared with nearby patches in the next frame. Feature extraction and matching is another way to measure the flow field; for reviews of feature tracking methods, see Barron, or Aggarwal and Nandhakumar. Gradient-based algorithms are a third approach to measuring flow fields (for example, Horn and Schunck; Lucas and Kanade; Nagel). A fourth approach, using spatiotemporal filtering, has also been proposed (for example, Watson and Ahumada; Heeger; Grzywacz and Yuille; Fleet and Jepson). This handout concentrates on the filter-based and gradient-based methods. Emphasis is placed on the importance of multiscale, coarse-to-fine refinement of the velocity estimates. For a good overview and comparison of different optical flow methods, see Barron et al.

Geometry: 3D Velocity and 2D Image Velocity

Motion occurs in many applications, and there are many tasks for which motion information might be very useful. Here we concentrate on natural image sequences of 3D scenes, in which objects and the camera may be moving. Typical issues for which we want computational solutions include: inferring the relative 3D motion between the camera and objects in the scene, inferring the depth and surface structure of the scene, and segmenting the scene using the motion information. Towards this end, we are interested in knowing the geometric relationships between 3D motion, surface structure, and 2D image velocity.

To begin, consider a point on the surface of an object. We will represent 3D surface points as position vectors X = (X, Y, Z)^T relative to a viewer-centered coordinate frame, as depicted in Fig. 1. When the camera moves, or when the object moves, this point moves along a 3D path X(t) = (X(t), Y(t), Z(t))^T relative to our viewer-centered coordinate frame. The instantaneous 3D velocity of the point is the derivative of this path with respect to time:

    V = dX(t)/dt = (dX/dt, dY/dt, dZ/dt)^T

We are interested in the image locations to which the 3D point projects as a function of time. Under perspective projection, the point X projects to the image point (x, y)^T given by

    x = f X / Z
    y = f Y / Z

Figure 1: Camera-centered coordinate frame and perspective projection. Owing to motion between the camera and the scene, a 3D surface point traverses a path in 3D. Under perspective projection, this path projects onto a 2D path in the image plane, the temporal derivative of which is called 2D velocity. The 2D velocities associated with all visible points define a dense 2D vector field called the 2D motion field.

where f is the focal length of the projection. As the 3D point moves through time, its corresponding 2D image point traces out a 2D path (x(t), y(t))^T, the derivative of which is the image velocity:

    u = (dx(t)/dt, dy(t)/dt)^T

We can combine the perspective projection equations with this definition, and thereby write image velocity in terms of the 3D position and velocity of the surface point:

    u = (f/Z) (dX/dt, dY/dt)^T - (f/Z²) (dZ/dt) (X(t), Y(t))^T

Each visible surface point traces out a 3D path, which projects onto the image plane to form a 2D path. If one considers the paths of all visible 3D surface points and their projections onto the image plane, then one obtains a dense set of 2D paths, the temporal derivative of which is the vector field of 2D velocities, commonly known as the optical flow field.

To understand the structure of the optical flow field in greater detail, it is often helpful to make further assumptions about the structure of the 3D surface or about the form of the 3D motion. We begin by considering the special case in which the objects in the scene move rigidly with respect to the camera, as though the camera was moving through a stationary scene (Longuet-Higgins and Prazdny; Bruss and Horn; Waxman and Ullman; Heeger and Jepson). This is a special case because all points in the scene share the same six motion parameters relative to the camera-centered coordinate frame. In particular, the instantaneous velocity of the camera through a stationary scene can be expressed in terms of the camera's 3D translation T = (T_x, T_y, T_z)^T and its instantaneous 3D rotation Ω = (Ω_x, Ω_y, Ω_z)^T. Here the direction of Ω gives the axis of rotation, while |Ω| is the magnitude of rotation per unit time. Given this motion of the camera, the instantaneous 3D velocity of a surface point in camera-centered coordinates is

    (dX/dt, dY/dt, dZ/dt)^T = -T - Ω × X

Figure 2: Example flow fields. (A) Camera translation. (B) Camera rotation. (C) Translation plus rotation. Each flow vector in C is the vector sum of the two corresponding vectors in A and B.

The 3D velocities of all surface points in a stationary scene depend on the same rigid-body motion parameters, and are given by the equation above. When we substitute these 3D velocities for dX/dt in the expression for image velocity, we obtain an expression for the form of the optical flow field for a rigid scene:

    u(x, y) = p(x, y) A(x, y) T + B(x, y) Ω

where p(x, y) = 1/Z(x, y) is inverse depth at each image location, and

    A(x, y) = [ -f   0   x
                 0  -f   y ]

    B(x, y) = [ xy/f       -(f + x²/f)   y
                f + y²/f   -xy/f        -x ]

The matrices A(x, y) and B(x, y) depend only on the image position and the focal length. The equation above describes the flow field as a function of 3D motion and depth. It has two terms. The first term is referred to as the translational component of the flow field, since it depends on 3D translation and 3D depth. The second term is referred to as the rotational component, since it depends only on 3D rotation. Since p(x, y) (the inverse depth) and T (the translation) are multiplied together, a greater depth (smaller p) or a slower 3D translation (smaller |T|) both yield a slower image velocity.
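As a quick check of these expressions, here is a small numerical sketch (not part of the handout; the focal length, depth map, and motion parameters are made-up example values):

```python
import numpy as np

# Sketch: evaluate the rigid-motion flow field
#     u(x, y) = p(x, y) A(x, y) T + B(x, y) Omega
# on a grid of image positions. All parameter values are made-up examples.

def flow_field(f, T, Om, Z, x, y):
    """Return flow components (u1, u2) at image positions (x, y) with depth Z."""
    p = 1.0 / Z  # inverse depth
    # Translational component: p * A(x,y) T, with A = [[-f, 0, x], [0, -f, y]].
    u1 = p * (-f * T[0] + x * T[2])
    u2 = p * (-f * T[1] + y * T[2])
    # Rotational component: B(x,y) Omega (independent of depth).
    u1 += (x * y / f) * Om[0] - (f + x**2 / f) * Om[1] + y * Om[2]
    u2 += (f + y**2 / f) * Om[0] - (x * y / f) * Om[1] - x * Om[2]
    return u1, u2

f = 1.0
x, y = np.meshgrid(np.linspace(-1, 1, 5), np.linspace(-1, 1, 5))
Z = np.full_like(x, 4.0)              # fronto-parallel plane at depth 4
T = np.array([0.2, 0.0, 1.0])         # forward translation plus a little sideways
u1, u2 = flow_field(f, T, np.zeros(3), Z, x, y)
```

With zero rotation, the computed flow vanishes at the focus of expansion (x_0, y_0) = (f T_x/T_z, f T_y/T_z), and with zero translation it is independent of depth, as the equations above require.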

Figure 2 shows some example flow fields. Each vector represents the speed and direction of motion for each local image region. Figure 2A is a flow field resulting from camera translation above a planar surface, Fig. 2B is a flow field resulting from camera rotation, and Fig. 2C is a flow field resulting from simultaneous translation and rotation. Each flow vector in Fig. 2C is the vector sum of the two corresponding flow vectors in Figs. 2A and 2B.

Pure Translation: When a camera is undergoing a pure translational motion, features in the image move toward or away from a single point in the image, called the focus of expansion (FOE). The FOE in Fig. 2A is centered just above the horizon.

When the rotation is zero,

    u(x, y) = p(x, y) A(x, y) T

i.e.,

    u_1(x, y) = p(x, y) (x - x_0) T_z,    x_0 = f T_x / T_z
    u_2(x, y) = p(x, y) (y - y_0) T_z,    y_0 = f T_y / T_z

The image velocity is zero at image position (x_0, y_0)^T; this is the focus of expansion. At any other image position, the flow vector is proportional to (x, y)^T - (x_0, y_0)^T; that is, the flow vector points either toward or away from the focus of expansion.

Pure Rotation: When the camera is undergoing a pure rotational motion, the resulting flow field depends only on the axis and speed of rotation, independent of the scene depth/structure (see Fig. 2B for an example). Specifically, rotation results in a quadratic flow field; that is, each component, u_1 and u_2, of the flow field is a quadratic function of image position. This is evident from the form of B(x, y), which contains quadratic terms in x and y (e.g., x², xy, etc.).

Motion With Respect to a Planar Surface: When the camera is rotating as well as translating, the flow field can be quite complex. Unlike the pure translation case, the singularity in the flow field no longer corresponds to the translation direction. But for planar surfaces, the flow field simplifies to again be a quadratic function of image position (see Fig. 2C for an example).

A planar surface can be expressed in the camera-centered coordinate frame as

    Z = m_x X + m_y Y + Z_0

The depth map is obtained by substituting from the perspective projection equations, to give

    Z(x, y) = m_x (x/f) Z(x, y) + m_y (y/f) Z(x, y) + Z_0

Solving for Z(x, y):

    Z(x, y) = f Z_0 / (f - x m_x - y m_y)

and the inverse depth map is

    p(x, y) = f/(f Z_0) - (m_x/(f Z_0)) x - (m_y/(f Z_0)) y

That is, for a planar surface, the inverse depth map is also planar.

Combining this planar inverse depth map with the flow field equation gives

    u(x, y) = ((f - m_x x - m_y y) / (f Z_0)) [ -f   0   x
                                                 0  -f   y ] T + B(x, y) Ω

We already know (see above) that the rotational component of the flow field (the second term, B Ω) is quadratic in x and y. It is clear that the translational component (the first term in the above equation) is also quadratic in x and y.

For curved 3D surfaces, the translational component (and hence the optical flow field) will be cubic or even higher order. The rotational component is always quadratic.

Occlusions: The optical flow field is often more complex than is implied by the above equations. For example, the boundaries of objects give rise to occlusions in an image sequence, which cause discontinuities in the optical flow field. A detailed discussion of these issues is beyond the scope of this presentation.

Figure 3: Intensity conservation assumption: the image intensities are shifted (locally translated) from one frame (time t) to the next (time t+1).

Gradient-Based 2D Motion Estimation

The optical flow field described above is a geometric entity. We now turn to the problem of estimating it from the spatiotemporal variations in image intensity. Towards this end, a number of people (Horn and Schunck; Lucas and Kanade; Nagel; and others) have proposed algorithms that compute optical flow from the spatial and temporal derivatives of image intensity. This is where we begin.

Intensity Conservation and Block Matching: The usual starting point for velocity estimation is to assume that the intensities are shifted (locally translated) from one frame to the next, and that the shifted intensity values are conserved, i.e.,

    f(x, y, t) = f(x + u_1, y + u_2, t + 1)

where f(x, y, t) is image intensity as a function of position and time, and u = (u_1, u_2) is velocity.

Of course, this intensity conservation assumption is only approximately true in practice. The effective assumption is that the surface radiance remains fixed from one frame to the next in an image sequence. One can fabricate scene constraints for which this holds, but they are somewhat artificial. For example, the scene might be constrained to contain only Lambertian surfaces (no specularities), with a distant point light source (so that changing the distance to the light source has no effect), and with no secondary illumination effects (shadows or inter-surface reflection). Although unrealistic, it is remarkable that this intensity conservation constraint works as well as it does.

Then, a typical block matching algorithm would proceed as illustrated in Fig. 3: by choosing a small square block of pixels from one frame, and shifting it over the next frame to find the best match. One might search for a peak in the cross-correlation,

    Σ f(x, y, t) f(x + u_1, y + u_2, t + 1)

or, equivalently, a minimum in the sum-of-squared-differences (SSD),

    Σ [f(x, y, t) - f(x + u_1, y + u_2, t + 1)]²

In both cases, the summation is taken over equal-sized blocks in the two frames.
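A minimal sketch of such a block matching search (not from the handout; the test images, block position, and search range are made-up):

```python
import numpy as np

# Sketch: exhaustive SSD block matching under the intensity-conservation
# assumption. Image, block size, and search range are made-up examples.

def ssd_match(f1, f2, x0, y0, size, search):
    """Integer shift (u1, u2) minimizing the SSD between a block of f1 at
    (x0, y0) and shifted blocks of f2."""
    block = f1[y0:y0 + size, x0:x0 + size]
    best, best_uv = None, (0, 0)
    for u2 in range(-search, search + 1):          # vertical shift
        for u1 in range(-search, search + 1):      # horizontal shift
            cand = f2[y0 + u2:y0 + u2 + size, x0 + u1:x0 + u1 + size]
            ssd = np.sum((block - cand) ** 2)
            if best is None or ssd < best:
                best, best_uv = ssd, (u1, u2)
    return best_uv

rng = np.random.default_rng(0)
f1 = rng.random((32, 32))
f2 = np.roll(f1, shift=(1, 2), axis=(0, 1))        # translate by (dy, dx) = (1, 2)
print(ssd_match(f1, f2, 8, 8, size=8, search=3))   # -> (2, 1)
```

Note that this only recovers integer displacements; sub-pixel accuracy requires interpolating the SSD surface or the gradient-based methods described next.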

Figure 4: The gradient constraint relates the displacement at one point of the signal to the temporal difference and the spatial derivative (slope) of the signal. For a displacement of a linear signal (left), the difference in signal values at a point, divided by the slope, gives the displacement. In the case of nonlinear signals (right), the difference divided by the slope gives an approximation to the displacement.

Gradient Constraints: To derive a simple linear method for estimating the velocity u, let's first digress to consider a 1D case. Let our signals at two time instants be f_1(x) and f_2(x), and suppose that f_2(x) is a translated version of f_1(x), i.e., f_2(x) = f_1(x - d) for displacement d. This situation is illustrated in Fig. 4. A Taylor series expansion of f_1(x - d) about x is given by

    f_1(x - d) = f_1(x) - d f_1'(x) + O(d²)

where f_1' denotes the derivative of f_1 with respect to x. Given this expansion, we can now write the difference between the two signals at location x as

    f_1(x) - f_2(x) = d f_1'(x) - O(d²)

By neglecting the higher-order terms, we arrive at an approximation d̂ to the displacement d:

    d̂ = (f_1(x) - f_2(x)) / f_1'(x)

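A tiny numerical sketch of this 1D estimator (made-up signal and displacement; the estimate is only approximate because the sine is not linear):

```python
import numpy as np

# Sketch of the 1D gradient-based displacement estimate
#     d_hat = (f1(x) - f2(x)) / f1'(x).
# The signal and the displacement are made-up example values.

x = np.linspace(0, 2 * np.pi, 1000)
d = 0.05                                # true (small) displacement
f1 = np.sin(x)
f2 = np.sin(x - d)                      # f2(x) = f1(x - d)
f1_prime = np.cos(x)                    # analytic derivative of f1

i = 100                                 # a point where the slope is far from zero
d_hat = (f1[i] - f2[i]) / f1_prime[i]
print(d_hat)                            # close to the true d = 0.05
```

The estimate is exact only for a linear signal; the residual error here comes from the neglected O(d²) curvature terms.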
The Motion Gradient Constraint: This approach generalizes straightforwardly to two spatial dimensions to produce the motion gradient constraint equation. As above, one assumes that the time-varying image intensity is well approximated by a first-order Taylor series expansion:

    f(x + u_1, y + u_2, t + 1) ≈ f(x, y, t) + u_1 f_x(x, y, t) + u_2 f_y(x, y, t) + f_t(x, y, t)

where f_x, f_y, and f_t are the spatial and temporal partial derivatives of image intensity. If we ignore second- and higher-order terms in the Taylor series, and then substitute the resulting approximation into the intensity conservation equation, we obtain

    u_1 f_x(x, y, t) + u_2 f_y(x, y, t) + f_t(x, y, t) = 0

This equation relates the velocity at one location in the image to the spatial and temporal derivatives of image intensity at that same location, and is called the motion gradient constraint.

If the time between frames is relatively large, so that the temporal derivative f_t(x, y, t) is unavailable, then a similar constraint can be derived by taking only the first-order spatial terms of the Taylor expansion. In this case we obtain

    u_1 f_x(x, y, t) + u_2 f_y(x, y, t) + Δf(x, y, t) = 0

where Δf(x, y, t) = f(x, y, t + 1) - f(x, y, t).

Intensity Conservation: A different way of deriving the gradient constraint can be found if we consider the 2D paths in the image plane, (x(t), y(t))^T, along which the intensity is constant. In other words, imagine that we are tracking points of constant intensity from frame to frame. Image velocity is simply the temporal derivative of such paths.

More formally, a path (x(t), y(t))^T along which intensity is equal to a constant c must satisfy the equation

    f(x(t), y(t), t) = c

As explained above (assuming that the path is smooth), the temporal derivative of the path (x(t), y(t))^T gives us the optical flow. To obtain a constraint on optical flow, we therefore differentiate this equation:

    (d/dt) f(x(t), y(t), t) = 0

The right-hand side is zero because c is a constant. Because the left-hand side is a function of variables which all depend on time t, we take the total derivative and use the chain rule to obtain

    (d/dt) f(x(t), y(t), t) = (∂f/∂x)(dx/dt) + (∂f/∂y)(dy/dt) + (∂f/∂t)(dt/dt)
                            = f_x u_1 + f_y u_2 + f_t

In this last step, the velocities were substituted for the path derivatives, i.e., u_1 = dx/dt and u_2 = dy/dt. Finally, combining the last two equations, we again obtain the motion gradient constraint.

The gradient constraint is often referred to as an intensity conservation constraint. In terms of the 3D scene, this conservation assumption implies that the image intensity associated with the projection of a 3D point does not change as the object or the camera move. This assumption is often violated to some degree in real image sequences.

Combining Constraints: It is impossible to recover velocity given just the gradient constraint at a single position, since it offers only one linear constraint on the two unknown components of velocity. Only the component of velocity in the gradient direction is determined. In effect, as shown in Fig. 5, the gradient constraint equation constrains the velocity to lie somewhere along a line in velocity space. Further assumptions or measurements are required to constrain velocity to a point in velocity space.

Many gradient-based methods solve for velocity by combining information over a spatial region; the different gradient-based methods use different combination rules. A particularly simple rule for combining constraints from two nearby spatial positions is

    [ f_x(x_1, y_1, t)   f_y(x_1, y_1, t) ] [ u_1 ]     [ f_t(x_1, y_1, t) ]
    [ f_x(x_2, y_2, t)   f_y(x_2, y_2, t) ] [ u_2 ] = - [ f_t(x_2, y_2, t) ]

Figure 5: The gradient constraint equation constrains velocity to lie somewhere on a line in 2D velocity space. That is, u_1 f_x + u_2 f_y = -f_t defines a line perpendicular to (f_x, f_y). If one had a set of measurements from nearby points, all with different gradient directions but consistent with the same velocity, then we would expect the constraint lines to intersect at the true velocity (the black dot).

where the two coordinate pairs (x_i, y_i) correspond to the two spatial positions. Each row of this matrix equation is the gradient constraint for one spatial position. Solving the two equations simultaneously gives the velocity that is consistent with both constraints.

A better option is to combine constraints from more than just two spatial positions. When we do this, we will not be able to exactly satisfy all of the gradient constraints simultaneously, because there will be more constraints than unknowns. To measure the extent to which the gradient constraints are not satisfied, we square each constraint and sum:

    E(u_1, u_2) = Σ_{x,y} g(x, y) [u_1 f_x(x, y, t) + u_2 f_y(x, y, t) + f_t(x, y, t)]²

where g(x, y) is a window function that is zero outside the neighborhood within which we wish to use the constraints. Each squared term in the summation is a constraint on the flow from a different nearby position. Usually, we want to weight the terms in the center of the neighborhood more highly, to give them more influence. Accordingly, it is common to let g(x, y) be a decaying window function such as a Gaussian.

Since there are now more constraints than unknowns, there may not be a solution that satisfies all of the constraints simultaneously. In other words, E(u_1, u_2) will typically be nonzero for all (u_1, u_2). The choice of (u_1, u_2) that minimizes E(u_1, u_2) is the least squares estimate of velocity.

Least Squares Estimate: Since E(u_1, u_2) is a quadratic expression, there is a simple analytical expression for the velocity estimate. The solution is derived by taking derivatives of E with respect to u_1 and u_2, and setting them equal to zero (dropping the common factor of 2):

    ∂E(u_1, u_2)/∂u_1:  Σ_{x,y} g(x, y) [u_1 f_x + u_2 f_y + f_t] f_x = 0
    ∂E(u_1, u_2)/∂u_2:  Σ_{x,y} g(x, y) [u_1 f_x + u_2 f_y + f_t] f_y = 0

These equations may be rewritten as a single equation in matrix notation:

    M u = b

where the elements of M and b are

    M = [ Σ g f_x²     Σ g f_x f_y ]          b = - [ Σ g f_x f_t ]
        [ Σ g f_x f_y  Σ g f_y²    ]                [ Σ g f_y f_t ]

The least-squares solution is then given by

    u = M⁻¹ b

presuming that M is invertible.
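The least-squares recipe can be sketched directly from these formulas (the translating test plaid below is made-up; the window placement and the uniform weighting g = 1 are arbitrary simplifications):

```python
import numpy as np

# Sketch: least-squares velocity u = M^{-1} b accumulated over one window.
# The translating plaid, window, and velocity are made-up example values.

def translating_plaid(u1, u2, t, X, Y):
    # Two gratings at different orientations => no aperture problem.
    return (np.sin(0.3 * (X - u1 * t) + 0.1 * (Y - u2 * t)) +
            np.sin(0.1 * (X - u1 * t) - 0.4 * (Y - u2 * t)))

n = 64
X, Y = np.meshgrid(np.arange(n, dtype=float), np.arange(n, dtype=float))
u_true = (0.4, -0.2)
f0 = translating_plaid(*u_true, 0.0, X, Y)
f1 = translating_plaid(*u_true, 1.0, X, Y)

# Central-difference spatial derivatives of the frame average, and a
# two-frame temporal difference (both centered at t = 1/2).
favg = 0.5 * (f0 + f1)
fx = np.gradient(favg, axis=1)
fy = np.gradient(favg, axis=0)
ft = f1 - f0

w = slice(16, 48)  # one interior window, uniform weighting g = 1
M = np.array([[np.sum(fx[w, w] ** 2),       np.sum(fx[w, w] * fy[w, w])],
              [np.sum(fx[w, w] * fy[w, w]), np.sum(fy[w, w] ** 2)]])
b = -np.array([np.sum(fx[w, w] * ft[w, w]), np.sum(fy[w, w] * ft[w, w])])
u_est = np.linalg.solve(M, b)
print(u_est)   # close to the true velocity (0.4, -0.2)
```

The small residual error comes from the finite-difference approximations to the derivatives, which is one motivation for the pre-smoothing discussed below.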

Implementation with Image Operations: Remember that we want to compute optical flow at every spatial location, so we will need the matrix M and the vector b at each location. In particular, we should rewrite our matrix equation as

    M(x, y) u(x, y) = b(x, y)

As above, the elements in M(x, y) and b(x, y) at position (x, y)^T are local weighted summations of the intensity derivatives. What we intend to show here is that the elements of M(x, y) and b(x, y) can be computed in a straightforward way using convolutions and image point operations.

To begin, let g(x, y) be a Gaussian window sampled on a square grid of width 2w + 1. Then, at a specific location (x, y)^T, the elements of the matrix M(x, y) and the vector b(x, y) are given by

    M(x, y) = [ Σ_{a,b} g(a, b) f_x²(x-a, y-b)            Σ_{a,b} g(a, b) f_x(x-a, y-b) f_y(x-a, y-b)
                Σ_{a,b} g(a, b) f_x(x-a, y-b) f_y(x-a, y-b)   Σ_{a,b} g(a, b) f_y²(x-a, y-b) ]

    b(x, y) = - [ Σ_{a,b} g(a, b) f_x(x-a, y-b) f_t(x-a, y-b)
                  Σ_{a,b} g(a, b) f_y(x-a, y-b) f_t(x-a, y-b) ]

where each summation is over the width of the Gaussian, from -w to w. Although this notation is somewhat cumbersome, it serves to show explicitly that each element of the matrix, as a function of spatial position, is actually equal to the convolution of g(x, y) with products of the intensity derivatives. To compute the elements of M(x, y) and b(x, y), one simply applies derivative filters to the image, takes products of their outputs, and then blurs the results with a Gaussian. For example, the upper-left component of M(x, y), i.e., m_11(x, y), is computed by (1) applying an x-derivative filter, (2) squaring the result at each pixel, and (3) blurring with a Gaussian lowpass filter.
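A sketch of this dense implementation (the test image, filter sizes, and integer test shift are all made-up; np.gradient stands in for a proper derivative filter):

```python
import numpy as np

# Sketch: build M(x,y) and b(x,y) everywhere by (1) derivative filters,
# (2) pointwise products, (3) Gaussian blurring -- then solve the 2x2
# system at every pixel at once. All sizes and sigmas are made-up examples.

def gaussian_kernel(w, sigma):
    a = np.arange(-w, w + 1, dtype=float)
    k = np.exp(-a**2 / (2 * sigma**2))
    return k / k.sum()

def blur(img, k):
    """Separable 2D convolution with the 1D kernel k (same-size output)."""
    tmp = np.apply_along_axis(lambda r: np.convolve(r, k, mode='same'), 1, img)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode='same'), 0, tmp)

rng = np.random.default_rng(1)
f0 = blur(rng.random((64, 64)), gaussian_kernel(6, 2.0))  # smooth random frame
f1 = np.roll(f0, 1, axis=1)                               # shifted right 1 pixel

fx = np.gradient(0.5 * (f0 + f1), axis=1)
fy = np.gradient(0.5 * (f0 + f1), axis=0)
ft = f1 - f0

k = gaussian_kernel(5, 2.0)                               # window g(x, y)
m11, m12, m22 = blur(fx * fx, k), blur(fx * fy, k), blur(fy * fy, k)
b1, b2 = -blur(fx * ft, k), -blur(fy * ft, k)

det = m11 * m22 - m12**2                                  # 2x2 inverse, pixelwise
u1 = (m22 * b1 - m12 * b2) / det
u2 = (m11 * b2 - m12 * b1) / det
# In the image interior, u1 should be near 1 and u2 near 0.
```

Where det is near zero, the estimate is unreliable; that is the aperture problem discussed below.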

Gaussian Pre-Smoothing: Another practicality worth mentioning here is that some preprocessing, before applying the gradient constraint, is generally necessary to produce reliable estimates of optical flow. The reason is primarily the use of a first-order Taylor series expansion in deriving the motion constraint equation. By smoothing the signal, it is hoped that we are reducing the relative amplitudes of the higher-order terms in the image. Therefore, it is common to smooth the image sequence with a lowpass filter before estimating the derivatives. Often this pre-smoothing can be incorporated into the derivative filters used to compute the spatial and temporal derivatives of the image intensity. This also helps avoid problems caused by small amounts of aliasing, something we'll say more about later.


Figure 6: Aperture problem. (A) Single moving grating viewed through a circular aperture. The diagonal line indicates the locus of velocities compatible with the motion of the grating. (B) Plaid composed of two moving gratings. The lines give the possible motion of each grating alone. Their intersection is the only shared motion.

Aperture Problem: When the matrix M in the least-squares solution is singular (or ill-conditioned), there are not enough constraints to solve for both unknowns. This situation corresponds to what has been called the aperture problem. For some patterns (e.g., a gradual curve), there is not enough information in a local region (small aperture) to disambiguate the true direction of motion. For other patterns (e.g., an extended grating or edge), the information is insufficient regardless of the aperture size. Of course, the problem is not really due to the aperture but to the one-dimensional structure of the stimulus; to ensure the reliability of the velocity estimates, it is common to check the condition number of M or, better yet, the magnitude of its smallest singular value.

The latter case is illustrated in Fig. 6A. The diagonal line in velocity space indicates the locus of velocities compatible with the motion of the grating. At best, we may extract only one of the two velocity components. Figure 6B shows how the motion is disambiguated when there is more spatial structure. The plaid pattern illustrated in Fig. 6B is composed of two moving gratings. The lines give the possible motion of each grating alone; their intersection is the only shared motion.

Combining the gradient constraints according to the weighted summation above is related to the intersection-of-constraints rule depicted in Fig. 6. The gradient constraint is linear in both u_1 and u_2. Given measurements of the derivatives (f_x, f_y, f_t), there is a line of possible solutions for (u_1, u_2), analogous to the constraint line illustrated in Fig. 6A. For each different position there will generally be a different constraint line. The least-squares solution gives the intersection of these constraint lines, analogous to Fig. 6B.
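A sketch illustrating the singular-M case (the gratings below are made-up examples):

```python
import numpy as np

# Sketch of the aperture problem: for a single grating (one-dimensional
# structure) M is nearly singular, so only the normal component of velocity
# is recoverable; a plaid makes M well-conditioned. Made-up gratings.

n = 64
X, Y = np.meshgrid(np.arange(n, dtype=float), np.arange(n, dtype=float))

f1 = np.sin(0.3 * X + 0.3 * Y)                      # single oriented grating
f1x, f1y = np.gradient(f1, axis=1), np.gradient(f1, axis=0)
M1 = np.array([[np.sum(f1x**2), np.sum(f1x * f1y)],
               [np.sum(f1x * f1y), np.sum(f1y**2)]])

f2 = f1 + np.sin(0.4 * X - 0.2 * Y)                 # plaid: two orientations
f2x, f2y = np.gradient(f2, axis=1), np.gradient(f2, axis=0)
M2 = np.array([[np.sum(f2x**2), np.sum(f2x * f2y)],
               [np.sum(f2x * f2y), np.sum(f2y**2)]])

s1 = np.linalg.svd(M1, compute_uv=False)
s2 = np.linalg.svd(M2, compute_uv=False)
print(s1[1] / s1[0], s2[1] / s2[0])  # grating: near 0; plaid: well away from 0
```

The ratio of the smallest to largest singular value is exactly the reliability check suggested above.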

Regularization Solution: Another way to add further constraints to the gradient constraint equation is to invoke a smoothness assumption. In other words, one can constrain (regularize) the estimation problem by assuming that the resultant flow field should be the smoothest flow field that is also consistent with the gradient constraints. The central assumption here, like that made with the area-based regression approach above, is that the flow in most image regions is expected to be smooth, because smooth surfaces generally give rise to smooth optical flow fields.

For example, let's assume that we want the optical flow to minimize any violations of the

Figure 7: Orientation in space-time (based on an illustration by Adelson and Bergen). (A) A vertical bar translating to the right. (B) The space-time cube of image intensities corresponding to motion of the vertical bar. (C) An x-t slice through the space-time cube. Orientation in the x-t slice is the horizontal component of velocity. Motion is like orientation in space-time, and spatiotemporally oriented filters can be used to detect and measure it.

gradient-constraint equation, and at the same time to minimize the magnitude of velocity changes between neighboring pixels. For example, we may wish to minimize

    E = Σ_{x,y} (f_x u_1 + f_y u_2 + f_t)² + λ Σ_{x,y} [(∂u_1/∂x)² + (∂u_1/∂y)² + (∂u_2/∂x)² + (∂u_2/∂y)²]

where λ controls the relative weight of the smoothness term.

In practice, the derivatives are approximated by numerical derivative operators, and the resulting minimization problem has as its solution a large system of linear equations. Because of the large dimension of the linear system, it is commonly solved iteratively, using Jacobi, Gauss-Seidel, or conjugate-gradient iteration.

One of the main disadvantages of this approach is the extensive amount of smoothing often imposed, even when the actual optical flow field is not smooth. Some researchers have advocated the use of line processes that allow for violations of smoothness at certain locations, where the flow field constraints indicate discontinuous flow (due to occlusion, for example). This issue is involved, and beyond the scope of the handout for now.
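A sketch in the spirit of this regularized solution, using a Jacobi-style update of the kind popularized by Horn and Schunck (λ, the iteration count, and the test pattern are made-up choices):

```python
import numpy as np

# Sketch: "smoothest flow consistent with the gradient constraints" via a
# Jacobi-style iteration. lam, n_iter, and the test plaid are made-up.

def neighbor_avg(u):
    """Average of the four neighbors (edge values replicated at the border)."""
    up = np.pad(u, 1, mode='edge')
    return 0.25 * (up[:-2, 1:-1] + up[2:, 1:-1] + up[1:-1, :-2] + up[1:-1, 2:])

def smooth_flow(fx, fy, ft, lam=0.1, n_iter=200):
    u1 = np.zeros_like(fx)
    u2 = np.zeros_like(fx)
    for _ in range(n_iter):
        a1, a2 = neighbor_avg(u1), neighbor_avg(u2)
        # Move the neighbor average toward the local constraint line.
        t = (fx * a1 + fy * a2 + ft) / (lam + fx**2 + fy**2)
        u1, u2 = a1 - fx * t, a2 - fy * t
    return u1, u2

n = 64
X, Y = np.meshgrid(np.arange(n, dtype=float), np.arange(n, dtype=float))
def plaid(t):  # pattern translating with u = (0.4, -0.2) pixels/frame
    return (np.sin(0.3 * (X - 0.4 * t) + 0.1 * (Y + 0.2 * t)) +
            np.sin(0.1 * (X - 0.4 * t) - 0.4 * (Y + 0.2 * t)))

f0, f1 = plaid(0.0), plaid(1.0)
fx = np.gradient(0.5 * (f0 + f1), axis=1)
fy = np.gradient(0.5 * (f0 + f1), axis=0)
ft = f1 - f0
u1, u2 = smooth_flow(fx, fy, ft)
# Interior flow approaches the true uniform velocity (0.4, -0.2).
```

Because the true flow here is uniform, the smoothness term costs nothing at the solution; with discontinuous flow the same iteration would oversmooth across the boundary, which is the disadvantage noted above.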

Space-Time Filters and the Frequency Domain

One can also view image motion, and these approaches to estimating optical flow, in the frequency domain. This will help in understanding aliasing, and it also serves as the foundation for filter-based methods for optical flow estimation. In addition, a number of models of biological motion sensing have been based on direction-selective spatiotemporal linear operators (e.g., Watson and Ahumada; Adelson and Bergen; Heeger; Grzywacz and Yuille). The concept underlying these models is that visual motion is like orientation in space-time, and that spatiotemporally oriented linear operators can be used to detect and measure it.

Figure 7 shows a simple example. Figure 7A depicts a vertical bar moving to the right. Imagine that we stack the consecutive frames of this image sequence one after the next. We end up with a three-dimensional volume (a space-time cube) of intensity data, like that shown in Fig. 7B. Figure 7C shows an x-t slice through this space-time cube. The slope of the edges in the x-t slice equals the horizontal component of the bar's velocity (change in position over time). Different speeds correspond to different slopes.


Figure 8: Space-time filters and the gradient constraint. (A) The gradient constraint states that there is some space-time orientation for which the directional derivative is zero. (B) Pattern moving from right to left, and the space-time derivative filter with the orthogonal space-time orientation, so that convolving the two gives a zero response. (C) Spatiotemporal Fourier transform of B: the spectrum of the translating pattern (the line passing through the origin) and the frequency response of the derivative filter (the pair of symmetrically positioned blobs); the product of the two is zero.

Following Adelson and Bergen, and Simoncelli, we now show that the gradient-based solution can be expressed in terms of the outputs of a set of space-time oriented linear filters. To this end, note that the derivative operations in the gradient constraint may be written as convolutions. Furthermore, we can prefilter the stimuli to extract some spatiotemporal subband, and perform the analysis on that subband. Consider, for example, prefiltering with a space-time Gaussian function. Abusing the notation somewhat, we define

    f_x(x, y, t) ≡ (∂/∂x) [g(x, y, t) * f(x, y, t)] = g_x(x, y, t) * f(x, y, t)

where * is convolution and g_x is the x-derivative of a Gaussian. In words, we compute f_x by convolving f with g_x, a spatiotemporal linear filter. We compute f_y and f_t similarly. Then the gradient constraint can be rewritten as

    (u_1 g_x + u_2 g_y + g_t) * f = 0

where g_x, g_y, and g_t are the space-time derivative filters. For a fixed u, this is the convolution of a space-time filter, namely (u_1 g_x + u_2 g_y + g_t), with the image sequence f(x, y, t). This filter is a weighted sum of the three basis derivative filters g_x, g_y, and g_t, and hence is itself a directional derivative in some space-time direction. The gradient constraint states that there is some space-time orientation for which the directional derivative is zero. This filter-based interpretation of the gradient constraint is depicted in one spatial dimension in Fig. 8.

Frequency Domain Analysis of the Gradient Constraint: The gradient constraint for image velocity estimation also has a simple interpretation as regression in the spatiotemporal frequency domain.

The power spectrum of a translating two-dimensional pattern lies on a plane in the Fourier domain (Watson and Ahumada; Fleet), and the tilt of this plane depends on velocity. An intuition for this result can be obtained by first considering each spatiotemporal frequency component separately. The spatial frequency of a moving sine grating is expressed in cycles per pixel, and its temporal frequency is expressed in cycles per frame. Velocity, which is distance over time (or pixels per frame), equals the temporal frequency divided by the spatial frequency. More formally, a drifting 1D sinusoidal grating can be expressed in terms of spatial frequency and velocity as

    f(x, t) = sin(ω_x (x - u_1 t))

where ω_x is the spatial frequency of the grating and u_1 is the velocity. It can also be expressed in terms of spatial frequency and temporal frequency as

    f(x, t) = sin(ω_x x + ω_t t)

The relation between frequency and velocity can be derived algebraically by setting ω_x (x - u_1 t) = ω_x x + ω_t t. From this, one can show that

    u_1 = -ω_t / ω_x

Now consider a one-dimensional spatial pattern that has many spatial-frequency components (e.g., made up of parallel bars, edges, or random stripes), translating with speed u_1. Each frequency component has a spatial frequency ω_x and a temporal frequency ω_t, but all of the Fourier components share the same velocity, so each component satisfies

    u_1 ω_x + ω_t = 0

Analogously, two-dimensional patterns (arbitrary textures) translating in the image plane occupy a plane in the spatiotemporal frequency domain:

    u_1 ω_x + u_2 ω_y + ω_t = 0

For example, the expected value of the power spectrum of a translating white noise texture is a constant within this plane, and zero outside of it. The plane is perpendicular to the vector (u_1, u_2, 1)^T.

Formally, consider a space-time image sequence f(x, y, t) that we construct by translating a 2D image f_0(x, y) with a constant velocity u. The image intensity over time may be written as

    f(x, y, t) = f_0(x - u_1 t, y - u_2 t)

At time t = 0, the spatiotemporal image is equal to f_0(x, y). The three-dimensional Fourier transform of f(x, y, t) is

    F(ω_x, ω_y, ω_t) = ∫∫∫ f(x, y, t) e^{-j(ω_x x + ω_y y + ω_t t)} dx dy dt

To simplify this expression, we substitute f_0(x - u_1 t, y - u_2 t) for f(x, y, t), and arrange the order of integration so that time is integrated last:

    F(ω_x, ω_y, ω_t) = ∫ [ ∫∫ f_0(x - u_1 t, y - u_2 t) e^{-j(ω_x x + ω_y y)} dx dy ] e^{-j ω_t t} dt

The term inside the square brackets is the 2D Fourier transform of f_0(x - u_1 t, y - u_2 t), which can be simplified using the Fourier shift property to yield

    F(ω_x, ω_y, ω_t) = ∫ F_0(ω_x, ω_y) e^{-j(ω_x u_1 t + ω_y u_2 t)} e^{-j ω_t t} dt
                     = F_0(ω_x, ω_y) ∫ e^{-j t (ω_x u_1 + ω_y u_2 + ω_t)} dt

where F_0(ω_x, ω_y) is the 2D Fourier transform of f_0(x, y). The Fourier transform of a complex exponential is a (Dirac) delta function, and therefore this expression simplifies further to

    F(ω_x, ω_y, ω_t) = F_0(ω_x, ω_y) δ(u_1 ω_x + u_2 ω_y + ω_t)

This equation shows that the Fourier transform is only nonzero on a plane, because the delta function is only nonzero when u_1 ω_x + u_2 ω_y + ω_t = 0.

There are many important reasons to understand motion in the frequency domain. For example, if the motion of an image region may be approximated by translation in the image plane, then the velocity of the region may be computed by finding the plane in the Fourier domain in which all the power resides. Figure 8 illustrates how a derivative filter can be used to determine which plane contains the power. The frequency response of a spatiotemporal derivative filter is a symmetrically positioned pair of blobs. The gradient constraint states that the blobs can be positioned (by rotating them around the origin) so that multiplying the Fourier transform of the image sequence (the plane) by the frequency response of the filter (the blobs) gives zero.

Formally, we rely on Parseval's theorem and the derivative theorem of the Fourier transform to write the gradient method in the frequency domain. Parseval's theorem states that the sum of the squared values in space-time equals the sum of the power in the frequency domain:

    Σ_{x,y,t} |f(x, y, t)|² = Σ_{ω_x,ω_y,ω_t} |F(ω_x, ω_y, ω_t)|².

The derivative theorem states that the Fourier transform of the derivative of f equals the Fourier transform of f itself, multiplied by an imaginary ramp function of the form jω. For example,

    F{f_x(x, y, t)} = jω_x F{f(x, y, t)} = jω_x F(ω_x, ω_y, ω_t).
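Both theorems are easy to check numerically. The sketch below (illustrative, not from the handout) verifies the discrete Parseval relation, which carries a 1/M factor, and constructs an exact spectral derivative; an odd grid size sidesteps Nyquist-bin bookkeeping:

```python
import numpy as np

rng = np.random.default_rng(1)
f = rng.standard_normal((9, 9, 9))           # a toy f(x, y, t); odd size, so
F = np.fft.fftn(f)                           # there is no Nyquist bin to handle
M = f.size

# Parseval's theorem (the discrete version carries a 1/M factor):
space_power = np.sum(np.abs(f) ** 2)
freq_power = np.sum(np.abs(F) ** 2) / M

# Derivative theorem: differentiating along x multiplies the transform by j*wx.
wx = 2 * np.pi * np.fft.fftfreq(9)
WX = wx[:, None, None]                       # x is axis 0
fx = np.real(np.fft.ifftn(1j * WX * F))      # exact spectral derivative of f
print(space_power - freq_power)              # ~0
```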

Combining these two theorems with the gradient constraint gives

    E(u_1, u_2) = Σ_{x,y,t} |u_1 f_x(x, y, t) + u_2 f_y(x, y, t) + f_t(x, y, t)|²
                = Σ_{ω_x,ω_y,ω_t} |u_1 jω_x F + u_2 jω_y F + jω_t F|²
                = Σ_{ω_x,ω_y,ω_t} (u_1 ω_x + u_2 ω_y + ω_t)² |F(ω_x, ω_y, ω_t)|²,

where F is shorthand for F(ω_x, ω_y, ω_t).

If all of the assumptions and approximations hold precisely (notably, that the image is uniformly translating for all space and time), then this expression can be minimized exactly by choosing u to satisfy u_1 ω_x + u_2 ω_y + ω_t = 0. If any of the approximations are violated, then there will not, in general, be any u that satisfies the constraint perfectly. Minimizing this error function is a weighted least-squares regression.

To be concrete, let ω_n = (ω_xn, ω_yn, ω_tn), for n = 1, ..., N, be the spatiotemporal frequencies for which we have samples of |F(ω)|². Rewriting the error function in matrix notation gives

    E(u_1, u_2) = ||A u||²,

where u = (u_1, u_2, 1)^T and

    A = [ |F(ω_1)| ω_x1    |F(ω_1)| ω_y1    |F(ω_1)| ω_t1 ]
        [ |F(ω_2)| ω_x2    |F(ω_2)| ω_y2    |F(ω_2)| ω_t2 ]
        [      ...              ...              ...       ]
        [ |F(ω_N)| ω_xN    |F(ω_N)| ω_yN    |F(ω_N)| ω_tN ].

Because of the |F(ω)| factors, the regression depends most on those frequency components where the power is concentrated. The error function is minimized by a vector v in the direction of the eigenvector corresponding to the smallest eigenvalue of A^T A. The image velocity estimate u is obtained by normalizing v so that its third element is 1.

For real image sequences, the least-squares estimate of image velocity computed in the space-time domain is definitely preferred over the frequency-domain regression estimate, because the frequency-domain estimate requires computing a global Fourier transform. But for an image sequence containing a uniformly translating pattern, the two estimates are the same.
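The eigenvector computation can be sketched directly (an illustrative example, not from the handout): for a sequence translating at one pixel per frame in x, the weighted planar regression reduces to finding the smallest-eigenvalue eigenvector of the 3x3 matrix Σ |F|² ω ω^T. An odd grid size avoids Nyquist bookkeeping, so the recovery is essentially exact:

```python
import numpy as np

# Uniformly translating sequence: one pixel/frame in x, zero in y.
rng = np.random.default_rng(2)
N = 15                                   # odd size avoids the Nyquist bin
f0 = rng.standard_normal((N, N))
frames = np.array([np.roll(f0, t, axis=0) for t in range(N)])   # (t, x, y)

F = np.fft.fftn(frames)
w = 2 * np.pi * np.fft.fftfreq(N)
WT, WX, WY = np.meshgrid(w, w, w, indexing="ij")

# E(u) = sum |F|^2 (u1*wx + u2*wy + wt)^2 = u^T M u, with u = (u1, u2, 1)^T,
# so M = sum |F|^2 * omega omega^T over all omega = (wx, wy, wt).
power = np.abs(F) ** 2
omega = np.stack([WX, WY, WT], axis=-1).reshape(-1, 3)
M = (omega * power.reshape(-1, 1)).T @ omega

# Eigenvector of the smallest eigenvalue, normalized so its third element is 1.
evals, evecs = np.linalg.eigh(M)         # eigenvalues in ascending order
v = evecs[:, 0]
u_est = v[:2] / v[2]
print(u_est)   # approximately [1, 0]
```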

Multi-Scale Iterative Motion Estimation

Temporal Aliasing. The image sequences we are given as input often have temporal sampling rates that are slower than required by the sampling theorem to uniquely represent the high temporal frequencies in the signal. As a consequence, temporal aliasing is a common problem in computing image velocity in real image sequences. The classic example of temporal aliasing is the wagon-wheel effect in old Western movies: when the wheel rotates at the right speed relative to the frame rate, its direction of rotation appears to reverse. This effect can most easily be understood by considering the closest match for a given spoke as the wheel rotates from one frame to the next. Let the angular spacing between spokes be θ. If the wheel rotates by more than θ/2 but less than θ from one frame to the next, then the wheel will appear to rotate backwards. In fact, any integer multiple of this range will also do, so that rotations between kθ + θ/2 and (k+1)θ will appear to rotate backwards, for any integer k.
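The nearest-spoke matching rule above amounts to wrapping the true per-frame rotation into the interval (-θ/2, θ/2]. A toy sketch (the function name and the matching-rule assumption are mine, not the handout's):

```python
import numpy as np

def apparent_rotation(delta, theta):
    """Perceived per-frame rotation of a wheel with spoke spacing theta,
    assuming the visual system always matches each spoke to its nearest
    neighbor: wrap the true rotation delta into (-theta/2, theta/2]."""
    return delta - theta * np.round(delta / theta)

theta = 30.0                            # spoke spacing, in degrees
print(apparent_rotation(10.0, theta))   # forward rotation, seen correctly
print(apparent_rotation(20.0, theta))   # in (theta/2, theta): appears reversed
print(apparent_rotation(50.0, theta))   # k=1 case, in (1.5*theta, 2*theta)
```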

Temporal aliasing can be a problem for any type of motion algorithm. For feature-based matching algorithms, temporal aliasing gives rise to false matches. For filtering and gradient-based methods, temporal aliasing becomes a problem whenever the motion is large compared to the spatial extent of the filters. With phase-based methods the problems are also evident, because phase is only defined uniquely within a single wavelength of a bandpass signal. If the signal shifts by more than one wavelength, then the nearest location in the next image with the same phase will not yield the correct match.

The effect of temporal aliasing may also be characterized in the spatiotemporal frequency domain, as illustrated in Fig. Consider a one-dimensional signal moving at a constant velocity. As mentioned above, the power spectrum of this signal lies on a line through the origin. Temporal sampling causes a replication of the signal spectrum at temporal frequency intervals of 2π/T radians, where T is the time between frames. It is easy to see that these replicas could confuse a gradient-based motion estimation algorithm, because the derivative filters may respond as much or more to the replicated spectra as they do to the original.

[Figure: four panels in the (ω_x, ω_t) plane, showing temporal sampling with period T (replicas spaced 2π/T apart in ω_t) and the effect of warping.]

Figure: (Top-left) The power spectrum of a translating signal occupies a line in the spatiotemporal frequency domain. (Top-right) Sampling in time to get the discrete frames of an image sequence gives rise to replicas of the spectrum that may result in temporal aliasing. These replicas can cause severe problems for a gradient-based motion algorithm, because the derivative filters may respond as much or more to the replicated spectra as they do to the original. (Bottom-left) The problem can be alleviated by using coarse-scale derivative filters tuned for lower frequencies, i.e., by blurring the images a lot before computing derivatives. The frequency responses of these coarse-scale filters avoid the replicas. (Bottom-right) The velocity estimates from the coarse scale are used to warp the images, thereby undoing most of the motion. The finer-scale filters can now be used to estimate the residual motion; since the velocities after warping are slower, the frequency responses of the fine-scale filters may now avoid the replicas.

Eliminating Temporal Aliasing by Coarse-to-Fine Warping. An important observation concerning this type of aliasing is that it usually affects only the higher spatial frequencies of an image. In particular, for a fixed velocity, those spatial frequencies moving more than half of their period per frame will be aliased, but lower spatial frequencies will not.

This suggests a simple but effective approach for avoiding the problem of temporal aliasing: estimate the velocity using coarse-scale filters (i.e., spatially blur the individual frames of the sequence before applying the derivative filters) to avoid the potentially aliased high spatial frequencies. These velocity estimates will correspond to the average velocity over a large region of the image, because the coarse-scale filters are large in spatial extent. Given a uniformly translating image sequence, we could stop here. But in typical image sequences the velocity varies from one point to the next, and the coarse-scale filters miss much of this spatial variation in the flow field. To get better estimates of local velocity, we need finer-scale (higher-frequency) filters with smaller spatial extents.

Toward this end, following Bergen et al., for example, the coarse motion estimates are used to warp the images and undo the motion, roughly stabilizing the objects and features in the images. Then finer-scale filters are used to estimate the residual motion; since the velocities after warping are slower, the frequency responses of the fine-scale filters may now avoid the temporally aliased replicas, as illustrated in Fig. Typically this is all done with an image pyramid data structure.
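A minimal coarse-to-fine sketch for a single global translation (assumptions are mine: a [1 2 1]/4 blur-and-subsample pyramid, periodic boundaries, a single gradient-based solve per level, and illustrative function names throughout):

```python
import numpy as np

def blur_down(im):
    """One pyramid level: separable [1 2 1]/4 blur, then 2x subsampling."""
    k = np.array([0.25, 0.5, 0.25])
    im = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 0, im)
    im = np.apply_along_axis(lambda r: np.convolve(r, k, mode="same"), 1, im)
    return im[::2, ::2]

def translation_lk(f1, f2):
    """Single global translation from the gradient constraint (least squares)."""
    fx, fy = np.gradient(0.5 * (f1 + f2))   # average the two frames' gradients
    ft = f2 - f1
    A = np.stack([fx.ravel(), fy.ravel()], axis=1)
    return np.linalg.lstsq(A, -ft.ravel(), rcond=None)[0]

def warp(im, u):
    """Evaluate im(x + u) with bilinear interpolation, periodic boundaries."""
    n, m = im.shape
    x, y = np.meshgrid(np.arange(n), np.arange(m), indexing="ij")
    xs, ys = x + u[0], y + u[1]
    x0, y0 = np.floor(xs).astype(int), np.floor(ys).astype(int)
    ax, ay = xs - x0, ys - y0
    at = lambda i, j: im[i % n, j % m]
    return ((1 - ax) * (1 - ay) * at(x0, y0) + ax * (1 - ay) * at(x0 + 1, y0)
            + (1 - ax) * ay * at(x0, y0 + 1) + ax * ay * at(x0 + 1, y0 + 1))

def coarse_to_fine(f1, f2, levels=3):
    """Estimate a global translation, refining the estimate coarse to fine."""
    pyr = [(f1, f2)]
    for _ in range(levels - 1):
        pyr.append((blur_down(pyr[-1][0]), blur_down(pyr[-1][1])))
    u = np.zeros(2)
    for g1, g2 in reversed(pyr):            # coarsest level first
        u *= 2                              # velocities double at the finer scale
        u += translation_lk(warp(g1, -u), g2)   # estimate the residual motion
    return u

# A smooth periodic test pattern translating by (3, 2) pixels per frame.
x, y = np.meshgrid(np.arange(64), np.arange(64), indexing="ij")
f1 = np.sin(2 * np.pi * x / 64) * np.cos(2 * np.pi * y / 64) \
     + 0.5 * np.cos(2 * np.pi * (x + 2 * y) / 64)
f2 = np.roll(f1, (3, 2), axis=(0, 1))
u = coarse_to_fine(f1, f2)
print(u)   # approximately [3, 2]
```

The displacement (3, 2) is far too large for a single gradient-based solve at the finest scale, but at the coarsest level it shrinks to (0.75, 0.5), well within the range where the linearization holds.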

Coarse-to-fine methods are widely used to measure motion in video image sequences because of the aliasing that regularly occurs. Despite this, the approach does have its drawbacks, stemming from the fact that the fine-scale estimates can only be as reliable as their coarse-scale precursors. For example, if there is little or no signal at coarse scales, then the coarse-scale estimates cannot be trusted and we have no initial guess to offer the finer scales. A related problem occurs if the coarse-scale estimate is very poor (perhaps due to the broad spatial extent of the estimation): then the warping will not significantly decrease the effective velocity as desired, in which case the finer-scale estimation will produce an equally poor result.

Iterative Refinement of Velocity Estimates. Such coarse-to-fine estimation strategies are an example of iteratively refining and updating velocity measurements. The process involves warping the images according to the current velocity estimate and then re-estimating the velocity from the warped sequence. The warping is an attempt to remove the motion and therefore stabilize the image sequence. This is a useful idea within a coarse-to-fine strategy to avoid the adverse effects of temporal aliasing, as mentioned above.

It can also be used to refine the flow estimates without necessarily changing scales. Remember that derivative estimates are often noisy, and the assumptions upon which the velocity estimates were based do not hold exactly. In particular, let's return to the derivation of the gradient constraint in the 1D case shown in Fig., where we wanted to find the displacement d between f_1(x) and f_2(x), with f_2(x) = f_1(x - d). We linearized the signals to derive an estimate

    d̂ = (f_1(x) - f_2(x)) / f_1'(x).

As illustrated in Fig., when f is a linear signal, then d̂ = d. Otherwise, to leading order, the accuracy of the estimate is bounded by the magnitude of the displacement and the second derivative of f_1:

    |d̂ - d| ≤ (d²/2) |f_1''(x)| / |f_1'(x)|.

For a sufficiently small displacement and bounded f_1''/f_1', we expect reasonably accurate estimates. In any case, so long as the estimation error is smaller than |d|, then with Newton's method we can iterate the estimation to achieve our goal of having the estimate d̂ satisfy f_1(x - d̂) = f_2(x). Toward this end, assume we're given an estimate d̂_0, and let the error be denoted u = d - d̂_0. Then, because f_2(x) is a displaced version of f_1(x), we know that f_2(x) = f_1(x - d̂_0 - u). Accordingly, using gradient-based estimation, we form a new estimate of d given by d̂_1 = d̂_0 + û, where

    û = (f_1(x - d̂_0) - f_2(x)) / f_1'(x - d̂_0).

Here, f_1(x - d̂_0) is simply a warped version of f_1(x). This process can be iterated to find a succession of estimates d̂_k that converge to d, provided the initial estimate is in the correct direction.
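A concrete illustration of the iteration (a sketch; tanh is my choice of an analytic, monotonic test signal so that the iteration converges from d̂_0 = 0):

```python
import numpy as np

# Iterative refinement of a 1D displacement estimate.
d_true = 0.8
f1 = np.tanh                                  # f1(x): smooth and monotonic
f1p = lambda x: 1.0 / np.cosh(x) ** 2         # f1'(x)
f2 = lambda x: f1(x - d_true)                 # f2 is f1 displaced by d_true

x = 0.5                    # estimate the displacement at this location
d_hat = 0.0
for _ in range(6):
    # d_hat_{k+1} = d_hat_k + (f1(x - d_hat_k) - f2(x)) / f1'(x - d_hat_k)
    d_hat += (f1(x - d_hat) - f2(x)) / f1p(x - d_hat)
print(d_hat)               # converges to ~0.8
```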

In the general 2D problem, we are given an estimate of the optical flow field, u⁰ = (u_1⁰, u_2⁰)^T (e.g., it might have been estimated at the coarse scale), and we create a warped image sequence h(x, y, t):

    h(x, y, t) = W{f(x, y, t), Δt, u⁰(x, y)} = f(x + u_1⁰ Δt, y + u_2⁰ Δt, t + Δt),

where Δt is the time between two consecutive frames and W{f, u⁰} denotes the warping operation that warps image f by the vector field u⁰. Note that warping need only be done over the range of Δt required for the temporal differentiation filter.

The warped image sequence h(x, y, t) is then used to estimate the residual flow field:

    û = M⁻¹ b,

where M and b are computed, by taking derivatives (via convolution) as above, from h(x, y, t). Given the residual flow estimate û, the refined velocity estimate becomes

    u¹(x, y) = u⁰(x, y) + û(x, y).

This new flow estimate is then used to re-warp the original sequence, and the residual flow may be estimated again. In the coarse-to-fine process, this new estimate of the flow field is used to warp the images at the next finer scale. This process continues until the finest scale is reached and the residual flow is sufficiently small.

Higher-Order Gradient Constraints and Motion Models

There are two relatively straightforward ways in which the gradient-based approach introduced above can be extended. The first is to introduce other forms of image preprocessing before applying the gradient constraint. If such processing is a derivative filter, then this amounts to deriving higher-order differential constraints on the optical flow.

The second way we can generalize the method above is by introducing richer parametric models of the optical flow within the neighborhood over which constraints are combined. In the method introduced above, we essentially compute a single estimate of the optical flow in the neighborhood. This amounts to an assumption that the flow is well modelled with a constant-flow model within the region. It is a relatively straightforward task to generalize this approach to include higher-order models, such as an affine model that allows us to estimate rotation and scaling along with translation in the neighborhood.

Using Higher-Order Derivatives

In the gradient-based approach discussed above, we tracked points of constant intensity (or of a smoothed version of the image intensity) using only first-order derivatives. One can, however, use various forms of prefiltering before applying the gradient constraint. For example, the original image sequence might be filtered to enhance image structure that is of particular interest, or to suppress structure that one wishes to ignore. One may also do this to obtain further constraints on the optical flow at a single pixel.

For example, one could prefilter the image with first-derivative filters g_x and g_y. Following Nagel et al. and Verri et al., the gradient constraint could then be applied to f_x and f_y instead of f, as was done above. This yields two constraints on the optical flow at each location:

    u_1 f_xx + u_2 f_xy + f_xt = 0
    u_1 f_xy + u_2 f_yy + f_yt = 0.
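These two prefiltered constraints can be checked numerically. The sketch below (illustrative choices throughout: a band-limited random pattern shifted exactly via the Fourier shift theorem, exact spectral spatial derivatives, and a central difference in time) stacks both constraints at every pixel and solves the least-squares system:

```python
import numpy as np

# A smooth pattern translating at (u1, u2) = (0.3, -0.2) pixels/frame.
N, (u1, u2) = 64, (0.3, -0.2)
rng = np.random.default_rng(3)
F0 = np.fft.fft2(rng.standard_normal((N, N)))
w = 2 * np.pi * np.fft.fftfreq(N)
WX, WY = np.meshgrid(w, w, indexing="ij")
F0[(np.abs(WX) > 0.5) | (np.abs(WY) > 0.5)] = 0.0   # low-pass: keep it smooth

def frame(t):
    return np.real(np.fft.ifft2(F0 * np.exp(-1j * (WX * u1 + WY * u2) * t)))

def deriv(im, nx, ny):
    """Spectral derivative: nx-th order in x, ny-th order in y."""
    return np.real(np.fft.ifft2(np.fft.fft2(im) * (1j * WX) ** nx * (1j * WY) ** ny))

f_prev, f0, f_next = frame(-1), frame(0), frame(1)
fxx, fxy, fyy = deriv(f0, 2, 0), deriv(f0, 1, 1), deriv(f0, 0, 2)
fxt = (deriv(f_next, 1, 0) - deriv(f_prev, 1, 0)) / 2   # central difference in t
fyt = (deriv(f_next, 0, 1) - deriv(f_prev, 0, 1)) / 2

# Stack u1*fxx + u2*fxy + fxt = 0 and u1*fxy + u2*fyy + fyt = 0 at every pixel.
A = np.stack([np.concatenate([fxx.ravel(), fxy.ravel()]),
              np.concatenate([fxy.ravel(), fyy.ravel()])], axis=1)
b = -np.concatenate([fxt.ravel(), fyt.ravel()])
u_est = np.linalg.lstsq(A, b, rcond=None)[0]
print(u_est)   # approximately [0.3, -0.2]
```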

Alternatively, you might use second-order Gaussian derivatives g_xx, g_xy, and g_yy as prefilters. In this case, you obtain three gradient constraints at each pixel:

    u_1 f_xxx + u_2 f_xxy + f_xxt = 0
    u_1 f_xxy + u_2 f_xyy + f_xyt = 0
    u_1 f_xyy + u_2 f_yyy + f_yyt = 0,

where f_xxx should be interpreted as f * g_xxx, and likewise for the other derivatives. This system, written in terms of third derivatives, gives three constraints on velocity.

An advantage of using higher-order derivatives is that, in principle, there are more constraints at a single spatial position. Even so, it is typically worthwhile combining constraints over a local spatial region and computing a least-squares estimate of velocity by minimizing

    E(u_1, u_2) = Σ_{x,y} (u_1 f_xxx + u_2 f_xxy + f_xxt)²
                + Σ_{x,y} (u_1 f_xxy + u_2 f_xyy + f_xyt)²
                + Σ_{x,y} (u_1 f_xyy + u_2 f_yyy + f_yyt)².

The solution can be written in the same form as before. The only differences are that the third-derivative formulation uses a greater number of linear filters, and that the third-derivative filters are more narrowly tuned for spatiotemporal orientation.

The intensity gradient constraint is based on the intensity conservation assumption: it assumes that intensity is conserved, only shifting from one location to another over time. The third-derivative constraints are based on conservation of the second spatial derivatives of intensity, i.e., that f_xx, f_xy, and f_yy are conserved over time.

It is worth noting that one can also view these higher-order constraints in terms of the conservation of some image property. For example, the second-order constraints arise from an assumption that the spatial gradient ∇f(x, y, t) is conserved. In effect, this assumption is inconsistent with dilating and rotating deformations. By comparison, the assumption of intensity conservation implies only that the image must smoothly deform from one frame to the next; the form of the flow field is otherwise unrestricted.

Higher-Order Motion Models

The multiscale gradient-based methods described above compute velocity estimates separately for each pixel: they compute the best estimate of translational optical flow in each local neighborhood of the image. Alternatively, one may wish to estimate more complex parametric models over much larger image neighborhoods. For example, camera motion with respect to a planar surface results in a quadratic flow field. If the entire image happens to correspond to a planar (or nearly planar) surface, then you can take advantage of this information. Even if the image does not correspond to a planar surface, such a model of the flow field may be a reasonable approximation in local neighborhoods. Here we consider an affine flow model, but quadratic flows can be handled similarly.

Affine Models. Many researchers have found that affine models generally produce much better velocity estimates than constant models (e.g., Fleet and Jepson; Bergen et al.). An affine motion model is expressed component-wise as

    u_1(x, y) = p_1 + p_2 x + p_3 y
    u_2(x, y) = p_4 + p_5 x + p_6 y,

where p = (p_1, p_2, p_3, p_4, p_5, p_6)^T are the parameters to be estimated. This can be written in matrix notation as

    u(x, y) = A(x, y) p,

where

    A(x, y) = [ 1  x  y  0  0  0 ]
              [ 0  0  0  1  x  y ].

In this instance, the affine model describes the flow field about an origin at location (0, 0); if one wished to consider an affine model about the point (x_0, y_0)^T, then one might use

    A(x, y) = [ 1  x-x_0  y-y_0  0    0      0    ]
              [ 0    0      0    1  x-x_0  y-y_0 ].

The gradient constraint may be written

    f_s^T u + f_t = 0,

where f_s = (f_x, f_y)^T are the spatial derivatives. Combining this with the affine model gives

    f_s^T A p + f_t = 0,

so we want to minimize

    E(p) = Σ_{x,y} (f_s^T A p + f_t)²,

where the summation may be taken over the entire image, or over some reasonably large image region. This error function is quadratic in p, so the estimate p̂ is derived in the usual linear least-squares way, by taking derivatives of E with respect to p. This yields the solution

    p̂ = R⁻¹ s,

where

    R = Σ A^T f_s f_s^T A
    s = -Σ A^T f_s f_t,

and where Σ here means that each element of R and s is obtained by summing element-by-element over the entire image region. When R is singular or ill-conditioned, there is not sufficient image structure to estimate all six of the flow-field parameters; this has sometimes been called the generalized aperture problem (but note, as before, that the problem is not really due to any aperture).
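The linear algebra of this estimator can be sketched with idealized measurements: f_t is synthesized to satisfy the gradient constraint exactly for a known affine field, so p̂ = R⁻¹s should recover the parameters (a demonstration of the normal equations only, not of derivative estimation):

```python
import numpy as np

N = 32
x, y = np.meshgrid(np.arange(N) - N / 2, np.arange(N) - N / 2, indexing="ij")
rng = np.random.default_rng(4)
img = rng.standard_normal((N, N))
fx, fy = np.gradient(img)                           # spatial derivatives f_s

p_true = np.array([0.5, 0.01, -0.02, -0.3, 0.03, 0.015])
u1 = p_true[0] + p_true[1] * x + p_true[2] * y      # affine flow field
u2 = p_true[3] + p_true[4] * x + p_true[5] * y
ft = -(u1 * fx + u2 * fy)                           # gradient constraint, exact

# R = sum A^T fs fs^T A,  s = -sum A^T fs ft,  with A(x, y) the 2x6 matrix.
R = np.zeros((6, 6))
s = np.zeros(6)
for i in range(N):
    for j in range(N):
        A = np.array([[1, x[i, j], y[i, j], 0, 0, 0],
                      [0, 0, 0, 1, x[i, j], y[i, j]]], dtype=float)
        fs = np.array([fx[i, j], fy[i, j]])
        R += A.T @ np.outer(fs, fs) @ A
        s += -A.T @ fs * ft[i, j]
p_hat = np.linalg.solve(R, s)
print(p_hat)   # recovers p_true
```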

Again, multiscale refinement of the parameter estimates is needed to overcome temporal aliasing when the motion is large. We use the following notation:

    u = A p,    u⁰ = A p⁰,    û = A p̂,

so that

    u¹ = A (p⁰ + p̂).

As before, we let h denote the warped image sequence, and we use its derivatives in the gradient constraint:

    h_s^T û + h_t = 0.

Substituting û = A (p - p⁰) gives

    h_s^T A p + (h_t - h_s^T A p⁰) = 0,

so we want to minimize

    E(p) = Σ_{x,y} (h_s^T A p + h_t - h_s^T A p⁰)².

The refined least-squares estimate of the flow-field model parameters is therefore

    p¹ = R⁻¹ (s + R p⁰)
       = p⁰ + R⁻¹ s,

where R and s are as above, except with h instead of f:

    R = Σ A^T h_s h_s^T A
    s = -Σ A^T h_s h_t.

In other words, the refinement of the model parameters has the same form as the refinement of the flow-field estimates above. The residual model parameters p̂ are given by

    p̂ = R⁻¹ s.

Plane Perspective Models. Another common model is the 8-parameter plane-perspective model, which models the deformation of the image of a plane that occurs with any rigid transformation of the camera under perspective projection.

Phase-Based Methods

The gradient constraint is useful for tracking any differentiable image property. We can apply it to properties other than the image intensities or linearly filtered versions of the image; one could apply various forms of nonlinear preprocessing instead. There are two approximations underlying the gradient constraint (see above): (1) the image intensity conservation assumption, and (2) the numerical approximation of first-order partial derivatives in space and time. With raw image intensities, the intensity conservation assumption is often violated, and derivatives can be difficult to estimate reliably. However, with appropriate preprocessing of the image sequence, one may satisfy the conservation approximation more closely and also improve the accuracy of numerical differentiation. For example, local automatic gain control to compensate for changes in illumination can make a significant difference.

One successful example of nonlinear preprocessing, proposed by Fleet and Jepson, is to use the phase output of quadrature-pair linear filters. Phase-based methods are a combination of frequency-based and gradient-based approaches: they use bandpass prefilters along with gradient constraints and a local affine model to estimate optical flow.

Phase Correlation. To explain phase-based approaches, it is useful to begin with a method called phase correlation (Kuglin and Hines). The method is often derived by assuming pure translation between two signals (here in the 1D case, for simplicity):

    f_2(x) = f_1(x - d).

The Fourier shift theorem tells us that their Fourier transforms satisfy

    F_2(ω) = F_1(ω) e^{-iωd}.

Their amplitude spectra are identical, A_1(ω) = A_2(ω), and their phase spectra differ by ωd, i.e.,

    φ_2(ω) = φ_1(ω) - ωd.

Taking the product of F_2(ω) and the complex conjugate of F_1(ω), and then dividing by the product of their amplitude spectra, we obtain

    F_2(ω) F_1*(ω) / (A_1(ω) A_2(ω)) = A_2(ω) A_1(ω) e^{i(φ_2(ω) - φ_1(ω))} / (A_1(ω) A_2(ω))
                                     = e^{-iωd}.

Here, e^{-iωd} is a sinusoidal function in the frequency domain, and therefore its inverse Fourier transform is an impulse located at the disparity d, that is, δ(x - d). Thus, in short, phase-correlation methods measure disparity by finding peaks in

    F⁻¹{ F_2(ω) F_1*(ω) / (A_1(ω) A_2(ω)) }.
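A minimal numerical sketch of phase correlation for a cyclic 1D shift (illustrative; np.roll stands in for true translation, and a small eps guards the division):

```python
import numpy as np

rng = np.random.default_rng(5)
N, d = 128, 17
f1 = rng.standard_normal(N)
f2 = np.roll(f1, d)                 # f2(x) = f1(x - d), cyclically

F1, F2 = np.fft.fft(f1), np.fft.fft(f2)
cross = F2 * np.conj(F1)            # |F1|^2 e^{-i w d}
cross /= np.abs(cross) + 1e-12      # normalize amplitudes: leaves e^{-i w d}
corr = np.real(np.fft.ifft(cross))  # an impulse located at x = d
print(np.argmax(corr))              # 17
```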

Phase correlation, by using a global Fourier basis, measures a single displacement for the entire image. But for many applications of interest, we really want to measure image velocity locally. So a natural alternative to the Fourier basis is a wavelet basis. This gives us a local, self-similar representation of the image from which we can still use phase information (Fleet).

[Figure: (A) the real part of a bandpass signal; (B) its amplitude component; (C) its phase component; (D) its instantaneous frequency.]

Local Phase Differences. To develop local phase-based methods, we first need to introduce some notation and terminology associated with bandpass signals. First, with 1D signals and filters, let's assume that we have a bandpass filter h(x) and its response r(x), perhaps from a wavelet transform. Let's also assume that r(x) is oversampled, to avoid aliasing in individual subbands.

Because r(x) is bandpass, we expect it to modulate at frequencies similar to those in the passband. A good model is usually r(x) = a(x) cos(φ(x)), where a(x) and φ(x) are referred to as the amplitude and phase components of the response. Another useful quantity is the instantaneous frequency of r(x) at a point x, defined as k(x) = dφ(x)/dx. To see that this definition makes sense, intuitively, remember that a sinusoidal signal with constant amplitude is given by sin(ωx), in which case the phase component is φ(x) = ωx and the instantaneous frequency is k(x) = ω. Here the frequency is constant, but with more general classes of bandpass signals the instantaneous frequency is a function of spatial position.

Figure shows a bandpass signal along with its amplitude and phase signals and its instantaneous frequency. Figure A shows the real part of the signal as a function of spatial position, while Figures B and C show the amplitude and phase components of the signal. Unlike Fourier spectra, these two signals are functions of spatial position, not frequency. The amplitude represents the magnitude of the signal modulation. The phase component represents the local structure of the signal (e.g., whether the local structure is a peak or a zero-crossing). Figure D shows the instantaneous frequency.

Normally, as can be seen in the figure, the amplitude is slowly varying, the phase is close to linear, and the instantaneous frequency is close to the center of the passband of the signal's Fourier spectrum. This characterization suggests a crude but effective model for a bandpass signal about a point x_0, i.e.,

    r(x) ≈ a(x_0) cos( φ(x_0) + (x - x_0) φ'(x_0) ).

This approximation involves a constant approximation to the amplitude signal and a first-order approximation to the phase component.

With respect to estimating displacements, we can now point to some of the interesting properties of phase. First, if the signal translates from one frame to the next, then, as long as the filter is shift-invariant, both the amplitude and phase components will also translate. Second, phase almost never has a zero gradient, and it is linear over relatively large spatial extents compared to the signal itself. These properties are useful since gradient-based techniques require division by the gradient, and the accuracy of gradient-based measurements is a function of the linearity of the signal. Together, all this is promising for phase-gradient methods.

Therefore, given two phase signals φ_1(x) and φ_2(x), we can estimate the displacement required to match phase values at a position x using gradient constraints. Fleet et al. proposed a constraint of the form

    d̂ = ⌊φ_1(x) - φ_2(x)⌋ / ( (φ_1'(x) + φ_2'(x)) / 2 ).

Here, the phase difference is only unique within a range of 2π, and therefore ⌊·⌋ denotes a modulo-2π operator. Also, rather than divide by the gradient of one signal, we divide by the average of the two gradients. As with intensity gradient-based methods, this estimator can be iterated, using a form of Newton's method, to obtain a final estimate. However, in many cases the estimator is accurate enough that only one or two iterations are necessary.
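A 1D sketch of this estimator (the Gabor parameters, the noise test signal, and the amplitude threshold are all illustrative choices; the wrapped phase difference and the phase gradient are computed with the identities given below for quadrature-pair responses):

```python
import numpy as np

rng = np.random.default_rng(6)
N, d_true = 256, 1.4
X = np.fft.fftfreq(N) * N                    # positions, filter centered at 0
lam, sigma = 8.0, 6.0                        # Gabor wavelength and envelope size
gabor = np.exp(-X**2 / (2 * sigma**2)) * np.exp(1j * 2 * np.pi * X / lam)

w = 2 * np.pi * np.fft.fftfreq(N)
noise_F = np.fft.fft(rng.standard_normal(N))
f1 = np.real(np.fft.ifft(noise_F))
f2 = np.real(np.fft.ifft(noise_F * np.exp(-1j * w * d_true)))  # f2(x)=f1(x-d)

response = lambda f: np.fft.ifft(np.fft.fft(f) * np.fft.fft(gabor))
ddx = lambda r: np.fft.ifft(np.fft.fft(r) * 1j * w)   # spectral derivative
r1, r2 = response(f1), response(f2)

phase_diff = np.angle(r1 * np.conj(r2))      # wrapped phi1 - phi2
phase_grad = 0.5 * (np.imag(np.conj(r1) * ddx(r1)) / np.abs(r1)**2
                    + np.imag(np.conj(r2) * ddx(r2)) / np.abs(r2)**2)
d_hat = phase_diff / phase_grad

# Keep only points with a large response amplitude (away from singularities).
mask = np.minimum(np.abs(r1), np.abs(r2)) > 0.5 * np.abs(r1).max()
print(np.median(d_hat[mask]))                # close to 1.4
```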

2D Phase-Based Flow and Practical Issues. This method generalizes straightforwardly to 2D, yielding one phase-based gradient constraint at every pixel, for every filter. A natural class of filters would be a steerable pyramid, the filters of which are orientation- and scale-tuned.

With a steerable pyramid, the easiest way to extract phase and instantaneous frequency is to use quadrature-pair filters. These are complex-valued filters, the real and imaginary parts of which are Hilbert transforms of one another. In the Fourier domain, they occupy space in only one half-plane. And because the filters are complex-valued (a pair of real-valued filters), the output r(x, y, t) is also complex-valued. Accordingly, the amplitude component of the output is just the square root of the sum of the squares of the real and imaginary outputs, i.e., |r(x)|. The phase component is the phase of r(x), often written arg(r(x)) and computed using atan2.

In order to compute the phase difference, it is convenient to use

    ⌊φ_1(x) - φ_2(x)⌋ = arg( r_1(x) r_2*(x) ).

And to compute the phase gradient, it is convenient to use the identity

    dφ(x)/dx = Im[ r*(x) (dr(x)/dx) ] / |r(x)|².

This is convenient since, because r(x) is bandpass, we know how to interpolate and differentiate it. Conversely, because phase is only unique in the interval (-π, π], there are phase wrap-arounds (from π to -π) which make explicit differentiation of phase awkward.

Finally, because the responses r(x, y, t) have two spatial dimensions, we take partial derivatives of phase with respect to both x and y, yielding gradient constraint equations of the form

    u_1 φ_x(x, y, t) + u_2 φ_y(x, y, t) = Δφ(x, y, t),

where Δφ(x, y, t) = ⌊φ_1(x, y, t) - φ_2(x, y, t)⌋. Since we have many filters in the steerable pyramid, we get one such constraint from each filter at each pixel. We may combine such constraints over all filters within a local neighbourhood and then compute the best least-squares fit to a motion model (e.g., an affine model). And, of course, this can be iterated to achieve convergence, as above.

Also, if it is known that aliasing does not exist in the spatiotemporal signal, then we can also differentiate the response in time and hence compute the temporal phase derivative. This yields constraints of the form

    u_1 φ_x(x, y, t) + u_2 φ_y(x, y, t) + φ_t(x, y, t) = 0.

And we may use phase constraints at all scales simultaneously to estimate a local model of the optical flow.

Multiscale Phase Differences. Of course, as is often the case with conventional image sequences (like those taken with a video camera), there may be aliasing, so that one cannot differentiate the signal with respect to time. With respect to phase-based techniques, any signal that is displaced more than half a wavelength is aliased, and therefore phase-based estimates of the displacement will be incorrect. Thus, estimates from the responses of filters tuned to higher frequencies will have problems at smaller displacements than responses from filters tuned to lower frequencies.

For example, Figure shows the consequences of aliasing. The filters were bandpass, with passbands centered at frequencies with wavelengths of 4, 8, 16, and 32 pixels. Figure A shows the distributions of instantaneous frequency (weighted by amplitude) that one obtains from these filters with the Einstein image. For relatively large amplitudes, the instantaneous frequencies are concentrated in the respective passbands. So instantaneous frequencies in the response to the filter with λ = 4 are generally close to 2π/4. Accordingly, we expect to see aliasing appear when the displacements approach 2 pixels (half a wavelength). Indeed, Figure B shows that the distribution of displacement estimates obtained from the four filters is very accurate when the displacement is very small. By comparison, when the displacement is -2.5 pixels, as in Figure C, one can see that the distribution of estimates from the highest-frequency filter broadens. As expected, given wavelengths near 4 and a displacement of -2.5, the aliased distribution of estimates occurs near a disparity of 1.5 pixels. When the displacement is 4.0, as in Figure D, the highest-frequency filter produces a broad distribution of aliased estimates near 0; the filter tuned to λ = 8 also begins to show signs of aliasing. Finally, in Figure E, with a displacement of -7.5, aliasing causes erroneous estimates in all but the lowest-frequency filter.

Clearly, some form of control structure is required so that errors due to aliasing can be detected and ignored. When discussing gradient-based methods, we introduced a coarse-to-fine control strategy that could be used here as well. Another approach, somewhat similar to phase correlation, is to consider evidence through scale at a wide variety of shifts, and to choose the displacement at which the evidence at all channels is consistent. This approach is beyond the scope of this handout.

Phase Stability and Singularities. There are several reasons for the success of phase-based techniques for measuring displacements and optical flow. One is that phase signals from bandpass filters are pseudo-linear in local space-time neighborhoods of an image sequence. The extent of the linearity can be shown to depend on the bandwidth of the filters: narrow-bandwidth filters produce linear phase behavior over larger regions (Fleet and Jepson).

[Figure: (A) distributions of instantaneous frequency for filters tuned to wavelengths λ = 4, 8, 16, and 32 pixels; (B)-(E) distributions of displacement estimates for displacements of 0.5, -2.5, 4.0, and -7.5 pixels.]

Another interesting property of phase is its stability with respect to deformations, other than translation, between views of a 3D scene. Fleet and Jepson showed that phase is stable with respect to small affine deformations of the image (e.g., scaling the image intensities and/or adding a constant to the image intensities), so that local phase is more often conserved over time. Again, it turns out that the expected stability of phase through scale depends on the bandwidth of the filters: the broader the bandwidth, the more stable phase becomes.

For example, Figures B and C show the amplitude and phase components of a scale-space expansion S(x, λ) of the Gaussian noise signal in Figure A. This is the response of a complex-valued Gabor filter, as a function of spatial position x and wavelength λ. Level contours of amplitude and phase are shown below, in Figures D and E. In the context of the scale-space expansion, an image property is said to be stable for image matching where its level contours are vertical. In this figure, notice that the amplitude contours are not generally vertical. By contrast, the phase contours are vertical, except in several isolated regions.

These isolated regions of instability are called singularity neighborhoods. In such regions, it can be shown that phase is remarkably unstable, even to small perturbations of scale or of spatial position (Fleet et al.; Fleet and Jepson). Accordingly, phase constraints on motion estimation should not be trusted in singularity neighborhoods.

[Figure: (A) the input signal, Gaussian noise, as a function of pixel location; (B, C) amplitude and phase components of its scale-space expansion; (D, E) level contours of amplitude and of phase; (F, G) phase contours that survive the stability constraint, and those removed by it.]

To detect pixels in singularity neighborhoods, Fleet and Jepson derive the following reliability condition:

    | (∂/∂x) log S(x, λ) - i k(λ) | ≤ τ / σ(λ),

where k(λ) = 2π/λ and σ(λ) is the standard deviation of the Gaussian envelope of the Gabor filter, which increases with wavelength. As τ decreases, this constraint detects larger neighbourhoods about the singular points. In effect, the left-hand side of this condition is the inverse scale-space distance to a singular point. For example, Figure F shows the phase contours that survive the stability constraint, while Figure G shows the phase contours in the regions removed by the constraint.

Multiple

When an ane or constant mo del is a go o d approximation to the image motion the areabased

regression metho ds using gradientbased constraints are very eective One of the main reasons is

that with these metho ds one estimates only a small number of parameters for the ane ow

mo del given hundreds or thousands of constraints one constraint for each pixel throughout a large

image region Another reason is the simplicity of the linear estimation metho d that follows from

the gradientbased constraints and the linear parametric motion mo dels

The problem with such model-based methods is that large image regions are often not well modeled by a single parametric motion, due to the complexity of the motion or the presence of multiple moving objects. Smaller regions, on the other hand, may not provide sufficient constraints for estimating the parameters (the generalized aperture problem discussed above). Recent research has extended the model-based flow estimation approach by using a piecewise affine model for the flow field, simultaneously estimating the boundaries between the regions and the affine parameters within each region (Wang and Adelson; Black and Jepson; Ju et al.). Another exciting development is the use of mixture models, in which multiple parameterized models are used simultaneously in a given image neighborhood to model and estimate the optical flow.
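The mixture-model idea can be sketched as a toy EM loop (an illustration of the general approach, not the published algorithms of Jepson and Black or Weiss): pixels are softly assigned to two constant-translation models according to their gradient-constraint residuals, and each model's velocity is then re-estimated by weighted least squares. The residual scale sigma and the initialization are illustrative assumptions.

```python
import numpy as np

def mixture_flow(fx, fy, ft, uv_init, n_iter=30, sigma=0.1):
    """EM-style fit of a mixture of constant-translation flow models.

    E-step: soft ownership weights from each pixel's gradient-
    constraint residual under each model. M-step: weighted
    least-squares re-estimate of each model's (u, v).
    """
    uv = np.array(uv_init, dtype=float)  # shape (n_models, 2)
    for _ in range(n_iter):
        # E-step: residual fx*u + fy*v + ft under each model
        r = uv[:, 0:1] * fx[None, :] + uv[:, 1:2] * fy[None, :] + ft[None, :]
        w = np.exp(-r**2 / (2.0 * sigma**2)) + 1e-12
        w /= w.sum(axis=0, keepdims=True)  # ownership weight per pixel
        # M-step: weighted least squares for each model's velocity
        for m in range(len(uv)):
            sw = np.sqrt(w[m])
            A = np.stack([fx * sw, fy * sw], axis=-1)
            uv[m] = np.linalg.lstsq(A, -ft * sw, rcond=None)[0]
    return uv
```

On data containing two motions, the soft weights segment the pixels while the velocities converge, which is the simultaneous estimation of models and their support described above.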

References

E. H. Adelson and J. R. Bergen. Spatiotemporal energy models for the perception of motion. Journal of the Optical Society of America A.

E. H. Adelson and J. R. Bergen. The extraction of spatiotemporal energy in human and machine vision. In Proceedings of IEEE Workshop on Motion: Representation and Analysis, Charleston, S. Carolina.

J. K. Aggarwal and N. Nandhakumar. On the computation of motion from sequences of images: a review. Proceedings of the IEEE.

P. Anandan. A computational framework and an algorithm for the measurement of visual motion. International Journal of Computer Vision.

J. Barron. A survey of approaches for determining optic flow, environmental layout and egomotion. Technical Report RBCV-TR, Department of Computer Science, University of Toronto.

J. Barron, D. J. Fleet, and S. S. Beauchemin. Performance of optical flow techniques. International Journal of Computer Vision.

J. R. Bergen, P. Anandan, K. Hanna, and R. Hingorani. Hierarchical model-based motion estimation. In Proceedings of Second European Conference on Computer Vision. Springer-Verlag.

M. J. Black and P. Anandan. The robust estimation of multiple motions: parametric and piecewise-smooth flow fields. Computer Vision and Image Understanding.

A. R. Bruss and B. K. P. Horn. Passive navigation. Computer Vision, Graphics, and Image Processing.

D. J. Fleet. Measurement of Image Velocity. Kluwer Academic Press, Boston.

D. J. Fleet. Disparity from local weighted phase correlation. In IEEE International Conference on Systems, Man, and Cybernetics.

D. J. Fleet and A. D. Jepson. Computation of component image velocity from local phase information. International Journal of Computer Vision.

D. J. Fleet and A. D. Jepson. Stability of phase information. IEEE Transactions on Pattern Analysis and Machine Intelligence.

D. J. Fleet, A. D. Jepson, and M. Jenkin. Phase-based disparity measurement. Computer Vision, Graphics, and Image Processing.

N. M. Grzywacz and A. L. Yuille. A model for the estimate of local image velocity by cells in the visual cortex. Proceedings of the Royal Society of London A.

D. J. Heeger. Model for the extraction of image flow. Journal of the Optical Society of America A.

D. J. Heeger and A. D. Jepson. Subspace methods for recovering rigid motion I: Algorithm and implementation. International Journal of Computer Vision.

B. K. P. Horn and B. G. Schunk. Determining optical flow. Artificial Intelligence.

A. D. Jepson and M. J. Black. Mixture models for optical flow computation. In IEEE Conference on Computer Vision and Pattern Recognition.

S. X. Ju, M. J. Black, and A. D. Jepson. Skin and bones: multi-layer, locally affine, optical flow and regularization with transparency. In Computer Vision and Pattern Recognition.

C. Kuglin and D. Hines. The phase correlation image alignment method. In IEEE International Conference on Cybernetics and Society.

H. C. Longuet-Higgins and K. Prazdny. The interpretation of a moving retinal image. Proceedings of the Royal Society of London B.

B. D. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, Vancouver.

H. H. Nagel. On the estimation of optical flow: relations between different approaches and some new results. Artificial Intelligence.

E. P. Simoncelli. Distributed Representation and Analysis of Visual Motion. PhD thesis, Electrical Engineering Department, MIT.

A. Verri, F. Girosi, and V. Torre. Differential techniques for optical flow. Journal of the Optical Society of America A.

J. Y. A. Wang and E. H. Adelson. Representing moving images with layers. IEEE Transactions on Image Processing.

A. B. Watson and A. J. Ahumada. A look at motion in the frequency domain. In J. K. Tsotsos, editor, Motion: Perception and Representation. Association for Computing Machinery, New York.

A. B. Watson and A. J. Ahumada. Model of human visual-motion sensing. Journal of the Optical Society of America A.

A. M. Waxman and S. Ullman. Surface structure and three-dimensional motion from image flow. International Journal of Robotics Research.

Y. Weiss. Smoothness in layers: Motion segmentation using nonparametric mixture estimation. In IEEE Conference on Computer Vision and Pattern Recognition.