Notes on Motion Estimation

Psych/CS/EE, Prof. David J. Heeger, November

There are a great variety of applications that depend on analyzing the motion in image sequences. These include motion detection for surveillance, image-sequence data compression (MPEG), image understanding, motion-based segmentation, depth/structure from motion, obstacle avoidance, and image registration and compositing.

The first step in processing image sequences is typically image-velocity estimation. The result is called the optical flow field: a collection of two-dimensional velocity vectors, one for each small region (potentially one for each pixel) of the image. Image velocities can be measured using correlation or block matching (for example, see Anandan), in which each small patch of the image at one time is compared with nearby patches in the next frame. Feature extraction and matching is another way to measure the flow field (for reviews of feature-tracking methods, see Barron, or Aggarwal and Nandhakumar). Gradient-based algorithms are a third approach to measuring flow fields (for example, Horn and Schunck; Lucas and Kanade; Nagel). A fourth approach, using spatiotemporal filtering, has also been proposed (for example, Watson and Ahumada; Heeger; Grzywacz and Yuille; Fleet and Jepson). This handout concentrates on the filter-based and gradient-based methods. Emphasis is placed on the importance of multiscale coarse-to-fine refinement of the velocity estimates. For a good overview and comparison of different flow methods, see Barron et al.

1 Geometry: 3D Velocity and 2D Image Velocity

Motion occurs in many applications, and there are many tasks for which motion information might be very useful. Here we concentrate on natural image sequences of 3D scenes in which objects and the camera may be moving. Typical issues for which we want computational solutions include: inferring the relative 3D motion between the camera and objects in the scene, inferring the depth and surface structure of the scene, and segmenting the scene using the motion information.
Towards this end, we are interested in knowing the geometric relationships between 3D motion, surface structure, and 2D image velocity.

To begin, consider a point on the surface of an object. We will represent 3D surface points as position vectors $\mathbf{X} = (X, Y, Z)^T$ relative to a viewer-centered coordinate frame, as depicted in Fig. 1. When the camera moves, or when the object moves, this point moves along a 3D path $\mathbf{X}(t) = (X(t), Y(t), Z(t))^T$ relative to our viewer-centered coordinate frame. The instantaneous 3D velocity of the point is the derivative of this path with respect to time:

$$\mathbf{V} = \frac{d\mathbf{X}(t)}{dt} = \left(\frac{dX}{dt}, \frac{dY}{dt}, \frac{dZ}{dt}\right)^T. \tag{1}$$

We are interested in the image locations to which the 3D point projects as a function of time. Under perspective projection, the point $\mathbf{X}$ projects to the image point $(x, y)^T$ given by

$$x = fX/Z, \qquad y = fY/Z, \tag{2}$$

where $f$ is the focal length of the projection.

Figure 1: Camera-centered coordinate frame and perspective projection.

Owing to motion between the camera and the scene, a 3D surface point traverses a path in 3D. Under perspective projection, this path projects onto a 2D path in the image plane, the temporal derivative of which is called the 2D velocity. The 2D velocities associated with all visible points define a dense 2D vector field, called the 2D motion field.

As the 3D point moves through time, its corresponding 2D image point traces out a 2D path $(x(t), y(t))^T$, the derivative of which is the image velocity:

$$\mathbf{u} = \left(\frac{dx(t)}{dt}, \frac{dy(t)}{dt}\right)^T. \tag{3}$$

We can combine Eqs. (2) and (3), and thereby write the image velocity in terms of the 3D position and velocity of the surface point:

$$\mathbf{u} = \frac{f}{Z}\left(\frac{dX}{dt}, \frac{dY}{dt}\right)^T - \frac{f}{Z^2}\frac{dZ}{dt}\,(X(t), Y(t))^T. \tag{4}$$

Each visible surface point traces out a 3D path, which projects onto the image plane to form a 2D path. If one considers the paths of all visible 3D surface points and their projections onto the image plane, then one obtains a dense set of 2D paths, the temporal derivatives of which form the vector field of 2D velocities, commonly known as the optical flow field.
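The projection and image-velocity relations above can be sanity-checked numerically. Below is a minimal sketch (not from the handout; the focal length, point, and function names are illustrative) that compares the analytic image velocity of Eq. (4) against a finite difference of the projected position:

```python
import numpy as np

f = 1.0  # focal length (assumed value)

def project(X):
    """Perspective projection, Eq. (2): (X, Y, Z) -> (fX/Z, fY/Z)."""
    return f * X[:2] / X[2]

def image_velocity(X, V):
    """Eq. (4): u = (f/Z)(dX/dt, dY/dt)^T - (f/Z^2)(dZ/dt)(X, Y)^T."""
    Xc, Yc, Z = X
    dX, dY, dZ = V
    return np.array([f * dX / Z - f * Xc * dZ / Z**2,
                     f * dY / Z - f * Yc * dZ / Z**2])

X = np.array([0.5, -0.2, 4.0])   # a 3D surface point
V = np.array([0.1, 0.3, -0.5])   # its 3D velocity, Eq. (1)

u = image_velocity(X, V)

# Finite-difference check: project the point at times t and t + dt.
dt = 1e-6
u_fd = (project(X + dt * V) - project(X)) / dt
assert np.allclose(u, u_fd, atol=1e-5)
```

The agreement simply confirms that Eq. (4) is the time derivative of the perspective projection.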
To understand the structure of the optical flow field in greater detail, it is often helpful to make further assumptions about the structure of the 3D surface or about the form of the 3D motion. We begin by considering the special case in which the objects in the scene move rigidly with respect to the camera, as though the camera were moving through a stationary scene (Longuet-Higgins and Prazdny; Bruss and Horn; Waxman and Ullman; Heeger and Jepson). This is a special case because all points on a rigid body share the same six motion parameters relative to the camera-centered coordinate frame. In particular, the instantaneous velocity of the camera through a stationary scene can be expressed in terms of the camera's 3D translation $\mathbf{T} = (T_x, T_y, T_z)^T$ and its instantaneous 3D rotation $\mathbf{\Omega} = (\Omega_x, \Omega_y, \Omega_z)^T$. Here, the direction of $\mathbf{\Omega}$ gives the axis of rotation, while $|\mathbf{\Omega}|$ is the magnitude of rotation per unit time. Given this motion of the camera, the instantaneous 3D velocity of a surface point in camera-centered coordinates is

$$\left(\frac{dX}{dt}, \frac{dY}{dt}, \frac{dZ}{dt}\right)^T = -\mathbf{\Omega} \times \mathbf{X} - \mathbf{T}. \tag{5}$$

The 3D velocities of all surface points in a stationary scene depend on the same rigid-body motion parameters and are given by Eq. (5).

Figure 2: Example flow fields. (A) Camera translation. (B) Camera rotation. (C) Translation plus rotation; each flow vector in (C) is the vector sum of the two corresponding vectors in (A) and (B).

When we substitute these 3D velocities for $d\mathbf{X}/dt$ in the image-velocity expression, Eq. (4), we obtain an expression for the form of the optical flow field for a rigid scene:

$$\mathbf{u}(x, y) = p(x, y)\,\mathbf{A}(x, y)\,\mathbf{T} + \mathbf{B}(x, y)\,\mathbf{\Omega}, \tag{6}$$

where $p(x, y) = 1/Z(x, y)$ is the inverse depth at each image location, and

$$\mathbf{A}(x, y) = \begin{pmatrix} -f & 0 & x \\ 0 & -f & y \end{pmatrix}, \qquad \mathbf{B}(x, y) = \begin{pmatrix} xy/f & -(f + x^2/f) & y \\ f + y^2/f & -xy/f & -x \end{pmatrix}.$$

The matrices $\mathbf{A}(x, y)$ and $\mathbf{B}(x, y)$ depend only on the image position and the focal length. Equation (6) describes the flow field as a function of 3D motion and depth. It has two terms. The first term is referred to as the translational component of the flow field, since it depends on 3D translation and 3D depth. The second term is referred to as the rotational component, since it depends only on 3D rotation.
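The rigid-motion flow-field decomposition above (inverse depth times a translation matrix, plus a rotation matrix) can be verified numerically by comparing it against the image velocity obtained directly from the rigid-body 3D velocity. This is a minimal sketch, not from the handout; the numerical values and helper names are made up:

```python
import numpy as np

f = 1.0  # focal length (assumed value)

def A(x, y):
    """Translation matrix of the flow-field decomposition."""
    return np.array([[-f, 0.0, x],
                     [0.0, -f, y]])

def B(x, y):
    """Rotation matrix of the flow-field decomposition."""
    return np.array([[x * y / f, -(f + x**2 / f), y],
                     [f + y**2 / f, -x * y / f, -x]])

T = np.array([0.2, -0.1, 0.4])         # camera translation
Omega = np.array([0.01, 0.03, -0.02])  # camera rotation

X = np.array([0.5, -0.2, 4.0])         # a 3D surface point
x, y = f * X[0] / X[2], f * X[1] / X[2]
p = 1.0 / X[2]                         # inverse depth

u_flow = p * A(x, y) @ T + B(x, y) @ Omega

# Direct route: rigid-body 3D velocity, then the projected image velocity.
V = -np.cross(Omega, X) - T
u_direct = np.array([f * V[0] / X[2] - f * X[0] * V[2] / X[2]**2,
                     f * V[1] / X[2] - f * X[1] * V[2] / X[2]**2])
assert np.allclose(u_flow, u_direct)
```

Both routes give the same image velocity, since the decomposition is just the image-velocity formula with the rigid-body velocity substituted in.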
Since $p(x, y)$, the inverse depth, and $\mathbf{T}$, the translation, are multiplied together in the flow-field equation (6), a larger distance (smaller $p$) or a slower 3D translation (smaller $|\mathbf{T}|$) both yield a slower image velocity.

Figure 2 shows some example flow fields. Each vector represents the speed and direction of motion for each local image region. Figure 2A is a flow field resulting from camera translation above a planar surface, Fig. 2B is a flow field resulting from camera rotation, and Fig. 2C is a flow field resulting from simultaneous translation and rotation. Each flow vector in Fig. 2C is the vector sum of the two corresponding flow vectors in Figs. 2A and 2B.

Pure Translation. When a camera is undergoing a pure translational motion, features in the image move toward or away from a single point in the image, called the focus of expansion (FOE). The FOE in Fig. 2A is centered just above the horizon. When the rotation is zero, $\mathbf{u}(x, y) = p(x, y)\,\mathbf{A}(x, y)\,\mathbf{T}$, i.e.,

$$u_1(x, y) = p(x, y)(x\,T_z - f\,T_x), \qquad u_2(x, y) = p(x, y)(y\,T_z - f\,T_y). \tag{7}$$

The image velocity is zero at the image position $(x_0, y_0)^T = (f T_x/T_z,\; f T_y/T_z)^T$; this is the focus of expansion. At any other image position, the flow vector is proportional to $(x - x_0,\; y - y_0)^T$; that is, the flow vector points either toward or away from the focus of expansion.

Pure Rotation. When the camera is undergoing a pure rotational motion, the resulting flow field depends only on the axis and speed of rotation, independent of the scene depth/structure (see Fig. 2B for an example). Specifically, rotation results in a quadratic flow field; that is, each component $u_1$ and $u_2$ of the flow field is a quadratic function of image position. This is evident from the form of $\mathbf{B}(x, y)$, which contains quadratic terms in $x$ and $y$ (e.g., $x^2$, $xy$, etc.).

Motion With Respect to a Planar Surface. When the camera is rotating as well as translating, the flow field can be quite complex. Unlike the pure-translation case, the singularity in the flow field no longer corresponds to the translation direction. But for planar surfaces, the flow field simplifies to again be a quadratic function of image position (see Fig. 2C for an example). A planar surface can be expressed in the camera-centered coordinate frame as

$$Z = m_x X + m_y Y + Z_0. \tag{8}$$
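Returning to the pure-translation case for a moment, the FOE property is easy to check numerically: the flow at every image position lies along the line through the focus of expansion, and vanishes there regardless of depth. A minimal sketch with illustrative values (not from the handout):

```python
import numpy as np

f = 1.0  # focal length (assumed value)
T = np.array([0.2, -0.1, 0.4])   # translation with T_z != 0
foe = f * T[:2] / T[2]           # FOE: (f T_x / T_z, f T_y / T_z)

def flow(x, y, p):
    """Pure-translation flow: u = p * (x T_z - f T_x, y T_z - f T_y)."""
    return p * np.array([x * T[2] - f * T[0], y * T[2] - f * T[1]])

for (x, y, p) in [(0.3, 0.1, 0.5), (-0.4, 0.2, 0.25), (0.0, -0.3, 1.0)]:
    u = flow(x, y, p)
    r = np.array([x, y]) - foe   # direction away from the FOE
    # The flow is parallel to r: the 2D cross product vanishes.
    assert abs(u[0] * r[1] - u[1] * r[0]) < 1e-12

# The flow is exactly zero at the FOE itself, whatever the inverse depth.
assert np.allclose(flow(foe[0], foe[1], 0.7), 0.0)
```

The depth ($p$) only scales the length of each flow vector along its ray through the FOE; it never changes the direction.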
The depth map is obtained by substituting the perspective projection, $X = xZ/f$ and $Y = yZ/f$, into the planar-surface equation, to give

$$Z(x, y) = m_x \frac{x}{f} Z(x, y) + m_y \frac{y}{f} Z(x, y) + Z_0.$$

Solving for $Z(x, y)$,

$$Z(x, y) = \frac{f Z_0}{f - x m_x - y m_y}, \tag{9}$$

and the inverse depth map is

$$p(x, y) = \frac{f}{f Z_0} - \frac{m_x}{f Z_0} x - \frac{m_y}{f Z_0} y. \tag{10}$$

That is, for a planar surface, the inverse depth map is also planar (an affine function of image position). Combining Eqs. (6) and (10) gives

$$\mathbf{u}(x, y) = \frac{f - m_x x - m_y y}{f Z_0}\,\mathbf{A}(x, y)\,\mathbf{T} + \mathbf{B}(x, y)\,\mathbf{\Omega}. \tag{11}$$

We already know (see above) that the rotational component of the flow field (the second term, $\mathbf{B}\mathbf{\Omega}$) is quadratic in $x$ and $y$. It is clear that the translational component (the first term in the above equation) is also quadratic in $x$ and $y$: it is the product of an affine function of image position with the entries of $\mathbf{A}(x, y)\,\mathbf{T}$, which are themselves linear in $x$ and $y$. For curved 3D surfaces, the translational component, and hence the optical flow field, will be cubic or even higher order. The rotational component is always quadratic.

Occlusions. The optical flow field is often more complex than is implied by the above equations. For example, the boundaries of objects give rise to occlusions in an image sequence, which cause discontinuities in the optical flow field. A detailed discussion of these issues is beyond the scope of this presentation.

Figure 3: Intensity conservation assumption: the image intensities are shifted (locally translated) from one frame to the next.

2 Gradient-Based 2D Motion Estimation

The optical flow field described above is a geometric entity. We now turn to the problem of estimating it from the spatiotemporal variations in image intensity. Towards this end, a number of people (Horn and Schunck; Lucas and Kanade; Nagel; and others)