CS4501: Introduction to Computer Vision and Optical Flow

Various slides from previous courses by: D.A. Forsyth (Berkeley / UIUC), I. Kokkinos (Ecole Centrale / UCL), S. Lazebnik (UNC / UIUC), S. Seitz (MSR / Facebook), J. Hays (Brown / Georgia Tech), A. Berg (Stony Brook / UNC), D. Samaras (Stony Brook), J. M. Frahm (UNC), V. Ordonez (UVA), Steve Seitz (UW).

Last Class

• More on Epipolar Geometry
• Fundamental Matrix

Today’s Class

• More on Epipolar Geometry
• Essential Matrix
• Fundamental Matrix
• Optical Flow

Estimating depth with stereo

• Stereo: shape from “motion” between two views
• We’ll need to consider:
  • Info on camera pose (“calibration”)
  • Image point correspondences

[Figure: a scene point, its projection on each image plane, and the two optical centers]

Key idea: Epipolar constraint

[Figure: candidate scene points X along the ray through x project to candidate matches x’ along a line in the second image]

Potential matches for x have to lie on the corresponding line l’.

Potential matches for x’ have to lie on the corresponding line l.

Epipolar geometry: notation

• Baseline – line connecting the two camera centers
• Epipoles = intersections of baseline with image planes = projections of the other camera center
• Epipolar Plane – plane containing baseline (1D family)
• Epipolar Lines – intersections of epipolar plane with image planes (always come in corresponding pairs)

Epipolar Geometry: Another example

Credit: William Hoff, Colorado School of Mines

Example: Converging cameras

Geometry for a simple stereo system

• First, assuming parallel optical axes, known camera parameters (i.e., calibrated cameras):

Simplest Case: Parallel images

• Image planes of cameras are parallel to each other and to the baseline
• Camera centers are at same height
• Focal lengths are the same
• Then epipolar lines fall along the horizontal scan lines of the images

Geometry for a simple stereo system

• Assume parallel optical axes, known camera parameters (i.e., calibrated cameras). What is the expression for Z?

Similar triangles (p_l, P, p_r) and (O_l, P, O_r):

$$\frac{T + x_l - x_r}{Z - f} = \frac{T}{Z}$$

Solving for Z:

$$Z = f\,\frac{T}{x_r - x_l} \qquad (x_r - x_l \text{ is the disparity})$$

Depth from disparity

[Figure: image I(x,y), disparity map D(x,y), image I´(x´,y´)]

(x´, y´) = (x + D(x,y), y)
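As a quick illustration of this relation (the baseline, focal length, and disparity map values here are assumptions, not from the slides):

Matlab:
% Sketch: depth map from a disparity map, for a rectified pair with
% baseline T (meters) and focal length f (pixels).
T = 0.1;                    % assumed baseline
f = 700;                    % assumed focal length in pixels
D = ones(480, 640);         % placeholder disparity map D(x,y)
Z = f * T ./ max(D, eps);   % Z = f*T/disparity; guard against zero disparity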

So if we could find the corresponding points in two images, we could estimate relative depth…

Correspondence search

[Figure: left and right images with corresponding scanlines; matching cost plotted as a function of disparity]

• Slide a window along the right scanline and compare contents of that window with the reference window in the left image
• Matching cost: SSD or normalized correlation

Correspondence search

[Figure: correspondence search along a scanline with SSD matching cost]

Correspondence search

[Figure: correspondence search along a scanline with normalized correlation matching cost]

Basic stereo matching algorithm

• If necessary, rectify the two stereo images to transform epipolar lines into scanlines
• For each pixel x in the first image
  • Find corresponding epipolar scanline in the right image
  • Examine all pixels on the scanline and pick the best match x’
  • Compute disparity x–x’ and set depth(x) = B*f/(x–x’)
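A minimal MATLAB sketch of the window-based search above (the function name, window size, disparity range, and image variables IL, IR are assumptions for illustration):

Matlab:
% Sketch: window-based SSD matching along one rectified scanline.
% IL, IR are grayscale double images; y is the scanline; w the half-window.
function d = scanline_disparity(IL, IR, y, x, w, maxDisp)
  ref = IL(y-w:y+w, x-w:x+w);            % reference window in left image
  best = inf; d = 0;
  for disp = 0:maxDisp                   % candidate disparities
    xr = x - disp;                       % matching column in right image
    if xr - w < 1, break; end
    cand = IR(y-w:y+w, xr-w:xr+w);
    cost = sum((ref(:) - cand(:)).^2);   % SSD matching cost
    if cost < best, best = cost; d = disp; end
  end
end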

Failures of correspondence search

• Textureless surfaces
• Occlusions, repetition

• Non-Lambertian surfaces, specularities

Active stereo with structured light

• Project “structured” light patterns onto the object
• Simplifies the correspondence problem
• Allows us to use only one camera

[Figure: camera and projector setup]

L. Zhang, B. Curless, and S. M. Seitz. Rapid Shape Acquisition Using Color Structured Light and Multi-pass Dynamic Programming. 3DPVT 2002

Kinect and iPhone X Solution

• Add Texture!

Kinect: Structured infrared light

http://bbzippo.wordpress.com/2010/11/28/kinect-in-infrared/

iPhone X

Basic stereo matching algorithm

• If necessary, rectify the two stereo images to transform epipolar lines into scanlines
• For each pixel x in the first image
  • Find corresponding epipolar scanline in the right image
  • Examine all pixels on the scanline and pick the best match x’
  • Compute disparity x–x’ and set depth(x) = B*f/(x–x’)

Effect of window size

[Figure: disparity maps computed with W = 3 and W = 20]

• Smaller window: + more detail, – more noise
• Larger window: + smoother disparity maps, – less detail

Results with window search

[Figure panels: Data, Window-based matching, Ground truth]

Better methods exist...

[Figure panels: Graph cuts, Ground truth]

Y. Boykov, O. Veksler, and R. Zabih, Fast Approximate Energy Minimization via Graph Cuts, PAMI 2001

For the latest and greatest: http://www.middlebury.edu/stereo/

When cameras are not aligned: Stereo image rectification

• Reproject image planes onto a common plane parallel to the line between optical centers
• Pixel motion is horizontal after this transformation
• Two homographies (3x3 transform), one for each input image reprojection

C. Loop and Z. Zhang. Computing Rectifying Homographies for Stereo Vision. IEEE Conf. Computer Vision and Pattern Recognition, 1999.

Rectification example

Back to the General Problem

Credit: William Hoff, Colorado School of Mines

[A sequence of figure-only slides deriving the epipolar constraint for two cameras in general position]

Brief Digression: Cross Product as Matrix Multiplication

• Dot product as matrix multiplication: easy

$$a^T b = \begin{bmatrix} a_x & a_y & a_z \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = a_x b_x + a_y b_y + a_z b_z$$

• Cross product as matrix multiplication:

$$a \times b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [a]_\times\, b$$
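A quick MATLAB helper for this (the function name skew is our choice, not from the slides):

Matlab:
% Build the skew-symmetric matrix [a]_x so that cross(a, b) == skew(a) * b
function S = skew(a)
  S = [    0  -a(3)   a(2);
        a(3)      0  -a(1);
       -a(2)   a(1)      0 ];
end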

Back to the General Problem

Credit: William Hoff, Colorado School of Mines

The Essential Matrix

Essential Matrix (Longuet-Higgins, 1981)

Credit: William Hoff, Colorado School of Mines

Epipolar constraint: Calibrated case


• Intrinsic and extrinsic parameters of the cameras are known, world coordinate system is set to that of the first camera
• Then the projection matrices are given by K[I | 0] and K’[R | t]
• We can multiply the projection matrices (and the image points) by the inverse of the calibration matrices to get normalized image coordinates:

$$x_{norm} = K^{-1} x_{pixel} = [I \mid 0]\, X, \qquad x'_{norm} = K'^{-1} x'_{pixel} = [R \mid t]\, X$$

Epipolar constraint: Calibrated case

With X = (x, 1)^T:

$$[I \mid 0] \begin{pmatrix} x \\ 1 \end{pmatrix} = x, \qquad [R \mid t] \begin{pmatrix} x \\ 1 \end{pmatrix} = Rx + t = x'$$

The vectors Rx, t, and x’ are coplanar.

Epipolar constraint: Calibrated case

$$x' \cdot [t \times (Rx)] = 0 \quad\Longleftrightarrow\quad x'^T [t]_\times R\, x = 0$$

Recall:

$$a \times b = \begin{bmatrix} 0 & -a_z & a_y \\ a_z & 0 & -a_x \\ -a_y & a_x & 0 \end{bmatrix} \begin{bmatrix} b_x \\ b_y \\ b_z \end{bmatrix} = [a]_\times\, b$$

The vectors Rx, t, and x’ are coplanar.

Epipolar constraint: Calibrated case

$$x' \cdot [t \times (Rx)] = 0 \quad\Longleftrightarrow\quad x'^T [t]_\times R\, x = 0 \quad\Longleftrightarrow\quad x'^T E\, x = 0$$

Essential Matrix: E = [t]_x R (Longuet-Higgins, 1981)

The vectors Rx, t, and x’ are coplanar.

Epipolar constraint: Calibrated case

$$x'^T E\, x = 0$$

• E x is the epipolar line associated with x (l' = E x)
• Recall: a line is given by ax + by + c = 0, or l^T x = 0 where l = (a, b, c)^T, x = (x, y, 1)^T
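As a small numerical illustration (all values hypothetical), the epipolar line can be computed and checked in MATLAB:

Matlab:
t = [1; 0; 0];  R = eye(3);    % hypothetical relative pose
E = skew(t) * R;               % E = [t]_x R, using the helper above
x  = [0.2; 0.1; 1];            % normalized point in image 1
lp = E * x;                    % epipolar line l' = E x in image 2
% any correct match xp satisfies lp' * xp = 0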

Epipolar constraint: Calibrated case

$$x'^T E\, x = 0$$

• E x is the epipolar line associated with x (l' = E x)
• E^T x' is the epipolar line associated with x' (l = E^T x')
• E e = 0 and E^T e' = 0
• E is singular (rank two)
• E has five degrees of freedom

Epipolar constraint: Uncalibrated case


• The calibration matrices K and K’ of the two cameras are unknown
• We can write the epipolar constraint in terms of unknown normalized coordinates:

$$\hat{x}'^T E\, \hat{x} = 0, \qquad \hat{x} = K^{-1} x, \quad \hat{x}' = K'^{-1} x'$$

Epipolar constraint: Uncalibrated case


$$\hat{x}'^T E\, \hat{x} = 0 \quad\Longrightarrow\quad x'^T F\, x = 0 \quad\text{with}\quad F = K'^{-T} E\, K^{-1}$$

Fundamental Matrix (Faugeras and Luong, 1992)

Epipolar constraint: Uncalibrated case


$$\hat{x}'^T E\, \hat{x} = 0 \quad\Longrightarrow\quad x'^T F\, x = 0 \quad\text{with}\quad F = K'^{-T} E\, K^{-1}$$

• F x is the epipolar line associated with x (l' = F x)
• F^T x' is the epipolar line associated with x' (l = F^T x')
• F e = 0 and F^T e' = 0
• F is singular (rank two)
• F has seven degrees of freedom

Estimating the Fundamental Matrix

• 8-point algorithm
  • Least squares solution using SVD on equations from 8 pairs of correspondences
  • Enforce det(F)=0 constraint using SVD on F

• 7-point algorithm
  • Use least squares to solve for null space (two vectors) using SVD and 7 pairs of correspondences
  • Solve for linear combination of null space vectors that satisfies det(F)=0

• Minimize reprojection error
  • Non-linear least squares

Note: estimation of F (or E) is degenerate for a planar scene.

8-point algorithm

1. Solve a system of homogeneous linear equations
   a. Write down the system of equations: x'^T F x = 0 expands to

$$x x' f_{11} + y x' f_{12} + x' f_{13} + x y' f_{21} + y y' f_{22} + y' f_{23} + x f_{31} + y f_{32} + f_{33} = 0$$

   Stacking one such equation per correspondence gives Af = 0:

$$A = \begin{bmatrix} x_1 x'_1 & y_1 x'_1 & x'_1 & x_1 y'_1 & y_1 y'_1 & y'_1 & x_1 & y_1 & 1 \\ \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots & \vdots \\ x_n x'_n & y_n x'_n & x'_n & x_n y'_n & y_n y'_n & y'_n & x_n & y_n & 1 \end{bmatrix}, \qquad f = \begin{pmatrix} f_{11} \\ f_{12} \\ \vdots \\ f_{33} \end{pmatrix}$$

8-point algorithm

1. Solve a system of homogeneous linear equations
   a. Write down the system of equations
   b. Solve f from Af=0 using SVD

   Matlab:
   [U, S, V] = svd(A);
   f = V(:, end);
   F = reshape(f, [3 3])';
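Putting both steps together, a minimal MATLAB sketch of the 8-point algorithm (the function name is our choice, and the point normalization is omitted; Hartley normalization should be added in practice):

Matlab:
% Sketch: 8-point algorithm. x1, x2 are 2-by-N matched points (N >= 8),
% with x1 in the first image and x2 in the second.
function F = eight_point(x1, x2)
  N = size(x1, 2);
  A = zeros(N, 9);
  for i = 1:N
    x = x1(1,i); y = x1(2,i); xp = x2(1,i); yp = x2(2,i);
    A(i,:) = [x*xp, y*xp, xp, x*yp, y*yp, yp, x, y, 1];
  end
  [~, ~, V] = svd(A);              % least-squares null vector of A
  F = reshape(V(:, end), [3 3])';
  [U, S, V2] = svd(F);             % enforce rank-2 constraint det(F) = 0
  S(3,3) = 0;
  F = U * S * V2';
end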

Optical Flow

Slides by Juan Carlos Niebles and Ranjay Krishna

From images to videos

• A video is a sequence of frames captured over time
• Now our image data is a function of space (x, y) and time (t)

Why is motion useful?

Optical flow

• Definition: optical flow is the apparent motion of brightness patterns in the image

• Note: apparent motion can be caused by lighting changes without any actual motion
• Think of a uniform rotating sphere under fixed lighting vs. a stationary sphere under moving illumination

GOAL: Recover image motion at each pixel from optical flow

Source: Silvio Savarese

Optical flow

Vector field function of the spatio-temporal image brightness variations

Picture courtesy of Selim Temizer - Learning and Intelligent Systems (LIS) Group, MIT

Estimating optical flow

[Figure: frames I(x,y,t–1) and I(x,y,t)]

• Given two subsequent frames, estimate the apparent motion field u(x,y), v(x,y) between them
• Key assumptions
  • Brightness constancy: projection of the same point looks the same in every frame
  • Small motion: points do not move very far
  • Spatial coherence: points move like their neighbors

Source: Silvio Savarese

Key Assumptions: small motions

Key Assumptions: spatial coherence

* Slide from Michael Black, CS143 2003

Key Assumptions: brightness constancy

* Slide from Michael Black, CS143 2003

The brightness constancy constraint

[Figure: frames I(x,y,t–1) and I(x,y,t)]

• Brightness Constancy Equation:

$$I(x, y, t-1) = I(x + u(x,y),\; y + v(x,y),\; t)$$

Linearizing the right side using a Taylor expansion (I_x is the image derivative along x):

$$I(x+u, y+v, t) \approx I(x, y, t-1) + I_x \cdot u(x,y) + I_y \cdot v(x,y) + I_t$$

$$I(x+u, y+v, t) - I(x, y, t-1) = I_x \cdot u(x,y) + I_y \cdot v(x,y) + I_t$$

Hence,

$$I_x \cdot u + I_y \cdot v + I_t \approx 0 \quad\rightarrow\quad \nabla I \cdot [u\; v]^T + I_t = 0$$

Source: Silvio Savarese

Filters used to find the derivatives

[Figure: the filters used to compute I_x, I_y, and I_t]
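One common concrete choice (an assumption here — the slide shows the filters only as an image) uses simple difference kernels averaged over the two frames I1 (at t-1) and I2 (at t):

Matlab:
% Sketch: spatial and temporal derivatives from two grayscale double frames I1, I2.
kx = 0.25 * [-1 1; -1 1];                      % derivative along x
ky = 0.25 * [-1 -1; 1 1];                      % derivative along y
Ix = conv2(I1, kx, 'same') + conv2(I2, kx, 'same');
Iy = conv2(I1, ky, 'same') + conv2(I2, ky, 'same');
It = conv2(I2 - I1, 0.25 * ones(2), 'same');   % derivative along t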

The brightness constancy constraint

Can we use this equation to recover image motion (u,v) at each pixel?

$$\nabla I \cdot [u\; v]^T + I_t = 0$$

• How many equations and unknowns per pixel?
  • One equation (this is a scalar equation!), two unknowns (u,v)

The component of the flow perpendicular to the gradient (i.e., parallel to the edge) cannot be measured.

If (u, v) satisfies the equation, so does (u+u', v+v') if

$$\nabla I \cdot [u'\; v']^T = 0$$

[Figure: gradient direction, flow (u,v), and the ambiguity (u+u', v+v') along the edge]

Source: Silvio Savarese

The aperture problem

Actual motion

Source: Silvio Savarese

The aperture problem

Perceived motion

Source: Silvio Savarese

The barber pole illusion

http://en.wikipedia.org/wiki/Barberpole_illusion

Source: Silvio Savarese

What will we learn today?

• Optical flow
• Lucas-Kanade method

Reading: [Szeliski] Chapters 8.4, 8.5; [Fleet & Weiss, 2005] http://www.cs.toronto.edu/pub/jepson/teaching/vision/2503/opticalFlow.pdf

Solving the ambiguity…

B. Lucas and T. Kanade. An iterative image registration technique with an application to stereo vision. In Proceedings of the International Joint Conference on Artificial Intelligence, pp. 674–679, 1981.

• How to get more equations for a pixel?
• Spatial coherence constraint:
  • Assume the pixel's neighbors have the same (u,v)
  • If we use a 5x5 window, that gives us 25 equations per pixel

Source: Silvio Savarese

Lucas-Kanade flow

• Overconstrained linear system:

Least squares solution for d = [u v]^T given by the normal equations (A^T A) d = A^T b:

$$\begin{bmatrix} \sum I_x I_x & \sum I_x I_y \\ \sum I_x I_y & \sum I_y I_y \end{bmatrix} \begin{bmatrix} u \\ v \end{bmatrix} = -\begin{bmatrix} \sum I_x I_t \\ \sum I_y I_t \end{bmatrix}$$

The summations are over all pixels in the K x K window.

Source: Silvio Savarese
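A minimal MATLAB sketch of this windowed solve (the window size K, threshold, and the derivative images Ix, Iy, It from the earlier sketch are assumptions):

Matlab:
% Sketch: Lucas-Kanade flow for the K x K window centered at pixel (r, c).
w  = (K - 1) / 2;                    % K is odd, e.g. K = 5
ix = Ix(r-w:r+w, c-w:c+w);
iy = Iy(r-w:r+w, c-w:c+w);
it = It(r-w:r+w, c-w:c+w);
A  = [ix(:), iy(:)];                 % 25 x 2 for a 5x5 window
b  = -it(:);
M  = A' * A;                         % second moment matrix
if min(eig(M)) > 1e-2                % solvable only if M is well-conditioned
  d = M \ (A' * b);                  % d = [u; v]
else
  d = [0; 0];                        % flat/edge region: flow not recoverable
end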

Conditions for solvability

• Optimal (u, v) satisfies the Lucas-Kanade equation

When is this solvable?
• A^T A should be invertible
• A^T A should not be too small due to noise
  – eigenvalues λ1 and λ2 of A^T A should not be too small
• A^T A should be well-conditioned
  – λ1/λ2 should not be too large (λ1 = larger eigenvalue)

Does this remind you of anything?

Source: Silvio Savarese

M = A^T A is the second moment matrix! (Harris corner detector…)

• Eigenvectors and eigenvalues of A^T A relate to edge direction and magnitude
  • The eigenvector associated with the larger eigenvalue points in the direction of fastest intensity change
  • The other eigenvector is orthogonal to it

Source: Silvio Savarese

Interpreting the eigenvalues

Classification of image points using eigenvalues of the second moment matrix:

[Figure: the (λ1, λ2) plane]
• “Corner”: λ1 and λ2 are large, λ1 ~ λ2
• “Edge”: λ1 >> λ2 or λ2 >> λ1
• “Flat” region: λ1 and λ2 are small

Source: Silvio Savarese

Edge

– gradients very large or very small
– large λ1, small λ2

Source: Silvio Savarese

Low-texture region

– gradients have small magnitude
– small λ1, small λ2

Source: Silvio Savarese

High-texture region

– gradients are different, large magnitudes
– large λ1, large λ2

Source: Silvio Savarese

Errors in Lucas-Kanade

What are the potential causes of errors in this procedure?
• Suppose A^T A is easily invertible
• Suppose there is not much noise in the image

• When our assumptions are violated
  – Brightness constancy is not satisfied
  – The motion is not small
  – A point does not move like its neighbors
    • window size is too large
    • what is the ideal window size?

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Improving accuracy

• Recall our small motion assumption:

$$0 = I(x+u, y+v, t) - I_{t-1}(x, y) \approx I(x, y, t) + I_x u + I_y v - I_{t-1}(x, y)$$

• This is not exact
  – To do better, we need to add higher order terms back in:

$$0 = I(x, y, t) + I_x u + I_y v + \text{higher order terms} - I_{t-1}(x, y)$$

• This is a polynomial root finding problem
  – Can solve using Newton's method (out of scope for this class)
  – Lucas-Kanade method does one iteration of Newton's method
• Better results are obtained via more iterations

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

Iterative Refinement

• Iterative Lucas-Kanade Algorithm
  1. Estimate velocity at each pixel by solving Lucas-Kanade equations
  2. Warp I(t-1) towards I(t) using the estimated flow field - use image warping techniques
  3. Repeat until convergence
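A minimal MATLAB sketch of this warp-and-refine loop (dense_lk is a hypothetical helper that solves the Lucas-Kanade equations at every pixel; the fixed iteration count is an assumption in place of a convergence test):

Matlab:
% Sketch: iterative Lucas-Kanade between frames I1 (t-1) and I2 (t).
[H, W] = size(I1);
[X, Y] = meshgrid(1:W, 1:H);
u = zeros(H, W);  v = zeros(H, W);
for iter = 1:5
  Iw = interp2(I1, X - u, Y - v, 'linear', 0);  % warp I1 towards I2
  [du, dv] = dense_lk(Iw, I2);                  % hypothetical per-pixel LK solve
  u = u + du;  v = v + dv;                      % accumulate the flow estimate
end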

* From Khurram Hassan-Shafique CAP5415 Computer Vision 2003

When do the optical flow assumptions fail?

In other words, in what situations does the displacement of pixel patches not represent physical movement of points in space?

1. Well, TV is based on illusory motion – the set is stationary yet things seem to move

2. A uniform rotating sphere – nothing seems to move, yet it is rotating

3. Changing directions or intensities of lighting can make things seem to move – for example, if the specular highlight on a rotating sphere moves.

4. Muscle movement can make some spots on a cheetah move in the direction opposite to its motion.
– And infinitely more breakdowns of optical flow.

Questions?
