Lecture 12: Multi-view geometry / Stereo III
Tuesday, Oct 23
CS 378/395T
Prof. Kristen Grauman

Outline

• Last lecture:
  – stereo reconstruction with calibrated cameras
  – non-geometric correspondence constraints
• Homogeneous coordinates, projection matrices
• Camera calibration
• Weak calibration/self-calibration
  – Fundamental matrix
  – 8-point algorithm

Review: stereo with calibrated cameras

Camera-centered coordinate systems are related by a known rotation R and translation T. A vector p′ in the second coordinate system has coordinates Rp′ in the first one.

Review: the essential matrix

  p · [T × (Rp′)] = 0
  p · [T×] Rp′ = 0
  Let E = [T×] R, so pᵀEp′ = 0

E is the essential matrix, which relates corresponding image points in both cameras, given the rotation and translation between their coordinate systems.
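As a small numerical sketch (the R and T here are made-up example values, not from the lecture), E can be assembled directly from the definition E = [T×]R:

```python
import numpy as np

def skew(t):
    """3x3 skew-symmetric matrix [t]_x such that [t]_x v = t x v."""
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Essential matrix from a known rotation R and translation T between cameras.
T = np.array([1.0, 0.0, 0.0])   # example: pure horizontal translation
R = np.eye(3)                   # example: no rotation
E = skew(T) @ R

# For a scene point P, its ray p in the first camera and its coordinates
# p2 = P - T in the second camera frame satisfy the epipolar constraint
# p^T E p2 = 0 (the three vectors p, T, Rp2 are coplanar).
P = np.array([2.0, 3.0, 5.0])
residual = P @ E @ (P - T)
```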

Review: stereo with calibrated cameras

• Image pair
• Detect some features
• Compute E from R and T
• Match features using the epipolar and other constraints
• Triangulate for 3d structure

Review: disparity/depth maps

image I(x,y) → Disparity map D(x,y) → image I′(x′,y′)

  (x′, y′) = (x + D(x,y), y)
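A toy sketch of the disparity relation above; the disparity values here are invented for illustration:

```python
import numpy as np

# Hypothetical disparity map D(x, y), indexed as D[row, col] = D[y, x].
D = np.array([[0, 1, 2],
              [1, 2, 3],
              [2, 3, 4]])

x, y = 1, 2            # pixel in the left image I
x_prime = x + D[y, x]  # matching column in the right image I'
# (x', y') = (x + D(x,y), y): the match stays on the same scanline.
```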

Review: correspondence problem

• To find matches in the image pair, assume
  – Most scene points are visible from both views
  – Image regions for the matches are similar in appearance
• Dense or sparse matches satisfy the epipolar constraint
• Additional (non-epipolar) constraints:
  – Similarity
  – Uniqueness
  – Ordering
  – Figural continuity
  – Disparity gradient

Multiple match hypotheses can satisfy the epipolar constraint, but which is correct?

Figure from Gee & Cipolla 1999

Review: correspondence error sources

• Low-contrast / textureless image regions
• Occlusions
• Camera calibration errors
• Poor image resolution
• Violations of brightness constancy (specular reflections)
• Large motions

Homogeneous coordinates

• Extend Euclidean space: add an extra coordinate
• Points are represented by equivalence classes
• Why? This will allow us to write the process of perspective projection as a matrix product

Mapping to homogeneous coordinates:
  2d: (x, y)ᵀ → (x, y, 1)ᵀ
  3d: (x, y, z)ᵀ → (x, y, z, 1)ᵀ

Mapping back from homogeneous coordinates:
  2d: (x, y, w)ᵀ → (x/w, y/w)ᵀ
  3d: (x, y, z, w)ᵀ → (x/w, y/w, z/w)ᵀ
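A minimal sketch of these two mappings in NumPy (function names are our own, for illustration):

```python
import numpy as np

def to_homogeneous(p):
    """Append a 1 to map a Euclidean point to homogeneous coordinates."""
    return np.append(p, 1.0)

def from_homogeneous(p):
    """Divide by the last coordinate to map back to Euclidean coordinates."""
    return p[:-1] / p[-1]

pt = np.array([3.0, 4.0])
h = to_homogeneous(pt)  # (3, 4) -> (3, 4, 1)
# Homogeneous points are equivalence classes: scaling by k != 0
# does not change the underlying Euclidean point.
back = from_homogeneous(2.5 * h)
```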

Homogeneous coordinates

• Equivalence relation: (x, y, z, w) is the same point as (kx, ky, kz, kw) for any k ≠ 0
• Homogeneous coordinates are only defined up to a scale

Perspective projection equations

A scene point in the camera frame projects through the optical center onto the image plane, which sits at the focal length f along the camera's optical axis. The image coordinates are:

  x = fX/Z,  y = fY/Z

Projection matrix for perspective projection

From the pinhole camera model, the same thing can be written in terms of homogeneous coordinates as a matrix product:

  [x1]   [f 0 0 0] [X]
  [x2] = [0 f 0 0] [Y]
  [x3]   [0 0 1 0] [Z]
                   [1]

  x = x1/x3 = fX/Z,  y = x2/x3 = fY/Z
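A quick numerical sketch of the pinhole projection written as a matrix product (the focal length and scene point are made-up values):

```python
import numpy as np

f = 2.0  # focal length, example value

# Perspective projection matrix acting on homogeneous 3d points.
P = np.array([[f, 0., 0., 0.],
              [0., f, 0., 0.],
              [0., 0., 1., 0.]])

X = np.array([1.0, 2.0, 4.0, 1.0])   # homogeneous scene point (X, Y, Z, 1)
x = P @ X                            # homogeneous image point (x1, x2, x3)
x_im, y_im = x[0] / x[2], x[1] / x[2]
# Dividing out the third coordinate recovers x = fX/Z, y = fY/Z.
```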

Projection matrix for orthographic projection

Orthographic projection simply drops the depth coordinate:

  x = X,  y = Y

Camera parameters

• Extrinsic: location and orientation of the camera frame with respect to the reference frame
• Intrinsic: how to map pixel coordinates to image plane coordinates

(Figure: world reference frame and camera 1 frame.)

Rigid transformations

Combinations of rotation and translation:
• Translation: add values to coordinates
• Rotation: matrix multiplication

Rotation about coordinate axes in 3d

Express a 3d rotation as a series of rotations around the coordinate axes by angles α, β, γ. The overall rotation is the product of these elementary rotations:

  R = Rx Ry Rz

Extrinsic camera parameters

  Pc = R(Pw − T)

where Pw is the point in the world reference frame and Pc is the point in the camera reference frame.
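A small sketch of composing the elementary rotations and applying the extrinsic transform; the angles, translation, and point are arbitrary example values:

```python
import numpy as np

# Elementary rotations about the x, y, z axes (angles in radians).
def Rx(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[1., 0., 0.], [0., c, -s], [0., s, c]])

def Ry(b):
    c, s = np.cos(b), np.sin(b)
    return np.array([[c, 0., s], [0., 1., 0.], [-s, 0., c]])

def Rz(g):
    c, s = np.cos(g), np.sin(g)
    return np.array([[c, -s, 0.], [s, c, 0.], [0., 0., 1.]])

alpha, beta, gamma = 0.1, 0.2, 0.3       # example angles
R = Rx(alpha) @ Ry(beta) @ Rz(gamma)     # overall rotation R = Rx Ry Rz

# Extrinsic transform: world point -> camera frame, Pc = R (Pw - T).
T = np.array([1.0, 0.0, 0.0])            # camera location in world coords
Pw = np.array([2.0, 3.0, 4.0])
Pc = R @ (Pw - T)
```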

Intrinsic camera parameters

• In the camera reference frame, the projected point has coordinates Pc = (X, Y, Z)
• Ignoring any geometric distortions from the optics, we can describe the intrinsic parameters by:

  x = −(x_im − o_x) s_x
  y = −(y_im − o_y) s_y

where (x_im, y_im) are the coordinates of the image point in pixel units, (o_x, o_y) are the coordinates of the image center in pixel units, and (s_x, s_y) give the effective size of a pixel (in mm).

Camera parameters

• Substituting the previous equations describing the intrinsic and extrinsic parameters, we can relate pixel coordinates to world points:

  −(x_im − o_x) s_x = f · R1ᵀ(Pw − T) / R3ᵀ(Pw − T)
  −(y_im − o_y) s_y = f · R2ᵀ(Pw − T) / R3ᵀ(Pw − T)

where Riᵀ is row i of the rotation matrix R.

Linear version of perspective projection equations

• This can be rewritten as a matrix product using homogeneous coordinates:

  [x1]               [Xw]
  [x2] = Mint Mext   [Yw]
  [x3]               [Zw]
                     [1 ]

  x_im = x1/x3,  y_im = x2/x3

where (Xw, Yw, Zw) is the point in world coordinates, and

         [−f/s_x    0     o_x]          [r11 r12 r13 −R1ᵀT]
  Mint = [   0   −f/s_y   o_y]   Mext = [r21 r22 r23 −R2ᵀT]
         [   0      0      1 ]          [r31 r32 r33 −R3ᵀT]

The product M = Mint Mext is a single projection matrix encoding both the extrinsic and intrinsic parameters.

Calibrating a camera

• Compute intrinsic and extrinsic parameters using observed camera data

Main idea:
• Place a "calibration object" with known geometry in the scene
• Get correspondences
• Solve for the mapping from scene to image: estimate M = Mint Mext

Estimating the projection matrix

Let Mi be row i of the matrix M, and write Pw in homogeneous coordinates. Then for each feature point:

  x_im = (M1 · Pw) / (M3 · Pw)   ⇒   0 = (M1 − x_im M3) · Pw
  y_im = (M2 · Pw) / (M3 · Pw)   ⇒   0 = (M2 − y_im M3) · Pw

For a given feature point, in matrix form:

  [ Pwᵀ   0ᵀ   −x_im Pwᵀ ] [M1ᵀ]
  [ 0ᵀ    Pwᵀ  −y_im Pwᵀ ] [M2ᵀ] = 0
                           [M3ᵀ]

Expanding this to see the elements:

  [Xw Yw Zw 1   0  0  0 0   −x_im Xw  −x_im Yw  −x_im Zw  −x_im]
  [ 0  0  0 0  Xw Yw Zw 1   −y_im Xw  −y_im Yw  −y_im Zw  −y_im]  m = 0

where the vector m stacks the rows of the matrix M.

Estimating the projection matrix

This is true for every feature point, so we can stack up the n observed image features and their associated 3d points in a single equation Pm = 0:

  [X(1) Y(1) Z(1) 1   0    0    0   0   −x(1)X(1)  −x(1)Y(1)  −x(1)Z(1)  −x(1)]        [0]
  [ 0    0    0   0  X(1) Y(1) Z(1) 1   −y(1)X(1)  −y(1)Y(1)  −y(1)Z(1)  −y(1)]   m =  [⋮]
  [ …                                                                         ]
  [X(n) Y(n) Z(n) 1   0    0    0   0   −x(n)X(n)  −x(n)Y(n)  −x(n)Z(n)  −x(n)]
  [ 0    0    0   0  X(n) Y(n) Z(n) 1   −y(n)X(n)  −y(n)Y(n)  −y(n)Z(n)  −y(n)]        [0]

(superscripts index the n points; X, Y, Z are world coordinates and x, y image coordinates)

Summary: camera calibration

• Associate image points with scene points on an object with known geometry
• Use together with the perspective projection relationship to estimate the projection matrix
• (Can also solve for the explicit parameters themselves)
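The stacked least-squares solve can be sketched as a minimal direct linear transform (DLT) in NumPy; the calibration matrix and points below are made-up synthetic data, not from the lecture:

```python
import numpy as np

def estimate_projection_matrix(world_pts, image_pts):
    """Estimate the 3x4 projection matrix M (up to scale) by stacking the
    constraints P m = 0 from all points and solving with least squares:
    m is the right singular vector of the smallest singular value of P."""
    rows = []
    for (X, Y, Z), (x, y) in zip(world_pts, image_pts):
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -x * X, -x * Y, -x * Z, -x])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -y * X, -y * Y, -y * Z, -y])
    _, _, Vt = np.linalg.svd(np.asarray(rows))
    return Vt[-1].reshape(3, 4)

# Synthetic check: project known 3d points with a made-up M, then recover it.
M_true = np.array([[800.,   0., 320., 10.],
                   [  0., 800., 240., 20.],
                   [  0.,   0.,   1.,  5.]])
world = np.random.default_rng(0).uniform(-1., 1., (8, 3))
homog = np.hstack([world, np.ones((8, 1))])
proj = homog @ M_true.T
image = proj[:, :2] / proj[:, 2:3]

M_est = estimate_projection_matrix(world, image)
M_est /= M_est[2, 3] / M_true[2, 3]   # fix the arbitrary scale (and sign)
```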

Solve for the mij's (the calibration information) with least squares. [F&P Section 3.1]

When would we calibrate this way?

• Makes sense when the geometry of the system is not going to change over time
• …When would it change?
  – Dynamic camera system

Self-calibration

• Want to estimate world geometry without requiring calibrated cameras
  – Archival videos
  – Photos from multiple unrelated users
• We can still reconstruct 3d structure, up to certain ambiguities, if we can find correspondences between points…

Uncalibrated case

From before, the essential matrix relates rays in camera coordinates, pᵀEp = 0, and perspective projection gives p = Mint Mext Pw.

Image pixel coordinates relate to camera coordinates through the internal calibration matrices:

  p̂(left) = M_left,int⁻¹ p(left)
  p̂(right) = M_right,int⁻¹ p(right)

Uncalibrated case: fundamental matrix

Substituting (dropping the "int" subscript; these are still the internal parameter matrices):

  (M_right⁻¹ p_right)ᵀ E (M_left⁻¹ p_left) = 0

So:

  p_rightᵀ (M_right⁻ᵀ E M_left⁻¹) p_left = 0
  p_rightᵀ F p_left = 0

F is the fundamental matrix.

Fundamental matrix

  F = M_right⁻ᵀ E M_left⁻¹        p_rightᵀ F p_left = 0

• Relates pixel coordinates in the two views
• More general form than the essential matrix: we remove the need to know the intrinsic parameters
• Cameras are uncalibrated: we don't know E or the left or right Mint matrices
• If we estimate the fundamental matrix from correspondences in pixel coordinates, we can reconstruct epipolar geometry without intrinsic or extrinsic parameters
• Estimate F from 8+ point correspondences

Computing F from correspondences

Computing F from correspondences

Each point correspondence generates one constraint on F:

  p_rightᵀ F p_left = 0

Collect n of these constraints. With n = 8 correspondences, solve the resulting linear system for F; if n > 8, use the least squares solution.

Robust computation

• Find corners
• Unguided matching: local search, cross-correlation to get some seed matches
• Compute F and epipolar geometry: find an F that is consistent with many of the seed matches
• Now guide matching: use F to restrict the search to epipolar lines
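A minimal sketch of the 8-point estimation described above, assuming NumPy and made-up synthetic cameras with identity intrinsics (so F coincides with E). This is an illustration only: no coordinate normalization or RANSAC, which a robust implementation would add:

```python
import numpy as np

def eight_point(pts_left, pts_right):
    """Estimate F from n >= 8 pixel correspondences. Each pair gives one
    linear constraint p_right^T F p_left = 0 on the 9 entries of F."""
    A = []
    for (x, y), (xp, yp) in zip(pts_left, pts_right):
        A.append([xp * x, xp * y, xp, yp * x, yp * y, yp, x, y, 1.0])
    # Least-squares solution up to scale: smallest right singular vector.
    _, _, Vt = np.linalg.svd(np.asarray(A))
    F = Vt[-1].reshape(3, 3)
    # Enforce rank 2: a valid fundamental matrix is singular.
    U, S, Vt = np.linalg.svd(F)
    S[2] = 0.0
    return U @ np.diag(S) @ Vt

def skew(t):
    return np.array([[0., -t[2], t[1]],
                     [t[2], 0., -t[0]],
                     [-t[1], t[0], 0.]])

# Synthetic two-view setup (arbitrary rotation about z, arbitrary T).
a = 0.2
R = np.array([[np.cos(a), -np.sin(a), 0.],
              [np.sin(a),  np.cos(a), 0.],
              [0., 0., 1.]])
T = np.array([1.0, 0.3, 0.2])
rng = np.random.default_rng(1)
Pw = rng.uniform(-1., 1., (12, 3)) + np.array([0., 0., 5.])  # points in front
p_left = Pw / Pw[:, 2:3]          # left camera at the origin
Pc2 = (Pw - T) @ R.T              # Pc = R (Pw - T)
p_right = Pc2 / Pc2[:, 2:3]

F = eight_point(p_left[:, :2], p_right[:, :2])
F /= np.linalg.norm(F)
```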

RANSAC application: robust computation

Example (Hartley & Zisserman p. 126):
• Interest points (Harris corners) in left and right images: about 500 pts/image, 640x480 resolution
• Putative correspondences (268): best match, SSD < 20
• Outliers (117) and inliers (151), with t = 1.25 pixel; 43 iterations
• Final inliers (262)

Figure by Gee and Cipolla, 1999

Need for multi-view geometry and 3d reconstruction

Applications include:
• 3d tracking
• Depth-based grouping
• Image rendering and generating interpolated or "virtual" viewpoints
• Interactive video

Z-keying for virtual reality

• Merge synthetic and real images given depth maps

Kanade et al., CMU, 1995
http://www.cs.cmu.edu/afs/cs/project/stereo-machine/www/z-key.html

Virtualized Reality™

• Capture 3d shape from multiple views, texture from images
• Use them to generate new views on demand

Kanade et al., CMU
http://www.cs.cmu.edu/~virtualized-reality/3manbball_new.html

Virtual viewpoint video

C. Zitnick et al., High-quality video view interpolation using a layered representation, SIGGRAPH 2004.
http://research.microsoft.com/IVM/VVV/

Photo tourism

Noah Snavely, Steven M. Seitz, Richard Szeliski, "Photo tourism: Exploring photo collections in 3D," ACM Transactions on Graphics (SIGGRAPH Proceedings), 25(3), 2006, 835-846.
http://phototour.cs.washington.edu/, http://labs.live.com/photosynth/

Coming up

• Tuesday: Local invariant features – Read Lowe paper on SIFT

• Problem set 3 out next Tuesday, due 11/13 • Graduate students: remember paper reviews and extensions, due 12/6