
Agenda

• Rotations

• Camera calibration

• RANSAC

Geometric Transformations

Computer Vision: Algorithms and Applications (September 3, 2010 draft), p. 164

Transformation        Matrix           # DoF   Preserves
translation           [ I | t ] (2×3)    2     orientation
rigid (Euclidean)     [ R | t ] (2×3)    3     lengths
similarity            [ sR | t ] (2×3)   4     angles
affine                [ A ] (2×3)        6     parallelism
projective            [ H̃ ] (3×3)        8     straight lines

Table 3.5: Hierarchy of 2D coordinate transformations. Each transformation also preserves the properties listed in the rows below it; i.e., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0ᵀ 1] row to form a full 3×3 matrix for homogeneous coordinate transformations.

Let's define families of transformations by the properties that they preserve.

Examples of such transformations are based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.

In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out. If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?

procedure forwardWarp(f, h, out g):

    For every pixel x in f(x):

    1. Compute the destination location x′ = h(x).

    2. Copy the pixel f(x) to g(x′).

Algorithm 3.1: Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).

Rotations

Linear transformations that preserve distances and angles
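As a concrete sketch of Algorithm 3.1 (a hypothetical NumPy implementation — the function names and the nearest-neighbor rounding are my own choices, not from the text):

```python
import numpy as np

def forward_warp(f, h):
    """Forward-warp image f through the coordinate transform x' = h(x).

    f : (H, W) source image; h : maps integer (x, y) -> (x', y') floats.
    Destination coordinates are rounded to the nearest pixel, so holes
    (unfilled destination pixels) and collisions (overwrites) can occur.
    """
    H, W = f.shape
    g = np.zeros_like(f)
    for y in range(H):
        for x in range(W):
            xp, yp = h(x, y)
            xi, yi = int(round(xp)), int(round(yp))
            if 0 <= xi < W and 0 <= yi < H:
                g[yi, xi] = f[y, x]  # later pixels overwrite earlier ones
    return g

# Example: pure translation by (2, 1)
f = np.arange(12, dtype=float).reshape(3, 4)
g = forward_warp(f, lambda x, y: (x + 2, y + 1))
```

The holes and collisions visible in this sketch are exactly the problems the question above is hinting at; inverse warping (looping over destination pixels and sampling the source) avoids them.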

Definition: an orthogonal transformation preserves dot products:

    aᵀb = T(a)ᵀT(b), where T(a) = Aa, a ∈ ℝⁿ, A ∈ ℝⁿˣⁿ
    aᵀb = aᵀAᵀAb  ⟺  AᵀA = I

[can conclude by setting a, b = coordinate vectors]

Defn: A is a rotation matrix if AᵀA = I and det(A) = +1.
Defn: A is a reflection matrix if AᵀA = I and det(A) = −1.

2D Rotations

    R = [cos θ  −sin θ]
        [sin θ   cos θ]
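A quick numerical check of the two definitions above (a hypothetical NumPy sketch; `rot2d` is my own name): the rotation satisfies AᵀA = I with det = +1, while a reflection satisfies AᵀA = I with det = −1.

```python
import numpy as np

def rot2d(theta):
    """2D rotation matrix [[cos t, -sin t], [sin t, cos t]]."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

R = rot2d(0.7)                 # orthogonal, det = +1 -> a rotation
F = np.diag([1.0, -1.0])       # orthogonal, det = -1 -> a reflection
```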

1 DOF

3D Rotations

      [X]   [r11 r12 r13] [X]
    R [Y] = [r21 r22 r23] [Y]
      [Z]   [r31 r32 r33] [Z]

Think of this as a change of basis, where the rows rᵢ = R(i,:) are orthonormal basis vectors.

[Figure: a rotated coordinate frame with orthonormal axes r1, r2, r3]

How many DOFs?

3 = (2 to point r1) + (1 to rotate about r1)

Shears

        [1    hxy  hxz  0]
    Ã = [hyx  1    hyz  0]
        [hzx  hzy  1    0]
        [0    0    0    1]

hxy shears y into x.


3D Rotations

Lots of parameterizations that try to capture 3 DOFs

• 3D rotations are fundamentally more complex than in 2D!
• 2D: amount of rotation
• 3D: amount and axis of rotation

Helpful one for vision: the axis-angle representation. Represent a 3D rotation with a vector that represents the axis of rotation, and an angle of rotation about that vector.

[Figure: 2D -vs- 3D rotation]

05-3DTransformations.key - February 9, 2015

Recall: cross-product

Dot product: a · b = ||a|| ||b|| cos θ

Cross product:

            |i   j   k |
    a × b = |a1  a2  a3| = i |a2 a3| − j |a1 a3| + k |a1 a2|
            |b1  b2  b3|     |b2 b3|     |b1 b3|     |b1 b2|

Cross-product matrix:

                           [ 0   −a3   a2]
    a × b = âb, where â =  [ a3   0   −a1]
                           [−a2   a1   0 ]

Approach

ω ∈ ℝ³, ||ω|| = 1

[Figure: x decomposed into x∥ (parallel to ω) and x⊥ (perpendicular to ω), rotated by θ about ω]

1. Write x as the sum of components parallel and perpendicular to ω.
2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω.

    R = I + ω̂ sin θ + ω̂ω̂(1 − cos θ)

[Rx can be simplified to cross- and dot-product computations]

Exponential

ω ∈ ℝ³, ||ω|| = 1

    R = exp(v̂), where v = ωθ
      = I + v̂ + (1/2!)v̂² + …

[standard Taylor series expansion of exp(x) at x = 0: 1 + x + (1/2!)x² + …]
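The axis-angle formula above can be checked numerically. This is a hypothetical NumPy sketch (function names are mine): `rodrigues` implements R = I + ω̂ sin θ + ω̂ω̂(1 − cos θ), which agrees with the matrix exponential of ω̂θ.

```python
import numpy as np

def hat(w):
    """Cross-product (skew-symmetric) matrix: hat(w) @ x == np.cross(w, x)."""
    return np.array([[0.0, -w[2], w[1]],
                     [w[2], 0.0, -w[0]],
                     [-w[1], w[0], 0.0]])

def rodrigues(omega, theta):
    """Axis-angle to rotation: R = I + sin(t) w^ + (1 - cos(t)) w^ w^."""
    W = hat(omega / np.linalg.norm(omega))
    return np.eye(3) + np.sin(theta) * W + (1 - np.cos(theta)) * (W @ W)

# 90° rotation about the z-axis
R = rodrigues(np.array([0.0, 0.0, 1.0]), np.pi / 2)
```

For small θ, R x ≈ x + (ωθ) × x, which is the implication stated below.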

Implication: we can approximate the change in position due to a small rotation as v × x, where v = ωθ.

Agenda

• Rotations

• Camera calibration

• Homography

• RANSAC

Perspective

[Figure: perspective projection in a right-handed coordinate system — the 3D point (X,Y,Z) projects through the COP to the image point (x,y,1); axes x, y, z]

    x = fX/Z
    y = fY/Z

Perspective projection revisited

      [x]   [f 0 0] [X]
    λ [y] = [0 f 0] [Y]
      [1]   [0 0 1] [Z]

Given (X,Y,Z) and f, compute (x,y) and λ:

    λx = fX,  λ = Z  ⟹  x = fX/Z

Special case: f = 1

Natural geometric intuition:
• the 3D point is obtained by scaling the ray pointed at the image coordinate
• the scale factor λ = true depth of the point
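The projection and the ray-scaling intuition above can be sketched numerically (a hypothetical NumPy example; function names are mine):

```python
import numpy as np

def project(P, f=1.0):
    """Pinhole projection x = f*X/Z, y = f*Y/Z for an (N, 3) array of points."""
    return f * P[:, :2] / P[:, 2:3]

def backproject(xy, Z):
    """f = 1 case: the 3D point is the ray (x, y, 1) scaled by its depth Z."""
    return np.concatenate([xy * Z, [Z]])

pts = np.array([[2.0, 4.0, 2.0]])
xy = project(pts, f=1.0)       # the image point
P = backproject(xy[0], 2.0)    # scaling the ray by the depth recovers the point
```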

[Figure: the ray from the COP through image point (x,y,1) to the 3D point (X,Y,Z)]

      [x]   [X]
    λ [y] = [Y]
      [1]   [Z]

[Aside: given an image with a focal length f, resize it by 1/f to obtain a unit-focal-length image]

Homogeneous notation

    [x]   [X]        [x]   [X]
    [y] ~ [Y]   or   [y] ≡ [Y]
    [z]   [Z]        [z]   [Z]

For now, think of the above as shorthand notation for:

             [x]   [X]
    ∃λ s.t. λ[y] = [Y]
             [z]   [Z]

Camera projection

      [x]   [f 0 0] [r11 r12 r13 tx] [X]
    λ [y] = [0 f 0] [r21 r22 r23 ty] [Y]
      [1]   [0 0 1] [r31 r32 r33 tz] [Z]
                                     [1]

Camera intrinsic matrix K (can include skew & non-square pixel size); camera extrinsics (rotation and translation); 3D point in world coordinates.

[Figure: camera frame with axes r1, r2, r3 and translation T relative to the world coordinate frame]

[Aside: homogeneous notation is shorthand for equality up to a scale factor]

Fancier intrinsics

    xs = sx·x  }  non-square pixels
    ys = sy·y

    x′ = xs + ox  }  shifted origin
    y′ = ys + oy

    x″ = x′ + sθ·y′  }  skewed image axes

        [sx  sθ  ox] [f 0 0]   [f·sx  f·sθ  ox]
    K = [0   sy  oy] [0 f 0] = [0     f·sy  oy]
        [0   0   1 ] [0 0 1]   [0     0     1 ]

Notation

[Using Matlab's rows × columns]

      [x]   [f·sx  f·sθ  ox] [r11 r12 r13 tx] [X]
    λ [y] = [0     f·sy  oy] [r21 r22 r23 ty] [Y]
      [1]   [0     0     1 ] [r31 r32 r33 tz] [Z]
                                              [1]

          = K(3×3) [R(3×3) | T(3×1)] [X Y Z 1]ᵀ

          = M(3×4) [X Y Z 1]ᵀ

Claims (without proof):
1. A 3×4 matrix M = [A | b] can be a camera matrix iff A is non-singular (det(A) ≠ 0).
2. M is determined only up to a scale factor.

Notation (more)

    M(3×4) [X Y Z 1]ᵀ = [A(3×3) | b(3×1)] [X Y Z 1]ᵀ = A [X Y Z]ᵀ + b

        [m1ᵀ]        [a1ᵀ]        [b1]
    M = [m2ᵀ],   A = [a2ᵀ],   b = [b2]
        [m3ᵀ]        [a3ᵀ]        [b3]

Applying the projection matrix:

    x = (1/λ)([X Y Z]·a1 + b1)
    y = (1/λ)([X Y Z]·a2 + b2)
    λ = [X Y Z]·a3 + b3

Set of 3D points that project to x = 0:  [X Y Z]·a1 + b1 = 0

Set of 3D points that project to y = 0:  [X Y Z]·a2 + b2 = 0

Set of 3D points that project to x = ∞ or y = ∞:  [X Y Z]·a3 + b3 = 0

The rows of the projection matrix describe the 3 planes defined by the image coordinate system.

[Figure: the three planes with normals a1, a2, a3 meeting at the COP; image plane with axes x, y]

Other geometric properties

[Figure: ray through the COP, (x,y), and (X,Y,Z)]

What is the set of (X,Y,Z) points that project to the same (x,y)?

    [X]
    [Y] = λw + b̄,  where  w = A⁻¹ [x y 1]ᵀ,  b̄ = −A⁻¹b
    [Z]

What is the position of the COP / pinhole?

      [X]              [X]
    A [Y] + b = 0  ⟹  [Y] = −A⁻¹b
      [Z]              [Z]

Affine Cameras
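The two facts above — the viewing ray of a pixel and the camera center C = −A⁻¹b — can be sketched on a toy camera (a hypothetical NumPy example; the camera and names are mine):

```python
import numpy as np

# Toy camera: K = I, R = I, t = (0, 0, 5)  ->  M = [A | b]
M = np.hstack([np.eye(3), np.array([[0.0], [0.0], [5.0]])])
A, b = M[:, :3], M[:, 3]

# Center of projection: A C + b = 0  =>  C = -A^{-1} b
C = -np.linalg.solve(A, b)

def ray_point(x, y, lam):
    """A 3D point projecting to (x, y): lam * A^{-1}[x, y, 1]^T - A^{-1} b."""
    w = np.linalg.solve(A, np.array([x, y, 1.0]))
    return lam * w + C   # C is exactly -A^{-1} b

P = ray_point(0.0, 0.0, 3.0)   # any point on this ray reprojects to (0, 0)
```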

    m3ᵀ = [0 0 0 1]  ⟹  x = [X Y Z]·a1 + b1,   y = [X Y Z]·a2 + b2

Image coordinates (x,y) are an affine function of world coordinates (X,Y,Z).

Affine transformations = linear transformations plus an offset

• Example: Weak-perspective projection model • Projection defined by 8 parameters • Parallel lines are projected to parallel lines • The transformation can be written as a direct linear transformation

Geometric Transformations

Euclidean (trans + rot): preserves lengths + angles
Affine: preserves parallel lines
Projective: preserves lines

[Figure: nested sets — Euclidean ⊂ Affine ⊂ Projective]

Agenda

• Rotations

• Camera calibration

• Homography

• RANSAC

Calibration: recover M from scene points P1,…,PN and the corresponding projections p1,…,pN in the image plane.

Find M that minimizes the distance between the actual points in the image, pi, and their predicted projections MPi

Problems:
• The projection is (in general) non-linear
• M is defined up to an arbitrary scale factor

PnP = Perspective n-Point

The math for the calibration procedure follows a recipe that is used in many (most?) problems involving camera geometry, so it's worth remembering:

Write relation between image point, projection matrix, and point in space:

pi ≡ MPi

Write non-linear relations between coordinates:

    ui = m1ᵀPi / m3ᵀPi
    vi = m2ᵀPi / m3ᵀPi

Make them linear:

    m1ᵀPi − (m3ᵀPi)·ui = 0
    m2ᵀPi − (m3ᵀPi)·vi = 0

Write them in matrix form:

    [Piᵀ  0ᵀ   −ui·Piᵀ]              [m1]
    [0ᵀ   Piᵀ  −vi·Piᵀ] m = 0,   m = [m2]
                                     [m3]

Put all the relations for all the points into a single matrix:

    [P1ᵀ  0ᵀ   −u1·P1ᵀ]
    [0ᵀ   P1ᵀ  −v1·P1ᵀ]
    [        ⋮        ] m = 0   (vector of 0's)
    [PNᵀ  0ᵀ   −uN·PNᵀ]
    [0ᵀ   PNᵀ  −vN·PNᵀ]

In the noise-free case: Lm = 0. What about the noisy case?

    min ||Lm||²  s.t.  ||m||² = 1

Solution: the minimum right singular vector of L (or the eigenvector of LᵀL with smallest eigenvalue).
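The recipe above can be sketched end to end (a hypothetical NumPy implementation; names, the synthetic camera, and the point set are my own):

```python
import numpy as np

def calibrate_dlt(P, p):
    """Estimate the 3x4 camera matrix M from world points P (N, 3) and
    pixel coords p (N, 2), via min ||L m|| s.t. ||m|| = 1 (SVD)."""
    rows = []
    for (X, Y, Z), (u, v) in zip(P, p):
        Ph = [X, Y, Z, 1.0]
        rows.append(np.concatenate([Ph, np.zeros(4), [-u * w for w in Ph]]))
        rows.append(np.concatenate([np.zeros(4), Ph, [-v * w for w in Ph]]))
    _, _, Vt = np.linalg.svd(np.array(rows))
    return Vt[-1].reshape(3, 4)   # minimum right singular vector

# Noise-free synthetic check: true camera M = [I | (0, 0, 5)]
P = np.array([[0.0, 0, 1], [1, 0, 2], [0, 1, 3], [1, 1, 1],
              [2, 1, 2], [1, 2, 4], [3, 0, 1]])
p = np.stack([P[:, 0] / (P[:, 2] + 5.0), P[:, 1] / (P[:, 2] + 5.0)], axis=1)
M = calibrate_dlt(P, p)
```

In practice the point coordinates are usually normalized first to improve the conditioning of L.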

Is this the right error to minimize?

If not, what is?

Ideal error

[Figure: reprojection error — the predicted projection MPi vs. the measured image point (ui, vi)]

    Error(M) = Σi [ (ui − m1·Pi / m3·Pi)² + (vi − m2·Pi / m3·Pi)² ]

Initialize the nonlinear optimization with the "algebraic" solution.

Radial Lens Distortions

No Distortion Barrel Distortion Pincushion Distortion

Correcting Radial Lens Distortions

Before After

http://www.grasshopperonline.com/barrel_distortion_correction_software.html

Overall approach

Minimize the reprojection error: Error(M, k's), where the k's are the radial distortion coefficients.

Initialize with the algebraic solution (approaches in the literature are based on various assumptions).

Revisiting

Place the world coordinate frame on the object plane (so the plane is Z = 0).

      [x]   [f 0 0] [r11 r12 r13 tx] [X]
    λ [y] = [0 f 0] [r21 r22 r23 ty] [Y]
      [1]   [0 0 1] [r31 r32 r33 tz] [0]
                                     [1]

Projection of planar points

          [f 0 0] [r11 r12 tx] [X]
        = [0 f 0] [r21 r22 ty] [Y]
          [0 0 1] [r31 r32 tz] [1]

          [f·r11  f·r12  f·tx] [X]
        = [f·r21  f·r22  f·ty] [Y]
          [r31    r32    tz  ] [1]

Convert between a 2D location on the object plane and an image coordinate with a 3×3 matrix H. (The above holds for any intrinsic matrix K.)

Two-views of a plane

Image correspondences

      [x1]      [X]          [x2]      [X]
    λ1[y1] = H1 [Y]        λ2[y2] = H2 [Y]
      [1 ]      [1]          [1 ]      [1]

      [x2]             [x1]
    λ [y2] = H2 H1⁻¹  [y1]
      [1 ]             [1 ]

[LHS and RHS are related by a scale factor]
[Aside: H is usually invertible]

      [x2]     [x1]
    λ [y2] = H [y1]
      [1 ]     [1 ]

Computing homography projections

Given (x1,y1) and H, how do we compute (x2,y2)?

      [x2]   [a b c] [x1]
    λ [y2] = [d e f] [y1]
      [1 ]   [g h i] [1 ]

    λ·x2 = a·x1 + b·y1 + c
    x2 = (a·x1 + b·y1 + c) / (g·x1 + h·y1 + i)
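The map above — multiply by H in homogeneous coordinates, then divide — can be sketched as follows (a hypothetical NumPy example; the similarity used as H is my own choice):

```python
import numpy as np

def apply_homography(H, pts):
    """Map (N, 2) points through a 3x3 homography with the perspective divide."""
    ph = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return ph[:, :2] / ph[:, 2:3]

# A similarity (scale 2, shift (1, 0)) is a special case of H
H = np.array([[2.0, 0, 1], [0, 2.0, 0], [0, 0, 1.0]])
out = apply_homography(H, np.array([[3.0, 4.0]]))   # -> [[7.0, 8.0]]
```

The operation is linear in homogeneous coordinates, but because of the perspective divide, the final pixel coordinates are linear in neither the entries of H nor (x1, y1) in general.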

Is this operation linear in H? In (x1, y1)?

Estimating homographies

Given corresponding 2D points in left and right image, estimate H

Image correspondences

    x2·(g·x1 + h·y1 + i) = a·x1 + b·y1 + c
    ⋮

    A·H(:) = 0   (homogeneous linear system; 0 is a vector of zeros)

How many corresponding points are needed? How many degrees of freedom does H have?

Given corresponding 2D points in left and right image, estimate H

Image correspondences

    A·H(:) = 0   (vector of zeros)

H is determined only up to scale factor (8 DOFs) Need 4 points minimum. How to handle more points?

    min ||A·H(:)||²  s.t.  ||H(:)||² = 1

Solution: the minimum right singular vector of A (the eigenvector of AᵀA with smallest eigenvalue).

"Frontalizing" planes using homographies
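The 4-point DLT above can be sketched directly (a hypothetical NumPy implementation; the function name and the translation test case are mine):

```python
import numpy as np

def estimate_homography(p1, p2):
    """DLT: find H (up to scale) with [x2, y2, 1] ~ H [x1, y1, 1] from N >= 4
    point pairs p1, p2 of shape (N, 2). Solves min ||A h||, ||h|| = 1."""
    rows = []
    for (x1, y1), (x2, y2) in zip(p1, p2):
        rows.append([x1, y1, 1, 0, 0, 0, -x2 * x1, -x2 * y1, -x2])
        rows.append([0, 0, 0, x1, y1, 1, -y2 * x1, -y2 * y1, -y2])
    _, _, Vt = np.linalg.svd(np.array(rows, dtype=float))
    return Vt[-1].reshape(3, 3)   # minimum right singular vector of A

# Sanity check: 4 points related by a pure translation (2, 3)
p1 = np.array([[0.0, 0], [1, 0], [0, 1], [1, 1]])
p2 = p1 + np.array([2.0, 3.0])
H = estimate_homography(p1, p2)
H = H / H[2, 2]   # fix the arbitrary scale
```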

Estimate the homography from (at least) 4 pairs of corresponding points (e.g., the corners of a quad/rectangle). Apply the homography to all (x,y) coordinates inside the target rectangle to compute the source pixel location.

From the Lecture 4 notes ("Planar Scenes and Homography"): Depth cues (parallax) can only be recovered when T is nonzero. Looking at the homography equation, the limit of H as d approaches infinity is R. Thus any pair of images of an arbitrary scene captured by a purely rotating camera is related by a planar homography. A planar panorama can be constructed by capturing many overlapping images at different rotations, picking an image to be a reference, and then finding corresponding points between the overlapping images. The pairwise homographies are derived from the corresponding points, forming a mosaic that typically is shaped like a "bow-tie," as images farther away from the reference are warped outward to fit the homography. The figure below is from Pollefeys and Hartley & Zisserman.

Special case of 2 views: rotations about the camera center

Can be modeled as planar transformations, regardless of scene geometry!

[Figure 5: Example output for Q6.1 — original images incline_L.jpg (img1) and incline_R.jpg (img2), left and center, and img2 warped to fit img1 (right). Notice that the warped image clips out of the image; this is fixed in Q6.2.]

[Figure 6: Final panorama view, with the homography estimated with RANSAC.]

H2to1 = computeH(p1, p2)
• Inputs: p1 and p2 should be 2×N matrices of corresponding (x, y)ᵀ coordinates between two images.
• Outputs: H2to1 should be a 3×3 matrix encoding the homography that best matches the linear equation derived above (in the least-squares sense). Hint: remember that a homography is only determined up to scale. The Matlab functions eig() or svd() will be useful. Note that this function can be written without an explicit for-loop over the data points.

6 Stitching it together: Panoramas (30 pts)

We can also use homographies to create a panorama image from multiple views of the same scene. This is possible, for example, when there is no camera translation between the views (e.g., only rotation about the camera center), as we saw in Q4.2. First, you will generate panoramas using matched point correspondences between images using the BRIEF matching you implemented in Q2.4. We will assume that there is no error in your matched point correspondences between images (although there might be some errors). In the next section you will extend the technique to use (potentially noisy) keypoint matches.

You will need to use the provided function warp_im = warpH(im, H, out_size), which warps image im using the homography transform H. The pixels in warp_im are sampled at coordinates in the rectangle (1, 1) to (out_size(2), out_size(1)). The coordinates of the pixels in the source image are taken to be (1, 1) to (size(im,2), size(im,1)) and transformed according to H.

Q6.1 (15 pts): In this problem you will implement and use the function (stub provided in matlab/imageStitching.m):

[panoImg] = imageStitching(img1, img2, H2to1)

on two images from the Duquesne incline. This function accepts two images and the output from the homography estimation function.

Submit a folder matlab containing all the .m and .mat files you were asked to write and generate, and a pdf named writeup.pdf containing the results, explanations and images asked for in the assignment, along with the answers to the questions on homographies. Submit all the code needed to make your panorama generator run. Make sure all the .m files that need to run are accessible from the matlab folder without any editing of the path variable. If you downloaded and used a feature detector for the extra credit, include the code with your submission and mention it in your writeup. You may leave the data folder in your submission, but it is not needed. Please zip your homework as usual and submit it using blackboard.

Appendix: Image Blending

Note: this section is not for credit and is for informational purposes only. For overlapping pixels, it is common to blend the values of both images. You can simply average the values, but that will leave a seam at the edges of the overlapping images. Alternatively, you can obtain a blending value for each image that fades one image into the other. To do this, first create a mask like this for each image you wish to blend:

mask = zeros(size(im,1), size(im,2));
mask(1,:) = 1; mask(end,:) = 1; mask(:,1) = 1; mask(:,end) = 1;
mask = bwdist(mask, 'city');
mask = mask/max(mask(:));

The function bwdist computes the distance transform of the binarized input image, so this mask will be zero at the borders and 1 at the center of the image. You can warp this mask just as you warped your images. How would you use the mask weights to compute a linear combination of the pixels in the overlap region? Your function should behave well where one or both of the blending constants are zero.

Derivation

    [X2]     [X1]
    [Y2] = R [Y1]
    [Z2]     [Z1]

      [x2]   [f2 0  0] [X2]
    λ2[y2] = [0  f2 0] [Y2]
      [1 ]   [0  0  1] [Z2]

    …

      [x2]              [x1]
    λ [y2] = K2 R K1⁻¹ [y1]
      [1 ]              [1 ]
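The rotation-only homography H = K2 R K1⁻¹ can be checked numerically (a hypothetical NumPy sketch; the intrinsics and the 90° roll are my own test case):

```python
import numpy as np

def rotation_homography(K1, K2, R):
    """Homography relating two views from the same center: x2 ~ K2 R K1^{-1} x1."""
    return K2 @ R @ np.linalg.inv(K1)

K = np.diag([2.0, 2.0, 1.0])            # f = 2 in both views
Rz = np.array([[0.0, -1.0, 0.0],
               [1.0, 0.0, 0.0],
               [0.0, 0.0, 1.0]])        # 90° roll about the optical axis
H = rotation_homography(K, K, Rz)

# Image point (0.5, 1) in view 1, mapped into view 2
x2 = H @ np.array([0.5, 1.0, 1.0])
```

Note that no scene depth appears anywhere: the same H maps every point, which is why a purely rotating camera relates images by a homography regardless of scene geometry.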

4.7. Second Derivation of Homography Constraint

The homography constraint, element by element, in homogeneous coordinates is as follows:

    [x2]   [H11 H12 H13] [x1]
    [y2] = [H21 H22 H23] [y1]  ,   x2 ~ H·x1
    [z2]   [H31 H32 H33] [z1]

In inhomogeneous coordinates, x2′ = x2/z2 and y2′ = y2/z2.

Take-home points for homographies

      [x2]   [a b c] [x1]
    λ [y2] = [d e f] [y1]
      [1 ]   [g h i] [1 ]

• If camera rotates about its center, then the images are related by a homography irrespective of scene depth.

• If the scene is planar, then images from any two cameras are related by a homography.

• Homography mapping is a 3×3 matrix with 8 degrees of freedom.

Matching features

What do we do about the “bad” matches? General problem: we are trying to fit a (geometric) model to noisy data

How about we choose the average vector (the least-squares solution)? Why will/won't this work?

Let's generalize the problem a bit

Estimate the best model (a line) that fits the data {xi, yi}:

    min(w,b) Σi (yi − f(w,b)(xi))²,   where  f(w,b)(xi) = w·xi + b

[Figure: data points (x, y)]

"Least-squares" solution

[Figure: the least-squares line fit]

RANSAC Line Fitting Example

Sample two points RANSAC Line Fitting Example

Fit a line

RANSAC Line Fitting Example

Count the total number of points within a threshold of the line.

RANSAC Line Fitting Example

Repeat until you get a good result.

RANSAC Line Fitting Example

Repeat until you get a good result.

RANSAC Line Fitting Example

Repeat until you get a good result.

RAndom SAmple Consensus

Select one match, count inliers RAndom SAmple Consensus

Select one match, count inliers Least squares fit

Find the "average" translation vector for the largest set of inliers.

RANSAC for estimating transformation

RANSAC loop:
1. Select feature pairs (at random)
2. Compute transformation T (exact)
3. Compute inliers (point matches where |pi′ − T·pi|² < ε)
4. Keep the largest set of inliers

5. Re-compute the least-squares estimate of the transformation T using all of the inliers.

Recall homography estimation: how do we estimate with all inlier points?

    A·h = 0,   A ∈ ℝ^(8×9),   h ∈ ℝ⁹
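The full loop above, applied to the earlier line-fitting example, can be sketched as follows (a hypothetical Python implementation; the names, threshold, and data are my own):

```python
import numpy as np

def ransac_line(pts, iters=200, thresh=0.1, seed=0):
    """RANSAC for y = w x + b: sample 2 points, fit exactly, count inliers,
    keep the best model, then refit by least squares on all its inliers."""
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(pts), dtype=bool)
    for _ in range(iters):
        i, j = rng.choice(len(pts), size=2, replace=False)
        (x1, y1), (x2, y2) = pts[i], pts[j]
        if x1 == x2:
            continue                       # degenerate vertical sample
        w = (y2 - y1) / (x2 - x1)
        b = y1 - w * x1
        inliers = np.abs(pts[:, 1] - (w * pts[:, 0] + b)) < thresh
        if inliers.sum() > best_inliers.sum():
            best_inliers = inliers
    # Step 5: least-squares refit using all inliers of the best model
    X = np.stack([pts[best_inliers, 0], np.ones(best_inliers.sum())], axis=1)
    w, b = np.linalg.lstsq(X, pts[best_inliers, 1], rcond=None)[0]
    return w, b, best_inliers

# 10 points on y = 2x + 1 plus two gross outliers
x = np.arange(10.0)
pts = np.vstack([np.stack([x, 2 * x + 1], axis=1),
                 [[0.0, 50.0], [5.0, -40.0]]])
w, b, inl = ransac_line(pts)
```

For a homography the minimal sample is 4 point pairs instead of 2 points, but the structure of the loop is identical.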

RANSAC for alignment

Planar object recognition

(What transformation is used? How many pairs must be selected in the initial step?)