• Rotations • Camera Calibration • Homography • RANSAC
Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC

Geometric Transformations

Let's define families of transformations by the properties that they preserve.

Table 3.5 (Szeliski, Computer Vision: Algorithms and Applications, September 3, 2010 draft): hierarchy of 2D coordinate transformations.

Transformation       Matrix          # DoF   Preserves
translation          [I t] (2×3)     2       orientation
rigid (Euclidean)    [R t] (2×3)     3       lengths
similarity           [sR t] (2×3)    4       angles
affine               [A] (2×3)       6       parallelism
projective           [H̃] (3×3)       8       straight lines

Each transformation also preserves the properties listed in the rows below it; e.g., similarity preserves not only angles but also parallelism and straight lines. The 2×3 matrices are extended with a third [0ᵀ 1] row to form a full 3×3 matrix for homogeneous coordinate transformations. These are examples of such transformations, based on the 2D geometric transformations shown in Figure 2.4. The formulas for these transformations were originally given in Table 2.1 and are reproduced here in Table 3.5 for ease of reference.

In general, given a transformation specified by a formula x′ = h(x) and a source image f(x), how do we compute the values of the pixels in the new image g(x), as given in (3.88)? Think about this for a minute before proceeding and see if you can figure it out. If you are like most people, you will come up with an algorithm that looks something like Algorithm 3.1. This process is called forward warping or forward mapping and is shown in Figure 3.46a. Can you think of any problems with this approach?

procedure forwardWarp(f, h, out g):
  For every pixel x in f(x):
    1. Compute the destination location x′ = h(x).
    2. Copy the pixel f(x) to g(x′).

Algorithm 3.1: Forward warping algorithm for transforming an image f(x) into an image g(x′) through the parametric transform x′ = h(x).
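Algorithm 3.1 can be sketched in Python (NumPy assumed; the function name `forward_warp` and the nearest-neighbor rounding of the destination are my choices, not from the source). The rounding makes the problem the text hints at visible: several source pixels can land on the same destination pixel, while others are never hit, leaving holes.

```python
import numpy as np

def forward_warp(f, h):
    """Forward-warp image f through the parametric transform x' = h(x).

    f : (H, W) array, source image
    h : function mapping a pixel location (x, y) to a destination (x', y')
    Returns g, the warped image. Destination pixels that no source pixel
    maps to stay 0 -- the "holes" that make forward warping problematic.
    """
    H, W = f.shape
    g = np.zeros_like(f)
    for y in range(H):
        for x in range(W):
            xp, yp = h(x, y)
            xp, yp = int(round(xp)), int(round(yp))  # nearest destination pixel
            if 0 <= xp < W and 0 <= yp < H:
                g[yp, xp] = f[y, x]
    return g

# Example: translate by (2, 1); pixels shifted off the frame are lost,
# and the first row/columns of g are never written (holes).
f = np.arange(16, dtype=float).reshape(4, 4)
g = forward_warp(f, lambda x, y: (x + 2, y + 1))
```

Inverse warping (looping over destination pixels and sampling the source) avoids the holes, which is why it is usually preferred in practice.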
Rotations

Linear transformations that preserve distances and angles.

Definition: an orthogonal transformation preserves dot products:
  aᵀb = T(a)ᵀT(b), where T(a) = Aa, a ∈ ℝⁿ, A ∈ ℝⁿˣⁿ
  aᵀb = aᵀAᵀAb ⟺ AᵀA = I  [can conclude by setting a, b = coordinate vectors]

Defn: A is a rotation matrix if AᵀA = I and det(A) = 1.
Defn: A is a reflection matrix if AᵀA = I and det(A) = −1.

2D Rotations

  R = [cos θ  −sin θ
       sin θ   cos θ]      1 DOF

3D Rotations

  R [X Y Z]ᵀ = [r11 r12 r13
                r21 r22 r23
                r31 r32 r33] [X Y Z]ᵀ

Think of R as a change of basis, where rᵢ = R(i,:) are orthonormal basis vectors (the rotated coordinate frame).
How many DOFs? 3 = (2 to point r1) + (1 to rotate about r1).

Shears

  Ã = [1     h_xy  h_xz  0
       h_yx  1     h_yz  0
       h_zx  h_zy  1     0
       0     0     0     1]

h_xy shears y into x.

3D Rotations

Lots of parameterizations try to capture the 3 DOFs; 3D rotations are fundamentally more complex than in 2D!
• 2D: amount of rotation
• 3D: amount and axis of rotation
A helpful parameterization for vision is the axis-angle representation: represent a 3D rotation with a unit vector giving the axis of rotation, and an angle of rotation about that vector.

Recall: cross product

Dot product: a · b = ||a|| ||b|| cos θ

Cross product:
  a × b = det [ i   j   k
                a1  a2  a3
                b1  b2  b3 ]
        = i (a2 b3 − a3 b2) − j (a1 b3 − a3 b1) + k (a1 b2 − a2 b1)

Cross-product matrix:
  a × b = â b,  where  â = [ 0   −a3   a2
                             a3   0   −a1
                            −a2   a1   0 ]

Axis-angle approach: axis ω ∈ ℝ³ with ||ω|| = 1, angle θ.
1. Write x as a sum of components parallel and perpendicular to ω: x = x∥ + x⊥.
2. Rotate the perpendicular component by a 2D rotation of θ in the plane orthogonal to ω.

Result (Rodrigues' formula):
  R = I + ω̂ sin θ + ω̂ω̂ (1 − cos θ)
[Rx can be simplified to cross- and dot-product computations.]

Exponential map: with v = ωθ,
  R = exp(v̂) = I + v̂ + (1/2!) v̂² + …
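A minimal NumPy sketch of the axis-angle formula above (the helper names `hat` and `axis_angle_to_R` are mine):

```python
import numpy as np

def hat(a):
    """Cross-product matrix: hat(a) @ b == np.cross(a, b)."""
    return np.array([[0.0, -a[2], a[1]],
                     [a[2], 0.0, -a[0]],
                     [-a[1], a[0], 0.0]])

def axis_angle_to_R(omega, theta):
    """Rodrigues' formula R = I + w^ sin(theta) + w^ w^ (1 - cos(theta)),
    for a unit axis omega and angle theta."""
    w = hat(omega)
    return np.eye(3) + w * np.sin(theta) + w @ w * (1.0 - np.cos(theta))

# Rotation by 90 degrees about the z-axis: maps the x-axis to the y-axis.
R = axis_angle_to_R(np.array([0.0, 0.0, 1.0]), np.pi / 2)
```

Note that the result is orthogonal with determinant +1, i.e. a rotation matrix in the sense defined above.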
[This is the standard Taylor series expansion of exp(x) at x = 0: 1 + x + (1/2!)x² + …]

Implication: we can approximate the change in position due to a small rotation as v × x, where v = ωθ.

Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC

Perspective projection

A 3D point (X, Y, Z) projects to the image point (x, y, 1); COP = center of projection. [Right-handed coordinate system.]
  x = f X/Z
  y = f Y/Z

Perspective projection revisited

  λ [x y 1]ᵀ = [f 0 0
                0 f 0
                0 0 1] [X Y Z]ᵀ

Given (X, Y, Z) and f, compute (x, y) and λ:
  λx = fX,  λ = Z,  x = λx/λ = fX/Z

Special case: f = 1. Natural geometric intuition:
• The 3D point is obtained by scaling the ray pointed at the image coordinate.
• The scale factor is the true depth of the point:
  Z [x y 1]ᵀ = [X Y Z]ᵀ

[Aside: given an image with focal length f, resize by 1/f to obtain a unit-focal-length image.]

Homogeneous notation

  [x y z]ᵀ ∼ [X Y Z]ᵀ   or   [x y z]ᵀ ≡ [X Y Z]ᵀ

For now, think of the above as shorthand notation for: ∃λ s.t. λ [x y z]ᵀ = [X Y Z]ᵀ.

Camera projection

  λ [x y 1]ᵀ = [f 0 0   [r11 r12 r13 tx
                0 f 0  ·  r21 r22 r23 ty  · [X Y Z 1]ᵀ
                0 0 1]    r31 r32 r33 tz]

Camera intrinsic matrix K (can include skew & non-square pixel size); camera extrinsics (rotation and translation from the world coordinate frame to the camera frame); 3D point in world coordinates.
[Aside: homogeneous notation is shorthand for x = λx/λ.]

Fancier intrinsics

  x_s = s_x x,  y_s = s_y y        } non-square pixels
  x′ = x_s + o_x,  y′ = y_s + o_y  } shifted origin
  x″ = x′ + s_θ y′                 } skewed image axes

  K = [s_x  s_θ  o_x    [f 0 0     [f s_x  f s_θ  o_x
       0    s_y  o_y  ·  0 f 0  =   0      f s_y  o_y
       0    0    1  ]    0 0 1]     0      0      1  ]

Notation [using Matlab's rows × columns]

  λ [x y 1]ᵀ = [f s_x  f s_θ  o_x    [r11 r12 r13 tx
                0      f s_y  o_y  ·  r21 r22 r23 ty  · [X Y Z 1]ᵀ
                0      0      1  ]    r31 r32 r33 tz]
             = K_{3×3} [R_{3×3} | T_{3×1}] [X Y Z 1]ᵀ
             = M_{3×4} [X Y Z 1]ᵀ

Claims (without proof):
1. A 3×4 matrix M = [A | b] can be a camera matrix iff its left 3×3 submatrix A is nonsingular, det(A) ≠ 0.
2.
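The projection equation λ [x y 1]ᵀ = K [R | t] [X Y Z 1]ᵀ can be sketched as follows (NumPy assumed; `project` is my name for this hypothetical helper):

```python
import numpy as np

def project(K, R, t, X):
    """Project a 3D world point X via lambda [x y 1]^T = K [R | t] [X 1]^T.
    Returns the inhomogeneous image point (x, y) and the scale lambda
    (the depth of the point along the optical axis for this K)."""
    p = K @ (R @ X + t)   # homogeneous image coordinates
    lam = p[2]
    return p[:2] / lam, lam

# Unit-focal-length camera at the world origin looking down +z:
# reduces to x = X/Z, y = Y/Z, lambda = Z.
K = np.diag([1.0, 1.0, 1.0])
R, t = np.eye(3), np.zeros(3)
xy, lam = project(K, R, t, np.array([2.0, 4.0, 2.0]))
```

The division by `lam` is exactly the nonlinearity that makes perspective projection non-affine, which matters for the calibration procedure later.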
M is determined only up to a scale factor.

Notation (more)

  M_{3×4} [X Y Z 1]ᵀ = [A_{3×3} | b_{3×1}] [X Y Z 1]ᵀ = A_{3×3} [X Y Z]ᵀ + b_{3×1}

  M = [m1ᵀ; m2ᵀ; m3ᵀ],   A = [a1ᵀ; a2ᵀ; a3ᵀ],   b = [b1; b2; b3]

Applying the projection matrix

  x = (1/λ)([X Y Z] a1 + b1)
  y = (1/λ)([X Y Z] a2 + b2)
  λ = [X Y Z] a3 + b3

Set of 3D points that project to x = 0:          [X Y Z] a1 + b1 = 0
Set of 3D points that project to y = 0:          [X Y Z] a2 + b2 = 0
Set of 3D points that project to x = ∞ or y = ∞: [X Y Z] a3 + b3 = 0

The rows of the projection matrix describe the three planes defined by the image coordinate system: a1 and a2 give the planes projecting to x = 0 and y = 0, and a3 gives the plane through the COP parallel to the image plane.

Other geometric properties

What is the set of (X, Y, Z) points that project to the same (x, y)? A ray:
  [X Y Z]ᵀ = λ w + c,  where  w = A⁻¹ [x y 1]ᵀ  and  c = −A⁻¹ b

What is the position of the COP / pinhole?
  A [X Y Z]ᵀ + b = 0  ⟹  [X Y Z]ᵀ = −A⁻¹ b

Affine Cameras

  m3ᵀ = [0 0 0 1]
  x = [X Y Z] a1 + b1
  y = [X Y Z] a2 + b2

Image coordinates (x, y) are an affine function of world coordinates (X, Y, Z). Affine transformations = linear transformations plus an offset.
• Example: the weak-perspective projection model
• Projection defined by 8 parameters
• Parallel lines are projected to parallel lines
• The transformation can be written as a direct linear transformation

Geometric Transformations
• Euclidean (translation + rotation): preserves lengths + angles
• Affine: preserves parallel lines
• Projective: preserves straight lines
(Projective ⊃ Affine ⊃ Euclidean)

Agenda
• Rotations
• Camera calibration
• Homography
• RANSAC

Calibration: recover M from scene points P1, …, PN and the corresponding projections in the image plane p1, …, pN. Find the M that minimizes the distance between the actual points in the image, pi, and their predicted projections MPi.

Problems:
• The projection is (in general) non-linear.
• M is defined up to an arbitrary scale factor.

PnP = Perspective n-Point.

The math for the calibration procedure follows a recipe that is used in many (most?)
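The camera center and the back-projected ray above can be computed directly from M = [A | b] (a NumPy sketch; the function names are mine):

```python
import numpy as np

def camera_center(M):
    """Center of projection of M = [A | b]: the point with A X + b = 0,
    i.e. X = -A^{-1} b."""
    A, b = M[:, :3], M[:, 3]
    return -np.linalg.solve(A, b)

def backproject(M, x, y):
    """Direction w of the ray of 3D points projecting to (x, y):
    X(lambda) = camera_center(M) + lambda * w, with w = A^{-1} [x y 1]^T."""
    A = M[:, :3]
    return np.linalg.solve(A, np.array([x, y, 1.0]))

# Toy camera M = [I | -c]: a unit-focal camera whose COP sits at c.
M = np.hstack([np.eye(3), -np.array([[1.0], [2.0], [3.0]])])
c = camera_center(M)
```

Every point on the ray `c + lam * backproject(M, x, y)` (for lam ≠ 0) projects back to the same image location (x, y), matching the claim above.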
problems involving camera geometry, so it's worth remembering:

Write the relation between image point, projection matrix, and point in space:
  pᵢ ≡ M Pᵢ

Write the non-linear relations between coordinates:
  uᵢ = m1ᵀPᵢ / m3ᵀPᵢ,   vᵢ = m2ᵀPᵢ / m3ᵀPᵢ

Make them linear:
  m1ᵀPᵢ − (m3ᵀPᵢ) uᵢ = 0
  m2ᵀPᵢ − (m3ᵀPᵢ) vᵢ = 0

Write them in matrix form:
  [Pᵢᵀ  0ᵀ   −uᵢPᵢᵀ
   0ᵀ   Pᵢᵀ  −vᵢPᵢᵀ] m = 0,   where  m = [m1; m2; m3]

Put all the relations for all the points into a single matrix:
  [P1ᵀ  0ᵀ   −u1P1ᵀ
   0ᵀ   P1ᵀ  −v1P1ᵀ
   ⋮    ⋮     ⋮
   PNᵀ  0ᵀ   −uN PNᵀ
   0ᵀ   PNᵀ  −vN PNᵀ] m = 0   (right-hand side is a vector of 0's)

In the noise-free case: L m = 0. What about the noisy case?
  min over ||m|| = 1 of ||L m||²
The minimizer is the right singular vector of L with the smallest singular value (equivalently, the eigenvector of LᵀL with the smallest eigenvalue).

Is this the right error to minimize? If not, what is? The ideal error is the reprojection error between the measured image points (uᵢ, vᵢ) and the predicted projections MPᵢ:

  Error(M) = Σᵢ (uᵢ − m1·Pᵢ / m3·Pᵢ)² + (vᵢ − m2·Pᵢ / m3·Pᵢ)²

Initialize the nonlinear optimization with the "algebraic" solution.

Radial Lens Distortions

No distortion, barrel distortion, pincushion distortion. Correcting radial lens distortions (before/after):
http://www.grasshopperonline.com/barrel_distortion_correction_software.html

Overall approach: minimize the reprojection error Error(M, k's), initializing with the algebraic solution (approaches in the literature are based on various assumptions).

Revisiting homographies

Place the world coordinate frame on the object plane, so planar points have Z = 0:

  λ [x y 1]ᵀ = [f 0 0   [r11 r12 r13 tx
                0 f 0  ·  r21 r22 r23 ty  · [X Y 0 1]ᵀ
                0 0 1]    r31 r32 r33 tz]

Projection of planar points:

  λ [x y 1]ᵀ = [f 0 0   [r11 r12 tx
                0 f 0  ·  r21 r22 ty  · [X Y 1]ᵀ
                0 0 1]    r31 r32 tz]
             = [f r11  f r12  f tx
                f r21  f r22  f ty  · [X Y 1]ᵀ
                r31    r32    tz ]

Convert between 2D location
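The calibration recipe above (stack two rows per correspondence into L, then take the right singular vector with the smallest singular value) can be sketched in NumPy; `calibrate_dlt` is my name for it, and the toy camera in the example is an assumption for illustration:

```python
import numpy as np

def calibrate_dlt(P, uv):
    """Direct linear transform: recover the 3x4 camera matrix M (up to scale)
    from N >= 6 world points P (N x 3) and their projections uv (N x 2).

    Builds L with two rows per point, [P^T 0 -u P^T] and [0 P^T -v P^T],
    and returns m = argmin ||L m|| subject to ||m|| = 1, reshaped to 3x4."""
    rows = []
    for (X, Y, Z), (u, v) in zip(P, uv):
        Ph = np.array([X, Y, Z, 1.0])
        rows.append(np.concatenate([Ph, np.zeros(4), -u * Ph]))
        rows.append(np.concatenate([np.zeros(4), Ph, -v * Ph]))
    L = np.array(rows)
    _, _, Vt = np.linalg.svd(L)
    return Vt[-1].reshape(3, 4)   # smallest right singular vector of L

# Synthetic check: project points with a known camera, then recover it.
rng = np.random.default_rng(0)
M_true = np.hstack([np.eye(3), np.array([[0.1], [0.2], [1.0]])])
P = rng.uniform(1.0, 2.0, size=(8, 3))
ph = (M_true @ np.hstack([P, np.ones((8, 1))]).T).T
uv = ph[:, :2] / ph[:, 2:3]
M_est = calibrate_dlt(P, uv)
```

Since M is only determined up to scale, the recovered matrix must be normalized (e.g. by one nonzero entry) before comparing it to the ground truth; with noisy points this algebraic solution would then seed the nonlinear reprojection-error minimization described above.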