Image Sequence (or Motion) Analysis
Image Sequence (or Motion) Analysis is also referred to as Dynamic Scene Analysis.

Input: a sequence of images (frames).
Output: 3-D motion parameters of the object(s) in motion, and/or 3-D structure information of the moving objects.
Assumption: the time interval ∆T between two consecutive (successive) frames is small.

Typical steps in Dynamic Scene Analysis:
• Image filtering and enhancement
• Tracking
• Feature detection
• Establish feature correspondence
• Motion parameter estimation (local and global)
• Structure estimation (optional)
• Predict occurrence in the next frame

TRACKING in a Dynamic Scene

Since ∆T is small, the amount of motion (displacement) of objects between two successive frames is also small.
At 25 frames per second (fps): ∆T = 40 ms.
At 50 frames per second (fps): ∆T = 20 ms.

Image Motion

Difference image between the frames at times ti and tj:
f_d(x1, x2, ti, tj) = f(x1, x2, ti) − f(x1, x2, tj) = f(ti) − f(tj) = fi − fj

Accumulated difference image:
f_T(X, tn) = f_d(X, tn−1, tn) − f_T(X, tn−1), n ≥ 3, where f_T(X, t2) = f_d(X, t2, t1).

Moving edge (or feature) detector:
F_mov_feat(X, t1, t2) = (∂f/∂X) · f_d(X, t1, t2)

Recent methods include background and foreground modeling.

[Figure: frames F1 and F2, their difference F1 − F2 and abs(F1 − F2); an input video frame, the segmented video frame, and the moving object extracted using an alpha matte.]

Two categories of visual tracking algorithms: (A) Target Representation and Localization, and (B) Filtering and Data Association.

A. Some common Target Representation and Localization algorithms:
Blob tracking: segmentation of the object interior (for example blob detection, block-based correlation or optical flow).
Kernel-based (mean-shift) tracking: an iterative localization procedure based on maximization of a similarity measure (the Bhattacharyya coefficient).
Contour tracking: detection of the object boundary (e.g. active contours or the Condensation algorithm).
Visual feature matching: registration.

B.
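The difference image above can be sketched in a few lines of NumPy. This is a minimal illustration under assumptions: the threshold value and the toy moving-block sequence are invented, and the thresholded per-frame differences are simply summed so that moving pixels accumulate evidence (a common simplification of the accumulated-difference recurrence).

```python
import numpy as np

def difference_image(f_i, f_j, thresh=10):
    """Thresholded absolute frame difference |f(t_i) - f(t_j)|;
    thresh is an assumed tuning value to suppress sensor noise."""
    return (np.abs(f_i.astype(int) - f_j.astype(int)) > thresh).astype(np.uint8)

def accumulated_difference(frames, thresh=10):
    """Sum the successive difference images over the sequence so that
    pixels crossed by moving objects accumulate large counts."""
    acc = difference_image(frames[1], frames[0], thresh).astype(int)
    for k in range(2, len(frames)):
        acc += difference_image(frames[k], frames[k - 1], thresh)
    return acc

# Toy sequence: a bright 2x2 block moving one pixel right per frame.
frames = []
for t in range(4):
    f = np.zeros((8, 8), dtype=np.uint8)
    f[3:5, t:t + 2] = 255
    frames.append(f)

acc = accumulated_difference(frames)
print(acc.max())   # pixels crossed repeatedly by the block score highest
```

Static background pixels stay at zero, while the path swept by the block lights up, which is exactly why frame differencing is used as the cheapest motion cue.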
Some common Filtering and Data Association algorithms:
Kalman filter: an optimal recursive Bayesian filter for linear dynamics with Gaussian noise.
Particle filter: useful for sampling the underlying state-space distribution of non-linear and non-Gaussian processes.
Also see: match moving, motion capture, SwisTrack.

BLOB Detection – approaches used:
• Corner detectors (Harris, Shi & Tomasi, SUSAN, level-curve curvature, FAST, etc.)
• LoG, DoG, DoH (determinant of the Hessian), ridge detectors, scale-space, pyramids
• Hessian-affine, SIFT (Scale-Invariant Feature Transform)
• SURF (Speeded-Up Robust Features)
• GLOH (Gradient Location and Orientation Histogram)
• LESH (Local Energy-based Shape Histogram)

Complexities and issues in tracking: one needs to overcome difficulties that arise from noise, occlusion, clutter, moving cameras, multiple moving objects, and changes in the foreground objects or in the background environment.

DoH: the scale-normalized determinant of the Hessian, also referred to as the Monge–Ampère operator, where HL denotes the Hessian matrix of L. By detecting scale-space maxima/minima of this operator one obtains another straightforward differential blob detector with automatic scale selection, which also responds to saddles.

Hessian-affine: an affine-invariant extension of the Hessian-based detector.

SURF is based on a set of 2-D Haar wavelets and implements DoH.

SIFT: four major steps:
1. Scale-space extrema detection (detect extrema at various scales)
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor

[Figure: only three of the steps (1, 3 and 4) are illustrated.]

Each histogram contains 8 orientation bins, and each descriptor contains a 4 × 4 array of such histograms around the keypoint. This leads to a SIFT feature vector with 4 × 4 × 8 = 128 elements (the figure shows a smaller 2 × 2 × 8 example).

GLOH (Gradient Location and Orientation Histogram) is an extension of the SIFT descriptor designed to increase its robustness and distinctiveness.
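The DoG entry in the list above is easy to sketch with plain NumPy. This is an illustrative toy, not a production detector: the sigma values, image size and synthetic disc blob are all assumptions chosen for demonstration.

```python
import numpy as np

def gaussian_kernel_1d(sigma):
    """Normalized 1-D Gaussian kernel, truncated at 3 sigma."""
    r = int(3 * sigma)
    x = np.arange(-r, r + 1)
    k = np.exp(-x**2 / (2.0 * sigma**2))
    return k / k.sum()

def gaussian_blur(img, sigma):
    """Separable Gaussian blur: convolve rows, then columns."""
    k = gaussian_kernel_1d(sigma)
    out = np.apply_along_axis(np.convolve, 1, img.astype(float), k, mode='same')
    return np.apply_along_axis(np.convolve, 0, out, k, mode='same')

def dog_response(img, sigma1=2.0, sigma2=3.2):
    """Difference-of-Gaussians (DoG): a band-pass filter that responds
    strongly to blob-like structures near the sigma1..sigma2 scale."""
    return gaussian_blur(img, sigma1) - gaussian_blur(img, sigma2)

# Synthetic test image: a single bright disc (radius 3) on a dark background.
yy, xx = np.mgrid[0:64, 0:64]
img = 255.0 * (((yy - 20)**2 + (xx - 40)**2) <= 9)

peak = np.unravel_index(np.argmax(dog_response(img)), img.shape)
print(peak)   # the DoG maximum lands at (or next to) the blob centre
```

Running a bank of such filters at several sigmas and taking scale-space extrema is the basic mechanism behind the LoG/DoG blob detectors and SIFT's first step.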
The SIFT descriptor is computed for a log-polar location grid with three bins in the radial direction (the radii set to 6, 11 and 15) and eight in the angular direction, which results in 17 location bins (the central bin is not divided in angular directions). The gradient orientations are quantized into 16 bins, giving a 272-bin histogram. The size of this descriptor is reduced with PCA: the covariance matrix is estimated on 47,000 image patches collected from various images, and the 128 largest eigenvectors are used for description. In short, GLOH extends SIFT by changing the location grid and using PCA to reduce the descriptor size.

Motion Equations

Motion models are derived from mechanics (Euler's or Newton's equations). A 3-D point P(x, y, z) at time t1 moves to P′(x′, y′, z′) at time t2:

P′ = R·P + T,

where R = [mij]3×3 is the rotation by angle θ about a unit axis (n1, n2, n3), and T = [δx δy δz]ᵀ is the translation:

m11 = n1² + (1 − n1²) cos θ
m12 = n1n2 (1 − cos θ) − n3 sin θ
m13 = n1n3 (1 − cos θ) + n2 sin θ
m21 = n1n2 (1 − cos θ) + n3 sin θ
m22 = n2² + (1 − n2²) cos θ
m23 = n2n3 (1 − cos θ) − n1 sin θ
m31 = n1n3 (1 − cos θ) − n2 sin θ
m32 = n2n3 (1 − cos θ) + n1 sin θ
m33 = n3² + (1 − n3²) cos θ

where n1² + n2² + n3² = 1.

Observation in the image plane (focal length F): P projects to (X, Y) and P′ to (X′, Y′):
X = Fx / z, Y = Fy / z; X′ = Fx′ / z′, Y′ = Fy′ / z′;
and the observed image displacement is U = X′ − X, V = Y′ − Y.

Mathematically, for any two successive frames, the problem is:
Input: given (X, Y) and (X′, Y′);
Output: estimate n1, n2, n3, θ, δx, δy, δz.

First look at the problem of estimating the motion parameters using 3-D knowledge only: given only three non-linear equations, you have to obtain seven parameters. A few more constraints, and perhaps assumptions too, are needed. Since ∆T is small, θ (in radians) must also be small.
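The nine entries m11 … m33 of R above can be checked numerically: a matrix built from them should be orthogonal with determinant +1 and should leave the rotation axis fixed. The axis and angle below are arbitrary illustrative values.

```python
import numpy as np

def rotation_matrix(n, theta):
    """Build R = [m_ij] from the axis-angle formulas in the text.
    n = (n1, n2, n3) is a unit rotation axis, theta the rotation angle."""
    n1, n2, n3 = n
    c, s = np.cos(theta), np.sin(theta)
    return np.array([
        [n1*n1 + (1 - n1*n1)*c, n1*n2*(1 - c) - n3*s, n1*n3*(1 - c) + n2*s],
        [n1*n2*(1 - c) + n3*s,  n2*n2 + (1 - n2*n2)*c, n2*n3*(1 - c) - n1*s],
        [n1*n3*(1 - c) - n2*s,  n2*n3*(1 - c) + n1*s,  n3*n3 + (1 - n3*n3)*c],
    ])

n = np.array([1.0, 2.0, 2.0]) / 3.0          # unit axis: 1 + 4 + 4 = 9
R = rotation_matrix(n, np.deg2rad(30))

print(np.allclose(R @ R.T, np.eye(3)))       # orthogonal,
print(np.allclose(np.linalg.det(R), 1.0))    # determinant +1,
print(np.allclose(R @ n, n))                 # and the axis is left fixed
```

These three properties together confirm the entries are a valid parameterization of rotation (the Rodrigues form), which is what the estimation problem below tries to recover.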
Thus, as θ → 0, R simplifies (reduces) to:

R = [[1, −n3θ, n2θ], [n3θ, 1, −n1θ], [−n2θ, n1θ, 1]] = [[1, −φ3, φ2], [φ3, 1, −φ1], [−φ2, φ1, 1]],

where φ1² + φ2² + φ3² = θ².

The problem is now linearized: given three linear equations, you have to obtain six parameters. Solution?

Take two point correspondences:
P1′ = R·P1 + T; P2′ = R·P2 + T.
Subtracting one from the other eliminates the translation component:
(P1′ − P2′) = R·(P1 − P2)

With ∆x12 = x1 − x2, ∆x12′ = x1′ − x2′ (and similarly for y and z), this reads

[∆x12′, ∆y12′, ∆z12′]ᵀ = [[1, −φ3, φ2], [φ3, 1, −φ1], [−φ2, φ1, 1]] · [∆x12, ∆y12, ∆z12]ᵀ,

to be solved for Φ = [φ1, φ2, φ3]ᵀ. Re-arranging into the form ∇12 Φ = ∆²12:

∇12 = [[0, ∆z12, −∆y12], [−∆z12, 0, ∆x12], [∆y12, −∆x12, 0]],
∆²12 = [∆x12′ − ∆x12, ∆y12′ − ∆y12, ∆z12′ − ∆z12]ᵀ.

∇12 is a skew-symmetric matrix, hence |∇12| = 0. Interpret why this is so. So what to do? Contact a mathematician?
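Why is |∇12| always zero? For any odd-dimensional skew-symmetric matrix A, det(A) = det(−Aᵀ) = (−1)³ det(A), forcing det(A) = 0, so one pair of correspondences can never determine Φ. A quick numerical check, using hypothetical coordinate differences:

```python
import numpy as np

def nabla(d):
    """Coefficient matrix built from one pair's coordinate differences
    d = (dx, dy, dz); it equals -[d]_x and is skew-symmetric, so its
    determinant is identically zero."""
    dx, dy, dz = d
    return np.array([[0.0,  dz, -dy],
                     [-dz, 0.0,  dx],
                     [dy,  -dx, 0.0]])

d12 = np.array([1.0, -2.0, 0.5])   # hypothetical (dx, dy, dz) for pair (1,2)
N12 = nabla(d12)

print(np.allclose(N12, -N12.T))    # skew-symmetric
print(np.linalg.det(N12))          # singular: one pair is not enough
```

Geometrically, ∇12 Φ = −(P1 − P2) × Φ, and a cross product with a fixed vector annihilates the component of Φ along that vector, which is exactly the information one pair cannot supply.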
Take one more pair of point correspondences (points 3 and 4). With its rows cyclically permuted (so that the two systems do not simply reinforce each other), the second system is ∇34 Φ = ∆²34, where

∇34 = [[−∆z34, 0, ∆x34], [∆y34, −∆x34, 0], [0, ∆z34, −∆y34]],
∆²34 = [∆y34′ − ∆y34, ∆z34′ − ∆z34, ∆x34′ − ∆x34]ᵀ.

Using the two pairs (four points) of correspondences and adding the two systems:

[∇12 + ∇34] Φ = [∆²12 + ∆²34]  ⇒  ∇1234 Φ = ∆²1234, with

∇1234 = [[−∆z34, ∆z12, ∆x34 − ∆y12], [∆y34 − ∆z12, −∆x34, ∆x12], [∆y12, ∆z34 − ∆x12, −∆y34]],
∆²1234 = [∆y34′ − ∆y34 + ∆x12′ − ∆x12, ∆z34′ − ∆z34 + ∆y12′ − ∆y12, ∆x34′ − ∆x34 + ∆z12′ − ∆z12]ᵀ.

The condition for the existence of a unique solution (|∇1234| ≠ 0) is based on a geometrical relationship among the coordinates of the four points in space at time t1. This solution is often used as an initial guess for the final estimate of the motion parameters. Exercise: find the geometric condition under which |∇1234| = 0.

OPTICAL FLOW

A point in 3-D space, in homogeneous coordinates:
X0 = [kx0, ky0, kz0, k]ᵀ; k ≠ 0, an arbitrary constant.
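The four-point scheme can be verified end-to-end on synthetic data: pick a small Φ and a translation, move four points with the linearized model, and recover the motion from the two difference systems. All numeric values below (Φ, T, the four points) are hypothetical.

```python
import numpy as np

def skew(v):
    """Cross-product (skew-symmetric) matrix [v]_x."""
    return np.array([[0.0, -v[2], v[1]],
                     [v[2], 0.0, -v[0]],
                     [-v[1], v[0], 0.0]])

# Hypothetical small motion: Phi = (phi1, phi2, phi3) and translation T.
Phi_true = np.array([0.01, -0.02, 0.015])
T_true = np.array([0.3, -0.1, 0.2])
R = np.eye(3) + skew(Phi_true)            # linearized (small-angle) rotation

# Four hypothetical 3-D points at t1, and their positions at t2.
P = np.array([[1.0, 2.0, 5.0],
              [-1.0, 0.5, 4.0],
              [2.0, -1.0, 6.0],
              [0.5, 1.5, 7.0]])
Pp = (R @ P.T).T + T_true

d12, dp12 = P[0] - P[1], Pp[0] - Pp[1]    # differencing cancels the translation
d34, dp34 = P[2] - P[3], Pp[2] - Pp[3]

# System from pair (1,2): nabla_12 Phi = Delta2_12.
N12 = np.array([[0.0, d12[2], -d12[1]],
                [-d12[2], 0.0, d12[0]],
                [d12[1], -d12[0], 0.0]])
b12 = dp12 - d12
# System from pair (3,4), rows cyclically permuted as in the text.
N34 = np.array([[-d34[2], 0.0, d34[0]],
                [d34[1], -d34[0], 0.0],
                [0.0, d34[2], -d34[1]]])
b34 = (dp34 - d34)[[1, 2, 0]]

Phi = np.linalg.solve(N12 + N34, b12 + b34)       # nabla_1234 Phi = Delta2_1234
T_est = Pp[0] - (np.eye(3) + skew(Phi)) @ P[0]    # back-substitute for T
print(Phi, T_est)
```

Because the synthetic motion is generated with the same linearized model, the recovery is exact here; with real (finite-angle) motion the result is only the initial guess the text mentions.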
Its image point is Xi = [wxi, wyi, w]ᵀ, where Xi = P·X0 for the projection matrix P. Assuming a normalized focal length f = 1:

[xi, yi]ᵀ = [x0/z0, y0/z0]ᵀ  ....(1)

Assuming a linear motion model (no acceleration) with velocity (u, v, w) between successive frames:

[x0(t), y0(t), z0(t)]ᵀ = [x0 + ut, y0 + vt, z0 + wt]ᵀ  ....(2)

Combining equations (1) and (2):

[xi(t), yi(t)]ᵀ = [(x0 + ut)/(z0 + wt), (y0 + vt)/(z0 + wt)]ᵀ  ....(3)

Assume in equation (3) that w (= dz/dt) < 0. In that case, the object (or points on the object) will appear to come closer to you. Now visualize: where do these points appear to come from?

lim (t → −∞) [xi(t), yi(t)]ᵀ = [u/w, v/w]ᵀ = e  ....(4)

This point e in the image plane is known as the FOE (Focus of Expansion): the motion of the object points appears to emanate from this fixed point in the image plane.
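Equation (4) can be checked numerically: project a moving point far back in time and watch its image converge to e = (u/w, v/w). The velocity and object point below are hypothetical.

```python
import numpy as np

# Hypothetical constant 3-D velocity (u, v, w); w < 0 means approaching.
u, v, w = 0.4, -0.2, -1.0
foe = np.array([u / w, v / w])            # FOE e = (u/w, v/w), equation (4)

def project(p0, t):
    """Perspective projection (f = 1) of a point moving with (u, v, w),
    following equation (3)."""
    x0, y0, z0 = p0
    return np.array([(x0 + u * t) / (z0 + w * t),
                     (y0 + v * t) / (z0 + w * t)])

# Trace an object point far back in time: its image approaches the FOE.
p0 = (2.0, 1.0, 10.0)
print(project(p0, -1e6))   # approximately foe = (-0.4, 0.2)
```

Note the FOE depends only on the velocity (u, v, w), not on the point's position, which is why every point on the object appears to emanate from the same image location.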