Image Sequence (or Motion) Analysis
Also referred to as: Dynamic Scene Analysis

Input: A sequence of images (or frames)
Output: Motion parameters (3-D) of the object(s) in motion and/or 3-D structure information of the moving objects
Assumption: The time interval ΔT between two consecutive frames is small.

Typical steps in Dynamic Scene Analysis
• Image filtering and enhancement
• Tracking
• Feature detection
• Establish feature correspondence
• Motion parameter estimation (local and global)
• Structure estimation (optional)
• Predict occurrence in next frame

TRACKING in a Dynamic Scene
[Figure: frame sequence F1, F2, F3, ..., FN]
Since ΔT is small, the amount of motion/displacement of objects between two successive frames is also small.
At 25 frames per sec (fps): ΔT = 40 msec.
At 50 frames per sec (fps): ΔT = 20 msec.

Image Motion
Image changes by difference equation:
$$f_d(x_1, x_2, t_i, t_j) = f(x_1, x_2, t_i) - f(x_1, x_2, t_j) = f(t_i) - f(t_j) = f_i - f_j$$
Accumulated difference image:
$$f_T(X, t_n) = f_d(X, t_{n-1}, t_n) - f_T(X, t_{n-1}); \quad n \ge 3,$$
where $f_T(X, t_2) = f_d(X, t_2, t_1)$.
Moving edge (or feature) detector:
$$F_{mov\_feat}(X, t_1, t_2) = \frac{\partial f}{\partial X} \cdot f_d(X, t_1, t_2)$$
Recent methods include background and foreground modeling.
[Figure panels: F1; F2; F1 - F2; abs(F1 - F2)]
[Figure panels: Input video frame; Segmented video frame; Extracted moving object using alpha matte]

Two categories of Visual Tracking Algorithms: Target Representation and Localization; Filtering and Data Association.

A. Some common Target Representation and Localization algorithms:
Blob tracking: Segmentation of object interior (for example blob detection, block-based correlation or optical flow)
Kernel-based tracking (Mean-shift tracking): An iterative localization procedure based on the maximization of a similarity measure (Bhattacharyya coefficient).
Contour tracking: Detection of object boundary (e.g. active contours or Condensation algorithm)
Visual feature matching: Registration

B.
Some common Filtering and Data Association algorithms:
Kalman filter: An optimal recursive Bayesian filter for linear functions and Gaussian noise.
Particle filter: Useful for sampling the underlying state-space distribution of non-linear and non-Gaussian processes.
Also see: Match moving; Motion capture; Swistrack

BLOB Detection – Approaches used:
• Corner detectors (Harris, Shi & Tomasi, SUSAN, Level Curve Curvature, FAST etc.)
• Ridge detection, Scale-space Pyramids
• LoG, DoG, DoH (Det. of Hessian)
• Hessian affine, SIFT (Scale-invariant feature transform)
• SURF (Speeded Up Robust Features)
• GLOH (Gradient Location and Orientation Histogram)
• LESH (Local Energy based Shape Histogram)

Complexities and issues in tracking:
Need to overcome difficulties that arise from noise, occlusion, clutter, moving cameras, multiple moving objects and changes in the foreground objects or in the background environment.

DoH: the scale-normalized determinant of the Hessian, also referred to as the Monge–Ampère operator,
$$\det H L = t^2 \left( L_{xx} L_{yy} - L_{xy}^2 \right),$$
where $HL$ denotes the Hessian matrix of the scale-space representation $L$ and $t$ is the scale parameter. Detecting the scale-space maxima/minima of this operator gives another straightforward differential blob detector with automatic scale selection, which also responds to saddles.

Hessian Affine:
SURF is based on a set of 2-D Haar wavelets; implements DoH.

SIFT: Four major steps:
1. Scale-space extrema detection
2. Keypoint localization
3. Orientation assignment
4. Keypoint descriptor
[Figure: Detect extrema at various scales]
Only three steps (1, 3 & 4) are shown below:
[Figure: SIFT steps 1, 3 and 4]
Histograms contain 8 bins each, and each descriptor contains a 4 x 4 array of histograms around the keypoint. This leads to a SIFT feature vector with 4 x 4 x 8 = 128 elements.
[Figure: e.g. 2 x 2 x 8]

GLOH (Gradient Location and Orientation Histogram)
Gradient location-orientation histogram (GLOH) is an extension of the SIFT descriptor designed to increase its robustness and distinctiveness.
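Since GLOH builds on the SIFT descriptor layout, the 4 x 4 x 8 = 128 count quoted for SIFT can be illustrated with a rough sketch. This is not the full SIFT descriptor: it assumes precomputed gradient magnitudes and orientations for a square patch, and it omits Gaussian weighting, interpolation between bins and rotation normalization. The function name `sift_like_descriptor` and its parameters are hypothetical, chosen only for this illustration.

```python
import math


def sift_like_descriptor(mag, ori, cells=4, bins=8):
    """Build a simplified SIFT-style descriptor (illustrative only).

    mag, ori: square 2-D lists (e.g. 16 x 16) holding gradient magnitude
    and gradient orientation (radians) for each pixel of the patch.
    The patch is split into cells x cells sub-regions; each sub-region
    gets one orientation histogram with `bins` bins.
    """
    n = len(mag)
    cell = n // cells                      # pixels per sub-region side
    desc = [0.0] * (cells * cells * bins)  # 4 * 4 * 8 = 128 by default
    two_pi = 2.0 * math.pi
    for r in range(cells * cell):
        for c in range(cells * cell):
            # Quantize the orientation into one of `bins` equal sectors.
            b = int((ori[r][c] % two_pi) / two_pi * bins) % bins
            # Index = (sub-region row, sub-region column, orientation bin).
            idx = ((r // cell) * cells + (c // cell)) * bins + b
            desc[idx] += mag[r][c]
    # Normalize to unit length so the vector is less sensitive to contrast.
    norm = math.sqrt(sum(x * x for x in desc))
    return [x / norm for x in desc] if norm > 0 else desc
```

Calling it on a 16 x 16 patch with the defaults returns 128 values; passing `cells=2` gives the 2 x 2 x 8 = 32 element layout used in the illustration.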
The SIFT descriptor is computed for a log-polar location grid with 3 bins in radial direction (the radius set to 6, 11 and 15) and 8 in angular direction, which results in 17 location bins. Note that the central bin is not divided in angular directions. The gradient orientations are quantized in 16 bins. This gives a 272 bin histogram. The size of this descriptor is reduced with PCA. The covariance matrix for PCA is estimated on 47000 image patches collected from various images. The 128 largest eigenvectors are used for description.
Gradient location and orientation histogram (GLOH) is a new descriptor which extends SIFT by changing the location grid and using PCA to reduce the size.

Motion Equations:
Models derived from mechanics. Euler's or Newton's equations:
$$P' = R P + T,$$
where
$$R = [m_{ij}]_{3 \times 3}, \qquad T = [\delta x \;\; \delta y \;\; \delta z]^T$$
[Figure: coordinate system with origin o and axes x, y, z. Point P(x, y, z) at t1 moves to P'(x', y', z') at t2; their image-plane projections are (X, Y) and (X', Y').]
$$m_{11} = n_1^2 + (1 - n_1^2)\cos\theta$$
$$m_{12} = n_1 n_2 (1 - \cos\theta) - n_3 \sin\theta$$
$$m_{13} = n_1 n_3 (1 - \cos\theta) + n_2 \sin\theta$$
$$m_{21} = n_1 n_2 (1 - \cos\theta) + n_3 \sin\theta$$
$$m_{22} = n_2^2 + (1 - n_2^2)\cos\theta$$
$$m_{23} = n_2 n_3 (1 - \cos\theta) - n_1 \sin\theta$$
$$m_{31} = n_1 n_3 (1 - \cos\theta) - n_2 \sin\theta$$
$$m_{32} = n_2 n_3 (1 - \cos\theta) + n_1 \sin\theta$$
$$m_{33} = n_3^2 + (1 - n_3^2)\cos\theta$$
where $n_1^2 + n_2^2 + n_3^2 = 1$.

Observation in the image plane:
$$U = X' - X; \quad V = Y' - Y.$$
$$X = Fx / z; \quad Y = Fy / z.$$
$$X' = Fx' / z'; \quad Y' = Fy' / z'.$$

Mathematically (for any two successive frames), the problem is:
Input: Given (X, Y), (X', Y');
Output: Estimate $n_1, n_2, n_3, \theta, \delta x, \delta y, \delta z$

First look at the problem of estimating motion parameters using 3D knowledge only:
Given only three (3) non-linear equations, you have to obtain seven (7) parameters.
Need a few more constraints and maybe assumptions too.

Since ΔT is small, θ must also be small enough (in radians). Thus R simplifies (reduces) to:
$$R\big|_{\theta \to 0} = \begin{bmatrix} 1 & -n_3\theta & n_2\theta \\ n_3\theta & 1 & -n_1\theta \\ -n_2\theta & n_1\theta & 1 \end{bmatrix} = \begin{bmatrix} 1 & -\phi_3 & \phi_2 \\ \phi_3 & 1 & -\phi_1 \\ -\phi_2 & \phi_1 & 1 \end{bmatrix}$$
where $\phi_1^2 + \phi_2^2 + \phi_3^2 = \theta^2$.
Evaluate R.
Now (problem linearized), given three (3) linear equations,
you have to obtain six (6) parameters.
Solution? Take two point correspondences:
$$P_1' = R P_1 + T; \qquad P_2' = R P_2 + T$$
Subtracting one from the other eliminates the translation component:
$$(P_1' - P_2') = R (P_1 - P_2)$$
$$\begin{bmatrix} \Delta x'_{12} \\ \Delta y'_{12} \\ \Delta z'_{12} \end{bmatrix} = \begin{bmatrix} 1 & -\phi_3 & \phi_2 \\ \phi_3 & 1 & -\phi_1 \\ -\phi_2 & \phi_1 & 1 \end{bmatrix} \begin{bmatrix} \Delta x_{12} \\ \Delta y_{12} \\ \Delta z_{12} \end{bmatrix}$$
Solve for $\Phi = [\phi_1 \;\; \phi_2 \;\; \phi_3]^T$, where $\Delta x_{12} = x_1 - x_2$, $\Delta x'_{12} = x_1' - x_2'$, and so on for y and z.

Re-arrange to form:
$$[\nabla_{12}] \, \Phi = \Delta^2_{12}$$
$$\nabla_{12} = \begin{bmatrix} 0 & \Delta z_{12} & -\Delta y_{12} \\ -\Delta z_{12} & 0 & \Delta x_{12} \\ \Delta y_{12} & -\Delta x_{12} & 0 \end{bmatrix} \quad \text{and} \quad \Delta^2_{12} = \begin{bmatrix} \Delta x'_{12} - \Delta x_{12} \\ \Delta y'_{12} - \Delta y_{12} \\ \Delta z'_{12} - \Delta z_{12} \end{bmatrix}$$
$\nabla_{12}$ is a skew-symmetric matrix, so $|\nabla_{12}| = 0$. Interpret: why is it so?
So what to do? Contact a mathematician?

Take two (one pair) more point correspondences, writing the three equations in a different order:
$$[\nabla_{34}] \, \Phi = \Delta^2_{34}$$
$$\nabla_{34} = \begin{bmatrix} -\Delta z_{34} & 0 & \Delta x_{34} \\ \Delta y_{34} & -\Delta x_{34} & 0 \\ 0 & \Delta z_{34} & -\Delta y_{34} \end{bmatrix} \quad \text{and} \quad \Delta^2_{34} = \begin{bmatrix} \Delta y'_{34} - \Delta y_{34} \\ \Delta z'_{34} - \Delta z_{34} \\ \Delta x'_{34} - \Delta x_{34} \end{bmatrix}$$

Using two pairs (4 points) of correspondences:
$$[\nabla_{12}] \, \Phi = \Delta^2_{12}; \qquad [\nabla_{34}] \, \Phi = \Delta^2_{34}$$
Adding:
$$[\nabla_{12} + \nabla_{34}] \, \Phi = [\Delta^2_{12} + \Delta^2_{34}] \;\Rightarrow\; [\nabla_{1234}] [\Phi] = [\Delta^2_{1234}]$$
$$\nabla_{1234} = \begin{bmatrix} -\Delta z_{34} & \Delta z_{12} & \Delta x_{34} - \Delta y_{12} \\ \Delta y_{34} - \Delta z_{12} & -\Delta x_{34} & \Delta x_{12} \\ \Delta y_{12} & \Delta z_{34} - \Delta x_{12} & -\Delta y_{34} \end{bmatrix}$$
$$\Delta^2_{1234} = \begin{bmatrix} \Delta y'_{34} - \Delta y_{34} + \Delta x'_{12} - \Delta x_{12} \\ \Delta z'_{34} - \Delta z_{34} + \Delta y'_{12} - \Delta y_{12} \\ \Delta x'_{34} - \Delta x_{34} + \Delta z'_{12} - \Delta z_{12} \end{bmatrix}$$
The condition for existence of a unique solution is based on a geometrical relationship of the coordinates of the four points in space at time $t_1$, since $\nabla_{1234}$ depends only on those coordinates.
This solution is often used as an initial guess for the final estimate of the motion parameters.
Find the geometric condition when $|\nabla_{1234}| = 0$.

OPTICAL FLOW
A point in 3D space:
$$X_0 = [k x_o \;\; k y_o \;\; k z_o \;\; k]; \quad k \ne 0, \text{ an arbitrary constant.}$$
Image point:
$$X_i = [w x_i \;\; w y_i \;\; w], \quad \text{where } X_i = P X_0.$$
Assuming normalized focal length, f = 1:
$$x_i = \frac{x_o}{z_o}, \qquad y_i = \frac{y_o}{z_o} \qquad ....(1)$$
Assuming linear motion model (no acceleration), between successive frames:
$$x_o(t) = x_o + ut, \qquad y_o(t) = y_o + vt, \qquad z_o(t) = z_o + wt \qquad ....(2)$$
Combining equations (1) and (2):
$$x_i(t) = \frac{x_o + ut}{z_o + wt}, \qquad y_i(t) = \frac{y_o + vt}{z_o + wt} \qquad ....(3)$$
Assume in equation (3) that w (= dz/dt) < 0.
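Equation (3) can be traced numerically to see how an image point moves when w < 0. The sketch below is only an illustration under the assumptions stated in the notes (normalized focal length f = 1, constant velocity (u, v, w)); the function names `image_point` and `image_track` are hypothetical and do not come from the notes.

```python
from typing import Iterable, List, Tuple

Point3 = Tuple[float, float, float]
Point2 = Tuple[float, float]


def image_point(p0: Point3, vel: Point3, t: float) -> Point2:
    """Image coordinates (x_i(t), y_i(t)) from equation (3), with f = 1.

    p0  = (x_o, y_o, z_o): 3-D position at t = 0
    vel = (u, v, w): constant 3-D velocity (linear motion, no acceleration)
    """
    xo, yo, zo = p0
    u, v, w = vel
    z_t = zo + w * t  # depth at time t, from equation (2)
    if z_t <= 0:
        # Assumption of this sketch: the projection is only evaluated
        # while the point is in front of the camera (positive depth).
        raise ValueError("depth is not positive at this time")
    return ((xo + u * t) / z_t, (yo + v * t) / z_t)


def image_track(p0: Point3, vel: Point3, times: Iterable[float]) -> List[Point2]:
    """Image positions of the same 3-D point at several time instants."""
    return [image_point(p0, vel, t) for t in times]
```

For a point starting at (1, 2, 4) with velocity (0, 0, -1), the image position is (0.25, 0.5) at t = 0 and (0.5, 1.0) at t = 2: as the depth shrinks, the image coordinates grow in magnitude.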