Computer Vision Gaussian Kernel, and Controls the Amount of Smoothing
Total Page:16
File Type:pdf, Size:1020Kb
Recap: Smoothing with a Gaussian Recall: parameter σ is the “scale” / “width” / “spread” of the Computer Vision Gaussian kernel, and controls the amount of smoothing. Computer Science Tripos Part II … Dr Christopher Town Dr Chris Town Dr Chris Town Recap: Effect of σ on derivatives σ = 1 pixel σ = 3 pixels The apparent structures differ depending on Gaussian’s scale parameter. Larger values: larger scale edges detected Smaller values: finer features detected Dr Chris Town Dr Chris Town Dr Chris Town Dr Chris Town 1 Scale Invariant Detection Scale Invariant Detection • Consider regions (e.g. circles) of different sizes • The problem: how do we choose corresponding around a point circles independently in each image? • Regions of corresponding sizes will look the same in both images Dr Chris Town Dr Chris Town Scale Invariant Detection Scale Invariant Detection: Summary • Solution: – Design a function on the region (circle), which is “scale • Given: two images of the same scene with a large invariant” (the same for corresponding regions, even if scale difference between them they are at different scales) • Goal: find the same interest points independently f Image 1 f Image 2 in each image scale = 1/2 • Solution: search for extrema of suitable functions in scale and in space (over the image) Methods: region size region size 1. Harris-Laplacian [Mikolajczyk, Schmid]: maximise Laplacian over scale, Harris measure of corner response over the image 2. SIFT [Lowe]: maximise Difference of Gaussians over scale and space Dr Chris Town Dr Chris Town Image Matching Invariant local features -Algorithm for finding points and representing their patches should produce similar results even when conditions vary -Buzzword is “invariance” – geometric invariance: translation, rotation, scale – photometric invariance: brightness, exposure, … Feature Descriptors Dr Chris Town Dr Chris Town 2 Feature detection Scale invariant detection Suppose you’re looking for corners Local measure of feature uniqueness – How does the window change when you shift it? Key idea: find scale that gives local maximum response in both position and scale: use a Laplacian approximated by difference between two Gaussian filtered images with different sigmas) “flat” region: “edge”: “corner”: no change in all no change along significant change directions the edge direction in all directions Slide adapted from Darya Frolova, Denis Simakov, Weizmann Institute. Dr Chris Town Dr Chris Town Gaussian Pyramid The Gaussian Pyramid Low resolution G4 (G3 *gaussian) 2 All the extra levels blur add very little G3 (G2 * gaussian) 2 overhead for memory blur or computation! G2 (G1 * gaussian) 2 blur G1 (G0 *gaussian) 2 G Image blur0 High resolution Source: Irani & Basri Dr Chris Town Source: Irani & Basri Dr Chris Town The Laplacian Pyramid Laplacian ~ Difference of Gaussian Li Gi expand(Gi1) Gaussian Pyramid Laplacian Pyramid - = Gn Ln Gn G2 L - = 2 G DoG = Difference of Gaussians 1 L1 - = Cheap approximation – no derivatives needed. G0 L0 - = - = Why is this useful? 17 B. Leibe DrSource: Chris Irani Town& Basri Dr Chris Town 3 DoG approximation to LoG • We can efficiently approximate the (scale-normalised) Laplacian of a Gaussian with a difference of Gaussians: B. Leibe Dr Chris Town Dr Chris Town Scale-Space Pyramid • Multiple scales must be examined to identify scale-invariant features • An efficient function is to compute the Difference of Gaussian (DOG) pyramid (Burt & Adelson, 1983) Resample Blur Subtract Gaussian pyramid Dr Chris Town Dr Chris Town Laplacian pyramid algorithm x1 G1x1 x2 x2 x3 (I F3G3 )x3 (I F2G2 )x2 Notice that each layer shows detail at a particular scale --- these are, basically, bandpass filtered versions F G x of the image. 1 1 1 Laplacian pyramid Dr Chris Town (I F1G1)x1 Dr Chris Town 4 Showing, at full resolution, the information captured at each SIFT – Scale Invariant Feature Transform level of a Gaussian (top) and Laplacian (bottom) pyramid. http://www-bcs.mit.edu/people/adelson/pub_pdfs/pyramid83.pdf Dr Chris Town From: David Lowe (2004) Dr Chris Town DoG approximates scale-normalised Laplacian of a Gaussian (heat diffusion equation) Dr Chris Town Dr Chris Town Octave increment in scale of the Gaussian Pyramid followed by factor-of-two downsampling (for efficiency). To achieve better performance, each octave i is further divided into s intervals. Remember that we defined neighbouring scales as So starting with some , the next scale parameter will be , followed by etc., so that after s sub- levels of the pyramid we have a complete octave with Therefore Dr Chris Town Dr Chris Town 5 Dr Chris Town Dr Chris Town Key point localization with DoG Example of Keypoint Detection • Detect extrema of (a) 233x189 image difference-of-Gaussian (b) 832 DoG extrema (c) 729 left after peak (DoG) in scale space value threshold (d) 536 left after testing • Then reject points with ratio of principle curvatures (removing low contrast edge responses) (threshold) • Eliminate edge responses Candidate keypoints: list of (x,y,σ) Slide credit: David Lowe Dr Chris Town Slide credit: David Lowe Dr Chris Town Feature Descriptors: SIFT Rotation Invariant Descriptors • Scale Invariant Feature Transform • Find local orientation • Descriptor computation: – Dominant direction of gradient – Divide patch into 4x4 sub-patches: 16 cells for the image patch – Compute histogram of gradient orientations (8 reference • Rotate patch according to this angle angles) for all pixels inside each sub-patch – This puts the patches into a canonical orientation. – Resulting descriptor: 4x4x8 = 128 dimensions David G. Lowe. "Distinctive image features from scale-invariant keypoints.” IJCV 60 (2), pp. 91-110, 2004. Slide credit: Svetlana Lazebnik Dr Chris Town Slide credit: Svetlana Lazebnik, Matthew Brown Dr Chris Town 6 Orientation Normalisation: Computation Feature stability to noise • Match features after random change in image scale & [Lowe, SIFT, 1999] orientation, with differing levels of image noise • Compute orientation histogram • Find nearest neighbor in database of 30,000 features • Select dominant orientation • Normalise: rotate to fixed orientation 0 2p 37 Slide adapted from David Lowe Dr Chris Town Dr Chris Town Feature stability to affine change Distinctiveness of features • Match features after random change in image scale & • Vary size of database of features, with 30 degree affine change, orientation, with 2% image noise, and affine distortion 2% image noise • Find nearest neighbor in database of 30,000 features • Measure % correct for single nearest neighbor match Dr Chris Town Dr Chris Town Working with SIFT Descriptors SIFT • One image yields: – n 128-dimensional descriptors: each one is a histogram of the gradient orientations within a patch • [n x 128 matrix] – n scale parameters specifying the size of each patch • [n x 1 vector] – n orientation parameters specifying the angle of the patch • [n x 1 vector] – n 2D points giving positions of the patches • [n x 2 matrix] Slide credit: Steve Seitz Dr Chris Town D. Lowe, 2004 Dr Chris Town 7 Feature matching Image stitching Slides from Steve Seitz and Rick Szeliski Dr Chris Town Brown, Lowe, 2007 Dr Chris Town Nearest-neighbor matching Approximate k-d tree matching • Solve following problem for all feature vectors, x: Key idea: Search k-d tree bins in • Nearest-neighbour matching is the major computational order of distance from bottleneck query – Linear search performs dn2 operations for n features Requires use of a priority and d dimensions queue – No exact methods are faster than linear search for d>10 – Approximate methods can be much faster, but at the cost of missing some correct matches. Failure rate gets worse for large datasets. Dr Chris Town Dr Chris Town K-d tree construction K-d tree query Simple 2D example 4 l1 6 l9 7 l5 l1 4 l1 6 l9 7 l6 l5 l1 q 8 l l6 l2 5 l3 l2 3 8 l l2 5 l3 l2 3 9 10 3 l10 l8 l7 l l l l 9 10 4 5 7 6 l 3 l10 8 l7 l4 l5 l7 l6 1 2 l4 11 1 2 l8 2 5 4 11 l10 8 l9 l4 11 l8 2 5 4 11 l10 8 l9 1 3 9 10 6 7 1 3 9 10 6 7 Slide credit: Anna Atramentov Slide credit: Anna Atramentov Dr Chris Town Dr Chris Town 8 Recognition with Local Features Fourier transform • Image content is transformed into local features that are invariant to translation, rotation, and scale • Goal: Verify if they belong to a consistent = * configuration Fourier Fourier bases pixel domain transform are global: image each transform coefficient Local Features, depends on all e.g. SIFT pixel locations. 49 Slide credit: David Lowe Dr Chris Town Dr Chris Town Gaussian pyramid Laplacian pyramid = * = * Laplacian Gaussian pixel image pixel image pyramid pyramid Overcomplete representation. Overcomplete representation. Low-pass filters, sampled Transformed pixels represent appropriately for their blur. Dr Chris Town bandpassed image information. Dr Chris Town Edge Fitting • Edge Detection: – The process of labeling the locations in the image where the gray level’s “rate of change” is high. • OUTPUT: “edgels” locations, direction, strength • Edge Integration, Contour fitting: – The process of combining “local” and perhaps sparse and non- contiguous “edgel”-data into meaningful, long edge curves (or closed contours) for segmentation • OUTPUT: edges/curves consistent with the local data Dr Chris Town Dr Chris Town 9 Framework for snakes Modeling • The contour is defined in the (x, y) plane of an image as a • A higher level process or a user initialises any curve parametric curve close to the object boundary. v(s)=(x(s), y(s)) • The snake then starts deforming and moving towards the desired object boundary. • Contour is said to possess an energy (Esnake) which is defined as the sum of the three energy terms. • In the end it completely “shrink-wraps” around the EEEE object. courtesy snakeint ernal external constra int • The energy terms are defined in a way such that the final position of the contour will have minimum energy (Emin) • Therefore our problem of detecting objects reduces to an energy minimisation problem.