Achieving scale covariance Blobs (and scale selection) • Goal: independently detect corresponding regions in scaled versions of the same image • Need scale selection mechanism for finding characteristic region size that is covariant with the image transformation

S.Lazebnik, UNC S.Lazebnik, UNC

Blob detection in 2D in 2D Laplacian of Gaussian: Circularly symmetric Laplacian of Gaussian: Circularly symmetric operator for blob detection in 2D operator for blob detection in 2D

2 2 2  g  g  of Gaussian controls the 2 2  g   radius of the operator   g  g   x2 y2 Scale-normalized 2 g   2      norm   2 2   x y  need this to make filter response insensitive to the scale  S.Lazebnik, UNC S.Lazebnik, UNC

LoG Blob Finding and Scale Scale Selection Lapacian of Gaussian (LoG) filter extrema locate “blobs” maxima = dark blobs on light background minima = light blobs on dark background

Scale of blob (size ; radius in pixels) is determined by the sigma parameter of the LoG filter.

“Laplacian” operator.

LoG sigma = 2 LoG sigma = 10 Scale selection Scale selection • At what scale does the Laplacian achieve a maximum • At what scale does the Laplacian achieve a maximum response to a binary circle of radius r? response to a binary circle of radius r? • To get maximum response, the zeros of the Laplacian have to be aligned with the circle • The Laplacian is given by (up to scale): 2 2 2 (x2  y 2  2 2 ) e( x  y 2/)  • Therefore, the maximum response occurs at   r / .2

r r

circle

image Laplacian image Laplacian

Characteristic scale Scale-space blob detector • We define the characteristic scale of a blob 1. Convolve image with scale-normalized as the scale that produces peak of Laplacian Laplacian at several scales response in the blob center 2. Find maxima of squared Laplacian response in scale-space

characteristic scale

T. Lindeberg (1998). "Feature detection with automatic scale selection." International Journal of 30 (2): pp 77--116.

Scale-space blob detector: Example Scale-space blob detector: Example

S.Lazebnik, UNC S.Lazebnik, UNC Scale-space blob detector: Example Efficient implementation Approximating the Laplacian with a :

2 L   Gxx (,,)x y   Gyy (,,)x y   (Laplacian)

DoG G(,, x y k )  G(,,) x y  (Difference of Gaussians)

Why do this: 2D is separable into two 1D filters, making it more efficient to compute.

S.Lazebnik, UNC

Structure Feature Descriptors

Per-pixel transformations Most features descriptors can be thought of either: Interest points: edges, corners, blobs Templates Feature descriptors • Intensity, gradients, etc.

Histograms • Color, texture, SIFT descriptors, etc.

or combinations of both

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 1515 James Hays, Brown U.

Textons Computing Textons

An attempt to characterize texture Take a bank of filters and apply Cluster in filter space Replace each pixel with integer representing the texture ‘type’ to lots of images

For new pixel, filter surrounding region with same filter bank, and assign to nearest cluster

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 17 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 18 Bag of words descriptor Sift Descriptor Goal: produce a scale and rotation invariant vector that describes the region around an interest point. • Compute visual features in image • Compute descriptor around each All calculations are relative to the orientation and • Find closest match in library and assign index scale of the keypoint • Compute histogram of these indices over the region • Dictionary computed using K-means Makes descriptor invariant to rotation and scale

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 19 Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 20

HoG Descriptor HOG=Histogram of Oriented Gradients Shape Context Descriptor

Count the number of points inside each bin, e.g.:

Count = 4 . . . Count = 10

Log-polar binning: more precision for nearby points, more flexibility for farther points.

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 21 Belongie & Malik, ICCV 2001 K. Grauman, B. Leibe

Shape Context Descriptor SIFT Detector/Descriptor

Algorithm Steps

Generating Gaussian and Let’s look “under the hood” DOG pyramid

at one of the most popular Locating scale-space extrema image feature detectors. Eliminating edge responses (related to Harris corners) Computing patch orientation

Spatial histogram of gradients References: •Lowe, “Distinctive Image Features from Scale Invariant Keypoints” International Journal of Computer Vision, Vol.60(2):91-110, 2004. •Crowley et.al., “Fast Computation of Characteristic Scale using a Half- Octave Pyramid”, Cognitive Vision Workshop, Zurich Switzerland, Sept 19-20, 2002.

K. Grauman, B. Leibe Generating a Gaussian Pyramid Using a Half-Octave Pyramid From Crowley et.al., “Fast Computation of Characteristic Scale using a Half-Octave Pyramid.”. General idea: cascaded filtering using [1 4 6 4 1] kernel (sigma = 1) to Basic Functions: generate a pyramid with two images per octave (power of 2 change in resolution). When we reach a full octave, Blur (convolve with Gaussian downsample the image. to smooth image)

DownSample (reduce image size by half)

Upsample (double image size)

Crowley etal.

Half Octave Gaussian and DOG Half Octave Gaussian and DOG L1 L2

blur blur blur blur blur blur tmp tmp G5 G1 G2 downsample downsample downsample downsample

blur blur blur G1,L1: N x M blur blur blur tmp G2,L2: N x M G3 G4 tmp G3,L3: N/2 x M/2 G4,L4: N/2 x M/2 G5,L5: N/4 x M/4 and so on… L3 L4

Find Extrema in Space and Scale Filtering out edge responses Either Using Harris corner matrix DOG level L-1 (2x2 matrix computed from first partial image ) Scale DOG level L H

Or use (2x2 matrix computed from DOG level L+1 second partial image derivatives)

Space Dxx Dxy

Hint: when finding maxima or minima at level L, Dxy Dyy we must DownSample or UpSample as necessary to make DOG images at level L-1 and L+1 the same size as L. Lowe’s notation, Section 4.1 Filtering out edge responses Any Significant Difference?

Would be removed by Harris method Would be removed by Hessian method

Not much difference

Since the goal is filtering out edges, I like this one better

Yellow=extrema, blue=removed by Harris corners, red=removed by Hessian

Computing Patch Orientation Which Gaussian is Closest in Scale?

L Sigma = 1.18  (Crowley paper)

G(i) G(i+1) Sigma =  Sigma = sqrt(2) 

Which Gaussian is Closest in Scale? Computing Orientation

L Sigma = 1.18  (Crowley paper)

G(i) G(i+1) Sigma =  Sigma = sqrt(2)  G(I) is closest in scale to L Gradient Magnitude and Angle Computing Orientation

Dx Dy

M 

Sampling/Weighting over a Circular Region Example

Gaussian-weighted circular window == 2D Gaussian kernel.

Weight magnitude by Gaussian kernel.

What size kernel?

Lowe suggests sigma value of weighting kernel should be 3 times bigger than scale of the keypoint. Design decision: make the gaussian kernel have sigma =3, in the pixel coordinate system of G(I), the image from the 19x19 Gaussian pyramid that magnitude and angle were computed with.

Plus or minus 3 sigma, with sigma = 3, gives •Keypoint location = extrema location a width of 19 pixels. •Keypoint scale is scale of the DOG image

Example (continued) Example (continued)

gradient magnitude

.* =

gradient weighted by 2D weighted gradient magnitude gaussian kernel magnitude

gaussian image (at closest scale, from pyramid) gradient orientation Example (continued) Computing Orientation weighted gradient magnitude weighted orientation histogram. Each bucket contains sum of weighted gradient magnitudes corresponding to angles that fall within that bucket.

gradient orientation

36 buckets 10 degree range of angles in each bucket, i.e. 0 <=ang<10 : bucket 1 10<=ang<20 : bucket 2 20<=ang<30 : bucket 3 …

Example (continued) Example (continued) weighted gradient magnitude There may be multiple orientations. weighted orientation histogram.

peak Second peak peak 80% of peak value 80% of peak value

gradient orientation

In this case, generate duplicate SIFT patches, one 20-30 degrees with orientation at 25 degrees, one at 155 degrees.

Orientation of keypoint Design decision: one might want to limit number of is approximately 25 degrees possible multiple peaks to two.

Forming the Sift Descriptor Spatial Histogram of Gradients Computed on rotated and scaled version of window according to computed orientation & scale • resample the window Based on gradients weighted by a Gaussian of variance half the window (for smooth falloff)

2. Pool into local histograms 1. Compute image gradients 3. Concatenate histograms 4. Normalize histograms

Computer vision: models, learning and inference. ©2011 Simon J.D. Prince 47 Spatial Histogram of Gradients Ensure smoothness 4x4 array of gradient orientation histograms Gaussian weight • each orientation “increment” is weighted by magnitude Trilinear interpolation 8 orientations x 4x4 array = 128 dimensions • a given gradient contributes to 8 bins: Motivation: some sensitivity to spatial layout, but not too 4 in space times 2 in orientation much.

showing only 2x2 here but is 4x4

Reduce effect of illumination SIFT Invariance and covariance properties 128-dim vector normalized to 1 Threshold gradient magnitudes to avoid excessive influence • Laplacian blob response (detection) is invariant of high gradients w.r.t. rotation and scaling • after normalization, clamp gradients >0.2 • Blob location covariant w.r.t. rotation and scale • renormalize • Normalized patch is invariant wrt rotation/scale • Normalized vector is insensitive to illumination

• Not invariant/covariant with respect to affine transformations (such as due to small changes in viewing angle)

S.Lazebnik, UNC

Achieving affine covariance Affine adaptation example Consider the second moment matrix of the window containing the blob:  I 2 I I   0  M w(x, y) x x y R1 1 R    2     x, y I x I y I y   0 2 

direction of the fastest change

Recall: direction of the slowest change u -1/2 (max) [u v] M    const -1/2 v (min)

This ellipse visualizes the “characteristic shape” of the window Scale-invariant regions (blobs)

S.Lazebnik, UNC S.Lazebnik, UNC Affine adaptation example Affine adaptation • Problem: the second moment “window” determined by weights w(x,y) must match the characteristic shape of the region • Solution: iterative approach • Use a circular window to compute second moment matrix • Perform affine adaptation to find an ellipse-shaped window • Recompute second moment matrix using new window and iterate

Affine-adapted blobs

S.Lazebnik, UNC S.Lazebnik, UNC

Iterative affine adaptation Affine covariance • Affinely transformed versions of the same neighborhood will give rise to ellipses that are related by the same transformation • What to do if we want to compare these image regions? • Affine normalization: transform these regions into same- size circles

K. Mikolajczyk and C. Schmid, Scale and Affine invariant interest point detectors, IJCV 60(1):63-86, 2004.

http://www.robots.ox.ac.uk/~vgg/research/affine/ S.Lazebnik, UNC

Affine normalization Eliminating rotation ambiguity • Problem: There is no unique transformation from an • To assign a unique orientation to circular ellipse to a unit circle image windows: • We can rotate or flip a unit circle, and it still stays a unit circle • Create histogram of local gradient directions in the patch • Assign canonical orientation at peak of smoothed histogram

0 2 

S.Lazebnik, UNC S.Lazebnik, UNC From covariant regions to invariant features

Eliminate rotational Compute appearance Extract affine regions Normalize regions ambiguity descriptors

SIFT (Lowe ’04)

S.Lazebnik, UNC