Local descriptors

SIFT (Scale Invariant Feature Transform) descriptor

• SIFT keypoints at locaon xy and scale σ have been obtained according to a procedure that guarantees illuminaon and scale invance. By assigning a consistent orientaon, the SIFT keypoint descriptor can be also orientaon invariant.

• SIFT descriptor is therefore obtained from the following steps: – Determine locaon and scale by maximizing DoG in scale and in space (i.e. find the blurred image of closest scale) – Sample the points around the keypoint and use the keypoint local orientaon as the dominant gradient direcon. Use this scale and orientaon to make all further computaons invariant to scale and rotaon (i.e. rotate the gradients and coordinates by the dominant orientaon) – Separate the region around the keypoint into subregions and compute a 8-bin gradient orientaon histogram of each subregion weigthing samples with a σ = 1.5 Gaussian SIFT aggregaon window

• In order to derive a descriptor for the keypoint region we could sample intensies around the keypoint, but they are sensive to lighng changes and to slight errors in x, y, θ. In order to make keypoint esmate more reliable, it is usually preferable to use a larger aggregaon window (i.e. the Gaussian kernel size) than the detecon window.

The SIFT descriptor is hence obtained from thresholded image gradients sampled over a 16x16 array of locaons in the neighbourhood of the detected keypoint, using the level of the Gaussian pyramid at which the keypoint was detected. For each 4x4 region samples they are accumulated into a gradient orientaon histogram with 8 sampled orientaons.

• In total it results into a 4x4x8 = 128 dimensional feature vector

4x4 neighbourhood region 16x16 keypoint neighbourhood A a keypoint

ass 8 sampled orientations

SIFT Gradient orientaon histogram

• The gradient magnitudes are downweighted by a Gaussian funcon (red circle) in order to reduce the influence of gradients far from the center, as these are more affected by small misregistraons (smoothed by a Gaussian funcon with σ equal to 1.5 the keypoint scale).

• The 8-bin gradient orientaon histogram in each 4 × 4 quadrant, a is formed by soly adding the weighted gradient magnitudes.

Orientation histogram

Dominant direction

So distribuon of values to adjacent histogram bins is performed by trilinear interpolaon to reduce the effects of locaon and dominant orientaon misesmaon

Image from: Jonas Hurrelmann SIFT illuminaon invariance

• The keypoint descriptor is normalized to unit length to make it invariant to intensity change, i.e. to reduce the effects of contrast or gain.

• To make the descriptor robust to other photometric variaons, gradient magnitude values are clipped to 0.2 and the resulng vector is once again renormalized to unit length. • SIFT has been empirically found to show very good performance, invariant to image rotaon, scale, intensity change, and to moderate affine transformaons [Mikolajczyk & Schmid 2005]’s extensive survey : 80% Repeatability at: 10% image noise 45° viewing angle 1k-100k keypoints in database

Color SIFT descriptors

• Local descriptors like SIFT are usually based only on luminance and shape, so they use grey-scale values and ignore color, mainly because it is very difficult to select a color model that it sufficiently robust and general. Nevertheless, color is very important to describe/disnguish objects or scenes

• Different types of descriptors can be combined to improve representaon; the most common combinaon is between a local shape-descriptor (e.g. SIFT) and a color descriptor (e.g. color histogram in a “smart” color space like Luv or HSV)

An example of color-SIFT (sparse) descriptor (van de Weijer and Schmid, ECCV 2006). The combined descriptor is obtained by fusion of standard SIFT and Hue descriptor

Courtesy J. van de Weijer Dense gray and color SIFT

• SIFT descriptors can also be taken at fixed locaons by defining a regular grid over the image. In this case at each center point, 128-d SIFT descriptors are computed. In this case the descriptor account for the distrubuon of the gradient orientaon but does not has as obtained from the DoG detector. • Mulple descriptors are therefore computed to allow for scale variaon

128-d SIFT

(128 x 3)-d SIFT SIFT descriptors in numbers

• 1 patch (1 SIFT descriptor) = 128 float = 128 * 4 byte (32bit) = 512 byte.

• 1 image 320x240 well textured ~ 600 keypoints= 600*512 byte = 307200 byte = 300 KB.

• This presentaon in memory= 51 slides = 300 KB*51 = 15300 KB ~ 15 MB.

….. SIFT: implementaons available

• SIFT: available implementaons: − David Lowe’s: first code of SIFT algorithm by its creator (only binary). − OpenCV 2.2 implementaon - wrapper from Vedaldi code. SIFT original Lowe’s algorithm is quite slow (~6 sec for an image of size 1280x768): it is computaonally expensive and copyrighted

− Implementaon by Rob Hess. Best ever − entaons:

A C B

References: A.hp://www.cs.ubc.ca/~lowe/keypoints/ B.hp://opencv.willowgarage.com/documentaon/cpp/features2d_feature_detecon_and_descripon.html#si C.hp://blogs.oregonstate.edu/hess/code/si/ Rob Hess’ SIFT implementaon

Features: • Best Open Source SIFT implementaon in terms of speed, efficiency and similarity compared to Lowe’s binary. • SIFT library is wrien in C with versions available for both Linux and Windows • Easy to integrate with an OpenCV Project.

Contains a suite of algorithms: •SIFT keypoint detecon/descripon •KD-tree matching •Robust plane-to-plane trasformaon

References: 1.hps://web.engr.oregonstate.edu/%7Ehess/publicaons/silib-acmmm10.pdf 2.hp://blogs.oregonstate.edu/hess/code/si/

Compile and Install: • Unix plaorms are preferred • On a debian-based simply: 1. sudo apt-get install build-essenal libgtk2.0-dev libcv-dev libcvaux-dev 2. cd / && make all 3. ./bin/sifeat -h

Output: Usage: sifeat [opons] Opons: -h Display this message and exit -o Output keypoints to text file -m Output keypoint image file (format determined by extension) -i Set number of sampled intervals per octave in pyramid (default 3) -s Set sigma for inial gaussian smoothing at each octave (default 1.6000) -c Set threshold on keypoint contrast |D(x)| based on [0,1] pixel values (default 0.1400) -r Set threshold on keypoint rao of principle curvatures (default 10) -n Set width of descriptor histogram array (default 4) -b Set number of bins per histogram in descriptor array (default 8) -d Toggle image doubling (default on) -x Turn off keypoint display

SIFT parameters to tune ( in si.h or as argument of ./bin/sifeat) :

− SIFT_CONTR_THR [0.04] is the default threshold on keypoint contrast |D(x)|. To high values correspond less but stronger keypoints and vice versa.

−SIFT_DESCR_HIST_BINS [8] is the default number of bins per histogram in descriptor array. Trade-off between disncveness and efficiency.

−SIFT_IMG_DBL [0/1] tells if the image must be resized twice before keypoint localizaon. Increase the detectable keypoints lowering performance.

NN Matching parameters to tune ( in ./src/match.c) :

− KDTREE_BBF_MAX_NN_CHKS [200] is the maximum number of keypoint NN candidates to check during BBF search.

− NN_SQ_DIST_RATIO_THR [0.49] is the threshold on squared rao of distances between 1-st NN and 2-nd NN. Used to reject noisy matches in high dimensions.

SIFT alternaves

• GLOH (Gradient Locaon and Orientaon Histogram). – larger inial descriptor + PCA, TPAMI 2005 • SURF (Speeded Up Robust Features) – faster than SIFT and somemes more robust. ECCV 2006. (343, Google citaons) • GIST – Rapid Biologically-Inspired Scene Classificaon Using Features Shared with Visual Aenon, TPAMI 2007 • LESH – Head Pose Esmaon In Face Recognion Across Pose Scenarios, VISAPP 2008 • PCA-SIFT – A More Disncve Representaon for Local Image Descriptors, CVPR 2004 • Spin Image – Sparse Texture Representaon Using Affine-Invariant Neighborhoods, CVPR 2003 GLOH (Gradient Locaon and Orientaon Histogram) descriptor

• GLOH is a method for local shape descripon very similar to SIFT introduced by Miko in 2005

• Differently from SIFT it employs a log-polar locaon grid: – 3 bins in radial direcon – 8 bins in angular direcon – 16 bins for Gradient orientaon quanzaon

• The GLOH descriptor is therefore a higher dimensional vector with a total of 17 (i.e. 2x8+1) * 16 = 272 bins. PCA dimension reducon is employed in the vector representaon space

Example

16 dim

2x8 +1 dim

GLOH SURF (Speed Up Robust Features) descriptor

• SURF is a performant scale and rotaon invariant interest point detector and descriptor. It approximates or even outperforms SIFT and other local descriptors with respect to repeatability, disncveness, and robustness, yet can be computed and compared much faster

• This is achieved by: − relying on integral images for image convoluons − building on the strengths of the leading exisng detectors and descriptors (using a -based measure for the detector, and a distribuon-based descriptor) − simplifying these methods to the essenal

− The approach of SURF is similar to that of SIFT but ...

– Integral images in conjuncon with Haar are used to increase robustness and decrease computaon me − Instead of iteravely reducing the image size (like in the SIFT approach), the use of integral images allows the up-scaling of the filter at constant cost

“SIFT approach” “SURF approach”

− The SURF descriptor computaon can be divided into two tasks: − Orientaon assignment − Extracon of descriptor components SURF integral images

• Much of the performance increase in SURF can be aributed to the usage of Integral Images: – Integral Image is computed rapidly from an input image I – Then it is used to speed-up the calculaon of the area of any upright rectangular area

Using this representaon, the cost of computaon for a Given an input image I and a point (x,y) the integral rectangular area is only 4 addions image I∑ is calculated by the sum of the pixel intensies between the point and the origin:

• The integral image ii (x,y) at locaon x,y is an intermediate representaon of the image i (x,y) that contains all the pixels above and to the le of xy

ii(x, y) = !i(x', y') x'"x, y'" y It can be computed in one pass over the original image integraon along rows s(x, y) = s(x, y !1) + i(x, y) integraon along columns ii(x, y) = ii(x !1, y) + s(x, y)

s(x,!1) = 0 ii( ! 1, y) = 0 where s(x,y) is the cumulave row sum.

Integral Image ii(x,y)

x (0,0)

s(x,y) = s(x,y-1) + i(x,y)

y (x-1,y) (x,y) ii(x,y) = ii(x-1,y) + s(x,y)

Using the integral image representaon one can compute the value of any rectangular sum in constant me. For example the integral sum inside rectangle D is computed as: ii(4) + ii(1) – ii(2) – ii(3) SURF fast Hessian Detecon

• SURF detector approximates the the second-order Gaussian derivaves of the image at point x, also referred to as Laplacian of Gaussians (LoG) by using box filter representaons of the Gaussian kernels (differently from SIFT that uses instead Difference of Gaussians (DoG)).

• In the SURF approach interest points are detected at locaons where the determinant of the Hessian Matrix is maximum. SURF permits a very efficient implementaon, making good use of integral images to perform fast convoluons of varying size box filters (at near costant me)

The Hessian matrix H is calculated as a funcon of both space and scale:

Lxx(x, σ) , Lyy(x, σ), Lxy(x, σ) are the second-order Gaussian derivaves of the image at point x (LoGs) SURF LoG approximaon

• Approximated second order derivaves (LoGs) with box filters:

0

0

The 9x9 box filters in the figure are approximaons of Lyy(x, σ), Lxy(x, σ) second-order Gaussian derivaves of the image at point x. A Gaussian with σ=1.2 is considered that represents the lowest scale

• Haar-Wavelets are simple filters which can be used to find gradients in x and y direcons. For each template, the corresponding feature value is the sum of the pixels' intensity lying under the black part, minus the sum of the pixels' intensity lying under the white part: x response y response

-1 When used with integral images each requires 6 operaons to compute +1 +1 -1 SURF rientaon assignment

− In order to achieve invariance to image rotaon each detected interest point is assigned a reproducible orientaon. To determine the orientaon, Haar wavelet responses of 4σ are calculated for a set of pixels within radius 6σ of the detected point: – The Haar wavelet responses are represented as vectors. – The dominant orientaon is esmated by calculang the sum of all responses within a sliding orientaon window of 60 degrees.

Circular neighborhood of radius 6σ

Sliding orientaon window: the longest vector is the dominant orientaon

(x,y,σ) SURF extracon of descriptor components

• To extract descriptor components, a square window of size 20σ is taken around the interest point oriented along the dominant orientaon. • The window is divided into 4x4 regular subregions • Haar wavelets of size 2σ are calculated for 25 regularly distributed sample points in each subregion • The x and y wavelet responses (dx, dy) are collected for each subregion according to:

• Each subregion therefore contributes 4 values to the SURF descriptor vector leading to an overall vector of length 4x4x4 = 64

The green square bounds one of the 16 subregions and blue circles represent the sample points at which we compute the wavelet responses

x ad y responses are calculated relave to the dominant orientaon SURF computaonal cost

• SURF computaonal cost (detector and descriptor) against the most-used detectors and the SIFT descriptor: SURF implementaons available

A. Herbert Bay code (creator). Library and source code. B. OpenCV implementaon. C. OpenSurf1 at google code. See matcher_simple.cpp in samples/cpp in OpenCV 2.2

A B C

References: A.h p://www.vision.ee.ethz.ch/~surf/ B.hp://opencv.willowgarage.com/documentaon/cpp/features2d_feature_detecon_and_descripon.html#surf C.hp:// code.google.com/p/opensurf1/ SURF alternave versions

• U-SURF (Upright version) – For some applicaons rotaon invariance is not necessary. – U-SURF (Upright version) of SURF does not implement the orientaon assignment step. – It is faster while maintaining a robustness to rotaon of about +/- 15 degrees

• SURF-128 – SURF-128 implements a descriptor vector of 128 length. – The sum of dx and |dx| are computed separately for dy < 0 and dy > 0 (and so for dy and |dy|) – It is more precise and not much slower to compute, although slower to match Comparave table for invariance of main descriptors

Only moderate

Van Gool SURF Only ‘06 moderate

Fei-Fei Li Affine Invariant Descriptors

Fit an ellipse to the auto-correlaon (using eigenvalue analysis) and then use the principal axes and raos of this fit as the affine coordinate frame. The square root of the moment matrix can be used to transform local patches into a frame which is similar up to rotaon.

• Find affine normalized frame

A T T Σ1 =pp Σ2 =qq

−1 AAT A −1 T Σ111= 1 A2 Σ222=AA

rotation

• Compute rotaonal invariant descriptor in this normalized frame • Intensity domain spin image (2-D histogram of brightness) in the affine-normalized patch.

• Affine invariant color moments is another affine invariant descriptor:

mxyRxyGxyBxydxdyabc= p q a(, ) b (, ) c (, ) pq ∫ region

Different combinaons of these moments are fully affine invariant Also invariant to affine transformaon of intensity I → a I + b

F.Mindru et.al. “Recognizing Color Paerns Irrespecve of Viewpoint and Illuminaon”. CVPR99