Local Descriptors

SIFT (Scale Invariant Feature Transform) descriptor

• SIFT keypoints at location (x, y) and scale σ have been obtained by a procedure that guarantees illumination and scale invariance. By assigning a consistent orientation, the SIFT keypoint descriptor can also be made orientation invariant.
• The SIFT descriptor is therefore obtained from the following steps:
– Determine location and scale by maximizing the DoG response in scale and in space (i.e. find the blurred image of closest scale).
– Sample the points around the keypoint and use the keypoint's local orientation as the dominant gradient direction. Use this scale and orientation to make all further computations invariant to scale and rotation (i.e. rotate the gradients and coordinates by the dominant orientation).
– Separate the region around the keypoint into subregions and compute an 8-bin gradient orientation histogram for each subregion, weighting the samples with a σ = 1.5 Gaussian.

SIFT aggregation window

• To derive a descriptor for the keypoint region we could simply sample the intensities around the keypoint, but intensities are sensitive to lighting changes and to slight errors in x, y, θ. To make the keypoint estimate more reliable, it is usually preferable to use a larger aggregation window (i.e. Gaussian kernel size) than the detection window. The SIFT descriptor is hence obtained from thresholded image gradients sampled over a 16×16 array of locations in the neighbourhood of the detected keypoint, using the level of the Gaussian pyramid at which the keypoint was detected. The samples in each 4×4 region are accumulated into a gradient orientation histogram with 8 sampled orientations.
• In total this results in a 4×4×8 = 128-dimensional feature vector.

[Figure: a 16×16 keypoint neighbourhood split into 4×4 subregions, each described by 8 sampled orientations.]

SIFT gradient orientation histogram

• The gradient magnitudes are downweighted by a Gaussian function (red circle in the slide figure) in order to reduce the influence of gradients far from the center, as these are more affected by small misregistrations (the weighting Gaussian has σ equal to 1.5 times the keypoint scale).
• The 8-bin gradient orientation histogram in each 4×4 quadrant is formed by softly adding the weighted gradient magnitudes. Soft distribution of values to adjacent histogram bins is performed by trilinear interpolation to reduce the effects of location and dominant-orientation misestimation (see the sketch below).

[Figure: orientation histogram and dominant direction. Image from: Jonas Hurrelmann]

SIFT illumination invariance

• The keypoint descriptor is normalized to unit length to make it invariant to intensity change, i.e. to reduce the effects of contrast or gain.
• To make the descriptor robust to other photometric variations, gradient magnitude values are clipped to 0.2 and the resulting vector is once again renormalized to unit length.
• SIFT has been empirically found to show very good performance: invariant to image rotation, scale, and intensity change, and robust to moderate affine transformations. [Mikolajczyk & Schmid 2005]'s extensive survey reports 80% repeatability at 10% image noise, at a 45° viewing angle, and with 1k–100k keypoints in the database.
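To make the 4×4×8 layout and the Gaussian weighting above concrete, here is a minimal NumPy sketch of the accumulation step. It assumes a 16×16 patch of gradient magnitudes and orientations already rotated to the dominant orientation, and it soft-bins in orientation only; real implementations (e.g. Lowe's) interpolate trilinearly over x, y and θ as well. All names are illustrative, not taken from any particular library.

import numpy as np

def sift_descriptor_sketch(mag, ori):
    """Toy 128-d SIFT-like descriptor from a 16x16 gradient patch.

    mag, ori: 16x16 arrays of gradient magnitude and orientation
    (radians in [0, 2*pi), measured relative to the keypoint's
    dominant orientation)."""
    assert mag.shape == ori.shape == (16, 16)

    # Gaussian aggregation window centered on the patch: downweights
    # gradients far from the keypoint, which suffer most from small
    # misregistrations.
    ys, xs = np.mgrid[0:16, 0:16] - 7.5
    wmag = mag * np.exp(-(xs**2 + ys**2) / (2 * 8.0**2))

    hist = np.zeros((4, 4, 8))           # 4x4 cells x 8 orientation bins
    bin_width = 2 * np.pi / 8
    for y in range(16):
        for x in range(16):
            # Which 4x4 spatial cell this sample belongs to.
            cy, cx = y // 4, x // 4
            # Soft-assign the magnitude to the two nearest orientation
            # bins (orientation-only stand-in for trilinear interpolation).
            b = ori[y, x] / bin_width - 0.5
            b0 = int(np.floor(b)) % 8
            frac = b - np.floor(b)
            hist[cy, cx, b0] += wmag[y, x] * (1 - frac)
            hist[cy, cx, (b0 + 1) % 8] += wmag[y, x] * frac

    return hist.ravel()                  # 4 x 4 x 8 = 128 dimensions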
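The illumination-invariance steps above (unit-normalize, clip at 0.2, renormalize) map directly onto a few lines; again a sketch with illustrative names:

import numpy as np

def normalize_descriptor(desc, clip=0.2, eps=1e-7):
    """Normalize-clip-renormalize, as described above."""
    desc = desc / (np.linalg.norm(desc) + eps)   # invariance to contrast/gain
    desc = np.minimum(desc, clip)                # damp large gradient magnitudes
    return desc / (np.linalg.norm(desc) + eps)   # back to unit length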
Color SIFT descriptors

• Local descriptors like SIFT are usually based only on luminance and shape: they use grey-scale values and ignore color, mainly because it is very difficult to select a color model that is sufficiently robust and general. Nevertheless, color is very important for describing and distinguishing objects or scenes.
• Different types of descriptors can be combined to improve the representation; the most common combination is a local shape descriptor (e.g. SIFT) with a color descriptor (e.g. a color histogram in a "smart" color space like Luv or HSV).
• An example of a sparse color-SIFT descriptor is given by van de Weijer and Schmid (ECCV 2006): the combined descriptor is obtained by fusion of the standard SIFT descriptor and a Hue descriptor. (Courtesy J. van de Weijer)

Dense gray and color SIFT

• SIFT descriptors can also be taken at fixed locations by defining a regular grid over the image; a 128-d SIFT descriptor is then computed at each grid center. In this case the descriptor accounts for the distribution of gradient orientations but lacks the scale invariance obtained from the DoG detector.
• Multiple descriptors per location are therefore computed to allow for scale variation (a grid-sampling sketch appears after the implementation notes below). For gray images the descriptor is 128-d; for color images it is (128 × 3)-d, one per channel.

SIFT descriptors in numbers

• 1 patch (1 SIFT descriptor) = 128 floats = 128 × 4 bytes (32 bit) = 512 bytes.
• 1 well-textured 320×240 image ≈ 600 keypoints = 600 × 512 bytes = 307200 bytes = 300 KB.
• This presentation in memory = 51 slides = 300 KB × 51 = 15300 KB ≈ 15 MB …

SIFT: implementations available

− David Lowe's: the first code of the SIFT algorithm, by its creator (binary only). Lowe's original algorithm is quite slow (~6 s for a 1280×768 image): it is computationally expensive, and the code is copyrighted.
− OpenCV 2.2 implementation: a wrapper around Vedaldi's code.
− Implementation by Rob Hess: the best open implementation available.

References:
A. http://www.cs.ubc.ca/~lowe/keypoints/
B. http://opencv.willowgarage.com/documentation/cpp/features2d_feature_detection_and_description.html#sift
C. http://blogs.oregonstate.edu/hess/code/sift/

Rob Hess' SIFT implementation

Features:
• Best open-source SIFT implementation in terms of speed, efficiency and similarity to Lowe's binary.
• The SIFT library is written in C, with versions available for both Linux and Windows.
• Easy to integrate with an OpenCV project.
• Contains a suite of algorithms: SIFT keypoint detection/description, KD-tree matching, robust plane-to-plane transformation.

References:
1. https://web.engr.oregonstate.edu/%7Ehess/publications/siftlib-acmmm10.pdf
2. http://blogs.oregonstate.edu/hess/code/sift/

Compile and install:

• Unix platforms are preferred. On a Debian-based system, simply:
1. sudo apt-get install build-essential libgtk2.0-dev libcv-dev libcvaux-dev
2. cd <path-to-sift>/ && make all
3. ./bin/siftfeat -h

Output:

Usage: siftfeat [options] <img_file>
Options:
-h                Display this message and exit
-o <out_file>     Output keypoints to text file
-m <out_img>      Output keypoint image file (format determined by extension)
-i <intervals>    Set number of sampled intervals per octave in scale space pyramid (default 3)
-s <sigma>        Set sigma for initial gaussian smoothing at each octave (default 1.6000)
-c <thresh>       Set threshold on keypoint contrast |D(x)| based on [0,1] pixel values (default 0.1400)
-r <thresh>       Set threshold on keypoint ratio of principal curvatures (default 10)
-n <width>        Set width of descriptor histogram array (default 4)
-b <bins>         Set number of bins per histogram in descriptor array (default 8)
-d                Toggle image doubling (default on)
-x                Turn off keypoint display

SIFT parameters to tune (in sift.h or as arguments of ./bin/siftfeat):

− SIFT_CONTR_THR [0.04] is the default threshold on keypoint contrast |D(x)|. Higher values yield fewer but stronger keypoints, and vice versa.
− SIFT_DESCR_HIST_BINS [8] is the default number of bins per histogram in the descriptor array: a trade-off between distinctiveness and efficiency.
− SIFT_IMG_DBL [0/1] tells whether the image must be doubled in size before keypoint localization. Doubling increases the number of detectable keypoints while lowering performance.

NN matching parameters to tune (in ./src/match.c):

− KDTREE_BBF_MAX_NN_CHKS [200] is the maximum number of keypoint NN candidates to check during BBF search.
− NN_SQ_DIST_RATIO_THR [0.49] is the threshold on the squared ratio of distances between the 1st NN and the 2nd NN, used to reject noisy matches in high dimensions.
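The squared-distance ratio test amounts to a few lines of code. The sketch below uses brute-force search as a stand-in for Hess's BBF kd-tree and keeps the 0.49 threshold (i.e. 0.7² on plain distances); the function name and shapes are illustrative:

import numpy as np

def ratio_test_match(desc1, desc2, sq_dist_ratio_thr=0.49):
    """Match descriptor sets (N1 x 128 and N2 x 128 arrays) with the
    squared 1st-NN / 2nd-NN distance ratio test."""
    matches = []
    for i, d in enumerate(desc1):
        # Squared Euclidean distances to every descriptor in desc2.
        dists = np.sum((desc2 - d) ** 2, axis=1)
        nn1, nn2 = np.argsort(dists)[:2]
        # Accept only if the best match is clearly better than the
        # runner-up; ambiguous matches are rejected as noise.
        if dists[nn1] < sq_dist_ratio_thr * dists[nn2]:
            matches.append((i, nn1))
    return matches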
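And here is the grid-sampling sketch promised in the dense-SIFT section above, using OpenCV (cv2.SIFT_create is available in OpenCV ≥ 4.4; the step and keypoint sizes are arbitrary illustrative choices). Several keypoint sizes per grid point stand in for the scale invariance a DoG detector would otherwise provide:

import cv2

def dense_sift(gray, step=8, sizes=(8, 16, 24)):
    """Compute 128-d SIFT descriptors on a regular grid over a
    grayscale image, at several fixed scales per grid point."""
    keypoints = [
        cv2.KeyPoint(float(x), float(y), float(s))
        for y in range(step, gray.shape[0] - step, step)
        for x in range(step, gray.shape[1] - step, step)
        for s in sizes
    ]
    sift = cv2.SIFT_create()
    keypoints, descriptors = sift.compute(gray, keypoints)
    return keypoints, descriptors   # descriptors: len(keypoints) x 128

# Hypothetical usage:
# gray = cv2.imread("image.png", cv2.IMREAD_GRAYSCALE)
# kps, descs = dense_sift(gray)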
SIFT alternatives

• GLOH (Gradient Location and Orientation Histogram): larger initial descriptor + PCA, TPAMI 2005.
• SURF (Speeded Up Robust Features): faster than SIFT and sometimes more robust, ECCV 2006 (343 Google citations).
• GIST: Rapid Biologically-Inspired Scene Classification Using Features Shared with Visual Attention, TPAMI 2007.
• LESH: Head Pose Estimation in Face Recognition Across Pose Scenarios, VISAPP 2008.
• PCA-SIFT: A More Distinctive Representation for Local Image Descriptors, CVPR 2004.
• Spin Image: Sparse Texture Representation Using Affine-Invariant Neighborhoods, CVPR 2003.

GLOH (Gradient Location and Orientation Histogram) descriptor

• GLOH is a method for local shape description very similar to SIFT, introduced by Mikolajczyk & Schmid in 2005.
• Differently from SIFT, it employs a log-polar location grid:
– 3 bins in the radial direction
– 8 bins in the angular direction (the central region is not divided angularly)
– 16 bins for gradient orientation quantization
• The GLOH descriptor is therefore a higher-dimensional vector with a total of 17 (i.e. 2×8 + 1) × 16 = 272 bins. PCA dimensionality reduction is employed in the vector representation space.

[Figure: GLOH log-polar grid with 2×8 + 1 location bins and 16 orientation dimensions per bin.]

SURF (Speeded Up Robust Features) descriptor

• SURF is a performant scale- and rotation-invariant interest point detector and descriptor. It approximates or even outperforms SIFT and other local descriptors with respect to repeatability, distinctiveness and robustness, yet can be computed and compared much faster.
• This is achieved by:
− relying on integral images for image convolutions;
− building on the strengths of the leading existing detectors and descriptors (a Hessian matrix-based measure for the detector, and a distribution-based descriptor);
− simplifying these methods to the essential.
• The approach of SURF is similar to that of SIFT, but:
– integral images in conjunction with Haar wavelets are used to increase robustness and decrease computation time;
– instead of iteratively reducing the image size (as in the SIFT approach), the use of integral images allows up-scaling the filter at constant cost.

[Figure: the "SIFT approach" (scale the image) vs. the "SURF approach" (scale the filter).]

• The SURF descriptor computation can be divided into two tasks: orientation assignment and extraction of the descriptor components.

SURF integral
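SURF's reliance on integral images is easy to make concrete. In the sketch below (illustrative names, not the SURF authors' code), one pass builds the integral image; afterwards the sum over any axis-aligned rectangle costs four lookups, which is what lets SURF evaluate box filters at constant cost regardless of their size.

import numpy as np

def integral_image(img):
    """ii[y, x] holds the sum of img[:y, :x]; a leading row and
    column of zeros avoids boundary checks in box_sum."""
    ii = np.zeros((img.shape[0] + 1, img.shape[1] + 1))
    ii[1:, 1:] = img.cumsum(axis=0).cumsum(axis=1)
    return ii

def box_sum(ii, y0, x0, y1, x1):
    """Sum of img[y0:y1, x0:x1] in O(1): four lookups, whatever the
    box size -- the key to up-scaling filters at constant cost."""
    return ii[y1, x1] - ii[y0, x1] - ii[y1, x0] + ii[y0, x0]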