Detection and Evaluation Methods for Local Image and Video Features
Julian Stöttinger


Technical Report CVL-TR-4

Computer Vision Lab, Institute of Computer Aided Automation, Vienna University of Technology

2 March 2011

Abstract

In computer vision, local image descriptors computed in areas around salient interest points are the state of the art in visual matching. This technical report aims at finding more stable and more informative interest points in the domain of images and videos. A further research interest is the development of relevant evaluation methods for visual matching approaches. The contribution of this work lies, on the one hand, in the introduction of new features to the computer vision community. On the other hand, there is a strong demand for valid evaluation methods and for approaches that yield new insights into general recognition tasks. This work presents research on the detection of local features both in the spatial (“2D” or image) domain and in the spatio-temporal (“3D” or video) domain.

For state-of-the-art classification, the extraction of discriminative interest points has an impact on the final classification performance. It is crucial to find which interest points are of use in a specific task. One question is, for example, whether it is possible to reduce the number of interest points extracted while still obtaining state-of-the-art image retrieval or object recognition results. This would significantly reduce processing time and could allow for new applications, e.g. in the domain of mobile computing.

Therefore, the work investigates different corner detection approaches and evaluates their repeatability under varying alterations. The proposed sparse color interest point detector gives a stable number of features and thus a more comparable image representation.
By taking the saliency of color information and color invariances into account, improved retrieval of color images is obtained, which is more stable to lighting and shadowing effects than retrieval using illumination-correlated color information. In an international benchmark, the approach outperforms all other participants in 4 out of 20 classes while using only a fraction of the features used by other approaches.

The Gradient Vector Flow (GVF) has previously been used with one manually adjusted set of parameters to locate centers of local symmetry at a certain scale. This work extends this approach and proposes a GVF-based scale space pyramid and a scale decision criterion to provide general-purpose interest points. This multi-scale, orientation-invariant interest point detector aims at providing stable and densely distributed locations. Due to the iterative gradient smoothing during the computation of the GVF, it takes more surrounding image information into account than other detectors.

In the last decade, great interest in the evaluation of local visual features in the domain of images has been observed. Most of the state-of-the-art features have been extended to the temporal domain to allow for video retrieval and categorization using techniques similar to those used for images. However, there is no comprehensive evaluation of these. This thesis provides the first comparative evaluation based on isolated and well-defined alterations of video data. The aim is to provide researchers with guidance when selecting the best approaches for new applications and data-sets. A dedicated, publicly available data-set of 1710 videos is set up, with which researchers are able to test their features’ robustness against well-defined challenges. For the evaluation of the detectors, a repeatability measure treating the videos as 3D volumes is developed.
To evaluate the robustness of spatio-temporal descriptors, a principled classification pipeline is introduced in which the increasingly altered videos build a set of queries. This allows for an in-depth analysis of local detectors and descriptors and their combinations.

Contents

1 Introduction
  1.1 Image Features
  1.2 Video Features
  1.3 Contributions
    1.3.1 Summary of the Contributions
    1.3.2 Contributions in Detail
  1.4 Applications
    1.4.1 Image Features and Matching
    1.4.2 Video Features and Matching
  1.5 Structure of the Thesis
2 Principles
  2.1 Color Spaces and Perception
  2.2 Properties of the Gaussian Kernel
  2.3 The Structure Tensor
  2.4 Local Description
    2.4.1 Local Description in Images
    2.4.2 Local Description in Videos
  2.5 Invariance of Features
  2.6 Summary
3 Interest Point Detectors for Images
  3.1 Luminance Based detectors
    3.1.1 Corner Detection
    3.1.2 Blob Detection
    3.1.3 Symmetry Based Interest Points
    3.1.4 Affine Invariant Estimation of Scale
  3.2 Color based detectors
    3.2.1 Corner Detection
    3.2.2 Scale Invariant Color Points
    3.2.3 Blob Detectors
  3.3 Summary
4 Evaluation of Interest Points for Images
  4.1 Data-sets
    4.1.1 Robustness Data-set
    4.1.2 The Amsterdam Library of Object Images
    4.1.3 PASCAL VOC 2007 data-set
  4.2 Robustness
    4.2.1 GVFpoints
    4.2.2 Color Points
  4.3 Image Matching
  4.4 Object categorization
  4.5 Feature Benchmark
  4.6 Summary
5 Interest Point Detectors for Video
  5.1 Luminance Based Spatio-Temporal Features
    5.1.1 Corner Detection
    5.1.2 Blob Detection
    5.1.3 Gabor Filtering
  5.2 Color Based Spatio-Temporal Features
    5.2.1 Corner Detection
    5.2.2 Blob Detection
  5.3 Summary
6 Evaluation of Interest Point Detectors for Videos
  6.1 Data-Sets
    6.1.1 Popular Video Data-sets
    6.1.2 FeEval
  6.2 Feature Behavior
  6.3 Robustness
  6.4 Video Matching
  6.5 Summary
7 Conclusion
Bibliography

Chapter 1

Introduction

Interest points are the first stage of robust visual matching applications and represent the state of the art in visual feature localization. Interest point detectors reduce the computational complexity of scene matching and object recognition applications by selecting only a subset of image locations corresponding to specific and/or informative structures [110]. The extraction of stable locations is a successful way to match visual input in images or videos of the same scene acquired under different conditions [107]. As evaluated in [105], successful approaches to extracting stable locations rely on corner detection [50, 103], on blobs such as Maximally Stable Extremal Regions (MSER) [100] and Difference of Gaussians (DoG) [87], or on detecting local symmetry [90].

The majority of interest point extraction algorithms are purely intensity based [50, 61, 105]. This ignores saliency information contained in the color channels. It is known that the distinctiveness of color based interest points is larger than that of intensity based points, and therefore color is of importance when matching images [136]. Furthermore, color plays an important role in the pre-attentive stage in which features are detected [52, 135], as it is one of the elementary stimulus features [164].

Fergus et al. [34] point out that a categorization framework is heavily dependent on the detector to gather useful features. In general, the current trend is toward increasing the number of points [183], applying several detectors or combining them [102, 138], or making the interest point distribution as dense as possible [159]. While such dense sampling has been shown to be effective in object recognition, these approaches basically shift the task of discarding the non-discriminative points to the classifier [17].
Further, dense sampling implies that a huge amount of data must be extracted from each image and processed. This is feasible when executed on a cluster of computers in a research environment. Nevertheless, there are environments in which the luxury of extensive computing power is not available. This is illustrated by the strong trend towards mobile computing on netbooks, mobile phones and PDAs. With growing datasets, clustering and off-line training of features become infeasible [133].

Therefore, there is a strong interest in exploiting state-of-the-art classification and in focusing on the extraction of discriminative interest points. Following the main idea of interest points, which is to reduce the data to be processed, one important question is whether it is possible to reduce the number of interest points extracted while still obtaining state-of-the-art image retrieval or object recognition results. Recent work aims at finding discriminative features, e.g. by performing an evaluation of all features within the dataset or per image class and choosing the most frequent ones [157]. All these methods require an additional calculation step with an inherent demand for memory and processing time dependent on the number of features.

Figure 1.1: The main stages of image matching.

In this thesis, color salient and invariant information is used for image point selection with the aim of achieving state-of-the-art performance while using significantly fewer interest points. The difference from prior work is that feature selection takes place at the very first step of the feature extraction and is carried out independently per feature. In contrast to feature selection that is left to the classification step alone, this method of regarding color saliency or color invariance provides a reduced amount of data for subsequent operations.

Techniques such as the bag-of-words approach were originally inspired by text retrieval.
These have been extended to “2D” techniques on images and represent the state of the art in image matching. These approaches are now successfully carried out in both the spatial and the temporal domains for action recognition, video understanding and video matching (e.g.