Chapter 2

State of the art

In the last decades, there has been substantial work in the field that tackles the problem of object recognition. Here we present a brief survey of different approaches to object recognition. Some reviews divide object recognition approaches into three categories. Model-based methods deal with the representation and identification of known three-dimensional (3-D) objects (boxes, spheres, cylinders, cones, surfaces of revolution, etc.). Similarly, shape-based methods represent an object by its shape and/or contour. In contrast, appearance-based models use the appearance of the object, usually under several two-dimensional (2-D) views. Another way of classifying object recognition techniques distinguishes between local and global approaches. Local methods search for salient regions or points that characterise the object of interest, such as corners, edges or entropy. These regions are then typified by given descriptors. The local descriptors of the object of interest and the local descriptors of the test image are then compared for object recognition purposes. In contrast, global methods model the information content of the whole object of interest. This information can range from simple statistical measures (such as mean values or histograms of features) to more advanced dimensionality reduction techniques. Global methods allow the original image to be reconstructed, providing robustness to some extent, whereas local methods can better cope with partly occluded objects.

Local appearance-based object recognition methods need to detect and describe distinctive regions or keypoints in an image. As for the detection, we can differentiate corner based detectors, region based detectors and others.

Corner based detectors locate keypoints and regions which contain a lot of image structure, like edges. Corners can be defined as points with low self-similarity in all directions. The self-similarity of an image patch can be measured by taking the sum of squared differences (SSD) between the patch and a shifted version of itself. The most popular corner based detector is the one of Harris and Stephens (1988). It works by computing a response function across all the image pixels; those pixels whose response exceeds a threshold and is a local maximum are considered corners. The response function is obtained from the Harris matrix, which is computed from image derivatives. The Harris-point detector yields a large number of keypoints with sufficient repeatability (Schmid et al., 2000). The main advantage of this detector is its high computation speed, whereas the main disadvantage is that it is only invariant to rotation, since no information about scale and orientation is provided. The Harris-Laplace detector adds invariance to scale and is based on the work of Lindeberg (1998), which studies the properties of the scale-normalised Laplacian. Mikolajczyk and Schmid (2002) proposed the Harris-Affine detector by extending the Harris-Laplace detector in order to achieve invariance to affine transformations. It is based on the shape estimation properties of the second moment matrix. The main disadvantage of the Harris-Affine detector is the increase in computation time.
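As a minimal illustration of the Harris detector, consider the following sketch using OpenCV's implementation; the input file name and the threshold of 1% of the maximum response are assumptions of the example, not prescribed values.

import cv2
import numpy as np

gray = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input image
# Response R = det(M) - k * trace(M)^2, computed from the Harris matrix M
# of image derivatives over 2x2-pixel neighbourhoods (Sobel aperture 3).
response = cv2.cornerHarris(np.float32(gray), blockSize=2, ksize=3, k=0.04)
corners = np.argwhere(response > 0.01 * response.max())  # thresholded responses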
Region based detectors locate local blobs of uniform brightness and are therefore suited for uniform regions or regions with smooth transitions. Hessian matrix detectors (Mikolajczyk et al., 2005) are similar to Harris detectors. The Hessian matrix is computed from the second image derivatives, so this detector responds to blob-like structures. Keypoints are selected based on the determinant of the Hessian matrix after non-maximum suppression. The main drawback is that it provides only rotational invariance. Similarly, Hessian-Laplace detectors have scale invariance properties and Hessian-Affine detectors add invariance to affine transformations (Mikolajczyk and Schmid, 2002). Instead of a scale-normalised Laplacian, Lowe (1999, 2004) uses an approximation of the Laplacian, namely the difference of Gaussian function (DoG), calculated as differences of Gaussian-blurred images at adjacent local scales. The main advantage is the invariance to scale, but it comes at the cost of an increase in runtime. Maximally stable extremal regions (MSER) (Matas et al., 2004) are regions that are either darker or brighter than their surroundings and that are stable across a range of thresholds of the intensity function. If single keypoints are needed, they are usually taken as the centres of gravity of the MSERs. The number of MSERs detected is rather small in comparison with the previously mentioned detectors, but Mikolajczyk et al. (2005) affirm that their repeatability is higher in most cases.

An example of approaches other than corner or region detectors are the entropy based salient region detectors (Kadir et al., 2003; Kadir and Brady, 2001, 2003). They consider the grey value entropy of a circular region in the image in order to estimate the visual saliency of that region. The main drawback is that they are time consuming, especially in the affine invariant implementation (Kadir et al., 2004). Tuytelaars and Van Gool (1999, 2004) proposed two further detectors, intensity based regions (IBR) and edge based regions (EBR).

Some works locally describe the whole object in a dense way, such as bag of words (BoW) descriptions based on dense SIFT, and the histogram of oriented gradients (HOG). BoW (Sivic and Zisserman, 2009) is a vector of occurrence counts of a vocabulary of local image features. HOG (Dalal and Triggs, 2005a) counts occurrences of gradient orientations in a dense grid of uniformly spaced cells and uses overlapping local contrast normalisation to improve accuracy.
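A minimal sketch of such a dense description, using OpenCV's default HOG parameters; the input file and the resizing to the default 64×128 detection window are assumptions of the example.

import cv2

gray = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input image
patch = cv2.resize(gray, (64, 128))   # default HOG detection window size
hog = cv2.HOGDescriptor()             # 8x8 cells, 2x2-cell blocks, 9 orientation bins
vector = hog.compute(patch)           # dense 3780-dimensional descriptor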

After region or point detection, feature descriptors should be computed to describe the regions or the local neighbourhoods of the points, respectively. One can distinguish among distribution based descriptors, filter based descriptors and other methods.

Distribution based descriptors represent some properties of given regions by histograms. Usually these properties come from the geometric information of the points and the local orientation information in the region. Probably the most popular descriptor is the scale invariant feature transform (SIFT), developed by Lowe (1999, 2004). In fact, he proposed a combination of a SIFT detector and descriptor, where the SIFT detector is the DoG previously discussed. To obtain SIFT descriptors, the local image gradients are measured at the selected scale in the region around each keypoint. These are usually transformed into a representation of a 4×4 array of histograms with 8 orientation bins each, leading to a 128-element feature vector for each keypoint. SIFT is invariant to uniform scaling and orientation, partially invariant to affine distortion and illumination changes, and allows for object recognition under clutter and partial occlusion. The main disadvantage of SIFT is the high computational time required. Many variants of SIFT have been proposed. Ke and Sukthankar (2004) reduced the dimensionality of the SIFT descriptor by applying principal component analysis (PCA) to the scale-normalised gradient patches instead of to gradient histograms at the keypoints. Gradient location-orientation histograms (GLOH) (Mikolajczyk et al., 2005) try to obtain higher robustness and distinctiveness than SIFT descriptors. The authors divided the keypoint patch into a radial and angular grid, leading to a higher dimensional descriptor whose dimensionality is reduced by applying PCA and keeping the 128 largest eigenvalues. Spin images (Johnson and Hebert, 1999), adapted to 2-D images by Lazebnik et al. (2003), use a 2-D histogram of intensity values and their distance from the centre of the region. Each row of the 2-D descriptor represents the histogram of the grey values in an annulus at a given distance from the centre. This descriptor is invariant to in-plane rotations. Belongie et al. (2002) introduced shape context descriptors, which compute a descriptor from the distribution of relative point positions and the corresponding orientations collected in a histogram. It is, though, more sensitive to the positions near the keypoints. In another line of research, local binary patterns (LBP), introduced by Ojala et al. (1996), is a texture descriptor based on a simple binary coding of thresholded intensity values. It has been extended in many directions and used for diverse applications with good performance.
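Before moving on, a minimal sketch of how such keypoint description is obtained in practice, using OpenCV's SIFT implementation; the input file name is an assumption of the example.

import cv2

gray = cv2.imread('sample.png', cv2.IMREAD_GRAYSCALE)  # hypothetical input image
sift = cv2.SIFT_create()  # DoG detector combined with the SIFT descriptor
keypoints, descriptors = sift.detectAndCompute(gray, None)
# descriptors has shape (n_keypoints, 128): a 4x4 grid of 8-bin
# orientation histograms around each keypoint, as described above.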
Filter based descriptors capture the properties of the regions or patches around a keypoint by means of filters. Differential-invariant descriptors (Koenderink and van Doorn, 1987; Schmid and Mohr, 1997) are sets of differential operators obtained by combining local derivatives. Their main drawback is that they are only invariant to rotation. Steerable filters are obtained by a linear combination of basis filters that yields the same result as the oriented filter rotated to a certain angle, for example in (Freeman and Adelson, 1991). Complex filters encompass filters with complex valued coefficients (Baumberg, 2000; Carneiro and Jepson, 2003; Schaffalitzky and Zisserman, 2002).

Moment invariants are one example of other methods that aim at the description of local regions. For instance, Van Gool et al. (1996) introduced generalised intensity and colour moments to use the intensity or multispectral nature of image data for image patch description, whereas Mikolajczyk et al. (2005) presented gradient moments.

Global appearance-based methods for object recognition directly describe the whole object patch in the image. Traditional methods for texture, shape, colour, etc. description can be used for object recognition purposes as well. Some other methods project the object input images into a lower dimensional subspace, hence they are called subspace methods. Among them, the most representative ones are the following. PCA (Jolliffe, 2002) was introduced to the field of computer vision by Kirby and Sirovich (1990) and became popular when Turk and Pentland (1991) successfully applied it to face recognition. Non-negative matrix factorization (NMF) (Paatero and Tapper, 1994; Shen and Israël, 1989) was introduced by Lee and Seung (1999) for object representation tasks. Independent component analysis (ICA) (Ans et al., 1985; Hyvärinen and Oja, 2000) became widely known when it was used in signal processing for the separation of mixed audio signals (Comon, 1994; Jutten and Herault, 1991). Bartlett et al. (2002) and Draper et al. (2003) proposed two approaches based on ICA for object recognition purposes. Canonical correlation analysis (CCA) (Hotelling, 1936) aims at finding pairs of directions that yield the maximum correlation between two random variables.

Once the description of objects is performed, we need to recognise whether they correspond to the object or class of interest. Different techniques have been proposed for comparing the descriptors of the object of interest and those of an input image. Some rely on computing similarity or distance measures between them. There are many in the literature, to name some: Kullback-Leibler divergence, Hellinger distance, total variation distance, Rényi's divergence, Jensen-Shannon divergence, Lévy-Prokhorov metric, Bhattacharyya distance, earth mover's distance, energy distance, signal-to-noise ratio distance, Mahalanobis distance, Minkowski distance, distance correlation, Fisher information metric or cosine similarity. Image classifiers, on the other hand, are algorithms that classify the image descriptors into classes, for example Fisher's linear discriminant, logistic regression, naive Bayes classifiers, the perceptron, support vector machines, kernel estimation, k-nearest neighbours, boosting, decision trees, random forests, neural networks or learning vector quantization.

This was just a short review of some very remarkable works in object recognition, but many more have been proposed and applied during the last decades. However, it is not the purpose of this thesis to go further into detail. For more information about the topic we refer the reader to (Andreopoulos and Tsotsos, 2013; Roth and Winter, 2008; Matas et al., 2004).
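A minimal sketch of the subspace projection and distance-based comparison discussed above, assuming vectorised training images stacked as rows of a matrix; the subspace dimension k and the nearest-neighbour rule are choices of the example, not prescribed by the works cited.

import numpy as np

def pca_subspace(X, k):
    # X: (n_images, n_pixels) matrix of vectorised training images.
    mean = X.mean(axis=0)
    # Principal axes from the SVD of the centred data.
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, Vt[:k]              # k principal axes, each of length n_pixels

def project(x, mean, axes):
    return axes @ (x - mean)         # low-dimensional code of one image

def nearest(code, train_codes, labels):
    # Recognition by nearest neighbour in the subspace (Euclidean distance).
    return labels[np.argmin(np.linalg.norm(train_codes - code, axis=1))]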

In the following, we provide a review of the state of the art for each studied application.

2.1. Classification of boar spermatozoa according to the acrosome integrity using ILF

Works dealing with the classification of the acrosome integrity of boar spermatozoa are mainly based on texture description. Following a time line of recent works, González et al. (2007) used first order statistics and Haar features in combination with wavelet coefficients. The classification using neural networks (NN) on a dataset of 363 instances extracted from images of 2560×1920 pixels reached a hit rate of 92.19%. Later on, with the same dataset, Alegre et al. (2008) computed the gradient magnitude along the outer contours of the sperm heads and classified with a learning vector quantization (LVQ) of four prototypes, obtaining a hit rate of 93.2%. For 393 similar instances, Alegre et al. (2009) compared Haralick features, Laws masks, and Legendre and Zernike moments, classified with supervised (k-nearest neighbours, k-NN, and neural networks) and unsupervised (linear discriminant analysis, LDA, and quadratic discriminant analysis, QDA) methods. Haralick features and LDA achieved the best hit rate, of 93.89%.

Furthermore, Alegre et al. (2012) characterised acrosomes by means of the first order statistics derived from the gray level co-occurrence matrix (GLCM) of the image, computed both from the original image and from the coefficients yielded by the discrete wavelet transform (DWT). Experimental results on a dataset of 800 instances coming from images of 780×580 pixels reached a hit rate of 94.93% with a multilayer perceptron classifier. It outperformed moment-based descriptors (Hu, Zernike and Legendre) and k-NN classifiers. Alegre et al. (2013) computed a local texture descriptor for each point in seven inner contours. They classified using relevance LVQ, obtaining a hit rate of 99%. Experiments were carried out on only 360 instances coming from images of 2560×1920 pixels.

To the best of our knowledge, the most recent work in this field, by García-Olalla et al. (2015), combined local and global texture descriptors and contour descriptors. The global texture description was obtained from the GLCM of the original image and of the four sub-images of the first level of decomposition of the DWT based on Haar wavelets. LBP and Fourier shape descriptors provided the local texture and the contour descriptions, respectively. They performed an early fusion by concatenation of the descriptors, and 10-fold classification using a support vector machine backed by a least squares training algorithm and a linear kernel yielded an accuracy of 99.19%. A total of 1851 instances coming from images of 780×580 pixels made up the dataset. To our knowledge, there is no work that focuses on the integrity evaluation of sperm cells of humans or any animal using an invariant local features approach based on the detection and description of keypoints.
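For reference, a minimal sketch of the kind of GLCM-based texture description that several of the works above rely on, here with scikit-image (graycomatrix/graycoprops in recent versions); the distance, angles and the four Haralick-style properties are choices of the example.

import numpy as np
from skimage.feature import graycomatrix, graycoprops

def glcm_features(patch_u8):
    # patch_u8: 8-bit grayscale patch. Co-occurrence matrix at distance 1
    # for four orientations.
    glcm = graycomatrix(patch_u8, distances=[1],
                        angles=[0, np.pi / 4, np.pi / 2, 3 * np.pi / 4],
                        levels=256, symmetric=True, normed=True)
    # Four Haralick-style properties, averaged over the orientations.
    props = ('contrast', 'correlation', 'energy', 'homogeneity')
    return np.array([graycoprops(glcm, p).mean() for p in props])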
Recent object description approaches rely on local features rather than on global descriptors, since local description can reliably detect highly distinctive keypoints in an image. Therefore, a single feature can be correctly matched with high probability against a large database of features, providing a basis for object and scene recognition. Another advantage is the capability of recognising partly occluded objects. The development of image matching by using a set of local interest points became decidedly efficient when Lowe (1999) presented SIFT, introducing invariance to the local feature approach. Since then, the computer vision community has been very active, presenting many improvements based on the SIFT method. Although over a decade old, Lowe's algorithm has proven very successful in a number of applications using visual features, mainly object recognition. The main problem associated with it has been the large computational burden imposed by its high dimensional feature vector, which, in recent years, led to the emergence of new proposals mainly focused on obtaining equally robust but more computationally efficient descriptors. The first of a series was FAST (Rosten and Drummond, 2006), which uses a machine learning approach to derive a feature detector similar to Harris, SUSAN (Smith and Brady, 1995) or DoG but much faster than them, with the disadvantage that it is not very robust to the presence of noise. Later, Bay et al. (2008) introduced SURF, outperforming previously proposed schemes with respect to repeatability, distinctiveness, robustness and speed. Recently, Ozuysal et al. (2010) presented a fast keypoint recognition method using random ferns which avoids the computationally expensive patch preprocessing by using hundreds of simple binary features. Following this idea, and due to the need to run vision algorithms on mobile devices with low computing power and memory capacity, new approaches have been developed.

A step further was proposed by Calonder et al. (2010). They use binary strings as a feature point descriptor, called BRIEF, which is highly discriminative even when using few bits and can be computed using simple intensity difference tests. Another big advantage of BRIEF is that the matching can be performed using the Hamming distance, which is more efficient to compute than the Euclidean distance employed in most invariant local feature detectors. Both aspects make BRIEF a faster descriptor in construction and matching, but as it is not invariant to large in-plane rotations, it is not suitable for some object recognition tasks.

Another method faster than the classical SIFT and SURF, at comparable matching performance, is the BRISK detector (Leutenegger et al., 2011). BRISK relies on a configurable circular sampling pattern from which it computes brightness comparisons to form a binary descriptor string. Its detection methodology is inspired by the adaptive corner detector proposed by Mair et al. (2010) for detecting regions of interest in the image, which they named AGAST. It is essentially an extension of FAST (Rosten and Drummond, 2006), proven to be a very efficient basis for feature extraction. With the aim of achieving invariance to scale, BRISK goes a step further by searching for maxima not only in the image plane but also in scale-space, using the FAST score as a measure of saliency.

An evolution of the above-mentioned methods is the ORB descriptor (Rublee et al., 2011), which builds its proposed feature on FAST and BRIEF, its name standing for Oriented FAST and Rotated BRIEF (ORB). Its authors address several limitations of those techniques, mainly the lack of rotational invariance in BRIEF. They add a fast and accurate orientation component to FAST and, at the same time, present an efficient way to compute the oriented BRIEF features. Furthermore, the ORB descriptor uses a learning method to de-correlate BRIEF features under rotational invariance, leading to better performance in nearest-neighbour applications. ORB was evaluated using two datasets: images with synthetic in-plane rotation and added Gaussian noise, and a real-world dataset of textured planar images captured from different viewpoints. As its authors pointed out, ORB outperforms SIFT and SURF on the real-world dataset, both the outdoor and the indoor one, which makes this method a good choice for object recognition. Despite the new methods, SIFT and SURF are still the most popular methods for object recognition in present applications.
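A minimal sketch of binary-descriptor matching with ORB and the Hamming distance, as discussed above; the two input images are assumptions of the example.

import cv2

img1 = cv2.imread('object.png', cv2.IMREAD_GRAYSCALE)  # hypothetical images
img2 = cv2.imread('scene.png', cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(img1, None)
kp2, des2 = orb.detectAndCompute(img2, None)

# Brute-force matching with the Hamming distance, suited to binary strings.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)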

2.2. Localisation of broken inserts in edge profile milling heads

Tool wear monitoring (TWM) systems have been widely developed over the last decades for the evaluation of the wear level of cutting tools. The current state of the art in TWM comprises two approaches, known as direct and indirect methods. Indirect techniques can be applied while the machine is in operation. These methods evaluate the state of the inserts through variables (e.g. cutting forces and vibrations) that are typically affected by noisy signals (Jurkovic et al., 2005; Kurada and Bradley, 1997; Wang et al., 2005; Zhang and Zhang, 2013). On the contrary, direct techniques monitor the state of the cutting tools directly at the cutting edge when the head tool is in a resting position (Pfeifer and Wiegers, 2000). As to direct methods, image processing and computer vision techniques are the most popular tools for measuring flank and crater wear (Zhang and Zhang, 2013). Ongoing progress in the fields of machine vision, computing hardware and non-tactile applications has permitted the implementation of reliable on-line TWM systems (Dutta et al., 2013). The methods that we present in this thesis fall into the direct approach category.

There is a large body of work in the literature that evaluates the state of given inserts without having to localise them in images. Castejón et al. (2007) disassembled the inserts and located them in a tool fixture after machining a full pass along the part, which allowed the flank location in the image to be kept constant. Lim and Ratnam (2012) placed the inserts in a scanner. Xiong et al. (2011) took out the tool and placed the inserts on a plate. Other methods captured images directly on the head tool, but they are focused on ball-end milling cutters (Zhang and Zhang, 2013) or microdrills, in which only two flutes and therefore two cutting tools are present. For the latter, some works developed their own acquisition systems: Su et al. (2006) to capture the microdrill from the top and Kim et al. (2002) from the side. Other papers deal with face milling heads, where it is easy to set up the acquisition system so that only one insert is captured per acquired image. Jurkovic et al. (2005) made use of a halogen lamp, LED illumination and a telecentric lens system, capturing one insert at a time. Wang et al. (2006) only used fibre-optic lighting to illuminate the insert. Pfeifer and Wiegers (2000) captured the same insert under several lighting positions to combine the contour information of all images, whereas Sortino (2003) presented a portable acquisition system. In our application, the head tool contains 30 rhombohedral inserts, leading to 8 to 10 visible inserts per image, which makes the localisation of the inserts a new and challenging task.

In relation to the recognition of broken inserts, approaches based on texture analysis have been widely used in the literature for wear monitoring in machining operations (Dutta et al., 2013). GLCM texture features obtained from images or subimages of the cutting inserts have been used to evaluate tool wear in turning processes. Danesh and Khalili (2015) combined GLCM features with undecimated wavelet decomposition to describe the state of the cutting tool. Kerr et al. (2006) computed five Haralick features extracted from the GLCM. Barreiro et al. (2008) estimated three wear levels (low, medium, high) of the tool insert by means of the Hu and Legendre invariant moments.
Datta et al. (2013) used two texture features based on Voronoi tessellation to describe the amount of flank wear of a turning tool. Prasad and Ramamoorthy (2001) captured a pair of stereo images and used the sequential similarity detection algorithm for pattern matching to measure crater wear depth with a back propagation neural network. However, the machines that we are dealing with in this study use an aggressive edge milling in a single pass to mechanise thick plates. This may cause the breakage of cutting tools, such as the examples shown in Fig. 1.1. Part of a cutting edge may be torn off without harming the texture of the remaining part of the insert. For this reason we believe that texture and depth features, as well as ILF, are not suitable for the application concerned.

Other approaches use the contours of the cutting tools to determine the state of the inserts. For instance, Atli et al. (2006) classified drilling tools as sharp or dull by applying a new measure, namely DEFROL (deviation from linearity), to the Canny-detected edges. Makki et al. (2009) captured images of a drilling tool at 100 rpm rotation speed and used segmentation methods to describe the tool wear as the deviation of the lip portion. Also, Chethan et al. (2014) compared image areas of the tool, obtained through a texture-based segmentation method, before and after cutting in order to determine the state of a drilling tool. For turning operations, Shahabi and Ratnam (2009) applied thresholding and subtraction of worn and unworn tool images to measure the nose worn regions.

Some papers in this line of work also deal with micro milling or end milling. Otieno et al. (2006) compared images captured before and after the usage of two-fluted micro and end mills, thresholded by an XOR operator. Neither edge detection, nor tool wear quantification, nor wear classification was performed. Zhang and Zhang (2013) also compared images of ball-end milling cutters before and after the machining process in order to monitor the state of the tool. Liang et al. (2005) presented a method based on image registration and mutual information to recognise the change of nose radius of TiN-coated, TiCN-coated and TiAlN-coated carbide milling inserts in a progressive milling operation. They performed a logical subtraction of two images taken before and after milling. The mentioned works share one common requirement: they all must have an image of the intact tool to evaluate any discrepancies in a new image of the same tool.

We propose a novel algorithm that evaluates the state of cutting edges without requiring image references of intact cutting tools. This avoids calibrating the system each time an insert is replaced and allows memory to be freed after each monitoring. It automatically determines the ideal position and orientation of the cutting edges in a given image and computes the deviation from the real cutting edges. This means that from a single image we can determine the broken and unbroken inserts.
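As an illustration of the before/after comparison strategy shared by the reference-based works above, the following sketch assumes two registered grayscale images of the same insert; the file names and the threshold value are assumptions of the example.

import cv2

before = cv2.imread('insert_before.png', cv2.IMREAD_GRAYSCALE)  # hypothetical
after = cv2.imread('insert_after.png', cv2.IMREAD_GRAYSCALE)   # registered pair

diff = cv2.absdiff(before, after)                          # pixel-wise discrepancy
_, worn = cv2.threshold(diff, 40, 255, cv2.THRESH_BINARY)  # binary wear mask
wear_area = cv2.countNonZero(worn)                         # a simple wear measure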

2.3. Object recognition for content-based image retrieval: Hough transform and COSFIRE filters for object recognition

Content-based image retrieval (CBIR) is a technique for retrieving images from a collection on the basis of features, such as colour, texture and shape, that can be automatically extracted from the images. We deal with the description of an object of interest in order to retrieve, from an image collection, the images that present regions with high similarity to the object. Typically, such a CBIR system describes both the object of interest and the images of the dataset using object description techniques. These techniques have been reviewed earlier in this chapter. Then, a similarity or distance measure is computed and sufficiently similar images (those over a given threshold) are retrieved in an indexed hit list. The hit list is sorted from higher to lower similarity, or from lower to higher distance, depending on whether a similarity or a distance measure is used. Sometimes a relevance feedback approach is applied afterwards. Relevance feedback aims at retrieving a new hit list by using information about the relevance of the previously retrieved results. A recent review of relevance feedback systems for CBIR is given in (Yasmin et al., 2014).

Object recognition for CBIR basically consists of two difficult tasks: identifying objects in images and searching quickly through large collections of identified objects, since CBIR usually deals with large datasets. Identifying objects in images is a challenge because the same objects and scenes can be viewed under different imaging conditions. There are many works dedicated to object recognition for CBIR. Some of them are based on textures (Chang and Kuo, 1993), (Francos et al., 1993), (Jain and Farrokhnia, 1990) and (Smietanski et al., 2010), shape (Jagadish, 1991), (Kauppinen et al., 1995) and (Veltkamp and Hagedoorn, 2001), colour representation (Huang et al., 1997), (Kiranyaz et al., 2010) and (Pass and Zabih, 1996) or edge detectors (Ogiela and Tadeusiewicz, 2002), (Ogiela and Tadeusiewicz, 2005) and (Zitnick and Dollár, 2014). Recently, local invariant features have gained wide popularity (Lowe, 2004), (Matas and Obdrzalek, 2004), (Mikolajczyk and Schmid, 2004), (Nister and Stewenius, 2006) and (Sivic and Zisserman, 2003). As for the second task, to find images similar to a query image we need to compare all feature descriptors of all images with the feature descriptors of the object of interest, usually by some distance measure. Such comparison is highly time consuming, and the literature presents many methods based on some form of approximate search, for instance using a voting scheme, histograms of clustered keypoints or hierarchical trees. The scope of this dissertation does not concern this task.
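A minimal sketch of such a retrieval pipeline, using colour histograms as the description and the Bhattacharyya distance as the measure; the file names, bin counts and hit-list ordering are assumptions of the example.

import cv2

def colour_hist(path):
    img = cv2.imread(path)  # BGR image
    hist = cv2.calcHist([img], [0, 1, 2], None, [8, 8, 8],
                        [0, 256, 0, 256, 0, 256])
    return cv2.normalize(hist, hist).flatten()

collection = ['img0.png', 'img1.png', 'img2.png']  # hypothetical collection
query = colour_hist('query.png')
dist = {p: cv2.compareHist(query, colour_hist(p), cv2.HISTCMP_BHATTACHARYYA)
        for p in collection}
hit_list = sorted(collection, key=dist.get)        # lower distance ranks first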

2.3.1. Model fitting and Hough transform for object recognition

Robust model building is responsible for providing a prediction model that works well not only with the training data but also with new data from fresh images not considered in the determination of the parameters of the model. As different behaviours occur in real case studies, cluster analysis techniques are of much help in isolating different populations in order to better determine their characteristics. The use of these techniques depends on the model fitting technique applied later on, but in general terms, having acquired such a degree of knowledge about the distribution of the data empowers the successive analyses towards more accurate results. Classification analysis techniques can be divided into two large groups: supervised classification (also known as discriminant analysis) and unsupervised classification (also known as clustering).

The purpose of cluster analysis (Gordon, 1999) is to identify groups of differentiated behaviour within the samples obtained, as they are supposed to have been generated by different populations or different states of the generating process. That is a general definition which, in the context of computer vision, can be translated into identifying different groups of elements pertaining to different objects, images, textures, etc. Predominant clustering techniques can be classified mainly into hierarchical techniques and partitioning techniques.

The Hough transform (Hough, 1962) is a popular technique in the computer vision area. Amongst its numerous advantages, its robustness can be highlighted. Moreover, it is a fairly efficient algorithm, suitable for situations like the one studied in this dissertation where a large set of pictures might be involved (Dattner, 2009). Though originally defined for identifying simple shapes in images, like lines and circles, its use has been extended to more general shapes, allowing for the detection of multiple instances of an object that might even be partially occluded. Ballard (1981) introduced the generalised Hough transform (GHT), which modifies the Hough transform using the principle of template matching. In this way, the Hough transform can be used to detect any object described by its model, not only by an analytic function. The complexity depends on the number of parameters chosen to represent the complex shape, with search times increasing exponentially. Fortunately, the Hough transform can be easily implemented on parallel computing systems, as each image point can be treated independently (Illingworth and Kittler, 1988) using more than one processing unit.

Essentially, the Hough transform converts sets of points in an image to a parameter space. Thus, a line through two points in the image can be represented as a single point in a two-dimensional parameter space whose axes correspond to the two parameters needed to define that line. Similarly, the points of circles, ellipses and parabolas in the image can be transformed to points in a new space of parameters. Shape parameterisation extends this idea further by means of high-dimensional parameterisations that can be decomposed into smaller sets of parameters determined sequentially. One of the complexities that appears when using the Hough transform is that, for a single image, many points can be chosen to belong to a single line, and thus many lines can be fitted to the whole dataset.
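A minimal sketch of this parameter-space voting for straight lines, in the usual (rho, theta) parameterisation; the bin counts and the input point set are choices of the example.

import numpy as np

def hough_lines(points, diag, n_theta=180, n_rho=200):
    # Accumulator over the (rho, theta) parameter space; diag is the
    # image diagonal, bounding the possible values of rho.
    thetas = np.linspace(0.0, np.pi, n_theta, endpoint=False)
    acc = np.zeros((n_rho, n_theta), dtype=np.int32)
    for x, y in points:
        # Each image point votes for every line passing through it:
        # rho = x*cos(theta) + y*sin(theta).
        rho = x * np.cos(thetas) + y * np.sin(thetas)
        idx = np.round((rho + diag) / (2 * diag) * (n_rho - 1)).astype(int)
        acc[idx, np.arange(n_theta)] += 1
    return acc, thetas

# The strongest line is the accumulator cell with the most votes:
# i_rho, i_theta = np.unravel_index(acc.argmax(), acc.shape)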
Model fitting comes to the rescue in order to choose the best model to use. Additionally, by using the Hough transform a voting scheme can be adopted (Illingworth and Kittler, 1988), which is one of the most common ways of applying the algorithm.

The Hough transform has found many practical applications in object recognition tasks. Lowe (2004) used the Hough transform to identify clusters of feature descriptors that belong to a single object, by using each feature to vote for all object poses that are consistent with that feature. Tang et al. (2015) proposed a novel object detection approach based on the Hough transform. They introduced a multi-scale voting scheme in which multiple Hough images corresponding to multiple object scales can be obtained simultaneously to efficiently handle object scale changes. For 3-D object detection, Silberberg et al. (1984) described an iterative Hough procedure in which an initial sparse and regular subset of parameters and transformations is evaluated for goodness of fit. The procedure is then repeated by successively subdividing the parameter space near the current best estimates or peaks. A Hough voting approach was used by Tombari and Di Stefano (2010) for object detection in 3-D scenes with significant occlusion and clutter. Tong and Kamata (2010) used a 3-D Hough transform to obtain a spectrum in which 3-D features are concentrated on the sphere, and applied Hilbert scanning on the sphere to match the objects. They claimed to be able to match the object of interest even in overlapping and noisy situations.

Medical imaging also uses the Hough transform for automatic object recognition purposes. Golemati et al. (2005) efficiently used the Hough transform to automatically segment the healthy arterial wall lumen in B-mode ultrasound images of the carotid artery. McManigle et al. (2012) proposed a two-step Hough transform to find an annular approximation of the left ventricular myocardium in short-axis echo slices. Ecabert et al. (2008) used a 3-D implementation of the generalised Hough transform to localise the heart in images obtained with tomography scanners, and Zhang et al. (2010) used it for the localisation of livers in computed tomography scans. Brummer (1991) used the 3-D Hough transform for the automatic detection of the longitudinal fissure in tomographic scans of the brain. Guan and Yan (2011) used the Hough transform for blood cell segmentation and Zhang et al. (2012) for estimating the 3-D orientation of a vertebra: the Hough transform matched the projections of a standard 3-D primitive with the vertebral contours in biplanar radiographs, where the projected contours were generated from the 3-D model by sampling the viewing sphere with a hierarchical scheme. Tino et al. (2011) showed that a probabilistic model-based Hough transform (HT) applied to the hexaMplot can be used to detect groups of co-expressed genes in normal-disease-drug samples.

2.3.2. COSFIRE filters for object recognition

COSFIRE filters were introduced by Azzopardi and Petkov (2013c) for the localisation of given local patterns that consist of combinations of contour segments. They have proved to be highly efficient in applications based on patterns made up of lines, such as vessel delineation (Azzopardi et al., 2015; Strisciuglio et al., 2015), the detection of vascular bifurcations (Azzopardi and Petkov, 2013a), handwritten digit recognition (Azzopardi and Petkov, 2013b) or the differentiation of typical line patterns in a skin disease (epidermolysis bullosa acquisita) (Shi et al., 2015).

Their effectiveness for detecting objects has also been demonstrated. Azzopardi and Petkov (2013c) applied COSFIRE filters to the recognition of three types of traffic signs and achieved perfect detection and recognition performance on a dataset of 48 traffic scenes. A trainable hierarchical object recognition model for the application of a home tidying pickup robot was presented in (Azzopardi and Petkov, 2014), allowing deformable objects embedded in complex scenes to be detected without prior segmentation. In this case, a dataset of 60 images for shoe detection was used, with perfect detection and recognition performance. Guo et al. (2015) introduced inhibition to COSFIRE filters. A COSFIRE filter responds to a pattern made up of the combination of contour segments presented in its configuration. However, it will also respond to patterns that contain a combination of those segments together with other contour segments. Guo et al. (2015) overcame this issue by subtracting a fraction of the combined responses of inhibitory part detectors from the combined responses of excitatory part detectors. Both the excitatory and the inhibitory parts are determined automatically. They applied these new filters to the recognition of architectural and electrical symbols, demonstrating the effectiveness of the method even for noisy images. Nevertheless, to the best of our knowledge, there are no works that use colour information to improve the detection of coloured objects, nor works that provide an automatic solution for invariance to background intensity; these are two of the contributions presented in this dissertation.
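To give a flavour of the computation, the following is a toy sketch of a COSFIRE-like response, not the authors' implementation: the blurred responses of Gabor filters are shifted so that each configured contour segment supports the filter centre, and are then combined by a geometric mean. The Gabor parameters, the fixed blur and the part tuples are all assumptions of the example.

import cv2
import numpy as np

def gabor_energy(gray, theta):
    # Response magnitude of a Gabor filter at orientation theta (toy parameters).
    kernel = cv2.getGaborKernel((21, 21), 4.0, theta, 10.0, 0.5, 0)
    return np.abs(cv2.filter2D(gray.astype(np.float32), -1, kernel))

def cosfire_like(gray, parts, sigma=2.0):
    # parts: (theta, dx, dy) tuples, one per contour segment of the configured
    # prototype; (dx, dy) is the segment position relative to the filter centre.
    response = np.ones(gray.shape, dtype=np.float32)
    for theta, dx, dy in parts:
        r = gabor_energy(gray, theta)
        r = cv2.GaussianBlur(r, (0, 0), sigma)              # positional tolerance
        r = np.roll(np.roll(r, -dy, axis=0), -dx, axis=1)   # shift to the centre
        response *= r
    return response ** (1.0 / len(parts))                   # geometric mean of parts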