Local Feature Filtering for 3D Object Recognition
Md. Kamrul Hasan∗, Liang Chen†, Christopher J. Pal∗ and Charles Brown†
∗ Ecole Polytechnique de Montreal, QC, Canada. E-mail: [email protected], [email protected], Tel: +1(514) 340-5121, x.7110
† University of Northern British Columbia, BC, Canada. E-mail: [email protected], [email protected], Tel: +1(250) 960-5838

Proceedings of the Second APSIPA Annual Summit and Conference, pages 899–902, Biopolis, Singapore, 14-17 December 2010. ©2010 APSIPA. All rights reserved.

Abstract—Three filtering/sampling approaches for local features, namely Random sampling, Concentrated sampling, and Inversely concentrated sampling, are proposed and investigated. While local features characterize an image with distinctive key-points, the filtering/sampling approaches have a two-fold goal: (i) compressing an object definition, and (ii) reducing the classifier learning bias that is induced by the variance of key-point numbers among object classes. The proposed methodologies have been tested for 3D object recognition on the Columbia Object Image Library (COIL-20) dataset, where Support Vector Machines (SVMs) are applied as a classification tool and the winner-takes-all principle is used for inference. The proposed approaches achieved promising performance, with a 100% recognition rate for objects with an out-of-plane rotational range of less than 30 degrees. The systems also performed less invariantly in the asymmetrical recognition test.

I. INTRODUCTION

Object recognition is the task of identifying object(s) in an image or in a video sequence. The major object recognition approaches can be categorized as: (i) appearance based [1,2,3], (ii) geometry based [4], and (iii) local feature oriented [5-7]. The difficulties that make object recognition challenging are: scaling, rotational and translation invariance, lighting and illumination change, occlusion, cluttered background, and different types of noise. Because local features have been found superior in tackling these challenges, object recognition research has gradually become focused on local feature oriented approaches [5,8-10], and researchers have succeeded in extracting various scale, rotation, and translation invariant local features from images. Following scale space theory [11], Lowe et al. developed the Scale Invariant Feature Transform (SIFT) [5,6] features, which use differences of Gaussian functions at different scales to extract distinctive local features. In a series of papers [5,6,12], the effectiveness of the SIFT features for the object recognition task has been demonstrated. Matas et al. [8] developed descriptors computed from pixels inside convex hulls, called the Invariant Pixel Set Signatures (IPSS). Some of the successful local features in object recognition are [10]: SIFT, PCA-SIFT, Gradient Location and Orientation Histogram (GLOH), shape context, spin images, steerable filters, differential invariants, and moment invariants.

In feature based approaches, each training 2D view image of a 3D object may generate a number of key-points. Direct feature matching has been found to be expensive, as it does not scale with an increasing number of training images and target objects. In [12], Panu Turcot et al. proposed to reduce the number of features by selecting a useful feature set. Motivated by the Bag of Words (BOW) model from textual information retrieval, computer vision researchers have adapted it as the Bag of Visual Words (BOVW) model [13,14], where visual features are quantized and the generated code-words are named visual words. These visual words have been successfully tested for feature matching in 2D object images.

In this work, we propose three new local image feature filtering methodologies that are based on some very simple assumptions. Following these filtering approaches, three object recognition models have been formulated, which are described in Section II. Section III presents the two major experiments that have been conducted: (i) Object Recognition, and (ii) Asymmetrical Rotational Test. Finally, Section IV concludes with a discussion.

II. I-SIFT MODELS ($M_F$)

Our models are based on the following three assumptions:
(i) The representative views of an object $O_i$ share some common features, which concentrate around the mean feature vector.
(ii) Key-points that are distant from the mean feature vector are representative of highly distinctive feature points for object $O_i$. These define the Inverse Concentrated Samples.
(iii) Following the Monte Carlo assumptions, key-points collected by random sampling from a set of view instances might represent an object $O_i$.

For an image $I(x, y)$, we define a SIFT-Image as a set of feature descriptor vectors extracted by the SIFT algorithm [6]. Formally, a SIFT-Image of $I(x, y)$ is defined as:

$$I^{S}(\dot{x}, \dot{y}, X) = \mathrm{SIFT}(I(x, y))$$

where $X$ denotes the feature vectors detected at positions $(\dot{x}, \dot{y})$ in an image $I(x, y)$; a vector $X = (x_1, x_2, \ldots, x_D)$ has dimension $D$, where $|D| = 128$ [6]. For simplicity, in the subsequent discussions we will denote $I^{S}(\dot{x}, \dot{y}, X)$ as $I^{S}$. For a SIFT-Image $I^{S}$, the mean vector $\mu$ is calculated from all feature points detected in $I^{S}$.

A. $\mu k$-SIFT-Image ($I^{S\mu k}$)

A $\mu k$-SIFT-Image is defined as

$$I^{S\mu k}(\ddot{x}, \ddot{y}, X^{\mu k}) = kNN(\mu(I(x, y)))$$

where $kNN(\mu(I(x, y)))$ are the $k$ nearest key-points to the mean feature descriptor vector $\mu$. Therefore, for a SIFT-Image $I^{S}$, the $\mu k$-SIFT-Image ($I^{S\mu k}$) is a filtered image defined with the $k$ nearest key-points from $\mu$.

B. $\bar{\mu}k$-SIFT-Image ($I^{S\bar{\mu}k}$)

A filtered SIFT image $I^{S\bar{\mu}k}$ is defined with the $k$ farthest key-points from the mean feature descriptor vector $\mu$. Formally,

$$I^{S\bar{\mu}k}(\ddot{x}, \ddot{y}, X^{\bar{\mu}k}) = \overline{kNN}(\mu(I^{S}(X)), k)$$

where $\overline{kNN}(\mu(I^{S}(X)), k)$ denotes the farthest $k$ key-points from the mean feature descriptor vector $\mu$.

C. rand-SIFT-Image ($I^{Srk}$)

A rand-SIFT-Image $I^{Srk}$ is an image with $k$ randomly selected key-points from a SIFT image $I^{S}$. Formally,

$$I^{Srk}(\ddot{x}, \ddot{y}, X^{rk}) = \mathrm{random}(I^{S}(X), k)$$

In this work, the object recognition problem has been formulated as a supervised classification problem. For each of the filtering setups $F_S \in \{\mu k\text{-SIFT}, \bar{\mu}k\text{-SIFT}, \text{rand-SIFT}\}$, a corresponding model $M_F \in \{M_{\mu k}, M_{\bar{\mu}k}, M_{rand}\}$ is learned. The task of a model $M_F$ is to classify SIFT key-points and to recognize objects in a test image, assuming that the SIFT key-points were generated from the targeted object. We call these models Inverse-SIFT (I-SIFT).

A set of training images $\{I_1, I_2, \ldots, I_n\}$, which are the representative view samples of an object $O_i$, are grouped into a category $C_i$. For each class, a filtered feature set $F_C \in \{C_i^{S\mu k}, C_i^{S\bar{\mu}k}, C_i^{Srk}\}$ is generated. Each of the feature vectors in $F_C$ is labeled with the class label $C_i$, and a database of training feature vectors is created. The model $M_F$ has been formulated as a Support Vector Machine with a Radial Basis Function (RBF) kernel, and the model parameters were learned with a ten-fold cross validation technique. For the three filtering ($F_S$) setups, the corresponding object classification models are denoted as $M_{\mu k}$, $M_{\bar{\mu}k}$ and $M_{rand}$.

For testing, a SIFT image $I_X^{S}$ is generated from a test image $I_X$. For models $M_{\mu k}$ and $M_{\bar{\mu}k}$, the SIFT image $I_X^{S}$ is filtered and the corresponding filtered images $I^{S\mu k}$ and $I^{S\bar{\mu}k}$ are generated. On the other hand, for model $M_{rand}$, the whole set of SIFT key-points in $I_X^{S}$ is selected as the test key-point set. Each of the key-points is tested with the corresponding I-SIFT model $M_F$, and classification votes are accumulated for each object class $O_i$. A winning class, and thus the winning object, is decided by the national voting principle [1].

III. EXPERIMENTS AND RESULTS

We have carried out two experiments on the Columbia Object Image Library (COIL)-20 [15] dataset: (i) Object Recognition, and (ii) Asymmetrical Rotational Recognition Test. COIL-20 is a dataset of 20 different 3D objects with 72 views per object, taken at a pose interval of 5° by a moving camera. Fig. 1(a) shows the zero-degree samples, whereas the first thirty-six instances for object one are shown in Fig. 1(b).

Fig. 1. COIL-20 Dataset: (a) zero-degree instances from the twenty categories, and (b) the first thirty-six instances from object one.

A. Object Recognition

I-SIFT recognition results and a model comparison graph are shown in Fig. 2. The recognition results provided for the model $M_{rand}$ are an average over three test runs. A number of conclusions are drawn below on the relationship among the parameter $k$, the number of training view(s) per object, and their effect on recognition results:
• For $k = 10$, models $M_{\bar{\mu}k}$ and $M_{rand}$ achieved the 100% recognition rate with 24 and 36 training views per object, respectively.
• For $k = 20$, models $M_{\bar{\mu}k}$ and $M_{rand}$ achieved the 100% recognition rate with 24 and 18 training views per object, respectively.
• For $k = 30$, all three models were able to recognize the objects fully; the numbers of training instances required for models $M_{\mu k}$, $M_{\bar{\mu}k}$, and $M_{rand}$ were 18, 24, and 18, respectively.

From Fig. 2, it can be inferred that models $M_{\bar{\mu}k}$ and $M_{rand}$ produced similar recognition results for an increasing $k$. Model $M_{\mu k}$ performed the best when fewer than three training instances were available, while its recognition rate was lower than that of the other two models for $k = 10$. It is evident that, in general, an increasing $k$ increases the recognition rate. However, an arbitrary large $k$ may not necessarily produce better recognition. The reason is that the number of key-points varies from instance to instance, and from class to class.
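As an illustration of the concentrated, inversely concentrated, and random filtering schemes of Section II, together with the vote-based decision rule, a minimal Python sketch is given below. It is not the authors' implementation. All function names are hypothetical, Euclidean distance to the mean descriptor is assumed because the text does not name the metric, and the per-key-point labels passed to `winner_takes_all` are assumed to come from an already trained classifier (the SVM itself is omitted). The toy descriptors are 2-dimensional for readability, whereas SIFT descriptors have 128 dimensions.

```python
import math
import random
from collections import Counter


def mean_vector(descriptors):
    """Component-wise mean of a list of equal-length descriptor vectors."""
    n = len(descriptors)
    return [sum(column) / n for column in zip(*descriptors)]


def concentrated_filter(descriptors, k):
    """Keep the k descriptors nearest to the mean descriptor."""
    mu = mean_vector(descriptors)
    # Assumption: Euclidean distance; the paper does not state the metric.
    return sorted(descriptors, key=lambda d: math.dist(d, mu))[:k]


def inverse_concentrated_filter(descriptors, k):
    """Keep the k descriptors farthest from the mean descriptor."""
    mu = mean_vector(descriptors)
    return sorted(descriptors, key=lambda d: math.dist(d, mu), reverse=True)[:k]


def random_filter(descriptors, k, seed=None):
    """Keep k descriptors drawn uniformly at random without replacement."""
    rng = random.Random(seed)
    return rng.sample(descriptors, min(k, len(descriptors)))


def winner_takes_all(keypoint_labels):
    """Return the class label that received the most key-point votes."""
    return Counter(keypoint_labels).most_common(1)[0][0]


# Toy 2-D "descriptors"; their mean is (4.0, 4.0).
toy = [[0.0, 0.0], [2.0, 2.0], [3.0, 3.0], [11.0, 11.0]]
nearest_two = concentrated_filter(toy, 2)
farthest_one = inverse_concentrated_filter(toy, 1)
random_three = random_filter(toy, 3, seed=0)
# Hypothetical per-key-point predictions from a trained classifier.
winner = winner_takes_all(["cup", "duck", "cup"])
```

Because the random filter depends on the draw, a fixed `seed` is exposed here only to make runs repeatable; averaging over several runs, as done for the random model in the experiments, would use different seeds.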