Meta-Class Features for Large-Scale Object Categorization on a Budget

Alessandro Bergamo    Lorenzo Torresani
Dartmouth College
Hanover, NH, U.S.A.
{aleb, [email protected]}

Abstract

In this paper we introduce a novel image descriptor enabling accurate object categorization even with linear models. Akin to the popular attribute descriptors, our feature vector comprises the outputs of a set of classifiers evaluated on the image. However, unlike traditional attributes, which represent hand-selected object classes and predefined visual properties, our features are learned automatically and correspond to “abstract” categories, which we name meta-classes. Each meta-class is a super-category obtained by grouping a set of object classes such that, collectively, they are easy to distinguish from other sets of categories. By using the “learnability” of the meta-classes as the criterion for feature generation, we obtain a set of attributes that encode general visual properties shared by multiple object classes and that are effective in describing and recognizing even novel categories, i.e., classes not present in the training set. We demonstrate that simple linear SVMs trained on our meta-class descriptor significantly outperform the best known classifier on the Caltech256 benchmark. We also present results on the 2010 ImageNet Challenge database, where our system produces results approaching those of the best systems but at a much lower computational cost.

1. Introduction

In this work we consider the problem of object class recognition in large image databases. Over the last few years this topic has received a growing amount of attention in the vision community [9, 27]. We argue, however, that nearly all proposed systems have focused on a scenario involving two restrictive assumptions: the first is that the recognition problem involves a fixed set of categories, known before the creation of the database; the second is that there are no constraints on the learning and testing time of the object classifiers, besides the requirement that training and testing must be feasible. This is clearly reflected in the benchmarks of this field [13, 3], which measure the performance of recognition systems solely in terms of classification accuracy over a predefined set of classes, without consideration of the computational costs of the recognition.

We believe that these two assumptions do not meet the requirements of modern applications of large-scale object categorization. For example, recognition efficiency at test time is a fundamental requirement for scaling object classification to Web photo repositories such as Flickr, which are growing at a rate of several million new photos per day. Furthermore, while a fixed set of object classifiers can be used to annotate pictures with a set of predefined tags, the interactive nature of searching and browsing large image collections calls for the ability to let users define their own personal query categories to be recognized and retrieved from the database, ideally in real time. Depending on the application, the user can define the query category by supplying a set of image examples of the desired class, by performing relevance feedback on images retrieved for predefined tags, or perhaps by bootstrapping the recognition via text-to-image search. In all these cases, the classifiers cannot be precomputed during an offline stage, and thus both training and testing must occur efficiently at query time in order to provide results to the user in reasonable time.

In this paper we consider the problem of designing a system that can address these requirements: our goal is to develop an approach that enables accurate real-time search and recognition of arbitrary categories in gigantic image collections, where the classes are not known at the time the database is created. We propose to achieve this goal by means of a novel image descriptor enabling good recognition accuracy even with simple linear classifiers, which can be trained efficiently and, perhaps even more crucially, can be tested in just a few seconds even on databases containing millions of images. Rather than optimizing classification accuracy for a fixed set of classes, our aim is to learn a general image representation that can be used to describe and recognize arbitrary categories, even novel classes not present in the training set used to learn the descriptor. Furthermore, we show that our feature vector can be binarized with little loss of recognition accuracy to produce a compact binary code that allows even gigantic image collections to be kept in memory for more efficient testing.

Finally, although multiclass recognition of a fixed set of categories is not our main motivating application, we show that our approach achieves excellent performance even on this task. On the Caltech256 benchmark, a simple linear SVM trained on our representation outperforms the state-of-the-art LP-β classifier [12] trained on the same low-level features used to learn our descriptor. On the 2010 ImageNet Challenge (ILSVRC2010) database, linear classification with our meta-class features achieves recognition accuracy only 10.3% lower than that of the winner of the competition [20], a system that was trained for a week using a powerful cluster of machines, a specialized hardware architecture for memory sharing, and a file system capable of handling terabytes of data. Our approach, instead, allows us to fit the entire ILSVRC2010 training and test sets in the RAM of a standard computer and to produce results within a day on a budget PC.

In our approach we use as entries of our image descriptor the outputs of a predefined set of nonlinear classifiers evaluated on low-level features computed from the photo. This implies that a simple linear classification model applied to this descriptor effectively implements a nonlinear function of the original low-level features. As demonstrated in recent literature on object categorization [30, 12], these nonlinearities are critical to achieving good categorization accuracy with low-level features. The advantage of our approach is that our classification model, albeit nonlinear in the low-level features, remains linear in our descriptor and thus enables efficient training and testing. We are not the first to propose using the scores of nonlinear classifiers as features to achieve good recognition accuracy at low cost [29, 19]. However, the fundamental difference with our work is that in these prior systems the individual classifiers defining the descriptor are trained to recognize a hand-selected set of classes or visual properties. Our contribution is to replace these subjectively chosen classes with learned “abstract” categories, i.e., categories that do not necessarily exist in the real world but that capture salient visual properties shared among many object classes. We refer to these abstract categories as “meta-classes”.

Intuitively, we want our meta-class classifiers to be “repeatable” (i.e., they should produce similar outputs on images of the same object category) and to capture properties of the image that are useful for categorization. We formalize this intuition by defining each meta-class to be a set of object classes in the training set. […] training categories, by definition the classifiers trained on them will capture common visual properties shared by similar classes while being effective at discriminating visually dissimilar object classes. We demonstrate that our meta-class features greatly outperform classifier-based descriptors defined in terms of hand-selected classes [29], precisely because our abstract categories encode properties shared by many object classes and thus can produce better generalization on novel categories. Furthermore, we present for the first time categorization results using classifier-based descriptors on the large-scale ILSVRC2010 database and study their efficiency advantages compared to prior work.

2. Related Work

The problem of object class recognition in large datasets has been the subject of much recent work. While nonlinear classifiers are recognized as the state of the art in terms of categorization accuracy [12], they are difficult to scale to large training sets. Thus, much more efficient linear models are typically adopted in recognition settings involving a large number of object classes with many image examples per class [9]. As a result, much work in the last few years has focused on methods to retain high recognition accuracy even with linear classifiers. We can loosely divide these methods into three categories.

The first category comprises techniques that approximate nonlinear kernel distances via explicit feature maps [22, 31]: for many popular kernels in computer vision, these methods provide analytical mappings to slightly higher-dimensional feature spaces where inner products approximate the kernel distance. This makes it possible to achieve results comparable to those of the best nonlinear classifiers with simple linear models. However, these approaches are difficult to use when the training and test sets are very large, due to the high storage costs caused by the dimensionality of the data in the “lifted-up” space.

A second line of work involves the use of high-dimensional feature vectors to produce a higher degree of linear separability [27]: this idea is similar to the one behind explicit feature maps, with the difference that these high-dimensional signatures are not produced with the intent of approximating kernel distances between lower-dimensional features but rather to yield higher accuracy with linear models. Since large datasets represented with these high-dimensional descriptors cannot be kept in memory, the feature vectors are often stored in compressed form and decompressed on the fly, “one at a time”, during …
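The claim that a linear model on the descriptor is nonlinear in the low-level features can be illustrated with a minimal Python sketch. It is not the authors' implementation: for illustration only, each meta-class classifier is assumed to be an RBF-kernel classifier in dual form (support vectors, weights, bias), and the helper names (`rbf`, `kernel_score`, `meta_class_descriptor`, `linear_score`) and the toy weights are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
# Hypothetical nonlinear classifier in dual form: (support vectors, weights, bias).
KernelClassifier = Tuple[List[Vector], List[float], float]


def rbf(x: Vector, y: Vector, gamma: float = 1.0) -> float:
    """Gaussian RBF kernel exp(-gamma * ||x - y||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


def kernel_score(clf: KernelClassifier, x: Vector) -> float:
    """Output of one (assumed pre-trained) meta-class classifier on features x."""
    support, alphas, bias = clf
    return sum(a * rbf(s, x) for s, a in zip(support, alphas)) + bias


def meta_class_descriptor(classifiers: List[KernelClassifier], x: Vector) -> List[float]:
    """Descriptor = vector of meta-class classifier outputs on low-level features x."""
    return [kernel_score(clf, x) for clf in classifiers]


def linear_score(w: List[float], b: float, descriptor: List[float]) -> float:
    """Linear model w . f(x) + b: linear in f(x), nonlinear in x itself."""
    return sum(wi * fi for wi, fi in zip(w, descriptor)) + b


# Toy setup: two hypothetical meta-class classifiers on 2-D features.
meta_classifiers = [
    ([[0.0, 0.0]], [1.0], 0.0),  # responds near the origin
    ([[1.0, 1.0]], [1.0], 0.0),  # responds near (1, 1)
]
w, b = [1.0, -1.0], 0.0  # hypothetical linear model for a novel category

for x in ([0.0, 0.0], [1.0, 1.0]):
    phi = meta_class_descriptor(meta_classifiers, x)
    print(x, [round(v, 3) for v in phi], round(linear_score(w, b, phi), 3))
```

In a query-time setting like the one described in the introduction, the per-image descriptors would be computed once, offline; a new category would then only require learning `w` and `b` (for instance with a linear SVM, which this sketch does not include) and one dot product per database image.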

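The binarization of the descriptor mentioned in the introduction can be sketched in the same spirit. This sketch assumes that each classifier output is thresholded at zero, which this excerpt does not specify, and packs the resulting bits eight per byte; `binarize`, `pack_bits` and `linear_score_binary` are hypothetical names. A linear model can then be evaluated directly on the packed code by summing the weights of the set bits.

```python
from typing import List


def binarize(descriptor: List[float], threshold: float = 0.0) -> List[int]:
    """Map each classifier score to one bit (1 if above the assumed threshold)."""
    return [1 if v > threshold else 0 for v in descriptor]


def pack_bits(bits: List[int]) -> bytes:
    """Pack 0/1 values into bytes, 8 bits per byte, most significant bit first."""
    out = bytearray((len(bits) + 7) // 8)
    for i, bit in enumerate(bits):
        if bit:
            out[i // 8] |= 0x80 >> (i % 8)
    return bytes(out)


def linear_score_binary(w: List[float], b: float, code: bytes, dim: int) -> float:
    """Evaluate w . z + b on a packed binary code z by summing weights of set bits."""
    score = b
    for i in range(dim):
        if code[i // 8] & (0x80 >> (i % 8)):
            score += w[i]
    return score


# Toy real-valued descriptor (hypothetical classifier outputs).
descriptor = [0.7, -1.2, 0.1, -0.3, 2.5, -0.05, 0.9, -2.0, 0.4]
bits = binarize(descriptor)
code = pack_bits(bits)
print(bits, code.hex())

w = [0.5] * len(descriptor)
print(linear_score_binary(w, -1.0, code, len(descriptor)))
```

Under these assumptions, a D-dimensional descriptor shrinks from 4D bytes (as 32-bit floats) to D/8 bytes, a 32-fold reduction; with a hypothetical D = 1,000, one million images occupy about 125 MB.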

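To make the idea of an explicit feature map from the Related Work concrete, the sketch below uses one simple construction chosen for illustration, not necessarily those of [22, 31]: the histogram intersection kernel K(x, y) = Σ_i min(x_i, y_i) on features in [0, 1] is approximated by a unary (thermometer) encoding with N levels per dimension, after which a plain dot product stands in for the kernel. The function names and the choice of N are assumptions.

```python
import math
from typing import List, Sequence


def intersection_kernel(x: Sequence[float], y: Sequence[float]) -> float:
    """Histogram intersection kernel: sum_i min(x_i, y_i)."""
    return sum(min(a, b) for a, b in zip(x, y))


def unary_feature_map(x: Sequence[float], levels: int) -> List[float]:
    """Lift each x_i in [0, 1] to `levels` entries: entry j (j = 1..levels)
    equals 1/sqrt(levels) if x_i >= j / levels, else 0."""
    scale = 1.0 / math.sqrt(levels)
    return [scale if xi >= j / levels else 0.0
            for xi in x for j in range(1, levels + 1)]


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    return sum(a * b for a, b in zip(u, v))


# Toy features in [0, 1] (hypothetical normalized histograms).
x = [0.20, 0.55, 0.90, 0.00]
y = [0.40, 0.35, 0.75, 0.60]
for levels in (4, 16, 256):
    approx = dot(unary_feature_map(x, levels), unary_feature_map(y, levels))
    # Each dimension contributes floor(levels * min(x_i, y_i)) / levels,
    # so the approximation never exceeds the kernel and errs by at most len(x) / levels.
    print(levels, len(unary_feature_map(x, levels)), round(approx, 4),
          round(intersection_kernel(x, y), 4))
```

This encoding multiplies the dimensionality by N, so it is far less compact than the “slightly higher-dimensional” maps cited above, but it makes the trade-off discussed in the text easy to see: accuracy improves with N while the storage cost of the lifted data grows linearly with it.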