
Capturing long-tail distributions of object subcategories

Xiangxin Zhu (University of California, Irvine) · Dragomir Anguelov · Deva Ramanan (University of California, Irvine)
[email protected] · [email protected] · [email protected]

Abstract

We argue that object subcategories follow a long-tail distribution: a few subcategories are common, while many are rare. We describe distributed algorithms for learning large-mixture models that capture long-tail distributions, which are hard to model with current approaches. We introduce a generalized notion of mixtures (or subcategories) that allows examples to be shared across multiple subcategories. We optimize our models with a discriminative clustering algorithm that searches over mixtures in a distributed, "brute-force" fashion. We used our scalable system to train tens of thousands of deformable mixtures for VOC objects. We demonstrate significant performance improvements, particularly for object classes that are characterized by large appearance variation.

Figure 1: Long-tail distributions exist for both object categories and subcategories. (a) shows the number of examples by object class in the SUN dataset. The blue curve in the inset shows a log-log plot, along with a best-fit line in red. This suggests that the distribution follows a long tail. (b) shows the distributions of the keypoint visibility patterns for bus and person from PASCAL (using the manual annotations of [6]), which also follow a long tail. We describe methods for automatically discovering long-tail distributions of subcategories with a distributed, "brute-force" search without using additional annotations.

1. Introduction

It is well-known that the frequency of object occurrence in natural scenes follows a long-tail distribution [26]: for example, people and windows are much more common than coffins and ziggurats (Fig. 1a). Long tails complicate analysis because rare cases from the tail still collectively make up a significant portion of the data and so cannot be ignored. Many approaches try to minimize this phenomenon by working with balanced datasets of object categories [10]. But long tails still exist for object subcategories: most people tend to stand, but people can assume a large number of unusual poses (Fig. 1b). We believe that current approaches may capture iconic object appearances well, but are still limited due to inadequate modeling of the tail.

In theory, multi-mixture or subcategory models should address this, with possibly large computational costs: train a separate model for different viewpoints, shape deformations, etc. Empirically though, these approaches tend to saturate early in performance after a modest number of mixtures [34, 20, 12, 16, 14].

We argue that the long tail raises three major challenges that current mixture models do not fully address: (1) The "right" criterion for grouping examples into subcategories is not clear. Various approaches have been suggested (including visual similarity [12], geometric similarity [5], semantic ontologies [10]), but the optimal criterion remains unknown. (2) Even given the optimal criterion, it is not clear how to algorithmically optimize for it. Typical methods employ some form of clustering, but common algorithms (e.g., k-means) tend to report clusters of balanced sizes, while we hope to obtain long-tail distributions. (3) Even given the optimal clustering, how does one learn models for rare subcategories (small clusters) with little training data?

In our work, we address all three challenges: (1) We posit that the optimal grouping criterion is simply recognition accuracy. But this presumably requires a "brute-force" search over all possible clusterings, and an evaluation of the recognition accuracy of each grouping, which appears hopeless. (2) We introduce a discriminative clustering algorithm that accomplishes this through distributed computation, making use of massively-parallel architectures. We show that long-tail cluster sizes naturally emerge from our algorithm. (3) To address the lack of training data for small clusters, we allow rare subcategories to share training examples with dominant ones, introducing a notion of overlapping subcategories. Such fluid definitions of categories are common in psychology [25]. For example, a sport utility vehicle could be equally well classified as a truck or a car. Overlapping subcategories allow cluster label assignment to decouple across subcategories, which is crucial for our distributed optimization.

Notably, our clustering algorithm does not explicitly enforce long-tail distributions as priors. Rather, our underlying hypothesis is that long-tail distributions are an emergent property of the "optimal" clustering, when measured with respect to recognition accuracy. We verify this hypothesis experimentally. It is possible that brute-force clustering of other data types with respect to other criteria may not produce long tails. Rather, our experimental results reflect an empirical property of our visual world, much like the empirical analysis of Fig. 1.

We review related work in Sec. 2, introduce our generalized subcategory model and discriminative optimization algorithm in Sec. 3, and present results in Sec. 4. We demonstrate that our long-tail mixture models significantly outperform prior work on benchmark detection datasets, in some cases achieving a factor of 2 improvement.

Figure 2: We describe overlapping subcategory models that allow training data to belong to multiple clusters with a large variation in size. For example, frontal (red) and side-view (blue) buses may share a large number of 3/4-view examples, and both are much more common than multi-body articulated buses (yellow). We show that such models better characterize objects with "long-tail" appearance distributions.

2. Related work

Subcategory discovery: Estimating subcategories is surprisingly hard; clustering based on keypoints [16, 5] and appearance [11, 3] has provided only modest performance increases [34]. [13] uses combined appearance, shape, and context information to discover a small number of common subcategories for object classification; however, the rare cases are thrown away as "outliers". One attractive approach is to use a discriminative model to re-rank and identify other nearby training examples for clustering. This is often implemented through latent mixture assignment in a max-margin model [14] or discriminative k-means [30]. In practice, such methods are sensitive to initialization and can suffer from miscalibration [27]. We describe a discriminative clustering algorithm that searches over all initializations in a distributed fashion without ever comparing scores across different models during training. Finally, our models allow for overlapping clusters. This differs from soft assignment in that the total contribution of an example need not be 1; indeed, we show that certain examples are much more dominant than others (consistent with Rosch's prototype theory [24]).

Sharing across categories: There has been much work on sharing information between object category models [26, 4, 18]. Most related is [18], which allows an object class to borrow examples from similar categories, e.g., some armchairs can be used to train sofa models. While this approach yields modest performance gains (1.4% AP), we produce larger gains, presumably due to our brute-force optimization over subcategory labels and sharing. Another attractive formalism is that of attributes, or generic properties shared across object categories [17, 15, 29]. One could interpret binary subcategory labels as a binary attribute vector; indeed, we perform multi-dimensional scaling on such a vector to generate the visualization in Fig. 2. Our approach differs from much past work in that our "attributes" are latently inferred in a data-driven manner.
Sharing across subcategories: Various approaches have also explored sharing across subcategories. For example, it is common to share local features across view-based mixtures of an object [28, 33]. Typically, subcategory mixtures are supervised, but not always [21]. We share global examples rather than local parts, as the former is more amenable to brute-force distributed optimization.


Figure 3: Our overall pipeline. We learn a massive number of candidate subcategory models in parallel, each initialized with its own training example (an exemplar) and a particular cluster size. We train each subcategory with a discriminative clustering algorithm that iterates between selecting examples for sharing and learning detectors given those examples. Finally, we select a subset of candidate subcategory detectors for each object class so as to maximize recognition accuracy. We show that this selection naturally produces subcategories with a long-tail distribution of sizes.

Figure 4: We visualize example training images on the top. We show initial exemplar models trained with them in the middle. These templates perform well (25% AP on VOC2007), but sometimes emphasize incorrect gradients in the background, such as the tree in the top-left image. Retraining with the n_m = 50 highest-scoring examples (bottom) smooths out the template, de-emphasizing such noisy gradients (since they tend not to be found in the n_m neighbors). This significantly improves performance to 42%. This suggests that optimal subcategory clusters may be overlapping, and may be computed independently for each subcategory.

3. Learning long-tail subcategory models

In this section, we describe our approach for learning long-tail subcategory models. We model each subcategory with a single-mixture deformable part model (DPM) [14]. Our overall pipeline is summarized in Fig. 3. We explain each step in detail in the following.

3.1. Initialization

We begin by training a large "overcomplete" set of thousands to tens of thousands of candidate subcategory models in parallel. This large set of models will later be pruned. We initialize our subcategory models by learning a discriminative template for each positive example using exemplar SVMs [20]. We visualize exemplar root templates in Fig. 4. In terms of category detection accuracy, they perform reasonably well (25% AP). But because it is easy to overfit to a single example, many templates include noisy features from the background.

Sharing as regularization: To help learn more reliable templates for the rare examples, we retrain subcategory model m with the n_m highest-scoring positive examples under the exemplar model. We consider this sharing as a form of "regularization" that prevents overfitting to noisy gradients. To demonstrate the effect of sharing, we visualize the exemplar templates and the retrained templates for n_m = 50 in Fig. 4. The templates "regularized" by shared examples have less noisy gradients and almost double performance, producing an AP of 42%. Indeed, "averaging" across n_m similar training examples may be a more natural regularizer than penalizing the squared norm of a template, as is typically done to prevent overfitting. This suggests that subcategory clusters need not be mutually exclusive and may overlap. In fact, we find that some positive examples are shared by many subcategories, a phenomenon that we will investigate later in Fig. 9.

Iteration: We make two further observations. First, one can iterate the procedure: find the n_m highest-scoring examples with the retrained subcategory model and repeat. Second, the optimal choice of neighbors for one cluster is independent of the choice of another cluster, suggesting that these iterations can be performed independently and in parallel. We show in Sec. 3.2 that such a distributed, iterative algorithm is guaranteed to converge, since it can be formalized as joint optimization of a well-defined (discriminative) objective function.

Cluster size: Selecting the optimal cluster size n_m is tricky. We want large n_m for common cases. Rare clusters are particularly hard to model; from one perspective, they should use a small n_m so that learned detectors are not polluted by visually dissimilar examples. On the other hand, models learned from very small clusters may tend to overfit because they are trained with less data. As argued above, we treat n_m as a subcategory-specific regularization parameter that is tuned on validation data (much as one tunes the C regularization parameter for SVMs). Specifically, we learn models for a log-linear range of n_m ∈ N = {50, 100, 200, 400, 800, 1600}. Given a dataset with |P| positives, we learn a total of K = |N||P| candidate subcategory mixtures in parallel, spanning both examples and cluster sizes. After training this large redundant set, we select a subset on validation data.

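To make the initialization stage concrete, the sketch below enumerates the candidate pool over all (exemplar, cluster size) pairs. It is only a minimal sketch under simplifying assumptions: examples are fixed-length feature vectors, templates are plain linear filters (no parts, no latent placement, no hard-negative mining), and train_linear_svm is a hypothetical stand-in for whatever linear SVM solver is used; it is not the released implementation. Each yielded candidate is independent of the others, which is what allows the full pool of K = |N||P| models to be trained in parallel (as in the map-reduce setup of Sec. 4).

```python
# Minimal sketch of Sec. 3.1 (not the released code): build an "overcomplete"
# pool of candidate subcategories, one per (exemplar, cluster size) pair.
# Assumptions: examples are fixed-length feature vectors, templates are linear,
# and the SVM hyperparameter C is an arbitrary placeholder.
import numpy as np
from sklearn.svm import LinearSVC

CLUSTER_SIZES = [50, 100, 200, 400, 800, 1600]  # the grid N of candidate n_m values

def train_linear_svm(pos, neg, C=0.01):
    """Train one linear template w from positive/negative feature vectors."""
    X = np.vstack([pos, neg])
    y = np.hstack([np.ones(len(pos)), -np.ones(len(neg))])
    return LinearSVC(C=C, fit_intercept=False).fit(X, y).coef_.ravel()

def candidate_pool(positives, negatives):
    """Yield one candidate subcategory per (exemplar i, cluster size n_m)."""
    for i, exemplar in enumerate(positives):
        # Exemplar-SVM initialization: a template trained from a single positive.
        w_init = train_linear_svm(exemplar[None, :], negatives)
        # Rank all positives by the exemplar template once.
        scores = positives @ w_init
        for n_m in CLUSTER_SIZES:
            # "Sharing as regularization": retrain on the n_m highest-scoring
            # positives under the exemplar model. Every candidate is independent,
            # so the whole pool can be trained in parallel.
            top = np.argsort(-scores)[:n_m]
            w = train_linear_svm(positives[top], negatives)
            yield {"exemplar": i, "n_m": n_m, "w": w}
```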
3.2. Discriminative clustering with sharing

We formalize the iterative algorithm introduced in the previous subsection. We do so by writing an objective function for jointly training all |M| subcategory models, and describe a coordinate descent optimization that naturally decouples across subcategories.

Let us write a mixture of templates (which can be part models or simply rigid templates) as

f(x) = max_{m ∈ M} w_m · x    (1)

where m indicates a subcategory mixture component, M is the set of all mixture components, w_m is the template for m, and x is an example. Given a training dataset (x_i, y_i) where y_i ∈ {−1, 1} (for the detection problem), we explicitly write the non-convex learning objective as a function of both mixture models {w_m} and binary latent variables z_im ∈ {0, 1} that take on the value 1 if the i-th positive belongs to mixture component m. [23] refers to the |P| × |M| binary matrix Z = [z_im] as the "latent matrix", where |P| is the number of positive examples:

L(w, Z) = Σ_{m ∈ M} [ (1/2)||w_m||² + C Σ_{i ∈ pos} z_im max(0, 1 − w_m · x_i) + C Σ_{i ∈ neg} max(0, 1 + w_m · x_i) ]    (2)

A standard latent SVM problem without sharing [14] can be written as a joint minimization over (w, Z) subject to the constraint that the rows of Z sum to 1 (Σ_m z_im = 1): each positive example i is assigned to exactly one mixture component.

We now replace the hard-assignment constraint Σ_m z_im = 1 with Σ_i z_im = n_m; we sum over i instead of m. This means that rather than forcing each positive to be assigned to exactly one mixture component, we force mixture component m to consist of n_m examples. This constraint allows a single positive example i to be used to learn multiple mixtures, which provides a natural form of sharing between mixtures.

We describe a coordinate descent algorithm for optimizing (2) subject to our new linear constraints:

1. (Sharing) min_Z L(w, Z): Compute w_m · x_i for i ∈ pos. Sort the scores and set z_im to 1 for the n_m highest values.

2. (Learning) min_w L(w, Z): Learn w_m with a convex program (SVM) using the n_m positives and all negatives.

Step 1 optimizes the latent assignment Z while keeping the template weights w fixed. In detail, this step assigns n_m positive examples to each template w_m so as to minimize the hinge loss. The loss is exactly minimized by assigning the n_m highest-scoring positives to m. Step 2 optimizes w_m while fixing Z. This step is a standard SVM problem. As both steps decrease the loss in (2), the procedure is guaranteed to converge to a local optimum.

Unlike other clustering methods such as k-means, both of the above steps can be performed independently and in parallel for all |M| clusters.
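The two coordinate-descent steps can be summarized in a few lines. The sketch below is a simplified illustration under the same assumptions as the previous block (linear templates over fixed-length feature vectors, reusing the hypothetical train_linear_svm helper); it omits the deformable-part structure, hard-negative mining, and distributed execution of the actual system.

```python
# Sketch of the coordinate descent of Sec. 3.2 for objective (2) under the
# sharing constraint sum_i z_im = n_m (simplified; reuses the hypothetical
# train_linear_svm helper defined in the previous sketch).
import numpy as np

def discriminative_clustering(positives, negatives, W, cluster_sizes, iters=6):
    """W: initial templates, one row per mixture m; cluster_sizes[m] gives n_m."""
    Z = np.zeros((len(positives), len(W)), dtype=bool)  # latent matrix [z_im]
    for _ in range(iters):
        # Step 1 (Sharing): minimize over Z with w fixed. The hinge loss is
        # minimized by giving each mixture its n_m highest-scoring positives;
        # a single positive may be claimed by several mixtures (overlap).
        scores = positives @ W.T                        # scores[i, m] = w_m . x_i
        Z[:] = False
        for m, n_m in enumerate(cluster_sizes):
            Z[np.argsort(-scores[:, m])[:n_m], m] = True
        # Step 2 (Learning): minimize over w with Z fixed. Each column of Z is a
        # standard SVM problem (its n_m positives vs. all negatives), so the
        # mixtures can be retrained independently and in parallel.
        W = np.vstack([train_linear_svm(positives[Z[:, m]], negatives)
                       for m in range(len(W))])
    return W, Z
```

Note how both steps decouple across mixtures; this per-mixture independence is what makes the brute-force, distributed search over all initializations and cluster sizes feasible.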
3.3. Greedy model selection

For each object class, we generate a pool of |M| candidate subcategory models. Typically, |M| is in the thousands to tens of thousands. Many of these subcategory models will be redundant due to sharing. We want to compress the models by selecting a subset S ⊆ M.

We cast this as a combinatorial optimization problem: for a possible subset S, compute its average precision AP(S) on a validation set, and select the subset that maximizes AP. To evaluate AP(S), we run each subcategory model m ∈ S on the validation set, eliminate overlapping detections from the subset S with non-maximum suppression (NMS), and compute a precision-recall curve.

The search over the power set of M is clearly intractable, but we find a greedy strategy to work quite well. Initialize S = {} and repeatedly find the next-best mixture m to turn "on":

1. m* := argmax_m AP(S ∪ {m})

2. S := S ∪ {m*}

A natural stopping condition is an insufficient increase in AP. In practice, we stop after instantiating a fixed number (|S| = 50) of subcategories. The first such instantiated subcategory tends to model a dominant cluster trained with a large n_m, while later instantiations tend to consist of rare cases with small n_m.

To ensure that subcategory scores are comparable during NMS, we first calibrate the scores across models by mapping them to object class probabilities [22]. We use the same set of validation images for calibration and model selection. Calibration and selection are fast because the computationally demanding portion (training a large pool of detectors and running them on validation images) is parallelized across all |M| subcategory models. We visualize instantiated mixtures in Fig. 5 and Fig. 9.
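The greedy selection can likewise be written compactly. In the sketch below, candidate_dets and eval_ap are hypothetical stand-ins for the expensive, parallelized part (running every calibrated candidate detector on the validation images) and for the VOC-style AP evaluation with non-maximum suppression, respectively; the code only illustrates the greedy loop itself.

```python
# Sketch of the greedy model selection of Sec. 3.3 (illustrative only).
# candidate_dets: {m: calibrated detections of candidate m on validation images}
# eval_ap(dets): pools detections, applies NMS, and returns average precision.
def greedy_select(candidate_dets, eval_ap, budget=50):
    selected, pooled = [], []
    for _ in range(budget):                 # stop after |S| = budget subcategories
        best_m, best_ap = None, -1.0
        for m, dets in candidate_dets.items():
            if m in selected:
                continue
            ap = eval_ap(pooled + dets)     # AP(S u {m}) on the validation set
            if ap > best_ap:
                best_m, best_ap = m, ap
        selected.append(best_m)             # turn the next-best mixture "on"
        pooled += candidate_dets[best_m]
    return selected
```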


Figure 5: For two categories (bus and person), we plot the size of each subcategory cluster, as well as the number of true positive detections of that subcategory on test images (left). Both tend to follow a long-tail distribution; some subcategories are dominant, while many are rare. We visualize average images and templates associated with select subcategories on the right. We find that bus subcategories correspond to bus viewpoint, while person subcategories correspond to different truncations, poses, and interactions with objects such as horses and bicycles.

4. Experimental results

Map-reduce: Our approach requires training and evaluating tens of thousands of DPMs across large numbers of object categories. In order to manage learning at this scale, we have implemented an in-house map-reduce version of the DPM codebase [1]. Map-reduce [7, 8] is an architecture for parallel computing. In our system, we use mappers to collect positive examples and negative images, and mixture-specific reducers to learn the mixture models. This distributed architecture allows us to learn mixtures in parallel according to the formulation of (2).

Computation: Because our distributed training algorithm can be parallelized across |M| cores, each iteration of our learning algorithm is no slower than training a single-mixture DPM. We perform a fixed number (6) of discriminative clustering iterations, but find that cluster labels tend to stabilize after 3 iterations. At test time, our long-tail subcategory models are equivalent to running |S| = 50 single-mixture DPMs in parallel, which takes about 1 second per image in our in-house implementation.

Benchmarks: Following much past work, we evaluate our detection system on PASCAL VOC 2007. Additionally, in order to test our implementation of DPMs (DistDPMs), we evaluate our DistDPMs on the Columbia Dog Breed Dataset [19] and the CUHK Cat Head Dataset [31]. The Columbia Dog dataset consists of 8000 dog images obtained from various repositories, while the CUHK Cat dataset consists of 10000 Flickr images of cats. We used their standard training/test splits: roughly 50%/50% for dogs and 70%/30% for cats.

Distributed DPMs: Before evaluating our proposed long-tail subcategory (LTS) models, we first verify that our map-reduce DPM implementation can reproduce the state-of-the-art voc-release4 models of [1, 14] when we use the same number of mixtures and the same latent hard-assignment strategy (no sharing) as in [1]. We compare to the raw detectors without any post-processing (bounding-box prediction or contextual rescoring) in Table 1. Our implementation produces an average AP of 28.1%, compared to 31.8% from [1]. The 3% drop is mainly due to the fact that [1] uses max-regularization over all mixtures and learns 3 pairs of mirrored mixtures rather than the 6 separate mixtures in our implementation. Both of these are hard to decouple and parallelize, but are known to help empirical performance.

Even given this shortcoming, we will show that our codebase can be used to construct state-of-the-art subcategory models. We also verify that our re-implementation produces state-of-the-art performance on other benchmark datasets in Fig. 7.

Long-tail subcategories (LTS): We use a subset of the VOC2011 training set (that does not overlap with the 2007 dataset) as validation data to greedily select our long-tail subcategory models (Sec. 3.3). Interestingly, we find that greedy selection on the original training data works well, producing only a 2% drop from the 50-LTS numbers in Table 1.
Category: plane bicycle bird boat bottle bus car cat chair cow table dog horse mbike person plant sheep sofa train tv | total
voc-rel4 [1]: 29.6 57.3 10.1 17.1 25.2 47.8 55.0 18.4 21.6 24.7 23.3 11.2 57.6 46.5 42.1 12.2 18.6 31.9 44.5 40.9 | 31.8
voc-rel5 [2]: 32.4 57.7 10.7 15.7 25.3 51.3 54.2 17.9 21.0 24.0 25.7 11.6 55.6 47.5 43.5 14.5 22.6 34.2 44.2 41.3 | 32.5
LEO [32]: 29.4 55.8 9.4 14.3 28.6 44.0 51.3 21.3 20.0 19.3 25.2 12.5 50.4 38.4 36.6 15.1 19.7 25.1 36.8 39.3 | 29.6
DPM-WTA [9]: 19 48 03 10 16 41 44 9 15 19 23 10 52 34 20 10 16 28 34 34 | 24
ESVM [20]: 20.8 48.0 7.7 14.3 13.1 39.7 41.1 5.2 11.6 18.6 11.1 3.1 44.7 39.4 16.9 11.2 22.6 17.0 36.9 30.0 | 22.7
MCI+MCL [3]: 33.3 53.6 9.6 15.6 22.9 48.8 51.5 16.3 16.3 20.0 23.8 11.0 55.3 43.8 36.9 10.7 22.7 23.5 38.6 41.0 | 29.8
Our 6-DistDPM: 26.6 54.3 9.4 14.4 24.8 46.5 50.5 12.8 14.1 26.3 14.0 3.2 57.7 43.2 35.2 10.1 16.9 18.2 41.6 41.2 | 28.1
Our 50-LTS: 34.1 61.2 10.1 18.0 28.9 58.3 58.9 27.4 21.0 32.3 34.6 15.7 54.1 47.2 41.2 18.1 27.2 34.6 49.3 42.2 | 35.7

Table 1: Average precision for different object categories on the PASCAL VOC 2007 dataset. We compare our approach to existing methods on the same benchmark. The first two rows contain the results reported by the released code of [1, 2] without any post-processing. Our parallel implementation with the same number of mixtures (6-DistDPM) is shown in the second-to-last row. The last row is our long-tail subcategory model (LTS), which significantly improves performance, particularly on difficult categories with large appearance variation (Fig. 6).

(Figure 6 panels: precision-recall curves for cat, sofa, sheep, and diningtable on the top row and aeroplane, bicycle, bus, and train on the bottom row; each panel compares 50-LTS, 6-DPM, and 50-DPM.)
Figure 6: We plot precision-recall curves for PASCAL VOC 2007 categories. We show "hard" classes on the top row, where the baseline 6-mixture DPM (in green) performs poorly. These classes tend to have rich appearance variation (due to viewpoint, subclass structure, and occlusion from other objects). Scaling the baseline to a 50-mixture DPM (in cyan) yields worse performance due to the lack of training examples for each mixture component. Our 50-mixture long-tail subcategory model (LTS, in red) provides a significant improvement in such cases. The "easy" classes in the bottom row show a similar trend. We further examine this performance increase in Fig. 10.

We compare to previously published results that use the same feature set (HOG) without contextual post-processing, as the choice of features/post-processing is orthogonal to our work. We obtain the highest performance in 15/20 categories and the best overall performance. When compared to our in-house DPM implementation, our long-tail models increase performance from 28.1% to 35.7%. We diagnose this performance increase (relative to our in-house implementation) in the following.

Hard vs. easy classes: We split the set of classes into hard vs. easy, depending on baseline performance. We hypothesize that low baseline performance is indicative of a long tail (more varied appearances) that is not well modeled by the 6 mixtures of a standard DPM. We more than double the average precision of the baseline for such categories (Fig. 10). On "easier" classes, 50-LTS still performs best, but 6-DPM is a close second. We posit that a smaller number of mixtures sufficiently captures the limited appearance variation of such classes. We plot performance versus the number of mixtures on the validation set in Fig. 8. For "easy" classes such as car, we find that a few mixtures do well. For "hard" classes with more appearance variation (such as cats), we find that adding additional mixtures, even beyond 50, will likely improve performance.

50-LTS vs. exemplars: We first compare to existing non-parametric models based on exemplar SVMs [20, 11]. One may hypothesize that rare subcategories (in the tail) are well modeled with a single exemplar. Our long-tail models with sharing considerably outperform such methods in Table 1, suggesting that sharing is still crucial for the rare cases.

(Figure 7 panels: cat head detection and dog face detection. Figure 10 panels: "hard" classes and "easier" classes. Figure 8 panels: car and cat.)

Figure 7: We run our implementation of 6-DistDPM on two public datasets of dog [19] and cat [31] face detection. Our 6-DistDPM outperforms the previously reported best methods on both datasets. Our improvement is particularly large on the dog dataset, which arguably is more challenging due to the larger appearance variations of a dog face.

Figure 10: We separately analyze "hard" PASCAL classes with large appearance and viewpoint variations {bird, cat, dog, sheep, plant, table, sofa and chair}; these typically have lower average precision (AP) than other classes. On hard classes, our LTS doubles the AP of DPMs, increasing accuracy from 12% to 25%. On easier categories, our LTS model still outperforms 6-mixture DPMs by 5%. To analyze these improvements, we first scale DPMs to large (50) mixtures (50-DPM), which consistently underperform the DPMs with 6 mixtures (6-DPM). When 50-DPMs are initialized with our 50-LTS models, performance increases (suggesting that poor latent variable initialization is one reason to blame). When these models are allowed to share training examples, we see the largest performance increase, suggesting our overlapping subcategory models are crucial for large-mixture representations.

Figure 8: Performance as a function of the number of mixtures. For "easy" classes such as car, we find that a few mixtures do well. For "hard" classes with more appearance variation (such as cats), we find that adding additional mixtures, even beyond 50, improves performance.

50-LTS vs. 50-DPM: We also compare to the widely-used latent mixture-assignment algorithm of [14] for a large number of mixtures. At |S| = 50, although it seems that 50-LTS and 50-DPM have the same capacity, our approach consistently performs significantly better. The improvement may arise from two factors: (1) our mixtures are trained with a brute-force search that avoids the local minima of latent reassignment, and (2) our mixtures share training examples across clusters. We now analyze the improvement due to (1) vs. (2). We focus on the improvement on the "hard" classes; similar observations apply to the "easier" classes.

Effect of brute-force search: One might argue that discriminative latent reassignment may also benefit from a brute-force search over initializations of the latent variables. To help answer this, we initialize latent reassignment of 50-DPM using our 50-LTS detectors. This ensures that latent reassignment is initialized with a set of 50 good models that produce an AP of 25% (on the "hard" classes). This better initialization increases the performance of 50-DPM from 7% to 13% AP, suggesting that latent reassignment in [14] does suffer from local minima.

(Figure 9 panels: bus (top) and person (bottom) image popularity, measured as the number of subcategories that select each training example.)

Figure 9: This plot reveals "popular" positive training images that are selected by many subcategories. Popular training images (in red) are prototypes that are representative of the category. Unpopular or rare images (such as the multi-body articulated bus, in orange) are used by fewer subcategories.

Effect of overlapping subcategories: The above experiment also points out a conspicuous drop in performance; when initializing with 50 good detectors (25% AP), hard latent reassignment (which eliminates sharing by forcing each positive example to belong to a single cluster) dramatically drops performance to 13% AP. This suggests that the gains due to sharing are even more important than the gains due to better initialization. In our algorithm, initialization and sharing are intimately intertwined.

Conclusion: Many current approaches to object recognition successfully model the dominant iconic appearances of objects, but struggle to model the long tail of rare appearances. We show that distributed optimization and example sharing partially address this issue. We introduce a discriminative clustering algorithm that naturally allows for example sharing and distributed learning. We use this algorithm to perform a "brute-force" search over subcategory initialization and subcategory size, and demonstrate that the resulting models significantly outperform past work on difficult objects with varied appearance. We posit that our performance is now limited by the lack of training data for the rare subcategories in the tail. We may need larger training datasets to fully encounter the set of rare cases. Our analysis suggests that "big-rare-data", which focuses on rare examples not already seen, may be more beneficial than traditional "big data".

Acknowledgements: Funding for this research was provided by NSF Grant 0954083 and ONR-MURI Grant N00014-10-1-0933.

References

[1] http://www.cs.brown.edu/pff/latent/voc-release4.tgz
[2] http://www.cs.berkeley.edu/~rgb/latent/voc-release5.tgz
[3] O. Aghazadeh, H. Azizpour, J. Sullivan, and S. Carlsson. Mixture component identification and learning for visual recognition. In ECCV, 2012.
[4] Y. Aytar and A. Zisserman. Tabula rasa: Model transfer for object category detection. In ICCV, 2011.
[5] H. Azizpour and I. Laptev. Object detection using strongly-supervised deformable part models. In ECCV, 2012.
[6] L. Bourdev and J. Malik. Poselets: Body part detectors trained using 3D human pose annotations. In ICCV, 2009.
[7] C. Chu, S. Kim, Y. Lin, Y. Yu, G. Bradski, A. Ng, and K. Olukotun. Map-reduce for machine learning on multicore. NIPS, 19:281, 2007.
[8] J. Dean and S. Ghemawat. MapReduce: Simplified data processing on large clusters. Communications of the ACM, 51(1):107–113, 2008.
[9] T. Dean, M. A. Ruzon, M. Segal, J. Shlens, S. Vijayanarasimhan, and J. Yagnik. Fast, accurate detection of 100,000 object classes on a single machine. In CVPR, 2013.
[10] J. Deng, W. Dong, R. Socher, L. Li, K. Li, and L. Fei-Fei. ImageNet: A large-scale hierarchical image database. In CVPR, pages 248–255, 2009.
[11] S. K. Divvala, A. A. Efros, and M. Hebert. How important are deformable parts in the deformable parts model? In ECCV, Parts and Attributes Workshop, 2012.
[12] S. K. Divvala, A. A. Efros, and M. Hebert. Object instance sharing by enhanced bounding box correspondence. In BMVC, 2012.
[13] J. Dong, W. Xia, Q. Chen, J. Feng, Z. Huang, and S. Yan. Subcategory-aware object classification. In CVPR, 2013.
[14] P. Felzenszwalb, R. Girshick, D. McAllester, and D. Ramanan. Object detection with discriminatively trained part-based models. IEEE TPAMI, 2010.
[15] V. Ferrari and A. Zisserman. Learning visual attributes. In NIPS, 2007.
[16] C. Gu, P. Arbelaez, Y. Lin, K. Yu, and J. Malik. Multi-component models for object detection. In ECCV, 2012.
[17] C. H. Lampert, H. Nickisch, and S. Harmeling. Learning to detect unseen object classes by between-class attribute transfer. In CVPR, pages 951–958, 2009.
[18] J. J. Lim, R. Salakhutdinov, and A. Torralba. Transfer learning by borrowing examples for multiclass object detection. In NIPS, 2011.
[19] J. Liu, A. Kanazawa, D. Jacobs, and P. Belhumeur. Dog breed classification using part localization. In ECCV, 2012.
[20] T. Malisiewicz, A. Gupta, and A. Efros. Ensemble of exemplar-SVMs for object detection and beyond. In ICCV, pages 89–96, 2011.
[21] P. Ott and M. Everingham. Shared parts for deformable part-based models. In CVPR, pages 1513–1520, 2011.
[22] J. Platt. Probabilistic outputs for support vector machines and comparisons to regularized likelihood methods. In Advances in Large Margin Classifiers, pages 61–74. MIT Press, 1999.
[23] N. Razavi, J. Gall, P. Kohli, and L. V. Gool. Latent Hough transforms for object detection. In ECCV, 2012.
[24] E. Rosch. Prototype classification and logical classification: The two systems. In New Trends in Conceptual Representation: Challenges to Piaget's Theory, pages 73–86, 1983.
[25] E. Rosch and C. Mervis. Family resemblances: Studies in the internal structure of categories. Cognitive Psychology, 7(4):573–605, 1975.
[26] R. Salakhutdinov, A. Torralba, and J. Tenenbaum. Learning to share visual appearance for multiclass object detection. In CVPR, pages 1481–1488, 2011.
[27] S. Singh, A. Gupta, and A. A. Efros. Unsupervised discovery of mid-level discriminative patches. In ECCV, pages 73–86, 2012.
[28] A. Torralba, K. P. Murphy, and W. T. Freeman. Sharing visual features for multiclass and multiview object detection. IEEE TPAMI, 29(5):854–869, 2007.
[29] Y. Wang and G. Mori. A discriminative latent model of object classes and attributes. In ECCV, pages 155–168, 2010.
[30] L. Xu, J. Neufeld, B. Larson, and D. Schuurmans. Maximum margin clustering. NIPS, 17:1537–1544, 2004.
[31] W. Zhang, J. Sun, and X. Tang. Cat head detection - how to effectively exploit shape and texture features. In ECCV, 2008.
[32] L. Zhu, Y. Chen, A. L. Yuille, and W. T. Freeman. Latent hierarchical structural learning for object detection. In CVPR, 2010.
[33] X. Zhu and D. Ramanan. Face detection, pose estimation, and landmark localization in the wild. In CVPR, 2012.
[34] X. Zhu, C. Vondrick, D. Ramanan, and C. Fowlkes. Do we need more training data or better models for object detection? In BMVC, 2012.