
Modeling Images using Transformed Indian Buffet Processes

Yuening Hu† [email protected]
Ke Zhai† [email protected]
Department of Computer Science, University of Maryland, College Park, MD USA

Sinead Williamson [email protected]
Department of Machine Learning, Carnegie Mellon University, Pittsburgh, PA USA

Jordan Boyd-Graber [email protected]
iSchool and UMIACS, University of Maryland, College Park, MD USA

† indicates equal contributions.

Abstract

Latent feature models are attractive for image modeling, since images generally contain multiple objects. However, many latent feature models either ignore that objects can appear at different locations or require pre-segmentation of images. While the transformed Indian buffet process (tIBP) provides a method for modeling transformation-invariant features in unsegmented binary images, its current form is inappropriate for real images because of its computational cost and modeling assumptions. We combine the tIBP with likelihoods appropriate for real images and develop an efficient inference algorithm, using the cross-correlation between images and features, that is theoretically and empirically faster than existing inference techniques. Our method discovers reasonable components and achieves effective image reconstruction in natural images.

1. Introduction

Latent feature models assume data are generated by combining latent features shared across the dataset and aim to learn this latent structure in an unsupervised manner. Such models typically assume all properties of a feature are common to all data points—i.e., each feature appears in exactly the same way across all observations. This is often a reasonable assumption. For example, microarray data are designed so each cell consistently corresponds to a specific condition.

This does not hold for images. Consider a collection of images of a rolling ball. If a model must create new features to explain the ball's every position, it will devote less attention to other aspects of the image and will be unable to generalize across the ball's path. Instead, we would like some properties of a feature, e.g., shape, to be shared across data points but other properties, e.g., location, to be observation-specific.

Models that generalize across images to discover transformation-invariant features have many applications. Image tracking, for instance, discovers mislaid bags or illegally stopped cars. Image reconstruction restores partially corrupted images. Movie compression recognizes recurring image patches and caches them across frames.

We argue that latent feature models of images should:

• Discover features needed to model data and add additional features to model new data.
• Generalize across transformations so features can have different locations, scales, and orientations.
• Handle properties of real images such as occlusion.

A nonparametric model that comes close to our goals is the noisy-OR transformed Indian buffet process (NO-tIBP, Austerweil & Griffiths, 2010); however, its likelihood model is inappropriate for real images. Existing unsupervised models that handle realistic likelihoods (Jojic & Frey, 2001; Titsias & Williams, 2006) are parametric and cannot discover new features. In Section 2, we further describe these and other models that meet some, but not all, of our criteria.

In Section 3, we propose models that fulfill these properties by combining realistic likelihoods with nonparametric frameworks. In Section 4, we introduce novel inference algorithms that dramatically improve inference for transformed IBPs on larger datasets (Section 5). In Section 6, we show that our models can discover features and model data better than existing models. We discuss relationships with other nonparametric models and extensions in Section 7.

Appearing in Proceedings of the 29th International Conference on Machine Learning, Edinburgh, Scotland, UK, 2012. Copyright 2012 by the author(s)/owner(s).

2. Background

In this section, we review the Indian buffet process and how its extension, the transformed IBP, models simple images. We then describe likelihood models for images. These models are a prelude to the models we introduce in Section 3.

2.1. The Indian Buffet Process

The Indian buffet process (IBP, Griffiths & Ghahramani, 2005) is a distribution over binary matrices with exchangeable rows and infinitely many columns. This can define a nonparametric latent feature model with an unbounded number of features. This often matches our intuitions: we do not know how many latent features we expect to find in our data; neither do we expect to see all possible latent features in a given dataset.
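To make the prior concrete, the following sketch (our illustration, not code from the paper) draws a binary matrix from the IBP via its sequential "restaurant" construction: the $(n+1)$th row takes an existing column $k$ with probability $m_k/(n+1)$, where $m_k$ counts earlier rows in which that column is active, and then opens Poisson$(\alpha/(n+1))$ brand-new columns.

```python
import numpy as np

def sample_ibp(num_rows, alpha, rng=None):
    """Draw a binary matrix Z from the IBP prior (restaurant construction)."""
    rng = rng or np.random.default_rng()
    columns = []  # each entry is a list of 0/1 assignments, one per row seen so far
    for n in range(num_rows):
        # existing column k is taken with probability m_k / (n + 1)
        for col in columns:
            col.append(int(rng.random() < sum(col) / (n + 1)))
        # row n opens Poisson(alpha / (n + 1)) new columns, active only for itself
        for _ in range(rng.poisson(alpha / (n + 1))):
            columns.append([0] * n + [1])
    if not columns:
        return np.zeros((num_rows, 0), dtype=int)
    return np.array(columns, dtype=int).T  # rows = observations, columns = features
```

Each row activates only a finite subset of columns, yet the total number of columns grows with the data—exactly the unbounded-feature behavior described above.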
To use the IBP to model data, we must select a likelihood model that determines the form of the features corresponding to columns of $Z$ and how the subset of features selected by a row of $Z$ combine to generate a data point.¹ Many likelihoods have been proposed for the IBP, several of which are appropriate for modeling images.

¹We follow the convention that $z_n$ is the $n$th row of a matrix $Z$, and $z_{nk}$ is the $k$th element of the vector $z_n$.

2.2. The Transformed IBP

Most IBP-based latent feature models assume a feature is identical in every data point in which it appears. This precludes image modeling, where (for example) a car moves from location to location or where a person may be in either the foreground or the background. Naïve models would learn different features for each location a car appears in; a more appropriate model would learn that each observation is in fact a transformation of a common feature.

The transformed IBP (tIBP, Austerweil & Griffiths, 2010) extends the IBP to accommodate data with varying locations. In the tIBP, each column of an IBP-distributed matrix $Z$ is (as before) associated with a feature. In addition, each non-zero element of $Z$ is associated with a transformation $r_{nk}$. Transforming the features and combining them according to a likelihood model produces observations. In the original tIBP paper, features were generated and combined using a noisy-OR likelihood (Wood et al., 2006); we refer to this model as the noisy-OR tIBP (NO-tIBP). The tIBP allows the same feature to appear in different locations, scales, and orientations.

Figure 1. Generative process of the linear Gaussian IBP (IBP, panel a), the linear Gaussian tIBP (LG-tIBP, panel b), and the masked tIBP (M-tIBP, panel c). All models share features $a_k$ across the dataset, and observation-specific indicators $z_{nk}$ determine which features contribute to a data point $x_n$. In the tIBP models, transformations $r_{nk}$ change where features appear in the observation. In the IBP and LG-tIBP, features are combined additively. In the M-tIBP, only one feature contributes to each pixel of a final image. Together, a global ordering $\omega$ over features and a local binary mask $s_{n,k}$ determine which feature is responsible for the appearance of a given pixel.

2.3. Likelihoods for Latent Feature Image Models

In addition to the noisy-OR, another likelihood that has been used with the IBP is the linear Gaussian model, which assumes images are generated via a linear superposition of features (Griffiths & Ghahramani, 2005). Each IBP row selects a subset of features and generates an observation by additively superimposing these features and adding Gaussian noise, as demonstrated in Figure 1(a). This model can be extended by adding weights to the non-zero elements of the IBP-distributed matrix (Knowles & Ghahramani, 2007) and by incorporating a spiky noise model (Zhou et al., 2011) appropriate for corrupted images.
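As a concrete rendering of the linear Gaussian likelihood of Figure 1(a)—a sketch under our own array conventions, not the paper's code—an image is the sum of its active features plus isotropic Gaussian noise:

```python
import numpy as np

def linear_gaussian_image(z_n, features, sigma_x, rng=None):
    """Generate one image under the linear Gaussian IBP likelihood.

    z_n      : (K,) binary indicator row for image n
    features : (K, D) matrix whose rows are flattened latent features a_k
    sigma_x  : observation noise standard deviation
    """
    rng = rng or np.random.default_rng()
    mean = z_n @ features  # additive superposition of the selected features
    return mean + sigma_x * rng.normal(size=features.shape[1])
```

Because `z_n @ features` sums exactly the rows with $z_{nk} = 1$, every active feature contributes to every pixel it touches—which is why this likelihood cannot express occlusion.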
If we want to model images where features can occlude each other, linear Gaussian models are inappropriate. In the vision community, images are often represented via overlapping layers (Wang & Adelson, 1994), including in generative probabilistic models (Jojic & Frey, 2001; Titsias & Williams, 2006). In these "sprite" models, features are Gaussian-distributed, and an ordering is defined over the set of features. In each image, every active feature has a transformation (as in the tIBP) and a binary mask for each pixel. Given the feature order, the image is generated by taking the value, at each pixel, of the uppermost unmasked feature.

This model is appealing: it is an intuitive occlusion model; features have a consistent ordering; and only the topmost feature is visible. However, this likelihood model has only been used for parametric feature sets and on data where the number of features is known a priori.

3. Modeling Real-valued Images

While the NO-tIBP likelihood model is incompatible with real images, it provides a foundation for nonparametric models with transformed features. In this section, we use the tIBP to build models that combine nonparametric feature models with more useful and realistic likelihood functions for real images.

We begin by providing a general representation for the transformed IBP with an arbitrary likelihood:

1. Sample a binary matrix $Z \sim \mathrm{IBP}(\alpha)$, determining the features (columns) present in observations (rows).
2. For $k \in \mathbb{N}$, sample a feature $\phi_k \sim p(\phi)$.
3. For each image $n \in \{1, \ldots, N\}$:
   • For $k \in \mathbb{N}$, sample a transformation $r_{nk} \sim p(r)$.
   • Sample an image $x_n \sim p(x \mid \Phi, z_n, r_n)$.

The distribution over transformations $p(r)$, the feature likelihood $p(\phi)$, and the image likelihood $p(x \mid \Phi, z_n, r_n)$ can be defined in various ways.

For the masked tIBP (M-tIBP), the generative process can be described as follows. For each image $n$ and feature $k$, define an auxiliary variable $M_{n,k}$, the visibility indicator,

$$
M_{n,k}^{(d)} =
\begin{cases}
1 & \text{if } \operatorname{argmax}_j \, \omega_j \, z_{n,j} \, s_{n,j}^{(r_{n,j}^{-1}(d))} = k \ \text{ and } \ s_{n,k}^{(r_{n,k}^{-1}(d))} > 0 \\
0 & \text{otherwise.}
\end{cases}
\tag{2}
$$

The visibility indicator $M_{n,k}^{(d)}$ is 1 when feature $k$ is the uppermost unmasked feature at pixel $d$ in image $n$. The image and feature likelihoods for the M-tIBP are

$$a_k \sim \mathcal{N}(0, \sigma_a^2 I)$$
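To illustrate Eq. (2), the sketch below computes the visibility indicators for a single image. It assumes, as one concrete choice, that the transformations are integer translations (implemented with a circular shift) and that the ordering weights $\omega$ are positive; the function and array layout are ours, not the paper's.

```python
import numpy as np

def visibility_indicators(z_n, masks, shifts, omega):
    """Compute M_{n,k}^{(d)} of Eq. (2) for one image, assuming translations.

    z_n    : (K,) binary feature indicators for image n
    masks  : (K, H, W) binary masks s_{n,k}
    shifts : (K, 2) integer translations r_{n,k} (assumed transformation)
    omega  : (K,) positive global ordering; larger means nearer the viewer
    Returns M of shape (K, H, W): M[k] marks pixels where feature k is
    the uppermost unmasked feature.
    """
    K, H, W = masks.shape
    # s_{n,j}^{(r_{n,j}^{-1}(d))}: evaluate each mask at the back-transformed pixel
    shifted = np.stack([np.roll(masks[k], shifts[k], axis=(0, 1)) for k in range(K)])
    active = z_n[:, None, None] * shifted          # z_{n,j} s_{n,j}^{(...)} per pixel
    depth = omega[:, None, None] * active          # argmax score omega_j z_{n,j} s_{n,j}
    top = depth.argmax(axis=0)                     # uppermost candidate at each pixel
    M = np.zeros_like(active)
    M[top, np.arange(H)[:, None], np.arange(W)[None, :]] = 1
    covered = active.max(axis=0) > 0               # enforce s_{n,k}^{(...)} > 0
    return M * covered
```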
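Tying the pieces together, here is a minimal end-to-end sketch of the general representation above, specialized to the M-tIBP; it reuses `sample_ibp` and `visibility_indicators` from the earlier sketches. The uniform shifts and Bernoulli masks are illustrative placeholders, since the text here leaves $p(r)$ and the mask model open.

```python
import numpy as np

def sample_mtibp_images(num_images, alpha, H, W, sigma_a, sigma_x, rng=None):
    """End-to-end generative sketch: tIBP prior + masked occlusion likelihood."""
    rng = rng or np.random.default_rng()
    Z = sample_ibp(num_images, alpha, rng)           # step 1: feature indicators
    K = Z.shape[1]
    if K == 0:                                       # no features drawn: pure noise
        noise = [sigma_x * rng.normal(size=(H, W)) for _ in range(num_images)]
        return Z, np.zeros((0, H, W)), noise
    A = sigma_a * rng.normal(size=(K, H, W))         # step 2: a_k ~ N(0, sigma_a^2 I)
    omega = rng.random(K)                            # global depth ordering
    images = []
    for n in range(num_images):                      # step 3: per-image transformations
        shifts = rng.integers(0, (H, W), size=(K, 2))        # placeholder p(r)
        masks = (rng.random((K, H, W)) < 0.5).astype(int)    # placeholder masks s_{n,k}
        M = visibility_indicators(Z[n], masks, shifts, omega)
        shifted = np.stack([np.roll(A[k], shifts[k], axis=(0, 1)) for k in range(K)])
        mean = (M * shifted).sum(axis=0)             # uppermost unmasked feature wins
        images.append(mean + sigma_x * rng.normal(size=(H, W)))
    return Z, A, images
```

Because at most one $M_{n,k}^{(d)}$ is nonzero per pixel, the sum over $k$ selects a single feature's value at each pixel, realizing the occlusion behavior that the additive linear Gaussian likelihood cannot express.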