Fast Exact Inference for Recursive Cardinality Models
Daniel Tarlow, Kevin Swersky, Richard S. Zemel (Dept. of Computer Science, University of Toronto)
Ryan P. Adams (Sch. of Eng. & Appl. Sci., Harvard University)
Brendan J. Frey (Prob. & Stat. Inf. Group, University of Toronto)

Abstract

Cardinality potentials are a generally useful class of high order potential that affect probabilities based on how many of D binary variables are active. Maximum a posteriori (MAP) inference for cardinality potential models is well-understood, with efficient computations taking O(D log D) time. Yet efficient marginalization and sampling have not been addressed as thoroughly in the machine learning community. We show that there exists a simple algorithm for computing marginal probabilities and drawing exact joint samples that runs in O(D log² D) time, and we show how to frame the algorithm as efficient belief propagation in a low order tree-structured model that includes additional auxiliary variables. We then develop a new, more general class of models, termed Recursive Cardinality models, which take advantage of this efficiency. Finally, we show how to do efficient exact inference in models composed of a tree structure and a cardinality potential. We explore the expressive power of Recursive Cardinality models and empirically demonstrate their utility.

1 Introduction

Probabilistic graphical models are widely used in machine learning due to their representational power and the existence of efficient algorithms for inference and learning. Typically, however, the model structure must be restricted to ensure tractability. To enable efficient exact inference, the most common restriction is that the model have low tree-width.

A natural question to ask is if there are other, different restrictions that we can place on models to ensure tractable exact or approximate inference. Indeed, a celebrated result is the ability of the "graph cuts" algorithm to exactly find the maximum a posteriori (MAP) assignment in any pairwise graphical model with binary variables, where the internal potential structure is restricted to be submodular. Along similar lines, polynomial-time algorithms can exactly compute the partition function in an Ising model if the underlying graph is planar (Fisher, 1961).

Extensions of these results have been a topic of much recent interest, particularly for the case of MAP inference. Gould (2011) shows how to do exact MAP inference in models with certain higher order terms via graph cut-like algorithms, and Ramalingham et al. (2008) give results for multilabel submodular models. Tarlow et al. (2010) provide efficient algorithms for a number of other high-order potentials.

Despite these successes in finding the optimal configuration, there has been relatively less progress in efficient high order marginalization and sampling. This partially stems from the difficulty of some of the computations associated with summation in these models. For example, computing the partition function for binary pairwise submodular models (where graph cuts can find the MAP) is #P-complete, so we do not expect to find an efficient exact algorithm.

One important high-order potential where such hardness results do not exist is the cardinality potential, which expresses constraints over the number of variables that take on a particular value. Such potentials come up in natural language processing, where they may express a constraint on the number of occurrences of a part-of-speech, e.g., that each sentence contains at least one verb. In computer vision, a cardinality potential might encode a prior distribution over the relationship between the size of an object in an image and its distance from the camera. In a conference paper matching system, cardinality potentials could enforce a requirement that, e.g., each paper have 3-4 reviews and each reviewer receive 8-10 papers.

A simple form of model containing a cardinality potential is a model over binary variables, where the model probability is a Gibbs distribution based on an energy function consisting of unary potentials θ_d and one cardinality potential f(·):

$$-E(\mathbf{y}) = \sum_{d} \theta_d y_d + f\!\left(\sum_{d} y_d\right) \qquad (1)$$

$$p(\mathbf{y}) = \frac{\exp\{-E(\mathbf{y})\}}{\sum_{\mathbf{y}'} \exp\{-E(\mathbf{y}')\}} \qquad (2)$$

where no restrictions are placed on f(·). We call this the standard cardinality potential model. Perhaps the best-known algorithm in machine learning for computing marginal probabilities is due to Potetz and Lee (2008); however, the runtime is O(D³ log D), which is impractical for larger problems.
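To make Eqs. (1)-(2) concrete, here is a minimal brute-force sketch of the standard cardinality potential model. It is not from the paper; the function name cardinality_model_marginals and its interface are illustrative. It enumerates all 2^D configurations, which is exactly the cost the algorithms discussed next avoid.

```python
import itertools
import numpy as np

def cardinality_model_marginals(theta, f):
    """Brute-force reference for the standard cardinality potential model.

    theta : length-D array of unary potentials theta_d
    f     : callable mapping a count in {0, ..., D} to a real value
    Returns (Z, marginals) where marginals[d] = p(y_d = 1).
    Enumerates all 2^D configurations, so only usable for small D.
    """
    theta = np.asarray(theta, dtype=float)
    D = len(theta)
    Z = 0.0
    marginals = np.zeros(D)
    for bits in itertools.product([0, 1], repeat=D):
        y = np.array(bits)
        # Negative energy from Eq. (1): sum_d theta_d * y_d + f(sum_d y_d)
        neg_energy = theta @ y + f(int(y.sum()))
        w = np.exp(neg_energy)  # unnormalized probability weight, Eq. (2)
        Z += w
        marginals += w * y
    return Z, marginals / Z

# Example: 10 variables with a cardinality term that prefers exactly 3 on.
theta = np.random.randn(10)
Z, p = cardinality_model_marginals(theta, lambda c: 0.0 if c == 3 else -5.0)
```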
We observe in this paper that there are lesser-known algorithms from the statistics and reliability engineering literature that are applicable to this task. Though these earlier algorithms were not presented in terms of a graphical modeling framework, we will present them as such, introducing an interpretation as a two step procedure: (i) create auxiliary variables so that the high order cardinality terms can be re-expressed as unary potentials on auxiliary variables, then (ii) pass messages on a tree-structured model that includes original and auxiliary variables, using a known efficient message computation procedure to compute individual messages. The runtime for computing marginal probabilities with this procedure will be O(D log² D). This significant efficiency improvement over the Potetz and Lee (2008) approach makes the application of cardinality potentials practical in many cases where it otherwise would not be. For example, exact maximum likelihood learning can be done efficiently in the standard cardinality potential model using this formulation.
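The counting computation at the heart of step (ii) can be sketched without the full auxiliary-variable construction: unnormalized distributions over the number of active variables in disjoint blocks can be merged by convolution on a balanced binary tree. The sketch below is an illustration under that reading, not the paper's implementation; count_distribution and partition_function are hypothetical names, it computes only the partition function rather than marginals or samples, and the direct np.convolve calls make it O(D²) as written, with FFT-based convolutions recovering the O(D log² D) bound stated above.

```python
import numpy as np

def count_distribution(weights):
    """Unnormalized distribution over sum_d y_d for independent binary y_d.

    weights[d] = exp(theta_d) is the weight of y_d = 1 (weight 1 for y_d = 0).
    Returns c of length D + 1 with c[k] = sum over configurations having
    exactly k ones of exp(sum_d theta_d y_d). Blocks are merged pairwise on a
    balanced binary tree by convolving their count distributions.
    """
    blocks = [np.array([1.0, w]) for w in weights]
    while len(blocks) > 1:
        merged = [np.convolve(blocks[i], blocks[i + 1])
                  for i in range(0, len(blocks) - 1, 2)]
        if len(blocks) % 2 == 1:  # odd block carried up to the next level
            merged.append(blocks[-1])
        blocks = merged
    return blocks[0]

def partition_function(theta, f):
    """Z for the standard cardinality potential model of Eqs. (1)-(2)."""
    c = count_distribution(np.exp(theta))
    return float(sum(c[k] * np.exp(f(k)) for k in range(len(c))))

# Example usage with the same kind of theta and f as the brute-force sketch.
Z = partition_function(np.random.randn(16), lambda k: 0.0 if k == 3 else -5.0)
```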
We then go further and introduce a new high order class of potential that generalizes cardinality potentials, termed Recursive Cardinality (RC) potentials, and show that for balanced RC structures, exact marginal computations can be done in the same O(D log² D) time. Additionally, we show how the algorithm can be slightly modified to draw an exact sample with the same runtime. We follow this up by developing several new application formulations that use cardinality and RC potentials, and we demonstrate their empirical utility. The algorithms are equally applicable within an approximate inference algorithm, like loopy BP, variational message passing, or tree-based schemes. This also allows fast approximate inference in multi-label models that contain cardinality potentials separately over each label.

Finally, we show that cardinality models can be combined with a tree-structured model, and again assuming a balanced tree, exact inference can be done in the same O(D log² D) time (for non-balanced trees, the runtime is O(D²)). This leads to a model class that strictly generalizes standard tree structures, which is also able to model high order cardinality structure.

2 Related Work

2.1 Applications of Cardinality Potentials

Cardinality potentials have seen many applications in diverse areas. For example, in worker scheduling programs in the constraint programming literature, they have been used to express regulations such as "each sequence of 7 days must contain at least 2 days off" and "a worker cannot work more than 3 night shifts every 8 days" (Régin, 1996). Milch et al. (2008) develop cardinality terms in a relational modeling framework, using a motivating example of modeling how many people will attend a workshop. In error correcting codes, message passing-based decoders often use constraints on a sum of binary variables modulo 2 (Gallager, 1963). Another application is in graph problems, such as finding the maximum-weight b-matching, in which the cardinality parameter b constrains the degree of each node in the matching (Huang & Jebara, 2007), or to encode priors over sizes of partitions in graph partitioning problems (Mezuman & Weiss, 2012).

More recently, cardinality potentials have become popular in language and vision applications. In part-of-speech tagging, cardinalities can encode the constraint that each sentence contains at least one verb and noun (Ganchev et al., 2010). In image segmentation problems from computer vision, they have been utilized to encourage smoothness over large blocks of pixels (Kohli et al., 2009), and Vicente et al. (2009) show that optimizing out a histogram-based appearance model leads to an energy function that contains cardinality terms.

2.2 Maximization Algorithms

As noted previously, there is substantial work on performing MAP inference in models containing one or more cardinality potentials. In these works, there is a division between methods for restricted classes of cardinality-based potential and those that work for arbitrary cardinality potentials. When the form of the cardinality potential is restricted, tractable exact maximization can sometimes be performed in models that contain many such potentials, e.g., Kohli et al. (2009); Ramalingham et al. (2008); Stobbe and Krause (2010); Gould (2011). A related case, where maximization can only be done approximately, is the "pattern potentials" of Rother et al. (2009). For arbitrary functions of counts, the main approaches are those of Gupta et al. (2007) and Tarlow et al. (2010). The former gives a simple O(D log D) algorithm for performing MAP inference, and the latter gives an algorithm with the same complexity for computing the messages necessary for max-product belief propagation.

2.3 Summation Algorithms

Relatively less work in the machine learning community has examined efficient inference of marginal probabilities. A dynamic programming approach, using auxiliary variables similar to those of Potetz and Lee, dates back at least to Gail et al. (1981). In this work, the algorithmic challenge is to compute the probability that exactly k of D elements are chosen to be on, given that elements turn on independently and with non-uniform probabilities. Naively, this would require summing over $\binom{D}{k}$ configurations, but Gail et al. show that it can be done in O(Dk) time using dynamic programming.
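The O(Dk) dynamic program attributed to Gail et al. (1981) is compact enough to sketch. The version below is a standard recursion consistent with the description above rather than a transcription of their paper; prob_exactly_k is an illustrative name.

```python
import numpy as np

def prob_exactly_k(p, k):
    """P(exactly k of D independent Bernoulli variables are on).

    p : length-D array with p[d] = P(y_d = 1), allowed to differ per d
    k : target count
    dp[j] holds P(the variables processed so far contain exactly j ones)
    and is updated one variable at a time, so the total cost is O(D * k).
    """
    dp = np.zeros(k + 1)
    dp[0] = 1.0
    for pd in p:
        # Update counts high-to-low so dp[j - 1] still refers to the
        # distribution before this variable was included.
        for j in range(k, 0, -1):
            dp[j] = dp[j] * (1.0 - pd) + dp[j - 1] * pd
        dp[0] *= 1.0 - pd
    return dp[k]

# Example: probability that exactly 3 of 10 heterogeneous coins land heads.
print(prob_exactly_k(np.random.uniform(0.1, 0.9, size=10), k=3))
```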