Higher-Order Gradient Descent by Fusion-Move Graph

Hiroshi Ishikawa

Nagoya City University Department of Information and Biological Sciences 1 Yamanohata Mizuho, Nagoya 467-8501, Japan

Abstract 130 110 is now ubiquitous in many formu- lations of various vision problems. Recently, optimization 90 of higher-order potentials became practical using higher- -expansion order graph cuts: the combination of i) the fusion move 70 algorithm, ii) the reduction of higher-order binary energy blur & random 50 minimization to first-order, and iii) the QPBO algorithm. In gradient (this paper) the fusion move, it is crucial for the success and efficiency 30 of the optimization to provide proposals that fits the ener- 0 20 40 60 80 100 120 140 sec. gies being optimized. For higher-order energies, it is even Figure 1. Minimization of a third-order energy for denoising of an more so because they have richer class of null potentials. image. The vertical axis is the energy and the horizontal axis is the ffi In this paper, we focus on the e ciency of the higher-order time elapsed for optimization. The three plots are for the same en- graph cuts and present a simple technique for generating ergy discussed in 4.2. Alpha expansion failed to achieve the same proposal labelings that makes the algorithm much more ef- level of minimum energy. Proposals by pixel-wise random label ficient, which we empirically show using examples in stereo and blurring is used in [10]. The new technique by gradient de- and image denoising. scent proposal achieves the same level of minimum energy much faster. 1. Introduction

Many problems in such as segmenta- cuts: i.e., a combination of the following three techniques. tion, stereo, and image restoration are often formulated as optimization problems involving inference of the maximum i) The fusion move algorithm[18], which is a general- α a posteriori solution of a probability distribution defined by ization of the -expansion[3] algorithm that iteratively Markov Random Fields (MRFs)1. Typically, the problem fuses the current labeling and a proposed labeling us- is defined as finding a labeling of pixels that minimizes a ing binary optimization. function on the space of labeling, called the energy. Such ii) The reduction of higher-order binary enery minimiza- optimization schemes are now ubiquitous in vision, largely tion to first-order. A method for reducing second-order owing to the success of optimization techniques such as energies[6, 14] has been known for some time, and its graph cuts[3, 9, 14], belief propagation[5, 21], and tree- generalization[10] to general order was found recently. reweighted message passing[13]. Recently, optimization of higher-order potentials became iii) The QPBO algorithm[1, 2, 7], which allows the mini- practical using what we might call the higher-order graph mization of non-submodular binary problems.

1The notion of conditional random field (CRF) is increasingly popular The combination was first proposed by Woodford et al.[27] in the vision literature. A CRF contains both hidden and observed vari- when they demonstrated a second-order prior for stereo, us- ables; when the values of the observed variables are fixed, the rest of the ing the reduction[6, 14] then available. The more recent model on the hidden variables is an MRF. In the context of this paper, there is not much difference between MRF and CRF, since the observed data is reduction[10] for energies of even higher order made the assumed to be fixed before optimization. framework potentially more useful.

568

2009 IEEE 12th International Conference on Computer Vision (ICCV) 978-1-4244-4419-9/09/$25.00 ©2009 IEEE In this paper, we focus on the efficiency of this algorithm. well as addressing the problem of transforming general Specifically, we introduce a technique that makes it much multi-label functions into quadratic ones. faster (see Figure 1) by moving the current labeling by the gradient of the energy on the space of labelings to produce a 2.2. Minimization Problem proposal. We also show that using only a part of the energy The energy minimization problem considered here is as for taking gradient can also be effective. We demonstrate it follows. Let V be a set of pixels and L a set of labels, both through experiments with energies of order two and three. finite sets. Also, let C be a set of subsets of V. We call an In the next section, we continue with an overview, ex- element of C a clique. The energy E(X) is defined on the plaining the relevant techniques. In section 3, we formu- space LV of labelings X : V → L, i.e., an assignment of a late the higher-order graph cuts in more detail and present label Xv ∈ L to each pixel v ∈ V. The crucial assumption is the new technique for generating the proposal labelings. that E(X) is decomposable into a sum In section 4, we report the results of experiments showing the effectiveness of the technique using two examples, after E(X) = fC(XC), (1) which we conclude. C∈C ∈ C 2. Preliminaries Here, for a clique C , fC(XC) expresses a function fC that depends only on labels assigned by X to the pixels v in In this section, we give an overview of the recent de- C. We denote the set of labelings on the clique C by LC; C C velopments in higher-order energy minimization, which we thus XC ∈ L and fC : L → R. The cliques define a form then formally define. Then we explain the algorithms that of generalized neighborhood structure. In the case where all the new method in this paper is directly related. cliques consist of one or two pixels, it can be thought of as an undirected graph G = (V, E), where the set of cliques are 2.1. Higher Order Energies divided into the set of all singleton subsets of V and the set E of pairs of pixels, i.e., edges. In the general case, the pair As the newer energy minimization techniques came to (V, C ) can be thought of as a hypergraph. Then the Markov be widely used, the limitations of the energies readily op- Random Field of first order is written as timizable by these algorithms came to be more salient. As pointed out by Meltzer et al.[21] after they introduced an E(X) = fv(Xv) + fuv(Xuv). (2) algorithm that can find the global minimum of certain en- v∈V {u,v}∈E ergy functions, even global minimum of an energy does not solve many problems if the energy itself does not reflect the Although it is a little confusing, an n-th order MRF is given vision problem well enough. In particular, because of the one that has cliques of size up to n + 1. Thus, a second- the lack of efficient general algorithms to optimize energies order MRF can have cliques containing three pixels, and an with higher-order interactions, in most applications energies energy of the form are represented in terms of unary and pairwise clique poten- + + . tials, with a few exceptions that consider triples[4, 14]. f{v}(X{v}) f{uv}(X{uv}) f{uvw}(X{uvw}) (3) Higher-order energies can model more complex interac- {v}∈C {u,v}∈C {u,v,w}∈C tions and reflect the natural statistics better. This has been As we have already done above, we sometimes abuse the long realized[11, 22, 24], but with the success of energy notation slightly and write Xuv instead of X{u,v}, and so on. optimization methods, there is a renewed emphasis on the necessity of an efficient way to optimize MRFs of higher- 2.3. Graph Cuts and Move Making Algorithms order. Since the algorithms mentioned above can only op- timize energies with unary and pairwise clique potentials, Currently, one of the most popular optimization tech- α there have been efforts to expand them to higher-order en- niques in vision is -expansion[3]. It starts from an initial ergies. For instance, belief propagation variants[17, 23] labeling and iteratively makes a series of moves by solving have been introduced to do inference based on higher-order a binary optimization problem with an s-t mincut algorithm, clique potentials. Kohli et al.[12] extend the class of energy which can globally optimize a class of binary energy poten- functions for which the optimal α-expansion moves can be tials called submodular functions[14]. Here, we explain it computed in polynomial time. Komodakis and Paragios[16] and its extension, the fusion move algorithm. employ a master-slave decomposition framework to solve a dual relaxation to the MRF problem. Rother et al.[26] use a The α-expansion Algorithm The α-expansion algorithm soft-pattern-based representation of higher-order functions starts with an arbitrary labeling X and iteratively makes a that may for some energies lead to very compact first-order move, i.e., a change of labeling. In one iteration, the al- functions with small number of non-submodular terms, as gorithm changes the labeling so that the energy becomes

569 smaller or at least stays the same. The move is decided by timizing higher-order potentials, it means that some limita- either keeping the original label or replacing it with a glob- tions that prevented the use of these algorithms for higher- ally fixed label α at each pixel. Thus, the “area” that has order functions can possibly be overcome. the label α can only expand; hence the name α-expansion. The choice of either leaving the label the same or changing 2.4. Higher Order Graph Cut it to α at each pixel defines a binary labeling problem, using Now, the development of the two algorithms above— as its energy the original energy after the move. By mini- fusion move and QPBO—has an important implication in mizing the binary energy, the move that reduces the energy optimization of higher-order energies. For second order po- most is chosen. When the binary problem is submodular, it tentials (i.e., those with cliques having up to three pixels), a can be solved globally by the minimum-cut algorithm. By reduction technique that can reduce them into pairwise po- visiting all labels α in some order, and repeating it, E(X)is tentials has been introduced by Kolmogorov and Zabih[14] approximately minimized. The algorithm also has a guar- and later reformulated by Freedman and Drineas[6]. How- antee on how close it can approach the global minima. ever, for the result of the reduction to be minimized with the popular techniques such as graph cuts, it must be submod- The Fusion Move The fusion move[18, 19] is a simple ular. This requirement made its actual use quite rare, if not α generalization of -expansion: in each iteration, define the nonexistent. binary problem as the choice at each pixel between two ar- Thanks to the QPBO technique, now we can think of bitrary labelings, instead of between the current label and reducing higher-order potentials into pairwise ones with a α a fixed label . It seems so simple and elegant that one hope that at least part of the solution can be found. Al- may wonder why the move-making algorithm was not for- though it is not a solution to every problem, the QPBO tech- mulated this way from the beginning. The answer is simple: nique often allows solving non-submodular problem ap- it is because fusion moves are non-submodular in general, proximately by giving a large enough part of the globally α whereas in the case of -expansion, the submodularity can optimal solution to be used in a move-making algorithm be guaranteed by some simple criterion. For instance, if the to iteratively improve the solution. Moreover, we recently α pairwise energy is a metric, each -expansion is guaranteed introduced[10] a technique that extends the reduction in [6] to be submodular. It is only because of the emergence of and [14] to general higher-order energies. It is a simple / the QPBO roof-duality optimization below that we can now technique that can convert the minimization problem of any consider the general fusion move. Another special case that higher-order binary energy to that of a first order energy. α β has a guarantee of submodularity is - swap, which allows In the same paper, we also obtained a result suggest- those moves that, at each pixel, leave the current label un- ing that the use of fusion move as opposed to α-expansion α β changed or swap the two fixed labels and . is more important in higher-order energies. This may be because higher-order energies have richer class of null (or QPBO Separately, there is a recent (at least to the vi- almost null) potentials. The proposals in α-expansion are sion community) innovation that is important to the move- constant labelings; and α-expansion works well exactly making graph-cut algorithms: the optimization of binary with those energies, such as the Potts potential, that heavily energies saw an advance that allows the optimization of favors piecewise-constant solutions. For higher-order ener- non-submodular functions. This method by Boros, Hammer gies, it is crucial for the success and efficiency of the op- and their co-workers [1, 7, 2] is variously called QPBO[15] timization to provide proposals that fits the energies being or roof-duality[25] in the vision literature. If the function optimized. is submodular, QPBO is guaranteed to find the global min- The combination, which we might call the higher-order imum. Even if it is not submodular, QPBO returns a par- graph cuts, was first introduced by Woodford et al.[27], tial solution assigning either 0 or 1 to some of the pixels, where they show that second-order smoothness priors can leaving the rest unlabeled. The algorithm guarantees that be used for stereo reconstruction by introducing an inter- the partial labeling is a part of a global minimum labeling. esting framework. It integrates existing stereo algorithms, This has a crucial impact on the move-making algorithms which may sometimes be rather ad hoc, and combines their since the choice of the move in each iteration depends on results in a principled way. In the framework, an energy binary-label optimization. In particular, QPBO has an “au- is defined to represent the global tradeoff of various fac- tarky” property[15]: if we take any labeling and “overwrite” tors. The energy is minimized using the fusion move algo- it with a partial labeling obtained by QPBO, the energy for rithm, in which the various existing algorithms are utilized the resulting labeling is not higher than that for the original to generate the proposals in each iteration. In particular, labeling. This lets us ensure that energy does not increase this allows the powerful segmentation-based techniques to in move-making: we just leave the label unchanged at those be combined with the global optimization methods. In this pixels that are not labeled by QPBO. In the context of op- way, the framework allows the integration of energy formu-

570 lation and any other stereo method that produces disparity where the proposal labeling is the constant labeling that as- map. signs the same label to every pixel. The contribution of this paper is the simple but useful 3. Gradient-Descent Fusion Move observation that often the energy E(X) or its part can be differentiated, and then we can move the current labeling by In this section, we first formulate the higher-order graph the gradient of the energy in order to generate the proposal cuts in more detail and then present the new technique for labeling. generating the proposal labelings. P = X − η gradE(X) (7) 3.1. Higher Order Graph Cut Algorithm ∂E Pv = Xv − η . (8) The algorithm solves the minimization problem ex- ∂Xv plained in 2.2 by minimizing the energy (1). That is, we can generate the proposal labeling in each iter- It maintains the current labeling X. In each iteration, the ation by using a simple gradient descent. We found empir- ∈ V algorithm fuses X and a proposed labeling P L by min- ically that, unlike in the case of ordinary gradient descent, imizing a binary energy. For instance, in the α-expansion ffi α we can omit some of the energy and still get an e cient re- algorithm, the proposal is a constant labeling with label sult. For instance, we can use only the prior, higher-order everywhere. Here, it can be any labeling and how it is pre- part of the energy to get the gradient and move the label- pared is problem-specific, as is how X is initialized at the ing in that direction, with very fast decrease in energy at the beginning. { , } B beginning of the minimization process. If we do that in or- Let us denote the set 0 1 of binary labels by . The dinary gradient descent, there is usually no guarantee that following binary energy is minimized every iteration. We energy even decreases. However, the fusion move is always consider a binary labeling variable Y ∈ BV . It consists of a ∈ B ∈ guaranteed to decrease, or at least not increase, the energy; binary variable Yv for each pixel v V that indicates so we can do rough things such as this. Another benefit of the choice of the value that Xv will have at the end of the the protection by the graph cuts is that we can take a very iteration. That is, Yv = 0ifXv is to remain the same and = large step compared to ordinary gradient descent. Yv 1ifXv is to change to the proposed label Pv. In the next section, we give concrete examples and also X,P β ∈ C Let us denote by FC ( ) L the labeling on clique C ffi C demonstrate the e ciency of the method. that X will have if the value YC of Y on C is β ∈ B : ⎧ ⎪ , ⎨X if β = 0 4. Experiments FX P( β) = ⎪ v v (v ∈ C) (4) C v ⎩⎪ Pv. if βv = 1 We show the effectiveness of the gradient descent pro- posal generation in higher-order graph cuts with two exam- With this notation, we define a binary energy ples. One is the second-order stereo[27] that uses the cur- E = X,P β θ β , vature term. The other is a third-order denoising example. (Y) fC FC ( ) C(YC) (5) C∈C β∈BC 4.1. Second order Stereo Prior θ β | | where C(YC) is a polynomial of degree C defined by Woodford et al.[27] showed that second-order smooth- β ness priors can be beneficial to stereo reconstruction. Their θ (Y ) = β Y + (1 − β )(1 − Y ) , (6) C C v v v v algorithm is the higher-order graph-cut algorithm explained v∈C in 3.1 with the following special case of (1) as the energy: which is 1 if YC = β and 0 otherwise. E(X) = E (X) + E (X) (9) The polynomial E (Y) is then reduced into a quadratic D P one, i.e., a first-order MRF, using the technique described in = fv(Xv) + fC(XC). (10) [10] or its predecessor [6, 14]. We then minimize the energy ∈ v V C∈CP using the QPBO algorithm, obtaining a partial labeling. For C each pixel v that is labeled 1, we update X to P , leaving Here, the set of cliques is divided into the set of singleton v v C = {{ }| ∈ } C it unchanged for other pixels. We iterate the process until cliques D v v V and the set P of cliques with some convergence criterion is met. three pixels. The prior approximates the second derivative; thus each C = 3.2. Proposal by Gradient Descent clique C in P consists of three consecutive pixels C (u, v, w) as shown in Figure 2, and the energy term is a trun- ffi The successful use and e ciency of the fusion move al- cated absolute second difference of the disparity: gorithm crucially depends on the choice of the proposal la- beling P. We can consider the α-expansion as a special case fC(XC) = WC min (σs, |Xu − 2Xv + Xw|) , (11)

571 where σ > 0 is a constant cut-off value for robustness and s u W is a weight for each clique. The gradient of the prior C v E (X) has, as the component for each pixel v, the partial uvw P w derivative by Xv: ∂ fC(XC) ∂ fC(XC) Figure 2. The second-order cliques used in Woodford et al.[27]. gradE (X) = = . (12) P v ∂X ∂X C∈C v C,v∈C v

The RHS is the sum of the terms for (in general) six cliques C that contain v (Figure 3). Each term depends on the sign v v v and the absolute values of the second difference, e.g.,: ∂ fC(XC) = ∂ ⎧ Xv v v v ⎪ = , , , < − + <σ ⎪WC if C (v u w) 0 Xv 2Xu Xw s ⎪ ⎨2WC if C = (u, v, w), −σs < Xu − 2Xv + Xw < 0 ⎪ (13) ⎪0ifC = (u, v, w),σ < X − 2X + X Figure 3. Six cliques that contains a particular pixel in [27]. ⎪ s u v w ⎩etc. Table 1. Second-order stereo results for the “cones” stereo pair. We omit the rest since it is clear that they can be computed Bad pixels Final Energy Time easily. Original[27] 5.82% 2.84 × 1010 2153 sec. In the first phase of their fusion-move optimization, η = 0.003 5.78% 2.84 × 1010 1215 sec. Woodford et al.[27] use the result of the segmentation-based η = 0.01 7.06% 2.86 × 1010 740 sec. stereo with different degree of coarseness as proposals. They first use a local window matching process to gener- ate an approximate disparity map. Then they use 14 sets of Table 2. Second-order stereo results for the “teddy” stereo pair. parameters in total and two segmentation algorithms to pro- Bad pixels Final Energy Time duce segmentations of different degree of coarseness, rang- Original[27] 6.59% 2.60 × 1010 1630 sec. ing from highly undersegmented to highly oversegmented. η = 0.001 6.55% 2.61 × 1010 1188 sec. For each segment in each segmentation, LO-RANSAC is η = 0.003 6.67% 2.62 × 1010 524 sec. used to find the plane that produces the greatest number of inlying correspondences by the approximate disparity and set the proposal so that all the pixels in the segment lie on 4.2. Denoising by Third order Field of Experts that plane. In the second phase, α-expansion-like constant proposals are used with uniformly random disparity. In [10], we use an image denoising problem to test the higher-order graph cuts with a third-order energy, which is made possible by the new reduction introduced in the paper. Experiments Then, in the third phase, they use the result The image restoration scheme uses the recent image statis- of the first and second phases as two of the six rotating pro- tical model called the Fields of Experts (FoE) by Roth and posals. For the other four of the six proposals, they use the Black[24], which captures complex natural image statistics proposal generated by a smoothing operation on the current beyond pairwise interactions by providing a way to learn an disparity map. Curiously, they say the smoothing proposal image model from natural scenes. FoE has been shown to “can be viewed as a proxy for local methods such as gra- be highly effective, performing well at image denoising and dient descent,” but do not seem to have actually tested the image inpainting using a gradient descent algorithm. Simi- simple gradient descent for generating proposals. lar to its predecessor, the Product of Experts model[8], the We use the gradient descent proposal FoE model represents the prior probability of an image as the product of several student-T distributions: = − η Pv Xv gradEP(X) v (14) K −α 1 i p(X) ∝ 1 + (J · X )2 . (15) instead of the smoothing proposals and compare the result 2 i C of the third phase with the original one. Table 1 and 2 show C i=1 that the technique yields a result with the comparable qual- where C runs over the set of all n × n patches in the image, ity and energy almost twice as fast. and Ji is an n × n filter. The parameters Ji and αi are learned

572 from a database of natural images. Table 3. Third-order image restoration results. In the simple image restoration problem, we are given test001 test002 test003 test004 a noisy image N made by adding a Gaussian i.i.d. noise Original[10] E 37769 25030 29805 27356 with known standard deviation σ and find the maximum a Original time 1326 sec. 1330 sec. 1305 sec. 1290 sec. posteriori estimation according to the prior model (15). The This paper E 38132 24831 29683 27354 prior gives rise to a third-order MRF, with clique potentials This paper time 71 sec. 81 sec. 67 sec. 79 sec. that depend on up to four pixels. The likelihood of noisy image N given the true image X is assumed to be 5. Conclusion (N − X )2 p(N|X) ∝ exp − v v . (16) 2σ2 v∈V We suggested in [10] that the use of fusion moves as opposed to α-expansion is more important in higher-order The energy in [10] is of the form (1). Specifically, it is: energies, because they have richer class of null potentials. α α E(X) = ED(X) + EP(X) (17) The proposals in -expansion are constant labelings; and - expansion works well exactly with those energies, such as 1 2 ED(X) = (Nv − Xv) , (18) the Potts potential, that heavily favors piecewise-constant 2σ2 v∈V solutions. The message this paper brings is that, for fusion 3 moves with higher-order energies, it is crucial for the suc- = α + 1 · 2 , EP(X) i log 1 (Ji XC) (19) cess and efficiency of the optimization to provide proposals ∈C = 2 C P i 1 that fit the energies being optimized. where the set CP of cliques consists of all 2 × 2 patches in For each application, a mix of various strategies will be the image. probably necessary. In this paper, we point out a simple technique: we generate proposal labelings in fusion moves Experiments In [10], similarly to the stereo example[27], by moving the current labeling by the gradient of the en- the proposed image is one of the following two, alternating ergy. It is not really gradient descent as i) the gradient of each iteration: even a part of the energy (e.g. the prior in the stereo ex- periment) can be used to speed up the fusion move, and ii) i) a uniform random image created every time it is used, unlike ordinary gradient descent, the descent step can safely ii) a blurred image, which is made every 30 iterations by be made quite large, which speeds up the optimization es- blurring the current image with a Gaussian kernel (σ = pecially in the early stage. These can safely be done only 0.5625). because the actual move is “guarded” against increasing the energy by graph cuts. Although it is simple and seems obvi- We compared the results by this original algorithm with the ous, the technique makes the fusion move algorithm much gradient descent fusion proposal more efficient in the case of higher-order potential. = − η . Pv Xv gradE(X) v (20) Acknowledgments From the energy (18) and (19), the partial derivative of the energy is This work was partially supported by the Kayamori ∂ 3 (v) Foundation and the Grant-in-Aid for Scientific Research E(X) 1 Ji = (Xv − Nv) + αi , 19650065 from the Japan Society for the Promotion of Sci- ∂ σ2 1 Xv ∈ = 1 + (J · X )2 v V C∈CP i 1 2 i C ence. (v) where Ji denotes the component of the filter Ji that corre- sponds to the pixel v. References Table 3 shows the energy and the time for four exam- σ = [1] E. Boros, P. L. Hammer, and X. Sun. “Network Flows ple noisy images with 20 made from four images in and Minimization of Quadratic Pseudo-Boolean Functions.” the Berkeley segmentation database [20], grayscaled and RUTCOR Research Report, RRR 17-1991, May 1991. reduced in size. Figure 4 shows the result for one of the example images (test003). The quality of the two denoising [2] E. Boros, P. L. Hammer, and G. Tavares. “Preprocessing of results are comparable, but the new result was produced al- Unconstrained Quadratic Binary Optimization.” RUTCOR most 20 times faster. Figure 1 on the first page shows the Research Report, RRR 10-2006, April 2006. plot of the energy for the original and our proposal method, [3] Y. Boykov, O. Veksler, and R. Zabih. “Fast Approximate En- as well as the α-expansion, for the example image test001. ergy Minimization via Graph Cuts.” IEEE TPAMI 23:1222– All experiments used a 2.33GHz Xeon E5345 processor. 1239, 2001.

573 Figure 4. From left to right: original image (test003), noise-added image (Gaussian, σ = 20), denoised using the α expansion proposal, the proposals in [10], and using the new technique. The last two denoising results are comparable, but the new result was produced almost 20 times faster.

[4] D. Cremers and L. Grady. “Statistical Priors for Effi- [17] X. Lan, S. Roth, D. P. Huttenlocher, and M. J. Black. “Effi- cient Combinatorial Optimization Via Graph Cuts.” In cient Belief Propagation with Learned Higher-Order Markov ECCV2006, III:263–274. Random Fields.” In ECCV2006, II:269–282. ffi [5] P. Felzenszwalb and D. Huttenlocher. “E cient Belief Prop- [18] V. Lempitsky, C. Rother, and A. Blake. “LogCut - Efficient agation for Early Vision.” Int. J. Comp. Vis. 70:41–54, 2006. Graph Cut Optimization for Markov Random Fields.” In [6] D. Freedman and P. Drineas. “Energy minimization via ICCV2007. Graph Cuts: Settling What is Possible.” In CVPR2005 [19] V. Lempitsky, S. Roth, and C. Rother. “FusionFlow: II:939–946. Discrete-Continuous Optimization for Optical Flow Estima- [7] P. L. Hammer, P. Hansen, and B. Simeone. “Roof Dual- tion.” in CVPR2008. ity, Complementation and Persistency in Quadratic 0-1 Op- timization.” Mathematical Programming 28:121–155, 1984. [20] D. Martin, C. Fowlkes, D. Tal, and J. Malik. “A Database of Human Segmented Natural Images and its Application [8] G. Hinton. “Training products of experts by minimizing con- to Evaluating Segmentation Algorithms and Measuring Eco- trastive divergence.” Neural Comp., 14(7):1771–1800, 2002. logical Statistics.” In ICCV2001, pp. 416–423. [9] H. Ishikawa. “Exact Optimization for Markov Random Fields with Convex Priors.” IEEE TPAMI 25(10):1333– [21] T. Meltzer, C. Yanover, and Y. Weiss. “Globally Optimal 1336, 2003. Solutions for Energy Minimization in Stereo Vision Using Reweighted Belief Propagation.” In ICCV2005, pp. 428– [10] H. Ishikawa. “Higher-Order Clique Reduction in Binary 435. Graph Cut.” In CVPR2009. ff [11] H. Ishikawa and D. Geiger. “Rethinking the Prior Model for [22] R. Paget and I. D. Longsta . “Texture Synthesis via a Non- Stereo.” In ECCV2006, III:526–537. causal Nonparametric Multiscale Markov Random Field.” IEEE Trans. on Image Processing, 7(6):925–931, 1998. [12] P. Kohli, M. P. Kumar, and P. H. S. Torr. “P3 & Beyond: Move Making Algorithms for Solving Higher Order Func- [23] B. Potetz. “Efficient Belief Propagation for Vision Using tions.” IEEE TPAMI, 31(9):1645–1656, 2009. Linear Constraint Nodes.” In CVPR2007. [13] V. Kolmogorov. “Convergent Tree-Reweighted Message [24] S. Roth and M. J. Black. “Fields of Experts: A Framework Passing for Energy Minimization.” IEEE TPAMI 28 (10): for Learning Image Priors.” In CVPR2005, II:860–867. 1568–1583, 2006. [25] C. Rother, V. Kolmogorov, V. Lempitsky, and M. Szummer. [14] V. Kolmogorov and R. Zabih. “What Energy Functions Can “Optimizing Binary MRFs via Extended Roof Duality.” In Be Minimized via Graph Cuts?” IEEE TPAMI 26(2):147– CVPR2007. 159, 2004. [15] V. Kolmogorov and C. Rother. “Minimizing Non- [26] C. Rother, P. Kohli, W. Feng, and J. Jia. “Minimizing Sparse submodular Functions with Graph Cuts—A Review.” Higher Order Energy Functions of Discrete Variables.” In IEEE TPAMI 29(7):1274–1279, 2007. CVPR2009. [16] N. Komodakis and N. Paragios “Beyond Pairwise Ener- [27] O. J. Woodford, P. H. S. Torr, I. D. Reid, and A. W. Fitzgib- gies: Efficient Optimization for Higher-order MRFs.” In bon. “Global Stereo Reconstruction under Second Order CVPR2009. Smoothness Priors.” In CVPR2008.

574