Metric Labeling with Tree Metrics

Pedro Felzenszwalb, Gyula Pap, Éva Tardos, Ramin Zabih

Metric labeling

Graph G = (V, E), label set L, labeling f : V → L
- c(v, a) is the cost of giving label a to node v
- d(a, b) is a metric on L
- we would like related objects to get similar labels

Goal: find f minimizing

Q(f) = Σ_{v∈V} c(v, f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))
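As a concrete reading of the objective, here is a minimal sketch (names are illustrative, not from the talk) that evaluates Q(f) for a given labeling:

```python
# Evaluate the metric-labeling objective Q(f):
#   Q(f) = sum_v c[v][f[v]]  +  sum over edges (u, v, w_uv) of w_uv * d(f[u], f[v])
def labeling_cost(c, edges, d, f):
    assignment = sum(c[v][f[v]] for v in c)                 # per-node label costs
    separation = sum(w * d(f[u], f[v]) for u, v, w in edges)  # pairwise smoothness
    return assignment + separation
```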

The general problem is NP-hard
- log(k) approximation via LP

If G is a tree, or has low tree-width
- dynamic programming
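When the object graph G is itself a (rooted) tree, the dynamic program is the standard bottom-up recurrence; a generic textbook sketch (not the paper's code) in O(n · k²) time for k labels:

```python
# Exact metric labeling on a rooted tree G via bottom-up dynamic programming.
def tree_labeling(children, root, cost, w, d, labels):
    best = {}  # best[v][a] = cheapest cost of v's subtree when f(v) = a
    def solve(v):
        for u in children.get(v, []):
            solve(u)
        best[v] = {}
        for a in labels:
            total = cost[v][a]
            for u in children.get(v, []):
                # child u may take any label b; pay the separation w_vu * d(a, b)
                total += min(best[u][b] + w[v, u] * d(a, b) for b in labels)
            best[v][a] = total
    solve(root)
    return min(best[root].values())
```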

General G, L = {1, ..., k}, and d(i, j) = |i - j|
- min s-t cut

- (or any d(i, j) that is a convex function of i - j)

Q(f) = Σ_{v∈V} c(v, f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

Application: image de-noising

We observe a noisy image and want to pick a new value for each pixel
- each value should be similar to the observed one
- values of nearby pixels should be similar

Metric labeling with “ideal label”

Observations are labels: an observed label o(v) is associated with each object v, and the assignment cost measures the distance between labels:

- c(v, f(v)) = w_v · d(o(v), f(v))

Q(f) = Σ_{v∈V} w_v · d(o(v), f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

Still NP-hard: generalizes 0-extension (multiway cut)

“Ideal label” + tree metric

Q(f) = Σ_{v∈V} w_v · d(o(v), f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

Main result: a fast algorithm when d is a tree-metric
- k: number of labels
- n: number of objects being labeled
- ~log(k) min-cuts on graphs with n nodes
- runtime: O(log(k)(g(n) + k))
- g(n): time for a min-cut in a graph with n nodes

Tree-metric not enough

Consider arbitrary assignment costs

Q(f) = Σ_{v∈V} c(v, f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

This is NP-hard even if d is a tree-metric

A star graph generalizes the Potts model (d = 0/1)

Compare to L = {1, ..., k} with d(i, j) = |i - j| (min s-t cut)
- a special case of tree-metric: the path

Application: spatially coherent clustering

[Figure: input image I, tree of labels, optimal labeling]

Each pixel in I is red, green, or blue; we build a tree-metric via hierarchical clustering

Main tool: (a, b) swaps

We have a current labeling f; labels a and b compete over the objects currently labeled a or b.

The optimal (a, b) swap can be found using a min cut on a graph whose nodes are the objects currently labeled a or b, plus terminals s and t.

Optimal (a, b) swap

[Figure: swap graph with terminals s and t] The optimal (a, b) swap is found via min cut; the s and t links reflect assignment costs and separation from objects outside the swap.
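A self-contained sketch of the swap step as a binary min-cut (a standard Greig–Porteous-style construction; function names are ours, and a simple Edmonds–Karp max-flow stands in for whatever min-cut routine an implementation would use). `cost_a[v]` / `cost_b[v]` are assumed to already fold in separation to objects outside the swap:

```python
from collections import deque

def min_st_cut(cap, s, t):
    """Edmonds-Karp max-flow; by duality its value equals the min s-t cut.
    cap: {u: {v: capacity}} with non-negative capacities."""
    res = {}
    for u, adj in cap.items():                 # residual graph with reverse arcs
        for v, c in adj.items():
            res.setdefault(u, {})[v] = res.get(u, {}).get(v, 0) + c
            res.setdefault(v, {}).setdefault(u, 0)
    value = 0
    while True:
        parent, queue = {s: None}, deque([s])  # BFS for an augmenting path
        while queue:
            u = queue.popleft()
            for v, c in res[u].items():
                if c > 0 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if t not in parent:
            return value
        aug, v = float("inf"), t               # bottleneck capacity on the path
        while parent[v] is not None:
            aug = min(aug, res[parent[v]][v]); v = parent[v]
        v = t
        while parent[v] is not None:           # push flow along the path
            res[parent[v]][v] -= aug; res[v][parent[v]] += aug; v = parent[v]
        value += aug

def swap_cut(nodes, edges, cost_a, cost_b):
    """Swap graph: objects on the source side keep label a, sink side takes b.
    Edge weights in `edges` are assumed pre-scaled by d(a, b)."""
    cap = {"s": {}, "t": {}}
    for v in nodes:
        cap["s"][v] = cost_b[v]                  # cut when v takes label b
        cap.setdefault(v, {})["t"] = cost_a[v]   # cut when v keeps label a
    for u, v, w in edges:                        # separation paid iff u, v split
        cap[u][v] = cap[u].get(v, 0) + w
        cap[v][u] = cap[v].get(u, 0) + w
    return min_st_cut(cap, "s", "t")
```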

Kolen’s optimality result

Q(f) = Σ_{v∈V} w_v · d(o(v), f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

d is a tree-metric; tree edges define adjacent labels.

Theorem: a local minimum with respect to adjacent swaps is a global minimum.

How to find a local minimum?
- Kolen: one swap per edge in the tree: O(k g(n))
- our main result: still one swap per edge, but on smaller and smaller graphs: O(log(k) g(n))

Sweep algorithm

Pick a root label r and label all objects r. Repeat:
- pick a tree edge (a, b) with a explored and b not explored
- find the optimal (a, b) swap move

The label of each node starts at r and moves down the tree.
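The sweep loop can be sketched with the swap step abstracted away; `optimal_swap(f, a, b)` is assumed to reassign each object currently labeled a or b to the better of the two (e.g. via the min-cut construction). Names here are illustrative:

```python
# Sweep over the label tree: labels only ever move down, away from the root.
def sweep(objects, tree, root, optimal_swap):
    f = {v: root for v in objects}     # start: everything gets the root label
    explored, stack = {root}, [root]
    while stack:
        a = stack.pop()
        for b in tree.get(a, []):      # tree edge (a, b): a explored, b not
            if b not in explored:
                explored.add(b)
                optimal_swap(f, a, b)  # objects may adopt b and keep descending
                stack.append(b)
    return f
```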

Correctness of sweep algorithm

Let f be the final labeling. Theorem: after the sweep, no adjacent swap improves f.

Proof idea:
- consider a tree edge (a, b)
- let g be the labeling right after the (a, b) swap
- anything labeled a (resp. b) by f is labeled a (resp. b) by g
- if an (a, b) swap improves f, it also improves g

Runtime analysis (balanced binary trees)

Initially all n objects have label a. Two swap moves from a take O(g(n)) time:

- n_b nodes adopt label b

- n_c nodes adopt label c

- n_b + n_c ≤ n

Two swap moves from b: O(g(n_b)) time

Two swap moves from c: O(g(n_c)) time

Time charged to b and c: O(g(n_b)) + O(g(n_c)) = O(g(n))

Runtime analysis (balanced binary trees)

n_l: number of objects that adopt label l in the swap move from its parent; for labels d, e, f, g at one level, n_d + n_e + n_f + n_g ≤ n

Time charged to all nodes at each level: O(g(n)). A tree with k labels has depth log(k), so the total time is O(log(k) g(n)).


General case: O(depth * max * g(n))

Setting up swap


We need to know the cost of assigning a and b to each object. Each object has an ideal label l:
- the cost difference depends on where l lives relative to edge (a, b)
- it can be computed in O(1) time per object via a DFS labeling
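One way to read the DFS labeling (a sketch under our assumptions, not necessarily the paper's exact bookkeeping): with entry/exit times from a single DFS of the label tree, "does l lie on b's side of edge (a, b)?" becomes an O(1) interval test. For a tree metric, l on b's side means d(l, a) = d(l, b) + d(a, b), so the per-object cost difference is ±w_v · d(a, b):

```python
# One DFS of the label tree gives entry/exit times; subtree membership is then
# a constant-time interval-nesting test.
def dfs_intervals(tree, root):
    tin, tout, clock = {}, {}, [0]
    def dfs(v):
        tin[v] = clock[0]; clock[0] += 1
        for u in tree.get(v, []):
            dfs(u)
        tout[v] = clock[0]; clock[0] += 1
    dfs(root)
    return tin, tout

def on_b_side(l, b, tin, tout):
    # l is in the subtree hanging below b iff its interval nests inside b's
    return tin[b] <= tin[l] and tout[l] <= tout[b]
```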

Non-binary trees

[Figure: splitting a high-degree label a with an auxiliary label a’]

Binarize the tree by adding extra nodes:
- the number of labels doesn’t grow much
- the optimal solution may use the extra labels, but we can relabel without changing the quality
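A sketch of the binarization: split each node with more than two children by chaining in auxiliary labels (in the metric these would be attached at distance zero, which is why relabeling an auxiliary label back to the original does not change the quality). The representation and names are ours:

```python
# Binarize a label tree: every node keeps at most two children; extra children
# are pushed down below fresh ("aux", i) nodes.
def binarize(children, root):
    out, count = {}, [0]
    def build(v, kids):
        if len(kids) <= 2:
            out[v] = list(kids)
            for u in kids:
                build(u, children.get(u, []))
        else:
            count[0] += 1
            aux = ("aux", count[0])
            out[v] = [kids[0], aux]   # keep one child, push the rest below aux
            build(kids[0], children.get(kids[0], []))
            build(aux, kids[1:])
    build(root, children.get(root, []))
    return out
```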

Deep trees? Divide and conquer

Pick any tree edge (a, b); removing it splits the label tree into Ta and Tb. Find the optimal labeling using only a and b; this breaks the objects into two sets A and B.

Theorem: there exists an optimal labeling f where

- nodes in A get a label from Ta
- nodes in B get a label from Tb

Subproblems

Independently label A and B:

Find optimal labeling of A using Ta

[Figure: subproblem for A over Ta, with a forced label at the boundary]

Running time

Repeatedly pick a tree edge that evenly divides the labels
- binary tree -> good partition

Depth of recursion: log(k). At each depth we have “disjoint subproblems”:
- each object participates in one subproblem
- total time for min-cuts at each depth: O(g(n))

Total runtime: O(log(k)(g(n) + k)) (min cuts + building subproblems)

Proof of correctness

By correctness of the sweep algorithm:
- we can root the tree at a and do the (a, b) swap first
- the final f labels objects in A (resp. B) with labels in Ta (resp. Tb)

Direct proof sketch

Let f be the optimal labeling using a and b only
- A: set of objects labeled a
- B: set of objects labeled b

Let g be the optimal labeling subject to
- objects in A take labels in Ta
- objects in B take labels in Tb

Suppose there exists h better than g
- there are objects S in A given labels in Tb by h
- there are objects S in B given labels in Ta by h

Summary: f is the optimal a, b labeling; g is the optimal labeling respecting the A, B partition; h is optimal overall, better than g.

Suppose there are objects S in A given labels in Tb by h. Pick h so that Σ_{v∈S} d(h(v), a) is as small as possible. Relabeling S with b in f (giving f′) must not improve it:

Q(f′) − Q(f) ≥ 0

Change h to h′ by moving the label of each object in S towards a:

Q(h′) − Q(h) ≤ Q(f) − Q(f′) ≤ 0

So h′ is optimal as well.

Spatially coherent clustering

Hierarchical clustering of colors -> label tree; labeling -> spatially coherent clustering. ~100,000 objects, ~50,000 labels, runs in a few seconds.

Figure 2. Some images from the Berkeley segmentation dataset [21] (first row) and the result of our method (second row). Note that our output is not a full-fledged segmentation. It could be used to generate superpixels or as a preprocessing step for segmentation.

…and its application to image querying. PAMI, 24(8):1026–1038, 2002.
[6] D. Comaniciu and P. Meer. Mean shift: A robust approach toward feature space analysis. PAMI, 24(5):603–619, 2002.
[7] A. Delong and Y. Boykov. Globally optimal segmentation of multi-region objects. In ICCV, pages 1–8, 2009.
[8] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. IJCV, 61(1), 2005.
[9] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. IJCV, 70(1), 2006.
[10] M. Figueiredo, D. Cheng, and V. Murino. Clustering under prior knowledge with application to image segmentation. In NIPS, 2007.
[11] D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 51(2):271–279, 1989.
[12] P. Hammer. Some network flow problems solved with pseudo-boolean programming. OR, 13:388–399, 1965.
[13] D. S. Hochbaum. An efficient algorithm for image segmentation, Markov Random Fields and related problems. JACM, 48(4):686–701, 2001.
[14] P. Indyk. Algorithmic aspects of geometric embeddings. Tutorial presented at FOCS, 2001.
[15] H. Ishikawa. Exact optimization for Markov Random Fields with convex priors. PAMI, 25(10):1333–1336, 2003.
[16] H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In CVPR, pages 125–131, 1998.
[17] J. Kleinberg and E. Tardos. Algorithm Design. Addison Wesley, 2005.
[18] A. J. W. Kolen. Tree Network and Planar Rectilinear Location Theory, volume 25 of CWI Tracts. CWI, 1986.
[19] V. Lempitsky, C. Rother, and A. Blake. LogCut - efficient graph cut optimization for Markov Random Fields. In ICCV, 2007.
[20] V. Lempitsky, C. Rother, S. Roth, and A. Blake. Fusion moves for Markov Random Field optimization. PAMI, 2010.
[21] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, pages 530–549, 2004.
[22] K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propagation for approximate inference: an empirical study. In UAI, 1999.
[23] D. Murray and B. Buxton. Scene segmentation from visual motion using global optimization. PAMI, 9(2), 1987.
[24] J. Pearl. Probabilistic reasoning in intelligent systems: networks of plausible inference. Morgan Kaufmann, 1988.
[25] C. Rother, V. Kolmogorov, and A. Blake. “GrabCut” - interactive foreground extraction using iterated graph cuts. SIGGRAPH, 23(3):309–314, 2004.
[26] G. Sfikas, C. Nikou, and N. Galatsanos. Edge preserving spatially varying mixtures for image segmentation. In CVPR, pages 1–7, 2008.
[27] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov Random Fields. PAMI, 30(6):1068–1080, 2008.
[28] L. Valiant. Universality considerations in VLSI circuits. IEEE Trans. Comput., 30(2):135–140, 1981.
[29] O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, pages 1–8, 2007.
[30] J. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
[31] Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: Incorporating spatial coherence and estimating the number of models. In CVPR, pages 321–326, 1996.
[32] R. Zabih and V. Kolmogorov. Spatially coherent clustering using graph cuts. In CVPR, pages II: 437–444, 2004.

Open problems

Simultaneous optimization over f and T

These colors want a recent common ancestor

More general metrics?

Are these techniques helpful for general d or assignment costs?