Metric Labeling with Tree-Metrics
Pedro Felzenszwalb, Gyula Pap, Éva Tardos, Ramin Zabih

Metric labeling
Graph G = (V, E), set of labels L, labeling f : V → L
- c(v, a) is a cost for giving label a to node v
- d(a, b) is a distance on L
- we would like related objects to get similar labels

Goal: find f minimizing

Q(f) = Σ_{v∈V} c(v, f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))
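As a concreteness check, the objective can be evaluated directly. This is a minimal sketch under our own conventions: cost tables `c`, `d` and weights `w` are plain dictionaries, not an interface from the paper.

```python
# Minimal sketch: evaluate the metric-labeling objective Q(f).
# The dictionary-based interface (c, d, w) is our own illustrative convention.

def labeling_cost(nodes, edges, f, c, d, w):
    """Q(f) = sum_v c[v][f[v]] + sum_{(u,v)} w[u,v] * d[f[u]][f[v]]."""
    assignment = sum(c[v][f[v]] for v in nodes)
    separation = sum(w[(u, v)] * d[f[u]][f[v]] for (u, v) in edges)
    return assignment + separation

# Tiny instance: two nodes joined by an edge of weight 2, labels {0, 1},
# d(i, j) = |i - j|, and each node preferring a different label.
nodes = [0, 1]
edges = [(0, 1)]
c = {0: {0: 0, 1: 5}, 1: {0: 5, 1: 0}}
d = {i: {j: abs(i - j) for j in (0, 1)} for i in (0, 1)}
w = {(0, 1): 2}
print(labeling_cost(nodes, edges, {0: 0, 1: 1}, c, d, w))  # 0 + 0 + 2*1 = 2
```

The example shows the tension the objective captures: the cheapest per-node labels disagree, so the separation term pays w_uv · d.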
The general problem is NP-hard
- log(k) approximation via LP
If G is a tree, or has low tree-width
- dynamic programming
General G, L = {1,...,k} and d(i, j) = |i - j|
- min s-t cut
- (or any d(i, j) that is convex in i - j)
Application: image de-noising
We observe a noisy image and want to pick a new value for each pixel
- each value should be similar to the observed one
- values of nearby pixels should be similar

Metric labeling with "ideal label"
Observations are labels: an observed label o(v) is associated with each object v
The assignment cost measures the distance between labels
- c(v, f(v)) = w_v · d(o(v), f(v))

Q(f) = Σ_{v∈V} w_v · d(o(v), f(v)) + Σ_{(u,v)∈E} w_uv · d(f(u), f(v))

Still NP-hard: generalizes 0-extension (multiway cut)

"ideal label" + tree metric
Main result
- fast algorithm when d is a tree-metric
- k: number of labels
- n: number of objects being labeled
- ~ log(k) min-cuts on graphs with n nodes
- runtime: O(log(k)(g(n) + k))
- g(n): time for a min-cut in a graph with n nodes

Tree-metric not enough
Consider arbitrary assignment costs
- NP-hard even if d is a tree-metric
- a star graph generalizes the Potts model (d = 0/1)
Compare to L = {1,...,k} and d(i, j) = |i - j|
- min s-t cut
- a special case of tree-metric: a path

Application: spatially coherent clustering

[Figure: input image I, a tree of labels, and the optimal labeling]

Each pixel in I is red, green, or blue; we build a tree-metric via hierarchical clustering

Main tool: (a, b) swaps
We have a current labeling f; labels a and b compete over the objects currently labeled a or b
The optimal (a, b) swap can be found using a min cut on a graph whose nodes are the objects currently labeled a or b, plus terminals s and t
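The swap move described above can be sketched as follows. This is a hedged sketch under our own interface: `D(v, l)` bundles the assignment cost of giving object v label l together with the separation terms to neighbours outside the swap, the terminal names `'s'`/`'t'` are reserved, and the max-flow routine is a plain Edmonds-Karp rather than the authors' implementation.

```python
from collections import deque

def min_cut_side(cap, s, t):
    """Edmonds-Karp max-flow on a directed capacity map `cap[(u, v)]`;
    returns the set of nodes on the source side of a minimum s-t cut."""
    res = dict(cap)
    adj = {}
    for (u, v) in list(res):
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
        res.setdefault((v, u), 0)
    while True:
        # BFS for an augmenting path
        parent = {s: None}
        q = deque([s])
        while q and t not in parent:
            u = q.popleft()
            for v in adj.get(u, ()):
                if v not in parent and res[(u, v)] > 0:
                    parent[v] = u
                    q.append(v)
        if t not in parent:
            break
        path, v = [], t
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        push = min(res[e] for e in path)
        for (u, v) in path:
            res[(u, v)] -= push
            res[(v, u)] += push
    # source side = nodes reachable in the residual graph
    side, q = {s}, deque([s])
    while q:
        u = q.popleft()
        for v in adj.get(u, ()):
            if v not in side and res[(u, v)] > 0:
                side.add(v)
                q.append(v)
    return side

def optimal_ab_swap(f, a, b, edges, w, d_ab, D):
    """Optimal (a, b) swap move via a min s-t cut. D(v, l) is the cost of
    giving object v label l (assignment cost plus separation to neighbours
    outside the swap); d_ab = d(a, b). Source-side objects take label a."""
    P = {v for v in f if f[v] in (a, b)}
    cap = {}
    for v in P:
        cap[('s', v)] = D(v, b)   # cutting s->v assigns b to v
        cap[(v, 't')] = D(v, a)   # cutting v->t assigns a to v
    for (u, v) in edges:
        if u in P and v in P:     # separating u and v costs w_uv * d(a, b)
            cap[(u, v)] = cap.get((u, v), 0) + w[(u, v)] * d_ab
            cap[(v, u)] = cap.get((v, u), 0) + w[(u, v)] * d_ab
    side = min_cut_side(cap, 's', 't')
    g = dict(f)
    for v in P:
        g[v] = a if v in side else b
    return g
```

The cut value equals the swap's contribution to Q: each object pays the cost of the side it lands on, and each separated pair of in-swap neighbours pays w_uv · d(a, b).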
[Figure: optimal (a, b) swap via min cut; s, t links reflect assignment costs and separation from other objects]

Kolen's optimality result
d is a tree-metric; tree edges define adjacent labels

Theorem: a local minimum with respect to adjacent swaps is a global minimum

How to find a local minimum?
- Kolen: one swap per edge in the tree: O(k·g(n))
- our main result: still one swap per edge, but on smaller and smaller graphs: O(log(k)·g(n))

Sweep algorithm
Pick a root label r; label all objects r
Repeat:
- pick a tree edge (a, b) with a explored and b not explored
- find the optimal (a, b) swap move
The label of each node starts at r and moves down the tree
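The sweep loop above can be sketched generically, with the per-edge swap move injected as a callback. This is a sketch with an assumed interface, not the authors' code: `swap(f, a, b)` stands in for the min-cut swap move.

```python
def sweep(tree_adj, root, objects, swap):
    """Sweep over the label tree: start with every object labeled `root`,
    then perform one optimal (a, b) swap per tree edge, moving top-down.
    `swap(f, a, b)` returns the labeling after the optimal (a, b) swap."""
    f = {v: root for v in objects}
    explored = {root}
    stack = [root]
    while stack:
        a = stack.pop()
        for b in tree_adj[a]:
            if b not in explored:
                f = swap(f, a, b)   # one min-cut per tree edge
                explored.add(b)
                stack.append(b)
    return f
```

Each object's label only ever moves away from the root, which is what the correctness and runtime arguments below exploit.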
Correctness of sweep algorithm
Let f be the final labeling
Theorem: after the sweep, no adjacent swap improves f

Proof idea
- consider a tree edge (a, b)
- let g be the labeling right after the (a, b) swap
- anything labeled a (resp. b) by f is labeled a (resp. b) by g
- if an (a, b) swap improves f, it also improves g

Runtime analysis (balanced binary trees)

Initially all n objects have label a; a has children b and c
Two swap moves from a: O(g(n)) time
- n_b nodes adopt label b
- n_c nodes adopt label c
- n_b + n_c ≤ n
Two swap moves from b: O(g(n_b)) time
Two swap moves from c: O(g(n_c)) time
Time charged to b and c: O(g(n_b)) + O(g(n_c)) = O(g(n))
n_l: the number of objects that adopt label l in the swap move from its parent
- for the labels d, e, f, g at the next level: n_d + n_e + n_f + n_g ≤ n
Time charged to all nodes at each level: O(g(n))
A tree with k labels has depth log(k)
Total time: O(log(k)·g(n))
General case: O(depth × max-degree × g(n))

Setting up swap
We need to know the cost of assigning a and b to each object
Each object has an ideal label l
- the cost difference depends on where l lives relative to the edge (a, b)
- it can be computed in O(1) time per object via a DFS labeling
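One standard way to realize the O(1) test (our reading of the "DFS labeling" bullet, not necessarily the exact scheme in the paper): precompute DFS entry/exit times on the label tree, so "does the ideal label l lie on b's side of the edge (a, b)?" becomes an interval-containment check.

```python
def dfs_intervals(tree_adj, root):
    """Entry/exit times from an iterative DFS over the label tree."""
    tin, tout, t = {}, {}, 0
    stack = [(root, None, iter(tree_adj[root]))]
    tin[root] = t; t += 1
    while stack:
        node, parent, it = stack[-1]
        child = next(it, None)
        if child is None:
            tout[node] = t; t += 1
            stack.pop()
        elif child != parent:
            tin[child] = t; t += 1
            stack.append((child, node, iter(tree_adj[child])))
    return tin, tout

def on_b_side(l, b, tin, tout):
    """With the tree rooted on a's side of the edge (a, b): True iff
    label l lies in the subtree rooted at b, i.e. on b's side."""
    return tin[b] <= tin[l] and tout[l] <= tout[b]
```

The preprocessing is one O(k) traversal; each of the n objects then classifies its ideal label against the current edge in constant time.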
Non-binary trees
[Figure: a high-degree label a split into a and a′ joined by a zero-length edge]

Binarize the tree by adding extra nodes
- the number of labels doesn't grow much
- the optimal solution may use the extra labels, but we can relabel without changing the quality
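A sketch of one such binarization (an assumed construction consistent with the slide: zero-length edges preserve all tree distances, so the tree-metric is unchanged; the copy-naming convention is ours):

```python
def binarize(tree_adj, length):
    """Split every label of degree > 3 by moving pairs of neighbours onto
    fresh copies joined to the original by zero-length edges. Distances in
    the tree (hence the tree-metric) are unchanged.
    `length` maps frozenset({u, v}) -> edge length."""
    adj = {u: list(vs) for u, vs in tree_adj.items()}
    L = dict(length)
    for u in list(adj):
        k = 0
        while len(adj[u]) > 3:
            v1, v2 = adj[u].pop(), adj[u].pop()
            copy = (u, k); k += 1
            adj[copy] = [v1, v2, u]
            adj[u].append(copy)
            for v in (v1, v2):      # rewire v's edge from u to the copy
                adj[v][adj[v].index(u)] = copy
                L[frozenset((copy, v))] = L.pop(frozenset((u, v)))
            L[frozenset((u, copy))] = 0
    return adj, L
```

Each split removes two neighbours from a node and adds one, so a degree-d node produces O(d) copies and the total number of labels at most doubles.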
Deep trees?

Divide and conquer

Pick any tree edge (a, b); removing it splits the label tree into Ta and Tb
Find the optimal labeling using only labels a and b; this breaks the objects into two sets A and B
Theorem: there exists an optimal labeling f where
- nodes in A get a label from Ta
- nodes in B get a label from Tb

Subproblems
Independently label A and B
- find the optimal labeling of A using Ta (neighbors across the cut act as a forced label)

Running time
Repeatedly pick a tree edge that evenly divides the labels
- binary tree -> good partition
Depth of recursion: log(k)
At each depth we have "disjoint subproblems"
- each object participates in one subproblem
Total time for min-cuts at each depth: O(g(n))
Total runtime: O(log(k)(g(n) + k))
- min cuts + building subproblems

Proof of correctness
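The divide-and-conquer recursion just described can be sketched as follows. The interfaces `solve_two` and `split_edge` are assumptions standing in for the min-cut step and the balanced edge choice; this is an illustration, not the authors' implementation.

```python
def component(tree_adj, labels, start, blocked):
    """Labels (within `labels`) on `start`'s side of the edge (start, blocked)."""
    allowed = set(labels)
    seen, stack = {start}, [start]
    while stack:
        u = stack.pop()
        for v in tree_adj[u]:
            if v != blocked and v in allowed and v not in seen:
                seen.add(v)
                stack.append(v)
    return sorted(seen)

def divide_and_conquer(labels, tree_adj, objects, solve_two, split_edge):
    """Label `objects` using the label subtree `labels`. `split_edge(labels)`
    picks a tree edge (a, b) splitting the labels evenly; `solve_two(objects,
    a, b)` returns the optimal labeling using only a and b (one min cut)."""
    if len(labels) == 1:
        return {v: labels[0] for v in objects}
    a, b = split_edge(labels)
    f = solve_two(objects, a, b)
    A = [v for v in objects if f[v] == a]
    B = [v for v in objects if f[v] == b]
    out = {}
    out.update(divide_and_conquer(component(tree_adj, labels, a, b),
                                  tree_adj, A, solve_two, split_edge))
    out.update(divide_and_conquer(component(tree_adj, labels, b, a),
                                  tree_adj, B, solve_two, split_edge))
    return out
```

At each recursion depth the object sets A, B are disjoint, which is what bounds the min-cut work per depth at O(g(n)).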
By correctness of the sweep algorithm
- we can root the tree at a and do the (a, b) swap first
- the final f labels objects in A (resp. B) with labels in Ta (resp. Tb)

Direct proof sketch
Let f be the optimal labeling using a and b only
- A: the set of objects labeled a
- B: the set of objects labeled b
Let g be the optimal labeling subject to
- objects in A take labels in Ta
- objects in B take labels in Tb
Suppose there exists an h better than g; then
- there are objects S in A given labels in Tb by h, or
- there are objects S in B given labels in Ta by h
(f: optimal a, b labeling; g: optimal labeling respecting the A, B partition; h: optimal overall, better than g)

Suppose there are objects S in A given labels in Tb by h
Pick h so that Σ_{v∈S} d(h(v), a) is as small as possible
Let f′ be f with S relabeled to b; this must not improve f:

Q(f′) − Q(f) ≥ 0

Change h to h′ by moving the labels of the objects in S towards a:

Q(h′) − Q(h) ≤ Q(f) − Q(f′) ≤ 0

So h′ is optimal as well

Spatially coherent clustering

Hierarchical clustering of colors -> label tree
Labeling -> spatially coherent clustering
~100,000 objects, ~50,000 labels; runs in a few seconds
Figure 2. Some images from the Berkeley segmentation dataset [21] (first row) and the result of our method (second row). Note that our output is not a full-fledged segmentation. It could be used to generate superpixels or as a preprocessing step for segmentation.
References

[5] … and its application to image querying. PAMI, 24(8):1026–1038, 2002.
[6] D. Comaniciu and P. Meer. Mean shift: a robust approach toward feature space analysis. PAMI, 24(5):603–619, 2002.
[7] A. Delong and Y. Boykov. Globally optimal segmentation of multi-region objects. In ICCV, pages 1–8, 2009.
[8] P. F. Felzenszwalb and D. P. Huttenlocher. Pictorial structures for object recognition. IJCV, 61(1), 2005.
[9] P. F. Felzenszwalb and D. P. Huttenlocher. Efficient belief propagation for early vision. IJCV, 70(1), 2006.
[10] M. Figueiredo, D. Cheng, and V. Murino. Clustering under prior knowledge with application to image segmentation. In NIPS, 2007.
[11] D. Greig, B. Porteous, and A. Seheult. Exact maximum a posteriori estimation for binary images. Journal of the Royal Statistical Society, Series B, 51(2):271–279, 1989.
[12] P. Hammer. Some network flow problems solved with pseudo-boolean programming. OR, 13:388–399, 1965.
[13] D. S. Hochbaum. An efficient algorithm for image segmentation, Markov Random Fields and related problems. JACM, 48(4):686–701, 2001.
[14] P. Indyk. Algorithmic aspects of geometric embeddings. Tutorial presented at FOCS, 2001.
[15] H. Ishikawa. Exact optimization for Markov Random Fields with convex priors. PAMI, 25(10):1333–1336, 2003.
[16] H. Ishikawa and D. Geiger. Segmentation by grouping junctions. In CVPR, pages 125–131, 1998.
[17] J. Kleinberg and E. Tardos. Algorithm Design. Addison Wesley, 2005.
[18] A. J. W. Kolen. Tree Network and Planar Rectilinear Location Theory, volume 25 of CWI Tracts. CWI, 1986.
[19] V. Lempitsky, C. Rother, and A. Blake. LogCut - efficient graph cut optimization for Markov Random Fields. In ICCV, 2007.
[20] V. Lempitsky, C. Rother, S. Roth, and A. Blake. Fusion moves for Markov Random Field optimization. PAMI, 2010.
[21] D. Martin, C. Fowlkes, and J. Malik. Learning to detect natural image boundaries using local brightness, color, and texture cues. PAMI, pages 530–549, 2004.
[22] K. Murphy, Y. Weiss, and M. Jordan. Loopy belief propagation for approximate inference: an empirical study. In UAI, 1999.
[23] D. Murray and B. Buxton. Scene segmentation from visual motion using global optimization. PAMI, 9(2), 1987.
[24] J. Pearl. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann, 1988.
[25] C. Rother, V. Kolmogorov, and A. Blake. "GrabCut" - interactive foreground extraction using iterated graph cuts. SIGGRAPH, 23(3):309–314, 2004.
[26] G. Sfikas, C. Nikou, and N. Galatsanos. Edge preserving spatially varying mixtures for image segmentation. In CVPR, pages 1–7, 2008.
[27] R. Szeliski, R. Zabih, D. Scharstein, O. Veksler, V. Kolmogorov, A. Agarwala, M. Tappen, and C. Rother. A comparative study of energy minimization methods for Markov Random Fields. PAMI, 30(6):1068–1080, 2008.
[28] L. Valiant. Universality considerations in VLSI circuits. IEEE Trans. Comput., 30(2):135–140, 1981.
[29] O. Veksler. Graph cut based optimization for MRFs with truncated convex priors. In CVPR, pages 1–8, 2007.
[30] J. Ward Jr. Hierarchical grouping to optimize an objective function. Journal of the American Statistical Association, 58(301):236–244, 1963.
[31] Y. Weiss and E. H. Adelson. A unified mixture framework for motion segmentation: incorporating spatial coherence and estimating the number of models. In CVPR, pages 321–326, 1996.
[32] R. Zabih and V. Kolmogorov. Spatially coherent clustering using graph cuts. In CVPR, pages II: 437–444, 2004.

Open problems
Simultaneous optimization over f and T
- these colors want a recent common ancestor

More general metrics?
- are these techniques helpful for general d or for arbitrary assignment costs?