A Probabilistic Image Jigsaw Puzzle Solver
Citation: Taeg Sang Cho; Avidan, S.; Freeman, W.T., "A probabilistic image jigsaw puzzle solver," Computer Vision and Pattern Recognition (CVPR), 2010 IEEE Conference on, pp. 183-190, 13-18 June 2010. doi: 10.1109/CVPR.2010.5540212. © Copyright 2010 IEEE
As Published: http://dx.doi.org/10.1109/CVPR.2010.5540212
Publisher: Institute of Electrical and Electronics Engineers (IEEE)
Version: Final published version
Citable link: http://hdl.handle.net/1721.1/71674
Terms of Use: Article is made available in accordance with the publisher's policy and may be subject to US copyright law. Please refer to the publisher's site for terms of use.
978-1-4244-6985-7/10/$26.00 ©2010 IEEE

Taeg Sang Cho†, Shai Avidan‡, William T. Freeman†
† Massachusetts Institute of Technology
‡ Tel-Aviv University, Adobe Systems Inc.

Abstract

We explore the problem of reconstructing an image from a bag of square, non-overlapping image patches, the jigsaw puzzle problem. Completing jigsaw puzzles is challenging and requires expertise even for humans, and is known to be NP-complete. We depart from previous methods that treat the problem as a constraint satisfaction problem and develop a graphical model to solve it. Each patch location is a node and each patch is a label at nodes in the graph.

A graphical model requires a pairwise compatibility term, which measures an affinity between two neighboring patches, and a local evidence term, which we lack. This paper discusses ways to obtain these terms for the jigsaw puzzle problem. We evaluate several patch compatibility metrics, including the natural image statistics measure, and experimentally show that the dissimilarity-based compatibility, which measures the sum-of-squared color difference along the abutting boundary, gives the best results. We compare two forms of local evidence for the graphical model: a sparse-and-accurate evidence and a dense-and-noisy evidence. We show that the sparse-and-accurate evidence, fixing as few as 4 to 6 patches at their correct locations, is enough to reconstruct images consisting of over 400 patches. To the best of our knowledge, this is the largest puzzle solved in the literature. We also show that one can coarsely estimate the low resolution image from a bag of patches, suggesting that a bag of image patches encodes some geometric information about the original image.

1. Introduction

We explore the problem of reconstructing an image from a bag of square image patches, the jigsaw puzzle problem. Given square, non-overlapping patches sampled from an image grid, our goal is to reconstruct the original image from them.

A jigsaw puzzle is an intellectually intriguing problem, which is also provably technically challenging. Demaine et al. [5] show that the jigsaw puzzle problem is NP-complete when the pairwise affinity of jigsaw pieces is unreliable. Despite the challenge, many scientific problems, including speech descrambling [23], DNA / RNA modeling [14], reassembling archeological relics [2] and document fragments [24], can be modeled as jigsaw puzzles. The NP-completeness of jigsaw puzzles has also been exploited in cryptography [3, 7].

In this paper, we focus on solving image jigsaw puzzles with square pieces. This type of puzzle, sometimes called a jig swap puzzle, is missing the shape information of individual pieces, which is critical for evaluating pairwise affinities among them. Therefore this problem formulation is even more challenging than solving conventional jigsaw puzzles. This, however, is a good framework for analyzing structural regularities in natural images since it requires us to focus on the image content to solve the puzzle.

This paper also lays groundwork for addressing the patch-based image editing / image synthesis problems in which the image layout is required, but is not readily apparent. For example, in the patch transform image editing scenario [4], one needs to know the image layout in order to synthesize a visually pleasing image. However, in some cases (for instance, when we mix patches from multiple images to synthesize a single image), it is unclear what the image layout should be. This paper studies how well we can recover the image layout and a natural looking image from a bag of image patches. Such statistical characterization of images is useful for image processing and image synthesis tasks.

We use a graphical model to solve the jigsaw puzzle problem: each patch location is a node in the graph and each patch is a label at each node. Hence, the problem is reduced to finding a patch configuration that is most likely on the graph. Cho et al. [4] solved this problem in their patch transform work, but assumed access to a low-resolution version of the original image, information not available for the jigsaw puzzle problem. Nevertheless, we are assured that we can solve the jigsaw puzzle problem if we can address the simpler problem of the lack of a low resolution image.

We evaluate two methods to address this issue. The first approach estimates a low resolution image from a bag of patches. The estimated low resolution image serves as dense-and-noisy local evidence for the graphical model. The second approach is to fix a small number of patches, called anchor patches, at their correct locations. Anchor patches serve as sparse-and-accurate local evidence. We can view the anchor patches as injected geometric information. We study how much geometric information is needed to reliably reconstruct an image from its bag of patches.

We demonstrate successful image reconstructions of 20 test images. The results suggest that the spatial layout of a bag of patches is quite constrained by the patches in the bag, and that a simple bag of patches does not throw away as much geometric information as might be thought.

Contribution

We summarize our contributions as follows:

• We explore a number of patch compatibility metrics for the graphical model. We show that the dissimilarity-based compatibility, which measures the sum-of-squared color difference along the abutting boundary, is the most discriminative.
• We evaluate two strategies to model the evidence term in the graphical model: dense-and-noisy evidence and sparse-and-accurate evidence. The first approach estimates the low resolution image from a bag of patches. The second approach assumes that a few patches, called anchor patches, are fixed at their correct locations in the puzzle.
• We introduce three measures to evaluate the puzzle reconstruction accuracy, and show that our algorithm can reconstruct real images reliably.

2. Background

Freeman and Gardner [8] were the first to propose an algorithm for solving jigsaw puzzles. Many papers [10, 11, 16, 21] assume classic jigsaw pieces with distinct shapes, and focus on matching the shape of the pieces to solve the puzzle. Kosiba et al. [12] considered both the boundary shape and the image contents, and many papers [13, 15, 22] followed suit. Most algorithms solve the puzzle in two steps. The frame pieces are assembled first and the interior is filled in with a greedy algorithm. To date, the maximum number of jigsaw pieces completed by these algorithms is 320 (16x20) [15], and most of them report the reconstruction result on just one or a few images.

... a similar scene structure as $y$, and $i$ be the index of the patch locations. To reconstruct an image, the patch transform maximizes the following probability:

$$P(x; y) = \frac{1}{Z} \prod_{i=1}^{N} \left\{ p(y_i \mid x_i) \prod_{j \in \mathcal{N}(i)} p_{i,j}(x_j \mid x_i) \, p(x_i) \right\} E(x) \tag{1}$$

where $p_{i,j}(x_j \mid x_i)$ is the probability of placing a patch $x_j$ in the neighborhood of another patch $x_i$, $\mathcal{N}(i)$ is the Markov blanket of a node $i$, and $E(x)$ is an exclusion term that discourages patches from being used more than once. In contrast to Cho et al. [4], we do not assume we know what the low resolution image $y$ is.

We can interpret Eq. (1) as a graphical model, and find the patch configuration $x$ that maximizes the probability in Eq. (1) using loopy belief propagation. The message from a node $j$ to a node $i$ is:

$$m_{ji}(x_i) \propto \sum_{x_j} p_{i,j}(x_i \mid x_j) \, p(y_j \mid x_j) \prod_{l \in \mathcal{N}(j) \setminus i} m_{lj}(x_j) \tag{2}$$

We can find the marginal probability at a node $i$ by gathering all messages from its neighbors and the local evidence:

$$b_i(x_i) = p(y_i \mid x_i) \prod_{j \in \mathcal{N}(i)} m_{ji}(x_i) \tag{3}$$

$E(x)$ is a factor node that gathers messages from all nodes. $E(x)$ suppresses the use of the patch $l$ if any of the other nodes already claimed the patch $l$ with a high probability. In terms of message passing, the factor $f$ sends a message $m_{fi}$ to a node $i$:

$$m_{fi}(x_i = l) \approx \prod_{t \in S \setminus i} \left( 1 - m_{tf}(x_t = l) \right) \tag{4}$$

where $m_{tf}$ is the marginal probability at node $t$, and $S$ is the set of all image nodes. We use this model, which Cho et al. [4] used for image editing, to solve jigsaw puzzles.

3. Compatibility

The pairwise patch compatibility $P_{i,j}(x_j \mid x_i)$ tells us how likely it is for a patch $x_j$ to appear next to another patch $x_i$. There are four types of compatibilities for each pair of patches: the compatibility of placing the patch $x_j$ to the left/right/top/bottom of the patch $x_i$. If the pairwise compatibility between patches is accurate, we can solve the
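As a hedged illustration of the dissimilarity measure named in the abstract (the sum-of-squared color difference along the abutting boundary), the sketch below computes that quantity for each of the four placements of one patch relative to another. The function names, the relation labels, and the representation of a patch as rows of (r, g, b) tuples are assumptions made for this example only. How the dissimilarity is turned into a compatibility probability is not stated in this part of the text and is not modeled here.

```python
# Minimal sketch, not the authors' implementation.
# A patch is a list of rows; each row is a list of (r, g, b) tuples (assumed layout).

def _edge(patch, side):
    """Return the boundary pixels on one side of a patch."""
    if side == "left":
        return [row[0] for row in patch]
    if side == "right":
        return [row[-1] for row in patch]
    if side == "top":
        return list(patch[0])
    if side == "bottom":
        return list(patch[-1])
    raise ValueError(f"unknown side: {side}")

# For each placement of patch_j relative to patch_i:
# (side of patch_i that touches patch_j, side of patch_j that touches patch_i).
_ABUTTING = {
    "left": ("left", "right"),    # patch_j sits to the left of patch_i
    "right": ("right", "left"),   # patch_j sits to the right of patch_i
    "top": ("top", "bottom"),     # patch_j sits above patch_i
    "bottom": ("bottom", "top"),  # patch_j sits below patch_i
}

def boundary_dissimilarity(patch_i, patch_j, relation):
    """Sum of squared color differences along the boundary shared by two patches."""
    if relation not in _ABUTTING:
        raise ValueError(f"unknown relation: {relation}")
    side_i, side_j = _ABUTTING[relation]
    edge_i = _edge(patch_i, side_i)
    edge_j = _edge(patch_j, side_j)
    return sum(
        (ci - cj) ** 2
        for pix_i, pix_j in zip(edge_i, edge_j)
        for ci, cj in zip(pix_i, pix_j)
    )

# Toy check: cut a horizontal gray ramp (4 rows, 8 columns) into two 4x4 patches.
ramp = [[(c, c, c) for c in range(8)] for _ in range(4)]
left_half = [row[:4] for row in ramp]
right_half = [row[4:] for row in ramp]

correct = boundary_dissimilarity(left_half, right_half, "right")  # true layout
swapped = boundary_dissimilarity(left_half, right_half, "left")   # wrong layout
print(correct, swapped)
```

On this toy ramp the true placement gives the smaller value, which is the behavior a dissimilarity-based compatibility relies on: a lower dissimilarity indicates a more plausible neighbor.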