Review and Preview: Disocclusion by Inpainting for Image-based Rendering

Zinovi Tauber, Ze-Nian Li, Mark S. Drew

Abstract—Image-based rendering takes as input multiple images of an object and generates photorealistic images from novel viewpoints. This approach avoids explicitly modeling scenes by replacing the modeling phase with an object reconstruction phase. Reconstruction is achieved in two possible ways: recovering 3D point locations using multiview stereo techniques, or reasoning about the consistency of each voxel in a discretized object volume space. The most challenging problem for image-based reconstruction is the presence of occlusions. Occlusions make reconstruction ambiguous for object parts not visible in any input image, and these parts must be reconstructed in a visually acceptable way. This review presents our insights on image inpainting as a means to provide both attractive reconstruction and a framework for increasing the accuracy of depth recovery. Digital image inpainting refers to any method that fills in holes of arbitrary topology in images so that they seem to be part of the original image. Available methods are broadly classified as structural inpainting or textural inpainting. Structural inpainting reconstructs using prior assumptions and boundary conditions, while textural inpainting only considers available data from texture exemplars or other templates. Of particular interest is research on structural inpainting applied to 3D models, impressing its effectiveness for disocclusion.

Index Terms—Disocclusion, Image Inpainting, Image-based Rendering, Depth from Stereo, Volumetric Reconstruction, View-Dependent Rendering.

I. Introduction

A. Motivation

One of the main objectives of computer graphics is to produce realistic imagery. When scene realism is the main concern, the easiest way to achieve it is by photographing or filming the desired scene and using it on a computer. Objects in recorded scenes, special effects and computer imagery can be composited together easily using chroma-keying/blue-matting techniques. It is often desired to incorporate objects that would be difficult to acquire on film, due to either physical limitations or practical limitations such as the positioning of cameras. These objects can be modeled on a computer by an artist, but this process is very laborious and rarely achieves realistic imagery. Illumination, complex shapes, materials and interaction dynamics are all very hard to model in a realistic way. The term photorealistic rendering refers to a computer process that generates images indistinguishable from photographs. It is difficult to re-image an acquired scene into a new virtual camera in space. The two main problems are that the depths of the visible scene points are unknown, and worse, that nothing is known of the invisible points. These two problems are interrelated with object reconstruction in 3D, as we shall show. Disocclusion refers to the process of recovering scene information obstructed by visible points.

Realistic computer models can instead be obtained using 3D acquisition methods on existing objects or models (maquettes). Acquisition using images is versatile and results in more detail than other 3D acquisition methods. Unfortunately, image correspondences for multiview stereo matching are hard to accomplish reliably. Most stereo reconstruction approaches initially recover at least the camera pose parameters, so that epipolar geometry can be established between the reference view and any additional views. The epipolar constraint for a pixel in the reference view indicates that its line of sight (LOS) ray projects to the epipolar line in another view's image. Reviews of stereo correspondence and rectification formulations can be found in [66][67]. For dense matching, a disparity map is calculated for all pixels in the reference view by matching them to the pixels on the corresponding epipolar lines of the other views. There are many issues that complicate calculating a good match, the worst of which are occlusions. In the presence of occlusions some pixels will have no corresponding pixels at all, and pixels on a depth discontinuity boundary have mixed colors [1].

Image-based rendering techniques combine both vision and graphics processes in various interesting ways to reconstruct an object from multiple images and reproject it to a novel view. The ability of these methods to handle occlusions, despite many innovations, is insufficient and could benefit greatly from integration with principles from digital inpainting.

For any number of cameras, reconstruction might face a family of shapes to choose from [2], all projecting identically to all camera images due to occlusions. In order to estimate the occluded object sections we need prior knowledge or assumptions about the model. Disocclusion algorithms have been studied for purposes such as segmentation or stereo matching [3][4]. Recently, such techniques have taken on a new role: the restoration of images with occluded and damaged regions, called holes, where the location of these regions is known. Bertalmio et al. [5] have formulated the problem in terms of a diffusion of image structure elements from the hole boundary into the hole. This process was called digital image inpainting, a term borrowed from the arts, where it describes a restoration process for damaged paintings. Research in the image inpainting field focuses on improving assumptions for the connectivity of object structures in the hole, as well as on performing inpainting of texture, and even inpainting based on statistical/template data. Image inpainting has also been performed in a rudimentary fashion on surfaces in 3D that had holes due to occlusions.

Z. Tauber is with the Department of Computing Science in Simon Fraser University, British Columbia, Canada.
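The dense epipolar matching described above can be sketched for the simplest case of a rectified stereo pair, where epipolar lines coincide with image scanlines. The following is a hypothetical minimal SSD block matcher of our own (not the method of any surveyed paper); note that it suffers exactly the occlusion and mixed-pixel problems pointed out above:

```python
import numpy as np

def disparity_ssd(left, right, max_disp=16, win=3):
    """Brute-force disparity for a rectified stereo pair.

    For each pixel of the left (reference) image, search along the same
    scanline (the epipolar line after rectification) of the right image
    and keep the shift with the lowest sum-of-squared-differences cost.
    Illustrative only: no occlusion handling, no sub-pixel refinement.
    """
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=np.int32)
    L = np.pad(left.astype(np.float64), pad, mode="edge")
    R = np.pad(right.astype(np.float64), pad, mode="edge")
    for y in range(h):
        for x in range(w):
            patch = L[y:y + win, x:x + win]
            best, best_d = np.inf, 0
            # a left pixel at x can only appear at x - d in the right view
            for d in range(min(max_disp, x) + 1):
                cand = R[y:y + win, x - d:x - d + win]
                cost = np.sum((patch - cand) ** 2)
                if cost < best:
                    best, best_d = cost, d
            disp[y, x] = best_d
    return disp
```

On a synthetic pair where the right image is the left image shifted by a constant amount, the matcher recovers that shift for all interior pixels; real pairs fail precisely where occlusions leave pixels with no counterpart.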
We argue that inpainting methodology, extended to 3D in a framework where an object surface is the structure component, on which a 3D displacement map applies as the texture component, will enable rendering object sections that otherwise cannot be reconstructed. Here, 3D texture is defined as a 3D displacement map. Moreover, this framework can be used to remove noise and improve disparity matching in 3D reconstruction.

B. Overview of Surveyed Literature

Reconstruction tasks such as image restoration, object disocclusion, texture segmentation and synthesis, and surface extraction share some similar underlying principles. These problems admit a probabilistic model in which each possible state of each element is assigned some probability drawn from a random field, most commonly a Markov Random Field (MRF) [6][7]. Then, reconstructing an image is accomplished by finding the Maximum A-Posteriori (MAP) hypothesis. Gibbs fields can calculate the equivalent MRF probability and explicitly depend on an energy functional, which can be more easily minimized than the probability itself in order to find the MAP hypothesis. New energy functionals can be constructed that are driven by some envisioned process, rather than by explicitly modeling the likelihood of states.

Some early work on disocclusion was done by Nitzberg, Mumford and Shiota [4]. In their work they attempted to generate outlines of objects for image segmentation and depth extraction, so T-junctions were detected in an edge map, and corresponding T-junctions were connected using an energy functional minimizing edge length and curvature. Masnou and Morel [8] extended the idea to level sets of images. In this way, all the gray levels of the occluded object can be overpainted. Bertalmio et al. [5], in an inspiring paper, proposed digital image inpainting. Given a manually selected inpainting region Ω in an image, the objective of inpainting is to complete the image inside the hole in a reasonable way. Their initial technique was to propagate the gray levels at the hole boundary ∂Ω into the hole along isophotes (level lines), by formulating the inpainting as a variational problem and solving by diffusion. This diffusion coupled with anisotropic filtering was shown to have an interpretation as fluid transportation governed by the Navier-Stokes equations of fluid dynamics [9], which helps with the stability and speed of convergence.

Inpainting methods that involve more complicated energy functionals assume the Bounded Variation (BV) image model [8][10][11][12]. This model states that image level lines cannot oscillate infinitely, and immediately suggests a simple Total Variation (TV) inpainting technique, which tries to minimize the level-curve length. However, it is commonly accepted that for reasonable reconstruction the inpainting must consider the curvature as well. Ballester and Bertalmio et al. [10] have re-cast the inpainting process to consider curvature in a form similar to the Euler elastica, whereas Chan and Shen [12] have defined a new inpainting called Curvature Driven Diffusion (CDD), and in a later paper remarkably showed how the Euler elastica encapsulates both the CDD inpainting and the transportation inpainting [11]. Many additional types of inpainting were proposed subsequently, including textural inpainting [13][14][15][16], which relies on texture matching and replication, on global image statistics [17], or on template matching functionals [18]. Finally, there is also research on inpainting in 3D, by explicitly reconstructing surfaces [19] or by applying the inpainting suggested in [5] to generate a surface in a volume grid [20].

The earliest research on image-based rendering used image warping techniques to generate new views of a realistic or computer-generated scene between two existing close views, to speed up rendering time. Chen and Williams [21] extended the idea to 3D by calculating a linear warp field between corresponding 3D points of two scenes, and interpolating for views in between. Their research tries to deal with both holes and visibility ordering. The Lightfield [22] and Lumigraph [23] provide a more accurate and complete capability of viewing from any point in the sampled space. The space is sampled regularly, and a 4D lattice of images is created. Any single view direction corresponds to a 2D slice in the space, interpolated from nearest neighbors as necessary. Acquisition, data storage and access are the main concerns here. In Plenoptic Modeling, McMillan and Bishop [24] described a complete framework for generating new views from arbitrary viewpoints. First, cylindrical image samples are generated, and then a form of stereo matching is performed on the cylinders for dense correspondence. A new visibility ordering approach was introduced, called McMillan's depth ordering. This approach is widely used for reprojecting 3D points to a new view without depth sorting.

Image-based rendering methods can be divided into two types of approaches: those that are based on multiview stereo correspondences and generate a single depth map [24][25][26][1][27][28][29], and those that store multiple depths per pixel, such as volumetric reconstructions, and reason about voxel visibilities using consistency checks with the input images [30][2][31][32]. Inpainting techniques are more beneficial for global correspondence techniques of multiview stereo, but are valuable for all image-based rendering approaches. Some methods make improvements in quality and realism by using view-dependent optimizations. View-dependent texture mapping is performed by most methods that texture-map image information [27][25][33]. These methods emphasize the effectiveness of 2D texture when some 3D shape is known, and lead us to suggest inpainting of 3D structure and texture as a more complete and photorealistic solution.

In Section II we present the evolution of image restoration and inpainting towards a more unified variational problem. Other forms of inpainting methods are presented as well. Section III briefly presents some representative techniques in image-based rendering and argues for the necessity of the inpainting methodology. Section IV summarizes the current state of research and suggests future directions of research. Additional detail for most of the research covered in this article is available in [34].

II. Disocclusion and Digital Inpainting

A. Disocclusion and Image Restoration

The problems of disocclusion and boundary continuation can be viewed as a particular area of image restoration. Most work in image restoration focuses on de-noising, enhancing colors, and recovering isolated missing pixels. Yet approaches that disocclude objects have also been attempted. For example, a hierarchical two-level Gibbs Random Field (GRF) model can be applied as follows: the higher-level Gibbs process restores region boundaries in the image using some smoothness prior, while the lower-level MRF distribution fills each region with a synthesized texture estimated in the texture segmentation stage [6]. Still, the traditional formulation for filling in holes is limited in scope to regions where some information is known in close proximity, and so cannot handle most disocclusion problems.

Bilinear and bicubic interpolation can be used to interpolate holes, and more advanced interpolations are possible as well. Adaptive anisotropic filters can have an edge-preserving property (e.g. [35]). Single-pass filtering does not produce visibly convincing disocclusion; however, most inpainting schemes are applied by repeated application of boundary-preserving filters.

Modeling the image properly can help restore its data reliably. There are many ways to model the image, three of which are presented in [36]: I. Random Fields, most prominently MRFs, capture the statistical properties of some function u, such as the image function over the discrete image domain S. II. Physical Processes in a steady state, like fluid flow, can approximate image continuity. III. Function Spaces assume characteristics about the image energy function E(u). One of the most adopted function spaces in image inpainting is the Bounded Variation (BV) image space. A function u is in BV(S) if its total variation Radon measure E(u) = ∫_S |Du| is bounded, where the measure Du = ∇u when the gradient exists [37].

MRFs are a useful modeling tool for many applications in computer vision, from image restoration to stereo matching. According to the Hammersley-Clifford theorem, an MRF of a set of random variables U is a Gibbs Random Field (GRF) on S when the GRF takes the form

  p(u) = (1/Z) e^(−E(u)/T),   Z = Σ_{u∈U} e^(−E(u)/T),   (1)

where T is the temperature, responsible for the variability in the distribution. Z is called the partition function and normalizes the quantity p(u) to lie in [0, 1]. It is quite hard to compute Z, since the number of computed terms is combinatorial in the size of S. A number of approximations have been developed to solve the problem, although for most applications, trying to find the MAP estimate (for which Z is constant), it is enough to minimize the energy function E(u). More information about modeling with MRFs/GRFs is given in [6][7]. We can combine MRFs with MAP estimation according to Bayes' theorem to obtain the MAP-MRF hypothesis; the value u* that maximizes the posterior probability p(u|u₀) is

  u* = arg max_u p(u|u₀) = arg min_u E(u₀|u) + E(u),   (2)

where u is the ideal function we wish to recover and u₀ is the observed function.

As an example of MRFs in image reconstruction, we will consider the problem of interpolating the function u(x, y) on the image domain S, generally a finite Lipschitz domain in ℝ², from sparse data. This problem is equivalent to the restoration of a depth surface obtained from sparse noisy data in stereo matching. We will assume the observation model E(u₀|u) is given by Eq. 3, which assumes white (Gaussian) noise, u₀ = K∗u + n_G, where u is the ideal noise-free image function, u₀ is the observed noisy image function, K is a smoothing kernel, and n_G is a random value drawn from a Gaussian distribution with standard deviation σ (∗ denotes the convolution operation):

  E(u₀|u) = (1 / (2σ²|S|)) ∫_S (u₀ − K∗u)² dx dy.   (3)

Note that Gaussian noise has a specific Gibbs distribution pattern, and so can be used here directly. We design the prior model E(u) to maintain the continuity principle. We could choose to penalize larger first-order derivatives of u, but a better smoothness assumption requires the second-order derivatives to be small as well. Additionally, we would like to model piecewise discontinuity. We can do so by defining a function g with a penalty term γ for discontinuity such that

  lim_{η→∞} g(η) = γ < ∞.   (4)

It is also desirable for the function g to be symmetric, nondecreasing and differentiable. A possible choice is the truncated quadratic (used in the line-process MRF model for edge detection):

  g(η) = min{η², γ}.   (5)

Thus, we can write our prior model as

  E(u) = ∫_S [g(u_xx) + 2g(u_xy) + g(u_yy)] dx dy,   (6)

where the subscripts denote partial derivatives with respect to the variables. This model is called the weak plate. Without g in the formulation above, an image edge discontinuity would incur a very large penalty, and the minimization would prefer to connect adjacent patches.

The minimization of the posterior E(u|u₀) can be done by a gradient descent search.
However, such a function is not strictly convex, and a gradient descent method could get stuck in a local minimum. More advanced optimization techniques are possible, whether deterministic or stochastic. The interested reader is referred to variational calculus resources such as the text by Gelfand and Fomin [38].

B. Structural Inpainting

Digital image inpainting is a computer process inspired by the strokes artists use to cover up defects in paintings (e.g. [39][40]), guided by four observations reported by Bertalmio and Sapiro et al. [5]: (1) the global picture determines how to fill in the hole; (2) the contours arriving at the boundary of the hole are continued inside it; (3) the different regions inside the hole are assigned colors similar to their colors at the boundary; (4) the small details are painted in. In inpainting problems the hole is assumed given.

Formally, we define Ω ⊂ S to be an open bounded subset of S with Lipschitz continuous boundary, where S is a square in ℝ². Ω is called the hole in the image S, and it terminates at the border ∂Ω. We also define Ω̄ as the closure of Ω and Ω̃ as another open region in S such that Ω̄ ⊂ Ω̃. The boundary band is defined as B = Ω̃ \ Ω̄; it comprises the pixels surrounding the hole region from which we are going to diffuse information into the hole region.

Inpainting approaches use either isotropic diffusion or more complex anisotropic diffusion (a few algorithmic approaches use neither [41][42]). Isotropic diffusion blends all information in Ω and cannot preserve image structures, so it requires a much smaller inpainting domain. In [43], Oliveira et al. proposed a diffusion process involving repeated convolution of Ω̃ with small 3 × 3 isotropic smoothing kernels. The main advantage of such an approach over anisotropic diffusion is the high speed of calculation. Another common diffusion method is based on Euler's energy functional,

  E(u) = ∫_{x∈Ω} |∇u|² dx.   (7)

As we have seen in the discussion of MRFs, this is just a prior on surface smoothness, although discontinuities are not modeled here. The solution minimizing such a cost function is given by a gradient descent search:

  u⁰|_Ω = some initial value, random or a simple interpolation;
  u^{n+1} = u^n + λ ∂u^n/∂t,   (8)

where λ is an acceleration (or velocity) parameter to speed up convergence of the diffusion, and superscripts indicate the time step. In the discrete case we can approximate the energy by E(u) = Σ_{x∈Ω} (u(x+1) − u(x))², which has the following partial derivatives:

  ∂u(x)/∂t = −∂E(u)/∂u(x) = 2(u(x−1) − 2u(x) + u(x+1)),   (9)

which is the discretized version of the Laplacian, as expected [44]. Initialization to some reasonable values in the hole might speed up the convergence.

While in general isotropic diffusion is noticeably imperfect for image intensities, it makes more sense for orientation diffusion, because the human visual system is sensitive to contrast and sharpness changes, but less so to curvature changes, which are also much less frequent. A more complex inpainting based on the Euler energy is found in [45], the energy defined on a curve is presented in [46], and in [47] it is adapted for highlight removal. We can similarly attempt to apply orientation diffusion, but we cannot apply it naively due to the modularity in measuring the angles. Perona has reported thoroughly on how to apply orientation diffusion in images [44]. He defines a new energy function motivated by various physical forces, which can deal with the ambiguity resulting from angle subtraction. Jiang et al. [48] proposed to extend the idea of orientation diffusion to inpainting, especially for DCT blocks missing during image transmission. Contrast changes inside the hole cannot be approximated in this way. It is also possible to learn diffusion filters, as is done in [49] within an MRF framework.

Masnou and Morel [8] presented disocclusion using level lines of larger objects in images using variational formulations and level sets, yet still solved the problem using traditional vision techniques. The original variational continuation formulation was done earlier by Nitzberg, Mumford and Shiota [4]; however, it was done for image segmentation and was based on T-junctions in the edge map, which are generally few and unreliable in a natural image. Masnou and Morel extended the idea to level lines, which bound upper level sets defined at each gray level λ by

  X_λ u = {x | x ∈ S, u(x) ≥ λ}.   (10)

The insight here is to continue each level curve into the hole left by the occluding object, rejoining the same level curve outside the hole.

Let us define more precisely the BV image model assumption that is commonly adopted. Once again define S as the image domain, and u ∈ L¹(S) as a function over the domain. If u's partial derivatives at each site are measured with finite total variation in S, then u is a function of bounded variation. The class of bounded variation functions on the domain S is denoted by BV(S) [37][50][10]. We can define the total variation (TV) Radon seminorm as the energy

  E_TV(u) = ∫_S |Du| dx = sup { ∫_S u ∇·ϕ dx | ϕ ∈ C₀^∞(S), |ϕ| ≤ 1 }.   (11)

The energy E_TV is minimized in the TV inpainting scheme [51][52]. A set X ⊂ S has a finite perimeter if its characteristic function χ_X ∈ BV(S) (χ_X = 1 when x ∈ X and χ_X = 0 otherwise). Furthermore, the boundary length of the set X, when it is Lipschitz continuous, is given by Perim(X) = E_TV(χ_X). The TV inpainting is restated using level sets in the Coarea formula [36][50] as

  E_TV(u) = ∫_{−∞}^{∞} Perim(X_λ u) dλ,   (12)

which shows that total variation inpainting is equivalent to minimizing the length of the level lines, and results in piecewise straight level lines in the inpainting domain.

Fig. 1. Synthetic image, where Ω is the white region, and the inpainting result.

Masnou and Morel propose to minimize level lines using the functional

  E(u) = ∫_S |Du| (1 + |∇·(Du/|Du|)|) dx,   (13)

where, as usual, u|_{S\Ω} = u₀|_{S\Ω}, the observed function outside the hole. Note that the divergence ∇·(Du/|Du|) is the curvature κ of u. This energy can be interpreted as driving the minimization of the length of the level lines (Du) and of the total variation of their angle. Adding 1 is necessary so as not to discount the length when there is little change in the orientation of the line. This is a particular form of the Euler elastica discussed later on. In order to minimize this equation, the T-junctions of each level line with the boundary set Ω are first detected. A dynamic programming approach can then pair up all T-junctions in a way that minimizes the cost function in Eq. 13.

Bertalmio and Sapiro et al. have proposed inpainting using the mechanism of PDEs and diffusion [5]. The inpainting smoothly propagates the image information along the level-line directions (isophotes) from outside to inside the hole. The isophote direction is normal to the gradient ∇u and is denoted by ∇⊥u; it is the direction of least change in gray values. The image information propagated is an image smoothness measure given by the Laplacian

  L(u) = u_xx + u_yy.   (14)

While the energy equation is not explicitly given in this work, the propagation of the change in image smoothness along level lines is given by the PDE

  ∂u/∂t = ∇L(u) · ∇⊥u.   (15)

This evolution only applies to u(x) when x ∈ Ω, with the boundary conditions given by B. We can easily see that at a steady state, when ∂u/∂t = 0, the direction of largest information change is perpendicular to the isophotes, as required. The implementation requires numerical stability considerations, as discussed in [53].

It is important to notice from Eq. 15 that the image smoothness will continue along isophote directions in a straight line until a conflict occurs. Therefore, it is proposed that after every few iterations of inpainting there are a few iterations of an anisotropic diffusion that is supposed to preserve sharpness. Figure 1 shows an artificial example of two objects and an inpainting domain; it is shown that this method can inpaint preserving shape and (nearly) sharpness. However, it is possible to see a loss of sharpness in the inpainting regions. Additionally, these structure diffusion techniques have a general fault with inpainting textured objects, which has been addressed with limited success by textural inpainting. It follows that 3D textural inpainting is required for 3D objects as well, although both structure and texture definitions in 3D involve recovering point locations, which is not necessary in 2D.

In their follow-up paper [9], Bertalmio, Bertozzi and Sapiro restated the inpainting proposed in Eq. 15 using an analogy to fluid dynamics. The inpainting diffusion can be considered as a transport equation that convects the image intensity along level curves of the smoothness. If we define the velocity field v = ∇⊥u, then u is convected by the velocity field v. The image model is now formulated as incompressible fluid flow governed by the Navier-Stokes equations, where we can find a stream function Ψ such that ∇⊥Ψ = v. Additionally, the vorticity is given by ω = ∇ × v, which in 2D is a scalar equal to the Laplacian of the stream function, L(Ψ). At the steady state we can describe the vorticity in the absence of viscosity as

  ∇⊥Ψ · ∇L(Ψ) = 0.   (16)

The equivalence to the image diffusion process is immediately seen. The stream function Ψ is equivalent to the image function u, the fluid velocity v = ∇⊥Ψ is the isophote direction ∇⊥u, and the vorticity ω = L(Ψ) is the same as the image smoothness L(u). The viscosity ν is the anisotropic diffusion factor. We obtain a vorticity transport equation for the image,

  ω_t + v·∇ω = ν ∇·(g(|∇ω|) ∇ω).   (17)

The inpainting proceeds in a similar manner to the previous method. The results are also similar to those previously reported, yet the running time is significantly reduced due to faster convergence and better optimization using established methods in fluid dynamics. In [54][55] a similar style of inpainting is suggested, generalized to a common vector-valued diffusion functional based on the trace of the product of the image Hessian and an anisotropic tensor, which can take the form of an oriented Laplacian.

More extensive formulations have been developed using higher-order PDEs and the Bounded Variation image model. Let the boundary data be u₀ ∈ L^∞(∂Ω), and let θ be the gradient direction vector field. A variational problem can be formulated requiring the gradient field to be in the direction of θ after separate diffusions of both u and θ [10]. A relaxation of this condition can be expressed as

  θ · ∇u = |∇u|,   |θ| ≤ 1.   (18)

Ideally θ = ∇u/|∇u|, yet at points where the gradient is zero, θ is null as well.

Fig. 2. Progress frames in the process of inpainting red text.

We can define a functional that helps enforce such a constraint:

  F(u) = ∫_Ω |∇u| − ∫_Ω θ·∇u.   (19)

After some manipulation we get the equivalent minimization of the energy function

  E(u) = ∫_Ω |∇u| + ∫_Ω (∇·θ) u,   (20)

with the admissible class of functions A(u) = {u ∈ BV(Ω) | |u(x)| ≤ ‖u₀‖_∞, u|_B = u₀|_B}. The existence of a function u for such a minimization is provable. The minimization of the energy in Eq. 20 can be done by a gradient descent search, solving the Euler-Lagrange equation for u and giving rise to the PDE

  ∂u/∂t = ∇·(∇u/|∇u|) − ∇·θ.   (21)

This solution can reconstruct the structure inside the hole. However, discontinuities in gray levels are not modeled, and worse, the vector field θ is unknown. Therefore, we have to additionally impose a prior on the propagation of gray levels inside Ω.

The Euler elastica curve is the equilibrium curve of the elasticity energy

  ε(C) = ∫_C (a + bκ²) ds,   (22)

where C is the curve of integration, ds is the arc length element, and κ(s) is the Euclidean curvature [11]. The Euler elastica model imposes regularity on the smoothness of the curves allowed. The energy minimization of such a curve is the ML estimation of a geometrical walk where the rotation at each step depends on the step size, the step sizes being exponential i.i.d. random variables.

Instead of connecting level curves individually, the elastica is reformulated on the domain Ω̃, similarly to the Coarea formula. For a level curve u = λ we have dλ/dθ = |∇u|. Then the integral translates as

  ∫₀¹ ∫_{C(u=λ)} (a + bκ²) ds dλ = ∫₀¹ ∫_{C(u=λ)} (a + bκ²) ds |∇u| dθ = ∫_S (a + bκ²) |∇u| dx,   (23)

since ds is orthogonal to dθ. This translation can inpaint the entire domain. Relaxing the definition of the elastica as formulated above, we can write a functional of the following form:

  E(u) = ∫_Ω̃ |∇u| (a + b|∇·θ|^p) dx,   (24)

where p ≥ 1 gives weight to the curvature and describes a more general p-elastica model. As in the Masnou and Morel approach, this model attempts to minimize the length plus the angular variation of level sets. If we wanted to connect curves farther away using the curvature, then b/a would be large. If, however, a = 0, then the reconstruction might be ambiguous when the curvature is zero. When b = 0 this is the TV inpainting. Chan and Shen [11] provide further analysis of the Euler elastica. They suggest that for p-elastica optimization, if p ≥ 3 then the model blows up (i.e. E(u) → ∞) if the function u has stationary points in Ω. They also encourage p > 1 to restrict sharp curvatures.

Chan and Shen further allow a relaxation of the boundary constraint by including it in the functional as follows¹:

  E(u, c) = ∫_Ω̃ (a + bκ²) |∇u| dx + (c/2) ∫_S (u − u₀)² dx.   (25)

The first term is the prior model in the Bayesian view, and the second is the data model based on the observed image values. From the Euler-Lagrange derivation it is seen that two orthogonal processes form this inpainting. One encapsulates the transport diffusion of Eq. 15, while the other is an application of Curvature Driven Diffusion (CDD) [12]. Hence the Euler elastica model remarkably contains both the transport and diffusion inpaintings suggested, and can be thought of as a more general model superseding previous inpaintings.

¹In the following, the curvature is replaced by a diffused curvature κ, called the weak curvature, for which the authors proved some equivalence conditions.

Results using this type of inpainting are shown here for a portion of the Lena image in Figure 3. Fig. 3b shows the inpainting done on the gray levels, while 3c, 3d and 3e show a particular level set, with inpainting for p = 1 in 3d and p = 2 in 3e. It can be seen that for p = 2 the curvature is not penalized highly enough, so the result for p = 1 seems smoother.
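The p-elastica energy of Eq. 24 can be evaluated numerically with finite differences, with b = 0 recovering the TV energy of Eq. 11. The following is our own illustrative discretization, not code from any surveyed paper; ε regularizes the division where ∇u vanishes:

```python
import numpy as np

def p_elastica_energy(u, a=1.0, b=1.0, p=2, eps=1e-8):
    """Discretize E(u) = sum |grad u| * (a + b * |div(grad u / |grad u|)|^p).

    Finite-difference evaluation of the p-elastica energy (Eq. 24);
    with b = 0 it reduces to the discrete TV energy (Eq. 11).
    """
    uy, ux = np.gradient(u)                    # derivatives along rows, cols
    mag = np.sqrt(ux ** 2 + uy ** 2)           # |grad u|
    nx, ny = ux / (mag + eps), uy / (mag + eps)
    # curvature kappa = divergence of the unit gradient field theta
    kappa = np.gradient(nx, axis=1) + np.gradient(ny, axis=0)
    return np.sum(mag * (a + b * np.abs(kappa) ** p))
```

A constant image has zero energy, and a linear ramp, whose level lines are straight, pays only the length (TV) term regardless of b, matching the interpretation of the curvature term as penalizing bent level lines.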

Fig. 3. Inpainting of a hole. (a) Original Lena image with hole. (b) Inpainting of the hole with p = 1. (c) The hole on one level set. (d) Inpainting with p = 1. (e) Inpainting with p = 2.

We know from MRFs that modeling the statistics of the energy function is common practice. Likewise, there are methods that utilize global information about the image to further assist in inpainting the hole.

Levin et al. [17] suggest that global statistics about the image be estimated from outside the hole area, and that the hole be inpainted with probability-maximizing strokes. The approach taken by the paper is to extract some relevant statistics about the image, form a table of their probabilities, and then combine them in an MRF framework. A max-product belief propagation algorithm, which is a message-passing algorithm, is used for the optimization. Since there are loops in the MRF formulation, belief propagation gives only a local maximum in a large neighborhood [56].

A substantially different approach assumes that we have the object to be inpainted in another image. Kang et al. describe such a template-based inpainting approach [18]. The method has three stages: (1) finding and matching landmark points; (2) calculating the warp between the template and the current object; (3) inpainting the object.

Landmarks are any feature points p that can be extracted from the images. Given the set of feature points p_i of the original image and q_i of the template image, each p_i is assigned a point q_i that minimizes some distance measure. Since the inpainting region Ω covers the object of interest, some feature points cannot be detected. The matching needs most features to be detected, so it is preferable to extract fewer features in the first place.

After finding the corresponding landmarks, the interpolation proceeds according to a thin-plate spline, which minimizes the bending energy of u: E = ∫_S (u_xx² + 2u_xy² + u_yy²) dx dy. Solving the minimization is equivalent to interpolating with biharmonic radial basis functions such as K(r) = r² log r². Then, the warp U(x) from the given image to the template is given by

   U(x) = Ax + t + Σ_{i=1}^n w_i K(|x − p_i|),   (26)

where n is the number of corresponding landmarks.

Finally, to minimize intensity discontinuities, an inpainting approach is used to derive an optimal image u from the given image u_0 with the inpainting domain and the template u_1. The energy functional for the TV inpainting is formulated as

   a_1 ∫_S |∇u| + a_2 ∫_{S\Ω} |u − u_0| + a_3 ∫_Ω |u − u_1(x − U(x))|.   (27)

The first term is the TV inpainting on the entire image domain, the second is the data term that enforces the hole boundary conditions, and the third drives a smooth thin-plate-spline interpolation toward the warped template. A unique insight is to optimize the copy of the template for the image instead of naively copying it. One result of running this algorithm is shown in Figure 4, which shows the recovered landmarks for the girl in the middle in two different photographs, and a correction of the first photograph to make the girl smile, using the whole face as the inpainting domain.

Fig. 4. Two pictures with the same people in (a) and (b). (c) The face of the middle girl in (a) was inpainted with the template image (b). (d) Landmarks for the middle girl in image (a). (e) Landmarks for the middle girl in image (b).

A larger number of global information-based techniques concern textural inpainting, and many more texture replication. A texture can be generated by copying from examples or procedurally from statistics, though procedural methods are hard to apply to inpainting. The most popular approach for inpainting is exemplar-based synthesis.

Criminisi et al. [14] have proposed a patch exemplar-based inpainting where the inpainting order depends on the structure in the image. Exemplar-based texture synthesis takes windows Ψ(i) centered at a pixel i, computes the Sum of Squared Differences (SSD) match d with all available sample texels, and copies the best-matching patch [57]. Their insight is that the "inpainting front" should propagate along linear isophotes first, and only secondarily to all the other pixels in the hole, thus making the inpainting reasonable regardless of the boundary shape of the inpainting region. In contrast, a concentric filling-in of texture exemplars is sensitive to the hole boundary.

Initially, pixels in B are assigned the confidence value C = 1 and pixels in Ω are assigned C = 0. The confidence C(i) and data D(i) terms are defined as

   C(i) = ( Σ_{j∈Ψ(i)\Ω} C(j) ) / Area(Ψ),   D(i) = |∇^⊥u(i) · N_∂Ω(i)| / γ,   (28)

where N_∂Ω(i) is the normal to the front at i. The area of Ψ is known from the patch size, and γ is a normalizing factor for intensities. The confidence term states that the more pixels already inpainted in the patch, and the higher the confidence of each, the higher the confidence in the matching of the patch. The data term models the strength of the isophotes hitting the front ∂Ω: the stronger this flow is, the higher the priority given to this patch to be duplicated first. Finally, the priority is defined for all pixels i in ∂Ω as P(i) = C(i)D(i). The boundary pixels are then sorted according to the priority, and the highest-priority pixel is processed first. After a matching patch has been copied, the confidence value of the patch is applied to all copied pixels, and the front ∂Ω is recalculated along with the priorities of affected pixels on it. Figure 5 shows an image inpainted first along stronger edges, and then in smoother regions. This approach achieves very convincing visual concealment if discontinuities can be avoided. A similar formulation has been used for inpainting in videos as well [16].

Fig. 5. Progressive stages in the texture inpainting of the image in (a).
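The priority terms of Eq. 28 are simple to prototype. The toy sketch below is our own illustration, not Criminisi et al.'s code: it estimates the front normal from the mask gradient, sums confidence over the known part of each patch, and takes γ = 255 for 8-bit intensities.

```python
import numpy as np

def priorities(u, mask, confidence, patch=9, gamma=255.0):
    """Toy version of Eq. 28: P(i) = C(i) * D(i) on the fill front.

    u          : gray-level image (float array)
    mask       : boolean array, True inside the hole Omega
    confidence : current C values (1 in the known region, 0 in the hole)
    """
    half = patch // 2
    known = ~mask
    # fill front: hole pixels with at least one known 4-neighbour
    front = mask & (np.roll(known, 1, 0) | np.roll(known, -1, 0) |
                    np.roll(known, 1, 1) | np.roll(known, -1, 1))
    gy, gx = np.gradient(u)                   # image gradient
    ny, nx = np.gradient(mask.astype(float))  # ~ normal to the front
    P = {}
    for i, j in zip(*np.nonzero(front)):
        win = (slice(max(0, i - half), i + half + 1),
               slice(max(0, j - half), j + half + 1))
        C = confidence[win][~mask[win]].sum() / (patch * patch)  # confidence term
        isophote = np.array([-gy[i, j], gx[i, j]])               # rotated gradient
        normal = np.array([ny[i, j], nx[i, j]])
        D = abs(isophote @ normal) / gamma                       # data term
        P[(i, j)] = C * D
    return P

u = 10.0 * np.tile(np.arange(16.0), (16, 1))          # vertical isophotes
mask = np.zeros((16, 16), dtype=bool); mask[4:8, 4:8] = True
conf = (~mask).astype(float)
P = priorities(u, mask, conf)
best = max(P, key=P.get)                               # pixel to fill first
```

With these priorities, the front pixel where strong isophotes hit the hole boundary head-on is filled first, exactly the ordering behaviour described above.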

D. Structural and Textural Inpainting

Criminisi et al. [14] tried to incorporate structure into textural inpainting using a very insightful principle, where texture is inpainted in the isophote direction according to its strength; this, however, was limited to linear structures, often resulting in discontinuities where textures meet. We notice that structural inpainting tries to enforce a smoothness prior yet preserve edges or isophotes. We can then define the texture image from the decomposition u = u_s + u_t, where u_s is the structure component and u_t is the texture component [58][13][15]. This is a redefinition of the texture of the previous subsection, which was actual pixel colors. See Figure 6, top row, for sample results of such a decomposition. The top right image is referred to as a texture image.

Bertalmio et al. have extended their inpainting to texture and structure [13] according to the decomposition above. In order to get the texture outside the hole they minimize the functional

   E(u_s) = ∫_S |∇u_s| + ν ‖u − u_s‖.   (29)

The first term in the energy restricts the function u_s to be in BV(S), and tries to get the TV ML function u_s, while the second term tries to minimize the residual, which is the texture image u_t. The balance is delicate and depends on the constant ν. It is shown that if ν is small then the decoupling of texture and structure is presentable. The norm of the texture term can be defined in a new space of oscillatory functions, which in some sense can be considered the dual of the BV space.

The benefit of separating texture from structure is the ability to inpaint each sub-image separately. For textural inpainting, an exemplar-based method is used that copies pixels rather than patches; inpainting proceeds from the hole boundary to the center of the hole Ω, with pixel values assigned only once [59]. For each hole pixel i ∈ u_t, compute the distance measure d from the neighborhood of i to each possible boundary neighborhood. The distance measure is the SSD normalized by the number of available pixels in the neighborhood of i. The color of i is inpainted with any boundary pixel whose d is less than a threshold. Note that such a texture construction will not produce a unique solution, and textures may not reproduce the sample characteristics faithfully. One solution to avoid neighborhood dependency cycles, suggested by Wei et al. [60], is to reference lower resolutions and previous iterations of the synthesis; unfortunately, it is hard to see how it could be utilized for inpainting. Another approach that can alleviate the problem is to replicate patches instead of single pixels. This, however, can introduce texture discontinuities, as noted previously.

For the subimage u_s, inpainting proceeds as in [5], given in Eq. 15, although any other structural inpainting is applicable too.

The results given show promising improvements over non-textural inpainting. Shown here are two results. Figure 6 shows a section of a textured image and its decomposition into texture and structure, the inpaintings of both u_s and u_t, and the recomposition of u. The approach does a good job of continuing simple structures, yet over-smooths a bit at the elbow. The second result, in Fig. 7, shows a comparison of the proposed method with the results of non-textural inpainting or texture synthesis alone. It is evident that this approach yields a more realistic reconstruction. Still, the over-smoothing in this mixed inpainting is easily detectable, and suggests an inadequate separation of structure from texture. This seems to indicate that the problem of separating structure from texture is ill-posed, since arbitrary thresholds separate texture from structure edges.

Fig. 6. Top left: original image. Top center and right: structure and texture decomposition. Bottom left: inpainting recomposition. Bottom center and right: inpainting of structure and texture images.

Fig. 7. From left to right: the original image with the hole, inpainting using texture and structure, using texture alone, and using non-texture inpainting.

Grossauer [15] suggested a more extensive structural and textural inpainting framework, where first the image is filtered using a curvature-minimizing diffusion to generate u_s. Then u_s is inpainted using a complex Ginzburg-Landau equation, which has a diffusion-reaction form where the reaction is inversely proportional to edge width, and is post-processed to clean the edges. The structure image is then segmented using a scalar Ginzburg-Landau equation; the diffusion is constructed to have an effect like region growing that depends on the image gradient, so that edges separate regions. The texture is then inpainted using the tree-based exemplar replication algorithm of Wei and Levoy, as in [60]. The exemplars must be taken from the same region as the pixel to be inpainted; in this way, the segmentation helps avoid replicating mixed texture blocks. Also, segmentation of a structure image is easier than segmentation of a natural image. This algorithm improves upon structural and textural inpainting, but the same problems of separating texture and structure still prevail.

In another segmentation approach, Jia et al. [61] proposed texture inpainting where the image is first texture-segmented, and then region boundary curves (considered here as structure) are continued into the hole using tensor voting and B-splines. The texture is then inpainted in each region using ND tensor voting, for a neighborhood of size N, with adaptive scaling. Texture tensor voting turns out to be a very attractive method for maintaining curvature; unfortunately, the method does not perform well on complex structures, and segmentation is also quite hard to achieve correctly. However, the need for segmentation and continuation of regions generalizes into 3D as the need to continue the surface separately from texture/3D detail, with subsequent inpainting of the 3D detail and color.
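One can experiment with the decomposition u = u_s + u_t using gradient descent on a simplified version of Eq. 29 in which the residual norm is replaced by a plain L2 fidelity (an ROF-style simplification; the oscillatory-function norm of [58][13] is more involved). The following is a hypothetical sketch, not the authors' code:

```python
import numpy as np

def tv_structure(u, nu=0.1, steps=200, dt=0.2, eps=1e-6):
    """Gradient descent on  E(us) = sum |grad us| + (nu/2)*||u - us||^2.

    Returns the structure image us; the texture image is ut = u - us.
    (L2 fidelity used here as a stand-in for the norm in Eq. 29.)
    """
    us = u.copy()
    for _ in range(steps):
        uy, ux = np.gradient(us)
        mag = np.sqrt(ux**2 + uy**2 + eps)
        # curvature term div(grad us / |grad us|) drives TV minimization
        div = np.gradient(ux / mag, axis=1) + np.gradient(uy / mag, axis=0)
        us += dt * (div + nu * (u - us))
    return us

rng = np.random.default_rng(0)
cartoon = np.zeros((32, 32)); cartoon[:, 16:] = 1.0      # piecewise-constant structure
u = cartoon + 0.1 * rng.standard_normal((32, 32))        # plus an oscillatory part
us = tv_structure(u)
ut = u - us                                              # texture/residual image
```

The edge of the cartoon survives in u_s while the oscillatory residual lands in u_t, which is the property the separate structural/textural inpainting passes rely on.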

E. Surface Inpainting

Inpainting is potentially a very important disocclusion step for image-based rendering if we can do it in 3D. It is important to realize that for photorealistic rendering we are not interested in volume visualization, only in rendering the visible surface. In 3D we can represent the data as a volume and use inpainting similar to the 2D case for structural inpainting. While it is possible to generalize the inpainting from 2D to 3D by inpainting over another dimension, that allows diffusion of the surface into empty space, and involves higher computational complexity than inpainting only the surface of an object.

There are numerous methods for shape reconstruction from 3D points (variational and otherwise), based on implicit surfaces (e.g. [62][63]) or explicit surfaces (e.g. [64]), which implicitly fill in holes as well. These methods do not attempt to reconstruct the real surface shape in the hole; instead the reconstruction takes the shape of the initial bounding surface. There are also methods which attempt to explicitly recover missing surfaces by fitting parameterized curves [19] to boundary data. Since explicit surfaces are fairly constrained, variational approaches are more attractive.

Verdera et al. [20] have extended their image inpainting model to 3D using the diffusion equation Eq. 15. They redefine the 3D inpainting domain for surface reconstruction as follows. Let A ⊂ R³ be a set representative of the volume spanned by the surface; the observed function u_0 is then defined outside the hole as u_0 = χ_A. Let us define a box Q around the inpainting region. This box contains both the inpainting region and enough of u_0 to inpaint from. We can define the boundary B = Q \ Ω̄ (recall that Ω̄ is the closure of the hole Ω). If we construct the smallest possible ball containing the inpainting region (the gap), then we can add part of the ball's surface to cover the gap in the surface u_0 in Q. This is the initial value in the inpainting region. From here on the equivalence to 2D is established and the inpainting can proceed in a similar manner. Notice that the equivalence is between gray-scale values in 2D and pixel locations in 3D. We argue that this equivalence should extend to texture as well, with 3D texture defined properly as pixel displacements. The 2D problem is harder, since texture shading and color are combined in a single pixel.

A few results of this method, inpainting occluded parts of a laser scan of Michelangelo's David, are demonstrated in Figure 8. The reconstruction seems quite successful. However, it should be noted that the regions are fairly small, and, as partially evident in the images here, in 3D the problem is confounded when the surfaces have much larger curvature. This result should demonstrate how effective it could be to have inpainting for depth recovery and novel view generation in image-based rendering, in particular when more advanced inpainting approaches geared for such a task become available.

Another interesting instance of inpainting in combination with surface reconstruction is given in [65]. In their research, they attempt to reconstruct the 3D surface of all objects in the scene, given a single 2D image. The reconstruction is based on a Bayesian framework that involves prior knowledge of the types of objects and other statistical principles, like similarity of angles for polyhedral objects and mirror symmetry. The algorithm starts with a segmentation and primal sketch of the scene, after which it attempts to reconstruct the 3D structure of objects so that their projection to the image is preserved. The objects are reconstructed so as to maximize their probability given all the statistical principles guiding possible configurations. Thus, the reconstruction might need to add vertices, faces or paths to define the most probable 3D shape. This approach is limited in its application domain, and it cannot produce highly realistic images, yet it does provide many principles that inpainting could follow, and in fact seems to indicate what textural inpainting can do in 3D.

III. Inpainting in Image-based Rendering

Image-based rendering is a class of rendering approaches in which the geometrical modeling phase of traditional rendering is replaced by an image acquisition process; as a result, the rendering phase might vary significantly from established rendering approaches. Initially, image-based rendering was primarily used to speed up rendering by tweening rendered frames. Today, a more important use is to acquire realistic objects that are very complex to model on a computer. This section presents a very short survey of image-based rendering, emphasizing the necessity of incorporating inpainting methodology.

A. View Interpolation

The earliest techniques in image-based rendering were based on warping and morphing principles. View interpolation techniques take images as input, and use them to create novel views directly by warping, without reconstruction first [21][22][23]. In contrast, newer techniques involve 3D reasoning, with a depth-ordered warping frequently used for reprojection.

One of the earliest 3D view interpolations was suggested by Chen and Williams [21]. In their approach it is assumed that all images have associated range data, and that the camera location in scene coordinates is known. For each pair of adjacent viewpoints, an image morph map is established in scene coordinates by taking the displacement vectors of the points in the first view transformed into the second view's coordinate system. Then, to render a novel view from a viewpoint on the line between the two viewpoints, the first view's pixels are linearly interpolated along the morph map. This warping produces the correct parallax only in two specific cases: when the camera moves parallel to the image plane, or when the camera moves in a direction perpendicular to the image plane, in which case more work needs to be done.

Special considerations for visibility ordering are needed. Image pixels are sorted according to their depth and projected in back-to-front order to the novel view. The visibility problems that emerge are many pixels contracting into one, and pixels expanding, leaving gaps. To resolve the ambiguity in pixel contraction, the pixels may need to be re-sorted only when the view change is larger than π/2. Pixel expansion, however, is a disocclusion problem. The filling-in algorithm suggested was to interpolate the four corners of the pixels and then interpolate their color in the rendered image. A different method is actually used which is faster but less accurate: it involves painting the image with a reserved color and over-painting it with the warped image. Gaps are detected by checking for the reserved color in the image, and are interpolated using their neighboring pixels. This disocclusion problem occurs in many other image-based rendering techniques as well, including some that are based on volumetric reconstruction, and is normally solved in the way described above. Even trivially applying an inpainting algorithm to the images generated with holes would already result in much improved rendering.

B. Multi-view Stereo Models

Stereo matching-based techniques attempt to discover the 3D structure of the scene using point correspondences from multiple views of the scene. These techniques are extended and adapted for use in image-based rendering. The basic approach of stereo matching is to calculate epipolar geometry using sparse point correspondences, and then use it to restrict the search space to a line for recovering a depth image by dense matching [66].

A similar disocclusion problem due to pixel expansion arises in Plenoptic modeling [24], which generates novel views using warp flow-fields. Using a form of multiview stereo matching, Plenoptic samples are projected onto cylindrical manifolds. Then cylindrical epipolar geometry and dense correspondence are established between any two neighboring sample cylinders in space.
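The contraction/expansion visibility problems described for view interpolation are easy to reproduce on a single scanline: forward-warp with a back-to-front painter's pass and mark untouched output pixels with a reserved color, echoing Chen and Williams' faster gap-detection variant. This miniature is our own toy model (parallax is simply taken inversely proportional to depth):

```python
import numpy as np

HOLE = -1.0  # reserved "color" marking disoccluded pixels

def forward_warp_1d(colors, depth, shift_per_depth):
    """Warp a 1-D scanline by a depth-dependent horizontal shift.

    Pixels are splatted in back-to-front (far-to-near) order so nearer
    pixels overwrite farther ones; unpainted outputs keep the reserved
    color and would be handed to an inpainting/interpolation pass.
    """
    out = np.full(colors.shape, HOLE)
    order = np.argsort(depth)[::-1]                      # far first
    for i in order:
        j = i + int(round(shift_per_depth / depth[i]))   # parallax ~ 1/depth
        if 0 <= j < out.size:
            out[j] = colors[i]
    return out

colors = np.arange(8, dtype=float)
depth = np.array([4., 4., 4., 1., 1., 4., 4., 4.])   # a near object in the middle
warped = forward_warp_1d(colors, depth, shift_per_depth=4.0)
# near pixels shift 4 columns, far ones 1; holes open behind the near object
print(warped)
```

The reserved-color entries are exactly the disocclusion gaps the text describes: background that was hidden by the near object in the source view and has no source pixel in the novel view.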

Fig. 8. The original mesh model is converted to a volume, inpainted, and then triangulated again and shaded. Shown are the mesh before and after inpainting.

Here, the McMillan depth ordering algorithm is first presented for warping a source cylinder to the novel viewpoint in occlusion-compatible order without depth sorting. This technique can be proven to apply to any shape of projection manifold, and is commonly used for image-based renderings that involve reprojections. This warping produces the same occlusion tears in space, and inpainting is again necessary.

We can think of multiview stereo matching as the construction of a generalized disparity space in a common coordinate frame into which all other views are transformed [26]. Finding the depth for a pixel consists of finding the best disparity value given the neighborhood and all available images (taking disparity to mean inverse depth). Generally, dense matching has the following steps: (1) for each disparity value a pixel can take, calculate its support from all available images using some matching cost such as SSD or variance; (2) aggregate support from neighbors in disparity space, e.g., by diffusion; (3) find the best-scoring disparity value at each pixel. Steps (1) and (2) are often combined for a window-based stereo matching approach; window matching tends to smooth pixels at depth discontinuities. Steps (2) and (3) can be combined if we use a global optimization that also tries to model the surface (note that gradient-descent optimization does not require knowing all the probabilities, which eliminates the need to calculate support for every disparity value for all pixels). We have seen how to use an MRF with a robust prior model for piecewise continuous surfaces in Eq. 6 (recall the duality between reconstructing surfaces and holes). Additionally, we have seen that inpainting can reconstruct structures and holes better by using intuitive anisotropic functionals. Therefore, a straightforward application of inpainting to steps (2) and (3) above for surface extraction, which has not yet been done, would already be a significant contribution to reconstructing 3D objects, both their visible and their occluded parts. Moreover, steps (1) and (2) can be improved by inpainting over images and over locations in space.

Occlusions are a two-fold problem for multiview stereo. First, they make feature detection problematic, since occlusions can look like structure features (e.g., lines and corners). Reliable features with correspondences in multiple views are necessary for camera calibration, or projection matrix recovery, and without them the entire matching process would be inaccurate. Second, occluded pixels might not be matched even if they are visible in some images, or they might not be visible at all. Efforts to combat these effects are specialized and do not tackle the systemic problem of occlusions. Favaro et al. have suggested a way to detect different types of T-junctions [68], which should help make a more accurate detection of corner features (and incidentally would also help inpainting, by detecting occluded lines to be continued). Kang et al. have suggested a few techniques to help matching [69]. For area-based matching, they use spatially shiftable, temporally selectable windows. Normally, disparity support aggregation is computed over a window in image space centered at the pixel of interest p, in a scheme such as SSSD (Sum of Sum of Squared Differences), where support is summed over all images. However, for pixels near a depth discontinuity the neighborhood changes in different images, causing a bad match measure. Spatially shiftable windows allow moving the window from being centered at p (near a depth discontinuity) to a location where it does not cross the depth discontinuity, such as a corner of the window. Likewise, for pixels that are occluded in some images but not in others, temporal selection of windows attempts to heuristically select only windows in images where p is not occluded. Local aggregation techniques also exhibit more difficulty matching textureless regions, so they suggest progressively enlarging the window size using a criterion of variance on the SSSD measure.

Kang et al. [69] also discussed a global energy minimization for surface reconstruction. Visibility reasoning is used in a similar way to volume reconstruction techniques. The error function is modified by multiplying it by a binary visibility factor that indicates whether a pixel is occluded at some depth:

   E_SSSD(i) = Σ_{k≠i} Σ_{(x,y)∈S} v(x, y, z, k) g( u_i(x, y) − H_i^k(z) u_k(x, y) ),   (30)

where i is the reference image we are computing a depth map for, u is the observed image function, S is the domain of u, z = z(x, y) is the depth associated with pixel (x, y), v is the visibility of a depth pixel in image k, and H_i^k(z) is the homography from the plane of depth z in image k to the image plane of image i. This process proceeds iteratively by computing the pixel depths using graph cuts, and committing a percentage of the best-matching depths to the next iterations to assure convergence. Graph cuts have been shown to be usable for a class of energy functions with up to three-parameter clique potentials [70]. A less restrictive, symmetric visibility constraint is assumed in [71].

If the surfaces in the image have parametric descriptions, and we try to model them, then our depth matching is made more accurate by the added constraints. Layered images are suggested in [31], where every image is composed of several layers with transparency. The subclass of planar layers is discussed, since it allows easier view transformations via homographies. Since scenes are hardly ever exactly planar, each pixel on a layer also has an associated offset depth from the layer. Taking this ideology further, every surface should have its own parametric description with 3D texture. Inpainting does not directly provide parametric surfaces; however, it produces surfaces that are visually reasonable, and so likely provides an even stronger constraint.

Szeliski suggested a way to increase matching accuracy in a manner similar to temporal selection [26]. In order to represent pixels that are occluded in the reference view but visible in some images, he suggested key-frame views, where each key-frame has its own depth map consistent with the others. Finding the key-frames is problem-dependent; they can be, for example, some of the characteristic views. The brightness compatibility term is defined as the SSSD energy given in Eq. 30, with the summation going over all key-frames and over the support neighborhood for the key-frames (all the views that are required to match the specific key-frame view). Also, the equation is modified to include a weight w_ik that represents the influence of view k on the key-frame view i. Two additional terms are used to enforce constraints among the multiple depth maps: a depth compatibility constraint that requires mutual consistency between the depths of neighboring key-frames, and a depth smoothness constraint that encourages depth maps to be piecewise smooth. Rendering is accomplished by warping key-frames to the new viewpoint. Combining multiple depth maps can be done by fusion techniques such as the Curless and Levoy volumetric integration technique. This fusion technique is mathematically sound, but might not reconstruct typical object curves as well as an inpainting approach would, which would also naturally enforce the depth smoothness constraint. Inpainting, though, can be applied to achieve equal results (ignoring view-direction-dependent illumination effects) without the costly multiple depth maps, by using an algorithm with visibility determination.

After getting the depth for all the pixels in the reference frames, we can render them by converting them to voxels and applying volume visualization techniques [72], or by tessellating them and rendering using raster conversion techniques [73][74]. The latter is predicated on the realization that only surfaces are extracted by image-based methods, and in fact are all that is necessary (transparency and mixed-color pixels at discontinuities are generally not handled, since they are too complicated). Existing multi-view reconstruction techniques, including the multi-baseline stereo described here, produce noisy and sometimes oversmooth results. By applying surface inpainting in the disparity space, we can perform better depth recovery, with smooth surfaces and sharp edges, and recover occlusions, which in itself can help produce better matches. Naturally, not all objects are composed of smooth surfaces; 3D textural inpainting should be handled as well.

Increasing the number of depth maps to the number of source images, Virtualized Reality was proposed in [25]. A real event is virtualized in a 3D Dome by recording in real time the images from all cameras positioned in the Dome at regular intervals. A Visible Surface Model (VSM) is constructed for each camera viewpoint using a multi-baseline stereo algorithm, using at most six neighboring views. The depth data is tessellated, and triangles with large depth differences are considered a discontinuity and marked as hole triangles. Finally, the mesh is decimated so as not to exceed a number of error conditions. Most importantly, the mesh is required to stay more refined near depth discontinuities, since it is assumed that novel views will not be far away and that humans are more sensitive to depth errors near object boundaries. A VSM is rendered via standard graphics techniques such as Z-buffering.

As the virtual camera moves in space, pixels that were occluded in any one VSM can become visible. Thus, it is necessary to combine multiple VSMs to render an image without holes. A reference VSM and two support VSMs are chosen such that they define a bounding triangle for the virtual camera's ray. First the reference VSM is rendered, with its hole triangles rendered into a separate buffer. Then, the two supporting VSMs are rendered only for pixels that are painted in the hole buffer. Finally, holes that are still not covered are rudimentarily inpainted using interpolation.

Figure 9 shows the results of a virtual camera moving around a scene of a person swinging a baseball bat. The shading errors around the person are where a support VSM had to paint over the occlusion holes; apparently there are shading and geometry differences between neighboring VSMs. Here the method does worse than the multiple depth map method of Szeliski [26] described above, since there are no depth compatibility constraints between neighboring VSMs. A second visual inconsistency, which occurs when animating the sequence, is a jerky motion when the reference VSM changes. This again is due to inconsistencies in the 3D geometry.

A Complete Surface Model (CSM) is suggested to resolve these inconsistencies and enable interaction by generating a consistent 3D surface model. The Curless and Levoy surface volume integration technique is used to join all VSMs, with a marching cubes tessellation [73]. Aside from the volume accumulation method, there is no effort to detect occlusions, which results in poorer models. Volume space projective reconstruction techniques address these problems. If inpainting methodology were used here, it would have enforced surface and texture consistency, and perhaps even helped avoid the need for decimation.

Jin et al. [62] proposed a multiview stereo approach that can accommodate higher-order illumination models.
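The visibility-weighted cost of Eq. 30 can be illustrated with a toy plane sweep in which the homography H_i^k(z) degenerates to a pure disparity shift on a scanline; the kernel g and the mask v below are placeholders, not the actual formulation of [69]:

```python
import numpy as np

def sssd(ref, others, disparities, visible=None, g=lambda e: e * e):
    """Toy visibility-weighted SSSD cost (cf. Eq. 30).

    ref         : reference scanline, shape (W,)
    others      : K other scanlines, shape (K, W)
    disparities : candidate integer disparities (stand-ins for H_i^k(z))
    visible     : optional binary mask v, shape (K, W); 1 = not occluded
    Returns cost[d, x]; support would normally be aggregated over a
    window (or by diffusion) before picking the best disparity.
    """
    K, W = others.shape
    if visible is None:
        visible = np.ones((K, W))
    cost = np.zeros((len(disparities), W))
    for di, d in enumerate(disparities):
        for k in range(K):
            shifted = np.roll(others[k], -d)   # toy "homography": pure shift
            cost[di] += visible[k] * g(ref - shifted)
    return cost

ref = np.array([0.0, 0.0, 1.0, 1.0, 0.0, 0.0])
others = np.stack([np.roll(ref, 2)])           # one other view, true disparity 2
cost = sssd(ref, others, disparities=[0, 1, 2, 3])
best = int(cost.sum(axis=1).argmin())
print(best)   # prints 2
```

Zeroing entries of `visible` where a pixel is occluded in view k removes exactly those terms from the sum, which is the role of the binary visibility factor v in Eq. 30.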

Fig. 9. The virtual camera drops down into a dynamic scene.

The approach is similar to volumetric methods in that it requires projections of the object into the images to be faithful. A radiance matrix is established around each point on the current surface, based on the values of the projections of the point's neighborhood onto all camera views. A PDE evolution is then suggested for the particular case of diffuse and specular reflectance according to Ward's elliptical Gaussian model, such that the rank of the radiance matrix approaches 2. The evolution is supposed to converge when the current shape best estimates the radiance of the true shape. New views are created by projection of the shape and interpolation of the radiance function. This method is much less sensitive to differences in matching pixel values due to shading, and it does not have holes, but its reconstruction precision is not very high, and the surface at hole points takes the shape of the original bounding box rather than a reasonably reconstructed shape (likewise, in [75] a surface evolution is performed such that it maintains visibility constraints). If the PDEs were adjusted to include some inpainting formulation, they would be able to handle holes and reconstruct more visually plausible surfaces.

C. Volumetric Reconstruction Methods

Volumetric reconstruction methods reconstruct the scene in discretized scene space. This, apparently, is a simpler approach for supporting visibility reasoning, and so can provide superior matching quality compared to the depth map methods.

The volumetric stereo reconstruction algorithm given by Szeliski [26] initially performs the same steps of matching and aggregation as in stereo matching. To start with, voxels are filled at locations where the disparity match value is large enough. The volume is then reprojected to each camera's view plane. Reprojection is accomplished by warping volume depth layers, using a homography, in back-to-front order into the desired view. After reprojection, the pixels in each image that are of the same color as the projected volume are marked as visible. This helps to determine occluded voxels in the next iteration of matching and voxel filling. He forms an optimization problem that tries to adjust voxel colors and transparencies. This work has an advantage over other volume reconstruction techniques in that it enforces a smoothness constraint instead of a specific regularization criterion that real objects do not possess.

Seitz and Dyer were among the first to suggest a volumetric reconstruction method. In their landmark paper on voxel coloring, they suggested a provably consistent photorealistic reconstruction [30]. Photo integrity is defined as a reconstruction that, when reprojected onto the reference images' view planes, reproduces them accurately, preserving color, texture and resolution. The voxel coloring method projects the volume in front-to-back order onto all the images and colors only the voxels that are consistent across the images. They use a simplistic approach, requiring the convex hull of all camera centers to be devoid of scene points in order to maintain proper visibility ordering. There can be many consistent scenes for a given set of images. A voxel V is considered color invariant with respect to a set of images if, for every two consistent scenes S1, S2 containing it, its color in both scenes is the same. A voxel coloring of a set of m images is therefore the set of all color-invariant voxels with respect to the images. The downside is that this reconstruction builds volume perturbations (cusps) towards the cameras.

The construction of the voxel coloring can be formulated inductively by adding one color-consistent voxel to a consistent set at a time. For each projected voxel it is decided whether the pixels in the images it projects onto match in color. If so, then all the pixels it projects to are marked as recovered, and they are not considered again for color matching.

Kutulakos and Seitz [2] have extended the voxel coloring idea into the popular space carving theory and approach. Space carving attempts to compute a photo-consistent shape from the images of N perspective cameras located anywhere in space outside the surface itself, as long as the scene radiance is locally computable. Here they use a plane-sweeping approach to reason about the photo consistency (the same as photo integrity) of each voxel, carving each inconsistent voxel out of the volume. The complete reconstruction, called the photo hull, subsumes all photo-consistent scenes, not just the visible voxels as in voxel coloring, but has the same protrusions towards the cameras. They claim that after obtaining such an "equivalence class" of all consistent shapes it is possible and even advantageous to apply other a priori constraints, such as smoothness, to it.

In the Layered Depth Image (LDI) representation, rendering to a novel viewpoint is decomposed into incremental warping computations (using a scan-line approach). Proper visibility ordering is maintained using McMillan's depth ordering algorithm extended to the LDI. For each reprojected LDI ray, the depth information is rendered in a back-to-front manner, blending with the previous output pixel values. The rendering itself is accomplished using splatting, with quantized alpha values to facilitate Gaussian-type kernels. The LDI provides fast parallax-preserving rendering, but is by construction limited to close-by views of the scene. An interesting observation is that if we partition surface and 3D texture, then some of the multiple depths per ray
Inpainting can provide better constraints and will also are actually texture representation. If we change the pro- be useful in modelling occlusion. In addition to recovering jection manifold to the surface shape, the points along each holes and hidden faces, inpainting provides continuation ray will represent texture alone, and the entire process can principles that are structure and texture preserving, and be made more photorealistic, especially when coupled with thus are significantly better for reconstruction than simple inpainting. smoothness assumptions. However, finding an appropriate Layered depth images were used for the purpose of rep- error function for retrieving the true shape from a photo- resenting the entire 3D structure of objects in Image-Based hall might be quite complicated. Objects (IBOs) [76]. IBO are represented using six LDIs Space carving utilizes a plane sweep approach as in voxel from an object centric point, the IBO COP. Multiple IBOs coloring by noticing that for cameras that lie in front of the can be placed in the same scene and rendered efficiently. plane P , any voxel on the plane P cannot be occluded by voxels behind the plane P , regardless of camera orienta- D. View Dependent Hybrid Methods tion. Hence, during the front-to-back plane sweep, surface Some improvements in reprojection quality can be ac- voxels are projected to each camera view in front of the complished by using view dependent reconstruction for the sweep plane, and their consistency is checked. In order novel view point. That is due to preservation of source to consider consistency among all images – not just those image resolution and view dependent shading effects. We visible in front of the sweep plane – a multi-sweep space have seen it used in Virtualized Reality [25] in the selection carving algorithm repeats the plane sweep carving on each of VSMs nearest to the new view. 
principle axis x, y, and z, in a positive and negative di- A visual hull is another reconstruction result similar to rection. Voxels that had consistency check performed on space carving, but without regard to lighting. The fore- them by at least two sweep planes also undergo a global ground in the reference images is called the silhouette, and consistency check for all cameras they are visible in. shooting rays from an image COP through all its fore- For photorealism, the number of voxels required is very ground pixels creates a cone-like volume with the silhou- large, and the number of images required could be rela- ette cross-section. All the volumes of all the images are tively large as well (from a dozen to over a hundred). The combined using Constructive Solid Geometry (CSG) in- camera calibration has to be very exact too, here calibrated tersection operation. When the number of images from with sub-pixel precision. The Gargoyle statue rendered in different view angles approaches infinity then the inter- Fig. 10 shows some results of this approach. In the Gar- section is called the visual hull of the object. The visual goyle renderings we can see that the surface is not accu- hull cannot reconstruct concave surfaces. Typically the re- rate and there are holes. The inaccuracies are from the construction is done in volume space, as with space carv- bulging effects and insufficient voxel resolution, and the ing, which has similar granularity limitations, although im- holes were carved since there were some inaccuracies due provements have been suggested using Delaunay triangu- to illumination effects or other noise. Applying continua- lations [77], they still do not provide image resolution. An tion principles here as in stereo matching is necessary but Image-Based Visual Hull (IBVH) construction is suggested not sufficient. We need an inpainting formulation to ac- in [33] in order to avoid using volume space. For each pixel count for holes as well. 
It could additionally reduce the in the desired view image, its viewing ray is projected onto number of images required to acquire. all reference images, then intersected with the silhouettes One interesting way to represent volumes so that im- obtaining a list of intervals, and then raised back to the 3D age based rendering is facilitated fast is given by Layered viewing ray, for intersection of all such intervals to produce Depth Images (LDI) [31]. A layered depth image is the the visual hall section for that ray. Coherence between the projective space volume of a single virtual camera, with rays is given by the epipolar constraint, which indicates few points on each ray added as necessary. The LDI can that the rays form a fan shape in the reference images. be created using modified ray tracers or from real images Therefore, a scanline algorithm can be used, as long as the using an adaptation of voxel coloring. Thus, the LDI takes silhouette edges are sorted angularly. The shading of the slightly more space than an image with depth, but can visual hull is performed by selecting the color for the visi- handle occlusion. The rendering of the LDI from a novel ble visual hull point from the reference image with closest REVIEW AND PREVIEW: DISOCCLUSION BY INPAINTING FOR IMAGE-BASED RENDERING 15
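The per-ray interval computation can be illustrated with a small sketch (an illustration of the interval-intersection step only, not the implementation of [33]; here the intervals are assumed to be already lifted back onto the 3D viewing ray as sorted, disjoint (start, end) parameter ranges, one list per reference image):

```python
from functools import reduce

def intersect_two(a, b):
    """Intersect two sorted lists of (start, end) intervals along a ray."""
    out, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        lo = max(a[i][0], b[j][0])
        hi = min(a[i][1], b[j][1])
        if lo < hi:                      # non-empty overlap survives
            out.append((lo, hi))
        if a[i][1] < b[j][1]:            # advance the interval that ends first
            i += 1
        else:
            j += 1
    return out

def visual_hull_ray(per_image_intervals):
    """Occupied segments along one desired-view ray: the intersection of
    the silhouette interval lists contributed by every reference image."""
    return reduce(intersect_two, per_image_intervals)
```

For instance, two reference silhouettes contributing [(0, 5), (7, 9)] and [(1, 8)] leave the hull segments [(1, 5), (7, 8)] along that ray; the scanline and epipolar-fan machinery above only serves to compute such lists efficiently.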

Fig. 10. Gargoyle statue source image and renderings.

The shading of the visual hull is performed by selecting, for each ray, the color of the visible visual-hull point from the reference image whose viewing angle is closest to the desired ray in the new view. This selection is made for every ray, and it approximates view-dependent shading effects. To implement visibility checks efficiently, 3D visibility checks are reduced to 2D by using epipolar planes.

The method results in a low-detail volume with sharp edges (for few cameras), yet its surfaces are smoother than those of a volumetric approach. The authors argue that by using the visual hull as a surface for texturing, the reconstruction is made to look more realistic than with common methods. This argument too suggests that 3D surface extraction through structural inpainting, with textural-inpainting displacement, would be beneficial; since inpainting both enforces more complex smoothing and tries to keep detail, it should be much more effective.

One of the earlier and most powerful image-based modelling endeavors is the modelling and rendering of architecture from photographs by Debevec et al. [27]. Initially, the user roughly models the scene; the model is thereafter matched to the input images and adjusted to the proper measurements. A view-dependent texture mapping is then applied to the model. It is beneficial to mix the texture maps of more than one photograph, using a blending function that provides a smooth transition between multiple textures, in cases where the model is not fully visible from any desired photograph. Any pixels still missing after the texture mapping are rudimentarily inpainted.

Buildings are likely to have more complicated 3D structure than the user has modelled with simple geometric primitives, and the 2D texture-mapping illusion of depth does not extend well to angles far from the original view. Therefore, stereo matching is suggested to extend the 3D structure. Just as in stereo matching, where homographies are used to transform each input image into the reference image-plane coordinates, photographs are warped onto the model and projected into the reference plane. Thus, points on the model have no disparity. Surfaces that do not agree with the model show a disparity in the warped image, caused by the projective distance from the corresponding model point. The epipolar geometry is preserved as well, and occlusions can be handled approximately, since the modelled geometry is known. The results of this method are stunningly realistic, and show not only that texture mapping over an established geometry improves realism, but also that the geometry assists with further 3D reconstruction. To generalize the process and make it less cumbersome, we need to automatically extract the shape of the desired object and then texture-map it. We have suggested in this section that inpainting can provide the framework and the mechanics for accomplishing both, and can additionally recover occluded surfaces not visible in any image, thereby providing realism not yet available for photorealistic rendering.

IV. Conclusions and Future Work

This paper has provided a review of digital image inpainting and has tried to make the case for the necessity of its integration with image-based rendering.

Digital image inpainting has progressed from attempts to disocclude a grey-scale object in 2D using straight lines, to the filling-in of arbitrary holes with texture and shape, and to the inpainting of surfaces in 3D. The current convergence of research in structural inpainting seems to suggest that the Euler p-elastica matches the perceived shape best (impressively, it is also the mathematical culmination of previous inpainting methods), while textural inpainting is accomplished with texture exemplars.

Inpainting, however, still has a long way to go before it can be as useful for image-based rendering as envisioned in this review. While it seems clear that an inpainting functional is well suited to depth-surface extraction, there are no current methods that can inpaint large holes in 3D, since detail would be lost. Textural inpainting likewise proceeds either in a statistical fashion or along linear isophotes alone, and cannot restore more complex surfaces; yet restoring surfaces that are not limited to planes is essential for 3D texture (defined by color and position). Learning global statistics might help, but it is a daunting task. Exploiting statistical principles, we can require symmetry and continuation principles to be upheld. Then the suggested 3D inpainting would separate the smooth 3D surface from the 3D texture, and inpaint both properly with the assistance of statistical heuristics.
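As a pointer for the reader, the Euler p-elastica energy referred to above is commonly written as follows (notation following [11]; p = 2 recovers Euler's classical elastica, and the weights a, b > 0 balance the length of the level lines against their bending):

$$
E_p[u] \;=\; \int_\Omega \big(a + b\,|\kappa|^p\big)\,|\nabla u|\,dx,
\qquad
\kappa \;=\; \nabla\cdot\!\left(\frac{\nabla u}{|\nabla u|}\right),
$$

where the integral accumulates total variation and level-line curvature of the image u over the inpainting domain Ω.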

References

[1] D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. Int. J. Comp. Vision, 47(1):7–42, June 2002.
[2] K. S. Kutulakos and S. M. Seitz. A theory of shape by space carving. Int. J. Comp. Vision, Marr Prize Special Issue, 38(3):199–218, 2000.
[3] D. Mumford and J. Shah. Optimal approximations by piecewise smooth functions and associated variational problems. Comm. Pure App. Math., XLII:577–685, 1989.
[4] M. Nitzberg, D. Mumford, and T. Shiota. Filtering, Segmentation and Depth. Lecture Notes in Computer Science, Vol. 662. Springer-Verlag: Berlin, 1993.
[5] M. Bertalmio, G. Sapiro, V. Caselles, and C. Ballester. Image inpainting. Computer Graphics (SIGGRAPH 2000), pages 417–424, July 2000.
[6] S. Z. Li. Markov Random Field Modeling in Image Analysis. Springer: Tokyo, 2001.
[7] S. Geman and D. Geman. Stochastic relaxation, Gibbs distributions, and the Bayesian restoration of images. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 6:721–741, November 1984.
[8] S. Masnou and J.-M. Morel. Level lines based disocclusion. Proc. Int. Conf. Image Processing, pages 259–263, 1998.
[9] M. Bertalmio, A. L. Bertozzi, and G. Sapiro. Navier-Stokes, fluid dynamics, and image and video inpainting. Proc. IEEE Comp. Vision and Pattern Rec., December 2001.
[10] C. Ballester, M. Bertalmio, V. Caselles, G. Sapiro, and J. Verdera. Filling-in by joint interpolation of vector fields and grey levels. IEEE Trans. Image Processing, 10:1200–1211, August 2001.
[11] T. F. Chan, S. H. Kang, and J. Shen. Euler's elastica and curvature based inpaintings. SIAM J. App. Math., 63(2):564–592, 2002.
[12] T. F. Chan and J. Shen. Non-texture inpainting by curvature driven diffusion. J. Visual Comm. Image Rep., 12(4):436–449, 2001.
[13] M. Bertalmio, L. Vese, G. Sapiro, and S. Osher. Simultaneous structure and texture image inpainting. IEEE Trans. Image Processing, 8:882–889, August 2003.
[14] A. Criminisi, P. Pérez, and K. Toyama. Object removal by exemplar-based inpainting. Proc. IEEE Comp. Vision and Pattern Rec., volume 2, pages 721–728, Madison, WI, June 2003.
[15] H. Grossauer. A combined PDE and texture synthesis approach to inpainting. Euro. Conf. Comp. Vision, 2004.
[16] K. A. Patwardhan, G. Sapiro, and M. Bertalmio. Video inpainting of occluding and occluded objects. IEEE Proc. Int. Conf. Image Processing, 2005.
[17] A. Levin, A. Zomet, and Y. Weiss. Learning how to inpaint from global image statistics. Proc. IEEE Int. Conf. Comp. Vision, pages 305–313, October 2003.
[18] S. H. Kang, T. F. Chan, and S. Soatto. Landmark based inpainting from multiple views. TR CAM 02-11, March 2002.
[19] S. Ivekovic and E. Trucco. Dense wide-baseline disparities from conventional stereo for immersive videoconferencing. IEEE Proc. Int. Conf. Pattern Rec., pages 883–897, 2002.
[20] J. Verdera, V. Caselles, M. Bertalmio, and G. Sapiro. Inpainting surface holes. Int. Conf. Image Processing, September 2003.
[21] S. E. Chen and L. Williams. View interpolation for image synthesis. Computer Graphics (SIGGRAPH 1993), 27(Annual Conference Series):279–288, 1993.
[22] M. Levoy and P. Hanrahan. Light field rendering. Computer Graphics (SIGGRAPH 1996), pages 31–42, August 1996.
[23] S. J. Gortler, R. Grzeszczuk, R. Szeliski, and M. F. Cohen. The lumigraph. Computer Graphics (SIGGRAPH 1996), pages 43–54, August 1996.
[24] L. McMillan and G. Bishop. Plenoptic modeling: An image-based rendering system. Computer Graphics (SIGGRAPH 1995), pages 39–46, September 1995.
[25] P. J. Narayanan, P. W. Rander, and T. Kanade. Constructing virtual worlds using dense stereo. Proc. IEEE Int. Conf. Comp. Vision, pages 3–10, January 1998.
[26] R. Szeliski. Stereo algorithms and representations for image-based rendering. British Machine Vision Conf., pages 314–328, 1999.
[27] P. E. Debevec, C. J. Taylor, and J. Malik. Modeling and rendering architecture from photographs: A hybrid geometry- and image-based approach. Computer Graphics (SIGGRAPH 1996), 30(Annual Conference Series):11–20, August 1996.
[28] M. Irani, T. Hassner, and P. Anandan. What does a scene look like from a scene point? Euro. Conf. Comp. Vision, pages 883–897, 2002.
[29] J. Sun, H.-Y. Shum, and N.-N. Zheng. Stereo matching using belief propagation. Euro. Conf. Comp. Vision, 2:510–524, 2002.
[30] S. M. Seitz and C. R. Dyer. Photorealistic scene reconstruction by voxel coloring. Proc. IEEE Comp. Vision and Pattern Rec., pages 1067–1073, 1997.
[31] J. Shade, S. Gortler, L.-W. He, and R. Szeliski. Layered depth images. Computer Graphics (SIGGRAPH 1998), pages 231–242, July 1998.
[32] G. Vogiatzis, P. H. S. Torr, and R. Cipolla. Multi-view stereo via volumetric graph-cuts. Proc. IEEE Comp. Vision and Pattern Rec., 2005.
[33] W. Matusik, C. Buehler, R. Raskar, S. J. Gortler, and L. McMillan. Image-based visual hulls. Computer Graphics (SIGGRAPH 2000), pages 369–374, 2000.
[34] Z. Tauber. Disocclusion and photorealistic object reconstruction and reprojection. SFU TR 2005-12, 2004.
[35] D. Ramanan and K. E. Barner. Nonlinear image interpolation through extended permutation filters. Proc. Int. Conf. Image Processing, 2000.
[36] J. Shen. Inpainting and the fundamental problem of image processing. SIAM News, Invited, December 2002.
[37] L. Ambrosio, N. Fusco, and D. Pallara. Functions of Bounded Variation and Free Discontinuity Problems. Oxford Science Publications. Clarendon Press, 2000.
[38] I. M. Gelfand and S. V. Fomin. Calculus of Variations. Dover Publications, Inc.: Mineola, New York, 2000.
[39] S. Walden. The Ravished Image. St. Martin's Press: New York, 1985.
[40] D. King. The Commissar Vanishes. Henry Holt and Company, 1997.
[41] T. K. Shih, R. C. Chang, L. C. Lu, and L. H. Lin. Large block inpainting by color continuation analysis. IEEE Proc. Multimedia Modelling Conf., pages 196–202, 2004.
[42] T. K. Shih and R. C. Chang. Digital inpainting survey and multilayer image inpainting algorithms. IEEE Intl. Conf. Info. Tech. and App., 1:15–24, 2005.
[43] M. M. Oliveira, B. Bowen, R. McKenna, and Y.-S. Chang. Fast digital inpainting. Proc. Int. Conf. Vis., Imaging and Image Processing, pages 261–266, September 2001.
[44] P. Perona. Orientation diffusions. IEEE Trans. Image Processing, 7(3):457–467, 1999.
[45] J. Gu, S. Peng, and X. Wang. Digital image inpainting using Monte Carlo method. IEEE Proc. Int. Conf. Image Processing, 2:961–964, 2004.
[46] A. Z. B. Barcelos, M. A. Batista, A. M. Martins, and A. C. Nogueira. Level lines continuation based digital inpainting. IEEE Proc. Brazilian Symp. Comp. Graphics and Image Proc., pages 50–57, 2004.
[47] P. Tan, S. Lin, L. Quan, and H.-Y. Shum. Highlight removal by illumination-constrained inpainting. Proc. IEEE Int. Conf. Comp. Vision, pages 826–835, October 2003.
[48] H. Jiang and C. Moloney. Error concealment using a diffusion based method. IEEE Proc. Int. Conf. Image Processing, pages 832–835, 2002.
[49] S. Roth and M. J. Black. Fields of experts: A framework for learning image priors. Proc. IEEE Comp. Vision and Pattern Rec., 2:860–867, 2005.
[50] E. Giusti. Minimal Surfaces and Functions of Bounded Variation. Birkhäuser, 1984.
[51] T. F. Chan and J. Shen. Mathematical models for local non-texture inpainting. SIAM J. App. Math., 62(3):1019–1043, 2001.
[52] L. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal algorithms. Physica D, 60:259–268, 1992.
[53] S. Osher and J. Sethian. Fronts propagating with curvature dependent speed: Algorithms based on Hamilton-Jacobi formulations. J. of Computational Physics, 79:12–49, 1988.
[54] D. Tschumperle and R. Deriche. Vector-valued image regularization with PDEs: A common framework for different applications. Proc. IEEE Comp. Vision and Pattern Rec., 1:651–656, 2003.
[55] D. Tschumperle and R. Deriche. Vector-valued image regularization with PDEs: A common framework for different applications. IEEE Trans. Pattern Analysis and Machine Intelligence (PAMI), 27:506–517, 2005.
[56] J. S. Yedidia, W. T. Freeman, and Y. Weiss. Constructing free energy approximations and generalized belief propagation algorithms. TR 2002-35.
[57] Y. Q. Xu, S. C. Zhu, B. N. Guo, and H. Y. Shum. Asymptotically admissible texture synthesis. Proc. Intl. Workshop on Stat. and Comp. Theories of Vision, 2001.
[58] J.-F. Aujol and A. Chambolle. Dual norms and image decomposition models. Int. J. Comp. Vision, 63(1):85–104, June 2005.
[59] A. Efros and T. K. Leung. Texture synthesis by non-parametric sampling. Proc. IEEE Int. Conf. Comp. Vision, pages 1033–1038, 1999.
[60] L.-Y. Wei and M. Levoy. Order-independent texture synthesis. TR 2002-01, 2002.
[61] J. Jia and C. K. Tang. Image repairing: Robust image synthesis by adaptive ND tensor voting. Proc. IEEE Comp. Vision and Pattern Rec., 1:643–835, 2003.
[62] H. Jin, S. Soatto, and A. J. Yezzi. Multi-view stereo beyond Lambert. Proc. IEEE Comp. Vision and Pattern Rec., 2003.
[63] O. Faugeras and R. Keriven. Variational principles, surface evolution, PDEs, level set methods, and the stereo problem. IEEE Trans. Image Processing, 7(3):336–344, March 1998.
[64] Y. Duan, L. Yang, H. Qin, and D. Samaras. Shape reconstruction from 3D and 2D data using PDE-based deformable surfaces. Euro. Conf. Comp. Vision, 2004.
[65] F. Han and S.-C. Zhu. Bayesian reconstruction of 3D shapes and scenes from a single image. Proc. IEEE Int. Conf. Comp. Vision, 2003.
[66] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision, Second Edition. Cambridge University Press, 2003.
[67] E. Trucco and A. Verri. Introductory Techniques for 3-D Computer Vision. Prentice Hall, 1998.
[68] P. Favaro, A. Duci, Y. Ma, and S. Soatto. On exploiting occlusions in multiple view geometry. Proc. IEEE Int. Conf. Comp. Vision, October 2003.
[69] S. B. Kang, R. Szeliski, and J. Chai. Handling occlusions in dense multi-view stereo. Proc. IEEE Comp. Vision and Pattern Rec., 4:921–924, Aug 2004.
[70] V. Kolmogorov and R. Zabih. What energy functions can be minimized via graph cuts? Proc. Euro. Conf. Comp. Vision, III:65–81, May 2002.
[71] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum. Symmetric stereo matching for occlusion handling. Proc. IEEE Comp. Vision and Pattern Rec., 2005.
[72] B. Lichtenbelt, R. Crane, and S. Naqvi. Introduction to Volume Rendering (Hewlett-Packard Professional Books). Prentice Hall, 1998.
[73] W. E. Lorensen and H. E. Cline. Marching cubes: A high resolution 3D surface construction algorithm. Computer Graphics (SIGGRAPH 1987), pages 163–169, July 1987.
[74] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Computer Graphics: Principles and Practice in C, second edition. Addison-Wesley Professional, 1995.
[75] J. E. Solem, F. Kahl, and A. Heyden. Visibility constrained surface evolution. Proc. IEEE Comp. Vision and Pattern Rec., 2005.
[76] M. M. Oliveira and G. Bishop. Image-based objects. Proc. ACM Symposium on Interactive 3D Graphics, April 1999.
[77] E. Boyer and J.-S. Franco. A hybrid approach for computing visual hulls of complex objects. Proc. IEEE Comp. Vision and Pattern Rec., 1:695–701, 2003.
What energy functions can be minimized via graph cuts? Proc. Euro. Conf. Comp. Vision, III:65–81, May 2002. [71] J. Sun, Y. Li, S. B. Kang, and H.-Y. Shum. Symmetric stereo matching for occlusion handling. Proc. IEEE Comp. Vision and Pattern Rec., 2005. [72] B. Lichtenbelt, R. Crane, and S. Nagvi. Introduction to Vol- ume Rendering. (Hewlett-Packard Professional Books). Prentice Hall, 1998. [73] W. E. Lorensen and H. E. Cline. Marching cubes: A high res- olution 3d surface construction algorithm. Computer Graphics (SIGGRAPH 1987), pages 163–169, July 1987. [74] J. D. Foley, A. van Dam, S. K. Feiner, and J. F. Hughes. Com- puter Graphics: Principles and Practice in C, second edition. Addison-Wesley Professional, 1995. [75] J.E. Solem, F. Kahl, and A. Heyden. Visibility constrained surface evolution. Proc. IEEE Comp. Vision and Pattern Rec., 2005. [76] M. M. Oliveira and G. Bishop. Image-based objects. Proc. ACM Symposium Interactive 3D Graphics, April 1999. [77] E. Boyer and J.-S. Franco. A hybrid approach for computing visual hulls of complex objects. Proc. IEEE Comp. Vision and Pattern Rec., 1:695–701, 2003.