SPATIO-TEMPORAL TEXTURE SYNTHESIS AND IMAGE INPAINTING FOR VIDEO APPLICATIONS

Sanjeev Kumar, Mainak Biswas, Serge J. Belongie and Truong Q. Nguyen
University of California, San Diego

(This work is supported in part by NSC and by CWC with matching grant from UCDiMi.)

ABSTRACT

In this paper we investigate the application of texture synthesis and image inpainting techniques to video. Working in the non-parametric framework, we use 3D patches for matching and copying. This ensures a degree of temporal continuity that cannot be obtained by working on individual frames. Since, in the present application, patches may contain arbitrarily shaped and multiple disconnected holes, the fast Fourier transform (FFT) and summed-area-table based sum of squared differences (SSD) calculation of [1] cannot be used directly. We propose a modification of that scheme which allows its use in the present application. This results in a significant gain in efficiency, since the search space is typically huge for video.

1. INTRODUCTION

Any semblance of order in an observation is a manifestation of redundancy in its present representation. Such redundancies have been exploited for different purposes in innumerable applications, one canonical example being data compression. Specifically, in video compression, motion compensation and transform coding are used to exploit the spatio-temporal redundancy in the video signal. But the types of redundancy exploited by current methods are rather limited, as is exemplified by the success of error concealment techniques. Moreover, the discrete cosine transform (DCT), the most commonly used transform coding technique in video compression, has been found to be effective only for small block sizes, which shows its inability to exploit redundancies of larger extent. Wavelets have been relatively more successful in this respect for certain classes of images, but have not found much use in general purpose video compression.

It is interesting to compare the aforementioned techniques with texture synthesis and image inpainting, both of which also assume the existence of redundancies in images and video but exploit them for purposes other than compression. The type of redundancy exploited by these techniques is somewhat complementary to that of the DCT: texture synthesis works at a more global level. The motivation behind the present work is the perceptual quality that texture synthesis and image inpainting methods achieve while using only a small amount of the original data. Intra block prediction and context-based arithmetic coding in H.264 try to exploit some redundancies not captured by the DCT alone, but they still work at a very local level and fail to exploit redundancies at a more global level. Although building a practical video codec based on these principles would require more research effort, the present work is intended to be a small step in that direction.

Our approach is similar to [2], which uses a non-parametric texture synthesis approach for image inpainting with an appropriate choice of fill order. The contribution of the present work is twofold. First, we extend the work of [2] to three dimensions, the more natural setting for video. Second, we extend the work of [1], which allows FFT-based SSD calculation for all possible translations of a rectangular patch, to arbitrarily shaped regions, thus making it applicable to image inpainting.

This paper is organised as follows. In Section 2 we briefly review the relevant literature. Section 3 describes the proposed approach. Experimental results are presented in Section 4. Some future research directions are discussed in Section 5.

2. PREVIOUS WORK

Related prior work can be broadly classified into the following three categories.

1. Texture Synthesis: Texture synthesis has been used in the literature to fill large image regions with a texture pattern similar to a given sample. Methods for this purpose range from parametric, which estimate a parametrized model of the texture and use it for synthesis, e.g. Heeger et al. [3], to non-parametric, in which synthesis is based on direct sampling of the supplied texture pattern, e.g. Efros and Leung [4]. Texture synthesis methods have also been used to fill small holes in an image, which may originate from deterioration of the image or from the desire to remove some objects. However, the aforementioned texture synthesis methods have been found to work poorly for the highly structured textures that are common in natural images. Graph-cut based techniques [5] try to maintain structural continuity during texture synthesis by finding an optimal seam while copying texture patches from different portions of the image.

2. Image Inpainting: Image inpainting has been used in the literature to fill small holes in an image by propagating structure information from the image into the region to be filled. Typically, diffusion is used to propagate linear structure based on a partial differential equation [6][7]. These methods perform well when filling small holes but produce noticeable blur when filling large ones. Recently, Criminisi et al. [2] proposed a method which combines the benefits of texture synthesis and image inpainting. The results of their algorithm compare favorably with the state of the art without resorting to texture segmentation or explicit structure propagation, and it is able to fill large holes. Our approach is similar to their work, but we investigate some issues which are not so important for static images but become relevant for video.

3. Applications in Video: Some contemporary applications of these techniques include recovery of lost data in image sequences [8], video logo removal [9], and object removal [2].

3. PROPOSED APPROACH

In this section, we begin with a brief recapitulation of the notation and algorithm presented in [2]. For more details, the reader should refer to the original manuscript.

3.1. Problem Statement and Some Notations

We are given an image or video I = Φ ∪ Ω, where Φ is the source region, the set of pixels whose values are known, and Ω is the target region (hole), the set of pixels whose values are to be filled in. δΩ denotes the boundary between the source and target regions, which is a contour in 2D and a surface in 3D. For each boundary pixel p ∈ δΩ, Ψ_p denotes a patch centered at p.

3.2. Inpainting in 2D

We assign a confidence term to every pixel on the boundary, given by

    C(p) = \frac{\sum_{q \in \Psi_p \cap (I - \Omega)} C(q)}{|\Psi_p|}    (1)

Initially, C(p) = 0 for all p ∈ Ω and C(p) = 1 for all p ∈ I − Ω. We also assign a data term to every pixel on the boundary, given by

    D(p) = |\nabla I_p^{\perp} \cdot n_p|    (2)

where ∇I_p is the gradient of the image and n_p is the normal vector to the region boundary. Based on the data and confidence terms, a priority P(p) = C(p) · D(p) is assigned to every boundary pixel. The patch centered at the pixel with maximum priority, p̂ = arg max_p P(p), is selected as the starting point for filling. A search is then performed over all patches Ψ_q contained in Φ for the best matching patch Ψ_q̂ according to a modified sum of squared differences criterion d(Ψ_p̂, Ψ_q), which includes only those pixels of Ψ_p̂ that are already filled in. The authors of [2] performed the search in CIE Lab color space; in the present work we deal only with grayscale images, but all the algorithms presented here extend easily to any color space. The values of all pixels r ∈ Ψ_p̂ ∩ Ω are copied from the corresponding pixels in Ψ_q̂, and their confidence terms are copied from p̂. The data term is recomputed for the pixels on the boundary created by the most recent fill, and the above steps are repeated until all pixels have been filled.
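As a concrete illustration of the fill-order rule of equations (1) and (2), the sketch below computes the priority P(p) = C(p) · D(p) along the hole boundary and returns the next patch centre p̂. It is a minimal sketch of the 2D case, not the authors' implementation: the function and argument names are ours, the boundary normal is estimated from the gradient of the fill mask (one possible discretisation), and patches are assumed to lie entirely inside the image.

```python
import numpy as np
from scipy import ndimage

def highest_priority_pixel(img, filled, conf, patch=9):
    """Pick the next fill location p_hat = argmax_p C(p) * D(p).

    img    : 2D float array (grayscale, as in the present work)
    filled : 2D bool array, True where pixel values are known (I minus Omega)
    conf   : 2D float array holding the confidence C(p) of each pixel
    patch  : odd patch side length; patches are assumed to fit inside the image
    """
    h = patch // 2
    # Boundary delta-Omega: unfilled pixels that touch at least one filled pixel.
    boundary = ~filled & ndimage.binary_dilation(filled)

    # Image gradient (the isophote is its 90-degree rotation) and a crude
    # estimate of the boundary normal from the gradient of the fill mask.
    gy, gx = np.gradient(np.where(filled, img, 0.0))
    ny, nx = np.gradient(filled.astype(float))

    best_p, best_c, best_priority = None, 0.0, -1.0
    for y, x in zip(*np.nonzero(boundary)):
        win = filled[y - h:y + h + 1, x - h:x + h + 1]
        # Confidence term, eq. (1): mean confidence over the known pixels of Psi_p.
        c = conf[y - h:y + h + 1, x - h:x + h + 1][win].sum() / win.size
        # Data term, eq. (2): |isophote . boundary normal|, no normalisation factor.
        d = abs(gy[y, x] * nx[y, x] - gx[y, x] * ny[y, x])
        if c * d > best_priority:
            best_priority, best_p, best_c = c * d, (y, x), c
    return best_p, best_c
```

In each iteration of the full algorithm, the patch centred at the returned pixel is compared against all fully known source patches using the masked SSD, the best match is copied into the hole, the returned confidence value is assigned to the newly filled pixels, and the loop repeats until Ω is empty.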
The algorithm described so far is similar to that presented in [2], except for the lack of a normalization factor in (2), which does not affect the final result. The contributions of the present work are described in the following two subsections.

3.3. Extension to 3D

A naive approach to extending the algorithm described above to video would be to treat the video as a collection of frames and to perform the same operation on each frame independently. This has two main disadvantages. First, because of the temporal correlation among frames, a matching patch is also likely to be found in temporally adjacent frames, especially when a portion of the region to be filled in (and hence absent from) one frame is present in a neighboring frame; frame-by-frame filling cannot exploit this. Second, even if every filled frame is spatially coherent, temporal coherence is not ensured, which can result in visible artifacts in the video. For video texture synthesis, 3D spatio-temporal patches were used in [5]. Similarly, we propose to use 3D spatio-temporal patches in the present framework. This ensures temporal consistency in the video for the same reasons that spatial consistency is maintained in the 2D version of the algorithm.
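To make the 3D patch matching concrete, the sketch below performs the masked-SSD search over spatio-temporal patches by brute force. It is our own illustration rather than the paper's implementation: the (t, y, x) array layout, the function name, and the assumption that patches fit entirely inside the volume are ours. The triple loop over every candidate centre shows why the search space is so large for video, which is what motivates an accelerated SSD computation.

```python
import numpy as np

def best_3d_match(video, filled, center, patch=(5, 9, 9)):
    """Brute-force masked-SSD search over 3D spatio-temporal patches.

    video  : 3D float array indexed as (t, y, x)
    filled : 3D bool array, True where voxel values are known (the source region)
    center : (t, y, x) of the target patch on the hole boundary
    patch  : odd patch extents along (t, y, x); assumed to fit inside the volume
    """
    ht, hy, hx = (s // 2 for s in patch)
    t, y, x = center
    tgt = video[t - ht:t + ht + 1, y - hy:y + hy + 1, x - hx:x + hx + 1]
    known = filled[t - ht:t + ht + 1, y - hy:y + hy + 1, x - hx:x + hx + 1]

    T, H, W = video.shape
    best_q, best_ssd = None, np.inf
    # Scan every candidate centre in the volume: this is the exhaustive search
    # whose cost grows with the product of the three dimensions.
    for qt in range(ht, T - ht):
        for qy in range(hy, H - hy):
            for qx in range(hx, W - hx):
                src_known = filled[qt - ht:qt + ht + 1,
                                   qy - hy:qy + hy + 1,
                                   qx - hx:qx + hx + 1]
                if not src_known.all():
                    continue  # candidate patch must lie entirely in the source region
                cand = video[qt - ht:qt + ht + 1,
                             qy - hy:qy + hy + 1,
                             qx - hx:qx + hx + 1]
                ssd = ((cand - tgt)[known] ** 2).sum()  # SSD over already-filled voxels only
                if ssd < best_ssd:
                    best_ssd, best_q = ssd, (qt, qy, qx)
    return best_q, best_ssd
```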
In order to extend the present algorithm to 3D, we need generalizations of the confidence term (1) and the data term (2). Generalizing the confidence term is trivial: we simply sum over the pixels of the 3D patch. Generalizing the data term is less obvious. Although we can still compute the spatio-temporal gradient of the image intensity ∇I_p and the normal vector n_p to the boundary surface, there is no unique direction perpendicular to the intensity gradient, i.e. ∇I_p^⊥ is not unique. We propose a modified data term using the cross product of these vectors.

The summation domain in (5), however, is not a simple rectangular one and can contain arbitrarily shaped and disconnected holes.
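The excerpt breaks off before the FFT-based SSD formulation of equation (5) is reproduced, so the sketch below should be read only as an illustration of the general idea: the SSD between a masked template and every translated window of the source can be expanded into a constant plus two correlations, each of which can be evaluated for all offsets at once with FFTs. The function name and the NumPy/SciPy realisation are ours and are not taken from the paper.

```python
import numpy as np
from scipy.signal import fftconvolve

def masked_ssd_map(source, template, mask):
    """SSD between a masked template and every same-size window of `source`.

    source   : 2D float array with the known image data (zeros inside the hole)
    template : 2D float array, the target patch (values at hole pixels are irrelevant)
    mask     : 2D bool/float array, 1 at the already-filled pixels of the template

    Expanding sum_r m(r) * (t(r) - s(q + r))^2 over the mask gives
        sum m * t^2                      (independent of the offset q)
      - 2 * sum_r (m*t)(r) * s(q + r)    (a correlation of m*t with s)
      +     sum_r  m(r)    * s(q + r)^2  (a correlation of m with s^2),
    and both correlations are evaluated for all offsets q at once via FFT.
    """
    m = mask.astype(float)
    mt = m * template
    # Correlation equals convolution with the flipped kernel; 'valid' keeps only
    # the offsets where the template window lies entirely inside the source array.
    corr_ts = fftconvolve(source, mt[::-1, ::-1], mode="valid")
    corr_s2 = fftconvolve(source ** 2, m[::-1, ::-1], mode="valid")
    return (m * template ** 2).sum() - 2.0 * corr_ts + corr_s2
```

Offsets whose window overlaps the hole must still be rejected afterwards, for example by counting hole pixels per window (which a summed area table does cheaply), and the same expansion carries over to 3D spatio-temporal patches.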