Texture Synthesis and Image Inpainting for Video Compression Project Report for CSE-252C
Sanjeev Kumar

Abstract

In this report we investigate the application of texture synthesis and image inpainting techniques to video compression. Working in the non-parametric framework, we use 3D patches for matching and copying. This ensures temporal continuity to some extent, which cannot be obtained by working with individual frames. Since, in the present application, patches might contain arbitrarily shaped and multiple disconnected holes, the fast Fourier transform (FFT) and summed area table based sum of squared differences (SSD) calculation [8] cannot be used directly. We propose a modification of that scheme which allows its use in the present application. This results in a significant gain in efficiency, since the search space is typically huge for video applications.

1. Introduction

Any semblance of order in an observation is a manifestation of redundancy in its present representation. Such redundancies have been exploited for different purposes in innumerable applications, one canonical example being data compression. Specifically, in video compression, motion compensation and transform coding are used to exploit the spatio-temporal redundancy in the video signal. But the types of redundancy exploited by current methods are rather limited. This is exemplified by the success of error concealment techniques. Moreover, the discrete cosine transform (DCT), the most commonly used transform coding technique in video compression, has been found to be effective only for small block sizes, which shows its inability to exploit redundancies extending over larger regions. Wavelets have been relatively more successful in this respect for certain classes of images but have not found much use in general purpose video compression.

It is interesting to compare the aforementioned techniques with texture synthesis and image inpainting, both of which also assume the existence of redundancies in images and video but exploit them for purposes other than compression. The type of redundancy exploited by these techniques, e.g. texture synthesis, is somewhat complementary to that exploited by DCT etc. Texture synthesis works on a more global level. The motivation behind the present work was the perceptual quality of the output of texture synthesis and image inpainting methods, which use only a small amount of original data. Although building a practical video codec based on these principles would require more research effort, the present work is intended to be a small step in that direction.

We build upon the work of [3], which uses a non-parametric texture synthesis approach for image inpainting with an appropriate choice of fill order. The contribution of the present work is twofold. First, we extend the work of [3] to three dimensions, which, we argue, is a more natural setting for video. Second, we extend the work of [8], which allows FFT based SSD calculation for all possible translations of a rectangular patch, to arbitrarily shaped regions, thus making it applicable to image inpainting.

This report is organised as follows. In Section 2 we briefly review the relevant literature. Section 3 describes the proposed approach. Experimental results are presented in Section 4. Some future research directions are discussed in Section 5.

2. Previous work

Related prior work can be broadly classified into the following three categories.

2.1 Texture Synthesis

Texture synthesis has been used in the literature to fill large image regions with a texture pattern similar to a given sample. Methods used for this purpose range from parametric ones, which estimate a parametrized model for the texture and use it for synthesis, e.g. Heeger et al. [2], to non-parametric ones, in which synthesis is based on direct sampling of the supplied texture pattern, e.g. Efros and Leung [1]. Texture synthesis methods have also been used to fill in small holes in an image, which might originate from deterioration of the image or from the desire to remove some objects. However, the aforementioned texture synthesis methods have been found to work poorly for highly structured textures, which are common in natural images. Graphcut based techniques [7] try to maintain structural continuity during texture synthesis by finding an optimal seam while copying texture patches from different portions of the image.

2.2 Image Inpainting

Image inpainting has been used in the literature to fill in small holes in an image by propagating structure information from the image into the region to be filled. These methods typically use diffusion, based on a partial differential equation, to propagate linear structure [5] [9]. They are found to perform well when filling small holes but produce noticeable blur when filling large holes. Recently, Criminisi et al. [3] proposed a method which combines the benefits of texture synthesis and image inpainting. The results of their algorithm compare favorably with other state of the art methods in the field without resorting to texture segmentation or explicit structure propagation, and the algorithm is able to fill in large holes. We primarily build upon their work and investigate some issues which are not so important for static images but become relevant for videos.

2.3 Applications in Video

Application of texture synthesis and image inpainting to video has been investigated by Bornard et al. [6]. While searching for the matching neighborhood, their algorithm also searches in the previous and next frames, apart from the current frame. Although this provides robustness against error, it does not try to maintain temporal continuity, which is very important for video applications.

3. Proposed approach

In this section, we begin with a brief recapitulation of the notation and algorithm presented in [3]. For more details, the reader should refer to the original manuscript.

3.1 Problem Statement and Some Notations

We are given an image or video $I = \Phi \cup \Omega$, where $\Phi$ is the source region, the set of pixels whose value is known, and $\Omega$ is the target region (aka hole), the set of pixels whose value is to be filled. $\delta\Omega$ represents the boundary between the source and target regions, which is a contour in 2D and a surface in 3D. For all boundary pixels $p \in \delta\Omega$, $\Psi_p$ denotes a patch centered at $p$.

3.2 Inpainting in 2D

We assign a confidence term to every pixel on the boundary, which is given by

$$C(p) = \frac{\sum_{q \in \Psi_p \cap (I - \Omega)} C(q)}{|\Psi_p|} \qquad (1)$$

Initially, $C(p) = 0 \;\; \forall p \in \Omega$ and $C(p) = 1 \;\; \forall p \in I - \Omega$. We also assign a data term to every pixel on the boundary, which is given by

$$D(p) = \frac{|\nabla I_p^{\perp} \cdot n_p|}{\alpha} \qquad (2)$$

where $\alpha$ is the maximum value of the intensity range used for the image (e.g. 255), $\nabla I_p$ represents the gradient of the image and $n_p$ represents the normal vector to the region boundary. Based on the data and confidence terms, a priority is assigned to every pixel on the boundary, which is given by $P(p) = C(p) \times D(p)$. The patch centered at the pixel with maximum priority value, $\hat{p} = \arg\max_p P(p)$, is selected as the starting point for filling. A search is performed over all patches $\Psi_q \in \Phi$ for the best matching patch $\Psi_{\hat{q}}$ according to a modified sum of squared differences criterion $d(\Psi_{\hat{p}}, \Psi_q)$, which includes only those pixels of $\Psi_{\hat{p}}$ which are already filled in. The authors of [3] performed the search in CIE Lab color space, but in the present work we only deal with grayscale images. All the algorithms presented here can, however, easily be extended to any color space. Values for all pixels $r \in \Psi_{\hat{p}} \cap \Omega$ are copied from the corresponding pixels in $\Psi_{\hat{q}}$, and confidence terms are copied from $\hat{p}$. The data term is recomputed for pixels on the boundary created by the recent fill, and the above steps are repeated until all the pixels are filled.

The algorithm described so far is exactly the same as that presented in [3]. We now describe the extensions proposed in the present work in the following two subsections.

3.3 Extension to 3D

A naive approach to extending the algorithm described above to video would be to treat the video as a collection of frames or images and perform the same operation on each frame independently. There are two main disadvantages of this approach. First, because of temporal correlation among the frames, a matching patch is also likely to be found in temporally adjacent frames, especially if a portion of the region to be filled (and hence absent) in one frame is present in some other neighboring frame; a per-frame search ignores these candidates. Second, even if every filled frame is spatially coherent, temporal coherence is not ensured, which might result in visible artifacts in the video. For video texture synthesis, 3D spatio-temporal patches were used in [7]. Similarly, we propose to use 3D spatio-temporal patches in the present framework. This promotes temporal consistency in the video for the same reasons that spatial consistency is maintained in the 2D version of the present algorithm.

In order to extend the present algorithm to 3D, we need generalizations of the confidence term of (1) and the data term of (2). Generalization of the confidence term is trivial, since now we just need to sum over pixels in the 3D patch. Generalization of the data term is not that obvious. Although we could still compute the spatio-temporal gradient of the image intensity ($\nabla I_p$) and the normal vector ($n_p$) to the boundary surface, there is no unique perpendicular direction to the gradient of the image intensity in 3D, i.e. $\nabla I_p^{\perp}$ is not unique. We propose the following modified data term using the cross product of vectors.

$$D(p) = \frac{|\nabla I_p \times n_p|}{\alpha} \qquad (3)$$

... search region and for non-masked region can be computed in $O(n \log n)$ time using FFT. However, the summation domain in (5) is not a simple rectangular one but can contain arbitrarily shaped and disconnected holes, and hence the algorithm presented in [8] is not applicable here.
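To illustrate why a masked patch need not rule out FFT based matching, the sketch below uses one standard expansion of the masked SSD; it is not necessarily the modification proposed in this report. With $m$ the indicator of already-filled pixels of the patch $P$ and $t$ a candidate translation, $\sum_x m(x)\,(P(x) - I(x+t))^2 = \sum_x m P^2 - 2 \sum_x (mP)(x)\, I(x+t) + \sum_x m(x)\, I^2(x+t)$, and the last two sums are cross-correlations that can be evaluated for all $t$ at once with FFTs. The names `masked_ssd_map` and `correlate` are hypothetical, and the sketch assumes grayscale data and does not exclude candidate windows that overlap the hole.

```python
import numpy as np

def masked_ssd_map(image, patch, mask):
    """Masked SSD between `patch` and every same-sized window of `image`.

    Only positions where `mask` is 1 (already-filled patch pixels) contribute.
    Works for 2D images and for 3D (t, y, x) volumes alike.
    """
    image = np.asarray(image, dtype=np.float64)
    patch = np.asarray(patch, dtype=np.float64)
    mask = np.asarray(mask, dtype=np.float64)
    shape = image.shape

    def correlate(a, kernel):
        # Circular cross-correlation sum_x kernel(x) * a(x + t) via FFT;
        # the kernel is zero-padded to the image shape by `s=shape`.
        fa = np.fft.rfftn(a, s=shape)
        fk = np.fft.rfftn(kernel, s=shape)
        full = np.fft.irfftn(fa * np.conj(fk), s=shape)
        # Keep only translations whose window lies fully inside the image,
        # which also discards every wrapped-around value.
        valid = tuple(slice(0, n - k + 1) for n, k in zip(shape, kernel.shape))
        return full[valid]

    term_pp = np.sum(mask * patch ** 2)           # independent of translation
    term_pi = correlate(image, mask * patch)      # sum m * P * I(x + t)
    term_ii = correlate(image ** 2, mask)         # sum m * I(x + t)^2
    return term_pp - 2.0 * term_pi + term_ii

# Usage: plant a patch with an arbitrary (possibly disconnected) mask in a
# small random volume and look up the best matching translation.
rng = np.random.default_rng(0)
volume = rng.random((5, 8, 9))
patch = volume[1:4, 2:5, 3:7].copy()
mask = (rng.random(patch.shape) > 0.3).astype(np.float64)
ssd = masked_ssd_map(volume, patch, mask)
best = np.unravel_index(np.argmin(ssd), ssd.shape)
print(best)
```

Because the mask enters only through the two correlation kernels, its shape is unrestricted, and the cost stays at a few FFTs of the search volume regardless of how many holes the patch contains.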