GIF2Video: Color Dequantization and Temporal Interpolation of GIF Images

Yang Wang¹, Haibin Huang²†, Chuan Wang², Tong He³, Jue Wang², Minh Hoai¹
¹Stony Brook University, ²Megvii Research USA, ³UCLA, †Corresponding Author

Abstract

Graphics Interchange Format (GIF) is a highly portable graphics format that is ubiquitous on the Internet. Despite their small sizes, GIF images often contain undesirable visual artifacts such as flat color regions, false contours, color shift, and dotted patterns. In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs in the wild. We focus on the challenging task of GIF restoration by recovering information lost in the three steps of GIF creation: frame sampling, color quantization, and color dithering. We first propose a novel CNN architecture for color dequantization. It is built upon a compositional architecture for multi-step color correction, with a comprehensive loss function designed to handle large quantization errors. We then adapt the SuperSlomo network for temporal interpolation of GIF frames. We introduce two large datasets, namely GIF-Faces and GIF-Moments, for both training and evaluation. Experimental results show that our method can significantly improve the visual quality of GIFs, and outperforms direct baseline and state-of-the-art approaches.

Figure 1. Color quantization and color dithering, two major steps in the creation of a GIF image. These are lossy compression processes that result in undesirable visual artifacts (1. false contour, 2. flat region, 3. color shift, 4. dotted pattern). Our approach is able to remove these artifacts and produce a much more natural image.

1. Introduction

GIFs [1] are everywhere, being created and consumed by millions of Internet users every day. The widespread use of GIFs can be attributed to their high portability and small file sizes. However, due to heavy quantization in the creation process, GIFs often have much worse visual quality than their original source videos. Creating an animated GIF from a video involves three major steps: frame sampling, color quantization, and optional color dithering. Frame sampling introduces jerky motion, while color quantization and color dithering create flat color regions, false contours, color shift, and dotted patterns, as shown in Fig. 1.

In this paper, we propose GIF2Video, the first learning-based method for enhancing the visual quality of GIFs. Our algorithm consists of two components. First, it performs color dequantization for each frame of the animated GIF sequence, removing the artifacts introduced by both color quantization and color dithering. Second, it increases the temporal resolution of the image sequence by using a modified SuperSlomo [20] network for temporal interpolation.

The main effort of this work is to develop a method for color dequantization, i.e., removing the visual artifacts introduced by heavy color quantization. Color quantization is a lossy compression process that remaps original pixel colors to a limited set of entries in a small color palette. This process introduces quantization artifacts, similar to those observed when the bit depth of an image is reduced. For example, when the image bit depth is reduced from 48-bit to 24-bit, the size of the color palette shrinks from $2^{48} \approx 2.8 \times 10^{14}$ colors to $2^{24} \approx 1.7 \times 10^{7}$ colors, leading to a small amount of artifacts. The color quantization process for GIF, however, is far more aggressive, with a typical palette of 256 distinct colors or less. Our task is to perform dequantization from a tiny color palette (e.g., 256 or 32 colors), and it is much more challenging than traditional bit depth enhancement [15, 25, 38].

Of course, recovering all original pixel colors from the quantized image is nearly impossible, thus our goal is to render a plausible version of what the original image might look like. The idea is to collect training data and train a ConvNet [16, 23, 29, 34, 44] to map a quantized image to its original version. It is however difficult to obtain a good dequantization network for a wide range of GIF images. To this end, we propose two novel techniques to improve the performance of the dequantization network. Firstly, we pose dequantization as an optimization problem, and we propose the Compositional Color Dequantization Network (CCDNet), a novel network architecture for iterative color dequantization. Similar to the iterative Lucas-Kanade algorithm [28], this iterative procedure alleviates the problems associated with severe color quantization. Secondly, during training, we consider reconstruction loss and generative adversarial loss [10, 19, 32] on both pixel colors and image gradients. This turns out to be far more effective than a loss function defined on the color values only.
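As a rough illustration of the first idea, the sketch below shows what iterative, compositional color correction could look like. It is a schematic only: the name `correction_net` and the residual update rule are our own illustrative assumptions, not the CCDNet formulation, which the paper specifies in its method section.

```python
import torch

def iterative_dequantize(correction_net, gif_frame, num_steps=3):
    """Schematic of iterative color dequantization (illustrative only).

    correction_net: any network mapping the current estimate plus the
        input GIF frame (concatenated along channels, 6 -> 3 channels)
        to a residual color correction.
    gif_frame: (B, 3, H, W) tensor with values in [0, 1].
    """
    x = gif_frame
    for _ in range(num_steps):
        # Each step refines the previous color estimate while staying
        # conditioned on the original quantized frame, loosely analogous
        # to one iteration of the Lucas-Kanade algorithm.
        x = x + correction_net(torch.cat([x, gif_frame], dim=1))
    return x.clamp(0.0, 1.0)
```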
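The second idea, supervising image gradients as well as pixel colors, can be made concrete with a minimal sketch. Here the L1 form, the helper names, and the weight `w_grad` are illustrative assumptions; the paper's full objective also includes the adversarial terms, which are omitted.

```python
import torch

def image_gradients(x):
    """Horizontal and vertical finite differences of images (B, C, H, W)."""
    dx = x[:, :, :, 1:] - x[:, :, :, :-1]   # horizontal differences
    dy = x[:, :, 1:, :] - x[:, :, :-1, :]   # vertical differences
    return dx, dy

def color_and_gradient_loss(pred, target, w_grad=1.0):
    """L1 reconstruction loss on pixel colors plus image gradients."""
    color_term = torch.mean(torch.abs(pred - target))
    pdx, pdy = image_gradients(pred)
    tdx, tdy = image_gradients(target)
    grad_term = torch.mean(torch.abs(pdx - tdx)) + torch.mean(torch.abs(pdy - tdy))
    return color_term + w_grad * grad_term
```

Matching gradients directly penalizes spurious edges in otherwise smooth regions, which is exactly how false contours manifest.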
Another contribution of the paper is the creation of two large datasets: GIF-Faces and GIF-Moments. Both datasets contain animated GIFs and their corresponding high-quality videos. GIF-Faces is face-centric whereas GIF-Moments is more generic and diverse. Experiments on these two datasets demonstrate that our method can significantly enhance the visual quality of GIFs and reduce all types of artifacts. Our method outperforms its direct baselines as well as existing methods such as False Contour Detection & Removal [15] and Pix2Pix [19].

2. Related Work

False Contour Detection and Removal. Smooth areas of images and video frames should not contain color edges, but false contours are often visible in those areas after color bit-depth reduction or video codec encoding. Several false contour detection and decontouring methods [2, 5, 7, 15, 21, 24, 42] have been proposed to address this problem. Among them, False Contour Detection and Removal (FCDR) [15] is the most recent state-of-the-art approach. It first locates the precise positions of the false contours and then applies dedicated operations to suppress them. However, the color quantization artifacts in GIFs are far more severe, and GIF color dequantization requires more than removing minor false contours produced by bit-depth reduction.

Video Interpolation. Classical video interpolation methods rely on cross-frame motion estimation and occlusion reasoning [3, 4, 14, 18]. However, motion boundaries and severe occlusions are still challenging for existing optical flow estimation methods [6, 9]. Moreover, the flow computation, occlusion reasoning, and frame interpolation are separate steps that are not properly coupled. Drawing inspiration from the success of deep learning in high-level vision tasks [12, 22, 36], many deep models have been proposed for single-frame interpolation [26, 27, 30, 31] and multi-frame interpolation [20]. SuperSlomo [20] is a recently proposed state-of-the-art method for variable-length multi-frame interpolation. We adapt this method for GIF frame interpolation to enhance the temporal resolution of the input GIFs.

3. GIF Generation and Artifacts

The three main steps of creating a GIF from a video are: (1) frame sampling, (2) color quantization, and (3) color dithering. Frame sampling reduces the file size of the resulting GIF, but it also lowers the temporal resolution of the video content. In this section, we provide more details about the color quantization and color dithering processes and the resulting visual artifacts, as seen in Figure 1.

3.1. GIF Color Quantization

The GIF color quantization process takes an input image $I \in \mathbb{R}^{H \times W \times 3}$ and a color palette $C \in \mathbb{R}^{N \times 3}$ of $N$ distinct colors, and produces a color-quantized GIF image $G$. The quantization is computed for each pixel, thus $G$ has the same width and height as the input image. $G_{i,j}$ at pixel $(i,j)$ is simply set to the color in the palette $C$ closest to the input color $I_{i,j}$, i.e., $G_{i,j} = \arg\min_{c \in C} \|I_{i,j} - c\|_2^2$. The color palette $C$ can be optimized with a clustering algorithm to minimize the total quantization error $\|I - G\|_2^2$. Different clustering algorithms are used in practice, but Median Cut [13] is the most popular one due to its computational efficiency.
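To make the quantization step concrete, here is a minimal NumPy sketch of the per-pixel nearest-palette-color mapping defined above. Palette construction (e.g., Median Cut) is assumed to have already produced `palette`; the function name is ours.

```python
import numpy as np

def gif_quantize(image, palette):
    """Quantize an image against a fixed palette.

    image:   (H, W, 3) array of float colors.
    palette: (N, 3) array of N distinct palette colors.
    Returns G, the color-quantized image, where each pixel is the
    palette entry minimizing the squared L2 distance to the input color.
    """
    pixels = image.reshape(-1, 3)                          # (H*W, 3)
    # Squared distance from every pixel to every palette entry.
    d2 = ((pixels[:, None, :] - palette[None, :, :]) ** 2).sum(axis=2)
    nearest = d2.argmin(axis=1)                            # (H*W,)
    return palette[nearest].reshape(image.shape)
```

Note that flat regions fall out of this mapping directly: any set of neighboring pixels whose colors share the same nearest palette entry collapses to a single constant-color component.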
Most of the visual artifacts in GIFs are produced by the color quantization process with a tiny color palette ($N = 256, 32, \ldots$). As illustrated in Figure 1, the three most noticeable types of artifacts are (1) flat regions, (2) false contours, and (3) color shift. We notice that a GIF image has many connected components in which the color values are the same; we refer to these as "flat regions". Flat regions are created because neighboring pixels with similar colors are quantized into the same color bin in the palette. False contours also emerge at the boundaries between flat regions with close color values. This is because the continuity of the color space has been broken, so the color change cannot be gradual. We also notice that the color shift between the input image and the GIF is larger for certain small regions, such as the lips of the baby in Figure 1. This is because the color palette does not spend budget on these small regions even though they have unique, distinct colors.

3.2. GIF Color Dithering

Color quantization with a small color palette yields substantial quantization error and artifacts. Color dithering is a technique that can be used to hide quantization error and alleviate large-scale visual patterns such as false contours in GIFs. The most popular color dithering approach is Floyd-Steinberg dithering [8]. It diffuses the quantization error from every pixel to its neighboring pixels in a feedback process. The dithered GIF uses exactly the same small color palette; however, it appears to have more colors.
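For reference, a minimal NumPy sketch of Floyd-Steinberg error diffusion with the classic 7/16, 3/16, 5/16, 1/16 weights; real encoders add refinements such as serpentine scanning, which are omitted here.

```python
import numpy as np

def floyd_steinberg_dither(image, palette):
    """Dither an (H, W, 3) float image against a fixed (N, 3) palette."""
    out = image.astype(np.float64).copy()
    H, W, _ = out.shape
    for y in range(H):
        for x in range(W):
            old = out[y, x].copy()
            # Quantize the (error-adjusted) pixel to its nearest palette color.
            idx = ((palette - old) ** 2).sum(axis=1).argmin()
            new = palette[idx]
            out[y, x] = new
            err = old - new
            # Diffuse the quantization error to unvisited neighbors.
            if x + 1 < W:
                out[y, x + 1] += err * (7 / 16)
            if y + 1 < H:
                if x > 0:
                    out[y + 1, x - 1] += err * (3 / 16)
                out[y + 1, x] += err * (5 / 16)
                if x + 1 < W:
                    out[y + 1, x + 1] += err * (1 / 16)
    return out
```

Because the diffused error perturbs neighboring pixels before they are themselves quantized, nearby pixels can snap to different palette entries; this breaks up false contours but produces the dotted patterns highlighted in Figure 1.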