GIFnets: Differentiable GIF Encoding Framework
Innfarn Yoo, Xiyang Luo, Yilin Wang, Feng Yang, and Peyman Milanfar
Google Research - Mountain View, California
[innfarn, xyluo, yilin, fengyang, milanfar]@google.com
arXiv:2006.13434v1 [eess.IV] 24 Jun 2020

Abstract

Graphics Interchange Format (GIF) is a widely used image file format. Due to the limited number of palette colors, GIF encoding often introduces color banding artifacts. Traditionally, dithering is applied to reduce color banding, but it introduces dotted-pattern artifacts. To reduce artifacts and provide a better and more efficient GIF encoding, we introduce a differentiable GIF encoding pipeline, which includes three novel neural networks: PaletteNet, DitherNet, and BandingNet. Each of these three networks provides an important functionality within the GIF encoding pipeline. PaletteNet predicts a near-optimal color palette given an input image. DitherNet manipulates the input image to reduce color banding artifacts and provides an alternative to traditional dithering. Finally, BandingNet is designed to detect color banding, and provides a new perceptual loss specifically for GIF images. As far as we know, this is the first fully differentiable GIF encoding pipeline based on deep neural networks and compatible with existing GIF decoders. A user study shows that our algorithm is better than Floyd-Steinberg based GIF encoding.

Figure 1: Comparison of our method against GIF encoding with no dithering and Floyd-Steinberg dithering: (a) Original, (b) Palette (no dithering), (c) Floyd-Steinberg, (d) Our Method. Compared to GIF without dithering (b) and Floyd-Steinberg (c), our method (d) shows fewer banding artifacts as well as fewer dotted noise artifacts. The examples are generated with 16 palette colors.

1. Introduction

GIF is a widely used image file format. At its core, GIF represents an image by applying color quantization. Each pixel of an image is indexed by the nearest color in some color palette. Finding an optimal color palette, which is equivalent to clustering, is an NP-hard problem. A commonly used algorithm for palette extraction is the median-cut algorithm [15], due to its low cost and relatively high quality. Better clustering algorithms such as k-means produce palettes of higher quality, but are much slower, and have O(n²) time complexity [6, 21]. Nearly all classical palette extraction methods involve an iterative procedure over all the image pixels, and are therefore inefficient.

Another issue with GIF encoding is the color banding artifacts brought by color quantization, as shown in Figure 1b. A popular method for suppressing banding artifacts is dithering, a technique which adds small perturbations to an input image to make the color-quantized version of the input image look more visually appealing. Error diffusion is a classical method for dithering, which diffuses quantization errors to nearby pixels [8, 11, 25]. While effective, these error diffusion methods also introduce artifacts of their own, e.g., dotted noise artifacts as shown in Figure 1c. Fundamentally, these methods are not aware of the higher-level semantics in an image, and are therefore incapable of optimizing for higher-level perceptual metrics.

To this end, we propose a differentiable GIF encoding framework consisting of three novel neural networks: PaletteNet, DitherNet, and BandingNet. Our architecture encompasses the entire GIF pipeline, where each component is replaced by one of the networks above. Our motivation for designing a fully differentiable architecture is two-fold. First of all, having a differentiable pipeline allows us to jointly optimize the entire model with respect to any loss function. Instead of designing artifact-removal heuristics by hand, our network can automatically learn strategies to produce more visually appealing images by optimizing for perceptual metrics. Secondly, our design implies that both color quantization and dithering only involve a single forward pass of our network. This is inherently more efficient. Therefore, our method has O(n) time complexity, and is easier to parallelize than the iterative procedures in the traditional GIF pipeline.

Our contributions are the following:

• To the best of our knowledge, our method is the first fully differentiable GIF encoding pipeline. Also, our method is compatible with existing GIF decoders.

• We introduce PaletteNet, a novel method that extracts color palettes using a neural network. PaletteNet shows a higher peak signal-to-noise ratio (PSNR) than the traditional median-cut algorithm.

• We introduce DitherNet, a neural network-based approach to reducing quantization artifacts. DitherNet is designed to randomize errors more on areas that suffer from banding artifacts.

• We introduce BandingNet, a neural network for detecting banding artifacts which can also be used as a perceptual loss.

The rest of the paper is structured as follows: Section 2 reviews the current literature on relevant topics. Section 3 gives a detailed explanation of our method. The experiments are given in Section 4, and conclusions in Section 5.
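For concreteness, the classical per-pixel quantization described in the introduction can be sketched in a few lines: extract a small palette with a simplified median-cut and index each pixel to its nearest palette color. The snippet below is an illustrative sketch only, assuming a NumPy RGB array; the function names and the exact splitting rule are our own simplifications, not the median-cut variant of [15].

```python
import numpy as np

def median_cut_palette(pixels, n_colors=16):
    """Simplified median-cut sketch (not the variant from [15]): repeatedly
    split the pixel box with the widest color range along that channel,
    then use each box's mean color as a palette entry."""
    boxes = [pixels]
    while len(boxes) < n_colors:
        # Pick the box whose widest channel range is largest and split it.
        i = max(range(len(boxes)), key=lambda k: np.ptp(boxes[k], axis=0).max())
        box = boxes.pop(i)
        chan = int(np.argmax(np.ptp(box, axis=0)))   # channel with widest range
        box = box[np.argsort(box[:, chan])]          # sort pixels along that channel
        mid = len(box) // 2
        boxes.extend([box[:mid], box[mid:]])         # split at the median
    return np.array([b.mean(axis=0) for b in boxes])

def quantize_to_palette(image, palette):
    """Index every pixel to its nearest palette color (a hard projection)."""
    flat = image.reshape(-1, 3).astype(np.float32)
    # Pairwise distances between all pixels and all palette entries: O(nN).
    dists = np.linalg.norm(flat[:, None, :] - palette[None, :, :], axis=2)
    indices = dists.argmin(axis=1)
    return palette[indices].reshape(image.shape), indices

# Quantize a random RGB image down to a 16-color palette.
image = np.random.randint(0, 256, (64, 64, 3), dtype=np.uint8)
palette = median_cut_palette(image.reshape(-1, 3).astype(np.float32), n_colors=16)
quantized, indices = quantize_to_palette(image, palette)
```

Even in this simplified form, nearest-color indexing costs O(nN) for n pixels and N palette colors, and palette extraction makes repeated passes over the pixels, which illustrates the inefficiency of classical palette extraction noted above.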
2. Previous Work

We categorize the related literature into four topics: dithering, palette selection, perceptual loss, and other related work.

2.1. Dithering and Error Diffusion

Dithering is a procedure to randomize quantization errors. Floyd-Steinberg error diffusion [11] is a widely used dithering algorithm. The Floyd-Steinberg algorithm sequentially distributes errors to neighboring pixels. Applying blue-noise characteristics to dithering algorithms showed improvement in perceptual quality [25, 34]. Kite et al. [18] provided a model for error diffusion as linear gain with additive noise, and also suggested quantification of edge sharpening and noise shaping. Adaptively changing the threshold in error diffusion reduces quantization artifacts [9].

Halftoning, or digital halftoning, is another type of error diffusion method, representing images as patternized gray pixels. In halftoning, Pang et al. [26] used the structural similarity measure (SSIM) to improve halftoned image quality. Chang et al. [8] reduced the computational cost of [26] by applying precomputed tables. Li and Mould [22] alleviated contrast-varying problems using contrast-aware masks. Recently, Fung and Chan [12] suggested a method to optimize diffusion filters to provide blue-noise characteristics in multiscale halftoning, and Guo et al. [13] proposed a tone-replacement method to reduce banding and noise artifacts.
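As a concrete reference for the error diffusion methods discussed above, the following is a minimal sketch of Floyd-Steinberg dithering on a single-channel image in [0, 1]. The 7/16, 3/16, 5/16, 1/16 weights are the standard Floyd-Steinberg coefficients; the uniform quantizer and the function name are illustrative assumptions, not details taken from [11].

```python
import numpy as np

def floyd_steinberg(image, levels=4):
    """Illustrative sketch of Floyd-Steinberg error diffusion.
    Each pixel is quantized in raster order and its quantization error is
    pushed onto the unprocessed neighbors (right, bottom-left, bottom,
    bottom-right) with weights 7/16, 3/16, 5/16, 1/16."""
    img = image.astype(np.float32).copy()
    h, w = img.shape
    step = 1.0 / (levels - 1)
    for y in range(h):
        for x in range(w):
            old = img[y, x]
            new = np.round(old / step) * step      # nearest quantization level
            img[y, x] = new
            err = old - new
            if x + 1 < w:
                img[y, x + 1] += err * 7 / 16
            if y + 1 < h:
                if x > 0:
                    img[y + 1, x - 1] += err * 3 / 16
                img[y + 1, x] += err * 5 / 16
                if x + 1 < w:
                    img[y + 1, x + 1] += err * 1 / 16
    return np.clip(img, 0.0, 1.0)

# Dither a horizontal gradient down to 4 gray levels.
gradient = np.tile(np.linspace(0.0, 1.0, 256), (64, 1))
dithered = floyd_steinberg(gradient, levels=4)
```

Because each pixel's output depends on errors diffused from previously visited pixels, the scan is inherently sequential; this is the kind of iterative procedure that the proposed single-forward-pass pipeline avoids.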
2.2. Palette Selection

Color quantization involves clustering the pixels of an image into N clusters. One of the most commonly used algorithms for GIF color quantization is the median-cut algorithm [5]. Dekker proposed using Kohonen neural networks for predicting cluster centers [10]. Other clustering techniques such as k-means [6], hierarchical clustering [7], and particle swarm methods [24] have also been applied to the problem of color quantization [30].

Another line of work focuses on making clustering algorithms differentiable. Jang et al. [16] proposed efficient gradient estimators for the Gumbel-Softmax which can be smoothly annealed to a categorical distribution. Oord et al. [35] proposed VQ-VAE, which generates discrete encodings in the latent space. Agustsson et al. [1] provide another differentiable quantization method through annealing. Peng et al. [27] proposed a differentiable k-means network which reformulates and relaxes the k-means objective in a differentiable manner.

2.3. Perceptual Loss

Compared to quantitative metrics, e.g., signal-to-noise ratio (SNR), PSNR, and SSIM, perceptual quality metrics measure human perceptual quality. There have been efforts to evaluate perceptual quality using neural networks. Johnson et al. [17] suggested using the feature layers of convolutional neural networks as a perceptual quality metric. Talebi and Milanfar [33] proposed NIMA, a neural network based perceptual image quality metric. Blau and Michaeli [3] discussed the relationship between image distortion and perceptual quality, and provided comparisons of multiple perceptual qualities and distortions. The authors extended their discussion further in [4] and considered compression rate, distortion, and perceptual quality. Zhang et al. [38] compared traditional image quality metrics such as SSIM or PSNR with deep network-based perceptual quality metrics, and discussed the effectiveness of perceptual metrics.

Figure 2: Traditional GIF pipeline (a) and our GIF encoding pipeline (b). In the traditional pipeline (a), the input image I is projected onto a median-cut palette to form the palette image I', and Floyd-Steinberg error diffusion produces the output image Ȋ. In our pipeline (b), PaletteNet predicts a near-optimal color palette and applies either a soft or hard projection to produce the quantized output image. DitherNet suppresses banding artifacts by randomizing quantization errors via an error image E', and BandingNet provides a perceptual loss to judge the severity of banding artifacts. The pipeline in (b) is trained with losses L2(I, I'), L1(I, Ȋ), L1(E', I - I'), a perceptual loss R(I, Ȋ), and a BandingNet loss B(Ȋ).

Banding is a common compression artifact caused by quantization. There are some existing works [2, 20, 37]

work architectures, such as ResNet [14], U-Net [29], and Inception [31]. ResNet allowed deeper neural network con-