LEARNED LOSSLESS IMAGE COMPRESSION WITH A HYPERPRIOR AND DISCRETIZED GAUSSIAN MIXTURE LIKELIHOODS

Zhengxue Cheng, Heming Sun, Masaru Takeuchi, Jiro Katto

Department of Computer Science and Communications Engineering, Waseda University, Tokyo, Japan.

arXiv:2002.01657v1 [eess.IV] 5 Feb 2020

ABSTRACT

Lossless image compression is an important task in the field of multimedia communication. Traditional image codecs, such as WebP, JPEG2000 and FLIF, typically support a lossless mode. Recently, deep learning based approaches have started to show their potential for this task. The hyperprior is an effective technique proposed for lossy image compression. This paper generalizes the hyperprior from the lossy model to lossless compression, and adds an L2-norm term to the loss function to speed up the training procedure. In addition, this paper investigates different parameterized models for the latent codes, and proposes to use Gaussian mixture likelihoods to achieve adaptive and flexible context models. Experimental results validate that our method can outperform existing deep learning based lossless compression, and outperform JPEG2000 and WebP for JPG images.

Index Terms: Lossless Image Compression, Deep Learning, HyperPrior, Gaussian Mixture Model

1. INTRODUCTION

Image compression has been a fundamental task in the field of signal processing for many decades, enabling efficient transmission and storage. Classical image compression standards, such as JPEG [1], JPEG2000 [2], WebP [3], HEVC/H.265-intra [4] and lossless FLIF [5], usually rely on hand-crafted encoder/decoder (codec) block diagrams. They use a fixed linear transform, quantization and an entropy coder to reduce spatial redundancy in images. Typically they also support a lossless mode to compress images losslessly. However, along with the fast development of new image formats and high-resolution mobile devices, existing image compression algorithms are not expected to be optimal solutions.

Recently, we have seen a great surge of deep learning based image compression approaches. Most of them focus on lossy image compression. For example, some image compression approaches use generative models to learn the distribution of images using adversarial training [6, 7, 8] to achieve impressive subjective quality at extremely low bit rates. Some works, such as [9, 10, 11], use recurrent neural networks to compress the residual information recursively to realize scalable coding. Some approaches propose hyperprior-based and context-adaptive context models to compress codes effectively [12, 13, 14]. Some methods decorrelate each channel of the latent codes and apply deep residual learning to improve the performance [15, 16, 17]. However, deep learning based lossless compression has rarely been discussed. One related work is L3C [18], which proposes a hierarchical architecture with 3 scales to compress images losslessly.

In this paper, we propose a learned lossless image compression method using a hyperprior and discretized Gaussian mixture likelihoods. Our contributions mainly consist of two aspects. First, we generalize the hyperprior from the lossy model to a lossless compression model, and propose a loss function with an L2-norm term for lossless compression to speed up training. Second, we investigate four parameterized distributions and propose to use Gaussian mixture likelihoods for the context model. Experimental results demonstrate that our method can outperform the recent learned compression approach L3C. In addition, our method outperforms JPEG2000 and WebP for JPG images.

2. LEARNED LOSSLESS IMAGE COMPRESSION

2.1. Formulation of Compression with a Hyperprior

According to the transform coding approach [19], image compression can be formulated as

\[
\begin{aligned}
y &= g_a(x; \phi) \\
\hat{y} &= Q(y) \\
\hat{x} &= g_s(\hat{y}; \theta)
\end{aligned}
\tag{1}
\]

where $x$, $\hat{x}$, $y$, and $\hat{y}$ are raw images, reconstructed images, a latent representation before quantization, and compressed codes, respectively. $Q$ represents the quantization, which is approximated by uniform noise $U(-\frac{1}{2}, \frac{1}{2})$ during training; this is denoted as $U|Q$ in Fig. 1. $\phi$ and $\theta$ are the optimized parameters of the analysis and synthesis transforms. The operation diagram is explained in Fig. 1(a).

In the work [12], Ballé proposed a hyperprior for lossy image compression and achieved promising results. The hyperprior introduces side information $z$ to capture spatial dependencies among the elements of $y$, formulated as

\[
\begin{aligned}
z &= h_a(y; \phi_h) \\
\hat{z} &= Q(z) \\
(\mu_y, \sigma_y) &= h_s(\hat{z}; \theta_h)
\end{aligned}
\tag{2}
\]

where $h_a$ and $h_s$ denote the analysis and synthesis functions in the auxiliary autoencoder, and $\phi_h$ and $\theta_h$ are the optimized parameters of the hyperprior, respectively. $\mu_y$ and $\sigma_y$ are generated by the auxiliary autoencoder to model a Gaussian distribution $N(\mu_y, \sigma_y^2)$. Originally, Ballé assumed a zero-mean scalar Gaussian distribution, but in the enhanced work [13], a mean and scale Gaussian distribution achieves better results. We therefore use the mean and scale form in this paper. The operation diagram is explained in Fig. 1(b).

However, both cases are lossy compression. In this paper, we generalize the lossy model to a lossless model as in Fig. 1(c). Instead of predicting only the reconstructed pixel value, we predict a probability model for raw images, and model $x$ using another parameterized distribution. This is feasible because entropy coding techniques such as arithmetic coding [20] can losslessly compress the signals if a probability model is given. The difference between the distributions of lossy and lossless compression is illustrated in Fig. 1(d): lossy compression can be viewed as predicting a delta distribution at the value $\hat{x}$ for each element, while lossless compression predicts a more general and arbitrary probability model. One intuitive choice is to use a Gaussian distribution $N(\mu_x, \sigma_x^2)$, as for $y$, as shown in Fig. 1(d).

Fig. 1. Operational diagrams and distributions of learned lossy and lossless image compression. (a) Lossy model. (b) Lossy model with a hyperprior. (c) Proposed lossless model with a hyperprior. (d) Distributions of lossy and lossless models.

2.2. Proposed Loss Function with a L2-Norm

Unlike lossy compression, which controls the rate-distortion tradeoff, the loss function of lossless compression only consists of the required bits for $x$, $y$ and $z$, that is,

\[
L = E[-\log_2(p_{\hat{x}}(\hat{x} | \mu_x, \sigma_x))] + E[-\log_2(p_{\hat{y}}(\hat{y} | \mu_y, \sigma_y))] + E[-\log_2(p_{\hat{z}}(\hat{z} | \psi))]
\tag{3}
\]

where the probability model of $\hat{x}$ is estimated by the parameters $\mu_x, \sigma_x$, which are in fact conditioned on $\hat{y}$. Similarly, the probability model of $\hat{y}$ is estimated by parameters conditioned on $\hat{z}$, i.e.

\[
p_{\hat{x}}(\hat{x} | \hat{y}) = \prod_i \left( N(\mu_{x_i}, \sigma_{x_i}^2) * U(-\tfrac{1}{2}, \tfrac{1}{2}) \right)(\hat{x}_i)
\tag{4}
\]

\[
p_{\hat{y}}(\hat{y} | \hat{z}) = \prod_i \left( N(\mu_{y_i}, \sigma_{y_i}^2) * U(-\tfrac{1}{2}, \tfrac{1}{2}) \right)(\hat{y}_i)
\tag{5}
\]

\[
p_{\hat{z}}(\hat{z} | \psi) = \prod_i \left( p_{z_i | \psi_i}(\psi_i) * U(-\tfrac{1}{2}, \tfrac{1}{2}) \right)(\hat{z}_i)
\tag{6}
\]

where $\psi$ are learnable factorized distributions [12] for $z$, because there is no prior for $z$.

In principle, Eq. (3) is sufficient to train a neural network for lossless compression. However, in our experiments we found that convergence is very slow. From the distributions in Fig. 1(d), we can see that the actual marginal distribution of $x$ is typically not identical to the estimated entropy model $p_{\hat{x}}$, especially in the initial training stage of the neural networks. It takes quite a long time to find a proper and accurate distribution that is centered at the actual $x$ value. Therefore, we introduce an L2-norm term between the ground-truth value and the predicted mean value into lossless compression, that is,

\[
L' = \|\mu_x - \hat{x}\|^2 + \|\mu_y - \hat{y}\|^2
\tag{7}
\]

Then the updated loss function for lossless image compression is defined as

\[
L = E[-\log_2(p_{\hat{x}}(\hat{x} | \hat{y}))] + E[-\log_2(p_{\hat{y}}(\hat{y} | \hat{z}))] + E[-\log_2(p_{\hat{z}}(\hat{z} | \psi))] + \lambda \cdot \left( \|\mu_x - \hat{x}\|^2 + \|\mu_y - \hat{y}\|^2 \right)
\tag{8}
\]

where $\lambda$ controls the weight of the L2-norm term.

2.3. Proposed Discretized Gaussian Mixture Likelihoods

Whether the parameterized distribution model fits the marginal distribution of $x$ and $\hat{y}$ is a key factor for the performance. We visualize the marginal distributions of all sub-pixel values $x$ and compressed codes $\hat{y}$ for kodim02 from the Kodak dataset [27] in Fig. 3; they do not actually satisfy the assumption that they follow a Gaussian distribution. Previous works have investigated several parameterized distribution models, but to our knowledge, which distribution model best fits the true marginal distribution is still an open question. For example, the standard PixelCNN [21] uses full 256-way softmax likelihoods, which is memory-consuming and impractical for large images. PixelCNN++ [22] proposed discretized logistic mixture likelihoods to achieve faster training. L3C [18] followed PixelCNN++ and used a logistic mixture model. The lossy compression work [12] assumed a univariate Gaussian distribution. The work [23] used a Laplace distribution. The traditional coding work [24] used a Cauchy distribution to model coefficients.

Fig. 2. Network architecture of lossless image compression.

Fig. 3. Marginal distributions for image kodim02. (a) Sub-pixel values $x$. (b) Compressed codes $\hat{y}$.

Fig. 4. Different distributions with zero-mean and one-scale.

We investigate all the above models, including the Gaussian, Cauchy, Laplace and Logistic distributions. We substitute each of them for $\hat{y}$ individually. Examples of their distribution models are visualized in Fig. 4, where we clip the ranges and add the tail masses to edge [...] parameterized probabilities from a univariate Gaussian model to a multivariate Gaussian mixture model as

\[
\begin{aligned}
p_{\hat{y}}(\hat{y} | \hat{z}) &= \prod_i p_{\hat{y}}(\hat{y}_i | \hat{z}) \\
p_{\hat{y}}(\hat{y}_i | \hat{z}) &= \left( \sum_{k=1}^{K} w_{y_i}^{(k)} N(\mu_{y_i}^{(k)}, \sigma_{y_i}^{2(k)}) * U(-\tfrac{1}{2}, \tfrac{1}{2}) \right)(\hat{y}_i)
\end{aligned}
\tag{9}
\]

where the $k$-th mixture component is a Gaussian distribution characterized by three parameters, i.e. weights $w_i^{(k)}$, means $\mu_i^{(k)}$ and variances $\sigma_i^{2(k)}$ for each $i$-th element. The mixture models for $x$ are constructed in the same way as for $y$.
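As an illustration of Eq. (9), the following is a minimal sketch of how the probability mass of one integer-valued element could be evaluated. It relies on the fact that convolving a Gaussian with U(-1/2, 1/2) and evaluating at an integer equals the difference of the Gaussian CDF at the value plus and minus 1/2. The function names, the `eps` floor and the single-Gaussian helper `rate_plus_l2` (one rate term plus the squared-error term of Eq. (7), summed over elements rather than averaged) are assumptions made for this example and are not taken from the paper; the tail-mass handling at clipped range edges is omitted.

```python
import math


def gaussian_cdf(v, mu, sigma):
    """CDF of N(mu, sigma^2) evaluated at v."""
    return 0.5 * (1.0 + math.erf((v - mu) / (sigma * math.sqrt(2.0))))


def discretized_gmm_prob(value, weights, means, sigmas):
    """Mass of integer `value` under a K-component Gaussian mixture
    convolved with U(-1/2, 1/2), in the spirit of Eq. (9).

    Assumes `weights` are non-negative and sum to 1, and `sigmas` are
    standard deviations (not variances).
    """
    if not (len(weights) == len(means) == len(sigmas)):
        raise ValueError("weights, means and sigmas must have equal length")
    total = 0.0
    for w, mu, sigma in zip(weights, means, sigmas):
        upper = gaussian_cdf(value + 0.5, mu, sigma)
        lower = gaussian_cdf(value - 0.5, mu, sigma)
        total += w * (upper - lower)
    return total


def code_length_bits(value, weights, means, sigmas, eps=1e-9):
    """Ideal code length -log2 p(value); eps is a hypothetical floor
    that avoids log(0) for values far in the tails."""
    p = discretized_gmm_prob(value, weights, means, sigmas)
    return -math.log2(max(p, eps))


def rate_plus_l2(values, mus, sigmas, lam):
    """Illustrative single-Gaussian (K = 1) rate term plus
    lam * sum((mu - value)^2), i.e. one rate term of Eq. (8) with the
    matching squared-error term of Eq. (7)."""
    bits = 0.0
    sq_err = 0.0
    for v, mu, sigma in zip(values, mus, sigmas):
        bits += code_length_bits(v, [1.0], [mu], [sigma])
        sq_err += (mu - v) ** 2
    return bits + lam * sq_err


if __name__ == "__main__":
    # Made-up parameters for one element with K = 2 components.
    w, m, s = [0.3, 0.7], [-2.0, 3.5], [1.0, 2.0]
    for v in (-2, 0, 3):
        print(v, discretized_gmm_prob(v, w, m, s), code_length_bits(v, w, m, s))
```

Because the masses of neighbouring integers telescope, they sum to one over the integer line, so the values returned by `discretized_gmm_prob` can serve directly as the probability table handed to an arithmetic coder. In a trained model the weights, means and scales would be produced per element by the network rather than fixed by hand as in the example.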