Learning Scalable ℓ∞-constrained Near-lossless Image Compression via Joint Lossy Image and Residual Compression

Yuanchao Bai1, Xianming Liu1,2,∗ Wangmeng Zuo1,2, Yaowei Wang1, Xiangyang Ji3
1Peng Cheng Laboratory, 2Harbin Institute of Technology, 3Tsinghua University
{baiych,wangyw}@pcl.ac.cn, {csxm,wmzuo}@hit.edu.cn, [email protected]
∗Corresponding author

Abstract

We propose a novel joint lossy image and residual compression framework for learning ℓ∞-constrained near-lossless image compression. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy a given tight ℓ∞ error bound. Supposing that the error bound is zero, i.e., lossless image compression, we formulate the joint optimization problem of compressing both the lossy image and the original residual in terms of variational auto-encoders and solve it with end-to-end training. To achieve scalable compression with an error bound larger than zero, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual, instead of training multiple networks. We further correct the bias of the derived probability model caused by the context mismatch between training and inference. Finally, the quantized residual is encoded according to the bias-corrected probability model and is concatenated with the bitstream of the compressed lossy image. Experimental results demonstrate that our near-lossless codec achieves state-of-the-art performance for lossless and near-lossless image compression, and achieves competitive PSNR with a much smaller ℓ∞ error compared with lossy image codecs at high bit rates.

1. Introduction

Image compression is a ubiquitous technique in computer vision. For certain applications with stringent demands on image fidelity, such as medical imaging or image archiving, the most reliable choice is lossless image compression. However, the compression ratio of lossless compression is upper-bounded by Shannon's source coding theorem [36], and is typically between 2:1 and 3:1 for practical lossless image codecs [50, 47, 37, 14, 5, 38]. To improve compression performance while keeping the reliability of the decoded images, ℓ∞-constrained near-lossless image compression has been developed [6, 15, 2, 47, 51] and standardized in traditional codecs, e.g., JPEG-LS [47] and CALIC [51]. Different from lossy image compression with Peak Signal-to-Noise Ratio (PSNR) or Multi-Scale Structural SIMilarity index (MS-SSIM) [45, 46] distortion measures, ℓ∞-constrained near-lossless image compression requires the maximum reconstruction error of each pixel to be no larger than a given tight numerical bound. Lossless image compression is the special case of near-lossless image compression in which the tight error bound is zero.

With the fast development of deep neural networks (DNNs), learning-based lossy image compression [40, 3, 39, 41, 34, 24, 4, 31, 27, 22, 8, 7, 23, 25, 29] has achieved tremendous progress over the last four years. Most recent methods for lossy image compression adopt a variational auto-encoder (VAE) architecture [17, 18] based on transform coding [11], where the rate is modeled by the entropy of the quantized latent variables and the reconstruction distortion is measured by PSNR or MS-SSIM. Through end-to-end rate-distortion optimization, the state-of-the-art methods, such as [7], can achieve performance comparable with the latest compression standard Versatile Video Coding (VVC) [32]. However, this transform coding scheme cannot be directly employed for near-lossless image compression, because it is difficult for DNN-based transforms to satisfy a tight bound on the maximum reconstruction error of each pixel, even without quantization. Invertible transforms, such as integer discrete flows [13] or wavelet-like transforms [25], are possible solutions to lossless compression but not to general near-lossless compression.
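To see why the ℓ∞ criterion is stricter than PSNR in practice, consider the following toy NumPy illustration (our own sketch, not from the paper; the synthetic images and the helper names `psnr` and `linf_error` are ours): a reconstruction whose errors are almost everywhere within ±1 still attains a high PSNR even when a single pixel is off by 40, which would violate any tight bound τ.

```python
import numpy as np

def psnr(x, x_hat, peak=255.0):
    """Peak signal-to-noise ratio for 8-bit images (an average distortion)."""
    mse = np.mean((x.astype(np.float64) - x_hat.astype(np.float64)) ** 2)
    return 10.0 * np.log10(peak ** 2 / mse)

def linf_error(x, x_hat):
    """Maximum absolute reconstruction error over all pixels and channels."""
    return int(np.max(np.abs(x.astype(np.int64) - x_hat.astype(np.int64))))

rng = np.random.default_rng(0)
x = rng.integers(0, 256, size=(64, 64, 3), dtype=np.uint8)
x[0, 0, 0] = 100  # pin one pixel so the outlier below is exact

# Reconstruction with +/-1 noise everywhere, plus one large outlier error.
x_hat = np.clip(x.astype(np.int64) + rng.integers(-1, 2, size=x.shape), 0, 255)
x_hat[0, 0, 0] = 140
x_hat = x_hat.astype(np.uint8)

print(f"PSNR: {psnr(x, x_hat):.1f} dB")       # high PSNR (small average error)
print(f"l_inf error: {linf_error(x, x_hat)}")  # 40: far above any tight tau
```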
In this paper, we propose a new joint lossy image and residual compression framework for learning near-lossless image compression, inspired by the traditional "lossy plus residual" coding scheme [10, 26, 2]. Specifically, we obtain a lossy reconstruction of the raw image through lossy image compression and uniformly quantize the corresponding residual to satisfy the given ℓ∞ error bound τ. Supposing that the error bound τ is zero, i.e., lossless image compression, we formulate the joint optimization problem of compressing both the lossy image and the original residual in terms of VAEs [17, 18], and solve it with end-to-end training. Note that our VAE model is novel, different from the transform coding based VAEs [3, 39, 4] for lossy image compression alone or the bits-back coding based VAEs [42, 19] for lossless image compression.

To achieve scalable near-lossless compression with error bound τ > 0, we derive the probability model of the quantized residual by quantizing the learned probability model of the original residual at τ = 0, instead of training multiple networks. Because residual quantization leads to a context mismatch between training and inference, we further propose a bias correction scheme to correct the bias of the derived probability model. An arithmetic coder [48] is adopted to encode the quantized residual according to the bias-corrected probability model. Finally, the near-lossless compressed image is stored as the bitstreams of both the encoded lossy image and the quantized residual.
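To make this derivation concrete, here is a minimal sketch (ours, not the authors' released code) of collapsing a learned probability mass function over integer residuals at τ = 0 into one over quantized residuals at τ > 0: each quantization bin simply accumulates the mass of all residual values mapped into it. We assume the classical uniform quantizer with bin size 2τ + 1 from traditional near-lossless coding (e.g., JPEG-LS), a natural choice given the bound above, though this section does not spell out the paper's exact quantizer.

```python
import numpy as np

def quantize_residual(r, tau):
    """Uniform residual quantizer with bin size 2*tau + 1 (the classical
    near-lossless quantizer); it satisfies |r - quantize_residual(r)| <= tau."""
    step = 2 * tau + 1
    return np.sign(r) * ((np.abs(r) + tau) // step) * step

def derive_quantized_pmf(pmf, tau, rmax=255):
    """Turn a pmf over integer residuals in [-rmax, rmax] into a pmf over
    quantized residuals by accumulating the mass of each quantization bin."""
    support = np.arange(-rmax, rmax + 1)
    bins = quantize_residual(support, tau)
    q_pmf = {}
    for r_hat, p in zip(bins, pmf):
        q_pmf[int(r_hat)] = q_pmf.get(int(r_hat), 0.0) + float(p)
    return q_pmf

# Example with a made-up peaky pmf standing in for the learned tau = 0 model:
support = np.arange(-255, 256)
pmf = np.exp(-np.abs(support) / 4.0)
pmf /= pmf.sum()
q_pmf = derive_quantized_pmf(pmf, tau=2)
print(abs(sum(q_pmf.values()) - 1.0) < 1e-9)  # probability mass is preserved
```

Because this aggregation is a cheap post-processing of the τ = 0 model, a single trained network can serve every error bound.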
Our main contributions are summarized as follows:
• We propose a joint lossy image and residual compression framework to realize learning-based lossless and near-lossless image compression. The framework is interpretable as a VAE model and can be optimized end-to-end.
• We realize scalable compression by deriving the probability model of the quantized residual from the learned probability model of the original residual, instead of training multiple networks. A bias correction scheme further improves the compression performance.
• Our codec achieves state-of-the-art performance for lossless and near-lossless image compression, and achieves competitive PSNR with a much smaller ℓ∞ error compared with lossy image codecs at high bit rates.

2. Related Work

Learning-based Lossy Image Compression. Early learning-based methods [40, 41] for lossy image compression are based on recurrent neural networks (RNNs). Following [3, 39], most recent learned methods [34, 24, 4, 31, 27, 22, 8, 7, 23, 25, 29] adopt convolutional neural networks (CNNs) and can be interpreted as VAEs [17, 18] based on transform coding [11]. In our joint lossy image and residual compression framework, we take advantage of the advanced transforms (network structures), quantization and entropy coding techniques of existing learning-based methods for lossy image compression.

Learning-based Lossless Image Compression. Given the strong connections between lossless compression and unsupervised learning, auto-regressive models [33, 43, 35], flow models [13], bits-back coding based VAEs [42, 19] and other specific models [28, 30, 25] have been introduced to approximate the true distribution of raw images for entropy coding. Previous work [30] uses the traditional BPG lossy image codec to further compress residuals, which is a special case of our framework. Beyond [30], our lossy image compressor and residual compressor are jointly optimized through end-to-end training and are interpretable as a VAE model.

Near-lossless Image Compression. Previous methods for near-lossless image compression can basically be divided into three categories: 1) pre-quantization: adjusting raw pixel values to satisfy the ℓ∞ error bound, and then compressing the pre-processed images with lossless image compression, e.g., near-lossless WebP [14]; 2) predictive coding: predicting subsequent pixels based on previously encoded pixels, quantizing the prediction residuals to satisfy the ℓ∞ error bound, and finally compressing the quantized residuals, e.g., [6, 15], near-lossless JPEG-LS [47] and near-lossless CALIC [51]; 3) lossy plus residual coding: similar to 2), but replacing the predictive coder with a lossy image coder, where both the lossy image and the quantized residual are encoded, e.g., in [2].

In this paper, we propose a joint lossy image and residual compression framework for learning-based near-lossless image compression, inspired by lossy plus residual coding. To the best of our knowledge, we propose the first deep-learning-based near-lossless image codec. Recently, CNN-based soft-decoding methods [52, 53] have been proposed to improve the reconstruction performance of near-lossless CALIC [51]. However, these methods belong to image reconstruction rather than image compression.

3. Scalable Near-lossless Image Compression

3.1. Overview of Compression Framework

Given a tight ℓ∞ bound τ ∈ {0, 1, 2, ...}, near-lossless codecs compress a raw image subject to the following distortion constraint:

D_nll(x, x̂) = ‖x − x̂‖_∞ = max_{i,c} |x_{i,c} − x̂_{i,c}| ≤ τ    (1)

where x and x̂ are a raw image and its near-lossless reconstructed counterpart. x_{i,c} and x̂_{i,c} are the pixels of x and x̂, respectively; i denotes the i-th spatial position in 2D raster-scan order, and c denotes the c-th channel.

In order to realize the ℓ∞ error bound (1), we propose a near-lossless image compression framework that integrates lossy image compression and residual compression. We first obtain a lossy reconstruction x̃ of the raw image x through lossy image compression. Lossy image compression methods, whether traditional [44, 37, 14, 5] or learned [24, 4, 31, 27, 22, 8, 7, 23, 25], can achieve high PSNR at relatively low bit rates, but it is difficult for these methods to ensure a tight error bound τ on each pixel of x̃. We then compute the residual r = x − x̃ and suppose that r is quantized to r̂, as sketched below.
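As a concrete instantiation of this step (again our own sketch, under the same assumed uniform quantizer with bin size 2τ + 1), quantizing the residual and adding it back to the lossy reconstruction yields a final image x̂ = x̃ + r̂ that provably meets the bound of Eq. (1): every residual value moves to a bin representative at most τ away.

```python
import numpy as np

def near_lossless_reconstruct(x, x_tilde, tau):
    """Quantize the residual r = x - x_tilde with bin size 2*tau + 1 and
    return the near-lossless reconstruction x_hat = x_tilde + r_hat."""
    r = x.astype(np.int64) - x_tilde.astype(np.int64)
    step = 2 * tau + 1
    r_hat = np.sign(r) * ((np.abs(r) + tau) // step) * step
    # Clipping to the valid pixel range can only shrink the error,
    # since the raw pixels themselves lie in [0, 255].
    x_hat = np.clip(x_tilde.astype(np.int64) + r_hat, 0, 255)
    return x_hat.astype(np.uint8), r_hat

rng = np.random.default_rng(1)
x = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
# Stand-in for the output of any lossy codec:
x_tilde = np.clip(
    x.astype(np.int64) + rng.integers(-8, 9, size=x.shape), 0, 255
).astype(np.uint8)

for tau in [0, 1, 2, 4]:  # tau = 0 reduces to lossless reconstruction
    x_hat, _ = near_lossless_reconstruct(x, x_tilde, tau)
    assert int(np.max(np.abs(x.astype(np.int64) - x_hat.astype(np.int64)))) <= tau
```

Larger τ merges more residual values into each bin, so the quantized residual has lower entropy and costs fewer bits to encode, which is the rate-fidelity trade-off the scalable codec exposes.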