How to Exploit the Transferability of Learned Image Compression to Conventional Codecs
Jan P. Klopp (National Taiwan University, [email protected])
Keng-Chi Liu (Taiwan AI Labs, [email protected])
Liang-Gee Chen (National Taiwan University, [email protected])
Shao-Yi Chien (National Taiwan University, [email protected])

Abstract

Lossy image compression is often limited by the simplicity of the chosen loss measure. Recent research suggests that generative adversarial networks have the ability to overcome this limitation and serve as a multi-modal loss, especially for textures. Together with learned image compression, these two techniques can be used to great effect when relaxing the commonly employed tight measures of distortion. However, convolutional neural network-based algorithms have a large computational footprint. Ideally, an existing conventional codec should stay in place, ensuring faster adoption and adherence to a balanced computational envelope.

As a possible avenue to this goal, we propose and investigate how learned image coding can be used as a surrogate to optimise an image for encoding. A learned filter alters the image to optimise a different performance measure or a particular task. Extending this idea with a generative adversarial network, we show how entire textures are replaced by ones that are less costly to encode but preserve a sense of detail.

Our approach can remodel a conventional codec to adjust for the MS-SSIM distortion measure, with over 20% rate improvement and without any decoding overhead. On task-aware image compression, we perform favourably against a similar but codec-specific approach.

1. Introduction

Forming the basis for video compression, image compression's efficiency is vital to many data transmission and storage applications. Most compression algorithms are lossy, i.e. they do not reproduce the original content exactly but allow for deviations that reduce the coding rate. Lossy compression optimises the objective

    L = R + λD        (1)

where R and D stand for rate and distortion, respectively, and λ controls their weight relative to each other. In practice, computational efficiency is another constraint, as at least the decoder needs to process high resolutions in real time under a limited power envelope, typically necessitating dedicated hardware implementations. Requirements for the encoder are more relaxed, often allowing even offline encoding without demanding real-time capability.
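In learned compression, both terms of Equation (1) are differentiable and can be minimised directly by gradient descent. The following PyTorch-style snippet is a minimal sketch of that setup, our illustration rather than code from the paper; `analysis`, `synthesis` and `entropy_model` are hypothetical modules standing in for a learned codec's components.

```python
import torch
import torch.nn.functional as F

def rate_distortion_loss(x, analysis, synthesis, entropy_model, lam):
    """Differentiable L = R + lambda * D for one batch of images x (NCHW)."""
    y = analysis(x)                            # latent code
    y_hat = y + (torch.round(y) - y).detach()  # straight-through quantisation
    x_hat = synthesis(y_hat)                   # reconstruction
    # Rate: expected code length in bits per pixel, taken as the negative
    # log-likelihood of the quantised latents under the entropy model.
    p = entropy_model(y_hat)
    num_pixels = x.shape[0] * x.shape[2] * x.shape[3]
    rate = -torch.log2(p).sum() / num_pixels
    # Distortion: plain MSE here; swapping in MS-SSIM or a task loss is what
    # makes learned codecs easy to re-target to other measures.
    distortion = F.mse_loss(x_hat, x)
    return rate + lam * distortion
```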
Recent research has developed along two lines. On the one hand, existing coding technologies such as H264 [41] and H265 [35] have evolved, culminating in the most recent AV1 codec. On the other hand, inspired by the success of deep learning in computer vision, several learned image compression approaches based on the variational autoencoder [20] have been developed. Through careful modelling of the latent code's symbol probabilities, these learned codecs have recently been shown to outperform the x265 codec on various performance measures [30]. This trades engineering effort for complexity: training a neural network image codec is simple compared to designing a conventional codec with all its bells and whistles. However, the resulting neural decoder requires on the order of 10^5 operations per pixel, whereas a conventional codec needs at most a few hundred, making it several orders of magnitude more efficient. This is an important factor, given how much video and imagery is consumed on mobile devices.

Another advantage of learned compression is the ability to be easily adapted to different distributions or distortion measures. By merely collecting data and choosing an appropriate distortion loss, a new codec can quickly be trained without having to manually re-engineer coding tools or internal parameters.

In this work, we take a step towards leveraging the adaptability of learned codecs for conventional codecs without changing the conventional codec itself. In doing so, we enable higher compression performance without touching the decoder, meaning that existing endpoint decoder implementations do not require changes.

In lossy compression, adaptability has two aspects: first, dropping information that is hard to encode and of little importance and, second, exploiting expensive redundancies in the data. The latter would require either reorganising the data or changing the coding tools a codec can apply at encoding time. The former, though, is feasible, as the data can be changed before the encoder maps it to its coded representation.

We demonstrate three possible applications of this scheme:

1. We apply a filtering mechanism to siphon out information that would drive up the coding cost but is of little concern with respect to the chosen loss function. By introducing a learned codec as a surrogate gradient provider, conventional codecs can be refitted to the MS-SSIM distortion measure at over 20% rate improvement.

2. We apply the approach to task-specific compression, where we perform favourably on image classification without having to model the target codec specifically.

3. We pair the surrogate-induced filter with a generative adversarial network to alter an image beyond optimisation for simple distortion measures. Textures are replaced with perceptually similar alternatives that have a shorter code representation and thereby survive the coding process while preserving the crispness of the original impression.

By simply manipulating the input data, we alter the way the fixed-function codec behaves. We present evaluations across different datasets and codecs and show examples of how introducing GANs can lead to distinctly sharper-looking images at low coding rates.

2. Learning to Filter with a Surrogate Codec

Current coding mechanisms are most often instances of transform coding, where the data is first transformed, then quantised in the transform domain and finally encoded losslessly using a variant of arithmetic coding. Conventional codecs use a cosine or sine transform, while learned codecs replace it with a convolutional neural network. This has the distinct advantage that the filters of the neural network can be adapted to leave out information that has low marginal coding gain, i.e. information that is costly to encode and brings only little benefit as measured by a given loss function. Table 1 shows which functionality is invoked by different coding tools.

Table 1. Comparing different coding mechanisms as to which part of a lossy coding pipeline they influence.

                             Encoder                    Decoder
    Approach             Remove   Capture   Code     Rec.   Post-
                         Inform.  Redund.                   Filter
    ---------------------------------------------------------------
    Conv. Codec             ✓        ✓        ✓        ✓      ✓
    Learn. Codec            ✓        ✓        ✓        ✓      ✗
    Denoising               ✓                                 ✓
    Inpainting [12, 34]     ✓                          ✓
    Ours / [37]             ✓

In codecs with fixed transforms, this process needs to be emulated manually, which typically happens by varying the quantisation granularity so that high-frequency transform coefficients are quantised more coarsely than low-frequency ones. Different rate-distortion trade-offs are then achieved by adjusting the granularity for all coefficients simultaneously. Omitting information happens implicitly through the quantisation process. This, however, requires the transforms to separate information according to its importance so that varying the quantisation granularity has the desired effect. The general underlying idea is that low-frequency content is the most important and that importance decreases towards higher frequencies. While this adequately distinguishes noise from actual information for some content, it does not hold in general. Hence, more adaptive filters are desirable; they are, however, difficult to adjust because their influence on the coding behaviour is unknown.
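To make the fixed-transform mechanism concrete, here is a small JPEG-style sketch (our illustration; the paper itself targets codecs such as H265): an 8×8 DCT followed by a frequency-dependent quantisation matrix, with a single scale factor adjusting the granularity of all coefficients simultaneously.

```python
import numpy as np
from scipy.fft import dctn, idctn

# JPEG's standard luminance quantisation matrix: step sizes grow with
# frequency, so high-frequency coefficients are quantised more coarsely.
Q_LUMA = np.array([
    [16, 11, 10, 16,  24,  40,  51,  61],
    [12, 12, 14, 19,  26,  58,  60,  55],
    [14, 13, 16, 24,  40,  57,  69,  56],
    [14, 17, 22, 29,  51,  87,  80,  62],
    [18, 22, 37, 56,  68, 109, 103,  77],
    [24, 35, 55, 64,  81, 104, 113,  92],
    [49, 64, 78, 87, 103, 121, 120, 101],
    [72, 92, 95, 98, 112, 100, 103,  99],
], dtype=np.float64)

def code_block(block, scale=1.0):
    """Transform-code one 8x8 block: DCT -> quantise -> dequantise -> IDCT.

    `scale` varies the granularity of all coefficients simultaneously,
    moving along the rate-distortion curve; information is dropped
    implicitly in the rounding step."""
    coeffs = dctn(block - 128.0, norm="ortho")
    q = np.round(coeffs / (Q_LUMA * scale))        # implicit information removal
    return idctn(q * (Q_LUMA * scale), norm="ortho") + 128.0
```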
Omitting unimportant information helps to significantly reduce the size of compressed multimedia data. For an optimal codec, it would be desirable to design the data-filtering process with the distortion measure on the one hand and the transformation and coding procedure on the other in mind. Machine-learning-based codecs can do this implicitly because all the required elements are differentiable. Their optimisation procedure allows the filters to adapt to the input data, the distortion measure, and the arithmetic coding procedure. For conventional codecs, this process is realised through extensive engineering and by relying on local optimisation of the code structure, for example, how to partition the image into transform blocks or which prediction modes to choose. This makes it difficult to turn a codec into a differentiable function and obtain a gradient with respect to the input image.

In the machine learning literature, this has been addressed as black-box function optimisation, for example through the REINFORCE learning algorithm [42] or the learning-to-learn approaches [15, 3, 11]. These methods rely on an approximation of the unknown function to be optimised, though. Filtering of less important information is likely …

Figure 1. Structural overview of our method. The goal is to obtain a trained filter Î = f^F(I; θ^F) that optimises the input image I for encoding by a target codec. This target is typically not differentiable, so a surrogate codec is used instead: it provides a differentiable rate estimate g^R(Î; θ^C) and a reconstruction g^D(Î; θ^C), yielding the surrogate gradient ∂D^T(I, g^D(Î; θ^C))/∂Î + λ^T ∂g^R(Î; θ^C)/∂Î. For the reconstruction there are two options, as indicated by the switch: the first is to take the surrogate's reconstruction, the second to invoke the target codec during the forward pass. The gradient flows back accordingly, either through the surrogate's reconstruction or directly into the filter. The target distortion measure D^T can be chosen freely. The surrogate codec is pre-trained with a distortion measure similar to that of the target codec so as to imitate its behaviour. At testing time, the filter is applied to the image before it is encoded by the target codec.
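Read as pseudocode, Figure 1 suggests a training loop along the following lines. The sketch below is our PyTorch-style illustration under stated assumptions, not the authors' implementation: `filter_net` plays f^F, `surrogate` is a frozen pre-trained learned codec returning the differentiable rate estimate g^R and reconstruction g^D, `distortion` is D^T, and the switch is set to the surrogate-reconstruction path.

```python
import torch

def train_filter_step(I, filter_net, surrogate, distortion, lam_T, optimizer):
    """One gradient step on the filter f^F; the surrogate codec stays frozen.

    `optimizer` is assumed to hold only filter_net's parameters, so the
    surrogate merely provides gradients and is never updated itself."""
    I_hat = filter_net(I)            # Î = f^F(I; θ^F), the image to be encoded
    rate, recon = surrogate(I_hat)   # g^R(Î; θ^C), g^D(Î; θ^C)
    # D^T(I, g^D(Î)) + λ^T · g^R(Î), matching the surrogate gradient in Fig. 1.
    loss = distortion(I, recon) + lam_T * rate
    optimizer.zero_grad()
    loss.backward()                  # gradient flows through the frozen
    optimizer.step()                 # surrogate into the filter's weights
    return loss.item()

# At test time only the filter is kept: the unmodified target codec
# (e.g. an H265 encoder) simply compresses filter_net(I) instead of I.
```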