Arxiv:2101.00973V1 [Eess.IV] 30 Dec 2020 Ternet Users Mainly Communicated in the Form of Plain Text, Reduce the Data Size [8]
Total Page:16
File Type:pdf, Size:1020Kb
TOWARDS ROBUST DATA HIDING AGAINST (JPEG) COMPRESSION: A PSEUDO-DIFFERENTIABLE DEEP LEARNING APPROACH Chaoning Zhang∗, Adil Karjauv∗, Philipp Benz∗, In So Kweon [email protected], [email protected], [email protected], [email protected] ABSTRACT in the compressed form, such as JPEG or MPEG. This kind Data hiding is one widely used approach for protecting au- of lossy operation can easily destroy the hidden information thentication and ownership. Most multimedia content like for most existing data hiding approaches [1, 2]. Thus, robust images and videos are transmitted or saved in the compressed data hiding in multimedia that can resist non-malicious attack, form. This kind of lossy compression, such as JPEG, can de- lossy compression, has become a hot research direction. stroy the hidden data, which raises the need of robust data Recently, deep learning has also shown large success in hiding. It is still an open challenge to achieve the goal of data hiding data in images [3, 4, 5, 6, 7]. For improving the robust- hiding that can be against these compressions. Recently, deep ness against JPEG compression, the SOTA approaches need learning has shown large success in data hiding, while non- to include the JPEG compression in the training process [6, 7]. differentiability of JPEG makes it challenging to train a deep The challenge of this task is that JPEG compression is non- pipeline for improving robustness against lossy compression. differentiable. Prior works have either roughly simulated the The existing SOTA approaches replace the non-differentiable JPEG compression [6] or carefully designed the differentiable parts with differentiable modules that perform similar oper- function to mimic every step of the JPEG compression [7]. ations. Multiple limitations exist: (a) large engineering ef- Compared with the rough simulation approach [6], the care- fort; (b) requiring a white-box knowledge of compression at- fully designed mimicing approach has achieved significant tacks; (c) only works for simple compression like JPEG. In performance [7]. However, it still has several limitations: this work, we propose a simple yet effective approach to ad- (a) mimicking the JPEG compression requires large amount dress all the above limitations at once. Beyond JPEG, our of engineering effort, (b) it is compression specific, i.e. it approach has been shown to improve robustness against var- only works for JPEG compression, while not effective against ious image and video lossy compression algorithms. Code: other compression, (c) it requires a full white-box knowledge https://github.com/mikolez/Robust_JPEG. of the attack method. In practice, the attacker might have some publicly available tool. In this work, we demonstrate Index Terms— robust data hiding, lossy compression, that a simple approach can be adopted for alleviating all the pseudo-differentialble above limitations at once. 1. INTRODUCTION 2. RELATED WORK The Internet has revolutionized the world development in the sense that it has become the de facto most convenient and 2.1. Lossy Data compression cost-effective remote communication medium [1]. Early in- In information technology, data compression is widely used to arXiv:2101.00973v1 [eess.IV] 30 Dec 2020 ternet users mainly communicated in the form of plain text, reduce the data size [8]. The data compression techniques can and in recent years, images or even videos have gradually be- be mainly divided into lossless and lossy ones [8]. Lossless come the dominant multimedia content for communication. techniques have no distortion effect on the encoded image, However, those images are vulnerable to security attacks in- thus it is not relevant to robust data hiding. We mainly sum- cluding unauthorized modification and sharing. Data hiding marize the lossy compression for images and videos. Overall, has been established as a promising and competitive approach lossy compression is a class of data encoding methods that for authentication and ownership protection [2]. discards partial data. Without noticeable artifacts, it can re- Social media has become indispensable part for most of us duce the data size significantly, which facilitates storing and who are active on the FaceBook, Twitter, Youtube for sharing transmitting digital images. JPEG was first introduced in multimedia content like photos or videos. For reducing the 1992 and has become the de facto most widely used compres- bandwidth or traffic to faciliate the transmission, most photos sion technique for images [9]. JPEG compression involves and videos on the social media (or internet in general) are two major steps: DCT transform and quantization. For color ∗Equal contribution images, color transform is also often adopted as a standard pre-processing step [9]. Even though humans cannot eas- that “true” JPEG is non-differentiable, the authors propose to ily identify the difference between original images and com- “differentiate” the JPEG compression with two approxima- pressed images, this lossy compression can have significant tion methods, i.e. JPEG-Mask and JPEG-Drop, to remove the influence on the deep neural networks. JPEG2000 [10] is an- high frequency content in the images. We refer the readers other famous lossy compression that has higher visual quality to [6] for more details. Even though JPEG-Mask and JPEG- with less mosaic artifacts. In contrast to JPEG, JPEG2000 Drop also remove the high frequency content as the “true” is based on Discrete Wavelet Transform [11]. Concurrent to JPEG, their behavior is still quite far from the “true” JPEG, JPEG2000, another variant of DWT based lossy compres- resulting in over-fitting and poor performance when tested un- sion, i.e. progressive file format Progressive Graphics File der the “true” JPEG. To minimize the gap approximation and (PGF) [12] has also come out. In 2010, Google developed a “true” JPEG, more recently, [7] proposed a new approach new image format called WebP, which is also based on DCT to carefully simulate the important steps in the “true” JPEG. as JPEG but outperforms JPEG in terms of compression ra- Similar approach for simulating JPEG has also been proposed tio [13]. For video lossy compression, MPEG [14] is the most in [25] for generating JPEG-resistant adversarial examples. widely adopted approach. 3. PSEUDO-DIFFERENTIABLE JPEG AND BEYOND 2.2. Traditional robust data hiding Since the advent of image watermarking, numerous works Robust data hiding requires imperceptibility and robustness. have investigated robust data hiding to improve its impercep- The imperceptibility is achieved minimizing its distortion on tibility, robustness as well as hiding capacity. Early works the cover image. In this work, we aim to improve robustness have mainly explored manipulating the cover image pixels against lossy compression, such as JPEG, while maintaining directly, i.e. embedding the hidden data in the spatial do- its imperceptibility. Specifically, we focus on deep learning main, for which hiding data in least significant bits (LSBs) based approach through addressing the challenge that lossy can be seen as a classical example. For improving robustness, compression is non-differentiable. Note that lossy compres- the trend has shifted to hiding in a transform domain, such sion by default is always non-differentiable due to many fac- as Discrete Cosine Transform (DCT) [15], Discrete Wavelet tors, out of which quantization is one of the widely known Transform [11], or a mixture of them [16, 17]. For achiev- factors. ing a good balance between imperceptibility and robustness, adaptive data hiding has emerged to embed the hidden data based on the cover image local features [18, 19]. Traditional 3.1. Why do we focus on lossy compression? data hiding methods often require the practitioners to have A intentional attacker can adopt various techniques, such as a good understanding of a wide range of relevant expertise. nosing, filtering or compression, to remove the hidden data. Deep learning based approaches automated the whole pro- In practice, however, unauthorized sharing or modication is cess, which significantly facilitates its wide use. Moreover, often done without any intentional attacks. Instead, the social recent works have demonstrated that deep learning based ap- media platform often performs non-intentional attack through proaches have achieved competitive performance against tra- lossy compression, while it is very unusual that a media plat- ditional approaches in terms of capacity, robustness, and se- form would intentionally add noise to the images. Thus, ad- curity [6, 7]. dressing the robustness against lossy compression has high practical relevance. Moreover, the existing approaches al- 2.3. Deep learning based robust data hiding ready have a good solution to other methods. Beyond the success in a wide range of applications, deep learning has shown success in data hiding [20]. The trend 3.2. Deep Learning based robust data hiding SOTA has shifted from adopting DNNs for a certain stage of the framework whole data hiding pipeline [21, 22, 23] to utilizing DNNs for the whole pipeline in an end-to-end manner [3]. [24] has Before introducing the proposed method for improving ro- shown the possibility of hiding a full image in a image, which bustness against lossy compression, we first summarize the has been extended to hiding video in video in [4, 5]. Hiding deep learning based robust data hiding framework adopted by binary data with adversarial learning has also been proposed SOTA approaches [6, 7] in Fig 1. Straightforwardly, it has in [3]. The above deep learning approaches do not take ro- two major components: encoding network for hiding the data bustness into account and the hidden data can easily get de- in the cover image and decoding network for extracting the stroyed by common lossy compression, such as JPEG. For hidden data from the encoded image. Additionally it has an improving its robustness to JPEG, [6] have proposed to in- auxiliary component, i.e. attack layer (also called as “noise clude “noise layer” that simulates the JPEG to augment the layer” in [6]), necessary for improving its robustness against encoded images in the training.