Session: Video & Audio Steganography IH&MMSec ’19, July 3–5, 2019, Paris, France

Adaptive VP8 Steganography Based on Deblocking Filtering

Pei Xie, Hong Zhang∗, Weike You, Xianfeng Zhao, Jianchang Yu
State Key Laboratory of Information Security, Institute of Information Engineering, Chinese Academy of Sciences, Beijing 100093, China
School of Cyber Security, University of Chinese Academy of Sciences, Beijing 100093, China

Yi Ma
Beijing Information Technology Institute, Beijing 100094, China

∗ Corresponding author

ABSTRACT
In this paper, a novel deblocking filtering-based VP8 steganographic scheme is proposed. What distinguishes this work from the prior art is that it exploits the characteristics of deblocking filtering: the secret message is embedded by comparing the quantized discrete cosine transform coefficients before and after the in-loop filtering. In the encoding process, given one frame, we first encode it to obtain the quantized discrete cosine transform coefficients. Second, a new set of coefficients is obtained by re-encoding the filtered frame. Third, the distortion function is defined from the difference between the two sets of coefficients. Finally, adaptive embedding is realized by using the syndrome-trellis codes. Experimental results show that satisfactory levels of visual quality and steganographic security can be achieved with adequate payloads.

CCS CONCEPTS
• Information systems → Multimedia streaming;

KEYWORDS
VP8, Video steganography, Deblocking filtering

ACM Reference Format:
Pei Xie, Hong Zhang, Weike You, Xianfeng Zhao, Jianchang Yu, and Yi Ma. 2019. Adaptive VP8 Steganography Based on Deblocking Filtering. In ACM Information Hiding and Multimedia Security Workshop (IH&MMSec ’19), July 3–5, 2019, Paris, France. ACM, New York, NY, USA, 6 pages. https://doi.org/10.1145/3335203.3335711

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than ACM must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from ACM.
IH&MMSec ’19, July 3–5, 2019, Paris, France
© 2019 Association for Computing Machinery. ACM ISBN 978-1-4503-6821-6/19/07. . . $15.00
https://doi.org/10.1145/3335203.3335711

1 INTRODUCTION
Steganography is the art of covert communication, which sends secret messages under the camouflage of innocent-looking cover media, such as digital images and videos, without arousing any suspicion. With the development of computer processing power and network transmission speed, more and more digital videos have appeared in social networks such as Twitter and YouTube. At the same time, video coding technology is continually evolving (Figure 1). According to the 2016 Global Media Formats Report [1], H.264 [19], WebM [2] and HEVC [17] are the most mainstream video coding standards or containers. The video stream in a WebM container is compressed using the VP8 video coding standard. VP8, a video coding standard sponsored by Google, is supported by the HTML5 video tag in browsers including Chrome. Compared to other video coding standards, VP8 has high compression efficiency and low computational complexity for decoding [4].

According to the embedding domain, video steganography can be divided into two categories: spatial-domain video steganography [7, 15] and compressed-domain video steganography. The embedding elements of compressed video include motion vectors [6, 21, 23], prediction modes [10–12, 22], quantization parameters [16, 20] and transformed coefficients.

There are many steganographic algorithms based on the transformed coefficients. Ma et al. [14] proposed an H.264 video steganographic method based on Quantized Discrete Cosine Transform (QDCT) coefficients. Several paired coefficients of a 4 × 4 DCT


coefficient block are exploited for data embedding and distortion adjustment purposes. Lin et al. [13] proposed an improved error propagation-free DCT-based perturbation scheme. Chang et al. [8] ported the algorithms of [13, 14] to H.265/HEVC video, and argued that their proposed algorithm has a larger payload under low bitrate conditions. Cao et al. [5] analyzed the reasons why the distortion minimization framework cannot be applied, and made the first attempt to achieve content-adaptive DCT-based H.264 steganography in intra-frames.

Figure 1: Development of video coding standards

This paper focuses on deblocking filtering-based video steganography for the VP8 standard, for two reasons. On the one hand, with the emergence of real-time video services such as video telephony and video surveillance, the H.264/AVC standard cannot meet the requirements of fast video coding. On the other hand, adaptive DCT-based video steganography in H.264/AVC [5] has already reached a plateau in terms of empirical security, whereas the situation in VP8 is quite different.

The main contribution of this paper is the proposal of the first adaptive DCT-based steganography in VP8, which considers the difference between the coefficients of the original frame and those of its filtered version resulting from deblocking filtering. In the encoding process, given one frame, first, a set of QDCT coefficients is obtained through transform and quantization. Second, another set of QDCT coefficients is obtained by re-encoding the filtered frame. Third, the two sets of coefficients are compared and the distortion is calculated. Finally, the secret messages are embedded by using the Syndrome-Trellis Codes (STCs) [9], while minimizing the overall embedding impact.

The rest of the paper is organized as follows. The principles of VP8 and in-loop filtering are introduced in Section 2. The algorithm implementation is explained in Section 3. Experimental results are shown in Section 4. Finally, the paper is concluded in Section 5.

2 PRELIMINARIES
In this section, first, the coding principle of VP8 is introduced. Then we discuss how in-loop filtering removes blocking artifacts.

2.1 VP8
The coding components of VP8 are basically the same as those of other video coding standards: intra/inter prediction, transform, quantization, entropy coding and in-loop filtering (Figure 2, Figure 3). In its details, however, VP8 has many unique technical features, such as advanced predictive coding based on virtual reference frames, macroblock-based multi-threading technology, and advanced entropy coding of adaptive complexity. VP8 can achieve a high compression rate and low-complexity decoding at the same time, providing higher quality video with less bandwidth and less computation. These technical features are mainly intended to adapt VP8 to its application environment, the Internet, and especially to wireless networks with poor stability and unstable signals.

Figure 2: Diagram of VP8 encoding

Figure 3: Diagram of VP8 decoding


2.2 In-loop Filtering
There are three main reasons for blocking artifacts. First, the transform and quantization of each macroblock are independent. Therefore, the size and statistical characteristics of the quantization error introduced by each macroblock are independent of each other, which results in discontinuities at adjacent block boundaries. Second, the prediction residual signal in motion compensation produces a numerical discontinuity at the macroblock boundary. Third, there is a boundary discontinuity in the reference image. The mainstream video coding standards use in-loop filters to deblock.

The in-loop filter of VP8 is similar to that of H.264, with some differences. First, it has two modes: fast mode and normal mode. The fast mode is simpler than the H.264 filter. It only filters the luminance boundary and does not filter the chrominance boundary. The normal mode is more complicated than the H.264 filter. Second, when filtering between macroblocks, the VP8 filter uses a longer filter than it does for filtering inside a macroblock. In order to speed up encoding and decoding, the VP8 standard can select different filtering modes according to the prediction mode or reference frame used to encode each macroblock. VP8 can also apply different deblocking filter strengths to different parts of the image (Figure 4).

Figure 4: In-loop filter strength adaptive to image content

3 PROPOSED APPROACH
In this section, the definition of distortion, the proposed video steganographic algorithm and data extraction are described. In the proposed algorithm, each frame in the video sequence is encoded twice. The distortion is obtained by comparing the coefficients of the original frame with those of the filtered frame, and the embedding of information is achieved using STCs [9].

3.1 Distortion Definition
The calculation of the distortion function is performed as follows. First, a coefficient matrix is obtained by encoding the original frame. Second, the filtered frame is obtained after in-loop filtering. Third, another coefficient matrix is obtained by re-encoding the filtered frame. Finally, the distortion of each element is obtained through a defined distortion function.
• Step 1: In the process of VP8 video encoding, given one frame $F_{w\times h}$, a residual matrix $R_{w\times h}$ is obtained by intra/inter prediction, i.e.,
\[ R_{w\times h} = F_{w\times h} - \mathrm{Pred}_{w\times h} \tag{1} \]
where $w$ denotes the width of the frame, $h$ represents the height of the frame, and $\mathrm{Pred}_{w\times h}$ indicates the matrix of predicted values obtained by intra/inter prediction.
• Step 2: The coefficient matrix $C_{w\times h}$ is obtained by performing transform and quantization.
\[ C_{w\times h} = \begin{pmatrix} c_{1,1} & \cdots & c_{w,1} \\ \vdots & \ddots & \vdots \\ c_{1,h} & \cdots & c_{w,h} \end{pmatrix} \tag{2} \]
• Step 3: Another residual matrix $R'_{w\times h}$ is obtained by intra/inter prediction
\[ R'_{w\times h} = F'_{w\times h} - \mathrm{Pred}'_{w\times h} \tag{3} \]
where $F'_{w\times h}$ is the filtered frame resulting from the in-loop deblocking filtering described in Section 2.
• Step 4: Another coefficient matrix $C'_{w\times h}$ is obtained by transform and quantization.
Based on $C_{w\times h}$ and $C'_{w\times h}$, the distortion function is defined as
\[ \gamma_{i,j} = F(c_{i,j}, c'_{i,j}) = \frac{1}{\left| c_{i,j} - c'_{i,j} \right|} \tag{4} \]
where $1 \le i \le w$, $1 \le j \le h$. In equation (4), $\gamma_{i,j}$ represents the cost of modifying the $(i, j)$-th coefficient. As seen in (4), the larger the difference between $c_{i,j}$ and $c'_{i,j}$, the smaller the distortion assigned to that coefficient.

3.2 Practical Implementation
3.2.1 Data Embedding. Without loss of generality, the process of embedding into one single frame is described as follows (Figure 5).
• Step 1: In the process of VP8 video encoding, given one frame $F_{w\times h}$, a residual matrix $R_{w\times h}$ is obtained by equation (1). A coefficient matrix $C_{w\times h}$ is then obtained by transform and quantization.
• Step 2: The residual matrix $R_{w\times h}$ is added to the predicted values $\mathrm{Pred}_{w\times h}$. In-loop filtering is performed on the result to get another frame $F'_{w\times h}$ with blocking artifacts removed.


• Step 3: Another coefficient matrix $C'_{w\times h}$ is obtained by re-encoding $F'_{w\times h}$.
• Step 4: The distortion at each position is calculated from the corresponding elements of the two matrices $C_{w\times h}$ and $C'_{w\times h}$ according to equation (4).
• Step 5: The coefficients of $C_{w\times h}$ are arranged into a sequence in a predefined order.
• Step 6: We denote the cover sequence as $x = (x_1, x_2, \ldots, x_n)$, where $x_k$ is the least significant bit of the value at the $k$-th position in the sequence of $C_{w\times h}$ and $n = w \times h$ is the cover length. Given a relative payload $\alpha$, an $\alpha n$-bit message $m$ is embedded by modifying $x$ into $y$.
• Step 7: The coefficient matrix $C_{w\times h}$ is modified according to $y$. If the least significant bit of the $(i, j)$-th coefficient $c_{i,j}$ is equal to the current stego bit $y_k$ to be embedded, $c_{i,j}$ does not change. Otherwise, $c_{i,j}$ is modified: if $c'_{i,j}$ is greater than $c_{i,j}$, $c_{i,j}$ is increased by one; if $c'_{i,j}$ is less than $c_{i,j}$, $c_{i,j}$ is decreased by one.
\[ c''_{i,j} = \begin{cases} c_{i,j}, & (c_{i,j} \,\&\, 1) = y_k \\ c_{i,j} + 1, & (c_{i,j} \,\&\, 1) \neq y_k \ \text{AND}\ c_{i,j} < c'_{i,j} \\ c_{i,j} - 1, & (c_{i,j} \,\&\, 1) \neq y_k \ \text{AND}\ c_{i,j} > c'_{i,j} \end{cases} \tag{5} \]
where $c''_{i,j}$ denotes the $(i, j)$-th element of the finally modified coefficient matrix $C''_{w\times h}$.
\[ C''_{w\times h} = \begin{pmatrix} c''_{1,1} & \cdots & c''_{w,1} \\ \vdots & \ddots & \vdots \\ c''_{1,h} & \cdots & c''_{w,h} \end{pmatrix} \tag{6} \]

Figure 5: Generic structure of data embedding

At the same time, the information embedding cost $D$ is minimized.
\[ D(C_{w\times h}, C''_{w\times h}, \Gamma_{w\times h}) = \sum_{i=1}^{w} \sum_{j=1}^{h} \gamma_{i,j} \cdot \varphi(c_{i,j}, c''_{i,j}) \tag{7} \]
\[ \varphi(x, y) = \begin{cases} 0, & x = y \\ 1, & x \neq y \end{cases} \tag{8} \]
where $\Gamma_{w\times h}$ is the cost matrix and $\gamma_{i,j}$ represents the cost of changing $c_{i,j}$ to $c''_{i,j}$.
\[ \Gamma_{w\times h} = \begin{pmatrix} \gamma_{1,1} & \cdots & \gamma_{w,1} \\ \vdots & \ddots & \vdots \\ \gamma_{1,h} & \cdots & \gamma_{w,h} \end{pmatrix} \tag{9} \]

3.2.2 Data Extraction. Compared to the embedding process, message extraction is much easier (Figure 6). From the received frame, the coefficients obtained by entropy decoding are arranged into a sequence in the same order as in the embedding method, and the least significant bits of the coefficients form the stego sequence $y = (y_1, y_2, \ldots, y_n)$. The parity-check matrix $H$ is determined by the initial setting. The secret message is then obtained as $H \cdot y = m$, computed modulo 2.

Figure 6: Generic structure of data extraction

4 EXPERIMENTS
4.1 Experimental Settings
In our experiments, the proposed algorithm was implemented with the open source VP8 codec [3], which is created by the WebM project sponsored by Google. To implement our proposed method, one of the good STCs listed in [9] is used, with the constraint height h (not to be confused with the frame height) set to 7 and the payload α set to 1/2, 1/4 and 1/8 respectively. The test set comprises 20 standard CIF sequences stored in the YUV420 format, as shown in Figure 7. The number of frames varies from 90 to 300. All sequences are compressed by the standard encoder, referred to as STD, to produce the class of clean videos. For our proposed algorithm, all sequences are compressed while embedding random secret messages to create the class of stego videos with comparable embedding capacities.
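The per-coefficient rules of Section 3 that any such implementation has to realize can be illustrated with a minimal Python sketch. It is not the authors' code: the function names, the NumPy arrays standing in for the QDCT matrices, the toy parity-check matrix and the choice of an infinite cost where $c_{i,j} = c'_{i,j}$ (where equation (4) would divide by zero and equation (5) defines no direction) are all assumptions made for illustration. The stego bits are taken as given here; in the paper they are chosen by the STCs [9], which this sketch does not implement.

```python
import numpy as np


def distortion(c, c_f):
    """Cost of equation (4): 1 / |c - c'| for each coefficient.

    Assumption: positions with c == c' get an infinite cost, since
    equation (4) is undefined there.
    """
    diff = np.abs(c.astype(np.int64) - c_f.astype(np.int64)).astype(float)
    cost = np.full(diff.shape, np.inf)
    np.divide(1.0, diff, out=cost, where=diff > 0)
    return cost


def modify(c, c_f, y):
    """Equation (5): where LSB(c) differs from the stego bit, move c one
    step toward the filtered-frame coefficient c'.

    If c == c' the step is zero and c is left unchanged (assumption;
    the infinite cost above is meant to keep a coder from asking for it).
    """
    c = c.astype(np.int64)
    c_f = c_f.astype(np.int64)
    mismatch = (c & 1) != y
    step = np.sign(c_f - c)
    return np.where(mismatch, c + step, c)


def total_cost(c, c_stego, cost):
    """Equations (7)-(8): sum of the costs of the changed coefficients."""
    changed = c != c_stego
    return float(np.sum(cost[changed]))


def extract_message(H, y):
    """Section 3.2.2: m = H . y over GF(2), for a given binary matrix H."""
    return (H @ y) % 2


# Toy 2 x 2 example (values are hypothetical).
c = np.array([[4, -3], [7, 0]])      # coefficients of the original frame
c_f = np.array([[6, -3], [5, 1]])    # coefficients of the re-encoded filtered frame
y = np.array([[1, 1], [1, 0]])       # stego bits, assumed already chosen

cost = distortion(c, c_f)
c_stego = modify(c, c_f, y)
H = np.array([[1, 0, 0, 1], [0, 1, 1, 1]])   # toy parity-check matrix
print(cost)
print(c_stego)
print(total_cost(c, c_stego, cost))
print(extract_message(H, (c_stego & 1).ravel()))
```

In this toy case only the top-left coefficient has a least significant bit that disagrees with its stego bit, so it alone is moved one step toward its filtered counterpart and contributes its cost to $D$.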


4.2 Impacts on Coding Performance
The impact of embedding on video coding performance is evaluated from two aspects, i.e., the visual quality and the compression efficiency achieved, which are measured by Peak Signal-to-Noise Ratio (PSNR) and compressed file size (KB) respectively (Table 1).

Table 1: Test results of some used sequences. (SN (Sequence Name), FN (Frame Number), EM (Embedding Method), Bitrate (Kb/s), Payload (%), PSNR (dB), FS (File Size (KB)))

SN         FN   EM    Bitrate  Payload  PSNR   FS
akiyo      300  STD    787.65    0.00   48.02   987
                Ours   786.59   25.00   48.00   986
                Ours   787.02   12.50   47.99   987
aspen      300  STD   1066.06    0.00   39.39  1336
                Ours  1066.76   25.00   39.41  1336
                Ours  1065.81   12.50   39.33  1332
container  300  STD    979.40    0.00   43.37  1227
                Ours   975.65   25.00   43.34  1222
                Ours   975.76   12.50   42.27  1223
crew       300  STD    983.06    0.00   39.62  1232
                Ours   987.50   25.00   39.64  1237
                Ours   986.36   12.50   39.62  1236
mobile     300  STD    975.37    0.00   33.40  1222
                Ours   977.64   25.00   33.35  1225
                Ours   976.60   12.50   33.40  1224
stefan      90  STD    991.83    0.00   36.07   373
                Ours   990.06   25.00   36.05   374
                Ours   982.21   12.50   36.00   370
tempete    260  STD    999.76    0.00   35.64  1086
                Ours  1002.99   25.00   35.62  1089
                Ours  1003.30   12.50   35.62  1090
washdc     300  STD    975.48    0.00   42.99  1222
                Ours   973.15   25.00   42.96  1219
                Ours   976.63   12.50   43.03  1224

Figure 7: The YUV sequences used in our experiments

4.3 Steganalysis
4.3.1 Steganalytic Features. In order to evaluate the steganographic security of our algorithm, the state-of-the-art steganalytic method [18] is chosen for benchmarking. The data set is divided into two broad categories according to the encoding parameter that is fixed: Quantization Parameter (QP) or bitrate. In the VP8 standard, the value of QP has a certain range, that is, 4 ≤ QP ≤ 63.

4.3.2 Training and Classification. To measure the steganographic security level of the proposed method, in each run, 12 pairs of compressed sequences (cover and stego) are randomly selected for training, and the remaining 8 pairs are left for testing.

4.3.3 Steganalytic Results. After each run, the True Negative Rates (TNR) and True Positive Rates (TPR) are computed by counting the numbers of detections in the test sets. The average results of 30 runs are recorded in Table 2. It is observed that, at the considered embedding strengths, the steganalytic features used cannot reliably detect the proposed method.

Table 2: Average steganalysis results. (Payload (%), Bitrate (Kb/s), TNR (True Negative Rates (%)), TPR (True Positive Rates (%)), AR (Accuracy Rates (%)))

Setting        Payload  TNR    TPR    AR
QP 22          25.00    41.04  59.94  50.49
               50.00    52.72  47.02  49.87
QP 32          25.00    50.95  49.18  50.07
               50.00    50.59  49.22  49.91
QP 42          25.00    52.86  47.19  50.03
               50.00    53.25  46.61  49.90
Bitrate 500    25.00    50.86  49.01  49.94
               50.00    53.04  46.51  49.77
Bitrate 1000   25.00    46.36  53.71  50.03
               50.00    47.75  52.38  50.07
Bitrate 1500   25.00    52.98  46.88  49.93
               50.00    51.82  48.53  50.17

5 CONCLUSIONS AND FUTURE WORK
In this paper, a novel deblocking filtering-based VP8 video steganographic scheme is proposed. The method exploits the QDCT coefficients in the VP8 encoding process for steganography. In addition, we design a distortion function for QDCT coefficients


considering the difference between the coefficients before and after in-loop filtering. Experimental results show that satisfactory levels of coding performance and security are achieved with adequate payloads.

As part of our future work, the proposed embedding scheme will be further optimized by testing different distortion functions and embedding structures. Meanwhile, further steganalysis is to be carried out under more complicated steganalytic models to assess security. In addition, the application scope is to be extended to the VP9 standard.

ACKNOWLEDGMENTS
This work was supported by NSFC under 61802393, U1636102, U1736214 and 61872356, National Key Technology R&D Program under 2016QY15Z2500 and 2016YFB0801003, and Project of Beijing Municipal Science & Technology Commission under Z181100002718001.

REFERENCES
[1] 2016. 2016 Global Media Formats Report. http://www.encoding.com/wp-content/uploads/FormatReport-2016-2.pdf.
[2] 2019. About WebM. https://www.webmproject.org/about/.
[3] 2019. libvpx source code. https://chromium.googlesource.com/webm/libvpx/.
[4] Jim Bankoski, Paul Wilkins, and Yaowu Xu. 2011. Technical overview of VP8, an open source video codec for the web. In Multimedia and Expo (ICME), 2011 IEEE International Conference on. IEEE, 1–6.
[5] Yun Cao, Yu Wang, Xianfeng Zhao, Meineng Zhu, and Zhoujun Xu. 2018. Cover Block Decoupling for Content-Adaptive H.264 Steganography. In Proceedings of the 6th ACM Workshop on Information Hiding and Multimedia Security. ACM, 23–30.
[6] Yun Cao, Xianfeng Zhao, Fenghua Li, and Nenghai Yu. 2013. Video steganography with multi-path motion estimation. In Media Watermarking, Security, and Forensics 2013, Vol. 8665. International Society for Optics and Photonics, 86650K.
[7] Ozdemir Cetin and A Turan Ozcerit. 2009. A new steganography algorithm based on color histograms for data embedding into raw video streams. Computers & Security 28, 7 (2009), 670–682.
[8] Po-Chun Chang, Kuo-Liang Chung, Jiann-Jone Chen, Chien-Hsiung Lin, and Tseng-Jung Lin. 2014. A DCT/DST-based error propagation-free data hiding algorithm for HEVC intra-coded frames. Journal of Visual Communication and Image Representation 25, 2 (2014), 239–253.
[9] Tomáš Filler, Jan Judas, and Jessica Fridrich. 2010. Minimizing embedding impact in steganography using trellis-coded quantization. In Media Forensics and Security II, Vol. 7541. International Society for Optics and Photonics, 754105.
[10] Yang Hu, Chuntian Zhang, and Yuting Su. 2007. Information hiding based on intra prediction modes for H.264/AVC. In Multimedia and Expo, 2007 IEEE International Conference on. IEEE, 1231–1234.
[11] Sung Min Kim, Sang Beom Kim, Youpyo Hong, and Chee Sun Won. 2007. Data hiding on H.264/AVC compressed video. In International Conference Image Analysis and Recognition. Springer, 698–707.
[12] Ke Liao, Shiguo Lian, Zhichuan Guo, and Jinlin Wang. 2012. Efficient information hiding in H.264/AVC video coding. Telecommunication Systems 49, 2 (2012), 261–269.
[13] Tseng-Jung Lin, Kuo-Liang Chung, Po-Chun Chang, Yong-Huai Huang, Hong-Yuan Mark Liao, and Chiung-Yao Fang. 2013. An improved DCT-based perturbation scheme for high capacity data hiding in H.264/AVC intra frames. Journal of Systems and Software 86, 3 (2013), 604–614.
[14] Xiaojing Ma, Zhitang Li, Hao Tu, and Bochao Zhang. 2010. A data hiding algorithm for H.264/AVC video streams without intra-frame distortion drift. IEEE Transactions on Circuits and Systems for Video Technology 20, 10 (2010), 1320–1330.
[15] Jarno Mielikainen. 2006. LSB matching revisited. IEEE Signal Processing Letters 13, 5 (2006), 285–287.
[16] Tamer Shanableh. 2012. Data hiding in MPEG video files using multivariate regression and flexible macroblock ordering. IEEE Transactions on Information Forensics and Security 7, 2 (2012), 455–464.
[17] Gary J Sullivan, Jens-Rainer Ohm, Woo-Jin Han, and Thomas Wiegand. 2012. Overview of the high efficiency video coding (HEVC) standard. IEEE Transactions on Circuits and Systems for Video Technology 22, 12 (2012), 1649–1668.
[18] Peipei Wang, Yun Cao, Xianfeng Zhao, and Meineng Zhu. 2017. A Steganalytic Algorithm to Detect DCT-based Data Hiding Methods for H.264/AVC Videos. In Proceedings of the 5th ACM Workshop on Information Hiding and Multimedia Security. ACM, 123–133.
[19] Thomas Wiegand and Gary J Sullivan. 2007. The H.264/AVC video coding standard [Standards in a Nutshell]. IEEE Signal Processing Magazine 24, 2 (2007), 148–153.
[20] KokSheik Wong and Kiyoshi Tanaka. 2008. A data hiding method using Mquant in MPEG domain. The Journal of the Institute of Image Electronics Engineers of Japan 37, 3 (2008), 256–267.
[21] Changyong Xu, Xijian Ping, and Tao Zhang. 2006. Steganography in compressed video stream. In Innovative Computing, Information and Control, 2006. ICICIC’06. First International Conference on, Vol. 1. IEEE, 269–272.
[22] Dawen Xu, Rangding Wang, and Jicheng Wang. 2012. Prediction mode modulated data-hiding algorithm for H.264/AVC. Journal of Real-Time Image Processing 7, 4 (2012), 205–214.
[23] Hong Zhang, Yun Cao, and Xianfeng Zhao. 2016. Motion vector-based video steganography with preserved local optimality. Multimedia Tools and Applications 75, 21 (2016), 13503–13519.
