<<

arXiv:1803.04061v1 [eess.IV] 11 Mar 2018

Multi-Reference Video Coding Using Stillness Detection

Di Chen†, Zoe Liu⋆, Yaowu Xu⋆, Fengqing Zhu†, Edward J. Delp†
† School of Electrical and Computer Engineering, Purdue University, West Lafayette, Indiana, USA 47907
⋆ Google, Inc., 1600 Amphitheatre Parkway, Mountain View, CA, USA 94043

Abstract
Encoders of the AOM/AV1 codec consider an input video sequence as a succession of frames grouped in Golden-Frame (GF) groups. The coding structure of a GF group is fixed with a given GF group size. In the current AOM/AV1 encoder, a video frame is coded using a hierarchical, multilayer coding structure within one GF group. It has been observed that the use of the multilayer coding structure may result in worse coding performance if the GF group presents consistent stillness across its frames. This paper proposes a new approach that adaptively designs the Golden-Frame (GF) group coding structure through the use of stillness detection. Our new approach hence develops an automatic stillness detection scheme using three metrics extracted from each GF group. It then differentiates those GF groups of stillness from other GF groups and uses different GF coding structures accordingly. Experimental results demonstrate a consistent coding gain using the new approach.

Introduction
The AOM/AV1 codec [1] is an open source, royalty-free video codec developed jointly by a consortium of major technology companies called the Alliance for Open Media (AOM), founded by Google. It is a video codec designed specifically for media on the web and follows the VP9 codec [2, 3] from the WebM Project [4]. The AOM/AV1 codec introduces several new coding tools and features such as switchable loop-restoration [5], global and locally warped motion compensation [6], and variable block-size overlapped block motion compensation [7]. AOM/AV1 is expected to achieve a generational coding efficiency improvement over VP9.

The current AOM/AV1 codec divides the source video frames into Golden-Frame (GF) groups. The length of each GF group, i.e. the GF group interval, may vary according to the video's spatial or temporal characteristics and other encoder configurations, such as the key frame interval set at request or for the sake of random access or error resilience. The coding structure of each GF group is based on its interval length and the selection of the reference frames buffered for the coding of other frames. The coding structure determines the encoding order of the frames within the GF group.

In the current implementation of the AOM/AV1 encoder, a GF group may have a length between 4 and 16 frames. Various coding structures may be designed for each individual frame within a GF group, depending on the construction of the extra-ALTREF frames and the encoder's decision on the reference frame buffer, as shown in Figure 1a and Figure 1b. The VP9 codec uses three references for motion compensation, namely LAST_FRAME, GOLDEN_FRAME, and ALTREF_FRAME [8]. A new coding tool adopted by AV1 extends the number of reference frames by adding LAST2_FRAME, LAST3_FRAME, BWDREF_FRAME, and extra-ALTREF_FRAME, which introduces the hierarchical coding structure to the GF groups. LAST_FRAME, LAST2_FRAME, LAST3_FRAME, and GOLDEN_FRAME serve as forward reference frames. ALTREF_FRAME is the backward reference frame selected from a distant future frame of each GF group. BWDREF_FRAME and extra-ALTREF_FRAME are backward reference frames at a relatively shorter distance; the main difference from ALTREF_FRAME is that they do not apply temporal filtering.

The current AOM/AV1 encoder uses the hierarchical coding structure shown in Figure 1a for all the GF groups. This multi-layer, multi-backward reference design may greatly improve the coding efficiency. However, a comparison of the compression performance with BWDREF_FRAME and extra-ALTREF_FRAME enabled and disabled showed that the coding efficiency for some test videos was actually worse when these two reference frames were enabled. This means that the multilayer coding structure does not always have better coding efficiency for all the GF groups. One such example is the GF groups with the stillness feature. In this paper, we propose a new approach that adaptively designs the Golden-Frame (GF) group coding structure through the use of GF stillness detection. A set of metrics is designed to determine whether the frames of a GF group contain little motion. Little work has been done that investigates the use of different GF coding structures depending on video content. In [9], an adaptive video coding control scheme is proposed that suggests using more P- and B-frames in a group of pictures (GOP) when the temporal correlation among the frames is high. A method for selecting the GOP size based on video content is presented in [10].

Method

GF Group Stillness
A GF group may be constructed to contain consistent characteristics to differentiate itself from other GF groups. For instance, some GF groups may present stillness across their successive frames, and other GF groups may present a zoom-in / zoom-out motion feature across the entire GF group. We examined the coding efficiency of each GF group when stillness is present and found that the use of the multilayer coding structure shown in Figure 1a may produce worse coding performance, as opposed to that generated by the one-layer structure shown in Figure 1b.

Automatic GF Group Stillness Detection
An automatic stillness detection of the GF groups is proposed in this paper, which allows the encoder to adaptively choose between the two GF group coding structures shown in Figure 1a and Figure 1b.

[Figure 1: (a) GF Group Coding Structure Using Multilayer; (b) GF Group Coding Structure Using One Layer]

Figure 1. GF Group Coding Structures
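To keep the reference frames of the two structures in Figure 1 easy to consult, the sketch below summarizes the roles described in the Introduction. It is an illustrative summary only, not the AOM/AV1 encoder's data structures; the dictionary and key names are ours.

```python
# Illustrative summary (not AOM/AV1 encoder code) of the reference frames
# described in the Introduction. Dictionary and key names are our own.
REFERENCE_FRAMES = {
    # Forward references: previously coded frames that precede the current
    # frame in display order.
    "LAST_FRAME":         {"direction": "forward"},
    "LAST2_FRAME":        {"direction": "forward"},   # added by AV1
    "LAST3_FRAME":        {"direction": "forward"},   # added by AV1
    "GOLDEN_FRAME":       {"direction": "forward"},
    # Backward references: frames that follow the current frame in display order.
    "BWDREF_FRAME":       {"direction": "backward", "distance": "near"},     # added by AV1
    "extra-ALTREF_FRAME": {"direction": "backward", "distance": "near",
                           "temporal_filtering": False},                     # added by AV1
    "ALTREF_FRAME":       {"direction": "backward", "distance": "distant",
                           "temporal_filtering": True},
}

if __name__ == "__main__":
    for name, props in REFERENCE_FRAMES.items():
        print(name, props)
```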

Three metrics are extracted from the GF group during the first coding pass of AOM/AV1 to determine the GF group stillness. The first coding pass of AOM/AV1 conducts a fast block matching with integer-pixel accuracy and uses only one reference frame, the previous frame. Some motion vector and motion compensation information is collected during the first coding pass. Our proposed stillness detection method uses this information to extract the three metrics described below, which requires only a small amount of computation. It then identifies the thresholds and derives the criteria to classify GF groups into two categories: GF groups of stillness and GF groups of non-stillness. The thresholds are obtained by collecting statistics of the three metrics from the GF groups of eight low resolution (cif) test videos. We manually labeled the stillness or non-stillness of these GF groups. Figure 2 shows the histograms and the thresholds of the three metrics. We intentionally included some test videos that contain GF groups of "stillness-like" characteristics in the non-stillness class because they are more likely to be misclassified as GF groups of stillness. A GF group with "stillness-like" characteristics shows either very slow motion or a static background with small moving objects. We obtained three criteria which are jointly applied to automatically detect stillness. Finally, the GF group is coded using the workflow given in Figure 3.

[Figure 2: histograms of the three metrics and their thresholds — (a) zero motion accumulator, (b) avg pixel error, (c) avg error stdev]
Figure 2. Thresholds for Metrics

Stillness Detection Metrics:

1. zero_motion_accumulator: the minimum of the per-frame percentage of zero-motion inter blocks within one GF group:

   zero_motion_accumulator = MIN( pcnt_zero_motion_{F_i} | F_i ∈ S )    (1)

   where S = {F_i | i = 1, 2, ..., gf_group_interval} is the set of frames in the GF group, gf_group_interval is the number of frames in the GF group, and pcnt_zero_motion_{F_i} is the percentage of zero-motion inter blocks out of all the inter blocks in frame F_i.

2. avg_pixel_error: the average of the per-pixel sum of squared errors (SSE) within one GF group:

   avg_pixel_error = MEAN( frame_sse_{F_i} / number_of_pixels_per_frame | F_i ∈ S )    (2)

   where frame_sse_{F_i} is the SSE of frame F_i.

3. avg_error_stdev: first calculate the standard deviation of the block-wise SSEs for each frame, where the block SSEs are obtained from zero-motion prediction; then take the mean of the standard deviations over all the frames in one GF group:

   avg_error_stdev = MEAN( STDEV_{F_i}( block_sse(0,0) ) | F_i ∈ S )    (3)

   where block_sse(0,0) denotes the block-wise SSEs obtained from zero-motion prediction and STDEV_{F_i} is the standard deviation of the block-wise SSEs of frame F_i.

We use the above three metrics to differentiate those GF groups with stillness features from other GF groups, subject to the criteria in Table 1.

Table 1. Criteria for GF group stillness detection

Stillness Detection Metric      Stillness Detection Criterion (identified as GF group of stillness)
zero_motion_accumulator         > 0.9
avg_pixel_error                 < 40
avg_error_stdev                 < 2000
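The following Python sketch shows how Equations (1)-(3) and the Table 1 thresholds could be combined into a GF group classifier, assuming the per-frame first-pass statistics (pcnt_zero_motion, frame_sse, and the block-wise zero-motion SSEs) are already available. The function and variable names are ours; this is a minimal sketch, not the AOM/AV1 encoder's implementation.

```python
import numpy as np

# Thresholds from Table 1 (obtained from eight manually labeled cif test videos).
ZERO_MOTION_ACCUMULATOR_MIN = 0.9
AVG_PIXEL_ERROR_MAX = 40
AVG_ERROR_STDEV_MAX = 2000

def is_still_gf_group(first_pass_stats, pixels_per_frame):
    """Classify a GF group as still / non-still from first-pass statistics (sketch).

    `first_pass_stats` is assumed to hold one entry per frame of the GF group:
        {
            "pcnt_zero_motion": float,   # fraction of inter blocks with zero motion
            "frame_sse": float,          # frame SSE under zero-motion prediction
            "block_sse": np.ndarray,     # block-wise zero-motion SSEs
        }
    """
    # Eq. (1): minimum per-frame percentage of zero-motion inter blocks.
    zero_motion_accumulator = min(s["pcnt_zero_motion"] for s in first_pass_stats)

    # Eq. (2): average per-pixel SSE over the GF group.
    avg_pixel_error = np.mean(
        [s["frame_sse"] / pixels_per_frame for s in first_pass_stats])

    # Eq. (3): mean over frames of the per-frame std dev of block-wise SSEs.
    avg_error_stdev = np.mean(
        [np.std(s["block_sse"]) for s in first_pass_stats])

    # Jointly apply the three criteria from Table 1.
    return (zero_motion_accumulator > ZERO_MOTION_ACCUMULATOR_MIN
            and avg_pixel_error < AVG_PIXEL_ERROR_MAX
            and avg_error_stdev < AVG_ERROR_STDEV_MAX)
```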

Adaptive GF Group Structure Design
Once a GF group is categorized as a GF group of stillness, no extra-ALTREF_FRAME or BWDREF_FRAME is used, and the single layer coding structure shown in Figure 1b is selected. The single layer coding structure still employs multiple reference frames for the coding of one video frame: LAST_FRAME, LAST2_FRAME, LAST3_FRAME, and GOLDEN_FRAME are used as forward prediction references, and ALTREF_FRAME is used as the backward prediction reference. If a GF group is categorized as a non-still GF group, we further leverage the use of BWDREF_FRAME and extra-ALTREF_FRAME to help improve the coding performance.

[Figure 3: workflow of the proposed GF group coding with stillness detection]
Figure 3. GF Group Coding With Stillness Detection
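As a companion to the classifier sketched above, the snippet below illustrates the decision implied by Figure 3: test the GF group for stillness and enable the reference frames of the corresponding structure. The function name is ours and the snippet is a sketch of the workflow, not the encoder's actual control flow.

```python
def select_gf_group_references(gf_group_is_still):
    """Return (forward, backward) reference frames for a GF group (sketch of Figure 3).

    `gf_group_is_still` would come from a stillness classifier such as the
    is_still_gf_group() sketch above. Frame names follow the paper.
    """
    forward = ["LAST_FRAME", "LAST2_FRAME", "LAST3_FRAME", "GOLDEN_FRAME"]
    if gf_group_is_still:
        # Figure 1b: single layer, ALTREF_FRAME is the only backward reference.
        backward = ["ALTREF_FRAME"]
    else:
        # Figure 1a: multilayer, additionally enables the nearer backward references.
        backward = ["BWDREF_FRAME", "extra-ALTREF_FRAME", "ALTREF_FRAME"]
    return forward, backward
```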
Experimental Results
We tested the proposed method using two standard video test sets with various resolutions and spatial/temporal characteristics, as shown in Table 2. More specifically, the set of lowres includes 40 videos of cif resolution, and the set of midres includes 30 videos of 480p and 360p resolution. Each video is coded with a single GOLDEN_FRAME and a set of target bitrates. For quality metrics we use the arithmetic average of the frame PSNR and SSIM [11]. To compare the RD curves obtained by the base AV1 codec and by our proposed method, we use the BDRATE metric [12].

Experimental results demonstrate the advantage of the proposed approach. The Google lowres test set has two video clips that contain detected still GF groups (pamplet_cif and bowing_cif), and the midres test set has one (snow_mnt). As shown in Table 3, by applying the proposed approach, the BDRATE of the video clips that contain GF groups of stillness decreased by approximately 1%. The proposed automatic stillness detector produced no misclassifications on the videos from these two test sets.

There are mainly two reasons that the single layer coding structure has better coding efficiency on the GF groups with the stillness feature. One is that the multilayer coding structure in Figure 1a involves more candidate reference frames and thus requires more motion information to be transmitted to the decoder. The other reason is that the multilayer coding structure uses an unbalanced bit allocation scheme, which is not preferable for a GF group of stillness in which the frames are very similar.

Table 2. BDRATE reduction using the proposed method on the Google test sets

test set            BDRATE(PSNR)    BDRATE(SSIM)
lowres              -0.063          -0.045
midres              -0.026          -0.041

Table 3. BDRATE reduction using the proposed method on video clips that contain GF groups of stillness

video clip          BDRATE(PSNR)    BDRATE(SSIM)
pamplet_cif         -1.395          -1.076
bowing_cif          -1.118          -0.735
snow_mnt            -0.767          -1.235

Conclusion and Future Work
We proposed an automatic GF group stillness feature detection method. Each GF group is classified as a still GF group or a non-still GF group based on three metrics, and the encoder adaptively chooses the coding structure accordingly to optimize coding efficiency. Experimental results showed a coding gain for videos containing still GF groups. We also observed that GF groups containing other features, such as fast zoom-out and high motion, may also benefit from the single layer coding structure.

References
[1] http://www.aomedia.org/.
[2] D. Mukherjee, J. Bankoski, R. S. Bultje, A. Grange, J. Han, J. Koleszar, P. Wilkins, and Y. Xu, "The latest open-source video codec VP9 - an overview and preliminary results," Proceedings of the IEEE Picture Coding Symposium, pp. 390–393, December 2013, San Jose, CA.
[3] ——, "A technical overview of VP9 - the latest open-source video codec," SMPTE Annual Technical Conference & Exhibition, October 2013.
[4] http://www.webmproject.org/.
[5] D. Mukherjee, S. Li, Y. Chen, A. Anis, S. Parker, and J. Bankoski, "A switchable loop-restoration with side-information framework for the emerging AV1 video codec," Proceedings of the IEEE International Conference on Image Processing, pp. 265–269, September 2017, Beijing, China.
[6] S. Parker, Y. Chen, D. Barker, P. de Rivaz, and D. Mukherjee, "Global and locally adaptive warped motion compensation in video compression," Proceedings of the IEEE International Conference on Image Processing, pp. 275–279, September 2017, Beijing, China.
[7] Y. Chen and D. Mukherjee, "Variable block-size overlapped block motion compensation in the next generation open-source video codec," Proceedings of the IEEE International Conference on Image Processing, pp. 938–942, September 2017, Beijing, China.
[8] Z. Liu, D. Mukherjee, W. Lin, P. Wilkins, J. Han, Y. Xu, and J. Bankoski, "Adaptive multi-reference prediction using a symmetric framework," Proceedings of the IS&T International Symposium on Electronic Imaging, Visual Information Processing and Communication VIII, pp. 65–72(8), January 2017, Burlingame, CA.
[9] S. C. Hsia, "An adaptive video coding control scheme for real-time MPEG applications," EURASIP Journal on Advances in Signal Processing, vol. 2003, no. 3, p. 161846, March 2003. [Online]. Available: https://doi.org/10.1155/S1110865703210040
[10] B. Zatt, M. S. Porto, J. Scharcanski, and S. Bampi, "GOP structure adaptive to the video content for efficient H.264/AVC encoding," Proceedings of the IEEE International Conference on Image Processing, pp. 3053–3056, September 2010.
[11] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, "Image quality assessment: from error visibility to structural similarity," IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, April 2004.
[12] G. Bjøntegaard, "Calculation of average PSNR differences between RD-curves," VCEG-M33, 13th VCEG Meeting, March 2001, Austin, Texas.
[13] D. Mukherjee, H. Su, J. Bankoski, A. Converse, J. Han, Z. Liu, and Y. Xu, "An overview of video coding tools under consideration for VP10: the successor to VP9," Proceedings of the SPIE, Applications of Digital Image Processing XXXVIII, vol. 9599, p. 95991E, September 2015.
[14] M. Paul, W. Lin, C.-T. Lau, and B. Lee, "A long-term reference frame for hierarchical B-picture-based video coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 24, no. 10, pp. 1729–1742, October 2014.
[15] J. Bankoski, J. Koleszar, L. Quillio, J. Salonen, P. Wilkins, and Y. Xu, "VP8 data format and decoding guide, RFC 6386," http://datatracker.ietf.org/doc/rfc6386/.

Author Biography
Di Chen is a PhD candidate in the Video and Image Processing Laboratory (VIPER) at Purdue University, West Lafayette. Her research focuses on video analysis and compression. Currently, she is working on texture segmentation based video compression using convolutional neural networks.

Zoe Liu received her Ph.D. in ECE from Purdue University. She once worked with corporate research institutes, prototyping mobile video conferencing platforms at Bell Labs (Lucent) and Nokia Research Center. Zoe then devoted her effort to the design and development of FaceTime at Apple, Tango Video Call with the startup TangoMe, and Google Hangouts. She is currently working in the WebM team of Google.

Dr. Yaowu Xu is currently the Tech Lead Manager of the video coding research team at Google. Dr. Xu's education background includes the BS degree in Physics and the MS and PhD degrees in Nuclear Engineering from Tsinghua University at Beijing, China. He also holds the MS and PhD degrees in Electrical and Computer Engineering from the University of Rochester. His current research focuses on advanced algorithms for digital video compression.

Fengqing Zhu is an Assistant Professor of Electrical and Computer Engineering at Purdue University, West Lafayette, IN. Dr. Zhu received her Ph.D. in Electrical and Computer Engineering from Purdue University in 2011. Prior to joining Purdue in 2015, she was a Staff Researcher at Huawei Technologies (USA), where she received a Huawei Certification of Recognition for Core Technology Contribution in 2012. Her research interests include image processing and analysis, video compression, computer vision and computational photography.

Edward J. Delp was born in Cincinnati, Ohio. He is currently The Charles William Harrison Distinguished Professor of Electrical and Computer Engineering and Professor of Biomedical Engineering at Purdue University. His research interests include image and video processing, image analysis, computer vision, image and video compression, multimedia security, medical imaging, multimedia systems, communication and information theory. Dr. Delp is a Life Fellow of the IEEE, a Fellow of the SPIE, a Fellow of IS&T, and a Fellow of the American Institute of Medical and Biological Engineering.