applied sciences

Article

Fast Thumbnail Extraction for H.264/AVC, HEVC and VP9

Joohyung Byeon 1, Seungchul Jang 1, Jongseok Lee 1, Kyungyong Kim 2 and Donggyu Sim 1,*

1 Department of Computer Engineering, Kwangwoon University, Seoul 139701, Korea; [email protected] (J.B.); [email protected] (S.J.); [email protected] (J.L.)
2 LG Electronics Inc., Seoul 135860, Korea; [email protected]
* Correspondence: [email protected]

Abstract: In this paper, we propose a partial decoding method with limited memory usage for high-speed thumbnail extraction. The proposed method performs a partial inverse transform and partial intra prediction to reconstruct only the pixels needed for intra prediction and for the thumbnail. The reconstructed pixels on the bottom and right lines of each block are then stored in a line buffer and a thumbnail buffer instead of a full-resolution decoded picture buffer. Although the H.264/AVC, HEVC and VP9 video codecs have different coding structures, prediction schemes and transforms, the proposed algorithm can be applied to all of them in the same manner. To evaluate the performance of the proposed method, we implemented it for H.264/AVC, HEVC and VP9. Compared with the full decoding method, the thumbnail extraction time of the proposed method decreased by 66% in H.264/AVC, 52% in HEVC and 48% in VP9.

Keywords: thumbnail; partial decoding; real-time processing; memory reduction; H.264/AVC; HEVC; VP9

 

Citation: Byeon, J.; Jang, S.; Lee, J.; Kim, K.; Sim, D. Fast Thumbnail Extraction for H.264/AVC, HEVC and VP9. Appl. Sci. 2021, 11, 1844. https://doi.org/10.3390/app11041844

Academic Editor: Andrea Prati

Received: 4 February 2021
Accepted: 18 February 2021
Published: 19 February 2021

Publisher's Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Copyright: © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

1. Introduction

With recent developments in video capture, displays and processing capability, there is a growing demand in the market for high-definition, high-quality video services. In response, the ISO/IEC Moving Picture Experts Group (MPEG) and the ITU-T Video Coding Experts Group (VCEG) have developed successive standards such as MPEG-2, H.264/AVC and High Efficiency Video Coding (HEVC) [1–3]. In addition, codecs developed by private companies, such as VP8 and VP9 [4,5], have been released. As the codecs have evolved, each generation has roughly halved the bit rate required for the same visual quality. These gains have been achieved by adding a number of new coding tools with high computational complexity, many of which target intra frames. However, because these tools exploit the spatial redundancy within a frame, they are interdependent, which makes it difficult to accelerate the codecs to real-time speed with a minimal computational load.

Today, we are flooded with video content from multiple media services, including broadcasting, over-the-top (OTT) services and the internet. Users need an easy way to choose among this content, and thumbnails are displayed as part of user interfaces so that users can visually select what they wish to watch. However, video resolution is rapidly increasing to 8K and beyond, which significantly raises the hardware requirements for fully decoding video content. Several approaches, such as parallel decoding and decoder implementations using single instruction, multiple data (SIMD) instructions, have been proposed to improve decoding speed [6–9]. Even so, it is still almost impossible to fully decode multiple 4K or 8K videos simultaneously on limited hardware for thumbnail display. In addition, the amount of memory used for thumbnail processing must be reduced on embedded systems because of their memory limitations.

Appl. Sci. 2021, 11, 1844. https://doi.org/10.3390/app11041844 https://www.mdpi.com/journal/applsci


Some attempts have been made to extract thumbnail images in the frequency domain [10,11] based on Chen's transform-domain intra prediction method [12]. However, these methods require additional look-up tables, which increase memory usage, and they are very hard to apply to other codecs. To address this, a partial decoding method was proposed [13]. This method restores only the right and bottom boundary pixels of each 4 × 4 unit, which is the minimum transform unit (TU) size of HEVC. It can easily be applied to other codecs; however, it always operates in 4 × 4 units regardless of the block size. As a result, it also restores pixels that are used neither as thumbnail output nor as reference pixels for intra prediction. Figure 1 shows the pixels used as reference pixels or thumbnail output (gray) and the unnecessarily reconstructed pixels (yellow) of the method [12]. The number of unnecessarily restored pixels for an N × N TU block is

$$\frac{3N^{2} - 12N}{8} \qquad (1)$$

Figure 1. The restored pixels for the intra prediction or thumbnail output (gray) and unnecessarily restored pixels (yellow) in a 16 × 16 block.
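Equation (1) can be checked by brute force. The Python sketch below, with hypothetical function names, marks the pixels that a 4 × 4-unit partial decoder restores in an N × N TU (the right column and bottom row of every 4 × 4 unit) and removes the pixels that are actually needed. As an illustrative assumption, the needed set is taken to be the TU's right column and bottom row plus the bottom-right sample of each 4 × 4 unit; under that assumption the brute-force count agrees with Equation (1).

```python
def unnecessary_pixels_brute_force(n):
    """Pixels restored by a 4x4-unit partial decoder but not needed (n % 4 == 0)."""
    restored = set()
    for by in range(0, n, 4):
        for bx in range(0, n, 4):
            for i in range(4):
                restored.add((bx + 3, by + i))  # right column of the 4x4 unit
                restored.add((bx + i, by + 3))  # bottom row of the 4x4 unit
    # Assumed needed set: right column and bottom row of the TU (references for
    # the next blocks) plus the bottom-right sample of every 4x4 unit.
    needed = {(x, y) for (x, y) in restored
              if x == n - 1 or y == n - 1 or (x % 4 == 3 and y % 4 == 3)}
    return len(restored - needed)


def unnecessary_pixels_eq1(n):
    """Equation (1): (3N^2 - 12N) / 8, an integer whenever N is a multiple of 4."""
    return (3 * n * n - 12 * n) // 8


for n in (4, 8, 16, 32):
    print(n, unnecessary_pixels_brute_force(n), unnecessary_pixels_eq1(n))
```

For N = 16 both counts are 72, and for N = 4 both are 0, since a single 4 × 4 unit restores exactly its own boundary.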

In this paper, we present a fast thumbnail decoding method for intra frames of H.264/AVC, HEVC and VP9 that uses a small amount of memory and performs partial decoding according to the prediction block size, with minimal visual quality loss and without any error propagation. The proposed partial decoding method restores only the pixels that constitute the thumbnail and the reference pixels used in intra prediction, by replacing the full inverse transform and intra prediction with a partial inverse transform and partial intra prediction. The computational load and memory usage are greatly reduced by omitting both the reconstruction and the storage of all other pixels. Because HEVC employs large transforms of up to 32 × 32, we also reconstruct several pixels inside the block to preserve the visual quality of the thumbnails. Instead of a decoded picture buffer (DPB), the memory structure of the proposed method uses a minimal thumbnail buffer and reference line buffers. Memory is not allocated for pixels whose restoration is omitted, and reference pixels that are no longer required are overwritten by the restored reference pixels of the next block, which further reduces the required memory. In addition, because the thumbnail pixels are written directly into the thumbnail buffer for output, no separate down-sampling process is performed, which reduces computational complexity, memory usage and memory access. Although H.264/AVC, HEVC and VP9 have different coding structures and transforms, the proposed algorithm can be applied to all three in the same manner. To evaluate its performance, we implemented the proposed algorithm for H.264/AVC, HEVC and VP9 and found that the thumbnail extraction time decreased by 66% in H.264/AVC, 52% in HEVC and 48% in VP9 compared with the full decoding method.

This paper is organized as follows. Section 2 explains the proposed partial decoding method for fast thumbnail extraction and an efficient memory structure. Section 3 compares the proposed algorithm with the existing open-source implementation in terms of running time and visual quality. Section 4 concludes the paper.

2. Proposed Partial Decoding of H.264/AVC, HEVC and VP9 for Thumbnail Extraction

A conventional thumbnail extraction method consists of a video decoding process and a down-sampling process. For intra frames, the video decoding process consists of entropy decoding, inverse transformation, intra prediction and in-loop filtering. During thumbnail extraction, entropy decoding, inverse transformation and intra prediction are performed to reconstruct the full image frame, which is then down-sampled to the thumbnail size. However, both the decoding and the down-sampling processes incur high computational complexity and memory usage. In this paper, we propose a partial decoding method that operates according to the prediction block size, together with a memory structure, for high-speed thumbnail extraction and compact memory usage with minimal visual degradation. The proposed thumbnail extraction method replaces the inverse transform and intra prediction of the existing video decoding process with a partial inverse transform and partial intra prediction. The proposed algorithm uses a low-capacity thumbnail buffer optimized for the pixels reconstructed in the partial decoding process, along with two line buffers. Figure 2 shows the block diagram of the proposed method for fast thumbnail extraction: the original inverse transform and intra prediction are replaced with the partial inverse transform and partial intra prediction, and the restored pixels are stored in the line buffers and the thumbnail buffer so that the down-sampling process can be omitted.

Figure 2. Block diagram of the proposed thumbnail extraction method.

In this section,In we this describe section, a partial we describe decoding a partial method decoding that reduces method the that computational reduces the computational complexity of complexitythe inverse oftransform the inverse and transform intra prediction and intra process prediction during process the decoding during the decoding process, with process,minimal with visual minimal degradation. visual degradation. The proposed The proposed partial partialdecoding decoding method method restores restores only onlythe thepixels pixels necessary necessary for for thum thumbnailbnail extraction extraction inin orderorder to to reduce reduce the complexitythe of the complexity of inversethe inverse transform transform and intraand prediction.intra prediction. The pixels The necessarypixels necessary for the thumbnailfor the extraction thumbnail extractionare not are only not the only pixels the topixels be output to be output to the to thumbnail the thumbnail but also but those also those necessary to avoid necessary to avoiderror error propagation propagation in the in intra-picture the intra-picture prediction. prediction. The pixels The pixels required required for intra prediction for intra predictionin the in subsequent the subsequent blocks blocks are the are right-boundary the right-boundary and lower-boundary and lower-boundary pixels required for pixels requiredintra for predictionintra prediction in all thein all pixels the ofpixels a transform of a transform block. Ifblock. these If pixels these have pixels errors or are not have errors or restored,are not restored, the errors the propagate errors propagate and accumulate and accumulate for the entirefor the image, entire image, greatly reducing the greatly reducingvisual the qualityvisual qua of thelity reconstructedof the reconstructed image. image. 
For videos withFor a videosresolution with of a less resolution than Full of HD, less thanthe decoding Full HD, speed the decoding is sufficiently speed is sufficiently fast even in a limitedfast even hardware in a limited environment; hardware environment;thus, we targeted thus, vide weos targeted with a videosresolution with a resolution of 4K or higher.of 4KAs ordemonstrated higher. As demonstratedby the experiments, by the the experiments, subjective thequality subjective was good quality was good enough to display the thumbnails even if the thumbnails were generated at 1/64 size of enough to display the thumbnails even if the thumbnails were generated at 1/64 size of the original resolution from 4K or higher UHD videos. This paper is based on the 1/64 size the original resolution from 4K or higher UHD videos. This paper is based on the 1/64 size thumbnail extraction method, and in order to extract the thumbnails at the 1/2n size of the original resolution according to the user’s preference, it can be applied similarly by n n restoring the right and bottom boundary pixels and one pixel for each 2 2 × 2 2 blocks. Appl. Sci. 2021, 11, 1844 4 of 13

To extract a thumbnail whose size corresponds to 1/64 of the original image resolution, one pixel is required for each 8 × 8 block. Therefore, each pixel output to the thumbnail corresponds to the bottom-right pixel of an 8 × 8 unit block. However, HEVC and other codecs employ larger transform sizes; thus, we reconstruct several pixels inside the transform blocks for better visual quality. The H.264/AVC, HEVC and VP9 video compression standards use various prediction and transform sizes and shapes (from 4 × 4 to 32 × 32, including square and non-square blocks). When the proposed partial decoding method is applied to these standards, the restored pixel positions within each block are as shown in Figure 3.

Figure 3. Pixels to be reconstructed for the proposed thumbnail extraction for various block shapes.

Table 1 lists the numbers of pixels reconstructed in a block according to the transform block size when the proposed method is applied. The reference pixels in the table are the pixels reconstructed and stored for intra prediction of the next block, and the pixels inside the block are the interior pixels reconstructed to be output to the thumbnail. The reconstruction pixel ratio is the ratio of the number of pixels reconstructed by the proposed method to the total number of pixels in the block. According to the table, the reconstruction ratio of the proposed algorithm decreases from 44% to as low as 7%, depending on the block size. Because the reconstruction ratio decreases as the block size increases, the computational complexity of thumbnail extraction is lower when an image contains more large blocks. In addition, memory usage can be reduced for large blocks. To exploit this, we developed the partial inverse transformation and partial intra prediction described in the following subsections, along with the minimal memory structure. The proposed partial decoding can easily be extended to chroma samples in the same way, even for the 4:2:0 format.

Table 1. The number of pixels to be reconstructed for each block size.

| Block Size/Shape | # Pixels | Reference Pixels | Pixels Inside Block | Total # of Pixels to Be Reconstructed | Reconstruction Pixel Ratio |
|---|---|---|---|---|---|
| 4 × 4 | 16 | 7 | 0 | 7 | 44% |
| 4 × 8 (8 × 4) | 32 | 11 | 0 | 11 | 34% |
| 8 × 8 | 64 | 15 | 0 | 15 | 23% |
| 8 × 16 (16 × 8) | 128 | 23 | 0 | 23 | 18% |
| 16 × 16 | 256 | 31 | 1 | 32 | 13% |
| 16 × 32 (32 × 16) | 512 | 47 | 3 | 50 | 10% |
| 32 × 32 | 1024 | 63 | 9 | 72 | 7% |
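The entries of Table 1 follow from a simple counting rule, reproduced by the Python sketch below (function name hypothetical). For a w × h block, the reference pixels are the right column and bottom row, which share one corner pixel (w + h − 1), and the interior pixels are the bottom-right pixels of the 8 × 8 sub-blocks that do not already lie on that boundary, (w/8 − 1)(h/8 − 1). Percentages are rounded half up; Python's built-in `round()` rounds half to even and would give 12% instead of 13% for the 16 × 16 case.

```python
import math


def reconstructed_pixel_counts(w, h):
    """(reference, inside, total, ratio %) for a w x h block under the proposed scheme."""
    reference = w + h - 1                              # right column + bottom row, shared corner
    inside = max(w // 8 - 1, 0) * max(h // 8 - 1, 0)   # inner 8x8 bottom-right pixels
    total = reference + inside
    ratio_pct = math.floor(100 * total / (w * h) + 0.5)  # round half up
    return reference, inside, total, ratio_pct


for w, h in [(4, 4), (4, 8), (8, 8), (8, 16), (16, 16), (16, 32), (32, 32)]:
    print(f"{w} x {h}:", reconstructed_pixel_counts(w, h))
```

Because the counts are symmetric in w and h, each non-square row of Table 1 also covers its transposed shape.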

2.1. Partial Transformations

The inverse transformation process converts the frequency-domain transform coefficients obtained through entropy decoding into pixel-domain values by performing a 2D inverse discrete cosine transform (IDCT). To reduce computational complexity, the inverse transforms of H.264/AVC, HEVC and VP9 decoders split each 2D IDCT into vertical and horizontal 1D IDCTs, each implemented as a butterfly structure of additions and multiplications [14,15]. The proposed transform inversely transforms only the bottom-most and rightmost pixels, which serve as reference pixels, in order to avoid error propagation. In addition, if a transform block is larger than 8 × 8 pixels, one pixel is recovered per 8 × 8 sub-block inside the larger block to avoid interpolation in the thumbnail. We employ a two-stage inverse transformation based on the separability of the transform. The first-stage vertical 1D IDCT must be performed in full, because every row contains at least one required output (its rightmost pixel) and each output of the horizontal 1D IDCT depends on all intermediate values of its row. The second-stage horizontal 1D IDCT, however, can be performed partially, only for the reference pixels and one pixel per inner 8 × 8 sub-block. As shown in Figure 4, the 16 × 16 block is inversely transformed only for the yellow pixels in the second stage.

Figure 4. Two-stage partial reconstruction for the 16 × 16 inverse transformation.
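The two-stage idea can be illustrated with a floating-point orthonormal DCT-II, used here as a stand-in for the integer transforms of H.264/AVC, HEVC and VP9 (an assumption of this sketch; the function names are hypothetical). Stage 1 runs the vertical 1D IDCT on every column; stage 2 evaluates the horizontal 1D IDCT directly, one output at a time, only at the right column, the bottom row and the inner 8 × 8 bottom-right positions. This direct per-output evaluation is not the pruned butterfly of Figure 5, but it shows why the second stage can be restricted while the first cannot.

```python
import math
import random


def idct_1d_at(coeffs, n):
    """Output sample n of an orthonormal N-point inverse DCT-II (direct form)."""
    size = len(coeffs)
    total = 0.0
    for k, c in enumerate(coeffs):
        scale = math.sqrt(1.0 / size) if k == 0 else math.sqrt(2.0 / size)
        total += scale * c * math.cos(math.pi * (2 * n + 1) * k / (2 * size))
    return total


def idct_1d(coeffs):
    return [idct_1d_at(coeffs, n) for n in range(len(coeffs))]


def needed_positions(w, h):
    """Right column, bottom row and bottom-right pixel of each inner 8x8 sub-block."""
    pos = {(x, h - 1) for x in range(w)} | {(w - 1, y) for y in range(h)}
    pos |= {(x, y) for y in range(7, h - 1, 8) for x in range(7, w - 1, 8)}
    return pos


def vertical_stage(coeffs):
    """Stage 1: full vertical 1D IDCT; coeffs[y][x] -> intermediate[y][x]."""
    h, w = len(coeffs), len(coeffs[0])
    columns = [idct_1d([coeffs[y][x] for y in range(h)]) for x in range(w)]
    return [[columns[x][y] for x in range(w)] for y in range(h)]


def full_idct_2d(coeffs):
    return [idct_1d(row) for row in vertical_stage(coeffs)]


def partial_idct_2d(coeffs):
    """Stage 2 evaluated only where the thumbnail or later intra prediction needs it."""
    h, w = len(coeffs), len(coeffs[0])
    tmp = vertical_stage(coeffs)
    return {(x, y): idct_1d_at(tmp[y], x) for (x, y) in needed_positions(w, h)}


random.seed(0)
block = [[random.randint(-64, 64) for _ in range(16)] for _ in range(16)]
full = full_idct_2d(block)
part = partial_idct_2d(block)
print(len(part))  # 32 of the 256 residual samples
print(max(abs(full[y][x] - v) for (x, y), v in part.items()))  # 0.0
```

The partial result is bit-identical to the full one at the kept positions because both use the same per-output arithmetic; a real decoder would instead prune the butterfly, which is where the savings in Table 2 come from.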

The partial horizontal 1D-IDCT is performed by removing part of the butterfly structure. Depending on the transform size, the 8th, 16th, 24th and 32nd pixels need to be reconstructed. Figure 5 shows the operations required to restore the last (16th) pixel of the original 16-point 1D-IDCT. Reconstructing only this last pixel of the horizontal 16-point 1D-IDCT requires 15 additions and 24 multiplications, whereas the conventional butterfly structure of the full 16-point 1D-IDCT requires 64 additions and 72 multiplications. Table 2 shows the numbers of additions and multiplications for the proposed partial reconstruction depending on the transform block size. As shown in the table, blocks that are larger and have greater horizontal lengths can be accelerated more.

Figure 5. Partial reconstruction flow of the butterfly structure for a 16-point 1D inverse discrete cosine transform (1D-IDCT).

Table 2. The numbers of additions (Add) and multiplications (Mul) of the full and partial inverse transformations.

| Transform Size | Full 2D-IDCT Add | Full 2D-IDCT Mul | Partial 2D-IDCT Add | Partial 2D-IDCT Mul | Reduction Ratio Add | Reduction Ratio Mul |
|---|---|---|---|---|---|---|
| 4 × 4 | 64 | 64 | 52 | 49 | 19% | 23% |
| 4 × 8 | 224 | 224 | 105 | 126 | 53% | 44% |
| 8 × 4 | 224 | 224 | 212 | 209 | 5% | 7% |
| 8 × 8 | 384 | 384 | 265 | 286 | 31% | 26% |
| 8 × 16 | 1216 | 1344 | 530 | 672 | 56% | 50% |
| 16 × 8 | 1216 | 1344 | 1097 | 1246 | 10% | 7% |
| 16 × 16 | 2048 | 2304 | 1362 | 1632 | 33% | 29% |
| 16 × 32 | 5184 | 7552 | 2412 | 3464 | 53% | 54% |
| 32 × 16 | 5184 | 7552 | 4498 | 6880 | 13% | 9% |
| 32 × 32 | 8320 | 12,800 | 5548 | 8712 | 33% | 32% |
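The reduction ratios in Table 2 are (full − partial)/full for each operation type. The short Python sketch below recomputes them from the table's operation counts (the dictionary and function names are hypothetical) and applies the same formula to the single-output 16-point case quoted in the text.

```python
import math

# (full Add, full Mul, partial Add, partial Mul, reported Add %, reported Mul %) from Table 2
TABLE_2 = {
    "4 x 4": (64, 64, 52, 49, 19, 23),
    "4 x 8": (224, 224, 105, 126, 53, 44),
    "8 x 4": (224, 224, 212, 209, 5, 7),
    "8 x 8": (384, 384, 265, 286, 31, 26),
    "8 x 16": (1216, 1344, 530, 672, 56, 50),
    "16 x 8": (1216, 1344, 1097, 1246, 10, 7),
    "16 x 16": (2048, 2304, 1362, 1632, 33, 29),
    "16 x 32": (5184, 7552, 2412, 3464, 53, 54),
    "32 x 16": (5184, 7552, 4498, 6880, 13, 9),
    "32 x 32": (8320, 12800, 5548, 8712, 33, 32),
}


def reduction_pct(full, partial):
    """Reduction ratio (full - partial) / full in percent, rounded half up."""
    return math.floor(100 * (full - partial) / full + 0.5)


for size, (fa, fm, pa, pm, ra, rm) in TABLE_2.items():
    add_pct, mul_pct = reduction_pct(fa, pa), reduction_pct(fm, pm)
    print(f"{size:>8}: Add {add_pct}%  Mul {mul_pct}%  matches table: {(add_pct, mul_pct) == (ra, rm)}")

# Last output only of a 16-point 1D-IDCT: 15 of 64 additions, 24 of 72 multiplications.
print(reduction_pct(64, 15), reduction_pct(72, 24))  # 77 67
```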

2.2. Partial Intra Prediction

Intra prediction improves compression efficiency by eliminating redundancy among adjacent pixels in an image. In the H.264/AVC, HEVC and VP9 decoders, reference pixels of neighbouring blocks are filtered, and the filtered pixel values are then used to form the predicted signal, which is added to the residual signal from the inverse transformation. For thumbnail extraction, partial prediction can be performed from the filtered reference samples. As with the partial inverse transformation, only the bottom and rightmost pixel lines and one pixel per inner 8 × 8 sub-block are predicted. As a result, the memory copy operations for the pixels not needed for the thumbnail can be omitted. Figure 6 shows the pixels that need to be predicted and reconstructed for the thumbnail extraction process in a 16 × 16 block.

Figure 6. Partial intra prediction for a 16 × 16 block (diagonal mode).
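Partial intra prediction amounts to evaluating the predictor and adding the residual only at the kept positions. The Python sketch below illustrates this with a simplified 45-degree (diagonal down-left) predictor that copies one above or above-right reference sample without any smoothing filter. This predictor is a stand-in chosen for brevity, not the exact diagonal mode of H.264/AVC, HEVC or VP9, and 8-bit clipping is assumed. All names are hypothetical. In a real decoder the residual would come from the partial inverse transform and exist only at the kept positions; a full residual is generated here solely to compare against full reconstruction.

```python
import random


def needed_positions(w, h):
    """Right column, bottom row and bottom-right pixel of each inner 8x8 sub-block."""
    pos = {(x, h - 1) for x in range(w)} | {(w - 1, y) for y in range(h)}
    pos |= {(x, y) for y in range(7, h - 1, 8) for x in range(7, w - 1, 8)}
    return pos


def diagonal_pred(top, x, y):
    """Simplified, unfiltered diagonal down-left prediction.

    top holds the w + h reconstructed samples above and above-right of the block.
    """
    return top[x + y + 1]


def clip8(v):
    return min(max(v, 0), 255)


def full_intra_recon(top, residual, w, h):
    return {(x, y): clip8(diagonal_pred(top, x, y) + residual[(x, y)])
            for y in range(h) for x in range(w)}


def partial_intra_recon(top, residual, w, h):
    """Predict and reconstruct only the positions the thumbnail decoder keeps."""
    return {(x, y): clip8(diagonal_pred(top, x, y) + residual[(x, y)])
            for (x, y) in needed_positions(w, h)}


random.seed(1)
W = H = 16
top_ref = [random.randint(0, 255) for _ in range(W + H)]
res = {(x, y): random.randint(-20, 20) for y in range(H) for x in range(W)}
full = full_intra_recon(top_ref, res, W, H)
part = partial_intra_recon(top_ref, res, W, H)
print(len(part), all(full[p] == v for p, v in part.items()))  # 32 True
```

Only 32 of the 256 samples are predicted, added and stored, which is the source of the saved memory copy operations described above.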

2.3. Memory Structures and Memory Access In real-timereal-time thumbnailthumbnail extraction,extraction, itit is is important important to to reduce reduce memory memory access access and and re- requirements.quirements. In thisIn this section, section, theproposed the propos partialed partial decoding decoding memory memory structure isstructure described. is described.In the proposed In the partial proposed decoding partial method, decoding a very method, small a number very small of pixels number are restoredof pixels over are restoredthe original over number the original of image number pixels. of Furthermore,image pixels. Furthermore, some part of the some restored part of pixels the restored is used pixelsfor the is thumbnail used for the image. thumbnail Therefore, image. theTherefore, method the of method storing theof storing reconstructed the reconstructed pixels in pixelsthe original in the memoryoriginal memory structure structure for partial for decoding partial decoding is inefficient is inefficient because because it leaves it aleaves large aamount large amount of memory of memory unused. Theunused. proposed The prop memoryosed structure memory does structure not allocate does not memory allocate for pixels whose restoration operations are omitted. Because reference pixels are not re-used, a memory for pixels whose restoration operations are omitted. Because reference pixels are new reference pixel line can be overwritten. For the proposed thumbnail extraction, one not re-used, a new reference pixel line can be overwritten. For the proposed thumbnail thumbnail buffer and two reference line buffers are employed rather than a full recon- extraction, one thumbnail buffer and two reference line buffers are employed rather than struction frame buffer. The thumbnail buffer resolution is 1/64 of the original one. The a full reconstruction frame buffer. 
The thumbnail buffer resolution is 1/64 of the original reference line buffer is composed of the left and top line buffers. The left line buffer has the one. The reference line buffer is composed of the left and top line buffers. The left line maximum block height, and the top line buffer has a pixel value that corresponds to the buffer has the maximum block height, and the top line buffer has a pixel value that width of the original image. corresponds to the width of the original image. After the partial intra prediction and the partial inverse transformation are performed, After the partial intra prediction and the partial inverse transformation are the predicted and residual signals are summed up, thereby restoring the thumbnail and performed, the predicted and residual signals are summed up, thereby restoring the reference pixels. Among the restored pixels, the thumbnail pixels are to be included in the thumbnail and reference pixels. Among the restored pixels, the thumbnail pixels are to be output thumbnail image, and one pixel per inner 8 × 8 sub-block is extracted and stored in includedthe thumbnail in the buffer. output The thumbnail reference pixels image, are and referred one topixel as anper input inner of the8 × intra8 sub-block prediction, is extractedand they areand divided stored in into the two thumbnail reference buffer. line buffers The reference and stored. pixels The are restored referred rightmost to as an inputpixels of thethe blockintra areprediction, stored in and the leftthey reference are divided line buffer, into two and reference the lowest-order line buffers pixels and are stored.stored inThe the restored upper reference rightmost line pixels buffer. of the The bloc referencek are stored line buffers in the left are reference used for consecutive line buffer, andblock the reconstructions. lowest-order pixels are stored in the upper reference line buffer. 
The reference line buffers are reused for consecutive block reconstructions. Figure 7 shows an example of the proposed memory structure when extracting a thumbnail from a 3840 × 2160 video. The thumbnail buffer has a resolution of 480 × 270 (129,600 pixels), and the top and left reference line buffers hold 3840 and 64 pixels, respectively. Therefore, the memory required for the restored pixels is reduced by 98%, from 8,294,400 to 133,504 pixels.
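The memory figures above follow from simple arithmetic, reproduced by the helper below. The function name is hypothetical; like the text, it counts pixels of a single plane and assumes a 64-pixel maximum block height and one thumbnail pixel per 8 × 8 sub-block.

```python
def buffer_footprint(width, height, block_max=64, sub=8):
    """Return (full-frame pixels, thumbnail + line-buffer pixels) for one plane."""
    full = width * height
    thumb = (width // sub) * (height // sub)  # one pixel per sub x sub block: 1/64 for sub = 8
    lines = width + block_max                 # top line buffer + left line buffer
    return full, thumb + lines


full, proposed = buffer_footprint(3840, 2160)
print(full, proposed, f"{1 - proposed / full:.1%}")  # 8294400 133504 98.4%
```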


Figure 7. The proposed memory structure for a 3840 × 2160 input bitstream.

3. Experimental Results and Discussion

To evaluate the performance of the proposed fast thumbnail extraction method, we implemented the proposed algorithm for H.264/AVC, HEVC and VP9 on FFmpeg version 4.2.2 [16]. To evaluate thumbnail extraction with the FFmpeg decoder fairly, we also used ffmpegthumbnailer [17], open software that drives the FFmpeg decoders and a down-sampler for thumbnail extraction, as shown in Figure 8. The proposed algorithm also runs through this software, so both methods share the same interface and are evaluated under the same conditions.

Figure 8. A block diagram for performance evaluation.


Table 3. Experimental conditions.

| Component | Specification |
|---|---|
| CPU | Intel® Core™ i7-8700K Processor @ 3.70 GHz |
| RAM | 16.0 GB |
| OS | Windows 10 64-bit; Windows Subsystem for Linux (Ubuntu 18.04 LTS) |
| Software | FFmpeg 4.2.2 |

The experiment was conducted in a virtual Linux environment (Windows Subsystem for Linux on Windows 10 64-bit) with a 3.70 GHz processor and 16.0 GB of memory, as listed in Table 3. Six test sequences from classes A1 and A2 of the common test conditions for versatile video coding (VVC) [18] were selected and encoded with H.264/AVC, HEVC and VP9 encoders. Table 4 shows the bit rates of the resulting test bitstreams.

Table 4. Bit rates of the test bitstreams coded with H.264/AVC, HEVC and VP9.

| Sequence name | H.264/AVC (Mb/s) | HEVC (Mb/s) | VP9 (Mb/s) |
|---|---|---|---|
| Campfire | 54.7 | 24.6 | 45.0 |
| FoodMarket4 | 19.4 | 6.4 | 6.5 |
| Tango2 | 23.6 | 10.1 | 16.4 |
| CatRobot | 20.7 | 9.8 | 20.3 |
| DaylightRoad2 | 25.1 | 12.0 | 28.7 |
| ParkRunning3 | 72.4 | 39.3 | 47.9 |

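For the performance evaluation, the thumbnail of the first frame of each test sequence was extracted from the H.264/AVC, HEVC and VP9 bitstreams, and the extraction times were compared. The time saving (TS), which measures the reduction in thumbnail extraction time, is defined as

$$\mathrm{TS} = \frac{T_{\mathrm{org}} - T_{\mathrm{proposed}}}{T_{\mathrm{org}}} \times 100\ (\%) \qquad (2)$$

where $T_{\mathrm{org}}$ and $T_{\mathrm{proposed}}$ are the thumbnail extraction times of the conventional (full decoding) method and the proposed method, respectively.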
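Equation (2) is a plain percentage reduction. The hypothetical helper below evaluates it for one entry of Table 5 as a worked example.

```python
def time_saving(t_org_ms, t_proposed_ms):
    """Time saving (TS) of Equation (2), in percent."""
    return (t_org_ms - t_proposed_ms) / t_org_ms * 100


# 'Campfire' H.264/AVC timings from Table 5: 256 ms (conventional) and 88 ms (proposed).
print(round(time_saving(256, 88)))  # 66
```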

Tables 5–7 compare the thumbnail extraction time of the proposed method with that of the conventional method for each codec. The proposed algorithm reduced the thumbnail extraction time by 66% for H.264/AVC bitstreams, 52% for HEVC bitstreams and 48% for VP9 bitstreams. The time savings differ somewhat among codecs because the proposed algorithm targets the inverse transform and intra prediction stages, and in H.264/AVC these stages consumed more computing time than in the other codecs. The time savings also vary with image characteristics, because the proportions of transform-skipped blocks and zero residuals affect the thumbnail extraction time. Table 8 shows the peak signal-to-noise ratio (PSNR) of the thumbnails relative to those of the conventional full decoding method for each codec. The PSNR varies considerably across sequences. This is because the proposed method stores only the pixels needed for the thumbnail (one pixel per 8 × 8 sub-block) in the thumbnail buffer and omits the down-sampling process. Sequences containing complex textures therefore suffer from aliasing, and their PSNR can be lower than that of the others. Nevertheless, the thumbnails generated by the proposed method had sufficient visual quality for commercial thumbnail applications.
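The aliasing argument can be illustrated with a toy comparison. The sketch below assumes 8-bit PSNR with a peak value of 255 and uses an 8 × 8 box average as a stand-in for the conventional down-sampler; the scaling filter actually applied by ffmpegthumbnailer is not described here, and all function names are hypothetical. The one-pixel-per-sub-block sampler keeps the bottom-right pixel, which is also an assumption. A flat region yields identical thumbnails, whereas a fine stripe pattern, which point sampling aliases, yields a very low PSNR.

```python
import math


def psnr(a, b, peak=255):
    """PSNR in dB between two equally sized 8-bit images given as lists of rows."""
    sq = [(p - q) ** 2 for row_a, row_b in zip(a, b) for p, q in zip(row_a, row_b)]
    mse = sum(sq) / len(sq)
    return math.inf if mse == 0 else 10 * math.log10(peak * peak / mse)


def point_sample(img, f=8):
    """Keep one pixel per f x f sub-block without filtering (bottom-right position assumed)."""
    return [row[f - 1::f] for row in img[f - 1::f]]


def box_downsample(img, f=8):
    """Average each f x f sub-block; a stand-in for a conventional low-pass down-sampler."""
    h, w = len(img) // f, len(img[0]) // f
    return [[round(sum(img[y * f + i][x * f + j] for i in range(f) for j in range(f)) / f ** 2)
             for x in range(w)] for y in range(h)]


flat = [[100] * 64 for _ in range(64)]
stripes = [[255 if c % 2 else 0 for c in range(64)] for _ in range(64)]  # 1-pixel stripes
print(psnr(point_sample(flat), box_downsample(flat)))        # inf
print(psnr(point_sample(stripes), box_downsample(stripes)))  # roughly 6 dB
```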

Table 5. Comparison of the extraction times for H.264/AVC bitstreams. TS: time saving.

| Sequence name | Original (ms) | Proposed (ms) | TS |
|---|---|---|---|
| Campfire | 256 | 88 | 66% |
| FoodMarket4 | 191 | 69 | 64% |
| Tango2 | 209 | 72 | 66% |
| CatRobot | 238 | 84 | 65% |
| DaylightRoad2 | 234 | 84 | 64% |
| ParkRunning3 | 319 | 103 | 68% |
| Average | | | 66% |

Table 6. Comparison of the extraction times for HEVC bitstreams.

| Sequence name | Original (ms) | Proposed (ms) | TS |
|---|---|---|---|
| Campfire | 178 | 94 | 47% |
| FoodMarket4 | 159 | 75 | 53% |
| Tango2 | 156 | 66 | 51% |
| CatRobot | 178 | 84 | 53% |
| DaylightRoad2 | 184 | 88 | 58% |
| ParkRunning3 | 247 | 122 | 52% |
| Average | | | 52% |

Table 7. Comparison of the extraction times for VP9 bitstreams.

| Sequence name | Original (ms) | Proposed (ms) | TS |
|---|---|---|---|
| Campfire | 300 | 175 | 42% |
| FoodMarket4 | 259 | 116 | 45% |
| Tango2 | 256 | 141 | 54% |
| CatRobot | 297 | 163 | 55% |
| DaylightRoad2 | 344 | 178 | 45% |
| ParkRunning3 | 444 | 206 | 48% |
| Average | | | 48% |

Table 8. PSNR (dB) compared with the conventional method for each codec.

| Sequence name | H.264/AVC | HEVC | VP9 |
|---|---|---|---|
| Campfire | 24.02 | 25.05 | 24.07 |
| FoodMarket4 | 31.48 | 31.84 | 31.13 |
| Tango2 | 28.17 | 28.62 | 28.07 |
| CatRobot | 23.74 | 24.37 | 23.93 |
| DaylightRoad2 | 25.26 | 25.70 | 25.29 |
| ParkRunning3 | 21.82 | 22.53 | 22.49 |
| Average | 25.75 | 26.35 | 25.83 |

Figures 9–11 compare the thumbnails extracted by the proposed method with those produced by the conventional method, which performs down-sampling after full decoding, for the ‘Tango2’, ‘Campfire’ and ‘ParkRunning3’ sequences. These are 4K sequences of 3840 × 2160 pixels, and the width and height of the thumbnails are 1/8 of the original ones. Because intra prediction uses the reconstructed pixels at the upper and left neighboring boundaries, an error in the boundary pixels would propagate to subsequent blocks; in the worst case, it could propagate up to the last coding block of the slice or picture. The proposed algorithm therefore reconstructs the boundary pixels with the same inverse transforms and prediction as full decoding, and stores them efficiently in the compact reference line buffers. In addition, because the thumbnail is a low-resolution image, the quality degradation of the proposed method was insignificant, and visual differences were difficult to see.
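To make the propagation argument concrete, the toy below decodes a single row of 4 × 4 blocks using only horizontal intra prediction (each pixel row copies the left neighbor's rightmost pixel), followed by residual addition and 8-bit clipping. The function is hypothetical and far simpler than any of the three codecs; it only shows that an error in the boundary pixels used for prediction is carried into every later block that predicts from them, which is why the boundary pixels must be reconstructed with the same transforms and prediction as full decoding.

```python
def decode_row(left, residuals):
    """Toy decoder for one row of blocks using horizontal intra prediction only.

    left: reconstructed boundary column to the left of the first block.
    residuals: decoded residual blocks (lists of rows), in decoding order.
    """
    blocks = []
    for res in residuals:
        # Horizontal prediction copies left[r] across row r; add the residual and clip to 8 bits.
        rec = [[min(255, max(0, left[r] + v)) for v in row] for r, row in enumerate(res)]
        blocks.append(rec)
        left = [row[-1] for row in rec]  # the rightmost column becomes the next reference
    return blocks


res = [[(r + c) % 3 - 1 for c in range(4)] for r in range(4)]  # small 4x4 residual
exact = decode_row([100] * 4, [res] * 8)
wrong = decode_row([103] * 4, [res] * 8)  # boundary pixels off by +3
last_err = {w - e for row_w, row_e in zip(wrong[-1], exact[-1]) for w, e in zip(row_w, row_e)}
print(last_err)  # {3}
```

The error is still present in full after eight blocks because nothing in the prediction chain corrects it; only clipping at 0 or 255 could change it.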

[Figure 9 panels: Original and Proposed thumbnails for H.264, HEVC and VP9]

Figure 9. Subjective quality performance for the ‘Tango2’ sequence.

[Figure 10 panels: Original and Proposed thumbnails for H.264, HEVC and VP9]

Figure 10. Subjective quality performance for the ‘Campfire’ sequence.


[Figure 11 panels: Original and Proposed thumbnails for H.264, HEVC and VP9]

Figure 11. Subjective quality performance for the ‘ParkRunning3’ sequence.

4. Conclusions

The conventional thumbnail extraction method consists of a decoding stage and a down-sampling stage. However, the decoding and down-sampling processes have high computational complexity and memory usage. In this paper, we proposed a partial decoding method and a memory structure for high-speed thumbnail extraction. The proposed partial decoding method replaces the inverse transform and intra prediction of the decoding process with a partial inverse transform and partial intra prediction. The computational complexity is reduced by restoring only one pixel per inner 8 × 8 sub-block, together with the rightmost and lowermost pixels of each block. In addition, we designed a memory structure suited to the partial decoding process. By replacing the full restoration buffer with a low-resolution thumbnail buffer and two reference line buffers for intra prediction, the proposed memory structure reduces the restoration buffer required for 4K videos by 98%.

To evaluate the performance of the proposed fast thumbnail extraction method, we implemented the proposed algorithm in the FFmpeg H.264/AVC, HEVC and VP9 decoders and compared its speed with that of the conventional thumbnail extraction implemented on FFmpeg. For 4K videos, we compared the running times for extracting the thumbnail of the first frame of each test sequence. The proposed method reduced the processing time by 66% for H.264/AVC, 52% for HEVC and 48% for VP9. In addition, it reduced the amount of required memory without noticeable loss of visual quality. The proposed algorithm was commercialized and implemented on an ARM processor for 2019 and 2020 LG televisions.

Author Contributions: Conceptualization, J.L., D.S.; methodology, D.S.; investigation, S.J., J.L.; software, J.B., S.J.; writing (original draft preparation), J.B., D.S.; writing (review and editing), J.B., D.S.; project administration, K.K.; supervision, D.S. All authors have read and agreed to the published version of the manuscript.

Funding: This research was supported by the Ministry of Science and ICT (MSIT), Korea, under the Information Technology Research Center (ITRC) support program (IITP-2020-2016-0-00288) supervised by the Institute for Information & Communications Technology Planning & Evaluation (IITP) and LG Electronics “Development of extraction technique of high-speed video thumbnail”.

Conflicts of Interest: The authors declare no conflict of interest.


References

1. Tudor, P. MPEG-2 video compression. Electron. Commun. Eng. J. 1995, 7, 257–264.
2. Wiegand, T.; Sullivan, G.; Bjontegaard, G.; Luthra, A. Overview of the H.264/AVC video coding standard. IEEE Trans. Circuits Syst. Video Technol. 2003, 13, 560–576.
3. Sullivan, G.J.; Ohm, J.; Han, W.J.; Wiegand, T. Overview of the high efficiency video coding (HEVC) standard. IEEE Trans. Circuits Syst. Video Technol. 2012, 22, 1649–1668.
4. Bankoski, J.; Wilkins, P.; Xu, Y. Technical overview of VP8, an open source video codec for the web. In Proceedings of the 2011 IEEE International Conference on Multimedia and Expo (ICME), Washington, DC, USA, 11–15 July 2011; pp. 1–6.
5. Mukherjee, D.; Bankoski, J.; Grange, A.; Han, J.; Koleszar, J.; Wilkins, P.; Xu, Y.; Bultje, R. The latest open-source video codec VP9-an overview and preliminary results. In Proceedings of the Picture Coding Symposium (PCS), San Jose, CA, USA, 8–11 December 2013; pp. 390–393.
6. Jo, H.H.; Sim, D.G.; Jeon, B.W. Hybrid parallelization for HEVC decoder. In Proceedings of the 2013 6th International Congress on Image and Signal Processing (CISP), Hangzhou, China, 16–18 December 2013.
7. Ryu, H.C.; Ahn, Y.J.; Mok, J.S.; Sim, D.G. Performance Analysis of HEVC Parallelization Methods for High-Resolution Videos. IEIE Trans. Smart Process. Comput. 2015, 4, 28–34.
8. Lee, J.Y.; Moon, S.K.; Sung, W.Y. H.264 decoder optimization exploiting SIMD instructions. In Proceedings of the IEEE Asia-Pacific Conference on Circuits and Systems (APCCAS), Tainan, Taiwan, 6–9 December 2004.
9. Chi, C.C.; Alvarez-Mesa, M.; Bross, B.; Juurlink, B.; Schierl, T. SIMD acceleration for HEVC decoding. IEEE Trans. Circuits Syst. Video Technol. 2014, 25, 841–855.
10. Kim, E.S.; Um, T.W.; Oh, S.J. A fast thumbnail extraction method in H.264/AVC video streams. IEEE Trans. Consumer Electron. 2009, 55, 1424–1430.
11. Kim, M.H.; Lee, H.J.; Sull, S.H. Fast thumbnail generation in integer DCT domain for H.264/AVC. IEEE Trans. Consumer Electron. 2011, 57, 589–596.
12. Chen, C.; Wu, P.H.; Chen, H. Transform-Domain Intra Prediction for H.264. In Proceedings of the IEEE International Symposium on Circuits and Systems, Kobe, Japan, 23–26 May 2005; pp. 1497–1500.
13. Lee, W.J.; Jeon, G.G.; Jeong, J.C. Fast thumbnail extraction algorithm for HEVC. In Proceedings of the 2015 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 9–12 January 2015; pp. 321–322.
14. Chen, W.H.; Smith, C.H.; Fralick, S.C. A fast computational algorithm for the discrete cosine transform. IEEE Trans. Commun. 1977, 25, 1004–1009.
15. Shen, S.; Shen, W.; Fan, Y.; Zeng, X. A unified 4/8/16/32-point integer IDCT architecture for multiple video coding standards. In Proceedings of the IEEE International Conference on Multimedia and Expo, Melbourne, VIC, Australia, 9–13 July 2012; pp. 788–793.
16. FFmpeg Software. Available online: https://www.ffmpeg.org/ (accessed on 2 February 2021).
17. Ffmpegthumbnailer Software. Available online: http://code.google.com/p/ffmpegthumbnailer/ (accessed on 2 February 2021).
18. Bossen, F. Common test conditions and software reference configurations. In Proceedings of the Joint Collaborative Team on Video Coding 12th Meeting, Geneva, Switzerland, 14–23 January 2015.