Low-Complexity Adaptive Block-Size Transform Based on Extended Transforms
Total Page:16
File Type:pdf, Size:1020Kb
LOW-COMPLEXITY ADAPTIVE BLOCK-SIZE TRANSFORM BASED ON EXTENDED TRANSFORMS Honggang Qi1,Wen Gao 1, Siwei Ma2, Debin Zhao 1 and Xiangyang Ji1 1Institute of Computing Technology, Chinese Academy of Sciences E-mail: {hgqi, wgao, dbzhao, xyji}@jdl.ac.cn 2Department of Electrical Engineering and Integrated Medias Center University of Southern California E-mail: swma@usc.edu ABSTRACT small transform prevents more ringing artifacts in coded pictures [2]. To exert the virtues of different transform In this paper, a low-complexity 8x8/4x4 Adaptive Block- sizes greatly, Adaptive Block-size Transform (ABT) size Transform (ABT) scheme is tentatively proposed for scheme is employed in image and video coding standards. the Chinese Audio and Video coding Standard (AVS)1. The scheme can improve the coding performance In the proposed ABT scheme, an integer 8x8 transform is significantly. For example, the average luma PSNR gain derived from the integer 4x4 transform used in AVS in high-definition sequences is up to 0.5dB in H.264/AVC. according to a transform extension principle. The 8x8 In ABT scheme, the transforms in different sizes are transform not only has high energy compacted property adaptively chosen by encoder. Usually, RD decision is but also can be merely implemented within several employed in encoder for choosing the better transform [2]. additions and shifts, and all intermediate results are With RD decision, the optimal coding performance can be limited with 16-bit. The 8x8 transform and 4x4 transform achieved in RD sense. Since two transform sizes are can be merged together and share the same scale matrix so employed in ABT scheme, two zig-zag scans are needed that the hardware units and storage resources are for corresponding transforms. Moreover, the RD decision efficiently saved for both encoder and decoder. The requires that the current block is coded twice with the two experimental results on numerous sequences show that the transforms. It is clear that the ABT scheme introduces proposed ABT scheme can achieves significant high complexity to encoder. However, in view of its high performance improvement for AVS. coding performance, the increased complexity is tolerable in some applications. Thus, ABT scheme is adopted into 1. INTRODUCTION Fidelity Range Extensions of H.264/AVC by Joint Video Team (JVT). The Discrete Cosine Transform (DCT) is an effective In this paper, the ABT scheme with extended transform coding tool for image and video coding. In transforms is proposed for Chinese Audio and Video hybrid video coding, it is used to convert the correlated coding Standard (AVS). In this scheme, the extended 8x8 prediction residuals to seldom correlated frequency transform is derived from the original 4x4 transform in domain coefficients for more efficient data compression. AVS. The extended 8x8 transform not only improves the Different coding standards employ different transforms. coding performance of ABT but also reduces the memory MPGE-2 adopts an 8x8 floating-point DCT, which is a requirements and hardware units for both encoder and real DCT with fine compression performance but high decoder. Since the computation and storage complexities complexity. H.264/AVC adopts a 4x4 integer DCT [1], of ABT mainly focus on the encoder and only some which has similar properties to real 4x4 DCT but relative storage complexities are increased in the decoder, the low complexity. Since the inverse DCT in the encoder and proposed method can reduce the complexities of decoder the decoder are defined by exact integer operations, the efficiently. Thus, the method is meaningful to some problem of mismatches is avoided. applications where the decoders are paid more attention The different transform sizes possess different than encoder. coding performances. Larger transform has a better The rest of this paper is organized as follows: In the energy compaction than smaller transform. However, the section 2, the concept of extended transforms is introduced, and an 8x8 transform which is an extension of 1 4x4 transform of AVS is designed. In section 3, the The Chinese audio and video coding standard, an emerging standard targeting at higher coding efficiency and lower complexity. implementation of ABT in AVS is briefly specified. In 1424403677/06/$20.00 ©2006 IEEE 129 ICME 2006 section 4, the experimental results on test sequences with A low-complexity integer 4x4 transform is employed in different coding characteristics are shown. Finally, the AVS-M (a version of AVS orient to mobile applications), conclusions are made by us. The transform is expressed as follows: 2. THE EXTENDED TRANSFORMS OF ABT ª 2 2 2 2º « 3 1 1 3» Extended transforms means that the NxN transform as a C4 « » (3) part involves in the 2Nx2N transform [4]. Taking 4x4 and « 2 2 2 2» « » 8x8 transform as an example, if the 4x4 transform is ¬ 1 3 3 1¼ expressed with (1), its extended 8x8 transform should be expressed as (2). The transform has a good energy compaction and low complexity. It can be implemented with merely several ª c0 c0 c0 c0 º addition and shift operations. In practice, the C4 is just a « » c2 c6 c6 c2 (1) core matrix of DCT. The norms of its some rows are D4 « » « c4 c4 c4 c4 » unequal. A scale matrix is needed for normalizing the core « » matrix. As the output of the transform and scale of the ¬ c6 c2 c2 c6 ¼ input X, Y is expressed as ª c c c c c c c c º 0 0 0 0 0 0 0 0 Y (C4u X uC4c) S4 (4) « c c c c c c c c » « 1 3 5 7 7 5 3 1» « c2 c6 c6 c2 c2 c6 c6 c2 » « » where the notation denotes element-by-element c c c c c c c c (2) D8 « 3 7 1 5 5 1 7 3 » multiplication, and S4 means the 4x4 scale matrix. The « c c c c c c c c » « 4 4 4 4 4 4 4 4 » scale matrix S4 can be calculated by « c5 c1 c7 c3 c3 c7 c1 c5 » « c c c c c c c c » 6 2 2 6 6 2 2 6 3 3 « » 2 2 ¬« c7 c5 c3 c1 c1 c3 c5 c7 ¼» S4x,y 1 ( ¦ C4x,j )u( ¦C4ic,y ) (5) j 0 i 0 It can be found from (1) and (2) that the even basis functions of 8x8 transform D8 inherit all basis functions 2.2. 8x8 transform design of 4x4 transform D4. i.e. D82i,j= D4i,j, (0İi, j<4). The relation can be more clearly expressed from their butterfly According to the requirement of performance, the integer flow graph in Fig.1, where the two 4x4 transforms D4II 8x8 transform should preserve the real DCT compaction and D4IV mean the II-type and IV-type DCT respectively property. Its basis functions of transform matrix should [3]. If D4II is identical to D4, it is said that D8 is extended approximate the real transform’s. Meanwhile, these basis from D4. functions should not be so large that the transform can be implemented within narrow bit range. Moreover, the integer 8x8 transform should be an extension of the integer 4x4 transform C4 so that the 8x8 transform can be shared by the 4x4/8x8 transforms. Thus, the integer 8x8 II D4x4 transform should meet the following three design rules: 1) The basis functions of each row of the designed 8x8 transform should correlate closely with corresponding floating-point basis functions of real 8x8 transform where the correlation is measured by the cosine of the included angle between them; D IV 4x4 2) The basis functions of the designed 8x8 transform matrix should be represented by several simple integers; 3) The designed 8x8 transform must be the extension of Fig.1. Fast DCT butterfly signal flow graph. 4x4 transform of AVS i.e. meeting the relation: C82x,y=C4x,y, 0İx, y<4. 2.1. 4x4 transform 130 According to the above rules, an integer 8x8 DCT is prediction mode. Moreover, numorous experimental expressed as follows: results show that ABT applying to chroma signal does not bring the significant coding gains. Thus, from the view ª 4 4 4 4 4 4 4 4º point of complexityˈ ABT scheme is not used in chroma « 6 6 3 2 2 3 6 6» signals. Unlike the conventional schemes where all 8x8, « » « 6 2 2 6 6 2 2 6» 8x4, 4x8 and 4x4 transforms are used, no more transforms « » are used in proposed scheme except for 8x8 and 4x4 6 2 6 3 3 6 2 6 C8 « » / 2 (6) transforms. Because the 8x4 and 4x8 transforms just « 4 4 4 4 4 4 4 4» « » obtain small PSNR gains while introducing large coding « 3 6 2 6 6 2 6 3» loads to the codec. « 2 6 6 2 2 6 6 2» RD optimizer is used to choose the transform size « » with the better coding performance for current coded « 2 3 6 6 6 6 3 2» ¬ ¼ block. Since the RD optimizer just works in the encoder, a one-bit flag is added into bitstream to signal the decoder where C8 is an integer orthogonal matrix. Its even basis which transform is used to coding the current block. For functions and those of integer 4x4 transform matrix C4 further reducing the complexity, the 4x4 and 8x8 zig-zag meet the relation, C82x,y=C4x,y. Thus, C8 is an extended scans, entropy codings and deblocking tools are also transform of C4. The odd basis functions of C4 consists of merged with some optimization algorithms. For example, integers (6, 6, 3, 2), which are so simple that dynamic the 8x8 quantized transform coefficients are split into four range is only increased by 5.1 bits (log234).