A Complete Video Coding Chain Based on Multi-Dimensional Discrete Cosine Transform

RADIOENGINEERING, VOL. 19, NO. 3, SEPTEMBER 2010 421 A Complete Video Coding Chain Based on Multi-Dimensional Discrete Cosine Transform Tomas FRYZA Department of Radio Electronics, Brno University of Technology, Purkynova 118, 612 00 Brno, Czech Republic [email protected] Abstract. The paper deals with a video compression method well in [5], [13] or [22]. Also the testing and comparison of based on the multi-dimensional discrete cosine transform. the 3-D DCT properties of several image and video compres- In the text, the encoder and decoder architectures including sion codecs could be found in literature, such as [2] or [21]. the definitions of all mathematical operations like the for- The contribution of the paper is to present a new Huffman ward and inverse 3-D DCT, quantization and thresholding dictionary, then the 3-D DCT implementation in digital sig- are presented. According to the particular number of cur- nal processor and testing of the method with comparison to rently processed pictures, the new quantization tables and other codecs, mainly to H.264 and VP8 from the new WebM entropy code dictionaries are proposed in the paper. The video container [20]. practical properties of the 3-D DCT coding chain compared The paper is divided into four major parts. The main with the modern video compression methods (such as H.264 emphasis is focused on the structure of the entire coding and WebM) and the computing complexity are presented as chain with a description of each coding block (Section 2). In well. It will be proved the best compress properties could be the second part, Section 3, the fast algorithms for 3-D DCT achieved by complex H.264 codec. On the other hand the calculation are presented. The experimental verification of computing complexity - especially on the encoding side - is the coding chain along with other video codecs comparison lower for the 3-D DCT method. is introduced in Section 4. The results discussion is depicted in Section 5. Keywords 2. Encoder Structure Video signal processing, data compression, discrete cosine transform, multidimensional coding, digital signal The basic 3-D DCT encoder and decoder could be de- processors. rived from the JPEG coder [18] and has only three main parts, as shown in Fig. 1: transform coding, frequency coefficients processing and entropy coding. The input of the 3-D DCT encoder f is formed as a video cube [15]. Video 1. Introduction cube is a technical term for 3-D matrix, with dimension of N made up of small elements of particular frames with the Video compression techniques have generally two ob- dimensions of N ×N pixels. The third (temporal) dimension jectives: to reduce the spatial redundancy among the pic- of the cube is commonly of the same size as the first (hor- ture elements and to reduce the temporal redundancy be- izontal) and the second (vertical) dimension. The reason is tween successive frames, i.e. interframe coding. The main the way of fast implementing method of the 3-D algorithm interframe coding principle is the predictive coding, which and the entropy coding (process of implementation with vari- is used in all major standard video codecs, such as H.261, able video cube dimension will be described later). Another H.263, MPEG-1, MPEG-2 and MPEG-4 [7], [8], [16], [19]. reason is the reachable compress ration: the smaller N in Video compression with the 3-D DCT (Three-Dimensional temporal dimension, the smaller compress ratio. Discrete Cosine Transform) contains both types of correlation reduction in a single transform coding. This transform For a color sequence three independent coders are used: is a block based method and requires storage of N picture one for the luminance signal Y and two for the chrominance frames but can provide an acceptable picture quality with signals Cb and Cr, respectively. Each signal is transformed high compression ratio [6], [15]. The basic structure of the into frequency domain using the 3-D DCT forward trans- 3-D DCT encoder and decoder is known and it was published form. The values of the frequency coefficients D are subse- for example in [15] or [21]. Several improvements mainly in quently reduced by quantization (DQ) and in some cases by quantization and entropy coding blocks for 2-D and 3-D co- thresholding as well (DQT ). Non-zero coefficients are finally efficients were proposed in [1], [11] or [12]. Fast methods encoded by an entropy coder and formed into an output bit for DCT and Huffman coding algorithms were published as stream F (see Fig. 1). 422 T. FRYZA, A COMPLETE VIDEO CODING CHAIN BASED ON MULTI-DIMENSIONAL DISCRETE COSINE TRANSFORM AC value vs. AC index fx;y;z f˜x;y;z Church 1500 Transform Transform encoding decoding 1000 Du;v;w D˜ u;v;w 500 Quantization Quantization Dequantization tables Qu;v;w 0 Q Du;v;w AC value [-] -500 Thresholding -1000 QT QT Du;v;w D˜ u;v;w -1500 Entropy Huffman Entropy 0 100 200 300 400 500 encoding code dictionaries decoding AC index [-] Fig. 2. Distribution of the AC coefficients for a slowly varying F F˜ sequence. Fig. 1. Video coding chain based on 3-D discrete cosine transform. AC value vs. AC index Wind For video cube dimensions of N, the forward 3-D DCT 1500 is defined in the following way [15] 1000 500 N−1 N−1 N−1 Du;v;w = gugvgw · ∑ ∑ ∑ fx;y;z · (1) 0 x=0 y=0 z=0 AC value [-] pu(2x + 1) pv(2y + 1) pw(2z + 1) cos · cos · cos -500 2N 2N 2N -1000 where Du;v;w represents 3-D DCT coefficient of a picture el- -1500 0 100 200 300 400 500 ement fx;y;z while u;v;w = 0;1;:::;N − 1. The constants g AC index [-] could be expressed as follows Fig. 3. Distribution of the AC coefficients for a fast varying sequence. g = p1=N : u;v;w = 0 u;v;w (2) p =N u;v;w 6= : 2 : 0 An inverse 3-D DCT transforms the coefficients from frequency to time domain and could be defined in the fol- The coefficient with coordinates of (0;0;0) is called lowing form [15]: the DC (Direct Current) coefficient and all others are called AC (Alternating Current) coefficients. It can be seen the N−1 N−1 N−1 total number of the 3-D DCT coefficients is equal to the ˜ ˜ fx;y;z = ∑ ∑ ∑ gugvgw · Du;v;w · (3) number of input pixels. It can be proved that the number u=0 v=0 w=0 of bits for representing the frequency coefficients is higher pu(2x + 1) pv(2y + 1) pw(2z + 1) cos · cos · cos : than for representing pixels in the time domain. The ade- 2N 2N 2N quate bit number for a real DC coefficient is equal to 13 bits and equal to 12 bits for an AC coefficient; i.e. symmetric The notation D˜ corresponds to the frequency coefficients DC 2 h −4,095; 4,095i and AC 2 h −2,047; 2,047i. The ef- modifications during the quantization and thresholding pro- fect of the transform coding is in reducing correlation be- cess, respectively and the erroneous transmission of encoded tween picture elements. The main energy of transformed sequences. values is centralized into the low frequency coefficients. It is known that fast varying sequences with high details have In the following text, the remaining coder blocks will increased number of non-zero coefficients. An example of be described. In Subsection 2.1 the quantization followed by AC coefficient distribution of a slowly and fast varying se- thresholding are outlined. The entropy coding is introduced quences are shown in Fig. 2 and in Fig. 3, respectively. in Subsection 2.2. RADIOENGINEERING, VOL. 19, NO. 3, SEPTEMBER 2010 423 2.1 Quantization and Thresholding coefficients are reordered using the modified zig-zag scan- ning, followed by run-length coding and finished by an end- According to the encoder block scheme, the second of-block identifier (EOB). Assuming that N = 8, one cube part of the encoder includes a quantization. There are two contains exactly 511 AC coefficients. Similarly to DC co- objectives of this operation: the first is to reduce the dy- efficients, the whole interval of possible values is divided namic range of the coefficients and the second is to decrease into 11 subintervals and theoretically, run-length values vary the number of insignificant coefficients, thus to increase the between 0 and 510. Thus the proposed code dictionary for compress ratio. This aim can be described by the following the AC coefficients contains 511×11+EOB = 5,622 unique equation codewords. The outlook of the particular codewords for AC coefficients with run-length equal to 0 is shown in Tab. 1. Q Du;v;w Du;v;w = (4) According to the extensive code dictionary, the less com- Qu;v;w mon AC values are represented by wide codewords. For Q where Du;v;w is a quantized 3-D DCT coefficient, Qu;v;w is example, a codeword for subinterval j256÷511j is 15 bits a quantization value from interval h1; 255i and operation wide, for j512÷1,023j 26 bits wide and for the subinterval b·c represents rounding down. In general, the quantization of j1,024÷2,047j, 27 bits are needed. In output bit stream, causes lossy processing. The greater the quantization value, each codeword is followed by the number identifies the co- the higher is the impact in decompressed video sequence efficient value within the subinterval.

A Complete Video Coding Chain Based on Multi-Dimensional Discrete Cosine Transform

Video Codec Requirements and Evaluation Methodology

(A/V Codecs) REDCODE RAW (.R3D) ARRIRAW

Encoding H.264 Video for Streaming and Progressive Download

Google Chrome Browser Dropping H.264 Support 14 January 2011, by John Messina

Qoe Based Comparison of H.264/AVC and Webm/VP8 in Error-Prone Wireless Networkqoe Based Comparison of H.264/AVC and Webm/VP8 In

Digital Audio Processor Vp-8

CALIFORNIA STATE UNIVERSITY, NORTHRIDGE Optimized AV1 Inter

Video Compression and Communications: from Basics to H.261, H.263, H.264, MPEG2, MPEG4 for DVB and HSDPA-Style Adaptive Turbo-Transceivers

Comparison of Video Compression Standards

VP8 Vs VP9 – Is This About Quality Or Bitrate?

20.1 Data Sheet - Supported File Formats

A Dataflow Description of ACC-JPEG Codec