Entropy Coding
Video Codec Design, Iain E. G. Richardson. Copyright © 2002 John Wiley & Sons, Ltd. ISBNs: 0-471-48553-5 (Hardback); 0-470-84783-2 (Electronic)

8.1 INTRODUCTION

A video encoder contains two main functions: a source model that attempts to represent a video scene in a compact form that is easy to compress (usually an approximation of the original video information) and an entropy encoder that compresses the output of the model prior to storage and transmission. The source model is matched to the characteristics of the input data (images or video frames), whereas the entropy coder may use ‘general-purpose’ statistical compression techniques that are not necessarily unique in their application to image and video coding.

As with the functions described earlier (motion estimation and compensation, transform coding, quantisation), the design of an entropy CODEC is affected by a number of constraints including:
1. Compression efficiency: the aim is to represent the source model output using as few bits as possible.
2. Computational efficiency: the design should be suitable for implementation on the chosen hardware or software platform.
3. Error robustness: if transmission errors are likely, the entropy CODEC should support recovery from errors and should (if possible) limit error propagation at the decoder (this constraint may conflict with (1) above).

In a typical transform-based video CODEC, the data to be encoded by the entropy CODEC falls into three main categories: transform coefficients (e.g. quantised DCT coefficients), motion vectors and ‘side’ information (headers, synchronisation markers, etc.). The method of coding side information depends on the standard. Motion vectors can often be represented compactly in a differential form due to the high correlation between vectors for neighbouring blocks or macroblocks. Transform coefficients can be represented efficiently with ‘run-level’ coding, exploiting the sparse nature of the DCT coefficient array.
An entropy encoder maps input symbols (for example, run-level coded coefficients) to a compressed data stream. It achieves compression by exploiting redundancy in the set of input symbols, representing frequently occurring symbols with a small number of bits and infrequently occurring symbols with a larger number of bits. The two most popular entropy encoding methods used in video coding standards are Huffman coding and arithmetic coding. Huffman coding (or ‘modified’ Huffman coding) represents each input symbol by a variable-length codeword containing an integral number of bits. It is relatively straightforward to implement, but cannot achieve optimal compression because of the restriction that each codeword must contain an integral number of bits. Arithmetic coding maps an input symbol into a fractional number of bits, enabling greater compression efficiency at the expense of higher complexity (depending on the implementation).

8.2 DATA SYMBOLS

8.2.1 Run-Level Coding

The output of the quantiser stage in a DCT-based video encoder is a block of quantised transform coefficients. The array of coefficients is likely to be sparse: if the image block has been efficiently decorrelated by the DCT, most of the quantised coefficients in a typical block are zero. Figure 8.1 shows a typical block of quantised coefficients from an MPEG-4 ‘intra’ block. A few non-zero coefficients remain after quantisation, mainly clustered around DCT coefficient (0, 0): this is the ‘DC’ coefficient and is usually the most important coefficient to the appearance of the reconstructed image block.

The block of coefficients shown in Figure 8.1 may be efficiently compressed as follows:
1. Reordering. The non-zero values are clustered around the top left of the 2-D array and this stage groups these non-zero values together.
2. Run-level coding.
This stage attempts to find a more efficient representation for the large number of zeros (48 in this case).
3. Entropy coding. The entropy encoder attempts to reduce the redundancy of the data symbols.

[Figure 8.1 Block of quantised coefficients (intra-coding)]

Reordering

The optimum method of reordering the quantised data depends on the distribution of the non-zero coefficients. If the original image (or motion-compensated residual) data is evenly distributed in the horizontal and vertical directions (i.e. there is not a predominance of ‘strong’ image features in either direction), then the significant coefficients will also tend to be evenly distributed about the top left of the array (Figure 8.2(a)). In this case, a zigzag reordering pattern such as Figure 8.2(c) should group together the non-zero coefficients efficiently. However, in some cases an alternative pattern performs better. For example, a field of interlaced video tends to vary more rapidly in the vertical than in the horizontal direction (because it has been vertically subsampled). In this case the non-zero coefficients are likely to be ‘skewed’ as shown in Figure 8.2(b): they are clustered more to the left of the array (corresponding to basis functions with a strong vertical variation, see for example Figure 7.4). A modified reordering pattern such as Figure 8.2(d) should perform better at grouping the coefficients together.

[Figure 8.2 Typical data distributions and reordering patterns: (a) even distribution (typical coefficient map: frame coding); (b) field distribution (typical coefficient map: field coding); (c) zigzag; (d) modified zigzag]

Run-level coding

The output of the reordering process is a linear array of quantised coefficients. Non-zero coefficients are mainly grouped together near the start of the array and the remaining values in the array are zero.
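The zigzag reordering described above can be sketched in Python as follows. This is a minimal illustration, not the scan table of any particular standard: the function names `zigzag_order` and `zigzag_scan` are hypothetical, positions are written as (row, column), and the scan is assumed to follow the conventional zigzag that starts at the DC position and first steps right along the top row. The modified pattern of Figure 8.2(d) is not reproduced here.

```python
def zigzag_order(n=8):
    """Return the (row, col) positions of an n x n block in zigzag scan order."""
    order = []
    for s in range(2 * n - 1):
        # s = row + col identifies one anti-diagonal of the block.
        rows = range(max(0, s - n + 1), min(s, n - 1) + 1)
        if s % 2 == 0:
            # Even diagonals are walked from bottom-left to top-right.
            rows = reversed(rows)
        order.extend((r, s - r) for r in rows)
    return order


def zigzag_scan(block):
    """Flatten a square 2-D block (a list of rows) into a 1-D list in zigzag order."""
    return [block[r][c] for r, c in zigzag_order(len(block))]
```

For an 8 x 8 block the scan begins at (0, 0), (0, 1), (1, 0), (2, 0), (1, 1), (0, 2) and ends at (7, 7), so coefficients near the top left of the array come first in the output list.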
Long sequences of identical values (zeros in this case) can be represented as a (run, level) code, where (run) indicates the number of zeros preceding a non-zero value and (level) indicates the sign and magnitude of the non-zero coefficient. The following example illustrates the reordering and run-level coding process.

Example

The block of coefficients in Figure 8.1 is reordered with the zigzag scan shown in Figure 8.2 and the reordered array is run-level coded.

Reordered array: [102, -33, 21, -3, -2, -3, -4, -3, 0, 2, 1, 0, 1, 0, -2, -1, -1, 0, 0, 0, -2, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0 ...]

Run-level coded: (0, 102) (0, -33) (0, 21) (0, -3) (0, -2) (0, -3) (0, -4) (0, -3) (1, 2) (0, 1) (1, 1) (1, -2) (0, -1) (0, -1) (4, -2) (11, 1)

Two special cases need to be considered. Coefficient (0, 0) (the ‘DC’ coefficient) is important to the appearance of the reconstructed image block and has no preceding zeros. In an intra-coded block (i.e. coded without motion compensation), the DC coefficient is rarely zero and so is treated differently from other coefficients. In an H.263 CODEC, intra-DC coefficients are encoded with a fixed, relatively low quantiser setting (to preserve image quality) and without (run, level) coding. Baseline JPEG takes advantage of the property that neighbouring image blocks tend to have similar mean values (and hence similar DC coefficient values) and each DC coefficient is encoded differentially from the previous DC coefficient.

The second special case is the final run of zeros in a block. Coefficient (7, 7) is usually zero and so we need a special case to deal with the final run of zeros that has no terminating non-zero value. In H.261 and baseline JPEG, a special code symbol, ‘end of block’ or EOB, is inserted after the last (run, level) pair.
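A minimal Python sketch of (run, level) coding with an end-of-block symbol is given below. The names `run_level_encode`, `run_level_decode` and the string marker `EOB` are hypothetical and chosen only for illustration; no codeword assignment is attempted. For simplicity the DC coefficient is coded like any other value, as in the worked example, rather than being handled separately as the standards mentioned above do.

```python
EOB = "EOB"  # hypothetical end-of-block marker, not a codeword from any standard


def run_level_encode(coeffs):
    """Convert a reordered coefficient list into (run, level) pairs followed by EOB."""
    symbols = []
    run = 0
    for value in coeffs:
        if value == 0:
            run += 1                      # count zeros preceding the next non-zero value
        else:
            symbols.append((run, value))  # level carries both sign and magnitude
            run = 0
    symbols.append(EOB)                   # the final run of zeros is implied by EOB
    return symbols


def run_level_decode(symbols, length=64):
    """Rebuild the coefficient list, filling with zeros after EOB up to `length`."""
    coeffs = []
    for symbol in symbols:
        if symbol == EOB:
            break
        run, level = symbol
        coeffs.extend([0] * run)
        coeffs.append(level)
    coeffs.extend([0] * (length - len(coeffs)))
    return coeffs
```

Applied to the first 17 values of the reordered array in the example, padded with zeros to 64 entries, `run_level_encode` returns the first 14 pairs listed in the example followed by `EOB`, and `run_level_decode` restores the padded array.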
The EOB approach is known as ‘two-dimensional’ run-level coding since each code represents just two values (run and level). The method does not perform well under high compression: in this case, many blocks contain only a DC coefficient and so the EOB codes make up a significant proportion of the coded bit stream. H.263 and MPEG-4 avoid this problem by encoding a flag along with each (run, level) pair. This ‘last’ flag signifies the final (run, level) pair in the block and indicates to the decoder that the rest of the block should be filled with zeros. Each code now represents three values (run, level, last) and so this method is known as ‘three-dimensional’ run-level-last coding.

8.2.2 Other Symbols

In addition to run-level coded coefficient data, a number of other values need to be coded and transmitted by the video encoder. These include the following.

Motion vectors

The vector displacement between the current and reference areas (e.g. macroblocks) is encoded along with each data unit. Motion vectors for neighbouring data units are often very similar, and this property may be used to reduce the amount of information required to be encoded. In an H.261 CODEC, for example, the motion vector for each macroblock is predicted from the preceding macroblock. The difference between the current and previous vector is encoded and transmitted (instead of transmitting the vector itself). A more sophisticated prediction is formed during MPEG-4/H.263 coding: the vector for each macroblock (or block if the optional advanced prediction mode is enabled) is predicted from up to three previously transmitted motion vectors. This helps to further reduce the transmitted information. These two methods of predicting the current motion vector are shown in Figure 8.3.
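Differential motion vector coding can be sketched in Python as below. The function names are hypothetical, and vectors are represented as (x, y) tuples. The three candidate vectors are combined with a component-wise median, which is the form of predictor H.263 uses; the choice of which neighbours supply MV1, MV2 and MV3, and the handling of picture boundaries, are left out of this sketch.

```python
def median_predictor(mv1, mv2, mv3):
    """Component-wise median of three candidate motion vectors given as (x, y)."""
    return tuple(sorted(components)[1] for components in zip(mv1, mv2, mv3))


def differential_mv(current, predicted):
    """Difference (current minus predicted) that the encoder would transmit."""
    return (current[0] - predicted[0], current[1] - predicted[1])


def reconstruct_mv(diff, predicted):
    """Decoder side: add the received difference back to the same prediction."""
    return (diff[0] + predicted[0], diff[1] + predicted[1])
```

The decoder can form the same prediction because it only depends on vectors that have already been transmitted. For H.261-style prediction, the previous macroblock's vector would simply be passed as `predicted` without calling `median_predictor`.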
Example

Motion vector of current macroblock: x = +3.5, y = +2.0
Predicted motion vector from previous macroblocks: x = +3.0, y = 0.0
Differential motion vector (current minus predicted): dx = +0.5, dy = +2.0

[Figure 8.3 Motion vector prediction (H.261, H.263). H.261: predict MV of current macroblock from previous macroblock vector MV1. H.263/MPEG-4: predict MV of current macroblock from three previous macroblock vectors MV1, MV2 and MV3]

Quantisation parameter

In order to maintain a target bit rate, it is common for a video encoder to modify the quantisation parameter (scale factor or step size) during encoding.