AUTOMATIC GENERATION OF C++/JAVA CODE FOR BINARY ARITHMETIC CODING

Danny Hong and Alexandros Eleftheriadis

Columbia University, Dept. of Electrical Engineering, New York, NY 10027, USA

This material is based upon work supported in part by the National Science Foundation under Grant ACI-0313116.

ABSTRACT

Binary arithmetic coding is, compression-wise, the most effective statistical coding method used in image and video compression. It is being used for compressing bi-level images (JBIG, JBIG2, and MPEG-4 shape coding) and is also being utilized (optionally) for coding of continuous-tone images (JPEG) and video (H.264). Despite its wide use, different arithmetic coders are incompatible with each other, and application developers are faced with the difficult task of understanding and building each coder. We present a set of simple parameters that can be used to describe any binary arithmetic coder that is currently being deployed, and we also introduce a software tool for automatically generating C++/Java code for binary arithmetic coding according to the description.

1. INTRODUCTION

Huffman coding [1] is arguably the most widely used statistical compression mechanism for media representation (e.g., Group 3 [2] and Group 4 [3] facsimile, MPEG-1 [4], MPEG-2 [5], etc.). It is proven to be optimal among instantaneous (prefix) codes, as it can represent any given random variable within 1 bit of its entropy. Arithmetic coding [6, 7], derived from Elias coding [8], is another statistical coding method proven to yield better compression than Huffman coding; however, it has not been widely used for media coding due to its complexity and patent issues. The very first practical arithmetic coders were developed for compressing bi-level images (the Skew coder [9, 10] and the Q-Coder [11]), as Huffman coding cannot compress binary symbols unless groups of symbols are coded at a time. Run-length coding (e.g., Golomb coding [12]) is a good alternative coding method for binary symbols when the probability of one symbol is much higher than that of the other. Nevertheless, it is a static coding method, and for the best result over all possible binary source sequences, an adaptive binary arithmetic coder (BAC) yields better compression.

Even today, most practical arithmetic coders deal solely with binary alphabets: binary arithmetic coding is computationally simple, and it makes the use of higher-order conditioning models feasible. In some cases it is only natural to assume a binary source. For instance, JBIG [13], JBIG2 [14], and MPEG-4 shape coding [15] focus on coding of bi-level images (JBIG and JBIG2 can also be used for grayscale images, where bit-plane by bit-plane coding is applied), and for JPEG 2000 [16] and MPEG-4 texture coding [15], bit-plane coding is ultimately applied. On the other hand, arithmetic coding can optionally be used in JPEG [17] and H.264 [18], and in these cases each syntactic element is first binarized so that a BAC can be used. Despite such wide use of BACs, different BACs are generally incompatible with each other (e.g., the code string generated by an arithmetic encoder specified in JBIG cannot be correctly decoded by an arithmetic decoder specified for MPEG-4 shape coding); as a remedy, we present a unique solution that unifies binary arithmetic coders. We define a set of parameters that can be used to automatically generate different variants of BACs.

Arithmetic coding can be separated into two main parts: modeling and coding. The modeling part appropriately selects one or more structures for conditioning events, and gathers the relative frequencies of the conditioned events [9, 19], which correspond to the event probabilities. Modeling, by itself, is a huge topic, and numerous effective models have been introduced. The H.264 standard alone defines more than 300 models to account for the different structures each bit of the binarized syntactic elements might have. Consequently, unifying modeling is an extremely difficult task (if not impossible), and we focus only on the coding part.

Flavor [20, 21] is a language that has been developed to describe the syntax of any compressed bitstream so that the bitstream parsing and generation code can be automatically generated. Flavor already has constructs for describing variable-length codes, and we complement it by introducing a set of new constructs for describing binary arithmetic codes. Using Flavor with the new constructs, the coding part of any BAC can be easily described and the corresponding C++/Java code can be automatically generated. As a result, application developers can concentrate solely on the modeling part, which has been shown to have a much higher impact on the compression effectiveness of a BAC than the coding part.

The next section briefly describes the main concept behind binary arithmetic coding, Sections 3 and 4 present the constructs needed to describe practical BACs, and we conclude with Section 5.

2. BACKGROUND

A high-level pseudo-code describing the basic concept of binary arithmetic coding is depicted in Figure 1. The variable R represents the current interval (initially 1), and the interval is divided into two subintervals (R0 and R1) according to the probabilities of the two possible symbols (P0 and P1=1-P0). The variable L represents the lower bound of the current interval (the interval is represented as [L, L+R)), and if the symbol 0 is being coded, then, assuming that R0 is always above R1, the new interval is [L+R1, L+R1+R0); likewise, for symbol 1, the new interval is [L, L+R1). For each coded symbol the current interval gets subdivided, and at the end the minimum number of bits that can uniquely represent the final interval gets output as the code string. For decoding, the variable V represents the code string, and the decoder essentially mimics the encoding process to deduce the original symbols. X represents the current symbol being encoded/decoded.

(a) Encoding:
  1) R0 = R*P0
  2) R1 = R-R0
  3) if (X == 0) R = R0, L = L+R1
     else R = R1

(b) Decoding:
  1) R0 = R*P0
  2) R1 = R-R0
  3) if (V-L >= R1) R = R0, L = L+R1, X = 0
     else R = R1, X = 1

Fig. 1. Binary Elias coding.
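To make the pseudo-code of Figure 1 concrete, the following is a minimal C++ sketch of the (infinite-precision) Elias coding step. It is illustrative only; it is not the code generated by Flavor, and the class and member names (EliasCoder, encode, decode) are ours. Floating point is used purely for clarity; practical coders use integer arithmetic with renormalization (Section 3).

  // One binary Elias coding step (cf. Figure 1); illustrative sketch only.
  struct EliasCoder {
      double L = 0.0;  // lower bound of the current interval [L, L+R)
      double R = 1.0;  // width of the current interval

      // Encode one symbol X (0 or 1); P0 is the probability of symbol 0.
      void encode(int X, double P0) {
          double R0 = R * P0;      // subinterval width for symbol 0
          double R1 = R - R0;      // subinterval width for symbol 1
          if (X == 0) { R = R0; L = L + R1; }   // 0 takes the upper subinterval
          else        { R = R1; }               // 1 takes the lower subinterval
      }

      // Decode one symbol given the code string value V.
      int decode(double V, double P0) {
          double R0 = R * P0;
          double R1 = R - R0;
          if (V - L >= R1) { R = R0; L = L + R1; return 0; }
          R = R1;
          return 1;
      }
  };

Encoding a short symbol sequence with, say, P0=0.6 narrows [L, L+R) step by step; any value V inside the final interval identifies the sequence to the decoder.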
The current interval of an arithmetic coder is referred to as the state (or internal state) of the coder, and there are many ways to represent it. As in the above example, we can use L and R. Alternatively, let H be the upper (higher) bound of the current interval; then the interval can be represented by [L, H). There are also other representations, but as long as the intervals do not overlap, the decoder can yield correct symbols. Additionally, all units of the current interval should be assigned to one of the subintervals to maximize compression. The Flavor-generated code uses the [L, L+R) interval convention.

3. INTEGER ARITHMETIC CODING

To overcome the precision problem inherent in Elias coding, most practical arithmetic coders are implemented using integer arithmetic with renormalization [7, 11]. Though it is possible to use floating-point numbers, integer arithmetic is preferred for its simplicity and better portability. As a consequence of using integer arithmetic, the probabilities of the symbols are represented by respective counts (C0 and C1), and the corresponding integer binary arithmetic coding process is shown in Figure 2. In the following, we describe the set of parameters that can be set to describe any integer BAC.

(a) Encoding:
  1) R0 = R * C0 / (C0+C1)
  2) R1 = R-R0
  3) if (X == 0) R = R0, L = L+R1
     else R = R1
  4) renormalize

(b) Decoding:
  1) R0 = R * C0 / (C0+C1)
  2) R1 = R-R0
  3) if (V-L >= R1) R = R0, L = L+R1, X = 0
     else R = R1, X = 1
  4) renormalize

Fig. 2. Binary arithmetic coding.
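As a point of reference, a minimal C++ sketch of the integer encoding step of Figure 2 is given below. The function name and signature are ours, not Flavor output; the renormalization step is deliberately deferred, since it is the subject of parameter 5 below. Note how the truncating integer division shortens R0 and implicitly assigns the excess to R1, which is what the truncation-excess parameter (parameter 4) controls.

  #include <cstdint>

  // One integer encoding step as in Figure 2 (0 over 1 convention).
  // L, R are the B-bit interval registers; C0, C1 are the symbol counts.
  void encode_step(uint32_t& L, uint32_t& R, int X, uint32_t C0, uint32_t C1) {
      uint32_t R0 = (uint32_t)(((uint64_t)R * C0) / (C0 + C1)); // 1) scale by C0/(C0+C1)
      uint32_t R1 = R - R0;                                     // 2) truncation excess goes to R1
      if (X == 0) { R = R0; L += R1; }                          // 3) select the subinterval
      else        { R = R1; }
      // 4) renormalize(L, R) would follow here (see parameter 5 below).
  }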

1) Precision (B). B is the number of bits used to represent the current interval. Using the specified value for B, Flavor defines two constants, HALF=1<<(B-1) and QTR=1<<(B-2), that are needed for renormalization. The constants can also be set manually. Note that using integer arithmetic with B-bit precision is the same as mapping the [0, 1) interval onto the [0, 1<<B) interval.

2) Multiplication/division order. When computing R0 in Figure 2, the order of the multiplication and the integer division affects the result; the corresponding keyword can be set to one of the two values: M (apply multiplication first) or D (apply division first).

3) Symbol ordering convention (SOC). For proper decoding, the encoder and decoder must agree on which subinterval is assigned to which symbol; Figures 1 and 2 assign the upper subinterval to the symbol 0 and the lower subinterval to 1 (0 over 1 SOC). Alternatively, it has been shown to be optimal (computation-wise), for software arithmetic coding, to have the upper subinterval assigned to the least probable symbol (LPS) and the lower one to the most probable symbol (MPS) [11]. This results in one less operation when coding an MPS than when coding an LPS. On the other hand, when parallel processing is possible, the MPS over LPS SOC is more efficient. In Flavor, the SOC keyword can be set to one of four options: 1) LM (LPS over MPS), 2) ML (MPS over LPS), 3) 01 (0 over 1), and 4) 10 (1 over 0). Another possibility is to use any mixture of the above four options; as long as both the encoder and decoder follow the same SOC, there will be no ambiguity. However, all of the currently available BACs use one of the four simple SOCs listed above, and there is no inherent advantage to using any combination of the four options.
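The operation-count argument for assigning the upper subinterval to the LPS can be seen directly in code. The following C++ fragment is an illustrative sketch (with our own function names), not Flavor output, comparing one coding step under the two conventions.

  #include <cstdint>

  // SOC = 01: symbol 0 takes the upper subinterval, as in Figure 2.
  void step_soc_01(uint32_t& L, uint32_t& R, uint32_t R0, uint32_t R1, int X) {
      if (X == 0) { R = R0; L += R1; }
      else        { R = R1; }
  }

  // SOC = LM: the LPS takes the upper subinterval [11]; the frequent MPS
  // branch touches only R, saving one operation per coded MPS.
  void step_soc_lm(uint32_t& L, uint32_t& R, uint32_t R_LPS, int MPS, int X) {
      uint32_t R1 = R - R_LPS;     // width left for the MPS
      if (X == MPS) { R = R1; }
      else          { R = R_LPS; L += R1; }
  }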

4) Truncation excess (TE). In Figure 2, due to the integer division applied, R0 gets truncated and the excess range is assigned to R1. The best way to distribute the truncation excess (TE) due to integer arithmetic is to assign it to the two subintervals according to the corresponding symbol probabilities. However, this can be too complex (none of the currently available BACs supports it), and the next best choice is to assign the TE to the MPS. Although it is very simple to assign the TE to the MPS when SOC=LM or ML, an extra comparison is required (per symbol) when SOC=01 or 10. As a result, Flavor supports four options (MPS, LPS, 0, and 1) for the TE keyword (e.g., TE=0 assigns the TE to the symbol 0).

5) When to renormalize (R). Using fixed-precision arithmetic coding requires a renormalization process, which prevents R from getting too small (so that R can be represented by a fixed-bit integer and underflow can be prevented). At the same time, renormalization fixes the retention problem, allowing sequential coding, by outputting (in the case of encoding) the known bits of the code string. For example, if the subinterval lies entirely within [0, 0.5), then the first bit of the code string must be 0. Renormalization can be applied as soon as the leading bit of the code string is known (e.g., the CACM implementation [7]), or when the current interval R falls below a certain value. For example, in the TOIS implementation [22], renormalization is applied when R<=QTR, and the MPEG-4 Visual and H.264 specifications define renormalization to be applied when R falls below a specified minimum value. Correspondingly, the Flavor construct R>=Rmin causes renormalization to be applied whenever R falls below Rmin. It is also possible to speed up the coding process by outputting a byte at a time [23], or more generally, to output n>=1 bits at a time [24]. The Flavor construct R(n)>=Rmin indicates that during renormalization n bits are output at a time. Note that even when n is not a multiple of 8, Flavor-generated coders access bitstreams in bytes, and n bits are obtained via efficient bit string manipulations.

6) Carry-over handling (CO). The CO keyword specifies how a carry that propagates into already-determined bits of the code string is handled. For example, with bit stuffing a 0x00 byte is inserted whenever a 0xFF byte is encountered (CO=BS(0xFF,0x00)); alternatively, carries can be resolved by counting outstanding (follow) bits, as in the CACM implementation [7] (CO=FO).

7) Initialization of L, R, and V (or D). Though it makes the most sense to initially set L to 0 and R to the biggest value possible, practical coders initialize their state in different ways; the init construct specifies the initial value of R and, for the decoder, the number of code string bits initially read into V or into D (the offset V-L). Examples appear in Figures 3 and 5.
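To illustrate parameters 5 and 6, the following C++ sketch shows a CACM-style renormalization loop that emits one bit at a time and resolves carry-over with outstanding (follow) bits. It is a simplified sketch under our own naming conventions, not the code emitted by the Flavor translator.

  #include <cstdint>
  #include <vector>

  // Simplified CACM-style renormalization (cf. parameter 5) with
  // carry-over handled by outstanding bits (cf. parameter 6, CO=FO).
  struct Renormalizer {
      static constexpr int B = 16;
      static constexpr uint32_t HALF = 1u << (B - 1);
      static constexpr uint32_t QTR  = 1u << (B - 2);

      uint32_t L = 0, R = HALF;    // encoder state
      int outstanding = 0;         // follow bits whose value is not yet known
      std::vector<int> out;        // code string bits

      void emit(int bit) {
          out.push_back(bit);
          while (outstanding > 0) { out.push_back(1 - bit); --outstanding; }
      }

      void renormalize() {
          while (R <= QTR) {                       // keep R above QTR
              if (L + R <= HALF)  emit(0);                     // interval in [0, 1/2)
              else if (L >= HALF) { emit(1); L -= HALF; }      // interval in [1/2, 1)
              else                { ++outstanding; L -= QTR; } // interval straddles 1/2
              L <<= 1; R <<= 1;                    // zoom in by a factor of two
          }
      }
  };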

8) Last symbol disambiguation (end). If R>=2<<(B-b) is guaranteed by renormalization, then it can be shown that b+1 bits are always enough to uniquely identify the last interval. Usually an encoded file (e.g., H.264 video data) contains multiple components, where each component is separately coded. Thus, in the case where b+1 bits are used to disambiguate the last symbol of a component, the decoder must recant B-(b+1) bits, which are needed to decode the symbols of the subsequent component. A simpler, and less effective (compression-wise), method is to always output B bits to disambiguate the last symbol so that the decoder does not have to recant any bits. The most effective way (but also the most complex) to disambiguate the last symbol is to calculate the exact number of bits needed. For this, Moffat and Turpin [25] introduce a general method, known as the frugal method. We introduce the end construct for indicating the last symbol disambiguation method. For example, the end(frugal) statement indicates the use of the frugal method, and in this case the decoder also has to go through the same calculations to determine how many bits to recant. This requires the decoder to maintain the L variable, so the decoding method using only the D variable cannot be used. By default, b+1 bits are used (in the case where the CACM renormalization is used, only 2 bits are used [7]); however, specifying end(B) uses B bits of L to disambiguate the last symbol. Figure 3 shows Flavor descriptions for several integer BACs.

Fig. 3. Flavor descriptions of several integer BACs. (Each description has the form AC { B=32, ..., CO=FO, init(...), ... }, with per-coder renormalization conditions such as R=CACM, R>QTR, and R>=QTR, and initializations such as init(R=HALF, D=B) and init(R=HALF-1, V=B-1).)
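The default b+1-bit and frugal policies require per-coder analysis, but the end(B) policy is easy to sketch in code: after the last symbol, the encoder emits all B bits of L, a value that lies inside the final interval [L, L+R), flushing any outstanding carry bits along the way. The fragment below continues the Renormalizer sketch given above and is, again, only an illustration under our own naming.

  // end(B)-style termination: emit the B bits of L through emit(), which
  // also resolves any outstanding follow bits on the first emitted bit.
  void flush_full(Renormalizer& c) {
      for (int i = Renormalizer::B - 1; i >= 0; --i)
          c.emit((c.L >> i) & 1u);
  }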

4. FAST ARITHMETIC CODING

Practical adaptive BACs avoid the multiplication and division of Figure 2 by storing pre-computed subinterval widths in a lookup table (RTable) indexed by a probability-state index i; transition tables (Next0 and Next1 in Figure 4, or Next(LPS) and Next(MPS)) define how i is updated as symbols are coded. The resulting table-driven coding process is shown in Figure 4.

(a) Encoding:
  1) R0 = RTable[i]
  2) R1 = R-R0
  3) if (X == 0)
       R = R0, L = L+R1, i = Next0[i]
     else
       R = R1, i = Next1[i]
  4) renormalize

(b) Decoding:
  1) R0 = RTable[i]
  2) R1 = R-R0
  3) if (V-L >= R1)
       R = R0, L = L+R1, X = 0, i = Next0[i]
     else
       R = R1, X = 1, i = Next1[i]
  4) renormalize

Fig. 4. Fast arithmetic coding.

In the Q-Coder [11], renormalization keeps R within [0.75, 1.5), and R is simply approximated by 1 so that the width of the LPS subinterval can be read directly from the table. A more general version of the Q-Coder is the M coder [26] specified in the H.264 standard. In the M coder, rather than assuming R to be always equal to 1 (substituting R in [0.75, 1.5) by 1 is a rather crude approximation), R is better approximated by allowing U>1 different values. For example, the M coder used by an H.264 video coder allows R to take U=4 different values. The values of R_LPS are then pre-specified for four different values of R, rather than for just one. As a result, the table is bigger than the one used for the Q-Coder, but the M coder yields better compression. In the M coder, the RTable is accessed by two indices, i and r. As in the Q-Coder, the index i determines the current probability distribution, and the index r in [0, U) determines the current range. To maximize the coding speed of the M coder, U is restricted to be a power of 2, i.e., U=1<<q, so that r can be computed from R with a shift and a mask.

9) Specifying the lookup tables. The table specified can contain the values for R0 (as shown in Figure 4), R1, R_LPS, or R_MPS. The keyword RTable can be used to indicate the type of table being used (e.g., RTable(LPS)=MTable(64, 4) in Figure 5 declares an R_LPS table with 64x4 entries), and the Next keyword defines the corresponding transition tables (e.g., Next(LPS)=NextLPS(64) and Next(MPS)=NextMPS(64)).

AC { ...,
     R>=QTR,
     CO=FO,
     init(R=HALF-2, D=B-1),
     RTable(LPS)=MTable(64, 4),
     Next(LPS)=NextLPS(64),
     Next(MPS)=NextMPS(64) }

Fig. 5. Flavor description of the H.264 arithmetic coder.

The quasi-arithmetic coder [27] is perhaps the fastest BAC, as it replaces all arithmetic operations with table lookups; all calculations are done in advance. It takes the idea of the M coder and the Q-Coder a step further: in addition to the RTable and the transition tables, four additional tables are specified. Two tables, which can be specified using the Out keyword, are needed to specify the output. For example, the Out(LPS)=OutLPS(64,4) construct defines the OutLPS table, where each entry indicates the output for the corresponding entry in RTable (with 64x4 entries) when the symbol to be coded is an LPS; likewise, for the MPS, Out(MPS) can be used. Two additional tables, specified using the NextR keyword, are used for determining the next r index (as the transition tables defined using the Next keyword are used to determine the next i index).
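The table-driven scheme of Figure 4 translates almost directly into code. The following C++ sketch mirrors the figure; the table sizes and contents are made up for the sketch and are not taken from any standard, and the names are ours rather than Flavor output.

  #include <cstdint>

  // Illustrative table-driven coding step following Figure 4 (SOC=01).
  const int STATES = 4;
  static const uint16_t RTable[STATES] = { 0xB000, 0x8000, 0x5000, 0x2000 }; // R0 per state i
  static const uint8_t  Next0[STATES]  = { 0, 0, 1, 2 };  // state transition after a 0
  static const uint8_t  Next1[STATES]  = { 1, 2, 3, 3 };  // state transition after a 1

  struct FastCoder {
      uint32_t L = 0, R = 0xFFFF;   // 16-bit interval registers
      int i = 0;                    // probability-state index

      void encode(int X) {
          uint32_t R0 = RTable[i];        // 1) pre-computed width for symbol 0
          uint32_t R1 = R - R0;           // 2) remaining width for symbol 1
          if (X == 0) { R = R0; L += R1; i = Next0[i]; }   // 3) update interval and state
          else        { R = R1;          i = Next1[i]; }
          // 4) renormalize (omitted; renormalization keeps R large enough
          //    that the tabulated R0 stays below R, cf. the sketch above).
      }
  };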
5. CONCLUSION

We have described a set of new constructs for Flavor (based on features common to the BACs that are currently being deployed) which can be used to describe any BAC. Using the Flavor translator, C++ or Java code can then be automatically generated for the described arithmetic coder.

6. REFERENCES

[1] D. A. Huffman, "A Method for the Construction of Minimum Redundancy Codes," Proceedings of the IRE, vol. 40, pp. 1098-1101, 1952.

[2] CCITT (ITU Recommendation T.4), Standardization of Group 3 Facsimile Apparatus for Document Transmission, 1980, amended in 1984 and 1988.

[3] CCITT (ITU Recommendation T.11), Facsimile Coding Schemes and Coding Control Functions for Group 4 Facsimile Apparatus, 1984, amended in 1988.

[4] ISO/IEC 11172 International Standard (MPEG-1), Information technology - Coding of moving pictures and associated audio for digital storage media at up to about 1,5 Mbit/s, 1993.

[5] ISO/IEC 13818 International Standard (MPEG-2), Information technology - Generic coding of moving pictures and associated audio information, 1996.

[6] G. G. Langdon, "An Introduction to Arithmetic Coding," IBM J. Res. Develop., vol. 28, pp. 135-149, 1984.

[7] I. H. Witten, R. M. Neal, and J. G. Cleary, "Arithmetic Coding for Data Compression," Communications of the ACM, vol. 30, pp. 520-540, 1987.

[10] G. G. Langdon and J. Rissanen, "A Simple General Binary Source Code," IEEE Trans. on Info. Theory, vol. 28, pp. 800-803, 1982.

[11] W. B. Pennebaker et al., "An Overview of the Basic Principles of the Q-Coder Adaptive Binary Arithmetic Coder," IBM J. Res. Develop., vol. 32, pp. 717-726, 1988.

[12] S. W. Golomb, "Run-Length Encodings," IEEE Trans. on Info. Theory, vol. 12, pp. 399-401, 1966.

[13] ISO/IEC 11544 International Standard (JBIG), Information Technology - Coded Representation of Picture and Audio Information - Progressive Bi-level Image Compression, 1993.

[14] ISO/IEC 14492 International Standard (JBIG2), Information Technology - Lossy/Lossless Coding of Bi-level Images, 2001.

[15] ISO/IEC 14496-2 International Standard (MPEG-4:2), Information technology - Coding of audio-visual objects - Part 2: Video, 1999.

[16] ISO/IEC 15444 International Standard (JPEG 2000), Information technology - JPEG 2000 image coding system, 2000.

[17] ISO/IEC 10918 International Standard (JPEG), Information technology - Digital compression and coding of continuous-tone still images, 1994.

[18] ISO/IEC 14496-10 International Standard (MPEG-4:10), Information Technology - Coding of Audio-Visual Objects - Part 10: Advanced Video Coding (FDIS), Klagenfurt, AT, 2003.

[19] J. Rissanen and G. G. Langdon, "Universal Modeling and Coding," IEEE Trans. on Info. Theory, vol. 27, pp. 12-23, 1981.

[20] A. Eleftheriadis, "Flavor: A Language for Media Representation," in ACM Int. Conf. on Multimedia, 1997, Proceedings, pp. 1-9.

[21] Y. Fang and A. Eleftheriadis, "Automatic Generation of Entropy Coding Programs Using Flavor," in IEEE Workshop on Multimedia Signal Processing, 1998, Proceedings, pp. 341-346.

[22] A. Moffat, R. M. Neal, and I. H. Witten, "Arithmetic Coding Revisited," ACM Transactions on Information Systems, vol. 16, pp. 256-294, 1998.

[23] M. Schindler, "A Fast Renormalisation for Arithmetic Coding," in IEEE Data Compression Conference, 1998, Proceedings, p. 572.

[24] L. Stuiver and A. Moffat, "Piecewise Integer Mapping for Arithmetic Coding," in IEEE Data Compression Conference, 1998, Proceedings, pp. 3-12.

[25] A. Moffat and A. Turpin, Compression and Coding Algorithms, Kluwer Academic, 2002.

[26] D. Marpe and T. Wiegand, "A Highly Efficient Multiplication-Free Binary Arithmetic Coder and Its Application in Video Coding," in IEEE Int. Conf. on Image Processing, 2003, Proceedings, pp. 263-266.

[27] P. G. Howard and J. S. Vitter, "Practical Implementations of Arithmetic Coding," in Image and Text Compression, Kluwer Academic, 1992, pp. 85-112.