Practical Implementations of Arithmetic Coding

Practical Implementations of Arithmetic Co ding Paul G. Howard and Je rey Scott Vitter Brown University Department of Computer Science Technical Rep ort No. 92{18 Revised version, April 1992 Formerly Technical Rep ort No. CS{91{45 App ears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, pages 85{112. A shortened version app ears in the pro ceedings of the International Conference on Advances in Communication and Control COMCON 3, Victoria, British Columbia, Canada, Octob er 16{18, 1991. Practical Implementations of 1 Arithmetic Coding 2 3 Paul G. Howard Je rey Scott Vitter Department of Computer Science Brown University Providence, R.I. 02912{191 0 Abstract We provide a tutorial on arithmetic co ding, showing how it provides nearly optimal data compression and how it can b e matched with almost any prob- abilistic mo del. We indicate the main disadvantage of arithmetic co ding, its slowness, and give the basis of a fast, space-ecient, approximate arithmetic co der with only minimal loss of compression eciency. Our co der is based on the replacement of arithmetic by table lo okups coupled with a new deterministic probability estimation scheme. Index terms : Data compression, arithmetic co ding, adaptive mo deling, analysis of algorithms, data structures, low precision arithmetic. 1 A similar version of this pap er app ears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, 85{112. A shortened version of this pap er app ears in the pro ceedings of the International Conference on Advances in Communication and Control COMCON 3, Victoria, British Columbia, Canada, Octob er 16{18, 1991. 2 Supp ort was provided in part by NASA Graduate Student Researchers Program grant NGT{ 50420 and by a National Science Foundation Presidential Young Investigators Award grant with matching funds from IBM. Additional supp ort was provided by a Universities Space Research As- so ciation/CESDIS asso ciate memb ership. 3 Supp ort was provided in part by National Science Foundation Presidential Young Investigator Award CCR{9047466 with matching funds from IBM, by NSF research grant CCR{9007851, by Army Research Oce grantDAAL03{91{G{0035, and by the Oce of Naval Research and the De- fense Advanced Research Pro jects Agency under contract N00014{91{J{4052 ARPA Order No. 8225. Additional supp ort was provided by a Universities Space Research Asso ciation/CESDIS asso ciate memb ership. 1 1 Data Compression and Arithmetic Co ding Data can b e compressed whenever some data symb ols are more likely than others. Shannon [54] showed that for the b est p ossible compression co de in the sense of minimum average co de length, the output length contains a contribution of lg p bits from the enco ding of each symb ol whose probability of o ccurrence is p.Ifwe can provide an accurate mo del for the probability of o ccurrence of each p ossible symbol at every p oint in a le, we can use arithmetic co ding to enco de the symb ols that actually o ccur; the numb er of bits used by arithmetic co ding to enco de a symb ol with probability p is very nearly lg p, so the enco ding is very nearly optimal for the given probability estimates. In this pap er we showby theorems and examples how arithmetic co ding achieves its p erformance. We also p oint out some of the drawbacks of arithmetic co ding in practice, and prop ose a uni ed compression system for overcoming them. We b egin by attempting to clear up some of the false impressions commonly held ab out arithmetic co ding; it o ers some genuine b ene ts, but it is not the solution to all data compression problems. The most imp ortant advantage of arithmetic co ding is its exibility: it can b e used in conjunction with any mo del that can provide a sequence of event probabilities. This advantage is signi cant b ecause large compression gains can b e obtained only through the use of sophisticated mo dels of the input data. Mo dels used for arithmetic co ding may b e adaptive, and in fact a numb er of indep endent mo dels may b e used in succession in co ding a single le. This great exibility results from the sharp separation of the co der from the mo deling pro cess [47]. There is a cost asso ciated with this exibility: the interface b etween the mo del and the co der, while simple, places considerable time and space demands on the mo del's data structures, esp ecially in the case of a multi-symb ol input alphab et. The other imp ortant advantage of arithmetic co ding is its optimality. Arithmetic co ding is optimal in theory and very nearly optimal in practice, in the sense of enco d- ing using minimal average co de length. This optimality is often less imp ortant than it might seem, since Hu man co ding [25] is also very nearly optimal in most cases [8,9, 18,39]. When the probability of some single symb ol is close to 1, however, arithmetic co ding do es give considerably b etter compression than other metho ds. The case of highly unbalanced probabilities o ccurs naturally in bilevel black and white image co ding, and it can also arise in the decomp osition of a multi-symb ol alphab et into a sequence of binary choices. The main disadvantage of arithmetic co ding is that it tends to b e slow. We shall see that the full precision form of arithmetic co ding requires at least one multiplication per event and in some implementations up to twomultiplications and two divisions per event. In addition, the mo del lo okup and up date op erations are slow b ecause of the input requirements of the co der. Both Hu man co ding and Ziv-Lemp el [59, 60] co ding are faster b ecause the mo del is represented directly in the data structures 2 2 TUTORIAL ON ARITHMETIC CODING used for co ding. This reduces the co ding eciency of those metho ds by narrowing the range of p ossible mo dels. Much of the current research in arithmetic co ding concerns nding approximations that increase co ding sp eed without compromising compression eciency. The most common metho d is to use an approximation to the multiplication op eration [10,27,29,43]; in this pap er we present an alternative approach using table lo okups and approximate probability estimation. Another disadvantage of arithmetic co ding is that it do es not in general pro duce a pre x co de. This precludes parallel co ding with multiple pro cessors. In addition, the p otentially unb ounded output delay makes real-time co ding problematical in critical applications, but in practice the delay seldom exceeds a few symb ols, so this is not a ma jor problem. A minor disadvantage is the need to indicate the end of the le. One nal minor problem is that arithmetic co des have p o or error resistance, esp e- cially when used with adaptive mo dels [5]. A single bit error in the enco ded le causes the deco der's internal state to b e in error, making the remainder of the deco ded le wrong. In fact this is a drawbackofal l adaptive co des, including Ziv-Lemp el co des and adaptive Hu man co des [12,15,18,26,55,56]. In practice, the p o or error resistance of adaptive co ding is unimp ortant, since we can simply apply appropriate error cor- rection co ding to the enco ded le. More complicated solutions app ear in [5,20], in which errors are made easy to detect, and up on detection of an error, bits are changed until no errors are detected. Overview of this pap er. In Section 2 we give a tutorial on arithmetic co ding. We include an intro duction to mo deling for text compression. We also restate several imp ortant theorems from [22] relating to the optimality of arithmetic co ding in theory and in practice. In Section 3 we present some of our current researchinto practical ways of improv- ing the sp eed of arithmetic co ding without sacri cing much compression eciency. The center of this research is a reduced-precision arithmetic co der, supp orted by ecient data structures for text mo deling. 2 Tutorial on Arithmetic Co ding In this section we explain how arithmetic co ding works and give implementation details; our treatment is based on that of Witten, Neal, and Cleary [58]. We p oint out the usefulness of binary arithmetic co ding that is, co ding with a 2-symb ol alphab et, and discuss the mo deling issue, particularly high-order Markov mo deling for text compression. Our fo cus is on enco ding, but the deco ding pro cess is similar. 2.1 Arithmetic co ding and its implementation Basic algorithm. The algorithm for enco ding a le using arithmetic co ding works conceptually as follows: 2.1 Arithmetic co ding and its implementation 3 Old interval 0LH1 Decomposition probability of ai 01 New interval 0LH1 Figure 1: Sub division of the currentinterval based on the probability of the input symbol a that o ccurs next. i 1. We b egin with a \currentinterval" [L; H initialized to [0; 1. 2.

Load more