Arithmetic Coding for Data Compression
PAUL G. HOWARD and JEFFREY SCOTT VITTER, Fellow, IEEE
Arithmetic coding provides an effective mechanism for removing redundancy in the encoding of data. We show how arithmetic coding works and describe an efficient implementation that uses table lookup as a fast alternative to arithmetic operations. The reduced-precision arithmetic has a provably negligible effect on the amount of compression achieved. We can speed up the implementation further by use of parallel processing. We discuss the role of probability models and how they provide probability information to the arithmetic coder. We conclude with perspectives on the comparative advantages and disadvantages of arithmetic coding.

Index terms: Data compression, arithmetic coding, lossless compression, text modeling, image compression, text compression, adaptive, semi-adaptive.

Manuscript received Mmm DD, 1993. Some of this work was performed while both authors were at Brown University and while the first author was at Duke University. Support was provided in part by NASA Graduate Student Researchers Program grant NGT-50420, by a National Science Foundation Presidential Young Investigator Award with matching funds from IBM, and by Air Force Office of Scientific Research grant number F49620-92-J-0515. Additional support was provided by Universities Space Research Association/CESDIS associate memberships. IEEE Log Number 0000000.

P. G. Howard is with AT&T Bell Laboratories, Visual Communications Research, Room 4C-516, 101 Crawfords Corner Road, Holmdel, NJ 07733-3030.

J. S. Vitter is with the Department of Computer Science, Duke University, Box 90129, Durham, NC 27708-0129.

I. Arithmetic coding

The fundamental problem of lossless compression is to decompose a data set (for example, a text file or an image) into a sequence of events, then to encode the events using as few bits as possible. The idea is to assign short codewords to more probable events and longer codewords to less probable events. Data can be compressed whenever some events are more likely than others. Statistical coding techniques use estimates of the probabilities of the events to assign the codewords. Given a set of mutually distinct events e1, e2, e3, ..., en, and an accurate assessment of the probability distribution P of the events, Shannon [1] proved that the smallest possible expected number of bits needed to encode an event is the entropy of P, denoted by

    H(P) = - sum_{k=1}^{n} p{e_k} log2 p{e_k},

where p{e_k} is the probability that event e_k occurs. An optimal code outputs -log2 p bits to encode an event whose probability of occurrence is p. Pure arithmetic codes supplied with accurate probabilities provide optimal compression. The older and better-known Huffman codes [2] are optimal only among instantaneous codes, that is, those in which the encoding of one event can be decoded before encoding has begun for the next event.

In theory, arithmetic codes assign one "codeword" to each possible data set. The codewords consist of half-open subintervals of the half-open unit interval [0, 1), and are expressed by specifying enough bits to distinguish the subinterval corresponding to the actual data set from all other possible subintervals. Shorter codes correspond to larger subintervals and thus more probable input data sets. In practice, the subinterval is refined incrementally using the probabilities of the individual events, with bits being output as soon as they are known. Arithmetic codes almost always give better compression than prefix codes, but they lack the direct correspondence between the events in the input data set and bits or groups of bits in the coded output file.

A statistical coder must work in conjunction with a modeler that estimates the probability of each possible event at each point in the coding. The probability model need not describe the process that generates the data; it merely has to provide a probability distribution for the data items. The probabilities do not even have to be particularly accurate, but the more accurate they are, the better the compression will be. If the probabilities are wildly inaccurate, the file may even be expanded rather than compressed, but the original data can still be recovered. To obtain maximum compression of a file, we need both a good probability model and an efficient way of representing (or learning) the probability model.

To ensure decodability, the encoder is limited to the use of model information that is available to the decoder. There are no other restrictions on the model; in particular, it can change as the file is being encoded. The models can be adaptive (dynamically estimating the probability of each event based on all events that precede it), semi-adaptive (using a preliminary pass of the input file to gather statistics), or non-adaptive (using fixed probabilities for all files). Non-adaptive models can perform arbitrarily poorly [3]. Adaptive codes allow one-pass coding but require a more complicated data structure. Semi-adaptive codes require two passes and transmission of model data as side information; if the model data is transmitted efficiently they can provide slightly better compression than adaptive codes, but in general the cost of transmitting the model is about the same as the "learning" cost in the adaptive case [4].

To get good compression we need models that go beyond global event counts and take into account the structure of the data. For images this usually means using the numeric intensity values of nearby pixels to predict the intensity of each new pixel, and using a suitable probability distribution for the residual error to allow for noise and variation between regions within the image. For text, the previous letters form a context, in the manner of a Markov process.

In Section II, we provide a detailed description of pure arithmetic coding, along with an example to illustrate the process. We also show enhancements that allow incremental transmission and fixed-precision arithmetic. In Section III we extend the fixed-precision idea to low precision, and show how we can speed up arithmetic coding with little degradation of compression performance by doing all the arithmetic ahead of time and storing the results in lookup tables. We call the resulting procedure quasi-arithmetic coding. In Section IV we briefly explore the possibility of parallel coding using quasi-arithmetic coding. In Section V we discuss the modeling process, separating it into structural and probability estimation components. Each component can be adaptive, semi-adaptive, or static; there are two approaches to the probability estimation problem. Finally, Section VI provides a discussion of the advantages and disadvantages of arithmetic coding and suggestions of alternative methods.
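The entropy formula above is easy to check numerically; the following minimal sketch (the function name is ours) computes H(P) for a given distribution:

```python
import math

def entropy(probs):
    """H(P) = -sum of p_k * log2(p_k): Shannon's lower bound on the
    expected number of bits needed to encode one event."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# An optimal code spends -log2(p) bits on an event of probability p,
# so skewed distributions compress better than near-uniform ones.
print(entropy([0.5, 0.5]))   # 1.0: no compression possible
print(entropy([0.9, 0.1]))   # ~0.469 bits per event
```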
II. How arithmetic coding works

In this section we explain how arithmetic coding works and give operational details; our treatment is based on that of Witten, Neal, and Cleary [5]. Our focus is on encoding, but the decoding process is similar.

A. Basic algorithm for arithmetic coding

The algorithm for encoding a file using arithmetic coding works conceptually as follows:

1. We begin with a "current interval" [L, H) initialized to [0, 1).
2. For each event in the file, we perform two steps.
   (a) We subdivide the current interval into subintervals, one for each possible event. The size of an event's subinterval is proportional to the estimated probability that the event will be the next event in the file, according to the model of the input.
   (b) We select the subinterval corresponding to the event that actually occurs next, and make it the new current interval.
3. We output enough bits to distinguish the final current interval from all other possible final intervals.

The length of the final subinterval is clearly equal to the product of the probabilities of the individual events, which is the probability p of the particular sequence of events in the file. The final step uses at most floor(-log2 p) + 2 bits to distinguish the file from all other possible files. We need some mechanism to indicate the end of the file, either a special end-of-file event coded just once, or some external indication of the file's length. Either method adds only a small amount to the code length.

In step 2, we need to compute only the subinterval corresponding to the event a_i that actually occurs. To do this it is convenient to use two "cumulative" probabilities: the cumulative probability P_C = sum_{k=1}^{i-1} p_k and the next-cumulative probability P_N = P_C + p_i = sum_{k=1}^{i} p_k. The new subinterval is [L + P_C(H - L), L + P_N(H - L)). The need to maintain and supply cumulative probabilities requires the model to have a complicated data structure, especially when many more than two events are possible.

We now provide an example, repeated a number of times to illustrate different steps in the development of arithmetic coding. For simplicity we choose between just two events at each step, although the principles apply to the multi-event case as well. We assume that we know a priori that we have a file consisting of three events (or three letters, in the case of text compression); the first event is either a1 with probability p{a1} = 2/3 or b1 with probability p{b1} = 1/3; the second event is a2 with probability p{a2} = 1/2 or b2 with probability p{b2} = 1/2; and the third event is a3 with probability p{a3} = 3/5 or b3 with probability p{b3} = 2/5. The actual file to be encoded is the sequence b1 a2 b3.

The steps involved in pure arithmetic coding are illustrated in Table 1 and Fig. 1. In this example the final interval corresponding to the actual file b1 a2 b3 is [23/30, 5/6). The length of the interval is 1/15, which is the probability of b1 a2 b3, computed by multiplying the probabilities of the three events: p{b1} p{a2} p{b3} = (1/3)(1/2)(2/5) = 1/15. In binary, the final interval is [0.110001..., 0.110101...). Since all binary numbers that begin with 0.11001 are entirely within this interval, outputting 11001 suffices to uniquely identify the interval.

Table 1. Example of pure arithmetic coding

    Action                                   Subintervals
    Start                                    [0, 1)
    Subdivide with left prob. p{a1} = 2/3    [0, 2/3); [2/3, 1)
    Input b1, select right subinterval       [2/3, 1)
    Subdivide with left prob. p{a2} = 1/2    [2/3, 5/6); [5/6, 1)
    Input a2, select left subinterval        [2/3, 5/6)
    Subdivide with left prob. p{a3} = 3/5    [2/3, 23/30); [23/30, 5/6)
    Input b3, select right subinterval       [23/30, 5/6) = [0.110001..., 0.110101...) in binary
    Output 11001                             0.11001 is the shortest binary fraction that lies within [23/30, 5/6)

[Fig. 1. Pure arithmetic coding graphically illustrated.]

B. Incremental output

The basic implementation of arithmetic coding described above has two major difficulties: the shrinking current interval requires the use of high-precision arithmetic, and no output is produced until the entire file has been read. The most straightforward solution to both of these problems is to output each leading bit as soon as it is known, and then to double the length of the current interval so that it reflects only the unknown part of the final interval. Witten, Neal, and Cleary [5] add a clever mechanism for preventing the current interval from shrinking too much when the endpoints are close to 1/2 but straddle 1/2. In that case we do not yet know the next output bit, but we do know that whatever it is, the following bit will have the opposite value; we merely keep track of that fact, and expand the current interval symmetrically about 1/2. This follow-on procedure may be repeated any number of times, so the current interval size is always strictly longer than 1/4.

Mechanisms for incremental transmission and fixed-precision arithmetic have been developed through the years by Pasco [6], Rissanen [7], Rubin [8], Rissanen and Langdon [9], Guazzo [10], and Witten, Neal, and Cleary [5]. The bit-stuffing idea of Langdon and others at IBM, which limits the propagation of carries in the additions, serves a function similar to that of the follow-on procedure described above.
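The basic algorithm above, applied to the example file b1 a2 b3, can be reproduced with a short sketch in exact rational arithmetic (the function names are ours; a practical coder uses the integer methods described later in this section):

```python
from fractions import Fraction as F

def encode(events, left_probs):
    """Pure arithmetic coding for binary events: narrow the current
    interval [low, high) once per event ('a' takes the left
    subinterval, 'b' the right)."""
    low, high = F(0), F(1)
    for ev, p in zip(events, left_probs):
        mid = low + p * (high - low)          # split point for this event
        low, high = (low, mid) if ev == 'a' else (mid, high)
    return low, high

def shortest_code(low, high):
    """Fewest bits b1..bk such that every number beginning 0.b1..bk
    lies entirely within [low, high)."""
    k = 1
    while True:
        m = -((-low * 2**k) // 1)             # ceil(low * 2^k)
        if m + 1 <= high * 2**k:              # [m/2^k, (m+1)/2^k) fits
            return format(int(m), '0%db' % k)
        k += 1

# The example of Table 1: file b1 a2 b3 with p{a1}=2/3, p{a2}=1/2, p{a3}=3/5.
low, high = encode('bab', [F(2, 3), F(1, 2), F(3, 5)])
print(low, high)                 # 23/30 5/6
print(shortest_code(low, high))  # 11001
```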
We now describe in detail how the incremental output and interval expansion work. We add the following step immediately after the selection of the subinterval corresponding to an input event (step 2(b) in the basic algorithm above).

2. (c) We repeatedly execute the following steps in sequence until the loop is explicitly halted:
   1. If the new subinterval is not entirely within one of the intervals [0, 1/2), [1/4, 3/4), or [1/2, 1), we exit the loop and return.
   2. If the new subinterval lies entirely within [0, 1/2), we output 0 and any following 1s left over from previous events; then we double the size of the subinterval by linearly expanding [0, 1/2) to [0, 1).
   3. If the new subinterval lies entirely within [1/2, 1), we output 1 and any following 0s left over from previous events; then we double the size of the subinterval by linearly expanding [1/2, 1) to [0, 1).
   4. If the new subinterval lies entirely within [1/4, 3/4), we keep track of this fact for future output by incrementing the follow count; then we double the size of the subinterval by linearly expanding [1/4, 3/4) to [0, 1).

Table 2. Example of pure arithmetic coding with incremental transmission and interval expansion

    Action                                   Subintervals
    Start                                    [0, 1)
    Subdivide with left prob. p{a1} = 2/3    [0, 2/3); [2/3, 1)
    Input b1, select right subinterval       [2/3, 1)
    Output 1, expand                         [1/3, 1)
    Subdivide with left prob. p{a2} = 1/2    [1/3, 2/3); [2/3, 1)
    Input a2, select left subinterval        [1/3, 2/3)
    Increment follow count, expand           [1/6, 5/6)
    Subdivide with left prob. p{a3} = 3/5    [1/6, 17/30); [17/30, 5/6)
    Input b3, select right subinterval       [17/30, 5/6)
    Output 1, output 0 follow bit, expand    [2/15, 2/3)
    Output 01                                [1/4, 1/2) is entirely within [2/15, 2/3)

[Fig. 2. Pure arithmetic coding with incremental transmission and interval expansion, graphically illustrated.]

Table 2 and Fig. 2 illustrate this process. In the example, interval expansion occurs exactly once for each input event, but in other cases it may occur more than once or not at all. The follow-on procedure is applied when processing the second input event a2. The 1 output after processing the third event b3 is therefore followed by its complement 0. The final interval is [2/15, 2/3). Since all binary numbers that start with 0.01 are within this range, outputting 01 suffices to uniquely identify the range. The encoded file is 11001, as before. This is no coincidence: the computations are essentially the same. The final interval is eight times as long as in the previous example because of the three doublings of the current interval.

Clearly the current interval contains some information about the preceding inputs; this information has not yet been output, so we can think of it as the coder's state. If a is the length of the current interval, the state holds -log2 a bits not yet output. In the basic method the state contains all the information about the output, since nothing is output until the end. In the incremental implementation, the state always contains fewer than two bits of output information, since the length of the current interval is always more than 1/4. The final state in the incremental example is [2/15, 2/3), which contains -log2(8/15), about 0.907, bits of information; the final two output bits are needed to unambiguously transmit this information.

C. Use of integer arithmetic

In practice, the arithmetic can be done by storing the endpoints of the current interval as sufficiently large integers rather than in floating point or exact rational numbers. We also use integers for the frequency counts used to estimate event probabilities. The subdivision process involves selecting non-overlapping intervals (of length at least 1) with lengths approximately proportional to the counts. Table 3 illustrates the use of integer arithmetic, using a full interval of [0, N) = [0, 1024). (The graphical version of Table 3 is essentially the same as Fig. 2 and is not included.) The length of the current interval is always at least N/4 + 2, 258 in this case, so we can always use probabilities precise to at least 1/258; often the precision will be near 1/1024. In practice we use even larger integers; the interval [0, 65536) is a common choice, and gives a practically continuous choice of probabilities at each step. The subdivisions in this example are not quite the same as those in Table 2 because the resulting intervals are rounded to integers. The encoded file is 11001 as before, but for a longer input file the encodings would eventually diverge.
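The expansion loop of step 2(c), the follow-on procedure, and the end-of-file step can be sketched together as follows (exact rationals are used for clarity, and the function name is ours). On the file b1 a2 b3 it reproduces the trace of Table 2, emitting 11001:

```python
from fractions import Fraction as F

def encode_incremental(events, left_probs):
    """Arithmetic coding with incremental output: emit each leading bit
    as soon as it is known and double [low, high) so that it covers only
    the still-undetermined part; 'follow' counts deferred opposite bits."""
    low, high, follow, out = F(0), F(1), 0, []

    def emit(bit):
        nonlocal follow
        out.append(bit)
        out.extend([1 - bit] * follow)   # flush bits deferred by case 4
        follow = 0

    for ev, p in zip(events, left_probs):
        mid = low + p * (high - low)
        low, high = (low, mid) if ev == 'a' else (mid, high)
        while True:                                   # step 2(c)
            if high <= F(1, 2):                       # case 2: in [0, 1/2)
                emit(0); low, high = 2 * low, 2 * high
            elif low >= F(1, 2):                      # case 3: in [1/2, 1)
                emit(1); low, high = 2 * low - 1, 2 * high - 1
            elif low >= F(1, 4) and high <= F(3, 4):  # case 4: middle half
                follow += 1
                low, high = 2 * low - F(1, 2), 2 * high - F(1, 2)
            else:                                     # case 1: exit the loop
                break

    # End of file: one more bit plus any deferred follow bits identifies
    # a subinterval that lies within the final interval.
    follow += 1
    emit(0 if low < F(1, 4) else 1)
    return ''.join(map(str, out))

print(encode_incremental('bab', [F(2, 3), F(1, 2), F(3, 5)]))  # 11001
```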
Table 3. Example of arithmetic coding with incremental transmission, interval expansion, and integer arithmetic. Full interval is [0, 1024), so in effect subinterval endpoints are constrained to be multiples of 1/1024.

    Action                                                           Subintervals
    Start                                                            [0, 1024)
    p{a1} = 2/3; subdivide with left probability 683/1024 = 0.66699  [0, 683); [683, 1024)
    Input b1, select right subinterval                               [683, 1024)
    Output 1, expand                                                 [342, 1024)
    Subdivide with left prob. p{a2} = 1/2                            [342, 683); [683, 1024)
    Input a2, select left subinterval                                [342, 683)
    Increment follow count, expand                                   [172, 854)
    p{a3} = 3/5; subdivide with left probability 409/682 = 0.59971   [172, 581); [581, 854)
    Input b3, select right subinterval                               [581, 854)
    Output 1, output 0 follow bit, expand                            [138, 684)
    Output 01                                                        [256, 512) is entirely within [138, 684)

Table 4. Example of arithmetic coding with incremental transmission, interval expansion, and small integer arithmetic. Full interval is [0, 8), so in effect subinterval endpoints are constrained to be multiples of 1/8.

    Action                                          Subintervals
    Start                                           [0, 8)
    p{a1} = 2/3; subdivide with left prob. = 5/8    [0, 5); [5, 8)
    Input b1, select right subinterval              [5, 8)
    Output 1, expand                                [2, 8)
    Subdivide with left prob. p{a2} = 1/2           [2, 5); [5, 8)
    Input a2, select left subinterval               [2, 5)
    Increment follow count, expand                  [0, 6)
    p{a3} = 3/5; subdivide with left prob. = 2/3    [0, 4); [4, 6)
    Input b3, select right subinterval              [4, 6)
    Output 1, output 0 follow bit, expand           [0, 4)
    Output 0, expand                                [0, 8)

III. Limited-precision arithmetic coding

Arithmetic coding as it is usually implemented is slow because of the multiplications (and in some implementations, divisions) required in subdividing the current interval according to the probability information. Since small errors in probability estimates cause very small increases in code length, we expect that by introducing approximations into the arithmetic coding process in a controlled way we can improve coding speed without significantly degrading compression performance. In the Q-Coder work at IBM [11], the time-consuming multiplications are replaced by additions and shifts, and low-order bits are ignored. In [12] we describe a different approach to approximate arithmetic coding. Recalling that the fractional bits characteristic of arithmetic coding are stored as state information in the coder, we reduce the number of possible states, and replace arithmetic operations by table lookups; the lookup tables can be precomputed. Here we review this reduced-precision binary arithmetic coder, which we call a quasi-arithmetic coder. It should be noted that the compression is still completely reversible; using reduced precision merely affects the average code length.

Table 5. Excerpts from the quasi-arithmetic coding table, N = 8. Only the three states needed for the example are shown; there are nine more states. An "f" output indicates application of the follow-on procedure described in the text.

    Start    Probability      Left (a) input       Right (b) input
    state    of left input    Output  Next state   Output  Next state
    [0, 8)   0.000 - 0.182    000     [0, 8)       -       [1, 8)
    [0, 8)   0.182 - 0.310    00      [0, 8)       -       [2, 8)
    [0, 8)   0.310 - 0.437    0       [0, 6)       -       [3, 8)
    [0, 8)   0.437 - 0.563    0       [0, 8)       1       [0, 8)
    [0, 8)   0.563 - 0.690    -       [0, 5)       1       [2, 8)
    [0, 8)   0.690 - 0.818    -       [0, 6)       11      [0, 8)
    [0, 8)   0.818 - 1.000    -       [0, 7)       111     [0, 8)
    [0, 6)   0.000 - 0.244    000     [0, 8)       -       [1, 6)
    [0, 6)   0.244 - 0.415    00      [0, 8)       f       [0, 8)
    [0, 6)   0.415 - 0.585    0       [0, 6)       f       [2, 8)
    [0, 6)   0.585 - 0.756    0       [0, 8)       10      [0, 8)
    [0, 6)   0.756 - 1.000    -       [0, 5)       101     [0, 8)
    [2, 8)   0.000 - 0.244    010     [0, 8)       -       [3, 8)
    [2, 8)   0.244 - 0.415    01      [0, 8)       1       [0, 8)
    [2, 8)   0.415 - 0.585    f       [0, 6)       1       [2, 8)
    [2, 8)   0.585 - 0.756    f       [0, 8)       11      [0, 8)
    [2, 8)   0.756 - 1.000    -       [2, 7)       111     [0, 8)

Table 6. Example of operation of quasi-arithmetic coding

    Start in state [0, 8).
    Prob{a1} = 2/3, so choose the range 0.563 - 0.690 in state [0, 8).
    First event is b1, so choose the right (b) input.
    Output 1. Next state is [2, 8).
    Prob{a2} = 1/2, so choose the range 0.415 - 0.585 in state [2, 8).
    Second event is a2, so choose the left (a) input.
    "f" means increment the follow count. Next state is [0, 6).
    Prob{a3} = 3/5, so choose the range 0.585 - 0.756 in state [0, 6).
    Third event is b3, so choose the right (b) input.
    Indicated output is 10. Output 1; output 0 to account for the follow bit; output 0. Next state is [0, 8).

A. Development of binary quasi-arithmetic coding

We have seen that doing arithmetic coding with large integers instead of real or rational numbers hardly degrades compression performance at all. In Table 4 we show the encoding of the same file using small integers: the full interval [0, N) is only [0, 8).

The number of possible states (after applying the interval expansion procedure) of an arithmetic coder using the integer interval [0, N) is 3N^2/16. If we can reduce the number of states to a more manageable level, we can precompute all state transitions and outputs and substitute table lookups for arithmetic in the coder. The obvious way to reduce the number of states is to reduce N. The value of N should be a power of 2; its value must be at least 4. If we choose N = 8 and apply the arithmetic coding algorithm in a straightforward way, we obtain a table with 12 states, each state offering a choice of between 3 and 7 probability ranges. Portions of the table are shown in Table 5.

Table 6 shows how Table 5 is used to encode our sample file. Before coding each input event the coder is in a certain current state, corresponding to the current subinterval.
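The behavior traced in Tables 4 and 6 can be simulated directly. The sketch below keeps the current interval as small-integer endpoints inside [0, N); because only a few state-and-range combinations exist, every transition it computes could be precomputed into a lookup table. One assumption to note: it picks the integer split point whose left fraction is closest to the estimated probability, rather than using the cutoff ranges whose computation is described in the next section, although the two rules agree on this example.

```python
from fractions import Fraction as F

def encode_quasi(events, left_probs, N=8):
    """Quasi-arithmetic coding sketch with integer endpoints in [0, N).
    Assumption: the split point is the integer whose left fraction best
    approximates the left probability (a heuristic stand-in for the
    table's cutoff ranges; both agree on this example)."""
    low, high, follow, out = 0, N, 0, []

    def emit(bit):
        nonlocal follow
        out.append(bit)
        out.extend([1 - bit] * follow)   # deferred follow-on bits
        follow = 0

    for ev, p in zip(events, left_probs):
        s = min(range(low + 1, high),
                key=lambda x: abs(F(x - low, high - low) - p))
        low, high = (low, s) if ev == 'a' else (s, high)
        while True:                      # integer interval expansion
            if high <= N // 2:           # within [0, N/2): output 0
                emit(0); low, high = 2 * low, 2 * high
            elif low >= N // 2:          # within [N/2, N): output 1
                emit(1); low, high = 2 * low - N, 2 * high - N
            elif low >= N // 4 and high <= 3 * N // 4:   # middle half
                follow += 1
                low, high = 2 * low - N // 2, 2 * high - N // 2
            else:
                break

    if (low, high) != (0, N):            # end-of-file disambiguation
        follow += 1
        emit(0 if low < N // 4 else 1)
    return ''.join(map(str, out))

# The file b1 a2 b3 with N = 8 reproduces the trace of Tables 4 and 6:
print(encode_quasi('bab', [F(2, 3), F(1, 2), F(3, 5)]))  # 1100
```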
For each state there are a number of probability ranges; we choose the one that includes the estimated probability for the next event. Then we simply select the input event that actually occurs and perform the indicated actions: outputting bits, incrementing the follow count, and changing to a new current state. In the example the encoded output file is 1100. Because we were using such low precision, the subdivision probabilities became distorted, leading to a lower probability for the file (1/16), but one which ends in the full interval [0, 8), requiring no disambiguating bits. We usually use a somewhat larger value of N; in practice the compression inefficiency of a binary quasi-arithmetic coder (neglecting very large and very small probabilities) is less than one percent for N = 32 and about 0.2 percent for N = 128.

In implementations, the coding tables can be stripped down so that the numerical values of the interval endpoints and probability ranges do not appear. Full details and examples appear in [13]. The next section explains the computation of the probability ranges. Decoding uses essentially the same tables, and in fact is easier than encoding.

B. Analysis of binary quasi-arithmetic coding

We now prove that using binary quasi-arithmetic coding causes an insignificant increase in the code length compared with pure arithmetic coding. We mathematically analyze several cases.

In this analysis we exclude very large and very small probabilities, namely those that are less than 1/W or greater than (W - 1)/W, where W is the width of the current interval. For these probabilities the relative excess code length can be large.

First we assume that we know the probability p of the left branch of each event, and we show both how to minimize the average excess code length and how small the excess is. In arithmetic coding we divide the current interval of width W into subintervals of length L and R; this gives an effective coding probability q = L/W, since the resulting code length is -log2 q for the left branch and -log2(1 - q) for the right. When we encode a sequence of binary events with probabilities p and 1 - p using effective coding probabilities q and 1 - q, the average code length L(p, q) is given by

    L(p, q) = -p log2 q - (1 - p) log2(1 - q).

If we use pure arithmetic coding, we can subdivide the interval into lengths pW and (1 - p)W, thus making q = p and giving an average code length equal to the entropy, H(p) = -p log2 p - (1 - p) log2(1 - p); this is optimal.

Consider two probabilities p1 and p2 that are adjacent based on the subdivision of an interval of width W; in other words, p1 = W1/W, p2 = W2/W, and W2 - W1 = 1. For any probability p between p1 and p2, either p1 or p2 should be chosen, whichever gives a shorter average code length. There is a cutoff probability p* for which p1 and p2 give the same average code length. We can compute p* by solving the equation L(p*, p1) = L(p*, p2), giving

    p* = 1 / (1 + log(p2/p1) / log((1 - p1)/(1 - p2))).        (1)

We can use Equation (1) to compute the probability ranges in the coding tables. As an example, we compute the cutoff probability used in deciding whether to subdivide interval [0, 6) as {[0, 3); [3, 6)} or {[0, 4); [4, 6)}; this is the number 0.585 that appears in Table 5. In this case p1 = 1/2 and p2 = 2/3. We compute p* = log(3/2) / log 2 ≈ 0.585. Hence when we encode the third event in the example with p{a3} = 3/5, we use the {[0, 4); [4, 6)} subdivision.

Probability p* is the probability between p1 and p2 with the worst average quasi-arithmetic coding performance, both in excess bits per event and in excess bits relative to optimal compression. This can be shown by monotonicity arguments.

Theorem 1: If we construct a quasi-arithmetic coder based on full interval [0, N), and use correct probability estimates, the number of bits per input event by which the average code length obtained by the quasi-arithmetic coder exceeds that of an exact arithmetic coder is at most

    (4 / ln 2) log2(2 / (e ln 2)) (1/N) + O(1/N^2) ≈ 0.497/N + O(1/N^2),

and the fraction by which the average code length obtained by the quasi-arithmetic coder exceeds that of an exact arithmetic coder is at most

    log2(2 / (e ln 2)) (1 / log2 N) + O(1 / (log2 N)^2) ≈ 0.0861 / log2 N + O(1 / (log2 N)^2).

Proof: For a quasi-arithmetic coder with full interval [0, N), the shortest terminal state intervals have size W = N/4 + 2. The worst average error occurs for the smallest W and the most extreme probabilities, that is, for W = N/4 + 2, p1 = (W - 2)/W, and p2 = (W - 1)/W (or, symmetrically, p1 = 1/W and p2 = 2/W). For these probabilities, we find the cutoff probability p*. Then for the first part of the theorem we take the asymptotic expansion of L(p*, p2) - H(p*), and for the second part we take the asymptotic expansion of (L(p*, p2) - H(p*)) / H(p*). □

If we let p̄ = (p1 + p2)/2, we can expand Equation (1) asymptotically in W and obtain an excellent rational approximation p_e for p*:

    p_e = p̄ + (p̄ - 1/2) / (6 W^2 p̄ (1 - p̄)).

The compression loss introduced by using p_e instead of p* is completely negligible, never more than 0.06 percent. In the example above with p1 = 1/2, p2 = 2/3, and W = 6, we find that p* = log(3/2) / log 2 ≈ 0.58496 and p_e = 737/1260 ≈ 0.58492. As we expect, only for small values of W (the short intervals that occur when using very low precision arithmetic) do we need to be careful about roundoff when subdividing intervals; for larger values of W we can practically round the interval endpoints to the nearest integer.

We can prove a similar theorem for a more general case, in which we compare quasi-arithmetic coding with arithmetic coding for a single worst-case event. We assume that both coders use the same estimated probability, but that the estimate may be arbitrarily bad. The proof is omitted to save space.

Theorem 2: If we construct a quasi-arithmetic coder based on full interval [0, N), and use arbitrary probability estimates, the number of bits per input event by which the code length obtained by the quasi-arithmetic coder exceeds that of an exact arithmetic coder in the worst case is at most

    log2((N + 8)/(N + 4)) = 4 / ((N + 4) ln 2) + O(1/N^2) ≈ 5.771/N.
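Equation (1) and the rational approximation p_e can be checked numerically; a minimal sketch (the function names are ours):

```python
import math
from fractions import Fraction as F

def cutoff(p1, p2):
    """Equation (1): the probability at which coding with p1 or with p2
    yields the same average code length."""
    return 1 / (1 + math.log(p2 / p1) / math.log((1 - p1) / (1 - p2)))

def cutoff_approx(p1, p2, W):
    """The rational approximation p_e obtained from the asymptotic
    expansion of Equation (1) in the interval width W."""
    pbar = (p1 + p2) / 2
    return pbar + (pbar - F(1, 2)) / (6 * W**2 * pbar * (1 - pbar))

# Subdividing [0, 6): choose {[0,3); [3,6)} vs. {[0,4); [4,6)}.
print(cutoff(1/2, 2/3))                    # ~0.58496, i.e. log(3/2)/log 2
print(cutoff_approx(F(1, 2), F(2, 3), 6))  # 737/1260, i.e. ~0.58492
```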
IV. Parallel coding

In [14] we present general-purpose algorithms for encoding and decoding using both Huffman and quasi-arithmetic coding. We are considering the case where we wish to do both encoding and decoding in parallel, so the location of bits in the encoded file must be computable by the decoding processors before they begin decoding. The main requirement for our parallel coder is that the model for each event is known ahead of time; it is not necessary for each event to have the same model. We use the PRAM model of parallel computation; the number of processors is p. We allow concurrent reading of the parallel memory, and limited concurrent writing, to a single one-bit register, which we call the completion register. It is initialized to 0 at the beginning of each time step; during the time step any number of processors (or none at all) may write 1 to it. We assume that n events are to be coded. The main issue is in dealing with the differences in code lengths of different events. For simplicity we do not consider input data routing.

We describe the algorithm for the Huffman coding case, since prefix codes are easier to handle, and then extend it to quasi-arithmetic coding. First the n events are distributed as evenly as possible among the p processors. Then each processor begins encoding its assigned events in a deterministic sequence, outputting one bit in each time step; the bits output in each time step are contiguous in the output stream. As soon as one or more processors complete all of their assigned events, they indicate this fact by writing 1 to the completion register. When the completion register is 1, the output process is interrupted. Events whose processing has not begun are distributed evenly among the processors, using a prefix operation; events whose processing has begun but not finished remain assigned to their current processors. Processing of the reallocated events then continues until the next time that one or more processors finishes all their allocated events. Toward the end of processing, the number of remaining events will become less than the number of available processors. At this time we deactivate processors, making the number of active processors equal to the number of remaining events; we must redirect the output bits from the remaining processors. No further event reallocations are needed, but we still have to deactivate processors whenever any of them finish.

We analyze the time required by the parallel algorithm. Assuming that in one time unit each processor can output one bit, we can easily show that the time required for bit

At least one processor completes all its events in each phase; such a processor must output at least one bit per event, since all code words in a Huffman code have at least one bit. This processor (and hence all processors) thus output at least n/(2p) bits in each phase, making a total of at least n/2 bits output in each phase. The total number of bits that must be output in the first superphase is at most nL/2, so the number of phases in the first superphase is at most L. The same reasoning holds for all superphases. The number of superphases needed to reduce the number of remaining events from n to p is log2(n/p), so the number of phases needed is just L log2(n/p). Once p or fewer events remain, at most L late phases are required, so the total number of phases needed is at most L log2(2n/p). □

The extension of the parallel Huffman algorithm to quasi-arithmetic coding is fairly straightforward. The only complication arises when the last event of a processor's allocation leaves the processor in a state other than the starting state. We deal with this by outputting the smallest possible number of bits (0, 1, or 2) needed to identify a subinterval that lies within the final interval; this is the end-of-file disambiguation problem that we have seen in Section II.

V. Modeling

The goal of modeling for statistical data compression is to provide probability information to the coder. The modeling process consists of structural and probability estimation components; each may be adaptive, semi-adaptive, or static. In addition there are two strategies for probability estimation. The first is to estimate each event's probability individually, based on its frequency within the data set. The other strategy is to estimate the probabilities collectively, assuming a probability distribution of a particular form and estimating the parameters of the distribution, either directly or indirectly. For direct estimation, we simply compute an estimate of the parameter (the variance, for instance) from the data. For indirect estimation [15], we start with a small number of possible distributions, and compute the code length that we would obtain with each; then we select the one with the smallest code length. This method is very general and can be used even for distributions from different families, without common parameters.

For lossless image compression, the structural component is usually fixed, since for most images pixel intensities are close in value to the intensities of nearby pixels. We code
pixels in a predetermined sequence, predicting each pixel's
output is b etween dnH =pe and dnH =pe + L. The more in-
intensity using a xed linear combination of a xed constella-
teresting analysis concerns the numb er of pre x op erations
tion of nearby pixels, then co ding the prediction error. Typ-
required. We de ne a phase to b e the p erio d b etween pre x
ically the prediction errors have a symmetrical exp onential-
op erations. Early phases are those which take place while
like distribution with zero mean, so the probability estima-
the number of events is greater than the numb er of pro ces-
tion comp onent consists of estimating the variance and p ossi-
sors; each is followed byanevent reallo cation. Late phases
bly some other parameters of the distributio n, either directly
are those needed to co de the nal p or fewer events; each late
or indirectly. A collection of only a few dozen distributio ns
phase requires a pre x op eration to redirect output bits. We
is sucient to co de most images with minimal compression
b ound the numb er of pre x op erations needed in the follow-
loss.
ing theorem.
For text, on the other hand, the b est metho ds involve con-
stantly changing structures. In text compression the events
Theorem 3 When coding n events using p processors, the
to b e co ded are just the letters in the le. Typically we
number of pre x operations needed by the real location coding
b egin by assuming only that some unsp eci ed sequences of
algorithm is at most L log 2n=p in the worst case.
2
letters will o ccur frequently.Wemay also sp ecify a max-
imum length for the frequently o ccurring sequences. As Proof :We de ne a superphase to b e the time needed to
enco ding pro ceeds we determine which sequences are most halve the numb er of remaining events. Consider the rst
frequent. In the Ziv-Lemp el dictionary-based metho ds [16, sup erphase. The number of events assigned to each pro cessor
17], the sequences are placed directly in the dictionary. The ranges from n=p in the rst phase down to n=2p in the last. 6
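As a minimal sketch of this dictionary-based approach, the parse below enters each newly seen sequence directly into the dictionary as encoding proceeds. This is an LZ78-style illustration only; the function name and details are ours, and [16, 17] describe the actual variants.

```python
# Minimal LZ78-style parse: frequently occurring sequences are entered
# directly into the dictionary as they are encountered (illustrative sketch).
def lz78_parse(text):
    dictionary = {"": 0}          # phrase -> index; the empty phrase is index 0
    output = []                   # list of (phrase index, extension letter) pairs
    phrase = ""
    for letter in text:
        if phrase + letter in dictionary:
            phrase += letter      # extend the current phrase
        else:
            output.append((dictionary[phrase], letter))
            dictionary[phrase + letter] = len(dictionary)
            phrase = ""
    if phrase:                    # flush any unfinished phrase
        output.append((dictionary[phrase], ""))
    return output

print(lz78_parse("abababa"))      # [(0, 'a'), (0, 'b'), (1, 'b'), (3, 'a')]
```

Each output pair codes a previously seen phrase plus one extension letter; the dictionary grows by one phrase per pair, so frequent sequences come to be represented by single indices, with no explicit probability estimation or statistical coding stage.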
The advantage of Ziv-Lempel methods is their speed, obtained by coding directly from the dictionary data structure, bypassing the explicit probability estimation and statistical coding stages. The PPM method [18] obtains better compression by constructing a Markov model of moderate order, proceeding to higher orders as more of the file is encoded. Complicated data structures are used to build and update the context information.

Most models for text compression involve estimating the letter probabilities individually, since there is no obvious mathematical relationship between the probabilities of different letters. Numerical proximity of ASCII representations does not imply anything about probabilities. We usually estimate the probability p of a given letter by

    p = (weight of letter) / (total weight of all letters).

The weight of a letter is usually based on the number of occurrences of the letter in a particular conditioning context.

Since we are often dealing with contexts that have occurred only a few times, we have to deal with letters that have never occurred. We cannot give a weight of 0 to such letters because that would lead to probability estimates of 0, which arithmetic coding cannot handle. This is the zero-frequency problem, thoroughly investigated by Witten and Bell [19]. It is possible to assign an initial weight of 1 to all possible letters, but a better strategy is to assign initial weights of 0 and to include a special escape event indicating "some letter that has never occurred before in this context"; this event has its own weight, and must be encoded like an ordinary letter whenever a new letter occurs. There are a number of methods of dealing with the zero-frequency problem, differing mainly in the computation of the escape probability.

A second issue that arises in text compression is locality of reference: strings tend to occur in clusters within a text file. One way to take advantage of locality is to scale the counts periodically, typically by halving all weights when the total weight reaches a specified number. This effectively gives more weight to more recent occurrences of each letter, and has the additional benefit of keeping the counts small enough to fit into small registers. Analysis [4] shows that scaling often helps compression, and can never hurt it by very much.

One technique for probability estimation when only two events are possible is to use small scaled counts, considering each count pair to be a probability state. We can then precompute both the corresponding probabilities and the state transitions caused by the occurrence of additional events. The states can be identified by index numbers, which in turn can be used to index into the probability range lists in quasi-arithmetic code tables like those in Table 5.

The separation of coding and modeling is important because it permits any degree of complexity in the modeler without requiring any change to the coder. In particular, the model structure and probability estimates can change adaptively. In Huffman coding, by contrast, the probability information must be built into the coding tables, making adaptive coding difficult. Even totally disjoint data streams can be intertwined; the only requirement is that the decoder must be able to track the model structure of the encoder.

One occasionally useful advantage of arithmetic coding is that it is easy to maintain lexicographic order without any loss in compression efficiency, so that encoded strings can have the same order as the original unencoded data; maintaining lexicographic order with a prefix code requires a more complicated algorithm (the Hu-Tucker algorithm [20]) and entails some sacrifice in efficiency.

The disadvantages of arithmetic coding are that it runs slowly, it is fairly complicated to implement, and it does not produce prefix codes. It can be speeded up, with only a slight loss in compression efficiency, by using approximations like quasi-arithmetic coding, the patented IBM Q-Coder, or Neal's low-precision coder [21]. The implementation difficulties are not insurmountable, but arithmetic coding will not be available as an off-the-shelf package until a fast, efficient, patent-free method is agreed upon as a standard. The non-prefix-code property leads to some technical difficulties: error resistance can be a serious problem, especially with adaptive models, and the output delay can be unbounded in the Witten-Neal-Cleary version described here, although not in the Q-Coder.

For binary alphabets, various forms of run-length encoding can be used instead of arithmetic coding, even if the probabilities are highly skewed. Golomb coding [22] and the related Rice coding [23] are based on exponentially distributed run lengths. The Golomb and Rice methods each consist of a family of codes parameterized by a single parameter; the parameter can be estimated adaptively [15], giving good compression efficiency. These codes are extremely fast prefix codes and are easily implemented in software or hardware. Other non-parameterized run-length codes like Elias codes [24] are less flexible and hence less useful.

When more than two input events are possible, the main alternatives to arithmetic coding are Huffman coding and coding based on splay trees [25]. Moffat et al. [26] compare arithmetic coding with these methods. Huffman coding is very effective in conjunction with a semi-adaptive model; the probability information can be built into the coding tables, leading to fast execution. Splay trees are even faster, since the data structure for coding is the "statistical" model; compression efficiency suffers somewhat. Golomb and Rice coding can be used when many events are possible, maintaining lists of the events in approximate probability order and coding the positions of the events within the lists. This is an especially useful technique for very large alphabets [27].

The main usefulness of arithmetic coding is in obtaining maximum compression in conjunction with an adaptive model, or when the probability of one event is much larger than 1/2. Arithmetic coding gives optimal compression, but its slow execution can be problematical. Approximate versions of arithmetic coding give almost optimal compression at improved speeds. Probabilities can be estimated approximately too, again leading to only slight degradation of compression performance.

VI. Conclusion

The main advantages of arithmetic coding for statistical data compression are its optimality and its inherent separation of coding and modeling. Pure arithmetic coding, as described in Section II, is strictly optimal for a stochastic data source whose probabilities are accurately known, but it relies on relatively slow arithmetic operations like multiplication and division. The quasi-arithmetic coding method described in Section III, which uses table lookup as a low-precision alternative to full-precision arithmetic, is nearly optimal and fairly fast, even when the probabilities are close to 1 or 0.
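To make the table-driven, probability-state technique of Section V concrete, the sketch below precomputes, for every pair of small scaled counts, the probability of the first event and the state transition caused by each new event, so that coding needs only table lookups. The bound MAX_TOTAL and the halving rule are illustrative assumptions of ours, not the parameters behind the paper's Table 5.

```python
from fractions import Fraction

# Two-event probability states: each pair of small scaled counts (c0, c1)
# is a state; probabilities and transitions are precomputed once.
MAX_TOTAL = 8                                # illustrative bound on c0 + c1

def next_state(c0, c1, event):
    """Update the count pair for one event, rescaling to keep counts small."""
    c0, c1 = (c0 + 1, c1) if event == 0 else (c0, c1 + 1)
    if c0 + c1 > MAX_TOTAL:                  # halve the counts, keeping each >= 1
        c0, c1 = max(1, c0 // 2), max(1, c1 // 2)
    return c0, c1

# Enumerate the states and precompute the lookup tables.
states = [(c0, c1) for c0 in range(1, MAX_TOTAL) for c1 in range(1, MAX_TOTAL)
          if c0 + c1 <= MAX_TOTAL]
prob0 = {s: Fraction(s[0], s[0] + s[1]) for s in states}         # P(event 0)
trans = {(s, e): next_state(*s, e) for s in states for e in (0, 1)}

print(prob0[(1, 1)], trans[((1, 1), 0)])     # 1/2 (2, 1)
```

Because every transition lands back in the enumerated state set, the states can be numbered and the tables indexed by those numbers, exactly the arrangement that lets a quasi-arithmetic coder avoid arithmetic at coding time.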
Acknowledgments

The authors would like to acknowledge helpful suggestions made by Marty Cohn.

References

[1] C. E. Shannon, "A Mathematical Theory of Communication," Bell Syst. Tech. J., 27, pp. 379-423, July 1948.

[2] D. A. Huffman, "A Method for the Construction of Minimum Redundancy Codes," Proceedings of the Institute of Radio Engineers, 40, pp. 1098-1101, 1952.

[3] T. C. Bell, J. G. Cleary and I. H. Witten, Text Compression. Englewood Cliffs, NJ: Prentice-Hall, 1990.

[4] P. G. Howard and J. S. Vitter, "Analysis of Arithmetic Coding for Data Compression," Information Processing and Management, 28, no. 6, pp. 749-763, 1992.

[5] I. H. Witten, R. M. Neal and J. G. Cleary, "Arithmetic Coding for Data Compression," Comm. ACM, 30, no. 6, pp. 520-540, June 1987.

[6] R. Pasco, "Source Coding Algorithms for Fast Data Compression," Stanford Univ., Ph.D. Thesis, 1976.

[7] J. J. Rissanen, "Generalized Kraft Inequality and Arithmetic Coding," IBM J. Res. Develop., 20, no. 3, pp. 198-203, May 1976.

[8] F. Rubin, "Arithmetic Stream Coding Using Fixed Precision Registers," IEEE Trans. Inform. Theory, IT-25, no. 6, pp. 672-675, Nov. 1979.

[9] J. J. Rissanen and G. G. Langdon, "Arithmetic Coding," IBM J. Res. Develop., 23, no. 2, pp. 146-162, Mar. 1979.

[10] M. Guazzo, "A General Minimum-Redundancy Source-Coding Algorithm," IEEE Trans. Inform. Theory, IT-26, no. 1, pp. 15-25, Jan. 1980.

[11] W. B. Pennebaker, J. L. Mitchell, G. G. Langdon and R. B. Arps, "An Overview of the Basic Principles of the Q-Coder Adaptive Binary Arithmetic Coder," IBM J. Res. Develop., 32, no. 6, pp. 717-726, Nov. 1988.

[12] P. G. Howard and J. S. Vitter, "Practical Implementations of Arithmetic Coding," in Image and Text Compression, J. A. Storer, Ed. Norwell, MA: Kluwer Academic Publishers, pp. 85-112, 1992.

[13] P. G. Howard and J. S. Vitter, "Design and Analysis of Fast Text Compression Based on Quasi-Arithmetic Coding," in Proc. Data Compression Conference, J. A. Storer and M. Cohn, Eds. Snowbird, Utah: pp. 98-107, Mar. 30-Apr. 1, 1993.

[14] P. G. Howard and J. S. Vitter, "Parallel Lossless Image Compression Using Huffman and Arithmetic Coding," in Proc. Data Compression Conference, J. A. Storer and M. Cohn, Eds. Snowbird, Utah: pp. 299-308, Mar. 24-26, 1992.

[15] P. G. Howard and J. S. Vitter, "Fast and Efficient Lossless Image Compression," in Proc. Data Compression Conference, J. A. Storer and M. Cohn, Eds. Snowbird, Utah: pp. 351-360, Mar. 30-Apr. 1, 1993.

[16] J. Ziv and A. Lempel, "A Universal Algorithm for Sequential Data Compression," IEEE Trans. Inform. Theory, IT-23, no. 3, pp. 337-343, May 1977.

[17] J. Ziv and A. Lempel, "Compression of Individual Sequences via Variable Rate Coding," IEEE Trans. Inform. Theory, IT-24, no. 5, pp. 530-536, Sept. 1978.

[18] J. G. Cleary and I. H. Witten, "Data Compression Using Adaptive Coding and Partial String Matching," IEEE Trans. Comm., COM-32, no. 4, pp. 396-402, Apr. 1984.

[19] I. H. Witten and T. C. Bell, "The Zero Frequency Problem: Estimating the Probabilities of Novel Events in Adaptive Text Compression," IEEE Trans. Inform. Theory, IT-37, no. 4, pp. 1085-1094, July 1991.

[20] T. C. Hu and A. C. Tucker, "Optimal Computer-Search Trees and Variable-Length Alphabetic Codes," SIAM J. Appl. Math., 21, no. 4, pp. 514-532, 1971.

[21] R. M. Neal, "Fast Arithmetic Coding Using Low-Precision Division," unpublished manuscript, 1987.

[22] S. W. Golomb, "Run-Length Encodings," IEEE Trans. Inform. Theory, IT-12, no. 4, pp. 399-401, July 1966.

[23] R. F. Rice, "Some Practical Universal Noiseless Coding Techniques," Jet Propulsion Laboratory, Pasadena, California, JPL Publication 79-22, Mar. 1979.

[24] P. Elias, "Universal Codeword Sets and Representations of Integers," IEEE Trans. Inform. Theory, IT-21, no. 2, pp. 194-203, Mar. 1975.

[25] D. W. Jones, "Application of Splay Trees to Data Compression," Comm. ACM, 31, no. 8, pp. 996-1007, Aug. 1988.

[26] A. M. Moffat, N. Sharman, I. H. Witten and T. C. Bell, "An Empirical Evaluation of Coding Methods for Multi-Symbol Alphabets," in Proc. Data Compression Conference, J. A. Storer and M. Cohn, Eds. Snowbird, Utah: pp. 108-117, Mar. 30-Apr. 1, 1993.

[27] A. M. Moffat, "Economical Inversion of Large Text Files," presented at Computing Systems, 1992.

Paul G. Howard is a Member of Technical Staff in the Visual Communications Research Department of AT&T Bell Laboratories. He received the B.S. degree in computer science from M.I.T. in 1977 and the M.S. and Ph.D. degrees in computer science from Brown University in 1989 and 1993, respectively. His research interests are in data compression, including coding, image modeling, and text modeling. He is a member of Sigma Xi and an associate member of the Center of Excellence in Space Data and Information Sciences. He was briefly a Research Associate at Duke University before joining AT&T in 1993.

Jeffrey Scott Vitter (Fellow, IEEE) is the Gilbert, Louis, and Edward Lehrman Professor of Computer Science and the Chair of the Department of Computer Science at Duke University. Previously he was Professor at Brown University, where he was on the faculty since 1980. He received the B.S. degree in mathematics with highest honors from the University of Notre Dame in 1977 and the Ph.D. degree in computer science from Stanford University in 1980. He is a Guggenheim Fellow, an NSF Presidential Young Investigator, and an IBM Faculty Development Awardee. He is coauthor of the book Design and Analysis of Coalesced Hashing (Oxford University Press, 1987) and is coholder of a patent in the area of external sorting. He has written numerous articles and has consulted often in the areas of combinatorial algorithms, I/O efficiency, parallel and incremental computation, computational geometry, and machine learning. He serves on several editorial boards and is a frequent editor of special issues and member of program committees. He has served ACM SIGACT as Member-at-Large from 1987 to 1991 and as Vice Chair since 1991. He is currently an associate member of the Center of Excellence in Space Data and Information Sciences.