Practical Implementations of
Arithmetic Coding

Paul G. Howard and Jeffrey Scott Vitter
Brown University
Department of Computer Science
Technical Report No. 92-18
Revised version, April 1992
Formerly Technical Report No. CS-91-45

Appears in Image and Text Compression,
James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, pages 85-112.
A shortened version appears in the proceedings of the
International Conference on Advances in Communication and Control (COMCON 3),
Victoria, British Columbia, Canada, October 16-18, 1991.
Practical Implementations of Arithmetic Coding¹

Paul G. Howard²    Jeffrey Scott Vitter³

Department of Computer Science
Brown University
Providence, R.I. 02912-1910
Abstract
We provide a tutorial on arithmetic coding, showing how it provides nearly optimal data compression and how it can be matched with almost any probabilistic model. We indicate the main disadvantage of arithmetic coding, its slowness, and give the basis of a fast, space-efficient, approximate arithmetic coder with only minimal loss of compression efficiency. Our coder is based on the replacement of arithmetic by table lookups coupled with a new deterministic probability estimation scheme.

Index terms: Data compression, arithmetic coding, adaptive modeling, analysis of algorithms, data structures, low precision arithmetic.
¹ A similar version of this paper appears in Image and Text Compression, James A. Storer, ed., Kluwer Academic Publishers, Norwell, MA, 1992, 85-112. A shortened version of this paper appears in the proceedings of the International Conference on Advances in Communication and Control (COMCON 3), Victoria, British Columbia, Canada, October 16-18, 1991.

² Support was provided in part by NASA Graduate Student Researchers Program grant NGT-50420 and by a National Science Foundation Presidential Young Investigators Award grant with matching funds from IBM. Additional support was provided by a Universities Space Research Association/CESDIS associate membership.

³ Support was provided in part by National Science Foundation Presidential Young Investigator Award CCR-9047466 with matching funds from IBM, by NSF research grant CCR-9007851, by Army Research Office grant DAAL03-91-G-0035, and by the Office of Naval Research and the Defense Advanced Research Projects Agency under contract N00014-91-J-4052, ARPA Order No. 8225. Additional support was provided by a Universities Space Research Association/CESDIS associate membership.
1 Data Compression and Arithmetic Coding

Data can be compressed whenever some data symbols are more likely than others. Shannon [54] showed that for the best possible compression code (in the sense of minimum average code length), the output length contains a contribution of -lg p bits from the encoding of each symbol whose probability of occurrence is p. If we can provide an accurate model for the probability of occurrence of each possible symbol at every point in a file, we can use arithmetic coding to encode the symbols that actually occur; the number of bits used by arithmetic coding to encode a symbol with probability p is very nearly -lg p, so the encoding is very nearly optimal for the given probability estimates.
In this paper we show by theorems and examples how arithmetic coding achieves its performance. We also point out some of the drawbacks of arithmetic coding in practice, and propose a unified compression system for overcoming them. We begin by attempting to clear up some of the false impressions commonly held about arithmetic coding; it offers some genuine benefits, but it is not the solution to all data compression problems.
The most important advantage of arithmetic coding is its flexibility: it can be used in conjunction with any model that can provide a sequence of event probabilities. This advantage is significant because large compression gains can be obtained only through the use of sophisticated models of the input data. Models used for arithmetic coding may be adaptive, and in fact a number of independent models may be used in succession in coding a single file. This great flexibility results from the sharp separation of the coder from the modeling process [47]. There is a cost associated with this flexibility: the interface between the model and the coder, while simple, places considerable time and space demands on the model's data structures, especially in the case of a multi-symbol input alphabet.
The other important advantage of arithmetic coding is its optimality. Arithmetic coding is optimal in theory and very nearly optimal in practice, in the sense of encoding using minimal average code length. This optimality is often less important than it might seem, since Huffman coding [25] is also very nearly optimal in most cases [8,9,18,39]. When the probability of some single symbol is close to 1, however, arithmetic coding does give considerably better compression than other methods. The case of highly unbalanced probabilities occurs naturally in bilevel (black and white) image coding, and it can also arise in the decomposition of a multi-symbol alphabet into a sequence of binary choices.
The main disadvantage of arithmetic coding is that it tends to be slow. We shall see that the full precision form of arithmetic coding requires at least one multiplication per event and in some implementations up to two multiplications and two divisions per event. In addition, the model lookup and update operations are slow because of the input requirements of the coder. Both Huffman coding and Ziv-Lempel [59,60] coding are faster because the model is represented directly in the data structures
used for coding. This reduces the coding efficiency of those methods by narrowing the range of possible models. Much of the current research in arithmetic coding concerns finding approximations that increase coding speed without compromising compression efficiency. The most common method is to use an approximation to the multiplication operation [10,27,29,43]; in this paper we present an alternative approach using table lookups and approximate probability estimation.
Another disadvantage of arithmetic coding is that it does not in general produce a prefix code. This precludes parallel coding with multiple processors. In addition, the potentially unbounded output delay makes real-time coding problematical in critical applications, but in practice the delay seldom exceeds a few symbols, so this is not a major problem. A minor disadvantage is the need to indicate the end of the file.

One final minor problem is that arithmetic codes have poor error resistance, especially when used with adaptive models [5]. A single bit error in the encoded file causes the decoder's internal state to be in error, making the remainder of the decoded file wrong. In fact this is a drawback of all adaptive codes, including Ziv-Lempel codes and adaptive Huffman codes [12,15,18,26,55,56]. In practice, the poor error resistance of adaptive coding is unimportant, since we can simply apply appropriate error correction coding to the encoded file. More complicated solutions appear in [5,20], in which errors are made easy to detect, and upon detection of an error, bits are changed until no errors are detected.
Overview of this paper. In Section 2 we give a tutorial on arithmetic coding. We include an introduction to modeling for text compression. We also restate several important theorems from [22] relating to the optimality of arithmetic coding in theory and in practice.

In Section 3 we present some of our current research into practical ways of improving the speed of arithmetic coding without sacrificing much compression efficiency. The center of this research is a reduced-precision arithmetic coder, supported by efficient data structures for text modeling.
2 Tutorial on Arithmetic Coding
In this section we explain how arithmetic coding works and give implementation details; our treatment is based on that of Witten, Neal, and Cleary [58]. We point out the usefulness of binary arithmetic coding (that is, coding with a 2-symbol alphabet), and discuss the modeling issue, particularly high-order Markov modeling for text compression. Our focus is on encoding, but the decoding process is similar.

2.1 Arithmetic coding and its implementation

Basic algorithm. The algorithm for encoding a file using arithmetic coding works conceptually as follows:
Figure 1: Subdivision of the current interval based on the probability of the input symbol a_i that occurs next.
1. We begin with a "current interval" [L, H) initialized to [0, 1).

2. For each symbol of the file, we perform two steps (see Figure 1):

   (a) We subdivide the current interval into subintervals, one for each possible alphabet symbol. The size of a symbol's subinterval is proportional to the estimated probability that the symbol will be the next symbol in the file, according to the model of the input.

   (b) We select the subinterval corresponding to the symbol that actually occurs next in the file, and make it the new current interval.

3. We output enough bits to distinguish the final current interval from all other possible final intervals.
The length of the final subinterval is clearly equal to the product of the probabilities of the individual symbols, which is the probability p of the particular sequence of symbols in the file. The final step uses almost exactly -lg p bits to distinguish the file from all other possible files. We need some mechanism to indicate the end of the file, either a special end-of-file symbol coded just once, or some external indication of the file's length.
In step 2, we need to compute only the subinterval corresponding to the symbol a_i that actually occurs. To do this we need two cumulative probabilities, P_C = \sum_{k=1}^{i-1} p_k and P_N = \sum_{k=1}^{i} p_k. The new subinterval is [L + P_C (H - L), L + P_N (H - L)). The need to maintain and supply cumulative probabilities requires the model to have a complicated data structure; Moffat [35] investigates this problem, and concludes for a multi-symbol alphabet that binary search trees are about twice as fast as move-to-front lists.
Example 1: We illustrate a non-adaptive code, encoding the file containing the symbols bbb using arbitrary fixed probability estimates p_a = 0.4, p_b = 0.5, and p_EOF = 0.1. Encoding proceeds as follows:
Current interval   Action      Subinterval for a   Subinterval for b   Subinterval for EOF   Input
[0.000, 1.000)     Subdivide   [0.000, 0.400)      [0.400, 0.900)      [0.900, 1.000)        b
[0.400, 0.900)     Subdivide   [0.400, 0.600)      [0.600, 0.850)      [0.850, 0.900)        b
[0.600, 0.850)     Subdivide   [0.600, 0.700)      [0.700, 0.825)      [0.825, 0.850)        b
[0.700, 0.825)     Subdivide   [0.700, 0.750)      [0.750, 0.812)      [0.812, 0.825)        EOF
[0.812, 0.825)

The final interval (without rounding) is [0.8125, 0.825), which in binary is approximately [0.11010 00000, 0.11010 01100). We can uniquely identify this interval by outputting 1101000. According to the fixed model, the probability p of this particular file is (0.5)^3 (0.1) = 0.0125, exactly the size of the final interval, and the code length (in bits) should be -lg p = 6.322. In practice we have to output 7 bits.
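The subdivision step can be made concrete with a short program. The sketch below is ours, not taken from the paper; the names encode_interval, probs, and order are illustrative. It runs the conceptual algorithm with exact rational arithmetic and reproduces the final interval of Example 1.

    # Minimal sketch of the conceptual (unbounded-precision) encoder.
    from fractions import Fraction

    def encode_interval(symbols, probs, order):
        """Return the final interval [low, high) for the given symbol sequence."""
        low, high = Fraction(0), Fraction(1)
        for s in symbols:
            width = high - low
            # Cumulative probability of the symbols preceding s, and through s.
            p_c = sum((probs[t] for t in order[:order.index(s)]), Fraction(0))
            p_n = p_c + probs[s]
            low, high = low + p_c * width, low + p_n * width
        return low, high

    probs = {'a': Fraction(2, 5), 'b': Fraction(1, 2), 'EOF': Fraction(1, 10)}
    low, high = encode_interval(['b', 'b', 'b', 'EOF'], probs, ['a', 'b', 'EOF'])
    print(float(low), float(high))   # 0.8125 0.825, the final interval of Example 1

Emitting enough bits to single out this interval (here 1101000) is the job of the termination step and of the incremental output mechanism described below.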
The idea of arithmetic coding originated with Shannon in his seminal 1948 paper on information theory [54]. It was rediscovered by Elias about 15 years later, as briefly mentioned in [1].
Implementation details. The basic implementation of arithmetic coding described above has two major difficulties: the shrinking current interval requires the use of high precision arithmetic, and no output is produced until the entire file has been read. The most straightforward solution to both of these problems is to output each leading bit as soon as it is known, and then to double the length of the current interval so that it reflects only the unknown part of the final interval. Witten, Neal, and Cleary [58] add a clever mechanism for preventing the current interval from shrinking too much when the endpoints are close to 1/2 but straddle 1/2. In that case we do not yet know the next output bit, but we do know that whatever it is, the following bit will have the opposite value; we merely keep track of that fact, and expand the current interval symmetrically about 1/2. This follow-on procedure may be repeated any number of times, so the length of the current interval always remains greater than 1/4.
Mechanisms for incremental transmission and fixed precision arithmetic have been developed through the years by Pasco [40], Rissanen [48], Rubin [52], Rissanen and Langdon [49], Guazzo [19], and Witten, Neal, and Cleary [58]. The bit-stuffing idea of Langdon and others at IBM that limits the propagation of carries in the additions is roughly equivalent to the follow-on procedure described above.
We now describe in detail how the coding and interval expansion work. This process takes place immediately after the selection of the subinterval corresponding to an input symbol.
We repeat the following steps (illustrated schematically in Figure 2, and sketched in code after step (d) below) as many times as possible:

(a) If the new subinterval is not entirely within one of the intervals [0, 1/2), [1/4, 3/4), or [1/2, 1), we stop iterating and return.
Figure 2: Interval expansion process. (a) No expansion. (b) Interval in [0, 1/2). (c) Interval in [1/2, 1). (d) Interval in [1/4, 3/4) (follow-on case).
(b) If the new subinterval lies entirely within [0, 1/2), we output 0 and any 1s left over from previous symbols; then we double the size of the interval [0, 1/2), expanding toward the right.

(c) If the new subinterval lies entirely within [1/2, 1), we output 1 and any 0s left over from previous symbols; then we double the size of the interval [1/2, 1), expanding toward the left.

(d) If the new subinterval lies entirely within [1/4, 3/4), we keep track of this fact for future output; then we double the size of the interval [1/4, 3/4), expanding in both directions away from the midpoint.
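The following sketch of this expansion loop is our illustration rather than any published implementation; expand, emit, and the state dictionary are names we chose for clarity. It operates on an interval within [0, 1) and keeps a count of pending follow-on bits.

    # Illustrative sketch of the interval expansion (renormalization) loop.
    # 'state' holds the output bits emitted so far and the pending follow-on count.

    def emit(bit, state):
        """Output 'bit', then flush any pending follow-on bits (each the opposite of 'bit')."""
        state['bits_out'].append(bit)
        state['bits_out'].extend([1 - bit] * state['follow'])
        state['follow'] = 0

    def expand(low, high, state):
        """Repeat steps (a)-(d) on the interval [low, high) until no expansion applies."""
        while True:
            if high <= 0.5:                       # step (b): interval entirely in [0, 1/2)
                emit(0, state)
                low, high = 2 * low, 2 * high
            elif low >= 0.5:                      # step (c): interval entirely in [1/2, 1)
                emit(1, state)
                low, high = 2 * low - 1, 2 * high - 1
            elif low >= 0.25 and high <= 0.75:    # step (d): follow-on case, interval in [1/4, 3/4)
                state['follow'] += 1
                low, high = 2 * low - 0.5, 2 * high - 0.5
            else:                                 # step (a): no further expansion is possible
                return low, high

    state = {'bits_out': [], 'follow': 0}
    low, high = expand(0.60, 0.85, state)
    print(state['bits_out'], low, high)   # [1] and roughly [0.20, 0.70), as in Example 2

Applied after each subdivision, this loop carries out the Output and Expand rows of the table in Example 2 below; the Output 0 in that table's last row comes from the termination step (step 3 of the basic algorithm), not from this loop.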
Example 2: We show the details of encoding the same file as in Example 1.
Current interval   Action              Subinterval for a   Subinterval for b   Subinterval for EOF   Input
[0.00, 1.00)       Subdivide           [0.00, 0.40)        [0.40, 0.90)        [0.90, 1.00)          b
[0.40, 0.90)       Subdivide           [0.40, 0.60)        [0.60, 0.85)        [0.85, 0.90)          b
[0.60, 0.85)       Output 1
                   Expand [1/2, 1)
[0.20, 0.70)       Subdivide           [0.20, 0.40)        [0.40, 0.65)        [0.65, 0.70)          b
[0.40, 0.65)       follow
                   Expand [1/4, 3/4)
[0.30, 0.80)       Subdivide           [0.30, 0.50)        [0.50, 0.75)        [0.75, 0.80)          EOF
[0.75, 0.80)       Output 10
                   Expand [1/2, 1)
[0.50, 0.60)       Output 1
                   Expand [1/2, 1)
[0.00, 0.20)       Output 0
                   Expand [0, 1/2)
[0.00, 0.40)       Output 0
                   Expand [0, 1/2)
[0.00, 0.80)       Output 0

The "follow" output in the sixth line indicates the follow-on procedure: we keep track of our knowledge that the next output bit will be followed by its opposite; this "opposite" bit is the 0 output in the ninth line. The encoded file is 1101000, as before.
Clearly the current interval contains some information about the preceding inputs; this information has not yet been output, so we can think of it as the coder's state. If a is the length of the current interval, the state holds -lg a bits not yet output. In the basic method illustrated by Example 1 the state contains all the information about the output, since nothing is output until the end. In the implementation illustrated by Example 2, the state always contains fewer than two bits of output information, since the length of the current interval is always more than 1/4. The final state in Example 2 is [0, 0.8), which contains -lg 0.8 ≈ 0.322 bits of information.
Use of integer arithmetic. In practice, the arithmetic can be done by storing the current interval in sufficiently long integers rather than in floating point or exact rational numbers. We can think of Example 2 as using the integer interval [0, 100) by omitting all the decimal points. We also use integers for the frequency counts used to estimate symbol probabilities. The subdivision process involves selecting non-overlapping intervals of length at least 1 with lengths approximately proportional to the counts. To encode symbol a_i we need two cumulative counts, C = \sum_{k=1}^{i-1} c_k and N = \sum_{k=1}^{i} c_k, and the sum T of all counts, T = \sum_{k=1}^{n} c_k. Here and elsewhere we denote the alphabet size by n. The new subinterval is [L + ⌊C(H - L)/T⌋, L + ⌊N(H - L)/T⌋).
In this discussion we continue to use half-open intervals as in the real arithmetic case. In implementations [58] it is more convenient to subtract 1 from the right endpoints and use closed intervals. Moffat [36] considers the calculation of cumulative frequency counts for large alphabets.
Example 3: Suppose that at a certain point in the encoding we have symbol counts c_a = 4, c_b = 5, and c_EOF = 1 and current interval [25, 89) from the full interval [0, 128). Let the next input symbol be b. The cumulative counts for b are C = 4 and N = 9, and T = 10, so the new interval is [25 + ⌊4(89 - 25)/10⌋, 25 + ⌊9(89 - 25)/10⌋) = [50, 82); we then increment the follow-on count and expand the interval once about the midpoint 64, giving [36, 100). It is possible to maintain higher precision, truncating and adjusting to avoid overlapping subintervals only when the expansion process is complete; this makes it possible to prove a tight analytical bound on the lost compression caused by the use of integer arithmetic, as we do in [22], restated as Theorem 1 below. In practice this refinement makes the coding more difficult without improving compression.
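A small sketch (again ours, with the assumed names subdivide, counts, and FULL) shows the integer subdivision rule and one follow-on expansion; run on the counts of Example 3 it reproduces the intervals [50, 82) and [36, 100).

    # Illustrative sketch of integer-arithmetic subdivision, using the numbers of Example 3.
    FULL = 128                        # full integer range [0, FULL)
    HALF, QUARTER = FULL // 2, FULL // 4

    def subdivide(L, H, counts, order, symbol):
        """Select the integer subinterval of [L, H) for 'symbol' from cumulative counts."""
        C = sum(counts[s] for s in order[:order.index(symbol)])   # counts before 'symbol'
        N = C + counts[symbol]                                    # counts through 'symbol'
        T = sum(counts.values())                                  # total count
        return L + (C * (H - L)) // T, L + (N * (H - L)) // T

    counts = {'a': 4, 'b': 5, 'EOF': 1}
    L, H = subdivide(25, 89, counts, ['a', 'b', 'EOF'], 'b')
    print(L, H)                                   # 50 82, as in Example 3

    # The new interval lies in [QUARTER, 3*QUARTER), so one follow-on expansion applies.
    assert QUARTER <= L and H <= 3 * QUARTER
    L, H = 2 * L - HALF, 2 * H - HALF
    print(L, H)                                   # 36 100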
Analysis. In [22] we prove a number of theorems about the code lengths of files coded with arithmetic coding. Most of the results involve the use of arithmetic coding in conjunction with various models of the input; these will be discussed in Section 2.3. Here we note two results that apply to implementations of the arithmetic coder. The first shows that using integer arithmetic has negligible effect on code length.
Theorem 1  If we use integers from the range [0, N) and use the high precision algorithm for scaling up the subrange, the code length is provably bounded by 4/(N ln 2) bits per input symbol more than the ideal code length for the file.