PPM Performance with BWT Complexity: A New Method for Lossless Data Compression

Michelle Effros

California Institute of Technology

[email protected]

Abstract

This work combines a new fast context-search algorithm with the lossless
source coding models of PPM to achieve a lossless data compression algorithm
with the linear context-search complexity and memory of BWT and Ziv-Lempel
codes and the compression performance of PPM-based algorithms. Both
sequential and nonsequential encoding are considered. The proposed algorithm
yields an average rate of 2.27 bits per character (bpc) on the Calgary corpus,
comparing favorably to the 2.33 and 2.34 bpc of PPM5 and PPM* and the 2.43
bpc of BW94, but not matching the 2.12 bpc of PPMZ9, which, at the time of
this publication, gives the greatest compression of all algorithms reported on
the Calgary corpus results page. The proposed algorithm gives an average rate
of 2.14 bpc on the Canterbury corpus. The Canterbury corpus web page gives
average rates of 1.99 bpc for PPMZ9, 2.11 bpc for PPM5, 2.15 bpc for PPM7,
and 2.23 bpc for BZIP2 (a BWT-based code) on the same data set.

I Introduction

The Burrows-Wheeler Transform (BWT) [1] is a reversible sequence transformation
that is becoming increasingly popular for lossless data compression. The BWT
rearranges the symbols of a data sequence in order to group together all symbols
that share the same unbounded history or "context." Intuitively, this operation
is achieved by forming a table in which each row is a distinct cyclic shift of
the original data string. The rows are then ordered lexicographically. The BWT
outputs the last character of each row, which is the character that precedes the
first character of the given cyclic shift in the original data string. For a
finite memory source, this ordering groups together all symbols with the same
conditional distribution and leads to a family of low complexity universal
lossless source codes [2, 3].

While the best of the universal codes described in [2, 3] converges to the
optimal performance at a rate within a constant factor of the optimum, the
performance of these codes on finite sequences from sources such as text fails
to meet the performance of some competing algorithms [4]. This failure stems in
part from the fact that in grouping together all symbols with the same context,
the BWT makes the context information inaccessible to the decoder.

[Footnote: M. Effros is with the Department of Electrical Engineering (MC
136-93) at the California Institute of Technology, Pasadena, CA 91125. This
material is based upon work partially supported by NSF Award No. CCR-9909026.]

Originally, the work described in this paper set out to modify the BWT-based
codes of [2, 3, 4] in order to make the context information accessible to the
decoder. What evolved, however, was effectively a variation on the Prediction by
Partial Mapping (PPM) algorithms of [5, 6]. Thus the work presented here can be
viewed either as a modification of the BWT to include context information or as
a modification of PPM to allow for the computational efficiency of the BWT. The
description that follows takes the latter viewpoint.

The PPM algorithm and its descendants are extremely effective techniques for
sequential lossless source coding. In tests on data sets such as the Calgary and
Canterbury corpora, the performance of PPM codes consistently meets or exceeds
the performance of a wide array of algorithms, including techniques based on the
BWT and Ziv-Lempel style algorithms such as LZ77 [7], LZ78 [8], and their
descendants. Yet despite their excellent performance, PPM codes are far less
commonly applied than algorithms like LZ77 and LZ78. The Ziv-Lempel codes are
favored over PPM-based codes for their relative efficiencies in memory and
computational complexity [9]. Straightforward implementations of some PPM
algorithms require $O(n^2)$ computational complexity and memory to code a data
sequence of length $n$. While implementations requiring only $O(n)$ memory have
been proposed in the literature [6, 10], the high computational complexity and
encoding time of PPM algorithms remains an impediment to their more widespread
use.

This paper introduces a new context-search algorithm. While the proposed
algorithm could also be employed in Ziv-Lempel and BWT-based codes, its real
distinction is its applicability within PPM-based codes. The PPM code used is a
minor variation on an existing PPM algorithm [10]. Our code achieves average
rates of 2.27 and 2.14 bits per character (bpc), respectively, on the Calgary
and Canterbury corpora and is efficient in both space and computation. The
algorithm uses $O(n)$ memory and achieves $O(n)$ complexity in its search for
the longest matching context for each symbol of a length-$n$ data sequence. The
$O(n)$ complexity is a significant improvement over the worst-case $O(n^2)$
complexity of a direct context search. Several variations on the given approach
are presented. The variations include both sequential and non-sequential
encoding techniques and allow the user to trade off encoder memory, delay, and
computational complexity.

The remainder of the paper is organized as follows. Section II contains a review
of PPM algorithms. The review focuses on PPM* [6] and its exclusion-based
variation from [10]. A short introduction to suffix trees and McCreight's suffix
tree construction algorithm [11] follows in Section III. McCreight's algorithm
is used in implementations of Ziv-Lempel [12] and BWT [1] codes. The algorithm
description is followed by a brief discussion of the difficulties inherent in
applying McCreight's algorithm in PPM-based codes. Section IV describes a new
method for combining PPM data compression with a new suffix-tree algorithm.
Section V gives experimental results and conclusions.

II PPM Algorithms

The algorithms in the PPM family of codes combine sequential arithmetic source
coding (see, for example, [13], [14], or texts such as [9]) with adaptive
Markov-style data models. Given a probability model $p(x^n)$ for symbols
$x_1, \ldots, x_n$ in some finite source alphabet $\mathcal{X}$, arithmetic
coding guarantees a description length $\ell(x^n)$ such that
$\ell(x^n) < -\log p(x^n) + 2$ for all possible
$x^n = (x_1, \ldots, x_n) \in \mathcal{X}^n$ [13]. Thus, given a good source
model, arithmetic codes yield excellent lossless source coding performance.
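As a quick worked instance of this bound (a hypothetical model, with logarithms
taken base 2):

\[
  p(x^n) = 2^{-100}
  \quad\Longrightarrow\quad
  \ell(x^n) < -\log_2 2^{-100} + 2 = 102 \text{ bits},
\]

i.e., the arithmetic coder is guaranteed to spend at most 2 bits beyond the
ideal codelength $-\log_2 p(x^n)$, whatever the model.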

A simple approach to source modeling is the Markov model approach. For any
finite integer $k$, a Markov model of order $k$ conditions the probability of
the next symbol on the previous $k$ symbols. Thus we code symbol $x_n$ using the
probability $p(x_n \mid x^{n-1}) = p(x_n \mid x_{n-k}^{n-1})$, where the string
$x_{n-k}^{n-1} = (x_{n-k}, x_{n-k+1}, \ldots, x_{n-1})$ describes the "context"
of past information on which our estimation of likely future events is
conditioned.

In an adaptive code, the probability estimates are built using information about
the previously coded data stream. Thus the encoder may update its probability
estimates $p_n(a|s)$ for all $a \in \mathcal{X}$ and all $s \in \mathcal{X}^k$
at each time step $n$ in order to better reflect its current knowledge of the
source. The subscript $n$ on $p_n(a|s)$ here makes explicit the adaptive nature
of the probability estimate; $p_n(a|s)$ is the estimate of probability $p(a|s)$
at time $n$, just before the $n$th symbol is coded. The estimate $p_n(a|s)$ is
based on the $n-1$ previously coded symbols in the data stream. Let $N_n(a|s)$
denote the number of times that symbol $a$ has followed sequence
$s \in \mathcal{X}^k$ in the previous data stream, where
$N_n(a|s) = \sum_{i=k+1}^{n-1} 1(x_{i-k}^i = sa)$ for each $a \in \mathcal{X}$.
If the probability model $p_n(a|s)$ relies only on the conditional symbol counts
$\{N_n(a|s) : a \in \mathcal{X}, s \in \mathcal{X}^k\}$ and if the decoder can
sequentially decipher the information sent to it, then the decoder can track the
changing probability model $p_n$ by keeping a tally of symbol counts identical
to the one used at the encoder.
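To make the synchronization argument concrete, here is a minimal sketch in
Python (not from the paper; the class name, the add-one smoothing, and the
256-symbol byte alphabet are illustrative assumptions) of an order-$k$ adaptive
model driven entirely by the counts $N_n(a|s)$:

    from collections import defaultdict

    class AdaptiveMarkovModel:
        """Order-k adaptive model driven only by the counts N_n(a|s)."""

        def __init__(self, k, alphabet_size=256):
            self.k = k
            self.alphabet_size = alphabet_size
            # counts[s][a] = N_n(a|s): times symbol a has followed context s
            self.counts = defaultdict(lambda: defaultdict(int))

        def estimate(self, s, a):
            # Estimate p_n(a|s) from the counts seen so far. The add-one
            # smoothing is a placeholder; PPM instead handles unseen symbols
            # with escape events, as described below.
            total = sum(self.counts[s].values())
            return (self.counts[s][a] + 1) / (total + self.alphabet_size)

        def update(self, s, a):
            # Called once per coded symbol at BOTH ends, so encoder and
            # decoder maintain identical models.
            self.counts[s][a] += 1

    model = AdaptiveMarkovModel(k=2)
    data = "abracadabra"
    for i in range(model.k, len(data)):
        s, a = data[i - model.k:i], data[i]
        p = model.estimate(s, a)   # probability an arithmetic coder would use
        model.update(s, a)

Because `estimate` and `update` depend only on previously coded symbols, the
decoder can replay exactly the same sequence of calls.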

PPM source models generalize adaptive Markov source models by replacing the
single Markov model of fixed order $k$ by a collection of Markov models of
varying orders. For example, given some finite memory constraint $M$, the
original PPM algorithm uses Markov models of all orders
$k \in \{-1, 0, \ldots, M\}$, where the order $k = -1$ model refers to a fixed
uniform distribution on all symbols $x \in \mathcal{X}$. Typical values of $M$
for text compression satisfy $M \le 6$ [5, 15]. PPM uses "escape" events to
combine the prediction probabilities of its $M+1$ Markov models. The escape
mechanism is employed on symbols that have not previously been seen in a
particular context. Let Esc denote the escape character, and use $\mathcal{Y}$
to denote the modified alphabet $\mathcal{X} \cup \{\mathrm{Esc}\}$. Use
$N_n(\mathrm{Esc}|s)$ to denote the number of distinct symbols that have
followed context $s$, which equals the number of times an escape was used in the
given context. PPM builds its probability estimate for symbol $a$ at time $n$
recursively. It begins by finding the longest context of the given symbol that
has previously appeared in the data stream. It then uses the escape character to
back off to lower order models if necessary to find a model in which symbol $a$
is not novel in the given context.

More precisely, for each $a \in \mathcal{Y}$ and each $k \in \{0, \ldots, M\}$
let

\[
  p(n, k, a) =
  \frac{N_n(a \mid x_{n-k}^{n-1})}{\sum_{b \in \mathcal{Y}} N_n(b \mid x_{n-k}^{n-1})},
\]

and define

\[
  p(n, -1, a) = \frac{1}{|\mathcal{X}|},
\]

where $|\mathcal{X}|$ is the size of alphabet $\mathcal{X}$. Define $K(n)$ to be
the length of the longest context for symbol $n$ that has previously appeared in
the given data stream. For each $a \in \mathcal{X}$ let $k(n, a)$ be the largest
$k \in \{0, \ldots, K(n)\}$ such that $N_n(a \mid x_{n-k}^{n-1}) > 0$. The
initial context index $K(n)$ equals zero if the context of symbol $n$ has not
previously appeared in the data stream; the final context index $k(n, a)$ is set
to $-1$ if symbol $a$ has not previously appeared in the given data string. PPM
uses probability model

\[
  p_n(a \mid x^{n-1}) = p_n(a \mid x_{n-K(n)}^{n-1})
  = p(n, k(n,a), a) \prod_{k = k(n,a)+1}^{K(n)} p(n, k, \mathrm{Esc}).
\]

This model is inefficient in its normalization of $p(n, k, a)$ for
$k(n, a) \le k < K(n)$. By describing an escape character in model $k$, the
encoder tells the decoder that the observed symbol satisfies $N_n(a|s) = 0$ in
that model's context $s$. As a result, both the encoder and the decoder may
remove from their lower order probability estimates all symbols
$a \in \mathcal{X}$ such that $N_n(a|s) > 0$. Modifying the normalizing
constants accordingly improves system performance, and thus this modification is
assumed in the work that follows.
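The escape recursion and the exclusion refinement can be written compactly. The
sketch below is illustrative rather than the paper's code: `counts[k]` maps
symbols to $N_n(\cdot|s)$ for the current order-$k$ context, `esc[k]` holds
$N_n(\mathrm{Esc}|s)$, and the loop starts at order $K(n)$, the longest context
that has previously appeared (so each `total` is nonzero):

    def ppm_probability(a, counts, esc, alphabet_size):
        p = 1.0
        excluded = set()
        for k in range(len(counts) - 1, -1, -1):   # order K(n) down to 0
            # Exclusion: symbols ruled out by a higher-order escape are
            # removed before normalizing.
            active = {b: c for b, c in counts[k].items() if b not in excluded}
            total = sum(active.values()) + esc[k]
            if a in active:               # a is not novel here, so k = k(n, a)
                return p * active[a] / total
            p *= esc[k] / total           # code an escape and back off an order
            excluded.update(active)
        # Order -1: uniform over the symbols not yet excluded.
        return p / (alphabet_size - len(excluded))

    # After coding "abab", the order-1 context "b" has been followed only
    # by "a"; the order-0 context has seen "a" and "b" twice each:
    counts = [{"a": 2, "b": 2}, {"a": 1}]   # orders 0 and 1
    esc = [2, 1]                            # distinct symbols seen per order
    print(ppm_probability("a", counts, esc, alphabet_size=256))   # -> 0.5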



PPM* [6] modifies PPM by removing the a priori memory constraint $M$ to allow
for unbounded contexts. Models are kept for all contexts of all lengths that
have previously appeared in the data stream. In choosing its initial context,
PPM* uses the shortest matching deterministic context, where a context is
deterministic if the number of distinct symbols that have previously appeared in
that context is exactly equal to one. If no such matching deterministic context
exists, then PPM* uses the longest matching context instead.



In [10], Bunton considers a collection of variations on PPM*. One of those
variations uses the "update exclusion" mechanism of [15]. The update exclusion
method replaces count $N_n(a|s)$ in the calculation of $p_n(a|s)$ with a
modified count $N'_n(a|s)$. While $N$ increments by one each time symbol $a$ is
seen in context $s \in \mathcal{X}^*$, $N'$ increments by one only if symbol $a$
is either novel in context $s$ or $s$ is a suffix of some longer context
$t \in \mathcal{X}^*$ such that $|t| = |s| + 1$ and $a$ is novel in context $t$.
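Because every occurrence of $a$ in a context $t$ is also an occurrence of $a$ in
each suffix of $t$, novelty is monotone across the nested contexts of a symbol:
if $a$ is novel in some context, it is novel in every longer context. The update
exclusion rule above therefore amounts to incrementing $N'$ from the longest
context downward and stopping just after the first context in which $a$ had
already been seen. A minimal sketch of that reading (illustrative names; the
list is assumed to begin with the symbol's longest, necessarily novel, context):

    def update_exclusion(a, contexts_longest_first, n_prime):
        # contexts_longest_first: the symbol's nested contexts, each one
        # character longer than its successor; n_prime[s][a] stores N'(a|s).
        for s in contexts_longest_first:
            was_novel = n_prime[s].get(a, 0) == 0
            n_prime[s][a] = n_prime[s].get(a, 0) + 1
            if not was_novel:
                # a had already been seen here, hence also in every shorter
                # context; none of those counts is incremented.
                break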

III Suffix Trees



In the original PPM* algorithm [6], the contexts are stored in a "context trie."
In [10], the contexts are stored in a suffix tree. This paper follows an
approach closer to that of the latter work. A suffix tree for data sequence
$x^n = (x_1, x_2, \ldots, x_n)$ describes all suffixes of $x^n$. We denote those
suffixes by $\{s_i\}_{i=1}^n$, where $s_i = x_i^n = (x_i, \ldots, x_n)$. Using
$\$$ to denote a unique end-of-string character, the suffix tree for a fixed
string $x^n$ is uniquely determined by three properties [11]: (1) each tree arc
may represent any finite nonempty string $s \in \mathcal{X}^*$; (2) each
internal tree node except for the root must have at least two children; (3) the
strings on sibling arcs must begin with different characters. The suffix tree
for banana appears in Figure 1(a).

[Figure 1: (a) A suffix tree for the string banana. (b) A suffix tree for
ananab, or, equivalently, a "prefix tree" for banana.]

A suffix tree on a string of length $n$ has $n$ leaves and no more than $2n$
nodes, giving a linear storage requirement. Each arc string, described by its
start and end points in the original data sequence, is stored at the node to
which the given arc descends. In the implementation of [11], each node contains
a single "sibling" pointer and a single "descendant" pointer to link it to the
rest of the tree.

Suffix trees are very popular data structures for use in lossless codes. In
particular, suffix tree implementations of both Ziv-Lempel style codes (e.g.,
[12]) and BWT-based codes [1] appear in the literature. In Ziv-Lempel codes,
suffix trees may be used to find longest matches in the previously coded data
stream. In fact, the suffix tree for $x^n$ records all matches of all lengths in
$x^n$. Each match is an internal node of the tree. The number of times that the
match occurs in $x^n$ equals the number of leaves descending from that node. For
example, from the context tree in Figure 1(a) we learn that the substring "a"
appears three times in the data string while the substrings "ana" and "na" each
appear twice.
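The counting property is easy to check with a toy structure. The sketch below
builds an uncompacted suffix trie rather than the compacted suffix tree of [11]
(compaction changes the storage bound, not the counting property); all names are
illustrative:

    def build_suffix_trie(text):
        # Uncompacted suffix trie: each node's count records how many
        # suffixes pass through it, i.e., how many times the corresponding
        # substring occurs in the text.
        root = {"children": {}, "count": 0}
        for i in range(len(text)):
            node = root
            for ch in text[i:]:                    # insert suffix text[i:]
                node = node["children"].setdefault(
                    ch, {"children": {}, "count": 0})
                node["count"] += 1
        return root

    trie = build_suffix_trie("banana$")            # "$" ends the string
    node = trie
    for ch in "ana":                               # walk the path "ana"
        node = node["children"][ch]
    print(node["count"])                           # -> 2: "ana" occurs twice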

To understand the relationship between suffix trees and the BWT, suppose that
the tree arcs descending from each node in the tree are ordered alphabetically,
with branch "a" falling to the left of branch "b" and so on. Assuming this
lexicographic ordering of tree arcs, note that the suffix tree gives a
lexicographic ordering of all suffixes of the string $x^n$. Note further that,
given the existence of the end-of-string symbol, the lexicographic ordering of
all suffixes is equivalent to the lexicographic ordering of all cyclic shifts of
the original data stream. As a result, if we modify the suffix tree of Figure
1(a) by adding the symbol $x_{i-1}$ to the leaf corresponding to suffix
$s_i = (x_i, \ldots, x_n)$, then performing the BWT on $x^n$ is equivalent to
reading off the $x_{i-1}$ values from left to right through the tree. This
similarity to the BWT is not coincidental. Both PPM* and the BWT rely on the
same unbounded contexts in building probability models for use in adaptive
arithmetic coding. The key difference between these two algorithms is that PPM*
manages to preserve context information in a manner that is sequentially
available to both encoder and decoder, while the BWT makes this information
inaccessible to the decoder except through explicit or implicit context
description as in some of the algorithms of [2, 3].



The suffix tree implementation of PPM* given in [10] is closely related to the
context trie approach of [6], and the memory required by both algorithms grows
linearly in $n$. At time $n$, the encoder stores both the tree structure
associated with all contexts $s_1, s_2, \ldots, s_{n-1}$ and a list of $K(n)+1$
pointers. These $K(n)+1$ pointers point to the $n$th symbol's contexts of
lengths $0, 1, \ldots, K(n)$, all of which are represented by either states or
"virtual states" in the context tree. While the contexts are suffixes of each
other, there is no natural order to their locations in the context tree.

In [11], McCreight introduces an $O(n)$ complexity technique for suffix tree
construction. The $O(n)$ complexity is a huge savings over the $O(n^2)$
worst-case complexity of direct tree construction. Unfortunately, McCreight's
technique is not directly applicable to PPM*, as the following discussion
illustrates.

In [11], suffix tree construction is performed by adding suffixes to the tree
one by one in order of decreasing length. Thus in the example of Figure 1(a),
the suffix "banana" would be added to the tree first, followed by the suffix
"anana", followed by the suffix "nana", and so on. While each subsequent suffix
could be added to the tree by simply starting at the root and searching down for
the longest match, the time required by this straightforward approach is
superlinear. The two key observations used in [11] are:

- Context $s_i$ differs from context $s_{i+1}$ in only its first character
  ($s_i = x_i s_{i+1}$).

- The longest match for $s_i$ is known in searching for the longest match for
  $s_{i+1}$.

Together, these two observations imply that if the longest match for $s_i$ in
the previous data string is $x_i s$ for some $s \in \mathcal{X}^*$, then the
longest match for suffix $s_{i+1}$ must begin with string $s$. Combining this
observation with an efficient approach for extending $s$ to find the full match
for $s_{i+1}$ yields a linear tree construction algorithm.

i+1

The efficiency of McCreight's suffix tree construction technique results from
its use of information about $s_{i-1}$ in adding suffix $s_i$ to the tree. An
unfortunate consequence of this approach, however, is that suffix tree
construction using McCreight's algorithm requires access to the complete data
string. While the complete data string could be made available to PPM*'s encoder
if we assume a nonsequential code, the decoder does not have the full data
string during suffix tree construction. In fact, the decoder cannot decode the
binary source description without access to at least part of the suffix tree. As
a result of this predicament (the decoder needs the data sequence to get the
suffix tree and needs the suffix tree to get the data sequence), McCreight's
algorithm cannot be directly applied to PPM*.

IV Algorithm

The algorithm considered here is a memory- and computation-efficient
implementation of PPM*. Like the implementation of [10], the implementation
given here uses suffix trees for storing and computing the information used in
PPM*'s probability model. Unlike the prior implementation, however, the
algorithm given here relies on a suffix tree of the reversed data stream or,
equivalently, a "prefix tree" for the data stream under consideration. Note,
again, the similarities between this approach and the BWT approach, where string
reversal is often employed prior to performance of the transform. A suffix tree
for the reversed data string ananab, which is also a prefix tree for the data
string banana, appears in Figure 1(b).

Prefix trees are useful for the following property: all contexts of a given
symbol are nested subsets of a single branch through the prefix tree. For
example, suppose that we are constructing a prefix tree as we code each
subsequent symbol of a data string $x^n$, and suppose that the symbols seen so
far are "banana". Then the prefix tree available for coding the next symbol is
the prefix tree of Figure 1(b). The contexts for this symbol are: "a", "na",
"ana", "nana", "anana", and "banana". All of these contexts are represented
along the branch whose terminating leaf is labeled by a * in Figure 1(b). The
reversed contexts "a", "an", "ana", "anan", "anana", and "ananab" label the
partial substrings of this branch. Using this approach, we can replace the
earlier list of context pointers by a single context pointer pointing to the
longest context of interest in coding the $n$th symbol.
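The nesting property can be checked directly on the running example (a small
illustrative snippet, not part of the paper's implementation):

    coded = "banana"
    contexts = [coded[-k:] for k in range(1, len(coded) + 1)]
    print(contexts)            # ['a', 'na', 'ana', 'nana', 'anana', 'banana']
    reversed_contexts = [c[::-1] for c in contexts]
    print(reversed_contexts)   # ['a', 'an', 'ana', 'anan', 'anana', 'ananab']
    # Each reversed context is a prefix of the next, so all of them lie on
    # one root-to-leaf branch of the prefix tree, and a single pointer to
    # the deepest node on that branch locates every context at once.
    assert all(reversed_contexts[i + 1].startswith(reversed_contexts[i])
               for i in range(len(reversed_contexts) - 1))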

As discussed in the previous section, the use of suffix (or, equivalently,
prefix) trees in implementing the probability models in PPM* is motivated by
their space-efficiency. The goal here is to develop a fast suffix tree
construction algorithm that can be applied directly within PPM*. Unfortunately,
the order in which PPM* makes the suffixes of the reversed data string available
for addition to the suffix tree is exactly opposite the order in which strings
are added to the suffix tree in McCreight's algorithm. More specifically, in
PPM*, symbols become available to the decoder sequentially, where symbol $x_i$
can be decoded only after the context tree based on symbols
$x_1, \ldots, x_{i-1}$ has been constructed. Thus using $p_i$ to denote the
$i$th prefix $p_i = (x_1, \ldots, x_i)$ of string $x^n$, PPM* requires an
algorithm for adding suffixes of the reversed data stream in order from shortest
to longest, that is, in order $p_1 = s_n = (x_1)$, $p_2 = s_{n-1} = (x_1, x_2)$,
$\ldots$, $p_n = s_1 = (x_1, \ldots, x_n)$. In contrast, McCreight's algorithm
adds the longest suffix first and then follows with shorter and shorter suffixes
of that original suffix, giving order $s_1, s_2, \ldots, s_n$. The following
observations parallel the observations on which McCreight's algorithm is built.

- Prefix $p_i$ differs from prefix $p_{i-1}$ by only its last character
  ($p_i = p_{i-1} x_i$). Thus the longest context for symbol $x_{i+1}$, here
  denoted by $\mathrm{suf}_i(p_i)$, is at most one character longer than the
  longest context for symbol $x_i$ and can rely on no characters prior to those
  found in the context for $x_i$. More precisely,
  $\mathrm{suf}_i(p_i) = \mathrm{suf}_i(p_{i-1} x_i)
  = \mathrm{suf}_i(\mathrm{suf}_{i-1}(p_{i-1}) x_i)$.

- Since we add the prefixes to the tree in order $p_1, p_2, \ldots$, the longest
  match for prefix $p_{i-1}$ is known prior to the search for the longest match
  for prefix $p_i$.

A strategy similar to that of [11] exploits these observations to decrease the
complexity associated with searching for the longest context for each subsequent
symbol. The basic idea is to add auxiliary links to the data structure. These
links provide "short-cuts" in the search for the longest match. A description of
the algorithm follows.

The tree is initialized to a single node corresponding to the (length-0) empty
string. The initial tree is labeled $T_0$. The suffix tree is then built up one
leaf at a time, by adding one prefix at each time step. After the $(i-1)$th
addition, the tree contains prefixes $p_1, p_2, \ldots, p_{i-1}$ and is called
$T_{i-1}$. Associated with every node in the tree except for the root and the
most recently added leaf ($p_{i-1}$) is an array of "short-cut" pointers. The
number of elements in a short-cut array equals the number of symbols seen so far
in the corresponding context. The short-cut pointer at node $s$ for symbol $a$
points to the tree node corresponding to context $sa$ and is created the first
time symbol $a$ appears in context $s$. Short-cut pointers are not necessary at
the root since the symbol-$a$ short-cut pointer from the root would always point
from the root to the root's child (if any) labeled by "a".

At time $i-1$, the algorithm visits $\mathrm{suf}_{i-1}(p_{i-1})$ in order to
add the leaf for prefix $p_{i-1}$. Thus the algorithm for step $i$ assumes node
$\mathrm{suf}_{i-1}(p_{i-1})$ as its starting point. At step $i$, the PPM*
encoder describes symbol $x_i$ using tree $T_{i-1}$ and then adds prefix $p_i$
to the tree. In making this addition, the algorithm begins at node
$\mathrm{suf}_{i-1}(p_{i-1})$, which is the longest possible context for symbol
$x_i$. If symbol $x_i$ has previously appeared in context
$\mathrm{suf}_{i-1}(p_{i-1})$, then $\mathrm{suf}_{i-1}(p_{i-1})$ has a
short-cut pointer for symbol $x_i$. Traversing this pointer leads to node
$\mathrm{suf}_i(p_i) = \mathrm{suf}_{i-1}(p_{i-1}) x_i$. If $x_i$ has not
previously appeared in context $\mathrm{suf}_{i-1}(p_{i-1})$, then the algorithm
considers a context one symbol shorter and looks for a context pointer there.
This procedure continues until the algorithm either finds and traverses a
short-cut pointer or ends up at the root. The final node in this procedure is an
extension of $\mathrm{suf}_i(p_i)$. The algorithm creates a node for
$\mathrm{suf}_i(p_i)$ if necessary and then adds leaf $p_i$ to
$\mathrm{suf}_i(p_i)$. Each context of $x_i$ visited along the way is given a
short-cut pointer pointing to the new leaf. Since symbol $x_i$ is novel in each
such context, the counts $N'(x_i|s)$ are incremented for each of these nodes and
each of their parents.
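A sketch of the construction just described follows (Python, illustrative rather
than the paper's implementation): nodes store parent and short-cut pointers
only; the probability model, the counts $N'$, and the compaction of unary paths
into arcs are all omitted, and short-cuts here point eagerly at exact context
nodes where the paper points at the new leaf and creates the exact node for
$\mathrm{suf}_i(p_i)$ lazily:

    class Node:
        """One context in the tree; the root is the empty context."""
        def __init__(self, parent):
            self.parent = parent   # this context with its oldest symbol dropped
            self.shortcut = {}     # shortcut[a]: this context extended by a

    def build_context_tree(symbols):
        root = Node(parent=None)
        longest = root                      # node for suf_{i-1}(p_{i-1})
        for x in symbols:
            # (Here the PPM* coder would describe x using the model at
            # `longest` and its ancestors, and update the counts N'(x|s).)
            visited = []
            node = longest
            while node is not root and x not in node.shortcut:
                visited.append(node)        # x is novel here: climb one
                node = node.parent          # symbol shorter and look again
            if x not in node.shortcut:      # reached the root: x alone is a
                node.shortcut[x] = Node(parent=root)   # new length-1 context
            new_longest = node.shortcut[x]  # node for suf_i(p_i)
            # Each climbed context v gets a short-cut for x pointing one node
            # deeper along the newly extended path.
            tip = new_longest
            for v in reversed(visited):     # shortest climbed context first
                tip = Node(parent=tip)      # node for context v extended by x
                v.shortcut[x] = tip
            longest = new_longest
        return root

    tree = build_context_tree("banana")

Each iteration performs $k_i$ parent steps plus one short-cut traversal, which
is exactly the quantity bounded by the telescoping argument below.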

Let $k_i$ denote the number of contexts of $x_i$ visited in searching for
$\mathrm{suf}_i(p_i)$. Finding the complexity of the tree construction algorithm
is equivalent to finding $\sum_{i=1}^n k_i$. Repeated application of the
equality $|\mathrm{suf}_i(p_i)| = |\mathrm{suf}_{i-1}(p_{i-1}) x_i| - k_i$
accomplishes that goal, giving finally
$\sum_{i=1}^n k_i = n - |\mathrm{suf}_n(p_n)| \le n$. Thus in total at most $n$
nodes must be visited in the search for $\mathrm{suf}_i(p_i)$ given
$\mathrm{suf}_{i-1}(p_{i-1})$, giving $O(n)$ complexity.

The resulting algorithm is $O(n)$ in memory; the constant here is larger than
that of [10] due to the additional pointers. The motivation for the pointers is
the reduction, from worst-case $O(n^2)$ to $O(n)$, in the complexity associated
with finding the longest context. The same approach could be used in suffix-tree
construction for other algorithms as well, including, for example, PPM.

The given context tree design algorithm may be applied in a number of different
ways to yield a PPM* variant using probability estimates based entirely on the
update exclusion counters $N'_n(a|s)$. Sequential and non-sequential methods are
proposed here. The sequential approach uses the above algorithm in both its
encoder and its decoder; thus the sequential code's encoder and decoder use the
same procedure to independently update the tree with each subsequent symbol. A
non-sequential approach, using McCreight's algorithm at the encoder and the
above algorithm at the decoder, is also proposed. The algorithms differ in their
memory and computational complexity, but both algorithms are $O(n)$ in both the
memory required to store the model and the number of computations needed to find
the longest context of an incoming data symbol. Note that the given complexity
describes only the complexity associated with finding the longest matching
context but does not include the cost of moving from the longest context to the
shortest deterministic context in PPM*.

Calgary Corpus                            Canterbury Corpus

File   B97    NEW    PPM*   BW94          File   PPMZ9  NEW    PPM7   BZIP2
       (bpc)  (bpc)  (bpc)  (bpc)                (bpc)  (bpc)  (bpc)  (bpc)
bib    1.79   1.84   1.91   2.07          text   2.08   2.18   2.26   2.27
bk1    2.18   2.39   2.40   2.49          fax    0.79   0.87   0.94   0.78
bk2    1.86   1.97   2.02   2.13          csrc   1.87   1.95   2.08   2.18
geo    4.46   4.75   4.83   4.45          excl   1.01   1.49   0.97   1.01
news   2.29   2.37   2.42   2.59          sprc   2.45   2.67   2.58   2.70
obj1   3.68   3.79   4.00   3.98          tech   1.83   1.95   2.01   2.02
obj2   2.28   2.35   2.43   2.64          poe    2.22   2.40   2.46   2.42
ppr1   2.25   2.32   2.37   2.55          html   2.19   2.29   2.35   2.48
ppr2   2.21   2.33   2.36   2.51          lisp   2.25   2.33   2.43   2.79
pic    0.78   0.87   0.85   0.83          man    2.87   2.94   3.01   3.33
prgc   2.29   2.34   2.40   2.58          play   2.34   2.47   2.55   2.53
prgl   1.55   1.59   1.67   1.80          AVG    1.99   2.14   2.15   2.23
prgp   1.53   1.56   1.62   1.79
trns   1.33   1.38   1.45   1.57
AVG    2.18   2.27   2.34   2.43

Table 1: Compression results on the Calgary and Canterbury corpora.


V Results and Conclusions



This paper introduces a new implementation of PPM* with update exclusions that
reduces the worst-case $O(n^2)$ complexity of the tree update mechanism to
$O(n)$. The new algorithm maintains the $O(n)$ memory of earlier PPM* algorithms
but increases the constant in that term. Table 1 shows the rate results achieved
by the proposed algorithm (labeled as "NEW") as compared to those of a variety
of alternative algorithms. The results for competing algorithms on the Calgary
corpus are quoted from [6] and [10] (B97). The results on the Canterbury corpus
are quoted from the Canterbury corpus web page. While the results given in Table
1 are by no means exhaustive, they give a reasonable picture of how the
performance of the proposed algorithm compares to current alternatives. While
the proposed approach is not the best possible algorithm in rate performance, it
surpasses both PPM* and the BWT-based codes in compression capabilities using
only $O(n)$ complexity in the tree design.

References

[1] M. Burrows and D. J. Wheeler. A block-sorting lossless data compression
algorithm. Technical Report SRC 124, Digital Systems Research Center, Palo Alto,
CA, May 1994.

[2] M. Effros. Universal lossless source coding with the Burrows Wheeler
transform. In Proceedings of the Data Compression Conference, pages 178-187,
Snowbird, UT, March 1999. IEEE.

[3] M. Effros. Universal lossless source coding with the Burrows Wheeler
transform. Submitted to the IEEE Transactions on Information Theory, June 28,
1999.

[4] M. Effros. Theory meets practice: universal source coding with the Burrows
Wheeler transform. In Proceedings of the Allerton Conference on Communication,
Control, and Computing, Monticello, IL, September 1999. IEEE.

[5] J. G. Cleary and I. H. Witten. Data compression using adaptive coding and
partial string matching. IEEE Transactions on Communications, 32(4):396-402,
April 1984.

[6] J. G. Cleary, W. J. Teahan, and I. H. Witten. Unbounded length contexts for
PPM. In Proceedings of the Data Compression Conference, pages 52-61, Snowbird,
UT, March 1995. IEEE Computer Society.

[7] J. Ziv and A. Lempel. A universal algorithm for sequential data compression.
IEEE Transactions on Information Theory, IT-23:337-343, May 1977.

[8] J. Ziv and A. Lempel. Compression of individual sequences via variable-rate
coding. IEEE Transactions on Information Theory, IT-24(5):530-536, September
1978.

[9] T. C. Bell, J. G. Cleary, and I. H. Witten. Text Compression. Prentice Hall,
New Jersey, 1990.

[10] S. Bunton. Semantically motivated improvements for PPM variants. The
Computer Journal, 40(2/3):76-93, 1997.

[11] E. M. McCreight. A space-economical suffix tree construction algorithm.
Journal of the ACM, 23(2):262-272, April 1976.

[12] M. Rodeh, V. R. Pratt, and S. Even. Linear algorithm for data compression
via string matching. Journal of the Association for Computing Machinery,
28(1):16-24, January 1981.

[13] F. Jelinek. Buffer overflow in variable length coding of fixed rate
sources. IEEE Transactions on Information Theory, 14:490-501, May 1968.

[14] J. Rissanen and G. G. Langdon, Jr. Arithmetic coding. IBM Journal of
Research and Development, 23(2):149-162, March 1979.

[15] A. Moffat. Implementing the PPM data compression scheme. IEEE Transactions
on Communications, 38(11):1917-1921, November 1990.