arXiv:cs/0503085v1 [cs.IT] 30 Mar 2005

Dynamic Shannon Coding

Travis Gagie, Student Member, IEEE

Travis Gagie is with the Department of Computer Science, University of Toronto.

Abstract— We present a new algorithm for dynamic prefix-free coding, based on Shannon coding. We give a simple analysis and prove a better upper bound on the length of the encoding produced than the corresponding bound for dynamic Huffman coding. We show how our algorithm can be modified for efficient length-restricted coding, alphabetic coding and coding with unequal letter costs.

Index Terms— Data compression, length-restricted codes, alphabetic codes, codes with unequal letter costs.

I. INTRODUCTION

Prefix-free coding is a well-studied problem in data compression and combinatorial optimization. For this problem, we are given a string S = s1···sm drawn from an alphabet of size n and must encode each character by a self-delimiting binary codeword. Our goal is to minimize the length of the entire encoding. For static prefix-free coding, we are given all of S before we start encoding and must encode every occurrence of the same character by the same codeword. The assignment of codewords to characters is recorded as a preface to the encoding. For dynamic prefix-free coding, we are given S character by character and must encode each character before receiving the next one. We can use a different codeword for different occurrences of the same character, we do not need a preface to the encoding and the assignment of codewords to characters cannot depend on the suffix of S not yet encoded.

The best-known algorithms for static coding are by Shannon [1] and Huffman [2]. Shannon's algorithm uses at most (H + 1)m + O(n log n) bits to encode S, where

    H = Σ_{a∈S} (#a(S)/m) log(m/#a(S))

is the empirical entropy of S. Here, #a(S) is the number of occurrences of the character a in S and, by a ∈ S, we mean that a occurs in S. Shannon proved a lower bound of Hm bits for all coding algorithms, whether or not they are prefix-free. Huffman's algorithm produces an encoding that, excluding the preface, has minimum length: the total length is (H + r)m + O(n log n) bits, where 0 ≤ r < 1 is a function of the character frequencies in S [3].

Both algorithms assign codewords to characters by constructing a code-tree T, that is, a binary tree whose left and right edges are labelled by 0's and 1's, respectively, and whose leaves are labelled by the distinct characters in S. The codeword assigned to a character a is the sequence of edge labels on the path from the root to the leaf labelled a. Shannon's algorithm builds a code-tree in which, for each character a in S, the leaf labelled a is of depth at most ⌈log(m/#a(S))⌉. Huffman's algorithm builds a Huffman tree for the frequencies of the characters in S. A Huffman tree for a sequence of weights w1,...,wn is a binary tree whose leaves, in some order, have weights w1,...,wn and that, among all such trees, minimizes the weighted external path length Σi wi di, where di is the depth of the leaf with weight wi. To build a Huffman tree, we start with n trees, each consisting of just a root. At each step, we make the two roots with smallest weights, wi and wj, into the children of a new root; that new root has weight wi + wj.

A minimax tree for a sequence of weights w1,...,wn is a binary tree whose leaves, in some order, have weights w1,...,wn and that, among all such trees, minimizes the maximum sum of any leaf's weight and depth. Golumbic [4] gave an algorithm, similar to Huffman's, for constructing a minimax tree. The difference is that, when we make the two roots with smallest weights, wi and wj, into the children of a new root, the new root has weight max(wi, wj) + 1 instead of wi + wj. Notice that, if there exists a binary tree whose leaves, in some order, have depths d1,...,dn, then a minimax tree T for the sequence of weights −d1,...,−dn is such a tree and, more generally, the depth of each node in T is bounded above by the negative of its weight. So we can construct a code-tree for Shannon's algorithm by running Golumbic's algorithm, starting with roots labelled by the distinct characters in S, with the root labelled a having weight −⌈log(m/#a(S))⌉.
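To make Golumbic's construction concrete, here is a minimal Python sketch (ours, not from the original papers; the function and variable names are our own). It merges the two smallest-weight roots, giving the new root weight max(wi, wj) + 1, and the final assertion checks the depth property claimed above when the weights are −⌈log(m/#a(S))⌉.

    import heapq
    import math

    def minimax_tree_depths(weights):
        # Golumbic's algorithm: repeatedly merge the two smallest-weight
        # roots; the new root gets weight max(wi, wj) + 1.
        heap = [(w, [a]) for a, w in weights.items()]  # (weight, leaves below)
        heapq.heapify(heap)
        depth = {a: 0 for a in weights}
        while len(heap) > 1:
            wi, li = heapq.heappop(heap)
            wj, lj = heapq.heappop(heap)
            for a in li + lj:  # every leaf under the new root sinks one level
                depth[a] += 1
            heapq.heappush(heap, (max(wi, wj) + 1, li + lj))
        return depth

    # A Shannon code-tree: start with weight -ceil(log(m / #a(S))) per character.
    S = "mississippi"
    m = len(S)
    weights = {a: -math.ceil(math.log2(m / S.count(a))) for a in set(S)}
    depths = minimax_tree_depths(weights)
    assert all(depths[a] <= -w for a, w in weights.items())
    print(depths)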
Both Shannon's algorithm and Huffman's algorithm have three phases: a first pass over S to count the occurrences of each distinct character, an assignment of codewords to the distinct characters in S (recorded as a preface to the encoding) and a second pass over S to encode each character in S using the assigned codeword. The first phase takes O(m) time, the second O(n log n) time and the third O((H + 1)m) time.

For any static algorithm A, there is a simple dynamic algorithm that recomputes the code-tree from scratch after reading each character. Specifically, for 1 ≤ i ≤ m:

1) We keep a running count of the number of occurrences of each distinct character in the current prefix ⊥s1···si−1, where ⊥ is a special character not in the alphabet.
2) We compute the assignment of codewords to characters that would result from applying A to ⊥s1···si−1.
3) If si occurs in s1···si−1, then we encode si as the codeword assigned to that character.
4) If si does not occur in s1···si−1, then we encode si as the concatenation of the codeword assigned to ⊥ and the binary representation of si's index in the alphabet.

We can later decode character by character. That is, we can recover si as soon as we have received c1···ci, where cj denotes the codeword that encodes sj. To see why, assume that we have recovered s1···si−1. Then we can compute the assignment of codewords to characters that A used to encode si. Since A is prefix-free, ci is the only codeword in this assignment that is a prefix of ci···cm. Thus, we can recover si as soon as ci has been received. This takes the same amount of time as encoding si.
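The simple dynamic algorithm itself fits in a few lines of Python (again a sketch of ours; shannon_codewords implements the textbook cumulative-probability form of Shannon's static code, in which the codeword for a character of count c is the first ⌈log(total/c)⌉ bits of the binary expansion of the cumulative probability before it):

    import math
    from collections import Counter

    def shannon_codewords(counts):
        # Shannon's static code, built from cumulative probabilities.
        total = sum(counts.values())
        codewords, cum = {}, 0
        for a, c in sorted(counts.items(), key=lambda kv: -kv[1]):
            length = max(1, math.ceil(math.log2(total / c)))
            num, bits = cum, []
            for _ in range(length):  # binary expansion of cum/total, exactly
                num *= 2
                bits.append('1' if num >= total else '0')
                if num >= total:
                    num -= total
            codewords[a] = ''.join(bits)
            cum += c
        return codewords

    def encode(s, alphabet):
        # Steps 1-4 above: rebuild the code for the prefix BOT s1...s_{i-1},
        # then emit either the codeword for s_i or the codeword for BOT
        # ('BOT' stands in for the special character) plus an index.
        index_bits = math.ceil(math.log2(len(alphabet)))
        counts = Counter({'BOT': 1})
        out = []
        for c in s:
            code = shannon_codewords(counts)
            if c in counts:
                out.append(code[c])
            else:
                out.append(code['BOT'])
                out.append(format(alphabet.index(c), '0%db' % index_bits))
                counts[c] = 0
            counts[c] += 1
        return ''.join(out)

    print(encode("abracadabra", "abcdefghijklmnopqrstuvwxyz"))

Rebuilding the code from scratch after every character is what makes this simple version slow; Section II bounds the length of its output and Section III removes the rebuilding cost.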
Faller [5] and Gallager [6] independently gave a dynamic coding algorithm based on Huffman's algorithm. Their algorithm is similar to, but much faster than, the simple dynamic algorithm obtained by adapting Huffman's algorithm as described above. After encoding each character of S, their algorithm merely updates the Huffman tree rather than rebuilding it from scratch. Knuth [7] implemented their algorithm so that it uses time proportional to the length of the encoding produced. For this reason, it is sometimes known as Faller-Gallager-Knuth coding; however, it is most often called dynamic Huffman coding. Milidiú, Laber, and Pessoa [8] showed that this version of dynamic Huffman coding uses fewer than 2m more bits to encode S than Huffman's algorithm. Vitter [9] gave an improved version that he showed uses fewer than m more bits than Huffman's algorithm. These results imply that Knuth's and Vitter's versions use at most (H + 2 + r)m + O(n log n) and (H + 1 + r)m + O(n log n) bits, respectively, to encode S, but it is not clear whether these bounds are tight. Both algorithms use O((H + 1)m) time.

In this paper, we present a new dynamic algorithm, dynamic Shannon coding. In Section II, we show that the simple dynamic algorithm obtained by adapting Shannon's algorithm as described above uses at most (H + 1)m + O(n log m) bits and O(mn log n) time to encode S. Section III contains our main result, an improved version of dynamic Shannon coding that uses at most (H + 1)m + O(n log m) bits to encode S and only O((H + 1)m + n log² m) time. The relationship between Shannon's algorithm and this algorithm is similar to that between Huffman's algorithm and dynamic Huffman coding, but our algorithm is much simpler to analyze than dynamic Huffman coding.

In Section IV, we show that dynamic Shannon coding can be applied to three related problems. We give algorithms for dynamic length-restricted coding, dynamic alphabetic coding and dynamic coding with unequal letter costs. Our algorithms have better bounds on the length of the encoding produced than were previously known. For length-restricted coding, no codeword can exceed a given length. For alphabetic coding, the lexicographic order of the codewords must be the same as that of the characters.

Throughout, we make the common simplifying assumption that m ≥ n. Our model of computation is the unit-cost word RAM with Ω(log m)-bit words. In this model, ignoring space required for the input and output, all the algorithms mentioned in this paper use O(|{a : a ∈ S}|) words, that is, space proportional to the number of distinct characters in S.

II. ANALYSIS OF SIMPLE DYNAMIC SHANNON CODING

In this section, we analyze the simple dynamic algorithm obtained by repeating Shannon's algorithm after each character of the string ⊥s1···sm, as described in the introduction. Since the second phase of Shannon's algorithm, assigning codewords to characters, takes O(n log n) time, this simple algorithm uses O(mn log n) time to encode S. The rest of this section shows this algorithm uses at most (H + 1)m + O(n log m) bits to encode S.

For 1 ≤ i ≤ m and each distinct character a that occurs in ⊥s1···si−1, Shannon's algorithm on ⊥s1···si−1 assigns to a a codeword of length at most ⌈log(i/#a(⊥s1···si−1))⌉. This fact is key to our analysis.

Let R be the set of indices i such that si is a repetition of a character in s1···si−1. That is, R = {i : 1 ≤ i ≤ m, si ∈ {s1,...,si−1}}. Our analysis depends on the following technical lemma.

Lemma 1:

    Σ_{i∈R} log(i/#si(s1···si−1)) ≤ Hm + O(n log m).

Proof: Let

    L = Σ_{i∈R} log(i/#si(s1···si−1)).

Notice that Σ_{i∈R} log i < Σ_{i=1}^m log i = log(m!). Also, for i ∈ R, if si is the jth occurrence of a in S, for some j ≥ 2, then log #si(s1···si−1) = log(j − 1). Thus,

    L = Σ_{i∈R} log i − Σ_{i∈R} log #si(s1···si−1)
      < log(m!) − Σ_{a∈S} Σ_{j=2}^{#a(S)} log(j − 1)
      = log(m!) − Σ_{a∈S} log(#a(S)!) + Σ_{a∈S} log #a(S).

There are at most n distinct characters in S and each occurs at most m times, so Σ_{a∈S} log #a(S) ∈ O(n log m). By Stirling's Formula,

    x log x − x/ln 2 < log(x!) ≤ x log x − x/ln 2 + O(log x).

Thus,

    L < m log m − m/ln 2 − Σ_{a∈S} (#a(S) log #a(S) − #a(S)/ln 2) + O(n log m).

Since Σ_{a∈S} #a(S) = m,

    L < Σ_{a∈S} #a(S) log(m/#a(S)) + O(n log m).

By definition, this is Hm + O(n log m).

As an aside, we note Σ_{a∈S} log #a(S) ∈ o((H + 1)m); to see why, compare corresponding terms in Σ_{a∈S} log #a(S) and the expansion

    (H + 1)m = Σ_{a∈S} #a(S) (log(m/#a(S)) + 1).

Using Lemma 1, it is easy to bound the number of bits that simple dynamic Shannon coding uses to encode S.

Theorem 2: Simple dynamic Shannon coding uses at most (H + 1)m + O(n log m) bits to encode S.

Proof: If si is the first occurrence of that character in S (i.e., i ∈ {1,...,m} − R), then the algorithm encodes si as the codeword for ⊥, which is at most ⌈log m⌉ bits, followed by the binary representation of si's index in the alphabet, which is ⌈log n⌉ bits. Since there are at most n such characters, the algorithm encodes them all using O(n log m) bits.

Now, consider the remaining characters in S, that is, those characters whose indices are in R. In total, the algorithm encodes these using at most

    Σ_{i∈R} ⌈log(i/#si(⊥s1···si−1))⌉ < m + Σ_{i∈R} log(i/#si(s1···si−1))

bits. By Lemma 1, this is at most (H + 1)m + O(n log m). Therefore, in total, this algorithm uses at most (H + 1)m + O(n log m) bits to encode S.

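As a quick numeric check of Theorem 2 (ours, not part of the paper), the snippet below charges ⌈log(i/#si(⊥s1···si−1))⌉ bits for each repeated character and ⌈log i⌉ + ⌈log n⌉ bits for each first occurrence, then compares the total against (H + 1)m:

    import math
    from collections import Counter

    def simulated_bits(s, n):
        # Codeword lengths only; the prefix BOT s1...s_{i-1} has length i.
        counts = Counter({'BOT': 1})
        total = 0
        for i, c in enumerate(s, start=1):
            if c in counts:
                total += math.ceil(math.log2(i / counts[c]))
            else:  # codeword for BOT, then the character's index
                total += math.ceil(math.log2(i)) + math.ceil(math.log2(n))
                counts[c] = 0
            counts[c] += 1
        return total

    s = "abracadabra" * 50
    m, freq = len(s), Counter(s)
    H = sum(f / m * math.log2(m / f) for f in freq.values())
    print(simulated_bits(s, 26), "vs. (H + 1)m =", round((H + 1) * m))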
III. DYNAMIC SHANNON CODING

This section explains how to improve simple dynamic Shannon coding so that it uses at most (H + 1)m + O(n log m) bits and O((H + 1)m + n log² m) time to encode the string S = s1···sm. The main ideas for this algorithm are using a dynamic minimax tree to store the code-tree, introducing "slack" in the weights and using background processing to keep the weights updated.

Gagie [10] showed that Faller's, Gallager's and Knuth's techniques for making Huffman trees dynamic can be used to make minimax trees dynamic. A dynamic minimax tree T supports the following operations:
• given a pointer to a node v, return v's parent, left child, and right child (if they exist);
• given a pointer to a leaf v, return v's weight;
• given a pointer to a leaf v, increment v's weight;
• given a pointer to a leaf v, decrement v's weight;
• and, given a pointer to a leaf v, insert a new leaf with the same weight as v.

In Gagie's implementation, if the depth of each node is bounded above by the negative of its weight, then each operation on a leaf with weight −di takes O(di) time. Next, we will show how to use this data structure for fast dynamic Shannon coding.

We maintain the invariant that, after we encode s1···si−1, T has one leaf labelled a for each distinct character a in ⊥s1···si−1 and this leaf has weight between −⌈log((i + n)/#a(⊥s1···si−1))⌉ and −⌈log(max(i, n)/#a(⊥s1···si−1))⌉. The codeword for a is therefore at most ⌈log((i + n)/#a(⊥s1···si−1))⌉ bits, rather than the ⌈log(i/#a(⊥s1···si−1))⌉ bits guaranteed by Shannon's algorithm; Lemma 3 below shows that the extra n only affects low-order terms in the bound on the length of the encoding.

After we encode si, we ensure that T contains one leaf labelled si and this leaf has weight −⌈log((i + 1 + n)/#si(⊥s1···si))⌉. First, if si is the first occurrence of that distinct character in S (i.e., i ∈ {1,...,m} − R), then we insert a new leaf labelled si into T with the same weight as the leaf labelled ⊥. Next, we update the weight of the leaf labelled si. We consider this processing to be in the foreground.

In the background, we use a queue to cycle through the distinct characters that have occurred in the current prefix. For each character that we encode in the foreground, we process one character in the background. When we dequeue a character a, if we have encoded precisely s1···si, then we update the weight of the leaf labelled a to be −⌈log((i + 1 + n)/#a(⊥s1···si))⌉, unless it has this weight already. Since there are always at most n + 1 distinct characters in the current prefix (⊥ and the n characters in the alphabet), this maintains the following invariant: For 1 ≤ i ≤ m and a ∈ ⊥s1···si−1, immediately after we encode s1···si−1, the leaf labelled a has weight between −⌈log((i + n)/#a(⊥s1···si−1))⌉ and −⌈log(max(i, n)/#a(⊥s1···si−1))⌉. Notice that max(i, n) < i + n ≤ 2 max(i, n) and #a(s1···si−1) ≤ #a(⊥s1···si) ≤ 2#a(s1···si−1) + 1. Also, if si is the first occurrence of that distinct character in S, then #si(⊥s1···si) = #⊥(⊥s1···si−1). It follows that, whenever we update a weight, we use at most one increment or decrement.
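The bookkeeping in the two preceding paragraphs can be summarized as follows (our sketch; it tracks only the target weights and the update schedule, eliding the dynamic minimax tree that actually stores the leaves):

    import math
    from collections import deque

    def target(i, count, n):
        # Desired weight of a leaf after encoding s1...si.
        return -math.ceil(math.log2((i + 1 + n) / count))

    n, s = 26, "abracadabra"
    counts, weight, queue = {'BOT': 1}, {'BOT': 0}, deque(['BOT'])
    for i, c in enumerate(s, start=1):
        # Foreground: insert the leaf on a first occurrence, then update it.
        if c not in counts:
            counts[c] = 0
            weight[c] = weight['BOT']  # same weight as the leaf labelled BOT
            queue.append(c)
        counts[c] += 1
        weight[c] = target(i, counts[c], n)
        # Background: refresh one queued character per encoded character.
        a = queue.popleft()
        # By the invariant above, this changes the weight by at most one unit.
        weight[a] = target(i, counts[a], n)
        queue.append(a)
    print(weight)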
xi  xi   i  s1 ··· si−1, T has one leaf labelled a for each Xi∈I Xi∈I Xi∈I distinct character a in ⊥s1 ··· si−1 and this leaf has Let i ,...,i be the elements of I, with 0

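Lemma 3 is easy to test numerically; in the snippet below (ours), the xi are arbitrary positive values, since they cancel out of the two sides:

    import math
    import random

    random.seed(1)
    n = 8
    I = random.sample(range(1, 1000), 50)  # distinct positive integers
    x = {i: random.uniform(1, i) for i in I}
    lhs = sum(math.log2((i + n) / x[i]) for i in I)
    rhs = sum(math.log2(i / x[i]) for i in I) + n * math.log2(max(I) + n)
    print(lhs <= rhs)  # always True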
Using Lemmas 1 and 3, it is easy to bound the number of bits and the time dynamic Shannon coding uses to encode S, as follows.

Theorem 4: Dynamic Shannon coding uses at most (H + 1)m + O(n log m) bits and O((H + 1)m + n log² m) time.

Proof: First, we consider the length of the encoding produced. Notice that the algorithm encodes S using at most

    Σ_{i∈R} ⌈log((i + n)/#si(⊥s1···si−1))⌉ + O(n log m)
      ≤ m + Σ_{i∈R} log((i + n)/#si(s1···si−1)) + O(n log m)

bits. By Lemmas 1 and 3, this is at most (H + 1)m + O(n log m).

Now, we consider how long this algorithm takes. We will prove separate bounds on the processing done in the foreground and in the background.

If si is the first occurrence of that character in S (i.e., i ∈ {1,...,m} − R), then we perform three operations in the foreground when we encode si: we output the codeword for ⊥, which is at most ⌈log(i + n)⌉ bits; we output the index of si in the alphabet, which is ⌈log n⌉ bits; and we insert a new leaf labelled si and update its weight to be −⌈log(i + 1 + n)⌉. In total, these take O(log(i + n)) ⊆ O(log m) time. Since there are at most n such characters, the algorithm encodes them all using O(n log m) time.

For i ∈ R, we perform at most two operations in the foreground when we encode si: we output the codeword for si, which is of length at most ⌈log((i + n)/#si(s1···si−1))⌉; and, if necessary, we increment the weight of the leaf labelled si. In total, these take O(log((i + n)/#si(s1···si−1))) time.

For 1 ≤ i ≤ m, we perform at most two operations in the background when we encode si: we dequeue a character a; if necessary, decrement the weight of the leaf labelled a; and re-enqueue a. These take O(1) time if we do not decrement the weight of the leaf labelled a and O(log m) time if we do.

Suppose si is the first occurrence of that distinct character in S. Then the leaf v labelled si is inserted into T with weight −⌈log(i + n)⌉. Also, v's weight is never less than −⌈log(m + 1 + n)⌉. Since decrementing v's weight from w to w − 1 or incrementing v's weight from w − 1 to w both take O(−w) time, we spend the same amount of time decrementing v's weight in the background as we do incrementing it in the foreground, except possibly for the time to decrease v's weight from −⌈log(i + n)⌉ to −⌈log(m + 1 + n)⌉. Thus, we spend O(log² m) more time decrementing v's weight than we do incrementing it. Since there are at most n distinct characters in S, in total, this algorithm takes

    O(Σ_{i∈R} log((i + n)/#si(s1···si−1))) + O(n log² m)

time. It follows from Lemmas 1 and 3 that this is O((H + 1)m + n log² m).
IV. VARIATIONS ON DYNAMIC SHANNON CODING

In this section, we show how to implement efficiently variations of dynamic Shannon coding for dynamic length-restricted coding, dynamic alphabetic coding and dynamic coding with unequal letter costs. Abrahams [11] surveys static algorithms for these and similar problems, but there has been relatively little work on dynamic algorithms for these problems.

We use dynamic minimax trees for length-restricted dynamic Shannon coding. For alphabetic dynamic Shannon coding, we dynamize Mehlhorn's version of Shannon's algorithm. For dynamic Shannon coding with unequal letter costs, we dynamize Krause's version.

A. Length-Restricted Dynamic Shannon Coding

For length-restricted coding, we are given a bound and cannot use a codeword whose length exceeds this bound. Length-restricted coding is useful, for example, for ensuring that each codeword fits in one machine word. Liddell and Moffat [12] gave a length-restricted dynamic coding algorithm that works well in practice, but it is quite complicated and they did not prove bounds on the length of the encoding it produces. We show how to length-restrict dynamic Shannon coding without significantly increasing the bound on the length of the encoding produced.

Theorem 5: For any fixed integer ℓ ≥ 1, dynamic Shannon coding can be adapted so that it uses at most 2⌈log n⌉ + ℓ bits to encode the first occurrence of each distinct character in S, at most ⌈log n⌉ + ℓ bits to encode each remaining character in S, at most (H + 1 + 1/((2^ℓ − 1) ln 2))m + O(n log m) bits in total, and O((H + 1)m + n log² m) time.

Proof: We modify the algorithm presented in Section III by removing the leaf labelled ⊥ after all of the characters in the alphabet have occurred in S, and changing how we calculate the weights for the dynamic minimax tree. Whenever we would use a weight of the form −⌈log x⌉, we smooth it by instead using

    −⌈log(2^ℓ / ((2^ℓ − 1)/x + 1/n))⌉ ≥ −min(⌈log(2^ℓ x/(2^ℓ − 1))⌉, ⌈log n⌉ + ℓ).

With these modifications, no leaf in the minimax tree is ever of depth greater than ⌈log n⌉ + ℓ. Since

    ⌈log(2^ℓ x/(2^ℓ − 1))⌉ < log x + 1 + log(1 + 1/(2^ℓ − 1)) < log x + 1 + 1/((2^ℓ − 1) ln 2),

essentially the same analysis as for Theorem 4 shows this algorithm uses at most (H + 1 + 1/((2^ℓ − 1) ln 2))m + O(n log m) bits in total, and O((H + 1)m + n log² m) time.

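The smoothed weight in this proof is simple to compute; the sketch below (ours) checks the two properties the proof uses, namely that the smoothed length never exceeds ⌈log n⌉ + ℓ and never exceeds ⌈log(2^ℓ x/(2^ℓ − 1))⌉:

    import math

    def smoothed_length(x, n, l):
        # The leaf's weight is the negative of this length.
        return math.ceil(math.log2(2**l / ((2**l - 1) / x + 1 / n)))

    n, l = 256, 3
    for x in [1.5, 10.0, 1e6]:
        L = smoothed_length(x, n, l)
        assert L <= math.ceil(math.log2(n)) + l                   # length-restricted
        assert L <= math.ceil(math.log2(2**l * x / (2**l - 1)))   # close to ceil(log x)
        print(x, L)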
It is straightforward to prove a similar theorem in which the […]

B. Alphabetic Dynamic Shannon Coding

For alphabetic coding, the lexicographic order of the codewords must always be the same as the lexicographic order of the characters to which they are assigned. Alphabetic coding is useful, for example, because we can compare encoded strings without decoding them. Although there is an alphabetic version of minimax trees [13], it cannot be efficiently dynamized [10]. Mehlhorn [14] generalized Shannon's algorithm to obtain an algorithm for alphabetic coding. In this section, we dynamize Mehlhorn's algorithm.

Theorem 6 (Mehlhorn, 1977): There exists an alphabetic prefix-free code such that, for each character a in the alphabet, the codeword for a is of length ⌈log((m + n)/(#a(S) + 1))⌉ + 1.

Proof: Let a1,...,an be the characters in the alphabet in lexicographic order. For 1 ≤ i ≤ n, let

    f(ai) = Σ_{j=1}^{i−1} (#aj(S) + 1)/(m + n) + (#ai(S) + 1)/(2(m + n)) < 1.

For 1 ≤ i ≠ i′ ≤ n, notice that |f(ai) − f(ai′)| ≥ (#ai(S) + 1)/(2(m + n)). Therefore, the first ⌈log((m + n)/(#ai(S) + 1))⌉ + 1 bits of the binary representation of f(ai) suffice to distinguish it. Let this sequence of bits be the codeword for ai.

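The code of Theorem 6 can be computed directly from the proof; in this sketch (ours), exact integer arithmetic replaces the binary expansion of f(ai), and the final assertion checks that the codewords are in the same lexicographic order as the characters:

    import math
    from itertools import combinations

    def mehlhorn_code(alphabet, counts, m):
        # f(a_i) = (2*cum + (c+1)) / (2(m+n)), with cum the sum of
        # (#a_j(S)+1) over the characters before a_i.
        n = len(alphabet)
        codewords, cum = {}, 0
        for a in alphabet:  # alphabet in lexicographic order
            c = counts.get(a, 0)
            num, den = 2 * cum + (c + 1), 2 * (m + n)
            length = math.ceil(math.log2((m + n) / (c + 1))) + 1
            bits = []
            for _ in range(length):  # binary expansion of num/den
                num *= 2
                bits.append('1' if num >= den else '0')
                if num >= den:
                    num -= den
            codewords[a] = ''.join(bits)
            cum += c + 1
        return codewords

    S, alphabet = "banana", "abn"
    code = mehlhorn_code(alphabet, {a: S.count(a) for a in alphabet}, len(S))
    assert all(code[a] < code[b] for a, b in combinations(alphabet, 2))
    print(code)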
Repeating Mehlhorn's algorithm after each character of S, as described in the introduction, is a simple algorithm for alphabetic dynamic Shannon coding. Notice that we always assign a codeword to every character in the alphabet; thus, we do not need to prepend ⊥ to the current prefix of S. This algorithm uses at most (H + 2)m + O(n log m) bits and O(mn) time to encode S.

To make this algorithm more efficient, after encoding each character of S, instead of computing an entire code-tree, we only compute the codeword for the next character in S. We use an augmented splay tree [15] to compute the necessary partial sums.

Theorem 7: Alphabetic dynamic Shannon coding uses (H + 2)m + O(n log m) bits and O((H + 1)m) time.

Proof: We keep an augmented splay tree T and maintain the invariant that, after encoding s1···si−1, there is a node va in T for each distinct character a in s1···si−1. The node va's key is a; it stores a's frequency in s1···si−1 and the sum of the frequencies of the characters in va's subtree in T.

To encode si, we use T to compute the partial sum

    #si(s1···si−1)/2 + Σ_{aj<si} #aj(s1···si−1),

where aj < si means that aj lexicographically precedes si. From this partial sum we obtain a value that plays the role of f(si) in the proof of Theorem 6 and, from it, the codeword for si. If si has not occurred before, we insert a new node vsi into T; otherwise, we increment the frequency stored at vsi. In both cases, we update the information stored at the ancestors of vsi and splay vsi to the root.

Essentially the same analysis as for Theorem 4 shows this algorithm uses at most (H + 2)m + O(n log m) bits. By the Static Optimality theorem [15], it uses O((H + 1)m) time.
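Theorem 7's partial sums can also be maintained with other balanced structures; the sketch below (ours) substitutes a Fenwick tree indexed by alphabet rank, which supports the same queries in O(log n) time per operation but lacks the static-optimality property that gives the O((H + 1)m) bound:

    class Fenwick:
        # Prefix sums over alphabet ranks (a stand-in for the augmented
        # splay tree used in the proof above).
        def __init__(self, n):
            self.t = [0] * (n + 1)
        def add(self, i, delta):   # frequency of rank i += delta (1-based)
            while i < len(self.t):
                self.t[i] += delta
                i += i & (-i)
        def prefix(self, i):       # total frequency of ranks 1..i
            s = 0
            while i > 0:
                s += self.t[i]
                i -= i & (-i)
            return s

    alphabet = "abcdefghijklmnopqrstuvwxyz"
    rank = {a: i + 1 for i, a in enumerate(alphabet)}
    freq, counts = Fenwick(len(alphabet)), {}
    for c in "abracadabra":
        # The partial sum from the proof of Theorem 7:
        partial = counts.get(c, 0) / 2 + freq.prefix(rank[c] - 1)
        # ... derive the codeword for c from 'partial', as in Theorem 6 ...
        counts[c] = counts.get(c, 0) + 1
        freq.add(rank[c], 1)
    print(counts)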
C. Dynamic Shannon Coding with Unequal Letter Costs

It may be that one code letter costs more than another. For example, sending a dash by telegraph takes longer than sending a dot. Shannon [1] proved a lower bound of Hm ln(2)/C for all algorithms, whether prefix-free or not, where the channel capacity C is the largest real root of e^(−cost(0)·x) + e^(−cost(1)·x) = 1 and e ≈ 2.71 is the base of the natural logarithm. Krause [16] generalized Shannon's algorithm for the case with unequal positive letter costs. In this section, we dynamize Krause's algorithm.

Theorem 8 (Krause, 1962): Suppose cost(0) and cost(1) are constants with 0 < cost(0) ≤ cost(1). Then there exists a prefix-free code such that, for each character a in the alphabet, the codeword for a has cost less than ln(m/#a(S))/C + cost(1).

Proof: Let a1,...,ak be the characters in S in non-increasing order by frequency. For 1 ≤ i ≤ k, let

    f(ai) = Σ_{j=1}^{i−1} #aj(S)/m < 1.

Let b(ai) be the following binary string, where x0 = 0 and y0 = 1: For j ≥ 1, if f(ai) is in the first e^(−cost(0)·C) fraction of the interval [xj−1, yj−1), then the jth bit of b(ai) is 0 and xj and yj are such that [xj, yj) is the first e^(−cost(0)·C) fraction of [xj−1, yj−1). Otherwise, the jth bit of b(ai) is 1 and xj and yj are such that [xj, yj) is the last e^(−cost(1)·C) fraction of [xj−1, yj−1). Notice that the cost to encode the jth bit of b(ai) is exactly ln((yj−1 − xj−1)/(yj − xj))/C; it follows that the total cost to encode the first j bits of b(ai) is ln(1/(yj − xj))/C.

For 1 ≤ i ≠ i′ ≤ k, notice that |f(ai) − f(ai′)| ≥ #ai(S)/m. Therefore, if yj − xj < #ai(S)/m, then the first j bits of b(ai) suffice to distinguish it. So the shortest prefix of b(ai) that suffices to distinguish b(ai) has cost less than ln(e^(cost(1)·C) m/#a(S))/C = ln(m/#a(S))/C + cost(1). Let this sequence of bits be the codeword for ai.

Repeating Krause's algorithm after each character of S, as described in the introduction, is a simple algorithm for dynamic Shannon coding with unequal letter costs. This algorithm produces an encoding of S with cost at most (H ln 2/C + cost(1))m + O(n log m) in O(mn) time.

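The interval construction in this proof is sketched below (ours); capacity finds C by bisection, and the loop emits bits of b(a) until the interval is narrower than #a(S)/m:

    import math

    def capacity(c0, c1):
        # Largest real root of exp(-c0*x) + exp(-c1*x) = 1; the left side
        # is strictly decreasing in x, so bisection suffices.
        lo, hi = 1e-9, 10.0 / min(c0, c1)
        for _ in range(100):
            mid = (lo + hi) / 2
            if math.exp(-c0 * mid) + math.exp(-c1 * mid) > 1:
                lo = mid
            else:
                hi = mid
        return (lo + hi) / 2

    def krause_codeword(f, gap, c0, c1):
        # Split [x, y) into a first e^{-c0 C} and last e^{-c1 C} fraction
        # (these tile the interval, since the fractions sum to 1), keep the
        # part containing f, and stop once y - x < gap.
        C = capacity(c0, c1)
        p0 = math.exp(-c0 * C)
        x, y, bits = 0.0, 1.0, []
        while y - x >= gap:
            split = x + p0 * (y - x)
            if f < split:
                bits.append('0'); y = split
            else:
                bits.append('1'); x = split
        return ''.join(bits)

    # Characters of S in non-increasing order by frequency, f as in the proof.
    counts, m, cum = {'a': 5, 'b': 4, 'c': 2}, 11, 0
    for a, c in counts.items():
        print(a, krause_codeword(cum / m, c / m, 1.0, 2.0))
        cum += c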
As in Subsection IV-B, we can make this simple algorithm more efficient. However, instead of lexicographic order, we want to keep the characters in non-increasing order by frequency in the current prefix. We use a data structure for dynamic cumulative probability tables [17], due to Moffat. This data structure stores a list of characters in non-increasing order by frequency and supports the following operations:
• given a character a, return a's frequency;
• given a character a, return the total frequency of all characters before a in the list;
• given a character a, increment a's frequency; and,
• given an integer k, return the last character a in the list such that the total frequency of all characters before a is at most k.

If a's frequency is a p fraction of the total frequency of all characters in the list, then an operation that is given a or returns a takes O(log(1/p)) time.

Dynamizing Krause's algorithm using Moffat's data structure gives the following theorem, much as dynamizing Mehlhorn's algorithm with an augmented splay tree gave Theorem 7. We omit the proof because it is very similar.

Theorem 9: Suppose cost(0) and cost(1) are constants with 0 < cost(0) ≤ cost(1). Then dynamic Shannon coding produces an encoding of S with cost at most (H ln 2/C + cost(1))m + O(n log m) in O((H + 1)m) time.

If cost(0) = cost(1) = 1, then C = ln 2 and Theorem 9 is the same as Theorem 4. We considered this special case first because it is the only one in which we know how to efficiently maintain the code-tree, which may be useful for some applications.
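As a stand-in for Moffat's structure, the naive list below (ours) implements the four operations in O(n) time each; the structure of [17] supports them in the O(log(1/p)) time needed for Theorem 9:

    class FrequencyList:
        # Characters kept in non-increasing order by frequency (a naive
        # substitute for the dynamic cumulative probability table of [17]).
        def __init__(self):
            self.items = []  # list of [character, frequency]
        def frequency(self, a):
            return next(f for c, f in self.items if c == a)
        def total_before(self, a):
            s = 0
            for c, f in self.items:
                if c == a:
                    return s
                s += f
        def increment(self, a):
            for pair in self.items:
                if pair[0] == a:
                    pair[1] += 1
                    break
            else:
                self.items.append([a, 1])
            self.items.sort(key=lambda p: -p[1])  # restore frequency order
        def last_within(self, k):
            s, last = 0, None
            for c, f in self.items:
                if s <= k:
                    last = c
                s += f
            return last

    fl = FrequencyList()
    for c in "abracadabra":
        fl.increment(c)
    print(fl.frequency('a'), fl.total_before('b'), fl.last_within(7))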
REFERENCES

[1] C. E. Shannon, "A mathematical theory of communication," Bell System Technical Journal, vol. 27, pp. 379–423, 623–656, 1948.
[2] D. A. Huffman, "A method for the construction of minimum redundancy codes," Proceedings of the IRE, vol. 40, pp. 1098–1101, 1952.
[3] R. De Prisco and A. De Santis, "On the redundancy achieved by Huffman codes," Information Sciences, vol. 88, pp. 131–148, 1996.
[4] M. Golumbic, "Combinatorial merging," IEEE Transactions on Computers, vol. 24, pp. 1164–1167, 1976.
[5] N. Faller, "An adaptive system for data compression," in Proceedings of the 7th Asilomar Conference on Circuits, Systems, and Computers, 1973, pp. 593–597.
[6] R. G. Gallager, "Variations on a theme by Huffman," IEEE Transactions on Information Theory, vol. 24, pp. 668–674, 1978.
[7] D. E. Knuth, "Dynamic Huffman coding," Journal of Algorithms, vol. 6, pp. 163–180, 1985.
[8] R. L. Milidiú, E. S. Laber, and A. A. Pessoa, "Bounding the compression loss of the FGK algorithm," Journal of Algorithms, vol. 32, pp. 195–211, 1999.
[9] J. S. Vitter, "Design and analysis of dynamic Huffman codes," Journal of the ACM, vol. 34, pp. 825–845, 1987.
[10] T. Gagie, "Dynamic length-restricted coding," Master's thesis, University of Toronto, 2003.
[11] J. Abrahams, "Code and parse trees for lossless source encoding," Communications in Information and Systems, vol. 1, pp. 113–146, 2001.
[12] M. Liddell and A. Moffat, "Length-restricted coding in static and dynamic frameworks," in Proceedings of the IEEE Data Compression Conference, 2001, pp. 133–142.
[13] D. G. Kirkpatrick and M. M. Klawe, "Alphabetic minimax trees," SIAM Journal on Computing, vol. 14, pp. 514–526, 1985.
[14] K. Mehlhorn, "A best possible bound for the weighted path length of binary search trees," SIAM Journal on Computing, vol. 6, pp. 235–239, 1977.
[15] D. D. Sleator and R. E. Tarjan, "Self-adjusting binary search trees," Journal of the ACM, vol. 32, pp. 652–686, 1985.
[16] R. M. Krause, "Channels which transmit letters of unequal duration," Information and Control, vol. 5, pp. 13–24, 1962.
[17] A. Moffat, "An improved data structure for cumulative probability tables," Software—Practice and Experience, vol. 29, pp. 647–659, 1999.

ACKNOWLEDGMENTS

Many thanks to Julia Abrahams, Will Evans, Faith Fich, Mordecai Golin, Charlie Rackoff and Ken Sevcik. This research was supported by the Natural Sciences and Engineering Research Council of Canada.