Parallel Algorithms for Entropy-Coding Techniques ∗
ABDOU YOUSSEF
Department of EE & CS, The George Washington University
Washington, DC 20052, USA
email: [email protected]

Abstract: With the explosion of imaging applications, and due to the massive amounts of imagery data, data compression is essential. Lossless compression, also called entropy coding, is of special importance because not only does it serve as a stand-alone system for certain applications such as medical imaging, it is also an inherent part of lossy compression. Therefore, fast entropy coding/decoding algorithms are desirable. In this paper we develop parallel algorithms for several widely used entropy coding techniques, namely, arithmetic coding, run-length encoding (RLE), and Huffman coding. Our parallel arithmetic coding algorithm takes O(log^2 N) time on an N-processor hypercube, where N is the input size. For RLE, our parallel coding and decoding algorithms take O(log N) time on N processors. Finally, in the case of Huffman coding, the parallel coding algorithm takes O(log^2 N + n log n) time, where n is the alphabet size, n << N. As for decoding, however, both arithmetic and Huffman decoding are hard to parallelize. Nevertheless, special provisions can be made in many applications to make arithmetic decoding and Huffman decoding fairly parallel.

1 Introduction

With the explosion of imaging applications and due to the massive amounts of imagery data, data compression is essential to reduce the storage and transmission requirements of images and videos [5, 11, 14]. Compression can be lossless or lossy. Lossless compression, also called entropy coding, allows for perfect reconstruction of the data, whereas lossy compression does not.
Even in lossy compression, which is by far more prevalent in image and video compression, entropy coding is needed as a last stage, after the data has been transformed and quantized [14]. Therefore, fast entropy coding algorithms are of prime importance, especially in online or real-time applications such as video teleconferencing.

Parallel algorithms are an obvious choice for fast processing. Therefore, in this paper we develop parallel algorithms for several widely used entropy coding techniques, namely, arithmetic coding [13], run-length encoding (RLE) [16], and Huffman coding [4]. Our parallel arithmetic coding algorithm takes O(log^2 N) time on an N-processor hypercube, where N is the input size. Unfortunately, arithmetic decoding seems to be hard to parallelize because it is a sequential process of essentially logical computations. In practice, however, files are broken down into many substrings before being arithmetic-coded, for precision reasons that will become clear later on. Accordingly, the coded streams of those substrings can be decoded in parallel.

For RLE, we design parallel algorithms for both encoding and decoding, each taking O(log N) time. Finally, in the case of Huffman coding, the coding algorithm is easily data-parallel. The statistics gathering for computing symbol probabilities before constructing the Huffman tree is parallelized to take O(log^2 N) time. Like arithmetic coding, Huffman decoding is highly sequential. However, in certain applications where the data is inherently broken into many blocks that are processed independently, as in JPEG/MPEG [5, 11], simple provisions can be made to make the bitstreams easily separable into many independent substreams that can be decoded independently in parallel.

It must be noted that other lossless compression techniques are also in use, such as Lempel-Ziv, bit-plane coding, and differential pulse-code modulation (DPCM) [14]. The first two will be considered in future work. The last technique, DPCM, is the subject matter of another paper appearing in this conference [?].

The paper is organized as follows. The next section gives a brief description of the various standard parallel operations that will be used in our algorithms. Section 3 develops a parallel algorithm for arithmetic coding. Section 4 develops parallel encoding and decoding algorithms for RLE. Section 5 addresses the parallelization of Huffman coding and decoding. Conclusions and future directions are given in Section 6.

* This work was performed in part at the National Institute of Standards and Technology.

2 Preliminaries

The parallel algorithms designed in this paper use several standard parallel operations. A list of those operations, along with a brief description of each, follows.

• Parsort(Y[0:N-1], Z[0:N-1], π[0:N-1]): This operation sorts in parallel the input array Y into the output array Z, and records the permutation π that orders Y into Z: Z[k] = Y[π[k]]. Of the many parallel sorting algorithms, we use the fastest practical one, namely, Batcher's bitonic sorting [2], which takes O(log^2 N) parallel time on an N-processor hypercube.

• C = Parmult(A[0:N-1]): This operation multiplies the N elements of the array A, yielding the product C. In this paper, the elements of A are 2×2 matrices. This operation clearly takes O(log N) time on O(N) processors connected as a hypercube.

• A[0:N-1] = Parprefix(a[0:N-1]): This is the well-known parallel prefix operation [7]. It computes from the input array a the array A, where A[i] = a[0] + a[1] + ... + a[i], for all i = 0, 1, ..., N-1. Parallel prefix takes O(log N) time on N processors connected in a variety of ways, including the hypercube.

• A[0:N-1] = Barrier-Parprefix(a[0:N-1]): This operation assumes that the input array a is divided into groups of consecutive elements; every group has a left-barrier at its start and a right-barrier at its end. Barrier-Parprefix performs a parallel prefix within each group, independently of the other groups. Barrier-Parprefix takes O(log N) time on an N-processor hypercube. To see this, let f[0:N-1] be a flag array where f[k] = 0 if k is a right-barrier, and f[k] = 1 otherwise. Clearly, A[i] = f[i-1]A[i-1] + a[i] for all i = 0, 1, ..., N-1. The latter is a linear recurrence, which can be solved in O(log N) time on an N-processor hypercube [6]; see the sketch after this list.
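To make the two prefix operations concrete, here is a minimal sequential simulation in Python. The function names parprefix and barrier_parprefix mirror the operations above, and the flag convention is the one just described; everything else (the language, the list-based arrays) is illustrative only, not from the paper. Each pass of the doubling loop corresponds to one parallel step in which all positions are updated independently, which is where the O(log N) bounds come from.

    def parprefix(a):
        # Inclusive prefix sums by recursive doubling: O(log N) passes, and in
        # each pass every position i is updated independently (one per processor).
        A = list(a)
        d = 1
        while d < len(A):
            A = [A[i] + A[i - d] if i >= d else A[i] for i in range(len(A))]
            d *= 2
        return A

    def barrier_parprefix(a, f):
        # Segmented prefix sums: f[k] = 0 marks a right-barrier at position k, so
        # the running sum restarts at k + 1, i.e., A[i] = f[i-1]*A[i-1] + a[i].
        # Scanning (carry-flag, value) pairs with the associative operator below
        # solves this linear recurrence with the same doubling scheme.
        def op(x, y):
            fx, ax = x
            fy, ay = y
            return (fx * fy, fy * ax + ay)
        g = [0] + list(f[:-1])        # g[i] = f[i-1]: carry the sum into position i?
        P = list(zip(g, a))
        d = 1
        while d < len(P):
            P = [op(P[i - d], P[i]) if i >= d else P[i] for i in range(len(P))]
            d *= 2
        return [s for (_, s) in P]

    print(parprefix([1, 2, 3, 4]))                        # [1, 3, 6, 10]
    print(barrier_parprefix([1, 2, 3, 4], [1, 0, 1, 1]))  # [1, 3, 3, 7]

The (flag, value) pairing in barrier_parprefix applies to any linear recurrence of the form A[i] = c[i]A[i-1] + a[i], because the pair operator is associative; this is the recurrence-solving technique cited above [6].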
3 Parallel Arithmetic Coding

Arithmetic coding [13] relies heavily on the probability distributions of the input files to be coded. Essentially, arithmetic coding maps each input file to a subinterval [L, R] of the unit interval [0, 1] such that the probability of the input file is R - L. Afterwards, it represents the fraction value L in n-ary notation using r = ⌈-log_n(R - L)⌉ n-ary digits, where n is the size of the alphabet. The stream of those r digits is taken to be the code of the input file.

The mapping of an input file into a subinterval [L, R] is done progressively by reading the file and updating the [L, R] subinterval to always be the subinterval corresponding to the input substring scanned so far. The update rule works as follows. Assume that the input file is the string x[0:N-1], where every symbol is in the alphabet {a_0, a_1, ..., a_{n-1}}, and that the substring x[0:k-1] has been processed, i.e., mapped to the interval [L, R]. Let P_ki be the probability that the next symbol is a_i, given that the previous symbols are x[0:k-1]. If the next symbol x[k] is a_i, the interval is refined to the subinterval of [L, R] allotted to a_i, namely [L + (R - L)(P_k0 + ... + P_k,i-1), L + (R - L)(P_k0 + ... + P_ki)].

For convenience, a one-to-one mapping between the alphabet and {0, 1, ..., n-1} is applied at the outset, before arithmetic coding starts, and the inverse mapping is applied after arithmetic decoding is completed. Henceforth, we will assume the alphabet to be {0, 1, ..., n-1}.

The conditional probabilities {P_ki} are either computed statistically from the input file or derived from an assumed theoretical probabilistic model of the input files. Naturally, the statistical method is the one used most often, and it will be assumed here. The structure of the probabilistic model is, however, still useful in knowing what statistical data should be gathered. The model often used is the Markov model of a certain order m, where m tends to be fairly small, on the order of 1-5. That is, the probability that the next symbol has some value a depends only on the values of the previous m symbols. Therefore, to determine statistically the probability that the next symbol is a, given that the previous m symbols are some b_1 b_2 ... b_m, it suffices to compute the frequency of occurrences of the substring b_1 b_2 ... b_m a in the input string, and normalize that frequency by N, the total number of substrings of length m + 1 symbols in the zero-padded input string. The padding of m 0's to the left of x is done to simplify the statistics gathering at the left boundary of x: assume that the imaginary symbols x[-m:-1] are all 0.

To summarize, the sequential algorithm for computing the statistical probabilities and performing arithmetic coding is presented next.

Algorithm Arithmetic-coding(in: x[0:N-1]; out: B)
begin
/* The alphabet is assumed to be {0, 1, ..., n-1} */
Phase I: Statistics Gathering
for k = 0 to N-1 do
    /* compute the P_ki's, which are initialized to 0 */
    compute the frequency f_k of the substring x[k-m:k] in the whole string x;
    set Q_k = f_k/N;
    let i = x[k], and set P_ki = Q_k;
endfor
for k = 0 to N-1 do
    let i = x[k]; set P_k = P_k0 + P_k1 + ... + P_k,i-1;
endfor
Phase II: finding the interval [L, R] corresponding to the string x
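Since Phase II is the interval refinement described above, a compact end-to-end sketch may help. The Python below is illustrative only, not the paper's implementation: the function name arithmetic_code and its parameters n (alphabet size) and m (Markov order) are assumptions, exact fractions stand in for the fixed-precision arithmetic of a real coder, and the gathered frequencies are normalized within each m-symbol context so that the update uses conditional probabilities (the conditional form of the Q_k statistics above).

    from fractions import Fraction
    from collections import Counter

    def arithmetic_code(x, n, m):
        # Phase I: order-m statistics. Pad m zeros on the left so that the
        # imaginary symbols x[-m:-1] are all 0, as described above.
        N = len(x)
        padded = [0] * m + list(x)
        # counts[(b_1, ..., b_m, a)] = frequency of that (m+1)-symbol substring.
        counts = Counter(tuple(padded[k:k + m + 1]) for k in range(N))
        # Phase II: refine [L, R] symbol by symbol, starting from the unit interval.
        L, R = Fraction(0), Fraction(1)
        for k in range(N):
            ctx = tuple(padded[k:k + m])                # previous m symbols
            w = [counts[ctx + (i,)] for i in range(n)]  # frequency of each next symbol
            total = sum(w)                              # occurrences of this context
            P_k = Fraction(sum(w[:x[k]]), total)        # P_k0 + ... + P_k,i-1
            P_ki = Fraction(w[x[k]], total)             # conditional probability of i = x[k]
            L, R = L + (R - L) * P_k, L + (R - L) * (P_k + P_ki)
        return L, R

    # The code of x is then the first r = ceil(-log_n(R - L)) n-ary digits of L.
    L, R = arithmetic_code([0, 1, 1, 0, 2, 1], n=3, m=1)
    print(L, R - L)

Exact fractions keep [L, R] exact but their representations grow with N; practical coders use fixed precision instead, which is the reason, noted in the introduction, that files are broken into substrings before being arithmetic-coded.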