Adaptive Golomb Code for Joint Geometrically Distributed Data and Its Application in Image Coding

Jian-Jiun Ding*, Soo-Chang Pei**, Wei-Yi Wei†, Hsin-Hui Chen††, and Tzu-Heng Lee††† Department of Electrical Engineering, National Taiwan University, Taiwan, R.O.C. Email: [email protected]*, [email protected]**, [email protected]†, [email protected]††, [email protected]†††

Abstract—The Golomb code is a special case of the Huffman code that is optimal for data with a geometric distribution. In this paper, we generalize the Golomb code by using the joint probability. In image coding, there are many cases in which the probability of a data not only has a geometric distribution but also depends on another data. Based on this idea, we improve the Golomb code by considering the joint probability and make it more flexible and efficient. The proposed method based on the joint probability is called the adaptive Golomb code. The simulations show that, if we use the adaptive Golomb code instead of the Huffman code or the original Golomb code in JPEG, the compression rate can be improved.

I. INTRODUCTION

As we know, the Huffman coding algorithm [1] can achieve the minimal codeword length when the probability distribution of a data is known to the encoder. However, the Huffman code has two problems. First, it needs a codeword table for encoding and decoding. Moreover, it cannot be applied to data with infinitely many possible values; for this type of data, it takes infinite memory capacity to form the coding trees.

A well-known source model with infinite symbols is the geometric distribution source, whose distribution has the form

Prob(y = a) = (1 − p) p^a,    (1)

where p ∈ [0, 1) is the ratio and a ∈ {0, 1, …, ∞}. In 1966, Golomb [2] proposed an efficient entropy coding algorithm, known as the Golomb coding algorithm, to encode the geometric distribution source. Gallager and Van Voorhis [3] generalized Golomb's initial algorithm and proved that the Golomb code can achieve the optimal coding efficiency when the information source is geometrically distributed.

Moreover, the Golomb coding algorithm can convert a symbol into its codeword directly from a tunable parameter p. This makes the Golomb code quite efficient because no codeword table is required for encoding and decoding.

Due to these reasons, the Golomb code is useful and efficient for data compression. In this paper, we find that using the concept of joint probability can further improve the performance of the Golomb code.

In practice, many data not only have geometric distributions but are also highly dependent on another data. For example, in video processing, the difference of the velocities of an object at t = t1 and t = t2 is near to having a geometrically distributed probability, and its distribution can be expressed in the form in (1). However, the velocity difference is also dependent on t2 − t1. That is, the value of p in (1) is not a constant. When t2 − t1 is very small, the value of p in (1) is near to 0. When t2 − t1 is large, the value of p is also large. Thus, we can improve the performance of the Golomb code using the joint probability. We can use the information of another data to adjust the ratio p in (1). The details of the algorithm are shown in Section III. From our simulations for image compression, using the Golomb code together with the joint probability can achieve a higher compression rate than using the Huffman code or the original Golomb code (see Section IV).

II. REVIEW OF THE ORIGINAL GOLOMB CODE

Suppose that a data y has the geometric distribution in (1). In 1975, Gallager and Van Voorhis [3] found that, if

p^m + p^(m+1) ≤ 1 < p^m + p^(m−1),

i.e.,

m = ⌈ −log(p + 1) / log(p) ⌉,    (2)

where ⌈ ⌉ means the round-up operation, then, apart from the first several layers, each level of the coding tree should have m leaf nodes. The process of the Golomb coding algorithm is as follows:
(i) First, determine m from p by (2).
(ii) Then, we regard a in (1) as the dividend and m as the divisor. q and r are the quotient and the remainder of a/m, respectively.
(iii) Convert q into the prefix. The prefix is composed of q "1" bits followed by a "0" bit.
(iv) Convert r into the suffix using the binary code. The number of bits of the suffix can be ⌊log2 m⌋ or ⌈log2 m⌉. In order to determine the length of the suffix, we need a threshold parameter τ(m), defined as τ(m) = 2^⌈log2 m⌉ − m. If r < τ(m), the length of the suffix is ⌊log2 m⌋ bits. Otherwise, we update r into r + τ(m) and encode it into a ⌈log2 m⌉-bit suffix.

For example, assume that a = 9 and the parameter m determined from (2) is 5. Then the quotient q of a/m is 1 and the remainder r is 4. Since q = 1, the prefix is '10'. On the other hand, the remainder r = 4 is larger than τ(m) = 2^3 − 5 = 3. Therefore, we update r into r + τ(m), which is equal to 7, and the length of the suffix is ⌈log2 m⌉ = 3. Therefore, the suffix is '111' and the entire code is '10111'.
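To make the review concrete, the following is a minimal Python sketch of steps (i)-(iv); it is only an illustration of the procedure, and the function names golomb_parameter and golomb_encode are our own.

```python
import math

def golomb_parameter(p):
    """Golomb parameter m determined from the ratio p (0 < p < 1) by Eq. (2)."""
    return math.ceil(-math.log(p + 1) / math.log(p))

def golomb_encode(a, m):
    """Encode a nonnegative integer a with parameter m, following steps (i)-(iv)."""
    q, r = divmod(a, m)                    # step (ii): quotient and remainder of a/m
    prefix = '1' * q + '0'                 # step (iii): q '1' bits followed by a '0' bit
    k = math.ceil(math.log2(m))            # ceil(log2 m), the long suffix length
    tau = 2 ** k - m                       # threshold tau(m) = 2^ceil(log2 m) - m
    if r < tau:                            # step (iv): short suffix of floor(log2 m) bits
        suffix = format(r, 'b').zfill(k - 1)
    else:                                  # otherwise a long suffix of ceil(log2 m) bits
        suffix = format(r + tau, 'b').zfill(k) if k > 0 else ''
    return prefix + suffix

# The worked example from the text: a = 9 with m = 5 yields '10111'.
assert golomb_encode(9, 5) == '10111'
```

Decoding needs only the same parameter m: read the unary prefix, then ⌊log2 m⌋ suffix bits, and one further bit whenever the value read so far is not below τ(m).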



Thus, using the Golomb coding algorithm, it is easy to convert a data into a codeword and no codeword table is required. Instead, we only have to record the tunable parameter m in (2). If m is known, we can reconstruct the original data.

TABLE I
COMPARISON AMONG THE HUFFMAN CODE, THE ORIGINAL GOLOMB CODE, AND THE PROPOSED ADAPTIVE GOLOMB CODE.

                     Without codeword table    Flexibility and adaptation
  Huffman                    NO                          GOOD
  Golomb                     YES                         MIDDLE
  Adaptive Golomb            YES                         GOOD

III. PROPOSED ADAPTIVE GOLOMB CODE BY JOINT PROBABILITY

In theory, using the Huffman code can achieve the optimal coding efficiency. However, we need extra memory to record the codeword table, which is not good for compression. By contrast, as described in Section II, when using the Golomb code, the codeword table is not required. Although the Golomb coding algorithm approximates the distribution of a data by a geometric series and sacrifices some optimality, in practice, since the codeword table is saved, it can achieve a higher compression ratio (i.e., a lower data rate) than the Huffman code (see our simulations in Section IV). However, we believe that the flexibility and the performance of the Golomb code can be further improved.

In this paper, we modify the Golomb coding algorithm by using the joint probability. As in the original Golomb coding algorithm, we assume that the data y is near to having a geometric distribution as in (1). However, the value of p is not a constant and may vary with another data x, i.e., (1) is modified as

Prob(y = a) = (1 − p(x)) p(x)^a.    (3)

In nature, it is very common that a data y not only is near to having a geometric distribution but also depends on another data x. In addition to the example of time difference vs. velocity difference described in Section I, there are many other examples:

• x is the number of days and y is the variation of the price of commodities after x days.
• x is the height of a person and y is the difference between the standard weight and the weight of the person.
• x is the area of a pattern and y is the difference between the circumference of the pattern and 2√(πx).
• x is the distance between two pixels in an image and y is the difference of the intensities of the two pixels.

In these cases, we can use the information of x to further improve the coding efficiency of y. We will modify the Golomb coding algorithm based on this idea.

Suppose that we have obtained a set of data pairs (x[0], y[0]), (x[1], y[1]), (x[2], y[2]), (x[3], y[3]), …. We also suppose that the expected value of |y| is proportional to a function of x:

E(|y[τ]|) ∝ f(x[τ]),    (4)

then we can use the following procedure to encode y. This is the process of our proposed modified Golomb coding algorithm.

Step 1: Choose η as a large value. For example, we can choose
η = max_τ { |f(x[τ])| }.    (5)

Step 2: Scale the value of y[τ] as
ŷ[τ] = y[τ] · η / |f(x[τ])|.    (6)

Step 3: Find the best value of p̂ such that the probability of ŷ[τ] can be approximated by the geometric distribution
P(ŷ[τ] = k) ≈ (1 − p̂) p̂^k.    (7)
If (7) is satisfied, then
E(ŷ[τ]) = Σ_{k=0}^{∞} k (1 − p̂) p̂^k = p̂ / (1 − p̂),    (8)
thus we can estimate the value of p̂ from
p̂ = E(ŷ[τ]) / (E(ŷ[τ]) + 1).    (9)

Step 4: The probability of |y[τ]| = k can be approximated by
P(|y[τ]| = k) ≈ (1 − p(x[τ])) p(x[τ])^k,    (10)
where the adjustable ratio p(x[τ]) is estimated from
p(x[τ]) = 1 / [ (η / |f(x[τ])|)(1/p̂ − 1) + 1 ].    (11)
The proof of (11) is as follows. Analogous to (9),
p(x[τ]) = E(|y[τ]|) / (E(|y[τ]|) + 1).    (12)
Then, from (6) and (8),
E(|y[τ]|) = (|f(x[τ])| / η) E(ŷ[τ]) = (|f(x[τ])| / η) · p̂ / (1 − p̂).    (13)
After substituting (13) into (12), we obtain (11).

Step 5: Then, from (2), we can determine the tunable parameter m[τ] for each data y[τ] by
m[τ] = ⌈ −log(p(x[τ]) + 1) / log(p(x[τ])) ⌉.    (14)

Step 6: After m[τ] is determined, we can encode y[τ] by the Golomb coding process in Section II (but with m replaced by m[τ]).

Furthermore, since y[τ] can sometimes be positive or negative, we can consider the sign of y[τ] in two ways. First, as in the work in [6], we can encode |y[τ]| and assign an extra sign bit for y[τ]. Alternatively, as in the works in [5][6], we can map y[τ] to the function y1[τ], defined as y1[τ] = 2|y[τ]| − 1 if y[τ] < 0 and y1[τ] = 2|y[τ]| if y[τ] ≥ 0. Then we can use y1[τ] instead of y[τ] as the input of the encoding process.

It is worth noting that, in our algorithm, the parameter p(x[τ]) in (10) and (11) is not a constant but varies with x[τ]. Thus, m[τ] is also not a constant. This means that, in Step 6, we encode y[τ] adaptively according to the value of x[τ].
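As a sketch of how Steps 1-6 fit together (under the assumption that the whole sequence is available before encoding, which is how Eq. (9) is stated), the following Python function estimates p̂ from the scaled data, derives p(x[τ]) and m[τ] per sample, applies the sign mapping described above, and reuses golomb_encode from the sketch in Section II. The function name and the small numerical guard are our own.

```python
import math

def adaptive_golomb_encode(xs, ys, f):
    """Encode ys given side data xs, following Steps 1-6 of Section III.

    xs, ys : sequences of equal length, the data pairs (x[t], y[t])
    f      : function with E(|y[t]|) roughly proportional to f(x[t]) (Eq. (4));
             f(x[t]) is assumed nonzero.
    Returns the concatenated bit string and the estimated ratio p_hat;
    p_hat, f, and xs are all the decoder needs to rebuild every m[t].
    """
    # Step 1: eta as a large value, e.g. max |f(x[t])|            (Eq. (5))
    eta = max(abs(f(x)) for x in xs)

    # Step 2: scale each |y[t]| so the scaled data share one ratio (Eq. (6))
    y_scaled = [abs(y) * eta / abs(f(x)) for x, y in zip(xs, ys)]

    # Step 3: estimate p_hat from the mean of the scaled data      (Eqs. (8)-(9))
    mean_scaled = sum(y_scaled) / len(y_scaled)
    p_hat = mean_scaled / (mean_scaled + 1.0)
    p_hat = min(max(p_hat, 1e-9), 1.0 - 1e-9)   # guard against degenerate data (our addition)

    bits = ''
    for x, y in zip(xs, ys):
        # Step 4: per-sample ratio p(x[t])                          (Eq. (11))
        p_x = 1.0 / ((eta / abs(f(x))) * (1.0 / p_hat - 1.0) + 1.0)
        # Step 5: per-sample Golomb parameter m[t]                  (Eq. (14))
        m = math.ceil(-math.log(p_x + 1.0) / math.log(p_x))
        # Sign mapping: y1 = 2|y| - 1 if y < 0, else 2|y|
        y1 = 2 * abs(y) - 1 if y < 0 else 2 * abs(y)
        # Step 6: ordinary Golomb coding with the adaptive m[t]
        bits += golomb_encode(y1, m)
    return bits, p_hat
```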

Moreover, although p(x[τ]) is not a constant, it can be computed from p̂ by (11). Therefore, we only have to record the value of p̂. If p̂ is known, we can reconstruct p(x[τ]) and hence the original data from the codes. That is, as with the original Golomb code, no codeword table is required.

Although p̂ is a decimal and has many possible values, we can constrain p̂ to be chosen from a set of candidates, which is analogous to Rice's work [5]. For example, we can assign the candidates of p̂ as

p̂ = 2^(−c/η),    (15)

where η is defined in (5) and c = 1, 2, 3, …, 32, 34, 36, …, 64, 72, 80, …, 128, 160, 192, 224, 256, 512, 1024, 4096, and 2^16. We can first use (9) in Step 3 to estimate the value of p̂ and then find which candidate in (15) approximates p̂ the best. Note that, since c and hence p̂ has 64 possible values, we need only 6 bits to encode the value of p̂.

IV. SIMULATIONS FOR JPEG IMAGE CODING WITH THE PROPOSED ADAPTIVE GOLOMB CODE

In this section, we use the proposed adaptive Golomb code, which uses the joint probability, for JPEG image compression. We also compare the results with those of the Huffman code and the original Golomb code.

In JPEG, the encoder divides the input image into 8×8 blocks and performs the DCT and the HVS-based quantization on each block. The coefficients become sparse after processing. We then perform differential coding on the DC terms and perform zigzag scan and zero-run-length coding on the remaining AC coefficients. In the JPEG standard, the following three data are encoded by Huffman codes:
(a) The differences of DC values, which have been divided by a constant and rounded, between adjacent blocks.
(b) The nonzero AC terms, which have been divided by the quantization table and rounded, in each block.
(c) The zero-run-lengths in each block.
However, all three data are near to having geometric distributions and their probabilities can be approximated by the form in (1). Therefore, one can use the Golomb code instead of the Huffman code to encode these three data in JPEG.

In Fig. 1, we show simulations that use the Huffman code (blue lines with *) and the Golomb code (green lines with O) to encode the above three data in JPEG for four different images (512×512 Monkey, Plane, House, and City images). We use the data rate (measured in bpp, bits per pixel) to measure the efficiency of compression; lower bpp means a higher compression rate. From the results in Fig. 1, using the Golomb code instead of the Huffman code can indeed achieve a higher compression rate in most cases.

However, we observe that the three data (DC differences, nonzero AC coefficients, and zero-run-lengths) are not only near to having geometric distributions but also highly dependent on another data. Therefore, we can use the joint probability together with our proposed algorithm to improve the coding efficiency.

(a) First, we call the 8×8 block that corresponds to the (8m−7)th to 8mth rows and the (8n−7)th to 8nth columns of an image the (m, n)th block, and use D[m, n] to denote its DC value, which has been divided by a constant and rounded. Then, we find that D[m, n] − D[m, n−1] has a high correlation with D[m−1, n] − D[m−1, n−1]. Therefore, following Section III, we can set

x[m, n] = D[m−1, n] − D[m−1, n−1],
y[m, n] = D[m, n] − D[m, n−1] − round(0.7 x[m, n]),
f(x[m, n]) = 4 + x[m, n].    (16)

Then we find that y[m, n] is near to having zero mean and a geometrically distributed probability, and the mean of |y[m, n]| is proportional to f(x[m, n]). Thus, we can follow the proposed modified Golomb coding algorithm in Section III to encode y[m, n] and hence the DC differences between adjacent blocks (a code sketch of this setting is given after item (c) below).

(b) Moreover, if we use A[p, q] (p, q = 0, 1, …, 7) to denote the nonzero AC coefficients, which have been divided by the quantization table and rounded, in the (m, n)th block, and use B[p, q] and C[p, q] to denote the AC coefficients in the (m−1, n)th and the (m, n−1)th blocks, respectively, then we find that A[p, q] has a high correlation with (B[p, q] + C[p, q])/2. Therefore, we can set

x[p, q] = (B[p, q] + C[p, q])/2,
y[p, q] = A[p, q] − α x[p, q],
f(x[p, q]) = 4 + x[p, q],    (17)

where α can be adjusted for different images (for example, in Fig. 1(a), we choose α = 11/32). Then y[p, q] is near to having zero mean and a geometrically distributed probability, and mean(|y[p, q]|) is proportional to f(x[p, q]). Thus, we can use the proposed modified Golomb coding algorithm to encode y[p, q] and hence the nonzero AC coefficients.

(c) We use Z[k] to denote the number of zeros following the kth AC coefficient in the (m, n)th block (k is the zigzag scanning order). We then use Z1[k] and Z2[k] to denote the numbers of zeros following the k1th and the k2th AC coefficients in the (m−1, n)th and the (m, n−1)th blocks, respectively. (The k1th AC coefficient in the (m−1, n)th block should be nonzero and |k − k1| should be minimized; k2 is determined in the similar way.) Since Z[k] has a high correlation with (Z1[k] + Z2[k])/2, we can set

x[k] = (Z1[k] + Z2[k])/2,
y[k] = Z[k] − β x[k],
f(x[k]) = 1 + x[k],    (18)

where β can be adjusted for different images (for example, in Fig. 1(a), we choose β = 12/32). Then, since y[k] is near to having zero mean and a geometrically distributed probability and mean(|y[k]|) ∝ f(x[k]), we can use our proposed method to encode y[k] and hence the zero-run-lengths.

In Fig. 1, we show several simulations that use the proposed adaptive Golomb code instead of the Huffman code and the original Golomb code to encode the DC differences, the nonzero AC coefficients, and the zero-run-lengths in JPEG. (The results of our method are plotted by red lines with +.) The results show that our proposed algorithm can achieve lower bpp and hence a higher compression rate in all cases.
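To illustrate how a setting such as (16) plugs into the procedure of Section III, the sketch below builds the side data x[m, n] and the residuals y[m, n] from a 2-D array of quantized DC values and passes them to the adaptive_golomb_encode function sketched above. The array name dc, the skipping of boundary blocks, and the guard that keeps f(x) nonzero are our own assumptions; the paper also quantizes p̂ to the candidate set (15) and signals it with 6 bits, which this sketch omits.

```python
def encode_dc_differences(dc):
    """dc: 2-D list of quantized DC values D[m][n], one per 8x8 block."""
    xs, ys = [], []
    for m in range(1, len(dc)):
        for n in range(1, len(dc[m])):
            x = dc[m - 1][n] - dc[m - 1][n - 1]            # context from the row above
            y = dc[m][n] - dc[m][n - 1] - round(0.7 * x)   # prediction residual, Eq. (16)
            xs.append(x)
            ys.append(y)

    # f(x) = 4 + x as in Eq. (16); only |f(x)| is used by the encoder, and the
    # max(..., 1) keeps the scaling in Eq. (6) well defined when x is near -4.
    f = lambda x: max(abs(4 + x), 1)
    return adaptive_golomb_encode(xs, ys, f)
```

The nonzero AC coefficients and the zero-run-lengths can be handled in the same way by substituting the settings in (17) and (18).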

[Fig. 1: four rate-PSNR curves, (a) Monkey, (b) Plane, (c) House, and (d) City. Each panel plots the data rate (bpp) against the PSNR (dB) for three schemes: JPEG + static Huffman table, JPEG + Golomb code, and JPEG + proposed adaptive Golomb code.]

Fig. 1 The coding performances of different coding schemes, where bpp means bits per pixel and lower bpp means a higher compression rate.

Therefore, using the proposed adaptive Golomb code together with the joint probability can indeed improve the coding efficiency. The proposed adaptive Golomb code has the potential to replace the role of the Huffman code in JPEG, MPEG, and other data compression standards.

V. CONCLUSION

In this paper, we use the joint probability to improve the Golomb coding algorithm. Like the Golomb code, our method has the advantages of being suitable for data with geometric distributions and of requiring no codeword table. Furthermore, since our proposed algorithm uses the information of another data to adjust the value of p, it can achieve better coding efficiency than the original Golomb code. Moreover, in Section IV, we combined the JPEG algorithm with the proposed adaptive Golomb code for image compression. All simulation results showed that the proposed algorithm outperforms the Huffman coding and the conventional Golomb coding algorithms. Thus, the proposed adaptive Golomb code with joint probability has the potential to replace the Huffman code in image compression, acoustical signal compression, and other data compression applications.

VI. REFERENCES

[1] D. A. Huffman, "A method for the construction of minimum-redundancy codes," Proceedings of the IRE, vol. 40, no. 9, pp. 1098-1101, 1952.
[2] S. W. Golomb, "Run-length encodings," IEEE Trans. Inf. Theory, vol. 12, pp. 399-401, 1966.
[3] R. Gallager and D. C. Van Voorhis, "Optimal source codes for geometrically distributed integer alphabets," IEEE Trans. Inf. Theory, vol. 21, pp. 228-230, March 1975.
[4] K. M. Cheung and P. Smyth, "A high-speed distortionless predictive image compression scheme," Proc. 1990 Int'l Symposium on Information Theory and Its Applications, pp. 467-470, Nov. 1990.
[5] R. F. Rice, "Some practical universal noiseless coding techniques–Part I," Tech. Rep. JPL-79-22, Jet Propulsion Laboratory, Pasadena, CA, March 1979.
[6] G. Seroussi and M. J. Weinberger, "On adaptive strategies for an extended family of Golomb-type codes," Proc. DCC '97, pp. 131-140, 1997.
