
IEEE TRANSACTIONS ON ACOUSTICS, SPEECH, AND SIGNAL PROCESSING, VOL. 37, NO. 2, FEBRUARY 1989 237

Input and Output Index Mappings for a Prime-Factor-Decomposed Computation of Discrete Cosine Transform

Abstract-This paper provides a direct derivation of the prime-factor-decomposed computation algorithm of an N-point discrete cosine transform for the number N decomposable into two relatively prime numbers. It also presents input and output index mappings in the form of tables, namely, $\hat{n}$-, $\tilde{n}$-, $n_C$-, $n_R$-, and $k$-tables. The index mapping tables are useful for practical use of the prime-factor-decomposed computation of arbitrarily sized discrete cosine transforms.

Manuscript received September 12, 1987; revised March 5, 1988. The author is with the Department of Electronics Engineering, Seoul National University, Seoul 151-742, Korea. IEEE Log Number 8825133.

0096-3518/89/0200-0237$01.00 © 1989 IEEE

Authorized licensed use limited to: Seoul National University. Downloaded on February 17, 2010 at 03:36:48 EST from IEEE Xplore. Restrictions apply.

I. INTRODUCTION

Since its first introduction in 1974 [1], the discrete cosine transform (DCT) has found applications in speech and image signal processing [1]-[8] as well as in telecommunication signal processing [9], [10]. The DCT has been applied to speech and image signals because its performance is nearly optimal, yet not signal dependent. On the other hand, the DCT has been utilized for realizing filter banks in FDM-TDM transmultiplexers because its real computation is simpler and faster than the complex computation of the discrete Fourier transform (DFT).

Along with the expanded applications of the DCT, a number of fast computation techniques have also been introduced [11]-[19]. Depending on the number of points $N$, the computation techniques can be divided into two categories: one on the general composite number cases, and the other on the prime-factor cases. For the former case, $N$ of special interest is of the $2^m$ type; for the latter case, $N$ is factorizable into two relatively prime numbers $N_1$ and $N_2$. A recent work reports that the number of real multiplications for the power-of-two case can be reduced to $(N/2)\log_2 N$, and that its structure resembles that of the fast Fourier transform (FFT) [14]. For the prime-factor case, the number of multiplications reduces to $N(N_1 + N_2)$ in its most primitive form, and its structure is similar to that of the prime-factor algorithm of the DFT [19].$^{1}$

$^{1}$On the prime-factor algorithm of DFT, refer to [20]-[22].

The prime-factor-decomposed computation of the DCT was proven to be powerful in reducing the computational work and time, while still providing a simple and regular structure. However, this technique has not yet been widely utilized, mainly because its input and output index mappings are seemingly too involved. In fact, the mappings are the only barrier to overcome in applying the prime-factor algorithm. This paper is therefore intended to provide a simple and organized method to perform the index mappings.

In this paper, a formal direct derivation of the prime-factor-decomposed computation algorithm will be presented first. The derivation is a direct one in the sense that it is based on the real cosine function without resorting to the DFT expressions or the complex functions. Then, based on the equations obtained during the derivation, input and output index mappings will be introduced in the form of tables. This tabulation will enable us to implement any prime-factor-decomposable DCT in a straightforward manner. Finally, the index mapping tables will be demonstrated through the 12-point DCT.

II. DIRECT DERIVATION OF PRIME-FACTOR DECOMPOSITION

Let $x(k)$, $k = 0, 1, \ldots, N-1$, be a time-domain data sequence and $X(n)$, $n = 0, 1, \ldots, N-1$, be its transform-domain data sequence. Then, by definition, the DCT and the inverse DCT (IDCT), respectively, have the expressions

$$X(n) = \frac{2}{N}\, e(n) \sum_{k=0}^{N-1} x(k) \cos[\pi(2k+1)n/2N], \quad n = 0, 1, \ldots, N-1, \tag{1}$$

$$x(k) = \sum_{n=0}^{N-1} e(n)\, X(n) \cos[\pi(2k+1)n/2N], \quad k = 0, 1, \ldots, N-1, \tag{2}$$

where

$$e(n) = \begin{cases} 1/\sqrt{2}, & \text{if } n = 0, \\ 1, & \text{otherwise.} \end{cases} \tag{3}$$

Since (1) can be realized simply by transposing the flowgraph for (2), and since the term $e(n)$ means nothing but a slight modification of the data $X(n)$, it is sufficient for our discussion to consider the IDCT-like equation$^{2}$

$$x(k) = \sum_{n=0}^{N-1} X(n) \cos[\pi(2k+1)n/2N], \quad k = 0, 1, \ldots, N-1. \tag{4}$$

Throughout this paper we will assume that

$$N = N_1 N_2, \tag{5}$$

where $N_1$ and $N_2$ are mutually prime integers. Suppose we could decompose the $N$-point IDCT in (4) into the cascade of $N_2$ $N_1$-point IDCT's and $N_1$ $N_2$-point IDCT's.$^{3}$ Then, the expression for the resulting decomposed transform would be of the form

$$x(k_1, k_2) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} X(n_1, n_2) \cos[\pi(2k_1+1)n_1/2N_1] \cos[\pi(2k_2+1)n_2/2N_2], \tag{6}$$

for $k_1 = 0, 1, \ldots, N_1-1$, and $k_2 = 0, 1, \ldots, N_2-1$. The main goal of this section is in deriving (6) from (4) by finding two appropriate mappings: the input mapping connecting $X(n)$, $n = 0, 1, \ldots, N-1$, to $X(n_1, n_2)$, $n_1 = 0, 1, \ldots, N_1-1$, $n_2 = 0, 1, \ldots, N_2-1$, and the output mapping connecting $x(k)$, $k = 0, 1, \ldots, N-1$, to $x(k_1, k_2)$, $k_1 = 0, 1, \ldots, N_1-1$, $k_2 = 0, 1, \ldots, N_2-1$.$^{4}$ The input and output mappings are directly tied with the input and output index mappings among the corresponding indexes.

$^{2}$It should be noted that the transposed flowgraph of DCT performs the IDCT function (with $e(n)$ related coefficient modification), while the transposed flowgraph of DFT still performs the same DFT function. Refer to the arrow marks in Fig. 1 to appear.

$^{3}$The term IDCT we encounter hereafter will denote the IDCT in the sense of (4).

$^{4}$We name the former input mapping and the latter output mapping, even though the meaning of input and output could be reversed for the forward DCT.

We first consider the input mapping which connects $X(n)$ to $X(n_1, n_2)$. Let $\mathcal{N}$ denote the set of $N$ integers 0 through $N-1$. Similarly, let $\mathcal{N}_1$ and $\mathcal{N}_2$, respectively, denote the set of $N_1$ integers 0 through $N_1-1$ and the set of $N_2$ integers 0 through $N_2-1$. We define $\hat{f}$ and $\tilde{f}$ to be mappings from $\mathcal{N}_1 \times \mathcal{N}_2$ to $\mathcal{N}$ such that

$$\hat{f}(n_1, n_2) = \begin{cases} n_1N_2 + n_2N_1, & \text{if } n_1N_2 + n_2N_1 < N, \\ 2N - (n_1N_2 + n_2N_1), & \text{otherwise,} \end{cases} \tag{7a}$$

$$\tilde{f}(n_1, n_2) = |n_1N_2 - n_2N_1|, \tag{7b}$$

for all $n_1$ in $\mathcal{N}_1$ and all $n_2$ in $\mathcal{N}_2$. For $n_1 \neq 0$ and $n_2 \neq 0$, these mappings have the symmetric properties

$$\hat{f}(n_1, n_2) = \hat{f}(N_1 - n_1, N_2 - n_2), \tag{8a}$$

$$\tilde{f}(n_1, n_2) = \tilde{f}(N_1 - n_1, N_2 - n_2). \tag{8b}$$

We denote by $\hat{\mathcal{N}}$ and $\tilde{\mathcal{N}}$ sets of $N$ integers such that

$$\hat{\mathcal{N}} = \{n \mid n = \hat{f}(n_1, n_2),\ n_1 \in \mathcal{N}_1,\ n_2 \in \mathcal{N}_2\}, \tag{9a}$$

$$\tilde{\mathcal{N}} = \{n \mid n = \tilde{f}(n_1, n_2),\ n_1 \in \mathcal{N}_1,\ n_2 \in \mathcal{N}_2\}. \tag{9b}$$

Then it can be shown that the $2N$ integers in the collection of $\hat{\mathcal{N}}$ and $\tilde{\mathcal{N}}$ are identical to the $2N$ integers in the collection of $\mathcal{N}$ and $\mathcal{N}$.$^{5}$ This implies that a summation over $N$ indexes in $\mathcal{N}$ can be split into two terms, each weighted by 1/2: a summation over the $N$ indexes in $\hat{\mathcal{N}}$ and a summation over the $N$ indexes in $\tilde{\mathcal{N}}$. Therefore, we can rewrite (4) as follows:

$$x(k) = \frac{1}{2} \sum_{n \in \hat{\mathcal{N}}} X(n) \cos[\pi(2k+1)n/2N] + \frac{1}{2} \sum_{n \in \tilde{\mathcal{N}}} X(n) \cos[\pi(2k+1)n/2N]. \tag{10}$$

$^{5}$A proof of this is given in Appendix A. The term collection here indicates the set obtained by listing all the elements in two sets. See footnote 11.

We denote, for all $(n_1, n_2)$ in $\mathcal{N}_1 \times \mathcal{N}_2$,

$$\hat{X}(n_1, n_2) = s(n)\, X(n)\big|_{n = \hat{f}(n_1, n_2)}, \tag{11a}$$

$$\tilde{X}(n_1, n_2) = X(n)\big|_{n = \tilde{f}(n_1, n_2)}, \tag{11b}$$

where

$$s(n) = \begin{cases} 1, & \text{if } n_1N_2 + n_2N_1 < N, \\ -1, & \text{otherwise.} \end{cases} \tag{12}$$

Then (10) can be rewritten as

$$x(k) = \frac{1}{2} \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} \{\hat{X}(n_1, n_2) \cos[\pi(2k+1)(n_1N_2 + n_2N_1)/2N] + \tilde{X}(n_1, n_2) \cos[\pi(2k+1)(n_1N_2 - n_2N_1)/2N]\}. \tag{13}$$

The term $s(n)$ reflects the negative sign appearing in the relation

$$\cos[\pi(2k+1)(2N - n)/2N] = -\cos[\pi(2k+1)n/2N]. \tag{14}$$

We do not need such a term for the second part of (10), since $|n_1N_2 - n_2N_1| < N$ for all $(n_1, n_2)$ in $\mathcal{N}_1 \times \mathcal{N}_2$. We now define $X(n_1, n_2)$ such that

$$X(n_1, n_2) = \begin{cases} \hat{X}(n_1, n_2) = \tilde{X}(n_1, n_2), & \text{if } n_1 = 0 \text{ or } n_2 = 0, \\ \hat{X}(n_1, n_2) + \tilde{X}(n_1, n_2), & \text{otherwise.} \end{cases} \tag{15}$$
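The input mapping defined by (7), (11), (12), and (15) can be tried out numerically. The following is a minimal sketch, not the author's implementation: the function names (`f_hat`, `f_tilde`, `split_sum`, `input_mapping`) are hypothetical, and the random test data and the choice $N_1 = 3$, $N_2 = 4$ are assumptions made only for illustration. It evaluates the right-hand side of (13) and compares it with the direct sum (4).

```python
import math
import random


def f_hat(n1, n2, N1, N2):
    """Map (7a). Returns (index, sign); the sign is s(n) of (12)."""
    N = N1 * N2
    m = n1 * N2 + n2 * N1
    return (m, 1) if m < N else (2 * N - m, -1)


def f_tilde(n1, n2, N1, N2):
    """Map (7b)."""
    return abs(n1 * N2 - n2 * N1)


def idct_like(X):
    """Direct evaluation of the IDCT-like sum (4)."""
    N = len(X)
    return [
        sum(X[n] * math.cos(math.pi * (2 * k + 1) * n / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def split_sum(X, N1, N2):
    """Right-hand side of (13): half the sum of the hat and tilde terms."""
    N = N1 * N2
    out = []
    for k in range(N):
        w = math.pi * (2 * k + 1) / (2 * N)
        acc = 0.0
        for n1 in range(N1):
            for n2 in range(N2):
                n_hat, sign = f_hat(n1, n2, N1, N2)
                acc += sign * X[n_hat] * math.cos(w * (n1 * N2 + n2 * N1))
                acc += X[f_tilde(n1, n2, N1, N2)] * math.cos(w * (n1 * N2 - n2 * N1))
        out.append(0.5 * acc)
    return out


def input_mapping(X, N1, N2):
    """X(n1, n2) of (15), built from the hat and tilde data of (11)."""
    table = [[0.0] * N2 for _ in range(N1)]
    for n1 in range(N1):
        for n2 in range(N2):
            n_hat, sign = f_hat(n1, n2, N1, N2)
            value = sign * X[n_hat]
            if n1 != 0 and n2 != 0:
                # Interior locations combine two input samples (a butterfly).
                value += X[f_tilde(n1, n2, N1, N2)]
            table[n1][n2] = value
    return table


random.seed(0)  # arbitrary test data, assumed for illustration only
X = [random.uniform(-1.0, 1.0) for _ in range(12)]
max_err = max(abs(a - b) for a, b in zip(idct_like(X), split_sum(X, 3, 4)))
print("largest deviation between (4) and (13):", max_err)
```

Calling `input_mapping(list(range(12)), 3, 4)` uses the sample index itself as data, which makes it easy to see which $X(n)$ feed each location; for instance, location $(1, 1)$ receives $X(7) + X(1)$ and location $(2, 3)$ receives $-X(7) + X(1)$.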

Then, since

$$\sum_{n_2=1}^{N_2-1} \sum_{n_1=1}^{N_1-1} \tilde{X}(n_1, n_2) \cos[\pi(2k+1)(n_1N_2 + n_2N_1)/2N] = 0, \tag{16a}$$

$$\sum_{n_2=1}^{N_2-1} \sum_{n_1=1}^{N_1-1} \hat{X}(n_1, n_2) \cos[\pi(2k+1)(n_1N_2 - n_2N_1)/2N] = 0, \tag{16b}$$

equation (13) can be written in the form

$$x(k) = \frac{1}{2} \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} X(n_1, n_2) \{\cos[\pi(2k+1)(n_1N_2 + n_2N_1)/2N] + \cos[\pi(2k+1)(n_1N_2 - n_2N_1)/2N]\}. \tag{17}$$

Recalling the relation

$$\frac{1}{2}\{\cos[\pi(2k+1)(n_1N_2 + n_2N_1)/2N] + \cos[\pi(2k+1)(n_1N_2 - n_2N_1)/2N]\} = \cos[\pi(2k+1)n_1N_2/2N] \cos[\pi(2k+1)n_2N_1/2N], \tag{18}$$

we can rewrite it again as

$$x(k) = \sum_{n_2=0}^{N_2-1} \sum_{n_1=0}^{N_1-1} X(n_1, n_2) \cos[\pi(2k+1)n_1/2N_1] \cos[\pi(2k+1)n_2/2N_2]. \tag{19}$$

Now, we consider the output mapping which connects $x(k)$ to $x(k_1, k_2)$. We denote by $\bar{k}_1$ and $\bar{k}_2$, respectively, $\bar{k}_1 = k$ modulo $2N_1$ and $\bar{k}_2 = k$ modulo $2N_2$. We define $g$ to be a mapping from $\mathcal{N}$ to $\mathcal{N}_1 \times \mathcal{N}_2$ such that $(k_1, k_2) = g(k)$ with

$$k_1 = \begin{cases} \bar{k}_1, & \text{if } \bar{k}_1 < N_1, \\ 2N_1 - 1 - \bar{k}_1, & \text{otherwise,} \end{cases} \tag{20a}$$

$$k_2 = \begin{cases} \bar{k}_2, & \text{if } \bar{k}_2 < N_2, \\ 2N_2 - 1 - \bar{k}_2, & \text{otherwise,} \end{cases} \tag{20b}$$

for each $(k_1, k_2)$ in $\mathcal{N}_1 \times \mathcal{N}_2$ and a $k$ in $\mathcal{N}$. Then, it can be shown that $g$ is a one-to-one mapping, so there exists an inverse mapping $g^{-1}$ from $\mathcal{N}_1 \times \mathcal{N}_2$ to $\mathcal{N}$.$^{6}$ Thus, we can define

$$x(k_1, k_2) = x(k)\big|_{k = g^{-1}(k_1, k_2)}. \tag{21}$$

If we apply this to (19), we finally obtain the expression in (6), since

$$\cos[\pi(2k+1)n_1/2N_1] = \cos[\pi(2k_1+1)n_1/2N_1], \tag{22a}$$

$$\cos[\pi(2k+1)n_2/2N_2] = \cos[\pi(2k_2+1)n_2/2N_2]. \tag{22b}$$

$^{6}$A proof of this is given in Appendix B.

Therefore, we have shown that (4) and (6) are identical if $X(n)$ and $X(n_1, n_2)$ are connected through (7), (11), and (15), and if $x(k)$ and $x(k_1, k_2)$ are connected through (20) and (21). The former three equations form the input mapping; and the latter two equations form the output mapping. Thus, we can now perform an $N$-point IDCT by cascading $N_2$ $N_1$-point IDCT's and $N_1$ $N_2$-point IDCT's, as is demonstrated in Fig. 1 for the numbers $N = 12$, $N_1 = 3$, $N_2 = 4$.$^{7}$

$^{7}$It should be noted that the block diagram in Fig. 1 performs IDCT if signals flow from left to right, and performs DCT if signals flow from right to left.

Fig. 1. Decomposition of $N$-point IDCT into the cascade of $N_1$-point IDCT's and $N_2$-point IDCT's, where $N = 12$, $N_1 = 3$, and $N_2 = 4$.

III. TABULATION OF INDEX MAPPINGS

For any given $N$, the main body performing $N_2$ $N_1$-point IDCT's and $N_1$ $N_2$-point IDCT's is quite straightforward to implement, as is illustrated in Fig. 1. But more care should be taken on the input mapping which converts $X(n)$, $n = 0, 1, \ldots, N-1$, to $X(n_1, n_2)$, $n_1 = 0, 1, \ldots, N_1-1$, $n_2 = 0, 1, \ldots, N_2-1$, and the output mapping which converts $x(k_1, k_2)$, $k_1 = 0, 1, \ldots, N_1-1$, $k_2 = 0, 1, \ldots, N_2-1$, to $x(k)$, $k = 0, 1, \ldots, N-1$. In this section, we will tabulate the index mappings relating $n$ to $(n_1, n_2)$ and relating $(k_1, k_2)$ to $k$, and will discuss how to utilize the resulting tables to realize the above input and output mappings.

We first consider the input mapping, which is represented by (7), (11), and (15). We set up a table with $N_1$ rows and $N_2$ columns, naming each row 0 through $N_1-1$ and each column 0 through $N_2-1$. Then we fill location $(n_1, n_2)$, which is at row $n_1$ and column $n_2$, with $\hat{f}(n_1, n_2)$ for $n_1 = 0, 1, \ldots, N_1-1$, $n_2 = 0, 1, \ldots, N_2-1$, putting a negative sign to every $\hat{f}(n_1, n_2)$ meeting $n_1N_2 + n_2N_1 \geq N$. The negative sign is meant to reflect the $s(n)$ term in (12), designating that the input data $X(\hat{f}(n_1, n_2))$ corresponding to the index should accompany a negative sign as (11a) indicates. We name the resulting table $\hat{n}$-table. Then the $\hat{n}$-table represents (7a) and (11a). We define $\tilde{n}$-table in a similar manner by listing $\tilde{f}(n_1, n_2)$ to location $(n_1, n_2)$, $n_1 = 0, 1, \ldots, N_1-1$, $n_2 = 0, 1, \ldots, N_2-1$. Then the $\tilde{n}$-table will represent (7b) and (11b). Note that there is no negative sign in the $\tilde{n}$-table.

If we denote by $\hat{n}(n_1, n_2)$ and $\tilde{n}(n_1, n_2)$ the entries at $(n_1, n_2)$ of the $\hat{n}$- and $\tilde{n}$-tables, respectively, then we have $\hat{n}(n_1, n_2) = \tilde{n}(n_1, n_2)$ for $n_1 = 0$ or $n_2 = 0$, and, apart from the negative signs, $\hat{n}(n_1, n_2) = \hat{n}(N_1 - n_1, N_2 - n_2)$, $\tilde{n}(n_1, n_2) = \tilde{n}(N_1 - n_1, N_2 - n_2)$ for the other $n_1$ and $n_2$. This reflects (8a) and (8b), implying that neither the entries in the $\hat{n}$-table nor the entries in the $\tilde{n}$-table cover the set $\mathcal{N}$. So we introduce new tables which can cover the set $\mathcal{N}$. We define $n_C$-table to be the table obtained from the $\hat{n}$- and $\tilde{n}$-tables in the following manner. If $n_1 = 0$ or $n_2 = 0$, we take $n_C(n_1, n_2) = \hat{n}(n_1, n_2) = \tilde{n}(n_1, n_2)$, where $n_C(n_1, n_2)$ denotes the entry at $(n_1, n_2)$ of the $n_C$-table. Otherwise, we take $n_C(n_1, n_2) = \tilde{n}(n_1, n_2)$ for the first $(N_1 - 1)(N_2 - 1)/2$
entries taken columnwise,$^{8}$ and $n_C(n_1, n_2) = \hat{n}(n_1, n_2)$ for the others but without negative signs. Also, we define $n_R$-table in a similar manner, except that we take entries rowwise. Then, clearly, the entries in the $n_C$-table, as well as the entries in the $n_R$-table, cover the set $\mathcal{N}$. Therefore, we can align data $X(n)$ and $X(n_1, n_2)$ for the input mapping by taking either the $n_C$- or the $n_R$-table.

$^{8}$The term "columnwise" implies that we are taking the entries at the first column first, followed by the second through the last columns sequentially. More specifically, we are taking entries in the order $(0, 0), (1, 0), \ldots, (N_1-1, 0); (0, 1), (1, 1), \ldots, (N_1-1, 1); \ldots; (0, N_2-1), (1, N_2-1), \ldots, (N_1-1, N_2-1)$. A similar interpretation is also valid for the term "rowwise."

We consider the case when $N_2$ $N_1$-point IDCT's appear first in the main body, as in Fig. 1. For this case, the output data of the input mapping box should be aligned in the order of $X(0, 0), X(1, 0), \ldots, X(N_1-1, 0); X(0, 1), X(1, 1), \ldots, X(N_1-1, 1); \ldots; X(0, N_2-1), X(1, N_2-1), \ldots, X(N_1-1, N_2-1)$. This implies that we have to read out data columnwise. So we take the $n_C$-table and read out the input data $X(n)$ columnwise, juxtaposing them to the left-hand side of the corresponding $X(n_1, n_2)$. Based on these alignments, we now perform the operation in (15). We can use the $\hat{n}$- and $\tilde{n}$-tables, respectively, for the terms $\hat{X}(n_1, n_2)$ and $\tilde{X}(n_1, n_2)$. Due to the symmetric properties of the $\hat{n}$- and $\tilde{n}$-tables described by (8), equation (15) can be realized as a flowgraph of the following shape. The output data $X(n_1, n_2)$ with $n_1 = 0$ or $n_2 = 0$ is joined with the corresponding input data $X(n)$ by a unity-gain line. For other input data, there exists a butterfly operation joining every $X(n_1, n_2)$ and $X(N_1 - n_1, N_2 - n_2)$ with their corresponding counterparts in $X(n)$, $n = 0, 1, \ldots, N-1$. The signs for the butterfly operation follow those in the $\hat{n}$- and $\tilde{n}$-tables.$^{9}$

$^{9}$Note that the signs of all but those having negative signs are positive.

In the case when $N_1$ $N_2$-point IDCT's appear first in the main body, the output data of the input mapping box are aligned in the order of $X(0, 0), X(0, 1), \ldots, X(0, N_2-1); X(1, 0), X(1, 1), \ldots, X(1, N_2-1); \ldots; X(N_1-1, 0), X(N_1-1, 1), \ldots, X(N_1-1, N_2-1)$. Since this implies taking data rowwise, we now take the $n_R$-table and read out data $X(n)$ rowwise. The other processes are the same as the previous case.

We need not consider both $n_C$- and $n_R$-tables for a given $N$. If we want to perform the $N_1$-point IDCT operation first, followed by the $N_2$-point IDCT operation, then the $n_C$-table together with the $\hat{n}$- and $\tilde{n}$-tables will suffice. But if we want to do the $N_2$-point IDCT first, then we need the $n_R$-table instead of the $n_C$-table. Depending on the table we take for the input mapping, the output mapping also changes, as will be discussed below.

We now consider the output mapping, which is represented by (20) and (21). This case is rather simple compared to the previous case. Notice that this tabulation is possible due to the one-to-oneness of the mapping $g$.

We make a table with $N_1$ rows and $N_2$ columns, naming each row 0 through $N_1-1$ and each column 0 through $N_2-1$ as before. We write $k$, $k = 0, 1, \ldots, N-1$, to location $(k_1, k_2)$, where $k_1$ and $k_2$ follow (20a) and (20b), respectively. We name the resulting table $k$-table.

Using this table, we now consider realizing (21). We first consider the case when the $N_1$-point IDCT's were performed first, followed by the $N_2$-point IDCT's in the main body. For this case, the input data $x(k_1, k_2)$ (to the output mapping box), which correspond to the output data of the $N_2$-point IDCT's of the main body, are aligned in the order $x(0, 0), x(0, 1), \ldots, x(0, N_2-1); x(1, 0), x(1, 1), \ldots, x(1, N_2-1); \ldots; x(N_1-1, 0), x(N_1-1, 1), \ldots, x(N_1-1, N_2-1)$. This implies that we have to take data rowwise. Thus, we read out the output data $x(k)$ rowwise from the $k$-table, juxtaposing them to
the right-hand side of the corresponding $x(k_1, k_2)$. On these alignments, we now perform the operation corresponding to (21), which is nothing but a unity-gain line drawing between each $x(k_1, k_2)$ and the corresponding counterpart $x(k)$.

For the other case, where we perform the $N_2$-point IDCT's first and the $N_1$-point IDCT's last, we can repeat a similar procedure. Since we now have the alignment $x(0, 0), x(1, 0), \ldots, x(N_1-1, 0); x(0, 1), x(1, 1), \ldots, x(N_1-1, 1); \ldots; x(0, N_2-1), x(1, N_2-1), \ldots, x(N_1-1, N_2-1)$, we have to take the $k$-table columnwise in this case.

Therefore, we can summarize the usage of the tables as follows. If we want to arrange the main body of the decomposed system such that the $N_2$ $N_1$-point IDCT's come first, followed by the $N_1$ $N_2$-point IDCT's, then we should take the $n_C$-table columnwise in the input mapping, taking the $k$-table rowwise in the output mapping. But if we want to have the reversed order, we have to take the $n_R$-table rowwise and the $k$-table columnwise. For either case, the $\hat{n}$- and $\tilde{n}$-tables are used in common to provide butterfly connections.

IV. EXAMPLE

We consider the 12-point IDCT/DCT to illustrate the decomposed computation and the corresponding $n$- and $k$-tables. We let $N_1 = 3$, $N_2 = 4$, and perform four 3-point IDCT's first followed by three 4-point IDCT's. Then we obtain the structure shown in Fig. 1. One can also show a similar structure for the case when three 4-point IDCT's are taken first. We want to figure out the black boxes named input and output mappings in the figure.

We first draw up the $\hat{n}$-, $\tilde{n}$-, $n_C$-, and $n_R$-tables for the input mapping box. By evaluating (7a) with $N_1 = 3$, $N_2 = 4$, $n_1 = 0, 1, 2$, and $n_2 = 0, 1, 2, 3$, we obtain the $\hat{n}$-table shown in Table I(a). Similarly, using (7b), we obtain the $\tilde{n}$-table in Table I(b). We now derive the $n_C$- and $n_R$-tables from the above two tables. For the locations with $n_1 = 0$ or $n_2 = 0$ of both tables, we copy the corresponding entries in the $\hat{n}$- or $\tilde{n}$-table. But for the other locations, we copy down the first three entries from the $\tilde{n}$-table and the other three from the $\hat{n}$-table, since $(N_1 - 1)(N_2 - 1)/2 = 3$. If we take the entries columnwise, then we obtain the $n_C$-table; and if we take the entries rowwise, we obtain the $n_R$-table. These two tables are shown in Table I(c) and I(d).

TABLE I
INDEX MAPPING TABLES FOR $N = 12$ WITH $N_1 = 3$, $N_2 = 4$. (a) $\hat{n}$-TABLE, (b) $\tilde{n}$-TABLE, (c) $n_C$-TABLE, (d) $n_R$-TABLE, (e) $k$-TABLE
(Rows are $n_1$ or $k_1 = 0, 1, 2$ from top to bottom; columns are $n_2$ or $k_2 = 0, 1, 2, 3$ from left to right.)

(a) $\hat{n}$-table
0, 3, 6, 9
4, 7, 10, -11
8, 11, -10, -7

(b) $\tilde{n}$-table
0, 3, 6, 9
4, 1, 2, 5
8, 5, 2, 1

(c) $n_C$-table
0, 3, 6, 9
4, 1, 2, 11
8, 5, 10, 7

(d) $n_R$-table
0, 3, 6, 9
4, 1, 2, 5
8, 11, 10, 7

(e) $k$-table
0, 6, 5, 11
7, 1, 10, 4
8, 9, 2, 3

Now, we consider aligning the input data $X(n)$, $n = 0, 1, \ldots, 11$. Since we are taking four 3-point IDCT's first in the main body, the output data of the input mapping box are aligned in the order of $X(0, 0), X(1, 0), X(2, 0); X(0, 1), X(1, 1), X(2, 1); X(0, 2), X(1, 2), X(2, 2); X(0, 3), X(1, 3), X(2, 3)$. So, we have to take the $n_C$-table in aligning the input data $X(n)$ and read out the data columnwise. As a result, we have the order $X(0), X(4), X(8); X(3), X(1), X(5); X(6), X(2), X(10); X(9), X(11), X(7)$.

The next step is to draw flowgraphs connecting those input and output data. For each $X(n_1, n_2)$ having $n_1 = 0$ or $n_2 = 0$, we draw a horizontal unity-gain line joining it with the corresponding $X(n)$. This achieves the first part of (15). The second part of it turns out to be butterfly operations which are realized by joining $X(n_1, n_2)$ and $X(3 - n_1, 4 - n_2)$ with the corresponding two $X(n)$'s identified through the $\hat{n}$- and $\tilde{n}$-tables. For example, when $n_1 = 1$, $n_2 = 1$, we have 1 and 7 at location $(1, 1)$ of the $\tilde{n}$- and $\hat{n}$-tables, respectively; and we have 1 and $-7$ at $(3-1, 4-1)$ of the tables. Thus, we draw a butterfly joining $X(1, 1)$ and $X(2, 3)$ with $X(1)$ and $X(7)$. Each line in this butterfly has unity gain except the line joining $X(7)$ with $X(2, 3)$, which has the gain $-1$, as is designated by the negative sign in $-7$. In a similar way, we can draw the second and the third butterfly operations. Due to the symmetric properties of the $\hat{n}$- and $\tilde{n}$-tables, the butterflies are dwindling with the crossing point of the first butterfly as their axes. All these are depicted in Fig. 2(a).

We now consider the output mapping. By evaluating (20a) and (20b) with $k = 0, 1, \ldots, 11$, we obtain the table in Table I(e). Drawing the flowgraph for the output mapping is quite straightforward. Since we took the $n_C$-table for the input mapping, we have to take the $k$-table rowwise. Thus, we have the order $x(0, 0), x(0, 1), x(0, 2), x(0, 3); x(1, 0), x(1, 1), x(1, 2), x(1, 3); x(2, 0)$,
$x(2, 1), x(2, 2), x(2, 3)$ for the input data of the output mapping. And by reading out the $k$-table rowwise, we obtain the sequence $x(0), x(6), x(5), x(11); x(7), x(1), x(10), x(4); x(8), x(9), x(2), x(3)$ for the output data. The remaining process is to realize (21) on these alignments, which is nothing but to join each $x(k_1, k_2)$ and the corresponding $x(k)$ with a unity-gain line. This is depicted in Fig. 2(b).

Fig. 2. (a) Input mapping and (b) output mapping when 3-point IDCT's come first, followed by 4-point IDCT's.

For the case when three 4-point IDCT's come first, followed by four 3-point IDCT's, one can show, by taking a similar procedure, that the resulting input and output mapping boxes are as shown in Fig. 3(a) and (b). However, care should be taken for this case such that the $n_R$-table be used in the input mapping and the $k$-table be used columnwise in the output mapping.

Fig. 3. (a) Input mapping and (b) output mapping when 4-point IDCT's come first, followed by 3-point IDCT's.

V. CONCLUDING REMARKS

In this paper, we have discussed how to decompose an $N_1N_2$-point DCT into the cascade of $N_1$-point DCT's and $N_2$-point DCT's for the case when $N_1$ and $N_2$ are relatively prime to each other. We have derived and proved appropriate mappings for this decomposition. As a consequence, we can now achieve faster computation of the $N_1N_2$-point DCT in an organized manner. Compared to the case employing brute-force techniques for the computation, the number of real multiplications reduces from $N_1^2N_2^2$ to $N_2N_1^2 + N_1N_2^2$, which amounts to an $N_1N_2/(N_1 + N_2)$-to-1 reduction. The ratio can be further improved by repeatedly applying the prime-factor algorithm or by combining it with other available fast algorithms. Since with the present-day computing power the computational complexity is no longer dominated by the number of multiplications, we should examine relevant modularity, location switching, and topology issues also. In this respect, the systematic structure provided by the prime-factor algorithm of DCT can be viewed as another important advantage.

The index mappings employed for the prime-factor algorithm of DCT (PFA-DCT) differ from those for the prime-factor algorithm of DFT (PFA-DFT) in various points. For the PFA-DFT, index mappings are based on the Chinese Remainder Theorem (CRT) and the Second Integer Representation (SIR) in such a manner that if one is used for the input mapping, the other one is used for the output mapping.$^{10}$ For the PFA-DCT, however, a variation of SIR along with a variation of CRT are both employed for the input index mapping, while the output index mapping is independent of them. The variations stem from the definitions in (7a) and (7b), which are similar to, but not quite the same as, the conventional SIR and CRT. The input and the output mappings for the PFA-DFT correspond to a reordering of the related data, while the input mapping of the PFA-DCT includes data combining operations in it. This is a phenomenon already known to us for the power-of-two case (see, for example, [14]).

$^{10}$Refer to [23] for terminology.

We have discussed tabulation techniques for the input and output index mappings, based on the equations used for the derivation of the prime-factor algorithm. Among the resulting five tables, the $n_C$- and $n_R$-tables are used for input signal alignment, the $\hat{n}$- and $\tilde{n}$-tables for input signal connection (butterfly operation), and the $k$-table for the output signal alignment and mapping. By employing these tables, we can find the necessary mappings in a straightforward manner. Furthermore, once the tables are generated, we can freely choose the DCT's to put first, either $N_1$-point or $N_2$-point. The index mapping tables are therefore expected to play a valuable role in practical applications of the prime-factor-decomposed computation of DCT.

APPENDIX A

Let the operation $\uplus$ denote the collection of two sets, preserving the total number of elements, such that$^{11}$

[...] (A1)

We want to prove in this appendix that

$$\hat{\mathcal{N}} \uplus \tilde{\mathcal{N}} = \mathcal{N} \uplus \mathcal{N}. \tag{A2}$$

$^{11}$Notice that the operation $\uplus$ indicates the collection of two sets, which differs from the union operation $\cup$ in that with $\uplus$ we list all the elements in $\hat{\mathcal{N}}$ and $\tilde{\mathcal{N}}$ even if some of them are identical. As a consequence, we have the relation $|\hat{\mathcal{N}} \uplus \tilde{\mathcal{N}}| = |\hat{\mathcal{N}}| + |\tilde{\mathcal{N}}|$, where $|\mathcal{N}|$ denotes the number of elements in the set $\mathcal{N}$.
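The identity (A2), together with the one-to-oneness of $g$ proved in Appendix B, is what makes the decomposed form (6) agree with (4). As a numerical sanity check ahead of the proofs, the sketch below counts the collection in (A2) and runs the whole chain: input mapping (11), (15), two stages of short IDCT-like sums, and output mapping (20), (21). It is only an illustration under stated assumptions: all function names are hypothetical, the short transforms are evaluated by direct summation rather than by any fast algorithm, and the test vector is arbitrary.

```python
import math
from collections import Counter


def f_hat(n1, n2, N1, N2):
    """Map (7a). Returns (index, sign); the sign is s(n) of (12)."""
    N = N1 * N2
    m = n1 * N2 + n2 * N1
    return (m, 1) if m < N else (2 * N - m, -1)


def f_tilde(n1, n2, N1, N2):
    """Map (7b)."""
    return abs(n1 * N2 - n2 * N1)


def collections_match(N1, N2):
    """True if the hat and tilde indexes together list each of 0..N-1 twice, as in (A2)."""
    pairs = [(n1, n2) for n1 in range(N1) for n2 in range(N2)]
    seen = Counter(f_hat(n1, n2, N1, N2)[0] for n1, n2 in pairs)
    seen.update(f_tilde(n1, n2, N1, N2) for n1, n2 in pairs)
    return seen == Counter({n: 2 for n in range(N1 * N2)})


def g(k, N1, N2):
    """Output index map (20): k -> (k1, k2)."""
    r1, r2 = k % (2 * N1), k % (2 * N2)
    k1 = r1 if r1 < N1 else 2 * N1 - 1 - r1
    k2 = r2 if r2 < N2 else 2 * N2 - 1 - r2
    return k1, k2


def idct_like(X):
    """Direct evaluation of the IDCT-like sum (4) for any length."""
    N = len(X)
    return [
        sum(X[n] * math.cos(math.pi * (2 * k + 1) * n / (2 * N)) for n in range(N))
        for k in range(N)
    ]


def pfa_idct_like(X, N1, N2):
    """Evaluate (4) through the decomposed form (6)."""
    # Input mapping: (11) and (15).
    table = [[0.0] * N2 for _ in range(N1)]
    for n1 in range(N1):
        for n2 in range(N2):
            n_hat, sign = f_hat(n1, n2, N1, N2)
            table[n1][n2] = sign * X[n_hat]
            if n1 != 0 and n2 != 0:
                table[n1][n2] += X[f_tilde(n1, n2, N1, N2)]
    # Main body: N2 sums of length N1 (over n1), then N1 sums of length N2 (over n2).
    stage1 = [idct_like([table[n1][n2] for n1 in range(N1)]) for n2 in range(N2)]
    stage2 = [idct_like([stage1[n2][k1] for n2 in range(N2)]) for k1 in range(N1)]
    # Output mapping: (20) and (21).
    out = []
    for k in range(N1 * N2):
        k1, k2 = g(k, N1, N2)
        out.append(stage2[k1][k2])
    return out


# Arbitrary 12-point test vector, assumed for illustration only.
X = [1.0, -2.0, 0.5, 3.0, -1.5, 0.25, 2.0, -0.75, 1.25, -3.0, 0.1, 4.0]
direct = idct_like(X)
err_3_4 = max(abs(a - b) for a, b in zip(direct, pfa_idct_like(X, 3, 4)))
err_4_3 = max(abs(a - b) for a, b in zip(direct, pfa_idct_like(X, 4, 3)))
print("deviation from (4):", err_3_4, err_4_3)
```

The sketch always runs the $N_1$-point stage first; swapping the two arguments, as in `pfa_idct_like(X, 4, 3)`, exercises the other ordering. Calling `collections_match(2, 4)` returns `False`, which illustrates that the relative primeness of $N_1$ and $N_2$ is needed for (A2).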

We define the set $\hat{\mathcal{N}}'$ to be

$$\hat{\mathcal{N}}' = \{n \mid n = n_1N_2 + n_2N_1,\ n_1 \in \mathcal{N}_1,\ n_2 \in \mathcal{N}_2\}, \tag{A3}$$

and we first show that

$$\mathcal{N} \subset \hat{\mathcal{N}}' \cup \tilde{\mathcal{N}}. \tag{A4}$$

Since, by the SIR [23],

$$\{n \mid n = n_1N_2 + n_2N_1 \text{ modulo } N,\ n_1 \in \mathcal{N}_1,\ n_2 \in \mathcal{N}_2\} = \mathcal{N}, \tag{A5}$$

it is sufficient to show that for each $n$ in $\hat{\mathcal{N}}'$ with $n \geq N$, we can find $n - N$ in $\tilde{\mathcal{N}} \cap \mathcal{N}$. Let $n' = n_1'N_2 + n_2'N_1$ be an element in $\hat{\mathcal{N}}'$ with $n' \geq N$. We want to show that we can identify the element $n' - N$ in $\tilde{\mathcal{N}} \cap \mathcal{N}$. Let $n''$ denote the element of $\tilde{\mathcal{N}}$ which is at $(n_1, n_2) = (n_1', N_2 - n_2')$. Then

$$n'' = |n_1'N_2 - (N_2 - n_2')N_1| = |n_1'N_2 + n_2'N_1 - N_1N_2| = |n' - N| = n' - N. \tag{A6}$$

Thus, (A4) is verified.

Now, since $\hat{\mathcal{N}}' \cap \mathcal{N} \subset \hat{\mathcal{N}}$ and $\mathcal{N} \subset \hat{\mathcal{N}}' \cup \tilde{\mathcal{N}}$, we have $\mathcal{N} \subset \hat{\mathcal{N}} \cup \tilde{\mathcal{N}}$. Therefore, to prove the relation in (A2), it suffices to show that for each element in $\hat{\mathcal{N}} \uplus \tilde{\mathcal{N}}$, we can identify another identical element in the set. [...] complete.

APPENDIX B

In this appendix, we prove that $g$ is a one-to-one mapping from $\mathcal{N}$ to $\mathcal{N}_1 \times \mathcal{N}_2$.

Let $\mathcal{N}_A$, $\mathcal{N}_B$, $\mathcal{N}_C$, and $\mathcal{N}_D$ denote subsets of $\mathcal{N}$ such that

$$\mathcal{N}_A = \{k \mid k \in \mathcal{N},\ \bar{k}_1 < N_1 \text{ and } \bar{k}_2 < N_2\}, \tag{A7a}$$

$$\mathcal{N}_B = \{k \mid k \in \mathcal{N},\ \bar{k}_1 < N_1 \text{ and } \bar{k}_2 \geq N_2\}, \tag{A7b}$$

$$\mathcal{N}_C = \{k \mid k \in \mathcal{N},\ \bar{k}_1 \geq N_1 \text{ and } \bar{k}_2 < N_2\}, \tag{A7c}$$

$$\mathcal{N}_D = \{k \mid k \in \mathcal{N},\ \bar{k}_1 \geq N_1 \text{ and } \bar{k}_2 \geq N_2\}. \tag{A7d}$$

Then $(k_1, k_2)$ of (20) is obtained by taking $(\bar{k}_1, \bar{k}_2)$ for all $k$ in $\mathcal{N}_A$; $(\bar{k}_1, 2N_2 - 1 - \bar{k}_2)$ for all $k$ in $\mathcal{N}_B$; $(2N_1 - 1 - \bar{k}_1, \bar{k}_2)$ for all $k$ in $\mathcal{N}_C$; and $(2N_1 - 1 - \bar{k}_1, 2N_2 - 1 - \bar{k}_2)$ for all $k$ in $\mathcal{N}_D$. Also, we have the relation

$$\mathcal{N}_A \uplus \mathcal{N}_B \uplus \mathcal{N}_C \uplus \mathcal{N}_D = \mathcal{N} \tag{A8}$$

for the operation $\uplus$ defined in Appendix A. Therefore, to prove the one-to-oneness of $g$, it suffices to show that no two $(k_1, k_2)$ taken from the above four sets can be identical. We prove this by contradiction.

To begin with, we suppose that a $(k_1, k_2)$ obtained from $\hat{k}$ in $\mathcal{N}_A$ is identically obtained from $\tilde{k}$ in $\mathcal{N}_B$. Then, by (20), we have

$$\hat{k} = 2a_1N_1 + k_1, \tag{A9a}$$

$$\hat{k} = 2a_2N_2 + k_2, \tag{A9b}$$

and

$$\tilde{k} = 2b_1N_1 + k_1, \tag{A10a}$$

$$\tilde{k} = 2b_2N_2 + (2N_2 - 1 - k_2), \tag{A10b}$$

where $a_1$, $a_2$, $b_1$, $b_2$ are integers. By (A9a) and (A10a), we have

$$\hat{k} - \tilde{k} = 2(a_1 - b_1)N_1, \tag{A11a}$$

and, by (A9b) and (A10b), we obtain

$$\hat{k} + \tilde{k} = 2(a_2 + b_2 + 1)N_2 - 1. \tag{A11b}$$

But this is a contradiction because, by (A11a), $\hat{k}$ and $\tilde{k}$ are both even or both odd, while by (A11b) only one is even and the other one is odd.

Second, we suppose that a $(k_1, k_2)$ obtained from $\hat{k}$ in $\mathcal{N}_A$ is identically obtained from $\tilde{k}$ in $\mathcal{N}_D$. Then, by (20), we obtain

$$\tilde{k} = 2c_1N_1 + (2N_1 - 1 - k_1), \tag{A12a}$$

$$\tilde{k} = 2c_2N_2 + (2N_2 - 1 - k_2), \tag{A12b}$$

where $c_1$, $c_2$ are integers. By (A9a) and (A12a), we have

$$\hat{k} + \tilde{k} = 2(a_1 + c_1 + 1)N_1 - 1, \tag{A13a}$$

and, by (A9b) and (A12b),

$$\hat{k} + \tilde{k} = 2(a_2 + c_2 + 1)N_2 - 1. \tag{A13b}$$

Since the smallest integer $\hat{k} + \tilde{k}$ that satisfies both (A13a) and (A13b) is $2N_1N_2 - 1$, the fact that $\hat{k}$ is in $\mathcal{N}$ implies that $\tilde{k}$ is not in $\mathcal{N}$, and vice versa. Therefore, the assumption leads to a contradiction.

We can show the other four cases in a similar manner, by applying the above reasonings. This proves the one-to-oneness of $g$.

REFERENCES

[1] N. Ahmed, T. Natarajan, and K. R. Rao, "Discrete cosine transform," IEEE Trans. Comput., vol. C-23, pp. 90-93, Jan. 1974.
[2] N. Hamidi and J. Pearl, "Comparison of the cosine and Fourier transforms of Markov-1 signals," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-24, pp. 428-429, Oct. 1976.
[3] R. Zelinski and P. Noll, "Adaptive transform coding of speech signals," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 299-309, Aug. 1977.
[4] J. M. Tribolet and R. E. Crochiere, "Frequency domain coding of speech," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 512-530, Oct. 1979.
[5] W. H. Chen and C. H. Smith, "Adaptive coding of monochrome and color images," IEEE Trans. Commun., vol. COM-25, pp. 1285-1292, Nov. 1977.
[6] J. A. Rose and G. S. Robinson, "Interframe cosine transform image coding," IEEE Trans. Commun., vol. COM-25, pp. 1329-1339, Nov. 1977.
[7] R. C. Reininger and J. D. Gibson, "Distributions of the two-dimensional DCT coefficients for images," IEEE Trans. Commun., vol. COM-31, pp. 835-839, June 1983.
[8] N. B. Nill, "A visual model weighted cosine transform for image compression and quality assessment," IEEE Trans. Commun., vol. COM-33, pp. 551-557, June 1985.
[9] M. J. Narasimha and A. M. Peterson, "Design of a 24-channel transmultiplexer," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-27, pp. 752-762, Dec. 1979.
[10] M. J. Narasimha, P. P. N. Yang, B. G. Lee, and M. L. Abell, "The TM-7800-M1: A 60-channel CCITT transmultiplexer," in Proc. IEEE Int. Conf. Commun., May 1984.
[11] W. H. Chen, C. H. Smith, and S. C. Fralick, "A fast computational algorithm for the discrete cosine transform," IEEE Trans. Commun., vol. COM-25, pp. 1004-1009, Sept. 1977.
[12] M. J. Narasimha and A. M. Peterson, "On the computation of discrete cosine transform," IEEE Trans. Commun., vol. COM-26, pp. 934-936, June 1978.
[13] J. Makhoul, "A fast cosine transform in one and two dimensions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-28, pp. 27-34, Feb. 1980.
[14] B. G. Lee, "A new algorithm to compute the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 1243-1245, Dec. 1984.
[15] Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-32, pp. 803-816, Aug. 1984.
[16] H. V. Sorensen, D. L. Jones, M. T. Heideman, and C. S. Burrus, "Real-valued fast Fourier transform algorithms," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 849-863, June 1987.
[17] H. S. Malvar, "Fast computation of the discrete cosine transform and the discrete Hartley transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1484-1485, Oct. 1987.
[18] H. S. Hou, "A fast recursive algorithm for computing the discrete cosine transform," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-35, pp. 1455-1461, Oct. 1987.
[19] P. P. N. Yang, M. J. Narasimha, and B. G. Lee, "A prime factor decomposition algorithm for the computation of discrete cosine transform," in Proc. IEEE Int. Conf. Comput., Syst., Signal Processing, Dec. 1984.
[20] C. S. Burrus, "Index mappings for multidimensional formulation of the DFT and convolutions," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-25, pp. 239-242, June 1977.
[21] C. S. Burrus and P. W. Eschenbacher, "An in-place in-order prime factor FFT algorithm," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-29, pp. 806-817, Aug. 1981.
[22] S. Chu and C. S. Burrus, "A prime-factor algorithm using distributed arithmetics," IEEE Trans. Acoust., Speech, Signal Processing, vol. ASSP-30, pp. 217-227, Apr. 1982.
[23] D. F. Elliott and K. R. Rao, Fast Transforms: Algorithms, Analyses, Applications. New York: Academic, 1982.

Byeong Gi Lee (S'80-M'82) was born in Uaechon, Korea, on May 12, 1951. He received the B.S. and M.E. degrees in 1974 and 1978, respectively, from Seoul National University, Seoul, and Kyungpook National University, Taegu, Korea, both in electronics engineering; and the Ph.D. degree in 1982 from the University of California, Los Angeles, in electrical engineering.

From 1974 to 1979 he was with the Department of Electronics Engineering of ROK Naval Academy, Chinhae, Korea, as an Instructor and Naval Officer in active service. From 1982 to 1984 he worked for Granger Associates, Santa Clara, CA, as a Senior Engineer doing research and development on applications of digital signal processing to digital transmission. During the period 1984 to 1986 he worked for AT&T Bell Laboratories, North Andover, MA, as a member of the Technical Staff participating in lightwave transmission system development along with related standard works. Since September 1986 he has been with the Department of Electronics Engineering, Seoul National University, Seoul, Korea. His current fields of interest include theory and applications of digital signal processing, digital transmission and lightwave transmission systems, and circuit theory. He is the author of Electronics Engineering Experiment Series (5 volumes, all in Korean) and holds four U.S. patents.

Dr. Lee received the 1984 Myril B. Reed Best Paper Award from the Midwest Symposium on Circuits and Systems, and exceptional contribution awards from AT&T Bell Laboratories. He is a member of Sigma Xi.