<<

COOLEY-TUKEY FFT LIKE FOR THE DCT

Markus Pusc¨ hel

Department of Electrical and Computer Engineering Carnegie Mellon University Pittsburgh, U.S.A.

ABSTRACT All algorithms in this class have a large degree of parallelism, low arithmetic cost, and a regular, flexible, recursive structure, The Cooley-Tukey FFT decomposes a discrete Fourier which makes them suitable for efficient hardware implementation transform (DFT) of size n = km into smaller DFTs of size k and or for automatic software generation and adaptation [2, 3]. Only m. In this paper we present a theorem that decomposes a polyno- few special cases in this class are known from the literature. We mial transform into smaller transforms, and show that present the mathematical framework in Section 2; the algorithm the FFT is obtained as a special case. Then we use this theorem derivation and analysis is in Section 3. to derive a new class of recursive algorithms for the discrete co- sine transforms (DCTs) of type II and type III. In contrast to other 2. POLYNOMIAL ALGEBRAS AND TRANSFORMS approaches, we manipulate polynomial algebras instead of trans- form entries, which makes the derivation transparent, con- A A that permits multiplication of elements such that cise, and gives insight into the algorithms’ structure. The derived the distributive law holds is called an algebra. Examples include algorithms have a regular structure and, for 2-power size, minimal C and the set C[x] of with complex coefficients. arithmetic cost (among known DCT algorithms). Polynomial Algebra. Let p(x) be a polynomial of degree deg(p) = n. Then, A = C[x]/p(x) = {q(x) | deg(q) < n}, 1. INTRODUCTION the set of residue classes modulo p, is an n-dimensional algebra with respect to the addition of polynomials, and the polynomial The celebrated Cooley-Tukey fast (FFT) algo- multiplication modulo p. We call A a polynomial algebra. rithm [1] decomposes a discrete Fourier transform (DFT) of size Polynomial Transform. For the remainder of this paper, we n = km into smaller DFTs of size k and m. The algorithm’s in- assume that the polynomial p(x) in A = C[x]/p(x) has pair- wise distinct zeros α = (α0, . . . , αn 1). Then, the Chinese Re- herent versatility—due to the degree of freedom in factoring n and − due to its structure—has proven very useful for its implementation mainder Theorem decomposes A into a Cartesian product of one- on a variety of diverse computing platforms. A recent example is dimensional algebras as automatic platform adaptation of FFT software through algorith- C[x]/p(x) → C[x]/(x − α0) ⊕ . . . ⊕ C[x]/(x − αn 1), mic search [2, 3, 4]. − q(x) 7→ (q(α0), . . . , q(αn 1)). (1) In this paper we derive a new class of algorithms, analogous to − the FFT, for the discrete cosine transforms (DCTs) of type II and If we choose b = (p0, . . . , pn 1), deg(pi) < n, as a basis for 0 − III. The discovery of this class and its derivation is made possible A, and bk = (x ) as the basis in each C[x]/(x − αk), then the through an approach using polynomial algebras. decomposition in (1) is given by the polynomial transform Algebraic Approach to the DCTs. There is a large number Pb,α = [p`(αk)]0 k,`

k` 2π√ 1/n DFTn = [ωn ]0 k,`

n (III) Lm : jk + i 7→ im + j, 0 ≤ i < k, 0 ≤ j < m, DCTn = [T`(cos(k + 1/2)π/n)]0 k,`

n n Definition 1 Let p(x) = Tn(x) − cos rπ with list of zeros α = DFTn = Lm(Ik ⊗ DFTm) Tm(DFTk ⊗ Im), (10) (cos r1π, . . . , cos rn 1π), computed from (16), normalized to sat- − n isfy 0 ≤ ri ≤ 1, and ordered as ri < rj for i < j. Further, where Tm is the diagonal twiddle matrix. By transposing (10) we let d = (T0, . . . , Tn 1) be the basis of C[x]/p(x). Then we obtain a different recursion (III) − call DCTn (rπ) = Pd,α a skew DCT of type III. In particular, n n (III) (III) DFTn = (DFTk ⊗ Im) Tm(Ik ⊗ DFTm) Lk (11) DCTn (π/2) = DCTn . We start the derivation with the matrix B = Bn,k in (7). The basis with the bidiagonal matrix b0 in (6) is given by 1 1 b0 = (T0T0(Tm), . . . , Tm 1T0(Tm),  1 1  − . . . . . Sk =  .. ..  . (20)   T0Tk 1(Tm), . . . , Tm 1Tk 1(Tm))  1 1  − − −   = (Tjm i/2 + Tjm+i/2)0 i

(III) (II) T (III) n (III) DCTn = DCTn (π/2) , DCTn (rπ) = Km DCTm (riπ) (II) T n (II) T 0Mi

Qn,k (III) T n = (Ik ⊕(Im 1 ⊗Sk)) , DCTm (riπ) Mk , (24) − 0Mi

(III) n (III) [13] G. Bi and L. W. Yu, “DCT Algorithms for Composite Se- DCTn (rπ) = Km(DCTm (rπ/2) quence Lengths,” IEEE Trans. on Signal Processing, vol. 46, (III) no. 3, pp. 554–562, 1998. ⊕ DCTm ((1 − r)π/2))(F2 ⊗ Im)An, (30)