Matrix-Vector Based Fast Fourier Transformations on SDR Architectures
Adv. Radio Sci., 6, 89–94, 2008
www.adv-radio-sci.net/6/89/2008/
Advances in Radio Science
© Author(s) 2008. This work is distributed under the Creative Commons Attribution 3.0 License.
Published by Copernicus Publications on behalf of the URSI Landesausschuss in der Bundesrepublik Deutschland e.V.

Y. He (2), K. Hueske (1), J. Götze (1), and E. Coersmeier (2)
(1) University of Dortmund, Information Processing Lab, Otto-Hahn-Str. 4, 44221 Dortmund, Germany
(2) Nokia Research Center, Bochum, Meesmannstr. 103, 44807 Bochum, Germany

Correspondence to: K. Hueske ([email protected])

Abstract. Today Discrete Fourier Transforms (DFTs) are applied in various radio standards based on OFDM (Orthogonal Frequency Division Multiplex). It is important to gain a fast computational speed for the DFT, which is usually achieved by using specialized Fast Fourier Transform (FFT) engines. However, in face of the Software Defined Radio (SDR) development, more general (parallel) processor architectures are often desirable, which are not tailored to FFT computations. Therefore, alternative approaches are required to reduce the complexity of the DFT. Starting from a matrix-vector based description of the FFT idea, we will present different factorizations of the DFT matrix, which allow a reduction of the complexity that lies between the original DFT and the minimum FFT complexity. The computational complexities of these factorizations and their suitability for implementation on different processor architectures are investigated.

1 Introduction

Recently multi-carrier modulation techniques, such as OFDM, have received great attention in high-speed data communication systems and have been selected for several communication standards (e.g. WLAN, DVB and WiMax). A central baseband signal processing task required in OFDM transceivers is the Fourier transform of the received data, which is generally realized as FFT (Cooley, 1965). To meet the real-time processing requirements, various specialized FFT processors have been proposed (Son, 2002). On the other hand Software Defined Radio (SDR) approaches for mobile terminals are gaining momentum (Dillinger, 2003). To meet the demand for high computing power and flexibility, specific reconfigurable accelerators (Heyne, 2004) or parallel processor architectures (Berthold, 2004) are used. The DFT can be implemented as FFT, which defines the optimum in terms of mathematical complexity, but as the used architectures are not necessarily optimized for butterfly structures, the question arises: Are there any efficient procedures, which reduce the computational complexity of the DFT significantly (as far as possible to the FFT complexity), but at the same time are more tailor-made to specific hardware architectures than the FFT?

In this paper a matrix-vector based formulation of the FFT algorithm (Van Loan, 1992) is derived from the respective matrix factorizations of the DFT matrix. Due to the structure of the butterfly matrices and the matrix factorizations, the matrix-vector based FFT allows several separations of the resulting matrix product. These separations have an increased computational complexity compared to the FFT, but because of their mathematical structure they allow more flexible implementations on different architectures (e.g. multi processors, vector processors, accelerated systems).

The paper is organized as follows: In Sect. 2 we will clarify the definition and notation of the DFT. A short repetition of the FFT idea and the corresponding factorization is described in Sect. 3. Then, in Sect. 4 different separations of the butterfly matrices are presented, which lead to different algorithmic structures and different computational complexities. Implementation issues of these separations and their suitability for different hardware architectures are discussed. Conclusions are given in Sect. 5.

2 Discrete Fourier Transform (DFT)

The sequence of $N$ (with $N = 2^m$, $m \in \mathbb{N}$) complex numbers $x_1, \cdots, x_N$ is transformed into the sequence of $N$ complex numbers $y_1, \cdots, y_N$ by the DFT according to

$$ y_t = \sum_{n=1}^{N} x_n \, w_N^{(t-1)(n-1)}, \quad t = 1, \cdots, N, \tag{1} $$

where $w_N = e^{-j \frac{2\pi}{N}}$ is a primitive $N$-th root of unity. The DFT can also be formulated as a matrix vector product (Van Loan, 1992). With $x_N = [x_1 \cdots x_N]^T \in \mathbb{C}^N$ and $y_N = [y_1 \cdots y_N]^T \in \mathbb{C}^N$, the DFT can be described as matrix vector product $y_N = F_N \cdot x_N$, where $F_N \in \mathbb{C}^{N \times N}$ is the DFT-Matrix with elements $f_{nt} = w_N^{(t-1) \cdot (n-1)}$ and $n, t = 1, \cdots, N$.

3 Matrix-vector based FFT

To give a repetition of the known radix-2 FFT algorithms (Cooley, 1965), this section will show the connection between the matrix $F_N$ and the matrix $F_{N/2}$, which enables us to compute an $N$-point DFT from a pair of $(N/2)$-point DFTs. To transfer the FFT idea to the DFT matrix, we introduce a so-called permutation matrix $P_N$ of size $(N \times N)$, which performs an odd-even ordering of the rows (multiplied from the left with $P_N$) or columns (multiplied from the right with $P_N^T$) of a matrix. Furthermore $P_N^T \cdot P_N = I_N$ holds. The following factorizations and separations are shown for the decimation in time (DIT) FFT. They can be easily transferred to the decimation in frequency (DIF) FFT.

3.1 FFT-Matrix decomposition

Applying $P_N^T$ to the right of the DFT-matrix $F_N$ yields

$$ F'_N = F_N \cdot P_N^T = \begin{bmatrix} F_{N/2} & D_{N/2} \cdot F_{N/2} \\ F_{N/2} & -D_{N/2} \cdot F_{N/2} \end{bmatrix}. \tag{2} $$

The resulting product $F'_N$ has the following structure: The odd indexed columns, forming the left half of $F'_N$, are composed of $F_{N/2}$, while the even indexed columns, forming the right half of $F'_N$, are composed of $D_{N/2} \cdot F_{N/2}$ and $-D_{N/2} \cdot F_{N/2}$. This matrix can be decomposed into the product of two matrices (as shown in Eq. 3),

$$ F'_N = \begin{bmatrix} I_{N/2} & D_{N/2} \\ I_{N/2} & -D_{N/2} \end{bmatrix} \cdot \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} \tag{3} $$

where $D_{N/2}$ is a diagonal matrix with elements $d_{nn} = w_N^{(n-1)}$ and $n = 1, \cdots, N/2$. The first matrix represents FFT butterfly operations, while the second matrix contains two independent DFTs of length $N/2$. Note that the second matrix is a block diagonal matrix.

3.2 Matrix-Vector based computation of the FFT

To apply the modified DFT $F'_N$ to an input vector $x_N$, a permutation of $x_N$ is required, i.e.

$$ P_N \cdot x_N = [x_o \; x_e]^T, \tag{4} $$

with $x_o \in \mathbb{C}^{N/2}$ the odd and $x_e \in \mathbb{C}^{N/2}$ the even indexed part of $x_N$. Introducing $y_1, y_2 \in \mathbb{C}^{N/2}$ we can formulate the DFT as follows:

$$ y_N = \begin{bmatrix} y_1 \\ y_2 \end{bmatrix} = F_N \cdot x_N = F_N \cdot \underbrace{P_N^T \cdot P_N}_{I_N} \cdot x_N = \begin{bmatrix} I_{N/2} & D_{N/2} \\ I_{N/2} & -D_{N/2} \end{bmatrix} \cdot \begin{bmatrix} F_{N/2} & 0 \\ 0 & F_{N/2} \end{bmatrix} \cdot \begin{bmatrix} x_o \\ x_e \end{bmatrix} \tag{5} $$

Using $y_o = F_{N/2} \cdot x_o$ and $y_e = F_{N/2} \cdot x_e$ we obtain:

$$ y_1 = y_o + D_{N/2} \cdot y_e = y_o + d_{N/2} \cdot\ast \, y_e \tag{6} $$

$$ y_2 = y_o - D_{N/2} \cdot y_e = y_o - d_{N/2} \cdot\ast \, y_e. \tag{7} $$

With $d_{N/2}$ the vectorized diagonal of $D_{N/2}$, with $(\cdot\ast)$ the component-wise vector multiplication and taking into account that the two products are identical, Eq. (6) and Eq. (7) need altogether $N/2$ multiplications and $2 \cdot N/2$ additions. It is easy to see that a DFT of size $N$ can be executed by two DFTs of size $N/2$. For $N = 2^m$ this process can be recursively repeated $m = \log_2 N$ times. The required number of operations then will be $\frac{N}{2} \log_2 N$ multiplications and $N \log_2 N$ additions, which represents the complexity of the FFT.

3.3 Factorization of the DFT matrix

To obtain alternative factorizations of the DFT matrix, the number of recursions described in the previous section can be modified. Introducing a new parameter $k$ we can define the number of performed recursions as $m-k$ with $N = 2^m$ giving the size of the initial DFT matrix. For arbitrary $m$ and $k$ the resulting block submatrices in $F_B$ are of size $(2^k \times 2^k)$ and can be executed as independent DFTs (or FFTs). Because the length of the input vector $x_N$ is given by $N = 2^m$ the number of block submatrices equals $2^m / 2^k = 2^{m-k}$. After calculating $2^{m-k}$ independent DFTs, $m-k$ butterfly matrices $B_i$ have to be applied to finish the calculation. The entire DFT is then given as

$$ y_N = B_{m-k} \cdot B_{m-k-1} \cdots B_1 \cdot F_B \cdot x_p, \tag{8} $$

with $x_p$ the $(m-k)$-times permuted input vector $x_N$. An example for $m = 4$ and $k = 2$ is given below.

Example 1:

$$ y_N = \underbrace{\begin{bmatrix} I_8 & D_8 \\ I_8 & -D_8 \end{bmatrix}}_{B_2} \cdot \underbrace{\begin{bmatrix} I_4 & D_4 & 0 & 0 \\ I_4 & -D_4 & 0 & 0 \\ 0 & 0 & I_4 & D_4 \\ 0 & 0 & I_4 & -D_4 \end{bmatrix}}_{B_1} \cdot \underbrace{\begin{bmatrix} F_{N/4} & 0 & 0 & 0 \\ 0 & F_{N/4} & 0 & 0 \\ 0 & 0 & F_{N/4} & 0 \\ 0 & 0 & 0 & F_{N/4} \end{bmatrix}}_{F_B} \cdot x_p $$

[Figure: sparsity plots of the butterfly matrices B1, B2, B3 and of the products B2·B1 and B3·B2·B1 (panels labelled nz = 512, 512, 512, 1024), and block diagrams a), b), c) of y16 computed from F4 blocks applied to xp. Caption not recovered.]

3.4 Construction of butterfly matrices

Increasing the order of recursion $m-k$ will also increase the number of butterfly matrices $B_i$. While the independent DFT submatrices in $F_B$ can be processed independently, the butterfly matrices can be treated in different ways:

· As sequence of matrix multiplications

$$ \prod_{i=0}^{m-k-1} B_{m-k-i} $$

· As one matrix multiplication with the butterfly product matrix

$$ B = \prod_{i=0}^{m-k-1} B_{m-k-i} $$

· As partly combined matrix multiplication with the products

$$ \prod_{i=0}^{p_1} B_{m-k-i} \; \prod_{i=p_1-1}^{p_2} B_{m-k-i} \; \cdots \prod_{i=p_r-1}^{m-k-1} B_{m-k-i} $$

with arbitrarily chosen integer numbers p fulfilling (1<p <p <.
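
The factorization of Eq. (8) in Sect. 3.3 can be checked numerically with a short sketch. The Python code below is illustrative only and not taken from the paper: the function names (`dft`, `permute`, `butterfly_stage`, `partial_fft`) and the 0-based indexing are assumptions made for this example. It applies the odd-even permutation of Eq. (4) $m-k$ times, evaluates the $2^{m-k}$ independent DFTs of size $2^k$ that form $F_B$, and then applies the butterfly matrices $B_1, \ldots, B_{m-k}$ one after another, i.e. as a sequence of matrix multiplications.

```python
import cmath


def dft(x):
    """Direct DFT of Eq. (1), written with 0-based indices."""
    n = len(x)
    return [
        sum(x[j] * cmath.exp(-2j * cmath.pi * t * j / n) for j in range(n))
        for t in range(n)
    ]


def permute(x, stages):
    """Apply the odd-even reordering of Eq. (4) recursively `stages` times.

    The 1-based odd entries x1, x3, ... are x[0::2] in 0-based indexing.
    """
    if stages == 0:
        return list(x)
    return permute(x[0::2], stages - 1) + permute(x[1::2], stages - 1)


def butterfly_stage(y, block):
    """Apply one butterfly matrix built from blocks [[I, D], [I, -D]].

    `block` is the size of each diagonal block; D holds the twiddle
    factors w_block^j for j = 0, ..., block/2 - 1 (Eqs. 6 and 7).
    """
    half = block // 2
    out = list(y)
    for start in range(0, len(y), block):
        for j in range(half):
            d = cmath.exp(-2j * cmath.pi * j / block)
            a = y[start + j]
            b = d * y[start + half + j]  # one product shared by both outputs
            out[start + j] = a + b
            out[start + half + j] = a - b
    return out


def partial_fft(x, k):
    """Evaluate y = B_{m-k} ... B_1 F_B x_p (Eq. 8) for len(x) = 2**m."""
    n = len(x)
    m = n.bit_length() - 1
    sub = 2 ** k
    xp = permute(x, m - k)
    # F_B: block diagonal matrix of 2**(m-k) independent DFTs of size 2**k
    y = []
    for start in range(0, n, sub):
        y.extend(dft(xp[start:start + sub]))
    # Butterfly matrices B_1, ..., B_{m-k}, applied in sequence
    block = 2 * sub
    while block <= n:
        y = butterfly_stage(y, block)
        block *= 2
    return y


if __name__ == "__main__":
    # m = 4, k = 2 as in Example 1: four 4-point DFTs, two butterfly stages
    x = [complex(i, 0) for i in range(16)]
    y = partial_fft(x, k=2)
    err = max(abs(a - b) for a, b in zip(y, dft(x)))
    print("max deviation from direct DFT:", err)
```

In this sketch, `k = 0` reduces to a full radix-2 FFT and `k = m` to the direct DFT, so the parameter moves the cost between the two extremes described in Sect. 3.3. Combining several butterfly stages into one product matrix, as in the other options of Sect. 3.4, is not shown here.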