The 5th International Conference on Electrical Engineering and Informatics 2015, August 10-11, 2015, Bali, Indonesia

A Novel 4-Point Discrete Fourier Transforms Circuit based on Product of Rademacher Functions

Zulfikar1, Hubbul Walidainy2 Department of Electrical Engineering Syiah Kuala University Banda Aceh 23111, Indonesia. E-mail: [email protected], [email protected]

Abstract: This paper presents a new circuit design for implementing the 4-point DFT algorithm based on the product of Rademacher functions. The circuit is derived from the similarity between the ways Fourier transforms and Walsh transforms are implemented. Walsh matrices contain only the numbers +1 and -1, and the first row contains only +1. Similarly, the entries of the 4-point DFT matrix are either positive or negative, except for the first row, which is entirely positive. This similarity is used to implement the DFT circuit in the same way that Walsh transforms are generated. Since Walsh transforms are derived from the product of Rademacher functions, the proposed 4-point DFT circuit is designed according to the product of Rademacher functions. The circuit consists of a negative circuit, multiplexers, accumulators (real and imaginary), buffers, and a control circuit. The control circuit produces the Rademacher functions that control and manage the data flow. The 4-point DFT circuit has been successfully designed and implemented on an FPGA platform. Among the selected chips, Artix 7 is the fastest.

Keywords: DFT; Walsh transforms; Rademacher functions; DFT matrix; Walsh matrix

The authors gratefully acknowledge the financial support from Syiah Kuala University, Ministry of Education and Culture, Indonesia under project Hibah Bersaing, No. 035/SP2H/PL/Dit.Litabmas/II/2015, Feb 5, 2015.

I. INTRODUCTION

Digital signal processing is used in almost all electronic devices, and signal processing has become a necessity. Data or signals, whether from external or internal sources, have to be processed for specific purposes using a specific application or processing method. Often, data have to be transformed to another domain for easier processing.

The most widely used transformation is the Fourier transform. In the discrete case, the Discrete Fourier Transform (DFT) is often used to convert a signal or data to the frequency domain, where it is easier to process the data and extract information from it. This powerful transformation model has been in use for a long time.

The DFT is inefficient when implemented directly. Many researchers have proposed simplifications of the DFT data transformation process. Those simplifications led to the development of Fast Fourier Transform (FFT) algorithms. Over the past several decades, researchers have developed FFT implementations such as Radix-2, Radix-4, Split Radix, the Prime Factor Algorithm (PFA), and the Winograd Fourier Transform Algorithm (WFTA) [1]. Those algorithms have been developed for particular purposes, and each of them comes with advantages and drawbacks.

The Walsh transform algorithm is simpler than the Fourier transform, but this transformation model is rarely used in applications and is almost forgotten. Walsh transforms share some similarities with Fourier transforms [2]-[5]. Based on this, some researchers have adopted the Walsh algorithm to develop more efficient Fourier transforms [6]-[8]. An algorithm for calculating the DFT using Walsh transforms was developed through the factorization of an intermediate transform T [6]. Hamood and Boussakta proposed an efficient combination of Walsh and DFT calculation based on the Radix-4 fast Walsh Hadamard Transform (FWHT) [7]. Another efficient technique for calculating both the DFT and Walsh transforms using a Radix-2 algorithm has also been published [8].

The previous combined algorithms are designed for parallel input and output data. This requires a large amount of memory, which is not suitable for small circuit applications. Therefore, we propose an algorithm that minimizes memory use by taking the input serially and delivering the output in parallel. Such an algorithm can be achieved using the product of Rademacher functions. The design has been carried out for the 4-point DFT; details of the design are covered in Section III.

In this paper, the basic theory of Walsh transforms and Fourier transforms is covered in the next section. Section III provides the detailed design of both the algorithm and the circuit for the 4-point DFT. Section IV presents the FPGA implementation of the proposed algorithm and its discussion. Finally, the conclusion and some suggestions for future work are presented in Section V.

II. BACKGROUND THEORY

A. Fourier Transforms

The Fourier transform is a tool that converts a waveform (a function or signal) into an alternate representation characterized by cosines and sines. The Fourier transform indicates that any waveform may be rewritten as a sum of sinusoidal functions [9].

Usually, the Fourier transform is used to analyze the frequency content of a signal, to design a system or filter with particular properties, and to solve differential equations in the frequency domain using algebraic operators. It is possible to develop an alternative Fourier representation for finite-duration sequences, referred to as the Discrete Fourier Transform (DFT) [9],[10].

The DFT operates on a signal that is discrete and periodic, and is expressed as follows, where x(n) represents an N-point discrete signal in the time domain:

$$X(k) = \sum_{n=0}^{N-1} x(n)\, W_N^{nk} \tag{1}$$

where $W_N = e^{-j2\pi/N}$; therefore

$$X(k) = \sum_{n=0}^{N-1} x(n)\, e^{-j2\pi kn/N} = \sum_{n=0}^{N-1} x(n) \left(e^{-j2\pi/N}\right)^{nk} \tag{2}$$

Direct computation of the above equations is inefficient because it does not exploit the following symmetry and periodicity properties:

(1) Symmetry property:

$$W_N^{k+N/2} = -W_N^{k} \tag{3}$$

(2) Periodicity property:

$$W_N^{k+N} = W_N^{k} \tag{4}$$

Direct computation of the DFT of a complex-valued sequence x(n) can be expressed further as follows:

$$X_R(k) = \sum_{n=0}^{N-1} \left[ x_R(n)\cos\frac{2\pi kn}{N} + x_I(n)\sin\frac{2\pi kn}{N} \right] \tag{5}$$

$$X_I(k) = -\sum_{n=0}^{N-1} \left[ x_R(n)\sin\frac{2\pi kn}{N} - x_I(n)\cos\frac{2\pi kn}{N} \right] \tag{6}$$

The DFT requires N^2 (N x N) complex multiplications, because each X(k) requires N complex multiplications and there are N values from X(0) to X(N-1). The DFT also requires (N-1)N complex additions, because each X(k) requires N-1 additions [1],[9].

After Cooley and Tukey developed the divide-and-conquer method, many researchers proposed algorithms for simplifying the calculation of the DFT. All of them aim to reduce memory usage and the number of arithmetic operations [1].

B. Walsh Transforms

The Walsh transform performs a symmetric, orthogonal, and linear operation on 2^m real (or complex) numbers. In the discrete case, the Walsh transform is used to transform numbers (information) from the time domain to the frequency domain. A method of transforming information from the time domain, represented in real numbers, is known as the Hadamard transform or Walsh Hadamard transform. This method is also known as the Walsh-Fourier transform, since it is an example of a generalized class of Fourier transforms [11].

The Hadamard transform for transform length N (N-point) can be performed by multiplying the input values X by H to produce the output matrix of transformed coefficients Y as follows:

$$Y = \frac{1}{N} H_N X \tag{7}$$

The Hadamard matrix is a square matrix whose entries are either +1 or -1 and whose rows are mutually orthogonal. This matrix is another name for the Walsh functions in Hadamard ordering [12]. A direct implementation of eq. (7) for transform length N requires N^2 - N additions and subtractions. This large area consumption has challenged many researchers to develop more efficient computation algorithms. One such method for reducing the number of additions and subtractions was introduced in terms of a unified matrix treatment by Fino and Algazi [13]. An implementation of this idea requires N log2 N additions and subtractions. Many workers have adopted the idea of Fino and Algazi to realize the Walsh transform; the Fast Hadamard Transform (FHT) is the most popular one.

Another way to perform Walsh transforms is to evaluate them in terms of the product of Rademacher functions [12]. This approach has attracted many researchers to develop effective and efficient structures. In general, Walsh transforms may be evaluated as

$$Y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k\, \psi(n,t) \tag{8}$$

where $\psi(n,t)$ refers to any ordering of Walsh functions.

III. CIRCUIT DESIGN

The Walsh transform converts a signal from the time domain into the frequency domain in a very simple way. The Walsh matrix contains only the numbers +1 and -1, so the transformation process involves no multiplication. Consider the Walsh matrix for transform length N = 4: performing the transformation requires only additions and subtractions. This has been demonstrated in many works for particular signals [2],[3],[14]-[16]. In spite of this, the transformation method is rarely used in applications.

In contrast, the DFT requires a very complicated algorithm and circuit to perform the transformation. However, the DFT provides more useful information about a signal in the frequency domain. The DFT matrix may contain non-integer and complex numbers, so it requires more circuitry in the transformation process. The way the DFT performs the transformation may imitate the behavior of the Walsh transform, especially for N = 4. Consider the DFT matrix (N = 4) as follows.
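For reference, the 4-point DFT matrix can be written out from eq. (1) with $W_4 = e^{-j\pi/2} = -j$ and placed next to a 4 x 4 Walsh matrix. The symbols $F_4$ and $H_4$ are labels introduced here for illustration, and the Walsh matrix is shown in Hadamard (natural) ordering, which is an assumption; other orderings only permute its rows.

```latex
F_4 =
\begin{bmatrix}
 1 &  1 &  1 &  1 \\
 1 & -j & -1 &  j \\
 1 & -1 &  1 & -1 \\
 1 &  j & -1 & -j
\end{bmatrix},
\qquad
H_4 =
\begin{bmatrix}
 1 &  1 &  1 &  1 \\
 1 & -1 &  1 & -1 \\
 1 &  1 & -1 & -1 \\
 1 & -1 & -1 &  1
\end{bmatrix}
```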

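The relationship between the two matrices for N = 4 can be checked numerically. The following is a minimal Python sketch, not the authors' implementation: the helper names (`rademacher`, `walsh_hadamard_matrix`, `dft4_matrix`) are hypothetical, and Hadamard (natural) ordering is assumed for the Walsh functions. Each Walsh row is built as a product of Rademacher functions, and the DFT entries are taken as exact powers of W_4 = -j.

```python
def rademacher(i, n):
    """Rademacher function r_i at sample n: +1 if bit i of n is 0, else -1."""
    return -1 if (n >> i) & 1 else 1


def walsh_hadamard_matrix(num_bits):
    """Build Walsh rows (Hadamard ordering, assumed) as Rademacher products."""
    size = 1 << num_bits
    matrix = []
    for k in range(size):
        row = []
        for n in range(size):
            value = 1
            for i in range(num_bits):
                if (k >> i) & 1:  # bit i of the row index k selects r_i
                    value *= rademacher(i, n)
            row.append(value)
        matrix.append(row)
    return matrix


# Exact powers of W_4 = exp(-j*2*pi/4) = -j, so no floating-point rounding.
W4_POWERS = [1, -1j, -1, 1j]


def dft4_matrix():
    """4-point DFT matrix: entry (k, n) is W_4 raised to the power n*k."""
    return [[W4_POWERS[(n * k) % 4] for n in range(4)] for k in range(4)]


H4 = walsh_hadamard_matrix(2)
F4 = dft4_matrix()
for walsh_row, dft_row in zip(H4, F4):
    print(walsh_row, dft_row)
```

In this sketch the signs of DFT rows 1, 2, and 3 follow the products r_0 r_1, r_0, and r_1 respectively, while r_0 also marks which entries of rows 1 and 3 are imaginary.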
The matrix is quite similar to the Walsh matrix. The entries of the 4-point DFT matrix are either positive or negative, except for the first row, which is entirely positive. We have developed an algorithm that connects both transform methods. We use the product of Rademacher functions to transform a signal from the time domain to the frequency domain. The method has been extended to distinguish and treat the complex numbers (in this case j and -j). Fig. 1 shows the block diagram and data flow of the proposed 4-point DFT.

[Figure 1: block diagram with input x, Negative Circuits, Multiplexers, Data Buffers, Control Circuits (input Enter), Real Accumulators, Imaginary Accumulators, and Output Buffers delivering Xr0-Xr3 and Xi0-Xi3]

Fig. 1. Blocks and data flow of the designed 4-point DFT circuit

The block diagram shown in Fig. 1 consists of:
- Negative circuit
- Multiplexers
- Data buffers and output buffers
- Real accumulators
- Imaginary accumulators
- Control circuit

Input data x is connected to the negative circuit, the multiplexers, and the data buffers in parallel. The negative circuit provides the negative value of input data x. The multiplexers select either the positive or the negative value of input data x. Input data x is also connected directly to the buffers, bypassing the multiplexers, because the first row of the DFT matrix contains only positive values.

The selected values are passed through the buffers to either the real accumulators or the imaginary accumulators, which are also controlled by signals from the control circuit. Finally, all values stored in the output buffers are passed out as the DFT results. The data buffers are controlled by the control circuit, but the output buffers are not. These buffers store values temporarily before they are passed out.

Fig. 2 presents the algorithm for the block design of Fig. 1. Input data x is passed into the system serially, and the DFT results are taken out in parallel. Every time the signal Enter goes high, one data value x is passed into the system and the Rademacher functions are generated. Buffers F0, F1, F2, and F3 store the input data temporarily. Buffers F1, F2, and F3 hold values temporarily according to the condition of the product of Rademacher functions. If R(1) is not zero, buffer F3 stores the negative value of x. If R(0) is zero, buffer F2 stores the positive value of input data x, and the values in buffers F1 and F3 are accumulated in the real accumulators Real1 and Real3, respectively. Otherwise, if R(0) is not zero, buffer F2 stores the negative value of x, and the values stored in buffers F1 and F3 are treated as imaginary. The values in the real and imaginary accumulators are the DFT results and are passed out of the system when Clock goes high.

Fig. 2. Algorithm of 4-point DFT based on product of Rademacher functions

Fig. 3 shows the control circuit that manages the data flow of the designed 4-point DFT circuit. The control circuit is developed based on Rademacher functions. In practice, Rademacher functions can be generated easily using a counter. The signals are generated based on the product of Rademacher functions. Signals R(0) and R(1) are taken from the Least Significant Bit (LSB) and Most Significant Bit (MSB) of the up counter, respectively. Meanwhile, signal Rxor is the product of R(0) and R(1). These signals are then used to control the multiplexers and accumulators.
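The control signals described above can be sketched in software. This is a minimal illustration under stated assumptions rather than the authors' VHDL: the function name `control_signals` is hypothetical, the 2-bit up counter is assumed to start at zero, a logic 1 is assumed to stand for the Rademacher value -1, and the product of R(0) and R(1) is assumed to be realized as an XOR of the two counter bits (as the name Rxor suggests).

```python
def control_signals(num_samples=4):
    """Yield (count, R0, R1, Rxor) for each Enter pulse of a 2-bit up counter."""
    count = 0  # assumption: the counter starts from zero
    for _ in range(num_samples):
        r0 = count & 1          # R(0): least significant bit of the counter
        r1 = (count >> 1) & 1   # R(1): most significant bit of the counter
        rxor = r0 ^ r1          # XOR of bits mirrors the product of +/-1 values
        yield count, r0, r1, rxor
        count = (count + 1) % 4  # the counter advances on the next Enter pulse


signals = list(control_signals())
for count, r0, r1, rxor in signals:
    print(f"count={count} R(0)={r0} R(1)={r1} Rxor={rxor}")
```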

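The serial-in, parallel-out data flow can be sketched as a short software model. This is an illustrative sketch, not the authors' circuit: the function name `serial_dft4` is hypothetical, the input is assumed to be real-valued, and the signal polarity (a counter bit of 1 selects the negated sample, and R(0) = 1 routes buffers F1 and F3 to the imaginary accumulators) is an assumption chosen so that the model reproduces the 4-point DFT matrix. Only negation, selection, and addition are used.

```python
def serial_dft4(samples):
    """Accumulate a 4-point DFT one sample at a time using add/subtract only."""
    real = [0, 0, 0, 0]  # accumulators Real0..Real3
    imag = [0, 0, 0, 0]  # imaginary accumulators (rows 0 and 2 stay at zero)
    for count, x in enumerate(samples):
        r0 = count & 1          # R(0): counter LSB
        r1 = (count >> 1) & 1   # R(1): counter MSB
        rxor = r0 ^ r1          # product of the two Rademacher signals

        f0 = x                      # first DFT row is all positive
        f1 = -x if rxor else x      # sign of row 1 follows Rxor
        f2 = -x if r0 else x        # sign of row 2 follows R(0)
        f3 = -x if r1 else x        # sign of row 3 follows R(1)

        real[0] += f0
        real[2] += f2
        if r0:                      # odd sample: rows 1 and 3 are imaginary
            imag[1] += f1
            imag[3] += f3
        else:                       # even sample: rows 1 and 3 are real
            real[1] += f1
            real[3] += f3
    return real, imag


xr, xi = serial_dft4([5, 6, 1, 3])
print("Xr =", xr, "Xi =", xi)
```

The accumulators hold partial sums after each sample, so the outputs are only meaningful once all four samples have been entered.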

Fig. 3. Control circuits based on Rademacher functions

Fig. 4 shows the connection configuration of the control signals for the multiplexers. The input data through Mux2 is controlled by signal R(0). If the signal is high, positive x is passed to buffer F2. Otherwise, negative x is selected. The input data through Mux1 is controlled by signal Rxor. When the signal is high, positive x is passed to buffer F1. Similarly, when the control signal R(1) is high, the positive value of the input data is selected and passed to buffer F3.

Fig. 4. Configuration of control signal for multiplexers

Fig. 5 shows the connection configuration of the control signals for the accumulators. Initially, all accumulators are zero. Fig. 5(a) shows the connection of control signal R(0) for enabling accumulator Real1 or Im1. If R(0) is equal to '1', the data in buffer F1 is accumulated in accumulator Im1 (accumulator Real1 is disabled). Otherwise, the data in buffer F1 is accumulated in accumulator Real1. The value stored in buffer F3 is accumulated in accumulator Im3 (accumulator Real3 is disabled) if control signal R(0) is high. The value is accumulated in accumulator Real3 if R(0) is low when the signal Enter goes high. This behavior is shown in Fig. 5(b).

Fig. 5. Configuration of control signal for: (a) Accumulator Real1 and Im1; (b) Accumulator Real3 and Im3

IV. FPGA IMPLEMENTATION

The proposed 4-point DFT design has been implemented on FPGAs. Various Xilinx chips were targeted using the Xilinx ISE 13.4 software. The proposed design is realized in VHDL. Fig. 6 shows the behavioral simulation results of the design. Input data x = [5, 6, 1, 3] is passed into the circuit serially. DFT outputs appear immediately, but they are not the final values, since not all input values have been entered at that time. The correct DFT values appear after all input data have been entered. Here the output DFT is Xr = [15, 4, -3, 4] and Xi = [0, -3, 0, 3], representing the real and imaginary parts, respectively. Input data x is represented as a 4-bit signed number, while the output data are represented as 6-bit signed numbers. This accommodates the range of possible summation results [5],[17].

Fig. 6. Simulation results of the designed DFT circuit
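The reported simulation values and the chosen word lengths can be cross-checked against eq. (1). The sketch below is illustrative only; the names `dft4_direct` and `signed_range` are hypothetical helpers. It evaluates the 4-point DFT of x = [5, 6, 1, 3] directly with exact powers of W_4 = -j, and checks that the sum of four 4-bit signed values always fits in a 6-bit signed number.

```python
# Exact powers of W_4 = exp(-j*2*pi/4) = -j.
W4_POWERS = [1, -1j, -1, 1j]


def dft4_direct(samples):
    """Evaluate X(k) = sum_n x(n) * W_4^(n*k) for k = 0..3."""
    return [
        sum(x * W4_POWERS[(n * k) % 4] for n, x in enumerate(samples))
        for k in range(4)
    ]


def signed_range(bits):
    """Smallest and largest values of a two's-complement number."""
    return -(1 << (bits - 1)), (1 << (bits - 1)) - 1


spectrum = dft4_direct([5, 6, 1, 3])
xr = [int(value.real) for value in spectrum]
xi = [int(value.imag) for value in spectrum]

in_lo, in_hi = signed_range(4)      # 4-bit signed input samples
out_lo, out_hi = signed_range(6)    # 6-bit signed outputs
worst_lo, worst_hi = 4 * in_lo, 4 * in_hi  # at most four samples are summed

print("Xr =", xr, "Xi =", xi)
print("worst-case sum range:", (worst_lo, worst_hi), "6-bit range:", (out_lo, out_hi))
```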

Table I compares the speed of the proposed circuit on various Xilinx chips. The proposed DFT circuit performs best on Artix 7 in terms of the speed parameters: maximum frequency, input arrival time, and output required time. The Kintex 7 chip is almost as fast as Artix 7. Spartan 3E and Spartan 6 are much slower than the other chips because of their older technology.

In terms of occupied area, the older chips (Spartan 3E and Spartan 6) use less area than the others. They require only 26 slice registers and 48 slice Look-Up Tables (LUTs). The newer chips use more area. This might be because the newer chips are equipped with 6-input LUT technology, which is not very efficient for small circuit implementations. The comparison of occupied area is listed in Table II.

TABLE I. SPEED COMPARISON AMONG XILINX CHIPS

Chips        Max frequency (MHz)   Input arrival time (ns)   Output required time (ns)
Virtex 7     555                   0.820                     0.737
Spartan 6    352                   2.631                     3.597
Kintex 7     631                   0.758                     0.681
Artix 7      635                   0.704                     0.640
Spartan 3E   296                   3.920                     4.040

TABLE II. OCCUPIED AREA COMPARISON AMONG XILINX CHIPS

Chips        Slice registers   Slice LUTs
Virtex 7     43                56
Spartan 6    26                48
Kintex 7     43                56
Artix 7      43                56
Spartan 3E   26                48

The following data are taken from the static timing report of the design implemented on the Artix 7 chip. The maximum setup and hold times to the clock edge vary from 0.333 to 0.558 ns and from 1.403 to 1.693 ns, respectively. Some clock-to-pad data are also shown: node XR1(2) is the slowest (6.558 ns) and node XR2(1) is the fastest (2.759 ns). Clock to setup on destination clock Enter is about 1.608 ns (rise), and clock to setup on destination clock clk is about 2.519 ns (rise).

Device,package,speed: xc7a100t,csg324,C,-3
Data Sheet report:
All values displayed in nanoseconds (ns)

Setup/Hold to clock Enter
------------+------------+------------+---------+
            |Max Setup to|Max Hold to |  Clock  |
Source      | clk (edge) | clk (edge) |  Phase  |
------------+------------+------------+---------+
X<0>        |    0.558(R)|    1.403(R)|    0.000|
X<1>        |    0.475(R)|    1.597(R)|    0.000|
X<2>        |    0.333(R)|    1.590(R)|    0.000|
X<3>        |    0.413(R)|    1.693(R)|    0.000|
------------+------------+------------+---------+

Clock clk to Pad
------------+-----------------+-----------------+---------+
            |Max (slowest) clk|Min (fastest) clk|  Clock  |
Destination |  (edge) to PAD  |  (edge) to PAD  |  Phase  |
------------+-----------------+-----------------+---------+
XR1<2>      |         6.558(R)|         2.912(R)|    0.000|
XR1<3>      |         6.396(R)|         2.816(R)|    0.000|
XR2<0>      |         6.359(R)|         2.780(R)|    0.000|
XR2<1>      |         6.335(R)|         2.759(R)|    0.000|
XR3<2>      |         6.554(R)|         2.908(R)|    0.000|
------------+-----------------+-----------------+---------+

Clock to Setup on destination clock Enter
-------------+---------+---------+---------+---------+
             | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
Source Clock |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
-------------+---------+---------+---------+---------+
Enter        |    1.608|         |         |         |
-------------+---------+---------+---------+---------+

Clock to Setup on destination clock clk
-------------+---------+---------+---------+---------+
             | Src:Rise| Src:Fall| Src:Rise| Src:Fall|
Source Clock |Dest:Rise|Dest:Rise|Dest:Fall|Dest:Fall|
-------------+---------+---------+---------+---------+
Enter        |    2.519|         |         |         |
-------------+---------+---------+---------+---------+

V. CONCLUSIONS

The circuit and algorithm of a 4-point DFT based on the product of Rademacher functions have been designed and successfully implemented on FPGAs. Data enter the circuit serially, while the output data are provided in parallel. Among the targeted chips, Artix 7 is the fastest, and Spartan 3E and Spartan 6 occupy the least area. There is no multiplication in this design. However, practical applications require transforms larger than the 4-point DFT. In future work, it is possible to develop an 8-point or higher DFT using this method. Higher-point DFTs are implemented efficiently using FFT methods such as Radix-2; therefore, this design is suitable for use in FFT algorithms. No comparison with other methods can be made, since the FPGA implementation is designed only for the 4-point case.

REFERENCES

[1] P. Duhamel and M. Vetterli, "Fast Fourier Transforms: a tutorial, review and a state of the art," Trans. Signal Processing, vol. 19, no. 4, pp. 259-299, April 1990.
[2] A. Amira, A. Bouridane, P. Milligan, and P. Sage, "A High Throughput, FPGA Implementation of A Bit Level Matrix Product," Proceeding of IEEE Workshop on Signal Processing Systems Design and Implementation (SIPS), LA, USA, pp. 356-364, 2000.
[3] A. Amira, A. Bouridane, P. Milligan, and M. Roula, "An FPGA Implementation of Walsh-Hadamard Transforms for Signal Processing," Proceeding of IEEE International Conference on Acoustic, Speech and Signal Processing, vol. 2, pp. 1105-1108, 2001.
[4] M. Y. Zulfikar, S. A. Abbasi, and A. R. M. Alamoud, "FPGA Based Processing of Digital Signals using Walsh Analysis," Proceeding of IEEE International Conference on Electrical, Control and Computer Engineering (INECCE 2011), pp. 440-444, 21-22 June, Pahang, Malaysia, 2011.
[5] Zulfikar, S. A. Abbasi, and A. R. M. Alamoud, "A Novel Complete Set of Walsh and Inverse Walsh Transforms for Signal Processing," Proceeding of IEEE International Conference on Communication Systems and Network Technologies (CSNT 2011), pp. 504-509, Katra, Jammu, 3-5 June, 2011.
[6] S. Boussakta and A. G. J. Holt, "Fast algorithm for calculation of both Walsh-Hadamard and Fourier transforms (FWFTs)," Electron. Letter, vol. 25, no. 20, pp. 1352-1354, 1989.
[7] Monir T. Hamood and Said Boussakta, "Fast Walsh–Hadamard–Fourier transform algorithm," Trans. Signal Processing, vol. 59, no. 11, pp. 5627-5631, November 2011.
[8] Teng Su and Feng Yu, "A Family of Fast Hadamard–Fourier Transform Algorithms," Signal Processing Letters, vol. 19, no. 9, pp. 583-586, September 2012.

[9] John G. Proakis and Dimitris G. Manolakis, Digital signal processing: principles, algorithms, and applications, 4th ed., Pearson Prentice Hall, New Jersey, 2007.
[10] S. Salivahanan, A. Vallavaraj, and C. Gnanapriya, Digital Signal Processing, McGraw Hill, New Delhi, 2000.
[11] "Hadamard Transform." en.wikipedia.org. Wikipedia, 6 Feb. 2011. Web. 27 Apr. 2011.
[12] M. G. Karpovsky, R. S. Stankovic, and J. T. Astola, Spectral Logic and Its Applications for The Design of Digital Devices, John Wiley & Sons Inc. Publication, New Jersey, 2008.
[13] B. J. Fino and V. R. Algazi, "Unified Matrix Treatment of the Fast Walsh-Hadamard Transform," IEEE Transactions on Computers, vol. 42, pp. 1142-1146, 1976.
[14] S. K. Bahl, "Design and Prototyping a Fast Hadamard Transformer for WCDMA," Proceeding of 14th IEEE International Workshop on Rapid Systems Prototyping, pp. 134-140, 2003.
[15] B. J. Falkowski and T. Sasao, "Unified Algorithm to Generate Walsh Functions in Four Different Orderings and Its Programmable Hardware Implementations," Proceeding of IEE on Vision, Image and Signal Processing, vol. 152, issue 6, pp. 819-826, 2005.
[16] P. K. Meher and J. C. Patra, "Fully-Pipelined Efficient Architectures for FPGA Realization of Discrete Hadamard Transform," Proceeding of International Conference on Application Specific Systems, Architectures and Processors (ASAP 2008), pp. 43-48, 2008.
[17] Zulfikar, S. A. Abbasi, and A. R. M. Alamoud, "FPGA Based Complete Set of Walsh and Inverse Walsh Transforms for Signal Processing," Transaction of Electronics and Electrical Engineering, vol. 18, no. 8, pp. 3-8, October 2012.
