OpenRISC-based System-on-Chip for Digital Signal Processing

Alexander López-Parrado, Electronics Engineering Program, Universidad del Quindío, Armenia, Quindío, Colombia. e-mail: [email protected]
Juan-Camilo Valderrama-Cuervo, Envigado's Traffic Light System Management, Envigado, Antioquia, Colombia. e-mail: [email protected].

Abstract—This paper presents the design and implementation of an OpenRISC-based System-on-Chip (SoC). The SoC is composed of hardware cores that implement three Digital Signal Processing (DSP) functions: a Finite Impulse Response (FIR) filter, an Infinite Impulse Response (IIR) filter and a Fast Fourier Transform (FFT). The FIR-filter core is based on the transpose realization form, the IIR-filter core on the Second-Order Sections (SOS) architecture, and the FFT core on the Radix-2² Single-path Delay Feedback (R2²SDF) architecture. The three cores are compatible with the Wishbone SoC bus and were described using generic and structural VHDL. In-system hardware verification was performed with an OpenRISC-based SoC synthesized on an FPGA. Tests showed that the designed DSP cores are suitable for building SoCs based on the OpenRISC processor and the Wishbone bus.

Keywords—Digital signal processing, digital filters, finite impulse response filters, infinite impulse response filters, fast Fourier transforms, system-on-chip, open source hardware, open RISC processor, Wishbone bus.

978-1-4799-7666-9/14/$31.00 ©2014 IEEE

I. INTRODUCTION

Today's technology relies heavily on Digital Signal Processing (DSP). Over the past 20 years [1], DSP applications have grown thanks to improvements in the speed, integration capability and power consumption of digital integrated circuits. The increased speed of integrated circuits allows real-time processing of signals with higher bandwidths, such as those used in communication systems [1]. Nowadays there are Digital Signal Processor (DSP) devices specifically designed to perform real-time filtering, Fourier transforms, wavelet transforms, or encoding of audio and video signals. Nevertheless, the parallel nature of DSP algorithms has motivated research into hardware solutions based on reconfigurable targets such as Field Programmable Gate Arrays (FPGAs). These solutions have demonstrated improvements in speed and power consumption compared with DSP-processor-based ones [2].

Private corporations such as Altera and Xilinx have developed several FPGA-based DSP solutions. These solutions include FIR filtering cores [3] [4] and FFT cores [5] [6], among others. However, these cores either require expensive licenses for commercial use or can be used for free only for academic purposes. Meanwhile, an open source hardware development model, inspired by open source software, has emerged over the last ten years. This model is supported by communities such as OpenCores, which develops open source hardware under the Lesser General Public License (LGPL). The OpenCores community has remarkable products such as the OpenRISC core [7] and the Wishbone bus specification [8], which together enable the development of SoC hardware. However, the OpenCores community lacks fully parameterizable DSP cores compatible with the Wishbone bus.

With this in mind, we developed FIR-filter, IIR-filter and FFT cores under the LGPL license. The cores are compatible with the Wishbone bus and enable the development of DSP SoCs based on the OpenRISC processor [9]. The FIR-filter core is based on the transpose architecture [1] [2], the IIR core on the SOS architecture [1] [2], and the FFT core on the R2²SDF architecture [10]. The three cores were described using generic and structural VHDL and targeted to an Altera FPGA device.

This paper is organized as follows. Section II describes theoretical concepts about DSP and the Wishbone bus. Section III presents the design of the DSP core architectures and describes their functional blocks. Section IV shows the in-system hardware verification results. Finally, the conclusion and acknowledgements are presented.

II. THEORETICAL BACKGROUND

This section presents theoretical concepts about the implemented DSP functions and the Wishbone SoC bus.

A. FIR Filters

FIR filters are discrete Linear Time Invariant (LTI) systems with a finite-duration impulse response h[n]. Practical implementations of FIR filters are always stable because of their non-recursive nature. Eq. 1 shows the direct realization of a FIR filter.

y[n] = \sum_{k=0}^{N-1} h[k]\, x[n-k]    (1)

Here, x[n] is the input signal, y[n] is the output signal and N is the length of the impulse response h[n].
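Eq. 1 can be checked in software with a few lines. The following is a minimal floating-point sketch. It assumes zero initial conditions (x[n] = 0 for n < 0) and returns one output sample per input sample. The name fir_direct is a hypothetical helper, not part of the cores or their drivers.

```python
def fir_direct(h, x):
    """Direct-form FIR filter, Eq. (1): y[n] = sum_{k=0}^{N-1} h[k] * x[n - k].

    Assumes zero initial conditions (x[n] = 0 for n < 0) and returns
    one output sample per input sample.
    """
    n_taps = len(h)
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k in range(n_taps):
            if n - k >= 0:  # skip samples before n = 0
                acc += h[k] * x[n - k]
        y.append(acc)
    return y


# A unit impulse at the input returns h[n] followed by zeros,
# since the impulse response of an FIR filter has finite duration.
print(fir_direct([0.25, 0.5, 0.25], [1.0, 0.0, 0.0, 0.0, 0.0]))
# [0.25, 0.5, 0.25, 0.0, 0.0]
```

This loop is only a reference model. The FIR-filter core uses the transpose form of Fig. 1 instead.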

There are several realization forms for FIR filters [1]. In hardware implementations, the transpose form has the shortest critical path and is less sensitive to round-off errors when fixed-point arithmetic is used [1] [2]. Fig. 1 shows the transpose realization form for a FIR filter with an impulse response of length N.

Fig. 1. Transpose realization form for a FIR filter of length N.

B. IIR Filters

IIR filters are discrete LTI systems with an infinite-duration impulse response. Practical implementations of IIR filters can become unstable because of their recursive nature [1]. Eq. 2 shows the direct realization of a (P − 1)-th order IIR filter.

y[n] = \sum_{k=0}^{P-1} b_k\, x[n-k] - \sum_{k=1}^{P-1} a_k\, y[n-k]    (2)

Here, b_k is the coefficient set of the non-recursive part, a_k is the coefficient set of the recursive part, x[n] is the input signal, and y[n] is the output signal. There are several realization forms for IIR filters [1]. The transpose type II form has the shortest critical path, and the SOS form is less sensitive to round-off errors when fixed-point arithmetic is used [1] [2]. In hardware implementations, the SOS form has good stability for high-order filters, and using the transpose type II form for each second-order section minimizes the critical path. Fig. 2 shows the transpose type II realization form for a single second-order section.

Fig. 2. Transpose type II realization form for a second-order section.

In Fig. 2, N_sect is the number of second-order sections and G is the total gain after the SOS decomposition [1]. Each second-order section therefore has a gain equal to the N_sect-th root of G, G^(1/N_sect). The whole IIR filter is a cascade of N_sect second-order sections like the one shown in Fig. 2.

C. Fast Fourier Transform

The FFT is an algorithm that efficiently computes the Discrete Fourier Transform (DFT) of a discrete-time signal [1] [2]. Eq. 3 gives the DFT of a signal x[n].

X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-i 2\pi k n / N}    (3)

Depending on the radix used, FFT algorithms can be radix-2, radix-4, radix-2², radix-8, mixed-radix or split-radix [2] [5] [6] [10], among others. Radix-2 algorithms have become popular for hardware implementations of the FFT [5] [6] [10] because of their regularity, simple control, pipelined operation and low hardware resource usage. The R2²SDF architecture is based on a radix-2² algorithm and is well suited to FFT hardware [5] [6] [10] [11]. Fig. 3 shows the R2²SDF architecture for a 64-point FFT.

Fig. 3. R2²SDF architecture for a 64-point FFT (input x[n], output X[k]; three BF2I/BF2II butterfly pairs with feedback delays of 32, 16, 8, 4, 2 and 1; twiddle factors W2[n] and W3[n]; control signals driven by bits 5 to 0 of a clocked counter).

The R2²SDF architecture uses two types of butterflies, both structurally similar to the radix-2 butterfly. Its resource usage is similar to that of radix-4 algorithms [10]. Fig. 3 also shows that control is performed by a log2(N)-bit counter.

D. OpenRISC processor and the Wishbone bus

OpenRISC is a 32-bit Reduced Instruction Set Computer (RISC) soft-core processor designed by the OpenCores community. Its architecture is described in a standard document [7], and a synthesizable description is available under the LGPL license through the OR1200 core [12]. OpenRISC supports the development of SoCs through the Wishbone interconnection bus, which is described in a standard document [8].

Before our project [9], the OpenCores community had not developed DSP cores with Wishbone connectivity. The DSP cores proposed in this paper are therefore the first Wishbone-compatible ones, and they use the basic connection depicted in Fig. 4. The OpenCores community has developed several FPGA-synthesizable reference SoCs based on the OpenRISC processor. One of the simplest is MinSoC [13], which allows easy and fast verification of an OpenRISC-based SoC with custom slave modules such as the DSP cores we designed.

III. DSP CORES ARCHITECTURE

In this section we describe the designed DSP cores and their slave interfaces with the Wishbone bus. All DSP cores use fixed-point arithmetic. The word length, guard bits, fractional part [2], filter order and FFT length [1] are parameterizable through VHDL generics. Each DSP core was designed using the two-unit structure shown in Fig. 4. The processing unit performs the DSP operation of the specific core, and the slave interface unit is the Wishbone interface for the SoC connection. The interface lines are named according to the Wishbone bus standard [8].

Fig. 4. Two-unit structure for DSP cores (a slave interface unit with Wishbone lines sDATi[31:0], sDATo[31:0], sADRi[31:0], sSTBi, sWEi, sACKo, sCLKi and sRSTi, connected to the processing unit through internal signals).

A. FIR-filter core

The processing unit of the FIR-filter core was designed using the transpose realization form shown in Fig. 1. Its generic parameters are the filter length N, the word length M and the guard bits Gr. Table I shows the register description of the Wishbone interface for the FIR-filter core.

TABLE I. REGISTER DESCRIPTION OF THE FIR-FILTER CORE.

Register               Address
FIR_CONTROL[0:0]       FIR_BASE+0
FIR_DATA[M+Gr−1:0]     FIR_BASE+4
FIR_STATUS[0:0]        FIR_BASE+8
FIR_Q[3:0]             FIR_BASE+12
FIR_COEFF              FIR_BASE+16

FIR_DATA is a read/write register: the user writes the input sample to it and reads the filtered sample from it. FIR_CONTROL is a write-only register; setting it starts the filtering process. FIR_STATUS is a read-only register that is set when the filtering process finishes. FIR_Q is a write-only register that holds the number of fractional bits of the fixed-point representation of the filter coefficients. An address space of N consecutive 32-bit positions starts at FIR_COEFF. There the user writes the 16-bit fixed-point filter coefficients, from FIR_COEFF for h[0] to FIR_COEFF + 4×(N − 1) for h[N − 1].

B. IIR-filter core

The processing unit of the IIR-filter core was designed as a cascade of pipelined SOS like the one shown in Fig. 2. Its generic parameters are N_sect, M, Gr and the number of fractional bits Q. Table II shows the register description of the Wishbone interface for the IIR-filter core.

TABLE II. REGISTER DESCRIPTION OF THE IIR-FILTER CORE.

Register               Address
IIR_CONTROL[0:0]       IIR_BASE+0
IIR_DATA[M+Gr−1:0]     IIR_BASE+4
IIR_STATUS[0:0]        IIR_BASE+8
IIR_NSECT[3:0]         IIR_BASE+12
IIR_GAIN[M−1:0]        IIR_BASE+16
IIR_COEFF              IIR_BASE+20

IIR_DATA is a read/write register: the user writes the input sample to it and reads the filtered sample from it. IIR_CONTROL is a write-only register; setting it starts the filtering process. IIR_STATUS is a read/write register that is set when the filtering process finishes and cleared by a write operation. IIR_NSECT is a write-only register that holds the number of used sections minus one, out of the N_sect available sections. IIR_GAIN is a write-only register that holds the per-section gain G^(1/N_sect) in fixed-point representation with Q fractional bits.

An address space of 6 × N_sect consecutive 32-bit positions starts at IIR_COEFF. There the user writes the 16-bit fixed-point SOS coefficients. For the 1st section, a2 goes to IIR_COEFF, a1 to IIR_COEFF + 4, a0 to IIR_COEFF + 8, b2 to IIR_COEFF + 12, b1 to IIR_COEFF + 16, and b0 to IIR_COEFF + 20. The same layout continues section by section. It finishes with the N_sect-th section, whose coefficients a2, a1, a0, b2, b1 and b0 go to IIR_COEFF + 4×(6×N_sect − 6), IIR_COEFF + 4×(6×N_sect − 5), IIR_COEFF + 4×(6×N_sect − 4), IIR_COEFF + 4×(6×N_sect − 3), IIR_COEFF + 4×(6×N_sect − 2) and IIR_COEFF + 4×(6×N_sect − 1), respectively.

C. FFT core

The processing unit of the FFT core was designed using the pipelined version of the R2²SDF architecture developed in [11]. Its generic parameters are N, M and Q. Table III shows the register description of the Wishbone interface for the FFT core.

TABLE III. REGISTER DESCRIPTION OF THE FFT CORE.

Register               Address
FFT_CONTROL[0:0]       FFT_BASE+0
FFT_DATA[2×M−1:0]      FFT_BASE+4
FFT_STATUS[0:0]        FFT_BASE+8
FFT_MEMORY             FFT_BASE+12

FFT_DATA is a write-only register used to write the input samples. FFT_CONTROL is a write-only register; writing it clears the processing unit and the status register. FFT_STATUS is a read-only register that is set when the whole FFT process finishes. An address space of N consecutive 32-bit positions starts at FFT_MEMORY. There the user reads the FFT results, from FFT_MEMORY for X[0] to FFT_MEMORY + 4×(N − 1) for X[N − 1].

IV. IN-SYSTEM HARDWARE VERIFICATION

The three DSP cores were integrated into an OpenRISC-based SoC built from the MinSoC reference design [13]. The SoC was synthesized on the Altera FPGA device EP2S60F1020C4 included in the TREX-S2-TMB development board [14]. The accuracy of the cores was measured as the Mean Squared Error (MSE) between frequency responses in the DFT domain, as shown in Eq. 4.

\mathrm{MSE} = \sum_{k=0}^{N-1} \left| X[k] - \tilde{X}[k] \right|^2    (4)

Here, X[k] is the frequency response in the DFT domain computed by simulation using double-precision floating-point arithmetic, and \tilde{X}[k] is the frequency response of the core in the DFT domain.

For the FIR-filter core, we designed a 49th-order equiripple low-pass filter with passband and stopband edge frequencies of 3π/8 and π/2 rad/sample. The core was parameterized with M = 16, Q = 15 and Gr = 8. Fig. 5 shows the magnitude frequency responses for the tested FIR-filter core.

Fig. 5. Frequency response of the FIR-filter core (magnitude versus Ω from 0 to π; curves |H_S(Ω)| and |H_M(Ω)|).

In Fig. 5, the blue plot depicts the magnitude frequency response of the FIR filter computed by simulation using double-precision floating-point arithmetic. The red plot depicts the frequency response of the FIR-filter core, computed by obtaining the core's impulse response and taking its DFT. In this case, the MSE is 1.16 × 10^-4. Table IV shows the synthesis report for the FIR-filter core.

TABLE IV. SYNTHESIS REPORT FOR THE FIR CORE WITH N = 50.

Parameter                      Value
Logic utilization              27 %
Combinational ALUTs            11598 / 48352 (24 %)
Dedicated logic registers      1947 / 48352 (4 %)
Total block memory bits        0 / 2544192 (0 %)
DSP block 9-bit elements       100 / 288 (35 %)
Maximum operating frequency    103.92 MHz

The FIR-filter core requires 27% of the resources and reaches a maximum operating frequency of 103.92 MHz.

For the IIR-filter core, we designed a 12th-order Butterworth band-pass filter with cutoff frequencies 0.10625, 0.11875, 0.1025 and 0.1225. The core was parameterized with N_sect = 6, M = 16, Q = 13 and Gr = 8. Fig. 6 shows the magnitude frequency responses for the tested IIR-filter core.

Fig. 6. Frequency response of the IIR-filter core (magnitude versus Ω from 0 to π; curves |H_S(Ω)| and |H_M(Ω)|).

In Fig. 6, the blue plot depicts the magnitude frequency response of the IIR filter computed by simulation using double-precision floating-point arithmetic. The red plot depicts the frequency response of the IIR-filter core, computed by obtaining the core's impulse response and taking its DFT. In this case, the MSE is 4.83 × 10^-5. Table V shows the synthesis report for the IIR-filter core.

TABLE V. SYNTHESIS REPORT FOR THE IIR CORE WITH N_sect = 6.

Parameter                      Value
Logic utilization              5 %
Combinational ALUTs            2201 / 48352 (6 %)
Dedicated logic registers      504 / 48352 (1 %)
Total block memory bits        0 / 2544192 (0 %)
DSP block 9-bit elements       288 / 288 (100 %)
Maximum operating frequency    85.81 MHz

The IIR-filter core requires 5% of the resources and reaches a maximum operating frequency of 85.81 MHz.

The FFT core was parameterized with N = 1024, M = 16, Q = 15 and a total gain of 2 [11]. Fig. 7 shows the frequency responses for the tested FFT core.

Fig. 7. Frequency response of the FFT core (magnitude, ×10^-3, versus k; curves |FFT_S| and |FFT_M|).

In this test, we computed the FFT of the signal x[n] = {−(2^-9 + 2^-13 + 2^-16 + 2^-17), 2^-9, 0, ..., 0}, 0 ≤ n < 1024. In Fig. 7, the blue plot depicts the FFT computed by simulation using double-precision floating-point arithmetic, and the red plot depicts the FFT computed by the core. In this case, the MSE is 1.64 × 10^-9. Table VI shows the synthesis report for the FFT core.

TABLE VI. SYNTHESIS REPORT FOR THE FFT CORE WITH N = 1024.

Parameter                      Value
Logic utilization              69 %
Combinational ALUTs            1085 / 48352 (2 %)
Dedicated logic registers      33095 / 48352 (68 %)
Total block memory bits        73728 / 2544192 (3 %)
DSP block 9-bit elements       32 / 288 (11 %)
Maximum operating frequency    115.3 MHz

The FFT core requires 69% of the resources and reaches a maximum operating frequency of 115.3 MHz. Table VII shows the synthesis report for the whole OpenRISC-based DSP SoC.

TABLE VII. SYNTHESIS REPORT FOR THE OPENRISC-MINSOC-BASED DSP SOC.

Parameter                      Value
Logic utilization              100 %
Combinational ALUTs            33568 / 48352 (69 %)
Dedicated logic registers      39428 / 48352 (82 %)
Total block memory bits        494464 / 2544192 (19 %)
DSP block 9-bit elements       288 / 288 (100 %)
Maximum operating frequency    55.05 MHz

The OpenRISC-based DSP SoC requires 100% of the resources and reaches a maximum operating frequency of 55.05 MHz. The reduced operating frequency is due to MinSoC rather than to the DSP cores.
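The filter structures of Sections II and III can be prototyped in software before targeting the cores. The following is a floating-point behavioural sketch, not the VHDL implementation, and it rests on several assumptions:

- It ignores M, Gr and Q.
- Every section is normalized so that a0 = 1.
- The per-section gain G^(1/N_sect) is applied at the input of each section. Where the gain sits inside Fig. 2 is itself an assumption.

The names fir_transpose, iir_direct and sos_cascade are hypothetical.

```python
def fir_transpose(h, x):
    """Transpose-form FIR (Fig. 1); produces the same output as Eq. (1).

    state[k] is the register fed by the adder that adds h[k + 1] * x[n].
    Each output needs one multiply and one add after a register, so the
    critical path does not grow with the filter length N.
    """
    n_taps = len(h)
    state = [0.0] * (n_taps - 1)
    y = []
    for sample in x:
        y.append(h[0] * sample + (state[0] if state else 0.0))
        # Move partial sums one register towards the output; ascending
        # order means state[k + 1] still holds its previous value here.
        for k in range(n_taps - 1):
            older = state[k + 1] if k + 1 < n_taps - 1 else 0.0
            state[k] = h[k + 1] * sample + older
    return y


def iir_direct(b, a, x):
    """Eq. (2): y[n] = sum_k b[k] x[n-k] - sum_{k>=1} a[k] y[n-k], a[0] = 1."""
    y = []
    for n in range(len(x)):
        acc = sum(b[k] * x[n - k] for k in range(len(b)) if n - k >= 0)
        acc -= sum(a[k] * y[n - k] for k in range(1, len(a)) if n - k >= 0)
        y.append(acc)
    return y


def sos_cascade(sections, total_gain, x):
    """Cascade of transposed direct-form II second-order sections (Fig. 2).

    sections: list of (b, a) with b = [b0, b1, b2] and a = [1, a1, a2].
    Each section scales its input by total_gain ** (1 / len(sections)).
    """
    g = total_gain ** (1.0 / len(sections))
    signal = list(x)
    for b, a in sections:
        s1 = s2 = 0.0
        out = []
        for sample in signal:
            w = g * sample
            yn = b[0] * w + s1
            s1 = b[1] * w - a[1] * yn + s2
            s2 = b[2] * w - a[2] * yn
            out.append(yn)
        signal = out
    return signal


impulse = [1.0] + [0.0] * 15
sections = [([1.0, 1.0, 0.0], [1.0, -0.5, 0.0]),
            ([1.0, 0.0, 0.0], [1.0, 0.25, 0.0])]
cascade = sos_cascade(sections, 1.0, impulse)
direct = iir_direct([1.0, 1.0], [1.0, -0.25, -0.125], impulse)
print(max(abs(p - q) for p, q in zip(cascade, direct)))  # rounding-level difference
```

In this example, the cascade of (1 + z^-1)/(1 − 0.5 z^-1) and 1/(1 + 0.25 z^-1) matches Eq. 2 with the multiplied-out coefficients b = [1, 1] and a = [1, −0.25, −0.125], up to floating-point rounding.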
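Eq. 3, the radix-2 flow graph and the error measure of Eq. 4 can also be checked in software. The sketch below uses an iterative radix-2 decimation-in-frequency FFT. This is a simpler relative of the radix-2² algorithm: radix-2² keeps the radix-2 butterfly structure but decomposes the twiddle factors so that every other stage needs only trivial multiplications by −j. The sketch is not a model of the R2²SDF pipeline itself, although its stage spans N/2, N/4, ..., 1 mirror the feedback delays in Fig. 3. sum_squared_error follows Eq. 4 literally, so it has no 1/N factor. All function names are hypothetical, and the input is the test signal used in Section IV.

```python
import cmath


def dft(x):
    """Eq. (3) evaluated directly; an O(N^2) reference."""
    n_pts = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def bit_reverse(k, bits):
    """Reverse the lowest `bits` bits of k."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (k & 1)
        k >>= 1
    return r


def fft_radix2_dif(x):
    """Iterative radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    a = [complex(v) for v in x]
    n_pts = len(a)
    if n_pts == 0 or n_pts & (n_pts - 1):
        raise ValueError("FFT length must be a power of two")
    span = n_pts // 2
    while span >= 1:  # spans N/2, N/4, ..., 1, like the SDF delays in Fig. 3
        for start in range(0, n_pts, 2 * span):
            for j in range(span):
                w = cmath.exp(-2j * cmath.pi * j / (2 * span))  # twiddle factor
                u, v = a[start + j], a[start + j + span]
                a[start + j] = u + v               # butterfly sum
                a[start + j + span] = (u - v) * w  # butterfly difference times twiddle
        span //= 2
    # DIF leaves X[k] at position bit_reverse(k); restore natural order.
    bits = n_pts.bit_length() - 1
    return [a[bit_reverse(k, bits)] for k in range(n_pts)]


def sum_squared_error(ref, test):
    """Eq. (4) as printed: sum over k of |X[k] - X~[k]|^2."""
    return sum(abs(p - q) ** 2 for p, q in zip(ref, test))


# Test signal of Section IV with N = 1024.
N = 1024
x = [0.0] * N
x[0] = -(2**-9 + 2**-13 + 2**-16 + 2**-17)
x[1] = 2**-9
X = fft_radix2_dif(x)
print(abs(X[N // 2]))  # about 4.05e-3, i.e. |x[0] - x[1]|
```

Comparing fft_radix2_dif against dft with sum_squared_error on a short input shows only rounding-level differences. This is the same kind of comparison Eq. 4 makes between the double-precision simulation and the core.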
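To configure the IIR-filter core, software on the OpenRISC processor has to quantize the gain and coefficients to Q fractional bits and write them at the offsets of Table II. The sketch below only computes the (address, value) pairs. Several details are assumptions, and each would become a 32-bit store on real hardware:

- The base address 0x9000_0000 is hypothetical.
- Rounding to nearest with saturation is an assumed quantization policy.
- Each 16-bit pattern is assumed to sit in the low half of its 32-bit word.
- The per-section gain is assumed to be the root of G over the number of used sections.

The helper names are hypothetical and not part of the WDSP project.

```python
# Byte offsets from Table II.
IIR_CONTROL, IIR_DATA, IIR_STATUS, IIR_NSECT, IIR_GAIN, IIR_COEFF = 0, 4, 8, 12, 16, 20
# Per-section coefficient order in the IIR_COEFF address space.
COEFF_ORDER = ("a2", "a1", "a0", "b2", "b1", "b0")


def to_fixed(value, q, bits=16):
    """Round value * 2**q to nearest, saturate to signed `bits` bits.

    Returns the two's-complement bit pattern as a non-negative int.
    """
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    n = max(lo, min(hi, round(value * (1 << q))))
    return n & ((1 << bits) - 1)


def iir_coeff_offset(section, name):
    """Byte offset of coefficient `name` of the given 1-based section."""
    return IIR_COEFF + 4 * (6 * (section - 1) + COEFF_ORDER.index(name))


def iir_write_plan(base, sections, total_gain, q, m=16):
    """List of (address, value) writes that configure the IIR core.

    sections: list of dicts with keys a2, a1, a0, b2, b1, b0.
    """
    n_used = len(sections)
    if not 1 <= n_used <= 16:  # IIR_NSECT is 4 bits wide in Table II
        raise ValueError("IIR_NSECT can encode between 1 and 16 sections")
    plan = [(base + IIR_NSECT, n_used - 1)]  # number of used sections minus one
    section_gain = total_gain ** (1.0 / n_used)
    plan.append((base + IIR_GAIN, to_fixed(section_gain, q, bits=m)))
    for s, coeffs in enumerate(sections, start=1):
        for name in COEFF_ORDER:
            plan.append((base + iir_coeff_offset(s, name), to_fixed(coeffs[name], q)))
    return plan


section = dict(a2=0.25, a1=-0.5, a0=1.0, b2=0.1, b1=0.2, b0=0.1)
for addr, value in iir_write_plan(0x9000_0000, [section, section], 4.0, q=13)[:4]:
    print(hex(addr), hex(value))
```

The offset formula reproduces the end of the address space in Section III-B: for section N_sect, b0 lands at IIR_COEFF + 4×(6×N_sect − 1).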

V. CONCLUSION

We designed an OpenRISC-based SoC for DSP composed of FIR-filter, IIR-filter and FFT cores that are compatible with the Wishbone bus. These DSP cores are parameterizable through VHDL generics and have easy-to-use hardware/software interfaces. The three DSP cores we designed are the only ones of their kind in the OpenCores community, considering their range of DSP functions, Wishbone compatibility, flexibility and speed performance. The three cores have been tested on Altera Cyclone II and Stratix II FPGA devices.

ACKNOWLEDGMENT

Juan-Camilo Valderrama-Cuervo thanks Prof. López-Parrado for his support and teaching. Alexander López-Parrado thanks Colciencias for the scholarship and Universidad del Quindío for the study commission.

REFERENCES

[1] S. K. Mitra, Digital Signal Processing: A Computer-Based Approach, 4th ed. McGraw-Hill, 2010.
[2] U. Meyer-Baese, Digital Signal Processing with Field Programmable Gate Arrays. Springer, 2005.
[3] Altera Corporation, FIR Compiler User Guide, 11th ed., May 2011. [Online]. Available: http://www.altera.com/literature/ug/fircompiler_ug.pdf
[4] Xilinx Corporation, LogiCORE IP FIR Compiler v5.0, 2011. [Online]. Available: http://www.xilinx.com/support/documentation/ip_documentation/fir_compiler_ds534.pdf
[5] Altera Corporation, FFT MegaCore Function User Guide, 12th ed., 2012. [Online]. Available: http://www.altera.com/literature/ug/ug_fft.pdf
[6] Xilinx Corporation, LogiCORE IP Fast Fourier Transform v7.1, 2011. [Online]. Available: http://www.xilinx.com/support/documentation/ip_documentation/xfft_ds260.pdf
[7] OpenCores, OpenRISC 1000 Architecture Manual, 1st ed., 2012. [Online]. Available: http://opencores.org/websvn,filedetails?repname=openrisc&path=%2Fopenrisc%2Ftrunk%2Fdocs%2Fopenrisc-arch-1.0-rev0.pdf
[8] OpenCores, Wishbone Revision B.3 Specification, OpenCores Std., 2011.
[9] A. López-Parrado and J. C. Valderrama-Cuervo. (2013) WDSP project. [Online]. Available: http://opencores.org/project,wdsp
[10] S. He and M. Torkelson, "A new approach to pipeline FFT processor," in Proceedings of IPPS '96, the 10th International Parallel Processing Symposium, Honolulu, USA, April 1996, pp. 766–770.
[11] A. López-Parrado, J. Velasco-Medina, and J. A. Ramírez-Gutiérrez, "Efficient hardware implementation of a full COFDM processor with robust channel equalization and reduced power consumption," Revista de la Facultad de Ingeniería de la Universidad de Antioquia, no. 68, pp. 48–60, 2013.
[12] OpenCores. (2012) OR1200 project. [Online]. Available: http://opencores.org/or1k/Main_Page
[13] R. F. et al. (2013) MinSoC project. [Online]. Available: http://opencores.org/project,minsoc
[14] Terasic Technologies, TREX-S2-TMB Motherboard for Stratix II FPGA Module Data Book v1.3, 2006. [Online]. Available: http://www.terasic.com.tw/cgi-bin/page/archive_download.pl?Language=English&No=189&FID=d27fe61e50f8d9c5c7d0278b78c8f4fd