FIXED-POINT C COMPILER FOR TMS320C50

Jiyang liang Wonyong Sung School of Electrical Engineering Seoul National University, KOREA jiyang@hdtv .snu.ac. kr and wysung@dsp, snu. ac. kr

ABSTRACT 2. FIXED-POINT ARITHMETIC RULES A fixed-point C compiler is developed for convenient and ef- An integer variable or a constant in C language consists of, ficient programming of TMS320C50 fixed-point digital sig- usually, 16 , and the LSB(Least Significant ) has the nal processor. This compiler supports the ‘fix’ data type weight of one for the conversion to or from the floating-point that can have an individual integer word-length according data type. This can bring overflows or unacceptable quanti- to the range of a variable. It can add or subtract two data zation errors when a floating-point digital signal processing having different integer word-lengths by automatically in- program is converted to an integer version. Therefore, it is serting shift operations. The accuracy of fixed-point mul- necessary to assign a different weight to the LSB of a vari- tiply operation is significantly increased by storing the up- able or a constant [7] [8]. For this purpose, we employed per part of the multiplied double-precision result instead of a fixed-point data type that can have an individual integer keeping the lower part as conducted in the integer multipli- word-length as follows: cation. Several target specific code optimization techniques are employed to improve the compiler efficiency. The empir- fix(integer-wordlength) variable-name; ical results show that the execution speed of a fixed-point C program is much, about an order of magnitude, faster than that of a floating-point C program in a fixed-point digital Note that the range (R)that a variable can represent and signal processor. the quantization step (Qs)are dependent on the integer word-length (IWL)as follows.

1. INTRODUCTION The reduction of development time by employing a high- level language is very much needed for the programming of digital signal processors. Especially, C compilers for floating-point digital signal processors are gaining accep- The arithmetic rules based on this fixed-point data rep- tance because of the shortened development time and the resentation and a hardware data-path having a 16 bit by 16 improved compiler efficiency. However, C compilers for bit two’s complement multiplier, 16 bit ALU, and a barrel fixed-point digital signal processors have met with little ac- shifter can be derived as follows. ceptance [l][Z] especially because of the overhead in execut- ing floating-point operations using a fixed-point data-path. 2.1. Addition or Subtraction One may use the ‘int’data type in C language not to use floating-point operations, but it results in a severe loss of ac- Two data can be added or subtracted after equalizing the curacy, especially for performing multiply operations, even integer word-length for them. The integer word-length can after careful scaling of the program. Since the lower part be modified by arithmetic shift operations. An arithmetic of the 31 bit product is stored as the result in the integer right shift of 1 increases the integer word-length by 1. For multiplication, the input data to the multiplier have to be example, the program shown in Fig. I-(a) can be compiled severely scaled down in order to prevent overflows. It is also as depicted in Fig. 1-(b). Note that SFR(1) represents an tedious to develop integer programs. one-bit arithmetic right shift. In this paper, we propose a C compiler that supports the ‘fix’ data type and corresponding arithmetic operations to solve these problems. The execution speed and the finite word-length effects of the floating-point, integer, and fixed- y=x+y; point implementations are compared by using biquad IIR digital filter and ADM coder programs. Texas Instruments’ (4 fixed-point digital signal processor, TMS 320C50, is em- y = ADD x, y); ployed [3], and the GNU compiler, gcc, is modified. Discus- (SFR(1) sions on our prototype fixed-point C compiler based on the (b) lcc retargetable C compiler [4] is shown in [5]. Relevant ap- proaches for the Motorola 56000 DSP and automatic scaling Figure 1. Fixed-point add-operation at assembly level can be found in [6] and [7], respectively.

0-8186-7919-0/97 $10.00 0 1997 IEEE 707

Authorized licensed use limited to: Universidad Nacional Autonoma de Mexico. Downloaded on August 13,2010 at 22:21:10 UTC from IEEE Xplore. Restrictions apply. x(15:O)

MSB P(31:O) LSB mod;fied gcc c-) C-C P(15:O) P(30:15) + 4 to memory to memory (fixed-point case) (integer case)

Figure 2. Integer and fixed-point multiplication 2.2. Multiplication Two’s complement multiplication of two 16 bit data, x and y, yields a 31 bit result in the P register, P (30 :0) , as shown in Fig. 2. Note that an extra sign bit is eliminated in the two’s complement multiplication process. In the mul- tiplication of the ‘fix’ type data, the upper 16 bit part, P(30: 15), is stored as the product, while the lower 16 bit part is stored in the integer multiplication (Fig. 2). For example, the program shown in Fig. 3 -(a) can be compiled as depicted in Fig. 3 -(b) by the fixed-point C compiler. Note that SFL(2) represents a two-bit left shift.

fix(2) x; /* IWL of 2 */ fix(3) y; of /* IUL 3 */ W y=x*y; Figure 4. Overall Compiler structure (4 P(31:O) = RUL (x, y); y = SFL(2) P(30:15); lCsource, file (b) modified scaler w gcc Figure 3. Fixed-point multiply-operation

3. COMPILER STRUCTURE The proposed fixed-point C compiler for TMS320C50 is based on the GNU C compiler, gcc, from the Free Software Foundation [9]. The overall compiler structure is shown in Figure 5. Scaling steps Fig. 4. Not only the back-ends of the gcc, which conducts the target specific code generation, but also the middle part gcc, such as jump optimization, common subexpression were modified to implement a few unique features in this elimination, and allocates registers. fixed-point C compiler, such as scaling and code optimiza- In the next step, after appropriate peephole optimiza- tion. tions, the code generator produces the intermediate assem- The overall compilation process is as follows. In the first bly code by annotating the RTL representations based on step, the compiler preprocesses source program, and ob- the instruction patterns defined in the target description tains the integer word-length information of the variables file. At the same time, the compiler divides the source pro- used in the source program. The integer word-length can gram into smaller blocks. The integer word-length of the be given manually or determined automatically using the accumulator in each block is uniquely determined on the ba- Fixed-Pornt Optimrzation Utility [8]. The Fixed-point Op sis of the integer word-length informations of the operands timixation Utility converts the float data type variables involved in the arithmetic operations. The integer word- into a corresponding C++ range estimation class, and esti- lengths of temporary variables which are assigned automat- mating their integer word-lengths by simulation. Note that ically by the compiler are also determined in this step. the integer word-length of a constant is determined auto- Finally, with the unscaled intermediate assembly code matically by the compiler. Source programs are converted and various integer word-length informations, the scaler into syntax trees by the parser. These syntax trees are performs the scaling phase by applying the fixed-point then transformed into intermediate representations known arithmetic rules discussed before. The number of shifts re- as register transfer language (RTL) expressions by the RTL mained undetermined in the intermediate assembly code are generator. determined according to the word-lengths of relevant vari- In the second step, the compiler conducts several ables and the accumulator. The compilation steps focused machine-independent optimizations that are supported by on scaling procedures are shown in Fig. 5.

708

Authorized licensed use limited to: Universidad Nacional Autonoma de Mexico. Downloaded on August 13,2010 at 22:21:10 UTC from IEEE Xplore. Restrictions apply. 4. CODE OPTIMIZATIONS In this compiler, several post optimization techniques are ac- implemented. In TMS320C50, certain instructions are con- trolled by mode variables, such as the product-shift mode ‘ Method Manual Fixed- Floating- (PM), the data page pointer (DP), and t,he sign-extension Assembly Point C Point C mode (SXM). In order to use the direct addressing mode to sQNR 64.09 57.69 access global variables efficiently, it is especially important (dB) to find and remove any redundant instructions to load the Machine 16 40 938 data page pointer (i.e., LDP). Whenever a redundant LDP Cycles instruction is identified and eliminated, two machine cy- cles can be saved. With the informations from the reaching 5.2. Adaptive Delta Modulater definition analysis [IO], we can eliminate redundant mode setting instructions. An adaptive delta modulater program shown in Fig. 6 In TMS320C50, another optimization point is to improve is implemented using floating-point and fixed-point data the allocation of local variables. Since address change more types. The compiled codes are shown in Fig. 7. Since than one requires additional instructions (i.e., ADRK or this algorithm contains several conditional branches, the SBRK), minimizing the number of such instructions by op- upper and the lower bounds of the number of machine cy- timal allocation of local variables in order to utilize the cles are presented here. In this example, the performance post-increment and post-decrement addressing modes can gap between the manual assembly program and the com- improve the overall performance [ll] [12]. In addition, re- piled fixed-point C program is narrowed. The fixed-point C moving redundant offset-updating instructions also greatly program is 12 to 13 times faster than the floating-point C contributes to the performance of the indirect addressing program, but it is still slower than the hand-coded assembly mode. We employed an optimization algorithm which uses program by a factor of 1.42-1.58. a table of variables to track the values of auxiliary registers and the auxiliary register pointer similar to the algorithm shown in [13]. Table 2. ADM: performance comparison according Besides these optimizations, other peephole optimiza- tions such as instruction compaction and loop optimization Method/ Manual Fixed- Floating- are also performed to fully exploit the architecture specific Machine Assembly Point C Point C features of the TMS320C50. The instruction compaction is Cvcles to combine multiple operations into a parallelized instruc- Best 48 72 880 tion and the loop optimization is to improve the perfor- Worst 54 100 1283 mance of the test-and-branch sequence at the end of loop body. 6. DISCUSSIONS 5. EXPERIMENTAL RESULTS In this paper, a fixed-point C compiler which introduces A biquad IIR filter and an adaptive delta modulater a new data type to support fixed-point arithmetic rules for were taken for the experimental examples. We used the TMS320C50 digital signal processor is presented. The com- TMS320C2x/5x optimizing C compiler from Texas Instru- parison results show that the fixed-point C compiler can ments (version 6.60) [14] to compile the floating-point C provide an acceptable compromise to the users of the fixed- versions of the example programs, while our fixed-point C point digital signal processor in terms of SQNR, execution compiler was used for the fixed-point C versions. We set speed, and the development efforts. Although the execution the optimization levels of both compilers to be the level 0 time of the compiled fixed-point code is 1.5 to 2.5 times in all of our experiments. longer than that of the hand-coded assembly program, the compiler can be improved by applying better target specific 5.1. Biquad Filter optimization techniques in the future. At this moment, the A biquad digital filtering program shown in Eq. (3) and developed compiler does not support local fixed-point vari- (4) is implemented using floating-point and fixed-point data ables and fixed-point structures, but is useful for fixed-point types. prototyping of real-time digital signal processing applica- tions.

U[.] = 1.683u[n - 13 - 0.7843u[n - 21 (3) ACKNOWLEDGMENTS

+ 3: bl The research described in this paper was supported by the ~[n]= ~[n]- 0.669~[n- 11 -I- U[. - 21 (4) LG Electronics Research Center, KOREA. REFERENCES In the fixed-point implementation, the coefficients and the data can be represented in 16 bit full precision because Buyer’s Guide to DSP Processors, Berkeley Design there is no overflow in the multiplication. Thus, it is possi- Technology, Inc. ble to obtain the SQNR of about 60 dB as shown in Table 1. V. ZivojnoviC, “Compilers for Digital Signal Proces- In terms of the execution speed, it is shown that the fixed- sors,” DSP & Multimedia Technology, vol. 4, no. 5, pp. point C program is about 23 times faster than the floating- 27-45, July/August, 1995. point C program and 2.5 times slower than the hand-coded assembly program. Note that an optimally scaled integer C TMS320C55c User’s Guide, Houston, Texas Instruments program only yields the SQNR of 21.27 dB [5]. Inc., 1993.

709

Authorized licensed use limited to: Universidad Nacional Autonoma de Mexico. Downloaded on August 13,2010 at 22:21:10 UTC from IEEE Xplore. Restrictions apply. [4] C. Fraser and D. Hanson, A Retargetable C Compiler: Design and Implementation, Benjamin/Cummings, 1995. [5] Wonyong Sung and Jiyang Kang, ‘‘Fixed-point C Lan- guage for Digital Signal Processing,” in Proc. of Twenty- Ninth Annual Asilomar Conference on Signals, Systems and Computers, vol. 2, pp. 816-820, Oct. 1995. [6] K. Baudendistel, Compiler Development for Fixed- ... Point Processors, PhD thesis, Georgia Institute of Tech- LT -sr nology, 1994. HPYK 7AElh PAC [7] Seehyun Kim and Wonyong Sung, “A Floating-point SACH -se, 1 to Fixed-point Assembly Program Translator for the LARP AR6 TMS320C25,” IEEE Trans. Circuits and Systems, vol. SBRK 3 41, no. 11, pp. 730-739, Nov. 1994. LAC *,I6 [8] Seehyun Kim, Ki-I1 Kum, and Wonyong Sung, “Fixed- SUB -se,15 Point Optimization Utility for C and C++ Based Dig- ADRK 5 SACH * ital Signal Processing Programs,” in Proc. 1995 IEEE ,O SACH -d ,O Workshop on VLSI Signal Processing, pp. 197-206, Oct. ... 1995. LDPK -step [9] R. Stallman, Using and Porting GNU CC, Free Software LT -step Foundation, Inc., Nov. 1995. HPYK 7C29h PAC [lo] A. Aho, R. Sethi, and J. Ullman, Compilers - Prin- ADDK 7FFFh,12 ciples, Techniques, and Tools, Addison-Wesley, 1986. SACH -step,1 [Ill David H. Bartley, “Optimizing Stack Frame Accesses B L6 for Processors with Restricted Addressing Modes,” L3: Software-Practice and Experience, vol. 22, no. 2, pp. LDPK -step 101-110, Feb. 1992. LT -step HPYK 7C29h [12] S. Y. Liao, S. Devadas, K. Keutzer, S. Tjiang, and A. PAC Wang, “Storage Assignment to Decrease Code Size,” in SACH -step,1 Proc. of the 1995 ACM SIGPLAN Conference on Pro- L6: gramming Language Design and Implementation, pp. LDPK -sgnO 186-195, Jun. 1995. LAC -sgnO [13] Wen-Yen Lin, Corinna G. Lee, and Paul Chow, “An BZ L7 LDPK -se Optimizing Compiler for the TMS320C25,” in Proc. LAC -se,16 of the International Conference on Signal Processing ADD -step,l4 Applications and Technology, vol. 1, pp. 689-694, Oct. ADDK 7FFFh,11 1994. B L10 [14] TMS~.~OC~X/C~XX/C~Xoptimizing C Compiler, Hous- L7: ton, Texas Instruments Inc., 1995. LDPK -se LAC -se,16 SUB -step,14 SUBK TFFFh,11 fix(l2) sr; /* reconstructed signal */ L10: fix(i0) step; /* step size */ LARP AR6 fix(l2) se; /* estimated signal */ ADRK 2 fix(13) d; /* difference */ ... SACH * $0 LAC *,16 /* delay line */ LDPK sgn2 = sgnl; -sr SACH sgnl = sgn0; -sr,0 LAC *,16 /* beginning of adm */ SBRK se = ACOEF * sr; ... 2: d = in - se; sgnO = SIGBcd); code = sgn0; if((sgnO&&sgnl&&sgn2)Il(!sgnO&&!sgnl&&!sgn:!)) Figure 7. ADM example: compiled fixed-point C step = ALPHA * step + BETA; program else step = ALPHA * step; return (sr = (sgnO?(se+step+BIAS) : (se-step-BIAS))) ;

Figure 6. ADM example: fixed-point C program

710

Authorized licensed use limited to: Universidad Nacional Autonoma de Mexico. Downloaded on August 13,2010 at 22:21:10 UTC from IEEE Xplore. Restrictions apply.