Writing Callable Assembly Routines

This document describes how to write assembly (ASM) routines that are callable from C. It also gives some tips on making ASM files more C-like. Most of the information presented here is summarized from spru514g, “TMS320C28x Optimizing C/C++ v6.2.4”, and spru513g, “TMS320C28x Tools v6.2.4”. With the information in this document and a little more reading from spru514g, it is also easy to write C functions that are callable from assembly, but this document focuses on C callable assembly routines.

The basics

• The C compiler prepends all symbols (i.e. function names, globals, constants, variables) with an underscore (7.4.1, spru514g)
   o The assembler does not do this for you, so you must prepend the underscore to your assembly symbols manually
• To be callable by a C function an asm object must have a .def or .global assembler directive (7.4.1, spru514g)
• C functions are called with LCR, therefore your asm routine must return through an LRETR (7.3.2, spru514g)
• In the C code you must have a forward declaration for the assembly routine (this is case sensitive)
   o Example

ASM:

        .def _myASMFunc
_myASMFunc:
        ; Assembly code
        LRETR

C header:

// Other code
void myASMFunc(void);
// Other code

• When passing arguments to a routine, C follows a set of rules. Note that there are exceptions for functions which pass more than 63 bytes of data in their argument list, see 7.3.3 spru514g. The rules are copied here from 7.3.1, spru514g for your convenience.

“3. Arguments passed to the called function are placed in registers and, when necessary, placed on the stack. Arguments are placed in registers using the following scheme:

(a) If the target is FPU and there are any 32-bit float arguments, the first four float arguments are placed in registers R0H-R3H.

(b) If there are any 64-bit integer arguments (long long), the first is placed in ACC and P (ACC holds the upper 32 bits and P holds the lower 32 bits). All other 64-bit arguments are placed on the stack in reverse order. If the P register is used for argument passing, then prolog/epilog abstraction is disabled for that function. See Section 3.10 for more information on abstraction.

(c) If there are any 32-bit arguments (longs or floats) the first is placed in the 32-bit ACC (AH/AL). All other 32-bit arguments are placed on the stack in reverse order.

func1(long a, long long b, int c, int* );
      stack   ACC/P        XAR5   XAR4

(d) Pointer arguments are placed in XAR4 and XAR5. All other pointers are placed on the stack.

(e) Remaining 16-bit arguments are placed in the order AL, AH, XAR4, XAR5 if they are available.

4. Any remaining arguments not placed in registers are pushed on the stack in reverse order. That is, the leftmost argument that is placed on the stack is pushed on the stack last. All 32-bit arguments are aligned to even addresses on the stack.

A structure argument is passed as the address of the structure. The called function must make a local copy.

For a function declared with an ellipsis, indicating that it is called with varying numbers of arguments, the convention is slightly modified. The last explicitly declared argument is passed on the stack so that its stack address can act as a reference for accessing the undeclared arguments.”
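The quoted rules can be explored with a small host-side C sketch. This is only a toy model, assuming the rules are applied in the order they are quoted and covering just rules (b) through (e); floats, structures and variadic functions are left out. The names ArgKind and assign_arg_locations are made up for this illustration and are not part of any TI tool, and the real compiler may behave differently in corner cases.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument classes for this sketch (not TI names). */
typedef enum { ARG_I16, ARG_I32, ARG_I64, ARG_PTR } ArgKind;

/*
 * Toy model of rules (b) through (e) quoted above.
 * loc[i] receives the name of the place argument i is passed in.
 */
void assign_arg_locations(const ArgKind *args, size_t n, const char **loc)
{
    int acc_used = 0, al_used = 0, ah_used = 0;
    int xar4_used = 0, xar5_used = 0;
    size_t i;

    /* (b) The first 64-bit integer goes in ACC/P, the rest on the stack. */
    for (i = 0; i < n; ++i) {
        if (args[i] != ARG_I64) continue;
        loc[i] = acc_used ? "stack" : "ACC/P";
        acc_used = 1;
    }

    /* (c) The first 32-bit argument goes in ACC if ACC is still free. */
    for (i = 0; i < n; ++i) {
        if (args[i] != ARG_I32) continue;
        loc[i] = acc_used ? "stack" : "ACC";
        acc_used = 1;
    }

    /* (d) Pointers go in XAR4, then XAR5, then on the stack. */
    for (i = 0; i < n; ++i) {
        if (args[i] != ARG_PTR) continue;
        if (!xar4_used)      { loc[i] = "XAR4"; xar4_used = 1; }
        else if (!xar5_used) { loc[i] = "XAR5"; xar5_used = 1; }
        else                 { loc[i] = "stack"; }
    }

    /* (e) 16-bit arguments try AL, AH, XAR4, XAR5 in that order.
     * AL and AH are the two halves of ACC, so they are only
     * available when ACC itself has not been used. */
    for (i = 0; i < n; ++i) {
        if (args[i] != ARG_I16) continue;
        if (!acc_used && !al_used)      { loc[i] = "AL";   al_used = 1; }
        else if (!acc_used && !ah_used) { loc[i] = "AH";   ah_used = 1; }
        else if (!xar4_used)            { loc[i] = "XAR4"; xar4_used = 1; }
        else if (!xar5_used)            { loc[i] = "XAR5"; xar5_used = 1; }
        else                            { loc[i] = "stack"; }
    }
}
```

Feeding the func1 example from the quoted text (long, long long, int, pointer) into this model gives stack, ACC/P, XAR5 and XAR4, which matches the placement shown there.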

• When returning values from a routine there is a set of rules to follow, defined in 7.3.2, spru514g. These rules are copied here for your convenience.

“6. The called function returns a value. It is placed in a register using the following convention:

16-bit integer value      AL
32-bit integer value      ACC
64-bit integer value      ACC/P
16- or 22-bit pointer     XAR4

If the target is FPU and a 32-bit float value is returned, the called function places this value in R0H.

If the function returns a structure, the caller allocates space for the structure and passes the address of the return space to the called function in XAR4. To return a structure, the called function copies the structure to the memory block pointed by the extra argument. In this way, the caller can be smart about telling the called function where to return the structure. For example, in the statement s= f(x), where S is a structure and F is a function that returns a structure, the caller can actually make the call as f(&s, x). The function f then copies the return structure directly into s, performing the assignment automatically.

If the caller does not use the return structure value, an address value of 0 can be passed as the first argument. This directs the called function not to copy the return structure.

You must be careful to properly declare functions that return structures both at the point where they are called (so that the extra argument is passed) and at the point where they are declared (so the function knows to copy the result). Returning 64-bit floating-point values (long double) are returned similarly to structures.”
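The structure return convention quoted above can be pictured in plain C. The sketch below is an illustration only: div_result, divide and divide_hidden_arg are hypothetical names, and the second function is a hand-written imitation of the hidden extra argument, not what the compiler literally generates.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical structure used only for this illustration. */
typedef struct {
    long quotient;
    long remainder;
} div_result;

/* What the programmer writes: a function that returns a structure. */
div_result divide(long num, long den)
{
    div_result r;
    r.quotient = num / den;
    r.remainder = num % den;
    return r;
}

/* Roughly how the call behaves under the convention: the caller passes the
 * address of the return space as an extra first argument (in XAR4 on the
 * C28x) and the called function fills it in. */
void divide_hidden_arg(div_result *ret, long num, long den)
{
    if (ret == NULL) {
        return; /* address 0: the caller does not use the result, skip the copy */
    }
    ret->quotient = num / den;
    ret->remainder = num % den;
}
```

An assembly routine that returns a structure to C would play the role of divide_hidden_arg: it receives the destination address and writes the members there itself.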

• Some registers can be used without saving them, because responsibility for preserving each register is assigned to either the calling function or the called function. There are four cases:
   o Save on entry: the called function must restore the register to its original value before returning
   o Not save on entry: the called function can corrupt the register
   o Save on call: the calling function must save the register before calling the function
   o Not save on call: the calling function doesn’t have to save the register

Making C like ASMs

Assembly is hard to read, especially when it is sparsely or not at all commented. Luckily, any good assembler has tools to make it easier to interpret. Additionally, assemblers may have functionality that makes interfacing with C code easy. Discussed below are just a few of these tools for CCS. For more depth on these tools and others take a look at spru513g, “TMS320C28x Assembly Language Tools v6.2.4.”

• .ASG and .define assembler directives

The .ASG and .define directives allow you to define a substitution symbol. For example, you can define XAR0 as myVar1. The difference between .define and .ASG is that .ASG symbols can be redefined later in the program whereas .define symbols can’t. The syntax for both is

        .ASG ["]character string["], substitution symbol
        .define ["]character string["], substitution symbol

Example:

        .ASG "XAR0", myVar1
        .define "*+XAR2", myPtr

        MOVL myVar1, myPtr[2]       ; Expands to MOVL XAR0, *+XAR2[2]

        ; .define "XAR3", myPtr     ; This is illegal: a .define symbol cannot be redefined
        .ASG "AR1", myVar1          ; This is legal: .ASG symbols can be redefined

        MOV myVar1, #0              ; Expands to MOV AR1, #0

• .cdecls

.cdecls is a directive that works much like #include "header_name.h", but it also converts the C objects into their assembly equivalents, so you can use C functionality such as structures in assembly! It even follows the #includes inside the header file. Function definitions, function-like macros, and variable definitions are ignored, but most other C or C++ constructs such as enumerations, structures, variable prototypes, and unions are converted to assembly equivalents. NOTE: all of these equivalents have their own assembler directives, so you can define them directly in the assembly; however, it’s much easier to just do it in a C header.

Syntax, single line:

        .cdecls [options ,] " filename "[, " filename2 "[,...]]

Syntax, multiple lines:

        .cdecls [options]
        %{
        /*------*/
        /* C/C++ code - Typically a list of #includes and a few defines */
        /*------*/
        %}

Example:

.cdecls C, NOLIST, "Filters.h"

This includes Filters.h as C code and does not add it to the listing file.

• Using C structs in assembly

Once you’ve included a header file containing a C structure with .cdecls, the structure is converted into constants. Each constant represents the offset from the base address of the structure to the memory location of a member. To access these constants use the following syntax:

        Name_of_struct.member_var
        Name_of_struct.member_struct.member_var

Example

Header

// Other code

typedef struct float_circle_buff {
    float* start;
    unsigned long size;
    unsigned long offset;
} float_circle_buff;

typedef struct fir {
    const float* coeffs;
    float_circle_buff in_buff;
    float freq;
    float dc_offset;
    float (*filter)(struct fir*);
} fir;

ASM

        ; Other code
        MOVL ACC, *+XAR4[fir.in_buff.start]   ; Load the in_buff.start member of the fir struct that XAR4 points to
        MOVL ACC, #fir.coeffs                 ; Load the offset of coeffs within fir as an immediate value
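The idea of a member name standing for an offset from the structure base has a direct C counterpart in offsetof. The sketch below is a host-side illustration under stated assumptions: the structures mirror the example header, read_size_via_offset is a hypothetical helper, and on a PC the offsets are counted in bytes, whereas on the C28x a char is 16 bits wide so the equivalent offsets count 16-bit words.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct float_circle_buff {
    float *start;
    unsigned long size;
    unsigned long offset;
} float_circle_buff;

typedef struct fir {
    const float *coeffs;
    float_circle_buff in_buff;
    float freq;
    float dc_offset;
    float (*filter)(struct fir *);
} fir;

/* Read in_buff.size through "base address + offset", which is what an
 * access such as *+XAR4[fir.in_buff.size] does in the assembly. */
unsigned long read_size_via_offset(const fir *f)
{
    const unsigned char *base = (const unsigned char *)f;
    /* Offset of a nested member = offset of the inner struct
     * plus the offset of the member inside it. */
    size_t off = offsetof(fir, in_buff) + offsetof(float_circle_buff, size);
    unsigned long value;

    memcpy(&value, base + off, sizeof value);
    return value;
}
```

The first member of a structure always sits at offset 0, which is why fir.coeffs can be reached through the structure pointer with no displacement at all.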

Example FIR filter

This example puts everything together into a full-fledged mixed C and assembly FIR filter. A C struct is defined in Filters.h along with function forward declarations and some other useful C constructs. The functions are defined in Filters.c. The filter struct has an embedded circle buffer struct, which is defined in Buffers.h and Buffers.c. A C callable assembly routine that performs the filtering is defined in fir.asm.

Filter Header file

/*
 * Filters.h
 *
 *  Created on: Nov 13, 2014
 *      Author: Kevin French
 */

#ifndef FILTERS_H_
#define FILTERS_H_

#include "Buffers.h"

/********************************************************************
 * FIR                                                              *
 ********************************************************************/

// Macro to define a default FIR, one initializer per struct member:
// coeffs, in_buff, freq, dc_offset, filter
#define DEFAULT_FIR {(const float*)0,        \
                     {(Uint16*)0, 0UL, 0UL}, \
                     0.0f, 0.0f, FirFilter_asm}

// Structures
typedef struct fir
{
    // Pointer to an array of floating point coefficients, must be the same size as in_buff
    const float* coeffs;
    // A circular buffer as defined in Buffers.h
    Uint16_circle_buff in_buff;
    // Frequency that the FIR will be run at
    float freq;
    // The dc offset for FIRs that filter out DC components, set to 0 otherwise
    float dc_offset;
    // Function pointer to the function that performs the math for the filter
    // Syntax: return_type (*name) (comma separated input list)
    float (*filter)(struct fir*);
} fir;

// Function Declarations

// C Filter function
float FirFilter_c(fir* self);
// Assembly Filter function
float FirFilter_asm(fir* self);
// Initialize a fir filter
void InitFir(fir* self, unsigned long buff_start, unsigned long size,
             const float* coeffs, float freq, float dc_offset,
             float (*filter) (fir*));

#endif /* FILTERS_H_ */

Filter C

/*
 * Filters.c
 *
 *  Created on: Nov 13, 2014
 *      Author: Kevin French
 */

#include "Filters.h"

// Initialization for fir filters
void InitFir(fir* self, unsigned long buff_start, unsigned long size,
             const float* coeffs, float freq, float dc_offset,
             float (*filter) (fir*))
{
    // TODO: implement error checking

    // Set all of the structure's members
    self->filter = filter;
    self->freq = freq;
    self->coeffs = coeffs;
    self->dc_offset = dc_offset;

    self->in_buff.size = size;
    self->in_buff.offset = 0;
    self->in_buff.start = (Uint16*)buff_start;
}

// Filter function for all fir filters in C
float FirFilter_c(fir* self)
{
    float sum = 0.0f;
    // Same type as in_buff.size so the loop comparison is not mixed signed/unsigned
    unsigned long i = 0;

    for(i = 0; i < self->in_buff.size; ++i)
    {
        sum += self->coeffs[i] * ReadUint16CircleBuff( &(self->in_buff), i );
    }

    // Add the dc offset back to the signal
    sum += self->dc_offset;

    return sum;
}

Circle Buffer Header

/*
 * Buffers.h
 *
 *  Created on: Nov 12, 2014
 *      Author: Kevin French
 */

#ifndef BUFFERS_H_
#define BUFFERS_H_
#include "DSP2833x_Device.h"

// Structures
typedef struct Uint16_circle_buff
{
    // Starting address pointer
    Uint16* start;
    // Size of the buffer
    unsigned long size;
    // Current offset
    unsigned long offset;
} Uint16_circle_buff;

// Functions
void WriteUint16CircleBuff(Uint16_circle_buff* c_buff, Uint16 data);

Uint16 ReadUint16CircleBuff(Uint16_circle_buff* c_buff, unsigned long offset);
#endif /* BUFFERS_H_ */

Circle Buffer C

/*
 * Buffers.c
 *
 *  Created on: Nov 12, 2014
 *      Author: Kevin French
 */
#include "Buffers.h"

void WriteUint16CircleBuff(Uint16_circle_buff* c_buff, Uint16 data)
{
    // Increment the offset value
    ++c_buff->offset;

    // Wrap the offset around if we need to
    if(c_buff->offset >= c_buff->size)
    {
        c_buff->offset = 0;
    }

    // Store the data at the correct location
    *(c_buff->start + c_buff->offset) = data;
}

Uint16 ReadUint16CircleBuff(Uint16_circle_buff* c_buff, unsigned long offset)
{
    // Read from the circle buffer with an offset
    // Grab the current offset and subtract the requested offset
    // (the subtraction is unsigned, so going below zero wraps to a large value)
    offset = c_buff->offset - offset;

    // Wrap around if necessary; adding the size brings a wrapped value back in range
    while(offset >= c_buff->size)
    {
        offset += c_buff->size;
    }

    // Return the correct data
    return *(c_buff->start + offset);
}
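The circle buffer and the C filter loop can be exercised on a PC before moving to the target. The sketch below is a host-side copy under stated assumptions: Uint16 is typedef'd from uint16_t to stand in for the type from DSP2833x_Device.h, and fir_sum is a hypothetical helper that repeats the loop of FirFilter_c with the pieces passed as plain arguments instead of a fir struct.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: stands in for the Uint16 typedef from DSP2833x_Device.h. */
typedef uint16_t Uint16;

typedef struct Uint16_circle_buff {
    Uint16 *start;
    unsigned long size;
    unsigned long offset;
} Uint16_circle_buff;

void WriteUint16CircleBuff(Uint16_circle_buff *c_buff, Uint16 data)
{
    /* Advance to the next slot, wrapping back to the first one. */
    ++c_buff->offset;
    if (c_buff->offset >= c_buff->size) {
        c_buff->offset = 0;
    }
    *(c_buff->start + c_buff->offset) = data;
}

Uint16 ReadUint16CircleBuff(Uint16_circle_buff *c_buff, unsigned long offset)
{
    /* Unsigned subtraction: a negative result wraps to a large value,
     * and adding the size brings it back into range. */
    offset = c_buff->offset - offset;
    while (offset >= c_buff->size) {
        offset += c_buff->size;
    }
    return *(c_buff->start + offset);
}

/* Hypothetical helper: same loop as FirFilter_c, without the fir struct. */
float fir_sum(const float *coeffs, Uint16_circle_buff *buf, float dc_offset)
{
    float sum = 0.0f;
    unsigned long i;

    /* coeffs[i] multiplies the sample written i steps ago. */
    for (i = 0; i < buf->size; ++i) {
        sum += coeffs[i] * ReadUint16CircleBuff(buf, i);
    }
    return sum + dc_offset;
}
```

With a four-entry buffer, writing the samples 1 through 5 overwrites the oldest sample, so reading with offsets 0, 1, 2 and 3 walks back through 5, 4, 3 and 2.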

FIR asm

; C callable function for fir filtering

; Include the filter header
        .cdecls C, NOLIST, "Filters.h"

; Globals and definitions
        .def _FirFilter_asm

        .text
; Performs an FIR filter on data in the filter's circular buffer
; See spru514g section 7.4 for details on C assembly interactions
; XAR4 contains the pointer to the FIR struct
        .ASG "XAR5", circle_buff_ptr
        .ASG "XAR6", coeff_ptr
        .ASG "XAR4", fir_ptr
        .ASG "ACC", end_ptr
        .ASG "R3H", buff_value
        .ASG "R1H", coeff_value
        .ASG "R2H", product
        .ASG "R0H", result

_FirFilter_asm:
        ; Preserve all save on entry registers
        ; None used

        ; Initialization
        ; Load circle buffer address position
        MOVL circle_buff_ptr, *+fir_ptr[fir.in_buff.start]  ; Circle buff pointer base
        MOVL ACC, *+fir_ptr[fir.in_buff.offset]
        ADDL @circle_buff_ptr, ACC                          ; Add the offset

        ; Load the buff end address
        MOVL end_ptr, *+fir_ptr[fir.in_buff.size]
        ADDL end_ptr, *+fir_ptr[fir.in_buff.start]

        ; Load coeffs address
        MOVL coeff_ptr, *+fir_ptr[fir.coeffs]

        ; Zero the result and product registers
        ZERO result
        ZERO product

        ; Do the filter
FirFilter_asm_loop1:
        ; Load buffer value
        UI16TOF32 buff_value, *circle_buff_ptr++

        ; Load coeff
        MOV32 coeff_value, *coeff_ptr++

        ; Multiply and accumulate in parallel
        ; Add the last product in: essentially result += product, then
        ; product = coeff_value * buff_value
        MPYF32 product, coeff_value, buff_value
        || ADDF32 result, result, product

        ; Check if it's time to wrap the circle buff ptr back to the start
        CMPL end_ptr, circle_buff_ptr
        SBF FirFilter_asm_loop1, NEQ

        ; Wrap the circle_buff_ptr back to the start
        MOVL circle_buff_ptr, *+fir_ptr[fir.in_buff.start]

        ; Load end address of coeffs
        MOVL end_ptr, *+fir_ptr[fir.in_buff.size]
        ; Floats take two 16-bit words since they're 32 bits wide, so left shift by one
        LSL end_ptr, 1
        ADDL end_ptr, *+fir_ptr[fir.coeffs]

FirFilter_asm_loop2:
        ; Check if done
        CMPL end_ptr, coeff_ptr
        SBF FirFilter_asm_final, EQ

        ; Load buffer value
        UI16TOF32 buff_value, *circle_buff_ptr++

        ; Load coeff
        MOV32 coeff_value, *coeff_ptr++

        ; Multiply and accumulate
        ; Add the last product in: essentially result += product, then
        ; product = coeff_value * buff_value
        MPYF32 product, coeff_value, buff_value
        || ADDF32 result, result, product

        SB FirFilter_asm_loop2, UNC

FirFilter_asm_final:
        ; Add the final product
        ADDF32 result, result, product

        ; The addressing mode used in this file for accessing data
        ; members only goes up to an offset of 7. The dc_offset member is
        ; farther than that, so the convoluted method below was used. This
        ; is not the optimal solution, but it works.
        ; Add the dc offset TODO: make this better
        MOVL ACC, fir_ptr                   ; Load ACC with fir_ptr
        MOVL fir_ptr, #fir.dc_offset        ; Load fir_ptr with the offset
        ADDL ACC, fir_ptr                   ; Add offset to ptr
        MOVL fir_ptr, ACC                   ; Reload ptr
        MOV32 buff_value, *fir_ptr          ; Grab dc offset
        ADDF32 result, buff_value, result   ; Add dc offset to result

        ; Restore all save on entry registers
        ; None used
        LRETR

Main Code

// Other code …
fir bandpass48_fir;

// Set the start address, size, coeffs, frequency, dc offset and the filter function
InitFir(&bandpass48_fir, start_address, BANDPASS48_NUM_TAPS, BANDPASS48_COEFFS_PTR, 48e3, 16768, FirFilter_asm);

// Other code …

// Interrupt fir callback
interrupt void FirISR()
{
    // Get some input
    // Store the input in a circle buff
    WriteUint16CircleBuff(&(bandpass48_fir.in_buff), input);

    // Invoke the filter function pointed to by bandpass48_fir.filter
    float output = bandpass48_fir.filter(&bandpass48_fir);
    // Do something with the output
}