PROGRAMMING

I. Programming
A. Options
1. C/C++
2. Linear Assembly
3. Assembly
4. Mixed: C/Assembly, Assembly/Assembly, etc.
B. The choice depends on the real-time requirements and on the optimization achieved with each language; this is examined in the next chapter.

II. Assembly Language Programming
A. Basics
1. Format

label parallel bars [condition] instruction unit operands ; comments

A label identifies a line of code or a variable and represents a memory address that contains either an instruction or data.

Ex. x_addr .short 1,2,3,4 ; numbers in x array (x_addr can be used as a pointer)

Labels must meet the following conditions:
- The first character must be a letter or an underscore ( _ ).
- The label must start in the first column. (No other element of the line can be in the first column.)
- Labels can include up to 32 alphanumeric characters.

An instruction that executes in parallel with the previous instruction is signified by parallel bars (||).

[ ] represents a condition. Registers A1, A2, B0, B1, and B2 are available for use as conditional registers. [A2] means that the following instruction executes if A2 is not zero. [!A2] means that the instruction executes if A2 is zero.

Instructions are either directives or mnemonics:
- Directives are commands for the assembler that control the assembly process or define the data structures (constants and variables) in the program. All assembler directives begin with a period.
- Mnemonics are the actual microprocessor instructions that execute at runtime and perform the operations in the program.

The functional unit in the assembly code is optional. The functional unit can be used to document which resource each instruction uses and can be used to optimize performance.

Operands – three types:
· Register operands indicate a register that contains the data.
· Constant operands specify the data within the assembly code.
· Pointer operands contain addresses of data values. Only the load and store instructions require and use pointer operands to move data values between memory and a register.

Instructions have the following requirements for operands in the assembly code:
· The destination operand must be in the same register file as one source operand.
· In an execute packet, for a functional unit on each side, one source operand can come from the opposite register file.

2. Assembler directives – commands for the assembler. They indicate assembly code sections and declare data types.

a. Directives that define sections

The smallest unit of an object file is called a section. A section is a block of code or data that occupies space in the memory map with other sections. There are two basic types of sections:
- Initialized sections contain data or code. Ex: .text, .data, .sect
- Uninitialized sections reserve space in the memory map for uninitialized data. They are a place for creating and storing variables during execution. Ex: .bss, .usect

Object files always contain three default sections:
.text section - contains executable code
.data section - contains initialized data
.bss section - reserves space for uninitialized variables

The assembler allows you to create sections through directives.

.text – declares that the following section is program code (not strictly needed, but used for memory allocation)
.data – declares that the following section is data
.sect "name" – defines a section of code or data named 'name' and associates subsequent code or data with that section. Ex: .sect "mydata"
.bss symbol, size in bytes – reserves size bytes for symbol in the .bss (uninitialized data) section
.usect "section name", size in bytes - reserves space in an uninitialized named section. The .usect directive is similar to the .bss directive, but it allows you to reserve space separately from the .bss section.

b. Directives that reference or define files

.def symbol - identifies a symbol that is defined in the current module and that can be used in another module. Defines a routine.
.ref symbol - identifies a symbol that is used in the current module but is defined in another module. Used for a subroutine called in the main program.
.global symbol - declares a symbol external so that it is available to other modules at link time. The .global directive does double duty, acting as a .def for defined symbols and as a .ref for undefined symbols.

A symbol can be declared global for either of two reasons:
- If the symbol is not defined in the current module, the .global or .ref directive tells the assembler that the symbol is defined in an external module.
- If the symbol is defined in the current module, the .global or .def directive declares that the symbol and its definition can be used externally by other modules.

c. Directives that initialize constants (data and memory)

.byte value – Initializes a byte in memory. Reserves 8 bits in memory and fills them with the specified value
.short value - Initializes a 16-bit integer, 2s complement/binary (-2^15 to 2^15-1)
.float value - Initializes a 32-bit floating-point constant
.int value or .word value – Initializes a 32-bit integer (-2^31 to 2^31-1)
.double value - Initializes a 64-bit constant in memory
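The integer ranges quoted above follow from the two's-complement rule that an n-bit value spans -2^(n-1) to 2^(n-1)-1. The following is a minimal C sketch of that arithmetic; the helper names twos_min and twos_max are hypothetical and are not part of any TI tool.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest value of an n-bit two's-complement integer: -2^(n-1).
 * Hypothetical helper for illustration; valid for 2 <= bits <= 32. */
int64_t twos_min(int bits)
{
    assert(bits >= 2 && bits <= 32);
    return -((int64_t)1 << (bits - 1));
}

/* Largest value of an n-bit two's-complement integer: 2^(n-1) - 1. */
int64_t twos_max(int bits)
{
    assert(bits >= 2 && bits <= 32);
    return ((int64_t)1 << (bits - 1)) - 1;
}
```

With bits = 16 this reproduces the .short range, and with bits = 32 the .int/.word range.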

d. Directives that define symbols at assembly time

symbol .equ value
symbol .set value
Both equate a constant value to a symbol.
*The symbol is a label that must appear in the label field.

3. Mnemonics – Instructions

a. Location: in CCS: Help > Contents > Instruction Set Summary

Each instruction entry provides the following:

Syntax: ADD (.unit) src1, src2, dst
.unit = .L1, .L2, .S1, .S2
src and dst indicate source and destination:
src – what is being operated on
dst – the result

Operand table:

Operand   Type    Unit       Mnemonic
src1      sint    .L1, .L2   ADD
src2      xsint
dst       sint

The operand table tells what types of operands can be used.

s means signed; u means unsigned. Any operand that begins with x can be read from a register file that is different from the destination register file – this is called a cross path.

Instruction type and delay slots. Useful for pipelining/constraints.

[Functional latency for a given instruction type can be found in CCS: Help > CPU Reference Guide > ‘C67x Pipeline > Pipeline Execution of Instruction Types]

b. Add/Subtract/Multiply

ADD .L1 A3, A7, A7 ; add A3+A7 -> A7

SUB .S1 A1, 1, A1 ; A1-1 -> A1 using the S unit

MPY .M2 A7,B7,B6 ;multiply 16 LSBs of A7,B7 -> B6
|| MPYH .M1 A7,B7,A6 ;multiply 16 MSBs of A7,B7 -> A6
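As a rough illustration of what the MPY/MPYH pair computes, the C sketch below models each register as a 32-bit value and multiplies either the lower or the upper signed 16-bit halves. The function names mpy and mpyh are hypothetical stand-ins for the instructions, and the model ignores functional units, cross paths and delay slots.

```c
#include <stdint.h>

/* Model of MPY: signed multiply of the 16 least significant bits of each
 * source. The cast to int16_t reinterprets the half-word as signed
 * (two's-complement behaviour, as on all mainstream compilers). */
int32_t mpy(uint32_t src1, uint32_t src2)
{
    return (int32_t)(int16_t)(src1 & 0xFFFFu) *
           (int32_t)(int16_t)(src2 & 0xFFFFu);
}

/* Model of MPYH: signed multiply of the 16 most significant bits of each
 * source. */
int32_t mpyh(uint32_t src1, uint32_t src2)
{
    return (int32_t)(int16_t)(src1 >> 16) *
           (int32_t)(int16_t)(src2 >> 16);
}
```

For example, if one register packs the half-words 3 (upper) and 2 (lower) and the other packs 5 and 4, mpy gives 8 and mpyh gives 15, so two products are obtained from one pair of 32-bit registers.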

c. Load/Store (.D unit)

The address register to be used must be on the same side as the .D unit.

LDH .D2 *B2++,B7 ; load (B2) -> B7, increment B2
|| LDH .D1 *A2++,A7 ; load (A2) -> A7, increment A2

The first instruction loads the half-word at the address pointed to by B2 into B7. B2 is then incremented to point to the next half-word in memory.

NOT valid: LDH .D2 *A2++,B7

LDW can be used to load a 32-bit word on each side, or LDDW to load a 64-bit double word (two 32-bit words) on each side. This is a way to bring in more data at one time.
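The post-increment load described above behaves much like the C expression *p++ on a pointer to short. The sketch below is an assumed C model of LDH with post-increment; ldh_postinc is a hypothetical name, and the four delay slots of a real load are not modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Model of  LDH *B2++,B7 : return the half-word at *addr, sign-extended
 * to 32 bits, then advance the pointer to the next half-word. */
int32_t ldh_postinc(const int16_t **addr)
{
    assert(addr != NULL && *addr != NULL);
    int32_t value = **addr;   /* int16_t -> int32_t sign-extends */
    (*addr)++;                /* pointer moves by one half-word (2 bytes) */
    return value;
}
```

Calling it repeatedly on a pointer into an array of shorts walks through the array one element at a time, which is how the dot-product loops later in these notes step through x and y.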

STW .D2 A1,*B4 ; store A1 -> (B4)

Stores the 32-bit word in A1 into the memory location whose address is in B4.

Data Address Paths

- The .D functional units access general memory (not register files) through data address paths. There is one for each side, T1 and T2. The .D unit on one side can use the data address path of the other side by appending T1 or T2 to the functional unit.
- Ex: LDW .D1T2 *A0, B3 or LDW .D1 *A0, B3
This instruction loads a 32-bit word from the address supplied by .D1, i.e. from register file A (*A0). The data address path T2 is used to put the data into the B register file, in B3.

d. Branch/Move

x .short 1, 2, 3

Loop    MVK   .S1 x,A4      ;move 16 LSBs of x address -> A4
        MVKH  .S1 x,A4      ;move 16 MSBs of x address -> A4
        .
        .
        .
        SUB   .S1 A1,1,A1   ;decrement A1
   [A1] B     .S2 Loop      ;branch to Loop if A1 is not equal to 0
        NOP   5
        STW   .D1 A3,*A7    ;store A3 into (A7)
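The MVK/MVKH pair is needed because a single instruction can carry only a 16-bit constant. The C sketch below models, under that assumption, how the two halves combine into a full 32-bit address; mvk, mvkh and load_address are hypothetical names used only for illustration.

```c
#include <stdint.h>

/* Model of MVK: place the 16 LSBs of the constant in the register,
 * sign-extended to 32 bits. */
uint32_t mvk(uint32_t constant)
{
    return (uint32_t)(int32_t)(int16_t)(constant & 0xFFFFu);
}

/* Model of MVKH: place the 16 MSBs of the constant in the upper half of
 * the register and leave the lower half of the register unchanged. */
uint32_t mvkh(uint32_t constant, uint32_t dst)
{
    return (constant & 0xFFFF0000u) | (dst & 0x0000FFFFu);
}

/* MVK followed by MVKH on the same register yields the full address. */
uint32_t load_address(uint32_t address)
{
    return mvkh(address, mvk(address));
}
```

The order matters in this model: MVK overwrites the whole register (because of the sign extension), so it has to come first, as in the listing above.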

e. Division – is done by taking the reciprocal of the denominator and then multiplying by the numerator. One single-precision floating point instruction: RCPSP
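Division by reciprocal can be sketched in C as follows. This is only an illustration under stated assumptions: rcp_estimate is a hypothetical stand-in for RCPSP that deliberately keeps about 8 mantissa bits of the reciprocal, and the two Newton-Raphson refinement steps x = x(2 - v·x) are an assumed way of recovering full single precision; the notes themselves only state that the reciprocal is multiplied by the numerator.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for RCPSP: an estimate of 1/v that is accurate
 * to roughly 8 mantissa bits (low 15 of the 23 mantissa bits cleared). */
static float rcp_estimate(float v)
{
    float r = 1.0f / v;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= 0xFFFF8000u;          /* keep sign, exponent, top 8 mantissa bits */
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* num / den computed as num * (1/den); den must be non-zero. */
float divide_by_reciprocal(float num, float den)
{
    float x = rcp_estimate(den);
    x = x * (2.0f - den * x);     /* Newton-Raphson step: ~16 good bits */
    x = x * (2.0f - den * x);     /* second step: full single precision */
    return num * x;
}
```

Each refinement step roughly doubles the number of correct bits, so two steps are enough to go from an 8-bit estimate to the 24-bit mantissa of a single-precision value.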

4. Cross-Paths

a. There are two data cross-paths, 1X and 2X, which allow the functional units on one side to access data from the register file on the other side.
b. There can be only two cross-path reads per cycle, one on each cross-path. So only one functional unit per data path per execute cycle can get an operand from the opposite register file.

Ex:

MPY .M2 A7,B7,B6 ;multiply 16 LSBs of A7,B7 -> B6
|| MPYH .M1 A7,B7,A6 ;multiply 16 MSBs of A7,B7 -> A6

B. Programming Constraints

1. Functional Unit Constraints

a. The same functional unit cannot be used by two instructions in parallel.

b. Location: in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline > Functional Unit Constraints

A functional unit may not be available to another instruction during an execute cycle because it is still performing an operation, such as reading or writing, for an earlier instruction. This is important for pipelining.

c. Ex. Let's look at two instructions: the 16x16 (integer – short) multiply MPY (a fixed-point instruction) and the 32x32 (integer) multiply MPYI (listed with the floating-point instructions).

MPY: in CCS: Help > Contents > Instruction Set Summary > 'C62x/'C64x/'C67x (Shared) Fixed-Point Instructions

Pipeline Stage   E1       E2
Read             src1,2
Write                     dst
Unit in use      .M

So we see that we have an instruction with a functional unit latency of 1 and a delay slot of 1.

MPYI: in CCS: Help > Contents > Instruction Set Summary > 'C67x (Specific) Floating-Point Instructions

Pipeline Stage   E1       E2       E3       E4       … … …   E9
Read             src1,2   src1,2   src1,2   src1,2
Write                                                         dst
Unit in use      .M       .M       .M       .M

So we have an instruction with a functional unit latency of 4 and a delay slot of 8.

So we find in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline > Functional Unit Constraints for MPYI:

Cycle                              1    2    3    4    … …  8    9
MPYI                               R    R    R    R              W

Subsequent Same Unit Instruction
16x16 multiply                          Xr   Xr   Xr        Xw

Xr = a read conflict, Xw = a write conflict

In other words, a MPY instruction cannot follow a MPYI instruction ON THE SAME unit during the MPYI's E1, E2, E3, E4, and E8 phases.

Valid:
MPYI .M1
ADD
SUB
ADD
MPY .M1

Or:
MPYI .M1
NOP 3
MPY .M1
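The constraint can be checked mechanically. The C sketch below encodes only what the notes state: MPYI reads its sources in E1 to E4 and writes in E9, while MPY reads in its own E1 and writes one cycle later. The function and enum names are hypothetical, and the model covers this one instruction pair on a single .M unit.

```c
/* Kind of conflict an MPY would hit on the unit running an MPYI. */
enum conflict { NO_CONFLICT, READ_CONFLICT, WRITE_CONFLICT };

/* cycle: execution cycle at which the MPY would issue on the same .M unit,
 * counted so that cycle 1 is the MPYI's E1. */
enum conflict mpy_after_mpyi(int cycle)
{
    if (cycle >= 1 && cycle <= 4)
        return READ_CONFLICT;    /* MPYI is still reading src1,2 (E1..E4) */
    if (cycle + 1 == 9)
        return WRITE_CONFLICT;   /* MPY would write dst in MPYI's E9 */
    return NO_CONFLICT;
}
```

In both valid sequences above the MPY issues in cycle 5, for which the function returns NO_CONFLICT; issuing it in cycle 8 would collide with the MPYI write.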

2. Cross-Path Constraints

a. Recall: only one functional unit per data path per execute cycle can get an operand from the opposite register file.

b. Location: in CCS: Help > Contents > CPU Reference Guide > 'C67x Pipeline > Functional Unit Constraints provides the cross-path constraints

So we find for the MPYI instruction:

Cycle                              1    2    3    4    … …  8    9
MPYI                               R    R    R    R              W

Same Side, Different Unit, Both Using Cross-Path
Single Cycle                            Xr   Xr   Xr

Xr = a read conflict

Valid:

MPYI .M1 A1,B1,A2
ADD .S1 3,A3,A3

Not Valid:

MPYI .M1 A1,B1,A2
ADD .S1 3,B3,A3

3. Load/Store Constraints

A load and a store executed in parallel cannot load to and store from the same register file.

Valid:

LDW .D1 *A0,B1 (use data address path T2 to load into B)
|| STW .D2 A1,*B2 (use data address path T1 to get data to store in memory)
(Note: both addresses come from the same register file as the functional unit)

Not valid:

LDW .D1 *A0,A1 (use data address path T1 to load data into A)
|| STW .D2 A2,*B2 (use data address path T1 to get data from A to store in memory)

C. File Structure

Main program: it is generally called init, with a vectors program that sets init as the starting address.

Calling Assembly Language Subroutines (true if called from C or assembly language)

- Arguments are passed to the subroutine through registers A4, B4, A6, … in that order.
- The result is returned in A4.
- The return address is in B3.
- When called from C, the assembly function name must begin with an underscore: _func
- The name of the *.c file cannot be the same as that of the *.asm file.
- In C, an external declaration of the assembly function is optional, e.g. extern int func();

D. Examples

Example 1: Assembly calling assembly program to do dot product

Dotp_init.asm: ASM program to init variables. Calls dotpfunc.asm

            .def   init              ;starting address
            .ref   dotpfunc          ;subroutine
            .text                    ;section for code follows
x_addr      .short 1,2,3,4           ;numbers in x array
y_addr      .short 0,2,4,6           ;numbers in y array
result_addr .short 0                 ;initialize sum of products,
                                     ;address for result

init        MVK    .S1 x_addr,A4     ;16 LSBs address of x in A4
            MVKH   .S1 x_addr,A4     ;16 MSBs address of x in A4
            MVK    .S2 y_addr,B4     ;B4 since we pass arguments in this way
            MVKH   .S2 y_addr,B4
            MVK    .S1 4,A6          ;A6 is another argument: size of array
            B      .S1 dotpfunc      ;branch to the subroutine
            MVK    .S2 ret_addr,B3   ;B3 is the return address for dotpfunc
            MVKH   .S2 ret_addr,B3
            NOP    3

ret_addr    MVK    .S1 result_addr,A0
            MVKH   .S1 result_addr,A0
            STW    .D1 A4,*A0        ;store result
wait        B      .S1 wait
            NOP    5

Dotpfunc.asm: Dot product subroutine

            .def   dotpfunc          ;define dot product function
            .text
dotpfunc    MV     A6,A1             ;move loop count to conditional register
            ZERO   A7                ;init A7 for sum of products

loop        LDH    .D1 *A4++,A2      ;load half-word x(1) to A2
            LDH    .D2 *B4++,B2      ;B2=y(1); these two could be in parallel
            NOP    4
            MPY    .M1 A2,B2,A3      ;A3=x*y
            NOP
            ADD    .L1 A3,A7,A7      ;sum of products in A7
            SUB    .L1 A1,1,A1       ;decrement loop counter
     [A1]   B      .S1 loop          ;branch back to loop until A1=0
            NOP    5

            MV     A7,A4             ;put the result into the return register
            B      .S2 B3            ;branch to addr in B3, the return address
            NOP    5
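For comparison, the same computation can be written as a plain C reference. This sketch is not part of the original example; the name dotp_ref is hypothetical, and it is only meant to show what value the assembly routine should leave in A4 for the data in Dotp_init.asm.

```c
#include <assert.h>
#include <stddef.h>

/* C reference for the dot product: sum of x[i]*y[i] for i = 0..count-1.
 * Unlike the assembly loop, which always runs at least once, this
 * version also handles count == 0. */
int dotp_ref(const short *x, const short *y, int count)
{
    assert(x != NULL && y != NULL && count >= 0);
    int sum = 0;
    for (int i = 0; i < count; i++)
        sum += x[i] * y[i];      /* corresponds to MPY followed by ADD */
    return sum;
}
```

For x = {1,2,3,4} and y = {0,2,4,6} the sum is 0 + 4 + 12 + 24 = 40.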

Example 2: C calling assembly language subroutine

Dotp.c

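#include <stdio.h>

#define count 4

short x[4] = {1,2,3,4};
short y[4] = {0,2,4,6};
int result;

extern int dotpfunc(short *, short *, int);   /* assembly routine _dotpfunc (declaration is optional) */

int main(void)
{
    result = dotpfunc(x,y,count);             /* arguments go to A4, B4, A6; result returns in A4 */
    printf("result = %d \n", result);
    return 0;
}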

Change dotpfunc.asm so that the function name begins with an underscore:

            .def   _dotpfunc
_dotpfunc   MV     A6,A1

III. Linear Assembly Programming

A. Basics

1. Assembly Optimizer

An assembly optimizer (instead of the C compiler) is used with a linear assembly program (*.sa) to create an assembly source program (*.asm). The result is usually more efficient than code generated by the C compiler.

The assembly optimizer assigns the functional units and registers to use, finds instructions that can execute in parallel, and performs pipelining.

2. General Programming

Parallel instructions are not valid in a linear assembly program.

Specifying the functional unit, register, or NOPs is optional.

Use the syntax of assembly code instructions, e.g. ADD, SUB.

Use operands as used in C. Variables are used to designate the registers.

A C program calling a linear assembly subroutine requires that the subroutine name begin with an underscore: _func.

3. Directives

.cproc and .endproc specify a C-callable procedure or section of code to be optimized by the assembly optimizer. The variables being passed must follow the .cproc directive: .cproc x,y,count

.proc and .endproc start and end a general procedure, i.e. one with no arguments passed.

.return is used to return result to calling function.

.reg is to declare variables and use descriptive names for values that will be stored in registers. When you use .reg, the assembly optimizer chooses a register whose use agrees with the functional units chosen for the instructions that operate on the value.

.reg a,b ;represents registers which will be determined by optimizer.

mv 5,a ;moves 5 to the register the optimizer assigns to a

.def defines a function

.trip must be included for the optimizer to pipeline the code for a loop. It specifies the number of times a loop iterates.
loop .trip 4,20,4 - the loop iterates a minimum of 4 times and a maximum of 20 times, in multiples of 4, i.e. 4, 8, 12, 16, 20
loop .trip 4 - the loop iterates at least 4 times
loop .trip 4,10 - the loop iterates a minimum of 4 times and a maximum of 10 times
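The meaning of the three .trip operands can be expressed as a small predicate. The C sketch below is an illustrative reading of the description above, not the optimizer's actual logic; trip_allows and its parameter names are hypothetical.

```c
#include <stdbool.h>

/* Returns true if an iteration count n is consistent with
 *   loop .trip min,max,factor
 * i.e. min <= n <= max and n is a multiple of factor. */
bool trip_allows(int n, int min, int max, int factor)
{
    return factor > 0 && n >= min && n <= max && n % factor == 0;
}
```

For .trip 4,20,4 the counts 4, 8, 12, 16 and 20 pass the check, while 10 (not a multiple of 4) and 24 (above the maximum) do not.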

B. Example

Dot product: C program calling a linear assembly program

Dotp.c

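#include <stdio.h>

#define count 4

short x[4] = {1,2,3,4};
short y[4] = {0,2,4,6};
int result;

extern int dotpfunc(short *, short *, int);   /* linear assembly routine _dotpfunc (declaration is optional) */

int main(void)
{
    result = dotpfunc(x,y,count);
    printf("result = %d \n", result);
    return 0;
}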

Dotpfunc.sa Linear assembly program to do a dot product

            .def   _dotpfunc         ;defines the function
_dotpfunc   .cproc x,y,count         ;start linear asm section
            .reg   a,b,prod,sum      ;define variables for registers

            ZERO   sum               ;initialize sum of products
loop        .trip  4,4               ;exactly 4 iterations through loop
            LDH    *x++,a            ;pointer to x array -> a
            LDH    *y++,b            ;put an element of y into b
            MPY    a,b,prod          ;prod=x*y
            ADD    prod,sum,sum      ;sum of products -> sum
            SUB    count,1,count     ;decrement counter
   [count]  B      loop              ;go to loop if count is not equal to 0

            .return sum              ;return sum as the result
            .endproc                 ;end linear assembly function
