Easy Question 1: Describe the motivation to implement a hardware loop, and when is it inappropriate to implement one?

The advantage of a hardware loop is that no instruction cycles are wasted on loop overhead in each iteration. This is because the counter update and comparison are performed in hardware and do not take any clock cycles.

A hardware loop cannot be used if any function calls are made within the loop (i.e. the loop body must behave like a leaf function), or if the number of loop iterations cannot be determined before entering the loop.
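The two conditions above can be illustrated in C. This is only a sketch: the function names are hypothetical, and whether a given compiler actually emits a hardware loop for the first function depends on the toolchain and optimization settings.

```c
#include <assert.h>
#include <stddef.h>

/* Candidate for a hardware loop: the trip count n is known before the
   loop is entered, and the body makes no function calls. */
float sum_fixed(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++) {
        acc += x[i];
    }
    return acc;
}

/* Not a candidate for a counted hardware loop: the number of iterations
   depends on the data and cannot be loaded into a counter up front. */
size_t count_until_zero(const float *x, size_t max_len)
{
    assert(x != NULL);
    size_t i = 0;
    while (i < max_len && x[i] != 0.0f) {
        i++;
    }
    return i;
}
```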

Hard Question 1:

Write a test/set of tests to validate the response of the following FIR filter function: void FIRFilter(float fInput_, float *pfOutput_, float afCoefs_[], float afHistory_[], uint32_t u32Size_);
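To make the expected outputs easier to reason about, here is a minimal sketch of what a function with this signature might do. The name FIRFilter_Sketch is hypothetical, and the history layout (index 0 holds the newest sample) is an assumption; the real FIRFilter may be organized differently.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference FIR: one input sample in, one output sample out.
   ASSUMPTION: afHistory_[0] holds the newest sample. */
void FIRFilter_Sketch(float fInput_, float *pfOutput_, float afCoefs_[],
                      float afHistory_[], uint32_t u32Size_)
{
    assert(u32Size_ > 0);

    /* Shift the history down by one sample and insert the new input. */
    for (uint32_t i = u32Size_ - 1; i > 0; i--) {
        afHistory_[i] = afHistory_[i - 1];
    }
    afHistory_[0] = fInput_;

    /* Convolution sum: y[n] = sum over k of coefs[k] * x[n - k]. */
    float fSum = 0.0f;
    for (uint32_t k = 0; k < u32Size_; k++) {
        fSum += afCoefs_[k] * afHistory_[k];
    }
    *pfOutput_ = fSum;
}
```

Under these assumptions, a single impulse of amplitude A produces the coefficients scaled by A, one per call, followed by zeros.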

//Answer
#include "filter.h"

#define FILTER_SIZE 4
#define INPUT_SIZE 12
#define TOLERANCE 0.0001f

void Test_FIR(void)
{
    //Make coefficients
    float afCoefs[FILTER_SIZE] = {1.0f, 2.0f, 3.0f, 4.0f};

    //History buffer (one entry per coefficient)
    float afHistory[FILTER_SIZE] = {0.0f};

    //Impulse
    float afImpulse[INPUT_SIZE] = {0.0f};

    //Make sure this is not 1.0 to check for funky multiplication errors
    afImpulse[0] = 42.0f;
    afImpulse[FILTER_SIZE] = 3.0f;

    //output array (don't need to initialize)
    float afOutput[INPUT_SIZE];

    //run filter, one input sample per call
    for(int i = 0; i < INPUT_SIZE; i++)
    {
        FIRFilter(afImpulse[i], &afOutput[i], afCoefs, afHistory, FILTER_SIZE);
    }

    //check filter
    for(int i = 0; i < FILTER_SIZE; i++)
    {
        //Impulse of 42
        CHECK_CLOSE(afOutput[i], afCoefs[i]*afImpulse[0], TOLERANCE);

        //Impulse of 3
        CHECK_CLOSE(afOutput[FILTER_SIZE + i], afCoefs[i]*afImpulse[FILTER_SIZE], TOLERANCE);

        //Zero
        //Should be exactly 0.0, not close, for index 8 through 11
        CHECK_EQUAL(afOutput[2*FILTER_SIZE + i], 0.0f);

        //Check that coefficients were not modified (should be 1, 2, 3, 4)
        CHECK_CLOSE(afCoefs[i], (float)(i + 1), TOLERANCE);
    }

    //Done first test, make sure we reset the history buffer and the output and the input for the next
    for(int i = 0; i < FILTER_SIZE; i++) { afHistory[i] = 0.0f; }
    for(int i = 0; i < INPUT_SIZE; i++) { afImpulse[i] = 0.0f; afOutput[i] = 0.0f; }

    //more tests go here but this is most of a legal sheet for a quiz question...
    //Other examples include:
    //timing test,
    //test with non-impulse inputs to ensure convolution is working properly,
    //etc...
}

Easy Question 2: What are Data Address Generators?

The DSP reduces the number of instructions required to manage memory access by using two specialized Data Address Generators (DAGs), one for each of the two memories: the first DAG supplies an address over the Data Memory address bus, the other over the Program Memory address bus. Each DAG consists of Index, Modify, Base and Length registers. Index and Modify registers were used during assignment 1 and lab 1. Modify registers are the only ones that can be used with any Index register in the same DAG; the others must be used together as a group. It is important to know that the DAGs are disconnected: unless we use an intermediary data register to move data from one DAG to the other, the two DAGs can't share information.
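How the four register types cooperate can be sketched with a small C model. This is a simplified illustration, not the hardware's exact behaviour: the struct, the function names, and the wrap-around rule (a Length of 0 means linear addressing, and the modify step is no larger than the buffer length) are assumptions made for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model of one Index/Modify/Base/Length register group. */
typedef struct {
    uint32_t index;   /* I: current address */
    int32_t  modify;  /* M: signed step added to I */
    uint32_t base;    /* B: start address of a circular buffer */
    uint32_t length;  /* L: circular buffer length (0 = linear addressing) */
} DagRegs;

/* Post-modify: use I as the address, then update I by M (with wrap-around). */
uint32_t dag_post_modify(DagRegs *d)
{
    uint32_t addr = d->index;
    int64_t next = (int64_t)d->index + d->modify;

    if (d->length != 0) {
        /* ASSUMPTION: the step never exceeds the buffer length. */
        assert(d->modify <= (int64_t)d->length &&
               d->modify >= -(int64_t)d->length);
        if (next >= (int64_t)d->base + d->length) {
            next -= d->length;
        } else if (next < (int64_t)d->base) {
            next += d->length;
        }
    }
    d->index = (uint32_t)next;
    return addr;
}

/* Pre-modify: use I + M as the address and leave I unchanged. */
uint32_t dag_pre_modify(const DagRegs *d)
{
    return (uint32_t)((int64_t)d->index + d->modify);
}
```

With a base of 100, a length of 4 and a modify value of 1, successive post-modify accesses in this model visit addresses 100, 101, 102, 103 and then wrap back to 100.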

Hard Question 2: You are given a piece of code in assembly language for what will be an averaging filter function. Find the code defects and explain what the code is doing when you have fixed the program.

1  #include
2  #include "Assign1Library.h"
3  .section/dm seg_dmda;
4  .section/pm seg_pmco;
5  .global _AverageSingleAudioValue_ASM;
6  _AverageSingleAudioValue_LeftASM:
7  pointer_I10=FIFO_Left;
8  loopCounter_R0=0;
9  loopMax_R1 = N;
10 loopMax_R1=loopMax_R1-1;
11 LOOP1_START:
12 COMP(loopCounter_R0, loopMax_R1);
13 IF GE JUMP LOOP1_END (DB);
14 nop;
15 nop;
16 temp_R2=1;
17 temp_R2 = temp_R2 + loopCounter_R0;
18 addressIndex_M4=temp_R2;
19 tempFloat_F12 =dm(addressIndex_M4, pointer_I10);
20 addressIndex_M4=loopCounter_R0;
21 dm(addressIndex_M4)= tempFloat_F12 ;
22 temp_R2=1;
23 loopCounter_R0=loopCounter_R0+temp_R2;
24 JUMP MP LOOP1_START(DB);
25 nop;
26 nop;
27 LOOP1_END:

line 2: including a CPP library in an assembly file: you want to make sure that you have a #define __IS__ASM both in this file and in your CPP library, so that it contains only the definitions corresponding to your assembly function.

line 6: Mind the typos! _AverageSingleAudioValue_ASM: is correct.

line 7: Be sure that FIFO_Left is correctly declared.

line 10: temp_R2=1; loopMax_R1=loopMax_R1-temp_R2; For a small constant (like 1), it is `acceptable` not to use a register; otherwise you must use one.

line 19: you can't use M4 together with I10, as they are in two different DAGs. Unless we use an intermediary data register to move data from one DAG to the other, use registers from the same DAG: (I0-I7, M0-M7) or (I8-I15, M8-M15). Second defect: I10 is a call-preserved register! To use it, the content of I10 must be saved beforehand and restored afterwards, because it will be modified. Use a scratch register such as I4 instead (scratch registers do not need to be saved/restored). We will then need to rename pointer_I10 to pointer_I4.

line 21: dm(addressIndex_M4) is neither a pre-modify nor a post-modify access. Use dm(addressIndex_M4, pointer_I4), which is a pre-modify access: it brings M4 in, adds it to I4, and uses I4 + M4 as the address to access the memory (I4 is unchanged). Pre-modify requires more bits and takes more time than post-modify!

The program initializes the registers needed for the loop. It then loops with the counter starting at 0 and stopping when it reaches N-1, to push all values down the FIFO:
- If the loop counter is less than loopMax, we enter the loop. We set R2 to 1, add the loop counter to it, and store the result (i+1) in a modify register.
- We use a pre-modify operation (which does not change the index value!) to perform the following `C` operation: FIFO[i]=FIFO[i+1]; we then increment the loop counter by one and start the loop again, until loopCounter_R0=loopMax_R1.
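For comparison, the behaviour described above corresponds roughly to the following C sketch. The function name and parameters are hypothetical; in the assembly, N and FIFO_Left come from the surrounding project.

```c
#include <assert.h>
#include <stddef.h>

/* C sketch of the corrected loop: every element takes the value of its
   upper neighbour, so the data moves one slot down the FIFO. The last
   slot is left unchanged, ready to be overwritten by a new sample. */
void shift_fifo_down(float *fifo, size_t n)
{
    assert(fifo != NULL && n > 0);
    for (size_t i = 0; i < n - 1; i++) {
        fifo[i] = fifo[i + 1];
    }
}
```

The loop bound n - 1 mirrors loopMax_R1 = N - 1 in the assembly: the final element has no upper neighbour to copy from.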