Spring 2012, Prof. Hyesoon Kim
• Nintendo DS introduction
• Introduction to Nintendo DS programming
• ARM architecture
• Friday:
  – ARM architecture/assembly code
• Next Wednesday:
  – ARM assembly coding (lab day): introduction task #2
• Next Friday:
  – Assignment #1

• 1st part of the programming platform
• Programming with the Nintendo DS
• http://www.cc.gatech.edu/~hyesoon/spr12/intro1.html
• Installation guide and Hello World

Source: http://www.cosc.brocku.ca/Offerings/3P92/seminars/nintendo_ds_slideshow.pdf
• Dual TFT LCD screens
• CPUs
  – ARM7TDMI (33 MHz)
  – ARM946E-S, an ARM9 core (67 MHz)
• Main memory: 4 MB RAM
  – VRAM: 656 KB
• 2D graphics
  – Up to 4 backgrounds
• 3D graphics

• Both CPUs can run code at the same time.
• The ARM7 is the only CPU that controls the touch screen.
  – based

• devkitPro is a collection of toolchains for homebrew application developers targeting various architectures
• devkitARM: the toolchain that produces ARM binaries
• Not an official development toolchain
  – Much simpler and more naïve
• libnds
  – Started as header files with hardware definitions
  – Extended with other data structures and simple APIs
• *.nds
  – A binary for the Nintendo DS, with separate regions for ARM7 and ARM9 code

Source: http://patater.com/files/projects/manual/manual.html#id2612503

    #include <nds.h>
    #include <stdio.h>

    volatile int frame = 0;  // incremented once per frame by the VBlank handler

    void Vblank(void) {
        frame++;
    }

    int main(void) {
        consoleDemoInit();            // initialize the console
        irqSet(IRQ_VBLANK, Vblank);   // when the IRQ_VBLANK interrupt occurs, execute Vblank()
        iprintf(" Hello DS dev'rs\n");
        while (1) {
            iprintf("\x1b[10;0HFrame = %d", frame);  // print the current frame number
            swiWaitForVBlank();  // pause the loop until the next IRQ_VBLANK interrupt,
                                 // so we print only once per frame
        }
        return 0;
    }

• Instead of pure assembly coding, we will use inline assembly programming
• Inline assembly is not specific to ARM; it is also used for x86 and other architectures
• Good places to look:
  – http://www.ibiblio.org/gferg/ldp/GCC-Inline-Assembly-HOWTO.html#ss5.3
  – http://www.ethernut.de/en/documents/arm-inline-asm.html
• NOP example:

    asm(
        "mov r0, r0\n\t"
        "mov r0, r0\n\t"
        "mov r0, r0\n\t"
        "mov r0, r0"
    );

• Use delimiters to separate assembly lines: a linefeed (\n), usually followed by a tab (\t)

Source: http://www.ethernut.de/en/documents/arm-inline-asm.html

• ARM is short for Advanced RISC Machines Ltd.
  – Founded in 1990, owned by Acorn, Apple and VLSI
• Acorn was known as a computer manufacturer before ARM was formed
• ARM's processor designs are among the most widely licensed
• Used especially in portable devices because of low power consumption and reasonable performance (MIPS/watt)
• ARM does not fabricate silicon; it licenses its designs

Source: http://tisu.it.jyu.fi/embedded/TIE345/luentokalvot/Embedded_3_ARM.pdf
• 32-bit wide instructions (plus the 16-bit compressed Thumb format)
• Load-store instruction set architecture
• 3-address data processing instructions
• Conditional execution of every instruction
• Powerful load-multiple and store-multiple register instructions
• A shift operation and an ALU operation combined in a single instruction that executes in a single clock cycle
• Open instruction set extension through the coprocessor instructions, including adding new registers and data types to the programmer's model
• Compressed 16-bit Thumb architecture
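The "shift plus ALU operation in one instruction" point can be modeled in plain C. This is a sketch, not an emulator: the `ShiftType` enum and the function names are made up for illustration, and only immediate shift amounts of 1 to 31 are modeled (amounts 0 and 32 have special meanings on ARM). `add_shifted` mirrors what `ADD Rd, Rn, Rm, LSL #2` computes in a single instruction.

```c
#include <stdint.h>

/* Shift types of the ARM barrel shifter (names chosen for this sketch). */
typedef enum { SH_LSL, SH_LSR, SH_ASR, SH_ROR } ShiftType;

/* Shift the second operand by 1..31 bits, as the barrel shifter does
   before the value reaches the ALU. */
uint32_t barrel_shift(uint32_t v, ShiftType type, unsigned amount) {
    switch (type) {
    case SH_LSL: return v << amount;
    case SH_LSR: return v >> amount;
    case SH_ASR: /* arithmetic shift: copy the sign bit into the vacated bits */
        return (v & 0x80000000u) ? (v >> amount) | ~(0xFFFFFFFFu >> amount)
                                 : v >> amount;
    case SH_ROR: return (v >> amount) | (v << (32 - amount));
    }
    return v;
}

/* Model of ADD Rd, Rn, Rm, <shift> #amount: one shift and one add, one instruction. */
uint32_t add_shifted(uint32_t rn, uint32_t rm, ShiftType type, unsigned amount) {
    return rn + barrel_shift(rm, type, amount);
}
```

For example, `add_shifted(r1, r2, SH_LSL, 2)` computes `r1 + 4*r2`, which ARM can do as one `ADD r0, r1, r2, LSL #2`.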

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
• Data processing (ALU) operations write results only into registers
• Memory operations only copy data (from memory to registers, or from registers to memory)
• ARM does not support memory-to-memory operations
• ARM instructions fall into three categories
  – 1. Data processing instructions
  – 2. Data transfer instructions
    • memory to/from registers, exchange memory and register (system only)
  – 3. Control flow instructions
    • branch instructions, branch and link (saving the return address), trap instructions (supervisor calls)
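To see what "no memory-to-memory operations" costs, here is a toy load-store machine in C. Everything in it is hypothetical and simplified (memory is indexed by word rather than by byte address); the point is only that adding two values in memory takes two loads, one register-only ALU operation, and one store.

```c
#include <stdint.h>

/* Toy load-store machine (hypothetical): 16 registers, small word-indexed memory. */
typedef struct {
    uint32_t r[16];
    uint32_t mem[64];
} Machine;

static void ldr(Machine *m, int rd, int word) { m->r[rd] = m->mem[word]; }  /* memory -> register */
static void str(Machine *m, int rs, int word) { m->mem[word] = m->r[rs]; }  /* register -> memory */
static void add(Machine *m, int rd, int rn, int rm) {                       /* registers only */
    m->r[rd] = m->r[rn] + m->r[rm];
}

/* mem[dst] = mem[a] + mem[b] */
void add_mem(Machine *m, int dst, int a, int b) {
    ldr(m, 0, a);      /* LDR r0, [a]    */
    ldr(m, 1, b);      /* LDR r1, [b]    */
    add(m, 2, 0, 1);   /* ADD r2, r0, r1 */
    str(m, 2, dst);    /* STR r2, [dst]  */
}
```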

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
ARM register organization (the User and System modes share the same registers):

| Registers | Usable in user mode | Banked copies (system modes only) |
|---|---|---|
| r0–r7 | yes | none, shared by all modes |
| r8–r12 | yes | FIQ only (r8_fiq–r12_fiq) |
| r13 (sp), r14 (lr) | yes | one copy each for FIQ, IRQ, SVC, Undef, Abort |
| r15 (pc) | yes | none, shared by all modes |
| cpsr | yes | none, shared by all modes |
| spsr | no | one per exception mode: FIQ, IRQ, SVC, Undef, Abort |

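CPSR format:

| 31 | 30 | 29 | 28 | 27–8 | 7 | 6 | 5 | 4–0 |
|---|---|---|---|---|---|---|---|---|
| N | Z | C | V | unused | I | F | T | mode |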

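• N: Negative (result of the last ALU operation)
• Z: Zero (result of the last ALU operation)
• C: Carry (from the last ALU operation or from the shifter)
• V: oVerflow (signed overflow in the last ALU operation)
• I, F: when set, disable IRQ and FIQ interrupts respectively
• T: set when the processor is in Thumb state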
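The flag definitions can be checked with a small C model of a flag-setting add (`ADDS Rd, Rn, Rm`). This is an illustrative sketch: the `CPSR_*` macro names are made up here, and only the add case is modeled (subtraction computes carry differently on ARM).

```c
#include <stdint.h>

/* CPSR bit positions (macro names are hypothetical). */
#define CPSR_N (1u << 31)
#define CPSR_Z (1u << 30)
#define CPSR_C (1u << 29)
#define CPSR_V (1u << 28)
#define CPSR_I (1u << 7)
#define CPSR_F (1u << 6)
#define CPSR_T (1u << 5)
#define CPSR_MODE_MASK 0x1Fu

/* NZCV flags that ADDS would produce for a + b. */
uint32_t adds_flags(uint32_t a, uint32_t b) {
    uint32_t r = a + b;                     /* wraps modulo 2^32 like the ALU */
    uint32_t flags = 0;
    if (r & 0x80000000u) flags |= CPSR_N;   /* result negative as a signed value */
    if (r == 0)          flags |= CPSR_Z;   /* result is zero */
    if (r < a)           flags |= CPSR_C;   /* unsigned carry out of bit 31 */
    if (~(a ^ b) & (a ^ r) & 0x80000000u)   /* operands share a sign, result differs */
        flags |= CPSR_V;                    /* signed overflow */
    return flags;
}
```

For example, `0x7FFFFFFF + 1` sets N and V (a positive overflow into a negative result), while `0xFFFFFFFF + 1` sets Z and C.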

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition

| CPSR[4:0] | Mode | Use | Registers |
|---|---|---|---|
| 10000 | User | Normal user code | user |
| 10001 | FIQ | Processing fast interrupts | _fiq |
| 10010 | IRQ | Processing standard interrupts | _irq |
| 10011 | SVC | Processing software interrupts (SWIs) | _svc |
| 10111 | Abort | Processing memory faults | _abt |
| 11011 | Undef | Handling undefined instruction traps | _und |
| 11111 | System | Running privileged user tasks | user |
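A minimal C sketch that decodes the mode field CPSR[4:0] using the encodings from the mode table. The function names are hypothetical; encodings not in the table return "invalid".

```c
#include <stdint.h>

/* Name of the processor mode held in CPSR[4:0]. */
const char *mode_name(uint32_t cpsr) {
    switch (cpsr & 0x1Fu) {
    case 0x10: return "User";    /* 10000 */
    case 0x11: return "FIQ";     /* 10001 */
    case 0x12: return "IRQ";     /* 10010 */
    case 0x13: return "SVC";     /* 10011 */
    case 0x17: return "Abort";   /* 10111 */
    case 0x1B: return "Undef";   /* 11011 */
    case 0x1F: return "System";  /* 11111 */
    default:   return "invalid";
    }
}

/* Every mode other than User runs privileged. */
int is_privileged(uint32_t cpsr) {
    return (cpsr & 0x1Fu) != 0x10u;
}
```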

Software interrupt: supervisor calls

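Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
• Memory is a linear array of byte addresses
• Data formats: 8-bit bytes, 16-bit half-words, 32-bit words
• Aligned address accesses
• Little endian: the byte at the lowest address is the least significant byte of a word
[Figure: little-endian memory organization; each word spans bit 31 to bit 0, with byte 0 in the lowest bits and byte 1 next; byte addresses 0 to 23 shown]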
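Little-endian word assembly and the alignment rule can be written down in a few lines of C. This sketch builds the word with explicit shifts so the result does not depend on the byte order of the machine running it; the function names are made up for illustration.

```c
#include <stdint.h>
#include <stdbool.h>

/* An access of `size` bytes (1, 2 or 4) is aligned when addr is a multiple of size. */
bool is_aligned(uint32_t addr, uint32_t size) {
    return (addr & (size - 1)) == 0;
}

/* Little endian: the byte at the lowest address becomes bits 7..0 of the word. */
uint32_t load_word_le(const uint8_t *mem, uint32_t addr) {
    return (uint32_t)mem[addr]
         | (uint32_t)mem[addr + 1] << 8
         | (uint32_t)mem[addr + 2] << 16
         | (uint32_t)mem[addr + 3] << 24;
}
```

With the bytes `78 56 34 12` stored at addresses 0..3, the word at address 0 reads as `0x12345678`.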


• 3-stage pipeline: Fetch / Decode / Execute
• Multi-cycle execution is allowed
• Register bank: two read ports and one write port
  – plus an additional read port and write port for r15 (program counter)
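Two quick consequences of the 3-stage pipeline, as a C sketch with made-up function names. The cycle count assumes an ideal pipeline (single-cycle instructions, no stalls or branches). The r15 value reflects the pipeline: by the time an instruction executes, two more instructions have been fetched, so in ARM state it reads its own address + 8 (+ 4 for most Thumb instructions).

```c
#include <stdint.h>
#include <stdbool.h>

/* Ideal k-stage pipeline: the first instruction takes k cycles,
   then one instruction completes per cycle. */
unsigned pipeline_cycles(unsigned stages, unsigned n_instructions) {
    return n_instructions == 0 ? 0 : stages + n_instructions - 1;
}

/* Value an instruction at `addr` sees when it reads r15. */
uint32_t pc_read_value(uint32_t addr, bool thumb) {
    return addr + (thumb ? 4u : 8u);
}
```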

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
• 5-stage pipeline: Fetch / Decode / Execute / Memory / Write-back
• Introduces a forwarding path
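Why a forwarding path matters can be shown with a simplified stall counter. This is a textbook approximation, not a cycle-accurate ARM9 model: it assumes ALU results are always forwarded to the next instruction, and that only a load whose result is used by the very next instruction costs one stall cycle (the loaded data is not ready until the end of the Memory stage). The `Instr` struct is hypothetical.

```c
#include <stdbool.h>

/* One instruction: destination register (-1 for none), source registers
   (-1 for none), and whether it is a load. */
typedef struct { int dst; int src1; int src2; bool is_load; } Instr;

/* Stall cycles in a 5-stage pipeline with forwarding (simplified model). */
unsigned stall_cycles(const Instr *code, unsigned n) {
    unsigned stalls = 0;
    for (unsigned i = 1; i < n; i++) {
        const Instr *prev = &code[i - 1];
        bool uses_prev = prev->dst >= 0 &&
                         (code[i].src1 == prev->dst || code[i].src2 == prev->dst);
        if (prev->is_load && uses_prev)
            stalls++;  /* load-use: wait one cycle for the memory data */
    }
    return stalls;
}
```

Reordering an independent instruction between the load and its use removes the stall in this model.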

Steve Furber, ARM system-on-chip architecture 2nd edition • 2-Phase non-overlapping clock scheme


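• SPSR (Saved Program Status Register): when an exception is taken, the CPSR is copied into the SPSR of the new mode so that it can be restored on return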
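The role of the SPSR during an IRQ can be sketched in C. Only the registers involved are modeled, the names are hypothetical, and the sketch assumes the low exception vector table (IRQ vector at 0x18). On entry the CPSR is saved to SPSR_irq, the return address goes into the banked r14, and the core switches to IRQ mode in ARM state with IRQs disabled; `return_from_irq` does what `SUBS pc, lr, #4` does.

```c
#include <stdint.h>

typedef struct {
    uint32_t cpsr;
    uint32_t spsr_irq;  /* banked SPSR of IRQ mode */
    uint32_t lr_irq;    /* banked r14 of IRQ mode */
    uint32_t pc;
} Core;

/* next_instr: address of the instruction that would have executed next. */
void take_irq(Core *c, uint32_t next_instr) {
    c->spsr_irq = c->cpsr;          /* save the interrupted status */
    c->lr_irq = next_instr + 4;     /* IRQ return address convention */
    c->cpsr = (c->cpsr & ~0x3Fu)    /* clear T (bit 5) and mode bits 4..0 */
            | 0x80u                 /* set I: further IRQs disabled */
            | 0x12u;                /* mode 10010 = IRQ */
    c->pc = 0x18;                   /* IRQ vector (low vectors assumed) */
}

/* Like SUBS pc, lr, #4 in IRQ mode: restore CPSR from SPSR and resume. */
void return_from_irq(Core *c) {
    c->cpsr = c->spsr_irq;
    c->pc = c->lr_irq - 4;
}
```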

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
• Thumb instructions are 16 bits long
• Similarities with the ARM ISA
  – A load-store architecture with data processing, data transfer, and control-flow instructions
  – Supports bytes, half-words, and words (aligned accesses)
  – A 32-bit unsegmented memory
• Differences
  – Most Thumb instructions are executed unconditionally
    • whereas all ARM instructions are executed conditionally
  – Many Thumb data processing instructions use a 2-address format
  – Thumb instruction formats are less regular than those of the ARM ISA

Source: Steve Furber, ARM System-on-Chip Architecture, 2nd edition
• ARM7: 3-stage pipeline, 16 visible 32-bit registers, 32-bit instruction set
• TDMI
  – T: Thumb instruction set
  – D: Debug interface
  – M: (hardware) Multiplier
  – I: EmbeddedICE (in-circuit emulation) debug hardware
• ARM7TDMI is the most commonly used ARM7 core

• 32/16-bit RISC
• 32-bit ARM instruction set
• 16-bit Thumb instruction set
• 3-stage pipeline
• Very small die size and low power
• Unified bus interface (one 32-bit data bus carries both instructions and data)


Source: The ARM9 Family - High Performance Microprocessors for Embedded Applications
• Thumb: instruction compression to save I-cache space and memory accesses
• Most instructions use only the low 8 registers (r0–r7)
• 3 operands → 2 operands

• Code is compiled either to native ARM code or to Thumb code
  – Thumb makes full use of the 16-bit opcode space
  – The T bit of the current program status register (CPSR) indicates whether the processor is executing Thumb or native ARM instructions

• All ARM instructions are conditional
• BX (Branch and eXchange): branches to the address held in a register and switches to Thumb state when bit 0 of that address is 1
• Link register (r14, the subroutine link register)
  – r14 receives the return address when a Branch with Link (BL or BLX) instruction is executed
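How conditional execution and BX work can be expressed in C. `cond_passed` evaluates the 4-bit condition field (bits 31..28 of an ARM instruction) against the NZCV flags; condition 0xF is not a normal condition and is simply treated as "never" in this sketch. `bx_target` shows how BX uses bit 0 of the target address. Function names are hypothetical.

```c
#include <stdint.h>
#include <stdbool.h>

/* Does an ARM instruction with condition field `cond` execute under this CPSR? */
bool cond_passed(unsigned cond, uint32_t cpsr) {
    bool n = (cpsr >> 31) & 1, z = (cpsr >> 30) & 1,
         c = (cpsr >> 29) & 1, v = (cpsr >> 28) & 1;
    switch (cond & 0xFu) {
    case 0x0: return z;              /* EQ: equal */
    case 0x1: return !z;             /* NE: not equal */
    case 0x2: return c;              /* CS/HS: unsigned >= */
    case 0x3: return !c;             /* CC/LO: unsigned < */
    case 0x4: return n;              /* MI: negative */
    case 0x5: return !n;             /* PL: positive or zero */
    case 0x6: return v;              /* VS: overflow */
    case 0x7: return !v;             /* VC: no overflow */
    case 0x8: return c && !z;        /* HI: unsigned > */
    case 0x9: return !c || z;        /* LS: unsigned <= */
    case 0xA: return n == v;         /* GE: signed >= */
    case 0xB: return n != v;         /* LT: signed < */
    case 0xC: return !z && n == v;   /* GT: signed > */
    case 0xD: return z || n != v;    /* LE: signed <= */
    case 0xE: return true;           /* AL: always */
    default:  return false;          /* 0xF: treated as never here */
    }
}

/* BX Rm: bit 0 selects the instruction set (1 = Thumb) and is cleared for the PC. */
uint32_t bx_target(uint32_t rm, bool *thumb) {
    *thumb = rm & 1u;
    return rm & ~1u;
}
```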

• 5-stage pipeline
• I-cache and D-cache
• Floating-point support with the optional VFP9-S coprocessor
• Enhanced 16 x 32-bit multiplier capable of single-cycle MAC (multiply-accumulate) operations
• The ARM946E-S processor supports ARM's real-time trace technology

• ARM7 3-stage pipeline → ARM9 5-stage pipeline
  – Less work per stage allows a higher clock frequency

Source: The ARM9 Family - High Performance Microprocessors for Embedded Applications
• Thumb decoding
  – ARM7: Thumb instructions are decoded in the first ½ phase of the decode stage
  – ARM9: Thumb and ARM instructions are decoded in parallel
• ALU
  – ARM7: the ALU (arithmetic and logic units) is active all the time
  – ARM9: the arithmetic and logic units are partitioned to save power
• ARM9: forwarding path

Source: The ARM9 Family - High Performance Microprocessors for Embedded Applications
• Thumb-2 ISA
• ARM architecture version 7 profiles
  – A profile: high-performance open application platforms
  – R profile: real-time
  – M profile: deeply embedded

[Figure: ARM11 MPCore chip, http://www.arm.com/images/ARM11MPCORE_chip_Big.jpg]
• A load-store architecture has separate instruction sets to handle memory operations (True/False)
• The Thumb ISA is a 32-bit ISA (True/False)
• Which registers are used to store the program counter and the link register?
• Name the pipeline stages of the ARM7 and the ARM9.

• ARM assembly code
  – Up: OR operation
  – Down: AND operation
  – Start: reset to default values
  – A: exclusive OR operation
  – B: AND NOT (BIC) operation
  – Left: left shift by #1
  – Right: right shift by #1
  – No need to use interrupts; use a polling method
  – Implement at least 2 of these features and submit the code to T-Square.
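A C reference model of the key-to-operation mapping can be handy for checking what your assembly should produce. This is a sketch under stated assumptions: the `K_*` names are made up (they use the same bit positions as the libnds `KEY_*` definitions), the second operand and the default value are hypothetical parameters because the assignment does not fix them, the right shift is assumed logical, and when several keys are held the first match in this order wins.

```c
#include <stdint.h>

/* Key bits (same positions as libnds KEY_*; names are hypothetical). */
enum {
    K_A = 1 << 0, K_B = 1 << 1, K_START = 1 << 3,
    K_RIGHT = 1 << 4, K_LEFT = 1 << 5, K_UP = 1 << 6, K_DOWN = 1 << 7
};

/* `pressed` has a 1 bit for each key currently held down. */
uint32_t apply_key(uint32_t value, uint32_t operand, uint32_t default_value,
                   uint32_t pressed) {
    if (pressed & K_START) return default_value;     /* reset */
    if (pressed & K_UP)    return value | operand;   /* ORR */
    if (pressed & K_DOWN)  return value & operand;   /* AND */
    if (pressed & K_A)     return value ^ operand;   /* EOR */
    if (pressed & K_B)     return value & ~operand;  /* BIC */
    if (pressed & K_LEFT)  return value << 1;        /* LSL #1 */
    if (pressed & K_RIGHT) return value >> 1;        /* LSR #1 (logical, assumed) */
    return value;                                    /* no key: unchanged */
}
```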

• Some instructions clobber hardware registers.
• Those registers must be listed in the clobber list.
• Registers used as input/output operands do not have to be listed there.
• The clobber list is mostly for side effects that have to be treated very carefully, such as "cc" (the condition code flags).
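A small example of a clobber list, as a sketch. On an ARM-state GCC build (detected with the predefined `__arm__` and `__thumb__` macros) the asm uses r4 as a scratch register and `ADDS` updates the flags, so both `"r4"` and `"cc"` are listed; on any other target a plain C fallback keeps the example compilable. The function name and the choice of r4 are arbitrary. `__asm__` is GCC's alternate spelling of `asm` that also works in strict ISO C modes.

```c
#include <stdint.h>

/* Returns 3*x (mod 2^32). */
uint32_t times_three(uint32_t x) {
    uint32_t out;
#if defined(__arm__) && !defined(__thumb__)
    __asm__("mov  r4, %[x], lsl #1\n\t"  /* r4 = 2*x: r4 is overwritten */
            "adds %[out], r4, %[x]"      /* out = 2*x + x; ADDS changes N,Z,C,V */
            : [out] "=r" (out)
            : [x] "r" (x)
            : "r4", "cc");               /* tell the compiler about both side effects */
#else
    out = x * 3u;                        /* portable fallback */
#endif
    return out;
}
```

Without `"r4"` in the list, the compiler could keep one of its own values in r4 across the statement and see it silently destroyed.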

• Inputs: buttons, touch screen, microphone
• libnds key definitions:

| Key | Mask | Meaning |
|---|---|---|
| KEY_A | 1 << 0 | A button |
| KEY_B | 1 << 1 | B button |
| KEY_SELECT | 1 << 2 | Select button |
| KEY_START | 1 << 3 | Start button |
| KEY_RIGHT | 1 << 4 | Right D-pad |
| KEY_LEFT | 1 << 5 | Left D-pad |
| KEY_UP | 1 << 6 | Up D-pad |
| KEY_DOWN | 1 << 7 | Down D-pad |
| KEY_R | 1 << 8 | R button |
| KEY_L | 1 << 9 | L button |
| KEY_X | 1 << 10 | X button |
| KEY_Y | 1 << 11 | Y button |
| KEY_TOUCH | 1 << 12 | Pen touching the screen (no coordinates) |
| KEY_LID | 1 << 13 | Lid shutting (useful for sleeping) |

• Key register address: 0x4000130


    asm(
        "mov r0, r0\n\t"
        "mov r0, r0\n\t"
        "mov r0, r0\n\t"
        "mov r0, r0"
    );

• What do they do? Each mov r0, r0 copies r0 onto itself, so it acts as a NOP.
• Use delimiters to separate assembly lines: a linefeed (\n), usually followed by a tab (\t)

Source: http://www.ethernut.de/en/documents/arm-inline-asm.html
• We can specify operands:

    asm(code                      // opcode, destination, source
        : output operand list     /* optional */
        : input operand list      /* optional */
        : clobber list            /* optional */
    );

    /* Rotating bits example */
    asm("mov %[result], %[value], ror #1" : [result] "=r" (y) : [value] "r" (x));

• Each operand is a symbolic name in square brackets, followed by a constraint string, followed by a C expression in parentheses
• e.g., setting the current program status register of the ARM CPU:

    asm("msr cpsr, %[ps]" : : [ps] "r" (status));
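The rotating-bits example, wrapped as a function so it can also be tested off the DS. This is a sketch: the asm is used only on an ARM-state GCC build (checked with the predefined `__arm__` and `__thumb__` macros), and a C expression with the same effect is used everywhere else. The function name is made up.

```c
#include <stdint.h>

/* Rotate right by one bit: bit 0 moves to bit 31. */
uint32_t ror1(uint32_t x) {
    uint32_t y;
#if defined(__arm__) && !defined(__thumb__)
    __asm__("mov %[result], %[value], ror #1"
            : [result] "=r" (y)   /* output in any general register */
            : [value] "r" (x));   /* input in any general register */
#else
    y = (x >> 1) | (x << 31);     /* same operation in C */
#endif
    return y;
}
```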

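    ...
    int test1 = 0, test2 = 0;
    while (1) {
        swiWaitForVBlank();
        asm("MOV R0, #0x4000000\n\t"
            "ADD R0, #0x130\n\t"      // R0 = 0x4000130, the key register address
            "LDR R2, [R0]\n\t"        // R2 = current key state
            "MOV R3, #0x300\n\t"
            "ADD R3, #0xff\n\t"       // R3 = 0x3ff (1023)
            "mov %[out1], R0\n\t"
            "mov %[out2], R3\n\t"
            : [out1] "=r" (test1), [out2] "=r" (test2)
            :
            : "r0", "r2", "r3");      // scratch registers must be declared as clobbered

        iprintf("\x1b[16;0H test1: %x test2: %x\n", test1, test2);
    }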

Constraints for ARM inline assembly:

| Constraint | Usage in ARM state | Usage in Thumb state |
|---|---|---|
| f | Floating point registers f0 .. f7 | Not available |
| h | Not available | Registers r8 .. r15 |
| G | Immediate floating point constant | Not available |
| H | Same as G, but negated | Not available |
| I | Immediate value in data processing instructions, e.g. ORR R0, R0, #operand | Constant in the range 0 .. 255, e.g. SWI operand |
| J | Indexing constants -4095 .. 4095, e.g. LDR R1, [PC, #operand] | Constant in the range -255 .. -1, e.g. SUB R0, R0, #operand |
| K | Same as I, but inverted | Same as I, but shifted |
| L | Same as I, but negated | Constant in the range -7 .. 7, e.g. SUB R0, R1, #operand |
| l | Same as r | Registers r0 .. r7, e.g. PUSH operand |
| M | Constant in the range of 0 .. 32 or a power of 2, e.g. MOV R2, R1, ROR #operand | Constant that is a multiple of 4 in the range of 0 .. 1020, e.g. ADD R0, SP, #operand |
| m | Any valid memory address | Any valid memory address |
| N | Not available | Constant in the range of 0 .. 31, e.g. LSL R0, R1, #operand |
| O | Not available | Constant that is a multiple of 4 in the range of -508 .. 508, e.g. ADD SP, #operand |
| r | General register r0 .. r15, e.g. SUB operand1, operand2, operand3 | Not available |
| w | Vector floating point registers s0 .. s31 | Not available |
| X | Any operand | |

Constraint modifiers:

| Modifier | Specifies |
|---|---|
| = | Write-only operand, usually used for all output operands |
| + | Read-write operand, must be listed as an output operand |
| & | A register that should be used for output only |

• With "+r", the same register holds both the input and the output value:

    asm("mov %[value], %[value], ror #1" : [value] "+r" (y));

• MOV
  – MOV{S}{cond} Rd, Operand2
  – MOV{cond} Rd, #imm16 (ARMv6T2 and later only; not available on the DS's ARM7TDMI or ARM946E-S)
• MSR: load an immediate value, or the contents of a general-purpose register, into specified fields of a Program Status Register (PSR)
  – Syntax: MSR{cond} APSR_flags, Rm
  – cond is an optional condition code
  – flags specifies the APSR flags to be moved; it can be one or more of:
    • nzcvq: ALU flags field mask, PSR[31:27] (User mode)
    • g: SIMD GE flags field mask, PSR[19:16] (User mode)
  – Rm is the source register. Rm must not be PC.

CPSR format:

| 31 | 30 | 29 | 28 | 27–8 | 7 | 6 | 5 | 4–0 |
|---|---|---|---|---|---|---|---|---|
| N | Z | C | V | unused | I | F | T | mode |

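    #include <nds.h>
    #include <stdio.h>

    volatile int frame = 0;
    void Vblank(void) { frame++; }   // VBlank handler, as in the Hello World example

    int main(void) {
        //------
        consoleDemoInit();
        int *notGood = (int *)0xb0;  // bad: writes to a fixed low address
        *notGood = 10;
        int better = 20;
        irqSet(IRQ_VBLANK, Vblank);
        printf(" Hello CS4803DGC");

        // case 1
        asm("MOV R1, #0xb0");        // init R1 to the address
        asm("LDR R0, [R1]");
        asm("ADD R0, R0, R0");
        asm("STR R0, [R1]");

        // case 2
        asm("MOV R1, %[value]" :: [value] "r" (better));
        asm("ADD R1, R1, R1");
        asm("MOV %[result], R1" : [result] "=r" (better) :);

        while (1) {
            swiWaitForVBlank();
            // print at a position using the ANSI escape sequence \x1b[line;columnH
            printf("\x1b[10;0HFrame = %d", frame);
            printf("\nblah is: %d, %d", *notGood, better);
        }
        return 0;
    }

• Please note that this code does not run correctly! The asm statements use R0 and R1 without declaring them as operands or clobbers, so the compiler may use those registers for its own values between the statements.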
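Case 2 relies on R1 keeping its value across three separate asm statements, which the compiler does not guarantee. One way to write it (a sketch, not the course's reference solution) is a single asm statement that takes the input and output as operands, so the compiler chooses the registers. The function name is made up, and the `__arm__`/`__thumb__` guard with a C fallback only exists so the sketch compiles off the DS.

```c
/* Doubles `value` in one asm statement; no fixed registers are assumed. */
int double_it(int value) {
    int result;
#if defined(__arm__) && !defined(__thumb__)
    __asm__("add %[result], %[value], %[value]"
            : [result] "=r" (result)
            : [value] "r" (value));
#else
    result = value + value;  /* same computation off-target */
#endif
    return result;
}
```

In the DS program, `better = double_it(better);` would replace the three case 2 statements.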

• The current status of the keys is stored in memory at address 0x4000130.
• When no key is pressed, the value is 1023 (binary 11 1111 1111).
• Pressing a key clears that key's bit (the register is active low), so the new value depends on which key is pressed.
• Values for various keys:

| Key pressed | Decimal | Binary |
|---|---|---|
| A | 1022 | 11 1111 1110 |
| B | 1021 | 11 1111 1101 |
| Start | 1015 | 11 1111 0111 |
| Up | 959 | 11 1011 1111 |
| Down | 895 | 11 0111 1111 |
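Turning the raw register value into "which keys are down" is one inversion and one mask. This sketch uses a made-up helper name; the mask keeps bits 0 to 9 (the keys that give 1023 when released) and also discards anything a 32-bit `LDR` may pick up in the upper bits.

```c
#include <stdint.h>

#define KEY_BITS 0x3FFu  /* bits 0..9 of the value at 0x4000130 */

/* Active low in, active high out: a 1 bit means that key is pressed. */
uint32_t keys_pressed(uint32_t raw) {
    return ~raw & KEY_BITS;
}
```

For the values in the table: 1022 gives 1 (A), 1015 gives 8 (Start, bit 3), and 895 gives 128 (Down, bit 7).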

    asm("MOV R4, #0x0000");          // R4 holds the counter (funny things happen with R1)
    while (1) {
        swiWaitForVBlank();
        // put the key register address into R0
        asm("MOV R0, #0x4000000");   // R0 holds the address
        asm("ADD R0, #0x130");       // finished building the address
        /* immediates are only 8 bits (rotated), so the address takes two instructions */

        // load the value from that address
        asm("LDR R2, [R0]");
        // check the value in R2, compare, and then increment the counter
        // (use condition codes, shifts, etc.)
        // move the value from R2 to a C variable
        asm("MOV %[result], R2" : [result] "=r" (result_) :);
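The "compare, then increment the counter" step of the polling loop can be prototyped in C first. This sketch counts presses of the A key (bit 0, active low) and counts only the transition from released to pressed, so holding A for many frames counts once; that edge detection is a design choice of the sketch, not something the assignment requires. The struct and function names are hypothetical.

```c
#include <stdint.h>

typedef struct {
    uint32_t prev_raw;  /* key register value from the previous frame */
    unsigned count;     /* A presses counted so far */
} PressCounter;

/* Call once per frame with the value read from 0x4000130. */
void counter_poll(PressCounter *pc, uint32_t raw) {
    int was_down = !(pc->prev_raw & 1u);  /* bit clear = pressed */
    int is_down  = !(raw & 1u);
    if (is_down && !was_down)
        pc->count++;                      /* new press detected */
    pc->prev_raw = raw;
}
```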

• The compiler may still rearrange the assembly code.
• The default compilation mode is Thumb (with ARM/Thumb interworking).
• The Makefile has to be modified: turn off optimization with -O0 so that instructions are not rearranged.

• Change the line
    ARCH := -mthumb -mthumb-interwork
  to
    ARCH := -marm

• Error: value of 67108864 too large for field of 2 bytes at 124
  – What's wrong?
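The number in the error, 67108864, is 0x4000000, the constant in `MOV R0, #0x4000000`. A "field of 2 bytes" suggests a 16-bit Thumb encoding, whose MOV immediate holds only 0 to 255, whereas the 32-bit ARM encoding accepts an 8-bit value rotated right by an even amount. That reading is an interpretation, consistent with switching to `-marm`. The C sketch below (hypothetical function names) checks both rules.

```c
#include <stdint.h>
#include <stdbool.h>

/* ARM state: imm8 rotated right by 0, 2, ..., 30. Undo by rotating left. */
bool arm_imm_ok(uint32_t v) {
    for (unsigned rot = 0; rot < 32; rot += 2) {
        uint32_t undone = (v << rot) | (v >> ((32 - rot) & 31));  /* rotate left */
        if (undone <= 0xFFu)
            return true;
    }
    return false;
}

/* 16-bit Thumb MOV Rd, #imm: 8-bit immediate field only. */
bool thumb_mov_imm_ok(uint32_t v) {
    return v <= 0xFFu;
}
```

This also explains the earlier code: 0x3FF (1023) spans ten bits and cannot be encoded, which is why it is built as `MOV R3, #0x300` followed by `ADD R3, #0xff`.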

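• The compiler might move your asm code or even decide not to include it!
• This can happen when the asm statement has output operands that are never used.
  – GCC treats asm statements without output operands, such as asm("mov r0, r0");, as implicitly volatile.
• But what if you really, really want that instruction to stay in your code?
  – asm volatile("mov r0, r0");
• The volatile qualifier tells the compiler not to delete the asm statement or move it out of loops during optimization; it may still reorder it relative to other code.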
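A typical use of `asm volatile` is a busy-wait loop whose body must survive optimization. In this sketch (hypothetical function name), ARM-state builds use the slide's `mov r0, r0` NOP, and other targets use `nop`, which GCC-style assemblers accept on common architectures. Because the asm is volatile, the compiler may not delete it, so the loop keeps its iterations even at high optimization levels.

```c
/* Busy-wait for n iterations of one NOP each. */
unsigned delay_nops(unsigned n) {
    unsigned i;
    for (i = 0; i < n; i++) {
#if defined(__arm__) && !defined(__thumb__)
        __asm__ volatile("mov r0, r0");  /* classic ARM NOP */
#else
        __asm__ volatile("nop");
#endif
    }
    return i;
}
```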

• 1) 1-point task: turn in the student information sheet by next Wednesday.
• 2) Upload a screenshot of your program printing "hello my name is xyz" in the emulator by next Tuesday.
• 3) Bring your computer to next Wednesday's lecture. We might have some time for programming during class.