ARM Architecture and Assembly Programming Intro

Instructors: Dr. Phillip Jones

http://class.ece.iastate.edu/cpre288

Announcements

• Lab 9: object detection: Give TAs Creative Team name & verify team
• Exam 3: Friday 7/6 (last day of class)
• Final Projects
  – Projects: Mandatory Demos Friday, 7/6 (11am – 3pm)
  – Reminder, Lab attendance is mandatory: -10 points from final project for each lab session you miss
• Homework 6: Due Sunday (7/1)
• Quiz 6 (Thursday, 6/28): Textbook readings below (in Chp2 and 4).
• Reading for the next few weeks
  – Textbook: Chapter 2.1-2.3, and 2.6.1-2.6.2
  – Textbook: Chapter 4.1 – 4.3
  – Assemble ARM instruction set manual:
    • Preface, Chapter 3, Chapter 4
  – ARM Procedure Call Standard:
    • Sections: 5, 7.1.1, 7.2.

ARM ARCHITECTURE OVERVIEW

Why use assembly programming?

• Full access to hardware features
  – A compiler limits a programmer's access to the hardware features that the compiler writer decided to implement
• Writing time-critical portions of code
  – Allows tight control over what the CPU is doing on every clock cycle
• Debugging
  – It is not uncommon, when trying to debug odd system behavior, to have to look at disassembled code

Why learn the ARM Hardware Architecture?

• Helps give intuition for why the assembly instructions were created the way they were
• Helps you understand what special features may be available for you to make use of

ARM (Cortex-M4) Architecture Overview

Table 2.1 Development history of the ARM® MCUs family.

Architecture | Bit Width | Cores Designed by ARM Holdings                                                                                                      | Cores Designed by Third Parties             | Cortex Profile
ARMv1        | 32/26     | ARM1                                                                                                                                |                                             |
ARMv2        | 32/26     | ARM2, ARM3                                                                                                                          | Amber, STORM Open Soft Core                 |
ARMv3        | 32        | ARM6, ARM7                                                                                                                          |                                             |
ARMv4        | 32        | ARM8                                                                                                                                | StrongARM, FA526                            |
ARMv4T       | 32        | ARM7TDMI, ARM9TDMI                                                                                                                  |                                             |
ARMv5        | 32        | ARM7EJ, ARM9E, ARM10E                                                                                                               | XScale, FA626TE, Feroceon, PJ1/Mohawk       |
ARMv6        | 32        | ARM11                                                                                                                               |                                             |
ARMv6-M      | 32        | ARM Cortex-M0, ARM Cortex-M0+, ARM Cortex-M1                                                                                        |                                             | Microcontroller
ARMv7-M      | 32        | ARM Cortex-M3                                                                                                                       |                                             | Microcontroller
ARMv7E-M     | 32        | ARM Cortex-M4                                                                                                                       |                                             | Microcontroller
ARMv7-R      | 32        | ARM Cortex-R4, ARM Cortex-R5, ARM Cortex-R7                                                                                         |                                             | Real-time
ARMv7-A      | 32        | ARM Cortex-A5, ARM Cortex-A7, ARM Cortex-A8, ARM Cortex-A9, ARM Cortex-A12, ARM Cortex-A15, ARM Cortex-A17                          | Krait, Scorpion, PJ4/Sheeva, /A6X           | Application
ARMv8-A      | 64/32     | ARM Cortex-A53, ARM Cortex-A57                                                                                                      | X-Gene, Denver, Apple A7 (Cyclone), K12     | Application
ARMv8-R      | 32        | No announcements yet                                                                                                                |                                             | Real-time

• 32 bit processor
  – size of bus is 32 bits
  – size of registers is 32 bits
  – size of instructions is 16/32 bits
• RISC architecture
• Harvard architecture
  – separate data and instruction memory
• 203 instructions

Figure 2.1 The block diagram for a general MCU.

ARM Block Diagram

[Figure 2.1 The block diagram for a general MCU. Blocks shown: System Controller; System Clock Control-PLL; Memory Protection Unit (MPU); Peripheral Bridge; Float Point Unit (FPU); Watchdog Timers; Peripheral Reset Control; Flash; Data; Program; Application Specific Logics; MAC; CAN; UARTs 0 ~ 7; USB; SPI; PWM; Analog Comparator; I2C; ADCs 0 ~ 7; General Timers.]

ARM: RISC CPU Architecture

• What is RISC?
  – Reduced Instruction Set Computing (in contrast to CISC, Complex Instruction Set Computing)
• Typical RISC
  – Load/store based: data moves between memory and the ALU only via registers
  – Most instructions are the same length
  – Typically far fewer instructions than a CISC architecture
  – Typically many more registers than CISC, since data must be moved into a register before it can be operated on
  – The low number of instructions typically makes hardware design simpler (as compared to CISC)

Ref: http://www.seas.upenn.edu/~palsetia/cit595s07/RISCvsCISC.pdf (Diana Palsetia)

ARM: RISC vs CISC example

CISC
  Size / time       Instruction
  1 byte, 1 clk     mov R1, 10
  1 byte, 1 clk     mov R2, 5
  4 byte, 30 clk    mul R2, R1

RISC
  Size / time       Instruction
  1 byte, 1 clk     mov R1, 0
  1 byte, 1 clk     mov R2, 10
  1 byte, 1 clk     mov R3, 5
  1 byte, 1 clk     Begin: add R1, R2
  1 byte, 1 clk     loop Begin

• CISC: Instructions are often variable length and variable time
• RISC: Instructions are typically constant length and time
  – Simpler hardware logic for decoding instructions (thus typically faster)
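The size and time figures in the example can be totaled with a short C sketch. The byte and clock values are the ones listed in the example; the assumption that `loop Begin` uses R3 as a down-counter (so `add` and `loop` each execute five times) is ours, and the names `instr_cost`, `total_bytes`, and `total_clocks` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Cost of one instruction, using the byte and clock figures from the example. */
typedef struct {
    unsigned bytes;   /* encoded size in bytes */
    unsigned clocks;  /* clock cycles per execution */
    unsigned runs;    /* number of times executed (assumed, not from the slide) */
} instr_cost;

/* CISC version: mov R1,10 / mov R2,5 / mul R2,R1 */
const instr_cost cisc_prog[3] = {
    {1, 1, 1}, {1, 1, 1}, {4, 30, 1}
};

/* RISC version: three movs, then add and loop, assumed to run five times each. */
const instr_cost risc_prog[5] = {
    {1, 1, 1}, {1, 1, 1}, {1, 1, 1}, {1, 1, 5}, {1, 1, 5}
};

/* Code size: every instruction is stored once, however often it runs. */
unsigned total_bytes(const instr_cost *prog, size_t n) {
    assert(prog != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += prog[i].bytes;
    }
    return sum;
}

/* Run time: each instruction costs clocks multiplied by its run count. */
unsigned total_clocks(const instr_cost *prog, size_t n) {
    assert(prog != NULL);
    unsigned sum = 0;
    for (size_t i = 0; i < n; i++) {
        sum += prog[i].clocks * prog[i].runs;
    }
    return sum;
}
```

Under these assumptions the CISC sequence totals 6 bytes and 32 clocks, and the RISC sequence totals 5 bytes and 13 clocks, even though the RISC version executes more instructions.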

Ref: http://www.seas.upenn.edu/~palsetia/cit595s07/RISCvsCISC.pdf (Diana Palsetia)

ARM CPU Core Summary

Instructions are 16-bit or 32-bit

Simple three-stage pipeline

Registers are 32-bit and addresses are 32-bit

Cortex-M4 Architecture

• 32 bit processor
  – size of data bus is 32 bits
  – size of registers is 32 bits
• Harvard architecture
• RISC architecture
• ~203 instructions

[Figure 2.3 The functional block diagram for the ARM® Cortex®-M4 MCU. Blocks shown: Interrupts; Interrupt Controller; SysTick; Debug Interface (JTAG/SWD); Trace Interface; ICode & DCode (Flash); RAM + Peripherals (SRAM).]


ARM: Harvard Architecture

[Figure: Harvard architecture. Program Memory (256KB, 64K rows, 32-bits wide) and Data Memory (32KB, 8K rows, 32-bits wide), each connected to the CPU.]

• Program memory
  – Flash based: Program stays even if power is turned off (non-volatile)
  – 32-bits wide; instructions are 16-bit or 32-bit wide
• Data Memory
  – SRAM based: Data disappears if power is turned off (volatile)
  – 32-bits wide

ARM: Logical Data Memory organization

[Figure: CPU containing 16 32-bit registers and the ALU, connected to Data Memory (32KB, 8K rows).]

• Registers:
  – 32-bits wide
  – Directly accessible by ALU
• Data Memory:
  – 32-bits wide
  – Must use a register to move to/from the ALU

ARM: Memory Map organization

[Figure: CPU memory map.]
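The row counts quoted for the two memories follow from size = rows x width. A minimal C sketch of that arithmetic, assuming both memories are 32 bits (4 bytes) wide as drawn in the Harvard architecture diagram; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Capacity in bytes of a memory with `rows` rows, each `width_bits` wide. */
uint32_t memory_bytes(uint32_t rows, uint32_t width_bits) {
    assert(width_bits > 0 && width_bits % 8 == 0);  /* whole bytes per row */
    return rows * (width_bits / 8);
}

/* Number of rows needed to hold `size_bytes` at the given row width. */
uint32_t memory_rows(uint32_t size_bytes, uint32_t width_bits) {
    assert(width_bits > 0 && width_bits % 8 == 0);
    return size_bytes / (width_bits / 8);
}
```

For example, 64K rows of 32 bits give 256KB of program memory, and 32KB of data memory at 32 bits per row needs 8K rows.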

Understanding Data

• Stack
  – Stores data related to function variables, function calls, parameters, return variables, etc.
  – Data on the stack can go “out of scope”, and is then automatically deallocated
  – Starts at the top of the program’s data memory space, and addresses move down as more variables are allocated
• Heap
  – Stores dynamically allocated data
  – Dynamically allocated data is usually obtained by calling calloc or malloc (or using new in C++), and released with free (or delete in C++)
  – There’s no garbage collector!
  – Starts at the bottom of the program’s data memory space, and addresses move up as more variables are allocated
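A minimal C sketch of the difference, using two hypothetical helper functions: `sum_to` keeps everything on the stack, while `make_copy` returns heap memory that the caller must free.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stack: `sum` and `i` live in this call's stack frame and are
   deallocated automatically when the function returns. */
int sum_to(int n) {
    int sum = 0;
    for (int i = 1; i <= n; i++) {
        sum += i;
    }
    return sum;
}

/* Heap: the buffer outlives this function. There is no garbage
   collector, so the caller is responsible for calling free(). */
char *make_copy(const char *text) {
    assert(text != NULL);
    size_t len = strlen(text) + 1;      /* include the terminating '\0' */
    char *copy = malloc(len);
    if (copy != NULL) {
        memcpy(copy, text, len);
    }
    return copy;                        /* NULL if allocation failed */
}
```

A caller would pair every successful `make_copy` with exactly one `free`; forgetting it leaks the block, and freeing twice is undefined behavior.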

ARM: Typical Logical Memory Layout

• Static Data (e.g. Global vars)
• Stack (e.g. local vars, function args & return value, return address)
• Heap (e.g. dynamically allocated memory: malloc, new)
• Code (Code may be in the same or a separate memory device)

Layout, from high address to low address:

  0xHigh address
    Stack          Stack Base Address at the top; stack grows downward
                   Stack Pointer (SP): address of top of the stack
    Heap           (Data) heap grows upward
    Static Data
    …
    Code           Code Segment
  0x0000_0000

Example Memory Layout

Memory regions, from the high end address to the low end address:

  Memory Mapped I/O, Devices (I/O addresses)
  Stack (grows down)
  Heap (grows up)
  Static Data
  Code

Example program:

    static char greeting[] = "Hello world!";   /* Static Data */

    int main() {
        int i;                                  /* Stack */
        char bVal;                              /* Stack */

        LCD_init();
        LCD_PutString(greeting);
        …
    }

ARM: Typical Function Stack-Frame

• Stack is typically divided into a number of function stack-frames.
• Stack-Frame: Contains information associated with a function call
  – Function args, local vars, return address, preserved values

  Stack Base Address (stack grows downward)
    Function 1 Frame
      • Function Return Address
      • Preserved Values
      • Function Args
      • Local Vars
    Function 2 Frame
      • Function Return Address
      • Preserved Values
      • Function Args
      • Local Vars
    …

Function and Stack

Conventional program stack grows downwards: new items are put at the top, and the top grows down.

  high address
    Occupied
    stack top
    (stack growth is toward the low address)
  low address

Auto (local) variables have their storage on the stack.

Why stack?
• The LIFO order matches perfectly with function call/return order
  – LIFO: Last In, First Out
  – Function: Last called, first returned
• Efficient memory allocation and de-allocation
  – Allocation: Decrease SP (stack top): PUSH
  – De-allocation: Increase SP: POP

Function Frame: Local storage for a function

Example:
  1. A is called
  2. A calls B
  3. B calls C
  4. C returns

What can be put in a stack frame?
• Function return address
• Parameter values
• Return value
• Local variables
• Saved register values

May 18, 2011

Example: Stack

• The following example shows the execution of a simple program (left) and the memory map of the stack (right)

    void doNothing() {
        char c;
        …
    }

    int main() {
        char x, y, z;
        short i;
        for (i = 0; i < 10; i++) {
            doNothing();
        }
        return 0;
    }

Stack memory map (addresses 1000 down to 993) as the program executes:

  1. main() begins and x, y, z are allocated: x at 1000, y at 999, z at 998
  2. short i is allocated: i at 997
  3. doNothing() is called: "PC Address for line 9" is placed at 995
  4. char c is allocated inside doNothing(): c at 993
  5. doNothing() returns: the entries at 995 and 993 are released, leaving x, y, z and i

General Purpose Register File

13 General Purpose Registers (R0 ~ R12):
  R0 – R7      Low Registers
  R8 – R12     High Registers

Other core registers:
  R13 (MSP-PSP)   Stack Pointer Register: Main Stack Pointer (MSP) or Process Stack Pointer (PSP)
  R14 (LR)        Link Register
  R15 (PC)        Program Counter

Special registers:
  XPSR            Program Status Register, made up of:
                    APSR  Application PSR
                    EPSR  Execution PSR
                    IPSR  Interrupt PSR
  PRIMASK         Interrupt Exception Mask Register
  FAULTMASK       Interrupt Exception Mask Register
  BASEPRI         Interrupt Exception Mask Register
  CONTROL         Control Register

Figure 2.5 A structure block diagram of 21 registers in the Cortex®-M4 Core.

ARM: GP Registers (Cont.)

• 16 32-bit general purpose registers
  – Used for accessing SRAM
  – Used for storing function parameters
  – Used for instructions to execute operations on
• What is a 32-bit register?
  – Basically just 32 D flip-flops connected together

  Bit 31 | Bit 30 | … | Bit 5 | … | Bit 2 | Bit 1 | Bit 0
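Since a register is just 32 individually stored bits, each bit can be read or changed on its own. A minimal C sketch that models a register as a `uint32_t`; the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Read one bit (0 or 1) of a 32-bit register value. */
unsigned reg_get_bit(uint32_t reg, unsigned bit) {
    assert(bit < 32);
    return (unsigned)((reg >> bit) & 1u);
}

/* Return the register value with one bit set to 1. */
uint32_t reg_set_bit(uint32_t reg, unsigned bit) {
    assert(bit < 32);
    return reg | (UINT32_C(1) << bit);
}

/* Return the register value with one bit cleared to 0. */
uint32_t reg_clear_bit(uint32_t reg, unsigned bit) {
    assert(bit < 32);
    return reg & ~(UINT32_C(1) << bit);
}
```

For example, setting bit 31 of an all-zero register gives 0x80000000, the pattern with only the most significant flip-flop holding a 1.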

STATUS REGISTER (SREG)

Status Register (SREG)

(a) Three individual registers: APSR, IPSR and EPSR.

Bits  | 31 | 30 | 29 | 28 | 27 | 26:25    | 24       | 23:20    | 19:16    | 15:10    | 9        | 8:0
APSR  | N  | Z  | C  | V  | Q  | Reserved | Reserved | Reserved | GE       | Reserved | Reserved | Reserved
IPSR  | Reserved (31:9)                                                                        | Exception Number
EPSR  | Reserved (31:27)       | ICI/IT   | T        | Reserved | Reserved | ICI/IT   | Reserved | Reserved

(b) The combined register PSR.

Bits  | 31 | 30 | 29 | 28 | 27 | 26:25    | 24       | 23:20    | 19:16    | 15:10    | 9        | 8:0
PSR   | N  | Z  | C  | V  | Q  | ICI/IT   | T        |          | GE       | ICI/IT   |          | Exception Number

Figure 2.6 Structure and bit functions in special registers.

Status Register (SREG): Common flags

• Z: Zero flag
  – Set to 1 when the result of an instruction is 0
• C: Carry flag
  – Set to 1 when the result of an instruction causes a carry to occur
• N: Negative
  – Set to 1 to indicate the result of an instruction is negative
• V: Overflow
  – Set to 1 to indicate the result of an instruction caused an overflow

Example ARM Assembly

    int x, a, b;
    x = (a + b);

    MOVW r4, address-of-a   ; get address for a
    LDR  r0, [r4]           ; get value of a
    MOVW r4, address-of-b   ; get address for b
    LDR  r1, [r4]           ; get value of b
    ADD  r3, r0, r1         ; compute a+b
    MOVW r4, address-of-x   ; get address for x
    STR  r3, [r4]           ; store value of x

How to Study Assembly

1. Get to know CPU registers
   – Memorize rules for usage
2. Know basic types of instructions
   – Memory load and store
   – Arithmetic/Logic
   – Compare and branch
3. Translate C statements
   – Memory accesses
   – Simple arithmetic statements
   – If statement
   – Loop statements
4. Translate C functions
   – Function linkage
   – Making a function call
5. Interrupt System
   – Principle of interrupt and exception
   – Interrupt vector table
   – Saving and restoring context
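As a first exercise for step 3, a loop can be rewritten in C itself so that it only uses compare-and-branch, which is close to what the assembly version looks like. A minimal sketch with hypothetical function names; the `goto` form is for illustration and is not how C loops are normally written.

```c
#include <assert.h>

/* Structured version: sum of 0, 1, ..., n-1. */
int sum_below(int n) {
    int sum = 0;
    for (int i = 0; i < n; i++) {
        sum += i;
    }
    return sum;
}

/* Same loop written as compare-and-branch steps. */
int sum_below_branchy(int n) {
    int sum = 0;
    int i = 0;
loop_test:
    if (i >= n) goto loop_done;   /* compare i with n, branch out when done */
    sum += i;                     /* arithmetic */
    i++;                          /* arithmetic */
    goto loop_test;               /* unconditional branch back to the test */
loop_done:
    return sum;
}

/* Sanity check that both forms agree for a given n. */
void check_same(int n) {
    assert(sum_below(n) == sum_below_branchy(n));
}
```

Each line of the second version maps onto roughly one instruction type from step 2: a compare plus conditional branch, two arithmetic instructions, and an unconditional branch.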

Challenges

Challenges in learning Assembly
  – Must understand how a program works cycle by cycle
  – Have to memorize some notations before fully understanding them

Announcements

• Final Projects
  – Project Teams: Demo Thursday 7/6
    • +5 bonus if Demoed by Wednesday 7/5
  – Reminder, lab attendance is mandatory: -10 points from final project for each lab session you miss
• Exam 2: In class Thursday 11/9
• Reading for remainder of semester
  – Chapter 2.1-2.3, and 2.6.1-2.6.2
  – Chapter 4.1 – 4.3
  – Assemble ARM instruction set manual.
    • Preface, Chapter 3, Chapter 4
  – ARM Procedure Call Standard
    • Sections: 5, 7.1.1, 7.2.