Introduction to the ARM Architecture

Max Mauro Dias Santos
Ponta Grossa, Paraná - Brazil
February 12, 2021

Agenda

. Introduction to ARM
. Architecture
. Programmers Model
. Instruction Set
. System Design
. Development Tools

ARM Partnership Model

[Figure: Design, Manufacture, Market]

ARM Powered Products

ARM7 applications

ARM9 applications

ARM11 applications

ARM Cortex-M applications

. Dell E4300 Latitude Laptop
. The ARM hardware gives users instant boot-up and access to select applications, with multi-day battery lifetimes

ARM Cortex-A applications

ARM Cortex-R

Intellectual Property

. ARM provides hard and soft views to licensees
  . RTL and synthesis flows
  . GDSII layout
. Licensees have the right to use hard or soft views of the IP
  . Soft views include gate-level netlists
  . Hard views are DSMs
. OEMs must use hard views
  . to protect ARM IP

Topologies

. Von Neumann: ARM7s and older
. Harvard: ARM9s and newer
. Memory-mapped I/O:
  . No specific instructions for I/O (use Load/Store instructions instead)
  . Peripheral’s registers at some memory addresses

[Figure: Von Neumann and Harvard topologies. Labels: Inst. and Data caches (I, D), bus interface, AHB bus, MEMORY & I/O]

ARM7TDMI Block Diagram

[Figure: ARM7TDMI block diagram. Labels: A[31:0], address register, address incrementer, PC bus, register bank, instruction decoder, control lines, ALU bus, multiplier, A bus, B bus, shift, A.L.U., instruction register, Thumb to ARM translator, write data register, read data register, D[31:0]]

. Load/store architecture
. A large array of uniform registers
. Fixed-length 32-bit instructions
. 3-address instructions

RISC Architecture

. The Berkeley RISC designs incorporated a Reduced Instruction Set Computer (RISC) architecture.
. It has the following key features:
  . A fixed (32-bit) instruction size with few formats
    . CISC processors typically had variable-length instruction sets with many formats
  . A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory
    . CISC processors typically allowed values in memory to be used as operands in data processing instructions
  . A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently
    . CISC register sets were getting larger, but none was this large and most had different registers for different purposes

RISC Organization

. Hard-wired instruction decode logic
  . CISC used large ROMs to decode their instructions
. Pipelined execution
  . CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)
. Single-cycle execution
  . CISC processors typically took many clock cycles to complete a single instruction

→ Simple is beautiful
The compiler plays an important role

ARM Architecture vs. Berkeley RISC

. Features used
  . Load/Store architecture
  . Fixed-length 32-bit instructions
  . 3-address instruction formats

      f bits      n bits        n bits        n bits
      function    op 1 addr.    op 2 addr.    dest. addr.

      ADD d, S1, S2 ; d := S1 + S2

. Features rejected
  . Register windows → costly
    . ARM uses shadow (banked) registers instead
  . Delayed branches
    . Work badly with branch prediction
  . Single-cycle execution of all instructions
    . Most instructions take a single cycle; many others take multiple clock cycles

ARM Features

. Different from pure RISC in several ways:
  . Variable cycle execution for certain instructions: multiple-register load/store (faster/higher code density)
  . Inline barrel shifter leading to more complex instructions: improves performance and code density
  . Thumb 16-bit instruction set: 30% code density improvement
  . Conditional execution: improves performance and code density by reducing branches
  . Enhanced instructions: DSP instructions

Data Sizes and Instruction Sets

. The ARM is a 32-bit architecture.

. When used in relation to the ARM:

. Byte means 8 bits

. Halfword means 16 bits (two bytes)

. Word means 32 bits (four bytes)

. Most ARMs implement two instruction sets
  . 32-bit ARM Instruction Set
  . 16-bit Thumb Instruction Set

Data Types

. ARM processor supports 6 data types
  . 8-bit signed and unsigned bytes
  . 16-bit signed and unsigned half-words, aligned on 2-byte boundaries
  . 32-bit signed and unsigned words, aligned on 4-byte boundaries
. ARM instructions are all 32-bit words, word-aligned
. Thumb instructions are half-words, aligned on 2-byte boundaries

ARM Pipelining examples

. Fetch: Read op-code from memory into an internal register
. Decode: Activate the appropriate control lines depending on the op-code
. Execute: Do the actual processing

ARM7TDMI Pipeline

  FETCH | DECODE | EXECUTE (Reg. Read, Shift, ALU, Reg. Write)
  Each stage: 1 clock cycle

ARM9TDMI Pipeline

  FETCH | DECODE (Reg. Read) | EXECUTE (Shift, ALU) | MEMORY (access) | WRITE (Reg. Write)
  Each stage: 1 clock cycle

ARM7TDMI Pipelining (I)

. Simple instructions (like ADD) complete at a rate of one per cycle

  instruction   stages, in time order (each row starts one cycle after the previous one)
  1             FETCH  DECODE  EXECUTE
  2             FETCH  DECODE  EXECUTE
  3             FETCH  DECODE  EXECUTE

ARM7TDMI Pipelining (II)

• More complex instructions:

  instruction   stages, in time order
  1  ADD        FETCH  DECODE  EXECUTE
  2  STR        FETCH  DECODE  Cal. ADDR  Data Xfer.
  3  ADD        FETCH  stall   DECODE     EXECUTE
  4  ADD        FETCH  stall   DECODE     EXECUTE
  5  ADD        FETCH  DECODE  EXECUTE

STR: 2 effective clock cycles (+1 cycle)

Arithmetic and Carry Flag

• Same as 6502, PowerPC (Borrow = not Carry)
• In contrast with Z80, Intel, m68k, many others (Borrow = Carry)

[Figure: ALU equivalent for arithmetic instructions. 32-bit inputs A and B and carry-in Ci produce the 32-bit result R and carry-out Co, which goes to C_flag. Ci = 0 for ADD, Ci = 1 for SUB, Ci = C_flag for ADC, SBC]

Carry flag behavior for subtraction, SBC R, #0 (4-bit examples, #0 enters the adder inverted as 1111):

  R = 1010, Ci = 0:  1010 + 1111 + 0 = 1 1001   (Co = 1, result 1001)
  R = 1010, Ci = 1:  1010 + 1111 + 1 = 1 1010   (Co = 1, result 1010)
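The adder view of subtraction above can be sketched in a few lines of Python. This is only an illustration under the slide's model (the second operand is inverted and the carry-in selects SUB or SBC); the function names are hypothetical and not part of any ARM tool.

```python
MASK32 = 0xFFFFFFFF  # keep values within 32 bits


def add_with_carry(a, b, carry_in):
    """Return (result, carry_out) of a 32-bit add with carry-in."""
    total = (a & MASK32) + (b & MASK32) + carry_in
    return total & MASK32, total >> 32


def arm_sub(a, b):
    """SUB modelled as A + NOT(B) + 1; carry_out = 1 means no borrow."""
    return add_with_carry(a, ~b & MASK32, 1)


def arm_sbc(a, b, c_flag):
    """SBC modelled as A + NOT(B) + C_flag."""
    return add_with_carry(a, ~b & MASK32, c_flag)


if __name__ == "__main__":
    print(arm_sub(5, 3))      # no borrow, so the carry comes out set
    print(arm_sub(3, 5))      # borrow, so the carry comes out clear
    print(arm_sbc(10, 0, 0))  # a clear carry subtracts one extra
```

A clear carry after a subtraction therefore means that a borrow occurred, which is the opposite of the Z80 or x86 convention.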

Carry acts as an inverted borrow

Processor Modes

. The ARM has seven operating modes:

. User: unprivileged mode under which most tasks run

. FIQ: entered when a high priority (fast) interrupt is raised

. IRQ: entered when a low priority (normal) interrupt is raised

. SVC: (Supervisor) entered on reset and when a Software Interrupt instruction is executed

. Abort: used to handle memory access violations

. Undef: used to handle undefined instructions

. System: privileged mode using the same registers as user mode

The Registers

. ARM has 37 registers, all of which are 32 bits long.
  . 1 dedicated program counter
  . 1 dedicated current program status register
  . 5 dedicated saved program status registers
  . 30 general purpose registers
. The current processor mode governs which of several banks is accessible. Each mode can access
  . a particular set of r0-r12 registers
  . a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  . the program counter, r15 (pc)
  . the current program status register, cpsr
. Privileged modes (except System) can also access
  . a particular spsr (saved program status register)

The ARM Register Set

[Figure: current visible registers (r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr, spsr) for the selected mode, next to the banked-out registers of the User, FIQ, IRQ, SVC, Undef and Abort modes]

Special Registers

. Special function registers:
  . PC (R15): Program Counter. Any instruction with PC as its destination register is a program branch
  . LR (R14): Link Register. Saves a copy of PC when executing the BL instruction (subroutine call) or when jumping to an exception or interrupt routine
    - It is copied back to PC on the return from those routines
  . SP (R13): Stack Pointer. There is no stack in the ARM architecture. Even so, R13 is usually reserved as a pointer for the program-managed stack
  . CPSR: Current Program Status Register. Holds the visible status register
  . SPSR: Saved Program Status Register. Holds a copy of the previous status register while executing exception or interrupt routines
    - It is copied back to CPSR on the return from the exception or interrupt
    - No SPSR available in User or System modes

Register Organization Summary

. User, SYS: r0-r12, r13 (sp), r14 (lr), r15 (pc), cpsr
. FIQ: User mode r0-r7, r15 and cpsr; own r8-r12, r13 (sp), r14 (lr) and spsr
. IRQ: User mode r0-r12, r15 and cpsr; own r13 (sp), r14 (lr) and spsr
. SVC: User mode r0-r12, r15 and cpsr; own r13 (sp), r14 (lr) and spsr
. Undef: User mode r0-r12, r15 and cpsr; own r13 (sp), r14 (lr) and spsr
. Abort: User mode r0-r12, r15 and cpsr; own r13 (sp), r14 (lr) and spsr

Note: System mode uses the User mode register set

Program Status Registers
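As a rough illustration of the status register layout on this slide (N, Z, C, V in bits 31 to 28, I, F and T in bits 7 to 5, mode in bits 4 to 0), the following Python sketch splits a 32-bit value into its fields. The helper name `decode_psr` is hypothetical, and the mode table only covers the seven encodings listed on the slide.

```python
# Mode encodings as listed on the slide (bits 4..0 of the CPSR)
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_psr(value):
    """Split a 32-bit program status register value into named fields."""
    return {
        "N": (value >> 31) & 1,  # negative result
        "Z": (value >> 30) & 1,  # zero result
        "C": (value >> 29) & 1,  # carry out
        "V": (value >> 28) & 1,  # overflow
        "I": (value >> 7) & 1,   # 1 = IRQ disabled
        "F": (value >> 6) & 1,   # 1 = FIQ disabled
        "T": (value >> 5) & 1,   # 1 = Thumb state
        "mode": MODES.get(value & 0x1F, "unknown"),
    }


if __name__ == "__main__":
    print(decode_psr(0x600000D3))
```

The sketch only reads the fields; as the slide stresses, real code should leave the undefined bits alone and change T only through BX.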

  31 30 29 28 27 ... 8    7 6 5 4 ... 0
  N  Z  C  V  undefined   I F T mode

  Fields: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

. Condition code flags
  . N = Negative result from ALU
  . Z = Zero result from ALU
  . C = ALU operation Carried out
  . V = ALU operation oVerflowed
. Interrupt Disable bits
  . I = 1: Disables the IRQ
  . F = 1: Disables the FIQ
. T Bit (Arch. with Thumb mode only)
  . T = 0: Processor in ARM state
  . T = 1: Processor in Thumb state
  . Never change T directly (use BX instead)
  . Changing T in CPSR will lead to unexpected behavior due to pipelining
. Mode bits
  . 10000 User
  . 10001 FIQ
  . 10010 IRQ
  . 10011 Supervisor
  . 10111 Abort
  . 11011 Undefined
  . 11111 System
. Tip: Don’t change undefined bits.
  . This allows for code compatibility with newer ARM processors

Program Counter (r15)

. When the processor is executing in ARM state:
  . All instructions are 32 bits wide
  . All instructions must be word aligned
  . Therefore the pc value is stored in bits [31:2] with bits [1:0] undefined (as instructions cannot be halfword or byte aligned)
. When the processor is executing in Thumb state:
  . All instructions are 16 bits wide
  . All instructions must be halfword aligned
  . Therefore the pc value is stored in bits [31:1] with bit [0] undefined (as instructions cannot be byte aligned)
. When the processor is executing in Jazelle state:
  . All instructions are 8 bits wide
  . Processor performs a word access to read 4 instructions at once

Exception Handling

• When an exception occurs, the ARM:
  • Copies CPSR into SPSR_<mode>
  • Sets appropriate CPSR bits
    • Change to ARM state
    • Change to exception mode
    • Disable interrupts (if appropriate)
  • Stores the return address in LR_<mode>
  • Sets PC to vector address
• To return, exception handler needs to:
  • Restore CPSR from SPSR_<mode>
  • Restore PC from LR_<mode>
  This can only be done in ARM state.

Vector Table

  0x1C  FIQ
  0x18  IRQ
  0x14  (Reserved)
  0x10  Data Abort
  0x0C  Prefetch Abort
  0x08  Software Interrupt
  0x04  Undefined Instruction
  0x00  Reset

Vector table can be at 0xFFFF0000 on ARM720T and on ARM9/10 family devices

Development of the ARM Architecture

. Versions 1, 2, 3: Early ARM architectures
. Version 4: Halfword and signed halfword / byte support; System mode (SA-110, SA-1110)
. Version 4T: Thumb instruction set (ARM7TDMI, ARM9TDMI, ARM720T, ARM940T)
. Version 5TE: Improved ARM/Thumb Interworking; CLZ; Saturated maths; DSP multiply-accumulate instructions (ARM1020E, XScale, ARM9E-S, ARM966E-S)
. Version 5TEJ: Jazelle, Java bytecode execution (ARM9EJ-S, ARM926EJ-S, ARM7EJ-S, ARM1026EJ-S)
. Version 6: SIMD Instructions; Multi-processing; V6 Memory architecture (VMSA); Unaligned data support (ARM1136EJ-S)

Registers

. ARM has 37 registers, all of which are 32 bits long
  . 1 dedicated program counter
  . 1 dedicated current program status register
  . 5 dedicated saved program status registers
  . 30 general purpose registers
. The current processor mode governs which bank is accessible
. Each mode can access
  . A particular set of r0 – r12 registers
  . A particular r13 (stack pointer, SP) and r14 (link register, LR)
  . The program counter, r15 (PC)
  . The current program status register, CPSR
. Privileged modes (except system) can access
  . A particular SPSR (Saved Program Status Register)

Registers Banking Again

. Usable in user mode (user mode and system mode): r0 – r14, r15 (PC), CPSR
. Exception modes only:
  . fiq mode: r8_fiq – r12_fiq, r13_fiq, r14_fiq, SPSR_fiq
  . svc mode: r13_svc, r14_svc, SPSR_svc
  . abort mode: r13_abt, r14_abt, SPSR_abt
  . irq mode: r13_irq, r14_irq, SPSR_irq
  . undefined mode: r13_und, r14_und, SPSR_und

General Purpose Registers

. The unbanked registers
  . r0 – r7
  . user and system mode refer to the same physical registers
. The banked registers
  . r8_fiq – r12_fiq, r13_<mode>, and r14_<mode>
  . The set of physical registers depends on the processor mode
  . r13 is normally used as the stack pointer (SP)
  . r14 is also known as the link register (LR), which is used to store the return address from a subroutine
. Register 15, PC
  . r15 is the program counter

Program Counter (r15)

. When the processor is executing in ARM state:
  . All instructions are 32 bits wide
  . All instructions must be word-aligned
  . Therefore the PC value is stored in bits [31:2] with bits [1:0] undefined (as instructions cannot be halfword- or byte-aligned)
. When the processor is executing in Thumb state:
  . All instructions are 16 bits wide
  . All instructions must be halfword-aligned
  . Therefore the PC value is stored in bits [31:1] with bit [0] undefined (as instructions cannot be byte-aligned)

Saved Program Status Register (SPSR)

. Each privileged mode (except system mode) has an SPSR associated with it
. This SPSR is used to save the state of the CPSR when the privileged mode is entered, so that the user state can be fully restored when the user process is resumed
. Often the SPSR may be untouched from the time the privileged mode is entered to the time it is used to restore the CPSR
. If the privileged code can re-enter its own mode (for example, supervisor code that makes supervisor calls), the SPSR must be copied into a general register and saved

Exceptions

. Exceptions are usually used to handle unexpected events which arise during the execution of a program, such as interrupts or memory faults. The term also covers software interrupts, undefined instruction traps, and the system reset
. Three groups:
  . Exceptions generated as the direct effect of executing an instruction
    . Software interrupts, undefined instructions, and prefetch aborts
  . Exceptions generated as a side effect of an instruction
    . Data aborts
  . Exceptions generated externally
    . Reset, IRQ and FIQ

Exception Entry (1/2)

. When an exception arises
  . ARM completes the current instruction as best it can (except for the reset exception)
  . It then handles the exception, starting from a specific location (the exception vector)
. Processor performs the following sequence:
  . Change to the operating mode corresponding to the particular exception
  . Store the return address in LR_<mode>
  . Copy old CPSR into SPSR_<mode>
  . Set appropriate CPSR bits
    . If the core is currently in Thumb state, ARM state is entered
    . Disable IRQs by setting bit 7
    . If the exception is a fast interrupt, disable further fast interrupts by setting bit 6 of the CPSR

Exception Entry (2/2)

. Force PC to relevant vector address

  Priority  Exception                                         Mode   Vector address
  1         Reset                                             SVC    0x00000000
  2         Data abort (data access memory fault)             Abort  0x00000010
  3         FIQ (fast interrupt)                              FIQ    0x0000001C
  4         IRQ (normal interrupt)                            IRQ    0x00000018
  5         Prefetch abort (instruction fetch memory fault)   Abort  0x0000000C
  6         Undefined instruction                             UND    0x00000004
  6         Software interrupt (SWI)                          SVC    0x00000008

. Normally the vector address contains a branch to the relevant routine
. Exception handlers use r13_<mode> and r14_<mode> to hold the stack pointer and return address
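The entry sequence from these two slides can be sketched as a small Python model. It is a simplification under stated assumptions: the function name `enter_exception` and the dictionary it returns are hypothetical, the return address is passed in rather than derived from the pipeline, and reset is left out because it does not preserve the old state.

```python
# (vector address, mode bits) per exception, taken from the vector table above
VECTORS = {
    "undefined": (0x04, 0b11011),
    "swi": (0x08, 0b10011),
    "prefetch_abort": (0x0C, 0b10111),
    "data_abort": (0x10, 0b10111),
    "irq": (0x18, 0b10010),
    "fiq": (0x1C, 0b10001),
}

I_BIT = 1 << 7  # IRQ disable
F_BIT = 1 << 6  # FIQ disable
T_BIT = 1 << 5  # Thumb state


def enter_exception(kind, cpsr, return_address):
    """Return the state the core would hold after taking the exception."""
    vector, mode = VECTORS[kind]
    new_cpsr = (cpsr & ~0x1F) | mode  # change to the exception mode
    new_cpsr &= ~T_BIT                # exceptions are always entered in ARM state
    new_cpsr |= I_BIT                 # disable normal interrupts
    if kind == "fiq":
        new_cpsr |= F_BIT             # a fast interrupt also masks further FIQs
    return {
        "spsr": cpsr,                 # old CPSR saved in SPSR_<mode>
        "lr": return_address,         # return address saved in LR_<mode>
        "cpsr": new_cpsr,
        "pc": vector,                 # PC forced to the vector address
    }


if __name__ == "__main__":
    state = enter_exception("irq", 0x60000010, 0x8004)
    print({name: hex(value) for name, value in state.items()})
```

Keeping the old CPSR in the returned `spsr` field mirrors why a handler can later restore the interrupted state in a single step.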

Exception Return

. Once the exception has been handled, the user task is normally resumed
. The sequence is
  . Any modified user registers must be restored from the handler’s stack
  . CPSR must be restored from the appropriate SPSR
  . PC must be changed back to the relevant instruction address
. The last two steps happen atomically as part of a single instruction

Memory Organization

. Word, half-word alignment (xxxx00 or xxxxx0)
. ARM can be set up to access data in either little-endian or big-endian format, though it defaults to little-endian.
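To make the two points above concrete, the sketch below uses Python's standard `struct` module to lay out one 32-bit word in both byte orders and checks alignment from the low address bits. The helper names are hypothetical.

```python
import struct


def word_bytes(value, big_endian=False):
    """Return the four bytes of a 32-bit word in memory order."""
    return struct.pack(">I" if big_endian else "<I", value)


def is_word_aligned(address):
    """Word addresses end in binary 00."""
    return address & 0b11 == 0


def is_halfword_aligned(address):
    """Halfword addresses end in binary 0."""
    return address & 0b1 == 0


if __name__ == "__main__":
    print(word_bytes(0x11223344).hex())                   # little-endian layout
    print(word_bytes(0x11223344, big_endian=True).hex())  # big-endian layout
    print(is_word_aligned(0x1002), is_halfword_aligned(0x1002))
```

In the little-endian layout the least significant byte sits at the lowest address, which is the default the slide refers to.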

Features of the ARM Instruction Set

. Load-store architecture
  . Process values which are in registers
  . Load, store instructions for memory data accesses
. 3-address data processing instructions
. Conditional execution of every instruction
. Load and store multiple registers
. Shift, ALU operation in a single instruction
. Open instruction set extension through the coprocessor instructions
. Very dense 16-bit compressed instruction set (Thumb)

Coprocessors

. Up to 16 coprocessors can be defined
. Expands the ARM instruction set
. Each coprocessor can have up to 16 private registers of any reasonable size
. Load-store architecture

[Figure: the ARM core, Coprocessor X and Coprocessor Y, each with its own F, D, E pipeline stages]

A coprocessor is a computer processor used to supplement the functions of the primary processor (the CPU). Operations performed by the coprocessor may be floating point arithmetic, graphics, signal processing, string processing, cryptography or I/O interfacing with peripheral devices.

Thumb

. Thumb is a 16-bit instruction set
  . Optimized for code density from C code
  . Improved performance from narrow memory
  . Subset of the functionality of the ARM instruction set
. Core has two execution states – ARM and Thumb
  . Switch between them using the BX instruction
. Thumb has characteristic features:
  . Most Thumb instructions are executed unconditionally
  . Many Thumb data processing instructions use a 2-address format
  . Thumb instruction formats are less regular than ARM instruction formats, as a result of the dense encoding.

I/O System

. ARM handles input/output peripherals as memory-mapped devices with interrupt support
. Internal registers in I/O devices appear as addressable locations within ARM’s memory map, read and written using load-store instructions
. Interrupt by normal interrupt (IRQ) or fast interrupt (FIQ)
. Interrupt input signals are level-sensitive and maskable
. May include Direct Memory Access (DMA) hardware

Conditional Execution and Flags

• ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field.
  • This improves code density and performance by reducing the number of forward branch instructions.

        CMP   r3,#0                 CMP   r3,#0
        BEQ   skip                  ADDNE r0,r1,r2
        ADD   r0,r1,r2
    skip

• By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”.

    loop
        …
        SUBS  r1,r1,#1      ; decrement r1 and set flags
        BNE   loop          ; if Z flag clear then branch

Condition Codes

. The possible condition codes are listed below:
  . Note AL is the default and does not need to be specified

  Suffix   Description               Flags tested
  EQ       Equal                     Z=1
  NE       Not equal                 Z=0
  CS/HS    Unsigned higher or same   C=1
  CC/LO    Unsigned lower            C=0
  MI       Minus                     N=1
  PL       Positive or Zero          N=0
  VS       Overflow                  V=1
  VC       No overflow               V=0
  HI       Unsigned higher           C=1 & Z=0
  LS       Unsigned lower or same    C=0 or Z=1
  GE       Greater or equal          N=V
  LT       Less than                 N!=V
  GT       Greater than              Z=0 & N=V
  LE       Less than or equal        Z=1 or N!=V
  AL       Always

Examples of conditional execution
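Before the assembly examples, here is a hedged Python sketch of how the condition table can be evaluated. It models the flags a CMP would produce for two 32-bit values and then looks up a suffix; `cmp_flags` and `condition_passes` are hypothetical names for this illustration only.

```python
MASK32 = 0xFFFFFFFF


def cmp_flags(a, b):
    """Return (N, Z, C, V) as CMP a, b would set them for 32-bit values."""
    a &= MASK32
    b &= MASK32
    result = (a - b) & MASK32
    n = result >> 31
    z = int(result == 0)
    c = int(a >= b)                     # carry set means no borrow
    v = ((a ^ b) & (a ^ result)) >> 31  # signed overflow of the subtraction
    return n, z, c, v


# One predicate per suffix, following the "Flags tested" column
CONDITIONS = {
    "EQ": lambda n, z, c, v: z == 1,
    "NE": lambda n, z, c, v: z == 0,
    "CS": lambda n, z, c, v: c == 1,
    "HS": lambda n, z, c, v: c == 1,
    "CC": lambda n, z, c, v: c == 0,
    "LO": lambda n, z, c, v: c == 0,
    "MI": lambda n, z, c, v: n == 1,
    "PL": lambda n, z, c, v: n == 0,
    "VS": lambda n, z, c, v: v == 1,
    "VC": lambda n, z, c, v: v == 0,
    "HI": lambda n, z, c, v: c == 1 and z == 0,
    "LS": lambda n, z, c, v: c == 0 or z == 1,
    "GE": lambda n, z, c, v: n == v,
    "LT": lambda n, z, c, v: n != v,
    "GT": lambda n, z, c, v: z == 0 and n == v,
    "LE": lambda n, z, c, v: z == 1 or n != v,
    "AL": lambda n, z, c, v: True,
}


def condition_passes(suffix, flags):
    """True if an instruction with this suffix would execute."""
    return CONDITIONS[suffix](*flags)


if __name__ == "__main__":
    flags = cmp_flags(0xFFFFFFFF, 1)  # -1 versus 1 when read as signed
    print(flags, condition_passes("LT", flags), condition_passes("HI", flags))
```

The same bit pattern can be "less than" in the signed sense and "higher" in the unsigned sense, which is why the table has separate suffixes for the two readings.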

• Use a sequence of several conditional instructions

    if (a==0) func(1);

        CMP   r0,#0
        MOVEQ r0,#1
        BLEQ  func

• Set the flags, then use various condition codes

    if (a==0) x=0;
    if (a>0)  x=1;

        CMP   r0,#0
        MOVEQ r1,#0
        MOVGT r1,#1

• Use conditional compare instructions

    if (a==4 || a==10) x=0;

        CMP   r0,#4
        CMPNE r0,#10
        MOVEQ r1,#0

Branch instructions

. Branch : B{<cond>} label
. Branch with Link : BL{<cond>} subroutine_label

  31    28 27   25 24 23                         0
  Cond     1 0 1   L   Offset

  Cond: condition field
  L: link bit (0 = Branch, 1 = Branch with link)
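The branch format in the figure can be sketched in Python as an encoder and a decoder. Two things here are not stated on the slide and are assumptions of the sketch: the function names are hypothetical, and the offset is taken relative to the instruction address plus 8, which is how the PC reads in ARM state because of the pipeline.

```python
AL = 0b1110  # "always" condition code


def encode_branch(cond, link, instr_address, target):
    """Build the 32-bit word for B/BL located at instr_address."""
    offset = target - (instr_address + 8)  # PC reads 8 bytes ahead in ARM state
    if offset % 4 != 0:
        raise ValueError("target must be word aligned")
    word_offset = offset >> 2              # the field holds a word offset
    if not -(1 << 23) <= word_offset < (1 << 23):
        raise ValueError("target out of the +/- 32 Mbyte range")
    return (cond << 28) | (0b101 << 25) | (link << 24) | (word_offset & 0xFFFFFF)


def branch_target(instr_address, instruction):
    """Recover the destination address from a B/BL instruction word."""
    offset = instruction & 0xFFFFFF
    if offset & 0x800000:                  # sign-extend the 24-bit field
        offset -= 1 << 24
    return (instr_address + 8 + (offset << 2)) & 0xFFFFFFFF


if __name__ == "__main__":
    word = encode_branch(AL, 0, 0x8000, 0x8000)  # a branch to itself
    print(hex(word), hex(branch_target(0x8000, word)))
```

A 24-bit signed word offset covers 2^23 words in each direction, which is where the 32 Mbyte range comes from.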

. The processor core shifts the offset field left by 2 positions, sign-extends it and adds it to the PC
  . ± 32 Mbyte range
  . How to perform longer branches?

Data processing Instructions

. Consist of:
  . Arithmetic: ADD ADC SUB SBC RSB RSC
  . Logical: AND ORR EOR BIC
  . Comparisons: CMP CMN TST TEQ
  . Data movement: MOV MVN
. These instructions only work on registers, NOT memory.
. Syntax:
  . <Operation>{<cond>}{S} Rd, Rn, Operand2
  . Comparisons set flags only - they do not specify Rd
  . Data movement does not specify Rn
. Second operand is sent to the ALU via barrel shifter.

Data processing Instructions

  31   28 27 26 25 24      21 20 19  16 15  12 11          0
  cond    0  0  L  op-code    S  Rn     Rd     Operand 2

. L, Literal: 0: Operand 2 from register, 1: Operand 2 immediate
. {S} means that the Status register is going to be updated
. Comparisons always update the status register. Rd is not specified

The Barrel Shifter

. LSL: Logical Shift Left
    CF ← Destination ← 0
    Multiplication by a power of 2
. LSR: Logical Shift Right
    ...0 → Destination → CF
    Division by a power of 2
. ASR: Arithmetic Shift Right
    Destination → CF
    Division by a power of 2, preserving the sign bit
. ROR: Rotate Right
    Destination → CF
    Bit rotate with wrap around from LSB to MSB
. RRX: Rotate Right Extended
    Destination → CF
    Single bit rotate with wrap around from CF to MSB

Using the Barrel Shifter: The Second Operand
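The five shift types can be modelled in Python on 32-bit values. This is a minimal sketch, assuming shift amounts from 1 to 31 only (the special cases for 0 and 32 or more are not covered); each hypothetical helper returns the shifted value together with the carry-out bit.

```python
MASK32 = 0xFFFFFFFF


def lsl(value, n):
    """Logical shift left: zeros enter at the bottom."""
    return (value << n) & MASK32, (value >> (32 - n)) & 1


def lsr(value, n):
    """Logical shift right: zeros enter at the top."""
    return value >> n, (value >> (n - 1)) & 1


def asr(value, n):
    """Arithmetic shift right: copies of the sign bit enter at the top."""
    result = value >> n
    if value >> 31:
        result |= (MASK32 << (32 - n)) & MASK32
    return result, (value >> (n - 1)) & 1


def ror(value, n):
    """Rotate right: bits leaving the bottom re-enter at the top."""
    result = ((value >> n) | (value << (32 - n))) & MASK32
    return result, result >> 31


def rrx(value, carry_in):
    """Rotate right by one bit through the carry flag."""
    return (carry_in << 31) | (value >> 1), value & 1


if __name__ == "__main__":
    print(lsl(3, 4))                    # multiply by 16
    print(asr(0xFFFFFFF8, 2))           # -8 / 4 stays negative
    print([hex(x) for x in ror(0xFF, 8)])
```

The arithmetic shift differs from the logical one only in what it feeds into the top bits, which is what keeps signed division by a power of two correct.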

Register, optionally with shift operation
• Shift value can be either:
  • 5 bit unsigned integer
  • Specified in bottom byte of another register.
• Used for multiplication by constant

Immediate value
• 8 bit number, with a range of 0-255.
  • Rotated right through even number of positions
• Allows increased range of 32-bit constants to be loaded directly into registers

[Figure: Operand 1 goes directly to the ALU; Operand 2 passes through the barrel shifter before the ALU; the ALU produces the result]

Immediate constants (1)

. No ARM instruction can contain a 32 bit immediate constant
  . All ARM instructions are fixed as 32 bits long
. The data processing instruction format has 12 bits available for operand2

  11    8 7          0
  rot     immed_8

  [Figure: immed_8 enters the shifter and is rotated right (ROR) by rot x2]

  Quick Quiz: 0xe3a004ff
              MOV r0, #???

. 4 bit rotate value (0-15) is multiplied by two to give range 0-30 in steps of 2
. Rule to remember is “8-bits shifted by an even number of bit positions”.

Immediate constants (2)
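The rot and immed_8 fields can be decoded with a few lines of Python. This is a sketch rather than a disassembler: `decode_operand2_immediate` is a hypothetical helper that assumes the instruction word really is a data processing instruction in immediate form.

```python
def ror32(value, amount):
    """Rotate a 32-bit value right by amount bits."""
    amount %= 32
    return ((value >> amount) | (value << (32 - amount))) & 0xFFFFFFFF


def decode_operand2_immediate(instruction):
    """Return the constant encoded in bits [11:0] of the instruction word."""
    rot = (instruction >> 8) & 0xF  # 4-bit rotate field
    imm8 = instruction & 0xFF       # 8-bit immediate field
    return ror32(imm8, 2 * rot)     # rotate right by twice the rot field


if __name__ == "__main__":
    # Try it on the quiz word from the slide
    print(hex(decode_operand2_immediate(0xE3A004FF)))
```

Doubling the 4-bit field is what restricts the rotation to even amounts from 0 to 30.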

• Examples:

  ror #0    range 0-0x000000ff   step 0x00000001
  ror #8    range 0-0xff000000   step 0x01000000
  ror #30   range 0-0x000003fc   step 0x00000004

• The assembler converts immediate values to the rotate form:
  • MOV r0,#4096 ; uses 0x40 ror 26
  • ADD r1,r2,#0xFF0000 ; uses 0xFF ror 16
• The bitwise complements can also be formed using MVN:
  • MOV r0, #0xFFFFFFFF ; assembles to MVN r0,#0
• Values that cannot be generated in this way will cause an error.

Loading 32 bit constants
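Whether a constant fits the rotate form can be checked by trying all sixteen rotations. The Python sketch below is an illustration with hypothetical names (`encode_immediate`, `choose_instruction`); it returns the first encoding it finds, which need not be the one a particular assembler would pick, since some constants have several valid encodings.

```python
MASK32 = 0xFFFFFFFF


def encode_immediate(value):
    """Return (rot, imm8) such that imm8 ROR (2*rot) == value, or None."""
    value &= MASK32
    for rot in range(16):
        amount = 2 * rot
        # Rotating left by the same amount undoes the rotate right
        imm8 = ((value << amount) | (value >> (32 - amount))) & MASK32
        if imm8 <= 0xFF:
            return rot, imm8
    return None


def choose_instruction(value):
    """Pick how a 32-bit constant could be loaded into a register."""
    if encode_immediate(value) is not None:
        return "MOV"
    if encode_immediate(~value) is not None:
        return "MVN"           # the bitwise complement fits instead
    return "literal pool"      # needs a PC-relative load


if __name__ == "__main__":
    for constant in (0xFF, 0xFF0000, 0xFFFFFFFF, 0x55555555):
        print(hex(constant), choose_instruction(constant))
```

The three outcomes correspond to the cases on the surrounding slides: a direct MOV, an MVN of the complement, or a constant that has to be read from memory.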

• To allow larger constants to be loaded, the assembler offers a pseudo-instruction:
  • LDR rd, =const
• This will either:
  • Produce a MOV or MVN instruction to generate the value (if possible), or
  • Generate a LDR instruction with a PC-relative address to read the constant from a literal pool (constant data area embedded in the code).
• For example
  • LDR r0,=0xFF        =>  MOV r0,#0xFF
  • LDR r0,=0x55555555  =>  LDR r0,[PC,#Imm12]
                            …
                            …
                            DCD 0x55555555
• This is the recommended way of loading constants into a register

Multiply

. Syntax:
  . MUL{<cond>}{S} Rd, Rm, Rs                    Rd = Rm * Rs
  . MLA{<cond>}{S} Rd, Rm, Rs, Rn                Rd = (Rm * Rs) + Rn
  . [U|S]MULL{<cond>}{S} RdLo, RdHi, Rm, Rs      RdHi,RdLo := Rm*Rs
  . [U|S]MLAL{<cond>}{S} RdLo, RdHi, Rm, Rs      RdHi,RdLo := (Rm*Rs)+RdHi,RdLo
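The register-level effect of these forms can be sketched in Python. The helpers below are hypothetical and ignore flags and timing; they only show how a 64-bit product is split across RdLo and RdHi and how the signed and unsigned long multiplies differ.

```python
MASK32 = 0xFFFFFFFF
MASK64 = 0xFFFFFFFFFFFFFFFF


def to_signed32(value):
    """Interpret a 32-bit pattern as a two's complement number."""
    return value - (1 << 32) if value & 0x80000000 else value


def mul(rm, rs):
    """MUL: only the low 32 bits of the product are kept."""
    return (rm * rs) & MASK32


def mla(rm, rs, rn):
    """MLA: multiply, then accumulate, truncated to 32 bits."""
    return (rm * rs + rn) & MASK32


def umull(rm, rs):
    """UMULL: unsigned 64-bit product as (RdLo, RdHi)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, product >> 32


def smull(rm, rs):
    """SMULL: signed 64-bit product as (RdLo, RdHi)."""
    product = (to_signed32(rm) * to_signed32(rs)) & MASK64
    return product & MASK32, product >> 32


if __name__ == "__main__":
    print([hex(x) for x in umull(0xFFFFFFFF, 2)])
    print([hex(x) for x in smull(0xFFFFFFFF, 2)])  # -1 * 2
```

The low word is the same in both cases; only the high word shows whether the operands were treated as signed.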

. Cycle time
  . Basic MUL instruction
    . 2-5 cycles on ARM7TDMI
    . 1-3 cycles on StrongARM/XScale
    . 2 cycles on ARM9E/ARM102xE
  . +1 cycle for ARM9TDMI (over ARM7TDMI)
  . +1 cycle for accumulate (not on 9E though result delay is one cycle longer)
  . +1 cycle for “long”
. Above are “general rules” - refer to the TRM for the core you are using for the exact details

Single register data transfer

  LDR     STR     Word
  LDRB    STRB    Byte
  LDRH    STRH    Halfword
  LDRSB           Signed byte load
  LDRSH           Signed halfword load

• Memory system must support all access sizes
• Syntax:
  • LDR{<cond>}{<size>} Rd, <address>
  • STR{<cond>}{<size>} Rd, <address>
  e.g. LDREQB

Address accessed

. Address accessed by LDR/STR is specified by a base register plus an offset
. For word and unsigned byte accesses, offset can be
  . An unsigned 12-bit immediate value (i.e. 0 - 4095 bytes).
      LDR r0,[r1,#8]
  . A register, optionally shifted by an immediate value
      LDR r0,[r1,r2]
      LDR r0,[r1,r2,LSL#2]
. This can be either added or subtracted from the base register:
      LDR r0,[r1,#-8]
      LDR r0,[r1,-r2]
      LDR r0,[r1,-r2,LSL#2]
. For halfword and signed halfword / byte, offset can be:
  . An unsigned 8 bit immediate value (i.e. 0-255 bytes).
  . A register (unshifted).
. Choice of pre-indexed or post-indexed addressing

Pre or Post Indexed Addressing?
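The difference between the indexing forms can be sketched with a toy Python model in which registers and memory are plain dictionaries. The function `store_word` and its mode names are hypothetical; the values reuse the r0 = 0x5, r1 = 0x200, offset 12 example from this slide.

```python
def store_word(memory, registers, rd, rn, offset, mode):
    """Model STR rd with base rn; mode is 'pre', 'pre_writeback' or 'post'."""
    base = registers[rn]
    if mode == "post":
        address = base               # post-indexed: access at the original base
    else:
        address = base + offset      # pre-indexed: offset applied before the access
    memory[address] = registers[rd]
    if mode in ("pre_writeback", "post"):
        registers[rn] = base + offset  # base register is updated
    return address


if __name__ == "__main__":
    for mode in ("pre", "pre_writeback", "post"):
        regs = {"r0": 0x5, "r1": 0x200}
        mem = {}
        where = store_word(mem, regs, "r0", "r1", 12, mode)
        print(mode, hex(where), hex(regs["r1"]))
```

Only the plain pre-indexed form leaves the base register untouched; the other two leave it pointing 12 bytes further on, which is convenient when stepping through an array.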

. Pre-indexed: STR r0,[r1,#12]

  [Figure: r0, the source register for STR, holds 0x5; the base register r1 holds 0x200; with offset 12 the value 0x5 is stored at 0x20c and r1 remains 0x200]

  Auto-update form: STR r0,[r1,#12]!

. Post-indexed: STR r0,[r1],#12

  [Figure: r0, the source register for STR, holds 0x5; the value 0x5 is stored at the original base address 0x200; with offset 12 the base register r1 is then updated to 0x20c]

LDM / STM operation

• Syntax: <LDM|STM>{<cond>}<addressing_mode> Rb{!}, <register list>
• 4 addressing modes:
  • LDMIA / STMIA increment after
  • LDMIB / STMIB increment before
  • LDMDA / STMDA decrement after
  • LDMDB / STMDB decrement before
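The four addressing modes differ only in which block of addresses they touch. The Python sketch below computes that block for a given base and register list; the function name is hypothetical, and it relies on the rule that the lowest-numbered register always goes to the lowest address.

```python
def block_transfer_addresses(mode, base, register_numbers):
    """Return ({register: address}, final_base) for LDM/STM in the given mode."""
    count = len(register_numbers)
    start = {
        "IA": base,                  # increment after
        "IB": base + 4,              # increment before
        "DA": base - 4 * count + 4,  # decrement after
        "DB": base - 4 * count,      # decrement before
    }[mode]
    # Lowest register number is transferred at the lowest address
    addresses = {
        reg: start + 4 * index
        for index, reg in enumerate(sorted(register_numbers))
    }
    if mode in ("IA", "IB"):
        final_base = base + 4 * count
    else:
        final_base = base - 4 * count
    return addresses, final_base  # final_base is written back only with '!'


if __name__ == "__main__":
    for mode in ("IA", "IB", "DA", "DB"):
        addresses, final_base = block_transfer_addresses(mode, 0x1000, [0, 1, 4])
        print(mode, {f"r{r}": hex(a) for r, a in addresses.items()}, hex(final_base))
```

With base 0x1000 and the list {r0,r1,r4}, the output can be compared against the example for r10 on this slide.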

Example: LDMxx r10, {r0,r1,r4} and STMxx r10, {r0,r1,r4}, with r10 as the base register (Rb) and addresses increasing upwards

            IA     IB     DA     DB
  r10+12           r4
  r10+8     r4     r1
  r10+4     r1     r0
  r10       r0            r4
  r10-4                   r1     r4
  r10-8                   r0     r1
  r10-12                         r0

Software Interrupt (SWI)

  31    28 27     24 23                                      0
  Cond     1 1 1 1    SWI number (ignored by processor)

  Cond: condition field

. Causes an exception trap to the SWI hardware vector
. The SWI handler can examine the SWI number to decide what operation has been requested.
. By using the SWI mechanism, an operating system can implement a set of privileged operations which applications running in user mode can request.
. Syntax:
  . SWI{<cond>} <SWI number>

PSR Transfer Instructions

  31 30 29 28 27 ... 24 23 ... 8   7 6 5 4 ... 0
  N  Z  C  V  Q     J   undefined  I F T mode

  Fields: f = bits [31:24], s = bits [23:16], x = bits [15:8], c = bits [7:0]

. MRS and MSR allow contents of CPSR / SPSR to be transferred to / from a general purpose register.
. Syntax:
  . MRS{<cond>} Rd,<psr> ; Rd = <psr>
  . MSR{<cond>} <psr[_fields]>,Rm ; <psr[_fields]> = Rm
. where
  . <psr> = CPSR or SPSR
  . [_fields] = any combination of ‘fsxc’
. Also an immediate form
  . MSR{<cond>} <psr_fields>,#Immediate
. In User Mode, all bits can be read but only the condition flags (_f) can be written.

ARM Branches and Subroutines

. B

                     func1                       func2
    :                STMFD sp!,{regs,lr}         :
    :                :                           :
    BL func1         BL func2                    :
    :                :                           :
    :                LDMFD sp!,{regs,pc}         MOV pc, lr

Thumb

• Thumb is a 16-bit instruction set
  • Optimised for code density from C code (~65% of ARM code size)
  • Improved performance from narrow memory
  • Subset of the functionality of the ARM instruction set
• Core has additional execution state - Thumb
  • Switch between ARM and Thumb using BX instruction

  32-bit ARM Instruction (bits 31-0):     ADDS r2,r2,#1
  16-bit Thumb Instruction (bits 15-0):   ADD r2,#1

For most instructions generated by compiler:
. Conditional execution is not used
. Source and destination registers identical
. Only Low registers used
. Constants are of limited size
. Inline barrel shifter not used

Atomic data swap

. Exchanges a word or byte between a register and a memory location

. This operation cannot be interrupted, not even by DMA

. Main use: Operating System semaphores

. Syntax:
  . SWP{<cond>} Rd, Rm, [Rn]
  . SWPB{<cond>} Rd, Rm, [Rn]

. Rd=[Rn]; [Rn]=Rm (Rd and Rm can be the same)

Exception / Interrupt Return

. How to restore CPSR from SPSR?
. Data processing instruction with S-bit set (update status) and PC as the destination register:
  . MOVS pc, lr
  . SUBS pc, lr, #4
. Load Multiple, restoring PC from a stack, and with the special qualifier ‘^’:
  . LDMFD sp!, {r0-r12, pc}^
. Different return for each exception/interrupt:

  SWI:              MOVS pc, lr
  UNDEF:            MOVS pc, lr
  FIQ:              SUBS pc, lr, #4
  IRQ:              SUBS pc, lr, #4
  Prefetch Abort:   SUBS pc, lr, #4
  Data Abort:       SUBS pc, lr, #8

Coprocessors

. Coprocessor instructions:
  . Coprocessor data operation: CDP
  . Coprocessor Load/Store: LDC, STC
  . Coprocessor register transfer: MRC, MCR
  . (some coprocessors, like P14 and P15, only support MRC and MCR)
. A 4-bit coprocessor number (Pxx) has to be specified in these instructions.
. Result in UNDEF exceptions if coprocessor is missing
. The most common coprocessors:
  . P15: System control (cache, MMU, …)
  . P14: Debug (Debug Communication Channel)
  . P1, P4, P10: Floating point (FPA, FPE, Maverick, VFP, …)
. The assembler can translate the floating-point mnemonics into coprocessor instructions.

Example ARM-based System

[Figure: example ARM-based system. Labels on the system bus (AHB or ASB): ARM core, TIC, arbiter, reset, remap/pause, on-chip RAM, decoder, external bus interface with external ROM (8 bit ROM) and external RAM (16 bit RAM, 32 bit RAM). A bridge leads to the peripheral bus (APB): timer, interrupt controller (nIRQ, nFIQ), I/O peripherals]

. AMBA
  . Advanced Microcontroller Bus Architecture
. ADK
  . Complete AMBA Design Kit
. ACT
  . AMBA Compliance Testbench
. PrimeCell
  . ARM’s AMBA compliant peripherals

The RealView Product Families

. Compilation Tools
  . ARM Developer Suite (ADS) – Compilers (C/C++ ARM & Thumb), Linker & Utilities
  . RealView Compilation Tools (RVCT)
. Debug Tools
  . AXD (part of ADS), Trace Debug Tools, Multi-ICE, Multi-Trace
  . RealView Debugger (RVD), RealView ICE (RVI), RealView Trace (RVT)
. Platforms
  . ARMulator (part of ADS), Integrator™ Family
  . RealView ARMulator ISS (RVISS)

ARM Debug Architecture

[Figure: a debugger (+ optional trace tools) connected over Ethernet to the JTAG port and the trace port of the target; on the chip, the TAP controller, EmbeddedICE logic and ETM surround the ARM core]

. EmbeddedICE Logic
  . Provides breakpoints and processor/system access
. JTAG interface (ICE)
  . Converts debugger commands to JTAG signals
. Embedded Trace Macrocell (ETM)
  . Compresses real-time instruction and data access trace
  . Contains ICE features (trigger & filter logic)
. Trace port analyzer (TPA)
  . Captures trace in a deep buffer

References

• https://havasi.sed.hu/sites/havasi.sed.hu/files/download/ARM.ppt
• https://www.ele.uva.es/~jesus/hardware_empotrado/ARM2.ppt

All of this material is based on the source files above, and all copyright belongs to their authors. I prepared this material for teaching purposes. Thanks to the authors for making this rich material available.