Introduction to the ARM Architecture

Max Mauro Dias Santos
Ponta Grossa, Paraná - Brazil
February 12, 2021

Agenda
• Introduction to ARM
• Architecture
• Programmers Model
• Instruction Set
• System Design
• Development Tools

ARM Partnership Model
(figure: Design, Manufacture, Market)

ARM Powered Products
(figures)

ARM7 applications (figure)
ARM9 applications (figure)
ARM11 applications (figure)

ARM Cortex-M applications
• Dell Latitude E4300 laptop
• The ARM hardware gives users instant access, at boot-up, to a select set of applications, with multi-day battery life

ARM Cortex-A applications (figure)
ARM Cortex-R applications (figure)

Agenda: Architecture

Intellectual Property
• ARM provides hard and soft views to licensees:
  - RTL and synthesis flows
  - GDSII layout
• Licensees have the right to use hard or soft views of the IP:
  - soft views include gate-level netlists
  - hard views are DSMs
• OEMs must use hard views, to protect ARM IP

Topologies
• Von Neumann: ARM7s and older
• Harvard: ARM9s and newer
• Memory-mapped I/O:
  - No specific instructions for I/O (use Load/Store instructions instead)
  - Peripherals' registers sit at some memory addresses
(figure: in the Von Neumann topology a single AHB bus connects the core to memory and I/O; in the Harvard topology separate instruction and data caches connect through a bus interface to the AHB bus, memory and I/O)

ARM7TDMI Block Diagram
(figure: address register and address incrementer driving A[31:0]; register bank with the PC bus; A, B and ALU buses; multiplier, barrel shifter and ALU; instruction register, Thumb-to-ARM translator and instruction decoder generating the control lines; write and read data registers on D[31:0])
• Load/store architecture
• A large array of uniform registers
• Fixed-length 32-bit instructions
• 3-address instructions

RISC Architecture
The Berkeley RISC processor used a Reduced Instruction Set Computer (RISC) architecture with the following key features:
• A fixed (32-bit) instruction size with few formats;
  - CISC processors typically had variable-length instruction sets with many formats.
• A load-store architecture, where instructions that process data operate only on registers and are separate from instructions that access memory;
  - CISC processors typically allowed values in memory to be used as operands in data processing instructions.
• A large register bank of thirty-two 32-bit registers, all of which could be used for any purpose, to allow the load-store architecture to operate efficiently;
  - CISC register sets were getting larger, but none was this large, and most had different registers for different purposes.

RISC Organization
• Hard-wired instruction decode logic
  - CISC processors used large microcode ROMs to decode their instructions
• Pipelined execution
  - CISC processors allowed little, if any, overlap between consecutive instructions (though they do now)
• Single-cycle execution
  - CISC processors typically took many clock cycles to complete a single instruction
→ Simple is beautiful; the compiler plays an important role.

ARM Architecture vs. Berkeley RISC
• Features used:
  - Load/store architecture
  - Fixed-length 32-bit instructions
  - 3-address instruction formats:
      function (f bits) | op 1 addr. (n bits) | op 2 addr. (n bits) | dest. addr. (n bits)
      e.g. ADD d, S1, S2 ; d := S1 + S2
• Features rejected:
  - Register windows: costly; ARM uses shadow (banked) registers instead
  - Delayed branches: interact badly with branch prediction
  - Single-cycle execution of all instructions: most ARM instructions execute in a single cycle, but many others take multiple clock cycles

ARM Features
• ARM differs from pure RISC in several ways:
  - Variable-cycle execution for certain instructions: multiple-register load/store (faster, higher code density)
  - Inline barrel shifter, leading to more complex instructions: improves performance and code density
  - Thumb 16-bit instruction set: 30% code density improvement
  - Conditional execution: improves performance and code density by reducing branches
  - Enhanced instructions: DSP instructions

Data Sizes and Instruction Sets
• The ARM is a 32-bit architecture. When used in relation to the ARM:
  - Byte means 8 bits
  - Halfword means 16 bits (two bytes)
  - Word means 32 bits (four bytes)
• Most ARMs implement two instruction sets:
  - 32-bit ARM Instruction Set
  - 16-bit Thumb Instruction Set

Data Types
• The ARM processor supports six data types:
  - 8-bit signed and unsigned bytes
  - 16-bit signed and unsigned halfwords, aligned on 2-byte boundaries
  - 32-bit signed and unsigned words, aligned on 4-byte boundaries
• ARM instructions are all 32-bit words, word-aligned
• Thumb instructions are halfwords, aligned on 2-byte boundaries

ARM Pipelining examples
• Fetch: read the opcode from memory into the internal Instruction Register
• Decode: activate the appropriate control lines depending on the opcode
• Execute: do the actual processing

ARM7TDMI Pipeline (each stage takes 1 clock cycle)
    FETCH | DECODE | EXECUTE (Reg. Read, Shift, ALU, Reg. Write)

ARM9TDMI Pipeline (each stage takes 1 clock cycle)
    FETCH | DECODE (Reg. Read) | EXECUTE (Shift, ALU) | MEMORY (access) | WRITE (Reg. Write)

ARM7TDMI Pipelining (I)
• Simple instructions (like ADD) complete at a rate of one per cycle:
    cycle:      1        2        3        4        5
    instr. 1    FETCH    DECODE   EXECUTE
    instr. 2             FETCH    DECODE   EXECUTE
    instr. 3                      FETCH    DECODE   EXECUTE

ARM7TDMI Pipelining (II)
• More complex instructions (pipeline stages of each instruction, in time order):
    1 ADD: FETCH, DECODE, EXECUTE
    2 STR: FETCH, DECODE, Calc. ADDR, Data Xfer.
    3 ADD: FETCH, stall, DECODE, EXECUTE
    4 ADD: FETCH, stall, DECODE, EXECUTE
    5 ADD: FETCH, DECODE, EXECUTE
• STR: 2 effective clock cycles (+1 cycle)

Arithmetic and Carry Flag
• Same as 6502, PowerPC (Borrow = NOT Carry)
• In contrast with Z80, Intel x86, m68k and many others (Borrow = Carry)
• ALU equivalent for arithmetic instructions: a 32-bit adder computes R from A and B (B is inverted for subtraction), with carry in Ci = 0 for ADD, Ci = 1 for SUB and Ci = C_flag for ADC and SBC; the carry out Co goes to C_flag
• Carry flag behavior for subtraction, SBC R, #0 (4-bit examples with A = 1010, so NOT #0 = 1111):
    C_flag = 0:  1010 + 1111 + 0 = 1 1001  →  R = 1001, Co = 1
    C_flag = 1:  1010 + 1111 + 1 = 1 1010  →  R = 1010, Co = 1
• Carry acts as an inverted borrow

Agenda: Programmers Model

Processor Modes
• The ARM has seven operating modes:
  - User: unprivileged mode under which most tasks run
  - FIQ: entered when a high-priority (fast) interrupt is raised
  - IRQ: entered when a low-priority (normal) interrupt is raised
  - SVC (Supervisor): entered on reset and when a Software Interrupt instruction is executed
  - Abort: used to handle memory access violations
  - Undef: used to handle undefined instructions
  - System: privileged mode using the same registers as User mode

The Registers
• ARM has 37 registers, all of which are 32 bits long:
  - 1 dedicated program counter
  - 1 dedicated current program status register
  - 5 dedicated saved program status registers
  - 30 general-purpose registers
• The current processor mode governs which of several banks is accessible. Each mode can access:
  - a particular set of r0-r12 registers
  - a particular r13 (the stack pointer, sp) and r14 (the link register, lr)
  - the program counter, r15 (pc)
  - the current program status register, cpsr
• Privileged modes (except System) can also access:
  - a particular spsr (saved program status register)

The ARM Register Set
(figure: for the User/System, FIQ, IRQ, SVC, Undef and Abort modes, the currently visible registers and the banked-out registers; FIQ mode has its own r8-r14, the other privileged modes have their own r13 (sp) and r14 (lr), all modes share r15 (pc) and cpsr, and each mode except User/System has its own spsr)

Special Registers
• Special function registers:
  - PC (R15): Program Counter. Any instruction with PC as its destination register is a program branch.
  - LR (R14): Link Register. Saves a copy of the PC (the return address) when executing the BL instruction (subroutine call) or when entering an exception or interrupt routine. It is copied back to PC on return from those routines.
  - SP (R13): Stack Pointer. The ARM architecture has no dedicated hardware stack; even so, R13 is usually reserved as the pointer for the program-managed stack.
  - CPSR: Current Program Status Register. Holds the visible status register.
  - SPSR: Saved Program Status Register. Holds a copy of the previous status register while executing exception or interrupt routines. It is copied back to CPSR on return from the exception or interrupt. No SPSR is available in User or System modes.

Register Organization Summary
(figure: User and System modes use r0-r15 and cpsr; FIQ mode shares r0-r7, r15 and cpsr with User mode and has its own r8-r14 and spsr; IRQ, SVC, Undef and Abort modes each share r0-r12, r15 and cpsr with User mode and have their own r13 (sp), r14 (lr) and spsr)
Note: System mode uses the User mode register set.

Program Status Registers
    bits:   31  30  29  28 | 27 .. 8   | 7  6  5 | 4 .. 0
    field:   N   Z   C   V | undefined | I  F  T | mode
    byte fields: f = [31:24], s = [23:16], x = [15:8], c = [7:0]
• Condition code flags:
  - N = Negative result from ALU
  - Z = Zero result from ALU
  - C = ALU operation Carried out
  - V = ALU operation oVerflowed
• Interrupt disable bits:
  - I = 1: disables IRQ
  - F = 1: disables FIQ
• T bit (architectures with Thumb mode only):
  - T = 0: processor in ARM state
  - T = 1: processor in Thumb state
  - Never change T directly (use BX instead); changing T in the CPSR will lead to unexpected behavior due to pipelining
• Mode bits [4:0]:
  - 10000 User
  - 10001 FIQ
  - 10010 IRQ
  - 10011 Supervisor
  - 10111 Abort
  - 11011 Undefined
  - 11111 System
• Tip: don't change undefined bits. This allows for code compatibility with newer ARM processors.

Program Counter (r15)
• When the processor is executing in ARM state:
  - All instructions are 32 bits wide
  - All instructions must be word aligned
  - Therefore the pc value is stored in bits [31:2], with bits [1:0] undefined (as instructions cannot be halfword or byte aligned)
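
The CPSR/SPSR layout on the Program Status Registers slide can be checked with a small decoder. This is a minimal Python sketch, not part of any ARM toolchain: the names `MODES`, `decode_psr` and `has_spsr` are hypothetical. It extracts the N, Z, C, V, I, F and T bits, maps bits [4:0] to the mode names listed on the slide, and ignores the undefined bits [27:8].

```python
# Mode encodings for CPSR/SPSR bits [4:0], as listed on the slide.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}


def decode_psr(psr):
    """Split a 32-bit CPSR/SPSR value into the fields shown on the slide.

    Bits [27:8] are undefined and are ignored here.
    """
    psr &= 0xFFFFFFFF          # model a 32-bit register
    mode_bits = psr & 0x1F     # bits [4:0]
    return {
        "N": (psr >> 31) & 1,  # negative result
        "Z": (psr >> 30) & 1,  # zero result
        "C": (psr >> 29) & 1,  # carry out
        "V": (psr >> 28) & 1,  # overflow
        "I": (psr >> 7) & 1,   # 1 = IRQ disabled
        "F": (psr >> 6) & 1,   # 1 = FIQ disabled
        "T": (psr >> 5) & 1,   # 1 = Thumb state, 0 = ARM state
        "mode": MODES.get(mode_bits, f"reserved ({mode_bits:05b})"),
    }


def has_spsr(mode):
    """User and System modes have no SPSR; the other listed modes do."""
    return mode in MODES.values() and mode not in ("User", "System")


if __name__ == "__main__":
    print(decode_psr(0x600000D3))
    # {'N': 0, 'Z': 1, 'C': 1, 'V': 0, 'I': 1, 'F': 1, 'T': 0, 'mode': 'Supervisor'}
```

The example value 0x600000D3 is an illustrative assumption: Z and C set, IRQ and FIQ disabled, ARM state, Supervisor mode. Encodings not listed on the slide are reported as reserved rather than guessed.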

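The 4-bit SBC examples on the Arithmetic and Carry Flag slide can be reproduced with a software model of the adder: subtraction is computed as A + NOT B + Ci, with Ci = 1 for SUB and Ci = C_flag for SBC, so the carry out is 1 exactly when no borrow occurs. This is a minimal Python sketch, not a description of ARM's hardware; the names `alu_add`, `sub` and `sbc` are hypothetical, and the width is fixed at 4 bits to match the slide (the real ALU is 32 bits wide).

```python
WIDTH = 4                  # matches the slide's 4-bit examples; the real ALU is 32 bits
MASK = (1 << WIDTH) - 1    # 0b1111


def alu_add(a, b, carry_in):
    """Adder core: return (result, carry_out) for a + b + carry_in."""
    total = (a & MASK) + (b & MASK) + carry_in
    return total & MASK, total >> WIDTH


def sub(a, b):
    """SUB: a + NOT b + 1, so carry_out = 1 means 'no borrow'."""
    return alu_add(a, ~b & MASK, 1)


def sbc(a, b, c_flag):
    """SBC: a + NOT b + C_flag, i.e. a - b - NOT(C_flag)."""
    return alu_add(a, ~b & MASK, c_flag)


if __name__ == "__main__":
    # The two SBC R, #0 examples with A = 1010:
    for c in (0, 1):
        r, co = sbc(0b1010, 0b0000, c)
        print(f"C_flag={c}: R={r:04b}, Co={co}")
    # C_flag=0: R=1001, Co=1
    # C_flag=1: R=1010, Co=1
```

A borrow shows up as Co = 0: `sub(0b0011, 0b0101)` returns `(0b1110, 0)`, so 3 - 5 wraps to 1110 and clears the carry, whereas on x86 or the Z80 the same subtraction would set it.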