The ARM Architecture
Total Page:16
File Type:pdf, Size:1020Kb
The ARM Architecture 1 Agenda . Introduction to ARM Ltd ARM Architecture/Programmers Model Data Path and Pipelines SoC Design Development Tools 2 ARM Ltd . Founded in November 1990 . Spun out of Acorn Computers . Initial funding from Apple, Acorn and VLSI . Designs the ARM range of RISC processor cores . Licenses ARM core designs to semiconductor partners who fabricate and sell to their customers . ARM does not fabricate silicon itself . Also develop technologies to assist with the design- in of the ARM architecture . Software tools, boards, debug hardware . Application software . Bus architectures . Peripherals, etc 3 ARM’s Activities Connected Community Development Tools Software IP Processors mmeemmoorryy System Level IP: Data Engines SSooCC Fabric 3D Graphics Physical IP 4 ARM Connected Community – 700+ 5 5 Huge Range of Applications IR Fire Detector Exercise Utility Intelligent Machines Energy Efficient Appliances Intelligent toys Meters Vending Tele-parking Equipment Adopting 32-bit ARM Microcontrollers 6 World’s Smallest ARM Computer? Battery Solar Cells Wireless Sensor Network Sensors, timers Cortex-M0 +16KB RAM 65nm UWB Radio antenna 10 kB Storage memory ~3fW/bit 12µAh Li-ion Battery A B C Processor, SRAM and PMU Wirelessly networked into large scale sensor arrays Cortex-M0; 65¢ University of Michigan 7 World’s Largest ARM Computer? 4200 ARM powered Neutrino Detectors 70 bore holes 2.5km deep 60 detectors per string starting 1.5km down 1km3 of active telescope Work supported by the National Science Foundation and University of Wisconsin-Madison 8 From 1mm3 to 1km3 1mm3 1km3 10¢ $1000 Mobile Home Mobile Computing Server Embedded Consumer Enterprise PC HPC 9 Agenda Introduction to ARM Ltd . ARM Architecture/Programmers Model Data Path and Pipelines SoC Design Development Tools 10 ARM Cortex Processors (v7) . ARM Cortex-A family (v7-A): . Applications processors for full OS and 3rd party applications x1-4 Cortex-A15 ...2.5GHz x1-4 Cortex-A9 . ARM Cortex-R family (v7-R): Cortex-A8 x1-4 . Embedded processors for real-time Cortex-A5 signal processing, control applications 1-2 R Heron Cortex-R4 . ARM Cortex-M family (v7-M): Cortex-M4 SC300™ . Microcontroller-oriented processors Cortex™-M3 for MCU and SoC applications Cortex-M1 Cortex-M0 12k gates... 11 Relative Performance* 2500 2000 ) z h M ( y 1500 c n e u q e 1000 r F x a M 500 0 Cortex- Cortex- Cortex-A9 ARM7 ARM926 ARM1026 ARM1136 ARM1176 Cortex-A8 M0 M3 Dual-core Max Freq (MHz) 50 150 184 470 540 610 750 1100 2000 Min Power (mW/MHz) 0.012 0.06 0.35 0.235 0.36 0.335 0.568 0.43 0.5 *Represents attainable speeds in 130, 90, 65, or 45nm processes 12 Cortex family Cortex-A8 Cortex-R4 Cortex-M3 . Architecture v7A . Architecture v7R . Architecture v7M . MMU . MPU (optional) . MPU (optional) . AXI . AXI . AHB Lite & APB . VFP & NEON support . Dual Issue 13 Cortex-M0 DesignStart ARM Cortex-M4 “32-bit/DSC” applications Efficient digital signal control ARM Cortex-M3 “16/32-bit” applications Performance efficiency ARM Cortex-M0 “8/16-bit” applications Low-cost & simplicity 14 Cortex-M0 DesignStart (2) ARM Cortex-M0 processor Full product “M0_DS” features options implementation Zero jitter 32-bit RISC core AMBA AHB-lite interface M i n ARMv6-M instruction set architecture i m u NVIC Interrupt controller m u s Interrupt line configurations 1 to 32 16 only a b l Debug (SWD, JTAG) option e Up to 4 breakpoints, 2 watchpoints Low power optimisations (ACG) Multiple power domain support with WIC Fast multiplier (1 cycle) option System timer Area (gates) 12k – 25k 16K 15 ARM and Thumb Performance 30000 25000 20000 Dhrystone 2.1/sec @ 20MHz 15000 ARM Thumb 10000 5000 0 32-bit 16-bit 16-bit with 32-bit stack Memory width (zero wait state) 16 The Thumb-2 instruction set . Variable-length instructions . ARM instructions are a fixed length of 32 bits . Thumb instructions are a fixed length of 16 bits . Thumb-2 instructions can be either 16-bit or 32-bit . Thumb-2 gives approximately 26% improvement in code density over ARM . Thumb-2 gives approximately 25% improvement in performance over Thumb 17 Processor Modes . The ARM has seven basic operating modes: . User : unprivileged mode under which most tasks run . FIQ : entered when a high priority (fast) interrupt is raised . IRQ : entered when a low priority (normal) interrupt is raised . Supervisor : entered on reset and when a Software Interrupt instruction is executed . Abort : used to handle memory access violations . Undef : used to handle undefined instructions . System : privileged mode using the same registers as user mode 18 The ARM Register Set Current Visible Registers r0 IRQFIQUndefUserSVCAbort Mode Mode Mode Mode Mode Mode r1 r2 r3 Banked out Registers r4 r5 r6 User FIQ IRQ SVC Undef Abort r7 r8 r8 r8 r9 r9 r9 r10 r10 r10 r11 r11 r11 r12 r12 r12 r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r15 (pc) cpsr spsr spsr spsr spsr spsr spsr 19 Exception Handling . When an exception occurs, the ARM: . Copies CPSR into SPSR_<mode> . Sets appropriate CPSR bits . Change to ARM state 0x1C FIQ 0x18 IRQ . Change to exception mode 0x14 (Reserved) . Disable interrupts (if appropriate) 0x10 Data Abort . Stores the return address in LR_<mode> 0x0C Prefetch Abort . Sets PC to vector address 0x08 Software Interrupt 0x04 Undefined Instruction . To return, exception handler needs to:0x00 Reset . Restore CPSR from SPSR_<mode> Vector Table . Restore PC from LR_<mode> Vector table can be at 0xFFFF0000 on ARM720T This can only be done in ARM state. and on ARM9/10 family devices 20 Cortex-M3 Programmer’s Model Main r0 . Fully programmable in C r1 Stack-based exception model r2 . r3 Only two processor modes r4 . r5 . Thread Mode for User tasks r6 r7 . Handler Mode for OS tasks and exceptions r8 Vector table contains addresses r9 . r10 r11 r12 Process sp sp lr r15 (pc) xPSR 21 Conditional Execution and Flags . ARM instructions can be made to execute conditionally by postfixing them with the appropriate condition code field. This improves code density and performance by reducing the number of forward branch instructions. CMP r3,#0 CMP r3,#0 BEQ skip ADDNE r0,r1,r2 ADD r0,r1,r2 skip . By default, data processing instructions do not affect the condition code flags but the flags can be optionally set by using “S”. CMP does not need “S”. loop … SUBS r1,r1,#1 decrement r1 and set flags BNE loop if Z flag clear then branch 22 Data processing Instructions . Consist of : . Arithmetic: ADD ADC SUB SBC RSB RSC . Logical: AND ORR EOR BIC . Comparisons: CMP CMN TST TEQ . Data movement: MOV MVN . These instructions only work on registers, NOT memory. Syntax: <Operation>{<cond>}{S} Rd, Rn, Operand2 . Comparisons set flags only - they do not specify Rd . Data movement does not specify Rn . Second operand is sent to the ALU via barrel shifter. 23 Using a Barrel Shifter:The 2nd Operand Operand Operand Register, optionally with shift operation 1 2 . Shift value can be either be: . 5 bit unsigned integer . Specified in bottom byte of Barrel another register. Shifter . Used for multiplication by constant Immediate value . 8 bit number, with a range of 0-255. Rotated right through even ALU . number of positions . Allows increased range of 32-bit constants to be loaded directly into Result registers 24 Agenda Introduction to ARM Ltd ARM Architecture/Programmers Model . Data Path and Pipelines SoC Design Development Tools 25 The ARM7TDM Core ABE A[31:0] Address Incrementer Address Register Incrementer P C BIGEND MCLK nWAIT Register Bank PC Update nRW Instruction A MAS[1:0] Decode Stage Decoder L A B ISYNC Instruction nIRQ U Decompression nFIQ Multiplier nRESET B B and ABORT nTRANS B u u Read Data nMREQ u s Barrel s Register SEQ Control LOCK s Shifter Logic nM[4:0] Write Data nOPC Register nCPI 32 Bit ALU CPA CPB DBE D[31:0] 26 Cortex-M3 Datapath I_HRDATA Instruction Decode Write Data D_HWDATA Address Register Incrementer D_HRDATA D_HADDR Read Data Address Register Register B Address Register Barrel Incrementer Mul/Div Bank Shifter I_HADDR ALU A ALU Address Register Writeback INTADDR 27 Pipeline changes for ARM9TDMI ARM7TDMI ARM decode Instruction Reg Reg ThumbARM Shift ALU Fetch decompress Read Write Reg Select FETCH DECODE EXECUTE ARM9TDMI ARM or Thumb Inst Decode Reg Instruction Shift + ALU Memory Fetch Reg Reg Access Write Decode Read FETCH DECODE EXECUTE MEMORY WRITE 28 Cortex-M3 Pipeline . Cortex-M3 has 3-stage fetch-decode-execute pipeline . Similar to ARM7 . Cortex-M3 does more in each stage to increase overall performance 1st Stage - Fetch 2nd Stage - Decode 3rd Stage - Execute Address Data Phase AGU Phase & Write Load/Store & Back Branch Instruction Fetch Decode & Write (Prefetch) Multiply & Divide Register Read Branch Shift ALU & Branch Branch forwarding & speculation Execute stage branch (ALU branch & Load Store Branch) 29 ARM10 vs. ARM11 Pipelines ARM10 Branch ARM or Memory Prediction Shift + ALU Thumb Reg Read Access Reg Instruction Write Instruction Multiply Decode Multiply Fetch Add FETCH ISSUE DECODE EXECUTE MEMORY WRITE ARM11 Shift ALU Saturate Fetch Fetch MAC MAC MAC Write Decode Issue 1 2 1 2 3 back Data Data Address Cache Cache 1 2 30 NEON register file Stage NEON Pipeline - 10 Architectural register file A8 Pipeline Diagram - Stage Integer Pipeline - 13 31 Full Cortex Agenda Introduction to ARM Ltd ARM Architecture/Programmers Model Data Path and Pipelines . SoC Design Development Tools 32 An Example AMBA System High Performance APB ARM processor UART High Bandwidth AHB Timer APB External Bridge Memory Keypad Interface High-bandwidth DMA PIO on-chip RAM Bus Master Low Power Non-pipelined High Performance Simple Interface Pipelined Burst Support Multiple Bus Masters 33 AHB Structure Arbiter HADDR HADDR HWDATA Slave Master HWDATA #1 #1 HRDATA HRDATA Address/Control Slave #2 Master #2 Write Data Slave Read Data #3 Master #3 Slave #4 Decoder 34 AHB basic signal timing Address Phase Data Phase A Data Phase B A Address Phase B HCLK HADDR A B C HWRITE A B C HWDATA A B HRDATA A B HRESP OKAY A OKAY B HREADY 35 Mali200 + GP2 SoC Integration .