Computer Architecture 10

ARM Processors

Made with OpenOffice.org

ARM

ARM – Advanced RISC Machines Ltd. (Cambridge, England)
  earlier Acorn RISC Machine (1983)
  founded 1990 (Acorn, Apple, VLSI)

History

Development of MOS 6502 microprocessor
  MOS Technology Company (Commodore Semiconductor Group)
  Roger Wilson & Steve Furber
MOS 6502 – 8-bit processor:
  3 x 8-bit registers
  1 MHz clock
  No dedicated I/O instructions
  8-bit stack

Early Versions (Acorn)

ARM1 – 1985
ARM2 – 1986-87
  32-bit data bus
  26-bit address bus
  16 x 32-bit registers
  only 30,000 transistors
  Low power consumption & better performance than 80286
  1987: Archimedes Computer
    ● World's first commercial RISC microcomputer
    ● Acorn Computer Group
    ● Intended for schools & educational use

First Successes

ARM3 – 1989
  Cache 4 kB – great performance boost
ARM6 – 1990-91
  Apple: ARM6 (version ARM610)
  Used in Apple Newton palmtop (PDA)
  35,000 transistors

ARM I Line

1995: ARM cooperates with DEC (Digital Equipment Corporation)
  StrongARM
    ● Not fully compatible with ARM line, but greater performance
    ● Applications in PDAs & terminals
    ● SA-100, SA-110 and SA-1110
1997: ARM section of DEC is sold to Intel
  XScale (2000), successor of SA-1110
  Replacement of Intel RISC i860 & i960 arch.

ARM II Line

1993 – ARM7
  24-150 mW/MHz - 0.8-1.0 MIPS/MHz first
1995 - ARM9
  ARM9 – Harvard Architecture (at cache level)
1998 - ARM10
2001 – ARM11
2005 – Cortex

ARM Ltd. Today

Holds & sells licenses for ARM cores
  ARM Ltd. has no silicon manufacturing facilities and does not produce any microprocessors itself
Design of ARM development tools
  Software tools, prototype boards
  Solutions for bus & peripheral architectures

Intellectual Property (IP)

ARM licenses
  hard views:
    ● For OEMs
    ● DSM (Design For Manufacture)
    ● RTL/GDSII description
  soft views:
    ● Gate level netlists
    ● Ready for synthesis

ARM - Basic Features

RISC Architecture:
  Simple & fast instructions
  Few & simple addressing modes
  Reduction of memory accesses – increased number of internal (general purpose) registers
  Simplified datapath control – pipelining and further superscalar processing
  Microprocessor without Interlocked Pipelined Stages (MIPS)

ARM - Basic Features

32-bit architecture, possible operations on:
    ● Byte - 8 bits, Halfword - 16 bits, Word - 32 bits
Most ARM processors can execute two instruction sets:
  32-bit ARM Instruction Set
  16-bit Thumb Instruction Set
Jazelle – ARM cores with support for direct Java bytecode execution
  Jazelle Java Machine
    ● 140 Java instructions are executed directly in hardware, the remaining 94 are emulated with multiple ARM instructions

ARM - Basic Features

Dedicated for portable devices
  Low power consumption (MIPS/Watt ratio)
Core Extensions: Thumb, DSP, Jazelle, etc.
I/O: IP-blocks
  UART
  GPIO
  MMU
  and lots of others

ARM Terminology

Naming chaos?
Core version vs Core architecture
Core and peripherals
Company naming preferences

Core Versions

ARMv1
ARMv2
ARMv3
ARMv4 – SA-110, SA-1110, ARM7xx, ARM9xx
ARMv5 – ARM9xxE, ARM10xx, XScale
ARMv6 – ARM11xx
ARMv7 – Cortex

Core Architectures

T - Thumb instruction set
D - Debug interface (JTAG)
M - Multiplier (hardware, long multiply)
E – DSP support
I - EmbeddedICE macrocell (ICEBreaker) for in-circuit debugging
J – Jazelle
Example: ARM7xxTDMI

Instruction Set Evolution

Diagram: features added by each architecture version, with example cores
  1, 2, 3 – Early ARMs
  4 – Halfword and signed halfword/byte support, System mode (SA-110, SA-1110)
  4T – Thumb instruction set (ARM7TDMI, ARM9TDMI, ARM720T, ARM940T)
  5TE – Improved ARM/Thumb Interworking, saturated math ops., DSP multiply-accumulate (ARM1020E, XScale, ARM9E-S, ARM966E-S)
  5TEJ – Jazelle (Java bytecode execution) (ARM9EJ-S, ARM926EJ-S, ARM7EJ-S, ARM1026EJ-S)
  6 – SIMD instructions, multi-processing, V6 memory architecture (VMSA), support for misaligned data (ARM1136EJ-S)

Core Specific Features

● v3: 32-bit addressing & architecture variants:
    – T – Thumb state: 16-bit instruction set execution
    – M – long multiply support (32 x 32 => 64 or 32 x 32 + 64 => 64) (standard feature in all following architecture generations)
● v4 new functions: halfword load & store
● v5: improved interworking between ARM & Thumb, CLZ instruction (count leading zeros) and new architecture variants:
    – E – enhanced DSP – saturated math (in contrast to modulo arithmetic) & 16-bit multiplications
    – J – support for Java bytecode execution
● v6 – multiprocessing, advanced memory management, multimedia instructions, enhanced exceptions and interrupts
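The contrast between saturated math (E variant) and modulo arithmetic can be sketched in Python. This is only an illustration on signed 32-bit values; the helper names wrap_add32 and sat_add32 are made up for this sketch and are not ARM mnemonics.

```python
INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def wrap_add32(a, b):
    """Modulo (wrap-around) addition of two signed 32-bit values."""
    s = (a + b) & 0xFFFFFFFF
    # Reinterpret the 32-bit pattern as a signed number.
    return s - 2**32 if s & 0x80000000 else s

def sat_add32(a, b):
    """Saturating addition: the result is clamped to the signed 32-bit range."""
    return max(INT32_MIN, min(INT32_MAX, a + b))
```

Adding 1 to the largest positive value wraps around to the most negative value with wrap_add32, while sat_add32 stays at the maximum. For audio or video samples the clamped result is usually the less damaging one.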

ARMv6 – Performance

Enhanced media-processing
    ● 2x faster MPEG4 coding/decoding
    ● 2x faster audio DSP processing
Advanced cache architecture
    ● Physically addressed cache
    ● Improvements of cache flush/refill modes
    ● Faster context switching
Advanced exceptions and interrupts handling:
    ● Significant speed improvement for real-time apps.
Support for processing of misaligned and mixed-endian data formats
    ● Simpler data sharing
    ● Efficient memory usage
    ● Easier porting of applications

Examples

Cortex Family (v7)

ARM Cortex-A Series – for bigger OSes and applications, supports ARM & Thumb-2 instruction sets
ARM Cortex-R Series – for real-time, embedded applications, supports ARM & Thumb-2 instruction sets
ARM Cortex-M Series – for simple, "deeply embedded", cost-optimised applications, supports only the Thumb-2 instruction set

V5-V6-V7 Comparison

Thumb-2 Technology

E – DSP Enhancements

Single-cycle 16x16 and 32x16 MAC units
CLZ (count leading zeros)
  support for number normalization, multiplication speedup
Adaptive Multi Rate (AMR)
  (for GSM, UMTS, WCDMA)
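How CLZ supports number normalization can be sketched as follows. The functions clz32 and normalize32 are hypothetical helpers written for this note; they model the behaviour on 32-bit unsigned values and assume that the count for zero is 32.

```python
def clz32(x):
    """Count leading zeros of a 32-bit value (32 when x is 0)."""
    x &= 0xFFFFFFFF
    return 32 - x.bit_length()

def normalize32(x):
    """Shift x left until bit 31 is set; return (normalized value, shift count)."""
    shift = clz32(x)
    return (x << shift) & 0xFFFFFFFF, shift
```

With a hardware CLZ the shift count is available in one instruction, so normalization needs no bit-by-bit loop.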

Made wi th OpenOffi ce.org 23 E – DSP Enhancements

Made wi th OpenOffi ce.org 24 NEON - Advanced SIMD

Embedded SIMD processing ● Fixed and floating-point (single prec.) arithmetics 8-, 16-, 32- & 64-bit data types ● (un)signed; float32; poly{8,16} Parallel DSP operations ● fast fingerprint recognition ● real-time hand-writing recognition ● real-time FFT, MPEG4 code/decode Registers shared with VFP units ● Used as registers 32 x 64-bit lub 16 x 128-bit Great performance improvement compared to ARMv6 SIMD instructions

Made wi th OpenOffi ce.org 25 VFPv3 – Vector Floating Point

BasedBased onon VFPv2VFPv2 FP registers: 32 (from previous 16) Type conversion instructions: Fixed to Float ● Integer: 16- or 32-bit ● Signed & Unsigned conversions ● FP: single or double Instructions with floatint-point constants PossiblePossible operationoperation withoutwithout exceptionexception callingcalling forfor time-criticaltime-critical applicationsapplications

Made wi th OpenOffi ce.org 26 ARM7 and ARM9 Pipelines

ARM7TDMIARM7TDMI corecore andand ARM7TDMI-SARM7TDMI-S

ARM9TDMIARM9TDMI

ARM9E-SARM9E-S

Made wi th OpenOffi ce.org 27 ARM11 Pipeline

Made wi th OpenOffi ce.org 28 Cortex R4 Pipeline

Made wi th OpenOffi ce.org 29 Programming Model

3737 (32-bit)(32-bit) registers,registers, 1616 visiblevisible (R0-R15),(R0-R15), otherother bankedbanked forfor exceptionexception processingprocessing SpecialSpecial registersregisters R13 - Stack Pointer (SP) R14 – Link Register (LR) R15 – Program Counter (PC) CPSR (current program status register) SPSR (saved program status register) AccessAccess toto registersregisters dependsdepends onon processorprocessor operationoperation modemode (banked(banked registers)registers)

Made wi th OpenOffi ce.org 30 Registers and Operation Mode

System & User FIQ IRQ SVC Undef Abort

r0 r1 User r2 mode r3 r4 r0-r7, User User User User r15, r5 mode mode mode mode cpsr r6 r0-r12, r0-r12, r0-r12, r0-r12, r7 r15, r15, r15, r15, r8 r8 cpsr cpsr cpsr cpsr r9 r9 r10 r10 r11 r11 r12 r12 r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r13 (sp) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r14 (lr) r15 (pc)

cpsr spsr spsr spsr spsr spsr
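The banking scheme can be written down as a small Python model, which also confirms the total of 37 registers. The mode abbreviations and the naming convention "r13_irq" are assumptions made for this sketch, not part of the slides.

```python
# Registers visible in User/System mode: r0-r15 plus cpsr.
USER_VIEW = [f"r{i}" for i in range(16)] + ["cpsr"]

# Registers that each exception mode replaces with its own copy.
BANKED = {
    "fiq": ["r8", "r9", "r10", "r11", "r12", "r13", "r14", "spsr"],
    "irq": ["r13", "r14", "spsr"],
    "svc": ["r13", "r14", "spsr"],
    "und": ["r13", "r14", "spsr"],
    "abt": ["r13", "r14", "spsr"],
}

def physical_register(mode, name):
    """Map a register name used in the given mode to the physical register."""
    if name in BANKED.get(mode, []):
        return f"{name}_{mode}"
    if name == "spsr":
        raise ValueError("User and System modes have no SPSR")
    return name

TOTAL_REGISTERS = len(USER_VIEW) + sum(len(regs) for regs in BANKED.values())
```

The sum is 17 + 8 + 4 x 3 = 37. An FIQ handler gets private copies of r8-r14, so it can often run without saving any registers first.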

Made wi th OpenOffi ce.org 31 Status Register

Condition code flags I&F - Interrupt masks N = Negative I = 1: IRQ not allowed Z = Zero F = 1: FIQ not allowed C = Carry T – Instruction Set V = Overflow Only xT J bit T = 0: ARM ISA Only 5TEJ T = 1: Thumb ISA J = 1: Jazelle ISA M4-0 - Operation Mode
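A status register value can be decoded as sketched below. The bit positions (N, Z, C, V in bits 31-28, J in bit 24, I, F, T in bits 7-5, mode in bits 4-0) and the mode encodings are taken from the ARM architecture and are not shown on the slide; decode_cpsr is a hypothetical helper name.

```python
MODE_NAMES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "Supervisor",
    0b10111: "Abort",
    0b11011: "Undefined",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into its flags, masks and mode name."""
    return {
        "N": (cpsr >> 31) & 1,
        "Z": (cpsr >> 30) & 1,
        "C": (cpsr >> 29) & 1,
        "V": (cpsr >> 28) & 1,
        "J": (cpsr >> 24) & 1,
        "I": (cpsr >> 7) & 1,
        "F": (cpsr >> 6) & 1,
        "T": (cpsr >> 5) & 1,
        "mode": MODE_NAMES.get(cpsr & 0x1F, "reserved"),
    }
```

For example, decode_cpsr(0x600000D3) reports Z and C set, both interrupts masked, ARM state and Supervisor mode.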

Made wi th OpenOffi ce.org 32 Operating Modes

SevenSeven operatingoperating modes:modes: User Privileged: ● System (version 4 and above) ● FIQ ● IRQ ● Abort exception modes ● Undefined ● Supervisor

Made wi th OpenOffi ce.org 33 Operating Modes

UserUser mode:mode: Normal program execution mode System resources unavailable Mode changed by exception only ExceptionException modes:modes: Entered upon exception Full access to system resources Mode changed freely

Made wi th OpenOffi ce.org 34 Operating Modes

User – for most applications System – privileged mode with access to user mode registers FIQ – fast (high priority) interrupt IRQ – normal interrupt Abort – for handling memory access errors Undefined – for handling undefined exceptions Supervisor – after hardware reset or software interrupt instruction

Made wi th OpenOffi ce.org 35 Interrupt Vector Table

Exception Mode Priority Int. Vec. Address Reset Supervisor 1 0x00000000 Undefined instruction Undefined 6 0x00000004 Software interrupt Supervisor 6 0x00000008 Prefetch Abort Abort 5 0x0000000C Data Abort Abort 2 0x00000010 Interrupt IRQ 4 0x00000018 Fast interrupt FIQ 3 0x0000001C

FIQ subroutine code starts immediately at 1C address (no jump to subroutine)
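The table can be used directly to decide which of several pending exceptions is taken first. The sketch below copies the values from the table; next_exception is a hypothetical helper, and the tie between the two priority-6 entries is resolved by input order only because those two exceptions cannot occur together.

```python
# name: (mode entered, priority (1 = highest), vector address)
EXCEPTIONS = {
    "Reset": ("Supervisor", 1, 0x00),
    "Undefined instruction": ("Undefined", 6, 0x04),
    "Software interrupt": ("Supervisor", 6, 0x08),
    "Prefetch Abort": ("Abort", 5, 0x0C),
    "Data Abort": ("Abort", 2, 0x10),
    "Interrupt": ("IRQ", 4, 0x18),
    "Fast interrupt": ("FIQ", 3, 0x1C),
}

def next_exception(pending):
    """Return (name, mode, vector) of the highest-priority pending exception."""
    name = min(pending, key=lambda n: EXCEPTIONS[n][1])
    mode, _priority, vector = EXCEPTIONS[name]
    return name, mode, vector
```

If an IRQ and an FIQ arrive together, the FIQ wins and execution continues at 0x1C in FIQ mode.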

Made wi th OpenOffi ce.org 36 Instruction Set

Made wi th OpenOffi ce.org 37 Instruction Set

FullyFully 32-bit32-bit instructioninstruction setset inin nativenative operatingoperating mode,mode, 32-bit32-bit longlong instructioninstruction wordswords AllAll instructionsinstructions areare conditionalconditional (!)(!) In normal instruction execution (unconditional) condition field contents of AL is used (Always) InIn conditionalconditional operationsoperations oneone ofof thethe 1414 availableavailable conditionsconditions cancan bebe selectedselected
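Conditional execution can be modelled as a check of the condition field against the N, Z, C, V flags. The field position (bits 31-28) and the encoding order of the condition names come from the ARM instruction set; the function names are made up for this sketch.

```python
# Condition names in encoding order 0b0000 .. 0b1110 (14 conditions plus AL).
COND_NAMES = ["EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL"]

def condition_of(instruction):
    """Return the condition name stored in bits 31-28 of an ARM instruction."""
    field = (instruction >> 28) & 0xF
    return COND_NAMES[field] if field < 15 else "reserved"

def condition_passed(cond, n, z, c, v):
    """Decide whether an instruction with condition `cond` executes."""
    table = {
        "EQ": z, "NE": not z,
        "CS": c, "CC": not c,
        "MI": n, "PL": not n,
        "VS": v, "VC": not v,
        "HI": c and not z, "LS": (not c) or z,
        "GE": n == v, "LT": n != v,
        "GT": (not z) and n == v, "LE": z or n != v,
        "AL": True,
    }
    return bool(table[cond])
```

An instruction whose condition fails behaves like a no-op, which lets short if/else bodies run without branches.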

Made wi th OpenOffi ce.org 38 Conditions

Made wi th OpenOffi ce.org 39 Branching

BB Branch with 24-bit signed offset BLBL Branch with link (24-bit signed offset, PC → R14) BXBX Branch and eXchange (branch with instruction set exchange (ARM ↔ Thumb) BXJBXJ (if(if JazelleJazelle extensionextension available)available) Branch and enter Java bytecode interpretation
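The 24-bit signed offset of B and BL counts words, not bytes, and is relative to the instruction address plus 8 (the PC runs two instructions ahead because of the pipeline). A minimal sketch of the target calculation, with hypothetical helper names:

```python
def is_branch(instruction):
    """True for B/BL encodings (bits 27-25 are 101)."""
    return ((instruction >> 25) & 0b111) == 0b101

def is_branch_with_link(instruction):
    """True when the L bit (bit 24) of a B/BL encoding is set."""
    return is_branch(instruction) and bool((instruction >> 24) & 1)

def branch_target(instruction, address):
    """Compute the target address of a B/BL instruction located at `address`."""
    imm24 = instruction & 0x00FFFFFF
    if imm24 & 0x00800000:          # sign-extend the 24-bit offset
        imm24 -= 1 << 24
    return (address + 8 + (imm24 << 2)) & 0xFFFFFFFF
```

The word 0xEAFFFFFE is the classic "branch to itself" loop: its offset is -2 words, which cancels the +8. The word offset gives a reach of about ±32 MB.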

Made wi th OpenOffi ce.org 40 Data Processing

Made wi th OpenOffi ce.org 41 Data Processing Details

Made wi th OpenOffi ce.org 42 Multiplication

MUL,MUL, MLAMLA Multiply and Multiply-Accumulate MULL,MLALMULL,MLAL Multiply Long and Multiply-Accumulate Long
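The difference between the 32-bit and the long multiplies is which part of the product is kept. The sketch below models the unsigned variants only; the Python function names mirror the mnemonics but the argument order and the (low word, high word) return convention are choices made for this illustration.

```python
MASK32 = 0xFFFFFFFF

def mul(rm, rs):
    """MUL: keep only the low 32 bits of the product."""
    return (rm * rs) & MASK32

def mla(rm, rs, rn):
    """MLA: multiply, add the accumulator, keep the low 32 bits."""
    return (rm * rs + rn) & MASK32

def umull(rm, rs):
    """UMULL: full 64-bit unsigned product as (low word, high word)."""
    product = (rm & MASK32) * (rs & MASK32)
    return product & MASK32, (product >> 32) & MASK32

def umlal(rdlo, rdhi, rm, rs):
    """UMLAL: add the 64-bit product to the 64-bit accumulator rdhi:rdlo."""
    acc = ((rdhi << 32) | rdlo) + (rm & MASK32) * (rs & MASK32)
    return acc & MASK32, (acc >> 32) & MASK32
```

For 0x10000 x 0x10000, mul returns 0 because the result does not fit in 32 bits, while umull returns the pair (0, 1).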

Made wi th OpenOffi ce.org 43 Data Transfer

SingleSingle DataData TransferTransfer LDR, STR HalfwordHalfword andand SignedSigned DataData TransferTransfer LDRH, STRH, LDRSB, LDRSH BlockBlock DataData TransferTransfer LDM, STM SingleSingle DataData SwapSwap SWP

Made wi th OpenOffi ce.org 44 Other Instructions

SoftwareSoftware InterruptInterrupt (SWI)(SWI) CoprocessorCoprocessor DataData OperationsOperations (CDP)(CDP) CoprocessorCoprocessor DataData TransfersTransfers (LDC,(LDC, STC)STC) CoprocessorCoprocessor RegisterRegister TransfersTransfers (MRC,(MRC, MCR)MCR) UndefinedUndefined InstructionInstruction Cond011xxxxxxxxxxxxxxxxxxxx1xxxx
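The undefined-instruction bit pattern above (bits 27-25 equal to 011 and bit 4 set) can be tested with two masks. This sketch checks only the pattern as printed on the slide; later architecture versions reuse parts of this encoding space, so it is not a complete decoder. The function name is hypothetical.

```python
def matches_undefined_pattern(instruction):
    """True when bits 27-25 are 011 and bit 4 is 1 (condition field ignored)."""
    bits_27_25 = (instruction >> 25) & 0b111
    bit_4 = (instruction >> 4) & 1
    return bits_27_25 == 0b011 and bit_4 == 1
```

A register-offset load such as 0xE7900001 shares bits 27-25 but has bit 4 clear, so it does not match.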

Made wi th OpenOffi ce.org 45 Assembler Syntax

MULMUL R1,R2,R3R1,R2,R3 R1:=R2*R3 MLAEQSMLAEQS R1,R1, R2,R2, R3,R3, R4R4 Conditionally R1:=R2*R3+R4 & setting condition codes e.g.e.g. MLAEQSMLAEQS MLA – core mnemonic EQ – condition S – condition codes set bit
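The three-part structure of a mnemonic can be shown with a toy parser. It knows only a handful of core mnemonics (an assumption made to keep the sketch short), and split_mnemonic is a hypothetical name; a real assembler has to deal with many more cases and ambiguities.

```python
CONDITIONS = {"EQ", "NE", "CS", "CC", "MI", "PL", "VS", "VC",
              "HI", "LS", "GE", "LT", "GT", "LE", "AL"}
CORE_MNEMONICS = ("MUL", "MLA", "ADD", "CMP")   # deliberately tiny subset

def split_mnemonic(mnemonic):
    """Split e.g. 'MLAEQS' into (core, condition, sets_flags)."""
    for core in CORE_MNEMONICS:
        if mnemonic.startswith(core):
            rest = mnemonic[len(core):]
            cond = "AL"                      # no suffix means "always"
            if rest[:2] in CONDITIONS:
                cond, rest = rest[:2], rest[2:]
            if rest in ("", "S"):
                return core, cond, rest == "S"
            break
    raise ValueError(f"cannot parse mnemonic: {mnemonic}")
```

split_mnemonic("MLAEQS") gives ("MLA", "EQ", True), and a plain "MUL" gives ("MUL", "AL", False).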

Made wi th OpenOffi ce.org 46 Code Example

CC LangugeLanguge if (a==0 || b==1) c = d + e ;

AsemblerAsembler CMP R0, #0 ; compare a with 0 CMPNE R1, #1 ; if a is not 0, compare b to 1 ADDEQ R2, R3, R4 ; if either was true c = d + e
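The three-instruction sequence can be traced in Python to check that it matches the C statement. Only the Z flag is modelled, since EQ and NE depend on nothing else; the function names are made up for this sketch.

```python
def run_arm_sequence(a, b, c, d, e):
    """Trace CMP / CMPNE / ADDEQ with a, b, c, d, e in R0-R4; return new c."""
    r = [a, b, c, d, e]
    z = (r[0] == 0)                        # CMP   R0, #0
    if not z:                              # CMPNE executes only when Z is clear
        z = (r[1] == 1)                    # CMPNE R1, #1
    if z:                                  # ADDEQ executes only when Z is set
        r[2] = (r[3] + r[4]) & 0xFFFFFFFF  # ADDEQ R2, R3, R4
    return r[2]

def run_c_statement(a, b, c, d, e):
    """Reference: if (a==0 || b==1) c = d + e;"""
    if a == 0 or b == 1:
        c = (d + e) & 0xFFFFFFFF
    return c
```

When a is 0 the second compare is skipped, so Z stays set from the first compare. This is how the short-circuit "or" is obtained without any branch.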

Made wi th OpenOffi ce.org 47 Thumb Instruction Set

InstructionInstruction wordword lengthlength shrunkshrunk toto 16-bits16-bits NumberNumber ofof workingworking registersregisters isis limitedlimited toto 88 InstructionsInstructions followfollow theirtheir ownown syntaxsyntax butbut eacheach instructioninstruction hashas it’sit’s nativenative ARMARM instructioninstruction counterpartcounterpart DueDue toto shrinkingshrinking somesome functionalityfunctionality isis lostlost 1919 differentdifferent ThumbThumb instructioninstruction formatsformats

Made wi th OpenOffi ce.org 48 Advanced Features - Example

ARMARM IntelligentIntelligent EnergyEnergy ManagerManager (IEM)(IEM) AdaptiveAdaptive VoltageVoltage ScalingScaling (AVS)(AVS) II

Made wi th OpenOffi ce.org 49 Operating Systems

MicrosoftMicrosoft WindowsWindows MobileMobile LinuxLinux EPOCEPOC NetBSDNetBSD Microsystems:Microsystems: uCOS,uCOS, uLinuxuLinux ......

Made wi th OpenOffi ce.org 50 ARM World
