ARM architecture

Computer architecture M

1 History

• Acorn Computers: an English company from Cambridge (UK) which had developed an 8-bit microcomputer for the BBC based on the 6502 architecture (Synertek and Rockwell)

• In 1982 Acorn engineers looked for a new microprocessor for more sophisticated applications, but decided against the CISC solutions because they were too slow for the specific requirements, in particular interrupt latency

• They decided to design a totally new architecture. At the same time the Berkeley RISC I and II and the Stanford MIPS (Microprocessor without Interlocked Pipeline Stages) appeared, and they decided to follow that philosophy

• The result was ARM (Advanced RISC Machine), whose three-stage pipeline is still in use today

• Since 1990 ARM has been a true industry and a «brand» with multiple implementations; it is used by many processor companies (Intel too) in multiple environments and in tailored versions (Intellectual Property, IP, cores)

• The design can be bought as source (Verilog): a soft core

2 ARM

• T: Thumb
• D: On-chip debug support
• M: Enhanced multiplier
• I: Embedded ICE hardware
• T2: Thumb-2
• S: Synthesizable code
• E: Enhanced DSP instruction set
• J: JAVA support
• Z: TrustZone
• F: Floating point unit
• H: Handshake, clockless design for synchronous or asynchronous design

3 ARM – basic concepts

• ARM is a family of RISC processors conceptually similar to DLX

• There are several versions, from very simple to very sophisticated

• Multiple environments (e.g. mobile phones)

[Figures: Apple iPod Photo and iPod Video 5th gen (2X, @80MHz); Roomba 500; Lego Mindstorms]

4 ARM – first version

• A very simple LOAD/STORE architecture, since the designers had no previous full-custom design experience

• 32-bit fixed-length instructions

• Three-address, RISC-type instructions (with some CISC-type exceptions)

• Fixed register bank. In addition to the programmer-visible registers there are internal machine registers

• Instructions are single-cycle where possible, but some take multiple cycles because no Harvard architecture is implemented (instructions and data share one memory). When more than a single memory access is required (e.g. a LOAD), the extra cycles are used for useful micro-operations (e.g. auto-indexing the address)

5 Three-stage ARM

• 16 32-bit general-purpose registers (r0 - r15)

• Three-port register bank (two ports for reading and one for writing), plus an additional port for reading and writing register 15 (PC)

• N-position barrel shifter

• 32-bit ALU

• The address register is provided with an incrementer (for sequential accesses). In practice it is a programmable counter

• Two buffer registers for data to and from the memory (invisible to the programmer). Single-bank memory

• Instruction decoder and control logic

• Status register (CPSR)

• Two interrupts: fast and standard

6 ARM register set

Mode abbreviations: fiq: fast interrupt; irq: standard interrupt; svc: software interrupt; abt: memory faults (abort); und: undefined instructions

Mode         Registers
User mode    r0-r12, r13 (MSP), r14 (LR), r15 (PC), CPSR
fiq mode     r8_fiq-r12_fiq, r13_fiq, r14_fiq, SPSR_fiq
svc mode     r13_svc, r14_svc, SPSR_svc
abort mode   r13_abt, r14_abt, SPSR_abt
irq mode     r13_irq, r14_irq, SPSR_irq
undef. mode  r13_und, r14_und, SPSR_und

r0-r7 are common to user and system modes. The banked registers (suffixes _fiq, _svc, _abt, _irq, _und) are available in system modes only; each one replaces the user-mode register with the same number.

CPSR Current Program Status Register
SPSR Saved Program Status Register
MSP Master Stack Pointer
LR Link Register (return register for subroutines)

7 Current Program Status Register CPSR (similar to flag register)

Bit(s)   31  30  29  28  27-8     7   6   5   4-0
Field    N   Z   C   V   Unused   I   F   T   Mode

Condition codes: N negative, Z zero, C carry, V oVerflow
I, F: interrupt masks
T: Thumb instruction set

CPSR[4:0]  Mode    Use                                    Register set
10000      User    Normal user code                       user
10001      FIQ     Processing fast interrupts             fiq
10010      IRQ     Processing standard interrupts         irq
10011      SVC     Processing software interrupts         svc
10111      Abort   Processing memory faults               abt
11011      Undef   Handling undefined instruction traps   und
11111      System  Running privileged tasks               user

Each privileged mode has its own Saved Program Status Register (SPSR), where the current CPSR is saved, and its own r14 (Link Register)

NB Thumb instruction set: highly encoded instructions that save memory
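The CPSR layout and the mode table above can be exercised with a few shifts and masks. The following is only a sketch: the helper name `decode_cpsr` and the sample value 0x600000D3 are illustrative assumptions, not part of the slides; the bit positions and mode encodings are the ones listed above (with V in bit 28).

```python
# Mode encodings copied from the CPSR[4:0] table above.
MODES = {
    0b10000: "User",
    0b10001: "FIQ",
    0b10010: "IRQ",
    0b10011: "SVC",
    0b10111: "Abort",
    0b11011: "Undef",
    0b11111: "System",
}

def decode_cpsr(cpsr):
    """Split a 32-bit CPSR value into named fields (hypothetical helper)."""
    return {
        "N": (cpsr >> 31) & 1,   # negative
        "Z": (cpsr >> 30) & 1,   # zero
        "C": (cpsr >> 29) & 1,   # carry
        "V": (cpsr >> 28) & 1,   # overflow
        "I": (cpsr >> 7) & 1,    # IRQ mask
        "F": (cpsr >> 6) & 1,    # FIQ mask
        "T": (cpsr >> 5) & 1,    # Thumb state
        "mode": MODES.get(cpsr & 0x1F, "reserved"),
    }

# Sample value chosen for illustration: Z and C set, both interrupts masked, SVC mode.
fields = decode_cpsr(0x600000D3)
print(fields)
```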

8 Exceptions

When an exception occurs:

1) The corresponding mode is activated
2) The PC (r15) is saved in r14 (link register) of the new mode
3) The old CPSR is saved in the SPSR of the new mode
4) IRQ is disabled by setting bit 7 of the CPSR; if the exception is a Fast Interrupt, CPSR bit 6 is set too
5) The PC takes the value given in the following table (fixed addresses)

Exception                              Mode   Address
Reset                                  SVC    00000000
Undefined instruction                  UND    00000004
Software interrupt (SWI)               SVC    00000008
Prefetch abort (instr. memory fault)   Abort  0000000C
Data abort (data memory fault)         Abort  00000010
IRQ                                    IRQ    00000018
FIQ                                    FIQ    0000001C

Address 00000014 cannot be used (compatibility with old ARMs)
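The five steps above can be mirrored in a small behavioural model. This is a sketch under stated assumptions: the processor state is a plain dictionary, the names `take_exception`, `VECTORS` and `MODE_BITS` are invented for illustration, and details not mentioned on the slide (for example the exact return-address offset stored in r14) are deliberately left out.

```python
# (mode suffix, vector address) per exception, from the table above.
VECTORS = {
    "reset": ("svc", 0x00),
    "undef": ("und", 0x04),
    "swi": ("svc", 0x08),
    "prefetch_abort": ("abt", 0x0C),
    "data_abort": ("abt", 0x10),
    "irq": ("irq", 0x18),
    "fiq": ("fiq", 0x1C),
}
# CPSR[4:0] encodings of the exception modes.
MODE_BITS = {"fiq": 0b10001, "irq": 0b10010, "svc": 0b10011,
             "abt": 0b10111, "und": 0b11011}

def take_exception(state, kind):
    """Apply the exception-entry steps to a state dict (simplified model)."""
    mode, vector = VECTORS[kind]
    old_cpsr = state["cpsr"]
    state["r14_" + mode] = state["pc"]            # step 2: PC -> banked r14
    state["spsr_" + mode] = old_cpsr              # step 3: CPSR -> banked SPSR
    cpsr = (old_cpsr & ~0x1F) | MODE_BITS[mode]   # step 1: switch mode
    cpsr |= 1 << 7                                # step 4: mask IRQ
    if kind == "fiq":
        cpsr |= 1 << 6                            # step 4: mask FIQ as well
    state["cpsr"] = cpsr
    state["pc"] = vector                          # step 5: jump to fixed address
    return state

state = take_exception({"cpsr": 0x10, "pc": 0x8000}, "irq")
print(hex(state["cpsr"]), hex(state["pc"]))
```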

9 ARM three stages pipeline

[Figure: three-stage ARM organisation; register bank ports labelled Write, Read, Read]

10 Organisation

• Register bank (= Register File): two read ports and one write port (as in DLX) for the normal data traffic, plus two accesses (read and write) for r15 (PC)

• The memory address register has an incrementer for sequential accesses, which is used for incrementing the PC too

• One extra register for 32-bit multiplication (when multiplying 32-bit data the result can be longer than 32 bits)

• Two transit registers for the memory (Datain and Dataout; no Harvard architecture initially)

• No forwarding unit (not needed because the following instruction finds the updated value already in the RF; three-stage pipeline, see next slide)

11 Three stages pipeline Single cycle instructions

1   add r3,r1,r5     fetch   decode  exec.add
2   sub r2,r3,r6             fetch   decode  exec.sub
3   cmp r2,#3                        fetch   decode  exec.cmp
                                                     time →

• A single-cycle instruction reads two operands during the execute stage; the datum on bus B is shifted (if required) and combined in the ALU with the datum on bus A. The result is written back into the register bank. The PC is incremented by the incrementer and the result is stored back in r15 AND in the address register for the next instruction access

• Fetch stage: the instruction is read from the memory into the data-in register for decoding

• Decode stage: the instruction in the data-in register is decoded (it doesn't use the datapath); in the meantime the next instruction is read from the memory and is «clocked» at the end of the fetch stage

• Exec stage: the instruction uses the datapath. In an arithmetic instruction two operands are read; the one on bus B is shifted (combinatorially) if needed and combined with the datum on bus A. The result is written back into the RF in the same clock period

NB An instruction in its exec stage finds the datum it requires already available in the RF. No forwarding unit is needed!

12 Three stages pipeline Multiple cycles instructions

Cycle      1      2       3        4            5          6        7        8
1 ADD      fetch  decode  execute
2 STR             fetch   decode   calc. addr.  data xfer
3 ADD                     fetch    bubble       decode     execute
4 ADD                              fetch        bubble     decode   execute
5 ADD                                                      fetch    decode   execute
                                                                             time →

Notes on the diagram:
• Instruction 3: the address computation of the STR prevents the decoding, because the registers towards the ALU cannot be opened
• Instruction 4: decoder busy
• Instruction 5: no instruction fetch during the data transfer of the STR, because the memory is busy with the store

• Multi-cycle instructions are executed more irregularly. This example shows an ADD followed by a STORE and three more ADDs

• The greyed stages are those where the memory is accessed

• The datapath is used by the STORE for the address computation

• Since the PC (r15) is incremented in the first stage, the programmer must be aware that it has already been incremented twice (two instructions, 8 bytes) when it is used in the exec stage
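The last bullet has a practical consequence for PC-relative addressing: an instruction that reads r15 in its exec stage sees its own address plus 8. A minimal sketch of that arithmetic, assuming 32-bit ARM instructions; the function names are hypothetical.

```python
def pc_seen_in_execute(instruction_address):
    """Value read from r15 by a 32-bit ARM instruction in its exec stage."""
    # Two further instructions (2 x 4 bytes) have been fetched by then.
    return (instruction_address + 8) & 0xFFFFFFFF

def pc_relative_offset(instruction_address, target_address):
    """Offset to add to r15 so that the result equals target_address."""
    return target_address - pc_seen_in_execute(instruction_address)

# An instruction at 0x1000 that wants to reach 0x1020 must add 0x18, not 0x20.
print(hex(pc_seen_in_execute(0x1000)), hex(pc_relative_offset(0x1000, 0x1020)))
```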

13 Three stages pipeline Multiple cycles instructions

«Register based» load with autoincrement

ldmia r0!,{r2,r3}   fetch, decode, ex ld r2, ex ld r3
sub r2,r3,r6        fetch, decode, ex sub
cmp r2,#3           fetch, decode, ex cmp
                                              time →

ldmia -> load multiple registers, incrementing the address

• This instruction loads two registers (here r2 and r3) with data starting from the address held in r0. There is no need for an address computation (the value is already present in r0). The address is incremented by 4 after each load (incrementer)
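The effect of `ldmia r0!,{r2,r3}` on registers and memory can be sketched as follows. Assumptions: registers are a 16-element list, memory is a dictionary keyed by word address, and the function name `ldmia_writeback` is made up for this illustration. The lowest-numbered register in the list receives the word at the lowest address.

```python
def ldmia_writeback(regs, memory, base, reg_list):
    """Model of LDMIA rBase!, {reg_list}: load words, then update the base."""
    address = regs[base]
    for r in sorted(reg_list):      # lowest register <- lowest address
        regs[r] = memory[address]
        address += 4                # the incrementer adds 4 after each load
    regs[base] = address            # "!" writes the final address back
    return regs

regs = [0] * 16
regs[0] = 0x100
memory = {0x100: 11, 0x104: 22}
ldmia_writeback(regs, memory, base=0, reg_list=[2, 3])
print(regs[2], regs[3], hex(regs[0]))
```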

14 Branch

Decision on the third clock

     bne foo            fetch  decode  execute  linkret  adjust
     sub r2,r3,r6       fetch  decode  ex add
     add r13,r14,r2     fetch  decode  ex add
foo  add r0,r1,r2       fetch  decode  ex add
                                                          time →

The branch can be a branch with return, in which case the PC value is saved in the linkret stage. The adjust stage corrects the saved value, because the PC has already been incremented by 8

15 Register/Register instructions Datapath

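[Figure: datapath for register/register instructions, showing the address register, incrementer, register bank (Rd, Rn, Rm, PC/r15), multiplier, barrel shifter, ALU, data out, data in and instruction pipe; shifter and ALU operate as per instruction]

Reg-Reg instruction (AR: Address Register):

• Rd <= Rn op Rm
• R15 (PC) <= AR + 4
• AR <= AR + 4

The PC value is incremented by 4 as per instruction, and the same incremented value is stored in the AR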

16 Register/Immediate instructions Datapath

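[Figure: datapath for register/immediate instructions, showing the address register, incrementer, register bank (Rd, Rn, PC/r15), multiplier, shifter, ALU, immediate field [7:0], data out, data in and instruction pipe; shifter and ALU operate as per instruction]

Reg-Imm instruction:

• Rd <= Rn op Imm
• R15 (PC) <= AR + 4
• AR <= AR + 4

In this case the second operand is in the instruction itself (bits [7:0])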

17 Store instruction Datapath

[Figure: datapath during the two cycles of a store, showing the address register, incrementer, register bank (Rn, Rd, PC/r15), shifter (lsl #0), ALU (= A / A + B / A - B), displacement field [11:0], data out with byte replication, data in and instruction pipe]

• Compute address (1st cycle)
  AR <= Rn op Disp
  R15 (PC) <= AR + 4

• Store data (2nd cycle)
  AR <= R15 (PC)
  mem[AR] <= Rd
  If autoindexing: Rn <= Rn +/- 4

(a) 1st cycle – The STORE address is computed and stored in the AR. In the meantime r15 (PC) is incremented and the value is stored ONLY in the RF, for the next instruction

(b) 2nd cycle – r15 (PC) is copied into the AR while the datum is written into the memory and an autoincrement (if required) is executed. If only a single byte must be written, the lowest byte of the word is replicated 4 times in the output register.
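The byte replication mentioned for the second cycle is easy to check numerically. A minimal sketch, assuming the register value is a 32-bit unsigned integer; `store_byte_bus_value` is a hypothetical name.

```python
def store_byte_bus_value(rd):
    """Word driven on the data-out register for a byte store."""
    low_byte = rd & 0xFF
    # Copy the lowest byte into all four byte lanes of the 32-bit word.
    return low_byte * 0x01010101

bus_value = store_byte_bus_value(0x123456AB)
print(hex(bus_value))
```

With the byte present on every lane, the memory system only has to enable the lane selected by the two low address bits.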

18 Branch

[Figure: datapath during the two cycles of a branch (target computation, then PC increment), showing the address register, incrementer, register bank (R14, PC/r15), shifter (#2), ALU (= A + B in the first cycle, = A in the second), data out, data in and instruction pipe]

• Compute target address (1st cycle)
  AR <= PC + Disp (the displacement is shifted left by 2 positions in the shifter)

• Save return address, if required (2nd cycle)
  r14 <= PC (R15/PC to save)
  AR <= AR(PC) + 4
  R15 <= AR(PC) + 4

• PC adjustment

(a) 1st cycle – compute branch target (b) 2nd cycle – save return address

19 Pipeline clock

• ARMs don't use edge-triggered flip-flops (D FFs); they are based on a two-phase non-overlapping clock derived internally from the processor clock

• Data transfer is achieved by loading the data alternately into the latches of the two phases

[Figure: phase 1 and phase 2 within 1 clock cycle]

20 Datapath timing

• The register read buses are dynamic and are precharged in phase 2. Here “dynamic” means that sometimes they are not driven: they maintain their values and look “pseudo-static”

• In phase 1 the selected registers enable their drivers onto the read buses, which present valid data from the start of phase 1.

• The second operand goes through the barrel shifter and is therefore available with a little delay

• The ALU has input registers which are enabled during phase 1

• The ALU processes the operands during phase 2: the result is sampled in the destination register at the end of phase 2

21 Timing diagram

[Timing diagram over phase 1 and phase 2: register read time, read bus valid, shift time, shift out valid, ALU operands latched, ALU time, ALU out, register write time, precharge invalidates buses]

Delay = register read time + shifter delay + ALU delay + register write set-up time + phase non-overlap time

22 ALU ARM

[Figure: ALU structure; A operand latch and B operand latch, invert A / invert B XOR gates, logic functions block and adder (C in, C, V), logic/arithmetic result mux (function select), N flag, zero detect (Z), result]

Since integrating the logic and the arithmetic functions in one circuit is cumbersome, two different circuits were designed, plus a MUX

The value range in 2's complement of the 32-bit ARM registers goes from -2^31 (0x80000000) to +2^31 - 1 (0x7FFFFFFF). With saturating arithmetic, an out-of-range result (“overflow”) is corrected automatically: if the value is greater than +2^31 - 1 the result becomes +2^31 - 1; if it is smaller than -2^31 the result becomes -2^31.
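The saturation rule above can be written as a clamp. This is a sketch of the arithmetic only, using Python's unbounded integers to compute the exact sum first; `saturating_add` is a hypothetical name and does not model any specific ARM instruction.

```python
INT32_MIN = -2**31        # 0x80000000 as a signed value
INT32_MAX = 2**31 - 1     # 0x7FFFFFFF

def saturating_add(a, b):
    """Add two signed 32-bit values, clamping the result to the 32-bit range."""
    result = a + b
    if result > INT32_MAX:
        return INT32_MAX
    if result < INT32_MIN:
        return INT32_MIN
    return result

print(hex(saturating_add(INT32_MAX, 1)), saturating_add(INT32_MIN, -1), saturating_add(5, 7))
```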

23 Control logic

[Figure: control logic; the instruction feeds the decode PLA, the multiply control, the cycle count and the load/store multiple logic, which generate the address control, register control, ALU control and shifter control]

Control signals for the subsystems

Note the two dedicated subsystems which handle the multiplication and the multiple LOAD and STORE instructions.

25 Memories

Data: 32 bit; addresses: 32 bit

The notation A(K+2:2) indicates that the address lines of a device with K address lines are connected to the ARM address lines shifted two positions to the right. (ARM address bits 0 and 1 are NOT emitted.) Notice that in ARM notation the least significant bit is on the right

26 Memories

mas[1:0] -> access width (parallelism) control

BusSeL and WR

mas[0]  mas[1]  Access
0       1       word access
1       0       half-word access (selection depending on A[1])
0       0       byte access (selection depending on A[0] and A[1])

A[31] selects either ROM or RAM, and r/!w selects the access type

A Wait/Ready signal is also available

The I/O is memory mapped

27 ARM Bus

There are three types of buses defined by ARM

• Advanced High-performance Bus (AHB). It is a protocol based on a single bus. Addressing and data transfer are overlapped for maximum bandwidth, and burst mode is supported

• Advanced eXtensible Interface (AXI). It is a protocol where data and addresses use different channels, both for reading and for writing. Addressing and data transfer overlap, and burst mode is supported.

• Advanced Peripheral Bus (APB), an interface for low-complexity peripherals

Normally each ARM incorporates an AHB or an AXI together with an APB.

28 ARM Bus

AHB or AXI

29 AHB bus generic structure

[Figure: AHB generic structure; masters #1 to #3 and slaves #1 to #4 connected through an arbiter and a decoder, with address/control (HADDR), write data (HWDATA) and read data (HRDATA) paths]

30 AHB Bus

• Synchronous bus which supports 32, 64 and 128 bit transfers, with 32-bit addresses and burst transfers (multiple transfers with incremented addresses)

• Separate address and data buses

• Up to 16 arbitrated bus masters

• The data transfer to an address is overlapped with the emission of the next address (maximum bandwidth exploitation). The arbitration takes place during the current transfer

• Burst transfers can be directed to a fixed address (e.g. a FIFO) or to an automatically incremented address. A burst cannot cross a 1 KB address boundary

• The bus master can «lock» the bus for atomic transfers (e.g. semaphores). «Split» transactions are allowed, where a slave defers the acknowledge to the master. The slave stores the master request; when the master gains the bus again it repeats the request (and the slave is hopefully ready to answer). Only a single pending split transaction is allowed
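The 1 KB rule for incrementing bursts can be checked with integer division. A minimal sketch, assuming the burst is described by its start address, number of beats and bytes per beat; the function name `burst_crosses_1kb` is hypothetical.

```python
def burst_crosses_1kb(start_address, beats, bytes_per_beat):
    """True if an incrementing burst would span two different 1 KB blocks."""
    last_address = start_address + beats * bytes_per_beat - 1
    # Both ends of the burst must fall in the same 1024-byte block.
    return start_address // 1024 != last_address // 1024

# Four 32-bit beats starting at 0x3F0 end exactly at 0x3FF: allowed.
print(burst_crosses_1kb(0x3F0, 4, 4))
# Starting one word later, the last beat would spill into the next block.
print(burst_crosses_1kb(0x3F4, 4, 4))
```

A master that needs a longer transfer would split it into separate bursts at the boundary.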

31 Topologies

32 Multilayer structure

33 Typical Multilayer

[Figure: typical multilayer system with peripherals #1, #2 and #3]

34 AHB Bus - Master

• An AHB master (e.g. an ARM processor)

• HBUSREQx is the request to the arbiter. The transfer starts when the master receives the signal HGRANTx: the master then activates the address and control signals, which are received and decoded by the slaves. The master is unaware of how many slaves are present. For instance, there can be three slaves, each one controlling 24 MB of memory, or two slaves, each one controlling 36 MB of memory

• The meaning of the other signals is explained in the next slides

35 AHB Bus - Timing

Write

Read

A read or write transfer with wait states

36 AHB Bus - Timing

Multiple transfers with and without wait periods. Note the overlap between a data transfer and the addressing of the next datum

37 AHB Bus- Topology

There is a single address bus, used in turn by the masters selected by the arbiter

38 AHB bus - Arbiter

This figure shows an AHB arbiter. Nothing is specified about the arbitration policy; normally it is a round-robin scheme. The signal HGRANTx indicates which master the next access is granted to. HMASTER[3:0] indicates which master is presently controlling the bus

39 High performance AXI bus

• High bandwidth and low latency

• Backward compatible with AHB

• Address/control and transfer phases separated

• Separated read write channels

• Transfer parallelism 1 to 128 bytes (lanes) with bus enables

• Burst transactions with single initial address. A signal indicates the transfer end

• Each transaction carries address and controls

41 AXI read and write

42 ARM caches

• The majority of the processors of the ARM family use a virtually addressed cache (iAPX caches are physically addressed)

• Advantage: the cache access is performed in parallel with the virtual address translation (faster access)

• Disadvantages:

• The cache must be emptied at each context switch

• Any data sharing must take place outside the cache (the address translation is different for different processors)

53 ARM MMU

• The MMU depends on the processor. Here are the characteristics of the ARM 7 MMU

• The virtual memory is based on page tables which are not on chip (they are held in memory, as is the case with iAPX systems)

• The page size can be 1 MB (called a section, with a single-level page table), or 64 KB or 4 KB (pages, with two-level page tables)

• Internal TLB

• Memory is protected by up to 16 domains. A different policy can be defined for each of them (i.e. cacheable or non cacheable)

• Virtual memory is mandatory since caches are virtually addressed
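For the three sizes listed above, the split of a 32-bit virtual address into a page (or section) number and an offset follows directly from the size. The sketch below shows only that split; it does not model the page table format, and the names `PAGE_SIZES` and `split_virtual_address` are assumptions made for illustration.

```python
# Sizes taken from the list above.
PAGE_SIZES = {
    "section": 1 << 20,       # 1 MB
    "large page": 64 << 10,   # 64 KB
    "small page": 4 << 10,    # 4 KB
}

def split_virtual_address(va, kind):
    """Return (page_number, offset) of a 32-bit virtual address."""
    size = PAGE_SIZES[kind]
    offset_bits = size.bit_length() - 1   # log2 of a power of two
    return va >> offset_bits, va & (size - 1)

for kind in PAGE_SIZES:
    number, offset = split_virtual_address(0x12345678, kind)
    print(kind, hex(number), hex(offset))
```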

54 ARM9TDMI

• Harvard architecture

• 5 stages pipeline

• Increased clock frequency

55 Strong ARM (ARM9TDMI) 5 stages (DLX with cache)

56 ARM7TDMI vs ARM9TDMI

Increased number of stages for clock frequency increase

57 Pipeline ARM9TDMI

ARM7TDMI: Fetch, Decode, Execute

ARM9TDMI: [pipeline figure; register read takes place in the Decode stage]

Process        0.25 um
Metal layers   3
Vdd            2.5 V
Transistors    110,000
Core area      2.1 mm^2
Clock          0 to 200 MHz
MIPS           220
Power          150 mW
MIPS/W         1500

58 Stages dynamic

• Fetch

• Decode: instruction decode and register read (three read ports)

• Execute: an operand is shifted (if needed) and the ALU result is available, or the address is computed

• Buffer/data: memory access (load, store)

• Write-back: register write

… as in DLX

59 Forwarding

Forwarding Paths

60 Further ARM information

From here onward other information about ARM (only for cultural purposes NOT for the exam)

61 Unavoidable stalls (as DLX)

Cycle            1      2      3      4      5      6      7
LDR R1,@(R2)     IF     ID     EX     MEM    WB
SUB R4,R1,R5            IF     ID     EXsub  MEM    WB
AND R6,R1,R7                   IF     ID     EXand  MEM    WB
OR  R8,R1,R9                          IF     ID     EXE    MEM

Not possible: R1 is still being read from memory when it is required by the SUB => STALL

Cycle            1      2      3      4      5      6      7      8      9
LDR R1,@(R2)     IF     ID     EX     MEM    WB
SUB R4,R1,R5            IF     ID     stall  EXsub  MEM    WB
AND R6,R1,R7                   IF     stall  ID     EX     MEM    WB
OR  R8,R1,R9                          stall  IF     ID     EX     MEM    WB
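The cycle count of the second diagram can be reproduced with a very small model. Assumptions, all for illustration only: a five-stage pipeline with forwarding, exactly one stall cycle when the instruction immediately after an LDR reads the loaded register, and a program encoded as (opcode, destination, sources) tuples.

```python
def count_load_use_stalls(program):
    """Count single-cycle stalls caused by an LDR followed by a user of its result."""
    stalls = 0
    for previous, current in zip(program, program[1:]):
        opcode, dest, _ = previous
        if opcode == "LDR" and dest in current[2]:
            stalls += 1
    return stalls

program = [
    ("LDR", "R1", ["R2"]),
    ("SUB", "R4", ["R1", "R5"]),   # needs R1 right after the load: stall
    ("AND", "R6", ["R1", "R7"]),
    ("OR",  "R8", ["R1", "R9"]),
]
stalls = count_load_use_stalls(program)
# 5 cycles for the first instruction, one more per following instruction, plus stalls.
total_cycles = 5 + (len(program) - 1) + stalls
print(stalls, total_cycles)
```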

62 LDR interlock (bubble)

[Figure: pipeline diagram with unused MEM stages]

LDR R4, [R7] ; R4 := MEM32[R7]

EOR: Exclusive Or. The LDR is followed by an instruction which requires the loaded value (interlock => bubble)

63 ARM architectures

64 Performance

65 ARM10TDMI

[Pipeline figure; register read takes place in the Decode stage]

• 6 stages pipeline

• Clock 300 MHz

• CMOS 250 nm

• Performance: 4 times ARM7TDMI

• Branch prediction

• Non blocking Load and Store (queue)

• 64 bit memory: two registers transferred in a cycle

66 ARM 11

• OOO execution for the three pipelines

67 8 stages ARM

8 stages pipeline

• Data forwarding
• Static and dynamic branch prediction
• Non blocking cache access

Pipeline parallelism

• ALU/MAC, LSU
• Load and Store don't block the pipeline
• OOO execution

68 ARM11 MPCore

Highly configurable

• Up to 4 processors

• Configurable cache 16K-64K for each processor. MESI

• Single or double 64-bit AXI bus

• Optional vectored floating point

• Up to 255 interrupt sources

71 ARM11 MPCore

72 Comparison

Feature                      ARM9E™            ARM10E™           Intel® XScale™    ARM11™
Architecture                 ARMv5TE(J)        ARMv5TE(J)        ARMv5TE           ARMv6
Pipeline length              5                 6                 7                 8
Java decode                  (ARM926EJ)        (ARM1026EJ)       No                Yes
V6 SIMD instructions         No                No                No                Yes
MIA instructions             No                No                Yes               Coprocessor
Branch prediction            No                Static            Dynamic           Dynamic
Independent load-store unit  No                Yes               Yes               Yes
Instruction issue            Scalar, in-order  Scalar, in-order  Scalar, in-order  Scalar, in-order
Concurrency                  None              ALU/MAC, LSU      ALU, MAC, LSU     ALU/MAC, LSU
Out-of-order completion      No                Yes               Yes               Yes
Target implementation        Synthesizable     Synthesizable     Custom chip       Synthesizable and hard macro

73 ARM Cortex family (V7)

[Figure: Cortex family roadmap, from Cortex-M0 (12k gates...), Cortex-M1, Cortex-M3, SC300 and Cortex-M4, through Cortex-R4 and Heron (R, x1-2), to Cortex-A5 (x1-4), Cortex-A8, Cortex-A9 (x1-4) and Cortex-A15 (x1-4, ...2.5GHz)]

• ARM Cortex-A family (v7-A): General purpose processors - Applications processors for full OS and 3rd party applications

• ARM Cortex-R family (v7-R): Embedded processors for real time and control signal processing

• ARM Cortex-M family (v7-M): for SoC

74 ARM Cortex performance

75 ARM Cortex M3 Pipeline

1st stage - Fetch: instruction fetch (prefetch)

2nd stage - Decode: instruction decode & register read; AGU; branch forwarding & speculation

3rd stage - Execute: address phase & write back; data phase load/store & branch; multiply & divide; shift; ALU & branch; write

Execute stage branch (ALU branch & load/store branch)

77 ARM Cortex M3 Datapath

[Figure: Cortex-M3 datapath; instruction side with I_HADDR, address register, incrementer, I_HRDATA and instruction decode; data side with D_HADDR, address register, incrementer, D_HWDATA (write data register) and D_HRDATA (read data register); register bank with buses A and B, barrel shifter, Mul/Div, ALU, writeback and INTADDR]

• This diagram refers to the internal core, which therefore has separate I and D ports. The memory access takes place outside the core

• Three-stage pipeline similar to that of the ARM 7

78 ARM Bit Banding

Traditional bit manipulation

1) RAM byte read at 0x02000000:          0 0 0 0 0 0 0 0
2) Mask and bit modification:            x x x x x 1 x x
3) RAM writeback to 0x02000000:          0 0 0 0 0 1 0 0
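Bit banding replaces the three-step sequence above with a single word write to an alias address. The sketch below computes that alias address using the usual Cortex-M3 mapping, in which every bit of the 1 MB bit-band region gets its own word in the 32 MB alias region. The SRAM base addresses 0x20000000 and 0x22000000 and the function name are assumptions for illustration (they are not given on the slide), so check them against the reference manual of the actual device.

```python
SRAM_BITBAND_BASE = 0x20000000   # assumed start of the 1 MB SRAM bit-band region
SRAM_ALIAS_BASE = 0x22000000     # assumed start of the 32 MB SRAM alias region

def bitband_alias_address(byte_address, bit_number,
                          band_base=SRAM_BITBAND_BASE,
                          alias_base=SRAM_ALIAS_BASE):
    """Alias word address that maps onto one bit of a bit-band byte."""
    byte_offset = byte_address - band_base
    # 8 bits per byte, one 4-byte word per bit: 32 alias bytes per band byte.
    return alias_base + byte_offset * 32 + bit_number * 4

# Bit 2 of the first SRAM byte, then bit 7 (the eighth bit, eighth alias word).
print(hex(bitband_alias_address(0x20000000, 2)))
print(hex(bitband_alias_address(0x20000000, 7)))
```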

81 ARM Bit Banding

• A write to a bit-band address affects only one bit

• The M3 has two 32MB alias regions that map onto the two 1MB bit-band regions. The two regions are separate, one in the SRAM region and one in the peripheral region. Each bit in the bit-band region is addressed sequentially in the 32MB alias region. For example, the eighth bit in the bit-band region can be accessed using the eighth word in the 32MB alias region.

• The write is transformed into an atomic read-modify-write

• Register bit 0 is written into the bit

82 ARM Thumb instructions

83 ARM Thumb 2 instructions

• Variable instruction length
• ARM instructions are fixed length, 32 bits
• Thumb instructions (highly encoded) are fixed length, 16 bits
• Thumb-2 instructions are either 16 or 32 bits

• Cortex-M3 implements only a portion of Thumb-2

84 ARM Cortex M3 Interrupts

[Figure: Cortex-M3 processor core with the NVIC, which receives INTNMI and interrupts 1-240]

• A single non-maskable interrupt (INTNMI)

• 1-240 prioritized interrupts

• Maskable interrupts
• Variable number of interrupts (according to the version)
• Vectored interrupt controller (NVIC)

85 ARM Cortex M3 NVIC

• If higher-priority interrupts arrive during a stack PUSH or POP caused by a previous interrupt, the NVIC immediately reads the pointer to the routine of the higher-priority interrupt

• The NVIC also takes care of the power management scheme: on WFI (Wait for Interrupt) and WFE (Wait for Event) instructions the M3 core is automatically put into the low-power state. The same applies to SOE (Sleep On Exit), which powers the core down on exit from the lowest-priority interrupt

87 ARM Cortex M3 memory map

The memory map is predefined

Start address  Region                        Size
E0100000       System (up to FFFFFFFF)
E0040000       Debug components (APB)
E0000000       SCS + NVIC (internal PPB)
A0000000       External peripheral           1GB
60000000       External RAM                  1GB
40000000       Peripheral                    ½GB
20000000       RAM                           ½GB
00000000       Code space                    ½GB

[Figure also shows the CM3 core with its instruction and data ports, the debug block, and the bus matrix with bit-bander, aligner and patch, connected to the ICODE AHB, DCODE AHB, SYSTEM AHB and INTERNAL PPB buses]

88 ARM Cortex M3 Protection

• The processor allows up to 8 memory regions to be defined through specific registers

• Each region includes both data and instructions

• Region size: 32 bytes to 4 GBytes

• There are many free open source OS for the Cortex-M3

• BeRTOS
• ChibiOS
• OS
• Free RTOS
• Micrium uC/OS-II
• eCos
• NuttX

89 ARM Cortex M3 Simple system

90 ARM Cortex 8

Oriented to cell phones, game controllers and navigation systems

Advanced performance with low power consumption

Architecture
• Thumb-2 instructions
• 130 new instructions
• High density and high performance
• NEON unit for signal processing
• Audio, video and 3D graphics

91 ARM Cortex 8

92 ARM Cortex 8

[Figure: Cortex-A8 block diagram; instruction fetch unit, instruction decode unit, instruction execute & load/store unit, NEON media processor, L1 I cache, L1 D cache, L2 memory system, AXI level 3 memory interface]

93 ARM Cortex 8 Register file

95 ARM Cortex 8 Protection

[Figure: physical memory holding the OS code + data (privileged mode) and the application code + data (user mode)]

97 ARM Cortex 8 Memory allocation

[Figure: the Memory Management Unit translates the virtual addresses of the OS (privileged mode) and of the applications (user mode) into physical addresses of the corresponding code + data areas in physical memory]

98 ARM Cortex 8 Memory management

• The Memory Management Unit (MMU) controls the memory accesses for protection in addition to the address translation

• The TLBs associate a process identifier to each entry

99 ARM Cortex 8

• Superscalar pipeline: in-order dual issue and OOO execution

100 ARM Cortex 8

NEON media engine

101 ARM Cortex 8

102 ARM Cortex 8 for cell phones

105 ARM Cortex A9

107 ARM Cortex 15

108 ARM family

ARM6 → ARM7
• 3 stages pipeline
• Unified memory for data and instructions
• 16 bit Thumb instruction set
• 54 multiplication unit

ARM8 → ARM9 → ARM10

ARM9
• 5 stages pipeline (130 MHz or 200 MHz)
• Separate data and instruction memories

ARM10
• 300 MHz
• Multimedia support
• Optional vectored floating point unit

ARM11
• 8 stages pipeline
• 1 GHz
• Wireless, consumer, networking and automotive

109 ARM Cortex family

• High end performance

• ARM Cortex-A for complex OS and applications. Thumb and Thumb-2 support

• ARM Cortex-R: embedded processor for real-time systems. Thumb and Thumb-2 support

• ARM Cortex-M: embedded processor for low cost applications. Thumb 2 only

110 The ARM family
