ARM Was Developed at Acorn Computers Limited of Cambridge, England

MEH420 Intro. To Embedded Systems: ARM Processors

ARM Processors
• ARM was developed at Acorn Computers Limited of Cambridge, England between 1983 and 1985
• RISC concept introduced in 1980 at Stanford and Berkeley
• ARM Limited founded in 1990
• ARM Cores
  • Licensed to partners to develop and fabricate new microcontrollers
  • Soft core

ARM Processors
• Based upon RISC architecture with enhancements to meet the requirements of embedded applications
• A large uniform register file
• Load-store architecture, where data processing operations operate on register contents only
• Uniform and fixed-length instructions
• 32-bit processor
• Instructions are 32 bits long
• Good speed / power consumption ratio
• High code density

ARM Processors
• Version 1 (1983-1985) (obsolete)
  • 26-bit addressing, no multiply or coprocessor
• Version 2 (obsolete)
  • Includes 32-bit result multiply and coprocessor support
• Version 3
  • 32-bit addressing
• Version 4
  • Adds signed and unsigned half-word and signed byte load and store instructions
• Version 4T: Thumb compressed form of instructions introduced

ARM Processors
• Version 5T
  • Superset of 4T adding new instructions
• Version 5TE
  • Adds signal processing extension
  • Examples:
    • ARM9E-S: v5TE (Sony Ericsson K-W series, TI OMAPs)
    • XScale: v5TE (Samsung Omnia, Blackberry)
• Version 6
  • ARM11: ARMv6 (iPhone, Nokia E90, N95 etc.)
  • Cortex-M0-M1: ARMv6 (STM32, NXP LPC, FPGA soft core)

ARM Processors
• Enhancements to basic RISC features:
  • Control over ALU and barrel shifter for every data processing operation to maximize their usage
  • Auto-increment and auto-decrement addressing modes to optimize program loops
  • Load and store multiple instructions to maximize data throughput
  • Conditional execution of instructions to maximize execution throughput

ARM Processors
• ARM v7: (M,E-M,R,A): Cortex-M3-M4, Cortex-R4-R5-R7, Cortex-A5-A7-A8-A9, A12, A15.
  • A: Applications processors are intended for use with an open OS and feature a memory management unit (MMU) providing virtual addressing
  • R: Real-time processors target more deeply embedded applications. They feature a memory protection unit (MPU), which protects regions of memory but does not provide virtual addressing.
  • M: Microcontrollers generally have no memory protection and focus on very low latency responses to interrupts, including features such as flash memory controllers and interrupt controllers
  • NXP, Atmel, Cypress, ST, TI OMAP, Samsung, Nvidia Tegra (20 MHz-2.5 GHz)
• ARM v8: 32/64-bit version (A-R), Cortex-A53-A57
  • New instruction set, Advanced SIMD (NEON), crypto instructions, Linux kernel version 3.7 and iOS 7 support.

ARM Processors: Common Features (till v5)
• Data items are placed in the register file
• No data processing instructions directly manipulate data in memory
• Instructions typically use two source registers and a single result or destination register
• A barrel shifter on the data path can process data before it enters the ALU (no clock; it is a combinational circuit)
• Increment/decrement logic can update register content for sequential access independently of the ALU (auto-increment and auto-decrement modes)

ARM Processors: Basic ARM Organization
[Figure: basic ARM organization]

ARM Processors: Registers
• General purpose registers hold either data or an address
• All registers are 32 bits wide
• In user mode 16 data registers and 2 status registers are visible
• Data registers: r0 to r15
  • r13: stack pointer
  • r14: link register (where the return address is put whenever a subroutine is called)
  • r15: program counter

ARM Processors: Registers
• Depending upon context, registers r13 and r14 can also be used as GPRs
• Any instruction which uses r0 can equally be used with any other GPR (r1-r13)
• In addition, there are two status registers:
  • CPSR: current program status register
  • SPSR: saved program status register

ARM Processors: Registers (r15)
• When the processor is executing in ARM state
  • All instructions are 32 bits wide
  • All instructions are word aligned
  • PC value is stored in bits [31:2] with bits [1:0] undefined

ARM Processors: Registers
• N: Negative, Z: Zero, C: Carry, V: Overflow
• I: 1 disables IRQ, F: 1 disables FIQ
• T: 0 ARM state, 1 Thumb state
• Q: Overflow, saturation arithmetic (v5TE)

ARM Processors: Processor Modes
• Processor modes determine
  • Which registers are active
  • Access rights to the CPSR register itself
• Each processor mode is either
  • Privileged: full read-write access to the CPSR
  • Non-privileged: only read access to the control field of the CPSR but read-write access to the condition flags.

ARM Processors: Processor Modes
• The processor enters abort mode when there is a failed attempt to access memory.
• Fast interrupt request (FIQ) and interrupt request (IRQ) modes correspond to the two interrupt levels available on the ARM processor.

ARM Processors: Processor Modes
• Supervisor mode is the mode that the processor is in after reset and is generally the mode that an operating system kernel operates in.
• System mode is a special version of user mode that allows full read-write access to the CPSR.
• Undefined mode is used when the processor encounters an instruction that is undefined or not supported by the implementation.
• User mode is used for programs and applications.

ARM Processors: Banked Registers
• ARM has 37 registers in the register file. Of those, 20 registers are hidden from a program at different times. These registers are called banked registers and are identified by the shading in the diagram.
• They are available only when the processor is in a particular mode; for example, abort mode has banked registers r13_abt, r14_abt and spsr_abt.

ARM Processors: Banked Registers
• When we enter FIQ mode we have a fresh copy of r8-r14.
• You should normally save the status register (typically to the stack) before entering an interrupt.
• CPSR is copied to SPSR. When going back to user mode, SPSR is copied to CPSR.

ARM Processors: Mode Changing
• The mode is changed by writing directly to the CPSR, or by hardware when the processor responds to an exception or interrupt.
• To return to user mode, a special return instruction is used that instructs the core to restore the original CPSR and banked registers.

ARM Processors: Pipeline
• A pipeline is the mechanism a RISC processor uses to execute instructions. Using a pipeline speeds up execution by fetching the next instruction while other instructions are being decoded and executed.
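To make the overlap of fetch, decode and execute concrete, here is a minimal sketch of an idealized three-stage pipeline. The function names `pipeline_schedule` and `cycles_without_pipeline` and the sample program are assumptions made for illustration; the model ignores stalls, branches and memory wait states, so it shows only the best case.

```python
def pipeline_schedule(instructions, stages=("fetch", "decode", "execute")):
    """Return one dict per clock cycle mapping each stage to the
    instruction occupying it (None when the stage is empty).

    Idealized model: every stage takes one cycle and nothing stalls.
    """
    depth = len(stages)
    n = len(instructions)
    total_cycles = n + depth - 1 if n else 0
    schedule = []
    for cycle in range(total_cycles):
        row = {}
        for position, stage in enumerate(stages):
            # The instruction in stage `position` entered the pipeline
            # `position` cycles ago.
            index = cycle - position
            row[stage] = instructions[index] if 0 <= index < n else None
        schedule.append(row)
    return schedule


def cycles_without_pipeline(n, depth=3):
    """Cycles needed if each instruction finishes all stages before the next starts."""
    return n * depth


program = ["ADD", "SUB", "CMP"]  # hypothetical instruction stream
schedule = pipeline_schedule(program)
for cycle, row in enumerate(schedule, start=1):
    print(cycle, row)
print("pipelined cycles:", len(schedule))
print("unpipelined cycles:", cycles_without_pipeline(len(program)))
```

Under these assumptions, n instructions on a pipeline of depth d need n + d - 1 cycles instead of n * d. The d - 1 extra cycles are the time needed to fill the pipeline.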
ARM Processors: Pipeline
• As the pipeline length increases, the amount of work done at each stage is reduced, which allows the processor to attain a higher operating frequency. This in turn increases performance.
• The system latency also increases because it takes more cycles to fill the pipeline before the core can execute an instruction.
• The increased pipeline length also means there can be data dependencies between certain stages. You can write code to reduce these dependencies by using instruction scheduling.

ARM Processors: Pipeline
• The ARM9 adds a memory and a writeback stage, which allows the ARM9 to process on average 1.1 Dhrystone MIPS per MHz, an increase in instruction throughput of around 13% compared with an ARM7. The maximum core frequency attainable using an ARM9 is also higher.

ARM Processors: Pipeline and Memory Organization

Processor family   # of pipeline stages   Memory organization    Clock rate   MIPS/MHz
ARM6               3                      Von Neumann            25 MHz
ARM7               3                      Von Neumann            66 MHz       0.9
ARM8               5                      Von Neumann            72 MHz       1.2
ARM9               5                      Harvard                200 MHz      1.1
ARM10              6                      Harvard                400 MHz      1.25
StrongARM          5                      Harvard                233 MHz      1.15
ARM11              8                      Von Neumann/Harvard    550 MHz      1.2

ARM Processors: Instructions
• Instructions process data held in registers and access memory with load and store instructions.
• Classes of instructions:
  • Data processing
  • Branch instructions
  • Load-store instructions
  • Software interrupt instructions
  • Program status register instructions

ARM Processors: Instructions
• 3-address data processing instructions, 2 for inputs and 1 for output
• Conditional execution of each instruction
• Load and store multiple registers
• Shift and ALU operation in a single instruction (barrel shifter)
• Open instruction set extension through the coprocessor instructions

ARM Processors: Data Types
• A word is 32 bits long
• A word can be divided into four 8-bit bytes
• An ARM address can be 32 bits long
• Addresses refer to bytes
  • Address 4 starts at byte 4
• Can be configured at power-up as either little- or big-endian mode.

ARM Processors: Data Processing
• Manipulate data within registers
  • Move instructions
  • Arithmetic instructions (multiply)
  • Logical instructions
  • Comparison instructions
• Suffix S on data processing instructions updates flags in CPSR

ARM Processors: Data Processing
• Operands are 32 bits wide: they come from registers or are specified as a literal in the instruction itself
• Second operand is sent to the ALU via the barrel shifter
• 32-bit result placed in a register; long multiply instructions produce 64-bit results

ARM Processors: Move Instruction
• MOV Rd, N
  • Rd: destination register
  • N: can be an immediate value or a source register
  • Example: mov r7, r5
• MVN Rd, N
  • Move into Rd the bitwise NOT of the 32-bit value from the source

ARM Processors: Barrel Shifter
• Enables shifting the 32-bit operand in one of the source registers left or right by a specific number of positions within the cycle time of the instruction.
• Available operations: shift left and right, rotate right.
• Facilitates fast multiply and division and increases code density.

ARM Processors: Arithmetic Instructions
• Implement 32-bit addition and subtraction
• 3-operand form possible
• Examples:
  • SUB r0, r1, r2
    • Subtract the value stored in r2 from that of r1 and place the result in r0

ARM Processors: Arithmetic Instructions
• Use of the barrel shifter with arithmetic and logical instructions increases the set of available operations.
• Examples:
  • ADD r0, r1, r1, LSL #1
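The two arithmetic examples can be traced with a small sketch that models 32-bit registers as Python integers. The helper names (`lsl`, `lsr`, `ror`, `add`, `sub`) are illustrative assumptions, not part of any ARM toolchain. Shift amounts are assumed to lie in 0..31, and the condition flags are not modelled.

```python
MASK32 = 0xFFFFFFFF  # registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left, discarding bits shifted past bit 31."""
    return (value << amount) & MASK32


def lsr(value, amount):
    """Logical shift right, filling with zeros from the left."""
    return (value & MASK32) >> amount


def ror(value, amount):
    """Rotate right: bits leaving bit 0 re-enter at bit 31."""
    amount %= 32
    value &= MASK32
    return ((value >> amount) | (value << (32 - amount))) & MASK32


def add(rn, operand2):
    """ADD Rd, Rn, operand2 with 32-bit wrap-around."""
    return (rn + operand2) & MASK32


def sub(rn, operand2):
    """SUB Rd, Rn, operand2 with 32-bit wrap-around."""
    return (rn - operand2) & MASK32


r1, r2 = 5, 2
r0 = sub(r1, r2)            # SUB r0, r1, r2         -> r0 = r1 - r2
print(r0)
r0 = add(r1, lsl(r1, 1))    # ADD r0, r1, r1, LSL #1 -> r0 = r1 + 2*r1
print(r0)
```

The second instruction multiplies r1 by three in a single data processing operation, because the shifted second operand (r1 * 2) is produced on the way into the ALU. This is the sense in which the barrel shifter gives a fast multiply by small constants.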
