• About the ARM Processor
• Processor Families and Architecture Versions

About the ARM processor

The ARM architecture has been designed to allow very small, yet high-performance implementations. The architectural simplicity of ARM processors leads to very small implementations, and small implementations allow devices with very low power consumption. The ARM is a Reduced Instruction Set Computer (RISC), as it incorporates these typical RISC architecture features:

- A large uniform register file
- A load/store architecture, where data-processing operations only operate on register contents, not directly on memory contents
- Simple addressing modes, with all load/store addresses being determined from register contents and instruction fields only
- Uniform and fixed-length instruction fields, to simplify instruction decode

In addition, the ARM architecture gives you:

- Control over both the Arithmetic Logic Unit (ALU) and the shifter in every data-processing instruction, to maximize the use of both
- Load and store multiple instructions, to maximize data throughput

These enhancements to a basic RISC architecture allow ARM processors to achieve a good balance of high performance, low code size and low power consumption.
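To make the combined shifter and ALU control more concrete, here is a minimal Python sketch of what a single data-processing instruction such as `ADD r0, r1, r2, LSL #2` computes. The function names `lsl` and `add_shifted` are hypothetical helpers chosen for illustration, and the model ignores condition flags.

```python
MASK32 = 0xFFFFFFFF  # ARM registers are 32 bits wide


def lsl(value, amount):
    """Logical shift left, truncated to 32 bits (the shifter stage)."""
    return (value << amount) & MASK32


def add_shifted(rn, rm, shift_amount):
    """Model of ADD Rd, Rn, Rm, LSL #shift_amount.

    The second operand passes through the shifter first, then the ALU
    adds it to the first operand, all within one instruction.
    """
    return (rn + lsl(rm, shift_amount)) & MASK32


# Typical use: index a table of 4-byte words, address = base + index * 4
base, index = 0x1000, 3
address = add_shifted(base, index, 2)
print(hex(address))  # 0x100c
```

On a processor without a shifter in the data path, the same address calculation would need a separate shift instruction before the add.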

Processor families and architecture versions

The important thing to recognise is the difference between a processor's family name and the instruction set architecture (ISA) version it implements.

- There are many variants of ARM processors, with different capabilities and features. But all of them implement a version of the 'ARM architecture' (ARM ISA), which describes the interface and properties (instruction set, behaviour, etc.) that ARM processors must support. It has been refined over time with successive architecture versions, referred to with the ARMv{n} scheme. (Note the "v".)
- ARM processor families group multiple processors, and were named chronologically, starting with ARM1 (1985) up to ARM11 (2002). The naming scheme then changed with the Cortex family, introduced in 2005, in which processors are named following the scheme Cortex-{letter}{number}.

Each new family introduced new processors, with an improved design, better performance, and new features.

So a "dual Cortex-A9, based on ARMv7" is a processor with two Cortex-A9 cores and implementing the 7th version of the ARM architecture. The technically correct naming for this processor is a Cortex-A9 MPCore processor (based on ARMv7), comprising two Cortex-A9 cores.

Note that the processor family is sometimes misleadingly substituted with the actual processor's name. You may for example find a reference to an "ARMv6 ARM11 processor". There are actually no "ARM11" processors, but rather ARM1136, ARM1156, or ARM1176 processors, so the "ARM11 processor" refers to "a" processor of the ARM11 family.
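The distinction between architecture names and processor names can be sketched as a small classifier. This is only an illustration of the naming schemes described above: the function `classify_arm_name` and its regular expressions are assumptions made for this example, not an official parsing rule.

```python
import re

# Patterns follow the schemes described in the text (assumed, simplified):
ARCH_RE = re.compile(r"^ARMv(\d+)(.*)$")          # ARMv{n}, e.g. ARMv7, ARMv7-A
CORTEX_RE = re.compile(r"^Cortex-([A-Z])(\d+)$")  # Cortex-{letter}{number}
CLASSIC_RE = re.compile(r"^ARM(\d+)(.*)$")        # ARM11, ARM1136, ARM7TDMI


def classify_arm_name(name):
    """Return a dict describing what kind of ARM name this is."""
    m = ARCH_RE.match(name)
    if m:  # the "v" marks an architecture version, not a processor
        return {"kind": "architecture", "version": int(m.group(1)),
                "suffix": m.group(2)}
    m = CORTEX_RE.match(name)
    if m:
        return {"kind": "processor", "family": "Cortex-" + m.group(1),
                "number": int(m.group(2))}
    m = CLASSIC_RE.match(name)
    if m:  # pre-Cortex name: either a family (ARM11) or a processor (ARM1136)
        return {"kind": "classic family or processor",
                "digits": m.group(1), "suffix": m.group(2)}
    return {"kind": "unknown"}


for name in ["ARMv7", "Cortex-A9", "ARM11", "ARM1136"]:
    print(name, "->", classify_arm_name(name)["kind"])
```

Note that the sketch deliberately does not try to tell "ARM11" (a family) from "ARM1136" (a processor), since that requires knowing which processors exist rather than just the shape of the name.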

Suffixes

Before the Cortex family, from ARM1 up to ARM11, processors were named after their family, with suffixes describing each processor's specific features. Here are a few examples of suffixes. The details are a bit technical, but more information can be found in [1]. Letters indicate specific features of the processor. For example:

- 'F' indicates that the processor has a VFP floating point unit.

- 'T' or 'T2' means that the processor is able to use the Thumb or Thumb2 instruction encoding.

Digits detail hardware characteristics of the processor.

- For example, for an ARM946 processor, the '4' indicates a cache and memory protection unit and the '6' a tightly coupled SRAM interface.

Note that these suffixes are often omitted if they are not relevant to the context, or implied for newer processors that always implement the related features. You will, for example, not find the 'T' suffix for Cortex processors: they all handle Thumb!

Letter suffixes are also sometimes appended to the architecture name to show that one or a few specific extensions are available. You may for example see references to ARMv4T to refer to the ARM architecture version 4 with the Thumb extension.

Family      Architecture versions implemented   Example of processors
ARM1        ARMv1                               ARM1
ARM2        ARMv2                               ARM2
            ARMv2a                              ARM250
ARM3        ARMv2a                              ARM3
ARM6        ARMv3                               ARM60, ARM600, ARM610
ARM7        ARMv3                               ARM700, ARM710, ARM710a
            ARMv4T                              ARM7TDMI, ARM710T, ARM720T
...         ...                                 ...
Cortex-A    ARMv7-A                             Cortex-A5, Cortex-A8, Cortex-A9
Cortex-R    ARMv7-R                             Cortex-R4
Cortex-M    ARMv6-M                             Cortex-M0, Cortex-M1
            ARMv7-M                             Cortex-M3
            ARMv7-ME                            Cortex-M4
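The table above can also be held as a nested lookup structure, which makes the "one family, several architecture versions" relationship explicit. The sketch below is illustrative only: it contains just the rows listed in the table (the elided rows are not filled in), and the names `FAMILY_TABLE` and `architecture_of` are hypothetical.

```python
# family -> architecture version -> example processors (rows from the table)
FAMILY_TABLE = {
    "ARM1": {"ARMv1": ["ARM1"]},
    "ARM2": {"ARMv2": ["ARM2"], "ARMv2a": ["ARM250"]},
    "ARM3": {"ARMv2a": ["ARM3"]},
    "ARM6": {"ARMv3": ["ARM60", "ARM600", "ARM610"]},
    "ARM7": {"ARMv3": ["ARM700", "ARM710", "ARM710a"],
             "ARMv4T": ["ARM7TDMI", "ARM710T", "ARM720T"]},
    "Cortex-A": {"ARMv7-A": ["Cortex-A5", "Cortex-A8", "Cortex-A9"]},
    "Cortex-R": {"ARMv7-R": ["Cortex-R4"]},
    "Cortex-M": {"ARMv6-M": ["Cortex-M0", "Cortex-M1"],
                 "ARMv7-M": ["Cortex-M3"],
                 "ARMv7-ME": ["Cortex-M4"]},
}


def architecture_of(processor):
    """Return (family, architecture) for a listed processor, else None."""
    for family, versions in FAMILY_TABLE.items():
        for arch, processors in versions.items():
            if processor in processors:
                return family, arch
    return None


print(architecture_of("ARM7TDMI"))   # ('ARM7', 'ARMv4T')
print(architecture_of("Cortex-M3"))  # ('Cortex-M', 'ARMv7-M')
```

The ARM7 entry shows why the family name alone is not enough: processors of the same family implement either ARMv3 or ARMv4T.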

ARM7, ARM9 & ARM11 features

ARM7 versions

- ARM7TDMI® (integer core)
- ARM7TDMI-S™ (synthesisable version of ARM7TDMI)
- ARM7EJ-S™ (synthesisable core with DSP and Jazelle technology)
- ARM720T™ (cached processor macrocell: 8K cached core with Memory Management Unit (MMU) supporting operating systems including Windows CE, Palm OS, Symbian OS and Linux)
- 130 MIPS using the Dhrystone 2.1 benchmark in a typical 0.13μm process

ARM9 versions

- ARM920T (dual 16k caches with MMU, to support multiple OSs)
- ARM922T (dual 8k caches, for applications supporting multiple OSs)
- ARM940T™ (dual 4k caches, for embedded control applications running an RTOS)
- 32-bit RISC processor core with a 5-stage integer pipeline
- 8-entry write buffer to avoid blocking the processor on external memory writes
- Achieves 1.1 MIPS/MHz, 300 MIPS (Dhrystone 2.1) in a typical 0.13μm process

ARM11 versions

- Families with the ARMv6 instruction set architecture, which includes the Thumb® extensions for code density, Jazelle™ technology for Java™ acceleration, ARM DSP extensions, and SIMD media processing extensions
- Memory Management Unit (MMU) supporting operating systems, including Palm OS
- 32-bit RISC processor core with an 8-stage integer pipeline, static and dynamic branch prediction, and separate load-store and arithmetic pipelines to maximize instruction throughput
- Targets a performance range of 400 to 1200 Dhrystone MIPS
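The two ARM9 figures quoted above (1.1 MIPS/MHz and 300 MIPS) can be related by simple arithmetic. The sketch below assumes both figures describe the same configuration, which the text does not state explicitly; the helper name `clock_mhz_for` is hypothetical.

```python
def clock_mhz_for(dhrystone_mips, mips_per_mhz):
    """Clock frequency (MHz) needed to reach a given Dhrystone MIPS rating."""
    return dhrystone_mips / mips_per_mhz


# ARM9 figures from the list above: 300 MIPS at 1.1 MIPS/MHz
arm9_clock = clock_mhz_for(300, 1.1)
print(round(arm9_clock, 1))  # 272.7
```

Under that assumption, the 300 MIPS figure corresponds to a clock of roughly 273 MHz.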

Advantages & Suitability in Embedded Applications

- The simple hardware design, the fact that many things (such as a floating-point multiplier) can be left off the chip as options, and the RISC pipeline architecture all lend themselves to a chip with a very small die size.
- Small die size translates into low cost, since much of the cost of a chip is proportional to the die area.
- A small die area and simple pipeline construction allow the other major benefit of the ARM chip: designers are able to use less hardware and make better hardware decisions to reduce the processor's power consumption.
- The small size, low cost, and low power usage lead to one of the most common uses for an ARM processor today: embedded applications. Embedded environments like cell phones or PDAs (Personal Digital Assistants) require the benefits that this architecture provides. There has to be a trade-off between performance, cost, and size, and the ARM fits this trade-off nicely. It has a very small die size; its performance, although not on the cutting edge, is more than adequate for the tasks at hand; and, most importantly, it is cheap and low in power consumption.
- An important factor behind these claims is its simple design, using a not-so-fancy 5-stage pipeline. Other contributing factors follow.
- ARM makers have been able to apply an instruction set called Thumb, which takes 32-bit instructions and compresses them down to 16 bits. This enables programs to be coded much more densely than with standard RISC instruction sets, not to mention cutting some portions of the hardware down in size.
- Processors that support Thumb can also run 32-bit instructions. In fact, 16-bit and 32-bit instructions can be mixed together and the hardware will be able to decode and decompress at the same time without a performance hit, thus maintaining powerful computing capabilities.
- Cost is minimized by having a simple, small structure with many configurations available. Small means less silicon and a higher yield per wafer.
- A simple pipeline and instruction set make the processor easier to learn, optimize, and build, again saving on cost.
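The code-density argument for Thumb can be illustrated with a rough size estimate. In the sketch below, the instruction widths (4 bytes for ARM, 2 bytes for Thumb) follow from the text, but the instruction counts are purely hypothetical: in particular, the assumption that the Thumb version needs 30% more instructions is an illustrative guess, not a measured figure.

```python
ARM_INSTR_BYTES = 4    # 32-bit ARM encoding
THUMB_INSTR_BYTES = 2  # 16-bit Thumb encoding


def code_size_bytes(arm_count, thumb_count):
    """Size of a routine mixing ARM and Thumb instructions."""
    return arm_count * ARM_INSTR_BYTES + thumb_count * THUMB_INSTR_BYTES


# Hypothetical routine: 1000 instructions when written in ARM code
all_arm = code_size_bytes(1000, 0)

# ASSUMPTION: the same routine needs 30% more instructions in Thumb,
# because 16-bit instructions can each express less work
all_thumb = code_size_bytes(0, 1300)

print(all_arm, all_thumb)   # 4000 2600
print(all_thumb / all_arm)  # 0.65
```

Even with more instructions, the halved encoding width leaves the Thumb version smaller in this example; the real saving depends on the code being compiled.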

ARM DATA FLOW MODEL

● Von Neumann implementation: data items and instructions share the same bus.
● The instruction decoder translates instructions before they are executed.
● Load instruction: copies data from memory to a register.
● Store instruction: copies data from a register to memory.
● There are no data-processing instructions that directly manipulate data in memory.
● Data items are placed in the register file, a storage bank made up of 32-bit registers.
● ARM instructions typically have two source registers, Rn and Rm, and one destination register, Rd.
● The ALU and MAC (multiply-accumulate) unit take the register values Rn and Rm from the A and B buses and compute a result.
● Load and store instructions use the ALU to generate an address, which is held in the address register and broadcast on the address bus.
● The register Rm can alternatively be pre-processed in the barrel shifter before it enters the ALU.
● For load and store instructions, the incrementer updates the address register before the core reads or writes the next register value from or to the next sequential memory location.
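The data flow described above can be walked through with a toy model. The following Python sketch is a simplification under stated assumptions: the class `TinyArmCore` and its method names are hypothetical, memory is modelled as a dictionary of words, only a left shift is implemented in the barrel shifter, and nothing here is cycle-accurate.

```python
MASK32 = 0xFFFFFFFF  # all registers and data items are 32 bits wide


class TinyArmCore:
    """Toy model of the register file, barrel shifter, ALU and incrementer."""

    def __init__(self):
        self.regs = [0] * 16  # register file: r0-r15, 32 bits each
        self.memory = {}      # shared memory: byte address -> 32-bit word

    def barrel_shift_lsl(self, value, amount):
        """Pre-process Rm before it reaches the ALU (left shift only)."""
        return (value << amount) & MASK32

    def add(self, rd, rn, rm, lsl=0):
        """Data processing: Rd = Rn + (Rm << lsl); registers only."""
        operand_b = self.barrel_shift_lsl(self.regs[rm], lsl)
        self.regs[rd] = (self.regs[rn] + operand_b) & MASK32

    def load(self, rd, rn, offset=0):
        """Load: the ALU forms the address, memory is copied to Rd."""
        address = (self.regs[rn] + offset) & MASK32
        self.regs[rd] = self.memory.get(address, 0)

    def store(self, rd, rn, offset=0):
        """Store: the ALU forms the address, Rd is copied to memory."""
        address = (self.regs[rn] + offset) & MASK32
        self.memory[address] = self.regs[rd]

    def load_multiple(self, rn, reg_list):
        """Load multiple: the incrementer steps the address by one word."""
        address = self.regs[rn]
        for rd in sorted(reg_list):  # lowest register gets lowest address
            self.regs[rd] = self.memory.get(address, 0)
            address = (address + 4) & MASK32


core = TinyArmCore()
core.memory[0x100] = 5
core.memory[0x104] = 7
core.regs[0] = 0x100
core.load_multiple(0, [1, 2])  # r1 = 5, r2 = 7
core.add(3, 1, 2, lsl=1)       # r3 = 5 + (7 << 1) = 19
core.store(3, 0, offset=8)     # memory[0x108] = 19
print(core.memory[0x108])      # 19
```

Note that `add` never touches `memory`: as in the load/store model, data must be brought into registers first, processed there, and written back with a separate store.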