Electronics

Factors in choosing a architecture by Terry Schmidt, , USA

It has never been easy to come up with a definitive measure of CPU performance in real-time embedded control applications. Although it is one of the key factors for systems designers, it is complicated by the wide choice of 8, 16 and 32-bit microcontroller and digital signal controller (DSC) devices on the market, with their wide choice of pin counts, memory capacity and types, and peripheral functions.

Although, at first sight, millions of instructions per second (MIPS) appears to be a good way of comparing processors, products with the same MIPS rating can perform quite differently, depending on the application. Many other factors, including interrupt responsiveness, code execution predictability and the ability to easily and quickly manipulate I/O pins and register bits, are also important. Other standardised benchmarks, such as Dhrystone and EEMBC, can help with comparisons, but the best measurements are always made in relation to the actual requirements of the application.

Two of the most important influences on processor performance are the instruction-set architecture and the data-path architecture. The choice of a RISC or CISC instruction set and a von Neumann or Harvard data path can make a significant difference to the system performance in a given application.

With CISC, each instruction can execute multiple operations, such as loading operand(s) from memory, performing an arithmetic or logical operation, and storing results in memory. With RISC, however, each instruction executes a simple task, such as loading a register from memory. Other instructions perform arithmetic or logical functions, or store a register to memory. With the von Neumann architecture, the instructions and operands both use the same address and data bus. With the Harvard architecture, instructions and operands use separate address and data-bus structures.

The combination of the above two choices determines the peak MIPS of a CPU and the sustainable MIPS across the instruction set and addressing modes. Processors that execute every instruction in a single cycle, whatever the addressing mode, can sustain their peak MIPS. However, it is equally important to consider how much is accomplished with each instruction.

Many microcontroller families combine a CISC instruction set and a von Neumann data-path architecture. These devices have a complex instruction set executing multiple tasks per instruction, such as: fetch two operands, add them together and store the result at a destination address. The addressing modes for these operands can be quite complex. The von Neumann data path means that a single address and data path is used for all instruction fetches, operand address derivation, operand fetches, execution and results store operations. This creates a bottleneck, so instructions can take four, six, or more clock cycles to execute. This reduces the peak MIPS performance by 15% to 50%, depending on the instruction mix. Most manufacturers provide information on the number of machine cycles required per instruction, which can be used to calculate the total number of clock cycles. In the real application, throughput can be significantly below the expectations of a device's quoted peak MIPS rate.

Combining RISC and von Neumann can offer advantages in some applications. When RISC architectures were originally defined, CPU speeds and memory access times were similar. So, an architecture that executed small instructions at the same rate as memory access was optimal. Since RISC processors typically have operands in CPU registers, the bottleneck to fetch operands is not severe. RISC architectures certainly execute at high peak MIPS rates. However, several instructions are required to execute an equivalent CISC multi-operand addition and store function. With RISC, the peak MIPS rate is high, but the work accomplished per instruction is relatively low. To support the high clock rates of a RISC processor, the memory needs help to keep pace. This problem has been addressed by instruction prefetch pipelines and memory accelerators. These solutions are fine in data-processing or fixed-algorithm applications, but in real-time control applications where the interrupt response and software execution are unpredictable, they are far from perfect.
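The cycle-count arithmetic mentioned above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the clock rate, the instruction mixes and the function names are hypothetical and are not taken from any manufacturer's data sheet. It shows how the machine cycles required per instruction, weighted by the instruction mix, turn a clock rate into a sustained MIPS figure.

```python
def average_cycles_per_instruction(mix):
    """Weighted average cycles per instruction.

    mix is a list of (fraction_of_instructions, cycles) pairs whose
    fractions must sum to 1.
    """
    total_fraction = sum(fraction for fraction, _ in mix)
    if abs(total_fraction - 1.0) > 1e-9:
        raise ValueError("instruction-mix fractions must sum to 1")
    return sum(fraction * cycles for fraction, cycles in mix)


def sustained_mips(clock_mhz, mix):
    """Sustained MIPS = clock cycles per microsecond / average cycles per instruction."""
    return clock_mhz / average_cycles_per_instruction(mix)


# Hypothetical instruction mixes, chosen only to illustrate the arithmetic.
single_cycle_mix = [(0.9, 1), (0.1, 2)]           # mostly single-cycle instructions
multi_cycle_mix = [(0.5, 4), (0.3, 6), (0.2, 8)]  # shared-bus, multi-cycle instructions

if __name__ == "__main__":
    clock_mhz = 40  # assumed instruction clock, in MHz
    print(round(sustained_mips(clock_mhz, single_cycle_mix), 2))
    print(round(sustained_mips(clock_mhz, multi_cycle_mix), 2))
```

With the same assumed clock, the mix dominated by multi-cycle instructions sustains far fewer instructions per second, which is why a quoted peak figure alone says little about application throughput.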

Fig. 1: A typical Harvard architecture incorporating separate program (24-bit) and data (16-bit) bus structures.

Fig. 2: Microchip’s PIC24 and dsPIC33 16-bit families combine Harvard architectures with a blended RISC/CISC CPU.

48 October 2008 - EngineerIT

Combining Harvard with blended RISC/CISC

DSP devices often use a Harvard architecture, but it is not widely used in microcontrollers. Two significant advantages of the Harvard architecture are its multiple data paths for instruction streams and operands, and the separation of the instruction word width from the native data-type width. This allows a wider instruction word and richer CISC-like instructions, while maintaining the single-cycle fetch and execution of RISC. The separate data paths and wide instruction word reduce the number of cycles spent fetching instructions. The wider instruction word also improves instruction-set encoding, reducing the size of the compiled C code. The Harvard architecture allows this blending of the best attributes of RISC and CISC. For example, Microchip’s PIC24 16-bit microcontroller and dsPIC33 digital-signal-controller families implement a Harvard architecture with 24-bit instruction words and 16-bit data (Fig. 2).

The Harvard architecture’s parallel data paths support simultaneous fetching of multiple operands, instruction operation execution and the storing of results, all in a single CPU clock cycle. This results in a CPU where peak MIPS and actual application MIPS are nearly identical. The only exceptions to single-cycle execution are typically branch instructions, when the branch is taken, and iterative instructions such as divide. Harvard-architecture devices can offer the same or greater processing power at lower clock speeds, due to their internal parallel data paths. Lower clock speeds also typically result in lower power dissipation and reduced EMI.

The effect of interrupt latency on real-time performance

Another critical factor in the performance of embedded real-time systems is interrupt latency. For example, interrupt response time can be critical in an operation such as shutting off a motor. A number of steps are needed to transfer from one stream of code to servicing an interrupt:
• Completion of the currently executing instruction
• Saving the CPU registers, such as PC and status, and stack pointer
• Saving the working registers for the currently executing software routine
• Loading the CPU’s PC and stack pointer for the new interrupt service routine

The worst-case interrupt latency is dramatically affected by the time to execute the worst-case instruction. For example, if a divide or multi-bit shift instruction takes 15, 20 or more cycles and an interrupt occurs while this instruction is executing, there will be a significant delay before the interrupt can be processed. Some advanced 16-bit MCUs and most DSC products have hardware-assisted multi-bit shifts, and some incorporate interruptible divide instructions. These attributes can significantly reduce the worst-case interrupt latency.

Data types and bit manipulation

The data types in real-time control applications include 32-bit words, 16-bit words, bytes and individual bits. Processors designed for embedded control typically have dedicated instructions that quickly and predictably execute bit manipulation in a single cycle. Since RISC architectures do not typically include these instructions, a multiple-instruction, multiple-cycle stream is needed to set or clear a bit in a peripheral control register, or to modify the state of an I/O pin, which is often time critical. This type of bit manipulation operation is typically not included in standardised benchmarks.

Conclusion

The performance of microcontrollers and digital signal controllers in real-time applications is significantly affected by the combination of instruction-set and data-path architectures that is used. When assessing performance, it is important to focus on measures of continuous MIPS and work done per instruction, rather than the peak MIPS that a processor can achieve on a few simple instructions. Architecture choices are also influenced by the cost and relative performance of logic and memory semiconductor technology. If a family of devices was initially defined some time ago, constraints of legacy and compatibility will mean that some features and modes of operation have been carried forward as those families have developed, even though semiconductor technology has advanced. The parallelism of the Harvard architecture offers many advantages for today’s dense semiconductor technology. By delivering instructions that accomplish several functions per instruction, at lower clock speeds, this architecture offers advantages in power dissipation and improved EMI performance.

Contact Willem Hijbeek, Tempe Technologies, Tel 011 452-0530, [email protected]
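As a rough illustration of the interrupt-latency steps listed in the article, the following Python sketch adds up the cycles for each step to estimate a worst-case latency. All cycle counts, the clock rate and the names used here are hypothetical assumptions for the sake of the arithmetic, not figures for any real device. The worst case is approximated by assuming the interrupt arrives just as the longest instruction begins.

```python
from dataclasses import dataclass


@dataclass
class InterruptCosts:
    """Cycle counts for each step of entering an interrupt (all values assumed)."""
    longest_instruction: int     # completing the currently executing instruction
    save_cpu_registers: int      # PC, status and stack pointer
    save_working_registers: int  # working registers of the interrupted routine
    load_isr_context: int        # PC and stack pointer for the service routine

    def worst_case_cycles(self) -> int:
        return (self.longest_instruction
                + self.save_cpu_registers
                + self.save_working_registers
                + self.load_isr_context)


def cycles_to_microseconds(cycles: int, clock_mhz: float) -> float:
    """Convert a cycle count to microseconds at the given instruction clock."""
    return cycles / clock_mhz


# Hypothetical comparison: an 18-cycle divide that cannot be interrupted,
# against an interruptible divide that yields after one cycle.
non_interruptible = InterruptCosts(18, 3, 4, 2)
interruptible = InterruptCosts(1, 3, 4, 2)

if __name__ == "__main__":
    for name, costs in (("non-interruptible divide", non_interruptible),
                        ("interruptible divide", interruptible)):
        cycles = costs.worst_case_cycles()
        print(name, cycles, "cycles,", cycles_to_microseconds(cycles, 40.0), "us")
```

In this made-up case the longest instruction accounts for most of the worst-case figure, which is the effect the article attributes to long divide and multi-bit shift instructions.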
