Choosing a DSP Processor Berkeley Design Technology, Inc

Choosing a DSP Processor Berkeley Design Technology, Inc. Introduction A second feature shared by DSP processors is the ability to complete several accesses to memory in a sin- DSP processors are microprocessors designed to per- gle instruction cycle. This allows the processor to fetch form digital signal processing—the mathematical ma- an instruction while simultaneously fetching operands nipulation of digitally represented signals. Digital signal and/or storing the result of a previous instruction to processing is one of the core technologies in rapidly memory. For example, in calculating the vector dot prod- growing application areas such as wireless communica- uct for an FIR filter, most DSP processors are able to per- tions, audio and video processing, and industrial control. form a MAC while simultaneously loading the data Along with the rising popularity of DSP applications, the sample and coefficient for the next MAC. Such single- variety of DSP-capable processors has expanded greatly cycle multiple memory accesses are often subject to since the introduction of the first commercially success- many restrictions. Typically, all but one of the memory ful DSP chips in the early 1980s. Market research firm locations accessed must reside on-chip, and multiple Forward Concepts projects that sales of DSP processors memory accesses can only take place with certain in- will total U.S. $6.2 billion in 2000, a growth of 40 per- structions. To support simultaneous access of multiple cent over 1999. With semiconductor manufacturers vy- memory locations, DSP processors provide multiple on- ing for bigger shares of this booming market, designers’ chip buses, multi-ported on-chip memories, and in some choices will broaden even further in the next few years. cases multiple independent memory banks. Today’s DSP processors (or “DSPs”) are sophisticat- A third feature often used to speed arithmetic pro- ed devices with impressive capabilities. In this paper, we cessing on DSP processors is one or more dedicated ad- introduce the features common to modern commercial dress generation units. Once the appropriate addressing DSP processors, explain some of the important differ- registers have been configured, the address generation ences among these devices, and focus on features that a unit operates in the background (i.e., without using the system designer should examine to find the processor main data path of the processor), forming the addresses that best fits his or her application. What is a DSP Processor? X Data Bus Y Data Bus Most DSP processors share some common basic fea- 24 24 tures designed to support high-performance, repetitive, numerically intensive tasks. X0 Operand X1 Registers Y0 The most often cited of these features is the ability to Y1 perform one or more multiply-accumulate operations 24 24 (often called “MACs”) in a single instruction cycle. The multiply-accumulate operation is useful in DSP algo- Multiplier rithms that involve computing a vector dot product, such as digital filters, correlation, and Fourier transforms. To 56 56 ALU achieve a single-cycle MAC, DSP processors integrate Shifter multiply-accumulate hardware into the main data path of 56 the processor, as shown in Figure 1. Some recent DSP 24 Accumulators A (56) processors provide two or more multiply-accumulate B (56) 24 units, allowing multiply-accumulate operations to be 56 56 performed in parallel. In addition, to allow a series of multiply-accumulate operations to proceed without the Shifter/Limiter possibility of arithmetic overflow (the generation of numbers greater than the maximum value the proces- 24 24 sor’s accumulator can hold), DSP processors generally provide extra “guard” bits in the accumulator. For example, the Motorola DSP processor family examined in FIGURE 1. A representative conventional fixed-point Figure 1 offers eight guard bits. DSP processor data path (from the Motorola DSP560xx, a 24-bit, fixed-point processor family). © 1996-2000 Berkeley Design Technology, Inc. required for operand accesses in parallel with the execu- On the other hand, because general-purpose proces- tion of arithmetic instructions. In contrast, general-pur- sor architectures generally lack features that simplify pose processors often require extra cycles to generate the DSP programming, software development is sometimes addresses needed to load operands. DSP processor ad- more tedious than on DSP processors and can result in dress generation units typically support a selection of ad- awkward code that’s difficult to maintain. Moreover, if dressing modes tailored to DSP applications. The most general-purpose processors are used only for signal pro- common of these is register-indirect addressing with cessing, they are rarely cost-effective compared to DSP post-increment, which is used in situations where a re- chips designed specifically for the task. Thus, at least in petitive computation is performed on data stored sequen- the short run, we believe that system designers will con- tially in memory. Modulo addressing is often supported, tinue to use traditional DSP processors for the majority to simplify the use of circular buffers. Some processors of DSP intensive applications. We focus on DSP proces- also support bit-reversed addressing, which increases the sors in this paper. speed of certain fast Fourier transform (FFT) algorithms. Applications Because many DSP algorithms involve performing repetitive computations, most DSP processors provide DSP processors find use in an extremely diverse ar- special support for efficient looping. Often, a special ray of applications, from radar systems to consumer loop or repeat instruction is provided, which allows the electronics. Naturally, no one processor can meet the programmer to implement a for-next loop without ex- needs of all or even most applications. Therefore, the pending any instruction cycles for updating and testing first task for the designer selecting a DSP processor is to the loop counter or branching back to the top of the loop. weigh the relative importance of performance, cost, integration, ease of development, power consumption, and Finally, to allow low-cost, high-performance input other factors for the application at hand. Here we’ll brief- and output, most DSP processors incorporate one or ly touch on the needs of just a few classes of DSP appli- more serial or parallel I/O interfaces, and specialized I/O cations. handling mechanisms such as low-overhead interrupts and direct memory access (DMA) to allow data transfers In terms of dollar volume, the biggest applications to proceed with little or no intervention from the rest of for digital signal processors are inexpensive, high-vol- the processor. ume embedded systems, such as cellular telephones, disk drives (where DSPs are used for servo control), and por- The rising popularity of DSP functions such as table digital audio players. In these applications, cost and speech coding and audio processing has led designers to integration are paramount. For portable, battery-pow- consider implementing DSP on general-purpose proces- ered products, power consumption is also critical. Ease sors such as desktop CPUs and microcontrollers. Nearly of development is usually less important; even though all general-purpose processor manufacturers have re- these applications typically involve the development of sponded by adding signal processing capabilities to their custom software to run on the DSP and custom hardware chips. Examples include the MMX and SSE instruction surrounding the DSP, the huge manufacturing volumes set extensions to the Intel Pentium line, and the extensive justify expending extra development effort. DSP-oriented retrofit of Hitachi’s SH-2 microcontroller to form the SH-DSP. A second important class of applications involves processing large volumes of data with complex algo- In some cases, system designers may prefer to use a rithms for specialized needs. Examples include sonar general-purpose processor rather than a DSP processor. and seismic exploration, where production volumes are Although general-purpose processor architectures often lower, algorithms more demanding, and product designs require several instructions to perform operations that larger and more complex. As a result, designers favor can be performed with just one DSP processor instruc- processors with maximum performance, good ease of tion, some general-purpose processors run at extremely use, and support for multiprocessor configurations. In fast clock speeds. If the designer needs to perform non- some cases, rather than designing their own hardware DSP processing, then using a general-purpose processor and software from scratch, designers assemble such sys- for both DSP and non-DSP processing could reduce the tems using off-the-shelf development boards, and ease system parts count and lower costs versus using a sepa- their software development tasks by using existing func- rate DSP processor and general-purpose microprocessor. tion libraries as the basis of their application software. Furthermore, some popular general-purpose processors feature a tremendous selection of application development tools. © 1996-2000 Berkeley Design Technology, Inc. PAGE 2 OF 8 Choosing the Right DSP Processor It’s possible to perform general-purpose floating- point arithmetic on a fixed-point processor by using soft- As illustrated in the preceding section, the right DSP ware routines that emulate the behavior of a floating- processor for a job depends heavily on the application.

Choosing a DSP Processor Berkeley Design Technology, Inc

Chapter 5 Multiprocessors and Thread-Level Parallelism

Computer Organization & Architecture Eie

Parallel Computer Architecture

Digital Signal Processing – II

Computer Memory Architecture Pdf

Introduction to Computer Architecture

Programmable Address Generation Unit for Deep Neural Network Accelerators

FPGA Implementation of RISC-Based Memory-Centric Processor Architecture

A Scalable Near-Memory Architecture for Training Deep Neural Networks on Large In-Memory Datasets

Computer Architectures & Hardware Programming 1

PRIME: a Novel Processing-In-Memory Architecture for Neural Network Computation in Reram-Based Main Memory

Computer Architecture Review Von Neumann Model, the CPU