Choosing a DSP Processor Berkeley Design Technology, Inc.

Introduction A second feature shared by DSP processors is the ability to complete several accesses to memory in a sin- DSP processors are microprocessors designed to per- gle instruction cycle. This allows the processor to fetch form digital signal processing—the mathematical ma- an instruction while simultaneously fetching operands nipulation of digitally represented signals. Digital signal and/or storing the result of a previous instruction to processing is one of the core technologies in rapidly memory. For example, in calculating the vector dot prod- growing application areas such as wireless communica- uct for an FIR filter, most DSP processors are able to per- tions, audio and video processing, and industrial control. form a MAC while simultaneously loading the data Along with the rising popularity of DSP applications, the sample and coefficient for the next MAC. Such single- variety of DSP-capable processors has expanded greatly cycle multiple memory accesses are often subject to since the introduction of the first commercially success- many restrictions. Typically, all but one of the memory ful DSP chips in the early 1980s. Market research firm locations accessed must reside on-chip, and multiple Forward Concepts projects that sales of DSP processors memory accesses can only take place with certain in- will total U.S. $6.2 billion in 2000, a growth of 40 per- structions. To support simultaneous access of multiple cent over 1999. With semiconductor manufacturers vy- memory locations, DSP processors provide multiple on- ing for bigger shares of this booming market, designers’ chip buses, multi-ported on-chip memories, and in some choices will broaden even further in the next few years. cases multiple independent memory banks. Today’s DSP processors (or “DSPs”) are sophisticat- A third feature often used to speed arithmetic pro- ed devices with impressive capabilities. In this paper, we cessing on DSP processors is one or more dedicated ad- introduce the features common to modern commercial dress generation units. Once the appropriate addressing DSP processors, explain some of the important differ- registers have been configured, the address generation ences among these devices, and focus on features that a unit operates in the background (i.e., without using the system designer should examine to find the processor main data path of the processor), forming the addresses that best fits his or her application.

What is a DSP Processor? X Data Bus Y Data Bus Most DSP processors share some common basic fea- 24 24 tures designed to support high-performance, repetitive, numerically intensive tasks. X0 Operand X1 Registers Y0 The most often cited of these features is the ability to Y1 perform one or more multiply-accumulate operations 24 24 (often called “MACs”) in a single instruction cycle. The multiply-accumulate operation is useful in DSP algo- Multiplier rithms that involve computing a vector dot product, such as digital filters, correlation, and Fourier transforms. To 56 56 ALU achieve a single-cycle MAC, DSP processors integrate Shifter multiply-accumulate hardware into the main data path of 56 the processor, as shown in Figure 1. Some recent DSP 24 Accumulators A (56) processors provide two or more multiply-accumulate B (56) 24 units, allowing multiply-accumulate operations to be 56 56 performed in parallel. In addition, to allow a series of multiply-accumulate operations to proceed without the Shifter/Limiter possibility of arithmetic overflow (the generation of numbers greater than the maximum value the proces- 24 24 sor’s accumulator can hold), DSP processors generally provide extra “guard” bits in the accumulator. For exam- ple, the Motorola DSP processor family examined in FIGURE 1. A representative conventional fixed-point Figure 1 offers eight guard bits. DSP processor data path (from the Motorola DSP560xx, a 24-bit, fixed-point processor family).

© 1996-2000 Berkeley Design Technology, Inc. required for operand accesses in parallel with the execu- On the other hand, because general-purpose proces- tion of arithmetic instructions. In contrast, general-pur- sor architectures generally lack features that simplify pose processors often require extra cycles to generate the DSP programming, software development is sometimes addresses needed to load operands. DSP processor ad- more tedious than on DSP processors and can result in dress generation units typically support a selection of ad- awkward code that’s difficult to maintain. Moreover, if dressing modes tailored to DSP applications. The most general-purpose processors are used only for signal pro- common of these is register-indirect addressing with cessing, they are rarely cost-effective compared to DSP post-increment, which is used in situations where a re- chips designed specifically for the task. Thus, at least in petitive computation is performed on data stored sequen- the short run, we believe that system designers will con- tially in memory. Modulo addressing is often supported, tinue to use traditional DSP processors for the majority to simplify the use of circular buffers. Some processors of DSP intensive applications. We focus on DSP proces- also support bit-reversed addressing, which increases the sors in this paper. speed of certain fast Fourier transform (FFT) algorithms. Applications Because many DSP algorithms involve performing repetitive computations, most DSP processors provide DSP processors find use in an extremely diverse ar- special support for efficient looping. Often, a special ray of applications, from radar systems to consumer loop or repeat instruction is provided, which allows the electronics. Naturally, no one processor can meet the programmer to implement a for-next loop without ex- needs of all or even most applications. Therefore, the pending any instruction cycles for updating and testing first task for the designer selecting a DSP processor is to the loop counter or branching back to the top of the loop. weigh the relative importance of performance, cost, inte- gration, ease of development, power consumption, and Finally, to allow low-cost, high-performance input other factors for the application at hand. Here we’ll brief- and output, most DSP processors incorporate one or ly touch on the needs of just a few classes of DSP appli- more serial or parallel I/O interfaces, and specialized I/O cations. handling mechanisms such as low-overhead interrupts and direct memory access (DMA) to allow data transfers In terms of dollar volume, the biggest applications to proceed with little or no intervention from the rest of for digital signal processors are inexpensive, high-vol- the processor. ume embedded systems, such as cellular telephones, disk drives (where DSPs are used for servo control), and por- The rising popularity of DSP functions such as table digital audio players. In these applications, cost and speech coding and audio processing has led designers to integration are paramount. For portable, battery-pow- consider implementing DSP on general-purpose proces- ered products, power consumption is also critical. Ease sors such as desktop CPUs and microcontrollers. Nearly of development is usually less important; even though all general-purpose processor manufacturers have re- these applications typically involve the development of sponded by adding signal processing capabilities to their custom software to run on the DSP and custom hardware chips. Examples include the MMX and SSE instruction surrounding the DSP, the huge manufacturing volumes set extensions to the Intel Pentium line, and the extensive justify expending extra development effort. DSP-oriented retrofit of Hitachi’s SH-2 microcontroller to form the SH-DSP. A second important class of applications involves processing large volumes of data with complex algo- In some cases, system designers may prefer to use a rithms for specialized needs. Examples include sonar general-purpose processor rather than a DSP processor. and seismic exploration, where production volumes are Although general-purpose processor architectures often lower, algorithms more demanding, and product designs require several instructions to perform operations that larger and more complex. As a result, designers favor can be performed with just one DSP processor instruc- processors with maximum performance, good ease of tion, some general-purpose processors run at extremely use, and support for multiprocessor configurations. In fast clock speeds. If the designer needs to perform non- some cases, rather than designing their own hardware DSP processing, then using a general-purpose processor and software from scratch, designers assemble such sys- for both DSP and non-DSP processing could reduce the tems using off-the-shelf development boards, and ease system parts count and lower costs versus using a sepa- their software development tasks by using existing func- rate DSP processor and general-purpose microprocessor. tion libraries as the basis of their application software. Furthermore, some popular general-purpose processors feature a tremendous selection of application develop- ment tools.

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 2 OF 8 Choosing the Right DSP Processor It’s possible to perform general-purpose floating- point arithmetic on a fixed-point processor by using soft- As illustrated in the preceding section, the right DSP ware routines that emulate the behavior of a floating- processor for a job depends heavily on the application. point device. However, such software routines are usual- One processor may perform well for some applications, ly very expensive in terms of processor cycles. Conse- but be a poor choice for others. With this in mind, one quently, general-purpose floating-point emulation is can consider a number of features that vary from one seldom used. A more efficient technique to boost the nu- DSP to another in selecting a processor. These features meric range of fixed-point processors is block floating- are discussed below. point, wherein a group of numbers with different mantis- Arithmetic Format sas but a single, common exponent are processed as a block of data. Block floating-point is usually handled in One of the most fundamental characteristics of a pro- software, although some processors have hardware fea- grammable is the type of native tures to assist in its implementation. arithmetic used in the processor. Most DSPs use fixed- point arithmetic, where numbers are represented as inte- Data Width gers or as fractions in a fixed range (usually -1.0 to +1.0). All common floating-point DSPs use a 32-bit data Other processors use floating-point arithmetic, where word. For fixed-point DSPs, the most common data values are represented by a mantissa and an exponent as exponent word size is 16 bits. Motorola’s DSP563xx family uses a mantissa x 2 . The mantissa is generally 24-bit data word, however, while Zoran’s ZR3800x fam- a fraction in the range -1.0 to +1.0, while the exponent is ily uses a 20-bit data word. The size of the data word has an integer that represents the number of places that the a major impact on cost, because it strongly influences the binary point (analogous to the decimal point in a base 10 size of the chip and the number of package pins required, number) must be shifted left or right in order to obtain as well as the size of external memory devices connected the value represented. to the DSP. Therefore, designers try to use the chip with Floating-point arithmetic is a more flexible and gen- the smallest word size that their application can tolerate. eral mechanism than fixed-point. With floating-point, As with the choice between fixed- and floating-point system designers have access to wider dynamic range chips, there is often a trade-off between word size and (the ratio between the largest and smallest numbers that development complexity. For example, with a 16-bit can be represented). As a result, floating-point DSP pro- fixed-point processor, a programmer can perform dou- cessors are generally easier to program than their fixed- ble-precision 32-bit arithmetic operations by stringing point cousins, but usually are also more expensive and together an appropriate combination of instructions. (Of have higher power consumption. The increased cost and course, double-precision arithmetic is much slower than power consumption result from the more complex cir- single-precision arithmetic.) If the bulk of an application cuitry required within the floating-point processor, can be handled with single-precision arithmetic, but the which implies a larger silicon die. The ease-of-use ad- application needs more precision for a small section of vantage of floating-point processors is due to the fact that the code, the selective use of double-precision arithmetic in many cases the programmer doesn’t have to be con- may make sense. If most of the application requires more cerned about dynamic range and precision. In contrast, precision, a processor with a larger data word size is like- on a fixed-point processor, programmers often must ly to be a better choice. carefully scale signals at various stages of their programs to ensure adequate numeric precision with the limited Note that while most DSP processors use an instruc- dynamic range of the fixed-point processor. tion word size equal to their data word size, not all do. The Analog Devices ADSP-21xx family, for example, Most high-volume, embedded applications use fixed- uses a 16-bit data word and a 24-bit instruction word. point processors because the priority is on low cost and, often, low power. Programmers and algorithm designers Speed determine the dynamic range and precision needs of their A key measure of the suitability of a processor for a application, either analytically or through simulation, particular application is its execution speed. There are a and then add scaling operations into the code if neces- number of ways to measure a processor’s speed. Perhaps sary. For applications that have extremely demanding the most fundamental is the processor’s instruction cycle dynamic range and precision requirements, or where time: the amount of time required to execute the fastest ease of development is more important than unit cost, instruction on the processor. The reciprocal of the in- floating-point processors have the advantage. struction cycle time divided by one million and multi-

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 3 OF 8 plied by the number of instructions executed per cycle is A more general approach is to define a set of standard the processor’s peak instruction execution rate in mil- benchmarks and compare their execution speeds on dif- lions of instructions per second, or MIPS. ferent DSPs. These benchmarks may be simple algo- rithm “kernel” functions (such as FIR or IIR filters), or A problem with comparing instruction execution they might be entire applications or portions of applica- times is that the amount of work accomplished by a sin- tions (such as speech coders). Implementing these gle instruction varies widely from one processor to an- benchmarks in a consistent fashion across various DSPs other. Some of the newest DSP processors use VLIW and analyzing the results can be difficult. Our company, (very long instruction word) architectures, in which mul- Berkeley Design Technology, Inc., pioneered the use of tiple instructions are issued and executed per cycle. algorithm kernels to measure DSP processor perfor- These processors typically use very simple instructions mance with the BDTI Benchmarks™ included in our in- that perform much less work than the instructions typical dustry report, Buyer’s Guide to DSP Processors. Several of conventional DSP processors. Hence, comparisons of processors’ execution time results on BDTI’s FFT MIPS ratings between VLIW processors and conven- benchmark are shown in Figure 2. tional DSP processors can be particularly misleading, because of fundamental differences in their instruction Two final notes of caution on processor speed: First, set styles. For an example contrasting work per instruc- be careful when comparing processor speeds quoted in tion between Texas Instrument’s VLIW TMS320C62xx terms of “millions of operations per second” (MOPS) or and Motorola’s conventional DSP563xx, see BDTI’s “millions of floating-point operations per second” white paper entitled The BDTImark™: A Measure of (MFLOPS) figures, because different processor vendors DSP Execution Speed, available at www.BDTI.com. have different ideas of what constitutes an “operation.” For example, many floating-point processors are Even when comparing conventional DSP processors, claimed to have a MFLOPS rating of twice their MIPS however, MIPS ratings can be deceptive. Although the rating, because they are able to execute a floating-point differences in instruction sets are less dramatic than multiply operation in parallel with a floating-point addi- those seen between conventional DSP processors and tion operation. VLIW processors, they are still sufficient to make MIPS comparisons inaccurate measures of processor perfor- Second, use caution when comparing processor mance. For example, some DSPs feature barrel shifters clock rates. A DSP’s input clock may be the same fre- that allow multi-bit data shifting (used to scale data) in quency as the processor’s instruction rate, or it may be just one instruction, while other DSPs require the data to two to four times higher than the instruction rate, de- be shifted with repeated one-bit shift instructions. Simi- pending on the processor. Additionally, many DSP chips larly, some DSPs allow parallel data moves (the simulta- now feature clock doublers or phase-locked loops neous loading of operands while executing an instruction) that are unrelated to the ALU instruction be- 70 ing executed, but other DSPs only support parallel 60 moves that are related to the operands of an ALU instruc- Fixed-Point Floating-Point tion. Some newer DSPs allow two MACs to be specified 50 in a single instruction, which makes MIPS-based com- 40 parisons even more misleading. 30 One solution to these problems is to decide on a basic 20 operation (instead of an instruction) and use it as a yard- stick when comparing processors. A common operation 10 is the MAC operation. Unfortunately, MAC execution 0 times provide little information to differentiate between processors: on many DSPs a MAC operation executes in a single instruction cycle, and on these DSPs the MAC time is equal to the processor’s instruction cycle time.

And, as mentioned above, some DSPs may be able to do DSP56311 (150 MIPS) TMS320C5416 (160 MIPS) TMS320C6203 (300 MHz) MSC8101 (300 MHz) Pentium III GHz) (1 III-C Pentium GHz) (1 TMS320C6701 (167 MHz) considerably more in a single MAC instruction than oth- FIGURE 2. Execution times for a 256-point complex FFT, ers. Additionally, MAC times don’t reflect performance in microseconds (lower is better). on other important types of operations, such as looping, Note: Times are calculated for the fastest version of each processor projected to be available in June 2000. For the processors with on-chip , “-C” that are present in virtually all applications. indicates performance with cache pre-loaded.

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 4 OF 8 (PLLs) that allow the use of a lower-frequency external Some floating-point chips provide relatively little (or clock to generate the needed high-frequency clock on- no) on-chip memory, but feature large external data bus- chip. es. For example, the Texas Instruments TMS320C30 provides 6K words of on-chip memory, one 24-bit exter- Memory Organization nal address bus, and one 13-bit external address bus. In The organization of a processor’s memory subsystem contrast, the Analog Devices ADSP-21060 provides 4 can have a large impact on its performance. As men- Mbits of memory on-chip that can be divided between tioned earlier, the MAC and other DSP operations are program and data memory in a variety of ways. fundamental to many signal processing algorithms. Fast MAC execution requires fetching an instruction word As with most DSP features, the best combination of and two data words from memory at an effective rate of memory organization, size, and number of external bus- once every instruction cycle. There are a variety of ways es is heavily application-dependent. to achieve this, including multiported memories (to per- Ease of Development mit multiple memory accesses per instruction cycle), The degree to which ease of system development is a separate instruction and data memories (the “Harvard” concern depends on the application. Engineers perform- architecture and its derivatives), and instruction caches ing research or prototyping will probably require tools (to allow instructions to be fetched from cache instead of that make system development as simple as possible. On from memory, thus freeing a memory access to be used the other hand, a company developing a next-generation to fetch data). Figures 3 and 4 show how the Harvard digital cellular telephone may be willing to suffer with memory architecture differs from the “Von Neumann” poor development tools and an arduous development en- architecture used by many microcontrollers. vironment if the DSP chip selected shaves $5 off the cost Another concern is the size of the supported memory, of the end product. (Of course, this same company might both on- and off-chip. Most fixed-point DSPs are aimed reach a different conclusion if the poor development en- at the embedded systems market, where memory needs vironment results in a three-month delay in getting their tend to be small. As a result, these processors typically product to market!) have small-to-medium on-chip memories (between 4K and 64K words), and small external data buses. In addi- That said, items to consider when choosing a DSP are tion, most fixed-point DSPs feature address buses of 16 software tools (assemblers, linkers, simulators, debug- bits or less, limiting the amount of easily-accessible ex- gers, compilers, code libraries, and real-time operating ternal memory. systems), hardware tools (development boards and emu-

Processor Core

Processor Core

Address Bus 1

Data Bus 1 Address Bus

Address Bus 2 Data Bus

Data Bus 2

Memory

Memory A Memory B

FIGURE 3. A , common to many FIGURE 4. The Von Neumann memory architecture, DSP processors. The processor can simultaneously common among microcontrollers. Since there is only access the two memory banks using two independent one data bus, operands cannot be loaded while sets of buses, allowing operands to be loaded while instructions are fetched, creating a bottleneck that fetching instructions. slows the execution of DSP algorithms.

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 5 OF 8 lators), and higher-level tools (such as block-diagram- point processors typically support larger memory spaces based code-generation environments). A design flow us- than fixed-point processors, and are thus better able to ing some of these tools is illustrated in Figure 5. accommodate compiler-generated code, which tends to be larger than hand crafted assembly code. A fundamental question to ask when choosing a DSP is how the chip will be programmed. Typically, develop- VLIW-based DSP processors, which typically use ers choose either assembly language, a high-level lan- simple, orthogonal RISC-based instruction sets and have guage—such as C or Ada—or a combination of both. large register files, are somewhat better compiler targets Surprisingly, a large portion of DSP programming is still than traditional DSP processors. However, even compil- done in assembly language. Because DSP applications ers for VLIW processors tend to generate code that is in- have voracious number-crunching requirements, pro- efficient in comparison to hand-optimized assembly grammers are often unable to use compilers, which often code. Hence, these processors, too, are often pro- generate assembly code that executes slowly. Rather, grammed in assembly language—at least to some de- programmers can be forced to hand-optimize assembly gree. code to lower execution time and code size to acceptable Whether the processor is programmed in a high-level levels. This is especially true in consumer applications, language or in assembly language, debugging and hard- where cost constraints may prohibit upgrading to a high- ware emulation tools deserve close attention since, sad- er-performance DSP processor or adding a second pro- ly, a great deal of time may be spent with them. Almost cessor. all manufacturers provide instruction set simulators, Users of high-level language compilers often find which can be a tremendous help in debugging programs that the compilers work better for floating-point DSPs before hardware is ready. If a high-level language is than for fixed-point DSPs, for several reasons. First, used, it is important to evaluate the capabilities of the most high-level languages do not have native support for high-level language debugger: will it run with the simu- fractional arithmetic. Second, floating-point processors lator and/or the hardware emulator? Is it a separate pro- tend to feature more regular, less restrictive instruction gram from the assembly-level debugger that requires the sets than smaller, fixed-point processors, and are thus user to learn another user interface? better compiler targets. Third, as mentioned, floating- Most DSP vendors provide hardware emulation tools for use with their processors. Modern processors usually Assembly Object Code feature on-chip debugging/emulation capabilities, often Libraries Libraries accessed through a serial interface that conforms to the IEEE 1149.1 JTAG standard for test access ports. This Object Code serial interface allows scan-based emulation—program- Assembly Code Assembler Linker mers can load breakpoints through the interface, and then scan the processor’s internal registers to view and change the contents after the processor reaches a break- Binary point. Scan-based emulation is especially useful because Executable debugging may be accomplished without removing the Simulator Final Product processor from the target system. Other debugging meth- ods, such as pod-based emulation, require replacing the processor with a special processor emulator pod.

DSP Development Off-the-shelf DSP system development boards are Board available from a variety of manufacturers, and can be an important resource. Development boards can allow soft- In-Circuit Emulator ware to run in real-time before the final hardware is ready, and can thus provide an important productivity boost. Additionally, some low-production-volume sys- tems may use development boards in the final product. Debugger Multiprocessor Support FIGURE 5. The interaction among assembly Certain computationally intensive applications with language development tools. These tools include high data rates (e.g., radar and sonar) often demand mul- assemblers, linkers, libraries, simulators, development hardware, and in-circuit emulators. tiple DSP processors. In such cases, ease of processor in- © 1996-2000 Berkeley Design Technology, Inc.

PAGE 6 OF 8 terconnection (in terms of time to design interprocessor • Peripheral control. Some DSPs allow the program- communications circuitry and the cost of linking proces- mer to disable peripherals that are not in use. sors) and interconnection performance (in terms of com- Regardless of power management features, it is often munications throughput, overhead, and latency) may be difficult for design engineers to obtain meaningful pow- important factors. Some DSP families—notably the An- er consumption figures for DSPs. This is because a alog Devices ADSP-2106x—provide special-purpose DSP’s power consumption may vary by as much as a hardware to ease multiprocessor system design. factor of three depending on the instructions it executes. ADSP-2106x processors feature bidirectional data Unfortunately, most vendors publish only “typical” or and address buses coupled with six bidirectional bus re- “maximum” power consumption numbers, usually with- quest lines. These allow up to six processors to be con- out specifying what constitutes a “typical” program. One nected together via a common external bus with elegant exception is Texas Instruments, which provides applica- bus arbitration. Moreover, a unique feature of the ADSP- tion notes that detail power consumption vs. instruction 2106x processor connected in this way is that each pro- type and processor configuration. cessor can access the internal memory of any other Cost ADSP-2106x on the shared bus. Six four-bit parallel Obviously, processor cost is a major concern for communication ports round out the ADSP-2106x’s par- products that are to be produced in volume. For such ap- allel processing features. Interestingly, Texas Instru- plications, designers try to use the lowest cost DSP that ment’s newest floating-point processor, the VLIW-based meets the requirements of the application, even though TMS320C67xx, does not currently provide similar hard- such devices may be considerably less flexible and more ware support for multiprocessor designs, though it is difficult to program than costlier processors. Among possible that future family members will address this is- processor families, the least expensive family members sue. tend to have significantly fewer features, less on-chip Power Consumption and Management memory, and lower performance than the more expen- DSPs are increasingly being used in portable applica- sive members. tions (such as cellular phones and portable audio players) A key factor in processor pricing is the dependence where power consumption is a major concern. As a re- of price on device packaging. For example, plastic thin sult, many processor vendors are reducing processor quad flat pack (PQFP and TQFP) packages can be signif- supply voltages and adding power management features icantly less expensive than pin grid array (PGA) packag- to give programmers greater influence over processor es. power consumption. Power management features avail- able on some DSPs include: Finally, when considering prices, it is important to remember two things. First, processor prices are contin- • Reduced voltage operation. Many vendors offer ually falling. Second, prices are strongly dependent on low-voltage (3.3-, 2.5-, or 1.8-volt) versions of their quantity, and prices for, say, a quantity 100,000 order DSP processors. These processors consume far less may be significantly lower than for a quantity 1,000 or- power than five-volt equivalents at the same clock der. rate. • “Sleep” or “idle” modes. Most DSPs feature modes Summary that turn off the processor’s clock to all but certain sections of the processor, reducing power consump- Despite some manufacturers’ claims, there isn’t a tion. In some cases, any unmasked interrupt will single best DSP chip. Rather, the right DSP depends on bring the processor back from sleep mode, while in the application; a good choice for one application might other cases, only a few designated external interrupt be a poor choice for another. In this paper we have re- lines will wake the processor. Some processors pro- viewed a number of criteria useful for choosing a DSP: vide multiple sleep modes with different power sav- arithmetic format, data width, speed, memory organiza- ings and wakeup latencies. tion, ease of development, multiprocessor support, pow- er consumption, and cost. Which of these are most • Programmable clock dividers. Some DSPs allow important is a decision the system designer must make the processor’s clock frequency to be varied under based on his or her application. software control to use the minimum clock speed required for a particular task. We conclude by mentioning two trends in DSP pro- cessor design. First, we expect to see more DSP proces-

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 7 OF 8 sors tailored for specific high-volume applications, like Inside the Infineon Carmel, and DSP Processor cellular phones and portable digital audio players. The Fundamentals. processors will integrate more system functions, such as • Analyzes DSP algorithms and applications. analog-to-digital converters and LCD controllers, in an • Evaluates design tools and advises on tool selection effort to reduce product cost and parts count. Second, we and design methodologies. believe that more processors will be sold as licensable core designs, such as the Oak, Teak, and Palm cores of- • Develops specifications and recommendations for fered by DSP Group, and the Carmel and TriCore cores new DSP processors, software, and tools. offered by Infineon. As EDA (electronic design automa- • Provides DSP-related training classes. tion) tools advance in sophistication, system designers will find it easier to modify DSP cores and add custom peripherals to create highly specialized, cost-effective BERKELEY DESIGN TECHNOLOGY, INC. solutions for high-volume applications. 2107 Dwight Way, Second Floor Berkeley, CA 94704 USA References (510) 665-1600 Fax: (510) 665-1680 [1] Buyer’s Guide to DSP Processors, Berkeley, Cali- email: [email protected] fornia: Berkeley Design Technology, Inc., 1994, http://www.BDTI.com 1995, 1997, 1999. This 946-page technical report contains feature analyses and extensive benchmark- International Representatives ing data for most popular DSP processors. The EUROPE JAPAN report provides profiling data from actual applica- Cornelius Kellerhoff Shinichi Hosoya tions, such as modems and vocoders, to help design- Technology Products Japan Kyastem Co ers assess the importance of specific processor Dusseldorf, Germany Tokyo, Japan features. Excerpts from this report, as well as a +49 (211) 467 998 +81 (425) 23 7176 Fax: +49 (211) 467 999 Fax: +81 (425) 23 7178 pocket guide to DSP processors, are available on the [email protected] [email protected] World Wide Web at www.BDTI.com. [2] Phil Lapsley, Jeff Bier, Amit Shoham, and Edward A. Lee, DSP Processor Fundamentals: Architec- tures and Features, Berkeley, California: Berkeley Design Technology, Inc., 1996. An introductory textbook on DSP processor architectures which dis- cusses how chip design affects performance. [3] Will Strauss, DSP Strategies 2002, Tempe, Arizona: Forward Concepts, 1999. A 600-page report on the DSP chip market.

About Berkeley Design Technology Berkeley Design Technology, Inc. (BDTI) is a soft- ware and technical services company focused on digital signal processing (DSP) technology. The company was founded by U.C. Berkeley faculty and researchers. BDTI specializes in the analysis, benchmarking, evaluation, and development of technology used to im- plement DSP applications. Specifically, the company: • Performs in-depth technical evaluations of micro- processors. • Develops DSP application software and firmware. • Publishes technical reports and books on DSP tech- nology, including Buyer's Guide to DSP Processors,

© 1996-2000 Berkeley Design Technology, Inc.

PAGE 8 OF 8