DSP 1

Subra Ganesan

Professor, Computer Science and Engineering
Associate Director, Product Development and Manufacturing Center,
Oakland University, Rochester, MI 48309
Email: [email protected]

Topics Covered:
1. Introduction to DSP Processors
2. Fixed Point DSP- C24x
3. Floating Point DSP- C6711
4. Code Composer Studio
5. DSP/BIOS for C6711
6. External Memory Interface for C6711
7. – C6711
8. Applications

DSP Microprocessor – Advances and Automotive Applications

• Advances in circuit technology, architecture, algorithms and VLSI design techniques have contributed to high-performance Digital Signal Processing (DSP) microprocessors and to a multitude of novel applications of DSP chips.
• DSP processors are RISC based, with fast arithmetic units, on-chip memory, analog interfaces, serial ports, timers, counters, facilities for inter-processor communication and other special features.

The Microprocessor overview

1949  Transistors
1958  Integrated Circuits
1961  ICs IN Quality
1964  Small Scale IC (SSI), Gates
1968  Medium Scale IC (MSI), Registers
1971  Large Scale IC (LSI), Memory, CPU
1972  8 BIT MICROPROCESSORS
1973  16 BIT MICROPROCESSORS
1982  32 BIT MICROPROCESSORS
1984  DSP MICROPROCESSORS – I GENERATION
1986  DSP MICROPROCESSORS – II GENERATION
1988  DSP MICROPROCESSORS – III GENERATION
1989  RISC MICROPROCESSORS – II QUALITY
1990  MISC (MINIMUM INSTRUCTION SET MICROPROCESSOR)

MICROPROCESSOR OVERVIEW

Microprocessor             Number of transistors   Performance                                    Number of instructions
4 bit Intel 4004 (1971)    2300                                                                   45
68000                      70000                   0.5 MIPS                                       80 different; 14 address modes; sizes B, W, L
TMS 320C80 (32 bit)                                2 billion RISC operations per second [BOPs]

INTRODUCTION TO DSP MICROPROCESSORS

DSP micros are reduced-instruction-set computers optimized for the fastest possible execution of the following instructions:
• Addition
• Subtraction
• Multiplication
• Shifting

DSP micros perform single-cycle multiplication and shifting using an ARRAY multiplier and a barrel (or combinational) shifter. In contrast, general-purpose micros carry out such operations via multiple-cycle, microcoded instructions that make use of the ALU's single-cycle, parallel-add, single-bit-shift capability. DSP micros do each multiply/accumulate in a single cycle (e.g., 100 ns).
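To make the contrast concrete, the following is a minimal Python sketch of a multiply built only from adds and single-bit shifts, one step per multiplier bit, which is roughly the kind of sequence a microcoded multi-cycle multiply works through. The function name `shift_add_multiply` and the returned step count are illustrative assumptions; the sketch does not model the timing of any particular processor.

```python
def shift_add_multiply(a, b, bits=16):
    """Multiply two unsigned integers using only add and single-bit shift.

    Returns (product, steps). steps counts loop iterations, one per
    multiplier bit, as a rough stand-in for the cycles a microcoded
    multiply would spend.
    """
    if not (0 <= a < (1 << bits) and 0 <= b < (1 << bits)):
        raise ValueError("operands must fit in the given width")
    product = 0
    steps = 0
    for _ in range(bits):
        if b & 1:        # low multiplier bit set: add the shifted multiplicand
            product += a
        a <<= 1          # single-bit left shift of the multiplicand
        b >>= 1          # single-bit right shift of the multiplier
        steps += 1
    return product, steps


product, steps = shift_add_multiply(1234, 567)
print(product, steps)
```

An array multiplier forms all of these partial products in combinational logic at once, which is why the DSP needs no such loop.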

• For 80386: Add (16 bit addition) = 125 ns (16 MHz)
  (IMUL) 16 bit * 16 bit multiplication = 1250 ns

DSP micros employ
• Pipelining of instructions
• Addressing modes that efficiently access the relevant data structures (e.g., auto-increment and auto-decrement modes for arrays, indexed addressing modes for FFTs)

Dual-bus HARVARD ARCHITECTURE, which enables
• Simultaneous fetching of data and instructions
• Special DSP-related addressing modes (e.g., index computation modulo an arbitrary number, automatic circular queue or free data move for FIR filters, bit reversal for FFTs)
• Extra addressing, multiple ALUs
• Special interfaces to serve specific fields of application (e.g., serial interfaces for CODECs in telecommunications)

Progress in new technologies, such as gallium arsenide (GaAs) transistors and high electron mobility transistors (HEMT), will increase the performance of future DSP microprocessors.

The 80386 computes a 1024 point FFT only 66% slower than a 20 MHz TMS 32010. New-version general-purpose micros with DSP-like dual bus structures (e.g., Motorola 68030), array multiplier, barrel shifter and GaAs/HEMT technology can achieve a performance of 100 MIPS and upwards.
• TMS 32010 = 5 MIPS
• TMS 320C25 = 10 MIPS
• Motorola 56000 = 10.25 MIPS (24 bit data)
• TMS 320C6201 = 1600 MIPS

FLOATING-POINT DIGITAL SIGNAL PROCESSING CHIPS

A floating-point DSP can perform floating-point arithmetic, including multiply-accumulate operations, with an increased degree of parallelism.

The design phase is often performed with the aid of a high-level language or a commercial, DSP-oriented “design system” that yields a non-real-time, floating-point simulation on a general-purpose computer.

The new generation of floating-point digital signal processors includes the AT&T DSP32C, Motorola DSP96002, and Texas Instruments TMS320C30. A typical development system could involve
• An iconic graphical interface (implemented in PC software)
• A computer
• A PC plug-in board containing a floating-point DSP micro chip
• A memory system

The NeXT PC is the first to incorporate a DSP micro. The on-board Motorola fixed-point DSP56001 is complemented by numerous “canned” procedures. These procedures enable graphics and signal processing tasks to be carried out at rates orders of magnitude faster than is possible with the on-board MC68882 floating-point co-processor.

The cycle of improvement in functionality and performance for both general-purpose and DSP micros continues. Architectures incorporating such structures as systolic arrays and neural networks will replace those now considered conventional.

DSP APPLICATIONS CHARACTERISTICS

1. Algorithms are mathematically intensive, e.g., for an FIR filter

$$y(n) = \sum_{i=0}^{N-1} a(i)\, x(n-i)$$

where
y(n) = output samples
a(i) = coefficients
x(n-i) = input samples
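The FIR sum for y(n) can be sketched directly in Python to show where the arithmetic load comes from: one multiply-accumulate per coefficient for every output sample. The function name `fir_output`, the example coefficients, and the choice to treat samples before the start of the record as zero are assumptions made for illustration.

```python
def fir_output(coeffs, samples, n):
    """Return y(n) = sum over i of a(i) * x(n - i).

    Samples before the start of the record (n - i < 0) are taken as zero.
    """
    total = 0.0
    for i, a in enumerate(coeffs):
        if n - i >= 0:
            total += a * samples[n - i]   # one multiply-accumulate per tap
    return total


coeffs = [0.25, 0.5, 0.25]            # a(0), a(1), a(2): a small smoothing filter
samples = [1.0, 2.0, 3.0, 4.0]        # x(0) .. x(3)
outputs = [fir_output(coeffs, samples, n) for n in range(len(samples))]
print(outputs)
```

On a DSP micro each pass through the inner loop body maps to a single-cycle multiply/accumulate, so the cost per output sample grows linearly with the number of coefficients.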

2. Real time performance
e.g. speech recognition; image processing within a frame update period

3. Sample Input Signal
DSP processor must effectively handle sampled data in large quantities.

DSP processors must be flexible to accommodate changing algorithms, new DSP processors, etc.

The DSP Environment: Definitions

Analog Signal -> Lowpass Filter (LPF1) -> A/D Converter -> DSP Processor -> D/A Converter -> Lowpass Filter (LPF2) -> Analog Signal

A simple digital filter system

[Figure: FIR filter structure. The input is sampled at fs and converted by the A/D to x(n). A chain of registers R holds x(n), x(n-1), x(n-2), ..., x(n-N+1). Each value is multiplied by its coefficient a(0), a(1), a(2), ... and the products are summed to give y(n), which the D/A converts to y(t).]

Where
fs = sampling frequency
a(0), ..., a(i) = coefficients
y(n) = digital output
y(t) = analog output

As long as the system samples the analog input at a frequency fs that is at least twice the information bandwidth of that input, all information present in the original analog signal is contained in the digital signal. A/D conversion introduces quantization noise. The signal to quantization noise ratio (SQNR) is a function of the A/D's accuracy.
• The DSP stores the current A/D sample and the N-1 previous samples in a sample shift register, or in RAM, which can simulate the shift register function by modifying pointers.

• The coefficients a(i) are stored in ROM or RAM, and they determine the impulse response and filter characteristics.
• A large N gives a longer impulse response and generally produces filters with sharper roll-off, greater stop band attenuation, and less frequency ripple.
• This filter is called an Nth order finite impulse response (FIR) digital filter (it has no feedback path).
• The FIR filter requires N multiplies and N-1 additions to compute an output y(n) each time the input signal is sampled.
• Some DSP applications involve sampling rates of up to 100 MHz and 100 MIPS.

SHANNON’S SAMPLING THEORY

An analog signal containing a maximum frequency of f1 Hz may be completely represented by regularly spaced samples, provided the sampling rate is at least 2f1 samples per second.

fs = 2f1 is the Nyquist sampling rate.
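A small Python sketch can show what happens to a single tone on either side of the 2f1 limit. It assumes an ideally sampled, pure sinusoid; the function name `apparent_frequency` and the 8 kHz sampling rate are chosen only for illustration.

```python
def apparent_frequency(f, fs):
    """Frequency (Hz) at which a tone of f Hz appears after sampling at fs Hz."""
    folded = f % fs                    # sampling cannot tell f from f + k*fs
    return min(folded, fs - folded)    # fold the result into the range 0 .. fs/2


fs = 8000.0
for f in (1000.0, 3000.0, 5000.0, 7000.0):
    print(f, apparent_frequency(f, fs))
```

Tones at or below fs/2 keep their frequency, while a tone above fs/2 is indistinguishable, after sampling, from a lower-frequency tone inside that range.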

If the signal is sampled at less than the 2f1 rate, aliasing error occurs. The signal is then represented with distortion, which depends on the degree of aliasing.
• Use an anti-aliasing filter: a low-pass filter with cut-off frequency at f1 (or fs/2)

Quantization Noise (Qe)

a(t) -> A/D (n bit)

$$Q_e = \pm \frac{V_{ref}}{2 \cdot 2^{n}}$$

e.g. V ref = 5 V, n = 8, then Qe = ±5 / 512 V
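The worked value can be checked with a short Python sketch. It assumes a unipolar converter with input range 0 to V ref that rounds to the nearest of 2^n levels; the function names `quantization_error` and `quantize` are hypothetical.

```python
def quantization_error(v_ref, n_bits):
    """Maximum quantization error Qe = V_ref / (2 * 2**n) for an n-bit A/D."""
    return v_ref / (2 * 2 ** n_bits)


def quantize(v, v_ref, n_bits):
    """Round v to the nearest of 2**n_bits levels and return that level's voltage.

    Inputs within half a step of v_ref clip to the top code, so the
    Qe bound applies only below that point.
    """
    step = v_ref / 2 ** n_bits
    code = min(int(v / step + 0.5), 2 ** n_bits - 1)
    return code * step


qe = quantization_error(5.0, 8)
print(qe)                                   # 5 / 512 volts
print(abs(quantize(1.234, 5.0, 8) - 1.234) <= qe)
```

Each extra bit halves Qe, which is the sense in which the SQNR depends on the accuracy of the A/D.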

[Figure: Aliasing in the frequency domain, |G(f)| plotted against f with fs/2 and fSAMP marked. Panels: (a) Input spectrum, (b) Sampled spectrum, (c) Reconstructed spectrum]

[Figure: Aliasing in the time domain. Panels: (a) Input continuous time signal g(t), (b) Sampled signal gr(t), (c) Reconstructed signal]

LINEAR SYSTEM
A linear system obeys the principle of superposition. If an input consisting of a number of signals is applied to a linear system, then the output is the sum, or superposition, of the system’s responses to each signal considered separately.

FREQUENCY PRESERVATION PROPERTY
If we apply a complicated signal containing many frequencies, the output must be the sum of the outputs due to each input frequency, considered separately. The output contains only those frequencies present in the input.

TIME INVARIANT SYSTEM
It is one whose properties do not vary with time.

• LTI: Linear Time Invariant. The LTI associative property means that we may analyze a complicated LTI system by breaking it down into a number of simpler subsystems.
• Commutative Property: the subsystems can be arranged in series, or cascaded, in any order without affecting the overall performance.
• Causal System: the output depends only on the present and/or previous values of the input.
• Stable System: one that produces a finite, or bounded, output in response to a bounded input.
• Invertibility: if a system with input x[n] gives an output y[n], then its inverse would produce x[n] if fed with y[n].

BIT REVERSED ADDRESSING
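Bit-reversed addressing, the subject of this section, advances an address register by adding an index whose carry runs from the most significant bit toward the least significant bit. The Python sketch below is a software picture of that update, not C3x code. The helper names are hypothetical, and the choice of half the FFT size as the index value is an assumption made for illustration.

```python
def reverse_bits(value, width):
    """Return value with its lowest `width` bits in reversed order."""
    result = 0
    for _ in range(width):
        result = (result << 1) | (value & 1)
        value >>= 1
    return result


def reverse_carry_add(addr, step, width):
    """Add step to addr with the carry propagating from MSB toward LSB."""
    total = reverse_bits(addr, width) + reverse_bits(step, width)
    return reverse_bits(total & ((1 << width) - 1), width)


width = 3                        # 8-point FFT: 3 address bits
index = 1 << (width - 1)         # assumed index value: half the FFT size
addr, order = 0, []
for _ in range(1 << width):
    order.append(addr)
    addr = reverse_carry_add(addr, index, width)
print(order)
```

The sequence visits every element of the 8-entry buffer once, in the scrambled order in which a radix-2 FFT reads or writes its data.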

Bit-reversed addressing is a special type of indirect addressing. It is used for implementing the FFT.

    *ARn++(IR0)B

After the operand is fetched, ARn is updated to (ARn + IR0) in a reversed carry propagation format.

CIRCULAR ADDRESSING
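Circular addressing lets a fixed block of memory act as a delay line: the pointer advances after each access and wraps back to the top of the block once it passes the bottom. The Python class below is a minimal software sketch of that behaviour. The class and method names are hypothetical, and `size` and `ptr` only play the roles of the block-size and address registers.

```python
class CircularBuffer:
    """Fixed-size delay line addressed with a wrapping pointer."""

    def __init__(self, size):
        self.data = [0.0] * size
        self.size = size      # plays the role of the block-size register
        self.ptr = 0          # plays the role of the address register

    def push(self, sample):
        """Store a sample, then post-increment the pointer modulo the block size."""
        self.data[self.ptr] = sample
        self.ptr = (self.ptr + 1) % self.size

    def delayed(self, k):
        """Return the sample written k pushes ago (k = 1 is the newest)."""
        return self.data[(self.ptr - k) % self.size]


buf = CircularBuffer(4)
for x in [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]:
    buf.push(x)
print(buf.data, buf.delayed(1), buf.delayed(4))
```

No samples are ever moved; only the pointer changes, which is why the hardware can keep the delay line current at no extra cycle cost.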

A circular buffer is necessary to implement the delays associated with convolution and correlation equations. The block size is held in register BK.

    *AR1++    ; AR1 is incremented each time until it points to the bottom of the circular buffer. After that it will point to the top of the buffer.

REPEAT INSTRUCTION

A block of instructions is repeated ‘count’ number of times using RPTB. RC contains the count number.

            LDI   8, RC
            RPTB  Label1
            CALL  filter
            FIX   R0
    Label1  STI   R0, *AR3

The RPTS instruction repeats the next instruction ‘count’ number of times.

PARALLEL INSTRUCTION

The symbol ‘||’ indicates parallel operation.

            LDF   0, R0
            LDI   29, AR2
            RPTS  AR2
            MPYF  *AR0++, *AR1++, R0    ; writes the new value of R0
    ||      ADDF  R0, R2, R2            ; parallel operation, uses the old value of R0

MPYF: multiply floating-point numbers.

DELAYED BRANCH

A conditional or unconditional delayed branch allows the subsequent 3 instructions to be fetched and executed. This gives the effect of a single cycle branch.

            BD    Loop          ; delayed branch
            ADDF  R0, R1        ; }
            FIX   R1            ; } executed whether
            STI   R1, *AR3      ; } branch is taken or not
    Loop

Standard branches empty the pipeline before branching. This results in the branch taking 4 cycles to execute.

DSP CHIPS

• Analog Devices ADSP 2100, 21020
• AT&T DSP 16, 32
• DSP Semiconductors Pine (16 bit fixed point)
• Motorola 56100, 96000
• NEC uPD 77C25 (16 bit fixed point), 77220 (24 bit fixed point)
• SGS Thomson ST 18 (16 bit fixed point)
• Star Semiconductor SPROC 1000 (24 bit fixed point)
• Texas Instruments TMS3201x, 2x, 3x, 4x, 80, 6xx
• Zilog Z89 Cxx (16 bit fixed point DSP)
• Xilinx DSP FPGA

MARKET SHARE

• TI 46.7%
• AT&T 18.7%
• MOTOROLA 15%
• AD 9.3%
• NEC 8.4%
• OTHER 1.9%

DSP Vs Microcontroller

Microcontroller                          DSP
• Multicycle instruction set             Single cycle instruction set
• Multicycle multiply                    Single cycle multiply
• 8 or 16 bit support                    16/32 bit fixed or floating point
• Limited on-chip RAM                    Large on-chip data RAM
• Limited data pointers                  Data pointers
• Limited BW and limited algorithms      Speed!