Last Time

General model of Assembly Languages II • Undifferentiated sequence of instructions • Arithmetic instructions (ADD, SUB) • Control-flow (JMP, CALL, RET)

Four main types: CISC, RISC, DSP, and VLIW Prof. Stephen A. Edwards CISC • Few, special-purpose registers • Complex addressing modes with contributions from Prof. Brian Evans, • Powerful instruction (e.g., string move)

Niranjan Damera-Venkata and RISC Magesh Valliappan, UT Austin • Many general-purpose registers • Few addressing modes • Arithmetic operations don’t touch memory • Simple instructions

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Digital Signal Processor Apps. Conventional DSP Architecture

Low-cost embedded systems Harvard architecture

• Modems, cellular telephones, disk drives, printers • Separate data memory/ and program memory/bus • Three reads and one or two writes per instruction cycle High-throughput applications Deterministic service routine latency • Halftoning, base stations, 3-D sonar, tomography Multiply-accumulate in single instruction cycle PC based multimedia Special addressing modes supported in hardware • Compression/decompression of audio, graphics, video • Modulo addressing for circular buffers for FIR filters Embedded processor requirements • -reversed addressing for fast Fourier transforms • Inexpensive with small area and volume Instructions to keep the pipeline (3-4 stages) full • Deterministic interrupt service routine latency • Zero-overhead looping (one pipeline flush to set up) • Low power: ~50 mW (TMS320C54x uses 0.36 µA/MIPS) • Delayed branches

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Conventional DSPs Conventional DSPs

Fixed-Point Floating-Point ¡ Market share: 95% fixed-point, 5% floating-point

Cost/Unit $5 - $79 $5 - $381 ¡ Architecture Accumulator load-store or Each processor comes in dozens of configurations memory-register • Data and program memory size Registers 2-4 data, 8 address 8-16 data, 8-16 address • Peripherals: A/D, D/A, serial, parallel ports, timers

Data Words 16 or 24 bit 32 bit ¡ Chip Memory 2-64K data and program 8-64K data and program Drawbacks Address 16-128K data, 16M – 4Gdata, • No byte addressing (needed for image and video) Space 16-64K program 16M – 4G program • Limited on-chip memory Compilers Bad C Better C, C++ • Limited addressable memory on most fixed-point DSPs

Examples TI TMS320C5x; TI TMS320C3x; • Non-standard C extensions to support fixed-point data Motorola 56000 Analog Devices SHARC

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

1 DSP Example 56001 Programmer’s Model

¡ Finite Impulse Response filter (FIR) 55 48 47 24 23 0 15 0 x1 x0 Source Program counter ¡ Can be used for lowpass, highpass, bandpass, etc. y1 y0 Registers Status Register

¡ a2 a1 a0 Loop Address Basic DSP operation Accumulators b2 b1 b0 Loop Count Pointer Offset Modifier ¡ 15 For each sample, computes 15 0 15 0 15 0 … -1 -1 -1 PC Stack z z z r7 n7 m7 0 k r6 n6 m6 y = Σ a x r5 n5 m5 15 n i n+i … i=0 r4 n4 m4 Address SR Stack

r3 n3 m3 Registers 0 ¡ a0 … ak are filter coefficients r2 n2 m2

¡ Stack Pointer x and y are the nth input and output sample r1 n1 m1 n n r0 n0 m0 Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

56001 Datapath 56001 Memory Spaces

x1 ¡ Three memory regions, each 64K: x0 • 24-bit Program memory y0 • 24-bit X data memory y1 • 24-bit Y data memory

multiplier Y bus X bus ¡ Idea: enable simultaneous access of program, shifter alu sample, and coefficient memory

¡ Three on-chip memory spaces can be used this way

a2 a1 a0 ¡ b2 b1 b0 One off-chip memory pathway connected to all three memory spaces

shifter/limiter ¡ Only one off-chip access per cycle maximum

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

56001 Address Generation FIR Filter in 56001

¡ Addresses come from pointer register r0 … r7 n equ 20 Define symbolic constants

¡ Offset registers n0 … n7 can be added to pointer start equ $40 samples equ $0 ¡ Modifier registers cause the address to wrap around coefficients equ $0 Addresses of ¡ Zero modifier causes reverse-carry arithmetic input equ $ffe0 memory-mapped I/O Address Notation Next value of r0 output equ $ffe1 “Locate this in program r0 (r0) r0 memory at $40” org p:start r0 + n0 (r0+n0) r0 “Initialize pointers to r0 (r0)+ (r0 + 1) mod m0 move #samples, r0 samples and r0 – 1 –(r0) r0 – 1 mod m0 move #coefficients, r4 coefficients” r0 (r0)– (r0 – 1) mod m0 move #n-1, m0 r0 (r0)+n0 (r0 + n0) mod m0 move m0, m4 “Prepare to treat these as r0 (r0)-n0 (r0 – n0) mod m0 circular buffers of size n”

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

2 FIR Filter in 56001 FIR Filter in 56001 “Load a sample from an I/O movep y:input, x:(r0) device in Y data memory” movep y:input, x:(r0)

“Load a sample from X memory clr a x:(r0)+, x0 y:(r4)+, y0 “Clear accumulator A” into x0, advance the pointer” “Repeat the next instruction n-1 times” clr a x:(r0)+, x0 y:(r4)+, y0 “Fetch next sample and coefficient” rep #n-1 “Load a coefficient from Y memory into y0, advance the pointer” mac x0,y0,a x:(r0)+, x0 y:(r4)+, y0 rep #n-1 “a = a + x0 * y0” mac x0,y0,a x:(r0)+, x0 y:(r4)+, y0 macr x0,y0,a (r0)-

macr x0,y0,a (r0)- movep a, y:output

movep a, y:output

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

FIR Filter in 56001 TI TMS320C6000 VLIW DSP

movep y:input, x:(r0) ¡ Eight instruction units dispatched by one very long instruction word

clr a x:(r0)+, x0 y:(r4)+, y0 ¡ Designed for DSP applications

¡ Orthogonal instruction set

rep #n-1 ¡ Big, uniform register file (16 32-bit registers) mac x0,y0,a x:(r0)+, x0 y:(r4)+, y0 ¡ Better compiler target than 56001

macr x0,y0,a (r0)- “Get ready for the ¡ Deeply pipelined (up to 15 levels) next sample” ¡ Complicated, but more regular, datapath movep a, y:output “a = a + x0 * y0 and round the result”

“Write the filtered result to an I/O device in Y data memory” Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Pipelining on the C6 ’C6 Datapath

¡ One instruction issued per clock cycle A0 B0 … … ¡ Very deep pipeline • 4 fetch cycles A15 B15 • 2 decode cycles • 1-10 execute cycles

¡ Branch in pipeline disables

¡ Conditional instructions avoid branch-induced stalls .L1 .S1 .M1 .D1 .D2 .M2 .S2 .L2

¡ No hardware to protect against hazards • Assembler or compiler’s responsibility

D A A D

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

3 ’C6 Datapath FIR in ’C6 Assembly

¡ Two identical halves B0 “Load a halfword (16 )” … ¡ Each has “Do this on unit D1” FIRLOOP: • 16 32-bit registers B15 LDH .D1 *A1++, A2 ; Fetch next sample • Logical/Arithmetic (.L) || LDH .D2 *B1++, B2 ; Fetch next coefficient • Shifter/Branching (.S) || [B0] SUB .L2 B0, 1, B0 ; Decrement loop count • Multiplier (.M) || [B0] B .S2 FIRLOOP ; Branch if non-zero • Data/Memory (.D) .D2 .M2 .S2 .L2 || MPY .M1X A2, B2, A3 ; Sample * Coefficient || ADD .L1 A4, A3, A4 ; Accumulate result ¡ One cross path X: “Use the cross path” predicated instruction: “Execute only if B0 is non-zero” “Run all of these A D instructions in parallel”

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Peripherals Example: 56001 Port C

¡ Often the whole point of the system Nine pins each usable in one of two ways

• Simple parallel I/O ¡ Memory-mapped I/O • Serial interface • Magical memory locations that make something happen or change on their own Parallel Serial

PC0 RxD ¡ Typical meanings: PC1 TxD Serial Communication Interface (SCI) • Configuration (write) PC2 SCLK • Status (read) • Address/Data (access more peripheral state) PC3 SC0 PC4 SC1 PC5 SC2 PC6 SCK Synchronous Serial Interface (SSI) PC7 SRD PC8 STD

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Port C Registers for Parallel Port Port C Registers for Parallel Port ¡ ¡ Port C Control Register Port C Data Register

• Selects mode (parallel or serial) of each pin • Returns input data or sets output state of parallel port X: $FFE1 X: $FFE5

0 = parallel I/O Read: pin state 1 = serial I/O Write: set output pin state ¡ Port C Data Direction Register • I/O direction when used in parallel mode X: $FFE3

0 = Input 1 = Output

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

4

Port C SCI Port C SCI Registers ¡ Three-pin interface X: $FFF0 ¡ 422 Kbit/s NRZ asynchronous interface (RS-232-like)

Word select bits ¡ ¡ 3.375 Mbit/s synchronous serial mode SCI Control Shift direction Send break ¡ Register Multidrop mode for multiprocessor systems Wakeup mode select ¡ Two Wakeup modes Receiver wakeup enable Wired-OR mode select • Idle line Receiver Enable • Address bit Transmitter Enable ¡ Wired-OR mode Idle line interrupt enable

¡ Receive interrupt enable On-chip or external baud rate generator Transmit interrupt enable ¡ Four interrupt priority levels Timer interrupt enable Clock polarity

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Port C SCI Registers Port C SCI Registers

X: $FFF1 X: $FFF2

Transmitter Empty ¡ ¡ SCI Status Transmitter Reg Empty SCI Clock Clock Divider Bits Register (read- Receive Data Full Control Register Clock Output Divider only) Idle Line Clock Prescaler Overrun Error Receive Clock Source Parity Error Transmit Clock Source Framing Error Received bit 8

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

Port C SSI Port C SSI Registers ¡ ¡ Intended for synchronous, constant-rate protocols $FFEC SSI Control Register A

• Easy interface to serial ADCs and DACs • prescaler, frame rate, word length ¡ ¡ Many more operating modes than SCI $FFED SSI Control Register B

¡ Six Pins (Rx, Tx, Clk, Rx Clk, Frame Sync, Tx Clk) • Interrupt enables, various mode settings ¡ ¡ 8, 12, 16, or 24-bit words $FFEE SSI Status/Time Slot Register • Sync, empty, overrun

¡ $FFEF SSI Receive/Transmit Data Register

Copyright © 2001 Stephen A. Edwards All rights reserved Copyright © 2001 Stephen A. Edwards All rights reserved

5