StrongARM and Bridges Division

StrongARM

CEG/SBD SA-1500:SA-1500: AA 300300 MHzMHz RISCRISC CPUCPU withwith AttachedAttached MediaMedia Processor*Processor*

Prashant P. Gandhi, Ph.D. StrongARM and Bridges Division Computing Enhancement Group Corporation Santa Clara, CA 95052 [email protected]

*SA-1500 was designed by Digital Semiconductor’s Palo Alto Design Team.

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 1 StrongARM and Bridges Division

StrongARM OutlineOutline

CEG/SBD

‹ High Level Overview ‹ System Architecture ‹ SA-1500 Architecture ‹ Programming Model ‹ Companion Chip for Video ‹ Summary

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 2 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM DesignDesign GoalsGoals CEG/SBD ‹ High Performance for Multimedia Applications ‹ Flexible, General Purpose Architecture ‹ Easily Targeted for Different Market Segments ‹ Lower Cost for High Volume ‹ Low Power

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 3 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM SpecificationsSpecifications CEG/SBD

FEATURE DESCRIPTION Technology 0.28 micron CMOS 3.3 Million Clock Frequency 200-300 MHz Die Size 60 mm2 Power Consumption <0.5 W @ 100MHz <2.5 W @ 300MHz Voltage Internal: 1.5-2.0V I/O: 3.3V Packaging 240-pin MQFP 256-pin PBGA Clocking Dynamic clk freq. switching

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 4 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM PerformacePerformace TargetTarget CEG/SBD ‹ Replace and DSP-HW in high-volume applications ‹ Numeric and General-Purpose Processing z Multimedia, DSP, and FP Applications z MPEG-2 MP@ML with power to spare z Modembank and multi-channel voice over IP z Video conferencing ‹ Soft/scalable architecture ‹ High sustained memory bandwidth ‹ High multimedia code density ‹ Ease of programming

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 5 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM The System CEG/SBD

ROM Network I/F IDE / PCMCIA Bus I/F DRAM

P-Bus SA1500 . . Computation Chip Companion Chip . . GP Processing Network Connection .  DSP, MM, FP Audio/Video Processing . DMA, Mem Control Low Speed Logic & I/O IR O

S-BUS

Video Audio SDRAM

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 6 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM TheThe SystemSystem -- DataData FlowFlow CEG/SBD

ROM Network I/F IDE / PCMCIA Bus I/F DRAM SA1500 Video/Audio  Computation Chip Companion Chip . .  . IR O

Video Audio SDRAM

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 7 StrongARM and Bridges Division SA-1500SA-1500 ArchitectureArchitecture StrongARM ProcessorProcessor BlockBlock DiagramDiagram CEG/SBD

Prefetch Streaming Buffer Buffer 256B 128B

64-bit AMP: StrongARM W Multimedia, 16KB Execution 16KB C DSP, and FPU DCache Unit ICache S Engines Mini DCache 4KB 300MHz 1KB 300MHz

100MHz BIU 50MHz SDRAM Bus Mem Ctrl/DMA I/O Bus

WCS = Writable Control Store

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 8 StrongARM and Bridges Division SA-1500 CPU Attributes StrongARM SA-1500 CPU Attributes

CEG/SBD ‹ Semi-autonomous dual-processor CPU z Master-slave during invocation of AMP z Autonomous execution thereafter ‹ Non-homogeneous instruction set z 32-bit ARM instructions z 64-bit Long Instruction Word (LIW) AMP instructions (Arith || Mem/JMP) ‹ Single-instruction Multiple-Data (SIMD) AMP Processing z single-FMAC (32-bit), dual-IMAC (18-bit), quad-SAD (9-bit) ‹ Memory Architecture z Shared external memory z Shared Data Cache and Write Buffer z Separate Instruction Caches (Icache for ARM, WCS for AMP) z Separate Stream (pre-fetch) Buffers

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 9 StrongARM and Bridges Division SA-1500SA-1500 MemoryMemory ControllerController andand BusBus StrongARM InterfaceInterface CEG/SBD ‹ 100 MHz SDRAM Memory Controller (SBUS): z Programmable bus clock z 32-bit mode

‹ 50 MHz General I/O Bus (PBUS): z Programmable bus clock z Direct Connection to: – PC-style I/O devices – ROM/SRAM/DRAM/Flash – Bus adapter to PCI, PCMCIA

‹ DMA controller with 15 descriptor-based channels

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 10 StrongARM and Bridges Division AMP Block Diagram StrongARM AMP Block Diagram

CEG/SBD

FPU Load/Store Mem Acc

MAC 64 GPRs Acc

ALU/Shift Ctl & Status Branch WCS Acc Regs

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 11 SA-1500 Chip Micrograph StrongARM and Bridges Division

StrongARM

CEG/SBD

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 12 StrongARM and Bridges Division Processor Features StrongARM Processor Features

CEG/SBD

StrongARM AMP ‹ 32 bit RISC, ARM V4 Architecture ‹ Dual Issue instructions ‹ 16K Icache, 16K Dcache, Write buff 1x SIMD Arithmetic (ALU, MAC, FPU) 1x Memory or Branch ‹ High Code Density ‹ Dual personality ‹ All Instructions Conditional or parallel processor ‹ Operand Shift Support for ALU Inst ‹ Variety of Register DataTypes ‹ Powerful 32 x 32 + 64 -> 64 MAC 8/9b, 16/18b, 32/36b, 32-bit FP 64 registers (36 bits wide) ‹ Fully Pipelined Operations ‹ 512-instruction WCS

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 13 StrongARM and Bridges Division AMP Operations StrongARM AMP Operations

CEG/SBD

‹ Fully pipelined ‹ Fully pipelined Arithmetic operations Memory and Branch operations

‹ ‹ ALU/MAC/FPU Operations: Load/Store/Branch Operations: z Reg+Disp, Reg+Scaled-Index z 8b mul with scale/round/sat z Reverse-byte, replicate, texel loads z 9b averaging, pack/unpack z Transpose stores with auto-update z 9/18b add, sub, motion estimation z Single-inst compare and branches z 18/36b mul, mul-add, mul-sub with scale/round/sat z Decrement and branch z 32/36b add, sub, shift, logical z Single Precision FPU: mul-add, mul-sub, div, sqrt

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 14 StrongARM and Bridges Division SA-1500 Instruction Execution StrongARM SA-1500 Instruction Execution

CEG/SBD ‹ Parallel Method ‹ Tightly Coupled Method z StrongARM “library call” to WCS entry z Issue an “Execute” or a “Memory/Branch” z USAGE: operation in StrongARM inst stream MPEG-II Decoding z Supported by the C compiler 3-D/Graphics Libraries Application-Specific Functions StrongARM AMP EXU AMP MBU 1 X StrongARM AMP EXU AMP MBU 2 X 1X 3 X 2 XX X 4 X X 3XX X 5 X X 6 X X 4X 7 X X 8 X

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 15 StrongARM and Bridges Division SA-1500: Programming Flexibility StrongARM SA-1500: Programming Flexibility

CEG/SBD

Media-Processor AMP Asm + Libraries Fast RISC Core

Multimedia- SA Asm Enhanced, Fast Libraries RISC Core PERFORMANCE HLL Fast RISC Core PROGRAMMING EFFORT Compiled Libraries Multimedia Programming Model

StrongARM native AMP:Tightly Coupled AMP:Parallel Method

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 16 StrongARM and Bridges Division Example of AMP Processing Flow StrongARM Example of AMP Processing Flow

CEG/SBD N−1 ‹ FIR Filtering: y(m) = ∑h(n) x(m − n ,) m = ,1,0 / ,M −1 n=0 ‹ Assume: 16-bit data & coeff, N = 64, M = multiple of 128, dual-MAC processing ‹ AMP Processing: z Prefetch data x[0:63] into PFB[0:63] z Load coefficients h[0:63] into registers L0-L31 z Loop m=1:M/128 – Prefetch data x[m*64:m*64+63] into PFB[64:128] – Compute and store y[(m-1)*64: (m-1)*64+63] – Prefetch data x[(m+1)*64:(m+1)*64+63] into PFB[0:63] – Compute and store y[m*64: m*64+63] z Update FIR states x[0:63]

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 17 StrongARM and Bridges Division Multimedia Processing StrongARM Multimedia Processing

CEG/SBD ‹ MPEG-2 Decode z Main Level/Main Profile: 720x480x30FPS z Processor Load Balancing: – StrongARM: All processing above MacroBlocks (coded in C) – AMP: MacroBlocks down (hand-coded) z Audio processing using FPU ‹ Modembanks z Can Support multiple modems (V.34 or V.90) ‹ Video Conferencing z H.324 ‹ Voice over IP z Can support multiple digital voice channels (ITU G.72x plus echo cancellation)

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 18 StrongARM and Bridges Division

StrongARM ExampleExample Application:Application: MPEG-2MPEG-2 decodedecode

CEG/SBD ‹Transport Stream De-multiplex y Companion chip:Filters transport stream Adds timestamps for synchronization Tags packets, DMA's into SDRAM y StrongARM: Splits stream into components

‹ Audio Decoding: Decodes MPEG 2 audio in software y StrongARM: Control, bitstream parsing, dequantize, float y AMP : IDCT, windowing, fix pt cvt, store y Companion chip: Formats, drives external DACs (x6)

‹ Video Decoding: Can decode full MPEG-2 MP@ML in software y StrongARM: Control, bitstream parsing, VLC decoding y AMP : Dequantize, IDCT, motion-reconstruction y Companion chip: Scaling, mixing with overlay data Pan & Scan

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 19 StrongARM and Bridges Division

StrongARM CompanionCompanion Chip:Chip:

ArchitectureArchitecture CEG/SBD

Low Speed I/O DMA/Interrupt Ctl Timers

Interchip Interface Interchip Network Interface FIFOs Filtering Audio

Movie Scalar Video Encode

Palettes Graphics Scalar

Pointer

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 20 StrongARM and Bridges Division

StrongARM CompanionCompanion Chip:Chip: Video/NetworkVideo/Network

CEG/SBD ‹ Flexible Network Interface w/ Packet Filtering y up to155 Mbit receive and transmit channels ‹ ITU601Digital Video Generation System y Movie Stream and Palletized Graphics y Up/down horizontal scaling (320 tap filter) y Palletized Cursor/Pointer overlay y Variable mix of Movie and Graphics y On chip fully programmable pixel clock VCO/PLL ‹ Digital Video Output y NTSC, PAL encoding, or direct monitor drive y Macrovision(tm) Anti-taping Option y Composite, S-Video or RGB output

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 21 StrongARM and Bridges Division

StrongARM CompanionCompanion Chip:Chip: LowLow SpeedSpeed I/OI/O

CEG/SBD ‹ Serial Ports yUp to 6 Channel Audio Output System Serial Digital Output to External DACs yIrDA + Generic Infra-Red Interfaces yTelephone Quality CODEC Interface for Audio Input yPS/2 Keyboard/Mouse Interfaces ySmart Card Interface ‹ IEEE1284 Parallel Port Interface ‹ General Purpose Timers ‹ Flexible Interrupt Controller ‹ General Purpose I/Os

Prashant Gandhi, Hotchips 10, August 1998, Stanford University 22