StrongARM and Bridges Division
StrongARM
CEG/SBD SA-1500:SA-1500: AA 300300 MHzMHz RISCRISC CPUCPU withwith AttachedAttached MediaMedia Processor*Processor*
Prashant P. Gandhi, Ph.D. StrongARM and Bridges Division Computing Enhancement Group Intel Corporation Santa Clara, CA 95052 [email protected]
*SA-1500 was designed by Digital Semiconductor’s Palo Alto Design Team.
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 1 StrongARM and Bridges Division
StrongARM OutlineOutline
CEG/SBD
High Level Overview System Architecture SA-1500 Architecture Programming Model Companion Chip for Video Summary
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 2 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM DesignDesign GoalsGoals CEG/SBD High Performance for Multimedia Applications Flexible, General Purpose Architecture Easily Targeted for Different Market Segments Lower Cost for High Volume Low Power
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 3 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM SpecificationsSpecifications CEG/SBD
FEATURE DESCRIPTION Technology 0.28 micron CMOS Transistor Count 3.3 Million Clock Frequency 200-300 MHz Die Size 60 mm2 Power Consumption <0.5 W @ 100MHz <2.5 W @ 300MHz Voltage Internal: 1.5-2.0V I/O: 3.3V Packaging 240-pin MQFP 256-pin PBGA Clocking Dynamic clk freq. switching
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 4 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM PerformacePerformace TargetTarget CEG/SBD Replace Microprocessor and DSP-HW in high-volume applications Numeric and General-Purpose Processing z Multimedia, DSP, and FP Applications z MPEG-2 MP@ML with power to spare z Modembank and multi-channel voice over IP z Video conferencing Soft/scalable architecture High sustained memory bandwidth High multimedia code density Ease of programming
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 5 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM The System CEG/SBD
ROM Network I/F IDE / PCMCIA Bus I/F DRAM
P-Bus SA1500 . . Computation Chip Companion Chip . . GP Processing Network Connection . DSP, MM, FP Audio/Video Processing . DMA, Mem Control Low Speed Logic & I/O IR O
S-BUS
Video Audio SDRAM
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 6 StrongARM and Bridges Division HighHigh LevelLevel OverviewOverview StrongARM TheThe SystemSystem -- DataData FlowFlow CEG/SBD
ROM Network I/F IDE / PCMCIA Bus I/F DRAM SA1500 Video/Audio Computation Chip Companion Chip . . . IR O
Video Audio SDRAM
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 7 StrongARM and Bridges Division SA-1500SA-1500 ArchitectureArchitecture StrongARM ProcessorProcessor BlockBlock DiagramDiagram CEG/SBD
Prefetch Streaming Buffer Buffer 256B 128B
64-bit AMP: StrongARM W Multimedia, 16KB Execution 16KB C DSP, and FPU DCache Unit ICache S Engines Mini DCache 4KB 300MHz 1KB 300MHz
100MHz BIU 50MHz SDRAM Bus Mem Ctrl/DMA I/O Bus
WCS = Writable Control Store
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 8 StrongARM and Bridges Division SA-1500 CPU Attributes StrongARM SA-1500 CPU Attributes
CEG/SBD Semi-autonomous dual-processor CPU z Master-slave during invocation of AMP z Autonomous execution thereafter Non-homogeneous instruction set z 32-bit ARM instructions z 64-bit Long Instruction Word (LIW) AMP instructions (Arith || Mem/JMP) Single-instruction Multiple-Data (SIMD) AMP Processing z single-FMAC (32-bit), dual-IMAC (18-bit), quad-SAD (9-bit) Memory Architecture z Shared external memory z Shared Data Cache and Write Buffer z Separate Instruction Caches (Icache for ARM, WCS for AMP) z Separate Stream (pre-fetch) Buffers
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 9 StrongARM and Bridges Division SA-1500SA-1500 MemoryMemory ControllerController andand BusBus StrongARM InterfaceInterface CEG/SBD 100 MHz SDRAM Memory Controller (SBUS): z Programmable bus clock z 32-bit mode
50 MHz General I/O Bus (PBUS): z Programmable bus clock z Direct Connection to: – PC-style I/O devices – ROM/SRAM/DRAM/Flash – Bus adapter to PCI, PCMCIA
DMA controller with 15 descriptor-based channels
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 10 StrongARM and Bridges Division AMP Block Diagram StrongARM AMP Block Diagram
CEG/SBD
FPU Load/Store Mem Acc
MAC 64 GPRs Acc
ALU/Shift Ctl & Status Branch WCS Acc Regs
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 11 SA-1500 Chip Micrograph StrongARM and Bridges Division
StrongARM
CEG/SBD
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 12 StrongARM and Bridges Division Processor Features StrongARM Processor Features
CEG/SBD
StrongARM AMP 32 bit RISC, ARM V4 Architecture Dual Issue instructions 16K Icache, 16K Dcache, Write buff 1x SIMD Arithmetic (ALU, MAC, FPU) 1x Memory or Branch High Code Density Dual personality All Instructions Conditional coprocessor or parallel processor Operand Shift Support for ALU Inst Variety of Register DataTypes Powerful 32 x 32 + 64 -> 64 MAC 8/9b, 16/18b, 32/36b, 32-bit FP 64 registers (36 bits wide) Fully Pipelined Operations 512-instruction WCS
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 13 StrongARM and Bridges Division AMP Operations StrongARM AMP Operations
CEG/SBD
Fully pipelined Fully pipelined Arithmetic operations Memory and Branch operations
ALU/MAC/FPU Operations: Load/Store/Branch Operations: z Reg+Disp, Reg+Scaled-Index z 8b mul with scale/round/sat z Reverse-byte, replicate, texel loads z 9b averaging, pack/unpack z Transpose stores with auto-update z 9/18b add, sub, motion estimation z Single-inst compare and branches z 18/36b mul, mul-add, mul-sub with scale/round/sat z Decrement and branch z 32/36b add, sub, shift, logical z Single Precision FPU: mul-add, mul-sub, div, sqrt
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 14 StrongARM and Bridges Division SA-1500 Instruction Execution StrongARM SA-1500 Instruction Execution
CEG/SBD Parallel Method Tightly Coupled Method z StrongARM “library call” to WCS entry z Issue an “Execute” or a “Memory/Branch” z USAGE: operation in StrongARM inst stream MPEG-II Decoding z Supported by the C compiler 3-D/Graphics Libraries Application-Specific Functions StrongARM AMP EXU AMP MBU 1 X StrongARM AMP EXU AMP MBU 2 X 1X 3 X 2 XX X 4 X X 3XX X 5 X X 6 X X 4X 7 X X 8 X
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 15 StrongARM and Bridges Division SA-1500: Programming Flexibility StrongARM SA-1500: Programming Flexibility
CEG/SBD
Media-Processor AMP Asm + Libraries Fast RISC Core
Multimedia- SA Asm Enhanced, Fast Libraries RISC Core PERFORMANCE HLL Fast RISC Core PROGRAMMING EFFORT Compiled Libraries Multimedia Programming Model
StrongARM native AMP:Tightly Coupled AMP:Parallel Method
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 16 StrongARM and Bridges Division Example of AMP Processing Flow StrongARM Example of AMP Processing Flow
CEG/SBD N−1 FIR Filtering: y(m) = ∑h(n) x(m − n ,) m = ,1,0 / ,M −1 n=0 Assume: 16-bit data & coeff, N = 64, M = multiple of 128, dual-MAC processing AMP Processing: z Prefetch data x[0:63] into PFB[0:63] z Load coefficients h[0:63] into registers L0-L31 z Loop m=1:M/128 – Prefetch data x[m*64:m*64+63] into PFB[64:128] – Compute and store y[(m-1)*64: (m-1)*64+63] – Prefetch data x[(m+1)*64:(m+1)*64+63] into PFB[0:63] – Compute and store y[m*64: m*64+63] z Update FIR states x[0:63]
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 17 StrongARM and Bridges Division Multimedia Processing StrongARM Multimedia Processing
CEG/SBD MPEG-2 Decode z Main Level/Main Profile: 720x480x30FPS z Processor Load Balancing: – StrongARM: All processing above MacroBlocks (coded in C) – AMP: MacroBlocks down (hand-coded) z Audio processing using FPU Modembanks z Can Support multiple modems (V.34 or V.90) Video Conferencing z H.324 Voice over IP z Can support multiple digital voice channels (ITU G.72x plus echo cancellation)
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 18 StrongARM and Bridges Division
StrongARM ExampleExample Application:Application: MPEG-2MPEG-2 decodedecode
CEG/SBD Transport Stream De-multiplex y Companion chip:Filters transport stream Adds timestamps for synchronization Tags packets, DMA's into SDRAM y StrongARM: Splits stream into components
Audio Decoding: Decodes MPEG 2 audio in software y StrongARM: Control, bitstream parsing, dequantize, float y AMP : IDCT, windowing, fix pt cvt, store y Companion chip: Formats, drives external DACs (x6)
Video Decoding: Can decode full MPEG-2 MP@ML in software y StrongARM: Control, bitstream parsing, VLC decoding y AMP : Dequantize, IDCT, motion-reconstruction y Companion chip: Scaling, mixing with overlay data Pan & Scan
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 19 StrongARM and Bridges Division
StrongARM CompanionCompanion Chip:Chip:
ArchitectureArchitecture CEG/SBD
Low Speed I/O DMA/Interrupt Ctl Timers
Interchip Interface Interchip Network Interface FIFOs Filtering Audio
Movie Scalar Video Encode
Palettes Graphics Scalar
Pointer
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 20 StrongARM and Bridges Division
StrongARM CompanionCompanion Chip:Chip: Video/NetworkVideo/Network
CEG/SBD Flexible Network Interface w/ Packet Filtering y up to155 Mbit receive and transmit channels ITU601Digital Video Generation System y Movie Stream and Palletized Graphics y Up/down horizontal scaling (320 tap filter) y Palletized Cursor/Pointer overlay y Variable mix of Movie and Graphics y On chip fully programmable pixel clock VCO/PLL Digital Video Output y NTSC, PAL encoding, or direct monitor drive y Macrovision(tm) Anti-taping Option y Composite, S-Video or RGB output
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 21 StrongARM and Bridges Division
StrongARM CompanionCompanion Chip:Chip: LowLow SpeedSpeed I/OI/O
CEG/SBD Serial Ports yUp to 6 Channel Audio Output System Serial Digital Output to External DACs yIrDA + Generic Infra-Red Interfaces yTelephone Quality CODEC Interface for Audio Input yPS/2 Keyboard/Mouse Interfaces ySmart Card Interface IEEE1284 Parallel Port Interface General Purpose Timers Flexible Interrupt Controller General Purpose I/Os
Prashant Gandhi, Hotchips 10, August 1998, Stanford University 22