ATI Radeon™ HD 2000 Series Technology Overview
CONFIDENTIAL

Richard Huddy
Worldwide DevRel Manager, AMD Graphics Products Group

Introducing the ATI Radeon™ HD 2000 Series
ATI Radeon™ HD 2900 Series – Enthusiast
ATI Radeon™ HD 2600 Series – Mainstream
ATI Radeon™ HD 2400 Series – Value

ATI Radeon™ HD 2000 Series Highlights
Technology leadership
• Highest clock speeds – up to 800 MHz
• Highest transistor density – up to 700 million transistors
• Lowest power for mobile
2nd generation unified architecture
• Superscalar design with up to 320 stream processing units
• Optimized for Dynamic Game Computing and Accelerated Stream Processing
DirectX® 10
• Massive shader and geometry processing performance
• Enabling the next generation of visual effects
Cutting-edge image quality features
• Advanced anti-aliasing and texture filtering capabilities
• Fast High Dynamic Range rendering
• Programmable Tessellation Unit
ATI Avivo™ HD technology
• Delivering the ultimate HD video experience
• HD display and audio connectivity
Native CrossFire™ technology
• Superior multi-GPU support

The March to Reality
[Figure: timeline of Radeon generations: Radeon, Radeon 8500, Radeon 9700, Radeon 9800, Radeon X800, Radeon X1800, Radeon X1950, Radeon HD 2900]

2nd Generation Unified Shader Architecture
[Figure: architecture block diagram. Legible labels: Command Processor, Setup Engine, Programmable Tessellator, Scan Converter / Rasterizer, Geometry Assembler, Vertex Assembler, Interpolators, Ultra-Threaded Dispatch Processor, Stream Processing Units]
Development from proven and successful “Xenos” design (Xbox 360 graphics)
• New dispatch processor handling thousands of simultaneous threads
Up to 320 discrete, independent stream processing units
• In comparison, ATI Radeon X19xx family had 48 vector + 48 scalar processing units
Superscalar ALU implementation
• Dedicated branch execution units and texture address processors
Full support for DirectX 10.0, Shader Model 4.0
• Dynamic shader load balancing between vertex, geometry and pixel shader operations
• Handled automatically by hardware scheduler

Shader Processor Progression
Vector (Radeon and earlier): 1 instruction/clk, 4 components
Vector + Scalar (Radeon 9600, 9700, 9800, X series): 2 instructions/clk, 3+1 or 4+1 components
Superscalar (Radeon HD 2000 series): 5 instructions/clk, 5 components

Shader Processors
[Figure: architecture block diagram highlighting the Stream Processing Units, with a detail view of one 5-Way Superscalar Shader Processor, its Branch Execution Unit, and its General Purpose Registers]

Pushing a TeraFLOP
475 GigaFLOPS per GPU
• 237 billion single precision floating point multiply-add operations per second
• Real, measurable performance – not just theoretical
950 GigaFLOPS in a CrossFire configuration
• Tera-scale computing is possible today on your desktop
Unprecedented compute density
• More than 1 GigaFLOP per mm²
• Less than $1 per GigaFLOP
• Over 3.4 GigaFLOPs per Watt

Texture Unit Features
Full speed floating point texture filtering
• 64-bit HDR textures bilinear filtered at full speed (~7x faster than Radeon X1000 series)¹
• 128-bit floating point textures filtered at half speed
• New compact 32-bit HDR shared exponent texture format (RGBE
9:9:9:5)
• Trilinear and anisotropic filtering supported for all formats
Improved high quality anisotropic filtering
Percentage Closer Filtering (PCF) for enhanced shadow rendering
High resolution texture support
• Up to 64 MTexels (8192 x 8192)
Full texturing capability accessible to vertex, geometry, and pixel shader programs

Bandwidth Drives Performance
[Figure: chart of Memory Bandwidth (GB/sec) and Performance (3DMark03 Score) across memory generations DDR, DDR2, GDDR3, and GDDR4, with bus widths of 128, 256, and 512 bits]

Memory Controller Progression
ATI Radeon X850 & earlier + all competing GPUs to date: Centralized, Crossbar
ATI Radeon X1000 Series: Partially Distributed, Hybrid Ring Bus
ATI Radeon HD 2000 Series: Fully Distributed, Ring Bus

Massive Bandwidth
ATI Radeon HD 2900 memory controller
• World’s first 512-bit memory interface
• Designed for full performance HDR rendering
[Figure: memory controller diagram: 64-bit memory channels to GDDR3/4 DRAM, each with a sequencer and arbiter, joined by ring stops on a 1024-bit ring bus (512-bit read + 512-bit write), with a PCI Express ring stop, a crossbar mux, and read/write memory client interfaces]
Highlights
• Over 100 GB/sec memory bandwidth
• Eight 64-bit memory channels
• Kilobit ring bus
• Fully distributed design - no central hub
• Simplified layout, highly scalable

High Dynamic Range Performance
ATI Radeon HD 2000 Series vs.
ATI Radeon X1000 Series
[Figure: bar chart, "High Dynamic Range Performance", Radeon X1950 XTX vs. Radeon HD 2900 XT, vertical axis 80% to 240%. Tests: Far Cry HDR (16x12, 25x16), 3DMark06 HDR1 (12x10), 3DMark06 HDR2 (12x10), Serious Sam 2 HDR (16x12, 25x16), Elder Scrolls IV: Oblivion (16x12, 25x16)]

Geometry Performance
Large vertex cache
• 8x larger than Radeon X1950 for improved vertex fetch performance
Fast, full-featured Vertex Texture Fetch
• Uses same texture units as pixel shaders
All shader processors can perform vertex and/or geometry processing if necessary
• Up to 10x the vertex processing power of Radeon X1950 available on demand
• Up to 50x the geometry processing power of the fastest competing DirectX 10 GPUs³

Tessellation
All ATI Radeon HD 2000 series GPUs feature new programmable tessellation unit
• Based on Xbox 360 technology
• Provides highly effective geometry data compression
• Orders of magnitude faster than CPU-based or geometry shader-based tessellation
Enables:
• More detailed animation
• More realistic characters
• Complex terrain
• More sophisticated shader effects

CrossFire
All ATI Radeon HD 2000 series GPUs feature native CrossFire support
Simplified CrossFire experience
• Easy plug-and-play setup
• No special master cards required
• Integrated compositing engine
• New AFR detect algorithm - intelligent mode selection for best scalability
Most immersive and most responsive gaming experience
High bandwidth dual-link GPU interconnect
• Supports display resolutions up to 2560x2048 @ 60Hz
• Built for future scalability (>2 GPUs)
[Figure: CrossFire Rendering Modes (Alternate Frame Rendering, SuperTile, Scissor, Super AA), showing GPU_0 and GPU_1 rendering Frame_0 and Frame_1 to the display]

CrossFire Performance
[Figure: bar chart, "ATI Radeon HD 2900 XT CrossFire Scaling", Radeon HD 2900 XT vs. Radeon HD 2900 XT CrossFire, vertical axis 80% to 200%. Tests, all at 25x16: 3DMark05, 3DMark06, Company of Heroes, Call of Duty 2, Doom 3, Far Cry, FEAR, Half Life 2, Half Life 2: EP1, Half Life 2: LC, Prey, Serious Sam 2, Splinter Cell: CT, Stalker. Settings are 4xAA 8xAF except Company of Heroes and Doom 3 (16xAF) and Stalker (AA 8xAF)]

ATI Avivo™ HD Technology
Dedicated silicon for accelerated HD video decode and processing
• UVD – Unified Video Decoder
• AVP – Advanced Video Processor
Leading video playback quality
• Up to 128 out of 130 on HQV video quality test
Dual-link outputs with HDMI & HDCP
• First products to support high resolution HDMI displays⁴
On-chip HD audio controller
• AC3 5.1 surround-sound output over HDMI

Cutting Edge Process Technology
ATI Radeon HD 2900
• 700 million transistors at 750 MHz
• Uses unique TSMC 80nm process (80HS)
• Optimized for high clock speeds
ATI Radeon HD 2600 & HD 2400
• Use unique TSMC 65nm process (65G+)
• Optimized for power efficiency
Both processes target aggressive transistor density
[Figure: three bar charts comparing TSMC processes: Transistor Speed (90GT, 80HS), Gates per mm² (90GT, 80GT, 80HS, 65G+), and Leakage power per mm² (80GT, 65G+)]

Unified Architecture (Painful) Detail
[Figure: annotated block diagram of the unified architecture]
• Command Processor
• Setup Engine
• Ultra-Threaded Dispatch Processor
• Stream Processing Units
• Texture Units & Caches
• Memory Read/Write Cache & Stream Out Buffer
• Shader Export
• Render Back-Ends
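
The headline figures on the "Pushing a TeraFLOP" and "Massive Bandwidth" slides can be cross-checked with simple arithmetic. The sketch below is not part of the original deck. It assumes that one multiply-add counts as two floating-point operations and that each of the 320 stream processing units completes one multiply-add per clock; neither assumption is stated on the slides. From the quoted numbers it derives the implied shader clock, the per-pin memory data rate needed for 100 GB/sec, and the bounds on die area, power, and price implied by the compute density claims. All function and variable names are illustrative.

```python
# Cross-check of figures quoted in the deck.
# Assumptions (NOT stated in the deck): 1 multiply-add = 2 FLOPs, and every
# stream processing unit completes one multiply-add per clock.

STREAM_UNITS = 320            # "up to 320 stream processing units"
MADS_PER_SEC = 237e9          # "237 billion ... multiply-add operations per second"
QUOTED_GFLOPS = 475.0         # "475 GigaFLOPS per GPU"
BUS_WIDTH_BITS = 8 * 64       # "Eight 64-bit memory channels"
QUOTED_BANDWIDTH_GBS = 100.0  # "Over 100 GB/sec memory bandwidth"


def gflops_from_mads(mads_per_sec, flops_per_mad=2):
    """Convert a multiply-add rate into GigaFLOPS."""
    return mads_per_sec * flops_per_mad / 1e9


def implied_clock_mhz(gflops, units, flops_per_unit_per_clock=2):
    """Clock (MHz) at which `units` ALUs would deliver `gflops`."""
    return gflops * 1e9 / (units * flops_per_unit_per_clock) / 1e6


def min_pin_rate_gbps(bandwidth_gbs, bus_width_bits):
    """Per-pin data rate (Gbit/s) needed to reach a bandwidth given in GB/s."""
    return bandwidth_gbs * 8 / bus_width_bits


def density_bounds(gflops):
    """Upper bounds implied by the quoted ratios; not measured values."""
    return {
        "max_die_area_mm2": gflops / 1.0,  # "More than 1 GigaFLOP per mm2"
        "max_power_w": gflops / 3.4,       # "Over 3.4 GigaFLOPs per Watt"
        "max_price_usd": gflops * 1.0,     # "Less than $1 per GigaFLOP"
    }


if __name__ == "__main__":
    print("GFLOPS from multiply-add rate:", gflops_from_mads(MADS_PER_SEC))
    print("Implied shader clock (MHz):",
          implied_clock_mhz(QUOTED_GFLOPS, STREAM_UNITS))
    print("Two GPUs in CrossFire (GFLOPS):", 2 * QUOTED_GFLOPS)
    print("Memory bus width (bits):", BUS_WIDTH_BITS)
    print("Minimum per-pin data rate (Gbit/s):",
          min_pin_rate_gbps(QUOTED_BANDWIDTH_GBS, BUS_WIDTH_BITS))
    print("Implied bounds:", density_bounds(QUOTED_GFLOPS))
```

Under these assumptions, 237 billion multiply-adds per second corresponds to 474 GigaFLOPS, which matches the quoted 475 after rounding, and the implied shader clock of roughly 742 MHz sits below both the "up to 800 MHz" headline and the 750 MHz quoted on the process technology slide. Doubling the per-GPU figure reproduces the 950 GigaFLOPS CrossFire claim.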