ATI Stream Computing Update
Total Page:16
File Type:pdf, Size:1020Kb
Stream Computing on ATI Radeon™ Embedded Graphics Processors Peter Mandl Sr. Product Mgr, Embedded Graphics January 2010 1 Stream Computing Harnessing the Computational Power of GPUs • Hundreds of processing elements in a Graphics Processor Unit (GPU) • Enormous computational capability for data parallel workloads • Enables significant performance/watt and performance/$ improvements • Works together with the CPU Software Applications Graphics Workloads Serial and Task Parallel Data Parallel Workloads Workloads 2 ATI Stream Technology is… Heterogeneous: Developers leverage AMD GPUs and CPUs for optimal application performance and user experience High performance: Massively parallel, programmable GPU architecture enables superior performance and power efficiency Industry Standards: OpenCL™ and DirectCompute 11 enable cross- platform development Gaming Digital Content Productivity Creation Sciences Government Engineering 3 The World’s Most Efficient GPU* 16 14 14.47 GFLOPS/W 12 GFLOPS/W GFLOPS/mm2 10 7.50 8 4.50 7.90 6 GFLOPS/mm2 2.01 2.21 4 4.56 1.07 2.24 2 0.42 1.06 0.92 0 Nov-05 Jan-06 Sep-07 Nov-07 Jun-08 Oct-09 ATI Radeon™ ATI Radeon™ ATI Radeon™ HD ATI Radeon™ HD ATI Radeon™ HD ATI Radeon™ HD X1800 XT X1900 XTX 2900 PRO 3870 4870 5870 *Based on comparison of consumer client single-GPU configurations as of 12/08/09. ATI Radeon™ HD 5870 provides 4 14.47 GFLOPS/W and 7.90 GFLOPS/mm2 vs. NVIDIA GTX 285 at 5.21 GFLOPS/W and 2.26 GFLOPS/mm2. Enhancing GPUs for Computation RV770 Architecture • 800 stream processing units • Fast double-precision processing – up to 240 GFLOPS • 115 GB/sec GDDR5 memory interface • Integer bit shift operations for all units – useful for crypto, compression, video processing • Data sharing between threads • More aggressive clock gating for improved performance per watt 5 5 Stream Software Development 6 ATI Stream Technology Development and Deployment Deployment Development ATI Radeon™ graphics on A PC for consumer applications ATI FirePro™ graphics for professional graphics Industry AMD FireStream™ Standards: compute accelerator OpenCL™ and for computation Microsoft DirectCompute AMD Embedded Radeon™ Graphics for long availability and small footprint 7 7 Moving Past Proprietary Solutions for Ease of Cross-Platform Programming Open and Custom Tools High Level Tools High Level Language Application Specific Compilers Libraries Industry Standard Interfaces OpenCL™ DirectX® OpenGL® AMD AMD Other GPUs CPUs CPUs/GPUs •Cross-platform development •Interoperability with OpenGL and DirectX •CPU/GPU backends enable balanced platform approach 8 ATI Stream SDK v2.0 (Production): OpenCL™ For Multicore x86 CPUs and GPUs The Power of Fusion: Developers leverage heterogeneous architecture to enable a superior user experience • First complete OpenCL™ development platform • Certified OpenCL™ 1.0 compliant by the Khronos Group1 • Write code that can scale well on multi-core CPUs and GPUs • AMD delivers on the promise of support for OpenCL™, with both high-performance CPU and GPU technologies • Available for download now – includes documentation, samples, and developer support Software Applications Graphics Workloads Serial and Task Parallel Data Parallel Workloads Workloads Product Page: http://developer.amd.com/stream 1 Conformance logs submitted for the ATI Radeon™ HD 5800 series GPUs, ATI Radeon™ HD 5700 series GPUs, ATI Radeon™ HD 4800 series GPUs, ATI FirePro™ V8700 series GPUs, AMD FireStream™ 9200 series GPUs, ATI Mobility 9 Radeon™ HD 4800 series GPUs and x86 CPUs with SSE3. ATI Stream SDK v2.0 (Production): Key Features • New: Support for OpenCL™ ICD (Installable Client Driver) • New: Support for atomic functions for 32-bit integers • New: Microsoft® Visual Studio® 2008-integrated ATI Stream Profiler performance analysis tool • Preview: Support for OpenCL™ / OpenGL® interoperability • Preview: Support for OpenCL™ / Microsoft® DirectX® 10 interoperability • Preview: Support for double-precision floating point basic arithmetic in OpenCL™ C kernels • Added support for ATI Radeon™ HD 5970 GPU1 • Supported OSes: Microsoft® XP / Vista® / 7 and Linux® openSUSE™ 11.0 / Ubuntu® 9.04 • Supported Compilers: Microsoft® Visual Studio® 2008 Pro / GCC 4.3 or later / ICC 11.x Product Page: http://developer.amd.com/stream 1 Conformance logs submitted for the ATI Radeon™ HD 5800 series GPUs, ATI Radeon™ HD 5700 series GPUs, ATI Radeon™ HD 4800 series GPUs, ATI FirePro™ V8700 series GPUs, AMD FireStream™ 9200 series GPUs, ATI Mobility 10 Radeon™ HD 4800 series GPUs and x86 CPUs with SSE3. OpenCL™: Game-Changing Development Enabling Broad Adoption of GP-GPU Capabilities . Industry standard API: Open, multiplatform development platform for heterogeneous architectures . The power of Fusion: Leverages CPUs and GPUs for balanced system approach . Broad industry support: Created by architects from AMD, Apple, IBM, Intel, Nvidia, Sony, etc. Fast track development: Ratified in December; system vendors working on implementations now . Momentum: Enormous interest from mainstream developers More stream-enabled applications across all markets 11 Stream Computing Applications 12 ATI Stream Technology Enabled Multimedia Applications MediaShow 5 PowerDirector 8 MediaShow Espresso PowerDirector 7 SimHD™ Plug-in for TotalMedia Theatre Roxio Creator™ 2010 Roxio Creator™ 2010 Pro 13 High Performance With ATI Stream Technology: Cryptography through Distributed.net . Distributed.net provides a distributed compute model allowing users to donate compute cycles to large compute-intensive projects . RC5-72 – Cryptography algorithm that searches for encryption keys RC5-72 1400 1200 1000 800 600 MKeys/sec 400 200 0 ATI Radeon ATI Radeon ATI Radeon Nvidia GEForce Nvidia GEForce Nvidia GEForce HD5870 HD4870 HD4850 GTX285 GTX280 GTX260 *Based on AMD internal testing using RC5-72 clients as of 9/04/09. Results shown in millions of keys evaluated per second [Mkeys/sec]. Configuration: AMD Phenom™ X4 9950 Black Edition processor, 8GB DDR2 RAM, Windows Vista® 32-bit. AMD drivers: ATI Catalyst™ 9.8 (ATI Radeon™ HD 48xx), prerelease driver (ATI Radeon HD 5870). Nvidia driver: GeForce 190.62. AMD client: [x86/Stream], v2.9106.513 (beta8). Nvidia client: [x86/CUDA-2.2], v2.9105.512 (beta8). 14 GPU Acceleration in Technical Applications Tomographic Reconstruction • Alain Bonissent, Centre de Physique des Particules de Marseille • Reporting 42-60x* speedups • This image: 7 minutes in optimized C++; 10 seconds in Brook+ EDA Simulation • ACCIT • Currently beta testing applications and reporting >10x speedup** Seismic Processing • Brown Deer • Achieving 120x speedup vs CPU on 3D 2nd order finite-difference time-domain (FDTD) seismic processing algorithm Options Trading • Scotia Capital: • Reported a 28x speedup over a quad-core CPU. Neural Networks • Neurala • Developing Neurala Technology Platform for advanced brain-based machine learning applications • Report achieving 10-200x speedups on biologically inspired neural models 15 Embedded Stream Computing Products 16 ATI Radeon™ E4690 Embedded GPU Performance Gaming and Computation • 384 GLOPS Peak • 320 stream processors @ 600 MHz • Bigger & Wider Memory • 128-bit memory interface • 512 MB GDDR3 in the package • Available as a Chip or MXM Module • MXM 3.0 Type A • 5 Year Planned Longevity of Supply* * Availability subject to change without notice. 17 17 ATI Radeon™E4870 MXM Module Top Performance for Stream Computing • GLOPS Peak single precision • Up to 240 GFLOPS double-precision • 800 stream processors • 115 GB/s Memory Interface • 256 bits wide • 1 GB GDDR5 on the module • MXM Module • MXM 3.0 Type II • Limited availability • Targeted for engineering demos • See an AMD Embedded Sales Representative 18 18 ONLY AMD! 19 Disclaimer and Attribution DISCLAIMER The information presented in this document is for informational purposes only and may contain technical inaccuracies, omissions and typographical errors. The information contained herein is subject to change and may be rendered inaccurate for many reasons, including but not limited to product and roadmap changes, component and motherboard version changes, new model and/or product releases, product differences between differing manufacturers, software changes, BIOS flashes, firmware upgrades, or the like. AMD assumes no obligation to update or otherwise correct or revise this information. However, AMD reserves the right to revise this information and to make changes from time to time to the content hereof without obligation of AMD to notify any person of such revisions or changes. AMD MAKES NO REPRESENTATIONS OR WARRANTIES WITH RESPECT TO THE CONTENTS HEREOF AND ASSUMES NO RESPONSIBILITY FOR ANY INACCURACIES, ERRORS OR OMISSIONS THAT MAY APPEAR IN THIS INFORMATION. AMD SPECIFICALLY DISCLAIMS ANY IMPLIED WARRANTIES OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE. IN NO EVENT WILL AMD BE LIABLE TO ANY PERSON FOR ANY DIRECT, INDIRECT, SPECIAL OR OTHER CONSEQUENTIAL DAMAGES ARISING FROM THE USE OF ANY INFORMATION CONTAINED HEREIN, EVEN IF AMD IS EXPRESSLY ADVISED OF THE POSSIBILITY OF SUCH DAMAGES. ATTRIBUTION © 2010 Advanced Micro Devices, Inc. All rights reserved. AMD, the AMD Arrow logo, ATI, the ATI logo, Radeon and combinations thereof are trademarks of Advanced Micro Devices, Inc. OpenCL is a trademark of Apple Inc. used under license to the Khronos Group Inc. Other names are for informational purposes only and may be trademarks of their respective owners. CAUTIONARY STATEMENT This release contains forward-looking statements concerning revenue and other expectations which are made pursuant to the safe harbor provisions of the Private Securities Litigation Reform Act of 1995. Forward-looking statements are commonly identified by words such as “would,” “may,” “expects,” “believes,” “plans,” “intends,” “projects,” and other terms with similar meaning. Investors are cautioned that the forward-looking statements in this release are based on current beliefs, assumptions and expectations, speak only as of the date of this release and involve risks and uncertainties that could cause actual results to differ materially from current expectations. 20.