Software Optimization Guide for AMD Family 15h Processors (PDF)

Total pages: 16

File type: PDF, Size: 1020 KB

Software Optimization Guide for AMD Family 15h Processors

Publication No.: 47414
Revision: 3.06
Date: January 2012

Advanced Micro Devices

© 2012 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. The information contained herein may be of a preliminary or advance nature and is subject to change without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Trademarks

AMD, the AMD Arrow logo, and combinations thereof, AMD Athlon, AMD Opteron, 3DNow!, AMD Virtualization and AMD-V are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Linux is a registered trademark of Linus Torvalds. Microsoft and Windows are registered trademarks of Microsoft Corporation. MMX is a trademark of Intel Corporation. PCI-X and PCI Express are registered trademarks of the PCI-Special Interest Group (PCI-SIG). Solaris is a registered trademark of Sun Microsystems, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Contents

Tables
Figures
Revision History
Chapter 1 Introduction
  1.1 Intended Audience
  1.2 Getting Started
  1.3 Using This Guide
    1.3.1 Special Information
    1.3.2 Numbering Systems
    1.3.3 Typographic Notation
  1.4 Important New Terms
    1.4.1 Multi-Core Processors
    1.4.2 Internal Instruction Formats
    1.4.3 Types of Instructions
  1.5 Key Optimizations
    1.5.1 Implementation Guideline
  1.6 What’s New on AMD Family 15h Processors
    1.6.1 AMD Instruction Set Enhancements
    1.6.2 Floating-Point Improvements
    1.6.3 Load-Execute Instructions for Unaligned Data
    1.6.4 Instruction Fetching Improvements
    1.6.5 Instruction Decode and Floating-Point Pipe Improvements
    1.6.6 Notable Performance Improvements
    1.6.7 AMD Virtualization™ Optimizations
Chapter 2 Microarchitecture of AMD Family 15h Processors
  2.1 Key Microarchitecture Features
  2.2 Microarchitecture of AMD Family 15h Processors
  2.3 Superscalar Processor
  2.4 Processor Block Diagram
  2.5 AMD Family 15h Processor Cache Operations
    2.5.1 L1 Instruction Cache
    2.5.2 L1 Data Cache
    2.5.3 L2 Cache
    2.5.4 L3 Cache
  2.6 Branch-Prediction
  2.7 Instruction Fetch and Decode
  2.8 Integer Execution
  2.9 Translation-Lookaside Buffer
    2.9.1 L1 Instruction TLB Specifications
    2.9.2 L1 Data TLB Specifications
    2.9.3 L2 Instruction TLB Specifications
    2.9.4 L2 Data TLB Specifications
  2.10 Integer Unit
    2.10.1 Integer Scheduler
    2.10.2 Integer Execution Unit
  2.11 Floating-Point Unit
  2.12 Load-Store Unit
  2.13 Write Combining
  2.14 Integrated Memory Controller
  2.15 HyperTransport™ Technology Interface
    2.15.1 HyperTransport Assist
Chapter 3 C and C++ Source-Level Optimizations
  3.1 Declarations of Floating-Point Values
  3.2 Using Arrays and Pointers
  3.3 Use of Function Prototypes
  3.4 Unrolling Small Loops
  3.5 Expression Order in Compound Branch Conditions
  3.6 Arrange Boolean Operands for Quick Expression Evaluation
  3.7 Long Logical Expressions in If Statements
  3.8 Pointer Alignment
  3.9 Unnecessary Store-to-Load Dependencies
  3.10 Matching Store and Load Size
  3.11 Use of const Type Qualifier
  3.12 Generic Loop Hoisting
  3.13 Local Static Functions
  3.14 Explicit Parallelism in Code
  3.15 Extracting Common Subexpressions
  3.16 Sorting and Padding C and C++ Structures
  3.17 Replacing Integer Division with Multiplication
  3.18 Frequently Dereferenced Pointer Arguments ...
Recommended publications
  • An Introduction to Analysis and Optimization with AMD CodeAnalyst™ Performance Analyzer
    An introduction to analysis and optimization with AMD CodeAnalyst™ Performance Analyzer
    Paul J. Drongowski, AMD CodeAnalyst Team, Advanced Micro Devices, Inc., Boston Design Center, 8 September 2008
    Introduction
    This technical note demonstrates how to use the AMD CodeAnalyst™ Performance Analyzer to analyze and improve the performance of a compute-bound program. The program that we chose for this demonstration is an old classic: matrix multiplication. We'll start with a "textbook" implementation of matrix multiply that has well-known memory access issues. We will measure and analyze its performance using AMD CodeAnalyst. Then, we will improve the performance of the program by changing its memory access pattern.
    1. AMD CodeAnalyst
    AMD CodeAnalyst is a suite of performance analysis tools for AMD processors. Versions of AMD CodeAnalyst are available for both Microsoft® Windows® and Linux®. AMD CodeAnalyst may be downloaded (free of charge) from AMD Developer Central. (Go to http://developer.amd.com and click on CPU Tools.) Although we will use AMD CodeAnalyst for Windows in this tech note, engineers and developers can use the same techniques to analyze programs on Linux.
    AMD CodeAnalyst performs system-wide profiling and supports the analysis of both user applications and kernel-mode software. It provides five main types of data collection and analysis:
      - Time-based profiling (TBP),
      - Event-based profiling (EBP),
      - Instruction-based sampling (IBS),
      - Pipeline simulation (Windows-only feature), and
      - Thread profiling (Windows-only feature).
    We will look at the first three kinds of analysis in this note. Performance analysis usually begins with time-based profiling to identify the program hot spots that are candidates for optimization.
  • AMD CodeXL 1.7 GA Release Notes
    AMD CodeXL 1.7 GA Release Notes
    Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs.
    This version contains:
    For 64-bit Windows platforms
      - CodeXL Standalone application
      - CodeXL Microsoft® Visual Studio® 2010 extension
      - CodeXL Microsoft® Visual Studio® 2012 extension
      - CodeXL Microsoft® Visual Studio® 2013 extension
      - CodeXL Remote Agent
    For 64-bit Linux platforms
      - CodeXL Standalone application
      - CodeXL Remote Agent
    Note about installing CodeAnalyst after installing CodeXL for Windows
    AMD CodeAnalyst has reached End-of-Life status and has been replaced by AMD CodeXL. CodeXL installer will refuse to install on a Windows station where AMD CodeAnalyst is already installed. Nevertheless, if you would like to install CodeAnalyst, do not install it on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst.
    System Requirements
    CodeXL contains a host of development features with varying system requirements:
    GPU Profiling and OpenCL Kernel Debugging
      - An AMD GPU (Radeon HD 5000 series or newer, desktop or mobile version) or APU is required.
      - The AMD Catalyst Driver must be installed, release 13.11 or later. Catalyst 14.12 (driver 14.501) is the recommended version. See "Getting the latest Catalyst release" section below.
    For GPU API-Level Debugging, a working OpenCL/OpenGL configuration is required (AMD or other).
    CPU Profiling
      - Time-Based Profiling can be performed on any x86 or AMD64 (x86-64) CPU/APU.
  • Motmot Documentation Release 0
    motmot Documentation, Release 0
    Andrew Straw, June 26, 2010
    Contents
      1 Overview
        1.1 The name motmot
        1.2 Packages within motmot
        1.3 Mailing list
        1.4 Related Software
      2 Download and installation instructions
        2.1 Quick install: FView application on Windows
      3 Full install information
        3.1 Supported operating systems
        3.2 Download
        3.3 Installation
        3.4 Download direct from the source code repository
      4 Gallery of applications built on motmot packages
        4.1 Open source
        4.2 Closed source
      5 Frequently Asked Questions
        5.1 What cameras are supported?
        5.2 What frame rates, image sizes, bit depths are possible?
        5.3 Which way is up? (Why are my images flipped or rotated?)
      6 Writing FView plugins
        6.1 Overview
        6.2 Register your FView plugin
        6.3 Tutorials
      7 Camera trigger device with precise timing and analog input
        7.1 camtrig – Camera trigger
  • AMD CodeXL 1.8 GA Release Notes
    AMD CodeXL 1.8 GA Release Notes
    Contents
      AMD CodeXL 1.8 GA Release Notes
        New in this version
        System Requirements
        Getting the latest Catalyst release
        Note about installing CodeAnalyst after installing CodeXL for Windows
        Fixed Issues
        Known Issues
        Support
    Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs.
    This version contains:
    For 64-bit Windows platforms
      - CodeXL Standalone application
      - CodeXL Microsoft® Visual Studio®
  • AMD Accelerated Parallel Processing Math Libraries Are Software Libraries
    OVERVIEW
    AMD Core Math Library (ACML) provides a no-cost set of math routines for high performance computing (HPC), scientific, engineering, and related compute-intensive applications, thoroughly optimized and threaded for use on AMD processors. ACML is ideal for weather modeling, computational fluid dynamics, financial analysis, oil and gas applications, and more.
    FEATURES
    BLAS
      - 100% compatible BLAS library including all standard Level 1, Level 2, and Level 3 subroutines
      - Highly optimized kernels for GEMM routines and other Level 3 matrix-matrix operations
      - Highly optimized for Level 1 BLAS vector operations
      - Support for AMD-K8™, AMD Family 10h, AMD Family 15h and various Intel processor families
      - OpenMP support for Level 3 BLAS routines
    LAPACK
      - Derived from Mark 22 NAG Library for SMP and Multicore
      - Multithreading optimizations in many routines
    FFT
      - Complex, Real-Complex, Complex-Real transforms
      - 1D, 2D, and 3D transforms
      - Expert interfaces provide more control over scaling, in-place/out-of-place, array layout
    Vector Math Library
      - Optimized versions of most critical libm functions
      - Scalar, Vector, and array versions
    Random Number Generators
      - 5 base generators: NAG Basic, Wichmann-Hill, L’Ecuyer, Mersenne Twister, Blum-Blum-Shub
      - 26 distribution generators
    OTHER AMD PERFORMANCE LIBRARIES
      - APPML: AMD Accelerated Parallel Processing Math Libraries are software libraries containing FFT and BLAS functions written in OpenCL and designed to run on AMD GPUs.
      - AMD LibM: a software library containing a collection of basic math functions optimized for x86-64 processor-based machines.
      - AMD String Library: standard GNU C Library (glibc) string functions optimized for AMD processors.
      - Framewave Project: a collection of popular low-level software routines beginning with simple arithmetic and extending into rich domains, such as image and signal processing.
  • Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors
    Instruction-Based Sampling: A New Performance Analysis Technique for AMD Family 10h Processors
    Paul J. Drongowski, AMD CodeAnalyst Project, Advanced Micro Devices, Inc., Boston Design Center, 16 November 2007
    1. Introduction
    Software applications must use computational resources efficiently in order to deliver best results in a timely manner. This is especially true for time-sensitive applications such as transaction processing, real-time control, multi-media and games. A program profile is a histogram that reflects dynamic program behavior. For example, a profile shows where a program is spending its time. Program profiling helps software developers to meet performance goals by identifying performance bottlenecks and issues. Profiling is most effective when a developer can quickly identify the location and cause of a performance issue.
    Instruction-Based Sampling (IBS) is a new profiling technique that provides rich, precise program performance information. IBS is introduced by AMD Family 10h processors (AMD Opteron Quad-Core processor “Barcelona”). IBS overcomes the limitations of conventional performance counter sampling. Data collected through performance counter sampling is not precise enough to isolate performance issues to individual instructions. IBS, however, precisely identifies instructions which are not making the best use of the processor pipeline and memory hierarchy. IBS collects a wide range of performance information in a single program run, making it easier to conduct performance testing.
    The AMD CodeAnalyst performance analysis tool suite supports IBS and correlates the instruction-level IBS information with processes, program modules, functions and source code. IBS in combination with CodeAnalyst helps a developer to find, analyze and ameliorate performance problems. This technical note is a brief introduction to Instruction-Based Sampling.
  • AMD CodeXL 1.1 GA Release Notes
    AMD CodeXL 1.1 GA Release Notes
    Thank you for using CodeXL. We appreciate any feedback you have! Please use our CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs.
    This version contains:
      - CodeXL Visual Studio 2012 and 2010 packages and Standalone application, for 32-bit and 64-bit Windows platforms
      - CodeXL for 64-bit Linux platforms
      - Kernel Analyzer v2 for both Windows and Linux platforms
    Note about 32-bit Windows CodeXL 1.1 Upgrade Error
    On 32-bit Windows platforms, upgrading from previous version of CodeXL using the CodeXL 1.1 installer will remove the previous version and then display an error message without installing CodeXL 1.1. The recommended method is to uninstall previous CodeXL before installing CodeXL 1.1. If you ran the 1.1 installer to upgrade a previous installation and encountered the error mentioned above, ignore the error and run the installer again to install CodeXL 1.1.
    Note about installing CodeAnalyst after installing CodeXL for Windows
    CodeXL can be safely installed on a Windows station where AMD CodeAnalyst is already installed. However, do not install CodeAnalyst on a Windows station already installed with CodeXL. Uninstall CodeXL first, and then install CodeAnalyst.
    System Requirements
    CodeXL contains a host of development features with varying system requirements:
    GPU Profiling and OpenCL Kernel Debugging
      - An AMD GPU (Radeon HD 5xxx or newer) or APU is required.
      - The AMD Catalyst Driver must be installed, release 12.8 or later.
  • Downloaded and Freely Modified to Meet Our Additional Requirements Related to Result Logging
    INVESTIGATING TOOLS AND TECHNIQUES FOR IMPROVING SOFTWARE PERFORMANCE ON MULTIPROCESSOR COMPUTER SYSTEMS
    A thesis submitted in fulfilment of the requirements for the degree of MASTER OF SCIENCE of RHODES UNIVERSITY by WAIDE BARRINGTON TRISTRAM, Grahamstown, South Africa, March 2011
    Abstract
    The availability of modern commodity multicore processors and multiprocessor computer systems has resulted in the widespread adoption of parallel computers in a variety of environments, ranging from the home to workstation and server environments in particular. Unfortunately, parallel programming is harder and requires more expertise than the traditional sequential programming model. The variety of tools and parallel programming models available to the programmer further complicates the issue. The primary goal of this research was to identify and describe a selection of parallel programming tools and techniques to aid novice parallel programmers in the process of developing efficient parallel C/C++ programs for the Linux platform. This was achieved by highlighting and describing the key concepts and hardware factors that affect parallel programming, providing a brief survey of commonly available software development tools and parallel programming models and libraries, and presenting structured approaches to software performance tuning and parallel programming. Finally, the performance of several parallel programming models and libraries was investigated, along with the programming effort required to implement solutions using the respective models. A quantitative research methodology was applied to the investigation of the performance and programming effort associated with the selected parallel programming models and libraries, which included automatic parallelisation by the compiler, Boost Threads, Cilk Plus, OpenMP, POSIX threads (Pthreads), and Threading Building Blocks (TBB).
  • VSM Cover Snipe
    SPECIAL SECTION: 2008 BUYERS GUIDE
    2008 Buyers Guide: Readers Choice Awards (p. 4), Product Listings (p. 6), Third-Party Tools Put the “Rapid” in RAD (p. 2)
    Editor’s Note: Third-Party Tools Put the “Rapid” in RAD, by Patrick Meader, editor in chief
    Welcome to the Visual Studio Magazine 2008 Buyers Guide supplement! Every year, the editors of Visual Studio Magazine survey the third-party market of tools and services for Visual Studio and compile a list of relevant products in areas that are of the most interest to Visual Studio developers. This year we compiled a list of more than 400 products and services across 22 categories (these begin on p. 6). Note that you won’t see any products from Microsoft listed in the categories; this is a survey of third-party solution providers, which by definition excludes Microsoft’s offerings. When compiling the list, we allow a product to be listed in only one category. In cases where a product fits more than one category (and this is frequently the case), we attempt to choose the closest category fit for that product.
    [...] include anything that covers DVD- or online-based training. We think K-Source is intriguing for several reasons, not least because it brings the notion of suites of controls that have proven so popular in the VS market to the area of training. K-Source gives you the ability to package together a wide range of online training subjects for your entire development team (see VSM’s review of K-Source on p. 12 of the August issue). The listings in the print version of this supplement provide the product name, company, and a Web site for each of the products within a given category. You can find a more detailed version of these listings at VisualStudioMagazine.com (Locator+
  • Nova.Simd - a Framework for Architecture-Independent SIMD Development
    nova.simd - A framework for architecture-independent SIMD development
    Tim BLECHMANN [email protected]
    Abstract
    Most CPUs provide instruction set extensions to make use of the Single Instruction Multiple Data (SIMD) programming paradigm; examples are the SSE and AVX instruction set families for IA32 and X86_64, Altivec on PPC or NEON on ARM. While compilers can do limited auto-vectorization, developers usually have to target each instruction set manually in order to get the full performance out of their code. Nova.simd provides a generic framework to write cross-platform code that makes use of data level parallelism by utilizing instruction sets.
    1 Introduction & Motivation
    Most processors provide instructions to make use of data-level parallelism via the Single In- [...] extended to integer data and double-precision floating point types with SSE2, and literally each new CPU generation added some more instructions for specific use cases. Some vendors provide specific libraries, but unfortunately most of them have specific restrictions, only abstract one specific instruction set or work only on a specific platform.
    Nova.simd was designed to provide a generic and easy to use framework to easily write SIMD code, which is independent from the instruction set. It provides ready-to-use vector functions, but also a generic framework to write generic vector code. It is a header-only C++ library that makes heavy use of templates and template metaprogramming techniques and currently supports the SSE and AVX families on IA32 and X86_64, Altivec on PPC and NEON on ARM.
  • AMD CodeXL 1.9 GA Release Notes
    AMD CodeXL 1.9 GA Release Notes
    Contents
      AMD CodeXL 1.9 GA Release Notes
        New in this version
        System Requirements
        Getting the latest Radeon software release
        Note about installing CodeAnalyst after installing CodeXL for Windows
        Fixed Issues
        Known Issues
        Support
    Thank you for using CodeXL. We appreciate any feedback you have! Please use the CodeXL Forum to provide your feedback. You can also check out the Getting Started guide on the CodeXL Web Page and the latest CodeXL blog at AMD Developer Central - Blogs.
    This version contains:
    For 64-bit Windows platforms
      - CodeXL Standalone application
      - CodeXL Microsoft® Visual Studio® 2010
  • AMD Technology & Software
    AMD Technology & Software
    Justin Boggs, Sr. Developer Relations Engineer [email protected]
    Q3’06
    Agenda
      - Desktop Overview
      - Processors & Roadmap
      - Software Architecture & Performance
      - Desktop Platforms
      - Future AMD Technology Directions
      - AMD Developer Resources
      - Call to Action
    AMD Desktop Overview
    AMD Desktop Advantage
      - Built from the ground up, AMD x86 processor technology makes it possible to improve responsiveness to changing business needs
        - AMD64 offers flexibility by supporting 32- and 64-bit applications across desktop, mobile and server applications
        - Direct Connect Architecture enables increased performance, scalability and improved multi-tasking
        - AMD Dual Core provides enhanced performance without increasing power requirements
        - AMD Cool ‘n’ Quiet™ decreases overall power consumption by optimizing performance on demand
        - Enhanced Virus Protection adds an extra level of virus protection to your security solution
    AMD64™ Powerful 64-Bit Computing
      - Technology that gives total backward compatibility with leading-edge 32-bit computing performance.
      - Technology that paves the way to multi-core computing with cutting-edge communications technology.
      - Technology is more than 64-bit computing; it’s also about next-generation architecture.
      - Technology that allows software developers to create new functionality for end users.
      - Technology that solves real problems.
    AMD Direct Connect Architecture
    Direct Connect Architecture moves more data more efficiently,