Software Optimization Guide for AMD Family 15h Processors

Publication No.: 47414
Revision: 3.08
Date: January 2014

Advanced Micro Devices

© 2014 Advanced Micro Devices Inc. All rights reserved.

The information contained herein is for informational purposes only, and is subject to change without notice. While every precaution has been taken in the preparation of this document, it may contain technical inaccuracies, omissions and typographical errors, and AMD is under no obligation to update or otherwise correct this information. Advanced Micro Devices, Inc. makes no representations or warranties with respect to the accuracy or completeness of the contents of this document, and assumes no liability of any kind, including the implied warranties of noninfringement, merchantability or fitness for particular purposes, with respect to the operation or use of AMD hardware, software or other products described herein. No license, including implied or arising by estoppel, to any intellectual property rights is granted by this document. Terms and limitations applicable to the purchase or use of AMD’s products are as set forth in a signed agreement between the parties or in AMD's Standard Terms and Conditions of Sale.

Trademarks

AMD, the AMD Arrow logo, and combinations thereof, AMD Athlon, AMD Opteron, 3DNow!, AMD Virtualization, and AMD-V are trademarks of Advanced Micro Devices, Inc. HyperTransport is a licensed trademark of the HyperTransport Technology Consortium. Linux is a registered trademark of Linus Torvalds. Microsoft and Windows are registered trademarks of Microsoft Corporation. MMX is a trademark of Intel Corporation. PCI-X and PCI Express are registered trademarks of the PCI-Special Interest Group (PCI-SIG). Solaris is a registered trademark of Sun Microsystems, Inc. Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

Contents

Tables
Figures
Revision History

Chapter 1 Introduction
  1.1 Intended Audience
  1.2 Getting Started
  1.3 Using This Guide
    1.3.1 Special Information
    1.3.2 Numbering Systems
    1.3.3 Typographic Notation
  1.4 Important New Terms
    1.4.1 Multi-Core Processors
    1.4.2 Internal Instruction Formats
    1.4.3 Types of Instructions
  1.5 Key Optimizations
    1.5.1 Implementation Guideline
  1.6 What’s New on AMD Family 15h Processors
    1.6.1 AMD Instruction Set Enhancements
    1.6.2 Floating-Point Improvements
    1.6.3 Load-Execute Instructions for Unaligned Data
    1.6.4 Instruction Fetching Improvements
    1.6.5 Instruction Decode and Floating-Point Pipe Improvements
    1.6.6 Notable Performance Improvements
    1.6.7 Additional Enhancements for Models 30h–4Fh
    1.6.8 AMD Virtualization™ Optimizations

Chapter 2 Microarchitecture of AMD Family 15h Processors
  2.1 Key Microarchitecture Features
  2.2 Microarchitecture of AMD Family 15h Processors
  2.3 Superscalar Processor
  2.4 Processor Block Diagram
  2.5 AMD Family 15h Processor Cache Operations
    2.5.1 L1 Instruction Cache
    2.5.2 L1 Data Cache
    2.5.3 L2 Cache
    2.5.4 L3 Cache
  2.6 Branch-Prediction
  2.7 Instruction Fetch and Decode
  2.8 Integer Execution
  2.9 Translation-Lookaside Buffer
    2.9.1 L1 Instruction TLB Specifications
    2.9.2 L1 Data TLB Specifications
    2.9.3 L2 Instruction TLB Specifications
    2.9.4 L2 Data TLB Specifications
  2.10 Integer Unit
    2.10.1 Integer Scheduler
    2.10.2 Integer Execution Unit
  2.11 Floating-Point Unit
  2.12 Load-Store Unit
  2.13 Write Combining
  2.14 Integrated Memory Controller
  2.15 HyperTransport™ Technology Interface
    2.15.1 HyperTransport Assist

Chapter 3 C and C++ Source-Level Optimizations
  3.1 Declarations of Floating-Point Values
  3.2 Using Arrays and Pointers
  3.3 Use of Function Prototypes
  3.4 Unrolling Small Loops
  3.5 Expression Order in Compound Branch Conditions
  3.6 Arrange Boolean Operands for Quick Expression Evaluation
  3.7 Long Logical Expressions in If Statements
  3.8 Pointer Alignment
  3.9 Unnecessary Store-to-Load Dependencies
  3.10 Matching Store and Load Size
  3.11 Use of const Type Qualifier
  3.12 Generic Loop Hoisting
  3.13 Local Static Functions
  3.14 Explicit Parallelism in Code
  3.15 Extracting Common Subexpressions
  3.16 Sorting and Padding C and C++ Structures
  3.17 Replacing Integer Division with Multiplication
  3.18 Frequently Dereferenced Pointer Arguments
  3.19 32-Bit Integral Data Types

