AMD Athlon™ Processor x86 Code Optimization Guide
Publication No. 22007, Revision K, Date: February 2002

© 2001, 2002 Advanced Micro Devices, Inc. All rights reserved.

The contents of this document are provided in connection with Advanced Micro Devices, Inc. (“AMD”) products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel or otherwise, to any intellectual property rights is granted by this publication. Except as set forth in AMD’s Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right.

AMD’s products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD’s product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

Trademarks

AMD, the AMD Arrow logo, AMD Athlon, and combinations thereof, 3DNow!, AMD-751, and Super7 are trademarks, and AMD-K6 and AMD-K6-2 are registered trademarks of Advanced Micro Devices, Inc.

Microsoft, Windows, and Windows NT are registered trademarks of Microsoft Corporation.

MMX is a trademark and Pentium is a registered trademark of Intel Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.
Contents

List of Figures
List of Tables
Revision History

Chapter 1 Introduction
  About This Document
  AMD Athlon™ Processor Family
  AMD Athlon Processor Microarchitecture Summary

Chapter 2 Top Optimizations
  Optimization Star
  Group I Optimizations: Essential Optimizations
  Memory-Size and Alignment Issues
  Use the 3DNow!™ Prefetching Instructions
  Select DirectPath Over VectorPath Instructions
  Group II Optimizations: Secondary Optimizations
  Load-Execute Instruction Usage
  Take Advantage of Write Combining
  Optimizing Main Memory Performance for Large Arrays
  Use 3DNow! Instructions
  Recognize 3DNow! Professional Instructions
  Avoid Branches Dependent on Random Data
  Avoid Placing Code and Data in the Same 64-Byte Cache Line

Chapter 3 C Source-Level Optimizations
  Ensure Floating-Point Variables and Expressions are of Type Float
  Use 32-Bit Data Types for Integer Code
  Consider the Sign of Integer Operands
  Use Array-Style Instead of Pointer-Style Code
  Completely Unroll Small Loops
  Avoid Unnecessary Store-to-Load Dependencies
  Always Match the Size of Stores and Loads
  Consider Expression Order in Compound Branch Conditions
  Switch Statement Usage
  Use Prototypes for All Functions
  Use Const Type Qualifier
  Generic Loop Hoisting
  Declare Local Functions as Static
  Dynamic Memory Allocation Consideration
  Introduce Explicit Parallelism into Code
  Explicitly Extract Common Subexpressions
  C Language Structure Component Considerations
  Sort Local Variables According to Base Type Size
  Accelerating Floating-Point Divides and Square Roots
  Fast Floating-Point-to-Integer Conversion
  Speeding Up Branches Based on Comparisons Between Floats
  Avoid Unnecessary Integer Division
  Copy Frequently Dereferenced Pointer Arguments to Local Variables
  Use Block Prefetch Optimizations

Chapter 4 Instruction Decoding Optimizations
  Overview
  Select DirectPath Over VectorPath Instructions
  Load-Execute Instruction Usage
  Use Load-Execute Integer Instructions
  Use Load-Execute Floating-Point Instructions with Floating-Point Operands
  Avoid Load-Execute Floating-Point Instructions with Integer Operands
  Use Read-Modify-Write Instructions Where Appropriate
  Align Branch Targets in Program Hot Spots
  Use 32-Bit LEA Rather than 16-Bit LEA Instruction
  Use Short Instruction Encodings
  Avoid Partial-Register Reads and Writes
  Use LEAVE Instruction for Function Epilogue Code
  Replace Certain SHLD Instructions with Alternative Code
  Use 8-Bit Sign-Extended Immediates
  Use 8-Bit Sign-Extended Displacements
  Code Padding Using Neutral Code Fillers
  Recommendations for AMD-K6® Family and AMD Athlon Processor Blended Code

Chapter 5 Cache and Memory Optimizations
  Memory Size and Alignment Issues
  Avoid Memory-Size Mismatches
  Align Data Where Possible
  Optimizing Main Memory Performance for Large Arrays
  Memory Copy Optimization
  Array Addition
  Summary
  Use the PREFETCH 3DNow!™ Instruction
  Determining Prefetch Distance
  Take Advantage of Write Combining
  Avoid Placing Code and Data in the Same 64-Byte Cache Line
  Multiprocessor Considerations
  Store-to-Load Forwarding Restrictions
  Store-to-Load Forwarding Pitfalls: True Dependencies
  Summary of Store-to-Load Forwarding Pitfalls to Avoid
  Stack Alignment Considerations
  Align TBYTE Variables on Quadword Aligned Addresses
  C Language Structure Component Considerations
  Sort Variables According to Base Type Size

Chapter 6 Branch Optimizations
  Avoid Branches Dependent on Random Data
  AMD Athlon Processor Specific Code
  Blended AMD-K6 and AMD Athlon Processor Code
  Always Pair CALL and RETURN
  Recursive Functions
  Replace Branches with Computation in 3DNow! Code
  Muxing Constructs
  Sample Code Translated into 3DNow! Code
  Avoid the Loop Instruction
  Avoid Far Control Transfer Instructions

Chapter 7 Scheduling Optimizations
  Schedule Instructions According to their Latency
  Unrolling Loops
  Complete Loop Unrolling
  Partial Loop Unrolling
  Use Function Inlining
  Overview
  Always Inline Functions if Called from One Site
  Always Inline Functions with Fewer than 25 Machine Instructions