Intel Pentium 4 and Intel Xeon Processor Optimization
Intel® Pentium® 4 and Intel® Xeon™ Processor Optimization Reference Manual

Issued in U.S.A.
Order Number: 248966-007
World Wide Web: http://developer.intel.com

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL’S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT. Intel products are not intended for use in medical, life saving, or life sustaining applications.

Intel may make changes to specifications and product descriptions at any time, without notice.

This Intel Pentium 4 and Intel Xeon Processor Optimization Reference Manual as well as the software described in it is furnished under license and may only be used or copied in accordance with the terms of the license. The information in this manual is furnished for informational use only, is subject to change without notice, and should not be construed as a commitment by Intel Corporation. Intel Corporation assumes no responsibility or liability for any errors or inaccuracies that may appear in this document or any software that may be provided in association with this document.

Except as permitted by such license, no part of this document may be reproduced, stored in a retrieval system, or transmitted in any form or by any means without the express written consent of Intel Corporation.
Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them.

Intel, Pentium, Intel Xeon, Intel NetBurst, Itanium, MMX, and VTune are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries.

* Other names and brands may be claimed as the property of others.

Copyright © 1999-2002 Intel Corporation.

Contents

Introduction
  About This Manual    xxii
  Related Documentation    xxiv
  Notational Conventions    xxv

Chapter 1 Intel® Pentium® 4 and Intel® Xeon™ Processor Overview
  SIMD Technology and Streaming SIMD Extensions 2    1-2
    Summary of SIMD Technologies    1-5
      MMX™ Technology    1-5
      Streaming SIMD Extensions    1-5
      Streaming SIMD Extensions 2    1-6
  Intel® NetBurst™ Micro-architecture    1-6
    The Design Considerations of the Intel NetBurst Micro-architecture    1-7
    Overview of the Intel NetBurst Micro-architecture Pipeline    1-8
      The Front End    1-10
      The Out-of-order Core    1-11
      Retirement    1-11
    Front End Pipeline Detail    1-12
      Prefetching    1-12
      Decoder    1-13
      Execution Trace Cache    1-13
      Branch Prediction    1-13
      Branch Hints    1-15
    Execution Core Detail    1-15
      Instruction Latency and Throughput    1-16
      Execution Units and Issue Ports    1-17
      Caches    1-19
      Data Prefetch    1-20
      Loads and Stores    1-22
      Store Forwarding    1-23
  Hyper-Threading Technology    1-23
    Processor Resources and Hyper-Threading Technology    1-25
      Replicated Resources    1-25
      Partitioned Resources    1-26
      Shared Resources    1-26
    Microarchitecture Pipeline and Hyper-Threading Technology    1-26
    Front End Pipeline    1-27
    Execution Core    1-27
    Retirement    1-27

Chapter 2 General Optimization Guidelines
  Tuning to Achieve Optimum Performance    2-1
  Tuning to Prevent Known Coding Pitfalls    2-2
  General Practices and Coding Guidelines    2-3
    Use Available Performance Tools    2-3
    Optimize Performance Across Processor Generations    2-4
    Optimize Branch Predictability    2-4
    Optimize Memory Access    2-4
    Optimize Floating-point Performance    2-5
    Optimize Instruction Selection    2-5
    Optimize Instruction Scheduling    2-6
    Enable Vectorization    2-6
  Coding Rules, Suggestions and Tuning Hints    2-6
  Performance Tools    2-7
    Intel® C++ Compiler    2-7
    General Compiler Recommendations    2-8
    VTune™ Performance Analyzer    2-9
  Processor Generations Perspective    2-9
    The CPUID Dispatch Strategy and Compatible Code Strategy    2-11
  Branch Prediction    2-12
    Eliminating Branches    2-12
    Spin-Wait and Idle Loops    2-15
    Static Prediction    2-15
    Branch Hints    2-17
    Inlining, Calls and Returns    2-18
    Branch Type Selection    2-19
    Loop Unrolling    2-22
    Compiler Support for Branch Prediction    2-24
  Memory Accesses    2-24
    Alignment    2-25
    Store Forwarding    2-27
      Store-forwarding Restriction on Size and Alignment    2-28
      Store-forwarding Restriction on Data Availability    2-32
    Data Layout Optimizations    2-34
    Stack Alignment    2-37
    Aliasing Cases    2-38
    Mixing Code and Data    2-39
      Self-modifying Code    2-40
    Write Combining    2-41
    Locality Enhancement