The POWER4 Processor Introduction and Tuning Guide

Comprehensive explanation of POWER4 performance
Includes code examples and performance measurements
How to get the most from the compiler

Steve Behling
Ron Bell
Peter Farrell
Holger Holthoff
Frank O’Connell
Will Weir

ibm.com/redbooks

International Technical Support Organization

November 2001

SG24-7041-00

Take Note! Before using this information and the product it supports, be sure to read the general information in “Special notices” on page 175.

First Edition (November 2001)

This edition applies to AIX 5L for POWER Version 5.1 (program number 5765-E61), XL Fortran Version 7.1.1 (5765-C10 and 5765-C11) and subsequent releases running on an IBM eServer pSeries POWER4-based server. Unless otherwise noted, all performance values mentioned in this document were measured on a 1.1 GHz machine, then normalized to 1.3 GHz.

Note: This book is based on a pre-GA version of a product and may not apply when the product becomes generally available. We recommend that you consult the product documentation or follow-on versions of this redbook for more current information.

Comments may be addressed to:
IBM Corporation, International Technical Support Organization
Dept. JN9B Building 003 Internal Zip 2834
11400 Burnet Road
Austin, Texas 78758-3493

When you send information to IBM, you grant IBM a non-exclusive right to use or distribute the information in any way it believes appropriate without incurring any obligation to you.

© Copyright International Business Machines Corporation 2001. All rights reserved.
Note to U.S. Government Users – Documentation related to restricted rights – Use, duplication or disclosure is subject to restrictions set forth in GSA ADP Schedule Contract with IBM Corp.

Contents

Figures
Tables
Preface
  The team that wrote this redbook
  Notice
  IBM trademarks
  Comments welcome

Chapter 1. Processor evolution
  1.1 POWER1
  1.2 POWER2
  1.3 PowerPC
  1.4 RS64
  1.5 POWER3
  1.6 POWER4

Chapter 2. The POWER4 system
  2.1 POWER4 system overview
  2.2 The POWER4 chip
  2.3 Processor overview
    2.3.1 The POWER4 processor execution pipeline
    2.3.2 Instruction fetch, group formation, and dispatch
    2.3.3 Instruction execution, speculation, rename resources
    2.3.4 Branch prediction
    2.3.5 Translation buffers (TLB, SLB, I- and D-ERAT)
    2.3.6 Load instruction processing
    2.3.7 Store instruction processing
    2.3.8 Fixed-point execution pipeline
    2.3.9 Floating-point execution pipeline
    2.3.10 Group completion
  2.4 Storage hierarchy
    2.4.1 L1 instruction cache
    2.4.2 L1 data cache
    2.4.3 L2 cache
    2.4.4 L3 cache
    2.4.5 Interconnecting chips to form larger SMPs
    2.4.6 Multiple module interconnect
    2.4.7 Memory subsystem
    2.4.8 Hardware data prefetch
    2.4.9 Memory/L3 cache command queue structure
  2.5 I/O structure
  2.6 The POWER4 Performance Monitor

Chapter 3. POWER4 system performance and tuning
  3.1 Tuning for numerically intensive applications
    3.1.1 The tuning process for numerically intensive applications
    3.1.2 Hand tuning overview for numerically intensive programs
    3.1.3 Key aspects of the POWER4 design
    3.1.4 Tuning for the memory subsystem
    3.1.5 Tuning for the FPUs
    3.1.6 Cache and memory latency measurement
    3.1.7 Selected fundamental kernel performance within on-chip cache
    3.1.8 Other tuning considerations
  3.2 Tuning non-floating point applications
    3.2.1 The load/store and integer units
    3.2.2 Memory configurations
  3.3 System tuning
    3.3.1 POWER4 virtual memory architecture overview
    3.3.2 Small and large page sizes
    3.3.3 AIX system parameters
    3.3.4 Minimizing variation in job performance

Chapter 4. Optimizing with the compilers
  4.1 POWER4-specific compiler options
    4.1.1 General performance options
    4.1.2 Options for POWER4
    4.1.3 Using XL Fortran vector-intrinsic functions
    4.1.4 Recommended options
    4.1.5 Comparing C and Fortran compiler code generation
  4.2 XL Fortran compiler directives for tuning
    4.2.1 Prefetch directives
    4.2.2 Loop-related directives
    4.2.3 Cache and other directives
  4.3 The object code listing
  4.4 Basic coding practices for performance
    4.4.1 Language-independent tips
    4.4.2 Fortran tips
    4.4.3 C and C++ tips
    4.4.4 Inlining procedure references
    4.4.5 Structuring code for optimal grouping
  4.5 Tuning for 64-bit integer performance

Chapter 5. General tuning guidelines
  5.1 Hand tuning code
    5.1.1 Local or global variables?
    5.1.2 Pointers
    5.1.3 Expressions
    5.1.4 Data type conversions
    5.1.5 Tuning loops
  5.2 Using pre-tuned code
  5.3 The performance monitor
  5.4 Tuning for I/O ...
