
Dual-Core Intel® Itanium® 2 Processor: Reference Manual Update

Dual-Core Update to the Intel® Itanium® 2 Processor Reference Manual

For Software Development and Optimization

Revision 0.9

January 2006

Document Number: 308065-001

Notice: This document contains information on products in the design phase of development. The information here is subject to change without notice. Do not finalize a design with this information.

INFORMATION IN THIS DOCUMENT IS PROVIDED IN CONNECTION WITH INTEL® PRODUCTS. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. EXCEPT AS PROVIDED IN INTEL'S TERMS AND CONDITIONS OF SALE FOR SUCH PRODUCTS, INTEL ASSUMES NO LIABILITY WHATSOEVER, AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO SALE AND/OR USE OF INTEL PRODUCTS INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Intel products are not intended for use in medical, life saving, or life sustaining applications. Intel may make changes to specifications and product descriptions at any time, without notice. Designers must not rely on the absence or characteristics of any features or instructions marked “reserved” or “undefined.” Intel reserves these for future definition and shall have no responsibility whatsoever for conflicts or incompatibilities arising from future changes to them. The Itanium 2 processor may contain design defects or errors known as errata which may cause the product to deviate from published specifications. Current characterized errata are available on request.

The code name “Montecito” presented in this document is only for use by Intel to identify a product, technology, or service in development that has not been made commercially available to the public, i.e., announced, launched or shipped. It is not a “commercial” name for products or services and is not intended to function as a trademark.

Contact your local Intel sales office or your distributor to obtain the latest specifications before placing your product order. Copies of documents which have an order number and are referenced in this document, or other Intel literature, may be obtained by calling 1-800-548-4725, or by visiting Intel's web site at http://www.intel.com.

Intel, Itanium, VTune and the Intel logo are trademarks or registered trademarks of Intel Corporation or its subsidiaries in the United States and other countries. Copyright © 2006, Intel Corporation. All rights reserved. *Other names and brands may be claimed as the property of others.

Contents

1 Introduction
  1.1 Terminology
  1.2 Related Documentation

2 The Dual-Core Itanium 2 Processor
  2.1 Overview
    2.1.1 Identifying the Dual-Core Itanium 2 Processor
    2.1.2 Introducing Montecito
  2.2 New Instructions
  2.3 Core
    2.3.1 Instruction Slot to Functional Unit Mapping
    2.3.2 Instruction Latencies and Bypasses
    2.3.3 Caches and Cache Management Changes
  2.4 Threading
    2.4.1 Sharing Core Resources
    2.4.2 Tailoring Thread Switch Behavior
    2.4.3 Sharing Cache and Memory Resources
  2.5 Dual Cores
    2.5.1 Fairness and Arbitration
  2.6 Intel® Virtualization Technology
  2.7 Tips and Tricks
    2.7.1 Cross Modifying Code
    2.7.2 ld.bias and lfetch.excl
    2.7.3 L2D Victimization Optimization
    2.7.4 Instruction Optimization
  2.8 IA-32 Execution
  2.9 Brand Information

3 Performance Monitoring
  3.1 Introduction to Performance Monitoring
  3.2 Performance Monitor Programming Models
    3.2.1 Workload Characterization
    3.2.2 Profiling
    3.2.3 Event Qualification
    3.2.4 References
  3.3 Performance Monitor State
    3.3.1 Performance Monitor Control and Accessibility
    3.3.2 Performance Counter Registers
    3.3.3 Performance Monitor Event Counting Restrictions Overview
    3.3.4 Performance Monitor Overflow Status Registers (PMC0,1,2,3)
    3.3.5 Instruction Address Range Matching
    3.3.6 Opcode Match Check (PMC32,33,34,35,36)
    3.3.7 Data Address Range Matching (PMC41)
    3.3.8 Instruction EAR (PMC37/PMD34,35)
    3.3.9 Data EAR (PMC40, PMD32,33,36)
    3.3.10 Execution Trace Buffer (PMC39,42, PMD48-63,38,39)
    3.3.11 Interrupts
    3.3.12 Processor Reset, PAL Calls, and Low Power State

4 Performance Monitor Events
  4.1 Introduction
  4.2 Categorization of Events
    4.2.1 Hyper-Threading and Event Types
  4.3 Basic Events
  4.4 Instruction Dispersal Events
  4.5 Instruction Execution Events
  4.6 Stall Events
  4.7 Branch Events
  4.8 Memory Hierarchy
    4.8.1 L1 Instruction Cache and Prefetch Events
    4.8.2 L1 Data Cache Events
    4.8.3 L2 Instruction Cache Events
    4.8.4 L2 Data Cache Events
    4.8.5 L3 Cache Events
  4.9 System Events
  4.10 TLB Events
  4.11 System Bus Events
    4.11.1 System Bus Conventions
    4.11.2 Extracting Memory Latency from Montecito Performance Counters
  4.12 RSE Events
  4.13 Hyper-Threading Events
  4.14 Performance Monitors Ordered by Event Code
  4.15 Performance Monitor Event List

Figures

2-1 The Montecito Processor
2-2 Urgency and Thread Switching
2-3 The Arbiter and Queues
3-1 Time-Based Sampling
3-2 Itanium® Processor Family Cycle Accounting
3-3 Event Histogram by Program Counter
3-4 Montecito Processor Event Qualification
3-5 Instruction Tagging Mechanism in the Montecito Processor
3-6 Single Process Monitor
3-7 Multiple Process Monitor
3-8 System Wide Monitor
3-9 Montecito Processor Performance Monitor Register Model
3-10 Processor Status Register (PSR) Fields for Performance Monitoring
3-11 Montecito Processor Generic PMC Registers (PMC4-15)
3-12 Montecito Processor Generic PMD Registers (PMD4-15)
3-13 Montecito Processor Performance Monitor Overflow Status Registers (PMC0,1,2,3)
3-14 Instruction Address Range Configuration Register (PMC38)
3-15 Opcode Match Registers (PMC32,34)
3-16 Opcode Match Registers (PMC33,35)
3-17 Opcode Match Configuration Register (PMC36)
3-18 Memory Pipeline Event Constraints Configuration Register (PMC41)
3-19 Instruction Event Address Configuration Register (PMC37)
3-20 Instruction Event Address Register Format (PMD34,35)
3-21 Data Event Address Configuration Register (PMC40)
3-22 Data Event Address Register Format (PMD32,33,36)
3-23 Execution Trace Buffer Configuration Register (PMC39)
3-24 Execution Trace Buffer Register Format (PMD48-63, where PMC39.ds == 0)
3-25 Execution Trace Buffer Index Register Format (PMD38)
3-26 Execution Trace Buffer Extension Register Format (PMD39) (PMC42.mode=‘1xx)
3-27 IP-EAR Configuration Register (PMC42)
3-28 IP-EAR Data Format (PMD48-63, where PMC42.mode == 100 and PMD48-63.ef = 0)
3-29 IP-EAR Data Format (PMD48-63, where PMC42.mode == 100 and PMD48-63.ef = 1)
3-30 IP Trace Buffer Index Register Format (PMD38) (PMC42.mode=‘1xx)
3-31 IP Trace Buffer Extension Register Format (PMD39) (PMC42.mode=‘1xx)
4-1 Event Monitors in the Itanium® 2 Processor Memory Hierarchy
4-2 Extracting Memory Latency from PMUs

Tables

2-1 Itanium® Processor Family and Model Values
2-2 Definition Table
2-3 New Instructions Available in Montecito
2-4 A-Type Instruction Port Mapping
2-5 B-Type Instruction Port Mapping
2-6 I-Type Instruction Port Mapping
2-7 M-Type Instruction Port Mapping
2-8 Execution with Bypass Latency Summary
2-9 Montecito Cache Hierarchy Summary
2-10 PAL_BRAND_INFO Implementation-Specific Return Values
2-11 Montecito Processor Feature Set Return Values
3-1 Average Latency per Request and Requests per Cycle Calculation Example
3-2 Montecito Processor EARs and Branch Trace Buffer
3-3 Montecito Processor Event Qualification Modes
3-4 Montecito Processor Performance Monitor Register Set
3-5 Performance Monitor PMC Register Control Fields (PMC4-15)
3-6 Montecito Processor Generic PMC Register Fields (PMC4-15)
3-7 Montecito Processor Generic PMD Register Fields
3-8 Montecito Processor Performance Monitor Overflow Register Fields (PMC0,1,2,3)
3-9 Montecito Processor Instruction Address Range Check by Instruction Set
3-10 Instruction Address Range Configuration Register Fields (PMC38)
3-11 Opcode Match Registers (PMC32,34)
3-12 Opcode Match Registers (PMC33,35)
3-13 Opcode Match Configuration Register Fields (PMC36)
3-14 Memory Pipeline Event Constraints Fields (PMC41)
3-15 Instruction Event Address Configuration Register Fields (PMC37)
3-16 Instruction EAR (PMC37) umask Field in Cache Mode (PMC37.ct=’1x)
3-17 Instruction EAR (PMD34,35) in Cache Mode (PMC37.ct=’1x)
3-18 Instruction EAR (PMC37) umask Field in TLB Mode (PMC37.ct=‘00)
3-19 Instruction EAR (PMD34,35) in TLB Mode (PMC37.ct=‘00)
3-20 Data Event Address Configuration Register Fields (PMC40)
3-21 Data EAR (PMC40) umask Fields in Data Cache Mode (PMC40.mode=‘00)
3-22 PMD32,33,36 Fields in Data Cache Load Miss Mode (PMC40.mode=‘00)
3-23 Data EAR (PMC40) umask Field in TLB Mode (PMC40.mode=‘01)
3-24 PMD32,33,36 Fields in TLB Miss Mode (PMC40.mode=‘01)
3-25 PMD32,33,36 Fields in ALAT Miss Mode (PMC40.mode=‘1x)
3-26 Execution Trace Buffer Configuration Register Fields (PMC39)
3-27 Execution Trace Buffer Register Fields (PMD48-63) (PMC42.mode=‘000)
3-28 Execution Trace Buffer Index Register Fields (PMD38)
3-29 Execution Trace Buffer Extension Register Fields (PMD39) (PMC42.mode=‘1xx)
3-30 IP-EAR Configuration Register Fields (PMC42)
3-31 IP-EAR Data Register Fields (PMD48-63) (PMC42.mode=‘1xx)
3-32 IP Trace Buffer Index Register Fields (PMD38) (PMC42.mode=‘1xx)
3-33 IP Trace Buffer Extension Register Fields (PMD39) (PMC42.mode=‘1xx)
3-34 Information Returned by PAL_PERF_MON_INFO for the Montecito Processor
4-1 Performance Monitors for Basic Events
4-2 Derived Monitors for Basic Events
4-3 Performance Monitors for Instruction Dispersal Events
4-4 Performance Monitors for Instruction Execution Events
4-5 Derived Monitors for Instruction Execution Events
4-6 Performance Monitors for Stall Events
4-7 Performance Monitors for Branch Events
4-8 Performance Monitors for L1/L2 Instruction Cache and Prefetch Events
4-9 Derived Monitors for L1 Instruction Cache and Prefetch Events
4-10 Performance Monitors for L1 Data Cache Events
4-11 Performance Monitors for L1D Cache Set 0
4-12 Performance Monitors for L1D Cache Set 1
4-13 Performance Monitors for L1D Cache Set 2
4-14 Performance Monitors for L1D Cache Set 3
4-15 Performance Monitors for L1D Cache Set 4
4-16 Performance Monitors for L1D Cache Set 6
4-19 Performance Monitors for L2 Data Cache Events
4-20 Derived Monitors for L2 Data Cache Events
4-21 Performance Monitors for L2 Data Cache Set 0
4-22 Performance Monitors for L2 Data Cache Set 1
4-23 Performance Monitors for L2 Data Cache Set 2
4-24 Performance Monitors for L2 Data Cache Set 3
4-25 Performance Monitors for L2 Data Cache Set 4
4-26 Performance Monitors for L2 Data Cache Set 5
4-27 Performance Monitors for L2 Data Cache Set 6
4-28 Performance Monitors for L2 Data Cache Set 7
4-29 Performance Monitors for L2 Data Cache Set 8
4-30 Performance Monitors for L2D Cache - Not Set Restricted
4-31 Performance Monitors for L3 Unified Cache Events
4-32 Derived Monitors for L3 Unified Cache Events
4-33 Performance Monitors for System Events
4-34 Derived Monitors for System Events
4-35 Performance Monitors for TLB Events
4-36 Derived Monitors for TLB Events
4-37 Performance Monitors for System Bus Events
4-38 Derived Monitors for System Bus Events
4-39 Performance Monitors for RSE Events
4-40 Derived Monitors for RSE Events
4-41 Performance Monitors for Multi-thread Events
4-42 All Performance Monitors Ordered by Code

Revision History

Document Number | Revision Number | Description | Date

308065-001 | 0.9 | Initial release of the document. | January 2006

1 Introduction

This document is an update to the Intel® Itanium® 2 Processor Reference Manual for Software Development and Optimization. This update is meant to give guidance on the changes that the dual-core Intel® Itanium® 2 processor, code named Montecito, brings to the existing Itanium 2 processor family.

1.1 Terminology

The following definitions are for terms that will be used throughout this document:

Term | Definition
Dispersal | The process of mapping instructions within bundles to functional units.
Bundle rotation | The process of bringing new bundles into the two-bundle issue window.
Split issue | Instruction execution when an instruction does not issue at the same time as the instruction immediately before it.
Advanced load address table (ALAT) | The ALAT holds the state necessary for advanced load and check operations.
Translation lookaside buffer (TLB) | The TLB holds virtual to physical address mappings.
Virtual hash page table (VHPT) | The VHPT is an extension of the TLB hierarchy that resides in virtual memory and is designed to enhance virtual address translation performance.
Hardware page walker (HPW) | The HPW is the third level of address translation. It is an engine that performs page look-ups from the VHPT and seeks opportunities to insert translations into the processor TLBs.
Register stack engine (RSE) | The RSE moves registers between the register stack and the backing store in memory.
Event address registers (EARs) | The EARs record the instruction and data addresses of data cache misses.

1.2 Related Documentation

The reader of this document should also be familiar with the material and concepts presented in the following documents:

• Intel® Itanium® Architecture Software Developer’s Manual, Volume 1: Application Architecture
• Intel® Itanium® Architecture Software Developer’s Manual, Volume 2: System Architecture
• Intel® Itanium® Architecture Software Developer’s Manual, Volume 3: Instruction Set Reference

§

2 The Dual-Core Itanium 2 Processor

2.1 Overview

The first dual-core Itanium 2 processor, code named Montecito, is the fourth generation of the Itanium 2 processor. Montecito builds on the strengths of the previous Itanium 2 processors while bringing many new key technologies for performance and management to the Itanium processor family. Key improvements include multiple cores, multiple threads, a larger cache hierarchy, and enhanced speculation through the addition of new instructions.

This document describes key Montecito features and how Montecito differs in its implementation of the Itanium architecture from previous Itanium 2 processors. Some of this information may not be directly applicable to performance tuning, but is certainly needed to better understand and interpret changes in application behavior on Montecito versus other Itanium architecture-based processors. Unless otherwise stated, all of the restrictions, rules, sizes, and capacities described in this document apply specifically to Montecito and may not apply to other Itanium architecture-based processors. This document assumes familiarity with the previous Itanium 2 processors and some of their unique properties and behaviors. Furthermore, only differences as they relate to performance are included here. Information about Montecito features such as error protection, Intel Virtualization Technology, Hyper-Threading Technology, and lockstep support may be obtained in separate documents.

General understanding of processor components and explicit familiarity with Itanium processor instructions are assumed. This document is not intended to be used as an architectural reference for the Itanium architecture. For more information on the Itanium architecture, consult the Intel® Itanium® Architecture Software Developer’s Manual.

2.1.1 Identifying the Dual-Core Itanium 2 Processor

There have now been four generations of the Itanium 2 processor, which can be identified by their unique CPUID values. For simplicity of documentation, this document groups all processors of like model together. Table 2-1 details the CPUID values of all of the Itanium processor family generations. Table 2-2 lists the varieties of the Itanium processor family that are available, along with their groupings.

Note that the Montecito CPUID family value changes to 0x20.

Table 2-1. Itanium® Processor Family and Model Values

Family | Model | Description
0x07 | 0x00 | Itanium® Processor
0x1f | 0x00 | Itanium 2 Processor (up to 3 MB L3 cache)
0x1f | 0x01 | Itanium 2 Processor (up to 6 MB L3 cache)
0x1f | 0x02 | Itanium 2 Processor (up to 9 MB L3 cache)
0x20 | 0x00 | Dual-Core Itanium 2 Processor (Montecito)
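The family and model fields can be read at user level from CPUID register 3 using the indirect register move (the M43 mov =cpuid[] form in Table 2-7). The following is a minimal sketch, assuming GCC inline assembly on an ia64 Linux target; the field offsets (family in bits 31:24, model in bits 23:16) follow the architectural definition of CPUID register 3:

    #include <stdio.h>

    /* Read CPUID register 3 (version information). */
    static unsigned long read_cpuid3(void)
    {
        unsigned long val, idx = 3;
        __asm__ ("mov %0 = cpuid[%1]" : "=r" (val) : "r" (idx));
        return val;
    }

    int main(void)
    {
        unsigned long v = read_cpuid3();
        unsigned family = (unsigned)((v >> 24) & 0xff);  /* bits 31:24 */
        unsigned model  = (unsigned)((v >> 16) & 0xff);  /* bits 23:16 */

        if (family == 0x20)
            printf("Dual-Core Itanium 2 (Montecito), model 0x%02x\n", model);
        else if (family == 0x1f)
            printf("Itanium 2 processor, model 0x%02x\n", model);
        else
            printf("family 0x%02x, model 0x%02x\n", family, model);
        return 0;
    }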


Table 2-2. Definition Table

Processor | Abbreviation
Intel® Itanium® 2 Processor 900 MHz with 1.5 MB L3 Cache | Itanium 2 Processor (up to 3 MB L3 cache)
Intel® Itanium® 2 Processor 1.0 GHz with 3 MB L3 Cache | Itanium 2 Processor (up to 3 MB L3 cache)
Low Voltage Intel® Itanium® 2 Processor 1.0 GHz with 1.5 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.40 GHz with 1.5 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.40 GHz with 3 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.60 GHz with 3 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.30 GHz with 3 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.40 GHz with 4 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Intel® Itanium® 2 Processor 1.50 GHz with 6 MB L3 Cache | Itanium 2 Processor (up to 6 MB L3 cache)
Low Voltage Intel® Itanium® 2 Processor 1.30 GHz with 3 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.60 GHz with 3 MB L3 Cache at 400 and 533 MHz System Bus (DP Optimized) | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.50 GHz with 4 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.60 GHz with 6 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.60 GHz with 9 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.66 GHz with 6 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Intel® Itanium® 2 Processor 1.66 GHz with 9 MB L3 Cache | Itanium 2 Processor (up to 9 MB L3 cache)
Individual SKUs TBD | Dual-Core Itanium 2 Processor (Montecito)

2.1.2 Introducing Montecito

Montecito takes the latest Itanium 2 processor core, improves the memory hierarchy and adds an enhanced form of temporal multi-threading. A full introduction to the Itanium 2 processor is available elsewhere but a brief review is provided below.

The front-end, with two levels of branch prediction, two TLBs, and a 0 cycle branch predictor, feeds two bundles of three instructions each into the instruction buffer every cycle. This 8-entry queue decouples the front-end from the back-end and delivers up to two bundles, of any alignment, to the remaining 6 stages of the pipeline. The dispersal logic determines issue groups and allocates up to 6 instructions to nearly every combination of the 11 available functional units (2 integer, 4 memory, 2 floating point, and 3 branch). The renaming logic maps virtual registers into physical registers. Actual register file (up to 12 integer and 4 floating point) reads are performed just before the instructions execute or requests are issued to the cache hierarchy. The full bypass network allows nearly immediate access to previous instruction results while final results are written into the register file (up to 6 integer and 4 floating point).
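As an illustration of dispersal, the sketch below (GCC extended asm; register choices are the compiler's and the values are arbitrary) packs six independent instructions into two bundles with no intervening stop bit, so they can disperse as a single issue group across the M and I ports:

    /* Six independent instructions, two bundles, one issue group.
     * The ";;" stop bit after the last slot closes the group. */
    void issue_group_demo(long *p, long *q, long *out)
    {
        long a, b, c, d;
        __asm__ ("{ .mmi\n"
                 "  ld8  %0 = [%4]\n"      /* M slot */
                 "  ld8  %1 = [%5]\n"      /* M slot */
                 "  adds %2 = 1, r0\n"     /* I slot (A-type) */
                 "}\n"
                 "{ .mmi\n"
                 "  nop.m 0\n"             /* M slot */
                 "  nop.m 0\n"             /* M slot */
                 "  adds %3 = 2, r0 ;;\n"  /* I slot, stop ends the group */
                 "}\n"
                 : "=&r" (a), "=&r" (b), "=&r" (c), "=&r" (d)
                 : "r" (p), "r" (q));
        out[0] = a + b;
        out[1] = c + d;
    }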

Montecito preserves existing application and operating system investments while providing greater opportunity for code generators to continue their steady performance push without any destructive disturbance. This is important since even today, three years after the introduction of the first Itanium 2 processor, compilers are still providing significant performance improvements. The block diagram of the Montecito processor can be found in Figure 2-1.

Montecito provides a second integer shifter and popcounter to help reduce port asymmetries. The front-end provides better branching behavior for single cycle branches and cache allocation/reclamation. Finally, Montecito decreases the time to reach recovery code when speculation fails, thereby providing a lower cost for speculation. All told, nearly every core block and piece of control logic includes some optimization to improve small deficiencies.

Exposing additional performance in an already capable cache hierarchy is also challenging and includes additional capacity, improved coherence architecture, and more efficient cache organization and queuing. Montecito supports three levels of on-chip cache. The first level (L1) caches are each 4-way set associative and hold 16 KB of instruction or data. These caches are in-order, like the rest of the pipeline, but are non-blocking, allowing high request concurrency. These L1 caches are accessed in a single cycle using pre-validated tags. The data cache is write-through and dual-ported to support two integer loads and two stores, while the instruction cache has dual-ported tags and a single data port to support simultaneous demand and prefetch accesses.

While previous generations of the Itanium 2 processor share the second level (L2) cache between data and instructions, Montecito provides a dedicated 1 MB L2 cache for instructions. This cache is 8-way set associative with a 128-byte line size and provides the same 7 cycle instruction access latency as the smaller unified cache of previous Itanium 2 processors. A single tag and data port supports out-of-order and pipelined accesses to provide high utilization. The separate instruction and data L2 caches provide more efficient access compared to Itanium 2 processors, where instruction requests contend against data accesses for L2 bandwidth and can potentially impact core execution as well as L2 throughput.

The previously shared 256 KB L2 cache is now dedicated to data on Montecito, with several micro-architectural improvements to increase throughput. The instruction and data separation effectively increases the data hit rate. The L2D hit latency remains at 5 cycles for integer and 6 cycles for floating-point accesses. The tag is true 4-ported and the data is pseudo 4-ported with 16-byte banks. Montecito removes some of the code generator challenges found in the Itanium 2 processor L2 cache. Specifically, in previous Itanium 2 processors, any access beyond the first access to miss the L2 would access the L2 tags periodically until a hit is detected. The repeated tag accesses consume bandwidth from the core and increase the miss latency. On Montecito, such misses are suspended until the L2 fill occurs. The fill awakens and immediately satisfies the request, which greatly reduces bandwidth contention and final latency. The Montecito L2D, like previous generations of the Itanium 2 processor L2, is out-of-order and pipelined, with the ability to track up to 32 requests in addition to 16 misses and their associated victims. However, Montecito optimizes allocation of the 32 queue entries, providing a higher concurrency level than previously possible.

The third level (L3) cache remains unified as in previous Itanium 2 processors, but is now 12 MB in size while maintaining the same 14 cycle integer access latency found on the 6 MB and 9 MB Itanium 2 processors. The L3 uses an asynchronous interface with the data array to achieve this low latency; there is no clock, only a read or write valid indication. The read signal is coincident with index and way values that initiate L3 data array accesses. Four cycles later, the entire 128-byte line is available and latched. This data is then delivered in 4 cycles to either the L2D or L2I cache in critical byte order.

The L3 receives requests from both the L2I and L2D but gives priority to the L2I request in the rare case of a conflict. Moving the arbitration point from the L1-L2 in the Itanium 2 processor to the L2-L3 cache greatly reduces conflicts thanks to the high hit rates of the L2.

The cache hierarchy is replicated in each core, totaling more than 13.3 MB per core and nearly 27 MB for the entire processor.


Figure 2-1. The Montecito Processor

2.2 New Instructions

Montecito is compliant with the latest revisions of the Itanium architecture in addition to the Intel Itanium Architecture Virtualization Specification Update. As such, Montecito introduces several new instructions as summarized below:

Table 2-3. New Instructions Available in Montecito

New Instruction | Comment
fc.i¹ | Ensures that instruction caches are coherent with data caches
ld16² | AR.csd and the register specified are the targets for this load
st16² | AR.csd and the value in the register specified are written for this store
cmp8xchg16² | AR.csd and the value in the register specified are written for this exchange if the 8-byte compare is true
hint@pause³ | The current thread is yielding resources to the other thread
vmsw.0, vmsw.1 | On promote pages, these instructions allow cooperative operating systems to obtain and give up VMM privilege

NOTES:
1. This instruction behaves as the fc instruction on Montecito.
2. This instruction will fault if issued to UC, UCE, or WC memory.
3. This instruction will not initiate a thread switch if it is a B-type instruction.


2.3 Core

The Montecito core is very similar to previous generations of the Itanium 2 processor core from a code generation point of view. The core has new resources; specifically, an additional integer shifter and popcounter. It removes the rarely needed MMU to memory address bypass path. The core also includes many optimizations, from the front-end to the cache hierarchy, that are transparent to the code generator, so legacy code can see improvements without any code change.

2.3.1 Instruction Slot to Functional Unit Mapping

This information is very similar to previous Itanium 2 processors. Changes between Itanium 2 processors and Montecito will be noted with footnotes.

Each fetched instruction is assigned to a functional unit through an issue port. The numerous functional units share a smaller number of issue ports. There are 11 functional units: eight for non-branch instructions and three for branch instructions. They are labeled M0, M1, M2, M3, I0, I1, F0, F1, B0, B1, and B2. The process of mapping instructions within bundles to functional units is called dispersal.

An instruction’s type and position within the issue group determine which functional unit the instruction is assigned. An instruction is mapped to a subset of the functional units based upon the instruction type (i.e. ALU, Memory, Integer, etc.). Then, based on the position of the instruction within the instruction group presented for dispersal, the instruction is mapped to a particular functional unit within that subset.

Table 2-4, Table 2-5, Table 2-6 and Table 2-7 show the mappings of instruction types to ports and functional units.

Note: Where a Ports column appears in the following tables, it lists the port(s) on which the instruction type can be issued; the port asymmetries for I-type and M-type instructions are described in the text below.

A-type instructions can be issued on all M and I ports (M0-M3 and I0 and I1). I-type instructions can only issue to I0 or I1. The I ports are asymmetric so some I-type instructions can only issue on port I0. M ports have many asymmetries: some M-type instructions can issue on all ports; some can only issue on M0 and M1; some can only issue on M2 and M3; some can only issue on M0; some can only issue on M2.

Table 2-4. A-Type Instruction Port Mapping

Instruction Type | Description | Examples | Ports
A1-A3 | ALU | add, shladd | M0-M3, I0, I1
A4, A5 | Add Immediate | addp4, addl | M0-M3, I0, I1
A6, A7, A8 | Compare | cmp, cmp4 | M0-M3, I0, I1
A9 | MM ALU | pcmp[1 | 2 | 4] | M0-M3, I0, I1
A10 | MM Shift and Add | pshladd2 | M0-M3, I0, I1


Table 2-5. B-Type Instruction Port Mapping

Instruction Type | Description | Examples | Ports
B1-B5 | Branch | br | B0-B2
B6-B8 | Branch Predict | brp | B0-B2
B9¹ | Break, Nop, Thread Switch Hint | hint | B0-B2

NOTES:
1. hint.b is treated as a nop.b -- it does not have any impact on multi-thread control in Montecito.

Table 2-6. I-Type Instruction Port Mapping

Instruction Type | Description | Examples
I1 | MM Multiply/Shift | pmpy2.[l | r], pmpyshr2{.u}
I2 | MM Mix/Pack | mix[1 | 2 | 4].[l | r], pmin, pmax
I3, I4 | MM Mux | mux1, mux2
I5 | Variable Right Shift | shr{.u} =r,r, pshr[2 | 4] =r,r
I6 | MM Right Shift Fixed | pshr[2 | 4] =r,c
I7 | Variable Left Shift | shl{.u} =r,r, pshl[2 | 4] =r,r
I8 | MM Left Shift Fixed | pshl[2 | 4] =r,c
I9¹ | MM Popcount | popcnt
I10¹ | Shift Right Pair | shrp
I11-I17¹ | Extr, Dep, Test Nat | extr{.u}, dep{.z}, tnat
I18 | Hint | hint.i
I19 | Break, Nop | break.i, nop.i
I20 | Integer Speculation Check | chk.s.i
I21-I28 | Move to/from BR/PR/IP/AR | mov =[br | pr | ip | ar], mov [br | pr | ip | ar]=
I29 | Sxt/Zxt/Czx | sxt, zxt, czx

NOTES:
1. The I1 issue capability is new to Montecito.
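As a concrete consequence of the new I1 capability noted above, two popcnt instructions can now occupy ports I0 and I1 and issue in the same cycle, where earlier Itanium 2 processors, with a single popcount unit, would split issue the pair. A sketch, assuming GCC extended asm (register choices are the compiler's):

    /* Both popcnt instructions sit in the two I slots of one bundle
     * and can issue together on Montecito's I0 and I1 ports. */
    static inline void popcnt_pair(unsigned long x, unsigned long y,
                                   unsigned long *cx, unsigned long *cy)
    {
        unsigned long a, b;
        __asm__ ("{ .mii\n"
                 "  nop.m 0\n"
                 "  popcnt %0 = %2\n"
                 "  popcnt %1 = %3 ;;\n"
                 "}\n"
                 : "=&r" (a), "=&r" (b) : "r" (x), "r" (y));
        *cx = a;
        *cy = b;
    }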

Table 2-7. M-Type Instruction Port Mapping

Instruction Type | Description | Examples
M1, 2, 3 | Integer Load | ldsz, ld8.fill
M4, 5 | Integer Store | stsz, st8.spill
M6, 7, 8 | Floating-point Load | ldffsz, ldffsz.s, ldf.fill
 | Floating-point Advanced Load | ldffsz.a, ldffsz.c.[clr | nc]
M9, 10 | Floating-point Store | stffsz, stf.spill
M11, 12 | Floating-point Load Pair | ldfpfsz
M13, 14, 15 | Line Prefetch | lfetch
M16 | Compare and Exchange | cmpxchgsz.[acq | rel]
M17 | Fetch and Add | fetchaddsz.[acq | rel]
M18 | Set Floating-point Reg | setf.[s | d | exp | sig]
M19 | Get Floating-point Reg | getf.[s | d | exp | sig]
M20, 21 | Speculation Check | chk.s{.m}
M22, 23 | Advanced Load Check | chk.a.[clr | nc]
M24 | Invalidate ALAT | invala
 | Mem Fence, Sync, Serialize | fwb, mf{.a}, srlz.[d | i], sync.i
M25 | RSE Control | flushrs, loadrs
M26, 27 | Invalidate ALAT Entry | invala.e
M28 | Flush Cache, Purge TC Entry | fc, ptc.e
M29, 30, 31 | Move to/from App Reg | mov{.m} ar=, mov{.m} =ar
M32, 33 | Move to/from Control Reg | mov cr=, mov =cr
M34 | Allocate Register Stack Frame | alloc
M35, 36 | Move to/from Proc. Status Reg | mov psr.[l | um]=, mov =psr.[l | um]
M37 | Break, Nop.m | break.m, nop.m
M38, 39, 40 | Probe Access | probe.[r | w]{.fault}
M41 | Insert Translation Cache | itc.[d | i]
M42, 43 | Move Indirect Reg, Insert TR | mov ireg=, mov =ireg, itr.[d | i]
M44 | Set/Reset User/System Mask | sum, rum, ssm, rsm
M45 | Purge Translation Cache/Reg | ptc.[l | g | ga], ptr.[d | i]
M46 | Virtual Address Translation | tak, thash, tpa, ttag
M47 | Purge Translation Cache | ptc.e
M48 | Thread Switch Hint | hint

2.3.2 Instruction Latencies and Bypasses

Table 2-8 lists the Montecito processor operation latencies.


Table 2-8. Execution with Bypass Latency Summary

Producer (down) \ Consumer (across) | Qual. Pred. | Branch Pred. | ALU | Load/Store Addr | Multimedia | Store Data | Fmac | Fmisc | getf | setf
ALU: add, cmp, cmp4, shrp, extr, dep, tbit, addp4, shladd, shladdp4, zxt, sxt, czx, sum, logical ops, 64-bit immed. moves, movl, post-inc ops (includes post-inc stores, loads, lfetches) | n/a | n/a | 1 | 1 | 3 | 1 | n/a | n/a | n/a | 1
Multimedia | n/a | n/a | 3 | 4 or 8¹ | 2 | 3 | n/a | n/a | n/a | 3
thash, ttag, tak, tpa, probe² | n/a | n/a | 5 | 6 | 6 | 5 | n/a | n/a | n/a | 5
getf² | n/a | n/a | 5 | 6 | 6 | 5 | n/a | n/a | n/a | 5
setf² | n/a | n/a | n/a | n/a | n/a | 6 | 6 | 6 | 6 | n/a
Fmac: fma, fms, fnma, fpma, fpms, fpnma, fadd, fnmpy, fsub, fpmpy, fpnmpy, fmpy, fnorm, xma, frcpa, fprcpa, frsqrta, fpsqrta, fcvt, fpcvt | n/a | n/a | n/a | n/a | n/a | 4 | 4 | 4 | 4 | n/a
Fmisc: fselect, fcmp, fclass, fmin, fmax, famin, famax, fpmin, fpmax, fpamin, fpcmp, fmerge, fmix, fsxt, fpack, fswap, fand, fandcm, for, fxor, fpmerge, fneg, fnegabs, fpabs, fpneg, fpnegabs | n/a | n/a | n/a | n/a | n/a | 4 | 4 | 4 | 4 | n/a
Integer side predicate write: cmp, tbit, tnat | 1 | 0 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
FP side predicate write: fcmp | 2 | 1 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
FP side predicate write: frcpa, fprcpa, frsqrta, fpsqrta | 2 | 2 | n/a | n/a | n/a | n/a | n/a | n/a | n/a | n/a
Integer Load³ | n/a | n/a | N | N+1 | N+2 | N | N | N | N | N
FP Load⁴ | n/a | n/a | M+1 | M+2 | M+3 | M+1 | M+1 | M+1 | M+1 | M+1
IEU2: move_from_br, alloc | n/a | n/a | 2 | 2 | 3 | 2 | n/a | n/a | n/a | 2
Move to/from CR or AR⁵ | n/a | n/a | C | C | C | C | n/a | n/a | n/a | C
Move to pr | 1 | 0 | 2 | 2 | 3 | 2 | n/a | n/a | n/a | n/a
Move indirect⁶ | n/a | n/a | D | D | D | D | n/a | n/a | n/a | D

NOTES:
1. The MMU to memory address bypass does not exist in Montecito. If code does not account for the missing bypass, the processor will detect the case and cause a pipeline flush to ensure proper separation between the producer and the consumer.
2. Since these operations are performed by the L2D, they interact with the L2D pipeline. These are the minimum latencies, but they could be much larger because of this interaction.
3. N depends upon which level of cache is hit: N=1 for L1D, N=5 for L2D, N=14-15 for L3, N=~180-225 for main memory. These are minimum latencies and are likely to be larger for higher levels of cache.
4. M depends upon which level of cache is hit: M=5 for L2D, M=14-15 for L3, M=~180-225 for main memory. These are minimum latencies and are likely to be larger for higher levels of cache. The +1 in all table entries denotes one cycle needed for format conversion.
5. Best case values of C range from 2 to 35 cycles depending upon the registers accessed. EC and LC accesses are 2 cycles; FPSR and CR accesses are 10-12 cycles.
6. Best case values of D range from 6 to 35 cycles depending upon the indirect registers accessed. LREGS, PKR, and RR are on the faster side at 6 cycles.
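Notes 3 and 4 can be turned into a small helper for back-of-the-envelope scheduling estimates. The sketch below encodes only the minimum latencies quoted in the notes; function and enumerator names are illustrative:

    /* Minimum cycles until an ALU consumer can use a load result,
     * keyed by the cache level that services the load (Table 2-8,
     * notes 3 and 4).  Real latencies can be higher. */
    enum hit_level { HIT_L1D, HIT_L2D, HIT_L3, HIT_MEMORY };

    static int int_load_use_cycles(enum hit_level lvl)
    {
        switch (lvl) {
        case HIT_L1D: return 1;
        case HIT_L2D: return 5;
        case HIT_L3:  return 14;   /* 14-15 */
        default:      return 180;  /* ~180-225 for main memory */
        }
    }

    /* FP loads cannot hit the integer-only L1D, so the best case is
     * the L2D; the extra cycle is the format conversion (+1). */
    static int fp_load_use_cycles(enum hit_level lvl)
    {
        if (lvl == HIT_L1D)
            lvl = HIT_L2D;
        return int_load_use_cycles(lvl) + 1;
    }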

2.3.3 Caches and Cache Management Changes

Montecito, like the previous Itanium 2 processors, supports three levels of on-chip cache. Each core contains a complete cache hierarchy, with nearly 13.3 Mbytes per core, for a total of nearly 27 Mbytes of processor cache.


Table 2-9. Montecito Cache Hierarchy Summary

Cache | Data Types Supported | Write Through/Write Back | Data Array Size | Line Size | Ways | Index | Queuing | Minimum/Typical Latency
L1D | Integer | WT | 16 KB | 64 Bytes | 4 | VA[11:6] | 8 Fills | 1/1
L1I | Instruction | NA | 16 KB | 64 Bytes | 4 | VA[11:6] | 1 Demand + 7 Prefetch Fills | 1/1
L2D | Integer, Floating Point | WB | 256 KB | 128 Bytes | 8 | PA[14:7] | 32 OzQ / 16 Fills | 5/11
L2I | Instruction | NA | 1 MByte | 128 Bytes | 8 | PA[16:7] | 8 | 7/10
L3 | Integer, Floating Point, Instruction | WB | 12 MByte | 128 Bytes | 12 | PA[19:7] | 8 | 14/21
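The Index column can be checked directly: each cache's set count equals the array size divided by (ways x line size), and the index bits follow from the line size. A small sketch with arbitrary example addresses:

    #include <stdio.h>
    #include <stdint.h>

    /* Extract bits [hi:lo] of an address. */
    static unsigned bits(uint64_t a, int hi, int lo)
    {
        return (unsigned)((a >> lo) & ((1u << (hi - lo + 1)) - 1));
    }

    int main(void)
    {
        uint64_t va = 0x2000000000012345ULL;  /* example virtual address  */
        uint64_t pa = 0x0000000004012345ULL;  /* example physical address */

        printf("L1D/L1I set: VA[11:6] = %u\n", bits(va, 11, 6)); /* 64 sets   */
        printf("L2D set:     PA[14:7] = %u\n", bits(pa, 14, 7)); /* 256 sets  */
        printf("L2I set:     PA[16:7] = %u\n", bits(pa, 16, 7)); /* 1024 sets */
        printf("L3 set:      PA[19:7] = %u\n", bits(pa, 19, 7)); /* 8192 sets */
        return 0;
    }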

2.3.3.1 L1 Caches

The L1I and L1D caches are essentially unchanged from previous generations of the Itanium 2 processor.

2.3.3.2 L2 Caches

The level 2 caches are both similar to and different from the Itanium 2 processor L2 cache. The previous Itanium 2 processor L2 is shared by both data and instructions, while Montecito has dedicated instruction (L2I) and data (L2D) caches. This separation of instruction and data caches makes it possible to have dedicated access paths to the caches and thus eliminates contention and eases capacity pressures on the L2 caches.

The L2I cache holds 1 Mbyte, is eight-way set associative, and has a 128-byte line size, yet it has the same seven-cycle instruction-access latency as the smaller unified cache of previous Itanium 2 processors. The tag and data arrays are single ported, but the control logic supports out-of-order and pipelined accesses. This large cache greatly reduces the number of instruction accesses seen at the L3 cache. Any coherence request that identifies whether a cache line is in the processor will invalidate that line from the L2I cache.

The L2D cache has the same structure and organization as the Itanium 2 processor shared 256 KB L2 cache, but with several microarchitectural improvements to increase throughput. The L2D hit latency remains at five cycles for integer and six cycles for floating-point accesses. The tag array is true four-ported (four fully independent accesses in the same cycle) and the data array is pseudo four-ported with 16-byte banks.

Montecito optimizes several aspects of the L2D. In the Itanium 2 processor, any access to the same cache line beyond the first access that misses the L2 will read the L2 tags periodically until the tags detect a hit. The repeated tag accesses consume bandwidth from the core and increase the L2 miss latency. Montecito suspends such secondary misses until the L2D fill occurs. At that point, the fill immediately satisfies the suspended requests. This approach greatly reduces bandwidth contention and final latency. The L2D, like the Itanium 2 processor L2, is out of order and pipelined, and tracks 32 requests (L2D hits or L2D misses not yet passed to the L3 cache) in addition to 16 misses and their associated victims. The difference is that Montecito allocates the 32 queue entries more efficiently, which provides a higher concurrency level than with the Itanium 2 processor.


Specifically, the queue allocation policy now supports recovery of empty entries. This allows for greater availability of the L2 OzQ in light of accesses completed out of order.

The L2D also considers the thread identifier when performing ordering such that an ordered request from one thread is not needlessly ordered against another thread’s accesses.

2.3.3.3 L3 Cache

Montecito's L3 cache remains unified as in previous Itanium 2 processors, but is now 12 MB. Even so, it maintains the same 14-cycle best-case integer access latency as the 6 MB and 9 MB Itanium 2 processors. Montecito's L3 cache uses an asynchronous interface with the data array to achieve this low latency; there is no clock, only a read or write valid indication. Four cycles after a read signal, index, and way are presented, the entire 128-byte line is available and latched. The array then delivers this data in four cycles to either the L2D or L2I in critical-byte order.

Montecito's L3 receives requests from both the L2I and L2D but gives priority to the L2I request in the rare case of a conflict. Conflicts are rare because Montecito moves the arbitration point from the Itanium 2 processor L1-L2 boundary to L2-L3. This greatly reduces conflicts because of the L2I's and L2D's high hit rates. The I and D arbitration point also reduces conflict and access pressure within the core; L1I misses go directly to the L2I and not through the core. L2I misses contend against L2D requests for L3 access.

2.3.3.4 Request Tracking

All L2I and L2D requests are allocated to one of 16 request buffers. Requests are sent to the L3 cache and the system from these buffers by the tracking logic. A modified L2D victim or partial write may be allocated to one of 8 write buffers. This is an increase of 2 over the Itanium 2 processor. The lifetime of the L2D victim buffers is also significantly decreased to further reduce pressure on them. Lastly, the L3 dirty victim resource has grown by 2 entries to 8 in Montecito.

In terms of write coalescing buffers (WCBs), Montecito has four 128-byte line WCBs in each core. These are fully shared between threads.

2.4 Threading

The multiple thread concept starts with the idea that the processor has some resources that cannot be effectively utilized by a single thread. Therefore, sharing under-utilized resources between multiple threads will increase utilization and performance. The Montecito processor Hyper-Threading Technology implementation duplicates and shares resources to create two logical processors. All architectural state and some micro-architectural state is duplicated.

The duplicated architectural state (general, floating point, predicate, branch, application, translation, performance monitoring, bank, and interrupt registers) allows each thread to appear as a complete processor to the operating system thus minimizing the changes needed at the OS level. The duplicated micro-architectural state of the return stack buffer and the advanced load address table (ALAT) prevent cross-thread pollution that would occur if these resources were shared between the two logical processors.

The two logical processors share the parallel execution resources (core) and the memory hierarchy (caches and TLBs). There are many approaches to sharing resources that vary from fixed time intervals, temporal multi-threading or TMT, to sharing resources concurrently, simultaneous multi-threading or SMT. The Montecito Hyper-Threading Technology approach blends both such that the cores share threads using a TMT approach while the memory hierarchy shares resources using an SMT approach. The core TMT approach is further augmented with control hardware that monitors the dynamic behavior of the threads and allocates core resources to the most appropriate thread - an event experienced by the workload may cause a switch before the thread quantum of TMT would cause a switch. This modification of TMT may be termed switch-on-event multi-threading.

2.4.1 Sharing Core Resources

Many processors implementing multi-threading share resources using the SMT paradigm. In SMT, instructions from different threads compete for and share execution resources such that each functional resource is dynamically allocated to an available thread. This approach allocates resources originally meant for instruction level parallelism (ILP), but under-utilized in the single thread case, to exploit thread level parallelism (TLP). This is common in many out-of-order execution designs where increased utilization of functional units can be attained for little cost.

Processor resources may also be shared temporally rather than symmetrically. In TMT, a thread is given exclusive ownership of resources for a small time period. Complexity may be reduced by expanding the time quantum to at least the pipeline depth, which ensures that only a single thread owns any execution or pipeline resources at any moment. Using this approach to multi-threading, nearly all structures and control logic can be thread agnostic, allowing the natural behaviors of the pipeline, bypass, and stall control logic to be leveraged, while orthogonal logic that controls and completes a thread switch is added. However, this approach also means that a pipeline flush is required at thread switch points.

In the core, one thread has exclusive access to the execution resources (foreground thread) for a period of time while the other thread is suspended (background thread). Control logic monitors the workload's behavior and dynamically decreases the time quantum for a thread that is not likely to make progress. Thus, if the control logic determines that a thread is not making progress, the pipeline is flushed and the execution resources are given to the background thread. This ensures better overall utilization of the core resources over strict TMT and effectively hides the cost of long latency operations such as memory accesses.

A thread switch on Montecito requires 15 cycles from initiation until the background thread retires an instruction. Given the low latency of the memory hierarchy (1 cycle L1D, 5 cycle L2D, and 14 cycle L3) memory accesses are the only potentially stalling condition that greatly exceeds the thread switch time and thus is the primary switch event.

A thread switch also has other side effects such as invalidating the Prefetch Virtual Address Buffer (PVAB) and canceling any prefetch requests in the prefetch pipeline.

2.4.1.1 The Switch Events

There are several events that can lead to a thread switch. Given that hiding memory latency is the primary motivation for multi-threading, the most common switch event is based on L3 cache misses and data returns. Other events, such as the time-out and forward progress events, provide fairness, while the hint events provide paths for the software to influence thread switching. These events have an impact on a thread's urgency, which indicates a thread's ability to effectively use core resources. Each event is described below:

• L3 Cache Miss - An L3 miss by the foreground thread is likely to cause that thread to stall waiting for the return from the system interface. Hence, L3 misses can trigger thread switches subject to thread urgency comparisons. This event decreases the thread’s urgency. Since there is some latency between when a thread makes a request and when it is determined to be an L3 miss, it is possible to have multiple requests from a thread miss the L3 cache before a thread switch occurs.
• L3 Cache Return - An L3 miss data return for the background thread is likely to resolve data dependences and is an early indication of execution readiness; hence an L3 miss data return can trigger thread switch events subject to thread urgency comparisons. This event increases the thread’s urgency.
• Time-out - Thread-quantum counters ensure fairness in access to the pipeline execution resources for each thread. If the thread-quantum expiration occurs when the thread was not stalled, its urgency is set to a high value to indicate execution readiness prior to the switch event.
• Switch Hint - The Itanium architecture provides the hint@pause instruction, which can trigger a thread switch to yield execution to the background thread. This allows software to indicate when the current thread has no need of the core resources.
• Low-power Mode - When the active thread has entered a quiesced low-power mode, a thread switch is triggered to the background thread so that it may continue execution. Similarly, if both threads are in a quiesced low-power state and the background thread is awakened, a thread switch is triggered.

The L3 miss and data return event can occur for several types of accesses: data or instruction, prefetch or demand, cacheable or uncacheable, or hardware page walker (HPW). A data demand access includes loads, stores, and semaphores.

The switch events are intended to enable the control logic to decide the appropriate time to switch threads without software intervention. Thus, Montecito Hyper-Threading Technology is mostly transparent to the application and the operating system.

2.4.1.2 Software Control of Thread Switching

The  instruction is used by software to initiate a thread switch. The intent is to allow code to indicate that it does not have any useful work to do and that its execution resources should be given to the other thread. Some later event, such as an interrupt, may change the work for the thread and should awaken the thread.

The  instruction forces a switch from the foreground thread to the background thread. This instruction can be predicated to conditionally initiate a thread switch. Since the current issue group retires before the switch is initiated, the following code sequences are equivalent:

      

      
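For example, a spin loop that waits on a flag owned by the other logical processor can yield the core while it spins. A minimal sketch, assuming GCC inline assembly on an ia64 target (hint @pause is the assembler spelling of hint@pause; the flag is a hypothetical shared variable):

    /* Spin until the other thread sets *flag, yielding core
     * resources to the background thread on each iteration. */
    static void spin_wait(volatile int *flag)
    {
        while (!*flag)
            __asm__ volatile ("hint @pause" ::: "memory");
    }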

2.4.1.3 Urgency

Each thread has an urgency value that can range from 0 to 7. A value of 0 denotes that a thread has no useful work to perform. A value of 7 signifies that a thread is actively making forward progress. The nominal urgency is 5 and indicates that a thread is actively progressing. The urgency of one thread is compared against the other at every L3 event. If the urgency of the currently executing thread is lower than that of the background thread, then the L3 event will initiate a thread switch. Every L3 miss event decrements the urgency by 1, eventually saturating at 0. Similarly, every L3 return event increments the urgency by 1 as long as the urgency is below 5. Figure 2-2 shows a typical urgency-based switch scenario. The urgency can be set to 7 for a thread that is switched out due to a time-out event. An external interrupt directed at the background thread will set the urgency for that thread to 6, which increases the probability of a thread switch and provides a reasonable response time for interrupt servicing.

Figure 2-2. Urgency and Thread Switching
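The urgency rules above can be summarized in a small behavioral model. The following C sketch mirrors the policy as described in this section; it is an illustration only, not the hardware implementation, and all names are illustrative:

    #include <stdbool.h>

    struct thread_state { int urgency; };           /* 0..7, nominal 5 */

    static void on_l3_miss(struct thread_state *t)   /* foreground miss */
    {
        if (t->urgency > 0)
            t->urgency--;                            /* saturates at 0 */
    }

    static void on_l3_return(struct thread_state *t) /* background return */
    {
        if (t->urgency < 5)
            t->urgency++;                            /* capped below 5 */
    }

    /* Quantum expired while the thread was not stalled. */
    static void on_timeout(struct thread_state *t)   { t->urgency = 7; }

    /* External interrupt directed at the background thread. */
    static void on_interrupt(struct thread_state *t) { t->urgency = 6; }

    /* Evaluated at every L3 event. */
    static bool should_switch(const struct thread_state *fg,
                              const struct thread_state *bg)
    {
        return fg->urgency < bg->urgency;
    }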

2.4.2 Tailoring Thread Switch Behavior

Montecito allows the behavior of the thread switch control logic to be tailored to meet specific software requirements. Specifically, thread switch control may emphasize overall performance, emphasize thread fairness, or elevate the priority of one thread over the other. These different behaviors are available through a low latency PAL call, PAL_SET_HW_POLICY. This allows software to exert some level of control over how the processor determines the best time to switch. Details on this call and its parameters can be found in the latest Intel® Itanium® Architecture Software Developer’s Manual and Intel® Itanium® Architecture Software Developer’s Manual Specification Update.
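A hedged sketch of how system software might select a policy follows. The pal_call() wrapper, the index value, and the policy encodings are illustrative assumptions only; the actual index and argument encodings must be taken from the SDM documents cited above (the standard static PAL calling convention passes the index in GR28 and arguments in GR29-GR31):

    /* Hypothetical index value for illustration - confirm in the SDM. */
    #define PAL_SET_HW_POLICY_INDEX 51

    /* Hypothetical firmware entry wrapper following the static PAL
     * calling convention (index in GR28, args in GR29-GR31). */
    extern long pal_call(unsigned long index, unsigned long arg1,
                         unsigned long arg2, unsigned long arg3);

    enum hw_policy {               /* illustrative encodings only */
        HW_POLICY_PERFORMANCE = 0,
        HW_POLICY_FAIRNESS    = 1,
    };

    static long set_switch_policy(enum hw_policy p)
    {
        return pal_call(PAL_SET_HW_POLICY_INDEX, (unsigned long)p, 0, 0);
    }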


2.4.3 Sharing Cache and Memory Resources

The Montecito memory resources that are concurrently or simultaneously shared between the two threads include the first and second level TLBs, the first, second, and third level caches, and system interface resources. Each of these structures is impacted in different ways as a result of this sharing.

2.4.3.1 Hyper-Threading Technology and the TLBs

The instruction in previous Itanium 2 processors would invalidate the entire Translation Cache (TC) section of the TLB with one instruction. This same behavior is retained for Montecito with the caveat that a  issued on one thread will invalidate the TC of the other thread at the same time.

The L2I and L2D TLBs on the Itanium 2 processor supported 64 Translation Registers (TRs). Montecito supports 32 TRs for each logical processor.

2.4.3.1.1 Instruction TLBs

The replacement algorithms for the L1I and L2I TLB do not consider thread for replacement vector updating. However, the L2I TLB will reserve one TLB entry for each thread to meet the architectural requirements for TCs available to a logical processor.

The TLBs support SMT-based sharing by assigning a thread identifier to the virtual address. Thus, two threads cannot share the same TLB entry at the same time even if the virtual address is the same between the two threads.

Since the L1I TLB is key in providing pseudo-virtual access to the L1I cache using prevalidation, when an L1I TLB entry is invalidated, the L1I cache entries associated with that page (up to 4 KB) are invalidated. However, the invalidation of a page (and hence cache contents) can be suppressed when two threads access the same virtual and physical addresses. This allows the two threads to share much of the L1I TLB and cache contents. For example, T0 inserts an L1I TLB entry with VA=0 and PA=0x1001000. T0 then accesses VAs 0x000 to 0xFFF, which are allocated into the L1I cache. A thread switch occurs. Now, T1 initiates an access with VA=0. It will miss in the L1I TLB because the entry with VA=0 belongs to T0. T1 will insert an L1I TLB entry with VA=0 and PA=0x1001000. The T1 L1I TLB entry replaces the T0 L1I TLB entry without causing an invalidation. Thus, the accesses performed by T0 become available to T1, with the exception of the initial T1 access that inserted the L1I TLB page. Since the L1I cache contents can be shared between two threads and the L1I cache includes branch prediction information, this optimization allows one thread to impact the branch information contained in the L1I cache and hence the branch predictions generated for each thread.

2.4.3.1.2 Data TLBs

The replacement algorithms for the L1D and L2D TLBs do not consider threads for replacement vector updating. However, the L2D TLB reserves 16 TLB entries for each thread to meet the architectural requirements for TCs available to a logical processor.

The TLBs support SMT based sharing by assigning a thread identifier to the virtual address. Thus, two threads cannot share the same TLB entry at the same time even if the virtual address is the same between the two threads.

Despite the fact that both the instruction and data L1 TLBs support prevalidation, the L1I TLB optimization regarding cache contents is not supported in the L1D TLB.


2.4.3.2 Hyper-Threading Technology and the Caches

The L2I, L2D, and L3 caches are all physically addressed. Thus, the threads can fully share the cache contents (i.e., a line allocated by T0 can be accessed by both T0 and T1). The queueing resources for these cache levels are equally available to each thread. The replacement logic also ignores threads, such that T0 can cause an eviction of T1-allocated data, and a hit will cause a cache line to be considered recently used regardless of the thread that allocated or accessed the line.

A thread identifier is provided with each instruction or data cache request to ensure proper ordering of requests between threads at the L2D cache, in addition to performance monitoring and switch event calculations at all levels. The thread identifier allows ordered and unordered transactions from T0 to pass ordered transactions from T1.

2.4.3.3 Hyper-Threading Technology and the System Interface

The system interface logic ignores the thread identifier in allocating queue entries and in prioritizing system interface requests. The system interface logic tracks L3 misses and fills and, as such, uses the thread identifier to correctly signal to the core which thread missed or filled the cache for L3 miss/return events. The thread identifier is also used in performance monitor event collection and counting.

The thread identifier can be made visible on the system interface as part of the agent identifier through a PAL call. This is for informational purposes only, as the bit appears in a reserved portion of the agent identifier, and Montecito does not require the system to ensure forward progress and fairness based on the thread identifier -- the L2D cache ensures forward progress between threads.

2.5 Dual Cores

Montecito is the first dual-core Itanium 2 processor. The two cores attach to the system interface through the arbiter, which provides a low-latency path for each core to initiate and respond to system events.

Figure 2-3 is a block diagram of the arbiter, which organizes and optimizes each core's requests to the system interface, ensures fairness and forward progress, and collects responses from each core to provide a unified response. The arbiter maintains each core's unique identity on the system interface and operates at a fixed ratio to the system interface frequency. The cores are responsible for thread ordering and fairness, so a thread identifier to uniquely identify transactions on the system interface is not necessary. However, the processor can be configured to provide the thread identifier for informational purposes only.


Figure 2-3. The Arbiter and Queues

As the figure shows, the arbiter consists of a set of address queues, data queues, and synchronizers, as well as logic for core and system interface arbitration. Error-Correction Code (ECC) encoders/decoders and parity generators exist but are not shown.

The core initiates one of three types of accesses, which the arbiter allocates to the following queues and buffers:
• Request queue. This is the primary address queue that supports most request types. Each core has four request queues.
• Write address queue. This queue holds addresses only and handles explicit writebacks and partial line writes. Each core has two write address queues.
• Clean castout queue. This queue holds the address for clean castout (directory and snoop filter update) transactions. The arbiter holds pending transactions until it issues them on the system interface. Each core has four clean castout queues.
• Write data buffer. This buffer holds outbound data and has a one-to-one correspondence with addresses in the write address queue. Each core has four write data buffers, with the additional two buffers holding implicit writeback data.

The number of entries in these queues and buffers is small because entries are deallocated once the transaction is issued on the system interface. System interface responses to a transaction are sent directly to the core, where the overall tracking of a system interface request occurs.

Note that there are no core-to-core bypasses. Thus, a cache line that is requested by core 0 but held modified by core 1 results in a transaction on the system interface; the snoop of core 1 then provides the data and a modified snoop result, all of which is visible on the system interface.

The snoop queue issues snoop requests to the cores and coalesces the snoop responses from each core into a unified snoop response for the socket. If either core is delayed in delivering its snoop response, the arbiter will delay the snoop response on the system interface.

The arbiter delivers all data returns directly to the appropriate core using a unique identifier provided with the initial request. It delivers broadcast transactions, such as interrupts and TLB purges, to both cores in the same way that delivery would occur if each core were connected directly to the system interface.


2.5.1 Fairness and Arbitration

The arbiter interleaves core requests on a one-to-one basis when both cores have transactions to issue. When only one core has requests, it can issue them without waiting for the other core to issue a transaction. Because read latency is the greatest concern, read requests are typically given the highest priority, followed by writes, and finally clean castouts. Each core tracks the occupancy of the arbiter's queues using a credit system for flow control. As requests complete, the arbiter informs the appropriate core of the type and number of deallocated queue entries. The cores use this information to determine which, if any, transaction to issue to the arbiter.
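The credit scheme can be illustrated with a toy model (a behavioral sketch only; the type and function names are invented for the example, and only the four-entry request queue is modeled):

    #include <stdbool.h>
    #include <stdio.h>

    /* Toy model of credit-based flow control between a core and the arbiter. */
    typedef struct {
        int credits;        /* remaining request-queue slots for this core */
    } core_credit_t;

    static bool core_can_issue(const core_credit_t *c) { return c->credits > 0; }
    static void core_issue(core_credit_t *c)   { c->credits--; } /* request sent  */
    static void core_release(core_credit_t *c) { c->credits++; } /* entry freed   */

    int main(void)
    {
        core_credit_t core0 = { .credits = 4 };  /* four request queues per core */

        /* Issue until the arbiter's entries for this core are exhausted. */
        while (core_can_issue(&core0))
            core_issue(&core0);
        printf("stalled: no credits left\n");

        core_release(&core0);                    /* arbiter reports a completion */
        printf("can issue again: %d\n", core_can_issue(&core0));
        return 0;
    }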

2.6 Intel® Virtualization Technology

The Montecito processor is the first Itanium 2 processor to implement Intel® Virtualization Technology. The full specification as well as further information on Intel Virtualization Technology can be found at: http://www.intel.com/technology/computing/vptech/.

2.7 Tips and Tricks

2.7.1 Cross Modifying Code

Section 2.5 in Part 2 of Volume 2 of the Intel® Itanium® Architecture Software Developer’s Manual specifies the sequences that must be followed when any instruction code may exist in the data cache. Many violations of these sequences may have gone unnoticed on previous Itanium 2 processors, but such violations are likely to be exposed by the cache hierarchy found in Montecito. Code in violation of the architecture should be modified to adhere to the architectural requirements.

The large L2I and the separation of instructions and data at the L2 level also require additional time to ensure coherence when using the PAL_CACHE_FLUSH procedure with the I/D coherence option. Previously low-cost uses of the PAL_CACHE_FLUSH call should therefore be replaced with the architecturally required code sequence for ensuring instruction and data consistency.
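For reference, the shape of the architecturally required sequence is sketched below in C with GCC-style ia64 inline assembly. This is illustrative only; the authoritative algorithm (including multi-line flushes and the final branch to the modified code) is the one in the SDM section cited above:

    #include <stdint.h>

    /* Sketch: publish one modified instruction word and make it visible
     * to instruction fetch. Loops over multiple cache lines and error
     * handling are omitted. */
    static void publish_code(volatile uint64_t *slot, uint64_t new_bits)
    {
        *slot = new_bits;                                     /* store new bits  */
        __asm__ volatile ("fc %0" :: "r"(slot) : "memory");   /* flush the line  */
        __asm__ volatile ("sync.i" ::: "memory");  /* order fc vs. ifetch        */
        __asm__ volatile ("srlz.i" ::: "memory");  /* serialize the inst. stream */
    }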

2.7.2 ld.bias and lfetch.excl

The ld.bias and lfetch.excl instructions have been enhanced on the Montecito processor. These instructions can now bring lines into the cache in a state that is ready to be modified, if supported by the memory controller. This feature allows a single ld.bias or lfetch.excl to prefetch both the source and destination streams. The feature is enabled by default, but may be disabled through PAL_SET_PROC_FEATURES bit 7 of the Montecito feature_set (18).
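With GCC, __builtin_prefetch(p, 1) is typically lowered to lfetch.excl on Itanium and __builtin_prefetch(p, 0) to a plain lfetch, so a copy loop can take advantage of the enhancement roughly as follows (a sketch; the exact code generation is compiler-dependent, and the prefetch distance is a tunable guess):

    #include <stddef.h>
    #include <stdint.h>

    /* Copy with both streams prefetched: the source as a read stream,
     * the destination for write so lines arrive ready to be modified. */
    void copy_prefetched(uint64_t *dst, const uint64_t *src, size_t n)
    {
        const size_t ahead = 16;   /* prefetch distance in elements (assumed) */
        for (size_t i = 0; i < n; i++) {
            if (i + ahead < n) {
                __builtin_prefetch(src + i + ahead, 0);  /* read  -> lfetch      */
                __builtin_prefetch(dst + i + ahead, 1);  /* write -> lfetch.excl */
            }
            dst[i] = src[i];
        }
    }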

2.7.3 L2D Victimization Optimization

Montecito also improves the behaviors associated with internal cache line coherence tracking. The number of false L2D victims is drastically reduced on Montecito relative to previous Itanium 2 processors. This optimization is enabled by default, but may be disabled through PAL_SET_PROC_FEATURES.


2.7.4 Instruction Cache Coherence Optimization

Coherence requests of the L1I and L2I caches will invalidate the line if it is in the cache. Montecito allows instruction requests on the system interface to be filtered such that they will not initiate coherence requests of the L1I and L2I caches. This will allow instructions to be cached at the L1I and L2I levels across multiple processors in a coherent domain. This optimization is enabled by default, but may be disabled by PAL_SET_PROC_FEATURES bit 5 of the Montecito feature_set (18).

2.8 IA-32 Execution

IA-32 execution on the Montecito processor is provided by the IA-32 Execution Layer (IA-32 EL) and by PAL-based IA-32 execution. IA-32 EL is OS-based and is available only after an OS has booted. PAL-based IA-32 execution is available after PAL_COPY_PAL is called and provides IA-32 execution support before the OS has booted. All OSes running on Montecito must have IA-32 EL installed; there is no support for PAL-based IA-32 execution in an OS environment.

IA-32 EL is a software layer that currently ships with Itanium architecture-based operating systems and converts IA-32 instructions into Itanium processor instructions via dynamic translation. Further details on operating system support and functionality of IA-32 EL can be found at http://www.intel.com/cd/ids/developer/asmo-na/eng/strategy/66007.htm.

2.9 Brand Information

One of the newer additions to the Itanium architecture is the PAL_BRAND_INFO procedure. This procedure, along with PAL_PROC_GET_FEATURES, allows software to obtain processor branding and feature information. Details on these functions can be found in the Intel® Itanium® Architecture Software Developer’s Manual. Table 2-10 lists the implementation-specific return values for PAL_BRAND_INFO. Montecito implements all three; previous implementations of the Intel Itanium 2 processor cannot retrieve the processor frequency, so requests for these fields return -6 (information not available). Previous Itanium 2 processors also cannot return the system bus frequency. Implementation-specific values are expected to start at value 16 and continue until an invalid argument (-2) is returned. Note: The values returned are the values at which the processor was validated, which are not necessarily the values at which the processor is currently running.

Table 2-10. PAL_BRAND_INFO Implementation-Specific Return Values

Value Definition

18  The system bus frequency component (in Hz) of the brand identification string will be returned in the brand_info return argument.
17  The cache size component (in bytes) of the brand identification string will be returned in the brand_info return argument.
16  The frequency component (in Hz) of the brand identification string will be returned in the brand_info return argument.
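Software can enumerate these values with a loop along the following lines; pal_brand_info() is a hypothetical C wrapper around the PAL call (the real calling convention is defined by the PAL specification and the OS):

    #include <stdio.h>
    #include <stdint.h>

    #define PAL_STATUS_INVALID_ARG  (-2L)   /* invalid argument          */
    #define PAL_STATUS_UNAVAILABLE  (-6L)   /* information not available */

    /* Hypothetical wrapper: issues PAL_BRAND_INFO for 'index' and stores
     * the result in *info; returns the PAL status code. */
    extern long pal_brand_info(uint64_t index, uint64_t *info);

    void enumerate_brand_info(void)
    {
        /* Implementation-specific indices start at 16 and continue until
         * the call returns -2 (invalid argument). */
        for (uint64_t idx = 16; ; idx++) {
            uint64_t info;
            long status = pal_brand_info(idx, &info);
            if (status == PAL_STATUS_INVALID_ARG)
                break;                                /* past the last index */
            if (status == PAL_STATUS_UNAVAILABLE)
                printf("index %lu: not available\n", (unsigned long)idx);
            else
                printf("index %lu: %lu\n", (unsigned long)idx,
                       (unsigned long)info);
        }
    }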


There are other processor features that may not be included in the brand name above. To determine whether a given technology or feature is implemented, the PAL_PROC_GET_FEATURES procedure should be used. Montecito features are reported in the Montecito processor feature_set (18).

Table 2-11. Montecito Processor Feature Set Return Values

Value Definition

18  Hyper-Threading Technology (HT) - This processor supports Hyper-Threading Technology.
17  Low Voltage (LV) - This processor is a low-power SKU.
16  Dual-Processor (DP) - This processor is restricted to two-processor (DP) systems.

§


3 Performance Monitoring

3.1 Introduction to Performance Monitoring

This chapter defines the performance monitoring features of the Montecito processor. The Montecito processor provides 12 48-bit performance counters per thread, 200+ monitorable events, and several advanced monitoring capabilities. This chapter outlines the targeted performance monitor usage models and defines the software interface and programming model.

The Itanium architecture incorporates architected mechanisms that allow software to actively and directly manage performance critical processor resources such as branch prediction structures, processor data and instruction caches, virtual memory translation structures, and more. To achieve the highest performance levels, it must be possible to monitor dynamic processor behavior and feed it back into the code generation process to better encode observed run-time behavior or to expose higher levels of instruction level parallelism. These measurements are critical for understanding the behavior of optimizations, the use of architectural features such as speculation and predication, or the effectiveness of microarchitectural structures such as the ALAT, the caches, and the TLBs. They provide the data to drive application tuning and future processor, compiler, and operating system designs.

The remainder of this chapter is divided into the following sections:
• Section 3.2 discusses how performance monitors are used, and presents various Montecito processor performance monitoring programming models.
• Section 3.3 defines the Montecito processor specific performance monitoring features, structures and registers.

Chapter 4 provides an overview of the Montecito processor events that can be monitored.

3.2 Performance Monitor Programming Models

This section introduces the Montecito processor performance monitoring features from a programming model point of view and describes how the different event monitoring mechanisms can be used effectively. The Montecito processor performance monitor architecture focuses on the following two usage models:
• Workload Characterization: the first step in any performance analysis is to understand the performance characteristics of the workload under study. Section 3.2.1 discusses the Montecito processor support for workload characterization.
• Profiling: profiling is used by application developers and profile-guided compilers. Application developers are interested in identifying performance bottlenecks and relating them back to their code. Their primary objective is to understand which program location caused performance degradation at the module, function, and basic block level. For optimization of data placement and the analysis of critical loops, instruction level granularity is desirable. Profile-guided compilers that use advanced features of the Itanium architecture, such as predication and speculation, benefit from run-time profile information to optimize instruction schedules. The Montecito processor supports instruction level statistical profiling of branch mispredicts and cache misses. Details of the Montecito processor’s profiling support are described in Section 3.2.2.


3.2.1 Workload Characterization

The first step in any performance analysis is to understand the performance characteristics of the workload under study. There are two fundamental measures of interest: event rates and program cycle breakdown.
• Event Rate Monitoring: Event rates of interest include average retired instructions-per-clock (IPC), data and instruction cache miss rates, or branch mispredict rates measured across the entire application. Characterization of operating systems or large commercial workloads (e.g. OLTP analysis) requires a system-level view of performance relevant events such as TLB miss rates, VHPT walks/second, interrupts/second, or bus utilization rates. Section 3.2.1.1 discusses event rate monitoring.
• Cycle Accounting: The cycle breakdown of a workload attributes a reason to every cycle spent by a program. Apart from a program’s inherent execution latency, extra cycles are usually due to pipeline stalls and flushes. Section 3.2.1.4 discusses cycle accounting.

3.2.1.1 Event Rate Monitoring

Event rate monitoring determines event rates by reading processor event occurrence counters before and after the workload is run, and then computing the desired rates. For instance, two basic Montecito processor events that count the number of retired Itanium instructions (IA64_INST_RETIRED.u) and the number of elapsed clock cycles (CPU_OP_CYCLES) allow a workload’s IPC to be computed as follows:

• IPC = (IA64_INST_RETIRED.u at t1 - IA64_INST_RETIRED.u at t0) / (CPU_OP_CYCLES at t1 - CPU_OP_CYCLES at t0)

Time-based sampling is the basis for many performance debugging tools (VTune™ analyzer, gprof, WinNT). As shown in Figure 3-1, time-based sampling can be used to plot the event rates over time, and can provide insights into the different phases that the workload moves through.

Figure 3-1. Time-Based Sampling

(Event rates are plotted over time, with samples taken at a fixed interval between t0 and t1.)
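In code, the IPC computation above amounts to two counter snapshots around the region of interest. The accessors below are hypothetical stand-ins for whatever OS service (a perfmon-style interface) exposes the two counters:

    #include <stdint.h>

    /* Hypothetical accessors; on a real system these reads go through
     * the OS performance-monitoring interface. */
    extern uint64_t read_inst_retired(void);   /* IA64_INST_RETIRED.u */
    extern uint64_t read_op_cycles(void);      /* CPU_OP_CYCLES       */
    extern void workload(void);

    double measure_ipc(void)
    {
        uint64_t inst_t0 = read_inst_retired();
        uint64_t cyc_t0  = read_op_cycles();

        workload();                            /* region under study */

        uint64_t inst_t1 = read_inst_retired();
        uint64_t cyc_t1  = read_op_cycles();

        return (double)(inst_t1 - inst_t0) / (double)(cyc_t1 - cyc_t0);
    }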

On the Montecito processor, many event types, e.g. TLB misses or branch mispredicts, are limited to a rate of one per clock cycle. These are referred to as “single occurrence” events. However, in the Montecito processor, multiple events of the same type may occur in the same clock. We refer to such events as “multi-occurrence” events. An example of a multi-occurrence event on the Montecito processor is data cache read misses (up to two per clock). Multi-occurrence events, such as the number of entries in the memory request queue, can be used to derive the average number and average latency of memory accesses. Section 3.2.1.2 and Section 3.2.1.3 describe the basic Montecito processor mechanisms for monitoring single- and multi-occurrence events.


3.2.1.2 Single Occurrence Events and Duration Counts

A single occurrence event can be monitored by any of the Montecito processor performance counters. For all single occurrence events, a counter is incremented by up to one per clock cycle. Duration counters that count the number of clock cycles during which a condition persists are considered “single occurrence” events. Examples of single occurrence events on the Montecito processor are TLB misses, branch mispredictions, and cycle-based metrics.

3.2.1.3 Multi-Occurrence Events, Thresholding, and Averaging

Events that, due to hardware parallelism, may occur at rates greater than one per clock cycle are termed “multi-occurrence” events. Examples of such events on the Montecito processor are retired instructions or the number of live entries in the memory request queue.

Thresholding capabilities are available in the Montecito processor’s multi-occurrence counters and can be used to plot an event distribution histogram. When a non-zero threshold is specified, the monitor is incremented by one in every cycle in which the observed event count exceeds that programmed threshold. This allows questions such as “For how many cycles did the memory request queue contain more than two entries?” or “During how many cycles did the machine retire more than three instructions?” to be answered. This capability allows microarchitectural buffer sizing experiments to be supported by real measurements. By running a benchmark with different threshold values, a histogram can be drawn up that may help to identify the performance “knee” at a certain buffer size.
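A buffer-sizing experiment of this kind amounts to sweeping the threshold and re-running the workload. In this sketch, program_threshold_counter() and read_counter() are hypothetical stand-ins for the OS-level PMU programming interface:

    #include <stdint.h>
    #include <stdio.h>

    /* Hypothetical OS-level PMU services; assumptions of this sketch. */
    extern void     program_threshold_counter(unsigned event, unsigned threshold);
    extern uint64_t read_counter(void);
    extern void     workload(void);

    void threshold_histogram(unsigned event, unsigned max_threshold)
    {
        /* With a non-zero threshold, the counter increments once per
         * cycle in which the observed event count exceeds the threshold. */
        for (unsigned t = 0; t <= max_threshold; t++) {
            program_threshold_counter(event, t);
            workload();
            printf("cycles with > %u occurrences: %llu\n",
                   t, (unsigned long long)read_counter());
        }
    }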

For overlapping concurrent events, such as pending memory operations, the average number of concurrently outstanding requests and the average number of cycles that requests were pending are of interest. To calculate the average number or latency of multiple outstanding requests in the memory queue, we need to know the total number of requests (n_total) and the number of live requests in each cycle (n_live). By summing the per-cycle live request counts with a multi-occurrence counter, Σn_live is directly measured by hardware. We can then calculate the average number of requests and the average latency as follows:

• Average outstanding requests/cycle = Σn_live / Δt, where Δt is the number of cycles observed

• Average latency per request = Σn_live / n_total

An example of this calculation is given in Table 3-1, in which the average outstanding requests/cycle = 15/8 = 1.875, and the average latency per request = 15/5 = 3 cycles.

Table 3-1. Average Latency per Request and Requests per Cycle Calculation Example

Time [Cycles]       1   2   3   4   5   6   7   8
# Requests In       1   1   1   1   1   0   0   0
# Requests Out      0   0   0   1   1   1   1   1
n_live              1   2   3   3   3   2   1   0
Σn_live (running)   1   3   6   9  12  14  15  15
n_total (running)   1   2   3   4   5   5   5   5
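The arithmetic behind Table 3-1 can be reproduced directly from the per-cycle in/out counts. This small, self-contained program recomputes the running Σn_live and n_total and prints both averages:

    #include <stdio.h>

    int main(void)
    {
        /* Per-cycle request arrivals and completions from Table 3-1. */
        int in[8]  = { 1, 1, 1, 1, 1, 0, 0, 0 };
        int out[8] = { 0, 0, 0, 1, 1, 1, 1, 1 };

        int live = 0, live_sum = 0, total = 0;
        for (int c = 0; c < 8; c++) {
            live     += in[c] - out[c]; /* requests outstanding this cycle */
            live_sum += live;           /* running sum of n_live           */
            total    += in[c];          /* n_total: requests seen so far   */
        }

        printf("avg outstanding/cycle = %d/8 = %.3f\n", live_sum, live_sum / 8.0);
        printf("avg latency/request   = %d/%d = %.1f cycles\n",
               live_sum, total, (double)live_sum / total);
        return 0;   /* prints 15/8 = 1.875 and 15/5 = 3.0 */
    }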

The Montecito processor provides the following capabilities to support event rate monitoring:
• Clock cycle counter
• Retired instruction counter
• Event occurrence and duration counters
• Multi-occurrence counters with thresholding capability

3.2.1.4 Cycle Accounting

While event rate monitoring counts the number of events, it does not tell us whether the observed events are contributing to a performance problem. A commonly used strategy is to plot multiple event rates and correlate them with the measured instructions per cycle (IPC) rate. If a low IPC occurs concurrently with a peak of cache miss activity, chances are that cache misses are causing a performance problem. To eliminate such guess work, the Montecito processor provides a set of cycle accounting monitors that break down the number of cycles lost to various kinds of microarchitectural events. As shown in Figure 3-2, this lets us account for every cycle spent by a program and therefore provides insight into an application’s microarchitectural behavior. Note that cycle accounting is different from simple stall or flush duration counting. Cycle accounting is based on the machine’s actual stall and flush conditions, and accounts for overlapped pipeline delays, while simple stall or flush duration counters do not. Cycle accounting determines a program’s cycle breakdown by stall and flush reasons, while simple duration counters are useful in determining cumulative stall or flush latencies.

Figure 3-2. Itanium® Processor Family Cycle Accounting

(Example breakdown of 100% of execution time: inherent program execution latency 30%, data access stall cycles 20%, branch mispredicts 15%, instruction fetch stalls 10%, other stalls 25%.)

The Montecito processor cycle accounting monitors account for all major single and multi-cycle stall and flush conditions. Overlapping stall and flush conditions are prioritized in reverse pipeline order, i.e. delays that occur later in the pipe and that overlap with earlier stage delays are reported as being caused later in the pipeline. The six back-end stall and flush reasons are prioritized in the following order:
1. Exception/Interruption Cycle: cycles spent flushing the pipe due to interrupts and exceptions.
2. Branch Mispredict Cycle: cycles spent flushing the pipe due to branch mispredicts.
3. Data/FPU Access Cycle: memory pipeline full, data TLB stalls, load-use stalls, and accesses to the floating-point unit.
4. Execution Latency Cycle: scoreboard and other register dependency stalls.
5. RSE Active Cycle: RSE spill/fill stalls.
6. Front End Stalls: stalls due to the back end waiting on the front end.

Additional front-end stall counters are available which detail seven possible reasons for a front-end stall to occur. However, the back-end and front-end stall events should not be compared since they are counted in different stages of the pipeline.

For details, refer to Section 4.6.


3.2.2 Profiling

Profiling is used by application developers, profile-guided compilers, optimizing linkers, and run-time systems. Application developers are interested in identifying performance bottlenecks and relating them back to their source code. Based on profile feedback, developers can make changes to the high-level algorithms and data structures of the program. Compilers can use profile feedback to optimize instruction schedules by employing advanced features of the Itanium architecture, such as predication and speculation.

To support profiling, performance monitor counts have to be associated with program locations. The following mechanisms are supported directly by the Montecito processor’s performance monitors:
• Program Counter Sampling
• Miss Event Address Sampling: Montecito processor event address registers (EARs) provide sub-pipeline length event resolution for performance critical events (instruction and data caches, branch mispredicts, and instruction and data TLBs).
• Event Qualification: constrains event monitoring to a specific instruction address range, to certain opcodes, or to certain privilege levels.

These profiling features are presented in Section 3.2.2.1, Section 3.2.2.2, and Section 3.2.3.

3.2.2.1 Program Counter Sampling

Application tuning tools like VTune analyzer and gprof use time-based or event-based sampling of the program counter and other event counters to identify performance critical functions and basic blocks. As shown in Figure 3-3, the sampled points can be histogrammed by instruction addresses. For application tuning, statistical sampling techniques have been very successful, because the programmer can rapidly identify code hot spots in which the program spends a significant fraction of its time, or where certain event counts are high.

Program counter sampling points performance analysts at code hot spots, but does not indicate what caused the performance problem. Inspection and manual analysis of the hot-spot region, along with a fair amount of guess work, are required to identify the root cause of the performance problem. On the Montecito processor, the cycle accounting mechanism (described in Section 3.2.1.4) can be used to directly measure an application’s microarchitectural behavior.

The interval timer facilities of the Itanium architecture (ITC and ITM registers) can be used for time-based program counter sampling. Event-based program counter sampling is supported by a dedicated performance monitor overflow interrupt mechanism described in detail in Section 7.2.2 “Performance Monitor Overflow Status Registers (PMC[0]..PMC[3])” in Volume 2 of the Intel® Itanium® Architecture Software Developer’s Manual.


Figure 3-3. Event Histogram by Program Counter

(Histogram of event frequency across the address space; example events: # cache misses, # TLB misses.)

To support program counter sampling, the Montecito processor provides the following mechanisms:
• Timer interrupt for time-based program counter sampling
• Event count overflow interrupt for event-based program counter sampling
• Hardware-supported cycle accounting

3.2.2.2 Miss Event Address Sampling

Program counter sampling and cycle accounting provide an accurate picture of cumulative microarchitectural behavior, but they do not provide the application developer with pointers to specific program elements (code locations and data structures) that repeatedly cause microarchitectural “miss events”. In a cache study of the SPEC92 benchmarks, [Lebeck] used trace-based cache miss profiling to gain performance improvements of 1.02x to 3.46x on various benchmarks by making simple changes to the source code. This type of analysis requires identification of the instruction and data addresses related to microarchitectural “miss events” such as cache misses, branch mispredicts, or TLB misses. Using symbol tables or compiler annotations, these addresses can be mapped back to critical source code elements. Like Lebeck, most performance analysts in the past have had to capture hardware traces and resort to trace-driven simulation.

Due to the superscalar issue, deep pipelining, and out-of-order instruction completion of today’s microprocessors, the sampled program counter value may not be related to the instruction address that caused a miss event. On a Pentium® processor pipeline, the sampled program counter may be off by two dynamic instructions from the instruction that caused the miss event. On a Pentium® Pro processor, this distance increases to approximately 32 dynamic instructions. On the Montecito processor, it is approximately 48 dynamic instructions. If program counter sampling is used for miss event address identification on the Montecito processor, a miss event might be associated with an instruction almost five dynamic basic blocks away from where it actually occurred (assuming that 10% of all instructions are branches). Therefore, it is essential for hardware to precisely identify an event’s address.

The Montecito processor provides a set of event address registers (EARs) that record the instruction and data addresses of data cache misses for loads, the instruction and data addresses of data TLB misses, and the instruction addresses of instruction TLB and cache misses. A 16-entry-deep execution trace buffer captures sequences of branch instructions and other instructions and events which cause changes to execution flow. Table 3-2 summarizes the capabilities offered by the Montecito processor EARs and the execution trace buffer. Exposing miss event addresses to software allows them to be monitored either by sampling or by code instrumentation. This eliminates the need for trace generation to identify and solve performance problems and enables performance analysis by a much larger audience on unmodified hardware.

Table 3-2. Montecito Processor EARs and Branch Trace Buffer

Event Address Register | Triggers on | What is Recorded

Instruction Cache: triggers on instruction fetches that miss the L1 instruction cache (demand fetches only); records the instruction address and the number of cycles the fetch was in flight.
Instruction TLB (ITLB): triggers on instruction fetches that miss the L1 ITLB (demand fetches only); records the instruction address and what serviced the L1 ITLB miss (L2 ITLB, VHPT, or software).
Data Cache: triggers on load instructions that miss the L1 data cache; records the instruction address, the data address, and the number of cycles the load was in flight.
Data TLB (DTLB): triggers on data references that miss the L1 DTLB; records the instruction address, the data address, and what serviced the L1 DTLB miss (L2 DTLB, VHPT, or software).
Execution Trace Buffer: triggers on branch outcomes, and on rfi, exceptions, and failed “chk” instructions which cause a change in execution flow; records the source instruction address of the event, the target instruction address of the event, and the mispredict status and reason for branches.

The Montecito processor EARs enable statistical sampling by configuring a performance counter to count, for instance, the number of data cache misses or retired instructions. The performance counter value is set up to interrupt the processor after a predetermined number of events have been observed. The data cache event address register repeatedly captures the instruction and data addresses of actual data cache load misses. Whenever the counter overflows, miss event address collection is suspended until the event address register is read by software (this prevents software from capturing a miss event that might be caused by the monitoring software itself). When the counter overflows, an interrupt is delivered to software, the observed event addresses are collected, and a new observation interval can be set up by rewriting the performance counter register. For time-based (rather than event-based) sampling methods, the event address registers indicate to software whether or not a qualified event was captured. Statistical sampling can achieve arbitrary event resolution by varying the number of events within an observation interval and by increasing the number of observation intervals.
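The resulting sampling loop looks roughly like the following sketch. All of the functions are hypothetical placeholders for the OS/PMU services involved (a real handler runs off the overflow interrupt rather than polling); what the sketch shows is the order of operations around the overflow:

    #include <stdint.h>

    /* Hypothetical PMU services; names and signatures are assumptions of
     * this sketch, not a real API. */
    extern void program_counter_overflow(unsigned event_code, uint64_t n);
    extern int  overflow_pending(void);
    extern void read_data_ear(uint64_t *iaddr, uint64_t *daddr, unsigned *lat);
    extern void record_sample(uint64_t iaddr, uint64_t daddr, unsigned lat);

    void sample_dcache_misses(unsigned event_code, uint64_t interval)
    {
        /* Preload the counter so it overflows after 'interval' events. */
        program_counter_overflow(event_code, interval);

        for (;;) {
            if (!overflow_pending())
                continue;                 /* stand-in for the overflow interrupt */

            uint64_t iaddr, daddr;
            unsigned lat;
            read_data_ear(&iaddr, &daddr, &lat); /* EAR capture stays frozen
                                                    until this read */
            record_sample(iaddr, daddr, lat);

            program_counter_overflow(event_code, interval); /* new interval */
        }
    }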

3.2.3 Event Qualification

In the Montecito processor, many of the performance monitoring events can be qualified in a number of ways such that only a subset of the events are counted by the performance monitoring counters. As shown in Figure 3-4, events can be qualified for monitoring based on instruction address range, instruction opcode, data address range, an event-specific “unit mask” (umask), the privilege level and instruction set that caused the event, and the status of the performance monitoring freeze bit (PMC0.fr). The following paragraphs describe these capabilities in detail.
• Itanium Instruction Address Range Check: The Montecito processor allows event monitoring to be constrained to a programmable instruction address range. This enables monitoring of dynamically linked libraries (DLLs), functions, or loops of interest in the context of a large Itanium architecture-based application. The Itanium instruction address range check is applied at the instruction fetch stage of the pipeline, and the resulting qualification is carried by the instruction throughout the pipeline. This enables conditional event counting at a granularity smaller than the dynamic instruction length of the pipeline (approximately 48 instructions). The Montecito processor’s instruction address range check operates only during Itanium architecture-based code execution, i.e. when PSR.is is zero. For details, see Section 3.3.5.

Figure 3-4. Montecito Processor Event Qualification

(Figure 3-4 shows the chain of qualification checks applied to each event: is the Itanium instruction pointer in IBR range; does the Itanium opcode match; is the Itanium data address in DBR range (memory operations only); did the event happen and qualify under its event-specific “unit mask”; is the current privilege level being monitored; is the current instruction set (Itanium or IA-32) being monitored; and is event monitoring enabled (PMC0.fr). Only if all of these are true is the event qualified.)

• Itanium Instruction Opcode Match: The Montecito processor provides two independent Itanium instruction opcode match ranges, each of which matches the currently issued instruction encodings with a programmable opcode match and mask function. The resulting match events can be selected as an event type for counting by the performance counters. This allows histogramming of instruction types, usage of destination and predicate registers, as well as basic block profiling (through insertion of tagged NOPs). The opcode matcher operates only during Itanium architecture-based code execution, i.e. when PSR.is is zero. Details are described in Section 3.3.6.
• Itanium Data Address Range Check: The Montecito processor allows event collection for memory operations to be constrained to a programmable data address range. This enables selective monitoring of the data cache miss behavior of specific data structures. For details, see Section 3.3.7.
• Event Specific Unit Masks: Some events allow the specification of “unit masks” to filter out interesting events directly at the monitored unit. As an example, the number of counted bus transactions can be qualified by an event-specific unit mask to contain transactions that originated from any bus agent, from the processor itself, or from other I/O bus masters. In this case, the bus unit uses a three-way unit mask (any, self, or I/O) that specifies which transactions are to be counted. In the Montecito processor, events from the branch, memory, and bus units support a variety of unit masks. For details, refer to the event pages in Chapter 4.
• Privilege Level: Two bits in the processor status register (PSR) are provided to enable selective process-based event monitoring. The Montecito processor supports conditional event counting based on the current privilege level; this allows performance monitoring software to break down event counts into user and operating system contributions. For details on how to constrain monitoring by privilege level, refer to Section 3.3.1.
• Instruction Set: The Montecito processor supports conditional event counting based on the currently executing instruction set (Itanium or IA-32) by providing two instruction set mask bits for each event monitor. This allows performance monitoring software to break down event counts into Itanium architecture and IA-32 contributions. For details, refer to Section 3.3.1.
• Performance Monitor Freeze: Event counter overflows or software can freeze event monitoring. When frozen, no event monitoring takes place until software clears the monitoring freeze bit (PMC0.fr). This ensures that the performance monitoring routines themselves, e.g. counter overflow interrupt handlers or performance monitoring context switch routines, do not “pollute” the event counts of the system under observation. For details refer to Section 7.2.4 of Volume 2 of the Intel® Itanium® Architecture Software Developer’s Manual.

3.2.3.1 Combining Opcode Matching, Instruction, and Data Address Range Check

The Montecito processor allows various event qualification mechanisms to be combined by providing the instruction tagging mechanism shown in Figure 3-5.

Figure 3-5. Instruction Tagging Mechanism in the Montecito Processor

(Figure 3-5 shows the four tag channels: each IBR pair IBRP0-3, configured through PMC38, feeds one of the two opcode matchers (PMC32,33,36 for matcher 0 on channels 0 and 2; PMC34,35,36 for matcher 1 on channels 1 and 3). Channel 0 tags, combined with the data address range checkers (DBRs, PMC41) for memory events, then pass through the privilege level and instruction set checks (PMC.plm, PMC.ism) before reaching the event select (PMCi.es) and the counters (PMDi).)


During Itanium instruction execution, the instruction address range check is applied first. It is applied separately for each IBR pair (IBRP) to generate four independent tag bits which flow down the machine in four tag channels. Tags in the four tag channels are then passed to two opcode matchers that combine the instruction address range check with the opcode match and generate another set of four tags. This is done by combining tag channels 0 and 2 with the first opcode match registers and tag channels 1 and 3 with the second opcode match registers, as shown in Figure 3-5. Each of the four combined tags in the four tag channels can be counted as a retired instruction count event (for details refer to the event description “IA64_TAGGED_INST_RETIRED”).

The combined Itanium processor address range and opcode match tag in tag channel 0 qualifies all downstream pipeline events. Events in the memory hierarchy (L1 and L2 data cache and data TLB events) can further be qualified using a data address range (DBR) tag.

As summarized in Figure 3-5, data address range checking can be combined with opcode matching and instruction range checking on the Montecito processor. Additional event qualifications based on the current privilege level can be applied to all events and are discussed in Section 3.2.3.2.

Table 3-3. Montecito Processor Event Qualification Modes

Event Qualification Mode | Instruction Address Range Check Enable, PMC32.ig_ad (1) | Tag Channel Config, PMC38 | Opcode Match Config, PMC36 | Opcode Match, PMC32,33 / PMC34,35 | Data Address Range Check [PMC41.e_dbrpj, PMC41.cfg_dtagj] (2) (mem pipe events only)

Unconstrained Monitoring, channel 0 (all events) | 1 | x | x | x | [1,11] or [0,xx]
Unconstrained Monitoring, channel i (i=0,1,2,3; limited events only) | 0 | ig_ibrpi=1 | Chi_ig_OPC=1 | x | [1,11] or [0,xx]
Instruction Address Range Check only, channel 0 | 0 | ig_ibrp0=0 | Ch0_ig_OPC=1 | x | [1,00]
Opcode Matching only, channel i | 0 | ig_ibrpi=1 | Chi_ig_OPC=0 | desired opcodes | [1,01]
Data Address Range Check only | 1 | x | x | x | [1,10]
Instruction Address Range Check and Opcode Matching, channel 0 | 0 | ig_ibrp0=0 | Ch0_ig_OPC=0 | desired opcodes | [1,01]
Instruction and Data Address Range Check | 0 | ig_ibrp0=0 | Ch0_ig_OPC=1 | x | [1,00]
Opcode Matching and Data Address Range Check | 0 | x | Ch0_ig_OPC=0 | desired opcodes | [1,00]

1. For all cases where PMC32.ig_ad is set to 0, PMC32.inv must be set to 0 if address range inversion is not needed.
2. See the second column for the value of the PMC32.ig_ad bit field.


3.2.3.2 Privilege Level Constraints

Performance monitoring software cannot always count on context switch support from the operating system. In general, this has made performance analysis of a single process in a multi-processing system or a multi-process workload impossible. To provide hardware support for this kind of analysis, the Itanium architecture specifies three global bits (PSR.up, PSR.pp, DCR.pp) and a per-monitor “privilege monitor” bit (PMCi.pm). To break down the performance contributions of operating system and user-level application components, each monitor specifies a 4-bit privilege level mask (PMCi.plm). The mask is compared to the current privilege level in the processor status register (PSR.cpl), and event counting is enabled if PMCi.plm[PSR.cpl] is one. The Montecito processor performance monitor control is discussed in Section 3.3.1.

PMC registers can be configured as user-level monitors (PMCi.pm is 0) or system-level monitors (PMCi.pm is 1). A user-level monitor is enabled whenever PSR.up is one. PSR.up can be controlled by an application using the “sum”/”rum” instructions. This allows applications to enable/disable performance monitoring for specific code sections. A system-level monitor is enabled whenever PSR.pp is one. PSR.pp can be controlled at privilege level 0 only, which allows monitor control without interference from user-level processes. The pp field in the default control register (DCR.pp) is copied into PSR.pp whenever an interruption is delivered. This allows events generated during interruptions to be broken down separately: if DCR.pp is 0, events during interruptions are not counted; if DCR.pp is 1, they are included in the kernel counts.
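For example, once a kernel driver has configured a user monitor (PMC.pm=0 with a plm that enables cpl 3), an application can bracket a region of interest by toggling PSR.up with sum/rum. This sketch assumes GCC-style ia64 inline assembly, and assumes mask 4 selects the PSR.up bit:

    /* Toggle PSR.up around a measured region. 'sum'/'rum' set and reset
     * user-mask bits; mask 4 is assumed here to select PSR.up. */
    static inline void pmu_user_enable(void)
    {
        __asm__ volatile ("sum 4" ::: "memory");
    }

    static inline void pmu_user_disable(void)
    {
        __asm__ volatile ("rum 4" ::: "memory");
    }

    extern void region_of_interest(void);

    void measure_region(void)
    {
        pmu_user_enable();      /* user monitors (PMC.pm=0) start counting */
        region_of_interest();
        pmu_user_disable();     /* counting stops; counters read afterward */
    }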

As shown in Figure 3-6, Figure 3-7, and Figure 3-8, single-process, multi-process, and system-level performance monitoring are possible by specifying the appropriate combination of PSR and DCR bits. These bits allow performance monitoring to be controlled entirely from a kernel-level device driver, without explicit operating system support. Once the desired monitoring configuration has been set up in a process’ processor status register (PSR), “regular” unmodified operating system context switch code automatically enables and disables performance monitoring.

With support from the operating system, individual per-process breakdown of event counts can be generated as outlined in the chapter on performance monitoring in the Intel® Itanium® Architecture Software Developer’s Manual.

3.2.3.3 Instruction Set Constraints

Instruction set constraints are not fully supported in Montecito and the corresponding PMC register instruction set mask (PMCi.ism) should be set to Itanium architecture only (‘10) to ensure correct operation. Any other values for these bits may cause undefined behavior.


Figure 3-6. Single Process Monitor

(Three configurations are shown, each with processes A, B, and C running user-level (cpl=3), kernel-level (cpl=0), and interrupt-level (cpl=0) code. Left: PSR_A.up=1 (others 0), PMC.pm=0, PMC.plm=1000, DCR.pp=0 monitors only process A's user-level code. Middle: PSR_A.pp=1 (others 0), PMC.pm=1, PMC.plm=1001, DCR.pp=0 adds process A's kernel-level code but excludes interrupt handlers. Right: the same settings with DCR.pp=1 also include process A's interrupt-level code.)

Figure 3-7. Multiple Process Monitor

(Identical in structure to Figure 3-6, except that both processes A and B have the monitoring bit set (PSR_A/B.up=1 or PSR_A/B.pp=1, others 0), so the monitor accumulates events across the two monitored processes, with the same PMC.pm, PMC.plm, and DCR.pp combinations selecting user-level, kernel-level, and interrupt-level coverage.)


Figure 3-8. System Wide Monitor

(All processes have PSR.up=1 or PSR.pp=1. With PMC.pm=0, PMC.plm=1000, DCR.pp=0 the monitor covers all user-level code; with PMC.pm=1, PMC.plm=1001 and DCR.pp=0 or DCR.pp=1 it covers kernel-level code system-wide, excluding or including interrupt-level code respectively.)

3.2.4 References

• [gprof] S.L. Graham, P.B. Kessler, and M.K. McKusick, “gprof: A Call Graph Execution Profiler”, Proceedings of the SIGPLAN ’82 Symposium on Compiler Construction; SIGPLAN Notices, Vol. 17, No. 6, pp. 120-126, June 1982.
• [Lebeck] Alvin R. Lebeck and David A. Wood, “Cache Profiling and the SPEC Benchmarks: A Case Study”, Tech Report 1164, Computer Sciences Dept., University of Wisconsin-Madison, July 1993.
• [VTune] Mark Atkins and Ramesh Subramaniam, “PC Software Performance Tuning”, IEEE Computer, Vol. 29, No. 8, pp. 47-54, August 1996.
• [WinNT] Russ Blake, “Optimizing Windows NT™”, Volume 4 of the “Windows NT Resource Kit for Windows NT Version 3.51”, Microsoft Press, 1995.

3.3 Performance Monitor State

The Itanium performance monitoring architecture described in Volume 2 of the Intel® Itanium® Architecture Software Developer’s Manual defines two sets of performance monitor registers: Performance Monitor Configuration (PMC) registers to configure the monitoring, and Performance Monitor Data (PMD) registers to provide data values from the monitors. The architecture also allows for both architected and model-specific registers. Complying with this architectural definition, Montecito provides both kinds of PMCs and PMDs. As shown in Figure 3-9, the Montecito processor provides 12 48-bit performance counters (PMC/PMD4-15 pairs) and a set of model-specific monitoring registers.

Table 3-4 defines the PMC/PMD register assignments for each monitoring feature. The interrupt status registers are mapped to PMC0,1,2,3. The 12 generic performance counter pairs are assigned to PMC/PMD4-15. The Event Address Registers (EARs) and the Execution Trace Buffer (ETB) are controlled by three configuration registers (PMC37,40,39). Captured event addresses and cache miss latencies are accessible to software through five event address data registers (PMD32,33,34,35,36) and a branch trace buffer (PMD48-63). On the Montecito processor, monitoring of some events can additionally be constrained to a programmable instruction address range by appropriately setting the instruction breakpoint registers (IBRs) and the instruction address range check register (PMC38) and turning on the checking mechanism in the opcode match registers (PMC32,33,34,35). Two opcode match register sets and an opcode match configuration register (PMC36) allow monitoring of some events to be qualified with a programmable opcode. For memory operations, events can be qualified by a programmable data address range by appropriate setting of the data breakpoint registers (DBRs) and the data address range configuration register (PMC41). Montecito, being capable of running two threads, provides the illusion of two processors by providing exactly the same set of performance monitoring features and structures separately for each thread.

Table 3-4. Montecito Processor Performance Monitor Register Set

Monitoring Feature | Configuration Registers (PMC) | Data Registers (PMD) | Description

Interrupt Status | PMC0,1,2,3 | none | See Section 3.3.3, “Performance Monitor Event Counting Restrictions Overview”
Event Counters | PMC4-15 | PMD4-15 | See Section 3.3.2, “Performance Counter Registers”
Opcode Matching | PMC32,33,34,35,36 | none | See Section 3.3.6, “Opcode Match Check (PMC32,33,34,35,36)”
Instruction EAR | PMC37 | PMD34,35 | See Section 3.3.8, “Instruction EAR (PMC37/PMD34,35)”
Data EAR | PMC40 | PMD32,33,36 | See Section 3.3.9, “Data EAR (PMC40, PMD32,33,36)”
Branch Trace Buffer | PMC39 | PMD48-63,39 | See Section 3.3.10, “Execution Trace Buffer (PMC39,42, PMD48-63,38,39)”
Instruction Address Range Check | PMC38 | none | See Section 3.3.5, “Instruction Address Range Matching”
Memory Pipeline Event Constraints | PMC41 | none | See Section 3.3.7, “Data Address Range Matching (PMC41)”
Retired IP EAR | PMC42 | PMD48-63,39 | See Section 3.3.10.2, “IP Event Address Capture (PMC42.mode=‘1xx)”


Figure 3-9. Montecito Processor Performance Monitor Register Mode

(Figure 3-9 shows two register groups. The generic set comprises the performance counter overflow status registers pmc0-pmc3, the performance counter configuration registers pmc4-pmc15 with their data registers pmd4-pmd15, and the related Processor Status Register (PSR), Default Control Register (DCR), and Performance Monitor Vector Register (PMV). The Montecito processor-specific set comprises the instruction/data address range check configuration registers pmc38 and pmc41; the instruction/data EAR configuration registers pmc37 and pmc40 with data registers pmd32-pmd36; the opcode match configuration registers pmc32-pmc36; and the ETB configuration register pmc39 and IP-EAR configuration register pmc42 with data registers pmd48-pmd63 and support registers pmd38 and pmd39.)


3.3.1 Performance Monitor Control and Accessibility

As in other IPF processors, Montecito event collection is controlled by the Performance Monitor Configuration (PMC) registers and the processor status register (PSR). Four PSR fields (PSR.up, PSR.pp, PSR.cpl and PSR.sp) and the performance monitor freeze bit (PMC0.fr) affect the behavior of all performance monitor registers.

Per-monitor control is provided by three PMC register fields (PMCi.plm, PMCi.ism, and PMCi.pm). Event collection for a monitor is enabled under the following constraints on the Montecito processor:
• the performance monitor freeze bit is clear (PMC0.fr = 0),
• the current privilege level is enabled in the monitor’s privilege level mask (PMCi.plm[PSR.cpl] = 1), and
• for a user monitor (PMCi.pm = 0) PSR.up is one, or for a privileged monitor (PMCi.pm = 1) PSR.pp is one.

Figure 3-10 defines the PSR control fields that affect performance monitoring. For a detailed definition of how the PSR bits affect event monitoring and control accessibility of PMD registers, please refer to Section 3.3.2 and Section 7.2.1 of Volume 2 of the Intel® Itanium® Architecture Software Developer’s Manual.

Table 3-5 defines per-monitor controls that apply to PMC4-15,32-42. As defined in Table 3-4, each of these PMC registers controls the behavior of its associated performance monitor data registers (PMD). The Montecito processor model-specific PMD registers associated with the instruction/data EARs and the branch trace buffer (PMD32-39,48-63) can be read only when event monitoring is frozen (PMC0.fr is one).
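Taken together, the controls above reduce to a small predicate. The following plain-C sketch (register state passed in as values; reading the actual registers is privileged and OS-specific) restates the enable condition:

    #include <stdbool.h>

    /* Evaluate whether event collection is enabled for one monitor,
     * given the PSR/PMC state described above. plm is the 4-bit
     * privilege level mask. */
    static bool monitor_enabled(bool pmc0_fr, unsigned plm, unsigned psr_cpl,
                                bool pmc_pm, bool psr_up, bool psr_pp)
    {
        if (pmc0_fr)
            return false;                    /* monitoring frozen            */
        if (!((plm >> psr_cpl) & 1))
            return false;                    /* privilege level not selected */
        return pmc_pm ? psr_pp : psr_up;     /* system vs. user monitor gate */
    }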

Figure 3-10. Processor Status Register (PSR) Fields for Performance Monitoring

(The figure identifies the PSR fields relevant to performance monitoring: PSR.up, PSR.sp, and PSR.pp in the low word, and PSR.cpl and PSR.is in the high word; all other fields are unrelated or reserved.)

Table 3-5. Performance Monitor PMC Register Control Fields (PMC4-15)

Field Bits Description

plm | 3:0 | Privilege Level Mask - controls performance monitor operation for a specific privilege level. Each bit corresponds to one of the four privilege levels, with bit 0 corresponding to privilege level 0, bit 1 to privilege level 1, etc. A bit value of 1 indicates that the monitor is enabled at that privilege level. Writing zeros to all plm bits effectively disables the monitor; in this state, the Montecito processor will not preserve the value of the corresponding PMD register(s).

pm | 6 | Privileged monitor - When 0, the performance monitor is configured as a user monitor and enabled by PSR.up. When PMC.pm is 1, the performance monitor is configured as a privileged monitor, enabled by PSR.pp, and its PMD can be read only by privileged software; any read by non-privileged software returns 0. NOTE: In PMC37 this field is implemented in bit [4].

ism | 25:24 | Instruction Set Mask - should be set to ‘10 for proper operation; behavior is undefined for other values. NOTE: PMC37 and PMC39 do not have this field.

3.3.2 Performance Counter Registers

The PMUs are not shared between hardware threads. Each hardware thread has its own set of 12 generic performance counter (PMC/PMD4-15) pairs.


Due to the complexities of monitoring in an MT “aware” environment, the PMC/PMD pairs are split according to differences in functionality. These PMC/PMD pairs can be divided into two categories: duplicated counters (PMC/PMD4-9) and banked counters (PMC/PMD10-15).

• Banked counters (PMC/PMD10-15): The banked counter capabilities are somewhat limited. These PMDs cannot increment when their thread is in the background. That is, if Thread 0 is placed in the background, Thread 0’s PMD10 cannot increment until the thread is brought back to the foreground by hardware. Due to this fact, the banked counters should not be used to monitor a thread specific event (.all is set to 0) that could occur when its thread is in the background (e.g. L3_MISSES).

• Duplicated counters (PMC/PMD4-9): In contrast, duplicated counters can increment when their thread is in the background. As such, they can be used to monitor thread specific events which could occur even when the thread those events belong to is not currently active.

PMC/PMD pairs are not entirely symmetrical in their ability to count events. Please refer to Section 3.3.3 for more information.

Figure 3-11 and Table 3-6 define the layout of the Montecito processor Performance Counter Configuration Registers (PMC4-15). The main task of these configuration registers is to select the events to be monitored by the respective performance monitor data counters. The event select (.es), unit mask (.umask), and MESI fields in the PMC registers perform the selection of these events. The remaining fields in the PMCs specify under what conditions the counting should be done (.plm, .ism, .all), by how much the counter should be incremented (.threshold), and what is to be done if the counter overflows (.oi, .ev).

Figure 3-11. Montecito Processor Generic PMC Registers (PMC4-15)

(Field layout, from bit 0 upward: plm [3:0], ev [4], oi [5], pm [6], ig [7], es [15:8], umask [19:16], threshold [22:20], ig [23], ism [25:24], all [26], MESI [30:27]; bits [63:31] ignored.)

Table 3-6. Montecito Processor Generic PMC Register Fields (PMC4-15)

Field | Bits | Description

plm | 3:0 | Privilege Level Mask. See Table 3-5, “Performance Monitor PMC Register Control Fields (PMC4-15).”
ev | 4 | External visibility - When 1, an external notification (if the capability is present) is provided whenever the counter overflows. External notification occurs regardless of the setting of the oi bit (see below).
oi | 5 | Overflow interrupt - When 1, a Performance Monitor Interrupt is raised and the performance monitor freeze bit (PMC0.fr) is set when the monitor overflows. When 0, no interrupt is raised and the performance monitor freeze bit (PMC0.fr) remains unchanged. Counter overflows generate only one interrupt. The corresponding PMC0 bit is set on an overflow independent of this bit.
pm | 6 | Privilege Monitor. See Table 3-5, “Performance Monitor PMC Register Control Fields (PMC4-15).”
ig | 7 | Reads zero; writes ignored.
es | 15:8 | Event select - selects the performance event to be monitored. Montecito processor event encodings are defined in Chapter 4, “Performance Monitor Events.”
umask | 19:16 | Unit Mask - event-specific mask bits (see event definition for details).
threshold | 22:20 | Threshold - enables thresholding for “multi-occurrence” events. When the threshold is zero, the counter sums up all observed event values. When the threshold is non-zero, the counter increments by one in every cycle in which the observed event value exceeds the threshold.
ig | 23 | Reads zero; writes ignored.
ism | 25:24 | Instruction Set Mask. See Table 3-5, “Performance Monitor PMC Register Control Fields (PMC4-15).”
all | 26 | All threads - selects whether to monitor just the self thread or both threads. This bit is applicable only to duplicated counters (PMC4-9). If 1, events from both threads are monitored; if 0, only the self thread is monitored. Filters (IAR/DAR/OPC) are only associated with the thread they belong to; if filtering of an event with .all enabled is desired, both threads’ filters should be given matching configurations.
MESI | 30:27 | Unit mask for MESI filtering; only events with this capability are affected. [27] = I; [28] = S; [29] = E; [30] = M. If the counter is measuring an event implying that a cache line is being replaced, the filter applies to the bits of the existing cache line, not the line being brought in. Also note, for events affected by MESI filtering, if the user wishes simply to capture all occurrences of the event, the filter must be set to b1111.
ig | 63:31 | Reads zero; writes ignored.
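A helper that assembles a PMC4-15 value from the fields in Table 3-6 makes the layout concrete (a sketch; writing the value to the actual register is privileged, OS-specific, and omitted):

    #include <stdint.h>

    /* Pack the PMC4-15 fields of Table 3-6 into a 64-bit register image. */
    static uint64_t make_pmc(unsigned plm, unsigned ev, unsigned oi, unsigned pm,
                             unsigned es, unsigned umask, unsigned threshold,
                             unsigned ism, unsigned all, unsigned mesi)
    {
        uint64_t v = 0;
        v |= (uint64_t)(plm       & 0xf)  << 0;    /* 3:0   privilege level mask */
        v |= (uint64_t)(ev        & 0x1)  << 4;    /* 4     external visibility  */
        v |= (uint64_t)(oi        & 0x1)  << 5;    /* 5     overflow interrupt   */
        v |= (uint64_t)(pm        & 0x1)  << 6;    /* 6     privileged monitor   */
        v |= (uint64_t)(es        & 0xff) << 8;    /* 15:8  event select         */
        v |= (uint64_t)(umask     & 0xf)  << 16;   /* 19:16 unit mask            */
        v |= (uint64_t)(threshold & 0x7)  << 20;   /* 22:20 threshold            */
        v |= (uint64_t)(ism       & 0x3)  << 24;   /* 25:24 instruction set mask */
        v |= (uint64_t)(all       & 0x1)  << 26;   /* 26    both threads         */
        v |= (uint64_t)(mesi      & 0xf)  << 27;   /* 30:27 MESI filter          */
        return v;
    }

For instance, make_pmc(0x9, 0, 1, 1, es, 0, 0, 0x2, 0, 0xf) describes a privileged monitor counting at privilege levels 0 and 3 with an overflow interrupt, the required Itanium-only ism encoding, and a MESI filter passing all line states.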

Figure 3-12 and Table 3-7 define the layout of the Montecito processor Performance Counter Data Registers (PMD4-15). A counter overflow occurs when the counter wraps (i.e., a carry out from bit 46 is detected). Software can force an external interruption or external notification after N events by preloading the monitor with a count value of 2^47 - N. Note that bit 47 is the overflow bit and must be initialized to 0 whenever the register is initialized.

When accessible, software can continuously read the performance counter registers PMD4-15 without disabling event collection. Any read of the PMD from software without the appropriate privilege level will return 0 (See “plm” in Table 3-6). The processor ensures that software will see monotonically increasing counter values.
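The preload arithmetic and the sign-extension behavior are easy to get wrong; this sketch captures both (reading and writing the PMDs themselves is done through the OS and is omitted):

    #include <stdint.h>

    #define PMD_COUNT_BITS 47

    /* Value to preload so the counter overflows after n more events. */
    static uint64_t pmd_preload(uint64_t n)
    {
        return ((uint64_t)1 << PMD_COUNT_BITS) - n;  /* 2^47 - n, ov bit = 0 */
    }

    /* Extract the 47-bit count from a PMD read; bits 63:47 of the raw
     * read mirror bit 46 (sign extension) and must be masked off. */
    static uint64_t pmd_count(uint64_t pmd)
    {
        return pmd & (((uint64_t)1 << PMD_COUNT_BITS) - 1);
    }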

Figure 3-12. Montecito Processor Generic PMD Registers (PMD4-15)

(Field layout: count [46:0], ov [47], sxt47 [63:48].)

Table 3-7. Montecito Processor Generic PMD Register Fields

Field Bits Description

sxt47 | 63:48 | Writes are ignored. Reads return the value of bit 46, so count values appear sign-extended.
ov | 47 | Overflow bit (carry out from bit 46). NOTE: When writing to a PMD, always write 0 to this bit. Reads return the value of bit 46. Do not use this field to determine whether the counter has overflowed; use the appropriate bit from PMC0 instead.
count | 46:0 | Event count. The counter is defined to overflow when the count field wraps (carry out from bit 46).


3.3.3 Performance Monitor Event Counting Restrictions Overview

Similar to other Itanium brand products, not all performance monitoring events can be monitored using any of the generic performance monitor counters (PMD4-15). The following should be noted when determining which counter to use to monitor an event. This is an overview; further details can be found under the specific event or event type.
• ER/SI/L2D events can only be monitored using PMD4-9 (these are the events with event select IDs in the ranges ‘h8x, ‘h9x, ‘hax, ‘hbx, ‘hex and ‘hfx).
• To monitor any L2D events it is necessary to monitor at least one L2D event in either PMC4 or PMC6. (See Section 4.8.4 for more information.)
• To monitor any L1D events it is necessary to program PMC5/PMD5 to monitor one L1D event. (See Section 4.8.2 for more information.)

• In an MT-enabled system, if a "floating" event is monitored in a banked counter (PMC/PMD10-15), the value may be incorrect. To ensure accuracy, these events should be measured by a duplicated counter (PMC/PMD4-9).
• The CYCLES_HALTED event can only be monitored in PMD10. If measured by any other PMD, the count value is undefined.

3.3.4 Performance Monitor Overflow Status Registers (PMC0,1,2,3)

As previously mentioned, the Montecito processor supports 12 performance monitoring counters per thread. The overflow status of these 12 counters is indicated in register PMC0. As shown in Figure 3-13 and Table 3-8, only bits PMC0[15:4,0] are populated. All other overflow bits are ignored, i.e. they read as zero and ignore writes.

Figure 3-13. Montecito Processor Performance Monitor Overflow Status Registers (PMC0,1,2,3): PMC0: [63:16] ig, [15:4] overflow, [3:1] ig, [0] fr; PMC1, PMC2, PMC3: [63:0] ig

Table 3-8. Montecito Processor Performance Monitor Overflow Register Fields (PMC0,1,2,3)

Register Field Bits HW Reset Description

PMC0 fr 0 0 Performance monitor "freeze" bit. When 1, event monitoring is disabled; when 0, event monitoring is enabled. This bit is set by hardware whenever a performance monitor overflow occurs and its corresponding overflow interrupt bit (PMC.oi) is set to one; software is responsible for clearing it. When the PMC.oi bit is not set, counter overflows do not set this bit.

PMC0 ig 3:1 - Read zero, Writes ignored.



PMC0 overflow 15:4 0 Event counter overflow. When bit n is one, it indicates that PMDn overflowed. This is a bit vector indicating which performance monitors overflowed. These overflow bits are set when their corresponding counters overflow, regardless of the state of the PMC.oi bit. Software may also set these bits. These bits are sticky and multiple bits may be set.

PMC0 ig 63:16 - Read zero, Writes ignored.

PMC1,2,3 ig 63:0 - Read zero, Writes ignored.
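Since the overflow field is a sticky bit vector and multiple bits may be set, a handler typically scans bits 15:4 to find every counter that wrapped; a minimal sketch using a hypothetical read_pmc() accessor:

    #include <stdint.h>

    extern uint64_t read_pmc(int index); /* hypothetical accessor */

    /* Visit every PMD whose overflow bit is set in PMC0[15:4].
     * The whole field must be scanned, since several counters may
     * have overflowed before software reacted. */
    static void for_each_overflowed_pmd(void (*visit)(int pmd_index))
    {
        uint64_t pmc0 = read_pmc(0);
        for (int n = 4; n <= 15; n++) {
            if (pmc0 & ((uint64_t)1 << n))
                visit(n); /* PMDn overflowed */
        }
    }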

3.3.5 Instruction Address Range Matching

The Montecito processor allows event monitoring to be constrained to a range of instruction addresses. Once programmed with these constraints, only the events generated by instructions whose addresses fall within this range are counted using PMD4-15. The four architectural Instruction Breakpoint Register Pairs IBRP0-3 (IBR0-7) are used to specify the desired address ranges. Using these IBR pairs it is possible to define up to 4 different address ranges (only 2 address ranges in "fine mode") that can be used to qualify event monitoring.

Once programmed, each of these 4 address restrictions can be applied separately to every event identified as supporting it. The event IA64_INST_RETIRED is the only event that can be constrained using any of the four address ranges. Events described as prefetch events can only be constrained using address range 2 (IBRP1). All other events can only use the first address range (IBRP0), and this range is considered the default for this section.

In addition to constraining events based on instruction addresses, the Montecito processor allows event qualification based on the opcode of the instruction and, for memory-related instructions, on the address of the data accessed. This is done by applying these constraints to the same 4 instruction address ranges described in this section. These features are explained in Section 3.3.6 and Section 3.3.7.

3.3.5.1 PMC38

Performance Monitoring Configuration register PMC38 is the main control register for the instruction address range matching feature. In addition to this register, PMC32 also controls certain aspects of this feature, as explained in the following paragraphs.

Figure 3-14 and Table 3-10 describe the fields of register PMC38. For the proper use of instruction address range checking described in this section, PMC38 is expected to be programmed to 0xdb6 as the default value.

Instruction address range checking is controlled by the "ignore address range check" bits (PMC32.ig_ad and PMC38.ig_ibrp0). When PMC32.ig_ad is one (or PMC38.ig_ibrp0 is one), all instructions are included (i.e. un-constrained) regardless of IBR settings. In this mode, events from both IA-32 and Itanium architecture-based code execution contribute to the event count. When both PMC32.ig_ad and PMC38.ig_ibrp0 are zero, the instruction address range check based on the IBRP0 settings is applied to all Itanium processor code fetches. In this mode, IA-32 instructions are never tagged, and, as a result, events generated by IA-32 code execution are ignored. Table 3-9 defines the behavior of the instruction address range checker for different combinations of PSR.is and PMC32.ig_ad OR PMC38.ig_ibrp0.


Table 3-9. Montecito Processor Instruction Address Range Check by Instruction Set

PMC32.ig_ad OR PMC38.ig_ibrp0 = 0: PSR.is = 0 (Itanium): tag only Itanium instructions if they match the IBR range. PSR.is = 1 (IA-32): do not tag any IA-32 operations.
PMC32.ig_ad OR PMC38.ig_ibrp0 = 1: tag all Itanium and IA-32 instructions; ignore the IBR range.

The processor compares every Itanium instruction fetch address IP{63:0} against the address range programmed into the architectural instruction breakpoint register pair IBRP0. Regardless of the value of the instruction breakpoint fault enable (IBR x-bit), the following expression is evaluated for the Montecito processor's IBRP0:

IBRP0-match(IP{63:0}) = AND, for i = 0..63, of (NOT IBR1.mask{i} OR (IP{i} == IBR0.addr{i}))

The events which occur before the instruction dispersal stage will fire only if this qualified match (IBRmatch) is true. This qualified match is ANDed with the result of Opcode Matcher PMC32,33 and further qualified with more user-definable bits (see Table 3-10) before being distributed to different places. The events which occur after the instruction dispersal stage will use this new qualified match (IBRP0-OpCode0 match).

Figure 3-14. Instruction Address Range Configuration Register (PMC38): [63:14] reserved, [13] fine, [12:11] reserved, [10] ig_ibrp3, [9:8] reserved, [7] ig_ibrp2, [6:5] reserved, [4] ig_ibrp1, [3:2] reserved, [1] ig_ibrp0, [0] reserved

Table 3-10. Instruction Address Range Configuration Register Fields (PMC38)

Field Bits Description
ig_ibrp0 1 1: No constraint. 0: Address range 0 based on IBRP0 is enabled.
ig_ibrp1 4 1: No constraint. 0: Address range 1 based on IBRP1 is enabled.
ig_ibrp2 7 1: No constraint. 0: Address range 2 based on IBRP2 is enabled.



ig_ibrp3 10 1: No constraint. 0: Address range 3 based on IBRP3 is enabled.
fine 13 Enable fine-mode (non-power-of-2) address range checking.
1: IBRP0,2 and IBRP1,3 are paired to define two address ranges. 0: Normal mode.
If set to 1, IBRP0 and IBRP2 define the lower and upper limits for address range 0; similarly, IBRP1 and IBRP3 define the lower and upper limits for address range 1. Bits [63:16] of the upper and lower limits must be exactly the same but may have any value. Bits [15:0] of the upper limit must be greater than bits [15:0] of the lower limit. If an address falls between the upper and lower limits, a match is signaled only on address range 0 or 1; event qualification based on address ranges 2 and 3 is not defined in this mode. NOTE: The mask bits programmed in IBRs 1,3,5,7 for bits [15:0] have no effect in this mode. When using fine-mode address range 0, it is necessary to program PMC38.ig_ibrp0 and PMC38.ig_ibrp2 to 0. Similarly, when using address range 1, it is necessary to set PMC38.ig_ibrp1 and PMC38.ig_ibrp3 to 0.
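A minimal check of the fine-mode limit restrictions just listed (the function name is illustrative, not part of any API):

    #include <stdbool.h>
    #include <stdint.h>

    /* Check that lower/upper bundle addresses form a legal fine-mode
     * range: bits [63:16] identical, and upper[15:0] strictly greater
     * than lower[15:0], per the PMC38.fine field description. */
    static bool fine_range_is_legal(uint64_t lower, uint64_t upper)
    {
        if ((lower >> 16) != (upper >> 16))
            return false;
        return (upper & 0xffff) > (lower & 0xffff);
    }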

IBRP0 match is generated in the following fashion. Note that unless fine mode is used, arbitrary range checking cannot be performed, since the mask bits only describe power-of-2 ranges. In fine mode, two IBR pairs are used to specify the upper and lower limits of a range within a page (the upper bits of the lower and upper limits must be exactly the same).

Non-fine mode: IBRP0-match = AND, for i = 0..63, of (NOT IBR1.mask{i} OR (IP{i} == IBR0.addr{i}))
Fine mode (range 0): IBRP0-fine-match = (IP{63:16} == IBR0{63:16}) AND (IBR0{15:0} <= IP{15:0}) AND (IP{15:0} <= IBR4{15:0})

The instruction range checking considers the address range specified by IBRPi only if PMC32.ig_ad (for i=0), PMC38.ig_ibrpi, and the IBRPi x-bit are all 0. If the IBRPi x-bit is set, that IBRP is used for debug purposes, as described in the Itanium architecture.

3.3.5.2 Use of IBRP0 For Instruction Address Range Check - Exception 1

The address range constraint for prefetch events applies to the target address of these events rather than to the address of the prefetch instruction. Therefore IBRP1 must be used for constraining these events. Calculation of the IBRP1 match is the same as that of the IBRP0 match, except that IBR2,3,6 are used instead of IBR0,1,4.

Note: Register PMC38 must contain the predetermined value 0x0db6. If software modifies any bits not listed in Table 3-10, processor behavior is undefined. It is illegal to have PMC41[48:45]=0000 and PMC32.ig_ad=0 and ((PMC38[2:1]=10 or 00) or (PMC38[5:4]=10 or 00)); this produces inconsistencies in tagging I-side events in L1D and L2.


3.3.5.3 Use of IBRP0 For Instruction Address Range Check - Exception 2

The address range constraint for the IA64_TAGGED_INST_RETIRED event uses all four IBR pairs. Calculation of the IBRP2 match is the same as that of the IBRP0 match, except that IBR4,5 (in non-fine mode) are used instead of IBR0,1. Calculation of the IBRP3 match is the same as that of the IBRP1 match, except that IBR6,7 (in non-fine mode) are used instead of IBR2,3. The instruction range check tag is computed early in the processor pipeline and therefore includes speculative and wrong-path instructions as well as predicated-off instructions. Furthermore, range check tags are not accurate in the instruction fetch and out-of-order parts of the pipeline (cache and bus units). Therefore, software must accept a level of range check inaccuracy for events generated by these units, especially for non-looping code sequences that are shorter than the Montecito processor pipeline. As described in Section 3.2.3.1, the instruction range check result may be combined with the results of the opcode match registers described in Section 3.3.6.

3.3.5.4 Fine Mode Address Range Check

In addition to the coarse address range checking described above, the Montecito processor can be programmed to perform address range checks in fine mode, with two address ranges available. The first range is defined using IBRP0 and IBRP2, while the second is defined using IBRP1 and IBRP3. When properly programmed to use address range 0, all performance monitoring events that can be qualified with IBRP0 are instead qualified with this new address range (defined collectively by IBRP0 and IBRP2). Similarly, when using address range 1, all events that could be qualified with IBRP1 are qualified with this new address range.

A user can configure the Montecito PMU to use fine-mode address range 0 by following these steps (it is assumed that PMCs 32,33,34,35,36,38,41 all start with default settings); a sketch of the resulting register values follows this list:
• Program IBRP0 and IBRP2 to define the instruction address range, following the programming restrictions noted in Table 3-10.
• Program PMC32[ig_ad,inv] = '00 to turn off the default tags injected into tag channel 0.
• Program PMC38[ig_ibrp0,ig_ibrp2] = '00 to turn on address tagging based on IBRP0 and IBRP2.
• Program PMC38.fine = 1.
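A minimal sketch of those register values, assuming the documented defaults (PMC38 = 0xdb6, PMC32 = all ones), the field positions from Table 3-10 and Table 3-11, and a hypothetical write_pmc() accessor:

    #include <stdint.h>

    extern void write_pmc(int index, uint64_t value); /* hypothetical accessor */

    /* Configure fine-mode address range 0 (IBRP0 = lower limit,
     * IBRP2 = upper limit), assuming the IBRs are already programmed. */
    static void enable_fine_mode_range0(void)
    {
        uint64_t pmc38 = 0xdb6;              /* documented default value      */
        pmc38 &= ~((uint64_t)1 << 1);        /* ig_ibrp0 = 0: enable range 0  */
        pmc38 &= ~((uint64_t)1 << 7);        /* ig_ibrp2 = 0: enable range 2  */
        pmc38 |=  ((uint64_t)1 << 13);       /* fine = 1: pair IBRP0/IBRP2    */
        write_pmc(38, pmc38);

        uint64_t pmc32 = ~(uint64_t)0;       /* default: match all opcodes    */
        pmc32 &= ~((uint64_t)1 << 57);       /* ig_ad = 0                     */
        pmc32 &= ~((uint64_t)1 << 56);       /* inv = 0                       */
        write_pmc(32, pmc32);
    }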

Similarly, a user can configure the Montecito PMU to use fine-mode address range 1 by following the same steps, but with IBRP1 and IBRP3. The only exception is that PMC32[ig_ad,inv] need not be programmed.

3.3.6 Opcode Match Check (PMC32,33,34,35,36)

As shown in Figure 3-5, in the Montecito processor event monitoring can be constrained based on the Itanium processor encoding (opcode) of an instruction. Registers PMC32,33,34,35,36 configure this feature. In Montecito, registers PMC32,33 and PMC34,35 define two opcode matchers (Opcode Matcher 0 (OpCM0) and Opcode Matcher 1 (OpCM1)). Register PMC36 controls how opcode matching is applied to the four instruction address ranges defined using the IBRPs.

3.3.6.1 PMC32,33,34,35

Figure 3-15, Figure 3-16 and Table 3-11, Table 3-12 describe the fields of the PMC32,33,34,35 registers. Figure 3-17 and Table 3-13 describe register PMC36.


All combinations of bits [51:48] in PMC32,34 are supported. To match an A-slot instruction, it is necessary to set bits [51:50] to 11. To match all instruction types, bits [51:48] should be set to 1111. To ensure that all events are counted independent of the opcode matcher, all mifb and all mask bits of PMC32,34 should be set to one (all opcodes match) while keeping the inv bit cleared. Once the opcode matcher constraints are generated, they are ANDed with the address range constraints available on the 4 IBRP channels to form 4 combined address range and opcode match constraints, as follows: the constraints defined by OpCM0 are ANDed with the address constraints defined by IBRP0 and IBRP2 to form the combined constraints for channels 0 and 2, and the constraints defined by OpCM1 are ANDed with the address constraints defined by IBRP1 and IBRP3 to form the combined constraints for channels 1 and 3.

Figure 3-15. Opcode Match Registers (PMC32,34): [63:58] ig, [57] ig_ad, [56] inv, [55:52] ig, [51] m, [50] i, [49] f, [48] b, [47:41] ig, [40:0] mask

Table 3-11. Opcode Match Registers (PMC32,34)

Field Bits Width HW Reset Description

mask 40:0 41 all 1s Bits that mask Itanium® instruction encoding bits. Any of the 41 syllable bits can be selectively masked. If a mask bit is set to 1, the corresponding opcode bit is not used for opcode matching.
ig 47:41 7 n/a Reads zero; writes ignored.
b 48 1 1 If 1: match if the opcode is a B-slot instruction.
f 49 1 1 If 1: match if the opcode is an F-slot instruction.
i 50 1 1 If 1: match if the opcode is an I-slot instruction.
m 51 1 1 If 1: match if the opcode is an M-slot instruction.
ig 55:52 4 n/a Reads zero; writes ignored.
inv 56 1 1 Invert range check for tag channel 0. If set to 1, the address range specified by IBRP0 is inverted. Effective only when the ig_ad bit is set to 0. NOTE: This bit is ignored in PMC34.
ig_ad 57 1 1 Ignore instruction address range checking for tag channel 0. If set to 1, all instruction addresses are considered for events. If 0, IBRs 0-1 are used for address constraints. NOTE: This bit is ignored in PMC34.
ig 63:58 6 n/a Reads zero; writes ignored.

Figure 3-16. Opcode Match Registers (PMC33,35): [63:41] ig, [40:0] match


Table 3-12. Opcode Match Registers (PMC33,35)

Field Bits Width HW Reset Description

match 40:0 41 all 1s Opcode bits against which the Itanium® instruction encoding is matched. Each opcode bit has a corresponding bit position here.
ig 63:41 23 n/a Ignored bits.

3.3.6.2 PMC36

Performance Monitoring Configuration register PMC36 controls whether or not to apply opcode matching in event qualification. As mentioned earlier, opcode matching is applied to the same four instruction address ranges defined by using IBRPs.

Figure 3-17. Opcode Match Configuration Register (PMC36): [63:32] ig, [31:4] rsv, [3] Ch3_ig_OPC, [2] Ch2_ig_OPC, [1] Ch1_ig_OPC, [0] Ch0_ig_OPC

Table 3-13. Opcode Match Configuration Register Fields (PMC36)

Field Bits HW Reset Description
Ch0_ig_OPC 0 0 1: Tag channel 0 PMU events will not be constrained by opcode. 0: Tag channel 0 PMU events (including IA64_TAGGED_INST_RETIRED.00) will be opcode constrained by OpCM0.
Ch1_ig_OPC 1 0 1: Tag channel 1 events (IA64_TAGGED_INST_RETIRED.01) will not be constrained by opcode. 0: Tag channel 1 events will be opcode constrained by OpCM1.
Ch2_ig_OPC 2 0 1: Tag channel 2 events (IA64_TAGGED_INST_RETIRED.10) will not be constrained by opcode. 0: Tag channel 2 events will be opcode constrained by OpCM0.
Ch3_ig_OPC 3 0 1: Tag channel 3 events (IA64_TAGGED_INST_RETIRED.11) will not be constrained by opcode. 0: Tag channel 3 events will be opcode constrained by OpCM1.
rsv 31:4 0xfffffff Reserved. Users should not change this field from its reset value.
ig 63:32 n/a Ignored bits.

For opcode matching purposes, an Itanium instruction is defined by two items: the instruction type "itype" (one of M, I, F or B) and the 41-bit encoding "enco{40:0}" defined in the Intel® Itanium® Architecture Software Developer's Manual. Each instruction is evaluated against each opcode match register (OpCM0 and OpCM1) as follows:

match(OpCM) = type-match(itype, OpCM) AND enco-match(enco, OpCM)

Where:
enco-match(enco, OpCM) = AND, for i = 0..40, of (OpCM.mask{i} OR (enco{i} == OpCM.match{i}))
type-match(itype, OpCM) = (itype == M AND OpCM.m) OR (itype == I AND OpCM.i) OR (itype == F AND OpCM.f) OR (itype == B AND OpCM.b)
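The same evaluation expressed in C, using the mask semantics of Table 3-11 (a mask bit of 1 means the opcode bit is ignored); the struct and helper names are illustrative, not part of any API:

    #include <stdbool.h>
    #include <stdint.h>

    enum itype { ITYPE_M, ITYPE_I, ITYPE_F, ITYPE_B };

    struct opcm {                 /* one opcode matcher, e.g. PMC32/PMC33 */
        uint64_t mask;            /* mask[40:0]: 1 = ignore this bit      */
        uint64_t match;           /* match[40:0]                          */
        bool m, i, f, b;          /* instruction-type enables             */
    };

    static bool opcm_matches(const struct opcm *o, uint64_t enco, enum itype t)
    {
        const uint64_t bits41 = ((uint64_t)1 << 41) - 1;
        /* Every unmasked encoding bit must equal the match bit. */
        bool enco_ok = (((enco ^ o->match) & ~o->mask) & bits41) == 0;
        bool type_ok = (t == ITYPE_M && o->m) || (t == ITYPE_I && o->i) ||
                       (t == ITYPE_F && o->f) || (t == ITYPE_B && o->b);
        return enco_ok && type_ok;
    }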




The IBRP matches are advanced with the instruction pointer to the point where opcodes are being dispersed. The matches from opcode matchers are ANDed with the IBRP matches at this point.

This produces two opcode match events that are combined with the instruction range check tags (IBRRangeTag, see Section 3.3.5) as follows:

Tag(IBRChnl0) = IBRRangeTag(IBRP0) AND match(OpCM0)
Tag(IBRChnl1) = IBRRangeTag(IBRP1) AND match(OpCM1)
Tag(IBRChnl2) = IBRRangeTag(IBRP2) AND match(OpCM0)
Tag(IBRChnl3) = IBRRangeTag(IBRP3) AND match(OpCM1)

As shown in Figure 3-5, the 4 tags Tag(IBRChnli; i=0-3) are staged down the processor pipeline until instruction retirement and can be selected as a retired instruction count event (see event description "IA64_TAGGED_INST_RETIRED"). In this way, a performance counter (PMC/PMD4-15) can be used to count the number of retired instructions within the programmed range that match the specified opcodes.

Note: Register PMC36 must contain the predetermined value 0xfffffff0, which is also its reset value. If software modifies any bits not listed in Table 3-13, processor behavior is undefined.

3.3.7 Data Address Range Matching (PMC41)

For instructions that reference memory, the Montecito processor allows event counting to be constrained by data address ranges. The four architectural Data Breakpoint Register pairs (DBRP0-3, i.e. DBR0-7) can be used to specify the desired address range. Data address range checking capability is controlled by the Memory Pipeline Event Constraints Register (PMC41).

Figure 3-18 and Table 3-14 describe the fields of register PMC41. When enabled (en_dbrpX = 1 together with a cfgdtagX setting that selects the DBR), data address range checking is applied to loads, stores, semaphore operations, and the lfetch instruction.

Table 3-14. Memory Pipeline Event Constraints Fields (PMC41)

Field Bits Description

cfgdtag0 4:3 These bits determine whether and how DBRP0 should be used for constraining memory pipeline events (where applicable):
00: IBR/Opc/DBR - use IBRP0/OpCM0 and DBRP0 for constraints (i.e. events are counted only if their instruction address, opcode and data address match the values programmed into these registers)
01: IBR/Opc - use IBRP0/OpCM0 for constraints
10: DBR - use only DBRP0 for constraints
11: No constraints
NOTE: When used in conjunction with "fine" mode (see the PMC38 description), only the lower-bound DBR pair (DBRP0 or DBRP1) config needs to be set; the upper-bound DBR pair config should be left at no constraint. So if IBRP0,2 are chosen for "fine" mode, cfgdtag0 needs to be set according to the desired constraints but cfgdtag2 should be left as 11 (no constraints).

cfgdtag1 12:11 These bits determine whether and how DBRP1 should be used for constraining memory pipeline events (where applicable); bit for bit these match those defined for DBRP0

cfgdtag2 20:19 These bits determine whether and how DBRP2 should be used for constraining memory pipeline events (where applicable); bit for bit these match those defined for DBRP0



cfgdtag3 28:27 These bits determine whether and how DBRP3 should be used for constraining memory pipeline events (where applicable); bit for bit, these match those defined for DBRP0.
en_dbrp0 45 0 - No constraints. 1 - Constraints as set by cfgdtag0.
en_dbrp1 46 0 - No constraints. 1 - Constraints as set by cfgdtag1.
en_dbrp2 47 0 - No constraints. 1 - Constraints as set by cfgdtag2.
en_dbrp3 48 0 - No constraints. 1 - Constraints as set by cfgdtag3.

Figure 3-18. Memory Pipeline Event Constraints Configuration Register (PMC41): [63:49] reserved, [48] en_dbrp3, [47] en_dbrp2, [46] en_dbrp1, [45] en_dbrp0, [44:29] reserved, [28:27] cfgdtag3, [26:21] reserved, [20:19] cfgdtag2, [18:13] reserved, [12:11] cfgdtag1, [10:5] reserved, [4:3] cfgdtag0, [2:0] reserved

DBRPx match is generated in the following fashion. Arbitrary range checking is not possible, since the mask bits only describe power-of-2 ranges. Although it is possible to enable more than one DBRP at a time for checking, it is not recommended. The resulting four matches are combined as follows to form a single DBR match:

DBRPx-match = AND, for i = 0..63, of (NOT DBR(2x+1).mask{i} OR (DataAddr{i} == DBR(2x).addr{i}))
DBR-match = (en_dbrp0 AND DBRP0-match) OR (en_dbrp1 AND DBRP1-match) OR (en_dbrp2 AND DBRP2-match) OR (en_dbrp3 AND DBRP3-match)

Events which occur after a memory instruction reaches the EXE stage will fire only if this qualified match (DBRPx match) is true. The data address is compared to DBRPx; the address match is further qualified by a number of user-configurable bits in PMC41 before being distributed to different places. DBR matching for performance monitoring ignores the settings of the DBR r, w, and plm fields.

In order to allow simultaneous use of some DBRs for performance monitoring and others for debugging (the architected purpose of these registers), separate mechanisms are provided for enabling DBRs. The DBR x-bit and r/w-bits should be cleared to 0 for any DBRP which is going to be used by the PMU. PSR.db has no effect when DBRs are used for this purpose.

Note: Register PMC41 must contain the predetermined value 0x2078fefefefe. If software modifies any bits not listed in Table 3-14, processor behavior is undefined. It is illegal to have PMC41[48:45]=0000 and PMC32[57]=0 and ((PMC38[2:1]=10 or 00) or (PMC38[5:4]=10 or 00)); this produces inconsistencies in tagging I-side events in L1D and L3.

3.3.8 Instruction EAR (PMC37/PMD34,35)

This section defines the register layout for the Montecito processor instruction event address registers (IEAR). The IEAR, configured through PMC37, can be programmed in one of two modes: instruction cache miss and instruction TLB miss collection. EAR-specific unit masks allow software to


specify event collection parameters to hardware. Figure 3-19 and Table 3-15 detail the register layout of PMC37. The instruction address, latency and other captured event parameters are provided in two PMD registers (PMD34,35), described in Figure 3-20 and Table 3-17. Both the instruction and data cache EARs (see Section 3.3.9) report the latency of captured cache events and allow latency thresholding to qualify event capture. Event address data registers (PMD32-36) contain valid data only when event collection is frozen (PMC0.fr is one). Reads of PMD32-36 while event collection is enabled return undefined values.

Figure 3-19. Instruction Event Address Configuration Register (PMC37): [63:16] ig, [15:14] rsv, [13:12] ct, [11:5] umask, [4] pm, [3:0] plm

Table 3-15. Instruction Event Address Configuration Register Fields (PMC37)

Field Bits HW Reset Description

plm 3:0 0 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
pm 4 0 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
umask 11:5 0 Selects the event to be monitored. If ct[13] = '1, then bits [12:5] are used as the umask.
ct 13:12 0 cache_tlb bits; Instruction EAR selector:
1x: Monitor demand instruction cache misses. NOTE: ISB hits are not considered misses. PMD34,35 register interpretation per Table 3-17.
01: Nothing monitored.
00: Monitor L1 instruction TLB misses. PMD34,35 register interpretation per Table 3-19.
rsv 15:14 0 Reserved bits.
ig 63:16 - Reads are 0; writes are ignored.

Figure 3-20. Instruction Event Address Register Format (PMD34,35): PMD34: [63:5] Instruction Cache Line Address, [4:2] ig, [1:0] stat. PMD35: [63:13] ig, [12] ov, [11:0] latency.

When the cache_tlb field (PMC37.ct) is set to 1x, instruction cache misses are monitored. When it is set to 00, instruction TLB misses are monitored. The interpretation of the umask field and of the performance monitor data registers PMD34,35 depends on the setting of these bits, and is described in Section 3.3.8.1 for instruction cache monitoring and in Section 3.3.8.2 for instruction TLB monitoring.


3.3.8.1 Instruction EAR Cache Mode (PMC37.ct=’1x)

When PMC37.ct is 1x, the instruction event address register captures instruction addresses and access latencies for L1 instruction cache misses. Only misses whose latency exceeds a programmable threshold are captured. The threshold is specified as an eight bit umask field in the configuration register PMC37. Possible threshold values are defined in Table 3-16.

Table 3-16. Instruction EAR (PMC37) umask Field in Cache Mode (PMC37.ct='1x)

umask (bits 12:5) and latency threshold [CPU cycles]:
01xxxxxx: >0 (all L1 misses)
11111111: >=4
11111110: >=8
11111100: >=16
11111000: >=32
11110000: >=128
11100000: >=256
11000000: >=1024
10000000: >=4096
00000000: RAB hit (all L1 misses which hit in the RAB)
other: undefined
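Since the cache-mode umask values are a fixed menu rather than an encoded number, threshold selection reduces to a lookup over the table above; an illustrative helper (not part of any API):

    #include <stdint.h>

    /* (umask, minimum latency) pairs from Table 3-16; the umask occupies
     * PMC37 bits [12:5] in cache mode. */
    static const struct { uint8_t umask; uint16_t min_latency; } iear_thresh[] = {
        { 0xff,    4 }, { 0xfe,    8 }, { 0xfc,   16 }, { 0xf8,   32 },
        { 0xf0,  128 }, { 0xe0,  256 }, { 0xc0, 1024 }, { 0x80, 4096 },
    };

    /* Return the umask with the largest threshold not exceeding `cycles`,
     * falling back to 0x40 (01xxxxxx: all L1 misses). */
    static uint8_t iear_umask_for(uint16_t cycles)
    {
        uint8_t best = 0x40;
        for (unsigned i = 0; i < sizeof iear_thresh / sizeof iear_thresh[0]; i++)
            if (iear_thresh[i].min_latency <= cycles)
                best = iear_thresh[i].umask;
        return best;
    }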

As defined in Table 3-17, the address of the instruction cache line that missed the L1 instruction cache is provided in PMD34. If no qualified event was captured, this is indicated in PMD34.stat. The latency of the captured instruction cache miss, in CPU clock cycles, is provided in the latency field of PMD35.

Table 3-17. Instruction EAR (PMD34,35) in Cache Mode (PMC37.ct=’1x)

Register Field Bits Description
PMD34 stat 1:0 Status. x0: EAR did not capture a qualified event. x1: EAR contains valid event data.
PMD34 Instruction Cache Line Address 63:5 Address of the instruction cache line that caused the cache miss.
PMD35 latency 11:0 Latency in CPU clocks.
PMD35 overflow 12 If 1, the latency counter overflowed one or more times before data was returned.

3.3.8.2 Instruction EAR TLB Mode (PMC37.ct=00)

When PMC37.ct is ‘00, the instruction event address register captures addresses of instruction TLB misses. The unit mask allows event address collection to capture specific subsets of instruction TLB misses. Table 3-18 summarizes the instruction TLB umask settings. All combinations of the mask bits are supported.

Table 3-18. Instruction EAR (PMC37) umask Field in TLB Mode (PMC37.ct=00)

ITLB Miss Type PMC.umask[7:5] Description
--- 000 Disabled; nothing will be counted
L2TLB xx1 L1 ITLB misses which hit the L2 TLB
VHPT x1x L1 ITLB misses which hit the VHPT
FAULT 1xx Instruction TLB miss produced by an ITLB Miss Fault
ALL 111 Select all L1 ITLB misses
NOTE: All combinations are supported.

As defined in Table 3-19, the address of the instruction cache line fetch that missed the L1 ITLB is provided in PMD34. The stat bits indicate whether the captured TLB miss hit in the L2 ITLB or the VHPT or required servicing by software, and whether a qualified event was captured at all. In TLB mode, the latency field of PMD35 is undefined.

Table 3-19. Instruction EAR (PMD34,35) in TLB Mode (PMC37.ct=‘00)

Register Field Bits Description

PMD34 stat 1:0 Status bits. 00: EAR did not capture a qualified event. 01: L1 ITLB miss hit in the L2 ITLB. 10: L1 ITLB miss hit in the VHPT. 11: L1 ITLB miss produced an ITLB Miss Fault.
PMD34 Instruction Cache Line Address 63:5 Address of the instruction cache line that caused the TLB miss.
PMD35 latency 11:0 Undefined in TLB mode.

3.3.9 Data EAR (PMC40, PMD32,33,36)

The data event address configuration register (PMC40) can be programmed to monitor either L1 data cache load misses, FP loads, L1 data TLB misses, or ALAT misses. Figure 3-21 and Table 3-20 detail the register layout of PMC40. Figure 3-22 describes the associated event address data registers PMD32,33,36. The mode bits in configuration register PMC40 select data cache, data TLB, or ALAT monitoring. The interpretation of the umask field and registers PMD32,33,36 depends on the setting of the mode bits and is described in Section 3.3.9.1 for data cache load miss monitoring, Section 3.3.9.2 for data TLB monitoring, and Section 3.3.9.3 for ALAT monitoring.

Both the instruction (see Section 3.3.8) and data cache EARs report the latency of captured cache events and allow latency thresholding to qualify event capture. Event address data registers (PMD32-36) contain valid data only when event collection is frozen (PMC0.fr is one). Reads of PMD32-36 while event collection is enabled return undefined values.

Figure 3-21. Data Event Address Configuration Register (PMC40): [63:26] ig, [25:24] ism, [23:20] ig, [19:16] umask, [15:9] ig, [8:7] mode, [6] pm, [5:4] ig, [3:0] plm

Table 3-20. Data Event Address Configuration Register Fields (PMC40)

Field Bits HW Reset Description
plm 3:0 0 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
ig 5:4 - Reads 0; writes are ignored.
pm 6 0 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
mode 8:7 0 Data EAR mode selector. '00: L1 data cache load misses and FP loads. '01: L1 data TLB misses. '1x: ALAT misses.
ig 15:9 - Reads 0; writes are ignored.
umask 19:16 Data EAR unit mask. In mode 00: data cache unit mask (see Table 3-21, "Data EAR (PMC40) Umask Fields in Data Cache Mode (PMC40.mode=00)"). In mode 01: data TLB unit mask (see Table 3-23, "Data EAR (PMC40) Umask Field in TLB Mode (PMC40.mode=01)").
ig 23:20 - Reads 0; writes are ignored.
ism 25:24 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
ig 63:26 - Reads 0; writes are ignored.

Figure 3-22. Data Event Address Register Format (PMD32,33,36): PMD36: [63:4] Instruction Address, [3] vl, [2] bn, [1:0] slot. PMD33: [63:16] ig, [15:14] stat, [13] ov, [12:0] latency. PMD32: [63:0] Data Address.

3.3.9.1 Data Cache Load Miss Monitoring (PMC40.mode=00)

If the Data EAR is configured to monitor data cache load misses, the umask is used as a load latency threshold, defined by Table 3-21.

As defined in Table 3-22, the instruction and data addresses as well as the load latency of a captured data cache load miss are presented to software in three registers, PMD32,33,36. If no qualified event was captured, the valid bit in PMD36 is zero. HPW accesses will not be monitored. Reads from ccv will not be monitored. If an L1D cache miss is not at least 7 clocks after a captured miss, it will not be captured. Semaphore instructions and floating-point loads will be counted.


Table 3-21. Data EAR (PMC40) Umask Fields in Data Cache Mode (PMC40.mode=00)

umask (bits 19:16) and latency threshold [CPU cycles]:
0000: >= 4 (any latency)
0001: >= 8
0010: >= 16
0011: >= 32
0100: >= 64
0101: >= 128
0110: >= 256
0111: >= 512
1000: >= 1024
1001: >= 2048
1010: >= 4096
1011..1111: no events are captured

Table 3-22. PMD32,33,36 Fields in Data Cache Load Miss Mode (PMC40.mode=00)

Register Field Bit Range Description
PMD32 Data Address 63:0 64-bit virtual address of the data item that caused the miss.
PMD33 latency 12:0 Latency in CPU clocks.
PMD33 overflow 13 Overflow: if 1, the latency counter overflowed one or more times before data was returned.
PMD33 stat 15:14 Status bits. 00: no valid information in PMD32,36 and the rest of PMD33. 01: valid information in PMD32,33 and possibly in PMD36. NOTE: These bits should be cleared before the EAR is reused.
PMD33 ig 63:16 Reads 0; writes are ignored.
PMD36 slot 1:0 Slot bits; if ".vl" is 1, the instruction bundle slot of the memory instruction.
PMD36 bn 2 Bundle bit; if ".vl" is 1, indicates which of the executed bundles is associated with the captured miss.
PMD36 vl 3 Valid bit. 0: invalid address (EAR did not capture a qualified event). 1: EAR contains valid event data. NOTE: This bit should be cleared before the EAR is reused.
PMD36 Instruction Address 63:4 Virtual address of the first bundle in the 2-bundle dispersal window that was being executed at the time of the miss. If ".bn" is 1, the second bundle contains the memory instruction and 16 should be added to the address.
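A minimal sketch of recovering the bundle address and slot of the captured memory instruction from a raw PMD36 value, per the field definitions above (names are illustrative):

    #include <stdbool.h>
    #include <stdint.h>

    /* Decode PMD36 (D-EAR instruction address register). Returns false
     * if the vl bit says no qualified event was captured. */
    static bool decode_pmd36(uint64_t pmd36, uint64_t *bundle_addr, int *slot)
    {
        if (!(pmd36 & ((uint64_t)1 << 3)))      /* vl (bit 3) */
            return false;
        uint64_t addr = pmd36 & ~(uint64_t)0xf; /* bits [63:4]: bundle address */
        if (pmd36 & ((uint64_t)1 << 2))         /* bn (bit 2): second bundle   */
            addr += 16;
        *bundle_addr = addr;
        *slot = (int)(pmd36 & 0x3);             /* slot (bits [1:0]) */
        return true;
    }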

The detection of data cache load misses requires a load instruction to be tracked during multiple clock cycles, from instruction issue to cache miss occurrence. Since multiple loads may be outstanding at any point in time, and the Montecito processor data cache miss event address register can only track a single load at a time, not all data cache load misses may be captured. When the processor hardware captures the address of a load (called the monitored load), it ignores all other overlapped concurrent loads until it is determined whether the monitored load turns out to be an L1 data cache miss or not. If the monitored load turns out to be a cache miss, its parameters are latched into PMD32,33,36. The processor randomizes the choice of which load instructions are tracked to prevent the same data cache load miss from always being captured (in a regular sequence of overlapped data cache load misses). While this mechanism will not always capture all data cache load misses in a particular sequence of overlapped loads, its accuracy is sufficient for use by statistical sampling or code instrumentation.


3.3.9.2 Data TLB Miss Monitoring (PMC40.mode='01)

If the Data EAR is configured to monitor data TLB misses, the umask defined in Table 3-23 determines which data TLB misses are captured by the Data EAR. For TLB monitoring, all combinations of the mask bits are supported.

As defined in Table 3-24, the instruction and data addresses of captured DTLB misses are presented to software in PMD32,36. If no qualified event was captured, the valid bit in PMD36 reads zero. When programmed for data TLB monitoring, the contents of the latency field of PMD33 are undefined.

Both load and store TLB misses will be captured. Some unreached instructions will also be captured. For example, if a load misses in the L1 DTLB but hits in the L2 DTLB and is in an instruction group after a taken branch, it will be captured. Stores and floating-point operations never miss in the L1 DTLB but could miss the L2 DTLB or fault to be handled by software.

Note: PMC39 must be 0 in this mode; otherwise the wrong IP may be captured for misses coming right after a mispredicted branch.

Table 3-23. Data EAR (PMC40) Umask Field in TLB Mode (PMC40.mode=01)

L1 DTLB Miss Type PMC.umask[19:16] Description
--- 000x Disabled; nothing will be counted
L2DTLB xx1x L1 DTLB misses which hit the L2 DTLB
VHPT x1xx L1 DTLB misses which hit the VHPT
FAULT 1xxx Data TLB miss produced a fault
ALL 111x Select all L1 DTLB misses
NOTE: All combinations are supported.

Table 3-24. PMD32,33,36 Fields in TLB Miss Mode (PMC40.mode='01)

Register Field Bit Range Description
PMD32 Data Address 63:0 64-bit virtual address of the data item that caused the miss.
PMD33 latency 12:0 Undefined in TLB miss mode.
PMD33 ov 13 Undefined in TLB miss mode.
PMD33 stat 15:14 Status. 00: invalid information in PMD32,36 and the rest of PMD33. 01: L2 data TLB hit. 10: VHPT hit. 11: data TLB miss produced a fault. NOTE: These bits should be cleared before the EAR is reused.
PMD33 ig 63:16 Reads 0; writes are ignored.
PMD36 slot 1:0 Slot bits; if ".vl" is 1, the instruction bundle slot of the memory instruction.
PMD36 bn 2 Bundle bit; if ".vl" is 1, indicates which of the executed bundles is associated with the captured miss.
PMD36 vl 3 Valid bit. 0: invalid instruction address. 1: EAR contains the valid instruction address of the miss. NOTE: It is possible for this bit to be 0 while PMD33.stat indicates valid D-EAR data; this can happen when the D-EAR is triggered by an RSE load, for which no instruction address is captured. NOTE: This bit should be cleared before the EAR is reused.
PMD36 Instruction Address 63:4 Virtual address of the first bundle in the 2-bundle dispersal window that was being executed at the time of the miss. If ".bn" is 1, the second bundle contains the memory instruction and 16 should be added to the address.

3.3.9.3 ALAT Miss Monitoring (PMC40.mode='1x)

As defined in Table 3-25, the address of the instruction (a failing ld.c or chk.a) causing an ALAT miss is presented to software in PMD36. If no qualified event was captured, the valid bit in PMD36 reads zero. When programmed for ALAT monitoring, the latency field of PMD33 and the contents of PMD32 are undefined.

Note: PMC39 must be 0 in this mode; otherwise the wrong IP may be captured for misses coming right after a mispredicted branch.

Table 3-25. PMD32,33,36 Fields in ALAT Miss Mode (PMC40.mode='1x)

Register Field Bit Range Description

PMD32 Data Address 63:0 Undefined in ALAT miss mode.
PMD33 latency 12:0 Undefined in ALAT miss mode.
PMD33 ov 13 Undefined in ALAT miss mode.
PMD33 stat 15:14 Status bits. 00: no valid information in PMD32,36 and the rest of PMD33. 01: valid information in PMD32,33 and possibly in PMD36. NOTE: These bits should be cleared before the EAR is reused.
PMD33 ig 63:16 Reads 0; writes are ignored.
PMD36 slot 1:0 Slot bits; if ".vl" is 1, the instruction bundle slot of the memory instruction.
PMD36 bn 2 Bundle bit; if ".vl" is 1, indicates which of the executed bundles is associated with the captured miss.
PMD36 vl 3 Valid bit. 0: invalid address (EAR did not capture a qualified event). 1: EAR contains valid event data. NOTE: This bit should be cleared before the EAR is reused.
PMD36 Instruction Address 63:4 Virtual address of the first bundle in the 2-bundle dispersal window that was being executed at the time of the miss. If ".bn" is 1, the second bundle contains the memory instruction and 16 should be added to the address.


3.3.10 Execution Trace Buffer (PMC39,42, PMD48-63,38,39)

The execution trace buffer provides information about the most recent Itanium processor control flow changes. The Montecito execution trace buffer configuration register (PMC39) defines the conditions under which instructions that cause changes to the execution flow are captured, and allows the trace buffer to capture specific subsets of these events.

In addition to the branches captured by the previous generation Itanium 2 processor's BTB, Montecito's ETB captures rfi instructions, exceptions (excluding asynchronous interrupts) and silently resteered chk (failed chk) events. Passing chk instructions are not captured under any programming conditions (except when there is another capturable event).

In every cycle in which a qualified change to the execution flow happens, its source bundle address and slot number are written to the execution trace buffer. The event's target address is written to the next buffer location. If the target instruction bundle itself contains a qualified execution flow change, the execution trace buffer either records a single trace buffer entry (with the s-bit set) or makes two trace buffer entries: one that records the target instruction as a branch target (s-bit cleared), and another that records the target instruction as a branch source (s-bit set). As a result, the trace buffer may contain a mixed sequence of source and target addresses.

Note: The setting of PMC42 can override the setting of PMC39. PMC42 is used to configure the Execution Trace Buffer's alternate mode, the IP-EAR. Please refer to Section 3.3.10.2.1, "Notes on the IP-EAR" for more information about this mode. PMC42.mode must be set to 000 to enable normal branch trace capture in PMD48-63 as described below. If PMC42.mode is set to anything other than 000, PMC39's contents are ignored.

3.3.10.1 Execution Trace Capture (PMC42.mode='000)

Section 3.3.10.1.1 through Section 3.3.10.1.3 describe the operation of the Execution Trace Buffer when configured to capture an execution trace (or "enhanced" branch trace).

3.3.10.1.1 Execution Trace Buffer Collection Conditions

The execution trace buffer configuration register (PMC39) defines the conditions under which execution flow changes are to be captured. These conditions, given in Figure 3-23 and Table 3-26, are associated with branch prediction:
• Whether the target of the branch should be captured
• The path of the branch (not taken/taken)
• Whether or not the branch path was mispredicted
• Whether or not the target of the branch was mispredicted
• What type of branch should be captured

Note: All instructions eligible for capture are subject to filtering by the "plm" field, but only branches are affected by PMC39's other filters (tm, ptm, ppm and brt) as well as by the instruction address range and opcode match filters.

Figure 3-23. Execution Trace Buffer Configuration Register (PMC39): [63:16] ig, [15:14] brt, [13:12] ppm, [11:10] ptm, [9:8] tm, [7] ds, [6] pm, [5:4] ig, [3:0] plm


Table 3-26. Execution Trace Buffer Configuration Register Fields (PMC39)

Field Bits Description

plm 3:0 See Table 3-5. NOTE: This mask is applied at the time the event's source address is captured. Once the source IP is captured, the target IP of this event is always captured, even if the ETB is disabled.
ig 5:4 Reads zero; writes are ignored.
pm 6 See Table 3-5. NOTE: This bit is applied at the time the event's source address is captured. Once the source IP is captured, the target IP of this event is always captured, even if the ETB is disabled.
ds 7 Data selector. 1: reserved (undefined data is captured in lieu of the target address). 0: capture the branch target.
tm 9:8 Taken mask. 11: all Itanium® instruction branches. 10: taken Itanium instruction branches only. 01: not-taken Itanium instruction branches only. 00: no branch is captured.
ptm 11:10 Predicted target address mask. 11: capture the branch regardless of target prediction outcome. 10: branch target address predicted correctly. 01: branch target address mispredicted. 00: no branch is captured.
ppm 13:12 Predicted predicate mask. 11: capture the branch regardless of predicate prediction outcome. 10: branch path (taken/not taken) predicted correctly. 01: branch path (taken/not taken) mispredicted. 00: no branch is captured.
brt 15:14 Branch type mask. 11: only non-return indirect branches are captured. 10: only return branches are captured. 01: only IP-relative branches are captured. 00: all branches are captured.
ig 63:16 Reads zero; writes are ignored.

To summarize, an Itanium instruction branch and its target are captured by the trace buffer if the following equation is true:

captured = plm[PSR.cpl] AND
((tm{1} AND taken) OR (tm{0} AND not-taken)) AND
((ptm{1} AND target predicted correctly) OR (ptm{0} AND target mispredicted)) AND
((ppm{1} AND path predicted correctly) OR (ppm{0} AND path mispredicted)) AND
(branch type is selected by brt)


To capture all correctly predicted Itanium instruction branches, the Montecito execution trace buffer configuration settings in PMC39 should be: ds=0, tm=11, ptm=10, ppm=10, brt=00. Either branches whose path was mispredicted (ds=0, tm=11, ptm=11, ppm=01, brt=00) or branches with a target misprediction (ds=0, tm=11, ptm=01, ppm=11, brt=00) can be captured, but not both. A setting of ds=0, tm=11, ptm=01, ppm=01, brt=00 will result in an empty buffer, because if a branch's path is mispredicted, no target prediction is recorded.
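Expressed as a register value using the field positions from Figure 3-23 (the helper is illustrative; plm=1111 simply enables all privilege levels per Table 3-5):

    #include <stdint.h>

    /* Build a PMC39 value from its fields, per Figure 3-23. */
    static uint64_t pmc39_value(unsigned plm, unsigned ds, unsigned tm,
                                unsigned ptm, unsigned ppm, unsigned brt)
    {
        return ((uint64_t)(plm & 0xf))       |
               ((uint64_t)(ds  & 0x1) << 7)  |
               ((uint64_t)(tm  & 0x3) << 8)  |
               ((uint64_t)(ptm & 0x3) << 10) |
               ((uint64_t)(ppm & 0x3) << 12) |
               ((uint64_t)(brt & 0x3) << 14);
    }

    /* Example: all correctly predicted branches, any privilege level:
     * pmc39_value(0xf, 0, 0x3, 0x2, 0x2, 0x0) */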

Instruction address range matching (Section 3.3.5) and opcode matching (Section 3.3.6) may also be used to constrain what is captured in the execution trace buffer.

3.3.10.1.2 Execution Trace Buffer Data Format (PMC42.mode=‘000)

Figure 3-24. Execution Trace Buffer Register Format (PMD48-63, where PMC39.ds == 0): [63:4] Address, [3:2] slot, [1] mp, [0] s

Table 3-27. Execution Trace Buffer Register Fields (PMD48-63) (PMC42.mode=‘000)

Field Bit Range Description
s 0 Source bit. 1: the contents of the register are the source address of a monitored event (branch, rfi, exception or failed chk). 0: the contents are a target, or undefined (if PMC39.ds = 1).
mp 1 Mispredict bit. If s=1 and mp=1: mispredicted event (e.g. target, predicate or back-end misprediction). If s=1 and mp=0: correctly predicted event. If s=0 and mp=1: valid target address. If s=0 and mp=0: invalid ETB register. rfi, exceptions and failed chk are all considered mispredicted events and are encoded as above.
slot 3:2 If s=0: undefined. If s=1: slot index of the first taken event in the bundle. 00: Itanium processor slot 0 source/target. 01: Itanium processor slot 1 source/target. 10: Itanium processor slot 2 source/target. 11: this was a not-taken event.
Address 63:4 If s=1: 60-bit bundle address of the Itanium instruction branch. If ds=0 and s=0: 60-bit target bundle address of the Itanium instruction branch.

The sixteen execution trace buffer registers PMD48-63 provide information about the outcome of a captured event sequence. The branch trace buffer registers (PMD48-63) contain valid data only when event collection is frozen (PMC0.fr is one). While event collection is enabled, reads of PMD48-63 return undefined values. The registers follow the layout defined in Figure 3-24 and Table 3-27, and contain the address of either a captured branch instruction (s-bit=1) or a branch target (s-bit=0). For branch instructions, the mp-bit indicates a branch misprediction. An execution trace register with a zero s-bit and a zero mp-bit indicates an invalid buffer entry. The slot field captures the slot number of the first taken Itanium instruction branch in the captured instruction bundle. A slot number of 3 indicates a not-taken branch.

In every cycle in which a qualified Itanium instruction branch retires(1), its source bundle address and slot number are written to the branch trace buffer. If within the next clock the target instruction bundle contains a branch that retires and meets the same conditions, the address of the second


branch is stored. Otherwise, either the branch's target address (PMC39.ds=0) or details of the branch prediction (PMC39.ds=1) are written to the next buffer location. As a result, the execution trace buffer may contain a mixed sequence of branches and targets.

The Montecito branch trace buffer is a circular buffer containing the last four to eight qualified Itanium instruction branches. The Execution Trace Buffer Index Register (PMD38), defined in Figure 3-25 and Table 3-28, identifies the most recently recorded branch or target. In every cycle in which a qualified branch or target is recorded, the execution buffer index (ebi) is post-incremented. After 16 entries have been recorded, the index wraps around and the next qualified branch will overwrite the first trace buffer entry. The wrap condition itself is recorded in the full bit of PMD38. The ebi field of PMD38 holds the next branch buffer index that is about to be written. The following formula computes the last-written branch trace buffer PMD index from the contents of PMD38:

last-written-PMD-index = 48 + ((16*PMD38.full + PMD38.ebi - 1) % 16)

If both the full bit and the ebi field of PMD38 are zero, no qualified branch has been captured by the branch trace buffer. The full bit gets set every time the branch trace buffer wraps from PMD63 to PMD48. Once set, the full bit remains set until explicitly cleared by software, i.e. it is a sticky bit. Software can reset the ebi index and the full bit by writing to PMD38.
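A minimal sketch of walking the buffer oldest-to-newest using this formula (read_pmd() is a hypothetical accessor, and event collection must already be frozen):

    #include <stdint.h>

    extern uint64_t read_pmd(int index); /* hypothetical accessor */

    /* Visit ETB entries from oldest to newest. With full=1 the oldest
     * entry is at index ebi; with full=0 the buffer starts at index 0
     * and holds ebi entries. Assumes PMC0.fr = 1 (collection frozen). */
    static void walk_etb(void (*visit)(uint64_t entry))
    {
        uint64_t pmd38 = read_pmd(38);
        unsigned ebi   = (unsigned)(pmd38 & 0xf);
        unsigned full  = (unsigned)((pmd38 >> 5) & 1);
        unsigned count = full ? 16 : ebi;
        unsigned start = full ? ebi : 0;

        for (unsigned i = 0; i < count; i++)
            visit(read_pmd(48 + (int)((start + i) % 16)));
    }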

PMD39 provides additional information related to the ETB entries.

Figure 3-25. Execution Trace Buffer Index Register Format (PMD38): [63:6] ig, [5] full, [4] ig, [3:0] ebi

Table 3-28. Execution Trace Buffer Index Register Fields (PMD38)

Field Bit Range Description

ebi 3:0 Execution buffer index (range 0..15; index 0 indicates PMD48). Pointer to the next execution trace buffer entry to be written. If full=1: points to the oldest recorded branch/target. If full=0: points to the next location to be written.
ig 4 Reads zero; writes are ignored.
full 5 Full bit (sticky). If full=1: the execution trace buffer has wrapped. If full=0: the execution trace buffer has not wrapped.
ig 63:6 Reads zero; writes are ignored.

Figure 3-26. Execution Trace Buffer Extension Register Format (PMD39) (PMC42.mode='000): sixteen 4-bit fields, one per ETB entry, in alternating order: [3:0] pmd48 ext, [7:4] pmd56 ext, [11:8] pmd49 ext, [15:12] pmd57 ext, [19:16] pmd50 ext, [23:20] pmd58 ext, ..., [59:56] pmd55 ext, [63:60] pmd63 ext

1. In some cases, the Montecito processor execution trace buffer will capture the source (but not the target) address of an excepting branch instruction. This occurs on trapping branch instructions as well as on faulting and multi-way branches.


Table 3-29. Execution Trace Buffer Extension Register Fields (PMD39) (PMC42.mode='000)

Field Bit Range Bits Description
pmd48 ext 3:0:
bits 3:2: ignored. Reads zero; writes are ignored.
bit 1: bruflush. If PMD48.bits[1:0] = 11: 1 = the back end mispredicted the branch and the pipeline was flushed by it; 0 = no pipeline flush is associated with this branch.
bit 0: b1. If PMD48.s = 1: 1 = the branch was from bundle 1; add 0x1 to PMD48.bits[63:4]. 0 = the branch was from bundle 0; no correction is necessary. Otherwise, ignore.
pmd56 ext 7:4 Same as above, for PMD56.
pmd49 ext 11:8 Same as above, for PMD49.
pmd57 ext 15:12 Same as above, for PMD57.
pmd50 ext 19:16 Same as above, for PMD50.
pmd58 ext 23:20 Same as above, for PMD58.
(The same pattern continues for the remaining entries.)
pmd63 ext 63:60 Same as above, for PMD63.

3.3.10.1.3 Notes on the Execution Trace Buffer

Although the Montecito ETB does not capture asynchronous interrupts as events, the addresses of their handlers can be captured as target addresses. This can happen if, at the target of a captured event (e.g. a taken branch), an asynchronous event is taken before any instruction at the target executes.

3.3.10.2 IP Event Address Capture (PMC42.mode='1xx)

Montecito has a new feature called Instruction Pointer Address Capture (IP-EAR), intended to facilitate the correlation of performance monitoring events to IP values. To do this, Montecito's Execution Trace Buffer (ETB) can be configured to capture the IPs of retired instructions. When a performance monitoring event is used to trigger an IP-EAR freeze, if the IP which caused the event gets to retirement, there is a good chance that IP will be captured in the ETB. The IP-EAR freezes after a programmable number of cycles following a PMU freeze, as described below.

Register PMC42 is used to configure this feature, and the ETB registers (PMD48-63,39) are used to capture the data. PMD38 holds the index and overflow bits for the IP buffer, much as it does for the ETB.

Note: Setting PMC42.mode to a non-0 value will override the setting of PMC39 (the configuration register for the normal BTB mode).

Figure 3-27. IP-EAR Configuration Register (PMC42): [63:19] ig, [18:11] delay, [10:8] mode, [7] ig, [6] pm, [5:4] ig, [3:0] plm


Table 3-30. IP-EAR Configuration Register Fields (PMC42)

Field Bits Description

plm 3:0 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
ig 5:4 Reads zero; writes are ignored.
pm 6 See Table 3-5, "Performance Monitor PMC Register Control Fields (PMC4-15)".
ig 7 Reads zero; writes are ignored.
mode 10:8 IP-EAR mode. 000: ETB mode (IP-EAR not functional; ETB is functional). 100: IP-EAR mode (IP-EAR is functional; ETB not functional).
delay 18:11 Programmable delay before freezing.
ig 63:19 Reads zero; writes are ignored.

The IP-EAR functions by continuously capturing retired IPs in PMD48-63 as long as it is enabled. It captures retired IPs and the elapsed time between retirements. Up to 16 entries can be captured.

The IP-EAR has a slightly different freezing model than the rest of the performance monitors: it is capable of delaying its freeze for a number of cycles past the point of the PMU freeze. The user can program an 8-bit number to determine the number of cycles by which the freeze is delayed.

Note: PMD48-63 are not, in fact, 68-bit registers. Figure 3-28 and Figure 3-29 represent the virtual layout of an execution trace buffer entry in IP-EAR mode for the sake of clarity. The higher-order bits [67:64] of each entry are mapped into PMD39 as described in Table 3-33.

Figure 3-28. IP-EAR Data Format (PMD48-63, where PMC42.mode == 100 and PMD48-63.ef = 0): [67] ef, [66] f, [65:60] cycl, [59:0] IP[63:4]

Figure 3-29. IP-EAR Data Format (PMD48-63, where PMC42.mode == 100 and PMD48-63.ef = 1): [67] ef, [66] f, [65:60] cycl, [59:8] IP[63:12], [7:0] delay

Table 3-31. IP-EAR Data Register Fields (PMD48-63) (PMC42.mode='1xx)

Field Bits Description
cycl 63:60 Elapsed cycles: the 4 least significant bits of a 6-bit elapsed-cycle count since the previous retired IP. This is a saturating counter; it stays at all 1s when it reaches its maximum value. NOTE: the 2 most significant bits of each entry are found in PMD39 (see below).
IP 59:8 Retired IP value, bits [63:12].
delay 7:0 If ef = 1: the remainder of the delay count. Otherwise: retired IP value, bits [11:4].

Figure 3-30. IP Trace Buffer Index Register Format (PMD38) (PMC42.mode='1xx): [63:6] ig, [5] full, [4] ig, [3:0] ebi

Table 3-32. IP Trace Buffer Index Register Fields (PMD38)(PMC42.mode=‘1xx)

Field Bit Range Description

ebi 3:0 IP trace buffer index (range 0..15; index 0 indicates PMD48). Pointer to the next IP trace buffer entry to be written. If full=1: points to the oldest recorded IP entry. If full=0: points to the next location to be written.
ig 4 Reads zero; writes are ignored.
full 5 Full bit (sticky). If full=1: the IP trace buffer has wrapped. If full=0: the IP trace buffer has not wrapped.
ig 63:6 Reads zero; writes are ignored.

Figure 3-31. IP Trace Buffer Extension Register Format (PMD39) (PMC42.mode='1xx): sixteen 4-bit fields, one per entry, in alternating order: [3:0] pmd48 ext, [7:4] pmd56 ext, [11:8] pmd49 ext, [15:12] pmd57 ext, ..., [59:56] pmd55 ext, [63:60] pmd63 ext


Table 3-33. IP Trace Buffer Extension Register Fields (PMD39) (PMC42.mode='1xx)

Field Bit Range Bits Description
pmd48 ext 3:0:
bits 3:2: cycl. Elapsed cycles: the 2 most significant bits of the 6-bit elapsed-cycle count since the previous retired IP. This is a saturating counter; it stays at all 1s when it reaches its maximum value.
bit 1: f. Flush: indicates whether there has been a pipe flush since the last entry.
bit 0: ef. Early freeze: if 1, the current entry is an early-freeze case. An early freeze occurs if PSR bits cause the IP-EAR to become disabled, or on a thread switch.
pmd56 ext 7:4 Same as above, for PMD56.
pmd49 ext 11:8 Same as above, for PMD49.
pmd57 ext 15:12 Same as above, for PMD57.
pmd50 ext 19:16 Same as above, for PMD50.
pmd58 ext 23:20 Same as above, for PMD58.
(The same pattern continues for the remaining entries.)
pmd63 ext 63:60 Same as above, for PMD63.
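A minimal sketch of reassembling the full 6-bit elapsed-cycle count for entry PMD(48+k), combining the 4 low bits in the data register with the 2 high bits from its PMD39 nibble, whose position follows the alternating pattern above (read_pmd() is a hypothetical accessor):

    #include <stdint.h>

    extern uint64_t read_pmd(int index); /* hypothetical accessor */

    /* PMD39 nibble index for PMD(48+k): PMD48..55 occupy even nibbles
     * 0,2,...,14 and PMD56..63 occupy odd nibbles 1,3,...,15. */
    static unsigned pmd39_nibble(unsigned k)
    {
        return (k < 8) ? (2 * k) : (2 * (k - 8) + 1);
    }

    /* Full 6-bit elapsed-cycle count for IP-EAR entry PMD(48+k). */
    static unsigned ipear_cycles(unsigned k)
    {
        uint64_t entry = read_pmd(48 + (int)k);
        uint64_t ext   = (read_pmd(39) >> (4 * pmd39_nibble(k))) & 0xf;
        unsigned lo4   = (unsigned)((entry >> 60) & 0xf); /* cycl[3:0] */
        unsigned hi2   = (unsigned)((ext >> 2) & 0x3);    /* cycl[5:4] */
        return (hi2 << 4) | lo4;
    }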

3.3.10.2.1 Notes on the IP-EAR

When the IP-EAR freezes due to its normal freeze mechanism (i.e. PMU freeze plus delay), it captures one last entry with "ef" = 0. The IP value in this entry could be incorrect, since there is no guarantee that the CPU is retiring an IP at that particular time. Since this is always the youngest entry captured in the IP-EAR buffer, it is easy to identify.

3.3.11 Interrupts

As mentioned in Table 3-6, each of the registers PMD4-15 will cause an interrupt if the following condition is true:

• PMCi.oi = 1 (i.e. overflow interrupt is enabled for PMDi) and PMDi overflows. Note that only one interrupt line is raised, regardless of which PMC/PMD pair meets this condition.

This interrupt is an "External Interrupt" with vector 0x3000 and will be recognized only if the following conditions are true:

• PMV.m = 0 and PMV.vector is set up correctly; i.e. Performance Monitor interrupts are not masked and a proper vector has been programmed for this interrupt by writing the PMV register ("mov cr.pmv").
• PSR.i = 1 and PSR.ic = 1; i.e. interruptions are unmasked and interruption collection is enabled in the Processor Status Register (e.g. via the "ssm" instruction).
• TPR.mmi = 0 (i.e. external interrupts are not masked) and TPR.mic is set such that the priority class the Performance Monitor interrupt belongs to is not masked. For example, if we assign vector 0xD2 to the Performance Monitor interrupt then, according to Table 5-7, "Interrupt Priorities, Enabling, and Masking," in Volume 2 of the Intel® Itanium® Architecture Software Developer's Manual, it belongs to priority class 13, so any TPR.mic value less than 13 allows this interrupt to be recognized. A "mov cr.tpr" will write to this register.


• There are no higher priority faults, traps, or external interrupts pending.

The interrupt service routine needs to read the IVR register ("mov from cr.ivr") in order to determine the highest-priority external interrupt that needs to be serviced.

Before returning from the interrupt service routine, the Performance Monitor needs to be reinitialized so that the interrupt is cleared. This can be done by clearing PMC.oi and/or re-initializing the PMD that caused the interrupt (identified by reading PMC0). In addition, all bits of PMC0 need to be cleared if further monitoring is to be done.
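As a rough illustration of this sequence, the following C sketch shows the tail of such a handler. The register-access helpers are hypothetical stand-ins for privileged, platform-specific primitives, and the PMC0 layout assumed here (freeze in bit 0, per-counter overflow flags in bits 4-15) follows the generic architectural definition:

    #include <stdint.h>

    /* Hypothetical accessors; a real handler would use kernel- or
     * firmware-specific primitives instead. */
    extern uint64_t read_pmc0(void);
    extern void     write_pmc0(uint64_t v);
    extern void     write_pmd(int n, uint64_t v);

    static void pmu_overflow_service(void)
    {
        uint64_t pmc0 = read_pmc0();

        /* Assumed layout: bits 4-15 of PMC0 flag overflow of PMD4-15.
         * Reload each overflowed counter so the interrupt condition is
         * cleared (for statistical sampling one would reload to a
         * counter-width-dependent value of max - period instead of 0). */
        for (int n = 4; n <= 15; n++)
            if (pmc0 & (1ULL << n))
                write_pmd(n, 0);

        write_pmc0(0);  /* clear all freeze/overflow bits to resume */
    }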

3.3.12 Processor Reset, PAL Calls, and Low Power State

Processor Reset: On processor hardware reset, the .oi and .ev bits of all PMC registers are zero, and PMV.m is set to one. This ensures that no interrupts are generated and that events are not externally visible. On reset, PAL ensures that the instruction address range check, the opcode matcher, and the data address range check are initialized as follows:

• PMC32,33,34,35 = 0xffffffffffffffff, (match all opcodes)

• PMC41 = 0x2078fefefefe, (no memory pipeline event constraints)

• PMC38 = 0xdb6, (no instruction address range constraints)

• PMC36 = 0xfffffff0, (no opcode match constraints)

All other performance monitoring related state is undefined.

Table 3-34. Information Returned by PAL_PERF_MON_INFO for the Montecito Processor

Return Value            Description                                                 Montecito-Specific Value
PAL_RETIRED             8-bit unsigned event type for counting the number of        0x08
                        untagged retired Itanium instructions
PAL_CYCLES              8-bit unsigned event type for counting the number of        0x12
                        running CPU cycles
PAL_WIDTH               8-bit unsigned number of implemented counter bits           48
PAL_GENERIC_PM_PAIRS    8-bit unsigned number of generic PMC/PMD pairs              4
PAL_PMCmask             256-bit mask defining which PMC registers are populated     0x3FFF
PAL_PMDmask             256-bit mask defining which PMD registers are populated     0x3FFFF
PAL_CYCLES_MASK         256-bit mask defining which PMC/PMD counters can count      0xF0
                        running CPU cycles (event defined by PAL_CYCLES)
PAL_RETIRED_MASK        256-bit mask defining which PMC/PMD counters can count      0xF0
                        untagged retired Itanium instructions (event defined by
                        PAL_RETIRED)

PAL Call: As defined in Volume 2 of the Intel® Itanium® Architecture Software Developer's Manual, the PAL call PAL_PERF_MON_INFO provides software with information about the implemented performance monitors. The Montecito processor specific values are summarized in Table 3-34.

Low Power State: To ensure that monitor counts are preserved when the processor enters low power state, PAL_LIGHT_HALT freezes event monitoring prior to powering down the processor.


As a result, bus events occurring during the low power state (e.g. snoops) will not be counted. PAL_LIGHT_HALT preserves the original value of the PMC0 register.

§

4 Performance Monitor Events

4.1 Introduction

This chapter describes the architectural and microarchitectural events measurable on the Montecito processor through the performance monitoring mechanisms described earlier in Chapter 3. The early sections of this chapter provide a categorized high-level view of the event list, grouping logically related events together. Computation (either directly by a counter in hardware or indirectly as a "derived" event) of common performance metrics is also discussed. Each directly measurable event is then described in greater detail in the alphabetized list of all processor events in Section 4.15.

The Montecito processor is capable of monitoring numerous events. The majority of events can be selected as input to any of PMD4-15 by programming bits [15:8] of the corresponding PMC to the hexadecimal value shown in the "event code" field of the event list. Please refer to Section 4.8.2 and Section 4.8.4 for events that have more specific requirements.
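As a minimal sketch of this programming model, the following C function composes a PMC value for one of the generic counters. Only the event-code placement in bits [15:8] comes from this chapter; the privilege-mask, overflow-interrupt, and umask bit positions are assumptions based on the Table 3-5 conventions:

    #include <stdint.h>

    static uint64_t make_pmc(unsigned event_code, unsigned umask,
                             unsigned plm, int overflow_intr)
    {
        uint64_t v = 0;
        v |= (uint64_t)(plm & 0xf);                  /* plm, bits 3:0 (assumed)      */
        v |= (uint64_t)(overflow_intr ? 1 : 0) << 5; /* .oi, bit 5 (assumed)         */
        v |= (uint64_t)(event_code & 0xff) << 8;     /* event select, bits [15:8]    */
        v |= (uint64_t)(umask & 0xf) << 16;          /* .umask, bits 19:16 (assumed) */
        return v;
    }

    /* Example: IA64_INST_RETIRED (event code 0x08) at all privilege levels. */
    /* uint64_t pmc4 = make_pmc(0x08, 0x0, 0xf, 0); */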

4.2 Categorization of Events

Performance related events are grouped into the following categories:
• Basic Events: clock cycles, retired instructions (Section 4.3)
• Instruction Dispersal Events: instruction decode and issue (Section 4.4)
• Instruction Execution Events: instruction execution, data and control speculation, and memory operations (Section 4.5)
• Stall Events: stall and execution cycle breakdowns (Section 4.6)
• Branch Events: branch prediction (Section 4.7)
• Memory Hierarchy: instruction and data caches (Section 4.8)
• System Events: operating system monitors (Section 4.9)
• TLB Events: instruction and data TLBs (Section 4.10)
• System Bus Events (Section 4.11)
• RSE Events: Register Stack Engine (Section 4.12)
• Hyper-Threading Events (Section 4.13)

Each section listed above includes a table providing information on directly measurable events. The section may also contain a second table of events that can be derived from those that are directly measurable. These derived events may simply rename existing events or present steps to determine the value of common performance metrics. Derived events are not, however, discussed in the systematic event listing in Section 4.15.

Directly measurable events often use the PMC.umask field (See Chapter 3) to measure a certain variant of the event in question. Symbolic event names for such events include a period to indicate use of the umask, specified by four bits in the detailed event description (x’s are for don’t-cares).


The summary tables in the subsequent sections define events by specifying the following attributes:
• Symbol Name - Symbolic name used to denote this event.
• Event Code - Hexadecimal value to program into bits [15:8] of the appropriate PMC register in order to measure this event.
• IAR - Can this event be constrained by the Instruction Address Range registers?
• DAR - Can this event be constrained by the Data Address Range registers?
• OPC - Can this event be constrained by the Opcode Match registers?
• Max Inc/Cyc - Maximum Increment Per Cycle, i.e. the maximum value by which this event may be increased each cycle.
• T - Type; either A for Active, F for Floating, S for Self Floating, or C for Causal (see Table 4-42 for this information).
• Description - Brief description of the event.

4.2.1 Hyper-Threading and Event Types

The Montecito processor implements a type of hardware-based multi-threading that effectively allows two threads to coexist within a processor core, although only one thread is "active" within the core's pipeline at any moment in time. This affects how events are generated: certain events may be generated after the thread they belong to has become inactive. It also affects how events are assigned to the threads occupying the same core, which additionally depends upon which PMD the event was programmed into (see Section 3.3.2 for more information). Certain events do not have the concept of a "home" thread.

These effects are further complicated by the use of the ".all" field, which allows a user to monitor a particular event for the thread whose monitor is being programmed or for both threads (see Table 3-6). It should be noted that monitoring with .all enabled does not always produce valid results, and in certain cases the setting of .all is ignored. Please refer to the individual events for further information.

To help decipher these effects, events have been classified into the following types:
• Active - This event can only occur when the thread that generated it is "active" (currently executing in the processor core's pipeline) and is considered to be generated by the active thread. Either type of monitor can be used if .all is not set. Example(s): BE_EXE_BUBBLE and IA64_INST_RETIRED.
• Causal - This event does not belong to a thread. It is assigned to the active thread. Although it seems natural to use either type of monitor if .all is not set, due to implementation constraints causal events should only be monitored in duplicated counters. There is one exception to this rule: CPU_OP_CYCLES can be measured in both types of counters. Example(s): CPU_OP_CYCLES and L2I_SNOOP_HITS.
• Floating - This event belongs to a thread, but could have been generated while its thread was inactive (or "in the background"). These events should only be monitored in duplicated counters. If .all is not set, only events associated with the monitoring thread will be captured. If .all is set, events associated with both threads will be captured during the time the monitoring thread has been assigned to a processor by the OS. Example(s): L2D_REFERENCES and ER_MEM_READ_OUT_LO.


• Self Floating - this is a hybrid event used to better categorize certain BUS and SI (System Interface) events. If this event was monitored with the .SELF umask, it is a Floating event. If any other umask is used it is considered Causal. These events should only be monitored in duplicated counters. Example(s): BUS_IO and SI_WRITEQ_INSERTS.

4.3 Basic Events

Table 4-1 summarizes two basic execution monitors. The Montecito retired instruction count, IA64_INST_RETIRED, includes both predicated-true and predicated-off instructions and nop instructions, but excludes RSE operations.

Table 4-1. Performance Monitors for Basic Events

Symbol Name          Event Code  IAR  DAR  OPC  Max Inc/Cyc  T  Description
CPU_OP_CYCLES        0x12        Y    N    Y    1            C  CPU Operating Cycles
IA64_INST_RETIRED    0x08        Y    N    Y    6            A  Retired Itanium® Instructions

Table 4-2. Derived Monitors for Basic Events

Symbol Name  Description                                     Equation
IA64_IPC     Average Number of Itanium® Instructions Per     IA64_INST_RETIRED / CPU_OP_CYCLES
             Cycle During Itanium architecture-based Code
             Sequences

4.4 Instruction Dispersal Events

Instruction cache lines are delivered to the execution core and dispersed to the Montecito processor functional units. The Montecito processor can issue, or disperse, 6 instructions per clock cycle; in other words, it can issue to 6 instruction slots (or syllables). The following events are intended to give users an idea of how effectively instructions are dispersed and why they are not dispersed at full capacity. There are five reasons for not dispersing at full capacity. One is measured by DISP_STALLED: for every clock that dispersal is stalled, dispersal takes a hit of 6 syllables. The other four reasons are measured by SYLL_NOT_DISPERSED. Due to the way the hardware is designed, SYLL_NOT_DISPERSED may contain an overcount due to implicit and explicit bits; although this number should be small, SYLL_OVERCOUNT provides an accurate count of it.

The relationship between these events is as follows:

6*(CPU_OP_CYCLES-DISP_STALLED) = INST_DISPERSED + SYLL_NOT_DISPERSED.ALL - SYLL_OVERCOUNT.ALL
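As a sanity check on a measurement run, the identity can be evaluated directly from the collected counts. A small sketch in C, with counter values passed in rather than read from hardware:

    #include <stdint.h>
    #include <stdio.h>

    static void check_dispersal(uint64_t cpu_op_cycles, uint64_t disp_stalled,
                                uint64_t inst_dispersed,
                                uint64_t syll_not_dispersed_all,
                                uint64_t syll_overcount_all)
    {
        uint64_t lhs = 6 * (cpu_op_cycles - disp_stalled);
        uint64_t rhs = inst_dispersed + syll_not_dispersed_all
                     - syll_overcount_all;
        printf("dispersal slots: %llu vs. %llu (%s)\n",
               (unsigned long long)lhs, (unsigned long long)rhs,
               lhs == rhs ? "consistent" : "inconsistent");
    }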


Table 4-3. Performance Monitors for Instruction Dispersal Events

Symbol Name           Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
DISP_STALLED          0x49        N    N    N    1            Number of cycles dispersal stalled
INST_DISPERSED        0x4d        Y    N    N    6            Syllables dispersed from REN to REG stage
SYLL_NOT_DISPERSED    0x4e        Y    N    N    5            Syllables not dispersed
SYLL_OVERCOUNT        0x4f        Y    N    N    2            Syllables overcounted

4.5 Instruction Execution Events

Retired instruction counts, IA64_TAGGED_INST_RETIRED and NOPS_RETIRED, are based on tag information specified by the address range check and opcode match facilities. A separate event, PREDICATE_SQUASHED_RETIRED, is provided to count predicated off instructions.

The FP monitors listed in the table capture dynamic information about pipeline flushes and flush-to-zero occurrences due to floating-point operations. The FP_OPS_RETIRED event counts the number of retired FP operations.

As Table 4-4 describes, monitors for control and data speculation capture dynamic run-time information: the number of failed chk.s instructions (INST_FAILED_CHKS_RETIRED.ALL), the number of advanced load checks and check loads (INST_CHKA_LDC_ALAT.ALL), and the number of failed advanced load checks and check loads (INST_FAILED_CHKA_LDC_ALAT.ALL) as seen by the ALAT. The number of retired chk.s instructions is monitored by the IA64_TAGGED_INST_RETIRED event, given the appropriate opcode mask. Since the Montecito processor ALAT is updated by operations on mispredicted branch paths, the number of advanced load checks and check loads needs an explicit event (INST_CHKA_LDC_ALAT.ALL).

Table 4-4. Performance Monitors for Instruction Execution Events

Symbol Name                   Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
ALAT_CAPACITY_MISS            0x58        Y    Y    Y    2            ALAT Entry Replaced
FP_FAILED_FCHKF               0x06        Y    N    N    1            Failed fchkf
FP_FALSE_SIRSTALL             0x05        Y    N    N    1            SIR stall without a trap
FP_FLUSH_TO_ZERO              0x0b        Y    N    N    2            FP Result Flushed to Zero
FP_OPS_RETIRED                0x09        Y    N    N    6            Retired FP operations
FP_TRUE_SIRSTALL              0x03        Y    N    N    1            SIR stall asserted and leads to a trap
IA64_TAGGED_INST_RETIRED      0x08        Y    N    Y    6            Retired Tagged Instructions
INST_CHKA_LDC_ALAT            0x56        Y    Y    Y    2            Advanced Check Loads
INST_FAILED_CHKA_LDC_ALAT     0x57        Y    Y    Y    1            Failed Advanced Check Loads
INST_FAILED_CHKS_RETIRED      0x55        N    N    N    1            Failed Speculative Check Loads
LOADS_RETIRED                 0xcd        Y    Y    Y    4            Retired Loads
MISALIGNED_LOADS_RETIRED      0xce        Y    Y    Y    4            Retired Misaligned Load Instructions
MISALIGNED_STORES_RETIRED     0xd2        Y    Y    Y    2            Retired Misaligned Store Instructions
NOPS_RETIRED                  0x50        Y    N    Y    6            Retired NOP Instructions
PREDICATE_SQUASHED_RETIRED    0x51        Y    N    Y    6            Instructions Squashed Due to Predicate Off
STORES_RETIRED                0xd1        Y    Y    Y    2            Retired Stores
UC_LOADS_RETIRED              0xcf        Y    Y    Y    4            Retired Uncacheable Loads
UC_STORES_RETIRED             0xd0        Y    Y    Y    2            Retired Uncacheable Stores

Table 4-5. Derived Monitors for Instruction Execution Events

Symbol Name             Description                        Equation
ALAT_EAR_EVENTS         Counts the number of ALAT events   DATA_EAR_EVENTS
                        captured by the EAR
CTRL_SPEC_MISS_RATIO    Control Speculation Miss Ratio     INST_FAILED_CHKS_RETIRED.ALL /
                                                           IA64_TAGGED_INST_RETIRED[chk.s]
DATA_SPEC_MISS_RATIO    Data Speculation Miss Ratio        INST_FAILED_CHKA_LDC_ALAT.ALL /
                                                           INST_CHKA_LDC_ALAT.ALL
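For reference, the two speculation miss ratios of Table 4-5 reduce to simple divisions over the raw counts. A sketch in C (the chk.s denominator is IA64_TAGGED_INST_RETIRED collected with an opcode-matcher tag, as described above):

    #include <stdint.h>

    static double ctrl_spec_miss_ratio(uint64_t failed_chks_retired,
                                       uint64_t tagged_chks_retired)
    {
        return tagged_chks_retired ?
            (double)failed_chks_retired / (double)tagged_chks_retired : 0.0;
    }

    static double data_spec_miss_ratio(uint64_t failed_chka_ldc,
                                       uint64_t chka_ldc)
    {
        return chka_ldc ? (double)failed_chka_ldc / (double)chka_ldc : 0.0;
    }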

4.6 Stall Events

Montecito processor stall accounting is separated into front-end and back-end stall accounting. Back-end and front-end events should not be compared since they are counted in different stages of the pipeline.

The back-end can be stalled by five distinct mechanisms: FPU/L1D, RSE, EXE, branch/exception, or the front-end. BACK_END_BUBBLE provides an overview of which mechanisms are producing stalls, while the other back-end counters provide more explicit information broken down by category. Each time there is a stall, a bubble is inserted in only one location in the pipeline; each time there is a flush, bubbles are inserted in all locations in the pipeline. With the exception of BACK_END_BUBBLE, the back-end stall accounting events are prioritized to mimic the operation of the main pipe (i.e. priority from high to low is given to: BE_FLUSH_BUBBLE.XPN, BE_FLUSH_BUBBLE.BRU, L1D_FPU stalls, EXE stalls, RSE stalls, front-end stalls). This prioritization guarantees that the events are mutually exclusive and that only the most important cause, the one latest in the pipeline, is counted.

The Montecito processor’s front-end can be stalled due to seven distinct mechanisms: FEFLUSH, TLBMISS, IMISS, branch, FILL-RECIRC, BUBBLE, IBFULL (listed in priority from high to low). The front-end stalls have exactly the same effect on the pipeline so their accounting is simpler.

During every clock in which the CPU is not in a halted state, the back-end pipeline either has a bubble or retires one or more instructions: CPU_OP_CYCLES = BACK_END_BUBBLE.all + (IA64_INST_RETIRED >= 1). To further investigate bubbles occurring in the back-end of the pipeline, the following equation holds: BACK_END_BUBBLE.all = BE_RSE_BUBBLE.all + BE_EXE_BUBBLE.all + BE_L1D_FPU_BUBBLE.all + BE_FLUSH_BUBBLE.all + BACK_END_BUBBLE.fe.
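A small C sketch of the resulting bubble breakdown, with raw counts from one measurement interval passed in; the percentages follow directly from the equation above:

    #include <stdint.h>
    #include <stdio.h>

    static void report_backend_bubbles(uint64_t all, uint64_t rse, uint64_t exe,
                                       uint64_t l1d_fpu, uint64_t flush,
                                       uint64_t fe)
    {
        if (!all)
            return;
        printf("RSE %.1f%%  EXE %.1f%%  L1D/FPU %.1f%%  flush %.1f%%  FE %.1f%%\n",
               100.0 * rse / all, 100.0 * exe / all, 100.0 * l1d_fpu / all,
               100.0 * flush / all, 100.0 * fe / all);
    }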


Note: CPU_OP_CYCLES is not incremented during a HALT state. If a measurement is set up to match clock cycles to bubbles to instructions retired (as outlined above) and a halt occurs within the measurement interval, measuring CYCLES_HALTED in PMD10 may be used to compensate. Each of the stall events (summarized in Table 4-6) takes a umask to choose among several available sub-events. Please refer to the detailed event descriptions in Section 4.15 for a list of available sub-events and their individual descriptions.

Table 4-6. Performance Monitors for Stall Events

Symbol Name                   Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
BACK_END_BUBBLE               0x00        N    N    N    1            Full pipe bubbles in main pipe
BE_EXE_BUBBLE                 0x02        N    N    N    1            Full pipe bubbles in main pipe due to stalls
BE_FLUSH_BUBBLE               0x04        N    N    N    1            Full pipe bubbles in main pipe due to flushes
BE_L1D_FPU_BUBBLE             0xca        N    N    N    1            Full pipe bubbles in main pipe due to FP or L1D cache
BE_LOST_BW_DUE_TO_FE          0x72        N    N    N    2            Invalid bundles if BE not stalled for other reasons
BE_RSE_BUBBLE                 0x01        N    N    N    1            Full pipe bubbles in main pipe due to RSE stalls
FE_BUBBLE                     0x71        N    N    N    1            Bubbles seen by FE
FE_LOST_BW                    0x70        N    N    N    2            Invalid bundles at the entrance to IB
IDEAL_BE_LOST_BW_DUE_TO_FE    0x73        N    N    N    2            Invalid bundles at the exit from IB

4.7 Branch Events

Note that for branch events, retirement means a branch was reached and committed regardless of its predicate value. Details concerning prediction results are contained in pairs of monitors. For accurate misprediction counts, the following measurement must be taken:

BR_MISPRED_DETAIL.[umask] - BR_MISPRED_DETAIL2.[umask]

By performing this calculation for every umask, one can obtain a true value for the BR_MISPRED_DETAIL event.

The method for obtaining the true value of BR_PATH_PRED is slightly different. When there is more than one branch in a bundle and one is predicted as taken, all the higher-numbered ports are forced to a predicted-not-taken mode without actually knowing their true prediction.

The true OKPRED_NOTTAKEN predicted path information can be obtained by calculating:

BR_PATH_PRED.[branch type].OKPRED_NOTTAKEN - BR_PATH_PRED2.[branch type].UNKNOWNPRED_NOTTAKEN using the same “branch type” (ALL, IPREL, RETURN, NRETIND) specified for both events.

Similarly, the true MISPRED_TAKEN predicted path information can be obtained by calculating:


BR_PATH_PRED.[branch type].MISPRED_TAKEN - BR_PATH_PRED2.[branch type].UNKNOWNPRED_TAKEN, using the same "branch type" (ALL, IPREL, RETURN, NRETIND) selected for both events.
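The corrections above are plain subtractions over paired umask variants; for instance, sketched in C:

    #include <stdint.h>

    /* True counts per the pairing rules above; each argument is one umask
     * variant of the paired events for the same branch type. */
    static uint64_t true_okpred_nottaken(uint64_t pred_okpred_nt,
                                         uint64_t pred2_unknownpred_nt)
    {
        return pred_okpred_nt - pred2_unknownpred_nt;
    }

    static uint64_t true_mispred_taken(uint64_t pred_mispred_tk,
                                       uint64_t pred2_unknownpred_tk)
    {
        return pred_mispred_tk - pred2_unknownpred_tk;
    }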

BRANCH_EVENT counts the number of events captured by the Execution Trace Buffer. For detailed information on the ETB please refer to Section 3.3.10. Table 4-7. Performance Monitors for Branch Events

Symbol Name              Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
BE_BR_MISPRED_DETAIL     0x61        Y    N    Y    1            BE branch misprediction detail
BRANCH_EVENT             0x11        Y    N    Y    1            Branch Event Captured
BR_MISPRED_DETAIL        0x5b        Y    N    Y    3            FE Branch Mispredict Detail
BR_MISPRED_DETAIL2       0x68        Y    N    Y    2            FE Branch Mispredict Detail (Unknown path component)
BR_PATH_PRED             0x54        Y    N    Y    3            FE Branch Path Prediction Detail
BR_PATH_PRED2            0x6a        Y    N    Y    2            FE Branch Path Prediction Detail (Unknown prediction component)
ENCBR_MISPRED_DETAIL     0x63        Y    N    Y    3            Number of encoded branches retired

4.8 Memory Hierarchy

This section summarizes events related to the Montecito processor's memory hierarchy. The memory hierarchy events are grouped as follows:
• L1 Instruction Cache and Prefetch Events (Section 4.8.1)
• L1 Data Cache Events (Section 4.8.2)
• L2 Instruction Cache Events (Section 4.8.3)
• L2 Data Cache Events (Section 4.8.4)
• L3 Cache Events (Section 4.8.5)

An overview of the Montecito processor's three-level memory hierarchy and its event monitors is shown in Figure 4-1. The instruction and data streams work through separate L1 caches. The L1 data cache is a write-through cache. Separate L2I and L2D caches serve the L1 instruction and data caches respectively, and are backed by a large unified L3 cache. Events for individual levels of the cache hierarchy are described in Section 4.8.1 through Section 4.8.5.


Figure 4-1. Event Monitors in the Itanium® 2 Processor Memory Hierarchy

[Figure 4-1 is a block diagram of the three-level memory hierarchy, from the processor pipeline through the L1I/ISB and write-through L1D (with the ITLB, L1DTLB, L2DTLB, VHPT walker, and store buffer), the L2I and L2D caches, the unified L3 cache, and the system bus. Each level is annotated with its event monitors, including L1I_READS, L1I_PREFETCHES, L1I_FILLS, ISB_BUNPAIRS_IN, ITLB_MISSES_FETCH, ITLB_INSERTS_HPW, DTLB_INSERTS_HPW, L1DTLB_MISSES, L2DTLB_MISSES, DATA_REFERENCES, L1D_READ_MISSES, L2I_READS, L2I_MISSES, L2D_REFERENCES, L2D_MISSES, L2_WB_REFERENCES(d), L3_REFERENCES, L3_READ_REFERENCES(d), L3_WRITE_REFERENCES(d), L3_STORE_REFERENCES(d), L3_INST_REFERENCES(d), L3_DATA_READ_REFERENCES(d), and L3_MISSES, where (d) marks derived counters.]


4.8.1 L1 Instruction Cache and Prefetch Events

Table 4-8 describes and summarizes the events that the Montecito processor provides to monitor L1 instruction cache demand fetch and prefetch activity. The instruction fetch monitors distinguish between demand fetch (L1I_READS) and prefetch activity (L1I_PREFETCHES). The amount of data returned from the L2I to the L1 instruction cache and the Instruction Streaming Buffer is monitored by two events, L1I_FILLS and ISB_LINES_IN. The L1I_EAR_EVENTS monitor counts how many instruction cache or L1ITLB misses are captured by the instruction event address register.

The L1 instruction cache and prefetch events can be qualified by the instruction address range check, but not by the opcode matcher. Since instruction cache and prefetch events occur early in the processor pipeline, they include events caused by speculative, wrong-path instructions as well as predicated-off instructions. Since the address range check is based on speculative instruction addresses rather than retired instruction addresses, event counts may be inaccurate when the range checker is confined to address ranges smaller than the length of the processor pipeline (see Chapter 3 for details).

L1I_EAR_EVENTS counts the number of events captured by the Montecito processor's instruction EARs. Please refer to Chapter 3 for more detailed information about the instruction EARs.

Table 4-8. Performance Monitors for L1/L2 Instruction Cache and Prefetch Events

Symbol Name            Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
ISB_BUNPAIRS_IN        0x46        Y    N    N    1            Bundle pairs written from L2I into FE
L1I_EAR_EVENTS         0x43        Y    N    N    1            Instruction EAR Events
L1I_FETCH_ISB_HIT      0x66        Y    N    N    1            "Just-in-time" instruction fetch hitting in and being bypassed from ISB
L1I_FETCH_RAB_HIT      0x65        Y    N    N    1            Instruction fetch hitting in RAB
L1I_FILLS              0x41        Y    N    N    1            L1 Instruction Cache Fills
L1I_PREFETCHES         0x44        Y    N    N    1            L1 Instruction Prefetch Requests
L1I_PREFETCH_STALL     0x67        N    N    N    1            Why prefetch pipeline is stalled
L1I_PURGE              0x4b        Y    N    N    1            L1ITLB purges handled by L1I
L1I_PVAB_OVERFLOW      0x69        N    N    N    1            PVAB overflow
L1I_RAB_ALMOST_FULL    0x64        N    N    N    1            Is RAB almost full?
L1I_RAB_FULL           0x60        N    N    N    1            Is RAB full?
L1I_READS              0x40        Y    N    N    1            L1 Instruction Cache Reads
L1I_SNOOP              0x4a        Y    Y    Y    1            Snoop requests handled by L1I
L1I_STRM_PREFETCHES    0x5f        Y    N    N    1            L1 Instruction Cache line prefetch requests
L2I_DEMAND_READS       0x42        Y    N    N    1            L1 Instruction Cache and ISB Misses
L2I_PREFETCHES         0x45        Y    N    N    1            L2 Instruction Prefetch Requests


Table 4-9. Derived Monitors for L1 Instruction Cache and Prefetch Events

Symbol Name                Description                          Equation
L1I_MISSES                 L1I Misses                           L2I_DEMAND_READS
ISB_LINES_IN               Number of cache lines written from   ISB_BUNPAIRS_IN / 4
                           L2I (and beyond) into the front end
L1I_DEMAND_MISS_RATIO      L1I Demand Miss Ratio                L2I_DEMAND_READS / L1I_READS
L1I_MISS_RATIO             L1I Miss Ratio                       (L1I_MISSES + L2I_PREFETCHES) /
                                                                (L1I_READS + L1I_PREFETCHES)
L1I_PREFETCH_MISS_RATIO    L1I Prefetch Miss Ratio              L2I_PREFETCHES / L1I_PREFETCHES
L1I_REFERENCES             Number of L1 Instruction Cache       L1I_READS + L1I_PREFETCHES
                           reads and fills
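As an illustration, the demand and overall miss ratios of Table 4-9 computed in C from the raw counts (L1I_MISSES is just L2I_DEMAND_READS):

    #include <stdint.h>

    static double l1i_demand_miss_ratio(uint64_t l2i_demand_reads,
                                        uint64_t l1i_reads)
    {
        return l1i_reads ? (double)l2i_demand_reads / (double)l1i_reads : 0.0;
    }

    static double l1i_miss_ratio(uint64_t l2i_demand_reads,
                                 uint64_t l2i_prefetches,
                                 uint64_t l1i_reads, uint64_t l1i_prefetches)
    {
        uint64_t refs = l1i_reads + l1i_prefetches;
        return refs ? (double)(l2i_demand_reads + l2i_prefetches)
                    / (double)refs : 0.0;
    }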

4.8.2 L1 Data Cache Events

Table 4-10 lists the Montecito processor's L1 data cache monitors. As shown in Figure 4-1, the write-through L1 data cache services cacheable loads, integer and RSE loads, check loads, and hinted L2 memory references. DATA_REFERENCES is the number of issued data memory references.

L1 data cache reads (L1D_READS) and L1 data cache misses (L1D_READ_MISSES) monitor the read hit/miss rate of the L1 data cache. RSE operations are included in all data cache monitors, but are not broken down explicitly. The DATA_EAR_EVENTS monitor counts how many data cache or DTLB misses are captured by the Data Event Address Register. Please refer to Section 3.3.9 for more detailed information about the data EARs.

L1D cache events have been divided into six sets (sets 0, 1, 2, 3, 4, and 6; set 5 is reserved). Events from different sets of L1D cache events cannot be measured at the same time. Each set is selected by the event code programmed into PMC5 (i.e. if you want to measure any of the events in a set, one of them needs to be measured by PMD5). There are no limitations on umasks. The monitors belonging to each set are explicitly presented in Table 4-11 through Table 4-16.

Table 4-10. Performance Monitors for L1 Data Cache Events

Symbol Name             Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
DATA_EAR_EVENTS         0xc8        Y    Y    Y    1            L1 Data Cache EAR Events
L1D_READS_SET0          0xc2        Y    Y    Y    2            L1 Data Cache Reads
DATA_REFERENCES_SET0    0xc3        Y    Y    Y    4            Data memory references issued to memory pipeline
L1D_READS_SET1          0xc4        Y    Y    Y    2            L1 Data Cache Reads
DATA_REFERENCES_SET1    0xc5        Y    Y    Y    4            Data memory references issued to memory pipeline
L1D_READ_MISSES         0xc7        Y    Y    Y    2            L1 Data Cache Read Misses


4.8.2.1 L1D Cache Events (set 0)

Table 4-11. Performance Monitors for L1D Cache Set 0

Symbol Name             Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L1DTLB_TRANSFER         0xc0        Y    Y    Y    1            L1DTLB misses that hit in the L2DTLB for accesses counted in L1D_READS
L2DTLB_MISSES           0xc1        Y    Y    Y    4            L2DTLB Misses
L1D_READS_SET0          0xc2        Y    Y    Y    2            L1 Data Cache Reads
DATA_REFERENCES_SET0    0xc3        Y    Y    Y    4            Data memory references issued to memory pipeline

4.8.2.2 L1D Cache Events (set 1)

Table 4-12. Performance Monitors for L1D Cache Set 1

Symbol Name             Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L1D_READS_SET1          0xc4        Y    Y    Y    2            L1 Data Cache Reads
DATA_REFERENCES_SET1    0xc5        Y    Y    Y    4            Data memory references issued to memory pipeline
L1D_READ_MISSES         0xc7        Y    Y    Y    2            L1 Data Cache Read Misses

4.8.2.3 L1D Cache Events (set 2)

Table 4-13. Performance Monitors for L1D Cache Set 2

Symbol Name          Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
BE_L1D_FPU_BUBBLE    0xca        N    N    N    1            Full pipe bubbles in main pipe due to FP or L1D cache

4.8.2.4 L1D Cache Events (set 3)

Table 4-14. Performance Monitors for L1D Cache Set 3

Symbol Name                 Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
LOADS_RETIRED               0xcd        Y    Y    Y    4            Retired Loads
MISALIGNED_LOADS_RETIRED    0xce        Y    Y    Y    4            Retired Misaligned Load Instructions
UC_LOADS_RETIRED            0xcf        Y    Y    Y    4            Retired Uncacheable Loads


4.8.2.5 L1D Cache Events (set 4)

Table 4-15. Performance Monitors for L1D Cache Set 4

Symbol Name                  Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
MISALIGNED_STORES_RETIRED    0xd2        Y    Y    Y    2            Retired Misaligned Store Instructions
STORES_RETIRED               0xd1        Y    Y    Y    2            Retired Stores
UC_STORES_RETIRED            0xd0        Y    Y    Y    2            Retired Uncacheable Stores

4.8.2.6 L1D Cache Events (set 6)

Table 4-16. Performance Monitors for L1D Cache Set 6

Symbol Name           Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
LOADS_RETIRED_INTG    0xd8        Y    Y    Y    2            Integer loads retired
SPEC_LOADS_NATTED     0xd9        Y    Y    Y    2            Times ld.s or ld.sa NaT'd

4.8.3 L2 Instruction Cache Events

Table 4-17. Performance Monitors for L2I Cache

Symbol Name           Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2I_READS             0x78        Y    N    Y    1            L2I Cacheable Reads
L2I_UC_READS          0x79        Y    N    Y    1            L2I uncacheable reads
L2I_VICTIMIZATIONS    0x7a        Y    N    Y    1            L2I victimizations
L2I_RECIRCULATES      0x7b        Y    N    Y    1            L2I recirculates
L2I_L3_REJECTS        0x7c        Y    N    Y    1            L3 rejects
L2I_HIT_CONFLICTS     0x7d        Y    N    Y    1            L2I hit conflicts
L2I_SPEC_ABORTS       0x7e        Y    N    Y    1            L2I speculative aborts
L2I_SNOOP_HITS        0x7f        Y    N    Y    1            L2I snoop hits

Table 4-18. Derived Monitors for L2I Cache

Symbol Name       Description                            Equation
L2I_SNOOPS        Number of snoops received by the L2I   L1I_SNOOPS
L2I_FILLS         L2I Fills                              L2I_READS.MISS.DMND + L2I_READS.MISS.PFTCH
L2I_FETCHES       Requests made to L2I due to demand     L2I_READS.ALL.DMND
                  instruction fetches
L2I_REFERENCES    Instruction requests made to L2I       L2I_READS.ALL.ALL
L2I_MISS_RATIO    Percentage of L2I Misses               L2I_READS.MISS / L2I_READS.ALL
L2I_HIT_RATIO     Percentage of L2I Hits                 L2I_READS.HIT / L2I_READS.ALL

4.8.4 L2 Data Cache Events

Table 4-19 summarizes the events available to monitor the Montecito processor L2D cache.

Most L2D events have been divided into 8 sets. Only events within two of these sets (plus non-L2D events) can be measured at the same time. These two sets are selected by the event codes programmed into PMC4 and PMC6 (i.e. if you want to measure any of the events in a particular set, one of them needs to be measured by PMD4 or PMD6).

Note: The opposite also holds: if PMC4 is not programmed to monitor an L2D event, yet PMC5 or PMC8 is (and similarly for PMC6 with PMC7/PMC9), the PMD values are undefined.

Any event set can be measured by programming either PMC4 or PMC6. Once PMC4 is programmed to measure an event from one L2D event set, PMD4, PMD5, and PMD8 can only measure events from that same set (PMD5 and PMD8 share the umask programmed into PMC4). Similarly, once PMC6 is programmed to monitor another set (which could be the same set as measured by PMC4), PMD6, PMD7, and PMD9 can measure events from that set only. None of the L2 data cache events can be measured using PMD10-15.

Support for the .all bit has the same restrictions as the set restrictions. The value set for ".all" in PMC4 applies to all the L2D events selected by it: even if the ".all" values in PMC5 and PMC8 differ from the value in PMC4, PMC4's value selects the capability. The same applies to PMC6, PMC7, and PMC9. Original Montecito documentation claimed that the Thread 0 PMC4 .me/.all setting applied to PMC4-PMC7, but that is no longer true; this bit is available to both threads. Hence, it is possible for one thread's PMDs to monitor just the events credited to that thread while the other thread's PMDs monitor events for both threads (if PMC4.all is set). Note that some events do not support .all counting; if .all counting is enabled for events that do not support it, the resulting counts will be wrong.
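These pairing rules are easy to get wrong, so a monitoring tool may want to validate an assignment before programming the PMCs. A sketch in C, where set_of[] is a hypothetical caller-supplied lookup mapping an event code to its L2D set number (or -1 for non-L2D events) and event_code[n] holds the event selected for PMC/PMD n:

    /* Returns 1 if the assignment obeys the L2D set restrictions above. */
    static int l2d_assignment_ok(const int set_of[256],
                                 const unsigned event_code[16])
    {
        int set4 = set_of[event_code[4] & 0xff];
        int set6 = set_of[event_code[6] & 0xff];

        /* PMD5/PMD8 must share PMC4's set; PMD7/PMD9 must share PMC6's. */
        static const int grp4[] = { 5, 8 }, grp6[] = { 7, 9 };
        for (int i = 0; i < 2; i++) {
            int s4 = set_of[event_code[grp4[i]] & 0xff];
            int s6 = set_of[event_code[grp6[i]] & 0xff];
            if (s4 >= 0 && s4 != set4) return 0;  /* undefined PMD values */
            if (s6 >= 0 && s6 != set6) return 0;
        }
        /* No L2D event may be measured in PMD10-15. */
        for (int n = 10; n <= 15; n++)
            if (set_of[event_code[n] & 0xff] >= 0) return 0;
        return 1;
    }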

While the L2D events support threading, not all counts have access to the exact thread ID bit needed. Each count is labeled with one of ActiveTrue, ActiveApprox, or TrueThrd. ActiveTrue means that the event is counted against the currently active thread, and that thread is the only one that can see the event when it is counted. ActiveApprox means the event is counted against the currently active thread, but there are some corner cases where the event may actually be due to the other, non-active thread; it is assumed that in most cases the error due to this approximation is negligible. TrueThrd indicates that the L2D cache knows which thread the count belongs to, independent of the active-thread indication, and that knowledge is always correct.

Table 4-19. Performance Monitors for L2 Data Cache Events

Symbol Name                  Event Code  IAR  DAR  OPC  .all  Max Inc/Cyc  Description
L2D_OZQ_CANCELS0             0xe0        Y    Y    Y    Y     4            L2D OZQ cancels
L2D_OZQ_FULL                 0xe1, 0xe3  N    N    N    N     1            L2D OZQ is full
L2D_OZQ_CANCELS1             0xe2        Y    Y    Y    Y     4            L2D OZQ cancels
L2D_BYPASS                   0xe4        Y    Y    Y    Y/N   1            L2D Hit or Miss Bypass (.all support is umask dependent)
L2D_OZQ_RELEASE              0xe5        N    N    N    N     1            Clocks with release ordering attribute existed in L2D OZQ
L2D_REFERENCES               0xe6        Y    Y    Y    Y     4            Data RD/WR access to L2D
L2D_L3_ACCESS_CANCEL         0xe8        Y    Y    Y    N     1            Canceled L3 accesses
L2D_OZDB_FULL                0xe9        N    N    N    Y     1            L2D OZ data buffer is full
L2D_FORCE_RECIRC             0xea        Y    Y    Y    Y/N   4            Forced recirculates
L2D_ISSUED_RECIRC_OZQ_ACC    0xeb        Y    Y    Y    Y     1            Count the number of times a recirculate issue was attempted and not preempted
L2D_BAD_LINES_SELECTED       0xec        Y    Y    Y    Y     4            Valid line replaced when invalid line is available
L2D_STORE_HIT_SHARED         0xed        Y    Y    Y    Y     2            Store hit a shared line
L2D_OZQ_ACQUIRE              0xef        N    N    N    Y     1            Clocks with acquire ordering attribute existed in L2D OZQ
L2D_OPS_ISSUED               0xf0        Y    Y    Y    N     4            Different operations issued by L2D
L2D_FILLB_FULL               0xf1        N    N    N    N     1            L2D Fill buffer is full
L2D_FILL_MESI_STATE          0xf2        Y    Y    Y    Y     1            MESI states of fills to L2D cache
L2D_VICTIMB_FULL             0xf3        N    N    N    Y     1            L2D victim buffer is full
L2D_MISSES                   0xcb        Y    Y    Y    Y     1            An L2D miss has been issued to the L3; does not include secondary misses
L2D_INSERT_HITS              0xb1        Y    Y    Y    Y     4            Count number of times an inserting data request hit in the L2D
L2D_INSERT_MISSES            0xb0        Y    Y    Y    Y     4            Count number of times an inserting data request missed in the L2D

Table 4-20. Derived Monitors for L2 Data Cache Events

Symbol Name            Description                     Equation
L2D_READS              L2 Data Read Requests           L2D_REFERENCES.READS
L2D_WRITES             L2 Data Write Requests          L2D_REFERENCES.WRITES
L2D_MISS_RATIO         Percentage of L2D Misses        L2D_INSERT_MISSES / L2D_REFERENCES
L2D_HIT_RATIO          Percentage of L2D Hits          L2D_INSERT_HITS / L2D_REFERENCES
L2D_RECIRC_ATTEMPTS    Number of times the L2D issue   L2D_ISSUED_RECIRC_OZQ_ACC +
                       logic attempted to issue a      L2D_OZQ_CANCELS0.RECIRC
                       recirculate


4.8.4.1 L2 Data Cache Events (set 0)

Table 4-21. Performance Monitors for L2 Data Cache Set 0

Symbol Name         Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_OZQ_FULL        0xe1, 0xe3  N    N    N    1            L2D OZQ is full - ActiveApprox
L2D_OZQ_CANCELS0    0xe0        Y    Y    Y    4            L2D OZQ cancels - TrueThrd
L2D_OZQ_CANCELS1    0xe2        Y    Y    Y    4            L2D OZQ cancels - TrueThrd

L2D_OZQ_FULL is not .all capable.

4.8.4.2 L2 Data Cache Events (set 1)

Table 4-22. Performance Monitors for L2 Data Cache Set 1

Symbol Name        Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_BYPASS         0xe4        Y    Y    Y    4            Count L2 Hit bypasses - TrueThrd
L2D_OZQ_RELEASE    0xe5        N    N    N    1            Effective Release is valid in OZQ - ActiveApprox

The L2D_BYPASS count on Itanium 2 processors was too speculative to be useful. It has been fixed: the event now counts how many bypasses occurred in a given cycle, rather than signalling a 1 for 1-4 bypasses. The 5- and 7-cycle umasks of L2D_BYPASS and the L2D_OZQ_RELEASE counts are not .all capable.

4.8.4.3 L2 Data Cache Events (set 2)

Table 4-23. Performance Monitors for L2 Data Cache Set 2

Symbol Name       Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_REFERENCES    0xe6        Y    Y    Y    4            Inserts of Data Accesses into OZQ - ActiveTrue

4.8.4.4 L2 Data Cache Events (set 3)

Table 4-24. Performance Monitors for L2 Data Cache Set 3

Symbol Name             Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_L3_ACCESS_CANCEL    0xe8        Y    Y    Y    1            L2D request to L3 was cancelled - TrueThrd
L2D_OZDB_FULL           0xe9        N    N    N    1            L2D OZ data buffer is full - ActiveApprox


L2D_L3_ACCESS_CANCEL events are not .all capable.

4.8.4.5 L2 Data Cache Events (set 4)

Table 4-25. Performance Monitors for L2 Data Cache Set 4

Symbol Name                  Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_FORCE_RECIRC             0xea        Y    Y    Y    4            Forced recirculates - ActiveTrue or ActiveApprox
L2D_ISSUED_RECIRC_OZQ_ACC    0xeb        Y    Y    Y    1            OZQ Issued Recirculate - TrueThrd

Some umasks of L2D_FORCE_RECIRC are not .all capable.

4.8.4.6 L2 Data Cache Events (set 5)

Table 4-26. Performance Monitors for L2 Data Cache Set 5

Symbol Name               Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_BAD_LINES_SELECTED    0xec        Y    Y    Y    4            Valid line replaced when invalid line is available
L2D_STORE_HIT_SHARED      0xed        Y    Y    Y    2            Store hit a shared line

4.8.4.7 L2 Data Cache Events (set 6)

Table 4-27. Performance Monitors for L2 Data Cache Set 6

Symbol Name        Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_OZQ_ACQUIRE    0xef        N    N    N    1            Valid acquire operation in OZQ - TrueThrd

4.8.4.8 L2 Data Cache Events (set 7)

Table 4-28. Performance Monitors for L2 Data Cache Set 7

Symbol Name       Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_OPS_ISSUED    0xf0        Y    Y    Y    4            Different operations issued by L2D - TrueThrd
L2D_FILLB_FULL    0xf1        N    N    N    1            L2D Fill buffer is full - ActiveApprox

L2D_OPS_ISSUED and L2D_FILLB_FULL are not .all capable.


4.8.4.9 L2 Data Cache Events (set 8)

Table 4-29. Performance Monitors for L2 Data Cache Set 8

Symbol Name            Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_FILL_MESI_STATE    0xf2        Y    Y    Y    1            Fill to L2D is of a particular MESI value - TrueThrd
L2D_VICTIMB_FULL       0xf3        N    N    N    1            L2D victim buffer is full - ActiveApprox

4.8.4.10 L2 Data Cache Events (Not Set Restricted)

These events are sent to the PMU block directly and thus are not set restricted.

Table 4-30. Performance Monitors for L2D Cache - Not Set Restricted

Symbol Name          Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L2D_MISSES           0xcb        Y    Y    Y    1            An L2D miss has been issued to the L3; does not include secondary misses
L2D_INSERT_MISSES    0xb0        Y    Y    Y    4            An inserting OZQ op was a miss on its first lookup
L2D_INSERT_HITS      0xb1        Y    Y    Y    4            An inserting OZQ op was a hit on its first lookup

4.8.5 L3 Cache Events

Table 4-31 summarizes the directly measured L3 cache events. An extensive list of derived events is provided in Table 4-32.

Table 4-31. Performance Monitors for L3 Unified Cache Events

Symbol Name          Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
L3_LINES_REPLACED    0xdf        N    N    N    1            L3 Cache Lines Replaced. MESI filtering capability
L3_MISSES            0xdc        Y    Y    Y    1            L3 Misses
L3_READS             0xdd        Y    Y    Y    1            L3 Reads. MESI filtering capability
L3_REFERENCES        0xdb        Y    Y    Y    1            L3 References
L3_WRITES            0xde        Y    Y    Y    1            L3 Writes. MESI filtering capability
L3_INSERTS           0xda                       1            L3 Cache lines inserted (allocated). MESI filtering capability


Table 4-32. Derived Monitors for L3 Unified Cache Events

Symbol Name                Description                         Equation
L3_DATA_HITS               L3 Data Read Hits                   L3_READS.DATA_READ.HIT
L3_DATA_MISS_RATIO         L3 Data Miss Ratio                  (L3_READS.DATA_READ.MISS + L3_WRITES.DATA_WRITE.MISS) /
                                                               (L3_READS.DATA_READ.ALL + L3_WRITES.DATA_WRITE.ALL)
L3_DATA_READ_MISSES        L3 Data Read Misses                 L3_READS.DATA_READ.MISS
L3_DATA_READ_RATIO         Ratio of L3 References that are     L3_READS.DATA_READ.ALL / L3_REFERENCES
                           Data Read References
L3_DATA_READ_REFERENCES    L3 Data Read References             L3_READS.DATA_READ.ALL
L3_INST_HITS               L3 Instruction Hits                 L3_READS.INST_FETCH.HIT
L3_INST_MISSES             L3 Instruction Misses               L3_READS.INST_FETCH.MISS
L3_INST_MISS_RATIO         L3 Instruction Miss Ratio           L3_READS.INST_FETCH.MISS / L3_READS.INST_FETCH.ALL
L3_INST_RATIO              Ratio of L3 References that are     L3_READS.INST_FETCH.ALL / L3_REFERENCES
                           Instruction References
L3_INST_REFERENCES         L3 Instruction References           L3_READS.INST_FETCH.ALL
L3_MISS_RATIO              Percentage of L3 Misses             L3_MISSES / L3_REFERENCES
L3_READ_HITS               L3 Read Hits                        L3_READS.READS.HIT
L3_READ_MISSES             L3 Read Misses                      L3_READS.READS.MISS
L3_READ_REFERENCES         L3 Read References                  L3_READS.READS.ALL
L3_STORE_HITS              L3 Store Hits                       L3_WRITES.DATA_WRITE.HIT
L3_STORE_MISSES            L3 Store Misses                     L3_WRITES.DATA_WRITE.MISS
L3_STORE_REFERENCES        L3 Store References                 L3_WRITES.DATA_WRITE.ALL
L2_WB_HITS                 L2D Writeback Hits                  L3_WRITES.L2_WB.HIT
L2_WB_MISSES               L2D Writeback Misses                L3_WRITES.L2_WB.MISS
L2_WB_REFERENCES           L2D Writeback References            L3_WRITES.L2_WB.ALL
L3_WRITE_HITS              L3 Write Hits                       L3_WRITES.ALL.HIT
L3_WRITE_MISSES            L3 Write Misses                     L3_WRITES.ALL.MISS
L3_WRITE_REFERENCES        L3 Write References                 L3_WRITES.ALL.ALL

4.9 System Events

The debug register match events count how often the address of any instruction or data breakpoint register (IBR or DBR) matches the current retired instruction pointer (CODE_DEBUG_REGISTER_MATCHES) or the current data memory address (DATA_DEBUG_REGISTER_MATCHES). CPU_CPL_CHANGES counts the number of privilege level transitions due to interruptions, system calls (epc), returns (demoting branch), and rfi instructions.


Table 4-33. Performance Monitors for System Events

Symbol Name                    Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
CPU_CPL_CHANGES                0x13        N    N    N    1            Privilege Level Changes
DATA_DEBUG_REGISTER_FAULT      0x52        N    N    N    1            Fault due to data debug reg. match to load/store instruction
DATA_DEBUG_REGISTER_MATCHES    0xc6        Y    Y    Y    1            Data debug register matches data address of memory reference
SERIALIZATION_EVENTS           0x53        N    N    N    1            Number of srlz.i instructions
CYCLES_HALTED                  0x18        N    N    N    1            Number of core cycles the thread is in low-power halted state.
                                                                       NOTE: only the PMC/PMD10 pair is capable of counting this event

Table 4-34. Derived Monitors for System Events

Symbol Name                    Description                   Equation
CODE_DEBUG_REGISTER_MATCHES    Code Debug Register Matches   IA64_TAGGED_INST_RETIRED

4.10 TLB Events

The Montecito processor instruction and data TLBs and the virtual hash page table walker are monitored by the events described in Table 4-35.

L1ITLB_REFERENCES and L1DTLB_REFERENCES are derived from the respective instruction/data cache access events. Note that ITLB_REFERENCES does not include prefetch requests made to the L1I cache (L1I_PREFETCH_READS); this is because prefetches are cancelled when they miss in the ITLB and thus do not trigger VHPT walks or software TLB miss handling. ITLB_MISSES_FETCH and L2DTLB_MISSES count TLB misses. L1ITLB_INSERTS_HPW and DTLB_INSERTS_HPW count the number of instruction/data TLB inserts performed by the virtual hash page table walker.

Table 4-35. Performance Monitors for TLB Events

Symbol Name            Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
DTLB_INSERTS_HPW       0xc9        Y    Y    Y    4            Hardware Page Walker inserts to DTLB
HPW_DATA_REFERENCES    0x2d        Y    Y    Y    4            Data memory references to VHPT
L2DTLB_MISSES          0xc1        Y    Y    Y    4            L2DTLB Misses
L1ITLB_INSERTS_HPW     0x48        Y    N    N    1            L1ITLB Hardware Page Walker Inserts
ITLB_MISSES_FETCH      0x47        Y    N    N    1            ITLB Misses Demand Fetch
L1DTLB_TRANSFER        0xc0        Y    Y    Y    1            L1DTLB misses that hit in the L2DTLB for accesses counted in L1D_READS


Table 4-36. Derived Monitors for TLB Events

Symbol Name                  Description                          Equation
L1DTLB_EAR_EVENTS            Counts the number of L1DTLB events   DATA_EAR_EVENTS
                             captured by the EAR
L2DTLB_MISS_RATIO            L2DTLB miss ratio                    L2DTLB_MISSES / DATA_REFERENCES_SET0 or
                                                                  L2DTLB_MISSES / DATA_REFERENCES_SET1
L1DTLB_REFERENCES            L1DTLB References                    DATA_REFERENCES_SET0 or
                                                                  DATA_REFERENCES_SET1
L1ITLB_EAR_EVENTS            Provides information on the number   L1I_EAR_EVENTS
                             of L1ITLB events captured by the
                             EAR. This is a subset of
                             L1I_EAR_EVENTS
L1ITLB_MISS_RATIO            L1ITLB miss ratio                    ITLB_MISSES_FETCH.L1ITLB / L1I_READS
L1ITLB_REFERENCES            L1ITLB References                    L1I_READS
L1DTLB_FOR_L1D_MISS_RATIO    Miss Ratio of L1DTLB servicing       L1DTLB_TRANSFER / L1D_READS_SET0 or
                             the L1D                              L1DTLB_TRANSFER / L1D_READS_SET1

The Montecito processor has two data TLBs, called the L1DTLB and the L2DTLB (also referred to as the DTLB). These TLBs are looked up in parallel, and the L2DTLB is the larger and slower of the two. The possible actions for the combinations of hits and misses in these TLBs are outlined below:
• L1DTLB_hit=0, L2DTLB_hit=0: If enabled, the HPW kicks in and inserts a translation into one or both TLBs.
• L1DTLB_hit=0, L2DTLB_hit=1: If floating-point, no action is taken; else a transfer is made from the L2DTLB to the L1DTLB.
• L1DTLB_hit=1, L2DTLB_hit=0: If enabled, the HPW kicks in and inserts a translation into one or both TLBs.
• L1DTLB_hit=1, L2DTLB_hit=1: No action is taken.

When a memory operation goes down the memory pipeline, DATA_REFERENCES counts it. If the translation does not exist in the L2DTLB, then L2DTLB_MISSES counts it, and if the HPW is enabled, HPW_DATA_REFERENCES counts it. If the HPW finds the data in the VHPT, it inserts it into the L1DTLB and L2DTLB (as needed). If the translation exists in the L2DTLB, the only case in which work is done is when the translation does not exist in the L1DTLB: if the operation is serviced by the L1D (see the L1D_READS description), L1DTLB_TRANSFER counts it. For the purpose of calculating the TLB miss ratios, VHPT memory references have been excluded from the DATA_REFERENCES event, and VHPT_REFERENCES is provided for situations where one might want to add them back in. The counting flow is sketched below.
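A short C sketch tallying, for one access, which monitors would increment (flags describe the access; the corresponding event names are in the comments):

    struct tlb_counts {
        unsigned long data_refs, l2dtlb_misses, hpw_refs, l1dtlb_transfer;
    };

    static void count_tlb_access(int l1dtlb_hit, int l2dtlb_hit,
                                 int hpw_enabled, int serviced_by_l1d,
                                 struct tlb_counts *c)
    {
        c->data_refs++;                         /* DATA_REFERENCES     */
        if (!l2dtlb_hit) {
            c->l2dtlb_misses++;                 /* L2DTLB_MISSES       */
            if (hpw_enabled)
                c->hpw_refs++;                  /* HPW_DATA_REFERENCES */
        } else if (!l1dtlb_hit && serviced_by_l1d) {
            c->l1dtlb_transfer++;               /* L1DTLB_TRANSFER     */
        }
    }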

Due to the TLB hardware design, there are some corner cases where some of these events will show activity even though the instruction causing the activity never reaches retirement (such events are marked accordingly). Since the processor is stalled even in these corner cases, they are included in the counts; as long as all events used for calculating a metric are consistent with respect to this issue, fairly accurate numbers can be expected.


4.11 System Bus Events

Table 4-37 lists the system bus transaction monitors. Many of the listed bus events take a umask that qualifies the event by initiator. For all bus events, when "per cycles" is mentioned, SI clock cycles (bus clock multiplied by bus ratio) are meant rather than bus clock cycles, unless otherwise specified. Numerous derived events are included in Table 4-38.

Table 4-37. Performance Monitors for System Bus Events

Symbol Name               Event Code  IAR  DAR  OPC  Max Inc/Cyc  Description
BUS_ALL                   0x87        N    N    N    1            Bus Transactions
ER_BRQ_LIVE_REQ_HI        0xb8        N    N    N    2            BRQ Live Requests (two most-significant bits of the 5-bit outstanding BRQ request count)
ER_BRQ_LIVE_REQ_LO        0xb9        N    N    N    7            BRQ Live Requests (three least-significant bits of the 5-bit outstanding BRQ request count)
ER_BRQ_REQ_INSERTED       0xba        N    N    N    1            BRQ Requests Inserted
ER_BKSNP_ME_ACCEPTED      0xbb        N    N    N    1            BacksnoopMe Requests accepted into the BRQ from the L2D (used by the L2D to get itself out of potential forward progress situations)
ER_REJECT_ALL_L1_REQ      0xbc        N    N    N    1            Number of cycles in which the BRQ was rejecting all L1I/L1D requests (for the "Big Hammer" forward progress logic)
ER_REJECT_ALL_L1D_REQ     0xbd        N    N    N    1            Number of cycles in which the BRQ was rejecting all L1D requests (for L1D/L1I forward progress)
ER_REJECT_ALL_L1I_REQ     0xbe        N    N    N    1            Number of cycles in which the BRQ was rejecting all L1I requests (for L1D/L1I forward progress)
BUS_DATA_CYCLE            0x88        N    N    N    1            Valid data cycle on the Bus
BUS_HITM                  0x84        N    N    N    1            Bus Hit Modified Line Transactions
BUS_IO                    0x90        N    N    N    1            IA-32 Compatible IO Bus Transactions
SI_IOQ_LIVE_REQ_HI        0x98        N    N    N    1            In-order Bus Queue Requests (one most-significant bit of the 4-bit outstanding IOQ request count)
SI_IOQ_LIVE_REQ_LO        0x97        N    N    N    7            In-order Bus Queue Requests (three least-significant bits of the 4-bit outstanding IOQ request count)
BUS_B2B_DATA_CYCLES       0x93        N    N    N    1            Back-to-back bursts of data
SI_CYCLES                 0x8e        N    N    N    1            Counts SI cycles
BUS_MEMORY                0x8a        N    N    N    1            Bus Memory Transactions
BUS_MEM_READ              0x8b        N    N    N    1            Full Cache line D/I memory RD, RD invalidate, and BRIL
ER_MEM_READ_OUT_HI        0xb4        N    N    N    2            Outstanding memory RD transactions (upper two bits)
ER_MEM_READ_OUT_LO        0xb5        N    N    N    7            Outstanding memory RD transactions (lower three bits)
BUS_RD_DATA               0x8c        N    N    N    1            Bus Read Data Transactions
BUS_RD_HIT                0x80        N    N    N    1            Bus Read Hit Clean Non-local Cache Transactions
BUS_RD_HITM               0x81        N    N    N    1            Bus Read Hit Modified Non-local Cache Transactions
BUS_RD_INVAL_BST_HITM     0x83        N    N    N    1            Bus BRIL Burst Transaction Results in HITM
BUS_RD_INVAL_HITM         0x82        N    N    N    1            Bus BIL Transaction Results in HITM
BUS_RD_IO                 0x91        N    N    N    1            IA-32 Compatible IO Read Transactions
BUS_RD_PRTL               0x8d        N    N    N    1            Bus Read Partial Transactions
ER_SNOOPQ_REQ_HI          0xb6        N    N    N    1            ER Snoop Queue Requests (most-significant bit of 4-bit count)
ER_SNOOPQ_REQ_LO          0xb7        N    N    N    7            ER Snoop Queue Requests (three least-significant bits of 4-bit count)
BUS_SNOOP_STALL_CYCLES    0x8f        N    N    N    1            Bus Snoop Stall Cycles (from any agent)
BUS_WR_WB                 0x92        N    N    N    1            Bus Write Back Transactions
MEM_READ_CURRENT          0x89        N    N    N    1            Current Mem Read Transactions On Bus
SI_RQ_INSERTS             0x9e        N    N    N    2            SI request queue inserts
SI_RQ_LIVE_REQ_HI         0xa0        N    N    N    1            SI request queue live requests (most-significant bit)
SI_RQ_LIVE_REQ_LO         0x9f        N    N    N    7            SI request queue live requests (least-significant three bits)
SI_WRITEQ_INSERTS         0xa1        N    N    N    2            SI write queue inserts
SI_WRITEQ_LIVE_REQ_HI     0xa3        N    N    N    1            SI write queue live requests (most-significant bit)
SI_WRITEQ_LIVE_REQ_LO     0xa2        N    N    N    7            SI write queue live requests (least-significant three bits)
SI_WAQ_COLLISIONS         0xa4        N    N    N    1            SI write address queue collisions (incoming FSB snoop collides with an entry in WAQ)
SI_CCQ_INSERTS            0xa5        N    N    N    2            SI clean castout queue inserts
SI_CCQ_LIVE_REQ_HI        0xa7        N    N    N    1            SI clean castout queue live requests (most-significant bit)
SI_CCQ_LIVE_REQ_LO        0xa6        N    N    N    7            SI clean castout queue live requests (least-significant three bits)
SI_CCQ_COLLISIONS         0xa8        N    N    N    1            SI clean castout queue collisions (incoming FSB snoop collides with an entry in CCQ)
SI_IOQ_COLLISIONS         0xaa        N    N    N    1            SI inorder queue collisions (outgoing transaction collides with an entry in IOQ)
SI_SCB_INSERTS            0xab        N    N    N    1            SI snoop coalescing buffer inserts
SI_SCB_LIVE_REQ_HI        0xad        N    N    N    1            SI snoop coalescing buffer live requests (most-significant bit)
SI_SCB_LIVE_REQ_LO        0xac        N    N    N    7            SI snoop coalescing buffer live requests (least-significant three bits)
SI_SCB_SIGNOFFS           0xae        N    N    N    1            SI snoop coalescing buffer coherency signoffs
SI_WDQ_ECC_ERRORS         0xaf        N    N    N    1            SI write data queue ECC errors

Table 4-38. Derived Monitors for System Bus Events

Symbol Name                 Description                              Equation
BIL_HITM_LINE_RATIO         BIL Hit to Modified Line Ratio           BUS_RD_INVAL_HITM / BUS_MEMORY or
                                                                     BUS_RD_INVAL_HITM / BUS_RD_INVAL
BIL_RATIO                   BIL Ratio                                BUS_RD_INVAL / BUS_MEMORY
BRIL_HITM_LINE_RATIO        BRIL Hit to Modified Line Ratio          BUS_RD_INVAL_BST_HITM / BUS_MEMORY or
                                                                     BUS_RD_INVAL_BST_HITM / BUS_RD_INVAL
BUS_ADDR_BPRI               Bus transactions used by IO agent        BUS_MEMORY.*.IO
BUS_BRQ_LIVE_REQ            BRQ Live Requests                        ER_BRQ_LIVE_REQ_HI * 8 + ER_BRQ_LIVE_REQ_LO
BUS_BURST                   Full cache line memory transactions      BUS_MEMORY.EQ_128BYTE.*
                            (BRL, BRIL, BWL)
BUS_HITM_RATIO              Bus Modified Line Hit Ratio              BUS_HITM / BUS_MEMORY or BUS_HITM / BUS_BURST
BUS_HITS_RATIO              Bus Read Hit to Shared Line Ratio        BUS_RD_HIT / BUS_RD_ALL or BUS_RD_HIT / BUS_MEMORY
BUS_IOQ_LIVE_REQ            Inorder Bus Queue Requests               SI_IOQ_LIVE_REQ_HI * 8 + SI_IOQ_LIVE_REQ_LO
BUS_IO_CYCLE_RATIO          Bus I/O Cycle Ratio                      BUS_IO / BUS_ALL
BUS_IO_RD_RATIO             Bus I/O Read Ratio                       BUS_RD_IO / BUS_IO
BUS_MEM_READ_OUTSTANDING    Number of outstanding memory RD          ER_MEM_READ_OUT_HI * 8 + ER_MEM_READ_OUT_LO
                            transactions
BUS_PARTIAL                 Less than cache line memory              BUS_MEMORY.LT_128BYTE.*
                            transactions (BRP, BWP)
BUS_PARTIAL_RATIO           Bus Partial Access Ratio                 BUS_MEMORY.LT_128BYTE / BUS_MEMORY.ALL
BUS_RD_ALL                  Full cache line memory read              BUS_MEM_READ.BRL.*
                            transactions (BRL)
BUS_RD_DATA_RATIO           Cacheable Data Fetch Bus Transaction     BUS_RD_DATA / BUS_ALL or BUS_RD_DATA / BUS_MEMORY
                            Ratio
BUS_RD_HITM_RATIO           Bus Read Hit to Modified Line Ratio      BUS_RD_HITM / BUS_RD_ALL or BUS_RD_HITM / BUS_MEMORY
BUS_RD_INSTRUCTIONS         Full cache line instruction memory       BUS_RD_ALL - BUS_RD_DATA
                            read transactions (BRP)
BUS_RD_INVAL                0 byte memory read-invalidate            BUS_MEM_READ.BIL.*
                            transactions (BIL)
BUS_RD_INVAL_BST            Full cache line read-invalidate          BUS_MEM_READ.BRIL.*
                            transactions (BRIL)
BUS_RD_INVAL_BST_MEMORY     Bus Read Invalidate Line in Burst        BUS_RD_INVAL_BST - BUS_RD_INVAL_BST_HITM
                            transactions (BRIL) satisfied by memory
BUS_RD_INVAL_MEMORY         Bus Read Invalidate Line transactions    BUS_RD_INVAL - BUS_RD_INVAL_HITM
                            (BIL) satisfied from memory
BUS_RD_INVAL_ALL_HITM       Bus Read Invalidate Line transactions    BUS_RD_INVAL_BST_HITM + BUS_RD_INVAL_HITM
                            (BRIL and BIL) resulting in HITMs
BUS_RD_PRTL_RATIO           Bus Read Partial Access Ratio            BUS_RD_PRTL / BUS_MEMORY
BUS_WB_RATIO                Writeback Ratio                          BUS_WR_WB / BUS_MEMORY or BUS_WR_WB / BUS_BURST
CACHEABLE_READ_RATIO        Cacheable Read Ratio                     (BUS_RD_ALL + BUS_MEM_READ.BRIL) / BUS_MEMORY

4.11.1 System Bus Conventions

Table 4-39 defines the conventions that will be used when describing the Montecito processor system bus transaction monitors in this section, as well as the individual monitor descriptions in Section 4.15.

Other transactions besides those listed in Table 4-42 include Deferred Reply, Special Transactions, Interrupt, Interrupt Acknowledge, and Purge TC. Note that the monitors will count if any transaction gets a retry response from the priority agent.

To support the analysis of snoop traffic in a multiprocessor system, the Montecito processor provides local processor and remote response monitors. The local processor snoop events (SI_SCB_INSERTS and SI_SCB_SIGNOFFS) monitor inbound snoop traffic. The remote response events (BUS_RD_HIT, BUS_RD_HITM, BUS_RD_INVAL_HITM and BUS_RD_INVAL_BST_HITM) monitor the snoop responses of other processors to bus transactions that the monitoring processor originated. Table 4-40 summarizes the remote snoop events by bus transaction.

4.11.2 Extracting Memory Latency from Montecito Performance Counters

On the Itanium 2 processors, several events were provided to approximate memory latency as seen by the processor using the following equation:

((BUS_MEM_READ_OUT_HI * 8) + BUS_MEM_READ_OUT_LO) / (BUS_MEM_READ.BRL.SELF + BUS_MEM_READ.BRIL.SELF)


The BUS_MEM_READ_OUT events start counting one bus clock after a request is issued on the system interface (ADS) and stop incrementing when the request completes its first data transfer or is retried. In each core cycle after counting is initiated, the number of live requests in that cycle is added to the count. This count may be as high as 15. For ease of implementation, the count is split into two parts: BUS_MEM_READ_OUT_LO sums the low-order 3 bits of the number of live requests, while BUS_MEM_READ_OUT_HI sums the high-order bit.

In the above formula, the numerator provides the accumulated count of live requests and the denominator provides the number of requests counted. When the live count is divided by the number of transactions issued, the result is the average lifetime of a transaction issued on the system interface (an application of Little's Law).
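As a worked illustration of this calculation, the following C fragment computes the average read lifetime from the raw counts. It is a sketch under the assumption that all four counts were collected over the same interval; the function name and parameter names are illustrative placeholders.

    /* Minimal sketch: average lifetime (in cycles) of a system interface
       read on the Itanium 2 processor, per the equation above. */
    double itanium2_avg_read_latency(unsigned long out_hi,    /* BUS_MEM_READ_OUT_HI */
                                     unsigned long out_lo,    /* BUS_MEM_READ_OUT_LO */
                                     unsigned long brl_self,  /* BUS_MEM_READ.BRL.SELF */
                                     unsigned long bril_self) /* BUS_MEM_READ.BRIL.SELF */
    {
        double live  = (double)out_hi * 8.0 + (double)out_lo;
        double reads = (double)brl_self + (double)bril_self;
        /* Little's Law: accumulated live-request count / requests issued */
        return reads > 0.0 ? live / reads : 0.0;
    }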

The Montecito processor has similar counters: ER_MEM_READ_OUT.{HI,LO}. Using these events to derive Montecito memory latency will give results that are higher than the true memory latency seen on Montecito. The main reason is that the start and stop points of the counters are not equivalent between the two processors. Specifically, on Montecito, the ER_MEM_READ_OUT.{HI,LO} events start counting the core clock after a request is sent to the arbiter, and stop counting when the request receives its first data transfer within the external request logic (after the arbiter). Thus, these events include the entire time requests spend in the arbiter (both pre- and post-request).

The requests may remain in the arbiter for a long or short time depending on system interface behavior. The arbiter queue events SI_RQ_LIVE_REQ.{HI,LO} may be used to reduce the effect of arbiter latency on the calculation. Unfortunately, these events are not sufficient to enable a measurement completely equivalent to that of previous Itanium 2 processors: the arbiter return time from the FSB to the core is fixed for a specific arbiter-to-system-interface ratio, and these arbiter events may occur in a different time domain from core events.

The new memory latency approximation formula for Montecito, with corrective events included, is below:

((ER_MEM_READ_OUT_HI * 8 + ER_MEM_READ_OUT_LO) - (SI_RQ_LIVE_REQ_HI * 8 + SI_RQ_LIVE_REQ_LO)) / BUS_MEM_READ
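The corrected formula can be applied the same way as before. The sketch below assumes the same collection interval for all counts; the function and parameter names are illustrative placeholders.

    /* Minimal sketch: Montecito latency estimate with the arbiter
       residency subtracted before averaging, per the corrected
       equation above. */
    double montecito_avg_read_latency(unsigned long er_hi,  /* ER_MEM_READ_OUT_HI */
                                      unsigned long er_lo,  /* ER_MEM_READ_OUT_LO */
                                      unsigned long rq_hi,  /* SI_RQ_LIVE_REQ_HI */
                                      unsigned long rq_lo,  /* SI_RQ_LIVE_REQ_LO */
                                      unsigned long reads)  /* BUS_MEM_READ */
    {
        double live    = (double)er_hi * 8.0 + (double)er_lo;
        double arbiter = (double)rq_hi * 8.0 + (double)rq_lo;
        return reads > 0 ? (live - arbiter) / (double)reads : 0.0;
    }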

Note that the Data EAR may be used to compare data cache load miss latency between Madison and Montecito. However, an access's memory latency, as measured by the Data EAR or other cycle counters, will be inherently greater on Montecito than on previous Itanium 2 processors due to the latency the arbiter adds to both the outbound request and the inbound data transfer. Also, the Data EAR encompasses the entire latency through the processor's memory hierarchy and queues, without detailing the time spent in any specific queue.

Even with this improved formula, the estimated memory latency for Montecito will appear greater than that of previous Itanium 2 processors. We have not observed any design point suggesting that the system interface component of memory accesses is excessive on Montecito.

We have observed that snoop stalls and write queue pressure lead to additional memory latency on Montecito compared to previous Itanium 2 processors, but these phenomena impact the pre- or post-system-interface portion of memory latency and are very workload dependent in their impact. Specifically, the write queues need to be sufficiently full to put back pressure on the victimizing read requests, such that a new read request cannot issue to the system interface because it cannot identify a victim in the L3 cache to ensure its proper allocation. This severe pressure has only been seen with steady streams in which every read request results in a dirty L3 victim. Additional snoop stalls should only add latency to transactions that receive a HITM snoop response (cache-to-cache transfers), because non-HITM responses are satisfied by memory, and the memory access should be initiated as a consequence of the initial transaction rather than its snoop response.


The figure below (Figure 4-2) shows how latency is determined using the above calculations on the Itanium 2 and Montecito processors. The red portion of the Montecito diagram shows the latency accounted for by the correction term in the Montecito calculation.

Figure 4-2. Extracting Memory Latency from PMUs

[Figure: two annotated timelines. Itanium 2: load issued to caches, load issued on system interface, data delivery started, data returned to register; the time calculated with PMU events spans from system interface issue to the start of data delivery. Montecito: load issued to arbiter, load issued to caches, load issued on system interface, data delivery started, data delivery seen by external request logic, data returned to register; the time calculated with PMU events additionally includes the time spent in the arbiter.]

4.12 RSE Events

Register Stack Engine events are presented in Table 4-39. The number of current/dirty registers is split among three monitors since there are 96 physical registers in the Montecito processor.

Table 4-39. Performance Monitors for RSE Events (Sheet 1 of 2)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Description

RSE_CURRENT_REGS_2_TO_0 | 0x2b | N | N | N | 7 | Current RSE registers
RSE_CURRENT_REGS_5_TO_3 | 0x2a | N | N | N | 7 | Current RSE registers
RSE_CURRENT_REGS_6 | 0x26 | N | N | N | 1 | Current RSE registers
RSE_DIRTY_REGS_2_TO_0 | 0x29 | N | N | N | 7 | Dirty RSE registers


Table 4-39. Performance Monitors for RSE Events (Sheet 2 of 2)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Description

RSE_DIRTY_REGS_5_TO_3 | 0x28 | N | N | N | 7 | Dirty RSE registers
RSE_DIRTY_REGS_6 | 0x24 | N | N | N | 1 | Dirty RSE registers
RSE_EVENT_RETIRED | 0x32 | N | N | N | 1 | Retired RSE operations
RSE_REFERENCES_RETIRED | 0x20 | Y | Y | Y | 2 | RSE Accesses

Table 4-40. Derived Monitors for RSE Events

Symbol Name | Description | Equation

RSE_CURRENT_REGS | Current RSE registers before an RSE_EVENT_RETIRED occurred | RSE_CURRENT_REGS_6 * 64 + RSE_CURRENT_REGS_5_TO_3 * 8 + RSE_CURRENT_REGS_2_TO_0
RSE_DIRTY_REGS | Dirty RSE registers before an RSE_EVENT_RETIRED occurred | RSE_DIRTY_REGS_6 * 64 + RSE_DIRTY_REGS_5_TO_3 * 8 + RSE_DIRTY_REGS_2_TO_0
RSE_LOAD_LATENCY_PENALTY | Counts the number of cycles we have stalled due to retired RSE loads (every time RSE.BOF reaches RSE.storereg and the RSE has not issued all of the loads necessary for the fill) | BE_RSE_BUBBLE.UNDERFLOW
RSE_AVG_LOAD_LATENCY | Average latency for RSE loads | RSE_LOAD_LATENCY_PENALTY / RSE_REFERENCES_RETIRED.LOAD
RSE_AVG_CURRENT_REGS | Average number of current registers | RSE_CURRENT_REGS / RSE_EVENT_RETIRED
RSE_AVG_DIRTY_REGS | Average number of dirty registers | RSE_DIRTY_REGS / RSE_EVENT_RETIRED
RSE_AVG_INVALID_REGS | Average number of invalid registers. Assumes number of clean registers is always 0. | 96 - (RSE_DIRTY_REGS + RSE_CURRENT_REGS) / RSE_EVENT_RETIRED
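As an illustration of how the split monitors above recombine, the following C fragment is a minimal sketch that mirrors the equations in Table 4-40; the counts are illustrative placeholders.

    /* Minimal sketch: recombining the split RSE monitors and computing
       the derived averages of Table 4-40. Counts are placeholders. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long cur6 = 900, cur53 = 5200, cur20 = 1800; /* RSE_CURRENT_REGS_6/_5_TO_3/_2_TO_0 */
        unsigned long dty6 = 300, dty53 = 2100, dty20 = 700;  /* RSE_DIRTY_REGS_6/_5_TO_3/_2_TO_0 */
        unsigned long events = 4000;                          /* RSE_EVENT_RETIRED */

        double current = (double)cur6 * 64 + (double)cur53 * 8 + (double)cur20;
        double dirty   = (double)dty6 * 64 + (double)dty53 * 8 + (double)dty20;

        printf("RSE_AVG_CURRENT_REGS = %.2f\n", current / (double)events);
        printf("RSE_AVG_DIRTY_REGS   = %.2f\n", dirty / (double)events);
        /* Average invalid registers, assuming zero clean registers: */
        printf("RSE_AVG_INVALID_REGS = %.2f\n",
               96.0 - (dirty + current) / (double)events);
        return 0;
    }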

4.13 Hyper-Threading Events

Table 4-41 summarizes the events available on the Montecito processor to measure thread switch activity. To determine the raw number of thread switch events that occur during a given monitoring session, a user should capture THREAD_SWITCH_EVENTS.ALL.

Table 4-41. Performance Monitors for Multi-thread Events (Sheet 1 of 2)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Description

THREAD_SWITCH_EVENTS | 0x0c | N | N | N | 1 | Counts number of times the thread is switched out due to various causes
THREAD_SWITCH_GATED | 0x0d | N | N | N | 1 | Number of times thread switch outs are gated and their causes


Table 4-41. Performance Monitors for Multi-thread Events (Sheet 2 of 2)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Description

THREAD_SWITCH_CYCLE | 0x0e | N | N | N | 1 | Various overhead cycles spent for thread switches
THREAD_SWITCH_STALL | 0x0f | N | N | N | 1 | Times main pipeline is stalled more than a threshold value set before a thread switch

4.14 Performance Monitors Ordered by Event Code

Table 4-42 presents all of the performance monitors provided in the Montecito processor ordered by their event code.

Table 4-42. All Performance Monitors Ordered by Code (Sheet 1 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

BACK_END_BUBBLE | 0x00 | N | N | N | 1 | A | Full pipe bubbles in main pipe
BE_RSE_BUBBLE | 0x01 | N | N | N | 1 | A | Full pipe bubbles in main pipe due to RSE stalls
BE_EXE_BUBBLE | 0x02 | N | N | N | 1 | A | Full pipe bubbles in main pipe due to Execution unit stalls
FP_TRUE_SIRSTALL | 0x03 | Y | N | N | 1 | A | SIR stall asserted and leads to a trap
BE_FLUSH_BUBBLE | 0x04 | N | N | N | 1 | A | Full pipe bubbles in main pipe due to flushes
FP_FALSE_SIRSTALL | 0x05 | Y | N | N | 1 | A | SIR stall without a trap
FP_FAILED_FCHKF | 0x06 | Y | N | N | 1 | A | Failed fchkf
IA64_INST_RETIRED | 0x08 | Y | N | Y | 6 | A | Retired Itanium® Instructions
IA64_TAGGED_INST_RETIRED | 0x08 | Y | N | Y | 6 | A | Retired Tagged Instructions
FP_OPS_RETIRED | 0x09 | Y | N | N | 6 | A | Retired FP operations
FP_FLUSH_TO_ZERO | 0x0b | Y | N | N | 2 | A | FP Result Flushed to Zero
THREAD_SWITCH_EVENTS | 0x0c | N | N | N | 1 | A | Thread switch events and cause
THREAD_SWITCH_GATED | 0x0d | N | N | N | 1 | A | TS gated and their sources
THREAD_SWITCH_CYCLE | 0x0e | N | N | N | 1 | A | Various TS related periods
THREAD_SWITCH_STALLS | 0x0f | N | N | N | 1 | A | Pipeline stalls due to TS
BRANCH_EVENT | 0x11 | Y | N | Y | 1 | A | Branch Event Captured
CPU_OP_CYCLES | 0x12 | Y | N | Y | 1 | C | CPU Operating Cycles
CPU_CPL_CHANGES | 0x13 | N | N | N | 1 | A | Privilege Level Changes
CPU_OP_CYCLES_HALTED | 0x18 | N | N | N | 7 | C | CPU Operating Cycles Halted
RSE_REFERENCES_RETIRED | 0x20 | Y | Y | Y | 2 | A | RSE Accesses
RSE_DIRTY_REGS_6 | 0x24 | N | N | N | 1 | A | Dirty RSE registers


Table 4-42. All Performance Monitors Ordered by Code (Sheet 2 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

RSE_CURRENT_REGS_6 | 0x26 | N | N | N | 1 | A | Current RSE registers
RSE_DIRTY_REGS_5_TO_3 | 0x28 | N | N | N | 7 | A | Dirty RSE registers
RSE_DIRTY_REGS_2_TO_0 | 0x29 | N | N | N | 7 | A | Dirty RSE registers
RSE_CURRENT_REGS_5_TO_3 | 0x2a | N | N | N | 7 | A | Current RSE registers
RSE_CURRENT_REGS_2_TO_0 | 0x2b | N | N | N | 7 | A | Current RSE registers
HPW_DATA_REFERENCES | 0x2d | Y | Y | Y | 4 | A | Data memory references to VHPT
RSE_EVENT_RETIRED | 0x32 | N | N | N | 1 | A | Retired RSE operations
L1I_READS | 0x40 | Y | N | N | 1 | A | L1 Instruction Cache Reads
L1I_FILLS | 0x41 | Y | N | N | 1 | F | L1 Instruction Cache Fills
L2I_DEMAND_READS | 0x42 | Y | N | N | 1 | A | L1 Instruction Cache and ISB Misses
L1I_EAR_EVENTS | 0x43 | Y | N | N | 1 | F | Instruction EAR Events
L1I_PREFETCHES | 0x44 | Y | N | N | 1 | A | L1 Instruction Prefetch Requests
L2I_PREFETCHES | 0x45 | Y | N | N | 1 | A | L2 Instruction Prefetch Requests
ISB_BUNPAIRS_IN | 0x46 | Y | N | N | 1 | F | Bundle pairs written from L2 into FE
ITLB_MISSES_FETCH | 0x47 | Y | N | N | 1 | A | ITLB Misses Demand Fetch
L1ITLB_INSERTS_HPW | 0x48 | Y | N | N | 1 | A | L1ITLB Hardware Page Walker Inserts
DISP_STALLED | 0x49 | N | N | N | 1 | A | Number of cycles dispersal stalled
L1I_SNOOP | 0x4a | Y | Y | Y | 1 | C | Snoop requests handled by L1I
L1I_PURGE | 0x4b | Y | N | N | 1 | C | L1ITLB purges handled by L1I
INST_DISPERSED | 0x4d | Y | N | N | 6 | A | Syllables Dispersed from REN to REG stage
SYLL_NOT_DISPERSED | 0x4e | Y | N | N | 5 | A | Syllables not dispersed
SYLL_OVERCOUNT | 0x4f | Y | N | N | 2 | A | Syllables overcounted
NOPS_RETIRED | 0x50 | Y | N | Y | 6 | A | Retired NOP Instructions
PREDICATE_SQUASHED_RETIRED | 0x51 | Y | N | Y | 6 | A | Instructions Squashed Due to Predicate Off
DATA_DEBUG_REGISTER_FAULT | 0x52 | N | N | N | 1 | A | Fault due to data debug reg. match to load/store instruction
SERIALIZATION_EVENTS | 0x53 | N | N | N | 1 | A | Number of srlz.I instructions
BR_PATH_PRED | 0x54 | Y | N | Y | 3 | A | FE Branch Path Prediction Detail
INST_FAILED_CHKS_RETIRED | 0x55 | N | N | N | 1 | A | Failed Speculative Check Loads
INST_CHKA_LDC_ALAT | 0x56 | Y | Y | Y | 2 | A | Advanced Check Loads
INST_FAILED_CHKA_LDC_ALAT | 0x57 | Y | Y | Y | 1 | A | Failed Advanced Check Loads
ALAT_CAPACITY_MISS | 0x58 | Y | Y | Y | 2 | A | ALAT Entry Replaced
BR_MISPRED_DETAIL | 0x5b | Y | N | Y | 3 | A | FE Branch Mispredict Detail
L1I_STRM_PREFETCHES | 0x5f | Y | N | N | 1 | A | L1 Instruction Cache line prefetch requests


Table 4-42. All Performance Monitors Ordered by Code (Sheet 3 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

L1I_RAB_FULL | 0x60 | N | N | N | 1 | C | Is RAB full?
BE_BR_MISPRED_DETAIL | 0x61 | Y | N | Y | 1 | A | BE branch misprediction detail
ENCBR_MISPRED_DETAIL | 0x63 | Y | N | Y | 3 | A | Number of encoded branches retired
L1I_RAB_ALMOST_FULL | 0x64 | N | N | N | 1 | C | Is RAB almost full?
L1I_FETCH_RAB_HIT | 0x65 | Y | N | N | 1 | A | Instruction fetch hitting in RAB
L1I_FETCH_ISB_HIT | 0x66 | Y | N | N | 1 | A | "Just-in-time" instruction fetch hitting in and being bypassed from ISB
L1I_PREFETCH_STALL | 0x67 | N | N | N | 1 | A | Why prefetch pipeline is stalled?
BR_MISPRED_DETAIL2 | 0x68 | Y | N | Y | 2 | A | FE Branch Mispredict Detail (Unknown path component)
L1I_PVAB_OVERFLOW | 0x69 | N | N | N | 1 | A | PVAB overflow
BR_PATH_PRED2 | 0x6a | Y | N | Y | 2 | A | FE Branch Path Prediction Detail (Unknown prediction component)
FE_LOST_BW | 0x70 | N | N | N | 2 | A | Invalid bundles at the entrance to IB
FE_BUBBLE | 0x71 | N | N | N | 1 | A | Bubbles seen by FE
BE_LOST_BW_DUE_TO_FE | 0x72 | N | N | N | 2 | A | Invalid bundles if BE not stalled for other reasons
IDEAL_BE_LOST_BW_DUE_TO_FE | 0x73 | N | N | N | 2 | A | Invalid bundles at the exit from IB
L2I_READS | 0x78 | Y | N | Y | 1 | F | L2I Cacheable Reads
L2I_UC_READS | 0x79 | Y | N | Y | 1 | F | L2I uncacheable reads
L2I_VICTIMIZATIONS | 0x7a | Y | N | Y | 1 | F | L2I victimizations
L2I_RECIRCULATES | 0x7b | Y | N | Y | 1 | F | L2I recirculates
L2I_L3_REJECTS | 0x7c | Y | N | Y | 1 | F | L3 rejects
L2I_HIT_CONFLICTS | 0x7d | Y | N | Y | 1 | F | L2I hit conflicts
L2I_SPEC_ABORTS | 0x7e | Y | N | Y | 1 | F | L2I speculative aborts
L2I_SNOOP_HITS | 0x7f | Y | N | Y | 1 | C | L2I snoop hits
BUS_RD_HIT | 0x80 | N | N | N | 1 | S | Bus Read Hit Clean Non-local Cache Transactions
BUS_RD_HITM | 0x81 | N | N | N | 1 | S | Bus Read Hit Modified Non-local Cache Transactions
BUS_RD_INVAL_HITM | 0x82 | N | N | N | 1 | S | Bus BIL Transaction Results in HITM
BUS_RD_INVAL_BST_HITM | 0x83 | N | N | N | 1 | S | Bus BRIL Transaction Results in HITM
BUS_HITM | 0x84 | N | N | N | 1 | S | Bus Hit Modified Line Transactions
BUS_ALL | 0x87 | N | N | N | 1 | S | Bus Transactions
BUS_DATA_CYCLE | 0x88 | N | N | N | 1 | C | Valid data cycle on the Bus
MEM_READ_CURRENT | 0x89 | N | N | N | 1 | C | Current Mem Read Transactions On Bus
BUS_MEMORY | 0x8a | N | N | N | 1 | S | Bus Memory Transactions


Table 4-42. All Performance Monitors Ordered by Code (Sheet 4 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

BUS_MEM_READ | 0x8b | N | N | N | 1 | S | Full Cache line D/I memory RD, RD invalidate, and BRIL
BUS_RD_DATA | 0x8c | N | N | N | 1 | S | Bus Read Data Transactions
BUS_RD_PRTL | 0x8d | N | N | N | 1 | S | Bus Read Partial Transactions
SI_CYCLES | 0x8e | N | N | N | 1 | C | SI clock cycles
BUS_SNOOP_STALL_CYCLES | 0x8f | N | N | N | 1 | F | Bus Snoop Stall Cycles (from any agent)
BUS_IO | 0x90 | N | N | N | 1 | S | IA-32 Compatible IO Bus Transactions
BUS_RD_IO | 0x91 | N | N | N | 1 | S | IA-32 Compatible IO Read Transactions
BUS_WR_WB | 0x92 | N | N | N | 1 | S | Bus Write Back Transactions
BUS_B2B_DATA_CYCLES | 0x93 | N | N | N | 1 | C | Back-to-back bursts of data
SI_IOQ_LIVE_REQ_LO | 0x97 | N | N | N | 7 | C | Inorder Bus Queue Requests (three least-significant bits of the 4-bit outstanding IOQ request count)
SI_IOQ_LIVE_REQ_HI | 0x98 | N | N | N | 1 | C | Inorder Bus Queue Requests (most-significant bit of the 4-bit outstanding IOQ request count)
SI_L3T_TRACE_CACHE | 0x9d | N | N | N | n/a | F | SI PMU wires are used for trace cache data
SI_RQ_INSERTS | 0x9e | N | N | N | 2 | S | SI request queue inserts
SI_RQ_LIVE_REQ_LO | 0x9f | N | N | N | 7 | C | SI request queue live requests (least-significant three bits)
SI_RQ_LIVE_REQ_HI | 0xa0 | N | N | N | 1 | C | SI request queue live requests (most-significant bit)
SI_WRITEQ_INSERTS | 0xa1 | N | N | N | 2 | S | SI write queue inserts
SI_WRITEQ_LIVE_REQ_LO | 0xa2 | N | N | N | 7 | C | SI write queue live requests (least-significant three bits)
SI_WRITEQ_LIVE_REQ_HI | 0xa3 | N | N | N | 1 | C | SI write queue live requests (most-significant bit)
SI_WAQ_COLLISIONS | 0xa4 | N | N | N | 1 | C | SI write address queue collisions (incoming FSB snoop collides with an entry in WAQ)
SI_CCQ_INSERTS | 0xa5 | N | N | N | 2 | S | SI clean castout queue inserts
SI_CCQ_LIVE_REQ_LO | 0xa6 | N | N | N | 7 | C | SI clean castout queue live requests (least-significant three bits)
SI_CCQ_LIVE_REQ_HI | 0xa7 | N | N | N | 1 | C | SI clean castout queue live requests (most-significant bit)
SI_CCQ_COLLISIONS | 0xa8 | N | N | N | 1 | C | SI clean castout queue collisions (incoming FSB snoop collides with an entry in CCQ)
SI_IOQ_COLLISIONS | 0xaa | N | N | N | 1 | C | SI inorder queue collisions (outgoing transaction collides with an entry in IOQ)


Table 4-42. All Performance Monitors Ordered by Code (Sheet 5 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

SI_SCB_INSERTS | 0xab | N | N | N | 1 | C | SI snoop coalescing buffer inserts
SI_SCB_LIVE_REQ_LO | 0xac | N | N | N | 7 | C | SI snoop coalescing buffer live requests (least-significant three bits)
SI_SCB_LIVE_REQ_HI | 0xad | N | N | N | 1 | C | SI snoop coalescing buffer live requests (most-significant bit)
SI_SCB_SIGNOFFS | 0xae | N | N | N | 1 | C | SI snoop coalescing buffer coherency signoffs
SI_WDQ_ECC_ERRORS | 0xaf | N | N | N | 1 | C | SI write data queue ECC errors
L2D_INSERT_MISSES | 0xb0 | N | N | N | 4 | F | Count number of times an inserting data request missed in the L2D
L2D_INSERT_HITS | 0xb1 | N | N | N | 4 | F | Count number of times an inserting data request hit in the L2D
ER_MEM_READ_OUT_HI | 0xb4 | N | N | N | 2 | F | Outstanding memory RD transactions
ER_MEM_READ_OUT_LO | 0xb5 | N | N | N | 7 | F | Outstanding memory RD transactions
ER_SNOOPQ_REQ_HI | 0xb6 | N | N | N | 1 | C | ER Snoop Queue Requests (most-significant bit of 4-bit count)
ER_SNOOPQ_REQ_LO | 0xb7 | N | N | N | 7 | C | ER Snoop Queue Requests (least-significant three bits of 4-bit count)
ER_BRQ_LIVE_REQ_HI | 0xb8 | N | N | N | 2 | C | BRQ Live Requests (two most-significant bits of the 5-bit outstanding BRQ request count)
ER_BRQ_LIVE_REQ_LO | 0xb9 | N | N | N | 7 | C | BRQ Live Requests (three least-significant bits of the 5-bit outstanding BRQ request count)
ER_BRQ_REQ_INSERTED | 0xba | N | N | N | 1 | F | BRQ Requests Inserted
ER_BKSNP_ME_ACCEPTED | 0xbb | N | N | N | 1 | C | BacksnoopMe requests accepted into the BRQ from the L2d (used by the L2d to get itself out of potential forward progress situations)
ER_REJECT_ALL_L1_REQ | 0xbc | N | N | N | 1 | C | Number of cycles in which the BRQ was rejecting all L1i/L1d requests (for the "Big Hammer" forward progress logic)
ER_REJECT_ALL_L1D_REQ | 0xbd | N | N | N | 1 | C | Number of cycles in which the BRQ was rejecting all L1d requests (for L1d/L1i forward progress)
ER_REJECT_ALL_L1I_REQ | 0xbe | N | N | N | 1 | C | Number of cycles in which the BRQ was rejecting all L1i requests (for L1d/L1i forward progress)
L1DTLB_TRANSFER | 0xc0 | Y | Y | Y | 1 | A | L1DTLB misses that hit in the L2DTLB for accesses counted in L1D_READS
L2DTLB_MISSES | 0xc1 | Y | Y | Y | 4 | A | L2DTLB Misses
L1D_READS_SET0 | 0xc2 | Y | Y | Y | 2 | A | L1 Data Cache Reads


Table 4-42. All Performance Monitors Ordered by Code (Sheet 6 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

DATA_REFERENCES_SET0 | 0xc3 | Y | Y | Y | 4 | A | Data memory references issued to memory pipeline
L1D_READS_SET1 | 0xc4 | Y | Y | Y | 2 | A | L1 Data Cache Reads
DATA_REFERENCES_SET1 | 0xc5 | Y | Y | Y | 4 | A | Data memory references issued to memory pipeline
DATA_DEBUG_REGISTER_MATCHES | 0xc6 | Y | Y | Y | 1 | A | Data debug register matches data address of memory reference
L1D_READ_MISSES | 0xc7 | Y | Y | Y | 2 | A | L1 Data Cache Read Misses
DATA_EAR_EVENTS | 0xc8 | Y | Y | Y | 1 | F | L1 Data Cache EAR Events
DTLB_INSERTS_HPW | 0xc9 | Y | Y | Y | 4 | F | Hardware Page Walker inserts to DTLB
BE_L1D_FPU_BUBBLE | 0xca | N | N | N | 1 | A | Full pipe bubbles in main pipe due to FP or L1 dcache
L2D_MISSES | 0xcb | Y | Y | Y | 1 | F | An L2D miss has been issued to the L3; does not include secondary misses
LOADS_RETIRED | 0xcd | Y | Y | Y | 4 | A | Retired Loads
MISALIGNED_LOADS_RETIRED | 0xce | Y | Y | Y | 4 | A | Retired Misaligned Load Instructions
UC_LOADS_RETIRED | 0xcf | Y | Y | Y | 4 | A | Retired Uncacheable Loads
UC_STORES_RETIRED | 0xd0 | Y | Y | Y | 2 | A | Retired Uncacheable Stores
STORES_RETIRED | 0xd1 | Y | Y | Y | 2 | A | Retired Stores
MISALIGNED_STORES_RETIRED | 0xd2 | Y | Y | Y | 2 | A | Retired Misaligned Store Instructions
LOADS_RETIRED_INTG | 0xd8 | Y | Y | Y | 2 | A | Integer Loads retired
SPEC_LOADS_NATTED | 0xd9 | Y | Y | Y | 2 | A | Speculative loads NaT'd
L3_INSERTS | 0xda | Y | Y | Y | 1 | F | L3 Inserts (allocations)
L3_REFERENCES | 0xdb | Y | Y | Y | 1 | F | L3 References
L3_MISSES | 0xdc | Y | Y | Y | 1 | F | L3 Misses
L3_READS | 0xdd | Y | Y | Y | 1 | F | L3 Reads
L3_WRITES | 0xde | Y | Y | Y | 1 | F | L3 Writes
L3_LINES_REPLACED | 0xdf | N | N | N | 1 | F | L3 Cache Lines Replaced
L2D_OZQ_CANCELS0 | 0xe0 | Y | Y | Y | 4 | F | L2D OZQ cancels (Set 0)
L2D_OZQ_CANCELS1 | 0xe2 | Y | Y | Y | 4 | F | L2D OZQ cancels (Set 1)
L2D_OZQ_FULL | 0xe1, 0xe3 | N | N | N | 1 | F | L2D OZQ is full
L2D_BYPASS | 0xe4 | Y | Y | Y | 4 | F | Count bypasses
L2D_OZQ_RELEASE | 0xe5 | N | N | N | 1 | F | Clocks with release ordering attribute existed in L2D OZQ
L2D_REFERENCES | 0xe6 | Y | Y | Y | 4 | F | Data read/write access to L2D
L2D_L3ACCESS_CANCEL | 0xe8 | Y | Y | Y | 1 | F | Canceled L3 accesses


Table 4-42. All Performance Monitors Ordered by Code (Sheet 7 of 7)

Symbol Name | Event Code | IAR | DAR | OPC | Max Inc/Cyc | Type | Description

L2D_OZDB_FULL | 0xe9 | N | N | N | 1 | F | L2D OZ data buffer is full
L2D_FORCE_RECIRC | 0xea | Y | Y | Y | 4 | F | Forced recirculates
L2D_ISSUED_RECIRC_OZQ_ACC | 0xeb | Y | Y | Y | 1 | F | Counts number of attempted OZQ recirculates back to L1D
L2D_BAD_LINES_SELECTED | 0xec | Y | Y | Y | 4 | F | Valid line replaced when invalid line is available
L2D_STORE_HIT_SHARED | 0xed | Y | Y | Y | 2 | F | Store hit a shared line
TAGGED_L2D_RETURN_PORT | 0xee | Y | Y | Y | 1 | F | Tagged L2D Return Ports 0-3
L2D_OZQ_ACQUIRE | 0xef | N | N | N | 1 | F | Clocks with acquire ordering attribute existed in L2D OZQ
L2D_OPS_ISSUED | 0xf0 | Y | Y | Y | 4 | F | Different operations issued by L2D
L2D_FILLB_FULL | 0xf1 | N | N | N | 1 | F | L2D Fill buffer is full
L2D_FILL_MESI_STATE | 0xf2 | Y | Y | Y | 1 | F | MESI state of L2D fills
L2D_VICTIMB_FULL | 0xf3 | N | N | N | 1 | F | L2D victim buffer is full

4.15 Performance Monitor Event List

This section enumerates Montecito processor performance monitoring events.

NOTE: Events that can be constrained by an Instruction Address Range can only be constrained by IBRP0 unless otherwise noted.

ALAT_CAPACITY_MISS • Title: ALAT Entry Replaced • Category: Instruction Execution IAR/DAR/OPC: Y/Y/Y • Event Code: 0x58, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Provides information on the number of times an advanced load (ld.a, ldfp.a, ld.sa, or ldfp.sa) or missing chk.a displaced a valid entry in the ALAT which did not have the same register id, or replaced the last one to two invalid entries.

Table 4-43. Unit Masks for ALAT_CAPACITY_MISS

Extension | PMC.umask[19:16] | Description

--- | bxx00 | (* nothing will be counted *)
INT | bxx01 | only integer instructions
FP | bxx10 | only floating-point instructions
ALL | bxx11 | both integer and floating-point instructions


BACK_END_BUBBLE • Title: Full Pipe Bubbles in Main Pipe • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x00, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of full-pipe bubbles in the main pipe stalled due to any of 5 possible events (FPU/L1D, RSE, EXE, branch/exception or the front-end). The event's unit mask further constrains this event and allows for some detail in order to facilitate collecting all information with four counters. • NOTE: During a thread switch, a banked counter will encounter a single "dead" cycle before being placed into the background with the thread it belongs to. If monitored in a banked counter, and this event occurs during that "dead" cycle, the event will be dropped. For this reason, a monitored .all version of this event may not quite equal the sum of the component .me versions.

Table 4-44. Unit Masks for BACK_END_BUBBLE

Extension | PMC.umask[19:16] | Description

ALL | bxx00 | Front-end, RSE, EXE, FPU/L1D stall or a pipeline flush due to an exception/branch misprediction
FE | bxx01 | front-end
L1D_FPU_RSE | bxx10 |
--- | bxx11 | (* nothing will be counted *)

BE_BR_MISPRED_DETAIL • Title: Back-end Branch Misprediction Detail • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x61, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of branches retired based on the prediction result: back-end mispredictions of stg, rot, or pfs. These predictions are per bundle rather than per branch. • NOTE: These events are counted only if there are no path mispredictions associated with branches, because a path misprediction guarantees a stg/rot/pfs misprediction.

Table 4-45. Unit Masks for BE_BR_MISPRED_DETAIL

Extension | PMC.umask[19:16] | Description

ANY | bxx00 | any back-end mispredictions
STG | bxx01 | only back-end stage mispredictions
ROT | bxx10 | only back-end rotate mispredictions
PFS | bxx11 | only back-end pfs mispredictions for taken branches


BE_EXE_BUBBLE • Title: Full Pipe Bubbles in Main Pipe due to Execution Unit Stalls • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x02, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of full-pipe bubbles in the main pipe due to stalls caused by the Execution Unit. • NOTE: The different causes for this event are not prioritized because there is no need to do so (the causes are independent; when several of them fire at the same time, they all should be counted).

Table 4-46. Unit Masks for BE_EXE_BUBBLE

Extension | PMC.umask[19:16] | Description

ALL | b0000 | Back-end was stalled by exe
GRALL | b0001 | Back-end was stalled by exe due to GR/GR or GR/load dependency
FRALL | b0010 | Back-end was stalled by exe due to FR/FR or FR/load dependency
PR | b0011 | Back-end was stalled by exe due to PR dependency
ARCR | b0100 | Back-end was stalled by exe due to AR or CR dependency
GRGR | b0101 | Back-end was stalled by exe due to GR/GR dependency
CANCEL | b0110 | Back-end was stalled by exe due to a canceled load
BANK_SWITCH | b0111 | Back-end was stalled by exe due to bank switching
ARCR_PR_CANCEL_BANK | b1000 | ARCR, PR, CANCEL or BANK_SWITCH
--- | b1001-b1111 | (* nothing will be counted *)

BE_FLUSH_BUBBLE • Title: Full Pipe Bubbles in Main Pipe due to Flushes • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x04, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of full-pipe bubbles in the main pipe due to flushes. • NOTE: XPN is higher priority than BRU. During a thread switch, a banked counter will encounter a single "dead" cycle before being placed into the background with the thread it belongs to. If monitored in a banked counter, and this event occurs during that "dead" cycle, the event will be dropped. For this reason, a monitored .all version of this event may not quite equal the sum of the component .me versions.

Table 4-47. Unit Masks for BE_FLUSH_BUBBLE

Extension | PMC.umask[19:16] | Description

ALL | bxx00 | Back-end was stalled due to either an exception/interruption or branch misprediction flush
BRU | bxx01 | Back-end was stalled due to a branch misprediction flush


XPN | bxx10 | Back-end was stalled due to an exception/interruption flush. This includes flush cycles for thread switch.
--- | bxx11 | (* nothing will be counted *)

BE_L1D_FPU_BUBBLE • Title: Full Pipe Bubbles in Main Pipe due to FP or L1D Cache • Category: Stall Events/L1D Cache Set 2 IAR/DAR/OPC: N/N/N • Event Code: 0xca, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of full-pipe bubbles in the main pipe due to stalls caused by either the floating-point unit or the L1D cache. • NOTE: This is a restricted set 2 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. The different causes for this event are not prioritized because there is no need to do so (the causes are independent; when several of them fire at the same time, they all should be counted).

Table 4-48. Unit Masks for BE_L1D_FPU_BUBBLE (Sheet 1 of 2)

Extension | PMC.umask[19:16] | Description

ALL | b0000 | Back-end was stalled by L1D or FPU
FPU | b0001 | Back-end was stalled by FPU
L1D | b0010 | Back-end was stalled by L1D. This includes all stalls caused by the L1 pipeline (created in the L1D stage of the L1 pipeline, which corresponds to the DET stage of the main pipe).
L1D_FULLSTBUF | b0011 | Back-end was stalled by L1D due to store buffer being full
L1D_PIPE_RECIRC | b0100 | Back-end was stalled by L1D due to a recirculate
L1D_HPW | b0101 | Back-end was stalled by L1D due to Hardware Page Walker
--- | b0110 | (* count is undefined *)
L1D_FILLCONF | b0111 | Back-end was stalled by L1D due to a store in conflict with a returning fill
L1D_AR_CR | b1000 | Back-end was stalled by L1D due to ar/cr requiring a stall
L1D_L2BPRESS | b1001 | Back-end was stalled by L1D due to L2D back pressure
L1D_TLB | b1010 | Back-end was stalled by L1D due to L2DTLB to L1DTLB transfer
L1D_LDCONF | b1011 | Back-end was stalled by L1D due to architectural ordering conflict
L1D_LDCHK | b1100 | Back-end was stalled by L1D due to load check ordering conflict
L1D_NAT | b1101 | Back-end was stalled by L1D due to L1D data return needing recirculated NaT generation


Table 4-48. Unit Masks for BE_L1D_FPU_BUBBLE (Sheet 2 of 2)

Extension | PMC.umask[19:16] | Description

L1D_STBUFRECIR | b1110 | Back-end was stalled by L1D due to store buffer cancel needing recirculate
L1D_NATCONF | b1111 | Back-end was stalled by L1D due to ld8.fill conflict with st8.spill not written to unat

BE_LOST_BW_DUE_TO_FE • Title: Invalid Bundles if BE Not Stalled for Other Reasons • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x72, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of invalid bundles at the exit from the Instruction Buffer, but only if the back-end is not stalled for other reasons. • NOTE: Causes for lost bandwidth are prioritized in the following order, from high to low, for this event: FEFLUSH, TLBMISS, IMISS, PLP, BR_ILOCK, BRQ, BI, FILL_RECIRC, BUBBLE, IBFULL, UNREACHED. The prioritization implies that when several stall conditions exist at the same time, only the highest priority one will be counted. There are two cases where a bundle is considered "unreachable": when bundle 0 contains a taken branch, or when bundle 0 is invalid but has IP[4] set to 1, bundle 1 will not be reached.

Table 4-49. Unit Masks for BE_LOST_BW_DUE_TO_FE

Extension | PMC.umask[19:16] | Description

ALL | b0000 | count regardless of cause
FEFLUSH | b0001 | only if caused by a front-end flush
--- | b0010 | (* illegal selection *)
--- | b0011 | (* illegal selection *)
UNREACHED | b0100 | only if caused by unreachable bundle
IBFULL | b0101 | (* meaningless for this event *)
IMISS | b0110 | only if caused by instruction cache miss stall
TLBMISS | b0111 | only if caused by TLB stall
FILL_RECIRC | b1000 | only if caused by a recirculate for a cache line fill operation
BI | b1001 | only if caused by branch initialization stall
BRQ | b1010 | only if caused by branch retirement queue stall
PLP | b1011 | only if caused by perfect loop prediction stall
BR_ILOCK | b1100 | only if caused by branch interlock stall
BUBBLE | b1101 | only if caused by branch resteer bubble stall
--- | b1110-b1111 | (* illegal selection *)


BE_RSE_BUBBLE • Title: Full Pipe Bubbles in Main Pipe due to RSE Stalls • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x01, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of full-pipe bubbles in the main pipe due to stalls caused by the Register Stack Engine. • NOTE: AR_DEP has a higher priority than OVERFLOW, UNDERFLOW and LOADRS. However, this is the only prioritization implemented. In order to count OVERFLOW, UNDERFLOW or LOADRS, AR_DEP must be false.

Table 4-50. Unit Masks for BE_RSE_BUBBLE

Extension | PMC.umask[19:16] | Description

ALL | bx000 | Back-end was stalled by RSE
BANK_SWITCH | bx001 | Back-end was stalled by RSE due to bank switching
AR_DEP | bx010 | Back-end was stalled by RSE due to AR dependencies
OVERFLOW | bx011 | Back-end was stalled by RSE due to need to spill
UNDERFLOW | bx100 | Back-end was stalled by RSE due to need to fill
LOADRS | bx101 | Back-end was stalled by RSE due to loadrs calculations
--- | bx110-bx111 | (* nothing will be counted *)

BR_MISPRED_DETAIL • Title: FE Branch Mispredict Detail • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x5b, Max. Inc/Cyc: 3, MT Capture Type: A • Definition: Counts the number of branches retired. All 16 values for PMC.umask are valid in order to provide information based on prediction result (mispredicted path or target address by front-end), and branch type.

Table 4-51. Unit Masks for BR_MISPRED_DETAIL

Extension | PMC.umask[19:16] | Description

ALL.ALL_PRED | b0000 | All branch types, regardless of prediction result
ALL.CORRECT_PRED | b0001 | All branch types, correctly predicted branches (outcome and target)
ALL.WRONG_PATH | b0010 | All branch types, mispredicted branches due to wrong branch direction
ALL.WRONG_TARGET | b0011 | All branch types, mispredicted branches due to wrong target for taken branches
IPREL.ALL_PRED | b0100 | Only IP relative branches, regardless of prediction result
IPREL.CORRECT_PRED | b0101 | Only IP relative branches, correctly predicted branches (outcome and target)
IPREL.WRONG_PATH | b0110 | Only IP relative branches, mispredicted branches due to wrong branch direction


IPREL.WRONG_TARGET | b0111 | Only IP relative branches, mispredicted branches due to wrong target for taken branches
RETURN.ALL_PRED | b1000 | Only return type branches, regardless of prediction result
RETURN.CORRECT_PRED | b1001 | Only return type branches, correctly predicted branches (outcome and target)
RETURN.WRONG_PATH | b1010 | Only return type branches, mispredicted branches due to wrong branch direction
RETURN.WRONG_TARGET | b1011 | Only return type branches, mispredicted branches due to wrong target for taken branches
NRETIND.ALL_PRED | b1100 | Only non-return indirect branches, regardless of prediction result
NRETIND.CORRECT_PRED | b1101 | Only non-return indirect branches, correctly predicted branches (outcome and target)
NRETIND.WRONG_PATH | b1110 | Only non-return indirect branches, mispredicted branches due to wrong branch direction
NRETIND.WRONG_TARGET | b1111 | Only non-return indirect branches, mispredicted branches due to wrong target for taken branches

BR_MISPRED_DETAIL2 • Title: FE Branch Mispredict Detail (Unknown Path Component) • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x68, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: This event accompanies the BR_MISPRED_DETAIL event and is broken down by prediction result and branch type. • NOTE: For accurate misprediction counts, the following measurement must be taken: BR_MISPRED_DETAIL.[umask] - BR_MISPRED_DETAIL2.[umask]. By performing this calculation for every umask, one can obtain a true value for the BR_MISPRED_DETAIL event (see the sketch following Table 4-52).

Table 4-52. Unit Masks for BR_MISPRED_DETAIL2 (Sheet 1 of 2)

Extension | PMC.umask[19:16] | Description

ALL.ALL_UNKNOWN_PRED | b0000 | All branch types, branches with unknown path prediction
ALL.UNKNOWN_PATH_CORRECT_PRED | b0001 | All branch types, branches with unknown path prediction and correctly predicted branch (outcome & target)
ALL.UNKNOWN_PATH_WRONG_PATH | b0010 | All branch types, branches with unknown path prediction and wrong branch direction
--- | b0011 | (* nothing will be counted *)
IPREL.ALL_UNKNOWN_PRED | b0100 | Only IP relative branches, branches with unknown path prediction
IPREL.UNKNOWN_PATH_CORRECT_PRED | b0101 | Only IP relative branches, branches with unknown path prediction and correctly predicted branch (outcome & target)


Table 4-52. Unit Masks for BR_MISPRED_DETAIL2 (Sheet 2 of 2)

Extension | PMC.umask[19:16] | Description

IPREL.UNKNOWN_PATH_WRONG_PATH | b0110 | Only IP relative branches, branches with unknown path prediction and wrong branch direction
--- | b0111 | (* nothing will be counted *)
RETURN.ALL_UNKNOWN_PRED | b1000 | Only return type branches, branches with unknown path prediction
RETURN.UNKNOWN_PATH_CORRECT_PRED | b1001 | Only return type branches, branches with unknown path prediction and correctly predicted branch (outcome & target)
RETURN.UNKNOWN_PATH_WRONG_PATH | b1010 | Only return type branches, branches with unknown path prediction and wrong branch direction
--- | b1011 | (* nothing will be counted *)
NRETIND.ALL_UNKNOWN_PRED | b1100 | Only non-return indirect branches, branches with unknown path prediction
NRETIND.UNKNOWN_PATH_CORRECT_PRED | b1101 | Only non-return indirect branches, branches with unknown path prediction and correctly predicted branch (outcome & target)
NRETIND.UNKNOWN_PATH_WRONG_PATH | b1110 | Only non-return indirect branches, branches with unknown path prediction and wrong branch direction
--- | b1111 | (* nothing will be counted *)
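The correction described in the BR_MISPRED_DETAIL2 note above is a per-umask subtraction. The following C fragment is a minimal sketch of that calculation; the arrays are illustrative placeholders, indexed by the 16 umask encodings, and are assumed to hold counts collected over the same interval.

    /* Minimal sketch: per-umask correction of BR_MISPRED_DETAIL using
       BR_MISPRED_DETAIL2, per the NOTE above. */
    #include <stdio.h>

    int main(void)
    {
        unsigned long detail[16]  = {0}; /* BR_MISPRED_DETAIL.[umask] */
        unsigned long detail2[16] = {0}; /* BR_MISPRED_DETAIL2.[umask] */

        for (int m = 0; m < 16; m++) {
            /* The same umask must be selected for both events. */
            long corrected = (long)detail[m] - (long)detail2[m];
            printf("umask 0x%x: corrected count = %ld\n", m, corrected);
        }
        return 0;
    }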

BR_PATH_PRED • Title: FE Branch Path Prediction Detail • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x54, Max. Inc/Cyc: 3, MT Capture Type: A • Definition: Counts the number of branches retired based on branch direction (taken/not taken), branch prediction and branch type. All 16 values for PMC.umask are valid.

Table 4-53. Unit Masks for BR_PATH_PRED (Sheet 1 of 2)

Extension | PMC.umask[19:16] | Description

ALL.MISPRED_NOTTAKEN | b0000 | All branch types, incorrectly predicted path and not taken branch
ALL.MISPRED_TAKEN | b0001 | All branch types, incorrectly predicted path and taken branch
ALL.OKPRED_NOTTAKEN | b0010 | All branch types, correctly predicted path and not taken branch
ALL.OKPRED_TAKEN | b0011 | All branch types, correctly predicted path and taken branch
IPREL.MISPRED_NOTTAKEN | b0100 | Only IP relative branches, incorrectly predicted path and not taken branch
IPREL.MISPRED_TAKEN | b0101 | Only IP relative branches, incorrectly predicted path and taken branch
IPREL.OKPRED_NOTTAKEN | b0110 | Only IP relative branches, correctly predicted path and not taken branch
IPREL.OKPRED_TAKEN | b0111 | Only IP relative branches, correctly predicted path and taken branch


Table 4-53. Unit Masks for BR_PATH_PRED (Sheet 2 of 2)

Extension | PMC.umask[19:16] | Description

RETURN.MISPRED_NOTTAKEN | b1000 | Only return type branches, incorrectly predicted path and not taken branch
RETURN.MISPRED_TAKEN | b1001 | Only return type branches, incorrectly predicted path and taken branch
RETURN.OKPRED_NOTTAKEN | b1010 | Only return type branches, correctly predicted path and not taken branch
RETURN.OKPRED_TAKEN | b1011 | Only return type branches, correctly predicted path and taken branch
NRETIND.MISPRED_NOTTAKEN | b1100 | Only non-return indirect branches, incorrectly predicted path and not taken branch
NRETIND.MISPRED_TAKEN | b1101 | Only non-return indirect branches, incorrectly predicted path and taken branch
NRETIND.OKPRED_NOTTAKEN | b1110 | Only non-return indirect branches, correctly predicted path and not taken branch
NRETIND.OKPRED_TAKEN | b1111 | Only non-return indirect branches, correctly predicted path and taken branch

BR_PATH_PRED2 • Title: FE Branch Path Prediction Detail (Unknown Pred Component) • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x6a, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: This event goes with the BR_PATH_PRED event. • NOTE: When there is more than one branch in a bundle and one is predicted as taken, all the higher-numbered ports are forced to a predicted-not-taken mode without actually knowing their true prediction. The true OKPRED_NOTTAKEN predicted path information can be obtained by calculating: BR_PATH_PRED.[branch type].OKPRED_NOTTAKEN - BR_PATH_PRED2.[branch type].UNKNOWNPRED_NOTTAKEN, using the same "branch type" (ALL, IPREL, RETURN, NRETIND) specified for both events. Similarly, the true MISPRED_TAKEN predicted path information can be obtained by calculating: BR_PATH_PRED.[branch type].MISPRED_TAKEN - BR_PATH_PRED2.[branch type].UNKNOWNPRED_TAKEN, using the same "branch type" selected for both events.

Table 4-54. Unit Masks for BR_PATH_PRED2 (Sheet 1 of 2)

Extension | PMC.umask[19:16] | Description

ALL.UNKNOWNPRED_NOTTAKEN | b00x0 | All branch types, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)
ALL.UNKNOWNPRED_TAKEN | b00x1 | All branch types, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)
IPREL.UNKNOWNPRED_NOTTAKEN | b01x0 | Only IP relative branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)


Table 4-54. Unit Masks for BR_PATH_PRED2 (Sheet 2 of 2)

Extension | PMC.umask[19:16] | Description

IPREL.UNKNOWNPRED_TAKEN | b01x1 | Only IP relative branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)
RETURN.UNKNOWNPRED_NOTTAKEN | b10x0 | Only return type branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)
RETURN.UNKNOWNPRED_TAKEN | b10x1 | Only return type branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)
NRETIND.UNKNOWNPRED_NOTTAKEN | b11x0 | Only non-return indirect branches, unknown predicted path and not taken branch (which impacts OKPRED_NOTTAKEN)
NRETIND.UNKNOWNPRED_TAKEN | b11x1 | Only non-return indirect branches, unknown predicted path and taken branch (which impacts MISPRED_TAKEN)

BUS_ALL • Title: Bus Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x87, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus transactions.

Table 4-55. Unit Masks for BUS_ALL

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_B2B_DATA_CYCLES • Title: Back to Back Data Cycles on the Bus • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x93, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Counts the number of bus clocks during which valid data cycles were driven back-to-back on the bus.


Table 4-56. Unit Masks for BUS_B2B_DATA_CYCLES

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_DATA_CYCLE • Title: Valid Data Cycle on the Bus • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x88, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Counts the number of BUS Clocks which had a valid data cycle on the bus.

Table 4-57. Unit Masks for BUS_DATA_CYCLE

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_HITM • Title: Bus Hit Modified Line Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x84, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of transactions with HITM asserted (i.e., the transaction was satisfied by some other processor's modified line). • NOTE: This is equivalent to: BUS_RD_INVAL_BST_HITM + BUS_RD_HITM

Table 4-58. Unit Masks for BUS_HITM

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)


BUS_IO • Title: IA-32 Compatible IO Bus Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x90, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of IA-32 I/O transactions.

Table 4-59. Unit Masks for BUS_IO

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_MEMORY • Title: Bus Memory Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x8a, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus memory transactions (i.e., memory-read-invalidate, reserved-memory-read, memory-read, and memory-write transactions).

Table 4-60. Unit Masks for BUS_MEMORY

Extension | PMC.umask[19:16] | Description

--- | b00xx | (* nothing will be counted *)
EQ_128BYTE.EITHER | b0100 | number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from either local processor
EQ_128BYTE.IO | b0101 | number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from non-CPU priority agents
EQ_128BYTE.SELF | b0110 | number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from 'this' processor
EQ_128BYTE.ANY | b0111 | number of full cache line transactions (BRL, BRIL, BWL, BRC, BCR, BCCL) from CPU or non-CPU (all transactions)
LT_128BYTE.EITHER | b1000 | number of less than full cache line transactions (BRP, BWP, BIL) from either local processor
LT_128BYTE.IO | b1001 | number of less than full cache line transactions (BRP, BWP, BIL) from non-CPU priority agents
LT_128BYTE.SELF | b1010 | number of less than full cache line transactions (BRP, BWP, BIL) from 'this' processor
LT_128BYTE.ANY | b1011 | number of less than full cache line transactions (BRP, BWP, BIL) from CPU or non-CPU (all transactions)
ALL.EITHER | b1100 | All bus transactions from either local processor
ALL.IO | b1101 | All bus transactions from non-CPU priority agents


ALL.SELF | b1110 | All bus transactions from 'this' local processor
ALL.ANY | b1111 | All bus transactions from CPU or non-CPU (all transactions)

BUS_MEM_READ • Title: Full Cache Line D/I Memory RD, RD Invalidate, and BRIL • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x8b, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of full cache-line (128-byte) data/code memory read (BRL), full cache-line memory read-invalidate (BRIL), and 0-byte memory read-invalidate (BIL) transactions.

Table 4-61. Unit Masks for BUS_MEM_READ

Extension | PMC.umask[19:16] | Description

BIL.EITHER | b0000 | Number of BIL 0-byte memory read invalidate transactions from either local processor
BIL.IO | b0001 | Number of BIL 0-byte memory read invalidate transactions from non-CPU priority agents
BIL.SELF | b0010 | Number of BIL 0-byte memory read invalidate transactions from 'this' processor
BIL.ANY | b0011 | Number of BIL 0-byte memory read invalidate transactions from CPU or non-CPU (all transactions)
BRL.EITHER | b0100 | Number of full cache line memory read transactions from either local processor
BRL.IO | b0101 | Number of full cache line memory read transactions from non-CPU priority agents
BRL.SELF | b0110 | Number of full cache line memory read transactions from 'this' processor
BRL.ANY | b0111 | Number of full cache line memory read transactions from CPU or non-CPU (all transactions)
BRIL.EITHER | b1000 | Number of full cache line memory read invalidate transactions from either local processor
BRIL.IO | b1001 | Number of full cache line memory read invalidate transactions from non-CPU priority agents
BRIL.SELF | b1010 | Number of full cache line memory read invalidate transactions from 'this' processor
BRIL.ANY | b1011 | Number of full cache line memory read invalidate transactions from CPU or non-CPU (all transactions)
ALL.EITHER | b1100 | All memory read transactions from either local processor
ALL.IO | b1101 | All memory read transactions from non-CPU priority agents
ALL.SELF | b1110 | All memory read transactions from 'this' processor
ALL.ANY | b1111 | All memory read transactions from CPU or non-CPU (all transactions)


BUS_RD_DATA • Title: Bus Read Data Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x8c, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of full-cache-line (128-byte) data memory read transactions (BRL).

Table 4-62. Unit Masks for BUS_RD_DATA

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_HIT • Title: Bus Read Hit Clean Non-local Cache Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x80, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus reads that hit a clean line in another processor's cache (implies HIT and BRL).

Table 4-63. Unit Masks for BUS_RD_HIT

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_HITM • Title: Bus Read Hit Modified Non-local Cache Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x81, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus reads that hit a modified line in another processor's cache (implies HITM and BRL).


Table 4-64. Unit Masks for BUS_RD_HITM

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_INVAL_BST_HITM • Title: Bus BRIL Transaction Results in HITM • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x83, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus read invalidate line transactions (implies BRIL and HITM) which are satisfied from a remote processor only.

Table 4-65. Unit Masks for BUS_RD_INVAL_BST_HITM

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_INVAL_HITM • Title: Bus BIL Transaction Results in HITM • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x82, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus read invalidated line transactions for which HITM was asserted (implies BIL and HITM) and the transaction was satisfied from another processor's cache.


Table 4-66. Unit Masks for BUS_RD_INVAL_HITM

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_IO • Title: IA-32 Compatible IO Read Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x91, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of IA-32 I/O read transactions.

Table 4-67. Unit Masks for BUS_RD_IO

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_RD_PRTL • Title: Bus Read Partial Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x8d, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of less-than-full-cache-line (0-, 8-, 16-, 32-, and 64-byte) memory read transactions (BRP).

Table 4-68. Unit Masks for BUS_RD_PRTL

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
IO | bxx01 | transactions initiated by non-CPU priority agents
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)


BUS_SNOOP_STALL_CYCLES • Title: Bus Snoop Stall Cycles (from any agent) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x8f, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of bus clocks during which the FSB is stalled for a snoop.

Table 4-69. Unit Masks for BUS_SNOOP_STALL_CYCLES

Extension | PMC.umask[19:16] | Description

EITHER | bxx00 | transactions initiated by either cpu core
--- | bxx01 | (* illegal selection *)
SELF | bxx10 | transactions initiated by 'this' cpu core
ANY | bxx11 | CPU or non-CPU (all transactions)

BUS_WR_WB • Title: Bus Write Back Transactions • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0x92, Max. Inc/Cyc: 1, MT Capture Type: S • Definition: Counts the number of write-back memory write transactions (BWL writes due to M-state line write-backs and coalesced writes).

Table 4-70. Unit Masks for BUS_WR_WB (Sheet 1 of 2)

Extension | PMC.umask[19:16] | Description

--- | b00xx | (* nothing will be counted *)
EQ_128BYTE.EITHER | b0100 | either local processor / only cache line transactions with write back or write coalesce attributes will be counted
EQ_128BYTE.IO | b0101 | non-CPU priority agents / only cache line transactions with write back or write coalesce attributes will be counted
EQ_128BYTE.SELF | b0110 | 'this' processor / only cache line transactions with write back or write coalesce attributes will be counted
EQ_128BYTE.ANY | b0111 | CPU or non-CPU (all transactions) / only cache line transactions with write back or write coalesce attributes will be counted
CCASTOUT.EITHER | b1000 | either local processor / only 0-byte transactions with write back attribute (clean cast outs) will be counted
--- | b1001 | (* illegal selection *)
CCASTOUT.SELF | b1010 | 'this' processor / only 0-byte transactions with write back attribute (clean cast outs) will be counted
CCASTOUT.ANY | b1011 | CPU or non-CPU (all transactions) / only 0-byte transactions with write back attribute (clean cast outs) will be counted
ALL.EITHER | b1100 | either local processor
ALL.IO | b1101 | non-CPU priority agents


Table 4-70. Unit Masks for BUS_WR_WB (Sheet 2 of 2)

Extension | PMC.umask[19:16] | Description

ALL.SELF | b1110 | 'this' processor
ALL.ANY | b1111 | CPU or non-CPU (all transactions)

CPU_CPL_CHANGES • Title: Privilege Level Changes • Category: System Events IAR/DAR/OPC: N/N/N • Event Code: 0x13, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of privilege level changes.

Table 4-71. Unit Masks for CPU_CPL_CHANGES

Extension | PMC.umask[19:16] | Description

--- | b0000 | (* nothing will be counted *)
LVL0 | b0001 | All changes to/from privilege level 0 are counted
LVL1 | b0010 | All changes to/from privilege level 1 are counted
LVL2 | b0100 | All changes to/from privilege level 2 are counted
LVL3 | b1000 | All changes to/from privilege level 3 are counted
ALL | b1111 | All changes in cpl counted

CPU_OP_CYCLES • Title: CPU Operating Cycles • Category: Basic Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x12, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Counts the number of core clock cycles. This event does not count cycles when the thread is in low power mode. This event has a umask which allows counting only the cycles spent processing qualified instructions. Instructions can be qualified using regular address range checking and/or opcode match qualification. Further, it is necessary to program channel 1 to count all instructions without any qualifications. • NOTE: Although CPU_OP_CYCLES{all} is supported and expected to increment as long as either of the threads is executing, CPU_OP_CYCLES{all} will not increment when the thread the counter register belongs to enters a low-power halt if the other thread is not also in a low-power halt state. When threads are expected to enter a low-power halt state, it is expected that software will add the CPU_OP_CYCLES{me} for each thread in order to calculate CPU_OP_CYCLES{all}. If CPU_OP_CYCLES is being captured in a banked counter and .all is enabled, a single cycle will be dropped from the count each time a thread switch occurs. It is possible to compensate by monitoring THREAD_SWITCH_EVENTS.ALL and adding the two counts together (see the sketch following Table 4-72).


Table 4-72. Unit Masks for CPU_OP_CYCLES

Extension  PMC.umask[16]  Description
ALL        bxxx0          All CPU cycles counted
QUAL       bxxx1          Qualified cycles only
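A minimal sketch of the two corrections described in the note above; the counter values are passed in as plain integers and the arithmetic is exactly the additions the note prescribes.

    #include <stdint.h>

    /* When threads may halt, CPU_OP_CYCLES{all} is reconstructed by
     * summing the per-thread CPU_OP_CYCLES{me} counts. */
    static uint64_t op_cycles_all_from_me(uint64_t me_thread0, uint64_t me_thread1)
    {
        return me_thread0 + me_thread1;
    }

    /* A banked counter with .all enabled drops one cycle per thread
     * switch; compensate with THREAD_SWITCH_EVENTS.ALL. */
    static uint64_t op_cycles_all_banked(uint64_t banked_all, uint64_t thread_switches_all)
    {
        return banked_all + thread_switches_all;
    }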

CPU_OP_CYCLES_HALTED • Title: CPU Operating Cycles Halted • Category: System Event IAR/DAR/OPC: N/N/N • Event Code: 0x18, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Counts the number of core clock cycles for which the thread is halted. • NOTE: Only PMC/D10 is capable of monitoring this event.

DATA_DEBUG_REGISTER_FAULT • Title: Fault Due to Data Debug Reg. Match to Load/Store Instruction • Category: System Events IAR/DAR/OPC: N/N/N • Event Code: 0x52, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times a fault is taken due to one of the data debug registers matching a load or store instruction.

DATA_DEBUG_REGISTER_MATCHES • Title: Data Debug Register Matches Data Address of Memory References • Category: System Events IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc6, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times a data debug register matches the data address of a memory reference. This is the OR function of the 4 DBR matches. Registers DBR0-7, PSR, DCR, and PMC13 affect this event. Because it does not take instruction commit into account, the count may include noise.

DATA_EAR_EVENTS • Title: L1 Data Cache EAR Events • Category: L1 Data Cache IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc8, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of L1 Data Cache, L1DTLB, or ALAT events captured by the EAR.


DATA_REFERENCES_SET0 • Title: Data Memory References Issued to Memory Pipeline • Category: L1 Data Cache/L1D Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc3, Max. Inc/Cyc: 4, MT Capture Type: A • Definition: Counts the number of data memory references issued into the memory pipeline (includes check loads, uncacheable accesses, RSE operations, semaphores, and floating-point memory references). The count includes wrong-path operations but excludes predicated-off operations. This event does not include VHPT memory references. • NOTE: This is a restricted set 0 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.

DATA_REFERENCES_SET1 • Title: Data Memory References Issued to Memory Pipeline • Category: L1 Data Cache/L1D Cache Set 1 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc5, Max. Inc/Cyc: 4, MT Capture Type: A • Definition: Counts the number of data memory references issued into the memory pipeline (includes check loads, uncacheable accesses, RSE operations, semaphores, and floating-point memory references). The count includes wrong-path operations but excludes predicated-off operations. This event does not include VHPT memory references. • NOTE: This is a restricted set 1 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.

DISP_STALLED • Title: Number of Cycles Dispersal Stalled • Category: Instruction Dispersal Events IAR/DAR/OPC: N/N/N • Event Code: 0x49, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of cycles dispersal was stalled due to flushes or back-end pipeline stalls.

DTLB_INSERTS_HPW • Title: Hardware Page Walker Inserts to DTLB • Category: TLB IAR/DAR/OPC: N/N/N • Event Code: 0xc9, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of VHPT entries inserted into DTLB by Hardware Page Walker. • NOTE: This will include misses which the DTLB did not squash even though the instructions causing the miss did not get to retirement.

ENCBR_MISPRED_DETAIL • Title: Number of Encoded Branches Retired • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x63, Max. Inc/Cyc: 3, MT Capture Type: A • Definition: Counts the number of branches retired only if there is a branch on port B0 (i.e. encoded branch).


Table 4-73. Unit Masks for ENCBR_MISPRED_DETAIL

Extension             PMC.umask[19:16]  Description
ALL.ALL_PRED          b0000             all encoded branches, regardless of prediction result
ALL.CORRECT_PRED      b0001             all encoded branches, correctly predicted branches (outcome and target)
ALL.WRONG_PATH        b0010             all encoded branches, mispredicted branches due to wrong branch direction
ALL.WRONG_TARGET      b0011             all encoded branches, mispredicted branches due to wrong target for taken branches
---                   b0100-b0111       (* nothing will be counted *)
OVERSUB.ALL_PRED      b1000             only those which cause oversubscription, regardless of prediction result
OVERSUB.CORRECT_PRED  b1001             only those which cause oversubscription, correctly predicted branches (outcome and target)
OVERSUB.WRONG_PATH    b1010             only those which cause oversubscription, mispredicted branches due to wrong branch direction
OVERSUB.WRONG_TARGET  b1011             only those which cause oversubscription, mispredicted branches due to wrong target for taken branches
ALL2.ALL_PRED         b1100             all encoded branches, regardless of prediction result
ALL2.CORRECT_PRED     b1101             all encoded branches, correctly predicted branches (outcome and target)
ALL2.WRONG_PATH       b1110             all encoded branches, mispredicted branches due to wrong branch direction
ALL2.WRONG_TARGET     b1111             all encoded branches, mispredicted branches due to wrong target for taken branches
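Since the ALL.* umasks partition retired encoded branches by prediction outcome, a misprediction rate falls out directly. A minimal sketch, assuming the three counts were collected over the same measurement interval:

    #include <stdint.h>

    /* Mispredict rate = (wrong direction + wrong target) / all retired
     * encoded branches, from ENCBR_MISPRED_DETAIL.ALL.* counts. */
    static double encbr_mispredict_rate(uint64_t all_pred,
                                        uint64_t wrong_path,
                                        uint64_t wrong_target)
    {
        return all_pred ? (double)(wrong_path + wrong_target) / (double)all_pred
                        : 0.0;
    }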

ER_BKSNP_ME_ACCEPTED • Title: Backsnoop Me Accepted • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xbb, Max. Inc/Cyc: 2, MT Capture Type: C • Definition: Counts the number of BacksnoopMe requests accepted into the BRQ from the L2d (used by the L2d to get itself out of potential forward progress situations). • NOTE: The .all field is ignored for this event. The event will count as if .all is set.


ER_BRQ_LIVE_REQ_HI • Title: BRQ Live Requests (upper two bits) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb8, Max. Inc/Cyc: 2, MT Capture Type: C • Definition: Counts the number of live read requests in the BRQ. The Montecito processor can have a total of 16 per cycle. The upper 2 bits are stored in this counter (bits 4:3). • NOTE: If a read request has an L2d victim, it is also entered in the BRQ (as a writeback). This event will count 1 as long as a read or its victim is in the BRQ (the net effect is that, due to an L2d victim, the life of a read in the BRQ is extended). The .all field is ignored for this event. The event will count as if .all is set.

ER_BRQ_LIVE_REQ_LO • Title: BRQ Live Requests (lower three bits) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb9, Max. Inc/Cyc: 7, MT Capture Type: C • Definition: Counts the number of live read requests in the BRQ. The Montecito processor can have a total of 16 per cycle. The lower 3 bits are stored in this counter (bits 2:0). • NOTE: If a read request has an L2d victim, it is also entered in the BRQ (as a writeback). This event will count 1 as long as a read or its victim is in the BRQ (the net effect is that, due to an L2d victim, the life of a read in the BRQ is extended). The .all field is ignored for this event. The event will count as if .all is set.
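Because the HI counter accumulates bits 4:3 of the per-cycle live-request count (units of 8) and the LO counter accumulates bits 2:0, the time-integrated occupancy is 8*HI + LO; dividing by elapsed cycles gives the average BRQ occupancy. A sketch, assuming all three counts cover the same interval:

    #include <stdint.h>

    /* Average number of live BRQ read requests per cycle. */
    static double avg_brq_occupancy(uint64_t brq_live_hi, uint64_t brq_live_lo,
                                    uint64_t cycles)
    {
        return cycles ? (8.0 * (double)brq_live_hi + (double)brq_live_lo)
                          / (double)cycles
                      : 0.0;
    }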

ER_BRQ_REQ_INSERTED • Title: BRQ Requests Inserted • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xba, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of requests which are inserted into BRQ. • NOTE: Entries made into BRQ due to L2d victims (caused by read, fc, cc) are not counted.

ER_MEM_READ_OUT_HI • Title: Outstanding Memory Read Transactions (upper 2 bits) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb4, Max. Inc/Cyc: 2, MT Capture Type: F • Definition: Counts the number of memory read transactions outstanding. The Montecito processor can have a total of 16 of this event per cycle. The upper two bits are stored in this counter. For the purpose of this event, a memory read access is assumed outstanding from the time a read request is issued on the FSB until the first chunk of read data is returned to the L2D. • NOTE: Uncacheables (or anything else which doesn't access the L3) are not tracked. This event is intended to be used along with BUS_MEM_READ to derive average system memory latency.


ER_MEM_READ_OUT_LO • Title: Outstanding Memory Read Transactions (lower 3 bits) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb5, Max. Inc/Cyc: 7, MT Capture Type: F • Definition: Counts the number of memory read transactions outstanding. The Montecito processor can have a total of 16 of this event per cycle. The lower three bits are stored in this counter. For the purpose of this event, a memory read access is assumed outstanding from the time a read request is issued on the FSB until the first chunk of read data is returned to the L2D. • NOTE: Uncacheables (or anything else which doesn't access the L3) are not tracked. This event is intended to be used along with BUS_MEM_READ to derive average system memory latency.
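The same HI/LO reconstruction, combined with BUS_MEM_READ as suggested in the note, yields average memory read latency by Little's Law: total outstanding-cycles divided by the number of reads. A sketch under those assumptions:

    #include <stdint.h>

    /* Average FSB memory read latency in core cycles:
     * latency = (8*HI + LO) / BUS_MEM_READ. */
    static double avg_mem_read_latency(uint64_t read_out_hi, uint64_t read_out_lo,
                                       uint64_t bus_mem_read)
    {
        return bus_mem_read
            ? (8.0 * (double)read_out_hi + (double)read_out_lo)
                  / (double)bus_mem_read
            : 0.0;
    }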

ER_REJECT_ALL_L1_REQ • Title: Reject All L1 Requests • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xbc, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Number of cycles in which the BRQ was rejecting all L1i/L1d requests (for the “Big Hammer” forward progress logic). • NOTE: The .all field is ignored for this event. The event will count as if .all is set.

ER_REJECT_ALL_L1D_REQ • Title: Reject All L1D Requests • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xbd, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Number of cycles in which the BRQ was rejecting all L1D requests (for L1D/L1I forward progress). • NOTE: The .all field is ignored for this event. The event will count as if .all is set.

ER_REJECT_ALL_L1I_REQ • Title: Reject All L1I Requests • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xbe, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Number of cycles in which the BRQ was rejecting all L1I requests (for L1D/L1I forward progress). • NOTE: The .all field is ignored for this event. The event will count as if .all is set.

ER_SNOOPQ_REQ_HI • Title: Outstanding Snoops (upper bit) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb6, Max. Inc/Cyc: 2, MT Capture Type: C • Definition: Counts the number of snoops outstanding. The Montecito processor can have a total of 8 of this event per cycle. The upper bit is stored in this counter. • NOTE: The .all field is ignored for this event. The event will count as if .all is set.


ER_SNOOPQ_REQ_LO • Title: Outstanding Snoops (lower 3 bits) • Category: Front-Side Bus IAR/DAR/OPC: N/N/N • Event Code: 0xb7, Max. Inc/Cyc: 7, MT Capture Type: C • Definition: Counts the number of snoops outstanding. The Montecito processor can have a total of 8 of this event per cycle. The lower three bits are stored in this counter. • NOTE: The .all field is ignored for this event. The event will count as if .all is set.

ETB_EVENT • Title: Execution Trace Buffer Event Captured • Category: Branch Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x11, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of entries captured in Execution Trace Buffer. Please see Section 3.3.10.1.1 and Section 3.3.10.2 for more information. Entries captured are subject to the constraints programmed to PMC39.

FE_BUBBLE • Title: Bubbles Seen by FE • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x71, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of bubbles seen by the front-end. This event is another way of looking at the FE_LOST_BW event. Causes for stall are prioritized in the following order from high to low for this event: FEFLUSH, TLBMISS, IMISS, BRANCH, FILL_RECIRC, BUBBLE, IBFULL. The prioritization implies that when several stall conditions exist at the same time, only the highest priority one will be counted.

Table 4-74. Unit Masks for FE_BUBBLE

Extension              PMC.umask[19:16]  Description
ALL                    b0000             count regardless of cause
FEFLUSH                b0001             only if caused by a front-end flush
---                    b0010             (* illegal selection *)
GROUP1                 b0011             BUBBLE or BRANCH
GROUP2                 b0100             IMISS or TLBMISS
IBFULL                 b0101             only if caused by instruction buffer full stall
IMISS                  b0110             only if caused by instruction cache miss stall
TLBMISS                b0111             only if caused by TLB stall
FILL_RECIRC            b1000             only if caused by a recirculate for a fill operation
BRANCH                 b1001             only if caused by any of 4 branch recirculates
GROUP3                 b1010             FILL_RECIRC or BRANCH
ALLBUT_FEFLUSH_BUBBLE  b1011             ALL except FEFLUSH and BUBBLE
ALLBUT_IBFULL          b1100             ALL except IBFULL
BUBBLE                 b1101             only if caused by branch bubble stall
---                    b1110-b1111       (* illegal selection *)

FE_LOST_BW • Title: Invalid Bundles at the Entrance to IB • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x70, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of invalid bundles at the entrance to the Instruction Buffer. • NOTE: Causes for lost bandwidth are prioritized in the following order from high to low for this event: FEFLUSH, TLBMISS, IMISS, PLP, BR_ILOCK, BRQ, BI, FILL_RECIRC, BUBBLE, IBFULL, UNREACHED. The prioritization implies that when several stall conditions exist at the same time, only the highest priority one will be counted. There are two cases where a bundle is considered "unreachable": when bundle 0 contains a taken branch, or when bundle 0 is invalid but has IP[4] set to 1, bundle 1 will not be reached. (A sketch for normalizing this count follows Table 4-75.)

Table 4-75. Unit Masks for FE_LOST_BW

Extension    PMC.umask[19:16]  Description
ALL          b0000             count regardless of cause
FEFLUSH      b0001             only if caused by a front-end flush
---          b0010-b0011       (* illegal selection *)
UNREACHED    b0100             only if caused by unreachable bundle
IBFULL       b0101             only if caused by instruction buffer full stall
IMISS        b0110             only if caused by instruction cache miss stall
TLBMISS      b0111             only if caused by TLB stall
FILL_RECIRC  b1000             only if caused by a recirculate for a cache line fill operation
BI           b1001             only if caused by branch initialization stall
BRQ          b1010             only if caused by branch retirement queue stall
PLP          b1011             only if caused by perfect loop prediction stall
BR_ILOCK     b1100             only if caused by branch interlock stall
BUBBLE       b1101             only if caused by branch resteer bubble stall
---          b1110-b1111       (* illegal selection *)
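Given the Max. Inc/Cyc of 2, the IB entrance can accept up to two bundles per cycle, so FE_LOST_BW.ALL can be normalized into a lost-bandwidth fraction. A sketch, assuming that two-bundle-per-cycle ceiling:

    #include <stdint.h>

    /* Fraction of front-end bundle slots lost at the IB entrance. */
    static double fe_lost_bw_fraction(uint64_t fe_lost_bw_all, uint64_t cycles)
    {
        return cycles ? (double)fe_lost_bw_all / (2.0 * (double)cycles) : 0.0;
    }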


FP_FAILED_FCHKF • Title: Failed fchkf • Category: Instruction Execution IAR/DAR/OPC: Y/N/N • Event Code: 0x06, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times the fchkf instruction failed.

FP_FALSE_SIRSTALL • Title: SIR Stall Without a Trap • Category: Instruction Execution IAR/DAR/OPC: Y/N/N • Event Code: 0x05, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times SIR (Safe Instruction Recognition) stall is asserted and does not lead to a trap.

FP_FLUSH_TO_ZERO • Title: FP Result Flushed to Zero • Category: Instruction Execution IAR/DAR/OPC: Y/N/N • Event Code: 0x0b, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of times a near zero result is flushed to zero in FTZ mode. This has the following umasks.

Table 4-76. Unit Masks for FP_FLUSH_TO_ZERO

Extension  PMC.umask[16]  Description
FTZ_Real   b0             Times FTZ occurred
FTZ_Poss   b1             Times FTZ would have occurred if FTZ were enabled

FP_OPS_RETIRED • Title: Retired FP Operations • Category: Instruction Execution IAR/DAR/OPC: Y/N/N • Event Code: 0x09, Max. Inc/Cyc: 6, MT Capture Type: A • Definition: Provides information on the number of retired floating-point operations, excluding all predicated off instructions. This is a weighted sum of basic floating-point operations. To count how often specific opcodes are retired, use IA64_TAGGED_INST_RETIRED. • NOTE: The following weights are used:
- Counted as 4 ops: fpma, fpms, and fpnma
- Counted as 2 ops: fpma, fpnma (f2=f0), fma, fms, fnma, fprcpa, fprsqrta, fpmpy, fpmax, fpamin, fpamax, fpcmp, fpcvt
- Counted as 1 op: fms, fma, fnma (f2=f0 or f4=f1), fmpy, fadd, fsub, frcpa, frsqrta, fmin, fmax, famin, famax, fpmin, fcvt.fx,


FP_TRUE_SIRSTALL • Title: SIR Stall Asserted and Leads to a Trap • Category: Instruction Execution IAR/DAR/OPC: Y/N/N • Event Code: 0x03, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times SIR (Safe Instruction Recognition) stall is asserted and leads to a trap.

HPW_DATA_REFERENCES • Title: Data Memory References to VHPT • Category: L1 Data Cache IAR/DAR/OPC: N/N/N • Event Code: 0x2d, Max. Inc/Cyc: 4, MT Capture Type: A • Definition: Counts the number of data memory references to the VHPT. • NOTE: This will include misses the L2DTLB did not squash even though the instructions causing the miss did not get to retirement. HPW references originating very close to an ongoing thread switch may or may not be counted.

IA64_INST_RETIRED • Title: Retired Itanium Instructions • Category: Basic Events IAR/DAR/OPC: Y/N/Y • Event Code: 0x08, Max. Inc/Cyc: 6, MT Capture Type: A • Definition: Counts the number of retired instructions, excluding hardware-generated RSE operations and instructions. This event includes all retired instructions, including predicated off instructions and nop instructions. This is a sub-event of IA64_TAGGED_INST_RETIRED. • NOTE: MLX bundles will be counted as no more than two instructions. Make sure that the corresponding registers are set up such that nothing will be constrained by the IBRP-PMC combination of interest (the power-up default is no constraints).

Table 4-77. Unit Masks for IA64_INST_RETIRED

Extension  PMC.umask[19:16]  Description
THIS       bxx00             Retired Itanium® Instructions

IA64_TAGGED_INST_RETIRED • Title: Retired Tagged Instructions • Category: Instruction Execution IAR/DAR/OPC: Y/N/Y • Event Code: 0x08, Max. Inc/Cyc: 6, MT Capture Type: A • Definition: Counts the number of retired instructions, excluding hardware-generated RSE operations, that match the Instruction Address Breakpoint (IBR) and Opcode Match register settings (PMC32,33,34,35). This event includes all instructions which reached retirement (including predicated off instructions and nop instructions). See Chapter 3 for more details about how to program the different registers. • NOTE: MLX bundles will be counted as no more than two instructions.


Table 4-78. Unit Masks for IA64_TAGGED_INST_RETIRED

Extension       PMC.umask[19:16]  Description
IBRP0_PMC32_33  bxx00             Instruction tagged by Instruction Breakpoint Pair 0 and the opcode matcher pair PMC32 and PMC33.
IBRP1_PMC34_35  bxx01             Instruction tagged by Instruction Breakpoint Pair 1 and the opcode matcher pair PMC34 and PMC35.
IBRP2_PMC32_33  bxx10             Instruction tagged by Instruction Breakpoint Pair 2 and the opcode matcher pair PMC32 and PMC33.
IBRP3_PMC34_35  bxx11             Instruction tagged by Instruction Breakpoint Pair 3 and the opcode matcher pair PMC34 and PMC35.

IDEAL_BE_LOST_BW_DUE_TO_FE • Title: Invalid Bundles at the Exit From IB • Category: Stall Events IAR/DAR/OPC: N/N/N • Event Code: 0x73, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of invalid bundles at the exit from the Instruction Buffer, regardless of whether the back-end is stalled for other reasons or not. • NOTE: Causes for lost bandwidth are prioritized in the following order from high to low for this event: FEFLUSH, TLBMISS, IMISS, PLP, BR_ILOCK, BRQ, BI, FILL_RECIRC, BUBBLE, IBFULL, UNREACHED. The prioritization implies that when several stall conditions exist at the same time, only the highest priority one will be counted. There are two cases where a bundle is considered "unreachable": when bundle 0 contains a taken branch, or when bundle 0 is invalid but has IP[4] set to 1, bundle 1 will not be reached.

Table 4-79. Unit Masks for IDEAL_BE_LOST_BW_DUE_TO_FE

Extension    PMC.umask[19:16]  Description
ALL          b0000             count regardless of cause
FEFLUSH      b0001             only if caused by a front-end flush
---          b0010-b0011       (* illegal selection *)
UNREACHED    b0100             only if caused by unreachable bundle
IBFULL       b0101             (* meaningless for this event *)
IMISS        b0110             only if caused by instruction cache miss stall
TLBMISS      b0111             only if caused by TLB stall
FILL_RECIRC  b1000             only if caused by a recirculate for a cache line fill operation
BI           b1001             only if caused by branch initialization stall
BRQ          b1010             only if caused by branch retirement queue stall
PLP          b1011             only if caused by perfect loop prediction stall
BR_ILOCK     b1100             only if caused by branch interlock stall
BUBBLE       b1101             only if caused by branch resteer bubble stall
---          b1110-b1111       (* illegal selection *)


INST_CHKA_LDC_ALAT • Title: Retired chk.a and ld.c Instructions • Category: Instruction Execution IAR/DAR/OPC: Y/Y/Y • Event Code: 0x56, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Provides information on the number of all advanced check load (chk.a) and check load (ld.c) instructions that reach retirement. • NOTE: A faulting ld.c will be counted even if an older sibling faults.

Table 4-80. Unit Masks for INST_CHKA_LDC_ALAT

Extension  PMC.umask[19:16]  Description
---        bxx00             (* nothing will be counted *)
INT        bxx01             only integer instructions
FP         bxx10             only floating-point instructions
ALL        bxx11             both integer and floating-point instructions

INST_DISPERSED • Title: Number of Syllables Dispersed from REN to REG • Category: Instruction Dispersal Events IAR/DAR/OPC: Y/N/N • Event Code: 0x4d, Max. Inc/Cyc: 6, MT Capture Type: A • Definition: Counts the number of syllables dispersed from REName to the REGister pipe stage in order to approximate those dispersed from ROTate to EXPand.

INST_FAILED_CHKA_LDC_ALAT • Title: Failed chk.a and ld.c Instructions • Category: Instruction Execution IAR/DAR/OPC: Y/Y/Y • Event Code: 0x57, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information on the number of failed advanced check load (chk.a) and check load (ld.c) instructions that reach retirement. • NOTE: Although at any given time there could be 2 failing chk.a or ld.c instructions, only the first one is counted.

Table 4-81. Unit Masks for INST_FAILED_CHKA_LDC_ALAT

Extension  PMC.umask[19:16]  Description
---        bxx00             (* nothing will be counted *)
INT        bxx01             only integer instructions
FP         bxx10             only floating-point instructions
ALL        bxx11             both integer and floating-point instructions


INST_FAILED_CHKS_RETIRED • Title: Failed chk.s Instructions • Category: Instruction Execution IAR/DAR/OPC: N/N/N • Event Code: 0x55, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information on the number of failed speculative check instructions (chk.s).

Table 4-82. Unit Masks for INST_FAILED_CHKS_RETIRED

Extension  PMC.umask[19:16]  Description
---        bxx00             (* nothing will be counted *)
INT        bxx01             only integer instructions
FP         bxx10             only floating-point instructions
ALL        bxx11             both integer and floating-point instructions

ISB_BUNPAIRS_IN • Title: Bundle Pairs Written from L2I into FE • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x46, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Provides information about the number of bundle pairs (32 bytes) written from the L2I (and beyond) into the front end. • NOTE: This event is qualified with IBRP0 if the cache line was tagged as a demand fetch and IBRP1 if the cache line was tagged as a prefetch match.

ITLB_MISSES_FETCH • Title: Instruction Translation Buffer Misses Demand Fetch • Category: TLB IAR/DAR/OPC: Y/N/N • Event Code: 0x47, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of ITLB misses for demand fetch.

Table 4-83. Unit Masks for ITLB_MISSES_FETCH

Extension  PMC.umask[19:16]  Description
---        bxx00             (* nothing will be counted *)
L1ITLB     bxx01             All misses in L1ITLB will be counted. Even if L1ITLB is not updated for an access (uncacheable/NaT page/not-present page/faulting/some flushed), it will be counted here.
L2ITLB     bxx10             All misses in L1ITLB which also missed in L2ITLB will be counted.
ALL        bxx11             All TLB misses will be counted. Note that this is not equal to the sum of the L1ITLB and L2ITLB umasks, because any access could miss in both L1ITLB and L2ITLB.


L1DTLB_TRANSFER • Title: L1DTLB Misses that Hit in the L2DTLB for Accesses Counted in L1D_READS • Category: TLB/L1D Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc0, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of times an L1DTLB miss hits in the L2DTLB for an access counted in L1D_READS. • NOTE: This is a restricted set 0 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. In the code sequence "a ;; b", if "a" takes an exception and "b" requires an L2DTLB-to-L1DTLB transfer, the transfer is performed but not counted in this event; this is necessary to remain consistent with L1D_READS, which will not count "b" because it is not reached. If a thread switch occurs in the middle of a DET stall while a transfer is pending, this event won't be counted.

L1D_READS_SET0 • Title: L1 Data Cache Reads (Set 0) • Category: L1 Data Cache/L1D Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc2, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of data memory read references issued into the memory pipeline which are serviced by L1D (only integer loads), RSE loads, L1-hinted loads (L1D returns data if it hits in L1D but does not do a fill) and check loads (ld.c). Uncacheable reads, VHPT loads, semaphores, floating-point loads, and lfetch instructions are not counted here because L1D does not handle these. The count includes wrong-path operations but excludes predicated-off operations. • NOTE: This is a restricted set 0 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. Only ports 0 and 1 are measured.

L1D_READS_SET1 • Title: L1 Data Cache Reads (Set 1) • Category: L1 Data Cache/L1D Cache Set 1 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc4, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of data memory read references issued into the memory pipeline which are serviced by L1D (only integer loads), RSE loads, L1-hinted loads (L1D returns data if it hits in L1D but does not do a fill) and check loads (ld.c). Uncacheable reads, VHPT loads, semaphores, floating-point loads, and lfetch instructions are not counted here because L1D does not handle these. The count includes wrong-path operations but excludes predicated-off operations. • NOTE: This is a restricted set 1 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. Only ports 0 and 1 are measured.

L1D_READ_MISSES • Title: L1 Data Cache Read Misses • Category: L1 Data Cache/L1D Cache Set 1 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc7, Max. Inc/Cyc: 2, MT Capture Type: A • Definition: Counts the number of L1 Data Cache read misses. The L1 Data Cache is write-through; therefore, write misses are not counted. The count only includes misses caused by references counted by the L1D_READS event. It will include L1D misses which missed the ALAT but not those which hit in the ALAT. Semaphores are not handled by L1D and are not included in this count. • NOTE: This is a restricted set 1 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. Only ports 0 and 1 are measured.

Table 4-84. Unit Masks for L1D_READ_MISSES

Extension  PMC.umask[19:16]  Description
ALL        bxxx0             all L1D read misses will be counted
RSE_FILL   bxxx1             only L1D read misses caused by RSE fills will be counted

L1ITLB_INSERTS_HPW • Title: L1ITLB Hardware Page Walker Inserts • Category: TLB IAR/DAR/OPC: Y/N/N • Event Code: 0x48, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Counts the number of L1ITLB inserts done by Hardware Page Walker.

L1I_EAR_EVENTS • Title: Instruction EAR Events • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x43, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of L1 Instruction Cache or L1ITLB events captured by EAR.

L1I_FETCH_ISB_HIT • Title: "Just-In-Time" Instruction Fetch Hitting In and Being Bypassed from ISB • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x66, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about an instruction fetch hitting in and being bypassed from the ISB (Instruction Streaming Buffer). It will not count "critical bypasses," i.e., anytime the pipeline has to stall waiting for data to be delivered from L2I. It will count "just-in-time bypasses," i.e., when instruction data is delivered by the L2I in time for the instructions to be consumed without stalling the front-end pipe. • NOTE: Demand fetches which hit the ISB at the same time as they are being transferred to the Instruction Cache (a 1-cycle window) will not be counted because they have to be treated as cache hits for the purpose of branch prediction. This event is qualified with IBRP0 if the cache line was tagged as a demand fetch and IBRP1 if the cache line was tagged as a prefetch match.

L1I_FETCH_RAB_HIT • Title: Instruction Fetch Hitting in RAB • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x65, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about instruction fetches hitting in the RAB. • NOTE: This event is qualified with IBRP0 if the cache line was tagged as a demand fetch and IBRP1 if the cache line was tagged as a prefetch match.


L1I_FILLS • Title: L1 Instruction Cache Fills • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x41, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Provides information about the number of line fills from the ISB to the L1 Instruction Cache (64-byte chunks). • NOTE: This event is qualified with IBRP0 if the cache line was tagged as a demand fetch or IBRP1 if the cache line was tagged as a prefetch match. It is impossible for this event to fire if the corresponding entry is not in the L1ITLB.

L1I_PREFETCHES • Title: L1 Instruction Prefetch Requests • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x44, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about the number of issued L1 cache line prefetch requests (64 bytes/line). The reported number includes streaming and non-streaming prefetches (hits and misses in the L1 Instruction Cache are both included). • NOTE: This event is qualified with IBRP1.

L1I_PREFETCH_STALL • Title: Prefetch Pipeline Stalls • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: N/N/N • Event Code: 0x67, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information on why the prefetch pipeline is stalled.

Table 4-85. Unit Masks for L1I_PREFETCH_STALL

Extension  PMC.umask[19:16]  Description
---        bxx00-bxx01       (* nothing will be counted *)
FLOW       bxx10             Asserted when the streaming prefetcher is working close to the instructions being fetched for demand reads; not asserted when the streaming prefetcher is ranging well ahead of the demand reads.
ALL        bxx11             Number of clocks the prefetch pipeline is stalled

L1I_PURGE • Title: L1ITLB Purges Handled by L1I • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x4b, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Provides information on the number of L1ITLB purges handled by L1I. This event is caused by a purge instruction, a global purge from the bus cluster, or inserts into the L2ITLB. It is not the same as the column invalidates which are done on the L1ITLB.


L1I_PVAB_OVERFLOW • Title: PVAB Overflow • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: N/N/N • Event Code: 0x69, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about the Prefetch Virtual Address Buffer overflowing.

L1I_RAB_ALMOST_FULL • Title: Is RAB Almost Full? • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: N/N/N • Event Code: 0x64, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Provides information about the Read Address Buffer being almost full.

L1I_RAB_FULL • Title: Is RAB Full? • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: N/N/N • Event Code: 0x60, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Provides information about the Read Address Buffer being full.

L1I_READS • Title: L1 Instruction Cache Reads • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x40, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about the number of demand fetch reads (i.e., all accesses regardless of hit or miss) to the L1 Instruction Cache (32-byte chunks). • NOTE: Demand fetches which have an L1ITLB miss, an L1I cache miss, and collide with a fill-recirculate to the icache will not be counted in this event even though they will be counted in L2I_DEMAND_READS.

L1I_SNOOP • Title: Snoop Requests Handled by L1I • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/Y/Y • Event Code: 0x4a, Max. Inc/Cyc: 1, MT Capture Type: C • Definition: Provides information on the number of snoop requests (64-byte granular) handled by L1I. • NOTE: Each “fc” instruction will produce 1 snoop request to L1I after it goes out on the bus. If IFR snoop pipeline is busy when L1D sends the snoop to IFR, this event will count more than once for the same snoop. A victimized line will also produce a snoop. Some bus transactions also can cause L1I snoops.


L1I_STRM_PREFETCHES • Title: L1 Instruction Cache Line Prefetch Requests • Category: L1 Instruction Cache and Prefetch IAR/DAR/OPC: Y/N/N • Event Code: 0x5f, Max. Inc/Cyc: 1, MT Capture Type: A • Definition: Provides information about the number of L1I cache line prefetch requests (64 bytes/line) which go through the prefetch pipeline (i.e., hit or miss in the L1I cache is not factored in) in streaming mode only (initiated by brp .many hints). • NOTE: This event is qualified with IBRP1.

L2DTLB_MISSES • Title: L2DTLB Misses • Category: TLB/L1D Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xc1, Max. Inc/Cyc: 4, MT Capture Type: A • Definition: Counts the number of L2DTLB misses (which is the same as references to the HPW; DTLB_HIT=0) for demand requests. • NOTE: This is a restricted set 0 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. If the HPW is enabled all the time, this event and HPW_DATA_REFERENCES are equivalent. This will include misses the L2DTLB did not squash even though the instructions causing the miss did not get to retirement. If a thread switch occurs in the middle of a DET stall while this is pending, this event won't be reported.

L2D_BAD_LINES_SELECTED • Title: Valid Line Replaced When Invalid Line Is Available • Category: L2 Data Cache Set 5 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xec, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of times a valid line was selected for replacement when an invalid line was available. • NOTE: This is a restricted set 5 L2D Cache event that is paired with L2D_STORE_HIT_SHARED. Active thread is used as an approximation for this count.

Table 4-86. Unit Masks for L2D_BAD_LINES_SELECTED

Extension  PMC.umask[19:16]  Description
ANY        b0xxx             Valid line replaced when invalid line is available

L2D_BYPASS • Title: Count L2D Bypasses • Category: L2 Data Cache Set 1 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xe4, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of times a bypass occurred. • NOTE: This is a restricted L2D cache event that is paired with L2D_OZQ_RELEASE. Active thread is the true thread for the count. The Itanium 2 processor version of this count was too speculative to be useful. On Montecito it counts only bypasses that were successful, and it supplies a count of the number of bypasses rather than just an indication that a bypass occurred. Note that two of the bypass counts are not .all capable.

Table 4-87. Unit Masks for L2D_BYPASS

Extension  PMC.umask[19:16]  Description
L2_DATA1   bxx00             Count only L2D hit data bypasses (L1D to L2A). NOTE: Not .all capable.
L2_DATA2   bxx01             Count only L2D hit data bypasses (L1W to L2I). NOTE: Not .all capable.
L3_DATA1   bxx10             Count only L3 data bypasses (L1D to L2A). NOTE: Is .all capable.
---        bxx11             (* nothing will be counted *)

L2D_FILLB_FULL • Title: L2D Fill Buffer Is Full • Category: L2 Data Cache Set 7 IAR/DAR/OPC: N/N/N • Event Code: 0xf1, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of times L2D Fill Buffer is full. • NOTE: This is a restricted set 7 L2D Cache event that is paired with L2D_OPS_ISSUED. The active thread is used as an approximation for this count. This event is not .all capable.

Table 4-88. Unit Masks for L2D_FILLB_FULL

Extension  PMC.umask[19:16]  Description
THIS       b0000             L2D Fill Buffer is full. NOTE: Not .all capable.
---        b0001-b1111       (* count is undefined *)

L2D_FILL_MESI_STATE • Title: L2D Cache Fills with MESI State • Category: L2 Data Cache Set 8 IAR/DAR/OPC: N/N/N • Event Code: 0xf2, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of times the L2D cache is filled with a particular MESI state. • NOTE: This is a restricted set 8 L2D Cache event that is paired with L2D_VICTIMB_FULL. This event uses the true thread id for the filling operation. Note that the fill states of I and P correspond to in-flight snoop vs. fill conflicts; the sum of the I and P counts would equal the L2_SYNTH_PROBE count of the Itanium 2 processor.

Table 4-89. Unit Masks for L2D_FILL_MESI_STATE

Extension  PMC.umask[19:16]  Description
M          bx000             Modified
E          bx001             Exclusive
S          bx010             Shared
I          bx011             Invalid
P          bx1xx             Pending

L2D_FORCE_RECIRC • Title: Forced Recirculates • Category: L2 Data Cache Set 4 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xea, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of L2D ops forced to recirculate on insertion into the OzQ, with the exception of SNP_OR_L3, which measures the number of times existing L2D ops are forced to recirculate; anywhere from 0-32 ops can be affected by one such event. All categories with the exception of TRAN_PREF and SNP_OR_L3 occur at the insertion into the OzQ. SNP_OR_L3 is when an existing OzQ entry is forced to recirculate because an incoming snoop request matched its address, or because an access is issued to the L3/BC which will fill the same way/index this OzQ entry has "hit" in. TRAN_PREF is when an existing OzQ access is transformed into a prefetch. This event has changed significantly since the Itanium 2 processor due to the complexity of trying to use these counts to generate L2D cache hit and miss rates. The L1W, OZQ_MISS, and FILL_HIT counts have been OR'd together to create the SECONDARY counts, which include separate qualifications on reads and writes (semaphores and read-modify-write stores will be counted in both). This allows secondary misses (the term for an operation that inserts into the OzQ as a miss when a miss for the same line is already outstanding) to be counted from only one set event. The LIMBO count was also added to see how many operations were inserting into limbo on their initial insert. The VIC_PEND count has been removed to make room, since it usually only implied an L1W event. • NOTE: This is a restricted set 4 L2D Cache event that is paired with L2D_ISSUED_RECIRC_OZQ_ACC. Active thread is the correct thread or an approximation depending on the umask. Some umasks are not .all capable.

Table 4-90. Unit Masks for L2D_FORCE_RECIRC

Extension        PMC.umask[19:16]  Description
RECIRC           b00x0             Counts inserts into OzQ due to a recirculate; the recirculate is due to secondary misses or various other conflicts. NOTE: Active thread is accurate. Is .all capable.
LIMBO            b00x1             Count operations that went into the LIMBO OzQ state; this state is entered when the op sees a FILL_HIT or OZQ_MISS event. NOTE: Active thread is accurate. Is .all capable.
TAG_NOTOK        b0100             Count only those caused by L2D hits caused by in-flight snoops, stores with a sibling miss to the same index, a sibling probe to the same line, or a pending mf.a instruction. This count can usually be ignored since its events are rare, unpredictable, and/or show up in one of the other events. NOTE: Active thread is accurate. Not .all capable.
TRAN_PREF        b0101             Count only those caused by L2D miss requests that transformed to prefetches. NOTE: Active thread is approximation. Not .all capable.
SNP_OR_L3        b0110             Count only those caused by a snoop or L3 issue. NOTE: Active thread is approximation. Not .all capable.
TAG_OK           b0111             Count operations that inserted to OzQ as a hit (and thus were NOT forced to recirculate). Likely identical to L2D_INSERT_HITS. NOTE: Active thread is accurate. Not .all capable.
FILL_HIT         b1000             Count only those caused by an L2D miss which hit in the fill buffer. NOTE: Active thread is accurate. Is .all capable.
FRC_RECIRC       b1001             Caused by an L2D miss when a force recirculate already existed in the OzQ. NOTE: Active thread is accurate. Is .all capable.
SAME_INDEX       b1010             Caused by an L2D miss when a miss to the same index was in the same issue group. NOTE: Active thread is accurate. Is .all capable.
OZQ_MISS         b1011             Caused by an L2D miss when an L2D miss was already in the OzQ. NOTE: Active thread is accurate. Is .all capable.
L1W              b1100             Count only those caused by an L2D miss one cycle ahead of the current op. NOTE: Active thread is accurate. Is .all capable.
SECONDARY_READ   b1101             Caused by an L2D read op that saw a miss to the same address in the OzQ, L2 fill buffer, or one cycle ahead in the main pipeline. NOTE: Active thread is accurate. Is .all capable.
SECONDARY_WRITE  b1110             Caused by an L2D write op that saw a miss to the same address in the OzQ, L2 fill buffer, or one cycle ahead in the main pipeline. NOTE: Active thread is accurate. Is .all capable.
SECONDARY_ALL    b1111             Caused by any L2D op that saw a miss to the same address in the OzQ, L2 fill buffer, or one cycle ahead in the main pipeline. NOTE: Active thread is accurate. Is .all capable.


L2D_INSERT_HITS • Title: Count Number of Times an Inserting Data Request Hit in the L2D • Category: L2 Data Cache IAR/DAR/OPC: Y/Y/Y • Event Code: 0xb1, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of times a cacheable data request hit the L2D cache on its first lookup. This event will not count secondary misses that eventually became hits, allowing a more reliable hit and miss rate calculation. This count is new for Montecito. • NOTE: This is NOT a restricted L2D Cache event. This event uses active thread for a true thread indication.

L2D_INSERT_MISSES • Title: Count Number of Times an Inserting Data Request Missed the L2D • Category: L2 Data Cache IAR/DAR/OPC: Y/Y/Y • Event Code: 0xb0, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of times a cacheable data request missed the L2D cache on its first lookup. This event will count secondary misses and other misses that were forced to recirculate. This allows for a more reliable miss rate calculation than the Itanium 2 processor method of counting L2_MISSES and adding a combination of L2_FORCE_RECIRC events. • NOTE: This is NOT a restricted L2D Cache event. This event uses active thread for a true thread indication.
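Because L2D_INSERT_HITS and L2D_INSERT_MISSES classify each cacheable request exactly once on its first lookup, hit and miss rates fall out directly, with no recirculate correction. A minimal sketch:

    #include <stdint.h>

    /* First-lookup L2D miss rate from the Montecito insert counters. */
    static double l2d_miss_rate(uint64_t insert_misses, uint64_t insert_hits)
    {
        uint64_t inserts = insert_misses + insert_hits;
        return inserts ? (double)insert_misses / (double)inserts : 0.0;
    }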

L2D_ISSUED_RECIRC_OZQ_ACC • Title: Count Number of Times a Recirculate Issue Was Attempted and Not Preempted • Category: L2 Data Cache Set 4 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xeb, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of times a recirculate was attempted that didn't get preempted by a fill/confirm/evervalid (fill/confirm tag updates have higher priority) or by an older sibling issuing a recirculate (only one recirculate can be sent per clock). This value can be added to L2D_OZQ_CANCELS*.RECIRC for the total number of times the L2D issue logic attempted to issue a recirculate. • NOTE: This is a restricted set 4 L2D Cache event that is paired with L2D_FORCE_RECIRC. This event uses the true thread id of the given recirculate.
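The total number of recirculate issue attempts is simply the sum described in the definition; a one-line sketch:

    #include <stdint.h>

    /* Attempts = issued (not preempted) + cancelled recirculates. */
    static uint64_t recirc_issue_attempts(uint64_t issued_recirc_ozq_acc,
                                          uint64_t ozq_cancels_recirc)
    {
        return issued_recirc_ozq_acc + ozq_cancels_recirc;
    }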

L2D_L3ACCESS_CANCEL • Title: L3 Access Cancelled by L2D • Category: L2 Data Cache Set 3 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xe8, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of cancelled L3 accesses. A unit mask, as specified in the following table, narrows this event down to a specific reason for the cancel. This event includes one umask for dynamic throttling of L2D transform-to-prefetches and L2D OzQ tail-collapse disabling. • NOTE: This is a restricted set 3 L2D Cache event that is paired with L2D_OZDB_FULL. The L2D reject events use their true thread id; the dynamic disabling and non-coverage events use active thread as an approximation. None of the umasks in this event are .all capable. umasks 'bx000 to 'bx011 use the true thread of the access; umasks 'bx100 to 'bx111 use the active thread.


Table 4-91. Unit Masks for L2D_L3ACCESS_CANCEL

Extension              PMC.umask[19:16]  Description
INV_L3_BYP             bx000             L2D cancelled a bypass because it did not commit, was not a valid opcode to bypass, or was not a true miss of L2D (either hit, recirc, or limbo). NOTE: Active thread is accurate.
SPEC_L3_BYP            bx001             L2D cancelled speculative L3 bypasses because the access was not a WB memory attribute or was an effective release. NOTE: Active thread is accurate.
ANY                    bx010             Count cancels due to any reason. This umask will count more than the sum of all the other umasks. It will count things that weren't committed accesses when they reached L1W, but the L2D attempted to bypass them to the L3 anyway (speculatively). This includes accesses made repeatedly while the main pipeline is stalled and the L1D is attempting to recirculate an access down the L1D pipeline; thus, an access could get counted many times before it really does get bypassed to the L3. It is a measure of how many times we asserted a request to the L3 but didn't confirm it. NOTE: Active thread is accurate.
ER_REJECT              bx011             Count only requests that were rejected by ER. NOTE: Active thread is accurate.
P2_COV_SNP_TEM         bx100             A snoop saw an L2D tag error and missed. NOTE: Active thread is approximation.
P2_COV_SNP_VIC         bx101             A snoop hit in the L1D victim buffer. NOTE: Active thread is approximation.
P2_COV_SNP_FILL_NOSNP  bx110             A snoop and a fill to the same address reached the L2D within a 3-cycle window of each other, or a snoop hit a nosnoop entry in the OzQ. NOTE: Active thread is approximation.
TAIL_TRANS_DIS         bx111             Count the number of cycles that either transform-to-prefetches or OzQ tail collapse have been dynamically disabled. This would indicate that memory contention has led the L2D to throttle requests to prevent livelock scenarios. NOTE: Active thread is approximation.

L2D_MISSES • Title: L2D Misses • Category: L2 Data Cache IAR/DAR/OPC: Y/Y/Y • Event Code: 0xcb, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of L2D cache misses (in terms of the number of L2D cache line requests sent to L3). It includes all cacheable data requests. This event does not count secondary L2D misses. To count secondary misses as well as primary misses use the new L2D_INSERT_MISSES count. • NOTE: This count is not set restricted. This count uses its true thread id.


L2D_OPS_ISSUED • Title: Operations Issued by L2D • Category: L2 Data Cache Set 7 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xf0, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of operations issued by L2D as hits, as specified by the operation type. This does not count operations that were cancelled. This event will count operations that originally missed L2D but then became hits and completed. • NOTE: This is a restricted set 7 L2D Cache event that is paired with L2D_FILLB_FULL. This count uses the true thread id. This event incorrectly counted semaphores under OTHER in the Itanium 2 processor; Montecito has fixed that bug and added an lfetch count as well. None of these events are .all capable.

Table 4-92. Unit Masks for L2D_OPS_ISSUED

Extension  PMC.umask[19:16]  Description
INT_LOAD   bx000             Count only valid integer loads, including ld16.
FP_LOAD    bx001             Count only valid floating-point loads.
RMW        bx010             Count only valid read-modify-write stores and semaphores, including cmp8xchg16.
STORE      bx011             Count only valid non-read-modify-write stores, including st16.
LFETCH     bx1x0             Count only lfetch operations.
OTHER      bx1x1             Count only valid non-load, non-store accesses that are not listed above.

L2D_OZDB_FULL • Title: L2D OZ Data Buffer Is Full • Category: L2 Data Cache Set 3 IAR/DAR/OPC: N/N/N • Event Code: 0xe9, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of cycles the L2D OZ Data Buffer is full. • NOTE: This is a restricted set 3 L2D Cache event that is paired with L2D_L3ACCESS_CANCEL. This event uses the active thread id for an approximation.

Table 4-93. Unit Masks for L2D_OZDB_FULL

Extension  PMC.umask[19:16]  Description
THIS       b0000             L2D OZ Data Buffer is full
---        b0001-b1111       (* count is undefined *)


L2D_OZQ_ACQUIRE • Title: Acquire Ordering Attribute Exists in L2D OZQ • Category: L2 Data Cache Set 6 IAR/DAR/OPC: N/N/N • Event Code: 0xef, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of clocks for which the "acquire" ordering attribute existed in the L2D OZ Queue. • NOTE: This is a restricted set 6 L2D Cache event. This event uses active thread as an approximation.

L2D_OZQ_CANCELS0 • Title: L2D OZQ Cancels (Specific Reason Set 0) • Category: L2 Data Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xe0, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of total L2D OZ Queue Cancels due to a specific reason (based on umask). • NOTE: This is a restricted set 0 L2D Cache event that is grouped with L2D_OZQ_CANCELS1 and L2D_OZQ_FULL. Only 1 of the 2 L2D_OZQ_CANCEL events may be measured at any given time. This event counts with the true thread id of the operation. Compared to the Itanium 2 processor, several counts have been removed due to their rare occurrence, or due to ifetch removal. They are ECC, SCRUB, D_IFETCH, and HPW_IFETCH_CONF. Other related events were combined together to reduce the number of sample intervals required to collect all of the umasked events. Those that were combined are mentioned in the description column.

Table 4-94. Unit Masks for L2D_OZQ_CANCELS0

Extension           PMC.umask[19:16]  Description
RECIRC              b0000             A recirculate was cancelled due to h/w limitations on the recirculate issue rate. Combines the following subevents that were available separately in the Itanium® 2 processor: RECIRC_OVER_SUB (caused by a recirculate oversubscription); DIDNT_RECIRC (caused because it did not recirculate); WEIRD (counts the cancels caused by attempted 5-cycle bypasses for non-aligned accesses and bypasses blocking recirculates for too long).
CANC_L2M_TO_L2C_ST  b0001             Caused by a cancelled store in L2M, L2D, or L2C. Combines the following subevents that were available separately in the Itanium 2 processor: CANC_L2M_ST (caused by a cancelled store in L2M); CANC_L2D_ST (caused by a cancelled store in L2D); CANC_L2C_ST (caused by a cancelled store in L2C).
L2A_ST_MAT          b0010             Cancelled due to an uncancelled store match in L2A.
L2M_ST_MAT          b0011             Cancelled due to an uncancelled store match in L2M.
L2D_ST_MAT          b0100             Cancelled due to an uncancelled store match in L2D.
L2C_ST_MAT          b0101             Cancelled due to an uncancelled store match in L2C.
ACQ                 b0110             Caused by an acquire somewhere in the OzQ or ER.
REL                 b0111             A release was cancelled due to some other operation.
BANK_CONF           b1000             A bypassed L2D hit operation had a bank conflict with an older sibling bypass or an older operation in the L2D pipeline.
SEMA                b1001             A semaphore op was cancelled for various ordering or h/w restriction reasons. Combines the following subevents that were available separately in the Itanium® 2 processor: SEM (a semaphore); CCV (a CCV).
OVER_SUB            b1010             A high OzQ issue rate resulted in the L2D having to cancel due to hardware restrictions. Combines the following subevents that were available separately in the Itanium 2 processor: OVER_SUB (oversubscription); L1DF_L2M (L1D fill in L2M).
OZQ_PREEMPT         b1011             An L2D fill return conflicted with, and cancelled, an OzQ request for various reasons. Formerly known as L1_FILL_CONF.
WB_CONF             b1100             An OzQ request conflicted with an L2D data array read for a writeback. Combines the following subevents that were available separately in the Itanium 2 processor: READ_WB_CONF (a write back conflict); ST_FILL_CONF (a store fill conflict).
MISC_ORDER          b1101             A sync.i or mf.a. Combines the following subevents that were available separately in the Itanium 2 processor: SYNC (caused by sync.i); MFA (a memory fence instruction).
FILL_ST_CONF        b1110             An OzQ store conflicted with a returning L2D fill.
OZDATA_CONF         b1111             An OzQ operation that needed to read the OzQ data buffer conflicted with a fill return that needed to do the same.


L2D_OZQ_CANCELS1 • Title: L2D OZQ Cancels (Late or Any) • Category: L2 Data Cache Set 0 IAR/DAR/OPC: Y/Y/Y • Event Code: 0xe2, Max. Inc/Cyc: 4, MT Capture Type: F • Definition: Counts the number of total L2D OZ Queue cancels (regardless of reason) or L2D OZ Queue cancels due to a specific reason (based on umask). • NOTE: This is a restricted set 0 L2D Cache event that is grouped with L2D_OZQ_CANCELS0 and L2D_OZQ_FULL. Only 1 of the 2 L2D_OZQ_CANCEL events may be measured at any given time. This event counts with the true thread id of the operation.

Table 4-95. Unit Masks for L2D_OZQ_CANCELS1

Extension            PMC.umask[19:16]  Description
ANY                  bxx00             counts the total OZ Queue cancels
LATE_SPEC_BYP        bxx01             counts the late cancels caused by speculative bypasses
SIBLING_ACQ_REL      bxx10             counts the late cancels caused by releases and acquires in the same issue group. Combines the following subevents that were available separately in the Itanium® 2 processor: LATE_ACQUIRE (late cancels caused by acquires); LATE_RELEASE (late cancels caused by releases).
LATE_BYP_EFFRELEASE  bxx11             counts the late cancels caused by L1D-to-L2A bypass effective releases

L2D_OZQ_FULL • Title: L2D OZQ Is Full • Category: L2 Data Cache Set 0 IAR/DAR/OPC: N/N/N • Event Code: 0xe1, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of times L2D OZQ is full. • NOTE: This is a restricted set 0 L2D cache event that is grouped with L2D_OZQ_CANCELS0/1. This event uses active thread as an approximation. This event is not .all capable.

Table 4-96. Unit Masks for L2D_OZQ_FULL

Extension  PMC.umask[19:16]  Description
THIS       b0000             L2D OZQ is full. NOTE: Not .all capable.
---        b0001-b1111       (* count is undefined *)


L2D_OZQ_RELEASE • Title: Release Ordering Attribute Exists in L2D OZQ • Category: L2 Data Cache Set 1 IAR/DAR/OPC: N/N/N • Event Code: 0xe5, Max. Inc/Cyc: 1, MT Capture Type: F • Definition: Counts the number of clocks entries with an "effective release" ordering attribute existed in the L2D OZ Queue. This includes not just architected .rel instructions, but also effective releases due to loads and stores to the same 4-byte chunk. • NOTE: This is a restricted set 1 L2D Cache event that is paired with L2D_BYPASS. This event uses active thread as an approximation. This event is not .all capable.

L2D_REFERENCES
• Title: Data Read/Write Access to L2D
• Category: L2 Data Cache Set 2   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xe6, Max. Inc/Cyc: 4, MT Capture Type: F
• Definition: Counts the number of requests made to L2D due to a data read and/or write access. Semaphore operations are counted as one read and one write.
• NOTE: This is a restricted set 2 L2D cache event that does not share an event code. Active thread is the true thread for the count.

Table 4-97. Unit Masks for L2D_REFERENCES

Extension   PMC.umask[19:16]   Description
---         bxx00              (* nothing will be counted *)
READS       bxx01              count only data read and semaphore operations
WRITES      bxx10              count only data write and semaphore operations
ALL         bxx11              count both read and write operations (semaphores will count as 2)
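Because a semaphore operation is counted once as a read and once as a write, the ALL umask counts each semaphore twice. The following is a minimal sketch of recovering the number of distinct L2D operations, assuming all three counts cover the same measurement interval and that a separate semaphore count is available from another event; the function name is illustrative, not an API.

    #include <stdint.h>

    /* Distinct L2D operations: a semaphore appears in both READS and
       WRITES, so subtract the semaphore count once. */
    uint64_t l2d_distinct_ops(uint64_t reads,      /* L2D_REFERENCES.READS */
                              uint64_t writes,     /* L2D_REFERENCES.WRITES */
                              uint64_t semaphores) /* semaphore count (assumed available) */
    {
        return reads + writes - semaphores;
    }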

L2D_STORE_HIT_SHARED
• Title: Store Hit a Shared Line
• Category: L2 Data Cache Set 5   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xed, Max. Inc/Cyc: 2, MT Capture Type: F
• Definition: Counts the number of times a store hit a shared line.
• NOTE: This is a restricted set 5 L2D Cache event that is paired with L2D_NUM_BAD_LINES_SELECTED. This event uses active thread as an approximation.

Table 4-98. Unit Masks for L2D_STORE_HIT_SHARED

Extension   PMC.umask[19:16]   Description
ANY         b0xxx              Store hit a shared line


L2D_VICTIMB_FULL
• Title: L2D Victim Buffer Is Full
• Category: L2 Data Cache Set 8   IAR/DAR/OPC: N/N/N
• Event Code: 0xf3, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of times the L2D Victim Buffer is full.
• NOTE: This is a restricted set 8 L2D Cache event that is paired with L2D_FILL_MESI_STATE. This event uses active thread as an approximation.

Table 4-99. Unit Masks for L2D_VICTIMB_FULL

Extension   PMC.umask[19:16]   Description
THIS        b0000              L2D victim buffer is full
---         b0001-b1111        (* count is undefined *)

L2I_DEMAND_READS
• Title: L2 Instruction Demand Fetch Requests
• Category: L1 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x42, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of instruction requests to L2I due to L1I demand fetch misses. This event counts the number of demand fetches that miss both the L1I and the ISB, regardless of whether they hit or miss in the RAB.
• NOTE: If a demand fetch does not have an L1ITLB miss, L2I_DEMAND_READS and L1I_READS line up in time. If a demand fetch does not have an L2ITLB miss, L2I_DEMAND_READS follows L1I_READS by 3-4 clocks (unless a flushed iwalk is pending ahead of it, which will increase the delay until the pending iwalk is finished). If a demand fetch has an L2ITLB miss, the skew between L2I_DEMAND_READS and L1I_READS is not deterministic.
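Since this event counts demand fetches that miss both the L1I and the ISB, it can be combined with L1I_READS (described elsewhere in this chapter) to estimate the L1I demand-fetch miss ratio. A hedged sketch, assuming both counters cover the same interval:

    #include <stdint.h>

    /* Approximate L1I demand-fetch miss ratio. */
    double l1i_demand_miss_ratio(uint64_t l2i_demand_reads, /* L2I_DEMAND_READS */
                                 uint64_t l1i_reads)        /* L1I_READS */
    {
        return l1i_reads ? (double)l2i_demand_reads / (double)l1i_reads : 0.0;
    }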

L2I_HIT_CONFLICTS
• Title: L2I Hit Conflicts
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x7d, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: The number of times a tagged demand or prefetch request from IFR to L2I would have hit a valid line in the L2I cache, except that L2I was forced to abandon the lookup, either because the internal L2I read buffers don’t have enough room left to buffer the read data (due to simultaneous higher-priority L3/BC fill data returns), or because a simultaneous snoop might be invalidating the line. In rare cases, this event may be signalled multiple times for a single request from IFR. The final time the request moves down the L2I pipeline, it also reports a single L2I_READS.HIT.DMND or L2I_READS.MISS.DMND(PFTCH) (or L2I_UC_READS.DMND(PFTCH)) event, as appropriate.
• NOTE: This event is qualified with IBRP1.

Table 4-100. Unit Masks for L2I_HIT_CONFLICTS

Extension    PMC.umask[19:16]   Description
---          00xx               None is counted
HIT.NONE     0100               None is counted
HIT.DMND     0101               Only demand fetches that hit the L2I are counted
HIT.PFTCH    0110               Only prefetches that hit the L2I are counted
HIT.ALL      0111               All fetches that hit the L2I are counted
MISS.NONE    1000               None is counted
MISS.DMND    1001               Only demand fetches that miss the L2I are counted
MISS.PFTCH   1010               Only prefetches that miss the L2I are counted
MISS.ALL     1011               All fetches that miss the L2I are counted
ALL.NONE     1100               None is counted
ALL.DMND     1101               Demand fetches that reference L2I are counted
ALL.PFTCH    1110               Prefetches that reference L2I are counted
ALL.ALL      1111               All fetches that reference L2I are counted

L2I_L3_REJECTS
• Title: L3 Rejects
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x7c, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: The number of times a tagged demand or prefetch request from IFR to L2I misses in the L2I cache and attempts to issue to L3/BC, but has its L3/BC request rejected. This could be the result of a queue full condition in L3/BC, or it could be due to a previous L2I or L2D request to L3/BC for the same address. In any case, the L2I keeps trying to send its request to L3/BC until it finally gets accepted. Note that this event is signalled once for every time an L3/BC request is rejected and sent again, and so it can fire multiple times for a single request from IFR. The request also reports a single L2I_READS.MISS.DMND(PFTCH) or L2I_UC_READS.DMND(PFTCH) (as appropriate) event.
• NOTE: This event is qualified with IBRP1.

Table 4-101. Unit Masks for L2I_L3_REJECTS

Extension    PMC.umask[19:16]   Description
---          00xx               None is counted
HIT.NONE     0100               None is counted
HIT.DMND     0101               Only demand fetches that hit the L2I are counted
HIT.PFTCH    0110               Only prefetches that hit the L2I are counted
HIT.ALL      0111               All fetches that hit the L2I are counted
MISS.NONE    1000               None is counted
MISS.DMND    1001               Only demand fetches that miss the L2I are counted
MISS.PFTCH   1010               Only prefetches that miss the L2I are counted
MISS.ALL     1011               All fetches that miss the L2I are counted
ALL.NONE     1100               None is counted
ALL.DMND     1101               Demand fetches that reference L2I are counted
ALL.PFTCH    1110               Prefetches that reference L2I are counted
ALL.ALL      1111               All fetches that reference L2I are counted

L2I_PREFETCHES
• Title: L2 Instruction Prefetch Requests
• Category: L1 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x45, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of prefetch requests issued to the L2I cache. The reported number includes streaming and non-streaming prefetches.
• NOTE: This event is qualified with IBRP1.

L2I_READS
• Title: L2I Cacheable Reads
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x78, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Provides the number of cacheable code reads handled by L2I. L2I_READS.MISS.ALL counts only the primary misses. Secondary misses are counted (possibly multiple times) by the L2I_RECIRCULATES event and are finally counted once as L2I_READS.HIT.ALL.
• NOTE: This event is qualified with IBRP1.

Table 4-102. Unit Masks for L2I_READS

Extension    PMC.umask[19:16]   Description
---          00xx               None is counted
HIT.NONE     0100               None is counted
HIT.DMND     0101               Only demand fetches that hit the L2I are counted
HIT.PFTCH    0110               Only prefetches that hit the L2I are counted
HIT.ALL      0111               All fetches that hit the L2I are counted
MISS.NONE    1000               None is counted
MISS.DMND    1001               Only demand fetches that miss the L2I are counted
MISS.PFTCH   1010               Only prefetches that miss the L2I are counted
MISS.ALL     1011               All fetches that miss the L2I are counted
ALL.NONE     1100               None is counted
ALL.DMND     1101               Demand fetches that reference L2I are counted
ALL.PFTCH    1110               Prefetches that reference L2I are counted
ALL.ALL      1111               All fetches that reference L2I are counted

L2I_RECIRCULATES
• Title: L2I Recirculates
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x7b, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: The number of times a tagged (demand or prefetch) request from IFR to L2I matches a line marked pending in the L2I cache. This means that L2I has already sent another request for this same address (but via a different RABID) to L3/BC, and is still waiting for its data to return. Thus, this new request cannot yet return data to IFR, but a duplicate request to L3/BC would just be rejected, so the request must enter a loop of waiting for an event (snoop or fill) which might change the status of the pending line, and then reissuing through the L2I pipeline; the cycle repeats until the request no longer hits pending. Note that this event is signalled every time a demand request reissues, and so it might fire multiple times for a single request from IFR. The final time the request moves down the L2I pipeline, it also reports either a single L2I_READS.HIT.DMND(PFTCH) or L2I_READS.MISS.DMND (or L2I_UC_READS.DMND) event.
• NOTE: This event is qualified with IBRP1.

Table 4-103. Unit Masks for L2I_RECIRCULATES

Extension    PMC.umask[19:16]   Description
---          00xx               None is counted
HIT.NONE     0100               None is counted
HIT.DMND     0101               Only demand fetches that hit the L2I are counted
HIT.PFTCH    0110               Only prefetches that hit the L2I are counted
HIT.ALL      0111               All fetches that hit the L2I are counted
MISS.NONE    1000               None is counted
MISS.DMND    1001               Only demand fetches that miss the L2I are counted
MISS.PFTCH   1010               Only prefetches that miss the L2I are counted
MISS.ALL     1011               All fetches that miss the L2I are counted
ALL.NONE     1100               None is counted
ALL.DMND     1101               Demand fetches that reference L2I are counted
ALL.PFTCH    1110               Prefetches that reference L2I are counted
ALL.ALL      1111               All fetches that reference L2I are counted
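Because a single secondary miss can recirculate many times before it finally counts as a hit, the ratio of recirculates to L2I reads gives a rough sense of how much L2I pipeline bandwidth reissues consume. A hedged sketch, assuming both events were measured with the same tag settings and ALL.ALL umasks:

    #include <stdint.h>

    /* Average recirculation passes per L2I read; values well above zero
       suggest heavy secondary-miss traffic. */
    double l2i_recirc_per_read(uint64_t recirculates, /* L2I_RECIRCULATES.ALL.ALL */
                               uint64_t reads)        /* L2I_READS.ALL.ALL */
    {
        return reads ? (double)recirculates / (double)reads : 0.0;
    }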


L2I_SPEC_ABORTS
• Title: L2I Speculative Aborts
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x7e, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Provides the number of speculative aborts. The event counts the number of times a tagged change-priority-to-demand request from IFR to L2I starts down the L2I pipeline, but L2I finds that it has already been looked up, and aborts the new lookup. This can occur because, in order to satisfy demand requests as quickly as possible, L1I starts to look up a demand request without waiting to check if that request has already been handled (as is possible if it was originally issued as a prefetch, and later the IFR changes its priority to a demand).
• NOTE: This event is qualified with IBRP1.

L2I_SNOOP_HITS
• Title: L2I Snoop Hits
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: N/N/N
• Event Code: 0x7f, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: The number of times a snoop request to L2I hits a valid line in the L2I cache and must invalidate that line. The thread ID reported for this event is simply the currently active thread.
• NOTE: This event is NOT qualified with tags.

L2I_UC_READS
• Title: L2I Uncacheable Reads
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x79, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Provides the number of UC code reads handled by L2I.
• NOTE: This event is qualified with IBRP1.

Table 4-104. Unit Masks for L2I_UC_READS

Extension    PMC.umask[19:16]   Description
---          00xx               None is counted
HIT.NONE     0100               None is counted
HIT.DMND     0101               Only demand fetches that hit the L2I are counted
HIT.PFTCH    0110               Only prefetches that hit the L2I are counted
HIT.ALL      0111               All fetches that hit the L2I are counted
MISS.NONE    1000               None is counted
MISS.DMND    1001               Only demand fetches that miss the L2I are counted
MISS.PFTCH   1010               Only prefetches that miss the L2I are counted
MISS.ALL     1011               All fetches that miss the L2I are counted
ALL.NONE     1100               None is counted
ALL.DMND     1101               Demand fetches that reference L2I are counted
ALL.PFTCH    1110               Prefetches that reference L2I are counted
ALL.ALL      1111               All fetches that reference L2I are counted

L2I_VICTIMIZATION
• Title: L2I Victimizations
• Category: L2 Instruction Cache and Prefetch   IAR/DAR/OPC: Y/N/N
• Event Code: 0x7a, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: The number of times a tagged demand or prefetch request from IFR to L2I misses in the L2I cache and needs to replace a valid line. The request also reports a single L2I_READS.MISS.DMND event.
• NOTE: This event is qualified with IBRP1.

L3_INSERTS
• Title: L3 Cache Line Inserts
• Category: L3 Unified Cache   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xda, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of valid L3 lines that have been written to the L3 due to an L2 miss allocation.
• NOTE: This event may be qualified by MESI. If this event is filtered by the PMC’s MESI bits, the filter will apply to the current cache line rather than the incoming line. To measure all events, the MESI filter must be set to b1111.

L3_LINES_REPLACED
• Title: L3 Cache Lines Replaced
• Category: L3 Unified Cache   IAR/DAR/OPC: N/N/N
• Event Code: 0xdf, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of valid L3 lines (dirty victims) that have been replaced. Exclusive clean/shared and clean castouts may also be counted depending on platform specific settings.
• NOTE: This event may be qualified by MESI. If this event is filtered by the PMC’s MESI bits, the filter will apply to the current cache line rather than the incoming line. To measure all events, the MESI filter must be set to b1111.

L3_MISSES
• Title: L3 Misses
• Category: L3 Unified Cache   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xdc, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of L3 cache misses. Includes misses caused by instruction fetch, data read/write, L2D write backs, and the HPW.


L3_READS
• Title: L3 Reads
• Category: L3 Unified Cache   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xdd, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of L3 cache read accesses.
• NOTE: This event may be qualified by MESI. To measure all events, the MESI filter must be set to b1111.

Table 4-105. Unit Masks for L3_READS

Extension          PMC.umask[19:16]   Description
---                b0000              (* nothing will be counted *)
DINST_FETCH.HIT    b0001              L3 Demand Instruction Fetch Hits
DINST_FETCH.MISS   b0010              L3 Demand Instruction Fetch Misses
DINST_FETCH.ALL    b0011              L3 Demand Instruction References
---                b0100              (* nothing will be counted *)
INST_FETCH.HIT     b0101              L3 Instruction Fetch and Prefetch Hits
INST_FETCH.MISS    b0110              L3 Instruction Fetch and Prefetch Misses
INST_FETCH.ALL     b0111              L3 Instruction Fetch and Prefetch References
---                b1000              (* nothing will be counted *)
DATA_READ.HIT      b1001              L3 Load Hits (excludes reads for ownership used to satisfy stores)
DATA_READ.MISS     b1010              L3 Load Misses (excludes reads for ownership used to satisfy stores)
DATA_READ.ALL      b1011              L3 Load References (excludes reads for ownership used to satisfy stores)
---                b1100              (* nothing will be counted *)
ALL.HIT            b1101              L3 Read Hits
ALL.MISS           b1110              L3 Read Misses
ALL.ALL            b1111              L3 Read References

L3_REFERENCES
• Title: L3 References
• Category: L3 Unified Cache   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xdb, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of L3 accesses. Includes instruction fetch/prefetch, data read/write, and L2D write backs.
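L3_MISSES and L3_REFERENCES cover the same request population (instruction fetch, data read/write, and L2D write backs), so their quotient gives an overall L3 miss ratio. A minimal sketch, assuming both counts come from the same measurement interval:

    #include <stdint.h>

    double l3_miss_ratio(uint64_t l3_misses,     /* L3_MISSES */
                         uint64_t l3_references) /* L3_REFERENCES */
    {
        return l3_references ? (double)l3_misses / (double)l3_references : 0.0;
    }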


L3_WRITES
• Title: L3 Writes
• Category: L3 Unified Cache   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xde, Max. Inc/Cyc: 1, MT Capture Type: F
• Definition: Counts the number of L3 cache write accesses.
• NOTE: This event may be qualified by MESI. If this event is filtered by the PMC’s MESI bits, the filter will apply to the current cache line rather than the incoming line. To measure all events, the MESI filter must be set to b1111.

Table 4-106. Unit Masks for L3_WRITES

Extension         PMC.umask[19:16]   Description
---               b00xx              (* nothing will be counted *)
---               b0100              (* nothing will be counted *)
DATA_WRITE.HIT    b0101              L3 Store Hits (excludes L2D write backs, includes L3 read for ownership requests that satisfy stores)
DATA_WRITE.MISS   b0110              L3 Store Misses (excludes L2D write backs, includes L3 read for ownership requests that satisfy stores)
DATA_WRITE.ALL    b0111              L3 Store References (excludes L2D write backs, includes L3 read for ownership requests that satisfy stores)
---               b1000              (* nothing will be counted *)
L2_WB.HIT         b1001              L2D Write Back Hits
L2_WB.MISS        b1010              L2D Write Back Misses
L2_WB.ALL         b1011              L2D Write Back References
---               b1100              (* nothing will be counted *)
ALL.HIT           b1101              L3 Write Hits
ALL.MISS          b1110              L3 Write Misses
ALL.ALL           b1111              L3 Write References

LOADS_RETIRED
• Title: Retired Loads
• Category: Instruction Execution/L1D Cache Set 3   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xcd, Max. Inc/Cyc: 4, MT Capture Type: A
• Definition: Counts the number of retired loads, excluding predicated-off loads. The count includes integer, floating-point, RSE, semaphore, VHPT, and uncacheable loads, as well as check loads that missed in both the ALAT and L1D (the only time a check load looks like any other load). Also included are loads generated by squashed HPW walks.
• NOTE: This is a restricted set 3 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.


LOADS_RETIRED_INTG
• Title: Integer Loads Retired
• Category: Instruction Execution/L1D Cache Set 6   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xd8, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of retired integer loads.
• NOTE: This is a restricted set 6 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.

MEM_READ_CURRENT
• Title: Current Mem Read Transactions On Bus
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x89, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of current memory read transactions (BRC) on the bus.

Table 4-107. Unit Masks for MEM_READ_CURRENT

Extension   PMC.umask[19:16]   Description
---         bxx00              (* illegal selection *)
IO          bxx01              non-CPU priority agents
---         bxx10              (* illegal selection *)
ANY         bxx11              CPU or non-CPU (all transactions)

MISALIGNED_LOADS_RETIRED
• Title: Retired Misaligned Load Instructions
• Category: Instruction Execution/L1D Cache Set 3   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xce, Max. Inc/Cyc: 4, MT Capture Type: A
• Definition: Counts the number of retired misaligned load instructions, excluding those that were predicated off. It includes integer and floating-point loads, semaphores, and check loads that missed in both the ALAT and L1D (the only time a check load looks like any other load).
• NOTE: If a misaligned load takes a trap, it will not be counted here, since only retired loads are counted. A misaligned load avoids the trap only when alignment checking is disabled and the access does not cross the 0-7 or 8-15 byte boundary. This is a restricted set 3 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.
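MISALIGNED_LOADS_RETIRED and LOADS_RETIRED are both set 3 L1D cache events covering retired loads, so their quotient gives the fraction of retired loads that were misaligned. A sketch, assuming both events are measured together under the set 3 PMD5 restriction:

    #include <stdint.h>

    double misaligned_load_fraction(uint64_t misaligned, /* MISALIGNED_LOADS_RETIRED */
                                    uint64_t loads)      /* LOADS_RETIRED */
    {
        return loads ? (double)misaligned / (double)loads : 0.0;
    }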

MISALIGNED_STORES_RETIRED
• Title: Retired Misaligned Store Instructions
• Category: Instruction Execution/L1D Cache Set 4   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xd2, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of retired misaligned store instructions, excluding those that were predicated off. It includes integer, floating-point, semaphore, and uncacheable stores.
• NOTE: If a misaligned store takes a trap, it will not be counted here, since only retired stores are counted. A misaligned store avoids the trap only when alignment checking is disabled and the access does not cross the 0-15 byte boundary of a WB page. This is a restricted set 4 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. Only ports 2 and 3 are counted.

NOPS_RETIRED
• Title: Retired NOP Instructions
• Category: Instruction Execution   IAR/DAR/OPC: Y/N/Y
• Event Code: 0x50, Max. Inc/Cyc: 6, MT Capture Type: A
• Definition: Provides information on the number of retired NOP instructions of all types, excluding NOP instructions that were predicated off.

PREDICATE_SQUASHED_RETIRED
• Title: Instructions Squashed Due to Predicate Off
• Category: Instruction Execution   IAR/DAR/OPC: Y/N/Y
• Event Code: 0x51, Max. Inc/Cyc: 6, MT Capture Type: A
• Definition: Provides information on the number of instructions squashed due to a false qualifying predicate. Includes all non-B-syllable instructions which reached retirement with a false predicate.

RSE_CURRENT_REGS_2_TO_0
• Title: Current RSE Registers (Bits 2:0)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x2b, Max. Inc/Cyc: 7, MT Capture Type: A
• Definition: Counts the number of current RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The lowest 3 bits are stored in this counter (bits 2:0).

RSE_CURRENT_REGS_5_TO_3
• Title: Current RSE Registers (Bits 5:3)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x2a, Max. Inc/Cyc: 7, MT Capture Type: A
• Definition: Counts the number of current RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The middle 3 bits are stored in this counter (bits 5:3).

RSE_CURRENT_REGS_6
• Title: Current RSE Registers (Bit 6)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x26, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of current RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The highest bit is stored in this counter (bit 6).


RSE_DIRTY_REGS_2_TO_0
• Title: Dirty RSE Registers (Bits 2:0)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x29, Max. Inc/Cyc: 7, MT Capture Type: A
• Definition: Counts the number of dirty RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The lowest 3 bits are stored in this counter (bits 2:0).

RSE_DIRTY_REGS_5_TO_3
• Title: Dirty RSE Registers (Bits 5:3)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x28, Max. Inc/Cyc: 7, MT Capture Type: A
• Definition: Counts the number of dirty RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The middle 3 bits are stored in this counter (bits 5:3).

RSE_DIRTY_REGS_6
• Title: Dirty RSE Registers (Bit 6)
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x24, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of dirty RSE registers before an RSE_EVENT_RETIRED occurred. The Montecito processor can have a total of 96 per cycle. The highest bit is stored in this counter (bit 6).

RSE_EVENT_RETIRED
• Title: Retired RSE Operations
• Category: RSE Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x32, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of retired RSE operations (i.e., alloc, call, return, loadrs, flushrs, cover, and rfi; see NOTE). This event is an indication of when instructions which affect the RSE are retired (which may or may not cause activity to the memory subsystem).
• NOTE: The only time 2 RSE events can be retired in 1 clock is a flushrs/call or flushrs/return bundle. These corner cases are counted as 1 event instead of 2, since this event is used to calculate the average number of current/dirty/invalid registers. rfi instructions will be included only if the IFS valid bit is 1; this can be set either by using the cover instruction prior to the rfi, or by explicitly setting the valid bit.
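The three RSE_CURRENT_REGS (or RSE_DIRTY_REGS) counters each accumulate one slice of a 7-bit register count sampled at every retired RSE operation, so the average register count falls out of reassembling the bit slices and dividing by RSE_EVENT_RETIRED. A minimal sketch; the function name is illustrative, and the raw totals are assumed to come from a platform-specific PMD read:

    #include <stdint.h>

    /* Reassemble the register-count sum from its bit slices: bit 6,
       bits 5:3, and bits 2:0 accumulate in separate counters. */
    double avg_rse_regs(uint64_t bits_2_0,   /* RSE_*_REGS_2_TO_0 */
                        uint64_t bits_5_3,   /* RSE_*_REGS_5_TO_3 */
                        uint64_t bit_6,      /* RSE_*_REGS_6 */
                        uint64_t rse_events) /* RSE_EVENT_RETIRED */
    {
        uint64_t sum = (bit_6 << 6) + (bits_5_3 << 3) + bits_2_0;
        return rse_events ? (double)sum / (double)rse_events : 0.0;
    }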


RSE_REFERENCES_RETIRED
• Title: RSE Accesses
• Category: RSE Events   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0x20, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of retired RSE loads and stores (every time RSE.bof reaches RSE.storereg; otherwise known as mandatory events, including rnat fills and spills). This event is an indication of when the RSE causes activity to the memory subsystem.
• NOTE: Privilege level for DBR tags is determined by the RSC register, but privilege level for IBR tags is determined by PSR.cpl. RSE traffic which is caused by rfi will be tagged by the target of the rfi.

Table 4-108. Unit Masks for RSE_REFERENCES_RETIRED

Extension   PMC.umask[19:16]   Description
---         bxx00              (* nothing will be counted *)
LOAD        bxx01              Only RSE loads will be counted
STORE       bxx10              Only RSE stores will be counted
ALL         bxx11              Both RSE loads and stores will be counted

SERIALIZATION_EVENTS
• Title: Number of srlz.i Instructions
• Category: System Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x53, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of srlz.i instructions (each srlz.i causes a microtrap and an xpnflush fires).

SI_CCQ_COLLISIONS
• Title: Clean Castout Queue Collisions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa8, Max. Inc/Cyc: 2, MT Capture Type: C
• Definition: Counts the number of address collisions between a CCQ entry and an incoming FSB transaction.

Table 4-109. Unit Masks for SI_CCQ_COLLISIONS

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core


SI_CCQ_INSERTS
• Title: Clean Castout Queue Insertions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa5, Max. Inc/Cyc: 2, MT Capture Type: S
• Definition: Counts insertions into the CCQ.

Table 4-110. Unit Masks for SI_CCQ_INSERTS

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_CCQ_LIVE_REQ_HI
• Title: Clean Castout Queue Requests (upper bit)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa7, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of live CCQ requests. The Montecito processor can have a total of 8 per cycle. The upper bit is stored in this counter.

Table 4-111. Unit Masks for SI_CCQ_LIVE_REQ_HI

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_CCQ_LIVE_REQ_LO
• Title: Clean Castout Queue Requests (lower three bits)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa6, Max. Inc/Cyc: 7, MT Capture Type: C
• Definition: Counts the number of live CCQ requests. The Montecito processor can have a total of 8 per cycle. The lower three bits are stored in this counter.

Table 4-112. Unit Masks for SI_CCQ_LIVE_REQ_LO

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core


SI_CYCLES
• Title: SI Cycles
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x8e, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts SI clock cycles. The SI clock is: bus_ratio * bus_clock.

SI_IOQ_COLLISIONS
• Title: In Order Queue Collisions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xaa, Max. Inc/Cyc: 2, MT Capture Type: C
• Definition: Counts the number of address collisions between an IOQ entry and an outgoing FSB transaction.

SI_IOQ_LIVE_REQ_HI
• Title: Inorder Bus Queue Requests (upper bit)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x98, Max. Inc/Cyc: 2, MT Capture Type: C
• Definition: Counts the number of live in-order bus requests. The Montecito processor can have a total of 8 per cycle. The upper bit is stored in this counter.

SI_IOQ_LIVE_REQ_LO
• Title: Inorder Bus Queue Requests (lower three bits)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x97, Max. Inc/Cyc: 3, MT Capture Type: C
• Definition: Counts the number of live in-order bus requests. The Montecito processor can have a total of 8 per cycle. The lower three bits are stored in this counter.
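The HI/LO live-request pairs in this category split a per-cycle occupancy value (0 to 8) into an upper bit and the lower three bits, accumulated every cycle. Average queue depth therefore falls out of reassembling the two totals and dividing by SI_CYCLES. A sketch that applies to the IOQ pair here, and equally to the CCQ, RQ, SCB, and WRITEQ pairs; the function name is illustrative:

    #include <stdint.h>

    /* Average live-request occupancy for an SI queue over an interval. */
    double avg_si_queue_depth(uint64_t live_hi,   /* *_LIVE_REQ_HI (upper bit) */
                              uint64_t live_lo,   /* *_LIVE_REQ_LO (lower 3 bits) */
                              uint64_t si_cycles) /* SI_CYCLES */
    {
        uint64_t occupancy_sum = (live_hi << 3) + live_lo;
        return si_cycles ? (double)occupancy_sum / (double)si_cycles : 0.0;
    }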

SI_RQ_INSERTS
• Title: Request Queue Insertions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x9e, Max. Inc/Cyc: 2, MT Capture Type: S
• Definition: Counts insertions into the RQ.

Table 4-113. Unit Masks for SI_RQ_INSERTS

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core


SI_RQ_LIVE_REQ_HI
• Title: Request Queue Requests (upper bit)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa0, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of live RQ requests. The Montecito processor can have a total of 8 per cycle. The upper bit is stored in this counter.

Table 4-114. Unit Masks for SI_RQ_LIVE_REQ_HI

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_RQ_LIVE_REQ_LO
• Title: Request Queue Requests (lower three bits)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0x9f, Max. Inc/Cyc: 7, MT Capture Type: C
• Definition: Counts the number of live RQ requests. The Montecito processor can have a total of 8 per cycle. The lower three bits are stored in this counter.

Table 4-115. Unit Masks for SI_RQ_LIVE_REQ_LO

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_SCB_INSERTS
• Title: Snoop Coalescing Buffer Insertions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xab, Max. Inc/Cyc: 4, MT Capture Type: C
• Definition: Counts the number of insertions into the snoop coalescing buffer as specified with the following unit mask. This is the ‘raw’ coherency answer of each core before being logically combined (see SI_SCB_SIGNOFFS below).

Table 4-116. Unit Masks for SI_SCB_INSERTS

Extension     PMC.umask[19:16]   Description
MISS.EITHER   b00x0              count MISS snoop signoffs (plus backsnoop inserts) from either cpu core
MISS.SELF     b00x1              count MISS snoop signoffs (plus backsnoop inserts) from ‘this’ cpu core
HIT.EITHER    b01x0              count HIT snoop signoffs from either cpu core
HIT.SELF      b01x1              count HIT snoop signoffs from ‘this’ cpu core
HITM.EITHER   b10x0              count HITM snoop signoffs from either cpu core
HITM.SELF     b10x1              count HITM snoop signoffs from ‘this’ cpu core
ALL.EITHER    b11x0              count all snoop signoffs (plus backsnoop inserts) from either cpu core
ALL.SELF      b11x1              count all snoop signoffs (plus backsnoop inserts) from ‘this’ cpu core

SI_SCB_LIVE_REQ_HI
• Title: Snoop Coalescing Buffer Requests (upper bit)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xad, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of live SCB requests. The Montecito processor can have a total of 8 per cycle. The upper bit is stored in this counter.

Table 4-117. Unit Masks for SI_SCB_LIVE_REQ_HI

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_SCB_LIVE_REQ_LO
• Title: Snoop Coalescing Buffer Requests (lower three bits)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xac, Max. Inc/Cyc: 7, MT Capture Type: C
• Definition: Counts the number of live SCB requests. The Montecito processor can have a total of 8 per cycle. The lower three bits are stored in this counter.

Table 4-118. Unit Masks for SI_SCB_LIVE_REQ_LO

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core


SI_SCB_SIGNOFFS
• Title: Snoop Coalescing Buffer Coherency Signoffs
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xae, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of snoop signoffs driven from the snoop coalescing buffer as specified with the following unit mask. This is the ‘final’ coherency answer driven to the FSB, determined once the coherency answer from each core is collected.

Table 4-119. Unit Masks for SI_SCB_SIGNOFFS

Extension   PMC.umask[19:16]   Description
MISS        b00xx              count MISS snoop signoffs
HIT         b01xx              count HIT snoop signoffs
HITM        b10xx              count HITM snoop signoffs
ALL         b11xx              count all snoop signoffs

SI_WAQ_COLLISIONS
• Title: Write Address Queue Collisions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa4, Max. Inc/Cyc: 1, MT Capture Type: S
• Definition: Counts the number of address collisions between a WAQ entry and an incoming FSB transaction.

Table 4-120. Unit Masks for SI_WAQ_COLLISIONS

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_WDQ_ECC_ERRORS
• Title: Write Data Queue ECC Errors
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xaf, Max. Inc/Cyc: 2, MT Capture Type: S
• Definition: Counts the number of ECC errors detected in the WDQ. Qualified by the following unit mask.

Table 4-121. Unit Masks for SI_WDQ_ECC_ERRORS

Extension    PMC.umask[19:16]   Description
SGL.EITHER   b00x0              count single-bit ecc errors from either cpu core
SGL.SELF     b00x1              count single-bit ecc errors from ‘this’ cpu core
DBL.EITHER   b01x0              count double-bit ecc errors from either cpu core
DBL.SELF     b01x1              count double-bit ecc errors from ‘this’ cpu core
ALL.EITHER   b1xx0              count all ecc errors from either cpu core
ALL.SELF     b1xx1              count all ecc errors from ‘this’ cpu core

SI_WRITEQ_INSERTS
• Title: Write Queue Insertions
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa1, Max. Inc/Cyc: 2, MT Capture Type: S
• Definition: Counts insertions into the WRITEQ. Qualified by the following unit mask.

Table 4-122. Unit Masks for SI_WRITEQ_INSERTS

Extension       PMC.umask[19:16]   Description
ALL.EITHER      b0000              count all types of insertions into the write queue from either cpu core
ALL.SELF        b0001              count all types of insertions into the write queue from ‘this’ cpu core
IWB.EITHER      b0010              count implicit write back insertions into the write queue from either cpu core
IWB.SELF        b0011              count implicit write back insertions into the write queue from ‘this’ cpu core
EWB.EITHER      b0100              count explicit write back insertions into the write queue from either cpu core
EWB.SELF        b0101              count explicit write back insertions into the write queue from ‘this’ cpu core
WC1_8A.EITHER   b0110              count WC (size 1 thru 8 bytes) insertions into the write queue from either cpu core (addr[3] = 0)
WC1_8A.SELF     b0111              count WC (size 1 thru 8 bytes) insertions into the write queue from ‘this’ cpu core (addr[3] = 0)
WC16.EITHER     b1000              count WC (size 16 bytes) insertions into the write queue from either cpu core
WC16.SELF       b1001              count WC (size 16 bytes) insertions into the write queue from ‘this’ cpu core
WC32.EITHER     b1010              count WC (size 32 bytes) insertions into the write queue from either cpu core
WC32.SELF       b1011              count WC (size 32 bytes) insertions into the write queue from ‘this’ cpu core
NEWB.EITHER     b1100              count ‘nuked’ explicit write back insertions into the write queue from either cpu core
NEWB.SELF       b1101              count ‘nuked’ explicit write back insertions into the write queue from ‘this’ cpu core
WC1_8B.EITHER   b1110              count WC (size 1 thru 8 bytes) insertions into the write queue from either cpu core (addr[3] = 1)
WC1_8B.SELF     b1111              count WC (size 1 thru 8 bytes) insertions into the write queue from ‘this’ cpu core (addr[3] = 1)


SI_WRITEQ_LIVE_REQ_HI
• Title: Write Queue Requests (upper bit)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa3, Max. Inc/Cyc: 1, MT Capture Type: C
• Definition: Counts the number of live WRITEQ requests. The Montecito processor can have a total of 8 per cycle. The upper bit is stored in this counter.

Table 4-123. Unit Masks for SI_WRITEQ_LIVE_REQ_HI

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SI_WRITEQ_LIVE_REQ_LO
• Title: Write Queue Requests (lower three bits)
• Category: Front-Side Bus   IAR/DAR/OPC: N/N/N
• Event Code: 0xa2, Max. Inc/Cyc: 7, MT Capture Type: C
• Definition: Counts the number of live WRITEQ requests. The Montecito processor can have a total of 8 per cycle. The lower three bits are stored in this counter.

Table 4-124. Unit Masks for SI_WRITEQ_LIVE_REQ_LO

Extension   PMC.umask[19:16]   Description
EITHER      bxxx0              transactions initiated by either cpu core
SELF        bxxx1              transactions initiated by ‘this’ cpu core

SPEC_LOADS_NATTED
• Title: Number of Speculative Integer Loads that are NaT’d
• Category: Speculation Events   IAR/DAR/OPC: Y/Y/N
• Event Code: 0xd9, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of integer speculative loads that have been NaT’d. This event has the following umasks.

Table 4-125. Unit Masks for SPEC_LOADS_NATTED

Extension       PMC.umask[19:16]   Description
ALL             b0000              Count all NaT’d loads
VHPT_MISS       b0001              Only loads NaT’d due to VHPT miss
DEF_TLB_MISS    b0010              Only loads NaT’d due to deferred TLB misses
DEF_TLB_FAULT   b0011              Only loads NaT’d due to deferred TLB faults
NAT_CNSM        b0100              Only loads NaT’d due to NaT consumption
DEF_PSR_ED      b0101              Only loads NaT’d due to effect of PSR.ed

STORES_RETIRED
• Title: Retired Stores
• Category: Instruction Execution/L1D Cache Set 4   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xd1, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of retired stores, excluding those that were predicated off. The count includes integer, floating-point, semaphore, RSE, VHPT, and uncacheable stores.
• NOTE: This is a restricted set 4 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5. Only ports 2 and 3 are counted.

SYLL_NOT_DISPERSED
• Title: Syllables Not Dispersed
• Category: Instruction Dispersal Events   IAR/DAR/OPC: Y/N/N
• Event Code: 0x4e, Max. Inc/Cyc: 5, MT Capture Type: A
• Definition: Counts the number of syllables not dispersed due to all reasons except stalls. A unit mask can break this down to 1 of 4 possible components.

Table 4-126. Unit Masks for SYLL_NOT_DISPERSED

Extension   PMC.umask[19:16]   Description
EXPL        bxxx1              Count syllables not dispersed due to explicit stop bits. These consist of the programmer-specified architected S-bit and templates 1 and 5. Dispersal takes a 6-syllable (3-syllable) hit for every template 1/5 in bundle 0 (1). Dispersal takes a 3-syllable (0-syllable) hit for every S-bit in bundle 0 (1).
IMPL        bxx1x              Count syllables not dispersed due to implicit stop bits. These consist of all of the non-architected stop bits (asymmetry, oversubscription, implicit). Dispersal takes a 6-syllable (3-syllable) hit for every implicit stop bit in bundle 0 (1).
FE          bx1xx              Count syllables not dispersed due to the front-end not providing valid bundles or providing valid illegal templates. Dispersal takes a 3-syllable hit for every invalid bundle or valid illegal template from the front-end. Bundle 1 with a front-end fault is counted here (3-syllable hit).
MLX         b1xxx              Count syllables not dispersed due to MLX bundles and resteers to a non-0 syllable. Dispersal takes a 1-syllable hit for each MLX bundle. Dispersal could take a 0-2 syllable hit depending on which syllable is re-steered to. Bundle 1 with a front-end fault which is split is counted here (0-2 syllable hit).
ALL         b1111              Count all syllables not dispersed. NOTE: Any combination b0000-b1111 is valid.


SYLL_OVERCOUNT
• Title: Number of Overcounted Syllables
• Category: Instruction Dispersal Events   IAR/DAR/OPC: Y/N/N
• Event Code: 0x4f, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of syllables which were overcounted in the explicit and/or implicit stop bits portion of SYLL_NOT_DISPERSED.

Table 4-127. Unit Masks for SYLL_OVERCOUNT

Extension   PMC.umask[19:16]   Description
---         bxx00              (* nothing will be counted *)
EXPL        bxx01              Only syllables overcounted in the explicit bucket
IMPL        bxx10              Only syllables overcounted in the implicit bucket
ALL         bxx11              Syllables overcounted in both the implicit and explicit buckets
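Since SYLL_OVERCOUNT exists to correct double counting in the explicit and implicit buckets of SYLL_NOT_DISPERSED, the corrected figure is a simple subtraction. A minimal sketch, assuming both events were collected with matching umask selections:

    #include <stdint.h>

    uint64_t corrected_syll_not_dispersed(uint64_t not_dispersed, /* SYLL_NOT_DISPERSED */
                                          uint64_t overcount)     /* SYLL_OVERCOUNT.ALL */
    {
        return not_dispersed - overcount;
    }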

THREAD_SWITCH_CYCLE
• Title: Thread Switch Overhead Cycles
• Category: Thread Switch Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x0e, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts various cycle overheads related to thread switches.

Table 4-128. Unit Masks for THREAD_SWITCH_CYCLE

Extension   PMC.umask[19:16]   Description
---         bx000              (* nothing will be counted *)
CRAB        bx001              Cycles TSs are stalled due to CRAB operation
L2D         bx010              Cycles TSs are stalled due to L2D return operation
ANYSTALL    bx011              Cycles TSs are stalled due to any reason
PCR         bx100              Cycles we run with PCR.sd set
---         bx101              (* nothing will be counted *)
ALL_GATED   bx110              Cycles TSs are gated due to any reason. NOTE: The THREAD_SWITCH_GATED event is available to monitor the number of times TSs have been gated.
TOTAL       bx111              Total time from when a TS opportunity is seized until the TS happens

THREAD_SWITCH_EVENTS
• Title: Thread Switch Events
• Category: Thread Switch Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x0c, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of thread switches and their causes.


Table 4-129. Unit Masks for THREAD_SWITCH_EVENTS

Extension   PMC.umask[19:16]   Description
MISSED      bx000              TS opportunities missed
L3MISS      bx001              TSs due to L3 miss
TIMER       bx010              TSs due to time out
HINT        bx011              TSs due to hint instruction
LP          bx100              TSs due to low power operation
DBG         bx101              TSs due to debug operations
---         bx110              (* count is undefined *)
ALL         bx111              All taken TSs
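Combining the ALL umask of this event with the TOTAL umask of THREAD_SWITCH_CYCLE yields the average overhead of a taken thread switch. A minimal sketch, assuming both counts come from the same measurement interval; the function name is illustrative:

    #include <stdint.h>

    double avg_ts_overhead_cycles(uint64_t ts_cycles_total, /* THREAD_SWITCH_CYCLE.TOTAL */
                                  uint64_t ts_taken)        /* THREAD_SWITCH_EVENTS.ALL */
    {
        return ts_taken ? (double)ts_cycles_total / (double)ts_taken : 0.0;
    }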

THREAD_SWITCH_GATED
• Title: Thread Switches Gated
• Category: Thread Switch Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x0d, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts the number of thread switches gated and their causes.

Table 4-130. Unit Masks for THREAD_SWITCH_GATED

Extension   PMC.umask[19:16]   Description
---         bx000              (* nothing will be counted *)
LP          bx001              TSs gated due to LP
---         bx010              (* nothing will be counted *)
---         bx011              (* nothing will be counted *)
PIPE        bx100              Gated due to pipeline operations
FWDPRO      bx101              Gated due to forward progress reasons
---         bx110              (* nothing will be counted *)
ALL         bx111              TSs gated for any reason

THREAD_SWITCH_STALL
• Title: Thread Switch Stall
• Category: Thread Switch Events   IAR/DAR/OPC: N/N/N
• Event Code: 0x0f, Max. Inc/Cyc: 1, MT Capture Type: A
• Definition: Counts how many times the processor stalled for more than the number of CPU cycles specified in the umask field before deciding to do a thread switch. This is done by counting cycles from the last instruction retired to the point where the processor decided to pend a TS.


Table 4-131. Unit Masks for THREAD_SWITCH_STALL

Extension   PMC.umask[19:16]   Description
GTE_4       b0000              >= 4 cycles (any latency)
GTE_8       b0001              >= 8 cycles
GTE_16      b0010              >= 16 cycles
GTE_32      b0011              >= 32 cycles
GTE_64      b0100              >= 64 cycles
GTE_128     b0101              >= 128 cycles
GTE_256     b0110              >= 256 cycles
GTE_512     b0111              >= 512 cycles
GTE_1024    b1000              >= 1024 cycles
GTE_2048    b1001              >= 2048 cycles
GTE_4096    b1010              >= 4096 cycles
---         b1011-b1111        (* nothing will be counted *)
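Because each umask counts stalls at or above a threshold, differencing adjacent thresholds yields a stall-latency histogram. A sketch, assuming the eleven counts were gathered across separate measurement runs (a counter monitors a single umask at a time):

    #include <stdio.h>
    #include <stdint.h>

    /* gte[0..10] hold counts for GTE_4 through GTE_4096. */
    void print_ts_stall_histogram(const uint64_t gte[11])
    {
        static const int t[11] = {4, 8, 16, 32, 64, 128, 256, 512,
                                  1024, 2048, 4096};
        for (int i = 0; i < 10; i++)
            printf("[%4d, %4d) cycles: %llu stalls\n", t[i], t[i + 1],
                   (unsigned long long)(gte[i] - gte[i + 1]));
        printf(">= 4096 cycles: %llu stalls\n", (unsigned long long)gte[10]);
    }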

UC_LOADS_RETIRED
• Title: Retired Uncacheable Loads
• Category: Instruction Execution/L1D Cache Set 3   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xcf, Max. Inc/Cyc: 4, MT Capture Type: A
• Definition: Counts the number of retired uncacheable load instructions, excluding those that were predicated off. It includes integer, floating-point, semaphore, RSE, and VHPT loads, as well as check loads that missed in both the ALAT and L1D (the only time a check load looks like any other load).
• NOTE: This is a restricted set 3 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.

UC_STORES_RETIRED
• Title: Retired Uncacheable Stores
• Category: Instruction Execution/L1D Cache Set 4   IAR/DAR/OPC: Y/Y/Y
• Event Code: 0xd0, Max. Inc/Cyc: 2, MT Capture Type: A
• Definition: Counts the number of retired uncacheable store instructions. It includes integer, floating-point, and RSE uncacheable stores (only on ports 2 and 3).
• NOTE: This is a restricted set 4 L1D Cache event. In order to measure this event, one of the events in this set must be measured by PMD5.

§

