IA-64 Architecture Overview


A High-Performance Computing Architecture
[email protected], Internet Solutions Group EMEA Technical Marketing, July 2000
Copyright © 2000, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Agenda
Motivation and IA-64 feature overview
IA-64 features
• EPIC
• Data types, memory and registers
• Register stack
• Predication and parallel compares
• Software pipelining and register rotation
• Control & data speculation
• Branch architecture
• Integer architecture
• Floating point architecture
Itanium™ processor overview
Itanium™ processor based systems overview
Operating systems, tools and programming

IA-64: Extending the Intel® Architecture
Designed for high-performance computing:
• Scientific
• Technical & engineering
• Business
The IA-64 architecture uses the new EPIC technology; the Itanium™ processor is the first implementation of IA-64.

Performance Limiters
Parallelism not fully utilized
• Existing architectures cannot exploit sufficient parallelism in integer code to feed a wide in-order implementation
Branches
• Even with perfect branch prediction, small basic blocks of code do not fully utilize machine width
Procedure calls
• Software modularity is becoming standard, resulting in call/return overhead
Memory latency and address space
• Memory latency is increasing relative to processor cycle time (larger cache-miss penalties), and the address space is limited
IA-64 overcomes these limitations, and more.

IA-64 Architectural Features
Against these limiters (parallelism, branches, procedure calls, and memory latency), IA-64 sets:
• 64-bit-address flat memory model
• Explicit Parallel Instruction Computing (EPIC)
• Large register files with an automatic Register Stack Engine
• Predication
• Software pipelining support, register rotation, and loop control hardware
• Sophisticated branch architecture
• Control & data speculation
• Cache control
• Powerful integer architecture and advanced floating point architecture
• Multimedia support (MMX™ technology)
It's more than just 64 bits.

Next Generation Architecture: EPIC Design Philosophy
Maximize performance via EPIC hardware and software synergy:
• Advanced features (predication, speculation, …) enhance instruction-level parallelism
• Massive hardware resources for parallel execution
[Chart: performance over time for CISC, RISC, out-of-order/superscalar, VLIW, and EPIC designs; beyond traditional architectures]
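Before the details, a minimal sketch of what "explicit" parallelism means at the instruction level; the register numbers and operations are illustrative assumptions, not taken from the slides. Independent instructions are placed in one instruction group, delimited by a stop bit (;;), and the hardware may issue the whole group in parallel:

        // Hypothetical group: no RAW or WAW dependencies among the
        // first three instructions, so the compiler groups them.
        add   r16 = r14, r15      // independent
        sub   r17 = r18, r19      // independent
        ld8   r20 = [r21]         // independent 8-byte load
        ;;                        // stop bit: closes the group
        add   r22 = r16, r17      // next group consumes prior results

The key design point is that dependence checking moves from run-time hardware into the compiler, which is what distinguishes EPIC from out-of-order superscalar designs.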
EPIC Instruction Parallelism
• The compiler arranges code into instruction groups (each a series of bundles) with no RAW or WAW dependencies inside a group; a group is issued in parallel, depending on available resources
• A bundle holds 3 instructions plus a template: 3 x 41 bits + 5 bits = 128 bits
• Up to 6 instructions can be executed per clock

Compiler (HW) Data Types
• 64-bit integer
• 2x32-bit, 4x16-bit, and 8x8-bit SIMD integer
• 64-bit DP floating point
• 2x32-bit SIMD SP floating point
All common data types are supported.

64-bit Memory Access
• 18 billion gigabytes accessible: 2^64 = 18,446,744,073,709,551,616 bytes
• Byte-addressable access with 64-bit pointers; 64-bit virtual address space, with HW support for 32-bit pointers
• Access granularity and alignment: 1, 2, 4, 8, 10, or 16 bytes; alignment on naturally aligned boundaries is recommended; instructions are always 16-byte aligned
• Support for both big- and little-endian byte order
• Memory hierarchy control
• 2.1 GB/s front-side bus
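As a hedged illustration of the access granularities above (the addresses and register numbers are assumptions for the example), IA-64 loads name their size explicitly and can post-increment their address register:

        // r14 is assumed to hold a naturally aligned base address.
        ld1   r15 = [r14]         // 1-byte load, zero-extended to 64 bits
        adds  r16 = 8, r14        // form an 8-byte-offset address
        ;;
        ld4   r17 = [r16]         // 4-byte load from an aligned address
        ld8   r18 = [r14], 16     // 8-byte load; post-increment r14 by 16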
Memory Hierarchy Control
Software can explicitly control memory accesses:
• Specify the levels of the memory hierarchy affected by the access; allocation and flush resolution is at least 32 bytes
• Allocation (prefetch) brings the data close to the CPU; allocation hints indicate at which level allocation takes place, and are used in load, store, and explicit prefetch instructions
• De-allocation and flush invalidate the addressed line in all levels of the cache hierarchy, writing the data back to memory if necessary
Three levels of cache (full-speed L2 cache, 2/4 MB L3 cache) and atomic operation support give control over cache (de)allocation.

Memory Access Ordering
Explicit control:
• Memory fence (mf): ensures all prior memory operations are seen before all future memory operations
• Acquire load (ld.acq): ensures the load is seen before all future memory operations
• Release store (st.rel): ensures all prior memory operations are seen before the store
• Synchronize instruction caches (sync.i): ensures all instruction caches have seen all prior flush-cache instructions
Implicit ordering applies to the semaphore instructions:
• xchg: exchange memory and a general register (GR)
• cmpxchg: conditional exchange of memory and a GR
• fetchadd: add an immediate to memory
The strong ordering model is compatible with IA-32.

Large Register Set
• 128 integer registers (GR0 to GR127), each 64 bits plus a NaT bit: 32 static (GR0 to GR31) and 96 framed, rotating (GR32 to GR127)
• 128 floating-point registers (FR0 to FR127), each 82 bits (FR0 reads +0.0, FR1 reads +1.0): 32 static and 96 rotating
• 64 one-bit predicate registers (PR0 to PR63, with PR0 hardwired to 1): 16 static and 48 rotating
• 8 branch registers (BR0 to BR7), each 64 bits
This removes resource bottlenecks.

Context Switch
• "Normal": a full context switch saves all registers, GR and FR
• "Lazy": saves only the GR registers, or a specific range (GR0-31, GR32-GR127)
• "Fast": saves no registers and instead uses 16 separate banked "shadow" registers (OS only, GR16'-GR31'), e.g. for interrupt and exception handling

Register Stack
• GRs 0-31 are global to all procedures
• Stacked registers begin at GR32 and are local to each procedure; each procedure's register stack frame varies from 0 to 96 registers
• Only the GRs implement a register stack; the FRs, PRs, and BRs are global to all procedures
• Register Stack Engine (RSE): on stack overflow or underflow, registers are saved to or restored from a backing store transparently
The result is optimized CALL/RETURN and parameter passing.

Register Stack in Work
• A call changes the frame to contain only the caller's output registers
• The alloc instruction sets the frame to the desired size, with three architecture parameters: local, output, and rotating
• A return restores the stack frame of the caller
[Figure: PROC A calls PROC B; the call leaves only A's outputs visible at GR32, alloc grows B's frame to its locals (inputs) plus outputs, and the return restores A's frame]
This avoids register spill/fill among procedure calls.
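The call/alloc/return interplay can be sketched in assembly; this is a minimal, hypothetical procedure (the symbol names and register choices are assumptions, not from the slides). alloc declares 2 inputs, 3 locals, and 1 output; the call renames the output register to the callee's first input; the return restores the caller's frame:

proc_b:
        alloc r36 = ar.pfs, 2, 3, 1, 0   // frame: in r32-r33, local r34-r36, out r37
        mov   r35 = b0                   // save the return address in a local
        mov   r37 = r32                  // output for the callee (becomes its r32)
        br.call.sptk.many b0 = proc_c    // call: frame shrinks to the outputs
        ;;
        mov   ar.pfs = r36               // restore the caller's frame marker
        mov   b0 = r35                   // restore the return address
        ;;
        br.ret.sptk.many b0              // return: the caller's frame comes back

Because locals such as r35 and r36 survive the call through renaming rather than memory traffic, no explicit spill code is needed unless the RSE overflows.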
Register Stack Engine
[Figure: the logical view shows stack frames A through F being allocated and released as procedures are called and return; the physical view shows GR32-GR127, with the RSE saving and restoring frames to the backing store while GR0-GR31 stay global]
The register stack applies to the GR (integer) registers only.

Predication: Control Flow to Data Flow
A traditional architecture implements an if/else with a compare followed by a conditional jump around the alternatives (cmp a, b; jump.eq; X = 1; else: X = 2). IA-64 instead computes complementary predicates from the compare (cmp a, b sets p1, p2) and guards both assignments: (p2) X = 1 and (p1) X = 2.
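Filling out the slide's fragment as a hedged sketch (the operand registers are assumptions): the compare writes two complementary predicates, and both assignments issue with no branch; only the one whose predicate is true takes effect:

        cmp.eq p1, p2 = r8, r9    // p1 = (a == b), p2 = complement; a, b assumed in r8, r9
        ;;
(p2)    mov   r10 = 1             // X = 1 under p2, as on the slide
(p1)    mov   r10 = 2             // X = 2 under p1; the predicates are mutually
                                  // exclusive, so only one write to r10 retires

Both predicated moves can even sit in the same instruction group, since at most one of them executes; the branch and its misprediction penalty are gone.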