Intel® Processor Architecture

January 2013

Software & Services Group, Developer Products Division

Copyright © 2013, Intel Corporation. All rights reserved. *Other brands and names are the property of their respective owners.

Agenda
• Overview of Intel® processor architecture
• Intel x86 ISA (instruction set architecture)
• Micro-architecture of the processor core
• Uncore structure
• Additional processor features
  – Hyper-threading
  – Turbo mode
• Summary

Intel® Processor Segments Today

Architecture | Target Platforms | ISA | Specific Features:

• Intel® Atom™ Architecture – phone, tablet, netbook, low-power server – x86 up to SSSE-3, 32 and 64 bit – optimized for low power, in-order
• Intel® Core™ Architecture – mainstream notebook, desktop, server – x86 up to Intel® AVX, 32 and 64 bit – flexible feature set covering all needs
• Intel® Itanium® Architecture – high end server – IA64, x86 by emulation – RAS, large address space
• Intel® MIC Architecture – accelerator for HPC – x86 and Intel® MIC Instruction Set – 60+ cores, optimized for floating-point performance

Itanium® 9500 (Poulson): New Itanium Processor
• Compatible with Itanium® 9300 processors
• New micro-architecture with 8 cores
• 54 MB on-die cache
• Improved RAS and power management capabilities
• Doubles execution width from 6 to 12 instructions/cycle
• 32nm process technology
• Launched in November 2012
[Diagram: 8 cores sharing a 32MB last level cache, memory controllers with 4 Intel Scalable Memory Interface (SMI) links, link controllers with 4 full + 2 half width Intel QuickPath Interconnect links]
Compatibility provides protection for today’s Itanium® investment

Intel® Xeon Phi™: Former Code Name “Knights Corner”

Intel® Xeon Phi™ – the first product implementation of the Intel® Many Integrated Core Architecture (Intel® MIC)


x86: From Smartphones to … Motorola RAZR* i

• Launched September 2012
• RAZR i is the first smartphone that can achieve speeds of 2.0 GHz
• Processor: Intel® Atom™ Z2460

x86: … to Supercomputers – LRZ SuperMUC System
• Installed summer 2012
  – Most powerful x86-architecture based computer
  – #6 on the Top500 list
  – More than 150,000 cores
• Processors:
  – Intel® Xeon® E5-2680 (“Sandy Bridge”)
  – Intel® Xeon® E7-4870 (“Westmere”)

Intel Tick-Tock Roadmap for Mainstream x86 Architecture since 2006

• 2006 – Merom – 65nm – TOCK (new micro-architecture: Intel® Core™) – SSSE-3
• 2007 – Penryn – 45nm – TICK (new process technology) – SSE4.1
• 2008 – Nehalem – 45nm – TOCK (new micro-architecture, codename “Nehalem”) – SSE4.2
• 2009 – Westmere – 32nm – TICK (new process technology) – AES
• 2011 – Sandy Bridge – 32nm – TOCK (new micro-architecture: 2nd Generation Intel® Core™) – AVX
• 2012 – Ivy Bridge – 22nm – TICK (new process technology: 3rd Generation Intel® Core™) – 7 new instructions

TICK + TOCK = SHRINK + INNOVATE

To be continued ...

• 2013 – Haswell – 22nm – TOCK (new micro-architecture: 4th Generation Intel® Core™) – AVX2
• >= 2014 – Broadwell – 14nm – TICK (new process technology)
• TBD – 14nm, 10nm, 7nm – alternating new micro-architecture (TOCK) and new process technology (TICK)

TICK + TOCK = SHRINK + INNOVATE

Register State for the Intel® Pentium® 3 Processor (1998)

IA32-INT registers (32 bit): eax … edi
• Fourteen 32-bit registers
• Scalar data & addresses
• Direct access to registers

IA-FP / MMX™ Technology registers (80 / 64 bit): st0 … st7 / mm0 … mm7
• Eight 80/64-bit registers
• Hold data only
• Direct access to MM0..MM7
• No MMX™ Technology / FP interoperability

SSE registers (128 bit): xmm0 … xmm7
• Eight 128-bit registers
• Hold data only: 4 x single FP numbers, 2 x double FP numbers, 128-bit packed integers

SSE Vector Types

Intel® SSE:
• 4x single precision FP

Intel® SSE2:
• 2x double precision FP
• 16x 8-bit integer
• 8x 16-bit integer
• 4x 32-bit integer
• 2x 64-bit integer
• plain 128 bit
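These packed types map directly onto compiler intrinsics. A minimal sketch, assuming a compiler with the standard SSE/SSE2 intrinsic headers (the function and variable names are illustrative only):

#include <emmintrin.h>   /* SSE2 intrinsics (includes the SSE header) */

/* Add two groups of 4 single-precision floats with one SSE instruction. */
void add4_ps(const float *a, const float *b, float *c)
{
    __m128 va = _mm_loadu_ps(a);        /* load 4x float (unaligned)        */
    __m128 vb = _mm_loadu_ps(b);
    _mm_storeu_ps(c, _mm_add_ps(va, vb)); /* 4 additions in one instruction */
}

/* Add two groups of 2 double-precision floats (SSE2). */
void add2_pd(const double *a, const double *b, double *c)
{
    __m128d va = _mm_loadu_pd(a);
    __m128d vb = _mm_loadu_pd(b);
    _mm_storeu_pd(c, _mm_add_pd(va, vb));
}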

AVX Vector Types

Intel® AVX:
• 8x single precision FP
• 4x double precision FP

Intel® AVX2 (future):
• 32x 8-bit integer
• 16x 16-bit integer
• 8x 32-bit integer
• 4x 64-bit integer
• plain 256 bit
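The 256-bit types are exposed through the AVX intrinsics in <immintrin.h>. A minimal sketch, assuming AVX is enabled at compile time (e.g. -mavx); the function names are illustrative only:

#include <immintrin.h>   /* AVX intrinsics */

/* Add two groups of 8 single-precision floats with one AVX instruction. */
void add8_ps(const float *a, const float *b, float *c)
{
    __m256 va = _mm256_loadu_ps(a);     /* load 8x float (unaligned) */
    __m256 vb = _mm256_loadu_ps(b);
    _mm256_storeu_ps(c, _mm256_add_ps(va, vb));
}

/* Add two groups of 4 double-precision floats. */
void add4_pd(const double *a, const double *b, double *c)
{
    __m256d va = _mm256_loadu_pd(a);
    __m256d vb = _mm256_loadu_pd(b);
    _mm256_storeu_pd(c, _mm256_add_pd(va, vb));
}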

x86 ISA: The Instruction Set

• The instruction set for the x86 architecture has been extended numerous times since the set supported by the 8086 processor
  – See http://en.wikipedia.org/wiki/X86_instruction_listings for an excellent overview
• Today, the “base” instruction set (“IA32 ISA”) is the one supported by the first 32-bit processor, the 80386
• Multiple smaller extensions were added before SSE (1998 / Intel® Pentium® 3), such as
  – MMX (64-bit SIMD using the FP registers)
  – Conditional move
  – Atomic exchange
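Because the ISA has grown incrementally, portable code normally checks for each extension at run time before using it. A minimal sketch using the GCC/Clang built-in CPU feature check (other compilers expose the same information via the CPUID instruction; the printed labels are illustrative):

#include <stdio.h>

int main(void)
{
    __builtin_cpu_init();   /* populate the feature flags (GCC/Clang) */

    printf("SSE4.2: %s\n", __builtin_cpu_supports("sse4.2") ? "yes" : "no");
    printf("AVX   : %s\n", __builtin_cpu_supports("avx")    ? "yes" : "no");
    printf("AVX2  : %s\n", __builtin_cpu_supports("avx2")   ? "yes" : "no");
    return 0;
}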

New Instructions in Haswell (2013)

Group | Description | Count*

• AVX2 – SIMD integer instructions promoted to 256 bits (adding vector integer operations to 256-bit); gather (load elements from a vector of indices, a vectorization enabler); shuffling / data rearrangement (blend, element shift and permute instructions) – 170 / 124
• FMA – Fused Multiply-Add operation forms (FMA-3) – 96 / 60
• Bit manipulation and cryptography – improving performance of bit stream manipulation and decode, large integer arithmetic and hashes – 15 / 15
• TSX = RTM + HLE – transactional memory – 4 / 4
• Others – MOVBE: load and store of big-endian forms; INVPCID: invalidate processor context ID – 2 / 2

* Total instructions / different mnemonics
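To illustrate two of the groups above, a minimal sketch using the AVX2 gather and FMA intrinsics from <immintrin.h> (compile with e.g. -mavx2 -mfma; the function and variable names are illustrative only):

#include <immintrin.h>

/* c[i] += a[idx[i]] * b[i] for 8 floats: one gather plus one fused multiply-add. */
void gather_fma8(const float *a, const int *idx, const float *b, float *c)
{
    __m256i vidx = _mm256_loadu_si256((const __m256i *) idx); /* 8 indices          */
    __m256  va   = _mm256_i32gather_ps(a, vidx, 4);           /* gather, scale = 4  */
    __m256  vb   = _mm256_loadu_ps(b);
    __m256  vc   = _mm256_loadu_ps(c);
    vc = _mm256_fmadd_ps(va, vb, vc);                          /* va*vb + vc        */
    _mm256_storeu_ps(c, vc);
}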

HSW Improvements for Threading: Sample Code Computing Pi with Windows Threads

#include <windows.h>
#include <stdio.h>
#define NUM_THREADS 2

HANDLE thread_handles[NUM_THREADS];
CRITICAL_SECTION hUpdateMutex;
static long num_steps = 100000;
double step;
double global_sum = 0.0;

void Pi (void *arg)
{
    int i, start;
    double x, sum = 0.0;

    start = *(int *) arg;
    step = 1.0/(double) num_steps;

    for (i=start; i<=num_steps; i=i+NUM_THREADS){
        x = (i-0.5)*step;
        sum = sum + 4.0/(1.0+x*x);
    }

    EnterCriticalSection(&hUpdateMutex);
    global_sum += sum;
    LeaveCriticalSection(&hUpdateMutex);
}

void main ()
{
    double pi; int i;
    DWORD threadID;
    int threadArg[NUM_THREADS];

    InitializeCriticalSection(&hUpdateMutex);

    for (i=0; i<NUM_THREADS; i++){
        threadArg[i] = i+1;
        thread_handles[i] = CreateThread(0, 0,
            (LPTHREAD_START_ROUTINE) Pi,
            &threadArg[i], 0, &threadID);
    }

    WaitForMultipleObjects(NUM_THREADS, thread_handles, TRUE, INFINITE);

    pi = global_sum * step;

    printf(" pi is %f \n", pi);
}

Locks can be a key bottleneck – even when there is no conflict

Intel® Transactional Synchronization Extensions (Intel® TSX)
Intel® TSX = HLE + RTM

HLE (Hardware Lock Elision) is a hint inserted in front of a LOCK operation to indicate that a region is a candidate for lock elision
• XACQUIRE (0xF2) and XRELEASE (0xF3) prefixes
• Don’t actually acquire the lock, but execute the region speculatively
• Hardware buffers loads and stores, checkpoints registers
• Hardware attempts to commit atomically without locks
• If it cannot do so without locks, restart and execute non-speculatively

RTM (Restricted Transactional Memory) adds three new instructions (XBEGIN, XEND, XABORT)
• Similar operation to HLE (except no locks, new ISA)
• If the region cannot commit atomically, control transfers to the handler indicated by XBEGIN
• Provides software additional capabilities over HLE
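Compilers expose RTM through intrinsics in <immintrin.h>. A minimal sketch of the usual elision pattern (compile with RTM support, e.g. -mrtm; real code would first verify via CPUID that the processor supports RTM, since _xbegin faults otherwise; the shared counter and the simple spinlock used as the fallback are illustrative only):

#include <immintrin.h>

static long counter;
static volatile int fallback_lock;   /* 0 = free, 1 = taken (simple spinlock) */

void increment(void)
{
    unsigned status = _xbegin();                 /* start transactional execution   */
    if (status == _XBEGIN_STARTED) {
        if (fallback_lock)                       /* put the lock into the read-set; */
            _xabort(0xff);                       /* abort if somebody holds it      */
        counter++;                               /* runs speculatively              */
        _xend();                                 /* try to commit atomically        */
    } else {
        /* Transaction aborted (conflict, capacity, lock held, ...): take the lock. */
        while (__sync_lock_test_and_set(&fallback_lock, 1))
            ;                                    /* spin */
        counter++;
        __sync_lock_release(&fallback_lock);
    }
}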

Core™ 2 Architecture

[Block diagram: Front End – instruction fetch and pre-decode, ITLB, 32kB instruction cache, instruction queue, decode, 4-wide rename/allocate; Out-of-Order Execution Engine – retirement unit (reorder buffer, 4-wide), reservation station, 6 execution units; Memory – DTLB, 32kB data cache, 2/4/6 MB 2nd level cache, front-side bus]

NHM/SNB: Enhanced Processor Core

[Block diagram: Front End – instruction fetch and pre-decode, ITLB, 32kB instruction cache, instruction queue, decode, 4-wide rename/allocate; Execution Engine – retirement unit (reorder buffer, 4-wide), reservation station, 6 execution units; Memory – DTLB, 2nd level TLB, 32kB data cache, 256kB 2nd level cache (mid-level cache, MLC); L3 and beyond located in the Uncore]

Peak FP Performance per Core & Cycle

Processor | Single Precision | Double Precision | Comment
Nehalem | 8 | 4 | SSE: a MULT and an ADD can start each cycle
Sandy Bridge | 16 | 8 | AVX doubles both rates due to twice the vector length
Haswell | 32 | 16 | 2 FMA instructions can start each cycle – doubling performance compared to SNB

For a 2-socket, 16-core Haswell server system running at 3 GHz, this sums up to about 1.5 TFLOP/s single-precision peak performance (0.77 TFLOP/s double precision).
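The peak figure is just the product of sockets, cores, per-cycle FLOPs and frequency. A small sketch of the arithmetic, using the system parameters assumed above:

#include <stdio.h>

int main(void)
{
    double sockets      = 2;
    double cores        = 8;      /* per socket                                 */
    double ghz          = 3.0;    /* billions of cycles per second              */
    double sp_per_cycle = 32;     /* Haswell: 2 FMA units x 8 SP lanes x 2 FLOPs */
    double dp_per_cycle = 16;     /* Haswell: 2 FMA units x 4 DP lanes x 2 FLOPs */

    printf("SP peak: %.2f TFLOP/s\n", sockets * cores * ghz * sp_per_cycle / 1000.0);
    printf("DP peak: %.2f TFLOP/s\n", sockets * cores * ghz * dp_per_cycle / 1000.0);
    return 0;
}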

Potential future options and features subject to change without notice.

Common Core, Modular Uncore

• Common “core”
  – Same core for server, desktop, mobile
  – Incremental improvements to the micro-architecture of the current Core architecture
  – Common target for SW optimization
  – Common feature set
• Segment differentiation in the “Uncore”
  – # of cores
  – # of QPI links
  – Size of L3 cache
  – # of IMC channels
  – DDR3 frequency
  – Integrated graphics (GT)
  – …

Product | # Cores | L3$ Size | Memory Controller | QPI Links (GT/s) | Graphics
Desktop i3 / i5 | 2 | 4MB | 2x DDR3 | N/A | Yes
Desktop i7 (NHM) | 4 | 8MB | 3x DDR3 | 1 x 4.8 | Yes
Desktop i7 | 6 | 8MB | 3x DDR3 | 1 x 6.4 | Yes
Xeon E5-2600 (SNB) | 2x8 | 20MB | 4x DDR3 | 2 x 8.0 | No
Xeon E7-8870 | 4x10 | 30MB | 3x DDR3 | 4 x 6.4 | No

Level 3 Cache

• New 3rd level cache
  – Also called LLC – Last Level Cache
• Shared across all cores of a processor (socket)
• Size
  – NHM: 2 MB/core (EX up to 3.0)
  – SNB: 2.5 MB/core (today)
• Latency (cycles)
  – NHM: >= 35
  – SNB: 25-31
• Inclusive property
  – A cache line residing in L1/L2 must also be present in the 3rd level cache
[Diagram: per-core L1 and L2 caches backed by a shared L3 cache]

QuickPath Interconnect

• Nehalem introduces the new QuickPath Interconnect (QPI)
• High bandwidth, low latency point-to-point interconnect
• 4.8 / 6.4 / 8.0 GT/sec
  – e.g. 6.4 GT/sec -> 12.8 GB/sec in each direction
• Highly scalable for systems with varying # of sockets
[Diagram: 2-socket and 4-socket Nehalem-EP configurations – each CPU with local memory, connected to the other CPUs and to the IOH via QPI]

Remote Memory Access
• CPU0 requests cache line X, not present in any CPU0 cache
  – CPU0 requests the data from CPU1; the request is sent over QPI to CPU1
  – CPU1’s IMC makes the request to its DRAM
  – CPU1 snoops its internal caches
  – Data returned to CPU0 over QPI
• Remote memory latency is a function of having a low latency interconnect
  – Typical numbers: local access 60ns, remote access 90ns
[Diagram: CPU0 and CPU1, each with local DRAM, connected via QPI]

Non-NUMA (UMA) Mode
• Addresses interleaved across memory nodes by cache line
  – Some systems also support page-size granularity
• Accesses may or may not have to cross a QPI link

[Diagram: two sockets, each with a memory controller and DDR3 channels; the system memory map interleaves cache lines across both sockets]

UMA is not tuned for peak performance, but in general it delivers good performance without any additional tuning effort.

NUMA Mode

• Non-Uniform Memory Access (NUMA)
• Addresses are not interleaved across memory nodes by cache line
• Each CPU has direct access to a contiguous block of memory

[Diagram: two sockets, each with a memory controller and DDR3 channels; the system memory map assigns each socket its own contiguous block]

Combined with thread affinity (“pinning”), NUMA mode enables peak performance, but it can degrade performance if data placement is not taken care of.
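On Linux the default policy allocates a page on the NUMA node of the thread that first touches it, so initializing data with the same thread layout that later computes on it keeps accesses local. A minimal OpenMP sketch of this first-touch idiom (array size and names are illustrative; combine with pinning, e.g. OMP_PROC_BIND or a similar affinity setting):

#include <stdlib.h>
#include <omp.h>

#define N (1 << 26)

int main(void)
{
    double *a = malloc(N * sizeof *a);
    double *b = malloc(N * sizeof *b);

    /* First touch: each thread initializes the part it will later work on,
       so those pages end up on that thread's local NUMA node. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++) {
        a[i] = 0.0;
        b[i] = (double) i;
    }

    /* Same static schedule: each thread mostly accesses local memory. */
    #pragma omp parallel for schedule(static)
    for (long i = 0; i < N; i++)
        a[i] += 2.0 * b[i];

    free(a);
    free(b);
    return 0;
}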

Uncore Architecture: Sandy Bridge – Significant Bandwidth Increases over Prior Generation

[Diagram: Sandy Bridge EP uncore – IIO with PCIe Gen3 lanes (2 x16 + 1 x8), PCIe BW ~300%; two QPI links at 8 GT/s, socket-to-socket BW ~250%; 20MB cache, cache BW ~800% (scales automatically with core frequency); on-die interconnect BW ~900%; memory controller with 4 DDR3 channels, DDR3 BW ~200% (all relative to the prior generation)]

SNB: Scalable Ring On-die Interconnect
• Ring-based interconnect between cores, graphics, Last Level Cache (LLC) and System Agent domain
• Composed of 4 rings
  – 32-byte data ring, request ring, acknowledge ring and snoop ring
  – Fully pipelined at core frequency; bandwidth and latency scale with the number of cores
• Access on the ring always picks the shortest path to minimize latency
• Distributed arbitration, sophisticated ring protocol to handle coherency, ordering, and the core interface
• Scalable to servers with a large number of processors
[Diagram: ring connecting the cores and their LLC slices with graphics and the System Agent (IMC, display, DMI, PCI Express*)]

Intel® Turbo Boost Improvements

Dynamic Adaptation in Sandy Bridge

• Build up thermal budget during idle periods
• After idle periods the system has accumulated an “energy budget” and can accommodate high power/performance for a few seconds
• Use the accumulated energy budget to enhance the user experience
• In steady-state conditions the power stabilizes at TDP
[Chart: power over time – sleep/low power, then a burst above TDP in C0 (Turbo, “Next Gen Turbo Boost”), then settling at TDP]

Simultaneous Multi-Threading (SMT): “Intel Hyper-Threading – HT”

• Run 2 threads at the very same time per core
• Available on Nehalem (and successors) as well as the Intel® Atom architecture
• Take advantage of the 4-wide execution engine
  – Keep it fed with multiple threads
  – Hide latency of a single thread
• Most power-efficient performance feature
  – Very low die area cost
  – Can provide significant performance benefit depending on application
  – Much more efficient than adding an entire core
• Nehalem advantages
  – Larger caches
  – Massive memory BW
[Diagram: occupancy of execution slots over time (processor cycles), without SMT vs. with SMT]

SMT Performance Chart (NHM)

[Chart: performance gain with SMT enabled vs. disabled on an Intel® Core™ i7, ranging from roughly 7% to 34% across Floating Point (based on a SPECfp_rate_base2006* estimate), 3dsMax*, Integer (based on a SPECint_rate_base2006* estimate), Cinebench* 10, POV-Ray* 3.7 beta 25, and 3DMark* Vantage* CPU]

SPEC, SPECint, SPECfp, and SPECrate are trademarks of the Standard Performance Evaluation Corporation. For more information on SPEC benchmarks, see: http://www.spec.org

Source: Intel. Configuration: pre-production Intel® Core™ i7 processor with 3-channel DDR3 memory. Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, visit http://www.intel.com/performance/

Memory Bandwidth and Performance: Sample Estimations

Platform | Memory Bandwidth per Core | GFLOP/s (DP) per Core | FLOPS per DP Data Move to Reach Peak
NHM (3 GHz, 4 cores) | 8.00 GB/s – 32 GB/s per socket (3ch x 1333 x 8 bytes) / 4 | 12 | 12.0
WSM (2.4 GHz, 6 cores) | 5.33 GB/s – 32 GB/s per socket (3ch x 1333 x 8 bytes) / 6 | 9.6 | 14.4
SNB (2.4 GHz, 8 cores, SSE) | 6.40 GB/s – 51 GB/s per socket (4ch x 1600 x 8 bytes) / 8 | 9.6 | 12.0
SNB (2.4 GHz, 8 cores, AVX) | 6.40 GB/s – 51 GB/s per socket (4ch x 1600 x 8 bytes) / 8 | 19.2 | 24.0
Itanium 2 (1.6 GHz, dual core) | 5.40 GB/s – (0.677 GHz x 16 bytes) / 2 | 6.4 | 9.5

Tuning for memory bandwidth remains a key challenge!
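The “FLOPS per DP data move” column is simply the per-core peak GFLOP/s divided by the number of double-precision elements per second the memory system can deliver to that core. A small sketch of the arithmetic for the SNB/AVX row, using the values from the table above:

#include <stdio.h>

int main(void)
{
    double bw_per_core_gb = 6.40;                 /* GB/s per core (51 GB/s / 8 cores)  */
    double dp_elems_per_s = bw_per_core_gb / 8.0; /* 8 bytes per double -> G elements/s */
    double gflops_dp      = 19.2;                 /* 8 DP FLOPs/cycle x 2.4 GHz (AVX)   */

    printf("FLOPS per DP data move: %.1f\n", gflops_dp / dp_elems_per_s);
    return 0;
}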

References

• Intel Software Development and Optimization manuals
• Sessions from IDF (Intel Developer Forum) on processor architecture – www.intel.com/idf
• Michael E. Thomadakis, Texas A&M University, “The Architecture of the Nehalem Processor …”
• Agner Fog, “The microarchitecture of Intel, AMD and VIA CPUs …”
• Wikipedia
  – x86
  – x86 assembly language

Legal Disclaimer

INFORMATION IN THIS DOCUMENT IS PROVIDED “AS IS”. NO LICENSE, EXPRESS OR IMPLIED, BY ESTOPPEL OR OTHERWISE, TO ANY INTELLECTUAL PROPERTY RIGHTS IS GRANTED BY THIS DOCUMENT. INTEL ASSUMES NO LIABILITY WHATSOEVER AND INTEL DISCLAIMS ANY EXPRESS OR IMPLIED WARRANTY, RELATING TO THIS INFORMATION INCLUDING LIABILITY OR WARRANTIES RELATING TO FITNESS FOR A PARTICULAR PURPOSE, MERCHANTABILITY, OR INFRINGEMENT OF ANY PATENT, COPYRIGHT OR OTHER INTELLECTUAL PROPERTY RIGHT.

Performance tests and ratings are measured using specific computer systems and/or components and reflect the approximate performance of Intel products as measured by those tests. Any difference in system hardware or software design or configuration may affect actual performance. Buyers should consult other sources of information to evaluate the performance of systems or components they are considering purchasing. For more information on performance tests and on the performance of Intel products, reference www.intel.com/software/products.

Intel and the Intel logo are trademarks of Intel Corporation in the U.S. and other countries. *Other names and brands may be claimed as the property of others. Copyright © 2012. Intel Corporation.

Optimization Notice

Intel’s compilers may or may not optimize to the same degree for non-Intel microprocessors for optimizations that are not unique to Intel microprocessors. These optimizations include SSE2, SSE3, and SSSE3 instruction sets and other optimizations. Intel does not guarantee the availability, functionality, or effectiveness of any optimization on microprocessors not manufactured by Intel. Microprocessor-dependent optimizations in this product are intended for use with Intel microprocessors. Certain optimizations not specific to Intel microarchitecture are reserved for Intel microprocessors. Please refer to the applicable product User and Reference Guides for more information regarding the specific instruction sets covered by this notice.

Notice revision #20110804

Intel® Atom™ Processor: Block Diagram

[Block diagram: Front-End Cluster – branch prediction unit, per-thread instruction caches and prefetch buffers, 2-wide instruction length decoder (ILD), XLAT/FL decode, per-thread instruction queues, MS, instruction TLB; Memory Execution Cluster – per-thread integer and FP register files, AGUs, DL1 prefetcher, data TLBs, data cache, PMH, fill and write-combining buffers, L2 cache; Integer Execution Cluster – ALUs, shifter, JEU; FP/SIMD Execution Cluster – FP adder, FP/SIMD multipliers, ALUs, shuffle, FP move, FP ROM, FP divider, FP store; Bus Cluster – BIU, FSB, APIC; fault/retire]

Intel® Xeon Phi™ Overview

[Diagram: many multi-threaded, wide-SIMD processor cores, each with its own I$ and D$ and a coherent L2 cache, sharing GDDR memory through shared memory controllers; PCIe x16 interface]

Standard IA shared-memory programming

Future options subject to change without notice.

Intel® Xeon Phi™ Microarchitecture Overview

[Diagram: ring interconnect connecting the cores – each with its own L2 cache and tag directory (TD) – with the PCIe client logic and the GDDR memory controllers (MC)]
TD: Tag Directory
L2: L2-Cache
MC: Memory Controller


Interleaved Memory Access

[Diagram: memory accesses from each core are interleaved across all GDDR memory controllers around the ring]

Intel® Xeon Phi™ Core

Intel® Xeon Phi™ coprocessor core:
• Scalar pipeline derived from the dual-issue Pentium processor
• Short execution pipeline
• Fully coherent cache structure
• Significant modern enhancements, such as multi-threading, 64-bit extensions, and sophisticated pre-fetching
• 4 execution threads per core
• Separate register sets per thread
• 32KB instruction cache and 32KB data cache for each core

Enhanced instruction set with:
• Over 100 new instructions
• Wide vector processing operations, incl. gather/scatter and masking
• Some specialized scalar instructions
• 3-operand, 16-wide vector processing unit (VPU)
• VPU executes integer, SP-float, and DP-float instructions
• Supports IEEE 754-2008 for floating point arithmetic

[Diagram: instruction decode feeding a scalar unit and a vector unit with their register files, L1 I-cache & D-cache, a 512K L2 cache local subset, and the interprocessor network; the network is 1024 bits wide, bi-directional (512 bits in each direction)]


Intel® Xeon Phi™ Processor Core
[Block diagram: 4 in-order hardware threads (T0–T3 instruction pointers), decode with uCode, two pipelines (Pipe 0 / Pipe 1), 16B/cycle instruction fetch (2 IPC), L1 TLB and 32KB code cache, L1 TLB and 32KB data cache, 512KB L2 cache with L2 control, L2 TLB and hardware prefetcher (HWP), VPU / X87 / scalar register files, X87, ALU 0, ALU 1, 512b SIMD VPU, connection to the on-die interconnect]

X86 specific logic < 2% of core + L2 area 74 Vector/SIMD High Computational Density

[Diagram: the core (instruction decode, scalar and vector units, scalar and vector registers, L1 I-cache & D-cache, 512K L2 cache local subset, interprocessor network) next to the Vector/SIMD unit (mask registers, 16-wide vector ALU, replicate, reorder, numeric convert, vector registers, L1 data cache)]

VPU Block Diagram

[Diagram: Vector/SIMD part (VPU) – 8x 16-bit vmask registers, 32x 512-bit vector registers per thread (T0–T3), data swizzle/broadcast and data convert on the 512-bit paths to the L1/L2 caches and memory; scalar part – scalar registers and scalar units]

Intel® Xeon Phi™ Coprocessor: Programming Model

Restricted architectures (GPU, ASIC, FPGA – custom HW acceleration):
• Run restricted code
• Run offloaded code

Intel® Xeon Phi™ Coprocessor – it’s a supercomputer on a chip:
• Operate as a compute node
• Run a full OS
• Run MPI
• Run OpenMP*
• Run x86 code

Restrictive architectures limit the ability of applications to use arbitrary nested parallelism, function calls and threading models

Well, it is an SMP-on-a-chip running Linux*

Intel® Xeon Phi™ Environment

[Diagram: physical view – a Xeon host and a MIC coprocessor, each with its own memory, connected through a network stack (IP, SSH, FTP, NFS, …); logical views – NATIVE (the coprocessor runs Linux as an autonomous node on the network) and OFFLOAD (heterogeneous execution, with the host offloading work to the coprocessor)]

Flexible Execution Models: Optimized Performance for Different Workloads (single source code)

[Diagram: the same single source code covers a spectrum from serial and moderately parallel code to highly parallel code, supported by compilers, libraries and runtime systems; execution models range from Multicore Only (Xeon), Multicore Hosted with Many-Core Offload, and Symmetric, to Many-Core Only (Xeon Phi)]

Flexible Execution Models: Optimized Performance for Different Usage Models

[Diagram: usage models combining Xeon® processors and Xeon Phi™ coprocessors – Native Only, Offload (via directives), Co-worker, and Symmetric (using MPI across both)]
