IA32 Architecture

IA32 Architecture

IAIA--3232 ArchitectureArchitecture Sunil Saxena Principal Engineer Intel Corporation September 11, 2000 Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Labs Agenda l Pentium ® III Processor New Features l Pentium ® 4 Processor New Features l Pentium ® 4 Processor Micro-architecture Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 2 1 IA Processor Roadmap . Madison Extends IA Headroom, Extends IA Headroom, IA-64 Perf . ScalabilityScalability andand AvailabilityAvailability Deerfield forfor thethe MostMost IA-64 Price/Perf McKinley DemandingDemanding EnvironmentsEnvironments Future . IA-32 ItaniumTM processor Performance Foster Cascades OutstandingOutstanding ® Pentium III Xeon™ PerformancePerformance forfor processor 3232 BitBit VolumeVolume AppsApps ’99 ’00 ’01 ’02 .25µ .18µ .13µ Strong Execution on Itanium™ Processor, Continued Focus on the Long Term Intel ® Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 3 Pentium III Processor l Pentium III Processor New Features –36-bit Physical Addressing – Physical Address Extension - PAE-36 – Page Size Extensions - PSE-36 –Page Attribute Table –Fast Floating-point save/restore –New Instructions –New Exceptions Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 4 2 36-bit Addressing l 36-bit Addressing – PSE-36 – PAE-36 l PSE-36 – 4GB mapped through 4K of page directories and 4MB page tables – Memory above 4 GB is only accessible as 4 MB pages – Operating system can freely use both 4KB and 4MB pages without PDE structure change – All 4KB pages and page tables MUST reside below 4GB boundary – Reduces effort needed to develop & support changes in virtual memory subsystem l PAE-36 – 4GB mapped through 16K of page directories and 16MB page tables – All Memory accessible as 4KB or 2MB pages – OS needs to load PDEPTRs for mapping changes on writes to CR3 – CONFIG_HIMEM to enable more than 4 GB physical memory Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 5 4KB Page Translation Linear Address 10 10 12 PTE PDE 4-Kb page CR3 Page Table Page Directory 1024 Entries 1024 Entries Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 6 3 4KB PAE Translation Linear Address 2 9 9 12 PTE 4-Kb page PDE PDPE Page Table 512 Entries CR3 Page Directory Page Directory 512 Entries Pointer Table Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 7 4MB Translation Linear Address 10 22 PDE CR3 4-MB page Page Directory 1024 Entries Provides bits 31-22 of physical address of 4MB Page (Bits 12-21 are currently RESERVED) 31 22 17 12 0 Page Directory Entry 16 13 Provides bits 35-32 of physical address of 4MB Page (new) Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 8 4 Example Using PSE-36 8GB ONLY 4MB PAGES ABOVE 4GB 4-byte Entries 4MB Page 4-byte Entries 4K Page . 4K Page 4K Page 4GB 4MB Page CR3 Page Table Page Directory 4K & 4MB PAGES BELOW 4GB 0 Physical Memory Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 9 Page Attribute Table (PAT) l Physical Memory Attributes Described through the Page-Tables – Builds upon enhanced memory type capability provided via MTRR’s in Pentium® Pro processor – Relaxes MTRR alignment/length requirements l Builds upon PCD/PWT bits on IA-32 Architecture – These interact with effective memory type determination l PAT Architecture – PAT is an 8-entry table indexed via PCD, PWT, and Resv. bits – Allows up to 8 memory attributes defined by the page tables – PAT is always enabled when Paging is used – Default table entries fully compatible with PCD/PWT/Resv settings – PAT entries R/W programmable via RDMSR/WRMSR (0x277) – 8 bits per entry; 3 bits for attribute with other bits reserved – Memory attributes as specified by Pentium® Pro processor Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 10 5 Page Attribute Table (PAT) l PAT Architecture (continued) – PAT Memory Types interact with MTRRs – As architecturally specified by Pentium® Pro processor – Implementation specific combinations remain undefined – Should not be depended upon by system software l Precautions – OS Uses Page Directory as a Page Table: – Restricted to using 4 lowest PAT entries – PAT bit 7 in 4K PTE is PS bit when used as a PDE – Memory type changes for pages require TLB invalidation – Follow procedure as when changing MTRRs – cache flush, TLB invalidation – PAT entries on multiple processors must be maintained in consistent manner by OS – All processors have same values in PAT Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 11 Page Attribute Table (PAT) l Precautions (continued) – Page Aliasing – PAT maintains memory types according to linear addresses – Architecture allows OS to map single physical page with 2 linear addresses containing differing types – This may lead to undefined results and must be avoided l PAT Uses – Essentially unlimited MTRRs – Provide support for more devices (frame buffers, RAID cards, etc…) to map memory as WC – Allows map system memory for specific optimizations – Memory shared with 3D accelerator/CPU for textures – Reduce eviction, read-for-ownership bus transactions and cache thrashing for common operations such as memory fill Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 12 6 Fast Floating Point Save/Restore l These instructions minimize cost of saving/restoring Floating Point/MMX™ Technology State – Does NOT re-initialize the FPU state after saving – Performance improvements come from more natural format and alignment of the cpu state l State area is larger – 512 bytes – MUST be aligned on 16 byte boundary, else GP(0) fault – Use of reserved fields risks incompatibility with future Intel Architecture processors l FXSAVE does not check for unmasked exceptions (i.e. like FNSAVE) – FXRSTOR does not fault when loading an image that contains pending exceptions Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 13 Pentium III New Instructions Core Floating Multimedia Memory Architecture Point Arch. Architecture Architecture MMX™ P6 bus + Pentium III FP Technology WC I/O processors= Dynamic + + + Execution New Media Streaming SIMD FP Instr. Mem Instr l 52 New SIMD Single Precision Floating Point Instructions l up to 4 FP results per cycle l Eight 128 bit registers l 4 x Single precision FP numbers l 12 New Media Instructions l 8 New Cacheability Instructions Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 14 7 Prefetching Instruction l Prefetch gets a cacheline at a time l Prefetch Hint (Load) Instructions – Instructions do not fault – Retires quickly to free up machine resources – Hints to cache at different levels – Store in different levels of cache hierarchy – Don’t store in the cache hierarchy (stream) l Potential OS tuning uses – e.g. TCP/IP Checksum gets ~2x speedup Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 15 Streaming Store Instruction l Store data to memory minimizing cache pollution l Potential OS tuning benefits – 128 bit registers used with streaming store to zero pages ~4x faster – mem copy ~2x faster using prefetch/stream together Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 16 8 New Exceptions l Interrupt vector 19 used to invoke unmasked exception handlers – Provide larger (512 bytes of state) context record to handler – Handlers need to account for SIMD nature of Pentium III SSE numeric exceptions – One instruction can generate multiple exceptions l Exceptions are precise (reported when detected) l Pentium III SSE Instructions architecturally separate from x87- FP – Pentium III SSE Instructions do not report x87-FP/MMX™ Technology exceptions – New handlers must include IEEE filter to decode and emulate exception raising SIMD instructions Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 17 Pentium® 4 Processor l Pentium 4 Processor New Features –SSE2 Instructions –Enhanced Prefetch Instructions –System Bus and Cache Enhancements l OS Recommendation l New Instruction support Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 18 9 Pentium 4 Architecture Overview l Willamette is the next generation IA-32 processor microarchitecture l New micro-architecture – ~1.4x average performance of Pentium® III processor family on same process – Enables faster processor speeds (1 GHz+) – Trace Cache for Instruction Decode l Willamette New Instructions l New platform (chipsets, AGP4X) Intel Labs Copyright © 2000 Intel Corporation. Linux Supercluster Users Conference Page 19 Pentium 4 New Instructions l New 128 bit arithmetic instructions – Extend MMXä technology instructions from 64 bit to 128 bit data type – Operates on XMM registers instead of MMX/x87-FP registers – New 128-bit integer and SIMD-Integer instructions – Memory operands MUST be 128-bit aligned! Will cause Exception during executions if not aligned. – Packed 32 * 32 bit Multiply – Packed 64 bit Add/Subtract – Shift, Shuffle, Unpack, Move, Conversion l New SIMD Double Precision FP instructions – Full complement of FP arithmetic operations – Packed/Scalar DP Û SP conversions l New cache / memory management instructions – Cache line flush

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    27 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us