Address Translation with Paging: Case Studies for X86, SPARC, and PowerPC

Overview

• Page tables
  – What are they? (review)
  – What does a page table entry (PTE) contain?
  – How are page tables organized?
• Making page table access fast
  – Caching entries
  – Translation lookaside buffer (TLB)
  – TLB management
• Case studies for X86, SPARC, and PowerPC

Some acronyms used in this lecture:
• PTE = page table entry
• PDE = page directory entry
• VA = virtual address
• PA = physical address
• VPN = virtual page number
• {R,P}PN = {real, physical} page number

Generic Page Table

• Memory is divided into pages
• A page table is a collection of PTEs that maps a virtual page number to a PTE
• Organization and content vary with the architecture
• If no virtual-to-physical mapping exists => page fault

[Diagram: a virtual page # indexes the page table, yielding a PTE, or a page fault if no mapping exists]

Generic PTE

• A PTE maps a virtual page to a physical page
• Includes some page properties
  – Valid?, writable?, dirty?, cacheable?

[PTE layout: Virtual Page # | Physical Page # | Property bits]

Real Page Tables

• Design requirements
  – Minimize memory use (page tables are pure overhead)
  – Fast (logically accessed on every memory reference)
• Requirements lead to
  – Compact data structures
  – O(1) access (e.g. indexed lookup, hashtable)
• Examples: X86 and PowerPC

X86-32 Address Translation

• Page tables organized as a two-level tree
  – Efficient because the address space is sparse
  – Each level of the tree is indexed using a piece of the virtual page number for fast lookups
• One set of page tables per process
  – The current set of page tables is pointed to by CR3
• The CPU walks the page tables to find translations
  – Accessed and dirty bits are updated by the CPU
• 4K or 4M (sometimes 2M) pages

X86-32 Page Table Lookup

• 32-bit virtual address = 10-bit page directory index | 10-bit page table index | 12-bit offset of a byte in the page
• The top 10 address bits index the page directory and return a page directory entry that points to a page table
• The middle 10 bits index that page table, whose entry points to a physical memory page
• The bottom 12 bits are an offset to a single byte in the physical page
• Checks are made at each step to ensure the desired page is present in memory and that the requesting process has sufficient rights to access it
X86-32 PDE and PTE Details

Page Directory Entry (PDE)
• 20-bit page number of a page table, plus 12 property bits
• Properties: Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Reserved, 4K or 4M Page, Global, and three bits Available to the OS

Page Table Entry (PTE)
• 20-bit page number of a physical memory page, plus 12 property bits
• Properties: Present, Read/Write, User/Supervisor, Write-through, Cache Disabled, Accessed, Dirty, PAT, Global, and three bits Available to the OS

• Where is the virtual page number?
• If a page is not present, all bits but bit 0 are available to the OS

IA-32 Intel Architecture Software Developer's Manual, Volume 3, pg. 3-24

X86-32 and PAE

• Intel added support for up to 64GB of physical memory in the Pentium Pro, called Physical Address Extensions (PAE)
• Introduced a new CPU mode and another layer in the page tables
• In PAE mode, 32-bit VAs map to 36-bit PAs
  – The single-process address space is still 32 bits
• A 4-entry page-directory-pointer table (PDPT) points to a page directory; translation then proceeds as normal
• Page directory and page table entries are expanded to 64 bits to hold 36-bit physical addresses
  – Only 512 entries per 4K page
• 4K or 2M page sizes

What about 64-bit X86?

• X86-64 (AMD64 or EM64T) supports a 64-bit virtual address (only 48 bits effective)
• Three modes
  – Legacy 32-bit (32-bit VA, 32-bit PA)
  – Legacy PAE (32-bit VA, up to 52-bit PA)
  – Long PAE mode (64-bit VA, 52-bit PA)
• Long mode requires four levels of page tables to map a 48-bit VA to a 52-bit PA

AMD64 Architecture Programmer's Manual Volume 2: System Programming, Ch. 5
PowerPC Address Translation

• A 64-bit "effective" address is generated by the program
• An 80-bit virtual address is obtained via the PowerPC segmentation mechanism
• Translation yields a 62-bit physical ("real") address
• PTEs are organized in a hash table (HTAB)
  – Each HTAB entry is a page table entry group (PTEG)
  – Each PTEG holds eight 16-byte PTEs
• A hash function on the VPN gives the index of two PTEGs (a primary and a secondary)
  – The resulting 16 PTEs are searched for a VPN match
  – No match => page fault
• Segmentation is used to separate processes within the large virtual address space

PowerPC Segmentation

• 64-bit effective address = 36-bit ESID | 28 address bits
• The Segment Lookaside Buffer (SLB) is an "associative memory"
• The top 36 bits of a program-generated effective address are used as a tag called the effective segment id (ESID)
• The SLB is searched for the tag value
  – If a match exists, property bits (U/S, X, V) are validated for access
  – A failed match causes a segment fault
• The matching entry's 52-bit virtual segment id (VSID) is concatenated with the remaining 28 address bits to form the 80-bit virtual address (52-bit VSID | 28 address bits) used for the page table lookup

PowerPC Page Table Lookup

• Variable-size hash table (HTAB)
• A processor register points to the hash table base and gives the table's size
• An architecture-defined hash function on the virtual address returns two possible hash table entries (a primary and a secondary hash index)
• Each of the 16 possible PTEs is checked for a VA match
  – If no match, then page fault
  – A translation may exist but not fit in the hash table; the OS must handle this case

PowerPC PTE Details

• 16-byte PTE containing both the VPN and the RPN
• [Layout: one doubleword holds the Abbreviated Virtual Page Number (bits 0–56) plus the SW, H, and V fields; the other holds the Real Page Number (bits 2–51) plus AC, R, C, WIMG, N, and PP]
  – SW = Available for OS use
  – H = Hash function ID
  – V = Valid bit
  – AC = Address compare bit
  – R = Referenced bit
  – C = Changed bit
  – WIMG = Storage control bits
  – N = No-execute bit
  – PP = Page protection bits
• Why only a 57-bit VPN?

PowerPC Operating Environment Architecture, Book III, Version 2.01, Sections 4.3-4.5

Making Translation Fast

• The page table is logically accessed on every instruction
• Paging has turned each memory reference into at least three memory references
• Page table access has temporal locality
• Use a cache to speed up access: the Translation Lookaside Buffer (TLB)

[Diagram: virtual address → TLB → physical address, TLB miss, or access fault]

Generic TLB

• A cache of recently used PTEs
• Small: usually about 64 entries
• Huge impact on performance
• Various organizations, search strategies, and levels of OS involvement are possible
• Consider X86 and SPARC

TLB Organization

• TLB entry = Tag (virtual page number) | Value (page table entry)
• A 16-entry TLB can be organized as direct mapped (16 sets of 1), two-way set associative (8 sets of 2), four-way set associative (4 sets of 4), or fully associative (1 set of 16)
• Lookup
  – Calculate the index (index = tag % num_sets)
  – Search for the tag within the resulting set
• Why not use the upper bits of the tag value for the index?

Associativity Trade-offs

• Higher associativity
  – Better utilization, fewer collisions
  – Slower
  – More hardware
• Lower associativity
  – Fast
  – Simple, less hardware
  – Greater chance of collisions
• How does page size affect TLB performance?
X86 TLB

• TLB management is shared by the processor and the OS
  – The CPU fills the TLB on demand from the page table (the OS is unaware of TLB misses)
  – The CPU evicts entries when a new entry must be added and no free slots exist
• The OS ensures TLB/page table consistency by flushing entries as needed when the page tables are updated or switched (e.g. during a context switch)
• TLB entries can be removed by the OS one at a time using the INVLPG instruction, or the entire TLB can be flushed at once by writing a new entry into CR3

Example: Pentium-M TLBs

• Four different TLBs
  – Instruction TLB for 4K pages: 128 entries, 4-way set associative
  – Instruction TLB for large pages: 2 entries, fully associative
  – Data TLB for 4K pages: 128 entries, 4-way set associative
  – Data TLB for large pages: 8 entries, 4-way set associative
• All TLBs use an LRU replacement policy
• Why different TLBs for instruction, data, and page sizes?

SPARC TLB

• SPARC is RISC (simpler is better)
• An example of a "software-managed" TLB
  – On SPARC, TLB misses trap to the OS (SLOW)
  – A TLB miss causes a fault, handled by the OS
  – The OS explicitly adds entries to the TLB
• The OS is free to organize its page tables any way it wants because the CPU does not use them
  – E.g. Linux uses a tree like X86; Solaris uses a hash table

Minimizing Flushes

• We want to avoid TLB misses
• Retain TLB contents across context switches
  – SPARC TLB entries are enhanced with a context id
  – The context id allows entries with the same VPN to coexist in the TLB (e.g. entries from different process address spaces)
  – When a process is switched back onto a processor, chances are that some of its TLB state has been retained from the last time it ran
• Some TLB entries are shared (e.g. OS kernel memory)
  – Mark them as global
  – The context id is ignored during matching

Example: UltraSPARC III TLBs

• Five different TLBs
• Instruction TLBs
  – 16 entries, fully associative (supports all page sizes)
  – 128 entries, 2-way set associative (8K pages only)
• Data …

Speeding Up TLB Miss Handling

• In some cases a huge amount of time can be spent handling TLB misses (2-50% in one study of SuperSPARC and SunOS)
• Many architectures that use software-managed TLBs have hardware-assisted TLB miss handling
• SPARC uses a large, virtually indexed, direct-mapped, physically contiguous table of recently used TLB entries called the Translation …
