COS 318: Operating Systems
Virtual Memory Design Issues: Address Translation

Jaswinder Pal Singh and a Fabulous Course Staff
Computer Science Department
Princeton University
(http://www.cs.princeton.edu/courses/cos318/)

Design Goals

- Protection
- Virtualization
  - Use disk to extend physical memory
  - Make addressing user friendly (0 to high address)
- Enabling memory sharing (code, libraries, communication)
- Efficiency
  - Translation efficiency (TLB as cache)
  - Access efficiency
    - Access time = h × memory access time + (1 - h) × disk access time
    - E.g. suppose memory access time = 100 ns, disk access time = 10 ms
    - If h = 90%, VM access time is 1 ms!
- Portability
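The access-time formula under Design Goals can be checked with a few lines of Python. This is only a sketch: the function name `effective_access_time_ns` is made up for illustration, and h is read as the fraction of accesses served from memory. The 100 ns and 10 ms figures are the ones given on the slide.

```python
def effective_access_time_ns(hit_ratio, memory_ns, disk_ns):
    """Average access time: h * memory + (1 - h) * disk, in nanoseconds."""
    return hit_ratio * memory_ns + (1.0 - hit_ratio) * disk_ns

MEMORY_NS = 100          # 100 ns memory access, as on the slide
DISK_NS = 10_000_000     # 10 ms disk access, expressed in nanoseconds

eat = effective_access_time_ns(0.90, MEMORY_NS, DISK_NS)
print(f"h = 90%: {eat / 1e6:.3f} ms")

# The disk term dominates: even h = 99.9% still costs about 10 microseconds.
print(f"h = 99.9%: {effective_access_time_ns(0.999, MEMORY_NS, DISK_NS) / 1e3:.1f} us")
```

At h = 90% the result is roughly 1 ms, about 10,000 times slower than memory alone, which is why the hit ratio has to be very close to 1.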


Recall Address Translation: Base and Bound

[Figure: processor's view vs. implementation. The virtual address is checked against Bound (raise exception if out of range) and added to Base to form the physical address; the process occupies physical memory from Base to Base+Bound.]

- Pros: simple, fast, cheap, safe, can relocate
- Cons:
  - Can’t keep program from accidentally overwriting its own code
  - Can’t share code/data with other processes (all or nothing)
  - Can’t grow stack/heap as needed (stop program, change reg, …)

Recall Address Translation: Segmentation

- A segment is a contiguous region of virtual memory
- Every process has a segment table (in hardware)
  - Entry in table per segment
- Segment can be located anywhere in physical memory
  - Each segment has: start, length, access permission
- Processes can share segments
  - Same start, length, same/different access permissions
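The base-and-bound check recalled above amounts to one comparison and one addition. A minimal sketch, with the hypothetical names `translate_base_bound` and `ProtectionFault` and made-up register values:

```python
class ProtectionFault(Exception):
    """Raised when a virtual address falls outside the process's bound."""

def translate_base_bound(vaddr, base, bound):
    # Hardware check: the virtual address must lie in [0, bound).
    if not 0 <= vaddr < bound:
        raise ProtectionFault(f"address {vaddr:#x} outside bound {bound:#x}")
    # Relocation: physical address = base + virtual address.
    return base + vaddr

# Example values are assumptions, not taken from the slides.
paddr = translate_base_bound(0x0100, base=0x4000, bound=0x1000)
print(hex(paddr))
```

Relocating the process only requires changing `base`, which is why the scheme is cheap. It is also why sharing is all or nothing: there is a single region per process.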


Segmentation

[Figure: process view vs. implementation. The virtual address is split into Segment and Offset. The segment number selects a segment table entry (Base, Bound, Access); the offset is checked against Bound (raise exception on failure) and added to Base to form the physical address. Physical memory holds code (Base 0), data (Base 1), heap (Base 2) and stack (Base 3), each spanning Base to Base+Bound; access bits in the table: Read, R/W, R/W, R/W.]

- Segments contiguous, but gaps in VM between them
- Segment table small, so stored on-CPU
- Access control on per-segment basis, allows code protection, e.g.

Segments Enable Copy-on-Write

- Idea of Copy-on-Write
  - Child process inherits copy of parent’s address space on fork
  - But don’t really want to make a copy of all data upon fork
  - Would like to share as far as possible and make own copy only “on-demand”, i.e. upon a write
- Segments allow this to an extent
  - Copy segment table into child, not entire address space
  - Mark all parent and child segments read-only
  - Start child process; return to parent
  - If child or parent writes to a segment (e.g. stack, heap)
    - Trap into kernel
    - At this point, make a copy of the segment, and resume
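The segment-table lookup from the Segmentation figure can be sketched as follows. The 12-bit offset width, the `Segment` and `SegmentFault` names, and the table contents are assumptions made for illustration. The read-only code segment shows the per-segment access control that also makes the copy-on-write trap possible.

```python
from dataclasses import dataclass

@dataclass
class Segment:
    base: int       # start of the segment in physical memory
    bound: int      # segment length in bytes
    writable: bool  # access permission (False = read-only)

class SegmentFault(Exception):
    pass

OFFSET_BITS = 12  # assumed split: high bits = segment number, low 12 bits = offset

def translate(segment_table, vaddr, is_write=False):
    seg_no = vaddr >> OFFSET_BITS
    offset = vaddr & ((1 << OFFSET_BITS) - 1)
    if seg_no >= len(segment_table):
        raise SegmentFault("no such segment")
    seg = segment_table[seg_no]
    if offset >= seg.bound:
        raise SegmentFault("offset beyond segment bound")
    if is_write and not seg.writable:
        raise SegmentFault("write to read-only segment")
    return seg.base + offset

table = [
    Segment(base=0x8000, bound=0x400, writable=False),  # code: read-only
    Segment(base=0x2000, bound=0x800, writable=True),   # data: read/write
]
paddr = translate(table, 0x1010)  # segment 1, offset 0x10
print(hex(paddr))
```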


Segmentation

- Pros
  - Can share code/data segments between processes
  - Can protect code segment from being overwritten
  - Can transparently grow stack/heap as needed
  - Can detect if need to copy-on-write
- Cons
  - Complex memory management due to external fragmentation
    - Need to find chunk of particular size
    - Wasted space between chunks/segments
    - May need to rearrange memory from time to time to make room for new segment or to grow segment

Recall Address Translation: Paging

- Manage memory in fixed size units, or pages
- Finding a free page is easy
  - Effectively a bitmap allocation: 0011111100000001100
  - Every bit represents one physical page frame
- Every process has its own page table
  - Stored in physical memory
  - Supported by a couple of hardware registers:
    - Pointer to start of page table
    - Page table length
- Recall fancier structures: segmentation+paging, multi-level PT
  - Better for sparse virtual address spaces
  - E.g. per-processor heaps, per-thread stacks, memory mapped files, dynamically linked libraries, …
  - Eliminate need for page table entries for address space “holes”
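The bitmap allocation mentioned on the Paging slide can be sketched in a few lines. The slide does not say which bit value means "free"; the sketch assumes 1 = in use and 0 = free, and the class name `FrameBitmap` is hypothetical.

```python
class FrameBitmap:
    """One bit per physical page frame (assumed: 1 = in use, 0 = free)."""

    def __init__(self, bits):
        self.bits = [int(b) for b in bits]

    def allocate(self):
        """Return the index of the first free frame and mark it used."""
        for frame, used in enumerate(self.bits):
            if not used:
                self.bits[frame] = 1
                return frame
        raise MemoryError("no free page frame")

    def free(self, frame):
        self.bits[frame] = 0

bitmap = FrameBitmap("0011111100000001100")  # bit pattern from the slide
first, second, third = bitmap.allocate(), bitmap.allocate(), bitmap.allocate()
print(first, second, third)
```

Because all frames have the same size, any free frame will do. That removes the "find a chunk of a particular size" problem listed under the segmentation cons.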


Multilevel Page Table

[Figure: implementation. The virtual address is split into Index 1, Index 2, Index 3 and Offset; the indices select entries in the level 1, level 2 and level 3 tables in turn, yielding a frame that is combined with the offset to form the physical address.]

Copy on Write with Paging

- UNIX fork with copy on write
  - Copy page table of parent into child process
  - Mark all pages (in new and old page tables) as read-only
  - Trap into kernel on write (in child or parent)
  - Copy page
  - Mark both as writeable
  - Resume execution
  - Finer grained than with segments
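The three-level lookup on the Multilevel Page Table slide can be sketched with nested dictionaries standing in for the tables. The field widths (9 bits per index, 12-bit offset) and the names `map_page`, `walk` and `PageFault` are assumptions, not taken from the slides.

```python
OFFSET_BITS = 12   # assumed 4 KB pages
INDEX_BITS = 9     # assumed 9 bits per level
LEVELS = 3
INDEX_MASK = (1 << INDEX_BITS) - 1

class PageFault(Exception):
    pass

def map_page(root, vpn, frame):
    """Install vpn -> frame, creating intermediate tables only on demand."""
    table = root
    for level in range(LEVELS - 1):
        shift = INDEX_BITS * (LEVELS - 1 - level)
        table = table.setdefault((vpn >> shift) & INDEX_MASK, {})
    table[vpn & INDEX_MASK] = frame

def walk(root, vaddr):
    """Translate vaddr by descending one table per index field."""
    vpn, offset = vaddr >> OFFSET_BITS, vaddr & ((1 << OFFSET_BITS) - 1)
    entry = root
    for level in range(LEVELS):
        shift = INDEX_BITS * (LEVELS - 1 - level)
        index = (vpn >> shift) & INDEX_MASK
        if index not in entry:
            raise PageFault(f"unmapped address {vaddr:#x}")
        entry = entry[index]
    return (entry << OFFSET_BITS) | offset  # last-level entry is the frame

root = {}
map_page(root, vpn=0x12345, frame=0x42)
paddr = walk(root, (0x12345 << OFFSET_BITS) | 0xABC)
print(hex(paddr))
```

Only the tables on the path to a mapped page exist, so unused regions ("holes") of the address space cost no page table entries.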

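The fork steps listed under Copy on Write with Paging can be simulated in a small sketch. Everything here is hypothetical (the `Process` class, the `memory` dictionary standing in for physical frames, the function names). It also adds a per-frame reference count, a common refinement that is not on the slide: instead of marking both entries writeable at once, the last remaining sharer regains write access on its next write without copying.

```python
class Process:
    def __init__(self, page_table):
        self.page_table = page_table  # vpn -> [frame, writable]

memory = {}     # frame number -> bytearray (simulated physical memory)
refcount = {}   # frame number -> number of page tables pointing at it
next_frame = 0

def alloc_frame(data):
    global next_frame
    frame, next_frame = next_frame, next_frame + 1
    memory[frame] = bytearray(data)
    refcount[frame] = 1
    return frame

def fork(parent):
    """Copy the page table only; mark every page read-only in both tables."""
    child_table = {}
    for vpn, entry in parent.page_table.items():
        entry[1] = False
        child_table[vpn] = [entry[0], False]
        refcount[entry[0]] += 1
    return Process(child_table)

def write(proc, vpn, offset, value):
    entry = proc.page_table[vpn]
    if not entry[1]:                    # stands in for the trap into the kernel
        if refcount[entry[0]] > 1:      # still shared: copy the page
            refcount[entry[0]] -= 1
            entry[0] = alloc_frame(memory[entry[0]])
        entry[1] = True                 # private now, so writable again
    memory[entry[0]][offset] = value    # resume the write

parent = Process({0: [alloc_frame(b"\x00" * 4), True]})
child = fork(parent)
write(child, 0, 0, 7)   # child gets its own copy; parent's page is untouched
```

Only the page that is actually written gets copied, which is what "finer grained than with segments" refers to.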

Shared Pages

- PTEs from two processes share the same physical pages
  - Entries in both page tables point to the same page frames
  - What use cases?
- Implementation issues
  - What if you terminate a process with shared pages
  - Paging in/out shared pages
  - Pinning, unpinning shared pages
  - Deriving the working set for a process with shared pages

[Figure: page table 1 and page table 2, each with entries (v, vp#, pp#), pointing to the same physical pages.]

Pinning (or Locking) Page Frames

- When do you need it?
  - When DMA is in progress, you don’t want to page the pages out, to keep the CPU from overwriting the pages
- Mechanism?
  - A data structure to remember all pinned pages
  - Paging algorithm checks the data structure to decide on page replacement
  - Special calls to pin and unpin certain pages
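The pinning mechanism described above (a structure remembering pinned pages, consulted by the paging algorithm) can be sketched as follows. The slides do not name a replacement policy; LRU is assumed here purely for illustration, and `FramePool` and its methods are hypothetical names.

```python
from collections import OrderedDict

class FramePool:
    """Resident pages in LRU order, plus a set of pinned pages."""

    def __init__(self):
        self.resident = OrderedDict()  # page -> None, least recently used first
        self.pinned = set()

    def touch(self, page):
        self.resident[page] = None
        self.resident.move_to_end(page)

    def pin(self, page):      # "special call" to pin, e.g. before starting DMA
        self.pinned.add(page)

    def unpin(self, page):    # "special call" to unpin, e.g. after DMA completes
        self.pinned.discard(page)

    def choose_victim(self):
        """Evict the least recently used page that is not pinned."""
        victim = next((p for p in self.resident if p not in self.pinned), None)
        if victim is None:
            raise RuntimeError("all resident pages are pinned")
        del self.resident[victim]
        return victim

pool = FramePool()
for page in ("A", "B", "C"):
    pool.touch(page)
pool.pin("A")                  # A is the DMA buffer: must stay resident
victim = pool.choose_victim()  # skips A even though it is the oldest page
print(victim)
```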


Zeroing Pages

- Initialize pages to all zero values
  - Heap and static data are initialized
- How to implement?
  - On the first page fault on a data page or stack page, zero it
  - Or, have a special thread zeroing pages in the background

Efficient address translation

- Recall translation lookaside buffer (TLB)
  - Cache of recent virtual page -> physical page translations
  - If cache hit, use translation
  - If cache miss, walk (perhaps multi-level) page table

[Figure: the processor sends the virtual address to the TLB. On a hit, the frame is combined with the offset to form the physical address used to access physical memory. On a miss, the page table is consulted: a valid entry supplies the frame, an invalid entry raises an exception.]
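The hit/miss path through the TLB can be sketched as a small cache in front of a page table. The page size, the LRU eviction policy and the `TLB` class are assumptions for illustration; a plain dictionary stands in for the (possibly multi-level) page table walk.

```python
from collections import OrderedDict

PAGE_SIZE = 4096  # assumed page size

class TLB:
    def __init__(self, capacity, page_table):
        self.capacity = capacity
        self.page_table = page_table   # vpn -> frame (stand-in for a real walk)
        self.entries = OrderedDict()   # vpn -> frame, least recently used first
        self.hits = self.misses = 0

    def translate(self, vaddr):
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn in self.entries:                   # TLB hit: use translation
            self.hits += 1
            self.entries.move_to_end(vpn)
        else:                                     # TLB miss: consult page table
            self.misses += 1
            if vpn not in self.page_table:
                raise KeyError(f"invalid page {vpn}")  # "raise exception" path
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)       # evict LRU entry
            self.entries[vpn] = self.page_table[vpn]
        return self.entries[vpn] * PAGE_SIZE + offset

tlb = TLB(capacity=2, page_table={0: 5, 1: 9, 2: 3})
a = tlb.translate(0x10)   # miss: loads vpn 0
b = tlb.translate(0x20)   # hit: same page
print(a, b, tlb.hits, tlb.misses)
```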


TLB Performance

- Cost of translation = cost of TLB lookup + Prob(TLB miss) * cost of page table lookup
- Cost of a TLB miss on a modern processor?
  - Cost of multi-level page table walk
  - Software-controlled: plus cost of trap handler entry/exit
  - Use additional caching principles: multi-level caching, etc.

TLB is important: Intel i7 Processor Chip

Intel i7 Memory hierarchy

[Figure: Intel i7 memory hierarchy.] i7 has 8MB as shared 3rd level cache; 2nd level cache is per-core.
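The translation-cost formula from the TLB Performance slide, including the extra trap cost for software-controlled TLBs, can be written out directly. All cycle counts below are made-up illustrative values, not measurements of the i7 or any other processor, and `translation_cost` is a hypothetical name.

```python
def translation_cost(tlb_lookup, miss_prob, page_table_lookup, trap_overhead=0):
    """Cost = TLB lookup + Prob(miss) * (page table lookup + trap overhead)."""
    return tlb_lookup + miss_prob * (page_table_lookup + trap_overhead)

# Hypothetical numbers, in cycles (assumptions, not from the slides).
hardware = translation_cost(tlb_lookup=1, miss_prob=0.01, page_table_lookup=100)
software = translation_cost(tlb_lookup=1, miss_prob=0.01, page_table_lookup=100,
                            trap_overhead=200)
print(hardware, software)
```

With these assumed values, a 1% miss rate already doubles or quadruples the cost of a translation, which is the argument for caching translations at more than one level.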


Problem with Translation Slowdown

- What is the cost of a first level TLB miss?
  - Second level TLB lookup
- What is the cost of a second level TLB miss?
  - x86: 2-4 level page table walk
- Problem: Do we need to wait for the address translation in order to look up the caches (for code and data)?

Virtually vs. Physically Addressed Caches

- It can be too slow to first access TLB to find physical address, then look up address in the cache
- Instead, first level cache is virtually addressed
- In parallel with cache lookup using virtual address, access TLB to generate physical address in case of a cache miss


Virtually addressed caches

[Figure: the processor looks up the virtual cache with the virtual address; a hit returns data. On a miss, the TLB (and, on a TLB miss, the page table) supplies the frame, which is combined with the offset into the physical address used to access physical memory; an invalid page table entry raises an exception.]

Physically addressed cache

[Figure: same path as in the previous figure, with a physical cache looked up by physical address after translation; a hit returns data, a miss goes to physical memory.]


When do TLBs work/not work?

- Video Frame Buffer: 32 bits x 1K x 1K = 4MB

[Figure: video frame buffer spanning pages 0, 1, 2, 3, ..., 1021, 1022, 1023.]

Superpages

- On many systems, TLB entry can be
  - A page
  - A superpage: a set of contiguous pages
- x86: superpage is a set of pages in one page table
  - x86 TLB entries
    - 4KB
    - 2MB
    - 1GB
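The frame-buffer example shows why superpages help: the number of TLB entries needed to cover the 4 MB buffer depends on the page size. A short calculation, assuming the buffer is suitably aligned (the helper name `tlb_entries_needed` is hypothetical):

```python
FRAME_BUFFER_BYTES = 4 * 1024 * 1024  # 32 bits x 1K x 1K = 4 MB

def tlb_entries_needed(region_bytes, page_bytes):
    """Number of TLB entries needed to map a region (ceiling division)."""
    return -(-region_bytes // page_bytes)

# The three x86 TLB entry sizes listed on the Superpages slide.
for name, size in [("4KB", 4 << 10), ("2MB", 2 << 20), ("1GB", 1 << 30)]:
    print(name, tlb_entries_needed(FRAME_BUFFER_BYTES, size))
```

With 4 KB pages the buffer needs 1024 entries (pages 0 to 1023 in the figure), so sweeping over it can evict everything else from a small TLB. With 2 MB superpages two entries suffice.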


Superpages

[Figure: the virtual address is either Page# + Offset or SP + Offset. Each TLB entry holds a superpage (SP) or page number, a superframe (SF) or frame, and access bits. A matching entry yields Frame + Offset, a matching superpage yields SF + Offset, and otherwise a page table lookup follows. The physical address is used to access physical memory.]

When do TLBs Work/Not Work, Part 3

- What happens on a context switch?
  - Keep using TLB?
  - Flush TLB?
- Solution: Tagged TLB
  - Each TLB entry has process ID
  - TLB hit only if process ID matches current process

[Figure: implementation. Each TLB entry holds Process ID, Page, Frame and Access. The processor's virtual address (Page# + Offset) hits only on an entry whose page and process ID both match, giving Frame + Offset as the physical address; otherwise a page table lookup follows.]
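The tagged TLB idea can be sketched by keying entries on the pair (process ID, page number). The class and method names are hypothetical, and capacity limits and eviction are left out to keep the sketch short.

```python
class TaggedTLB:
    """TLB whose entries are keyed by (process ID, virtual page number)."""

    def __init__(self):
        self.entries = {}        # (pid, vpn) -> frame
        self.current_pid = None

    def context_switch(self, pid):
        # No flush needed: entries of other processes simply stop matching.
        self.current_pid = pid

    def insert(self, vpn, frame):
        self.entries[(self.current_pid, vpn)] = frame

    def lookup(self, vpn):
        """Return the frame on a hit, or None on a miss."""
        return self.entries.get((self.current_pid, vpn))

tlb = TaggedTLB()
tlb.context_switch(1)
tlb.insert(vpn=7, frame=100)
tlb.context_switch(2)
miss = tlb.lookup(7)      # same page number, different process: miss
tlb.insert(vpn=7, frame=200)
tlb.context_switch(1)
hit = tlb.lookup(7)       # process 1's entry survived the switches
print(miss, hit)
```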


Aliasing

- Alias: two (or more) virtual cache entries that refer to the same physical memory
  - A consequence of a tagged virtually addressed cache!
  - A write to one copy needs to update all copies
- Typical solution
  - Keep both virtual and physical address for each entry in virtually addressed cache
  - Lookup virtually addressed cache and TLB in parallel
  - Check if physical address from TLB matches multiple entries, and update/invalidate other copies

TLB Consistency Issues

- “Snoopy” cache protocols (hardware)
  - Maintain consistency with DRAM, even when DMA happens
- Consistency between DRAM and TLBs (software)
  - You need to flush related TLBs whenever changing a page table entry in memory
- TLB “shoot-down”
  - On multiprocessors/multicore, when you modify a page table entry, need to flush all related TLB entries on all processors/cores
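The TLB shoot-down rule can be illustrated with a toy model in which each core keeps its own dictionary of cached translations. The `Core` class and `update_pte` function are hypothetical; real kernels typically notify the other cores with inter-processor interrupts, which the loop below only imitates.

```python
class Core:
    def __init__(self):
        self.tlb = {}  # vpn -> frame cached by this core

    def invalidate(self, vpn):
        self.tlb.pop(vpn, None)

def update_pte(page_table, cores, vpn, new_frame):
    """Change a page table entry, then flush the stale entry on every core."""
    page_table[vpn] = new_frame
    for core in cores:          # stands in for notifying each processor/core
        core.invalidate(vpn)

page_table = {3: 10, 4: 11}
cores = [Core(), Core()]
for core in cores:
    core.tlb.update(page_table)   # both cores have cached both translations

update_pte(page_table, cores, vpn=3, new_frame=20)
print([core.tlb for core in cores])
```

Without the invalidation loop, a core could keep translating page 3 to the old frame 10 even though the page table in memory already says 20.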


Summary

- Must consider many issues
  - Global and local replacement strategies
  - Management of backing store
  - Primitive operations
    - Pin/lock pages
    - Zero pages
    - Shared pages
    - Copy-on-write
- Real system designs are complex
