Operating Systems & Memory Systems: Address Translation & Caches
CPS 220 Professor Alvin R. Lebeck Fall 2001
Outline
• Review
• TLBs
• Page Table Designs
• Interaction of VM and Caches
Admin
• HW #4 due today
• Project status report due Thursday
© Alvin R. Lebeck 2001 CPS 220 2
Virtual Memory: Motivation
• Process = address space + thread(s) of control
• Without VM (address space = PA)
 – programmer controls movement from disk
 – protection?
 – relocation?
• Linear address space larger than physical address space
 – 32, 64 bits vs. 28-bit physical (256MB)
• Automatic management
Virtual Memory
• Process = virtual address space + thread(s) of control
• Translation
 – VA -> PA
 – What physical address does virtual address A map to?
 – Is VA in physical memory?
• Protection (access control)
 – Do you have permission to access it?
Segmented Virtual Memory
• Virtual address (2^32, 2^64) to physical address (2^30) mapping
• Variable size, base + offset, contiguous in both VA and PA
[Figure: variable-size segments mapped contiguously from the virtual to the physical address space]
Paged Virtual Memory
• Virtual address (2^32, 2^64) to physical address (2^28) mapping
 – virtual page maps to physical page frame
 – virtual address = virtual page number + offset
• Fixed-size units for access control & translation
[Figure: fixed-size virtual pages mapped to physical page frames]
Page Table
• Kernel data structure (per process)
• Page Table Entry (PTE)
 – VA -> PA translations (if none, page fault)
 – access rights (Read, Write, Execute, User/Kernel, cached/uncached)
 – reference, dirty bits
• Many designs
 – Linear, Forward mapped, Inverted, Hashed, Clustered
• Design Issues
 – support for aliasing (multiple VAs to a single PA)
 – large virtual address space
 – time to obtain translation
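The translation-plus-rights check a PTE supports can be sketched as follows. This is a minimal sketch: the dict-based table, the 8KB page size, and the helper names (`make_pte`, `translate`, `PageFault`) are illustrative assumptions, not anything from the slides.

```python
PAGE_SHIFT = 13          # assume 8KB pages -> 13-bit page offset

class PageFault(Exception):
    pass

def make_pte(frame, read=True, write=False):
    # A PTE: translation plus access rights plus reference/dirty bits.
    return {"frame": frame, "read": read, "write": write,
            "referenced": False, "dirty": False}

def translate(page_table, va, is_write=False):
    vpn = va >> PAGE_SHIFT
    offset = va & ((1 << PAGE_SHIFT) - 1)
    pte = page_table.get(vpn)
    if pte is None:
        raise PageFault(hex(va))          # no translation -> page fault
    if is_write and not pte["write"]:
        raise PageFault("protection")     # access-rights violation
    pte["referenced"] = True              # bookkeeping for replacement
    if is_write:
        pte["dirty"] = True               # page must be written back
    return (pte["frame"] << PAGE_SHIFT) | offset

pt = {0x2: make_pte(frame=0x7, write=True)}
pa = translate(pt, 0x4abc, is_write=True)   # VPN 0x2, offset 0xabc
```

Note that the walk itself sets the reference and dirty bits here; hardware designs that omit them (like the Alpha PTE on the next slide) must recover that information another way.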
Alpha VM Mapping (Forward Mapped)
• “64-bit” address divided into 3 segments
 – seg0 (bit 63 = 0): user code/heap
 – seg1 (bit 63 = 1, bit 62 = 1): user stack
 – kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
• VA fields: seg0/1 (21 bits) | L1 (10) | L2 (10) | L3 (10) | page offset (13)
• Three-level page table, each level one page
 – base register + L1, L2, L3 indices select the physical page frame number
 – Alpha 21064 uses only 43 unique bits of VA
 – (future min page size up to 64KB => 55 bits of VA)
• PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit)
 – What do you do for replacement?
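The three-level forward-mapped walk can be sketched as below. The field widths (10/10/10/13) follow the slide; the nested dicts standing in for page-sized tables and the base register are illustrative, not the Alpha's actual table format.

```python
PO_BITS, LVL_BITS = 13, 10   # 8KB pages, 10-bit index per level

def walk(root, va):
    """Three-level forward-mapped page table walk.
    root plays the role of the page table base register."""
    l1 = (va >> (PO_BITS + 2 * LVL_BITS)) & 0x3FF
    l2 = (va >> (PO_BITS + LVL_BITS)) & 0x3FF
    l3 = (va >> PO_BITS) & 0x3FF
    table = root
    for idx in (l1, l2, l3):      # one memory reference per level
        entry = table.get(idx)
        if entry is None:
            return None           # invalid entry -> page fault
        table = entry
    # after the last level, `table` is the physical frame number
    return (table << PO_BITS) | (va & ((1 << PO_BITS) - 1))

root = {1: {2: {3: 0x55}}}        # L1=1 -> L2=2 -> L3=3 -> frame 0x55
va = (1 << 33) | (2 << 23) | (3 << 13) | 0x123
pa = walk(root, va)
```

The loop makes the TLB-miss cost visible: one memory reference per level, which is the forward-mapped penalty discussed later under "Address Translation for Large Address Spaces".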
Inverted Page Table (HP, IBM)
• One PTE per page frame
 – only one VA per physical frame
• Must search for the virtual address
 – hash the virtual page number into the Hash Anchor Table (HAT)
 – follow the chain through the Inverted Page Table (IPT) until the VA matches; the entry gives PA and status
• More difficult to support aliasing
• Force all sharing to use the same VA
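The HAT-plus-chain search can be sketched as follows; the table sizes, the modulo hash, and the tuple layout are all illustrative assumptions.

```python
N_FRAMES, N_BUCKETS = 8, 4
hat = [None] * N_BUCKETS     # HAT: hash bucket -> first frame, or None
ipt = [None] * N_FRAMES      # IPT: one entry per frame -> (vpn, next_frame)

def insert(vpn, frame):
    b = vpn % N_BUCKETS
    ipt[frame] = (vpn, hat[b])          # chain onto the bucket head
    hat[b] = frame

def lookup(vpn):
    frame = hat[vpn % N_BUCKETS]
    while frame is not None:            # search the collision chain
        entry_vpn, nxt = ipt[frame]
        if entry_vpn == vpn:
            return frame                # the IPT index IS the frame number
        frame = nxt
    return None                         # not mapped -> page fault

insert(5, 2)
insert(9, 6)    # 5 and 9 collide: both hash to bucket 1
```

Because each IPT slot holds exactly one VPN, two virtual addresses cannot share a frame here, which is why aliasing is hard for this design.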
Intel Pentium Segmentation + Paging
• Logical address = segment selector + offset
• The selector indexes the Global Descriptor Table (GDT); the segment descriptor supplies the segment base address
• Segment base + offset = linear address (Dir | Table | Offset)
• The linear address is translated through the page directory and page table to a physical address
The Memory Management Unit (MMU)
• Input
 – virtual address
• Output
 – physical address
 – access violation (exception, interrupts the processor)
• Access Violations
 – not present
 – user vs. kernel
 – write
 – read
 – execute
Translation Lookaside Buffers (TLB)
• Need to perform address translation on every memory reference
 – 30% of instructions are memory references
 – 4-way superscalar processor
 – at least one memory reference per cycle
• Make the common case fast, the others correct
• Throw HW at the problem
• Cache PTEs
Fast Translation: Translation Buffer
• Cache of translated addresses
• Alpha 21164 TLB: 48-entry fully associative
[Figure: (1) virtual page number and page offset presented; (2) page number compared against the tags of all 48 entries (v/r/w bits checked); (3) 48:1 mux selects the matching entry's physical frame; (4) frame concatenated with the page offset]
TLB Design
• Must be fast, must not increase the critical path
• Must achieve a high hit ratio
• Generally small and highly associative
• Mapping change
 – page removed from physical memory
 – processor must invalidate the TLB entry
• PTE is a per-process entity
 – multiple processes with the same virtual addresses
 – Context switches?
   » Flush TLB
   » Add ASID (PID) – part of processor state, must be set on a context switch
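The ASID alternative to flushing can be sketched as a TLB tagged by (ASID, VPN), so entries from different processes coexist and a context switch only changes the current ASID. The class shape, eviction policy, and sizes are illustrative assumptions.

```python
class TLB:
    """Fully associative TLB whose entries are tagged with an
    address-space ID (ASID) as well as the virtual page number."""

    def __init__(self, n_entries=48):
        self.n = n_entries
        self.entries = {}                    # (asid, vpn) -> frame

    def insert(self, asid, vpn, frame):
        if len(self.entries) >= self.n:      # evict arbitrarily here;
            self.entries.pop(next(iter(self.entries)))  # real HW uses LRU-ish
        self.entries[(asid, vpn)] = frame

    def lookup(self, asid, vpn):
        return self.entries.get((asid, vpn))  # None -> TLB miss

tlb = TLB()
tlb.insert(asid=1, vpn=0x10, frame=0x3)
tlb.insert(asid=2, vpn=0x10, frame=0x9)   # same VPN, different process
```

A lookup with the wrong ASID misses rather than returning another process's translation, which is exactly the protection the flush otherwise provides.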
Managing a Cache/VM
• How do we detect a miss/page fault?
• What happens on a miss/page fault?
• Processor caches are a cache over main memory
 – hardware managed
• Virtual memory can be a cache over the file system + anonymous memory (e.g., stack, heap)
 – software managed
• How do we manage the TLB?
• What happens on a TLB miss? How does this relate to a page fault?
Hardware Managed TLBs
• Hardware handles the TLB miss
• Dictates page table organization
• Complicated state machine to “walk the page table”
 – multiple levels for forward mapped
 – linked list for inverted
• Exception only on access violation
[Figure: CPU — TLB — control logic — memory]
Software Managed TLBs
• Software handles the TLB miss
• Flexible page table organization
• Simple hardware to detect hit or miss
• Exception on TLB miss or access violation
• Should you check for access violation on a TLB miss?
[Figure: CPU — TLB — control logic — memory]
Mapping the Kernel
• Digital Unix kseg
 – kseg (bit 63 = 1, bit 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for the entire kernel
• Lock (pin) the TLB entry
 – or special HW detection
[Figure: address space from 0 to 2^64−1: user code/data, kernel, user stack; kseg maps physical memory into the kernel's region]
Considerations for Address Translation
• Page size
• Large virtual address space
 – can map more things: files, frame buffers, network interfaces, memory from another workstation
 – sparse use of address space
• Page table design
 – space
 – less locality => TLB misses
• OS structure
 – microkernel => more TLB misses
A Case for Large Pages
• Page table size is inversely proportional to the page size
 – memory saved
• Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
• The number of TLB entries is restricted by clock cycle time
 – a larger page size maps more memory
 – reduces TLB misses
• Fast cache hit time is easy when cache size <= page size (VA caches)
 – a bigger page makes this feasible as cache size grows
 – More on this later today…
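The first bullet can be made concrete with a little arithmetic for a linear page table over a 32-bit address space; the 4-byte PTE size is an illustrative assumption.

```python
def linear_pt_size(va_bits, page_size, pte_bytes=4):
    """Bytes needed for a linear page table: one PTE per virtual page."""
    n_pages = 2 ** va_bits // page_size
    return n_pages * pte_bytes

small = linear_pt_size(32, 4 * 1024)    # 4KB pages
large = linear_pt_size(32, 64 * 1024)   # 64KB pages
```

With 4KB pages the table is 4MB; growing pages 16x to 64KB shrinks it 16x, to 256KB, showing the inverse proportionality directly.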
A Case for Small Pages
• Fragmentation
 – large pages can waste storage
 – data must be contiguous within a page
• Quicker process start for small processes (??)
Superpages
• Hybrid solution: multiple page sizes
 – 8KB, 16KB, 32KB, 64KB pages
 – 4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
• Need to identify candidate superpages
 – kernel
 – frame buffers
 – database buffer pools
• Application/compiler hints
• Detecting superpages
 – static, at page fault time
 – dynamically create superpages
• Page table & TLB modifications
Address Translation for Large Address Spaces
• Forward-mapped page table
 – grows with virtual address space
   » worst case 100% overhead, not likely
 – TLB miss time: one memory reference per level
• Inverted page table
 – grows with physical address space
   » independent of virtual address space usage
 – TLB miss time: memory reference to HAT, then IPT list search
Hashed Page Table (HP)
• Combine hash table and IPT [Huck96]
 – can have more entries than physical page frames
• Must search for the virtual address
 – hash the virtual page number, then search the Hashed Page Table (HPT) chain for a matching VA; the entry gives PA and status
• Easier to support aliasing than IPT
• Space
 – grows with physical space
• TLB miss
 – one less memory reference than IPT
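The hashed-table search can be sketched as below. Unlike the inverted table, each entry carries the full VA tag and an explicit frame number, so the table can have more entries than frames and two VPNs can map to one frame. Bucket count, hash, and entry layout are illustrative assumptions.

```python
N_BUCKETS = 8
hpt = [[] for _ in range(N_BUCKETS)]   # bucket -> chain of (vpn, frame, status)

def hpt_insert(vpn, frame, status="rw"):
    hpt[vpn % N_BUCKETS].append((vpn, frame, status))

def hpt_lookup(vpn):
    # One memory reference reaches the chain head; then search the chain.
    for entry_vpn, frame, status in hpt[vpn % N_BUCKETS]:
        if entry_vpn == vpn:
            return frame, status
    return None                        # not mapped -> page fault

hpt_insert(3, 7)
hpt_insert(11, 5, "ro")    # collides with VPN 3 (both hash to bucket 3)
```

Because the frame number is stored in the entry rather than implied by the entry's index, aliasing (two VPNs sharing frame 7, say) needs no special handling here.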
Clustered Page Table (SUN)
• Combine benefits of HPT and linear page tables [Talluri95]
• Virtual address split: virtual page block number (VPBN) | block offset (Boff) | page offset
• Store one base VPN (the tag) and several PPN values per entry
 – each entry holds a VPBN tag, a next pointer, and PA0–PA3 with attributes
• Lookup: hash the VPBN and search the chain for a matching tag; the block offset selects among the entry's PPNs
Reducing TLB Miss Handling Time
• Problem
 – must walk the page table on a TLB miss
 – usually incurs cache misses
 – big problem for IPC in microkernels
• Solution
 – build a small second-level cache in SW
 – on a TLB miss, first check the SW cache
   » use a simple shift-and-mask index into a hash table
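The software second-level cache can be sketched as a direct-mapped table indexed by masking the VPN, consulted before the full walk; the table size and names are illustrative assumptions.

```python
L2_SIZE = 256                          # power of two -> index is a cheap mask
l2_cache = [None] * L2_SIZE            # slot -> (vpn, frame)

def l2_lookup(vpn, walk_page_table):
    slot = vpn & (L2_SIZE - 1)         # shift-and-mask index
    entry = l2_cache[slot]
    if entry is not None and entry[0] == vpn:
        return entry[1]                # hit: no page table walk needed
    frame = walk_page_table(vpn)       # slow path: full walk
    l2_cache[slot] = (vpn, frame)      # fill for the next miss
    return frame

walks = []
def walker(vpn):                       # stand-in for the real walk
    walks.append(vpn)
    return vpn + 100

a = l2_lookup(7, walker)               # miss: walks the table
b = l2_lookup(7, walker)               # hit: served from the SW cache
```

The second lookup never reaches the walker, which is the whole point: the common repeat miss costs one predictable memory reference instead of a walk.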
Review: Address Translation
• Map from virtual address to physical address
• Page tables, PTEs
 – va->pa, attributes
 – forward mapped, inverted, hashed, clustered
• Translation lookaside buffer
 – hardware cache of the most recent va->pa translations
 – misses handled in hardware or software
• Implications of a larger address space
 – page table size
 – possibly more TLB misses
• OS structure
 – microkernels -> lots of IPC -> more TLB misses
Cache Memory 102
• Where can block 7 of main memory be placed in a 4-frame cache?
 – fully associative: anywhere
 – direct mapped: frame 7 mod 4 = 3
 – 2-way set associative: set 7 mod 2 = 1
• S.A. mapping = block number modulo number of sets
 – DM = 1-way set associative; FA = one set
• Cache frame
 – location in cache
• Bit selection
Cache Indexing
• Tag on each block
 – no need to check index or block offset
• Increasing associativity shrinks the index, expands the tag
• Address split: block address (Tag | Index) + Block offset
 – fully associative: no index
 – direct mapped: large index
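Bit-selection indexing for the address split above can be sketched as follows; the 32-byte blocks and 128 sets are illustrative parameters, not from the slides.

```python
def split_address(addr, block_bytes=32, n_sets=128):
    """Split an address into (tag, index, block offset) fields."""
    offset_bits = block_bytes.bit_length() - 1   # log2 of the block size
    index_bits = n_sets.bit_length() - 1         # log2 of the set count
    offset = addr & (block_bytes - 1)
    index = (addr >> offset_bits) & (n_sets - 1)
    tag = addr >> (offset_bits + index_bits)
    return tag, index, offset
```

Doubling associativity halves `n_sets`, removing one index bit and adding one tag bit, which is the shrink/expand trade-off the slide states.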
Address Translation and Caches
• Where is the TLB with respect to the cache?
• What are the consequences?
• Most of today's systems have more than 1 cache
 – Digital 21164 has 3 levels
 – 2 levels on chip (8KB data, 8KB inst, 96KB unified)
 – one level off chip (2-4MB)
• Does the OS need to worry about this?
Definition: page coloring = careful selection of the va->pa mapping
TLBs and Caches
Three organizations:
• Conventional organization: CPU -> TLB -> physically addressed cache -> memory
• Virtually addressed cache: CPU -> cache (VA tags) -> TLB -> memory; translate only on a miss; alias (synonym) problem
• Overlapped access: cache access proceeds in parallel with TLB translation (then an L2 cache); requires the cache index to remain invariant across translation
Virtual Caches
• Send the virtual address to the cache. Called a virtually addressed cache or just virtual cache, vs. a physical cache or real cache
• Avoid address translation before accessing the cache
 – faster hit time
• Context switches?
 – just like the TLB (flush or PID)
 – cost is time to flush + “compulsory” misses from an empty cache
 – add a process identifier tag that identifies the process as well as the address within the process: can't get a hit if the wrong process
• I/O must interact with the cache
I/O and Virtual Caches
I/O is accomplished with physical addresses, so with a virtual cache the system must:
• flush pages from the cache, or
• provide pa->va reverse translation, or
• support coherent DMA
[Figure: processor with virtual cache; physical cache; memory bus; I/O bridge to an I/O bus with disk, graphics, and network controllers doing DMA to main memory]
Aliases and Virtual Caches
• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
 – e.g., user and kernel mappings of the same physical page
• But, but... the virtual address is used to index the cache
• Could have the same data in two different locations in the cache
Index with Physical Portion of Address
• If the index lies in the physical part of the address (the page offset), the tag access can start in parallel with translation, and the physical tag is compared once translation completes
• Address split: page address | page offset, with address tag | index | block offset; index and block offset fit within the page offset
• Limits the cache to the page size: what if we want bigger caches and the same trick?
 – higher associativity
 – page coloring
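The "limits cache to page size" constraint can be checked with a one-line rule: the index and block-offset bits span one way of the cache, so cache size / associativity must fit in a page. The function name and the example sizes are illustrative.

```python
def can_index_before_translation(cache_size, assoc, page_size):
    """True if the cache can be indexed with untranslated (page offset)
    bits: index + block offset address one way of the cache."""
    bytes_per_way = cache_size // assoc
    return bytes_per_way <= page_size

ok = can_index_before_translation(8 * 1024, assoc=1, page_size=8 * 1024)
too_big = can_index_before_translation(32 * 1024, assoc=1, page_size=8 * 1024)
fixed = can_index_before_translation(32 * 1024, assoc=4, page_size=8 * 1024)
```

The third call shows the slide's first escape hatch: growing a 32KB direct-mapped cache to 4-way associativity brings one way back down to the 8KB page size.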
Page Coloring for Aliases
• HW guarantee: every cache frame holds a unique physical address
• OS guarantee: the lower n bits of the virtual & physical page numbers must have the same value; if direct mapped, aliases then map to the same cache frame
 – one form of page coloring
• With coloring, the index can extend above the page offset into the low bits of the page address
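The OS guarantee above can be sketched as a check on the low "color" bits of the virtual and physical page numbers; if every mapping satisfies it, all aliases of a physical page index to the same cache frame. The 2-bit color count is an illustrative assumption (it corresponds to a direct-mapped cache four pages in size).

```python
COLOR_BITS = 2   # n, where cache_size / (page_size * assoc) = 2**n

def same_color(vpn, pfn, n=COLOR_BITS):
    """Page-coloring constraint: low n bits of VPN and PFN must match."""
    mask = (1 << n) - 1
    return (vpn & mask) == (pfn & mask)
```

A page-frame allocator enforcing this simply keeps one free list per color and satisfies each request from the list matching the VPN's low bits.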