Operating Systems & Memory Systems: Address Translation & Caches

CPS 220 Professor Alvin R. Lebeck Fall 2001

Outline

• Review
• TLBs
• Page Table Designs
• Interaction of VM and Caches

Admin
• HW #4 due today
• Project status report due Thursday

© Alvin R. Lebeck 2001 CPS 220 2

Motivation

• Process = Address Space + thread(s) of control
• Address space
  – programmer controls movement from disk
  – protection?
  – relocation?
• Linear address space
  – larger than physical address space
    » 32, 64 bits vs. 28-bit physical (256MB)
• Automatic management


Virtual Memory

• Process = virtual address space + thread(s) of control • Translation – VA -> PA – What physical address does virtual address A map to – Is VA in physical memory? • Protection (access control) – Do you have permission to access it?


Segmented Virtual Memory

• Virtual address (2^32, 2^64) to Physical Address mapping (2^30)
• Variable size, base + offset, contiguous in both VA and PA

[Figure: example mapping of variable-size segments at virtual addresses 0x0000, 0x1000, 0x2000 to physical addresses 0x1000, 0x6000, 0x9000, 0x11000]
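The base + offset rule above can be sketched in a few lines of Python. The segment table below is a hypothetical stand-in (loosely based on the figure's addresses), not any real machine's format: each segment maps to a physical base and a length, and an out-of-bounds offset faults.

```python
# Sketch of segmented address translation with a hypothetical segment table.
# A virtual address is (segment number, offset); translation is base + offset
# after a bounds check against the segment length.

class SegmentFault(Exception):
    pass

# segment number -> (physical base, length); made-up values
SEG_TABLE = {0: (0x1000, 0x1000), 1: (0x6000, 0x1000), 2: (0x9000, 0x1000)}

def translate_seg(segment, offset):
    if segment not in SEG_TABLE or offset >= SEG_TABLE[segment][1]:
        raise SegmentFault((segment, offset))
    base, _length = SEG_TABLE[segment]
    return base + offset
```

Note that each segment is contiguous in both the virtual and physical address spaces, which is exactly the property the translation `base + offset` depends on.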


Paged Virtual Memory

• Virtual address (2^32, 2^64) to Physical Address mapping (2^28)
  – virtual page to physical page frame
  – address split into virtual page number and offset
• Fixed size units for access control & translation

[Figure: example mapping of fixed-size virtual pages at 0x0000, 0x1000, 0x2000 to physical page frames at 0x1000, 0x6000, 0x9000, 0x11000]


Page Tables

• Kernel data structure (per process)
• Page Table Entry (PTE)
  – VA -> PA translations (if none, page fault)
  – access rights (Read, Write, Execute, User/Kernel, cached/uncached)
  – reference, dirty bits
• Many designs
  – Linear, Forward mapped, Inverted, Hashed, Clustered
• Design Issues
  – support for aliasing (multiple VAs to a single PA)
  – large virtual address space
  – time to obtain translation
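A PTE is typically a single machine word packing the frame number together with the rights and status bits listed above. The bit layout below is a hypothetical illustration (not any real architecture's encoding), assuming 4KB pages so the frame number starts at bit 12:

```python
# Sketch of a PTE packed into an integer: low bits for access rights and
# status, high bits for the physical frame number. Layout is hypothetical.

PTE_VALID = 1 << 0   # translation exists
PTE_READ  = 1 << 1
PTE_WRITE = 1 << 2
PTE_EXEC  = 1 << 3
PTE_USER  = 1 << 4   # user-mode access allowed (else kernel only)
PTE_REF   = 1 << 5   # referenced bit
PTE_DIRTY = 1 << 6   # dirty bit
FRAME_SHIFT = 12     # assumed 4KB pages: frame number starts at bit 12

def make_pte(frame, flags):
    return (frame << FRAME_SHIFT) | flags

def pte_frame(pte):
    return pte >> FRAME_SHIFT

pte = make_pte(0x1234, PTE_VALID | PTE_READ | PTE_WRITE)
```

The kernel tests individual flag bits with a mask, e.g. `pte & PTE_WRITE`, before permitting a store.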


Alpha VM Mapping (Forward Mapped)

• “64-bit” address divided into 3 segments
  – seg0 (bit 63 = 0): user code/heap
  – seg1 (bit 63 = 1, bit 62 = 1): user stack
  – kseg (bit 63 = 1, bit 62 = 0): kernel segment for OS
• Three-level page table, each level one page
  – Alpha 21064: only 43 unique bits of VA
  – (future min page size up to 64KB => 55 bits of VA)
• PTE bits: valid, kernel & user read & write enable (no reference, use, or dirty bit)
  – What do you do for replacement?

[Figure: VA fields seg0/1 (21 bits), L1, L2, L3 (10 bits each), page offset (13 bits); each level's index is added to a base, the last level yielding the physical page frame number]
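The forward-mapped walk above can be sketched directly from the field widths on the slide: 10-bit L1/L2/L3 indices and a 13-bit offset (8KB pages). The page tables here are plain Python dicts standing in for in-memory tables; this is a model of the mechanism, not Alpha's actual table format.

```python
# Sketch of a three-level forward-mapped page table walk using the
# slide's field widths: 10 bits per level index, 13-bit page offset.

OFFSET_BITS = 13
INDEX_BITS = 10
MASK = (1 << INDEX_BITS) - 1

def split_va(va):
    offset = va & ((1 << OFFSET_BITS) - 1)
    l3 = (va >> OFFSET_BITS) & MASK
    l2 = (va >> (OFFSET_BITS + INDEX_BITS)) & MASK
    l1 = (va >> (OFFSET_BITS + 2 * INDEX_BITS)) & MASK
    return l1, l2, l3, offset

def walk(root, va):
    """Return the physical address, or None on a page fault."""
    l1, l2, l3, offset = split_va(va)
    l2_table = root.get(l1)
    l3_table = l2_table.get(l2) if l2_table else None
    frame = l3_table.get(l3) if l3_table else None
    if frame is None:
        return None                     # missing entry at some level
    return (frame << OFFSET_BITS) | offset
```

Note the cost that motivates the TLB: every translation touches one table per level, i.e. three memory references here before the data access itself.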


Inverted Page Table (HP, IBM)

• One PTE per page frame
  – only one VA per physical frame
• Must search for the virtual address
• More difficult to support aliasing
  – force all sharing to use the same VA

[Figure: virtual page number hashed into a Hash Anchor Table (HAT), which points into the Inverted Page Table (IPT); each IPT entry holds VA -> PA, status]
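The HAT-plus-chain search can be sketched as follows. The table sizes are made up for illustration; each IPT entry is indexed by physical frame number and stores the VPN it holds plus a link to the next frame in the same hash bucket.

```python
# Sketch of an inverted page table lookup: hash the VPN into the hash
# anchor table (HAT), then follow the chain of IPT entries (one entry
# per physical frame) comparing stored VPNs. Sizes are hypothetical.

NUM_FRAMES = 8
HAT_SIZE = 4

ipt = [(None, None)] * NUM_FRAMES   # per-frame: (vpn, next frame index)
hat = [None] * HAT_SIZE             # bucket -> first frame index in chain

def insert(vpn, frame):
    h = vpn % HAT_SIZE
    ipt[frame] = (vpn, hat[h])      # link new entry onto the bucket's chain
    hat[h] = frame

def lookup(vpn):
    """Return the physical frame holding vpn, or None (page fault)."""
    idx = hat[vpn % HAT_SIZE]
    while idx is not None:
        entry_vpn, nxt = ipt[idx]
        if entry_vpn == vpn:
            return idx
        idx = nxt
    return None
```

Because the table is keyed by physical frame, it cannot hold two VPNs for one frame, which is why aliasing forces all sharers onto the same VA.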


Intel Pentium Segmentation + Paging

[Figure: logical address (segment selector, offset) looked up in the Global Descriptor Table (GDT) to obtain a segment descriptor and segment base address, producing a linear address (dir, table, offset), which the page directory and page table translate to a physical address]


The Memory Management Unit (MMU)

• Input
  – virtual address
• Output
  – physical address
  – access violation (exception, interrupts the processor)
• Access violations
  – not present
  – user vs. kernel
  – write
  – read
  – execute


Translation Lookaside Buffers (TLB)

• Need to perform address translation on every memory reference
  – ~30% of instructions are memory references
  – 4-way issue => at least one memory reference per cycle
• Make the common case fast, others correct
• Throw HW at the problem
• Cache PTEs


Fast Translation: Translation Buffer

• Cache of translated addresses
• TLB: 48-entry, fully associative

[Figure: virtual address = page number + page offset; the page number is compared against all 48 entries (each with valid, read, write bits and a tag) through a 48:1 mux that selects the physical frame]
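The fully associative lookup can be modeled as below. In hardware all 48 tag comparisons happen in parallel; the sequential loop here is just a software model of that, with a simple round-robin fill policy standing in for the real replacement logic. Sizes and the 13-bit offset are assumptions carried over from the Alpha example.

```python
# Software model of a fully associative TLB: compare the VPN against
# every valid entry's tag; on a hit, splice the frame onto the offset.

OFFSET_BITS = 13                       # assumed 8KB pages

class TLB:
    def __init__(self, size=48):
        self.entries = [None] * size   # each entry: (vpn, frame)
        self.next = 0

    def insert(self, vpn, frame):
        self.entries[self.next] = (vpn, frame)   # round-robin fill
        self.next = (self.next + 1) % len(self.entries)

    def translate(self, va):
        """Return the physical address on a hit, None on a TLB miss."""
        vpn = va >> OFFSET_BITS
        for entry in self.entries:     # parallel comparators in hardware
            if entry is not None and entry[0] == vpn:
                return (entry[1] << OFFSET_BITS) | (va & ((1 << OFFSET_BITS) - 1))
        return None
```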


TLB Design

• Must be fast, must not increase the critical path
• Must achieve a high hit ratio
• Generally small, highly associative
• Mapping change
  – page removed from physical memory
  – processor must invalidate the TLB entry
• PTE is a per-process entity
  – multiple processes with the same virtual addresses
  – Context switches?
    » flush TLB
    » add ASID (PID): part of processor state, must be set on context switch


Managing a Cache/VM

• How do we detect a miss/page fault?
• What happens on a miss/page fault?
• Processor caches are a cache over main memory
  – hardware managed
• Virtual memory can be a cache over the file system + anonymous memory (e.g., stack, heap)
  – software managed
• How do we manage the TLB?
• What happens on a TLB miss? How does this relate to a page fault?


Hardware Managed TLBs

• Hardware handles TLB miss
• Dictates page table organization
• Complicated state machine to “walk the page table”
  – multiple levels for forward mapped
  – linked list for inverted
• Exception only if access violation

[Figure: CPU -> TLB; on a miss, TLB control logic walks the page table in memory]


Software Managed TLBs

• Software handles TLB miss
• Flexible page table organization
• Simple hardware to detect hit or miss
• Exception if TLB miss or access violation
• Should you check for access violation on TLB miss?

[Figure: CPU -> TLB; on a miss, control traps to software, which accesses the page table in memory]
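The software path can be sketched as follows: hardware detects the miss and traps, the OS handler walks the page table (a single-level dict here, purely for illustration), refills the TLB, and the access retries. All names and structures are simplified stand-ins, not any real OS interface.

```python
# Sketch of software TLB miss handling: trap on miss, walk the table,
# refill the TLB, retry. A missing translation becomes a page fault.

PAGE_BITS = 13
page_table = {0x3: 0x20, 0x4: 0x21}     # vpn -> frame (hypothetical)
tlb = {}                                # tiny software model of the TLB

class PageFault(Exception):
    pass

def tlb_miss_handler(vpn):
    frame = page_table.get(vpn)
    if frame is None:
        raise PageFault(vpn)            # no translation: real page fault
    tlb[vpn] = frame                    # refill the TLB

def access(va):
    vpn = va >> PAGE_BITS
    if vpn not in tlb:                  # hardware detects the miss, traps
        tlb_miss_handler(vpn)
    return (tlb[vpn] << PAGE_BITS) | (va & ((1 << PAGE_BITS) - 1))
```

This separation answers the slide's question about where a page fault fits: the TLB miss is the common fast case, and only a failed table walk escalates into a fault.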


Mapping the Kernel

• Digital Unix kseg
  – kseg (bit 63 = 1, bit 62 = 0)
• Kernel has direct access to physical memory
• One VA->PA mapping for the entire kernel
• Lock (pin) the TLB entry
  – or special HW detection

[Figure: virtual address space from 0 to 2^64-1 with user code/data, user stack, and kernel; kseg maps all of physical memory]


Considerations for Address Translation

• Page size
• Large virtual address space
  – can map more things: files, frame buffers, network interfaces, memory from another
  – sparse use of address space
• Page Table Design
  – space
  – less locality => TLB misses
• OS structure
  – microkernel => more TLB misses


A Case for Large Pages

• Page table size is inversely proportional to the page size
  – memory saved
• Transferring larger pages to or from secondary storage, possibly over a network, is more efficient
• Number of TLB entries is restricted by clock cycle time
  – larger page size maps more memory
  – reduces TLB misses
• Fast cache hit time is easy when cache <= page size (VA caches)
  – bigger page makes it feasible as cache size grows
  – More on this later today…
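The inverse relationship in the first bullet is worth working out with concrete numbers, here for a 32-bit address space with 4-byte PTEs (assumed sizes, chosen only to make the arithmetic clean):

```python
# Linear page table size for a 32-bit VA with 4-byte PTEs: the table
# needs one PTE per virtual page, so bytes = (2^32 / page size) * 4.

def linear_table_bytes(va_bits, page_bytes, pte_bytes=4):
    num_pages = 2 ** va_bits // page_bytes
    return num_pages * pte_bytes

small = linear_table_bytes(32, 4 * 1024)    # 4KB pages  -> 4MB table
large = linear_table_bytes(32, 64 * 1024)   # 64KB pages -> 256KB table
```

Going from 4KB to 64KB pages (a 16x increase) shrinks the per-process table by the same factor of 16, from 4MB to 256KB.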


A Case for Small Pages

• Fragmentation
  – large pages can waste storage
  – data must be contiguous within a page
• Quicker process start for small processes (??)


Superpages

• Hybrid solution: multiple page sizes
  – 8KB, 16KB, 32KB, 64KB pages
  – 4KB, 64KB, 256KB, 1MB, 4MB, 16MB pages
• Need to identify candidate superpages
  – kernel
  – frame buffers
  – database buffer pools
• Application/compiler hints
• Detecting superpages
  – static, at page fault time
  – dynamically create superpages
• Page Table & TLB modifications


Address Translation for Large Address Spaces

• Forward Mapped Page Table
  – grows with virtual address space
    » worst case 100% overhead not likely
  – TLB miss time: one memory reference for each level
• Inverted Page Table
  – grows with physical address space
    » independent of virtual address space usage
  – TLB miss time: memory reference to HAT, to IPT, plus list search


Hashed Page Table (HP)

• Combine Hash Table and IPT [Huck96]
  – can have more entries than physical page frames
• Must search for the virtual address
• Easier to support aliasing than IPT
• Space
  – grows with physical space
• TLB miss
  – one less memory reference than IPT

[Figure: virtual page number hashed directly into the Hashed Page Table (HPT); each entry holds VA -> PA, status]


Clustered Page Table (SUN)

• Combine benefits of HPT and Linear [Talluri95]
• Store one base VPN (tag) and several PPN values
  – virtual page block number (VPBN)
  – block offset

[Figure: virtual address = VPBN + block offset + page offset; the VPBN is hashed into chained table entries, each holding a VPBN tag, a next pointer, and several (PA, attributes) pairs (PA0..PA3)]


Reducing TLB Miss Handling Time

• Problem
  – must walk the page table on a TLB miss
  – walks usually incur cache misses
  – big problem for IPC in microkernels
• Solution
  – build a small second-level cache in SW
  – on TLB miss, first check the SW cache
    » use a simple shift-and-mask index into a hash table


Review: Address Translation

• Map from virtual address to physical address
• Page Tables, PTEs
  – va->pa, attributes
  – forward mapped, inverted, hashed, clustered
• Translation Lookaside Buffer
  – hardware cache of most recent va->pa translations
  – misses handled in hardware or software
• Implications of a larger address space
  – page table size
  – possibly more TLB misses
• OS structure
  – microkernels -> lots of IPC -> more TLB misses


Cache Memory 102

• Block 7 placed in a 4-block cache:
  – fully associative: any frame
  – direct mapped: frame 7 mod 4
  – 2-way set associative: set 7 mod 2
  – S.A. mapping = block number modulo number of sets
  – DM = 1-way set associative
• Cache frame
  – location in cache
• Bit selection

[Figure: block 7 of main memory placed in a 4-block cache under fully associative, direct-mapped (7 mod 4), and 2-way set-associative (7 mod 2) placement]
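The placement rule on this slide reduces to one line of arithmetic, since associativity determines the number of sets:

```python
# Block placement: set index = block number mod number of sets, where
# number of sets = frames / associativity.

def set_index(block, num_frames, assoc):
    num_sets = num_frames // assoc
    return block % num_sets

dm_set = set_index(7, 4, 1)   # direct mapped: 4 sets, block 7 -> set 3
sa_set = set_index(7, 4, 2)   # 2-way: 2 sets, block 7 -> set 1
fa_set = set_index(7, 4, 4)   # fully associative: 1 set, always set 0
```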


Cache Indexing

• Tag on each block
  – no need to check index or block offset
• Increasing associativity shrinks index, expands tag

[Figure: address = tag | index | block offset; tag + index form the block address. Fully associative: no index. Direct mapped: large index.]
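The tag/index/offset decomposition is easiest to see with concrete numbers. The configuration below is hypothetical: a 32KB direct-mapped cache with 64-byte blocks, giving a 6-bit block offset and a 9-bit index (512 sets).

```python
# Splitting an address into (tag, index, block offset) for an assumed
# 32KB direct-mapped cache with 64-byte blocks.

BLOCK_BYTES = 64
NUM_SETS = 32 * 1024 // BLOCK_BYTES           # 512 sets (direct mapped)
OFFSET_BITS = BLOCK_BYTES.bit_length() - 1    # 6 bits of block offset
INDEX_BITS = NUM_SETS.bit_length() - 1        # 9 bits of index

def split(addr):
    offset = addr & (BLOCK_BYTES - 1)
    index = (addr >> OFFSET_BITS) & (NUM_SETS - 1)
    tag = addr >> (OFFSET_BITS + INDEX_BITS)
    return tag, index, offset
```

Making the same cache 2-way associative would halve `NUM_SETS`, shrinking the index by one bit and growing the tag by one bit, which is exactly the slide's point about associativity.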


Address Translation and Caches

• Where is the TLB wrt the cache? • What are the consequences?

• Most of today’s systems have more than one cache
  – Digital 21164 has 3 levels
    » 2 levels on chip (8KB data, 8KB inst, 96KB unified)
    » one level off chip (2-4MB)
• Does the OS need to worry about this?

Definition: page coloring = careful selection of va->pa mapping


TLBs and Caches

[Figure: three organizations. (1) Conventional: CPU -> TLB -> physically addressed $ -> memory. (2) Virtually Addressed Cache: CPU -> VA $, translate only on miss; alias (synonym) problem. (3) Overlap $ access with VA translation: TLB and $ accessed in parallel, L2 $ checked with PA tags; requires the $ index to remain invariant across translation.]


Virtual Caches

• Send virtual address to cache: called a Virtually Addressed Cache or just Virtual Cache, vs. a Physical Cache or Real Cache
• Avoid address translation before accessing the cache
  – faster hit time to cache
• Context switches?
  – just like the TLB (flush or PID)
  – cost is time to flush + “compulsory” misses from an empty cache
  – add a process identifier tag that identifies the process as well as the address within the process: can’t get a hit if wrong process
• I/O must interact with the cache


I/O and Virtual Caches

I/O is accomplished with physical addresses
• flush pages from cache
• need pa->va reverse translation
• coherent DMA

[Figure: processor with virtual cache and physical cache; an I/O bridge connects main memory to the I/O bus, where DMA devices (disk, graphics, network interface) use physical addresses]


Aliases and Virtual Caches

• Aliases (sometimes called synonyms): two different virtual addresses map to the same physical address
• But, but... the virtual address is used to index the cache
• Could have data in two different locations in the cache

[Figure: virtual address space from 0 to 2^64-1 (user code/data, user stack, kernel) mapped onto physical memory]


Index with Physical Portion of Address

• If the index is contained in the physical part of the address (the page offset), the cache access can start in parallel with translation, and the stored tag is then compared against the physical tag

[Figure: page address | page offset aligned over address tag | index | block offset; the index and block offset fall entirely within the page offset]

• Limits cache to page size: what if we want bigger caches and use the same trick?
  – higher associativity
  – page coloring
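The limit follows from a counting argument: index bits + block offset bits must fit inside the page offset, so cache size <= page size x associativity. A quick check with an assumed 4KB page:

```python
# Virtually-indexed, physically-tagged constraint: the index must come
# from the page offset, so cache size is capped at page size * ways.

def max_vipt_cache_bytes(page_bytes, assoc):
    return page_bytes * assoc

dm_limit = max_vipt_cache_bytes(4 * 1024, 1)   # direct mapped: 4KB max
w8_limit = max_vipt_cache_bytes(4 * 1024, 8)   # 8-way: 32KB max
```

This is why the slide offers higher associativity as one escape hatch: each extra way buys another page-size worth of cache without the index growing past the page offset.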


Page Coloring for Aliases

• HW that guarantees that every cache frame holds a unique physical address
• OS guarantee: lower n bits of virtual & physical page numbers must have the same value; if direct mapped, then aliases map to the same cache frame
  – one form of page coloring

[Figure: page address | page offset aligned over address tag | index | block offset; here part of the index extends beyond the page offset into the page address bits]
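The OS-side rule can be sketched as a color check. The parameters below are hypothetical: with 4KB pages and a 16KB direct-mapped cache, 2 page-number bits fall inside the cache index, so n = 2.

```python
# Sketch of the page-coloring rule: with n color bits (page-number bits
# that fall inside the cache index), a va->pa mapping is allowed only if
# the virtual and physical page numbers agree in those bits.

PAGE_BITS = 12
COLOR_BITS = 2          # e.g., 16KB direct-mapped cache over 4KB pages

def color(addr):
    return (addr >> PAGE_BITS) & ((1 << COLOR_BITS) - 1)

def mapping_ok(va, pa):
    """OS check: same color means all aliases index the same cache frames."""
    return color(va) == color(pa)
```

When every mapping of a physical page shares one color, any two aliases index the same cache frame in a direct-mapped cache, so the duplicate-data problem from the previous slide cannot arise.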

