
Memory Management Unit (or TLB)

Virtual Memory

The position and function of the MMU

Paging
• Virtual address space
  – Divided into equal-sized pages
  – A mapping is a translation between
    • a page and a frame
    • a page and null
  – Mappings are defined at runtime
    • They can change
  – Address space can have holes
  – Does not have to be contiguous in physical memory
• Physical memory
  – Divided into equal-sized frames
(Diagram: a 16-page virtual address space mapped onto physical memory frames.)

Typical Address Space Layout
• Stack region is at top, and can grow down
• Heap has free space to grow up
• Text is typically read-only
• Kernel is in a reserved, protected, shared region
• 0-th page typically not used, why?
(Diagram: virtual address space layout with the kernel at the top, then stack, shared libraries, BSS/heap, and text (code).)

• Programmer’s perspective: pages are logically present
• System’s perspective: some pages are not currently mapped; their data is on disk
• A process may be only partially resident
  – Allows OS to store individual pages on disk
  – Saves memory for infrequently used data & code
• What happens if we access non-resident memory?
(Diagram: Proc 1 and Proc 2 address spaces, with some pages mapped to frames in the physical address space and others stored on disk.)

Page Faults
• Referencing an invalid page triggers a page fault
  – An exception handled by the OS
• Broadly, two standard page fault types
  – Illegal address (protection error)
    • Signal or kill the process
  – Page not resident (a handler for this case is sketched below)
    • Get an empty frame
    • Load page from disk
    • Update page (translation) table (enter frame #, set valid bit, etc.)
    • Restart the faulting instruction
(Diagram: a virtual address space and the page table for the resident part of the address space.)
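A minimal sketch in C of the "page not resident" path above, assuming a simple linear page table; every helper name here (get_free_frame, load_page_from_disk, and so on) is hypothetical.

```c
/* Hypothetical sketch of the "page not resident" fault path described
 * above; all helper names are invented for illustration. */

#define PAGE_VALID 0x1u

struct pte { unsigned int frame; unsigned int flags; };

extern struct pte page_table[];                 /* per-process page table */
extern int  get_free_frame(void);               /* returns an empty frame */
extern void load_page_from_disk(unsigned int page, int frame);
extern void kill_process(void);
extern void restart_faulting_instruction(void);

void page_fault_handler(unsigned int page, int illegal_access)
{
    if (illegal_access) {
        /* Illegal address / protection error: signal or kill the process */
        kill_process();
        return;
    }

    /* Page not resident: get an empty frame and load the page from disk */
    int frame = get_free_frame();
    load_page_from_disk(page, frame);

    /* Update the page (translation) table: enter frame #, set valid bit */
    page_table[page].frame  = frame;
    page_table[page].flags |= PAGE_VALID;

    /* Restart the faulting instruction */
    restart_faulting_instruction();
}
```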

• Note: some implementations store the disk block numbers of non-resident pages in the page table (with the valid bit unset)

Shared Pages
• Private code and data
  – Each process has its own copy of code and data
  – Code and data can appear anywhere in the address space
• Shared code
  – Single copy of code shared between all processes executing it
  – Code must not be self-modifying
  – Code must appear at the same address in all processes


(Diagram: two (or more) processes running the same program and sharing the text section; the Proc 1 and Proc 2 page tables map their text pages to the same physical frames.)

Page Table Structure
• Page table is (logically) an array of frame numbers
  – Indexed by page number
• Each page-table entry (PTE) also has other bits

PTE bits
• Present/absent bit
  – Also called the valid bit; indicates a valid mapping for the page
• Modified bit
  – Also called the dirty bit; indicates the page may have been modified in memory
• Reference bit
  – Indicates the page has been accessed
• Protection bits
  – Read permission, write permission, execute permission
  – Or combinations of the above
• Caching bit
  – Used to indicate the processor should bypass the cache when accessing memory
  – Example: to access device registers or memory-mapped devices

Address Translation
• Every virtual address issued by the CPU must be translated to a physical address
  – Every load and every store instruction
  – Every instruction fetch
• Need translation hardware
• In a paging system, translation involves replacing the page number with a frame number (a software model of this is sketched below)
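As a concrete illustration, here is a small software model of that translation step, assuming 4 KiB pages and a single-level page table; the PTE layout simply mirrors the bits listed above and is not any particular architecture's format.

```c
#include <stdint.h>

#define PAGE_SHIFT        12                       /* 4 KiB pages */
#define PAGE_OFFSET_MASK  ((1u << PAGE_SHIFT) - 1)

struct pte {
    uint32_t frame;                 /* physical frame number            */
    unsigned present    : 1;        /* present/absent (valid) bit       */
    unsigned modified   : 1;        /* dirty bit                        */
    unsigned referenced : 1;        /* reference bit                    */
    unsigned read : 1, write : 1, exec : 1;   /* protection bits        */
    unsigned nocache    : 1;        /* caching bit: bypass the cache    */
};

/* Translate a virtual address; returns 0 and fills *paddr on success,
 * -1 if the access would cause a page fault. */
int translate(const struct pte *page_table, uint32_t vaddr, uint32_t *paddr)
{
    uint32_t page   = vaddr >> PAGE_SHIFT;         /* page number       */
    uint32_t offset = vaddr & PAGE_OFFSET_MASK;    /* passed through    */

    if (!page_table[page].present)
        return -1;                                 /* page fault        */

    /* Replace the page number with the frame number, keep the offset */
    *paddr = (page_table[page].frame << PAGE_SHIFT) | offset;
    return 0;
}
```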


Virtual memory: virtual and physical memory are chopped up into pages

• Programs use virtual addresses
• Virtual-to-physical mapping is done by the MMU
  – First check if the page is present (present/absent bit)
  – If yes: the frame number from the page table forms the MSBs of the physical address
  – If no: bring in the page from disk ⇒ page fault

Page Tables

• Assume we have
  – 32-bit virtual address (4 GByte address space)
  – 4 KByte page size
  – How many page table entries do we need for one process?
• Now assume we have
  – 64-bit virtual address (humungous address space)
  – 4 KByte page size
  – How many page table entries do we need for one process? (see the worked numbers below)
• Problem:
  – Page table is very large
  – Access has to be fast: there is a lookup for every memory reference
  – Where do we store the page table?
    • Registers?
    • Main memory?
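One way to check the numbers, assuming a single-level table: a 4 KByte page means a 12-bit offset, so a 32-bit address leaves 20 bits of page number (about one million PTEs per process), while a 64-bit address leaves 52 bits (about 4.5 × 10^15 PTEs).

```c
/* Back-of-the-envelope check of the question above: with 4 KiB pages the
 * offset takes 12 bits, so the page number has (VA bits - 12) bits. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    unsigned page_bits = 12;                        /* 4 KiB = 2^12      */

    uint64_t entries_32 = 1ull << (32 - page_bits); /* 2^20, ~1 million  */
    uint64_t entries_64 = 1ull << (64 - page_bits); /* 2^52              */

    printf("32-bit VA: %llu PTEs per process\n",
           (unsigned long long)entries_32);
    printf("64-bit VA: %llu PTEs per process\n",
           (unsigned long long)entries_64);
    return 0;
}
```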


Page Tables
• Page tables are implemented as data structures in main memory
• Most processes do not use the full 4GB address space
  – e.g., 0.1–1 MB text, 0.1–10 MB data, 0.1 MB stack
• We need a compact representation that does not waste space
  – But is still very fast to search
• Three basic schemes
  – Use data structures that adapt to sparsity
  – Use data structures which only represent resident pages
  – Use VM techniques for page tables (details left to extended OS)

Two-level Page Table
• 2nd-level page tables representing unmapped pages are not allocated
  – Null in the top-level page table
  (a lookup sketch follows below)
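A minimal sketch of the two-level lookup, assuming the common 10 + 10 + 12 bit split of a 32-bit virtual address; the structure names and sizes are illustrative only.

```c
#include <stdint.h>
#include <stddef.h>

struct pte          { uint32_t frame; unsigned present : 1; };
struct second_level { struct pte pte[1024]; };           /* 2nd-level table */
struct top_level    { struct second_level *dir[1024]; }; /* NULL = unmapped */

int two_level_translate(const struct top_level *pt,
                        uint32_t vaddr, uint32_t *paddr)
{
    uint32_t top    = (vaddr >> 22) & 0x3ff;   /* bits 31..22: top index  */
    uint32_t second = (vaddr >> 12) & 0x3ff;   /* bits 21..12: 2nd index  */
    uint32_t offset =  vaddr        & 0xfff;   /* bits 11..0              */

    const struct second_level *sl = pt->dir[top];
    if (sl == NULL || !sl->pte[second].present)
        return -1;                             /* unmapped => page fault  */

    *paddr = (sl->pte[second].frame << 12) | offset;
    return 0;
}
```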


Two-level Translation
(Diagram: translating a virtual address through a top-level and a 2nd-level page table.)

Example Translations
(Diagram: example address translations.)


Alternative: Inverted Page Table

(Diagram: inverted page table lookup. The PID and VPN of a virtual address, e.g. PID 0, VPN 0x5, offset 0x123, are hashed via the Hash Anchor Table (HAT) into the IPT, which has one entry per physical frame; following the chain finds the matching entry at index 0x40C, so the physical address is frame 0x40C with offset 0x123.)

Inverted Page Table (IPT)
• An "inverted page table" is an array of page numbers sorted (indexed) by frame number (it's a frame table)
• Lookup algorithm (sketched in code below):
  – Compute hash of the page number
  – Extract an index from the hash anchor table
  – Use this to index into the inverted page table
  – Match the PID and page number in the IPT entry
  – If match, use the index value as the frame # for translation
  – If no match, get the next candidate IPT entry from the chain field
  – If NULL chain entry ⇒ page fault

Properties of IPTs
• IPT grows with the size of RAM, NOT the virtual address space
• Frame table is needed anyway (for page replacement, more later)
• Need a separate data structure for non-resident pages
• Saves a vast amount of space (especially on 64-bit systems)
• Used in some IBM and HP workstations
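A rough sketch of those lookup steps; the hash function, table sizes and field names are all invented for illustration.

```c
#include <stdint.h>

#define NFRAMES   4096            /* one IPT entry per physical frame      */
#define HAT_SIZE  1024
#define NO_CHAIN  (-1)

struct ipt_entry {
    int      pid;
    uint32_t vpn;                 /* virtual page number                   */
    int      next;                /* chain to next candidate, or NO_CHAIN  */
};

extern int              hat[HAT_SIZE];   /* hash anchor table (HAT)        */
extern struct ipt_entry ipt[NFRAMES];    /* inverted page table            */

static unsigned hash(int pid, uint32_t vpn)
{
    return ((unsigned)pid ^ vpn) % HAT_SIZE;   /* placeholder hash          */
}

/* Returns the frame number, or -1 to signal a page fault. */
int ipt_lookup(int pid, uint32_t vpn)
{
    int idx = hat[hash(pid, vpn)];             /* index from the HAT        */

    while (idx != NO_CHAIN) {
        if (ipt[idx].pid == pid && ipt[idx].vpn == vpn)
            return idx;                        /* index IS the frame #      */
        idx = ipt[idx].next;                   /* follow the chain field    */
    }
    return -1;                                 /* NULL chain => page fault  */
}
```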


Given n processes

• How many page tables will the system have for
  – ‘normal’ page tables?
  – inverted page tables?

Another look at sharing…
(Diagram: two (or more) processes running the same program and sharing the text section; both page tables map the text pages to the same physical frames.)

VM Implementation Issue
• Problem:
  – Each virtual memory reference can cause two physical memory accesses
    • One to fetch the page table entry
    • One to fetch/store the data
  ⇒ Intolerable performance impact!!
• Solution:
  – High-speed cache for page table entries (PTEs)
    • Called a translation look-aside buffer (TLB)
    • Contains recently used page table entries
    • Associative, high-speed memory, similar to cache memory
    • May be under OS control (unlike memory cache)

Translation Lookaside Buffer
• The TLB is an on-CPU hardware device, not just a data structure

TLB operation
• Given a virtual address, the processor examines the TLB
• If a matching PTE is found (TLB hit), the address is translated
• Otherwise (TLB miss), the page number is used to index the process's page table in main memory
  – If the PT contains a valid entry, reload the TLB and restart
  – Otherwise (page fault), check if the page is on disk
    • If on disk, swap it in
    • Otherwise, allocate a new page or raise an exception
  (a software model of this lookup is sketched below)
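A software model of that lookup, for illustration only: a real TLB compares all tags in parallel in hardware and uses a proper replacement policy, and the names and sizes here are assumptions.

```c
#include <stdint.h>

#define TLB_ENTRIES 64

struct tlb_entry {
    uint32_t vpn;                  /* tag: virtual page number          */
    uint32_t frame;
    unsigned valid : 1;
};

extern struct tlb_entry tlb[TLB_ENTRIES];
/* Slow path: walk the page table in main memory; returns 0 if valid. */
extern int page_table_lookup(uint32_t vpn, uint32_t *frame);

int tlb_translate(uint32_t vaddr, uint32_t *paddr)
{
    uint32_t vpn    = vaddr >> 12;
    uint32_t offset = vaddr & 0xfff;

    for (int i = 0; i < TLB_ENTRIES; i++) {          /* TLB hit?          */
        if (tlb[i].valid && tlb[i].vpn == vpn) {
            *paddr = (tlb[i].frame << 12) | offset;
            return 0;
        }
    }

    uint32_t frame;                                  /* TLB miss: walk PT */
    if (page_table_lookup(vpn, &frame) != 0)
        return -1;                                   /* page fault        */

    /* Reload the TLB (simplistically replacing entry 0) and translate */
    tlb[0].vpn   = vpn;
    tlb[0].frame = frame;
    tlb[0].valid = 1;
    *paddr = (frame << 12) | offset;
    return 0;
}
```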


TLB properties
• Page table is (logically) an array of frame numbers
• TLB holds a (recently used) subset of PT entries
  – Each TLB entry must be identified (tagged) with the page # it translates
  – Access is by associative lookup:
    • All TLB entries’ tags are concurrently compared to the page #
    • TLB is associative (or content-addressable) memory

TLB properties (cont.)
• TLB may or may not be under direct OS control
  – Hardware-loaded TLB
    • On miss, hardware performs the PT lookup and reloads the TLB
    • Example: x86, ARM
  – Software-loaded TLB
    • On miss, hardware generates a TLB miss exception, and the exception handler reloads the TLB
    • Example: MIPS, Itanium (optionally)
• TLB size: typically 64–128 entries
• Can have separate TLBs for instruction fetch and data access
• TLBs can also be used with inverted page tables (and others)

TLB and context switching
• TLB is a shared piece of hardware
• Normal page tables are per-process (per address space)
• TLB entries are process-specific
  – On context switch, need to flush the TLB (invalidate all entries)
    • High context-switching overhead (Intel x86)
  – Or tag entries with an address-space ID (ASID)
    • Called a tagged TLB
    • Used (in some form) on all modern architectures
    • TLB entry: ASID, page #, frame #, valid and write-protect bits

TLB effect
• Without TLB
  – Average number of physical memory references per virtual reference = 2
• With TLB (assume 99% hit ratio)
  – Average number of physical memory references per virtual reference
    = 0.99 × 1 + 0.01 × 2 = 1.01
  (the arithmetic is checked below)
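A quick check of that arithmetic: the average is just the hit and miss costs weighted by the hit ratio.

```c
#include <stdio.h>

int main(void)
{
    double hit_ratio = 0.99;
    /* Hit: 1 physical access (the data itself).
     * Miss: 2 physical accesses (PTE fetch + data). */
    double avg = hit_ratio * 1.0 + (1.0 - hit_ratio) * 2.0;
    printf("average physical refs per virtual ref = %.2f\n", avg); /* 1.01 */
    return 0;
}
```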


Recap - Simplified Components of a VM System
(Diagrams: three virtual address spaces (3 processes) and the CPU with its TLB, backed by the frame pool and physical memory; one version uses per-process page tables, the other a frame table / inverted page table.)

R3000 Address Space Layout
• kuseg:
  – 2 gigabytes
  – TLB translated (mapped)
  – Cacheable (depending on the ‘N’ bit)
  – User-mode and kernel-mode accessible
  – Page size is 4K
(Diagram: R3000 address space layout, from bottom to top: kuseg at 0x00000000, kseg0 at 0x80000000, kseg1 at 0xA0000000, kseg2 at 0xC0000000, up to 0xFFFFFFFF.)

MIPS R3000 TLB
• 64 TLB entries
• Accessed via software through Coprocessor 0 registers
  – EntryHi and EntryLo (a sketch of their layout follows below)
• Entry bits:
  – N = Not cacheable
  – V = Valid bit
  – D = Dirty = write protect
  – G = Global (ignore ASID in lookup)
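A sketch of the commonly documented EntryHi/EntryLo bit layout for the R3000 (VPN and ASID in EntryHi; PFN plus the N, D, V, G bits in EntryLo). The mask names are made up here, so check them against the processor manual or your kernel headers before relying on them.

```c
#include <stdint.h>

/* EntryHi: VPN in bits 31..12, 6-bit ASID in bits 11..6 */
#define ENTRYHI_VPN_MASK   0xfffff000u
#define ENTRYHI_ASID_MASK  0x00000fc0u

/* EntryLo: PFN in bits 31..12, then the flag bits */
#define ENTRYLO_PFN_MASK   0xfffff000u
#define ENTRYLO_N          0x00000800u   /* not cacheable                */
#define ENTRYLO_D          0x00000400u   /* dirty, acts as write enable  */
#define ENTRYLO_V          0x00000200u   /* valid                        */
#define ENTRYLO_G          0x00000100u   /* global: ignore ASID          */

static inline uint32_t make_entryhi(uint32_t vaddr, uint32_t asid)
{
    return (vaddr & ENTRYHI_VPN_MASK) | ((asid << 6) & ENTRYHI_ASID_MASK);
}

static inline uint32_t make_entrylo(uint32_t paddr, int dirty, int valid)
{
    return (paddr & ENTRYLO_PFN_MASK)
         | (dirty ? ENTRYLO_D : 0)
         | (valid ? ENTRYLO_V : 0);
}
```

For instance, loading make_entryhi(faultaddr, asid) and make_entrylo(frame_paddr, 1, 1) into the EntryHi/EntryLo registers would install a valid, writable mapping for that page (helper names here are illustrative).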


R3000 Address Space Layout (cont.)
• kseg0:
  – 512 megabytes
  – Fixed translation window to physical memory
    • 0x80000000 – 0x9fffffff virtual = 0x00000000 – 0x1fffffff physical
    • TLB not used
  – Cacheable
  – Only kernel-mode accessible
  – Usually where the kernel code is placed
• Switching processes switches the translation (page table) for kuseg
(Diagram: Proc 1, Proc 2 and Proc 3 each have their own kuseg mapping onto physical memory; the kernel segments are shared.)



R3000 Address Space Layout (cont.)
• kseg1:
  – 512 megabytes
  – Fixed translation window to physical memory
    • 0xa0000000 – 0xbfffffff virtual = 0x00000000 – 0x1fffffff physical
    • TLB not used
  – NOT cacheable
  – Only kernel-mode accessible
  – Where devices are accessed (and boot ROM)
• kseg2:
  – 1024 megabytes
  – TLB translated (mapped)
  – Cacheable (depending on the ‘N’ bit)
  – Only kernel-mode accessible
  – Can be used to store the virtual linear array page table
(Conversion macros for the fixed kseg0/kseg1 windows are sketched below.)
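A sketch of the fixed kseg0/kseg1 translation windows described above; no TLB lookup is involved, only a constant offset. The macro names are illustrative (real MIPS kernels define similar ones).

```c
#include <stdint.h>

#define KSEG0_BASE  0x80000000u     /* cached, kernel-only window   */
#define KSEG1_BASE  0xa0000000u     /* uncached, kernel-only window */

/* kseg0/kseg1 virtual <-> physical: a simple subtraction/addition */
#define KSEG0_TO_PHYS(va)  ((uint32_t)(va) - KSEG0_BASE)
#define PHYS_TO_KSEG0(pa)  ((uint32_t)(pa) + KSEG0_BASE)
#define KSEG1_TO_PHYS(va)  ((uint32_t)(va) - KSEG1_BASE)
#define PHYS_TO_KSEG1(pa)  ((uint32_t)(pa) + KSEG1_BASE)
```

For example, kernel code running at 0x80001000 in kseg0 touches physical address 0x00001000, and the same frame can be reached uncached through kseg1 at 0xa0001000.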

