
EEL-4713 Computer Architecture: Virtual Memory


Outline

° Recap of Memory Hierarchy
° Virtual Memory
° Page Tables and TLB

° Protection


Memory addressing - physical

° So far we considered addresses of loads/stores going directly to caches/memory
  • As in your project
° This makes life complicated if a computer is multi-processed/multi-user
  • How do you assign addresses within a program so that you know other users/programs will not conflict with them?

Program A:       Program B:
store 0x100,1    store 0x100,5
                 load R1,0x100

Virtual Memory?

Provides the illusion of very large memory:
  – the sum of the memory of many jobs can be greater than physical memory
  – the memory of each job can be larger than physical memory
Allows available (fast and expensive) physical memory to be efficiently utilized
Simplifies memory management and programming
Exploits the memory hierarchy to keep average access time low
Involves at least two storage levels: main and secondary
  Main (DRAM): nanoseconds, M/GBytes
  Secondary (HD): milliseconds, G/TBytes

Virtual Address -- address used by the programmer
Virtual Address Space -- collection of such addresses
Memory Address -- address of a word in physical memory; also known as "physical address" or "real address"

Memory addressing - virtual

Program A:       Program B:
store 0x100,1    store 0x100,5
                 load R1,0x100

Translation A: 0x100 -> 0x40000100
Translation B: 0x100 -> 0x50000100

Use software and hardware to guarantee no conflicts:
  Software: keep translation tables
  Hardware: cache recent translations

Basic Issues in VM System Design

Size of information blocks (pages) that are transferred from secondary storage (disk) to main storage (Mem).
A page is brought into Mem; if Mem is full, some page of Mem must be released to make room for the new page --> replacement policy.
A missing page is fetched from secondary memory only on the occurrence of a page fault --> fetch/load policy.

Paging Organization: the virtual and physical address spaces are partitioned into blocks of equal size:
  page frames (in physical memory)
  pages (in virtual memory)
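The per-program translations above can be sketched as a toy table lookup (the table contents are this slide's example values, not a real OS's):

```python
# Toy per-process translation: each process has its own table mapping a
# virtual address to a physical address, so identical virtual addresses
# used by different programs never conflict in physical memory.
translations = {
    "A": {0x100: 0x40000100},  # Program A's 0x100 -> 0x40000100
    "B": {0x100: 0x50000100},  # Program B's 0x100 -> 0x50000100
}

def translate(process, vaddr):
    return translations[process][vaddr]

# Both programs store to "0x100", yet touch different physical words.
assert translate("A", 0x100) != translate("B", 0x100)
```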

Address Map

V = {0, 1, . . . , n - 1}   virtual address space (n can be > m)
M = {0, 1, . . . , m - 1}   physical address space

MAP: V --> M U {0}   address mapping function
  MAP(a) = a' if the data at virtual address a is present at physical address a' in M
         = 0  if the data at virtual address a is not present in M (need to allocate an address in M)

A missing item causes a fault; the fault handler in the OS performs the page transfer from Secondary Memory to Main Memory.

Paging Organization (page size: 1K)

Physical memory: frame 0 at address 0, frame 1 at 1024, ..., frame 7 at 7168
Virtual memory:  page 0 at address 0, page 1 at 1024, ..., page 31 at 31744
The page is both the unit of mapping and the unit of transfer from virtual to physical memory.

A virtual address is split into a page number and a page offset (10 bits for 1K pages).
There is one page table per process; the start of the page table is stored in the page table base register.
Each page table entry holds:
  V -- valid bit: is the page in memory or on disk?
  Access Rights
  PA -- the physical frame address
The page number indexes into the page table; the frame address concatenated with the page offset forms the physical address. The page table itself is located in physical memory.

Address Mapping Algorithm

If V = 1
  then the page is in main memory, at the frame address stored in the table
  else the page is located in secondary memory (location determined at process creation)

Access Rights: R = Read-only, R/W = read/write, X = execute only

If the kind of access is not compatible with the specified access rights, then a protection_violation_fault is raised.
If the valid bit is not set, then a page fault is raised.

Terms:
  Protection Fault: access rights violation; hardware raises an exception, handled by microcode or a software fault handler.
  Page Fault: page not resident in physical memory; also causes a trap, usually accompanied by a context switch (the current process is suspended while the page is fetched from secondary storage). Page faults are usually handled in software by the OS because access to secondary memory takes a million+ cycles.

Hardware/software

° What checks does the processor perform during a load/store memory access?
  • The effective address computed in the pipeline is virtual
  • Before accessing memory, it must perform the virtual-to-physical mapping
    - At hardware speed; critical to performance
  • If there is a valid mapping, the load/store proceeds as usual; the address sent to the cache/DRAM is the mapped (physical) address
  • If there is no valid mapping, or if there is a protection violation, the processor does not know how to handle it:
    - Throw an exception
      – Save the PC of the instruction that caused the exception so that it can be retried later
      – Jump into an operating system routine
    - The O/S handles the exception using its specific policies (Linux and Windows will behave differently)
    - Once it finishes handling, it issues a "return from exception" instruction to recover the PC and try the instruction again

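A minimal sketch of the mapping algorithm above, using the slide's 1K page size (the table contents are illustrative, and fault handling is reduced to raised exceptions):

```python
PAGE_SIZE = 1024  # 1K pages, as in the slide's example

# Page table: page number -> (valid bit, access rights, frame number).
page_table = {
    0: (1, "R/W", 7),    # page 0 resident in frame 7
    1: (0, "R/W", None), # page 1 on disk -> access causes a page fault
}

def translate(vaddr, access="R"):
    page, offset = divmod(vaddr, PAGE_SIZE)
    valid, rights, frame = page_table[page]
    if access == "W" and rights == "R":
        raise PermissionError("protection_violation_fault")
    if not valid:
        raise LookupError("page_fault")  # OS would fetch the page from disk
    return frame * PAGE_SIZE + offset   # frame address concatenated with offset

assert translate(0x10) == 7 * 1024 + 0x10
```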

Virtual Address and a Cache

CPU --VA--> Translation --PA--> Cache --hit--> data; on a cache miss, go to Main Memory.

It takes an extra memory access to translate a VA to a PA. This makes cache access very expensive, and this is the "innermost loop" that you want to go as fast as possible.

ASIDE: Why access the cache with a PA at all? VA caches have a problem -- the synonym problem:
1. two different virtual addresses map to the same physical address => two different cache entries holding data for the same physical address! (data sharing between different processes)
2. two identical virtual addresses (from different processes) map to different physical addresses

TLBs

A way to speed up translation is to use a special cache of recently used page table entries -- this has many names, but the most frequently used is Translation Lookaside Buffer, or TLB.

A TLB entry holds: Virtual Address | Physical Address | Dirty | Ref | Valid | Access

TLB access time is comparable to, though shorter than, cache access time (and still much less than main memory access time).

Translation Look-Aside Buffers

Just like any other cache, the TLB can be organized as fully associative, set associative, or direct mapped.

TLBs are usually small, typically not more than hundreds of entries. This permits large/full associativity.

Reducing Translation Time

Machines with TLBs can go one step further to reduce the # of cycles per cache access: they overlap the cache access with the TLB access.

This works because the high-order bits of the VA are used to look up in the TLB while the low-order bits are used as the index into the cache.
* Virtually indexed, physically tagged.

With a TLB: CPU --VA--> TLB lookup --hit--> PA --> Cache --hit--> data; on a TLB miss, do the full translation; on a cache miss, go to Main Memory.

Overlapped Cache & TLB Access

Overlapped access only works as long as the address bits used to index into the cache do not change as the result of VA translation. This usually limits things to small caches, large page sizes, or highly set-associative caches if you want a large cache (i.e., the cache index plus block offset must fit within the page offset).

Example: with 4K pages, the VA splits into a 20-bit virtual page # and a 12-bit displacement. A 4 KB direct-mapped cache with a 10-bit index and a 2-bit block offset uses exactly those 12 displacement bits, so the TLB lookup and the cache lookup proceed in parallel:

  IF cache hit AND (cache tag = PA) THEN deliver data to CPU
  ELSE IF [cache miss OR (cache tag != PA)] AND TLB hit THEN access memory with the PA from the TLB
  ELSE do standard VA translation

Problems With Overlapped TLB Access

Example: suppose everything is the same except that the cache is increased to 8 K bytes instead of 4 K. The cache index now needs one more bit -- a bit that is changed by VA translation but is needed for the cache lookup.

Solutions:
  go to 8K page sizes
  go to a 2-way set associative cache (a 1K-set, 2-way organization would allow you to continue to use a 10-bit index)

Hardware versus software TLB management

° TLB misses can be handled either by software or by hardware
  • Software: the processor has instructions to modify the TLB in the architecture; the O/S handles replacement (e.g. MIPS)
  • Hardware: the processor handles replacement without the need for instructions to store TLB entries
° Instructions that cause TLB "flushes" are needed in the hardware case too

Optimal Page Size

Choose a page size that minimizes fragmentation:
  a large page size => internal fragmentation is more severe (unused memory)
  BUT a small page size => an increase in the # of pages per name space => larger page tables

In general, the trend is towards larger page sizes because:
  -- memories get larger as the price of RAM drops
  -- the gap between processor speed and disk speed grows wider, and larger pages can exploit more spatial locality in transfers between disk and memory
  -- programmers desire larger virtual address spaces

Most machines are at 4K-64K byte pages today, with page sizes likely to increase.

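The overlapped-access constraint (cache index bits + block offset bits must fit within the page offset) can be checked numerically; the cache sizes below are the slides' 4 KB / 8 KB example:

```python
import math

def overlap_ok(cache_bytes, block_bytes, ways, page_bytes):
    # Bits needed to index the cache sets, plus the block offset, must not
    # exceed the page-offset bits, or the index would depend on translation.
    sets = cache_bytes // (block_bytes * ways)
    index_bits = int(math.log2(sets))
    offset_bits = int(math.log2(block_bytes))
    page_offset_bits = int(math.log2(page_bytes))
    return index_bits + offset_bits <= page_offset_bits

assert overlap_ok(4096, 4, 1, 4096)      # 4 KB direct-mapped: 10 + 2 = 12 bits, OK
assert not overlap_ok(8192, 4, 1, 4096)  # 8 KB direct-mapped: 11 + 2 = 13 bits, fails
assert overlap_ok(8192, 4, 2, 4096)      # 8 KB 2-way: 10 + 2 = 12 bits, OK again
assert overlap_ok(8192, 4, 1, 8192)      # or keep direct-mapped but use 8K pages
```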

2-level page table

Root page tables (4-byte PA entries P0 ... P255) point to second-level page tables (4-byte PA entries D0 ... D1023), which in turn point to 4K data pages. The user virtual space is divided into segments (Seg 0, Seg 1, ...); a segment's tables are allocated in system virtual address space, so a 1-Mbyte segment can be mapped with only 256K bytes resident in physical memory. With address fields of 8 + 8 + 10 + 12 bits, the mapped space is 2^8 x 2^8 x 2^10 x 2^12 = 2^38 bytes.

Example: two-level address translation in x86

The 32-bit VA is split 10 | 10 | 12: the top 10 bits index the page directory (located via the CR3 register), the middle 10 bits index a page table, and the low 12 bits are the offset within a 4K data page.
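The x86-style 10 | 10 | 12 split can be sketched as a two-level table walk; the directory and table contents below are made up for illustration:

```python
PD_SHIFT, PT_SHIFT, OFFSET_MASK = 22, 12, 0xFFF

# Toy page directory and page tables (contents are illustrative):
# directory index -> page table; table index -> physical frame base.
page_tables = {0: {0: 0x40000000, 1: 0x7C000000}}
page_directory = {0: page_tables[0]}  # CR3 would hold this structure's address

def walk(vaddr):
    pd_index = vaddr >> PD_SHIFT            # top 10 bits: directory index
    pt_index = (vaddr >> PT_SHIFT) & 0x3FF  # next 10 bits: table index
    offset = vaddr & OFFSET_MASK            # low 12 bits: page offset
    frame = page_directory[pd_index][pt_index]
    return frame | offset

assert walk(0x1ABC) == 0x7C000ABC  # directory entry 0, table entry 1, offset 0xABC
```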

Example: Linux VM

° Demand-paging
  • Pages are brought into physical memory when referenced
° The kernel keeps track of each process' virtual address space using an mm_struct
  • The task_struct's mm field points to the mm_struct, which contains a pointer (mmap) to the list of "area" structures (vm_area_struct)

Linux VM areas

° A linked list of vm_area_struct structures is associated with the process, one per mapped region of process virtual memory (text, data, shared libraries, ...). Each holds:
  • vm_start / vm_end: the bounds of the VM map
  • vm_prot: protection info (r/w)
  • vm_next: the next area in the list


Linux fault handling

° An exception triggers O/S handling if an address is out of bounds or protection is violated
° Traverse the vm_area list, checking bounds
  • If the address is not mapped, it is a segmentation violation -- signal to the process
° If mapped, check the protection of the vm_area_struct (e.g. r/o, r/w)
  • Signal a protection violation to the process if the access is not allowed
  • Otherwise, handle the fault and bring the page into memory

VM "areas"

° A VM area: a part of the process virtual memory space that has a special rule for the page-fault handlers (i.e. a shared library, the executable area, etc.)
° These are specified in linked vm_area_struct structures, each giving:
  • the start and end VM address of the area
  • protection information and flags (in the figure: a r/o area VMA1, a r/w area VMA2, a r/o area VMA3)

Page Replacement Algorithms

Just like cache block replacement!

Least Recently Used (LRU):
-- selects the least recently used page for replacement
-- requires knowledge about past references; more difficult to implement
-- good performance; recognizes the principle of locality
-- hard to keep track of -- update a structure on each memory reference?

Not Recently Used (NRU):
Associated with each page is a reference flag such that ref flag = 1 if the page has been referenced in the recent past, = 0 otherwise.
-- if replacement is necessary, choose any page frame such that its reference bit is 0; this is a page that has not been referenced in the recent past

An implementation of NRU: keep a last replaced pointer (lrp) into the page table. If replacement is to take place, advance the lrp to the next entry (mod table size) until one with a 0 ref bit is found; this is the target for replacement. As a side effect, all examined PTEs have their reference bits set to zero.

An optimization is to search for a page that is both not recently referenced AND not dirty.

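The lrp scheme above is the classic clock algorithm; a minimal sketch:

```python
def clock_replace(ref_bits, lrp):
    """Advance lrp (mod table size) until an entry with ref bit 0 is found;
    clear the ref bit of every entry examined on the way (the 'second
    chance'). Returns (victim index, new lrp)."""
    n = len(ref_bits)
    while ref_bits[lrp] == 1:
        ref_bits[lrp] = 0       # side effect: examined PTEs get ref bit cleared
        lrp = (lrp + 1) % n
    victim = lrp
    return victim, (lrp + 1) % n

bits = [1, 1, 0, 1]
victim, lrp = clock_replace(bits, 0)
assert victim == 2            # first entry whose ref bit was 0
assert bits == [0, 0, 0, 1]   # entries examined along the way were cleared
```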

Demand Paging and Prefetching Pages

Fetch policy: when is a page brought into memory? If pages are loaded solely in response to page faults, the policy is demand paging.

An alternative is prefetching: anticipate future references and load such pages before their actual use.
  + reduces page transfer overhead
  - removes pages already in page frames, which could adversely affect the page fault rate
  - predicting future references is usually difficult

Most systems implement demand paging without prefetching.
(One way to obtain the effect of prefetching behavior is increasing the page size.)

Discussion -- caches or no caches?

° Caches help performance when there is locality, but can be overhead if locality is not high
  • Each hit/miss decision at each cache level requires a lookup
  • Some applications can run better with fewer cache levels
° Example: the "Cell" processor
  • No cache on the attached processing units
  • There is a small memory array next to each unit, but it is handled by software (not a cache controller)


Summary

° Virtual memory: a mechanism to provide much larger memory than physically available memory in the system

° Placement, replacement and other policies can have significant impact on performance

° The interaction of virtual memory with the physical memory hierarchy is complex, and address translation mechanisms must be designed carefully for good performance.
