Coscheduling in the Multicore Era
Total Page:16
File Type:pdf, Size:1020Kb
Load more
										Recommended publications
									
								- 
												  Operating SystemsUC Santa Barbara Operating Systems Christopher Kruegel Department of Computer Science UC Santa Barbara http://www.cs.ucsb.edu/~chris/ Virtual Memory and Paging UC Santa Barbara • What if a program is too big to be loaded in memory • What if a higher degree of multiprogramming is desirable • Physical memory is split in page frames • Virtual memory is split in pages • OS (with help from the hardware) manages the mapping between pages and page frames 2 Mapping Pages to Page Frames UC Santa Barbara • Virtual memory: 64KB • Physical memory: 32KB • Page size: 4KB • Virtual memory pages: 16 • Physical memory pages: 8 3 Memory Management Unit UC Santa Barbara • Automatically performs the mapping from virtual addresses into physical addresses 4 Memory Management Unit UC Santa Barbara • Addresses are split into a page number and an offset • Page numbers are used to look up a table in the MMU with as many entries as the number of virtual pages • Each entry in the table contains a bit that states if the virtual page is actually mapped to a physical one • If it is so, the entry contains the number of physical page used • If not, a page fault is generated and the OS has to deal with it 5 Page Tables UC Santa Barbara • Page tables contain an entry for each virtual table • If virtual memory is big (e.g., 32 bit and 64 bit addresses) the table can become of unmanageable size • Solution: instead of keeping them in the MMU move them to main memory • Problem: page tables are used each time an access to memory is performed.
- 
												  Page Replacement AlgorithmsCMPT 300 Introduction to Operating Systems Page Replacement Algorithms 0 Demand Paging Modern programs require a lot of physical memory Memory per system growing faster than 25%-30%/year But they don’t use all their memory all of the time 90-10 rule: programs spend 90% of their time in 10% of their code Wasteful to require all of user’s code to be in memory Solution: use main memory as cache for disk Processor Control Caching Tertiary Second Main Secondary Storage On Cache Level Memory Storage (Tape) - Datapath Chip Cache (DRAM) (Disk) (SRAM) 1 Illusion of Infinite Memory TLB Page Table Physical Disk Virtual Memory 500GB Memory 512 MB 4 GB Disk is larger than physical memory In-use virtual memory can be bigger than physical memory Combined memory of running processes much larger than physical memory More programs fit into memory, allowing more concurrency Principle: Transparent Level of Indirection (page table) Supports flexible placement of physical data Data could be on disk or somewhere across network Variable location of data transparent to user program Performance issue, not correctness issue 2 Demand Paging is Caching Since Demand Paging is Caching, must ask: What is block size? 1 page What is organization of this cache (i.e. direct-mapped, set- associative, fully-associative)? Fully associative: arbitrary virtualphysical mapping How do we find a page in the cache when look for it? First check TLB, then page-table traversal What is page replacement policy? (i.e. LRU, Random…) This requires more explanation… (kinda LRU) What happens on a miss? Go to lower level to fill miss (i.e.
- 
												  Outline Paging Working Set Model Working Set Model Working SetOutline Paging 1 Paging 2 Eviction policies 3 Thrashing 4 Details of paging 5 The user-level perspective 6 Case study: 4.4 BSD Use disk to simulate larger virtual than physical mem 1 / 48 • 2 / 48 Working set model Working set model # of accesses # of accesses virtual address virtual address Disk much, much slower than memory Disk much, much slower than memory • • - Goal: run at memory speed, not disk speed - Goal: run at memory speed, not disk speed 80/20 rule: 20% of memory gets 80% of memory accesses 80/20 rule: 20% of memory gets 80% of memory accesses • • - Keep the hot 20% in memory Keep the hot 20% in memory - Keep the cold 80% on disk - Keep the cold 80% on disk 3 / 48 3 / 48 Working set model Paging challenges How to resume a process aer a fault? • - Need to save state and resume - Process might have been in the middle of an instruction! # of accesses What to fetch from disk? virtual address • - Just needed page or more? Disk much, much slower than memory What to eject? • • - Goal: run at memory speed, not disk speed - How to allocate physical pages amongst processes? 80/20 rule: 20% of memory gets 80% of memory accesses - Which of a particular process’s pages to keep in memory? • - Keep the hot 20% in memory Keep the cold 80% on disk 3 / 48 4 / 48 Re-starting instructions What to fetch Hardware provides kernel with information about page fault • - Faulting virtual address (In %cr2 reg on x86—may see it if you Bring in page that caused page fault modify Pintos page_fault and use fault_addr) • Pre-fetch surrounding pages? - Address
- 
												  Last Class: Paging & SegmentationLast Class: Paging & Segmentation • Paging: divide memory into fixed-sized pages, map to frames (OS view of memory) • Segmentation: divide process into logical ‘segments’ (compiler view of memory • Combine paging and segmentation by paging individual segments Lecture 13, page 1 Computer Science CS377: Operating Systems Today: Demand Paged Virtual Memory • Up to now, the virtual address space of a process fit in memory, and we assumed it was all in memory. • OS illusions 1. treat disk (or other backing store) as a much larger, but much slower main memory 2. analogous to the way in which main memory is a much larger, but much slower, cache or set of registers • The illusion of an infinite virtual memory enables 1. a process to be larger than physical memory, and 2. a process to execute even if all of the process is not in memory 3. Allow more processes than fit in memory to run concurrently. Lecture 13, page 2 Computer Science CS377: Operating Systems Demand Paged Virtual Memory • Demand Paging uses a memory as a cache for the disk • The page table (memory map) indicates if the page is on disk or memory using a valid bit • Once a page is brought from disk into memory, the OS updates the page table and the valid bit • For efficiency reasons, memory accesses must reference pages that are in memory the vast majority of the time – Else the effective memory access time will approach that of the disk • Key Idea: Locality---the working set size of a process must fit in memory, and must stay there.
- 
												  4.4 Page Replacement Algorithms214 MEMORY MANAGEMENT CHAP. 4 4.4 PAGE REPLACEMENT ALGORITHMS When a page fault occurs, the operating system has to choose a page to re- move from memory to make room for the page that has to be brought in. If the page to be removed has been modified while in memory, it must be rewritten to the disk to bring the disk copy up to date. If, however, the page has not been changed (e.g., it contains program text), the disk copy is already up to date, so no rewrite is needed. The page to be read in just overwrites the page being evicted. While it would be possible to pick a random page to evict at each page fault, system performance is much better if a page that is not heavily used is chosen. If a heavily used page is removed, it will probably have to be brought back in quickly, resulting in extra overhead. Much work has been done on the subject of page replacement algorithms, both theoretical and experimental. Below we will describe some of the most important algorithms. It is worth noting that the problem of ‘‘page replacement’’ occurs in other areas of computer design as well. For example, most computers have one or more memory caches consisting of recently used 32-byte or 64-byte memory blocks. When the cache is full, some block has to be chosen for removal. This problem is precisely the same as page replacement except on a shorter time scale (it has to be done in a few nanoseconds, not milliseconds as with page replacement).
- 
												  Effects of System Paging / Thrashing Overhead on System Performance Based on Degree of MultiprogrammingAnale. Seria Informatic ă. Vol. XI fasc. 2 – 2013 45 Annals. Computer Science Series. 11 th Tome 2 nd Fasc. – 2013 EFFECTS OF SYSTEM PAGING / THRASHING OVERHEAD ON SYSTEM PERFORMANCE BASED ON DEGREE OF MULTIPROGRAMMING Akhigbe–Mudu Thursday Ehis Department of Computer Science, Federal University of Agriculture Abeokuta, Nigeria ABSTRACT: There is usually a strong concern about the system may degrade by multiple orders of magnitude functionality of computer systems and very little concern as a result of excessive paging [MA82]. In virtual about its performance. In general, performance becomes an memory system, thrashing may be caused by issue only when the performance becomes perceived as programs or work load that present insufficient being poor by users. Major constraints on systems show locality of reference, if the working set of a program themselves in form of external symptoms, stress conditions and paging being the chief forms. Thrashing occurs when a or a workload cannot be efficiently held within computer’s virtual memory subsystem is in a constant state physical memory, then constant data swapping, i.e. of paging, rapidly exchanging data in memory for data on thrashing may occur. Hence, the performance disk, to the exclusion of most application processing. This evaluation of computer system and networks has thus causes the performance of the system to degrade by become a permanent concern of all who use them. multiple orders of magnitude. We developed analytical models with computational algorithms that provide the 1.1. Identifying System Constraints values of desired performance measures. Lyaponuv stability criteria provide sufficient conditions for the Major constraints on system show themselves in the system validity, the result shows that the savings generated form of external symptoms, stress conditions and from the page- based, offsets the additional time needed to process the increase number of page- faults.
- 
												  Today: LRU Approximations, MultiprogrammingLast Class: Demand Paged Virtual Memory Benefits of demand paging: • Virtual address space can be larger than physical address space. • Processes can run without being fully loaded into memory. – Processes start faster because they only need to load a few pages (for code and data) to start running. – Processes can share memory more effectively, reducing the costs when a context switch occurs. • A good page replacement algorithm can reduce the number of page faults and improve performance • FIFO, MIN, LRU Computer Science CS377: Operating Systems Lecture 15, page 1 Today: LRU Approximations, Multiprogramming • LRU approximations: – Second Chance – Enhanced Second Chance • Hardware support for page replacement algorithms • Replacement policies for multiprogramming Computer Science CS377: Operating Systems Lecture 15, page 2 Implementing LRU: • Perfect LRU: • Keep a time stamp for each page with the time of the last access. Throw out the LRU page. Problems? • OS must record time stamp for each memory access, and to throw out a page the OS has to look at all pages. Expensive! • Keep a list of pages, where the front of the list is the most recently used page, and the end is the least recently used. • On a page access, move the page to the front of the list. Doubly link the list. Problems? • Still too expensive, since the OS must modify multiple pointers on each memory access Lecture 14, page Computer Science CS377: Operating Systems 3 Approximations of LRU • Hardware Requirements: Maintain reference bits with each page. – On each access to the page, the hardware sets the reference bit to '1'. – Set to 0 at varying times depending on the page replacement algorithm.
- 
												  Operating Systems Virtual Memory Design Issues: Address TranslationVirtual Memory Design Goals COS 318: Operating Systems u Protection u Virtualization l Use disk to extend physical memory Virtual Memory Design Issues: l Make addressing user friendly (0 to high address) Address Translation u Enabling memory sharing (code, libraries, communication) u Efficiency Jaswinder Pal Singh and a Fabulous Course Staff l Translation efficiency (TLB as cache) Computer Science Department l Access efficiency Princeton University • Access time = h × memory access time + ( 1 - h ) × disk access time • E.g. Suppose memory access time = 100ns, disk access time = 10ms (http://www.cs.princeton.edu/courses/cos318/) • If h = 90%, VM access time is 1ms! u Portability 3 1 3 Recall Address translation: Base and Bound Recall Address Translation: Segmentation Processor’s View Implementation Physical u A segment is a contiguous region of virtual memory Memory u Every process has a segment table (in hardware) Virtual Base l Entry in table per segment Memory Base Virtual Virtual Physical Address Address Address u Segment can be located anywhere in physical memory Processor Processor l Each segment has: start, length, access permission u Base+ Processes can share segments Bound Bound l Same start, length, same/different access permissions Raise Exception ◆ Pros: Simple, fast, cheap, safe, can relocate ◆ Cons: ● Can’t keep program from accidentally overwriting its own code ● Can’t share code/data with other processes (all or nothing) ● Can’t grow stack/heap as needed (stop program, change reg, …) 4 5 1 Segmentation Segments Enable Copy-on-Write Process View Implementation Physical u Memory Idea of Copy-on-Write Base 3 Virtual l Child process inherits copy of parent’s address space on fork Memory Stack l But don’t really want to make a copy of all data upon fork Processor Base+ Virtual Bound 3 Address Code Virtual Segment Table l Would like to share as far as possible and make own copy Processor Address Base Bound Access only “on-demand”, i.e.
- 
												  The Working Set Model for Program BehaviorThe Working Set Model for Program Behavior Peter J. Denning Massachusetts Institute of Technology, Cambridge, Massachusetts Probably the most basic reason behind the absence of a general develop a new model, the working set model, which em- treatment of resource allocation in modern computer systems is bodies certain important behavioral properties of com- an adequate model for program behavior. In this paper a new putations operating in multiprogrammed environs, en- model, the "working set model," is developed. The working set abling us to decide which information is in use by a running of pages associated with a process, defined to be the collec- program and which is not. We do not intend that the pro- tion of its most recently used pages, provides knowledge vital posed model be considered "final"; rather, we hope to to the dynamic management of paged memories. "Process" stimulate a new kind of thinking that may be of consider- and "working set" are shown to be manifestations of the same able help in solving many operating system design prob- ongoing computational activity; then "processor demand" lems. and "memory demand" are defined; and resource allocation The working set is intended to model the behavior of is formulated as the problem of balancing demands against programs in the general purpose computer system, or available equipment. computer utility. For this reason we assume that the KEY WORDS AND PHRASES: general operating systemconcepts, multiprocess- operating system must determine on its own the behavior ing, multiprogramming, operating systems, program behavior, program of programs it runs; it cannot count on outside help.
- 
												  CMSC 714 SchedulingCMSC 714 Scheduling Guest Lecturer: Sukhyun Song Scheduling Techniques for Concurrent Systems John K. Ousterhout IEEE ICDCS’82 CMSC 714 - Alan Sussman & Jeffrey K. Hollingsworth 2 Introduction ● Motivation – People assume in OS scheduling that interactions between processes are the exception rather than the rule. – But it is not true any more. • Multiprocessor systems are appearing. • Cooperation between processes becomes widespread. • Traditional techniques will break down. ● Short-term scheduling – Two-phase blocking scheme ● Long-term scheduling – Three algorithms of coscheduling CMSC 714 - Alan Sussman & Jeffrey K. Hollingsworth 3 Short-Term Scheduling ● Waiting is a fundamental aspect of communication. – It is unlikely that two processes reach the rendezvous point at exactly the same time. – A sender or a receiver should wait for the other party. ● Most short-term schedulers are inefficient. – Immediate blocking: When a process waits for some event, it is thrown off its processor and another process is activated. – It always requires two context swaps. ● Solution: to divide waiting into two phases – Pause: A process pauses until the event occurs. – Block: If pause time is exceeded, it relinquishes a processor. – Effective only for fine-grained communication • The event should occur very soon so that we do not get into the block phase. • Duration of the event wait < 2 x CS (context swap cost) CMSC 714 - Alan Sussman & Jeffrey K. Hollingsworth 4 Thrashing and Process Working Sets (1) ● Naïve long-term scheduling – Processes are sending messages among themselves. – Half scheduled in odd time slices, the other in even slices – Most processes in the executing half will block awaiting messages from processes in the non-executing half.
- 
												  Virtual Memory 2Virtual Memory 2 Hakim Weatherspoon CS 3410, Spring 2012 Computer Science Cornell University P & H Chapter 5.4 Goals for Today Virtual Memory • Address Translation • Pages, page tables, and memory mgmt unit • Paging • Role of Operating System • Context switches, working set, shared memory • Performance • How slow is it • Making virtual memory fast • Translation lookaside buffer (TLB) • Virtual Memory Meets Caching 2 Role of the Operating System Context switches, working set, shared memory 3 Role of the Operating System The operating systems (OS) manages and multiplexes memory between process. It… • Enables processes to (explicitly) increase memory: sbrk and (implicitly) decrease memory • Enables sharing of physical memory: multiplexing memory via context switching, sharing memory, and paging • Enables and limits the number of processes that can run simultaneously 4 sbrk Suppose Firefox needs a new page of memory (1) Invoke the Operating System void *sbrk(int nbytes); (2) OS finds a free page of physical memory • clear the page (fill with zeros) • add a new entry to Firefox’s PageTable 5 Context Switch Suppose Firefox is idle, but Skype wants to run (1) Firefox invokes the Operating System int sleep(int nseconds); (2) OS saves Firefox’s registers, load skype’s • (more on this later) (3) OS changes the CPU’s Page Table Base Register • Cop0:ContextRegister / CR3:PDBR (4) OS returns to Skype 6 Shared Memory Suppose Firefox and Skype want to share data (1) OS finds a free page of physical memory • clear the page (fill with zeros) • add a new
- 
												  CSE 120 Principles of Operating SystemsCCSSEE 112200 PPrriinncciipplleess ooff OOppeerraattiinngg SSyysstteemmss FFaallll 22000077 Lecture 11: Page Replacement Keith Marzullo MMeemmoorryy MMaannaaggeemmeenntt Final lecture on memory management: Goals of memory management ◆ To provide a convenient abstraction for programming ◆ To allocate scarce memory resources among competing processes to maximize performance with minimal overhead Mechanisms ◆ Physical and virtual addressing (1) ◆ Techniques: Partitioning, paging, segmentation (1) ◆ Page table management, TLBs, VM tricks (2) Policies ◆ Page replacement algorithms (3) 2 LLeeccttuurree OOvveerrvviieeww Review paging and page replacement Survey page replacement algorithms Discuss local vs. global replacement Discuss thrashing 3 LLooccaalliittyy All paging schemes depend on locality ◆ Processes reference pages in localized patterns Temporal locality ◆ Locations referenced recently likely to be referenced again Spatial locality ◆ Locations near recently referenced locations are likely to be referenced soon Although the cost of paging is high, if it is infrequent enough it is acceptable ◆ Processes usually exhibit both kinds of locality during their execution, making paging practical 4 DDeemmaanndd PPaaggiinngg ((OOSS)) Recall demand paging from the OS perspective: ◆ Pages are evicted to disk when memory is full ◆ Pages loaded from disk when referenced again ◆ References to evicted pages cause a TLB miss » PTE was invalid, causes fault ◆ OS allocates a page frame, reads page from disk ◆ When I/O completes, the OS fills