CSCI 4730/6730 Operating Systems
CPU Scheduling
Maria Hybinette, UGA
Scheduling Plans
• Introductory concepts
• Embellish on the introductory concepts
• Case studies
• Look at real-time scheduling
  – Practical systems have some theory, and lots of tweaking (hacking)
Why Schedule? Management of Resources
• Resource: anything that can be used by only a single process (or set of processes) at any instant in time
  – Not just the CPU; what else?
• A hardware device or a piece of information. Examples:
  – CPU (time, time slice)
  – Tape drive, disk space, memory (spatial)
  – A locked record in a database (information, synchronization)
• Focus today: managing the CPU, i.e., short-term scheduling
Multilevel Feedback Queue
• Give new processes high priority and a small time slice (preference to shorter jobs)
• If a process doesn't finish within its time slice, bump it to the next lower priority queue (with a larger time slice)
• Common in interactive systems
Case Studies: Early Scheduling Implementations
• Windows and early MS-DOS
  – Non-multitasking (so no scheduler needed)
• Mac OS 9
  – The kernel schedules processes with preemptive round robin (fair: each process gets a fair share of the CPU)
  – Each process schedules its multiple (MACH) threads with a cooperative thread schedule manager; each process has its own copy of the scheduler
Case Studies: Modern Scheduling Implementations
• Multilevel feedback queue w/ preemption
  – FreeBSD, NetBSD, Solaris, Linux pre-2.5
  – Example, Linux: priorities 0–99 for real-time tasks (200 ms quanta), 100–140 for nice tasks (10 ms quanta, then moved to the expired queue)
• Cooperative scheduling (no preemption)
  – Windows 3.1x, Mac OS 9 (thread level)
• O(1) scheduling – time to schedule is independent of the number of tasks in the system
  – Linux 2.5–2.6.22 (v2.6.0 was the first 2.6 version, ~2003/2004)
• Completely Fair Scheduler (CFS)
  – Maximizes CPU utilization while maximizing interactive performance; uses a red-black tree instead of run queues
  – Linux 2.6.23+
Review: Real-World Scheduling
• Characteristics of the processes
  – Are they I/O bound or CPU bound?
  – Do we have statistics about the processes, e.g., deadlines?
  – Is their behavior predictable?
• Characteristics of the machine
  – How many CPUs or cores?
  – Can we preempt processes?
  – How is memory shared by the CPUs?
• Characteristics of the user
  – Are the processes interactive (e.g., desktop apps)?
  – Or are the processes background jobs?
• These questions drove the development of the Linux scheduling protocols
Basic Scheduler Architecture
http://pages.cs.wisc.edu/~remzi/OSTEP/
• The scheduler selects from the ready processes and assigns them to a CPU
  – The system may have >1 CPU
  – There are various approaches for selecting processes
• Scheduling decisions are made when a process:
  1. Switches from running to waiting (no preemption)
  2. Terminates (no preemption)
  3. Switches from running to ready (preemption)
  4. Switches from waiting to ready (preemption)
• The scheduler may have access to additional information
  – Process deadlines, data in shared memory, etc.
Beware: Dispatch Latency
• The dispatcher gives control of the CPU to the process selected by the scheduler
  – Switches context
  – Switches to/from kernel mode/user mode
  – Saves the old EIP, loads the new EIP
• Warning: dispatching incurs a cost
  – Context switches and mode switches are expensive
  – Adds latency to processing times
• It is advantageous to minimize process switching
Review: Scheduling Optimization Criteria
• Max CPU utilization – keep the CPU as busy as possible
• Max throughput – # of processes that finish over time
• Min turnaround time – amount of time to finish a process
• Min waiting time – amount of time a ready process has been waiting to execute
• Min response time – amount of time between submitting a request and receiving a response
  – E.g., time between clicking a button and seeing a response
• Fairness – all processes receive min/max fair CPU resources
• No scheduler can meet all these criteria
  – Which criteria are most important depends on the types of processes and the expectations of the system
  – E.g., response time is key on the desktop; throughput is more important for MapReduce
Interactive Systems & Tension
• Imagine you are typing/clicking in a desktop app
  – You don't care about turnaround time
  – What you care about is responsiveness
    • E.g., if you start typing but the app doesn't show the text for 10 seconds, you'll become frustrated
• Response time = first run time − arrival time
  – Round robin optimizes for response time
• Turnaround time
  – STCF optimizes for turnaround time
RR vs. STCF

Process  Burst Time  Arrival Time
P1       6           0
P2       8           0
P3       10          0
STCF schedule: P1 (0–6), P2 (6–14), P3 (14–24)
• Avg. turnaround time: (6 + 14 + 24) / 3 = 14.7
• Avg. response time: (0 + 6 + 14) / 3 = 6.7
RR schedule (2-second time slices): P1 P2 P3 P1 P2 P3 P1 P2 P3 P2 P3, at times 0 2 4 6 8 10 12 14 16 18 20–24
• Avg. turnaround time: (14 + 20 + 24) / 3 = 19.3
• Avg. response time: (0 + 2 + 4) / 3 = 2
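The averages above can be checked with a short simulation. This is a minimal sketch (the helper names are illustrative, not from the slides) for jobs that all arrive at time 0, where STCF reduces to shortest-job-first:

```python
from collections import deque

def stcf_metrics(bursts):
    """All jobs arrive at t=0, so STCF = run jobs in increasing burst order."""
    t, turnaround, response = 0, [], []
    for b in sorted(bursts):
        response.append(t)       # first run time - arrival time (arrival = 0)
        t += b
        turnaround.append(t)     # completion time - arrival time
    n = len(bursts)
    return sum(turnaround) / n, sum(response) / n

def rr_metrics(bursts, quantum=2):
    """Round robin with a fixed quantum; all jobs arrive at t=0."""
    t, first_run, completion = 0, {}, {}
    ready = deque(enumerate(bursts))
    while ready:
        i, rem = ready.popleft()
        first_run.setdefault(i, t)   # remember when each job first ran
        run = min(quantum, rem)
        t += run
        if rem > run:
            ready.append((i, rem - run))
        else:
            completion[i] = t
    n = len(bursts)
    return sum(completion.values()) / n, sum(first_run.values()) / n

print(stcf_metrics([6, 8, 10]))   # avg turnaround 14.67, avg response 6.67
print(rr_metrics([6, 8, 10]))     # avg turnaround 19.33, avg response 2.0
```

Running both on the burst times from the table reproduces the slide's numbers, and makes the trade-off concrete: RR wins on response time, STCF on turnaround time.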
Tradeoffs

RR
+ Excellent response times: with N processes and a time slice of Q, no process waits more than (N − 1) time slices
+ Achieves fairness: each process receives 1/N of the CPU time
− Worst possible turnaround times
− If Q is large → FIFO behavior

STCF
+ Achieves optimal, low turnaround times
− Bad response times
− Inherently unfair: short jobs finish first
• Optimizing for turnaround or response time is a trade-off • Achieving both requires more sophisticated algorithms
Earliest Deadline First (EDF)
• Each process has a deadline it must finish by • Priorities are assigned according to deadlines – Tighter deadlines are given higher priority
Process  Burst Time  Arrival Time  Deadline
P1       15          0             40
P2       3           4             10
P3       6           10            20
P4       4           13            18

Timeline: P1 (0–4), P2 (4–7), P1 (7–10), P3 (10–13), P4 (13–17), P3 (17–20), P1 (20–28)
• EDF is optimal (assuming preemption) [uniprocessors] • But, it’s only useful if processes have known deadlines – Typically used in real-time OS’s
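The timeline above can be reproduced with a tick-level simulation of preemptive EDF. This is a sketch; the function name and the 1-tick granularity are illustrative choices, not part of the slides:

```python
import heapq

def edf(jobs):
    """Preemptive EDF at 1-tick granularity.
    jobs: list of (name, burst, arrival, deadline) tuples.
    Returns the schedule as (name, start, end) segments."""
    jobs = sorted(jobs, key=lambda j: j[2])           # order by arrival
    remaining = {name: burst for name, burst, _, _ in jobs}
    ready, timeline, t, i = [], [], 0, 0
    while i < len(jobs) or ready:
        while i < len(jobs) and jobs[i][2] <= t:      # admit arrivals
            heapq.heappush(ready, (jobs[i][3], jobs[i][0]))
            i += 1
        if not ready:                                 # idle until next arrival
            t = jobs[i][2]
            continue
        _, name = ready[0]                            # earliest deadline first
        timeline.append((t, name))
        remaining[name] -= 1
        t += 1
        if remaining[name] == 0:
            heapq.heappop(ready)
    segs = []                                         # merge adjacent ticks
    for time, name in timeline:
        if segs and segs[-1][0] == name and segs[-1][2] == time:
            segs[-1][2] = time + 1
        else:
            segs.append([name, time, time + 1])
    return [tuple(s) for s in segs]

jobs = [("P1", 15, 0, 40), ("P2", 3, 4, 10), ("P3", 6, 10, 20), ("P4", 4, 13, 18)]
print(edf(jobs))
```

Because priorities are re-evaluated every tick, a newly arrived process with a tighter deadline (like P2 at t=4) immediately preempts the running one, matching the example schedule.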
https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling
Rules of MLFQ
• Rule 1: If Priority(A) > Priority(B), A runs, B doesn’t • Rule 2: If Priority(A) = Priority(B), A & B run in RR • Rule 3: Processes start at the highest priority • Rule 4: – Rule 4a: If a process uses an entire time slice while running, its priority is reduced – Rule 4b: If a process gives up the CPU before its time slice is up, it remains at the same priority level
Problems With MLFQ So Far…
• Starvation: high priority processes always take precedence over low priority processes
• Cheating: an unscrupulous process that calls sleep(1ms) just before its time slice expires never gets demoted and monopolizes CPU time
[Timeline diagrams: queues Q0–Q2 over time 0–14, illustrating starvation and cheating]
MLFQ Rule 5: Priority Boost
• Rule 5: After some time period S, move all processes to the highest priority queue • Solves two problems: – Starvation: low priority processes will eventually become high priority, acquire CPU time – Dynamic behavior: a CPU bound process that has become interactive will now be high priority
Priority Boost Example
[Timeline diagrams: queues Q0–Q2 without priority boost (low-priority process starves) vs. with priority boost (all processes periodically return to Q0)]
Revised Rule 4: Cheat Prevention
• Rules 4a and 4b let a process game the scheduler
  – Repeatedly yield just before the time slice expires
• Solution: better accounting
  – Rule 4: once a process uses up its time allotment at a given priority (regardless of whether it gave up the CPU), demote its priority
  – Basically, keep track of the total CPU time used by each process during each time interval S, instead of just looking at continuous CPU time
Preventing Cheating
[Timeline diagrams: without cheat prevention, a process that sleeps 1 ms just before its time slice expires stays in Q0; with cheat prevention, the process is demoted once its time allotment is exhausted, and runs round robin with its peers]
MLFQ Rule Review
• Rule 1: If Priority(A) > Priority(B), A runs, B doesn’t • Rule 2: If Priority(A) = Priority(B), A & B run in RR • Rule 3: Processes start at the highest priority • Rule 4: Once a process uses up its time allotment at a given priority, demote it • Rule 5: After some time period S, move all processes to the highest priority queue
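The five rules can be condensed into a small simulator. This is a minimal sketch under made-up parameters (3 queues, time slices of 2/4/8 ticks, boost period 20, CPU-bound jobs all arriving at t=0); a real MLFQ also handles I/O and arrivals:

```python
from collections import deque

def mlfq(bursts, slices=(2, 4, 8), boost=20):
    """Simulate the five MLFQ rules. bursts: {name: ticks}.
    Returns {name: finish_time}."""
    queues = [deque(bursts)] + [deque() for _ in slices[1:]]  # Rule 3: start on top
    remaining, finish, t, next_boost = dict(bursts), {}, 0, boost
    while remaining:
        if t >= next_boost:                    # Rule 5: periodic priority boost
            for q in queues[1:]:
                while q:
                    queues[0].append(q.popleft())
            next_boost += boost
        lvl = next(i for i, q in enumerate(queues) if q)  # Rule 1: highest first
        p = queues[lvl].popleft()              # Rule 2: round robin within a level
        run = min(slices[lvl], remaining[p])
        t += run
        remaining[p] -= run
        if remaining[p] == 0:
            finish[p] = t
            del remaining[p]
        else:                                  # Rule 4: allotment used -> demote
            queues[min(lvl + 1, len(queues) - 1)].append(p)
    return finish

print(mlfq({"A": 4, "B": 10}))   # {'A': 6, 'B': 14}
```

The short job A finishes while still near the top of the hierarchy; the long job B sinks to the bottom queue, which is the intended "approximate SJF without prior knowledge" behavior.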
Parameterizing MLFQ
• MLFQ meets our goals – Balances response time and turnaround time – Does not require prior knowledge about processes • But, it has many knobs to tune – Number of queues? – How to divide CPU time between the queues? – For each queue: • Which scheduling regime to use? • Time slice/quantum? – Method for demoting priorities? – Method for boosting priorities?
MLFQ In Practice
• Many OSs use MLFQ-like schedulers – Example: Windows NT/2000/XP/Vista, Solaris, FreeBSD • OSs ship with “reasonable” MLFQ parameters – Variable length time slices • High priority queues – short time slices • Low priority queues – long time slices – Priority 0 sometimes reserved for OS processes
Giving Hints/Advice
• Some OSes allow users/processes to give the scheduler “hints” about priorities • Example: nice command on Linux $ nice
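The same hint can also be given programmatically. A small sketch using Python's `os.nice` (Unix only); note that an unprivileged process may raise its nice value but not lower it:

```python
import os

# os.nice(increment) adds `increment` to the calling process's nice value
# and returns the new value; an increment of 0 just reports the current one.
current = os.nice(0)
raised = os.nice(5)    # volunteer for lower priority (higher niceness)
print(current, raised)
```

This is the process-side equivalent of launching a job with `nice` from the shell: a hint to the scheduler, not a hard guarantee.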
Recap
• The schedulers discussed so far are designed to optimize performance:
  – Minimum response times
  – Minimum turnaround times
• MLFQ achieves these goals, but it's complicated
  – Non-trivial to implement
  – Challenging to parameterize and tune
• What about a simple algorithm that achieves fairness?
Ideal Fair Scheduling
• Divide processor time equally among processes
• Ideal fairness: if there are N jobs/processes in the system, each process should get (100/N)% of the CPU time

Job  Burst (ms)
A    8
B    4
C    16
D    4
[Diagram: with a 4 ms slice, each slice gives all remaining jobs a chance to run; cumulative CPU time per job — A: 1 2 3 4 6 8; B: 1 2 3 4; C: 1 2 3 4 6 8 12 16; D: 1 2 3 4]
Ideal Fairness
• Not feasible: equal sharing would require infinitely small time slices
  – The overheads due to context switching and scheduling would become significant
• Modern schedulers that account for fairness, such as CFS, use an approximation of ideal fairness
Lottery Scheduling
• Key idea: proportional-share scheduling
  – The scheduler gives each process a number of tickets
  – Each time slice, the scheduler holds a lottery
  – The process holding the winning ticket gets to run
Process  Arrival Time  Ticket Range
P1       0             0–74 (75 total)
P2       0             75–99 (25 total)

Timeline (2 ms slices, times 0–22): P1 P2 P1 P1 P1 P2 P2 P1 P1 P1 P1
• P1 ran 8 of 11 slices – 72%
• P2 ran 3 of 11 slices – 27%
• Probabilistic scheduling
  – Over time, the run time for each process converges to the correct value (i.e., proportional to the # of tickets it holds)
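The whole scheduler is just a random draw over ticket ranges. A minimal sketch (the fixed seed and the 75/25 ticket split from the example are illustrative):

```python
import random

def lottery(tickets, slices, seed=42):
    """tickets: {name: count}. Each slice, draw one uniformly random
    winning ticket and run the process whose range contains it."""
    rng = random.Random(seed)    # seeded only to make the sketch repeatable
    total = sum(tickets.values())
    wins = {p: 0 for p in tickets}
    for _ in range(slices):
        draw = rng.randrange(total)
        for p, count in tickets.items():   # walk the contiguous ticket ranges
            if draw < count:
                wins[p] += 1
                break
            draw -= count
    return wins

# Over many slices, CPU shares converge to the ticket shares (75% / 25%).
wins = lottery({"P1": 75, "P2": 25}, 10_000)
print(wins["P1"] / 10_000, wins["P2"] / 10_000)
```

With only 11 slices, as in the example above, the split can land at 8/3 rather than exactly 75/25; that short-horizon noise is exactly the unfairness discussed on the next slide.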
https://en.wikipedia.org/wiki/Lottery_scheduling
Implementation Advantages
• Very fast scheduler execution
  – All the scheduler needs to do is run random()
  – No need to manage sorted priority queues (O(log N) with a heap or self-balancing tree, O(N) with an unbalanced tree)
• No need to store lots of state
  – The scheduler only needs to know the total number of tickets
  – No need to track process behavior or history
• Automatically balances CPU time across processes
  – New processes get some tickets, which adjusts the overall size of the ticket pool
• Easy to prioritize processes
  – Give high priority processes many tickets
  – Give low priority processes a few tickets
  – Priorities can change via ticket inflation (i.e., minting tickets)
Is Lottery Scheduling Fair?
• Does lottery scheduling achieve fairness?
  – Assume two processes with an equal # of tickets
  – Study: the X axis varies the runtime of the processes
  – Unfairness ratio = 1 is "fair": both processes finish at the same time
  – Unfair: they do not finish at the same time
• Unfairness: the time the first job completes divided by the time the second job completes
• Lottery is unfair to short jobs, due to randomness; the randomness is amortized over long time scales
Stride Scheduling
• Randomness lets us build a simple and approximately fair scheduler
  – But fairness is not guaranteed!
• Why not build a deterministic, fair scheduler?
• Stride scheduling
  – Each process is given some tickets
  – Each process has a stride = a big number / its # of tickets
    • A process with many tickets has a small stride, and gets to run more often
  – Each time a process runs, its pass += stride
  – The scheduler chooses the process with the lowest pass to run next
  – Processes with lower strides get to run more frequently
Stride Scheduling Example
Process  Arrival Time  Tickets  Stride (K = 10,000)
P1       0             100      100
P2       0             50       200
P3       0             250      40

• P1: 100 of 400 tickets – 25%
• P2: 50 of 400 tickets – 12.5%
• P3: 250 of 400 tickets – 62.5%

P1 pass  P2 pass  P3 pass  Who runs?
0        0        0        P1
100      0        0        P2
100      200      0        P3
100      200      40       P3
100      200      80       P3
100      200      120      P1
200      200      120      P3
200      200      160      P3
200      200      200      …

• P1 ran 2 of 8 slices – 25%
• P2 ran 1 of 8 slices – 12.5%
• P3 ran 5 of 8 slices – 62.5%
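The pass/stride table can be reproduced with a short deterministic loop. A sketch, with ties broken by position in the ticket list (matching the table's order):

```python
def stride_schedule(tickets, slices, big=10_000):
    """tickets: list of (name, ticket_count). Each process's stride is
    big / tickets; each slice, the lowest pass runs and pays its stride."""
    stride = {name: big // t for name, t in tickets}
    passv = {name: 0 for name, _ in tickets}
    order = []
    for _ in range(slices):
        # lowest pass wins; min() breaks ties by list position
        winner = min((name for name, _ in tickets), key=lambda n: passv[n])
        order.append(winner)
        passv[winner] += stride[winner]
    return order

order = stride_schedule([("P1", 100), ("P2", 50), ("P3", 250)], 8)
print(order)   # ['P1', 'P2', 'P3', 'P3', 'P3', 'P1', 'P3', 'P3']
```

After 8 slices each process has run exactly in proportion to its ticket share (2, 1, and 5 slices), with no randomness involved.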
Summary & Lingering Issues
• Why stride over lottery scheduling?
  – It is deterministic
  – It is more "fair" over shorter time scales
• Why choose lottery over stride scheduling?
  – Stride schedulers need to store a lot more state; lottery does not need global state
  – How does a stride scheduler deal with new processes?
    • A new process enters at pass = 0
    • So new jobs will monopolize the CPU, and it takes time for the others to catch up
• Both schedulers require ticket assignment
  – How do you know how many tickets to assign to each process?
  – This is an open problem
• What about systems with multiple CPUs? – Things get a lot more complicated when the number of CPUs > 1 • Other things to consider – Single Queue for multiple processors • Does not support Processor Affinity
Single Queue Scheduling Across Multiprocessors
• Single Queue Multiprocessor Scheduling (SQMS)
  – The most basic design: all processes go into a single queue
  – CPUs pull tasks from the queue as needed
  – Good for load balancing (CPUs pull processes on demand)
[Diagram: a single process queue (P1–P5) feeding CPUs 0–3, each CPU pulling the next available process]

Problems with SQMS
• The process queue is a shared data structure – Necessitates locking, or careful lock-free design • SQMS does not respect cache affinity
[Diagram: under SQMS, processes migrate across CPUs over time; in the worst case, processes rarely run on the same CPU twice, defeating cache affinity]
Linux Schedulers
• Linux 1.2: round robin (textbook)
• Linux 2.4: global queues (textbook)
  – Protected by locks
  – Simple: 3 types of processes:
    • Preemptible user/normal
    • Non-preemptible FIFO
    • Non-preemptible real-time round robin
  – Selects a process according to a goodness function
  – Poor performance on multiprocessors/multicore
  – Poor performance when N is large (N = # of processes)
• Linux 2.5 and early versions of Linux 2.6: O(1) scheduler, per-CPU run queues
  – Solves the performance problems of the old scheduler
  – Complex, error-prone logic to boost interactivity
  – No guarantee of fairness
• Linux 2.6.23+: Completely Fair Scheduler (CFS)
  – Fair
  – Naturally boosts interactivity
Linux: O(1) Scheduler (brief discussion)
• Replaced the old O(N) scheduler
  – Designed to reduce the cost of context switching
  – Used in kernels prior to 2.6.23
• Implements RR within a multilevel feedback (MLFQ) paradigm
  – The scheduler picks the first process from the highest-priority queue with a ready process
  – Each queue has a time quantum that scales with priority
  – 140 priority levels, 2 queues per priority: an active and an inactive (expired) queue
    • Processes are scheduled from the active queue
    • When the active queue is empty, it is refilled from the inactive queue
  – Uses a bitmap over the 140 queues to achieve O(1) selection
• unsigned long bitmap[5] → 160 bits (more than 140)
  – 32 × 5 = 160, while 32 × 4 = 128 would not be enough
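The bitmap trick can be sketched in a few lines (illustrative helpers, not kernel code): one bit per priority level, so finding the highest-priority non-empty queue costs at most five word tests plus one find-first-set, independent of the number of tasks:

```python
NLEVELS, WORD = 140, 32   # 140 priority levels, 32-bit words -> 5 words

def set_ready(bitmap, prio):
    """Mark priority level `prio` as having a runnable process."""
    bitmap[prio // WORD] |= 1 << (prio % WORD)

def highest_ready(bitmap):
    """Return the lowest-numbered set bit (= highest priority), or None.
    (word & -word) isolates the lowest set bit of a word."""
    for w, word in enumerate(bitmap):         # at most 5 iterations
        if word:
            return w * WORD + ((word & -word).bit_length() - 1)
    return None

bitmap = [0] * ((NLEVELS + WORD - 1) // WORD)   # 5 words
set_ready(bitmap, 42)
set_ready(bitmap, 7)
print(highest_ready(bitmap))   # 7
```

In the kernel the find-first-set step is a single hardware instruction (e.g. `bsf` on x86), which is what makes selection constant time.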
Priority Assignment
• Static priorities
  – nice values [−20, 19]
  – Default = 0
  – Used for time slice calculation
• Dynamic priorities [0, 139]
  – Used to demote CPU-bound processes
  – Maintain high priorities for interactive processes
  – The sleep() time of each process is measured: high sleep time → interactive or I/O bound → high priority
SMP / NUMA Support
• Processes are placed into a virtual hierarchy
  – Groups are scheduled onto a physical CPU
  – Processes are preferentially pinned to individual cores, to leverage warmed-up caches
• Problem: not fair
Completely Fair Scheduler (CFS)
• Replaced the O(1) scheduler
  – In use since 2.6.23; has O(log N) runtime
• Moves from MLFQ to weighted fair queueing
  – First major OS to use a fair scheduling algorithm
  – Very similar to stride scheduling
  – Processes are ordered by the amount of CPU time they have used
• Gets rid of the active/inactive run queues in favor of a red-black tree of processes
  – A red-black tree is a balanced tree: an unbalanced tree can degrade to O(N) operations, while balance ensures O(log N)
• CFS isn't actually "completely fair"
  – Unfairness is bounded: O(N)
Red-Black Process Tree
• Tree organized according to amount of CPU time used by each process – Measured in nanoseconds, obviates the need for time slices
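The core selection rule ("always run the task that has used the least CPU time") can be sketched compactly. Real CFS keeps a red-black tree and picks the left-most node; this sketch substitutes a min-heap, which gives the same least-used-first selection for equal-weight tasks (the function names and the fixed slice are made up):

```python
import heapq

def cfs_run(bursts, slice_ns=4_000_000):
    """Toy equal-weight fair loop. bursts: {name: total_ns}.
    Returns the order in which tasks were picked."""
    rq = [(0, name) for name in sorted(bursts)]   # (vruntime_ns, task)
    heapq.heapify(rq)
    remaining, order = dict(bursts), []
    while rq:
        vrt, p = heapq.heappop(rq)    # least vruntime = left-most node
        order.append(p)
        ran = min(slice_ns, remaining[p])
        remaining[p] -= ran
        if remaining[p]:
            # charge the task for the CPU time it used and re-insert;
            # in CFS this is the O(log N) tree re-insertion
            heapq.heappush(rq, (vrt + ran, p))
    return order

print(cfs_run({"A": 8_000_000, "B": 4_000_000}))   # ['A', 'B', 'A']
```

Because each task's key grows by exactly the time it ran, a task that has run less always sorts first, which is the fairness invariant the red-black tree maintains.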
[Diagram: red-black tree keyed by CPU time used; when a process stops running it is added back and the tree rebalanced; the left-most process, which has always used the least CPU time, is scheduled next]

BF Scheduler
• What does BF stand for?
  – Look it up yourself
• Alternative to CFS, introduced in 2009
  – O(N) runtime, single run queue
  – Dead simple implementation
• Goal: a simple scheduling algorithm with fewer parameters that need manual tuning
  – Designed for light NUMA workloads
  – Doesn't scale beyond 16 cores
• For the adventurous: download the BFS patches and build yourself a custom kernel
https://en.wikipedia.org/wiki/BF
Energy/Power & Scheduling.