CSCI 4730/6730 Operating Systems

CPU

Maria Hybinette, UGA

Scheduling Plans

• Introductory concepts
• Embellish on the introductory concepts
• Case studies
• Look at real-time scheduling
– Practical systems have some theory, and lots of tweaking (hacking).

Why Schedule? Management of Resources
• Resource: anything that can be used by only a single process at any instant in time
– Not just the CPU; what else?
• A hardware device or a piece of information
– Examples:
• CPU (time, time slice)
• Tape drive, disk space, memory (spatial)
• Locked record in a database (information, synchronization)
• Focus today: managing the CPU, i.e. short term scheduling.


Multilevel Feedback Queue

• Give new processes high priority and a small time slice (preference to smaller jobs)
• If a process doesn't finish within its time slice, bump it to the next lower priority queue (with a larger time slice)
• Common in interactive systems

Case Studies: Early Scheduling Implementations
• Windows and early MS-DOS
– Non-multitasking (so no scheduler needed)
• Mac OS 9
– Kernel schedules processes:
• Round robin, preemptive (fair: each process gets a fair share of the CPU)
– Processes:
• Schedule multiple (MACH) threads that use a cooperative schedule manager – each process has its own copy of the scheduler.


Case Studies: Modern Scheduling Implementations
• Multilevel feedback queue w/ preemption
– FreeBSD, NetBSD, Solaris, Linux pre-2.5
– Example, Linux: 0-99 real time tasks (200 ms quanta), 100-140 nice tasks (10 ms quanta → expired queue)
• Cooperative scheduling (no preemption)
– Windows 3.1x, Mac OS pre3 (thread level)
• O(1) scheduling – time to schedule is independent of the number of tasks in the system
– Linux 2.5-2.6.24 (v2.6.0 was the first version, ~2003/2004)
• Completely Fair Scheduler (linked reading)
– Maximizes CPU utilization while maximizing interactive performance; red-black tree instead of a queue
– Linux 2.6.23+

Review: Real World Scheduling

• Characteristics of the processes
– Are they I/O bound or CPU bound?
– Do we have statistics about the processes, e.g., deadlines?
– Is their behavior predictable?
• Characteristics of the machine
– How many CPUs or cores?
– Can we preempt processes?
– How is memory shared by the CPUs?
• Characteristics of the user
– Are the processes interactive (e.g. desktop apps), or are they background jobs?
– These characteristics drove the development of the Linux scheduling protocols.

Basic Scheduler Architecture
http://pages.cs.wisc.edu/~remzi/OSTEP/

• Scheduler selects from the ready processes, and assigns them to a CPU
– System may have >1 CPU
– Various different approaches for selecting processes
• Scheduling decisions are made when a process:
1. Switches from running to waiting (no preemption)
2. Terminates (no preemption)
3. Switches from running to ready (preemption)
4. Switches from waiting to ready (preemption)
• Scheduler may have access to additional information
– Process deadlines, data in shared memory, etc.


Beware: Dispatch Latency
• The dispatcher gives control of the CPU to the process selected by the scheduler
– Switches context
– Switches to/from kernel mode/user mode
– Saves the old EIP, loads the new EIP
• Warning: dispatching incurs a cost
– Context switches and mode switches are expensive
– Adds latency to processing times
• It is advantageous to minimize process switching

Review: Scheduling Optimization Criteria
• Max CPU utilization – keep the CPU as busy as possible
• Max throughput – # of processes that finish over time
• Min turnaround time – amount of time to finish a process
• Min waiting time – amount of time a ready process has been waiting to execute
• Min response time – amount of time between submitting a request and receiving a response
– E.g. time between clicking a button and seeing a response
• Fairness – all processes receive min/max fair CPU resources
• No scheduler can meet all these criteria
– Which criteria are most important depend on the types of processes and the expectations of the system
– E.g. response time is key on the desktop; throughput is more important for MapReduce


Interactive Systems & Tension
• Imagine you are typing/clicking in a desktop app
– You don't care about turnaround time
– What you care about is responsiveness
• E.g. if you start typing but the app doesn't show the text for 10 seconds, you'll become frustrated
• Response time = first run time – arrival time
– Round Robin optimizes for this
• Turnaround time:
– STCF optimizes for this

RR vs. STCF

Process  Burst Time  Arrival Time
P1       6           0
P2       8           0
P3       10          0

STCF schedule: P1 P2 P3
Time:          0  6  14  24
• Avg. turnaround time: (6 + 14 + 24) / 3 = 14.7
• Avg. response time: (0 + 6 + 14) / 3 = 6.7

RR schedule (2 second time slices): P1 P2 P3 P1 P2 P3 P1 P2 P3 P2 P3
Time:                               0  2  4  6  8  10 12 14 16 18 20 24
• Avg. turnaround time: (14 + 20 + 24) / 3 = 19.3
• Avg. response time: (0 + 2 + 4) / 3 = 2
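The averages above can be checked with a short simulation. This is a sketch of my own (not from the slides); the function and variable names are illustrative:

```python
# Sketch: compute avg. turnaround and response time for the STCF and RR
# schedules above. All three jobs arrive at t = 0.
bursts = {"P1": 6, "P2": 8, "P3": 10}

def stcf_metrics(bursts):
    """Run jobs shortest-first to completion; return (turnaround, response) averages."""
    t, turn, resp = 0, [], []
    for name in sorted(bursts, key=bursts.get):
        resp.append(t)            # first run time - arrival (arrival = 0)
        t += bursts[name]
        turn.append(t)            # completion time - arrival
    return sum(turn) / len(turn), sum(resp) / len(resp)

def rr_metrics(bursts, quantum=2):
    """Round robin with the given quantum; return (turnaround, response) averages."""
    remaining = dict(bursts)
    first_run, finish, t = {}, {}, 0
    queue = list(bursts)
    while queue:
        name = queue.pop(0)
        first_run.setdefault(name, t)
        run = min(quantum, remaining[name])
        t += run
        remaining[name] -= run
        if remaining[name] == 0:
            finish[name] = t
        else:
            queue.append(name)
    turn = [finish[n] for n in bursts]
    resp = [first_run[n] for n in bursts]
    return sum(turn) / len(turn), sum(resp) / len(resp)

print(stcf_metrics(bursts))   # turnaround ~14.67, response ~6.67
print(rr_metrics(bursts))     # turnaround ~19.33, response 2.0
```

The printed averages match the slide's hand calculations, illustrating the trade-off: STCF wins on turnaround, RR wins on response time.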

Tradeoffs

RR:
+ Excellent response times – with N processes and a time slice of Q, no process waits more than (N-1)·Q time units
+ Achieves fairness – each process receives 1/N of the CPU time
- Worst possible turnaround times – if Q is large, degrades to FIFO behavior

STCF:
+ Achieves optimal, low turnaround times
- Bad response times
- Inherently unfair – short jobs finish first

• Optimizing for turnaround or response time is a trade-off • Achieving both requires more sophisticated algorithms

Earliest Deadline First (EDF)

• Each process has a deadline it must finish by • Priorities are assigned according to deadlines – Tighter deadlines are given higher priority

Process  Burst Time  Arrival Time  Deadline
P1       15          0             40
P2       3           4             10
P3       6           10            20
P4       4           13            18

Schedule: P1 P2 P1 P3  P4  P3  P1
Time:     0  4  7  10  13  17  20  28

• EDF is optimal on uniprocessors (assuming preemption)
• But it's only useful if processes have known deadlines
– Typically used in real-time OSs
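The schedule above can be reproduced with a small tick-by-tick simulation of preemptive EDF. A sketch, using the process table from the slide:

```python
# Sketch: preemptive EDF simulation over the example processes above.
procs = {  # name: (arrival, burst, deadline)
    "P1": (0, 15, 40),
    "P2": (4, 3, 10),
    "P3": (10, 6, 20),
    "P4": (13, 4, 18),
}

def edf(procs):
    """Simulate 1-tick preemptive EDF; return a list of (start, end, name) segments."""
    remaining = {n: b for n, (a, b, d) in procs.items()}
    timeline, t = [], 0
    while any(remaining.values()):
        ready = [n for n in procs if procs[n][0] <= t and remaining[n] > 0]
        if not ready:                                # CPU idle until next arrival
            t += 1
            continue
        n = min(ready, key=lambda p: procs[p][2])    # earliest deadline wins
        if timeline and timeline[-1][2] == n:
            timeline[-1] = (timeline[-1][0], t + 1, n)   # extend current segment
        else:
            timeline.append((t, t + 1, n))
        remaining[n] -= 1
        t += 1
    return timeline

for start, end, name in edf(procs):
    print(f"{start:2}-{end:2}: {name}")
```

Running it yields exactly the slide's schedule: P1 (0-4), P2 (4-7), P1 (7-10), P3 (10-13), P4 (13-17), P3 (17-20), P1 (20-28).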

https://en.wikipedia.org/wiki/Earliest_deadline_first_scheduling

Rules of MLFQ

• Rule 1: If Priority(A) > Priority(B), A runs, B doesn't
• Rule 2: If Priority(A) = Priority(B), A & B run in RR
• Rule 3: Processes start at the highest priority
• Rule 4:
– Rule 4a: If a process uses an entire time slice while running, its priority is reduced
– Rule 4b: If a process gives up the CPU before its time slice is up, it remains at the same priority level

Problems With MLFQ So Far…

• Starvation: high priority processes always take precedence over low priority processes
[Figure: Q0/Q1/Q2 occupancy over time 0-14; low priority processes never run]
• Cheating: an unscrupulous process that calls sleep(1ms) just before its time slice expires never gets demoted and monopolizes CPU time
[Figure: Q0/Q1/Q2 occupancy over time 0-14; the cheater stays in Q0]

MLFQ Rule 5: Priority Boost

• Rule 5: After some time period S, move all processes to the highest priority queue
• Solves two problems:
– Starvation: low priority processes will eventually become high priority and acquire CPU time
– Dynamic behavior: a CPU bound process that has become interactive will now be high priority

Priority Boost Example
[Figure: Q0/Q1/Q2 occupancy over time, without priority boost (starvation) vs. with priority boost]


Revised Rule 4: Cheat Prevention

• Rules 4a and 4b let a process game the scheduler
– Repeatedly yield just before the time limit expires
• Solution: better accounting
– Rule 4: Once a process uses up its time allotment at a given priority (regardless of whether it gave up the CPU), demote its priority
– Basically, keep track of the total CPU time used by each process during each time interval S
• Instead of just looking at continuous CPU time

Preventing Cheating
[Figure: without cheat prevention, a process that calls sleep(1ms) just before its time slice expires stays in Q0; with cheat prevention, its time allotment is exhausted, it is demoted through Q1 and Q2, and the bottom queue runs round robin]


MLFQ Rule Review

• Rule 1: If Priority(A) > Priority(B), A runs, B doesn't
• Rule 2: If Priority(A) = Priority(B), A & B run in RR
• Rule 3: Processes start at the highest priority
• Rule 4: Once a process uses up its time allotment at a given priority, demote it
• Rule 5: After some time period S, move all processes to the highest priority queue
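Rules 1-5 can be sketched as a toy scheduler. This is an illustrative sketch, not a real implementation: the number of queues, the quantum, and the boost period S are arbitrary choices, and the `used` argument to `run_one_slice` stands in for how much of its slice the selected process actually consumed before finishing or sleeping:

```python
from collections import deque

# Toy MLFQ implementing Rules 1-5 (parameters are illustrative).
class MLFQ:
    def __init__(self, levels=3, quantum=2, boost_period=20):
        self.queues = [deque() for _ in range(levels)]  # queues[0] = highest
        self.quantum = quantum
        self.boost_period = boost_period
        self.allotment = {}    # CPU used at current level (revised Rule 4)
        self.time = 0

    def admit(self, proc):                 # Rule 3: start at highest priority
        self.queues[0].append(proc)
        self.allotment[proc] = 0

    def boost(self):                       # Rule 5: everyone back to the top
        for q in self.queues[1:]:
            while q:
                p = q.popleft()
                self.queues[0].append(p)
                self.allotment[p] = 0

    def run_one_slice(self, used):
        """Run the next process for `used` <= quantum ticks; Rules 1, 2, 4."""
        if self.time and self.time % self.boost_period == 0:
            self.boost()
        for level, q in enumerate(self.queues):    # Rule 1: highest level first
            if q:
                p = q.popleft()                    # Rule 2: RR within a level
                self.time += used
                self.allotment[p] += used
                # Revised Rule 4: demote once the total allotment at this level
                # is used up, even if the process slept before its slice expired
                if self.allotment[p] >= self.quantum and level + 1 < len(self.queues):
                    self.allotment[p] = 0
                    self.queues[level + 1].append(p)
                else:
                    q.append(p)
                return p
        return None
```

Because the allotment accumulates across slices, a process that repeatedly sleeps just before its slice expires is still demoted, which is exactly the cheat prevention described above.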

Parameterizing MLFQ

• MLFQ meets our goals
– Balances response time and turnaround time
– Does not require prior knowledge about processes
• But, it has many knobs to tune
– Number of queues?
– How to divide CPU time between the queues?
– For each queue: which scheduling regime to use? What time slice/quantum?
– Method for demoting priorities?
– Method for boosting priorities?


MLFQ In Practice

• Many OSs use MLFQ-like schedulers
– Examples: Windows NT/2000/XP/Vista, Solaris, FreeBSD
• OSs ship with “reasonable” MLFQ parameters
– Variable length time slices
• High priority queues – short time slices
• Low priority queues – long time slices
– Priority 0 is sometimes reserved for OS processes

Giving Hints/Advice

• Some OSes allow users/processes to give the scheduler “hints” about priorities
• Example: the nice command on Linux
$ nice -n <adjustment> <command>
– Runs the command at the specified priority
– Priorities range from -20 (high) to 19 (low)


Recap

• The schedulers discussed so far are designed to optimize performance:
– Minimum response times
– Minimum turnaround times
• MLFQ achieves these goals, but it's complicated
– Non-trivial to implement
– Challenging to parameterize and tune
• What about a simple algorithm that achieves fairness?

Ideal Fair Scheduling

• Divide processor time equally among processes
• Ideal fairness: if there are N jobs/processes in the system, each process should get (100/N)% of the CPU time

Jobs  Burst (ms)
A     8
B     4
C     16
D     4

(4 ms slices; at each slice, give all jobs a chance to run)
Cumulative CPU time per job after each round:
A: 1 2 3 4 6 8
B: 1 2 3 4
C: 1 2 3 4 6 8 12 16
D: 1 2 3 4


Ideal Fairness

• Not feasible: infinitely small time slices
– Perfectly equal sharing is not feasible: the overheads due to context switching and scheduling would become significant
• Modern schedulers that account for fairness, such as CFS, use an approximation of ideal fairness

Lottery Scheduling
• Key idea: proportional share scheduling
– The scheduler gives each process a number of tickets
– Each time slice, the scheduler holds a lottery
– The process holding the winning ticket gets to run

Process  Arrival Time  Ticket Range
P1       0             0-74 (75 total)
P2       0             75-99 (25 total)

Schedule: P1 P2 P1 P1 P1 P2 P2 P1 P1 P1 P1
Time:     0  2  4  6  8  10 12 14 16 18 20 22
• P1 ran 8 of 11 slices – 72%
• P2 ran 3 of 11 slices – 27%
• Probabilistic scheduling
– Over time, the run time for each process converges to the correct value (i.e. proportional to the # of tickets it holds)

https://en.wikipedia.org/wiki/Lottery_scheduling

Implementation Advantages
• Very fast scheduler execution
– All the scheduler needs to do is run random()
– No need to manage priority queues (O(log N) with a heap or self-balancing tree; O(N) if unbalanced)
• No need to store lots of state
– The scheduler only needs to know the total number of tickets
– No need to track process behavior or history
• Automatically balances CPU time across processes
– New processes get some tickets, adjusting the overall size of the ticket pool
• Easy to prioritize processes
– Give high priority processes many tickets
– Give low priority processes a few tickets
– Priorities can change via ticket inflation (i.e. minting tickets)

Is Lottery Scheduling Fair?
• Does lottery scheduling achieve fairness?
– Assume two processes with equal # of tickets
– Study: the runtime of the processes varies (X axis)
– Unfairness ratio = 1 is 'fair': both processes finish at the same time
– Unfair if they do not finish at the same time (the difference)
• Lottery is unfair to short jobs: randomness is only amortized over long time scales

Unfairness : The time the first job completes divided by the time that the second job completes.


Stride Scheduling

• Randomness lets us build a simple and approximately fair scheduler
– But fairness is not guaranteed!
• Why not build a deterministic, fair scheduler?
– Each process is given some tickets
– Each process has a stride = a large constant / its # of tickets
• A process with many tickets has a small stride, so it gets to run more
– Each time a process runs, its pass += stride
– The scheduler chooses the process with the lowest pass to run next
• Lower strides get to run more frequently.

Stride Scheduling Example

Process  Arrival Time  Tickets  Stride (K = 10,000)
P1       0             100      100
P2       0             50       200
P3       0             250      40

P1 pass  P2 pass  P3 pass  Who runs?
0        0        0        P1
100      0        0        P2
100      200      0        P3
100      200      40       P3
100      200      80       P3
100      200      120      P1
200      200      120      P3
200      200      160      P3
200      200      200      …

• P1: 100 of 400 tickets – 25%; ran 2 of 8 slices – 25%
• P2: 50 of 400 tickets – 12.5%; ran 1 of 8 slices – 12.5%
• P3: 250 of 400 tickets – 62.5%; ran 5 of 8 slices – 62.5%
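The trace above can be reproduced directly. A sketch, with pass-value ties broken by process name so the order matches the slide's table:

```python
# Stride scheduling sketch reproducing the example above (K = 10,000).
K = 10_000
tickets = {"P1": 100, "P2": 50, "P3": 250}
stride = {p: K // t for p, t in tickets.items()}   # P1: 100, P2: 200, P3: 40
pass_val = {p: 0 for p in tickets}

schedule = []
for _ in range(8):
    # Run the process with the lowest pass; break ties by name.
    p = min(pass_val, key=lambda n: (pass_val[n], n))
    schedule.append(p)
    pass_val[p] += stride[p]

print(schedule)   # -> ['P1', 'P2', 'P3', 'P3', 'P3', 'P1', 'P3', 'P3']
```

After 8 slices the shares are exactly proportional to the tickets (2/8, 1/8, 5/8), which is the determinism stride scheduling buys over the lottery.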


Summary & Lingering Issues

• Why stride over lottery scheduling?
– It is deterministic
– It is more “fair” over shorter time scales
• Why choose lottery over stride scheduling?
– Stride schedulers need to store a lot more state; lottery needs no global state
– How does a stride scheduler deal with new processes?
• A new process enters at pass = 0
• So new jobs will monopolize the CPU, and it takes time for them to catch up
• Both schedulers require ticket assignment
– How do you know how many tickets to assign to each process?
– This is an open problem


• What about systems with multiple CPUs?
– Things get a lot more complicated when the number of CPUs > 1
• Other things to consider
– A single queue for multiple processors
• Does not respect cache affinity


Single Queue Scheduling Across Multiprocessors
• Single Queue Multiprocessor Scheduling (SQMS)
– Most basic design: all processes go into a single queue
– CPUs pull tasks from the queue as needed
– Good for load balancing (CPUs pull processes on demand)

[Figure: processes P1-P5 wait in a single process queue; CPU 0 runs P1, CPU 1 runs P2, CPU 2 runs P3, CPU 3 runs P4]

Problems with SQMS

• The process queue is a shared data structure – Necessitates locking, or careful lock-free design • SQMS does not respect cache affinity

[Figure: worst case scenario – as CPUs pull from the shared queue, processes rarely run on the same CPU twice]
CPU 0: P1 P5 P4 P3
CPU 1: P2 P1 P5 P4
CPU 2: P3 P2 P1 P5
CPU 3: P4 P3 P2 P1
(time →)

Linux Schedulers

• Linux 1.2: Round Robin (text book)
• Linux 2.4: global queues (text book)
– Protected by locks
– Simple: 3 types of processes:
• Preemptable user/normal
• Non-preemptable FIFO
• Non-preemptable real time Round Robin
– Selects a process according to a goodness function
– Poor performance on multiprocessors/cores
– Poor performance when N is large (N is the # of processes)
• Linux 2.5 and early versions of Linux 2.6:
– O(1) scheduler, per-CPU run queue
• Solves performance problems in the old scheduler
• Complex, error prone logic to boost interactivity
• No guarantee of fairness
• Linux 2.6: Completely Fair Scheduler (CFS)
– Fair
– Naturally boosts interactivity

Linux: O(1) Scheduler (brief discussion)
• Replaced the old O(n) scheduler
– Designed to reduce the cost of context switching
– Used in kernels prior to 2.6.23
• Implements RR within a multilevel feedback (MLFQ) paradigm
– The scheduler picks the first process from the highest priority queue with a ready process
– Each queue has a time quantum that scales with priority
– 140 priority levels, 2 queues per priority: active and inactive (expired)
– Processes are scheduled from the active queue
• When the active queue is empty, refill it from the inactive queue
• Uses a bitmap over the 140 queues to find the highest priority nonempty queue in O(1)
– unsigned long bitmap[5] covers 160 bits: 32 × 5 = 160 (more than 140), whereas 32 × 4 = 128 would be too few
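The bitmap trick can be sketched as follows. This is illustrative Python; the real kernel keeps one bit per priority level in the per-CPU `unsigned long bitmap[5]` and uses a hardware find-first-set instruction:

```python
# Illustrative sketch of the O(1) queue lookup: one bit per priority level,
# set when that level's run queue is non-empty. Finding the highest-priority
# runnable task is a find-first-set over the bitmap, independent of the
# number of tasks in the system.
NUM_PRIOS = 140          # 5 * 32 = 160 bits are enough to cover 140 levels

def mark_runnable(bitmap, prio):
    return bitmap | (1 << prio)      # queue at `prio` became non-empty

def mark_empty(bitmap, prio):
    return bitmap & ~(1 << prio)     # queue at `prio` drained

def highest_runnable(bitmap):
    """Index of the lowest set bit (priority 0 = highest), or None."""
    if bitmap == 0:
        return None
    return (bitmap & -bitmap).bit_length() - 1

bitmap = 0
bitmap = mark_runnable(bitmap, 120)   # a nice task becomes runnable
bitmap = mark_runnable(bitmap, 3)     # a real-time task becomes runnable
print(highest_runnable(bitmap))       # -> 3
```

`bitmap & -bitmap` isolates the lowest set bit, which is why the lookup cost does not grow with the number of tasks.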

Priority Assignment

• Static priorities
– nice values [-20, 19]
– Default = 0
– Used for time slice calculation
• Dynamic priorities [0, 139]
– Used to demote CPU bound processes
– Maintain high priorities for interactive processes
– sleep() time for each process is measured
• High sleep time → interactive or I/O bound → high priority


SMP / NUMA Support

• Processes are placed into a virtual hierarchy
– Groups are scheduled onto a physical CPU
– Processes are preferentially pinned to individual cores, to leverage warmed up caches
• Problem: not fair

Completely Fair Scheduler (CFS)

• Replaced the O(1) scheduler
– In use since 2.6.23; has O(log N) runtime
• Moves from MLFQ to Weighted Fair Queuing
– First major OS to use a fair scheduling algorithm
– Very similar to stride scheduling
– Processes are ordered by the amount of CPU time they have used
• Gets rid of active/inactive run queues in favor of a red-black tree of processes
– A red-black tree is a self-balancing tree: an unbalanced tree can degrade to O(n) lookups, but balancing keeps operations O(log n)
• CFS isn't actually “completely fair”
– Unfairness is bounded: O(N)
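The core idea — always run the task that has used the least CPU time — can be sketched as below. This is an illustrative approximation: real CFS keeps tasks in a red-black tree and weights runtime by nice level, while this sketch uses a binary heap, which gives the same "pick the leftmost" behavior. The jobs reuse the ideal-fairness example (A=8, B=4, C=16, D=4):

```python
import heapq

# Minimal sketch of CFS's core mechanism: always run the task with the
# least accumulated (virtual) runtime; after it runs, reinsert it with its
# runtime increased, which keeps all tasks' runtimes close together.
def cfs_run(bursts, slice_len):
    """bursts: {task: total time needed}. Returns the order tasks were run in."""
    tree = [(0, name) for name in sorted(bursts)]      # (vruntime, task)
    heapq.heapify(tree)
    remaining = dict(bursts)
    order = []
    while tree:
        vruntime, name = heapq.heappop(tree)           # "leftmost" task
        order.append(name)
        ran = min(slice_len, remaining[name])
        remaining[name] -= ran
        if remaining[name] > 0:                        # reinsert, "rebalanced"
            heapq.heappush(tree, (vruntime + ran, name))
    return order

print(cfs_run({"A": 8, "B": 4, "C": 16, "D": 4}, slice_len=4))
# -> ['A', 'B', 'C', 'D', 'A', 'C', 'C', 'C']
```

Short jobs B and D finish after one slice each and drop out of the tree; long job C keeps the lowest vruntime once the others complete, so it absorbs the remaining CPU time, approximating the ideal fair schedule shown earlier.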


Red-Black Process Tree

• Tree organized according to the amount of CPU time used by each process
– Measured in nanoseconds, which obviates the need for time slices
• The left-most process has always used the least time – it is scheduled next
• When a process is descheduled, add it back to the tree and rebalance
[Figure: red-black tree of processes keyed by CPU time used]

BF Scheduler
• What does BF stand for? – Look it up yourself
• Alternative to CFS, introduced in 2009
– O(n) runtime, single run queue
– Dead simple implementation
• Goal: a simple scheduling algorithm with fewer parameters that need manual tuning
– Designed for light NUMA workloads
– Doesn't scale to more than 16 cores
• For the adventurous: download the BFS patches and build yourself a custom kernel

https://en.wikipedia.org/wiki/BF

Energy/Power & Scheduling.
