<<

Linux Scheduler Descending to Reality. . . Philosophies Processor Processor Affinity Basic Scheduling Algorithm The Run Queue The Highest Priority Calculating Scheduler Timeslices Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 1 / 40 Timers Descending to Reality. . .

Linux Scheduler ■ Descending to The Linux scheduler tries to be very efficient Reality. . . Philosophies ■ To do that, it uses some complex data Processor Scheduling structures Processor Affinity Basic Scheduling ■ Algorithm Some of what it does actually contradicts the The Run Queue The Highest Priority schemes we’ve been discussing. . . Process Calculating Timeslices Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 2 / 40 Timers Philosophies

Linux Scheduler ■ Descending to Use large quanta for important processes Reality. . . Philosophies ■ Modify quanta based on CPU use Processor Scheduling ■ Bind processes to CPUs Processor Affinity Basic Scheduling ■ Algorithm Do everything in O(1) time The Run Queue The Highest Priority Process Calculating Timeslices Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 3 / 40 Timers Processor Scheduling

Linux Scheduler ■ Descending to Have a separate run queue for each processor Reality. . . Philosophies ■ Each processor only selects processes from its Processor Scheduling own queue to run Processor Affinity Basic Scheduling ■ Algorithm Yes, it’s possible for one processor to be The Run Queue The Highest Priority while others have jobs waiting in their run Process Calculating queues Timeslices Typical Quanta ■ Periodically, the queues are rebalanced: if one Dynamic Priority Interactive Processes processor’s run queue is too long, some Using Quanta Avoiding Indefinite processes are moved from it to another Overtaking The Priority Arrays processor’s queue Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 4 / 40 Timers Processor Affinity

Linux Scheduler ■ Descending to Each process has a bitmask saying what CPUs Reality. . . Philosophies it can run on Processor Scheduling ■ Normally, of course, all CPUs are listed Processor Affinity Basic Scheduling ■ Algorithm Processes can change the mask The Run Queue ■ The Highest Priority The mask is inherited by child processes (and Process Calculating threads), thus tending to keep them on the Timeslices Typical Quanta same CPU Dynamic Priority Interactive Processes ■ Rebalancing does not override affinity Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 5 / 40 Timers Basic Scheduling Algorithm

Linux Scheduler ■ Descending to Find the highest- with a runnable Reality. . . Philosophies process Processor Scheduling ■ Find the first process on that queue Processor Affinity Basic Scheduling ■ Algorithm Calculate its quantum size The Run Queue ■ The Highest Priority Let it run Process Calculating ■ When its time is up, put it on the expired list Timeslices Typical Quanta ■ Repeat Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 6 / 40 Timers The Run Queue

Linux Scheduler ■ Descending to 140 separate queues, one for each priority level Reality. . . Philosophies ■ Actually, that number can be changed at a Processor Scheduling given site Processor Affinity Basic Scheduling ■ Algorithm Actually, two sets, active and expired The Run Queue ■ The Highest Priority Priorities 0-99 for real-time processes Process Calculating ■ Priorities 100-139 for normal processes; value Timeslices Typical Quanta set via () Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 7 / 40 Timers The Highest Priority Process

Linux Scheduler ■ Descending to There is a bit map indicating which queues Reality. . . Philosophies have processes that are ready to run Processor Scheduling ■ Find the first bit that’s set: Processor Affinity Basic Scheduling Algorithm ◆ 140 queues ⇒ 5 integers The Run Queue The Highest Priority ◆ Only a few compares to find the first that Process Calculating Timeslices is non-zero Typical Quanta ◆ Dynamic Priority Hardware instruction to find the first 1-bit Interactive Processes ◆ Using Quanta Time depends on the number of priority Avoiding Indefinite Overtaking levels, not the number of processes The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 8 / 40 Timers Calculating Timeslices

Linux Scheduler ■ Descending to Calculate Reality. . . Philosophies Processor Scheduling  (140 − SP) × 20 if SP < 120 Processor Affinity Quantum =  Basic Scheduling (140 − SP) × 5 if SP ≥ 120 Algorithm  The Run Queue The Highest Priority Process where SP is the static priority Calculating Timeslices ■ Higher priority process get longer quanta Typical Quanta Dynamic Priority ■ Basic idea: important processes should run Interactive Processes Using Quanta longer Avoiding Indefinite Overtaking ■ Other mechanisms used for quick interactive The Priority Arrays Swapping Arrays response Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 9 / 40 Timers Typical Quanta

Linux Scheduler Descending to Static Reality. . . Philosophies Pri Niceness Quantum Processor Scheduling Processor Affinity Highest Static Pri 100 20 800 ms Basic Scheduling Algorithm High Static Pri 110 -10 600 ms The Run Queue The Highest Priority Normal 120 0 100ms Process Calculating Timeslices Low Static Pri 130 +10 50 ms Typical Quanta Dynamic Priority Lowest Static Pri 139 +20 5 ms Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 10 / 40 Timers Dynamic Priority

Linux Scheduler ■ Descending to Dynamic priority is calculated from static Reality. . . Philosophies priority and average sleep time Processor Scheduling ■ When process wakes up, record how long it Processor Affinity Basic Scheduling Algorithm was sleeping, up to some maximum value The Run Queue ■ The Highest Priority When the process is running, decrease that Process Calculating value each timer tick Timeslices Typical Quanta ■ Roughly speaking, the bonus is a number in Dynamic Priority Interactive Processes [0, 10] that measures what percentage of the Using Quanta Avoiding Indefinite time the process was sleeping recently; 5 is Overtaking The Priority Arrays neutral, 10 helps priority by 5, 0 hurts priority Swapping Arrays Why Two Arrays? The Traditional by 5 Algorithm Linux is More Efficient Locking Runqueues Real-Time DP = max(100, min(SP − bonus +5, 139)) Scheduling Sleeping and Waking 11 / 40 Timers Interactive Processes

Linux Scheduler ■ Descending to A process is interactive if Reality. . . Philosophies Processor Scheduling bonus − 5 ≥ S/4 − 28 Processor Affinity Basic Scheduling Algorithm ■ The Run Queue Low-priority processes have a hard time The Highest Priority Process becoming interactive Calculating Timeslices ■ Typical Quanta A default priority process becomes interactive Dynamic Priority Interactive Processes when its sleep time is greater than 700 ms Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 12 / 40 Timers Using Quanta

Linux Scheduler ■ Descending to At every time tick, decrease the quantum of Reality. . . Philosophies the current running process Processor Scheduling ■ If the time goes to zero, the process is done Processor Affinity Basic Scheduling ■ Algorithm If the process is non-interactive, put it aside on The Run Queue The Highest Priority the expired list Process Calculating ■ If the process is interactive, put it at the end Timeslices Typical Quanta of the current priority queue Dynamic Priority Interactive Processes ■ If there’s nothing else at that priority, it will Using Quanta Avoiding Indefinite run again immediately Overtaking The Priority Arrays ■ Of course, by running so much is bonus will go Swapping Arrays Why Two Arrays? The Traditional down, and so will its priority and its interative Algorithm Linux is More status Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 13 / 40 Timers Avoiding Indefinite Overtaking

Linux Scheduler ■ Descending to There are two sets of 140 queues, active and Reality. . . Philosophies expired Processor Scheduling ■ The system only runs processes from active Processor Affinity Basic Scheduling Algorithm queues, and puts them on expired queues The Run Queue The Highest Priority when they use up their quanta Process Calculating ■ When a priority level of the active queue is Timeslices Typical Quanta empty, the scheduler looks for the next-highest Dynamic Priority Interactive Processes priority queue Using Quanta Avoiding Indefinite ■ After running all of the active queues, the Overtaking The Priority Arrays active and expired queues are swapped Swapping Arrays Why Two Arrays? ■ The Traditional There are pointers to the current arrays; at the Algorithm Linux is More end of a cycle, the pointers are switched Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 14 / 40 Timers The Priority Arrays

Linux Scheduler Descending to Reality. . . struct runqueue { Philosophies Processor struct prioarray *active; Scheduling Processor Affinity struct prioarray *expired; Basic Scheduling Algorithm struct prioarray arrays[2]; The Run Queue The Highest Priority Process }; Calculating Timeslices Typical Quanta Dynamic Priority struct prioarray { Interactive Processes Using Quanta int nr_active; /* # Runnable */ Avoiding Indefinite Overtaking unsigned long bitmap[5]; The Priority Arrays Swapping Arrays struct list_head queue[140]; Why Two Arrays? The Traditional ; Algorithm } Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 15 / 40 Timers Swapping Arrays

Linux Scheduler Descending to Reality. . . struct prioarray *array = rq->active; Philosophies Processor Scheduling Processor Affinity if (array->nr_active == 0) { Basic Scheduling Algorithm rq->active = rq->expired; The Run Queue The Highest Priority Process rq->expired = array; Calculating Timeslices } Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 16 / 40 Timers Why Two Arrays?

Linux Scheduler ■ Descending to Why is it done this way? Reality. . . Philosophies ■ It avoids the need for traditional aging Processor Scheduling ■ Why is aging bad? Processor Affinity Basic Scheduling ■ Algorithm It’s O(n) at each clock tick The Run Queue The Highest Priority Process Calculating Timeslices Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 17 / 40 Timers The Traditional Algorithm

Linux Scheduler Descending to Reality. . . for(pp = proc; pp < proc+NPROC; pp++) { Philosophies Processor if (pp->prio != MAX) Scheduling Processor Affinity pp->prio++; Basic Scheduling Algorithm if (pp->prio > curproc->prio) The Run Queue The Highest Priority Process reschedule(); Calculating Timeslices } Typical Quanta Dynamic Priority Interactive Processes Using Quanta Every process is examined, quite frequently Avoiding Indefinite Overtaking (This code is taken almost verbatim from 6th The Priority Arrays Swapping Arrays Edition Unix, circa 1976.) Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 18 / 40 Timers Linux is More Efficient

Linux Scheduler ■ Descending to Processes are touched only when they start or Reality. . . Philosophies stop running Processor Scheduling ■ That’s when we recalculate priorities, bonuses, Processor Affinity Basic Scheduling Algorithm quanta, and interactive status The Run Queue ■ The Highest Priority There are no loops over all processes or even Process Calculating over all runnableprocesses Timeslices Typical Quanta Dynamic Priority Interactive Processes Using Quanta Avoiding Indefinite Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 19 / 40 Timers Locking Runqueues

Linux Scheduler ■ Descending to To rebalance, the kernel sometimes needs to Reality. . . Philosophies move processes from one runqueue to another Processor Scheduling ■ This is actually done by special kernel threads Processor Affinity Basic Scheduling ■ Algorithm Naturally, the runqueue must be locked before The Run Queue The Highest Priority this happens Process Calculating ■ The kernel always locks runqueues in order of Timeslices Typical Quanta increasing address Dynamic Priority Interactive Processes ■ Why? Deadlock prevention! (It is good for Using Quanta Avoiding Indefinite something. . . Overtaking The Priority Arrays Swapping Arrays Why Two Arrays? The Traditional Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 20 / 40 Timers Real-Time Scheduling

Linux Scheduler ■ Descending to Linux has soft real-time scheduling Reality. . . Philosophies ■ Processes with priorities [0, 99] are real-time Processor Scheduling ■ All real-time processes are higher priority than Processor Affinity Basic Scheduling Algorithm any conventional processes The Run Queue ■ The Highest Priority Two real-time scheduling systems, FCFS and Process Calculating round-robin Timeslices Typical Quanta ■ First-come, first-served: process is only Dynamic Priority Interactive Processes preempted for a higher-priority process; no Using Quanta Avoiding Indefinite time quanta Overtaking The Priority Arrays ■ Round-robin: real-time processes at a given Swapping Arrays Why Two Arrays? The Traditional level take turns running for their time quantum Algorithm Linux is More Efficient Locking Runqueues Real-Time Scheduling Sleeping and Waking 21 / 40 Timers Linux Scheduler

Sleeping and Waking Sleeping and Waking Sleeping Waking Up a Process Scheduler-Related System Calls Major Kernel Functions Fair Share Scheduling Sleeping and Waking Timers

22 / 40 Sleeping and Waking

Linux Scheduler ■ Processes need to wait for events Sleeping and Waking Sleeping and Waking ■ Waiting is done by putting the process on a Sleeping Waking Up a Process wait queue Scheduler-Related ■ System Calls Wakeups can happen too soon; the process Major Kernel Functions must check its condition and perhaps go back Fair Share Scheduling to sleep Timers

23 / 40 Sleeping

Linux Scheduler Sleeping and Waking DECLARE_WAIT_QUEUE(wait, current); Sleeping and Waking Sleeping Waking Up a Process Scheduler-Related /* Sleep on queue ’q’ */ System Calls Major Kernel add_wait_queue(q, &wait); Functions Fair Share while (!condition) { Scheduling Timers set_current_state(TASK_INTERRUPTIBLE); if (signal_pending(current)) /* handle signal */ (); } set_current_state(TASK_RUNNING); remove_wait_queue(q, &wait);

24 / 40 Waking Up a Process

Linux Scheduler ■ You don’t wake a process, you wake a wait Sleeping and Waking Sleeping and Waking queue Sleeping Waking Up a ■ Process There may be multiple processes waiting for Scheduler-Related System Calls the event, i.e., several processes trying to read Major Kernel Functions a single disk block Fair Share Scheduling ■ The condition may not, in fact, have been Timers satisfied ■ That’s why the sleep routine has a loop

25 / 40 Scheduler-Related System Calls

Linux Scheduler nice() Lower a process’ static priority Sleeping and Waking Sleeping and Waking getpriority()/setpriority() Change priorities of Sleeping Waking Up a Process a process group Scheduler-Related System Calls sched getscheduler()/sched setscheduler() Major Kernel Functions Set scheduling policy and parameters. (Many Fair Share Scheduling more starting with sched ; use man -k to Timers learn their names.)

26 / 40 Major Kernel Functions

Linux Scheduler

Sleeping and Waking scheduler tick() Called each timer tick to up- Sleeping and Waking Sleeping date quanta Waking Up a Process try to wakeup() Attempts to wake a process, Scheduler-Related System Calls put in on a run queue, rebal- Major Kernel Functions Fair Share ance loads, etc. Scheduling recalc prio() update average sleep time an Timers dynamic priority schedule() Pick the next process to run rebalance tick() Check if load-banlancing is needed

27 / 40 Fair Share Scheduling

Linux Scheduler ■ Suppose we wanted to add a fair share Sleeping and Waking Sleeping and Waking scheduler to Linux Sleeping Waking Up a ■ Process What should be done? Scheduler-Related ■ System Calls Add a new scheduler type for Major Kernel Functions sched setscheduler() Fair Share Scheduling ■ Calculate process priority, interactivity, bonus, Timers etc., based on all processes owned by that user ■ How can that be done efficiently? What sorts of new data structures are needed?

28 / 40 Linux Scheduler

Sleeping and Waking

Timers Why Does the Kernel Need Timers? Two Basic Functions Timer Types Timer Ticks Jiffies Potent and Evil Timers Magic Time of Day Kernel Timers Dynamic Timers Delay Functions System Calls

29 / 40 Why Does the Kernel Need Timers?

Linux Scheduler ■ Animated applications Sleeping and Waking ■ Timers Screen-savers Why Does the Kernel Need ■ Time of day for file timestamps Timers? Two Basic Functions ■ Quanta! Timer Types Timer Ticks Jiffies Potent and Evil Magic Time of Day Kernel Timers Dynamic Timers Delay Functions System Calls

30 / 40 Two Basic Functions

Linux Scheduler ■ Time of day (especially as a service to Sleeping and Waking Timers applications) Why Does the Kernel Need ■ Timers? Interval timers — something should happen n Two Basic Functions Timer Types ms from now Timer Ticks Jiffies Potent and Evil Magic Time of Day Kernel Timers Dynamic Timers Delay Functions System Calls

31 / 40 Timer Types

Linux Scheduler ■ Real-Time Clock: tracks time and date, even if Sleeping and Waking

Timers the computer is off; can at a certain Why Does the Kernel Need rate or at a certain time. Use by Linux only at Timers? Two Basic Functions boot time to get time of day Timer Types Timer Ticks ■ Time Stamp Counter: ticks once per CPU Jiffies Potent and Evil Magic clock; provides very accurate interval timing Time of Day ■ Kernel Timers Programmable Interval Timer: generates Dynamic Timers Delay Functions periodic . On Linux, the rate, called System Calls HZ, is usually 1000 Hz (100 Hz on slow CPUs) ■ A variety of special, less common (and less used) timers

32 / 40 Timer Ticks

Linux Scheduler ■ Linux programs a timer to interrupt at a Sleeping and Waking

Timers certain rate Why Does the Kernel Need ■ Each tick, a number of operations are carried Timers? Two Basic Functions out Timer Types Timer Ticks ■ Three most important Jiffies Potent and Evil Magic ◆ Keeping track of time Time of Day Kernel Timers ◆ Invoking dynamic timer routines Dynamic Timers Delay Functions ◆ Calling scheduler tick() System Calls ■ The system uses the best timer available

33 / 40 Jiffies

Linux Scheduler ■ Each timer tick, a variable called jiffies is Sleeping and Waking

Timers incremented Why Does the Kernel Need ■ It is thus (roughly) the number of HZ since Timers? Two Basic Functions system boot Timer Types Timer Ticks ■ A 32-bit counter incremented at 1000 Hz Jiffies Potent and Evil Magic wraps around in about 50 days Time of Day ■ Kernel Timers We need 64 bits — but there’s a problem Dynamic Timers Delay Functions System Calls

34 / 40 Potent and Evil Magic

Linux Scheduler ■ A 64-bit value cannot be accessed atomically Sleeping and Waking

Timers on a 32-bit machine Why Does the Kernel Need ■ A spin- is used to synchronize access to Timers? Two Basic Functions jiffies 64; kernel routines call Timer Types Timer Ticks get jiffies 64() Jiffies Potent and Evil ■ Magic But we don’t want to have to increment two Time of Day Kernel Timers variables each tick Dynamic Timers ■ Delay Functions Linker magic is used to make jiffies the System Calls low-order 32 bits of jiffies 64 ■ Ugly!

35 / 40 Time of Day

Linux Scheduler ■ The time of day is stored in xtime, which is a Sleeping and Waking

Timers struct timespec Why Does the Kernel Need ■ It’s incremented once per tick Timers? Two Basic Functions ■ Again, a spin-lock is used to synchronize Timer Types Timer Ticks access to it Jiffies Potent and Evil ■ Magic The apparent tick rate can be adjusted Time of Day Kernel Timers slightly, via the adjtimex() system call Dynamic Timers Delay Functions System Calls

36 / 40 Kernel Timers

Linux Scheduler ■ Two types of timers use by kernel routines Sleeping and Waking ■ Timers Dynamic timer — call some routine after a Why Does the Kernel Need particular interval Timers? Two Basic Functions ■ Delay loops — tight spin loops for very short Timer Types Timer Ticks delays Jiffies Potent and Evil ■ Magic User-mode interval timers are similar to kernel Time of Day Kernel Timers dynamic timers Dynamic Timers Delay Functions System Calls

37 / 40 Dynamic Timers

Linux Scheduler ■ Specify an interval, a subroutine to call, and a Sleeping and Waking

Timers parameter to pass to that subroutine Why Does the Kernel Need ■ Parameter used to differentiate different Timers? Two Basic Functions instantiations of the same timer — if you have Timer Types Timer Ticks 4 Ethernet cards creating dynamic timers, the Jiffies Potent and Evil Magic parameter is typically the address of the Time of Day Kernel Timers per-card data structure Dynamic Timers ■ Delay Functions Timers can be cancelled; there is (as usual) a System Calls delicate synchronization dance on multiprocessors. See the text for details

38 / 40 Delay Functions

Linux Scheduler ■ Spin in a tight loop for a short time — Sleeping and Waking

Timers microseconds or nanoseconds Why Does the Kernel Need ■ Nothing else can use that CPU during that Timers? Two Basic Functions time, except via interrupt Timer Types Timer Ticks ■ Used only when the overhead of creating a Jiffies Potent and Evil Magic dynamic timer is too great for a very short Time of Day Kernel Timers delay Dynamic Timers Delay Functions System Calls

39 / 40 System Calls

Linux Scheduler ■ time() and gettimeofday() Sleeping and Waking ■ Timers adjtimex() — tweaks apparent clock rate Why Does the Kernel Need (even the best crystals drift; see the Remote Timers? Two Basic Functions Physical Device Fingerprinting paper from my Timer Types Timer Ticks COMS E6184 class Jiffies Potent and Evil ■ Magic setitimer() and alarm() — interval timers Time of Day Kernel Timers for applications Dynamic Timers Delay Functions System Calls

40 / 40