Process Scheduling 5
Unix Internals Process Scheduling 5
Unix scheduler
    policy
    implementation

context switch
    hardware execution context of the current process is saved in its process control block (PCB), which is traditionally part of the u area of the process
    context of the next process is retrieved from its PCB and loaded into the hardware registers
    architecture-specific tasks
        o flush data cache after context switch
        o flush instruction cache after context switch
        o flush address translation cache after context switch
        o pipelined architectures (e.g., RISC): flush instruction pipeline prior to context switch

hardware clock
    fixed time interval interrupt
    clock tick: 10 milliseconds
    major tick: n clock ticks; n depends upon the specific Unix variant
    clock frequency
        o number of ticks per second
        o constant HZ defined in param.h file
    kernel functions measure time in number of ticks
clock interrupt handler
    highly system-dependent
    priority second only to power-failure interrupt
    updates CPU usage statistics for current process
    performs scheduler-related functions
        o priority recomputation
        o time-slice expiration handling
    sends SIGXCPU signal to current process if it has exceeded its CPU usage quota
    updates
        o time-of-day clock
        o SVR4 variable lbolt – number of ticks elapsed since system last booted
    handles callouts
    wakes system processes when required
        o swapper
        o pagedaemon
    handles alarms

C. Robert Putnam Page 1 4/3/2018 025608908da3ab8c3d1cf3661d077766.doc 6:19 AM
callouts
    a callout records a function that the kernel will invoke after a specific number of ticks
    int to_ID = timeout(void (*fn)(), caddr_t arg, long delta);
        o fn() – kernel function to be invoked
        o arg – argument to be passed to the function
        o delta – time interval measured in number of ticks
    kernel invokes callout in system context
        o function cannot sleep
        o function cannot access the process context
    void untimeout(int to_ID);
    periodic tasks
        o retransmission of network packets
        o scheduler functions
        o memory management functions
        o monitoring devices to avoid losing interrupts
        o polling devices
    normal kernel operation – callouts do not execute at interrupt priority
        o clock handler checks if any callouts are due
        o sets flag indicating that callout handler must run
        o when system returns to base interrupt priority
            . checks flag
            . if flag is set, system invokes callout handler
            . handler invokes each callout that is due to run
        o once due, callouts run, but only after all pending interrupts have been serviced
pending callout list
    checking time
        o every CPU tick
        o high interrupt priority
        o optimize checking time
    insertion time
        o low priority
        o much less frequently than once per tick
    implementation
        o 4.3BSD: sort list in order of “time to fire”
            . stores differential time of expiration for each entry, i.e., the difference between its time to fire and that of the previous callout
            . decrements time difference in first entry at each clock tick
        o sort list in order of “time to fire”
            . stores absolute time of expiration for each entry
            . compares current absolute time with time in first entry
        o timing wheel
            . fixed-size circular array of callout queues
            . at each tick, clock interrupt handler advances current time pointer to the next element in the array, wrapping around at the end of the array
            . if any callouts are in the queue, their expiration time is checked
            . new callouts are inserted into the queue which is N elements away from current queue, where N is the time to fire measured in ticks
            . timing wheel hashes callouts based on the expiration time
            . within each queue, callouts may be sorted or unsorted
            . sorting callouts reduces time required to process non-empty queues but increases insertion time
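The timing wheel described above can be sketched in a few lines of C. This is a minimal illustration, not kernel code: the names timing_wheel, wheel_insert, wheel_tick, and count_fire, and the wheel size of 8, are assumptions made for the example. Each queue is kept unsorted, so insertion is constant time and the handler must compare expiration times when it visits a slot.

```c
#include <assert.h>
#include <stddef.h>

#define WHEEL_SIZE 8                 /* assumed size, chosen small for illustration */

struct callout {
    struct callout *next;
    unsigned long   expires;         /* absolute tick at which to fire */
    void          (*fn)(void *);
    void           *arg;
};

struct timing_wheel {
    struct callout *slot[WHEEL_SIZE];
    unsigned long   now;             /* current tick */
};

/* Example callback (hypothetical): counts how many times it was invoked. */
static void count_fire(void *arg) { (*(int *)arg)++; }

/* Hash the callout into the slot that lies `delta` ticks ahead of the current one. */
static void wheel_insert(struct timing_wheel *w, struct callout *c,
                         unsigned long delta, void (*fn)(void *), void *arg)
{
    assert(delta > 0);
    c->expires = w->now + delta;
    c->fn = fn;
    c->arg = arg;
    struct callout **head = &w->slot[c->expires % WHEEL_SIZE];
    c->next = *head;                 /* unsorted queue: O(1) insertion */
    *head = c;
}

/* Advance one tick and run every callout in the new slot that is due.
 * Returns the number of callouts fired. */
static int wheel_tick(struct timing_wheel *w)
{
    int fired = 0;
    w->now++;
    struct callout **pp = &w->slot[w->now % WHEEL_SIZE];
    while (*pp != NULL) {
        struct callout *c = *pp;
        if (c->expires <= w->now) {
            *pp = c->next;           /* unlink before invoking */
            c->fn(c->arg);
            fired++;
        } else {
            pp = &c->next;           /* due on a later revolution of the wheel */
        }
    }
    return fired;
}
```

Two callouts that are 3 and 11 ticks away land in the same slot of an 8-slot wheel; the expiration check is what keeps the second one from firing a full revolution early.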
alarms
    real-time alarm
        o measures actual elapsed time
        o notifies process via SIGALRM signal
    profiling alarm
        o measures amount of time process has been executing
        o notifies process via SIGPROF signal
    virtual-time alarm
        o monitors time spent in user mode
        o notifies process via SIGVTALRM signal
BSD Unix
    setitimer() system call
        o allows process to request any type of alarm
        o time interval specified in microseconds

SVRx
    alarm() system call
        o requests real-time alarm
        o time interval specified in whole number of seconds

SVR4
    hrtsys() system call
        o allows time to be specified in microseconds
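As an illustration of the microsecond interface, the sketch below arms an alarm with the standard setitimer() call. The helper names interval_from_usec and arm_alarm are hypothetical; only setitimer(), struct itimerval, and the ITIMER_* constants come from the system headers. ITIMER_REAL delivers SIGALRM, ITIMER_VIRTUAL delivers SIGVTALRM, and ITIMER_PROF delivers SIGPROF.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/time.h>

/* Split a microsecond interval into the seconds/microseconds pair used by
 * struct timeval. A zero it_interval makes the timer one-shot. */
static struct itimerval interval_from_usec(long usec, int periodic)
{
    assert(usec >= 0);
    struct itimerval it;
    it.it_value.tv_sec  = usec / 1000000L;
    it.it_value.tv_usec = usec % 1000000L;
    it.it_interval.tv_sec  = periodic ? it.it_value.tv_sec  : 0;
    it.it_interval.tv_usec = periodic ? it.it_value.tv_usec : 0;
    return it;
}

/* Arm ITIMER_REAL, ITIMER_VIRTUAL, or ITIMER_PROF; returns setitimer()'s result. */
static int arm_alarm(int which, long usec, int periodic)
{
    struct itimerval it = interval_from_usec(usec, periodic);
    return setitimer(which, &it, NULL);
}
```

A call such as arm_alarm(ITIMER_REAL, 1500000L, 0) requests a single SIGALRM after 1.5 seconds, something alarm() cannot express because it accepts only whole seconds.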
high resolution of real-time alarms does not imply high accuracy
    kernel delivers SIGALRM signal to process, but the process will not respond to the signal until it is next scheduled to run
    high-resolution timers used by high-priority processes are less likely to suffer scheduling delays, but a delay still occurs if the current process is executing in kernel mode and does not reach a preemption point
profiling & virtual alarms
    clock interrupt handler charges a whole tick to the current process even though it may have used only a portion of it
    time measured by these alarms reflects the number of clock interrupts that have occurred while the process was running
    long run – inaccuracies average out; good indicator of time used
    any single alarm – significant inaccuracy
scheduler goals
    interactive process
        o reduce average time and variance between user action and application response sufficiently so that users cannot readily detect the delay
    batch process
        o measure of scheduling efficiency is the task completion time in the presence of other activity
    real-time process -- time critical
        o predictable scheduling behavior with guaranteed bounds on response times
    kernel functions – paging, interrupt handling, process management
        o execute promptly when required
    mix of processes
        o all application processes must continue to progress
        o no application should prevent another from progressing unless user has explicitly permitted it
        o system should always be able to receive and process interactive user input
Traditional Unix Scheduling

SVR3, 4.3BSD
    time-sharing, interactive environments
    multiple users
    several batch & foreground processes simultaneously
    improve response times of interactive users while ensuring that low-priority background jobs do not starve
priority-based scheduling
    each process has a scheduling priority that changes with time
    scheduler selects highest-priority runnable process
    uses preemptive time-slicing to schedule processes of equal priority
    dynamically varies process priorities based on their CPU usage patterns
    kernel is nonpreemptible
    running process
        o may voluntarily relinquish CPU when blocking on a resource
        o can be preempted when it returns to user mode
process priorities
    kernel priority numbers 0 – 49
    user-mode priority numbers 50 – 127
    numerically lower values denote higher priority

proc structure
    p_pri       current scheduling priority
    p_usrpri    user-mode priority
    p_cpu       measure of recent CPU usage
    p_nice      user-controllable nice factor

scheduler uses p_pri to decide which process to schedule
    process in user mode: p_pri == p_usrpri
    process wakes up after blocking in a system call: p_pri stores the temporary kernel priority

sleep priority
    terminal input 28
    disk I/O 20
    process wakes up after blocking
        o kernel sets p_pri ← sleep priority of the event or resource
        o process is scheduled ahead of other user processes
    process completes system call, about to return to user mode
        o kernel sets scheduling priority p_pri ← p_usrpri
user-mode priority p_usrpri depends upon
    recent CPU usage p_cpu
    user-controllable nice factor p_nice
        o range 0 – 39, default 20
        o any user: increasing p_nice decreases priority
        o superuser only: decreasing p_nice increases priority

process priority algorithm
    process creation: p_cpu = 0
    clock tick: clock handler increments p_cpu for current process (max 127)
    every second: kernel invokes schedcpu() -- scheduled by callout
        o reduces p_cpu of each process by decay factor
        o SVR3: fixed decay factor ½
            . undesirable side effect – as system load increases, priorities increase
        o 4.3BSD: decay = (2 * load_average) / (2 * load_average + 1);
            . load_average == average number of runnable processes over last second
    p_usrpri = PUSER + ( p_cpu / 4 ) + ( 2 * p_nice );
        o PUSER is the baseline user priority of 50
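The recomputation above is small enough to express directly. The following sketch assumes a hypothetical struct proc_sched holding just p_cpu and p_nice, and it clamps the result to 127 on the assumption that user-mode priorities may not exceed the top of the 50 – 127 range; the formulas themselves are the ones given in the notes.

```c
#include <assert.h>

#define PUSER    50     /* baseline user priority, from the notes */
#define MAX_PCPU 127    /* p_cpu ceiling, from the notes */
#define MAX_PRI  127    /* assumed clamp: top of the user-mode range */

struct proc_sched {     /* hypothetical subset of the proc structure */
    int p_cpu;
    int p_nice;
};

/* Clock tick: charge the running process, saturating at MAX_PCPU. */
static void charge_tick(struct proc_sched *p)
{
    if (p->p_cpu < MAX_PCPU)
        p->p_cpu++;
}

/* Once per second: 4.3BSD decay, which weakens as the load average grows.
 * SVR3 behaves like a constant decay of 0.5. */
static int decay_cpu(int p_cpu, double load_average)
{
    assert(load_average >= 0.0);
    double decay = (2.0 * load_average) / (2.0 * load_average + 1.0);
    return (int)(p_cpu * decay);
}

/* User-mode priority; numerically larger means lower priority. */
static int user_priority(const struct proc_sched *p)
{
    int pri = PUSER + p->p_cpu / 4 + 2 * p->p_nice;
    return pri > MAX_PRI ? MAX_PRI : pri;
}
```

With a load average of 0.5 the decay factor is exactly ½, matching SVR3; at 1.5 it is ¾, so recent CPU usage is forgotten more slowly on a busier system.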
scheduler implementation
    whichqs (global variable – 32-bit bitmask, one bit for each queue)
        0 0 0 1 0 1 0 …
    qs (array of 32 run queues of proc structures; each queue holds the runnable processes of four adjacent priorities)
        priorities   runnable processes
        0 – 3
        4 – 7
        8 – 11       P P P
        12 – 15      P P
        16 – 19
        20 – 23
        …
        124 – 127

context switch
    swtch() function
        o examines whichqs to find index of the first set bit
        o bit identifies scheduler queue containing highest-priority runnable process
        o removes process from head of queue & switches context to it
    proc structure -- p_addr field points to page table entries of the u area
    process control block is a part of the u area
    VAX-11
        o special instructions – 32-bit fields
            . FFS – Find First Set
            . FFC – Find First Clear
        o special instructions – doubly linked lists
            . INSQHI – insert elements into doubly linked lists
            . REMQHI – remove elements from doubly linked lists
        o special instructions – process context
            . LDPCTX – load process context
            . SVPCTX – save process context
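A portable sketch of the bitmask lookup follows. It is an illustration under stated assumptions: the loop in find_first_set stands in for the VAX FFS instruction, the queues are singly linked with head and tail pointers rather than the doubly linked lists handled by INSQHI and REMQHI, and the function names enqueue_runnable and pick_next are invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NQS           32    /* number of run queues */
#define PRI_PER_QUEUE 4     /* adjacent priorities sharing one queue */

struct proc {
    struct proc *next;
    int          p_pri;     /* 0..127, lower value = higher priority */
};

static uint32_t     whichqs;                  /* bit q set <=> queue q is non-empty */
static struct proc *qs_head[NQS], *qs_tail[NQS];

/* Append a runnable process to the queue for its priority band. */
static void enqueue_runnable(struct proc *p)
{
    assert(p->p_pri >= 0 && p->p_pri < NQS * PRI_PER_QUEUE);
    int q = p->p_pri / PRI_PER_QUEUE;
    p->next = NULL;
    if (qs_tail[q] != NULL)
        qs_tail[q]->next = p;
    else
        qs_head[q] = p;
    qs_tail[q] = p;
    whichqs |= (uint32_t)1 << q;
}

/* Software stand-in for FFS: index of the lowest set bit, or -1 if none. */
static int find_first_set(uint32_t mask)
{
    for (int i = 0; i < NQS; i++)
        if (mask & ((uint32_t)1 << i))
            return i;
    return -1;
}

/* Remove and return the head of the highest-priority non-empty queue. */
static struct proc *pick_next(void)
{
    int q = find_first_set(whichqs);
    if (q < 0)
        return NULL;                          /* nothing runnable */
    struct proc *p = qs_head[q];
    qs_head[q] = p->next;
    if (qs_head[q] == NULL) {
        qs_tail[q] = NULL;
        whichqs &= ~((uint32_t)1 << q);       /* queue drained: clear its bit */
    }
    return p;
}
```

Processes with priorities 25 and 27 share queue 6 and are served in arrival order, ahead of a process with priority 60 waiting in queue 15.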
run queue manipulation
    highest-priority process always runs unless current process is executing in kernel mode
    fixed time quantum -- 4.3BSD: 100 milliseconds
    every 100 ms, kernel invokes roundrobin() function to schedule next process from same queue
    if a higher-priority process becomes runnable, it is scheduled preferentially without waiting for roundrobin()
    if all other runnable processes are on lower-priority queues, the current process continues to run even though its quantum has expired
    every second, schedcpu() recomputes the priority of each process
        o removes the process from the queue
        o recomputes process priority
        o inserts process into a (possibly) different run queue
    every four ticks, clock interrupt handler recomputes priority of the current process

context switch occurs when
    o current process (voluntary context switch)
        . blocks on a resource
        . exits
    o priority recomputation makes priority of another process > current priority
    o current process or interrupt handler wakes up a higher-priority process

voluntary context switch
    o kernel directly calls swtch() from sleep() or exit() functions

involuntary switches
    o system is in kernel mode, cannot directly preempt the process
    o kernel sets runrun flag
    o when process is about to return to user mode, kernel checks the runrun flag
    o if runrun flag is set, kernel transfers control to swtch() routine
limitations
    large number of processes – inefficient to recompute priorities every second
    no way to guarantee CPU resources to specific group of processes
    no guarantee of response time
    little control over priorities by applications
    nonpreemptive kernel -- priority inversion -- higher-priority process may have to wait significant amount of time for lower-priority process to complete
SVR4 Scheduler

scheduling class
    defines scheduling policy for set of processes

SVR4 scheduling classes
    time-sharing
    real-time

class-independent routines -- implement common services
    context switching
    run queue manipulation
    preemption
    procedural interface

class-dependent functions – pure virtual functions
    priority recomputation
        o real-time class – fixed priorities
        o time-sharing class -- varies process priorities dynamically
    inheritance

class-independent layer
    dqactmap (global variable – 160-bit bitmask, one bit for each queue)
        0 0 1 0 0 1 0 …
    dispq (array of 160 dispatch queues of proc structures, one per priority)
        priority   runnable processes
        159
        158        P P P
        157        P P
        156
        155
        …
        0
    priority ranges
        0 – 59       time-sharing class
        60 – 99      system priorities
        100 – 159    real-time class
    setfrontdq() – insert process at front of queue, e.g., process preempted before time quantum had expired
    setbackdq() – insert process at back of queue, e.g., newly runnable process
    dispdeq() – remove process from queue
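The difference between front and back insertion can be shown with a reduced model of the dispatch queues. This is a sketch under assumptions: the activity map is one byte per queue instead of a packed bitmask, struct dproc and disp_next are hypothetical names, and only the function names setfrontdq and setbackdq are taken from the notes. Unlike the traditional scheduler, a numerically higher global priority runs first.

```c
#include <assert.h>
#include <stddef.h>

#define NPRI 160                      /* global priorities 0..159 */

struct dproc {                        /* hypothetical stand-in for a proc structure */
    struct dproc *next;
    int           pri;
};

struct dq { struct dproc *first, *last; };

static struct dq     dispq[NPRI];
static unsigned char dqactmap[NPRI];  /* assumption: one flag per queue, not packed bits */

/* Newly runnable process: goes behind its peers at the same priority. */
static void setbackdq(struct dproc *p)
{
    assert(p->pri >= 0 && p->pri < NPRI);
    struct dq *q = &dispq[p->pri];
    p->next = NULL;
    if (q->last != NULL)
        q->last->next = p;
    else
        q->first = p;
    q->last = p;
    dqactmap[p->pri] = 1;
}

/* Process preempted before its quantum expired: goes ahead of its peers. */
static void setfrontdq(struct dproc *p)
{
    assert(p->pri >= 0 && p->pri < NPRI);
    struct dq *q = &dispq[p->pri];
    p->next = q->first;
    q->first = p;
    if (q->last == NULL)
        q->last = p;
    dqactmap[p->pri] = 1;
}

/* Remove and return the highest-priority runnable process, or NULL. */
static struct dproc *disp_next(void)
{
    for (int pri = NPRI - 1; pri >= 0; pri--) {
        if (!dqactmap[pri])
            continue;
        struct dq *q = &dispq[pri];
        struct dproc *p = q->first;
        q->first = p->next;
        if (q->first == NULL) {
            q->last = NULL;
            dqactmap[pri] = 0;        /* queue drained */
        }
        return p;
    }
    return NULL;
}
```

Putting a preempted process at the front lets it finish the remainder of its quantum before other processes of the same priority get a turn.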
dispatch latency
    delay between the time that a process becomes runnable and the time that it actually begins running
SVR4 kernel preemption points
    preemption point -- place in kernel code where
        o all kernel data structures are in a stable state
        o kernel is about to embark on a lengthy computation
    at preemption point
        o kernel macro PREEMPT() checks kprunrun flag
        o if kprunrun flag is set
            . real-time process is requesting access to run
            . PREEMPT() macro calls preempt() kernel routine to preempt process
        o bounds the amount of time that real-time process must wait before being scheduled
    preemption points
        o pathname parsing routine lookuppn() – before beginning to parse each individual pathname component
        o open() system call – before creating a file if it does not exist
        o memory subsystem – before freeing process pages
    preempt() kernel routine
        o invokes CL_PREEMPT operation to perform class-dependent processing
        o then calls swtch() to initiate context switch
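The following user-space model shows how a preemption point bounds the wait of a real-time process. It is only a sketch: cl_preempt_stub and swtch_stub are placeholders for the class-dependent hook and the context switch, parse_components is a hypothetical lengthy kernel loop, and the rt_arrives_at parameter simulates a wakeup that sets kprunrun part-way through the work.

```c
#include <assert.h>

static int kprunrun;        /* set when a real-time process is waiting to run */
static int switches;        /* counts simulated context switches */

static void cl_preempt_stub(void) { /* class-dependent processing would go here */ }

static void swtch_stub(void)
{
    kprunrun = 0;           /* the real pswtch() clears runrun and kprunrun */
    switches++;
}

static void preempt(void)
{
    cl_preempt_stub();
    swtch_stub();
}

/* Preemption point: a cheap flag test placed where kernel data is stable. */
#define PREEMPT() do { if (kprunrun) preempt(); } while (0)

/* A lengthy loop split into bounded work units, one preemption point each. */
static int parse_components(int ncomponents, int rt_arrives_at)
{
    int done = 0;
    for (int i = 0; i < ncomponents; i++) {
        PREEMPT();                      /* before starting the next unit */
        if (i == rt_arrives_at)
            kprunrun = 1;               /* simulated wakeup of a real-time process */
        done++;                         /* one unit of work */
    }
    return done;
}
```

The real-time process waits at most one work unit: a flag raised during unit 1 is honored at the preemption point that precedes unit 2.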
swtch() function
    o calls pswtch() to perform machine-independent part of context switch
    o pswtch()
        . clears runrun and kprunrun flags
        . selects highest-priority runnable process
        . removes highest-priority runnable process from the dispatch queue
        . updates dqactmap
        . sets state of process to SONPROC (running on processor)
        . updates memory management registers to map the u area
        . updates virtual address translation registers of the new process
    o invokes assembly language code to
        . manipulate register content
        . flush translation buffers
        . etc.
interface to scheduling classes
    generic interface provides virtual functions that are implemented by each scheduling class

    [Figure: global class table and proc structures]
    global class table
        class           initialization function   classfuncs vector
        real-time       rt_init                   rt_classfuncs
        system          sys_init                  sys_classfuncs
        time-sharing    ts_init                   ts_classfuncs
    each proc structure holds p_cid, p_clfuncs, and p_clproc; p_clproc points to the class-dependent data of that process
classfuncs structure
    vector of pointers to the functions that implement the class-dependent interface for any class

global class table
    contains one entry for each class

proc structure fields
    p_cid        class ID; index into global class table
    p_clfuncs    pointer to the classfuncs vector for the process class
    p_clproc     pointer to class-dependent private data structure
accessing class-dependent functions
    a set of macros is used to
        o resolve generic interface function calls
        o invoke the correct class-dependent function

    #define CL_SLEEP(procp, clprocp, … ) \
        (*(procp)->p_clfuncs->cl_sleep)(clprocp, … )
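The dispatch through p_clfuncs is the usual C idiom of a function-pointer vector. The sketch below is a simplified model: the one-argument signatures, the struct names proc_s and tsdata, and the ts_sleep and ts_tick bodies are assumptions for illustration, since the notes elide the remaining arguments of the real entry points.

```c
#include <assert.h>

/* Vector of class-dependent entry points (reduced to two for illustration). */
struct classfuncs {
    void (*cl_sleep)(void *clprocp);
    void (*cl_tick)(void *clprocp);
};

/* Hypothetical subset of the proc structure. */
struct proc_s {
    int                p_cid;       /* class ID */
    struct classfuncs *p_clfuncs;   /* vector for the process's class */
    void              *p_clproc;    /* class-dependent private data */
};

/* Generic interface: resolves to whatever the process's class installed. */
#define CL_SLEEP(procp, clprocp) (*(procp)->p_clfuncs->cl_sleep)(clprocp)
#define CL_TICK(procp, clprocp)  (*(procp)->p_clfuncs->cl_tick)(clprocp)

/* Toy class-dependent data and functions for a time-sharing-like class. */
struct tsdata { int timeleft; int slept; };

static void ts_sleep(void *cl) { ((struct tsdata *)cl)->slept = 1; }

static void ts_tick(void *cl)
{
    struct tsdata *t = cl;
    if (t->timeleft > 0)
        t->timeleft--;              /* consume one tick of the quantum */
}

static struct classfuncs ts_classfuncs = { ts_sleep, ts_tick };
```

The caller never names ts_tick: CL_TICK(&p, p.p_clproc) reaches it only through the vector, which is what allows a new class to be added without touching the class-independent code.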
scheduling classes determine
    the policies for priority computation
        o range of priorities for the processes
        o if and under what conditions the priority can change
        o size of the time slice each time a process runs
        o whether time slice size will depend upon the priority class, e.g., infinite time quantum for real-time tasks
    scheduling of the processes in the class
entry points of the class-dependent interface
    CL_TICK
        o clock interrupt handler initiates call
        o monitors time slice, recomputes priorities, handles time quantum expiration, etc.
    CL_FORK, CL_FORKRET
        o fork() initiates call
        o CL_FORK initializes child-specific data structure
        o CL_FORKRET may set runrun, allowing child process to run before parent
    CL_ENTERCLASS, CL_EXITCLASS
        o call is initiated when a process enters or exits a scheduling class
        o responsible for allocating & deallocating the class-dependent data structures
    CL_SLEEP
        o sleep() initiates call
        o may recompute process priority
    CL_WAKEUP
        o wakeprocs() initiates call
        o places the process on the appropriate run queue
        o may set runrun or kprunrun
    scheduling class determines the specific actions to be accomplished by a particular function, e.g., clock interrupt handler calls CL_TICK routine for the class to which the process belongs; the routine determines the exact actions to be performed
Time-Sharing Class
    default class
    dynamic priorities
    round-robin scheduling is used for processes with equal priorities
    static dispatcher parameter table controls process priorities and time slices
    time slice depends upon scheduling priority: lower priority → larger time slice
dispatcher parameter table
    index   globpri   quantum   tqexp   slpret   maxwait   lwait
    0       0         100       0       10       5         10
    1       1         100       0       11       5         11
    …       …         …         …       …        …         …
    15      15        80        7       25       5         25
    …       …         …         …       …        …         …
    40      40        20        30      50       5         50
    …       …         …         …       …        …         …
    59      59        10        49      59       5         59

    ts_globpri   global priority
    ts_quantum   time quantum
    ts_tqexp     new ts_cpupri value when the time quantum expires within ts_maxwait seconds
    ts_slpret    new ts_cpupri value when returning to user mode after sleeping
    ts_maxwait   number of seconds to wait for time quantum expiry before ts_lwait is used
    ts_lwait     new ts_cpupri value when the time quantum has not expired within ts_maxwait seconds
event-driven scheduling, i.e., process priority is changed in response to specific events related to the process
    scheduler
        o reduces priority each time the process uses its entire time slice
        o boosts priority if the process
            . blocks on event or resource
            . takes a long time to use its time slice
struct tsproc – class-dependent data
    ts_timeleft   time remaining in quantum
    ts_cpupri     system part of priority
    ts_upri       user part of priority (nice value), range [ -20, +19 ], default 0
    ts_umdpri     user-mode priority == min { max { ts_cpupri + ts_upri, 0 }, 59 }, so 0 ≤ ts_umdpri ≤ 59
    ts_dispwait   clock time (seconds) since start of quantum
process returns from sleeping in kernel
    priority is kernel priority determined by sleep condition
return to user mode
    priority restored from ts_umdpri
ts_upri may be changed by priocntl(), but only a superuser may increase it
ts_cpupri is adjusted according to dispatcher parameter table

dispatcher parameter table access
    current ts_cpupri indexes the table → ts_tqexp, ts_slpret, ts_lwait → new ts_cpupri
    ts_umdpri indexes the table → ts_globpri, ts_quantum, ts_maxwait
example
    initial state: ts_upri = 14, ts_cpupri = 1, ts_umdpri = 15, ts_globpri = 15
    time quantum expires prior to ts_maxwait time
        ts_cpupri = 0 ( ts_cpupri ← ts_tqexp ), ts_umdpri = 14
    time quantum does not expire prior to ts_maxwait time
        ts_cpupri = 11 ( ts_cpupri ← ts_lwait ), ts_umdpri = 25
    process makes a system call and blocks on resource; resumes and returns to user mode
        ts_cpupri = 11 ( ts_cpupri ← ts_slpret ), ts_umdpri = 25
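The three cases above can be checked mechanically against the dispatcher parameter table. The sketch below encodes only the five rows that the notes list; the lookup helper ts_lookup, the enum ts_event, and the function names are hypothetical, and a real table has all 60 rows.

```c
#include <assert.h>
#include <stddef.h>

struct tsdp { int globpri, quantum, tqexp, slpret, maxwait, lwait; };

struct tsdp_row { int index; struct tsdp dp; };

/* Only the rows shown in the notes; the elided rows are not guessed. */
static const struct tsdp_row ts_rows[] = {
    {  0, {  0, 100,  0, 10, 5, 10 } },
    {  1, {  1, 100,  0, 11, 5, 11 } },
    { 15, { 15,  80,  7, 25, 5, 25 } },
    { 40, { 40,  20, 30, 50, 5, 50 } },
    { 59, { 59,  10, 49, 59, 5, 59 } },
};

static const struct tsdp *ts_lookup(int index)
{
    for (size_t i = 0; i < sizeof ts_rows / sizeof ts_rows[0]; i++)
        if (ts_rows[i].index == index)
            return &ts_rows[i].dp;
    return NULL;                       /* row not listed in the notes */
}

/* ts_umdpri = min(max(ts_cpupri + ts_upri, 0), 59) */
static int ts_umdpri(int cpupri, int upri)
{
    int v = cpupri + upri;
    if (v < 0)  v = 0;
    if (v > 59) v = 59;
    return v;
}

enum ts_event { TS_QUANTUM_EXPIRED, TS_LONG_WAIT, TS_SLEEP_RETURN };

/* New ts_cpupri, selected from the row indexed by the current ts_cpupri. */
static int ts_new_cpupri(int cpupri, enum ts_event ev)
{
    const struct tsdp *row = ts_lookup(cpupri);
    assert(row != NULL);
    if (ev == TS_QUANTUM_EXPIRED) return row->tqexp;
    if (ev == TS_LONG_WAIT)       return row->lwait;
    return row->slpret;
}
```

Starting from ts_cpupri = 1 and ts_upri = 14, quantum expiry yields ts_cpupri 0 and ts_umdpri 14, while both the long-wait and sleep-return cases yield ts_cpupri 11 and ts_umdpri 25, as in the example.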
Real-Time Class
    priority range [100 – 159]
    fixed priority
    fixed quantum
    bounded dispatch latency
    bounded response time
    real-time processes are scheduled before any kernel services

    process executing in kernel mode; real-time process becomes runnable
        o real-time process must wait until current process is about to return to user mode or until it reaches a kernel preemption point

    priority and time quantum change only when a process explicitly makes a priocntl call
    only superuser processes can enter real-time class; priocntl – specify priority and time quantum

    default parameter table: lower priority → larger time slice

    struct rtproc
        o current time quantum
        o time remaining in quantum
        o current priority

    dispatch latency
        o time between the moment a process becomes runnable and the moment it begins to run

    response time
        o time between the occurrence of an event and the response to the event; sum of
            . time required by interrupt handler to process the event
            . dispatch latency
            . time required by real-time process to respond to the event
preemption points divide kernel algorithms into small bounded work units
    real-time process becomes runnable
    rt_wakeup() [class-dependent wakeup processing routine] sets kernel flag kprunrun
    current process executing kernel code reaches preemption point
        o checks flag
        o initiates context switch to real-time process

latency bounds guaranteed only when real-time process is highest-priority runnable process on system
    real-time process is running on system
    higher-priority real-time process becomes available
    higher-priority real-time process is preferentially scheduled
    latency of lower-priority process is recalculated after process regains CPU
priocntl system call -- single process
    change priority class of process
    set ts_upri for time-sharing process
    reset priority and quantum for real-time process
    obtain current values of scheduling parameters

priocntlset system call -- set of related processes
    change priority class of processes
    set ts_upri for time-sharing processes
    reset priority and quantum for real-time processes
    obtain current values of scheduling parameters
    applies to all processes
        o in system
        o in process group or session
        o in scheduling class
        o owned by particular user
        o having same parent
Analysis
    allows addition of scheduling classes to system
    time-sharing class changes priorities based on events related to process
    fast and scalable dynamic priority calculation
    event-driven scheduling favors I/O-bound and interactive jobs over CPU-bound jobs
    interactive job that also performs large computations may find the system unresponsive
    total system load and job mix characteristics determine system response
    system administrator can alter system behavior by changing dispatcher table settings and rebuilding kernel; such retuning may be required to keep system efficient and responsive

adding a scheduling class does not require access to kernel source code
    o provide implementation of each class-dependent scheduling function
    o initialize classfuncs vector to point to these functions
    o provide initialization function to perform setup tasks, e.g., allocating internal data structures
    o add entry for this class in the class table in a master configuration file, typically located in the master.d directory of the kernel build directory
    o entry contains pointers to the initialization function and classfuncs vector
    o rebuild kernel

limitations
    priocntl system call restricted to superuser
    no good way for a time-sharing class process to switch to a different class
    no provision for deadline-driven scheduling
    code path between preemption points is too long for many time-critical processes
    hard real-time systems require a fully preemptible kernel
    extremely difficult to tune system for a mixed set of applications
    system load varies constantly, hence it is not reasonable to require careful manual tuning each time the load changes
Solaris 2.x Scheduling

multithreaded, symmetric multiprocessing operating system

preemptive kernel
    fully preemptive kernel
    global kernel data structures are protected by synchronization objects
        o mutual exclusion locks (mutexes)
        o semaphores
    interrupts are implemented using special kernel threads that use the standard synchronization primitives of the kernel
    rarely needs to raise the interrupt level to protect critical regions
    few nonpreemptible code segments
    higher-priority process is scheduled as soon as it is runnable
    interrupt threads run at the highest priority
    scheduling classes can be dynamically loaded; when that happens, priorities of the interrupt threads are recomputed to ensure that they remain at the highest possible level
    blocked interrupt thread can only be restarted on the same processor

multiprocessor support
    single dispatch queue for all processors
    selected threads can be restricted to a single, specific processor
    processors communicate by sending cross-processor interrupts

per-processor data structure -- scheduling variables
    cpu_thread          thread currently running on this processor
    cpu_dispthread      thread last selected to run on this processor
    cpu_idle            idle thread for this processor
    cpu_runrun          preemption flag used for time-sharing threads
    cpu_kprunrun        kernel preemption flag set by real-time threads
    cpu_chosen_level    priority of thread that is going to preempt the current thread on this processor
example
    running threads
        Thread T1, priority 120    Processor P1
        Thread T2, priority 130    Processor P2
        Thread T3, priority 100    Processor P3
        Thread T4, priority 132    Processor P4
        Thread T5, priority 135    Processor P5
    blocked threads
        Thread T6, priority 130
        Thread T7, priority 115

    event on processor P1 makes thread T6 runnable
        kernel
            o places T6 on dispatch queue
            o calls cpu_choose() to find processor with the lowest-priority thread (P3, running T3)
        cpu_choose()
            . marks selected processor for preemption
            . sets cpu_chosen_level to priority of T6, value 130
            . sends cross-processor interrupt

    event on processor P2 makes thread T7 runnable
        kernel
            o places T7 on dispatch queue
            o calls cpu_choose() to find processor with the lowest-priority thread
        cpu_choose()
            . compares cpu_chosen_level (130) to priority of T7, value 115
            . does not mark selected processor for preemption

    processor P3
        o handles the interrupt
        o preempts thread T3
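The role of cpu_chosen_level in this example can be modeled as follows. This is a simplified sketch, not the Solaris routine: struct cpu keeps only two fields, a value of -1 for cpu_chosen_level is an assumed encoding for "no preemption pending", and the cross-processor interrupt is reduced to a comment.

```c
#include <assert.h>

struct cpu {
    int running_pri;        /* priority of the thread now on this processor */
    int cpu_chosen_level;   /* priority of the thread about to preempt it; -1 = none (assumed encoding) */
};

/* Pick the processor running the lowest-priority thread and mark it for
 * preemption by a thread of priority new_pri. Returns the processor index,
 * or -1 if no processor is marked. */
static int cpu_choose(struct cpu cpus[], int ncpu, int new_pri)
{
    assert(ncpu > 0);
    int target = 0;
    for (int i = 1; i < ncpu; i++)
        if (cpus[i].running_pri < cpus[target].running_pri)
            target = i;

    if (new_pri <= cpus[target].running_pri)
        return -1;                       /* nothing worth preempting */
    if (cpus[target].cpu_chosen_level >= new_pri)
        return -1;                       /* a thread of equal or higher priority already claimed it */

    cpus[target].cpu_chosen_level = new_pri;
    /* a real kernel would send a cross-processor interrupt to `target` here */
    return target;
}
```

With the five processors of the example, the call for T6 (priority 130) marks P3, and the later call for T7 (priority 115) leaves the mark untouched because cpu_chosen_level is already 130.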
hidden scheduling
priority inversion
implementation of priority inheritance
limitations of priority inheritance
turnstiles
analysis
Mach Scheduling