Chapter 5 Kernel Synchronization

Hsung-Pin Chang
Department of Computer Science
National Chung Hsing University

Outline
• Kernel Control Paths
• When Synchronization Is Not Necessary
• Synchronization Primitives
• Synchronizing Accesses to Kernel Data Structures
• Examples of Race Condition Prevention

Kernel Control Paths
• Kernel control path
  – A sequence of instructions executed by the kernel to handle a system call, an exception, or an interrupt
  – Each kernel request is handled by a different kernel control path
• Kernel requests may be issued in several possible ways
  – A process executing in User Mode causes an exception, for instance by executing an int 0x80 instruction
  – An external device sends a signal to a Programmable Interrupt Controller
  – A process executing in Kernel Mode causes a Page Fault exception
  – A process running in a multiprocessor (MP) system and executing in Kernel Mode raises an interprocessor interrupt
• A kernel control path is quite similar to a process, except that it
  – Does not have a descriptor
  – Is not scheduled through the scheduler, but rather by inserting a sequence of instructions into the kernel code
• In some cases, the CPU interleaves kernel control paths when one of the following events occurs
  – A process switch occurs, i.e., when the schedule() function is invoked
  – An interrupt occurs while the CPU is running a kernel control path with interrupts enabled
  – A deferrable function is executed
• Thus, some kernel data structures must be protected to prevent race conditions
  – The code that modifies these data structures must be in a critical section

When Synchronization Is Not Necessary
• The Linux kernel is not preemptive
  – A running process cannot be preempted while it remains in Kernel Mode
• As a result, in Linux
  – No process running in Kernel Mode may be replaced by another process, except when the former voluntarily relinquishes control of the CPU
  – Interrupt, exception, or softirq handling can interrupt a process running in Kernel Mode
    • However, when the handler terminates, the kernel control path of the process is resumed
  – A kernel control path performing interrupt handling cannot be interrupted by a kernel control path executing a deferrable function or a system call service routine
• Thus, on a uniprocessor (UP)
  – Kernel data structures that are not updated by interrupt, exception, or softirq handlers can be safely accessed
• However, on MP, things are much more complicated
• The rest of the chapter describes what to do when synchronization is necessary

Synchronization Primitives
• Atomic Operations
• Memory Barriers
• Spin Locks
• Read/Write Spin Locks
• The Big Reader Lock
• Semaphores
• Read/Write Semaphores
• Completions
• Local Interrupt Disabling
• Global Interrupt Disabling
• Disabling Deferrable Functions

Technique                    Description                                            Scope
Atomic operation             Atomic read-modify-write instruction to a counter     All CPUs
Memory barrier               Avoid instruction reordering                           Local CPU
Spin lock                    Lock with busy wait                                    All CPUs
Semaphore                    Lock with blocking wait (sleep)                        All CPUs
Local interrupt disabling    Forbid interrupt handling on a single CPU              Local CPU
Local softirq disabling      Forbid deferrable function handling on a single CPU    Local CPU
Global interrupt disabling   Forbid interrupt and softirq handling on all CPUs      All CPUs

Atomic Operations
• Some instructions are of type “read-modify-write”
• If two such non-atomic instructions are issued by two CPUs to access the same memory location
  – The memory arbiter may grant access to the second one while the first one has not yet completed
  – The result is a race condition
• To prevent race conditions
  – Provide operations that are atomic at the chip level
  – Such operations cannot be interrupted in the middle, and they prevent other CPUs from accessing the same memory location
• Atomic operations act as the base of other, more flexible mechanisms for creating critical sections
• 80x86 instructions that are atomic
  – Instructions that make zero or one aligned memory access
  – Read-modify-write instructions, e.g., inc or dec, are atomic if no other processor has taken the memory bus after the read and before the write
    • In a uniprocessor system, memory bus stealing never happens
  – Read-modify-write instructions whose opcode is prefixed by the lock byte (0xf0) are atomic even on MP
    • The control unit (CU) locks the memory bus until the instruction is completed
  – Instructions whose opcode is prefixed by a rep byte (0xf2 or 0xf3) are not atomic
    • rep: the CU repeats the same instruction several times
    • The CU checks for pending interrupts before each new iteration
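The effect of an atomic read-modify-write can be illustrated outside the kernel. The following is only a minimal user-space sketch: it assumes C11 `<stdatomic.h>` and POSIX threads as stand-ins for kernel primitives, and the names `counter_worker`, `run_counter_demo`, `NTHREADS`, and `NITERS` are hypothetical. On 80x86, compilers typically implement `atomic_fetch_add` with a lock-prefixed instruction.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NTHREADS 4        /* hypothetical number of concurrent "CPUs" */
#define NITERS   100000   /* increments performed by each thread */

static atomic_int shared_counter;   /* user-space stand-in for an atomic counter */

/* Each thread performs NITERS atomic read-modify-write increments. */
static void *counter_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < NITERS; i++)
        atomic_fetch_add(&shared_counter, 1);
    return NULL;
}

/* Starts the workers, waits for them, and returns the final counter value. */
int run_counter_demo(void)
{
    pthread_t tid[NTHREADS];

    atomic_store(&shared_counter, 0);
    for (int i = 0; i < NTHREADS; i++)
        pthread_create(&tid[i], NULL, counter_worker, NULL);
    for (int i = 0; i < NTHREADS; i++)
        pthread_join(tid[i], NULL);
    return atomic_load(&shared_counter);
}
```

Because every increment is atomic, no update is lost and `run_counter_demo()` returns `NTHREADS * NITERS`. With a plain `int` and `counter++`, two threads could read the same old value and the final count could be lower.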
• We do not know whether the compiler will use a single, atomic instruction for an operation
  – Linux thus provides
    • The atomic_t data type, a 24-bit atomically accessible counter
    • Atomic operations on it
  – See Tables 5-2 and 5-3

Memory Barriers
• The compiler may optimize the code
  – It may reorder the execution of instructions
• However, for synchronization
  – Instruction reordering must be avoided
  – Thus, all synchronization primitives act as memory barriers
• A memory barrier primitive ensures that
  – The operations placed before the primitive are finished before starting the operations placed after the primitive
  – It is like a firewall that no instruction can pass
• 80x86 instructions that are “serializing” because they act as memory barriers
  – Instructions that operate on I/O ports
  – Instructions prefixed by the lock byte
  – Instructions that write to control registers, system registers, or debug registers
  – A few special instructions, e.g., iret

Memory Barriers in Linux
Macro        Description
mb()         Memory barrier for MP and UP
rmb()        Read memory barrier for MP and UP
wmb()        Write memory barrier for MP and UP
smp_mb()     Memory barrier for MP only, does nothing on UP
smp_rmb()    Read memory barrier for MP only, does nothing on UP
smp_wmb()    Write memory barrier for MP only, does nothing on UP

Linux Implementation of Memory Barriers
• Depends on the system architecture
• On the Intel platform
  – rmb() expands to
    • asm volatile("lock; addl $0,0(%%esp)":::"memory")
  – volatile:

Spin Locks
• Spin locks are a special kind of lock designed to work in an MP system
  – If the lock is closed, the kernel control path spins around, i.e., repeatedly executes a tight loop, until the lock is released
  – Useless in a UP system
    • The waiting kernel control path would keep running, and the holding kernel control path would have no chance to release the lock
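The busy-wait behaviour of a spin lock can be sketched in user space. This is an illustration under stated assumptions, not the kernel's implementation: it uses the C11 `atomic_flag` type and POSIX threads, and the names `demo_spinlock`, `demo_spin_lock`, `demo_spin_unlock`, and `run_spin_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define SPIN_THREADS 4
#define SPIN_ITERS   20000

/* Hypothetical user-space spin lock: flag set means "locked". */
typedef struct { atomic_flag locked; } demo_spinlock;

static demo_spinlock ds_lock = { ATOMIC_FLAG_INIT };
static long protected_total;   /* plain variable, only touched under ds_lock */

static void demo_spin_lock(demo_spinlock *l)
{
    /* Atomically set the flag; keep looping while it was already set. */
    while (atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire))
        ;   /* spin: the waiter keeps running instead of sleeping */
}

static void demo_spin_unlock(demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}

static void *spin_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < SPIN_ITERS; i++) {
        demo_spin_lock(&ds_lock);
        protected_total++;          /* very short critical section */
        demo_spin_unlock(&ds_lock);
    }
    return NULL;
}

/* Runs the workers and returns the total accumulated under the lock. */
long run_spin_demo(void)
{
    pthread_t tid[SPIN_THREADS];

    protected_total = 0;
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_create(&tid[i], NULL, spin_worker, NULL);
    for (int i = 0; i < SPIN_THREADS; i++)
        pthread_join(tid[i], NULL);
    return protected_total;
}
```

The waiting thread never gives up the CPU, which is only reasonable when the critical section is very short and the lock holder runs on another processor.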
• Spin locks are useful because many kernel resources are locked for only a fraction of a millisecond
  – Thus, it would be far more time-consuming to release the CPU and reacquire it later
• The following functions are used to initialize, test, and set spin locks
  – All these functions are based on atomic operations

Function              Description
spin_lock_init()      Set the spin lock to 1 (unlocked)
spin_lock()           Cycle until the spin lock becomes 1 (unlocked), then set it to 0 (locked)
spin_unlock()         Set the spin lock to 1 (unlocked)
spin_unlock_wait()    Wait until the spin lock becomes 1 (unlocked)
spin_is_locked()      Return 0 if the spin lock is 1 (unlocked); 1 otherwise
spin_trylock()        Set the spin lock to 0 (locked), and return 1 if the lock is obtained; 0 otherwise

• spin_lock(): acquire a spin lock
    1: lock; decb slp    ; slp: the spin lock's address; the decrement is atomic
       jns 3f            ; jump if the result is not negative (lock acquired)
    2: cmpb $0, slp      ; compare slp with 0
       pause             ; see below
       jle 2b            ; keep spinning while slp <= 0 (still locked)
       jmp 1b            ; lock looks free: retry, since another processor may have grabbed it
    3:                   ; the lock is acquired
• pause: a P4 instruction that optimizes the execution of spin lock loops
  – Backward compatible: older processors execute it as rep; nop, which does nothing
• spin_unlock(): release a spin lock
    lock; movb $1, slp

Read/Write Spin Locks
• Introduced to increase the amount of concurrency inside the kernel
  – Allow several kernel control paths to simultaneously read the same data structure (DS)
    • As long as no one modifies it
  – However, to write to the DS, a kernel control path must acquire the write lock

The Big Reader Lock
• For MP systems
  – Skip!!!

Semaphores
• A lock primitive that allows waiters to sleep until the desired resource becomes free
• Linux provides two kinds of semaphores
  – Kernel semaphores, used by kernel control paths
  – System V IPC semaphores, used by User Mode processes
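A lock whose waiters sleep rather than spin can be sketched in user space as follows. This is not how kernel semaphores are implemented. It is a hedged illustration that assumes a POSIX mutex and condition variable, and that adopts the convention that a negative counter records the number of sleeping waiters. The names `demo_sem`, `demo_down`, `demo_up`, and `run_sem_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  wakeup;
    int count;     /* >0 free, 0 busy with no waiters, <0 busy with -count waiters */
    int pending;   /* wake-ups granted by demo_up() but not yet consumed */
} demo_sem;

void demo_sem_init(demo_sem *s, int value)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->wakeup, NULL);
    s->count = value;
    s->pending = 0;
}

/* Acquire: decrement the counter and sleep if the resource is busy. */
void demo_down(demo_sem *s)
{
    pthread_mutex_lock(&s->mutex);
    if (--s->count < 0) {
        while (s->pending == 0)                 /* sleep instead of spinning */
            pthread_cond_wait(&s->wakeup, &s->mutex);
        s->pending--;
    }
    pthread_mutex_unlock(&s->mutex);
}

/* Release: increment the counter and wake one sleeper if there is any. */
void demo_up(demo_sem *s)
{
    pthread_mutex_lock(&s->mutex);
    if (++s->count <= 0) {
        s->pending++;
        pthread_cond_signal(&s->wakeup);
    }
    pthread_mutex_unlock(&s->mutex);
}

static demo_sem sem;
static int got_it;   /* set by the waiter once it has acquired the semaphore */

static void *sem_waiter(void *arg)
{
    (void)arg;
    demo_down(&sem);   /* sleeps until the main thread calls demo_up() */
    got_it = 1;
    demo_up(&sem);
    return NULL;
}

/* Returns 1 if the waiter stayed blocked until the holder released. */
int run_sem_demo(void)
{
    pthread_t t;
    int c, blocked;

    got_it = 0;
    demo_sem_init(&sem, 1);
    demo_down(&sem);                            /* count: 1 -> 0 */
    pthread_create(&t, NULL, sem_waiter, NULL);
    do {                                        /* wait until the waiter is asleep */
        sched_yield();
        pthread_mutex_lock(&sem.mutex);
        c = sem.count;
        pthread_mutex_unlock(&sem.mutex);
    } while (c >= 0);
    blocked = (got_it == 0);
    demo_up(&sem);                              /* wakes the waiter */
    pthread_join(t, NULL);
    return blocked && got_it == 1 && sem.count == 1;
}
```

Unlike a spinning waiter, the blocked thread consumes no CPU time while it waits, at the cost of a context switch on each sleep and wake-up.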
• A kernel semaphore is similar to a spin lock
  – It does not allow a kernel control path to proceed unless the lock is open
  – However, if the protected resource is busy
    • The process is suspended (blocked)
• Thus, kernel semaphores can be acquired only by functions that are allowed to sleep
  – Interrupt handlers and deferrable functions cannot use them
• Kernel semaphore: an object of type struct semaphore
  – count: stores an atomic_t value
    • > 0: the resource is free
    • = 0: the resource is busy but no one is waiting
    • < 0: the resource is unavailable and at least one process is waiting for it
  – wait
    • Stores the address of a wait queue list that includes all sleeping processes waiting for this semaphore
  – sleepers
    • Stores a flag that indicates whether some processes are sleeping on the semaphore
• up()
  – Increment the count value
  – If (count > 0)
    • No process is waiting, do nothing
  – Else
    • Wake up one sleeping process
• down()
  – Decrement the count value
  – If (count >= 0)
    • Acquire the resource
  – Else
    • Change the state from TASK_RUNNING to TASK_UNINTERRUPTIBLE
    • Put the process in the semaphore wait queue
    • Call schedule()
• Only exception handlers and system call service routines can use the down() function
  – Exception handlers can block on a semaphore since Linux takes special care to avoid race conditions
  – Interrupt handlers or deferrable functions must not invoke down()
    • Since the down() function suspends the process when the semaphore is busy
  – Linux provides down_trylock()
    • If the resource is busy, it returns immediately instead of blocking

Read/Write Semaphores
• A new feature of Linux 2.4
• Similar to read/write spin locks
  – Except that waiting processes are suspended until the semaphore becomes open
  – Increase the amount of concurrency inside the kernel and improve system performance
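The "many readers or one writer" rule with sleeping waiters has a close user-space analogue in POSIX read/write locks. The sketch below assumes `pthread_rwlock_t` as a stand-in for a kernel read/write semaphore; its wake-up policy is implementation-defined and need not match the kernel's. The names `table_lock`, `table_value`, `read_table`, `write_table`, and `readers_share_writers_exclude` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>

static pthread_rwlock_t table_lock = PTHREAD_RWLOCK_INITIALIZER;
static int table_value = 42;   /* hypothetical shared data structure */

/* Readers take the lock in shared mode. */
int read_table(void)
{
    int v;

    pthread_rwlock_rdlock(&table_lock);
    v = table_value;
    pthread_rwlock_unlock(&table_lock);
    return v;
}

/* Writers take the lock in exclusive mode. */
void write_table(int v)
{
    pthread_rwlock_wrlock(&table_lock);
    table_value = v;
    pthread_rwlock_unlock(&table_lock);
}

/* Returns 1 if a second read lock is admitted while a write attempt is refused. */
int readers_share_writers_exclude(void)
{
    int second_reader_ok, writer_refused;

    pthread_rwlock_rdlock(&table_lock);                       /* first reader */
    second_reader_ok = (pthread_rwlock_tryrdlock(&table_lock) == 0);
    writer_refused = (pthread_rwlock_trywrlock(&table_lock) == EBUSY);
    if (second_reader_ok)
        pthread_rwlock_unlock(&table_lock);                   /* drop second read lock */
    pthread_rwlock_unlock(&table_lock);                       /* drop first read lock */
    return second_reader_ok && writer_refused;
}
```

The try variants are used only so that the check cannot block; ordinary code would call the blocking `rdlock` and `wrlock` functions and sleep until the lock becomes available.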
• The kernel handles all processes waiting for a read/write semaphore in strict FIFO order
  – A waiting process is inserted in the last position of the wait queue, and processes are selected from the first position
• If the first process is a reader
  – Any other readers following it are also woken up, until a writer is encountered
• If the first process is a writer
  – Only that process is woken up
• Each read/write semaphore is described by a rw_semaphore structure
  – count: 32 bits
    • Bits 16 to 31: number of non-waiting writers (0 or 1) + number of waiting kernel control paths
    • Bits 0 to 15: number of non-waiting readers and writers
  – wait_list: a list of waiting processes
  – wait_lock: a spin lock used to protect the wait queue list and the rw_semaphore structure

Completions
• Introduced in Linux 2.4
• Used to solve a race condition that occurs in MP systems
  – Skip!

Local Interrupt Disabling
• An effective way to protect DS that are also accessed by interrupt handlers
  – However, it does not protect against concurrent accesses to the DS by interrupt handlers running on other CPUs
  – In an MP system, local interrupt disabling is often coupled with spin locks
    • See later section
• Disabling interrupts
  – cli: clears the IF flag of the eflags register
• Enabling interrupts
  – sti: sets the IF flag of the eflags register
• At the end of the critical section
  – The kernel can’t simply set the IF flag again
  – Since interrupts can execute in nested fashion
    • The kernel does not know what the IF flag was before the current control path executed
  – Thus, the control path must save the old setting of the flag and restore it at the end
      __save_flags(old);
      __cli();
      […]
      __restore_flags(old);

Global Interrupt Disabling
• Global interrupt disabling significantly lowers the system concurrency level
  – It should not be used because it can be replaced by more efficient synchronization techniques
    • Mentioned later
• Global interrupt disabling is still available in Linux 2.4 to support old device drivers
  – It has been removed from Linux 2.5
    • Skip!!!

Disabling Deferrable Functions
• Deferrable functions can be executed at unpredictable times
  – DS accessed both by the current control path and by deferrable functions must be protected against race conditions
• A trivial way to forbid deferrable function execution is to disable interrupts
  – However, in some cases (mentioned later), the kernel must disable deferrable functions without disabling interrupts
• Disabling deferrable functions
  – Set __local_bh_count to a nonzero value
  – do_softirq() checks this value
    • If it is nonzero, softirqs are not executed
  – Tasklets and bottom halves are implemented on top of softirqs
    • Thus, setting __local_bh_count to a nonzero value disables all deferrable functions

Synchronizing Accesses to Kernel Data Structures
• The rule of thumb for choosing among synchronization primitives in the kernel
  – Always keep the concurrency level as high as possible in the system
• In turn, the concurrency level depends on two factors
  – The number of I/O devices that operate concurrently
    • Thus, interrupts should be disabled for as short a time as possible
  – The number of CPUs that do productive work
    • Thus, spin locks should be avoided whenever possible

Choosing Among Spin Locks, Semaphores, and Interrupt Disabling
• The choice of synchronization primitives depends on what kinds of kernel control paths access the DS
  – As shown in the following table

Protection Required by DS Accessed by Kernel Control Paths
Kernel control paths accessing the DS             UP protection                MP further protection
Exceptions                                        Semaphore                    None
Interrupts                                        Local interrupt disabling    Spin lock
Deferrable functions                              None                         None or spin lock
Exceptions + Interrupts                           Local interrupt disabling    Spin lock
Exceptions + Deferrable functions                 Local softirq disabling      Spin lock
Interrupts + Deferrable functions                 Local interrupt disabling    Spin lock
Exceptions + Interrupts + Deferrable functions    Local interrupt disabling    Spin lock

Protecting a DS Accessed by Exceptions
• When a DS is accessed only by exception handlers, e.g., system call service routines
  – This DS usually represents a resource that can be assigned to one or more processes
• Thus, race conditions are avoided through semaphores
  – The waiting process can go to sleep
  – Semaphores work in both UP and MP

Protecting a DS Accessed by Interrupts
• A DS is accessed by only the “top half” of one interrupt handler
  – Each interrupt handler is serialized with respect to itself
    • It cannot execute more than once concurrently
  – Thus, accessing the DS does not require any synchronization primitive
• However, a DS may be accessed by more than one interrupt handler
  – Requires synchronization!
• In UP
  – Disable interrupts in all critical regions of the interrupt handlers
  – In contrast, a semaphore can block the process, which is forbidden in an interrupt handler
  – A spin lock, on the other hand, can freeze the system
• In MP, simply disabling local interrupts is not enough, since interrupts may still occur on other processors
  – Disable local interrupts
    • Prevents interference from other interrupt handlers on the same CPU
  – Use a spin lock (or read/write spin lock)
    • Prevents interference from interrupt handlers on other CPUs
• Linux provides several macros that couple local interrupt enabling/disabling with spin lock handling
  – Table 5-7

Protecting a DS Accessed by Deferrable Functions
• The kind of deferrable function determines the protection scheme for a DS accessed by deferrable functions
• In UP, no race condition may exist
  – Execution of deferrable functions is always serialized on a CPU
  – A deferrable function cannot be interrupted by another deferrable function
  – Thus, no synchronization primitive is required
• In MP, race conditions exist since several deferrable functions may run concurrently

Deferrable functions accessing the DS    Protection
Softirqs                                 Spin lock
One tasklet                              None
Many tasklets                            Spin lock
Bottom halves                            None

• For softirqs
  – A DS accessed by softirqs must always be protected
  – Because the same softirq can run concurrently on two or more CPUs
• For tasklets
  – No protection is needed if a DS is accessed only by one kind of tasklet
    • Tasklets of the same kind cannot run concurrently on an MP system
  – Protection is needed if the DS is accessed by several kinds of tasklets
• For bottom halves
  – Need not be protected
  – Because bottom halves never run concurrently
• It is also possible to prevent race conditions by globally disabling deferrable functions with the cli() macro
  – Should be avoided since it also disables the execution of interrupt handlers on all CPUs

Protecting a DS Accessed by Exceptions and Interrupts
• In UP
  – Interrupt handlers cannot be interrupted by exceptions
  – Thus, use local interrupt disabling
• In MP
  – Local interrupt disabling + spin lock
• Local interrupt disabling disables local interrupts
• The spin lock prevents interrupt handlers on other processors from accessing the shared DS
  – If only a spin lock is used in MP, without local interrupt disabling
    • The spin lock does nothing against the local CPU: interrupt handlers on the local processor may still be invoked, and a handler that spins on a lock held by the interrupted control path freezes the system

Protecting a DS Accessed by Exceptions and Deferrable Functions
• Local interrupt disabling + spin lock is sufficient
  – Since deferrable functions are essentially activated by interrupt occurrences
  – No exception can be raised while a deferrable function is running
• However,
  – The exception handler can simply disable deferrable functions instead of local interrupts
  – Thus, interrupts continue to be serviced
• Thus, in UP
  – Local softirq disabling
• In MP
  – Local softirq disabling + spin lock

Protecting a DS Accessed by Interrupts and Deferrable Functions
• An interrupt might be raised while a deferrable function is running
• But no deferrable function can stop an interrupt handler
• Thus, in UP
  – Local interrupt disabling
• In MP
  – Local interrupt disabling + spin lock

Protecting a DS Accessed by Exceptions, Interrupts, and Deferrable Functions
• In UP
  – Local interrupt disabling
• In MP
  – Local interrupt disabling + spin lock
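The "local interrupt disabling + spin lock" combination can be imitated in user space. The sketch below is an analogy under explicit assumptions, not kernel code: a POSIX signal handler plays the role of the interrupt handler, `pthread_sigmask()` plays the role of saving, clearing, and restoring the IF flag, and a C11 `atomic_flag` plays the role of the spin lock. The names `on_sigusr1`, `ds_spin`, `shared_ds`, and `run_irqsave_demo` are hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <signal.h>
#include <stdatomic.h>
#include <stddef.h>

static atomic_flag ds_spin = ATOMIC_FLAG_INIT;   /* "spin lock" guarding the DS */
static volatile sig_atomic_t shared_ds;          /* the protected DS */
static volatile sig_atomic_t handler_runs;       /* how often the handler ran */

/* Plays the interrupt handler: it also takes the spin lock before touching the DS. */
static void on_sigusr1(int signo)
{
    (void)signo;
    while (atomic_flag_test_and_set(&ds_spin))
        ;   /* would spin forever if it interrupted the lock holder */
    shared_ds += 10;
    handler_runs++;
    atomic_flag_clear(&ds_spin);
}

/* Returns 1 if the handler was kept out of the critical section. */
int run_irqsave_demo(void)
{
    struct sigaction sa;
    sigset_t block, old;
    int ran_inside;

    shared_ds = 0;
    handler_runs = 0;
    sa.sa_handler = on_sigusr1;
    sigemptyset(&sa.sa_mask);
    sa.sa_flags = 0;
    sigaction(SIGUSR1, &sa, NULL);

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    pthread_sigmask(SIG_BLOCK, &block, &old);    /* like __save_flags(old); __cli(); */
    while (atomic_flag_test_and_set(&ds_spin))
        ;                                        /* take the spin lock */

    raise(SIGUSR1);                              /* "interrupt" arrives: stays pending */
    shared_ds += 1;
    ran_inside = handler_runs;                   /* still 0: handler was held off */

    atomic_flag_clear(&ds_spin);                 /* release the spin lock */
    pthread_sigmask(SIG_SETMASK, &old, NULL);    /* like __restore_flags(old) */

    /* The pending signal is delivered once the old mask is restored. */
    return ran_inside == 0 && handler_runs == 1 && shared_ds == 11;
}
```

If the signal were not blocked first, the handler could run while the lock is held by the interrupted code and spin forever, which mirrors the freeze described for a spin lock used without local interrupt disabling. The demo assumes that SIGUSR1 is not already blocked when `run_irqsave_demo()` is called.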