Chapter 5 Kernel Synchronization

Hsung-Pin Chang Department of Computer Science National Chung Hsing University

Outline
• Kernel Control Paths
• When Synchronization Is Not Necessary
• Synchronization Primitives
• Synchronizing Accesses to Kernel Data Structures
• Examples of Race Condition Prevention

Kernel Control Paths
• Kernel control path
  – A sequence of instructions executed by the kernel to handle kernel requests of various kinds
  – Each kernel request is handled by a different kernel control path

Kernel Control Paths (Cont.)
• Kernel requests may be issued in several ways
  – A process executing in User Mode causes an exception, for instance, by executing an int 0x80 instruction
  – An external device sends a signal to a Programmable Interrupt Controller

Kernel Control Paths (Cont.)
  – A process executing in Kernel Mode causes a Page Fault exception
  – A process running on an MP system and executing in Kernel Mode raises an interprocessor interrupt

Kernel Control Paths (Cont.)
• A kernel control path is quite similar to a process, except that it
  – Does not have a process descriptor
  – Is not scheduled through the scheduler
    • Instead, it is stopped and resumed by sequences of instructions inserted into the kernel code

Kernel Control Paths (Cont.)
• In some cases, the CPU interleaves kernel control paths, namely when one of the following events occurs
  – A process switch occurs, i.e., the schedule() function is invoked
  – An interrupt occurs while the CPU is running a kernel control path with interrupts enabled
  – A deferrable function is executed

Kernel Control Paths (Cont.)
• Thus, some kernel data structures must be protected to prevent race conditions
  – The code that modifies these data structures must be placed in a critical section

When Synchronization Is Not Necessary
• The kernel is not preemptive
  – A running process cannot be preempted while it remains in Kernel Mode
• As a result, in Linux
  – No process running in Kernel Mode may be replaced by another process, except when the former voluntarily relinquishes control of the CPU

When Synchronization Is Not Necessary (Cont.)
  – Interrupt, exception, or softirq handling can interrupt a process running in Kernel Mode, for example, while it executes a system call
    • However, when the handler terminates, the kernel control path of the process is resumed
  – A kernel control path performing interrupt handling cannot be interrupted by a kernel control path executing a deferrable function or a system call service routine

When Synchronization Is Not Necessary (Cont.)
• Thus, on a uniprocessor (UP)
  – Kernel data structures that are not updated by interrupt, exception, or softirq handlers can be safely accessed
• However, on a multiprocessor (MP), things are much more complicated
• The rest of the chapter describes what to do when synchronization is necessary

Synchronization Primitives
• Atomic Operations
• Memory Barriers
• Spin Locks
• Read/Write Spin Locks
• The Big Reader Lock
• Semaphores
• Read/Write Semaphores
• Completions
• Local Interrupt Disabling
• Global Interrupt Disabling
• Disabling Deferrable Functions

Synchronization Primitives (Cont.)

Technique                   Description                                           Scope
Atomic operation            Atomic read-modify-write instruction to a counter     All CPUs
Memory barrier              Avoid instruction reordering                          Local CPU
Spin lock                   Lock with busy wait                                   All CPUs
Semaphore                   Lock with blocking wait (sleep)                       All CPUs
Local interrupt disabling   Forbid interrupt handling on a single CPU             Local CPU
Local softirq disabling     Forbid deferrable function handling on a single CPU   Local CPU
Global interrupt disabling  Forbid interrupt and softirq handling on all CPUs     All CPUs

Atomic Operations
• Some instructions are of type “read-modify-write”
• If two such non-atomic instructions are issued by two CPUs to access the same memory location
  – The memory arbiter may grant memory access to the second one before the first one has completed
  – The result is a race condition

Atomic Operations (Cont.)
• To prevent race conditions
  – Provide operations that are atomic at the chip level
  – Thus, they cannot be interrupted in the middle, and other CPUs cannot access the same memory location while they execute
• Atomic operations act as the basis of other, more flexible mechanisms for creating critical sections

Atomic Operations (Cont.)
• 80x86 instructions that are atomic
  – Instructions that make zero or one aligned memory access
  – Read-modify-write instructions, e.g., inc or dec, are atomic if no other processor takes the memory bus between the read and the write
    • On a uniprocessor, memory bus stealing never happens

Atomic Operations (Cont.)
  – Read-modify-write instructions whose opcode is prefixed by the lock byte (0xf0) are atomic even on MP
    • The control unit (CU) locks the memory bus until the instruction is completed
  – Instructions whose opcode is prefixed by a rep byte (0xf2, 0xf3) are not atomic
    • rep: the CU repeats the same instruction several times
    • The CU checks for pending interrupts before each new iteration

Atomic Operations (Cont.)
• We don’t know whether the compiler will use a single, atomic instruction for an operation, e.g., a++;
• Linux thus provides
  – The atomic_t data type
    • 24-bit atomically accessible counter
  – Atomic operations
    • Tables 5-2 and 5-3

Table 5-2. Atomic Operations in Linux

Function                   Description
atomic_read(v)             Return *v
atomic_set(v, i)           Set *v to i
atomic_add(i, v)           Add i to *v
atomic_sub(i, v)           Subtract i from *v
atomic_sub_and_test(i, v)  Subtract i from *v and return 1 if the result is zero, 0 otherwise
atomic_inc(v)              Add 1 to *v
atomic_dec(v)              Subtract 1 from *v
atomic_dec_and_test(v)     Subtract 1 from *v and return 1 if the result is zero, 0 otherwise
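As a rough user-space illustration of Table 5-2, the sketch below mimics a few of the listed operations with C11 <stdatomic.h>. The type my_atomic_t and the my_atomic_* functions are hypothetical stand-ins introduced here for illustration; they are not the kernel's implementation, which relies on lock-prefixed assembly instructions.

```c
#include <stdatomic.h>

/* Hypothetical user-space stand-in for the kernel's atomic_t. */
typedef struct { atomic_int counter; } my_atomic_t;

/* Return *v */
int my_atomic_read(my_atomic_t *v) { return atomic_load(&v->counter); }

/* Set *v to i */
void my_atomic_set(my_atomic_t *v, int i) { atomic_store(&v->counter, i); }

/* Add i to *v (argument order follows Table 5-2: value first) */
void my_atomic_add(int i, my_atomic_t *v) { atomic_fetch_add(&v->counter, i); }

/* Add 1 to *v */
void my_atomic_inc(my_atomic_t *v) { atomic_fetch_add(&v->counter, 1); }

/* Subtract 1 from *v and return 1 if the result is zero, 0 otherwise.
 * atomic_fetch_sub returns the value held BEFORE the subtraction,
 * so the result is zero exactly when the old value was 1. */
int my_atomic_dec_and_test(my_atomic_t *v)
{
    return atomic_fetch_sub(&v->counter, 1) == 1;
}
```

A typical use of the dec-and-test form is reference counting: the caller that sees a return value of 1 knows it dropped the last reference and may free the object.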

Memory Barriers
• The compiler may optimize the code
  – It may reorder the execution of instructions
• However, for synchronization
  – Instruction reordering must be avoided
  – In fact, all synchronization primitives act as memory barriers

Memory Barriers (Cont.)
• A memory barrier primitive ensures that
  – The operations placed before the primitive are finished before the operations placed after the primitive are started
  – It is like a firewall that cannot be passed by any instruction

Memory Barriers (Cont.)
• The following 80x86 instructions are “serializing” because they act as memory barriers
  – Instructions that operate on I/O ports
  – Instructions prefixed by the lock byte
  – Instructions that write to control registers, system registers, or debug registers
  – A few special instructions, e.g., iret

Memory Barriers in Linux
• Linux uses six memory barrier primitives
  – See the next slide
• Memory barriers are useful in both MP and uniprocessor systems

Memory Barriers in Linux

Macro      Description
mb()       Memory barrier for MP and UP
rmb()      Read memory barrier for MP and UP
wmb()      Write memory barrier for MP and UP
smp_mb()   Memory barrier for MP only, does nothing on UP
smp_rmb()  Read memory barrier for MP only, does nothing on UP
smp_wmb()  Write memory barrier for MP only, does nothing on UP

Linux Implementation of Memory Barrier
• Depends on the system architecture
• On the Intel platform
  – rmb() expands to
    • asm volatile("lock; addl $0,0(%%esp)":::"memory")
  – asm: tells the compiler to insert some assembly language instructions
  – volatile: forbids the compiler from reordering the asm instruction with other instructions
  – The lock prefix makes the instruction a memory barrier
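The ordering guarantee can be sketched in portable user-space C with C11 fences. This is only a loose analogy: the release fence plays a role roughly similar to wmb() and the acquire fence roughly similar to rmb(). The names publish, try_consume, payload, and ready are hypothetical and chosen for this example.

```c
#include <stdatomic.h>

static int payload;        /* ordinary data written before the flag */
static atomic_int ready;   /* flag that announces the data */

/* Writer: make the payload visible before the flag is set. */
void publish(int value)
{
    payload = value;
    /* Write-side barrier: the payload store may not be reordered
     * after the flag store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader: returns 1 and fills *out if the data has been published,
 * 0 otherwise. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* Read-side barrier: the payload load may not be reordered
     * before the flag load above. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

Without the two fences, the compiler or the CPU would be free to reorder the flag and payload accesses, and a reader on another CPU could observe the flag set while still reading a stale payload.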

Spin Locks
• Spin locks are a special kind of lock designed to work in an MP system
  – If the lock is closed, the kernel control path spins around, i.e., repeatedly executes a tight loop, until the lock is released
  – Useless in a UP system
    • The waiting kernel control path would keep running, and the holding kernel control path would have no chance to release the lock

Spin Locks (Cont.)
• Spin locks are useful since many kernel resources are locked for only a fraction of a millisecond
  – Thus, it would be far more time-consuming to release the CPU and reacquire it later

Spin Locks (Cont.)
• Six functions are used to initialize, test, and set spin locks
  – All these functions are based on atomic operations

Spin Locks (Cont.)

Function            Description
spin_lock_init()    Set the spin lock to 1 (unlocked)
spin_lock()         Cycle until the spin lock becomes 1 (unlocked), then set it to 0 (locked)
spin_unlock()       Set the spin lock to 1 (unlocked)
spin_unlock_wait()  Wait until the spin lock becomes 1 (unlocked)
spin_is_locked()    Return 0 if the spin lock is 1 (unlocked); 1 otherwise
spin_trylock()      Set the spin lock to 0 (locked), and return 1 if the lock is obtained; 0 otherwise

Spin Locks (Cont.)
• spin_lock(): acquire a spin lock

    1: lock; decb slp    ; (1) slp: spin lock's address; (2) atomic
       jns 3f            ; jump if not signed (result non-negative => jump)
    2: cmpb $0, slp      ; compare slp with 0
       pause             ; see below
       jle 2b            ; jump if less than or equal
       jmp 1b            ; check whether another processor
                         ; has grabbed the lock
    3:                   ; lock acquired

• pause: Pentium 4 instruction that optimizes the execution of spin lock loops
  – Backward compatible: it is encoded as rep; nop, which does nothing on earlier processors

Spin Locks (Cont.)
• spin_unlock(): release a spin lock
  – lock; movb $1, slp
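The same lock protocol can be sketched in user-space C11. The sketch keeps the slides' convention (1 = unlocked, 0 = locked) but, as a deliberate simplification, uses a compare-and-swap instead of the kernel's lock; decb sequence. All my_spin_* names are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical user-space spin lock: 1 = unlocked, 0 = locked. */
typedef struct { atomic_int slock; } my_spinlock_t;

void my_spin_lock_init(my_spinlock_t *l) { atomic_init(&l->slock, 1); }

/* Try once: set the lock to 0 and return 1 if it was free, 0 otherwise. */
int my_spin_trylock(my_spinlock_t *l)
{
    int expected = 1;
    return atomic_compare_exchange_strong(&l->slock, &expected, 0);
}

/* Busy-wait until the lock is obtained. */
void my_spin_lock(my_spinlock_t *l)
{
    while (!my_spin_trylock(l)) {
        /* Read-only inner loop, comparable to the cmpb/jle loop in the
         * assembly: wait until the lock looks free, then retry. The
         * kernel places a pause instruction in this loop. */
        while (atomic_load_explicit(&l->slock, memory_order_relaxed) == 0)
            ;
    }
}

/* Release the lock. */
void my_spin_unlock(my_spinlock_t *l) { atomic_store(&l->slock, 1); }

/* Return 0 if the lock is unlocked, 1 otherwise. */
int my_spin_is_locked(my_spinlock_t *l) { return atomic_load(&l->slock) == 0; }
```

Spinning on a plain load rather than on the atomic read-modify-write keeps the waiting CPU from repeatedly taking exclusive ownership of the lock's memory location while another CPU holds the lock.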

Read/Write Spin Locks
• Introduced to increase the amount of concurrency inside the kernel
  – Allow several kernel control paths to simultaneously read the same data structure (DS)
    • As long as no one modifies it
  – However, to write to the DS, a kernel control path must acquire the write lock

The Big Reader Lock
• For MP systems
  – Skip!!!

Semaphores
• A lock primitive that allows waiters to sleep until the desired resource becomes free
• Linux provides two kinds of semaphores
  – Kernel semaphores, used by kernel control paths
  – System V IPC semaphores, used by User Mode processes

Semaphores (Cont.)
• A kernel semaphore is similar to a spin lock
  – It does not allow a kernel control path to proceed until the lock is open
  – However, if the protected resource is busy
    • The process is suspended (blocked) instead of spinning
• Thus, kernel semaphores can be acquired only by functions that are allowed to sleep
  – Interrupt handlers and deferrable functions cannot use them

Semaphores (Cont.)
• Kernel semaphore: an object of type struct semaphore
  – count: stores an atomic_t value
    • >0: the resource is free
    • =0: the resource is busy but no one is waiting
    • <0: the resource is unavailable and at least one process is waiting for it
  – wait
    • Stores the address of a wait queue list that includes all sleeping processes waiting for this semaphore
  – sleepers
    • Stores a flag that indicates whether some processes are sleeping on the semaphore

Semaphores (Cont.)
• up()
  – Increment the count value
  – If (count > 0)
    • No process is waiting, do nothing
  – Else
    • Wake up one sleeping process

Semaphores: up()

    up:
        movl $sem, %ecx
        lock; incl (%ecx)
        jg 1f
        pushl %eax
        pushl %edx
        pushl %ecx
        call __up
        popl %ecx
        popl %edx
        popl %eax
    1:

__up()

    void __up(struct semaphore *sem)
    {
        wake_up(&sem->wait);
    }

Semaphores (Cont.)
• down()
  – Decrement the count value
  – If (count >= 0)
    • Acquire the resource
  – Else
    • Change the state from TASK_RUNNING to TASK_UNINTERRUPTIBLE
    • Put the process in the semaphore wait queue
    • Call schedule()

Semaphores: down()

    down:
        movl $sem, %ecx
        lock; decl (%ecx)
        jns 1f            ; jump if not signed (count >= 0)
        pushl %eax
        pushl %edx
        pushl %ecx
        call __down
        popl %ecx
        popl %edx
        popl %eax
    1:

Semaphores: down() (Cont.)

    void __down(struct semaphore *sem)
    {
        DECLARE_WAITQUEUE(wait, current);
        current->state = TASK_UNINTERRUPTIBLE;
        add_wait_queue_exclusive(&sem->wait, &wait);
        spin_lock_irq(&semaphore_lock);
        sem->sleepers++;
        for (;;) {
            /* Fold the sleepers adjustment back into count; a
               non-negative result means the semaphore was acquired. */
            if (!atomic_add_negative(sem->sleepers - 1, &sem->count)) {
                sem->sleepers = 0;
                break;
            }
            sem->sleepers = 1;
            spin_unlock_irq(&semaphore_lock);
            schedule();
            current->state = TASK_UNINTERRUPTIBLE;
            spin_lock_irq(&semaphore_lock);
        }
        spin_unlock_irq(&semaphore_lock);
        remove_wait_queue(&sem->wait, &wait);
        current->state = TASK_RUNNING;
        wake_up(&sem->wait);
    }

Semaphores (Cont.)
• Only exception handlers and system call service routines can use the down() function
  – Interrupt handlers and deferrable functions must not invoke down()
• Since the down() function suspends the process when the semaphore is busy
  – Linux provides down_trylock()
    • If the resource is busy, it returns immediately instead of blocking
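The non-blocking behavior of down_trylock() can be sketched in user-space C11. This is a minimal model under stated assumptions: it keeps only the count field (no wait queue, no sleepers), never lets count go negative, and follows the kernel's return convention of 0 for success and 1 when the semaphore is busy. The names my_sem_t, my_down_trylock, and my_up are hypothetical.

```c
#include <stdatomic.h>

/* Hypothetical count-only semaphore: count > 0 means the resource is free. */
typedef struct { atomic_int count; } my_sem_t;

/* Try to acquire without sleeping.
 * Returns 0 if the semaphore was acquired, 1 if it is busy. */
int my_down_trylock(my_sem_t *s)
{
    int c = atomic_load(&s->count);
    while (c > 0) {
        /* On failure, c is reloaded with the current count and the
         * loop condition is checked again. */
        if (atomic_compare_exchange_weak(&s->count, &c, c - 1))
            return 0;
    }
    return 1;
}

/* Release the semaphore. A real up() would also wake a sleeper. */
void my_up(my_sem_t *s) { atomic_fetch_add(&s->count, 1); }
```

Because the function never sleeps, this is the only acquisition style that would be usable from a context that must not block.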

Read/Write Semaphores
• A new feature of Linux 2.4
• Similar to read/write spin locks
  – Except that waiting processes are suspended until the semaphore becomes open
  – Increase the amount of concurrency inside the kernel and improve system performance

Read/Write Semaphores (Cont.)
• The kernel handles all processes waiting for a read/write semaphore in strict FIFO order
  – New waiters are inserted at the tail of the wait queue; wakeups start from the head
• If the first process is a reader
  – Every reader that follows is also woken up, until a writer is encountered
• If the first process is a writer
  – Only this process is woken up
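The wake-up rule can be expressed as a small function. In this sketch the wait queue is modeled as an array of characters, 'R' for a waiting reader and 'W' for a waiting writer, in FIFO order; the function name rwsem_wake_count and this encoding are assumptions made for illustration.

```c
#include <stddef.h>

/* Given the waiters in FIFO order ('R' = reader, 'W' = writer),
 * return how many processes at the head of the queue are woken up
 * when the semaphore becomes open. */
size_t rwsem_wake_count(const char *queue, size_t n)
{
    if (n == 0)
        return 0;
    if (queue[0] == 'W')
        return 1;                 /* a writer at the head is woken alone */

    size_t woken = 0;             /* a reader at the head: wake readers   */
    while (woken < n && queue[woken] == 'R')
        woken++;                  /* until the first writer is reached    */
    return woken;
}
```

For the queue R, R, W, R the function returns 2: the third reader stays asleep behind the writer, which preserves the strict FIFO order.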

Read/Write Semaphores (Cont.)
• Each read/write semaphore is described by a rw_semaphore structure
  – count: 32 bits (stores two 16-bit counters)
    • Bits 16-31: number of nonwaiting writers + number of waiting kernel control paths
    • Bits 0-15: number of nonwaiting readers and writers
  – wait_list: a list of waiting processes
  – wait_lock: a spin lock used to protect the wait queue list and the rw_semaphore structure

Completions
• Introduced in Linux 2.4
• Used to solve a race condition that occurs on MP systems
  – Skip!

Local Interrupt Disabling
• An effective way to protect a DS that is also accessed by interrupt handlers
  – However, it does not protect against concurrent accesses to the DS by interrupt handlers running on other CPUs
  – In an MP system, local interrupt disabling is often coupled with spin locks
    • See later section

Local Interrupt Disabling (Cont.)
• Disabling interrupts
  – cli: clear the IF flag of the eflags register
• Enabling interrupts
  – sti: set the IF flag of the eflags register

Local Interrupt Disabling (Cont.)
• At the end of the critical section
  – The kernel can’t simply set the IF flag again
  – Since interrupts can execute in nested fashion
    • The kernel does not know what the IF flag was before the current control path started
    • Thus, the control path must save the old setting of the flag and restore it at the end

    __save_flags(old);
    __cli();
    [...]
    __restore_flags(old);
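Why the flag must be restored rather than set can be illustrated with a small user-space model. Here a plain variable stands in for the IF flag of a single CPU, and all model_* names, inner_section, and outer_section are hypothetical; nothing in this sketch touches real interrupt state.

```c
#include <stdbool.h>

static bool if_flag = true;              /* models the IF flag of eflags */

typedef bool flags_t;

flags_t model_save_flags(void)           { return if_flag; }
void model_cli(void)                     { if_flag = false; }
void model_restore_flags(flags_t old)    { if_flag = old; }
bool model_irqs_enabled(void)            { return if_flag; }

/* A critical section that may be entered with interrupts already off. */
void inner_section(void)
{
    flags_t old = model_save_flags();
    model_cli();
    /* ... critical section ... */
    model_restore_flags(old);   /* restores "disabled" if the caller had
                                   already disabled interrupts */
}

/* Returns true if interrupts were still disabled after the nested
 * section returned, i.e., the outer critical section stayed protected. */
bool outer_section(void)
{
    flags_t old = model_save_flags();
    model_cli();
    inner_section();
    bool still_disabled = !model_irqs_enabled();
    model_restore_flags(old);
    return still_disabled;
}
```

If inner_section() ended by unconditionally setting the flag, the remainder of outer_section() would run with interrupts enabled even though it believes they are off.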

Global Interrupt Disabling
• Global interrupt disabling significantly lowers the system concurrency level
  – It should not be used, because it can be replaced by more efficient synchronization techniques
    • Mentioned later

Global Interrupt Disabling (Cont.)
• Global interrupt disabling is still available in Linux 2.4 to support old device drivers
  – It has been removed from Linux 2.5
• Skip!!!

Disabling Deferrable Functions
• Deferrable functions can be executed at unpredictable times
  – A DS that is accessed by the current control path and also by deferrable functions must be protected against race conditions

Disabling Deferrable Functions (Cont.)
• A trivial way to forbid the execution of deferrable functions is to disable interrupts
  – However, in some cases (mentioned later), the kernel must disable deferrable functions without disabling interrupts
• Disabling deferrable functions
  – Set __local_bh_count to a nonzero value
  – do_softirq() checks this value
    • If it is nonzero, softirqs are not executed
  – Tasklets and bottom halves are implemented on top of softirqs
    • Thus, setting __local_bh_count to a nonzero value disables all deferrable functions
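The counter check can be modeled in a few lines of user-space C. This is a simplified sketch for a single CPU: the model_* names are hypothetical, pending softirqs are reduced to a plain count, and the enable function here only decrements the counter without triggering any pending work itself.

```c
/* Stands in for the per-CPU __local_bh_count. */
static int model_local_bh_count;
static int pending_softirqs;    /* softirqs raised but not yet run */
static int softirqs_run;        /* softirqs executed so far */

void model_local_bh_disable(void) { model_local_bh_count++; }
void model_local_bh_enable(void)  { model_local_bh_count--; }
void model_raise_softirq(void)    { pending_softirqs++; }
int  model_softirqs_run(void)     { return softirqs_run; }

/* Runs pending softirqs unless deferrable functions are disabled. */
void model_do_softirq(void)
{
    if (model_local_bh_count != 0)
        return;                 /* nonzero counter: do not execute softirqs */
    softirqs_run += pending_softirqs;
    pending_softirqs = 0;
}
```

Using a counter rather than a boolean lets disable/enable pairs nest: deferrable functions stay disabled until the outermost enable brings the counter back to zero.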

Synchronizing Accesses to Kernel Data Structures
• The rule of thumb for choosing a synchronization primitive in the kernel
  – Always keep the concurrency level as high as possible in the system
• In turn, the concurrency level depends on two factors
  – The number of I/O devices that operate concurrently
    • Thus, interrupts should be disabled for as short a time as possible
  – The number of CPUs that do productive work
    • Thus, spin locks should be avoided whenever possible

Choosing Among Spin Locks, Semaphores, and Interrupt Disabling
• The choice of synchronization primitive depends on what kinds of kernel control paths access the DS
  – As shown in the following slide

Protection Required by DS Accessed by Kernel Control Paths

Kernel control paths accessing the DS           UP protection              MP further protection
Exceptions                                      Semaphore                  None
Interrupts                                      Local interrupt disabling  Spin lock
Deferrable functions                            None                       None or spin lock
Exceptions + Interrupts                         Local interrupt disabling  Spin lock
Exceptions + Deferrable functions               Local softirq disabling    Spin lock
Interrupts + Deferrable functions               Local interrupt disabling  Spin lock
Exceptions + Interrupts + Deferrable functions  Local interrupt disabling  Spin lock
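The table can also be read as a decision procedure. The sketch below encodes it as two lookup functions over a bitmask of accessor kinds; the enum names and function names are hypothetical and exist only to restate the table rows in code.

```c
/* Kinds of kernel control paths that access the DS (bitmask). */
enum { ACC_EXCEPTIONS = 1, ACC_INTERRUPTS = 2, ACC_DEFERRABLE = 4 };

enum protection {
    PROT_NONE,
    PROT_SEMAPHORE,
    PROT_LOCAL_IRQ_DISABLE,
    PROT_LOCAL_SOFTIRQ_DISABLE,
    PROT_SPIN_LOCK,
    PROT_NONE_OR_SPIN_LOCK
};

/* Protection needed on a uniprocessor. */
enum protection up_protection(unsigned accessors)
{
    if (accessors & ACC_INTERRUPTS)
        return PROT_LOCAL_IRQ_DISABLE;      /* any row involving interrupts */
    if (accessors == (ACC_EXCEPTIONS | ACC_DEFERRABLE))
        return PROT_LOCAL_SOFTIRQ_DISABLE;
    if (accessors == ACC_EXCEPTIONS)
        return PROT_SEMAPHORE;
    return PROT_NONE;                       /* deferrable functions only */
}

/* Further protection needed on a multiprocessor. */
enum protection mp_further_protection(unsigned accessors)
{
    if (accessors == 0 || accessors == ACC_EXCEPTIONS)
        return PROT_NONE;                   /* the semaphore already suffices */
    if (accessors == ACC_DEFERRABLE)
        return PROT_NONE_OR_SPIN_LOCK;      /* depends on the kind of function */
    return PROT_SPIN_LOCK;
}
```

For example, up_protection(ACC_EXCEPTIONS | ACC_DEFERRABLE) yields PROT_LOCAL_SOFTIRQ_DISABLE, and mp_further_protection() for the same mask yields PROT_SPIN_LOCK, matching the fifth row of the table.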

Protecting a DS Accessed by Exceptions
• When a DS is accessed only by exception handlers (in most cases, system call service routines)
  – The DS usually represents a resource that can be assigned to one or more processes
• Thus, race conditions are avoided through semaphores
  – A waiting process can go to sleep
  – Semaphores work on both UP and MP

Protecting a DS Accessed by Interrupts
• When a DS is accessed only by the “top half” of a single interrupt handler
  – Each interrupt handler is serialized with respect to itself
    • It cannot execute more than once concurrently
    • Thus, the DS does not require any synchronization primitive

Protecting a DS Accessed by Interrupts (Cont.)
• However, a DS may be accessed by more than one interrupt handler
  – This requires synchronization!
• On UP
  – Disable interrupts in all critical regions of the interrupt handlers
  – A semaphore cannot be used: it may block the process, which is forbidden in an interrupt handler
  – A spin lock cannot be used: it may freeze the system

Protecting a DS Accessed by Interrupts (Cont.)
• On MP, disabling local interrupts is not enough, since interrupts may still occur on other processors
  – Disable local interrupts
    • Prevents interference from other interrupt handlers on the same CPU
  – Use a spin lock (or read/write spin lock)
    • Prevents interference from interrupt handlers on other CPUs

Protecting a DS Accessed by Interrupts (Cont.)
• Linux provides several macros that couple local interrupt enabling/disabling with spin lock handling
  – Table 5-7

Protecting a DS Accessed by Deferrable Functions
• The kind of deferrable function determines the protection scheme for a DS that it accesses
• On UP, no race condition can exist
  – Execution of deferrable functions is always serialized on a CPU
  – A deferrable function cannot be interrupted by another deferrable function
  – Thus, no synchronization primitive is required

Protecting a DS Accessed by Deferrable Functions (Cont.)
• On MP, race conditions exist since several deferrable functions may run concurrently

Deferrable functions accessing the DS  Protection
Softirqs                               Spin lock
One tasklet                            None
Many tasklets                          Spin lock
Bottom halves                          None

Protecting a DS Accessed by Deferrable Functions (Cont.)
• For softirqs
  – A DS accessed by softirqs must always be protected
  – Because the same softirq can run concurrently on several CPUs of an MP system
• For tasklets
  – No protection is needed if a DS is accessed by only one kind of tasklet
    • Only one instance of a given kind of tasklet can run at a time on an MP system
  – Protection is needed if the DS is accessed by many kinds of tasklets

Protecting a DS Accessed by Deferrable Functions (Cont.)
• For bottom halves
  – The DS need not be protected
  – Because bottom halves never run concurrently
• It is also possible to prevent race conditions by globally disabling deferrable functions with the cli() macro
  – This should be avoided, since it also disables the execution of interrupt handlers on all CPUs

Protecting a DS Accessed by Exceptions and Interrupts
• On UP
  – An interrupt handler cannot be interrupted by exceptions (e.g., system calls)
  – Thus, use local interrupt disabling
• On MP
  – Local interrupt disabling + spin lock
  – Or local interrupt disabling + semaphore

Protecting a DS Accessed by Exceptions and Interrupts (Cont.)
• Local interrupt disabling keeps interrupt handlers on the local processor away from the shared DS
• The spin lock prevents interrupt handlers on other processors from accessing the shared DS
  – If only a spin lock is used on MP, without local interrupt disabling
    • Interrupt handlers on the local processor may still be invoked and spin on a lock held by the interrupted path, so the system may freeze

Protecting a DS Accessed by Exceptions and Deferrable Functions
• Local interrupt disabling + spin lock
  – Deferrable functions are essentially activated by interrupt occurrences
  – No exception can be raised while a deferrable function is running

Protecting a DS Accessed by Exceptions and Deferrable Functions (Cont.)
• However, local interrupt disabling is more than is needed
  – The exception handler can simply disable deferrable functions instead of local interrupts
  – Thus, interrupts continue to be serviced
• Thus, on UP
  – Local softirq disabling
• On MP
  – Local softirq disabling + spin lock

Protecting a DS Accessed by Interrupts and Deferrable Functions
• An interrupt might be raised while a deferrable function is running
• But no deferrable function can stop an interrupt handler
• Thus, on UP
  – Local interrupt disabling
• On MP
  – Local interrupt disabling + spin lock

Protecting a DS Accessed by Exceptions, Interrupts, and Deferrable Functions
• On UP
  – Local interrupt disabling
• On MP
  – Local interrupt disabling + spin lock
