
Kernel concurrency bugs
CSE 451, 2015
Pedro Fonseca

Kernel Concurrency Bugs

Concurrency bugs
● Depend on the interleaving of instructions
  – Non-deterministic
● Hard to avoid concurrency in the multi-core era
  – Finer-grained locking
  – New algorithms
● Can have serious consequences
  – Therac-25 accident

Concurrency bug example

    int x = 0;

    /* A and B are per-thread temporaries */
    void threadA(){
        A = x + 1;
        x = A;
    }

    void threadB(){
        B = x + 1;
        x = B;
    }

Three of the possible interleavings (x starts at 0):

    Interleaving 1    Interleaving 2    Interleaving 3
    A = x + 1         A = x + 1         A = x + 1
    B = x + 1         B = x + 1         x = A
    x = B             x = A             B = x + 1
    x = A             x = B             x = B
    final x = 1       final x = 1       final x = 2

How many interleavings?

    (a + b)! / (a! × b!)        a, b → number of instructions in each thread

    a = b    interleavings
    1        2
    2        6
    3        20
    4        70
    5        252
    6        924
    7        3432
    8        12870
    9        48620
    10       184756
    11       705432
    12       2704156
    13       10400600
    14       40116600
    15       155117520
    16       601080390
    17       2333606220
    18       9075135300
    19       35345263800
    20       137846528820

Too many!!

Only a subset of executions exposes the bug
● Often the subset of executions is really tiny
  – Concurrency bugs may go unnoticed for years!
● Small environment differences may expose them more easily
  – Different kernels, different libraries, different workloads, different hardware
  – Bugs may never show up during testing and always show up on users' computers!
● Non-determinism can be really painful!

Concurrency bugs can have many effects!
● Crash
● Hang
● Wrong results

Concurrency bug example (with a lock)

    int x = 0;

    void threadA(){
        Lock_x.acquire();
        A = x + 1;
        x = A;
        Lock_x.release();
    }

    void threadB(){
        Lock_x.acquire();
        B = x + 1;
        x = B;
        Lock_x.release();
    }

With Lock_x held around each read-modify-write, interleavings 1 and 2 can no longer occur. Only serialized orders such as interleaving 3 remain.

What about this solution?

    int x = 0;

    void threadA(){
        x = x + 1;
    }

    void threadB(){
        x = x + 1;
    }

Still buggy!
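As a rough sketch of the counting argument (not part of the original slides), the code below computes (a + b)! / (a! × b!) and then enumerates every schedule of the two-statement threadA/threadB example to see which final values of x are reachable. The names `count_interleavings`, `explore`, `enumerate_example` and `struct outcome` are hypothetical, chosen only for this illustration. The enumeration models each thread as two atomic steps, which is an assumption: real hardware and compilers can split the work further.

```c
#include <stdint.h>

/* Number of ways to interleave two threads that execute a and b
 * instructions respectively: (a + b)! / (a! * b!).
 * Computed incrementally so no factorial is ever formed. */
uint64_t count_interleavings(unsigned a, unsigned b) {
    uint64_t r = 1;
    for (unsigned i = 1; i <= a; i++)
        r = r * (b + i) / i;   /* r stays an exact binomial coefficient */
    return r;
}

/* Tally of what the exhaustive enumeration observed. */
struct outcome {
    int schedules;   /* total schedules explored        */
    int final_is_2;  /* schedules ending with x == 2    */
    int final_is_1;  /* schedules ending with x == 1    */
};

/* pcA/pcB: next step of each thread (0 = read, 1 = write, 2 = done).
 * regA/regB model the temporaries A and B from the example. */
static void explore(int pcA, int pcB, int x, int regA, int regB,
                    struct outcome *out) {
    if (pcA == 2 && pcB == 2) {
        out->schedules++;
        if (x == 2) out->final_is_2++;
        else if (x == 1) out->final_is_1++;
        return;
    }
    /* Let thread A take the next step, if it has one left. */
    if (pcA == 0) explore(1, pcB, x, x + 1, regB, out);     /* A = x + 1 */
    else if (pcA == 1) explore(2, pcB, regA, regA, regB, out); /* x = A */
    /* Let thread B take the next step, if it has one left. */
    if (pcB == 0) explore(pcA, 1, x, regA, x + 1, out);     /* B = x + 1 */
    else if (pcB == 1) explore(pcA, 2, regB, regA, regB, out); /* x = B */
}

struct outcome enumerate_example(void) {
    struct outcome out = {0, 0, 0};
    explore(0, 0, 0, 0, 0, &out);
    return out;
}
```

For a = b = 2 the formula gives 6 schedules. Under the two-step model, the enumeration finds that only the 2 fully serialized schedules leave x at 2, while the other 4 lose an update and leave x at 1.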
One C statement → Many instructions

The single statement x = x + 1 compiles to a load, an add, and a store, and the other thread can run between any of them:

    00000000004004ed threadA
      4004ed:  55                    push   %rbp
      4004ee:  48 89 e5              mov    %rsp,%rbp
      4004f1:  8b 05 45 0b 20 00     mov    0x200b45(%rip),%eax    # 60103c <x>
      4004f7:  83 c0 01              add    $0x1,%eax
      4004fa:  89 05 3c 0b 20 00     mov    %eax,0x200b3c(%rip)    # 60103c <x>
      400500:  5d                    pop    %rbp
      400501:  c3                    retq

    0000000000400502 threadB
      400502:  55                    push   %rbp
      400503:  48 89 e5              mov    %rsp,%rbp
      400506:  8b 05 30 0b 20 00     mov    0x200b30(%rip),%eax    # 60103c <x>
      40050c:  83 c0 01              add    $0x1,%eax
      40050f:  89 05 27 0b 20 00     mov    %eax,0x200b27(%rip)    # 60103c <x>
      400515:  5d                    pop    %rbp
      400516:  c3                    retq

What about this code?

    int x = 0;

    void threadA(){
        x = CONSTANT_A;
    }

    void threadB(){
        x = CONSTANT_B;
    }

Still not a good idea!*
a) Compiler might still emit multiple instructions
and...
b) Some instructions are not atomic
and...
c) C standard says don't do it
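One way to avoid the unsynchronized increment, sketched here for user space with C11 atomics and POSIX threads rather than kernel code, is to make the read-modify-write a single atomic operation. The helper names `worker` and `run_two_threads` are hypothetical, and the example assumes a platform that provides `<stdatomic.h>` and `<pthread.h>`. It is an illustration of the idea, not the approach the slides prescribe.

```c
#include <pthread.h>
#include <stdatomic.h>

/* Shared counter. atomic_int makes concurrent access well defined in C11,
 * unlike a plain int written by two threads without synchronization. */
static atomic_int x;

/* Each worker increments the shared counter n times. */
static void *worker(void *arg) {
    int n = *(int *)arg;
    for (int i = 0; i < n; i++)
        atomic_fetch_add(&x, 1);   /* indivisible load + add + store */
    return NULL;
}

/* Runs two workers concurrently and returns the final counter value,
 * or -1 if a thread could not be created. */
int run_two_threads(int n) {
    pthread_t a, b;
    atomic_store(&x, 0);
    if (pthread_create(&a, NULL, worker, &n) != 0)
        return -1;
    if (pthread_create(&b, NULL, worker, &n) != 0) {
        pthread_join(a, NULL);
        return -1;
    }
    pthread_join(a, NULL);
    pthread_join(b, NULL);
    return atomic_load(&x);
}
```

Because every increment is indivisible, no update can be lost, so `run_two_threads(n)` returns 2 × n whenever both threads start. Replacing `atomic_int` with a plain `int` and `atomic_fetch_add` with `x = x + 1` reintroduces the race, and the result may then fall short. Depending on the toolchain, linking may need the `-pthread` flag.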
* Footnote to "Still not a good idea!": kernel developers sometimes do it though.

Kernel concurrency bugs
● Bugs that depend on the instruction interleavings
  – Triggered only by a subset of the interleavings
● Plenty of concurrency bugs in kernels!

    "Three of the five 3.4.9 machines [...] locked up. I've tried reproducing the issue, but so far I've been unsuccessful [...]"
    (Linux kernel mailing list, 5/1/2013)

    "The bug is a race and not always easy to reproduce. [...] On my particular machine, [the test case] usually triggers [the bug] within 10 minutes but enabling debug options can change the timing such that it never hits. Once the bug is triggered, the machine is in trouble and needs to be rebooted."
    (Linux 3.0.41 change log)

    "[The bug] was quite hard to decode as the reproduction time is between 2 days and 3 weeks and intrusive tracing makes it less likely [...]"
    (Linux 3.4.41 change log)

Approaches to explore interleavings
● Stress testing approach
  – Hope to find the interleaving
● Systematic approach
  – Take full control of the interleavings
  – Existing tools focus on user-mode applications
Focus on operating system kernels

SKI: Finding kernel concurrency bugs
● Testing applications versus kernels
● Our approach
● Implementation
● Evaluation

User-mode tools
[Diagram: App, user-mode testing tool, Kernel (scheduler)]
● Previous user-mode systematic tools
  – Kernel-level abstractions: threads and sync. objects
  – LD_PRELOAD, ptrace

Kernel-mode challenges
[Diagram: testing tool, Kernel (scheduler)]
● Kernel doesn't have a good instrumentation interface
● An alternative would be to modify the kernel
  – But kernel modifications:
    ● Change the tested software
    ● Are non-trivial
    ● Hinder portability
Avoid kernel modifications

User-mode versus kernel-mode
[Diagram: App, Kernel (scheduler), kernel testing tool, Hardware]
● Existing user-mode systematic tools
  – Kernel-level abstractions: threads and sync. objects
  – LD_PRELOAD, ptrace
● Our tool (modified VMM)
  – HW-level abstractions: mov, add, jmp, registers, APIC

SKI: Finding kernel concurrency bugs
● Systematic
  – Full control of the kernel interleavings
● Practical
  – No modifications to the kernel
  – Fast

SKI: Finding kernel concurrency bugs
● Challenges testing the kernel code
● SKI's approach
● Implementation
● Evaluation

SKI's approach
[Diagram: a VM containing App and Kernel runs on the SKI VMM, which works with HW-level abstractions: mov, add, jmp, registers, APIC]
Challenges
1. How to control the schedules?
2. Which contexts are schedulable?
3. Which schedules to choose?

1. How to control the kernel schedules?
● Pin each tested thread to a different CPU (thread affinity)
● Pause and resume CPUs to control schedules
[Figure: instruction streams (MOV, ADD, PUSH, SUB, JMP) of Thread 1 and Thread 2. The "Pin" step assigns them to CPU 1 and CPU 2, and the "Control" step interleaves them over time t.]
Leverage thread affinity and control CPUs

2.