Concurrent Systems Freebsd Kernel

Concurrent Systems Freebsd Kernel

10/23/16 Concurrent systems Case study: FreeBSD kernel concurrency Dr Robert N. M. Watson 1 FreeBSD kernel • Open-source OS kernel – Large: millions oF LoC – Complex: tHousands oF subsystems, drivers, ... – Very concurrent: dozens or Hundreds oF CPU cores/tHreads – Widely used: NetApp, EMC, Dell, Apple, Juniper, NetFlix, Sony, Cisco, YaHoo!, … • WHy a case study? – Employs C&DS principles – Concurrency perFormance and composability at scale In the library: Marshall Kirk McKusick, George V. Neville-Neil, and Robert N. M. Watson. The Design and 2 Implementation of the FreeBSD Operating System (2nd Edition), Pearson Education, 2014. 1 10/23/16 BSD + FreeBSD: a brief History • 1980s Berkeley Standard Distribution (BSD) – ‘BSD’-style open-source license (MIT, ISC, CMU, …) – UNIX Fast File System (UFS/FFS), sockets API, DNS, used TCP/IP stack, FTP, sendmail, BIND, cron, vi, … • Open-source FreeBSD operating system 1993: FreeBSD 1.0 witHout support for multiprocessing 1998: FreeBSD 3.0 witH “giant-lock” multiprocessing 2003: FreeBSD 5.0 witH fine-grained locking 2005: FreeBSD 6.0 witH mature fine-grained locking 2012: FreeBSD 9.0 witH TCP scalability beyond 32 cores 3 FreeBSD: beFore multiprocessing (1) • Concurrency model inherited from UNIX • Userspace – Preemptive multitasking between processes – Later, preemptive multithreading within processes • Kernel – ‘Just’ a C program running ‘bare metal’ – Internally multitHreaded – User tHreads ‘in kernel’ (e.g., in system calls) – Kernel services (e.g., async. work for VM, etc.) 4 2 10/23/16 FreeBSD: beFore multiprocessing (2) • Cooperative multitasking witHin kernel – Mutual exclusion as long as you don’t sleep() – Implied global lock means local locks rarely required – Except For interrupt Handlers, non-preemptive kernel – Critical sections control interrupt-handler execution • Wait channels: implied condition variable For every address sleep(&x, …); // Wait for event on &x wakeup(&x); // Signal an event on &x – Must leave global state consistent when calling sleep() – Must reload any cached local state aFter sleep() returns • Use to build HigHer-level syncHronization primitives – E.g., lockmgr() reader-writer lock can be held over I/O (sleep) 5 Pre-multiprocessor scheduling Lots oF unexploited parallelism! 6 3 10/23/16 Hardware parallelism, synchronization • Late 1990s: multi-CPU begins to move down market – In 2000s: 2-processor a big deal – In 2010s: 64-core is increasingly common • Coherent, symmetric, shared memory systems – Instructions For atomic memory access • Compare-and-swap, test-and-set, load linked/store conditional • Signaling via Inter-Processor Interrupts (IPIs) – CPUs can trigger an interrupt Handler on eacH anotHer • Vendor extensions For perFormance, programmability – MIPS inter-thread message passing – Intel TM support: TSX (WHoops: HSW136!) 7 Giant locking tHe kernel • FreeBSD follows footsteps oF Cray, Sun, … • First, allow user programs to run in parallel – One instance oF kernel code/data sHared by all CPUs – DiFFerent user processes/tHreads on diFFerent CPUs • Giant spinlock around kernel – Acquire on syscall/trap to kernel; drop on return – In eFFect: kernel runs on at most once CPU at a time; ‘migrates’ between CPUs on demand • Interrupts – IF interrupt delivered on CPU X wHile kernel is on CPU Y, forward interrupt to Y using an IPI 8 4 10/23/16 Giant-locked scheduling User-user Kernel-user parallelism parallelism Kernel giant-lock Serial kernel execution; parallelism contention opportunity missed 9 Fine-grained locking • Giant locking is OK For user-program parallelism • Kernel-centered workloads trigger Giant contention – ScHeduler, IPC-intensive workloads – TCP/buFFer cacHe on HigH-load web servers – Process-model contention witH multitHreading (VM, …) • Motivates migration to fine-grained locking – Greater granularity (may) aFFord greater parallelism • Mutexes/condition variables ratHer tHan semapHores – Increasing consensus on ptHreads-like syncHronization – Explicit locks are easier to debug than semapHores – Support for priority inheritance + priority propagation – E.g., Linux is also now migrating away From semapHores 10 5 10/23/16 Fine-grained scheduling True kernel parallelism 11 Kernel syncHronization primitives • Spin locks – Used to implement the scHeduler, interrupt Handlers • Mutexes, reader-writer, read-mostly locks – Most Heavily used – diFFerent optimization tradeoFFs – Can be Held only over “bounded” computations – Adaptive: on contention, sleep is expensive; spin First – Sleeping depends on scHeduler, and Hence on spinlocks… • Shared-eXclusive (SX) locks, condition variables – Can be Held over I/O and otHer unbounded waits • Condition variables usable with any lock type • Most primitives support priority propagation 12 6 10/23/16 WITNESS lock-order cHecker • Kernel relies on partial lock order to prevent deadlock (Recall dining pHilosopHers) – In-field lock-related deadlocks are (very) rare • WITNESS is a lock-order debugging tool – Warns wHen lock cycles (could) arise by tracking edges – Only in debugging kernels due to overhead (15%+) • Tracks botH statically declared, dynamic lock orders – Static orders most commonly intra-module – Dynamic orders most commonly inter-module • Deadlocks For condition variables remain Hard to debug – What thread should have woken up a CV being waited on? – Similar to semapHore problem 13 WITNESS: global lock-order graph* * Turns out tHat tHe global lock-order grapH is pretty complicated. 14 7 10/23/16 * * Commentary on WITNESS Full-system lock-order grapH complexity; courtesy Scott Long, NetFlix 15 Excerpt From global lock-order graph* This bit mostly has to do Local clusters: e.g., related with networking locks From the firewall: two leaF nodes; one is held over calls to other subsystems Network interface locks: “transmit” occurs at the bottom oF call stacks via many layers holding locks Memory allocator locks follow most other locks, since most kernel components * THe local lock-order grapH is also complicated. require memory allocation16 8 10/23/16 WITNESS debug output 1st 0xffffff80025207f0 run0_node_lock (run0_node_lock) @ /usr/src/sys/net80211/ieee80211_ioctl.c:1341 2nd 0xffffff80025142a8 run0 (network driver) @ /usr/src/sys/modules/usb/run/../../../dev/usb/wlan/if_run.c:3368 KDB: stack backtrace: db_trace_self_wrapper() at db_trace_self_wrapper+0x2a Lock names and source kdb_backtrace() at kdb_backtrace+0x37 code locations oF _witness_debugger() at _witness_debugger+0x2c witness_checkorder() at witness_checkorder+0x853 acquisitions adding the _mtx_lock_flags() at _mtx_lock_flags+0x85 oFFending graph edge run_raw_xmit() at run_raw_xmit+0x58 ieee80211_send_mgmt() at ieee80211_send_mgmt+0x4d5 domlme() at domlme+0x95 setmlme_common() at setmlme_common+0x2f0 ieee80211_ioctl_setmlme() at ieee80211_ioctl_setmlme+0x7e ieee80211_ioctl_set80211() at ieee80211_ioctl_set80211+0x46f in_control() at in_control+0xad ifioctl() at ifioctl+0xece Stack trace to acquisition kern_ioctl() at kern_ioctl+0xcd that triggered cycle: sys_ioctl() at sys_ioctl+0xf0 802.11 called USB; amd64_syscall() at amd64_syscall+0x380 Xfast_syscall() at Xfast_syscall+0xf7 previously, perhaps USB --- syscall (54, FreeBSD ELF64, sys_ioctl), rip = 0x800de7aec,called 802.11? rsp = 0x7fffffffd848, rbp = 0x2a --- 17 How does tHis work in practice? • Kernel is Heavily multi-threaded • EacH user tHread Has a corresponding kernel tHread – Represents user tHread wHen in syscall, page Fault, etc. • Kernels services oFten execute in asyncHronous tHreads – Interrupts, timers, I/O, networking, etc. • ThereFore extensive syncHronization – Locking model is almost always data-oriented – Think ‘monitors’ ratHer tHan ‘critical sections’ – ReFerence counting or reader-writer locks used For stability – HigHer-level patterns (producer-consumer, active objects, etc.) used Frequently 18 9 10/23/16 Kernel threads in action robert@lemongrass-freebsd64:~> procstat –at 12 100037 intr irq7: ppc0 0 16 wait - PID TID COMM TDNAME CPU PRI STATE WCHAN 12 100038 intr Vast hoards oF swi0: uart uart 0 threads 24 wait - 0 100000 kernel swapper 1 84 sleep sched 13 100010 geom g_event 0 92 sleep - 0 100009 kernel firmware taskq 0 108 sleep - 13 100011 geom represent concurrent activitiesg_up 1 92 sleep - 0 100014 kernel kqueue taskq 0 108 sleep - 13 100012 geom g_down 1 92 sleep - 0 100016 kernel thread taskq 0 108 sleep - 14 100013 yarrow - 1 84 sleep - 0 100020 kernel acpi_task_0 1 108 sleep - 15 100029 usb usbus0 0 32 sleep - 0 100021 kernel acpi_task_1 Idle CPUs are occupied by 1 108 sleep - 15 100030 usb usbus0 0 28 sleep - 0 100022 kernel acpi_task_2 1 108 sleep - 15 100031 usb usbus0 0 32 sleep USBWAIT 0 100023 kernel ffs_trim taskq an 1 idle thread 108 sleep - … why?15 100032 usb usbus0 0 32 sleep - 0 100033 kernel em0 taskq 1 8 sleep - 16 100046 bufdaemon - 0 84 sleep psleep 1 100002 init - 0 152 sleep wait 17 100047 syncer - 1 116 sleep syncer 2PID 100027 mpt_recovery0 TID COMM - 0 84 sleep TDNAME idle 18 100048 vnlru CPU -PRI STATE 1 84WCHAN sleep vlruwt 3 100039 fdc0 - 1 84 sleep - 19 100049 softdepflush - 1 84 sleep sdflush 4 10004011 100003ctl_thrd idle- 0 84 sleep idle: ctl_work cpu0104 100055 adjkerntz 0 -255 run 1 152 - sleep pause 5 100041 sctp_iterator - 0 84 sleep waiting_ 615 100056 dhclient - 0 139 sleep select 6 10004212 100024xpt_thrd intr- 0 84 sleep irq14: ccb_scan ata0667 100075 dhclient 0 - 12 wait 1 120 - sleep select 7 100043 pagedaemon - 1 84 sleep psleep 685 100068 devd - 1 120 sleep wait 8 10004412 100025vmdaemon intr- 1 84 sleep irq15:

View Full Text

Details

  • File Type
    pdf
  • Upload Time
    -
  • Content Languages
    English
  • Upload User
    Anonymous/Not logged-in
  • File Pages
    14 Page
  • File Size
    -

Download

Channel Download Status
Express Download Enable

Copyright

We respect the copyrights and intellectual property rights of all users. All uploaded documents are either original works of the uploader or authorized works of the rightful owners.

  • Not to be reproduced or distributed without explicit permission.
  • Not used for commercial purposes outside of approved use cases.
  • Not used to infringe on the rights of the original creators.
  • If you believe any content infringes your copyright, please contact us immediately.

Support

For help with questions, suggestions, or problems, please contact us