The Ephemeral Smoking Gun Using and kgdb to resolve a pthread “deadlock”

Brad Mouring LabVIEW Real-Time National Instruments The Intro – Who am I

● Work at National Instruments in the RTOS R&D group

● Multiple product lines use RTOS ● NI Switched to 2-3 yrs ago

● Single-mode RTOS ⇒ Dual-mode RTOS ● Functionality and support that comes ● Mindset within company about FOSS – Work with maintainers, minimize out-of-tree

The Ephemeral Smoking Gun 2 Brad Mouring The Setup – Crashing Application

● User-mode application crashed after a few hours of running ● The clincher: new issue from existing code

● The same application ran continually without issue on older, singlemode RTOS

The Ephemeral Smoking Gun 3 Brad Mouring Initial Investigation

● Configured to provide a core file on crash ● Checking the core file fingered a SIGABRT

● Normally used for assert() and critical errors ● Coming from glibc, __pthread_mutex_lock_full() ● To enable: ulimit -c ${blocks}

● May need to edit /etc/security/limits.conf ● Can set in the /etc/profile(.d/*)

The Ephemeral Smoking Gun 4 Brad Mouring Digging In Further

● Reproduced the issue checking stderr

● “pthread_mutex_lock.c:309: Assertion `...' failed.” ● Points me to a file and line number ● Assertion is checking the return from a futex syscall – Checking for a reported deadlock on certain types

The Ephemeral Smoking Gun 5 Brad Mouring Background: Pthread_mutexes

● Mutexes used to protect a few different application execution system state structures ● The application uses pthread_mutex_t's configured to be priority-inheriting

The Ephemeral Smoking Gun 6 Brad Mouring Background: Priority Inversion

A is running on the processor

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 7 Brad Mouring Background: Priority Inversion

A is running on the processor A takes mutex M

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 8 Brad Mouring Background: Priority Inversion

A is running on the processor A takes mutex M B becomes runnable, is scheduled in

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 9 Brad Mouring Background: Priority Inversion

A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 10 Brad Mouring Background: Priority Inversion

A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in C blocks on mutex M

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 11 Brad Mouring Background: Priority Inversion

A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in C blocks on mutex M B is scheduled in (prio 11), blocking C (prio 90) from running!

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 12 Brad Mouring Background: Priority Inheritance A Solution A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in C blocks on mutex M

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 13 Brad Mouring Background: Priority Inheritance A Solution A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in C blocks on mutex M A receives C's priority, finishes with mutex M, releases M

Task C (prio 90) Task B (prio 11) Task A (prio 90)

The Ephemeral Smoking Gun 14 Brad Mouring Background: Priority Inheritance A Solution A is running on the processor A takes mutex M B becomes runnable, is scheduled in C becomes runnable, is scheduled in C blocks on mutex M A receives C's priority, finishes with mutex M, releases M A receives its previous priority, C is scheduled in

Task C (prio 90) Task B (prio 11) Task A (prio 10)

The Ephemeral Smoking Gun 15 Brad Mouring Background: Pthread_mutexes and Futexes

● pthread_mutex use futexes when contended

● Uncontested lock stays in userspace (cmpxchg)

● Uses the kernel sys_futex call if contested

● Creates a queue of tasks to wake when the holder releases the lock (FUTEX_LOCK_PI) ● Sits atop rtmutex code within the kernel ● On release, previous holder notes that there are waiters, wakes one or more (FUTEX_UNLOCK_PI) ● The underlying rt_mutex subsystem provides some nice features (deadlock detection, e.g.)

The Ephemeral Smoking Gun 16 Brad Mouring Background: RT Mutexes

● RT Mutexes were designed the linux-rt tree

● Used to silently replace normal spinlocks ● Sold to mainline as a solution for prio inversion through futexes ● Prio inheritance is attained through tracking:

● The tasks blocked on a mutex (sorted by prio) ● The task that owns a mutex

The Ephemeral Smoking Gun 17 Brad Mouring Background: RT Mutexes Visually

● These relationships allow for prio inheritance

task

task mutex Blocked-on Owns

task mutex

task

The Ephemeral Smoking Gun 18 Brad Mouring Background: RT Mutexes Visually

● These relationships allow for prio inheritance

● Also handy for checking for deadlocks

task mutex Blocked-on Owns

task mutex

task

The Ephemeral Smoking Gun 19 Brad Mouring How to debug, and where?

● EDEADLK returned in a few locations, including a few in futex/mutex/rtmutex code ● Place a kgdb_breakpoint at these sites ● Build a kernel with kgdb enabled ● Reproduce the issue ● Troubleshoot from there

The Ephemeral Smoking Gun 20 Brad Mouring Background: How to enable KGDB

● Configure the kernel

● CONFIG_DEBUG_INFO ● CONFIG_KGDB ● CONFIG_KGDB_method_to_connect – e.g. CONFIG_KGDB_SERIAL_CONSOLE ● CONFIG_KGDB_KDB (optional)

The Ephemeral Smoking Gun 21 Brad Mouring Background: Connecting to a KGDB target

● You have a few options

● Serial port (null-modem connection) ● Over Ethernet (kgdboe) with out-of-tree source¹ ● Set module params on boot, on module load, or thereafter through

● Port and baud

¹http://sysprogs.com/VisualKernel/kgdboe/ The Ephemeral Smoking Gun 22 Brad Mouring Tips for using kgdb/gdb

● Find (or write) useful user-defined cmds

● Sequences you use frequently ● Pop cmds and settings in your ~/.gdbinit ● Graphical frontends are available ● Excellent resources online

● https://sourceware.org/gdb/current/onlinedocs/gdb/

The Ephemeral Smoking Gun 23 Brad Mouring KGDB leads to a dead end … and that's not necessarily a bad thing.

● EDEADLK came from rtmutex priority chain walking code (rt_mutex_adjust_prio_chain)

● The priochain walking code seemed to think that we had a loop in the chain ● Walking the chain manually in gdb from the original mutex, we reach a mutex who has no owner ● We were supposed to loop back around to the original mutex, as that's the current state of the pointers within the chain walking function

The Ephemeral Smoking Gun 24 Brad Mouring State of the Priority Chain at EDEADLK

C M2

Task about to B M1 orig_lock get EDEADLK

Blocked-on Iterator used to walk A priority chain Owns

D M3

The Ephemeral Smoking Gun 25 Brad Mouring A Few Clues From the Scene of the Crime

● Mutex M2 recently had an owner but doesn't currently ● There are two tasks (A, B) blocked on mutex M1 ● The checks that occur while walking the chain don't see anything odd and complain until a deadlock is detected

The Ephemeral Smoking Gun 26 Brad Mouring Re-ftrace-ing my steps

● A picture of what's going on leading up to the detected deadlock may shed some light into what's going on

● Ftrace and a set of tracers were already enabled on our kernel

● Insert some strategic trace_printk()s

● Add SIGABRT handler to app to stop tracing

● Reproduce the issue, use trace-cmd extract

The Ephemeral Smoking Gun 27 Brad Mouring kernelshark comes into the picture

28 kernelshark comes into the picture

● Pulling the dump into kernelshark to take a closer look, we notice a few interesting points

● Task 'B' (received EDEADLK) scheduled out between attempting to take mutex and reporting EDEADLK ● Quite a bit of mutex activity while B is out ● We can begin to form a narrative on what is happening leading up to the reported deadlock

The Ephemeral Smoking Gun 29 Brad Mouring Re-ftrace-ing my steps

A

C M2

task B M1 iter orig_lock

Blocked-on

B blocks on M1 Owns M1 is held by C C is blocked on M2 M2 is held by A

The Ephemeral Smoking Gun 30 Brad Mouring Re-ftrace-ing my steps

A

task C M2 iter

B M1 orig_lock

Blocked-on

B blocks on M1 Owns M1 is held by C C is blocked on M2 M2 is held by A B begins walking the prio chain The Ephemeral Smoking Gun 31 Brad Mouring Re-ftrace-ing my steps

task iter A (B)

C M2

B M1 orig_lock (B)

Blocked-on

B blocks on M1 Owns M1 is held by C C is blocked on M2 M2 is held by A B begins walking the prio chain...PREEMPT! The Ephemeral Smoking Gun 32 Brad Mouring Re-ftrace-ing my steps

task iter A (B)

C M2

B M1 orig_lock (B)

Blocked-on Owns A is scheduled in, releases M2

The Ephemeral Smoking Gun 33 Brad Mouring Re-ftrace-ing my steps

C M2

B M1 orig_lock (B) task iter A (B)

A is scheduled in, releases M2 A takes (uncontended) M3 in userspace A blocks on M1

The Ephemeral Smoking Gun 34 Brad Mouring Re-ftrace-ing my steps

C M2

B M1 orig_lock (B) task iter A (B)

D M3

D is scheduled in, blocks on M3 (creates rtmutex) The Ephemeral Smoking Gun 35 Brad Mouring Re-ftrace-ing my steps

C M2

B M1 orig_lock

task iter A

D M3

B is scheduled back in, continues its walk of the prio chain The Ephemeral Smoking Gun 36 Brad Mouring Re-ftrace-ing my steps

A blocks on the same mutex as B!

C M2

B M1 orig_lock

task iter A

D M3

B is scheduled back in, continues its walk of the prio chain The Ephemeral Smoking Gun 37 Brad Mouring Re-ftrace-ing my steps

A blocks on the same mutex as B! ! ted tec C de M2 ck dlo Dea e) als (F B M1 orig_lock

task iter A

D M3

B is scheduled back in, continues its walk of the prio chain The Ephemeral Smoking Gun 38 Brad Mouring Takin' it to the Streets

● Came to the linux-rt-users mailing list ● Had findings writeup, preliminary patch ● tglx saw the issue at hand, didn't like my patch, proposed a different fix ● Result: issue got fixed, learned about working with the mailing lists ● Moral: Don't be afraid to engage the community, but be ready for feedback

The Ephemeral Smoking Gun 39 Brad Mouring Conclusions

● There are some great tools (and online documentation) to solve kernel issues ● I've only covered two, there are many more

● Lockdep checking ● RCU diagnostics ● kernel(s) ● KDB ● Vendor tools The Ephemeral Smoking Gun 40 Brad Mouring Questions? Comments? Thanks!

The Ephemeral Smoking Gun 41 Brad Mouring