Rights to copy

• © Copyright 2004-2019, Bootlin
• License: Creative Commons Attribution - Share Alike 3.0
• https://creativecommons.org/licenses/by-sa/3.0/legalcode
• You are free:
  – to copy, distribute, display, and perform the work
  – to make derivative works
  – to make commercial use of the work
• Under the following conditions:
  – Attribution. You must give the original author credit.
  – Share Alike. If you alter, transform, or build upon this work, you may distribute the resulting work only under a license identical to this one.
  – For any reuse or distribution, you must make clear to others the license terms of this work.
  – Any of these conditions can be waived if you get permission from the copyright holder.
• Your fair use and other rights are in no way affected by the above.
• Document sources: https://git.bootlin.com/training-materials/

Dongkun Shin, SKKU 1 9. Concurrent Access to Resources: Locking

Sources of concurrency issues

• In terms of concurrency, the kernel has the same constraint as a multi-threaded program
  – its state is global and visible in all execution contexts
• Concurrency arises because of
  – Interrupts, which interrupt the current thread to execute an interrupt handler. They may be using shared resources.
  – Kernel preemption, if enabled, causes the kernel to switch from the execution of one system call to another. They may be using shared resources.
  – Multiprocessing, in which case code is really executed in parallel on different processors, and they may be using shared resources as well.
• The solution is to keep as much local state as possible and, for the shared resources, use locking.

Concurrency protection with locks

mutexes

• The kernel's main locking primitive
• The process requesting the lock blocks when the lock is already held.
  – Mutexes can therefore only be used in contexts where sleeping is allowed.
• Mutex definition:
  – #include <linux/mutex.h>
• Initializing a mutex statically:
  – DEFINE_MUTEX(name);
• Or initializing a mutex dynamically:
  – void mutex_init(struct mutex *lock);

Locking and Unlocking Mutexes

mutex_lock_killable()

    bool oom_killer_disable(signed long timeout)
    {
        signed long ret;

        /*
         * Make sure to not race with an ongoing OOM killer. Check that the
         * current is not killed (possibly due to sharing the victim's memory).
         */
        if (mutex_lock_killable(&oom_lock))
            return false;
        oom_killer_disabled = true;
        mutex_unlock(&oom_lock);

        ret = wait_event_interruptible_timeout(oom_victims_wait,
                !atomic_read(&oom_victims), timeout);
        if (ret <= 0) {
            oom_killer_enable();
            return false;
        }
        pr_info("OOM killer disabled.\n");

        return true;
    }

Spinlocks

• Locks to be used for code that is not allowed to sleep (interrupt handlers), or that doesn't want to sleep (critical sections).
  – Be very careful not to call functions which can sleep!
• Originally intended for multiprocessor systems
• Spinlocks never sleep and keep spinning in a loop until the lock is available.
• Spinlocks cause kernel preemption to be disabled on the CPU executing them.
• The critical section protected by a spinlock is not allowed to sleep.
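
The "keep spinning in a loop until the lock is available" behavior can be illustrated outside the kernel. The following is a minimal userspace sketch built on C11 atomics; it is not the kernel's spinlock_t implementation. The demo_spinlock type and the demo_spin_* helpers are hypothetical names introduced only for this illustration, and unlike a kernel spinlock this sketch does not disable preemption or interrupts.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a spinlock (not the kernel's spinlock_t). */
struct demo_spinlock {
    atomic_flag locked;
};

#define DEMO_SPINLOCK_INIT { ATOMIC_FLAG_INIT }

/* Try once: returns true if the lock was free and is now held by the caller. */
static bool demo_spin_trylock(struct demo_spinlock *l)
{
    /* test_and_set returns the previous value: false means the lock was free. */
    return !atomic_flag_test_and_set_explicit(&l->locked, memory_order_acquire);
}

/* Busy-wait until the lock is acquired. The caller never sleeps. */
static void demo_spin_lock(struct demo_spinlock *l)
{
    while (!demo_spin_trylock(l))
        ; /* spin */
}

/* Release the lock so that one spinning caller can acquire it. */
static void demo_spin_unlock(struct demo_spinlock *l)
{
    atomic_flag_clear_explicit(&l->locked, memory_order_release);
}
```

While the lock is held, demo_spin_trylock() fails and demo_spin_lock() would loop forever in the same thread, which is why a critical section protected this way has to be short and must not block.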

Initializing Spinlocks

Using Spinlocks

Spinlock example

Deadlock Situations

• They can lock up your system. Make sure they never happen!
• Don't call a function that can try to get access to the same lock
• Holding multiple locks is risky!

Kernel lock validator

• Adds instrumentation to kernel locking code
• Detects violations of locking rules during system life, such as:
  – Locks acquired in different order (keeps track of locking sequences and compares them).
  – Spinlocks acquired in interrupt handlers and also in process context when interrupts are enabled.
• Not suitable for production systems but acceptable overhead in development.
• See https://www.kernel.org/doc/Documentation/locking/lockdep-design.txt for details

To enable the lockdep feature:

    make menuconfig
      Kernel hacking --->
        Lock Debugging (spinlocks, mutexes, etc...) --->
          [*] RT Mutex debugging, deadlock detection
          -*- Spinlock and rw-lock debugging: basic checks
          -*- Mutex debugging: basic checks
          [*] Wait/wound mutex debugging: Slowpath testing
          -*- Lock debugging: detect incorrect freeing of live locks
          [*] Lock debugging: prove locking correctness
          [*] Lock usage statistics
          [*] Lock dependency engine debugging
          [*] Sleep inside atomic section checking
          [*] Locking API boot-time self-tests
          < > torture tests for locking

    /proc/lockdep
    /proc/lockdep_chains
    /proc/lockdep_stats
    /proc/locks
    /proc/lock_stat

Alternatives to Locking

• Locking can have a strong negative impact on system performance.
• In some situations, you could do without it.
  – By using lock-free algorithms like Read Copy Update (RCU).
  – RCU API available in the kernel (See http://en.wikipedia.org/wiki/RCU).
  – When available, use atomic operations.

Atomic Variables

• Useful when the shared resource is an integer value
• Even an instruction like n++ is not guaranteed to be atomic on all processors!
• Atomic operations definitions
  – #include <linux/atomic.h>
• atomic_t
  – Contains a signed integer (at least 24 bits)
• Atomic operations (main ones)
  – Set or read the counter:
    • void atomic_set(atomic_t *v, int i);
    • int atomic_read(atomic_t *v);
  – Operations without return value:
    • void atomic_inc(atomic_t *v);
    • void atomic_dec(atomic_t *v);
    • void atomic_add(int i, atomic_t *v);
    • void atomic_sub(int i, atomic_t *v);

• Similar functions testing the result:
  – int atomic_inc_and_test(...);
  – int atomic_dec_and_test(...);
  – int atomic_sub_and_test(...);
• Functions returning the new value:
  – int atomic_inc_return(...);
  – int atomic_dec_return(...);
  – int atomic_add_return(...);
  – int atomic_sub_return(...);
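
A typical use of these operations is reference counting. The sketch below is a userspace analogue written with C11 <stdatomic.h>, not the kernel's atomic_t API; the demo_obj type and its helpers are hypothetical names chosen for illustration, and the comments only point out which kernel call each step resembles.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference-counted object; atomic_int stands in for atomic_t. */
struct demo_obj {
    atomic_int refcount;
};

/* Start with one reference (comparable to atomic_set(&v, 1)). */
static void demo_obj_init(struct demo_obj *o)
{
    atomic_init(&o->refcount, 1);
}

/* Take a reference (comparable to atomic_inc(&v)). */
static void demo_obj_get(struct demo_obj *o)
{
    atomic_fetch_add(&o->refcount, 1);
}

/*
 * Drop a reference. Returns true when the caller dropped the last one,
 * comparable to atomic_dec_and_test(&v).
 */
static bool demo_obj_put(struct demo_obj *o)
{
    /* fetch_sub returns the old value: old == 1 means the new value is 0. */
    return atomic_fetch_sub(&o->refcount, 1) == 1;
}

/* Read the current count (comparable to atomic_read(&v)). */
static int demo_obj_refs(struct demo_obj *o)
{
    return atomic_load(&o->refcount);
}
```

The decrement and the test for zero happen in one atomic step, so two contexts dropping their references at the same time cannot both conclude that they held the last one.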

Atomic Bit Operations

• Supply very fast, atomic operations
• On most platforms, apply to an unsigned long type.
• Apply to a void type on a few others.
• Set, clear, toggle a given bit:
  – void set_bit(int nr, unsigned long *addr);
  – void clear_bit(int nr, unsigned long *addr);
  – void change_bit(int nr, unsigned long *addr);
• Test bit value:
  – int test_bit(int nr, unsigned long *addr);
• Test and modify (return the previous value):
  – int test_and_set_bit(...);
  – int test_and_clear_bit(...);
  – int test_and_change_bit(...);
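
The "test and modify" semantics (change the bit and return its previous value in one atomic step) can be sketched in userspace with C11 atomics. This is an illustration under stated assumptions, not the kernel implementation: the demo_* helpers are hypothetical names, and the bitmap is an array of atomic_ulong rather than the plain unsigned long used by the kernel functions.

```c
#include <limits.h>
#include <stdatomic.h>
#include <stdbool.h>

#define DEMO_BITS_PER_LONG (sizeof(unsigned long) * CHAR_BIT)

/* Atomically set bit nr; return true if it was already set. */
static bool demo_test_and_set_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    atomic_ulong *word = addr + nr / DEMO_BITS_PER_LONG;

    /* fetch_or returns the word as it was before the bit was set. */
    return (atomic_fetch_or(word, mask) & mask) != 0;
}

/* Atomically clear bit nr; return true if it was set before. */
static bool demo_test_and_clear_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    atomic_ulong *word = addr + nr / DEMO_BITS_PER_LONG;

    return (atomic_fetch_and(word, ~mask) & mask) != 0;
}

/* Read bit nr without modifying it. */
static bool demo_test_bit(unsigned int nr, atomic_ulong *addr)
{
    unsigned long mask = 1UL << (nr % DEMO_BITS_PER_LONG);
    atomic_ulong *word = addr + nr / DEMO_BITS_PER_LONG;

    return (atomic_load(word) & mask) != 0;
}
```

A common pattern is to use the return value of the test-and-set call as a "busy" flag: only the caller that sees false (bit previously clear) proceeds, and it clears the bit when done.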

Kernel locking: summary and references

• Use mutexes in code that is allowed to sleep
• Use spinlocks in code that is not allowed to sleep (interrupts) or for which sleeping would be too costly (critical sections)
• Use atomic operations to protect integers or addresses
• See kernel-hacking/locking in kernel documentation for many details about kernel locking mechanisms.

Debugging Embedded Linux Systems

• Useful Link
  – https://training.ti.com/kr/debugging-embedded-linux-systems-training-series

Kernel logging system architecture

Kernel log example

Kernel log buffer

• The kernel log buffer stores kernel messages
• It is a circular buffer. Old messages are overwritten when the buffer is full
  – Use the klogd daemon to keep old messages in a file
    • klogd receives kernel messages via the syslog system call (or /proc/kmsg) and redirects them to syslogd
  – Log buffer size is configurable
• The kernel log buffer can be manipulated via the syslog system call
  – or the dmesg command line tool

Kernel log buffer size

• Default size is 64KB
• Adjust the size
  – Method #1: Kernel config option CONFIG_LOG_BUF_SHIFT=n
    • menuconfig: "General Setup"
    • Buffer size = 2^n bytes
    • n=16: 64KB
    • n=17: 128KB, …
  – Method #2: U-Boot bootargs: log_buf_len=n
    • Here n is the buffer size in bytes and must be a power of two (for example log_buf_len=128K)
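
The relation between the shift value n and the buffer size can be checked with a few lines of C. This is a small illustrative sketch; demo_log_buf_bytes() and demo_log_buf_shift_for() are hypothetical helper names, not kernel functions.

```c
#include <limits.h>
#include <stddef.h>

/* Size in bytes of a log buffer configured with a shift of `shift` (2^shift). */
static size_t demo_log_buf_bytes(unsigned int shift)
{
    return (size_t)1 << shift;
}

/* Smallest shift whose buffer holds at least `bytes` bytes. */
static unsigned int demo_log_buf_shift_for(size_t bytes)
{
    /* Stop before the shift would overflow size_t. */
    const unsigned int max_shift = sizeof(size_t) * CHAR_BIT - 1;
    unsigned int shift = 0;

    while (shift < max_shift && ((size_t)1 << shift) < bytes)
        shift++;
    return shift;
}
```

For instance, a shift of 16 gives 65536 bytes (64KB), and a buffer that must hold at least 100000 bytes needs a shift of 17 (128KB).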

Retrieve kernel logs

• dmesg command
  – prints/controls the log buffer
• Common dmesg usage:
  – dmesg      # print the log buffer
  – dmesg -C   # clear the log buffer
  – dmesg -c   # print then clear the log buffer

syslog system call

Adding log messages from

• Interface: /dev/kmsg
• Usage: echo "some comments" > /dev/kmsg
• Example:
    echo "### TESTNOTE: unplugged thumb drive" > /dev/kmsg
    echo "### TESTNOTE: waited for a couple seconds" > /dev/kmsg
    echo "### TESTNOTE: re-plugged thumb drive" > /dev/kmsg

Kernel Debugging using messages

• printk(): kernel-space equivalent of printf()
  – printk(KERN_ERR "something went wrong, return code: %d\n", ret);
• Kernel-specific conversion specifiers
  – Ex.: "%pS" prints a symbol name with offset: versatile_init+0x0/0x110
• Alias functions
  – pr_info("Booting CPU %d\n", cpu);
    [ 202.350064] Booting CPU 1

Name          String  Alias function                           Meaning
KERN_EMERG    "0"     pr_emerg                                 Emergency messages, system is about to crash or is unstable
KERN_ALERT    "1"     pr_alert                                 Something bad happened and action must be taken immediately
KERN_CRIT     "2"     pr_crit                                  A critical condition occurred, like a serious hardware/software failure
KERN_ERR      "3"     pr_err                                   An error condition, often used by drivers to indicate difficulties with the hardware
KERN_WARNING  "4"     pr_warn                                  A warning, meaning nothing serious by itself but might indicate problems
KERN_NOTICE   "5"     pr_notice                                Nothing serious, but notable nevertheless. Often used to report security events.
KERN_INFO     "6"     pr_info                                  Informational message, e.g. startup information at driver initialization
KERN_DEBUG    "7"     pr_debug, pr_devel (if DEBUG is defined) Debug messages
KERN_DEFAULT  "d"                                              The default kernel loglevel
KERN_CONT     ""      pr_cont                                  "continued" line of log printout (only done after a line that had no enclosing \n)

printk() loglevel

• If you don't specify a log level in your message, it defaults to DEFAULT_MESSAGE_LOGLEVEL (usually "4" = KERN_WARNING)
  – set via the CONFIG_DEFAULT_MESSAGE_LOGLEVEL kernel config option (named CONFIG_MESSAGE_LOGLEVEL_DEFAULT in recent kernels)
• The log level is used by the kernel to determine the importance of a message and to decide whether it should be presented to the user immediately, by printing it to the current console.
• The kernel compares the log level of the message to the console_loglevel (a kernel variable)
  – if the log level of the message < the console_loglevel
    • the message will be printed to the current console.
  – All the messages, regardless of their priority, are stored in the kernel log ring buffer
    ➔ Typically accessed using the dmesg command
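
The console filtering rule above reduces to a single comparison. The sketch below models it in userspace C; the enum values mirror the numeric levels in the table of log levels, while the DEMO_* names and demo_shown_on_console() are hypothetical and exist only for this illustration.

```c
#include <stdbool.h>

/* Numeric log levels, lower means more severe (hypothetical DEMO_* names). */
enum demo_loglevel {
    DEMO_KERN_EMERG   = 0,
    DEMO_KERN_ALERT   = 1,
    DEMO_KERN_CRIT    = 2,
    DEMO_KERN_ERR     = 3,
    DEMO_KERN_WARNING = 4,
    DEMO_KERN_NOTICE  = 5,
    DEMO_KERN_INFO    = 6,
    DEMO_KERN_DEBUG   = 7
};

/*
 * A message reaches the console only if its level is strictly lower
 * (more severe) than console_loglevel. It is stored in the ring buffer
 * either way, which this model does not represent.
 */
static bool demo_shown_on_console(int msg_level, int console_loglevel)
{
    return msg_level < console_loglevel;
}
```

With console_loglevel set to 5, a KERN_WARNING (4) message is printed on the console but a KERN_NOTICE (5) message is not; setting console_loglevel to 8 lets even KERN_DEBUG (7) messages through.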

• To determine your current console_loglevel:
  – $ cat /proc/sys/kernel/printk
• To change console_loglevel:
  – $ echo n > /proc/sys/kernel/printk
• Filtering log messages
  – $ dmesg -n 5
    • Set console logging filter to KERN_WARNING or more severe.
  – $ dmesg -l warn
    • Only print the logs of KERN_WARNING in the kernel ring buffer.
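
/proc/sys/kernel/printk holds four whitespace-separated integers: the current console_loglevel, the default message log level, the minimum console log level and the default console log level. A program can read the file and parse that line as in the following sketch; the struct and function names are hypothetical, and the parser works on a string so that it can be tried without access to /proc.

```c
#include <stdbool.h>
#include <stdio.h>

/* The four values of /proc/sys/kernel/printk, in file order. */
struct demo_printk_levels {
    int console_loglevel;
    int default_message_loglevel;
    int minimum_console_loglevel;
    int default_console_loglevel;
};

/*
 * Parse one line in the format of /proc/sys/kernel/printk.
 * Returns true only if all four values were found.
 */
static bool demo_parse_printk(const char *text, struct demo_printk_levels *out)
{
    return sscanf(text, "%d %d %d %d",
                  &out->console_loglevel,
                  &out->default_message_loglevel,
                  &out->minimum_console_loglevel,
                  &out->default_console_loglevel) == 4;
}
```

On a target, the input string would typically come from fgets() on the opened /proc/sys/kernel/printk file; the first parsed field is the value compared against each message's log level.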

Kernel Debugging using messages

• For drivers, the dev_*() family of functions: dev_emerg(), dev_alert(), dev_crit(), dev_err(), dev_warn(), dev_notice(), dev_info() and the special dev_dbg() (see next page)
  – They take a pointer to struct device as first argument, and then a format string with arguments
  – Defined in include/linux/device.h
  – To be used in drivers integrated with the Linux device model
  – Example: dev_info(&pdev->dev, "in probe\n");
    [ 25.878382] serial 48024000.serial: in probe
    [ 25.884873] serial 481a8000.serial: in probe

pr_debug() and dev_dbg()

• Macros defined with the lowest message level, 7 (KERN_DEBUG)
• Used for printing debug messages in the kernel or in device drivers, respectively.
• When the driver is compiled with DEBUG defined, all these messages are compiled and printed at the debug level.
• Enable the DEBUG macro
  – Kernel config option: ex. config DEBUG_GPIO
  – add #define DEBUG at the beginning of the driver
  – use ccflags-$(CONFIG_DRIVER) += -DDEBUG in the Makefile
• Dynamic Debugging
  – Compile kernel with CONFIG_DYNAMIC_DEBUG
  – Details in admin-guide/dynamic-debug-howto
  – Very powerful feature to only get the debug messages you're interested in.
• When neither DEBUG nor CONFIG_DYNAMIC_DEBUG is used, these messages are not compiled in.

    #ifdef DEBUG
    #define pr_debug(...) printk(KERN_DEBUG ...)
    #define dev_dbg(...)  dev_printk(KERN_DEBUG ...)
    #else
    #define pr_debug(...) ({})
    #define dev_dbg(...)  ({})
    #endif

What is dynamic debug?

• Dynamically enable/disable kernel debug code at runtime to obtain kernel debug log:
  – pr_debug()/dev_dbg()
  – print_hex_dump_debug()/print_hex_dump_bytes()
• Benefits:
  – Almost no overhead when log code is not enabled.
  – Turn on/off debug log at runtime.
  – No need to recompile the kernel.
• CONFIG_DYNAMIC_DEBUG=y
• menuconfig: Kernel hacking ---> printk and dmesg options ---> [*] Enable dynamic printk() support

Control interface

• Control methods
  – Line Number or Range
  – Function Name
  – Filename
  – Module Name
• Control interface
  – debugfs: <debugfs>/dynamic_debug/control
  – U-Boot bootargs

debugfs control interface

    # echo "<query>" > <debugfs>/dynamic_debug/control

Enable debug messages during boot process

• This allows debugging of core code or built-in modules during the boot process.
• U-Boot bootargs
  – dyndbg="QUERY" (for the kernel)
  – module.dyndbg="QUERY" (for a module)
• Example: dyndbg="file ec.c +p"

DebugFS

• A virtual filesystem to export debugging information to user space.
  – Kernel configuration: CONFIG_DEBUG_FS
    • Kernel hacking -> Debug Filesystem
  – The debugging interface disappears when Debugfs is configured out.
  – You can mount it as follows:
    • sudo mount -t debugfs none /sys/kernel/debug
  – First described on http://lwn.net/Articles/115405/
  – API documented in the kernel documentation: filesystems (section "The debugfs filesystem")

DebugFS API

• Create a sub-directory for your driver:
  – struct dentry *debugfs_create_dir(const char *name, struct dentry *parent);
• Expose an integer as a file in DebugFS. Example:
  – struct dentry *debugfs_create_u8(const char *name, umode_t mode, struct dentry *parent, u8 *value);
    • u8, u16, u32, u64 for decimal representation
    • x8, x16, x32, x64 for hexadecimal representation
• Expose a binary blob as a file in DebugFS:
  – struct dentry *debugfs_create_blob(const char *name, umode_t mode, struct dentry *parent, struct debugfs_blob_wrapper *blob);
• It is also possible to support writable DebugFS files or to customize the output using the more generic debugfs_create_file() function.

Debugfs example

strace: system call trace

• Intercepts and records
  – System calls issued by a process
  – Signals a process received

Ftrace

• Useful for event tracing, analyzing latencies and performance issues
• ftrace uses the tracefs file system to hold the control files as well as the files to display output.
• When tracefs is configured into the kernel, the directory /sys/kernel/tracing will be created.
• To mount this directory:
  – $ mount -t tracefs nodev /sys/kernel/tracing
• Before 4.1, all ftrace tracing control files were within the debugfs file system, which is typically located at /sys/kernel/debug/tracing.
• For backward compatibility, when mounting the debugfs file system, the tracefs file system will be automatically mounted at /sys/kernel/debug/tracing
  – All files located in the tracefs file system will be located in that debugfs file system directory as well.

https://www.kernel.org/doc/html/v5.1/trace/ftrace.html

Ftrace key files

Ftrace tracers

Example of function tracer

    # echo sys_nanosleep hrtimer_interrupt > set_ftrace_filter
    # echo function > current_tracer
    # echo 1 > tracing_on
    # usleep 1
    # echo 0 > tracing_on
    # cat trace
    # tracer: function
    #
    # entries-in-buffer/entries-written: 5/5   #P:4
    #
    #                              _-----=> irqs-off
    #                             / _----=> need-resched
    #                            | / _---=> hardirq/softirq
    #                            || / _--=> preempt-depth
    #                            ||| /     delay
    #           TASK-PID   CPU#  ||||    TIMESTAMP  FUNCTION
    #              | |       |   ||||       |         |
              usleep-2665  [001] ....  4186.475355: sys_nanosleep <-system_call_fastpath
              <idle>-0     [001] d.h1  4186.475409: hrtimer_interrupt <-smp_apic_timer_interrupt
              usleep-2665  [001] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [003] d.h1  4186.475426: hrtimer_interrupt <-smp_apic_timer_interrupt
              <idle>-0     [002] d.h1  4186.475427: hrtimer_interrupt <-smp_apic_timer_interrupt

• irqs-off: 'd' interrupts are disabled, '.' otherwise.
• hardirq/softirq: 'h' hard irq is running.

Example of function tracer

    # echo __do_fault > set_graph_function
    # echo function_graph > current_tracer
    # echo 1 > tracing_on
    # usleep 1
    # echo 0 > tracing_on
    # cat trace
    # tracer: function_graph
    #
    # CPU  DURATION                  FUNCTION CALLS
    # |     |   |                     |   |   |   |
     0)               |  sys_open() {
     0)               |    do_sys_open() {
     0)               |      getname() {
     0)               |        kmem_cache_alloc() {
     0)   1.382 us    |          __might_sleep();
     0)   2.478 us    |        }
     0)               |        strncpy_from_user() {
     0)               |          might_fault() {
     0)   1.389 us    |            __might_sleep();
     0)   2.553 us    |          }
     0)   3.807 us    |        }
     0)   7.876 us    |      }
     0)               |      alloc_fd() {
     0)   0.668 us    |        _spin_lock();
     0)   0.570 us    |        expand_files();
     0)   0.586 us    |        _spin_unlock();

Using Magic SysRq

• Allows you to run multiple debug / rescue commands even when the kernel seems to be in deep trouble
  – On PC: press [Alt] + [Prnt Scrn] + <character> simultaneously ([SysRq] = [Alt] + [Prnt Scrn])
  – On embedded: in the console, send a break character (Picocom: press [Ctrl] + a followed by [Ctrl] + \ ), then press <character>
• Example commands:
  – h: show available commands
  – s: sync all mounted filesystems
  – b: reboot the system
  – n: makes RT processes nice-able.
  – w: dumps the tasks that are in uninterruptible (blocked) state
  – t: dumps a list of current tasks and their information
  – You can even register your own!
• Detailed in admin-guide/sysrq

kgdb - A kernel debugger

• The execution of the kernel is fully controlled by gdb from another machine, connected through a serial line.
• Can do almost everything, including inserting breakpoints in interrupt handlers.
• Feature supported for the most popular CPU architectures
• Details available in the kernel documentation: dev-tools/kgdb
• Recommended to turn on CONFIG_FRAME_POINTER to aid in producing more reliable stack backtraces in gdb.

• You must include a kgdb I/O driver.
• One of them is kgdb over serial console (kgdboc: kgdb over console, enabled by CONFIG_KGDB_SERIAL_CONSOLE)
• Configure kgdboc at boot time by passing to the kernel:
  – kgdboc=<tty-device>,<bauds>
  – For example: kgdboc=ttyS0,115200
• Then also pass kgdbwait to the kernel: it makes kgdb wait for a debugger connection.
• Boot your kernel, and when the console is initialized, interrupt the kernel with a break character and then g in the serial console (see our Magic SysRq explanations).
• On your workstation, start gdb as follows:
  – arm-linux-gdb ./
  – (gdb) set remotebaud 115200
  – (gdb) target remote /dev/ttyS0
• Once connected, you can debug a kernel the way you would debug an application program.

kernel Oops

• Deviation from correct behavior of the Linux kernel
• Produces certain error messages in kernel logs
• Log structure
  – Error Summary
  – Error Type
  – CPU#/PID#/Kernel-Version
  – Hardware
  – CPU Register Dump
  – PC/LR
  – Stack Dump
  – Backtrace

Kernel oops log structure example

Tools for locating errors in

• gdb or addr2line helps to locate the error in source code when kernel debug info is enabled in kernel config
  – gdb list command
  – addr2line -fe option
  – objdump -dS option

Locate errors example 1: Kernel

Locate errors example 2: Module

Locate errors example 3: NULL pointer in workqueue

Locate errors example 4: Spinlock dead lock

Debugging with a JTAG interface

• Two types of JTAG dongles
  – The ones offering a gdb compatible interface, over a serial port or an Ethernet connection. gdb can directly connect to them.
  – The ones not offering a gdb compatible interface are generally supported by OpenOCD (Open On Chip Debugger): http://openocd.sourceforge.net/
    • OpenOCD is the bridge between the gdb debugging language and the JTAG interface of the target CPU.
    • See the very complete documentation: http://openocd.org/documentation/
    • For each board, you'll need an OpenOCD configuration file (ask your supplier)

More kernel debugging tips

• Make sure CONFIG_KALLSYMS_ALL is enabled
  – Is turned on by default
  – To get oops messages with symbol names instead of raw addresses
• On ARM, if your kernel doesn't boot or hangs without any message, you can activate early debugging options (CONFIG_DEBUG_LL and CONFIG_EARLYPRINTK), and add earlyprintk to the kernel command line.