Kdump: Smarter, Easier, Trustier

Kdump: Smarter, Easier, Trustier Vivek Goyal Neil Horman Ken’ichi Ohmichi IBM Red Hat, Inc. NEC Soft [email protected] [email protected] [email protected] Maneesh Soni Ankita Garg IBM IBM [email protected] [email protected] Abstract a tera-byte of RAM. Capturing the contents of the entire RAM would result in a proportionately large core file Kdump, a kexec based kernel crash dumping mecha- and managing a tera-byte file can be difficult. One does nism, has witnessed a lot of new significant development not need the contents of entire RAM to debug a ker- activities recently. Features like, the relocatable kernel, nel problem and many pages like userspace pages can dump filtering, and initrd based dumping have enhanced be filtered out. Now an open source userspace utility is kdump capabilities are important steps towards making available for dump filtering and Section 3 discusses the it a more reliable and easy to use solution. New tools working and internals of the utility. like Linux Kernel Dump Test Module (LKDTM) pro- vide an opportunity to automate kdump testing. It can Currently, a kernel crash dump is captured with the be especially useful for distributions to detect and weed help of init scripts in the userspace in the capture ker- out regressions relatively quickly. This paper presents nel. This approach has some drawbacks. Firstly, it as- various newly introduced features and provides imple- sumes that the root filesystem did not get corrupted and mentation details wherever appropriate. It also briefly is still mountable in the second kernel. Secondly, mini- discusses the future directions, such as early boot crash mal work should be done in second kernel and one need dumping. not have to run various user space init scripts. This led to the idea of building a custom initial ram disk (initrd) to 1 Introduction capture the dump and improve the reliability of the op- eration. Various implementation details of initrd based Kdump is a kernel crash dumping mechanism where a dumping are presented in Section 4. pre-loaded kernel is booted in, to capture the crash dump after a system crash [12]. This pre-loaded kernel, often Section 5 discusses the Linux Kernel Dump Test Mod- called as capture kernel, runs from a different physical ule (LKDTM), a kernel module, which allows one to set memory area than the production kernel or regular ker- up and trigger crashes from various kernel code paths at nel. As of today, a capture kernel is specifically com- run time. It can be used to automate kdump testing pro- piled and linked for a specific memory location, and is cedure to identify bugs and eliminate regressions with shipped as an extra kernel to capture the dump. A re- lesser efforts. This paper also gives a brief update on de- locatable kernel implementation gets rid of the require- vice driver hardening efforts in Section 6 and concludes ment to run the kernel from the address it has been com- with future work in Section 7. piled for, instead one can load the kernel at a different address and run it from there. Effectively the distribu- 2 Relocatable bzImage tions and kdump users don’t have to build an extra kernel to capture the dump, enhancing ease of use. Section Generally, the Linux R kernel is compiled for a fixed 2 provides the details of relocatable kernel implementa- address and it runs from that address. Traditionally, tion. for i386 and x86_64 architectures, it has been com- Modern machines are being shipped with bigger and piled and run from 1MB physical memory location. bigger RAMs and a high end configuration can possess Later, Eric W. Biederman introduced a config option, • 167 • 168 • Kdump: Smarter, Easier, Trustier CONFIG_PHYSICAL_START, which allowed a kernel to and packed into the bzImage. The uncompressed be compiled for a different address. This option effec- kernel code decompresses the kernel, performs the tively shifted the kernel in virtual address space and one relocations, and transfers control to the protected could compile and run the kernel from a physical ad- mode kernel. This approach has been adopted by dress say, 16MB. Kdump used this feature and built a the i386 implementation. separate kernel for capturing the dump. This kernel was specifically built to run from a reserved memory location. 2.2 Design Details (i386) The requirement of building an extra kernel for dump capturing has not gone over too well with distributions, In i386, kernel text and data are part of linearly mapped as they end up shipping an extra kernel binary. Apart region which has got hard-coded assumptions about vir- from disk space requirements, it also led to increased tual to physical address mapping. Hence, it is probably efforts in terms of supporting and testing this extra ker- difficult to adopt the modifying the page table approach nel. Also from a user’s perspective, building an extra for implementing a relocatable kernel. Instead, a sim- kernel is cumbersome. pler, non-intrusive approach is to ask the linker to generate relocation information, pack this relocation infor- The solution to the above problem is a relocatable ker- mation into bzImage, and the uncompressed kernel code nel, where the same kernel binary can be loaded and can process these relocations before jumping to the 32- run from a different address than what it has been com- bit kernel entry point (startup_32()). piled for. Jan Kratochvil had posted a set of proto- type patches to kick- off the discussion on the fastboot mailing list [6]. Later, Eric W. Biederman came up 2.2.1 Relocation Information Generation with another set of patches and posted them to LKML for review [8]. Finally, Vivek Goyal picked up Eric’s patches, cleaned those up, fixed a number of bugs, in- Relocation information can be generated in many ways. corporated various review comments, and went through The initial experiment was to compile the kernel as multiple rounds of reposting to LKML for inclusion into shared object file (-shared) which generated the re- the mainline kernel. Relocatable kernel implementation location entries. Eric had posted the patches for this is very architecture dependent and support for i386 ar- approach [9] but it was found that the linker also gen- chitecture has been merged with version 2.6.20 of the erated the relocation entries for absolute symbols (for mainline kernel. Patches for x86_64 have been posted some historical reason) [1]. By definition, absolute sym- on LKML [11] and are now part of -mm. Hopefully, bols are not to be relocated, but, with this approach, ab- these will be merged with mainline kernel soon. solute symbols also ended up being relocated. Hence this method did not prove to be a viable one. 2.1 Design Approaches Later, a different approach was taken where The following design approaches have been discussed the i386 kernel is built with the linker option for the relocatable bzImage implementation. --emit-relocs. This option builds an executable vmlinux and still retains relocation information in a fully linked executable. This increases the size • Modify kernel text/data mapping at run time of vmlinux by around 10%, though this information is At run time, the kernel determines where it has discarded at runtime. The kernel build process goes been loaded by the boot-loader and it updates its through these relocation entries and filters out PC page tables to reflect the right mapping between relative relocations, as these don’t have to be adjusted kernel virtual and physical addresses for kernel text if bzImage is loaded at a different physical address. It and data. This approach has been adopted for the also filters out the relocations generated with respect to x86_64 implementation. absolute symbols because absolute symbols don’t have • Relocate using relocation information to be relocated. The rest of the relocation offsets are This approach forces the linker to generate reloca- packed into the compressed vmlinux. Figure 1 depicts tion information. These relocations are processed the new i386 bzImage build process. 2007 Linux Symposium, Volume One • 169 Link vmlinux with --emit-relocs option Compressed bzImage Process Relocations Move kernel Uncompressed to a safe Kernel (relocs.c) location (1) Decompress Kernel (2) vmlinux.relocs Compressed Location where Concatenate with bzImage kernel is run from vmlinux.bin Location where vmlinux.bin.all kernel is loaded by boot-loader, Compress Usually, 1MB vmlinux.bin.gz Figure 2: In-place bzImage decompression piggy.o 2.2.3 Perform Relocations Link with misc.o and head.o After decompression, all the relocations are performed. vmlinux Uncompressed code calculates the difference between Strip the address for which the kernel was compiled and the address at which it is loaded. This difference is added vmlinux.bin to the locations as specified by relocation offsets and Append Real mode control is transferred to the 32-bit kernel entry point. code bzImage 2.2.4 Kernel Config Options Figure 1: i386 bzImage build process Several new config options have been introduced for relocatable kernel implementation. CONFIG_ RELOCATABLE controls whether the resulting bzImage 2.2.2 In-place Decompression is relocatable or not. If this option is not set, no relocation information is generated in the vmlinux. Generally, bzImage decompresses itself to the address it has been compiled for (CONFIG_PHYSICAL_START) The code for decompressing the kernel has been and runs from there. But if CONFIG_RELOCATABLE is changed so that decompression can be done in-place. set, then it runs from the address it has been loaded at by Now the kernel is not first decompressed to any other the boot-loader and it ignores the compile time address.

Load more