Debugging Kernel Problems
Total Page:16
File Type:pdf, Size:1020Kb
Debugging Kernel Problems by Greg Lehey Edition for BSDCan 2006 Ottawa, 11 May 2006 Debugging Kernel Problems by Greg Lehey ([email protected], [email protected], [email protected]) Copyright © 1995-2005 Greg Lehey $Id: handout.mm,v 1.14 2005/11/05 02:36:32 grog Exp grog $ This book is licensed under the Creative Commons Attribution-NonCommer cial-ShareAlike license, Version 2.5. (http://cr eativecommons.org/licenses/by-nc-sa/2.5/). The following is a slightly refor matted version of the license specified there. Any dif ferences arenot intended to change the meaning of the license. Youare free: • to copy, distribute, display, and perfor m the work, and • to make derivative works under the following conditions: • Attribution: You must attribute the work in the manner specified by the author or licensor. • Non-commercial: You may not use this work for commercial purposes. • Shar e Alike: If you alter,transfor m, or build upon this work, you may distribute the resulting work only under a license identical to this one. • For any reuse or distribution, you must make clear to others the license terms of this work. • Any of these conditions can be modified if you get permission from the copyright holder. Your fair use and other rights areinnoway affected by the above. The latest version of this document is available at http://www.lemis.com/gr og/Papers/Debug-tutorial/tutori- al.pdf. The latest version of the accompanying slides is at http://www.lemis.com/gr og/Papers/Debug-tutori- al/slides.pdf. Debugging Kernel Problems 3 Preface Debugging kernel problems is a black art. Not many people do it, and documentation is rare, inaccurate and incomplete. This document is no exception: faced with the choice of accuracy and completeness, I chose to attempt the latter.Asusual, time was the limiting factor,and this draft is still in beta status, as it has been through numerous pr esentations of the tutorial. This is a typical situation for the whole topic of kernel debugging: building debug tools and documentation is expensive, and the people who write them arealso the people who use them, so there's a tendency to build as much of the tool as necessary to do the job at hand. If the tool is well-written, it will be reusable by the next person who looks at a particular area; if not, it might fall into disuse. Consider this book a starting point for your own development of debugging tools, and remember: morethan anywhereelse, this is an area with “some assembly requir ed”. 4Debugging Kernel Problems Debugging Kernel Problems 5 1 Introduction Operating systems fail. All operating systems contain bugs, and they will sometimes cause the system to behave incorrectly. BSD ker nels ar e no exception. Compar ed to most other operating systems, both free and commercial, BSD kernels offer a large number of debugging tools. This tutorial examines the options available both to the experienced end user and also to the developer. This tutorial bases on the FreeBSD kernel, but the differ ences in other BSDs aresmall. We'll look at the following topics: • How and why kernels fail. • Understanding log files: dmesg and the files in /var/log,notably /var/log/messages. • Userland tools for debugging a running system. • Building a kernel with debugging support: the options. • Using a serial console. • Pr eparing for dumps: dumpon, savecor e. • The assembler-level view of a C program. • Pr eliminary dump analysis. • Reading code. • Intr oduction to the kernel source tree. • Analysing panic dumps with gdb. • On-line kernel debuggers: ddb,remote serial gdb. • Debugging a running system with ddb. • Debugging a running system with gdb. 6Debugging Kernel Problems • Debug options in the kernel: INVARIANTS and friends. • Debug options in the kernel: WITNESS. • Code-based assistance: KTR. Howand whyker nels fail Good kernels should not fail. They must protect themselves against a number of ex- ter nal influences, including hardwarefailur e, both deliberately and accidentally badly written user programs, and kernel programming errors. In some cases, of course, ther e is no way a kernel can recover,for example if the only processor fails. On the other hand, a good kernel should be able to protect itself from badly written user pro- grams. Aker nel can fail in a number of ways: • It can stop reacting to the outside world. This is called a hang. • It can destroy itself (overwriting code). It's almost impossible to distinguish this state from a hang unless you have tools which can examine the machine state inde- pendently of the kernel. • It can detect an inconsistency, report it and stop. In UNIX terminology, this is a panic . • It can continue running incorrectly. For example, it might corrupt data on disk or br each network protocols. By far the easiest kind of failuretodiagnose is a panic. Ther e ar e two basic types: • Failed consistency checks result in a specific panic: panic: Free vnode isn't • Exception conditions result in a less specific panic: panic: Page fault in kernel mode The other cases can be very difficult to catch at the right moment. Debugging Kernel Problems 7 2 Userland programs dmesg In normal operation, a kernel will sometimes write messages to the outside world via the “console”, /dev/console.Inter nally it writes via a circular buffer called msgbuf. The dmesg pr ogram can show the current contents of msgbuf.The most important use is at startup time for diagnosing configuration problems: # dmesg Copyright (c) 1992-2002 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights reserved. FreeBSD 4.5-PRERELEASE #3: Sat Jan 513:25:02 CST 2002 [email protected]:/src/FreeBSD/4-STABLE-ECHUNGA/src/sys/compile/ECHUNGA Timecounter "i8254" frequency 1193182 Hz Timecounter "TSC" frequency 751708714 Hz CPU: AMD Athlon(tm) Processor (751.71-MHz 686-class CPU) Origin = "AuthenticAMD" Id = 0x621 Stepping = 1 Features=0x183f9ff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE3 6,MMX,FXSR> AMD Features=0xc0400000<AMIE,DSP,3DNow!> ... pci0: <unknown card> (vendor=0x1039, dev=0x0009) at 1.1 ... cd1 at ahc0 bus 0 target 1 lun 0 cd1: <TEAC CD-ROM CD-532S 1.0A> Removable CD-ROM SCSI-2 device cd1: 20.000MB/s transfers (20.000MHz, offset 15) cd1: Attempt to query device size failed: NOT READY, Medium not present ... WARNING: / was not properly unmounted Much of this information is informative, but occasionally you get messages indicating some problem. The last line in the previous example shows that the system did not shut down properly: either it crashed, or the power failed. During normal operation you might see messages like the following: sio1: 1 more silo overflow (total 1607) 8Debugging Kernel Problems sio1: 1 more silo overflow (total 1608) nfsd send error 64 ... nfs server wantadilla:/src: not responding nfs server wantadilla:/: not responding nfs server wantadilla:/src: is alive again nfs server wantadilla:/: is alive again arp info overwritten for 192.109.197.82 by 00:00:21:ca:6e:f1 In the course of time, the message buffer wraps around and the old contents arelost. For this reason, FreeBSD and NetBSD print the dmesg contents after boot to the file /var/run/dmesg.boot for later refer ence. In addition, the output is piped to syslogd, the system log daemon, which by default writes it to /var/log/messages. During kernel debugging you can print msgbuf.For FreeBSD, enter: (gdb) printf "%s", (char *)msgbufp->msg_ptr For NetBSD or OpenBSD, enter: (gdb) printf "%s", (char *) msgbufp->msg_bufc Log files BSD systems keep track of significant events in log files.They can be of great use for debugging. Most of them arekept in /var/log,though this is not a requir ement. Many of them aremaintained by syslogd,but thereisnorequir ement for a special program. The only requir ement is to avoid having two programs maintaining the same file. syslogd syslogd is a standard daemon which maintains a number of the files in /var/log.You should always run syslogd unless you have a very good reason not to. Pr ocesses nor mally write to syslogd with the library function syslog: #include <syslog.h> #include <stdarg.h> void syslog (int priority, const char *message, ...); syslog is used in a similar manner to printf;only the first parameter is differ ent. Although it's called priority in the man page, it's divided into two parts: • The level field describes how serious the message is. It ranges from LOG_DEBUG (infor mation nor mally suppr essed and only produced for debug purposes) to LOG_EMERG (“machine about to self-destruct”). • The facility field describes what part of the system generated the message. The priority field can be repr esented in text formasfacility.level.For example, error Debugging Kernel Problems 9 messages from the mail subsystem arecalled mail.err. In FreeBSD, as the result of security concerns, syslogd is started with the -s flag by default. This stops syslogd fr om accepting remote messages. If you specify the -ss flag, as suggested in the comment, you will also not be able to log to remote systems. Depending on your configuration, it’s worth changing this default. For example, you might want all systems in example.or g to log to gw.That way you get one set of log files for the entirenetwork. /etc/syslog.conf syslogd reads the file /etc/syslog.conf,which specifies wheretolog messages based on their message priority. Her e's aslightly modified example: #$FreeBSD: src/etc/syslog.conf,v 1.13 2000/02/08 21:57:28 rwatson Exp $ # #Spaces are NOT valid field separators in this file.