From: Giorgos Keramidas
To: Kostik Belousov
Cc: freebsd-doc@freebsd.org
Date: Fri, 23 Jun 2006 16:25:58 +0300
Subject: Re: [patch] deadlock debugging
Message-ID: <20060623132558.GD7062@gothmog.pc>
In-Reply-To: <20060607084346.GA21391@deviant.kiev.zoral.com.ua>

On 2006-06-07 11:43, Kostik Belousov wrote:
> Reports of the deadlocks are reccurrent topic on the current-
> and stable- lists. Many of us have to repeat the instructions
> on how to provide the useful initial bug report from them.
>
> Please, comment proposed addition to the kernel debugging
> chapter of the developer handbook.

Hi Kostik,

> Obviously, I am not an english native speaker. Your corrections
> for both factual material and grammar/style are very much
> welcome !
>
> P.S. I'm not on the list, do not remove CC: to me on replying.

Ok :)

This seems like a useful addition to the developer's handbook, but I
have some minor comments. See inline text below:

> Index: en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml
> ===================================================================
> RCS file: /usr/local/arch/ncvs/doc/en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml,v
> retrieving revision 1.64
> diff -u -r1.64 chapter.sgml
> --- en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 5 Jan 2006 20:03:34 -0000 1.64
> +++ en_US.ISO8859-1/books/developers-handbook/kerneldebug/chapter.sgml 7 Jun 2006 08:39:20 -0000
> @@ -821,6 +821,41 @@
> on any configured console driver, including a serial
> console.
>
> +
> +
> + Debugging the Deadlocks

`Debugging Kernel Deadlocks' is probably a better title here, since
this section is about deadlocks inside the kernel and `the Deadlocks'
doesn't really make this as obvious as I'd want it to be.

> + You may experience so called deadlocks, the situation where
> + system stops doing useful work. To provide the useful bug report
> + in this situation, you shall use ddb as described above. Please,
> + include the output of ps and
> + trace for suspected processes in the
> + report.

This paragraph has a few minor syntax buglets.
English is not my native language, but I would probably rewrite
this as:

| Modern &os; releases have been extended with support for
| Symmetric Multiprocessing (SMP). To support highly parallel
| processing, the &os; kernel uses a lot of internal locking and
| synchronization primitives, to allow multiple kernel threads
| to run concurrently on systems that can support such a mode of
| operation. Bugs in the use of these internal locking
| mechanisms can lead to a situation where two or more kernel
| threads compete for the same resources and block
| indefinitely waiting for each other. When this happens, the
| system may become unstable, and either crash or
| appear to hang. This hang is called a
| deadlock.
|
| Debugging a deadlock may be a tricky and difficult thing,
| but &os; provides some tools that may assist you in tracking
| down the problem or collecting information about the deadlock
| when it occurs.
|
| One of these tools is the kernel debugger,
| DDB, which you can use as described
| in the previous sections to collect useful information for
| such a bug. DDB commands that are
| very useful and may provide information that helps in debugging a
| deadlock are:
|
|   ps
|   trace
|
| Use the ps command to list all the
| processes and then use trace on processes
| that are suspected of having caused the deadlock.
|
| Other commands that can provide useful information for
| tracking down the cause of a deadlock are:
|
|   show allpcpu
|   show alllocks
|   show lockedvnods
|
| Useful information about what each process was doing, at
| the time the deadlock occurred, can be listed with:
|
|   where PID
|
| The output of the where command tends
| to be very useful for the processes listed in the output of
| the show commands.
|
| To obtain meaningful backtraces for threaded processes,
| use thread thread-id first, to switch to
| the correct thread, and then get a backtrace
| with where.

Does this version look ok to you?
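
To make the sequence concrete, a session at the debugger prompt could look roughly like the sketch below. It only strings together the commands already named in this thread (with `show allpcpu' spelled as in your patch). The `db>' prompt is how DDB normally presents itself, `<pid>' and `<thread-id>' are placeholders that have to be filled in from the output of the earlier commands, and no output is shown because it differs on every system. Lines starting with `#' are annotations, not something to type:

```text
# 1. List all processes and look for the ones that seem stuck.
db> ps

# 2. Per-CPU state, held locks and locked vnodes.
db> show allpcpu
db> show alllocks
db> show lockedvnods

# 3. Backtrace of each suspect process (<pid> is a placeholder).
db> where <pid>

# 4. For threaded processes, switch to the thread first
#    (<thread-id> is a placeholder), then take the backtrace.
db> thread <thread-id>
db> where
```

Capturing all of this over a serial console makes it much easier to paste the output into a bug report.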
I can handle the merging of this change with your initial diff/patch.

> + If possible, consider doing further investigation. Receipt
> + below is especially usefull if you suspect deadlock occurs in the
> + VFS layer. Add the options
> + makeoptions DEBUG=-g
> + options INVARIANTS
> + options INVARIANT_SUPPORT
> + options WITNESS
> + options DEBUG_LOCKS
> + options DEBUG_VFS_LOCKS
> + options DIAGNOSTIC
> +
> + to the kernel config. When deadlock occurs, in addition to the
> + output of the ps command, provide information
> + from the show allpcpu, show
> + alllocks and show
> + lockedvnods. More, please provide output of the
> + where pid for each process id mentioned in
> + the output of the show commands.
> +
> +
> + For threaded processes, to obtain meaningful backtraces, use
> + thread thread-id to switch to the thread
> + stack, and do backtrace with where.
> +
>

This part is also nice, but IMHO it would be even nicer if we could
expand it a bit more. How about something like this?

| Deadlocks are pretty nasty bugs, since they are not very
| easy to reproduce. Their occurrence depends on specific
| timing, synchronization, system load and many more factors.
| This makes it hard to reliably reproduce a deadlock bug.
| Since reproducing a bug is sometimes a crucial part of
| gathering all the necessary information, you may have to spend
| some time investigating the deadlock. Naturally, this is not
| always possible for production systems, but if you can
| reproduce the deadlock on a test system which can afford
| staying off-line for extended periods of time, then consider
| staying inside DDB while you are
| investigating the deadlock further.
|
| A serial console can be extremely helpful in collecting
| DDB output.
|
| If it's impossible to set up a serial console
| (i.e.
| because you cannot find or afford a second system to
| configure as a testbed), emulators like
| emulators/qemu,
| emulators/vmware2 or
| emulators/bochs may prove a
| very efficient way of debugging kernel issues, like a
| deadlock.

Part #2 ...

| Apart from the usual kernel options that are useful for
| debugging kernel problems, there are some options that are
| particularly useful and targeted at debugging locking
| problems. These options are:
|
|   options INVARIANTS
|   options INVARIANT_SUPPORT
|   options WITNESS
|   options DEBUG_LOCKS
|   options DEBUG_VFS_LOCKS
|   options DIAGNOSTIC

Any help in expanding these parts (especially the second one) is more
than welcome :-)
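
P.S. As a possible starting point for the second part, the options could be shown together as a fragment to append to an existing kernel configuration file. This is only a sketch: the `makeoptions' line comes from your original patch, the `KDB' and `DDB' lines are my assumption (the debugger has to be compiled in before any of the DDB commands can be used), and the comments are my own summary rather than text from the handbook:

```text
# Build the kernel with debugging symbols (from the original patch).
makeoptions     DEBUG=-g

# Assumption: the kernel debugger itself must be compiled in.
options         KDB
options         DDB

# Options targeted at catching locking problems.
options         INVARIANTS
options         INVARIANT_SUPPORT
options         WITNESS
options         DEBUG_LOCKS
options         DEBUG_VFS_LOCKS
options         DIAGNOSTIC
```

These checks add noticeable overhead, so such a kernel is probably better suited to a test system than to a production machine.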