Date:      Fri, 15 Oct 2010 20:45:55 +0100
From:      "Robert N. M. Watson" <rwatson@freebsd.org>
To:        Garrett Cooper <gcooper@FreeBSD.org>
Cc:        FreeBSD Current <current@freebsd.org>, freebsd-net@freebsd.org, Attilio Rao <attilio@freebsd.org>, Sergey Kandaurov <pluknet@freebsd.org>, Jack F Vogel <jfv@freebsd.org>, Ryan Stone <rstone@sandvine.com>, Ryan Stone <rysto32@gmail.com>, Ed Maste <emaste@sandvine.com>
Subject:   Re: [PATCH] Netdump for review and testing -- preliminary version
Message-ID:  <93AB0F13-5995-4AAD-BEFC-A6F1317E3CA6@freebsd.org>
In-Reply-To: <AANLkTi=uwBtd5ce5ctQJZwm%2BxJcNVMQfs9thOUh%2BuYxG@mail.gmail.com>
References:  <AANLkTikA5OUYD1A9pqCqVEZ5qk%2BVECq8x-fnRXnpp0KE@mail.gmail.com> <AANLkTikau6omhWrXVM13zonFEPCxXM%2B8EqJauovDu0OU@mail.gmail.com> <alpine.BSF.2.00.1010090121310.1232@fledge.watson.org> <AANLkTimisSojDg2z_f1_v71evfooVdPQ44eu2Thhrf3O@mail.gmail.com> <C73FFD46-80B0-44F0-9A19-2B047C285134@freebsd.org> <AANLkTimLnRsa4v=A3Ui-1hKiVc5YLwkBND4NOmT4t%2BtB@mail.gmail.com> <15387E38-1E6C-4347-BEA1-61AEE31B5544@freebsd.org> <AANLkTimusir1uCE_uxS0uRQCa4rgm_%2B26duep3%2Bo1XUH@mail.gmail.com> <alpine.BSF.2.00.1010152019450.83418@fledge.watson.org> <AANLkTi=uwBtd5ce5ctQJZwm%2BxJcNVMQfs9thOUh%2BuYxG@mail.gmail.com>


On 15 Oct 2010, at 20:39, Garrett Cooper wrote:

>    But there are already some cases in the ddb area dealing with
> dumping that aren't handled properly today. Take for instance the
> following two scenarios:
> 1. Call doadump twice from the debugger.
> 2. Call doadump, exit the debugger, reenter the debugger, and call
> doadump again.
>    Both of these scenarios hang reliably for me.
>    I'm not saying that we should regress things further, but I'm just
> noting that there is most likely a chunk of edge cases that aren't
> being handled properly when doing dumps that could be handled better /
> fixed.

Right: one of the points I've made to Attilio is that we need to move to
a more principled model as to what sorts of things we allow in various
kernel environments. The early boot is a special environment -- so is
the debugger, but the debugger on panic is not the same as the debugger
when you can continue. Likewise, the crash dumping code is special, but
also not the same as the debugger. Right now, exceptional behaviour to
limit hangs/etc is done inconsistently. We need to develop a set of
principles that tell us what is permitted in what contexts, and then use
that to drive design decisions, normalizing what's there already.

This is not dissimilar to what we do with locking already, BTW: we
define a set of kernel environments (fast interrupt handlers,
non-sleepable threads, sleepable threads holding non-sleepable locks,
etc), and based on those principles prevent significant sources of
instability that might otherwise arise in a complex, concurrent kernel.
We need to apply the same sort of approach to handling kernel debugging
and crashing.
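
As a rough illustration of what "what is permitted in what contexts"
could look like if written down in one place, here is a minimal sketch
in plain C. It is not code from the netdump patch or from the FreeBSD
tree: the context names, the operation classes, the kctx_permits()
helper and every entry in the table are assumptions made up for the
example. The only point is that the policy lives in a single table
rather than in scattered special cases.

```c
#include <stdbool.h>

/*
 * Hypothetical execution contexts, loosely following the ones named in
 * the message. These identifiers are illustrative, not FreeBSD symbols.
 */
enum kctx {
	KCTX_EARLY_BOOT,
	KCTX_DDB_CONTINUABLE,	/* debugger entered, system may resume */
	KCTX_DDB_PANIC,		/* debugger entered after a panic */
	KCTX_DUMPING,		/* crash dump in progress */
	KCTX_COUNT
};

/* Hypothetical operation classes a code path might want to perform. */
#define	KOP_SLEEP	(1u << 0)
#define	KOP_LOCK	(1u << 1)	/* acquire a blocking lock */
#define	KOP_POLLED_IO	(1u << 2)	/* polled device I/O */
#define	KOP_CONTINUE	(1u << 3)	/* resume normal operation */

/*
 * One policy table, consulted everywhere. The entries are placeholders
 * chosen for the example; the real policy is exactly what still needs
 * to be worked out.
 */
static const unsigned int kctx_allowed[KCTX_COUNT] = {
	[KCTX_EARLY_BOOT]	= KOP_LOCK | KOP_POLLED_IO | KOP_CONTINUE,
	[KCTX_DDB_CONTINUABLE]	= KOP_POLLED_IO | KOP_CONTINUE,
	[KCTX_DDB_PANIC]	= KOP_POLLED_IO,
	[KCTX_DUMPING]		= KOP_POLLED_IO,
};

/* Return true only if every requested operation is allowed in ctx. */
bool
kctx_permits(enum kctx ctx, unsigned int ops)
{
	return ((kctx_allowed[ctx] & ops) == ops);
}
```

A caller would ask, for example, kctx_permits(KCTX_DUMPING, KOP_SLEEP)
before taking a path that might sleep, much as the locking rules are
checked against the current thread's environment today.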

BTW, my view is that except in very exceptional cases, it should not be
possible to continue after generating a dump. Dumps often cause disk
controllers to get reset, which may leave outstanding I/O in nasty
situations. Unless the dump device and model are known not to interfere
with operation, we should set state indicating that the system is
non-continuable once a dump has occurred.
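
The "set state" idea can be sketched in a few lines of plain C. Again
this is only an illustration under stated assumptions, not an existing
kernel interface: struct dump_device, its known_nondisruptive field,
note_dump_performed() and debugger_may_continue() are all hypothetical
names invented for the example.

```c
#include <stdbool.h>

/* Hypothetical description of a dump target. */
struct dump_device {
	/* True only if dumping is known not to disturb normal operation. */
	bool known_nondisruptive;
};

/* Hypothetical global state: may the system resume normal operation? */
static bool system_continuable = true;

/*
 * Record that a dump has been written to dev. Unless the device is
 * known to be safe, the system becomes non-continuable, and nothing
 * in this sketch ever sets the flag back to true.
 */
void
note_dump_performed(const struct dump_device *dev)
{
	if (!dev->known_nondisruptive)
		system_continuable = false;
}

/* The debugger's "continue" path would check this before resuming. */
bool
debugger_may_continue(void)
{
	return (system_continuable);
}
```

The flag is deliberately one-way: once a disruptive dump has happened,
no later event makes the system continuable again.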

Robert



