Date:      Wed, 10 Jul 2013 17:09:10 -0500
From:      Kevin Day <toasty@dragondata.com>
To:        Jordan Hubbard <jkh@mail.turbofuzz.com>
Cc:        hackers@freebsd.org
Subject:   Re: Kernel dumps [was Re: possible changes from Panzura]
Message-ID:  <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>
In-Reply-To: <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com>
References:  <FDEEB55D-823B-4899-8EEC-7F5306D91F5B@elischer.org> <9890DFF1-892A-4DCA-9E33-B70681154F43@mail.turbofuzz.com>


> Those sound useful.  Just out of curiosity, however, since we're on the
> topic of kernel dumps:  Has anyone even looked into the notion of an
> emergency fall-back network stack to enable remote kernel panic (or
> system hang) debugging, the way OS X lets you do?  I can't tell you the
> number of times I've NMI'd a Mac and connected to it remotely in a
> scenario where everything was totally wedged and just a couple of
> minutes in kgdb (or now lldb) quickly showed that everything was waiting
> on a specific lock and the problem became manifestly clear.
>
> The feature also lets you scrape a panic'd machine with automation,
> running some kgdb scripts against it to glean useful information for
> later analysis vs having to have someone schlep the dump image manually
> to triage.  It's going to be damn hard to live without this now, and if
> someone else isn't working on it, that's good to know too!


At a previous employer, we had a system that, on a panic, switched to a
totally separate stack capable of only IP/UDP/TFTP and saved its core
to a server via TFTP. This isn't as nice as full remote debugging, but
it was a whole lot easier to develop. The caveats I remember were:

1) We didn't want to implement ARP, so you had to write the MAC address
of the "dump server" to the kernel via sysctl before crashing.
2) We also didn't want to deal with routing tables, so you had to
manually specify which interface to blast packets out of, also via
sysctl.
3) After a panic we didn't want to rely on interrupt processing
working, so it polled the network interface and blocked whenever it
needed to. Since this was an embedded system, it wasn't a big deal:
only one network driver had to be hacked to support this. It was
basically a flag that meant "disable normal processing and switch to
polled FIFOs for input and output" until reboot.
4) The whole system used only preallocated buffers and its own stack
(carved out of memory at boot), so even if the kernel's malloc was
trashed, we could still dump.
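Items 1, 2 and 4 together amount to building every outgoing frame by hand in memory reserved ahead of time, with no ARP, routing or malloc involved. A minimal sketch in C, assuming a hypothetical `struct dump_cfg` whose fields stand in for the values set via sysctl before the crash; the function and variable names are illustrative, not taken from the system described above. The IPv4 header checksum follows the RFC 1071 algorithm.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETHER_HDR_LEN 14
#define IP_HDR_LEN    20
#define UDP_HDR_LEN    8
#define FRAME_MAX   1514   /* Ethernet frame without FCS */
#define PAYLOAD_MAX (FRAME_MAX - ETHER_HDR_LEN - IP_HDR_LEN - UDP_HDR_LEN)

/* Hypothetical dump settings, filled in before the crash (e.g. via sysctl). */
struct dump_cfg {
    uint8_t  src_mac[6], dst_mac[6];   /* dst_mac replaces an ARP lookup */
    uint32_t src_ip, dst_ip;           /* host byte order */
    uint16_t src_port, dst_port;
};

/* Reserved up front so the panic path never calls malloc. */
static uint8_t dump_frame[FRAME_MAX];

static void put16(uint8_t *p, uint16_t v)
{
    p[0] = (uint8_t)(v >> 8);
    p[1] = (uint8_t)(v & 0xff);
}

static void put32(uint8_t *p, uint32_t v)
{
    put16(p, (uint16_t)(v >> 16));
    put16(p + 2, (uint16_t)(v & 0xffff));
}

/* RFC 1071 Internet checksum over an even-length header. */
static uint16_t inet_cksum(const uint8_t *p, size_t len)
{
    uint32_t sum = 0;

    for (size_t i = 0; i + 1 < len; i += 2)
        sum += (uint32_t)(p[i] << 8 | p[i + 1]);
    while (sum >> 16)                       /* fold carries back in */
        sum = (sum & 0xffff) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Wrap payload in Ethernet + IPv4 + UDP inside dump_frame; returns frame length. */
size_t dump_build_frame(const struct dump_cfg *c, uint16_t ip_id,
                        const uint8_t *payload, size_t len)
{
    uint8_t *eth = dump_frame;
    uint8_t *ip = eth + ETHER_HDR_LEN;
    uint8_t *udp = ip + IP_HDR_LEN;

    assert(len <= PAYLOAD_MAX);
    memcpy(eth, c->dst_mac, 6);
    memcpy(eth + 6, c->src_mac, 6);
    put16(eth + 12, 0x0800);                /* EtherType: IPv4 */

    memset(ip, 0, IP_HDR_LEN);              /* checksum field starts at 0 */
    ip[0] = 0x45;                           /* version 4, IHL 5 words */
    put16(ip + 2, (uint16_t)(IP_HDR_LEN + UDP_HDR_LEN + len));
    put16(ip + 4, ip_id);
    ip[8] = 64;                             /* TTL */
    ip[9] = 17;                             /* protocol: UDP */
    put32(ip + 12, c->src_ip);
    put32(ip + 16, c->dst_ip);
    put16(ip + 10, inet_cksum(ip, IP_HDR_LEN));

    put16(udp, c->src_port);
    put16(udp + 2, c->dst_port);
    put16(udp + 4, (uint16_t)(UDP_HDR_LEN + len));
    put16(udp + 6, 0);                      /* 0 = no UDP checksum (IPv4 only) */
    memcpy(udp + UDP_HDR_LEN, payload, len);
    return ETHER_HDR_LEN + IP_HDR_LEN + UDP_HDR_LEN + len;
}
```

Leaving the UDP checksum at zero keeps the sketch short; filling it in would also require summing the IPv4 pseudo-header.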

I'm not sure this would really scratch your itch, but I believe it
took me no more than a day or two to implement. Parts #1 and #2 would be
pretty easy, but I'm not sure how generically the kernel could support
an emergency network mode that doesn't require interrupts for every
network card out there. Maybe that isn't as important to you as it was
to us.
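One way a kernel might offer interrupt-free operation across many drivers is a small per-driver hook table that the panic path calls directly, spinning instead of sleeping. This is only a sketch: `struct dump_netops`, its hooks, and the `fake_nic` toy driver used to exercise it in user space are hypothetical, not an existing FreeBSD interface, and a TFTP sender would also need a polled receive hook to collect ACKs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical hooks a NIC driver would supply for emergency polled mode. */
struct dump_netops {
    void (*enter_polled)(void *softc);   /* mask interrupts, reset rings */
    int  (*tx)(void *softc, const uint8_t *frame, size_t len); /* 0 = queued */
    int  (*tx_done)(void *softc);        /* nonzero once the frame has left */
};

/* Queue one frame, then busy-wait for completion with a bounded spin. */
int dump_send_polled(const struct dump_netops *ops, void *softc,
                     const uint8_t *frame, size_t len, unsigned max_spins)
{
    if (ops->tx(softc, frame, len) != 0)
        return -1;
    for (unsigned i = 0; i < max_spins; i++) {
        if (ops->tx_done(softc))
            return 0;
    }
    return -1;                           /* give up: the NIC looks wedged */
}

/* Toy driver for user-space testing: each frame completes on the 3rd poll. */
struct fake_nic {
    int polled;
    size_t last_len;
    int polls_left;
};

static void fake_enter(void *sc)
{
    ((struct fake_nic *)sc)->polled = 1;
}

static int fake_tx(void *sc, const uint8_t *frame, size_t len)
{
    struct fake_nic *n = sc;

    (void)frame;
    assert(n->polled);                   /* must enter polled mode first */
    n->last_len = len;
    n->polls_left = 3;
    return 0;
}

static int fake_tx_done(void *sc)
{
    struct fake_nic *n = sc;

    return --n->polls_left <= 0;
}

static const struct dump_netops fake_ops = { fake_enter, fake_tx, fake_tx_done };
```

The panic path would call `enter_polled` once and then push every frame through `dump_send_polled`. Bounding the spin is deliberate: after a panic there may be no working timer, so a wedged NIC should make the dump give up rather than hang the machine.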

The whole exercise is much easier if you don't use TFTP but a custom
protocol that doesn't require the crashing system to receive any
packets. If it can just blast away at some random host, oblivious to
whether it's working or not, it's a lot less code to write.
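A send-only protocol can be as small as a header carrying each chunk's offset and length, so the receiver writes every chunk in place and a lost packet simply leaves a hole in the saved image. A sketch under made-up assumptions: the wire format (`BLAST_MAGIC`, a 16-byte header, 1 KiB chunks, images under 4 GiB) is invented for illustration, the `send` callback stands in for the raw-frame transmit path, and `loopback_send` exists only to try it in user space with optional simulated loss.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLAST_MAGIC 0x44554d50u   /* "DUMP"; made-up value */
#define BLAST_HDR   16            /* magic, offset, length, total size */
#define BLAST_CHUNK 1024

static void put32(uint8_t *p, uint32_t v)
{
    p[0] = (uint8_t)(v >> 24);
    p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);
    p[3] = (uint8_t)v;
}

static uint32_t get32(const uint8_t *p)
{
    return (uint32_t)p[0] << 24 | (uint32_t)p[1] << 16 |
           (uint32_t)p[2] << 8 | (uint32_t)p[3];
}

typedef void (*send_fn)(void *ctx, const uint8_t *pkt, size_t len);

/* Sender: emit every chunk exactly once and never wait for a reply. */
size_t blast_dump(const uint8_t *img, uint32_t size, send_fn send, void *ctx)
{
    static uint8_t pkt[BLAST_HDR + BLAST_CHUNK];   /* no malloc after panic */
    size_t sent = 0;

    for (uint32_t off = 0; off < size; off += BLAST_CHUNK) {
        uint32_t len = size - off < BLAST_CHUNK ? size - off : BLAST_CHUNK;

        put32(pkt, BLAST_MAGIC);
        put32(pkt + 4, off);
        put32(pkt + 8, len);
        put32(pkt + 12, size);
        memcpy(pkt + BLAST_HDR, img + off, len);
        send(ctx, pkt, BLAST_HDR + len);
        sent++;
    }
    return sent;
}

/* Receiver: place each chunk at its offset; a lost packet leaves a hole. */
int blast_receive(uint8_t *out, uint32_t cap, const uint8_t *pkt, size_t len)
{
    if (len < BLAST_HDR || get32(pkt) != BLAST_MAGIC)
        return -1;
    uint32_t off = get32(pkt + 4), clen = get32(pkt + 8);
    if (clen != len - BLAST_HDR || off > cap || clen > cap - off)
        return -1;
    memcpy(out + off, pkt + BLAST_HDR, clen);
    return 0;
}

/* Loopback "network" for user-space testing; can drop every Nth packet. */
struct loopback {
    uint8_t *out;
    uint32_t cap;
    int drop_every;   /* 0 = lossless */
    int count;
};

static void loopback_send(void *ctx, const uint8_t *pkt, size_t len)
{
    struct loopback *lb = ctx;
    int rc;

    if (lb->drop_every != 0 && ++lb->count % lb->drop_every == 0)
        return;                                    /* simulated loss */
    rc = blast_receive(lb->out, lb->cap, pkt, len);
    assert(rc == 0);
    (void)rc;
}
```

Because the sender never learns what arrived, a real design might send each chunk more than once; that trade-off is outside this sketch.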




