From: Julian Elischer
Date: Thu, 11 Jul 2013 22:27:52 +0800
To: Kevin Day
Cc: hackers@freebsd.org, Jordan Hubbard
Subject: Re: Kernel dumps [was Re: possible changes from Panzura]
Message-ID: <51DEC0E8.7010305@freebsd.org>
In-Reply-To: <4F0DFAB7-D6D5-4068-A543-C9DF885D1A7D@dragondata.com>

On 7/11/13 6:09 AM, Kevin Day wrote:
>>
>> Those sound useful.
>> Just out of curiosity, however, since we're on the topic of kernel dumps: Has anyone even looked into the notion of an emergency fall-back network stack to enable remote kernel panic (or system hang) debugging, the way OS X lets you do? I can't tell you the number of times I've NMI'd a Mac and connected to it remotely in a scenario where everything was totally wedged and just a couple of minutes in kgdb (or now lldb) quickly showed that everything was waiting on a specific lock and the problem became manifestly clear.
>>
>> The feature also lets you scrape a panic'd machine with automation, running some kgdb scripts against it to glean useful information for later analysis vs having to have someone schlep the dump image manually to triage. It's going to be damn hard to live without this now, and if someone else isn't working on it, that's good to know too!

I could imagine that we could stash away a vimage stack just for this purpose. You'd set it up on boot and leave it detached until you need it. You'd just need to switch the interfaces over to the new stack on panic and put them into 'poll' mode. Or maybe you'd need more (like pre-allocating mbufs for it to use). Just an idea.

>
> At a previous employer, we had a system where on a panic it had a totally separate stack capable of just IP/UDP/TFTP and would save its core via TFTP to a server. This isn’t as nice as full remote debugging, but it was a whole lot easier to develop. The caveats I remember were:
>
> 1) We didn’t want to implement ARP, so you had to write the MAC address of the “dump server” to the kernel via sysctl before crashing.
> 2) We also didn’t want to have to deal with routing tables, so you had to manually specify what interface to blast packets out to, also via sysctl.
> 3) After a panic we didn’t want to rely on interrupt processing working, so it polled the network interface and blocked whenever it needed to.
> Since this was an embedded system, it wasn’t too big of a deal - only one network driver had to be hacked to support this. Basically a flag that would switch to “disable normal processing, switch to polled fifos for input and output” until reboot.
> 4) The whole system used only preallocated buffers and its own stack (carved out from memory on boot) so even if the kernel’s malloc was trashed, we could still dump.
>
> I’m not sure this really would scratch your itch, but I believe this took me no more than a day or two to implement. Parts #1 and #2 would be pretty easy, but I’m not sure how generically the kernel could support an emergency network mode that doesn’t require interrupts for every network card out there. Maybe that isn’t as important to you as it was to us.
>
> The whole exercise is much easier if you don’t use TFTP but a custom protocol that doesn’t require the crashing system to receive any packets. If it can just blast away at some random host, oblivious to whether it’s working or not, it’s a lot less code to write.
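
For what it's worth, the send-only variant described above could be sketched roughly like this. It is only an illustration in plain userland C, not code from the system Kevin describes and not an existing FreeBSD interface: the header layout, DUMP_MAGIC, dump_build_packet(), dump_send_all() and the dump_tx_fn hook (standing in for a polled driver output routine) are all hypothetical names made up for the example. Each packet carries its own offset and the total dump size, so the sender never has to receive anything.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DUMP_MAGIC   0x4b44554dU  /* hypothetical tag, ASCII "KDUM" */
#define DUMP_HDR_LEN 24U          /* magic(4) seq(4) offset(8) total(8) */
#define DUMP_CHUNK   1024U        /* payload bytes per packet (assumed) */

/* Hypothetical polled transmit hook; returns 0 on success. */
typedef int (*dump_tx_fn)(void *arg, const uint8_t *pkt, size_t len);

/* Stands in for the buffer carved out at boot: no malloc after a panic. */
static uint8_t dump_pktbuf[DUMP_HDR_LEN + DUMP_CHUNK];

static_assert(DUMP_HDR_LEN == 4 + 4 + 8 + 8, "header layout mismatch");

/* Store a 32-bit value in network (big-endian) byte order. */
static void
put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Store a 64-bit value in network (big-endian) byte order. */
static void
put_be64(uint8_t *p, uint64_t v)
{
	put_be32(p, (uint32_t)(v >> 32));
	put_be32(p + 4, (uint32_t)v);
}

/*
 * Build one self-describing chunk in buf: header followed by payload.
 * Returns the packet length, or 0 if header plus payload do not fit.
 */
size_t
dump_build_packet(uint8_t *buf, size_t buflen, uint32_t seq, uint64_t offset,
    uint64_t total, const uint8_t *data, size_t len)
{
	if (len > buflen || buflen - len < DUMP_HDR_LEN)
		return (0);
	put_be32(buf, DUMP_MAGIC);
	put_be32(buf + 4, seq);
	put_be64(buf + 8, offset);
	put_be64(buf + 16, total);
	memcpy(buf + DUMP_HDR_LEN, data, len);
	return (DUMP_HDR_LEN + len);
}

/*
 * Push the whole region out through tx without ever reading a reply.
 * Returns the number of packets handed to tx.
 */
uint32_t
dump_send_all(const uint8_t *mem, size_t memlen, dump_tx_fn tx, void *arg)
{
	uint32_t seq = 0;
	size_t off, n, pktlen;

	for (off = 0; off < memlen; off += n) {
		n = memlen - off;
		if (n > DUMP_CHUNK)
			n = DUMP_CHUNK;
		pktlen = dump_build_packet(dump_pktbuf, sizeof(dump_pktbuf),
		    seq, off, memlen, mem + off, n);
		/* Transmit errors are ignored on purpose: no ACKs, no retries. */
		(void)tx(arg, dump_pktbuf, pktlen);
		seq++;
	}
	return (seq);
}
```

With a layout like this, the receiving host only has to check the magic, write each payload at its offset, and compare what arrived against the total to spot holes. The price is that a lost packet stays lost, which is the trade-off against TFTP mentioned above.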