From: Mike Tancsa
To: "Marc G. Fournier", freebsd-stable@freebsd.org
Date: Mon, 26 Jun 2006 10:42:32 -0400
Subject: Re: force panic of remote server ... possible?
Message-Id: <6.2.3.4.0.20060626102033.02f2fcd8@64.7.153.2>
In-Reply-To: <20060626085321.T1114@ganymede.hub.org>

At 07:55 AM 26/06/2006, Marc G. Fournier wrote:
>For the server that I'm fighting with right now, where Dmitry
>pointed out that it looks like a deadlock issue ... I have
>dumpdev/savecore enabled, is there some way of forcing it to panic
>when I know I actually have the deadlock, so that it will dump a core?
>
>DDB is a difficult option, since a keyboard isn't always attached to
>the server when it boots ...

These are ugly quick hacks, but they might work for you...

If the network still continues to function, you might be able to hack
up a quick script to force a panic. Hack up some kld (e.g. ichwd)
with something like

# diff -u /usr/src/sys/dev/ichwd/ichwd.c.orig /usr/src/sys/dev/ichwd/ichwd.c
--- /usr/src/sys/dev/ichwd/ichwd.c.orig Mon Jun 26 09:50:33 2006
+++ /usr/src/sys/dev/ichwd/ichwd.c      Mon Jun 26 09:51:04 2006
@@ -225,6 +225,7 @@
        device_t ich = NULL;
        device_t dev;

+       panic("I played panicky idiot no 3 on the Poseidon Adventure");
        /* look for an ICH LPC interface bridge */
        for (id = ichwd_devices; id->desc != NULL; ++id)
                if ((ich = pci_find_device(id->vendor, id->device)) != NULL)

Then run a script something like the one below, which loads the
patched module (and so panics the box) once pings to the target stop
being answered. Set target to be an IP that you control and that is
always up. When you think your box has deadlocked, add a firewall
rule on the target machine to block ICMP echos from the problem
machine. You might need to fiddle with max_tries to make it more
aggressive. If the target machine is on the local LAN you can make
it a nice low value like 2 or 3.

Ideally, you would want to make a kld that would instead do the test
for you, or you could perhaps hack up the software watchdog to call a
panic for you. I don't know if that works or not, as I have only used
hardware watchdogs.
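For the firewall rule on the target machine, a minimal sketch could
look like the following. It assumes the target runs ipfw; the helper
name block_echo_rule, the rule number 100 and the address 192.0.2.10
are placeholders and not part of the original setup. The function only
prints the rule so it can be reviewed before being run as root.

```shell
#!/bin/sh
# Hypothetical helper: print an ipfw rule that drops ICMP echo requests
# (ICMP type 8) coming from the problem machine, given as $1.
# Rule number 100 is an arbitrary placeholder.
block_echo_rule() {
    echo "ipfw add 100 deny icmp from $1 to me icmptypes 8"
}

# Print the rule for review; 192.0.2.10 stands in for the problem machine.
block_echo_rule 192.0.2.10
```

Once the panic has been triggered and the dump collected, the rule
would be removed again with "ipfw delete 100".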
#!/bin/sh
timeout=5
no_resp_sleep=10
max_tries=25
normal_sleep=300
con_cnt=0
no_resp=0
target=1.1.1.1

while true; do
        strings /boot/kernel/ichwd.ko > /dev/null  # try and make sure these binaries are cached
        strings /sbin/kldload > /dev/null          # try and make sure these binaries are cached
        if /sbin/ping -c1 -t$timeout $target > /dev/null 2>&1; then
                no_resp=0
        else
                no_resp=$(($no_resp + 1))
        fi
        # too many missed pings in a row: load the patched module to panic
        if [ $no_resp -gt $max_tries ]; then
                /sbin/kldload ichwd
        fi
        if [ $no_resp -gt 0 ]; then
                sleep $no_resp_sleep
        else
                sleep $normal_sleep
                if [ $con_cnt -lt 25 ]; then
                        con_cnt=$(($con_cnt + 1))
                fi
        fi
done &

        ---Mike
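As a rough guide for tuning max_tries, the sketch below estimates an
upper bound on how long the script above takes to load the module
after the ICMP block goes in. The function name panic_delay is made up
for illustration, and the estimate assumes every failed ping takes the
full timeout and that the block lands just as a normal_sleep begins.
The module is loaded on failed ping number max_tries + 1, so there is
one fewer short sleep than there are failed pings.

```shell
#!/bin/sh
# Hypothetical helper: worst-case seconds from blocking ICMP to kldload.
# Arguments: timeout no_resp_sleep max_tries normal_sleep
panic_delay() {
    pd_fails=$(( $3 + 1 ))   # failed pings needed before kldload runs
    echo $(( $4 + pd_fails * $1 + (pd_fails - 1) * $2 ))
}

panic_delay 5 10 25 300   # values from the script: prints 680
panic_delay 5 10 2 300    # max_tries=2 on a local LAN: prints 335
```

Most of the remaining delay with a low max_tries comes from
normal_sleep, so lowering that value as well tightens the bound
further.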