From: Daniel Eischen
Date: Sat, 14 Apr 2007 02:42:11 -0400 (EDT)
To: "Marc G. Fournier"
Cc: Dave, freebsd-stable@freebsd.org
Subject: Re: 74 hours till next "No Buffer Space Available" reboot ...

On Sat, 14 Apr 2007, Marc G. Fournier wrote:

> --On Sunday, April 08, 2007 23:04:42 -0400 Dave wrote:
>
>> Hello,
>> This is what i get for catching this late. Can you describe your
>> situation? I've got a server, router actually running 6.1-p6 i believe, and
>> lately it's been doing this stop.
>> I can't be any more specific than that,
>> because that's all i know. The box just goes unresponsive, i can get a login
>> prompt on the console, but it's unresponsive. I have to reboot it. This has
>> occurred twice now and i'm starting to get concerned. I've ruled out ram, i
>> recently replaced it's ram for an unrelated reason so i don't think that's
>> it. If your situation is similar can you let me know what you tried?
>
> This is a different situation, I think ... first, I'm running 6.2-STABLE, as of
> about last week, so a much newer kernel then you are running ... and in my
> case, at least, I can still login to the machine using ssh and force a reboot
> remotely ... it doesn't seem to be a 'solid hang' ... if I were to hazard a
> guess as to what it "feels like" ... it feels like the network interface
> "buffer" has filled up, but isn't being released properly ... almost like a
> memory leak, but on the network ... if I leave it long enough, it will
> eventually require a tech to power cycle it, but if I catch it early enough, I
> can still get in to do a reboot ...
>
> But ... that said ... when you say "'get a login prompt on the console, but
> it's unresponse" ... do you mean that you can actually type in a userid, and
> possibly passwd, but after that it just hangs?

I will just add that I have been getting this on an old 4-stable router
box for years. It is on an sf interface, and I _thought_ it was due to a
flaky hub. I got the "sendto: No buffer space available" message on the
incoming/outgoing interface of the router that was doing NAT and ipfw for
our internal LANs.

I resorted to writing a cron job that pings the router at the other end
of the sf interface and does an 'ifconfig sf0 down; ifconfig sf0 up'
whenever that router cannot be pinged. Something like this:

  if ping -c 2 remote-router > /dev/null; then
      /usr/bin/true
  else
      /sbin/ifconfig sf0 down
      /bin/sleep 1
      /sbin/ifconfig sf0 up
  fi

This router is running 4.11.
Without the cron job, the network would fail every week or two. I gave up
trying to figure out what the real problem was.

--
DE
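
For anyone wanting to reuse the workaround above, here is a minimal sketch of the same ping-then-bounce idea wrapped in a function. It is not the script actually running on the router: the function name bounce_if_unreachable and the PING_CMD and IFCONFIG_CMD variables are hypothetical, added only so the peer and interface can be passed in and the commands swapped out for a dry run.

```shell
#!/bin/sh
# Sketch of the ping-then-bounce workaround described above.
# PING_CMD and IFCONFIG_CMD are hypothetical knobs (not in the original
# script) that let the commands be replaced, e.g. for a dry run.
PING_CMD=${PING_CMD:-"ping -c 2"}
IFCONFIG_CMD=${IFCONFIG_CMD:-/sbin/ifconfig}

# bounce_if_unreachable PEER IFACE
# Pings PEER; if the ping fails, takes IFACE down, waits a second,
# and brings it back up. Does nothing when PEER answers.
bounce_if_unreachable() {
    peer=$1
    iface=$2

    # PING_CMD is deliberately unquoted so "ping -c 2" splits into words.
    if $PING_CMD "$peer" > /dev/null 2>&1; then
        return 0
    fi

    $IFCONFIG_CMD "$iface" down
    sleep 1
    $IFCONFIG_CMD "$iface" up
}

# Example call, using the names from the message above:
#   bounce_if_unreachable remote-router sf0
```

Setting IFCONFIG_CMD=echo prints what would be done instead of touching the interface, which is handy before putting the call into cron. How often to run it is a judgment call; the message above does not say what interval was used.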