From: Jonathan Feally
Date: Wed, 18 Oct 2006 08:42:58 -0700
To: freebsd-net@FreeBSD.org, rizzo@icir.org
Subject: Re: Problems with 6.2-PRE and udp applications - dhcpd and named - ipfw stateful issue?

OK, it did it again. named locked up; its wait channel was select. This time I was able to kill the process and restart it, but I still could not do any queries. I added a quick

    ipfw add 1 allow ip from any to any

and that solved the query problem. I then inspected my ipfw rules. All outbound DNS queries use the following rule:

    allow udp from any to any dst-port 53 keep-state

I then tried some of my other keep-state rules, with no luck. It would seem as if the firewall stack simply stops doing stateful filtering after a while. I also tried flushing all the rules and reloading them; that still did not work.

I can live without stateful rules for today, so if anyone can help me troubleshoot this today or tonight, that would be great. I don't want to reboot the machine until somebody can help me diagnose the problem, especially since I'm running what is going to be 6.2-RELEASE.

Looking back at the mailing list, I see there was a change to ipfw.c that deals with dynamic rule timeouts; perhaps this is to blame? I am willing to give ssh access to debug this problem.

-Jon

Robert Watson wrote:
>
> On Fri, 13 Oct 2006, Jonathan Feally wrote:
>
>> I have a P4 2.8 box running on an Intel MB with an em0 acting as a
>> firewall. The em0 has multiple tagged VLANs on it, with no IP assigned
>> to the main interface. Almost like clockwork now, 6-7 days after bootup,
>> named or dhcpd completely locks up. I can't even kill -9 the apps. I
>> have recompiled both apps since upgrading. I have only made two changes
>> to this system around the same time:
>> 1. Removed the 2nd em NIC, which had only one network connected (not
>>    VLAN tagged).
>> 2. Upgraded to 6.2-PRE.
>>
>> Has anyone else had these problems?
>> I am going to try running the system with the internet connection not
>> tagged to see if that helps.
>
>
> I've not seen this on any boxes. The usual debugging path here is to:
>
> (1) Look at the process wait channel in ps axl.
>
> (2) Compile KDB/DDB into the kernel, and do a kernel stack trace of the
> process.
>
> Once you know what the kernel thread associated with the process is
> doing, we can attempt to figure out why it's doing it.
>
> Robert N M Watson
> Computer Laboratory
> University of Cambridge
>
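Jon's suspicion that dynamic-rule expiry is to blame can be made concrete with a toy model. This is a hypothetical Python simulation, not ipfw's code: StateTable, max_states, lifetime, expire_enabled and simulate are invented names that only loosely mirror the real net.inet.ip.fw.dyn_max and net.inet.ip.fw.dyn_udp_lifetime sysctls, and the numbers are arbitrary toy values. It assumes one new DNS query per second from a fresh source port and shows that if expired states are never reclaimed, the table saturates and every new keep-state flow is refused. That would be consistent with queries failing until a plain "allow ip from any to any" rule is added.

```python
from dataclasses import dataclass, field


@dataclass
class StateTable:
    """Toy keep-state table (hypothetical model, NOT ipfw's implementation)."""
    max_states: int              # loosely analogous to net.inet.ip.fw.dyn_max
    lifetime: float              # loosely analogous to net.inet.ip.fw.dyn_udp_lifetime
    expire_enabled: bool = True  # False simulates a broken expiry path
    states: dict = field(default_factory=dict)  # flow tuple -> expiry time

    def _expire(self, now: float) -> None:
        # Drop states whose lifetime has elapsed, unless expiry is "broken".
        if not self.expire_enabled:
            return
        dead = [flow for flow, exp in self.states.items() if exp <= now]
        for flow in dead:
            del self.states[flow]

    def keep_state(self, flow: tuple, now: float) -> bool:
        """Return True if the packet is allowed (state found or created)."""
        self._expire(now)
        if flow in self.states:
            self.states[flow] = now + self.lifetime  # refresh lifetime on match
            return True
        if len(self.states) >= self.max_states:
            return False  # table full: a new keep-state flow cannot be installed
        self.states[flow] = now + self.lifetime
        return True


def simulate(expire_enabled: bool, queries: int = 1000) -> int:
    """Send one DNS query per second from a new source port; count drops."""
    table = StateTable(max_states=100, lifetime=10.0,
                       expire_enabled=expire_enabled)
    drops = 0
    for t in range(queries):
        flow = ("udp", "192.0.2.1", 1024 + t, "198.51.100.53", 53)
        if not table.keep_state(flow, now=float(t)):
            drops += 1
    return drops


print("expiry working:", simulate(True))   # expiry working: 0
print("expiry broken: ", simulate(False))  # expiry broken:  900
```

With working expiry the toy table never holds more than ten entries and drops nothing; with expiry disabled, the first 100 flows fill it and the remaining 900 are dropped. This models only one possible failure mode and says nothing about why named itself hung. On the real box, comparing sysctl net.inet.ip.fw.dyn_count against net.inet.ip.fw.dyn_max, and listing dynamic rules with ipfw -d show, would be ways to check it.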