From: Eric Anderson
Date: Wed, 27 Jul 2005 14:47:21 -0500
To: Brooks Davis
Cc: freebsd-current@freebsd.org
Subject: Re: dhclient taking all cpu
Message-ID: <42E7E4C9.1030600@centtech.com>
In-Reply-To: <20050727193848.GB20112@odin.ac.hmc.edu>

Brooks Davis wrote:
> On Wed, Jul 27, 2005 at 02:35:06PM -0500, Eric Anderson wrote:
>
>>Brooks Davis wrote:
>>
>>>On Tue, Jul 26, 2005 at 04:39:33PM -0700, Brooks Davis wrote:
>>>
>>>>On Tue, Jul 26, 2005 at 06:53:17PM -0400, Jung-uk Kim wrote:
>>>>
>>>>>On Tuesday 26 July 2005 04:00 pm, Wilko Bulte wrote:
>>>>>
>>>>>>On Tue, Jul 26, 2005 at 12:33:24PM -0700, Brooks Davis wrote..
>>>>>>
>>>>>>>On Mon, Jul 25, 2005 at 10:39:09PM -0400, Mike Jakubik wrote:
>>>>>>>
>>>>>>>>On Mon, July 25, 2005 9:54 pm, Brooks Davis said:
>>>>>>>>
>>>>>>>>>>>Probably something wrong with your interface, but you
>>>>>>>>>>>haven't provided any useful information so who knows. At
>>>>>>>>>>>the very least, I need to know what interface you are
>>>>>>>>>>>running on, something about its status, and if both
>>>>>>>>>>>dhclient processes are running.
>>>>>>>>>>
>>>>>>>>>>The interface is xl0 (3Com 3c905C-TX Fast Etherlink XL), and
>>>>>>>>>>it worked in this machine fine for as long as I remember.
>>>>>>>>>>This seems to have happened since a recent cvsup and
>>>>>>>>>>buildworld from ~6-BETA to 7-CURRENT. I rebooted three
>>>>>>>>>>times, and the problem occurred roughly a minute after bootup.
>>>>>>>>>>On the fourth time however, it seems to be ok so far.
>>>>>>>>>
>>>>>>>>>That sounds like a problem with the code that handles the
>>>>>>>>>link state notifications in the interface driver. The
>>>>>>>>>notifications are a relatively new feature that we're only now
>>>>>>>>>starting to use heavily so there are going to be bumps in the
>>>>>>>>>road. It would be interesting to know if you see link state
>>>>>>>>>messages promptly if you plug and unplug the network cable.
>>>>>>>>
>>>>>>>>It seems to be back at it again, this time it took longer to
>>>>>>>>kick in. Here is a "ps auxw|grep dhclient" :
>>>>>>>>
>>>>>>>>_dhcp 219 93.5 0.2 1484 1136 ?? Rs 8:49PM 5:06.00 dhclient: xl0 (dhclient)
>>>>>>>>root 193 0.0 0.2 1484 1088 d0- S 8:49PM 0:00.02 dhclient: xl0 [priv] (dhclient)
>>>>>>>>
>>>>>>>>top:
>>>>>>>>
>>>>>>>>PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
>>>>>>>>219 _dhcp 1 129 0 1484K 1136K RUN 9:33 94.24% dhclient
>>>>>>>>
>>>>>>>>Nothing in dmesg about link state changes on xl0. Unplugging
>>>>>>>>and replugging the network cable results in link state
>>>>>>>>notification within a couple seconds.
>>>>>>>
>>>>>>>Could you see what happens if you run dhclient in the foreground?
>>>>>>>Just running "dhclient -d xl0" should do it. I'd like to know
>>>>>>>what sort of output it's generating.
>>>>>>
>>>>>>In my case it is not displaying anything:
>>>>>>
>>>>>>chuck#dhclient -d ath0
>>>>>>DHCPREQUEST on ath0 to 255.255.255.255 port 67
>>>>>>DHCPACK from 192.168.5.254
>>>>>>bound to 192.168.5.20 -- renewal in 21600 seconds.
>>>>>>
>>>>>>I can tell the phenomenon occurs when my laptop fan springs to
>>>>>>life:
>>>>>>
>>>>>>CPU states: 96.5% user, 0.0% nice, 2.7% system, 0.8% interrupt, 0.0% idle
>>>>>>Mem: 48M Active, 28M Inact, 50M Wired, 680K Cache, 34M Buf, 115M Free
>>>>>>Swap: 257M Total, 257M Free
>>>>>>
>>>>>>PID USERNAME THR PRI NICE SIZE RES STATE TIME WCPU COMMAND
>>>>>>719 _dhcp 1 129 0 1384K 1092K RUN 2:14 93.55% dhclient
>>>>>>607 root 1 98 0 34584K 21212K select 0:09 1.81% Xorg
>>>>>>663 wb 4 20 0 46712K 40224K kserel 0:27 0.00% mozilla-bin
>>>>>>503 root 1 8 0 1184K 796K nanslp 0:07 0.00% powerd
>>>>>>
>>>>>>Took (best guess) approx 5-10 minutes for the effect to kick in.
>>>>>
>>>>>FYI, I have the same issues with bge(4) and ndis(4).
>>>>
>>>>I've seen it on ath and em interfaces now, but am not sure what's going
>>>>on and have no idea how to reproduce the problem. As also reported by
>>>>Bakul Shah, we seem to be getting into a state where receive_packet() is
>>>>spinning. I'm not seeing an obvious way for this to be possible.
>>>
>>>I think I've found it. There was a really odd typo (= instead of +) in
>>>the code that handles undersized captures on the bpf socket. Please try
>>>the following patch and see if it solves the problem. I'm testing here,
>>>but I don't have a reliable way to trigger the bug. The fix is fairly
>>>obvious so I'll commit it to head shortly.
>>
>>It's been 20 minutes without any issues - I think that did it. Thanks!
>
> Great! Thanks for the report.

I give up. Now it's back to its dirty ways. Ran for 22 mins without
issue (with -d option), so I reran without the -d, and it spiked within
a few minutes. I'll now wait until someone else claims it works before
commenting on it, since my computer seems to enjoy making me look bad. :)

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
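
For anyone following the thread: the patch itself is not quoted in this message, so what follows is only a hypothetical C sketch of how a "=" written where "+" (or "+=") was meant can make a buffer-walking loop spin, which is the kind of symptom reported for receive_packet(). The function walk(), its parameters, and the MAX_STEPS cap are invented for illustration and are not taken from dhclient.

```c
#include <stddef.h>

#define MAX_STEPS 1000  /* safety cap so this demo always terminates */

/*
 * Hypothetical illustration, not dhclient code.
 * Walk a buffer of buflen bytes made of fixed-size records of reclen bytes.
 * Returns the number of loop iterations needed to reach the end of the
 * buffer, or -1 if MAX_STEPS was exceeded (a real loop would spin forever).
 */
static int walk(size_t buflen, size_t reclen, int use_typo)
{
    size_t off = 0;
    int steps = 0;

    while (off < buflen) {
        if (++steps > MAX_STEPS)
            return -1;
        if (use_typo)
            off = reclen;   /* typo: offset is reset instead of advanced */
        else
            off += reclen;  /* intended: skip past the current record */
    }
    return steps;
}
```

With a 32-byte buffer and 8-byte records, walk(32, 8, 0) finishes after four iterations, while walk(32, 8, 1) leaves the offset stuck at 8 and only stops because of the cap. A process stuck in a loop like that would sit in the RUN state at near 100% CPU, which matches the top output in the thread, though whether the real bug had exactly this shape is an assumption.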