Date: Wed, 27 Jul 2005 14:35:06 -0500
From: Eric Anderson <anderson@centtech.com>
To: Brooks Davis <brooks@one-eyed-alien.net>
Cc: Wilko Bulte <wb@freebie.xs4all.nl>, freebsd-current@freebsd.org, Jung-uk Kim <jkim@freebsd.org>, Mike Jakubik <mikej@rogers.com>
Subject: Re: dhclient taking all cpu
Message-ID: <42E7E1EA.9060209@centtech.com>
In-Reply-To: <20050727191043.GA17885@odin.ac.hmc.edu>
References: <42E58007.9030202@rogers.com> <20050726193324.GA4603@odin.ac.hmc.edu> <20050726200059.GA47478@freebie.xs4all.nl> <200507261853.19211.jkim@FreeBSD.org> <20050726233933.GA13679@odin.ac.hmc.edu> <20050727191043.GA17885@odin.ac.hmc.edu>
Brooks Davis wrote:
> On Tue, Jul 26, 2005 at 04:39:33PM -0700, Brooks Davis wrote:
>
>>On Tue, Jul 26, 2005 at 06:53:17PM -0400, Jung-uk Kim wrote:
>>
>>>On Tuesday 26 July 2005 04:00 pm, Wilko Bulte wrote:
>>>
>>>>On Tue, Jul 26, 2005 at 12:33:24PM -0700, Brooks Davis wrote:
>>>>
>>>>>On Mon, Jul 25, 2005 at 10:39:09PM -0400, Mike Jakubik wrote:
>>>>>
>>>>>>On Mon, July 25, 2005 9:54 pm, Brooks Davis said:
>>>>>>
>>>>>>>>>Probably something wrong with your interface, but you
>>>>>>>>>haven't provided any useful information so who knows. At
>>>>>>>>>the very least, I need to know what interface you are
>>>>>>>>>running on, something about its status, and if both
>>>>>>>>>dhclient processes are running.
>>>>>>>>
>>>>>>>>The interface is xl0 (3Com 3c905C-TX Fast Etherlink XL), and
>>>>>>>>it has worked fine in this machine for as long as I remember.
>>>>>>>>This seems to have happened since a recent cvsup and
>>>>>>>>buildworld from ~6-BETA to 7-CURRENT. I rebooted three
>>>>>>>>times, and the problem occurred roughly a minute after bootup.
>>>>>>>>On the fourth time, however, it seems to be ok so far.
>>>>>>>
>>>>>>>That sounds like a problem with the code that handles the
>>>>>>>link state notifications in the interface driver. The
>>>>>>>notifications are a relatively new feature that we're only now
>>>>>>>starting to use heavily, so there are going to be bumps in the
>>>>>>>road. It would be interesting to know if you see link state
>>>>>>>messages promptly if you plug and unplug the network cable.
>>>>>>
>>>>>>It seems to be back at it again; this time it took longer to
>>>>>>kick in. Here is a "ps auxw|grep dhclient":
>>>>>>
>>>>>>_dhcp  219 93.5  0.2  1484  1136  ??  Rs   8:49PM   5:06.00 dhclient: xl0 (dhclient)
>>>>>>root   193  0.0  0.2  1484  1088  d0- S    8:49PM   0:00.02 dhclient: xl0 [priv] (dhclient)
>>>>>>
>>>>>>top:
>>>>>>
>>>>>>  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>>>>>  219 _dhcp      1 129    0  1484K  1136K RUN      9:33 94.24% dhclient
>>>>>>
>>>>>>Nothing in dmesg about link state changes on xl0. Unplugging
>>>>>>and replugging the network cable results in link state
>>>>>>notification within a couple of seconds.
>>>>>
>>>>>Could you see what happens if you run dhclient in the foreground?
>>>>>Just running "dhclient -d xl0" should do it. I'd like to know
>>>>>what sort of output it's generating.
>>>>
>>>>In my case it is not displaying anything:
>>>>
>>>>chuck#dhclient -d ath0
>>>>DHCPREQUEST on ath0 to 255.255.255.255 port 67
>>>>DHCPACK from 192.168.5.254
>>>>bound to 192.168.5.20 -- renewal in 21600 seconds.
>>>>
>>>><nothing>
>>>>
>>>>I can tell the phenomenon occurs when my laptop fan springs to
>>>>life:
>>>>
>>>>CPU states: 96.5% user, 0.0% nice, 2.7% system, 0.8% interrupt, 0.0% idle
>>>>Mem: 48M Active, 28M Inact, 50M Wired, 680K Cache, 34M Buf, 115M Free
>>>>Swap: 257M Total, 257M Free
>>>>
>>>>  PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
>>>>  719 _dhcp      1 129    0  1384K  1092K RUN      2:14 93.55% dhclient
>>>>  607 root       1  98    0 34584K 21212K select   0:09  1.81% Xorg
>>>>  663 wb         4  20    0 46712K 40224K kserel   0:27  0.00% mozilla-bin
>>>>  503 root       1   8    0  1184K   796K nanslp   0:07  0.00% powerd
>>>>
>>>>Took (best guess) approx 5-10 minutes for the effect to kick in.
>>>
>>>FYI, I have the same issues with bge(4) and ndis(4).
>>
>>I've seen it on ath and em interfaces now, but am not sure what's going
>>on, and have no idea how to reproduce the problem. As also reported by
>>Bakul Shah, we seem to be getting into a state where receive_packet() is
>>spinning. I'm not seeing an obvious way for this to be possible.
>
> I think I've found it. There was a really odd typo (= instead of +) in
> the code that handles undersized captures on the bpf socket. Please try
> the following patch and see if it solves the problem. I'm testing here,
> but I don't have a reliable way to trigger the bug. The fix is fairly
> obvious so I'll commit it to head shortly.

It's been 20 minutes without any issues. I think that did it.

Thanks!

Eric

-- 
------------------------------------------------------------------------
Eric Anderson        Sr. Systems Administrator        Centaur Technology
Anything that works is better than anything that doesn't.
------------------------------------------------------------------------
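The thread never shows the patch, but Brooks's description (an "=" typed where a "+" was meant, in the code that skips undersized bpf captures) is enough to see how receive_packet() could pin the CPU. The Python sketch below is a hypothetical model, not dhclient's source: the 12-byte record header, wordalign(), count_packets() and make_buffer() are all illustrative assumptions. It walks a buffer of capture records. In buggy mode the skip path leaves the current offset out of the sum, so a truncated record can send the reader straight back to where it started.

```python
import struct

# Hypothetical, simplified record header: three little-endian uint32 fields
# (hdrlen, caplen, datalen). A stand-in for struct bpf_hdr, NOT its real layout.
HDR = struct.Struct("<III")


def wordalign(x: int) -> int:
    """Round x up to a 4-byte boundary.

    Simplified stand-in for BPF_WORDALIGN; the fixed 4-byte alignment is an
    assumption made for this sketch.
    """
    return (x + 3) & ~3


def count_packets(buf: bytes, buggy: bool = False, max_iters: int = 1000) -> int:
    """Count complete packets in buf, skipping undersized (truncated) captures.

    With buggy=True the skip path drops the current offset from the sum,
    modelling an '=' typed where '+' was meant (hypothetical). Returns -1 if
    the walk has not reached the end of buf after max_iters iterations,
    i.e. the loop would otherwise spin forever.
    """
    off, complete = 0, 0
    for _ in range(max_iters):
        if off + HDR.size > len(buf):
            return complete  # no room for another header: done
        hdrlen, caplen, datalen = HDR.unpack_from(buf, off)
        if caplen != datalen:
            # Undersized capture: drop it and advance to the next record.
            if buggy:
                off = wordalign(hdrlen + caplen)  # typo model: offset lost
            else:
                off = wordalign(off + hdrlen + caplen)  # intended
            continue
        complete += 1
        off = wordalign(off + hdrlen + caplen)
    return -1  # never reached the end of the buffer


def make_buffer() -> bytes:
    """Hypothetical test data: two complete packets around one truncated one."""
    buf = bytearray(48)
    HDR.pack_into(buf, 0, 12, 4, 4)    # complete packet; next record at 16
    HDR.pack_into(buf, 16, 12, 4, 60)  # truncated: 4 of 60 bytes captured
    HDR.pack_into(buf, 32, 12, 4, 4)   # complete packet; ends at 48
    return bytes(buf)


buf = make_buffer()
print(count_packets(buf))              # 2
print(count_packets(buf, buggy=True))  # -1
```

With the intended "+", the truncated record at offset 16 is skipped and both complete packets are counted. With the typo, the next offset is recomputed as 16 every time, so without the iteration cap the loop would never finish. Whether the real bug looped in exactly this way depends on buffer contents the sketch does not try to reproduce.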