From: Brooks Davis
To: Brooks Davis
Cc: Wilko Bulte, freebsd-current@FreeBSD.org, Jung-uk Kim, Mike Jakubik
Date: Wed, 27 Jul 2005 12:10:43 -0700
Subject: Re: dhclient taking all cpu
Message-ID: <20050727191043.GA17885@odin.ac.hmc.edu>
In-Reply-To: <20050726233933.GA13679@odin.ac.hmc.edu>
References: <42E58007.9030202@rogers.com> <20050726193324.GA4603@odin.ac.hmc.edu> <20050726200059.GA47478@freebie.xs4all.nl> <200507261853.19211.jkim@FreeBSD.org> <20050726233933.GA13679@odin.ac.hmc.edu>
List-Id: Discussions about the use of FreeBSD-current

On Tue, Jul 26, 2005 at 04:39:33PM -0700, Brooks Davis wrote:
> On Tue, Jul 26, 2005 at 06:53:17PM -0400, Jung-uk Kim wrote:
> > On Tuesday 26 July 2005 04:00 pm, Wilko Bulte wrote:
> > > On Tue, Jul 26, 2005 at 12:33:24PM -0700, Brooks Davis wrote..
> > >
> > > > On Mon, Jul 25, 2005 at 10:39:09PM -0400, Mike Jakubik wrote:
> > > > > On Mon, July 25, 2005 9:54 pm, Brooks Davis said:
> > > > > >>> Probably something wrong with your interface, but you
> > > > > >>> haven't provided any useful information so who knows.  At
> > > > > >>> the very least, I need to know what interface you are
> > > > > >>> running on, something about its status, and if both
> > > > > >>> dhclient processes are running.
> > > > > >>
> > > > > >> The interface is xl0 (3Com 3c905C-TX Fast Etherlink XL), and
> > > > > >> it worked in this machine fine for as long as I remember.
> > > > > >> This seems to have happened since a recent cvsup and
> > > > > >> buildworld from ~6-BETA to 7-CURRENT. I rebooted three
> > > > > >> times, and the problem occurred roughly a minute after bootup.
> > > > > >> On the fourth time however, it seems to be ok so far.
> > > > > >
> > > > > > That sounds like a problem with the code that handles the
> > > > > > link state notifications in the interface driver.  The
> > > > > > notifications are a relatively new feature that we're only now
> > > > > > starting to use heavily so there are going to be bumps in the
> > > > > > road.  It would be interesting to know if you see link state
> > > > > > messages promptly if you plug and unplug the network cable.
> > > > >
> > > > > It seems to be back at it again, this time it took longer to
> > > > > kick in. Here is a "ps auxw|grep dhclient" :
> > > > >
> > > > > _dhcp  219 93.5  0.2  1484  1136  ??  Rs    8:49PM   5:06.00 dhclient: xl0 (dhclient)
> > > > > root   193  0.0  0.2  1484  1088  d0- S     8:49PM   0:00.02 dhclient: xl0 [priv] (dhclient)
> > > > >
> > > > > top:
> > > > >
> > > > >   PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > > > >   219 _dhcp      1 129    0  1484K  1136K RUN      9:33 94.24% dhclient
> > > > >
> > > > > Nothing in dmesg about link state changes on xl0. Unplugging
> > > > > and replugging the network cable results in link state
> > > > > notification within a couple seconds.
> > > >
> > > > Could you see what happens if you run dhclient in the foreground?
> > > > Just running "dhclient -d xl0" should do it.  I'd like to know
> > > > what sort of output it's generating.
> > >
> > > In my case it is not displaying anything:
> > >
> > > chuck#dhclient -d ath0
> > > DHCPREQUEST on ath0 to 255.255.255.255 port 67
> > > DHCPACK from 192.168.5.254
> > > bound to 192.168.5.20 -- renewal in 21600 seconds.
> > >
> > > I can tell the phenomenon occurs when my laptop fan springs to
> > > life:
> > >
> > > CPU states: 96.5% user,  0.0% nice,  2.7% system,  0.8% interrupt,  0.0% idle
> > > Mem: 48M Active, 28M Inact, 50M Wired, 680K Cache, 34M Buf, 115M Free
> > > Swap: 257M Total, 257M Free
> > >
> > >   PID USERNAME THR PRI NICE   SIZE    RES STATE    TIME   WCPU COMMAND
> > >   719 _dhcp      1 129    0  1384K  1092K RUN      2:14 93.55% dhclient
> > >   607 root       1  98    0 34584K 21212K select   0:09  1.81% Xorg
> > >   663 wb         4  20    0 46712K 40224K kserel   0:27  0.00% mozilla-bin
> > >   503 root       1   8    0  1184K   796K nanslp   0:07  0.00% powerd
> > >
> > > Took (best guess) approx 5-10 minutes for the effect to kick in.
> >
> > FYI, I have the same issues with bge(4) and ndis(4).
>
> I've seen it on ath and em interfaces now, but am not sure what's going
> on and have no idea how to reproduce the problem.  As also reported by
> Bakul Shah, we seem to be getting into a state where receive_packet() is
> spinning.
> I'm not seeing an obvious way for this to be possible.

I think I've found it.  There was a really odd typo (= instead of +) in
the code that handles undersized captures on the bpf socket.  Please try
the following patch and see if it solves the problem.  I'm testing here,
but I don't have a reliable way to trigger the bug.  The fix is fairly
obvious so I'll commit it to head shortly.

-- Brooks

==== //depot/user/brooks/cleanup/sbin/dhclient/bpf.c#3 - /usr/home/brooks/working/freebsd/p4/cleanup/sbin/dhclient/bpf.c ====
@@ -316,19 +316,19 @@
 			continue;
 		}
 
+		/* Skip over the BPF header... */
+		interface->rbuf_offset += hdr.bh_hdrlen;
+
 		/*
 		 * If the captured data wasn't the whole packet, or if
 		 * the packet won't fit in the input buffer, all we can
 		 * do is drop it.
 		 */
 		if (hdr.bh_caplen != hdr.bh_datalen) {
-			interface->rbuf_offset += hdr.bh_hdrlen = hdr.bh_caplen;
+			interface->rbuf_offset += hdr.bh_caplen;
 			continue;
 		}
 
-		/* Skip over the BPF header... */
-		interface->rbuf_offset += hdr.bh_hdrlen;
-
 		/* Decode the physical header... */
 		offset = decode_hw_header(interface->rbuf,
 		    interface->rbuf_offset, hfrom);

-- 
Any statement of the form "X is the one, true Y" is FALSE.
PGP fingerprint 655D 519C 26A7 82E7 2529  9BF0 5D8E 8BE9 F238 1AD4
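
For anyone puzzling over why "=" versus "+" matters in the patched line, here is a minimal sketch of the arithmetic only. It is not the dhclient code: struct rec_hdr and both function names are hypothetical stand-ins, and the real loop in bpf.c does more than this. In C, `a += b = c` parses as `a += (b = c)`, so the buggy statement overwrites the header length with the capture length and advances the offset by the capture length alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the three struct bpf_hdr fields the patch touches. */
struct rec_hdr {
	size_t hdrlen;	/* length of the per-packet BPF header */
	size_t caplen;	/* bytes actually captured */
	size_t datalen;	/* original length of the packet */
};

/*
 * Buggy form: "=" assigns caplen to hdrlen, and the value of that
 * assignment (caplen) is what gets added.  The header is never skipped.
 */
static size_t
skip_truncated_buggy(size_t offset, struct rec_hdr hdr)
{
	assert(hdr.caplen != hdr.datalen);	/* only truncated captures */
	offset += hdr.hdrlen = hdr.caplen;
	return (offset);
}

/*
 * Fixed form, mirroring the order in the patch: skip the header first,
 * then skip the captured bytes of the packet being dropped.
 */
static size_t
skip_truncated_fixed(size_t offset, struct rec_hdr hdr)
{
	assert(hdr.caplen != hdr.datalen);	/* only truncated captures */
	offset += hdr.hdrlen;
	offset += hdr.caplen;
	return (offset);
}
```

With an illustrative record of hdrlen 18, caplen 100 and datalen 342 starting at offset 0, the buggy version returns 100 and the fixed version returns 118. The buggy offset falls short of the end of the record by the header length, so the next read no longer starts on a record boundary.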