From owner-freebsd-stable@FreeBSD.ORG Wed Sep 8 04:38:36 2010
Return-Path:
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
    by hub.freebsd.org (Postfix) with ESMTP id 6E28C10656DC
    for ; Wed, 8 Sep 2010 04:38:36 +0000 (UTC)
    (envelope-from jdc@koitsu.dyndns.org)
Received: from qmta06.emeryville.ca.mail.comcast.net (qmta06.emeryville.ca.mail.comcast.net [76.96.30.56])
    by mx1.freebsd.org (Postfix) with ESMTP id 525448FC08
    for ; Wed, 8 Sep 2010 04:38:35 +0000 (UTC)
Received: from omta09.emeryville.ca.mail.comcast.net ([76.96.30.20])
    by qmta06.emeryville.ca.mail.comcast.net with comcast
    id 3oub1f0090S2fkCA64ebXU; Wed, 08 Sep 2010 04:38:35 +0000
Received: from koitsu.dyndns.org ([98.248.41.155])
    by omta09.emeryville.ca.mail.comcast.net with comcast
    id 44ea1f00b3LrwQ28V4ebgE; Wed, 08 Sep 2010 04:38:35 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
    id 8A71B9B423; Tue, 7 Sep 2010 21:38:34 -0700 (PDT)
Date: Tue, 7 Sep 2010 21:38:34 -0700
From: Jeremy Chadwick
To: Pyun YongHyeon
Message-ID: <20100908043834.GA27124@icarus.home.lan>
References: <20100907210813.GI49065@martini.nu>
    <20100907222403.GA18595@icarus.home.lan>
    <20100907233257.GA94092@martini.nu>
    <20100908002917.GO1439@michelle.cdnetworks.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <20100908002917.GO1439@michelle.cdnetworks.com>
User-Agent: Mutt/1.5.20 (2009-06-14)
Cc: freebsd-stable@freebsd.org
Subject: Re: Network memory allocation failures
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
X-List-Received-Date: Wed, 08 Sep 2010 04:38:36 -0000

On Tue, Sep 07, 2010 at 05:29:17PM -0700, Pyun YongHyeon wrote:
> On Tue, Sep 07, 2010 at 04:32:57PM -0700, Mahlon E. Smith wrote:
> > On Tue, Sep 07, 2010, Jeremy Chadwick wrote:
> > >
> > > This could be a bce(4) bug, meaning the "failed to allocate memory"
> > > message could be indicating DMA failure or something else from the card,
> > > and not necessarily related to mbufs.
> > >
> > > There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
> > > that aren't in 8.1-RELEASE, but I don't know if those are responsible
> > > for your problem.
> >
> > Hmm, well -- I'm definitely not opposed to jumping to -STABLE if it
> > might fix it.
> >
> >
> > > Please provide output from the following:
> > >
> > > * uname -a (if desired, XXX out hostname)
> >
> > FreeBSD jessage 8.1-RELEASE FreeBSD 8.1-RELEASE #2: Fri Aug 20 14:30:31 PDT 2010 root@jessage:/usr/src/sys/amd64/compile/R810 amd64
> >
> > Custom kernel, with additions to GENERIC (nothing removed):
> >
> > device carp
> > device snp
> > options HZ=1000
> > options DEVICE_POLLING
>
> bce(4) does not support polling(4) so you can completely remove
> configuration of HZ and DEVICE_POLLING. In fact, there is no reason
> to use polling(4) at all on intelligent controllers like bce(4).
> polling(4) is mainly for dumb controllers that lack efficient
> interrupt moderation.
>
> > options ALTQ
> > options ALTQ_CBQ
> > options ALTQ_PRIQ
> > options SC_DISABLE_REBOOT
> > options PANIC_REBOOT_WAIT_TIME=5
> >
> > ALTQ and friends not actually active on the machine. I was fighting a
> > different battle when running GENERIC, so I can't honestly recall if this
> > problem existed then -- I'll make sure it is still happening under
> > GENERIC for a baseline, to eliminate any potential weirdness with
> > DEVICE_POLLING or the HZ timing.
> >
> >
> > > * vmstat -i
> >
> > interrupt                   total  rate
> > irq19: ehci0              1547103     0
> > irq21: uhci1 uhci3+            29     0
> > irq23: atapci0                 35     0
> > irq32: mfi0              68104468    43
> > cpu0: timer            3093305346  1986
> > irq256: bce0             46587008    29
> > cpu19: timer           3103614834  1992
> > cpu1: timer            3093298527  1986
> > cpu4: timer            3093297557  1986
> > cpu10: timer           3089824707  1983
> > cpu12: timer           3097896788  1989
> > cpu16: timer           3097897232  1989
> > cpu22: timer           3103615267  1992
> > cpu2: timer            3093297601  1986
> > cpu5: timer            3093298349  1986
> > cpu3: timer            3093298637  1986
> > cpu6: timer            3089823402  1983
> > cpu18: timer           3103614571  1992
> > cpu13: timer           3097897961  1989
> > cpu20: timer           3103615299  1992
> > cpu23: timer           3103614783  1992
> > cpu9: timer            3089821582  1983
> > cpu17: timer           3097898138  1989
> > cpu11: timer           3089821712  1983
> > cpu14: timer           3097897190  1989
> > cpu7: timer            3089821360  1983
> > cpu21: timer           3103615012  1992
> > cpu15: timer           3097898081  1989
> > cpu8: timer            3089824487  1983
> > Total                 74424047066 47788
> >
> >
> > > * ifconfig -a (if desired, XXX out IPs and MACs)
> >
> > bce0: flags=8943 metric 0 mtu 1500
> >         options=c01bb
> >         ether 00:25:64:fd:0b:24
> >         inet 10.5.2.69 netmask 0xfffffc00 broadcast 10.5.3.255
> >         media: Ethernet autoselect (1000baseT )
> >         status: active
> > bce1: flags=8802 metric 0 mtu 1500
> >         options=c01bb
> >         ether 00:25:64:fd:0b:26
> >         media: Ethernet autoselect (none)
> >         status: no carrier
> > bce2: flags=8802 metric 0 mtu 1500
> >         options=c01bb
> >         ether 00:25:64:fd:0b:28
> >         media: Ethernet autoselect (none)
> >         status: no carrier
> > bce3: flags=8802 metric 0 mtu 1500
> >         options=c01bb
> >         ether 00:25:64:fd:0b:2a
> >         media: Ethernet autoselect (none)
> >         status: no carrier
> > lo0: flags=8049 metric 0 mtu 16384
> >         options=3
> >         inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5
> >         inet6 ::1 prefixlen 128
> >         inet 127.0.0.1 netmask 0xff000000
> >         nd6 options=3
> > vboxnet0: flags=8802 metric 0 mtu 1500
> >         ether 0a:00:27:00:00:00
> >
> >
> > > * netstat -inbd (if desired, XXX out MACs)
> >
> > Name    Mtu Network      Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes Coll Drop
> > bce0   1500              00:25:64:fd:0b:24 14467627     0     0 6346549588 11846499     0 4646920777    0    0
> > bce0   1500 10.5.0.0/22  10.5.2.69          1987644     -     -  371635478   415087     -   74168123    -    -
> > bce1*  1500              00:25:64:fd:0b:26        0     0     0          0        0     0          0    0    0
> > bce2*  1500              00:25:64:fd:0b:28        0     0     0          0        0     0          0    0    0
> > bce3*  1500              00:25:64:fd:0b:2a        0     0     0          0        0     0          0    0    0
> > lo0   16384                                   25561     0     0   47338756    25561     0   47338756    0    0
> > lo0   16384 fe80:5::1/64 fe80:5::1                0     -     -          0        0     -          0    -    -
> > lo0   16384 ::1/128      ::1                      0     -     -          0        0     -          0    -    -
> > lo0   16384 127.0.0.0/8  127.0.0.1            25561     -     -   47338756    25561     -   47338756    -    -
> > vboxn  1500              0a:00:27:00:00:00        0     0     0          0        0     0          0    0    0
> >
> >
> >
> > > * pciconf -lvc (only the bceX entry please)
> >
> > bce0@pci0:1:0:0: class=0x020000 card=0x02d41028 chip=0x163914e4 rev=0x20 hdr=0x00
> >     vendor = 'Broadcom Corporation'
> >     device = 'NetXtreme II Gigabit Ethernet (BCM5709)'
> >     class = network
> >     subclass = ethernet
> >     cap 01[48] = powerspec 3 supports D0 D3 current D0
> >     cap 03[50] = VPD
> >     cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
> >     cap 11[a0] = MSI-X supports 9 messages in map 0x10
> >     cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x2(x4)
> >
> >
> > > Also check dmesg to see if there's any error messages that correlate
> > > when the problem occurs.
> >
> > All quiet on that front.
> >
>
> Based on your outputs, I don't see abnormal things in bce(4).
> Why do you think bce(4) is the cause of problem?
> You may see more detailed MAC statistics if controller saw some
> kind of memory related failure from the output of
> "sysctl dev.bce.0".

I figured there might be memory exhaustion of sorts, possibly in the
bce(4) driver itself, that could cause the OP's problem. bce(4) might
not be the problem at all.

But the OP's issue seems to only occur when transmitting data, not
receiving:

http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html

The 2nd-to-last paragraph there is worth noting, specifically how
limiting maximum addressable memory to 32GB via loader.conf seems to
work around the issue.

There were other problems with the systems in question back in July, it
seems. I assume these got hammered out somehow:

http://www.mail-archive.com/freebsd-stable@freebsd.org/msg111408.html

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
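
For readers following the loader.conf workaround mentioned in the message, a memory cap of that kind might look roughly like the fragment below. This is only a sketch: hw.physmem is the loader(8) tunable that limits the physical memory the kernel will use, but the linked post is not quoted here, so whether it used this exact tunable or value is an assumption.

```
# /boot/loader.conf
# Assumption: the 32GB workaround was applied with hw.physmem; the
# referenced post may have used a different line or value.
hw.physmem="32G"
```

The setting takes effect after a reboot, at which point "sysctl hw.physmem" should report the capped value, and "sysctl dev.bce.0" can then be checked for the per-controller statistics suggested in the quoted reply.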