Date:      Tue, 7 Sep 2010 17:29:17 -0700
From:      Pyun YongHyeon <pyunyh@gmail.com>
To:        "Mahlon E. Smith" <mahlon@martini.nu>, Jeremy Chadwick <freebsd@jdc.parodius.com>, freebsd-stable@freebsd.org
Subject:   Re: Network memory allocation failures
Message-ID:  <20100908002917.GO1439@michelle.cdnetworks.com>
In-Reply-To: <20100907233257.GA94092@martini.nu>
References:  <20100907210813.GI49065@martini.nu> <20100907222403.GA18595@icarus.home.lan> <20100907233257.GA94092@martini.nu>

On Tue, Sep 07, 2010 at 04:32:57PM -0700, Mahlon E. Smith wrote:
> On Tue, Sep 07, 2010, Jeremy Chadwick wrote:
> > 
> > This could be a bce(4) bug, meaning the "failed to allocate memory"
> > message could be indicating DMA failure or something else from the card,
> > and not necessarily related to mbufs.
> > 
> > There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
> > that aren't in 8.1-RELEASE, but I don't know if those are responsible
> > for your problem.
> 
> Hmm, well -- I'm definitely not opposed to jumping to -STABLE if it
> might fix it.
> 
> 
> > Please provide output from the following:
> > 
> > * uname -a        (if desired, XXX out hostname)
> 
> FreeBSD jessage 8.1-RELEASE FreeBSD 8.1-RELEASE #2: Fri Aug 20 14:30:31 PDT 2010 root@jessage:/usr/src/sys/amd64/compile/R810  amd64
> 
> Custom kernel, with additions to GENERIC (nothing removed):
> 
>     device carp
>     device snp
>     options HZ=1000
>     options DEVICE_POLLING

bce(4) does not support polling(4), so you can remove both the HZ
and DEVICE_POLLING options from your kernel configuration. In fact,
there is no reason to use polling(4) at all on intelligent
controllers like bce(4); polling(4) is mainly for dumb controllers
that lack efficient interrupt moderation.
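
As a rough illustration of the cleanup suggested above, here is a
minimal sketch that drops the two options from a kernel configuration
given as text. The helper name strip_polling_options and the UNNEEDED
tuple are hypothetical; only the option names come from the quoted
config.

```python
import re

# Options the reply says are unnecessary with bce(4). The names come from
# the quoted kernel config; this helper itself is a hypothetical sketch.
UNNEEDED = ("HZ", "DEVICE_POLLING")


def strip_polling_options(config_text):
    """Return config_text without 'options HZ=...' and 'options DEVICE_POLLING'."""
    kept = []
    for line in config_text.splitlines():
        # Capture the option name; matching stops before any '=value' part.
        m = re.match(r"\s*options\s+([A-Z0-9_]+)", line)
        if m and m.group(1) in UNNEEDED:
            continue  # skip the options that are not needed
        kept.append(line)
    return "\n".join(kept)
```

Feeding it the additions listed in the quoted config would leave the
device lines and the ALTQ, SC_DISABLE_REBOOT and PANIC_REBOOT_WAIT_TIME
options untouched.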

>     options ALTQ
>     options ALTQ_CBQ
>     options ALTQ_PRIQ
>     options SC_DISABLE_REBOOT
>     options PANIC_REBOOT_WAIT_TIME=5
> 
> ALTQ and friends not actually active on the machine.  I was fighting a
> different battle when running GENERIC, so I can't honestly recall if this
> problem existed then -- I'll make sure it is still happening under
> GENERIC for a baseline, to eliminate any potential weirdness with
> DEVICE_POLLING or the HZ timing.
> 
> 
> > * vmstat -i
> 
>     interrupt                          total       rate
>     irq19: ehci0                     1547103          0
>     irq21: uhci1 uhci3+                   29          0
>     irq23: atapci0                        35          0
>     irq32: mfi0                     68104468         43
>     cpu0: timer                   3093305346       1986
>     irq256: bce0                    46587008         29
>     cpu19: timer                  3103614834       1992
>     cpu1: timer                   3093298527       1986
>     cpu4: timer                   3093297557       1986
>     cpu10: timer                  3089824707       1983
>     cpu12: timer                  3097896788       1989
>     cpu16: timer                  3097897232       1989
>     cpu22: timer                  3103615267       1992
>     cpu2: timer                   3093297601       1986
>     cpu5: timer                   3093298349       1986
>     cpu3: timer                   3093298637       1986
>     cpu6: timer                   3089823402       1983
>     cpu18: timer                  3103614571       1992
>     cpu13: timer                  3097897961       1989
>     cpu20: timer                  3103615299       1992
>     cpu23: timer                  3103614783       1992
>     cpu9: timer                   3089821582       1983
>     cpu17: timer                  3097898138       1989
>     cpu11: timer                  3089821712       1983
>     cpu14: timer                  3097897190       1989
>     cpu7: timer                   3089821360       1983
>     cpu21: timer                  3103615012       1992
>     cpu15: timer                  3097898081       1989
>     cpu8: timer                   3089824487       1983
>     Total                        74424047066      47788
> 
> 
> > * ifconfig -a     (if desired, XXX out IPs and MACs)
> 
>     bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>             options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>             ether 00:25:64:fd:0b:24
>             inet 10.5.2.69 netmask 0xfffffc00 broadcast 10.5.3.255
>             media: Ethernet autoselect (1000baseT <full-duplex>)
>             status: active
>     bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>             options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>             ether 00:25:64:fd:0b:26
>             media: Ethernet autoselect (none)
>             status: no carrier
>     bce2: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>             options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>             ether 00:25:64:fd:0b:28
>             media: Ethernet autoselect (none)
>             status: no carrier
>     bce3: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>             options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>             ether 00:25:64:fd:0b:2a
>             media: Ethernet autoselect (none)
>             status: no carrier
>     lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>             options=3<RXCSUM,TXCSUM>
>             inet6 fe80::1%lo0 prefixlen 64 scopeid 0x5 
>             inet6 ::1 prefixlen 128 
>             inet 127.0.0.1 netmask 0xff000000 
>             nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>     vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>             ether 0a:00:27:00:00:00
> 
> 
> > * netstat -inbd   (if desired, XXX out MACs)
> 
>     Name    Mtu Network       Address              Ipkts Ierrs Idrop     Ibytes    Opkts Oerrs     Obytes  Coll Drop
>     bce0   1500 <Link#1>      00:25:64:fd:0b:24 14467627     0     0 6346549588 11846499     0 4646920777     0    0 
>     bce0   1500 10.5.0.0/22   10.5.2.69          1987644     -     -  371635478   415087     -   74168123     -    - 
>     bce1*  1500 <Link#2>      00:25:64:fd:0b:26        0     0     0          0        0     0          0     0    0 
>     bce2*  1500 <Link#3>      00:25:64:fd:0b:28        0     0     0          0        0     0          0     0    0 
>     bce3*  1500 <Link#4>      00:25:64:fd:0b:2a        0     0     0          0        0     0          0     0    0 
>     lo0   16384 <Link#5>                           25561     0     0   47338756    25561     0   47338756     0    0 
>     lo0   16384 fe80:5::1/64  fe80:5::1                0     -     -          0        0     -          0     -    - 
>     lo0   16384 ::1/128       ::1                      0     -     -          0        0     -          0     -    - 
>     lo0   16384 127.0.0.0/8   127.0.0.1            25561     -     -   47338756    25561     -   47338756     -    - 
>     vboxn  1500 <Link#6>      0a:00:27:00:00:00        0     0     0          0        0     0          0     0    0 
> 
> 
> 
> > * pciconf -lvc    (only the bceX entry please)
> 
>     bce0@pci0:1:0:0:        class=0x020000 card=0x02d41028 chip=0x163914e4 rev=0x20 hdr=0x00
>         vendor     = 'Broadcom Corporation'
>         device     = 'NetXtreme II Gigabit Ethernet (BCM5709)'
>         class      = network
>         subclass   = ethernet
>         cap 01[48] = powerspec 3  supports D0 D3  current D0
>         cap 03[50] = VPD
>         cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>         cap 11[a0] = MSI-X supports 9 messages in map 0x10
>         cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x2(x4)
> 
>  
> > Also check dmesg to see if there's any error messages that correlate
> > when the problem occurs.
> 
> All quiet on that front.
> 

Based on your output, I don't see anything abnormal in bce(4).
Why do you think bce(4) is the cause of the problem?
If the controller saw some kind of memory-related failure, the
output of "sysctl dev.bce.0" may show it in the more detailed MAC
statistics.
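
To narrow down that output, a small filter can help. The sketch below
is only an assumption about how one might scan it: it keeps nonzero
integer counters whose names contain a keyword such as "alloc" or
"fail". The keyword list is a guess, not a list of actual bce(4)
sysctl names, and the function names are hypothetical.

```python
import subprocess

# Substrings that hint at allocation trouble. These keywords are an
# assumption; they are not taken from the bce(4) driver.
KEYWORDS = ("alloc", "fail", "mbuf", "discard", "error")


def suspicious_counters(sysctl_text, keywords=KEYWORDS):
    """Return {name: value} for nonzero integer counters matching a keyword."""
    hits = {}
    for line in sysctl_text.splitlines():
        # sysctl prints "name: value"; split on the first colon only.
        name, sep, value = line.partition(":")
        value = value.strip()
        if not sep or not value.isdigit():
            continue  # skip descriptions and other non-numeric entries
        if int(value) != 0 and any(k in name.lower() for k in keywords):
            hits[name.strip()] = int(value)
    return hits


def read_bce_sysctl(unit=0):
    """Run 'sysctl dev.bce.<unit>' (FreeBSD only) and return its text output."""
    result = subprocess.run(
        ["sysctl", f"dev.bce.{unit}"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the affected machine, print(suspicious_counters(read_bce_sysctl(0)))
would list the candidates; an empty result only means that no counter
matched the guessed keywords, not that the controller is healthy.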


