Date:      Tue, 7 Sep 2010 15:24:03 -0700
From:      Jeremy Chadwick <freebsd@jdc.parodius.com>
To:        "Mahlon E. Smith" <mahlon@martini.nu>
Cc:        Yong-Hyeon PYUN <pyunyh@gmail.com>, freebsd-stable@freebsd.org
Subject:   Re: Network memory allocation failures
Message-ID:  <20100907222403.GA18595@icarus.home.lan>
In-Reply-To: <20100907210813.GI49065@martini.nu>
References:  <20100907210813.GI49065@martini.nu>

On Tue, Sep 07, 2010 at 02:08:13PM -0700, Mahlon E. Smith wrote:
> I picked up a couple of Dell R810 monsters a couple of months ago.  96G
> of RAM, 24 core.  With the aid of this list, got 8.1-RELEASE on there,
> and they are trucking along merrily as VirtualBox hosts.
> 
> I'm seeing memory allocation errors when sending data over the network.
> It is random at best, however I can reproduce it pretty reliably.
> 
> Sending 100M to a remote machine.  Note the 2nd scp attempt worked.
> Most small files can make it through unmolested.
> 
>     obb# dd if=/dev/random of=100M-test bs=1M count=100
>     100+0 records in
>     100+0 records out
>     104857600 bytes transferred in 2.881689 secs (36387551 bytes/sec)
>     obb# rsync -av 100M-test skin:/tmp/
>     sending incremental file list
>     100M-test
>     Write failed: Cannot allocate memory
>     rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
>     rsync: connection unexpectedly closed (28 bytes received so far) [sender]
>     rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
>     obb# scp 100M-test skin:/tmp/
>     100M-test        52%   52MB  52.1MB/s   00:00 ETAWrite failed: Cannot allocate memory
>     lost connection
>     obb# scp 100M-test skin:/tmp/
>     100M-test       100%  100MB  50.0MB/s   00:02    
>     obb# scp 100M-test skin:/tmp/
>     100M-test         0%    0     0.0KB/s   --:-- ETAWrite failed: Cannot allocate memory
>     lost connection
> 
> Fetching a file, however, works.
> 
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     obb# scp skin:/usr/local/tmp/100M-test .
>     100M-test    100%  100MB  20.0MB/s   00:05    
>     ...
> 
> 
> I've ruled out bad hardware (mainly due to the behavior being
> *identical* on the sister machine, in a completely different data
> center.) It's a broadcom (bce) NIC.

This could be a bce(4) bug: the "Cannot allocate memory" message may
indicate a DMA failure or some other error from the card, and is not
necessarily related to mbufs.

There are also changes/fixes to bce(4) that are in RELENG_8 (8.1-STABLE)
that aren't in 8.1-RELEASE, but I don't know if those are responsible
for your problem.

Please provide output from the following:

* uname -a        (if desired, XXX out hostname)
* vmstat -i
* ifconfig -a     (if desired, XXX out IPs and MACs)
* netstat -inbd   (if desired, XXX out MACs)
* pciconf -lvc    (only the bceX entry please)
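
If it helps, the commands above can be gathered in one pass with a small
sh script. This is only a sketch: the function names (run_diag,
collect_all) and the report file name are made up for illustration, and
redaction still has to be done by hand before posting.

```shell
#!/bin/sh
# Print a labelled section for one diagnostic command.
# Skips commands that are not installed and keeps going if one fails.
run_diag() {
    echo "== $* =="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 || echo "(command exited with status $?)"
    else
        echo "(not available on this system)"
    fi
    echo
}

# Run the five commands requested in this message, in the same order.
collect_all() {
    run_diag uname -a
    run_diag vmstat -i
    run_diag ifconfig -a
    run_diag netstat -inbd
    run_diag pciconf -lvc
}

# Usage (hypothetical file name):
#   collect_all > bce-report.txt
```

Before sending the report, XXX out the hostname, IPs and MACs as desired,
and trim the pciconf section down to the bceX entry.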

Also check dmesg to see if there are any error messages that coincide
with the problem occurring.
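
One way to spot kernel messages that line up with a failure is to save
dmesg before and after a failed transfer and compare the two. A minimal
sketch, assuming the message buffer has not wrapped between the two
snapshots; new_lines and the /tmp file names are hypothetical:

```shell
#!/bin/sh
# Print the lines that appear in the second file but not in the first.
new_lines() {
    diff "$1" "$2" | sed -n 's/^> //p'
}

# Example session on the affected host (hypothetical file names):
#   dmesg > /tmp/dmesg.before
#   scp 100M-test skin:/tmp/    # repeat until "Cannot allocate memory" appears
#   dmesg > /tmp/dmesg.after
#   new_lines /tmp/dmesg.before /tmp/dmesg.after
```

Empty output only means nothing new was logged during that window; it
does not rule out a driver problem.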

I'm also CC'ing Yong-Hyeon PYUN who might have some ideas.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |



