From: Jeremy Chadwick
To: "Mahlon E. Smith"
Cc: Yong-Hyeon PYUN, freebsd-stable@freebsd.org
Date: Tue, 7 Sep 2010 15:24:03 -0700
Subject: Re: Network memory allocation failures
Message-ID: <20100907222403.GA18595@icarus.home.lan>
In-Reply-To: <20100907210813.GI49065@martini.nu>

On Tue, Sep 07, 2010 at 02:08:13PM -0700, Mahlon E. Smith wrote:
> I picked up a couple of Dell R810 monsters a couple of months ago.  96G
> of RAM, 24 core.  With the aid of this list, got 8.1-RELEASE on there,
> and they are trucking along merrily as VirtualBox hosts.
>
> I'm seeing memory allocation errors when sending data over the network.
> It is random at best, however I can reproduce it pretty reliably.
>
> Sending 100M to a remote machine.  Note the 2nd scp attempt worked.
> Most small files can make it through unmolested.
>
> obb# dd if=/dev/random of=100M-test bs=1M count=100
> 100+0 records in
> 100+0 records out
> 104857600 bytes transferred in 2.881689 secs (36387551 bytes/sec)
> obb# rsync -av 100M-test skin:/tmp/
> sending incremental file list
> 100M-test
> Write failed: Cannot allocate memory
> rsync: writefd_unbuffered failed to write 4 bytes to socket [sender]: Broken pipe (32)
> rsync: connection unexpectedly closed (28 bytes received so far) [sender]
> rsync error: unexplained error (code 255) at io.c(601) [sender=3.0.7]
> obb# scp 100M-test skin:/tmp/
> 100M-test        52%   52MB  52.1MB/s   00:00 ETAWrite failed: Cannot allocate memory
> lost connection
> obb# scp 100M-test skin:/tmp/
> 100M-test       100%  100MB  50.0MB/s   00:02
> obb# scp 100M-test skin:/tmp/
> 100M-test         0%    0     0.0KB/s   --:-- ETAWrite failed: Cannot allocate memory
> lost connection
>
> Fetching a file, however, works.
>
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test       100%  100MB  20.0MB/s   00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test       100%  100MB  20.0MB/s   00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test       100%  100MB  20.0MB/s   00:05
> obb# scp skin:/usr/local/tmp/100M-test .
> 100M-test       100%  100MB  20.0MB/s   00:05
> ...
>
>
> I've ruled out bad hardware (mainly due to the behavior being
> *identical* on the sister machine, in a completely different data
> center.)  It's a broadcom (bce) NIC.

This could be a bce(4) bug.  If so, the "Cannot allocate memory" error
could indicate a DMA failure or some other condition reported by the
card, and would not necessarily have anything to do with mbufs.

RELENG_8 (8.1-STABLE) also contains bce(4) changes and fixes that aren't
in 8.1-RELEASE, but I don't know whether any of them are relevant to
your problem.

Please provide output from the following:

* uname -a       (if desired, XXX out hostname)
* vmstat -i
* ifconfig -a    (if desired, XXX out IPs and MACs)
* netstat -inbd  (if desired, XXX out MACs)
* pciconf -lvc   (only the bceX entry please)

Also check dmesg to see whether any error messages appear at the same
time the problem occurs.

I'm also CC'ing Yong-Hyeon PYUN, who might have some ideas.

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |
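
For anyone gathering the output requested above, the commands could be
wrapped in a small script along these lines.  This is only a sketch
under stated assumptions: the function names (redact, run, collect) are
made up for illustration, the masking covers MAC and IPv4 addresses only
(not IPv6 addresses or the hostname in the uname output), and the
pciconf output still needs trimming by hand to the bceX entry.

```sh
#!/bin/sh
# Hypothetical helpers for collecting the requested diagnostics.
# Nothing here comes from FreeBSD itself apart from the commands
# named in the list above.

# Mask MAC addresses and dotted-quad IPv4 addresses read from stdin.
redact() {
    sed -E \
        -e 's/([0-9A-Fa-f]{2}:){5}[0-9A-Fa-f]{2}/XX:XX:XX:XX:XX:XX/g' \
        -e 's/([0-9]{1,3}\.){3}[0-9]{1,3}/XXX.XXX.XXX.XXX/g'
}

# Print a header, then run the command (if it exists) through redact.
run() {
    echo "=== $* ==="
    if command -v "$1" >/dev/null 2>&1; then
        "$@" 2>&1 | redact
    else
        echo "(command not found: $1)"
    fi
}

# Run each command from the list in the order it was requested.
collect() {
    run uname -a
    run vmstat -i
    run ifconfig -a
    run netstat -inbd
    run pciconf -lvc
}
```

Calling `collect > bce-report.txt` on the affected host would then leave
a single file to review before posting.  dmesg is left out on purpose,
since its messages need to be matched by eye against the time of each
failed transfer.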