Date:      Mon, 25 Jul 2011 10:07:36 +0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Peter Ross <Peter.Ross@bogen.in-berlin.de>
Cc:        Yong-Hyeon Pyun <pyunyh@gmail.com>, freebsd-stable List <freebsd-stable@freebsd.org>, "Vogel, Jack" <jack.vogel@intel.com>, Scott Sipe <cscotts@gmail.com>, davidch@freebsd.org, Jeremy Chadwick <freebsd@jdc.parodius.com>
Subject:   Re: scp: Write Failed: Cannot allocate memory
Message-ID:  <CAJ-Vmom19KbU0kki0KVTSyzmq-CTKh-j7g%2BmFcRVscb%2B0XPxhQ@mail.gmail.com>
In-Reply-To: <20110711115947.51686v4930s7ze37@webmail.in-berlin.de>
References:  <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de> <20110706023234.GA72048@icarus.home.lan> <20110706130753.182053f3ellasn0p@webmail.in-berlin.de> <20110706032425.GA72757@icarus.home.lan> <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de> <20110706041504.GA73698@icarus.home.lan> <20110706143129.10696235ldx9bjmp@webmail.in-berlin.de> <20110706173242.23404ffbhkxz6mqi@webmail.in-berlin.de> <20110706182141.13056plxp148y61h@webmail.in-berlin.de> <CA%2B30O_O8b8O29rc6BLnnGVTY3cWzpuKQ1q8FTG1idJKM5ykrvA@mail.gmail.com> <20110711115947.51686v4930s7ze37@webmail.in-berlin.de>

Has someone asked for the output of netstat -mb? That error message is
mbuf related, so I bet it's something to do with mbuf allocation.
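
As a rough sketch of what to look for (the helper name mbuf_denied is made up for illustration, and the exact wording of the netstat -m lines may differ between releases), something like this keeps only the "denied" counters, which would be expected to stay at zero while mbuf allocation is healthy:

```shell
# Hypothetical helper: keep only the "denied" lines of `netstat -m` output
# read on stdin. Non-zero counters there hint at mbuf/cluster exhaustion.
mbuf_denied() {
    grep -i 'denied'
}

# Intended use on the affected FreeBSD host, ideally right after a failed scp:
#   netstat -m | mbuf_denied
```

Running it once with VirtualBox stopped and once with a VM active would show whether the counters only move in the failing case.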

Is it possible that the system is incorrectly tuned when VirtualBox is
enabled?
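
One way to check the tuning question, as a minimal sketch only: save the kern.ipc sysctl subtree with VirtualBox stopped and again with a VM running, then compare the two snapshots. The helper name snapshot_diff and the file names are placeholders, not anything from this thread.

```shell
# Hypothetical helper: print only the lines that differ between two saved
# snapshots. "<" marks lines from the first file, ">" lines from the second.
snapshot_diff() {
    # diff exits non-zero when the files differ; that is expected here.
    { diff "$1" "$2" || true; } | grep '^[<>]'
}

# Intended use on the FreeBSD host (file names are placeholders):
#   sysctl kern.ipc > /tmp/ipc.vbox-off    # with VirtualBox stopped
#   sysctl kern.ipc > /tmp/ipc.vbox-on     # with a VM running
#   snapshot_diff /tmp/ipc.vbox-off /tmp/ipc.vbox-on
```

No output would suggest that starting VirtualBox leaves these tunables alone, at least as far as sysctl reports them.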


Adrian

On 11 July 2011 09:59, Peter Ross <Peter.Ross@bogen.in-berlin.de> wrote:
> Quoting "Scott Sipe" <cscotts@gmail.com>:
>
>> On Wed, Jul 6, 2011 at 4:21 AM, Peter Ross
>> <Peter.Ross@bogen.in-berlin.de>wrote:
>>
>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>
>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>>
>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>
>>>>> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>>>>>>
>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>
>>>>>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>>>>>>
>>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>>>
>>>>>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>>>>>>
>>>>>>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>>>>>>
>>>>>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with
>>>>>>>>>>>>> it.
>>>>>>>>>>>>>
>>>>>>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>>>>>>
>>>>>>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>>>>>>> dies when crossing over arc_max. I can run the same scp 10
>>>>>>>>>>>>> times
>>>>>>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>>>>>>> problem.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>>>>>>>
>>>>>>>>>>>>> some statistics.
>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Hi all,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My server is a Dell T410 server with the same bge card (the
>>>>>>>>>>>>>>> same pciconf -lvc output as described by Mahlon:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Yours, Scott, is a em(4)..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>>>>>>> where the value was still below it.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>>>>>>> does not help.
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>>>>>>> isn't released..
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am
>>>>>>>>>>>>>>> upgrading.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I am happy to give information gathered on old/new kernel if
>>>>>>>>>>>>>>> it helps.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Regards
>>>>>>>>>>>>>>> Peter
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe
>>>>>>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems
>>>>>>>>>>>>>>>>>>> with scp. When scping
>>>>>>>>>>>>>>>>>>> files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>>>>>>>> most notably large files
>>>>>>>>>>>>>>>>>>> -- the transfer frequently dies after just a few
>>>>>>>>>>>>>>>>>>> seconds. In my last test, I
>>>>>>>>>>>>>>>>>>> tried to scp an 800mb file to the FreeBSD system and
>>>>>>>>>>>>>>>>>>> the transfer died after
>>>>>>>>>>>>>>>>>>> 200mb. It completely copied the next 4 times I
>>>>>>>>>>>>>>>>>>> tried, and then died again on
>>>>>>>>>>>>>>>>>>> the next attempt.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've never seen this before and have used scp before
>>>>>>>>>>>>>>>>>>> to transfer large files
>>>>>>>>>>>>>>>>>>> without problems. This computer has been used in
>>>>>>>>>>>>>>>>>>> production for months and
>>>>>>>>>>>>>>>>>>> has a current uptime of 36 days. I have not been
>>>>>>>>>>>>>>>>>>> able to notice any problems
>>>>>>>>>>>>>>>>>>> copying files to the server via samba or netatalk, or
>>>>>>>>>>>>>>>>>>> any problems in apache.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST 2011
>>>>>>>>>>>>>>>>>>>     root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the
>>>>>>>>>>>>>>>>>>> computer.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Am glad to provide any other information or test anything
>>>>>>>>>>>>>>>>>>> else.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> You didn't provide details about your networking setup (rc.conf,
>>>>>>>>>>>>>>>>>> ifconfig -a, etc.).  netstat -m would be useful too.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled "Network
>>>>>>>>>>>>>>>>>> memory allocation failures":
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The user in that thread is using rsync, which relies on scp by
>>>>>>>>>>>>>>>>>> default.  I believe this problem is similar, if not identical, to yours.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a ) for the
>>>>>>>>>>>>>>>>> server end and the client.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a, but some
>>>>>>>>>>>>>>>>> information about the networking driver you're using for the
>>>>>>>>>>>>>>>>> interface would be helpful, as would the uptime of the boxes and the
>>>>>>>>>>>>>>>>> configuration of the pool, e.g. ( zpool status -a ;zfs get all
>>>>>>>>>>>>>>>>> <poolname> ). You should probably prop this information up somewhere
>>>>>>>>>>>>>>>>> so you can reference it by URL whenever needed.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever, but rsync(1) can be made
>>>>>>>>>>>>>>>>> to use ssh(1) instead of rsh(1), and I believe that is what Jeremy is
>>>>>>>>>>>>>>>>> stating here, but correct me if I am wrong. It does use ssh(1) by
>>>>>>>>>>>>>>>>> default.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> It's a possibility as well that, if using tmpfs(5) or mdmfs(8) for
>>>>>>>>>>>>>>>>> /tmp type filesystems, rsync(1) may be just filling up your temp RAM
>>>>>>>>>>>>>>>>> area and causing the connection abort, which would be expected.
>>>>>>>>>>>>>>>>> ( df -h ) would help here.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Hello,
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's
>>>>>>>>>>>>>>>> worth, the server is backed up remotely every night with
>>>>>>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>>>>>>>> seen any errors in the nightly rsync.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Sorry for the omission of networking info, here's the
>>>>>>>>>>>>>>>> output of the requested commands and some that popped up
>>>>>>>>>>>>>>>> in the other thread:
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Scott
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>
>>>>>>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>>>>>>
>>>>>>>>>>>> There is no correlation between this problem and use of ZFS.  People
>>>>>>>>>>>> are attempting to correlate "cannot allocate memory" messages with
>>>>>>>>>>>> "anything on the system that uses memory".  The VM is much more
>>>>>>>>>>>> complex than that.
>>>>>>>>>>>>
>>>>>>>>>>>> Given the nature of this problem, it's much more likely the issue is
>>>>>>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it be
>>>>>>>>>>>> driver-level or some sort of intermediary layer.
>>>>>>>>>>>>
>>>>>>>>>>>> Two people who have this issue in this thread are both using
>>>>>>>>>>>> VirtualBox.  Can one, or both, of you remove VirtualBox from the
>>>>>>>>>>>> configuration entirely (kernel, etc. -- not sure what is required)
>>>>>>>>>>>> and then see if the issue goes away?
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On the machine in question I can only do it after hours, so I will do
>>>>>>>>>>> it tonight.
>>>>>>>>>>>
>>>>>>>>>>> I was _successfully_ sending the file over the loopback interface
>>>>>>>>>>> using
>>>>>>>>>>>
>>>>>>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>>>>>>>>
>>>>>>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>>>>>>>> and then using IPv4. Both worked.
>>>>>>>>>>>
>>>>>>>>>>> It always fails if I am sending it through the bce(4) interface, even
>>>>>>>>>>> if my target is the VirtualBox bridged to the bce card (so it does
>>>>>>>>>>> not "leave" the computer physically).
>>>>>>>>>>>
>>>>>>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>>>>>>>> kldstat output.
>>>>>>>>>>>
>>>>>>>>>>> I have another box where I do not see that problem. It copies
>>>>>>>>>>> files
>>>>>>>>>>> happily over the net using ssh.
>>>>>>>>>>>
>>>>>>>>>>> It is an older HP ML 150 with only 3GB RAM but with a bge(4) driver
>>>>>>>>>>> instead. It runs the same last week's RELENG_8. I installed
>>>>>>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules). But
>>>>>>>>>>> I do not run VirtualBox on it (because it hasn't enough RAM).
>>>>>>>>>>>
>>>>>>>>>>> Regards
>>>>>>>>>>> Peter
>>>>>>>>>>>
>>>>>>>>>>> DellT410one# uname -a
>>>>>>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun 30 17:07:18 EST 2011
>>>>>>>>>>>     root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>> DellT410one# ifconfig -a
>>>>>>>>>>> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>>       options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>>       ether 84:2b:2b:68:64:e4
>>>>>>>>>>>       inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>>>>>>>       media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>>>>>>>>       status: active
>>>>>>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>>       options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>>>>>>>       ether 84:2b:2b:68:64:e5
>>>>>>>>>>>       media: Ethernet autoselect
>>>>>>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>>>>>>>>       options=3<RXCSUM,TXCSUM>
>>>>>>>>>>>       inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>>>>>>>       inet6 ::1 prefixlen 128
>>>>>>>>>>>       inet 127.0.0.1 netmask 0xff000000
>>>>>>>>>>>       nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>>>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>>>>>>>       ether 0a:00:27:00:00:00
>>>>>>>>>>> DellT410one# netstat -rn
>>>>>>>>>>> Routing tables
>>>>>>>>>>>
>>>>>>>>>>> Internet:
>>>>>>>>>>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>>>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>>>>>>
>>>>>>>>>>> Internet6:
>>>>>>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>>>>>>> ::1                               ::1                           UH           lo0
>>>>>>>>>>> fe80::%lo0/64                     link#11                       U            lo0
>>>>>>>>>>> fe80::1%lo0                       link#11                       UHS          lo0
>>>>>>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U            lo0
>>>>>>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U            lo0
>>>>>>>>>>> DellT410one# kldstat
>>>>>>>>>>> Id Refs Address            Size     Name
>>>>>>>>>>> 1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>>>>>>> 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>>>>>>> 3    1 0xffffffff81012000 131998   zfs.ko
>>>>>>>>>>> 4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>>>>>>> 5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>>>>>>> 6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>>>>>>> 7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>>>>>>> 8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>>>>>>> DellT410one# pciconf -lv
>>>>>>>>>>> ..
>>>>>>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>>  class      = network
>>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>>> bce1@pci0:1:0:1:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>>>  class      = network
>>>>>>>>>>>  subclass   = ethernet
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Could you please provide "pciconf -lvcb" output instead, specific to
>>>>>>>>>> the bce chips?  Thanks.
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Here it is:
>>>>>>>>>
>>>>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>>>>  vendor     = 'Broadcom Corporation'
>>>>>>>>>  class      = network
>>>>>>>>>  subclass   = ethernet
>>>>>>>>>  bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>>>>>>>>>  cap 01[48] = powerspec 3  supports D0 D3  current D0
>>>>>>>>>  cap 03[50] = VPD
>>>>>>>>>  cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>>>>>>  cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>>>>>>  cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>>>>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>>>>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>>>>>>> ecap 0004[150] = unknown 1
>>>>>>>>> ecap 0002[160] = VC 1 max VC0
>>>>>>>>>
>>>>>>>>
>>>>>>>> Thanks Peter.
>>>>>>>>
>>>>>>>> Adding Yong-Hyeon and David to the discussion, since they've both
>>>>>>>> worked on the bce(4) driver in recent months (most of the changes made
>>>>>>>> recently are only in HEAD), and also adding Jack Vogel of Intel who
>>>>>>>> maintains em(4).  Brief history for the devs:
>>>>>>>>
>>>>>>>> The issue is described as "Network memory allocation failures" and
>>>>>>>> was reported last year, but two users recently (Scott and Peter) have
>>>>>>>> reported the issue again:
>>>>>>>>
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>
>>>>>>>> And was mentioned again by Scott here, which also contains some
>>>>>>>> technical details:
>>>>>>>>
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>>>>>>
>>>>>>>> What's interesting is that Scott's issue is identical in form but
>>>>>>>> he's using em(4), which isn't known to behave like this.  Both
>>>>>>>> individuals are using VirtualBox, though we're not sure at this point
>>>>>>>> if that is the piece which is causing the anomaly.
>>>>>>>>
>>>>>>>> Relevant details of Scott's system (em-based):
>>>>>>>>
>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>
>>>>>>>> Relevant details of Peter's system (bce-based):
>>>>>>>>
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>>>>>>
>>>>>>>> I think the biggest complexity right now is figuring out how/why scp
>>>>>>>> fails intermittently in this nature.  The errno probably "trickles
>>>>>>>> down" to userland from the kernel, but the condition regarding why it
>>>>>>>> happens is unknown.
>>>>>>>>
>>>>>>>
>>>>>>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>>>>>>> jail on that box.
>>>>>>>
>>>>>>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>>>>>>> Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
>>>>>>> Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
>>>>>>> Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
>>>>>>> Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
>>>>>>>
>>>>>>> My initial guess: it happens sooner or later somehow - whether it is
>>>>>>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>>>>>>> lot of traffic over a longer period (a nameserver gets asked again
>>>>>>> and again).
>>>>>>>
>>>>>>
>>>>>> Scott, are you also using jails?  If both of you are: is there any
>>>>>> possibility you can remove use of those?  I'm not sure how VirtualBox
>>>>>> fits into the picture (jails + VirtualBox that is), but I can imagine
>>>>>> jails having different environmental constraints that might cause
>>>>>> this.
>>>>>>
>>>>>> Basically the troubleshooting process here is to remove pieces of the
>>>>>> puzzle until you figure out which piece is causing the issue.  I don't
>>>>>> want to get the NIC driver devs all spun up for something that, for
>>>>>> example, might be an issue with the jail implementation.
>>>>>>
>>>>>
>>>>> I understand this. As said, I do some afterhours debugging tonight.
>>>>>
>>>>> The scp/ssh problems are happening _outside_ the jails. The bind runs
>>>>> _inside_ the jail.
>>>>>
>>>>> I wanted to use the _host_ system to send VirtualBox virtual disks and
>>>>> filesystems used by jails to archive them and/or have them available
>>>>> on other FreeBSD systems (as a cold standby solution).
>>>>>
>>>>
>>>> I just switched off the VirtualBox (without removing the kernel
>>>> modules).
>>>>
>>>> The copy succeeds now.
>>>>
>>>> Well, it could be a VirtualBox related problem, or is the server just
>>>> relieved to have 2GB more memory at hands now?
>>>>
>>>> Do you have a quick idea to "emulate" the 2GB memory load usually
>>>> delivered by VirtualBox?
>>>>
>>>
>>> Well, managed that (using lookbusy)
>>>
>>> Interestingly I could copy a large file (30GB) without problems as soon
>>> as I switched off the VirtualBox. As said, the kernel modules weren't
>>> unloaded, they are still there.
>>>
>>> The copy crashes seconds after I started the VirtualBox. According to
>>> vmstat and top I had more free memory (ca. 1.5GB) than I had without
>>> VirtualBox and lookbusy (ca. 350MB).
>>>
>>> So, it looks (to me, at least) as if I have a VirtualBox related problem,
>>> somehow.
>>>
>>> Any ideas? I am happy to play a bit more to get it sorted although it has
>>> some limits (it is running the company mailserver, after all)
>>>
>>> Regards
>>> Peter
>>>
>>
>> This is it -- I'm seeing the exact same thing.
>>
>> Scp dies reliably with VirtualBox running. Quit VirtualBox and I was able
>> to scp about 30 large files with no errors. Once I started VirtualBox an
>> in-progress scp died within seconds.
>>
>> Ditto that the Kernel modules merely being loaded don't seem to make a
>> difference, it's VirtualBox actually running.
>>
>> virtualbox-ose-3.2.12_1
>
> Hi,
>
> I wonder whether anyone has new ideas.
>
> I am puzzled that it happens when VirtualBoxes are running, while the load
> or unload of the VirtualBox kernel modules doesn't seem to have an effect.
>
> Should I describe the case at the -emulation mailing list to get some ideas
> from the engineers working on VirtualBox?
>
> I do not want to create too much noise so I would like to know your thoughts
> on it first.
>
> I experimented a little bit with the ssh code and know which write(2) in
> /usr/src/crypto/openssh/roaming_common.c (in function roaming_write) returns
> the ENOMEM (an error it should never return, according to the manpage ;-)
>
> but unfortunately I am lost trying to track it further down in the kernel. I
> do not know enough about it, to be frank.
>
> Are there any memory stats inside the kernel that could help?
>
> Thank you for all ideas
> Peter
>
> _______________________________________________
> freebsd-stable@freebsd.org mailing list
> http://lists.freebsd.org/mailman/listinfo/freebsd-stable
> To unsubscribe, send any mail to "freebsd-stable-unsubscribe@freebsd.org"
>


