FreeBSD Mail Archives

Date:      Wed, 6 Jul 2011 00:34:12 -0400
From:      Scott Sipe <cscotts@gmail.com>
To:        Jeremy Chadwick <freebsd@jdc.parodius.com>
Cc:        Peter Ross <Peter.Ross@bogen.in-berlin.de>, Yong-Hyeon Pyun <pyunyh@gmail.com>, freebsd-stable List <freebsd-stable@freebsd.org>, "Vogel, Jack" <jack.vogel@intel.com>, davidch@freebsd.org
Subject:   Re: scp: Write Failed: Cannot allocate memory
Message-ID:  <345CD069-BAC6-4E4E-A963-38C77CC74345@gmail.com>
In-Reply-To: <20110706041504.GA73698@icarus.home.lan>
References:  <20110706122339.61453nlqra1vqsrv@webmail.in-berlin.de> <20110706023234.GA72048@icarus.home.lan> <20110706130753.182053f3ellasn0p@webmail.in-berlin.de> <20110706032425.GA72757@icarus.home.lan> <20110706135412.15276i0fxavg09k4@webmail.in-berlin.de> <20110706041504.GA73698@icarus.home.lan>


On Jul 6, 2011, at 12:15 AM, Jeremy Chadwick wrote:

> On Wed, Jul 06, 2011 at 01:54:12PM +1000, Peter Ross wrote:
>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>
>>> On Wed, Jul 06, 2011 at 01:07:53PM +1000, Peter Ross wrote:
>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>
>>>>> On Wed, Jul 06, 2011 at 12:23:39PM +1000, Peter Ross wrote:
>>>>>> Quoting "Jeremy Chadwick" <freebsd@jdc.parodius.com>:
>>>>>>
>>>>>>> On Tue, Jul 05, 2011 at 01:03:20PM -0400, Scott Sipe wrote:
>>>>>>>> I'm running virtualbox 3.2.12_1 if that has anything to do with it.
>>>>>>>>
>>>>>>>> sysctl vfs.zfs.arc_max: 6200000000
>>>>>>>>
>>>>>>>> While I'm trying to scp, kstat.zfs.misc.arcstats.size is
>>>>>>>> hovering right around that value, sometimes above, sometimes
>>>>>>>> below (that's as it should be, right?). I don't think that it
>>>>>>>> dies when crossing over arc_max. I can run the same scp 10 times
>>>>>>>> and it might fail 1-3 times, with no correlation to the
>>>>>>>> arcstats.size being above/below arc_max that I can see.
>>>>>>>>
>>>>>>>> Scott
>>>>>>>>
>>>>>>>> On Jul 5, 2011, at 3:00 AM, Peter Ross wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> just as an addition: an upgrade to last Friday's
>>>>>>>>> FreeBSD-Stable and to VirtualBox 4.0.8 does not fix the
>>>>>>>>> problem.
>>>>>>>>>
>>>>>>>>> I will experiment a bit more tomorrow after hours and grab
>>>>>>>>> some statistics.
>>>>>>>>>
>>>>>>>>> Regards
>>>>>>>>> Peter
>>>>>>>>>
>>>>>>>>> Quoting "Peter Ross" <Peter.Ross@bogen.in-berlin.de>:
>>>>>>>>>
>>>>>>>>>> Hi all,
>>>>>>>>>>
>>>>>>>>>> I noticed a similar problem last week. It is also very
>>>>>>>>>> similar to one reported last year:
>>>>>>>>>>
>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058708.html
>>>>>>>>>>
>>>>>>>>>> My server is a Dell T410 server with the same bce card (the
>>>>>>>>>> same pciconf -lvc output as described by Mahlon):
>>>>>>>>>>
>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/058711.html
>>>>>>>>>>
>>>>>>>>>> Yours, Scott, is an em(4).
>>>>>>>>>>
>>>>>>>>>> Another similarity: In all cases we are using VirtualBox. I
>>>>>>>>>> just want to mention it, in case it matters. I am still
>>>>>>>>>> running VirtualBox 3.2.
>>>>>>>>>>
>>>>>>>>>> Most of the time kstat.zfs.misc.arcstats.size was reaching
>>>>>>>>>> vfs.zfs.arc_max then, but I could catch one or two cases
>>>>>>>>>> when the value was still below.
>>>>>>>>>>
>>>>>>>>>> I added vfs.zfs.prefetch_disable=1 to sysctl.conf but it
>>>>>>>>>> does not help.
>>>>>>>>>>
>>>>>>>>>> BTW: It looks as if ARC only gives back the memory when I
>>>>>>>>>> destroy the ZFS (a cloned snapshot containing virtual
>>>>>>>>>> machines). Even if nothing happens for hours the buffer
>>>>>>>>>> isn't released.
>>>>>>>>>>
>>>>>>>>>> My machine was still running 8.2-PRERELEASE so I am upgrading.
>>>>>>>>>>
>>>>>>>>>> I am happy to give information gathered on old/new kernel if it helps.
>>>>>>>>>>
>>>>>>>>>> Regards
>>>>>>>>>> Peter
>>>>>>>>>>
>>>>>>>>>> Quoting "Scott Sipe" <cscotts@gmail.com>:
>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On Jul 2, 2011, at 12:54 AM, jhell wrote:
>>>>>>>>>>>
>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:22:32PM -0700, Jeremy Chadwick wrote:
>>>>>>>>>>>>> On Fri, Jul 01, 2011 at 03:13:17PM -0400, Scott Sipe wrote:
>>>>>>>>>>>>>> I'm running 8.2-RELEASE and am having new problems with scp.
>>>>>>>>>>>>>> When scping files to a ZFS directory on the FreeBSD server --
>>>>>>>>>>>>>> most notably large files -- the transfer frequently dies after
>>>>>>>>>>>>>> just a few seconds. In my last test, I tried to scp an 800mb
>>>>>>>>>>>>>> file to the FreeBSD system and the transfer died after 200mb.
>>>>>>>>>>>>>> It completely copied the next 4 times I tried, and then died
>>>>>>>>>>>>>> again on the next attempt.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On the client side:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> "Connection to home closed by remote host.
>>>>>>>>>>>>>> lost connection"
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> In /var/log/auth.log:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Jul  1 14:54:42 freebsd sshd[18955]: fatal: Write failed: Cannot allocate memory
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've never seen this before and have used scp before to
>>>>>>>>>>>>>> transfer large files without problems. This computer has been
>>>>>>>>>>>>>> used in production for months and has a current uptime of 36
>>>>>>>>>>>>>> days. I have not been able to notice any problems copying
>>>>>>>>>>>>>> files to the server via samba or netatalk, or any problems in
>>>>>>>>>>>>>> apache.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Uname:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> FreeBSD xeon 8.2-RELEASE FreeBSD 8.2-RELEASE #0: Sat Feb 19 01:02:54 EST 2011     root@xeon:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I've attached my dmesg and output of vmstat -z.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> I have not restarted the sshd daemon or rebooted the computer.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Am glad to provide any other information or test anything else.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> {snip vmstat -z and dmesg}
>>>>>>>>>>>>>
>>>>>>>>>>>>> You didn't provide details about your networking setup (rc.conf,
>>>>>>>>>>>>> ifconfig -a, etc.).  netstat -m would be useful too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Next, please see this thread circa September 2010, titled "Network
>>>>>>>>>>>>> memory allocation failures":
>>>>>>>>>>>>>
>>>>>>>>>>>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>>>>>>>>>>>
>>>>>>>>>>>>> The user in that thread is using rsync, which relies on
>>>>>>>>>>>>> scp by default.
>>>>>>>>>>>>> I believe this problem is similar, if not identical, to yours.
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Please also provide your output of ( /usr/bin/limits -a )
>>>>>>>>>>>> for the server end and the client.
>>>>>>>>>>>>
>>>>>>>>>>>> I am not quite sure I agree with the need for ifconfig -a but some
>>>>>>>>>>>> information about the networking driver you're using for the interface
>>>>>>>>>>>> would be helpful, uptime of the boxes. And configuration
>>>>>>>>>>>> of the pool.
>>>>>>>>>>>> e.g. ( zpool status -a ;zfs get all <poolname> ) You should probably
>>>>>>>>>>>> prop this information up somewhere so you can reference by
>>>>>>>>>>>> URL whenever needed.
>>>>>>>>>>>>
>>>>>>>>>>>> rsync(1) does not rely on scp(1) whatsoever but rsync(1)
>>>>>>>>>>>> can be made to use ssh(1) instead of rsh(1) and I believe that
>>>>>>>>>>>> is what Jeremy is stating here but correct me if I am wrong.
>>>>>>>>>>>> It does use ssh(1) by default.
>>>>>>>>>>>>
>>>>>>>>>>>> It's a possibility as well that if using tmpfs(5) or mdmfs(8) for /tmp
>>>>>>>>>>>> type filesystems that rsync(1) may be just filling up your
>>>>>>>>>>>> temp ram area and causing the connection abort which would be
>>>>>>>>>>>> expected. ( df -h ) would help here.
>>>>>>>>>>>
>>>>>>>>>>> Hello,
>>>>>>>>>>>
>>>>>>>>>>> I'm not using tmpfs/mdmfs at all. The clients yesterday
>>>>>>>>>>> were 3 different OSX computers (over gigabit). The FreeBSD
>>>>>>>>>>> server has 12gb of ram and no bce adapter. For what it's
>>>>>>>>>>> worth, the server is backed up remotely every night with
>>>>>>>>>>> rsync (remote FreeBSD uses rsync to pull) to an offsite
>>>>>>>>>>> (slow cable connection) FreeBSD computer, and I have not
>>>>>>>>>>> seen any errors in the nightly rsync.
>>>>>>>>>>>
>>>>>>>>>>> Sorry for the omission of networking info, here's the
>>>>>>>>>>> output of the requested commands and some that popped up
>>>>>>>>>>> in the other thread:
>>>>>>>>>>>
>>>>>>>>>>> http://www.cap-press.com/misc/
>>>>>>>>>>>
>>>>>>>>>>> In rc.conf:  ifconfig_em1="inet 10.1.1.1 netmask 255.255.0.0"
>>>>>>>>>>>
>>>>>>>>>>> Scott
>>>>>>>
>>>>>>> Just to make it crystal clear to everyone:
>>>>>>>
>>>>>>> There is no correlation between this problem and use of ZFS.  People are
>>>>>>> attempting to correlate "cannot allocate memory" messages with "anything
>>>>>>> on the system that uses memory".  The VM is much more complex than that.
>>>>>>>
>>>>>>> Given the nature of this problem, it's much more likely the issue is
>>>>>>> "somewhere" within a networking layer within FreeBSD, whether it be
>>>>>>> driver-level or some sort of intermediary layer.
>>>>>>>
>>>>>>> Two people who have this issue in this thread are both using VirtualBox.
>>>>>>> Can one, or both, of you remove VirtualBox from the configuration
>>>>>>> entirely (kernel, etc. -- not sure what is required) and then see if the
>>>>>>> issue goes away?
>>>>>>
>>>>>> On the machine in question I only can do it after hours so I will do
>>>>>> it tonight.
>>>>>>
>>>>>> I was _successfully_ sending the file over the loopback interface using
>>>>>>
>>>>>> cat /zpool/temp/zimbra_oldroot.vdi | ssh localhost "cat > /dev/null"
>>>>>>
>>>>>> I did it, btw, with the IPv6 localhost address first (accidentally),
>>>>>> and then using IPv4. Both worked.
>>>>>>
>>>>>> It always fails if I am sending it through the bce(4) interface,
>>>>>> even if my target is the VirtualBox bridged to the bce card (so it
>>>>>> does not "leave" the computer physically).
>>>>>>
>>>>>> Below the uname -a, ifconfig -a, netstat -rn, pciconf -lv and
>>>>>> kldstat output.
>>>>>>
>>>>>> I have another box where I do not see that problem. It copies files
>>>>>> happily over the net using ssh.
>>>>>>
>>>>>> It is an older HP ML 150 with 3GB RAM only but with a bge(4)
>>>>>> driver instead. It runs the same last week's RELENG_8. I installed
>>>>>> VirtualBox and enabled vboxnet (so it loads the kernel modules). But
>>>>>> I do not run VirtualBox on it (because it hasn't enough RAM).
>>>>>>
>>>>>> Regards
>>>>>> Peter
>>>>>>
>>>>>> DellT410one# uname -a
>>>>>> FreeBSD DellT410one.vv.fda 8.2-STABLE FreeBSD 8.2-STABLE #1: Thu Jun 30 17:07:18 EST 2011 root@DellT410one.vv.fda:/usr/obj/usr/src/sys/GENERIC  amd64
>>>>>> DellT410one# ifconfig -a
>>>>>> bce0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>> 	ether 84:2b:2b:68:64:e4
>>>>>> 	inet 192.168.50.220 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.221 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.223 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.224 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.225 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.226 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.227 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	inet 192.168.50.219 netmask 0xffffff00 broadcast 192.168.50.255
>>>>>> 	media: Ethernet autoselect (1000baseT <full-duplex>)
>>>>>> 	status: active
>>>>>> bce1: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>> 	options=c01bb<RXCSUM,TXCSUM,VLAN_MTU,VLAN_HWTAGGING,JUMBO_MTU,VLAN_HWCSUM,TSO4,VLAN_HWTSO,LINKSTATE>
>>>>>> 	ether 84:2b:2b:68:64:e5
>>>>>> 	media: Ethernet autoselect
>>>>>> lo0: flags=8049<UP,LOOPBACK,RUNNING,MULTICAST> metric 0 mtu 16384
>>>>>> 	options=3<RXCSUM,TXCSUM>
>>>>>> 	inet6 fe80::1%lo0 prefixlen 64 scopeid 0xb
>>>>>> 	inet6 ::1 prefixlen 128
>>>>>> 	inet 127.0.0.1 netmask 0xff000000
>>>>>> 	nd6 options=3<PERFORMNUD,ACCEPT_RTADV>
>>>>>> vboxnet0: flags=8802<BROADCAST,SIMPLEX,MULTICAST> metric 0 mtu 1500
>>>>>> 	ether 0a:00:27:00:00:00
>>>>>> DellT410one# netstat -rn
>>>>>> Routing tables
>>>>>>
>>>>>> Internet:
>>>>>> Destination        Gateway            Flags    Refs      Use  Netif Expire
>>>>>> default            192.168.50.201     UGS         0    52195   bce0
>>>>>> 127.0.0.1          link#11            UH          0        6    lo0
>>>>>> 192.168.50.0/24    link#1             U           0  1118212   bce0
>>>>>> 192.168.50.219     link#1             UHS         0     9670    lo0
>>>>>> 192.168.50.220     link#1             UHS         0     8347    lo0
>>>>>> 192.168.50.221     link#1             UHS         0   103024    lo0
>>>>>> 192.168.50.223     link#1             UHS         0    43614    lo0
>>>>>> 192.168.50.224     link#1             UHS         0     8358    lo0
>>>>>> 192.168.50.225     link#1             UHS         0     8438    lo0
>>>>>> 192.168.50.226     link#1             UHS         0     8338    lo0
>>>>>> 192.168.50.227     link#1             UHS         0     8333    lo0
>>>>>> 192.168.165.0/24   192.168.50.200     UGS         0     3311   bce0
>>>>>> 192.168.166.0/24   192.168.50.200     UGS         0      699   bce0
>>>>>> 192.168.167.0/24   192.168.50.200     UGS         0     3012   bce0
>>>>>> 192.168.168.0/24   192.168.50.200     UGS         0      552   bce0
>>>>>>
>>>>>> Internet6:
>>>>>> Destination                       Gateway                       Flags      Netif Expire
>>>>>> ::1                               ::1                           UH          lo0
>>>>>> fe80::%lo0/64                     link#11                       U           lo0
>>>>>> fe80::1%lo0                       link#11                       UHS         lo0
>>>>>> ff01::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>> ff02::%lo0/32                     fe80::1%lo0                   U           lo0
>>>>>> DellT410one# kldstat
>>>>>> Id Refs Address            Size     Name
>>>>>> 1   19 0xffffffff80100000 dbf5d0   kernel
>>>>>> 2    3 0xffffffff80ec0000 4c358    vboxdrv.ko
>>>>>> 3    1 0xffffffff81012000 131998   zfs.ko
>>>>>> 4    1 0xffffffff81144000 1ff1     opensolaris.ko
>>>>>> 5    2 0xffffffff81146000 2940     vboxnetflt.ko
>>>>>> 6    2 0xffffffff81149000 8e38     netgraph.ko
>>>>>> 7    1 0xffffffff81152000 153c     ng_ether.ko
>>>>>> 8    1 0xffffffff81154000 e70      vboxnetadp.ko
>>>>>> DellT410one# pciconf -lv
>>>>>> ..
>>>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>   vendor     = 'Broadcom Corporation'
>>>>>>   class      = network
>>>>>>   subclass   = ethernet
>>>>>> bce1@pci0:1:0:1:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>>>   vendor     = 'Broadcom Corporation'
>>>>>>   class      = network
>>>>>>   subclass   = ethernet
>>>>>
>>>>> Could you please provide "pciconf -lvcb" output instead, specific to the
>>>>> bce chips?  Thanks.
>>>>
>>>> Here it is:
>>>>
>>>> bce0@pci0:1:0:0:        class=0x020000 card=0x028d1028 chip=0x163b14e4 rev=0x20 hdr=0x00
>>>>   vendor     = 'Broadcom Corporation'
>>>>   class      = network
>>>>   subclass   = ethernet
>>>>   bar   [10] = type Memory, range 64, base 0xda000000, size 33554432, enabled
>>>>   cap 01[48] = powerspec 3  supports D0 D3  current D0
>>>>   cap 03[50] = VPD
>>>>   cap 05[58] = MSI supports 16 messages, 64 bit enabled with 1 message
>>>>   cap 11[a0] = MSI-X supports 9 messages in map 0x10
>>>>   cap 10[ac] = PCI-Express 2 endpoint max data 256(512) link x4(x4)
>>>> ecap 0003[100] = Serial 1 842b2bfffe6864e4
>>>> ecap 0001[110] = AER 1 0 fatal 0 non-fatal 1 corrected
>>>> ecap 0004[150] = unknown 1
>>>> ecap 0002[160] = VC 1 max VC0
>>>
>>> Thanks Peter.
>>>
>>> Adding Yong-Hyeon and David to the discussion, since they've both worked
>>> on the bce(4) driver in recent months (most of the changes made recently
>>> are only in HEAD), and also adding Jack Vogel of Intel who maintains
>>> em(4).  Brief history for the devs:
>>>
>>> The issue is described "Network memory allocation failures" and was
>>> reported last year, but two users recently (Scott and Peter) have
>>> reported the issue again:
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2010-September/thread.html#58708
>>>
>>> And was mentioned again by Scott here, which also contains some
>>> technical details:
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063172.html
>>>
>>> What's interesting is that Scott's issue is identical in form but he's
>>> using em(4), which isn't known to behave like this.  Both individuals
>>> are using VirtualBox, though we're not sure at this point if that is the
>>> piece which is causing the anomaly.
>>>
>>> Relevant details of Scott's system (em-based):
>>>
>>> http://www.cap-press.com/misc/
>>>
>>> Relevant details of Peter's system (bce-based):
>>>
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063221.html
>>> http://lists.freebsd.org/pipermail/freebsd-stable/2011-July/063223.html
>>>
>>> I think the biggest complexity right now is figuring out how/why scp
>>> fails intermittently in this nature.  The errno probably "trickles down"
>>> to userland from the kernel, but the condition regarding why it happens
>>> is unknown.
>>
>> BTW: I also saw 2 of the errors coming from a BIND9 running in a
>> jail on that box.
>>
>> DellT410one# fgrep -i allocate /jails/bind/20110315/var/log/messages
>> Apr 13 05:17:41 bind named[23534]: internal_send: 192.168.50.145#65176: Cannot allocate memory
>> Jun 21 23:30:44 bind named[39864]: internal_send: 192.168.50.251#36155: Cannot allocate memory
>> Jun 24 15:28:00 bind named[39864]: internal_send: 192.168.50.251#28651: Cannot allocate memory
>> Jun 28 12:57:52 bind named[2462]: internal_send: 192.168.165.154#1201: Cannot allocate memory
>>
>> My initial guess: it happens sooner or later somehow - whether it is
>> a lot of traffic in one go (ssh/scp copies of virtual disks) or a
>> lot of traffic over a longer period (a nameserver gets asked again
>> and again).
>
> Scott, are you also using jails?  If both of you are: is there any
> possibility you can remove use of those?  I'm not sure how VirtualBox
> fits into the picture (jails + VirtualBox that is), but I can imagine
> jails having different environmental constraints that might cause this.
>
> Basically the troubleshooting process here is to remove pieces of the
> puzzle until you figure out which piece is causing the issue.  I don't
> want to get the NIC driver devs all spun up for something that, for
> example, might be an issue with the jail implementation.

No jails here.

I do have one bind error message in all my logs:

daemon:Jun 20 10:52:28 xeon named[399]: internal_send: 10.1.2.95#51946: Cannot allocate memory

Grepping my logs for "allocate" turned up a handful of memory allocation
errors with netatalk too.

afpd.log:Jul 01 16:13:04.828835 afpd[18303] {dsi_stream.c:427} (E:DSI): dsi_stream_send: Cannot allocate memory
afpd.log:Jun 23 13:34:01.000987 afpd[17970] {fork.c:980} (E:AFPDaemon): afp_read(final file.pdf): Cannot allocate memory

And a handful from samba:

[2011/07/05 23:43:22.483224,  0] lib/util_sock.c:675(write_data)
  write_data: write failure in writing to client 10.1.1.10. Error Cannot allocate memory
[2011/07/05 23:43:22.493839,  0] smbd/process.c:79(srv_send_smb)
  Error writing 51 bytes to client. -1. (Cannot allocate memory)

I haven't personally seen any errors on the client side with
samba/netatalk (when scp was failing regularly, I transferred the same
files over netatalk and samba without error), nor have I had any
reports of problems. Still, I guess there's a good chance all these log
messages are related.

I've been trying to trigger the scp failure remotely tonight with no
luck. I was triggering it regularly during the work day today, but not
tonight. Tomorrow during the day I will try stopping VirtualBox and
removing its kernel modules to see what happens.
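
To put a number on "fails 1-3 times out of 10" before and after removing
VirtualBox, a repeat-and-count harness could look like the sketch below.
It is only an illustration: count_failures is a hypothetical helper, it
treats any non-zero exit status as a failure, and it cannot tell an
ENOMEM abort from any other scp error, so auth.log still needs checking.

```python
import subprocess


def count_failures(cmd, attempts=10, run=subprocess.run):
    """Run cmd `attempts` times and return how many runs exited non-zero.

    `run` defaults to subprocess.run and is a parameter only so that the
    loop can be exercised without touching the network.
    """
    failures = 0
    for _ in range(attempts):
        result = run(cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
        if result.returncode != 0:
            failures += 1
    return failures
```

Something like count_failures(["scp", "-q", "-B", "bigfile.bin",
"user@server:/some/zfs/dir/"]) would then report how many of ten copies
died; the file name and destination here are placeholders.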

Scott

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?345CD069-BAC6-4E4E-A963-38C77CC74345>
