Date:      Mon, 13 Jul 2015 06:58:56 -0500
From:      Karl Denninger <karl@denninger.net>
To:        freebsd-fs@freebsd.org
Subject:   Re: FreeBSD 10.1 Memory Exhaustion
Message-ID:  <55A3A800.5060904@denninger.net>
In-Reply-To: <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>
References:  <CAB2_NwCngPqFH4q-YZk00RO_aVF9JraeSsVX3xS0z5EV3YGa1Q@mail.gmail.com>

Put this on your box and see if the problem goes away.... :-)

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=187594

The 2015-02-10 refactor will apply against 10.1-STABLE and 10.2-PRE (the
latter will show about 10 lines of fuzz in one hunk, but it applies and
works).

I've been unable to provoke misbehavior with this patch in, and I run a
cron job that does auto-snapshotting.  Others have run this patch with
similarly positive results.

On 7/13/2015 06:48, Christopher Forgeron wrote:
> TL;DR Summary: I can run FreeBSD out of memory quite consistently, and it's
> not a TSO/mbuf exhaustion issue. It's quite possible that ZFS is the
> culprit, but shouldn't the pager be able to handle aggressive memory
> requests in a low-memory situation gracefully, without needing custom
> tuning of ZFS / VM?
>
>
> Hello,
>
> I've been dealing with some instability in my 10.1-RELEASE and
> 10.1-STABLE (r282701M) machines for the last few months.
>
> These machines are NFS/iSCSI storage machines, running on Dell M610x or
> similar hardware: 96 GiB of memory, 10 GbE network cards, dual Xeon
> processors – fairly beefy stuff.
>
> Initially I thought it was more issues with TSO / jumbo mbufs, as I had
> this problem last year. I had thought that this was properly resolved, but
> setting my MTU to 1500 and turning off TSO did give me a bit more
> stability. Currently all my machines are set this way.
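>
> (For reference, that amounts to something like the following in
> /etc/rc.conf; the interface name and address here are hypothetical:
>
>         ifconfig_ix0="inet 172.16.0.10/24 mtu 1500 -tso"
>
> or, on a live interface: ifconfig ix0 mtu 1500 -tso)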
>
> Crashes usually showed up as a loss of network connectivity, with the
> ctld daemon scrolling messages across the screen at full speed about lost
> connections.
>
> All of this did seem like more network-stack problems, but with each crash
> I'd learn a bit more.
>
> Usually there was nothing of any use in the logfile, but every now and then
> I'd get this:
>
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  3 13:02:04 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  3 13:02:04 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> ---------
> Jun  4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  4 03:03:09 san0 kernel: WARNING: icl_pdu_new: failed to allocate 80 bytes
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): failed to allocate memory
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun  4 03:03:09 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): connection error; dropping connection
> Jun  4 03:03:10 san0 kernel: WARNING: 172.16.0.97 (iqn.1998-01.com.vmware:esx5a-3387a188): waiting for CTL to terminate tasks, 1 remaining
> Jun  4 06:04:27 san0 syslogd: kernel boot file is /boot/kernel/kernel
>
> So knowing that it seemed to be running out of memory, I started leaving
> 'vmstat 5' running on a console, to see what it was displaying during the
> crash.
>
> It was always the same thing:
>
>  0 0 0   1520M  4408M    15   0   0   0    25  19   0   0 21962 1667 91390   0 33 67
>  0 0 0   1520M  4310M     9   0   0   0     2  15   3   0 21527 1385 95165   0 31 69
>  0 0 0   1520M  4254M     7   0   0   0    14  19   0   0 17664 1739 72873   0 18 82
>  0 0 0   1520M  4145M     2   0   0   0     0  19   0   0 23557 1447 96941   0 36 64
>  0 0 0   1520M  4013M     4   0   0   0    14  19   0   0  4288  490 34685   0 72 28
>  0 0 0   1520M  3885M     2   0   0   0     0  19   0   0 11141 1038 69242   0 52 48
>  0 0 0   1520M  3803M    10   0   0   0    14  19   0   0 24102 1834 91050   0 33 67
>  0 0 0   1520M  8192B     2   0   0   0     2  15   1   0 19037 1131 77470   0 45 55
>  0 0 0   1520M  8192B     0  22   0   0     2   0   6   0   146   82   578   0  0 100
>  0 0 0   1520M  8192B     1   0   0   0     0   0   0   0   130   40   510   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0   143   40   501   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0   201   62   660   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0   101   28   404   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0    97   27   398   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0    93   28   377   0  0 100
>  0 0 0   1520M  8192B     0   0   0   0     0   0   0   0    92   27   373   0  0 100
>
>
>  I'd go from a decent amount of free memory to suddenly having none.
> Vmstat would stop outputting, console commands would hang, etc. The whole
> system would be useless.
>
> Looking into this, I came across a similar issue:
>
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=199189
>
> I started increasing vm.v_free_min, and it helped – my crashes went from
> being ~every 6 hours to every few days.
>
> Currently I'm running with vm.v_free_min=1254507 – that's (1254507 * 4 KiB),
> or 4.78 GiB of reserve.  The vmstat above is of a machine with that
> setting, still running down to 8192 B of free memory.
>
> I have two issues here:
>
> 1) I don't think I should ever be able to run the system into the ground on
> memory. Deny me new memory until the pager can free more.
> 2) Setting 'min' doesn't really mean 'min', as the system can obviously go
> below that threshold.
>
>
> I have plenty of local UFS swap (non-ZFS drives).
>
>  Adrian requested that I output a few more diagnostic items, and this is
> what I'm running on a console now, in a loop:
>
>         while :; do
>                 vmstat
>                 netstat -m
>                 vmstat -z
>                 sleep 1
>         done
>
> The output of four crashes is attached here, as it can be a bit long.
> Let me know if that's not a good way to report them. They will each start
> mid-way through a vmstat -z output, as that's as far back as my terminal
> buffer allows.
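>
> (If the terminal scrollback is the limiting factor, the same loop can log
> to a file instead; a sketch, where the log path is just an example:
>
>         while :; do
>                 { date; vmstat; netstat -m; vmstat -z; } >> /var/tmp/memdiag.log
>                 sleep 1
>         done
>
> That keeps the full history on disk rather than in the terminal buffer.)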
>
>
>
> Now, I have a good idea of the conditions that are causing this: ZFS
> snapshots, run by cron, during times of high ZFS writes.
>
> The crashes are all nearly on the hour, as that's when crontab triggers my
> python scripts to make new snapshots and delete old ones.
>
> My average FreeBSD machine has ~30 ZFS datasets, with each pool having
> ~20 TiB used. These all need to be snapshotted on the hour.
>
> By staggering the snapshots by a few minutes, I have been able to reduce
> crashing from every other day to perhaps once a week if I'm lucky – but if
> I start moving a lot of data around, I can cause daily crashes again.
>
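> For clarity, the staggering is just the snapshot job sleeping between
> datasets. A sketch of the idea (dataset names hypothetical; the real work
> is done by my python scripts):
>
>         #!/bin/sh
>         # hourly snapshots taken a couple of minutes apart, not all at once
>         stamp=$(date +%Y%m%d-%H%M)
>         for ds in tank/vol0 tank/vol1 tank/vol2; do
>                 zfs snapshot "${ds}@hourly-${stamp}"
>                 sleep 120       # spread out the memory demand
>         done
>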
> It's looking to be the memory demand of snapshotting lots of ZFS datasets
> at the same time while accepting a lot of write traffic.
>
> Now perhaps the answer is 'don't do that', but I feel that FreeBSD should be
> robust enough to handle this. I don't mind tuning for now to
> reduce/eliminate this, but others shouldn't run into this pain just because
> they heavily load their machines – there must be a way of avoiding this
> condition.
>
> Here are the contents of my /boot/loader.conf and sysctl.conf, to show the
> minimal tuning that makes this problem a little more bearable:
>
> /boot/loader.conf
> vfs.zfs.arc_meta_limit=49656727553
> vfs.zfs.arc_max=91489280512
>
> /etc/sysctl.conf
> vm.v_free_min=1254507
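>
> (vm.v_free_min can also be changed on a live system while experimenting,
> e.g.:
>
>         sysctl vm.v_free_min=1254507
>
> while the vfs.zfs.* values above are loader tunables and need a reboot.)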
>
>
> Any suggestions/help is appreciated.
>
> Thank you.
>
>

-- 
Karl Denninger
karl@denninger.net
/The Market Ticker/
/[S/MIME encrypted email preferred]/
