Date:      Tue, 17 May 2016 10:27:57 +0200
From:      Fabian Keil <freebsd-listen@fabiankeil.de>
To:        FreeBSD Filesystems <freebsd-fs@freebsd.org>
Subject:   Re: zfs receive stalls whole system
Message-ID:  <20160517102757.135c1468@fabiankeil.de>
In-Reply-To: <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>
References:  <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de>

Rainer Duffner <rainer@ultra-secure.de> wrote:

> I have two servers, that were running FreeBSD 10.1-AMD64 for a long time,
> one zfs-sending to the other (via zxfer). Both are NFS-servers and
> MySQL-slaves, the sender is actively used as NFS-server, the recipient is
> just a warm-standby, in case something serious happens and we don’t want to
> wait for a day until the restore is back in place. The MySQL-Slaves are
> actively used as read-only servers (at the application level, Python’s
> SQL-Alchemy does that, apparently).
>
> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think one
> has 144, the other has 192).
> While they were running 10.1, they used HP P420 RAID-controllers with
> individual 12 RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
> I use zfsnap to do hourly, daily and weekly snapshots.
[...]
> Now, when I do a zxfer, sometimes the whole system stalls while the data
> is sent over, especially if the delta is large or if something else is
> reading from the disk at the same time (backup agent).
>
> I had this before, on 10.0 (I believe, we didn’t have this in 9.1
> either, IIRC) and it went away in 10.1.

Do you use geli for swap device(s)?

> It’s very difficult (well, impossible) to debug, because the system
> totally hangs and doesn’t accept any keypresses.

You could try reducing ZFS's deadman timeout to get a panic.
On systems with local disks I usually use:

vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 10000
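
As a rough sketch of how the values above might be applied: with the deadman enabled, ZFS is expected to panic once an I/O has been outstanding for longer than deadman_synctime_ms, with the check running roughly every deadman_checktime_ms. The fragment below assumes these are boot-time tunables on the FreeBSD version in use, in which case they would go into /boot/loader.conf rather than being set at runtime. That is an assumption, not something stated in this thread, so it is worth checking with `sysctl -d vfs.zfs.deadman_enabled` and a test `sysctl` write on the target system first.

```
# /boot/loader.conf (assumed location; takes effect at the next boot)
# Panic instead of hanging silently when ZFS I/O stalls.
vfs.zfs.deadman_enabled="1"
# How often the deadman check runs, in milliseconds.
vfs.zfs.deadman_checktime_ms="5000"
# How long an I/O may be outstanding before it counts as hung, in milliseconds.
vfs.zfs.deadman_synctime_ms="10000"
```

After a reboot, `sysctl vfs.zfs.deadman_enabled vfs.zfs.deadman_checktime_ms vfs.zfs.deadman_synctime_ms` should print the three values in the same `name: value` form shown above. A 10-second limit is aimed at local disks, as noted; slower or remote storage may need a larger value to avoid spurious panics.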

Fabian



