Date:      Mon, 4 Mar 2019 10:44:22 +0100
From:      Ole <ole@free.de>
To:        freebsd-questions@freebsd.org
Subject:   Re: ZFS deadlock with virtio (was: ZFS deadlock on parallel ZFS operations FreeBSD 11.2 and 12.0)
Message-ID:  <20190304104422.443a8c20.ole@free.de>
In-Reply-To: <20190219101717.61526ab1.ole@free.de>
References:  <20190215113423.01edabe9.ole@free.de> <20190219101717.61526ab1.ole@free.de>

Hello,

I have done some investigation. I think there are two different
problems, so let's focus on the bhyve VM. I can now reproduce the
behaviour reliably. It seems to be connected to the virtio disks.

The disk stack is:

geli encryption (host)
zpool mirror (host)
zvol (host)
virtio-blk disk (bhyve)
zpool (inside the VM)
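
Roughly, the stack is put together like this (a sketch; device names,
sizes, and the host-side names are illustrative):

# host: mirrored pool on top of two geli providers
geli attach /dev/ada0p3
geli attach /dev/ada1p3
zpool create hostpool mirror ada0p3.eli ada1p3.eli

# host: zvol that is handed to the VM as its data disk
zfs create -V 200G hostpool/vm/jails1-data

# inside the VM: the virtio disk shows up as vtbd1, the pool lives on it
zpool create cryptopool vtbd1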

- Host system is FreeBSD 11.2
- VM is FreeBSD 12.0 (raw VM image + an additional disk for the zpool)
- VM is controlled by vm-bhyve
- Inside the VM there are 5 to 10 running jails (managed with iocage)

If I start the bhyve VM and let the backups run (~10 ZFS operations per
hour), the zpool inside the VM will crash after 1 to 2 days.
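
The backups are incremental sends of the iocage jail datasets, along
these lines (a sketch; the snapshot names and the receiving side are
illustrative):

# inside the VM, per jail dataset
zfs snapshot cryptopool/iocage/jails/<uuid>@2019-03-04
zfs send -e -I @2019-03-03 cryptopool/iocage/jails/<uuid>@2019-03-04 | \
    ssh backuphost zfs receive -u backup/jails/<uuid>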

If I change the disk from virtio-blk to ahci-hd, the VM stays stable.
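
For reference, that change is a single line in the vm-bhyve guest config
(a sketch; the path and the disk index are illustrative):

# vm-bhyve guest config, e.g. /vm/jails1/jails1.conf
# before (zpool inside the VM eventually deadlocks):
disk1_type="virtio-blk"
# after (stable):
disk1_type="ahci-hd"
# the backing zvol (disk1_dev / disk1_name) stays unchanged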

regards
Ole

Tue, 19 Feb 2019 10:17:17 +0100 - Ole <ole@free.de>:

> Hi,
>
> OK, now I have an unkillable ZFS process again. It is only one 'zfs send'
> command. Any idea how to kill this process without powering off the
> machine?
>
> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
> root      17617   0.0  0.0  12944  3856  -  Is   Sat04       0:00.00
> sudo zfs send -e -I
> cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
> root      17618   0.0  0.0  12980  4036  -  D    Sat04       0:00.01
> zfs send -e -I
> cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
> root      19299   0.0  0.0  11320  2588  3  S+   09:53       0:00.00
> grep zfs send
> root@jails1:/usr/home/admin # kill -9 17618
> root@jails1:/usr/home/admin # ps aux | grep 'zfs send'
> root      17617   0.0  0.0  12944  3856  -  Is   Sat04       0:00.00
> sudo zfs send -e -I
> cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-
> root      17618   0.0  0.0  12980  4036  -  D    Sat04       0:00.01
> zfs send -e -I
> cryptopool/iocage/jails/2fe7ae89-760e-423c-8e7f-4f504e0f08bf@2019-02-16
> root      19304   0.0  0.0  11320  2588  3  S+   09:53       0:00.00
> grep zfs send
>
> It is a FreeBSD 12.0 VM image running in a bhyve VM. There is basically
> only py36-iocage installed, and there are 7 running jails.
>
> The VM has 30 GB RAM and the sysctl vfs.zfs.arc_max is set to 20 GB. It
> seems that the whole zpool is in some kind of deadlock. All jails have
> crashed, are unkillable, and I cannot run any command inside them.
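>
> For reference, such a limit is set at boot in /boot/loader.conf, along
> these lines (a minimal sketch; the exact value line is illustrative):
>
> # cap the ZFS ARC at 20 GB (value in bytes)
> vfs.zfs.arc_max="21474836480"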
>
> regards
> Ole
>
>
> Fri, 15 Feb 2019 11:34:23 +0100 - Ole <ole@free.de>:
>
> > Hi,
> >
> > I observed that FreeBSD systems with ZFS will run into a deadlock if
> > there are many parallel zfs send/receive/snapshot processes.
> >
> > I observed this on bare metal and on virtual machines with FreeBSD 11.2
> > and 12.0, with RAM ranging from 20 to 64 GB.
> >
> > If the system itself is also on ZFS, the whole system crashes. With only
> > the jails on ZFS they freeze, but the host system stays stable. But you
> > can't kill -9 the zfs processes; only a poweroff stops the machine.
> >
> > On a FreeBSD 12.0 VM (bhyve) with 30 GB RAM and 5 CPUs, about 30 zfs
> > operations, mostly send and receive, will crash the system.
> >
> > There is no heavy load on the machine:
> >
> > # top | head -8
> > last pid: 91503;  load averages:  0.34,  0.31,  0.29  up 0+22:50:47  11:24:00
> > 536 processes: 1 running, 529 sleeping, 6 zombie
> > CPU:  0.9% user,  0.0% nice,  1.5% system,  0.2% interrupt, 97.4% idle
> > Mem: 165M Active, 872M Inact, 19G Wired, 264M Buf, 9309M Free
> > ARC: 11G Total, 2450M MFU, 7031M MRU, 216M Anon, 174M Header, 1029M Other
> >      8423M Compressed, 15G Uncompressed, 1.88:1 Ratio
> > Swap: 1024M Total, 1024M Free
> >
> > I wonder if this is a bug or normal behaviour. I could live with a
> > limited number of parallel ZFS operations, but I don't want the whole
> > system to crash.
> >
> > Reducing vfs.zfs.arc_max doesn't help.
> >
> > Any idea how to handle this?
> >
> > regards
> > Ole
