Date:      Tue, 17 May 2016 11:08:14 +0200
From:      rainer@ultra-secure.de
To:        Fabian Keil <freebsd-listen@fabiankeil.de>
Cc:        FreeBSD Filesystems <freebsd-fs@freebsd.org>, owner-freebsd-fs@freebsd.org
Subject:   Re: zfs receive stalls whole system
Message-ID:  <c090ab7bbff2fffe2a49284f9be70183@ultra-secure.de>
In-Reply-To: <20160517102757.135c1468@fabiankeil.de>
References:  <0C2233A9-C64A-4773-ABA5-C0BCA0D037F0@ultra-secure.de> <20160517102757.135c1468@fabiankeil.de>

On 2016-05-17 10:27, Fabian Keil wrote:
> Rainer Duffner <rainer@ultra-secure.de> wrote:
> 
>> I have two servers that were running FreeBSD 10.1-AMD64 for a long
>> time, one zfs-sending to the other (via zxfer). Both are NFS servers
>> and MySQL slaves. The sender is actively used as an NFS server; the
>> recipient is just a warm standby, in case something serious happens
>> and we don’t want to wait a day until the restore is back in
>> place. The MySQL slaves are actively used as read-only servers (at the
>> application level; Python’s SQLAlchemy does that, apparently).
>> 
>> They are HP DL380G8 (one CPU, hexacore) with over 128 GB RAM (I think 
>> one has 144, the other has 192).
>> While they were running 10.1, they used HP P420 RAID controllers with
>> 12 individual RAID0 volumes that I pooled into 6-disk RAIDZ2 vdevs.
>> I use zfsnap to do hourly, daily and weekly snapshots.
> [...]
>> Now, when I do a zxfer, sometimes the whole system stalls while the 
>> data is sent over, especially if the delta is large or if something 
>> else is reading from the disk at the same time (backup agent).
>> 
>> I had this before, on 10.0, I believe (we didn’t have it in 9.1,
>> IIRC), and it went away in 10.1.
> 
> Do you use geli for swap device(s)?


Yes, I do.
/dev/mirror/swap.eli		none	swap	sw		0	0

Bad idea?


>> It’s very difficult (well, impossible) to debug, because the system 
>> totally hangs and doesn’t accept any keypresses.
> 
> You could try reducing ZFS's deadman timeout to get a panic.
> On systems with local disks I usually use:
> 
> vfs.zfs.deadman_enabled: 1
> vfs.zfs.deadman_checktime_ms: 5000
> vfs.zfs.deadman_synctime_ms: 10000


Too bad I don't have a spare system I could use to test this ;-)
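
For anyone who does have a test box: the three values quoted above are in
`sysctl` output form ("name: value"), which cannot be pasted into a config
file as-is. Below is a minimal sketch that rewrites them as "name=value"
lines. The helper name `sysctl_to_conf` and the `quote` switch are made up
for illustration. Whether the result belongs in /etc/sysctl.conf (unquoted)
or /boot/loader.conf (quoted) depends on whether these knobs are writable at
runtime on the release in question, which I have not checked.

```python
def sysctl_to_conf(text, quote=False):
    """Turn `sysctl` output lines ("name: value") into "name=value" lines.

    quote=True wraps the value in double quotes, the style commonly used
    in loader.conf. This helper is hypothetical, not part of any tool.
    """
    out = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or unrelated lines
        name, value = line.split(":", 1)
        name, value = name.strip(), value.strip()
        out.append(f'{name}="{value}"' if quote else f"{name}={value}")
    return out


# The deadman settings exactly as quoted in this thread.
DEADMAN = """\
vfs.zfs.deadman_enabled: 1
vfs.zfs.deadman_checktime_ms: 5000
vfs.zfs.deadman_synctime_ms: 10000
"""

if __name__ == "__main__":
    print("\n".join(sysctl_to_conf(DEADMAN, quote=True)))
```

Running `sysctl -d` on each name should show the description, and trying to
set one with `sysctl name=value` will tell you quickly whether it can be
changed on a live system.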


