Date:      Fri, 22 Jul 2016 14:39:01 +0100
From:      Steven Hartland <killing@multiplay.co.uk>
To:        Andriy Gapon <avg@FreeBSD.org>, Karl Denninger <karl@denninger.net>, freebsd-stable@FreeBSD.org
Subject:   Re: Panic on BETA1 in the ZFS subsystem
Message-ID:  <89b66fd6-09d8-d8a2-4894-3a6e5f73a0bb@multiplay.co.uk>
In-Reply-To: <6cb46059-85c8-0c3b-7346-773647f1a962@FreeBSD.org>
References:  <8f44bc09-1237-44d0-fe7a-7eb9cf4fe85b@denninger.net> <54e5974c-312e-c33c-ab83-9e1148618ddc@FreeBSD.org> <97cf5283-683b-83fd-c484-18c14973b065@denninger.net> <c2f24b1e-be84-bcdd-ea0b-515cd2aca266@FreeBSD.org> <1f064549-fa72-fe9b-d66d-85923437bb9b@denninger.net> <6cb46059-85c8-0c3b-7346-773647f1a962@FreeBSD.org>

On 21/07/2016 13:52, Andriy Gapon wrote:
> On 21/07/2016 15:25, Karl Denninger wrote:
>> The crash occurred during a backup script operating, which is (roughly)
>> the following:
>>
>> zpool import -N backup (import the pool to copy to, without mounting it)
>>
>> iterate over a list of zfs filesystems and...
>>
>> zfs rename fs@zfs-base fs@zfs-old
>> zfs snapshot fs@zfs-base
>> zfs send -RI fs@zfs-old fs@zfs-base | zfs receive -Fudv backup
>> zfs destroy -vr fs@zfs-old
>>
>> The first filesystem to be done is the rootfs, that is when it panic'd,
>> and from the traceback it appears that the Zio's in there are from the
>> backup volume, so the answer to your question is "yes".
> I think that what happened here was that a quite large number of TRIM
> requests was queued by ZFS before it had a chance to learn that the
> target vdev in the backup pool did not support TRIM.  So, when the
> first request failed with ENOTSUP the vdev was marked as not supporting
> TRIM.  After that all subsequent requests were failed without sending
> them down the storage stack.  But the way it is done means that all the
> requests were processed by the nested zio_execute() calls on the same
> stack.  And that led to the stack overflow.
>
> Steve, do you think that this is a correct description of what happened?
>
> The state of the pools that you described below probably contributed to
> the avalanche of TRIMs that caused the problem.
>
Yes, that does indeed sound like what happened to me.
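
To illustrate the failure mode described above, here is a minimal sketch in C. It is not the ZFS code: struct req, execute_nested(), execute_iterative() and the depth counters are all hypothetical names made up for this example. It only assumes that failing one queued request immediately triggers execution of the next one from inside the same call, and it compares that with draining the queue in a loop. Nesting depth is counted instead of actually exhausting the stack.

```c
#include <stddef.h>

/* Hypothetical stand-in for a queued request; this is not the real zio_t. */
struct req {
    struct req *next;            /* next request queued behind this one */
};

static int cur_depth;            /* current nesting of execute calls */
static int max_depth;            /* deepest nesting observed */

static void enter(void) { if (++cur_depth > max_depth) max_depth = cur_depth; }
static void leave(void) { --cur_depth; }

/* Nested style: completing one failed request executes the next one from
 * inside the same call, so the depth grows with the queue length. */
static void execute_nested(struct req *r)
{
    enter();
    if (r->next != NULL)
        execute_nested(r->next);
    leave();
}

/* Iterative style: a loop drains the queue, one frame at a time. */
static void execute_iterative(struct req *head)
{
    for (struct req *r = head; r != NULL; r = r->next) {
        enter();                 /* handle r; nothing is called recursively */
        leave();
    }
}

/* Link pool[0..n-1] into a chain (n >= 1) and reset the counters. */
static struct req *make_chain(struct req *pool, int n)
{
    for (int i = 0; i < n; i++)
        pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
    cur_depth = max_depth = 0;
    return pool;
}

/* Maximum nesting depth reached by each strategy for n queued requests. */
int depth_nested(struct req *pool, int n)
{
    execute_nested(make_chain(pool, n));
    return max_depth;
}

int depth_iterative(struct req *pool, int n)
{
    execute_iterative(make_chain(pool, n));
    return max_depth;
}
```

With 64 queued requests, depth_nested() reports a depth of 64 while depth_iterative() reports 1. A queue of many thousands of requests handled in the nested style would need a correspondingly deep stack, which is the kind of growth a fixed-size kernel stack cannot absorb.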

     Regards
     Steve


