Subject: Re: Panic in ZFS during zfs recv (while snapshots being destroyed)
From: Steven Hartland
To: freebsd-fs@freebsd.org
Date: Fri, 28 Aug 2015 10:55:01 +0100
Message-ID: <55E02FF5.2060805@multiplay.co.uk>

You would need to have a very broken TRIM implementation for that to happen; do you have any details on the devices involved?

On 27/08/2015 21:30, Sean Chittenden wrote:
> Have you tried disabling TRIM? We recently ran into an issue where a `zfs delete` on a large dataset caused the host to panic because TRIM was tripping over the ZFS deadman timer. Disabling TRIM worked as a valid workaround for us. You mentioned a recent move to SSDs, so this can happen, especially after the drive has experienced a little bit of actual work.
>
> -sc
>
> --
> Sean Chittenden
> sean@chittenden.org
>
>> On Aug 27, 2015, at 13:22, Karl Denninger wrote:
>>
>> On 8/15/2015 12:38, Karl Denninger wrote:
>>> Update:
>>>
>>> This /appears/ to be related to attempting to send or receive a /cloned/ snapshot.
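If clones are indeed the trigger, one quick check is to see which snapshots in the source pool actually have clones hanging off them, using the per-snapshot "clones" and per-filesystem "origin" properties (both present in the ZFS shipped with 10.x). A minimal sketch -- "zroot" and the boot-environment name below are placeholders, not the actual dataset names on this machine:

    # list every snapshot in the pool together with any clones based on it
    zfs list -r -t snapshot -o name,clones zroot

    # for a given boot environment, show the snapshot it was cloned from
    zfs get origin zroot/ROOT/default

Comparing that output against the snapshots the nightly send/recv touches would show whether the panics really do line up with cloned snapshots.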
>>>
>>> I use /beadm/ to manage boot environments, and the crashes have all come while send/recv-ing the root pool, which is the one where these clones get created. It is /not/ consistent within a given snapshot when it crashes, and a second attempt (which does a "recovery" send/receive) succeeds every time -- I've yet to have it panic twice sequentially.
>>>
>>> I surmise that the problem comes about when a file in the cloned snapshot is modified, but this is a guess at this point.
>>>
>>> I'm going to try to force the problem to reproduce on my test system.
>>>
>>> On 7/31/2015 04:47, Karl Denninger wrote:
>>>> I have an automated script that runs zfs send/recv copies to bring a backup data set into congruence with the running copies nightly. The source has automated snapshots running on a fairly frequent basis through zfs-auto-snapshot.
>>>>
>>>> Recently I have started having a panic show up about once a week during the backup run, but it's inconsistent. It is in the same place, but I cannot force it to repeat.
>>>>
>>>> The trap itself is a page fault in kernel mode in the ZFS code at zfs_unmount_snap(); here's the traceback from the KVM (sorry for the image link, but I don't have a better option right now).
>>>>
>>>> I'll try to get a dump; this is a production machine with encrypted swap, so it's not normally turned on.
>>>>
>>>> Note that the pool that appears to be involved (the backup pool) has passed a scrub, and thus I would assume the on-disk structure is OK... but that might be an unfair assumption. It is always occurring in the same dataset, although there are a half-dozen that are sync'd -- if this one (the first one) successfully completes during the run, then all the rest will as well (that is, whenever I restart the process it has always failed here). The source pool is also clean and passes a scrub.
>>>>
>>>> The traceback is at http://www.denninger.net/kvmimage.png; apologies for the image traceback, but this is coming from a remote KVM.
>>>>
>>>> I first saw this on 10.1-STABLE and it is still happening on FreeBSD 10.2-PRERELEASE #9 r285890M, which I updated to in an attempt to see if the problem was something that had been addressed.
>>>>
>>> --
>>> Karl Denninger
>>> karl@denninger.net
>>> /The Market Ticker/
>>> /[S/MIME encrypted email preferred]/
>> Second update: I have now taken another panic on 10.2-STABLE, same deal, but without any cloned snapshots in the source image. I had thought that removing cloned snapshots might eliminate the issue; that is now out the window.
>>
>> It ONLY happens on this one filesystem (the root one, incidentally), which is fairly recently created, as I moved this machine from spinning rust to SSDs for the OS and root pool -- and only when it is being backed up by using zfs send | zfs recv (with the receive going to a different pool in the same machine). I have yet to be able to provoke it when using zfs send to copy to a different machine on the same LAN, but given that it is not able to be reproduced on demand I can't be certain it's timing-related (e.g. performance between the two pools in question) or just that I haven't hit the unlucky combination.
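The recent move to SSDs mentioned above is what makes the TRIM suggestion at the top of the thread worth testing. As a minimal sketch -- assuming the stock ZFS TRIM tunable and kstat names on FreeBSD 10.x, so verify them on the machine in question -- checking whether TRIM is active and then ruling it out looks roughly like this:

    # is ZFS TRIM enabled, and has it actually been issuing requests?
    sysctl vfs.zfs.trim.enabled
    sysctl kstat.zfs.misc.zio_trim

    # to rule TRIM out, disable it via the loader tunable and reboot
    echo 'vfs.zfs.trim.enabled=0' >> /boot/loader.conf

vfs.zfs.trim.enabled is a boot-time tunable rather than a runtime knob, so the change only takes effect after a reboot. If the panics stop with TRIM disabled, that points at the workaround Sean describes rather than at the send/recv path itself.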
>>
>> This looks like some sort of race condition, and I will continue to see if I can craft a case to make it occur "on demand".
>>
>> --
>> Karl Denninger
>> karl@denninger.net
>> /The Market Ticker/
>> /[S/MIME encrypted email preferred]/
> _______________________________________________
> freebsd-fs@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-fs
> To unsubscribe, send any mail to "freebsd-fs-unsubscribe@freebsd.org"