From owner-freebsd-fs@freebsd.org Mon Sep 14 23:42:24 2015
From: Sean Chittenden <schittenden@groupon.com>
Date: Mon, 14 Sep 2015 16:41:42 -0700
Subject: Re: zfs_trim_enabled destroys zio_free() performance
To: Steven Hartland
Cc: FreeBSD Filesystems <freebsd-fs@freebsd.org>, Matthew Ahrens

Random industry note: we've had issues with trim-enabled hosts where deleting a moderately sized dataset (~1 TB) would cause the box to trip over the deadman timer. When the host comes back up, it almost immediately panics again because the TRIM commands are reissued, so the box panics in a loop. Disabling TRIM breaks the cycle. At the very least, having TRIM obey a separate deadman timeout would be useful.

-sc

> panic: I/O to pool 'tank' appears to be hung on vdev guid
> 1181753144268412659 at '/dev/da0p1'.
> cpuid = 13
> KDB: stack backtrace:
> #0 0xffffffff805df950 at kdb_backtrace+0x60
> #1 0xffffffff805a355d at panic+0x17d
> #2 0xffffffff81034db3 at vdev_deadman+0x123
> #3 0xffffffff81034cc0 at vdev_deadman+0x30
> #4 0xffffffff81034cc0 at vdev_deadman+0x30
> #5 0xffffffff810298e5 at spa_deadman+0x85
> #6 0xffffffff805b8ca5 at softclock_call_cc+0x165
> #7 0xffffffff805b90b4 at softclock+0x94
> #8 0xffffffff805716cb at intr_event_execute_handlers+0xab
> #9 0xffffffff80571b16 at ithread_loop+0x96
> #10 0xffffffff8056f19a at fork_exit+0x9a
> #11 0xffffffff807a817e at fork_trampoline+0xe
> Uptime: 59s

On Sun, Sep 13, 2015 at 6:03 AM, Steven Hartland wrote:
>
> Do you remember if this was causing a deadlock or something similar
> that's easy to provoke?
>
> Regards
> Steve
>
> On 11/09/2015 18:00, Alexander Motin wrote:
>>
>> Hi.
>>
>> The code in question was added by me in r253992. The commit message
>> says it was made to decouple locks. I don't remember many more
>> details, but maybe it can be redone some other way.
>>
>> On 11.09.2015 19:07, Matthew Ahrens wrote:
>>>
>>> I discovered that when destroying a ZFS snapshot, we can end up using
>>> several seconds of CPU via this stack trace:
>>>
>>> kernel`spinlock_exit+0x2d
>>> kernel`taskqueue_enqueue+0x12c
>>> zfs.ko`zio_issue_async+0x7c
>>> zfs.ko`zio_execute+0x162
>>> zfs.ko`dsl_scan_free_block_cb+0x15f
>>> zfs.ko`bpobj_iterate_impl+0x25d
>>> zfs.ko`bpobj_iterate_impl+0x46e
>>> zfs.ko`dsl_scan_sync+0x152
>>> zfs.ko`spa_sync+0x5c1
>>> zfs.ko`txg_sync_thread+0x3a6
>>> kernel`fork_exit+0x9a
>>> kernel`0xffffffff80d0acbe
>>> 6558 ms
>>>
>>> This is bad for performance since, in addition to the CPU cost, it
>>> keeps the sync thread from doing anything else, which is observable
>>> as periods of several seconds during which no write i/o reaches the
>>> disks.
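[Editor's note: the stack above spends its time in taskqueue_enqueue/spinlock_exit, i.e. in handing each tiny free to another thread rather than in the work itself. The following is a userspace illustration of that cost pattern using ordinary pthreads — it is not ZFS code, just a sketch of why per-item dispatch to a worker thread is expensive when each item is trivial:]

```c
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

/* One-slot handoff "queue": the producer posts one item, the worker
 * consumes it.  Every item pays a lock/unlock and two condvar signals,
 * the same per-item pattern as dispatching each free to a taskq. */
struct handoff {
	pthread_mutex_t lock;
	pthread_cond_t  post;	/* item available */
	pthread_cond_t  done;	/* item consumed */
	bool            full;
	bool            stop;
	uint64_t        item;
	uint64_t        sum;	/* worker-side accumulator */
};

static void *
worker(void *arg)
{
	struct handoff *h = arg;

	pthread_mutex_lock(&h->lock);
	for (;;) {
		while (!h->full && !h->stop)
			pthread_cond_wait(&h->post, &h->lock);
		if (h->stop && !h->full)
			break;
		h->sum += h->item;	/* the "work": trivially small */
		h->full = false;
		pthread_cond_signal(&h->done);
	}
	pthread_mutex_unlock(&h->lock);
	return (NULL);
}

/* Dispatch n items one at a time through the handoff queue. */
uint64_t
sum_via_worker(uint64_t n)
{
	struct handoff h = {
		.lock = PTHREAD_MUTEX_INITIALIZER,
		.post = PTHREAD_COND_INITIALIZER,
		.done = PTHREAD_COND_INITIALIZER,
	};
	pthread_t t;
	uint64_t i;

	pthread_create(&t, NULL, worker, &h);
	for (i = 1; i <= n; i++) {
		pthread_mutex_lock(&h.lock);
		while (h.full)
			pthread_cond_wait(&h.done, &h.lock);
		h.item = i;
		h.full = true;
		pthread_cond_signal(&h.post);
		pthread_mutex_unlock(&h.lock);
	}
	pthread_mutex_lock(&h.lock);
	while (h.full)
		pthread_cond_wait(&h.done, &h.lock);
	h.stop = true;
	pthread_cond_signal(&h.post);
	pthread_mutex_unlock(&h.lock);
	pthread_join(t, NULL);
	return (h.sum);
}

/* Same work done inline in the caller's context. */
uint64_t
sum_inline(uint64_t n)
{
	uint64_t i, sum = 0;

	for (i = 1; i <= n; i++)
		sum += i;
	return (sum);
}
```

Timing the two functions on the same n makes the point: the results are identical, but the handoff version pays synchronization overhead for every item, which dwarfs the trivial work — analogous to why keeping each free in the sync thread's own context is cheaper than a taskq round-trip.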
>>>
>>> The problem is that when zfs_trim_enabled is set (which it is by
>>> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing
>>> the free to be dispatched to a taskq. Since each task completes very
>>> quickly, there is a large locking and context-switching overhead --
>>> we would be better off processing the free in the caller's context.
>>>
>>> I'm not sure exactly why we need to go async when trim is enabled,
>>> but at the very least we should not bother going async if trim is
>>> not actually being used (e.g. with an all-spinning-disk pool). It
>>> would also be worth investigating not going async even when trim is
>>> useful (e.g. on SSD-based pools).
>>>
>>> Here is the relevant code:
>>>
>>> zio_free_sync():
>>>     if (zfs_trim_enabled)
>>>         stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>>>             ZIO_STAGE_VDEV_IO_ASSESS;
>>>     /*
>>>      * GANG and DEDUP blocks can induce a read (for the gang block
>>>      * header, or the DDT), so issue them asynchronously so that
>>>      * this thread is not tied up.
>>>      */
>>>     else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>>>         stage |= ZIO_STAGE_ISSUE_ASYNC;
>>>
>>> --matt

--
Sean Chittenden
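[Editor's note: Matt's suggestion — skip the async stage unless TRIM can actually reach a vdev in this pool — could look roughly like the sketch below. The ZIO_STAGE_* values are simplified stand-ins, and the pool-has-trim-capable-vdev predicate is an assumed input, not a real ZFS API; this is a sketch of the proposed condition, not the actual implementation:]

```c
#include <stdbool.h>

/* Simplified stand-ins for the real ZIO_STAGE_* pipeline bits. */
#define STAGE_ISSUE_ASYNC	(1u << 0)
#define STAGE_VDEV_IO_START	(1u << 1)
#define STAGE_VDEV_IO_ASSESS	(1u << 2)

/*
 * Sketch of a tightened stage selection for zio_free_sync(): only pay
 * for the async dispatch when TRIM is enabled globally AND this pool
 * actually has a TRIM-capable leaf vdev (pool_has_trim_vdev is a
 * hypothetical predicate; an all-spinning-disk pool would pass false).
 * Gang and dedup blocks still go async because freeing them can
 * induce a read (gang header or DDT).
 */
unsigned int
free_stages(bool trim_enabled, bool pool_has_trim_vdev,
    bool is_gang, bool is_dedup)
{
	unsigned int stage = 0;

	if (trim_enabled && pool_has_trim_vdev)
		stage |= STAGE_ISSUE_ASYNC | STAGE_VDEV_IO_START |
		    STAGE_VDEV_IO_ASSESS;
	else if (is_gang || is_dedup)
		stage |= STAGE_ISSUE_ASYNC;

	return (stage);
}
```

Under this condition, a simple (non-gang, non-dedup) free on an all-HDD pool returns 0 extra stages and stays in the sync thread's context, avoiding the per-block taskq handoff measured in the stack trace above.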