Skip site navigation (1)Skip section navigation (2)
Date:      Fri, 11 Sep 2015 20:00:39 +0300
From:      Alexander Motin <mav@FreeBSD.org>
To:        Matthew Ahrens <mahrens@delphix.com>, freebsd-fs <freebsd-fs@freebsd.org>
Subject:   Re: zfs_trim_enabled destroys zio_free() performance
Message-ID:  <55F308B7.3020302@FreeBSD.org>
In-Reply-To: <CAJjvXiE2mRT4=kPMk3gwiT-3ykeAhaYBx6Tw6HgXhs2=XZWWFg@mail.gmail.com>
References:  <CAJjvXiE2mRT4=kPMk3gwiT-3ykeAhaYBx6Tw6HgXhs2=XZWWFg@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hi.

The code in question was added by me at r253992. Commit message tells it
was made to decouple locks. I don't remember much more details, but may
be it can be redone somehow else.

On 11.09.2015 19:07, Matthew Ahrens wrote:
> I discovered that when destroying a ZFS snapshot, we can end up using
> several seconds of CPU via this stack trace:
> 
>               kernel`spinlock_exit+0x2d
>               kernel`taskqueue_enqueue+0x12c
>               zfs.ko`zio_issue_async+0x7c
>               zfs.ko`zio_execute+0x162
>               zfs.ko`dsl_scan_free_block_cb+0x15f
>               zfs.ko`bpobj_iterate_impl+0x25d
>               zfs.ko`bpobj_iterate_impl+0x46e
>               zfs.ko`dsl_scan_sync+0x152
>               zfs.ko`spa_sync+0x5c1
>               zfs.ko`txg_sync_thread+0x3a6
>               kernel`fork_exit+0x9a
>               kernel`0xffffffff80d0acbe
>              6558 ms
> 
> This is not good for performance since, in addition to the CPU cost, it
> doesn't allow the sync thread to do anything else, and this is
> observable as periods where we don't do any write i/o to disk for
> several seconds.
> 
> The problem is that when zfs_trim_enabled is set (which it is by
> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the
> free to be dispatched to a taskq.  Since each task completes very
> quickly, there is a large locking and context switching overhead -- we
> would be better off just processing the free in the caller's context.
> 
> I'm not sure exactly why we need to go async when trim is enabled, but
> it seems like at least we should not bother going async if trim is not
> actually being used (e.g. with an all-spinning-disk pool).  It would
> also be worth investigating not going async even when trim is useful
> (e.g. on SSD-based pools).
> 
> Here is the relevant code:
> 
> zio_free_sync():
>         if (zfs_trim_enabled)
>                 stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>                     ZIO_STAGE_VDEV_IO_ASSESS;
>         /*
>          * GANG and DEDUP blocks can induce a read (for the gang block
> header,
>          * or the DDT), so issue them asynchronously so that this thread is
>          * not tied up.
>          */
>         else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>                 stage |= ZIO_STAGE_ISSUE_ASYNC;
> 
> --matt


-- 
Alexander Motin



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?55F308B7.3020302>