From: Alexander Motin
Date: Fri, 11 Sep 2015 20:00:39 +0300
To: Matthew Ahrens, freebsd-fs
Subject: Re: zfs_trim_enabled destroys zio_free() performance
Message-ID: <55F308B7.3020302@FreeBSD.org>

Hi.

The code in question was added by me at r253992. The commit message says it
was made to decouple locks. I don't remember many more details, but maybe it
can be redone some other way.

On 11.09.2015 19:07, Matthew Ahrens wrote:
> I discovered that when destroying a ZFS snapshot, we can end up using
> several seconds of CPU via this stack trace:
>
>     kernel`spinlock_exit+0x2d
>     kernel`taskqueue_enqueue+0x12c
>     zfs.ko`zio_issue_async+0x7c
>     zfs.ko`zio_execute+0x162
>     zfs.ko`dsl_scan_free_block_cb+0x15f
>     zfs.ko`bpobj_iterate_impl+0x25d
>     zfs.ko`bpobj_iterate_impl+0x46e
>     zfs.ko`dsl_scan_sync+0x152
>     zfs.ko`spa_sync+0x5c1
>     zfs.ko`txg_sync_thread+0x3a6
>     kernel`fork_exit+0x9a
>     kernel`0xffffffff80d0acbe
>     6558 ms
>
> This is not good for performance since, in addition to the CPU cost, it
> doesn't allow the sync thread to do anything else, and this is
> observable as periods where we don't do any write i/o to disk for
> several seconds.
>
> The problem is that when zfs_trim_enabled is set (which it is by
> default), zio_free_sync() always sets ZIO_STAGE_ISSUE_ASYNC, causing the
> free to be dispatched to a taskq. Since each task completes very
> quickly, there is a large locking and context switching overhead -- we
> would be better off just processing the free in the caller's context.
>
> I'm not sure exactly why we need to go async when trim is enabled, but
> it seems like at least we should not bother going async if trim is not
> actually being used (e.g. with an all-spinning-disk pool). It would
> also be worth investigating not going async even when trim is useful
> (e.g. on SSD-based pools).
>
> Here is the relevant code:
>
> zio_free_sync():
>     if (zfs_trim_enabled)
>         stage |= ZIO_STAGE_ISSUE_ASYNC | ZIO_STAGE_VDEV_IO_START |
>             ZIO_STAGE_VDEV_IO_ASSESS;
>     /*
>      * GANG and DEDUP blocks can induce a read (for the gang block header,
>      * or the DDT), so issue them asynchronously so that this thread is
>      * not tied up.
>      */
>     else if (BP_IS_GANG(bp) || BP_GET_DEDUP(bp))
>         stage |= ZIO_STAGE_ISSUE_ASYNC;
>
> --matt

-- 
Alexander Motin
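
To illustrate the suggestion in the quoted message (skip the async dispatch
when TRIM is not actually in use), here is a minimal, self-contained C sketch.
It is not the FreeBSD code: the flag values are placeholders, and
free_stages() and its pool_uses_trim parameter are hypothetical names
introduced only for this example. How the kernel would actually determine
whether a pool uses TRIM is left open here.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Placeholder bit values for illustration only; they are NOT the real
 * ZIO_STAGE_* constants from the ZFS sources.
 */
enum {
	STAGE_ISSUE_ASYNC    = 1u << 0,
	STAGE_VDEV_IO_START  = 1u << 1,
	STAGE_VDEV_IO_ASSESS = 1u << 2
};

/*
 * Hypothetical model of the stage selection in zio_free_sync().
 * trim_enabled mirrors the zfs_trim_enabled tunable; pool_uses_trim is an
 * assumed extra input saying whether any device in the pool can use TRIM.
 */
static uint32_t
free_stages(bool trim_enabled, bool pool_uses_trim, bool is_gang,
    bool is_dedup)
{
	uint32_t stage = 0;

	if (trim_enabled && pool_uses_trim) {
		/* TRIM will really be issued: keep the existing async path. */
		stage |= STAGE_ISSUE_ASYNC | STAGE_VDEV_IO_START |
		    STAGE_VDEV_IO_ASSESS;
	} else if (is_gang || is_dedup) {
		/* Gang and dedup frees may need a read, so stay async. */
		stage |= STAGE_ISSUE_ASYNC;
	}
	/* Otherwise the free is processed in the caller's context. */
	return (stage);
}
```

With this shape, a plain block freed on an all-spinning-disk pool gets no
async stage even while the tunable is on, so it would not go through the
taskq dispatch seen in the stack trace above.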