From: Slawa Olhovchenkov
To: d@delphij.net
Cc: freebsd-fs@freebsd.org, freebsd-current@freebsd.org, Jordan Hubbard
Date: Tue, 29 Oct 2013 01:45:52 +0400
Subject: Re: ZFS txg implementation flaw
Message-ID: <20131028214552.GY63359@zxy.spb.ru>
In-Reply-To: <526ED956.10202@delphij.net>

On Mon, Oct 28, 2013 at 02:38:30PM -0700, Xin Li wrote:

> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA512
>
> On 10/28/13 14:32, Slawa Olhovchenkov wrote:
> > On Mon, Oct 28, 2013 at 02:22:16PM -0700, Jordan Hubbard wrote:
> >
> >>
> >> On Oct 28, 2013, at 2:28 AM, Slawa Olhovchenkov wrote:
> >>
> >>> As I see it, ZFS creates a separate thread for each txg write.
> >>> Also for writing to L2ARC. As a result, up to several thousand
> >>> threads are created and destroyed per second, along with hundreds
> >>> of thousands of page allocations, zeroings, mappings, unmappings
> >>> and frees per second. Very high overhead.
> >>
> >> How are you measuring the number of threads being created /
> >> destroyed? This claim seems erroneous given how the ZFS thread
> >> pool mechanism actually works (and yes, there are thread pools
> >> already).
> >>
> >> It would be helpful to both see your measurement methodology and
> >> the workload you are using in your tests.
> >
> > Semi-indirect.
> > dtrace -n 'fbt:kernel:vm_object_terminate:entry {
> > @traces[stack()] = count(); }'
> >
> > After a few (2-3) seconds:
> >
> > kernel`vnode_destroy_vobject+0xb9
> > zfs.ko`zfs_freebsd_reclaim+0x2e
> > kernel`VOP_RECLAIM_APV+0x78
> > kernel`vgonel+0x134
> > kernel`vnlru_free+0x362
> > kernel`vnlru_proc+0x61e
> > kernel`fork_exit+0x11f
> > kernel`0xffffffff80cdbfde
> > 2490

0xffffffff80cdbfd0 : mov %r12,%rdi
0xffffffff80cdbfd3 : mov %rbx,%rsi
0xffffffff80cdbfd6 : mov %rsp,%rdx
0xffffffff80cdbfd9 : callq 0xffffffff808db560
0xffffffff80cdbfde : jmpq 0xffffffff80cdca80
0xffffffff80cdbfe3 : nopw 0x0(%rax,%rax,1)
0xffffffff80cdbfe9 : nopl 0x0(%rax)

> > I don't have user processes creating threads or doing fork/exit.
>
> This has nothing to do with fork/exit but does suggest that you are
> running out of vnodes. What does sysctl -a | grep vnode say?

kern.maxvnodes: 1095872
kern.minvnodes: 273968
vm.stats.vm.v_vnodepgsout: 0
vm.stats.vm.v_vnodepgsin: 62399
vm.stats.vm.v_vnodeout: 0
vm.stats.vm.v_vnodein: 10680
vfs.freevnodes: 275107
vfs.wantfreevnodes: 273968
vfs.numvnodes: 316321
debug.sizeof.vnode: 504
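
For anyone who wants to compare these counters on their own system, here is a minimal sketch in Python. It assumes only the "name: value" layout that sysctl prints above. The helper names parse_sysctl and vnode_summary are hypothetical, made up for this illustration, and the sketch does plain arithmetic on the numbers without claiming anything about how the kernel acts on them.

```python
# Hypothetical helpers for illustration; not part of FreeBSD or any tool.

def parse_sysctl(text):
    """Parse 'name: value' lines into a dict, keeping integer values only."""
    values = {}
    for line in text.splitlines():
        name, sep, value = line.partition(":")
        value = value.strip()
        if sep and value.isdigit():
            values[name.strip()] = int(value)
    return values


def vnode_summary(values):
    """Compare the vnode counters quoted in the message above."""
    return {
        # Fraction of kern.maxvnodes currently in use.
        "used_fraction": values["vfs.numvnodes"] / values["kern.maxvnodes"],
        # How far vfs.freevnodes sits above vfs.wantfreevnodes.
        "free_minus_wanted": values["vfs.freevnodes"]
        - values["vfs.wantfreevnodes"],
    }


# Sample input copied from the sysctl output in this message.
SAMPLE = """\
kern.maxvnodes: 1095872
kern.minvnodes: 273968
vfs.freevnodes: 275107
vfs.wantfreevnodes: 273968
vfs.numvnodes: 316321
"""

if __name__ == "__main__":
    summary = vnode_summary(parse_sysctl(SAMPLE))
    print("used fraction of maxvnodes: %.3f" % summary["used_fraction"])
    print("freevnodes - wantfreevnodes: %d" % summary["free_minus_wanted"])
```

To run it against a live system, the output of `sysctl -a | grep vnode` could be passed to parse_sysctl in place of SAMPLE.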