From: Allan Jude
Date: Fri, 03 Mar 2017 12:25:02 -0500
Subject: Re: effect of strip(1) on du(1)
To: freebsd-current@freebsd.org, "Rodney W. Grimes", Peter Jeremy
CC: freebsd-hackers, Subbsd, freebsd-current Current, Ngie Cooper, Alan Somers
Message-ID: <3FAE8942-2896-4EC6-95C6-D87945E57B29@FreeBSD.org>
In-Reply-To: <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
References: <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
List-Id: Discussions about the use of FreeBSD-current

On March 3, 2017 9:11:30 AM EST, "Rodney W. Grimes" wrote:
>-- Start of PGP signed section.
>[ Charset ISO-8859-1 unsupported, converting... ]
>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" wrote:
>> >> du(1) is using fts_read(3), which is based on the stat(2) information.
>> >> The OpenGroup defines st_blocksize as "Number of blocks allocated for
>> >> this object." In the case of ZFS, a write(2) may return before any
>> >> blocks are actually allocated. And thanks to compression, gang
>> ...
>> >My gut tells me that this is gonna cause problems, is it ONLY
>> >the st_blocksize data that is incorrect then not such a big
>> >problem, or are we returning other meta data that is wrong?
>>
>> Note that it's st_blocks, not st_blocksize.
>Yes, I just ignore that digression, as well as the digression into fts_read
>being anything special about this, as it just ends up calling stat(2) in
>the end anyway.
>
>>
>> I did an experiment, writing a (roughly) 113MB file (some data I had
>> lying around), close()ing it and then stat()ing it in a loop. This is
>> FreeBSD 10.3 with ZFS and lz4 compression. Over the 26ms following the
>> close(), st_blocks gradually rose from 24169 to 51231. It then stayed
>> stable until 4.968s after the close, when st_blocks again started
>> increasing until it stabilized after a total of 5.031s at 87483. Based
>> on this, st_blocks reflects the actual number of blocks physically
>> written to disk. None of the other fields in the struct stat vary.
>                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>Thank you for doing the proper regression test, that satisfies me that
>we don't have a latent bug sitting here and in fact what we have is
>exposure of the kernel caching, which I might not be too thrilled about,
>is just how it's gonna have to be.
>
>>
>> The 5s delay is presumably the TXG delay (since this system is basically
>> unloaded). I'm not sure why it writes roughly ?
>> the data immediately and the rest as part of the next TXG write.
>>
>> >My expectations of executing a stat(2) call on a file would
>> >be that the data returned is valid and stable. I think almost
>> >any program would expect that.
>>
>> I think a case could be made that st_blocks is a valid representation
>> of "the number of blocks allocated for this object" - with the number
>> increasing as the data is physically written to disk. As for it being
>> stable, consider a (hypothetical) filesystem that can transparently
>> migrate data between different storage media, with different compression
>> algorithms etc (ZFS will be able to do this once the mythical block
>> rewrite code is written).
>
>I could counter argue that st_blocks is:
>st_blocks The actual number of blocks allocated for the file in
>          512-byte units.
>
>Nothing in that says anything about "on disk". So while this thing
>is sitting in memory on the TXG queue we should return the number of
>512 byte blocks used by the memory holding the data.
>I think that would be the more correct thing than exposing the
>fact this thing is sitting in a write back cache to userland.

Can we compare the results of du with du -A? du will show compression
savings, and -A won't.

ZFS compresses between the write cache and the disk, so the final size may
not be known for 5+ seconds.

-- 
Allan Jude
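
For anyone who wants to reproduce the comparison, here is a rough sketch in Python, not the script used in the experiment quoted above. It writes a file, closes it, and then samples stat(2) for a few seconds, printing the apparent size (st_size, the figure du -A reports) next to the allocated size (st_blocks * 512, the figure plain du reports). The function names, the 16MB payload, and the 6 second window are all assumptions made for illustration.

```python
import os
import tempfile
import time


def sample_usage(path):
    """Return (apparent_bytes, allocated_bytes) for path.

    apparent_bytes is st_size, the figure du -A reports;
    allocated_bytes is st_blocks * 512, the figure plain du reports.
    """
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


def poll_after_close(path, payload, duration=6.0, interval=0.5):
    """Write payload, close the file, then sample usage until duration elapses."""
    with open(path, "wb") as f:
        f.write(payload)
    closed_at = time.monotonic()
    samples = []
    while True:
        elapsed = time.monotonic() - closed_at
        apparent, allocated = sample_usage(path)
        samples.append((elapsed, apparent, allocated))
        if elapsed >= duration:
            break
        time.sleep(interval)
    return samples


if __name__ == "__main__":
    # Highly compressible data, so any compression should show up in st_blocks.
    payload = b"A" * (16 * 1024 * 1024)
    # Assumption: the default temp directory lives on the dataset under test;
    # otherwise pass dir="/path/on/that/dataset" to TemporaryDirectory.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "du-test.bin")
        for elapsed, apparent, allocated in poll_after_close(target, payload):
            print(f"{elapsed:6.3f}s  apparent={apparent}  allocated={allocated}")
```

On a filesystem that allocates blocks after write(2) returns, the allocated column would be expected to change over the sampling window while the apparent column stays fixed at the payload length.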