From owner-freebsd-hackers@freebsd.org Fri Mar 3 14:11:34 2017
From: "Rodney W. Grimes"
Message-Id: <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
Subject: Re: effect of strip(1) on du(1)
In-Reply-To: <20170303092143.GM4503@server.rulingia.com>
To: Peter Jeremy
Date: Fri, 3 Mar 2017 06:11:30 -0800 (PST)
CC: freebsd-hackers, Subbsd, freebsd-current Current, Ngie Cooper, Alan Somers

> On 2017-Mar-02 22:19:10 -0800, "Rodney W.
Grimes" wrote:
> >> du(1) is using fts_read(3), which is based on the stat(2) information.
> >> The OpenGroup defines st_blocksize as "Number of blocks allocated for
> >> this object." In the case of ZFS, a write(2) may return before any
> >> blocks are actually allocated. And thanks to compression, gang
> ...
> >My gut tells me that this is going to cause problems. Is it ONLY
> >the st_blocksize data that is incorrect, in which case it is not such
> >a big problem, or are we returning other metadata that is wrong?
>
> Note that it's st_blocks, not st_blocksize.

Yes, I just ignored that digression, as well as the digression into
fts_read being anything special here, as it just ends up calling
stat(2) in the end anyway.

>
> I did an experiment, writing a (roughly) 113MB file (some data I had
> lying around), close()ing it and then stat()ing it in a loop. This is
> FreeBSD 10.3 with ZFS and lz4 compression. Over the 26ms following the
> close(), st_blocks gradually rose from 24169 to 51231. It then stayed
> stable until 4.968s after the close, when st_blocks again started
> increasing until it stabilized after a total of 5.031s at 87483. Based
> on this, st_blocks reflects the actual number of blocks physically
> written to disk. None of the other fields in the struct stat vary.
                             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Thank you for doing the proper regression test. That satisfies me that
we don't have a latent bug sitting here, and in fact what we have is
exposure of the kernel caching, which I might not be too thrilled
about, but is just how it's going to have to be.

>
> The 5s delay is presumably the TXG delay (since this system is basically
> unloaded). I'm not sure why it writes roughly 1/2 the data immediately
> and the rest as part of the next TXG write.
>
> >My expectations of executing a stat(2) call on a file would
> >be that the data returned is valid and stable. I think almost
> >any program would expect that.
>
> I think a case could be made that st_blocks is a valid representation
> of "the number of blocks allocated for this object" - with the number
> increasing as the data is physically written to disk. As for it being
> stable, consider a (hypothetical) filesystem that can transparently
> migrate data between different storage media, with different compression
> algorithms etc (ZFS will be able to do this once the mythical block
> rewrite code is written).

I could counter-argue that st_blocks is:

     st_blocks    The actual number of blocks allocated for the file in
                  512-byte units.

Nothing in that says anything about "on disk". So while this thing
is sitting in memory on the TXG queue we should return the number of
512-byte blocks used by the memory holding the data.

I think that would be more correct than exposing to userland the fact
that this thing is sitting in a write-back cache.

-- 
Rod Grimes                                                 rgrimes@freebsd.org
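
The experiment quoted above (write a file, close() it, then stat() it in
a loop) can be approximated with the sketch below. It is not the program
that was actually run: the function names, the random filler data, the
polling interval and the 6 second window are all assumptions chosen for
illustration. It records every change in st_blocks seen after the close,
and converts the count to bytes using the 512-byte unit from the man page
text quoted above.

```python
import os
import time

BLOCK_UNIT = 512  # st_blocks unit per the man page text quoted above


def sample_st_blocks(path, duration=6.0, interval=0.001):
    """Poll stat(2) on path; return [(seconds, st_blocks)] for each change.

    The first sample is always recorded, so the list is never empty.
    """
    samples = []
    last = None
    start = time.monotonic()
    while True:
        elapsed = time.monotonic() - start
        blocks = os.stat(path).st_blocks
        if blocks != last:
            samples.append((elapsed, blocks))
            last = blocks
        if elapsed >= duration:
            break
        time.sleep(interval)
    return samples


def write_then_watch(path, nbytes, duration=6.0):
    """Write nbytes of filler data, close the file, then watch st_blocks.

    Assumption: random filler is used here; the original test used
    existing (presumably compressible) data.
    """
    chunk = os.urandom(1 << 20)
    with open(path, "wb") as f:
        remaining = nbytes
        while remaining > 0:
            n = min(remaining, len(chunk))
            f.write(chunk[:n])
            remaining -= n
    # The file is closed at this point; start polling immediately.
    return sample_st_blocks(path, duration)


def allocated_bytes(blocks):
    """Convert an st_blocks count to bytes."""
    return blocks * BLOCK_UNIT
```

Called as write_then_watch(some_path, 113 * 2**20) on a ZFS dataset (the
path is whatever you choose), one would expect several samples spread
across the TXG interval if the behaviour described above holds. On a
filesystem that allocates blocks before write(2) returns, there would
probably be a single sample.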