Date:      Fri, 03 Mar 2017 12:25:02 -0500
From:      Allan Jude <allanjude@FreeBSD.org>
To:        freebsd-current@freebsd.org, "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net>, Peter Jeremy <peter@rulingia.com>
Cc:        freebsd-hackers <freebsd-hackers@freebsd.org>, Subbsd <subbsd@gmail.com>, freebsd-current Current <freebsd-current@freebsd.org>, Ngie Cooper <yaneurabeya@gmail.com>, Alan Somers <asomers@freebsd.org>
Subject:   Re: effect of strip(1) on du(1)
Message-ID:  <3FAE8942-2896-4EC6-95C6-D87945E57B29@FreeBSD.org>
In-Reply-To: <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>
References:  <201703031411.v23EBUdM069969@pdx.rh.CN85.dnsmgr.net>

On March 3, 2017 9:11:30 AM EST, "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net> wrote:
>-- Start of PGP signed section.
>[ Charset ISO-8859-1 unsupported, converting... ]
>> On 2017-Mar-02 22:19:10 -0800, "Rodney W. Grimes" <freebsd-rwg@pdx.rh.CN85.dnsmgr.net> wrote:
>> >> du(1) is using fts_read(3), which is based on the stat(2) information.
>> >> The OpenGroup defines st_blocksize as "Number of blocks allocated for
>> >> this object."  In the case of ZFS, a write(2) may return before any
>> >> blocks are actually allocated.  And thanks to compression, gang
>> ...
>> >My gut tells me that this is going to cause problems.  If it is ONLY
>> >the st_blocksize data that is incorrect, then it is not such a big
>> >problem, but are we returning other metadata that is wrong?
>>
>> Note that it's st_blocks, not st_blocksize.
>Yes, I just ignored that digression, as well as the digression into
>fts_read being anything special here, as it just ends up calling stat(2)
>in the end anyway.
>
>>
>> I did an experiment, writing a (roughly) 113MB file (some data I had
>> lying around), close()ing it and then stat()ing it in a loop.  This is
>> FreeBSD 10.3 with ZFS and lz4 compression.  Over the 26ms following the
>> close(), st_blocks gradually rose from 24169 to 51231.  It then stayed
>> stable until 4.968s after the close, when st_blocks again started
>> increasing until it stabilized after a total of 5.031s at 87483.  Based
>> on this, st_blocks reflects the actual number of blocks physically
>> written to disk.  None of the other fields in the struct stat vary.
>                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>Thank you for doing the proper regression test; that satisfies me that
>we don't have a latent bug sitting here and that in fact what we have is
>exposure of the kernel caching, which I might not be too thrilled about,
>but is just how it's going to have to be.
>
>>
>> The 5s delay is presumably the TXG delay (since this system is basically
>> unloaded).  I'm not sure why it writes roughly ½ the data immediately
>> and the rest as part of the next TXG write.
>>
>> >My expectations of executing a stat(2) call on a file would
>> >be that the data returned is valid and stable.  I think almost
>> >any program would expect that.
>>
>> I think a case could be made that st_blocks is a valid representation
>> of "the number of blocks allocated for this object" - with the number
>> increasing as the data is physically written to disk.  As for it being
>> stable, consider a (hypothetical) filesystem that can transparently
>> migrate data between different storage media, with different compression
>> algorithms etc (ZFS will be able to do this once the mythical block
>> rewrite code is written).
>
>I could counter argue that st_blocks is:
>st_blocks   The actual number of blocks allocated for the file in
>                 512-byte units.
>
>Nothing in that says anything about "on disk".  So while this thing
>is sitting in memory on the TXG queue we should return the number of
>512-byte blocks used by the memory holding the data.
>I think that would be more correct than exposing to userland the
>fact that this thing is sitting in a write-back cache.

Can we compare the results of du with du -A?

du will show compression savings, and du -A won't.
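
A minimal sketch of what the two modes measure, assuming Python's os.stat() on a POSIX system: du -A sums the apparent size (st_size), while plain du sums st_blocks, which stat(2) reports in 512-byte units. The helper name du_sizes is made up for illustration; it is not part of du(1).

```python
import os


def du_sizes(path):
    """Return (apparent_bytes, allocated_bytes) for a single file.

    apparent_bytes is st_size, roughly what `du -A` adds up.
    allocated_bytes is st_blocks * 512, roughly what plain `du` adds up.
    (Hypothetical helper for illustration only.)
    """
    st = os.stat(path)
    # st_blocks counts 512-byte units regardless of the filesystem block size.
    return st.st_size, st.st_blocks * 512
```

On a compressed ZFS dataset the second number would be expected to end up smaller than the first once the data has reached disk.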

ZFS compresses between the write cache and the disk, so the final size may not be known for 5+ seconds.
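
One way to watch that settling is to poll stat(2) after closing the file, as in the experiment quoted above. This is only a sketch, assuming Python on a POSIX system; the function name watch_st_blocks and the default 6-second window and 10ms polling interval are illustrative choices, not values taken from ZFS.

```python
import os
import time


def watch_st_blocks(path, duration=6.0, interval=0.01):
    """Poll st_blocks for about `duration` seconds.

    Returns a list of (elapsed_seconds, st_blocks) tuples, one for the
    first sample and one for each later sample where the value changed.
    (Hypothetical helper; the defaults are assumptions.)
    """
    start = time.monotonic()
    changes = []
    last = None
    while True:
        elapsed = time.monotonic() - start
        blocks = os.stat(path).st_blocks
        if blocks != last:
            changes.append((elapsed, blocks))
            last = blocks
        if elapsed >= duration:
            break
        time.sleep(interval)
    return changes
```

Calling it right after close() on a freshly written file should show whether st_blocks keeps moving and roughly when it stops.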
-- 
Allan Jude


