Date:      Wed, 8 May 2013 11:20:00 GMT
From:      Matthew Rezny <mrezny@hexaneinc.com>
To:        freebsd-bugs@FreeBSD.org
Subject:   Re: kern/178388: [zfs] [patch] allow up to 8MB recordsize
Message-ID:  <201305081120.r48BK0C4048779@freefall.freebsd.org>

The following reply was made to PR kern/178388; it has been noted by GNATS.

From: Matthew Rezny <mrezny@hexaneinc.com>
To: Adam Nowacki <nowak@tepeserwery.pl>
Cc: bug-followup@FreeBSD.org
Subject: Re: kern/178388: [zfs] [patch] allow up to 8MB recordsize
Date: Wed, 8 May 2013 13:09:44 +0200

 On Tue, 07 May 2013 18:46:05 +0200
 Adam Nowacki <nowak@tepeserwery.pl> wrote:
 
 > On 2013-05-07 16:11, Matthew Rezny wrote:
 > > The proposed patch is rather ugly. Is there some reason to not
 > > simply change the definition of SPA_MAXBLOCKSIZE?
 > 
 > Yes. Altering the value of SPA_MAXBLOCKSIZE will change the sizes of
 > certain metadata objects, which will break compatibility with
 > non-patched systems. Just importing the pool on a system with a
 > modified SPA_MAXBLOCKSIZE would leave the pool inaccessible on
 > non-patched systems - forever. It will also prevent booting from ZFS
 > pools, as there is not enough memory available in the bootloader to
 > support large block sizes for the metadata or the loader files.
 > 
 That is understandable and something I had thought about but had not
 verified was the case.
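 
 For context, the stock sys/spa.h defines the on-disk ceiling as a
 shift, which is why every consumer moves with it if the value is
 bumped. A rough sketch of what the separate-constant approach looks
 like - the 128k values are the stock ones, while the SPA_BIGBLOCKSIZE
 shift/value are only my reading of the patch, so treat them as
 illustrative rather than the patch itself:
 
   /* Stock definitions: 128k has been the limit everywhere. */
   #define SPA_MINBLOCKSHIFT   9
   #define SPA_MAXBLOCKSHIFT   17
   #define SPA_MINBLOCKSIZE    (1ULL << SPA_MINBLOCKSHIFT)  /* 512 bytes */
   #define SPA_MAXBLOCKSIZE    (1ULL << SPA_MAXBLOCKSHIFT)  /* 128k */
 
   /* Separate constant (value assumed here) consulted only on file/zvol
    * data paths, so metadata keeps the compatible 128k sizing above. */
   #define SPA_BIGBLOCKSHIFT   23
   #define SPA_BIGBLOCKSIZE    (1ULL << SPA_BIGBLOCKSHIFT)  /* 8MB */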
 
 > > The point of defining a constant is that it can then be changed in
 > > the place it's defined rather than in every place it's used. Having
 > > to go change every reference to it is error-prone, as missing a
 > > single reference could wreak havoc.
 > 
 > SPA_MAXBLOCKSIZE is used for far more than just a limit - in many
 > places it is used as a default block size. I'm introducing
 > SPA_BIGBLOCKSIZE because of the above compatibility problems and
 > using it only in places that are essential to supporting large block
 > sizes for file or zvol data, leaving the default block sizes
 > unmodified (especially for pool metadata). The changed block size is
 > only in effect when the recordsize dataset property is modified by
 > explicit action of the administrator. Existing datasets, and new
 > datasets created post-patch, default to the backwards-compatible 128k
 > block size.
 > 
 > SPA_BIGBLOCKSIZE is used for asserts on the size of read/written
 > block, ARC cache, recordsize property bounds checks and block size
 > calculation logic.
 > 
 > The names of the constants could probably be changed:
 > current SPA_MAXBLOCKSIZE to SPA_DEFAULTBLOCKSIZE
 > and the new SPA_BIGBLOCKSIZE to SPA_MAXBLOCKSIZE.
 >
 Changing the value of SPA_MAXBLOCKSIZE while defining
 SPA_DEFAULTBLOCKSIZE (or SPA_MAXCOMPATBLOCKSIZE as I almost
 suggested) with the prior value would be clearer in terms of both naming
 and patch readability. I will venture to say that the number of
 references to SPA_DEFAULTBLOCKSIZE will be smaller than the number of
 references to SPA_MAXBLOCKSIZE.
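 
 Whichever naming wins out, the consumers of the new upper bound are the
 ones listed above (asserts on block I/O, the ARC, the recordsize
 property bounds check). As a rough sketch of the kind of check
 involved - illustrative only, with a hypothetical variable name, not
 the actual patch code:
 
   /* recordsize must be a power of two in [SPA_MINBLOCKSIZE, new max] */
   if (recordsize < SPA_MINBLOCKSIZE ||
       recordsize > SPA_BIGBLOCKSIZE ||   /* SPA_MAXBLOCKSIZE post-rename */
       !ISP2(recordsize))
           return (EINVAL);
 
 Nothing changes for a dataset until the administrator explicitly raises
 the property, e.g. "zfs set recordsize=8M" on the dataset in question.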
 
 > > Specifically, I call into question the effect this has on the
 > > definition of SPA_BLOCKSIZES. The reference to SPA_MAXBLOCKSIZE was
 > > not replaced by SPA_BIGBLOCKSIZE and thus SPA_BLOCKSIZES is
 > > insufficiently sized to represent all the possible block sizes that
 > > could be used.
 > 
 > The SPA_BLOCKSIZES define is never used in the code and should
 > probably be removed.
 > 
 In that case, please kill it now.
 
 > > That one jumped out at me when I skimmed over the patch. I have not
 > > reviewed all the ZFS code to look for other unchanged references
 > > that are not part of the patch context.
 > 
 > Keep in mind that I have been using this for two months now on 3 
 > systems, 5 zpools, and a total of over 50TB of data written
 > post-patch with varying record sizes (128k, 1MB, 4MB, 8MB). All
 > systems boot directly from the big pools using an unmodified
 > (128k-limited) bootloader.
 Thank you for your work in this area and the extensive testing you have
 done. Do you have any performance data from these tests that you can
 share? Is there a reason for not going beyond an 8MB record size
 (diminishing returns, etc.)?
 
 I have put some thought into additional compression algorithms in ZFS,
 e.g. LZMA, which would need a large record size to see significant
 gains over the existing gzip support. High-strength compression on
 large data chunks would be slow for frequently written data, but for
 an archival filesystem where data is written once and then read
 periodically it would be quite useful.


