Date:      Mon, 12 Feb 2018 09:04:57 -0800
From:      John Baldwin <jhb@freebsd.org>
To:        freebsd-current@freebsd.org
Cc:        Garrett Wollman <wollman@hergotha.csail.mit.edu>, asomers@freebsd.org
Subject:   Re: posix_fallocate on ZFS
Message-ID:  <1868530.6C5Wu4I1lN@ralph.baldwin.cx>
In-Reply-To: <201802101846.w1AIkX4Y000167@hergotha.csail.mit.edu>
References:  <CAOtMX2jZr_kvJgOZWeiB-AZ3-7-uUu%2BUQ3P0nKhGZ0eNRzwMOQ@mail.gmail.com> <1e2f43fd-85da-6629-62d1-6e96790278e5@digiware.nl> <201802101846.w1AIkX4Y000167@hergotha.csail.mit.edu>

On Saturday, February 10, 2018 01:46:33 PM Garrett Wollman wrote:
> In article
> <CAOtMX2jZr_kvJgOZWeiB-AZ3-7-uUu+UQ3P0nKhGZ0eNRzwMOQ@mail.gmail.com>,
> asomers@freebsd.org writes:
> 
> >On Sat, Feb 10, 2018 at 10:28 AM, Willem Jan Withagen <wjw@digiware.nl>
> >wrote:
> 
> >> Is there any expectation that this is going to be fixed in the near future?
> 
> >No.  It's fundamentally impossible to support posix_fallocate on a COW
> >filesystem like ZFS.  Ceph should be taught to ignore an EINVAL result,
> >since the system call is merely advisory.
> 
> I don't think it's true that this is _fundamentally_ impossible.  What
> the standard requires would in essence be a per-object refreservation.
> ZFS supports refreservation, obviously, but not on a per-object basis.
> Furthermore, there are mechanisms to preallocate blocks for things
> like dumps.  So it *could* be done (as in, the concept is there), but
> it may not be practical.  (And ultimately, there are ways in which the
> administrator might manage the system that would defeat the desired
> effect, but that's out of the standard's scope.)  Given the semantic
> mismatch, though, I suspect it's unreasonable to expect anyone to
> prioritize implementation of such a feature.
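
The suggestion quoted above, that a caller such as Ceph treat EINVAL as
"preallocation not supported here" instead of as a hard failure, could
look roughly like the sketch below.  It is only an illustration: the
helper name and the injectable `fallocate` parameter are hypothetical,
and accepting EOPNOTSUPP as well is an assumption about other systems,
not something stated in this thread.  Whether ignoring the error is
acceptable depends on whether the caller actually relies on the space
guarantee.

```python
import errno
import os


def preallocate_if_supported(fd, offset, length, fallocate=None):
    """Try to preallocate space; return True on success, False if the
    filesystem reports that it cannot do it.

    `fallocate` is a hypothetical injection point for testing.  When it is
    None, os.posix_fallocate is used (present on most Unix platforms).
    """
    # POSIX also uses EINVAL for bad arguments, so reject those up front;
    # after this check an EINVAL should mean "unsupported on this filesystem".
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    if fallocate is None:
        fallocate = os.posix_fallocate
    try:
        fallocate(fd, offset, length)
        return True
    except OSError as exc:
        # EINVAL is what the thread reports for ZFS; EOPNOTSUPP is an
        # assumption added for other systems.
        if exc.errno in (errno.EINVAL, errno.EOPNOTSUPP):
            return False
        raise  # ENOSPC, EBADF, etc. remain real errors
```

A caller would carry on without preallocation when the helper returns
False, and still see ENOSPC and similar errors as exceptions.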

I don't think posix_fallocate() can be made compatible with COW.  Suppose
you do reserve a fixed set of blocks.  That guarantees the first write to
each block has somewhere to go, but not a later overwrite of one of those
blocks, since COW writes the new data to a fresh block.  To keep the
guarantee you would either have to reserve a replacement block every time
you wrote to a block, or have a way to mark a file as not COW.  The first
option isn't really any better than not calling posix_fallocate() at all,
since writes still have to allocate blocks.  The second seems fraught with
peril as well: presumably non-COW'd files couldn't be snapshotted, etc.,
which is a problem if the application expects the non-COW'd file to stay
in sync with other files in the system.
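
The overwrite problem can be shown with a toy model.  This is a sketch
under loud assumptions, not how ZFS accounts for space: the class and its
methods are made up for illustration, every write (including an
overwrite) takes a fresh block, and old blocks are never returned to the
pool, as if a snapshot were holding them.

```python
import errno


class ToyCowPool:
    """Toy copy-on-write pool with one per-file reservation (hypothetical)."""

    def __init__(self, total_blocks):
        self.free = total_blocks   # unreserved free blocks
        self.reserved = 0          # blocks set aside for our file
        self.written = set()       # logical block indexes written so far

    def reserve(self, count):
        """Set aside `count` blocks, as a per-file reservation might."""
        if count > self.free:
            raise OSError(errno.ENOSPC, "pool full")
        self.free -= count
        self.reserved += count

    def consume_free(self, count):
        """Model other users of the pool eating the unreserved space."""
        if count > self.free:
            raise OSError(errno.ENOSPC, "pool full")
        self.free -= count

    def write(self, index):
        """COW write: needs a fresh block even when `index` already exists.

        Assumption: the old block stays pinned (e.g. by a snapshot), so
        an overwrite never gives space back.
        """
        if self.reserved:
            self.reserved -= 1
        elif self.free:
            self.free -= 1
        else:
            raise OSError(errno.ENOSPC, "no block for copy-on-write")
        self.written.add(index)


def demo():
    pool = ToyCowPool(total_blocks=8)
    pool.reserve(4)        # "preallocate" a 4-block file
    pool.consume_free(4)   # the rest of the pool fills up
    first_writes = 0
    for index in range(4):
        pool.write(index)  # first writes are covered by the reservation
        first_writes += 1
    try:
        pool.write(0)      # overwrite: the reservation is already spent
    except OSError as exc:
        return first_writes, exc.errno
    return first_writes, None
```

In this model demo() completes the four initial writes and then fails the
overwrite of block 0 with ENOSPC, which is exactly the case a fixed
reservation does not cover.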

-- 
John Baldwin


