Date:      Fri, 15 Apr 2011 16:31:30 +0200
From:      Pawel Jakub Dawidek <pjd@FreeBSD.org>
To:        mdf@FreeBSD.org
Cc:        FreeBSD Arch <freebsd-arch@freebsd.org>
Subject:   Re: posix_fallocate(2)
Message-ID:  <20110415143130.GC4526@garage.freebsd.pl>
In-Reply-To: <BANLkTimYzJ11w9X1OHShEn2wi6gjHx=YjA@mail.gmail.com>
References:  <BANLkTimYzJ11w9X1OHShEn2wi6gjHx=YjA@mail.gmail.com>

On Thu, Apr 14, 2011 at 12:35:34PM -0700, mdf@FreeBSD.org wrote:
> For work we need functionality in our filesystem that is pretty much
> like posix_fallocate(2), so we're using the name, and I've added a
> default VOP_ALLOCATE definition that does the right, but dumb, thing.
>
> The most recent mention of this function in FreeBSD was another thread
> lamenting its failure to exist:
> http://lists.freebsd.org/pipermail/freebsd-ports/2010-February/059268.html
>
> The attached files are the core of the kernel implementation of the
> syscall and a default VOP for any filesystem not supporting
> VOP_ALLOCATE, which allows the syscall to work as expected but in a
> non-performant manner.  I didn't see this syscall in NetBSD or
> OpenBSD, so I plan to add it to the end of our syscall table.
>
> What I wanted to check with -arch about was:
>
> 1) is there still a desire for this syscall?
> 2) is this naive implementation useful enough to serve as a default
> for all filesystems until someone with more knowledge fills them in?
> 3) are there any obvious bugs or missing elements?

As I understand it, you have two cases to consider:
1. The caller wants to reserve space in a region that might be a hole, so
   we read and rewrite this region.
2. The caller wants to reserve space beyond the file size. We need to write
   zeros there.
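The two cases can be told apart from the request alone. A minimal sketch, assuming a hypothetical helper `split_request()` (not part of the proposed patch), that splits [offset, offset + len) at the current file size:

```c
#include <sys/types.h>

/*
 * Hypothetical helper: split an allocation request [offset, offset + len)
 * into the part inside the current file (case 1: may contain holes) and
 * the part beyond EOF (case 2: must be filled with zeros).
 */
void split_request(off_t offset, off_t len, off_t filesize,
                   off_t *in_file, off_t *beyond_eof)
{
    off_t end = offset + len;

    if (offset >= filesize) {
        /* Whole request lies past EOF. */
        *in_file = 0;
        *beyond_eof = len;
    } else if (end <= filesize) {
        /* Whole request lies inside the file. */
        *in_file = len;
        *beyond_eof = 0;
    } else {
        /* Request straddles EOF. */
        *in_file = filesize - offset;
        *beyond_eof = end - filesize;
    }
}
```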

For the first case, I don't see the point of rewriting a block if it
contains data that are not all zeros. A hole can contain only zeros, so
there is room for optimization right there: skip the write step if the
data are not all zeros. Of course, you need some way to learn the
smallest block size the file system uses.
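A userspace sketch of that optimization: read each block and write back only the all-zero ones, since a block with any non-zero byte cannot be a hole. The names `is_all_zero()` and `rewrite_zero_blocks()` are hypothetical; the block size `bs` is assumed to be supplied by the caller (how to obtain it is left open above), and the range is assumed to be block-aligned and within the file.

```c
#include <sys/types.h>
#include <stdlib.h>
#include <unistd.h>

/* Return 1 if every byte in buf[0..n) is zero, 0 otherwise. */
int is_all_zero(const unsigned char *buf, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (buf[i] != 0)
            return 0;
    return 1;
}

/*
 * Walk [offset, offset + len) in bs-sized blocks. Blocks holding data are
 * skipped; only all-zero blocks (possible holes) are written back.
 * Returns the number of blocks rewritten, or -1 on error.
 */
long rewrite_zero_blocks(int fd, off_t offset, off_t len, size_t bs)
{
    unsigned char *buf = malloc(bs);
    long rewritten = 0;
    off_t end = offset + len;

    if (buf == NULL)
        return -1;
    for (off_t pos = offset; pos < end; pos += (off_t)bs) {
        size_t n = (end - pos < (off_t)bs) ? (size_t)(end - pos) : bs;
        ssize_t got = pread(fd, buf, n, pos);

        if (got < 0) {
            free(buf);
            return -1;
        }
        if (got == 0)
            break;                  /* reached EOF */
        if (!is_all_zero(buf, (size_t)got))
            continue;               /* data present: cannot be a hole */
        if (pwrite(fd, buf, (size_t)got, pos) != got) {
            free(buf);
            return -1;
        }
        rewritten++;
    }
    free(buf);
    return rewritten;
}
```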

In the case of ZFS, overwriting a hole with zeros won't reserve the space
if you have compression turned on: ZFS internally turns all-zero blocks
into holes when compression is enabled.

The first case would be better implemented using SEEK_HOLE/SEEK_DATA.
Those are not implemented in UFS yet, but they would allow you to find
the holes in the file and overwrite just them. A general-purpose
implementation could then avoid reading entirely, and most of the writes
as well. You could also add a flag to VFS_SET(9) to mark file systems
that support holes; if a file system doesn't support holes, the first
case can be skipped.
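With SEEK_HOLE/SEEK_DATA the general-purpose path needs no reads at all. A minimal sketch, assuming a platform whose lseek(2) supports both whence values (modern Linux does; UFS did not at the time of this thread); `fill_holes()` and `write_zeros()` are hypothetical names. A file system without hole support reports the whole file as data with a single virtual hole at EOF, so the loop writes nothing, which matches skipping the first case.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE             /* exposes SEEK_DATA/SEEK_HOLE on glibc */
#endif
#include <sys/types.h>
#include <sys/stat.h>
#include <errno.h>
#include <stdlib.h>
#include <unistd.h>

#ifndef SEEK_DATA               /* same values on FreeBSD and Linux */
#define SEEK_DATA 3
#define SEEK_HOLE 4
#endif

#define ZCHUNK 65536

/* Write zeros over [start, end) without reading anything first. */
static int write_zeros(int fd, off_t start, off_t end)
{
    unsigned char *zeros = calloc(1, ZCHUNK);

    if (zeros == NULL)
        return -1;
    while (start < end) {
        size_t n = (end - start < ZCHUNK) ? (size_t)(end - start) : ZCHUNK;
        ssize_t w = pwrite(fd, zeros, n, start);

        if (w <= 0) {
            free(zeros);
            return -1;
        }
        start += w;
    }
    free(zeros);
    return 0;
}

/*
 * Fill only the holes inside [offset, offset + len), clamped to EOF.
 * Data regions are skipped via SEEK_DATA. Moves the file offset.
 * Returns the number of bytes written, or -1 on error.
 */
off_t fill_holes(int fd, off_t offset, off_t len)
{
    struct stat st;
    off_t pos = offset, end = offset + len, written = 0;

    if (fstat(fd, &st) < 0)
        return -1;
    if (end > st.st_size)
        end = st.st_size;
    while (pos < end) {
        off_t hole = lseek(fd, pos, SEEK_HOLE);
        if (hole < 0)
            return -1;
        if (hole >= end)
            break;                  /* no more holes in range */
        off_t data = lseek(fd, hole, SEEK_DATA);
        if (data < 0) {
            if (errno != ENXIO)
                return -1;
            data = st.st_size;      /* hole runs to EOF */
        }
        if (data > end)
            data = end;
        if (write_zeros(fd, hole, data) < 0)
            return -1;
        written += data - hole;
        pos = data;
    }
    return written;
}
```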

For the second case, I find it wasteful to first extend the file size and
then read those zeros back. Why not just write zeros and skip the read
step when you are extending the file?
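A sketch of the second case without a read step, assuming a hypothetical helper `extend_with_zeros()`: nothing beyond EOF can hold data, so it simply writes zeros from the current EOF up to the requested end.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <stdlib.h>
#include <unistd.h>

#define ZCHUNK 65536

/*
 * Grow the file to new_end by writing zeros from the current EOF.
 * Returns the number of bytes written (0 if the file is already large
 * enough), or -1 on error.
 */
off_t extend_with_zeros(int fd, off_t new_end)
{
    struct stat st;
    unsigned char *zeros;
    off_t pos;

    if (fstat(fd, &st) < 0)
        return -1;
    if (new_end <= st.st_size)
        return 0;
    zeros = calloc(1, ZCHUNK);
    if (zeros == NULL)
        return -1;
    for (pos = st.st_size; pos < new_end; ) {
        size_t n = (new_end - pos < ZCHUNK) ? (size_t)(new_end - pos) : ZCHUNK;
        ssize_t w = pwrite(fd, zeros, n, pos);

        if (w <= 0) {
            free(zeros);
            return -1;
        }
        pos += w;
    }
    free(zeros);
    return new_end - st.st_size;
}
```

As noted above for ZFS with compression enabled, such zeros may still be stored as holes, so this only reserves space on file systems that store zero blocks literally.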

--
Pawel Jakub Dawidek                       http://www.wheelsystems.com
FreeBSD committer                         http://www.FreeBSD.org
Am I Evil? Yes, I Am!                     http://yomoli.com
