Date:      Tue, 08 Dec 2015 17:42:58 +0100
From:      Dag-Erling Smørgrav <des@des.no>
To:        Warner Losh <imp@bsdimp.com>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>
Subject:   Re: Fwd: DELETE support in the VOP_STRATEGY(9)?
Message-ID:  <864mfssxgt.fsf@desk.des.no>
In-Reply-To: <CANCZdfqHoduhdCss0b6=UsBPAxfRZv4hF8vyuUVLBdP5gYUduQ@mail.gmail.com> (Warner Losh's message of "Tue, 8 Dec 2015 08:43:33 -0700")
References:  <CAH7qZftSVAYPmxNCQy=VVRj79AW7z9ade-0iogv2COfo2x+a2Q@mail.gmail.com> <201512052002.tB5K2ZEA026540@chez.mckusick.com> <CAH7qZfs6ksE+QTMFFLYxY0PNE4hzn=D5skzQ91=gGK2xvndkfw@mail.gmail.com> <86poyhqsdh.fsf@desk.des.no> <CAH7qZftVj9m_yob=AbAQA7fh8yG-VLgM7H0skW3eX_S+v75E-g@mail.gmail.com> <86fuzdqjwn.fsf@desk.des.no> <CANCZdfo=NfKy51+64-F_v+Dh2wkrFYP4gXe=X9RWSSao49gO9g@mail.gmail.com> <CANCZdfqHoduhdCss0b6=UsBPAxfRZv4hF8vyuUVLBdP5gYUduQ@mail.gmail.com>

Warner Losh <imp@bsdimp.com> writes:
> Dag-Erling Smørgrav <des@des.no> writes:
> > Here are a few of our options for implementing FALLOC_FL_PUNCH_HOLE:
> >
> > a) create a filesystem-level hole in the disk image;
> > b) perform a), then issue a BIO_DELETE for the blocks that were
> >    released;
> > c) perform a) or b), then zero the overspill if the requested range is
> >    unaligned;
> > d) zero the entire range;
> > e) perform d) followed by either a) or b);
> > f) nothing at all.
> I don't think f is an option. Unless it is OK to have random contents
> after creating a file, seeking some way into it, and writing a
> byte. When you punch a hole in the file, you should get the same
> semantics as if you'd written up to just before the hole originally,
> then skipped to the end of the punched range and written the rest of
> the file.

I didn't realize there was a spec, so I didn't know what the intended
semantics were.
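
For reference, the behaviour Warner describes (seek past the end of a new file, write a byte, and get zeroes back from the gap) can be checked from userland. The following is only a minimal sketch in Python using plain POSIX calls; the helper name sparse_gap_reads_zero and the 4096-byte gap are made up for illustration, and it says nothing about whether the filesystem actually leaves a hole on disk.

```python
import os
import tempfile


def sparse_gap_reads_zero(gap=4096):
    """Seek `gap` bytes past the start of an empty file, write one byte,
    and return everything read back from offset 0 (hypothetical helper)."""
    fd, path = tempfile.mkstemp()
    try:
        os.lseek(fd, gap, os.SEEK_SET)   # skip over the never-written range
        os.write(fd, b"x")               # file size is now gap + 1
        os.lseek(fd, 0, os.SEEK_SET)
        data = os.read(fd, gap + 1)
    finally:
        os.close(fd)
        os.unlink(path)
    return data


if __name__ == "__main__":
    data = sparse_gap_reads_zero()
    # The never-written range must read back as zeroes, not random contents.
    print(data[:-1] == bytes(len(data) - 1), data[-1:])
```

A punched range is expected to read back the same way as the skipped range here.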

> You are correct, though, that the decision to issue a BIO_DELETE is
> between the filesystem and the storage device. This makes a-e possible
> implementations, but some are stupider than others (which ones depend
> on the situation).

Each of them except f) is the optimal solution for at least one of the
36 cases I outlined, or 18 if you ignore the zvol and device points on
the first axis.

> > Discuss the advantages and drawbacks of each option I listed above
> > for each of the 36 points in the space defined by the following
> > axes:
> > [...]
> > If you think the answer is the same in all cases, you are deluded.
> That's why these decisions are left to the stack.

Define "stack".  Do you mean the entire food chain from the hardware to
the POSIX filesystem API?  By design, no element in the stack has any
knowledge of any other element, beyond the names and dimensions of its
immediate consumers and suppliers (I find "producer" ambiguous).

> The only semantic that is required by the punch hole operation is that
> the filesystem return 0's on reads to that range.  What the filesystem
> does to ensure this is up to the filesystem.

That's easy to say, but each option has advantages and disadvantages
depending on information which is not necessarily available where it is
needed.  A filesystem-level hole results in fragmentation, which can
have a huge performance impact on electromechanical storage but is
negligible on solid-state storage.  But the filesystem does not know
whether the underlying storage is electromechanical or solid-state, nor
does it know whether the user cares much about seek times (unless we
introduce the heuristic "avoid creating holes unless the file already
has them, in which case the userland probably does not care").  Then
again, either the filesystem or the underlying storage *or both* may
have copy-on-write semantics, in which case zeroing is worse than
creating a hole.
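
To make option c) from the list above concrete: an unaligned request splits into a block-aligned middle that can be released (and possibly passed down as BIO_DELETE) and up to two partial blocks of overspill that have to be zeroed. This is only a sketch of the arithmetic in Python; the function name split_punch_range and the block size are assumptions for illustration, not anything taken from an existing filesystem.

```python
def split_punch_range(offset, length, block_size):
    """Split the byte range [offset, offset + length) into
    (head, middle, tail), each a (start, end) tuple or None.

    middle is the block-aligned part that could be deallocated;
    head and tail are the unaligned overspill that must be zeroed.
    Assumes length > 0 and block_size > 0 (hypothetical helper).
    """
    end = offset + length
    first = -(-offset // block_size) * block_size  # offset rounded up
    last = (end // block_size) * block_size        # end rounded down
    if first >= last:
        # No whole block lies inside the range: zeroing is all we can do.
        return (offset, end), None, None
    head = (offset, first) if offset < first else None
    tail = (last, end) if last < end else None
    return head, (first, last), tail


if __name__ == "__main__":
    print(split_punch_range(1000, 10000, 4096))
```

With a 4096-byte block size, a request for 10000 bytes at offset 1000 would release only the single block [4096, 8192) and zero the rest, which is one reason the choice between the options depends on the request as well as on the storage.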

BTW, writing zeroes to NAND flash does not require erasing the block.  I
don't know whether SSDs take advantage of that to avoid unnecessarily
reallocating or erasing a block, nor whether they automatically release
and erase blocks that end up being completely zeroed.

DES
-- 
Dag-Erling Smørgrav - des@des.no


