Date:      Thu, 2 Aug 2012 20:05:26 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        arch@freebsd.org, David Xu <davidxu@freebsd.org>
Subject:   Re: short read/write and error code
Message-ID:  <20120802170526.GC2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120802222245.D2585@besplex.bde.org>
References:  <5018992C.8000207@freebsd.org> <20120801071934.GJ2676@deviant.kiev.zoral.com.ua> <20120801183240.K1291@besplex.bde.org> <20120801162836.GO2676@deviant.kiev.zoral.com.ua> <20120802040542.G2978@besplex.bde.org> <20120802100240.GV2676@deviant.kiev.zoral.com.ua> <20120802222245.D2585@besplex.bde.org>


On Thu, Aug 02, 2012 at 11:54:43PM +1000, Bruce Evans wrote:
> Please trim quotes!!
Should this request be trimmed ?

>
> On Thu, 2 Aug 2012, Konstantin Belousov wrote:
>
> >On Thu, Aug 02, 2012 at 04:58:49AM +1000, Bruce Evans wrote:
> >>On Wed, 1 Aug 2012, Konstantin Belousov wrote:
> We must return a short write with no SIGPIPE, then SIGPIPE and EPIPE for
> the next write (without writing anything).
Exactly. I am really tired of arguing about this point with David.
He stopped providing any technical reasoning later in the conversation,
so I stopped replying to him.
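
The second half of that sequence (a write that cannot transfer anything
fails with EPIPE) can be observed from userland. This is only a minimal
sketch: epipe_after_reader_close() is a hypothetical name, and SIGPIPE is
ignored for the duration of the call so that the error code is visible
instead of the process being killed.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns the errno produced by writing to a pipe
 * whose read end is already closed, or 0 if the write succeeds.
 */
static int
epipe_after_reader_close(void)
{
	int fds[2], saved = 0;
	void (*old)(int);

	/* Ignore SIGPIPE so that write(2) reports EPIPE to the caller. */
	old = signal(SIGPIPE, SIG_IGN);
	assert(old != SIG_ERR);

	if (pipe(fds) != 0) {
		saved = errno;
		signal(SIGPIPE, old);
		return (saved);
	}
	close(fds[0]);			/* no reader is left */
	if (write(fds[1], "x", 1) < 0)	/* nothing can be written */
		saved = errno;
	close(fds[1]);

	signal(SIGPIPE, old);		/* restore the previous disposition */
	return (saved);
}
```

The short write that precedes this error depends on a reader going away
while a large write is in progress, so it is not reproduced here.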

>
> >For naive programs, which are not aware that stdout can be a
> >pipe (i.e. the original target audience of SIGPIPE), not delivering the
> >signal on a write(2) which writes anything is just fine, since we can
> >make a valid assumption that they would repeat the write, and then
> >get EPIPE/SIGPIPE as intended.
I decided not to trim the paragraph above, possibly getting a reprimand
for excessive quoting :). It is too useful for the context.
>
> This is correct for non-naive programs too, but the assumption isn't
> valid.  Non-naive programs don't understand short writes and typically
> treat them as errors.  Except ones that use stdio -- stdio doesn't
> repeat the whole write, but continues from the point that was
> successfully written up to.
Non-naive programs do understand short writes when they expect that the
file descriptor may not reference a regular file. Really good non-naive
programs understand short writes even to supposedly regular files. Do you
remember the recent install(1) fix?
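
What "understanding short writes" means on the caller's side can be
sketched as a loop that continues from the point successfully written up
to. write_all() is a hypothetical helper, not taken from install(1) or
any library; treating EINTR as retryable is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: returns 0 once all len bytes were written, or -1
 * with errno set.  *done always holds the number of bytes accepted so far.
 */
static int
write_all(int fd, const void *buf, size_t len, size_t *done)
{
	const char *p = buf;

	assert(done != NULL);
	*done = 0;
	while (*done < len) {
		ssize_t n = write(fd, p + *done, len - *done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);		/* e.g. EPIPE, ENOSPC, EIO */
		}
		*done += (size_t)n;	/* short write: continue after it */
	}
	return (0);
}
```

With such a loop a short write on a pipe is followed by a second write(2),
and that second call is the one that reports EPIPE.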

>
> >Anyway, as I said, I very much dislike making the generic I/O layer
> >decide after the filesystem code, and limiting its ability to report
> >errors.
>
> And I very much dislike losing data to report errors.
So the bugs that lose data shall be fixed in the filesystems.
Otherwise, well-behaved filesystems, which return errors only when
it is proper to return an error, are punished.

> The 1000 times as much code is because you have to do this in all
> drivers.  The runtime pessimizations are the useless retries and
> executing the extra code to manage this.  The end result is that the
> dofile* layer never sees the error for short i/o's.  Unlike other
> layers, this layer doesn't pretend to understand short i/o's, so it
> doesn't retry (lower layers could also skip the retry).
I do not understand why the lower layers need to retry at all.
When appropriate, the layer shall set error to 0 if resid advanced at
all. This is already done in a quite reasonable number of cases.
Anyway, this is only important for non-idempotent cases, like non-block
devices or pipes.
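
The rule about resid can be modelled outside the kernel. The following is
only a simplified sketch: struct xfer stands in for the kernel's struct
uio, settle_error() is a hypothetical name, and the rule is applied
unconditionally here, while a real layer applies it only when appropriate.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the kernel's struct uio. */
struct xfer {
	size_t resid;		/* bytes still to be transferred */
};

/*
 * Hypothetical helper: given the resid before the operation and the error
 * produced by it, decide what the caller of the layer sees.
 */
static int
settle_error(const struct xfer *x, size_t resid_before, int error)
{
	assert(x != NULL && x->resid <= resid_before);

	if (error != 0 && x->resid != resid_before)
		return (0);	/* progress was made: report a short count */
	return (error);		/* nothing moved: the error is the result */
}
```

The caller then computes the returned count as resid_before minus the
final resid, and the suppressed error shows up on the next call, which
makes no progress.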

[patch trimmed]
>
> This is quite reasonable, but it only touches EPIPE for pipes, leaving
> the general case of EANY for anyfiletype broken.
Yes, this is good, since each case shall be considered and fixed as
appropriate. In the case of pipe/fifo, this is an actual bug in sys_pipe.c.
Making a hack in dofilewrite() only hides it for write(2), while leaving
the other users of pipe_write() orphaned.

>
> EFAULT is the next easiest error (or even easier) after ENOSPC for
> testing bugs in this area, since the user can generate it.  Consider
> what happens for an EFAULT on the uio for the last of the 8 128K-blocks
> in the above (physio seems to actually use memory mapping, not
> uioread/write()).  The first 7 128K-blocks get written irreversibly
> and you hopefully get EFAULT when mapping the last block.  This EFAULT
> is just as important as EIO, since the underlying file has been changed.
EFAULT is especially special. EFAULT is a user error, and the behaviour
for EFAULT seems to be left undefined by SUSv4.

In fact, newnfs and ufs handle EFAULT properly now, returning a carefully
advanced uio, since the deadlock avoidance code relies on this information
from the lower layers.
>
> Oops, I just remembered the justification for _not_ returning short
> writes or backing out of writes in the middle of a regular file.  It
> is that the failing part may have changed the underlying file, and if
> you return a short write for the successful part then the application
> will only learn that something fails if it retries the failing part.
> A failing write means that the whole extent of the region where the
> write was attempted is indeterminate, and this applies equally to
> regular files and disk files.  Applications just shouldn't attempt to
> write GBs or TBs at a time, since a failure in the middle leaves them
> with either no indication of the extent of the error (if a short count
> is returned) or with the possibility of the whole extent being changed
> to garbage (if an error is returned).  Even in the kernel where both
> the count and the error are returned, there is no way to tell how much
> was turned to garbage beyond the failure point (this depends on details
> of the buffering).  Retrying in a careful application will eventually
> find the extent of the error though.  The necessary care seems to turn
> a simple write() call into at least 50 lines of error handling :-(.
Yes, regular files have the nice property of write idempotence.
This is why returning an error and not a short write is (sometimes) more
useful than returning a short write. Examples are indeed EIO, EFAULT or
ENXIO. Do not deny the filesystems the right to decide how to handle
such situations.
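
For an application, the advice quoted above (do not write GBs at a time)
can be sketched as bounded, positioned writes, so that a failure narrows
the suspect region to a single chunk. write_chunked() and its parameters
are hypothetical names; because writes to a regular file are idempotent,
the caller may simply retry from *good_end.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: writes len bytes at offset off in pieces of at most
 * chunk bytes.  On failure it returns -1 with errno set, and *good_end is
 * the offset up to which pwrite(2) reported success; only the one chunk
 * attempted after *good_end is in an indeterminate state.
 */
static int
write_chunked(int fd, const char *buf, size_t len, off_t off, size_t chunk,
    off_t *good_end)
{
	size_t done = 0;

	assert(chunk > 0 && good_end != NULL);
	*good_end = off;
	while (done < len) {
		size_t want = len - done < chunk ? len - done : chunk;
		ssize_t n = pwrite(fd, buf + done, want, off + (off_t)done);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, try again */
			return (-1);
		}
		done += (size_t)n;
		*good_end = off + (off_t)done;
	}
	return (0);
}
```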
