Date:      Wed, 1 Aug 2012 19:28:36 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Bruce Evans <brde@optusnet.com.au>
Cc:        arch@freebsd.org, David Xu <davidxu@freebsd.org>
Subject:   Re: short read/write and error code
Message-ID:  <20120801162836.GO2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120801183240.K1291@besplex.bde.org>
References:  <5018992C.8000207@freebsd.org> <20120801071934.GJ2676@deviant.kiev.zoral.com.ua> <20120801183240.K1291@besplex.bde.org>

On Wed, Aug 01, 2012 at 07:23:09PM +1000, Bruce Evans wrote:
> On Wed, 1 Aug 2012, Konstantin Belousov wrote:
>
> >On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
> >>POSIX requires write() to return the number of bytes actually written;
> >>the same rule applies to read().
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, write() [XSI]   and pwrite()  shall
> >>>return the number of bytes actually written to the file associated
> >>>with fildes. This number shall never be greater than nbyte.
> >>>Otherwise, -1 shall be returned and errno set to indicate the error.
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, read() [XSI]   and pread()  shall return
> >>>a non-negative integer indicating the number of bytes actually read.
> >>>Otherwise, the functions shall return -1 and set errno to indicate
> >>>the error.
> >Note that the wording is only about successful return, not about the case
> >when an error occurred. I do think that if fo_read() returned an error, and
> >the error is not of the kind 'interruption', then the error shall be returned
> >as is.
>
> That is clearly not what is intended.  write() is unusable if it won't
> tell you how many bytes it wrote.  According to your interpretation,
> recalcitrantix would conform to POSIX if all its writes wrote whatever
> they could and then returned -1 after detecting the error EPOSIXFUZZY.
I think this is obviously far-fetched, because no useful implementation
would insert an _artificial_ error.

>
> The usability is specified for signals.  From an old POSIX draft:
>
> % 51235  If write( ) is interrupted by a signal before it writes any data, it shall return -1 with errno set to
> % 51236  [EINTR].
> % 51237  If write( ) is interrupted by a signal after it successfully writes some data, it shall return the
> % 51238  number of bytes written.
This is exactly what existing code does.
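
A minimal userland sketch of that rule, assuming the behaviour described
elsewhere in this thread (the error is cleared for a short transfer only
when it is ERESTART, EINTR or EWOULDBLOCK). This is a model, not the
dofilewrite() source; model_write_result() and model_examples() are
hypothetical names, and ERESTART is given a placeholder value where the
userland headers do not define it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Assumption: ERESTART is kernel-internal on some systems; use a placeholder. */
#ifndef ERESTART
#define ERESTART (-1)
#endif

/*
 * Model of the generic-layer rule. 'error' is what the fo_write-like
 * backend returned, 'cnt' is the requested size, 'resid' is what was
 * left untransferred. Returns the value the syscall would return:
 * a byte count, or -1 with *errp holding the errno value.
 */
static ssize_t
model_write_result(int error, size_t cnt, size_t resid, int *errp)
{
	if (error != 0 && resid != cnt &&
	    (error == ERESTART || error == EINTR || error == EWOULDBLOCK))
		error = 0;	/* partial transfer: report the count instead */
	*errp = error;
	return (error != 0 ? -1 : (ssize_t)(cnt - resid));
}

/* The cases discussed in this thread. */
void
model_examples(void)
{
	int e;

	/* Interrupted after 60 of 100 bytes: short count, no error. */
	assert(model_write_result(EINTR, 100, 40, &e) == 60 && e == 0);
	/* Interrupted before any data was written: -1/EINTR. */
	assert(model_write_result(EINTR, 100, 100, &e) == -1 && e == EINTR);
	/* EPIPE after a partial transfer: the error is returned as is. */
	assert(model_write_result(EPIPE, 100, 40, &e) == -1 && e == EPIPE);
}
```

The third case is the one under dispute here: under this rule the partial
count is not reported when the error is EPIPE.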

>
> POSIX formally defines "Successfully Transferred", mainly for aio.  I
> couldn't find any formal definition of "successfully writes", but clearly
> it is nonsense for a write to be unsuccessful if a reader on the local
> system or on an external system has successfully read some of the data
> written by the write.
>
> FreeBSD does try to convert EINTR to 0 after some data has been written,
> to conform to the above.  SIGPIPE should return EINTR to be returned to
> dofilewrite(), so there should be no problem for SIGPIPE.  But we were
> reminded of this old FreeBSD bug by problems with SIGPIPE.
Sorry, I do not understand this, especially the second sentence.

As I said, the patch's behaviour with regard to SIGPIPE is just wrong.
>
> POSIX contradicts itself by disallowing successful completion if _any_
> error is detected:
>
> % 435  RETURN VALUE
> % 436      This section indicates the possible return values, if any.
> % 437      If the implementation can detect errors, ``successful completion'' means that no error
> % 438      has been detected during execution of the function. If the implementation does detect
>
> Recalcitrantix has 2 versions according to which of these contradictions
> has precedence.  In one version, writes do as much as possible before
> returning -1/EPOSIXFUZZY, as above.  In the other version, this still
> happens for most writes.  But ones that are interrupted by a signal after
> having written some data return the number of bytes written, according to
> the "shall" for the interrupted case.  Perhaps there are some other weird
> cases where writes are required to work :-).
>
> >>I have the following patch to fix our code to be compatible with POSIX:
> >...
> >
> >>-current only resets the error code to zero for a short write when the
> >>code is ERESTART, EINTR or EWOULDBLOCK.
> >>But this is incorrect, at least for pipes: when EPIPE is returned,
> >>some bytes may have already been written. For a named pipe, I may not
> >>care whether a reader has disappeared or not, because a new
> >>reader can come in and talk with the writer again, so I need to know
> >>how many bytes have been written. The same applies to the reader: I don't
> >>care that the writer is gone, it can come in again and talk with the
> >>reader. So I suggest removing the surplus code in -current's
> >>dofilewrite() and dofileread().
> >Then fix the pipe code, and do not introduce the behaviour change for all
> >file types?
>
> Because returning the error to userland breaks all file types that
> want to return a short i/o (mainly special files whose i/o cannot be
> backed out of).  They are just detecting and returning an error as a
> courtesy to upper layers, and to simplify the implementation.  The
> syscall API doesn't permit returning both the error code (the reason
> for the short i/o) and the short count, so the error code must be
> cleared to allow the short count to be returned.
No, there is only one sane behaviour for fo_read and fo_write: either
return no error (or an interruption error) and adjust resid, or return an
error. Both returning an error and adjusting resid is just wrong.
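
A minimal sketch of this contract, as a toy userland model rather than
kernel code. struct toy_uio, struct toy_file, toy_fo_write() and
toy_examples() are all invented for illustration; the real struct uio and
fo_write have different shapes. The point shown is only that each call
either consumes from resid and returns 0, or returns an error and leaves
resid untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct uio: a source buffer and a residual count. */
struct toy_uio {
	const char *buf;
	size_t resid;
};

/* Hypothetical backend with a small fixed capacity. */
struct toy_file {
	char data[8];
	size_t used;
	int broken;	/* nonzero simulates a hard failure */
};

static int
toy_fo_write(struct toy_file *f, struct toy_uio *uio)
{
	size_t room, n;

	if (f->broken)
		return (EIO);		/* error: resid is not touched */
	room = sizeof(f->data) - f->used;
	n = uio->resid < room ? uio->resid : room;
	if (n == 0 && uio->resid > 0)
		return (ENOSPC);	/* nothing transferred: resid is not touched */
	memcpy(f->data + f->used, uio->buf, n);
	f->used += n;
	uio->buf += n;
	uio->resid -= n;
	return (0);			/* full or short transfer: resid says how much */
}

void
toy_examples(void)
{
	struct toy_file f = { .used = 0, .broken = 0 };
	struct toy_uio u = { "helloworld", 10 };

	/* Short write: no error, 8 bytes consumed, 2 left in resid. */
	assert(toy_fo_write(&f, &u) == 0 && u.resid == 2);
	/* No room left: error returned, resid unchanged. */
	assert(toy_fo_write(&f, &u) == ENOSPC && u.resid == 2);
}
```

With a backend that behaves this way, the generic layer never sees an
error together with a changed resid, except for the interruption case
mentioned above, which this toy does not model.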

The proposed patch makes the generic i/o layer much less flexible, and
probably prevents the implementation of things like transactional writes.

We should fix sys_pipe.c and not require filesystems to roll back the uio
into an inconsistent state to report errors (since rolling back into a
consistent state is typically impossible but is required after the patch).

>
> Bruce



