Date: Wed, 1 Aug 2012 19:28:36 +0300
From: Konstantin Belousov <kostikbel@gmail.com>
To: Bruce Evans <brde@optusnet.com.au>
Cc: arch@freebsd.org, David Xu <davidxu@freebsd.org>
Subject: Re: short read/write and error code
Message-ID: <20120801162836.GO2676@deviant.kiev.zoral.com.ua>
In-Reply-To: <20120801183240.K1291@besplex.bde.org>
References: <5018992C.8000207@freebsd.org> <20120801071934.GJ2676@deviant.kiev.zoral.com.ua> <20120801183240.K1291@besplex.bde.org>

On Wed, Aug 01, 2012 at 07:23:09PM +1000, Bruce Evans wrote:
> On Wed, 1 Aug 2012, Konstantin Belousov wrote:
>
> >On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
> >>POSIX requires write() to return actually bytes written, same rule is
> >>applied to read().
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, write() [XSI] and pwrite() shall
> >>>return the number of bytes actually written to the file associated
> >>>with fildes. This number shall never be greater than nbyte.
> >>>Otherwise, -1 shall be returned and errno set to indicate the error.
> >>
> >>http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
> >>>RETURN VALUE
> >>>
> >>>Upon successful completion, read() [XSI] and pread() shall return
> >>>a non-negative integer indicating the number of bytes actually read.
> >>>Otherwise, the functions shall return -1 and set errno to indicate
> >>>the error.
> >Note that the wording is only about successful return, not for the case
> >when error occurred. I do think that if fo_read() returned an error, and
> >error is not of the kind 'interruption', then the error shall be returned
> >as is.
>
> That is clearly not what is intended. write() is unusable if it won't
> tell you how many bytes it wrote. According to your interpretation,
> recalcitrantix would conform to POSIX if all its writes wrote whatever
> they could and then returned -1 after detecting the error EPOSIXFUZZY.

I think this is obviously a stretch, because no useful implementation
would insert an _artificial_ error.

>
> The usability is specified for signals. From an old POSIX draft:
>
> % 51235 If write( ) is interrupted by a signal before it
> writes any data, it shall return -1 with errno set to
> % 51236 [EINTR].
> % 51237 If write( ) is interrupted by a signal after it
> successfully writes some data, it shall return the
> % 51238 number of bytes written.

This is exactly what the existing code does.

>
> POSIX formally defines "Successfully Transferred", mainly for aio. I
> couldn't find any formal definition of "successfully writes", but clearly
> it is nonsense for a write to be unsuccessful if a reader on the local
> system or on an external system has successfully read some of the data
> written by the write.
>
> FreeBSD does try to convert EINTR to 0 after some data has been written,
> to conform to the above. SIGPIPE should return EINTR to be returned to
> dofilewrite(), so there should be no problem for SIGPIPE. But we were
> reminded of this old FreeBSD bug by problems with SIGPIPE.

Sorry, I do not understand this, especially the second sentence. As I
said, the patch's behaviour with regard to SIGPIPE is just wrong.

>
> POSIX contradicts itself by disallowing successful completion if _any_
> error is detected:
>
> % 435 RETURN VALUE
> % 436 This section indicates the possible return
> values, if any.
> % 437 If the implementation can detect errors,
> ``successful completion'' means that no error
> % 438 has been detected during execution of the
> function. If the implementation does detect
>
> Recalcitrantix has 2 versions according to which of these contradictions
> has precedence. In one version, writes do as much as possible before
> returning -1/EPOSIXFUZZY, as above. In the other version, this still
> happens for most writes. But ones that are interrupted by a signal after
> having written some data return the number of bytes written, according to
> the "shall" for the interrupted case. Perhaps there are some other weird
> cases where writes are required to work :-).
>
> >>I have following patch to fix our code to be compatible with POSIX:
> >...
> >
> >>-current only resets error code to zero for short write when code is
> >>ERESTART, EINTR or EWOULDBLOCK.
> >>But this is incorrect, at least for pipe, when EPIPE is returned,
> >>some bytes may have already been written. For a named pipe, I may don't
> >>care a reader is disappeared or not, because for named pipe, a new
> >>reader can come in and talk with writer again, so I need to know
> >>how many bytes have been written, same is applied to reader, I don't
> >>care writer is gone, it can come in again and talk with reader. So I
> >>suggest to remove surplus code in -current's dofilewrite() and
> >>dofileread().
> >Then fix the pipe code, and not introduce the behaviour change for all
> >file types ?
>
> Because returning the error to userland breaks all file types that
> want to return a short i/o (mainly special files whose i/o cannot be
> backed out of). They are just detecting and returning an error as a
> courtesy to upper layers, and to simplify the implementation. The
> syscall API doesn't permit returning both the error code (the reason
> for the short i/o) and the short count, so the error code must be
> cleared to allow the short count to be returned.

No, there is only one sane behaviour for fo_read and fo_write: either
return no error (or an interruption error) and adjust resid, or return
an error. Both returning an error and adjusting resid is just wrong.
The proposed patch makes the generic i/o layer much less flexible, and
probably prevents implementing things like transactional writes.
We should fix sys_pipe.c and not require filesystems to roll back the uio
into an inconsistent state to report errors (since rolling back into a
consistent state is typically impossible but is required after the patch).
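
To make the rule under discussion concrete, the check that -current is
described above as applying to a short transfer can be sketched in
userland C roughly as follows. This is only an illustration under stated
assumptions, not the kernel source: the helper name short_io_error() and
its parameters are hypothetical, and the ERESTART fallback value is a
stand-in for platforms whose userland headers do not expose it.

```c
#include <errno.h>
#include <stddef.h>

/*
 * ERESTART is kernel-internal on some systems and may be missing from
 * userland headers; this fallback value is an assumption for the sketch.
 */
#ifndef ERESTART
#define ERESTART (-1)
#endif

/*
 * Hypothetical helper (not a kernel function): decide which error the
 * syscall layer reports, given the error from the lower layer and the
 * residual byte counts before and after the transfer.
 */
int
short_io_error(int error, size_t resid_before, size_t resid_after)
{
	/*
	 * Some bytes moved and the error is an interruption-type one:
	 * clear the error so the short count can be returned instead.
	 */
	if (error != 0 && resid_after != resid_before &&
	    (error == ERESTART || error == EINTR || error == EWOULDBLOCK))
		return (0);

	/* Any other error is passed up unchanged. */
	return (error);
}
```

With this rule, EPIPE after a partial transfer is passed up unchanged,
which is the case the proposed patch wants to change for all file types.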
>
> Bruce
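
For reference, the caller's side of the same question can be sketched as
a loop that depends on write() reporting a short count. This is a minimal
sketch, assuming a blocking descriptor; write_all() and its done
parameter are hypothetical names, not an existing API.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all len bytes of buf to fd, resuming after
 * short writes. Returns 0 on success. On failure returns -1 with errno
 * set by write(); *done then holds the bytes reported as written so far.
 */
int
write_all(int fd, const void *buf, size_t len, size_t *done)
{
	const char *p = buf;

	*done = 0;
	while (*done < len) {
		ssize_t n = write(fd, p + *done, len - *done);

		if (n < 0) {
			/* Interrupted before any data was written: retry. */
			if (errno == EINTR)
				continue;
			return (-1);
		}
		/* Short or full write: advance by the reported count. */
		*done += (size_t)n;
	}
	return (0);
}
```

If the call fails, *done holds only the bytes that earlier write() calls
reported. Anything transferred by the failing call itself is not visible
to the caller, which is the information loss this thread is arguing about.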