Date:      Wed, 1 Aug 2012 19:23:09 +1000 (EST)
From:      Bruce Evans <brde@optusnet.com.au>
To:        Konstantin Belousov <kostikbel@gmail.com>
Cc:        arch@FreeBSD.org, David Xu <davidxu@FreeBSD.org>
Subject:   Re: short read/write and error code
Message-ID:  <20120801183240.K1291@besplex.bde.org>
In-Reply-To: <20120801071934.GJ2676@deviant.kiev.zoral.com.ua>
References:  <5018992C.8000207@freebsd.org> <20120801071934.GJ2676@deviant.kiev.zoral.com.ua>

On Wed, 1 Aug 2012, Konstantin Belousov wrote:

> On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
>> POSIX requires write() to return actually bytes written, same rule is
>> applied to read().
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
>>> RETURN VALUE
>>>
>>> Upon successful completion, write() [XSI]   and pwrite()  shall
>>> return the number of bytes actually written to the file associated
>>> with fildes. This number shall never be greater than nbyte.
>>> Otherwise, -1 shall be returned and errno set to indicate the error.
>>
>> http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
>>> RETURN VALUE
>>>
>>> Upon successful completion, read() [XSI]   and pread()  shall return
>>> a non-negative integer indicating the number of bytes actually read.
>>> Otherwise, the functions shall return -1 and set errno to indicate
>>> the error.
> Note that the wording is only about successful return, not for the case
> when error occurred. I do think that if fo_read() returned an error, and
> error is not of the kind 'interruption', then the error shall be returned
> as is.

That is clearly not what is intended.  write() is unusable if it won't
tell you how many bytes it wrote.  According to your interpretation,
recalcitrantix would conform to POSIX if all its writes wrote whatever
they could and then returned -1 after detecting the error EPOSIXFUZZY.
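
To make the usability point concrete, here is a minimal sketch of the usual
userland loop that resumes after a short write.  The helper name write_all()
and its donep parameter are hypothetical, not from any library.  The loop is
only correct if a return of -1 means that the failing call wrote nothing,
which is exactly the property that the other interpretation gives up.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write all of buf to fd, resuming after short
 * writes.  Returns 0 on success, or -1 with errno set on error.  In both
 * cases *donep holds the number of bytes that write() reported as written.
 */
static int
write_all(int fd, const void *buf, size_t len, size_t *donep)
{
	const char *p = buf;
	size_t done = 0;
	ssize_t n;

	while (done < len) {
		n = write(fd, p + done, len - done);
		if (n == -1) {
			if (errno == EINTR)
				continue;	/* interrupted before any data; retry */
			/*
			 * Assumption: -1 means this call wrote nothing.  If
			 * the kernel wrote some bytes and still returned -1,
			 * done is too small and the caller cannot tell.
			 */
			*donep = done;
			return (-1);
		}
		done += (size_t)n;	/* short write: resume after it */
	}
	*donep = done;
	return (0);
}
```

A caller that needs to resend or log exactly what was not delivered has
nothing else to go on but the count in *donep.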

The usability is specified for signals.  From an old POSIX draft:

% 51235              If write( ) is interrupted by a signal before it writes any data, it shall return -1 with errno set to
% 51236              [EINTR].
% 51237              If write( ) is interrupted by a signal after it successfully writes some data, it shall return the
% 51238              number of bytes written.

POSIX formally defines "Successfully Transferred", mainly for aio.  I
couldn't find any formal definition of "successfully writes", but clearly
it is nonsense for a write to be unsuccessful if a reader on the local
system or on an external system has successfully read some of the data
written by the write.

FreeBSD does try to convert EINTR to 0 after some data has been written,
to conform to the above.  SIGPIPE should cause EINTR to be returned to
dofilewrite(), so there should be no problem for SIGPIPE.  But we were
reminded of this old FreeBSD bug by problems with SIGPIPE.
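
The conversion can be sketched roughly as follows.  This is an illustration
of the policy described in this thread (clear the error only for ERESTART,
EINTR or EWOULDBLOCK, and only if some data moved), not a copy of the code
in dofilewrite().  The function name and SKETCH_ERESTART are hypothetical.

```c
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for the kernel-internal ERESTART, which userland
 * <errno.h> does not necessarily define.  The value is for this sketch only.
 */
#define SKETCH_ERESTART	(-1)

/*
 * requested is the byte count asked for, resid is what was left
 * untransferred, and error is the result from the lower layer.
 * Returns the error to report; 0 means "report requested - resid".
 */
static int
sketch_fixup_error(size_t requested, size_t resid, int error)
{
	if (error != 0 && resid != requested &&
	    (error == SKETCH_ERESTART || error == EINTR ||
	    error == EWOULDBLOCK))
		error = 0;	/* some data moved: report the short count */
	return (error);
}
```

With this policy, any other error (EPIPE for example) is still passed up
even when resid shows that some data was transferred.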

POSIX contradicts itself by disallowing successful completion if _any_
error is detected:

% 435              RETURN VALUE
% 436                        This section indicates the possible return values, if any.
% 437                        If the implementation can detect errors, ``successful completion'' means that no error
% 438                        has been detected during execution of the function. If the implementation does detect

Recalcitrantix has 2 versions according to which of these contradictions
has precedence.  In one version, writes do as much as possible before
returning -1/EPOSIXFUZZY, as above.  In the other version, this still
happens for most writes.  But ones that are interrupted by a signal after
having written some data return the number of bytes written, according to
the "shall" for the interrupted case.  Perhaps there are some other weird
cases where writes are required to work :-).

>> I have following patch to fix our code to be compatible with POSIX:
> ...
>
>> -current only resets error code to zero for short write when code is
>> ERESTART, EINTR or EWOULDBLOCK.
>> But this is incorrect, at least for pipe, when EPIPE is returned,
>> some bytes may have already been written. For a named pipe, I may don't
>> care a reader is disappeared or not, because for named pipe, a new
>> reader can come in and talk with writer again,  so I need to know
>> how many bytes have been written, same is applied to reader, I don't
>> care writer is gone, it can come in again and talk with reader. So I
>> suggest to remove surplus code in -current's dofilewrite() and
>> dofileread().
> Then fix the pipe code, and not introduce the behaviour change for all
> file types ?

Because returning the error to userland breaks all file types that
want to return a short i/o (mainly special files whose i/o cannot be
backed out of).  They are just detecting and returning an error as a
courtesy to upper layers, and to simplify the implementation.  The
syscall API doesn't permit returning both the error code (the reason
for the short i/o) and the short count, so the error code must be
cleared to allow the short count to be returned.
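
As a rough sketch of that constraint (all names hypothetical, and not the
actual kernel code): the caller gets either a count or -1 plus errno, so
when some data has moved the count has to win and the reason is dropped.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/*
 * Map (requested, resid, error) from the lower layer onto the single
 * value that read()/write() can return.  Sets errno only when returning -1.
 */
static ssize_t
sketch_syscall_result(size_t requested, size_t resid, int error)
{
	size_t done = requested - resid;

	if (error != 0 && done == 0) {
		errno = error;		/* nothing moved: report the reason */
		return (-1);
	}
	/* Something moved, or there was no error: return the count. */
	return ((ssize_t)done);
}
```

For a pipe whose reader has gone away, the reason is not lost for long,
since a retry that can transfer nothing would report it.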

Bruce


