Date:      Wed, 1 Aug 2012 08:12:06 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        davidxu@freebsd.org
Cc:        Konstantin Belousov <kostikbel@gmail.com>, arch@freebsd.org
Subject:   Re: short read/write and error code
Message-ID:  <D7DC1F82-6CAA-4359-847C-EE89357D8538@bsdimp.com>
In-Reply-To: <5018E1FC.4080609@gmail.com>
References:  <5018992C.8000207@freebsd.org> <20120801071934.GJ2676@deviant.kiev.zoral.com.ua> <5018E1FC.4080609@gmail.com>

On Aug 1, 2012, at 1:59 AM, David Xu wrote:

> On 2012/8/1 15:19, Konstantin Belousov wrote:
>> On Wed, Aug 01, 2012 at 10:49:16AM +0800, David Xu wrote:
>>> POSIX requires write() to return the number of bytes actually written;
>>> the same rule applies to read().
>>>
>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/write.html
>>>> RETURN VALUE
>>>>
>>>> Upon successful completion, write() [XSI] and pwrite() shall
>>>> return the number of bytes actually written to the file associated
>>>> with fildes. This number shall never be greater than nbyte.
>>>> Otherwise, -1 shall be returned and errno set to indicate the error.
>>>
>>> http://pubs.opengroup.org/onlinepubs/009695399/functions/read.html
>>>> RETURN VALUE
>>>>
>>>> Upon successful completion, read() [XSI] and pread() shall return
>>>> a non-negative integer indicating the number of bytes actually read.
>>>> Otherwise, the functions shall return -1 and set errno to indicate
>>>> the error.
>> Note that the wording is only about a successful return, not about the case
>> when an error occurred. I do think that if fo_read() returned an error, and
>> the error is not of the kind 'interruption', then the error shall be returned
>> as is.
> I do think data is more important than the error code.  Do you think that if
> a 512-byte block is bad, all bytes in the block should be thrown away even
> though you could really get some bytes from it?  Those bytes might be very
> important to someone, such as a password or a bank account.  This is just an
> example; whether a filesystem works this way is irrelevant.

You do know that with disk drives it is an all-or-nothing sort of thing
at the sector level.  Either you get the whole thing, or you get none of
it.  There are no partial sector reads, and there's generally no way to
get the data.  Some drives sometimes allow you to access raw tracks, but
those interfaces are never connected to read(); usually they are reached
through an ioctl that issues the special command and returns the results.
And even then, it returns everything (perhaps including the ECC bytes).

> While the program continues to execute, the next read()/write() should
> return -1 and errno will be set.  I think both sockets and pipes already
> work this way; it is dofileread/dofilewrite that prevent it from happening.

Usually it is up to the driver to make this decision.  Most drivers
already return 0 when they've put any data into the buffer.  Cases where
the driver returns an error and resid also indicates that data was
transferred would be vanishingly rare.

>>> I have the following patch to make our code compatible with POSIX:
>> ...
>>
>>> -current only resets the error code to zero for a short write when the
>>> code is ERESTART, EINTR or EWOULDBLOCK.
>>> But this is incorrect, at least for pipes: when EPIPE is returned,
>>> some bytes may have already been written. For a named pipe, I may not
>>> care whether a reader has disappeared, because a new
>>> reader can come in and talk with the writer again, so I need to know
>>> how many bytes have been written. The same applies to a reader: I don't
>>> care that the writer is gone; it can come in again and talk with the
>>> reader. So I suggest removing the surplus code in -current's
>>> dofilewrite() and dofileread().
>> Then fix the pipe code, and do not introduce the behaviour change for all
>> file types?
> See above: I think data is more important than the error code, and the next
> read/write will get the error.
>
>>> For EPIPE, we still deliver SIGPIPE to the current thread, but return
>>> the number of bytes actually written.
>> And this sounds wrong. I think that fixing the code for pipes would also
>> semi-magically make this correct.

Yes.  Pipes are too magical and don't match devices very well.

Warner


