Date:      Wed, 20 Apr 2005 11:52:33 -0400
From:      Brian Fundakowski Feldman <green@freebsd.org>
To:        Marc Olzheim <marcolz@stack.nl>
Cc:        freebsd-current@freebsd.org
Subject:   Re: NFS client/buffer cache deadlock
Message-ID:  <20050420155233.GJ1157@green.homeunix.org>
In-Reply-To: <20050420153528.GC77731@stack.nl>
References:  <20050419151800.GE1157@green.homeunix.org> <20050419160258.GA12287@stack.nl> <20050419160900.GB12287@stack.nl> <20050419161616.GF1157@green.homeunix.org> <20050419204723.GG1157@green.homeunix.org> <20050420140409.GA77731@stack.nl> <20050420142448.GH1157@green.homeunix.org> <20050420143842.GB77731@stack.nl> <20050420152038.GI1157@green.homeunix.org> <20050420153528.GC77731@stack.nl>

On Wed, Apr 20, 2005 at 05:35:28PM +0200, Marc Olzheim wrote:
> On Wed, Apr 20, 2005 at 11:20:38AM -0400, Brian Fundakowski Feldman wrote:
> > Reads should be totally unaffected...
> 
> The server was misbehaving. Fixed. :-)
> 
> > > Btw.: I'm not sure write(), writev() and pwrite() are allowed to do
> > > short writes on regular files...?
> > 
> > Our manpage is incorrect; POSIX states that they are (see earlier
> > e-mail).  There really is no alternative -- we simply can't build
> > an NFS transaction larger than our buffer cache can accommodate.
> > Note that short writes won't happen for normal buffer sizes, only
> > excessively large ones.  I really don't believe that writev() is meant
> > to be used so that you can write gigantic data structures in a single
> > transaction...
> 
> Ah, I was reading the SUSv2 page:
> 
> http://www.opengroup.org/onlinepubs/009695399/functions/write.html
> 
> instead of the POSIX version.
> 
> But from neither of those can I deduce that it can return
> with result < nbyte without that being a permanent condition.
> What phrase makes you conclude that it can?

This specific issue is not clear-cut; the best thing to do lies somewhere
within the range of these scenarios:

"If a write() requests that more bytes be written than there is room
for (for example, [XSI] [Option Start] the process' file size limit
or [Option End] the physical end of a medium), only as many bytes as
there is room for shall be written. For example, suppose there is
space for 20 bytes more in a file before reaching a limit. A write of
512 bytes will return 20. The next write of a non-zero number of bytes
would give a failure return (except as noted below)."

"When attempting to write to a file descriptor (other than a pipe or
FIFO) that supports non-blocking writes and cannot accept the data
immediately:

    * If the O_NONBLOCK flag is clear, write() shall block the calling
    thread until the data can be accepted.

    * If the O_NONBLOCK flag is set, write() shall not block the
    thread. If some data can be written without blocking the thread,
    write() shall write what it can and return the number of bytes
    written. Otherwise, it shall return -1 and set errno to [EAGAIN]."

"[ENOBUFS] Insufficient resources were available in the system to
perform the operation."

I think the first is more useful behavior than the last.  Supporting it
should be exactly the same as supporting what happens if the actual
filesystem fills up.  In this case, the filesystem is being requested to
write more "than there is room for."
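
For callers, coping with a short write is the usual retry loop, the same
one that already handles a nearly full filesystem.  A minimal sketch,
assuming a hypothetical helper named write_all() (not an existing API)
and a blocking descriptor; giving up with EIO on a zero return is an
arbitrary choice made here, not something the standard prescribes:

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * write_all() is a hypothetical helper, not part of any system API.
 * It keeps calling write(2) until all len bytes have been accepted or
 * an error occurs.  Returns 0 on success, -1 with errno set on failure.
 * If written is non-NULL, *written reports how many bytes went out,
 * which is also meaningful after a failure.
 */
static int
write_all(int fd, const void *buf, size_t len, size_t *written)
{
	const char *p = buf;
	size_t done = 0;
	int rv = 0;

	while (done < len) {
		ssize_t n = write(fd, p + done, len - done);

		if (n > 0) {
			/* Possibly a short write: advance and try the rest. */
			done += (size_t)n;
			continue;
		}
		if (n < 0 && errno == EINTR)
			continue;	/* interrupted before any data moved */
		if (n == 0)
			errno = EIO;	/* assumption: no progress, don't spin */
		rv = -1;
		break;
	}
	if (written != NULL)
		*written = done;
	return (rv);
}
```

With this, a 512-byte request that hits the "room for 20 bytes" case
quoted above would see write() return 20, then fail on the retry, and
the caller would get -1 with *written == 20.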

-- 
Brian Fundakowski Feldman                           \'[ FreeBSD ]''''''''''\
  <> green@FreeBSD.org                               \  The Power to Serve! \
 Opinions expressed are my own.                       \,,,,,,,,,,,,,,,,,,,,,,\


