Date:      Fri, 15 Nov 1996 14:55:30 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        jgreco@brasil.moneng.mei.com (Joe Greco)
Cc:        terry@lambert.org, jgreco@brasil.moneng.mei.com, jdp@polstra.com, scrappy@ki.net, hackers@FreeBSD.org
Subject:   Re: Sockets question...
Message-ID:  <199611152155.OAA27106@phaeton.artisoft.com>
In-Reply-To: <199611152014.OAA28769@brasil.moneng.mei.com> from "Joe Greco" at Nov 15, 96 02:14:47 pm

> If I want to do a 
> 
> write(fd, buf, 1048576)
> 
> on a socket connected via a 9600 baud SLIP link, I might expect the system
> call to take around 1092 seconds.  If I have a server process dealing with
> two such sockets, response time will be butt slow if the server is
> currently writing to the other socket...  it has to wait for the write to
> complete because write(2) has to finish sending the entire 1048576 bytes.

Actually, write will return when the data has been copied into the
local transmit buffers, not when it has actually been sent.  It's
only when you run out of local transmit buffers that the write blocks.

And well it should: something needs to tell the server process to
quit making calls which the kernel is unable to satisfy.  Halting
the server process based on resource unavailability does this.
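
A rough way to see this buffering is to write to a socket that nobody
reads from and count how much the kernel accepts before it refuses more.
The sketch below is illustrative only: the function name
bytes_accepted_before_full() is made up here, and a local AF_UNIX
socketpair stands in for the SLIP-connected socket, so the byte count it
reports says nothing about any particular TCP configuration.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper: write chunk_len-byte chunks to one end of a
 * socketpair that is never read.  Returns the number of bytes the kernel
 * accepted before its buffers filled, or -1 on an unexpected error.
 */
long bytes_accepted_before_full(size_t chunk_len)
{
    static char chunk[4096];          /* zero-filled payload */
    int sv[2];
    int flags;
    long total = 0;

    assert(chunk_len > 0 && chunk_len <= sizeof chunk);

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) == -1)
        return -1;

    /* Non-blocking, so a full buffer is reported instead of stalling us. */
    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags == -1 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) == -1) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    for (;;) {
        ssize_t n = write(sv[0], chunk, chunk_len);

        if (n > 0) {
            total += n;               /* copied into kernel buffers */
            continue;
        }
        if (n == -1 && errno == EINTR)
            continue;
        if (n == -1 && (errno == EAGAIN || errno == EWOULDBLOCK))
            break;                    /* buffers full: a blocking write would sleep here */
        total = -1;                   /* anything else is unexpected */
        break;
    }

    close(sv[0]);
    close(sv[1]);
    return total;
}
```

Every successful write() above returned although the peer had consumed
nothing; with a blocking descriptor the final call would simply have put
the process to sleep instead of returning an error.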


> So a clever software author does not do this.  He has 1048576 bytes of
> (different, even) data that he wants to write "simultaneously" to two
> sockets.  He wants to do the equivalent of Sun's
> 
> aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);

Yes.  This is *exactly* what he wants to do.

> Well how the hell do you do THAT if you are busy blocked in a write call?

He uses a native aiowrite().

Or he wants to call a write from a thread dedicated to that client,
which may block the thread, but not the process, and therefore not
other writes.

The underlying implementation may use non-blocking I/O, or it may use
an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
threads library provided).  It doesn't matter.  That's the point of
using threads.
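
As a minimal sketch of the thread-per-client idea, assuming POSIX threads
(the post itself only says "threads" and does not name an API): each
writer thread runs a plain blocking write loop, so a stalled descriptor
only stalls its own thread.  The names write_job, write_all_thread() and
write_both() are invented for this example.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* One unit of work: push len bytes from buf down fd. */
struct write_job {
    int         fd;
    const char *buf;
    size_t      len;
    ssize_t     result;   /* bytes written, or -1 on error */
};

/* Thread body: an ordinary blocking write loop. */
static void *write_all_thread(void *arg)
{
    struct write_job *job = arg;
    size_t done = 0;

    while (done < job->len) {
        ssize_t n = write(job->fd, job->buf + done, job->len - done);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            job->result = -1;
            return NULL;
        }
        done += (size_t)n;
    }
    job->result = (ssize_t)done;
    return NULL;
}

/* Run both writes concurrently; return 0 only if both completed fully. */
int write_both(struct write_job *a, struct write_job *b)
{
    pthread_t ta, tb;

    assert(a != NULL && b != NULL);

    if (pthread_create(&ta, NULL, write_all_thread, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, write_all_thread, b) != 0) {
        pthread_join(ta, NULL);
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);

    return (a->result == (ssize_t)a->len &&
            b->result == (ssize_t)b->len) ? 0 : -1;
}
```

The caller never sees a partial write and keeps no per-descriptor state;
whatever mechanism the thread library uses underneath stays hidden.
Depending on the platform this may need to be built with -pthread.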


> Well, you use non-blocking I/O...  and you take advantage of the fact that
> the OS is capable of buffering some data on your behalf.
> 
> Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> and "len2" for the size of the corresponding buf's.
> 
> You write code to do the following:
> 
> rval = write(fd1, buf1, len1)		# Wrote 2K of data
> len1 -= rval;				# 1046528 bytes remain
> buf1 += rval;				# Move forward 2K in buffer
[ ... ]
> You can trivially do this with a moderately complex select() mechanism,
> so that the outbound buffers for both sockets are kept filled.


This is exactly the finite state automaton I was talking about
having to move into user space code in order to use the interface.

It makes things more complex for the user space programmer.
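
For comparison, here is roughly what that user space state machine looks
like when written out.  This is a sketch, not code from the thread:
out_state and drain_all() are names invented here, and the descriptors
are assumed to have been put into non-blocking mode by the caller.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/types.h>
#include <unistd.h>

/* Per-descriptor state the application now has to carry around itself. */
struct out_state {
    int         fd;    /* assumed non-blocking */
    const char *buf;   /* next byte to send */
    size_t      len;   /* bytes still to send */
};

/* Keep every descriptor fed until all buffers are drained.
 * Returns 0 on success, -1 on the first hard error. */
int drain_all(struct out_state *out, int count)
{
    assert(out != NULL && count >= 0);

    for (;;) {
        fd_set wfds;
        int i, maxfd = -1, pending = 0;

        FD_ZERO(&wfds);
        for (i = 0; i < count; i++) {
            if (out[i].len == 0)
                continue;                 /* this one is finished */
            FD_SET(out[i].fd, &wfds);
            if (out[i].fd > maxfd)
                maxfd = out[i].fd;
            pending++;
        }
        if (pending == 0)
            return 0;                     /* everything has been written */

        /* Sleep until at least one descriptor can take more data. */
        if (select(maxfd + 1, NULL, &wfds, NULL, NULL) == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }

        for (i = 0; i < count; i++) {
            ssize_t n;

            if (out[i].len == 0 || !FD_ISSET(out[i].fd, &wfds))
                continue;
            n = write(out[i].fd, out[i].buf, out[i].len);
            if (n == -1) {
                if (errno == EAGAIN || errno == EWOULDBLOCK || errno == EINTR)
                    continue;             /* try again on the next pass */
                return -1;
            }
            out[i].buf += n;              /* advance past what was accepted */
            out[i].len -= (size_t)n;
        }
    }
}
```

The buf/len bookkeeping and the retry loop are exactly the state that a
blocking write in a dedicated thread would otherwise keep on the
application's behalf.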


> A little hard to do without nonblocking sockets.  Very useful.  I don't
> think that this is a "stupid idea" at all.

Maybe not compared to being unable to do it at all... but BSD is not
limited this way.  We have threads.


> > What is the point of a non-blocking write if this is what happens?
> 
> I will leave that as your homework for tonite.

Answer:		for writes in a multiple client server.
Extra credit:	the failure case that originated this discussion was
		concerned with a client using read.

> Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> write to perform delivery of data.
> 
> Why should an application developer have to know or care what the available
> buffer space is?  Please tell me where in write(2) and read(2) it says I
> must worry about this.
> 
> It doesn't.

Exactly my point on a socket read not returning until it completes.

> > Indeterminate sockets are evil.  They are on the order of not knowing
> > your lock state when entering into a function that's going to need
> > the lock held.
> 
> I suppose you have never written a library function.
> 
> I suppose you do not subscribe to the philosophy that you should be
> liberal in what you accept (in this case, assume that you may need to
> deal with either type of socket).

If I wrote a library function which operated on a non-user-opaque
object like a socket set up by the user, then it would function for
all potential valid states in which that object could be at the time
of the call.  For potential invalid states, I would trap the ones
which I could identify from subfunction returns, and state that the
behaviour for other invalid states was "undefined" in the documentation
which I published with the library (ie: optimise for the success case).


More likely, I would encapsulate the object using an opaque data
type, and I would expect the users who wish to consume my interface
to obtain an object of that type, operate on the object with my
functions, and release the object when done.  In other words, I
would employ standard data encapsulation techniques.
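
One possible shape for such an interface, sketched in C.  Everything
named here (chan, chan_open(), chan_send(), chan_close()) is hypothetical
and only illustrates the obtain/operate/release pattern.  As a design
assumption of this sketch, chan_open() forces the descriptor into
blocking mode, so the library always knows the state of the socket it is
operating on.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* ---- what the public header would expose ---- */
typedef struct chan chan;               /* opaque: layout hidden from users */
chan *chan_open(int fd);                /* obtain  */
int   chan_send(chan *c, const void *data, size_t len);   /* operate */
void  chan_close(chan *c);              /* release */

/* ---- private to the library ---- */
struct chan {
    int fd;                             /* always in blocking mode */
};

chan *chan_open(int fd)
{
    chan *c;
    int flags;

    if (fd < 0)
        return NULL;

    /* Put the descriptor into a known state: blocking. */
    flags = fcntl(fd, F_GETFL, 0);
    if (flags == -1 || fcntl(fd, F_SETFL, flags & ~O_NONBLOCK) == -1)
        return NULL;

    c = malloc(sizeof *c);
    if (c != NULL)
        c->fd = fd;
    return c;
}

/* Send exactly len bytes; returns 0 on success, -1 on error. */
int chan_send(chan *c, const void *data, size_t len)
{
    const char *p = data;

    if (c == NULL)
        return -1;
    assert(data != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(c->fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

void chan_close(chan *c)
{
    if (c == NULL)
        return;
    close(c->fd);
    free(c);
}
```

Because callers only ever hold a chan pointer, they cannot hand the
library a socket in a state it did not set up itself.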


> I wonder if anyone has ever rewritten one of your programs, and made
> a fundamental change that silently broke one of your programs because
> an underlying concept was changed.

Unlikely.  I document my assumptions.


> Any software author who writes code and does not perform reasonable
> sanity checks on the return value, particularly for something as important
> as the read and write system calls, is hanging a big sign around their
> neck saying "Kick Me I Code Worth Shit".

On the other hand, "do not test for an error condition which you
cannot handle".

If as part of my rundown in a program, I go to close a file, and the
close fails, what should I do about it?  Not exit?  Give me a break...

> > It bothers me too... I am used to formatting my IPC data streams.  I
> > either use fixed length data units so that the receiver can post a
> > fixed size read, or I use a fix length data unit, and guarantee write
> > ordering by maintaining state. I do this in order to send a fixed
> > length header to indicate that I'm writing a variable length packet,
> > so the receiver can then issue a blocking read for the right size.
> 
> I have never seen that work as expected with a large data size.

I have never seen *any* IPC transport work (reliably) with large data
sizes... depending on your definition of large.  To deal with this,
you can either encapsulate the transport and handle large transfers
inside that layer, or not use large data sizes in the first place.
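
The fixed-length-header scheme quoted above can be wrapped up along the
following lines.  This is a sketch under assumptions of its own: a 4-byte
big-endian length header, an arbitrary MSG_MAX cap, blocking descriptors,
and the invented names send_msg() and recv_msg().

```c
#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

#define MSG_MAX 65536u   /* arbitrary cap chosen for this sketch */

/* Write exactly len bytes, retrying on short writes. */
static int write_full(int fd, const void *data, size_t len)
{
    const char *p = data;

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Read exactly len bytes; a premature end of stream is an error. */
static int read_full(int fd, void *data, size_t len)
{
    char *p = data;

    while (len > 0) {
        ssize_t n = read(fd, p, len);

        if (n == 0)
            return -1;                  /* peer closed mid-message */
        if (n == -1) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    return 0;
}

/* Fixed-length header first, then the variable-length body. */
int send_msg(int fd, const void *data, uint32_t len)
{
    uint32_t hdr = htonl(len);

    if (len > MSG_MAX)
        return -1;
    if (write_full(fd, &hdr, sizeof hdr) == -1)
        return -1;
    return write_full(fd, data, len);
}

/* Returns the body length, or -1 on error or an oversized message. */
long recv_msg(int fd, void *buf, uint32_t cap)
{
    uint32_t hdr, len;

    assert(buf != NULL);

    if (read_full(fd, &hdr, sizeof hdr) == -1)
        return -1;
    len = ntohl(hdr);
    if (len > MSG_MAX || len > cap)
        return -1;
    if (read_full(fd, buf, len) == -1)
        return -1;
    return (long)len;
}
```

The receiver always knows how many bytes to ask for: first the fixed
header, then exactly the count the header announced.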


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
