Date:      Fri, 15 Nov 1996 17:00:53 -0600 (CST)
From:      Joe Greco <jgreco@brasil.moneng.mei.com>
To:        terry@lambert.org (Terry Lambert)
Cc:        jgreco@brasil.moneng.mei.com, terry@lambert.org, jdp@polstra.com, scrappy@ki.net, hackers@FreeBSD.org
Subject:   Re: Sockets question...
Message-ID:  <199611152300.RAA29354@brasil.moneng.mei.com>
In-Reply-To: <199611152155.OAA27106@phaeton.artisoft.com> from "Terry Lambert" at Nov 15, 96 02:55:30 pm

> > If I want to do a 
> > 
> > write(fd, buf, 1048576)
> > 
> > on a socket connected via a 9600 baud SLIP link, I might expect the system
> > call to take around 1092 seconds.  If I have a server process dealing with
> > two such sockets, response time will be butt slow if the server is
> > currently writing to the other socket...  it has to wait for the write to
> > complete because write(2) has to finish sending the entire 1048576 bytes.
> 
> Actually, write will return when the data has been copied into the
> local transmit buffers, not when it has actually been sent.  It's
> only when you run out of local transmit buffers that the write blocks.

Yes, that should be clear; I pointed out that this is precisely what makes
non-blocking sockets useful in this scenario.

> And well it should: something needs to tell the server process to
> quit making calls which the kernel is unable to satisfy.  Halting
> the server process based on resource unavailability does this.

So does returning EWOULDBLOCK to the server process, allowing the server
to react to this by going on to service someone else.
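
For example, something along these lines (an untested sketch; the function
name is mine, and it assumes the descriptor was already marked non-blocking
with fcntl(F_SETFL, O_NONBLOCK)):

#include <sys/types.h>
#include <errno.h>
#include <unistd.h>

/*
 * Untested sketch.  Try to push some of "len" bytes from "buf" down "fd";
 * returns the number of bytes the kernel accepted, 0 if its buffers are
 * full (EWOULDBLOCK), or -1 on a real error.
 */
ssize_t
try_write(int fd, const char *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0) {
        if (errno == EWOULDBLOCK || errno == EAGAIN)
            return 0;       /* nothing queued; go service someone else */
        return -1;          /* real error; let the caller deal with it */
    }
    return n;               /* partial writes are normal here */
}

A return of 0 just means "come back when select() says the socket is
writable again".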

> > So a clever software author does not do this.  He has 1048576 bytes of
> > (different, even) data that he wants to write "simultaneously" to two
> > sockets.  He wants to do the equivalent of Sun's
> > 
> > aiowrite(fd1, buf1, 1048576, SEEK_CUR, 0, NULL);
> > aiowrite(fd2, buf2, 1048576, SEEK_CUR, 0, NULL);
> 
> Yes.  This is *exactly* what he wants to do.
> 
> > Well how the hell do you do THAT if you are busy blocked in a write call?
> 
> He uses a native aiowrite().

Which doesn't exist in a portable fashion.  ANYWHERE.

> Or he wants to call a write from a thread dedicated to that client,
> which may block the thread, but not the process, and therefore not
> other writes.

Which is fine IF you have a threads implementation.  Which is, again, not
a given, and therefore, not portable.

> The underlying implementation may use non-blocking I/O, or it may use
> an OS implementation of aiowrite (like Sun's SunOS 4.3 LWP user space
> threads library provided).  It doesn't matter.  That's the point of
> using threads.

Yes, well, the problem with using threads at the moment is that you're not
really assured of being portable.

I do not disagree that in an ideal world, threads are a good way to deal
with this.
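
For the record, where you DO have a POSIX threads implementation, the
per-client writer Terry describes is roughly this simple (untested sketch,
names are mine):

#include <sys/types.h>
#include <pthread.h>
#include <unistd.h>

struct client {
    int     fd;
    char    *buf;
    size_t  len;
};

/*
 * Untested sketch.  One writer thread per client: a blocking write() here
 * stalls only this thread, not the whole server.
 */
void *
client_writer(void *arg)
{
    struct client *c = arg;
    size_t done = 0;
    ssize_t n;

    while (done < c->len) {
        n = write(c->fd, c->buf + done, c->len - done);
        if (n <= 0)
            break;              /* error or connection gone */
        done += n;
    }
    return NULL;
}

/* started with: pthread_create(&tid, NULL, client_writer, c); */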

> > Well, you use non-blocking I/O...  and you take advantage of the fact that
> > the OS is capable of buffering some data on your behalf.
> > 
> > Let's say you have "buf1" and "buf2" to write to "fd1" and "fd2", and "len1"
> > and "len2" for the size of the corresponding buf's.
> > 
> > You write code to do the following:
> > 
> > rval = write(fd1, buf1, len1);	/* Wrote 2K of data */
> > len1 -= rval;			/* 1046528 bytes remain */
> > buf1 += rval;			/* Move forward 2K in buffer */
> [ ... ]
> > You can trivially do this with a moderately complex select() mechanism,
> > so that the outbound buffers for both sockets are kept filled.
> 
> 
> This is exactly the finite state automaton I was talking about
> having to move into user space code in order to use the interface.
> 
> It makes things more complex for the user space programmer.

So?  Making things more complex is a small tradeoff if it makes it POSSIBLE
to do something in the first place.

Tell me, how else do you do this on a system that does NOT support threads?

You can select() on writability and send one byte at a time on a blocking
socket until select() reports no further writability.  Poor solution.
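
Spelled out, the skeleton looks roughly like this (untested, most error
handling omitted, the names are mine; both descriptors are assumed to have
already been set non-blocking):

#include <sys/types.h>
#include <sys/time.h>
#include <errno.h>
#include <unistd.h>

/*
 * Untested skeleton.  Keep the outbound buffers of two non-blocking
 * sockets filled until both transfers are complete.
 */
void
drain_two(int fd1, char *buf1, size_t len1, int fd2, char *buf2, size_t len2)
{
    fd_set wfds;
    int nfds = (fd1 > fd2 ? fd1 : fd2) + 1;
    ssize_t n;

    while (len1 > 0 || len2 > 0) {
        FD_ZERO(&wfds);
        if (len1 > 0)
            FD_SET(fd1, &wfds);
        if (len2 > 0)
            FD_SET(fd2, &wfds);

        if (select(nfds, NULL, &wfds, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue;       /* interrupted; just retry */
            return;             /* real error */
        }

        if (len1 > 0 && FD_ISSET(fd1, &wfds)) {
            n = write(fd1, buf1, len1);
            if (n > 0) {
                buf1 += n;      /* move forward in the buffer */
                len1 -= n;
            } else if (n < 0 && errno != EWOULDBLOCK && errno != EINTR)
                return;
        }
        if (len2 > 0 && FD_ISSET(fd2, &wfds)) {
            n = write(fd2, buf2, len2);
            if (n > 0) {
                buf2 += n;
                len2 -= n;
            } else if (n < 0 && errno != EWOULDBLOCK && errno != EINTR)
                return;
        }
    }
}

Generalize the two descriptors into an array and you have the core of a
multi-client server, no threads required.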

> > A little hard to do without nonblocking sockets.  Very useful.  I don't
> > think that this is a "stupid idea" at all.
> 
> Maybe not compared to being unable to do it at all... but BSD is not
> limited this way.  We have threads.

_FREE_BSD is not limited this way.  _FREE_BSD has threads.  The local
4.3BSD Tahoe system (it _is_ a BSD system, I hope you would agree) offers
nonblocking writes but does not offer threads.  Ultrix does not offer
threads.  I am sure there are other examples...

You are missing the point as usual.  BSD != FreeBSD, and FreeBSD != UNIX in
general.  I am continually amazed that someone like you could make that
error...

In order to write portable code, one must write portable code.

> > > What is the point of a non-blocking write if this is what happens?
> > 
> > I will leave that as your homework for tonite.
> 
> Answer:		for writes in a multiple client server.

Ahhhh.  You got it.

> Extra credit:	the failure case that originated this discussion was
> 		concerned with a client using read.

That is not very relevant.  The statement which originated _THIS_
discussion was your assertion that "Non-blocking sockets for reliable 
stream protocols like TCP/IP are a stupid idea."

I do not care about Karl's problem...  he may well have a legitimate
problem, and I agreed that it was probably beyond the scope of a usage
discussion given his description.

I do not care about Marc's problem...  that is a separate issue.

I am simply correcting a misconception that you are spreading that
non-blocking sockets are a "stupid idea".

> > Please tell that to FreeBSD's FTP server, which uses a single (blocking)
> > write to perform delivery of data.
> > 
> > Why should an application developer have to know or care what the available
> > buffer space is?  Please tell me where in write(2) and read(2) it says I
> > must worry about this.
> > 
> > It doesn't.
> 
> Exactly my point on a socket read not returning until it completes.

Yes, that's fine.  I agree that there are merits on both sides.  The read()
returning what is available is probably more generally useful, and that
seems to be what is implemented.

I am not going to argue with the design and implementation of the Berkeley
networking code, since it is widely considered to be the standard model
for networking.  Most other folks have not found this to be a critical
design flaw, and neither do I.  I can see several cases where a blocking
read() call would be a substantial nuisance, and so I think that the
behaviour as it exists makes a fair amount of sense.
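
And if a particular application really wants the "block until I have it
all" behaviour, it is a ten line wrapper.  Untested sketch, the name is
mine:

#include <sys/types.h>
#include <unistd.h>

/*
 * Untested sketch.  Block until "len" bytes have arrived, EOF, or an
 * error.  Returns the number of bytes actually read, or -1 on error.
 */
ssize_t
read_fully(int fd, char *buf, size_t len)
{
    size_t done = 0;
    ssize_t n;

    while (done < len) {
        n = read(fd, buf + done, len - done);
        if (n < 0)
            return -1;          /* error */
        if (n == 0)
            break;              /* EOF: peer closed the connection */
        done += n;
    }
    return done;
}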

> > > Indeterminate sockets are evil.  They are on the order of not knowing
> > > your lock state when entering into a function that's going to need
> > > the lock held.
> > 
> > I suppose you have never written a library function.
> > 
> > I suppose you do not subscribe to the philosophy that you should be
> > liberal in what you accept (in this case, assume that you may need to
> > deal with either type of socket).
> 
> If I wrote a library function which operated on a non-user-opaque
> object like a socket set up by the user, then it would function for
> all potential valid states in which that object could be at the time
> of the call.  For potential invalid states, I would trap the ones
> which I could identify from subfunction returns, and state that the
> behaviour for other invalid states was "undefined" in the documentation
> which I published with the library (ie: optimise for the success case).

What do you define "potential valid states" to be?

I do not claim to cover all the bases all the time, but I do at least
catch exceptional conditions I was not expecting.  In my case, I would
try to write a socket-handling library function to handle both blocking 
and non-blocking sockets if it was reasonably practical to do so.  If
not, I would cause it to bomb if it detected something odd.

I think you are saying the same thing: that is good.
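
Detecting which flavor of socket you were handed is cheap, by the way
(untested sketch, the name is mine):

#include <fcntl.h>

/*
 * Untested sketch.  Returns 1 if the descriptor is non-blocking, 0 if it
 * is blocking, -1 if it is not even a valid descriptor.
 */
int
is_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);

    if (flags < 0)
        return -1;
    return (flags & O_NONBLOCK) != 0;
}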

> More likely, I would encapsulate the object using an opaque data
> type, and I would expect the users who wish to consume my interface
> to obtain an object of that type, operate on the object with my
> functions, and release the object when done.  In other words, I
> would employ standard data encapsulation techniques.

Nifty.  That's even possible in many cases if you are designing from 
scratch.  Otherwise, it is a real pain in the butt.

> > I wonder if anyone has ever rewritten one of your programs, and made
> > a fundamental change that silently broke one of your programs because
> > an underlying concept was changed.
> 
> Unlikely.  I document my assumptions.

So what?  If I, as the engineer who replaces you five years down the road,
decide that your program needs to use non-blocking writes, and I change
the program to do them, and I miss one place where you failed to check
a return value, your "documented assumptions" are worth diddly squat.
Code your assumptions when they are this trivial to check.
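
Checking them costs almost nothing.  Something like this (untested sketch,
the name is mine):

#include <sys/types.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/*
 * Untested sketch.  The assumption "write() sends the whole buffer" coded
 * as a check, so a later switch to non-blocking I/O cannot break things
 * silently.
 */
void
write_all_or_die(int fd, const char *buf, size_t len)
{
    ssize_t n = write(fd, buf, len);

    if (n < 0 || (size_t)n != len) {
        fprintf(stderr, "write: expected %lu bytes, wrote %ld\n",
            (unsigned long)len, (long)n);
        abort();
    }
}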

> > Any software author who writes code and does not perform reasonable
> > sanity checks on the return value, particularly for something as important
> > as the read and write system calls, is hanging a big sign around their
> > neck saying "Kick Me I Code Worth Shit".
> 
> On the other hand, "do not test for an error condition which you can
> not handle".

One can handle ANY error condition by bringing it to the attention of
a higher authority.

My UNIX kernel panics when it hits a condition that it does not know how
to handle.  It does not foolishly follow your advice to "not test for an
error condition which you can not handle" - to do so would risk great
havoc.  You ALWAYS test for error conditions, PARTICULARLY the ones which
you can not handle - because they are the really scary ones.

> If as part of my rundown in a program, I go to close a file, and the
> close fails, what should I do about it?  Not exit?  Give me a break...

No, but if a close() fails, and you had a reasonable expectation for it
to succeed, printing a warning is not unreasonable.  According to SunOS,
there are two reasons this could happen:  EBADF and EINTR.  If you are
closing an inactive descriptor, it is clearly an error in the code, and
I WOULD CERTAINLY WANT TO KNOW.  If it is due to a signal, it is unclear
what to do, but it is certainly not a "bad" idea to at least be aware
that such a thing can (and has) happened!
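
Something as simple as this would do (untested sketch, the name is mine):

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Untested sketch.  A close() that at least tells somebody when the
 * "impossible" happens.
 */
void
checked_close(int fd)
{
    if (close(fd) < 0)
        fprintf(stderr, "warning: close(%d): %s\n", fd, strerror(errno));
}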

> > > It bothers me too... I am used to formatting my IPC data streams.  I
> > > either use fixed length data units so that the receiver can post a
> > > fixed size read, or I use a fix length data unit, and guarantee write
> > > ordering by maintaining state. I do this in order to send a fixed
> > > length header to indicate that I'm writing a variable length packet,
> > > so the receiver can then issue a blocking read for the right size.
> > 
> > I have never seen that work as expected with a large data size.
> 
> I have never seen *any* IPC transport work (reliably) with large data
> sizes... depending on your definition of large.  To deal with this,
> you can only encapsulate the transport and handle them, or don't use
> large data sizes in the first place.

Okay, here we are in complete agreement.  One _always_ needs to be aware
of this, then.
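
For anyone following along, the sender side of the framing Terry describes
boils down to something like this (untested sketch, the name is mine; a
4-byte length in network byte order, then the payload):

#include <sys/types.h>
#include <unistd.h>

/*
 * Untested sketch.  Send a 4-byte length header in network byte order,
 * then the payload.  The receiver reads the header, then issues a read
 * for exactly that many bytes - and still has to be prepared for read()
 * returning short.
 */
int
send_packet(int fd, const char *body, unsigned long len)
{
    unsigned char hdr[4];

    hdr[0] = (len >> 24) & 0xff;
    hdr[1] = (len >> 16) & 0xff;
    hdr[2] = (len >>  8) & 0xff;
    hdr[3] = len & 0xff;

    if (write(fd, hdr, 4) != 4)
        return -1;
    if (write(fd, body, len) != (ssize_t)len)
        return -1;              /* short write: the "large data" problem */
    return 0;
}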

... JG


