From owner-freebsd-chat  Wed Oct 25 11:35:28 2000
Delivered-To: freebsd-chat@freebsd.org
Received: from peace.netnation.com (peace.netnation.com [204.174.223.2])
	by hub.freebsd.org (Postfix) with ESMTP id 8F35037B479
	for <chat@freebsd.org>; Wed, 25 Oct 2000 11:35:25 -0700 (PDT)
Received: from sim by peace.netnation.com with local (Exim 3.13 #5)
	id 13oVOS-0003Id-00; Wed, 25 Oct 2000 11:35:20 -0700
Date: Wed, 25 Oct 2000 11:35:20 -0700
From: Simon Kirby <sim@stormix.com>
To: Jamie Lokier <lk@tantalophile.demon.co.uk>
Cc: Jonathan Lemon <jlemon@flugsvamp.com>,
	Dan Kegel <dank@alumni.caltech.edu>, chat@freebsd.org,
	linux-kernel@vger.kernel.org
Subject: Re: kqueue microbenchmark results
Message-ID: <20001025113520.E12064@stormix.com>
References: <20001024225637.A54554@prism.flugsvamp.com> <39F6655A.353FD236@alumni.caltech.edu> <20001025010246.B57913@prism.flugsvamp.com> <20001025112709.A1500@stormix.com> <20001025190848.C2266@pcep-jamie.cern.ch>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 1.0i
In-Reply-To: <20001025190848.C2266@pcep-jamie.cern.ch>; from lk@tantalophile.demon.co.uk on Wed, Oct 25, 2000 at 07:08:48PM +0200
Sender: owner-freebsd-chat@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

On Wed, Oct 25, 2000 at 07:08:48PM +0200, Jamie Lokier wrote:

> Simon Kirby wrote:
> 
> > What applications would do better by postponing some of the reading? 
> > I can't think of any reason off the top of my head why an application
> > wouldn't want to read everything it can.
> 
> Pipelined server.
> 
> 1. Wait for event.
> 2. Read block
> 3. If EAGAIN, goto 1.
> 4. If next request in block is incomplete, goto 2.
> 5. Process next request in block.
> 6. Write response.
> 7. If EAGAIN, wait until output is ready for writing then goto 6.
> 8. Goto 1 or 2, your choice.
>    (Here I'd go to 2 if the last read was complete -- it avoids a
>    redundant call to poll()).
> 
> If you simply read everything you can at step 2, you'll run out of
> memory the moment someone sends you 100000 requests.
> 
> This doesn't happen if you leave unread data in kernel space --
> TCP windows and all that.

Hmm, I don't understand.

What happens at "wait until output is ready for writing then goto 6"?
Do you mean you would stop the main loop to wait for a single client to
unclog?  Wouldn't you just do this:

1. Wait for event (read and write queued).  Event occurs: Incoming
   data available.
2. Read a block.
3. Process block just read: Does it contain a full request?  If not,
   queue it, goto 2, and join it with the next block.  If no more data,
   queue the beginning of the request, if any, and goto 1.
4. Walk over available requests in block just read.  Process.
5. Attempt to write response, if any.
6. Attempted write: Did it all get out?  If not, queue the unwritten
   data and goto 1 to wait for a write event.
7. Goto 2.
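For concreteness, here is a minimal C sketch of steps 2 to 4, assuming a
non-blocking socket and newline-terminated requests (as in IRC).  The
names struct conn, conn_read_requests(), set_nonblocking() and
count_request() are hypothetical, made up for this illustration; only
read(), fcntl(), memchr() and memmove() are real interfaces.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define INBUF_SIZE 4096

/* Hypothetical per-connection state: the beginning of a request that is
 * not complete yet (step 3) is kept here between reads. */
struct conn {
    int fd;                 /* socket, assumed to be non-blocking */
    char in[INBUF_SIZE];
    size_t in_len;
};

/* Hypothetical handler type, called once per complete request (step 4). */
typedef void (*request_fn)(const char *req, size_t len, void *arg);

/* Put a descriptor into non-blocking mode so read() reports EAGAIN. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Example handler: just counts requests.  A real daemon would parse the
 * request here and queue a response (step 5). */
void count_request(const char *req, size_t len, void *arg)
{
    (void)req;
    (void)len;
    ++*(int *)arg;
}

/* Steps 2-4: read until EAGAIN, handing each complete '\n'-terminated
 * request to handle() and keeping any incomplete tail for next time.
 * Returns 0 when the socket is drained (back to step 1), or -1 on EOF,
 * a read error, or a request too long for the buffer. */
int conn_read_requests(struct conn *c, request_fn handle, void *arg)
{
    assert(c != NULL && handle != NULL);
    for (;;) {
        if (c->in_len == sizeof c->in)
            return -1;      /* request larger than the buffer: give up */
        ssize_t n = read(c->fd, c->in + c->in_len, sizeof c->in - c->in_len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            return (errno == EAGAIN || errno == EWOULDBLOCK) ? 0 : -1;
        }
        if (n == 0)
            return -1;      /* peer closed the connection */
        c->in_len += (size_t)n;

        /* Walk over every complete request now in the buffer. */
        size_t start = 0;
        char *nl;
        while ((nl = memchr(c->in + start, '\n', c->in_len - start)) != NULL) {
            size_t end = (size_t)(nl - c->in);
            handle(c->in + start, end - start, arg);
            start = end + 1;
        }
        /* Keep the incomplete tail at the front of the buffer. */
        memmove(c->in, c->in + start, c->in_len - start);
        c->in_len -= start;
    }
}
```

Returning 0 on EAGAIN is the "goto 1": the incomplete tail stays in
c->in until the next read event, and the fixed buffer size bounds how
much a single client can make the daemon hold.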

Assume the write clogged.  Some iterations later:

10. Wait for event (read and write queued).  Event occurs: Write
    space available.
11. Write remaining available data.
12. Attempted write: Did it all get out?  If not, queue remaining
    writable data and goto 1 to wait for another write event.
13. Goto 2.

    (If we're some sort of forwarding daemon and the receiving end
    of our forward has just unclogged, we want to read any readable
    data we had left waiting.  The same goes if we're just answering
    requests, even though the send direction could clog again.)
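The write side (steps 5, 6, 11 and 12) might look like the following
sketch, again with hypothetical names (struct conn, conn_send(),
conn_flush(), set_nonblocking()).  It assumes a fixed per-client output
queue and, as one possible policy, drops a client whose queue overflows
rather than letting it stall the daemon.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

#define OUTBUF_SIZE 8192

/* Hypothetical per-connection state for the write side. */
struct conn {
    int fd;                  /* socket, assumed to be non-blocking */
    char out[OUTBUF_SIZE];   /* response bytes the kernel has not taken yet */
    size_t out_len;
};

/* Put a descriptor into non-blocking mode so write() reports EAGAIN. */
int set_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL, 0);
    return flags < 0 ? -1 : fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/* Steps 6 and 12: write as much queued data as the kernel will accept.
 * Returns 1 if everything got out (goto 2), 0 if still clogged (goto 1
 * and wait for a write event), or -1 on a write error. */
int conn_flush(struct conn *c)
{
    assert(c != NULL);
    size_t off = 0;
    while (off < c->out_len) {
        ssize_t n = write(c->fd, c->out + off, c->out_len - off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            if (errno != EAGAIN && errno != EWOULDBLOCK)
                return -1;
            break;           /* clogged: keep the rest queued */
        }
        off += (size_t)n;
    }
    memmove(c->out, c->out + off, c->out_len - off);
    c->out_len -= off;
    return c->out_len == 0;
}

/* Step 5: queue a response and try to send it right away.  A client that
 * lets more than OUTBUF_SIZE bytes pile up is reported as an error
 * (assumed policy) instead of blocking the main loop. */
int conn_send(struct conn *c, const void *data, size_t len)
{
    assert(c != NULL);
    if (len > sizeof c->out - c->out_len)
        return -1;
    memcpy(c->out + c->out_len, data, len);
    c->out_len += len;
    return conn_flush(c);
}
```

A return of 0 from conn_flush() is where the descriptor goes back to the
main loop to wait for a write event; 1 means it is safe to go back to
step 2 and read more.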

What can't you do here?  What's wrong?  Note that the write event will
let you read any remaining queued data.  If you actually stay out of the
main loop while you're write clogged, you will pause the whole daemon and
create an easy DoS problem.  There's no way around having to queue at
least the unwritten data.

This is how I wrote my IRC daemon a while back, and it works fine with
select().  I can't see what wouldn't work with edge-triggered events
except perhaps the write() event -- I'm not sure what would be considered
"triggered", perhaps when it goes under a watermark or something.  In any
case, it should all still work assuming get_events() offers the ability
to receive "write space available" events.

You don't have to read all data if you don't want to, assuming you will
get another event later that will unclog the situation (meaning the
obstacle must also trigger an event when it is cleared).

In fact, if a daemon using select() left read data queued, it would
loop endlessly, taking all the CPU and never idling, because there would
always be read data available.  You'd have to leave the descriptor out of
the read set and put it in the write set instead, so that you can sleep
until the write side becomes available, effectively ignoring any further
events on the read set until the write unclogs.  This sounds just like
what would happen if you only got one notification (edge triggered) in
the first place.
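With select(), that read/write set juggling could be sketched like this.
struct conn and build_fd_sets() are hypothetical names; a descriptor
with queued output (out_len > 0) is placed only in the write set, so
unread input cannot wake select() over and over.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/select.h>

/* Hypothetical connection record: only the fields this sketch needs. */
struct conn {
    int fd;          /* -1 marks an unused slot */
    size_t out_len;  /* bytes queued but not yet written */
};

/* Build the select() sets for one pass of the main loop.  A write-clogged
 * descriptor is left out of the read set and put in the write set, so
 * reading resumes only after the output unclogs.  Returns the nfds
 * argument for select() (highest descriptor + 1). */
int build_fd_sets(const struct conn *conns, size_t n, fd_set *rfds, fd_set *wfds)
{
    assert(conns != NULL || n == 0);
    int maxfd = -1;
    FD_ZERO(rfds);
    FD_ZERO(wfds);
    for (size_t i = 0; i < n; i++) {
        int fd = conns[i].fd;
        if (fd < 0 || fd >= FD_SETSIZE)
            continue;               /* unused slot or beyond select()'s limit */
        if (conns[i].out_len > 0)
            FD_SET(fd, wfds);       /* clogged: wait for write space only */
        else
            FD_SET(fd, rfds);       /* output drained: wait for input */
        if (fd > maxfd)
            maxfd = fd;
    }
    return maxfd + 1;
}
```

The return value feeds straight into select(nfds, &rfds, &wfds, NULL,
NULL); once the queued output has been flushed, the next pass puts the
descriptor back into the read set.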

Simon-

[  Stormix Technologies Inc.  ][  NetNation Communications Inc. ]
[       sim@stormix.com       ][       sim@netnation.com        ]
[ Opinions expressed are not necessarily those of my employers. ]

