From owner-freebsd-hackers Tue May  9 14:37:44 1995
Return-Path: hackers-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6)
	id OAA26766 for hackers-outgoing; Tue, 9 May 1995 14:37:44 -0700
Received: from cs.weber.edu (cs.weber.edu [137.190.16.16]) by
	freefall.cdrom.com (8.6.10/8.6.6) with SMTP id OAA26760 for ;
	Tue, 9 May 1995 14:37:41 -0700
Received: by cs.weber.edu (4.1/SMI-4.1.1) id AA19726; Tue, 9 May 95 15:28:07 MDT
From: terry@cs.weber.edu (Terry Lambert)
Message-Id: <9505092128.AA19726@cs.weber.edu>
Subject: Re: Apache + FreeBSD 2.0 benchmark results (fwd)
To: taob@gate.sinica.edu.tw (Brian Tao)
Date: Tue, 9 May 95 15:28:07 MDT
Cc: nc@ai.net, Arjan.deVet@nl.cis.philips.com, freebsd-hackers@FreeBSD.org,
    Guido.VanRooij@nl.cis.philips.com
In-Reply-To: from "Brian Tao" at May 10, 95 03:36:03 am
X-Mailer: ELM [version 2.4dev PL52]
Sender: hackers-owner@FreeBSD.org
Precedence: bulk

> > The correct term for "pre-forking" is "spawn-ahead".
>
> I was always under the impression that the creation of another
> process is called "forking" under UNIX and not "spawning" (isn't that
> VAX-speak?).

Nope; just generic CS-speak.

> > Actually, a lot of UNIX kernels keep process templates around, which
> > are most of the generic process information but none of the specific,
> > so as to optimize forking benchmarks (hint, hint).
>
> What, have a specially-compiled kernel that can fork off httpd's
> in no time at all?  As usual, you're too far ahead of me, Terry, and
> I'm having trouble keeping up.  :-/

Nope; it applies to all processes -- the kernel keeps a preallocated
process pool to reduce (not eliminate) fork time.  Things like memory
setup and so on still take the same amount of time.  You'd have to
combine it with vfork to get rid of most of this, and that would only
work to reduce overall fork+exec time, not fork time.  The procedure
is described in both the Bach book and "The Magic Garden Explained".
Like caching the uid and gid in the library for use in user queries to
avoid system calls on HP machines, the process-template trick is
mostly a dodge to get better benchmark numbers, although it does
optimize specific cases that end up being similar to benchmark usage.

> BTW, the multithreaded server I've got running on my FreeBSD box
> probably isn't truly "multithreaded" (it uses select() to handle
> multiple connections with a single process).  What should this be
> called?  A multiheaded server?

That's a 2 letter difference!  And you were worried about a 4 letter
difference on "pre-forking" 8-) 8-).

Select-based threading is an I/O Dispatch model, since each time data
is available it gets dispatched.  This is close to a voluntary context
switch threading model (which is what Windows prior to Win95 used).
If you converted your I/O requests not involved with the actual
dispatch scheduling into asynchronous requests plus a context switch,
you'd be close to the SunOS 4.x LWP/NetWare 3.x & 4.x/VMS MTS models,
which are all voluntary context switch based thread scheduling.  8-).

If you don't convert the I/O requests, then you aren't really a
multithreaded server at all, since a blocking request in any thread of
control can block other threads of control that would otherwise be
runnable.

The other alternative is a Non-Blocking I/O Dispatch model, where you
guarantee that you will not attempt potentially blocking operations in
the context of a dispatched thread of control.  This is actually how
SVR4 port monitors work (as well as being how xpmon, a program I wrote
to manage 36 X terminal sessions from a single process instead of
running 36 copies of xdm, works).  Depending on what you do with this
model, you can actually end up with some very complex finite state
automata to get the desired behaviour.
The non-blocking dispatch model depends on all blocking requests being
replaced with non-blocking requests, and assumes both shared stack and
shared heap, replacing a context switch with a state structure switch
and a state transition.

I expect that you'll get your best performance from the spawn-ahead
implementation, with the relative performance of the I/O dispatch
server and the forking server depending on whether the average request
takes more time to satisfy than a fork takes (assuming light server
loading).  I expect that under heavy server loading the two forking
models will converge and the I/O dispatch model will drop *way*
behind.  The pre-spawn model might actually pull ahead, depending on
whether it is a work-to-do model or whether a client is "married" to a
server.  If the former, then you won't be incurring any more fork
overhead, and you'll get the maximum concurrency you can get without
actually changing your process model (since you will continue to have
context switch overhead equal to the pure forking model).

The whole subject is quite fascinating.  I could even tell you why the
Solaris/UnixWare/SVR4 threading model pretty much sucks out if you
don't have a 1:1 correlation between kernel and user threads and write
your own scheduling class and use async I/O... assuming you were even
interested, since you don't have a server using this model (or the LWP
model) at all.


					Terry Lambert
					terry@cs.weber.edu
---
Any opinions in this posting are my own and not those of my present
or previous employers.