From owner-freebsd-smp  Mon Apr 28 10:44:05 1997
Return-Path: <owner-smp>
Received: (from root@localhost)
          by hub.freebsd.org (8.8.5/8.8.5) id KAA27098
          for smp-outgoing; Mon, 28 Apr 1997 10:44:05 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.50])
          by hub.freebsd.org (8.8.5/8.8.5) with SMTP id KAA27031
          for <FreeBSD-SMP@FreeBSD.org>; Mon, 28 Apr 1997 10:43:57 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id KAA02151; Mon, 28 Apr 1997 10:41:47 -0700
From: Terry Lambert <terry@lambert.org>
Message-Id: <199704281741.KAA02151@phaeton.artisoft.com>
Subject: Re: SMP
To: ccsanady@nyx.pr.mcs.net (Chris Csanady)
Date: Mon, 28 Apr 1997 10:41:47 -0700 (MST)
Cc: black@zen.cypher.net, chuckr@mat.net, FreeBSD-SMP@FreeBSD.org
In-Reply-To: <199704280357.WAA12920@nyx.pr.mcs.net> from "Chris Csanady" at Apr 27, 97 10:57:12 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-smp@FreeBSD.org
X-Loop: FreeBSD.org
Precedence: bulk

> >freebsd-smp is not the best example of how to do SMP.  it uses the 
> >simplest method: one giant kernel lock.  i don't know that it is 
> >particularly representative of advanced SMP operating systems (though 
> >linux also uses a giant kernel lock).
> 
> Actually, linux has moved to a slightly finer grain system.  Now they
> have separate locks for the run queues, scheduler, and some other
> things...

These are isolated subsystems.  They don't ever reenter code on multiple
CPUs simultaneously.

This is not much of a symmetry win; it also isn't a scalable win if
they don't place the locks in a hierarchical relationship.  Without
that change, they are subject to deadly embrace (deadlock) if their
locking structure gets any more complex.
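
As a rough illustration of why the hierarchy matters, here is a minimal
sketch in C of ordered acquisition.  The names (hlock, lock_ctx,
hlock_acquire, hlock_release) and the integer "level" are hypothetical,
not taken from Linux or FreeBSD, and the code only models the ordering
rule; it does no real mutual exclusion.  If every context may only take
locks in strictly increasing level order, no two contexts can end up
waiting on each other in a cycle.

```c
#include <stdbool.h>

#define MAX_HELD 8   /* assumed bound on nesting depth, for the sketch only */

/* Hypothetical lock with a fixed rank in the hierarchy. */
struct hlock {
    int  level;      /* lower value = closer to the root of the hierarchy */
    bool held;
};

/* Hypothetical per-context record of the locks currently held. */
struct lock_ctx {
    int depth;               /* number of locks held */
    int levels[MAX_HELD];    /* levels of held locks, in acquisition order */
};

/* Take a lock, refusing any acquisition that violates the ordering. */
static bool hlock_acquire(struct lock_ctx *ctx, struct hlock *l)
{
    /* Out of order: the new lock is not strictly below the last one taken. */
    if (ctx->depth > 0 && l->level <= ctx->levels[ctx->depth - 1])
        return false;
    /* Already held elsewhere, or nesting too deep for this sketch. */
    if (l->held || ctx->depth == MAX_HELD)
        return false;
    l->held = true;
    ctx->levels[ctx->depth++] = l->level;
    return true;
}

/* Release the most recently acquired lock; other releases are ignored. */
static void hlock_release(struct lock_ctx *ctx, struct hlock *l)
{
    if (ctx->depth > 0 && ctx->levels[ctx->depth - 1] == l->level) {
        l->held = false;
        ctx->depth--;
    }
}
```

With a scheduler lock at level 1 and a run queue lock at level 2, a
context that takes the scheduler lock and then the run queue lock is
accepted, while one that holds the run queue lock and then asks for the
scheduler lock is refused instead of being allowed to deadlock.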

The correct approach to incremental improvement is "push down".  You
define locking primitives that assume transitive closure over all the
other locks in the kernel (i.e., that there is a hierarchical
relationship and intention modes), and initially run them all off the
single lock.

Then you "push them down" into the kernel subsystems, "through" the
system call interface.
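
One way to picture the "push down" step, as a sketch only: callers
always go through a per-subsystem primitive, and at first every
primitive resolves to the one global lock.  Pushing down a subsystem
just re-points its entry at a lock of its own, with no change to the
callers.  Everything here (the subsys enum, klock, lock_for,
subsys_enter, subsys_exit, push_down) is hypothetical naming, and the
"holders" counter stands in for a real lock so the effect is visible.

```c
/* Hypothetical subsystem identifiers. */
enum subsys { SUBSYS_VFS, SUBSYS_NET, SUBSYS_VM, SUBSYS_COUNT };

/* Stand-in for a real lock: just counts current holders. */
struct klock { int holders; };

static struct klock giant;                 /* the single kernel lock */
static struct klock own[SUBSYS_COUNT];     /* per-subsystem locks, unused at first */

/* Initially every subsystem primitive runs off the single lock. */
static struct klock *lock_for[SUBSYS_COUNT] = { &giant, &giant, &giant };

/* Callers use these and never name a lock directly. */
static void subsys_enter(enum subsys s) { lock_for[s]->holders++; }
static void subsys_exit(enum subsys s)  { lock_for[s]->holders--; }

/* Push the lock for one subsystem down from the giant lock. */
static void push_down(enum subsys s) { lock_for[s] = &own[s]; }
```

After push_down(SUBSYS_VFS), entering the VFS no longer touches the
giant lock, while the subsystems that have not been converted still do.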

The important thing to note is that you lock *subsystems*, not *data*
at this point.

Every time you descend the hierarchy completely, you free the top level
lock to be nothing but an intention mode holder, and you precompute
Warshall's (transitive closure) to the n-1 level in the tree to make
conflict calculation trivial and nearly instantaneous.
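
For reference, a minimal sketch of the precomputation, assuming the
algorithm meant is Warshall's transitive closure over a boolean "is
parent of" matrix.  NLOCKS, the matrix layout, and the function names
are illustrative assumptions.  Once the closure is computed, asking
whether one lock sits above another in the hierarchy is a single array
lookup.

```c
#define NLOCKS 4   /* assumed number of locks in the hierarchy */

/*
 * Warshall's algorithm: on entry reach[i][j] is nonzero if lock i is the
 * direct parent of lock j; on return it is nonzero if lock i is any
 * ancestor of lock j.
 */
static void warshall(unsigned char reach[NLOCKS][NLOCKS])
{
    for (int k = 0; k < NLOCKS; k++)
        for (int i = 0; i < NLOCKS; i++)
            for (int j = 0; j < NLOCKS; j++)
                if (reach[i][k] && reach[k][j])
                    reach[i][j] = 1;
}

/* Constant-time ancestry test against the precomputed closure. */
static int is_ancestor(unsigned char reach[NLOCKS][NLOCKS], int a, int b)
{
    return reach[a][b] != 0;
}
```

The closure costs O(n^3) to build, but it only has to be rebuilt when
the shape of the hierarchy changes, not on every lock operation.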

Eventually, the terminal locks all lock data, not subsystems, and you
can use non-SMP-aware subsystem components (like protocol stacks or FS
modules) by locking the non-terminal locks... but you want to use
SMP-aware code where possible.

This is different from the Solaris approach, where they lock subsystems
nearly exclusively, and functionally encapsulate the data.  For one
thing, the Solaris stuff is only scalable to 4-8 CPUs before the
interprocessor contention makes it worthless to add more CPUs.  All
you have to do is look at the VFS code to see the scalability problems
in Solaris.  SVR4 is not any better (exception: the Unisys 6000/50 SMP
SVR4.0.2, independently developed by Unisys, did FS locking the right
way).


					Regards,
					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.