From owner-freebsd-hackers Tue May 21 18:45:23 1996
Return-Path: owner-hackers
Received: (from root@localhost) by freefall.freebsd.org (8.7.3/8.7.3) id SAA24487 for hackers-outgoing; Tue, 21 May 1996 18:45:23 -0700 (PDT)
Received: from phaeton.artisoft.com (phaeton.Artisoft.COM [198.17.250.211]) by freefall.freebsd.org (8.7.3/8.7.3) with SMTP id SAA24482; Tue, 21 May 1996 18:45:21 -0700 (PDT)
Received: (from terry@localhost) by phaeton.artisoft.com (8.6.11/8.6.9) id SAA02800; Tue, 21 May 1996 18:40:10 -0700
From: Terry Lambert
Message-Id: <199605220140.SAA02800@phaeton.artisoft.com>
Subject: Re: Congrats on CURRENT 5/1 SNAP...
To: imp@village.org (Warner Losh)
Date: Tue, 21 May 1996 18:40:09 -0700 (MST)
Cc: davem@caip.rutgers.edu, terry@lambert.org, jehamby@lightside.com, jkh@time.cdrom.com, current@FreeBSD.ORG, hackers@FreeBSD.ORG
In-Reply-To: <199605220117.TAA01774@rover.village.org> from "Warner Losh" at May 21, 96 07:17:54 pm
X-Mailer: ELM [version 2.4 PL24]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-hackers@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

> Don't know if clustering is the answer because then you get a lot of
> data copying overhead that is free on a shared memory MP box.  Does
> lock contention get so bad that you actually win by data copying for
> huge numbers of CPUs?  I don't know, but it is certainly an area of
> research that will be fruitful for a few years.

I haven't responded to the other post yet myself because of the time
needed to give it fair treatment.  I dislike the clustering solution
at a gut level.

I have to say I agree that the data copying overhead is a big loss,
especially if you plan to migrate processes between clusters (anyone
else ever run Amoeba?).

The locking overhead only applies to contended memory access; using
per processor pools instead of a Solaris/SVR4 pure SLAB allocator
resolves most of the locking issues.  The locking overhead scales
relative to bus contention and cache invalidation and/or update
overhead for distributed cache coherency.  This is more a design flaw
in the SMP model used by Solaris/SVR4 than it is an inherent flaw in
the idea of shared memory SMP.

Hybrid NUMA architectures using per processor page pools instead of
sharing all memory are a big win (Sequent proved this; where Sequent
screwed up was in not doing real FS multithreading).  Even so, Sequent
was able to scale a BSD-derived OS to 32 processors without a lot of
undue overhead.

The idea of SLAB allocation (as treated in the Vahalia book) is a good
one, but it probably wants to be implemented on top of a global memory
allocator fronted by per processor pools, to make it less contentious.

The idea of distributed cluster computing only works for small domain
distribution; the net is currently too unreliable (and too slow) for
any serious process migration over a big domain.  And the net is
probably only going to get worse in the short term.  By the time it's
fixed, I suspect content servers will have a big edge over compute
servers as a desirable resource.


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.
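
P.S.: To make the per processor pool idea above concrete, here is a
minimal userland sketch in C.  It is only an illustration: pthreads
and an explicit "cpu" argument stand in for real per-CPU kernel
context, and NCPU, BATCH, pool_alloc, pool_free, and the rest are
names I made up for the sketch; this is not Sequent's, Sun's, or
FreeBSD's actual allocator code.  The point is just that the common
alloc/free path never touches the global lock; the lock is taken only
to move whole batches of objects between a local list and the global
pool.

/*
 * Sketch: a fixed-size-object allocator with a per-processor free
 * list in front of a single locked global pool.  The common path
 * touches only the caller's own list, so it takes no lock and
 * generates no cross-processor cache traffic; the global mutex is
 * hit only when a local list runs dry or grows too fat.
 */
#include <pthread.h>
#include <stdio.h>
#include <stdlib.h>

#define NCPU      4       /* assumed number of processors */
#define OBJ_SIZE  128     /* fixed object size for this pool */
#define BATCH     32      /* objects moved per lock acquisition */

struct obj {
	struct obj *next;
};

static struct obj *global_pool;         /* shared; guarded by lock */
static pthread_mutex_t global_lock = PTHREAD_MUTEX_INITIALIZER;

/* One private free list per processor; no lock needed to touch it. */
static struct obj *local_pool[NCPU];
static int local_count[NCPU];

/* Refill a local list with one batch, taken under the global lock. */
static void
refill(int cpu)
{
	pthread_mutex_lock(&global_lock);
	for (int i = 0; i < BATCH; i++) {
		struct obj *o = global_pool;
		if (o == NULL) {
			/* Global pool dry: fall back to malloc(). */
			o = malloc(OBJ_SIZE);
			if (o == NULL)
				break;
		} else {
			global_pool = o->next;
		}
		o->next = local_pool[cpu];
		local_pool[cpu] = o;
		local_count[cpu]++;
	}
	pthread_mutex_unlock(&global_lock);
}

/* Common-path allocation: lock-free pop from our own list. */
void *
pool_alloc(int cpu)
{
	if (local_pool[cpu] == NULL)
		refill(cpu);
	struct obj *o = local_pool[cpu];
	if (o != NULL) {
		local_pool[cpu] = o->next;
		local_count[cpu]--;
	}
	return o;
}

/* Common-path free: lock-free push; drain a batch back to the
 * global pool only when the local list passes its high-water mark. */
void
pool_free(int cpu, void *p)
{
	struct obj *o = p;

	o->next = local_pool[cpu];
	local_pool[cpu] = o;
	if (++local_count[cpu] > 2 * BATCH) {
		pthread_mutex_lock(&global_lock);
		for (int i = 0; i < BATCH; i++) {
			o = local_pool[cpu];
			local_pool[cpu] = o->next;
			o->next = global_pool;
			global_pool = o;
			local_count[cpu]--;
		}
		pthread_mutex_unlock(&global_lock);
	}
}

int
main(void)
{
	/* Single-threaded smoke test standing in for per-CPU callers. */
	void *p = pool_alloc(0);
	printf("cpu 0 got %p, local count %d\n", p, local_count[0]);
	pool_free(0, p);
	printf("freed, local count %d\n", local_count[0]);
	return 0;
}

Moving objects BATCH at a time is what keeps the lock (and the cache
lines behind it) off the common path; the 2 * BATCH high-water mark
just keeps one busy processor from hoarding the whole pool.  A SLAB
layer would slot in underneath, handing out and reclaiming whole
slabs instead of malloc() doing it.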