Date:      Tue, 21 May 1996 10:22:17 -0700 (MST)
From:      Terry Lambert <terry@lambert.org>
To:        davem@caip.rutgers.edu (David S. Miller)
Cc:        terry@lambert.org, jehamby@lightside.com, jkh@time.cdrom.com, current@freebsd.org, hackers@freebsd.org
Subject:   Re: Congrats on CURRENT 5/1 SNAP...
Message-ID:  <199605211722.KAA01411@phaeton.artisoft.com>
In-Reply-To: <199605210823.EAA07997@huahaga.rutgers.edu> from "David S. Miller" at May 21, 96 04:23:57 am

> 
>    The SunOS LWP's are pretty easy.
> 
> Actually SunOS does do lwp scheduling where it checks for AST's etc.
> although I don't know how relevant that is to what's being discussed.

Yes.  It uses aioread/aiowrite/aiowait/aiocancel; these are closer
to an event flag cluster than AST's.


> Furthermore, the way Solaris does threads in the kernel has been
> proven to be a lose (pre-emption, a billion mutexes in the kernel,
> another thousand read writer locks) and expect the industry to move in
> "another" direction.  Computer science has proven that current smp
> technology (read as: what SVR4.2MP based kernels do right now) cannot
> scale past 32 cpu's without an exponential loss in performance.

This is an artifact of their VM implementation, and the number is
generally acknowledged to be 8 processors, not 32.  It's possible to
get a modified NUMA arrangement in an SMP environment by using
per-processor page allocation pools.  You're free to put SLAB
allocators on top of those pages.  The allocation mutex then need
only be held when a per-processor page pool is refilled from, or
released back to, the general page pool.  Add a hierarchical lock
manager that computes the transitive closure over the lock hierarchy
(treating it as a directed graph), couple it with intention mode
locking, and there should be a significant decrease in bus overhead.
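
A minimal sketch of the per-processor page pool idea, written as a
user-space Python simulation rather than kernel code.  The class names
(GlobalPagePool, CpuPagePool) and the batch and high-water sizes are
assumptions made up for illustration; they do not come from any real
kernel.  The point it tries to show is only that the shared mutex is
taken on refill and drain, not on every allocation.

```python
import threading


class GlobalPagePool:
    """General page pool; every access takes the single allocation mutex."""

    def __init__(self, npages):
        self.free = list(range(npages))   # page numbers stand in for pages
        self.mutex = threading.Lock()
        self.acquisitions = 0             # how often the mutex was taken

    def take(self, n):
        """Hand out up to n pages in one locked operation."""
        with self.mutex:
            self.acquisitions += 1
            n = min(n, len(self.free))
            batch, self.free = self.free[:n], self.free[n:]
            return batch

    def give(self, pages):
        """Return a batch of pages in one locked operation."""
        with self.mutex:
            self.acquisitions += 1
            self.free.extend(pages)


class CpuPagePool:
    """Per-processor pool; only its own CPU touches it, so the fast path
    takes no lock.  (A real kernel would also have to keep the CPU from
    being preempted here; that is ignored in this sketch.)"""

    def __init__(self, shared, batch=16, high_water=32):
        self.shared = shared
        self.batch = batch                # hypothetical refill size
        self.high_water = high_water      # hypothetical drain threshold
        self.free = []

    def alloc(self):
        if not self.free:                 # slow path: refill under the mutex
            self.free = self.shared.take(self.batch)
            if not self.free:
                raise MemoryError("out of pages")
        return self.free.pop()            # fast path: no lock

    def release(self, page):
        self.free.append(page)            # fast path: no lock
        if len(self.free) > self.high_water:
            surplus = self.free[self.batch:]
            del self.free[self.batch:]
            self.shared.give(surplus)     # slow path: drain under the mutex


shared = GlobalPagePool(1024)
cpu0 = CpuPagePool(shared)
pages = [cpu0.alloc() for _ in range(100)]
# 100 allocations with a batch of 16 take the shared mutex 7 times.
print(shared.acquisitions)
```

A SLAB-style allocator would sit on top of CpuPagePool.alloc() and
carve the pages into objects; that layer is left out here.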

I *don't* think you'd want a non-symmetric implementation.

It should be noted that multithreading UFS in SVR4 (UnixWare)
resulted in a 160% performance improvement -- even after the
performance loss for using mutexes for locking was subtracted
out.

> Clustering is the answer and can scale to more CPU's than you can
> count in an unsigned char. ;-)

So can the scheme described above.
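
To make the locking half of that scheme a little more concrete, here
is a small sketch of intention mode locking over a lock hierarchy.
It uses the usual IS/IX/S/SIX/X compatibility matrix from multiple
granularity locking.  The LockTree class, the try_lock name, and the
example hierarchy are assumptions for illustration only.  The
hierarchy is assumed to be a tree, so the closure over it reduces to
walking the chain of ancestors; a general directed graph would need a
real transitive closure computation.

```python
# Which held modes a requested mode can coexist with.
COMPATIBLE = {
    "IS":  {"IS", "IX", "S", "SIX"},
    "IX":  {"IS", "IX"},
    "S":   {"IS", "S"},
    "SIX": {"IS"},
    "X":   set(),
}

# Intention mode to place on every ancestor of the node being locked.
INTENT = {"IS": "IS", "S": "IS", "IX": "IX", "SIX": "IX", "X": "IX"}


class LockTree:
    def __init__(self, parent_of):
        self.parent_of = parent_of    # node -> parent (root maps to None)
        self.held = {}                # node -> list of (owner, mode)

    def path(self, node):
        """Ancestors of node from the root down, ending with node itself."""
        chain = []
        while node is not None:
            chain.append(node)
            node = self.parent_of[node]
        return chain[::-1]

    def try_lock(self, owner, node, mode):
        """Take intention locks on the ancestors and `mode` on node.
        Returns False, changing nothing, if any other owner conflicts."""
        chain = self.path(node)
        wanted = [(n, INTENT[mode]) for n in chain[:-1]] + [(node, mode)]
        for n, m in wanted:
            for other, held_mode in self.held.get(n, []):
                if other != owner and held_mode not in COMPATIBLE[m]:
                    return False
        for n, m in wanted:
            self.held.setdefault(n, []).append((owner, m))
        return True


tree = LockTree({"fs": None, "dirA": "fs", "dirB": "fs", "file1": "dirA"})
ok_writer = tree.try_lock("t1", "file1", "X")   # IX on fs and dirA, X on file1
ok_other = tree.try_lock("t2", "dirB", "X")     # disjoint subtree, no conflict
blocked = tree.try_lock("t3", "dirA", "S")      # S on dirA conflicts with t1's IX
print(ok_writer, ok_other, blocked)
```

The two writers never contend on anything but the intention locks
near the root, and those are compatible with each other, which is
where the reduction in lock traffic would come from.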


					Terry Lambert
					terry@lambert.org
---
Any opinions in this posting are my own and not those of my present
or previous employers.


