Date:      Wed, 03 Dec 1997 15:15:01 -0800
From:      Joe Eykholt <jre@Ipsilon.COM>
To:        Steve Passe <smp@csn.net>
Cc:        smp@freebsd.org
Subject:   Re: SMP
Message-ID:  <3485E7F5.15FB7483@ipsilon.com>
References:  <199712031657.JAA08702@Ilsa.StevesCafe.com>

Steve,

You wrote:

>  3: begin the design of "the real thing".  My current lock-pushdown attempts
>     to co-exist with the splxxx paradigm.  I'm pretty much convinced at this
>     point that the work I've done in this area is only useful to prove it
>     ain't gonna' cut it.  We need to design IN DETAIL a shift to a mutex based
>     kernel.  One obvious question is whether we move both UP and SMP that
>     direction, or just SMP.  There are many, many other questions to be
>     answered.
>     It would be nice if we could progress on this issue in a serious manner,
>     getting a design wrapped up b4 I have to go back to my real job...
 

I've been playing with the SMP code a bit.  I have some design
suggestions (and some code snippets) that might or might not be useful.
You've probably already got lots of ideas about some of them.  For what
they're worth:

1.  I agree that the mutex_lock/unlock based approach would be nice and
more explicit, and possibly better even in the UP case.  Hopefully a
persuasive design document can convince people it's worth the pain.  It
should be possible to have spin-type mutexes correspond directly to the
various splxxx routines at first, and then break them up into
finer-grained locks.  I'd use the pthread_mutex_lock() interfaces
(except perhaps leave the pthread_ prefix off of the names).  The locks
would be initialized with information about which interrupts are
automatically blocked while the lock is held, etc.  BTW, I don't think
locks should be allowed to be recursively grabbed.
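
For illustration only (this is not code from the thread), the kind of
lock item 1 describes could be sketched in user-space C roughly as
follows.  Every name here (struct kmutex, kmutex_init, kmutex_lock,
kmutex_unlock, sim_cpl) is hypothetical, and sim_cpl is just a stand-in
for the real cpl on a single simulated CPU.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

typedef unsigned int intrmask_t;

#define NO_OWNER (-1)

/* Stand-in for the interrupt priority mask (cpl) of one simulated CPU. */
intrmask_t sim_cpl;

struct kmutex {
    atomic_bool locked;     /* spin lock word */
    atomic_int  owner;      /* CPU holding the lock, or NO_OWNER */
    intrmask_t  imask;      /* interrupts blocked while the lock is held */
    intrmask_t  saved_cpl;  /* cpl to restore on unlock */
};

/* Record at init time which interrupts the lock blocks. */
void kmutex_init(struct kmutex *m, intrmask_t imask)
{
    atomic_init(&m->locked, false);
    atomic_init(&m->owner, NO_OWNER);
    m->imask = imask;
    m->saved_cpl = 0;
}

void kmutex_lock(struct kmutex *m, int cpu)
{
    intrmask_t old = sim_cpl;

    /* Recursive grabs are a bug, not a feature. */
    assert(atomic_load(&m->owner) != cpu);

    /* Block the lock's interrupts first, as splxxx() would. */
    sim_cpl |= m->imask;

    /* Spin until the lock word flips from false to true for us. */
    while (atomic_exchange_explicit(&m->locked, true, memory_order_acquire))
        ;

    atomic_store(&m->owner, cpu);
    m->saved_cpl = old;
}

void kmutex_unlock(struct kmutex *m)
{
    intrmask_t old = m->saved_cpl;

    atomic_store(&m->owner, NO_OWNER);
    atomic_store_explicit(&m->locked, false, memory_order_release);

    /* Restore the interrupt mask, as splx() would. */
    sim_cpl = old;
}
```

With one such lock per splxxx level, a first pass could keep today's
semantics; splitting a lock later would only change which kmutex a
piece of code names, not how it is used.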

2.  Whether a first-level interrupt handler automatically gets a lock
blocking all similarly-registered (same imask) interrupts or not is an
area to consider.  This is analogous to raising CPL before re-enabling
interrupts.  I'd like to investigate very different interrupt models,
where interrupts are scheduled as separate threads, similar to but not
exactly like Solaris.  I think this is possible without massive driver
changes.
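
As a very rough illustration of the "interrupts as threads" idea (not
Solaris's design and not code from this thread), the first-level
handler could do nothing but mark a per-interrupt thread runnable,
leaving the driver's handler to run later in priority order.  All names
below (struct ithread, intr_first_level, ithread_run_one,
count_handler) are hypothetical, and a real kernel would need a real
scheduler and locking.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NITHREADS 4

struct ithread {
    int    it_pri;               /* higher value runs first */
    bool   it_runnable;          /* set by the first-level handler */
    void (*it_handler)(void *);  /* unchanged driver handler */
    void  *it_arg;
};

struct ithread ithreads[NITHREADS];

/* First-level handler: no driver work, just wake the interrupt thread. */
void intr_first_level(int irq)
{
    assert(irq >= 0 && irq < NITHREADS);
    ithreads[irq].it_runnable = true;
}

/*
 * Run the highest-priority runnable interrupt thread, if any.
 * Returns its index, or -1 when nothing is runnable.
 */
int ithread_run_one(void)
{
    int best = -1;

    for (int i = 0; i < NITHREADS; i++) {
        if (ithreads[i].it_runnable &&
            (best < 0 || ithreads[i].it_pri > ithreads[best].it_pri))
            best = i;
    }
    if (best >= 0) {
        ithreads[best].it_runnable = false;
        ithreads[best].it_handler(ithreads[best].it_arg);
    }
    return best;
}

/* Example driver handler: counts how often it was called. */
void count_handler(void *arg)
{
    (*(int *)arg)++;
}
```

Because the driver handler keeps its existing signature, a model like
this would not in itself require rewriting drivers.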

3.  I'd change the CPU-private variables to be inside a structure
(struct cpu).  There should be two structures, actually, one which is
portable and one which is machine-dependent.  The portable one can be
inside the machine-dependent one, or vice-versa.  This way, per-CPU
variables are easier to identify (e.g. cpl and ipending are per-CPU, and
seeing CPU->cpu_cpl makes this more clear).
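
One possible layout for item 3, sketched here with the portable
structure embedded in the machine-dependent one.  Apart from struct cpu
and CPU->cpu_cpl, which come from the text above, the names (struct
md_cpu, mc_apic_id, cpu_data, sim_curcpu, CPU_TO_MD) are assumptions
made for the example, and sim_curcpu merely simulates "the CPU we are
running on".

```c
#include <assert.h>
#include <stddef.h>

#define MAXCPU 4

/* Portable per-CPU state. */
struct cpu {
    int          cpu_id;
    unsigned int cpu_cpl;       /* current interrupt priority mask */
    unsigned int cpu_ipending;  /* pending-interrupt bits */
};

/* Machine-dependent per-CPU state, embedding the portable part. */
struct md_cpu {
    struct cpu   mc_cpu;        /* portable part */
    unsigned int mc_apic_id;    /* local APIC ID (i386-specific) */
};

struct md_cpu cpu_data[MAXCPU];

/* Simulated index of the current CPU (hypothetical). */
int sim_curcpu;

/* Pointer to the current CPU's portable structure. */
#define CPU (&cpu_data[sim_curcpu].mc_cpu)

/* Recover the machine-dependent structure from the portable one. */
#define CPU_TO_MD(c) \
    ((struct md_cpu *)((char *)(c) - offsetof(struct md_cpu, mc_cpu)))

/* Look up a CPU's portable structure by index. */
struct cpu *cpu_get(int id)
{
    assert(id >= 0 && id < MAXCPU);
    return &cpu_data[id].mc_cpu;
}
```

Portable code would then only ever see struct cpu, while
machine-dependent code uses CPU_TO_MD() when it needs the rest.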

4.  The per-CPU mapping causes problems with rfork and with
multi-threading in general, and may also hurt context-switching.  I
prefer the approach of reserving a segment and a segment register (%gs)
to select the per-CPU structure throughout the kernel.  Unfortunately
this does require loading %gs using the local APIC ID on every kernel
entry.  It does make some accesses somewhat more expensive ... I'm not
sure how much.  Solaris does this (which is what made me think of it).
On non-i386 architectures, there's usually some other register available
which the compiler doesn't use (%g7 on SPARC, %r2 on PowerPC) that can
point to the per-CPU or per-thread data.  Actually, pointing at the
per-thread (or per-process until there are kernel threads) data rather
than per-CPU data is better in the long term.  This is because then
preemption isn't a problem ... you can be preempted and moved to another
CPU and your per-thread data is still in the same place.  You can find
the current CPU pointer through your per-thread data.  Inlining
references through the %gs register can actually reduce code size from
the current curproc method.

5.  The APIC vectors need to be re-arranged into a priority order so
that interrupts don't need to access the I/O APIC to mask off the
interrupt during handling (or maybe there's a better way to do this).  I
noticed that level-sensitive interrupts are getting taken twice, because
the first interrupt only masks with CPL, so after the EOI and sti, the
interrupt is still pending and is taken again.  Only the second time
does it get masked in the I/O APIC.  Being a central resource, locking
the IOAPIC on every interrupt is unacceptable, so restructuring the
interrupts in a priority order and deferring the EOI until the end seems
necessary.  I don't completely like this, so I'm hoping there's a better
way.
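
To show why priority-ordered vectors plus a deferred EOI would avoid
the double delivery in item 5, here is a small simulation of the local
APIC priority rule as I understand it: a vector's priority class is its
upper four bits, and a pending interrupt is dispatched only if its
class is higher than both the task priority and the highest in-service
vector.  The structure and function names are hypothetical, and the
model ignores everything else the APIC does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct lapic_sim {
    uint8_t tpr;              /* task priority register */
    bool    in_service[256];  /* one in-service bit per vector */
};

/* Priority class of a vector: its upper four bits. */
static unsigned prio_class(unsigned vector)
{
    return vector >> 4;
}

/* Processor priority class: max of TPR class and highest in-service class. */
unsigned lapic_ppr_class(const struct lapic_sim *l)
{
    unsigned cls = prio_class(l->tpr);

    for (int v = 255; v >= 0; v--) {
        if (l->in_service[v]) {
            if (prio_class((unsigned)v) > cls)
                cls = prio_class((unsigned)v);
            break;
        }
    }
    return cls;
}

/* Would a pending interrupt on this vector be dispatched right now? */
bool lapic_deliverable(const struct lapic_sim *l, unsigned vector)
{
    assert(vector < 256);
    return prio_class(vector) > lapic_ppr_class(l);
}

/* CPU accepts the interrupt: its in-service bit is set. */
void lapic_accept(struct lapic_sim *l, unsigned vector)
{
    assert(vector < 256);
    l->in_service[vector] = true;
}

/* EOI clears the highest in-service bit. */
void lapic_eoi(struct lapic_sim *l)
{
    for (int v = 255; v >= 0; v--) {
        if (l->in_service[v]) {
            l->in_service[v] = false;
            return;
        }
    }
}
```

In this model, as long as the handler has not issued its EOI, a
still-asserted level-sensitive interrupt on the same vector (or any
vector of equal or lower class) stays pending without the I/O APIC
being touched, while higher-class vectors can still get through.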

If you'd like to discuss any of these areas further, I'd be happy to.  I
have some code developed around issues #3 and #4, which you could have.

	Thanks,
	Joe Eykholt


