Date:      Wed, 27 Oct 1999 01:28:41 -0700 (PDT)
From:      Alfred Perlstein <bright@wintelcom.net>
To:        smp@freebsd.org
Subject:   SMP infoletter #1
Message-ID:  <Pine.BSF.4.05.9910270125180.12797-100000@fw.wintelcom.net>

Infoletter #1

This is the start of what I hope to be several informative documents
describing the current and ongoing state of SMP in FreeBSD.

The purpose is to avoid duplicate research of the current state of
FreeBSD's SMP behavior by those who haven't been following FreeBSD-SMP
since 'day one'.  It also points out some areas that are still
unclear to me.

This document was written on Tue Oct 26 1999 with reference to the
HEAD branch of the code; things may have changed significantly since.

I also hope that this series helps to shed some light onto the low
level routines in the kernel such as trap and interrupt handling,
ASTs and scheduling.

Where possible direct pointers are given to source code to reduce
the amount of digging one must do to locate routines of interest.

It is also important to note that this document is the result of
the author's investigation into the code, with much appreciated help
from various members of the FreeBSD development team (Poul-Henning
Kamp (phk), Alan Cox (alc), Matt Dillon (dillon)) and from Terry
Lambert.  As I am not the writer of the code, this document may
contain missing or incorrect information.

Please email any corrections or comments to smp@freebsd.org and
please make sure I get a CC. (alfred@freebsd.org)

------------------------------------------------------------

The Big Giant Lock: (src/sys/i386/i386/mplock.s)

SMP in FreeBSD is currently implemented by means of the Big Giant
Lock (BGL).

The BGL is an exclusive counting semaphore.  The CPU that holds the
lock may acquire it recursively; until that CPU has released it as
many times as it acquired it, all other CPUs spin while waiting to
acquire the lock.
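
To make the "exclusive counting semaphore" idea concrete, here is a
minimal sketch in portable C11.  It is not the code in mplock.s: the
sketch_* names and the word layout (owner CPU id in the top 8 bits,
nesting count in the low 24 bits, all ones meaning free) are
assumptions made for illustration.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical lock word layout, assumed for this sketch only. */
#define SKETCH_FREE        0xffffffffu   /* nobody holds the lock */
#define SKETCH_CPU_SHIFT   24            /* owner CPU id lives in the top 8 bits */
#define SKETCH_COUNT_MASK  0x00ffffffu   /* nesting count lives in the low 24 bits */

/* Try once to take the lock for `cpu`.  Returns 1 on success, 0 otherwise. */
int sketch_trylock(_Atomic uint32_t *lock, uint32_t cpu)
{
    uint32_t cur = atomic_load(lock);

    if (cur == SKETCH_FREE) {
        /* Free: claim it with a nesting count of 1. */
        uint32_t expected = SKETCH_FREE;
        return atomic_compare_exchange_strong(lock, &expected,
                                              (cpu << SKETCH_CPU_SHIFT) | 1u);
    }
    if ((cur >> SKETCH_CPU_SHIFT) == cpu) {
        /* Already ours: only the owner writes the word, so just bump the count. */
        atomic_store(lock, cur + 1u);
        return 1;
    }
    return 0;   /* held by another CPU */
}

/* Spin until the lock is ours. */
void sketch_getlock(_Atomic uint32_t *lock, uint32_t cpu)
{
    while (!sketch_trylock(lock, cpu))
        ;
}

/* Drop one level of nesting; the last release frees the lock. */
void sketch_rellock(_Atomic uint32_t *lock, uint32_t cpu)
{
    uint32_t cur = atomic_load(lock);

    assert(cur != SKETCH_FREE && (cur >> SKETCH_CPU_SHIFT) == cpu);
    if ((cur & SKETCH_COUNT_MASK) == 1u)
        atomic_store(lock, SKETCH_FREE);
    else
        atomic_store(lock, cur - 1u);
}
```

The point of the count is that a CPU already inside the kernel can
take a trap or interrupt and re-enter locked code without deadlocking
against itself.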

The implementation on i386 is contained in the file
src/sys/i386/i386/mplock.s

The function 'void MPgetlock(unsigned int *lock)' acquires the BGL.

An important side effect of MPgetlock is that it routes all interrupts
to the processor that has acquired the lock.  This is done so that
if an interrupt occurs the handler doesn't need to spin waiting for
the BGL.

The code responsible for routing the interrupts is the GRAB_HWI
macro within the MPgetlock code, which adjusts the local APIC's
interrupt priority level.
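
As a way to picture the routing, here is a toy model in C.  It does
not show what GRAB_HWI actually writes to the APIC; it only models
the idea that each CPU advertises a priority and that an interrupt
goes to the CPU advertising the lowest one.  All toy_* names and the
two priority values are invented for this sketch.

```c
#define TOY_NCPU 4

/* Hypothetical priority values: the lower value wins interrupt arbitration. */
enum { TOY_PRIO_HOLDER = 0, TOY_PRIO_OTHER = 1 };

static int toy_prio[TOY_NCPU] = {
    TOY_PRIO_OTHER, TOY_PRIO_OTHER, TOY_PRIO_OTHER, TOY_PRIO_OTHER
};

/* Each CPU only ever changes its own priority, as with a local APIC. */
void toy_lock_acquired(int cpu) { toy_prio[cpu] = TOY_PRIO_HOLDER; }
void toy_lock_released(int cpu) { toy_prio[cpu] = TOY_PRIO_OTHER; }

/* Pick the CPU that would receive the next interrupt in this model.
 * Ties go to the lowest-numbered CPU. */
int toy_route_interrupt(void)
{
    int best = 0;

    for (int i = 1; i < TOY_NCPU; i++)
        if (toy_prio[i] < toy_prio[best])
            best = i;
    return best;
}
```

In this model the lock holder always wins arbitration, so the
interrupt handler runs on a CPU that already owns the BGL and never
has to spin for it.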

Other MPlock functions exist in mplock.s to initialize, test and
release the lock.

---

Usage of the BGL: (src/sys/i386/i386/mplock.s)

The BGL is pushed down (acquired) on every entry into the kernel,
whether by syscall, trap or interrupt.

The file src/sys/i386/i386/exception.s contains all the initial
entry points for syscalls, traps and interrupts.

syscalls and 'altsyscalls' acquire the lock through the macros
SYSCALL_LOCK and ALTSYSCALL_LOCK, which map to the assembler
functions _get_syscall_lock and _get_altsyscall_lock on SMP machines
(if SMP is not defined they are not called).

_get_syscall_lock and _get_altsyscall_lock are also present in
src/sys/i386/i386/mplock.s.  They save the contents of the local
APIC's interrupt priority and call MPgetlock.

It would seem that the syscall lock could simply be delayed until
entry to the actual system call (write/read/...); however, several
issues arise:

1) fault on copyin of user's syscall arguments

This is actually a non-issue.  If a fault occurs, the processor will
spin to acquire the MPlock before potentially recursing into the
non-re-entrant vm system.  Although this leaves the processor in
a faulted state for quite some time, it is no different from when
CPU 1 has the lock and a process running on CPU 2 page faults.

Problem #1 takes care of itself because of the recursive MPlock.

2) ktrace hooks

src/sys/kern/kern_ktrace.c

The ktrace hooks in the syscalls manipulate kernel resources that
are not MP safe, and ktrace touches many parts of the kernel that
need work to become MP safe.  A temporary solution would be to
acquire the BGL when entering the ktrace code.

3) STOPEVENT aka void stopevent(struct proc*, unsigned int, unsigned int);

src/sys/kern/sys_process.c

stopevent will be called if the process is marked to sleep via
procfs.  Stopping the process requires entry into the scheduler,
which is not MP safe.

Again, a temporary hack would be to acquire the MPlock only when
that condition exists.
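
The temporary workarounds for issues 2 and 3 could be sketched
roughly as follows.  This is a single-threaded stand-in, not kernel
code: the struct fields, the sketch_* names and the depth counter
that plays the role of the BGL are all hypothetical.

```c
#include <assert.h>

/* Hypothetical per-process flags, assumed for this sketch. */
struct sketch_proc {
    int traced;         /* ktrace is enabled for this process */
    int stop_pending;   /* procfs asked for a stop on syscall entry */
};

static int sketch_giant_depth;   /* stands in for the BGL nesting count */

void sketch_giant_get(void) { sketch_giant_depth++; }

void sketch_giant_rel(void)
{
    assert(sketch_giant_depth > 0);
    sketch_giant_depth--;
}

/* A syscall handler that takes the lock itself, once it is entered. */
void sketch_locked_handler(void)
{
    sketch_giant_get();
    /* ... body of write/read/... would run here ... */
    sketch_giant_rel();
}

/* Syscall entry with the lock delayed.  Returns how many times the lock
 * had to be taken before the handler was reached. */
int sketch_syscall_entry(struct sketch_proc *p, void (*handler)(void))
{
    int taken = 0;

    if (p->stop_pending) {
        sketch_giant_get();     /* the scheduler is not MP safe */
        /* ... stopevent() would be called here ... */
        sketch_giant_rel();
        taken++;
    }
    if (p->traced) {
        sketch_giant_get();     /* ktrace is not MP safe */
        /* ... the ktrace hook would run here ... */
        sketch_giant_rel();
        taken++;
    }
    handler();
    return taken;
}
```

The common case, an untraced process with no pending stop, reaches
the handler without touching the lock at all.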

---

SPL issues:    (src/sys/i386/isa/ipl_funcs.c)

There exists an inherent race condition with the spl() system in
an MP environment.  Consider:

  system is at splbio:

  process A          process B

  int s;             int s;
  s = splhigh();                            /* spl raised to high however, 
                                               saved spl 's' has old value
                                               of splbio */
                     s = splhigh();         /* spl still high */
  splx(s);                                  /* processor spl now at bio
                                               even though B still needs
                                               splhigh */
                     splx(s);


Process B may be interrupted in a critical section.
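
The interleaving above can be replayed deterministically in plain C.
This is only a simulation under simplifying assumptions: the priority
level is modeled as a single ordered integer shared by both CPUs, and
the sim_* names are invented for the sketch.

```c
/* Hypothetical ordered levels, assumed for this sketch. */
enum { SIM_SPL0 = 0, SIM_SPLBIO = 1, SIM_SPLHIGH = 2 };

static int sim_cpl = SIM_SPLBIO;   /* one priority level shared by both CPUs */

/* Raise to the highest level and hand back the previous one. */
int sim_splhigh(void)
{
    int old = sim_cpl;

    sim_cpl = SIM_SPLHIGH;
    return old;
}

/* Restore a previously saved level. */
void sim_splx(int s)
{
    sim_cpl = s;
}

/* Replay the A/B interleaving from the text.  Returns the level that is in
 * force while B is still inside its critical section. */
int sim_replay_race(void)
{
    sim_cpl = SIM_SPLBIO;

    int sa = sim_splhigh();   /* A saves splbio, level is now high */
    int sb = sim_splhigh();   /* B saves splhigh, level stays high */

    sim_splx(sa);             /* A is done: level drops to splbio */
    int seen_by_b = sim_cpl;  /* B still needs splhigh at this point */

    sim_splx(sb);             /* B restores what it saved */
    return seen_by_b;
}
```

In this model B runs part of its critical section at splbio, and once
both are done the level is left at splhigh rather than back at splbio,
because B saved the level that A had raised.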

Also note that the asymmetric nature of the spl system makes it
very difficult to pinpoint locations in the bottom half of the
kernel (the part that services interrupts) that may collide with
the top half (user process context).

A short-sighted solution would be to enforce spl as an MPlock, an
exclusive counting semaphore.  However, since no locking protocol or
ordering of spl pushdown is required, deadlock becomes a major
problem.

The only solution that may work with spl is to push down the BGL
when first asserting any level of spl and to release the MPlock
when spl0 is reached.
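
A rough sketch of that pairing, again as a single-CPU stand-in: the
BGL is reduced to a flag, level 0 plays the role of spl0, and the
pd_* names are hypothetical.

```c
static int pd_cpl;          /* current level; 0 plays the role of spl0 */
static int pd_giant_held;   /* 1 while the BGL is held on behalf of spl */

/* Raise the level.  The first raise away from spl0 pushes down the BGL;
 * in a real kernel this is where the CPU would spin. */
int pd_splraise(int level)
{
    int old = pd_cpl;

    if (old == 0 && level > 0)
        pd_giant_held = 1;
    if (level > pd_cpl)
        pd_cpl = level;
    return old;
}

/* Restore a saved level.  Getting back to spl0 releases the BGL. */
void pd_splx(int s)
{
    pd_cpl = s;
    if (s == 0)
        pd_giant_held = 0;
}
```

Because the lock is held for as long as any level is asserted, a
second CPU could not get as far as saving a stale level, which is
what the race above depends on.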

It may also be interesting to see what a separate lock based only
on spl would accomplish.  Moving to a model where the spl entry
points become our new BGL might also be something to investigate.

Since spl is used only for short-term mutual exclusion, it may
actually work nicely as a coarse-grained locking system for the
time being.

---

Simple locks:   (src/sys/i386/i386/simplelock.s)

Cursory research into the CVS logs reveals the following

on the file kern/vfs_syscalls.c:

   1.28 Thu Jul 13 8:47:42 1995 UTC by davidg 
   Diffs to 1.27 

   NOTE: libkvm, w, ps, 'top', and any other utility which depends on struct
         proc or any VM system structure will have to be rebuilt!!!

   Much needed overhaul of the VM system. Included in this first round of
   changes:
 ...
   4) simple_lock's removed. Discussion with several people reveals that the
      SMP locking primitives used in the VM system aren't likely the mechanism
      that we'll be adopting. Even if it were, the locking that was in the code
      was very inadequate and would have to be mostly re-done anyway. The
      locking in a uni-processor kernel was a no-op but went a long way toward
      making the code difficult to read and debug.

However, with the Lite/2 merge they were re-introduced and the kernel
is littered with them.  The ones in place seem somewhat adequate
for short-term exclusion; essentially they are spinlocks.
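
For readers who have not met one, a spinlock of this kind can be
sketched in a few lines of C11.  This is not the code in
simplelock.s; the sketch_simple_* names are made up, and the atomic
flag stands in for whatever test-and-set the real implementation
uses.

```c
#include <stdatomic.h>

struct sketch_simplelock {
    atomic_flag held;   /* set while some CPU owns the lock */
};

void sketch_simple_init(struct sketch_simplelock *l)
{
    atomic_flag_clear(&l->held);
}

/* One attempt: returns 1 if we got the lock, 0 if it was already held. */
int sketch_simple_try(struct sketch_simplelock *l)
{
    return !atomic_flag_test_and_set_explicit(&l->held, memory_order_acquire);
}

/* Busy-wait until the lock is ours.  Not recursive: a second call from the
 * owner would spin forever. */
void sketch_simple_lock(struct sketch_simplelock *l)
{
    while (!sketch_simple_try(l))
        ;
}

void sketch_simple_unlock(struct sketch_simplelock *l)
{
    atomic_flag_clear_explicit(&l->held, memory_order_release);
}
```

Unlike the counting lock used for the BGL, this one keeps no owner
and no nesting count, which is why it only suits short, non-recursive
critical sections.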

What's interesting is that the simplelocks seem to provide for MP
sync with lockmgr locks; however, the code is littered with calls
to unsafe functions such as MALLOC.

It looks like someone decided to do the hard stuff first.

Why are the simplelocks necessary if the kernel is still guarded
by the BGL?  (besides use in the lockmgr)

---

Scheduler:

The scheduler in cpu_switch() (src/sys/i386/i386/swtch.s) saves the
current nesting level of the process's MPlock (after masking off
the CPUid bits from it) into the PCB (process control block) (lines
317-324) before switching to another process, at which point it
restores the next process's nesting level (lines 453-455).
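
In C terms the save and restore amount to something like the sketch
below.  The real code is assembler in swtch.s; the sw_* names, the
PCB field and the bit layout (CPU id in the top 8 bits, count in the
low 24) are assumptions, and the sketch also assumes the lock is held
whenever a switch happens.

```c
#include <stdint.h>

/* Hypothetical lock word layout, assumed for this sketch only. */
#define SW_CPU_SHIFT   24
#define SW_COUNT_MASK  0x00ffffffu

struct sw_pcb {
    uint32_t mpnest;   /* hypothetical field: saved nesting count */
};

/* Switching away: keep only the nesting count, drop the CPU id bits. */
void sw_save(uint32_t lockword, struct sw_pcb *old)
{
    old->mpnest = lockword & SW_COUNT_MASK;
}

/* Switching in: rebuild the lock word from this CPU's id and the count the
 * incoming process had when it was switched out. */
uint32_t sw_restore(const struct sw_pcb *next, uint32_t cpu)
{
    return (cpu << SW_CPU_SHIFT) | next->mpnest;
}
```

Dropping the CPU id on save matters because the process may be
resumed on a different CPU from the one it was running on.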

---

-Alfred Perlstein - [bright@rush.net|alfred@freebsd.org]
Wintelcom systems administrator and programmer
   - http://www.wintelcom.net/ [bright@wintelcom.net]




To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-smp" in the body of the message



