Date:      Tue, 16 Feb 2016 12:50:22 -0800
From:      John Baldwin <jhb@freebsd.org>
To:        arch@freebsd.org
Subject:   Starting APs earlier during boot
Message-ID:  <1730061.8Ii36ORVKt@ralph.baldwin.cx>

Currently the kernel bootstraps the non-boot processors fairly early in the
SI_SUB_CPU SYSINIT.  The APs then spin waiting to be "released".  We currently
release the APs as one of the last steps at SI_SUB_SMP.  On the one hand this
removes much of the need for synchronization while SYSINITs are running since
SYSINITs basically assume they are single-threaded.  However, it also enforces
some odd quirks.  Several places that deal with per-CPU resources have to
split initialization up so that the BSP init happens in one SYSINIT and the
initialization of the APs happens in a second SYSINIT at SI_SUB_SMP.

Another issue that is becoming more prominent on x86 (and probably will also
affect other platforms if it doesn't already) is that to support working
interrupts for interrupt config hooks we bind all interrupts to the BSP during
boot and only distribute them among other CPUs near the end at SI_SUB_SMP. 
This is especially problematic with drivers for modern hardware allocating
num(CPUs) interrupts (hoping to use one per CPU).  On x86 we have about 190
IDT vectors available for device interrupts, so in theory we should be able to
tolerate a lot of drivers doing this (e.g. 60 drivers could allocate 3
interrupts for every CPU and we should still be fine).  However, if you have,
say, 32 cores in a system, then you can only handle about 5 drivers doing
this before you run out of vectors on CPU 0.
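
As a rough sketch of the arithmetic above: the helper names below are
hypothetical (they are not kernel functions), and 190 is only the approximate
vector count quoted above, not an exact constant.

```c
#include <assert.h>

/* Approximate number of x86 IDT vectors usable for device interrupts,
 * taken from the "about 190" figure above (an assumption, not exact). */
#define	DEVICE_VECTORS_PER_CPU	190

/*
 * Drivers that fit while every interrupt is bound to the BSP during boot.
 * Each driver allocates irqs_per_cpu * ncpus interrupts, and all of them
 * consume vectors on CPU 0.
 */
int
max_drivers_bsp_only(int ncpus, int irqs_per_cpu)
{
	assert(ncpus > 0 && irqs_per_cpu > 0);
	return (DEVICE_VECTORS_PER_CPU / (ncpus * irqs_per_cpu));
}

/*
 * Drivers that fit once interrupts are spread out so that each CPU only
 * carries its own irqs_per_cpu vectors per driver.
 */
int
max_drivers_distributed(int irqs_per_cpu)
{
	assert(irqs_per_cpu > 0);
	return (DEVICE_VECTORS_PER_CPU / irqs_per_cpu);
}
```

With 32 CPUs and one interrupt per CPU, max_drivers_bsp_only(32, 1) gives 5,
while max_drivers_distributed(3) still leaves room for the 60 drivers in the
example.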

Longer term we would also like to eventually have most drivers attach in the 
same environment during boot as during post-boot.  Right now post-boot is 
quite different as all CPUs are running, interrupts work, etc.  One of the 
goals of multipass support for new-bus is to help us get there by probing 
enough hardware to get timers working and starting the scheduler before 
probing the rest of the devices.  That goal isn't quite realized yet.

However, we can run a slightly simpler version of our scheduler before
timers are working.  In fact, sleep/wakeup work just fine fairly early (we
allocate the necessary structures at SI_SUB_KMEM which is before the APs
are even started).  Once idle threads are created and ready we could in
theory let the APs start up and run other threads.  You just don't have working 
timeouts.  OTOH, you can sort of simulate timeouts if you modify the scheduler 
to yield the CPU instead of blocking the thread for a sleep with a timeout.  
The effect would be for threads that do sleeps with a timeout to fall back to 
polling before timers are working.  In practice, all of the early kernel 
threads use sleeps without timeouts when idle so this doesn't really matter.

I've implemented these changes and tested them for x86.  For x86 at least
AP startup needed some bits of the interrupt infrastructure in place, so
I moved SI_SUB_SMP up to after SI_SUB_INTR but before SI_SUB_SOFTINTR.  I
modified the *sleep() and cv_*wait*() routines to not always bail if cold
is true.  Instead, sleeps without a timeout are permitted to sleep
"normally".  Sleeps with a timeout drop their interlock and yield the
CPU (but remain runnable).  Since APs are now fully running this means
interrupts are now routed to all CPUs from the get-go, removing the need for 
the post-boot shuffle.  This also resolves the issue of running out of IDT 
vectors on the boot CPU.
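
To make the sleep policy concrete, here is a minimal user-space model of the
decision described above.  It is not the code from the branch: the enum, the
function name, and the cold_boot parameter (standing in for the kernel's
"cold" flag) are all hypothetical, and the real routines also have to deal
with the interlock and the scheduler.

```c
#include <assert.h>
#include <stdbool.h>

/* What a sleep request should do (hypothetical names for illustration). */
enum sleep_action {
	SLEEP_BLOCK,	/* sleep "normally" until a wakeup arrives */
	SLEEP_YIELD	/* drop the interlock and yield, but stay runnable */
};

/*
 * Model of the policy: a timeout of 0 means "no timeout".  While cold,
 * timers are not working yet, so a timed sleep cannot be woken by its
 * timeout and falls back to yielding (the caller effectively polls).
 */
enum sleep_action
early_sleep_action(bool cold_boot, long timeout_ticks)
{
	assert(timeout_ticks >= 0);
	if (!cold_boot)
		return (SLEEP_BLOCK);	/* post-boot: timeouts fire */
	if (timeout_ticks == 0)
		return (SLEEP_BLOCK);	/* wakeup() alone is enough */
	return (SLEEP_YIELD);
}
```

The only case that changes behavior is a timed sleep while cold, which matches
the observation that early kernel threads mostly sleep without timeouts.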

I believe that adapting other platforms to this change should be relatively
simple, but we should do that before committing the full patch.  I do think
that some parts of the patch (such as the changes to the sleep routines, and
using SI_SUB_LAST instead of SI_SUB_SMP as a catch-all SYSINIT) can be 
committed now without breaking anything.

However, I'd like feedback on the general idea and if it is acceptable I'd
like to coordinate testing with other platforms so this can go into the
tree.

The current changes are in the 'ap_startup' branch at github/bsdjhb/freebsd.
You can view them here:

https://github.com/bsdjhb/freebsd/compare/master...bsdjhb:ap_startup

-- 
John Baldwin


