Date:      Thu, 19 Feb 2015 09:46:35 -0500
From:      John Baldwin <jhb@freebsd.org>
To:        arch@freebsd.org
Subject:   RFC: bus_get_cpus(9)
Message-ID:  <1848011.eGOHhpCEMm@ralph.baldwin.cx>

One of the next steps for NUMA device-awareness is a way to let drivers know 
which CPUs are ideal to use for interrupts (and in particular this is targeted 
at multiqueue NICs that want to create a TX/RX ring pair per CPU).  However, 
for modern Intel systems at least, it is usually best to use CPUs from the 
physical processor package that contains the I/O hub that a device connects to 
(e.g. to allow DDIO to work).

The PoC API I came up with is a new bus method called bus_get_cpus() that 
returns a requested cpuset for a given device.  It accepts an enum for the 
second parameter that says the type of cpuset being requested.  Currently two 
values are supported:

 - LOCAL_CPUS (on x86 this returns all the CPUs in the package closest to the
   device when NUMA is enabled)
 - INTR_CPUS (like LOCAL_CPUS but only returns 1 SMT thread for each core)

For a NIC driver the expectation is that the driver will call 
'bus_get_cpus(dev, INTR_CPUS, &set)' and create queues for each of the CPUs in 
'set'.  (In my current patchset I have updated igb(4) to use this approach.)
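To make the expected calling pattern concrete, here is a minimal userland sketch of it. It is not kernel code and not the igb(4) change from the patch: the `model_` names, the 64-bit mask standing in for a cpuset, and the `-1`/`EINVAL` return convention are all assumptions made so the example compiles on its own. Only the two set types and the "one queue per CPU in INTR_CPUS" idea come from the proposal.

```c
/* Userland model only; every model_* name is hypothetical. */
#include <errno.h>
#include <stdint.h>

#define MODEL_MAXCPU 64

/* Stand-in for a kernel cpuset: one bit per CPU id. */
typedef uint64_t model_cpuset_t;

/* The two set types named in the proposal. */
enum model_cpu_sets { LOCAL_CPUS, INTR_CPUS };

/* Hypothetical device that already knows what its parent bus would report. */
struct model_device {
	model_cpuset_t local_cpus;
	model_cpuset_t intr_cpus;
};

/* Stand-in for bus_get_cpus(dev, op, &set); returns 0 or an errno value. */
static int
model_bus_get_cpus(const struct model_device *dev, enum model_cpu_sets op,
    model_cpuset_t *set)
{
	switch (op) {
	case LOCAL_CPUS:
		*set = dev->local_cpus;
		return (0);
	case INTR_CPUS:
		*set = dev->intr_cpus;
		return (0);
	}
	return (EINVAL);
}

/*
 * Assign one queue to each CPU in the device's INTR_CPUS set.
 * queue_cpu[i] receives the CPU id chosen for queue i.
 * Returns the number of queues assigned, or -1 on error.
 */
static int
model_setup_queues(const struct model_device *dev, int *queue_cpu,
    int max_queues)
{
	model_cpuset_t set;
	int cpu, nqueues = 0;

	if (model_bus_get_cpus(dev, INTR_CPUS, &set) != 0)
		return (-1);
	for (cpu = 0; cpu < MODEL_MAXCPU && nqueues < max_queues; cpu++) {
		if (set & ((model_cpuset_t)1 << cpu))
			queue_cpu[nqueues++] = cpu;
	}
	return (nqueues);
}
```

In this model, a device whose INTR_CPUS mask is 0x55 ends up with four queues, bound to CPUs 0, 2, 4 and 6.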

For systems that do not support NUMA (or if it is not enabled in the kernel 
config), LOCAL_CPUS is mapped to 'all_cpus' by default in the 'root_bus' 
driver.  INTR_CPUS is also mapped to 'all_cpus' by default.
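The default behaviour can be pictured with a small userland model. This is only a sketch under assumptions: it supposes that a request is handed up the device tree until some bus answers it, with the root bus answering 'all_cpus' as the last resort. The `model_` names, the bitmask cpuset, the four-CPU system, and the walk-up loop are all hypothetical and are not taken from the patch.

```c
/* Userland model only; every model_* name is hypothetical. */
#include <stddef.h>
#include <stdint.h>

/* Stand-in for a kernel cpuset: one bit per CPU id. */
typedef uint64_t model_cpuset_t;

enum model_cpu_sets { LOCAL_CPUS, INTR_CPUS };

struct model_device {
	struct model_device *parent;	/* NULL for the root bus */
	int has_local_override;		/* bus knows a better LOCAL_CPUS */
	model_cpuset_t local_override;
};

/* Pretend system with four CPUs (ids 0-3). */
static const model_cpuset_t model_all_cpus = 0xF;

/*
 * Ask each ancestor bus in turn.  If none of them has anything more
 * specific to say, fall back to all_cpus, as the root bus would.
 */
static model_cpuset_t
model_get_cpus(const struct model_device *dev, enum model_cpu_sets op)
{
	const struct model_device *bus;

	for (bus = dev->parent; bus != NULL; bus = bus->parent) {
		if (op == LOCAL_CPUS && bus->has_local_override)
			return (bus->local_override);
	}
	return (model_all_cpus);
}
```

With no overrides anywhere in the tree (the non-NUMA case), both LOCAL_CPUS and INTR_CPUS come back as 0xF for every device in this model.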

The x86 interrupt code maintains its own set of interrupt CPUs which this 
patch now exposes via INTR_CPUS in the x86 nexus driver.

The ACPI bus driver and PCI bridge drivers use _PXM to return a suitable 
LOCAL_CPUS set when _PXM exists and NUMA is enabled.  They also AND the global 
INTR_CPUS set from the nexus driver with the per-domain set from _PXM to 
generate a local INTR_CPUS set for child devices.
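The set arithmetic here is just an intersection. The following userland sketch shows it with a bitmask in place of a real cpuset. The `model_` names and the `has_pxm` flag are hypothetical, and the fallback used when _PXM is missing (returning the caller-supplied set unchanged) is an assumption about the default path rather than something taken from the patch.

```c
/* Userland model only; every model_* name is hypothetical. */
#include <stdint.h>

/* Stand-in for a kernel cpuset: one bit per CPU id. */
typedef uint64_t model_cpuset_t;

struct model_bridge {
	int has_pxm;			/* nonzero if _PXM was found */
	model_cpuset_t domain_cpus;	/* CPUs in the domain _PXM names */
};

/* LOCAL_CPUS: the _PXM domain if known, otherwise the default set. */
static model_cpuset_t
model_bridge_local_cpus(const struct model_bridge *br,
    model_cpuset_t all_cpus)
{
	return (br->has_pxm ? br->domain_cpus : all_cpus);
}

/* INTR_CPUS: the global interrupt set restricted to the _PXM domain. */
static model_cpuset_t
model_bridge_intr_cpus(const struct model_bridge *br,
    model_cpuset_t global_intr)
{
	if (!br->has_pxm)
		return (global_intr);
	return (global_intr & br->domain_cpus);
}
```

As an illustration with made-up numbers: if the global interrupt set is 0x5555 (every other CPU of 16) and the bridge's domain covers CPUs 8-15 (0xFF00), child devices see a local INTR_CPUS of 0x5500, i.e. CPUs 8, 10, 12 and 14.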

The current patch can be found here:

https://github.com/bsdjhb/freebsd/compare/bsdjhb:master...numa_bus_get_cpus

It includes a few other fixes besides the implementation of bus_get_cpus() (and 
some things have already been committed such as 
taskqueue_start_threads_cpuset() and CPU_COUNT()):

 - It fixes the x86 interrupt code to exclude modern SMT threads from the
   default interrupt set.  (Previously only Pentium 4-era HTT threads were
   excluded.)
 - It has a sample conversion of igb(4) to this interface (albeit ugly using
   #if's).

Longer term I think I would like to make the INTR_CPUS thing a bit more 
formal.  In particular, Solaris allows you to alter the set of CPUs that 
handle interrupts via prctl (or a tool named something close to that).  I 
think I would like to have a dedicated global cpuset for that (but not named 
"2", it would be a new WHICH level).  That would allow userland to use cpuset 
to alter the set of CPUs that handle interrupts in case you wanted to use SMT 
for example.  I think if we do this that all ithreads would have their cpusets 
hang off of this set instead of the root set (which would also remove some of 
the recent special case handling for ithreads I believe).  The one uglier part 
about this is that we should probably then have a way to notify drivers that 
INTR_CPUS changed so that they could try to cope gracefully.  I think that's a 
bit of a longer horizon thing, but for now I think bus_get_cpus() is a good 
next step.

What do other folks think?  (And yes, I know it needs a manpage before it goes 
in, but I'd rather get the API agreed on before polishing that.)

-- 
John Baldwin


