Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 29 Jul 2014 14:51:19 -0700
From:      Andrew Bates <andrewbates09@gmail.com>
To:        Adrian Chadd <adrian@freebsd.org>
Cc:        "freebsd-hackers@freebsd.org" <freebsd-hackers@freebsd.org>, Jeff Roberson <jeff@freebsd.org>
Subject:   Re: Working on NUMA support
Message-ID:  <CAPi5Lmk1zGSOg%2Bp2Hdhw2Uk%2BMZfOX-=bFFTsdH%2B5ZZwd7_oqeg@mail.gmail.com>
In-Reply-To: <CAJ-Vmom-wWZLCuuAEKDO1vuaGaSQM-=4e3xoh3OeVibc6m9Z8A@mail.gmail.com>
References:  <CAPi5LmkRO4QLbR2JQV8FuT=jw2jjcCRbP8jT0kj1g8Ks%2B7jv8A@mail.gmail.com> <CAJ-VmonJPT-NUSi=Wnu7a0oNwe8V=LQMZ-fZGriC7H44edRVLg@mail.gmail.com> <CAPi5Lm=8Z3fh_vxKY26qC3oEv1Ap%2BRvFGRAOhRosF5UEnDTVpw@mail.gmail.com> <00E55D89-BDD1-41AD-BBF6-6752B90E8324@ccsys.com> <CAJ-Vmom-wWZLCuuAEKDO1vuaGaSQM-=4e3xoh3OeVibc6m9Z8A@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
Hey Adrian,

Yes, there has been progress on this - although admittedly not as much as
we'd like at this point.  I believe to what you're talking about, we have
the layout for CPU affinity/locality.  I need to go through and cleanup a
good half-dozen branches of code.

Myself a mere mortal standing on the shoulders of giants in a room of
titans, I have to merge in my changes with Jeff's pertinent branch to get
this closer to useable.

>From my experience and research, in terms of access/response time:
1. localized DMA < all remote
2. (localized DMA + spillover remote) >= all remote

As ugly as it may be, I think I said that right..

There have been a few changes since that original email, but yes what we're
working to address is the userland <---> kernelspace.


On Sat, Jul 26, 2014 at 1:11 PM, Adrian Chadd <adrian@freebsd.org> wrote:

> Hi all!
>
> Has there been any further progress on this?
>
> I've been working on making the receive side scaling support usable by
> mere mortals and I've reached a point where I'm going to need this
> awareness in the 10ge/40ge drivers for the hardware I have access to.
>
> I'm right now more interested in the kernel driver/allocator side of
> things, so:
>
> * when bringing up a NIC, figure out what are the "most local" CPUs to run
> on;
> * for each NIC queue, figure out what the "most local" bus resources
> are for NIC resources like descriptors and packet memory (eg mbufs);
> * for each NIC queue, figure out what the "most local" resources are
> for local driver structures that the NIC doesn't touch (eg per-queue
> state);
> * for each RSS bucket, figure out what the "most local" resources are
> for things like packet memory (mbufs), tcp/udp/inp control structures,
> etc.
>
> I had a chat with jhb yesterday and he reminded me that y'all at
> isilon have been looking into this.
>
> He described a few interesting cases from the kernel side to me.
>
> * On architectures with external IO controllers, the path cost from an
> IO device to multiple CPUs may be (almost) equivalent, so there's not
> a huge penalty to allocate things on the wrong CPU. I think it'll be
> nice to get CPU local affinity where possible so we can parallelise
> DRAM access fully, but we can play with this and see.
> * On architectures with CPU-integrated IO controllers, there's a large
> penalty for doing inter-CPU IO,
> * .. but there's not such a huge penalty for doing inter-CPU memory access.
>
> Given that, we may find that we should always put the IO resources
> local to the CPU it's attached to, even if we decide to run some / all
> of the IO for the device on another CPU. Ie, any RAM that the IO
> device is doing data or descriptor DMA into should be local to that
> device. John said that in his experience it seemed the penalty for a
> non-local CPU touching memory was much less than device DMA crossing
> QPI.
>
> So the tricky bit is figuring that out and expressing it all in a way
> that allows us to do memory allocation and CPU binding in a more aware
> way. The other half of this tricky thing is to allow it to be easily
> overridden by a curious developer or system administrator that wants
> to experiment with different policies.
>
> Now, I'm very specifically only addressing the low level kernel IO /
> memory allocation requirements here. There's other things to worry
> about up in userland; I think you're trying to address that in your
> KPI descriptions.
>
> Thoughts?
>
>
> -a
>



-- 
V/Respectfully,
Andrew M Bates



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAPi5Lmk1zGSOg%2Bp2Hdhw2Uk%2BMZfOX-=bFFTsdH%2B5ZZwd7_oqeg>