Date:      Thu, 9 Oct 2014 21:53:52 -0600
From:      Warner Losh <imp@bsdimp.com>
To:        Adrian Chadd <adrian@FreeBSD.org>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: [rfc] enumerating device / bus domain information
Message-ID:  <838B58B2-22D6-4AA4-97D5-62E87101F234@bsdimp.com>
In-Reply-To: <CAJ-VmonbGW1JbEiKXJ0sQCFr0%2BCRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>
References:  <CAJ-VmokF7Ey0fxaQ7EMBJpCbgFnyOteiL2497Z4AFovc%2BQRkTA@mail.gmail.com> <2975E3D3-0335-4739-9242-5733CCEE726C@bsdimp.com> <CAJ-VmonbGW1JbEiKXJ0sQCFr0%2BCRphVrSuBhFnh1gq6-X1CFdQ@mail.gmail.com>



On Oct 8, 2014, at 5:12 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:

> On 8 October 2014 12:07, Warner Losh <imp@bsdimp.com> wrote:
>>
>> On Oct 7, 2014, at 7:37 PM, Adrian Chadd <adrian@FreeBSD.org> wrote:
>>
>>> Hi,
>>>
>>> Right now we're not enumerating any NUMA domain information about devices.
>>>
>>> The more recent intel NUMA stuff has some extra affinity information
>>> for devices that (eventually) will allow us to bind kernel/user
>>> threads and/or memory allocation to devices to keep access local.
>>> There's a penalty for DMAing in/out of remote memory, so we'll want to
>>> figure out what counts as "Local" for memory allocation and perhaps
>>> constrain the CPU set that worker threads for a device run on.
>>>
>>> This patch adds a few things:
>>>
>>> * it adds a bus_if.m method for fetching the VM domain ID of a given
>>> device; or ENOENT if it's not in a VM domain;
>>
>> Maybe a default VM domain. All devices are in VM domains :) By default
>> today, we have only one VM domain, and that’s the model that most of the
>> code expects…
>
> Right, and that doesn't change until you compile in with num domains > 1.

The first part of the statement doesn’t change when the number of domains
is more than one. All devices are in a VM domain.

> Then, CPUs and memory have VM domains, but devices may or may not have
> a VM domain. There's no "default" VM domain defined if num domains >
> 1.

Please explain how a device can lack a VM domain. In the terminology
I'm familiar with, to even get cycles to the device, you have to have
a memory address (or an I/O port). That memory address necessarily maps
to some domain, even if that domain is equally sucky to get to from all
CPUs (as is the case with I/O ports). While there may not be a “default”
domain, by virtue of its physical location the device has to have one.
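
As a rough illustration of that argument (a sketch only: the struct, the
constants, and the function name are hypothetical and not taken from the
patch), a lookup that walks toward the root bus always ends with some
answer. Treating the root as domain 0 is an assumption made for the
example, not a claim about what the tree does.

```c
#include <assert.h>
#include <stddef.h>

#define DOMAIN_UNSPEC	(-1)	/* this node has no answer of its own */
#define DOMAIN_ROOT	0	/* assumed answer of the root bus */

/* Hypothetical, simplified device node; not the kernel's device_t. */
struct dev_node {
	const char *name;
	const struct dev_node *parent;	/* NULL for the root bus */
	int domain;			/* DOMAIN_UNSPEC or a domain id */
};

/*
 * Ask the device, then each parent bus in turn, until one of them
 * reports a domain. If nobody does, fall back to the root's domain.
 */
static int
dev_get_domain(const struct dev_node *dev)
{
	assert(dev != NULL);	/* callers must pass a real device */
	for (; dev != NULL; dev = dev->parent) {
		if (dev->domain != DOMAIN_UNSPEC)
			return (dev->domain);
	}
	return (DOMAIN_ROOT);
}
```

With a tree such as nexus -> pcib1 (domain 1) -> ix0, the NIC inherits
domain 1 from its bridge, while a device hanging directly off the root
ends up with the root's domain rather than with "no domain".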

> The devices themselves don't know about VM domains right now, so
> there's nothing constraining things like IRQ routing, CPU set, memory
> allocation, etc. The isilon team is working on extending the cpuset
> and allocators to "know" about numa and I'm sure this stuff will fall
> out of whatever they're working on.

Why would the device need to know the domain? Why aren’t the IRQs,
for example, steered to the appropriate CPU? Why doesn’t the bus handle
allocating memory for it in the appropriate place? How does this “domain”
tie into memory allocation and thread creation?

> So when I go to add sysctl and other tree knowledge for device -> vm
> domain mapping I'm going to make them return -1 for "no domain."

Seems like there are too many things lumped together here. First off,
how can there be no domain? That just hurts my brain. It has to be in
some domain, or it can’t be seen. Maybe this domain is one that sucks for
everybody to access, maybe it is one that’s fast for some CPU or package
of CPUs to access, but it has to have a domain.

> (Things will get pretty hilarious later on if we have devices that are
> "local" to two or more VM domains ..)

Well, devices aren’t local to domains, per se. Devices can communicate
with other components in a system at a given cost. One NUMA model is
“near” vs “far”, where a single near domain exists and all the “far”
resources are quite costly. Other NUMA models may have a wider range of
costs, so that some resources are cheap, others are a little less cheap,
while others are downright expensive, depending on how far across the
fabric of interconnects the messages need to travel. While one can model
this as a full 1-1 partitioning, that doesn’t match all of the extant
implementations, even today. It is easy, but an imperfect match to the
underlying realities in many cases (though a very good match to x86,
which is mostly what we care about).
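
To make the difference between the two models concrete, here is a small
sketch. Everything in it is an assumption for illustration: the domain
count, the cost values, and the function names are made up and describe
no real machine or existing kernel interface.

```c
#include <assert.h>

#define NDOM	4	/* number of domains in this made-up system */

/* Relative access cost, numa_cost[from][to]; lower is cheaper. */
static const int numa_cost[NDOM][NDOM] = {
	{ 10, 16, 22, 22 },
	{ 16, 10, 22, 22 },
	{ 22, 22, 10, 16 },
	{ 22, 22, 16, 10 },
};

/* Near/far model: a resource is either in the local domain or not. */
static int
is_near(int from, int to)
{
	assert(from >= 0 && from < NDOM && to >= 0 && to < NDOM);
	return (from == to);
}

/*
 * Graded model: among the candidate domains in avail[0..n-1], return
 * the one that is cheapest to reach from 'from'. Ties go to the
 * earliest candidate; returns -1 when there are no candidates.
 */
static int
cheapest_domain(int from, const int *avail, int n)
{
	int best = -1;

	assert(from >= 0 && from < NDOM);
	for (int i = 0; i < n; i++) {
		int d = avail[i];

		assert(d >= 0 && d < NDOM);
		if (best < 0 || numa_cost[from][d] < numa_cost[from][best])
			best = d;
	}
	return (best);
}
```

Under the near/far view, domains 1, 2 and 3 all look equally "far" from
domain 0. With the cost table, domain 1 is clearly preferable to 2 or 3
when local memory is unavailable, which is the information a strict 1-1
partitioning throws away.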

Warner



