Date:      Mon, 18 Aug 2014 21:39:25 +0300
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        "Alexander V. Chernikov" <melifaro@FreeBSD.org>
Cc:        arch@freebsd.org, Gleb Smirnoff <glebius@freebsd.org>, "Andrey V. Elsukov" <ae@freebsd.org>
Subject:   Re: superpages for UMA
Message-ID:  <20140818183925.GP2737@kib.kiev.ua>
In-Reply-To: <53F215A9.8010708@FreeBSD.org>
References:  <53F215A9.8010708@FreeBSD.org>


On Mon, Aug 18, 2014 at 07:03:05PM +0400, Alexander V. Chernikov wrote:
> Hello list.
>
> Currently UMA(9) uses PAGE_SIZE kegs to store items in.
> It seems fine for most usage scenarios,  however there are some where
> very large number of items is required.
>
> I've run into this problem while using ipfw tables (radix based) with
> ~50k records. This is how
> `pmcstat -TS DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK -w1` looks like:
> PMC: [DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK] Samples: 2359 (100.0%) , 0 unresolved
>
> %SAMP IMAGE      FUNCTION             CALLERS
>   28.7 kernel     rn_match             ipfw_lookup_table:21.7 rtalloc_fib_nolock:7.0
>   25.5 ipfw.ko    ipfw_chk             ipfw_check_hook
>    6.0 kernel     rn_lookup            ipfw_lookup_table
>
> Some numbers: table entry occupies 128 bytes, so we may store no more
> than 30 records in single page-sized keg.
> 50k records require more than 1500 kegs.
> As far as I understand second-level TLB for modern Intel CPU may be 256
> or 512 entries( for 4K pages ), so using large number of entries
> results in TLB cache misses constantly happening.
>
> Other examples:
> Route tables (in current implementation): struct rte occupies more than
> 128 bytes and storing full-view (> 500k routes) would result in TLB
> misses happening all of the time.
> Various stateful packet processing: modern SLB/firewall can have
> millions of states. Regardless of state size PAGE_SIZE'd kegs is not the
> best choice.
>
> All of these can be addressed:
> Ipfw tables/ipfw dynamic state allocation code can (and will) be
> rewritten to use uma+uma_zone_set_allocf (suggested by glebius),
> radix should simply be changed to a different lookup algo (as it is
> happening in ipfw tables).
>
> However, we may consider on adding another UMA flag to allocate
> 2M/1G-sized kegs per request.
> (Additionally, Intel Haswell arch has 512 entries in STLB shared?
> between 4k/2M so it should help the former).
>
> What do you think?
>
Zones with small object sizes use uma_small_alloc() to request a physical
page and its KVA mapping. On amd64, uma_small_alloc() allocates a
physical page and returns the direct map address for the page. The
direct map is built from large pages (2MB, or 1GB if available). In this
sense, your allocations already use large pages for virtual memory
translations.
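
To make the direct map point concrete, here is a rough user-space sketch,
not kernel code. It assumes the direct map is a fixed offset from the
physical address (on amd64 the kernel's own macro for this is
PHYS_TO_DMAP()). The base address, the 1TB bound and every sketch_* name
are made up for illustration. The only thing it shows is that two 4K pages
inside the same 2MB-aligned physical range share one large-page
translation, while pages further apart do not.

```c
#include <assert.h>
#include <stdint.h>

/* ASSUMPTION: illustrative direct-map base, not a value from this thread. */
#define SKETCH_DMAP_BASE	0xfffff80000000000ULL
/* ASSUMPTION: illustrative upper bound on physical addresses (1TB). */
#define SKETCH_DMAP_SPAN	(1ULL << 40)
#define SKETCH_PAGE_2M		(2ULL * 1024 * 1024)

/* Direct-map virtual address of a physical address: a fixed offset. */
static uint64_t
sketch_phys_to_dmap(uint64_t pa)
{
	assert(pa < SKETCH_DMAP_SPAN);
	return (SKETCH_DMAP_BASE + pa);
}

/* Nonzero if both physical addresses are covered by one 2MB mapping. */
static int
sketch_same_large_page(uint64_t pa1, uint64_t pa2)
{
	uint64_t va1 = sketch_phys_to_dmap(pa1);
	uint64_t va2 = sketch_phys_to_dmap(pa2);

	return (va1 / SKETCH_PAGE_2M == va2 / SKETCH_PAGE_2M);
}
```

For example, physical pages 0x10000000 and 0x10001000 land in the same
2MB mapping, while 0x10000000 and 0x10200000 need two separate large-page
TLB entries.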

Zones are not local in the KVA, i.e. objects from the same zone are
usually far apart in the KVA.  Zones do not get dedicated submaps to
contain the zone-owned pages.

Note that the large-page TLB is usually relatively small.  E.g. on my
Nehalem machine, it has only 32 entries which can hold 2MB pages,
which gives 64MB of cached address space translations in
the best case.  You might try reducing the available memory to
see increased locality and a better DTLB hit ratio, if your load
can survive with less memory.
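
The arithmetic behind these numbers, as a small sketch. The inputs are the
figures from this thread (32 entries of 2MB, 50k records of 128 bytes);
the function names are made up for illustration, and the results are
best-case bounds that ignore slab headers and the actual placement of
pages.

```c
#include <assert.h>
#include <stdint.h>

/* Address space covered by a TLB with `entries` slots of `page_size` bytes. */
static uint64_t
tlb_reach(uint64_t entries, uint64_t page_size)
{
	assert(page_size != 0);
	return (entries * page_size);
}

/* Pages of `page_size` bytes needed to hold `bytes` bytes, rounded up. */
static uint64_t
pages_spanned(uint64_t bytes, uint64_t page_size)
{
	assert(page_size != 0);
	return ((bytes + page_size - 1) / page_size);
}
```

With these, tlb_reach(32, 2MB) is the 64MB mentioned above. The 50k
records are 6400000 bytes of data, so pages_spanned() gives 4 large pages
if the data were packed densely, but 1563 separate 4K pages otherwise. If
those 4K pages are scattered over the direct map, each may fall into a
different 2MB mapping, which is far more than 32 entries can cache.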



