From: Konstantin Belousov
To: "Alexander V. Chernikov"
Cc: arch@freebsd.org, Gleb Smirnoff, "Andrey V. Elsukov"
Subject: Re: superpages for UMA
Date: Mon, 18 Aug 2014 21:39:25 +0300
Message-ID: <20140818183925.GP2737@kib.kiev.ua>
In-Reply-To: <53F215A9.8010708@FreeBSD.org>
List-Id: Discussion related to FreeBSD architecture

On Mon, Aug 18, 2014 at 07:03:05PM +0400, Alexander V. Chernikov wrote:
> Hello list.
>
> Currently UMA(9) uses PAGE_SIZE kegs to store items in.
> It seems fine for most usage scenarios, however there are some where
> very large number of items is required.
>
> I've run into this problem while using ipfw tables (radix based) with
> ~50k records. This is how
> `pmcstat -TS DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK -w1` looks like:
> PMC: [DTLB_LOAD_MISSES.MISS_CAUSES_A_WALK] Samples: 2359 (100.0%) , 0 unresolved
>
> %SAMP IMAGE      FUNCTION             CALLERS
>  28.7 kernel     rn_match             ipfw_lookup_table:21.7 rtalloc_fib_nolock:7.0
>  25.5 ipfw.ko    ipfw_chk             ipfw_check_hook
>   6.0 kernel     rn_lookup            ipfw_lookup_table
>
> Some numbers: table entry occupies 128 bytes, so we may store no more
> than 30 records in single page-sized keg.
> 50k records require more than 1500 kegs.
> As far as I understand second-level TLB for modern Intel CPU may be 256
> or 512 entries (for 4K pages), so using large number of entries
> results in TLB cache misses constantly happening.
>
> Other examples:
> Route tables (in current implementation): struct rte occupies more than
> 128 bytes and storing full-view (> 500k routes) would result in TLB
> misses happening all of the time.
> Various stateful packet processing: modern SLB/firewall can have
> millions of states. Regardless of state size PAGE_SIZE'd kegs is not the
> best choice.
>
> All of these can be addressed:
> Ipfw tables/ipfw dynamic state allocation code can (and will) be
> rewritten to use uma+uma_zone_set_allocf (suggested by glebius),
> radix should simply be changed to a different lookup algo (as it is
> happening in ipfw tables).
>
> However, we may consider on adding another UMA flag to allocate
> 2M/1G-sized kegs per request.
> (Additionally, Intel Haswell arch has 512 entries in STLB shared?
> between 4k/2M so it should help the former).
>
> What do you think?
>

Zones with small object sizes use uma_small_alloc() to request a
physical page and its KVA mapping. On amd64, uma_small_alloc()
allocates a physical page and returns the direct map address for the
page. The direct map is built from large pages (2MB, or 1GB if
available). In this sense, your allocations already use large pages
for virtual memory translations.

Zones are not local in the KVA, i.e. objects from the same zone are
usually far apart in the KVA. Zones do not get dedicated submaps to
contain the zone-owned pages.

Note that the large-page TLB is usually relatively small. E.g. on my
Nehalem machine, it has only 32 entries that can hold 2MB pages, which
gives 64MB of cached address space translations in the best case.

You might try reducing the available memory to see increased locality
and a better DTLB hit ratio, if your load can survive with less
memory.
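
As a rough illustration of the arithmetic above, here is a minimal
Python sketch. It uses only figures quoted in this thread (30 items per
4K page, 50k records, 512 4K entries, 32 2MB entries). The helper names
tlb_reach() and pages_needed() are hypothetical, not part of any
FreeBSD API, and the "worst case" line assumes every backing page
falls into a different 2MB region of the direct map, which is an
illustrative assumption and not a measurement.

```python
PAGE_4K = 4 * 1024
PAGE_2M = 2 * 1024 * 1024


def tlb_reach(entries, page_size):
    """Bytes of address space covered if each TLB entry maps a distinct page."""
    return entries * page_size


def pages_needed(nitems, items_per_page):
    """Number of pages required to hold nitems, rounded up."""
    return -(-nitems // items_per_page)


# Figures taken from the thread; nothing here is measured.
ITEMS_PER_PAGE = 30   # "no more than 30 records" of 128 bytes per page
NITEMS = 50_000       # size of the ipfw table in the original report

if __name__ == "__main__":
    pages = pages_needed(NITEMS, ITEMS_PER_PAGE)
    print("4K pages backing the table:", pages)
    print("bytes backing the table:", pages * PAGE_4K)
    print("reach of 512 4K entries:", tlb_reach(512, PAGE_4K))
    print("reach of 32 2MB entries:", tlb_reach(32, PAGE_2M))
    # Assumed worst case: each 4K page sits in its own 2MB region, so the
    # table can touch as many distinct 2MB translations as it has pages.
    print("worst-case distinct 2MB regions:", pages)
```

The point of the last line is that the table itself is small compared
to 64MB, but because zone pages are scattered over the direct map, the
number of distinct 2MB translations touched can still far exceed 32.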