From: Alfred Perlstein
Date: Mon, 12 Nov 2012 22:04:23 -0800
To: Andre Oppermann
Cc: "freebsd-net@freebsd.org", Adrian Chadd, Peter Wemm
Subject: Re: auto tuning tcp

On 11/12/12 10:48 AM, Alfred Perlstein wrote:
> On 11/12/12 10:01 AM, Andre Oppermann wrote:
>>
>> I've already added the tunable "kern.maxmbufmem" which is in pages.
>> That's probably not very convenient to work with.  I can change it
>> to a percentage of phymem/kva.  Would that make you happy?
>>
>
> It really makes sense to have the hash table be some relation to
> sockets rather than buffers.
>
> If you are hashing "foo-objects" you want the hash to be some relation
> to the max amount of "foo-objects" you'll see, not backwards derived
> from the number of "bar-objects" that "foo-objects" contain, right?
>
> Because we are hashing the sockets, right?  not clusters.
>
> Maybe I'm wrong?  I'm open to ideas.

Hey Andre, the following patch is what I was thinking
(uncompiled/untested). It basically rounds maxsockets up to a power
of 2 and uses that in place of the default tcb hashsize of 512.

It might make sense to make the auto-tuning default to a minimum of 512.

There are a number of other hashes with static sizes that could make use
of this logic, provided it's not upside-down.

Any thoughts on this?

Tune the tcp pcb hash based on maxsockets.
Be more forgiving of poorly chosen tunables by finding a closer power
of two rather than clamping down to 512.

Index: tcp_subr.c
===================================================================
--- tcp_subr.c	(revision 242936)
+++ tcp_subr.c	(working copy)
@@ -235,7 +235,7 @@
  * variable net.inet.tcp.tcbhashsize
  */
 #ifndef TCBHASHSIZE
-#define TCBHASHSIZE	512
+#define TCBHASHSIZE	0
 #endif
 
 /*
@@ -282,6 +282,27 @@
 	return (0);
 }
 
+/*
+ * Take a value and get the next power of 2 that doesn't overflow.
+ * Used to size the tcp_inpcb hash buckets.
+ */
+static int
+maketcp_hashsize(int size)
+{
+	int hashsize;
+
+	/*
+	 * auto tune.
+	 * get the next power of 2 higher than size.
+	 */
+	hashsize = 1 << fls(size);
+	/* catch overflow, and just go one power of 2 smaller */
+	if (hashsize < size) {
+		hashsize = 1 << (fls(size) - 1);
+	}
+	return hashsize;
+}
+
 void
 tcp_init(void)
 {
@@ -296,9 +317,20 @@
 
 	hashsize = TCBHASHSIZE;
 	TUNABLE_INT_FETCH("net.inet.tcp.tcbhashsize", &hashsize);
+	if (hashsize == 0) {
+		/* auto tune based on maxsockets */
+		hashsize = maketcp_hashsize(maxsockets);
+	}
+	/*
+	 * Be forgiving of admins that don't know to make the tunable
+	 * a power of two.
+	 */
 	if (!powerof2(hashsize)) {
-		printf("WARNING: TCB hash size not a power of 2\n");
-		hashsize = 512; /* safe default */
+		int oldhashsize = hashsize;
+
+		hashsize = maketcp_hashsize(hashsize);
+		printf("%s: WARNING: TCB hash size not a power of 2, "
+		    "fixed %d -> %d\n", __func__, oldhashsize, hashsize);
 	}
 	in_pcbinfo_init(&V_tcbinfo, "tcp", &V_tcb, hashsize, hashsize,
 	    "tcp_inpcb", tcp_inpcb_init, NULL, UMA_ZONE_NOFREE,
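
Since the patch is uncompiled, a rough userland sketch of the rounding
logic may help with sanity-checking the numbers outside the kernel. The
names sketch_fls() and sketch_hashsize() are made up for illustration and
are not in the tree; sketch_fls() is assumed to behave like the kernel's
fls() (1-based index of the highest set bit, 0 for 0). Unlike the patch,
this version tests the bit count before shifting rather than comparing
after a shift that may already have overflowed.

```c
#include <assert.h>
#include <limits.h>

/*
 * Hypothetical stand-in for the kernel's fls(): 1-based index of the
 * most significant set bit, or 0 when no bit is set.
 */
static int
sketch_fls(unsigned int mask)
{
	int bit = 0;

	while (mask != 0) {
		mask >>= 1;
		bit++;
	}
	return (bit);
}

/*
 * Hypothetical userland version of the rounding in the patch: return
 * the next power of 2 strictly above size, or one power of 2 smaller
 * if the shift would not fit in a signed int.
 */
static int
sketch_hashsize(int size)
{
	int bits;

	assert(size > 0);
	bits = sketch_fls((unsigned int)size);
	/* 1 << bits would hit the sign bit, so back off one power of 2. */
	if (bits >= (int)(sizeof(int) * CHAR_BIT) - 1)
		return (1 << (bits - 1));
	return (1 << bits);
}
```

With this, sketch_hashsize(1000) gives 1024. An exact power of 2 is
bumped to the next one (512 becomes 1024), which only matters on the
auto-tune path, since the tunable path skips the rounding when
powerof2() is already true.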