Date: Tue, 13 Nov 2012 00:18:25 -0800
From: Alfred Perlstein <bright@mu.org>
To: Andre Oppermann <oppermann@networx.ch>
Cc: "freebsd-net@freebsd.org" <freebsd-net@freebsd.org>, Adrian Chadd <adrian@freebsd.org>, Peter Wemm <peter@wemm.org>
Subject: Re: auto tuning tcp
Message-ID: <50A20251.7010302@mu.org>
In-Reply-To: <50A1FF80.3040900@networx.ch>
References: <50A0A0EF.3020109@mu.org> <50A0A502.1030306@networx.ch> <50A0B8DA.9090409@mu.org> <50A0C0F4.8010706@networx.ch> <EB2C22B5-C18D-4AC2-8694-C5C0D96C07B3@mu.org> <50A13961.1030909@networx.ch> <50A14460.9020504@mu.org> <50A1E2E7.3090705@mu.org> <50A1E47C.1030208@mu.org> <CAGE5yCoj1dL9w-EMMi8iYMTOq9uUUHmFg4rMY7aPneUBHBv67Q@mail.gmail.com> <50A1EC92.9000507@mu.org> <50A1FF80.3040900@networx.ch>

On 11/13/12 12:06 AM, Andre Oppermann wrote:
> On 13.11.2012 07:45, Alfred Perlstein wrote:
>> On 11/12/12 10:23 PM, Peter Wemm wrote:
>>> On Mon, Nov 12, 2012 at 10:11 PM, Alfred Perlstein <bright@mu.org> wrote:
>>>> On 11/12/12 10:04 PM, Alfred Perlstein wrote:
>>>>> On 11/12/12 10:48 AM, Alfred Perlstein wrote:
>>>>>> On 11/12/12 10:01 AM, Andre Oppermann wrote:
>>>>>>>
>>>>>>> I've already added the tunable "kern.maxmbufmem" which is in pages.
>>>>>>> That's probably not very convenient to work with.  I can change it
>>>>>>> to a percentage of phymem/kva.  Would that make you happy?
>>>>>>>
>>>>>> It really makes sense to have the hash table be some relation to
>>>>>> sockets rather than buffers.
>>>>>>
>>>>>> If you are hashing "foo-objects" you want the hash to be some
>>>>>> relation to the max amount of "foo-objects" you'll see, not
>>>>>> backwards derived from the number of "bar-objects" that
>>>>>> "foo-objects" contain, right?
>>>>>>
>>>>>> Because we are hashing the sockets, right?  not clusters.
>>>>>>
>>>>>> Maybe I'm wrong?  I'm open to ideas.
>>>>>
>>>>> Hey Andre, the following patch is what I was thinking
>>>>> (uncompiled/untested), it basically rounds up the maxsockets to a
>>>>> power of 2 and replaces the default 512 tcb hashsize.
>>>>>
>>>>> It might make sense to make the auto-tuning default to a minimum
>>>>> of 512.
>>>>>
>>>>> There are a number of other hashes with static sizes that could
>>>>> make use of this logic provided it's not upside-down.
>>>>>
>>>>> Any thoughts on this?
>>>>>
>>>>> Tune the tcp pcb hash based on maxsockets.
>>>>> Be more forgiving of poorly chosen tunables by finding a closer power
>>>>> of two rather than clamping down to 512.
>>>>> Index: tcp_subr.c
>>>>> ===================================================================
>>>>
>>>> Sorry, GUI mangled the patch... attaching a plain text version.
>>>>
>>>>
>>> Wait, you want to replace a hash with a flat array?  Why even bother
>>> to call it a hash at that point?
>>>
>>>
>>
>> If you are concerned about the space/time tradeoff I'm pretty happy
>> with making it 1/2, 1/4th, 1/8th the size of maxsockets.  (smaller?)
>>
>> Would that work better?
>
> I'd go for 1/8 or even 1/16 with a lower bound of 512.  More than
> that is excessive.

I'm OK with 1/8.  All I'm really going for is trying to make it somewhat
better than 512 when un-tuned.

>
>> The reason I chose to make it equal to max sockets was a space/time
>> tradeoff, ideally a hash should have zero collisions and if a user
>> has enough memory for 250,000 sockets, then surely they have
>> enough memory for 256,000 pointers.
>
> I agree in general.  Though not all large memory servers do serve a
> large amount of connections.  We have to find a tradeoff here.
>
> Having a perfect hash would certainly be laudable.  As long as the
> average hash chain doesn't go beyond a few entries it's not a problem.
>
>> If you strongly disagree then I am fine with a more conservative
>> setting, just note that effectively the hash table will require 1/2
>> the factor that we go smaller in additional traversals when we max
>> out the number of sockets.  Meaning if the table is 1/4 the size of
>> max sockets, when we hit that many tcp connections I think we'll see
>> an order of average 2 linked list traversals to find a node.
>> At 1/8, then that number becomes 4.
>
> I'm fine with that and claim that if you expect N sockets that you
> would also increase maxfiles/sockets to N*2 to have some headroom.

That is a good point.

>
>> I recall back in 2001 on a PII400 with a custom webserver I wrote
>> having a huge benefit by upping this to 2^14 or maybe even 2^16,
>> I forget, but suddenly my CPU went down a huge amount and I didn't
>> have to worry about a load balancer or other tricks.
>
> I can certainly believe that.  A hash size of 512 is no good if
> you have more than 4K connections.
>
> PS: Please note that my patch for mbuf and maxfiles tuning is not yet
> in HEAD, it's still sitting in my tcp_workqueue branch.  I still have
> to search for derived values that may get totally out of whack with
> the new scaling scheme.
>

This is cool!  Thank you for the feedback.

Would you like me to put this on a user branch somewhere for you to
merge into your perf branch?

-Alfred
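
The sizing rule the thread converges on (a fraction of maxsockets, rounded up to a power of two, with a lower bound of 512) could be sketched roughly as follows. This is only an illustration and not the patch from the thread: the names tcbhash_autosize, roundup_pow2, tcbhash_avg_chain, TCBHASH_MIN and TCBHASH_DIVISOR are hypothetical, and the divisor of 8 simply reflects the "1/8" figure agreed on above.

```c
#include <limits.h>

/* Hypothetical constants; the values follow the discussion in the thread. */
#define TCBHASH_MIN     512u	/* lower bound on the number of buckets */
#define TCBHASH_DIVISOR 8u	/* size the table at 1/8 of maxsockets */

/*
 * Round n up to the next power of two.  Saturates at the largest
 * power of two that fits in an unsigned int.
 */
static unsigned int
roundup_pow2(unsigned int n)
{
	unsigned int p = 1;

	while (p < n && p <= UINT_MAX / 2)
		p <<= 1;
	return (p);
}

/*
 * Derive a bucket count from maxsockets: scale down by the divisor,
 * round up to a power of two, and never go below the minimum.
 */
static unsigned int
tcbhash_autosize(unsigned int maxsockets)
{
	unsigned int size;

	size = roundup_pow2(maxsockets / TCBHASH_DIVISOR);
	if (size < TCBHASH_MIN)
		size = TCBHASH_MIN;
	return (size);
}

/*
 * Average number of entries per bucket, assuming sockets hash
 * uniformly.  The traversal estimate quoted in the thread is
 * roughly half of this value.
 */
static double
tcbhash_avg_chain(unsigned int nsockets, unsigned int buckets)
{
	return ((double)nsockets / (double)buckets);
}
```

Under these assumptions, tcbhash_autosize(250000) gives 32768 buckets, while a small maxsockets such as 1000 falls back to the 512 minimum.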