Date: Sun, 2 Feb 1997 07:02:25 -0500
From: "David S. Miller" <davem@jenolan.rutgers.edu>
To: michaelh@cet.co.jp
Cc: netdev@roxanne.nuclecu.unam.mx, roque@di.fc.ul.pt, freebsd-smp@freebsd.org
Subject: Re: SMP
Message-ID: <199702021202.HAA09281@jenolan.caipgeneral>
In-Reply-To: <Pine.SV4.3.95.970202202503.2569D-100000@parkplace.cet.co.jp> (message from Michael Hancock on Sun, 2 Feb 1997 20:36:36 +0900 (JST))
   Date: Sun, 2 Feb 1997 20:36:36 +0900 (JST)
   From: Michael Hancock <michaelh@cet.co.jp>

   It almost sounds like there are cases where "short holds" and "less
   contention" are hard to achieve.  Can you give us an example?  Or are
   you saying that spending time on contention minimization is not very
   fruitful?

It is hard to achieve in certain circumstances, yes, but it is worth
putting some effort towards, just not "too much" if things begin to
look a bit abysmal.

   I think some key subsystems would benefit from per-processor pools
   (less contention) and the code can still be structured for short
   holds.

Agreed.  This becomes a serious issue once you have parallelized a
subsystem.  Even though all CPUs can go about their own ways when they
are messing with separate objects, all of them contend equally for the
kernel memory allocator.

As Terry Lambert pointed out, even if you have per-processor pools you
run into many cache pulls between processors when you allocate a
resource on one CPU, get scheduled onto another, and then free it from
there.
(The TLB invalidation cases he described are not an issue at all for us
on the kernel side.  Brain-damaged traditional Unix kernel memory
designs eat this overhead; we won't.  Other than vmalloc() chunks we eat
no mapping setup and destruction overhead at all, and thus there is
nothing to keep "consistent" with IPIs, since it is not changing.)

On the other hand, I like the idea of:

1) per-cpu pools for object FOO

2) a global secondary pool for object class BAR, of which FOO is a
   member

and thus the strategy becomes:

alloc_a_FOO()
{
again:
	if (FOO_free_list[smp_processor_id()]) {
		/* Go like smoke */
	} else {
		lock_global_pool();
		get_some_for_me();
		if (others_running_low)
			feed_them_too();
		unlock_global_pool();
		goto again;
	}
}

free_a_FOO()
{
	if (my_pool_only_moderately_full)
		give_it_to_me();
	else
		feed_someone_else_who_needs_it();
}

The idea is that when you must enter a contention situation, you make
the most of it by working towards keeping others from running into the
same situation.

Another modification to the idea, on the freeing side, is to always
free the FOO to the local per-cpu queue, and if other CPUs could starve
soon, give some of your oldest FOOs to them, to encourage cache
locality.

Just some ideas.  These are some of the ideas I was toying about with
Larry McVoy when we were discussing with each other how N processors
could go full throttle on a machine with N high-speed interfaces with
near-zero contention.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><
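[Archive note: the per-cpu pool strategy sketched in the message above
can be made concrete as a small user-space C simulation.  This is only
an illustrative sketch, not code from the message: objects are reduced
to counters, the lock_global_pool()/unlock_global_pool() points are
marked as comments rather than real spinlocks, and NCPUS, BATCH, and
LOCAL_MAX are made-up tuning values.  The function names mirror the
pseudocode.]

```c
#define NCPUS     4
#define BATCH     8    /* objects pulled from the global pool per refill */
#define LOCAL_MAX 16   /* per-cpu free-list ceiling before spilling back */

/* Global secondary pool (object class BAR) and per-cpu FOO free-list
 * depths.  We only track counts, not real objects. */
static int global_pool = 1024;
static int local_count[NCPUS];

static int alloc_a_FOO(int cpu)
{
	if (local_count[cpu] == 0) {
		/* lock_global_pool() would go here */
		int grab = global_pool < BATCH ? global_pool : BATCH;
		global_pool -= grab;
		local_count[cpu] += grab;
		/* "feed_them_too": while we hold the lock anyway, top up
		 * any other cpu whose pool has run completely dry, so it
		 * need not contend for the lock itself. */
		for (int i = 0; i < NCPUS; i++) {
			if (i != cpu && local_count[i] == 0 &&
			    global_pool >= BATCH) {
				global_pool -= BATCH;
				local_count[i] += BATCH;
			}
		}
		/* unlock_global_pool() */
		if (local_count[cpu] == 0)
			return -1;	/* global pool exhausted */
	}
	local_count[cpu]--;		/* the "go like smoke" path */
	return 0;
}

static void free_a_FOO(int cpu)
{
	if (local_count[cpu] < LOCAL_MAX)
		local_count[cpu]++;	/* keep it hot in the local cache */
	else
		global_pool++;		/* moderately full: give it back */
}
```

The first allocation on a cpu pays one lock acquisition but refills its
own pool and every dry peer's pool in the same critical section; the
common path then touches only per-cpu state.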