Date:      Sun, 2 Feb 1997 07:02:25 -0500
From:      "David S. Miller" <davem@jenolan.rutgers.edu>
To:        michaelh@cet.co.jp
Cc:        netdev@roxanne.nuclecu.unam.mx, roque@di.fc.ul.pt, freebsd-smp@freebsd.org
Subject:   Re: SMP
Message-ID:  <199702021202.HAA09281@jenolan.caipgeneral>
In-Reply-To: <Pine.SV4.3.95.970202202503.2569D-100000@parkplace.cet.co.jp> (message from Michael Hancock on Sun, 2 Feb 1997 20:36:36 +0900 (JST))

   Date: Sun, 2 Feb 1997 20:36:36 +0900 (JST)
   From: Michael Hancock <michaelh@cet.co.jp>

   It almost sounds like there are cases where "short holds" and "less
   contention" are hard to achieve.  Can you give us an example?  Or
   are you saying that spending time on contention minimization is not
   very fruitful.

It is hard to achieve in certain circumstances, yes, but it is worth
putting some effort toward it, just not "too much" if things begin to
look a bit abysmal.

   I think some key subsystems would benefit from per processor pools
   (less contention) and the code can still be structured for short
   holds.

Agreed.  This becomes a serious issue once you have parallelized a
subsystem.  Even though all cpu's can go about their own ways if they
are messing with separate objects, all of them contend equally for
the kernel memory allocator.  As Terry Lambert pointed out, even if
you have per-processor pools you run into many cache pulls between
processors when you allocate a resource on one cpu, get scheduled
on another, and then free it from there.  (The TLB invalidation cases
he described are not an issue at all for us on the kernel side; brain
damaged traditional unix kernel memory designs eat this overhead, we
won't.  Other than vmalloc() chunks we eat no mapping setup and
destruction overhead at all, and thus there is nothing to keep
"consistent" with IPI's, since nothing is changing.)

On the other hand, I like the idea of:

	1) per cpu pools for object FOO

	2) global secondary pool for object class BAR which
	   FOO is a member of

and thus the strategy becomes:

	alloc_a_FOO()
	{
	again:
		if(FOO_free_list[smp_processor_id()]) {
			/* Go like smoke */
		} else {
			lock_global_pool();
			get_some_for_me();
			if(others_running_low)
				feed_them_too();
			unlock_global_pool();
			goto again;
		}
	}

	free_a_FOO()
	{
		if(my_pool_only_moderately_full) {
			give_it_to_me();
		} else {
			feed_someone_else_who_needs_it();
		}
	}

The idea is that when you do have to enter a contention situation, you
make the most of it by working to keep others from running into the
same situation.
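To make the sketch above concrete, here is a minimal single-threaded C
rendering of the same strategy.  Everything here is hypothetical glue
around the idea, not real kernel code: smp_processor_id() is simulated
by a plain variable, the global pool lock is just a comment, and the
feed_them_too() refill of other starving cpus is omitted for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define NR_CPUS 4
#define BATCH   8   /* objects pulled from the global pool per refill */

struct foo { struct foo *next; };

static struct foo *foo_free_list[NR_CPUS];  /* per-cpu pools, no lock needed locally */
static int foo_free_count[NR_CPUS];
static struct foo *global_pool;             /* protected by a lock on real SMP */

static int cur_cpu;                         /* stands in for smp_processor_id() */

/* Seed the global pool with n objects (test scaffolding). */
static void seed_global(int n)
{
	for (int i = 0; i < n; i++) {
		struct foo *f = malloc(sizeof(*f));
		f->next = global_pool;
		global_pool = f;
	}
}

/* The contended path: grab a batch while we hold the global lock,
 * so we do not come back here again soon. */
static void refill_from_global(int cpu)
{
	/* lock_global_pool() would go here */
	for (int i = 0; i < BATCH && global_pool; i++) {
		struct foo *f = global_pool;
		global_pool = f->next;
		f->next = foo_free_list[cpu];
		foo_free_list[cpu] = f;
		foo_free_count[cpu]++;
	}
	/* unlock_global_pool(); feed_them_too() omitted for brevity */
}

static struct foo *alloc_a_foo(void)
{
	int cpu = cur_cpu;

	if (!foo_free_list[cpu])
		refill_from_global(cpu);        /* the slow, contended path */
	struct foo *f = foo_free_list[cpu];     /* the "go like smoke" path */
	if (f) {
		foo_free_list[cpu] = f->next;
		foo_free_count[cpu]--;
	}
	return f;                               /* NULL if global is empty too */
}

static void free_a_foo(struct foo *f)
{
	int cpu = cur_cpu;

	if (foo_free_count[cpu] < 2 * BATCH) {
		f->next = foo_free_list[cpu];   /* pool only moderately full */
		foo_free_list[cpu] = f;
		foo_free_count[cpu]++;
	} else {
		f->next = global_pool;          /* overflow back to global */
		global_pool = f;
	}
}
```

The per-cpu watermark (2 * BATCH here) is an arbitrary tuning knob;
the point is only that the common alloc/free paths never touch the
global lock.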

Another modification to the idea on the freeing side is to always free
the FOO to the local per-cpu queue, and if other cpus could starve
soon, give some of your oldest FOO's to them, to encourage cache
locality.
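That freeing-side variant can be sketched with a FIFO per-cpu queue:
freshly freed objects go on the tail (their cache lines are probably
still hot on this cpu), and donations to a starving cpu come off the
head, where the cache-coldest objects live.  All the names and the
LOW_WATER threshold below are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define NR_CPUS   4
#define LOW_WATER 2

struct foo { struct foo *next; };

/* FIFO per-cpu queue: head = oldest (cache-cold), tail = newest (hot) */
struct foo_queue {
	struct foo *head, *tail;
	int count;
};

static struct foo_queue foo_q[NR_CPUS];

static void queue_put(struct foo_queue *q, struct foo *f)
{
	f->next = NULL;
	if (q->tail)
		q->tail->next = f;
	else
		q->head = f;
	q->tail = f;
	q->count++;
}

static struct foo *queue_take_oldest(struct foo_queue *q)
{
	struct foo *f = q->head;
	if (f) {
		q->head = f->next;
		if (!q->head)
			q->tail = NULL;
		q->count--;
	}
	return f;
}

/* Always free locally first; then, if some other cpu is running low,
 * hand it our oldest object -- the one least likely to still be in
 * our cache -- so recently freed, cache-hot objects stay local. */
static void free_a_foo(int cpu, struct foo *f)
{
	queue_put(&foo_q[cpu], f);
	for (int other = 0; other < NR_CPUS; other++) {
		if (other != cpu &&
		    foo_q[other].count < LOW_WATER &&
		    foo_q[cpu].count > LOW_WATER) {
			struct foo *old = queue_take_oldest(&foo_q[cpu]);
			if (old)
				queue_put(&foo_q[other], old);
		}
	}
}
```

Freeing a burst of objects on one cpu then trickles the oldest ones
out to the cpus sitting below the low-water mark, without any of them
ever touching a global pool.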

Just some ideas.  These are some of the ideas I was toying with when
Larry McVoy and I were discussing with each other how N processors
could go full throttle on a machine with N high speed interfaces with
near zero contention.

---------------------------------------------////
Yow! 11.26 MB/s remote host TCP bandwidth & ////
199 usec remote TCP latency over 100Mb/s   ////
ethernet.  Beat that!                     ////
-----------------------------------------////__________  o
David S. Miller, davem@caip.rutgers.edu /_____________/ / // /_/ ><


