Date:      Mon, 30 Oct 2000 14:40:32 -0800
From:      Mike Smith <msmith@mass.osd.bsdi.com>
To:        Bosko Milekic <bmilekic@dsuper.net>
Cc:        freebsd-net@freebsd.org
Subject:   Re: MP: per-CPU mbuf allocation lists 
Message-ID:  <200010302240.e9UMeWF18172@mass.osd.bsdi.com>
In-Reply-To: Your message of "Mon, 30 Oct 2000 13:20:52 EST." <Pine.BSF.4.21.0010301256580.30271-100000@jehovah.technokratis.com> 

>   	I recently wrote an initial "scratch pad" design for per-CPU mbuf
>   lists (in the MP case). The design consists simply of introducing
>   these "fast" lists for each CPU and populating them with mbufs on bootup.
>   Allocations from these lists would not need to be protected with a mutex
>   as each CPU has its own. The general mmbfree list remains, and remains
>   protected with a mutex, in case the per-CPU list is empty.

Have you by any chance done any profiling to determine whether contention 
for the free mbuf list is actually a performance issue, or is this just 
one of those "hey, this would be cool" design decisions?

>   - "Fast" list; a per-CPU mbuf list. They contain "w" (for "watermark")
>     number of mbufs, typically... more on this below.
>     
>   - The general (already existing) mmbfree list; mutex protected, global
>     list, in case the fast list is empty for the given CPU.
>     
>   - Allocations; all done from "fast" lists. All are very fast, in the
>     general case. If no mbufs are available, the general mmbfree list's
>     lock is acquired, and an mbuf is made from there.

Do you handle the case where an interrupt handler runs on the same CPU 
while you are manipulating the "fast" list, i.e. do you still lock 
the "fast" list?
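
For reference, a minimal sketch of the allocation path quoted above, written 
as a userspace C model rather than kernel code. The names NCPU, fast_list, 
mbuf_alloc and the list helpers are hypothetical, and a pthread mutex stands 
in for the kernel mutex on mmbfree. The unlocked pop from the fast list is 
only safe if nothing else on the same CPU can touch that list, which is 
exactly the interrupt question raised above; the sketch does not address it.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NCPU 2                          /* hypothetical CPU count */

struct mbuf { struct mbuf *m_next; };   /* payload omitted in this model */

struct mbuf_list {
    struct mbuf *head;
    int count;
};

static struct mbuf_list fast_list[NCPU];    /* one per CPU, owner-only access */
static struct mbuf_list mmbfree;            /* global fallback list */
static pthread_mutex_t mmbfree_mtx = PTHREAD_MUTEX_INITIALIZER;

static void list_push(struct mbuf_list *l, struct mbuf *m)
{
    m->m_next = l->head;
    l->head = m;
    l->count++;
}

static struct mbuf *list_pop(struct mbuf_list *l)
{
    struct mbuf *m = l->head;

    if (m != NULL) {
        l->head = m->m_next;
        m->m_next = NULL;
        l->count--;
    }
    return m;
}

/* Allocate for the given CPU: fast list first, then the locked global list. */
struct mbuf *mbuf_alloc(int cpu)
{
    struct mbuf *m;

    assert(cpu >= 0 && cpu < NCPU);

    /* Fast path: no mutex. Assumes no interrupt-time access on this CPU. */
    m = list_pop(&fast_list[cpu]);
    if (m != NULL)
        return m;

    /* Slow path: fall back to the mutex-protected general list. */
    pthread_mutex_lock(&mmbfree_mtx);
    m = list_pop(&mmbfree);
    pthread_mutex_unlock(&mmbfree_mtx);

    return m;   /* NULL here is where the design would go to mb_map */
}
```

Calling mbuf_alloc(0) drains CPU 0's fast list first and only then takes 
the mmbfree lock.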

>     If no mbuf is
>     available, even from the general list, we let go of the lock and
>     allocate a page from mb_map and drop the mbufs onto our fast list, from
>     which we grab the one we need.

Starvation of the general list should result in the general list being 
populated, not the fast list.  Under your scheme, the general list will 
remain depleted until mbufs are freed, which will blow mb_map out faster 
than is otherwise desirable.
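
A rough sketch of that alternative, as a userspace C model: when the general 
list is empty, a fresh page is carved into mbufs that go onto the general 
list, and the caller then takes one from there. Everything named here is an 
assumption for illustration: MB_PAGE_SIZE and MB_MSIZE are made-up sizes, 
malloc() stands in for a page taken from mb_map, and mbuf_refill_global and 
mbuf_alloc_slow are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdlib.h>

#define MB_PAGE_SIZE 4096   /* assumed page size */
#define MB_MSIZE      256   /* assumed bytes per mbuf */

struct mbuf { struct mbuf *m_next; };

struct mbuf_list {
    struct mbuf *head;
    int count;
};

static struct mbuf_list mmbfree;    /* general list */
static pthread_mutex_t mmbfree_mtx = PTHREAD_MUTEX_INITIALIZER;

/* Carve one new page into mbufs and put all of them on the general list. */
int mbuf_refill_global(void)
{
    char *page = malloc(MB_PAGE_SIZE);  /* stand-in for an mb_map page */
    int i, n = MB_PAGE_SIZE / MB_MSIZE;

    if (page == NULL)
        return 0;

    pthread_mutex_lock(&mmbfree_mtx);
    for (i = 0; i < n; i++) {
        struct mbuf *m = (struct mbuf *)(page + (size_t)i * MB_MSIZE);

        m->m_next = mmbfree.head;
        mmbfree.head = m;
        mmbfree.count++;
    }
    pthread_mutex_unlock(&mmbfree_mtx);
    return n;
}

/* Slow-path allocation: take from the general list, refilling it once. */
struct mbuf *mbuf_alloc_slow(void)
{
    struct mbuf *m;
    int tries;

    for (tries = 0; tries < 2; tries++) {
        pthread_mutex_lock(&mmbfree_mtx);
        m = mmbfree.head;
        if (m != NULL) {
            mmbfree.head = m->m_next;
            m->m_next = NULL;
            mmbfree.count--;
        }
        pthread_mutex_unlock(&mmbfree_mtx);

        /* Stop on success, or if the refill itself fails. */
        if (m != NULL || mbuf_refill_global() == 0)
            return m;
    }
    return NULL;
}
```

With these sizes, the first call on an empty list adds 16 mbufs to the 
general list and returns one of them, so other CPUs can reach the 
remaining 15.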

>    - Freeing; First, if someone is sleeping, we grab the mmbfree global
>      list mutex and drop the mbuf there, and then issue a wakeup. If nobody
>      is sleeping, then we proceed as follows:
>      	(a) if our fast list does not have over "w" mbufs, put the mbuf on
> 	our fast list and then we're done
> 	(b) since our fast list already has "w" mbufs, acquire the mmbfree
> 	mutex and drop the mbuf there.

This is a half-hearted "donation" algorithm.  It might make sense to 
waste some cycles in the "sleeping" case to lock other CPUs' "fast" lists 
and steal mbufs from them...
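
For reference, the freeing rules quoted above can be sketched as follows, 
again as a userspace C model. The names sleepers, mbuf_wmark, mmbfree_cv and 
mbuf_free are hypothetical; a condition variable stands in for the kernel 
wakeup. The sketch implements only the quoted donation rules and does not 
attempt the stealing idea.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU 2                          /* hypothetical CPU count */

struct mbuf { struct mbuf *m_next; };

struct mbuf_list {
    struct mbuf *head;
    int count;
};

static struct mbuf_list fast_list[NCPU];    /* per-CPU, owner-only access */
static struct mbuf_list mmbfree;            /* general list */
static pthread_mutex_t mmbfree_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t mmbfree_cv = PTHREAD_COND_INITIALIZER;

static int mbuf_wmark = 2;      /* "w"; tiny value chosen for illustration */
static atomic_int sleepers;     /* threads waiting for an mbuf (assumed) */

static void list_push(struct mbuf_list *l, struct mbuf *m)
{
    m->m_next = l->head;
    l->head = m;
    l->count++;
}

/* Put an mbuf on the general list, optionally waking one sleeper. */
static void free_to_global(struct mbuf *m, int wake)
{
    pthread_mutex_lock(&mmbfree_mtx);
    list_push(&mmbfree, m);
    if (wake)
        pthread_cond_signal(&mmbfree_cv);
    pthread_mutex_unlock(&mmbfree_mtx);
}

void mbuf_free(int cpu, struct mbuf *m)
{
    assert(cpu >= 0 && cpu < NCPU);

    /* Someone is sleeping: donate to the general list and wake them. */
    if (atomic_load(&sleepers) > 0) {
        free_to_global(m, 1);
        return;
    }

    /* (a) Fast list is below the watermark: keep the mbuf locally. */
    if (fast_list[cpu].count < mbuf_wmark) {
        list_push(&fast_list[cpu], m);
        return;
    }

    /* (b) Fast list already holds "w" mbufs: return it to the general list. */
    free_to_global(m, 0);
}
```

With w = 2, three frees on one CPU leave two mbufs on its fast list and one 
on the general list.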

>   Things to note:
>   
>     - note that if we're out of mbufs on our fast list, and the general
> 	mmbfree list has none available either, and mb_map is starved, even
> 	though there may be free mbufs on other CPU's fast lists, we will
> 	return ENOBUFS. This behavior will usually be an indication of a
> 	wrongly chosen watermark ("w") and we will have to consider how to
> 	inform our users on how to properly select a watermark. I already
> 	have some ideas for alternate situations/ways of handling this, but
> 	will leave this investigation for later.

See previous comment.

>     - "w" is a tunable watermark. No fast list will ever contain more than
> 	"w" mbufs. This presents a small problem. Consider a situation where
> 	we initially set w = 500; consider we have two CPUs; consider CPU1's
> 	fast list eventually gets 450 mbufs, and CPU2's fast list gets 345.
> 	Consider then that we decide to set w = 200; Even though all
> 	subsequent freeing will be done to the mmbfree list, unless we
> 	eventually go under the 200 mark for our free list, we will likely
> 	end up sitting with > 200 mbufs on each CPU's fast list. The idea I
> 	presently have is to have a kproc "garbage collect" > w mbufs on the
> 	CPUs' fast lists and put them back onto the mmbfree general list, if
> 	it detects that "w" has been lowered.

The watermark-lowering operation is likely to be very infrequent.  As 
such, it would hardly hurt for it to scan each of the "fast" lists and 
steal excess mbufs back into the global pool.
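
A minimal sketch of such a scan, as a userspace C model using the numbers 
from the quoted example (w lowered from 500 to 200). It assumes each fast 
list carries its own mutex so that another CPU may trim it; that lock is an 
assumption of this sketch, not something the quoted design provides. The 
names mbuf_set_wmark and fast_list_fill are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>

#define NCPU 2                          /* hypothetical CPU count */

struct mbuf { struct mbuf *m_next; };

struct mbuf_list {
    struct mbuf *head;
    int count;
    pthread_mutex_t mtx;                /* assumed per-list lock */
};

static struct mbuf_list fast_list[NCPU] = {
    { NULL, 0, PTHREAD_MUTEX_INITIALIZER },
    { NULL, 0, PTHREAD_MUTEX_INITIALIZER },
};
static struct mbuf_list mmbfree = { NULL, 0, PTHREAD_MUTEX_INITIALIZER };
static atomic_int mbuf_wmark = 500;     /* "w" */

static void list_push(struct mbuf_list *l, struct mbuf *m)
{
    m->m_next = l->head;
    l->head = m;
    l->count++;
}

/* Populate one CPU's fast list from an array, as would happen at bootup. */
void fast_list_fill(int cpu, struct mbuf *pool, int n)
{
    int i;

    pthread_mutex_lock(&fast_list[cpu].mtx);
    for (i = 0; i < n; i++)
        list_push(&fast_list[cpu], &pool[i]);
    pthread_mutex_unlock(&fast_list[cpu].mtx);
}

/* Set a new watermark; if lowered, move excess mbufs to the global pool. */
int mbuf_set_wmark(int new_w)
{
    int old_w = atomic_exchange(&mbuf_wmark, new_w);
    int cpu, moved = 0;

    if (new_w >= old_w)
        return 0;                       /* nothing to reclaim */

    for (cpu = 0; cpu < NCPU; cpu++) {
        struct mbuf_list *fl = &fast_list[cpu];

        pthread_mutex_lock(&fl->mtx);
        while (fl->count > new_w) {
            struct mbuf *m = fl->head;

            fl->head = m->m_next;
            fl->count--;

            /* Lock order is always fast list, then general list. */
            pthread_mutex_lock(&mmbfree.mtx);
            list_push(&mmbfree, m);
            pthread_mutex_unlock(&mmbfree.mtx);
            moved++;
        }
        pthread_mutex_unlock(&fl->mtx);
    }
    return moved;
}
```

Starting from 450 and 345 mbufs on the two fast lists, lowering w to 200 
moves 250 + 145 = 395 mbufs back to the general list and leaves 200 on each 
fast list.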

-- 
... every activity meets with opposition, everyone who acts has his
rivals and unfortunately opponents also.  But not because people want
to be opponents, rather because the tasks and relationships force
people to take different points of view.  [Dr. Fritz Todt]
           V I C T O R Y   N O T   V E N G E A N C E