Date:      Tue, 18 Feb 2003 09:57:45 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Bosko Milekic <bmilekic@unixdaemons.com>
Cc:        freebsd-arch@FreeBSD.ORG
Subject:   Re: mb_alloc cache balancer / garbage collector
Message-ID:  <200302181757.h1IHvjaC051829@apollo.backplane.com>
References:  <15952.62746.260872.18687@grasshopper.cs.duke.edu> <20030217095842.D64558@unixdaemons.com> <200302171742.h1HHgSOq097182@apollo.backplane.com> <20030217154127.A66206@unixdaemons.com> <200302180000.h1I00bvl000432@apollo.backplane.com> <20030217192418.A67144@unixdaemons.com> <20030217192952.A67225@unixdaemons.com> <200302180101.h1I11AWr001132@apollo.backplane.com> <20030217203306.A67720@unixdaemons.com> <200302180458.h1I4wQiA048763@apollo.backplane.com> <20030218093946.A69621@unixdaemons.com>


:   I've looked at integrating these with the general all-purpose system
:   allocator (UMA).  I ran into several issues that are not, to my
:   knowledge, easily solved without ripping into UMA pretty badly.  I've
:   mentioned these before.  One of the issues is the keep-cache-lock
:   across grouped (m_getcl(), m_getm()) allocations and grouped
:   de-allocations (m_freem()) for as long as possible.  The other issue
:   has to do with keeping the common allocation and free cases down to
:   one function call.  Further, the mbuf code does special things like
:   call drain routines when completely exhausted and although I'm not
:   100% certain, I can almost guarantee that making sure these work
:   right with UMA is going to take a lot of ripping into it.  I'd like
:   to avoid ripping into a general-purpose allocator that I think needs
:   to have less rather than more application-specific complexities.

    Let's separate the pure efficiency issues from the special-feature
    support.  The cache-locking issue is really just an efficiency issue,
    easily solved with a little work on UMA.  Something like this, for
    example:

	void *uma_lock = NULL;

	/*
	 * Use of uma_lock is entirely under the control of UMA.  It
	 * can release and reobtain it, release it and obtain another
	 * lock, or not use it at all (leave it NULL).  The only
	 * requirement is that you call uma_cache_unlock(&uma_lock)
	 * after you are through and that you not block in between UMA
	 * operations.
	 */
	uma_cache_free(&uma_lock, ...) ... etc
	uma_cache_alloc(&uma_lock, ...) ... etc

	uma_cache_unlock(&uma_lock);

    This would allow UMA to maintain a lock through a set of operations,
    at its sole discretion.  If the lock were made a real mutex then we
    could even allow the caller to block in between UMA operations by
    msleep()ing on it.  I've used this trick on a more global basis on
    embedded systems... the 'uma_lock' equivalent actually winds up being
    part of the task structure, allowing it to be used universally by
    multiple subsystems (which, by the way, would allow one to get rid
    of the mutex argument to msleep() if it were done that way in
    FreeBSD).
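
    As an illustration, here is how a grouped free along the lines of
    m_freem() might hold the cached lock across a whole chain.  This is
    a sketch only: mbuf_chain_free() is a made-up name, and the
    uma_cache_*() calls are the hypothetical interface above, not
    existing UMA entry points:

	void
	mbuf_chain_free(struct mbuf *m)
	{
		void *uma_lock = NULL;
		struct mbuf *n;

		while (m != NULL) {
			n = m->m_next;
			/* UMA may keep one cache lock across iterations. */
			uma_cache_free(&uma_lock, m);
			m = n;
		}
		uma_cache_unlock(&uma_lock);
	}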

    The mbuf draining issue is a more difficult one.

:  Yes, you're right, but the difference is that in most cases, with the
:  kproc, you'll minimize the cost of most of the allocations and frees
:  because the kproc will have done the moving in less time.

    Why would the kproc minimize the cost of the allocations? 

    Try to estimate the efficiency of the following three methods:

    * The kproc allocating 200 mbufs per scheduled wakeup and the
      client then making 200 allocations via the local cpu cache.

      (2 Context switches for every 200 allocations)

    * The client making 200 allocations via the local cpu cache,
      the local cpu cache running out, and the allocator doing a bulk
      allocation of 20 mbufs at a time (sketched after this list).

      (1 VM/global-mutex interaction for every 20 allocations)

    * The kproc uses idle cycles to pre-allocate N mbufs in the per-cpu
      cache(s).

      (potentially no overhead if idle cycles are available)
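
    To make the second scheme concrete, here is a sketch of a bulk
    refill on cache exhaustion.  The per-cpu cache structure, the
    mb_cache_alloc() function, and the mb_global_refill() bulk call are
    all assumptions for illustration, not existing code:

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/mbuf.h>
	#include <sys/mutex.h>

	#define	MB_BULK	20	/* mbufs pulled per global-cache trip */

	struct mb_pcpu_cache {
		struct mtx	mc_mtx;		/* per-cpu cache lock */
		struct mbuf	*mc_head;	/* linked free list */
		int		mc_count;	/* mbufs on the free list */
	};

	/* Assumed helper: move up to 'n' mbufs in from the global cache. */
	void	mb_global_refill(struct mb_pcpu_cache *mc, int n);

	/*
	 * Allocate from the local cache.  Only when it runs dry do we
	 * touch the global cache, and then we take MB_BULK mbufs in one
	 * trip, so the global mutex is hit once per 20 allocations in
	 * the steady state.
	 */
	static struct mbuf *
	mb_cache_alloc(struct mb_pcpu_cache *mc)
	{
		struct mbuf *m;

		mtx_lock(&mc->mc_mtx);
		if (mc->mc_count == 0)
			mb_global_refill(mc, MB_BULK);
		m = mc->mc_head;
		if (m != NULL) {
			mc->mc_head = m->m_next;
			mc->mc_count--;
		}
		mtx_unlock(&mc->mc_mtx);
		return (m);
	}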


    I would argue that the kproc method beats the on-the-fly method only
    if the system has lots of idle cycles for the kproc to run in.
    Under heavy loads, the on-the-fly method is going to win hands down
    (in my opinion).  Under light loads we shouldn't care if we are
    slightly less efficient, since we become more efficient as the
    load increases.

    Consider the tuning you would have to do under heavy loads to minimize
    the number of kproc wakeups.  Note also that if your goal is
    for the kproc never to have to wake up, then you are talking about a
    situation where the on-the-fly mechanism would, equivalently, not have
    to resort to the global cache.  The on-the-fly mechanism is trivially
    tunable; the kproc mechanism is not.

:  I understand that the pageout daemon probably employs an algorithm
:  that can get de-stabilized by large shifting of memory from one
:  subsystem to another.  However, my argument is that the effect of
:  moving slightly larger chunks of memory for network buffers is more
:  good than bad.  There are more common cases than there are corner
:  cases and for the corner cases I think that I could work out a decent
:  recovery mechanism (the kproc could be temporarily 'turned off,' for
:  example).

    I agree, as long as the phrase is 'slightly larger chunks...'.  But
    that same argument applies to on-the-fly allocation from the global
    cache, and, as I point out above, when you have a kproc you still have
    to decide how much latency to allow that kproc to
    introduce, which limits how many mbufs it should try to allocate from
    the global cache, right?

:  Here's what I think I'll do in order to get what I'm sure we both want
:  immediately without slowing down progress.  I'm going to implement the
:  on-the-fly freeing to VM case (this is pretty trivial).  I'll present
:  that and we can get that into the tree (so that we can at least
:  recover resources following network spikes).  I'll keep the kproc code
:  here and try to tune it to demonstrate eventually that it does the
:  right thing and that corner cases are minimized.  I'll also try
:  varying the number of objects per bucket, especially in the cluster
:  case, and see where we go from there.  Keep in mind that because this
:  is a specific network-buffer allocator, we may be able to get away
:  with moving larger chunks of objects from a kproc without necessarily
:  incurring all the bad effects of general-purpose allocation systems.
:...
:  It's an interesting corner case, but instead of completely trashing
:  the kproc idea (which does gain us something in common cases by
:  minimizing interactions with VM), I'll see if I can tune it to react
:  properly.  I'll look at what kind of gains we can get from more
:  conservative moves from the kproc vis-a-vis larger buckets.  It's easy
:  to tune these things without ripping anything else apart, specifically
:  because network buffers are allocated in their own special way.
:  
:  Matt, thanks for still reading the lists and remaining concerned.
:
:-- 
:Bosko Milekic * bmilekic@unixdaemons.com * bmilekic@FreeBSD.org

    I think it is well worth you implementing both and making them
    switchable with sysctls (simply by adjusting two different sets of
    hysteresis levels, for example).  Then you can test both under load
    to see if the kproc is worth it.  It might well turn out that the
    kproc is a good idea but that on-the-fly allocation and deallocation
    is necessary to handle degenerate situations.  Or it might turn out
    that the kproc creates more problems than it solves.  Or it might turn
    out that the on-the-fly allocation and deallocation code is so close
    to the kproc code in efficiency that there is no real
    reason to have the kproc.  Or it might turn out that the kproc's best
    use is to recover memory after the machine has finished doing some
    real hard networking work and is becoming more idle.
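
    For instance, the switch could be as simple as this (the variable
    and knob names are invented for illustration; only the standard
    SYSCTL_INT() machinery is assumed):

	#include <sys/param.h>
	#include <sys/kernel.h>
	#include <sys/sysctl.h>

	static int mb_lowat = 32;	/* refill cache below this */
	static int mb_hiwat = 128;	/* free back to VM above this */
	static int mb_use_kproc = 1;	/* 0 = pure on-the-fly scheme */

	SYSCTL_INT(_kern_ipc, OID_AUTO, mb_lowat, CTLFLAG_RW,
	    &mb_lowat, 0, "per-cpu mbuf cache low watermark");
	SYSCTL_INT(_kern_ipc, OID_AUTO, mb_hiwat, CTLFLAG_RW,
	    &mb_hiwat, 0, "per-cpu mbuf cache high watermark");
	SYSCTL_INT(_kern_ipc, OID_AUTO, mb_use_kproc, CTLFLAG_RW,
	    &mb_use_kproc, 0, "use the cache-balancer kproc");

    That way both schemes stay in the tree and can be flipped and
    re-tuned at run time while benchmarking, with no recompile.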

    Obviously my opinion is heavily weighted towards on-the-fly.  At
    the same time, I see no reason why you can't develop your kproc idea
    and even commit it.  You are, after all, the person who is taking the
    time to work on it.

					-Matt
					Matthew Dillon 
					<dillon@backplane.com>
