FreeBSD Mail Archives

Date:      Wed, 17 Jan 2001 11:28:50 -0800
From:      Alfred Perlstein <bright@wintelcom.net>
To:        Matt Dillon <dillon@earth.backplane.com>
Cc:        arch@FreeBSD.ORG
Subject:   Re: HEADS-UP: await/asleep removal imminent
Message-ID:  <20010117112850.X7240@fw.wintelcom.net>
In-Reply-To: <200101171907.f0HJ7Qe48680@earth.backplane.com>; from dillon@earth.backplane.com on Wed, Jan 17, 2001 at 11:07:26AM -0800
References:  <200101171138.MAA11834@freebsd.dk> <ybug0iixiee.fsf@jesup.eng.tvol.net.jesup.eng.tvol.net> <20010117092109.O7240@fw.wintelcom.net> <20010117100516.Q7240@fw.wintelcom.net> <200101171907.f0HJ7Qe48680@earth.backplane.com>

* Matt Dillon <dillon@earth.backplane.com> [010117 11:07] wrote:
> :* Alfred Perlstein <bright@wintelcom.net> [010117 09:24] wrote:
> :> 
> :> I'm not going to axe it for a few days, this is a really amazing
> :> API that Matt added, the problem is utility and usage over code
> :> complexity.
> :> 
> :> It's just a proposal.
> :
> :I found several places where it may be useful, but I'm not sure if the
> :benefits outweigh the costs.
> :...
> :
> :The lock must be unwound because we're calling MGETHDR with M_TRYWAIT.
> :If we used M_TRY'A'WAIT the code would probably look something like
> :this:
> 
>     The basic premise of using asleep()/await() is to allow you to
>     propagate a 'blocking condition' back up to a higher level rather
>     than blocking deep inside the kernel.
> 
>     The original reasoning was to deal with memory allocation blockages.
> 
>     For example, let's say you have three subsystem layers calling each
>     other.  The top layer wishes to implement a non-blocking API but the
>     bottom layer might do an allocation that could block.
> 
>     The bottom layer could do a non-blocking allocation and return NULL,
>     but how does the top layer (or an even higher layer) know when to try
>     again?
> 
>     The original idea with asleep()/await() was for the bottom layer to
>     call asleep() on the resource that would block and then return NULL.
>     NULL would tell the higher layer(s) that someone down below couldn't
>     get some resource and that an asleep() has been set up.
> 
>     The higher layers can then decide what to do with the situation: they
>     can abort the operation entirely, they can do the blocking (await() call)
>     themselves, or they can propagate the condition to their own callers.
> 
>     This way you can hold more than one lock (now mutex) through a number
>     of program layers without having to worry about them blocking on you.
> 
>     --
> 
>     For the current SMP system, asleep()/await() could be used to deal with
>     complex situations where you (A) do not want to release a mutex through 
>     a call to another subsystem (like the memory allocator), or (B) do not
>     know if the code calling you is already holding some mutex X and you
>     want to hold mutex Y while you make a call to another subsystem.
> 
>     So, in that regard, your example:
> 
> :            /* SOCKBUF_UNLOCK(&so->so_snd, 0); */
> :again:
> :            if (top == 0) {
> :                MGETHDR(m, M_TRYWAIT, MT_DATA);
> :                if (m == NULL) {
> :                    error = mawait(&so->so_snd.sb_mtx, -1, -1);
> :                    if (error) {
> :                      if (error == EWOULDBLOCK)
> :                         error = ENOBUFS;
> :                      goto release;
> :                    }
> :                    goto again;
> :                    /* SOCKBUF_LOCK(&so->so_snd, 0); */
> :                }
> :                mlen = MHLEN;
> :...
> :            /* SOCKBUF_LOCK(&so->so_snd, 0); */      /* XXX */
> :
> :Which means we don't have to drop the lock over the socket unless
> :we'd block on allocation.
> 
>     Works exactly as I originally intended.

Yup, less overhead.
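
The layering described in the quoted text above (the bottom layer records
the wait and returns NULL, and a higher layer decides whether to block) can
be modeled in user space roughly as follows. This is only a toy sketch:
every name in it (bottom_alloc, middle_get, top_op, pending_wait) is
hypothetical, nothing here is the kernel asleep()/await() API, and the
"sleep" is simulated by refilling a static pool.

```c
#include <stddef.h>

/* User-space model only; all names are hypothetical, not kernel APIs. */
enum outcome { DONE, WOULD_BLOCK };

static char pool_bufs[4][64];      /* toy buffer pool */
static int pool_free = 0;          /* buffers currently available */
static const void *pending_wait;   /* stands in for the asleep() record */

/* Bottom layer: never blocks.  On failure it records what the caller
 * would have to wait for (the asleep() step) and returns NULL. */
static void *bottom_alloc(void)
{
    if (pool_free == 0) {
        pending_wait = &pool_free;
        return NULL;
    }
    pool_free--;
    return pool_bufs[pool_free];
}

/* Middle layer: may hold its own lock across the call; it simply
 * propagates NULL upward without knowing who its caller is. */
static void *middle_get(void)
{
    return bottom_alloc();
}

/* Top layer: the only place that decides what to do about the blockage.
 * may_block == 0 models a non-blocking API. */
static enum outcome top_op(int may_block)
{
    void *buf;

    while ((buf = middle_get()) == NULL) {
        if (!may_block) {
            pending_wait = NULL;   /* abandon the recorded wait */
            return WOULD_BLOCK;
        }
        /* await() step: a real kernel would sleep here until the
         * resource is available; the model just refills the pool. */
        pool_free = 1;
        pending_wait = NULL;
    }
    (void)buf;                     /* the buffer would be used here */
    return DONE;
}
```

Calling top_op(0) on an empty pool returns WOULD_BLOCK without sleeping,
while top_op(1) takes the simulated await() path and then succeeds. The
point of the shape is that only top_op() ever makes the blocking decision.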

> :Matt, is this what you intended for it to do?  So far I've only 
> :seen it used to avoid races, but not to optimize out mutex
> :acquire/release.
> :
> :-- 
> :-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
> 
>     I've never liked the BSDI mutex rules, because subsystems have to have
>     major knowledge as to how other systems operate (in regards to whether
>     they can block or not), and the callee must have intimate knowledge of
>     the callers to know that it can hold a mutex and that be the only mutex
>     it is holding.
> 
>     This makes for extremely fragile and complex coding.

Also the need to drop/reacquire locks in a somewhat obtuse manner as my
example shows.

>     So, this said... I'm still on the fence as to whether await()/asleep()
>     can be used effectively.  As you said, there are not too many cases
>     at the moment and await()/asleep() does introduce significant code
>     complexity to the scheduler, and for it to really shine it needs to be
>     optimized to not require any sort of mutex at all in the calls to
>     asleep().
> 
>     In order to get rid of the overhead, asleep() needs to simply initialize
>     the curproc fields and not try to actually queue the process to the
>     sleep queue.  Right now asleep() queues the process to the sleep queue
>     (see kern/kern_synch.c) in order to support the ability for the system
>     to asynchronously 'wake the process up again' before it actually goes to
>     sleep (which causes the later await() to become a NOP).  i.e. the 
>     situation that caused the potential blockage might be resolved before
>     the process has a chance to sleep.  Traditionally we have used SPL levels
>     (and now mutexes) to prevent the possibility of a condition being
>     satisfied between the test and the tsleep().
> 
>     ----
> 				    Proposal

This is confusing:

> 
>     Revamp asleep/await to be based on state variables rather than
>     tsleep's traditional 'fake' addresses.  Rather than
>     have a traditional sleep/wakeup we instead have a state variable that
>     asleep/await operate on.  For example, let's say we have a memory allocator.
>     When the memory allocator finds it would block, it utilizes a global
>     state structure representing the blockage and clears the state
>     (blah.state = 0;).  Then it calls asleep(&blah).  asleep simply stores 
>     the pointer to &blah in the process structure, it does not try to queue
>     it or do anything else.  Thus no locking or mutexes or interrupt
>     disablement is required *at all*.  The routine is entirely passive and
>     safe to call from anywhere.

It's not queued _anywhere_?

>     Later, some event makes more memory available for allocation.  That
>     event is asynchronous and simply sets the state variable to 1
>     and wakes up anybody on the sleep queue for that condition variable.
>     This event will NOT catch the guy in the previous paragraph who has
>     not yet called await(), however, since the call to asleep() does not
>     actually enqueue the process (which would require a mutex).

again, it's not queued _anywhere_?

>     Later, the process that called asleep() finally decides to try
>     to go to sleep for real.  await() checks p_state->state and if it
>     is zero await() places the process on the sleep queue for real and actually
>     goes to sleep.  If p_state->state is non-zero, await() simply clears
>     the pointer (proc->p_state = NULL;) and returns (without sleeping).

This is a bit weird.  Basically await doesn't enqueue the process, so
there's a weird race going on such that every "free" needs to signal,
rather than just the ones done while the pool is exhausted.  In fact the
pool may not be exhausted but you still block, because no one signals
after you await.
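
To make the proposal quoted above easier to follow, here is a
single-threaded user-space model of it.  It is a sketch under assumptions,
not the proposed kernel code: struct wait_state, struct proc_model and the
model_* functions are hypothetical names, and "going to sleep" is
represented by a return value instead of a real sleep queue.

```c
#include <stddef.h>

/* User-space model only; all names are hypothetical. */
struct wait_state { volatile int state; };        /* 0 = blocked, 1 = satisfied */
struct proc_model { struct wait_state *p_state; };

static struct wait_state mem_wait = { 1 };

/* asleep(): purely passive.  It only remembers which state variable
 * to look at later; nothing is queued and nothing is locked. */
static void model_asleep(struct proc_model *p, struct wait_state *ws)
{
    p->p_state = ws;
}

/* The asynchronous "resource became available" event.  A real version
 * would also wake anyone already on the sleep queue. */
static void model_event(struct wait_state *ws)
{
    ws->state = 1;
}

/* await(): returns 1 if the process would really go to sleep,
 * 0 if the call turns into a no-op. */
static int model_await(struct proc_model *p)
{
    struct wait_state *ws = p->p_state;

    if (ws == NULL)
        return 0;                  /* no asleep() was recorded */
    if (ws->state == 0) {
        /* A real kernel must make this check and the enqueue atomic
         * with respect to model_event(); otherwise the event can slip
         * in between them and the wakeup is lost. */
        return 1;
    }
    p->p_state = NULL;             /* condition already satisfied */
    return 0;
}
```

If model_event() runs between model_asleep() and model_await(), the await
returns 0 without sleeping, which is the cheap path the proposal is after.
The comment inside model_await() marks the window the reply above is
worried about.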

>     I believe that this conforms to the state of the SMP system much much
>     better than my original asleep()/await() implementation, and has the
>     advantage of extremely *LOW* overhead (virtually none in many cases).
> 
>     You could then use it to give the 'power of blocking' to the caller 
>     rather than the callee.  This in turn gives you much greater flexibility
>     in regards to who can hold mutexes when and who can hold mutexes through
>     procedure calls to other subsystems.

Jason Evans just added his condition variables, you could change
asleep/await into cv_attach/cv_await or something, that would give
you less contention on the sleep/run queues, but you'd still need
a spinlock or mutex on these variables.
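
As a rough illustration of why a lock is still needed, here is what the
state check looks like with ordinary POSIX condition variables in user
space.  This is an analogy only: it is not the kernel condition variable
code, and struct cv_state, cv_state_signal() and cv_state_await() are
made-up names (as are cv_attach/cv_await above).

```c
#include <pthread.h>

/* User-space analogy only; all names are hypothetical. */
struct cv_state {
    pthread_mutex_t mtx;    /* protects 'ready' */
    pthread_cond_t  cv;     /* sleepers wait here */
    int             ready;  /* 0 = blocked, 1 = satisfied */
};

static struct cv_state mem_cv = {
    PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, 0
};

/* Event side: set the state and wake all sleepers under the mutex. */
static void cv_state_signal(struct cv_state *s)
{
    pthread_mutex_lock(&s->mtx);
    s->ready = 1;
    pthread_cond_broadcast(&s->cv);
    pthread_mutex_unlock(&s->mtx);
}

/* Waiter side: the mutex makes "check the state, then sleep" atomic
 * with respect to cv_state_signal(), so a wakeup cannot be lost.
 * Returns how many times the caller actually had to wait. */
static int cv_state_await(struct cv_state *s)
{
    int waits = 0;

    pthread_mutex_lock(&s->mtx);
    while (!s->ready) {
        pthread_cond_wait(&s->cv, &s->mtx);
        waits++;
    }
    pthread_mutex_unlock(&s->mtx);
    return waits;
}
```

If the signal arrives before the await, cv_state_await() returns 0 and
never sleeps, which matches the no-op case in the proposal; the price is
the mutex taken on both sides.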

-- 
-Alfred Perlstein - [bright@wintelcom.net|alfred@freebsd.org]
"I have the heart of a child; I keep it in a jar on my desk."


To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-arch" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20010117112850.X7240>
