FreeBSD Mail Archives

Date:      Wed, 16 Dec 1998 19:33:55 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        hackers@FreeBSD.ORG
Subject:   Async blocking on temporary failures
Message-ID:  <199812170333.TAA83886@apollo.backplane.com>


    This is a general idea I'm throwing out that I believe could be integrated
    into FreeBSD pretty easily.

    It is the notion of not blocking on 'trivial' conditions deep, deep in
    the kernel, but instead flagging the task for an asynchronous wait and
    returning NULL.  This gives us the ability to pop a temporary failure
    (typically NULL) to a higher level routine.  The higher level routine
    can then decide what to do:  i.e. block on the asynchronous wait queued
    by the subroutine or pop itself out to yet a higher level routine and let 
    it deal with it, or do something else.

    An asynchronous wait capability would allow a deep, low level routine
    to propagate the blocking condition up through multiple procedural levels
    undoing any temporary locks made by those procedures before the blocking
    condition is actually acted upon.

    Why do we need this?  Well, we have several serious deadlock problems
    within the kernel and these problems are only going to get worse with SMP
    as master locks are propagated inward.  By my reading of the kernel,
    most of these deadlock situations occur when something deep within the
    kernel finds it necessary to block on some temporary situation such as
    trying to allocate memory or a buffer or something like that.  Most of
    these blocking situations already incorporate hysteresis, which means
    that we *can* 'abort' the routine by initiating the async wait, returning 
    NULL instead of blocking, and allowing some higher level procedure to
    determine when to actually block.

    An asynchronous wait capability thus allows a process to block without
    holding major locks (spinlocks, bp locks, vm_page locks, vnode locks,
    etc etc etc).

    I use asynchronous waits in some of my other OS projects and I have
    found them to be invaluable in their ability to avoid deadlocks and
    to even greatly simplify code.

    Here's an example to illustrate the idea:

	The (FS)->bread()->getblk()->allocbuf() chain would benefit greatly
	from such a mechanism.    Assuming an async wait capability exists,
	allocbuf() could be adjusted such that it never blocks but instead
	returns 0 if an async wait occurs, allowing the chain to 'undo'
	itself back through getblk() and then have the blocking condition
	actually occur in the bread().  The mechanism could then eventually 
	be extended on up past the bread() and be directly supported by
	FS code and thus avoid holding locks on (for example) vnodes due
	to a synchronous I/O request, which would massively increase 
	parallelism on simultaneous VFS/VNODE ops to the same descriptor
	(I'm thinking of mmap page faults specifically but it applies to
	any lseek()/read() combo).

    How it would work:

	Instead of tsleep()ing on a structure, we call asleep() and return a
	temporary failure.  For example, a routine that allocates or returns
	a bp would call asleep() and return NULL, rather than calling
	tsleep(), retrying internally, and eventually returning a valid bp.

	The higher level parent procedure can propagate the failure up by
	undoing whatever locks it holds and returning a failure condition
	(note: without calling asleep() itself).  Eventually you get to a
	parent procedure which decides it must block waiting for the
	temporary failure to clear and then retry the call that failed.
	This routine blocks by calling await().
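
	A rough sketch of that propagation, as a single-process userland
	model rather than kernel code.  Every toy_* name is hypothetical,
	and toy_await() only simulates another process freeing a buffer,
	since a real await() would sleep.  The point it tries to show is
	that the middle level drops its lock before the failure travels up,
	so the top level blocks with no locks held:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model; none of these names are real kernel interfaces. */
struct toy_buf { int data; };

static struct toy_buf toy_pool[1];   /* a one-buffer pool */
static int toy_pool_free = 0;        /* buffers currently available */
static const void *toy_async_wait;   /* the single pending async wait, or NULL */
static int toy_locks_held;           /* locks held by this "process" */

/* Queue an async wait on ident without blocking (NULL clears it). */
static void toy_asleep(const void *ident) { toy_async_wait = ident; }

/*
 * Stand-in for await().  A real one would sleep until wakeup(ident);
 * here we just simulate another process releasing a buffer and waking us.
 */
static void toy_await(void)
{
    assert(toy_locks_held == 0);     /* we block with no locks held */
    if (toy_async_wait != NULL) {
        toy_pool_free = 1;
        toy_async_wait = NULL;
    }
}

/* Lowest level: never blocks; queues the wait and returns NULL instead. */
static struct toy_buf *toy_allocbuf(void)
{
    if (toy_pool_free == 0) {
        toy_asleep(&toy_pool_free);
        return NULL;
    }
    toy_pool_free--;
    return &toy_pool[0];
}

/* Middle level: takes a lock and undoes it before passing the failure up. */
static struct toy_buf *toy_getblk(void)
{
    struct toy_buf *bp;

    toy_locks_held++;
    bp = toy_allocbuf();
    if (bp == NULL) {
        toy_locks_held--;            /* undo; note no asleep() call here */
        return NULL;
    }
    bp->data = 42;
    toy_locks_held--;
    return bp;
}

/* Top level: the only place that blocks, then retries the failed call. */
static struct toy_buf *toy_bread(int *retries)
{
    struct toy_buf *bp;

    *retries = 0;
    while ((bp = toy_getblk()) == NULL) {
        toy_await();
        (*retries)++;
    }
    return bp;
}
```

	Calling toy_bread() with the pool initially empty fails once in
	toy_allocbuf(), unwinds through toy_getblk(), waits at the top, and
	succeeds on the retry.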

	Now, asleep() and await() do not nest.  There is a single embedded
	asyncwait structure in the struct proc.  An asleep() call
	*replaces* any previous async sleep.  await() blocks the process
	on whatever the most recent asyncwait structure was.  A wakeup on
	the associated address clears any queued asyncwaits.

	In the case where the async wait address is woken up prior to await()
	being called, the async wait structure is cleared by the wakeup and
	await() becomes a NOP.  The async wait can also be cleared by calling
	asleep(NULL).  Thus, *ALL* potential race conditions can be handled
	without any fancy coding.
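
	The rules above (one slot per process, asleep() replaces, wakeup
	clears, await() is a NOP once cleared) can be modelled in a few
	lines.  This is only a sketch of the semantics under my own
	assumptions: the model_* names are hypothetical, the model tracks a
	single process, and instead of really sleeping the await stand-in
	just reports whether it would block:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed semantics; not real kernel code. */
struct model_proc {
    const void *async_ident;   /* the single embedded async wait, NULL if none */
};

/* asleep(): queue an async wait, replacing any previous one; NULL clears it. */
static void model_asleep(struct model_proc *p, const void *ident)
{
    p->async_ident = ident;
}

/* wakeup(): clear the queued async wait if it is on this address. */
static void model_wakeup(struct model_proc *p, const void *ident)
{
    if (p->async_ident != NULL && p->async_ident == ident)
        p->async_ident = NULL;
}

/* await(): true means the process would block; false means it is a NOP. */
static bool model_await_would_block(const struct model_proc *p)
{
    return p->async_ident != NULL;
}
```

	Because there is only one slot, a routine further up the chain never
	has to know which address a lower level chose to wait on.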

    How to deal with race conditions:

	There are two ways to deal with potential race conditions.  The
	traditional way is to call splbio() or equivalent to prevent other
	processes from waking up the object you are about to sleep on.

	You can still do this with asleep().

	asleep() gives us another option:  Call asleep() BEFORE testing the
	condition in the structure being waited on.  Then test the condition
	and if you determine that you do not need to block, call asleep(NULL)
	to clear the async wait and continue as if nothing had happened.  
	Specifically:

	    /*
	     * Block waiting for blah
	     */
	    if (structure->flags & somecondition) {
		asleep(structure, ...);
		if (structure->flags & somecondition) {
		    return failure....
		}
		asleep(NULL, ...);
	    }
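
	To make the ordering argument concrete, here is a small userland
	model of that fragment.  It is a sketch under assumptions: the
	demo_* names and the DEMO_BUSY flag are hypothetical, and the
	"other process" is simulated by a callback run between the asleep()
	and the re-test, which is exactly the window where a wakeup could
	otherwise be lost:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define DEMO_BUSY 0x1                 /* hypothetical "somecondition" flag */

struct demo_obj { int flags; };

static const void *demo_async_ident;  /* this process's single async wait */

static void demo_asleep(const void *ident) { demo_async_ident = ident; }

/* What the other process does when it is finished with the object. */
static void demo_release(struct demo_obj *o)
{
    o->flags &= ~DEMO_BUSY;
    if (demo_async_ident == o)
        demo_async_ident = NULL;      /* the wakeup clears the queued wait */
}

/*
 * Returns true on temporary failure (an async wait is queued and the caller
 * may await), false if the object is usable.  racer, if non-NULL, runs
 * between the asleep and the re-test to simulate an unlucky interleaving.
 */
static bool demo_try_use(struct demo_obj *o, void (*racer)(struct demo_obj *))
{
    if (o->flags & DEMO_BUSY) {
        demo_asleep(o);               /* register first ... */
        if (racer != NULL)
            racer(o);                 /* ... the other process may run here ... */
        if (o->flags & DEMO_BUSY)     /* ... then re-test the condition */
            return true;
        demo_asleep(NULL);            /* did not need to block after all */
    }
    return false;
}
```

	If the release lands inside the window, the re-test sees the flag
	clear and the routine carries on; if it lands later, the wakeup
	clears the queued wait and the eventual await() has nothing to
	block on.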


    I invite discussion on this feature.  I would be pleased to develop it
    for FreeBSD.  I think it would be extremely useful, especially with SMP
    but also with non-SMP kernels with regard to avoiding deadlock situations
    in the kernel.  I believe that the feature could be implemented easily
    and folded into major subsystems incrementally, 'fixing' the kernel from
    the inside out without having to make wholesale changes all in one shot.

						-Matt

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199812170333.TAA83886>
