FreeBSD Mail Archives

Date:      Sat, 19 Dec 1998 01:53:06 -0800 (PST)
From:      Matthew Dillon <dillon@apollo.backplane.com>
To:        Don Lewis <Don.Lewis@tsc.tdk.com>
Cc:        freebsd-current@FreeBSD.ORG
Subject:   Re: asleep()/await(), M_AWAIT, etc...
Message-ID:  <199812190953.BAA07138@apollo.backplane.com>
References:   <199812190844.AAA11936@salsa.gv.tsc.tdk.com>


:On Dec 17, 12:05am, Matthew Dillon wrote:
:} Subject: asleep()/await(), M_AWAIT, etc...
:
:}     We add an await() kernel function.  This function initiates any timeout
:}     and puts the process to sleep, but only if it is still on a sleep queue.
:}     If someone (i.e. an interrupt) wakes up the sleep address after the
:}     process calls asleep() but before it calls await(), the slpque is
:}     cleared and the await() winds up being a NOP.
:
:How likely is this to happen if the process doesn't go to sleep for some
:other reason in between the asleep() and the await()?  The CPU can execute
:a *lot* of code in the time it takes for physical I/O to happen.

    Well, the idea is for asleep() to not interfere with a normal sleep.
    If a process does an asleep() and then, for some reason, does a
    normal sleep or another asleep() without waiting for the prior event
    to occur, the original asleep() condition is lost.  A later await()
    that, code-wise, was expecting to wait for the condition earmarked
    by the original asleep() will not wait for it; it returns immediately
    and thus causes an immediate retry.  This shouldn't cause a problem,
    though.

    The chance of a condition being signalled after an asleep() but before
    the associated await(), assuming no blocking in between, is not very
    high but I expect it would happen under normal operating conditions
    maybe 1 out of every 5000 or so uses.
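
    The behaviour described above can be modelled in user space.  What
    follows is only a single-threaded sketch, not kernel code: struct
    model_proc, the model_*() functions and the AWAIT_* results are
    names made up for illustration, and the "sleep" is simulated by
    returning a status instead of blocking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space model; nothing here is the real kernel API. */
struct model_proc {
    const void *wchan;  /* address registered by asleep(), NULL if none */
};

enum await_result { AWAIT_NOP, AWAIT_WOULD_SLEEP };

/* Wake the process if it is registered on chan: take it off the queue. */
static void model_wakeup(struct model_proc *p, const void *chan)
{
    if (p->wchan == chan)
        p->wchan = NULL;
}

/* Register interest in chan without sleeping; replaces any earlier one. */
static void model_asleep(struct model_proc *p, const void *chan)
{
    assert(chan != NULL);
    p->wchan = chan;
}

/* A normal sleep: replaces any earlier asleep(), then runs to completion. */
static void model_sleep(struct model_proc *p, const void *chan)
{
    p->wchan = chan;
    model_wakeup(p, chan);  /* the event fires and the process resumes */
}

/* Report whether await() would have to block. */
static enum await_result model_await(struct model_proc *p)
{
    if (p->wchan == NULL)
        return AWAIT_NOP;   /* already woken or superseded: return at once */
    p->wchan = NULL;        /* a real await() would sleep here until wakeup */
    return AWAIT_WOULD_SLEEP;
}

/* The wakeup lands between asleep() and await(). */
static enum await_result demo_early_wakeup(void)
{
    int event;
    struct model_proc p = { NULL };

    model_asleep(&p, &event);
    model_wakeup(&p, &event);   /* e.g. an interrupt fires here */
    return model_await(&p);
}

/* A normal sleep between asleep() and await() discards the registration. */
static enum await_result demo_superseded(void)
{
    int event_a, event_b;
    struct model_proc p = { NULL };

    model_asleep(&p, &event_a);
    model_sleep(&p, &event_b);
    return model_await(&p);     /* immediate return, the caller retries */
}

/* Nothing happens in between: await() really has to wait. */
static enum await_result demo_no_wakeup(void)
{
    int event;
    struct model_proc p = { NULL };

    model_asleep(&p, &event);
    return model_await(&p);
}
```

    In the first two cases the model's await() is a NOP and the caller
    simply retries; only in the third would it actually sleep.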

    The situation becomes more interesting when you get into SMP situations,
    especially once we start allowing all N processors to enter into supervisor
    mode and run mainstream supervisor code simultaneously.  It should be
    noted that event interlocks can be done very easily with asleep()/await()
    without having to mess with the ipl mask.  Since the ipl mask doesn't
    work when SMP supervisor operation is allowed on more than one cpu at
    a time, it is just as well that another mechanism exists.

:}     The purpose of the new routines is to allow blocking conditions to
:}     propagate up a subroutine chain and get handled at a higher level rather
:}     than at a lower level in those areas of code that cannot afford to
:}     leave exclusive locks sitting around.  For example, if bread() blocks
:}     waiting for a low level disk I/O on a block device, the vnode remains
:}     locked throughout which badly mars potential parallelism when multiple
:}     programs are accessing the same file.  There is no reason to leave the
:}     high level vnode locked while bringing a page into the VM buffer cache!
:
:What happens if some other process decides to truncate the file while
:another process is in the middle of paging in a piece of it?  If there
:is no reason to care about this sort of thing, then there is no reason
:to hold the lock across the bread(), which would probably be a simple

    Well, in this particular case we don't care because it isn't the pagein
    into the process's VM space that we are waiting on, it's the bringing
    of the page from the underlying block device into the filesystem cache,
    which is independent of the overlaid filesystem structure and was
    queued to the disk device on the original attempt.

    In the case of a truncate, this higher level operation will not affect
    the lower level I/O in progress (or, if it does abort it, will wake up
    anybody waiting for that page anyway).  The wakeup occurs and the
    original requesting task retries its vm fault.  On this attempt it
    notices the fact that the file has been truncated and does the right
    thing.  Effectively we are retrying an operation 'from scratch', so
    the fact that the truncate occurred is handled properly.
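
    A rough user-space sketch of that retry-from-scratch behaviour
    follows.  Everything in it is an assumption made for illustration:
    struct model_file, the FAULT_* codes and the idea that "the right
    thing" shows up as a beyond-EOF status are not taken from the real
    VM code.  The point is only that each attempt re-reads the file
    size, so a truncate that happened while waiting is seen on the
    retry.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a file and its cached block. */
struct model_file {
    size_t size;          /* current file length in bytes */
    bool   block_cached;  /* has the block reached the filesystem cache? */
};

enum fault_status { FAULT_OK, FAULT_RETRY, FAULT_BEYOND_EOF };

/* One attempt at the fault; never blocks while "holding the vnode". */
static enum fault_status fault_once(const struct model_file *f, size_t offset)
{
    if (offset >= f->size)
        return FAULT_BEYOND_EOF;  /* the file is shorter than the offset */
    if (!f->block_cached)
        return FAULT_RETRY;       /* I/O queued, asleep() would be called */
    return FAULT_OK;
}

/* Top level: wait with nothing locked, then redo the whole operation. */
static enum fault_status fault_with_retry(struct model_file *f, size_t offset,
                                          bool truncate_while_waiting)
{
    for (;;) {
        enum fault_status st = fault_once(f, offset);

        if (st != FAULT_RETRY)
            return st;
        /* await() would sleep here; simulate what happens meanwhile. */
        if (truncate_while_waiting)
            f->size = 0;          /* another process truncates the file */
        f->block_cached = true;   /* I/O completion issues the wakeup */
    }
}

/* Fault at offset 4096 of an 8192-byte file whose block is not cached. */
static enum fault_status demo_fault(bool truncate_while_waiting)
{
    struct model_file f = { 8192, false };

    return fault_with_retry(&f, 4096, truncate_while_waiting);
}
```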

    Another indirect use for asleep() would be to unwind locks when an inner
    lock cannot be obtained and to then retry the entire sequence later when 
    the inner lock 'might' become attainable.  You do this by asleep()ing on
    the event of the inner lock getting unlocked, then popping back through
    the call stack and unwinding the locks you were able to get, then
    sleeping (calling await()) at the top level (holding no locks) and retrying
    when you wake up again.  This wouldn't work very well for complex locking 
    (4 or more levels), but I would guess that it would work quite nicely
    for the 2-layer locking that we typically do in the kernel.
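
    The unwind-and-retry pattern for 2-layer locking might look roughly
    like the following user-space model.  It is a single-threaded
    sketch under stated assumptions: struct model_lock, try_lock(),
    unlock(), model_await() and the OP_* codes are hypothetical names,
    and the other lock holder is simulated inside model_await() rather
    than running concurrently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_lock { bool held; };

/* Lock whose release the last simulated asleep() is waiting for. */
static struct model_lock *pending;

static bool try_lock(struct model_lock *l)
{
    if (l->held)
        return false;
    l->held = true;
    return true;
}

static void unlock(struct model_lock *l)
{
    assert(l->held);
    l->held = false;
    if (pending == l)
        pending = NULL;     /* wakeup() on the unlock event */
}

enum op_status { OP_DONE, OP_RETRY };

static enum op_status inner_step(struct model_lock *inner)
{
    if (!try_lock(inner)) {
        pending = inner;    /* asleep() on the inner lock being released */
        return OP_RETRY;    /* do not block here, report upward instead */
    }
    /* ... the real work would happen here ... */
    unlock(inner);
    return OP_DONE;
}

static enum op_status outer_step(struct model_lock *outer,
                                 struct model_lock *inner)
{
    enum op_status st;

    if (!try_lock(outer)) {
        pending = outer;
        return OP_RETRY;
    }
    st = inner_step(inner);
    unlock(outer);          /* unwind the lock we did get, in every case */
    return st;
}

/* Stand-in for await(): the other holder releases the lock we wait on. */
static void model_await(void)
{
    if (pending != NULL)
        unlock(pending);
}

/* Top level: holds no locks while waiting; returns the attempt count. */
static int run_operation(struct model_lock *outer, struct model_lock *inner)
{
    int attempts = 0;

    for (;;) {
        attempts++;
        if (outer_step(outer, inner) == OP_DONE)
            return attempts;
        model_await();
    }
}

static int demo_contended(void)
{
    struct model_lock outer = { false }, inner = { true };

    return run_operation(&outer, &inner);
}

static int demo_uncontended(void)
{
    struct model_lock outer = { false }, inner = { false };

    return run_operation(&outer, &inner);
}
```

    With the inner lock initially held by someone else, the first
    attempt unwinds and the second one succeeds; with both locks free
    the operation completes on the first attempt.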

:}     allocation fails would be able to unwind the lock(s), await(), and retry.
:}     This is something the current code cannot do at all.
:
:Most things that allocate memory want to scribble on it right after they
:allocate it.  Using M_AWAIT would take a fair amount of rewriting.  You
:can already do something similar without M_AWAIT by using M_NOWAIT.  If
:that fails, unwind the lock, use M_WAITOK, and relock the object.  However,
:it would probably be cleaner to just do MALLOC(..., M_WAITOK) before
:grabbing the lock, if possible.

    The point here is that if you cannot afford to block in the procedure
    that is doing the memory allocation, you may be able to block in a 
    higher level procedure.  M_NOWAIT and M_WAITOK cannot cover that
    situation at all.  M_AWAIT (which is like M_NOWAIT, but calls
    asleep() as well as returning NULL) *can*.  The only implementation
    requirement is that the procedure call chain being implemented with
    asleep() understand a temporary failure condition and do the right
    thing with it (eventually await() and retry from the top level).
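
    As a sketch of how a call chain could treat an M_AWAIT failure as
    a temporary condition, here is a small user-space model.  It is
    not the proposed kernel implementation: MODEL_M_AWAIT and its
    value, the one-block pool, model_malloc(), model_free() and the
    OP_* codes are all invented for illustration, and the wait is
    simulated by handing a block back to the pool.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_M_AWAIT 0x2          /* hypothetical flag value */

static int  free_blocks;           /* blocks left in the model pool */
static bool asleep_registered;     /* set when the allocator "asleep()s" */
static bool object_locked;         /* the lock the low level code holds */
static char block_storage[64];     /* the single block the pool hands out */

/* Never blocks: on failure, register the asleep() and return NULL. */
static void *model_malloc(int flags)
{
    if (free_blocks > 0) {
        free_blocks--;
        return block_storage;
    }
    if (flags & MODEL_M_AWAIT)
        asleep_registered = true;  /* asleep() on "memory became free" */
    return NULL;
}

static void model_free(void *p)
{
    assert(p == block_storage);
    free_blocks++;
    asleep_registered = false;     /* wakeup() for anyone waiting */
}

enum op_status { OP_DONE, OP_TEMPFAIL };

/* Low level: cannot afford to block while the object is locked. */
static enum op_status update_object(void)
{
    char *buf;

    assert(!object_locked);
    object_locked = true;
    buf = model_malloc(MODEL_M_AWAIT);
    if (buf == NULL) {
        object_locked = false;     /* unwind instead of sleeping here */
        return OP_TEMPFAIL;
    }
    buf[0] = 1;                    /* scribble on the new memory */
    model_free(buf);
    object_locked = false;
    return OP_DONE;
}

/* Top level: wait with no locks held, then retry the whole operation. */
static int top_level(void)
{
    int attempts = 0;

    for (;;) {
        attempts++;
        if (update_object() == OP_DONE)
            return attempts;
        assert(!object_locked && asleep_registered);
        /* await() would sleep here; simulate memory being freed. */
        free_blocks++;
        asleep_registered = false;
    }
}

/* Run the operation with a given number of blocks initially free. */
static int demo_alloc_retry(int initial_free)
{
    free_blocks = initial_free;
    asleep_registered = false;
    object_locked = false;
    return top_level();
}
```

    Starting with an empty pool, the first attempt fails without
    blocking, the lock is dropped, and the retry after the simulated
    wait succeeds.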

:There may be cases where this is not possible.  For example, the amount of
:the memory you need to allocate depends on the object that you have locked.

    Oh, certainly, but asleep/await do not have to be implemented everywhere,
    only in those places where it makes sense to.  We aren't removing any
    of the prior functionality, we are adding new functionality to allow
    us to solve deadlock situations that occur with the old functionality.

:If you have the object unlocked while the memory is being allocated, another
:process may touch the object while it is unlocked and you'll end up allocating
:the wrong amount of memory.  The only scheme that works in this case is
:locking the object first and leaving it locked across MALLOC(..., M_WAITOK).
:
:NOTE: some of the softupdates panics before 3.0-RELEASE were caused by

    I think you missed the primary point of asleep()/await().  The idea
    is that you pop back through subroutine levels, undoing the entire
    operation (or a good portion of it), then 'retry later'.  What you
    describe is precisely the already-existent situation that asleep() and
    await() can be used to fix.  This might sound expensive, but most
    of the places where we would need to use asleep()/await() would not
    actually have to pop back more than a few subroutine levels to be
    effective.

						-Matt

:vnodes inadvertently being unlocked and then relocked in some low level
:routines, which allowed files to be fiddled with by one process while
:another process thought it had exclusive access.

    Matthew Dillon  Engineering, HiWay Technologies, Inc. & BEST Internet 
                    Communications & God knows what else.
    <dillon@backplane.com> (Please include original email in any response)    

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-current" in the body of the message

Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?199812190953.BAA07138>
