Date:      Thu, 13 Aug 1998 17:15:44 -0700
From:      David Greenman <dg@root.com>
To:        peter@sirius.com
Cc:        mrcpu@internetcds.com (Jaye Mathisen), hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject:   Re: vmopar state in 2.2.7? 
Message-ID:  <199808140015.RAA17635@implode.root.com>
In-Reply-To: Your message of "Thu, 13 Aug 1998 11:34:00 PDT." <199808131834.LAA14961@staff.sirius.com> 

>We worked around a similar problem (processes left immortal, here in
>the context of several processes [httpd] writing to the same NFS mounted 
>file [http log file]) by adjusting the timeout value from 0 (never) to
>2 * hz (2 seconds). Details are posted as follow-up to kern/4588 in
>FreeBSD.org's gnats problem report database.
>
>It looks like other parts of the kernel (here the vm subsystem) suffer
>similar problems. It appears to me that an overly optimistic use of
>tsleep(), with both interrupts disabled and the time-out set to infinity,
>leaves immortal yet paralyzed processes around.

   No, there's just a missing or unprotected wakeup() somewhere.

>From /usr/src/sys/vm/vm_object.c (a second, similar occurrence around
>line 1261):
>
>   1218      /*
>   1219      * The busy flags are only cleared at
>   1220      * interrupt -- minimize the spl transitions
>   1221      */
>   1222      if ((p->flags & PG_BUSY) || p->busy) {
>   1223               s = splvm();
>   1224               if ((p->flags & PG_BUSY) || p->busy) {
>   1225                       p->flags |= PG_WANTED;
>   1226                       tsleep(p, PVM, "vmopar", 0);
>   1227                       splx(s);
>   1228                       goto again;
>   1229               }
>   1230               splx(s);
>   1231      }
>
>The code in line 1224 checks a condition to see whether somebody else
>is already performing an operation on object p; in this case it wants
>to ensure that a wakeup() for the following tsleep() is delivered by
>setting a flag in line 1225.
>
>But what ensures that the world did not change between lines 1224 and
>1225? Could the wakeup() happen after 1224 has determined to issue
>the tsleep() but before the flagging in 1225 was registered? Then it
>would be missed. Is this a race condition biting heavily hit machines?

   No. The wakeup occurs as a function of IO rundown which occurs in an
interrupt context. The purpose of splvm() is to block interrupts to prevent
the race condition.
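
   The check-then-sleep pattern described above can be sketched in
userspace. This is only an analogy, not the kernel code: a pthread mutex
stands in for splvm()/splx(), a condition variable stands in for the
tsleep()/wakeup() channel, and every name below (struct page, page_wait,
page_done, demo_no_lost_wakeup) is hypothetical. Because the waiter tests
the busy flag and sets the wanted flag while the completion side is locked
out, the wakeup cannot fall between the two steps.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for a page; not the kernel's vm_page. */
struct page {
    pthread_mutex_t lock;  /* plays the role of splvm()/splx() */
    pthread_cond_t  cv;    /* plays the role of the sleep/wakeup channel */
    bool busy;             /* analogue of PG_BUSY / p->busy */
    bool wanted;           /* analogue of PG_WANTED */
};

void page_init(struct page *p, bool busy)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->cv, NULL);
    p->busy = busy;
    p->wanted = false;
}

/* Waiter: test the flag and register interest under one lock hold. */
void page_wait(struct page *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->busy) {
        p->wanted = true;
        /* Atomically releases the lock while sleeping, reacquires on wake. */
        pthread_cond_wait(&p->cv, &p->lock);
    }
    pthread_mutex_unlock(&p->lock);
}

/* Completion side: analogue of the I/O-done path that issues wakeup(). */
void page_done(struct page *p)
{
    pthread_mutex_lock(&p->lock);
    p->busy = false;
    if (p->wanted) {
        p->wanted = false;
        pthread_cond_broadcast(&p->cv);
    }
    pthread_mutex_unlock(&p->lock);
}

static void *waiter_thread(void *arg)
{
    page_wait(arg);
    return NULL;
}

/* Returns 1 once the waiter has come back and the page is no longer busy. */
int demo_no_lost_wakeup(void)
{
    struct page p;
    pthread_t t;
    int rc;

    page_init(&p, true);
    rc = pthread_create(&t, NULL, waiter_thread, &p);
    assert(rc == 0);
    page_done(&p);          /* may run before or after the waiter sleeps */
    pthread_join(t, NULL);  /* would hang here if the wakeup were lost */
    return !p.busy;
}
```

   Whichever thread gets the lock first, the waiter returns: either it
sees the flag already clear, or it has set wanted before the completion
side can look at it.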

>Try changing lines 1226 and 1261 to something like:
>	tsleep(p, PVM, "vmopar", 5 * hz);
>
>From the tsleep man page:
>
>     Tsleep is the general sleep call.  Suspends the current process until a
>     wakeup is performed on the specified identifier.  The process will then
>     be made runnable with the specified priority. Sleeps at most timo / hz
>     seconds (0 means no timeout).  If pri includes the PCATCH flag, signals
>     are checked before and after sleeping, else signals are not checked.  Re-
>     turns 0 if awakened, EWOULDBLOCK if the timeout expires.  If PCATCH is
>     set and a signal needs to be delivered, ERESTART is returned if the cur-
>     rent system call should be restarted if possible, and EINTR is returned
>     if the system call should be interrupted by the signal (return EINTR).
>
>The call would then return EWOULDBLOCK once the time-out expires; I have
>no clue what that will do to your system or apps ;) -- I would expect the
>blocked process to go away within 5 seconds...

   It would do bad things. There is a bug, but it isn't there, and that
isn't the fix. I think this is another manifestation of the lack of NFSnode
locking in the kernel, but that's just a guess.
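
   One general point about timed sleeps can be shown in userspace, again
only as an analogy with hypothetical names (wait_while_busy,
demo_timeout_leaves_busy): when a timed wait expires, the condition being
waited for has not become true, so the caller still has to recheck it and
decide what to do. Here pthread_cond_timedwait() and ETIMEDOUT play the
roles of tsleep() with a non-zero timo and EWOULDBLOCK. This sketch does
not model the NFS or VM code and says nothing about where the real bug is.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper: wait up to timeout_ms for *busy to be cleared.
 * Returns 0 if it was cleared, ETIMEDOUT if it is still set when the
 * deadline passes, or another error code from pthread_cond_timedwait().
 */
int wait_while_busy(pthread_mutex_t *lock, pthread_cond_t *cv,
                    const bool *busy, long timeout_ms)
{
    struct timespec deadline;
    int rc = 0;

    assert(timeout_ms >= 0);

    /* pthread_cond_timedwait() takes an absolute CLOCK_REALTIME deadline. */
    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(lock);
    while (*busy && rc == 0)
        rc = pthread_cond_timedwait(cv, lock, &deadline);
    if (!*busy)
        rc = 0;  /* the condition was met, even if the timer also fired */
    pthread_mutex_unlock(lock);
    return rc;
}

/* Returns 1 if the wait timed out and the flag is still set afterwards. */
int demo_timeout_leaves_busy(void)
{
    pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    pthread_cond_t cv = PTHREAD_COND_INITIALIZER;
    bool busy = true;  /* nobody ever clears it: the missing-wakeup case */
    int rc = wait_while_busy(&lock, &cv, &busy, 20);

    return rc == ETIMEDOUT && busy;
}
```

   The timeout only turns an indefinite sleep into a periodic one; the
flag is still set when the wait returns.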

-DG

David Greenman
Co-founder/Principal Architect, The FreeBSD Project

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message


