Date:      Thu, 13 Aug 1998 11:34:00 -0700 (PDT)
From:      peter@sirius.com
To:        mrcpu@internetcds.com (Jaye Mathisen)
Cc:        hackers@FreeBSD.ORG, stable@FreeBSD.ORG
Subject:   Re: vmopar state in 2.2.7?
Message-ID:  <199808131834.LAA14961@staff.sirius.com>
In-Reply-To: <Pine.NEB.3.95.980813010207.8849B-100000@schizo.cdsnet.net> from Jaye Mathisen at "Aug 13, 98 01:06:02 am"

> 
> 
> I'm having a problem with my INN 2.1 newsreader machines, which NFS-mount
> the spool.
> 
> The nnrpd processes are occasionally getting stuck in what top shows as
> the vmopar state.  ps shows the process in Ds state.
> 
> 
> No kill (obviously) will get it unstuck, and nothing else I do seems to
> make it come back to life.  
> 
> The NFS server is a Network Appliance, running latest released code,
> UDP mounts, v2 NFS.
> 
> Any tip appreciated.
> 

We worked around a similar problem (processes left immortal; in our case,
several httpd processes writing to the same NFS-mounted HTTP log file) by
changing the timeout value from 0 (never) to 2 * hz (2 seconds). Details
are posted as a follow-up to kern/4588 in FreeBSD.org's GNATS problem
report database.

Other parts of the kernel (here, the VM subsystem) appear to suffer from
similar problems. It seems to me that an overly optimistic use of
tsleep(), with interrupts disabled and the timeout set to infinity,
leaves immortal yet paralyzed processes behind.

From /usr/src/sys/vm/vm_object.c (a second, similar occurrence is around
line 1261):

   1218      /*
   1219      * The busy flags are only cleared at
   1220      * interrupt -- minimize the spl transitions
   1221      */
   1222      if ((p->flags & PG_BUSY) || p->busy) {
   1223               s = splvm();
   1224               if ((p->flags & PG_BUSY) || p->busy) {
   1225                       p->flags |= PG_WANTED;
   1226                       tsleep(p, PVM, "vmopar", 0);
   1227                       splx(s);
   1228                       goto again;
   1229               }
   1230               splx(s);
   1231      }

The code at line 1224 re-checks, at splvm(), whether somebody else is
already operating on page p; if so, it sets PG_WANTED at line 1225 to
ensure that a wakeup() for the following tsleep() is delivered.

But what ensures that the world did not change between lines 1224 and
1225? Could the wakeup() happen after line 1224 has decided to call
tsleep() but before the flag set at line 1225 is registered? The wakeup
would then be missed. Is this a race condition biting heavily loaded
machines?
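The interleaving asked about above can be made concrete with a small,
single-threaded toy model. This is not kernel code: `struct toy_page`,
`release()` and `stuck_after()` are hypothetical names, the booleans only
stand in for PG_BUSY/p->busy and PG_WANTED, and the "interrupt" is
injected by hand at one of three points. The real code raises splvm() at
line 1223 precisely to keep the interrupt out of that window, so whether
the middle case can actually happen is the open question; the model only
shows what would follow if it did.

```c
#include <stdbool.h>

/* Hypothetical toy stand-in for the page fields used in the quoted code. */
struct toy_page {
    bool busy;     /* stands in for PG_BUSY / p->busy */
    bool wanted;   /* stands in for PG_WANTED */
    bool asleep;   /* waiter is blocked in the model's tsleep(..., 0) */
};

/* Hypothetical points at which the "interrupt" completes the operation. */
enum release_point {
    RELEASE_BEFORE_CHECK,   /* before line 1224's test */
    RELEASE_AFTER_CHECK,    /* after 1224 decided to sleep, before 1225 */
    RELEASE_AFTER_SLEEP     /* after the waiter is in tsleep() */
};

/* Interrupt side: clear busy, and wake the waiter only if PG_WANTED is set. */
static void release(struct toy_page *p)
{
    p->busy = false;
    if (p->wanted) {
        p->wanted = false;
        p->asleep = false;          /* wakeup() delivered */
    }
}

/* Waiter side of lines 1224-1226, with the release injected at `when`.
 * Returns true if the waiter ends up asleep although the page is free,
 * i.e. nobody is left to wake it and the timeout is 0. */
static bool stuck_after(enum release_point when)
{
    struct toy_page p = { .busy = true, .wanted = false, .asleep = false };

    if (when == RELEASE_BEFORE_CHECK)
        release(&p);
    if (!p.busy)                    /* line 1224: nothing to wait for */
        return false;
    if (when == RELEASE_AFTER_CHECK)
        release(&p);                /* PG_WANTED not yet set: wakeup lost */
    p.wanted = true;                /* line 1225 */
    p.asleep = true;                /* line 1226: tsleep(p, PVM, "vmopar", 0) */
    if (when == RELEASE_AFTER_SLEEP)
        release(&p);
    return p.asleep && !p.busy;
}
```

In this model only RELEASE_AFTER_CHECK leaves the waiter asleep on a free
page, which is the "immortal yet paralyzed" state described above.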

Try changing lines 1226 and 1261 to something like:
	tsleep(p, PVM, "vmopar", 5 * hz);
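Why a finite timeout helps can be sketched in userspace, assuming POSIX
threads as a stand-in for the kernel primitives: a mutex plays the role of
splvm(), pthread_cond_timedwait() the role of tsleep() with a nonzero
timeout, and the `while` loop the role of `goto again`. All names
(`fake_page`, `wait_not_busy`, `release_without_wakeup`) are hypothetical.
The releasing thread clears the busy flag but deliberately never signals,
simulating a lost wakeup; the waiter still gets out because every bounded
sleep ends in a re-check.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical userspace stand-in for a page: a busy flag under a mutex. */
struct fake_page {
    pthread_mutex_t lock;
    pthread_cond_t  cv;
    bool            busy;
};

/* Absolute CLOCK_REALTIME deadline `ms` milliseconds from now, which is
 * what pthread_cond_timedwait() expects with a default condattr. */
static struct timespec deadline_after_ms(long ms)
{
    struct timespec ts;
    clock_gettime(CLOCK_REALTIME, &ts);
    ts.tv_sec  += ms / 1000;
    ts.tv_nsec += (ms % 1000) * 1000000L;
    if (ts.tv_nsec >= 1000000000L) {
        ts.tv_sec  += 1;
        ts.tv_nsec -= 1000000000L;
    }
    return ts;
}

/* Wait until the page is not busy, sleeping at most `timo_ms` per round and
 * then re-checking (the analogue of "goto again").
 * Returns how many rounds ended in a timeout. */
static int wait_not_busy(struct fake_page *p, long timo_ms)
{
    int timeouts = 0;
    pthread_mutex_lock(&p->lock);
    while (p->busy) {
        struct timespec ts = deadline_after_ms(timo_ms);
        if (pthread_cond_timedwait(&p->cv, &p->lock, &ts) == ETIMEDOUT)
            timeouts++;
    }
    pthread_mutex_unlock(&p->lock);
    return timeouts;
}

/* Clear the busy flag after a short delay but skip the signal: a lost wakeup. */
static void *release_without_wakeup(void *arg)
{
    struct fake_page *p = arg;
    struct timespec nap = { 0, 20 * 1000000L };   /* 20 ms */
    nanosleep(&nap, NULL);
    pthread_mutex_lock(&p->lock);
    p->busy = false;    /* no pthread_cond_signal(): the waiter is not woken */
    pthread_mutex_unlock(&p->lock);
    return NULL;
}
```

With an infinite timeout the analogue would be pthread_cond_wait(), and the
same lost signal would leave the waiter blocked forever.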

From the tsleep man page:

     Tsleep is the general sleep call.  Suspends the current process until a
     wakeup is performed on the specified identifier.  The process will then
     be made runnable with the specified priority. Sleeps at most timo / hz
     seconds (0 means no timeout).  If pri includes the PCATCH flag, signals
     are checked before and after sleeping, else signals are not checked.
     Returns 0 if awakened, EWOULDBLOCK if the timeout expires.  If PCATCH is
     set and a signal needs to be delivered, ERESTART is returned if the
     current system call should be restarted if possible, and EINTR is
     returned if the system call should be interrupted by the signal (return
     EINTR).

With that change, tsleep() would return EWOULDBLOCK once the timeout
expires. I have no clue what that will do to your system or apps ;) but I
would expect the blocked process to go away within 5 seconds...
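The timeout argument is counted in clock ticks, so `5 * hz` means 5
seconds whatever the kernel's tick rate, just as the 2 * hz used for
kern/4588 means 2 seconds. A trivial sketch of the man page's "at most
timo / hz seconds" rule; `max_sleep_seconds()` is a hypothetical helper,
and returning -1.0 for "no timeout" is an arbitrary encoding chosen here.

```c
/* Upper bound, in seconds, on a sleep of `timo` ticks on a clock running at
 * `hz` ticks per second, per the tsleep man page: timo / hz seconds, with
 * 0 meaning "no timeout" (reported here as -1.0). */
static double max_sleep_seconds(int timo, int hz)
{
    if (timo == 0)
        return -1.0;            /* sleeps until wakeup(), possibly forever */
    return (double)timo / hz;
}
```

For example, with hz = 100, `5 * hz` is 500 ticks and
max_sleep_seconds(500, 100) is 5.0; with hz = 1000 it is 5000 ticks and
still 5.0.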

Peter Preuss
Sirius Connections, San Francisco
