Date:      Thu, 17 Dec 1998 01:43:03 -0500 (EST)
From:      "Andrew Macpherson" <Andrew.Macpherson.andrew@nortelnetworks.com>
To:        David G Andersen <danderse@cs.utah.edu>
Cc:        Karl Denninger <karl@Denninger.Net>, bright@hotjobs.com, hackers@FreeBSD.ORG
Subject:   Re: yup, found it (NFS)
Message-ID:  <Pine.BSF.4.05.9812170124530.45027-100000@hcarp00g.ca.nortel.com>
In-Reply-To: <199812170523.WAA02697@lal.cs.utah.edu>

On Wed, 16 Dec 1998, David G Andersen wrote:

> Date: Wed, 16 Dec 1998 22:23:53 -0700 (MST)
> From: David G Andersen <danderse@cs.utah.edu>
> To: Karl Denninger <karl@Denninger.Net>
> Cc: bright@hotjobs.com, hackers@FreeBSD.ORG
> Subject: Re: yup, found it (NFS)
> 
> Lo and behold, Karl Denninger once said:
> > 
> > On Wed, Dec 16, 1998 at 11:51:39PM -0500, Alfred Perlstein wrote:
> > > On Wed, 16 Dec 1998, Karl Denninger wrote:
> > > 
> > > > Remove the intr for now.  If that fixes it then at least we have
> > > > hard proof of where it is.
> 
>   It does.  You may wish to look at PR kern/8732, which we opened about a
> month ago on exactly this topic.

Yep, I got bitten by this while using amd. Updating amd seemed to reduce the
frequency of the freezes; however, one type of freeze was very easy to
reproduce. While editing a file on an NFS partition with Xemacs, the
system would consistently lock up when Xemacs attempted to auto-save the
document... it was doing a write to an NFS disk from a SIGALRM handler.
Alfred's pine behaviour sounds like it might be similar.

David suggested I toast my nfsiods, and since then the system's been
rock-solid.

As for mount options, I have `intr' enabled...

I wonder if this PR is one of the deadlocks that Matt Dillon referred to
in his recent mail to the list...

Andrew

> 
> > > > cause.  This of course assumes you mount executable directories (very
> > > > common in clusters) across NFS.
> 
>   Interesting.  We didn't bump into this one, but my test program didn't
> check for it - only for the buffer flushing.
> 
> > > > Certainly the expected execution path is basically the same, and I can
> > > > *trigger it* with a SIGINT to a running process which happens to have some
> > > > of its working set paged out at the time it receives the signal (ouch!)
> > > 
> > > That doesn't seem very good at all.  Is this second case for all
> > > NFS mounts? or only intr mounts?
> 
>   If it's like the bug we found (which I'd wager), it's probably for intr
> mounts.  Like we mention in the PR, the problem seems to be related to the
> change from sleep to an interruptible tsleep.
> 
> > What I want to know is whether a "ro,soft" mount has the same
> > vulnerability.  We use them around here for things like mounting
> > the Usenet spool.
> 
>   Nope.  Soft doesn't seem to affect it (at least, the last time I tested
> it).  Another cheap fix is to not run any nfsiods, preventing the
> asynchronous flush from occurring in the first place.
> 
>    We've been hounding on this PR for a while (that's kern/8732. :), and
> would love to see a resolution for it.  If someone wants to suggest the
> proper behavior, I'm more than happy to start dredging up a fix.
> 
>    -Dave
> 
> -- 
> work: danderse@cs.utah.edu                     me:  angio@pobox.com
>       University of Utah                            http://www.angio.net/
>       Department of Computer Science
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message
> 

