Date:      Wed, 16 Dec 1998 20:52:01 -0600
From:      Karl Denninger <karl@Denninger.Net>
To:        Alfred Perlstein <bright@hotjobs.com>, hackers@FreeBSD.ORG
Subject:   Re: NFS hangs, old problem revisited.
Message-ID:  <19981216205201.A27104@Denninger.Net>
In-Reply-To: <Pine.BSF.4.05.9812161930170.377-100000@bright.fx.genx.net>; from Alfred Perlstein on Wed, Dec 16, 1998 at 07:43:17PM -0500
References:  <Pine.BSF.4.05.9812161930170.377-100000@bright.fx.genx.net>

Alfred,

If I'm reading the code correctly, you can get screwed by this in
vm/vm_fault.c as well, around line 239.

Remove "intr" from all NFS mounts and see if the problem goes away. I've
caught this happening for NFS-mounted *IMAGES* with a wait channel of
"vmpfw", which is a situation where tsleep() is called while protected
by splvm() (waiting for a page-in to complete).

Thanks for the prodding on this - I've been out of the NFS issues for a
while, but this may give me enough to find and squash this one.

-- 
Karl Denninger (karl@denninger.net) http://www.mcs.net/~karl
I ain't even *authorized* to speak for anyone other than myself, so give
up now on trying to associate my words with any particular organization.


On Wed, Dec 16, 1998 at 07:43:17PM -0500, Alfred Perlstein wrote:
> 
> Anyone want to take a look at this?
> 
> I kinda think I just got bitten by it, but I have no idea.
> 
> It's my old "deleting mail in pine over NFS killed my box" bug.
> 
> You can still ping the box after the hang I just got, and you can telnet
> to open ports; however, all that happens is that the connection is opened,
> and nothing ever gets sent across.
> 
> e.g.:
> % telnet box
> Trying x.x.x.x...
> Connected to x.x.x.
> Escape character is '^]'.
>                   
> then nothing.
> 
> I'd submit a PR, but I've already done so. I tried enabling crash dumps
> after being told it was 'ok', and I lost my /usr.
> 
> Can I do anything to give better feedback?
> 
> I have intr mounts, which is why I thought of this. There really is no PR
> with this dialog, and I didn't see any followups about it.
> 
> 3.0 box as of Nov 30th.  I think I will cvsup; perhaps something somewhere
> else has been done to fix this, but the code looks the same as in this
> mail.
> 
> thanks,
> -Alfred
> 
> ----- begin conversation with people that understand vfs ------
> 
> NFS/FS people care to comment?
> 
> (Regarding the looping tsleep() in vfs_subr.c:vinvalbuf(), which
> causes a system hang.)
> 
> To reiterate a bit, the code in question is:
>                 while (vp->v_numoutput) {
>                         vp->v_flag |= VBWAIT;
>                         tsleep((caddr_t)&vp->v_numoutput,
>                                 slpflag | (PRIBIO + 1),
>                                 "vinvlbuf", slptimeo);
>                 }
> 
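> When the filesystem is NFS mounted with the 'intr' flag, this tsleep
> occasionally gets interrupted, and the system begins looping here
> forever: the loop ignores tsleep()'s return value, and while the signal
> is still pending each new tsleep() returns at once instead of sleeping.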
> 
> The discussion about which we need comments:
> 
> Lo and Behold, Mike Hibler said:
> > > From: David G Andersen <danderse@cs>
> > 
> > > I can see a few options for the way to go, but I'm not sure which is
> > > right.
> > > 
> > > 1 - return EINTR on the close ('man close' says that's a possible error
> > >     code)
> > > 
> > > 2 - retry the flush a few times, then return EINTR.
> > >     (more likely to make clients happy)
> > > 
> > > 3 - For those of us who are lazy bastards, ignore SIGINT during
> > >     NFS flushes.  This seems like a bad idea.
> > > 
> > > 4 - Something else?  
> > > 
> > 
> > There are really two issues involved.  One is whether the FreeBSD change
> > to vinvalbuf is even necessary/correct...  Ok, I just did a cvs annotate
> > and found what the change was:
> > ==================
> > 
> > revision 1.156
> > date: 1998/06/10 22:02:14;  author: julian;  state: Exp;  lines: +4 -2
> > Replace 'sleep()' with 'tsleep()'
> > Accidentally imported from Kirk's codebase.
> > 
> > Pointed out by: various.
> > ----------------------------
> > revision 1.155
> > date: 1998/06/10 18:13:19;  author: julian;  state: Exp;  lines: +18 -8
> > Submitted by: Kirk McKusick <mckusick@McKusick.COM>
> > 
> > Fix for potential hang when trying to reboot the system or
> > to forcibly unmount a soft update enabled filesystem.
> > FreeBSD already handled the reboot case differently, this is however a
> > better fix.
> > 
> > ==================
> > So as 1.155 indicates, this change came directly from The Source, so I
> > believe it is necessary.  The change in 1.156 is the key: by changing from
> > the 4.4BSD non-interruptible "sleep" to the possibly interruptible "tsleep"
> > and OR'ing in the "slpflag", the problem was introduced; now the sleep
> > became interruptible when called on an interruptible NFS mount.
> > That brings us to issue #2, which is: what is the correct behavior in this
> > case?  The easy way out is to just not OR in slpflag and go back to
> > full-time non-interruptibility (your #3).  However, that probably isn't
> > necessary.  I'm a bettin' that you could just splx() and return the tsleep
> > value (your #1) and all will be fine.  (Well, as fine as it ever is in the
> > NFS world...)
> 
>   Thanks in advance.
> 
>     -Dave
> 
> --
> work: danderse@cs.utah.edu                     me:  angio@pobox.com
>       University of Utah                            http://www.angio.net/
>       Department of Computer Science
> 
> 
> 
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message



