From owner-freebsd-hackers Wed Dec 16 18:52:17 1998
Return-Path:
Received: (from majordom@localhost)
        by hub.freebsd.org (8.8.8/8.8.8) id SAA22521
        for freebsd-hackers-outgoing; Wed, 16 Dec 1998 18:52:17 -0800 (PST)
        (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from Genesis.Denninger.Net (kdhome-2.pr.mcs.net [205.164.6.10])
        by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id SAA22516
        for ; Wed, 16 Dec 1998 18:52:15 -0800 (PST)
        (envelope-from karl@Genesis.Denninger.Net)
Received: (from karl@localhost)
        by Genesis.Denninger.Net (8.9.1/8.8.2) id UAA27115;
        Wed, 16 Dec 1998 20:52:02 -0600 (CST)
Message-ID: <19981216205201.A27104@Denninger.Net>
Date: Wed, 16 Dec 1998 20:52:01 -0600
From: Karl Denninger
To: Alfred Perlstein, hackers@FreeBSD.ORG
Subject: Re: NFS hangs, old problem revisited.
References:
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2i
In-Reply-To: ; from Alfred Perlstein on Wed, Dec 16, 1998 at 07:43:17PM -0500
Organization: Karl's Sushi and Packet Smashers
X-Die-Spammers: Spammers will be LARTed and the remains fed to my cat
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

Alfred,

If I'm reading the code correctly, you can get screwed by this in
vm/vm_fault.c as well, around line 239.

Remove "intr" from all NFS mounts and see if the problem goes away.

I've caught this happening for NFS-mounted *IMAGES* with a wait channel of
"vmpfw", which is a situation where tsleep() is called while protected by
splvm() (waiting for a page-in to be completed).

Thanks for the prodding on this; I've been out of the NFS issues for a
while, but this may give me enough to find and squash this one.

--
Karl Denninger (karl@denninger.net)
http://www.mcs.net/~karl
I ain't even *authorized* to speak for anyone other than myself, so give
up now on trying to associate my words with any particular organization.

On Wed, Dec 16, 1998 at 07:43:17PM -0500, Alfred Perlstein wrote:
>
> Anyone want to take a look at this?
>
> I kinda think I just got bitten by it, but I have no idea.
>
> It's my old "deleting mail in pine over NFS killed my box" bug.
>
> You can still ping the box after the hang I just got, and you can telnet
> to open ports; however, all that happens is that the connection is opened,
> but nothing ever gets sent across.
>
> ie:
> % telnet box
> Trying x.x.x.x...
> Connected to x.x.x.
> Escape character is '^]'.
>
> then nothing.
>
> I'd submit a PR, but I've already done so. I tried enabling crashdumps
> after being told it was 'ok' and I lost my /usr.
>
> Can I do anything to give better feedback?
>
> I have intr mounts, which is why I thought of this; there really is no PR
> with this dialog and I didn't see any followups about it.
>
> 3.0 box as of Nov 30th. I think I will cvsup; perhaps something somewhere
> else has been done to fix this, but the code looks the same as in this
> mail.
>
> thanks,
> -Alfred
>
> ----- begin conversation with people that understand vfs ------
>
> NFS/FS people care to comment?
>
> (Regarding the looping 'tsleep' in vfs_subr.c: vinvalbuf() which
> causes a system hang).
>
> To reiterate a bit, the code in question is:
>         while (vp->v_numoutput) {
>                 vp->v_flag |= VBWAIT;
>                 tsleep((caddr_t)&vp->v_numoutput,
>                     slpflag | (PRIBIO + 1),
>                     "vinvlbuf", slptimeo);
>         }
>
> When the filesystem is NFS mounted with the 'intr' flag, this tsleep
> gets interrupted occasionally, and the system begins infinitely
> looping here.
>
> The discussion about which we need comments:
>
> Lo and Behold, Mike Hibler said:
>
> > From: David G Andersen
> >
> > > I can see a few options for the way to go, but I'm not sure which is
> > > right.
> > >
> > > 1 - return EINTR on the close ('man close' says that's a possible
> > >     error code)
> > >
> > > 2 - retry the flush a few times, then return EINTR.
> > >     (more likely to make clients happy)
> > >
> > > 3 - For those of us who are lazy bastards, ignore SIGINT during
> > >     NFS flushes.
> > >     This seems like a bad idea.
> > >
> > > 4 - Something else?
> > >
> >
> > There are really two issues involved. One is whether the FreeBSD change
> > to vinvalbuf is even necessary/correct... Ok, I just did a cvs annotate
> > and found what the change was:
> > ==================
> >
> > revision 1.156
> > date: 1998/06/10 22:02:14;  author: julian;  state: Exp;  lines: +4 -2
> > Replace 'sleep()' with 'tsleep()'
> > Accidentally imported from Kirk's codebase.
> >
> > Pointed out by: various.
> > ----------------------------
> > revision 1.155
> > date: 1998/06/10 18:13:19;  author: julian;  state: Exp;  lines: +18 -8
> > Submitted by: Kirk McKusick
> >
> > Fix for potential hang when trying to reboot the system or
> > to forcibly unmount a soft update enabled filesystem.
> > FreeBSD already handled the reboot case differently, this is however a
> > better fix.
> >
> > ==================
> >
> > So as 1.155 indicates, this change came directly from The Source, so I
> > believe it is necessary. The change in 1.156 is the key: by changing from
> > the 4.4BSD non-interruptible "sleep" to the possibly interruptible "tsleep"
> > and OR'ing in the "slpflag", the problem was introduced: now the sleep
> > became interruptible when called on an interruptible NFS mount.
> >
> > That brings us to issue #2, which is: what is the correct behavior in
> > this case? The easy way out is to just not OR in slpflag and go back to
> > full-time non-interruptibility (your #3). However, that probably isn't
> > necessary. I'm a bettin' that you could just splx() and return the
> > tsleep value (your #1) and all will be fine. (well, as fine as it ever
> > is in the NFS world...)
>
> Thanks in advance.
>
> -Dave
>
> --
> work: danderse@cs.utah.edu                 me: angio@pobox.com
> University of Utah                         http://www.angio.net/
> Department of Computer Science
>
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-hackers" in the body of the message