From owner-freebsd-hackers Wed Dec 16 21:09:05 1998
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id VAA05019
          for freebsd-hackers-outgoing; Wed, 16 Dec 1998 21:09:05 -0800 (PST)
          (envelope-from owner-freebsd-hackers@FreeBSD.ORG)
Received: from Genesis.Denninger.Net (kdhome-2.pr.mcs.net [205.164.6.10])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id VAA05011
          for ; Wed, 16 Dec 1998 21:09:03 -0800 (PST)
          (envelope-from karl@Genesis.Denninger.Net)
Received: (from karl@localhost)
          by Genesis.Denninger.Net (8.9.1/8.8.2) id XAA27449;
          Wed, 16 Dec 1998 23:08:56 -0600 (CST)
Message-ID: <19981216230855.A27443@Denninger.Net>
Date: Wed, 16 Dec 1998 23:08:56 -0600
From: Karl Denninger
To: Alfred Perlstein
Cc: hackers@FreeBSD.ORG
Subject: Re: yup, found it (NFS)
References: <19981216211723.A27176@Denninger.Net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.93.2i
In-Reply-To: ; from Alfred Perlstein on Wed, Dec 16, 1998 at 11:51:39PM -0500
Organization: Karl's Sushi and Packet Smashers
X-Die-Spammers: Spammers will be LARTed and the remains fed to my cat
Sender: owner-freebsd-hackers@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Dec 16, 1998 at 11:51:39PM -0500, Alfred Perlstein wrote:
> On Wed, 16 Dec 1998, Karl Denninger wrote:
>
> > Remove the intr for now. If that fixes it then at least we have
> > hard proof of where it is.
>
> Already done. I'm silly, not suicidal about things :)
>
> > The problem is that vinvlbuf is not the only place you can get screwed.
> > There is also a problem in the vm pager (it can hang in there too, as I've
> > now been able to prove and isolate) due to what I *believe* is the same
> > cause. This of course assumes you mount executable directories (very
> > common in clusters) across NFS.
>
> You mean, if i'm running an executable over NFS? I've seen this but not
> nearly as often. In my case pine is local to the machine, but my mailbox
> isn't.
>
> Just because of curiosity, it's hanging because the program text
> retrieval from the binary (not swap) has a similar loop?

Yep. It locks up the process in question. I suspect, but haven't yet
proven, that if that lockup bites "pagedaemon" you're fucked on a system
level.

I *have* proven that the process in question gets hosed and deadlocks.

Example:

www      11988  0.0  0.5  6260  612  ??  D     8:12AM   0:00.99 /lbin/httpd.apa
www      11994  0.0  0.5  6288  620  ??  D     8:12AM   0:06.68 /lbin/httpd.apa

Guess what. Right at 8:12 in the morning the server gets "kicked" to
produce logs (it gets sent a SIGINT).

Hmmm.....

> > Certainly the expected execution path is basically the same, and I can
> > *trigger it* with a SIGINT to a running process which happens to have some
> > of its working set paged out at the time it receives the signal (ouch!)
>
> That doesn't seem very good at all. Is this second case for all
> NFS mounts? or only intr mounts?

Don't know yet - still testing.

> Thanks for the attention. Sorry i took so long to get some proof
> of this bug, it's just that it's a work machine and taking time
> out to do this isn't always possible.
>
> I'm sure tracking down/fixing the problem is on a totally different
> level, so thanks,
>
> -Alfred

Yep. I understand fully.

What I want to know is whether a "ro,soft" mount has the same vulnerability.
We use them around here for things like mounting the Usenet spool.

--
-- 
Karl Denninger (karl@denninger.net) http://www.mcs.net/~karl
I ain't even *authorized* to speak for anyone other than myself, so give
up now on trying to associate my words with any particular organization.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message