Date:      Wed, 16 Dec 1998 23:08:56 -0600
From:      Karl Denninger <karl@Denninger.Net>
To:        Alfred Perlstein <bright@hotjobs.com>
Cc:        hackers@FreeBSD.ORG
Subject:   Re: yup, found it (NFS)
Message-ID:  <19981216230855.A27443@Denninger.Net>
In-Reply-To: <Pine.BSF.4.05.9812162341200.378-100000@bright.fx.genx.net>; from Alfred Perlstein on Wed, Dec 16, 1998 at 11:51:39PM -0500
References:  <19981216211723.A27176@Denninger.Net> <Pine.BSF.4.05.9812162341200.378-100000@bright.fx.genx.net>

On Wed, Dec 16, 1998 at 11:51:39PM -0500, Alfred Perlstein wrote:
> On Wed, 16 Dec 1998, Karl Denninger wrote:
> 
> > Remove the intr for now.  If that fixes it then at least we have
> > hard proof of where it is.
> 
> Already done.  I'm silly, not suicidal about things :)
> 
> > The problem is that vinvalbuf is not the only place you can get screwed.
> > There is also a problem in the vm pager (it can hang in there too, as I've
> > now been able to prove and isolate) due to what I *believe* is the same
> > cause.  This of course assumes you mount executable directories (very
> > common in clusters) across NFS.
> 
> You mean, if I'm running an executable over NFS?  I've seen this, but
> not nearly as often.  In my case pine is local to the machine, but my
> mailbox isn't.
> 
> Just out of curiosity, is it hanging because the program text
> retrieval from the binary (not swap) has a similar loop?

Yep.  It locks up the process in question.  I suspect, but haven't yet
proven, that if that lockup bites "pagedaemon" you're fucked on a system
level.  I *have* proven that the process in question gets hosed and
deadlocks.

Example:
www      11988  0.0  0.5  6260  612  ??  D     8:12AM   0:00.99 /lbin/httpd.apa
www      11994  0.0  0.5  6288  620  ??  D     8:12AM   0:06.68 /lbin/httpd.apa

Guess what.  Right at 8:12 in the morning the server gets "kicked" to
produce logs (it gets sent a SIGINT).  Hmmm.....
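
For anyone who wants to watch for the same symptom, here is a minimal
Python sketch that picks out processes whose STAT field starts with "D"
(uninterruptible wait) from ps-style output like the listing above.  The
function name and the assumed column layout (USER PID %CPU %MEM VSZ RSS
TT STAT STARTED TIME COMMAND) are illustrative assumptions, not anything
taken from the affected systems; adjust the field positions if your ps
prints differently.

```python
def stuck_processes(ps_lines):
    """Return (pid, command) for each ps line whose STAT field starts with 'D'."""
    stuck = []
    for line in ps_lines:
        # Assumed layout: USER PID %CPU %MEM VSZ RSS TT STAT STARTED TIME COMMAND
        fields = line.split(None, 10)
        if len(fields) < 11 or not fields[1].isdigit():
            continue  # skip the header line and anything malformed
        if fields[7].startswith("D"):
            stuck.append((int(fields[1]), fields[10]))
    return stuck


# The two lines from the listing above, used as sample input.
sample = [
    "www      11988  0.0  0.5  6260  612  ??  D     8:12AM   0:00.99 /lbin/httpd.apa",
    "www      11994  0.0  0.5  6288  620  ??  D     8:12AM   0:06.68 /lbin/httpd.apa",
]

for pid, command in stuck_processes(sample):
    print(pid, command)
```

A "D" entry on its own is normal for a process briefly waiting on I/O; it
only points at this kind of hang when the same PIDs stay in that state
across repeated samples.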

> > Certainly the expected execution path is basically the same, and I can
> > *trigger it* with a SIGINT to a running process which happens to have some
> > of its working set paged out at the time it receives the signal (ouch!)
> 
> That doesn't seem very good at all.  Does this second case apply to
> all NFS mounts, or only intr mounts?

Don't know yet - still testing.

> Thanks for the attention.  Sorry I took so long to get some proof
> of this bug, it's just that it's a work machine and taking time
> out to do this isn't always possible.
> 
> I'm sure tracking down/fixing the problem is on a totally different
> level, so thanks,
> 
> -Alfred

Yep.  I understand fully.

What I want to know is whether a "ro,soft" mount has the same
vulnerability.  We use them around here for things like mounting
the Usenet spool.

-- 
Karl Denninger (karl@denninger.net) http://www.mcs.net/~karl
I ain't even *authorized* to speak for anyone other than myself, so give
up now on trying to associate my words with any particular organization.

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-hackers" in the body of the message