Date:      Thu, 12 Mar 2015 12:40:23 +0200
From:      Konstantin Belousov <kostikbel@gmail.com>
To:        Mark Johnston <markj@freebsd.org>
Cc:        freebsd-stable@freebsd.org, Nick Frampton <nick.frampton@akips.com>, John Baldwin <jhb@freebsd.org>, kib@FreeBSD.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <20150312104023.GL2379@kib.kiev.ua>
In-Reply-To: <20150312043407.GA11120@raichu>
References:  <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com> <1648097.s1OBMXVVbH@ralph.baldwin.cx> <5501108C.4080303@akips.com> <20150312043407.GA11120@raichu>

On Wed, Mar 11, 2015 at 09:34:07PM -0700, Mark Johnston wrote:
> On Thu, Mar 12, 2015 at 02:05:32PM +1000, Nick Frampton wrote:
> > On 12/03/15 00:38, John Baldwin wrote:
> > >>> It sounds like this issue might be the one fixed in r272566: if the
> > >>> > >KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > >>> > >sbuf error return value could bubble up and be treated as ERESTART,
> > >>> > >resulting in a loop.
> > >>> > >
> > >>> > >This can be confirmed with something like
> > >>> > >
> > >>> > >    dtrace -n 'syscall:::entry/pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> > >>> > >
> > >>> > >If the output consists solely of __sysctl, this bug is likely the
> > >>> > >culprit.
> > >> >
> > >> >Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
> > >> >
> > >> >I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> > >> >
> > >> >I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
> > >> >problem in a reasonable time frame so it could be days or weeks before we see it happen again.
> > > The truss output is consistent with Mark's suggestion, so I would try
> > > his suggested fix of 272566.
> > 
> > I patched the 10.1 kernel with r272566 and it appears to have fixed the issue. Is this patch likely 
> > to be MFCed back to 10-stable?
> 
> I can't see any reason it shouldn't be, and there was an MFC reminder in
> the commit log entry for that revision. I've cc'ed kib@, who might have a
> reason.

The mentioned commit depends on r271976; in fact, it depends on a series of
commits, including r271486 and r271489.

I did not merge r271976 with manual resolution of the conflicts, since that
would mean the work done for HEAD has to be redone for stable/10 to
ensure that all cases are covered.  Later, when the mentioned series is
merged, the work would have to be redone once more.

Also note that r271489 is not trivially mergeable either; I just checked.


