Date:      Wed, 11 Mar 2015 10:38:12 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Nick Frampton <nick.frampton@akips.com>
Cc:        Mark Johnston <markj@freebsd.org>, freebsd-stable@freebsd.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <1648097.s1OBMXVVbH@ralph.baldwin.cx>
In-Reply-To: <54FFBDE9.5060702@akips.com>
References:  <54FE3803.2000307@akips.com> <20150310215913.GB52108@charmander.picturesperfect.net> <54FFBDE9.5060702@akips.com>

On Wednesday, March 11, 2015 02:00:41 PM Nick Frampton wrote:
> On 11/03/15 07:59, Mark Johnston wrote:
> > On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
> >> Often loops using libkvm are due to programs trying to read
> >> kernel data structures while they are changing.  However, if you use sysctls
> >> to fetch this data instead, you should be able to get a stable snapshot of the
> >> system state without getting stuck in a possible loop.  I believe for libkvm
> >> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and
> >> "/dev/null" for the core image.
> 
> In our code, we're invoking kvm_openfiles as you suggest:
> kd = kvm_openfiles (NULL, _PATH_DEVNULL, NULL, O_RDONLY, errbuf)
> 
> 
> > It sounds like this issue might be the one fixed in r272566: if the
> > KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> > sbuf error return value could bubble up and be treated as ERESTART,
> > resulting in a loop.
> >
> > This can be confirmed with something like
> >
> >    dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
> >
> > If the output consists solely of __sysctl, this bug is likely the
> > culprit.
> 
> Unfortunately, I accidentally killed fstat this morning before I could do any further debug.
> 
> I ran truss -p on it yesterday and it was spinning solely on __sysctl.
> 
> I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the 
> problem in a reasonable time frame so it could be days or weeks before we see it happen again.

The truss output is consistent with Mark's suggestion, so I would try
the fix he pointed to, r272566.
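
For context, the userland side of a sysctl read like KERN_PROC_ALL usually
follows a probe-then-fetch pattern: ask for the size, allocate with some
slack, fetch, and retry if the data grew in between.  Below is a minimal
sketch of that pattern in C.  It is not the libkvm or kernel code: fetch_fn,
fetch_snapshot, fake_src and fake_fetch are hypothetical names, and the
callback only mimics the oldp/oldlenp behaviour of sysctl(3) (size probe
with a NULL buffer, ENOMEM when the buffer is too small).

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for the read side of sysctl(3): with buf == NULL
 * it stores the size currently needed in *lenp; otherwise it fills buf and
 * fails with errno == ENOMEM if *lenp is too small.
 */
typedef int (*fetch_fn)(void *buf, size_t *lenp, void *ctx);

/*
 * Probe for the size, allocate with slack, fetch, and retry a bounded
 * number of times if the data outgrew the buffer.  Returns a malloc'd
 * buffer (caller frees) or NULL on failure.
 */
void *
fetch_snapshot(fetch_fn fetch, void *ctx, size_t *lenp, int max_tries)
{
	void *buf = NULL, *nbuf;
	size_t len;
	int i;

	assert(fetch != NULL && lenp != NULL);
	for (i = 0; i < max_tries; i++) {
		if (fetch(NULL, &len, ctx) != 0)	/* size probe */
			break;
		len += len / 8 + 1;			/* slack for growth */
		if ((nbuf = realloc(buf, len)) == NULL)
			break;
		buf = nbuf;
		if (fetch(buf, &len, ctx) == 0) {
			*lenp = len;			/* bytes actually stored */
			return (buf);
		}
		if (errno != ENOMEM)			/* only "too small" is retried */
			break;
	}
	free(buf);
	return (NULL);
}

/* Hypothetical fake data source, used only to exercise the loop. */
struct fake_src {
	size_t size;	/* bytes currently available */
	size_t grow;	/* bytes added after each size probe */
};

int
fake_fetch(void *buf, size_t *lenp, void *ctx)
{
	struct fake_src *src = ctx;

	if (buf == NULL) {
		*lenp = src->size;
		src->size += src->grow;		/* table grows after the probe */
		return (0);
	}
	if (*lenp < src->size) {
		errno = ENOMEM;
		return (-1);
	}
	memset(buf, 'p', src->size);
	*lenp = src->size;
	return (0);
}
```

Calling fetch_snapshot(fake_fetch, &src, &len, 4) returns a buffer when the
slack covers the growth and NULL once the tries run out.  Note that a retry
cap like this bounds only the caller's own loop.  As I understand Mark's
description, the restart in this bug happens in the syscall path beneath
the caller, so a cap in userland would not by itself avoid the spin.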

-- 
John Baldwin


