Date:      Wed, 11 Mar 2015 14:00:41 +1000
From:      Nick Frampton <nick.frampton@akips.com>
To:        Mark Johnston <markj@FreeBSD.org>, John Baldwin <jhb@freebsd.org>
Cc:        freebsd-stable@freebsd.org
Subject:   Re: Suspected libkvm infinite loop
Message-ID:  <54FFBDE9.5060702@akips.com>
In-Reply-To: <20150310215913.GB52108@charmander.picturesperfect.net>
References:  <54FE3803.2000307@akips.com> <4637620.LE11f9AQj7@ralph.baldwin.cx> <20150310215913.GB52108@charmander.picturesperfect.net>

On 11/03/15 07:59, Mark Johnston wrote:
> On Tue, Mar 10, 2015 at 02:10:09PM -0400, John Baldwin wrote:
>> Often loops using libkvm are due to programs trying to read
>> kernel data structures while they are changing.  However, if you use sysctls
>> to fetch this data instead, you should be able to get a stable snapshot of the
>> system state without getting stuck in a possible loop.  I believe for libkvm
>> to use sysctl instead of /dev/kmem you have to pass a NULL for the kernel and
>> "/dev/null" for the core image.

In our code, we're invoking kvm_openfiles as you suggest:
    kd = kvm_openfiles(NULL, _PATH_DEVNULL, NULL, O_RDONLY, errbuf);


> It sounds like this issue might be the one fixed in r272566: if the
> KERN_PROC_ALL sysctl is read with an insufficiently large buffer, an
> sbuf error return value could bubble up and be treated as ERESTART,
> resulting in a loop.
>
> This can be confirmed with something like
>
>    dtrace -n 'syscall:::entry /pid == $target/{@[probefunc] = count();} tick-3s {exit(0);}' -p <pid of looping proc>
>
> If the output consists solely of __sysctl, this bug is likely the
> culprit.
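
For reference, the "insufficiently large buffer" case comes from the usual
size-then-fetch pattern: probe for the required length, allocate, then fetch,
and the data can grow in between. Below is a minimal sketch of that pattern
with a bounded retry. Everything named here is hypothetical: fetch_fn is an
assumed callback that mimics sysctl(3) semantics (a NULL buffer reports the
size; a too-small buffer fails with ENOMEM), and growing_table_fetch is only
a stand-in data source for trying it out. It does not reproduce or work
around the kernel bug fixed in r272566; it only shows where the short-buffer
case arises in userland.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical callback modelled on sysctl(3): with buf == NULL it stores
 * the required size in *len; with a too-small buffer it returns -1 and
 * sets errno to ENOMEM; on success it returns 0 and updates *len. */
typedef int (*fetch_fn)(void *buf, size_t *len, void *ctx);

/* Probe for the size, allocate with some slack, fetch, and retry a bounded
 * number of times if the data grew in between. Returns a malloc'd buffer
 * (caller frees) or NULL; errno is EAGAIN when the retries are exhausted. */
void *
fetch_with_retry(fetch_fn fetch, void *ctx, size_t *out_len, int max_tries)
{
    for (int attempt = 0; attempt < max_tries; attempt++) {
        size_t len = 0;

        if (fetch(NULL, &len, ctx) != 0)
            return NULL;
        len += len / 8 + 1;             /* slack for growth after the probe */

        void *buf = malloc(len);
        if (buf == NULL)
            return NULL;
        if (fetch(buf, &len, ctx) == 0) {
            *out_len = len;
            return buf;
        }

        int saved = errno;
        free(buf);
        if (saved != ENOMEM) {          /* only a short buffer is retried */
            errno = saved;
            return NULL;
        }
    }
    errno = EAGAIN;                     /* give up instead of spinning */
    return NULL;
}

/* Stand-in data source for experimenting: a table that grows by `grow`
 * bytes after every size probe. Not a real kernel interface. */
struct growing_table {
    size_t size;
    size_t grow;
    int probes;
};

int
growing_table_fetch(void *buf, size_t *len, void *ctx)
{
    struct growing_table *t = ctx;

    if (buf == NULL) {
        *len = t->size;
        t->size += t->grow;
        t->probes++;
        return 0;
    }
    if (*len < t->size) {
        errno = ENOMEM;
        return -1;
    }
    memset(buf, 0, t->size);
    *len = t->size;
    return 0;
}
```

On FreeBSD the callback would presumably wrap a sysctl(3) call for the
relevant MIB; capping the attempts means a source that keeps outgrowing the
buffer produces an error rather than an endless loop in the caller.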

Unfortunately, I accidentally killed fstat this morning before I could do any further debugging.

I ran truss -p on it yesterday and it was spinning solely on __sysctl.

I'll try compiling with debug symbols in case it happens again. I haven't been able to reproduce the
problem in a reasonable time frame, so it could be days or weeks before we see it happen again.

-Nick