Date:      Thu, 19 Aug 2010 20:42:27 +0300
From:      Andriy Gapon <avg@icyb.net.ua>
To:        Doug Barton <dougb@FreeBSD.org>
Cc:        freebsd-current@FreeBSD.org
Subject:   Re: Runaway intr, not flash related
Message-ID:  <4C6D6D03.4000101@icyb.net.ua>
In-Reply-To: <4C6D6A3C.9020507@FreeBSD.org>
References:  <alpine.BSF.2.00.1008121349230.1721@qbhto.arg> <4C6D4CB4.20601@icyb.net.ua> <4C6D6A3C.9020507@FreeBSD.org>

on 19/08/2010 20:30 Doug Barton said the following:
> On 08/19/2010 08:24, Andriy Gapon wrote:
>> I am sorry, but I don't see anything dramatically wrong here. So
>> "swi4: clock" uses 5.76% of WCPU, is that such a big deal to be
>> called "runaway intr"?
> 
> That's the symptom.

OK, I see.

Perhaps you will find this message (and its ancestor thread) interesting:
http://lists.freebsd.org/pipermail/freebsd-hackers/2008-February/023447.html
I believe that your issue is different, but perhaps that stuff will inspire you to
use ktr(4) and schedgraph to properly debug this issue.  I strongly believe that
you have some sort of a scheduling issue and ktr seems to be the way to
investigate it.

Perhaps you can first try the following dtrace script.
It should give a better view of what statclock sees, but I am not sure if that
information will be sufficient.
/********************************************************/
fbt::statclock:entry
/curthread->td_oncpu == 0/
{

	@stacks0[stack()] = count();
	counts0++;
}

fbt::statclock:entry
/curthread->td_oncpu == 1/
{

	@stacks1[stack()] = count();
	counts1++;
}

fbt::statclock:entry
{

	@stacks[pid, tid, stack()] = count();
	counts++;
}

END
{
	/* Per-CPU views: scale raw counts to percent of that CPU's samples. */
	printf("\n");
	printf("***** CPU 0:\n");
	normalize(@stacks0, counts0 / 100);
	trunc(@stacks0, 5);	/* keep only the 5 most frequent stacks */
	printa("%k%@u\n\n", @stacks0);

	printf("\n\n");
	printf("***** CPU 1:\n");
	normalize(@stacks1, counts1 / 100);
	trunc(@stacks1, 5);
	printa("%k%@u\n\n", @stacks1);

	/* Combined view keyed by pid/tid; counts covers both CPUs, hence 200. */
	printf("\n\n");
	printf("***** Top Processes:\n");
	normalize(@stacks, counts / 200);
	trunc(@stacks, 20);
	printa(@stacks);
}
/********************************************************/
You would run this script when the problem hits; a few seconds should be sufficient.
You may want to play with the values in the trunc() calls. You may also want to filter
the gathered statistics (using conditions in /.../) by pid/tid if you spot anything
interesting or unusual.
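
As a rough sketch of the pid/tid filtering mentioned above, something like the
following might do.  It assumes the pid of interest is passed as the first
argument on the dtrace command line ($1); the aggregation name @bytid and the
trunc() value of 10 are arbitrary choices for illustration, not part of the
script above.

```d
/*
 * Sketch only: count statclock samples for a single process,
 * broken down by thread id and kernel stack.
 * $1 is the pid given on the command line (an assumption of this example).
 */
fbt::statclock:entry
/pid == $1/
{
	@bytid[tid, stack()] = count();
}

END
{
	/* Keep the 10 most frequent (tid, stack) pairs; adjust to taste. */
	trunc(@bytid, 10);
	printa(@bytid);
}
```

If saved as, say, filter.d (a hypothetical file name), it could be run as
"dtrace -s filter.d <pid>" and stopped with Ctrl-C after a few seconds.  To
narrow it down to one thread, the predicate could be changed to compare tid
instead of pid.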

-- 
Andriy Gapon


