Skip site navigation (1)Skip section navigation (2)
Date:      Thu, 9 Oct 2014 14:31:32 -0700
From:      Jason Wolfe <nitroboost@gmail.com>
To:        John Baldwin <jhb@freebsd.org>
Cc:        Sean Bruno <sbruno@llnw.com>, freebsd-net@freebsd.org
Subject:   Re: ixgbe(4) spin lock held too long
Message-ID:  <CAAAm0r1N=F6wgroVJZx3CR-bzti45=VA==dv1VaVhx1hSufUEQ@mail.gmail.com>
In-Reply-To: <2077446.sYcZo9xEXb@ralph.baldwin.cx>
References:  <1410203348.1343.1.camel@bruno> <201410071428.15753.jhb@freebsd.org> <CAAAm0r0=7G=MgKEpXGDLqYE5%2BK9gfV3PLWJ=hLfvf2KWvPtd7w@mail.gmail.com> <2077446.sYcZo9xEXb@ralph.baldwin.cx>

next in thread | previous in thread | raw e-mail | index | archive | help
On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin <jhb@freebsd.org> wrote:

> My only other thought is if a direct timeout routine ran for a long time.
>
> I just committed a change to current that can let you capture KTR traces of
> callout routines for use with schedgraph (r272757).  Unfortunately,
> enabling KTR_SCHED can be slow.  If you are up for trying it, I do think
> that
> building a kernel with KTR and KTR_SCHED enabled (KTR_COMPILE=KTR_SCHED and
> KTR_MASK=KTR_SCHED) with the kernel part of the commit I referenced above
> applied is the best bet to figure out why it is spinning so long.  If you
> can
> try that (and if it doesn't slow things down too much) and get a panic with
> those options enabled, then capture the output of
> 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct', we can use
> Src/tools/sched/schedgraph.py to look at that output to get a picture of
> what
> was going on just prior to the crash.
>
> --
> John Baldwin
>

As luck would have it.. caught one of the boxes with the new KTR
code/options after only an hour.  Crashed in the same way w tid 100003 and
looking the same in callout_process

spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
100003) too long
spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000 (tid
100003) too long

#4 0xffffffff8070d6fa in callout_process (now=7915682202423) at
/usr/src/sys/kern/kern_
timeout.c:490

The ktrdump oddly only seems to have the last 702, though debug.ktr.entries
is set to 1024.  It appears the buffer may also start after 100003 had
already hung?  I've bumped debug.ktr.entries up in case we don't have
enough history here.

http://nitrology.com/ktrdump-spinlock.txt

Jason



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAAAm0r1N=F6wgroVJZx3CR-bzti45=VA==dv1VaVhx1hSufUEQ>