Date:      Thu, 16 Oct 2014 15:23:32 -0400
From:      John Baldwin <jhb@freebsd.org>
To:        Jason Wolfe <nitroboost@gmail.com>
Cc:        Sean Bruno <sbruno@llnw.com>, freebsd-net@freebsd.org
Subject:   Re: ixgbe(4) spin lock held too long
Message-ID:  <201410161523.32415.jhb@freebsd.org>
In-Reply-To: <CAAAm0r01DqqNLUr0Mu15o2RLLdgnKntnFNKrw8TJut8XE7KNOw@mail.gmail.com>
References:  <1410203348.1343.1.camel@bruno> <3567780.Mf6fMnzmYG@ralph.baldwin.cx> <CAAAm0r01DqqNLUr0Mu15o2RLLdgnKntnFNKrw8TJut8XE7KNOw@mail.gmail.com>

On Saturday, October 11, 2014 2:19:19 am Jason Wolfe wrote:
> On Fri, Oct 10, 2014 at 8:53 AM, John Baldwin <jhb@freebsd.org> wrote:
> 
> > On Thursday, October 09, 2014 02:31:32 PM Jason Wolfe wrote:
> > > On Wed, Oct 8, 2014 at 12:29 PM, John Baldwin <jhb@freebsd.org> wrote:
> > > > My only other thought is if a direct timeout routine ran for a long
> > > > time.
> > > >
> > > > I just committed a change to current that can let you capture KTR
> > > > traces of callout routines for use with schedgraph (r272757).
> > > > Unfortunately, enabling KTR_SCHED can be slow.  If you are up for
> > > > trying it, I do think that building a kernel with KTR and KTR_SCHED
> > > > enabled (KTR_COMPILE=KTR_SCHED and KTR_MASK=KTR_SCHED) with the
> > > > kernel part of the commit I referenced above applied is the best bet
> > > > to figure out why it is spinning so long.  If you can try that (and
> > > > if it doesn't slow things down too much) and get a panic with those
> > > > options enabled, then capture the output of
> > > > 'ktrdump -e /path/to/kernel -m /var/crash/vmcore.X -ct', we can use
> > > > Src/tools/sched/schedgraph.py to look at that output to get a
> > > > picture of what was going on just prior to the crash.
> > > >
> > > > --
> > > > John Baldwin
> > >
> > > As luck would have it.. caught one of the boxes with the new KTR
> > > code/options after only an hour.  Crashed in the same way w tid
> > > 100003 and looking the same in callout_process
> > >
> > > spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000
> > > (tid 100003) too long
> > > spin lock 0xffffffff81262d00 (callout) held by 0xfffff800151fe000
> > > (tid 100003) too long
> > >
> > > #4 0xffffffff8070d6fa in callout_process (now=7915682202423) at
> > > /usr/src/sys/kern/kern_timeout.c:490
> > >
> > > The ktrdump oddly only seems to have the last 702, though
> > > debug.ktr.entries is set to 1024.  It appears the buffer may also
> > > start after 100003 had already hung?  I've bumped debug.ktr.entries
> > > up in case we don't have enough history here.
> > >
> > > http://nitrology.com/ktrdump-spinlock.txt
> >
> > Hmm, schedgraph.py chokes on this, but it's a bit interesting.  It
> > seems that in this time sample, CPUs 1, 2, 4, and 5 were constantly
> > running ixgbe interrupt handlers.  No actual thread state changes are
> > logged (which is why schedgraph choked).
> >
> > Try setting the sysctl 'net.inet.tcp.per_cpu_timers' to 1 and see if
> > that makes a difference.  I'm guessing they are all contending on the
> > default callout lock which is causing your headache.
> >
> > --
> > John Baldwin
> >
> 
> net.inet.tcp.per_cpu_timers=1 triggered some other fun :)  Saw this same
> panic across a handful of machines:
> 
> panic: tcp_setpersist: retransmit pending
> cpuid = 3
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame
> 0xfffffe1f9e9ab800
> panic() at panic+0x155/frame 0xfffffe1f9e9ab880
> tcp_setpersist() at tcp_setpersist+0xa3/frame 0xfffffe1f9e9ab8b0
> tcp_timer_persist() at tcp_timer_persist+0x176/frame 0xfffffe1f9e9ab8e0
> softclock_call_cc() at softclock_call_cc+0x1ce/frame 0xfffffe1f9e9ab9e0
> softclock() at softclock+0x44/frame 0xfffffe1f9e9aba00
> intr_event_execute_handlers() at intr_event_execute_handlers+0x83/frame
> 0xfffffe1f9e9aba30
> ithread_loop() at ithread_loop+0x96/frame 0xfffffe1f9e9aba70
> fork_exit() at fork_exit+0x71/frame 0xfffffe1f9e9abab0
> fork_trampoline() at fork_trampoline+0xe/frame 0xfffffe1f9e9abab0
> --- trap 0, rip = 0, rsp = 0xfffffe1f9e9abb70, rbp = 0 ---
> 
> (kgdb) up 3
> #3  0xffffffff808467d3 in tcp_setpersist (tp=<value optimized out>) at
> /usr/src/sys/netinet/tcp_output.c:1619
> 1619                    panic("tcp_setpersist: retransmit pending");
> (kgdb) list
> 1614            int t = ((tp->t_srtt >> 2) + tp->t_rttvar) >> 1;
> 1615            int tt;
> 1616
> 1617            tp->t_flags &= ~TF_PREVVALID;
> 1618            if (tcp_timer_active(tp, TT_REXMT))
> 1619                    panic("tcp_setpersist: retransmit pending");
> 1620            /*
> 1621             * Start/restart persistance timer.
> 1622             */
> 1623            TCPT_RANGESET(tt, t * tcp_backoff[tp->t_rxtshift],
> 
> I have debug.ktr.entries set to 204800 on the machines compiled with KTR
> options, maybe a bit more history can provide some extra context.

I looked at the other trace and I don't think it disagrees with my previous
theory.  I do have more KTR patches that log when we spin on locks, which
would confirm this directly, but I haven't fully tested those on HEAD yet.

However, I'd rather spend time debugging this panic.  I think I added that
assertion myself.

The root problem is that the persist and retransmit timers share state, so
only one of them should ever be active at a time.  In this case, the persist
timer has fired and is rescheduling itself, but some other thread has already
scheduled the retransmit timer to fire.  The bug is probably in that other
thread: it should either not have scheduled the retransmit timer at all, or
it should have cancelled the persist timer first.
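
For illustration, the ordering that other thread should be using looks
roughly like this (tcp_timer_active() and tcp_timer_activate() are the real
KPIs; the surrounding lines are only a sketch, not a pointer at any
particular caller):

	/* Cancel the persist timer before touching the retransmit timer. */
	if (tcp_timer_active(tp, TT_PERSIST))
		tcp_timer_activate(tp, TT_PERSIST, 0);
	/* Only now is it safe to (re)arm the retransmit timer. */
	tcp_timer_activate(tp, TT_REXMT, tp->t_rxtcur);

The trick is spotting which caller skips that first step.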

A new assertion should be able to catch the offending path.  Note that this
is definitely going to panic, but one panic is probably all we need to find
the culprit.

Index: tcp_timer.c
===================================================================
--- tcp_timer.c	(revision 272718)
+++ tcp_timer.c	(working copy)
@@ -713,10 +713,14 @@ tcp_timer_activate(struct tcpcb *tp, int timer_typ
 		case TT_REXMT:
 			t_callout = &tp->t_timers->tt_rexmt;
 			f_callout = tcp_timer_rexmt;
+			if (callout_active(&tp->t_timers->tt_persist))
+				panic("scheduling retransmit with persist active");
 			break;
 		case TT_PERSIST:
 			t_callout = &tp->t_timers->tt_persist;
 			f_callout = tcp_timer_persist;
+			if (callout_active(&tp->t_timers->tt_rexmt))
+				panic("scheduling persist with retransmit active");
 			break;
 		case TT_KEEP:
 			t_callout = &tp->t_timers->tt_keep;
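
If we ever want this check to compile away on kernels built without
INVARIANTS, the same thing could be written as a KASSERT instead of a bare
panic(); just a sketch of the alternative, not part of the patch above:

		case TT_REXMT:
			t_callout = &tp->t_timers->tt_rexmt;
			f_callout = tcp_timer_rexmt;
			KASSERT(!callout_active(&tp->t_timers->tt_persist),
			    ("scheduling retransmit with persist active"));
			break;

For chasing this particular bug the unconditional panic() is what we want,
though, since it still fires on kernels that don't have INVARIANTS enabled.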

-- 
John Baldwin


