Date:      Mon, 13 Mar 2017 11:38:13 -0700
From:      Mark Johnston <markj@FreeBSD.org>
To:        Peter Holm <peter@holm.cc>
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: draining high-frequency callouts
Message-ID:  <20170313183813.GB57357@wkstn-mjohnston.west.isilon.com>
In-Reply-To: <20170313082120.GA44651@x2.osted.lan>
References:  <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com> <20170313082120.GA44651@x2.osted.lan>

On Mon, Mar 13, 2017 at 09:21:20AM +0100, Peter Holm wrote:
> On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
> > I'm occasionally seeing an assertion failure in softclock_call_cc() when
> > running DTrace tests on a system with hz=10000. The assertion
> > (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
> > concurrently draining the callout, which runs at a high frequency. At
> > the time of the panic, that thread is spinning on the per-CPU callout
> > lock after having been awoken from "codrain", and CALLOUT_PENDING is
> > set on the callout. The callout is direct, i.e., it is executed in hard
> > interrupt context.
> > 
> > I think this is what's happening:
> > - callout_drain() is called while the callout is executing but after the
> >   callout has rescheduled itself, and goes to sleep after having cleared
> >   CALLOUT_ACTIVE.
> > - softclock_call_cc() wakes up the callout_drain() caller, but the
> >   callout fires again before the caller is scheduled.
> > - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
> >   cleared and panics.
> > 
> > Is there anything that prevents this scenario? Is it really correct to
> > leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
> > dropped in order to acquire a sleepqueue lock?
> > 
> 
> Is this the same problem?
> 
> panic: softclock_call_cc: act 0xfffff8000de64800 0

It's hard to say for sure. The minimal patch below fixed the problem for
me - could you give it a try? I also did not see any problems while
testing on Hans' branch.

diff --git a/sys/kern/kern_timeout.c b/sys/kern/kern_timeout.c
index 5b70cf2033f5..a9c50fd98fbe 100644
--- a/sys/kern/kern_timeout.c
+++ b/sys/kern/kern_timeout.c
@@ -1256,7 +1256,8 @@ again:
 		 * Succeed we to stop it or not, we must clear the
 		 * active flag - this is what API users expect.
 		 */
-		c->c_flags &= ~CALLOUT_ACTIVE;
+		if ((flags & CS_DRAIN) == 0)
+			c->c_flags &= ~CALLOUT_ACTIVE;
 
 		if ((flags & CS_DRAIN) != 0) {
 			/*
@@ -1315,6 +1316,7 @@ again:
 				PICKUP_GIANT();
 				CC_LOCK(cc);
 			}
+			c->c_flags &= ~CALLOUT_ACTIVE;
 		} else if (use_lock &&
 			   !cc_exec_cancel(cc, direct) && (drain == NULL)) {
 			


