Skip site navigation (1)Skip section navigation (2)
Date:      Mon, 13 Mar 2017 09:27:59 +0100
From:      Hans Petter Selasky <hps@selasky.org>
To:        Peter Holm <peter@holm.cc>, Mark Johnston <markj@FreeBSD.org>
Cc:        freebsd-hackers@FreeBSD.org
Subject:   Re: draining high-frequency callouts
Message-ID:  <ac7c766a-98fe-7f96-ecba-6f4d083f854a@selasky.org>
In-Reply-To: <20170313082120.GA44651@x2.osted.lan>
References:  <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com> <20170313082120.GA44651@x2.osted.lan>

next in thread | previous in thread | raw e-mail | index | archive | help
On 03/13/17 09:21, Peter Holm wrote:
> On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
>> I'm occasionally seeing an assertion failure in softclock_call_cc() when
>> running DTrace tests on a system with hz=10000. The assertion
>> (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
>> concurrently draining the callout, which runs at a high frequency. At
>> the time of the panic, that thread is spinning on the per-CPU callout
>> lock after having been awoken from "codrain", and CALLOUT_PENDING is
>> set on the callout. The callout is direct, i.e., it is executed in hard
>> interrupt context.
>>
>> I think this is what's happening:
>> - callout_drain() is called while the callout is executing but after the
>>   callout has rescheduled itself, and goes to sleep after having cleared
>>   CALLOUT_ACTIVE.
>> - softclock_call_cc() wakes up the callout_drain() caller, but the
>>   callout fires again before the caller is scheduled.
>> - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
>>   cleared and panics.
>>
>> Is there anything that prevents this scenario? Is it really correct to
>> leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
>> dropped in order to acquire a sleepqueue lock?
>>
>
> Is this the same problem?
>
> panic: softclock_call_cc: act 0xfffff8000de64800 0
> cpuid = 10
> time = 1489389893
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0f984c9660
> vpanic() at vpanic+0x19c/frame 0xfffffe0f984c96e0
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe0f984c9750
> softclock_call_cc() at softclock_call_cc+0xae/frame 0xfffffe0f984c98f0
> softclock() at softclock+0x12c/frame 0xfffffe0f984c9930
> intr_event_execute_handlers() at intr_event_execute_handlers+0x248/frame 0xfffffe0f984c9980
> ithread_execute_handlers() at ithread_execute_handlers+0x47/frame 0xfffffe0f984c99b0
> ithread_loop() at ithread_loop+0xfc/frame 0xfffffe0f984c9a30
> fork_exit() at fork_exit+0x13b/frame 0xfffffe0f984c9ab0
>
> https://people.freebsd.org/~pho/stress/log/kevent7-2.txt
>
> I first spotted this @ 12.0-CURRENT #2 r305652M: Fri Sep  9 13:09:03 CEST 2016
> It's quite easy to reproduce.
>

Hi,

It depends on the trigger. The panic above just means the callout 
structure was referred after being freed. Either a drain call is missing 
or the callout escaped drain.

Can you run this test with a kernel built from projects/hps_head aswell?

--HPS




Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?ac7c766a-98fe-7f96-ecba-6f4d083f854a>