From owner-freebsd-hackers@freebsd.org Mon Mar 13 08:28:58 2017
Subject: Re: draining high-frequency callouts
To: Peter Holm, Mark Johnston
Cc: freebsd-hackers@FreeBSD.org
From: Hans Petter Selasky
Date: Mon, 13 Mar 2017 09:27:59 +0100
References: <20170110205711.GA86449@wkstn-mjohnston.west.isilon.com> <20170313082120.GA44651@x2.osted.lan>
In-Reply-To: <20170313082120.GA44651@x2.osted.lan>
List-Id: Technical Discussions relating to FreeBSD

On 03/13/17 09:21, Peter Holm wrote:
> On Tue, Jan 10, 2017 at 12:57:12PM -0800, Mark Johnston wrote:
>> I'm occasionally seeing an assertion failure in softclock_call_cc() when
>> running DTrace tests on a system with hz=10000.
>> The assertion
>> (c->c_flags & CALLOUT_ACTIVE) != 0 is failing while a thread is
>> concurrently draining the callout, which runs at a high frequency. At
>> the time of the panic, that thread is spinning on the per-CPU callout
>> lock after having been awoken from "codrain", and CALLOUT_PENDING is
>> set on the callout. The callout is direct, i.e., it is executed in hard
>> interrupt context.
>>
>> I think this is what's happening:
>> - callout_drain() is called while the callout is executing but after the
>>   callout has rescheduled itself, and goes to sleep after having cleared
>>   CALLOUT_ACTIVE.
>> - softclock_call_cc() wakes up the callout_drain() caller, but the
>>   callout fires again before the caller is scheduled.
>> - the second softclock_call_cc() call sees that CALLOUT_ACTIVE is
>>   cleared and panics.
>>
>> Is there anything that prevents this scenario? Is it really correct to
>> leave CALLOUT_ACTIVE cleared when the per-CPU callout lock must be
>> dropped in order to acquire a sleepqueue lock?
>>
>
> Is this the same problem?
>
> panic: softclock_call_cc: act 0xfffff8000de64800 0
> cpuid = 10
> time = 1489389893
> KDB: stack backtrace:
> db_trace_self_wrapper() at db_trace_self_wrapper+0x2b/frame 0xfffffe0f984c9660
> vpanic() at vpanic+0x19c/frame 0xfffffe0f984c96e0
> kassert_panic() at kassert_panic+0x126/frame 0xfffffe0f984c9750
> softclock_call_cc() at softclock_call_cc+0xae/frame 0xfffffe0f984c98f0
> softclock() at softclock+0x12c/frame 0xfffffe0f984c9930
> intr_event_execute_handlers() at intr_event_execute_handlers+0x248/frame 0xfffffe0f984c9980
> ithread_execute_handlers() at ithread_execute_handlers+0x47/frame 0xfffffe0f984c99b0
> ithread_loop() at ithread_loop+0xfc/frame 0xfffffe0f984c9a30
> fork_exit() at fork_exit+0x13b/frame 0xfffffe0f984c9ab0
>
> https://people.freebsd.org/~pho/stress/log/kevent7-2.txt
>
> I first spotted this @ 12.0-CURRENT #2 r305652M: Fri Sep 9 13:09:03 CEST 2016
> It's quite easy to reproduce.
>

Hi,

It depends on the trigger. The panic above just means the callout
structure was referenced after being freed. Either a drain call is
missing or the callout escaped drain.

Can you run this test with a kernel built from projects/hps_head as well?

--HPS
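
For reference, the interleaving Mark describes in the quoted text can be walked through with a small userspace toy model. This is only a sketch of the hypothesis, not the code in kern_timeout.c: the toy_* names and TOY_* flag values are made up for illustration, locking and sleeping are left out entirely, and the assumption that arming a callout sets both flags is a simplification.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical flag values; the real CALLOUT_* definitions differ. */
#define TOY_ACTIVE  0x1
#define TOY_PENDING 0x2

struct toy_callout {
	int flags;
};

/* Arm the callout. Assumption: arming sets both ACTIVE and PENDING. */
static void
toy_reset(struct toy_callout *c)
{
	c->flags |= TOY_ACTIVE | TOY_PENDING;
}

/*
 * Model one firing. Returns whether the asserted condition,
 * (flags & ACTIVE) != 0, held on entry. The handler may re-arm itself.
 */
static bool
toy_fire(struct toy_callout *c, bool reschedule)
{
	bool active = (c->flags & TOY_ACTIVE) != 0;

	c->flags &= ~TOY_PENDING;
	if (reschedule)
		toy_reset(c);
	return (active);
}

/* Model the start of a drain: clear ACTIVE, then (notionally) sleep. */
static void
toy_drain_begin(struct toy_callout *c)
{
	c->flags &= ~TOY_ACTIVE;
}

/* Replay the three quoted steps; returns the result of the second check. */
static bool
toy_race(void)
{
	struct toy_callout c = { 0 };

	toy_reset(&c);
	bool first = toy_fire(&c, true);	/* runs and re-arms itself */
	assert(first);
	toy_drain_begin(&c);			/* drain clears ACTIVE, sleeps */
	/* Drain caller is woken but not yet scheduled; callout fires again. */
	return (toy_fire(&c, false));
}
```

In this model toy_race() returns false: on the second firing only TOY_PENDING is still set, which matches the state reported at the time of the panic. Whether the real code can actually reach this interleaving is exactly the open question in the thread.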