Date:      Wed, 12 Nov 2014 19:39:50 -0800
From:      Adrian Chadd <adrian@freebsd.org>
To:        Alexander Kabaev <kabaev@gmail.com>
Cc:        "freebsd-arch@freebsd.org" <freebsd-arch@freebsd.org>
Subject:   Re: Questions about locking; turnstiles and sleeping threads
Message-ID:  <CAJ-Vmok-8znyycyOBS_ZQU275zFy%2BzuZ2C-jt4N3DnuEVS=PWg@mail.gmail.com>
In-Reply-To: <20141112212613.21037929@kan>
References:  <CAJ-VmomrauhCMoF_dZfMWWhZp0EgwfE9RmxL5Pc37PhLSzZ6Qg@mail.gmail.com> <20141112212613.21037929@kan>

On 12 November 2014 18:26, Alexander Kabaev <kabaev@gmail.com> wrote:
> On Wed, 12 Nov 2014 18:13:55 -0800
> Adrian Chadd <adrian@freebsd.org> wrote:
>
>> Hi,
>>
>> I have a bit of an odd case here.
>>
>> I'm getting panics in the net80211/ath code, "sleeping thread (X) owns
>> non-sleepable lock."
>>
>> show alllocks just showed one lock held - the net80211 comlock. It's a
>> recursive mutex, that's supposed to be sleepable.
>>
>> The two threads in question look like this:
>>
>> thread X: net80211_newstate_cb (grabs IEEE80211_LOCK())
>>     ath_newstate
>>     callout_drain - which grabs the ATH_LOCK as part of the callout drain side of things
>>     that enters sleepq_wait() and goes to sleep, waiting for whatever's running the callout to finish
>>
>> thread Y:
>>     rx_path in if_ath_rx_edma
>>     ath_rx_pkt -> sta_input -> ath_recv_mgmt -> sta_recv_mgmt (grabs IEEE80211_LOCK()) -> panics
>>
>> Thread Y doesn't hold any other locks. It's just trying to grab the
>> IEEE80211_LOCK that is being held by thread X. But thread X is asleep
>> waiting for whatever callout to finish so it can continue. The code in
>> propagate_priority() sees that thread X is sleeping and panics.
>>
>> So, what's really going on? I don't mind (well, "don't mind") having
>> to take another deep dive through all of this to sort it out so it
>> doesn't tickle the callout / turnstile code in this particular
>> fashion, but I'd first like to ensure that it's not some corner case
>> that isn't handled by the check in propagate_priority().
>>
>> Thanks,
>>
>>
>> -adrian
>
> Hi,
>
> mutexes are blocking and not sleepable primitives, so doing any
> unbounded sleep with mutex locked, such as one you are attempting by
> calling callout_drain is illegal. In other words, you are getting an
> expected assert and the code in question is wrong.
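To make the "expected assert" concrete, here is a deliberately tiny, single-threaded C model of the scenario in the report. It is not FreeBSD code: `toy_thread`, `toy_mutex` and `toy_mtx_lock()` are hypothetical names, and the check only loosely mirrors what the report says propagate_priority() does when a contending thread finds the mutex owner asleep. Where the kernel would panic, the toy prints a message and returns false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Toy model only: hypothetical types echoing the running /
 * blocked-on-a-lock / sleeping distinction the kernel makes. */
enum toy_state { TOY_RUNNING, TOY_BLOCKED_ON_LOCK, TOY_SLEEPING };

struct toy_thread {
    const char *name;
    enum toy_state state;
};

struct toy_mutex {
    const char *name;
    struct toy_thread *owner; /* NULL while unowned */
};

/*
 * A thread tries to take a non-sleepable mutex. If it is free, the thread
 * becomes the owner. Otherwise the contender blocks and lends its priority
 * to the owner, which only helps if the owner can make progress. An owner
 * that is asleep (e.g. inside callout_drain()) cannot, so this is reported
 * as an error instead of blocking.
 */
static bool toy_mtx_lock(struct toy_thread *td, struct toy_mutex *m)
{
    assert(td != NULL && m != NULL);
    if (m->owner == NULL) {
        m->owner = td;
        return true;
    }
    if (m->owner->state == TOY_SLEEPING) {
        fprintf(stderr, "sleeping thread (%s) owns non-sleepable lock %s\n",
                m->owner->name, m->name);
        return false;
    }
    td->state = TOY_BLOCKED_ON_LOCK; /* owner runnable: contender just waits */
    return true;
}
```

Driven through the report's sequence: thread X takes the lock, X goes to sleep in the drain, and thread Y's attempt on the same lock is flagged rather than quietly blocking. If X is still running, Y simply blocks, which is the legal case.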

Hi,

Right. That isn't mentioned in the manpage. The manpage says:

     The function callout_drain() is identical to callout_stop() except that
     it will wait for the callout to be completed if it is already in
     progress.  This function MUST NOT be called while holding any locks on
     which the callout might block, or deadlock will result.  Note that if the
     callout subsystem has already begun processing this callout, then the
     callout function may be invoked during the execution of callout_drain().
     However, the callout subsystem does guarantee that the callout will be
     fully stopped before callout_drain() returns.

The callout itself isn't going to block on any lock held here, but another thread may block on the lock I'm holding while I sleep.
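The thread doesn't settle on a fix, but one common way to respect the rule, assuming the state-change code can tolerate it, is to drop the mutex before the unbounded wait and re-validate state after reacquiring it. The sketch below is a hypothetical userland analogue, not kernel code: `com_lock` stands in for IEEE80211_LOCK, `rx_path()` for thread Y, `callout_fn()` for the callout being drained, and `pthread_join()` for callout_drain(). A userland pthread mutex has no sleepable/non-sleepable distinction, so this only shows the shape of the pattern, not the panic.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-ins; see the lead-in for the mapping. */
static pthread_mutex_t com_lock = PTHREAD_MUTEX_INITIALIZER;
static int rx_frames_handled;
static int callout_ran;

/* Thread Y: briefly needs com_lock, like sta_recv_mgmt. */
static void *rx_path(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&com_lock);
    rx_frames_handled++;
    pthread_mutex_unlock(&com_lock);
    return NULL;
}

/* The callout being drained: it never needs com_lock. */
static void *callout_fn(void *arg)
{
    (void)arg;
    callout_ran = 1;
    return NULL;
}

/* Thread X: never holds com_lock across the unbounded wait. */
static void newstate(pthread_t callout_td)
{
    pthread_mutex_lock(&com_lock);
    /* ... state-change work that needs com_lock ... */
    pthread_mutex_unlock(&com_lock);    /* drop before the "drain" */

    int rc = pthread_join(callout_td, NULL); /* may wait indefinitely */
    assert(rc == 0);
    (void)rc;

    pthread_mutex_lock(&com_lock);
    /* Other threads may have run while the lock was dropped, so any
     * state checked before the drain must be re-validated here. */
    pthread_mutex_unlock(&com_lock);
}
```

The cost of this design is the window where the lock is dropped: anything decided before the drain may be stale afterwards, which is why the re-validation step matters.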

This is good to know. I'll see if I can come up with an addition to
the manpage about this.

I'm going to have to do another pass over all of the wifi drivers and
stack to see where this is happening. Ugh. :(

Thanks!



-adrian


