Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 23 Aug 2011 09:42:20 -0700
From:      mdf@FreeBSD.org
To:        Andriy Gapon <avg@freebsd.org>
Cc:        freebsd-arch@freebsd.org
Subject:   Re: skipping locks, mutex_owned, usb
Message-ID:  <CAMBSHm_ffZYUqQ1pFf_ZD4gebRDT5E0Mto_QCdBhO5BjHbuRyw@mail.gmail.com>
In-Reply-To: <4E53AC2F.1040006@FreeBSD.org>
References:  <4E53986B.5000804@FreeBSD.org> <CAMBSHm-nT5-wTbAFqsJ6ZjCPE9Vfxeva1F3zwWSAK8Ecw=8VsA@mail.gmail.com> <4E53AC2F.1040006@FreeBSD.org>

next in thread | previous in thread | raw e-mail | index | archive | help
On Tue, Aug 23, 2011 at 6:33 AM, Andriy Gapon <avg@freebsd.org> wrote:
> on 23/08/2011 15:58 mdf@FreeBSD.org said the following:
>> On Tue, Aug 23, 2011 at 5:09 AM, Andriy Gapon <avg@freebsd.org> wrote:
>>> III. =A0With my stop_scheduler_on_panic patch ukbd_poll() produces infi=
nite chains
>>> of 'infinite' recursion because mtx_owned() always returns false. =A0Th=
is is because
>>> I skip all lock/unlock/etc operations in the post-panic context. =A0I t=
hink that
>>> it's a good philosophical question: what operations like mtx_owned(),
>>> mtx_recursed(), mtx_trylock() 'should' return when we actually act as i=
f no locks
>>> exist at all?
>>>
>>> My first knee-jerk reaction was to change mtx_owned() to return true in=
 this
>>> "lock-less" context, but _hypothetically_ there could exist some symmet=
ric code
>>> that does something like:
>>> func()
>>> {
>>> =A0 =A0 =A0 =A0if (mtx_owned(&Giant)) {
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mtx_unlock(&Giant);
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0func();
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0mtx_lock(&Giant);
>>> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return;
>>> =A0 =A0 =A0 =A0}
>>>
>>> =A0 =A0 =A0 =A0// etc ...
>>>
>>> What do you think about this problem?
>>
>> Given that checking for a lock being held is a lot more common than
>> checking if it's not held (in the context of mtx_assert(9) and
>> friends), it seems reasonable to report that a lock is held in the
>> special context of after panic.
>
> But, OTOH, there are snippets like this (found with grep, haven't looked =
at the code):
> /usr/src/sys/kern/kern_sx.c: =A0 =A0 =A0 =A0 =A0 =A0while (mtx_owned(&Gia=
nt)) {
>
>>> That question III actually brings another thought: perhaps the whole of=
 idea of
>>> skipping locks in a certain context is not a correct direction?
>>> Perhaps, instead we should identify all the code that could be legitima=
tely
>>> executed in the after-panic and/or kdb contexts and make that could exp=
licitly
>>> aware of its execution context. =A0That is, instead of trying to make _=
lock,
>>> _unlock, _owned, _trylock, etc do the right thing auto-magically, we sh=
ould try to
>>> make the calling code check panicstr and kdb_active and do the right th=
ing on that
>>> level (which would include not invoking those lock-related operations o=
r other
>>> inappropriate operations).
>>
>> I don't think it's possible to identify all the code, since what runs
>> after panic isn't tested very much. :-) =A0I don't disagree that one
>> should think about the issue, but providing reasonable defaults like
>> skipping locks reduces the burden on the programmer.
>
> Yes, I agree with your and John's practical approach to this.
> Maybe print a warning about each skipped locking operation? =A0But not su=
re if that
> would be useful, if there would be no intention of changing the code.

Skipped locking seems like it should be left silent.  I think this is
a reasonable behaviour used on many operating systems -- at least I
know it is used on AIX.

I like the idea of a printf() in mtx_owned() since there is no
completely correct answer there.  Then at least on e.g. an infinite
loop like you point out, it would be clear what's happening (and
presumably this could use the __FILE__ and __LINE__ of the caller in
the print).

Cheers,
matthew



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?CAMBSHm_ffZYUqQ1pFf_ZD4gebRDT5E0Mto_QCdBhO5BjHbuRyw>