Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 13 Apr 2010 12:50:29 +0200
From:      Attilio Rao <attilio@freebsd.org>
To:        Garrett Cooper <yanefbsd@gmail.com>
Cc:        Tom Couch <tom.couch.storage@gmail.com>, Giovanni Trematerra <giovanni.trematerra@gmail.com>, FreeBSD Current <freebsd-current@freebsd.org>, Scott Long <scottl@freebsd.org>, Pyun YongHyeon <yongari@freebsd.org>
Subject:   Re: Removing USB keyboard after filesystems synced causes panic with destroyed mutex twa(4)?
Message-ID:  <h2q3bbf2fe11004130350td06cbb37ha564ba8cf11dd96a@mail.gmail.com>
In-Reply-To: <x2t3bbf2fe11004130348gdeb3003ela5f17acd0b0b780c@mail.gmail.com>
References:  <7d6fde3d1003070207q621e69ado2cb64e431feacd76@mail.gmail.com> <7d6fde3d1003070224k3626a9b5y98c11a43eef1bed4@mail.gmail.com> <4e6cba831003101356i534341ffr2961b983854ab788@mail.gmail.com> <7dc40bd01003101407m605e41ey2d8ace0049cf5e61@mail.gmail.com> <7d6fde3d1003102158o7834ca67lce3eca23aa723fd1@mail.gmail.com> <7d6fde3d1003121933s4ba7b57fw6542628c16edf723@mail.gmail.com> <x2t3bbf2fe11004130348gdeb3003ela5f17acd0b0b780c@mail.gmail.com>

next in thread | previous in thread | raw e-mail | index | archive | help
2010/4/13 Attilio Rao <attilio@freebsd.org>:
> 2010/3/13 Garrett Cooper <yanefbsd@gmail.com>:
>> On Wed, Mar 10, 2010 at 9:58 PM, Garrett Cooper <yanefbsd@gmail.com> wro=
te:
>>> On Wed, Mar 10, 2010 at 2:07 PM, Tom Couch <tom.couch.storage@gmail.com=
> wrote:
>>>> Hi FreeBSD-current,
>>>> =C2=A0 =C2=A0 My name is Tom Couch,
>>>> I am part of the 3ware driver team recently acquired by LSI.
>>>> I believe Giovanni's patch, below, is the correct fix for this bug.
>>>>
>>>> I am available to maintain the twa driver, now that I am on this list.
>>>> Let me know how I can help,
>>>>
>>>> Tom
>>>>
>>>>
>>>> On Wed, Mar 10, 2010 at 1:56 PM, Giovanni Trematerra <
>>>> giovanni.trematerra@gmail.com> wrote:
>>>>
>>>>> On Sun, Mar 7, 2010 at 11:24 AM, Garrett Cooper <yanefbsd@gmail.com>
>>>>> wrote:
>>>>> > On Sun, Mar 7, 2010 at 2:07 AM, Garrett Cooper <yanefbsd@gmail.com>
>>>>> wrote:
>>>>> >> Hi Alexander and Hans,
>>>>> >> =C2=A0 =C2=A0I recently did the following which generated a panic =
on a
>>>>> >> 9-CURRENT kernel compiled on the 26th:
>>>>> >>
>>>>> >> 1. Executed reboot
>>>>> >> 2. Removed keyboard.
>>>>> >> 3. Some time after `All buffers synced\nUptime: ...' was displayed=
,
>>>>> >> the keyboard was registered disconnected.
>>>>> >> 4. The interrupt was delivered to my twa(4) enabled card and the
>>>>> >> kernel panicked, like so:
>>>>> >>
>>>>> >> ugen2.2: <Mitsumi Electric> at usbus2 (disconnected)
>>>>> >> uhub8: at uhub2, port 1, addr 2 (disconnected)
>>>>> >> ugen2.3: <Mitsumi Electric> at usbus2 (disconnected)
>>>>> >> ukbd0: at uhub8, port 3, addr 3 (disconnected)
>>>>> >> uhid0: at uhub8, port 3, addr 3 (disconnected)
>>>>> >> panic: mtx_lock_spin() of destroyed mutex @
>>>>> /usr/src/sys/dev/twa/tw_cl_intr.c:88
>>>>> >>
>>>>> >> cpuid =3D 1
>>>>> >> KDB: enter: panic
>>>>> >> [thread pid 12 tid 100025 ]
>>>>> >> Stopped at =C2=A0 =C2=A0 =C2=A0 =C2=A0 kdb_enter+0x3d: movq =C2=A0=
 =C2=A0 $0,0x40289c(%rip)
>>>>> >> db>
>>>>> >>
>>>>> >> =C2=A0 =C2=A0I wish I could provide you with more details, but unf=
ortunately I
>>>>> >> the USB bus isn't registering the fact that I'm reattaching the
>>>>> >> keyboard right now and the box won't reboot automatically :( (didn=
't
>>>>> >> set the right sysctl beforehand to panic automatically). I'll try =
and
>>>>> >> reproduce the issue again, but I was just wondering whether or not=
 you
>>>>> >> guys had seen this problem before.
>>>>> >
>>>>> > =C2=A0 =C2=A0Phew... it's reproducible with that kernel. Here's wha=
t I did
>>>>> > exactly (because my original directions were incorrect):
>>>>> > =C2=A0 =C2=A01. Hit power button (for S5).
>>>>> > =C2=A0 =C2=A02. Disconnect keyboard RIGHT as `Uptime: ...' is displ=
ayed.
>>>>> > =C2=A0 =C2=A0Kernel panicked on my system again. Now to figure out =
if it still
>>>>> > exists with a kernel compiled today, and also how to debug it if it
>>>>> > still does exist :/...
>>>>> > Thanks,
>>>>> > -Garrett
>>>>>
>>>>> Hi Garrett,
>>>>> Could you please try the patch below and report back?
>>>>>
>>>>> Thank you
>>>>>
>>>>> diff -r cab6489de66d sys/dev/twa/tw_cl_intr.c
>>>>> --- a/sys/dev/twa/tw_cl_intr.c =C2=A0 =C2=A0 =C2=A0 =C2=A0Wed Mar 03 =
04:51:13 2010 -0500
>>>>> +++ b/sys/dev/twa/tw_cl_intr.c =C2=A0 =C2=A0 =C2=A0 =C2=A0Wed Mar 10 =
06:29:05 2010 -0500
>>>>> @@ -75,9 +75,12 @@ tw_cl_interrupt(struct tw_cl_ctlr_handle
>>>>> =C2=A0 =C2=A0 =C2=A0if (ctlr =3D=3D NULL)
>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 goto out;
>>>>>
>>>>> - =C2=A0 =C2=A0 /* If we get an interrupt while resetting, it is a sh=
ared
>>>>> - =C2=A0 =C2=A0 =C2=A0 =C2=A0one for another device, so just bail */
>>>>> - =C2=A0 =C2=A0 if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS=
)
>>>>> + =C2=A0 =C2=A0 /*
>>>>> + =C2=A0 =C2=A0 =C2=A0* =C2=A0If we get an interrupt while resetting =
or shutting down,
>>>>> + =C2=A0 =C2=A0 =C2=A0* =C2=A0it is a shared one for another device, =
so just bail
>>>>> + =C2=A0 =C2=A0 =C2=A0*/
>>>>> + =C2=A0 =C2=A0 if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS=
 ||
>>>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=
=A0 (ctrl->state & TW_CLI_CTLR_STATE_ACTIVE) =3D=3D 0)
>>>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto out;
>>>>>
>>>>> =C2=A0 =C2=A0 =C2=A0/*
>>
>> Apart from the typo above (s/ctrl/ctlr/), things work appropriately
>> now at reboot. The only problem is that bootup is really wonky now,
>> because the RAID had a LOT of issues attaching to cam(4) (failed in
>> 2/3 cold boot attempts); an additional branch condition may need to be
>> added to the above if-statement if this change didn't take that into
>> account. However, if the old behavior was incorrect and the new
>> behavior is correct, s.t. the RAID controller demonstrating bus
>> detection timeout issue that is occurring with a lot of USB devices
>> and some RAID controllers today, this could be extremely problematic.
>>
>> So, while it looks better than before at reboot, it's not ready yet
>> for prime time; I'd rather that the bug was filed with the patch you
>> provided after the typo fixed, with the caveat mentioned and NOT
>> committed, because the adverse affect(s) seem a bit more annoying than
>> the previous panic issue described.
>
> I looked briefly at it and I think there are 2 bugs, one in
> twa_detach() and another one in twa_shutdown().
> Basically, locks get destroyed in tw_cl_shutdown_ctlr() which is
> called by twa_shutdown() while interrupts are teared down in
> tw_osli_free_resource(). twa_shutdown() is called in twa_detach()
> before than tw_osli_free_resource().
> tw_cl_shutdown_ctlr() also takes care to disable the interrupts for
> twa but a problem can arise with shared IRQ. Infact, the handler will
> remain on the IRQ until the bus_intr_teardown() takes place and it may
> run, trying to acknowledge the interrupt, but with destroyed lock, if
> an interrupt is sent by a shared source between twa_shutdown() and
> tw_osli_free_resource() call in twa_detach() or just after a simple
> call to twa_shutdown().
>
> Problems I see here:
> - twa_shutdown() should not destroy the mutex at all, it is not
> something our shutdown handlers generally do and it might be kept in
> sync
> - twa_detach() might do a first half of tw_cl_shutdown_ctlr(), do the
> resource deallocation and just at the end destroy mutexes. That is how
> generally our detach handler works.
>
> All these solutions would mean refactoring the tw_osli_free_resource()
> and tw_cl_shutdown_ctlr(). I don't know very well the twa code, but it
> seems to me that we want to keep the driver very compatible with any
> vendor version or such? If yes this may be a problem because the
> failing patterns are all located into the shared code and an ideal
> solution could be more difficult to find out. Otherwise a fix might be
> simple to hammer down.

Forgot to tell: twe might have the same problem even if it doesn't
expose just for luckiness.

Attilio


--=20
Peace can only be achieved by understanding - A. Einstein



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?h2q3bbf2fe11004130350td06cbb37ha564ba8cf11dd96a>