From owner-freebsd-current@FreeBSD.ORG Tue Apr 13 10:48:51 2010 Return-Path: Delivered-To: freebsd-current@freebsd.org Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34]) by hub.freebsd.org (Postfix) with ESMTP id 09C8F106566C; Tue, 13 Apr 2010 10:48:51 +0000 (UTC) (envelope-from asmrookie@gmail.com) Received: from ey-out-2122.google.com (ey-out-2122.google.com [74.125.78.26]) by mx1.freebsd.org (Postfix) with ESMTP id 303DC8FC18; Tue, 13 Apr 2010 10:48:49 +0000 (UTC) Received: by ey-out-2122.google.com with SMTP id d26so408904eyd.9 for ; Tue, 13 Apr 2010 03:48:49 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=gamma; h=domainkey-signature:mime-version:sender:received:in-reply-to :references:date:x-google-sender-auth:received:message-id:subject :from:to:cc:content-type:content-transfer-encoding; bh=8eblrFc2NGuGA4nPALa3GVZ8AD/FTOsou9K4KqagyTA=; b=ZDlrnXWKD7s+NeHnMWCMGX8chyYiHuISB7rNF0miUkZKXLJCu7hjIzp5IG/wguYfyP QZWEWLxRRL0+3PgWZjs+9SnJSlFPTy6mByE7tUgPyDrL4drx0By6Qc4GreiNrg0zRKLC ii8cuFEwidVIoXvCcl/isf9vAbXedTBWdP7gQ= DomainKey-Signature: a=rsa-sha1; c=nofws; d=gmail.com; s=gamma; h=mime-version:sender:in-reply-to:references:date :x-google-sender-auth:message-id:subject:from:to:cc:content-type :content-transfer-encoding; b=SQUsyO7szqnG0g+DNXWVAwmRUiOJPkaeI6qefGdnyY3PMcNfEJve+YhLgeCaDFnw7f bbNrrzbJg87kLfia5tjcd7PerIUgaXH8iL/sfz2hE4vEodrFEvXyCXgoIrz1BciPXBHE pogJz/9MUAKHO2EP2/Hz6Tt9186ES07d8asvY= MIME-Version: 1.0 Sender: asmrookie@gmail.com Received: by 10.103.193.19 with HTTP; Tue, 13 Apr 2010 03:48:48 -0700 (PDT) In-Reply-To: <7d6fde3d1003121933s4ba7b57fw6542628c16edf723@mail.gmail.com> References: <7d6fde3d1003070207q621e69ado2cb64e431feacd76@mail.gmail.com> <7d6fde3d1003070224k3626a9b5y98c11a43eef1bed4@mail.gmail.com> <4e6cba831003101356i534341ffr2961b983854ab788@mail.gmail.com> <7dc40bd01003101407m605e41ey2d8ace0049cf5e61@mail.gmail.com> <7d6fde3d1003102158o7834ca67lce3eca23aa723fd1@mail.gmail.com> <7d6fde3d1003121933s4ba7b57fw6542628c16edf723@mail.gmail.com> Date: Tue, 13 Apr 2010 12:48:48 +0200 X-Google-Sender-Auth: 657cc82d42a523d6 Received: by 10.102.174.30 with SMTP id w30mr2970368mue.57.1271155728861; Tue, 13 Apr 2010 03:48:48 -0700 (PDT) Message-ID: From: Attilio Rao To: Garrett Cooper Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable Cc: Tom Couch , Giovanni Trematerra , FreeBSD Current , Scott Long , Pyun YongHyeon Subject: Re: Removing USB keyboard after filesystems synced causes panic with destroyed mutex twa(4)? X-BeenThere: freebsd-current@freebsd.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: Discussions about the use of FreeBSD-current List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Tue, 13 Apr 2010 10:48:51 -0000 2010/3/13 Garrett Cooper : > On Wed, Mar 10, 2010 at 9:58 PM, Garrett Cooper wrot= e: >> On Wed, Mar 10, 2010 at 2:07 PM, Tom Couch = wrote: >>> Hi FreeBSD-current, >>> =C2=A0 =C2=A0 My name is Tom Couch, >>> I am part of the 3ware driver team recently acquired by LSI. >>> I believe Giovanni's patch, below, is the correct fix for this bug. >>> >>> I am available to maintain the twa driver, now that I am on this list. >>> Let me know how I can help, >>> >>> Tom >>> >>> >>> On Wed, Mar 10, 2010 at 1:56 PM, Giovanni Trematerra < >>> giovanni.trematerra@gmail.com> wrote: >>> >>>> On Sun, Mar 7, 2010 at 11:24 AM, Garrett Cooper >>>> wrote: >>>> > On Sun, Mar 7, 2010 at 2:07 AM, Garrett Cooper >>>> wrote: >>>> >> Hi Alexander and Hans, >>>> >> =C2=A0 =C2=A0I recently did the following which generated a panic o= n a >>>> >> 9-CURRENT kernel compiled on the 26th: >>>> >> >>>> >> 1. Executed reboot >>>> >> 2. Removed keyboard. >>>> >> 3. Some time after `All buffers synced\nUptime: ...' was displayed, >>>> >> the keyboard was registered disconnected. >>>> >> 4. The interrupt was delivered to my twa(4) enabled card and the >>>> >> kernel panicked, like so: >>>> >> >>>> >> ugen2.2: at usbus2 (disconnected) >>>> >> uhub8: at uhub2, port 1, addr 2 (disconnected) >>>> >> ugen2.3: at usbus2 (disconnected) >>>> >> ukbd0: at uhub8, port 3, addr 3 (disconnected) >>>> >> uhid0: at uhub8, port 3, addr 3 (disconnected) >>>> >> panic: mtx_lock_spin() of destroyed mutex @ >>>> /usr/src/sys/dev/twa/tw_cl_intr.c:88 >>>> >> >>>> >> cpuid =3D 1 >>>> >> KDB: enter: panic >>>> >> [thread pid 12 tid 100025 ] >>>> >> Stopped at =C2=A0 =C2=A0 =C2=A0 =C2=A0 kdb_enter+0x3d: movq =C2=A0 = =C2=A0 $0,0x40289c(%rip) >>>> >> db> >>>> >> >>>> >> =C2=A0 =C2=A0I wish I could provide you with more details, but unfo= rtunately I >>>> >> the USB bus isn't registering the fact that I'm reattaching the >>>> >> keyboard right now and the box won't reboot automatically :( (didn'= t >>>> >> set the right sysctl beforehand to panic automatically). I'll try a= nd >>>> >> reproduce the issue again, but I was just wondering whether or not = you >>>> >> guys had seen this problem before. >>>> > >>>> > =C2=A0 =C2=A0Phew... it's reproducible with that kernel. Here's what= I did >>>> > exactly (because my original directions were incorrect): >>>> > =C2=A0 =C2=A01. Hit power button (for S5). >>>> > =C2=A0 =C2=A02. Disconnect keyboard RIGHT as `Uptime: ...' is displa= yed. >>>> > =C2=A0 =C2=A0Kernel panicked on my system again. Now to figure out i= f it still >>>> > exists with a kernel compiled today, and also how to debug it if it >>>> > still does exist :/... >>>> > Thanks, >>>> > -Garrett >>>> >>>> Hi Garrett, >>>> Could you please try the patch below and report back? >>>> >>>> Thank you >>>> >>>> diff -r cab6489de66d sys/dev/twa/tw_cl_intr.c >>>> --- a/sys/dev/twa/tw_cl_intr.c =C2=A0 =C2=A0 =C2=A0 =C2=A0Wed Mar 03 0= 4:51:13 2010 -0500 >>>> +++ b/sys/dev/twa/tw_cl_intr.c =C2=A0 =C2=A0 =C2=A0 =C2=A0Wed Mar 10 0= 6:29:05 2010 -0500 >>>> @@ -75,9 +75,12 @@ tw_cl_interrupt(struct tw_cl_ctlr_handle >>>> =C2=A0 =C2=A0 =C2=A0if (ctlr =3D=3D NULL) >>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 goto out; >>>> >>>> - =C2=A0 =C2=A0 /* If we get an interrupt while resetting, it is a sha= red >>>> - =C2=A0 =C2=A0 =C2=A0 =C2=A0one for another device, so just bail */ >>>> - =C2=A0 =C2=A0 if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS) >>>> + =C2=A0 =C2=A0 /* >>>> + =C2=A0 =C2=A0 =C2=A0* =C2=A0If we get an interrupt while resetting o= r shutting down, >>>> + =C2=A0 =C2=A0 =C2=A0* =C2=A0it is a shared one for another device, s= o just bail >>>> + =C2=A0 =C2=A0 =C2=A0*/ >>>> + =C2=A0 =C2=A0 if (ctlr->state & TW_CLI_CTLR_STATE_RESET_IN_PROGRESS = || >>>> + =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2= =A0 (ctrl->state & TW_CLI_CTLR_STATE_ACTIVE) =3D=3D 0) >>>> =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0goto out; >>>> >>>> =C2=A0 =C2=A0 =C2=A0/* > > Apart from the typo above (s/ctrl/ctlr/), things work appropriately > now at reboot. The only problem is that bootup is really wonky now, > because the RAID had a LOT of issues attaching to cam(4) (failed in > 2/3 cold boot attempts); an additional branch condition may need to be > added to the above if-statement if this change didn't take that into > account. However, if the old behavior was incorrect and the new > behavior is correct, s.t. the RAID controller demonstrating bus > detection timeout issue that is occurring with a lot of USB devices > and some RAID controllers today, this could be extremely problematic. > > So, while it looks better than before at reboot, it's not ready yet > for prime time; I'd rather that the bug was filed with the patch you > provided after the typo fixed, with the caveat mentioned and NOT > committed, because the adverse affect(s) seem a bit more annoying than > the previous panic issue described. I looked briefly at it and I think there are 2 bugs, one in twa_detach() and another one in twa_shutdown(). Basically, locks get destroyed in tw_cl_shutdown_ctlr() which is called by twa_shutdown() while interrupts are teared down in tw_osli_free_resource(). twa_shutdown() is called in twa_detach() before than tw_osli_free_resource(). tw_cl_shutdown_ctlr() also takes care to disable the interrupts for twa but a problem can arise with shared IRQ. Infact, the handler will remain on the IRQ until the bus_intr_teardown() takes place and it may run, trying to acknowledge the interrupt, but with destroyed lock, if an interrupt is sent by a shared source between twa_shutdown() and tw_osli_free_resource() call in twa_detach() or just after a simple call to twa_shutdown(). Problems I see here: - twa_shutdown() should not destroy the mutex at all, it is not something our shutdown handlers generally do and it might be kept in sync - twa_detach() might do a first half of tw_cl_shutdown_ctlr(), do the resource deallocation and just at the end destroy mutexes. That is how generally our detach handler works. All these solutions would mean refactoring the tw_osli_free_resource() and tw_cl_shutdown_ctlr(). I don't know very well the twa code, but it seems to me that we want to keep the driver very compatible with any vendor version or such? If yes this may be a problem because the failing patterns are all located into the shared code and an ideal solution could be more difficult to find out. Otherwise a fix might be simple to hammer down. Thanks, Attilio --=20 Peace can only be achieved by understanding - A. Einstein