From: Andrew Boyer
Date: Mon, 30 Jan 2012 15:46:58 -0500
To: FreeBSD Stable Mailing List, Alexander Motin
Subject: Kernel panics under 8.2 due to ATA timeouts
Message-Id: <76687387-92D3-4EA5-AD39-3F6820B27DCD@averesystems.com>

Hello Alexander,

I have a system that appears to have a flaky SATA controller (one of the Intel ESB2 variants) and it seems to be exposing a weakness in the ATA driver (not using ATA_CAM).
If a command with ATA_R_DIRECT set times out, the channel gets reinitialized, but from the soft interrupt context. It panics when it tries to sleep in ata_queue_request().

Timeouts work if ATA_R_DIRECT isn't set because in that case it uses a taskqueue to complete the request.

Here is the backtrace:

> #0  kdb_enter (why=0xffffffff80962cfa "panic", msg=0xa) at ../../../kern/subr_kdb.c:349
> #1  0xffffffff805d6d0b in panic (fmt=Variable "fmt" is not available.
> ) at ../../../kern/kern_shutdown.c:689
> #2  0xffffffff8061bc53 in sleepq_add (wchan=0xffffff00052c3e58, lock=0xffffff00052c3e38, wmesg=0xffffffff808fa213 "ATA request done",
>     flags=1, queue=0) at ../../../kern/subr_sleepqueue.c:320
> #3  0xffffffff80590c95 in _cv_timedwait (cvp=0xffffff00052c3e58, lock=0xffffff00052c3e38, timo=40000) at ../../../kern/kern_condvar.c:313
> #4  0xffffffff805d61af in _sema_timedwait (sema=0xffffff00052c3e38, timo=40000, file=0xffffffff808fa1f6 "../../../dev/ata/ata-queue.c",
>     line=118) at ../../../kern/kern_sema.c:123
> #5  0xffffffff8028559f in ata_queue_request (request=0xffffff00052c3dc0) at ../../../dev/ata/ata-queue.c:117
> #6  0xffffffff80286628 in ata_controlcmd (dev=0xffffff0002e83d00, command=239 '?', feature=Variable "feature" is not available.
> ) at ../../../dev/ata/ata-queue.c:153
> #7  0xffffffff8027ffd3 in ata_setmode (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-all.c:637
> #8  0xffffffff802a0af9 in ad_init (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-disk.c:405
> #9  0xffffffff802a0c29 in ad_reinit (dev=0xffffff0002e83d00) at ../../../dev/ata/ata-disk.c:221
> #10 0xffffffff80280cad in ata_reinit (dev=0xffffff0002902800) at ata_if.h:79
> #11 0xffffffff802856c4 in ata_completed (context=Variable "context" is not available.
> ) at ../../../dev/ata/ata-queue.c:313
> #12 0xffffffff80285ffb in ata_finish (request=0xffffff00054ec8c0) at ../../../dev/ata/ata-queue.c:265
> #13 0xffffffff805ed419 in softclock (arg=Variable "arg" is not available.
> ) at ../../../kern/kern_timeout.c:430

This is very repeatable. I'm not sure what's the best fix - always use a taskqueue on timeouts? Don't reinit if direct commands fail?

-Andrew

--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com
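
For illustration only, the first option raised above (always use a taskqueue on timeouts) might be sketched in userspace C as follows. This is not the ata(4) driver code: R_DIRECT, struct request, finish(), run_deferred() and the one-slot "deferred" queue are hypothetical stand-ins. The only thing modeled is the dispatch rule that a timed-out request is never completed inline from a context that cannot sleep.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's "complete directly" flag. */
#define R_DIRECT 0x1

struct request {
    int  flags;
    bool timed_out;
    bool completed_inline;   /* true if completion ran in the caller's context */
};

/* One-slot stand-in for a taskqueue; a worker thread would drain it later. */
static struct request *deferred = NULL;

/* Completion work that may need to sleep (e.g. reinitializing the channel). */
static void complete(struct request *r, bool inline_ctx)
{
    r->completed_inline = inline_ctx;
}

/*
 * Decide where completion runs. Direct requests normally complete inline,
 * but a timeout implies a reinit that may sleep, so it is always deferred.
 */
static void finish(struct request *r)
{
    if ((r->flags & R_DIRECT) && !r->timed_out)
        complete(r, true);
    else
        deferred = r;        /* hand off to a context that may sleep */
}

/* Called from the worker thread, where sleeping is allowed. */
static void run_deferred(void)
{
    if (deferred != NULL) {
        complete(deferred, false);
        deferred = NULL;
    }
}
```

With this rule, a direct request that times out ends up in the deferred slot and is completed by run_deferred(), while a direct request that finishes normally still completes inline. Whether this or skipping the reinit for direct commands is the better fix in the real driver is left open here.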