From: Scott Long
Date: Tue, 27 Apr 2010 14:26:14 -0600
To: Andy Farkas
Cc: freebsd-scsi@freebsd.org
Subject: Re: MFC of "Large set of CAM improvements" breaks I/O to Adaptec 29160 SCSI controller
Message-Id: <76C33FA5-993A-4D23-8ECB-F0913E77A677@samsco.org>
References: <4BD6F266.5080403@feral.com> <4BD74535.4060503@feral.com>

On Apr 27, 2010, at 2:20 PM, Andy Farkas wrote:

> On Wed, Apr 28, 2010 at 6:12 AM, Matthew Jacob wrote:
>
>> Does anything time out (eventually)?
>
> No. I left it sitting overnight and it was still deadlocked
> in the morning...
>

A couple of possible scenarios here:

1. A command completed with an error, that error was reported up to the
   periph layer, and the periph failed to handle it properly, leading to
   a lost command that eventually livelocked the VM/block layer.

2. An error happened in the transport layer, and the aic7xxx driver tried
   to freeze the CAM queues to perform error recovery. Something broke in
   the freeze/unfreeze API, so the aic7xxx was left stranded.

The more I think about it, the more likely it is case 2, since I know that
Alexander has been working in or near that code.

Scott
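
To make scenario 2 concrete, here is a minimal toy model of a queue guarded by a freeze count. It is only a sketch of the general freeze/release pattern, not the CAM code or the aic7xxx driver: the names ToyQueue, freeze, release, submit, and recover are all hypothetical and chosen for illustration. It shows how a freeze that is never matched by a release leaves every later command stuck in the queue with nothing left to time out.

```python
from collections import deque


class ToyQueue:
    """Toy command queue guarded by a freeze count (hypothetical, not CAM code)."""

    def __init__(self):
        self.freeze_count = 0   # dispatch is blocked while this is > 0
        self.pending = deque()  # commands waiting to be dispatched
        self.completed = []     # commands that were dispatched

    def freeze(self):
        # Each freeze must later be balanced by exactly one release.
        self.freeze_count += 1

    def release(self):
        # Drop one freeze reference, then try to dispatch queued commands.
        if self.freeze_count > 0:
            self.freeze_count -= 1
        self.run()

    def submit(self, cmd):
        self.pending.append(cmd)
        self.run()

    def run(self):
        # Dispatch only while no one holds a freeze reference.
        while self.freeze_count == 0 and self.pending:
            self.completed.append(self.pending.popleft())


def recover(queue, release_after=True):
    """Simulated error recovery: freeze, do the work, then (normally) release."""
    queue.freeze()
    # ... recovery work would happen here ...
    if release_after:
        queue.release()
```

Calling recover(q, release_after=False) models the suspected bug: the freeze count stays at 1, so anything submitted afterwards sits in q.pending until some other caller issues the missing release.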