From owner-freebsd-stable@FreeBSD.ORG Fri Jan 14 01:35:32 2005
Date: Thu, 13 Jan 2005 17:35:16 -0800 (PST)
From: Doug White
To: Tony Byrne
cc: freebsd-stable@freebsd.org
Subject: Re: MegaRAID 'Bad Slot' Kernel message and crash.
In-Reply-To: <1433078378.20050111134014@byrnehq.com>
Message-ID: <20050113172415.E13904@carver.gumbysoft.com>
References: <1433078378.20050111134014@byrnehq.com>

On Tue, 11 Jan 2005, Tony Byrne wrote:

> Basically, after some amount of uptime the kernel will emit a "amr0:
> Bad slot x completed" message and pretty soon after this the box goes into a
> partially unresponsive state forcing us to reboot it. So far the only
> thing triggering the problem is the nightly jobs, where the amount of
> IO is higher than during the day.

scottl has been able to reproduce this on a U320 controller he has. I only
have U160 equipment and can't get the txn rate up high enough to reproduce
the issue.
The driver needs KTR instrumentation so we can see where the bad slot is
popping up from. The "bad slot" message appears when the controller returns
completion for a command that had already completed.

The amr driver has several other issues and is in dire need of an overhaul.
Unfortunately LSI has not been forthcoming with documentation, so Scott and
I are pretty much scratching our heads without knowing where to go.

This is in 5.X and HEAD, at least. I can't comment on 4.x.

-- 
Doug White                    |  FreeBSD: The Power to Serve
dwhite@gumbysoft.com          |  www.FreeBSD.org
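
For readers unfamiliar with the "bad slot" condition described above, the
following is a minimal userland sketch of the idea, not the amr driver's
actual code. The names NSLOTS, slot_busy, submit() and complete() are all
hypothetical and chosen only for illustration. It assumes each outstanding
command occupies a numbered slot, and that a completion arriving for a slot
with no outstanding command is what gets reported as a bad slot.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical slot count; the real controller limit is not stated here. */
#define NSLOTS 8

/* Hypothetical per-slot state: true while a command is outstanding. */
static bool slot_busy[NSLOTS];

/* Mark a slot busy when a command is handed to the controller.
 * Returns 0 on success, -1 if the slot is invalid or already in use. */
static int submit(int slot)
{
    if (slot < 0 || slot >= NSLOTS || slot_busy[slot])
        return -1;
    slot_busy[slot] = true;
    return 0;
}

/* Handle a completion reported by the controller.
 * Returns 0 for a normal completion, -1 when the slot has no outstanding
 * command (for example, because it has already been completed once). */
static int complete(int slot)
{
    if (slot < 0 || slot >= NSLOTS || !slot_busy[slot]) {
        fprintf(stderr, "Bad slot %d completed\n", slot);
        return -1;
    }
    slot_busy[slot] = false;
    return 0;
}
```

In this sketch, calling submit(3), then complete(3) twice, makes the second
complete(3) take the bad-slot path, since the slot was already released by
the first completion. Tracing where that second completion originates is the
kind of question the KTR instrumentation mentioned above would help answer.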