From: Jeremy Chadwick
To: freebsd-stable@freebsd.org
Date: Sat, 20 Feb 2010 15:35:46 -0800
Subject: Re: panic - sleeping thread on FreeBSD 8.0-stable / amd64
List-Id: Production branch of FreeBSD source code

On Sat, Feb 20, 2010 at 10:49:59PM +0100, Torfinn Ingolfsen wrote:
> On Sat, 20 Feb 2010 11:37:18 -0800
> Jeremy Chadwick wrote:
>
> > Can you re-run smartctl -a instead of -H?  Some of the SMART attributes
> > may help determine what's going on, or there may be related errors in
> > the SMART error log.
>
> smartctl -a output attached. Test sequence: ad4 - ad12, ada0.

Most of your disks look to be in decent shape.  Well, that is to say,
all of them should be working fine; I don't see anything that's of
major, or even minor, concern.  Others might focus on Attributes 191 or
195, but neither of those is absurdly high given the number of hours
these disks have been in use (see Attribute 9).

> > Otherwise I'd say what's happening is a SATA controller lock-up of some
> > sort, since it happens on any of your channels.  Could be a quirk of
> > some kind in the SATA->CAM stuff (unless it also happens when using pure
> > ata(4)).
>
> I am running a quite recent 8.0-stable:
> root@kg-f2# uname -a
> FreeBSD kg-f2.kg4.no 8.0-STABLE FreeBSD 8.0-STABLE #2: Sun Jan 31 18:39:17 CET 2010 root@kg-f2.kg4.no:/usr/obj/usr/src/sys/GENERIC amd64
>
> Perhaps I should upgrade.
>
> > What controller are these disks hooked to again?
>
> Six of the disks (ad4, ad6, ad8, ad10, ad12) are connected to the SATA ports on the motherboard:
> root@kg-f2# pciconf -lv | grep ata -A 4
> atapci0@pci0:0:17:0: class=0x010601 card=0xb0021458 chip=0x43911002 rev=0x00 hdr=0x00
>     vendor     = 'ATI Technologies Inc. / Advanced Micro Devices, Inc.'
>     device     = 'SB700 SATA Controller [AHCI mode]'
>     class      = mass storage
>     subclass   = SATA

Let's backtrack a bit.  I've gone back and read through all of your
previous posts on this matter, and so far all the problems are happening
on ata5 and ata6.  No timeouts or anomalies have appeared on any other
ports -- just those two.
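
If you want to double-check that tally yourself, something like the
following would count timeout messages per channel.  Treat it as a rough
sketch: I'm assuming the relevant log lines contain the channel name
followed by a colon (e.g. "ata5:") and the word "timeout" somewhere, and
count_ata_timeouts is just a name I made up for illustration.  Point it
at wherever your kernel messages are logged (/var/log/messages on a
default install).

```shell
# Hypothetical helper: count log lines that mention a timeout, per ataN channel.
# Assumes each relevant line contains "ataN:" and the word "timeout"
# (case-insensitive); adjust the patterns if your messages look different.
count_ata_timeouts() {
    # $1: path to the log file to scan
    awk '
        tolower($0) ~ /timeout/ && match($0, /ata[0-9]+:/) {
            # RLENGTH - 1 drops the trailing colon from the match
            chan = substr($0, RSTART, RLENGTH - 1)
            count[chan]++
        }
        END { for (c in count) print c, count[c] }
    ' "$1" | sort
}

# Example invocation (commented out so the snippet has no side effects):
# count_ata_timeouts /var/log/messages
```

Each output line is a channel name followed by the number of matching
lines, so anything other than ata5 and ata6 showing up would be worth a
second look.
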
The kernel error messages indicate that commands submitted to the
controller took longer than 10 seconds to get a response, so the OS does
a force-reset of the ports in an attempt to get things working again.

We can safely rule out the Silicon Image controller (otherwise "ataX"
wouldn't be involved), which leaves the AMD SB700 SATA controller and
the AMD SB700 PATA controller.

What exact disks (e.g. adX) are attached to ata5 and ata6?  You haven't
provided dmesg output in any of your posts, and atacontrol/pciconf is
not sufficient (I should really improve atacontrol by printing this
information.  I'll work on that in a few minutes).

Some Linux users have reported AHCI-related issues with the SB600
southbridge, but the core of the problem turned out to be MSI on certain
AMD northbridges (specifically RS480, RS400, and RS200).  By disabling
MSI entirely they were able to achieve stability.  The FreeBSD
equivalent would be to set the following in loader.conf and reboot:

hw.pci.enable_msix="0"
hw.pci.enable_msi="0"

The Linux quirk fix for this:

http://git.kernel.org/?p=linux/kernel/git/stable/stable-queue.git;a=blob_plain;f=queue-2.6.21/pci-quirks-disable-msi-on-rs400-200-and-rs480.patch;hb=05ab505f2909acf3a614d3e6a32271c4c1f8a69d

Your board has an AMD 740G northbridge, but it might be worth trying the
MSI disable trick anyway.  If it doesn't fix the problem then definitely
re-enable MSI.

Isn't hardware fun?  ;-)

-- 
| Jeremy Chadwick                                   jdc@parodius.com |
| Parodius Networking                       http://www.parodius.com/ |
| UNIX Systems Administrator                  Mountain View, CA, USA |
| Making life hard for others since 1977.              PGP: 4BD6C0CB |