From owner-freebsd-stable@FreeBSD.ORG Tue Feb 14 16:50:31 2012
Date: Tue, 14 Feb 2012 08:50:29 -0800
From: Jeremy Chadwick
To: Claudius Herder
Cc: freebsd-stable@freebsd.org
Subject: Re: problems with AHCI on FreeBSD 8.2
Message-ID: <20120214165029.GA1852@icarus.home.lan>
In-Reply-To: <4F3A83DE.3000200@ambtec.de>
References: <20120214091909.GP2010@equilibrium.bsdes.net>
 <20120214100513.GA94501@icarus.home.lan>
 <20120214135435.GQ2010@equilibrium.bsdes.net>
 <20120214141601.GA98986@icarus.home.lan>
 <4F3A83DE.3000200@ambtec.de>

On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> Hello,
>
> I have got a quite similar problem with AHCI on FreeBSD 8.2 and it still
> persists on FreeBSD 9.0 release.
>
> Switching from ahci to ataahci resolved the problem for me too.
>
> I'm using gmirror for swap, system is on a zpool and the problem first
> occurred during a zpool scrub, but it is easily reproducible with dd.
>
> The timeouts only occur when writing to disks, dd if=/dev/ada{0|1}
> of=/dev/null is not an issue.
> Sometimes I need to power off the server because after a reboot one disk
> is still missing.
>
> I really would like to help in this issue, so let me know if you need
> any more information.

I find it interesting that, at least so far, the only people reporting
problems of this type with the ahci.ko driver are people using Samsung
disks.  The only difference is that your models are F1s while the OP's
are F2s.

The only explanation I can think of is that the ahci.ko driver may have
stricter timeouts than the ata driver (ata driver includes ataahci;
ataahci.ko != ahci.ko, as you know).  You may be able to adjust these
using loader.conf variables:

kern.cam.ada.default_timeout
kern.cam.ada.retry_count

I also imagine that hint.ahci.X.ccc might have some involvement here,
but it's something I am not familiar with.  mav@ would need to comment
on this -- it's outside of my familiarity scope.

Furthermore, in your case, your ada1 disk has serious CRC-related
problems, and your ada0 disk has seen similar problems, just at a much
lower rate.  ada1 should probably be replaced (along with cables,
dusting out SATA ports, etc.), but keeping ada0 is probably fine.  The
statistics for these are shown in the "smartctl -l sataphy" output,
field labelled ID 0x0001, "Command failed due to ICRC error".  These are
SATA-level problems or physical problems which will manifest themselves
as anomalies during any kind of I/O.

The counters shown in ID 0x000a and 0x0009 are completely fine; these
don't indicate any problems.

Your drives don't support GP log region 0x04, which is why "smartctl -l
devstat" returns the errors it does.
The errors you see coming from the kernel in this situation are 100%
okay/acceptable; the drive itself is literally returning ABRT status to
the inquiry submitted to it.  Different drives from different vendors
behave differently in this regard.

So, what I'm trying to say is, your problem looks different from the
OP's.  Let's not start a big "I have this problem too" thread; that has
happened so many times over the years that when it happens I immediately
bow out + stop participating in the thread.

> smartctl -l sataphy /dev/ada0
>
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          150  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2            3  Command failed due to ICRC error
> 0x0009  2          173  Transition from drive PhyRdy to drive PhyNRdy
>
> smartctl -l sataphy /dev/ada1
>
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2       65535+  Command failed due to ICRC error
> 0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |
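
As a rough illustration of reading the sataphy counters discussed in
this message, here is a minimal Python sketch that pulls the ID 0x0001
("Command failed due to ICRC error") value out of "smartctl -l sataphy"
output.  The function names parse_sataphy and icrc_errors are
hypothetical, the sample text is the ada1 output quoted in the message,
and treating a trailing "+" as "counter has hit its maximum" is an
assumption about smartctl's formatting.

```python
import re

# Sample copied from the "smartctl -l sataphy /dev/ada1" output in the message.
SAMPLE = """\
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
0x0001  2       65535+  Command failed due to ICRC error
0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy
"""

# One counter row: hex ID, size in bytes, value (optional "+"), description.
ROW = re.compile(r"^(0x[0-9a-fA-F]{4})\s+(\d+)\s+(\d+)(\+?)\s+(.*)$")


def parse_sataphy(text):
    """Map counter ID -> (value, saturated, description); hypothetical helper."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            ident, _size, value, plus, desc = m.groups()
            # Assumption: a trailing "+" means the counter reached its ceiling.
            counters[int(ident, 16)] = (int(value), bool(plus), desc)
    return counters


def icrc_errors(text):
    """Return (value, saturated) for ID 0x0001, or None if the row is absent."""
    entry = parse_sataphy(text).get(0x0001)
    return None if entry is None else entry[:2]


if __name__ == "__main__":
    print(icrc_errors(SAMPLE))
```

A counter that is both large and flagged as saturated, as on ada1,
matches the "serious CRC-related problems" case; a small value such as
the 3 seen on ada0 matches the "much lower rate" case.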