From owner-freebsd-stable@FreeBSD.ORG  Tue Feb 14 16:50:31 2012
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Delivered-To: freebsd-stable@freebsd.org
Received: from mx1.freebsd.org (mx1.freebsd.org [IPv6:2001:4f8:fff6::34])
	by hub.freebsd.org (Postfix) with ESMTP id 1B64C106566C
	for <freebsd-stable@freebsd.org>; Tue, 14 Feb 2012 16:50:31 +0000 (UTC)
	(envelope-from jdc@koitsu.dyndns.org)
Received: from qmta10.emeryville.ca.mail.comcast.net
	(qmta10.emeryville.ca.mail.comcast.net [76.96.30.17])
	by mx1.freebsd.org (Postfix) with ESMTP id F0D498FC1F
	for <freebsd-stable@freebsd.org>; Tue, 14 Feb 2012 16:50:30 +0000 (UTC)
Received: from omta08.emeryville.ca.mail.comcast.net ([76.96.30.12])
	by qmta10.emeryville.ca.mail.comcast.net with comcast
	id ZsTl1i0050FhH24AAsqWn9; Tue, 14 Feb 2012 16:50:30 +0000
Received: from koitsu.dyndns.org ([67.180.84.87])
	by omta08.emeryville.ca.mail.comcast.net with comcast
	id ZsqV1i00A1t3BNj8UsqVJe; Tue, 14 Feb 2012 16:50:30 +0000
Received: by icarus.home.lan (Postfix, from userid 1000)
	id 30B5D102C1E; Tue, 14 Feb 2012 08:50:29 -0800 (PST)
Date: Tue, 14 Feb 2012 08:50:29 -0800
From: Jeremy Chadwick <freebsd@jdc.parodius.com>
To: Claudius Herder <claudius@ambtec.de>
Message-ID: <20120214165029.GA1852@icarus.home.lan>
References: <20120214091909.GP2010@equilibrium.bsdes.net>
	<20120214100513.GA94501@icarus.home.lan>
	<20120214135435.GQ2010@equilibrium.bsdes.net>
	<20120214141601.GA98986@icarus.home.lan>
	<4F3A83DE.3000200@ambtec.de>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To: <4F3A83DE.3000200@ambtec.de>
User-Agent: Mutt/1.5.21 (2010-09-15)
Cc: freebsd-stable@freebsd.org
Subject: Re: problems with AHCI on FreeBSD 8.2
X-BeenThere: freebsd-stable@freebsd.org
X-Mailman-Version: 2.1.5
Precedence: list
List-Id: Production branch of FreeBSD source code <freebsd-stable.freebsd.org>
List-Unsubscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>, 
	<mailto:freebsd-stable-request@freebsd.org?subject=unsubscribe>
List-Archive: <http://lists.freebsd.org/pipermail/freebsd-stable>
List-Post: <mailto:freebsd-stable@freebsd.org>
List-Help: <mailto:freebsd-stable-request@freebsd.org?subject=help>
List-Subscribe: <http://lists.freebsd.org/mailman/listinfo/freebsd-stable>,
	<mailto:freebsd-stable-request@freebsd.org?subject=subscribe>
X-List-Received-Date: Tue, 14 Feb 2012 16:50:31 -0000

On Tue, Feb 14, 2012 at 04:55:10PM +0100, Claudius Herder wrote:
> 
> Hello,
> 
> I have quite a similar problem with AHCI on FreeBSD 8.2, and it still
> persists on FreeBSD 9.0-RELEASE.
> 
> Switching from ahci to ataahci resolved the problem for me too.
> 
> I'm using gmirror for swap, system is on a zpool and the problem first
> occurred during a zpool scrub, but it is easily reproducible with dd.
> 
> The timeouts only occur when writing to the disks; reading with
> dd if=/dev/ada{0|1} of=/dev/null is not an issue.
> Sometimes I need to power off the server because after a reboot one disk
> is still missing.
> 
> I would really like to help with this issue, so let me know if you need
> any more information.

I find it interesting that, at least so far, the only people reporting
problems of this type with the ahci.ko driver are people using Samsung
disks.  The only difference is that your models are F1s while the OP's
are F2s.

The only explanation I can think of is that ahci.ko may have stricter
timeouts than the ata driver (the ata driver includes ataahci;
ataahci.ko != ahci.ko, as you know).

You may be able to adjust these using loader.conf variables:

kern.cam.ada.default_timeout
kern.cam.ada.retry_count
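A rough sketch of what that could look like, written in Python purely
for illustration: it renders the two tunables in /boot/loader.conf's
name="value" syntax and estimates how long a single command might stall
before the kernel gives up.  The helper names are hypothetical, the
values (60 seconds, 6 retries) are arbitrary assumptions rather than
recommendations, and the worst-case figure assumes each attempt (the
first plus every retry) can run until the timeout expires, ignoring
reset/recovery time.  Check your current values first with
"sysctl kern.cam.ada.default_timeout" and "sysctl kern.cam.ada.retry_count".

```python
# Hypothetical helpers: render loader.conf lines for the two CAM ada(4)
# tunables and estimate a rough worst-case stall per command.
# The example values in main are assumptions, not recommendations.


def loader_conf_lines(timeout_s: int, retries: int) -> list[str]:
    """Return lines in /boot/loader.conf syntax (name="value")."""
    if timeout_s <= 0 or retries < 0:
        raise ValueError("timeout must be positive and retries non-negative")
    return [
        f'kern.cam.ada.default_timeout="{timeout_s}"',
        f'kern.cam.ada.retry_count="{retries}"',
    ]


def worst_case_seconds(timeout_s: int, retries: int) -> int:
    """Rough upper bound: the first attempt plus each retry may each run
    until the timeout expires (bus reset/recovery time is ignored)."""
    return timeout_s * (retries + 1)


if __name__ == "__main__":
    for line in loader_conf_lines(60, 6):
        print(line)
    print("worst case roughly", worst_case_seconds(60, 6), "seconds")
```

The point of the arithmetic is that raising both knobs multiplies: a
longer timeout with more retries can make a genuinely failing disk hang
I/O for minutes instead of failing fast.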

I also imagine that hint.ahci.X.ccc might have some involvement here,
but I'm not familiar with it; mav@ would need to comment on this.

Furthermore, in your case, your ada1 disk has serious CRC-related
problems, and your ada0 disk has seen the same kind of errors, just at a
much lower rate.  ada1 should probably be replaced (replace its cable as
well and dust out the SATA ports, etc.), but keeping ada0 is probably
fine.  These statistics are shown in the "smartctl -l sataphy" output,
in the counter with ID 0x0001, "Command failed due to ICRC error".  They
indicate SATA link-level or physical problems, which will manifest
themselves as anomalies during any kind of I/O.

The counters shown in IDs 0x000a and 0x0009 are completely fine; they
don't indicate any problems.
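To make the reading of those counters concrete, here's a minimal Python
sketch (parse_sataphy and icrc_errors are hypothetical names) that pulls
the rows out of "smartctl -l sataphy" text like the output quoted at the
bottom of this mail and reports the ICRC counter, ID 0x0001.  It assumes
the column layout shown there (ID, Size in bytes, Value, Description).
My understanding is that smartctl appends "+" when a counter has hit its
maximum; the sketch also treats a value equal to the maximum for its size
as saturated.  A 2-byte counter tops out at 65535, so ada1's real ICRC
count is at least that.

```python
import re

# One counter row of `smartctl -l sataphy` output, e.g.
#   0x0001  2        65535+ Command failed due to ICRC error
# Groups: counter ID, size in bytes, value, optional "+" marker, description.
ROW = re.compile(r"^(0x[0-9a-fA-F]+)\s+(\d+)\s+(\d+)(\+?)\s+(.+)$")

ICRC_ID = 0x0001  # "Command failed due to ICRC error"


def parse_sataphy(text: str) -> dict[int, tuple[int, int, bool, str]]:
    """Map counter ID -> (size_bytes, value, saturated, description)."""
    counters = {}
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if not m:
            continue  # skips the title line, column headers, blank lines
        size = int(m.group(2))
        value = int(m.group(3))
        # Saturated if smartctl printed "+" or the value is the counter's max.
        saturated = m.group(4) == "+" or value == (1 << (8 * size)) - 1
        counters[int(m.group(1), 16)] = (size, value, saturated, m.group(5))
    return counters


def icrc_errors(counters) -> tuple[int, bool]:
    """Return (ICRC error count, saturated?); (0, False) if the row is absent."""
    if ICRC_ID not in counters:
        return 0, False
    _, value, saturated, _ = counters[ICRC_ID]
    return value, saturated


if __name__ == "__main__":
    ada1 = """\
SATA Phy Event Counters (GP Log 0x11)
ID      Size     Value  Description
0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
0x0001  2        65535+ Command failed due to ICRC error
0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy
"""
    print(icrc_errors(parse_sataphy(ada1)))  # (65535, True)
```

Run against ada0's output the same way, it reports (3, False): a handful
of errors, nowhere near saturation.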

Your drives don't support GP log 0x04, which is why "smartctl -l
devstat" returns the errors it does.  The errors you see coming from the
kernel in this situation are completely harmless; the drive itself is
literally returning ABRT status to the request submitted to it.  Drives
from different vendors behave differently in this regard.

So, what I'm trying to say is that your problem looks different from
the OP's.  Let's not start a big "I have this problem too" thread; that
has happened so many times over the years that when it happens I
immediately bow out and stop participating in the thread.

> smartctl -l sataphy /dev/ada0
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          150  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2            3  Command failed due to ICRC error
> 0x0009  2          173  Transition from drive PhyRdy to drive PhyNRdy
> 
> smartctl -l sataphy /dev/ada1
> 
> SATA Phy Event Counters (GP Log 0x11)
> ID      Size     Value  Description
> 0x000a  2          155  Device-to-host register FISes sent due to a COMRESET
> 0x0001  2        65535+ Command failed due to ICRC error
> 0x0009  2          178  Transition from drive PhyRdy to drive PhyNRdy

-- 
| Jeremy Chadwick                                 jdc@parodius.com |
| Parodius Networking                     http://www.parodius.com/ |
| UNIX Systems Administrator                 Mountain View, CA, US |
| Making life hard for others since 1977.             PGP 4BD6C0CB |