Date: Tue, 9 Dec 2014 09:34:05 +0100
From: Kai Gallasch <k@free.de>
To: freebsd-stable@freebsd.org
Subject: Re: 10.1 RC4 r273903 - zpool scrub on ssd mirror - ahci command
 timeout
Message-ID: <20141209093405.6dd2c268@orwell>
In-Reply-To: <545ACCEF.5000300@multiplay.co.uk>
References: <20141106003240.344dedf6@orwell> <545AB64F.1060502@multiplay.co.uk>
 <20141106012739.509b96b5@orwell> <545ACCEF.5000300@multiplay.co.uk>
MIME-Version: 1.0
Content-Type: multipart/signed; micalg=pgp-sha512;
 boundary="Sig_/YDEQFYLCZWGw5j7u2ru5=EY"; protocol="application/pgp-signature"

--Sig_/YDEQFYLCZWGw5j7u2ru5=EY
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

On Thu, 06 Nov 2014 01:20:47 +0000,
Steven Hartland <killing@multiplay.co.uk> wrote:

> Try recabling and re-seating, if it still happens try to identify if
> it's the disk or backplane by moving it in the chassis. We had a
> machine here recently where it was a backplane issue and simply
> replacing it fixed the issue.

Steven,

Over the last few weeks I have spent some time trying to isolate the
cause of the AHCI timeouts with the two Samsung SSDs.

For reference, here is my original post in the FreeBSD mailing list
archive:

http://lists.freebsd.org/pipermail/freebsd-stable/2014-November/080914.html


I changed or tried the following to get rid of the AHCI timeouts, but
to no avail; they still occur :-/

Hardware:

- Replaced all four SATA cables with cables from an identical spare
  server
- Replaced all four SATA cables with certified SATA3 cables
- Replaced the 2.5" -> 3.5" drive converters with ones from another
  manufacturer
- Replaced the server's drive backplane
- Connected the two SSDs directly to the SATA connectors on the
  mainboard
- As an experiment, installed an LSI 9212-4i4e PCIe SATA/SAS controller
  in the server and connected the SATA cables to it
- Same as before, but using the certified SATA3 cables
- Same as before, but this time connecting the two SSDs directly to the
  9212-4i4e
- Same as before, connecting the two SSDs directly to the 9212-4i4e, but
  this time with the original SATA cables


BIOS:

- Temporarily disabled power management
- Tried disabling the "Enable Hot Plug" option


The difference between using the mainboard's SATA connectors and using
the LSI 9212-4i4e is that the LSI controller seems to be pickier about
CRC errors on the SATA bus, and bus problems show up even without
starting a ZFS scrub. When scrubbing through the LSI controller there
are plenty of timeouts, and in one test one of the SSDs even
disappeared from the SATA bus.

Throughout all of this testing, the two Hitachi (non-SSD) SATA drives
showed no problems at all, even though they were also connected to the
mainboard or the LSI controller during testing.

So I now think the whole problem centers on the Samsung 850 PRO 512GB
SSDs. Unfortunately I do not have the budget to simply buy two Intel (or
other) SSDs of similar size and see whether the timeouts disappear.

I wonder whether this is a firmware issue with the drive or just some
misguided, fancy energy-saving feature of this particular drive model
that is causing all the trouble.

The two drives have serial numbers close to each other, and smartctl
claims there are no errors on either SSD.
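A "no errors" verdict from smartctl does not necessarily mean the drive has seen no link-level trouble. SMART attribute 199 is conventionally the interface CRC error counter (named UDMA_CRC_Error_Count or CRC_Error_Count depending on the drive), and it can keep climbing while the overall health check still passes. The following is a minimal Python sketch, assuming the standard attribute table printed by `smartctl -A` and that the drive exposes attribute 199; the helper names and the device path /dev/ada0 are hypothetical examples, not anything from the original setup.

```python
import re
import subprocess
from typing import Optional

# Assumption: ID 199 is the interface CRC error counter on these drives.
CRC_ATTRIBUTE_ID = 199


def crc_error_count(smartctl_output: str) -> Optional[int]:
    """Return the raw value of SMART attribute 199, or None if it is absent.

    Expects the table printed by `smartctl -A`, whose rows have the columns:
    ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
    """
    for line in smartctl_output.splitlines():
        fields = line.split()
        # Attribute rows start with a numeric ID and have at least 10 columns.
        if len(fields) >= 10 and fields[0].isdigit() and int(fields[0]) == CRC_ATTRIBUTE_ID:
            # RAW_VALUE may carry trailing text, so keep only the leading digits.
            match = re.match(r"\d+", fields[9])
            return int(match.group()) if match else None
    return None


def read_attributes(device: str) -> str:
    """Run `smartctl -A` on a device (e.g. the hypothetical /dev/ada0) and return its stdout."""
    result = subprocess.run(["smartctl", "-A", device], capture_output=True, text=True, check=False)
    return result.stdout
```

Reading the counter for each SSD before and after a scrub, e.g. `crc_error_count(read_attributes("/dev/ada0"))`, might show whether the CRC errors reported by the controller are also registered on the drive side, which would point at the link rather than the flash itself. This is only a diagnostic aid under the stated assumptions.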

Any ideas (left)?

Regards,
Kai.

-- 
PGP-KeyID = 0xE401B671927D4A5C
I am not a robot.


--Sig_/YDEQFYLCZWGw5j7u2ru5=EY--