From: Kai Gallasch
To: freebsd-stable@freebsd.org
Date: Tue, 9 Dec 2014 09:34:05 +0100
Subject: Re: 10.1 RC4 r273903 - zpool scrub on ssd mirror - ahci command timeout
Message-ID: <20141209093405.6dd2c268@orwell>
In-Reply-To: <545ACCEF.5000300@multiplay.co.uk>
References: <20141106003240.344dedf6@orwell> <545AB64F.1060502@multiplay.co.uk> <20141106012739.509b96b5@orwell> <545ACCEF.5000300@multiplay.co.uk>
Organization: FREE!
On Thu, 06 Nov 2014 01:20:47 +0000, Steven Hartland wrote:

> Try recabling and re-seating, if it still happens try to identify if
> its the disk or backplane by moving it in the chassis. We had a
> machine here recently where it was backplane issue and simply
> replacing it fixed the issue.

Steven,

over the last few weeks I took some time to track down the cause of the AHCI timeouts with the two Samsung SSDs.

Just for the record, here is my original post in the FreeBSD mailing list archive:
http://lists.freebsd.org/pipermail/freebsd-stable/2014-November/080914.html

I changed or tried the following to get rid of the AHCI timeouts, but no luck. They still show up :-/

Hardware:

- Replaced all four SATA cables with the cables from an identical spare server
- Replaced all four SATA cables with certified SATA3 cables
- Replaced the 2.5" -> 3.5" drive converters with ones from another manufacturer
- Replaced the drive backplane of the server
- Hooked the two SSDs up directly to the SATA connectors on the mainboard
- As an experiment, put an LSI 9212-4i4e PCIe SATA/SAS controller into the server and connected the SATA cables to it
- Same as before, but using the certified SATA3 cables
- Same as before, but this time connecting the two SSDs directly to the 9212-4i4e
- Same as before, connecting the two SSDs directly to the 9212-4i4e, but this time with the original SATA cables

BIOS:

- Temporarily disabled Power Management
- Tried disabling the "Enable Hot Plug" option

The difference between using the mainboard's SATA connectors and using the LSI 9212-4i4e is that the LSI controller seems to be pickier about CRC errors on the SATA bus: with it, bus problems show up even without starting a zfs scrub. When scrubbing through the LSI controller there are plenty of timeouts, and in one test one of the SSDs even disappeared from the SATA bus.

Throughout all of this testing, the two Hitachi non-SSD SATA drives did not show any problems at all, although they were also connected to the mainboard or the LSI controller.

So I now think the whole problem centers on the Samsung 850 PRO 512GB SSDs. Too bad I do not have the budget to just buy two Intel (or other) SSDs of similar size and see if the timeouts disappear.

I wonder whether the trouble is caused by a firmware issue in the drive or just by some misguided, fancy energy-saving feature of this particular drive model. Both drives have serial numbers close together, and smartctl reports no errors on either SSD.

Any ideas (left)?

Regards,
Kai.

-- 
PGP-KeyID = 0xE401B671927D4A5C
I am not a robot.