Skip site navigation (1)Skip section navigation (2)
Date:      Tue, 9 Dec 2014 11:08:57 +0100
From:      Kai Gallasch <k@free.de>
To:        freebsd-stable@freebsd.org
Subject:   Re: 10.1 RC4 r273903 - zpool scrub on ssd mirror - ahci command timeout
Message-ID:  <20141209110857.6bb9bcbd@orwell>
In-Reply-To: <20141209090157.M14881@martymac.org>
References:  <20141106003240.344dedf6@orwell> <545AB64F.1060502@multiplay.co.uk> <20141106012739.509b96b5@orwell> <545ACCEF.5000300@multiplay.co.uk> <20141209093405.6dd2c268@orwell> <20141209090157.M14881@martymac.org>

next in thread | previous in thread | raw e-mail | index | archive | help
--Sig_/4ki=HT6Fq8P1xA3z3vESpzF
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: quoted-printable

Am Tue,  9 Dec 2014 09:04:26 +0000 (UTC)
schrieb "Ganael LAPLANCHE" <ganael.laplanche@martymac.org>:

> On Tue, 9 Dec 2014 09:34:05 +0100, Kai Gallasch wrote
>=20
> Hi Kai,
>=20
> > Any ideas (left) ?
>=20
> There is a PR for AHCI timeouts :
>=20
> https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=3D195349
>=20
> I don't know if it is related to your problem but maybe you can try
> the suggested workaround ?

Thank you for this information.
But no. My problem seems to be unrelated..

K.


echo 'hint.ahci.0.msi=3D"0"' >> /boot/loader.conf

After reboot:

# zpool scrub ssdpool
# zpool status ssdpool
  pool: ssdpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are
unaffected. action: Determine if the device needs to be replaced, and
clear the errors using 'zpool clear' or replace the device with 'zpool
replace'. see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub in progress since Tue Dec  9 10:36:24 2014
        5.36G scanned out of 115G at 166M/s, 0h11m to go
        24.5K repaired, 4.65% done
config:

	NAME              STATE     READ WRITE CKSUM
	ssdpool           ONLINE       0     0     0
	  mirror-0        ONLINE       0     0     0
	    gpt/ssdpool0  ONLINE       0     0    13  (repairing)
	    gpt/ssdpool1  ONLINE       0     0     0

errors: No known data errors


After the zpool scrub finished:

# zpool status ssdpool
  pool: ssdpool
 state: ONLINE
status: One or more devices has experienced an unrecoverable error.  An
	attempt was made to correct the error.  Applications are
unaffected.
action: Determine if the device needs to be replaced, and clear the
errors
	using 'zpool clear' or replace the device with 'zpool replace'.
   see: http://illumos.org/msg/ZFS-8000-9P
  scan: scrub repaired 38.5K in 0h9m with 0 errors on Tue Dec  9
10:45:58 2014
config:

	NAME              STATE     READ WRITE CKSUM
	ssdpool           ONLINE       0     0     0
	  mirror-0        ONLINE       0     0     0
	    gpt/ssdpool0  ONLINE       0     0    15
	    gpt/ssdpool1  ONLINE       0     0     4



# zpool clear ssdpool
# zpool scrub ssdpool

This "zpool scrub" run one SSD drive is lost during the scrub :-/
A "camcontrol rescan all" does not bring the missing ssd drive back..


# zpool status ssdpool
  pool: ssdpool
 state: DEGRADED
status: One or more devices could not be opened.  Sufficient replicas
exist for
	the pool to continue functioning in a degraded state.
action: Attach the missing device and online it using 'zpool online'.
   see: http://illumos.org/msg/ZFS-8000-2Q
  scan: scrub canceled on Tue Dec  9 10:58:42 2014
config:

	NAME                     STATE     READ WRITE CKSUM
	ssdpool                  DEGRADED     0     0     0
	  mirror-0               DEGRADED     0     0     0
	    gpt/ssdpool0         ONLINE       0     0     0
	    2481016284460057031  UNAVAIL    297   215    47
	was /dev/gpt/ssdpool1



ahcich3: Timeout on slot 24 port 0
ahcich3: is 00000000 cs fc00001f ss ff00001f rs ff00001f tfd 40 serr 000000=
00 cmd 0024d917
(ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 24 1b 4a c6 40 1e 00 00 00=
 00 00
(ada3:ahcich3:0:0:0): CAM status: Command timeout
(ada3:ahcich3:0:0:0): Retrying command
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0=
0 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Retrying command
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0=
0 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0=
0 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retry was blocked
ada3 at ahcich3 bus 0 scbus3 target 0 lun 0
ada3: <Samsung SSD 850 PRO 512GB EXM01B6Q> s/n S1SXNSAFA06835A detached
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0=
0 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Retrying command
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): ATA_IDENTIFY. ACB: ec 00 00 00 00 40 00 00 00 00 0=
0 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Poll timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(ada3:ahcich3:0:0:0): SETFEATURES ENABLE RCACHE. ACB: ef aa 00 00 00 40 00 =
00 00 00 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Poll timeout on slot 4 port 0
ahcich3: is 00000000 cs 00000010 ss 00000000 rs 00000010 tfd 80 serr 000000=
00 cmd 0024c417
(aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted
ahcich3: Timeout on slot 4 port 0
ahcich3: is 00000000 cs 0001fff0 ss 0001fff0 rs 0001fff0 tfd 80 serr 000000=
00 cmd 0024c417
(ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 24 1b 4a c6 40 1e 00 00 00=
 00 00
(ada3:ahcich3:0:0:0): CAM status: Command timeout
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 15 3f 4a c6 40 1e 00 00 00=
 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 08 ed c4 21 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): READ_FPDMA_QUEUED. ACB: 60 00 71 19 d3 40 04 00 00 01=
 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 f0 bb 43 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 45 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 46 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 47 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 48 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 49 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4a 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4b 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): WRITE_FPDMA_QUEUED. ACB: 61 e8 bb 4c 23 40 20 00 00 0=
0 00 00
(ada3:ahcich3:0:0:0): CAM status: Unconditionally Re-queue Request
(ada3:ahcich3:0:0:0): Error 5, Periph was invalidated
(ada3:ahcich3:0:0:0): Periph destroyed
ahcich3: AHCI reset: device not ready after 31000ms (tfd =3D 00000080)
ahcich3: Poll timeout on slot 16 port 0
ahcich3: is 00000000 cs 00010000 ss 00000000 rs 00010000 tfd 80 serr 000000=
00 cmd 0024d017
(aprobe0:ahcich3:0:0:0): NOP. ACB: 00 00 00 00 00 00 00 00 00 00 00 00
(aprobe0:ahcich3:0:0:0): CAM status: Command timeout
(aprobe0:ahcich3:0:0:0): Error 5, Retries exhausted


--=20
PGP-KeyID =3D 0xE401B671927D4A5C
I am a robot.



--Sig_/4ki=HT6Fq8P1xA3z3vESpzF
Content-Type: application/pgp-signature; name=signature.asc
Content-Disposition: attachment; filename=signature.asc

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1

iQIcBAEBCgAGBQJUhso5AAoJEHBlTXxPsfWIUfAP/3PBpfvPXY/eCbXtF5Kh+08+
pTASCmwlAjyB2HGQG3MkknyhcDzLU9Anaht+R4qAiUF579i5Zzn/2Vg/516h+Fon
HSIZPQUOrrvKbAqY5gAaAwLWiIBtphL1jt1tZBtLlDLA0cZ6DsRa7XWlsCrtoraV
G54kTGczCc3gZR7m7QJr0b9CPA8DNp0dEUZVUpIN3fq7NRWXkp88TY7AJyHclkPz
LHH8DMK8/uLwwCFkrmUIpYRdAsu1VdvWvYMDiFa36v+ZbbG+f1HPyJa6GiSIHX3C
Tdrb5KWij8eNdRXFsoil4dTIQXfqdKOfgcVZVXI7cztI5LBqUrHNMp7hxgHvF8QT
u7XPn+GhBMkbPCGZJCu5cl37gz7Mom2mHa96eMTj8Ezcbg8OJCtPCdkluDwEpaNK
mxE8OprktNDRD9jcCUShC6ZzisEp0phWunobaO74pyZ3GSOwHnJBx8YOR56Gb7x9
KwF033fi3TRGvf32P0GlmQbJc2U1/q9tNmaoeubOsIjOliUE8EBKP/vu+zSj+3sY
iwpvIjJFhhs42iAPrmESJzk3kwjZl2ZSHMLrQHLRlX07q1GVg6a5xKGJY1DyXtLR
QA0XWbcKabXEb+Yt/AkMI1CKQp6fvJoJVcFij03Ln8iQ96Iiu8tlxNnkbsEsqRQ4
NRaJ8rchIfPs0HlF0Hqv
=MglT
-----END PGP SIGNATURE-----

--Sig_/4ki=HT6Fq8P1xA3z3vESpzF--



Want to link to this message? Use this URL: <https://mail-archive.FreeBSD.org/cgi/mid.cgi?20141209110857.6bb9bcbd>