Date:      Tue, 09 Feb 2021 14:14:48 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 253373] mps(4) drive status request during spin-up timeout results in zpool degradation
Message-ID:  <bug-253373-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=253373

            Bug ID: 253373
           Summary: mps(4) drive status request during spin-up timeout
                    results in zpool degradation
           Product: Base System
           Version: 12.2-RELEASE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Only Me
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: grembo@FreeBSD.org

Short description:
------------------

When a drive spins up while `camcontrol epc -c status' is run on the drive, a
command timeout happens that results in the drive being unavailable temporarily
and marked as FAULTED in the zpool.

Increasing MPS_ATA_ID_TIMEOUT (in code, not tunable) might be a workaround.

Long version:
-------------

I've started seeing this after updating from 11.3 to 12.1 and, subsequently,
12.2. It's not necessarily caused by the update itself, but it might be related
to switching to `camcontrol epc' after the update - I used `camcontrol cmd -a
"E5 00..."' beforehand.

I have an LSI (Avago/Broadcom) SAS2008 based controller with two spindles
connected to it (Seagate IronWolf) storing data only (system drives are
separate).

As this is a home NAS, the drives are configured to spin down when idle for
more than 30 minutes, using camcontrol:

  /sbin/camcontrol standby /dev/da2 -v -t 1800
  /sbin/camcontrol standby /dev/da3 -v -t 1800

There's a cron job running every ten minutes logging the spin state of the
drives using camcontrol:

  camcontrol epc da2 -c status -P
  camcontrol epc da3 -c status -P
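For reference, the cron side of this setup can be sketched as a crontab
fragment (the log destination is an assumption on my part, not taken from the
actual setup):

```shell
# Hypothetical crontab fragment for the 10-minute spin-state check;
# /var/log/spinstate.log is an assumed destination.
*/10 * * * * /sbin/camcontrol epc da2 -c status -P >> /var/log/spinstate.log 2>&1
*/10 * * * * /sbin/camcontrol epc da3 -c status -P >> /var/log/spinstate.log 2>&1
```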

All of this works just fine, but once in a while the zpool goes to a DEGRADED
state because one of the drives is in a FAULTED state (too many errors).

Based on the logs and the timestamps therein, this seems to happen when a
drive spins up at exactly the same time as an EPC status check is performed.

So the order of events is like this:

  1. System is idle, drives are on standby
  2. Cronjob checks drive spin state using camcontrol epc status
  3. At the same time, a disk access happens, causing the drives in the zpool
to spin up
  4. Spinning up of the drives delays reporting the spin state
  5. The timeout causes the mps driver to abort the command; subsequent
requests to the drive fail
  6. Failed requests to the drive cause it to go to a FAULTED state and degrade
the zpool
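The race in steps 2-4 above should, in principle, be forceable on demand with
something along these lines (an untested sketch - the device name, timings,
and read offset are assumptions; don't point it at a drive holding data you
care about):

```shell
#!/bin/sh
# Hypothetical reproduction sketch for the spin-up/EPC-status race.
# Assumes da2 is one of the affected pool drives.
DEV=da2

/sbin/camcontrol standby "$DEV" -v        # force the drive into standby now
sleep 10                                  # give it time to actually spin down

# Trigger a read (which forces a spin-up) and an EPC status query at
# almost the same time, mimicking the cron job racing a zpool access.
dd if=/dev/"$DEV" of=/dev/null bs=1m count=1 &
/sbin/camcontrol epc "$DEV" -c status -P  # lands inside the spin-up window
wait
```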

This happens every few weeks by chance; e.g., today I accessed a file at
exactly the right moment, which was recorded in the local webserver's error
log, so everything matched up.

This is the log output. First, the drive status inquiry (which was started at
10:40:00) fails:

  Feb  9 10:40:05 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 803 Command timeout on target 2(0x0009) 5000 set, 5.388682389 elapsed

Then, the command is aborted:

  Feb  9 10:40:05 nas kernel: mps0: Sending abort to target 2 for SMID 803
  Feb  9 10:40:05 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 803 Aborting command 0xfffffe00026c5708
  Feb  9 10:40:08 nas kernel: mps0: Controller reported scsi ioc terminated tgt 2 SMID 369 loginfo 31140000
  Feb  9 10:40:08 nas kernel: mps0: Finished abort recovery for target 2
  Feb  9 10:40:08 nas kernel: mps0: Unfreezing devq for target ID 2

At 10:40:09 the webserver's log shows a 502 error (the request for the file
hung for a couple of seconds beforehand, so the two events clearly overlapped -
I can't say whether the spin-up/webserver request started shortly before the
status request or whether it was the other way round).

Following this, syslog shows a ton of messages about the drive not being ready:

  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): READ(10). CDB: 28 00 e4 00 4a 40 00 00 20 00
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): CAM status: CCB request completed with an error
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): Retrying command, 9 more tries remain
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): READ(10). CDB: 28 00 e4 00 4a 40 00 00 20 00
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): CAM status: SCSI Status Error
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): SCSI status: Check Condition
  Feb  9 10:40:08 nas kernel: (da2:mps0:0:2:0): SCSI sense: NOT READY asc:4,0 (Logical unit not ready, cause not reportable)
  ...
  Feb  9 10:40:09 nas kernel: (da2:mps0:0:2:0): Error 5, Retries exhausted
  ...

Until ZFS realizes the vdev has issues:

  Feb  9 10:40:09 nas ZFS[38943]: vdev state changed, pool_guid=$xxxx vdev_guid=$xxxx

After that there are more NOT READY errors, and another timeout/abort:

  Feb  9 10:40:14 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 724 Command timeout on target 2(0x0009) 5000 set, 5.353551387 elapsed
  Feb  9 10:40:14 nas kernel: mps0: Sending abort to target 2 for SMID 724
  Feb  9 10:40:14 nas kernel:       (pass2:mps0:0:2:0): INQUIRY. CDB: 12 01 00 00 ff 00 length 255 SMID 724 Aborting command 0xfffffe00026bece0
  Feb  9 10:40:14 nas kernel: mps0: Finished abort recovery for target 2
  Feb  9 10:40:14 nas kernel: mps0: Unfreezing devq for target ID 2

Afterwards, the drive is just fine, and running "zpool clear" + scrubbing shows
that all is good again - it's also not always the same drive; it just depends
on timing.

Now, the obvious workaround is to stop logging the drive spin state for the
time being, which should fix that specific issue, but I wonder if there is a
proper solution for this (either mps should be able to handle the timeout
scenario gracefully, or it should be possible to avoid it by increasing a
timeout).

I haven't increased kern.cam.scsi_delay yet (it's also 5 seconds, but seems
otherwise unrelated), but looking at the mps code, it seems like increasing
MPS_ATA_ID_TIMEOUT in /usr/src/sys/dev/mps/mpsvar.h might help to avoid this
specific timeout:

  #define MPS_ATA_ID_TIMEOUT     5       /* 5 second timeout for SATA ID cmd */
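If that theory holds, the change would be a one-line edit to that header,
something like the following (the value 30 is an arbitrary guess picked to
ride out a typical spindle spin-up, not a tested or recommended number):

```c
/* Hypothetical edit in sys/dev/mps/mpsvar.h: raise the SATA ID timeout
 * from the stock 5 seconds so it survives a drive spin-up.
 * 30 is a guess, not a tested value. */
#define MPS_ATA_ID_TIMEOUT     30      /* was: 5 second timeout for SATA ID cmd */
```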

I would prefer to hear from someone familiar with the driver before starting to
poke around in there though.

-- 
You are receiving this mail because:
You are the assignee for the bug.


