Date:      Wed, 27 Mar 2024 16:00:06 +0000
From:      bugzilla-noreply@freebsd.org
To:        bugs@FreeBSD.org
Subject:   [Bug 277992] mpr and possible trim issues
Message-ID:  <bug-277992-227@https.bugs.freebsd.org/bugzilla/>

https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=277992

            Bug ID: 277992
           Summary: mpr and possible trim issues
           Product: Base System
           Version: 14.0-STABLE
          Hardware: amd64
                OS: Any
            Status: New
          Severity: Affects Some People
          Priority: ---
         Component: kern
          Assignee: bugs@FreeBSD.org
          Reporter: mike@sentex.net

The thread
https://lists.freebsd.org/archives/freebsd-hardware/2024-March/000094.html has
most of the details.

In summary, a set of WD Blue SA510 SSDs with the latest firmware as of March
2024 will eventually start throwing errors and detach from the controller when
I copy and then destroy a ZFS dataset with several million files. It feels like
a TRIM issue, but I am not sure. Moving the same disks to the onboard SATA
controller does not reproduce the issue.

If I start with a low-level trim (trim -f /dev/daX), create a raidz1 ZFS pool
from four 1 TB WD disks, import a dataset of about 280 GB (compressed) that has
many files (20+ million), do a zfs send original-pool | zfs recv copy-of-pool,
then zfs destroy copy-of-pool and repeat about 4 or 5 times, the drives in the
pool start throwing errors.
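The send/recv/destroy cycle above can be scripted so that every run is identical. This is a minimal Python sketch under stated assumptions: the pool name `tank` and the datasets `tank/orig` and `tank/copy` are hypothetical (the report does not name them), the initial `trim -f` and pool creation are deliberately left out because they are destructive, and the optional `zpool trim -w` step mirrors the manual trim between cycles that the report also tried. By default it only prints the commands.

```python
import subprocess

# Hypothetical names: the report does not give the real pool or dataset names.
POOL = "tank"
SRC = f"{POOL}/orig"    # dataset holding the ~280 GB, 20M+ file tree
COPY = f"{POOL}/copy"   # throwaway copy that is received and then destroyed
SNAP = f"{SRC}@repro"   # snapshot used as the send source


def build_plan(cycles: int, pool_trim: bool = False) -> list[str]:
    """Return the shell commands for `cycles` send/recv/destroy rounds."""
    plan = [f"zfs snapshot -r {SNAP}"]
    for _ in range(cycles):
        plan.append(f"zfs send -R {SNAP} | zfs recv {COPY}")
        plan.append(f"zfs destroy -r {COPY}")
        if pool_trim:
            # -w waits for the manual TRIM to finish before the next cycle
            plan.append(f"zpool trim -w {POOL}")
    return plan


def run_plan(plan: list[str], dry_run: bool = True) -> None:
    for cmd in plan:
        if dry_run:
            print(cmd)
        else:
            # shell=True is needed for the send | recv pipe; check=True stops
            # at the first failure. The pipeline's exit status is that of
            # `zfs recv`, so a failing `zfs send` may go unnoticed here.
            subprocess.run(cmd, shell=True, check=True)


if __name__ == "__main__":
    run_plan(build_plan(cycles=5, pool_trim=False))
```

Setting `dry_run=False` would actually execute the commands, so it should only be pointed at a scratch pool.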

If I do a hard trim of the disks, I can start from scratch and again get 4 or 5
cycles before the errors appear. Hence my suspicion that TRIM is broken
somewhere.

I tried with autotrim on and off, and with a manual zpool trim <pool> between
the zfs send | zfs recv tests, to no avail. When the disks are on the mpr
controller, I get errors such as:
(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ae 28 00 00 08 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 0c cb 3f 00 00 00 e8 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ad 28 00 01 00 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): READ(10). CDB: 28 00 6d e0 ac 28 00 00 f8 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 40 07 df 88 00 01 00 00
(da6:mpr0:0:16:0): CAM status: CCB request completed with an error
(da6:mpr0:0:16:0): Retrying command, 3 more tries remain
(da6:mpr0:0:16:0): WRITE(10). CDB: 2a 00 3f 48 72 08 00 01 00 00
(da6:mpr0:0:16:0): CAM status: SCSI Status Error
(da6:mpr0:0:16:0): SCSI status: Check Condition
(da6:mpr0:0:16:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da6:mpr0:0:16:0): Retrying command (per sense data)
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2036 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 637 loginfo 31110f00
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 42 00 00 01 00 00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1242 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 979 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1243 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2091 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 1612 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2093 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 152 loginfo 31110f00
mpr0: Controller reported scsi ioc terminated tgt 15 SMID 2132 loginfo 31110f00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 43 17 dc 88 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 43 00 00 00 50 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 0c d4 f6 80 00 00 68 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 0c d4 f5 80 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 05 dc 12 28 00 00 f8 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 05 dc 0f b0 00 00 88 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 02 96 7e 80 00 00 10 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): READ(10). CDB: 28 00 6f 5b 8d 68 00 01 00 00
(da5:mpr0:0:15:0): CAM status: CCB request completed with an error
(da5:mpr0:0:15:0): Retrying command, 3 more tries remain
(da5:mpr0:0:15:0): WRITE(10). CDB: 2a 00 41 98 42 00 00 01 00 00
(da5:mpr0:0:15:0): CAM status: SCSI Status Error
(da5:mpr0:0:15:0): SCSI status: Check Condition
(da5:mpr0:0:15:0): SCSI sense: UNIT ATTENTION asc:29,0 (Power on, reset, or bus device reset occurred)
(da5:mpr0:0:15:0): Retrying command (per sense data)
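The CDBs and loginfo values in the log above can be decoded by hand. The Python sketch below assumes the standard SCSI READ(10)/WRITE(10) layout (opcode in byte 0, big-endian LBA in bytes 2 to 5, transfer length in logical blocks in bytes 7 and 8) and, as an assumption, the LSI MPI log-info layout (4-bit type, 4-bit originator, 8-bit code, 16-bit subcode). Under that assumed layout, 31110f00 splits into type 3 (SAS), originator 1 (protocol layer, PL), code 0x11 and subcode 0x0f00; the sketch does not try to name the code or subcode.

```python
# READ(10)/WRITE(10) opcodes from the SCSI block command set.
OPCODES = {0x28: "READ(10)", 0x2A: "WRITE(10)"}


def decode_cdb10(cdb: str) -> tuple[str, int, int]:
    """Decode a 10-byte CDB as printed by CAM, e.g. '28 00 6d e0 ...'.

    Returns (command name, starting LBA, transfer length in logical blocks).
    """
    b = bytes.fromhex(cdb)  # whitespace between byte pairs is ignored
    if len(b) != 10:
        raise ValueError(f"expected 10 bytes, got {len(b)}")
    name = OPCODES.get(b[0], f"opcode 0x{b[0]:02x}")
    lba = int.from_bytes(b[2:6], "big")     # bytes 2..5
    blocks = int.from_bytes(b[7:9], "big")  # bytes 7..8
    return name, lba, blocks


def decode_loginfo(value: int) -> dict[str, int]:
    """Split an mpr loginfo word into fields (assumed LSI MPI layout)."""
    return {
        "type": (value >> 28) & 0xF,        # 3 is assumed to mean SAS
        "originator": (value >> 24) & 0xF,  # 1 is assumed to mean PL
        "code": (value >> 16) & 0xFF,
        "subcode": value & 0xFFFF,
    }


if __name__ == "__main__":
    for cdb in ("28 00 6d e0 ae 28 00 00 08 00",
                "2a 00 41 98 42 00 00 01 00 00"):
        name, lba, blocks = decode_cdb10(cdb)
        print(f"{name}: LBA {lba:#x}, {blocks} blocks")
    print({k: hex(v) for k, v in decode_loginfo(0x31110F00).items()})
```

Assuming 512-byte logical blocks, the 256-block (0x100) transfers are 128 KiB each. The same WRITE at LBA 0x41984200 on da5 appears twice in the log: once among the "ioc terminated" messages and again when the UNIT ATTENTION is reported.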

The same tests with Samsung disks work without issue, or at least I was not
able to reproduce the error.

# mprutil show adapter
mpr0 Adapter:
       Board Name: INSPUR 3008IT
   Board Assembly: INSPUR
        Chip Name: LSISAS3008
    Chip Revision: ALL
    BIOS Revision: 18.00.00.00
Firmware Revision: 16.00.12.00
  Integrated RAID: no
         SATA NCQ: ENABLED
 PCIe Width/Speed: x8 (8.0 GB/sec)
        IOC Speed: Full
      Temperature: 56 C


I originally ran into this problem with an adapter from the same LSI series,
but it was not in IT mode and used the mrsas driver instead.

When attached to the onboard ATA controller, the disks use the DSM_TRIM delete
method; when attached to mpr, they use ATA_TRIM.



