Date:      Wed, 16 Jun 2010 12:17:34 -0400
From:      Andrew Boyer <aboyer@averesystems.com>
To:        freebsd-scsi@freebsd.org
Subject:   Overlapped Commands error
Message-ID:  <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com>

Hello SCSI experts,
We recently saw this SCSI command error:

> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
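
For reference, the failing request can be read straight out of the CDB in the first log line. The following is only a minimal sketch, assuming the kernel prints the CDB bytes in hexadecimal and that the standard READ(10) layout applies (opcode 0x28, big-endian LBA in bytes 2-5, transfer length in bytes 7-8). `decode_read10` and `struct read10_fields` are illustrative names and are not part of CAM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fields of interest in a READ(10) CDB (illustrative type, not from CAM). */
struct read10_fields {
    uint32_t lba;      /* logical block address, CDB bytes 2..5, big-endian */
    uint16_t nblocks;  /* transfer length in blocks, CDB bytes 7..8 */
};

/* Returns 0 on success, -1 if the buffer is not a 10-byte READ(10) CDB. */
static int
decode_read10(const uint8_t *cdb, size_t len, struct read10_fields *out)
{
    assert(cdb != NULL && out != NULL);

    if (len != 10 || cdb[0] != 0x28)
        return -1;

    out->lba = ((uint32_t)cdb[2] << 24) | ((uint32_t)cdb[3] << 16) |
               ((uint32_t)cdb[4] << 8)  |  (uint32_t)cdb[5];
    out->nblocks = (uint16_t)(((unsigned)cdb[7] << 8) | cdb[8]);
    return 0;
}
```

Under those assumptions, the logged CDB `28 0 2 c8 7f a0 0 0 20 0` would be a read of 0x20 (32) blocks starting at LBA 0x02c87fa0.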

No one here has ever seen this before.  We're using a CAM and MPT stack from August 2009 with an LSI1068e HBA connected to Seagate SAS HDDs.

This is what the SCSI Architecture Model (SAM-5 draft) has to say about overlapped commands:
> 5.10 Overlapped commands
> An overlapped command occurs when a task manager or a task router detects the use of a duplicate I_T_L_Q nexus (see 4.6.6) in a command before that I_T_L_Q nexus completes its command lifetime (see 5.5). Each SCSI transport protocol standard shall specify whether or not a task manager or a task router is required to detect overlapped commands.
> A task manager or a task router that detects an overlapped command shall abort all commands received on the I_T nexus on which the overlapped command was received and the device server shall return a CHECK CONDITION status for the overlapped command. The sense key shall be set to ABORTED COMMAND and the additional sense code shall be set to OVERLAPPED COMMANDS ATTEMPTED.
> NOTE 11 - An overlapped command may be indicative of a serious error and, if not detected, may result in corrupted data. This is considered a catastrophic failure on the part of the SCSI initiator device. Therefore, vendor specific error recovery procedures may be required to guarantee the data integrity on the medium. The SCSI target device logical unit may return additional sense data to aid in this error recovery procedure (e.g., sequential-access devices may terminate the overlapped command with the residue of blocks remaining to be written or read at the time the second command was received).

> 4.8.2 Command identifier
> A command identifier (i.e., the Q in an I_T_L_Q nexus) is assigned by a SCSI initiator device to uniquely identify one command in the context of a particular I_T_L nexus, allowing more than one command to be outstanding for that I_T_L nexus at the same time. Each SCSI transport protocol defines the size of the command identifier, up to a maximum of 64 bits, to be used by SCSI ports that support that SCSI transport protocol.
> SCSI transport protocols may define additional restrictions on command identifier assignments (e.g., requiring command identifiers to be unique per I_T nexus or per I_T_L nexus, or sharing command identifier values with other uses such as task management functions).
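
To make the uniqueness requirement concrete, here is a minimal sketch of an initiator-side tag pool in which a command identifier cannot be handed out twice while it is still outstanding. Everything in it is hypothetical: `struct tag_pool`, `tag_alloc`, `tag_free`, `tag_is_overlapped`, and the depth of 64 are made up for illustration, and this is not how the MPT driver or firmware actually assigns identifiers.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_COUNT 64            /* hypothetical queue depth */

struct tag_pool {
    uint64_t in_use;            /* bit n set => tag n is outstanding */
};

/* Returns a free tag in [0, TAG_COUNT), or -1 if all are outstanding. */
static int
tag_alloc(struct tag_pool *p)
{
    for (int t = 0; t < TAG_COUNT; t++) {
        uint64_t bit = (uint64_t)1 << t;
        if ((p->in_use & bit) == 0) {
            p->in_use |= bit;
            return t;
        }
    }
    return -1;
}

/* Release a tag; only valid once its command has fully completed. */
static void
tag_free(struct tag_pool *p, int t)
{
    assert(t >= 0 && t < TAG_COUNT);
    uint64_t bit = (uint64_t)1 << t;
    assert(p->in_use & bit);    /* freeing twice would permit a later overlap */
    p->in_use &= ~bit;
}

/* The check a detecting side performs: is this tag already outstanding? */
static int
tag_is_overlapped(const struct tag_pool *p, int t)
{
    assert(t >= 0 && t < TAG_COUNT);
    return (p->in_use & ((uint64_t)1 << t)) != 0;
}
```

In a scheme like this, an overlap can only occur if a tag is released while the other end still considers the command alive. Whether anything like that happened in the log above is not established.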

Can anyone point me to where in the stack the command identifier is assigned?  I see where MPT assigns tags in target mode, but we are the initiator in this case.  Any advice?

Also, is CAM doing the right thing by retrying?  scsi_error_action() in cam/scsi/scsi_all.c always sets the retry bit on aborted commands, even though the spec quoted above makes it sound like this should be a fatal error ("This is considered a catastrophic failure on the part of the SCSI initiator device").  Should scsi_error_action() be looking at the Additional Sense Code?
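
To illustrate what looking at the Additional Sense Code could mean, here is a minimal standalone sketch. It is not the real scsi_error_action() and uses none of CAM's types. `aborted_cmd_action` and `enum action` are hypothetical names. The constants are the sense key for ABORTED COMMAND (0x0b) and the ASC/ASCQ pair 0x4e/0x00 reported in the log above. Whether failing the command is actually the right policy is exactly the open question.

```c
#include <assert.h>
#include <stdint.h>

#define SK_ABORTED_COMMAND   0x0b  /* sense key: ABORTED COMMAND */
#define ASC_OVERLAPPED_CMDS  0x4e  /* OVERLAPPED COMMANDS ATTEMPTED (ASCQ 0x00) */

enum action {
    ACTION_OTHER,   /* not an aborted command; outside this sketch */
    ACTION_RETRY,   /* retry the command */
    ACTION_FAIL     /* treat as fatal, do not retry */
};

/*
 * Hypothetical policy: keep retrying aborted commands in general,
 * but refuse to retry when the target reports overlapped commands.
 */
static enum action
aborted_cmd_action(uint8_t sense_key, uint8_t asc, uint8_t ascq)
{
    assert(sense_key <= 0x0f);  /* sense keys are 4-bit values */

    if (sense_key != SK_ABORTED_COMMAND)
        return ACTION_OTHER;
    if (asc == ASC_OVERLAPPED_CMDS && ascq == 0x00)
        return ACTION_FAIL;
    return ACTION_RETRY;
}
```

With the sense data from the log (ABORTED COMMAND, asc:4e,0), this policy would return ACTION_FAIL instead of retrying.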

Thanks,
  Andrew

--------------------------------------------------
Andrew Boyer	aboyer@averesystems.com






