From: Scott Long
To: Andrew Boyer
Cc: freebsd-scsi@freebsd.org
Subject: Re: Overlapped Commands error
Date: Wed, 16 Jun 2010 17:32:18 -0600
In-Reply-To: <51DD9715-89B2-4058-A4FE-7097603013CC@averesystems.com>

On Jun 16, 2010, at 10:17 AM, Andrew Boyer wrote:

> Hello SCSI experts,
> We recently saw this SCSI command error:
>
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): READ(10). CDB: 28 0 2 c8 7f a0 0 0 20 0
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): CAM Status: SCSI Status Error
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): SCSI Status: Check Condition
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): ABORTED COMMAND asc:4e,0
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Overlapped commands attempted field replaceable unit: 1
>> Jun 15 15:08:32 eval12 kernel: (da1:mpt0:0:1:0): Retrying Command (per Sense Data)
>> Jun 15 15:08:37 eval12 kernel: mpt0: request 0xffffffff815d5c20:40101 timed out for ccb 0xffffff000d54d800 (req->ccb 0xffffff000d54d800)
>> Jun 15 15:08:37 eval12 kernel: mpt0: attempting to abort req 0xffffffff815d5c20:40101 function 0
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_wait_req(1) timed out
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_recover_commands: abort timed-out. Resetting controller
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
>> Jun 15 15:08:38 eval12 kernel: mpt0: mpt_cam_event: 0x0
>> Jun 15 15:08:38 eval12 kernel: mpt0: completing timedout/aborted req 0xffffffff815d5c20:40101
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x12
>> Jun 15 15:09:00 eval12 kernel: mpt0: mpt_cam_event: 0x16
>
> No one here has ever seen this before. We're using a CAM and MPT
> stack from August 2009 with an LSI1068e HBA connected to Seagate SAS
> HDDs.
>
> This is what the SCSI Architecture Manual (SAM-5 draft) has to say
> about overlapped commands:
>> [...]
>
> Can anyone point me to where in the stack the command identifier is
> assigned? I see where MPT assigns tags in target mode, but it's the
> initiator in this case. Any advice?

Don't want to step on Matt, but wanted to expand on what he's said so
far.

CAM doesn't assign tag identifiers for initiator I/O; it leaves that up
to the driver and hardware. The tag_id field that you see in CCBs is
for target I/O only.
In the case of MPT, the firmware assigns tags, while on simpler
controllers like ESP the driver does it. CAM does provide the tag
action message, i.e. SIMPLE, ORDERED, HEAD_OF_Q, and it's up to the
driver to relay that to hardware, which MPT does in mpt_start().

The MPT architecture abstracts a lot of the transport protocol away, so
it's generally assumed that it's going to do the right thing in a case
like this. I don't know if the firmware is wrong, or if FreeBSD is
wrong. CAM almost always attaches a SIMPLE action flag with I/O
commands, and the MPT driver looks like it will faithfully translate
that into the corresponding MPT flag. By looking at the inquiry data,
it's roughly possible to determine if the device supports tagged
queuing, so maybe CAM needs to be smarter about this. Instead of the TQ
flag just affecting command scheduling, maybe it also needs to suppress
attaching the SIMPLE action flag, and likewise the MPT driver should set
an UNTAGGED flag in correlation to that.

I would expect the MPT firmware to look at the inquiry data and behave
appropriately despite what might be sent in the MPT I/O request, but
again, maybe that's asking too much. If you're adventurous, try
modifying the MPT driver to always set the MPI_SCSIIO_CONTROL_UNTAGGED
flag in mpt_start(), and see if that makes your problem go away.

>
> Also, is CAM doing the right thing by retrying? scsi_error_action()
> in cam/scsi/scsi_all.c always sets the retry bit on aborted commands,
> even though the spec quoted above makes it sound like this should be a
> fatal error ("This is considered a catastrophic failure on the part of
> the SCSI initiator device"). Should scsi_error_action() be looking at
> the Additional Sense Code?
>

The error recovery code in CAM already cross-references the ASC/ASCQ to
an action table, but that table is often incomplete for uncommon edge
cases.
Try the following:

RCS file: /usr1/ncvs/src/sys/cam/scsi/scsi_all.c,v
retrieving revision 1.55.2.3
diff -u -r1.55.2.3 scsi_all.c
--- scsi_all.c  14 Feb 2010 19:38:27 -0000      1.55.2.3
+++ scsi_all.c  16 Jun 2010 23:31:47 -0000
@@ -1962,7 +1962,7 @@
        { SST(0x4D, 0xFF, SS_RDEF | SSQ_RANGE,
            NULL) },                    /* Range 0x00->0xFF */
        /* DTLPWROMAEBKVF */
-       { SST(0x4E, 0x00, SS_RDEF,
+       { SST(0x4E, 0x00, SS_FATAL | ENXIO,
            "Overlapped commands attempted") },
        /* T */
        { SST(0x50, 0x00, SS_RDEF,

Scott
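
For readers who have not looked at the sense tables, the idea behind the patch above can be sketched roughly as follows. This is a simplified stand-alone model, not the CAM code: the names `sense_action`, `asc_entry`, `asc_table`, `asc_lookup` and `error_action` are hypothetical, and the real table entries carry more state (flags, errno values, ranges) than the two actions shown here. The assumption being illustrated is only that a matching ASC/ASCQ entry overrides the default action chosen for the sense key.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, reduced set of actions. The real flags (SS_RDEF,
 * SS_FATAL, ...) encode more than a two-way choice. */
enum sense_action {
    ACT_RETRY,  /* stand-in for a "retry by default" action */
    ACT_FATAL   /* stand-in for failing the command outright */
};

/* One row of the toy ASC/ASCQ table. */
struct asc_entry {
    uint8_t asc;
    uint8_t ascq;
    enum sense_action action;
    const char *desc;
};

/* Toy table holding only the entry discussed in this thread,
 * with the action the patch proposes. */
static const struct asc_entry asc_table[] = {
    { 0x4E, 0x00, ACT_FATAL, "Overlapped commands attempted" },
};

/* Linear search for an exact ASC/ASCQ match; NULL if none. */
static const struct asc_entry *
asc_lookup(uint8_t asc, uint8_t ascq)
{
    size_t n = sizeof(asc_table) / sizeof(asc_table[0]);

    for (size_t i = 0; i < n; i++) {
        if (asc_table[i].asc == asc && asc_table[i].ascq == ascq)
            return &asc_table[i];
    }
    return NULL;
}

/* Pick an action: a table hit wins, otherwise fall back to the
 * default the caller derived from the sense key. */
static enum sense_action
error_action(uint8_t asc, uint8_t ascq, enum sense_action key_default)
{
    const struct asc_entry *e = asc_lookup(asc, ascq);

    return (e != NULL) ? e->action : key_default;
}
```

With this model, `error_action(0x4E, 0x00, ACT_RETRY)` yields `ACT_FATAL`, so the ABORTED COMMAND default of retrying no longer applies to asc:4e,0, while any ASC/ASCQ pair missing from the table still gets the sense-key default.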