Date:      Thu, 2 Jun 2016 17:08:09 +0000 (UTC)
From:      Alan Somers <asomers@FreeBSD.org>
To:        src-committers@freebsd.org, svn-src-all@freebsd.org, svn-src-stable@freebsd.org, svn-src-stable-10@freebsd.org
Subject:   svn commit: r301211 - in stable/10/sys/dev: mpr mps
Message-ID:  <201606021708.u52H8963075282@repo.freebsd.org>

Author: asomers
Date: Thu Jun  2 17:08:08 2016
New Revision: 301211
URL: https://svnweb.freebsd.org/changeset/base/301211

Log:
  MFC r299121
  
  mpr(4) and mps(4) shouldn't indefinitely retry for "terminated ioc" errors
  
  Sponsored by:	Spectra Logic Corp

Modified:
  stable/10/sys/dev/mpr/mpr_sas.c
  stable/10/sys/dev/mps/mps_sas.c
Directory Properties:
  stable/10/   (props changed)

Modified: stable/10/sys/dev/mpr/mpr_sas.c
==============================================================================
--- stable/10/sys/dev/mpr/mpr_sas.c	Thu Jun  2 16:58:47 2016	(r301210)
+++ stable/10/sys/dev/mpr/mpr_sas.c	Thu Jun  2 17:08:08 2016	(r301211)
@@ -2469,11 +2469,20 @@ mprsas_scsiio_complete(struct mpr_softc 
 	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
 	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
 		/*
-		 * Since these are generally external (i.e. hopefully
-		 * transient transport-related) errors, retry these without
-		 * decrementing the retry count.
+		 * These can sometimes be transient transport-related
+		 * errors, and sometimes persistent drive-related errors.
+		 * We used to retry these without decrementing the retry
+		 * count by returning CAM_REQUEUE_REQ.  Unfortunately, if
+		 * we hit a persistent drive problem that returns one of
+		 * these error codes, we would retry indefinitely.  So,
+		 * return CAM_REQ_CMP_ERR so that we decrement the retry
+		 * count and avoid infinite retries.  We're taking the
+		 * potential risk of flagging false failures in the event
+		 * of a topology-related error (e.g. a SAS expander problem
+		 * causes a command addressed to a drive to fail), but
+		 * avoiding getting into an infinite retry loop.
 		 */
-		mprsas_set_ccbstatus(ccb, CAM_REQUEUE_REQ);
+		mprsas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR);
 		mprsas_log_command(cm, MPR_INFO,
 		    "terminated ioc %x scsi %x state %x xfer %u\n",
 		    le16toh(rep->IOCStatus), rep->SCSIStatus, rep->SCSIState,

Modified: stable/10/sys/dev/mps/mps_sas.c
==============================================================================
--- stable/10/sys/dev/mps/mps_sas.c	Thu Jun  2 16:58:47 2016	(r301210)
+++ stable/10/sys/dev/mps/mps_sas.c	Thu Jun  2 17:08:08 2016	(r301211)
@@ -2408,11 +2408,20 @@ mpssas_scsiio_complete(struct mps_softc 
 	case MPI2_IOCSTATUS_SCSI_IOC_TERMINATED:
 	case MPI2_IOCSTATUS_SCSI_EXT_TERMINATED:
 		/*
-		 * Since these are generally external (i.e. hopefully
-		 * transient transport-related) errors, retry these without
-		 * decrementing the retry count.
+		 * These can sometimes be transient transport-related
+		 * errors, and sometimes persistent drive-related errors.
+		 * We used to retry these without decrementing the retry
+		 * count by returning CAM_REQUEUE_REQ.  Unfortunately, if
+		 * we hit a persistent drive problem that returns one of
+		 * these error codes, we would retry indefinitely.  So,
+		 * return CAM_REQ_CMP_ERR so that we decrement the retry
+		 * count and avoid infinite retries.  We're taking the
+		 * potential risk of flagging false failures in the event
+		 * of a topology-related error (e.g. a SAS expander problem
+		 * causes a command addressed to a drive to fail), but
+		 * avoiding getting into an infinite retry loop.
 		 */
-		mpssas_set_ccbstatus(ccb, CAM_REQUEUE_REQ);
+		mpssas_set_ccbstatus(ccb, CAM_REQ_CMP_ERR);
 		mpssas_log_command(cm, MPS_INFO,
 		    "terminated ioc %x scsi %x state %x xfer %u\n",
 		    le16toh(rep->IOCStatus), rep->SCSIStatus, rep->SCSIState,


