Date:      Sat, 10 Feb 2001 18:05:03 -0800 (PST)
From:      Matthew Jacob <mjacob@feral.com>
To:        audit@freebsd.org
Cc:        "Kenneth D. Merry" <ken@kdm.org>, "Justin T. Gibbs" <gibbs@scsiguy.com>, Gerard Roudier <groudier@club-internet.fr>
Subject:   a couple of minor but important changes to SCSI error handling
Message-ID:  <Pine.LNX.4.21.0102101753560.7694-100000@zeppo.feral.com>


First is scsi_all.c:
Index: scsi_all.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_all.c,v
retrieving revision 1.17
diff -u -r1.17 scsi_all.c
--- scsi_all.c	2000/10/30 08:08:00	1.17
+++ scsi_all.c	2001/02/11 02:03:01
@@ -2177,16 +2177,17 @@
 			/* These should be filtered by the peripheral drivers */
 			/* FALLTHROUGH */
 		case SSD_KEY_MISCOMPARE:
-			print_sense = FALSE;
-			/* FALLTHROUGH */
-		case SSD_KEY_RECOVERED_ERROR:
-
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
-			if (retry)
+			if (retry) {
+				error = ERESTART;
 				ccb->ccb_h.retry_count--;
-
-			error = 0;
+			} else {
+				error = EIO;
+			}
+			break;
+		case SSD_KEY_RECOVERED_ERROR:
+			error = 0;	/* not an error */
 			break;
 		case SSD_KEY_ILLEGAL_REQUEST:
 			if (((sense_flags & SF_QUIET_IR) != 0)
@@ -2241,6 +2242,7 @@
 				}
 			}
 			break;
+		case SSD_KEY_ABORTED_COMMAND:
 		default:
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
@@ -2255,6 +2257,13 @@
 
 				error = error_action & SS_ERRMASK;
 			}
+			/*
+			 * Make sure ABORTED COMMAND errors get
+			 * printed as they're indicative of marginal
+			 * SCSI busses that people should address.
+			 */
+			if (sense_key == SSD_KEY_ABORTED_COMMAND)
+				print_sense = TRUE;
 		}
 		break;
 	}

---------------------


1. The sense key SSD_KEY_RECOVERED_ERROR is not an error at all and should
not be retried. It indicates that an error occurred but was corrected
during the execution of the command. This is per the ANSI SCSI-2 spec.

It's possible that these should also be noted on the console (as indicative,
perhaps, of growing media defect lists in drives), but the default of
printing errors when bootverbose is set is probably enough in this case.

Also, the retry clause there had been missing an ERESTART anyway.

2. An ABORTED COMMAND is almost invariably a SCSI parity error. We should
never be silent about these, since users should do something about them
when they occur (moving that power cord *away* from the SCSI cable is
always a good start). This should print irrespective of bootverbose
because it's a real error even if we retry the transmission.
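
Taken together, points 1 and 2 amount to a small decision table. What
follows is a rough userland sketch of it, not the scsi_all.c code itself.
The names classify_sense, struct verdict and KEY_* are made up for
illustration, the ERESTART fallback is a stand-in for the kernel value, and
the plain retry-or-EIO default is a simplification (the real default clause
goes through error_action and SS_ERRMASK).

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* ERESTART is kernel-internal on FreeBSD; stand-in value for userland. */
#ifndef ERESTART
#define ERESTART (-1)
#endif

/* SCSI-2 sense key values (only the ones discussed here). */
enum sense_key {
	KEY_RECOVERED_ERROR = 0x01,
	KEY_ABORTED_COMMAND = 0x0b,
	KEY_MISCOMPARE      = 0x0e
};

/* Hypothetical result type: what to return and whether to report it. */
struct verdict {
	int  error;        /* 0, ERESTART, or EIO */
	bool print_sense;  /* print even when not booted verbose */
};

struct verdict
classify_sense(enum sense_key key, unsigned *retry_count)
{
	struct verdict v = { 0, false };

	assert(retry_count != NULL);
	switch (key) {
	case KEY_RECOVERED_ERROR:
		/* The device corrected the fault itself: no error, no retry. */
		v.error = 0;
		break;
	case KEY_ABORTED_COMMAND:
		/* Likely a marginal bus: always tell the user. */
		v.print_sense = true;
		/* FALLTHROUGH */
	default:
		/* Simplified: retry while retries remain, else fail. */
		if (*retry_count > 0) {
			(*retry_count)--;
			v.error = ERESTART;
		} else {
			v.error = EIO;
		}
		break;
	}
	return v;
}
```

With one retry left, a RECOVERED ERROR returns 0 and leaves the count
alone, while an ABORTED COMMAND uses up the retry, returns ERESTART, and
asks for the sense data to be printed.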



Second is scsi_da.c:
Index: scsi_da.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.65
diff -u -r1.65 scsi_da.c
--- scsi_da.c	2001/02/07 07:05:58	1.65
+++ scsi_da.c	2001/02/11 01:59:42
@@ -1127,7 +1127,7 @@
 				tag_code = MSG_SIMPLE_Q_TAG;
 			}
 			scsi_read_write(&start_ccb->csio,
-					/*retries*/4,
+					/*retries*/10, /* retry a few times */
 					dadone,
 					tag_code,
 					bp->bio_cmd == BIO_READ,
------

Ten retries with a 0.5 second delay between each is still only 5 seconds.
Ten retries might be more appropriate for a SAN environment, where
different initiators can keep the loop spasming for at least a couple of
seconds.
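
For reference, the arithmetic behind that figure as a tiny sketch. The
function name retry_budget_ms is hypothetical, and it assumes the delay is
a fixed 500 ms per retry as described above.

```c
/*
 * Worst-case time spent waiting between retries, in milliseconds,
 * assuming a fixed delay before each retry.
 */
unsigned
retry_budget_ms(unsigned retries, unsigned delay_ms)
{
	return retries * delay_ms;
}
```

retry_budget_ms(10, 500) gives 5000 ms, against 2000 ms for the old value
of 4 retries.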

-matt



