Date: Sat, 10 Feb 2001 18:05:03 -0800 (PST)
From: Matthew Jacob
Reply-To: mjacob@feral.com
To: audit@freebsd.org
Cc: "Kenneth D. Merry", "Justin T. Gibbs", Gerard Roudier
Subject: a couple of minor but important changes to SCSI error handling

First is scsi_all.c:

Index: scsi_all.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_all.c,v
retrieving revision 1.17
diff -u -r1.17 scsi_all.c
--- scsi_all.c	2000/10/30 08:08:00	1.17
+++ scsi_all.c	2001/02/11 02:03:01
@@ -2177,16 +2177,16 @@
 			/* These should be filtered by the peripheral drivers */
 			/* FALLTHROUGH */
 		case SSD_KEY_MISCOMPARE:
-			print_sense = FALSE;
-			/* FALLTHROUGH */
-		case SSD_KEY_RECOVERED_ERROR:
-
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
-			if (retry)
+			if (retry) {
+				error = ERESTART;
 				ccb->ccb_h.retry_count--;
-
-			error = 0;
+			} else {
+				error = EIO;
+			}
+		case SSD_KEY_RECOVERED_ERROR:
+			error = 0; /* not an error */
 			break;
 		case SSD_KEY_ILLEGAL_REQUEST:
 			if (((sense_flags & SF_QUIET_IR) != 0)
@@ -2241,6 +2241,7 @@
 				}
 			}
 			break;
+		case SSD_KEY_ABORTED_COMMAND:
 		default:
 			/* decrement the number of retries */
 			retry = ccb->ccb_h.retry_count > 0;
@@ -2255,6 +2256,13 @@
 				error = error_action & SS_ERRMASK;
 			}
 
+			/*
+			 * Make sure ABORTED COMMAND errors get
+			 * printed as they're indicative of marginal
+			 * SCSI busses that people should address.
+			 */
+			if (sense_key == SSD_KEY_ABORTED_COMMAND)
+				print_sense = TRUE;
 		}
 		break;
 	}
---------------------

1. The key SSD_KEY_RECOVERED_ERROR is not an error at all and should
   not be retried. It indicates that an error occurred but was corrected
   during the execution of the command. This is per the ANSI SCSI-2
   spec. It's possible that these should also be noted to the console
   (as indicative, perhaps, of growing media defect lists in drives),
   but the default of printing errors out if bootverbose is set is
   probably enough in this case. Also, there'd been a missing ERESTART
   for that clause anyway.

2. If you have an ABORTED COMMAND, it's almost invariably a SCSI parity
   error. You should never be silent about these, since users should do
   something about it when it occurs (moving that power cord *away* from
   the SCSI cable is always a good first step). This should print
   irrespective of bootverbose because it's an actual real error even if
   we retry the transmission.

Second is scsi_da.c:

Index: scsi_da.c
===================================================================
RCS file: /home/ncvs/src/sys/cam/scsi/scsi_da.c,v
retrieving revision 1.65
diff -u -r1.65 scsi_da.c
--- scsi_da.c	2001/02/07 07:05:58	1.65
+++ scsi_da.c	2001/02/11 01:59:42
@@ -1127,7 +1127,7 @@
 				tag_code = MSG_SIMPLE_Q_TAG;
 			}
 			scsi_read_write(&start_ccb->csio,
-					/*retries*/4,
+					/*retries*/10, /* retry a few times */
 					dadone,
 					tag_code,
 					bp->bio_cmd == BIO_READ,
------

10 retries with a 0.5 second delay between each is still only 5
seconds. 10 retries might be more appropriate to a SAN environment,
where different initiators can spasm the loop for at least a couple
of seconds.

-matt