From owner-freebsd-stable Thu Sep 14 23:37:21 2000
Delivered-To: freebsd-stable@freebsd.org
Received: from panzer.kdm.org (panzer.kdm.org [216.160.178.169])
	by hub.freebsd.org (Postfix) with ESMTP id 43F5037B424
	for ; Thu, 14 Sep 2000 23:37:18 -0700 (PDT)
Received: (from ken@localhost)
	by panzer.kdm.org (8.9.3/8.9.1) id AAA83751;
	Fri, 15 Sep 2000 00:37:14 -0600 (MDT)
	(envelope-from ken)
Date: Fri, 15 Sep 2000 00:37:14 -0600
From: "Kenneth D. Merry"
To: Rahul Dhesi
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: SCSI retries without errors in /var/log/messages?
Message-ID: <20000915003713.A83692@panzer.kdm.org>
References: <20000913200811.56E267C63@yellow.rahul.net>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
User-Agent: Mutt/1.2i
In-Reply-To: <20000913200811.56E267C63@yellow.rahul.net>; from dhesi@rahul.net on Wed, Sep 13, 2000 at 01:08:11PM -0700
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On Wed, Sep 13, 2000 at 13:08:11 -0700, Rahul Dhesi wrote:
> "Kenneth D. Merry" writes:
>
> >The timeout for read and write operations in the da(4) driver is 60
> >seconds, and we retry things four times.
>
> And I understand that an error is logged only if all retries fail.  So
> potentially we could try three times, with a total 180 second delay, then
> succeed on the fourth try, and no error would be logged.
>
> So I thought about this, and wondered if we could have ongoing SCSI
> delays with no errors logged.
>
> Suppose there is a SCSI hardware problem such that every I/O operation
> has a 0.01 probability of timing out, which means it has a 0.99
> probability of succeeding.

[ analysis of the percentage of time spent in error recovery given the
  above probability of problems ]

There's an easy way to find out whether your disk is getting SCSI errors
and retrying things over and over again.

In src/sys/cam/scsi/scsi_all.c, in scsi_interpret_sense(), comment out the
"print_sense = FALSE;" line in the following code:

                default:
                        /* decrement the number of retries */
                        retry = ccb->ccb_h.retry_count > 0;
                        if (retry) {
                                ccb->ccb_h.retry_count--;
                                error = ERESTART;
                                print_sense = FALSE;
                        } else
                                error = EIO;
                        break;
                }

Then boot your system with -v.  You should get an error message for every
SCSI error we get back, even if the retry count hasn't been exhausted.

If things are timing out, you'll get a timeout message from the HBA driver
(the Adaptec driver will likely print out "timed out while idle").  (The
HBA driver timeout messages should show up by default.)

Another thing to try, if you suspect that commands are taking a long time
to complete but aren't hitting the timeout, is to reduce the read/write
timeout in the da(4) driver.  Near the top of the driver, you'll see the
following:

#ifndef DA_DEFAULT_TIMEOUT
#define DA_DEFAULT_TIMEOUT 60   /* Timeout in seconds */
#endif

You can just change it to 10 seconds or something.  If your disk is taking
50 seconds or so to return a command, that'll cause the timeout handler in
the HBA driver to fire and print out a message.

Ken
-- 
Kenneth Merry
ken@kdm.org