From owner-freebsd-stable  Thu Sep 14 23:37:21 2000
Date: Fri, 15 Sep 2000 00:37:14 -0600
From: "Kenneth D. Merry" <ken@kdm.org>
To: Rahul Dhesi <dhesi@rahul.net>
Cc: freebsd-stable@FreeBSD.ORG
Subject: Re: SCSI retries without errors in /var/log/messages?
Message-ID: <20000915003713.A83692@panzer.kdm.org>
References: <freebsd-stable.20000911141718.A51045@panzer.kdm.org> <20000913200811.56E267C63@yellow.rahul.net>
In-Reply-To: <20000913200811.56E267C63@yellow.rahul.net>; from dhesi@rahul.net on Wed, Sep 13, 2000 at 01:08:11PM -0700

On Wed, Sep 13, 2000 at 13:08:11 -0700, Rahul Dhesi wrote:
> "Kenneth D. Merry" <ken@kdm.org> writes:
> 
> >The timeout for read and write operations in the da(4) driver is 60
> >seconds, and we retry things four times.
> 
> And I understand that an error is logged only if all retries fail.  So
> potentially we could try three times, with a total 180-second delay, then
> succeed on the fourth try, and no error would be logged.
> 
> So I thought about this, and wondered if we could have ongoing SCSI
> delays with no errors logged.
> 
> Suppose there is a SCSI hardware problem such that every I/O operation
> has a 0.01 probability of timing out, which means it has a 0.99
> probability of succeeding.

[ analysis of the percentage of time spent in error recovery given the
above probability of problems ]

There's an easy way to find out whether your disk is getting SCSI errors
and retrying things over and over again.

In src/sys/cam/scsi/scsi_all.c, in scsi_interpret_sense(), comment out the
"print_sense = FALSE;" line in the following block:

	default:
		/* decrement the number of retries */
		retry = ccb->ccb_h.retry_count > 0;
		if (retry) {
			ccb->ccb_h.retry_count--;
			error = ERESTART;
			print_sense = FALSE;
		} else 
			error = EIO;
		break;
	}

Then rebuild and install your kernel, and boot your system with -v.  You
should get an error message for every SCSI error we get back, even if the
retry count hasn't been exhausted.

If things are timing out, you'll get a timeout message from the HBA driver
(the Adaptec driver will likely print out "timed out while idle").  (The
HBA driver timeout messages should show up by default.)

Another option, if you suspect that commands are taking a long time to
complete but aren't hitting the timeout, is to reduce the read/write
timeout in the da(4) driver (src/sys/cam/scsi/scsi_da.c).  Near the top of
the driver, you'll see the following:

#ifndef DA_DEFAULT_TIMEOUT
#define DA_DEFAULT_TIMEOUT 60   /* Timeout in seconds */
#endif

Change it to something shorter, such as 10 seconds.  If your disk is taking
50 seconds or so to return a command, that will cause the timeout handler
in the HBA driver to fire and print a message.

Ken
-- 
Kenneth Merry
ken@kdm.org

