FreeBSD Mail Archives

Date:      Thu, 15 Apr 1999 09:00:35 +0200
From:      J Wunsch <j@uriah.heep.sax.de>
To:        scsi@FreeBSD.ORG
Subject:   Re: timed out while idle?
Message-ID:  <19990415090035.03868@uriah.heep.sax.de>
In-Reply-To: <199904142037.OAA13777@narnia.plutotech.com>; from Justin T. Gibbs on Wed, Apr 14, 1999 at 02:37:53PM -0600
References:  <199904140231.UAA07250@panzer.plutotech.com> <199904142037.OAA13777@narnia.plutotech.com>

As Justin T. Gibbs wrote:

> That's not entirely true.  The device will come back if it transitions
> through final close (e.g. you umount -f all filesystems referencing
> it).

Last time I checked, this didn't work.  You couldn't umount -f, since
umount needed (one way or another; I didn't investigate) a drive
that at least didn't respond with ENXIO every time, and since the
umount never completed, CAM could never get the drive back.

>  Further, the code that usually causes the disk pack to be
> invalidated is in cam_periph.c:cam_periph_error() where a selection
> timeout causes us to receive an ENXIO error.  I believe that
> invalidating the pack is the correct thing to do since we have no
> way of determining if the media or device are the same, but that we
> should be retrying things like selection timeouts in a more sane
> fashion so that invalidations are a rarity.

I think we've had this discussion before.  IMHO, CAM's action in
this case is not what most people would expect, and it makes CAM
(which I believe is excellent by design; no criticism) rather
fragile compared to other operating systems.  For example, you can't
swap a SCSI chain terminator while the chain is under heavy load, or
all the disks on it will be invalidated.  Compare this to a Solaris
machine, where you can do this.

Don't get me wrong, I understand why you implemented it this way (at
least I believe I understand, since I guess that's the behaviour you
needed for Plutotech), and I agree that this is one possible view of
the world.  However, I'd like to see it `tunable' so that it tries a
lot harder to assume the drive is still alive, since in my
experience, 99% of SCSI problems are not drives gone south but
SCSI buses being temporarily broken, which is fixable.  I'd even
argue that this is what most people would expect...  Any chance to
make the behaviour optional?  (The current default behaviour might be
very reasonable for people running large disk farms, where the wiring
is usually fine and it's indeed the disks that wear out.)
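A minimal sketch, in C, of the kind of tunable being asked for here: retry selection timeouts up to a configurable count before invalidating the pack, and reset the count whenever the drive answers. Everything below (struct sel_policy, struct disk_state, on_selection_timeout(), the max_sel_timeouts knob) is hypothetical and is not part of CAM or da(4); in the real code the decision is made around cam_periph.c:cam_periph_error(), as described above.

```c
#include <stdbool.h>

/* HYPOTHETICAL per-disk tunable, in the spirit of the proposed da(4)
 * ioctl.  max_sel_timeouts == 0 models the behaviour described in this
 * thread: the first selection timeout invalidates the pack. */
struct sel_policy {
    unsigned max_sel_timeouts;
};

/* HYPOTHETICAL per-disk state; not a real CAM structure. */
struct disk_state {
    unsigned consecutive_sel_timeouts;
    bool pack_invalid;
};

enum sel_action { SEL_RETRY, SEL_INVALIDATE };

/* Decide what to do after a selection timeout. */
static enum sel_action
on_selection_timeout(const struct sel_policy *p, struct disk_state *d)
{
    if (d->pack_invalid)
        return SEL_INVALIDATE;      /* already invalidated; stay that way */
    d->consecutive_sel_timeouts++;
    if (d->consecutive_sel_timeouts <= p->max_sel_timeouts)
        return SEL_RETRY;           /* assume a transient bus problem */
    d->pack_invalid = true;         /* give up: media may have changed */
    return SEL_INVALIDATE;
}

/* Any successful command proves the drive answered, so reset the count. */
static void
on_command_success(struct disk_state *d)
{
    d->consecutive_sel_timeouts = 0;
}
```

With max_sel_timeouts set to 0 this reproduces the strict disk-farm behaviour; a larger value lets a briefly broken bus (e.g. a terminator swap) recover without losing the disks.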

> It's not CAM behavior, it's da behavior.  It would be a da(4) ioctl.
> If you'd like to add such an ioctl and a utility to toggle it, be
> my guest.

OK, I'll look into it. ;-)

-- 
cheers, J"org

joerg_wunsch@uriah.heep.sax.de -- http://www.sax.de/~joerg/ -- NIC: JW11-RIPE
Never trust an operating system you don't have sources for. ;-)

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message

