Date:      Thu, 13 Jul 1995 10:33:18 -0700 (PDT)
From:      Julian Elischer <julian@ref.tfs.com>
To:        karl@mcs.net (Karl Denninger)
Cc:        questions@freebsd.org
Subject:   Re: SCSI disk wedge
Message-ID:  <199507131733.KAA00748@ref.tfs.com>
In-Reply-To: <199507130143.UAA00551@Jupiter.mcs.net> from "Karl Denninger" at Jul 12, 95 08:43:04 pm

Hi.
I've been following this with interest, as I wrote
the 1742 driver and the rest of the SCSI system.

I really don't think that a generic bug in that code
is producing your problem, because it's running (and has been for
years) on too many other systems. However, I'm quite willing to
believe that we have a problem that is only visible on
hardware with certain characteristics.

You mention that BSDI can run it successfully
but that FreeBSD can't. This may still be a problem with your
drives, because FreeBSD does quite a lot that BSDI doesn't do.
For example, FreeBSD will merge filesystem IO operations that
go to adjacent blocks (clustered IO). This occasionally results
in IO operations much larger than anything BSDI will ever send
to the disk. FreeBSD will also allow several outstanding commands at a
time to be sent to the adapter. Some adapters will attempt to
negotiate with the drive to see whether it's allowable to send
those commands to the drive as a set of chained commands.
It's possible that your drives don't handle this well but claim
that they do. BSDI can't do this either (last time I looked).

We can possibly turn these features off
if you think that might help you.
(I know how to turn off the multiple outstanding operations, but you'll have to
ask DG or someone about the clustering.)


> 
> 
> The drives on these machines are (1) less than two months old, (2) have
> current firmware, and (3) don't have ANY problems with BSDI.
> 
> If FreeBSD is going to be a production platform then it is going to have to
> start behaving like one.  This means that pushing things off on drive
> vendors is not acceptable.
Does that mean we need to cripple our code every time a drive vendor makes
a mistake? Pretty soon we'd have no functionality left beyond that of
DOS, as that's always tested. :)
Having said that, we do exactly that with SCSI tape drives: where we know
about 'rogue' devices, we cripple ourselves selectively.
I hope to move this code into the SCSI system as a general facility.
The NCR guys have already made a start on this, but it needs more work.
> 
> If you have a problem with a device, you *report it*.  Silent death is never
> acceptable.  The kernel is running in this case, but the system is hung
> waiting on I/O completion.
The system should now time out the transaction and report it.

> 
> I am not at all convinced this is a firmware issue.  If it was then the 83
> days of uptime on identically-configured BSDI machines wouldn't be happening.
> 
> But they are.
> 
> Those 83-day uptimes are recorded on our production NFS servers which run a
> much heavier disk load, with the same devices, on a different OS with no
> problems.

I'm not trying to avoid fixing a problem. It's just that I can't see
the problem anywhere else, and I'm trying
to work out why only you are seeing this exact problem.

It could be other factors as well. For example, BSDI and FreeBSD
treat the cache (memory) differently; one may fall into a hardware trap
that the other misses.

> 
> Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
> Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
> Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
> Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net
> ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!
+----------------------------------+       ______ _  __
|   __--_|\  Julian Elischer       |       \     U \/ / On assignment
|  /       \ julian@ref.tfs.com    +------>x   USA    \ in a very strange
| (   OZ    ) 300 lakeside Dr. oakland CA. \___   ___ | country !
+- X_.---._/  USA+(510) 645-3137(wk)           \_/   \\            
          v