From owner-freebsd-questions Sat Jul 15 00:56:10 1995
Return-Path: questions-owner
Received: (from majordom@localhost) by freefall.cdrom.com (8.6.10/8.6.6) id AAA27684 for questions-outgoing; Sat, 15 Jul 1995 00:56:10 -0700
Received: from ref.tfs.com (ref.tfs.com [140.145.254.251]) by freefall.cdrom.com (8.6.10/8.6.6) with ESMTP id AAA27678 for ; Sat, 15 Jul 1995 00:56:09 -0700
Received: (from julian@localhost) by ref.tfs.com (8.6.11/8.6.9) id KAA00748; Thu, 13 Jul 1995 10:33:18 -0700
From: Julian Elischer
Message-Id: <199507131733.KAA00748@ref.tfs.com>
Subject: Re: SCSI disk wedge
To: karl@mcs.net (Karl Denninger)
Date: Thu, 13 Jul 1995 10:33:18 -0700 (PDT)
Cc: questions@freebsd.org
In-Reply-To: <199507130143.UAA00551@Jupiter.mcs.net> from "Karl Denninger" at Jul 12, 95 08:43:04 pm
X-Mailer: ELM [version 2.4 PL24]
Content-Type: text
Content-Length: 3766
Sender: questions-owner@freebsd.org
Precedence: bulk

Hi.

I've been following this with interest, as I wrote the 1742 driver and the rest of the SCSI system.

I really don't think that we have a generic bug in that code that is producing your problem, because it's running (and has been for years) on too many other systems. However, I'm quite willing to believe that we have a problem that is only visible on hardware that has certain characteristics.

You mention that BSDI can run it successfully but that FreeBSD can't. This may still be a problem with your drives. FreeBSD does quite a lot that BSDI doesn't do.

For example:

FreeBSD will merge filesystem I/O operations that are to adjacent blocks (clustered I/O). This occasionally results in I/O operations of a much larger size than BSDI will ever make to the disk.

FreeBSD will also allow several outstanding commands at a time to be sent to the adapter. Some adapters will attempt to negotiate with the drive to see if it's allowable to send those commands to the drive as a set of chained commands.
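
To make the clustered I/O point concrete, here is a minimal sketch of merging requests to adjacent blocks into one larger transfer. It is an illustration only, not the FreeBSD implementation: the `Request` type, the `cluster` function and the `max_blocks` limit are all hypothetical names invented for this example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Request:
    """A hypothetical disk request: `count` blocks starting at block `start`."""
    start: int
    count: int


def cluster(requests: List[Request], max_blocks: Optional[int] = None) -> List[Request]:
    """Merge requests whose block ranges are exactly adjacent.

    `max_blocks` is an assumed cap on the size of a merged transfer;
    None means no cap. Overlapping requests are left unmerged.
    """
    merged: List[Request] = []
    for req in sorted(requests, key=lambda r: r.start):
        last = merged[-1] if merged else None
        adjacent = last is not None and last.start + last.count == req.start
        fits = (last is not None
                and (max_blocks is None or last.count + req.count <= max_blocks))
        if adjacent and fits:
            # Extend the previous transfer instead of issuing a new one.
            merged[-1] = Request(last.start, last.count + req.count)
        else:
            merged.append(Request(req.start, req.count))
    return merged
```

With no cap, three adjacent 8-block requests become a single 24-block transfer, which is the kind of larger-than-usual operation a drive might mishandle. Passing a small `max_blocks` models turning the feature down to see whether the hang goes away.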
It's possible that your drives don't like multiple or chained commands very much but pretend that they do. BSDI can't do this either (last time I looked).

We can possibly turn these features off if you think that might help you. (I know how to turn off the multiple operations, but you'll have to ask DG or someone about the clustering.)

> The drives on these machines are (1) less than two months old, (2) have
> current firmware, and (3) don't have ANY problems with BSDI.
>
> If FreeBSD is going to be a production platform then it is going to have to
> start behaving like one. This means that pushing things off on drive
> vendors is not acceptable.

Does it mean we need to cripple our code every time a drive vendor makes a mistake? Pretty soon we'd have no functionality left above that of DOS, as that's always tested :)

Having said that, we do that with the SCSI tape drives, where we know about 'rogue' devices and cripple ourselves selectively. I hope to move this code over to the SCSI system as a general thing. The NCR guys have already made a start on this, but it needs more work.

> If you have a problem with a device, you *report it*. Silent death is never
> acceptable. The kernel is running in this case, but the system is hung
> waiting on I/O completion.

The system should now time out the transaction and report that.

> I am not at all convinced this is a firmware issue. If it was then the 83
> days of uptime on identically-configured BSDI machines wouldn't be happening.
>
> But they are.
>
> Those 83-day uptimes are recorded on our production NFS servers which run a
> much heavier disk load, with the same devices, on a different OS with no
> problems.

I'm not trying to escape fixing a problem. It's just that I can't see the problem anywhere else and am trying to work out why only you are seeing this exact problem. It could be other factors as well. BSDI and FreeBSD will treat the cache (memory) differently, for example; one may fall into a hardware trap the other misses.
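
The selective handling of 'rogue' devices mentioned earlier could be generalised along the following lines. This is a rough sketch under stated assumptions, not the existing tape driver code: the flag names, the `QUIRKS` table, the `lookup_quirks` function and the vendor/product strings are all made up for illustration.

```python
from fnmatch import fnmatchcase

# Hypothetical quirk flags; real drivers would define their own.
QUIRK_SINGLE_COMMAND = 0x01   # never keep more than one command outstanding
QUIRK_NO_CLUSTERING = 0x02    # do not merge adjacent I/O into large transfers

# Hypothetical table of (vendor, product, revision, flags) patterns,
# matched against the strings a device reports when it is probed.
QUIRKS = [
    ("EXAMPLECO", "DISK-1*", "*", QUIRK_SINGLE_COMMAND),
    ("EXAMPLECO", "DISK-1*", "1.*", QUIRK_NO_CLUSTERING),
]


def lookup_quirks(vendor: str, product: str, revision: str) -> int:
    """Return the OR of the flags of every table entry matching the device."""
    flags = 0
    for v_pat, p_pat, r_pat, entry_flags in QUIRKS:
        if (fnmatchcase(vendor, v_pat)
                and fnmatchcase(product, p_pat)
                and fnmatchcase(revision, r_pat)):
            flags |= entry_flags
    return flags
```

The point of a table like this is that only the devices known to misbehave lose features; anything that does not match gets flags of 0 and keeps the full behaviour.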
> Karl Denninger (karl@MCS.Net)| MCSNet - The Finest Internet Connectivity
> Modem: [+1 312 248-0900]     | (shell, PPP, SLIP, leased) in Chicagoland
> Voice: [+1 312 248-8649]     | 7 Chicagoland POPs, ISDN, 28.8, much more
> Fax: [+1 312 248-9865]       | Email to "info@mcs.net" WWW: http://www.mcs.net
> ISDN - Get it here TODAY!    | Home of Chicago's only FULL AP Clarinet feed!

+----------------------------------+       ______ _  __
|   __--_|\  Julian Elischer       |       \     U \/ / On assignment
|  /      \ julian@ref.tfs.com     +------>x   USA    \ in a very strange
| (   OZ    ) 300 lakeside Dr. oakland CA. \___   ___ | country !
+- X_.---._/  USA+(510) 645-3137(wk)           \_/   \\
          v