Date: Sun, 12 Jul 1998 10:33:16 +0200
From: Stefan Esser
To: Leo Papandreou, stable@FreeBSD.ORG
Cc: Stefan Esser
Subject: Re: NCR 875 and tagged queing. Broken?
References: <19980627214312.58671@supersex.com>
In-Reply-To: <19980627214312.58671@supersex.com>; from Leo Papandreou on Sat, Jun 27, 1998 at 09:43:12PM -0400

On 1998-06-27 21:43 -0400, Leo Papandreou wrote:
>
> 2.2-STABLE (cvsupped and built June 26)
>
> Twin channel NCR 875 adapter, Quantum Atlas III, FAILSAFE commented
> out in kernel's configuration file.
>
> cp -RP dir1 dir2 (dir1 and dir2 on different partitions, same drive.)
> produces lots of these messages:
>
> Jun 26 17:42:47 abou /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 6191
> Jun 26 17:42:48 abou /kernel: sd0(ncr0:6:0): COMMAND FAILED (4 28) @f14a1800.

This is a result of too many simultaneous outstanding commands.
The drive returns QUEUE_FULL status if it is asked to accept another
(tagged) command, and the upper-layer SCSI code will initiate several
retries of that command.

> I've seen recent reports of an identical problem. I'm not sure if it's
> the hardware; the fact that these other reports are very recent makes
> me suspect the hard drive is not at fault. I wish I had a spare AHA
> around to test this suspicion but I do not. Also, although I realize
> older Quantums cannot reliably do tagged queuing, this is an 18.2 Gig
> Atlas III bought not 2 days ago. (Please let it not be the hardware.)

It might be the firmware. Atlas drives have been known to show that
effect for quite some time: they accept a huge number of tagged
commands during normal operation, but suddenly decide to support only
a few (during short intervals of resource exhaustion?).

The generic SCSI code in FreeBSD 2.2.x and -current pre-dates the use
of tags in drivers, and can't really deal with QUEUE_FULL. The new CAM
code (a new snapshot has been announced by Justin Gibbs recently) will
understand QUEUE_FULL status to mean "throttle down". It will reduce
the number of simultaneous commands sent to a drive, and will try to
slowly raise that value again once things seem normal.

> This does not happen if the directories involved are small. This does
> not happen when FAILSAFE is present. The problem certainly has something
> to do with tagged queuing as has already been pointed out in a previous
> msg. Without FAILSAFE, SCSI_NCR_DFLT_TAGS defaults to 4 but I've seen
> at least 1 msg on this list where someone had set SCSI_NCR_DFLT_TAGS=8.

You can use any number of tags between 0 and 16, but in my tests with
several drives I found 8 tags to give the best performance and 4 tags to
give nearly identical performance with less system load.
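
The "throttle down" reaction to QUEUE_FULL described above can be
sketched in a few lines of Python. This is only an illustration of the
idea, not the CAM implementation: the class name TagThrottle, the
halving of the limit on QUEUE_FULL, and the raise_after parameter are
all assumptions made for this example.

```python
class TagThrottle:
    """Hypothetical per-drive limit on outstanding tagged commands."""

    def __init__(self, max_tags=8, min_tags=1, raise_after=32):
        self.max_tags = max_tags        # upper bound, e.g. the configured tag count
        self.min_tags = min_tags        # never throttle below this
        self.raise_after = raise_after  # clean completions needed before raising
        self.limit = max_tags           # current allowed number of open commands
        self.outstanding = 0            # commands sent but not yet completed
        self.clean_completions = 0      # completions since the last QUEUE_FULL

    def can_send(self):
        return self.outstanding < self.limit

    def send(self):
        if not self.can_send():
            raise RuntimeError("tag limit reached; command must stay queued")
        self.outstanding += 1

    def complete(self, queue_full=False):
        """Record one finished command; queue_full marks a QUEUE_FULL status."""
        self.outstanding -= 1
        if queue_full:
            # Throttle down (assumed policy: halve the limit).
            self.limit = max(self.min_tags, self.limit // 2)
            self.clean_completions = 0
            return
        self.clean_completions += 1
        if self.clean_completions >= self.raise_after:
            # Things look normal again: slowly raise the limit by one.
            self.clean_completions = 0
            if self.limit < self.max_tags:
                self.limit += 1
```

A driver built along these lines would call can_send() before issuing
a command and hold the command in its own queue otherwise, instead of
retrying blindly against a drive that has just reported QUEUE_FULL.
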
Justin Gibbs reported throughput improvements with much higher numbers
of tags, but I could not reproduce them, either because I could not
produce the same kind of load, or because the NCR driver uses linear
lists in a few cases. That does not matter while a list holds only a few
entries, but it may once the list grows to tens or hundreds of entries.

> Can anyone confirm or deny that the problem is related to recent (Jun 2?)
> changes in the kernel?

No, there have been none in that area, sorry.

> Jun 26 18:01:15 abou /kernel: (ncr0:6:0): "QUANTUM QM318000TD-SW N1B0" type 0 fixed SCSI 2

I do not know whether there is a problem with tags in that firmware
release (N1B0). The problem existed in both the Atlas and Atlas II, but
I do not know much about the Atlas III ...

There should not be any data loss because of that situation.

You may want to test the next snapshot release of Justin Gibbs' CAM
code. It is much better tested with Adaptec cards, but I've been using
a CAM system for several months with my NCR card and an old Quantum
Atlas with no problems. (But the highest load is an occasional "make
world" every one or two weeks :)

Regards, STefan