From: "Saad M. Waraich"
Date: Mon, 13 Jul 1998 13:44:54 +0500 (PKT)
To: stable@FreeBSD.ORG
Cc: leo@talcom.net, se@FreeBSD.ORG
Subject: Re: NCR 875 and tagged queing. Broken?
Message-Id: <199807130844.NAA11954@isb.ncr.com.pk>
In-Reply-To: <19980712103316.07090@mi.uni-koeln.de> from Stefan Esser at "Jul 12, 98 10:33:16 am"

The problem is a combination of the NCR driver and the Atlas III drive.
I have an 875-based card (Tekram 390F) and a 2 gig. Atlas III drive, and
I've seen this problem a lot. Upgrading the drive's firmware didn't help
either.

Is it worth talking to Quantum about this problem? They could easily
shrug it off, saying that it is a problem in the driver.

-- Saad

Stefan Esser wrote:
> On 1998-06-27 21:43 -0400, Leo Papandreou wrote:
> >
> > 2.2-STABLE (cvsupped and built June 26)
> >
> > Twin channel NCR 875 adapter, Quantum Atlas III, FAILSAFE commented
> > out in kernel's configuration file.
> >
> > cp -RP dir1 dir2 (dir1 and dir2 on different partitions, same drive.)
> > produces lots of these messages:
> >
> > Jun 26 17:42:47 abou /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 6191
> > Jun 26 17:42:48 abou /kernel: sd0(ncr0:6:0): COMMAND FAILED (4 28) @f14a1800.
>
> This is a result of too many simultaneous outstanding commands.
>
> The drive returns QUEUE_FULL status if it is asked to accept
> another (tagged) command, and the upper layer SCSI code will
> initiate several retries of that command.
>
> > I've seen recent reports of an identical problem. I'm not sure if it's
> > the hardware; the fact that these other reports are very recent makes
> > me suspect the hard drive is not at fault. I wish I had a spare AHA
> > around to test this suspicion but I do not. Also, although I realize
> > older Quantums cannot reliably do tagged queuing, this is an 18.2 Gig
> > Atlas III bought not 2 days ago. (Please let it not be the hardware.)
>
> It might be the firmware. Atlas drives have been known to show
> that effect for quite some time: They accept a huge number of
> tagged commands during normal operation, but suddenly decide to
> support only a few (during short intervals of resource exhaustion?)
>
> The generic SCSI code in FreeBSD 2.2.x and -current pre-dates use
> of tags in drivers, and can't really deal with QUEUE_FULL.
> The new CAM code (a new snapshot has been announced by Justin Gibbs
> recently) will understand QUEUE_FULL status to mean "throttle down".
> It will reduce the number of simultaneous commands sent to a drive,
> and will try to slowly raise that value again after things seem
> normal again.
>
> > This does not happen if the directories involved are small. This does
> > not happen when FAILSAFE is present. The problem certainly has something
> > to do with tagged queuing as has already been pointed out in a previous
> > msg. Without FAILSAFE, SCSI_NCR_DFLT_TAGS defaults to 4 but I've seen
> > at least 1 msg on this list where someone had set SCSI_NCR_DFLT_TAGS=8.
>
> You can use any number of tags between 0 and 16, but in my tests
> with several drives I found 8 tags to give best performance and
> 4 tags to give nearly identical performance with less system load.
> Justin Gibbs reported throughput improvements with much higher
> numbers of tags, but I could not reproduce them, either because I
> could not produce the same kind of load, or because the NCR driver
> uses linear lists in a few cases, which does not matter if there
> are a few entries in the list, but may do if the list grows to
> tens or hundreds of entries.
>
> > Can anyone confirm or deny that the problem is related to recent (Jun 2?)
> > changes in the kernel?
>
> No, there have been none in that area, sorry.
>
> > Jun 26 18:01:15 abou /kernel: (ncr0:6:0): "QUANTUM QM318000TD-SW N1B0" type 0 fixed SCSI 2
>
> I do not know whether there is a problem with tags in that firmware
> release (N1B0). The problem existed in both the Atlas and Atlas II,
> but I do not know much about the Atlas III ...
>
> There should not be any data loss because of that situation. You may
> want to test the next snapshot release of Justin Gibbs' CAM code. It
> is much better tested with Adaptec cards, but I've been using a CAM
> system for several months with my NCR card and an old Quantum Atlas
> with no problems. (But the highest load is an occasional "make world"
> every one or two weeks :)
>
> Regards, STefan
>
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
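
For readers following the thread, the "throttle down" reaction to
QUEUE_FULL that Stefan describes for CAM can be pictured roughly as
below. This is only a minimal sketch of the general idea (cut the
per-device command limit when the drive reports a full queue, then
raise it slowly after a run of successful completions). It is not
taken from the CAM sources or from ncr.c, and every name in it
(tag_throttle, throttle_queue_full, raise_after and so on) is made up
for illustration.

```c
/* Hypothetical per-device tag throttle; not the CAM or ncr.c code. */
struct tag_throttle {
    int max_tags;     /* configured ceiling for this device, e.g. 8 */
    int cur_limit;    /* commands currently allowed in flight */
    int good_streak;  /* completions seen since the last QUEUE FULL */
    int raise_after;  /* streak needed before cur_limit is raised by one */
};

static void throttle_init(struct tag_throttle *t, int max_tags, int raise_after)
{
    t->max_tags = max_tags;
    t->cur_limit = max_tags;
    t->good_streak = 0;
    t->raise_after = raise_after;
}

/* May another tagged command be issued with `outstanding` already in flight? */
static int throttle_may_send(const struct tag_throttle *t, int outstanding)
{
    return outstanding < t->cur_limit;
}

/*
 * The drive answered QUEUE FULL while it already held `outstanding`
 * commands, so assume that is all it can take for now. Never go below
 * one command, and never raise the limit from here.
 */
static void throttle_queue_full(struct tag_throttle *t, int outstanding)
{
    int limit = outstanding < 1 ? 1 : outstanding;

    if (limit < t->cur_limit)
        t->cur_limit = limit;
    t->good_streak = 0;
}

/* A command completed normally; after enough of these, allow one more tag. */
static void throttle_command_done(struct tag_throttle *t)
{
    if (t->cur_limit >= t->max_tags)
        return;                     /* already at the ceiling */
    if (++t->good_streak >= t->raise_after) {
        t->cur_limit++;
        t->good_streak = 0;
    }
}
```

With max_tags = 8 and raise_after = 3, a QUEUE FULL seen with four
commands outstanding would drop the limit to 4, and three clean
completions would bring it back up to 5. The one-step-at-a-time
recovery is a design assumption chosen to match "slowly raise that
value again"; the real code may well use a different policy.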