From owner-freebsd-stable  Mon Jul 13 01:45:11 1998
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id BAA00690
          for freebsd-stable-outgoing; Mon, 13 Jul 1998 01:45:11 -0700 (PDT)
          (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from isb.ncr.com.pk (waraich@isb.ncr.com.pk [194.133.48.215])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id BAA00680;
          Mon, 13 Jul 1998 01:44:55 -0700 (PDT)
          (envelope-from waraich@Pakistan.NCR.COM)
Received: (from waraich@localhost)
	by isb.ncr.com.pk (8.8.8/8.8.8) id NAA11954;
	Mon, 13 Jul 1998 13:44:55 +0500 (PKT)
	(envelope-from Saad.Waraich)
From: "Saad M. Waraich" <Saad.Waraich@Pakistan.NCR.COM>
Message-Id: <199807130844.NAA11954@isb.ncr.com.pk>
Subject: Re: NCR 875 and tagged queing. Broken?
In-Reply-To: <19980712103316.07090@mi.uni-koeln.de> from Stefan Esser at "Jul 12, 98 10:33:16 am"
To: stable@FreeBSD.ORG
Date: Mon, 13 Jul 1998 13:44:54 +0500 (PKT)
Cc: leo@talcom.net, se@FreeBSD.ORG
X-Mailer: ELM [version 2.4ME+ PL38 (25)]
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

The problem is a combination of the NCR driver and the Atlas III drive.
I have an 875-based card (Tekram 390F) and a 2 GB Atlas III drive, and
I've seen this problem a lot.

Upgrading the drive's firmware didn't help either. Is it worth talking
to Quantum about this problem? They could easily shrug it off by saying
that it is a problem in the driver.
-- 
Saad


Stefan Esser wrote:
> On 1998-06-27 21:43 -0400, Leo Papandreou <leo@talcom.net> wrote:
> > 
> > 2.2-STABLE (cvsupped and built June 26)
> > 
> > Twin channel NCR 875 adapter, Quantum Atlas III, FAILSAFE commented
> > out in the kernel's configuration file.
> > 
> > cp -RP dir1 dir2 (dir1 and dir2 on different partitions, same drive.)
> > produces lots of these messages:
> > 
> > Jun 26 17:42:47 abou /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 6191
> > Jun 26 17:42:48 abou /kernel: sd0(ncr0:6:0): COMMAND FAILED (4 28) @f14a1800.
> 
> This is a result of too many simultaneous outstanding commands.
> 
> The drive returns QUEUE_FULL status if it is asked to accept
> another (tagged) command, and the upper-layer SCSI code will
> initiate several retries of that command.
> 
> > I've seen recent reports of an identical problem. I'm not sure if it's
> > the hardware; the fact that these other reports are very recent makes
> > me suspect the hard drive is not at fault. I wish I had a spare AHA
> > around to test this suspicion but I do not. Also, although I realize
> > older Quantums cannot reliably do tagged queuing, this is an 18.2 GB
> > Atlas III bought not 2 days ago. (Please let it not be the hardware.)
> 
> It might be the firmware. Atlas drives have been known to show
> that effect for quite some time: they accept a huge number of
> tagged commands during normal operation, but suddenly decide to
> support only a few (during short intervals of resource exhaustion?).
> 
> The generic SCSI code in FreeBSD 2.2.x and -current predates the use
> of tags in drivers, and can't really deal with QUEUE_FULL.
> The new CAM code (a new snapshot has been announced by Justin Gibbs
> recently) will understand QUEUE_FULL status to mean "throttle down".
> It will reduce the number of simultaneous commands sent to a drive,
> and will try to slowly raise that value again once things seem
> normal.
> 
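
[The "throttle down" behaviour described above can be sketched in C as
follows. This is only a minimal illustration under stated assumptions,
not code from the CAM snapshot or from ncr.c: struct tag_throttle, the
throttle_* functions and RAISE_AFTER are hypothetical names, and the
policy (cap the limit at what the drive actually accepted, then raise it
by one after a run of clean completions) is an assumed one.]

```c
#include <assert.h>

/* Hypothetical per-device state; not taken from the ncr or CAM sources. */
struct tag_throttle {
    int max_tags;     /* configured upper bound for this device */
    int cur_limit;    /* commands currently allowed to be outstanding */
    int outstanding;  /* commands issued and not yet completed */
    int good_streak;  /* clean completions since the last QUEUE FULL or raise */
};

/* Assumed value: clean completions required before raising the limit by one. */
enum { RAISE_AFTER = 32 };

void throttle_init(struct tag_throttle *t, int max_tags)
{
    assert(max_tags >= 1);
    t->max_tags = max_tags;
    t->cur_limit = max_tags;
    t->outstanding = 0;
    t->good_streak = 0;
}

/* Returns 1 (and counts the command) if it may be sent now, 0 if it must wait. */
int throttle_try_issue(struct tag_throttle *t)
{
    if (t->outstanding >= t->cur_limit)
        return 0;
    t->outstanding++;
    return 1;
}

/* A command completed normally: slowly probe upwards again. */
void throttle_on_complete(struct tag_throttle *t)
{
    t->outstanding--;
    if (t->cur_limit < t->max_tags && ++t->good_streak >= RAISE_AFTER) {
        t->cur_limit++;
        t->good_streak = 0;
    }
}

/* The drive rejected a command with QUEUE FULL: throttle down. */
void throttle_on_queue_full(struct tag_throttle *t)
{
    /* The rejected command was never queued; the caller must requeue it. */
    t->outstanding--;
    /* The drive held only what is still outstanding, so cap there (minimum 1). */
    t->cur_limit = t->outstanding > 1 ? t->outstanding : 1;
    t->good_streak = 0;
}
```

[A driver would call throttle_try_issue() before sending each tagged
command and hold the command back when it returns 0, rather than letting
the drive reject it and retrying blindly.]
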
> > This does not happen if the directories involved are small. This does
> > not happen when FAILSAFE is present. The problem certainly has something
> > to do with tagged queuing, as has already been pointed out in a previous
> > message. Without FAILSAFE, SCSI_NCR_DFLT_TAGS defaults to 4, but I've seen
> > at least one message on this list where someone had set SCSI_NCR_DFLT_TAGS=8.
> 
> You can use any number of tags between 0 and 16, but in my tests
> with several drives I found 8 tags to give the best performance and
> 4 tags to give nearly identical performance with less system load.
> Justin Gibbs reported throughput improvements with much higher
> numbers of tags, but I could not reproduce them, either because I
> could not produce the same kind of load, or because the NCR driver
> uses linear lists in a few cases. That does not matter if there
> are only a few entries in the list, but it may if the list grows to
> tens or hundreds of entries.
> 
> > Can anyone confirm or deny that the problem is related to recent (Jun 2?)
> > changes in the kernel? 
> 
> No, there have been none in that area, sorry.
> 
> > Jun 26 18:01:15 abou /kernel: (ncr0:6:0): "QUANTUM QM318000TD-SW N1B0" type 0 fixed SCSI 2
> 
> I do not know whether there is a problem with tags in that firmware
> release (N1B0). The problem existed in both the Atlas and Atlas II,
> but I do not know much about the Atlas III ...
> 
> There should not be any data loss because of that situation. You may
> want to test the next snapshot release of Justin Gibbs' CAM code. It
> is much better tested with Adaptec cards, but I've been using a CAM
> system for several months with my NCR card and an old Quantum Atlas
> with no problems. (But the highest load is an occasional "make world"
> every one or two weeks :)
> 
> Regards, STefan
> 
> To Unsubscribe: send mail to majordomo@FreeBSD.org
> with "unsubscribe freebsd-stable" in the body of the message
> 