From owner-freebsd-stable  Sun Jul 12 01:59:28 1998
Return-Path: <owner-freebsd-stable@FreeBSD.ORG>
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id BAA27491
          for freebsd-stable-outgoing; Sun, 12 Jul 1998 01:59:28 -0700 (PDT)
          (envelope-from owner-freebsd-stable@FreeBSD.ORG)
Received: from Octopussy.MI.Uni-Koeln.DE (Octopussy.MI.Uni-Koeln.DE [134.95.166.20])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id BAA27471;
          Sun, 12 Jul 1998 01:59:20 -0700 (PDT)
          (envelope-from se@dialup124.zpr.uni-koeln.de)
Received: from dialup124.zpr.Uni-Koeln.DE (dialup124.zpr.Uni-Koeln.DE [134.95.219.124])
	by Octopussy.MI.Uni-Koeln.DE (8.8.8/8.8.8) with ESMTP id KAA12872;
	Sun, 12 Jul 1998 10:59:17 +0200 (MET DST)
Received: (from se@localhost) by dialup124.zpr.Uni-Koeln.DE (8.8.8/8.6.9) id KAA00434; Sun, 12 Jul 1998 10:33:16 +0200 (CEST)
X-Face: "<d]#=8pzx);RzeqSKI86OVa7=!0/(uRa.+B.9Z9\eNUn@UG?!`y7yt2dFNn%k4'.}](uE%
 yCO)$e&Y1%3EO~ifu6Q-#YUM&JZ't,}JkPnAz,8Dj33u%@GBi%[Y#LHz$]h7a<p4)-jKI7~sKjlP-^
 EvA[G;]v&0]W!EL%shs,{7x0|oqN4YVIs5,NI#,V{9"WF):5&RkOhyj*#-IAG}Tnu;YCF,d
Message-ID: <19980712103316.07090@mi.uni-koeln.de>
Date: Sun, 12 Jul 1998 10:33:16 +0200
From: Stefan Esser <se@FreeBSD.ORG>
To: Leo Papandreou <leo@talcom.net>, stable@FreeBSD.ORG
Cc: Stefan Esser <se@FreeBSD.ORG>
Subject: Re: NCR 875 and tagged queing. Broken?
References: <19980627214312.58671@supersex.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
X-Mailer: Mutt 0.89i
In-Reply-To: <19980627214312.58671@supersex.com>; from Leo Papandreou on Sat, Jun 27, 1998 at 09:43:12PM -0400
Sender: owner-freebsd-stable@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.ORG

On 1998-06-27 21:43 -0400, Leo Papandreou <leo@talcom.net> wrote:
> 
> 2.2-STABLE (cvsupped and built June 26)
> 
> Twin channel NCR 875 adapter, Quantum Atlas III, FAILSAFE commented
> out in the kernel's configuration file.
> 
> cp -RP dir1 dir2 (dir1 and dir2 on different partitions, same drive.)
> produces lots of these messages:
> 
> Jun 26 17:42:47 abou /kernel: assertion "cp" failed: file "../../pci/ncr.c", line 6191
> Jun 26 17:42:48 abou /kernel: sd0(ncr0:6:0): COMMAND FAILED (4 28) @f14a1800.

This is a result of too many simultaneous outstanding commands.

The drive returns QUEUE_FULL status when it is asked to accept
another (tagged) command that it cannot queue, and the upper-layer
SCSI code then retries that command several times.

> I've seen recent reports of an identical problem. I'm not sure if it's
> the hardware; the fact that these other reports are very recent makes
> me suspect the hard drive is not at fault. I wish I had a spare AHA
> around to test this suspicion but I do not. Also, although I realize
> older Quantums cannot reliably do tagged queuing, this is an 18.2 Gig
> Atlas III bought not 2 days ago. (Please let it not be the hardware.)

It might be the firmware. Atlas drives have been known to show
that effect for quite some time: they accept a huge number of
tagged commands during normal operation, but suddenly decide to
support only a few (perhaps during short intervals of resource
exhaustion?).

The generic SCSI code in FreeBSD 2.2.x and -current predates the use
of tags in drivers, and can't really deal with QUEUE_FULL.
The new CAM code (Justin Gibbs recently announced a new snapshot)
will understand QUEUE_FULL status to mean "throttle down".
It will reduce the number of simultaneous commands sent to a drive,
and will try to slowly raise that value again once things seem
normal.
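
The "throttle down" behaviour described above could be sketched
roughly as follows. This is not the CAM implementation: the struct,
the function names and the RAISE_AFTER threshold are all assumptions
made up for illustration. The idea is only that a QUEUE_FULL lowers
the limit to what the drive actually accepted, and a run of clean
completions raises it again one step at a time.

```c
#include <assert.h>

/* Hypothetical per-device state; none of these names come from CAM. */
struct tag_throttle {
    int max_tags;     /* configured ceiling (e.g. the default tag count) */
    int cur_limit;    /* current cap on outstanding tagged commands */
    int good_streak;  /* clean completions since the last change */
};

/* Assumed value: clean completions required before raising the cap. */
enum { RAISE_AFTER = 32 };

static void throttle_init(struct tag_throttle *t, int max_tags)
{
    assert(max_tags >= 1);
    t->max_tags = max_tags;
    t->cur_limit = max_tags;
    t->good_streak = 0;
}

/*
 * The drive rejected a command with QUEUE_FULL while it was already
 * holding `accepted` commands: cap the queue at that depth.
 */
static void throttle_queue_full(struct tag_throttle *t, int accepted)
{
    int limit = accepted < 1 ? 1 : accepted;  /* always allow one command */

    if (limit < t->cur_limit)
        t->cur_limit = limit;
    t->good_streak = 0;
}

/* A command completed normally: slowly creep back towards max_tags. */
static void throttle_command_done(struct tag_throttle *t)
{
    if (t->cur_limit >= t->max_tags)
        return;
    if (++t->good_streak >= RAISE_AFTER) {
        t->cur_limit++;
        t->good_streak = 0;
    }
}

/* May another tagged command be sent with `outstanding` already queued? */
static int throttle_may_send(const struct tag_throttle *t, int outstanding)
{
    return outstanding < t->cur_limit;
}
```

A driver using something like this would check throttle_may_send()
before starting a command and hold further requests in its own queue
instead of retrying them against a drive that is already full.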

> This does not happen if the directories involved are small. This does
> not happen when FAILSAFE is present. The problem certainly has something
> to do with tagged queuing as has already been pointed out in a previous
> message. Without FAILSAFE, SCSI_NCR_DFLT_TAGS defaults to 4 but I've seen
> at least 1 message on this list where someone had set SCSI_NCR_DFLT_TAGS=8.

You can use any number of tags between 0 and 16, but in my tests
with several drives I found 8 tags to give the best performance and
4 tags to give nearly identical performance with less system load.
Justin Gibbs reported throughput improvements with much higher
numbers of tags, but I could not reproduce them, either because I
could not produce the same kind of load, or because the NCR driver
uses linear lists in a few places. That does not matter while a
list holds only a few entries, but it may once the list grows to
tens or hundreds of entries.
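
To make the point about linear lists concrete, here is a small sketch
comparing a list walk with a direct table lookup. It is not taken from
ncr.c: the struct ccb layout and all function names are hypothetical,
and the step counter exists only to show the cost of the walk.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical command control block, keyed by its tag number. */
struct ccb {
    int tag;
    struct ccb *next;
};

/* Chain pool[0..n-1] into a list, fill a tag-indexed table, return head. */
static struct ccb *link_ccbs(struct ccb *pool, struct ccb **table, int n)
{
    for (int i = 0; i < n; i++) {
        pool[i].tag = i;
        pool[i].next = (i + 1 < n) ? &pool[i + 1] : NULL;
        table[i] = &pool[i];
    }
    return n > 0 ? &pool[0] : NULL;
}

/* Linear search: visits every node up to the match, counted in *steps. */
static struct ccb *find_linear(struct ccb *head, int tag, int *steps)
{
    for (struct ccb *c = head; c != NULL; c = c->next) {
        (*steps)++;
        if (c->tag == tag)
            return c;
    }
    return NULL;
}

/* Direct lookup: one array access regardless of how many tags exist. */
static struct ccb *find_indexed(struct ccb **table, int n, int tag)
{
    return (tag >= 0 && tag < n) ? table[tag] : NULL;
}
```

With 4 or 8 entries the walk visits at most a handful of nodes, so the
difference is negligible; with 64 entries the worst case is 64 visits
per lookup, and that cost is paid on every command.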

> Can anyone confirm or deny that the problem is related to recent (Jun 2?)
> changes in the kernel? 

No, there have been none in that area, sorry.

> Jun 26 18:01:15 abou /kernel: (ncr0:6:0): "QUANTUM QM318000TD-SW N1B0" type 0 fixed SCSI 2

I do not know whether there is a problem with tags in that firmware
release (N1B0). The problem existed in both the Atlas and Atlas II,
but I do not know much about the Atlas III.

There should not be any data loss because of that situation. You may
want to test the next snapshot release of Justin Gibbs' CAM code. It
is much better tested with Adaptec cards, but I've been using a CAM
system for several months with my NCR card and an old Quantum Atlas
with no problems. (But the highest load is an occasional "make world"
every one or two weeks :)

Regards, STefan
