From owner-freebsd-bugs  Tue Oct 29 10:50:02 1996
Return-Path: owner-bugs
Received: (from root@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id KAA16020
          for bugs-outgoing; Tue, 29 Oct 1996 10:50:02 -0800 (PST)
Received: (from gnats@localhost)
          by freefall.freebsd.org (8.7.5/8.7.3) id KAA16014;
          Tue, 29 Oct 1996 10:50:01 -0800 (PST)
Date: Tue, 29 Oct 1996 10:50:01 -0800 (PST)
Message-Id: <199610291850.KAA16014@freefall.freebsd.org>
To: freebsd-bugs
Cc: 
From: se@zpr.uni-koeln.de (Stefan Esser)
Subject: Re: kern/1919: NCR PCI error
Reply-To: se@zpr.uni-koeln.de (Stefan Esser)
Sender: owner-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

The following reply was made to PR kern/1919; it has been noted by GNATS.

From: se@zpr.uni-koeln.de (Stefan Esser)
To: daniels@borg.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/1919: NCR PCI error
Date: Tue, 29 Oct 1996 19:44:15 +0100

 Daniel C. Stevenson writes:
 > The system has an NCR PCI controller. It has a 2GB SCSI hard
 > disk, 48MB of memory, and an ISA-based Ethernet card (3C509).
 
 What model of disk drive is that?
 
 > >Description:
 > 
 > The problem: access to various directories and files fails, giving an
 > error to the shell of "Input/output failed" for the affected files or
 > directories. The console displays the following message repeatedly:
 > 
 > assertion "cp" failed: file "../../pci/ncr.c", line 5563
 > sd0(ncr0:0:0):COMMAND FAILED (4 28) @f0d7c400
 
 This is a secondary effect. Please send the lines ABOVE those.
 They should start with the word ERROR in capital letters ...
 
 > (I'm not sure which of these 2 lines actually goes first, but the
 > pattern repeats continuously)
 > 
 > This problem seems to happen at random times. It has happened when the
 > system has been up for over a week (since the last reboot) or just a
 > few days after the last reboot. The system currently maintains a
 > moderate Web server load and 1 or 2 users. The problem seems to happen
 > independent of server load or any other discernible influences.
 > 
 > When the error happens, some but not all partitions of the disk
 > are affected, and only parts of them are affected.
 
 This is quite interesting. Something like that might happen if you
 have tagged command queuing enabled, but the drive does not (always)
 support as many tags. The driver default is 4 tags, and most drives
 should easily support at least 32, so this should be a safe default.
 But I've got some reports which seem to indicate that certain drives
 don't support tags while doing error recovery after a failed read or
 write.
 
 I'm not certain about this, but you may want to
 
 1) make sure that automatic bad block replacement is enabled
    (see the "scsi" command, mode page 1, ARRE and AWRE)
 
 2) check whether disabling tags helps ("ncrcontrol -s tags=0")
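 
 For reference, ARRE and AWRE are two bits in byte 2 of the SCSI
 Read-Write Error Recovery mode page (page 1). The following is only a
 minimal sketch of how they could be decoded from a raw dump of that
 page. The function name and the sample bytes are hypothetical, and it
 assumes the buffer starts at the page itself, with the mode parameter
 header and block descriptor already stripped.

```python
# Bit masks for byte 2 of the Read-Write Error Recovery mode page (page code 0x01).
AWRE = 0x80  # Automatic Write Reallocation Enabled
ARRE = 0x40  # Automatic Read Reallocation Enabled


def reallocation_flags(page: bytes) -> dict:
    """Return the AWRE/ARRE settings found in a raw mode page 1 buffer.

    Assumption: page[0] is the page code byte (bit 7 is the PS flag),
    page[1] is the page length, and page[2] holds the recovery flags.
    """
    if len(page) < 3:
        raise ValueError("mode page too short")
    if page[0] & 0x3F != 0x01:
        raise ValueError("not mode page 1")
    flags = page[2]
    return {"AWRE": bool(flags & AWRE), "ARRE": bool(flags & ARRE)}


# Hypothetical page contents, for illustration only (not from a real drive).
example = bytes([0x81, 0x0A, 0xC0, 0x08, 0, 0, 0, 0, 0x08, 0, 0, 0])
print(reallocation_flags(example))
```

 If either flag comes back False, automatic reallocation of bad blocks
 is off for reads or writes respectively.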
 
 > >How-To-Repeat:
 > 
 > Wait for it to happen again.
 
 Well, you should tell me how I might repeat it. Hmmm, just send me
 your system and I'll wait :)
 
 > >Fix:
 > 	
 > A hard reboot (cycling the power). "reboot" doesn't work
 > ("Input/output error") and using Ctrl-Alt-Del doesn't work either; in
 > the latter case, it results in a hung system at the "Boot:" prompt.
 
 Ctrl-Alt-Del does not work?
 Hmmm ... It seems that the drive locked up and does not even recover
 when faced with a SCSI bus reset ...
 
 I'll look into this again after I receive some more detailed information
 about your disk drive and whether the system works reliably without tags.
 
 Regards, STefan