From owner-freebsd-bugs Tue Oct 29 10:50:02 1996
Return-Path: owner-bugs
Received: (from root@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA16020 for bugs-outgoing; Tue, 29 Oct 1996 10:50:02 -0800 (PST)
Received: (from gnats@localhost) by freefall.freebsd.org (8.7.5/8.7.3) id KAA16014; Tue, 29 Oct 1996 10:50:01 -0800 (PST)
Date: Tue, 29 Oct 1996 10:50:01 -0800 (PST)
Message-Id: <199610291850.KAA16014@freefall.freebsd.org>
To: freebsd-bugs
Cc:
From: se@zpr.uni-koeln.de (Stefan Esser)
Subject: Re: kern/1919: NCR PCI error
Reply-To: se@zpr.uni-koeln.de (Stefan Esser)
Sender: owner-bugs@FreeBSD.ORG
X-Loop: FreeBSD.org
Precedence: bulk

The following reply was made to PR kern/1919; it has been noted by GNATS.

From: se@zpr.uni-koeln.de (Stefan Esser)
To: daniels@borg.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org
Subject: Re: kern/1919: NCR PCI error
Date: Tue, 29 Oct 1996 19:44:15 +0100

Daniel C. Stevenson writes:
> The system has an NCR PCI controller. It has a 2GB SCSI hard
> disk, 48MB of memory, and an ISA-based Ethernet card (3C509).

What model of disk drive is that?

> >Description:
>
> The problem: access to various directories and files fails, giving an
> error to the shell of "Input/output failed" for the affected files or
> directories. The console displays the following message repeatedly:
>
> assertion "cp" failed: file "../../pci/ncr.c", line 5563
> sd0(ncr0:0:0):COMMAND FAILED (4 28) @f0d7c400

This is a secondary effect. Please send the lines ABOVE those.
They should start with the word ERROR in capital letters ...

> (I'm not sure which of these 2 lines actually goes first, but the
> pattern repeats continuously)
>
> This problem seems to happen at random times. It has happened when the
> system has been up for over a week (since the last reboot) or just a
> few days after the last reboot. The system currently maintains a
> moderate Web server load and 1 or 2 users. The problem seems to happen
> independent of server load or any other discernible influences.
>
> When the error happens, some but not all partitions of the disk
> are affected, and only parts of them are affected.

This is quite interesting. Something like that might happen
if you have tagged command queues enabled, but the drive does
not (always) support as many tags.

The driver default is 4 tags. Since most drives should easily
support at least 32, this is a safe default. But I've got some
reports which seem to indicate that certain drives don't support
tags while doing error recovery after a failed read or write.

I'm not certain about this, but you may want to

1) make sure that automatic bad block replacement is enabled
   (see the "scsi" command, mode page 1, ARRE and AWRE)

2) check whether disabling tags helps ("ncrcontrol -s tags=0")

> >How-To-Repeat:
>
> Wait for it to happen again.

Well, you should tell me how I might repeat it.
Hmmm, just send me your system and I'll wait :)

> >Fix:
>
> A hard reboot (cycling the power). "reboot" doesn't work
> ("Input/output error") and using Ctrl-Alt-Del doesn't work either; in
> the latter case, it results in a hung system at the "Boot:" prompt.

Ctrl-Alt-Del does not work? Hmmm ...

It seems that the drive locked up and does not even recover
when faced with a SCSI bus reset ...

I'll look into this again after I receive some more detailed
information about your disk drive and whether the system
works reliably without tags.

Regards, STefan
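
For reference, the check in item 1) comes down to two bits in byte 2 of
the Read-Write Error Recovery mode page (page code 1) as laid out in the
SCSI-2 specification: AWRE is bit 7 and ARRE is bit 6. The following is
only a minimal sketch of that decoding and is not taken from the ncr
driver or from the "scsi" command. The helper name
erp_auto_realloc_enabled is hypothetical, and the buffer is assumed to
hold the raw page bytes starting at the page header (that is, after the
mode parameter header and any block descriptors).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read-Write Error Recovery mode page, SCSI-2 layout:
 *   byte 0: PS (bit 7), reserved (bit 6), page code (bits 5..0)
 *   byte 1: page length
 *   byte 2: AWRE ARRE TB RC EER PER DTE DCR (bit 7 down to bit 0)
 */
#define ERP_PAGE_CODE 0x01
#define ERP_AWRE      0x80 /* automatic write reallocation enabled */
#define ERP_ARRE      0x40 /* automatic read reallocation enabled */

/* Hypothetical helper (an assumption for illustration only).
 * Returns  1 if both AWRE and ARRE are set,
 *          0 if at least one of them is clear,
 *         -1 if the buffer is too short or is not mode page 1.
 */
int erp_auto_realloc_enabled(const uint8_t *page, size_t len)
{
    assert(page != NULL);

    /* Need bytes 0..2, and the low 6 bits of byte 0 must be page code 1. */
    if (len < 3 || (page[0] & 0x3f) != ERP_PAGE_CODE)
        return -1;

    return (page[2] & (ERP_AWRE | ERP_ARRE)) == (ERP_AWRE | ERP_ARRE);
}
```

A result of 0 for the drive's current page 1 would mean that at least
one of the two automatic reallocation bits is off and should be turned
on before retesting.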