Date: Tue, 23 Mar 1999 09:36:34 +0100
From: Stefan Esser
Reply-To: se@freebsd.org
To: Christian Weisgerber
Cc: freebsd-scsi@freebsd.org, Stefan Esser
Subject: Re: Crash: what happened?
Message-ID: <19990323093634.A425@dialup124.mi.uni-koeln.de>

On 1999-03-23 01:47 +0100, Christian Weisgerber wrote:
> Our favorite unstable 2.2.8 box crashed with the following. Any idea
> what could have caused this?
>
> ncr0:1: ERROR (1:0) (8-0-800) (8/13) @ (mem 159e0:00000000).

This looks like a memory read error (or the delayed result thereof) ...

In the error message above, (1:0) = (dstat:sist):

    dstat=1 (Illegal Instruction)
    sist=0

The offset of the next instruction is 0x159e0 (way outside the
"official" NCR SCRIPTS code). At that address, a value of 0 was read,
which is not a valid instruction for the CPU in the NCR chip. The chip
stopped working, and the driver did not manage to recover from that
state. It is possible that the memory range holding the "micro-code"
was corrupted.
I can't tell what made the NCR jump to that invalid address, where the
"Illegal Instruction Detected" interrupt made it stop. (It may have been
a soft error; I just had one a few days ago, when one bit flipped during
a kernel build ...)

Since I assume that this is a single occurrence in an otherwise reliable
system, I would not consider it a major problem. If something like that
happens again, I would rather guess that hardware is going bad than
suspect software. (I have heard of NCR chips failing after years of
reliable operation, and this may also be a memory chip running under
marginal conditions ...)

Regards, Stefan
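
For anyone who wants to pick such a line apart mechanically, the decoding
described above can be sketched roughly as follows. This is only an
illustration, not code from the driver: the function name
decode_ncr_error and the regular expression are made up for this example,
it assumes that dstat and sist are printed in hexadecimal like the
address, and it knows only the one dstat bit discussed in this message
(value 1 = Illegal Instruction). The (8-0-800) and (8/13) fields are
skipped because they are not explained here.

```python
import re

# The only dstat bit explained in the message: dstat=1 means
# "Illegal Instruction Detected".
DSTAT_IID = 0x01

# Matches just the fields the message explains: "(dstat:sist)" and
# "@ (mem <address>:<value>)". Everything in between is ignored.
# Assumption: all four fields are hexadecimal.
ERROR_RE = re.compile(
    r"ERROR \((?P<dstat>[0-9a-f]+):(?P<sist>[0-9a-f]+)\)"
    r".*@ \(mem (?P<addr>[0-9a-f]+):(?P<value>[0-9a-f]+)\)",
    re.IGNORECASE,
)


def decode_ncr_error(line):
    """Return the decoded fields as a dict, or None if the line does not match."""
    match = ERROR_RE.search(line)
    if match is None:
        return None
    fields = {name: int(text, 16) for name, text in match.groupdict().items()}
    fields["illegal_instruction"] = bool(fields["dstat"] & DSTAT_IID)
    return fields


if __name__ == "__main__":
    line = "ncr0:1: ERROR (1:0) (8-0-800) (8/13) @ (mem 159e0:00000000)."
    info = decode_ncr_error(line)
    print("dstat=%x sist=%x next instruction at 0x%x, value read 0x%08x"
          % (info["dstat"], info["sist"], info["addr"], info["value"]))
    print("illegal instruction:", info["illegal_instruction"])
```

For the line quoted above, this reports the illegal-instruction bit as set
and the next-instruction offset as 0x159e0, matching the reading given in
the message.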