Date:      Wed, 13 Aug 1997 05:40:02 -0700 (PDT)
From:      Stefan Esser <se@FreeBSD.ORG>
To:        freebsd-bugs
Subject:   Re: misc/4293: strang disk error messages
Message-ID:  <199708131240.FAA14894@hub.freebsd.org>

The following reply was made to PR misc/4293; it has been noted by GNATS.

From: Stefan Esser <se@FreeBSD.ORG>
To: daniels@media.mit.edu
Cc: FreeBSD-gnats-submit@freebsd.org, Stefan Esser <se@freebsd.org>
Subject: Re: misc/4293: strang disk error messages
Date: Wed, 13 Aug 1997 14:19:41 +0200

 On Aug 13, daniels@media.mit.edu wrote:
 > The disk is a 2GB Quantum (SCSI) running from a PCI SCSI controller.
 
 Which Quantum drive is that?
 Their models differ quite a bit in quality ...
 
 > Every few hours or days, a series of error messages about the disk
 > (and maybe the controller) appear on the console. These messages last
 > about 2 minutes, and then stop. During that time, user activity may
 > freeze, but the Web server (the primary purpose of the system) seems
 > to be running well. My preliminary deciphering of the error messages
 > suggest something wrong with swap space (pager errors) but I can't
 > really tell.
 
 No, swap space itself is not the problem. A disk
 request issued by the VM system returned an error.
 
 > Late last week, the computer lost power (as did most of Cambridage,
 > Mass.) which may have contributed to the problem, which only surfaced
 > over the weekend.
 
 So the problem did not exist before that power loss?
 
 > Here is a complete cycle of the /var/log/messages accounting of the
 > problem:
 > 
 > Aug 13 06:40:26 borg login: login on ttyv1 as daniels
 > Aug 13 06:41:30 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:13 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 
 
 The drive returns a UNIT ATTENTION condition with 
 ASC=29 and ASCQ=2. This is a little odd; ASC=29 
 with ASCQ=0 would have been expected ...
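
 To make the sense codes easier to read, here is a minimal sketch
 that pulls ASC and ASCQ out of a log line like the one quoted above.
 It assumes the kernel prints both values in hexadecimal and that the
 line has the shape shown in this report. The lookup table is only a
 small excerpt of the ASC 29h entries from the published SCSI
 ASC/ASCQ assignments (only 29h/00h was defined in SCSI-2), and the
 names decode_unit_attention, PATTERN and ASC_29 are made up for
 illustration.

```python
import re

# Excerpt of ASC 0x29 additional sense code qualifiers; not a complete table.
ASC_29 = {
    0x00: "power on, reset, or bus device reset occurred",
    0x01: "power on occurred",
    0x02: "SCSI bus reset occurred",
    0x03: "bus device reset function occurred",
}

# Assumed line shape: "<dev>(<ctlr>:<target>:<lun>): UNIT ATTENTION asc:<hex>,<hex>"
PATTERN = re.compile(
    r"(\w+)\((\w+):(\d+):(\d+)\): UNIT ATTENTION asc:([0-9a-fA-F]+),([0-9a-fA-F]+)"
)


def decode_unit_attention(line):
    """Return (device, asc, ascq, meaning), or None if the line does not match."""
    m = PATTERN.search(line)
    if m is None:
        return None
    asc = int(m.group(5), 16)   # assumption: values are logged in hex
    ascq = int(m.group(6), 16)
    meaning = ASC_29.get(ascq, "unknown") if asc == 0x29 else "unknown"
    return m.group(1), asc, ascq, meaning


if __name__ == "__main__":
    sample = "Aug 13 06:44:13 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 "
    print(decode_unit_attention(sample))
```

 Under those assumptions, asc:29,2 reads as a bus reset reported by
 the drive, which would at least be consistent with the ncr0 restart
 messages logged just before it.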
 
 > Aug 13 06:44:13 borg /kernel: , retries:3
 > Aug 13 06:44:14 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:15 borg /kernel: ncr0: restart (ncr dead ?).
 
 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:19 borg /kernel: ncr0: restart (ncr dead ?).
 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,2 
 > Aug 13 06:44:19 borg /kernel: , retries:1
 > Aug 13 06:44:19 borg /kernel: sd0(ncr0:0:0): FAST SCSI-2 100ns (10 Mb/sec) offset 8.
 > Aug 13 06:44:19 borg /kernel: pid 3577 (httpd), uid 65534: exited on signal 6
 
 Hmmm, and the system recovers after some time?
 
 > >How-To-Repeat:
 > 
 > Just wait a few hours.
 
 Well, sorry, but this is not true. It may work 
 if *you* wait a few hours, but my system runs 
 fine for however long I let it ...
 
 So there must be some other problem. The first
 obvious question is, of course, whether the drive
 worked fine up to some external event (as opposed
 to a kernel rebuild :)
 
 If you did not install a new kernel, then there
 is a high probability that your drive is going
 bad. Did you check whether it stops spinning
 while those errors are being reported?
 
 A failed SCSI transfer is retried only a limited
 number of times. If a failure lasts for more than
 a few seconds, the retries run out and read errors
 are returned to the caller (which may be the VM
 code in the kernel, as you observed).
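
 The bounded-retry behaviour described above can be sketched roughly
 as follows. This is only an illustration of the idea, not the actual
 driver code: TransferError, read_with_retries and the default of 3
 retries are all hypothetical names and values chosen for the example.

```python
class TransferError(Exception):
    """Hypothetical stand-in for a failed SCSI transfer."""


def read_with_retries(do_transfer, retries=3):
    """Try the transfer once, then up to `retries` more times.

    If every attempt fails, the error is passed up to the caller
    (here as OSError), much like a read error reaching the VM code.
    """
    last_error = None
    for _attempt in range(retries + 1):
        try:
            return do_transfer()
        except TransferError as err:
            last_error = err  # remember the failure and try again
    raise OSError("transfer failed after %d retries" % retries) from last_error
```

 A short glitch is absorbed by the retries and the caller never sees
 it. A failure that outlasts all attempts surfaces as an I/O error,
 which is the point at which a process can end up being killed.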
 
 For now, I assume a hardware problem. Please let
 me know if you are sure that your hardware is
 not causing the failure ...
 
 Regards, STefan


