Date:      Fri, 8 Dec 1995 16:23:21 +0100
From:      se@zpr.uni-koeln.de (Stefan Esser)
To:        rmallory@wiley.csusb.edu, scsi@freebsd.org
Subject:   Re: Check back Re: Problem with IBM 2 gig
Message-ID:  <199512081523.AA26892@Sysiphos>
In-Reply-To: Andrew Russell <arussell@bga.com> "Check back Re: Problem with IBM 2 gig" (Dec  3, 23:33)

} Subject: Check back Re: Problem with IBM 2 gig
} looks like a power glitch.. are those devices in an external cabinet?
} OR
} possibly the ncd drive detected a hang and reset the scsi bus..
} 
} what do you think stefan?
} julian

Thanks for forwarding the message ...

} > [asus MB, ncr825 bios3.04, plextor 6x]
} > off a new kernel from sunday night, I got the following when doing
} > a  `df` with a mounted cdrom...  any clues?
} > ps: the new clustering code on cd9660's works excellent!
} >     I can (almost) watch two ~100MB qt movies off a cd at the same time!
} > 
} > root@kickme$ df
} > Dec  3 22:56:03 kickme /kernel: ncr0:6: ERROR (80:140) (8-2a-0) (88/13) @ (bd4:900b0000).
} > Dec  3 22:56:03 kickme /kernel: ncr0:6: ERROR (80:140) (8-2a-0) (88/13) @ (bd4:900b0000).

It is quite funny to see this (and the other messages)
appear twice ... Never observed that before ...

dstat and istat registers = (80:140):

80:	dma fifo empty (Ok)
140:	arbitration complete + handshake timeout
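
A minimal sketch of the decoding above, in Python. The split of 140 into
the two masks 040 and 100 is an assumption made for illustration; the
message only states what the combined value means. The function and table
names are hypothetical and are not taken from the driver.

```python
# Bit meanings as read off in the message above. Which bit of 0x140
# carries which meaning is an assumption made for this sketch.
DSTAT_BITS = {0x80: "dma fifo empty"}
ISTAT_BITS = {0x040: "arbitration complete", 0x100: "handshake timeout"}


def decode(value, table):
    """Return (names of known bits that are set, leftover unknown bits)."""
    names = [name for mask, name in sorted(table.items()) if value & mask]
    known = 0
    for mask in table:
        known |= mask
    return names, value & ~known


def decode_status_pair(text):
    """Decode a 'dstat:istat' pair such as '80:140' (both fields are hex)."""
    dstat_hex, istat_hex = text.split(":")
    return (decode(int(dstat_hex, 16), DSTAT_BITS),
            decode(int(istat_hex, 16), ISTAT_BITS))


if __name__ == "__main__":
    print(decode_status_pair("80:140"))
```

Any bit that is not in the tables is returned as a leftover instead of
being guessed at, so an unfamiliar status value stays visible.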

SCSI bus state:

out:	ATN		(NCR issues ATN)
bus:	BSY + ATN + C_D (SCSI lines are: Command phase + Attention)
data:	0
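
The (8-2a-0) triple can be decoded the same way. The sketch below assumes
the control-line bytes are laid out, from high bit to low bit, as REQ, ACK,
BSY, SEL, ATN, MSG, C_D, I_O. That layout is an assumption that happens to
reproduce the reading above; the helper names are hypothetical.

```python
# Assumed layout of a SCSI control-line byte, bit 7 down to bit 0.
LINES = ["REQ", "ACK", "BSY", "SEL", "ATN", "MSG", "C_D", "I_O"]


def decode_lines(value):
    """Return the names of the control lines asserted in an 8-bit value."""
    return [name for i, name in enumerate(LINES) if value & (0x80 >> i)]


def decode_bus_state(text):
    """Decode an 'out-bus-data' triple such as '8-2a-0' (all fields hex)."""
    out_hex, bus_hex, data_hex = text.split("-")
    return {
        "out": decode_lines(int(out_hex, 16)),   # lines driven by the NCR
        "bus": decode_lines(int(bus_hex, 16)),   # lines seen on the bus
        "data": int(data_hex, 16),               # data byte on the bus
    }


if __name__ == "__main__":
    print(decode_bus_state("8-2a-0"))
```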

DISPATCH:
	...			...
bd4:	900b0000   00000000	return when (data_out)
	910a0000   00000000	return if (data_in)

Hmmm, this is where it fails ...

The DISPATCH code waits for the SCSI phase to stabilize
(that's the WHEN clause). It will just return (to the 
data transfer code), if either a data input or output
phase is detected.

Obviously the NCR chip blocked at the WHEN, because it
considered the bus to be in an inconsistent state (e.g.
not connected, arbitrating, ...)
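
To make the WHEN/IF difference concrete, here is a small Python sketch that
decodes the two script words shown above. The field positions (opcode in
bits 29-27, phase in bits 26-24, "true" in bit 19, "compare phase" in
bit 17, "wait for valid phase" in bit 16) are my understanding of the
transfer control instruction format and should be treated as an assumption;
they are consistent with the two words in the listing. All names are
hypothetical.

```python
# Assumed encodings for transfer control instructions (top bits == 0b10).
OPCODES = {0: "jump", 1: "call", 2: "return", 3: "int"}
PHASES = {0: "data_out", 1: "data_in", 2: "command", 3: "status",
          6: "msg_out", 7: "msg_in"}


def decode_transfer_control(word):
    """Render a 32-bit script word like 'return when (data_out)'."""
    if (word >> 30) != 0b10:
        raise ValueError("not a transfer control instruction")
    opcode = OPCODES.get((word >> 27) & 0x7, "reserved")
    phase = PHASES.get((word >> 24) & 0x7, "reserved")
    if not word & (1 << 17):          # no phase comparison requested
        return opcode
    # WHEN waits for a valid, stable phase first; IF tests immediately.
    keyword = "when" if word & (1 << 16) else "if"
    condition = phase if word & (1 << 19) else "not " + phase
    return f"{opcode} {keyword} ({condition})"


if __name__ == "__main__":
    for word in (0x900B0000, 0x910A0000):
        print(f"{word:08x}  {decode_transfer_control(word)}")
```

Only the first word has the wait bit set, which is why the chip can stall
there and nowhere else in this pair.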

} > root@kickme$ Dec  3 22:56:04 kickme /kernel:    script cmd = 910a0000
} > Dec  3 22:56:04 kickme /kernel:         script cmd = 910a0000
} > Dec  3 22:56:04 kickme /kernel:         reg:     da 10 80 13 47 88 06 0f 01 08 02 2a 80 00 0a 00.
} > Dec  3 22:56:04 kickme /kernel:         reg:     da 10 80 13 47 88 06 0f 01 08 02 2a 80 00 0a 00.
} > Dec  3 22:56:04 kickme /kernel: ncr0: handshake timeout
} > Dec  3 22:56:04 kickme /kernel: ncr0: handshake timeout
} > Dec  3 22:56:04 kickme /kernel: cd0(ncr0:6:0): COMMAND FAILED (6 ff) @f0aa8a00.
} > Dec  3 22:56:04 kickme /kernel: cd0(ncr0:6:0): COMMAND FAILED (6 ff) @f0aa8a00.

The command failed because of the timeout. The NCR did
not go on, because it considered the SCSI bus to not be
in a valid state.

} > Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): COMMAND FAILED (6 ff) @f0aa7c00.
} > Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): COMMAND FAILED (6 ff) @f0aa7c00.

This is a secondary effect. The hard disk had an active
command, which also was terminated because of the timeout.
It had not got a chance to connect, so this is a little
unfair ...

} > Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,0
} > Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): UNIT ATTENTION asc:29,0

The hard disk complains about the bus reset. Why was there
no message from the NCR driver that it was about to send
a SCSI bus reset ??? Hmmm.

} > Dec  3 22:56:05 kickme /kernel: sd0(ncr0:0:0):  Power on, reset, or bus device reset occurred
} > Dec  3 22:56:05 kickme /kernel: sd0(ncr0:0:0):  Power on, reset, or bus device reset occurred

The same in text form ...

I'm a little confused about the two identical error messages.
The commands in question are one and the same (as the @f0aa8a00
command control block address proves).
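
A quick way to check this on a longer log is to count COMMAND FAILED lines
per device and control block address. The following Python sketch does
that; the regular expression is an assumption fitted to the lines quoted
above, not a definition of the driver's message format.

```python
import re
from collections import Counter

# Pattern fitted to the quoted log lines only (an assumption).
FAILED = re.compile(
    r"(\w+)\(ncr\d+:\d+:\d+\): COMMAND FAILED \(.*?\) @([0-9a-f]+)\."
)


def failed_commands(lines):
    """Count COMMAND FAILED reports per (device, control block address)."""
    counts = Counter()
    for line in lines:
        match = FAILED.search(line)
        if match:
            counts[(match.group(1), match.group(2))] += 1
    return dict(counts)


SAMPLE = [
    "Dec  3 22:56:04 kickme /kernel: cd0(ncr0:6:0): COMMAND FAILED (6 ff) @f0aa8a00.",
    "Dec  3 22:56:04 kickme /kernel: cd0(ncr0:6:0): COMMAND FAILED (6 ff) @f0aa8a00.",
    "Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): COMMAND FAILED (6 ff) @f0aa7c00.",
    "Dec  3 22:56:04 kickme /kernel: sd0(ncr0:0:0): COMMAND FAILED (6 ff) @f0aa7c00.",
]

if __name__ == "__main__":
    for key, count in failed_commands(SAMPLE).items():
        print(key, count)
```

A count of two for a single address means the same command was reported
twice, not that two different commands failed.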


If this remains a single case, then I'd say it most likely was
a glitch. The driver was in a consistent state, and the NCR
seems to have missed the fact that the bus was ready for the
requested command transfer, according to the SCSI control lines
printed in the error message.

The timeout led to a SCSI bus reset, but the generic SCSI code 
should have resent the command to the hard disk, and if the 
CDROM did not lock up internally, then it should have been able
to continue normal operation as well.

Or did the system crash as a result ???


Regards, STefan

-- 
 Stefan Esser, Zentrum fuer Paralleles Rechnen		Tel:	+49 221 4706021
 Universitaet zu Koeln, Weyertal 80, 50931 Koeln	FAX:	+49 221 4705160
 ==============================================================================
 http://www.zpr.uni-koeln.de/~se			  <se@ZPR.Uni-Koeln.DE>


