Date:      Thu, 9 Dec 1999 11:56:25 -0400 (AST)
From:      The Hermit Hacker <scrappy@hub.org>
To:        Ben Speirs <igiveup@ix.netcom.com>
Cc:        freebsd-scsi@freebsd.org, freebsd-stable@freebsd.org
Subject:   Re: SCSI problem ... OS or just bus?
Message-ID:  <Pine.BSF.4.21.9912091154270.500-100000@thelab.hub.org>
In-Reply-To: <384F3A52.23868C19@ix.netcom.com>

As an update: without turning news back on again, and after upgrading
the kernel and doing a make world 12hrs ago, things have been stable
*so far*...the first hang after we added the drives took about 17hrs
to show up...subsequent ones generally took 2-4hrs...

I'm going to re-enable news this afternoon and see if adding that extra
thrashing to the system causes a repeat of the problem or not...
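
For comparing runs with and without news enabled, it may help to count
the timeout messages per device rather than eyeball them. What follows
is only a minimal sketch, assuming the kernel messages have been saved
as text and that they keep the "(dev:ctrl:bus:target:lun): SCB 0x.. -
timed out in <phase> phase" shape seen in the log quoted below. The
names TIMEOUT_RE, tally_timeouts and SAMPLE are made up for this
illustration and are not part of any FreeBSD tool.

```python
import re
from collections import Counter

# Matches lines such as:
# (da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
# The pattern is an assumption based on the log excerpt in this thread.
TIMEOUT_RE = re.compile(
    r"\((?P<dev>\w+):(?P<ctrl>\w+):\d+:(?P<target>\d+):\d+\): "
    r"SCB 0x[0-9a-f]+ - timed out in (?P<phase>\w+) phase"
)


def tally_timeouts(lines):
    """Count 'timed out' messages per (device, target, phase)."""
    counts = Counter()
    for line in lines:
        # search() rather than match() so quoted lines ("> > ...") still count
        m = TIMEOUT_RE.search(line)
        if m:
            counts[(m["dev"], int(m["target"]), m["phase"])] += 1
    return counts


# Sample input copied from the log excerpt quoted in this thread.
SAMPLE = """\
(da4:ahc0:0:8:0): Other SCB Timeout
(da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
(da4:ahc0:0:8:0): Other SCB Timeout
(da2:ahc0:0:5:0): SCB 0x24 - timed out in dataout phase, SEQADDR == 0x10f
(da2:ahc0:0:5:0): BDR message in message buffer
(da2:ahc0:0:5:0): SCB 0x92 - timed out in dataout phase, SEQADDR == 0x10f
""".splitlines()

if __name__ == "__main__":
    for (dev, target, phase), n in sorted(tally_timeouts(SAMPLE).items()):
        print(f"{dev} target {target}: {n} timeout(s) in {phase} phase")
```

Grouping by target makes it easier to see whether the timeouts cluster
on particular devices or are spread across the whole bus.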



On Wed, 8 Dec 1999, Ben Speirs wrote:

> The Hermit Hacker wrote:
> > 
> > I recently did two upgrades in the course of a few days...upgraded my
> > 3.3-STABLE to a more recent version, and added hard drives onto the
> > system...now I'm getting SCSI problems that make no sense :(
> > 
> > The machine just hung once more, which it's doing every few hours...I can
> > get down to the debugger, but a 'trace' doesn't appear to show anything, so
> > I panic...
> > 
> > ==========
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da4:ahc0:0:8:0): SCB 0xeb - timed out in dataout phase, SEQADDR == 0x10f
> > (da4:ahc0:0:8:0): Other SCB Timeout
> > (da2:ahc0:0:5:0): SCB 0x24 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): BDR message in message buffer
> > (da2:ahc0:0:5:0): SCB 0x92 - timed out in dataout phase, SEQADDR == 0x10f
> > (da2:ahc0:0:5:0): no longer in timeout, status = 34b
> > ahc0: Issued Channel A Bus Reset. 98 SCBs aborted
> 
> Just another data point - A similar thing happened to me.  I rebuilt the
> kernel and world back in September and my previously happy SCSI system
> started issuing the same type of messages.  I saved the output of the
> system log.  Portions of it are listed below:
> 
> Copyright (c) 1992-1999 FreeBSD Inc.
> Copyright (c) 1982, 1986, 1989, 1991, 1993
>         The Regents of the University of California. All rights reserved.
> FreeBSD 3.3-STABLE #3: Fri Sep 24 21:00:39 PDT 1999
>     root@sloth:/usr/src/sys/compile/SLOTH
> [...trim...]
> ahc0: <Adaptec 2940 Ultra SCSI adapter> rev 0x00 int a irq 9 on pci0.9.0
> ahc0: aic7880 Wide Channel A, SCSI Id=7, 16/255 SCBs
> [...trim...]
> Waiting 8 seconds for SCSI devices to settle
> changing root device to da0s3a
> da0 at ahc0 bus 0 target 15 lun 0
> da0: <FUJITSU M2954Q-512 0142> Fixed Direct Access SCSI-2 device
> da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
> da0: 4149MB (8498506 512 byte sectors: 255H 63S/T 529C)
> cd0 at ahc0 bus 0 target 0 lun 0
> cd0: <TOSHIBA CD-ROM XM-5701TA 3136> Removable CD-ROM SCSI-2 device
> cd0: 10.000MB/s transfers (10.000MHz, offset 8)
> cd0: Attempt to query device size failed: NOT READY, Medium not present
> cd1 at ahc0 bus 0 target 1 lun 0
> cd1: <NEC CD-ROM DRIVE:500 2.5> Removable CD-ROM SCSI-2 device
> cd1: 3.300MB/s transfers
> cd1: Attempt to query device size failed: NOT READY, Medium not present
> 
> 
> Unexpected busfree.  LASTPHASE == 0x1
> SEQADDR == 0x153
> ahc0:A:0: no active SCB for reconnecting target - issuing BUS DEVICE RESET
> SAVED_TCL == 0x0, ARG_1 == 0xff, SEQ_FLAGS == 0x0
> (cd0:ahc0:0:0:0): SCB 0x16 - timed out in datain phase, SEQADDR == 0x153
> (cd0:ahc0:0:0:0): Other SCB Timeout
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): BDR message in message buffer
> (da0:ahc0:0:15:0): SCB 0x3 - timed out in datain phase, SEQADDR == 0x153
> (da0:ahc0:0:15:0): no longer in timeout, status = 34b
> ahc0: Issued Channel A Bus Reset. 2 SCBs aborted
> fd0c: hard error reading fsbn 0 (No status)
> 
> 
> The problem occurred while accessing the da0 device and cd0 device at
> the same time.  I could reproduce it at will, and almost instantly by
> copying a file from the CD-ROM to the hard drive.  I could not reproduce
> the error with the older, slower NEC cd1 CD-ROM device.  I rechecked all
> my termination and unplugged one device after another without any
> success.  My guess was that the cd0 drive had gone goofy on me.  The
> only thing I have not tried is replacing the cables.  Since I had the
> other CD available my fix was to yank out the suspect device.  It has
> been near the bottom of my 'things to do' list.
> 
> Maybe we both got bitten by the same "fix" that uncovered hidden hardware
> problems.  Maybe not; it looks like you have problems only with Wide
> channel devices.
> 
> --
> -Ben Speirs
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message



