Date:      Wed, 15 Apr 1998 18:42:06 -0500 (CDT)
From:      Greg Rowe <greg@uswest.net>
To:        freebsd-scsi@FreeBSD.ORG
Subject:   Re: SCSI Failures
Message-ID:  <199804152342.SAA17216@psv.oss.uswest.net>
In-Reply-To: <9804151329.ZM14520@psv.oss.uswest.net> from Greg Rowe at "Apr 15, 98 01:29:19 pm"

As an update to this, I spent the day trying to duplicate the failures on one
of the systems I'm having problems with. Using cpio to copy the /home files
to /dev/null while doing similar copies of /usr on the other controller at the
same time, I'm still getting failures, but they are now only on sd1 and sd2.
This seems to make more sense, but doesn't solve the problem. My backups and
rdist run over the network, so could I be looking at a problem on the
PCI bus that changes the results when the Ethernet card is involved? This is
getting worrisome since I have a number of these systems in production!
A sample from today's messages:

Apr 15 17:29:43 muop /kernel: sd1(ahc1:0:0): SCB 0x2 - timed out in command phase, SCSISIGI == 0x84
Apr 15 17:29:43 muop /kernel: SEQADDR = 0x47 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
Apr 15 17:29:43 muop /kernel: sd1(ahc1:0:0): abort message in message buffer
Apr 15 17:29:43 muop /kernel: sd2(ahc1:1:0): SCB 0x6 timedout while recovery in progress
Apr 15 17:29:43 muop /kernel: sd1(ahc1:0:0): SCB 0x3 timedout while recovery in progress
Apr 15 17:29:43 muop /kernel: sd2(ahc1:1:0): SCB 0x7 timedout while recovery in progress
Apr 15 17:29:44 muop /kernel: sd1(ahc1:0:0): SCB 0x5 timedout while recovery in progress
Apr 15 17:29:44 muop /kernel: sd1(ahc1:0:0): SCB 0x1 timedout while recovery in progress
Apr 15 17:29:44 muop /kernel: sd2(ahc1:1:0): SCB 0x4 timedout while recovery in progress
Apr 15 17:29:44 muop /kernel: sd2(ahc1:1:0): SCB 0x0 timedout while recovery in progress
Apr 15 17:29:45 muop /kernel: sd1(ahc1:0:0): SCB 0x2 - timed out in command phase, SCSISIGI == 0x94
Apr 15 17:29:45 muop /kernel: SEQADDR = 0x47 SCSISEQ = 0x12 SSTAT0 = 0x7 SSTAT1 = 0x2
Apr 15 17:29:45 muop /kernel: sd1(ahc1:0:0): no longer in timeout
Apr 15 17:29:45 muop /kernel: ahc1: Issued Channel A Bus Reset. 8 SCBs aborted
Apr 15 17:29:46 muop /kernel: sd2(ahc1:1:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
Apr 15 17:29:46 muop /kernel: , retries:3
Apr 15 17:29:47 muop /kernel: sd1(ahc1:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
Apr 15 17:29:47 muop /kernel: , retries:3
Apr 15 17:29:47 muop /kernel: sd1(ahc1:0:0): NOT READY asc:4,1
Apr 15 17:29:47 muop /kernel: sd1(ahc1:0:0):  Logical unit is in process of becoming ready field replaceable unit: 2
Apr 15 17:29:47 muop /kernel: , retries:3
Apr 15 17:29:47 muop /kernel: sd1(ahc1:0:0): NOT READY asc:4,1
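
To check whether the timeouts really are confined to sd1 and sd2 over a longer
run, the messages can be tallied per device. The following is only a rough
sketch in Python, not part of the original report; the function name
count_timeouts and the regular expression are illustrative assumptions based
on the message formats shown above ("timed out" and "timedout").

```python
import re
from collections import Counter

# Matches both "SCB 0x2 - timed out in ..." and "SCB 0x6 timedout while ...".
# The pattern is an assumption derived from the sample messages only.
TIMEOUT_RE = re.compile(
    r"(sd\d+)\(ahc\d+:\d+:\d+\): SCB 0x[0-9a-fA-F]+ (?:- )?timed ?out"
)


def count_timeouts(lines):
    """Return a Counter mapping a device name (e.g. 'sd1') to its timeout count."""
    counts = Counter()
    for line in lines:
        match = TIMEOUT_RE.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example use against a log file (the path is an assumption):
#   with open("/var/log/messages", errors="replace") as log:
#       print(count_timeouts(log))
```

Because only the start of each message is matched, the count is the same
whether or not long lines have been wrapped by the mailer.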

Justin,
 Is CAM stable enough to try yet? I can only go so long without backups
(although the system runs fine as long as I don't try to back it up).
Thanks,
Greg

> Greetings,
> 
>  I'm having some crashes with a couple of mail servers during nightly backups.
> The configuration is as follows:
> 
>  Tyan Tahoe, 300MHz Intel, 384MB
>  2 x Adaptec 2940UW SCSI controllers
>  3 x 4GB Seagate ST34572W
> 
>  OS Level is 2.2.5-Stable-980318
>  sd0-controller 0 contains /, /usr, /var, and swap
>  sd1 & sd2 - controller 1 is ccd'd for /home and swap
> 
>  We are using Qmail with around 8000 maildirs on the /home partition.
> 
>  The problem occurs during backups of the ccd (/home) partition using cpio, but
> we've also seen the problem using rdist on that partition. We'll get a couple
> of SCSI resets during the backups and then finally a crash. The crash usually
> occurs well into the backup. The problem does not seem to be due to bad
> hardware, as it can be reproduced on multiple identically configured systems
> with a large number of maildirs.
> 
>  Kernel configuration for AHC is as follows:
> 
>   options         AHC_TAGENABLE
>   options         AHC_ALLOW_MEMIO
>   options         AHC_SCBPAGING_ENABLE
> 
>  Dmesg on boot is:
> 
>  CPU: Pentium Pro (298.42-MHz 686-class CPU)
>   Origin = "GenuineIntel"  Id = 0x633  Stepping=3
>   Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,<b11>,MTRR,PGE,MCA,CMOV>
> real memory  = 402653184 (393216K bytes)
> avail memory = 392658944 (383456K bytes)
> Probing for devices on PCI bus 0:
> chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0
> chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
> chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
> fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 2 int a irq 3 on pci0:11
> fxp0: Ethernet address 00:a0:c9:81:1f:e1
> vga0 <VGA-compatible display device> rev 84 int a irq 11 on pci0:12
> ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13
> ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
> ahc0 waiting for scsi devices to settle
> ahc0: target 0 Tagged Queuing Device
> (ahc0:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
> sd0(ahc0:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
> ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 10 on pci0:14
> ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
> ahc1 waiting for scsi devices to settle
> ahc1: target 0 Tagged Queuing Device
> (ahc1:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
> sd1(ahc1:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
> ahc1: target 1 Tagged Queuing Device
> (ahc1:1:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
> sd2(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
> Probing for devices on the ISA bus:
> sc0 at 0x60-0x6f irq 1 on motherboard
> sc0: VGA color <16 virtual consoles, flags=0x0>
> sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
> sio0: type 16550A
> sio1 not found at 0x2f8
> fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
> fdc0: FIFO enabled, 8 bytes threshold
> fd0: 1.44MB 3.5in
> wdt0 at 0x280 irq 7 on isa
> npx0 on motherboard
> npx0: INT 16 interface
> changing root device to sd0a
> ccd0-1: Concatenated disk drivers
> 
>  The errors we are seeing are (from dmesg):
> 
> sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> Ordered Tag queued
> sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> sd0(ahc0:0:0): abort message in message buffer
> sd0(ahc0:0:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> sd0(ahc0:0:0): no longer in timeout
> sd0(ahc0:0:0): no longer in timeout
> ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
> sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> Ordered Tag queued
> sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> sd0(ahc0:0:0): Queueing an Abort SCB
> sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> sd0(ahc0:0:0): no longer in timeout
> ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
> sd0(ahc0:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> , retries:2
> sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> Ordered Tag queued
> sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> sd0(ahc0:0:0): abort message in message buffer
> sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xf6
> SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
> sd0(ahc0:0:0): no longer in timeout
> ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
> sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> Ordered Tag queued
> sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
> sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> sd0(ahc0:0:0): Queueing an Abort SCB
> sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
> SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
> sd0(ahc0:0:0): no longer in timeout
> ahc0: Issued Channel A Bus Reset. 5 SCBs aborted
> sd0(ahc0:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2
> , retries:2
> 
>  Again, these errors are occurring while we're backing up the sd1 & sd2 ccd
> device. Any help would be greatly appreciated as we can't currently back up
> the data.
> 
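
The SCSISIGI values in the logs above are bit masks of the SCSI control
signals as seen by the controller. As an aid to reading them, here is a small
Python sketch that splits a value into signal names. The bit assignments are
an assumption taken from the aic7xxx register definitions as commonly
published (CDI, IOI, MSGI, ATNI, SELI, BSYI, REQI, ACKI) and should be checked
against the driver header in your own source tree; decode_scsisigi is a
hypothetical helper name.

```python
# Assumed SCSISIGI bit layout (from the aic7xxx register definitions);
# verify against the driver source before relying on it.
SCSISIGI_BITS = [
    (0x80, "CDI"),   # C/D line
    (0x40, "IOI"),   # I/O line
    (0x20, "MSGI"),  # MSG line
    (0x10, "ATNI"),  # ATN line
    (0x08, "SELI"),  # SEL line
    (0x04, "BSYI"),  # BSY line
    (0x02, "REQI"),  # REQ line
    (0x01, "ACKI"),  # ACK line
]


def decode_scsisigi(value):
    """Return the names of the signal bits set in a SCSISIGI value."""
    return [name for mask, name in SCSISIGI_BITS if value & mask]


# Values taken from the log excerpts in this thread.
for raw in (0x84, 0x94, 0xE6, 0xF6, 0x00):
    print(f"0x{raw:02x}: {' '.join(decode_scsisigi(raw)) or '(none)'}")
```

Under that assumed layout, 0x84 and 0x94 differ only in the ATN bit, as do
0xe6 and 0xf6.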
