Date:      Wed, 15 Apr 1998 13:29:19 +0000
From:      "Greg Rowe" <greg@uswest.net>
To:        freebsd-scsi@FreeBSD.ORG
Subject:   SCSI Failures
Message-ID:  <9804151329.ZM14520@psv.oss.uswest.net>

Greetings,

 I'm seeing crashes on a couple of mail servers during nightly backups. The
configuration is as follows:

 Tyan Tahoe, 300MHz Intel, 384MB
 2 - Adaptec 2940UW SCSI controllers
 3 - 4GB Seagate ST34572W

 OS level is 2.2.5-Stable-980318
 sd0 (controller 0) contains /, /usr, /var, and swap
 sd1 & sd2 (controller 1) are ccd'd for /home and swap

 We are using Qmail with around 8000 maildirs on the /home partition.

 The problem occurs during backups of the ccd (/home) partition using cpio, but
we've also seen it when running rdist against that partition. We get a couple
of SCSI bus resets during the backup and then finally a crash. The crash
usually occurs well into the backup. The problem does not seem to be due to bad
hardware, as it can be reproduced on multiple identically configured systems
with a large number of maildirs.

 Kernel configuration for AHC is as follows:

  options         AHC_TAGENABLE
  options         AHC_ALLOW_MEMIO
  options         AHC_SCBPAGING_ENABLE

 Dmesg on boot is:

 CPU: Pentium Pro (298.42-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x633  Stepping=3
  Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,<b11>,MTRR,PGE,MCA,CMOV>
real memory  = 402653184 (393216K bytes)
avail memory = 392658944 (383456K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 2 int a irq 3 on pci0:11
fxp0: Ethernet address 00:a0:c9:81:1f:e1
vga0 <VGA-compatible display device> rev 84 int a irq 11 on pci0:12
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 10 on pci0:14
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc1 waiting for scsi devices to settle
ahc1: target 0 Tagged Queuing Device
(ahc1:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd1(ahc1:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1: target 1 Tagged Queuing Device
(ahc1:1:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd2(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 not found at 0x2f8
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdt0 at 0x280 irq 7 on isa
npx0 on motherboard
npx0: INT 16 interface
changing root device to sd0a
ccd0-1: Concatenated disk drivers

 The errors we are seeing are (from dmesg):

sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2, retries:2
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 5 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2  field replaceable unit: 2, retries:2
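
 For anyone wanting to compare a log like the one above across several
machines, here is a rough sketch of a script that tallies the timeouts and bus
resets. The regular expressions are based only on the message wording quoted
in this post, not on the driver source, and the names summarize, TIMEOUT_RE,
RESET_RE, and SAMPLE are made up for illustration. Only the start of each
message is matched, so it should not matter whether the mail client wrapped
the line.

```python
import re
from collections import Counter

# Assumption: message formats are taken from the dmesg excerpt in this post.
# "SCB 0xN - timed out ..." is treated as a primary timeout; the
# "timedout while recovery in progress" lines are deliberately not counted.
TIMEOUT_RE = re.compile(
    r"^(sd\d+)\(ahc\d+:\d+:\d+\): SCB (0x[0-9a-f]+) - timed out "
    r"(in \w+ phase|while idle)"
)
RESET_RE = re.compile(
    r"^(ahc\d+): Issued Channel ([AB]) Bus Reset\. (\d+) SCBs aborted"
)


def summarize(lines):
    """Return (timeouts, resets, aborted) tallied from dmesg-style lines.

    timeouts: Counter keyed by (disk, state), e.g. ("sd0", "while idle")
    resets:   Counter keyed by controller, e.g. "ahc0"
    aborted:  total number of SCBs reported aborted by bus resets
    """
    timeouts = Counter()
    resets = Counter()
    aborted = 0
    for line in lines:
        line = line.strip()
        m = TIMEOUT_RE.match(line)
        if m:
            timeouts[(m.group(1), m.group(3))] += 1
            continue
        m = RESET_RE.match(line)
        if m:
            resets[m.group(1)] += 1
            aborted += int(m.group(3))
    return timeouts, resets, aborted


# A few lines copied from the log above, used as sample input.
SAMPLE = """\
sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI ==
ahc0: Issued Channel A Bus Reset. 5 SCBs aborted
"""

if __name__ == "__main__":
    timeouts, resets, aborted = summarize(SAMPLE.splitlines())
    for (disk, state), count in sorted(timeouts.items()):
        print(f"{disk}: {count} timeout(s) {state}")
    for controller, count in sorted(resets.items()):
        print(f"{controller}: {count} bus reset(s)")
    print(f"SCBs aborted in total: {aborted}")
```

 To run it against a real log, replace SAMPLE.splitlines() with an open file
holding saved dmesg output. Keying the tally by disk and controller makes it
easy to see whether the timeouts land on the disks being backed up or on a
different one.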

 Again, these errors are occurring while we're backing up the sd1 & sd2 ccd
device. Any help would be greatly appreciated, as we currently can't back up
the data.

Greg Rowe

-- 
Greg Rowe <greg@uswest.net>   US WEST - !NTERACT Internet Services
 "To err is human, to really foul up requires the root password."
