Date: Wed, 15 Apr 1998 13:29:19 +0000
From: "Greg Rowe" <greg@uswest.net>
To: freebsd-scsi@FreeBSD.ORG
Subject: SCSI Failures
Message-ID: <9804151329.ZM14520@psv.oss.uswest.net>
Greetings,

I'm having some crashes with a couple of mail servers during nightly
backups. The configuration is as follows:

Tyan Tahoe, 300 MHz Intel, 384MB
2 - Adaptec 2940UW SCSI's
3 - 4 GIG Seagate ST34572W
OS Level is 2.2.5-Stable-980318

sd0 - controller 0 contains /, /usr, /var, and swap
sd1 & sd2 - controller 1 is ccd'd for /home and swap

We are using Qmail with around 8000 maildirs on the /home partition. The
problem occurs during backups of the ccd (/home) partition using cpio, but
we've also seen the problem using rdist on that partition. We'll get a
couple of SCSI resets during the backups and then finally a crash. The crash
usually occurs well into the backup. The problem does not seem to be due to
bad hardware, as it can be reproduced on multiple systems with duplicate
configurations and a large number of maildirs.

Kernel configuration for AHC is as follows:

options AHC_TAGENABLE
options AHC_ALLOW_MEMIO
options AHC_SCBPAGING_ENABLE

Dmesg on boot is:

CPU: Pentium Pro (298.42-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x633  Stepping=3
  Features=0x80fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,<b11>,MTRR,PGE,MCA,CMOV>
real memory = 402653184 (393216K bytes)
avail memory = 392658944 (383456K bytes)
Probing for devices on PCI bus 0:
chip0 <Intel 82440FX (Natoma) PCI and memory controller> rev 2 on pci0:0
chip1 <Intel 82371SB PCI-ISA bridge> rev 1 on pci0:7:0
chip2 <Intel 82371SB IDE interface> rev 0 on pci0:7:1
fxp0 <Intel EtherExpress Pro 10/100B Ethernet> rev 2 int a irq 3 on pci0:11
fxp0: Ethernet address 00:a0:c9:81:1f:e1
vga0 <VGA-compatible display device> rev 84 int a irq 11 on pci0:12
ahc0 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 9 on pci0:13
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1 <Adaptec 2940 Ultra SCSI host adapter> rev 1 int a irq 10 on pci0:14
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc1 waiting for scsi devices to settle
ahc1: target 0 Tagged Queuing Device
(ahc1:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd1(ahc1:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1: target 1 Tagged Queuing Device
(ahc1:1:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd2(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 not found at 0x2f8
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdt0 at 0x280 irq 7 on isa
npx0 on motherboard
npx0: INT 16 interface
changing root device to sd0a
ccd0-1: Concatenated disk drivers

The errors we are seeing are (from dmesg):

sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2 , retries:2
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 5 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2 , retries:2

Again, these errors are occurring while we're backing up the sd1 & sd2 ccd
device. Any help would be greatly appreciated, as we currently can't back up
the data.

Greg Rowe

--
Greg Rowe <greg@uswest.net>
US WEST - !NTERACT Internet Services

"To err is human, to really foul up requires the root password."

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-scsi" in the body of the message