From owner-freebsd-scsi Wed Apr 15 06:29:53 1998
Return-Path:
Received: (from majordom@localhost)
          by hub.freebsd.org (8.8.8/8.8.8) id GAA23947
          for freebsd-scsi-outgoing; Wed, 15 Apr 1998 06:29:53 -0700 (PDT)
          (envelope-from owner-freebsd-scsi@FreeBSD.ORG)
Received: from psv.oss.uswest.net (psv.oss.uswest.net [204.147.85.6])
          by hub.freebsd.org (8.8.8/8.8.8) with ESMTP id NAA23942
          for ; Wed, 15 Apr 1998 13:29:50 GMT
          (envelope-from greg@psv.oss.uswest.net)
Received: (from greg@localhost)
          by psv.oss.uswest.net (8.8.7/8.8.5) id IAA14522
          for freebsd-scsi@FreeBSD.ORG; Wed, 15 Apr 1998 08:29:19 -0500 (CDT)
From: "Greg Rowe"
Message-Id: <9804151329.ZM14520@psv.oss.uswest.net>
Date: Wed, 15 Apr 1998 13:29:19 +0000
X-Mailer: Z-Mail (3.2.1 10apr95)
To: freebsd-scsi@FreeBSD.ORG
Subject: SCSI Failures
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Sender: owner-freebsd-scsi@FreeBSD.ORG
Precedence: bulk
X-Loop: FreeBSD.org

Greetings,

I'm having some crashes with a couple of mail servers during nightly
backups. The configuration is as follows:

    Tyan Tahoe, 300MHz Intel, 384MB
    2 - Adaptec 2940UW SCSI controllers
    3 - 4 GB Seagate ST34572W
    OS level is 2.2.5-Stable-980318

    sd0 - controller 0 contains /, /usr, /var, and swap
    sd1 & sd2 - controller 1 is ccd'd for /home and swap

We are using Qmail with around 8000 maildirs on the /home partition.

The problem occurs during backups of the ccd (/home) partition using cpio,
but we've also seen the problem using rdist on that partition. We'll get a
couple of SCSI resets during the backups and then finally a crash. The crash
usually occurs well into the backup. The problem does not seem to be due to
bad hardware, as it can be reproduced on multiple, identically configured
systems with a large number of maildirs.
Kernel configuration for AHC is as follows:

options         AHC_TAGENABLE
options         AHC_ALLOW_MEMIO
options         AHC_SCBPAGING_ENABLE

Dmesg on boot is:

CPU: Pentium Pro (298.42-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x633  Stepping=3
  Features=0x80fbff,MTRR,PGE,MCA,CMOV>
real memory = 402653184 (393216K bytes)
avail memory = 392658944 (383456K bytes)
Probing for devices on PCI bus 0:
chip0 rev 2 on pci0:0
chip1 rev 1 on pci0:7:0
chip2 rev 0 on pci0:7:1
fxp0 rev 2 int a irq 3 on pci0:11
fxp0: Ethernet address 00:a0:c9:81:1f:e1
vga0 rev 84 int a irq 11 on pci0:12
ahc0 rev 1 int a irq 9 on pci0:13
ahc0: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc0 waiting for scsi devices to settle
ahc0: target 0 Tagged Queuing Device
(ahc0:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd0(ahc0:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1 rev 1 int a irq 10 on pci0:14
ahc1: aic7880 Wide Channel, SCSI Id=7, 16/255 SCBs
ahc1 waiting for scsi devices to settle
ahc1: target 0 Tagged Queuing Device
(ahc1:0:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd1(ahc1:0:0): Direct-Access 4340MB (8888924 512 byte sectors)
ahc1: target 1 Tagged Queuing Device
(ahc1:1:0): "SEAGATE ST34572W 0876" type 0 fixed SCSI 2
sd2(ahc1:1:0): Direct-Access 4340MB (8888924 512 byte sectors)
Probing for devices on the ISA bus:
sc0 at 0x60-0x6f irq 1 on motherboard
sc0: VGA color <16 virtual consoles, flags=0x0>
sio0 at 0x3f8-0x3ff irq 4 flags 0x10 on isa
sio0: type 16550A
sio1 not found at 0x2f8
fdc0 at 0x3f0-0x3f7 irq 6 drq 2 on isa
fdc0: FIFO enabled, 8 bytes threshold
fd0: 1.44MB 3.5in
wdt0 at 0x280 irq 7 on isa
npx0 on motherboard
npx0: INT 16 interface
changing root device to sd0a
ccd0-1: Concatenated disk drivers

The errors we are seeing are (from dmesg):

sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x1 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x0 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2 , retries:2
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xe6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): abort message in message buffer
sd0(ahc0:0:0): SCB 0x3 - timed out in dataout phase, SCSISIGI == 0xf6
SEQADDR = 0x127 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x13
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 4 SCBs aborted
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
Ordered Tag queued
sd0(ahc0:0:0): SCB 0x1 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x0 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x3 timedout while recovery in progress
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): Queueing an Abort SCB
sd0(ahc0:0:0): SCB 0x2 - timed out while idle, LASTPHASE == 0x1, SCSISIGI == 0x0
SEQADDR = 0x175 SCSISEQ = 0x12 SSTAT0 = 0x2 SSTAT1 = 0x0
sd0(ahc0:0:0): no longer in timeout
ahc0: Issued Channel A Bus Reset. 5 SCBs aborted
sd0(ahc0:0:0): UNIT ATTENTION asc:29,2 field replaceable unit: 2 , retries:2

Again, these errors are occurring while we're backing up the sd1 & sd2 ccd
device. Any help would be greatly appreciated, as we currently can't back up
the data.

Greg Rowe

-- 
Greg Rowe
US WEST - !NTERACT Internet Services

"To err is human, to really foul up requires the root password."