From owner-freebsd-scsi@FreeBSD.ORG Mon Jul 12 13:05:59 2004 Return-Path: Delivered-To: freebsd-scsi@freebsd.org Received: from mx1.FreeBSD.org (mx1.freebsd.org [216.136.204.125]) by hub.freebsd.org (Postfix) with ESMTP id 868DB16A4CE for ; Mon, 12 Jul 2004 13:05:59 +0000 (GMT) Received: from postino5.prima.com.ar (postino5b.prima.com.ar [200.42.0.178]) by mx1.FreeBSD.org (Postfix) with SMTP id EA83D43D31 for ; Mon, 12 Jul 2004 13:05:57 +0000 (GMT) (envelope-from lalbinati@ciudad.com.ar) Received: (qmail 16559 invoked from network); 12 Jul 2004 13:05:56 -0000 Received: from unknown (HELO SMEAGOL) (200.42.0.210) by postino5.prima.com.ar with SMTP; 12 Jul 2004 13:05:56 -0000 Message-ID: <006e01c46810$f84b5af0$0e2610ac@prima.local> From: "Luis M. Albinati" To: Date: Mon, 12 Jul 2004 10:05:56 -0300 MIME-Version: 1.0 X-Priority: 3 X-MSMail-Priority: Normal X-Mailer: Microsoft Outlook Express 6.00.2800.1409 X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2800.1409 Content-Type: text/plain; charset="iso-8859-1" Content-Transfer-Encoding: quoted-printable X-Content-Filtered-By: Mailman/MimeDel 2.1.1 Subject: problem with aic7899 and Maxtor ATLAS10K4 - timed out X-BeenThere: freebsd-scsi@freebsd.org X-Mailman-Version: 2.1.1 Precedence: list List-Id: SCSI subsystem List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 12 Jul 2004 13:05:59 -0000 Hi, Past week I had an incident with one of our production boxes. The box = stopped responding and the console lots of SCSI errors, after rebooting = the box since it was unresponsive even through local console, I examined = the logs and the following errors showed up: 17:20:56 /kernel: (da4:ahc0:0:4:0): SCB 0x1d - timed out 17:20:56 /kernel: >>>>>>>>>>>>>>>>>> Dump Card State Begins = <<<<<<<<<<<<<<<<< 17:20:56 /kernel: ahc0: Dumping Card State while idle, at SEQADDR 0x8 17:20:56 /kernel: Card was paused 17:20:56 /kernel: ACCUM =3D 0x0, SINDEX =3D 0x4e, DINDEX =3D 0xe4, ARG_2 = =3D 0x0 17:20:56 /kernel: HCNT =3D 0x0 SCBPTR =3D 0x10 17:20:56 /kernel: SCSIPHASE[0x0] SCSISIGI[0x0] ERROR[0x0] SCSIBUSL[0x0]=20 17:20:56 /kernel: LASTPHASE[0x1]:(P_BUSFREE) = SCSISEQ[0x12]:(ENAUTOATNP|ENRSELI)=20 17:20:56 /kernel: SBLKCTL[0xa]:(SELWIDE|SELBUSB) SCSIRATE[0x0] = SEQCTL[0x10]:(FASTMODE)=20 17:20:56 /kernel: SEQ_FLAGS[0xc0]:(NO_CDB_SENT|NOT_IDENTIFIED) = SSTAT0[0x0]=20 17:20:56 /kernel: SSTAT1[0x8]:(BUSFREE) SSTAT2[0x0] SSTAT3[0x0] = SIMODE0[0x8]:(ENSWRAP)=20 17:20:56 /kernel: SIMODE1[0xa4]:(ENSCSIPERR|ENSCSIRST|ENSELTIMO) = SXFRCTL0[0x80]:(DFON)=20 17:20:56 /kernel: DFCNTRL[0x0] = DFSTATUS[0x89]:(FIFOEMP|HDONE|PRELOAD_AVAIL)=20 17:20:56 /kernel: STACK: 0x0 0x163 0x109 0x3 17:20:56 /kernel: SCB count =3D 100 17:20:56 /kernel: Kernel NEXTQSCB =3D 49 17:20:56 /kernel: Card NEXTQSCB =3D 49 17:20:56 /kernel: QINFIFO entries:=20 17:20:56 /kernel: Waiting Queue entries:=20 17:20:56 /kernel: Disconnected Queue entries: 27:9 12:7 21:57 23:21 = 26:51 15:53 7:83 29:85 14:75 28:0 3:23 13:81 11:24 2:64 9:10 24:30 8:6 = 30:29=20 17:20:56 /kernel: QOUTFIFO entries:=20 17:20:56 /kernel: Sequencer Free SCB List: 16 6 4 19 22 31 0 20 10 5 18 = 25 1 17=20 17:20:56 /kernel: Sequencer SCB Info:=20 17:20:56 /kernel: 0 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 1 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x7]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 2 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x40]=20 17:20:56 /kernel: 3 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x17]=20 17:20:56 /kernel: 4 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 5 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x17]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 6 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x7]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 7 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x53]=20 17:20:56 /kernel: 8 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x6]=20 17:20:56 /kernel: 9 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xa]=20 17:20:56 /kernel: 10 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x17]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 11 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x18]=20 17:20:56 /kernel: 12 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x7]=20 17:20:56 /kernel: 13 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x51]=20 17:20:56 /kernel: 14 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x4b]=20 17:20:56 /kernel: 15 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0x35]=20 17:20:56 /kernel: 16 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x7]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 17 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 18 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x17]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 19 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:56 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:56 /kernel: 20 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:57 /kernel: 21 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x39]=20 17:20:57 /kernel: 22 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:57 /kernel: 23 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x15]=20 17:20:57 /kernel: 24 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x1e]=20 17:20:57 /kernel: 25 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x17]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:57 /kernel: 26 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x33]=20 17:20:57 /kernel: 27 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x9]=20 17:20:57 /kernel: 28 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x0]=20 17:20:57 /kernel: 29 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x55]=20 17:20:57 /kernel: 30 SCB_CONTROL[0x64]:(DISCONNECTED|TAG_ENB|DISCENB) = SCB_SCSIID[0x47]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0x1d]=20 17:20:57 /kernel: 31 SCB_CONTROL[0xe0]:(TAG_ENB|DISCENB|TARGET_SCB) = SCB_SCSIID[0x37]=20 17:20:57 /kernel: SCB_LUN[0x0] SCB_TAG[0xff]=20 17:20:57 /kernel: Pending list:=20 17:20:57 /kernel: 9 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x47] = SCB_LUN[0x0]=20 17:20:57 /kernel: 7 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x47] = SCB_LUN[0x0]=20 17:20:57 /kernel: 57 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 21 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 51 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 53 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 83 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 85 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 75 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 0 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x47] = SCB_LUN[0x0]=20 17:20:57 /kernel: 23 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 24 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 81 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 64 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 10 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 30 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: 6 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) SCB_SCSIID[0x47] = SCB_LUN[0x0]=20 17:20:57 /kernel: 29 SCB_CONTROL[0x60]:(TAG_ENB|DISCENB) = SCB_SCSIID[0x47] SCB_LUN[0x0]=20 17:20:57 /kernel: Kernel Free SCB list: 78 61 47 16 88 45 13 26 95 80 5 = 58 14 77 19 28 1 65 15 17 44 76 87 32 71 63 86 8 55 31 4 69 72 50 96 18 = 43 27 11 41 62 38 39 42 34 70 68 36 22 3 66 82 20 37 48 60 25 40 73 79 = 56 84 67 97 35 33 46 54 2 12 74 99 59 52 89 98 94 93 92 91 90=20 17:20:57 /kernel:=20 17:20:57 /kernel: <<<<<<<<<<<<<<<< Dump Card State Ends = >>>>>>>>>>>>>>>>>> 17:20:57 /kernel: sg[0] - Addr 0x73f67000 : Length 4096 17:20:57 /kernel: sg[1] - Addr 0xe6c68000 : Length 4096 17:20:57 /kernel: sg[2] - Addr 0xeb629000 : Length 4096 17:20:57 /kernel: sg[3] - Addr 0x1adca000 : Length 4096 17:20:57 /kernel: sg[4] - Addr 0xc7b8b000 : Length 4096 17:20:57 /kernel: sg[5] - Addr 0x986cc000 : Length 4096 17:20:57 /kernel: sg[6] - Addr 0xd146d000 : Length 4096 17:20:57 /kernel: sg[7] - Addr 0x652ee000 : Length 4096 17:20:57 /kernel: sg[8] - Addr 0x30d8f000 : Length 4096 17:20:57 /kernel: sg[9] - Addr 0xbee50000 : Length 4096 17:20:57 /kernel: sg[10] - Addr 0x12b31000 : Length 4096 17:20:57 /kernel: sg[11] - Addr 0x50432000 : Length 4096 17:20:57 /kernel: (da4:ahc0:0:4:0): Queuing a BDR SCB 17:20:57 /kernel: (da4:ahc0:0:4:0): Bus Device Reset Message Sent 17:20:57 /kernel: (da4:ahc0:0:4:0): no longer in timeout, status =3D 34b 17:20:57 /kernel: ahc0: Bus Device Reset on A:4. 18 SCBs aborted 2 minutes forward in the log file the message repeated up all over = again, this situation kept repeating over and over again until the box = crashed and I manually rebooted it. After the reboot the problem didn't showed up again, so I thought that = it was an isolated incident, but last night it happened again, and the = funny thing is, it happened over other disk (this time the error starts = with (da2:ahc0:0:2:0): SCB 0x0 - timed out) ), so I'm starting to think = that it's probably not a a disk failure issue. It is important to note that at the moment of the crash, the box was = under heavy load (delivering more that 100Mb/s from the disks to the = network) the dmesg of the box is the following: Copyright (c) 1992-2003 The FreeBSD Project. Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994 The Regents of the University of California. All rights = reserved. FreeBSD 4.9-RELEASE #3: Fri May 21 17:07:49 ART 2004 root@xx.prima.com.ar:/usr/src/sys/compile/xx Timecounter "i8254" frequency 1193182 Hz CPU: Intel(R) Xeon(TM) CPU 3.06GHz (3057.65-MHz 686-class CPU) Origin =3D "GenuineIntel" Id =3D 0xf25 Stepping =3D 5 = Features=3D0xbfebfbff Hyperthreading: 2 logical CPUs real memory =3D 4160618496 (4063104K bytes) avail memory =3D 4053901312 (3958888K bytes) Changing APIC ID for IO APIC #0 from 0 to 8 on chip Changing APIC ID for IO APIC #1 from 0 to 9 on chip Changing APIC ID for IO APIC #2 from 0 to 10 on chip Programming 16 pins in IOAPIC #0 IOAPIC #0 intpin 2 -> irq 0 Programming 16 pins in IOAPIC #1 Programming 16 pins in IOAPIC #2 FreeBSD/SMP: Multiprocessor motherboard: 4 CPUs cpu0 (BSP): apic id: 0, version: 0x00050014, at 0xfee00000 cpu1 (AP): apic id: 1, version: 0x00050014, at 0xfee00000 cpu2 (AP): apic id: 6, version: 0x00050014, at 0xfee00000 cpu3 (AP): apic id: 7, version: 0x00050014, at 0xfee00000 io0 (APIC): apic id: 8, version: 0x000f0011, at 0xfec00000 io1 (APIC): apic id: 9, version: 0x000f0011, at 0xfec01000 io2 (APIC): apic id: 10, version: 0x000f0011, at 0xfec02000 Preloaded elf kernel "kernel" at 0xc02ff000. Warning: Pentium 4 CPU: PSE disabled Pentium Pro MTRR support enabled Using $PIR table, 9 entries at 0xc00fc410 npx0: on motherboard npx0: INT 16 interface pcib0: on motherboard IOAPIC #1 intpin 3 -> irq 2 IOAPIC #1 intpin 7 -> irq 5 IOAPIC #1 intpin 11 -> irq 7 pci0: on pcib0 pci0: (vendor=3D0x1028, dev=3D0x000c) at 4.0 irq 2 pci0: (vendor=3D0x1028, dev=3D0x0008) at 4.1 irq 5 pci0: (vendor=3D0x1028, dev=3D0x000d) at 4.2 irq 7 pci0: at 14.0 atapci0: port = 0x8b0-0x8bf,0x8d8-0x8db,0x8d0-0x8d7,0x8c8-0x8cb,0x8c0-0x8c7 at device = 15.1 on pci0 ata0: at 0x1f0 irq 14 on atapci0 ata1: at 0x170 irq 15 on atapci0 isab0: at device 15.3 = on pci0 isa0: on isab0 pcib1: on motherboard IOAPIC #1 intpin 4 -> irq 10 pci1: on pcib1 fxp0: port 0xdcc0-0xdcff mem = 0xfcf00000-0xfcf1ffff,0xfcf20000-0xfcf20fff irq 10 at device 8.0 on pci1 fxp0: Ethernet address 00:02:b3:ed:ce:54 inphy0: on miibus0 inphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto pcib2: on motherboard pci2: on pcib2 pcib3: on motherboard IOAPIC #1 intpin 12 -> irq 11 IOAPIC #1 intpin 13 -> irq 13 pci3: on pcib3 bge0: mem = 0xfcd10000-0xfcd1ffff irq 11 at device 6.0 on pci3 bge0: Ethernet address: 00:0d:56:bb:e2:70 miibus1: on bge0 brgphy0: on miibus1 brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, = 1000baseTX-FDX, auto bge1: mem = 0xfcd00000-0xfcd0ffff irq 13 at device 8.0 on pci3 bge1: Ethernet address: 00:0d:56:bb:e2:71 miibus2: on bge1 brgphy1: on miibus2 brgphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, = 1000baseTX-FDX, auto pcib4: on motherboard pci4: on pcib4 pcib8: at device 8.0 = on pci4 IOAPIC #1 intpin 14 -> irq 16 IOAPIC #1 intpin 15 -> irq 17 pci5: on pcib8 ahc0: port 0xac00-0xacff mem = 0xfcaff000-0xfcafffff irq 16 at device 6.0 on pci5 aic7899: Ultra160 Wide Channel A, SCSI Id=3D7, 32/253 SCBs ahc1: port 0xa800-0xa8ff mem = 0xfcafe000-0xfcafefff irq 17 at device 6.1 on pci5 aic7899: Ultra160 Wide Channel B, SCSI Id=3D7, 32/253 SCBs pcib5: on motherboard pci6: on pcib5 pcib6: on motherboard pci7: on pcib6 pcib7: on motherboard pci8: on pcib7 orm0: