Date: Thu, 13 Sep 2001 13:59:40 -0700 (PDT)
From: Jeremy Chadwick <jdc@best.net>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/30559: Intense SCSI tape access results in controller errors
Message-ID: <200109132059.f8DKxem61171@freefall.freebsd.org>
>Number: 30559
>Category: kern
>Synopsis: Intense SCSI tape access results in controller errors
>Confidential: no
>Severity: serious
>Priority: high
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Thu Sep 13 14:00:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator: Jeremy Chadwick
>Release: 4.4-RC
>Organization: Best Internet/Verio/NTT
>Environment:
FreeBSD 4.4-RC #0: Tue Sep 11 03:10:20 PDT 2001
root@backup2.ba.best.net:/usr/obj/usr/src/sys/BEST-43-SMP
Timecounter "i8254" frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
Origin = "GenuineIntel" Id = 0x652 Stepping = 2
Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory = 268435456 (262144K bytes)
avail memory = 257765376 (251724K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
cpu0 (BSP): apic id: 0, version: 0x00040011, at 0xfee00000
cpu1 (AP): apic id: 1, version: 0x00040011, at 0xfee00000
io0 (APIC): apic id: 2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0320000.
ccd0-7: Concatenated disk drivers
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 17 -> irq 9
IOAPIC #0 intpin 18 -> irq 10
pci0: <PCI bus> on pcib0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <Trident model 9750 VGA-compatible display device> at 0.0 irq 0
isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Intel PIIX4 ATA controller> at 7.1
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 7.2
intpm0: <Intel 82371AB Power management controller> port 0x440-0x44f irq 9 at device 7.3 on pci0
intpm0: I/O mapped 440
intpm0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
smb0: <SMBus general purpose I/O> on smbus0
intpm0: PM I/O mapped 400
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 2 at device 16.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xfebfef80-0xfebfefff irq 9 at device 17.0 on pci0
xl0: Ethernet address: 00:10:5a:18:4d:0a
miibus0: <MII bus> on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc1: <Adaptec 2940 Ultra SCSI adapter> port 0xe400-0xe4ff mem 0xfebfd000-0xfebfdfff irq 10 at device 18.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
orm0: <Option ROM> at iomem 0xc0000-0xcbfff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, default to deny, unlimited logging
Waiting 5 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
sa0 at ahc1 bus 0 target 1 lun 0
sa0: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device
sa0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa1 at ahc1 bus 0 target 2 lun 0
sa1: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device
sa1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa2 at ahc1 bus 0 target 3 lun 0
sa2: <SONY SDX-500C 0107> Removable Sequential Access SCSI-2 device
sa2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34573W 6244> Fixed Direct Access SCSI-2 device
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
da2 at ahc0 bus 0 target 2 lun 0
da2: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device
da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
ch0 at ahc1 bus 0 target 0 lun 0
ch0: <QUALSTAR TLS-46120 1.31> Removable Changer SCSI-2 device
ch0: 3.300MB/s transfers
ch0: 126 slots, 3 drives, 1 picker, 1 portal
Mounting root from ufs:/dev/da0s1a
>Description:
Under heavy SCSI tape access, our system spits out the following on the console. Please note this applies to the ahc1 controller.

(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6c
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x4, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10
Untagged Q(1): 7
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): BDR message in message buffer
(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6d
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x14, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries:
Waiting Queue entries:
Disconnected Queue entries:
QOUTFIFO entries:
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10
Untagged Q(1): 7
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): no longer in timeout, status = 34b
ahc1: Issued Channel A Bus Reset. 1 SCBs aborted
(sa0:ahc1:0:1:0): failed to write terminating filemark(s)
(sa0:ahc1:0:1:0): tape is now frozen- use an OFFLINE, REWIND or MTEOM command to clear this state.

Our SCSI bus is terminated properly. The drives are not LVD. Cables do not "run too close to the power supply." Cable length does not exceed specification. Cable quality is high -- replacing cables made no difference. Decreasing speed from 40MB/sec to 20MB/sec made no difference. Disabling SMP (via sysctl MIB) made no difference.

The only thing I haven't tried is removing the drive from the library/changer system itself, and throwing it right off the main SCSI cable.

We have no problems with the other Adaptec controller (although used for hard disks). Both controllers use the same BIOS version.
>How-To-Repeat:
$ tar -b 512 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
tar: can't write to /dev/nsa0 : Input/output error

Where the files in question total 1023875396 bytes (~1GB).

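For reference when comparing the tar runs in this report, here is a minimal arithmetic sketch of the transfer sizes involved. It assumes only that tar's -b option counts 512-byte blocks per record; the helper name `tar_record_bytes` is made up for illustration, and the 16 x 4096 figure is read straight from the scatter/gather list in the console dump above. It does not claim to explain the timeout.

```python
# Illustrative arithmetic only; tar_record_bytes is a hypothetical helper name.
TAR_BLOCK_BYTES = 512  # tar's -b option is a blocking factor in 512-byte blocks


def tar_record_bytes(blocking_factor: int) -> int:
    """Bytes in one tar record (one write to the tape device) for a given -b."""
    return blocking_factor * TAR_BLOCK_BYTES


# Scatter/gather list of the timed-out SCB: 16 segments of 4096 bytes each.
sg_total_bytes = 16 * 4096

# The two blocking factors used in this report.
for factor in (512, 20):
    record = tar_record_bytes(factor)
    print(f"-b {factor}: {record} bytes per record, "
          f"{record / sg_total_bytes:.2f} x the dumped S/G transfer")
```

So -b 512 asks for 262144-byte records and -b 20 for 10240-byte records, while the transfer that timed out covered 65536 bytes.
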
Using a smaller blocksize results in the operation getting further, but still errors out:

$ tar -b 20 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
shell2.la.best.com__sd2e.gz
tar: can't write to /dev/nsa0 : Input/output error

Blocksize set via mt is 512 bytes:

$ mt -f /dev/sa0.ctl status
Mode      Density    Blocksize    bpi    Compression
Current:  0x31       512 bytes    0      0x3
---------available modes---------
0:        0x31       512 bytes    0      0x3
1:        0x31       512 bytes    0      0x3
2:        0x31       512 bytes    0      0x3
3:        0x31       512 bytes    0      0x3
---------------------------------
Current Driver State: at rest.
---------------------------------
File Number: 0 Record Number: 0 Residual Count 0

Disabling hardware compression (mt comp off) makes no difference. The problem is 100% repeatable.
>Fix:
Fix unknown.
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message