Date:      Thu, 13 Sep 2001 13:59:40 -0700 (PDT)
From:      Jeremy Chadwick <jdc@best.net>
To:        freebsd-gnats-submit@FreeBSD.org
Subject:   kern/30559: Intense SCSI tape access results in controller errors
Message-ID:  <200109132059.f8DKxem61171@freefall.freebsd.org>

>Number:         30559
>Category:       kern
>Synopsis:       Intense SCSI tape access results in controller errors
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    freebsd-bugs
>State:          open
>Quarter:        
>Keywords:       
>Date-Required:
>Class:          sw-bug
>Submitter-Id:   current-users
>Arrival-Date:   Thu Sep 13 14:00:00 PDT 2001
>Closed-Date:
>Last-Modified:
>Originator:     Jeremy Chadwick
>Release:        4.4-RC
>Organization:
Best Internet/Verio/NTT
>Environment:
FreeBSD 4.4-RC #0: Tue Sep 11 03:10:20 PDT 2001
    root@backup2.ba.best.net:/usr/obj/usr/src/sys/BEST-43-SMP
Timecounter "i8254"  frequency 1193182 Hz
CPU: Pentium II/Pentium II Xeon/Celeron (400.91-MHz 686-class CPU)
  Origin = "GenuineIntel"  Id = 0x652  Stepping = 2
  Features=0x183fbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,MMX,FXSR>
real memory  = 268435456 (262144K bytes)
avail memory = 257765376 (251724K bytes)
Programming 24 pins in IOAPIC #0
IOAPIC #0 intpin 2 -> irq 0
FreeBSD/SMP: Multiprocessor motherboard
 cpu0 (BSP): apic id:  0, version: 0x00040011, at 0xfee00000
 cpu1 (AP):  apic id:  1, version: 0x00040011, at 0xfee00000
 io0 (APIC): apic id:  2, version: 0x00170011, at 0xfec00000
Preloaded elf kernel "kernel" at 0xc0320000.
ccd0-7: Concatenated disk drivers
Pentium Pro MTRR support enabled
npx0: <math processor> on motherboard
npx0: INT 16 interface
pcib0: <Intel 82443BX (440 BX) host to PCI bridge> on motherboard
IOAPIC #0 intpin 16 -> irq 2
IOAPIC #0 intpin 17 -> irq 9
IOAPIC #0 intpin 18 -> irq 10
pci0: <PCI bus> on pcib0
pcib1: <Intel 82443BX (440 BX) PCI-PCI (AGP) bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <Trident model 9750 VGA-compatible display device> at 0.0 irq 0
isab0: <Intel 82371AB PCI to ISA bridge> at device 7.0 on pci0
isa0: <ISA bus> on isab0
pci0: <Intel PIIX4 ATA controller> at 7.1
pci0: <Intel 82371AB/EB (PIIX4) USB controller> at 7.2
intpm0: <Intel 82371AB Power management controller> port 0x440-0x44f irq 9 at device 7.3 on pci0
intpm0: I/O mapped 440
intpm0: intr IRQ 9 enabled revision 0
smbus0: <System Management Bus> on intsmb0
smb0: <SMBus general purpose I/O> on smbus0
intpm0: PM I/O mapped 400 
ahc0: <Adaptec 2940 Ultra SCSI adapter> port 0xe800-0xe8ff mem 0xfebff000-0xfebfffff irq 2 at device 16.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
xl0: <3Com 3c905B-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xfebfef80-0xfebfefff irq 9 at device 17.0 on pci0
xl0: Ethernet address: 00:10:5a:18:4d:0a
miibus0: <MII bus> on xl0
xlphy0: <3Com internal media interface> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
ahc1: <Adaptec 2940 Ultra SCSI adapter> port 0xe400-0xe4ff mem 0xfebfd000-0xfebfdfff irq 10 at device 18.0 on pci0
aic7880: Ultra Wide Channel A, SCSI Id=7, 16/255 SCBs
orm0: <Option ROM> at iomem 0xc0000-0xcbfff on isa0
fdc0: <NEC 72065B or clone> at port 0x3f0-0x3f5,0x3f7 irq 6 drq 2 on isa0
fdc0: FIFO enabled, 8 bytes threshold
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
atkbdc0: <Keyboard controller (i8042)> at port 0x60,0x64 on isa0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sc0: <System console> on isa0
sc0: VGA <16 virtual consoles, flags=0x0>
sio0 at port 0x3f8-0x3ff irq 4 flags 0x10 on isa0
sio0: type 16550A, console
sio1 at port 0x2f8-0x2ff irq 3 on isa0
sio1: type 16550A
APIC_IO: Testing 8254 interrupt delivery
APIC_IO: routing 8254 via IOAPIC #0 intpin 2
IP packet filtering initialized, divert disabled, rule-based forwarding disabled, default to deny, unlimited logging
Waiting 5 seconds for SCSI devices to settle
SMP: AP CPU #1 Launched!
sa0 at ahc1 bus 0 target 1 lun 0
sa0: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device 
sa0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa1 at ahc1 bus 0 target 2 lun 0
sa1: <SONY SDX-500C 0102> Removable Sequential Access SCSI-2 device 
sa1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
sa2 at ahc1 bus 0 target 3 lun 0
sa2: <SONY SDX-500C 0107> Removable Sequential Access SCSI-2 device 
sa2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit)
da0 at ahc0 bus 0 target 0 lun 0
da0: <SEAGATE ST34573W 6244> Fixed Direct Access SCSI-2 device 
da0: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da0: 4340MB (8888924 512 byte sectors: 255H 63S/T 553C)
da2 at ahc0 bus 0 target 2 lun 0
da2: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device 
da2: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da2: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
da1 at ahc0 bus 0 target 1 lun 0
da1: <SEAGATE ST318275LW 0001> Fixed Direct Access SCSI-2 device 
da1: 40.000MB/s transfers (20.000MHz, offset 8, 16bit), Tagged Queueing Enabled
da1: 17366MB (35566480 512 byte sectors: 255H 63S/T 2213C)
ch0 at ahc1 bus 0 target 0 lun 0
ch0: <QUALSTAR TLS-46120 1.31> Removable Changer SCSI-2 device 
ch0: 3.300MB/s transfers
ch0: 126 slots, 3 drives, 1 picker, 1 portal
Mounting root from ufs:/dev/da0s1a

>Description:
  Under heavy SCSI tape access, the system prints the following on the console.  All of the errors are on the ahc1 controller, which has the tape drives and the changer attached.

(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6c
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
 DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x4, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10 
Untagged Q(1): 7 
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): BDR message in message buffer
(sa0:ahc1:0:1:0): SCB 0x7 - timed out
ahc1: Dumping Card State in Data-out phase, at SEQADDR 0x6d
ACCUM = 0x0, SINDEX = 0x8, DINDEX = 0x8f, ARG_2 = 0x1
HCNT = 0x0
SCSISEQ = 0x12, SBLKCTL = 0x2
 DFCNTRL = 0x3c, DFSTATUS = 0x6d
LASTPHASE = 0x0, SCSISIGI = 0x14, SXFRCTL0 = 0xa0
SSTAT0 = 0x0, SSTAT1 = 0x2
STACK == 0x83, 0x188, 0x147, 0x0
SCB count = 20
Kernel NEXTQSCB = 9
Card NEXTQSCB = 9
QINFIFO entries: 
Waiting Queue entries: 
Disconnected Queue entries: 
QOUTFIFO entries: 
Sequencer Free SCB List: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 
Pending list: 7
Kernel Free SCB list: 14 6 8 15 16 17 18 19 0 1 2 3 4 5 13 12 11 10 
Untagged Q(1): 7 
sg[0] - Addr 0x48fc000 : Length 4096
sg[1] - Addr 0x315d000 : Length 4096
sg[2] - Addr 0x7be000 : Length 4096
sg[3] - Addr 0x3fdf000 : Length 4096
sg[4] - Addr 0xd4c0000 : Length 4096
sg[5] - Addr 0xb001000 : Length 4096
sg[6] - Addr 0x63e2000 : Length 4096
sg[7] - Addr 0x38a3000 : Length 4096
sg[8] - Addr 0x6a04000 : Length 4096
sg[9] - Addr 0x2de5000 : Length 4096
sg[10] - Addr 0x46e6000 : Length 4096
sg[11] - Addr 0x52c7000 : Length 4096
sg[12] - Addr 0x6ee8000 : Length 4096
sg[13] - Addr 0xa6c9000 : Length 4096
sg[14] - Addr 0x5d2a000 : Length 4096
sg[15] - Addr 0x3b0b000 : Length 4096
(sa0:ahc1:0:1:0): no longer in timeout, status = 34b
ahc1: Issued Channel A Bus Reset. 1 SCBs aborted
(sa0:ahc1:0:1:0): failed to write terminating filemark(s)
(sa0:ahc1:0:1:0): tape is now frozen- use an OFFLINE, REWIND or MTEOM command to clear this state.
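
  The last driver message names three ways to clear the frozen state.  A minimal sketch of issuing one of them with mt(1) follows.  The function name clear_frozen_tape and the MT variable are hypothetical, introduced only for illustration; /dev/nsa0 is the device used elsewhere in this report.

```sh
#!/bin/sh
# Sketch only: clear the "frozen" state reported by the sa driver.
# MT is a hypothetical override so the command can be previewed (MT=echo)
# instead of being sent to a real drive.
MT=${MT:-mt}

clear_frozen_tape() {
    dev=$1
    action=${2:-rewind}
    case "$action" in
        # mt(1) commands matching the OFFLINE, REWIND and MTEOM operations
        rewind|offline|eom)
            "$MT" -f "$dev" "$action"
            ;;
        *)
            echo "unsupported action: $action" >&2
            return 2
            ;;
    esac
}
```

  Calling clear_frozen_tape /dev/nsa0 rewind runs the rewind; setting MT=echo first only prints the arguments.  Clearing the state does not address the underlying timeout.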

  Our SCSI bus is terminated properly.  The drives are not LVD.  Cables do not "run too close to the power supply."  Cable length does not exceed specification.  Cable quality is high -- replacing cables made no difference.  Decreasing speed from 40MB/sec to 20MB/sec made no difference.  Disabling SMP (via sysctl MIB) made no difference.

  The only thing I have not tried is removing the drive from the library/changer and attaching it directly to the main SCSI cable.

  We have no problems with the other Adaptec controller (ahc0), although it is used only for hard disks.  Both controllers use the same BIOS version.
>How-To-Repeat:
$ tar -b 512 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
tar: can't write to /dev/nsa0 : Input/output error

  The files in question total 1023875396 bytes (~1GB).

  With a smaller tar blocking factor the operation gets further, but it still fails:

$ tar -b 20 -vpcf /dev/nsa0 shell2.la.best.com__sd*
shell2.la.best.com__sd0a.gz
shell2.la.best.com__sd0d.gz
shell2.la.best.com__sd0e.gz
shell2.la.best.com__sd0f.gz
shell2.la.best.com__sd0g.gz
shell2.la.best.com__sd0h.gz
shell2.la.best.com__sd1d.gz
shell2.la.best.com__sd1e.gz
shell2.la.best.com__sd2d.gz
shell2.la.best.com__sd2e.gz
tar: can't write to /dev/nsa0 : Input/output error
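
  For reference, the -b argument to tar is a blocking factor counted in 512-byte blocks, so the two runs above write records of very different sizes.  The sketch below only does that arithmetic; the helper names record_bytes and records_needed are hypothetical, and the byte total is the one quoted above.

```sh
#!/bin/sh
# tar's -b option counts 512-byte blocks per record.
TAR_BLOCK=512

# Bytes per tar record for blocking factor $1.
record_bytes() {
    echo $(( $1 * TAR_BLOCK ))
}

# Records needed to hold $1 bytes at blocking factor $2, rounded up.
records_needed() {
    rec=$(record_bytes "$2")
    echo $(( ($1 + rec - 1) / rec ))
}

echo "-b 512: $(record_bytes 512) bytes per record"
echo "-b 20:  $(record_bytes 20) bytes per record"
echo "records at -b 512: $(records_needed 1023875396 512)"
echo "records at -b 20:  $(records_needed 1023875396 20)"
```

  This shows only how much data each write request carries; it does not by itself explain why the larger records fail sooner.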

  The drive's block size, as reported by mt, is 512 bytes:

$ mt -f /dev/sa0.ctl status
Mode      Density              Blocksize      bpi      Compression
Current:  0x31                 512 bytes      0        0x3
---------available modes---------
0:        0x31                 512 bytes      0        0x3
1:        0x31                 512 bytes      0        0x3
2:        0x31                 512 bytes      0        0x3
3:        0x31                 512 bytes      0        0x3
---------------------------------
Current Driver State: at rest.
---------------------------------
File Number: 0  Record Number: 0    Residual Count 0

  Disabling hardware compression (mt comp off) makes no difference.

  The problem is 100% repeatable.
>Fix:
Fix unknown.
>Release-Note:
>Audit-Trail:
>Unformatted:

To Unsubscribe: send mail to majordomo@FreeBSD.org
with "unsubscribe freebsd-bugs" in the body of the message



