Date:      Fri, 18 Mar 2005 11:26:32 +0100
From:      peter@bgnett.no (Peter N. M. Hansteen)
To:        freebsd-questions@freebsd.org
Cc:        peter@datadok.no
Subject:   sym driver broken in 5.3?
Message-ID:  <86ekedntbb.fsf@amidala.datadok.no>

Is anybody else having trouble with the sym SCSI driver on 5.3-stable
systems?

I have a machine here where a tar to SCSI tape (tar cf /dev/nsa0
/home/data) will pretty reliably crash the machine. This being our file
server, it's a tad inconvenient. I had suspected that the tape drive
was bad, but today's crash gave me some new data: the console was full
of repeated

camq_init: - cannot malloc array!

followed by the uptime figures.

According to grep -c, the dmesg output immediately after reboot had
676 of them, ahead of the expected boot time messages:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-SECURITY #0: Fri Jan  7 04:09:28 UTC 2005
    root@builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2000.09-MHz 686-class CPU)
  Origin = "AuthenticAMD"  Id = 0xfc0  Stepping = 0
  Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
  AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
real memory  = 1006567424 (959 MB)
avail memory = 975384576 (930 MB)
ACPI APIC Table: <AMIINT VIA_K8  >
ioapic0 <Version 0.3> irqs 0-23 on motherboard
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <AMIINT VIA_K8> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: <ACPI CPU> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA 8380 host to PCI bridge> mem 0xd0000000-0xd7ffffff at device 0.0 on pci0
pcib1: <PCI-PCI bridge> at device 1.0 on pci0
pci1: <PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
sym0: <895> port 0xe800-0xe8ff mem 0xcfffe000-0xcfffefff,0xcfffff00-0xcfffffff irq 16 at device 8.0 on pci0
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: [GIANT-LOCKED]
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xcffffe80-0xcffffeff irq 19 at device 11.0 on pci0
miibus0: <MII bus> on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0:  10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:01:02:df:39:9a
atapci0: <VIA 6420 SATA150 controller> port 0xd000-0xd0ff,0xd400-0xd40f,0xd800-0xd803,0xdc00-0xdc07,0xe000-0xe003,0xe400-0xe407 irq 20 at device 15.0 on pci0
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
atapci1: <VIA 8237 UDMA133 controller> port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
fdc0: <floppy drive controller> port 0x3f7,0x3f4-0x3f5,0x3f2-0x3f3 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <ECP parallel printer port> port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: <ISA Option ROMs> at iomem 0xe0000-0xe0fff,0xcd800-0xcf7ff,0xc8800-0xc8fff on isa0
pmtimer0 on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2000087768 Hz quality 800
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 16 steps (100% to 6.2%), currently 100.0%
acd0: CDROM <CD-950E/TKU/A4E> at ata0-master UDMA33
ad4: 38204MB <SAMSUNG SP0411C/UU100-05> [77622/16/63] at ata2-master SATA150
Waiting 15 seconds for SCSI devices to settle
sa0 at sym0 bus 0 target 6 lun 0
sa0: <SEAGATE DAT    DAT72-000 A060> Removable Sequential Access SCSI-3 device 
sa0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit)
da0 at sym0 bus 0 target 2 lun 0
da0: <SEAGATE ST336753LW 0006> Fixed Direct Access SCSI-3 device 
da0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da0: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
Mounting root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted
WARNING: /home was not properly dismounted
/home: mount pending error: blocks 1092 files 2
WARNING: /home/data/merplass was not properly dismounted
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 120 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 180 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 240 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 300 bytes
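
For reference, the count of camq_init lines quoted before the dmesg
output can be reproduced along these lines. This is only a sketch: the
function name count_camq_errors and the sample text are made up for
illustration, and the exact grep invocation used for the figure above
is not recorded here.

```shell
#!/bin/sh
# Count kernel-message lines reporting the camq_init allocation failure.
# The message text is read from standard input.
count_camq_errors() {
    # -c prints the number of matching lines (0 if there are none);
    # -F treats the pattern as a fixed string, not a regular expression.
    # "|| true" keeps the exit status zero when nothing matches.
    grep -cF 'camq_init: - cannot malloc array!' || true
}

# Made-up sample input standing in for real dmesg output.
sample='camq_init: - cannot malloc array!
camq_init: - cannot malloc array!
Copyright (c) 1992-2004 The FreeBSD Project.'

printf '%s\n' "$sample" | count_camq_errors
```

On the live system the same function would be fed from dmesg, as in
"dmesg | count_camq_errors".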

I've been debugging this on and off for a while now. Tar to tape worked
on the first couple of attempts. As far as I can tell from mt output,
compression is enabled in the drive (meaning there should be space for
the data), but "excessive write errors" messages have been turning up in
the syslog messages, as in

Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): WRITE FILEMARKS. CDB: 10 0 0 0 2 0 
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): CAM Status: SCSI Status Error
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): SCSI Status: Check Condition
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): MEDIUM ERROR asc:3,2
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): Excessive write errors
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): Retries Exhausted
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): failed to write terminating filemark(s)
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): tape is now frozen- use an OFFLINE, REWIND or MTEOM command to clear this state.
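
To see how often these errors turn up, the MEDIUM ERROR reports in a
log excerpt like the one above can be tallied per device. A rough
sketch follows; the function name summarize_medium_errors is made up,
and it assumes the kernel messages are logged in the format shown in
the excerpt, with the device path in parentheses.

```shell
#!/bin/sh
# Tally MEDIUM ERROR reports per device path in syslog-style kernel
# messages read from standard input.
summarize_medium_errors() {
    awk '/MEDIUM ERROR/ {
             # The device path is the field of the form (sa0:sym0:0:6:0):
             for (i = 1; i <= NF; i++)
                 if ($i ~ /^\(.*\):$/) {
                     dev = $i
                     sub(/:$/, "", dev)   # drop the trailing colon
                     count[dev]++
                 }
         }
         END { for (dev in count) print count[dev], dev }'
}

# Three lines taken from the log excerpt above.
sample='Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): SCSI Status: Check Condition
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): MEDIUM ERROR asc:3,2
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): Excessive write errors'

printf '%s\n' "$sample" | summarize_medium_errors
```

Assuming kernel messages end up in /var/log/messages, the real log
could be fed in with "summarize_medium_errors < /var/log/messages".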

I was beginning to think I'd need to replace the tape drive, but the
camq_init message made me think this could be a driver problem (the
driver is afaik not supported in FreeBSD/amd64 at all, for example). 

The question is, what's the next reasonable debugging step here?

(and I know you're dying to ask - we do rsync to an off-site location
twice a day) 

- P 
-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://www.blug.linux.no/rfc1149/ http://www.datadok.no/ http://www.nuug.no/
"First, we kill all the spammers" The Usenet Bard, "Twice-forwarded tales"


