From: peter@bgnett.no (Peter N. M. Hansteen)
To: freebsd-questions@freebsd.org
Cc: peter@datadok.no
Date: Fri, 18 Mar 2005 11:26:32 +0100
Message-ID: <86ekedntbb.fsf@amidala.datadok.no>
Subject: sym driver broken in 5.3?

Is anybody else having trouble with the sym SCSI driver on 5.3-stable
systems? I have a machine here where a tar to SCSI tape
(tar cf /dev/nsa0 /home/data) will pretty reliably crash the machine.
This being our file server, it's a tad inconvenient.

I was suspecting that the tape drive was bad, but today's crash gave me
some new data - the console was full of repeated

camq_init: - cannot malloc array!

followed by the uptime figures.
The dmesg output immediately after reboot had, according to grep -c,
676 of them before the expected boot-time messages:

Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
	The Regents of the University of California. All rights reserved.
FreeBSD 5.3-SECURITY #0: Fri Jan 7 04:09:28 UTC 2005
    root@builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: AMD Athlon(tm) 64 Processor 3000+ (2000.09-MHz 686-class CPU)
  Origin = "AuthenticAMD" Id = 0xfc0 Stepping = 0
  Features=0x78bfbff
  AMD Features=0xe0500000
real memory = 1006567424 (959 MB)
avail memory = 975384576 (930 MB)
ACPI APIC Table:
ioapic0 irqs 0-23 on motherboard
npx0: [FAST]
npx0: on motherboard
npx0: INT 16 interface
acpi0: on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x808-0x80b on acpi0
cpu0: on acpi0
acpi_button0: on acpi0
pcib0: port 0xcf8-0xcff on acpi0
pci0: on pcib0
agp0: mem 0xd0000000-0xd7ffffff at device 0.0 on pci0
pcib1: at device 1.0 on pci0
pci1: on pcib1
pci1: at device 0.0 (no driver attached)
sym0: <895> port 0xe800-0xe8ff mem 0xcfffe000-0xcfffefff,0xcfffff00-0xcfffffff irq 16 at device 8.0 on pci0
sym0: Tekram NVRAM, ID 7, Fast-40, LVD, parity checking
sym0: [GIANT-LOCKED]
xl0: <3Com 3c905C-TX Fast Etherlink XL> port 0xec00-0xec7f mem 0xcffffe80-0xcffffeff irq 19 at device 11.0 on pci0
miibus0: on xl0
xlphy0: <3c905C 10/100 internal PHY> on miibus0
xlphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
xl0: Ethernet address: 00:01:02:df:39:9a
atapci0: port 0xd000-0xd0ff,0xd400-0xd40f,0xd800-0xd803,0xdc00-0xdc07,0xe000-0xe003,0xe400-0xe407 irq 20 at device 15.0 on pci0
ata2: channel #0 on atapci0
ata3: channel #1 on atapci0
atapci1: port 0xfc00-0xfc0f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 15.1 on pci0
ata0: channel #0 on atapci1
ata1: channel #1 on atapci1
isab0: at device 17.0 on pci0
isa0: on isab0
fdc0: port 0x3f7,0x3f4-0x3f5,0x3f2-0x3f3 irq 6 drq 2 on acpi0
fdc0: [FAST]
fd0: <1440-KB 3.5" drive> on fdc0 drive 0
sio0: configured irq 4 not in bitmap of probed irqs 0
sio0: port may not be enabled
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: port 0x778-0x77b,0x378-0x37f irq 7 drq 3 on acpi0
ppc0: SMC-like chipset (ECP/EPP/PS2/NIBBLE) in COMPATIBLE mode
ppc0: FIFO with 16/16/9 bytes threshold
ppbus0: on ppc0
plip0: on ppbus0
lpt0: on ppbus0
lpt0: Interrupt-driven port
ppi0: on ppbus0
atkbdc0: port 0x64,0x60 irq 1 on acpi0
atkbd0: irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
orm0: at iomem 0xe0000-0xe0fff,0xcd800-0xcf7ff,0xc8800-0xc8fff on isa0
pmtimer0 on isa0
sc0: at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
vga0: at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Timecounter "TSC" frequency 2000087768 Hz quality 800
Timecounters tick every 10.000 msec
acpi_cpu: throttling enabled, 16 steps (100% to 6.2%), currently 100.0%
acd0: CDROM at ata0-master UDMA33
ad4: 38204MB [77622/16/63] at ata2-master SATA150
Waiting 15 seconds for SCSI devices to settle
sa0 at sym0 bus 0 target 6 lun 0
sa0: Removable Sequential Access SCSI-3 device
sa0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit)
da0 at sym0 bus 0 target 2 lun 0
da0: Fixed Direct Access SCSI-3 device
da0: 80.000MB/s transfers (40.000MHz, offset 31, 16bit), Tagged Queueing Enabled
da0: 35003MB (71687372 512 byte sectors: 255H 63S/T 4462C)
Mounting root from ufs:/dev/ad4s1a
WARNING: / was not properly dismounted
WARNING: /home was not properly dismounted
/home: mount pending error: blocks 1092 files 2
WARNING: /home/data/merplass was not properly dismounted
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 120 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 180 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 240 bytes
xl0: transmission error: 90
xl0: tx underrun, increasing tx start threshold to 300 bytes

I've been debugging this on and off for a while now. Tar to tape worked
on the first couple of attempts, and as far as I can tell from the mt
output, compression is enabled in the drive (meaning there should be
space for the data). However, "Excessive write errors" messages have
been turning up in syslog, as in:

Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): WRITE FILEMARKS. CDB: 10 0 0 0 2 0
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): CAM Status: SCSI Status Error
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): SCSI Status: Check Condition
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): MEDIUM ERROR asc:3,2
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): Excessive write errors
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): Retries Exhausted
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): failed to write terminating filemark(s)
Mar 18 02:41:49 filehut kernel: (sa0:sym0:0:6:0): tape is now frozen- use an OFFLINE, REWIND or MTEOM command to clear this state.

I was beginning to think I'd need to replace the tape drive, but the
camq_init message made me think this could be a driver problem (the
driver is, AFAIK, not supported in FreeBSD/amd64 at all, for example).

The question is, what's the next reasonable debugging step here?

(And I know you're dying to ask - we do rsync to an off-site location
twice a day.)

- P
-- 
Peter N. M. Hansteen, member of the first RFC 1149 implementation team
http://www.blug.linux.no/rfc1149/ http://www.datadok.no/ http://www.nuug.no/
"First, we kill all the spammers" The Usenet Bard, "Twice-forwarded tales"