Date: Wed, 31 Aug 2005 18:03:13 +0100
From: "Steven Hartland" <killing@multiplay.co.uk>
To: <freebsd-hackers@freebsd.org>
Subject: Debugging an unknown reboot (disk / io related)
Message-ID: <02db01c5ae4d$e38a1780$b3db87d4@multiplay.co.uk>
When running a large rsync on one of our machines here, it consistently falls over and reboots, leaving no traces in the logs or anywhere else. It looks like it could be a driver error, but with no crash log or panic message to go on I don't know where to start.

The machine is running 5.4-RELEASE-p2 with the latest driver set, downloaded and compiled locally. The only thing I have to go on is the errors displayed in the ssh session running the rsync:

35111 files to consider
rsync: readdir(games/fps/sof2/server): Input/output error (5)
rsync: readdir(games/fps/soldner): Input/output error (5)
...
...
rsync: mkstemp "/usr/home/ftp/pub/apps/3dmark/win32/.3DMark03.exe.NhcgGA" failed: Input/output error (5)
rsync: connection unexpectedly closed (1667283 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
rsync: connection unexpectedly closed (1667263 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(365)
Segmentation fault
root@backup1>

I've tried running with WITNESS enabled, but it fails to boot with a message about hpt_lock. I also tried originally with the default hptmv driver, with no joy.

When it crashes it takes the RAID5 with it, always dropping the same disk. I've replaced the cable and the disk, and even plugged the disk directly into the RAID controller on a different channel to eliminate the Supermicro hot-swap bay the disks are mounted in. Nothing changes: the same disk always gets dropped.

So the question is: what can I try to get more info on what's happening?

[dmesg]
Aug 31 17:56:28 backup1 syslogd: kernel boot file is /boot/kernel/kernel
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1992-2005 The FreeBSD Project.
Aug 31 17:56:28 backup1 kernel: Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
Aug 31 17:56:28 backup1 kernel: The Regents of the University of California. All rights reserved.
Aug 31 17:56:28 backup1 kernel: FreeBSD 5.4-RELEASE-p2 #6: Thu Jun 23 00:23:54 UTC 2005
Aug 31 17:56:28 backup1 kernel: root@backup1:/.usr/i386/src/sys/i386/compile/MPUK_SMP_200HZ
Aug 31 17:56:28 backup1 kernel: Timecounter "i8254" frequency 1193182 Hz quality 0
Aug 31 17:56:28 backup1 kernel: CPU: AMD Opteron(tm) Processor 244 (1794.41-MHz 686-class CPU)
Aug 31 17:56:28 backup1 kernel: Origin = "AuthenticAMD" Id = 0xf5a Stepping = 10
Aug 31 17:56:28 backup1 kernel: Features=0x78bfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,MMX,FXSR,SSE,SSE2>
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
Aug 31 17:56:28 backup1 kernel: AMD Features=0xe0500000<NX,AMIE,LM,DSP,3DNow!>
Aug 31 17:56:28 backup1 kernel: real memory = 2146893824 (2047 MB)
Aug 31 17:56:28 backup1 kernel: avail memory = 2099625984 (2002 MB)
Aug 31 17:56:28 backup1 kernel: ACPI APIC Table: <PTLTD APIC >
Aug 31 17:56:28 backup1 kernel: FreeBSD/SMP: Multiprocessor System Detected: 2 CPUs
Aug 31 17:56:28 backup1 kernel: cpu0 (BSP): APIC ID: 0
Aug 31 17:56:28 backup1 kernel: cpu1 (AP): APIC ID: 1
Aug 31 17:56:28 backup1 kernel: MADT: Forcing active-low polarity and level trigger for SCI
Aug 31 17:56:28 backup1 kernel: ioapic0 <Version 1.1> irqs 0-23 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic1 <Version 1.1> irqs 24-27 on motherboard
Aug 31 17:56:28 backup1 kernel: ioapic2 <Version 1.1> irqs 28-31 on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: <math processor> on motherboard
Aug 31 17:56:28 backup1 kernel: npx0: INT 16 interface
Aug 31 17:56:28 backup1 kernel: acpi0: <PTLTD XSDT> on motherboard
Aug 31 17:56:28 backup1 kernel: acpi0: Power Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi0: Sleep Button (fixed)
Aug 31 17:56:28 backup1 kernel: acpi_bus_number: can't get _ADR
Aug 31 17:56:28 backup1 last message repeated 2 times
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: unknown: I/O range not supported
Aug 31 17:56:28 backup1 kernel: ACPI-1304: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: ACPI-0239: *** Error: Method execution failed [\_SB_.PCI0.LPC_.LPT_._CRS] (Node 0xc30937a0), AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: can't fetch resources for \_SB_.PCI0.LPC_.LPT_ - AE_AML_BUFFER_LIMIT
Aug 31 17:56:28 backup1 kernel: Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
Aug 31 17:56:28 backup1 kernel: acpi_timer0: <24-bit timer at 3.579545MHz> port 0x8008-0x800b on acpi0
Aug 31 17:56:28 backup1 kernel: cpu0: <ACPI CPU> on acpi0
Aug 31 17:56:28 backup1 kernel: cpu1: <ACPI CPU> on acpi0
Aug 31 17:56:28 backup1 kernel: acpi_button0: <Power Button> on acpi0
Aug 31 17:56:28 backup1 kernel: pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
Aug 31 17:56:28 backup1 kernel: pci0: <ACPI PCI bus> on pcib0
Aug 31 17:56:28 backup1 kernel: pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci1: <ACPI PCI bus> on pcib1
Aug 31 17:56:28 backup1 kernel: pci1: <display, VGA> at device 0.0 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib2: <ACPI PCI-PCI bridge> at device 6.0 on pci0
Aug 31 17:56:28 backup1 kernel: pci2: <ACPI PCI bus> on pcib2
Aug 31 17:56:28 backup1 kernel: bge0: <Broadcom BCM5705 Gigabit Ethernet, ASIC rev. 0x3003> mem 0xe8100000-0xe810ffff irq 19 at device 5.0 on pci2
Aug 31 17:56:28 backup1 kernel: miibus0: <MII bus> on bge0
Aug 31 17:56:28 backup1 kernel: brgphy0: <BCM5705 10/100/1000baseTX PHY> on miibus0
Aug 31 17:56:28 backup1 kernel: brgphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, 1000baseTX, 1000baseTX-FDX, auto
Aug 31 17:56:28 backup1 kernel: bge0: Ethernet address: 00:0f:ea:7a:50:08
Aug 31 17:56:28 backup1 kernel: atapci0: <SiI 3114 SATA150 controller> port 0x3000-0x300f,0x3010-0x3013,0x3018-0x301f,0x3014-0x3017,0x3020-0x3027 mem 0xe8110000-0xe81103ff irq 18 at device 6.0 on pci2
Aug 31 17:56:28 backup1 kernel: ata2: channel #0 on atapci0
Aug 31 17:56:28 backup1 kernel: ata3: channel #1 on atapci0
Aug 31 17:56:28 backup1 kernel: ata4: channel #2 on atapci0
Aug 31 17:56:28 backup1 kernel: ata5: channel #3 on atapci0
Aug 31 17:56:28 backup1 kernel: isab0: <PCI-ISA bridge> at device 7.0 on pci0
Aug 31 17:56:28 backup1 kernel: isa0: <ISA bus> on isab0
Aug 31 17:56:28 backup1 kernel: atapci1: <AMD 8111 UDMA133 controller> port 0x1000-0x100f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 7.1 on pci0
Aug 31 17:56:28 backup1 kernel: ata0: channel #0 on atapci1
Aug 31 17:56:28 backup1 kernel: ata1: channel #1 on atapci1
Aug 31 17:56:28 backup1 kernel: pci0: <bridge> at device 7.3 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib3: <ACPI Host-PCI bridge> on acpi0
Aug 31 17:56:28 backup1 kernel: pci8: <ACPI PCI bus> on pcib3
Aug 31 17:56:28 backup1 kernel: pcib4: <ACPI PCI-PCI bridge> at device 3.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci9: <ACPI PCI bus> on pcib4
Aug 31 17:56:28 backup1 kernel: bge1: <Broadcom BCM5703X Gigabit Ethernet, ASIC rev. 0x1100> mem 0xf8100000-0xf810ffff irq 25 at device 1.0 on pci9
Aug 31 17:56:28 backup1 kernel: bge1: Ethernet address: 00:10:18:0d:cc:da
Aug 31 17:56:28 backup1 kernel: pci8: <base peripheral, interrupt controller> at device 3.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: pcib5: <ACPI PCI-PCI bridge> at device 4.0 on pci8
Aug 31 17:56:28 backup1 kernel: pci14: <ACPI PCI bus> on pcib5
Aug 31 17:56:28 backup1 kernel: hptmv0: <RocketRAID 182x SATA Controller> mem 0xf8200000-0xf827ffff irq 30 at device 2.0 on pci14
Aug 31 17:56:28 backup1 kernel: RocketRAID 182x SATA Controller driver Version 1.1
Aug 31 17:56:28 backup1 kernel: RR182x [0,0]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,1]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,2]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,4]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x [0,5]: channel started successfully
Aug 31 17:56:28 backup1 kernel: RR182x: RAID5 write-back enabled
Aug 31 17:56:28 backup1 kernel: pci8: <base peripheral, interrupt controller> at device 4.1 (no driver attached)
Aug 31 17:56:28 backup1 kernel: atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
Aug 31 17:56:28 backup1 kernel: atkbd0: <AT Keyboard> irq 1 on atkbdc0
Aug 31 17:56:28 backup1 kernel: fdc0: <floppy drive controller> port 0x3f7,0x3f0-0x3f5 irq 6 drq 2 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
Aug 31 17:56:28 backup1 kernel: sio0: type 16550A
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: orm0: <ISA Option ROMs> at iomem 0xcd000-0xd2fff,0xcb000-0xccfff,0xc0000-0xcafff on isa0
Aug 31 17:56:28 backup1 kernel: sc0: <System console> at flags 0x100 on isa0
Aug 31 17:56:28 backup1 kernel: sc0: VGA <16 virtual consoles, flags=0x300>
Aug 31 17:56:28 backup1 kernel: vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
Aug 31 17:56:28 backup1 kernel: sio1: configured irq 3 not in bitmap of probed irqs 0
Aug 31 17:56:28 backup1 kernel: sio1: port may not be enabled
Aug 31 17:56:28 backup1 kernel: Timecounters tick every 5.000 msec
Aug 31 17:56:28 backup1 kernel: da0 at hptmv0 bus 0 target 0 lun 0
Aug 31 17:56:28 backup1 kernel: da0: <RR182x RAID 5 Array 3.00> Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da0: 1526216MB (3125691008 512 byte sectors: 255H 63S/T 194565C)
Aug 31 17:56:28 backup1 kernel: da1 at hptmv0 bus 0 target 1 lun 0
Aug 31 17:56:28 backup1 kernel: da1: <ST340083 2AS 3.03> Fixed Direct Access SCSI-0 device
Aug 31 17:56:28 backup1 kernel: da1: 381554MB (781422757 512 byte sectors: 255H 63S/T 48641C)
Aug 31 17:56:28 backup1 kernel: SMP: AP CPU #1 Launched!
Aug 31 17:56:28 backup1 kernel: Mounting root from ufs:/dev/da0s1d
Aug 31 17:56:28 backup1 kernel: WARNING: / was not properly dismounted
Aug 31 17:56:28 backup1 kernel: WARNING: R/W mount of / denied. Filesystem is not clean - run fsck
[/dmesg]

================================================
This e.mail is private and confidential between Multiplay (UK) Ltd. and the person or entity to whom it is addressed. In the event of misdirection, the recipient is prohibited from using, copying, printing or otherwise disseminating it or any information contained in it.

In the event of misdirection, illegible or incomplete transmission please telephone (023) 8024 3137 or return the E.mail to postmaster@multiplay.co.uk.