Date: Tue, 9 Nov 2004 18:27:27 GMT
From: Heikki Soerum <heikkis@matnat.uio.no>
To: freebsd-gnats-submit@FreeBSD.org
Subject: kern/73740: [Panic] & [nfs(?)] 5-3-R#3 panic when accessing nfs exported ATA drives in Idle or suspend mode.
Message-ID: <200411091827.iA9IRR8o025782@www.freebsd.org>
Resent-Message-ID: <200411091830.iA9IULmo097035@freefall.freebsd.org>

>Number: 73740
>Category: kern
>Synopsis: [Panic] & [nfs(?)] 5-3-R#3 panic when accessing nfs exported ATA drives in Idle or suspend mode.
>Confidential: no
>Severity: serious
>Priority: medium
>Responsible: freebsd-bugs
>State: open
>Quarter:
>Keywords:
>Date-Required:
>Class: sw-bug
>Submitter-Id: current-users
>Arrival-Date: Tue Nov 09 18:30:20 GMT 2004
>Closed-Date:
>Last-Modified:
>Originator: Heikki Soerum
>Release: 5.3-RELEASE #3
>Organization:
>Environment:
FreeBSD a-ko 5.3-RELEASE FreeBSD 5.3-RELEASE #3: Tue Nov 9 11:07:05 CET 2004 root@a-ko:/usr/obj/usr/src/sys/A-KO i386
>Description:
The problems started after updating from 5.2.1-p9 to 5.3-RELEASE. I am running the GENERIC kernel, renamed to A-KO, with device polling included. In addition, debugging options were enabled to produce a debugging trace (at least I hoped so).

The computer is a low-cost file server serving several Linux clients over NFS. The server usually runs unattended 24/7 with the hard drives in IDLE or SUSPEND mode to reduce heat and noise, since the files are only accessed occasionally. I use sysutils/ataidle to set the IDLE time to 5 minutes and the SUSPEND time to 10 minutes.

Before 5.3-RELEASE the only effect was occasional timeout warnings on the local console (see dmesg) when the NFS-exported partitions were idle and an access attempt was made locally or over NFS. This caused no crashes, only a delay until the hard drives had time to spin up and answer the read/write request.

But now the NFS client computer, running Gentoo Linux and nfsutils 1.0.6-r4, will sometimes report that one or several of the remote NFS mounts are "to big to be calculated" when running 'df -h'. After this, any read/write or mount/umount attempt on the FreeBSD server involving those particular hard drives causes a fatal kernel panic.
Unmounting the NFS mounts on the remote Linux client also causes the kernel panic on the FreeBSD box, but as long as _no_ I/O attempts are made towards the affected mounts everything appears to be fine.

It is _not_ clear whether this panic is caused by NFS bugs, kernel bugs, or both. I have not been able to reproduce the crashes when _not_ exporting the drives as NFS mounts. All NFS mounts are mounted with these options on the client: (rw,soft,intr,bg,nfsvers=3,retry=20,addr=192.168.1.10)

I suspect there might be a connection to an observed change in behavior from 5.2.1 to 5.3-RELEASE: occasionally on 5.3-R a read/write request on an idle/suspended hard drive will cause an input/output error to be printed on the local console. A wild guess would be that the NFS daemon or a process attached to it can't handle the input/output error and causes a buffer overflow, or causes an integer to wrap around to zero or negative values. This in turn leads to a panic. Even a kernel running with debugging and WITNESS enabled produces only this short panic message:

PANIC message:
------------
panic: vrele: negative ref cnt
uptime: (from a couple of minutes to a few days.)

DMESG:
PS. The READ_DMA timeouts and failures are normal(?) noncritical behaviour on FreeBSD prior to 5.3-R that occur when attempting to read from an idling or suspended hard drive, and can be ignored. They are only included to show prior behavior. These were present in earlier versions of FreeBSD that didn't panic and are still present.
---------
Copyright (c) 1992-2004 The FreeBSD Project.
Copyright (c) 1979, 1980, 1983, 1986, 1988, 1989, 1991, 1992, 1993, 1994
    The Regents of the University of California. All rights reserved.
FreeBSD 5.3-RELEASE #3: Tue Nov 9 11:07:05 CET 2004
    root@a-ko:/usr/obj/usr/src/sys/A-KO
WARNING: WITNESS option enabled, expect reduced performance.
Timecounter "i8254" frequency 1193182 Hz quality 0
CPU: VIA C3 Samuel 2 (601.37-MHz 686-class CPU)
  Origin = "CentaurHauls" Id = 0x673 Stepping = 3
  Features=0x803035<FPU,DE,TSC,MSR,MTRR,PGE,MMX>
real memory = 251592704 (239 MB)
avail memory = 236539904 (225 MB)
npx0: [FAST]
npx0: <math processor> on motherboard
npx0: INT 16 interface
acpi0: <VT9174 AWRDACPI> on motherboard
acpi0: Power Button (fixed)
Timecounter "ACPI-fast" frequency 3579545 Hz quality 1000
acpi_timer0: <24-bit timer at 3.579545MHz> port 0x408-0x40b on acpi0
cpu0: <ACPI CPU (3 Cx states)> on acpi0
acpi_button0: <Power Button> on acpi0
pcib0: <ACPI Host-PCI bridge> port 0xcf8-0xcff on acpi0
pci0: <ACPI PCI bus> on pcib0
agp0: <VIA Generic host to PCI bridge> mem 0xe6000000-0xe6ffffff at device 0.0 on pci0
pcib1: <ACPI PCI-PCI bridge> at device 1.0 on pci0
pci1: <ACPI PCI bus> on pcib1
pci1: <display, VGA> at device 0.0 (no driver attached)
vr0: <VIA VT6105 Rhine III 10/100BaseTX> port 0xc000-0xc0ff mem 0xe8005000-0xe80050ff irq 12 at device 15.0 on pci0
miibus0: <MII bus> on vr0
ukphy0: <Generic IEEE 802.3u media interface> on miibus0
ukphy0: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr0: Ethernet address: 00:40:63:c9:da:14
uhci0: <VIA 83C572 USB controller> port 0xc400-0xc41f irq 10 at device 16.0 on pci0
uhci0: [GIANT-LOCKED]
usb0: <VIA 83C572 USB controller> on uhci0
usb0: USB revision 1.0
uhub0: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub0: 2 ports with 2 removable, self powered
uhci1: <VIA 83C572 USB controller> port 0xc800-0xc81f irq 11 at device 16.1 on pci0
uhci1: [GIANT-LOCKED]
usb1: <VIA 83C572 USB controller> on uhci1
usb1: USB revision 1.0
uhub1: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub1: 2 ports with 2 removable, self powered
uhci2: <VIA 83C572 USB controller> port 0xcc00-0xcc1f irq 5 at device 16.2 on pci0
uhci2: [GIANT-LOCKED]
usb2: <VIA 83C572 USB controller> on uhci2
usb2: USB revision 1.0
uhub2: VIA UHCI root hub, class 9/0, rev 1.00/1.00, addr 1
uhub2: 2 ports with 2 removable, self powered
pci0: <serial bus, USB> at device 16.3 (no driver attached)
isab0: <PCI-ISA bridge> at device 17.0 on pci0
isa0: <ISA bus> on isab0
atapci0: <VIA 8235 UDMA133 controller> port 0xd000-0xd00f,0x376,0x170-0x177,0x3f6,0x1f0-0x1f7 at device 17.1 on pci0
ata0: channel #0 on atapci0
ata1: channel #1 on atapci0
pcm0: <VIA VT8235> port 0xd400-0xd4ff irq 5 at device 17.5 on pci0
pcm0: [GIANT-LOCKED]
pcm0: <VIA Technologies VIA1612A AC97 Codec>
vr1: <VIA VT6102 Rhine II 10/100BaseTX> port 0xd800-0xd8ff mem 0xe8004000-0xe80040ff irq 10 at device 18.0 on pci0
miibus1: <MII bus> on vr1
ukphy1: <Generic IEEE 802.3u media interface> on miibus1
ukphy1: 10baseT, 10baseT-FDX, 100baseTX, 100baseTX-FDX, auto
vr1: Ethernet address: 00:40:63:c9:da:13
atapci1: <Promise PDC20268 UDMA100 controller> port 0xec00-0xec0f,0xe800-0xe803,0xe400-0xe407,0xe000-0xe003,0xdc00-0xdc07 mem 0xe8000000-0xe8003fff irq 11 at device 20.0 on pci0
ata2: channel #0 on atapci1
ata3: channel #1 on atapci1
sio0: <16550A-compatible COM port> port 0x3f8-0x3ff irq 4 flags 0x10 on acpi0
sio0: type 16550A
ppc0: <Standard parallel printer port> port 0x378-0x37f irq 7 on acpi0
ppc0: Generic chipset (EPP/NIBBLE) in COMPATIBLE mode
ppbus0: <Parallel port bus> on ppc0
plip0: <PLIP network interface> on ppbus0
lpt0: <Printer> on ppbus0
lpt0: Interrupt-driven port
ppi0: <Parallel I/O> on ppbus0
atkbdc0: <Keyboard controller (i8042)> port 0x64,0x60 irq 1 on acpi0
atkbd0: <AT Keyboard> irq 1 on atkbdc0
kbd0 at atkbd0
atkbd0: [GIANT-LOCKED]
pmtimer0 on isa0
orm0: <ISA Option ROMs> at iomem 0xd8000-0xda7ff,0xc0000-0xcdfff on isa0
sc0: <System console> at flags 0x100 on isa0
sc0: VGA <16 virtual consoles, flags=0x300>
vga0: <Generic ISA VGA> at port 0x3c0-0x3df iomem 0xa0000-0xbffff on isa0
sio1: configured irq 3 not in bitmap of probed irqs 0
sio1: port may not be enabled
Timecounter "TSC" frequency 601367159 Hz quality 800
Timecounters tick every 10.000 msec
witness_get: witness exhausted
ipfw2 initialized, divert disabled, rule-based forwarding disabled, default to deny, logging disabled
acpi_cpu: throttling enabled, 2 steps (100% to 50.0%), currently 100.0%
ad0: 38204MB <SAMSUNG SP0411N/TW100-11> [77622/16/63] at ata0-master UDMA100
ad1: 38204MB <SAMSUNG SP0411N/TW100-11> [77622/16/63] at ata0-slave UDMA100
ad2: 190782MB <ST3200822A/3.01> [387621/16/63] at ata1-master UDMA100
ad3: 190782MB <ST3200822A/3.01> [387621/16/63] at ata1-slave UDMA100
ad4: 114473MB <ST3120026A/3.06> [232581/16/63] at ata2-master UDMA100
ad5: 114473MB <ST3120026A/3.06> [232581/16/63] at ata2-slave UDMA100
ad6: 190782MB <ST3200822A/3.01> [387621/16/63] at ata3-master UDMA100
ad7: 190782MB <ST3200822A/3.01> [387621/16/63] at ata3-slave UDMA100
Mounting root from ufs:/dev/ad0s1a
ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127
ad2: FAILURE - READ_DMA timed out
ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=191
ad3: FAILURE - READ_DMA timed out
>How-To-Repeat:
0. Boot the GENERIC kernel and run nfsd on 5.3-R#3.
1. Run sysutils/ataidle to set the timeout options on the hard drives.
2. Mount the local partitions that will be NFS exported.
3. Mount the exported NFS partitions on a Linux NFS client.
4. Wait until the drives have entered idle or suspend.
5. Run 'df -h', 'ls', 'mv', or any other read/write operation that exceeds the contents of the client's NFS cache. Occasionally one of these operations will throw an error message or an input/output error if the drive is idle/suspended. This happens fairly often, but not always.
6. If the warning message occurs, any further read/write attempt on the affected drive, locally or remotely, will cause a kernel panic and freeze on the FreeBSD box.
>Fix:
Unknown. Possible workaround: Not really, but see my _uneducated_ guess in the description.
>Release-Note:
>Audit-Trail:
>Unformatted:
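
When following up on a report like this, it can help to see which drives logged ATA timeouts before a panic. The sketch below is only an illustration and not part of the original report: it tallies the `TIMEOUT` and `FAILURE` lines in the form shown in the dmesg above. The function name `tally_ata_events` and the `SAMPLE` text are hypothetical, and the regular expression assumes the 5.3-era `adN: KIND - COMMAND ...` message layout seen in this report; other releases may format these lines differently.

```python
import re
from collections import Counter

# Assumed line layout, taken from the dmesg in this report:
#   ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127
#   ad2: FAILURE - READ_DMA timed out
ATA_EVENT = re.compile(r"^(ad\d+): (TIMEOUT|FAILURE) - (\w+)")


def tally_ata_events(dmesg_text):
    """Count ATA TIMEOUT/FAILURE lines per (device, kind, command)."""
    counts = Counter()
    for line in dmesg_text.splitlines():
        match = ATA_EVENT.match(line.strip())
        if match:
            counts[match.groups()] += 1
    return counts


# Hypothetical sample: the four READ_DMA lines copied from the report.
SAMPLE = """\
ad2: TIMEOUT - READ_DMA retrying (2 retries left) LBA=12127
ad2: FAILURE - READ_DMA timed out
ad3: TIMEOUT - READ_DMA retrying (2 retries left) LBA=191
ad3: FAILURE - READ_DMA timed out
"""

if __name__ == "__main__":
    for (device, kind, command), n in sorted(tally_ata_events(SAMPLE).items()):
        print(device, kind, command, n)
```

Running the same function over a saved copy of the console log from a panicking boot would show whether the drives that timed out are the ones behind the affected NFS exports. That correlation is what the description suspects but does not establish.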